
Locating and Appraising the Results of a Research Study

Tables have it all! The results of intervention research studies are summarized in tables. A helpful habit to develop is to read through the tables in a paper and reflect on the information they contain before trying to understand the text in the results section. The title of a table indicates what will be included in the main part of the table. Figure 4.2 illustrates a typical table describing the demographic and clinical characteristics of the study sample for each group.1

Data are summarized using descriptive statistics. Descriptive statistics give you an overall impression of the typical values for each group as well as the variability within and between the groups.2,3 First, look at the demographic data. Compare the control group and the two training groups on average age, height, weight, and body mass index (see Fig. 4.2, Note 4.1). The numerical values within the table are reported as means and standard deviations for each characteristic and each group. The units of expression for each characteristic are given in a footnote under the table (Fig. 4.2, Note 4.2).

[Figure: Steps 1 to 5 (Clinical Questions, Search, Appraisal, Integrate, Evaluate), with Step 3, Appraisal: Intervention Studies, divided into A, Applicability; B, Quality; C, Results; D, Clinical Bottom Line.]

Descriptive statistics

A. MEASURES OF CENTRAL TENDENCY: Measures of central tendency are measures of the “average” or “most typical” and are the most widely used statistical description of data.

Many variables that are measured in rehabilitation research follow a normal (bell-shaped) curve in the general population (Fig. 4.3A). The logic of a normal distribution is based on repeated measures in a large sample of people: when variables such as weight or range of motion are measured in a large sample of people, the distribution of these variables takes on a bell shape.

Non-normally distributed (skewed) data (Fig. 4.3B) are common in clinical populations. The subjects in a research study of people with a disease or injury may not show the typical distribution of values seen in a healthy population.

The results of measurement are reported using terms that are associated with specific types of distributions. For example, mean and standard deviation are typically reported if a variable has a normal (bell-shaped) distribution. Median and mode are typically reported for variables with a non-normal (skewed) distribution.

1. Mean – the arithmetic average – the mean of a set of observations is simply their sum, divided by the number of observations.

2. Median – the median is the 50th percentile of a distribution – the point below which half of the observations fall.

3. Mode – the mode is the most frequently occurring observation – the most popular score of a class of scores.
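As a minimal sketch (the scores are hypothetical, not taken from any study cited here), Python's standard `statistics` module computes all three measures of central tendency. One unusually high score makes the set skewed, so the mean and the median separate:

```python
import statistics

# Hypothetical range-of-motion scores (degrees) with one unusually high value,
# which makes the set of observations right-skewed.
rom = [100, 105, 105, 110, 115, 160]

mean_rom = statistics.mean(rom)      # sum of the observations / number of observations
median_rom = statistics.median(rom)  # 50th percentile: half the scores fall below it
mode_rom = statistics.mode(rom)      # most frequently occurring score

print(f"mean={mean_rom:.1f}, median={median_rom}, mode={mode_rom}")
```

The single high value pulls the mean (about 115.8 degrees) above the median (107.5 degrees), which is why the median is often the better summary for skewed data.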

B. MEASURES OF VARIABILITY OR DISPERSION: Measures of variability reflect the degree of spread or dispersion that characterizes a group of scores and the degree to which a set of scores differs from some measure of central tendency.

DIGGING DEEPER 4.1

CHAPTER 4 Critically Appraise the Results of an Intervention Research Study

Step 1. Identify the need for information and develop a focused and searchable clinical question.

Step 2. Conduct a search to find the best possible research evidence to answer your question.

Step 3. Critically appraise the research evidence for applicability and quality: Intervention Studies.

Step 4. Integrate the critically appraised research evidence with clinical expertise and the patient’s values and circumstances.

Step 5. Evaluate the effectiveness and efficacy of your efforts in Steps 1–4 and identify ways to improve them in the future.

1. Range – the range is the difference between the highest and lowest scores in a distribution.

2. Standard deviation – the standard deviation is the most commonly used measure of variability. Roughly, it expresses the typical amount by which the individual scores vary from the mean of the set of scores (it is computed from the squared deviations of the scores from the mean).
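A short sketch with two hypothetical groups that share the same mean but differ in spread shows how the range and the standard deviation capture variability. Here `statistics.stdev` returns the sample standard deviation (divisor n - 1):

```python
import statistics

# Hypothetical range-of-motion scores (degrees); both groups have a mean of 110.
group_a = [108, 110, 112]
group_b = [90, 110, 130]

for name, scores in (("A", group_a), ("B", group_b)):
    score_range = max(scores) - min(scores)  # highest score minus lowest score
    sd = statistics.stdev(scores)            # sample standard deviation
    print(f"group {name}: mean={statistics.mean(scores)}, range={score_range}, SD={sd:.1f}")
```

Both groups have the same mean, but group B has a range of 40 degrees and an SD of 20 degrees, against 4 and 2 degrees for group A; the mean alone would hide that difference.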

C. EXAMPLE CLINICAL VALUES: What is the average range of motion of the subjects before treatment?

The arithmetic mean, the sum of the range-of-motion values divided by the number of values, would answer this question. When the values follow a normal distribution, the mean lies at the middle of the distribution, with half of the scores above it and half below. With a normal distribution you also know that approximately 68% of the sample falls within 1 standard deviation (SD) above or below the mean, and approximately 95% falls within 2 SD above or below the mean. But the mean value is only one number. The mean may adequately represent the range-of-motion values for a sample, or it may be less adequate.
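The 68% and 95% figures can be checked against the standard normal distribution using Python's `statistics.NormalDist`. This sketch computes the proportion of a normal distribution lying within k standard deviations of the mean; the sample mean and SD of 110 and 10 degrees are hypothetical:

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def proportion_within(k):
    """Proportion of a normal distribution lying within k SDs of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2):
    print(f"within {k} SD: {proportion_within(k):.1%}")

# Hypothetical sample: mean range of motion 110 degrees, SD 10 degrees.
rom_dist = NormalDist(mu=110, sigma=10)
print(f"about {rom_dist.cdf(120) - rom_dist.cdf(100):.0%} between 100 and 120 degrees")
```

The exact proportions are about 68.3% within 1 SD and 95.4% within 2 SD, and they hold for any normal distribution regardless of its mean and SD.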

To further understand the characteristics of this sample, you need to ask another question.

What is the variability in the range of motion of the subjects before treatment?

The SD gives a measure of the variability in a sample. It is the square root of the variance, which is also a measure of the variability in a sample. If the variable is normally distributed, then each increment of the SD captures a known percentage of your sample.
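The relationship between the variance and the SD can be written out by hand. In this sketch (hypothetical scores), the sample variance is the sum of squared deviations from the mean divided by n - 1, and the SD is its square root; the results are cross-checked against the standard library:

```python
import math
import statistics

# Hypothetical range-of-motion scores (degrees).
scores = [100, 104, 110, 116, 120]

n = len(scores)
mean = sum(scores) / n
# Sample variance: sum of squared deviations from the mean, divided by n - 1.
variance = sum((x - mean) ** 2 for x in scores) / (n - 1)
# The SD is the square root of the variance, so it is back in the original units (degrees).
sd = math.sqrt(variance)

# Cross-check against the standard library's sample variance and SD.
assert math.isclose(variance, statistics.variance(scores))
assert math.isclose(sd, statistics.stdev(scores))
print(f"mean={mean}, variance={variance}, SD={sd:.2f}")
```

Taking the square root is what returns the SD to degrees; the variance itself is in squared degrees, which is harder to interpret clinically.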

DIGGING DEEPER 4.1 (cont’d)

Table 1. Demographic and Clinical Data of the Patients in the Intervention and Control Groups*

Duration of neck pain, y

Short depression inventory score‡

Grip strength, right hand, N§

Grip strength, left hand, N§

Maximum oxygen uptake, mL/kg per min

Smoking, No. (%)

Abbreviation: N, newton, a unit of force.

*Data are presented as mean (SD) unless otherwise indicated.

†Body mass index is calculated as weight in kilograms divided by the square of height in meters.

‡Mood is assessed on a theoretical range of 1 to 21, with a lower score indicating a better mood.

§Grip strength was measured using a hand-held Jamar grip-strength device while the participant was in a seated position with the elbow supported at a right angle.

Note 4.1: Means and standard deviations for age are listed by group.

Note 4.2: Units of measure and comments.

Note 4.3: Grip strength has wide variability among groups.

FIGURE 4.2 Typical table describing the demographic and clinical characteristics of the study sample for each group. From: Ylinen J, Takala EP, Nykanen M, et al. Active neck muscle training in the treatment of chronic neck pain in women: a randomized controlled trial. JAMA. 2003;289:2509–2516; with permission.


Appraisal Questions

The explanatory sections that follow include questions to consider when appraising the quality of an intervention study. Five questions are included; together they form Table 4.5. Combining the checklists in Tables 3.3 and 4.5 gives a complete list of questions to use when appraising an intervention study (Appendix).

QUESTION 1: Were the groups similar at baseline?

The groups should be as similar as possible at the beginning of an intervention study, that is, before treatment. Randomization of subjects should make the groups similar, although it does not guarantee that they are exactly the same. But how similar should they be? What if they are “somewhat” different? How different can groups be at the start of a study and still be acceptable? Now look at the “Hypothetical Ages” (see Fig. 4.2, Note 4.1). There is a 10-year difference in mean age between the control and the strength groups. Are these ages similar “enough,” or could the difference in age contribute to the subjects’ responses to treatment and thus to the results?
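The text leaves “similar enough” as a judgment call. One common way to frame that judgment, which is not described in the source and is offered here only as an illustration, is to express the baseline difference in standard-deviation units (a standardized mean difference using a pooled SD). The ages below are hypothetical, chosen so that the mean difference is 10 years; they are not the values in Figure 4.2:

```python
import statistics

# Hypothetical baseline ages (years); not the values reported in Figure 4.2.
control_ages = [32, 38, 41, 45, 49]
strength_ages = [44, 47, 52, 55, 57]

mean_difference = statistics.mean(strength_ages) - statistics.mean(control_ages)

# Pooled SD: combines the two groups' sample variances, weighted by degrees of freedom.
n1, n2 = len(control_ages), len(strength_ages)
pooled_sd = (
    ((n1 - 1) * statistics.variance(control_ages) + (n2 - 1) * statistics.variance(strength_ages))
    / (n1 + n2 - 2)
) ** 0.5

# Standardized difference: the mean difference expressed in pooled-SD units.
standardized_difference = mean_difference / pooled_sd
print(f"mean difference = {mean_difference} years, pooled SD = {pooled_sd:.1f} years")
print(f"standardized difference = {standardized_difference:.2f} SD")
```

No particular cutoff is assumed here; whether an imbalance of this size matters still depends on whether age plausibly affects the outcome being studied.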

Recall that you want to be as certain as possible that the results of a study are due to the intervention and not to other sources of bias (sometimes referred to as error). Group differences at the beginning of a study are a potential source of error that could contribute to the results and thus reduce your certainty that the intervention led to the results. Groups may have different values on the pre-intervention characteristics because:

• Groups really are different and randomization was not successful
• They have been measured differently by different people
• The instruments or tests used introduce error

Grip strength reported in Figure 4.2, Note 4.3, was measured with a dynamometer, a device that registers the amount of force generated when it is squeezed. The Strength Group started the study with a mean value that was 20 newtons greater than that of the Control Group. Is this a “real” difference or the result of bias introduced by the dynamometer or by the people measuring? First, examine the errors that you can make as measuring physical therapists (Fig. 4.4). If you measure strength twice in a patient, you expect some variability in your measures. This variability can be termed error because you are not replicating your measure exactly. Accept that the measures cannot be exactly the same; the question is how much variability (error) is acceptable. If you expect the treatment to improve strength, but your repeated measures are not the same, then how can you know what is measurement error (variability) and what is an effect of treatment? If you expect the benefits of treatment to be small, then any error can confound the results. Reliability of measures is the ability of people and instruments to produce consistent values over time. The tests and devices used, and the physical therapists doing the measuring, should reproduce measurements reliably.
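A minimal sketch of the question this paragraph raises, using hypothetical grip-strength readings (newtons) taken twice on the same five patients with no treatment assumed between trials: the trial-to-trial differences give a rough sense of measurement variability to set beside the 20 N baseline gap. This is an informal comparison, not a formal reliability analysis:

```python
import statistics

# Hypothetical grip strength (N) measured twice on the same five patients.
trial_1 = [300, 250, 410, 280, 350]
trial_2 = [310, 245, 395, 290, 340]

# Trial-to-trial differences; assumes no treatment between trials, so they reflect measurement variability.
differences = [second - first for first, second in zip(trial_1, trial_2)]
mean_abs_difference = statistics.mean(abs(d) for d in differences)
sd_of_differences = statistics.stdev(differences)

baseline_gap = 20  # N: Strength Group minus Control Group at baseline, as stated in the text
print(f"differences: {differences}")
print(f"mean absolute difference = {mean_abs_difference} N, SD of differences = {sd_of_differences:.1f} N")
print(f"baseline gap / mean absolute difference = {baseline_gap / mean_abs_difference:.1f}")
```

With these made-up readings, patients differ from themselves by about 10 N between trials, so a 20 N gap between group means is only about twice the typical repeat-measurement difference.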

Reliability of Testers and Participants

QUESTION 2: Were outcome measures reliable and valid?

To have high-quality data and results from a study, the data must be valid and reliable. Reliable data are reproducible by one or more people (therapists) and are stable within a defined time period. Intra-rater reliability is the repeatability of a measure by the same therapist on the same patient at two or more time points. For example, if you measure a patient’s knee flexion range of motion as 120 degrees, you should be able to obtain that same score of 120 degrees the second and third time. Inter-rater reliability is the repeatability between two or more therapists measuring the same patient.

Both therapists should obtain the score of 120 degrees from the example above. The therapists who are testing study participants and the tools and instruments they use must be reliable (Fig. 4.4).

FIGURE 4.3 (A) Normal distribution, in which the mean, median, and mode coincide; (B) non-normal (skewed) distribution, in which the mean, median, and mode are separated.

Patients also introduce variability into measurements. This variability may be the result of the natural fluctuations of the disease or of normal physiological processes, day/night changes, emotional state, state of alertness, and the myriad other factors that influence human performance. Repeated measures of the variables of interest (for example, before an intervention begins) quantify this intra-subject variability and provide a way to establish the true range of the variable. One pre-intervention measurement, like one mean value, has limited value without an understanding of the normal variability of what is being measured. If intra-subject variability is high, it might mean that the characteristic you are measuring is inherently highly variable OR that your measuring technique needs to improve and become more reliable.
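As a sketch with hypothetical values, repeated pre-intervention measurements on one patient can be summarized by their mean, range, and standard deviation, giving a baseline band against which a later value can be compared informally:

```python
import statistics

# Hypothetical knee flexion (degrees) for one patient on four days before treatment.
baseline = [118, 122, 120, 116]

typical = statistics.mean(baseline)
low, high = min(baseline), max(baseline)
spread = statistics.stdev(baseline)  # intra-subject variability (sample SD)

later_value = 125  # hypothetical post-treatment measurement
outside_band = not (low <= later_value <= high)
print(f"baseline mean={typical}, range={low}-{high}, SD={spread:.1f}")
print(f"later value {later_value} outside baseline range: {outside_band}")
```

A later value just outside a narrow baseline band is more persuasive than the same value set against widely scattered baseline measurements; this check is informal and is not a statistical test.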

Reliability of Measures is Determined by Instruments and Machines

The dynamometer in the example would need to be tested for measurement accuracy and reproducibility over time. A valid instrument measures the constructs it was intended to measure. A reliable instrument is one that yields repeatable results when administered correctly by reliable testers. Studies that investigate test and instrument reliability are presented in detail in Chapter 10.

Reliability within and between people is typically expressed by the association (correlation) between the values obtained in repeated measurements. Both intra-rater and inter-rater reliability should show high values of association; that is, repeated values should be similar. The type of correlation that is performed depends on the measurement scale used, because the scale determines the form of the data.

Reliability should be established within a research study, and it should be high. Studies have higher validity when high reliability has been established among the raters actually conducting the study. Authors may cite other research in which reliability has been established, but this is not as convincing as reliability testing among the raters who actually perform the tests: the fact that other raters achieved high reliability does not show that the testers in the study you are appraising were similarly reliable.

Methods to Analyze Reliability

The Pearson product-moment coefficient of correlation is sometimes used to report reliability.2 Figure 4.5 illustrates why this is an insufficient measure of the reliability of measures. The Pearson correlation between these two testers is a perfect 1.0; however, the actual measures are very different.

The range-of-motion values recorded by Physical Therapist 1 change in the same way as the values recorded by Physical Therapist 2, but the absolute values in the two series are not the same. A measure such as the intraclass correlation coefficient (ICC) takes into account both the nature of the change and the absolute values and thus is a preferred statistic to evaluate the reliability of continuous data (ratio or interval).

FIGURE 4.4 Multiple forms of reliability. The person or persons completing the measures: intra-rater reliability (the same physical therapist repeatedly reproduces the same score of a measurement on the same person and over time) and inter-rater reliability (different physical therapists repeatedly reproduce the same score of a measurement on the same person). The person who is measured: intra-subject reliability (measures from the same person should remain stable over repeated measures in a short period of time).

FIGURE 4.5 Range-of-motion values from two physical therapists (Physical Therapist 1 and Physical Therapist 2), in degrees, across measurements 1 to 5.
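The contrast shown in Figure 4.5 can be reproduced with a short sketch. The values below are hypothetical (the figure's values are not recoverable here): Physical Therapist 2 always records 30 degrees more than Physical Therapist 1. The ICC form used, ICC(2,1) from Shrout and Fleiss (two-way random effects, absolute agreement, single rater), is an assumption; the text does not specify which ICC form it has in mind:

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` holds one row per subject, each row containing one score per rater.
    """
    n = len(ratings)     # number of subjects
    k = len(ratings[0])  # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between-subject
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between-rater
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols                 # residual
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical range-of-motion values (degrees) for five measurements.
pt1 = [90, 100, 110, 120, 130]
pt2 = [120, 130, 140, 150, 160]  # always 30 degrees higher than pt1

print(f"Pearson r = {pearson_r(pt1, pt2):.2f}")
print(f"ICC(2,1)  = {icc_2_1(list(zip(pt1, pt2))):.2f}")
```

Pearson r is 1.00 because the two series rise together, while this ICC is about 0.36 because absolute-agreement forms of the ICC also penalize the constant 30-degree disagreement between the therapists.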

There are various statistical methods to compute reliability for both instruments and people (Table 4.1). The choice of statistic reflects the type of data being analyzed.

Appraising Statistical Results

Tables are also used to display the results of interventions. Reading through tables before reading the results section can give you an overall impression of the study outcomes. The table in Figure 4.6 is from a study by Santamato et al.5 comparing two interventions for the treatment of subacromial impingement syndrome. The first three columns of the table list the tests, the baseline (before treatment) scores, and the post-treatment scores on the tests used in the study for each of the two intervention groups.

These results are expressed as means and standard deviations for each of the treatment groups (HILT and US Therapy).

QUESTION 3: Were confidence intervals reported?

The fifth column of the table in Figure 4.6 lists the mean differences between the two treatment groups. In parentheses are the confidence intervals (CIs) for these mean differences. A CI is a range of values that is likely, at a stated level of confidence (commonly 95%), to include the real (or true) population value.3,6 Means are estimates of the true values that would be expected from a population of people who are treated and measured. In all research studies, only samples from the total population of possible participants are measured; these samples provide estimates of what might be expected from the larger group. You want an idea of how accurately the means represent the population

Scales of measurement and types of data

RATIO SCALES: Variables on a ratio scale are ordered precisely and continuously. The measured intervals on the scale are equal. A value of 4 on a ratio scale is twice the value of 2 on the same scale.

Range of motion as measured with a goniometer is on a ratio scale.

Ratio level variables have a meaningful (absolute) zero value.

INTERVAL SCALES: Variables on an interval scale are also measured precisely and share the property of equal intervals with ratio scales. Temperature recorded on a Fahrenheit or Celsius scale is the typical example of an interval scale. Interval scales do not have a meaningful (absolute) zero value: the zero point is arbitrary, which is why temperature can have a value below zero. Meaningful measures in physical therapy typically do not use interval scales.
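A short sketch makes the ratio-versus-interval distinction concrete. A ratio between two values survives a change of units on a ratio scale (degrees to radians) but not on an interval scale (Celsius to Fahrenheit), because the Celsius and Fahrenheit zero points are arbitrary. The temperatures and angles are illustrative values only:

```python
import math


def celsius_to_fahrenheit(celsius):
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


# Interval scale: 20 C looks like "twice" 10 C, but the ratio depends on the arbitrary zero.
ratio_celsius = 20.0 / 10.0
ratio_fahrenheit = celsius_to_fahrenheit(20.0) / celsius_to_fahrenheit(10.0)

# Ratio scale: a 40-degree range of motion is twice a 20-degree one in any angular unit.
ratio_degrees = 40.0 / 20.0
ratio_radians = math.radians(40.0) / math.radians(20.0)

print(f"Celsius ratio {ratio_celsius:.2f}, Fahrenheit ratio {ratio_fahrenheit:.2f}")
print(f"degrees ratio {ratio_degrees:.2f}, radians ratio {ratio_radians:.2f}")
```

The Celsius ratio of 2.00 becomes 1.36 in Fahrenheit (68 / 50), whereas the range-of-motion ratio stays at 2.00 in both degrees and radians.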

DIGGING DEEPER 4.2