Standard Deviation and Variance - Reading Statistics Huck

Two additional indices of dispersion, the standard deviation and the variance, are usually better indices of dispersion than are the first three measures of variability we considered. This is because these two measures are each based on all of the scores (and not just the high and low scores or the upper and lower quartile points).

The standard deviation is determined by (1) figuring how much each score deviates from the mean and (2) putting these deviation scores into a computational formula.

The variance is found by squaring the value of the standard deviation.
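As a rough illustration of the two steps just described, the sketch below computes deviation scores and then feeds them into one common computational formula. The scores are made up, the function name is hypothetical, and the choice of the sample formula (dividing by n - 1) is an assumption; the text does not say which formula any particular researcher used.

```python
import math
import statistics


def standard_deviation(scores):
    """Sample standard deviation built from deviation scores (divides by n - 1)."""
    mean = sum(scores) / len(scores)
    # Step 1: how much each score deviates from the mean.
    deviations = [x - mean for x in scores]
    # Step 2: put the deviation scores into a computational formula.
    sum_of_squares = sum(d ** 2 for d in deviations)
    return math.sqrt(sum_of_squares / (len(scores) - 1))


scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores, for illustration only
sd = standard_deviation(scores)
variance = sd ** 2  # the variance is the square of the standard deviation

print(f"SD = {sd:.2f}, variance = {variance:.2f}")
```

Python's built-in `statistics.stdev` and `statistics.variance` use the same n - 1 formula, so they can serve as a cross-check on the hand-rolled version.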

In reporting their standard deviations, authors may use the abbreviation SD, the symbol s or σ, or simply write out the word sigma. Occasionally, authors report the standard deviation using a plus/minus format, for example 14.83 ± 2.51, where the first number (14.83) stands for the mean and the second number (2.51) stands for the standard deviation. The variance, being the square of the standard deviation, is symbolized as s² or σ².

Excerpts 2.18 and 2.19 illustrate two of the ways researchers indicate the numerical value of the standard deviation. In the first of these, the abbreviation SD is used, whereas in the second the plus/minus format is used. These two formats for presenting the standard deviation are often seen in research reports.

Excerpt 2.20 shows how information on the standard deviation can be included in a table. In this excerpt, each row of numbers corresponds to a different variable considered important within the researchers’ study. In this table, notice that the abbreviation SD is used to represent the term standard deviation.

EXCERPT 2.17

• Box Plots

[Figure: box-and-whisker plots of Modified Fresno Test Score (axis from 0 to 200) for three groups: EBP-expert Faculty, EBP-trained Students, and EBP-novice Students.]

FIGURE 1 Modified Fresno Test scores by group. Box and whisker plot of modified Fresno Test scores for EBP-novice PT Students (n = 31), EBP-trained PT Students (n = 50), and EBP-expert PT Faculty (n = 27). The central box spans from the lower to the upper quartile, the middle line represents the median, the “+” sign represents the mean, and the whiskers extend from the 10th percentile to the 90th percentile of scores.

Source: Tilson, J. K. (2010). Validation of the Modified Fresno Test: Assessing physical therapists’ evidence based practice knowledge and skills. BMC Medical Education, 10(38), 1–9.

Although the standard deviation appears in research reports far more often than does any other measure of variability, a few researchers choose to describe the dispersion in their data sets by reporting the variance. Excerpt 2.21 is a case in point. The content of this excerpt illustrates nicely the danger of considering only measures of central tendency. The means made the boys and girls appear to be similar in terms of reading scores. However, the two groups differ in terms of how dispersed their scores are.
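The point about means hiding differences in dispersion can be sketched with two made-up groups of scores. These numbers are purely hypothetical and are not the data behind Excerpt 2.21; they are chosen only so that the two means match while the variances differ.

```python
import statistics

# Hypothetical scores for two groups with the same mean but different spread.
group_a = [60, 70, 80, 90, 100]
group_b = [76, 78, 80, 82, 84]

mean_a = statistics.mean(group_a)
mean_b = statistics.mean(group_b)
var_a = statistics.variance(group_a)  # sample variance (divides by n - 1)
var_b = statistics.variance(group_b)

print(f"Group A: mean = {mean_a}, variance = {var_a}")
print(f"Group B: mean = {mean_b}, variance = {var_b}")
```

Looking only at the two means would suggest the groups are alike; the variances show that the scores in the first group are far more spread out.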

EXCERPTS 2.18–2.19

• Reporting on the Standard Deviation

The mean age of the HC group was 69.79 years (SD = 7.51).

Source: Sapir, S., Ramig, L. O., Spielman, J. L., & Fox, C. (2010). Formant centralization ratio: A proposal for a new acoustic measure of dysarthric speech. Journal of Speech, Language & Hearing Research, 53(1), 114–125.

The age of the participants ranged from 28 to 73 years, with a mean age of 49.8 (±7.9) years.

Source: Faris, J. A., Douglas, K. K., Maples, D. C., Berg, L. R., & Thrailkill, A. (2010). Job satisfaction of advanced practice nurses in the Veterans Health Administration. Journal of the American Academy of Nurse Practitioners, 22(1), 35–44.

EXCERPT 2.20

• Reporting the Standard Deviation in a Table

TABLE 2 Sample size, mean, standard deviation, and range for AVI-SOS clientele descriptive variables

Variable                            N      Mean    SD      Range
Age                                 105    41.6    8.5     19–61
Years lived in Victoria             103    17.3    13.4    0–55
Number of places slept last week    105    2.5     2.0     1–7
Years needle exchange client        105    7.2     5.3     0–19

Source: Exner, H., Gibson, E. K., Stone, R., Lindquist, J., Cowen, L., & Roth, E. A. (2009). Worry as a window into the lives of people who use injection drugs: A factor analytic approach. Harm Reduction Journal, 6(20), 1–6.

Before concluding our discussion of the standard deviation and variance, I offer this helpful hint concerning how to make sense out of these two indices of variability. Simply stated, I suggest using an article’s reported standard deviation (or variance) to estimate what the range of scores probably was. Because the range is such a simple concept, the standard deviation or variance can be demystified by converting it into an estimated range.

To make a standard deviation interpretable, just multiply the reported value of this measure of variability by about 4 to obtain your guess as to what the range of the scores most likely was. Using 4 as the multiplier, this rule of thumb tells you to guess that the range is equal to 20 for a set of scores in which the standard deviation is equal to 5. (If the research report indicates that the variance is equal to 9, you first take the square root of 9 to get the standard deviation, and then you multiply by 4 to arrive at a guess that the range is equal to 12.)

When giving you this rule of thumb, I said that you should multiply the standard deviation by “about 4.” To guess more accurately what the range most likely was in a researcher’s data set, your multiplier sometimes must be a bit smaller or larger than 4 because the multiplier number must be adjusted on the basis of the number of scores on which the standard deviation is based. If there are 25 or so scores, use 4. If N is near 100, multiply the standard deviation by 5. If N is gigantic, multiply by 6. With small Ns, use a multiplier that is smaller than 4. With 10–20 scores in the group, multiplying by 3 works fairly well; when N is smaller than 10, setting the multiplier equal to 2 usually produces a good guess as to range.
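The rule of thumb and its adjustment for N can be sketched as a pair of small functions. The text names only a few anchor points (fewer than 10, 10–20, about 25, near 100, gigantic), so the exact cutoffs between them in this sketch (60 and 1,000) are my own assumptions, as are the function names.

```python
import math


def range_multiplier(n):
    """Pick the multiplier suggested by the rule of thumb for n scores."""
    if n < 10:
        return 2
    if n <= 20:
        return 3
    if n < 60:      # assumed cutoff: "25 or so" scores
        return 4
    if n < 1000:    # assumed cutoff: N "near 100"
        return 5
    return 6        # "gigantic" N


def estimate_range(n, sd=None, variance=None):
    """Guess the range from a reported SD (or variance) and the group size."""
    if sd is None:
        if variance is None:
            raise ValueError("supply either sd or variance")
        sd = math.sqrt(variance)  # convert a variance to a standard deviation first
    return range_multiplier(n) * sd


# The two worked examples from the text, using a group size that gives a multiplier of 4.
print(estimate_range(25, sd=5))        # SD of 5
print(estimate_range(25, variance=9))  # variance of 9
```

With a multiplier of 4, the first call reproduces the guess of 20 and the second the guess of 12 described above.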

It may strike you as somewhat silly to be guessing the range based on the standard deviation. If researchers regularly included the values of the standard deviation and the range when summarizing their data (as was done in Excerpt 2.20), there would be no need to make a guess as to the size of R. Unfortunately, most researchers present only the standard deviation, and by itself a standard deviation provides little insight into the degree of variability within a set of scores.

One final comment is in order regarding this technique of using SD to guess R. What you get is nothing more than a rough approximation, and you should not expect your guess of R to “hit the nail on the head.” Using the standard deviation and range presented in Excerpt 2.20 (and using a multiplier of 5 because the N is about 100), we see that our guess of R is never perfect for any of the four rows of numbers in the excerpt. However, each of our four guesses turns out to approximate well the actual range, and it helps us understand how much spread is in a data set if only the standard deviation is presented.
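The comparison just described can be reproduced with a short sketch. The N, SD, and low/high values are taken from the table in Excerpt 2.20; the variable and function names are hypothetical, and the actual range is computed here as the high score minus the low score.

```python
# (variable, N, SD, lowest score, highest score) from Excerpt 2.20
rows = [
    ("Age", 105, 8.5, 19, 61),
    ("Years lived in Victoria", 103, 13.4, 0, 55),
    ("Number of places slept last week", 105, 2.0, 1, 7),
    ("Years needle exchange client", 105, 5.3, 0, 19),
]

MULTIPLIER = 5  # N is about 100 in every row


def compare_guesses(rows, multiplier=MULTIPLIER):
    """Return (variable, guessed range, reported range) for each table row."""
    results = []
    for name, n, sd, low, high in rows:
        guess = multiplier * sd
        reported = high - low
        results.append((name, guess, reported))
    return results


results = compare_guesses(rows)
for name, guess, reported in results:
    print(f"{name}: guessed range {guess:.1f}, reported range {reported}")
```

The guesses come out to 42.5, 67.0, 10.0, and 26.5 against reported ranges of 42, 55, 6, and 19, so the guess is closest for age and overshoots somewhat in the other three rows.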

EXCERPT 2.21

• Using the Variance to Measure Dispersion

To explore possible gender differences in reading performance, we analysed data from [two samples]. Although the difference between the average [reading] scores of males and females in these two samples was very small, the variance of reading performance was significantly greater for males in both groups.

Source: Hawke, J. L., Olson, R. K., Willcut, E. G., Wadsworth, S. J., & DeFries, J. C. (2009). Gender ratios for reading difficulties. Dyslexia, 15(3), 239–242.
