Standard deviation

The standard deviation has mathematical properties which make it, like the mean, much more useful in higher-order statistics. The procedure for standard devia-tion involves summing squared deviadevia-tions (producing a value variously called the sum of squared deviations, sum of squares, and statistically, ΣΣΣΣΣx², which is a fundamental component of many statistical procedures) in order to eliminate negative values. The pathway to standard deviation moves from deviations to the sum of squares to variance to standard deviation.

We’ll look at two ways to compute sum of squares. The first, called the deviation method, clearly illustrates what standard deviation means. The second, called the raw score method, is easier to use. Both procedures result in the same value for sum of squares.

Deviation Method

Compute deviations of all scores from the mean. Square all deviations (x²) and sum them (Σx²) as follows

score - mean = deviation squared

10 - 30 -20 400

20 - 30 -10 100

30 - 30 0 0

40 - 30 10 100

50 - 30 20 400

Σx=0 Σ Σ Σ Σ Σx²= 1000

Large groups will have a larger sum of squares than small groups, simply because there are more deviations in a large group. Dividing by N eliminates size of group from the result. This gives a truer picture of spreadedness in a group of num-bers no matter how many are in the group. Divide the sum of squares by N in order to factor out the variable of group size. The resulting value is called the variance of the scores, and is symbolized by the lower case Greek letter sigma (σσσσσ).

Variance (σ(σ(σ(σ(σ²) = ΣΣΣΣΣ^x² / N = 1000/5 = 200.0

Since we squared deviations before adding them, variance measures variability in squared units. It would be better if score variability were in the same unit of measure as the scores themselves. We can "undo" the squaring by taking the square root (/////)¹ of the variance, like this:

Standard Deviation (σσσσσ)= /////σσσσσ²= /////200 = 14.14

The number 14.14 represents the standardized measure of variability for our example. This number represents, in the same unit of measure as our scores, the degree of spread-out-ness of the scores from the mean. The larger the number, the greater the spread. It is useful in comparing the variabilities in different groups of scores, but will become more meaningful in future statistical procedures.

This deviation method shows you exactly what a "standard deviation" is, and is fine to use when you have a few scores and a whole number mean. But if you have a large data set, and the mean is a fraction, like 73.031, computing individual deviation scores, squaring them, and then summing them can be painfully tedious. A simpler way to compute the sum of squares -- and get the very same result -- is to use the raw score formula.

Raw Score Method

The raw score method uses the squares of each raw score (rather than squares of deviation scores) to produce the sum of squares. The raw score formula for sum of squares is:

ΣΣΣΣΣx^{2 =}ΣΣΣΣΣX²- (ΣΣΣΣΣX)²/N

where ΣΣΣΣΣX²refers to the sum of squared-raw-scores (square all the scores and sum them) and (ΣΣΣΣΣX)² refers to the sum-of-all-scores squared (sum the scores and then square the sum). Let's apply this formula to the same data that we used under the deviation method. We should get the same answer: Σx² = 1000.

1The "square root" symbol actually looks like this: ox but is difficult to produce within the text. So I am using the simpler (/x) symbol. Later, with more complicated formulas, I will use graphical characters to indicate square root.

Warning:

There is a great difference between ΣΣΣΣΣX² and ΣΣΣΣΣx². Do not confuse the two!

Data Reduction: Central Tendency and Variation Chapter 16

As you can see, both methods give sum of squares values of 1000. The raw score method is easier to do and less prone to arithmetic errors.

Equal Means, Unequal Standard Deviations

Let’s say I have two groups of scores. The first group consists of scores 1, 3, 5, 9, 11 and 13. We’ll use the letter “X” to refer to them. The second set of scores are 5, 6, 8 and 9. We’ll use the letter “Y” to refer to them. I’ve put them on a scale below like this:

Notice that the means of the two groups are equal. But the degree of scatter (variability) among the scores is not. Let’s compute the standard deviations of both groups to compare them. Which group should have the larger standard deviation?²

Using the deviation method, we calculate the sum of squares of X as follows:

i X_i x_i x_i²

1 1 -6 36

2 3 -4 16

3 5 -2 4

4 9 2 4

5 11 4 16

6 13 6 36

N = 6 ΣΣΣΣΣX = 42 ΣΣΣΣΣx = 0 ΣΣΣΣΣx² = 112 The variance of group X equals Σx²/N = 112/6 = 18.66.

The standard deviation is the square root of variance, or /18.66 = 4.32.

Using the raw score method, we calculate the sum of squares for Group X as follows:

i X_i X_i²

1 1 1

2 3 9

3 5 25

4 9 81

5 11 121

6 13 169

n = 6 ΣΣΣΣΣX = 42 ΣΣΣΣΣX²= 406

ΣΣΣΣΣx² = ΣX²- (ΣX)²/N = 406 - (42)²/6 = 406 - 294 = 112 We get the same result, 112, with either method.

2 Did you say the X's? Good. You can see from the graph that the X's are spread out more than the Y's (another way of saying this is that the range of X is greater than the range of Y). We would expect the X's

Now let’s compute variance and standard deviation for Group Y, which should produce a smaller sum of squares, variance and standard deviation than Group X did.

Here’s the deviation method:

i Y_i y_i y_i²

1 5 -2 4

2 6 -1 1

3 8 1 1

4 9 2 4

N = 4 ΣΣΣΣΣY = 28 ΣΣΣΣΣy = 0 ΣΣΣΣΣy² = 10 The variance of group Y equals Σy²/N = 10/4 = 2.5.

The standard deviation is the square root of variance, or /2.5 = 1.58.

Using the raw score method, we calculate the sum of squares for Group 2 as follows:

i Y_i Y_i²

1 1 1

2 6 36

3 8 64

4 9 81

n = 4 ΣΣΣΣΣY = 28 ΣΣΣΣΣY²= 206

ΣΣΣΣΣy² = ΣY²- (ΣY)²/N = 206 - (28)²/4 = 206 - 196 = 10

Again, we get the same result, 10, with either method. Since the sum of squares equals 10, variance equals 2.5 and standard deviation 1.58, as calculated above. The deviation method illustrates the meaning of standard deviation, the raw score method gives the same result more simply.

We have computed the standard deviation for both groups of scores. The groups have identical means, but different “spreads.” We expected that the scores of Group X would have the larger standard deviation than Group Y because of its larger spread of the scores. Calculations demonstrated a standard deviation of 4.32 in Group X and of 1.58 in Group Y, confirming our expectation.

In document Research Design and Statistical Analysis - 4th Ed 2006 (Page 182-185)