Standard deviation
The standard deviation has mathematical properties which make it, like the mean, much more useful in higher-order statistics. The procedure for standard devia-tion involves summing squared deviadevia-tions (producing a value variously called the sum of squared deviations, sum of squares, and statistically, ΣΣΣΣΣx2, which is a fundamental component of many statistical procedures) in order to eliminate negative values. The pathway to standard deviation moves from deviations to the sum of squares to variance to standard deviation.
We’ll look at two ways to compute sum of squares. The first, called the deviation method, clearly illustrates what standard deviation means. The second, called the raw score method, is easier to use. Both procedures result in the same value for sum of squares.
Deviation Method
Compute deviations of all scores from the mean. Square all deviations (x2) and sum them (Σx2) as follows
score - mean = deviation squared
10 - 30 -20 400
20 - 30 -10 100
30 - 30 0 0
40 - 30 10 100
50 - 30 20 400
Σx=0 Σ Σ Σ Σ Σx2= 1000
Large groups will have a larger sum of squares than small groups, simply because there are more deviations in a large group. Dividing by N eliminates size of group from the result. This gives a truer picture of spreadedness in a group of num-bers no matter how many are in the group. Divide the sum of squares by N in order to factor out the variable of group size. The resulting value is called the variance of the scores, and is symbolized by the lower case Greek letter sigma (σσσσσ).
Variance (σ(σ(σ(σ(σ2) = ΣΣΣΣΣx2 / N = 1000/5 = 200.0
Since we squared deviations before adding them, variance measures variability in squared units. It would be better if score variability were in the same unit of measure as the scores themselves. We can "undo" the squaring by taking the square root (/////)1 of the variance, like this:
Standard Deviation (σσσσσ)= /////σσσσσ2 = /////200 = 14.14
The number 14.14 represents the standardized measure of variability for our example. This number represents, in the same unit of measure as our scores, the degree of spread-out-ness of the scores from the mean. The larger the number, the greater the spread. It is useful in comparing the variabilities in different groups of scores, but will become more meaningful in future statistical procedures.
This deviation method shows you exactly what a "standard deviation" is, and is fine to use when you have a few scores and a whole number mean. But if you have a large data set, and the mean is a fraction, like 73.031, computing individual deviation scores, squaring them, and then summing them can be painfully tedious. A simpler way to compute the sum of squares -- and get the very same result -- is to use the raw score formula.
Raw Score Method
The raw score method uses the squares of each raw score (rather than squares of deviation scores) to produce the sum of squares. The raw score formula for sum of squares is:
ΣΣΣΣΣx2 = ΣΣΣΣΣX2 - (ΣΣΣΣΣX)2/N
where ΣΣΣΣΣX2 refers to the sum of squared-raw-scores (square all the scores and sum them) and (ΣΣΣΣΣX)2 refers to the sum-of-all-scores squared (sum the scores and then square the sum). Let's apply this formula to the same data that we used under the deviation method. We should get the same answer: Σx2 = 1000.
1The "square root" symbol actually looks like this: ox but is difficult to produce within the text. So I am using the simpler (/x) symbol. Later, with more complicated formulas, I will use graphical characters to indicate square root.
Warning:
There is a great difference between ΣΣΣΣΣX2 and ΣΣΣΣΣx2. Do not confuse the two!
Data Reduction: Central Tendency and Variation Chapter 16
As you can see, both methods give sum of squares values of 1000. The raw score method is easier to do and less prone to arithmetic errors.
Equal Means, Unequal Standard Deviations
Let’s say I have two groups of scores. The first group consists of scores 1, 3, 5, 9, 11 and 13. We’ll use the letter “X” to refer to them. The second set of scores are 5, 6, 8 and 9. We’ll use the letter “Y” to refer to them. I’ve put them on a scale below like this:
Notice that the means of the two groups are equal. But the degree of scatter (variability) among the scores is not. Let’s compute the standard deviations of both groups to compare them. Which group should have the larger standard deviation?2
Using the deviation method, we calculate the sum of squares of X as follows:
i Xi xi xi2
1 1 -6 36
2 3 -4 16
3 5 -2 4
4 9 2 4
5 11 4 16
6 13 6 36
N = 6 ΣΣΣΣΣX = 42 ΣΣΣΣΣx = 0 ΣΣΣΣΣx2 = 112 The variance of group X equals Σx2/N = 112/6 = 18.66.
The standard deviation is the square root of variance, or /18.66 = 4.32.
Using the raw score method, we calculate the sum of squares for Group X as follows:
i Xi Xi2
1 1 1
2 3 9
3 5 25
4 9 81
5 11 121
6 13 169
n = 6 ΣΣΣΣΣX = 42 ΣΣΣΣΣX2 = 406
ΣΣΣΣΣx2 = ΣX2 - (ΣX)2/N = 406 - (42)2/6 = 406 - 294 = 112 We get the same result, 112, with either method.
2 Did you say the X's? Good. You can see from the graph that the X's are spread out more than the Y's (another way of saying this is that the range of X is greater than the range of Y). We would expect the X's
Now let’s compute variance and standard deviation for Group Y, which should produce a smaller sum of squares, variance and standard deviation than Group X did.
Here’s the deviation method:
i Yi yi yi2
1 5 -2 4
2 6 -1 1
3 8 1 1
4 9 2 4
N = 4 ΣΣΣΣΣY = 28 ΣΣΣΣΣy = 0 ΣΣΣΣΣy2 = 10 The variance of group Y equals Σy2/N = 10/4 = 2.5.
The standard deviation is the square root of variance, or /2.5 = 1.58.
Using the raw score method, we calculate the sum of squares for Group 2 as follows:
i Yi Yi2
1 1 1
2 6 36
3 8 64
4 9 81
n = 4 ΣΣΣΣΣY = 28 ΣΣΣΣΣY2 = 206
ΣΣΣΣΣy2 = ΣY2 - (ΣY)2/N = 206 - (28)2/4 = 206 - 196 = 10
Again, we get the same result, 10, with either method. Since the sum of squares equals 10, variance equals 2.5 and standard deviation 1.58, as calculated above. The deviation method illustrates the meaning of standard deviation, the raw score method gives the same result more simply.
We have computed the standard deviation for both groups of scores. The groups have identical means, but different “spreads.” We expected that the scores of Group X would have the larger standard deviation than Group Y because of its larger spread of the scores. Calculations demonstrated a standard deviation of 4.32 in Group X and of 1.58 in Group Y, confirming our expectation.