Variability is an indication of how scores in a distribution are scattered or dispersed.
As Figure 3–4 illustrates, two or more distributions of test scores can have the same mean even though differences in the dispersion of scores around the mean can be wide.
In both distributions A and B, test scores could range from 0 to 100. In distribution A, we see that the mean score was 50 and the remaining scores were widely distributed around the mean. In distribution B, the mean was also 50 but few people scored higher than 60 or lower than 40.
Statistics that describe the amount of variation in a distribution are referred to as measures of variability. Some measures of variability include the range, the interquartile range, the semi-interquartile range, the average deviation, the standard deviation, and the variance.
The range The range of a distribution is equal to the difference between the highest and the lowest scores. We could describe distribution B of Figure 3–4, for example, as having a range of 20 if we knew that the highest score in this distribution was 60 and the lowest score was 40 (60 − 40 = 20).
With respect to distribution A, if we knew that the lowest score was 0 and the highest score was 100, the range would be equal to 100 − 0, or 100. The range is the simplest measure of variability to calculate, but its potential use is limited. Because the range is based entirely on the values of the lowest and highest scores, one extreme score (if it happens to be the lowest or the highest) can radically alter the value of the range. For example, suppose distribution B included a score of 90. The range of this distribution would now be equal to 90 − 40, or 50. Yet, in looking at the data in the graph for distribution B, it is clear that the vast majority of scores tend to be between 40 and 60.
As a descriptive statistic of variation, the range provides a quick but gross description of the spread of scores. When its value is based on extreme scores in a distribution, the resulting description of variation may be understated or overstated.
Better measures of variation include the interquartile range and the semi-interquartile range.
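The sensitivity of the range to a single extreme score can be checked numerically. What follows is a minimal Python sketch, not part of the original text. The function name `score_range` and the score list are hypothetical; the list was chosen only so that its lowest and highest values (40 and 60) match those described for distribution B.

```python
def score_range(scores):
    """Return the range: the highest score minus the lowest score."""
    return max(scores) - min(scores)

# Hypothetical scores resembling distribution B (all between 40 and 60).
distribution_b = [40, 45, 48, 50, 50, 52, 55, 60]
print(score_range(distribution_b))   # 60 - 40

# One extreme score of 90 changes the range, although the bulk of the
# scores still lies between 40 and 60.
with_extreme = distribution_b + [90]
print(score_range(with_extreme))     # 90 - 40
```

Only the two end points enter the calculation, which is why the range can misrepresent how tightly most of the scores cluster.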
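Figure 3–4
Two Distributions with Differences in Variability
(Two panels, Distribution A and Distribution B, each plotting frequency against test score on a scale from 0 to 100, with the mean of 50 marked. The panel for Distribution B also marks the scores 40 and 60.)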
JUST THINK . . .
Devise two distributions of test scores to illustrate how the range can overstate or understate the degree of variability in the scores.
Cohen−Swerdlik:
Psychological Testing and Assessment: An Introduction to Tests and Measurement, Seventh Edition
II. The Science of Psychological Measurement
3. A Statistics Refresher
100 © The McGraw−Hill
Companies, 2010
The interquartile and semi-interquartile ranges A distribution of test scores (or any other data, for that matter) can be divided into four parts such that 25% of the test scores occur in each quarter. As illustrated in Figure 3–5, the dividing points between the four quarters in the distribution are the quartiles. There are three of them, respectively labeled Q1, Q2, and Q3. Note that quartile refers to a specific point whereas quarter refers to an interval. An individual score may, for example, fall at the third quartile or in the third quarter (but not “in” the third quartile or “at” the third quarter). It should come as no surprise to you that Q2 and the median are exactly the same. And just as the median is the midpoint in a distribution of scores, so are quartiles Q1 and Q3 the quarter-points in a distribution of scores. Formulas may be employed to determine the exact value of these points.
The interquartile range is a measure of variability equal to the difference between Q3 and Q1. Like the median, it is an ordinal statistic. A related measure of variability is the semi-interquartile range, which is equal to the interquartile range divided by 2. Knowledge of the relative distances of Q1 and Q3 from Q2 (the median) provides the seasoned test interpreter with immediate information as to the shape of the distribution of scores. In a perfectly symmetrical distribution, Q1 and Q3 will be exactly the same distance from the median. If these distances are unequal then there is a lack of symmetry.
This lack of symmetry is referred to as skewness, and we will have more to say about that shortly.
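The quartiles, the interquartile range, and the semi-interquartile range can be illustrated with a short Python sketch. The text does not say which formula it uses to locate the quartiles, and several conventions exist; the "inclusive" method of the standard library's `statistics.quantiles` is an assumption made here for illustration. The function name `quartile_summary` and the nine scores are hypothetical.

```python
import statistics

def quartile_summary(scores):
    """Return Q1, Q2, Q3, the interquartile range, and half of it."""
    # quantiles(..., n=4) returns the three cut points between quarters.
    q1, q2, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    iqr = q3 - q1
    return {"Q1": q1, "Q2": q2, "Q3": q3, "IQR": iqr, "semi_IQR": iqr / 2}

# Hypothetical, evenly spaced test scores.
scores = [55, 60, 65, 70, 75, 80, 85, 90, 95]
summary = quartile_summary(scores)
print(summary)

# Q2 coincides with the median, as the text notes.
print(summary["Q2"] == statistics.median(scores))
```

Because these scores are evenly spaced, Q1 and Q3 fall the same distance from the median, which is the pattern the text associates with a symmetrical distribution.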
The average deviation Another tool that could be used to describe the amount of variability in a distribution is the average deviation, or AD for short. Its formula is

$$\mathrm{AD} = \frac{\sum |x|}{n}$$
Figure 3–5
A Quartered Distribution
(Frequency plotted against test scores. The first quartile score (Q1), the second quartile or median score (Q2), and the third quartile score (Q3) divide the distribution into the first, second, third, and fourth quarters.)
The lowercase italic x in the formula signifies a score’s deviation from the mean. The value of x is obtained by subtracting the mean from the score (X − mean = x). The bars on each side of x indicate that it is the absolute value of the deviation score (ignoring the positive or negative sign and treating all deviation scores as positive). All the deviation scores are then summed and divided by the total number of scores (n) to arrive at the average deviation. As an exercise, calculate the average deviation for the following distribution of test scores:
85 100 90 95 80

Begin by calculating the arithmetic mean. Next, obtain the absolute value of each of the five deviation scores and sum them. As you sum them, note what would happen if you did not ignore the plus or minus signs: All the deviation scores would then sum to 0. Divide the sum of the deviation scores by the number of measurements (5). Did you obtain an AD of 6? The AD tells us that the five scores in this distribution varied, on average, 6 points from the mean.
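The exercise above can be checked with a few lines of Python. This is a minimal sketch of the AD formula as the text describes it; the function name `average_deviation` is an illustrative choice, not something taken from the text.

```python
def average_deviation(scores):
    """Mean of the absolute deviations of the scores from their mean."""
    mean = sum(scores) / len(scores)
    return sum(abs(score - mean) for score in scores) / len(scores)

scores = [85, 100, 90, 95, 80]
mean = sum(scores) / len(scores)

# With the signs kept, the deviation scores cancel one another out.
signed_sum = sum(score - mean for score in scores)
print(signed_sum)

# With the signs ignored, the result is the AD asked for in the exercise.
print(average_deviation(scores))
```

The first printed value shows why the signs must be dropped: without absolute values, the deviations always sum to zero and carry no information about spread.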
The average deviation is rarely used. Perhaps this is so because the deletion of algebraic signs renders it a useless measure for purposes of any further operations. Why, then, discuss it here? The reason is that a clear understanding of what an average deviation measures provides a solid foundation for understanding the conceptual basis of another, more widely used measure: the standard deviation. Keeping in mind what an average deviation is, what it tells us, and how it is derived, let’s consider its more frequently used “cousin,” the standard deviation.
The standard deviation Recall that, when we calculated the average deviation, the problem of the sum of all deviation scores around the mean equaling zero was solved by employing only the absolute value of the deviation scores. In calculating the standard deviation, the same problem must be dealt with, but we do so in a different way. Instead of using the absolute value of each deviation score, we use the square of each deviation score.
With each score squared, the sign of any negative deviation becomes positive. Because all the deviation scores are squared, we know that our calculations won’t be complete until we go back and obtain the square root of whatever value we reach.
We may define the standard deviation as a measure of variability equal to the square root of the average squared deviations about the mean. More succinctly, it is equal to the square root of the variance. The variance is equal to the arithmetic mean of the squares of the differences between the scores in a distribution and their mean. The formula used to calculate the variance (s²) using deviation scores is

$$s^2 = \frac{\sum x^2}{n}$$
Simply stated, the variance is calculated by squaring and summing all the deviation scores and then dividing by the total number of scores. The variance can also be calculated in other ways. For example: From raw scores, first calculate the summation of the raw scores squared, divide by the number of scores, and then subtract the mean squared. The result is

$$s^2 = \frac{\sum X^2}{n} - \bar{X}^2$$
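That the deviation-score formula and the raw-score formula give the same variance can be confirmed numerically. The Python sketch below reuses the five scores from the average deviation exercise; the function names are hypothetical, and both functions divide by n, as the formulas above do.

```python
import math

def variance_from_deviations(scores):
    """Sum of squared deviation scores divided by n."""
    n = len(scores)
    mean = sum(scores) / n
    return sum((score - mean) ** 2 for score in scores) / n

def variance_from_raw_scores(scores):
    """Mean of the squared raw scores minus the squared mean."""
    n = len(scores)
    mean = sum(scores) / n
    return sum(score ** 2 for score in scores) / n - mean ** 2

scores = [85, 100, 90, 95, 80]
var_dev = variance_from_deviations(scores)
var_raw = variance_from_raw_scores(scores)
print(var_dev, var_raw)

# The standard deviation is the square root of the variance.
print(math.sqrt(var_dev))
```

With large raw scores the second formula subtracts two large numbers, so in floating-point arithmetic the two results may differ by a tiny rounding error; comparing them with `math.isclose` is safer than testing for exact equality.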
JUST THINK . . .
After reading about the standard deviation, explain in your own words how an understanding of the average deviation can provide a “stepping-stone” to better understanding the concept of a standard deviation.
The variance is a widely used measure in psychological research. To make meaningful interpretations, the test-score distribution should be approximately normal. We’ll have more to say about “normal” distributions later in the chapter. At this point, think of a normal distribution as a distribution with the greatest frequency of scores occurring near the arithmetic mean. Correspondingly fewer and fewer scores relative to the mean occur on both sides of it.
For some hands-on experience with, and to develop a sense of mastery of, the concepts of variance and standard deviation, why not allot the next 10 or 15 minutes to calculating the standard deviation for the test scores shown in Table 3–1? Use both formulas to verify that they produce the same results. Using deviation scores, your calculations should look similar to these:

$$s = \sqrt{\frac{\sum x^2}{n}}$$

Using the raw-scores formula, your calculations should look similar to these:

$$s = \sqrt{\frac{\sum X^2}{n} - \bar{X}^2}$$

One standard deviation unit is approximately equal to 14 units of measurement or (with reference to our example and rounded to a whole number) to 14 test-score points. The test data did not provide a good normal curve approximation. Test professionals would describe these data as “positively skewed.” Skewness, as well as related terms such as negatively skewed and positively skewed, is covered in the next section. Once you are “positively familiar” with terms like positively skewed, you’ll appreciate all the more the section later in this chapter entitled “The Area under the Normal Curve.” There you will find a wealth of information about test-score interpretation in the case when the scores are not skewed, that is, when the test scores are approximately normal in distribution.
The symbol for standard deviation has variously been represented as s, S, SD, and the lowercase Greek letter sigma (σ). One custom (the one we adhere to) has it that s refers to the sample standard deviation and σ refers to the population standard deviation. The number of observations in the sample is n, and the denominator n − 1 is sometimes used to calculate what is referred to as an “unbiased estimate” of the population value (though it’s actually only less biased; see Hopkins & Glass, 1978). Unless n is 10 or less, the use of n or n − 1 tends not to make a meaningful difference.
Whether the denominator is more properly n or n − 1 has been a matter of debate. Lindgren (1983) has argued for the use of n − 1, in part because this denominator tends to make correlation formulas simpler. By contrast, most texts recommend the use of n − 1 only when the data constitute a sample; when the data constitute a population, n is preferable. For Lindgren (1983), it matters not whether the data are from a sample or a population. Perhaps the most reasonable convention is to use n either when the entire population has been assessed or when no inferences to the population are intended. So, when considering the examination scores of one class of students—including all the people about whom we’re going to make inferences—it seems appropriate to use n.
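How much the choice of denominator matters can be seen by computing the standard deviation both ways. The sketch below is illustrative only: it relies on Python's standard library, where `statistics.pstdev` divides by n and `statistics.stdev` divides by n − 1, and it reuses the five scores from the average deviation exercise. The helper `denominator_ratio` is a hypothetical name introduced here.

```python
import math
import statistics

def denominator_ratio(n):
    """Ratio of the (n - 1)-based to the n-based standard deviation."""
    return math.sqrt(n / (n - 1))

scores = [85, 100, 90, 95, 80]
sd_n = statistics.pstdev(scores)          # denominator n
sd_n_minus_1 = statistics.stdev(scores)   # denominator n - 1
print(sd_n, sd_n_minus_1)

# The ratio depends only on n and approaches 1 as n grows.
for n in (5, 10, 30, 100):
    print(n, denominator_ratio(n))
```

For any data set the two results differ by the factor √(n / (n − 1)), which moves toward 1 as n increases. That is consistent with the remark above that the choice matters little unless the number of observations is small.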
Having cleared the air (we hope) with regard to the n versus n − 1 controversy, our formula for the population standard deviation follows. In this formula, X̄ represents a sample mean and M a population mean:

$$\sigma = \sqrt{\frac{\sum (X - M)^2}{n}}$$
The standard deviation is a very useful measure of variation because each individual score’s distance from the mean of the distribution is factored into its computation. You will come across this measure of variation frequently in the study and practice of measurement in psychology.
Skewness
Distributions can be characterized by their skewness, or the nature and extent to which symmetry is absent. Skewness is an indication of how the measurements in a distribution are distributed. A distribution has a positive skew when relatively few of the scores fall at the high end of the distribution. Positively skewed examination results may indicate that the test was too difficult. More items that were easier would have been desirable in order to better discriminate at the lower end of the distribution of test scores. A distribution has a negative skew when relatively few of the scores fall at the low end of the distribution. Negatively skewed examination results may indicate that the test was too easy. In this case, more items of a higher level of difficulty would make it possible to better discriminate between scores at the upper end of the distribution.
(Refer to Figure 3–3 for graphic examples of skewed distributions.)
The term skewed carries with it negative implications for many students. We suspect that skewed is associated with abnormal, perhaps because the skewed distribution deviates from the symmetrical or so-called normal distribution. However, the presence or absence of symmetry in a distribution (skewness) is simply one characteristic by which a distribution can be described. Consider in this context a hypothetical Marine Corps Ability and Endurance Screening Test administered to all civilians seeking to enlist in the U.S. Marines. Now look again at the graphs in Figure 3–3. Which graph do you think would best describe the resulting distribution of test scores? (No peeking at the next paragraph before you respond.)
No one can say with certainty, but if we had to guess then we would say that the Marine Corps Ability and Endurance Screening Test data would look like graph C, the positively skewed distribution in Figure 3–3. We say this assuming that a level of difficulty would have been built into the test to ensure that relatively few assessees would score at the high end of the distribution. Most of the applicants would probably score at the low end of the distribution. All of this is quite consistent with the advertised objective of the Marines, who are only looking for a few good men. You know: the few, the proud. Now, a question regarding this positively skewed distribution: Is the skewness a good thing? A bad thing? An abnormal thing? In truth, it is probably none of these things; it just is. By the way, while they may not advertise it as much, the Marines are also looking for (an unknown quantity of) good women. But here we are straying a bit too far from skewness.

Figure 3–6
The Kurtosis of Curves
(Mesokurtic, leptokurtic, and platykurtic curves plotted against z scores from −3 to +3.)
Various formulas exist for measuring skewness. One way of gauging the skewness of a distribution is through examination of the relative distances of quartiles from the median. In a positively skewed distribution, Q3 − Q2 will be greater than the distance of Q2 − Q1. In a negatively skewed distribution, Q3 − Q2 will be less than the distance of Q2 − Q1. In a distribution that is symmetrical, the distances from Q1 and Q3 to the median are the same.
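The quartile comparison just described can be expressed as a short Python sketch. It is offered as an illustration under stated assumptions: the function name `quartile_skew_direction` is hypothetical, the three score lists are invented, and the quartiles are located with the "inclusive" method of `statistics.quantiles`, which is only one of several conventions and is not specified by the text.

```python
import statistics

def quartile_skew_direction(scores):
    """Compare Q3 - Q2 with Q2 - Q1 to gauge the direction of skew."""
    q1, q2, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    upper = q3 - q2   # distance from the median up to Q3
    lower = q2 - q1   # distance from Q1 up to the median
    if upper > lower:
        return "positive"
    if upper < lower:
        return "negative"
    return "symmetrical"

# Hypothetical data: most scores low, with a few high scores.
hard_test = [1, 2, 2, 3, 3, 3, 4, 5, 8, 12, 20]
# Hypothetical data: most scores high, with a few low scores.
easy_test = [0, 8, 12, 15, 16, 17, 17, 17, 18, 18, 19]
# Hypothetical data: evenly spaced scores.
even_test = [55, 60, 65, 70, 75, 80, 85, 90, 95]

for data in (hard_test, easy_test, even_test):
    print(quartile_skew_direction(data))
```

With real data the two distances will rarely be exactly equal, so in practice some tolerance would be needed before a distribution is called symmetrical; the exact comparison is kept here only to mirror the wording of the text.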
Kurtosis
The term testing professionals use to refer to the steepness of a distribution in its center is kurtosis. To the root kurtic is added one of the prefixes platy-, lepto-, or meso- to describe the peakedness/flatness of three general types of curves (Figure 3–6). Distributions are generally described as platykurtic (relatively flat), leptokurtic (relatively peaked), or, somewhere in the middle, mesokurtic. Many methods exist for measuring kurtosis. Some computer programs feature an index of skewness that ranges from −3 to +3. In many ways, however, technical matters related to the measurement and interpretation of kurtosis are controversial among measurement specialists. So given that this can quickly become an advanced-level topic and that this book is of a more introductory nature, let’s move on. It’s time to focus on a type of distribution that happens to be the standard against which all other distributions (including all of the kurtic ones) are compared: the normal distribution.
JUST THINK . . .
Like skewness, reference to the kurtosis of a distribution can provide a kind of “shorthand” description of a distribution of test scores. Imagine and describe the kind of test that might yield a distribution of scores that form a platykurtic curve.