CHAPTER 12 TESTING DIFFERENCES WITH ORDINAL DATA: MANN WHITNEY U


¹ One of the fundamental assumptions of both the t-test and the ANOVA is that the data in the overall population is expected to conform to the normal distribution. If this assumption is not valid, Mann Whitney U is the most appropriate statistic.


Previous chapters of this text have explained the procedures used to test hypotheses using interval data (t-tests and ANOVAs) and nominal data (Chi-Square). This chapter provides the last of the major difference tests commonly used by researchers. Mann Whitney U is the statistic most commonly used to compare rankings, or ordinal values, in two sample groups to determine whether they reflect a real difference in the larger populations they represent. In other words, it tests whether the ranks of individuals or cases in two separate groups are equal to each other. It is most frequently employed when the data available to the researcher are ordinal in nature. Class rank is a common example of this type of data. In some circumstances, a researcher may also wish to apply the Mann Whitney U test when interval data are present but are not expected to be normally distributed within the population.¹

The first step in carrying out a Mann Whitney U test is to make sure that ranks are properly assigned. When beginning with interval data and converting to an ordinal format, it is important to remember that the ranks must be assigned on an aggregate basis that includes all observations in both groups. The process of assigning ranks must begin with the lowest value observed in the data set. This value is assigned a rank of 1, the next lowest value a rank of 2, and so on. This process is illustrated in Figure 12:1 using a hypothetical research situation in which a researcher is to compare the ranked scores of students who were under severe stress to the scores of students who were not under severe stress.

FIGURE 12:1

NUMBER OF MINUTES REQUIRED

Test Score    Stress    Rank
44            2         1
50            2         2
68            2         3
70            2         4
72            2         5
74            1         6
75            2         7
76            2         8
78            1         9
79            1         10
81            2         11
82            1         12
83            2         13
87            1         14
88            2         15
90            1         16
91            1         17
92            1         19
92            1         19
92            2         19
93            1         21
94            2         22


In the case of a tie, an average rank is computed and assigned to all of the tied scores. This is demonstrated in the example given in Figure 12:1 with the three scores of 92. These values are tied for the 18th, 19th, and 20th positions in the distribution. In this case, an average rank is calculated by adding the ranks together and dividing by the number of scores involved in the tie: $(18 + 19 + 20) / 3 = 19$. Therefore, each of the scores would be assigned a rank of 19. The next score is assigned a rank of 21, and so on.
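The ranking procedure, including the treatment of ties, can be sketched in a few lines of Python. This is only an illustration: the function name `average_ranks` is hypothetical, and the score list is the set of 22 test scores from the running example.

```python
def average_ranks(values):
    """Rank values from lowest (rank 1) upward; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        # Sorted positions pos..end are 0-based, so the 1-based ranks are pos+1..end+1.
        mean_rank = (pos + 1 + end + 1) / 2
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


# All scores from both groups are ranked together, as the text requires.
scores = [44, 50, 68, 70, 72, 74, 75, 76, 78, 79, 81,
          82, 83, 87, 88, 90, 91, 92, 92, 92, 93, 94]
ranks = average_ranks(scores)
print(ranks[17:20])  # the three scores of 92 -> [19.0, 19.0, 19.0]
print(ranks[20])     # the next score (93)   -> 21.0
```

Averaging the first and last position of a tied run gives the same result as adding every tied position and dividing by the number of ties, because the positions are consecutive.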

The second step in the process of comparing ranks involves dividing the cases included in the data set into two groups using the grouping variable that has been selected. The group that includes the fewest cases is designated as group 1, while the group with the largest number of cases is designated as group 2. This data should be organized in a solution matrix with a separate section for each group containing the raw score and the assigned rank. Values for group 1 are labeled as X₁, with the ranks for group 1 labeled as R₁. For group 2, the values and ranks are designated as X₂ and R₂. Once the solution matrix has been created, the researcher should determine the number of cases included in each sample. These values are designated as n₁ and n₂. After the sample sizes have been determined, a sum of ranks should be calculated for each group by adding together the ranks of all scores included in group 1 and, separately, the ranks of all scores included in group 2. Figure 12:2 continues the earlier example and provides this solution matrix.


FIGURE 12:2

SOLUTION MATRIX MANN WHITNEY U

X₁    R₁    X₂    R₂
74    6     44    1
78    9     50    2
79    10    68    3
82    12    70    4
87    14    72    5
90    16    75    7
91    17    76    8
92    19    81    11
92    19    83    13
93    21    88    15
            92    19
            94    22

n₁ = 10     n₂ = 12
ΣR₁ = 143   ΣR₂ = 110
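The bookkeeping behind the solution matrix can be sketched in Python as follows. The variable names (`group1`, `sum_r1`, and so on) are illustrative and do not come from the text. The score and rank pairs are copied from the matrix above.

```python
# (score, rank) pairs for each group, copied from the solution matrix.
group1 = [(74, 6), (78, 9), (79, 10), (82, 12), (87, 14),
          (90, 16), (91, 17), (92, 19), (92, 19), (93, 21)]
group2 = [(44, 1), (50, 2), (68, 3), (70, 4), (72, 5), (75, 7),
          (76, 8), (81, 11), (83, 13), (88, 15), (92, 19), (94, 22)]

# Sample sizes and sums of ranks for the two groups.
n1, n2 = len(group1), len(group2)
sum_r1 = sum(rank for _, rank in group1)
sum_r2 = sum(rank for _, rank in group2)

# Sanity check: all ranks together must total N(N + 1) / 2 for N = n1 + n2.
n_total = n1 + n2
assert sum_r1 + sum_r2 == n_total * (n_total + 1) // 2

print(n1, sum_r1)  # 10 143
print(n2, sum_r2)  # 12 110
```

The total-of-ranks check is a quick way to catch a rank that was skipped or assigned twice, and it holds even when tied scores receive average ranks.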

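The third step in the process involves the application of two separate formulas representing statistics called U and U′ (U and U prime). In their standard rank-sum form, those formulas are as follows:

$$U = n_1 n_2 + \frac{n_1 (n_1 + 1)}{2} - \sum R_1$$

$$U' = n_1 n_2 + \frac{n_2 (n_2 + 1)}{2} - \sum R_2$$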

The values for n₁ and n₂ are simply the number of cases in each sample, and the sum of ranks for each group (ΣR₁ and ΣR₂) is calculated in the solution matrix. The process of calculating both U and U′ for the present example is demonstrated below.
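As a sketch of that calculation, assume the standard rank-sum formulas for U and U′. Adding the R₁ and R₂ columns of the solution matrix gives rank sums of 143 and 110, and the arithmetic then works out as follows. Which statistic is labeled U and which U′ is an assumption here, but the two results match the obtained values of 32 and 88 discussed in the text.

$$U = n_1 n_2 + \frac{n_1 (n_1 + 1)}{2} - \sum R_1 = (10)(12) + \frac{10(11)}{2} - 143 = 120 + 55 - 143 = 32$$

$$U' = n_1 n_2 + \frac{n_2 (n_2 + 1)}{2} - \sum R_2 = (10)(12) + \frac{12(13)}{2} - 110 = 120 + 78 - 110 = 88$$

As a check on the arithmetic, the two statistics should always add up to the product of the sample sizes: $U + U' = 32 + 88 = 120 = n_1 n_2$.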

Once values have been obtained for both U and U′, they are compared to the critical values found in Appendices I and J. To look up the appropriate critical values, find the number of cases included in the first (smaller) group along the top of the chart. Then find the number of cases included in the second (larger) group along the side of the chart.

Two critical values are listed at the point where this column and row intersect. For the example employed thus far, the critical values at the .05 level are 29 and 91. At the .01 level, the critical values are 21 and 99. In order for the difference in ranks to be considered significant, the lower of the two obtained values must fall below the lower critical value AND the higher of the two obtained values must fall above the higher critical value. In this case, the obtained value of 32 is higher than the lower critical value at both levels of significance (29 and 21), and the obtained value of 88 is below the higher critical value at both levels of significance (91 and 99). Therefore, the researcher cannot reject the null hypothesis at either level and must conclude that the observed difference in ranks was not large enough to be statistically significant. This suggests that there is a substantial probability that the differences that appear to exist are the result of sampling error rather than a true difference in the rankings within the overall populations under consideration.
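The full decision can be sketched in Python. This is a minimal illustration, not a substitute for the appendix tables. The function names are hypothetical, and the formulas are assumed to be the standard rank-sum forms. The strict inequalities follow the wording "fall below" and "fall above"; some published tables also treat a value equal to the critical value as significant.

```python
def mann_whitney_u(n1, n2, sum_r1, sum_r2):
    """Return (U, U') from the sample sizes and rank sums (standard rank-sum form, assumed)."""
    u = n1 * n2 + n1 * (n1 + 1) / 2 - sum_r1
    u_prime = n1 * n2 + n2 * (n2 + 1) / 2 - sum_r2
    return u, u_prime


def is_significant(u, u_prime, lower_critical, upper_critical):
    """Decision rule from the text, using strict inequalities (an assumption)."""
    smaller, larger = min(u, u_prime), max(u, u_prime)
    return smaller < lower_critical and larger > upper_critical


# Sample sizes and rank sums obtained by adding the columns of the solution matrix.
u, u_prime = mann_whitney_u(n1=10, n2=12, sum_r1=143, sum_r2=110)
print(u, u_prime)                          # 32.0 88.0

# Critical values quoted in the text for groups of 10 and 12 cases.
print(is_significant(u, u_prime, 29, 91))  # False at the .05 level
print(is_significant(u, u_prime, 21, 99))  # False at the .01 level
```

Both checks return False, which agrees with the conclusion that the difference in ranks is not statistically significant at either level.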


EXERCISES - CHAPTER 12

(1) Math SAT scores are recorded for two independent random samples selected from populations at two military academies. The data are reported in the table below. Based on how the students rank, do the academies have students of equal aptitude in mathematics? Draw statistical and research conclusions.

SAT    Academy
510    Army
540    Army
600    Navy
650    Navy
650    Army
670    Army
690    Army
695    Navy
700    Army
700    Navy
705    Army
710    Army
725    Army
730    Army
740    Army
750    Navy
755    Navy
760    Navy
765    Navy
775    Navy


(2) Two samples of nurses were selected from two state hospitals and evaluated by a research team. The team developed a composite ranking scheme based on a series of variables. The scores for the two samples are provided in the table below. Based on the rankings, do the hospitals have nurses who are equally proficient? Draw statistical and research conclusions.

Scores    Hospital
32        a
40        a
51        b
52        b
60        b
60        a
62        a
65        a
74        a
75        a
78        b
80        a
84        a
86        b
86        a
87        b
90        a
90        b
91        b
95        b
96        b
97        b
98        b
100       a
102       b
104       a


(3) The Highway Safety Administration conducted a major study of several interstate highways before and after speed limits were raised in a sample of states. These data are listed in the table below. Use Mann Whitney U to compare the rankings of the states in the category of fatal accidents. Draw statistical and research conclusions.

Fatal Accidents Per Year    Speed Limit (1=65, 2=75)
21                          1
38                          1
40                          1
42                          1
49                          2
57                          2
68                          1
69                          2
70                          2
70                          1
72                          1
83                          2
124                         2
212                         2


(4) The Federal Aviation Administration selected a sample of airports from two areas of the United States to determine the possible effect of weather on flight cancellations in areas with different weather patterns. Some of these data are presented in the table below. Compare the ranks of airports in each of these regions. Draw statistical and research conclusions.

Weather Related Cancellations    Region
211                              1
281                              1
296                              1
341                              1
375                              1
423                              1
522                              2
525                              1
618                              1
729                              1
797                              2
891                              2
929                              2
1018                             2
1221                             2