The Chi-Square Test
STAT E-50 Introduction to Statistics
The Chi-square test is a nonparametric test used to compare experimental results with theoretical models: we compare observed frequencies with expected frequencies. In a hypothesis test, the expected frequencies are those we would expect if the null hypothesis of our test is true.
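The test statistic is
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
where O represents the observed frequency and E represents the expected frequency, and the sum is taken over all categories (cells). The value of df (the degrees of freedom) depends on the type of test you are performing.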
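A minimal Python sketch of this formula, assuming the observed and expected frequencies are supplied as two lists in the same category order. The function name `chi_square_statistic` is hypothetical (not a library API); the numbers are the accident counts from the goodness-of-fit example later in this section.

```python
def chi_square_statistic(observed, expected):
    """Return the sum of (O - E)**2 / E over matching categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e <= 0 for e in expected):
        raise ValueError("expected frequencies must be positive")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Accident example: observed counts and the counts expected under H0
observed = [68, 92, 140]
expected = [48, 84, 168]
print(round(chi_square_statistic(observed, expected), 2))  # 13.76
```

A perfect fit (every O equal to its E) gives χ² = 0; the statistic grows as the observed counts move away from the expected ones.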
The Chi-Square Distribution
The χ² distribution is
• nonnegative
• not symmetrical; it is skewed to the right
• a family of distributions, with a separate distribution for each number of degrees of freedom.
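To illustrate the family idea numerically, the sketch below computes the right-tail area P(χ² ≥ x) for whole-number df using only the Python standard library. It relies on the standard closed forms for integer degrees of freedom (a finite sum for even df, `math.erfc` plus correction terms for odd df). `chi2_sf` is a name chosen here for illustration, and non-integer df are deliberately not handled.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Right-tail area P(X >= x) for a chi-square variable with integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0  # chi-square values are nonnegative, so all area lies at or above 0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum of (x/2)**j / j! for j = 0 .. df/2 - 1
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df: df = 1 gives erfc(sqrt(x/2)); each step df -> df + 2 adds one term
    tail = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    k = 1
    while k < df:
        tail += term
        k += 2
        term *= x / k
    return tail


# One member of the family per df: the same x = 6 leaves a different right-tail area
for df in (1, 2, 3, 4):
    print(df, round(chi2_sf(6.0, df), 4))  # the area grows as df grows
```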
The Chi-Square Test for Goodness of Fit
The goodness-of-fit test compares the distribution of observed outcomes for a single categorical variable to the expected outcomes predicted by a probability model. This test involves one sample and one variable.
Assumptions and Conditions:
• Counted data condition: be sure that the data are counts, or frequencies.
• Independence assumption: check the Randomization condition.
• Sample size assumption: check the Expected cell frequency condition; each expected frequency should be at least 5.
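The Chi-square test is one-sided
Large values of χ² indicate that the observed frequencies are far from the expected frequencies, so we reject H0 only when χ² exceeds the critical value χ²(df, α). The rejection region is the right tail of the distribution.
[Figure: chi-square curve starting at 0, with the critical value χ²(df, α) marked on the horizontal axis.]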
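The critical value χ²(df, α) is the point that leaves area α in the right tail. Because the tail area shrinks as x grows, simple bisection can find it. This is a sketch assuming whole-number df; `chi2_sf` repeats the integer-df closed form so the block runs on its own, and both function names are hypothetical. Normally the value is read from a chi-square table.

```python
import math


def chi2_sf(x, df):
    """Right-tail area P(X >= x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    tail = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    k = 1
    while k < df:
        tail += term
        k += 2
        term *= x / k
    return tail


def chi2_critical(df, alpha, tol=1e-10):
    """Return x with P(X >= x) = alpha, the right-tail critical value."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must be between 0 and 1")
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > alpha:  # widen the bracket until it contains the answer
        hi *= 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if chi2_sf(mid, df) > alpha:
            lo = mid  # too much area to the right of mid: move right
        else:
            hi = mid
    return (lo + hi) / 2.0


for df in (2, 3):
    print(df, round(chi2_critical(df, 0.05), 3))  # 2 5.991, then 3 7.815
```

These agree with the usual table entries for α = .05.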
Automobile insurance is much more expensive for teenage drivers than for older drivers. To justify this cost difference, insurance companies claim that the younger drivers are much more likely to be involved in costly accidents. To test this claim, a researcher obtains information about registered drivers from the Department of Motor Vehicles and selects a sample of 300 accident reports from the police department.
The DMV reports the percentage of registered drivers in each age category as reported below. The number of accident reports is also shown. Does this data indicate that accidents occur with the same distribution as the ages of the drivers?
H0: The distribution of the ages of drivers involved in accidents is the same as the distribution of the ages of registered drivers.
Ha: The distribution of the ages of drivers involved in accidents is not the same as the distribution of the ages of registered drivers.
Check the conditions:
• Counted data condition
• Randomization condition
• Expected cell frequency condition: each expected frequency ≥ 5
Specify the sampling distribution model and the test you will use.
Since the conditions are met, we will use a Chi-square model with 2 degrees of freedom, and do a Chi-square goodness-of-fit test.
$$\chi^2 = \sum \frac{(O - E)^2}{E}, \quad df = k - 1$$
where k is the number of categories (here k = 3).
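For the accident example, the expected frequencies, the degrees of freedom, and the expected cell frequency condition can be checked in a few lines. This sketch assumes the null-hypothesis proportions are given as decimals that sum to 1; the helper name `expected_from_proportions` is hypothetical.

```python
def expected_from_proportions(proportions, n):
    """Expected frequencies n * p for each category; proportions must sum to 1."""
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("proportions must sum to 1")
    return [n * p for p in proportions]


proportions = [0.16, 0.28, 0.56]  # under 20, 20 - 29, 30 or over
n = 300                           # number of accident reports
expected = expected_from_proportions(proportions, n)
df = len(expected) - 1            # k - 1 with k = 3 categories
conditions_ok = all(e >= 5 for e in expected)  # expected cell frequency condition

print([round(e, 1) for e in expected], df, conditions_ok)  # [48.0, 84.0, 168.0] 2 True
```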
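Each expected frequency is the percent of registered drivers in that age group times n = 300; for example, 0.16 × 300 = 48 for drivers under 20.

Age          Percent of drivers   Observed (O)   Expected (E)   O - E   (O - E)²   (O - E)²/E
Under 20     16                   68             48             20      400        8.33
20 - 29      28                   92             84             8       64         .76
30 or over   56                   140            168            -28     784        4.67
Total        100                  300            300            0                  13.76

Note: Σ observed = Σ expected = 300, and Σ(O - E) = 0.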
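χ² = 13.76, df = 3 - 1 = 2, P-value: p < .005 (in a chi-square table, 13.76 is larger than 10.597, the critical value for df = 2 at the .005 level)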
Statistical conclusion: Since the p-value is small, reject the null hypothesis.
Conclusion in context: The data indicates that the distribution of ages of drivers involved in accidents is not the same as the distribution of ages of the drivers in the population.
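The complete goodness-of-fit calculation can be sketched with the standard library alone, giving an exact P-value instead of a table bracket. For df = 2 the right-tail area reduces to e^(−χ²/2), which the integer-df helper below reproduces. The function names are hypothetical, and α = .05 is an assumed significance level; the example itself only says the P-value is small.

```python
import math


def chi2_sf(x, df):
    """Right-tail area P(X >= x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    tail = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    k = 1
    while k < df:
        tail += term
        k += 2
        term *= x / k
    return tail


def goodness_of_fit(observed, proportions):
    """Return (chi-square, df, P-value) for observed counts and H0 proportions."""
    n = sum(observed)
    expected = [n * prop for prop in proportions]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return stat, df, chi2_sf(stat, df)


stat, df, p = goodness_of_fit([68, 92, 140], [0.16, 0.28, 0.56])
alpha = 0.05  # assumed significance level, not stated in the example
print(f"chi-square = {stat:.3f}, df = {df}, p = {p:.4f}")  # chi-square = 13.762, df = 2, p = 0.0010
print("reject H0" if p < alpha else "fail to reject H0")   # reject H0
```

The printed values agree with the SPSS results in the next part of this section (χ² = 13.762, p = .001).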
Using SPSS for a Goodness of Fit Test
If you have the expected proportions:
1. Create a numeric variable with a width of 1 and no decimal places for the categories. Code the values of this variable as follows:
In the Values column, click on the box with the three dots:
You will then see the Value Labels dialog box. Since there are three age categories, enter the values 1, 2, and 3 as codes:
Enter the value "1" and code it as "under 20". (You do not have to use quotation marks; they will be added by SPSS.)
Then click on Add and you will see the results:
Continue adding all categories, one at a time, and then click on OK.
You will see the results in the Values column in Variable View.
2. Create a numeric variable with no decimal places for the observed frequencies.
Then, for each category, enter the coded value:
As you enter each value, you will see a drop-down box. If you click on it, you can choose from the list of labels. However, if you just move to the next column, you will see the category name associated with the coded value.
You can then enter the observed frequency for that category.
Repeat this until all observed frequencies have been entered:
3. Weight the cases using the observed frequencies (> Data > Weight Cases…).
4. Now select > Analyze > Nonparametric Tests > Legacy Dialogs > Chi-Square
5. Select the variable with the observed frequencies as the Test Variable.
In the Expected Values box, select Values:
6. Enter the expected percents (as decimals) one at a time, and click on Add until all have been entered:
(Note that you also have the option to choose All categories equal if that is appropriate.)
7. After the last value has been entered, click on OK. You should see a table showing the observed and expected frequencies and a table with the results of the Chi-square test:
These results show that χ² = 13.762 and p = .001.

count
         Observed N   Expected N   Residual
68       68           48.0         20.0
92       92           84.0         8.0
140      140          168.0        -28.0
Total    300

Test Statistics
               count
Chi-Square     13.762a
df             2
Asymp. Sig.    .001
a. 0 cells (.0%) have expected frequencies less than 5. The minimum expected cell frequency is 48.0.
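Step 3 (weighting the cases) is what lets SPSS treat a single row such as "under 20, 68" as 68 separate accident reports. The sketch below only illustrates that idea in plain Python, by expanding weighted rows into individual cases and recounting them; it is not how SPSS implements weighting internally, and the variable names are hypothetical.

```python
from collections import Counter

# One row per category: (category label, observed frequency), as entered in Data View
rows = [("under 20", 68), ("20 - 29", 92), ("30 or over", 140)]

# Weighting by frequency acts as if each row appeared `count` times
cases = [label for label, count in rows for _ in range(count)]

counts = Counter(cases)
print(len(cases), counts["under 20"], counts["30 or over"])  # 300 68 140
```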
The Chi-Square Test for Homogeneity
In a test for homogeneity, we compare observed distributions for several groups to see if there are differences among the respective populations.
The central issue is whether the category proportions are the same for all of the populations. The test involves several samples but only one variable.
Assumptions and Conditions:
• Counted data condition: be sure that the data are counts, or frequencies.
• Independence assumption: check the Randomization condition if you want to generalize from the data to a population.
• Sample size assumption: check the Expected cell frequency condition; each expected frequency should be at least 5.
The article “Relationship of Health Behaviors to Alcohol and Cigarette Use by College Students” (J. of College Student Development (1992)) included data on drinking behavior, similar to the data shown below, for independently chosen random samples of male and female students.
Does there appear to be a gender difference with respect to drinking behavior?
H0: The proportions of the four drinking levels are the same for males and for females
Ha: The proportions of the four drinking levels are not the same for males and for females
Check the conditions:
• Counted data condition
• Randomization condition
• Expected cell frequency condition: each expected frequency must be at least 5, where
$$E = \frac{(\text{row total})(\text{column total})}{n}$$
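The expected-frequency formula is applied cell by cell. Here is a minimal sketch for a two-way table stored as a list of rows, using the drinking-level counts from this example (rows None, Low, Moderate, High; columns Men, Women). It also computes df = (R − 1)(C − 1) and the smallest expected count for the condition check; the function name `expected_counts` is hypothetical.

```python
def expected_counts(table):
    """E = (row total)(column total) / n for every cell of a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Rows: None, Low (1 - 7), Moderate (8 - 24), High (25 or more); columns: Men, Women
drinking = [[140, 186], [478, 661], [300, 173], [63, 16]]

expected = expected_counts(drinking)
df = (len(drinking) - 1) * (len(drinking[0]) - 1)
smallest = min(min(row) for row in expected)

for row in expected:
    print([round(e, 2) for e in row])  # first row: [158.56, 167.44]
print("df =", df, "| smallest expected count =", round(smallest, 2))
```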
Specify the sampling distribution model and the test you will use.
df = (R - 1)(C - 1) = (4 - 1)(2 - 1) = (3)(1) = 3, where R is the number of rows and C is the number of columns.
The conditions are met, so we will use a Chi-square model with 3 degrees of freedom, and do a Chi-square test of homogeneity.
Fill in the row and column totals, then calculate the expected frequency for each cell using E = (row total)(column total)/n. Observed counts are shown with expected counts in parentheses:

Drinking level (drinks per week)   Men             Women           Total
None                               140 (158.56)    186 (167.44)    326
Low (1 - 7)                        478 (553.97)    661 (585.03)    1139
Moderate (8 - 24)                  300 (230.05)    173 (242.95)    473
High (25 or more)                  63 (38.42)      16 (40.58)      79
Total                              981             1036            2017

For example, the expected count for men in the None category is (326)(981)/2017 ≈ 158.56.
Add one term (O - E)²/E for each of the eight cells, row by row:
$$\chi^2 = \sum \frac{(O - E)^2}{E} = 2.17 + 2.06 + 10.418 + 9.865 + 21.27 + 20.14 + 15.73 + 14.89 = 96.54$$
χ² = 96.54, df = 3, P-value: p < .005 (96.54 is far larger than 12.838, the critical value for df = 3 at the .005 level)
Statistical conclusion: p is small, so the null hypothesis is rejected.
Conclusion in context: The data does indicate a gender difference with respect to drinking behavior.
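A standard-library sketch of the whole test of homogeneity (hypothetical helper names; `chi2_sf` is the integer-df closed form). Because it keeps full precision instead of rounding each term, it gives χ² ≈ 96.526, matching SPSS, while the hand total of 96.54 carries a little rounding error. The P-value is so small that SPSS displays it as .000.

```python
import math


def chi2_sf(x, df):
    """Right-tail area P(X >= x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    tail = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    k = 1
    while k < df:
        tail += term
        k += 2
        term *= x / k
    return tail


def chi_square_two_way(table):
    """Chi-square statistic, df, and P-value for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for row, r_total in zip(table, row_totals):
        for observed, c_total in zip(row, col_totals):
            expected = r_total * c_total / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, chi2_sf(stat, df)


# Rows: None, Low, Moderate, High; columns: Men, Women
drinking = [[140, 186], [478, 661], [300, 173], [63, 16]]
stat, df, p = chi_square_two_way(drinking)
print(f"chi-square = {stat:.3f}, df = {df}, p = {p:.1e}")  # statistic prints as 96.526
print("reject H0" if p < 0.005 else "do not reject H0 at the .005 level")
```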
Using SPSS for a Test for Homogeneity
1. Create a string variable for each of the two categorical variables (gender and drinking level), and a numeric variable for the observed frequencies. Be sure to make the columns wide enough ("columns" in Variable View).
Then enter the values of these variables:
2. Weight the cases using the observed frequencies (> Data > Weight Cases…).
3. Select > Analyze > Descriptive Statistics > Crosstabs…
Select one variable as the row variable and the other as the column variable.
Click on Statistics… and then on Chi-square.
Click on the Cells button, and select Observed and Expected in the Cell Display window. Then click on Continue.
Click on Display clustered bar charts to produce the graph shown in the results.
Click on Continue and then click on OK.
Your output should include a table showing the observed and expected frequencies:

gender * level Crosstabulation
                                   level                                   Total
                                   high    low      moderate   none
gender   female   Count            16      661      173        186        1036
                  Expected Count   40.6    585.0    242.9      167.4      1036.0
         male     Count            63      478      300        140        981
                  Expected Count   38.4    554.0    230.1      158.6      981.0
Total             Count            79      1139     473        326        2017
                  Expected Count   79.0    1139.0   473.0      326.0      2017.0
It should also include a table with the results of your Chi-square test.
These results show that χ² = 96.526 and p = .000. SPSS displays p-values to three decimal places, so ".000" means p < .0005, not p = 0.

Chi-Square Tests
                      Value     df   Asymp. Sig. (2-sided)
Pearson Chi-Square    96.526a   3    .000
Likelihood Ratio      98.966    3    .000
N of Valid Cases      2017
a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 38.42.
Here is the graph that represents the results:
[Figure: clustered bar chart of counts by drinking level and gender.]
The Chi-Square Test for Independence
In a test for independence, we investigate association between two categorical variables in a single population. There is one sample, but there are two variables.
Assumptions and Conditions:
• Counted data condition
• Randomization condition, if you want to generalize from the data to a population
• Expected cell frequency condition
The table shown below was constructed using data in the article “Television Viewing and Physical Fitness in Adults” (Research Quarterly for Exercise and Sport (1990)). The author hoped to determine whether time spent watching television is associated with cardiovascular fitness. Subjects were asked about their television viewing time (per day, rounded to the nearest hour) and were classified as physically fit if they scored in the excellent or very good category on a step test.
H0: Fitness and TV viewing are independent.
Ha: Fitness and TV viewing are not independent.
Check the conditions:
• Counted data condition
• Randomization condition
• Expected cell frequency condition
Specify the sampling distribution model and the test you will use.
df = (R - 1)(C - 1) = (4 - 1)(2 - 1) = (3)(1) = 3
Since the conditions are met, we will use a Chi-square model with 3 degrees of freedom, and do a Chi-square test for independence.
Find the row and column totals, then calculate the expected frequency for each cell using E = (row total)(column total)/n. Observed counts are shown with expected counts in parentheses:

TV group (hours per day)   Physically Fit   Not Physically Fit   Total
0                          35 (25.48)       147 (156.52)         182
1 - 2                      101 (102.20)     629 (627.80)         730
3 - 4                      28 (35.00)       222 (215.00)         250
5 or more                  4 (5.32)         34 (32.68)           38
Total                      168              1032                 1200

For example, the expected count for physically fit subjects who watch no TV is (182)(168)/1200 = 25.48.

$$\chi^2 = \sum \frac{(O - E)^2}{E} = 3.557 + .579 + .014 + .002 + 1.4 + .228 + .328 + .053 = 6.161$$
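Listing each cell's contribution (O − E)²/E makes slips in the hand arithmetic easy to spot; for example, the last term is 1.7424/32.68 ≈ 0.053. A minimal sketch for the TV and fitness table (rows 0, 1 - 2, 3 - 4, 5 or more hours; columns Physically Fit, Not Physically Fit); the function name `cell_contributions` is hypothetical.

```python
def cell_contributions(table):
    """Return (O - E)**2 / E for each cell, with E = row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    result = []
    for row, r_total in zip(table, row_totals):
        terms = []
        for observed, c_total in zip(row, col_totals):
            expected = r_total * c_total / n
            terms.append((observed - expected) ** 2 / expected)
        result.append(terms)
    return result


# Rows: 0, 1 - 2, 3 - 4, 5 or more hours of TV; columns: fit, not fit
tv_fitness = [[35, 147], [101, 629], [28, 222], [4, 34]]

terms = cell_contributions(tv_fitness)
for row in terms:
    print([round(t, 3) for t in row])
print("chi-square =", round(sum(sum(row) for row in terms), 3))  # chi-square = 6.161
```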
χ² = 6.161, df = 3, P-value: p > .10 (6.161 is smaller than 6.251, the critical value for df = 3 at the .10 level)
Statistical conclusion: Since the p-value is large, we cannot reject the null hypothesis.
Conclusion in context: There is not enough evidence to conclude that time spent watching television is associated with cardiovascular fitness.
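With df = 3 the right-tail area has a closed form, P = erfc(√(χ²/2)) + √(2χ²/π)·e^(−χ²/2), so the bracket p > .10 can be replaced by an exact value. A sketch using only `math`; the function name `chi2_sf_df3` is hypothetical, and α = .05 is an assumed significance level.

```python
import math


def chi2_sf_df3(x):
    """Right-tail area P(X >= x) for a chi-square distribution with 3 df."""
    if x <= 0:
        return 1.0
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)


stat = 6.161           # chi-square statistic from the TV and fitness table
p = chi2_sf_df3(stat)
alpha = 0.05           # assumed significance level
print(f"p = {p:.3f}")  # p = 0.104
print("reject H0" if p < alpha else "fail to reject H0")  # fail to reject H0
```

The result agrees with the SPSS output for this example (p = .104).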
Using SPSS for a Test for Independence
Follow the instructions for a Chi-Square test for homogeneity. You may define two string variables for the categories and one numeric variable for the counts, or you may choose to use coding for one or both of the variables representing the categories.
Then enter the frequencies as before:
Weight the cases by counts, and then use > Analyze > Descriptive Statistics > Crosstabs…
Select one variable as the row variable and the other as the column variable.
Click on Statistics… and then on Chi-square.
Click on the Cells button, and select Observed and Expected in the Cell Display window.
Click on Display clustered bar charts to produce the graph shown in the results.
Then click on Continue and on OK.
SPSS output:

TVGroup * Fitness Crosstabulation
                                       Fitness            Total
                                       Fit      Not Fit
TVGroup   0           Count            35       147       182
                      Expected Count   25.5     156.5     182.0
          1-2         Count            101      629       730
                      Expected Count   102.2    627.8     730.0
          3-4         Count            28       222       250
                      Expected Count   35.0     215.0     250.0
          5 or more   Count            4        34        38
                      Expected Count   5.3      32.7      38.0
Total                 Count            168      1032      1200
                      Expected Count   168.0    1032.0    1200.0
These results show that χ² = 6.161 and p = .104.

Chi-Square Tests
                      Value    df   Asymp. Sig. (2-sided)
Pearson Chi-Square    6.161a   3    .104
Likelihood Ratio      5.930    3    .115
N of Valid Cases      1200
a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 5.32.
Here is the graph that supports these results:
[Figure: clustered bar chart of counts by TV group and fitness level.]
1. A health professional selected a random sample of 100 patients from each of four major hospital emergency rooms to see if the major reasons for emergency room visits (accident, illegal activity, illness, other) are the same in all four hospitals.
This is an example of
a. A goodness-of-fit test
b. A test for homogeneity
c. A test for independence
Answer: b. A test for homogeneity
2. An urban economist wants to determine whether the region of the United States a resident lives in is related to his level of education. He randomly selects 1800 US residents and asks them to report their level of education and the region of the US in which they live.
The economist is using
a. A goodness-of-fit test
b. A test for homogeneity
c. A test for independence
Answer: c. A test for independence
3. As part of a class project, a student asked a random sample of students about their preferred soft drink: Pepsi, Coke, or 7-Up, to determine whether these three drinks were equally preferred by students.
The student should use
a. A goodness-of-fit test
b. A test for homogeneity
c. A test for independence