Steps in Hypothesis Testing
4. Collect the data and compute the value for the test statistic from the sample data
10.2.3 Test of Hypothesis for the Difference of Two Population Proportions
Example:
In a survey of 200 students, 78 out of the 120 females in the sample passed Math 17 on their first take while this figure is 60 among the 80 male students. Will you agree that the proportion of males who passed Math 17 on their first take is higher than the proportion of females who passed the same course on their first take? Test at α=0.05.
Solution:
Let X = number of females (out of 120) who passed Math 17 on their first take
Y = number of males (out of 80) who passed Math 17 on their first take
Here, P1 is the population proportion of females who pass Math 17 on their first take and P2 is the population proportion of males who pass Math 17 on their first take.
i. Hypotheses: Ho: P1 = P2 vs. Ha: P1 < P2
vi. Decision: Since z = -1.498011773 ≮ -1.645, we do not reject Ho at 5% level of significance.
vii. Conclusion: At 5% level of significance, based on sample results, there is insufficient evidence to say that the proportion of males who passed Math 17 on their first take is higher than the proportion of females who passed Math 17 on their first take.
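The decision above can be checked numerically. The sketch below assumes the usual pooled two-proportion z statistic (the formula step is not shown in this excerpt, but this form reproduces the reported z value). Variable names are illustrative only.

```python
import math

# Sample data from the example
x_f, n_f = 78, 120   # females who passed, females sampled
x_m, n_m = 60, 80    # males who passed, males sampled

p1_hat = x_f / n_f   # sample proportion of females who passed
p2_hat = x_m / n_m   # sample proportion of males who passed

# Pooled proportion under Ho: P1 = P2 (assumed form of the test statistic)
p_pooled = (x_f + x_m) / (n_f + n_m)
std_error = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n_f + 1 / n_m))
z = (p1_hat - p2_hat) / std_error

z_crit = -1.645          # left-tail critical value at alpha = 0.05
reject_h0 = z < z_crit   # Ha: P1 < P2, so reject only for small z

print(f"z = {z:.6f}, reject Ho: {reject_h0}")
```

Because Ha is one-sided (P1 < P2), only the left tail is used, which is why the comparison is against -1.645 rather than ±1.96.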
• Chi-Square Test
Suppose a sample of units has been taken from a population and information on classification according to two nominal variables has been obtained. Tests of independence are useful in assessing whether classification in one categorical variable has a relationship with classification in another categorical variable.
For example, suppose that an employee responsible for monitoring the quality of products manufactured by their firm is concerned with determining whether or not there is a relationship between the production shift and the presence of a defect in the units produced.
If, in the population of all units manufactured by the firm, the true proportion of defective items is the same for every shift (say, 5%), then being classified as defective or not has nothing to do with the shift of production of the unit. In this scenario, presence of defect is independent of the production shift. If, however, the true proportions of defective items differ among the production shifts, then the presence of defect in a unit is related to when the unit was produced.
Unfortunately, the employee has no way of knowing the proportion of defectives per shift in the population of units manufactured by the firm unless he subjects each unit produced by the firm to testing, which would be impractical. What he can do is to take a random sample of units from the population, classify the units according to production shift and presence of defect (the two categorical variables of interest), and to test the independence of the said variables.
Now it is possible that even if production shift and presence of defect are independent in the population, he may get a sample of units such that the sample proportions of defective units vary among the production shifts. This is because of sampling error, the error committed because only a sample has been taken from the population instead of taking information from all units in the population. But if the two variables are truly independent, the numbers of defectives and non-defectives per shift in the sample should not differ much from what is expected under independence.
To illustrate, suppose that the employee obtained a sample of units such that the number of units classified per shift and the number of defectives and non-defectives are as follows:
Cross-tabulation of Units According to Production Shift and Presence of Defect
Shifts | Without Defect | With Defect | Total
Morning | E11 = r1c1/N = (400)(950)/1000 | E12 = r1c2/N = (400)(50)/1000 | r1 = 400
Afternoon | E21 = r2c1/N = (300)(950)/1000 | E22 = r2c2/N = (300)(50)/1000 | r2 = 300
Night | E31 = r3c1/N = (300)(950)/1000 | E32 = r3c2/N = (300)(50)/1000 | r3 = 300
Total | c1 = 950 | c2 = 50 | N = 1000
The marginal frequencies and grand total are then as follows:
r1 = the number of units in the sample produced under Morning shift = 400
r2 = the number of units in the sample produced under Afternoon shift = 300
r3 = the number of units in the sample produced under Night shift = 300
c1 = the number of units in the sample without defects = 950
c2 = the number of units in the sample with defects = 50
N = the number of units in the sample = 1000
Note that among the 1000 units in the sample, 95% are classified as not being defective and 5% are classified as being defective. If production shift and presence of defect are truly independent, then we should expect that, per production shift, we will be able to classify 95% of the units as non-defective and 5% as defective. That is,
E11 = number of units in the sample produced under Morning shift and classified as not having defects that we expect to obtain if production shift and presence of defect are truly independent = (400)(950)/1000 = 380.
E12 = number of units in the sample produced under Morning shift and classified as having defects that we expect to obtain if production shift and presence of defect are truly independent = (400)(50)/1000 = 20.
In general, Eij = the number of units in the sample produced under shift i with outcome j that we expect to obtain if production shift and presence of defect are truly independent, i = 1, 2, 3, and j = 1, 2.
The other expected frequencies under independence of production shift and presence of defect can be similarly calculated and are presented in the table below.
Expected Frequencies Under Independence of Production Shift and Presence of Defect
Shifts | Without Defect | With Defect | Total
Morning | 380 | 20 | 400
Afternoon | 285 | 15 | 300
Night | 285 | 15 | 300
Note that the expected frequencies were calculated using the marginal frequencies and the grand total. To illustrate,
E21 = (r2)(c1)/N = (300)(950)/1000 = 285
E22 = (r2)(c2)/N = (300)(50)/1000 = 15
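The rule "expected frequency = (row total)(column total)/(grand total)" can be sketched in a few lines of code. The dictionary names below are illustrative only; the marginal totals are those given in the text.

```python
# Marginal frequencies from the cross-tabulation
row_totals = {"Morning": 400, "Afternoon": 300, "Night": 300}
col_totals = {"Without Defect": 950, "With Defect": 50}
n = sum(row_totals.values())  # grand total N = 1000

# Eij = (ri)(cj)/N for every shift i and outcome j
expected = {
    shift: {outcome: r * c / n for outcome, c in col_totals.items()}
    for shift, r in row_totals.items()
}

for shift, row in expected.items():
    print(shift, row)
```

Each row of expected frequencies adds up to its row total, and each column adds up to its column total, which is a quick check on the arithmetic.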
If the observed numbers of units in the sample classified under the variables “production shift” and “presence of defect” (the observed frequencies) are not much different from what is expected (the expected frequencies, Eij’s) under independence of production shift and presence of defect in the population, then there is insufficient evidence based on the sample to reject independence. But if the said differences are large, then we tend to reject the hypothesis of independence of production shift and presence of defect in the units produced by the firm.
The question now is, how large should the differences between the observed frequencies and the expected frequencies under independence be for the employee to reject independence of production shift and presence of defect?
For example, suppose that the employee has already constructed the contingency table of 1000 units classified according to production shift and presence of defect:
Observed Frequencies of 1000 Units Classified According to Production Shift and Presence of Defect
Shifts | Without Defect | With Defect | Total
Morning | 392 | 8 | 400
Afternoon | 275 | 25 | 300
Night | 283 | 17 | 300
Total | 950 | 50 | 1000
Note that the sample percentages of defectives are different for the three shifts: 2% for Morning shift, 8.33% for Afternoon shift, 5.67% for Night shift. There are then differences between the observed and the expected number of units classified according to production shift and presence of defect. As an example, there are more units produced under afternoon shift that have defects (O22=25) than what one would expect if production shift and presence of defect are independent (E22=15). The differences Oij-Eij can be assessed for the three shifts (i=1,2,3) and outcome (j=1,2). But are these differences large enough for us to reject independence of production shift and presence of defect?
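The percentages and the differences Oij - Eij quoted above can be reproduced with a short script. This is only a sketch: the observed and expected counts are copied from the tables in the text, and the variable names are illustrative.

```python
# (without defect, with defect) counts per shift
observed = {"Morning": (392, 8), "Afternoon": (275, 25), "Night": (283, 17)}
expected = {"Morning": (380, 20), "Afternoon": (285, 15), "Night": (285, 15)}

pct_defective = {}
differences = {}
for shift, (without_defect, with_defect) in observed.items():
    # Sample percentage of defective units in this shift
    pct_defective[shift] = 100 * with_defect / (without_defect + with_defect)
    # Oij - Eij for the two outcomes of this shift
    differences[shift] = tuple(
        o - e for o, e in zip(observed[shift], expected[shift])
    )

for shift in observed:
    print(f"{shift}: {pct_defective[shift]:.2f}% defective, O - E = {differences[shift]}")
```

Within each shift the two differences cancel out, because observed and expected counts share the same row total. This is one reason the raw differences cannot simply be added up to measure departure from independence.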
The Chi-Square Test of Independence is a formal statistical test that provides an objective assessment as to whether or not the magnitude of the differences between observed and expected frequencies is large enough to reject the null hypothesis (independence).
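It makes use of the following test statistic and critical region:
Test statistic: χ² = Σ(i=1 to r) Σ(j=1 to c) (Oij - Eij)² / Eij
Critical region: Reject Ho if χ² > χ²(α, v), the tabular chi-square value with v = (r - 1)(c - 1) degrees of freedom,
where r = number of levels of the row variable and c = number of levels of the column variable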
Example: Given below is a 3 x 2 contingency table of 1000 units classified according to production shift and presence of defect.
Observed Frequencies of 1000 Units Classified According to Production Shift and Presence of Defect
Shifts | Without Defect | With Defect | Total
Morning | 392 | 8 | 400
Afternoon | 275 | 25 | 300
Night | 283 | 17 | 300
Total | 950 | 50 | 1000
a. Hypotheses: Ho: Production shift and presence of defect are not related.
Ha: Production shift and presence of defect are related.
b. Level of Significance: α = 0.05
e. Computations:
Observed Frequencies of 1000 Units Classified According to Production Shift and Presence of Defect
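Shifts | Without Defect (Observed) | Without Defect (Expected) | With Defect (Observed) | With Defect (Expected) | Total
Morning | 392 | 380 | 8 | 20 | 400
Afternoon | 275 | 285 | 25 | 15 | 300
Night | 283 | 285 | 17 | 15 | 300
... sufficient evidence to say that presence of defect in a unit is related to the shift when it was produced.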
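The computations for this example can be checked with a short script. This is a minimal sketch, not the notes' own worked solution: it computes the chi-square statistic from the observed table and compares it with the critical value at α = 0.05. As a labeled shortcut, it uses the fact that for exactly 2 degrees of freedom the upper-tail chi-square probability is exp(-x/2), so no statistical table or library is needed; for other degrees of freedom the critical value would have to come from a table.

```python
import math

# Observed frequencies: rows = Morning, Afternoon, Night;
# columns = without defect, with defect
observed = [[392, 8], [275, 25], [283, 17]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Chi-square statistic: sum over all cells of (Oij - Eij)^2 / Eij
chi_square = 0.0
for i, row in enumerate(observed):
    for j, o_ij in enumerate(row):
        e_ij = row_totals[i] * col_totals[j] / n
        chi_square += (o_ij - e_ij) ** 2 / e_ij

# Degrees of freedom: (r - 1)(c - 1) = (3 - 1)(2 - 1) = 2
df = (len(observed) - 1) * (len(observed[0]) - 1)

alpha = 0.05
# Valid only for df = 2: P(chi-square > x) = exp(-x / 2)
critical_value = -2 * math.log(alpha)   # about 5.991
p_value = math.exp(-chi_square / 2)

reject_h0 = chi_square > critical_value
print(f"chi-square = {chi_square:.4f}, df = {df}, "
      f"critical value = {critical_value:.3f}, reject Ho: {reject_h0}")
```

The computed statistic is well above the critical value, which agrees with the conclusion that presence of defect is related to the production shift.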
Exercises:
1. Mediterranean Diet Case Study: In the study, 605 survivors of heart attack who were made to undergo either the AHA diet or the Mediterranean diet were monitored and classified according to health condition. The resulting contingency table of subjects according to diet followed and health condition is presented below.
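Contingency table of subjects by diet (rows) and health condition (columns):
Diet | Cancers | Deaths | Nonfatal Illness | Healthy | Total
AHA | 15 | 24 | 25 | 239 | 303
Mediterranean | 7 | 14 | 8 | 273 | 302
Total | 22 | 38 | 33 | 512 | 605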
Is there sufficient evidence to say that diet and health condition are related at α=0.05?
2. A study was undertaken to examine factors related to a mother’s choice of infant feeding method. One of the factors examined was monthly family income. Do the data below indicate an association between family income and method of feeding? Use a 0.10 level of significance.
Monthly
Remarks:
1. The test is VALID if at least 80% of the cells have expected frequencies of AT