Unit 7: Hypothesis Testing and Confidence Intervals II
QBA 201 - Summer 2013
Instructor: Michael Malcolm
7.1: Tests and confidence intervals for proportions
7.2: Tests and confidence intervals for difference of means
7.3: Tests and confidence intervals for difference of proportions
7.4: Small sample sizes
7.1: Tests and confidence intervals for proportions
Recall the procedures for a hypothesis test involving a population mean. The general procedure for hypothesis testing is:
1. State the hypotheses.
2. State the rejection region.
3. Compute the test statistic.
4. Form a conclusion by comparing the test statistic with the rejection region.
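Right-sided test:
1. $H_0: \mu = \mu_0$ versus $H_a: \mu > \mu_0$
2. RR: $z > z_{\alpha}$
3. Test statistic: $z = \dfrac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$
4. Form conclusion.
Left-sided test:
1. $H_0: \mu = \mu_0$ versus $H_a: \mu < \mu_0$
2. RR: $z < -z_{\alpha}$
3. Test statistic: $z = \dfrac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$
4. Form conclusion.
Two-sided test:
1. $H_0: \mu = \mu_0$ versus $H_a: \mu \neq \mu_0$
2. RR: $z < -z_{\alpha/2}$ or $z > z_{\alpha/2}$
3. Test statistic: $z = \dfrac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$
4. Form conclusion.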
A $1 - \alpha$ confidence interval for the population mean $\mu$ is given by:
$$\left[ \bar{x} - z_{\alpha/2} \frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \right]$$
The expression $\sigma / \sqrt{n}$ that shows up in both the test statistic and the confidence interval is known as the standard error. It is the standard deviation of the sample mean.
It turns out that many hypothesis tests and confidence intervals can be derived using exactly the same "recipe". The only trick is to get the standard error right. In this section we apply the recipe to hypothesis tests and confidence intervals for proportions.
For example, suppose we want to study the true proportion of residents of some population who intend to vote for a particular candidate. The true population proportion is $p$. The sample proportion $\hat{p}$ is the measured proportion of people in our sample who intend to vote for the candidate.
The hypotheses are set up the same way as in the case of a sample mean. Our null hypothesis is that the true population proportion $p$ is equal to some hypothesized value $p_0$. We can test against alternatives that $p$ is greater than, less than, or not equal to the hypothesized value. The appropriate tests and confidence intervals are given in the table below. The test statistics and
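Right-sided test:
1. $H_0: p = p_0$ versus $H_a: p > p_0$
2. RR: $z > z_{\alpha}$
3. Test statistic: $z = \dfrac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}}$
4. Form conclusion.
Left-sided test:
1. $H_0: p = p_0$ versus $H_a: p < p_0$
2. RR: $z < -z_{\alpha}$
3. Test statistic: $z = \dfrac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}}$
4. Form conclusion.
Two-sided test:
1. $H_0: p = p_0$ versus $H_a: p \neq p_0$
2. RR: $z < -z_{\alpha/2}$ or $z > z_{\alpha/2}$
3. Test statistic: $z = \dfrac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}}$
4. Form conclusion.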
A $1 - \alpha$ confidence interval for the population proportion $p$ is given by:
$$\left[ \hat{p} - z_{\alpha/2} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}},\ \hat{p} + z_{\alpha/2} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} \right]$$
The one important thing to note is that the standard error for hypothesis testing is calculated by assuming the null value $p_0$, i.e. $\sqrt{p_0(1 - p_0)/n}$. For the confidence interval, we rely on the sample proportion, i.e. the standard error is calculated as $\sqrt{\hat{p}(1 - \hat{p})/n}$.
One warning is that these tests are unreliable for values of $p$ very close to 0 or 1 unless the sample size is very large. Testing or estimating something involving a true population proportion $p = 0.001$ is obviously very difficult if the sample size is something like $n = 100$. But inferences when $p = 0.3$ are much less problematic.
For our first example, suppose that a company advertises that only 5% of the batteries it produces are defective. A consumer testing agency wants to test this claim at a significance level of $\alpha = 0.01$. In order to test the claim, the agency takes a random sample of 300 batteries and finds 22 defective batteries.
For this test, a one-sided alternative is appropriate since the agency is presumably interested in investigating whether the proportion of defectives is higher than claimed. Note that the sample proportion to be used for testing is $\hat{p} = \frac{22}{300} = 0.0733$.
We follow the steps for a right-sided hypothesis test as follows:
1. $H_0: p = 0.05$ versus $H_a: p > 0.05$
2. RR: $z > z_{0.01} = 2.326$
3. Test statistic: $z = \dfrac{0.0733 - 0.05}{\sqrt{0.05(1 - 0.05)/300}} = 1.85$
4. Since $z$ is not in the rejection region, we do not reject the null hypothesis.
The agency's sample does not provide sufficient evidence that the proportion of defective batteries is higher than the claimed level of 0.05.
The p-value for these data can be calculated using the same technique that we derived in unit 6.4. Since the alternative is a right-sided alternative, the p-value is calculated from the test statistic as:
$$P(z > 1.85) = 0.0322$$
The p-value for these data, $p = 0.0322$, confirms that the null hypothesis could not be rejected at significance level $\alpha = 0.01$. However, notice that it could have been rejected if the agency had used a significance level $\alpha = 0.05$. Recall that the p-value is the lowest level of significance at which the null hypothesis can be rejected.
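The battery test and its p-value can be checked numerically. The following is a minimal sketch using only the Python standard library; the function name `one_proportion_z_test` is an illustrative choice and is not part of the course materials.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_test(successes, n, p0):
    """Right-sided z-test of H0: p = p0 against Ha: p > p0."""
    p_hat = successes / n
    # For the test, the standard error uses the null value p0, not p_hat.
    se = sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se
    p_value = 1 - NormalDist().cdf(z)  # area in the upper tail
    return p_hat, z, p_value


p_hat, z, p_value = one_proportion_z_test(successes=22, n=300, p0=0.05)
z_crit = NormalDist().inv_cdf(1 - 0.01)  # critical value for alpha = 0.01

print(f"p_hat = {p_hat:.4f}, z = {z:.2f}, critical value = {z_crit:.3f}")
print(f"p-value = {p_value:.4f}, reject H0 at alpha = 0.01: {z > z_crit}")
```

The computed p-value comes out near 0.032. It can differ from the table-based figure in the last digits, because the table lookup rounds the test statistic to two decimal places first.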
We can also apply the technique of unit 6.5 to estimate the power of this test. Recall that the power of a test is defined against a particular alternative. For example, suppose that the true proportion of defectives is actually $p = 0.07$, so that the null should be rejected. What is the power of the agency's testing procedure for rejecting the null in this case?
We first compute the rejection region explicitly. Since the rejection region for the agency's test is $z > 2.326$, we need to find the particular values of $\hat{p}$ that lead to rejection. Plugging in the expression for the z-statistic:
$$\frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}} > 2.326$$
$$\frac{\hat{p} - 0.05}{\sqrt{0.05(1 - 0.05)/300}} > 2.326 \implies \hat{p} > 0.0793$$
The power of the test is now the probability that $\hat{p}$ will actually fall into this rejection region when the true proportion of defectives is $p = 0.07$. Using the Central Limit Theorem:
$$P(\hat{p} > 0.0793 \mid p = 0.07) = P\left( z > \frac{0.0793 - 0.07}{\sqrt{0.07(1 - 0.07)/300}} \right) = P(z > 0.63) = 0.2643$$
Notice that these calculations use the alternative value $p = 0.07$ for the standard error, which is appropriate since we are calculating the probability that $\hat{p}$ will fall in the rejection region when this alternative is true.
Thus, this test is quite low-powered against the alternative that $p = 0.07$. In this case, even though the null $p = 0.05$ should be rejected, our test is only able to reject the null 26.43% of the time.
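The two-step power calculation (find the cutoff for $\hat{p}$ under the null, then find the probability of exceeding it under the alternative) can be sketched as follows. This is an illustrative sketch using only the standard library; `power_right_sided` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist


def power_right_sided(p0, p_true, n, alpha):
    """Power of the right-sided z-test of H0: p = p0 when the true proportion is p_true."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    # Smallest sample proportion that leads to rejection (standard error under H0).
    cutoff = p0 + z_alpha * sqrt(p0 * (1 - p0) / n)
    # Sampling distribution of p_hat under the alternative (standard error under p_true).
    se_true = sqrt(p_true * (1 - p_true) / n)
    power = 1 - NormalDist(mu=p_true, sigma=se_true).cdf(cutoff)
    return cutoff, power


cutoff, power = power_right_sided(p0=0.05, p_true=0.07, n=300, alpha=0.01)
print(f"reject when p_hat > {cutoff:.4f}; power = {power:.4f}")
```

Calling the function with a larger `n` or a larger `p_true` shows how the power rises as the sample grows or as the alternative moves further from the null.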
Finally, a 95% confidence interval for the true population proportion of defectives $p$ would be calculated as follows:
$$\left[ 0.0733 - 1.96 \sqrt{\frac{0.0733(1 - 0.0733)}{300}},\ 0.0733 + 1.96 \sqrt{\frac{0.0733(1 - 0.0733)}{300}} \right] = [0.0438, 0.1028]$$
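The interval above can be reproduced with a short sketch. It assumes only the standard library, and `proportion_confidence_interval` is an illustrative name. Note that the standard error here uses the sample proportion, unlike the test statistic, which uses the null value.

```python
from math import sqrt
from statistics import NormalDist


def proportion_confidence_interval(successes, n, confidence=0.95):
    """Large-sample confidence interval for a population proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z_{alpha/2}
    # For the interval, the standard error uses the sample proportion p_hat.
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


low, high = proportion_confidence_interval(successes=22, n=300)
print(f"95% CI for p: [{low:.4f}, {high:.4f}]")
```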
With respect to polling, confidence intervals for proportions are frequently given a "margin of error" interpretation. For example, suppose prior to the 2012 election that a polling company called 2500 US voters and asked whether they intended to vote for Obama. Of the respondents, 1302 indicated that they intended to vote for Obama, so the sample proportion can be calculated as $\hat{p} = \frac{1302}{2500} = 0.5208$.
We can find a 95% confidence interval as follows:
$$\left[ 0.5208 \pm 1.96 \sqrt{\frac{0.5208(1 - 0.5208)}{2500}} \right] = [0.5208 \pm 0.0196] = [0.5012, 0.5404]$$
A pollster might say in this case that his estimate for the proportion of voters who intend to vote for Obama is 52.08% with a margin of error of 1.96%. This means that the confidence interval extends 0.0196 in both directions from the point estimate $\hat{p} = 0.5208$.
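The margin of error is simply the half-width of the confidence interval, which a few lines of Python make explicit. This is a minimal standard-library sketch; `margin_of_error` is an illustrative name.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(p_hat, n, confidence=0.95):
    """Half-width of the large-sample confidence interval for a proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sqrt(p_hat * (1 - p_hat) / n)


p_hat = 1302 / 2500
moe = margin_of_error(p_hat, 2500)
print(f"estimate = {p_hat:.4f}, margin of error = {moe:.4f}")
print(f"95% CI = [{p_hat - moe:.4f}, {p_hat + moe:.4f}]")
```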
Suppose the pollster wants to get the margin of error down to 1.5%. The necessary sample size can be derived using the formula from unit 6.7. Recall that this formula gives $n = \left( \dfrac{z_{\alpha/2}\, \sigma}{E} \right)^2$, where $E$ denotes the desired margin of error. We use the standard deviation estimate for Bernoulli random variables, $\sigma = \sqrt{\hat{p}(1 - \hat{p})}$:
$$n = \left( \frac{1.96 \sqrt{0.5208(1 - 0.5208)}}{0.015} \right)^2 = 4261.06$$
Rounding up, a sample size of 4262 is needed in order for the pollster to lower his margin of error to 1.5%. Note that we used $\hat{p}$ in this formula. If you don't have a starting estimate, then $\hat{p} =$
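The sample-size calculation can be sketched as below. The helper name `sample_size_for_margin` is hypothetical, and the critical value is passed in explicitly as the table value 1.96 so that the result matches the hand calculation (a more precise critical value gives a slightly smaller number).

```python
from math import ceil


def sample_size_for_margin(p_guess, margin, z=1.96):
    """Smallest n whose margin of error is at most `margin` for a proportion near p_guess."""
    # n = (z * sigma / margin)^2 with sigma^2 = p(1 - p) for a Bernoulli variable.
    n_exact = z**2 * p_guess * (1 - p_guess) / margin**2
    return n_exact, ceil(n_exact)  # always round up to a whole observation


n_exact, n_required = sample_size_for_margin(p_guess=0.5208, margin=0.015)
print(f"n = {n_exact:.2f}, so sample at least {n_required} voters")
```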
EXERCISES
1. A local law enforcement agency claimed that fewer than 50% of store owners actually turn shoplifters over to police. A random sample of 40 store owners indicated that only 24 of them turn shoplifters over to police.
a. Is there enough evidence to conclude at the $\alpha = 0.05$ significance level that the law enforcement agency's claim is correct?
b. What is the p-value for the test?
c. Calculate a 95% confidence interval for the proportion of store owners who turn shoplifters over to police.
7.2: Tests and confidence intervals for difference of means
Rather than testing whether a mean is equal to some hypothesized value, we may sometimes be interested in testing whether the means of two different populations are different from each other. For example, we might want to test whether the mean salaries of men and women who work in some particular industry are significantly different from each other. In other words, we are comparing the means of two different populations.
Here, the null hypothesis is that the difference between the two population means is equal to some specified value. We can have both one-sided and two-sided alternatives, stating that the difference in means is higher than, lower than, or not equal to this specified value.
The test follows the same setup as earlier tests, as long as the standard error is calculated correctly.
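Right-sided test:
1. $H_0: \mu_1 - \mu_2 = D_0$ versus $H_a: \mu_1 - \mu_2 > D_0$
2. RR: $z > z_{\alpha}$
3. Test statistic: $z = \dfrac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$
4. Form conclusion.
Left-sided test:
1. $H_0: \mu_1 - \mu_2 = D_0$ versus $H_a: \mu_1 - \mu_2 < D_0$
2. RR: $z < -z_{\alpha}$
3. Test statistic: $z = \dfrac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$
4. Form conclusion.
Two-sided test:
1. $H_0: \mu_1 - \mu_2 = D_0$ versus $H_a: \mu_1 - \mu_2 \neq D_0$
2. RR: $z < -z_{\alpha/2}$ or $z > z_{\alpha/2}$
3. Test statistic: $z = \dfrac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$
4. Form conclusion.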
The idea is to take random samples from both populations. The random sample from the first population consists of $n_1$ observations and the random sample from the second population consists of $n_2$ observations. We then record the sample mean for each random sample, $\bar{x}_1$ for the sample from the first population and $\bar{x}_2$ for the sample from the second population. As usual, the test statistic technically depends on the true population variances $\sigma_1^2$ and $\sigma_2^2$, but since these are usually unknown, in practice we substitute the sample variances $s_1^2$ and $s_2^2$.
The usual case is that we are testing for equality of two means. In other words, the typical case is that the hypothesized difference is $D_0 = 0$. You might occasionally be interested in testing for some other value for the difference. For example, you might want to know whether the mean difference in the life of two batteries is more than 3 months in order to justify a cost difference. But the most frequently encountered case is simply to test whether two means are equal to each other.
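Right-sided test:
1. $H_0: \mu_1 - \mu_2 = 0$ versus $H_a: \mu_1 - \mu_2 > 0$
2. RR: $z > z_{\alpha}$
3. Test statistic: $z = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$
4. Form conclusion.
Left-sided test:
1. $H_0: \mu_1 - \mu_2 = 0$ versus $H_a: \mu_1 - \mu_2 < 0$
2. RR: $z < -z_{\alpha}$
3. Test statistic: $z = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$
4. Form conclusion.
Two-sided test:
1. $H_0: \mu_1 - \mu_2 = 0$ versus $H_a: \mu_1 - \mu_2 \neq 0$
2. RR: $z < -z_{\alpha/2}$ or $z > z_{\alpha/2}$
3. Test statistic: $z = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$
4. Form conclusion.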
For the case where the null is that the two means are equal, the right-sided test is relevant when you want to test the hypothesized alternative that $\mu_1 > \mu_2$. The left-sided test is relevant when you want to test the hypothesized alternative that $\mu_1 < \mu_2$. The two-sided test is relevant when you have no direction in mind and simply want to test whether the two are different.
A $1 - \alpha$ confidence interval for the difference $\mu_1 - \mu_2$ is given by:
$$\left[ (\bar{x}_1 - \bar{x}_2) - z_{\alpha/2} \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}},\ (\bar{x}_1 - \bar{x}_2) + z_{\alpha/2} \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \right]$$
As an example, suppose that an environmental researcher wants to study the level of water pollution in two different locations on a river to determine whether they differ. He takes a random sample of 30 readings from the first location and a random sample of 35 readings from the second location. Among the samples from the first location, the mean pollution level is 1.65 ppm with a standard deviation of 0.26 ppm. Among the samples from the second location, the mean pollution level is 1.43 ppm with a standard deviation of 0.22 ppm.
We want to test at $\alpha = 0.05$ whether the two mean pollution levels differ from each other.
Since the researcher does not specify a direction for the alternative, we use a two-sided test. Proceeding through the steps:
1. $H_0: \mu_1 - \mu_2 = 0$ versus $H_a: \mu_1 - \mu_2 \neq 0$
2. RR: $z < -1.96$ or $z > 1.96$
3. Test statistic: $z = \dfrac{1.65 - 1.43}{\sqrt{0.26^2/30 + 0.22^2/35}} = 3.65$
4. The test statistic falls in the rejection region, so we can reject the null hypothesis.
We can form a 95% confidence interval for the difference between the pollution levels at the two sites using the formula given above.
$$\left[ (1.65 - 1.43) - 1.96 \sqrt{\frac{0.26^2}{30} + \frac{0.22^2}{35}},\ (1.65 - 1.43) + 1.96 \sqrt{\frac{0.26^2}{30} + \frac{0.22^2}{35}} \right] \approx [0.1018, 0.3382]$$
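The pollution example can be worked through in a few lines. This is a minimal standard-library sketch; `two_sample_z` is an illustrative name, and the sample standard deviations are substituted for the unknown population values, as described earlier in this section.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(mean1, sd1, n1, mean2, sd2, n2, d0=0.0):
    """Large-sample z statistic and standard error for H0: mu1 - mu2 = d0."""
    se = sqrt(sd1**2 / n1 + sd2**2 / n2)
    z = ((mean1 - mean2) - d0) / se
    return z, se


z, se = two_sample_z(1.65, 0.26, 30, 1.43, 0.22, 35)
z_crit = NormalDist().inv_cdf(0.975)  # two-sided test at alpha = 0.05

diff = 1.65 - 1.43
ci = (diff - z_crit * se, diff + z_crit * se)

print(f"z = {z:.2f}, reject H0: {abs(z) > z_crit}")
print(f"95% CI for mu1 - mu2: [{ci[0]:.4f}, {ci[1]:.4f}]")
```

Because the interval lies entirely above zero, it agrees with the test: a difference of zero is not a plausible value at the 5% level.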
EXERCISES
1. APGAR scores are a 1-10 measure of a newborn's health and alertness at birth. A researcher is interested in studying whether babies of mothers who smoke while pregnant have lower APGAR scores than babies of mothers who do not smoke. To study this question, the researcher takes a random sample of 35 newborns of mothers who smoke while pregnant. He finds a mean APGAR of 7.80 and a standard deviation of 1.73 among these newborns. He then takes a random sample of 86 newborns of mothers who did not smoke while pregnant. He finds a mean APGAR of 8.48 and a standard deviation of 0.97 among these newborns. Test the hypothesis that the researcher is interested in at a significance level of $\alpha = 0.05$.
2. A random sample of 48 men with new CPA certifications showed a starting salary of $80,168 and a standard deviation of $8000. At the same time, a random sample of 39 women with new CPA certifications showed a starting salary of $70,754 and a standard deviation of $6000.
a. Is there enough evidence to conclude that men are paid more than women? Use a significance level of $\alpha = 0.01$.
7.3: Tests and confidence intervals for difference of proportions
Unit 7.1 dealt with testing whether a proportion was equal to some hypothesized value. But we might be interested in knowing whether proportions from two populations are different from each other. For example, people who live in one area might be exposed to some kind of pollutant, and a researcher might be interested in knowing whether a higher proportion of people from this area contract cancer than people from some other area.
The setup is virtually the same as testing for a difference in two population means. We take a sample of size $n_1$ from the first population and record the sample proportion $\hat{p}_1$ from this population. We then take a sample of size $n_2$ from the second population and record the sample proportion $\hat{p}_2$ from this population. The procedures are as follows.
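Right-sided test:
1. $H_0: p_1 - p_2 = D_0$ versus $H_a: p_1 - p_2 > D_0$
2. RR: $z > z_{\alpha}$
3. Test statistic: $z = \dfrac{(\hat{p}_1 - \hat{p}_2) - D_0}{\sqrt{p_1(1 - p_1)/n_1 + p_2(1 - p_2)/n_2}}$
4. Form conclusion.
Left-sided test:
1. $H_0: p_1 - p_2 = D_0$ versus $H_a: p_1 - p_2 < D_0$
2. RR: $z < -z_{\alpha}$
3. Test statistic: $z = \dfrac{(\hat{p}_1 - \hat{p}_2) - D_0}{\sqrt{p_1(1 - p_1)/n_1 + p_2(1 - p_2)/n_2}}$
4. Form conclusion.
Two-sided test:
1. $H_0: p_1 - p_2 = D_0$ versus $H_a: p_1 - p_2 \neq D_0$
2. RR: $z < -z_{\alpha/2}$ or $z > z_{\alpha/2}$
3. Test statistic: $z = \dfrac{(\hat{p}_1 - \hat{p}_2) - D_0}{\sqrt{p_1(1 - p_1)/n_1 + p_2(1 - p_2)/n_2}}$
4. Form conclusion.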
Note that the procedure is almost identical to the procedure for testing the difference between two means; you just need to calculate the standard error properly. As usual, the standard errors technically depend on the true proportions $p_1$ and $p_2$, but since these are unknown, in practice we substitute the sample proportions $\hat{p}_1$ and $\hat{p}_2$.
The usual case is just to test whether two proportions are different from each other, so we use $D_0 = 0$ in most cases. In other words, the null hypothesis is $p_1 = p_2$. The right-sided test is run against the alternative that $p_1 > p_2$, and the left-sided test is run against the alternative that $p_1 < p_2$.
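Right-sided test:
1. $H_0: p_1 - p_2 = 0$ versus $H_a: p_1 - p_2 > 0$
2. RR: $z > z_{\alpha}$
3. Test statistic: $z = \dfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{p_1(1 - p_1)/n_1 + p_2(1 - p_2)/n_2}}$
4. Form conclusion.
Left-sided test:
1. $H_0: p_1 - p_2 = 0$ versus $H_a: p_1 - p_2 < 0$
2. RR: $z < -z_{\alpha}$
3. Test statistic: $z = \dfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{p_1(1 - p_1)/n_1 + p_2(1 - p_2)/n_2}}$
4. Form conclusion.
Two-sided test:
1. $H_0: p_1 - p_2 = 0$ versus $H_a: p_1 - p_2 \neq 0$
2. RR: $z < -z_{\alpha/2}$ or $z > z_{\alpha/2}$
3. Test statistic: $z = \dfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{p_1(1 - p_1)/n_1 + p_2(1 - p_2)/n_2}}$
4. Form conclusion.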
For the case where the null is $p_1 - p_2 = 0$, some textbooks take the null that the two proportions are equal as given and form a "pooled" proportion as follows:
$$\hat{p} = \frac{n_1 \hat{p}_1 + n_2 \hat{p}_2}{n_1 + n_2}$$
This is essentially the weighted average of the two sample proportions. The idea is that this is the "best" estimate of the common proportion, taking as given the null that the two are equal. This pooled proportion is then substituted in the standard error, so that the test statistic is calculated as:
$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\dfrac{\hat{p}(1 - \hat{p})}{n_1} + \dfrac{\hat{p}(1 - \hat{p})}{n_2}}}$$
This technique does not work for the case where we are testing $p_1 - p_2 = D_0$ with $D_0 \neq 0$. Nevertheless, it is good practice for the case where the null is $p_1 - p_2 = 0$.
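A $1 - \alpha$ confidence interval for the true difference $p_1 - p_2$ is given by:
$$\left[ (\hat{p}_1 - \hat{p}_2) - z_{\alpha/2} \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}},\ (\hat{p}_1 - \hat{p}_2) + z_{\alpha/2} \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}} \right]$$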
As an example, suppose that a researcher is interested in testing whether Hondas or Toyotas are more likely to need major repairs within two years of purchase. The researcher takes a sample of 400 Honda owners and 500 Toyota owners. Within the first two years, 53 of the Hondas needed major repairs, while 78 of the Toyotas needed major repairs. We want to test at a significance level $\alpha = 0.10$ whether the two are significantly different.
Note that the sample proportions are:
$$\hat{p}_1 = \frac{53}{400} = 0.1325$$
$$\hat{p}_2 = \frac{78}{500} = 0.1560$$
For hypothesis testing, we will use the pooled proportion:
$$\hat{p} = \frac{400 \cdot 0.1325 + 500 \cdot 0.1560}{400 + 500} = 0.1456$$
1. $H_0: p_1 - p_2 = 0$ versus $H_a: p_1 - p_2 \neq 0$
2. RR: $z < -1.645$ or $z > 1.645$
3. Test statistic: $z = \dfrac{0.1325 - 0.1560}{\sqrt{\dfrac{0.1456(1 - 0.1456)}{400} + \dfrac{0.1456(1 - 0.1456)}{500}}} = -0.99$
4. The test statistic does not fall in the rejection region, so we do not reject the null hypothesis.
In this case, there is not enough evidence to conclude that the proportion of cars needing repairs in the first two years is different between the two companies. Indeed, using the procedure for computing p-values for a two-sided test, the p-value for this test is $2 \cdot P(z < -0.99) = 0.3222$.
Finally, a 95% confidence interval for the true difference in proportions would be:
$$\left[ (0.1325 - 0.1560) \pm 1.96 \sqrt{\frac{0.1325(1 - 0.1325)}{400} + \frac{0.1560(1 - 0.1560)}{500}} \right] \approx [-0.0695, 0.0225]$$
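The Honda and Toyota comparison uses two different standard errors: the pooled one for the test and the unpooled one for the confidence interval. The sketch below keeps them in separate functions to make that distinction visible. It uses only the standard library, and both function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def pooled_two_proportion_test(x1, n1, x2, n2):
    """Two-sided z-test of H0: p1 - p2 = 0 using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # same as (n1*p1 + n2*p2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) / n1 + pooled * (1 - pooled) / n2)
    z = (p1 - p2) / se
    p_value = 2 * NormalDist().cdf(-abs(z))  # both tails
    return z, p_value


def difference_confidence_interval(x1, n1, x2, n2, confidence=0.95):
    """Confidence interval for p1 - p2 using the unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) - margin, (p1 - p2) + margin


z, p_value = pooled_two_proportion_test(53, 400, 78, 500)
low, high = difference_confidence_interval(53, 400, 78, 500)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
print(f"95% CI for p1 - p2: [{low:.4f}, {high:.4f}]")
```

The p-value printed here is computed from the unrounded test statistic, so it may differ slightly from a figure read off the z-table at $-0.99$.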
EXERCISES
1. In 2009, a magazine conducted a survey in which 92% of married men said that they would vote for a woman as President. However, in 1975, a similar poll showed that only 73% would have voted for a woman. Suppose that the 2009 survey consisted of 2000 observations and the 1975 survey consisted of 1500 observations.
a. Is there enough evidence to conclude at the $\alpha = 0.05$ level of significance that a higher proportion of married men were willing to vote for a woman as President in 2009 than in 1975?
b. What is the p-value for the hypothesis test given in (a)?
7.4: Small Sample Sizes
All of our hypothesis tests and confidence intervals so far have relied on an asymptotic normal distribution for the test statistic. For example, the cutoff values for hypothesis tests and the endpoints for the confidence intervals use the $z$ distribution. The reason for this is the central limit theorem, which we encountered in unit 5.3. The sample average from any distribution can be approximated by a normal distribution as long as the sample size is sufficiently large.
Recall the idea from unit 5 that the exact sampling distribution of the sample mean depends upon the distribution of the population from which it was drawn, but as the sample size gets larger, the distribution of the sample mean approaches a normal distribution. But, again, this is a limiting result and the approximation is only good for large sample sizes.
How large is large? The normal rule of thumb is that sample sizes of about 30 are large enough for the central limit theorem to provide a good approximation. So, any time the sample size is larger than about 30, you can apply the tests and confidence intervals given in the previous section.1
What about small sample sizes? That is, what if the sample size is not large enough to apply the central limit theorem and use the normal distribution for our hypothesis tests and confidence intervals? The basic answer is that you can't really say anything general. Because the sampling distribution in the small sample size case depends on the exact form of the population distribution, there is no general procedure for testing hypotheses and constructing confidence intervals.
However, for one special case, we can do something. If the population distribution from which the sample is drawn is a normal distribution, then the exact distribution of the normalized sample mean, using the sample standard deviation, is given by the t-distribution. So, in fact we can implement hypothesis testing and confidence intervals for small sample sizes if the population from which the sample is drawn obeys a normal distribution.
How can we check whether the population distribution is normal? There are tests you can do, but practically the quickest way is to just do a quick plot of the data and see whether it appears to be basically symmetric and without any serious outliers. The sensitivity to the assumptions depends on how small the sample size is. If the sample size is tiny, then the validity of hypothesis tests and confidence intervals is very sensitive to the normality assumption. For larger sample sizes, the t-distribution is fairly robust in the sense that it works well as long as the distribution is reasonably close to normal, i.e. free from very serious skew and/or outliers.
1 One warning about this was already discussed earlier. For tests involving proportions, you need very large sample
To summarize the basic principles for implementing hypothesis testing and confidence intervals with various sample sizes:
• If the sample size is large ($n > 30$ or so), then the central limit theorem applies, and you should use the large-sample tests based on the z-distribution covered in previous sections.
• If the sample size is small ($n < 30$ or so) and the population from which the sample is drawn is normal, you should use the small-sample tests described in this section.
• If the sample size is small ($n < 30$ or so) and the population from which the sample is drawn is not normal, then there is nothing you can do. You don't have enough information to do any reliable statistical inference.
Note that the small-sample tests and confidence intervals based on the t-distribution should never be used for testing a sample proportion. In this case, the population is a series of 0/1 observations (i.e. the condition is either true or false) which by definition do not obey a normal distribution. So the small-sample inferences discussed in this section do not make sense for tests and confidence intervals involving proportions.
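The decision rules above can be summarized as a small function. This is only a sketch of the rule of thumb: the name `choose_procedure`, its return labels, and the exact cutoff of 30 are illustrative assumptions (the notes say "30 or so"), not a formal criterion.

```python
def choose_procedure(n, population_is_normal, is_proportion=False, cutoff=30):
    """Return 'z', 't', or 'none' following the rules of thumb in this section."""
    if n >= cutoff:
        # Large sample: the central limit theorem justifies the z-based procedures.
        return "z"
    if is_proportion:
        # 0/1 data are never normal, so the t-based procedures do not apply.
        return "none"
    if population_is_normal:
        # Small sample from a normal population: use the t-distribution.
        return "t"
    # Small sample, non-normal population: no reliable inference available.
    return "none"


print(choose_procedure(n=100, population_is_normal=False))  # z
print(choose_procedure(n=12, population_is_normal=True))    # t
print(choose_procedure(n=12, population_is_normal=False))   # none
```

Keep in mind the earlier warning that proportions very close to 0 or 1 need far more than 30 observations, which this simple cutoff does not capture.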
For small-sample inference involving the value of a mean, the implementation is virtually the same as the implementation for the large-sample case, with one exception. Hypothesis tests and confidence intervals for the large-sample case are based on the limiting z-distribution, which is a good approximation of the sampling distribution only in the case of large sample sizes. But for the small sample case we instead use the t-distribution, which is the exact sampling distribution when the population from which the sample is drawn is normal.
For small-sample hypothesis tests and confidence intervals involving the value of a mean, we use the t-distribution with $n - 1$ degrees of freedom, where $n$ is the sample size. For purposes of completeness, the procedures for the hypothesis tests and confidence interval are below. They are identical to the large-sample case except for the use of the t-distribution in place of the z-distribution.
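Right-sided test:
1. $H_0: \mu = \mu_0$ versus $H_a: \mu > \mu_0$
2. RR: $t > t_{\alpha}$
3. Test statistic: $t = \dfrac{\bar{x} - \mu_0}{s / \sqrt{n}}$
4. Form conclusion.
Left-sided test:
1. $H_0: \mu = \mu_0$ versus $H_a: \mu < \mu_0$
2. RR: $t < -t_{\alpha}$
3. Test statistic: $t = \dfrac{\bar{x} - \mu_0}{s / \sqrt{n}}$
4. Form conclusion.
Two-sided test:
1. $H_0: \mu = \mu_0$ versus $H_a: \mu \neq \mu_0$
2. RR: $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$
3. Test statistic: $t = \dfrac{\bar{x} - \mu_0}{s / \sqrt{n}}$
4. Form conclusion.
A $1 - \alpha$ confidence interval for the population mean $\mu$ is given by:
$$\left[ \bar{x} - t_{\alpha/2} \frac{s}{\sqrt{n}},\ \bar{x} + t_{\alpha/2} \frac{s}{\sqrt{n}} \right]$$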
As an example, suppose that customers rate airports on a scale of 1-10, and that these ratings are known to be approximately normally distributed. A researcher surveys 12 people at random from Amsterdam's Schiphol Airport about their customer satisfaction and obtains a sample mean $\bar{x} = 7.75$ and a sample standard deviation $s = 1.215$. The researcher is interested in testing whether there is sufficient evidence to conclude that the true mean rating of customers at Schiphol Airport exceeds 7. We want to test at the $\alpha = 0.05$ level of significance.
Applying the steps for a right-sided alternative as given above:
1. $H_0: \mu = 7$ versus $H_a: \mu > 7$
2. RR: $t > 1.796$ (Note that this is read from the line with $n - 1 = 12 - 1 = 11$ degrees of freedom.)
3. Test statistic: $t = \dfrac{7.75 - 7}{1.215 / \sqrt{12}} = 2.14$
4. Since $t$ falls in the rejection region, we reject the null hypothesis.
In this case, we have enough evidence to conclude at the 5% level of significance that the true mean rating for Schiphol Airport does exceed 7.
What is the p-value for the test? We don't have the whole distribution, but if we look at the line in the t-table for 11 degrees of freedom, observe that $t_{0.05} = 1.796$ and that $t_{0.025} = 2.201$. Since our calculated t-statistic for this test is $t = 2.14$, notice that the test rejects the null hypothesis for $\alpha = 0.05$ but would not have fallen in the rejection region for $\alpha = 0.025$. Thus, we know that the p-value is somewhere in the interval $0.025 < p < 0.05$, since the p-value is the lowest level of significance for which the null hypothesis can be rejected.
A 95% confidence interval for the true mean rating for the airport is:
$$\left[ 7.75 - 2.201 \frac{1.215}{\sqrt{12}},\ 7.75 + 2.201 \frac{1.215}{\sqrt{12}} \right] = [6.98, 8.52]$$
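The airport example can be reproduced with the sketch below. The Python standard library has no t-distribution, so the critical values are hard-coded from the t-table line for 11 degrees of freedom, exactly as in the hand calculation. The function and constant names are illustrative.

```python
from math import sqrt

# Critical values read from the t-table line for 11 degrees of freedom.
T_05_DF11 = 1.796   # t_{0.05}
T_025_DF11 = 2.201  # t_{0.025}


def one_sample_t(x_bar, s, n, mu0):
    """t statistic and standard error for H0: mu = mu0."""
    se = s / sqrt(n)
    return (x_bar - mu0) / se, se


t_stat, se = one_sample_t(x_bar=7.75, s=1.215, n=12, mu0=7)

reject = t_stat > T_05_DF11  # right-sided test at alpha = 0.05
ci = (7.75 - T_025_DF11 * se, 7.75 + T_025_DF11 * se)

print(f"t = {t_stat:.2f}, reject H0: {reject}")
print(f"95% CI for mu: [{ci[0]:.2f}, {ci[1]:.2f}]")
```

Notice that the one-sided test uses $t_{0.05}$ while the 95% interval uses $t_{0.025}$, which is why the interval can contain 7 even though the one-sided test rejects.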
For small-sample inference about the difference between two means, the procedures for hypothesis testing and confidence intervals are basically the same as the procedures for the large-sample case, again replacing asymptotic values from the z-distribution with exact values from the t-distribution. In this case, the relevant t-distribution is that with $n_1 + n_2 - 2$ degrees of freedom.
One difference is that these tests and confidence intervals use a "pooled" variance estimator, which is calculated as follows:
$$s_p^2 = \frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}$$
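The procedures for a hypothesis test are as follows:
Right-sided test:
1. $H_0: \mu_1 - \mu_2 = D_0$ versus $H_a: \mu_1 - \mu_2 > D_0$
2. RR: $t > t_{\alpha}$
3. Test statistic: $t = \dfrac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{s_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}}$
4. Form conclusion.
Left-sided test:
1. $H_0: \mu_1 - \mu_2 = D_0$ versus $H_a: \mu_1 - \mu_2 < D_0$
2. RR: $t < -t_{\alpha}$
3. Test statistic: $t = \dfrac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{s_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}}$
4. Form conclusion.
Two-sided test:
1. $H_0: \mu_1 - \mu_2 = D_0$ versus $H_a: \mu_1 - \mu_2 \neq D_0$
2. RR: $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$
3. Test statistic: $t = \dfrac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{s_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}}$
4. Form conclusion.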
Again, the typical case is testing whether two population means are equal. That is, we test whether the difference is $D_0 = 0$.
A $1 - \alpha$ confidence interval for the true difference $\mu_1 - \mu_2$ is given by:
$$\left[ (\bar{x}_1 - \bar{x}_2) - t_{\alpha/2} \sqrt{s_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)},\ (\bar{x}_1 - \bar{x}_2) + t_{\alpha/2} \sqrt{s_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)} \right]$$
For example, suppose that wait times to speak to a reservation agent when calling major airlines are known to be normally distributed. A marketing company randomly placed 22 calls to Delta, waiting an average of 2.5 minutes with a standard deviation of 0.8 minutes. The company also randomly placed 20 calls to Southwest, waiting an average of 2.1 minutes with a standard deviation of 1.1 minutes. The company is interested in testing at a significance level of $\alpha = 0.05$ whether there is a difference in mean waiting times at the two companies.
We first compute the pooled variance:
$$s_p^2 = \frac{(22 - 1) 0.8^2 + (20 - 1) 1.1^2}{22 + 20 - 2} = 0.9108$$
We can now apply the steps for a two-sided test as given above:
1. $H_0: \mu_1 - \mu_2 = 0$ versus $H_a: \mu_1 - \mu_2 \neq 0$
2. RR: $t < -2.021$ or $t > 2.021$ (Note that this is read from the line with $n_1 + n_2 - 2 = 22 + 20 - 2 = 40$ degrees of freedom.)
3. Test statistic: $t = \dfrac{(2.5 - 2.1) - 0}{\sqrt{0.9108 \left( \frac{1}{22} + \frac{1}{20} \right)}} = 1.36$
4. Since $t$ does not fall in the rejection region, we cannot reject the null hypothesis.
The data gathered by the marketing firm does not provide sufficient evidence to conclude that the true mean waiting times for calls placed to the two airlines are actually different.
For the p-value, note that for 40 degrees of freedom $t_{0.05} = 1.684$ and $t_{0.10} = 1.303$. Our calculated test statistic $t = 1.36$ falls between the two, so the area in one tail beyond it is between 0.05 and 0.10. However, since this is a two-sided test, we have to double these. Thus, the p-value is somewhere in the interval $0.10 < p < 0.20$ (because the rejection region has to be symmetric on both sides). This is not very good evidence for rejecting the null hypothesis. The p-value gives the lowest level of significance at which the null hypothesis can be rejected.
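A 95% confidence interval for the true mean difference $\mu_1 - \mu_2$ is given by:
$$\left[ 0.4 - 2.021 \sqrt{0.9108 \left( \frac{1}{22} + \frac{1}{20} \right)},\ 0.4 + 2.021 \sqrt{0.9108 \left( \frac{1}{22} + \frac{1}{20} \right)} \right] = [-0.1959, 0.9959]$$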
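The airline wait-time example can be checked with the following sketch. As before, the critical value is hard-coded from the t-table (40 degrees of freedom) because the standard library has no t-distribution. The names `pooled_variance` and `T_025_DF40` are illustrative.

```python
from math import sqrt

T_025_DF40 = 2.021  # t_{0.025} read from the t-table line for 40 degrees of freedom


def pooled_variance(s1, n1, s2, n2):
    """Pooled variance estimator from two sample standard deviations."""
    return ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)


n1, mean1, s1 = 22, 2.5, 0.8   # Delta
n2, mean2, s2 = 20, 2.1, 1.1   # Southwest

sp2 = pooled_variance(s1, n1, s2, n2)
se = sqrt(sp2 * (1 / n1 + 1 / n2))
t_stat = ((mean1 - mean2) - 0) / se

reject = abs(t_stat) > T_025_DF40  # two-sided test at alpha = 0.05
diff = mean1 - mean2
ci = (diff - T_025_DF40 * se, diff + T_025_DF40 * se)

print(f"pooled variance = {sp2:.4f}, t = {t_stat:.2f}, reject H0: {reject}")
print(f"95% CI for mu1 - mu2: [{ci[0]:.4f}, {ci[1]:.4f}]")
```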
One final methodological note: if the line for the relevant degrees of freedom is not shown on the t-table you are using, it is standard practice to read from the line with the next-lowest number of degrees of freedom. For example, if you need to use the t-distribution with 37 degrees of freedom (which is not given on the table), you should instead use the line for the t-distribution with 35 degrees of freedom.
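The "next-lowest line" rule can be expressed as a simple lookup. The table below is a hypothetical excerpt containing only a few rows of $t_{0.025}$ values, and `table_row` is an illustrative helper. Rounding the degrees of freedom down gives a slightly larger critical value, which errs on the side of not rejecting.

```python
from bisect import bisect_right

# Hypothetical excerpt of a t-table: degrees of freedom -> t_{0.025}.
T_025_TABLE = {30: 2.042, 35: 2.030, 40: 2.021, 60: 2.000}


def table_row(df, table):
    """Return the largest tabulated degrees of freedom that does not exceed df."""
    rows = sorted(table)
    idx = bisect_right(rows, df) - 1
    if idx < 0:
        raise ValueError("df is below the smallest row in the table")
    return rows[idx]


row = table_row(37, T_025_TABLE)
print(f"df = 37 is read from the line for df = {row}: t = {T_025_TABLE[row]}")
```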
EXERCISES
For the exercises below, you can assume that the populations from which the samples are drawn are normally distributed.
1. Environmental regulations specify that the mean level of some toxin in fish be lower than 102 ppm. A field worker takes a sample of 5 fish and obtains the following readings:
{99, 102, 94, 99, 95}.
a. Is there enough evidence to conclude at significance level $\alpha = 0.05$ that the regulated standard is being met?
b. What is the p-value for the test in (a)?
c. Form a 99% confidence interval for the true mean toxicity level.
7.5: Tests and Confidence Intervals for Variances
The previous sections have dealt with hypothesis tests and confidence intervals for means and proportions. In some circumstances, we may also be interested in inference involving variances. For example, a company might produce parts on its machines and specify that the variance in the part's weight should not exceed 0.001 grams. That is, the company is not interested in studying the mean weight of the parts, but rather is interested in studying the variability in the parts it produces. In this section, we will cover testing whether a variance is equal to a particular null value. In the next section, we will cover comparing variances of two different populations.
The central limit theorem told us that, for large enough sample sizes, the asymptotic distribution of the sample mean is always approximately normal, regardless of the population distribution from which it was drawn. Unfortunately, there is no similar result for sample variances. That is, even in large samples, the distribution of the sample variance will always depend upon the distribution of the population from which the sample is drawn. Thus, there simply are no general results for inference dealing with variances, even for large sample sizes.
However, in the special case where the population distribution is normal, it is known that the scaled sample variance $(n - 1)s^2 / \sigma^2$ obeys a $\chi^2$ distribution. This is read "chi-squared distribution". To emphasize, inferences in this section dealing with variances are only valid for the case where the population distribution is normal. Even for large samples, the $\chi^2$ distribution describes the distribution of the sample variance only for the case where the population from which the data are generated is normal. In this specific case, we can develop hypothesis tests and confidence intervals as given below.
The details of the test are shown below. The null is that the true population variance is equal to some null value $\sigma_0^2$. We can then implement a hypothesis test of this null hypothesis against right-sided, left-sided or two-sided alternatives. We use the sample variance $s^2$ to construct a test statistic, and we compare this test statistic against our rejection region, based on the $\chi^2$ distribution with $n - 1$ degrees of freedom, tables of which are easily accessible.
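Right-Sided
1. $H_0: \sigma^2 = \sigma_0^2$ versus $H_a: \sigma^2 > \sigma_0^2$
2. RR: $\chi^2 > \chi^2_{\alpha}$
3. $\chi^2 = \frac{(n-1)s^2}{\sigma_0^2}$
4. Form conclusion.

Left-Sided
1. $H_0: \sigma^2 = \sigma_0^2$ versus $H_a: \sigma^2 < \sigma_0^2$
2. RR: $\chi^2 < \chi^2_{1-\alpha}$
3. $\chi^2 = \frac{(n-1)s^2}{\sigma_0^2}$
4. Form conclusion.

Two-Sided
1. $H_0: \sigma^2 = \sigma_0^2$ versus $H_a: \sigma^2 \neq \sigma_0^2$
2. RR: $\chi^2 > \chi^2_{\alpha/2}$ or $\chi^2 < \chi^2_{1-\alpha/2}$
3. $\chi^2 = \frac{(n-1)s^2}{\sigma_0^2}$
4. Form conclusion.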
Some care is needed with the critical values for test statistics. When we use the z-distribution, for example, the cutoff value for an upper-tailed test with $\alpha = 0.05$ significance is $z_{0.05} = 1.645$. This is the z-statistic that cuts off the top 5% of the distribution. But since the distribution is symmetric, the critical value for a lower-tailed test cuts off the lower 5% of the distribution. This is $z_{0.95} = -z_{0.05} = -1.645$. Similarly, the cutoff values for a 2-tailed test with 5% significance level are $z_{0.025} = 1.96$ and $z_{0.975} = -z_{0.025} = -1.96$.
However, if the distribution were not symmetric about zero, then it would not be true in general that $z_{0.95} = -z_{0.05}$. For example, suppose we are using the $\chi^2$ distribution with 10 degrees of freedom. To test against a right-sided alternative at significance level $\alpha = 0.05$, the relevant rejection region is $\chi^2 > 18.3$ since $\chi^2_{0.05} = 18.3$. However, to test against a left-sided alternative at the same significance level, the rejection region is $\chi^2 < 3.94$ since $\chi^2_{0.95} = 3.94$. In other words, this is the critical value that chops off the bottom 5% of the distribution. To test against a two-sided alternative at the same significance level, the relevant rejection region is $\chi^2 > 20.5$ or $\chi^2 < 3.25$ since $\chi^2_{0.025} = 20.5$ and $\chi^2_{0.975} = 3.25$.
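The tail areas behind these critical values can be checked numerically. The sketch below is not part of the original notes. It relies on a closed-form expression for the upper-tail probability of the chi-squared distribution that holds only when the degrees of freedom are even, which happens to cover the 10-degrees-of-freedom case here. The helper name `chi2_upper_tail` is hypothetical.

```python
import math

def chi2_upper_tail(x, df):
    """P(chi-squared with df degrees of freedom > x), for EVEN df only.

    Assumption: df is a positive even integer, in which case the tail
    probability equals exp(-x/2) * sum_{i < df/2} (x/2)**i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only handles positive even df")
    half = x / 2.0
    total = sum(half ** i / math.factorial(i) for i in range(df // 2))
    return math.exp(-half) * total

# Critical values quoted in the text for 10 degrees of freedom
for crit in (18.3, 3.94, 20.5, 3.25):
    print(crit, round(chi2_upper_tail(crit, 10), 3))
```

The upper-tail areas should come out close to 0.05, 0.95, 0.025 and 0.975 respectively, which illustrates that the lower critical values are not simply the negatives of the upper ones.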
A $1 - \alpha$ confidence interval for the true population variance is given by:
$$\left[\frac{(n-1)s^2}{\chi^2_{\alpha/2}},\ \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}\right]$$
As an example, suppose that fill measurements in soda cans are known to be normally distributed. An inspector is interested in testing whether there is evidence that the variance in fill measurements is less than 0.01 ounces. He takes a random sample of 10 cans and finds that the sample variance is 0.0016 ounces. We want to use these data to test the inspector's hypothesis at a significance level $\alpha = 0.05$. We implement the steps for a left-sided hypothesis test as given above:
1. $H_0: \sigma^2 = 0.01$ versus $H_a: \sigma^2 < 0.01$
2. RR: $\chi^2 < 3.33$ (Note that this is read from the line with $n - 1 = 10 - 1 = 9$ degrees of freedom and using $\chi^2_{1-\alpha} = \chi^2_{0.95}$).
3. $\chi^2 = \frac{(10-1) \cdot 0.0016}{0.01} = 1.44$
4. Since $\chi^2 = 1.44 < 3.33$ falls in the rejection region, we can reject the null hypothesis.
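The arithmetic in the soda-can test can be reproduced with a short script. This is only an illustrative sketch: the function name `variance_test_statistic` is hypothetical, and the critical value 3.33 is simply copied from the chi-squared table as in the text rather than computed.

```python
def variance_test_statistic(n, sample_var, null_var):
    """Chi-squared test statistic (n - 1) * s^2 / sigma_0^2."""
    return (n - 1) * sample_var / null_var

# Soda-can example: n = 10, s^2 = 0.0016, sigma_0^2 = 0.01
chi2_stat = variance_test_statistic(10, 0.0016, 0.01)

# Left-sided critical value chi^2_{0.95} with 9 degrees of freedom,
# taken from a table (assumed input, not computed here)
critical_value = 3.33

# For a left-sided test we reject when the statistic is BELOW the cutoff
reject_null = chi2_stat < critical_value

print(round(chi2_stat, 2), reject_null)
```

For a right-sided test the comparison would flip to `chi2_stat > critical_value`, with the critical value read from the upper tail of the table instead.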
A 95% confidence interval for the true variance in fill measurements is given by:
$$\left[\frac{(10-1) \cdot 0.0016}{19.0},\ \frac{(10-1) \cdot 0.0016}{2.70}\right] = [0.000758,\ 0.005333]$$
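The interval can be recomputed in the same way. In this sketch the function name `variance_confidence_interval` is hypothetical, and the two chi-squared values (19.0 and 2.70, for 9 degrees of freedom) are table lookups passed in as arguments.

```python
def variance_confidence_interval(n, sample_var, chi2_upper, chi2_lower):
    """Interval [(n-1)s^2 / chi2_upper, (n-1)s^2 / chi2_lower].

    chi2_upper: table value chi^2_{alpha/2} (cuts off the top alpha/2).
    chi2_lower: table value chi^2_{1-alpha/2} (cuts off the bottom alpha/2).
    """
    scaled = (n - 1) * sample_var
    # Dividing by the LARGER critical value gives the LOWER endpoint
    return scaled / chi2_upper, scaled / chi2_lower

lower, upper = variance_confidence_interval(10, 0.0016, 19.0, 2.70)
print(round(lower, 6), round(upper, 6))
```

Note that the interval is not symmetric around the sample variance 0.0016, which again reflects the asymmetry of the chi-squared distribution.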
EXERCISES
1. A company produces machined engine parts that are supposed to have a diameter variance that is no greater than 0.0002 inches. A random sample of 10 parts gave a sample variance of 0.0003. Is this enough evidence to reject the null hypothesis at a 5% level of significance?
2. An experimenter was convinced that the variability in his measuring equipment yielded a variance of 4, but 16 measurements resulted in a sample variance of 6.1.
a. Determine whether there is enough evidence to reject the experimenter's claim at significance level $\alpha = 0.05$.
b. What is the p-value associated with this test?
7.6: Tests for Equality of Variances
In the previous section, we tested whether a true population variance took on a particular value. In this section, we will compare two population variances and test whether the two variances are equal or whether there is evidence that the variances are different.
As with the previous section, these tests are valid only under the assumption that the population distributions are both normal. Even in large samples, these tests are specifically for comparing variances of normally distributed populations. There is no generally valid distribution to use for inference when dealing with non-normal populations. However, for normally distributed populations, we use the fact that the sampling distribution for the ratio of two sample variances is known to obey the F-distribution.
The testing procedures are as given below. The idea is that we are testing whether the true variances from two different populations $\sigma_1^2$ and $\sigma_2^2$ are equal to each other. We can test against the right-sided alternative $\sigma_1^2 > \sigma_2^2$, the left-sided alternative $\sigma_1^2 < \sigma_2^2$ or against the two-sided alternative $\sigma_1^2 \neq \sigma_2^2$. The test statistic is based on the observed sample variances from the two populations $s_1^2$ and $s_2^2$.
Right-Sided
1. $H_0: \sigma_1^2 = \sigma_2^2$ versus $H_a: \sigma_1^2 > \sigma_2^2$
2. RR: $F > F_{\alpha}$
3. $F = \frac{s_1^2}{s_2^2}$
4. Form conclusion.

Left-Sided
1. $H_0: \sigma_1^2 = \sigma_2^2$ versus $H_a: \sigma_1^2 < \sigma_2^2$
2. RR: $F > F_{\alpha}$
3. $F = \frac{s_2^2}{s_1^2}$
4. Form conclusion.

Two-Sided
1. $H_0: \sigma_1^2 = \sigma_2^2$ versus $H_a: \sigma_1^2 \neq \sigma_2^2$
2. RR: $F > F_{\alpha/2}$
3. $F = \frac{\text{larger sample variance}}{\text{smaller sample variance}}$
4. Form conclusion.
The relevant F-distribution to use for the critical values depends on the numerator and the denominator degrees of freedom. Whichever sample variance is in the numerator of the test statistic, the "numerator degrees of freedom" is the sample size used in calculating this sample variance minus one. Whichever sample variance is in the denominator of the test statistic, the "denominator degrees of freedom" is the sample size used in calculating this sample variance minus one.
For example, suppose that a manager wants to test the variability in production levels at two different feed plants. Production levels are known to be normally distributed. He records daily production levels at the two plants. At the first plant, he records production levels for 13 days and observes a mean production level of 26.3 tons with a variance of 67.24 tons. At the second plant, he records production levels for 18 days, and observes a mean production level of 19.7 tons with a variance of 22.09 tons. Is there sufficient evidence to conclude at significance level $\alpha = 0.05$ that the variability in production levels differs between the two plants?
Since no direction is specified for the alternative, it is appropriate to use a two-sided alternative. Following the procedures outlined above:
1. $H_0: \sigma_1^2 = \sigma_2^2$ versus $H_a: \sigma_1^2 \neq \sigma_2^2$
2. RR: $F > 2.82$ (Note that this is read from the table giving critical values for $F_{\alpha/2} = F_{0.025}$. The larger sample variance will appear in the numerator of our test statistic, and from this first plant we have 13 observations. The smaller sample variance will appear in the denominator of our test statistic, and from this second plant we have 18 observations. Thus, the relevant critical value uses 12 degrees of freedom in the numerator and 17 degrees of freedom in the denominator).
3. $F = \frac{67.24}{22.09} = 3.04$
4. Since the test statistic falls in the rejection region, we reject the null hypothesis.
Thus, there is sufficient evidence for the manager to reject the null hypothesis that the variability in production levels at the two plants is equal in favor of the alternative that the variability differs between the two plants.
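The two-sided procedure for this example can be sketched in a few lines. The function name `two_sided_f_statistic` is hypothetical, and the critical value 2.82 is taken from the F table as quoted above rather than computed.

```python
def two_sided_f_statistic(var_a, n_a, var_b, n_b):
    """Return (F, numerator df, denominator df) for the two-sided test.

    The larger sample variance goes in the numerator, and each degrees
    of freedom value is the corresponding sample size minus one.
    """
    if var_a >= var_b:
        return var_a / var_b, n_a - 1, n_b - 1
    return var_b / var_a, n_b - 1, n_a - 1

# Plant 1: variance 67.24 over 13 days; plant 2: variance 22.09 over 18 days
f_stat, df_num, df_den = two_sided_f_statistic(67.24, 13, 22.09, 18)

# F_{0.025} with (12, 17) degrees of freedom, read from a table (assumed input)
critical_value = 2.82
reject_null = f_stat > critical_value

print(round(f_stat, 2), df_num, df_den, reject_null)
```

Putting the larger variance on top means the statistic is always at least 1, so only the upper-tail critical value $F_{\alpha/2}$ needs to be looked up.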
EXERCISES
1. Suppose that inflation rates are known to be normally distributed. A country hires a new central bank president, and some people are concerned that inflation rates are becoming more volatile. A random sample of 6 monthly inflation rates before the new president took over showed a mean of 2.4167 and a variance of 2.0618 (in percentage terms). A random sample of 5 monthly inflation rates taken after the new president took over showed a mean of 4.36 and a variance of 8.8381. Is this enough evidence to conclude at significance level $\alpha = 0.01$ that the variance in inflation rates is higher under the new president than it was before the new president took over? What can you say about the p-value of the test?
2. The closing prices of two common stocks were recorded for a period of 16 days. Stock prices are known to be normally distributed over short periods of time. The standard deviation in closing prices for the first stock was 1.24 but the variance in closing prices for the second stock was 1.72. Is this enough evidence to conclude at significance level $\alpha = 0.05$ that there is a difference in variability in closing prices of the two stocks? What