Chapter 22
Comparing Two Proportions
Comparisons between two percentages are much
more common than questions about isolated percentages. And they are more interesting.
We often want to know how two groups differ,
whether a treatment is better than a placebo
Another Ruler
In order to examine the difference between two
proportions, we need another ruler—the standard deviation of the sampling distribution model for
the difference between two proportions.
Recall that standard deviations don’t add, but
The Standard Deviation of the Difference Between Two Proportions
Proportions observed in independent random
samples are independent. Thus, we can add their variances. So…
The standard deviation of the difference between
two sample proportions is
Thus, the standard error is
1 1 2 21 2
1 2
ˆ ˆ p q p q
SD p p
n n
1 1 2 21 2
1 2
ˆ ˆ ˆ ˆ ˆ ˆ p q p q
SE p p
n n
Assumptions and Conditions
Independence Assumptions:
Randomization Condition: The data in each group
should be drawn independently and at random from a homogeneous population or generated by a
randomized comparative experiment.
The 10% Condition: If the data are sampled without
replacement, the sample should not exceed 10% of the population.
Independent Groups Assumption: The two groups
Assumptions and Conditions (cont.)
Sample Size Condition:
Each of the groups must be big enough…
Success/Failure Condition: Both groups are big
The Sampling Distribution
We already know that for large enough samples,
each of our proportions has an approximately Normal sampling distribution.
The Sampling Distribution (cont.)
Provided that the sampled values are
independent, the samples are independent, and the samples sizes are large enough, the sampling distribution of is modeled by a Normal
model with
Mean:
Standard deviation:
1 2
p
p
1 2
ˆ ˆ
p p
1 1 2 21 2
1 2
ˆ ˆ p q p q
SD p p
n n
Two-Proportion
z
-Interval
When the conditions are met, we are ready to find the
confidence interval for the difference of two proportions:
The confidence interval is
where
The critical value z* depends on the particular confidence
level, C, that you specify.
1 1 2 2 1 21 2
ˆ ˆ ˆ ˆ ˆ ˆ p q p q
SE p p
n n
p
ˆ
1p
ˆ
2
z
SE p
ˆ
1p
ˆ
2
Example 1: Prostate Cancer
There has been a debate among doctors over whether surgery can prolong life among men suffering from prostate cancer, a type of cancer that typically develops and spreads very slowly. In the
summer of 2003, The New England Journal of Medicine published results of some Scandinavian research. Men diagnosed with
prostate cancer were randomly assigned to either undergo surgery or not. Among the 347 men who had surgery, 16 eventually died of prostate cancer, compared with 31 of the 348 men who did not have surgery.
a. Was this an experiment or an observational study?
b. Create a 95% confidence interval for the difference in rates of death
for the two groups of men.
c. Based on you confidence interval, is there evidence that surgery
Example 1 continued….
a. Experiment – we’re assigning treatments! b. Check conditions….(-.0802,-.0058)
c. Since the entire interval lies under 0, there is
Everyone into the Pool
The typical hypothesis test for the difference in
two proportions is the one of no difference. In symbols, H0: p1 – p2 = 0.
Since we are hypothesizing that there is no
difference between the two proportions, that means that the standard deviations for each proportion are the same.
Since this is the case, we combine (pool) the
Everyone into the Pool (cont.)
The pooled proportion is
where and
If the numbers of successes are not whole
numbers, round them first. (This is the only
time you should round values in the middle of a calculation.)
1 2
1 2
ˆ
pooledSuccess
Success
p
n
n
1 1 1
ˆ
Everyone into the Pool (cont.)
We then put this pooled value into the formula,
substituting it for both sample proportions in the standard error formula:
1 2
1 2
ˆ
ˆ
ˆ
ˆ
ˆ
ˆ
pooled pooled pooled pooledpooled
p
q
p
q
SE
p
p
n
n
Compared to What?
We’ll reject our null hypothesis if we see a large
enough difference in the two proportions.
How can we decide whether the difference we
see is large?
Just compare it with its standard deviation.
Unlike previous hypothesis testing situations, the
null hypothesis doesn’t provide a standard
Two-Proportion
z
-Test
The conditions for the two-proportion z-test are
the same as for the two-proportion z-interval.
We are testing the hypothesis H
0: p1 = p2.
Because we hypothesize that the proportions are
equal, we pool them to find
1 2
1 2
ˆ
pooledSuccess
Success
p
n
n
Two-Proportion
z
-Test (cont.)
We use the pooled value to estimate the standard error:
Now we find the test statistic:
When the conditions are met and the null hypothesis is true,
this statistic follows the standard Normal model, so we can
1 2
1 2
ˆ ˆ ˆ ˆ
ˆ ˆ pooled pooled pooled pooled
pooled
p q p q
SE p p
n n
1 2 1 2ˆ
ˆ
ˆ
ˆ
pooledp
p
z
SE
p
p
Example 2: Snoring and Age
The National Sleep Foundation asked a random sample of 1010 US adults questions about their sleep habits. The sample was selected in the fall of 2001 from random
telephone numbers, stratified by region and sex,
guaranteeing that an equal number of men and women were interviewed (2002 Sleep in America Poll, National Sleep Foundation, Washington, D.C.). One of the
questions asked about snoring. Of the 995 respondents, 37% of adults reported that they snored at least a few
nights a week during the past year. Split into two age categories, 26% of the 184 people under 30 snored,
Example 2 continued…
Hypothesis:
The null hypothesis is that there is no difference in snoring rates in the two age groups. The alternative hypothesis is that the rates are different.
0
:
0
:
0
young old
A
young old
p
p
H
p
p
Example 2 continued…
Model
1. Randomization: The patients were randomly selected
by telephone number and stratified by sex and region.
2. Independence: The two groups are independent of each
other because the sample was selected at random.
3. 10% condition: The number of adults surveyed is
certainly far less that 10% of the population.
4. Success Failure: In the younger age group, 48 snored
and 136 didn’t. In the older group, 318 snored and 493 didn’t. The observed numbers of both successes and failures are more than 10 for both groups.
Example 2 continued…
Conclusion:
Using the 2-proportion z-test, the P-value of .0008 is very small. This means that if there really were no difference in snoring rates between the age
groups, the probability of the difference observed in this study is only .08%. This is so small that I reject the null hypothesis. I conclude that there is a difference in the rate of snoring between older and younger adults.