swm02_22.ppt

(1)

(2)

Chapter 22

(3)

Comparing Two Proportions

 Comparisons between two percentages are much

more common than questions about isolated percentages. And they are more interesting.

 We often want to know how two groups differ,

whether a treatment is better than a placebo

(4)

Another Ruler

 In order to examine the difference between two

proportions, we need another ruler—the standard deviation of the sampling distribution model for

the difference between two proportions.

 Recall that standard deviations don’t add, but

(5)

The Standard Deviation of the Difference Between Two Proportions

 Proportions observed in independent random

samples are independent. Thus, we can add their variances. So…

 The standard deviation of the difference between

two sample proportions is

 Thus, the standard error is





1 1 2 2

1 2

ˆ ˆ p q p q

SD p p

n n

  





1 1 2 2

1 2

ˆ ˆ ˆ ˆ ˆ ˆ p q p q

SE p p

n n

(6)

Assumptions and Conditions

 Independence Assumptions:

 Randomization Condition: The data in each group

should be drawn independently and at random from a homogeneous population or generated by a

randomized comparative experiment.

 The 10% Condition: If the data are sampled without

replacement, the sample should not exceed 10% of the population.

 Independent Groups Assumption: The two groups

(7)

Assumptions and Conditions (cont.)

 Sample Size Condition:

 Each of the groups must be big enough…

 Success/Failure Condition: Both groups are big

(8)

The Sampling Distribution

 We already know that for large enough samples,

each of our proportions has an approximately Normal sampling distribution.

(9)

The Sampling Distribution (cont.)

 Provided that the sampled values are

independent, the samples are independent, and the samples sizes are large enough, the sampling distribution of is modeled by a Normal

model with

 Mean:

 Standard deviation:

1 2

p







1 2

ˆ ˆ

p p





1 1 2 2

1 2

ˆ ˆ p q p q

SD p p

n n

(10)

Two-Proportion

z

-Interval

 When the conditions are met, we are ready to find the

confidence interval for the difference of two proportions:

 The confidence interval is

where

 The critical value z* depends on the particular confidence

level, C, that you specify.





1 1 2 2 1 2

1 2

ˆ ˆ ˆ ˆ ˆ ˆ p q p q

SE p p

n n

  



p

ˆ

1

p

ˆ

2



z

SE p



ˆ

1

p

ˆ

2





(11)

Example 1: Prostate Cancer

There has been a debate among doctors over whether surgery can prolong life among men suffering from prostate cancer, a type of cancer that typically develops and spreads very slowly. In the

summer of 2003, The New England Journal of Medicine published results of some Scandinavian research. Men diagnosed with

prostate cancer were randomly assigned to either undergo surgery or not. Among the 347 men who had surgery, 16 eventually died of prostate cancer, compared with 31 of the 348 men who did not have surgery.

a. Was this an experiment or an observational study?

b. Create a 95% confidence interval for the difference in rates of death

for the two groups of men.

c. Based on you confidence interval, is there evidence that surgery

(12)

Example 1 continued….

a. Experiment – we’re assigning treatments! b. Check conditions….(-.0802,-.0058)

c. Since the entire interval lies under 0, there is

(13)

Everyone into the Pool

 The typical hypothesis test for the difference in

two proportions is the one of no difference. In symbols, H₀: p₁ – p₂ = 0.

 Since we are hypothesizing that there is no

difference between the two proportions, that means that the standard deviations for each proportion are the same.

 Since this is the case, we combine (pool) the

(14)

Everyone into the Pool (cont.)

 The pooled proportion is

where and

 If the numbers of successes are not whole

numbers, round them first. (This is the only

time you should round values in the middle of a calculation.)

1 2

ˆ

_pooled

Success

p

n







1 1 1

ˆ

(15)

Everyone into the Pool (cont.)

 We then put this pooled value into the formula,

substituting it for both sample proportions in the standard error formula:



1 2



1 2

ˆ

pooled pooled pooled pooled

pooled

p

q

p

q

SE

p

n

(16)

Compared to What?

 We’ll reject our null hypothesis if we see a large

enough difference in the two proportions.

 How can we decide whether the difference we

see is large?

 Just compare it with its standard deviation.

 Unlike previous hypothesis testing situations, the

null hypothesis doesn’t provide a standard

(17)

Two-Proportion

z

-Test

 The conditions for the two-proportion z-test are

the same as for the two-proportion z-interval.

 We are testing the hypothesis H

0: p1 = p2.

 Because we hypothesize that the proportions are

equal, we pool them to find

1 2

ˆ

_pooled

Success

p

n





(18)

Two-Proportion

z

-Test (cont.)

 We use the pooled value to estimate the standard error:

 Now we find the test statistic:

 When the conditions are met and the null hypothesis is true,

this statistic follows the standard Normal model, so we can



1 2



1 2

ˆ ˆ ˆ ˆ

ˆ ˆ pooled pooled pooled pooled

pooled

p q p q

SE p p

n n   





1 2 1 2

ˆ

pooled

p

z

SE

p





(19)

Example 2: Snoring and Age

The National Sleep Foundation asked a random sample of 1010 US adults questions about their sleep habits. The sample was selected in the fall of 2001 from random

telephone numbers, stratified by region and sex,

guaranteeing that an equal number of men and women were interviewed (2002 Sleep in America Poll, National Sleep Foundation, Washington, D.C.). One of the

questions asked about snoring. Of the 995 respondents, 37% of adults reported that they snored at least a few

nights a week during the past year. Split into two age categories, 26% of the 184 people under 30 snored,

(20)

Example 2 continued…

Hypothesis:

The null hypothesis is that there is no difference in snoring rates in the two age groups. The alternative hypothesis is that the rates are different.

0 :

0









young old

A

young old

p

H

p

(21)

Example 2 continued…

Model

1. Randomization: The patients were randomly selected

by telephone number and stratified by sex and region.

2. Independence: The two groups are independent of each

other because the sample was selected at random.

3. 10% condition: The number of adults surveyed is

certainly far less that 10% of the population.

4. Success Failure: In the younger age group, 48 snored

and 136 didn’t. In the older group, 318 snored and 493 didn’t. The observed numbers of both successes and failures are more than 10 for both groups.

(22)

(23)

Example 2 continued…

Conclusion:

Using the 2-proportion z-test, the P-value of .0008 is very small. This means that if there really were no difference in snoring rates between the age

groups, the probability of the difference observed in this study is only .08%. This is so small that I reject the null hypothesis. I conclude that there is a difference in the rate of snoring between older and younger adults.