DISTRIBUTION OF THE DIFFERENCE BETWEEN TWO SAMPLE PROPORTIONS

SAMPLING DISTRIBUTIONS


5.6 DISTRIBUTION OF THE DIFFERENCE BETWEEN TWO SAMPLE PROPORTIONS

Often there are two population proportions in which we are interested, and we wish to assess the probability associated with a difference in proportions computed from samples drawn from each of these populations. The relevant sampling distribution is the distribution of the difference between the two sample proportions.

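Sampling Distribution of $\hat{p}_1 - \hat{p}_2$: Characteristics The characteristics of this sampling distribution may be summarized as follows:

If independent random samples of size $n_1$ and $n_2$ are drawn from two populations of dichotomous variables where the proportions of observations with the characteristic of interest in the two populations are $p_1$ and $p_2$, respectively, the distribution of the difference between sample proportions, $\hat{p}_1 - \hat{p}_2$, is approximately normal with mean

$$\mu_{\hat{p}_1 - \hat{p}_2} = p_1 - p_2$$

and variance

$$\sigma^2_{\hat{p}_1 - \hat{p}_2} = \frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}$$

when $n_1$ and $n_2$ are large.

We consider $n_1$ and $n_2$ sufficiently large when $n_1 p_1$, $n_2 p_2$, $n_1(1 - p_1)$, and $n_2(1 - p_2)$ are all greater than 5.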
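The summary above can be turned into a small calculation. The following is a minimal sketch, not part of the original text: the function name `diff_of_proportions`, the `DiffOfProportions` container, and the input values are all hypothetical and chosen only for illustration. It computes the mean and variance of the difference between sample proportions and checks the "greater than 5" rule.

```python
from dataclasses import dataclass


@dataclass
class DiffOfProportions:
    """Normal approximation to the distribution of p1_hat - p2_hat."""
    mean: float          # p1 - p2
    variance: float      # p1(1 - p1)/n1 + p2(1 - p2)/n2
    large_enough: bool   # True when all four products exceed 5


def diff_of_proportions(p1: float, n1: int, p2: float, n2: int) -> DiffOfProportions:
    """Return the mean, variance, and sample-size check for p1_hat - p2_hat."""
    mean = p1 - p2
    variance = p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2
    # Rule of thumb from the text: n1*p1, n1*(1 - p1), n2*p2, n2*(1 - p2) > 5.
    large_enough = min(n1 * p1, n1 * (1 - p1), n2 * p2, n2 * (1 - p2)) > 5
    return DiffOfProportions(mean, variance, large_enough)


# Hypothetical inputs, not taken from the text.
print(diff_of_proportions(p1=0.40, n1=50, p2=0.30, n2=60))
```

If `large_enough` comes back `False`, the normal approximation described above should not be relied on for those sample sizes.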

Sampling Distribution of $\hat{p}_1 - \hat{p}_2$: Construction To physically construct the sampling distribution of the difference between two sample proportions, we would proceed in the manner described in Section 5.4 for constructing the sampling distribution of the difference between two means.

Given two sufficiently small populations, one would draw, from population 1, all possible simple random samples of size $n_1$ and compute, from each set of sample data, the sample proportion $\hat{p}_1$. From population 2, one would draw independently all possible simple random samples of size $n_2$ and compute, for each set of sample data, the sample proportion $\hat{p}_2$. One would compute the differences between all possible pairs of sample proportions, where one number of each pair was a value of $\hat{p}_1$ and the other a value of $\hat{p}_2$. The sampling distribution of the difference between sample proportions, then, would consist of all such distinct differences, accompanied by their frequencies (or relative frequencies) of occurrence. For large finite or infinite populations, one could approximate the sampling distribution of the difference between sample proportions by drawing a large number of independent simple random samples and proceeding in the manner just described.
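The approximation by repeated sampling can be sketched in a few lines. This is an illustrative simulation only, under stated assumptions: the populations are treated as infinite (each observation is an independent draw with success probability $p_1$ or $p_2$), and the function name `simulate_differences`, the seed, the number of repetitions, and the input values are hypothetical rather than taken from the text.

```python
import random
from statistics import mean, pvariance


def simulate_differences(p1, n1, p2, n2, reps=5000, seed=0):
    """Draw `reps` independent pairs of samples and return p1_hat - p2_hat for each."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        # Count observations with the characteristic of interest in each sample.
        x1 = sum(rng.random() < p1 for _ in range(n1))
        x2 = sum(rng.random() < p2 for _ in range(n2))
        diffs.append(x1 / n1 - x2 / n2)
    return diffs


# Hypothetical inputs, not taken from the text.
diffs = simulate_differences(p1=0.40, n1=50, p2=0.30, n2=60)
print("simulated mean:    ", mean(diffs))
print("simulated variance:", pvariance(diffs))
print("theoretical mean:    ", 0.40 - 0.30)
print("theoretical variance:", 0.40 * 0.60 / 50 + 0.30 * 0.70 / 60)
```

With enough repetitions, the simulated mean and variance should fall close to $p_1 - p_2$ and $p_1(1 - p_1)/n_1 + p_2(1 - p_2)/n_2$, and a histogram of `diffs` should look roughly bell shaped.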


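To answer probability questions about the difference between two sample proportions, then, we use the following formula:

$$z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\dfrac{p_1(1 - p_1)}{n_1} + \dfrac{p_2(1 - p_2)}{n_2}}} \tag{5.6.1}$$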
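As a rough computational companion to the z transformation, the sketch below uses Python's standard `statistics.NormalDist` in place of a printed normal table. The function names `z_score`, `upper_tail`, and `lower_tail` are hypothetical helpers introduced here for illustration; the call at the bottom uses the figures of Example 5.6.1 (proportions .28 and .21, samples of 100 each, observed difference .10).

```python
from math import sqrt
from statistics import NormalDist


def z_score(diff, p1, n1, p2, n2):
    """Standardize an observed difference p1_hat - p2_hat."""
    standard_error = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (diff - (p1 - p2)) / standard_error


def upper_tail(diff, p1, n1, p2, n2):
    """Approximate P(p1_hat - p2_hat >= diff) under the normal approximation."""
    return 1 - NormalDist().cdf(z_score(diff, p1, n1, p2, n2))


def lower_tail(diff, p1, n1, p2, n2):
    """Approximate P(p1_hat - p2_hat <= diff) under the normal approximation."""
    return NormalDist().cdf(z_score(diff, p1, n1, p2, n2))


# Figures from Example 5.6.1: p1 = .28, p2 = .21, n1 = n2 = 100, difference .10.
print("z =", z_score(0.10, 0.28, 100, 0.21, 100))
print("P(difference >= .10) =", upper_tail(0.10, 0.28, 100, 0.21, 100))
```

Because the code keeps z unrounded, its result can differ in the third decimal place from a value read off a table with z rounded to two decimals.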

EXAMPLE 5.6.1

The 1999 National Health Interview Survey, released in 2003 (A-7), reported that 28 percent of the subjects self-identifying as white said they had experienced lower back pain during the three months prior to the survey. Among subjects of Hispanic origin, 21 percent reported lower back pain. Let us assume that .28 and .21 are the proportions for the respective races reporting lower back pain in the United States. What is the probability that independent random samples of size 100 drawn from each of the populations will yield a value of $\hat{p}_1 - \hat{p}_2$ as large as .10?

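Solution: We assume that the sampling distribution of $\hat{p}_1 - \hat{p}_2$ is approximately normal with mean

$$\mu_{\hat{p}_1 - \hat{p}_2} = .28 - .21 = .07$$

and variance

$$\sigma^2_{\hat{p}_1 - \hat{p}_2} = \frac{(.28)(.72)}{100} + \frac{(.21)(.79)}{100} = .003675$$

The area corresponding to the probability we seek is the area under the curve of $\hat{p}_1 - \hat{p}_2$ to the right of .10. Transforming to the standard normal distribution gives

$$z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\dfrac{p_1(1 - p_1)}{n_1} + \dfrac{p_2(1 - p_2)}{n_2}}} = \frac{.10 - .07}{\sqrt{.003675}} = .49$$

Consulting Table D, we find that the area under the standard normal curve that lies to the right of $z = .49$ is $1 - .6879 = .3121$. The probability of observing a difference as large as .10 is, then, .3121. ■

EXAMPLE 5.6.2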

In the 1999 National Health Interview Survey (A-7), researchers found that among U.S. adults ages 75 or older, 34 percent had lost all their natural teeth and for U.S. adults ages 65–74, 26 percent had lost all their natural teeth. Assume that these proportions are the parameters for the United States in those age groups. If a random sample of 250 adults ages 75 or older and an independent random sample of 200 adults ages 65–74 years old are drawn from these populations, find the probability that the difference in percent of total natural teeth loss is less than 5 percent between the two populations.

Solution: We assume that the sampling distribution of $\hat{p}_1 - \hat{p}_2$ is approximately normal.

The mean difference in proportions of those losing all their teeth is

$$\mu_{\hat{p}_1 - \hat{p}_2} = .34 - .26 = .08$$

and the variance is

$$\sigma^2_{\hat{p}_1 - \hat{p}_2} = \frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2} = \frac{(.34)(.66)}{250} + \frac{(.26)(.74)}{200} = .00186$$

The area of interest under the curve of $\hat{p}_1 - \hat{p}_2$ is that to the left of .05. The corresponding z value is

$$z = \frac{.05 - (.08)}{\sqrt{.00186}} = -.70$$

Consulting Table D, we find that the area to the left of $z = -.70$ is .2420. ■

EXERCISES

5.6.1 According to the 2000 U.S. Census Bureau (A-8), in 2000, 9.5 percent of children in the state of Ohio were not covered by private or government health insurance. In the neighboring state of Pennsylvania, 4.9 percent of children were not covered by health insurance. Assume that these proportions are parameters for the child populations of the respective states. If a random sample of size 100 children is drawn from the Ohio population, and an independent random sample of size 120 is drawn from the Pennsylvania population, what is the probability that the samples would yield a difference, $\hat{p}_1 - \hat{p}_2$, of .09 or more?

5.6.2 In the report cited in Exercise 5.6.1 (A-8), the Census Bureau stated that for Americans in the age group 18–24 years, 64.8 percent had private health insurance. In the age group 25–34 years, the percentage was 72.1. Assume that these percentages are the population parameters in those age groups for the United States. Suppose we select a random sample of 250 Americans from the 18–24 age group and an independent random sample of 200 Americans from the age group 25–34; find the probability that $\hat{p}_2 - \hat{p}_1$ is less than 6 percent.

5.6.3 From the results of a survey conducted by the U.S. Bureau of Labor Statistics (A-9), it was estimated that 21 percent of workers employed in the Northeast participated in health care benefits programs that included vision care. The percentage in the South was 13 percent. Assume these percentages are population parameters for the respective U.S. regions. Suppose we select a simple random sample of size 120 northeastern workers and an independent simple random sample of 130 southern workers. What is the probability that the difference between sample proportions, $\hat{p}_1 - \hat{p}_2$, will be between .04 and .20?

5.7 SUMMARY

This chapter is concerned with sampling distributions. The concept of a sampling distribution is introduced, and the following important sampling distributions are covered:
