DISTRIBUTION OF THE DIFFERENCE BETWEEN TWO SAMPLE MEANS

SAMPLING DISTRIBUTIONS

2. Sampling is from a nonnormally distributed population with a known population variance:

5.4 DISTRIBUTION OF THE DIFFERENCE BETWEEN TWO SAMPLE MEANS

Frequently the interest in an investigation is focused on two populations. Specifically, an investigator may wish to know something about the difference between two population means. In one investigation, for example, a researcher may wish to know if it is reasonable to conclude that two population means are different. In another situation, the researcher may desire knowledge about the magnitude of the difference between two population means. A medical research team, for example, may want to know whether or not the mean serum cholesterol level is higher in a population of sedentary office workers than in a population of laborers. If the researchers are able to conclude that the population means are different, they may wish to know by how much they differ. A knowledge of the sampling distribution of the difference between two means is useful in investigations of this type.

Sampling from Normally Distributed Populations The following example illustrates the construction and the characteristics of the sampling distribution of the difference between sample means when sampling is from two normally distributed populations.

EXAMPLE 5.4.1

Suppose we have two populations of individuals: one population (population 1) has experienced some condition thought to be associated with mental retardation, and the other population (population 2) has not experienced the condition. The distribution of intelligence scores in each of the two populations is believed to be approximately normally distributed with a standard deviation of 20.

Suppose, further, that we take a sample of 15 individuals from each population and compute for each sample the mean intelligence score with the following results: $\bar{x}_1 = 92$ and $\bar{x}_2 = 105$. If there is no difference between the two populations with respect to their true mean intelligence scores, what is the probability of observing a difference this large or larger between sample means?

Solution: To answer this question we need to know the nature of the sampling distribution of the relevant statistic, the difference between two sample means, $\bar{x}_1 - \bar{x}_2$. Notice that we seek a probability associated with the difference between two sample means rather than a single mean. ■

Sampling Distribution of $\bar{x}_1 - \bar{x}_2$: Construction Although, in practice, we would not attempt to construct the desired sampling distribution, we can conceptualize the manner in which it could be done when sampling is from finite populations.

We would begin by selecting from population 1 all possible samples of size 15 and computing the mean for each sample. We know that there would be ${}_{N_1}C_{n_1}$ such samples, where $N_1$ is the population size and $n_1 = 15$. Similarly, we would select all possible samples of size 15 from population 2 and compute the mean for each of these samples. We would then take all possible pairs of sample means, one from population 1 and one from population 2, and take the difference. Table 5.4.1 shows the results of following this procedure.

Note that the 1’s and 2’s in the last line of this table are not exponents, but indicators of population 1 and 2, respectively.

TABLE 5.4.1 Working Table for Constructing the Distribution of the Difference Between Two Sample Means

Samples from    Samples from    Sample Means    Sample Means    All Possible Differences
Population 1    Population 2    Population 1    Population 2    Between Means

                                                                $\bar{x}_{{}_{N_1}C_{n_1}1} - \bar{x}_{{}_{N_2}C_{n_2}2}$

Sampling Distribution of $\bar{x}_1 - \bar{x}_2$: Characteristics It is the distribution of the differences between sample means that we seek. If we plotted the sample differences against their frequency of occurrence, we would obtain a normal distribution with a mean equal to $\mu_1 - \mu_2$, the difference between the two population means, and a variance equal to $(\sigma_1^2/n_1) + (\sigma_2^2/n_2)$. That is, the standard error of the difference between sample means would be equal to $\sqrt{(\sigma_1^2/n_1) + (\sigma_2^2/n_2)}$. It should be noted that these properties convey two important points. First, the means of two distributions can be subtracted from one another, or summed together, using standard arithmetic operations. Second, since the overall variance of the sampling distribution will be affected by both contributing distributions, the variances will always be summed even if we are interested in the difference of the means. This last fact assumes that the two distributions are independent of one another.
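The construction and the two properties described above can be checked by brute force on a very small case. The following is a minimal sketch, not the text's own procedure: the two toy populations and the sample sizes are hypothetical, and samples are enumerated with replacement (an assumption made here so that the variance formula $(\sigma_1^2/n_1) + (\sigma_2^2/n_2)$ holds exactly; when sampling without replacement from a small finite population, a finite population correction would apply).

```python
from itertools import product
from statistics import fmean, pvariance

# Hypothetical toy populations (not from the text), small enough to enumerate.
pop1 = [2, 4, 6, 8]
pop2 = [1, 3, 5]
n1, n2 = 2, 2

def all_sample_means(pop, n):
    """Means of every ordered sample of size n drawn with replacement."""
    return [fmean(sample) for sample in product(pop, repeat=n)]

means1 = all_sample_means(pop1, n1)
means2 = all_sample_means(pop2, n2)

# Pair every sample mean from population 1 with every one from population 2.
diffs = [m1 - m2 for m1 in means1 for m2 in means2]

mean_of_diffs = fmean(diffs)
var_of_diffs = pvariance(diffs)

# Values predicted by the properties stated in the text.
expected_mean = fmean(pop1) - fmean(pop2)
expected_var = pvariance(pop1) / n1 + pvariance(pop2) / n2

print(len(diffs), mean_of_diffs, expected_mean)
print(var_of_diffs, expected_var)
```

The enumerated mean and variance of the differences agree with the predicted values up to floating-point rounding, even though the quantity of interest is a difference: the means subtract, while the variances add.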

For our present example we would have a normal distribution with a mean of 0 (if there is no difference between the two population means) and a variance of $[(20)^2/15] + [(20)^2/15] = 53.3333$. The graph of the sampling distribution is shown in Figure 5.4.1.

Converting to z We know that the normal distribution described in Example 5.4.1 can be transformed to the standard normal distribution by means of a modification of a previously learned formula. The new formula is as follows:

$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} \tag{5.4.1}$$

The area under the curve of $\bar{x}_1 - \bar{x}_2$ corresponding to the probability we seek is the area to the left of $\bar{x}_1 - \bar{x}_2 = 92 - 105 = -13$. The z value corresponding to $-13$, assuming that there is no difference between population means, is

$$z = \frac{-13 - 0}{\sqrt{\dfrac{(20)^2}{15} + \dfrac{(20)^2}{15}}} = \frac{-13}{\sqrt{53.3}} = \frac{-13}{7.3} = -1.78$$

By consulting Table D, we find that the area under the standard normal curve to the left of $-1.78$ is equal to .0375. In answer to our original question, we say that if there is no difference between population means, the probability of obtaining a difference between sample means as large as or larger than 13 is .0375.

FIGURE 5.4.1 Graph of the sampling distribution of $\bar{x}_1 - \bar{x}_2$ when there is no difference between population means, Example 5.4.1.
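The arithmetic of Example 5.4.1 can be reproduced with a few lines of Python. This is a sketch using the standard library's `statistics.NormalDist` in place of Table D; the variable names are illustrative and do not come from the text.

```python
from math import sqrt
from statistics import NormalDist

# Values given in Example 5.4.1.
sigma1 = sigma2 = 20       # population standard deviations
n1 = n2 = 15               # sample sizes
xbar1, xbar2 = 92, 105     # observed sample means
hypothesized_diff = 0      # mu1 - mu2 if the population means are equal

# Standard error of the difference between sample means.
std_error = sqrt(sigma1**2 / n1 + sigma2**2 / n2)

# Formula 5.4.1.
z = ((xbar1 - xbar2) - hypothesized_diff) / std_error

# Area under the standard normal curve to the left of z.
prob = NormalDist().cdf(z)

print(round(std_error, 4), round(z, 2), round(prob, 4))
```

The computed z and probability agree with the hand calculation to the number of decimal places carried in the text.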

Sampling from Normal Populations The procedure we have just followed is valid even when the sample sizes, $n_1$ and $n_2$, are different and when the population variances, $\sigma_1^2$ and $\sigma_2^2$, have different values. The theoretical results on which this procedure is based may be summarized as follows.

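Given two normally distributed populations with means $\mu_1$ and $\mu_2$ and variances $\sigma_1^2$ and $\sigma_2^2$, respectively, the sampling distribution of the difference, $\bar{x}_1 - \bar{x}_2$, between the means of independent samples of size $n_1$ and $n_2$ drawn from these populations is normally distributed with mean $\mu_1 - \mu_2$ and variance $(\sigma_1^2/n_1) + (\sigma_2^2/n_2)$.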
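The summary result above translates directly into a small helper. The sketch below is one possible way to express it; the function name `diff_of_means_dist` and all numeric inputs are hypothetical and chosen only to show unequal variances and unequal sample sizes.

```python
from math import sqrt
from statistics import NormalDist

def diff_of_means_dist(mu1, var1, n1, mu2, var2, n2):
    """Sampling distribution of xbar1 - xbar2 for independent samples
    drawn from two normally distributed populations."""
    mean = mu1 - mu2                    # means subtract
    variance = var1 / n1 + var2 / n2    # variances add
    return NormalDist(mu=mean, sigma=sqrt(variance))

# Hypothetical inputs, not taken from the text.
dist = diff_of_means_dist(mu1=120, var1=144, n1=36,
                          mu2=110, var2=225, n2=25)

# Probability that the difference between sample means is 15 or more.
p_at_least_15 = 1 - dist.cdf(15)

print(dist.mean, dist.variance, round(p_at_least_15, 4))
```

Returning a `NormalDist` object rather than a single number keeps the mean, the variance, and any tail area available from one call.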

Sampling from Nonnormal Populations Many times a researcher is faced with one or the other of the following problems: the necessity of (1) sampling from nonnormally distributed populations, or (2) sampling from populations whose functional forms are not known. A solution to these problems is to take large samples, since when the sample sizes are large the central limit theorem applies and the distribution of the difference between two sample means is at least approximately normally distributed with a mean equal to $\mu_1 - \mu_2$ and a variance of $(\sigma_1^2/n_1) + (\sigma_2^2/n_2)$. To find probabilities associated with specific values of the statistic, then, our procedure would be the same as that given when sampling is from normally distributed populations.
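A small simulation can illustrate the large-sample claim. The sketch below assumes two exponential (strongly skewed) populations; the population means, sample sizes, number of replications, and seed are all hypothetical choices made for illustration, not values from the text.

```python
import random
from math import sqrt
from statistics import fmean, pvariance

random.seed(12345)  # fixed seed so the sketch is repeatable

# Hypothetical skewed populations: exponential with the given means.
# The variance of an exponential distribution is its mean squared.
mean1, mean2 = 10.0, 6.0
n1, n2 = 40, 50
replications = 5000

def sample_mean(pop_mean, n):
    """Mean of n independent draws from an exponential population."""
    return fmean([random.expovariate(1 / pop_mean) for _ in range(n)])

# One simulated difference between sample means per replication.
diffs = [sample_mean(mean1, n1) - sample_mean(mean2, n2)
         for _ in range(replications)]

theory_mean = mean1 - mean2
theory_var = mean1**2 / n1 + mean2**2 / n2
theory_se = sqrt(theory_var)

sim_mean = fmean(diffs)
sim_var = pvariance(diffs)

# Share of differences within 1.96 standard errors of the theoretical mean;
# a normal distribution would place about 95% of its area there.
within = sum(abs(d - theory_mean) <= 1.96 * theory_se for d in diffs) / replications

print(f"mean: simulated {sim_mean:.3f}, theory {theory_mean:.3f}")
print(f"variance: simulated {sim_var:.3f}, theory {theory_var:.3f}")
print(f"share within 1.96 SE: {within:.3f}")
```

With samples of this size the simulated mean, variance, and coverage should fall close to the theoretical values, although the exact figures depend on the seed and the number of replications.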

EXAMPLE 5.4.2

Suppose it has been established that for a certain type of client the average length of a home visit by a public health nurse is 45 minutes with a standard deviation of 15 minutes, and that for a second type of client the average home visit is 30 minutes long with a standard deviation of 20 minutes. If a nurse randomly visits 35 clients from the first and 40 from the second population, what is the probability that the average length of home visit will differ between the two groups by 20 or more minutes?

Solution: No mention is made of the functional form of the two populations, so let us assume that this characteristic is unknown, or that the populations are not normally distributed. Since the sample sizes are large (greater than 30) in both cases, we draw on the results of the central limit theorem to answer the question posed. We know that the difference between sample means is at least approximately normally distributed with the following mean and variance:

$$\mu_{\bar{x}_1 - \bar{x}_2} = \mu_1 - \mu_2 = 45 - 30 = 15$$

$$\sigma^2_{\bar{x}_1 - \bar{x}_2} = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2} = \frac{(15)^2}{35} + \frac{(20)^2}{40} = 16.4286$$

The area under the curve of $\bar{x}_1 - \bar{x}_2$ that we seek is that area to the right of 20. The corresponding value of z in the standard normal is

$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} = \frac{20 - 15}{\sqrt{16.4286}} = \frac{5}{4.0532} = 1.23$$

In Table D we find that the area to the right of $z = 1.23$ is $1 - .8907 = .1093$. We say, then, that the probability of the nurse's random visits resulting in a difference between the two means as great as or greater than 20 minutes is .1093. The curve of $\bar{x}_1 - \bar{x}_2$ and the corresponding standard normal curve are shown in Figure 5.4.2. ■
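The home visit calculation can be checked numerically in the same way. This is a sketch with illustrative variable names; it uses `statistics.NormalDist` in place of Table D and follows the text in treating the event as the upper tail, a difference of 20 or more minutes.

```python
from math import sqrt
from statistics import NormalDist

# Values given in Example 5.4.2.
mu1, sigma1, n1 = 45, 15, 35   # first type of client
mu2, sigma2, n2 = 30, 20, 40   # second type of client
threshold = 20                 # difference between sample means of interest

mean_diff = mu1 - mu2
var_diff = sigma1**2 / n1 + sigma2**2 / n2
z = (threshold - mean_diff) / sqrt(var_diff)

# Upper-tail area using the unrounded z.
prob = 1 - NormalDist().cdf(z)

# Upper-tail area using z rounded to two decimals, as a printed table requires.
prob_table = 1 - NormalDist().cdf(round(z, 2))

print(round(var_diff, 4), round(z, 4), round(prob, 4), round(prob_table, 4))
```

Because a printed table requires z to be rounded to two decimal places, the table-based probability and the unrounded probability differ slightly in the third decimal place.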

FIGURE 5.4.2 Sampling distribution of $\bar{x}_1 - \bar{x}_2$ and the corresponding standard normal distribution, home visit example.

EXERCISES

5.4.1 The study cited in Exercises 5.3.1 and 5.3.2 gives the following data on serum cholesterol levels in U.S. females:

Population    Age      Mean    Standard Deviation
A             20–29    183     37.2
B             30–39    189     34.7

Use these estimates as the mean and standard deviation for the respective U.S. populations. Suppose we select a simple random sample of size 50 independently from each population. What is the probability that the difference between sample means will be more than 8?

5.4.2 In the study cited in Exercises 5.3.4 and 5.3.5, the calcium levels in men and women ages 60 years or older are summarized in the following table:

          Mean    Standard Deviation
Men       797     482
Women     660     414

Use these estimates as the mean and standard deviation for the U.S. populations for these age groups. If we take a random sample of 40 men and 35 women, what is the probability of obtaining a difference between sample means of 100 mg or more?

5.4.3 Given two normally distributed populations with equal means and variances of $\sigma_1^2$ and $\sigma_2^2$, what is the probability that samples of size $n_1$ and $n_2$ will yield a value of $\bar{x}_1 - \bar{x}_2$ greater than or equal to 8?

5.4.4 Given two normally distributed populations with equal means and variances of $\sigma_1^2$ and $\sigma_2^2$, what is the probability that samples of size $n_1$ and $n_2$ will yield a value of $\bar{x}_1 - \bar{x}_2$ as large as or larger than 12?

5.4.5 For a population of 17-year-old boys and 17-year-old girls, the means and standard deviations, respectively, of their subscapular skinfold thickness values are as follows: boys, 9.7 and 6.0; girls, 15.6 and 9.5. Simple random samples of 40 boys and 35 girls are selected from the populations. What is the probability that the difference between sample means will be greater than 10?

5.5 DISTRIBUTION OF THE
