Definition 1.21 The sampling distribution of a sample statistic calculated from a sample of n measurements is the probability distribution of the statistic.

In actual practice, the sampling distribution of a statistic is obtained mathematically or by simulating the sampling on a computer using the procedure described previously.

If ¯y has been calculated from a sample of n = 25 measurements selected from a population with mean μ = 10 and standard deviation σ = 5, the sampling distribution shown in Figure 1.13 provides all the information you may wish to know about its behavior. For example, the probability that you will draw a sample of 25 measurements and obtain a value of ¯y in the interval 9 ≤ ¯y ≤ 10 will be the area under the sampling distribution over that interval.
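The simulation approach can be sketched in a few lines of Python. This is only an illustration under a stated assumption: the text does not give the shape of the population, so the sketch draws from a normal population with μ = 10 and σ = 5. The function name `simulate_sample_means` and the number of repetitions are hypothetical choices, not part of the text.

```python
import random
import statistics

def simulate_sample_means(mu, sigma, n, reps, seed=0):
    """Draw `reps` samples of size n and return the list of sample means.

    Assumption: the population is normal; the text does not specify its shape.
    """
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(reps)
    ]

means = simulate_sample_means(mu=10, sigma=5, n=25, reps=20_000)

# Relative frequency of simulated sample means that fall in 9 <= ybar <= 10
prob_estimate = sum(9 <= m <= 10 for m in means) / len(means)

# Area under a normal curve with mean 10 and standard error 5 / sqrt(25) = 1
ybar_dist = statistics.NormalDist(mu=10, sigma=5 / 25 ** 0.5)
prob_exact = ybar_dist.cdf(10) - ybar_dist.cdf(9)

print(f"simulated: {prob_estimate:.3f}, normal-curve area: {prob_exact:.3f}")
```

The two numbers should agree closely, which illustrates that the area under the sampling distribution over an interval is the long-run relative frequency of sample means that land in that interval.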

Generally speaking, if we use a statistic to make an inference about a population parameter, we want its sampling distribution to center about the parameter (as is the case in Figure 1.13) and the standard deviation of the sampling distribution, called the standard error of estimate, to be as small as possible.

Two theorems provide information on the sampling distribution of a sample mean.

Theorem 1.1

If y_1, y_2, ..., y_n represent a random sample of n measurements from a large (or infinite) population with mean μ and standard deviation σ, then, regardless of the form of the population relative frequency distribution, the mean and standard error of estimate of the sampling distribution of ¯y will be

Mean: $E(\bar{y}) = \mu_{\bar{y}} = \mu$

Standard error of estimate: $\sigma_{\bar{y}} = \dfrac{\sigma}{\sqrt{n}}$
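Theorem 1.1 can be checked empirically. The sketch below is an illustration only: it assumes an exponential population with mean 2 (a skewed population whose standard deviation also equals 2), a sample size of n = 16, and 20,000 repeated samples. All of these values are hypothetical choices made for the demonstration.

```python
import math
import random
import statistics

rng = random.Random(1)
mu = sigma = 2.0          # an exponential population has mean = standard deviation
n, reps = 16, 20_000      # hypothetical sample size and number of repeated samples

# Mean of each of the repeated samples drawn from the skewed population
sample_means = [
    statistics.fmean(rng.expovariate(1 / mu) for _ in range(n))
    for _ in range(reps)
]

mean_of_means = statistics.fmean(sample_means)   # compare with mu
sd_of_means = statistics.stdev(sample_means)     # compare with sigma / sqrt(n)
theory_se = sigma / math.sqrt(n)

print(f"mean of sample means: {mean_of_means:.3f} (theory: {mu})")
print(f"std. dev. of sample means: {sd_of_means:.3f} (theory: {theory_se})")
```

Even though the population is far from normal, the simulated mean and standard deviation of ¯y should land close to μ and σ/√n, as the theorem states.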

Theorem 1.2

The Central Limit Theorem: For large sample sizes, the mean ¯y of a sample from a population with mean μ and standard deviation σ has a sampling distribution that is approximately normal, regardless of the probability distribution of the sampled population. The larger the sample size, the better will be the normal approximation to the sampling distribution of ¯y.

Theorems 1.1 and 1.2 together imply that for sufficiently large samples, the sampling distribution for the sample mean ¯y will be approximately normal with mean μ and standard error σ_¯y = σ/√n. The parameters μ and σ are the mean and standard deviation of the sampled population.

How large must the sample size n be so that the normal distribution provides a good approximation for the sampling distribution of ¯y? The answer depends on the shape of the distribution of the sampled population, as shown by Figure 1.14. Generally speaking, the greater the skewness of the sampled population distribution, the larger the sample size must be before the normal distribution is an adequate approximation for the sampling distribution of ¯y. For most sampled populations, sample sizes of n ≥ 30 will suffice for the normal approximation to be reasonable. We will use the normal approximation for the sampling distribution of ¯y when the sample size is at least 30.
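The effect of sample size on the shape of the sampling distribution can be illustrated numerically. The sketch below is an illustration under assumptions: it samples from an exponential population (strongly right-skewed) and measures the skewness of the simulated sample means for n = 2, 5, and 30. The helper names `skewness` and `sample_mean_skewness` are hypothetical.

```python
import random
import statistics

def skewness(values):
    """Sample skewness: the average cubed standardized deviation."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)

def sample_mean_skewness(n, reps=20_000, seed=2):
    """Skewness of `reps` simulated sample means, each from a sample of size n.

    Assumption: the sampled population is exponential with mean 1.
    """
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(reps)
    ]
    return skewness(means)

skew_by_n = {n: sample_mean_skewness(n) for n in (2, 5, 30)}
for n, skew in skew_by_n.items():
    print(f"n = {n:2d}: skewness of sample means = {skew:.2f}")
```

A normal distribution has skewness 0, so the steady drop toward 0 as n grows reflects the improving normal approximation. A more heavily skewed population would need a larger n to reach the same level.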

Figure 1.14 Sampling distributions of ¯x for different populations and different sample sizes

[Figure panels: original population (uniform, triangular bimodal, exponential, normal), each followed by the sampling distribution of ¯x for n = 2, n = 5, and n = 30.]

Example 1.10

Suppose we have selected a random sample of n = 25 observations from a population with mean equal to 80 and standard deviation equal to 5. It is known that the population is not extremely skewed.

(a) Sketch the relative frequency distributions for the population and for the sampling distribution of the sample mean, ¯y.

(b) Find the probability that ¯y will exceed 82.

Solution

(a) We do not know the exact shape of the population relative frequency distribution, but we do know that it should be centered about μ = 80, its spread should be measured by σ = 5, and it is not highly skewed. One possibility is shown in Figure 1.15(a). From the central limit theorem, we know that the sampling distribution of ¯y will be approximately normal since the sampled population distribution is not extremely skewed. We also know that the sampling distribution will have mean and standard deviation

$$\mu_{\bar{y}} = \mu = 80 \quad\text{and}\quad \sigma_{\bar{y}} = \frac{\sigma}{\sqrt{n}} = \frac{5}{\sqrt{25}} = 1$$

The sampling distribution of ¯y is shown in Figure 1.15(b).

Figure 1.15 A population relative frequency distribution and the sampling distribution for ¯y

[Panel (a): population relative frequency distribution, y axis marked 70, 75, μ = 80, 85, 90. Panel (b): sampling distribution of ¯y, axis marked 77, μ_¯y = 80, 83.]

(b) The probability that ¯y will exceed 82 is equal to the highlighted area in Figure 1.16. To find this area, we need to find the z-value corresponding to ¯y = 82. Recall that the standard normal random variable z is the difference between any normally distributed random variable and its mean, expressed in units of its standard deviation. Since ¯y is approximately normally distributed with mean μ_¯y = μ and standard deviation σ_¯y = σ/√n, it follows that the standard normal z-value corresponding to the sample mean, ¯y, is

$$z = \frac{(\text{Normal random variable}) - (\text{Mean})}{\text{Standard deviation}} = \frac{\bar{y} - \mu_{\bar{y}}}{\sigma_{\bar{y}}}$$

Therefore, for ¯y = 82, we have

$$z = \frac{\bar{y} - \mu_{\bar{y}}}{\sigma_{\bar{y}}} = \frac{82 - 80}{1} = 2$$

The area A in Figure 1.16 corresponding to z = 2 is given in the table of areas under the normal curve (see Table 1 of Appendix C) as .4772. Therefore, the tail area corresponding to the probability that ¯y exceeds 82 is

$$P(\bar{y} > 82) = P(z > 2) = .5 - .4772 = .0228$$

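Figure 1.16 The sampling distribution of ¯y

[Normal curve centered at μ_¯y = 80; the point 82 lies 2σ_¯y = 2 above the mean; A marks the area between 80 and 82, and the tail area beyond 82 is P(¯y > 82).]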
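The calculation in part (b) of Example 1.10 can be checked numerically instead of with a printed normal table. The sketch below uses Python's standard-library `statistics.NormalDist`; the variable names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 80, 5, 25
standard_error = sigma / sqrt(n)          # 5 / sqrt(25) = 1

z = (82 - mu) / standard_error            # z-value corresponding to ybar = 82
tail_area = 1 - NormalDist().cdf(z)       # area to the right of z under the standard normal curve

print(f"z = {z:.1f}, P(ybar > 82) = {tail_area:.4f}")
# prints: z = 2.0, P(ybar > 82) = 0.0228
```

The result matches the table-based value: the area between 0 and z = 2 is .4772, so the upper tail is .5 − .4772 = .0228.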

The central limit theorem can also be used to justify the fact that the sum of the sample measurements possesses a sampling distribution that is approximately normal for large sample sizes. In fact, since many statistics are obtained by summing or averaging random quantities, the central limit theorem helps to explain why many statistics have mound-shaped (or approximately normal) sampling distributions.

As we proceed, we encounter many different sample statistics, and we need to know their sampling distributions to evaluate the reliability of each one for making inferences. These sampling distributions are described as the need arises.

1.8 Estimating a Population Mean

We can make an inference about a population parameter in two ways:

1. Estimate its value.

2. Make a decision about its value (i.e., test a hypothesis about its value).

In this section, we illustrate the concepts involved in estimation, using the estimation of a population mean as an example. Tests of hypotheses will be discussed in Section 1.9.

To estimate a population parameter, we choose a sample statistic that has two desirable properties: (1) a sampling distribution that centers about the parameter and (2) a small standard error. If the mean of the sampling distribution of a statistic equals the parameter we are estimating, we say that the statistic is an unbiased estimator of the parameter. If not, we say that it is biased.

In Section 1.7, we noted that the sampling distribution of the sample mean is approximately normally distributed for moderate to large sample sizes and that it possesses a mean μ and standard error σ/√n. Therefore, as shown in Figure 1.17, ¯y is an unbiased estimator of the population mean μ, and the probability that ¯y will fall within 1.96σ_¯y = 1.96σ/√n of the true value of μ is approximately .95.∗

Figure 1.17 Sampling distribution of ¯y

[Curve f(¯y) centered at μ; the central area of .95 lies within 1.96σ_¯y on either side of μ.]

Figure 1.18 Locating z_{α/2} on the standard normal curve

[Standard normal curve centered at 0, with area α/2 in each tail beyond −z_{α/2} and z_{α/2}.]

Since ¯y will fall within 1.96σ_¯y of μ approximately 95% of the time, it follows that the interval

$$\bar{y} - 1.96\sigma_{\bar{y}} \quad\text{to}\quad \bar{y} + 1.96\sigma_{\bar{y}}$$

will enclose μ approximately 95% of the time in repeated sampling. This interval is called a 95% confidence interval, and .95 is called the confidence coefficient.

Notice that μ is fixed and that the confidence interval changes from sample to sample. The probability that a confidence interval calculated using the formula

$$\bar{y} \pm 1.96\sigma_{\bar{y}}$$

will enclose μ is approximately .95. Thus, the confidence coefficient measures the confidence that we can place in a particular confidence interval.
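The repeated-sampling interpretation of the confidence coefficient can be illustrated by simulation. The sketch below is a demonstration under assumptions: it borrows μ = 80, σ = 5, and n = 25 from Example 1.10, assumes a normal population, treats σ as known, and uses 10,000 repeated samples. None of these choices comes from this section.

```python
import math
import random
import statistics

rng = random.Random(3)
mu, sigma, n, reps = 80.0, 5.0, 25, 10_000   # assumed values for the demonstration
half_width = 1.96 * sigma / math.sqrt(n)     # 1.96 times the standard error

covered = 0
for _ in range(reps):
    # Draw one sample and form the interval ybar +/- 1.96 * (standard error)
    ybar = statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
    if ybar - half_width <= mu <= ybar + half_width:
        covered += 1

coverage = covered / reps
print(f"fraction of intervals enclosing mu: {coverage:.3f}")
```

The fraction should come out close to .95. Note what varies in the loop: μ stays fixed while the interval endpoints move with each new sample.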

Confidence intervals can be constructed using any desired confidence coefficient. For example, if we define z_{α/2} to be the value of a standard normal variable that places the area α/2 in the right tail of the z distribution (see Figure 1.18), then a 100(1 − α)% confidence interval for μ is given in the box.
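The value z_{α/2} can be computed directly rather than read from a table. The sketch below uses the standard library's `statistics.NormalDist.inv_cdf`; the helper name `z_alpha_over_2` is a hypothetical choice for illustration.

```python
from statistics import NormalDist

def z_alpha_over_2(confidence):
    """Return the z value with area alpha/2 to its right, where confidence = 1 - alpha."""
    alpha = 1 - confidence
    # The area to the left of z_{alpha/2} is 1 - alpha/2
    return NormalDist().inv_cdf(1 - alpha / 2)

for confidence in (0.90, 0.95, 0.99):
    print(f"{confidence:.0%} confidence: z = {z_alpha_over_2(confidence):.3f}")
# prints:
# 90% confidence: z = 1.645
# 95% confidence: z = 1.960
# 99% confidence: z = 2.576
```

For a confidence coefficient of .95, α = .05 and z_{.025} = 1.96, which is the multiplier used in the interval ¯y ± 1.96σ_¯y above.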

In document (7th Edition) William Mendenhall-A Second Course in Statistics_ Regression Analysis-Prentice Hall (2011) (Page 45-49)