9.2 Sample Size Determination for Estimation

In previous sections, constructing a confidence interval to estimate an unknown parameter (μ or P) involved the following steps:

1. Getting a random sample of size n from the population.

2. Computing the point estimate based on the sample.

3. Choosing the appropriate formula for the (1-α)100% confidence interval based on the problem (e.g., Is the population variance known? Is X normally distributed? Is Xi a binary variable taking the value 1 for “success” or 0 for “failure” on the ith trial, i = 1, 2, …, n? Is the sample size large?).

4. Interpreting the resulting (1-α)100% confidence interval.

Suppose that the population variance σ² is known and we state, say, that “we are (1-α)100% confident that μ is within the interval $\left(\bar{X} - z_{\alpha/2}\,\sigma/\sqrt{n},\ \bar{X} + z_{\alpha/2}\,\sigma/\sqrt{n}\right)$.”

Note that μ is within the (1-α)100% confidence interval if and only if the error $e = |\bar{X} - \mu|$ of estimating μ using $\bar{X}$ is at most $z_{\alpha/2}\,\sigma/\sqrt{n}$. Thus, saying “we are (1-α)100% confident that μ is within the interval $\left(\bar{X} - z_{\alpha/2}\,\sigma/\sqrt{n},\ \bar{X} + z_{\alpha/2}\,\sigma/\sqrt{n}\right)$” is equivalent to saying “we are (1-α)100% confident that the error e of estimating μ using $\bar{X}$ does not exceed $z_{\alpha/2}\,\sigma/\sqrt{n}$.”
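The bound on the error can be evaluated numerically. The following is a minimal sketch, assuming Python's standard-library `statistics.NormalDist` for the critical value; the function name `margin_of_error` and the inputs (σ = 40, n = 100) are illustrative and not taken from the notes.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(sigma: float, n: int, confidence: float = 0.95) -> float:
    """Return z_{alpha/2} * sigma / sqrt(n), the largest error at this confidence."""
    alpha = 1.0 - confidence
    # Upper alpha/2 critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return z * sigma / sqrt(n)


# Hypothetical inputs: sigma = 40, n = 100, 95% confidence.
e = margin_of_error(sigma=40.0, n=100)
print(f"margin of error: {e:.2f}")  # prints "margin of error: 7.84"
```

With these assumed inputs, the interval is the sample mean plus or minus about 7.84, which is the same statement as "the error does not exceed 7.84."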

Now suppose a researcher wants to estimate μ using $\bar{X}$ with (1-α)100% confidence, and wants the random sample of size n to give an estimate that is within a specified value e of μ. That is, he wishes to be (1-α)100% confident that the sample he takes will give a realized value of $\bar{X}$ whose error in estimating μ does not exceed the specified value e. How large a sample should the researcher take?

In such a scenario, we do not intend to construct a (1-α)100% confidence interval for μ from a sample that has already been taken from the population. In fact, the sample has not been taken yet, and we must first determine the sample size. What we have are the following:

1. The population standard deviation σ is known.

2. The confidence coefficient (1-α) is known, i.e., the researcher sets how confident he wishes to be in estimating μ using $\bar{X}$.

3. The maximum amount of error $e = z_{\alpha/2}\,\sigma/\sqrt{n}$ in estimating μ using $\bar{X}$ is specified.

Assuming (1), the objective then is to determine the sample size n that satisfies (2) and (3). Since $z_{\alpha/2}$, σ, and e are known, solving $e = z_{\alpha/2}\,\sigma/\sqrt{n}$ for n gives $n = \left(z_{\alpha/2}\,\sigma/e\right)^2$.

So, to interpret: “We can be (1-α)100% confident that a random sample of size $n = \left(z_{\alpha/2}\,\sigma/e\right)^2$ will provide an estimate which is at most a specified amount e away from the value of μ.”
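For reference, the algebra that turns the maximum-error expression into a sample-size formula takes two steps: isolate $\sqrt{n}$, then square both sides. The symbols are the ones used in this section.

```latex
\begin{align*}
e &= z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \\
\sqrt{n} &= \frac{z_{\alpha/2}\,\sigma}{e} \\
n &= \left(\frac{z_{\alpha/2}\,\sigma}{e}\right)^{2}
\end{align*}
```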

Note that the larger the sample size, the smaller the standard error $\sigma/\sqrt{n}$ of the sample mean; the possible values of the sample mean fluctuate less as the sample size increases. But the sample size cannot be increased at the whim of the researcher, since each additional unit in the sample entails cost, and in any study the research design is influenced by budgetary constraints. The choice of sample size is therefore a compromise between the desired precision of the results and financial considerations.
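The trade-off between sample size and precision can be seen by tabulating the standard error for a few sample sizes. This is a small sketch with an assumed σ = 40 and arbitrarily chosen sample sizes; `standard_error` is an illustrative helper name.

```python
from math import sqrt


def standard_error(sigma: float, n: int) -> float:
    """Standard error of the sample mean, sigma / sqrt(n)."""
    return sigma / sqrt(n)


sigma = 40.0  # assumed population standard deviation, for illustration only
for n in (25, 100, 400):
    # Each fourfold increase in n only halves the standard error.
    print(n, standard_error(sigma, n))
```

Because the standard error shrinks with the square root of n, halving it requires four times as many sampled units, which is why cost quickly becomes the limiting factor.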

When the computed sample size is not an integer, we round it up to the nearest integer.
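The sample-size formula and the round-up rule can be combined in one helper. The sketch below assumes `statistics.NormalDist` for the critical value; the function name `sample_size_for_mean` and the inputs in the usage line (σ = 15, e = 2) are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_mean(sigma: float, e: float, confidence: float = 0.95) -> int:
    """Smallest integer n with z_{alpha/2} * sigma / sqrt(n) <= e."""
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    n_exact = (z * sigma / e) ** 2
    # Round up: rounding down would leave a margin of error larger than e.
    return ceil(n_exact)


# Hypothetical inputs: sigma = 15, maximum error e = 2, 95% confidence.
print(sample_size_for_mean(sigma=15.0, e=2.0))  # prints 217
```

Rounding up rather than to the nearest integer guarantees that the achieved margin of error is no larger than the specified e.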

Example

An electrical firm manufactures light bulbs that have a length of life that is approximately normally distributed, with a standard deviation of 40 hours. How large a sample is needed if we wish to be 95% confident that the sample mean will be within 10 hours of the true mean?

Solution

Let L = length of life (in hours) of a light bulb manufactured by the electrical firm, where L is approximately distributed as N(μ, σ²).

We can be (1-α)100% confident that a random sample of size $n = \left(z_{\alpha/2}\,\sigma/e\right)^2$ will provide an estimate which is at most a specified amount e away from the value of μ.

Given: σ = 40, e = 10, 1-α = 0.95, α = 0.05, α/2 = 0.025, $z_{\alpha/2} = z_{0.025} = 1.96$

Then $n = \left(\frac{(1.96)(40)}{10}\right)^2 = 61.4656$, which we round up to 62.

Therefore, we could be 95% confident that taking a random sample of 62 light bulbs will provide an estimate which is within 10 hours of the true mean length of life of light bulbs.
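The arithmetic of this example can be checked in a few lines. The sketch uses the table value z = 1.96 from the solution; `achieved_margin` is an illustrative helper that shows why 62 bulbs suffice and 61 do not.

```python
from math import ceil, sqrt

z = 1.96       # z_{0.025}, the table value used in the solution
sigma = 40.0   # population standard deviation (hours)
e = 10.0       # maximum allowed error (hours)

n_exact = (z * sigma / e) ** 2   # about 61.4656
n = ceil(n_exact)                # rounds up to 62


def achieved_margin(sample_size: int) -> float:
    """Margin of error z * sigma / sqrt(n) actually achieved by a given n."""
    return z * sigma / sqrt(sample_size)


print(n)                              # prints 62
print(round(achieved_margin(62), 3))  # about 9.957, within 10 hours
print(round(achieved_margin(61), 3))  # about 10.038, slightly over 10 hours
```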

Let Xi = 1 if the ith trial is a success and Xi = 0 if the ith trial is a failure, i = 1, 2, …, n. Recall that P is the probability of success. An estimator of P is the sample proportion, $\hat{P} = \frac{1}{n}\sum_{i=1}^{n} X_i$, the proportion of successes in the sample. Recall that $E(\hat{P}) = P$ and $Var(\hat{P}) = PQ/n$, where Q = 1 − P.

Note that the standard error of $\hat{P}$ involves P, the parameter of interest. Hence, in constructing a (1-α)100% confidence interval for P, the confidence limits are supposedly $\hat{P} - z_{\alpha/2}\sqrt{PQ/n}$ and $\hat{P} + z_{\alpha/2}\sqrt{PQ/n}$, which are not independent of P, the parameter being estimated. However, for large samples, little error is introduced by substituting the statistic $\hat{P}$ for the true proportion P. Therefore, an approximate (1-α)100% confidence interval for P is given by $\left(\hat{P} - z_{\alpha/2}\sqrt{\hat{P}\hat{Q}/n},\ \hat{P} + z_{\alpha/2}\sqrt{\hat{P}\hat{Q}/n}\right)$, where $\hat{Q} = 1 - \hat{P}$.
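As an illustration of the large-sample interval for a proportion, the sketch below computes the approximate confidence limits from a count of successes. The function name `proportion_ci` and the data (120 successes in 200 trials) are made up for the example, and the critical value comes from `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes: int, n: int, confidence: float = 0.95):
    """Approximate large-sample confidence interval for a proportion P."""
    p_hat = successes / n              # sample proportion
    q_hat = 1.0 - p_hat
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    # The estimated standard error substitutes p_hat for the unknown P.
    half_width = z * sqrt(p_hat * q_hat / n)
    return p_hat - half_width, p_hat + half_width


# Hypothetical data: 120 successes in 200 trials.
lower, upper = proportion_ci(120, 200)
print(f"({lower:.4f}, {upper:.4f})")  # prints "(0.5321, 0.6679)"
```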

As with the sample size for estimating μ using $\bar{X}$, it can be reasoned that saying we are (1-α)100% confident that P is within the interval $\left(\hat{P} - z_{\alpha/2}\sqrt{\hat{P}\hat{Q}/n},\ \hat{P} + z_{\alpha/2}\sqrt{\hat{P}\hat{Q}/n}\right)$ is the same as saying we are (1-α)100% confident that the error e of estimating P using $\hat{P}$ does not exceed $z_{\alpha/2}\sqrt{\hat{P}\hat{Q}/n}$, where $\hat{Q} = 1 - \hat{P}$.

Therefore, a researcher could be (1-α)100% confident that a random sample of size $n = z_{\alpha/2}^2\,\hat{P}\hat{Q}/e^2$ will provide an estimate which is at most a specified amount e away from the value of P.
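The sample-size formula for a proportion, n = z² p(1 − p)/e², can be sketched as a helper that takes an approximate (planning) value of the proportion. The name `sample_size_for_proportion`, the parameter `p_planning`, and the inputs in the usage line are assumptions for illustration.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_proportion(p_planning: float, e: float,
                               confidence: float = 0.95) -> int:
    """Smallest integer n with z_{alpha/2} * sqrt(p(1-p)/n) <= e."""
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    n_exact = z ** 2 * p_planning * (1.0 - p_planning) / e ** 2
    return ceil(n_exact)  # round up so the error bound is not exceeded


# Hypothetical inputs: planning value 0.30, maximum error 0.05, 95% confidence.
print(sample_size_for_proportion(0.30, 0.05))  # prints 323
```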

The formula $n = z_{\alpha/2}^2\,\hat{P}\hat{Q}/e^2$ is used when an approximate value for P is available. However, when no approximation of P is available to start with, we can work with the largest sample size that could be required for the given degree of confidence and the amount of error we are willing to tolerate. This maximum sample size is attained by using P = Q = 0.5, where Q = 1 − P, which gives the conservative formula $n = z_{\alpha/2}^2/(4e^2)$.
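A quick numerical check shows why 0.5 is the conservative choice: the product p(1 − p) is largest at p = 0.5, so the required sample size is largest there too. The sketch below uses an arbitrary grid of proportions and a hypothetical maximum error of 0.05; `conservative_sample_size` is an illustrative name.

```python
from math import ceil
from statistics import NormalDist


def conservative_sample_size(e: float, confidence: float = 0.95) -> int:
    """Sample size z^2 / (4 e^2), i.e. the proportion formula at p = 0.5."""
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return ceil(z ** 2 / (4.0 * e ** 2))


# p * (1 - p) over a grid of candidate proportions peaks at p = 0.5.
grid = [i / 100 for i in range(1, 100)]
worst_case_p = max(grid, key=lambda p: p * (1.0 - p))
print(worst_case_p)                    # prints 0.5

# Hypothetical maximum error of 0.05 at 95% confidence.
print(conservative_sample_size(0.05))  # prints 385
```

Any other planning value gives a smaller n, so a sample of the conservative size meets the error requirement whatever the true proportion turns out to be.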

Example: A chemist has prepared a product designed to kill 60% of a particular type of insect. How large a sample should be used if he desires to be 95% confident that his estimate is within 0.02 of the true fraction of insects killed?
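The notes do not include a solution to this example. One way to work it, sketched below, is to treat the design value 0.60 as the approximate proportion and apply n = z² p(1 − p)/e² with the table value z = 1.96 and e = 0.02. Using 0.60 as the planning value is an assumption made here, not something the notes state.

```python
from math import ceil

z = 1.96   # z_{0.025} from the standard normal table
p = 0.60   # assumed planning value: the proportion the product is designed to kill
e = 0.02   # maximum allowed error

n_exact = z ** 2 * p * (1.0 - p) / e ** 2   # about 2304.96
n = ceil(n_exact)                           # round up to the next integer

print(n)  # prints 2305
```

Under these assumptions, a sample of about 2305 insects would be needed.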
