Interpretation of (1-α)100% Confidence Interval:
9.2 Sample Size Determination for Estimation
In previous sections, the steps in constructing a confidence interval to estimate an unknown parameter ($\mu$ or $P$) involve:
1. Getting a random sample of size n from the population.
2. Computing the point estimate based on the sample.
3. Choosing the appropriate formula based on the problem (i.e., is the population variance known? Is X normally distributed? Or is $X_i$ defined as a binary variable taking on 1 for “success” or 0 for “failure” for the ith trial, i = 1, 2, …, n? Is the sample size large? And so on) to calculate the (1-α)100% confidence interval.
4. Interpreting the resulting (1-α)100% confidence interval.
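Suppose that the population variance $\sigma^2$ is known and we state, say, that “we are (1-α)100% confident that $\mu$ is within the interval $\left(\bar{X} - z_{\alpha/2}\,\sigma/\sqrt{n},\ \bar{X} + z_{\alpha/2}\,\sigma/\sqrt{n}\right)$.”
Note that $\mu$ is going to be within the (1-α)100% confidence interval if and only if the error e of estimating $\mu$ using $\bar{X}$ is at most $z_{\alpha/2}\,\sigma/\sqrt{n}$. Then, saying “we are (1-α)100% confident that $\mu$ is within the interval $\left(\bar{X} - z_{\alpha/2}\,\sigma/\sqrt{n},\ \bar{X} + z_{\alpha/2}\,\sigma/\sqrt{n}\right)$” is equivalent to saying that “we are (1-α)100% confident that the error e of estimating $\mu$ using $\bar{X}$ cannot exceed $z_{\alpha/2}\,\sigma/\sqrt{n}$.”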
Now suppose a researcher wishes to estimate $\mu$ using $\bar{X}$ with (1-α)100% confidence and wants the random sample of size n to give an estimate that is within a specified value e of $\mu$. That is, he wishes to be (1-α)100% confident that the random sample will give a realized value of $\bar{X}$ such that the error of estimating $\mu$ does not exceed a specified value e. How large a sample should the researcher take?
In such a scenario, we do not intend to construct a (1-α)100% confidence interval for $\mu$ from a sample that has already been taken from the population. In fact, the sample has not been taken yet, and we must determine the sample size first. What we have are the following:
1. The population standard deviation $\sigma$ is known.
2. The confidence coefficient (1-α) is known, i.e., the researcher sets how confident he wishes to be in estimating $\mu$ using $\bar{X}$.
3. The maximum amount of error $e = z_{\alpha/2}\,\sigma/\sqrt{n}$ in estimating $\mu$ using $\bar{X}$ is specified.
Assuming (1), the objective then is to determine the sample size n that satisfies (2) and (3). But since $z_{\alpha/2}$, $\sigma$ and e are known, the formula $e = z_{\alpha/2}\,\sigma/\sqrt{n}$ gives $n = \left(\frac{z_{\alpha/2}\,\sigma}{e}\right)^2$.
So to interpret, “We can be (1-α)100% confident that getting a random sample of size $n = \left(\frac{z_{\alpha/2}\,\sigma}{e}\right)^2$ will provide an estimate which is at most a specified amount e away from the value of $\mu$.”
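The algebra behind this sample size can be written out step by step. The following is a short derivation sketch that starts from the error bound and solves for n, using only the symbols e, $z_{\alpha/2}$, $\sigma$ and n from the text:

```latex
\begin{align*}
e &= z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \\
\sqrt{n} &= \frac{z_{\alpha/2}\,\sigma}{e} \\
n &= \left(\frac{z_{\alpha/2}\,\sigma}{e}\right)^{2}
\end{align*}
```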
Note that the larger the sample size is, the smaller the standard error $\sigma/\sqrt{n}$ of the sample mean is. The possible values of the sample mean then fluctuate less as the sample size is increased. But the sample size cannot be increased at the whim of the researcher, since each additional unit in the sample entails costs and, in any study, the research design is constrained by the budget. The selection of the sample size is then a compromise between the precision desired and the financial considerations.
When the computed sample size is not an integer, we round it up to the next integer, so that the error bound is still satisfied.
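The calculation can be sketched in Python as follows. This is a minimal illustration, not a prescribed procedure: the function name `sample_size_for_mean` and the input values in the usage lines are hypothetical, and the critical value $z_{\alpha/2}$ is taken from the standard library's `statistics.NormalDist` rather than from a printed table, so it is slightly more precise than a rounded table value.

```python
import math
from statistics import NormalDist


def sample_size_for_mean(sigma, e, confidence):
    """Smallest integer n such that z_{alpha/2} * sigma / sqrt(n) <= e.

    sigma      -- known population standard deviation
    e          -- maximum error allowed in estimating the mean
    confidence -- confidence coefficient (1 - alpha), e.g. 0.95
    """
    alpha = 1.0 - confidence
    # z_{alpha/2}: the point with area alpha/2 to its right under N(0, 1)
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    n_exact = (z * sigma / e) ** 2
    # Round up so that the error bound is not exceeded
    return math.ceil(n_exact)


# Hypothetical inputs: sigma = 15, e = 2, 99% confidence
print(sample_size_for_mean(sigma=15, e=2, confidence=0.99))
```

Because n grows with the square of $\sigma/e$, halving the allowed error roughly quadruples the required sample size, which is the cost trade-off described above.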
Example
An electrical firm manufactures light bulbs that have a length of life that is approximately normally distributed, with a standard deviation of 40 hours. How large a sample is needed if we wish to be 95% confident that the sample mean will be within 10 hours of the true mean?
Solution
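Let L = length of life (in hours) of a light bulb manufactured by a certain electrical firm. L is approximately distributed as $N(\mu, \sigma^2)$.
We can be (1-α)100% confident that getting a random sample of size $n = \left(\frac{z_{\alpha/2}\,\sigma}{e}\right)^2$ will provide an estimate which is at most a specified amount e away from the value of $\mu$.
Given: $\sigma = 40$, $e = 10$, $1-\alpha = 0.95$, $\alpha = 0.05$, $\alpha/2 = 0.025$, $z_{\alpha/2} = z_{0.025} = 1.96$.
Then $n = \left(\frac{(1.96)(40)}{10}\right)^2 = (7.84)^2 \approx 61.47$, which is rounded up to 62.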
Therefore, we could be 95% confident that taking a random sample of 62 light bulbs will provide an estimate which is within 10 hours of the true mean length of life of light bulbs.
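The interpretation can be checked informally by simulation. The sketch below is only an illustration under assumptions that are not part of the example: the true mean of 1000 hours, the number of repetitions and the seed are hypothetical choices, and the lifetimes are drawn from an exactly normal distribution. It repeatedly draws samples of 62 bulbs and records how often the sample mean lands within 10 hours of the true mean.

```python
import random


def share_within_error(mu, sigma, n, e, reps, seed):
    """Fraction of simulated samples whose mean is within e of mu."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    hits = 0
    for _ in range(reps):
        # Draw one random sample of size n and compute its mean
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if abs(xbar - mu) <= e:
            hits += 1
    return hits / reps


# mu = 1000 is a hypothetical value; sigma, n and e come from the example
rate = share_within_error(mu=1000.0, sigma=40.0, n=62, e=10.0,
                          reps=2000, seed=1)
print(rate)
```

The printed fraction should be close to 0.95, and slightly above it on average, because 62 is a rounded-up sample size.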
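Let $X_i = 1$ if the ith trial is a “success” and $X_i = 0$ if it is a “failure,” for i = 1, 2, …, n. Recall that P is the probability of success. An estimator of P is the sample proportion, $\hat{P} = \frac{1}{n}\sum_{i=1}^{n} X_i$, the proportion of successes in the sample. Recall that $E(\hat{P}) = P$ and $\mathrm{Var}(\hat{P}) = \frac{P(1-P)}{n}$.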
Note that the standard error of $\hat{P}$ involves P, the parameter of interest. Hence, in constructing a (1-α)100% confidence interval for P, the confidence limits are supposedly $\hat{P} - z_{\alpha/2}\sqrt{P(1-P)/n}$ and $\hat{P} + z_{\alpha/2}\sqrt{P(1-P)/n}$, which are not independent of P, the parameter that is supposedly being estimated. However, for large samples, little error is introduced in substituting the statistic $\hat{P}$ for the true proportion P. Therefore, an approximate (1-α)100% confidence interval for P is given by $\left(\hat{P} - z_{\alpha/2}\sqrt{\hat{P}(1-\hat{P})/n},\ \hat{P} + z_{\alpha/2}\sqrt{\hat{P}(1-\hat{P})/n}\right)$.
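The large-sample interval for a proportion can be sketched in Python as follows. This is a minimal illustration: the function name `proportion_ci` and the counts in the usage lines are hypothetical, and the sample proportion is substituted for P in the standard error, which is reasonable only for large samples.

```python
import math
from statistics import NormalDist


def proportion_ci(successes, n, confidence):
    """Approximate (1 - alpha)100% confidence interval for a proportion P."""
    p_hat = successes / n  # sample proportion of successes
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # z_{alpha/2}
    # Estimated standard error: p_hat replaces the unknown P
    margin = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Hypothetical data: 60 successes in 100 trials, 95% confidence
lower, upper = proportion_ci(successes=60, n=100, confidence=0.95)
print(round(lower, 3), round(upper, 3))
```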
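Similar to how we determine the sample size in estimating $\mu$ using $\bar{X}$, it can be reasoned out that saying we are (1-α)100% confident that P is within the interval $\left(\hat{P} - z_{\alpha/2}\sqrt{\hat{P}(1-\hat{P})/n},\ \hat{P} + z_{\alpha/2}\sqrt{\hat{P}(1-\hat{P})/n}\right)$ is the same as saying we are (1-α)100% confident that the error e of estimating P using $\hat{P}$ cannot exceed $z_{\alpha/2}\sqrt{\hat{P}(1-\hat{P})/n}$.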
Therefore, a researcher could be (1-α)100% confident that getting a random sample of size $n = \frac{z_{\alpha/2}^2\,P(1-P)}{e^2}$ will provide an estimate which is at most a specified amount e away from the value of P.
The formula using $P(1-P)$ is used when an approximate value for P is available. However, when we have no approximation of P to start with, we can work with the largest sample size that the formula can yield for the given degree of confidence and the error we are willing to tolerate. This maximum sample size is attained at P = Q = 0.5, where Q = 1 - P, which gives the conservative formula $n = \frac{z_{\alpha/2}^2}{4e^2}$.
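Both versions of the sample size formula for a proportion can be sketched in one Python function. This is a minimal illustration: the name `sample_size_for_proportion` and its `p_guess` parameter are hypothetical, and the function falls back to the conservative choice P = 0.5 when no approximate value is supplied.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(e, confidence, p_guess=None):
    """Smallest integer n such that z_{alpha/2} * sqrt(P(1-P)/n) <= e.

    e          -- maximum error allowed in estimating P
    confidence -- confidence coefficient (1 - alpha), e.g. 0.95
    p_guess    -- approximate value of P, or None for the conservative case
    """
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # z_{alpha/2}
    # P(1-P) is largest at P = 0.5, which gives the conservative n
    p = 0.5 if p_guess is None else p_guess
    n_exact = z ** 2 * p * (1.0 - p) / e ** 2
    return math.ceil(n_exact)  # round up to keep the error within e


# Hypothetical inputs: error of 0.05 at 90% confidence
print(sample_size_for_proportion(e=0.05, confidence=0.90))
print(sample_size_for_proportion(e=0.05, confidence=0.90, p_guess=0.2))
```

The conservative result is never smaller than the result for any supplied `p_guess`, since P(1-P) cannot exceed 0.25.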
Example: A chemist has prepared a product designed to kill 60% of a particular type of insect. How large a sample should be used if he desires to be 95% confident that his estimate is within 0.02 of the true fraction of insects killed?
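One way to set up the computation is sketched below. It assumes that the design value of 60% is used as the approximate value of P and that the table value $z_{0.025} = 1.96$ is used; both are working assumptions for this sketch and not a solution stated in the text.

```latex
\begin{align*}
n &= \frac{z_{\alpha/2}^{2}\,P(1-P)}{e^{2}}
   = \frac{(1.96)^{2}(0.60)(0.40)}{(0.02)^{2}} \\
  &= \frac{0.921984}{0.0004} = 2304.96
\end{align*}
```

Rounding up gives a sample of 2305 insects under these assumptions.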