Nihar Ranjan Roy
Business Statistics, By Ken Black, Wiley India Edition
Text Book
Estimating a mean, a proportion, or a variance for a population with a single sample
Estimating
Estimating the population mean using the z statistic (σ is known).
A point estimate is a statistic taken from a sample that is used to estimate a population parameter.
A point estimate is only as good as the representativeness of its sample.
If other random samples are taken from the population, the point estimates derived from those samples are likely to vary. Because of variation in sample statistics, estimating a population parameter with an interval estimate is often preferable to using a point estimate.
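The sampling variation described here can be seen in a small simulation. This is only a sketch: the population below is synthetic (normally distributed with mean 500 and standard deviation 50, both chosen arbitrarily for illustration) and is not data from the text.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Synthetic population: an assumption for illustration, not data from the text
population = [random.gauss(500, 50) for _ in range(10_000)]

# Draw several random samples of the same size and record each sample mean
sample_means = [
    statistics.mean(random.sample(population, 30))
    for _ in range(5)
]

for i, m in enumerate(sample_means, start=1):
    print(f"sample {i}: point estimate of the mean = {m:.2f}")
print(f"population mean = {statistics.mean(population):.2f}")
```

Each sample gives a slightly different point estimate of the same population mean, which is the motivation for reporting an interval instead.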
An interval estimate (confidence interval) is a range of values within which the analyst can declare, with some confidence, the population parameter lies.
Confidence intervals can be two sided or one sided.
We cover here only two-sided confidence intervals.
How are confidence intervals constructed?
ESTIMATING THE POPULATION MEAN USING z statistic
As a result of the central limit theorem, the following z formula for sample means can be used when the population standard deviation is known and either the sample size is large (regardless of the shape of the population distribution) or the population is normally distributed (for smaller sample sizes).
$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$
Rearranging the formula to solve for μ, we get
$$\mu = \bar{x} - z\frac{\sigma}{\sqrt{n}}$$
Because a sample mean can be greater than or less than the population mean, z can be positive or negative. Thus the preceding expression takes the following form.
$$\mu = \bar{x} \pm z\frac{\sigma}{\sqrt{n}}$$
central limit theorem
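Rewriting this expression yields the confidence interval formula for estimating μ with large sample sizes if the population standard deviation is known.
$$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
or
$$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$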
where
α = the area under the normal curve outside the confidence interval area
α/2 = the area in one end (tail) of the distribution outside the confidence interval
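The two-sided interval for a known σ can be sketched in Python as follows. The function name `z_confidence_interval` and the numbers in the usage line (mean 100, σ = 15, n = 36) are hypothetical and chosen only for illustration. The critical value is computed with the standard library's `NormalDist` rather than read from a table.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(x_bar, sigma, n, confidence=0.95):
    """Two-sided confidence interval for the mean when sigma is known."""
    alpha = 1 - confidence                    # area outside the interval
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z value with alpha/2 in the upper tail
    margin = z * sigma / sqrt(n)              # error of the estimate
    return x_bar - margin, x_bar + margin


# Hypothetical inputs, not taken from the text
lower, upper = z_confidence_interval(x_bar=100.0, sigma=15.0, n=36, confidence=0.95)
print(f"{lower:.2f} <= mu <= {upper:.2f}")
```

The interval is symmetric around the sample mean because the same margin is subtracted and added.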
Confidence Interval
For 95% confidence, α = .05 and α/2 = .025. The value of $z_{\alpha/2}$, or $z_{.025}$, is found by looking in the standard normal table under .5000 − .0250 = .4750.
This area in the table is associated with a z value of 1.96.
Another way can be used to locate the table z value.
Because the distribution is symmetric and the intervals are equal on each side of the population mean, 1⁄2(95%), or .4750, of the area is on each side of the mean.
The z distribution table yields a z value of 1.96 for this portion of the normal curve.
Thus the z value for a 95% confidence interval is always 1.96. In other words, of all the possible sample mean (x̄) values along the horizontal axis of the diagram, 95% should be within a z score of 1.96 of the population mean.
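The table lookup can be cross-checked in code. The sketch below uses Python's standard-library `NormalDist`. The helper name `z_critical` is an assumption for illustration. Computed values may differ from printed table values in the third decimal place because tables are rounded.

```python
from statistics import NormalDist


def z_critical(confidence):
    """z value that leaves alpha/2 in each tail of the standard normal curve."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


for level in (0.90, 0.95, 0.98, 0.99):
    print(f"{level:.0%} confidence: z = {z_critical(level):.3f}")
```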
As an example, in the cellular telephone company problem of estimating the population mean number of minutes called per residential user per month, from the sample of 85 bills it was determined that the sample mean is 510 minutes.
Suppose past history and similar studies indicate that the population standard deviation is 46 minutes.
Using this sample mean, a confidence interval can be calculated within which the researcher is relatively confident that the actual population mean is located.
$$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
Problem
The confidence interval is constructed from the point estimate, which in this problem is 510 minutes, and the error of this estimate, which is 9.78 minutes. The resulting confidence interval is 500.22 ≤ µ ≤ 519.78. The cellular telephone company researcher is 95% confident that the population mean number of minutes called per residential user per month is between 500.22 and 519.78 minutes.
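Substituting the values from this example (x̄ = 510, σ = 46, n = 85, and z = 1.96 for 95% confidence) into the interval formula shows where the error of 9.78 minutes comes from:

```latex
\begin{aligned}
510 - 1.96\,\frac{46}{\sqrt{85}} &\le \mu \le 510 + 1.96\,\frac{46}{\sqrt{85}} \\
510 - 9.78 &\le \mu \le 510 + 9.78 \\
500.22 &\le \mu \le 519.78
\end{aligned}
```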
A survey was taken of U.S. companies that do business with firms in India. One of the questions on the survey was: Approximately how many years has your company been trading with firms in India? A random sample of 44 responses to this question yielded a mean of 10.455 years. Suppose the population standard deviation for this question is 7.7 years. Using this information, construct a 90% confidence interval for the mean number of years that a company has been trading with firms in India for the population of U.S. companies trading with firms in India.
Problem
Solution
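One way to work this problem, assuming the table value $z_{.05} = 1.645$ for 90% confidence and using the given x̄ = 10.455, σ = 7.7, and n = 44:

```latex
\begin{aligned}
10.455 - 1.645\,\frac{7.7}{\sqrt{44}} &\le \mu \le 10.455 + 1.645\,\frac{7.7}{\sqrt{44}} \\
10.455 - 1.91 &\le \mu \le 10.455 + 1.91 \\
8.545 &\le \mu \le 12.365
\end{aligned}
```

Under these assumptions, the analyst is 90% confident that the population mean lies between 8.545 and 12.365 years.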
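If n/N ≥ 5%, where n is the sample size and N is the population size, the finite correction factor is applied to the confidence interval:
$$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}} \le \mu \le \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}}$$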
Finite Correction
A study is conducted in a company that employs 800 engineers. A random sample of 50 engineers reveals that the average sample age is 34.3 years.
Historically, the population standard deviation of the age of the company’s engineers is approximately 8 years.
Construct a 98% confidence interval to estimate the average age of all the engineers in this company.
Problem
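A possible way to check this problem numerically is sketched below. The function name `z_interval_finite` is hypothetical. The inputs come from the problem statement (N = 800, n = 50, x̄ = 34.3, σ = 8, 98% confidence), and since n/N = 6.25% the finite correction factor applies. The critical value is computed rather than read from a table, so it is about 2.326 instead of the rounded table value 2.33.

```python
from math import sqrt
from statistics import NormalDist


def z_interval_finite(x_bar, sigma, n, N, confidence):
    """Two-sided z interval for the mean with the finite correction factor."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    fpc = sqrt((N - n) / (N - 1))        # finite correction factor
    margin = z * sigma / sqrt(n) * fpc   # corrected error of the estimate
    return x_bar - margin, x_bar + margin


# Values from the engineers problem
lower, upper = z_interval_finite(x_bar=34.3, sigma=8, n=50, N=800, confidence=0.98)
print(f"{lower:.2f} <= mu <= {upper:.2f}")
```

The correction factor is less than 1, so the interval is slightly narrower than it would be without the correction.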
When the population standard deviation is unknown, the sample standard deviation must be used in the estimation process.
The z formulas are inappropriate for use when the population standard deviation is unknown (and is replaced by the sample standard deviation).
Another mechanism to handle such cases (the t test) was developed by a British statistician, William S. Gosset.
ESTIMATING THE POPULATION MEAN USING THE t statistic
(σ unknown)
Gosset developed the t distribution, which is used instead of the z distribution for doing inferential statistics on the population mean when the population standard deviation is unknown and the population is normally distributed.
The formula for the t statistic is
$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$
This formula is essentially the same as the z formula, but the distribution table values are different.
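The only computational change from the z formula is that the sample standard deviation s replaces σ. A minimal sketch follows. The function name `t_statistic`, the sample values, and the hypothesized mean of 4 are all hypothetical and serve only to illustrate the arithmetic.

```python
from math import sqrt
from statistics import mean, stdev


def t_statistic(sample, mu):
    """t = (x_bar - mu) / (s / sqrt(n)), with s computed from the sample."""
    n = len(sample)
    x_bar = mean(sample)
    s = stdev(sample)  # sample standard deviation (n - 1 in the denominator)
    return (x_bar - mu) / (s / sqrt(n))


# Hypothetical sample, assumed to come from a normally distributed population
sample = [2, 4, 4, 4, 5, 5, 7, 9]
t = t_statistic(sample, mu=4)
print(f"t = {t:.4f} with {len(sample) - 1} degrees of freedom")
```

The resulting t value is compared against a t table for n − 1 degrees of freedom rather than against the z table.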
The t distribution actually is a series of distributions because every sample size has a different distribution, thereby creating the potential for many t tables