A-1 UNDERSTANDING STATISTICAL INTERVALS It is often desirable to use the terms confidence, tolerance, and prediction interval. (The use of tolerance in this instance is different from that of PTC 1.) An eloquent treatment of these terms is given by Gerald Hahn [24] in his 1970 paper, which is reproduced here by permission.
Statistical intervals are frequently misunderstood and misused. Here is an explanation of when to use confidence, tolerance, and prediction intervals.
Engineers have come to appreciate that few things in life are known exactly. The most they can do is obtain an estimate and construct an interval which, with a high probability, contains the quantity of interest. This article describes three different types of statistical intervals and shows where each should be used.
The three intervals are: (1) a confidence interval to contain a population mean, (2) a tolerance interval to contain a specified proportion of the population, and (3) a prediction interval to contain all of a specified number of future observations.
Many nonstatistical users of statistics are well acquainted with confidence intervals. Some are also aware of tolerance intervals, but most nonstatisticians know very little about prediction intervals despite their practical importance. A frequent mistake is to calculate a confidence interval on the population mean when the actual problem calls for a tolerance interval or a prediction interval.
At other times, a tolerance interval is used when a prediction interval is needed.
This confusion is understandable since most texts on statistics devote extensive space to confidence intervals on population parameters, make limited reference to tolerance intervals, and almost never talk about prediction intervals. This is unfortunate because tolerance intervals or prediction intervals are needed as frequently in industrial applications as confidence intervals, and given the required tabulations, the procedure for constructing them
is no more difficult. Table A-1.1 lists the information needed to construct all three intervals.
CONFIDENCE INTERVAL FOR THE POPULATION MEAN
The sample mean $\bar{y}$ is an estimate of the unknown population mean $\mu$, but differs from it because of sampling fluctuations. However, it is possible to construct a statistical interval known as a confidence interval for the population mean $\mu$. This interval contains $\mu$ with a specific probability. This probability is known as the associated confidence level. Thus a 95 percent confidence interval on the population mean is an interval which contains $\mu$ with a probability of 0.95. It is calculated as:
$\bar{y} \pm c_M(n)\,s,$
where $c_M(n)$ is obtained from the first column of Table A-1.1 as a function of $n$, the sample size.
For the example shown in the box, $c_M(5) = 1.24$ and the 95 percent confidence interval for $\mu$ is:
50.10 ± (1.24)(1.31).
Consequently, one can be 95 percent confident that the interval 48.48 to 51.72 contains the unknown value of $\mu$. More precisely, over a large number of samples, the interval calculated in this manner will contain the unknown mean 95 percent of the time.
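The long-run reading of the confidence level can be checked with a small simulation. The sketch below is illustrative only: the function name `ci_coverage`, the "true" values `mu` and `sigma`, the number of trials, and the seed are all assumptions introduced here, not part of the original article. It draws many samples of five from a normal population, builds the interval with the tabulated factor 1.24, and counts how often the interval contains the true mean.

```python
import math
import random

def ci_coverage(n=5, factor=1.24, trials=20_000, mu=50.0, sigma=1.3, seed=1):
    """Fraction of simulated samples whose interval y_bar ± factor*s contains mu.

    mu and sigma are hypothetical "true" population values chosen for the demo.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        y_bar = sum(sample) / n
        # Sample standard deviation with the usual (n - 1) divisor.
        s = math.sqrt(sum((y - y_bar) ** 2 for y in sample) / (n - 1))
        if abs(y_bar - mu) <= factor * s:
            hits += 1
    return hits / trials

print(f"observed coverage: {ci_coverage():.3f}")
```

The printed fraction should come out close to 0.95. Any single interval either contains the mean or does not; the 95 percent figure describes the procedure over repeated sampling.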
TOLERANCE INTERVAL TO CONTAIN A SPECIFIC PROPORTION OF THE POPULATION.
Instead of, or in addition to, a confidence interval to contain $\mu$, many applications require an interval to enclose a specific proportion of the population.
For a normal distribution, if $\mu$ and $\sigma$ are known exactly, it can be stated that 90 percent of the population is located in the interval:
$\mu \pm 1.64\sigma.$
Copyright ASME International
Provided by IHS under license with ASME Licensee=Mott MacDonald Ltd/5956936002
Not for Resale, 08/17/2010 06:03:02 MDT No reproduction or networking permitted without license from IHS
ASME PTC 19.1-2005 TEST UNCERTAINTY
Table A-1.1 Factors for Calculating the Two-Sided 95% Probability Intervals for a Normal Distribution
Number of given observations: $n$
Factors for confidence interval to contain the population mean: $c_M(n)$
Factors for tolerance interval to contain at least 90%, 95%, and 99% of the population: $c_{T,90}(n)$, $c_{T,95}(n)$, $c_{T,99}(n)$
Factors for prediction interval to contain the values of all of 1, 2, 5, 10, and 20 future observations: $c_{P,1}(n)$, $c_{P,2}(n)$, $c_{P,5}(n)$, $c_{P,10}(n)$, $c_{P,20}(n)$
A two-sided 95 percent interval is $\bar{y} \pm c(n)\,s$, where $c(n)$ is the appropriate tabulated value and $\bar{y}$ and $s$ are the mean and the standard deviation of the given sample of size $n$.
However, if only sample estimates $\bar{y}$ and $s$ of the population values $\mu$ and $\sigma$ are given, the best that can be stated is that with a chosen probability (say 0.95) the interval contains at least 90, 95, or 99 percent of the population. Such an interval is called a tolerance interval and can be calculated for a normal population with the help of the factors $c_{T,90}(n)$, $c_{T,95}(n)$, and $c_{T,99}(n)$, shown in columns 2, 3, and 4 of Table A-1.1.
For example, it can be stated with 95 percent confidence that the interval:
$\bar{y} \pm c_{T,90}(n)\,s$
contains at least 90 percent of a normal population. The tolerance interval for the example in the box may be calculated as:
50.10 ± (4.28)(1.31)
or 44.49 to 55.71, where $c_{T,90}(5) = 4.28$. Thus, one may be 95 percent confident that the preceding interval contains at least 90 percent of the sampled population.
The fact that both a population proportion (or percentage) and a statistical probability (also a percentage) are associated with a tolerance interval is sometimes confusing to the engineer. The first of these numbers refers to the proportion (or percentage) of the population that the interval is to contain. The second number specifies the probability that the calculated interval really contains at least the specified proportion of the population. When $\mu$ and $\sigma$ are known exactly, an interval to contain a specified proportion of the population may still be of interest, but, in this case, there is no longer any uncertainty associated with the proportion of the population contained in the interval.
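The two numbers attached to a tolerance interval can be separated in a simulation. The following is a rough sketch under stated assumptions: the function name `tolerance_confidence`, the standard normal "true" population, the trial count, and the seed are hypothetical choices made for illustration. For each simulated sample of five it computes the proportion of the population actually enclosed by the interval built with the tabulated factor 4.28 (the first number), then reports how often that proportion reaches 90 percent (the second number).

```python
import math
import random
from statistics import NormalDist

def tolerance_confidence(n=5, factor=4.28, proportion=0.90,
                         trials=20_000, seed=1):
    """Fraction of simulated samples whose interval y_bar ± factor*s
    encloses at least `proportion` of the population."""
    rng = random.Random(seed)
    population = NormalDist(0.0, 1.0)  # hypothetical "true" population
    successes = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        y_bar = sum(sample) / n
        s = math.sqrt(sum((y - y_bar) ** 2 for y in sample) / (n - 1))
        lower, upper = y_bar - factor * s, y_bar + factor * s
        # First number: proportion of the population inside this interval.
        enclosed = population.cdf(upper) - population.cdf(lower)
        if enclosed >= proportion:
            successes += 1
    # Second number: probability that the interval meets the proportion.
    return successes / trials

print(f"intervals enclosing at least 90%: {tolerance_confidence():.3f}")
```

The printed fraction should be close to 0.95, the probability associated with the tabulated factor, while the enclosed proportion varies from sample to sample.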
PREDICTION INTERVAL TO CONTAIN ALL OF A SPECIFIED NUMBER OF FUTURE OBSERVATIONS.
Another type of interval is one that will contain all the values of one or more future observations.
This is known as a prediction interval. The last five columns of Table A-1.1 provide values of the factor $c_{P,k}(n)$ such that all of $k$ future observations from the same normal population will be located in the interval:
$\bar{y} \pm c_{P,k}(n)\,s,$
with a probability of 0.95.
Fig. A-1 How the Lengths of the Statistical Intervals for the Example Compare
For example, if two additional readings are taken from the example in the box, $k = 2$ and $n = 5$.
From Table A-1.1 the factor $c_{P,2}(5) = 3.70$. Thus two future units from the sampled population will be located in the interval:
50.10 ± (3.70)(1.31)
or 45.25 to 54.95, with a probability of 0.95.
The relative lengths of the three intervals obtained in the preceding examples are compared in Fig. A-1. It is seen that for the given sample of 5, the confidence interval to contain the population mean is appreciably smaller than both the tolerance interval and the prediction interval. Also a tolerance interval to include at least 90 percent of the population with a probability of 0.95 is somewhat larger than a prediction interval to contain both of two future observations.
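The comparison can be reproduced numerically from the values quoted in the text. This is a minimal sketch: the helper `two_sided_interval` and the dictionary labels are names introduced here for illustration, while the mean, standard deviation, and factors are the rounded figures given for the five-unit example.

```python
def two_sided_interval(y_bar, s, factor):
    """Return (lower, upper) for the interval y_bar ± factor * s."""
    half_width = factor * s
    return y_bar - half_width, y_bar + half_width

# Rounded values quoted in the text for the n = 5 example.
Y_BAR, S = 50.10, 1.31
FACTORS = {
    "confidence (mean)": 1.24,              # cM(5)
    "tolerance (90% of population)": 4.28,  # cT,90(5)
    "prediction (2 future obs.)": 3.70,     # cP,2(5)
}

intervals = {name: two_sided_interval(Y_BAR, S, c) for name, c in FACTORS.items()}
for name, (lower, upper) in intervals.items():
    print(f"{name:32s} {lower:.2f} to {upper:.2f}  (length {upper - lower:.2f})")
```

The three printed lengths follow the ordering described above: the confidence interval is the shortest, the prediction interval for two future observations is next, and the 90 percent tolerance interval is the longest.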
Inspection of the tabulations indicates that a confidence interval on the mean is always smaller than the other two intervals, but that the relative sizes of the tolerance and prediction intervals depend upon the proportion of the population to be contained in the tolerance interval and the number of future observations to be contained in the prediction interval. Also, unlike the other two intervals, the length of a confidence
interval approaches zero as the sample size increases (the interval converging to the point $\mu$).
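The limiting behaviour can be sketched as follows. The table itself does not state how its factors were derived, so the expressions below are an assumption: they are the usual Student-t based forms for a normal sample, and they reproduce the tabulated $c_M(5) = 1.24$. Here $t_{0.975,\,n-1}$ denotes the 97.5 percent point of Student's t distribution with $n-1$ degrees of freedom.

```latex
\begin{align*}
c_M(n) &= \frac{t_{0.975,\,n-1}}{\sqrt{n}}, &
c_M(5) &= \frac{2.776}{\sqrt{5}} \approx 1.24, &
c_M(n) &\to 0 \quad (n \to \infty) \\
c_{P,1}(n) &= t_{0.975,\,n-1}\sqrt{1 + \frac{1}{n}}, & & &
c_{P,1}(n) &\to 1.96 \quad (n \to \infty) \\
& & & &
c_{T,90}(n) &\to 1.645 \quad (n \to \infty)
\end{align*}
```

Under these assumptions the tolerance and prediction intervals keep a finite length however large the sample, approaching the known-parameter intervals $\mu \pm 1.645\sigma$ and $\mu \pm 1.96\sigma$, whereas the confidence interval shrinks in proportion to $1/\sqrt{n}$.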
HOW TO SELECT THE RIGHT INTERVAL.
The statistician’s job is to develop correct procedures for answering relevant questions. The engineer must decide upon the relevant questions.
Once the questions to be answered have been clearly stated, it should be easy to decide upon the correct intervals. The following comments are offered to serve as a guide to the engineer in this process.
The mean is the most commonly used single value to describe a population. For the normal distribution, the mean $\mu$ is one of the two parameters which uniquely define the distribution. It is identical to the median (50 percent point) and mode (most common value) of the distribution.
The population mean is therefore of great interest in characterizing product performance, and is often used as a standard by which competing processes are compared. Its use for such comparisons is especially appropriate when it is reasonable to assume that each of the competing processes has the same statistical variability (as measured by the process standard deviation) and, therefore, the differences between processes can be described
completely by differences in their means. The assumption of equal standard deviations is frequently made.
Because of random fluctuations, a sample does not provide perfect information about the population mean $\mu$. Thus, a confidence interval is established which contains the unknown value of $\mu$ with a specified degree of confidence.
If, instead of characterizing typical process performance, you are interested in estimating the range of variation of the underlying population or of the observations in a future sample, then a tolerance interval or a prediction interval is needed.
Specifically, a tolerance interval is applicable if limits are needed that contain most of the sampled population, while a prediction interval would be used to obtain limits to contain all of a small number of future units from the population. Thus, an engineer who is concerned with the performance of a mass-produced item, such as a transistor or a lamp, would generally be interested in a tolerance interval to enclose a high proportion of the sampled population.
In contrast, a prediction interval to contain all of k future observations may be thought of as the astronaut’s interval. A typical astronaut, who has been assigned to a specific number of flights, is generally not very interested in what will happen on the average in the population of all space flights, of which his happen to be a random sample (confidence interval on the mean), or even what will happen in at least 99 percent of such flights (tolerance interval). His main concern is the worst that will happen in the one, three, or five flights in which he will be personally involved. Similarly, a turbine engineer who is bidding on an order of three units based upon his past experience on five units of the same type, would use a prediction interval to obtain specification limits to contain the performance parameter for all three units with a high probability. Prediction intervals are also required by the typical customer who purchases one or a small number of units of a given product and is concerned with predicting the performance of the particular units he has purchased (in contrast to the long-run performance of the process from which the sample has been selected).
WHERE TO GET MORE INFORMATION.
Standard books on elementary engineering statistics, Reference 4, give prime space to the concept of confidence intervals and, in many cases, also
discuss tolerance intervals, but make no mention of prediction intervals except in a regression context.
Such intervals, however, are discussed in References 1, 2, 3, and 5. Further, Reference 3 provides a comprehensive comparison of statistical intervals for a normal population (including more detailed tabulations than are given here) and a discussion of methods for constructing the various intervals.
This article also considers additional types of statistical intervals such as:
A prediction interval to contain a future sample mean,
A prediction interval to contain a future sample standard deviation,
A confidence interval for the population standard deviation,
A confidence interval for a population percentile.
Finally, a new time-sharing computer program calculates a wide variety of statistical intervals, including confidence, tolerance, and prediction intervals, Reference 6.
THE EXAMPLE PROBLEM
The calculation of the three intervals is illustrated here by the following numerical example. Assume that readings obtained on a normally distributed performance parameter based on a random sample of five units are: 51.4, 49.5, 48.7, 49.3, and 51.6. From this information, the sample mean $\bar{y}$ and the sample standard deviation $s$ are calculated by well-known expressions:
$\bar{y} = \sum_{i=1}^{n} y_i / n = (51.4 + \ldots + 51.6)/5 = 50.10$
$s = \left[ \sum_{i=1}^{n} (y_i - \bar{y})^2 / (n - 1) \right]^{1/2} = 1.31$
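The two sample statistics can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the variable names are chosen here for illustration.

```python
import statistics

# Readings from the example problem (random sample of five units).
readings = [51.4, 49.5, 48.7, 49.3, 51.6]

n = len(readings)
y_bar = statistics.mean(readings)  # sample mean
s = statistics.stdev(readings)     # sample standard deviation (n - 1 divisor)

print(f"n = {n}, mean = {y_bar:.2f}, s = {s:.2f}")
```

Rounded to two decimals, the results agree with the values 50.10 and 1.31 used in the interval calculations of this appendix.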
1. Hahn, G. J., “Additional Factors for Calculating Prediction Intervals for Samples from a Normal Distribution.” Journal of the American Statistical Association, 65, December, 1970.
2. Hahn, G. J., “Factors for Calculating Two-Sided Prediction Intervals for Samples from a Normal Distribution.” Journal of the American Statistical Association, 64, September, 1969.