Confidence Intervals for Validating Simulation Models [1]
Summary. This paper summarizes the processes for building and using confidence intervals to evaluate the validity of simulation models. [2]

We can use statistical methods to evaluate the goodness of fit between simulation engines and real life. This requires the operator to select a probability of compliance that the operator judges adequate, e.g., 98%. With that value of probability of inclusion ($P$) we can calculate the probability of exclusion ($\alpha$), and with that and the sample size $n$ calculate the interval (in terms of the experimental average $\bar{t}$ and variance $S^2$) within which the simulation data must fall to be representative of real life.

The basic principle is to use live data sampling to establish a statistically valid estimate of the population mean and variance for a particular parameter, as well as a range of values about each (called confidence intervals) that depends on the percent confidence the operator requires. Given these intervals, the task is then to decide whether the corresponding mean and variance statistics of the simulation data fall within the acceptable confidence interval about the sample statistics.

Parameters other than mean and variance can be used for comparison. For example, the Universal Naval Task List (UNTL) contains a host of measures of effectiveness (MOEs) that can be used for making comparisons between systems and processes. We have restricted ourselves to the mean and variance in this paper only for illustrative purposes.
Confidence Interval. A confidence interval is the region in the vicinity of a specified value of a phenomenon's parameter within which another value may lie with a given probability [100(1-$\alpha$)%], based on the results of observing $n$ samples of the phenomenon.

[Figure 1 - Generic Confidence Interval Graph]
In general, experimental sample data are assumed to be only representative of some larger (and generally infinite) set of data elements. Suppose we are measuring the time $t$ to detect a target after it arrives within a specified range of the sensor. We can observe the detection phenomenon a number of times and calculate the average time to detect. But we might be more interested in knowing what the average detection time would be if we observed an infinite number of trials. This infinite set is said to be the global “population” of data.
Sample and Population Statistics. A “statistic” is some number calculated from data that is used to characterize the data set. There are many different statistics; the more commonly used are the arithmetic mean (“average”) and the variance. [3]
[1] C. Andrews La Varre, Booz Allen & Hamilton Inc., Newport, RI, October 2000.
[2] Douglas C. Montgomery, Statistical Quality Control, John Wiley & Sons, 1985, ISBN 0-471-80870-9, Section 2-3, Chapter 3.
[3] Others include the standard deviation, the t statistic, and many, many others. We will highlight those of immediate interest for this problem.
[4] For example, the mean of 2, 4, and 6 = (2+4+6)/3 = 4.

• The population arithmetic mean is generally indicated by the Greek “m” letter $\mu$ (“mu”), while the sample mean is indicated by a letter with a bar over the top, for example $\bar{t}$ (called “t-bar”). The sample mean is calculated as the sum of the observed values divided by their count: [4]

$$\bar{t} = \frac{1}{n}\sum_{i=1}^{n} t_i$$
• The population variance is generally indicated by the squared Greek “s” letter $\sigma^2$ (“sigma squared”), while the sample variance is indicated by the squared capital “S” letter $S^2$ (“S-squared”). The sample variance is, to within the $n-1$ divisor, the average squared distance that the parameter varies from its average: [5]

$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(t_i - \bar{t}\right)^2$$
• The square root of the variance is called the “standard deviation”, written $\sigma$ for the population and $S$ for the sample.
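The three statistics above can be sketched in a few lines of Python. This is only an illustration using the three values 2, 4, 6 from the footnoted example; the variable names are assumptions made for the sketch, and the standard library's `statistics` module is used as a cross-check.

```python
import statistics

# The three values from the paper's footnoted example.
times = [2, 4, 6]

n = len(times)
t_bar = sum(times) / n                                       # sample mean
s_squared = sum((t - t_bar) ** 2 for t in times) / (n - 1)   # sample variance (n-1 divisor)
s = s_squared ** 0.5                                         # sample standard deviation

# Cross-check against the standard library, which also uses the n-1 divisor.
same_mean = t_bar == statistics.mean(times)
same_variance = s_squared == statistics.variance(times)

print(t_bar, s_squared, s)
```

With the n-1 divisor the sample variance of 2, 4, 6 is 8/2 = 4, so the sample standard deviation is 2.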
It turns out [6] that the sample and population statistics are related mathematically by expressions that depend on the sample size. For the normal distribution discussed below these relationships are:

• The sample mean $\bar{t}$ is distributed normally about the true population mean $\mu$, but with a variance equal to the population variance $\sigma^2$ divided by the sample size $n$: $\sigma_{\bar{t}}^2 = \sigma^2 / n$.
Similarly, the variance of the sample set is not the same as that of the population; here there is a function of the sample variance that fits a particular curve called the Chi-squared distribution. The formula is rather arcane, so it is bypassed here for simplicity.
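The relationship between the variance of the sample mean and the population variance can be checked empirically. The following is a rough simulation sketch, not part of the original method: the population mean, standard deviation, sample size, and number of replicates are all hypothetical values chosen for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

MU, SIGMA = 50.0, 10.0   # hypothetical population mean and standard deviation
N = 25                   # hypothetical sample size
REPLICATES = 2000        # number of independent samples drawn

# Draw many samples of size N and record the mean of each one.
sample_means = [
    statistics.fmean([random.gauss(MU, SIGMA) for _ in range(N)])
    for _ in range(REPLICATES)
]

observed = statistics.variance(sample_means)   # spread of the sample means
predicted = SIGMA ** 2 / N                     # sigma^2 / n from the text

print(f"observed {observed:.2f}, predicted {predicted:.2f}")
```

The observed variance of the sample means should land close to the predicted value of 4, and it gets closer as the number of replicates grows.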
Confidence intervals are used to evaluate how close sample statistics are to the population statistics. Effectively, they let us scale the observed data to a common (“standard”) interval in order to reach consistent conclusions about experimental data. The formulas are different for different kinds of events. We will describe the formulas for the most common type of event, that whose data conform to the familiar bell-shaped curve, the “Normal” distribution.
Normal Distribution. A “Normal” distribution curve is a symmetric curve centered on an average, the “arithmetic mean” value. It gives a graph (a continuous histogram) of how often each value of a Normally-distributed parameter is expected to occur. The edges (or “skirts”) of the graph fall off in a manner defined by the “variance”, such that about 68% of the values occur within plus or minus one sigma ($\pm\sigma$) on either side of the mean. The mean can be any value, as can the variance.

[Figure 2 - "Normal" Distribution, with $-\sigma$ and $\sigma$ marked about the mean]
[5] For the same example, 2, 4, 6 vary from their average (4) by –2, 0, 2 respectively, the squared amounts being 4, 0, 4, respectively. Their sum is 8, so the sample variance is 8/(3-1) = 4. The term is squared to nullify the effect of negative numbers, since we are interested in just the size of the distance. The sum is divided by n-1 rather than n to ensure the result more closely approaches the population variance.
[6] Ibid.

Standard Normal Distribution. The “Standard” Normal distribution is a specially scaled version of the ordinary “Normal” distribution. This curve is centered on zero and shaped so that its variance is 1.0, so that about 68% of the observations fall between the values -1 and +1.

[Figure 3 - "Standard" Normal Distribution, centered on 0 with -1 and 1 marked]

The process of converting a normal to a standard normal distribution is called standardization and involves using a conversion parameter $z$:

$$z = \frac{t - \mu}{\sigma} \qquad (1)$$

These characteristics are very useful as a “standard” since any “Normal” curve can be scaled to the “Standard Normal” to allow making comparisons and drawing conclusions. [7]
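Equation (1) and the one-sigma coverage figure can be illustrated with Python's standard library. This is a minimal sketch; the population mean of 50 and standard deviation of 10 are hypothetical, and `standardize` is a helper name introduced only for this example.

```python
from statistics import NormalDist


def standardize(t, mu, sigma):
    """Equation (1): convert a value t to its standard normal equivalent z."""
    return (t - mu) / sigma


mu, sigma = 50.0, 10.0            # hypothetical population mean and standard deviation
z = standardize(65.0, mu, sigma)  # 65 lies 1.5 standard deviations above the mean

# Fraction of a standard normal population lying between -1 and +1.
std_normal = NormalDist()         # mean 0, standard deviation 1
within_one_sigma = std_normal.cdf(1.0) - std_normal.cdf(-1.0)

print(z, round(within_one_sigma, 4))
```

The coverage comes out at roughly 0.68, which is the figure quoted for the interval from -1 to +1.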
Normal and Standard Normal Probabilities. It turns out that the probability of a value being less than or equal to some value $t_0$ in the Normal distribution is calculated as the integral of the Normal distribution over the range from the far left edge up to the value $t_0$. [8] The value of this integral is mathematically equal to the probability of the variable $z$ being less than or equal to the expression above, or:

$$P(t \le t_0) \equiv P(z \le z_0), \qquad z_0 = \frac{t_0 - \mu}{\sigma} \qquad (2)$$

These “cumulative probability” values are tabulated in a number of different texts for various values of $z$.
It also turns out that the probability of a value being less than or equal to some other is equal to one minus the probability of it being bigger than the other:

$$P(u \le q) = 1 - P(u > q) \qquad (3)$$
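Equations (2) and (3) can be checked numerically instead of with printed tables. The sketch below assumes a hypothetical Normal population with mean 50 and standard deviation 10, and a hypothetical threshold of 60; it uses `statistics.NormalDist` from the Python standard library in place of a table lookup.

```python
from statistics import NormalDist

mu, sigma = 50.0, 10.0   # hypothetical population parameters
t0 = 60.0                # hypothetical threshold value

z0 = (t0 - mu) / sigma                     # standardized threshold, equation (1)

p_direct = NormalDist(mu, sigma).cdf(t0)   # P(t <= t0) on the original scale
p_standard = NormalDist().cdf(z0)          # P(z <= z0) on the standard scale, equation (2)
p_greater = 1.0 - p_standard               # P(t > t0) by the complement rule, equation (3)

print(round(p_direct, 4), round(p_standard, 4), round(p_greater, 4))
```

Both cumulative probabilities agree (about 0.84 for one standard deviation above the mean), which is the point of equation (2): one standard table serves every Normal curve.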
Equations (2) and (3) can be used to calculate the probability that some value is between two other values. For example, the probability that the mean is between $a$ and $b$ is equal to the probability that it is less than or equal to $b$ minus the probability that it is less than or equal to $a$:

$$P(a \le t \le b) = P(t \le b) - P(t \le a) \qquad (4)$$
[7] The standard normal and cumulative standard normal distributions are widely published in tables. Additionally they are readily available in Microsoft Excel with the functions NORMDIST() and NORMSDIST(). The first computes the values for either a normal or standard normal distribution, the latter only for the cumulative standard normal distribution.
[8] The “integral” is simply a very precise way of adding things up. It “integrates” a range of incremental probabilities into a cumulative probability value. In a discrete problem it is represented by the summation symbol, the Greek capital S: $\Sigma$. The difference between a summation and an integral is essentially only the size of the samples being added up. So if we add up the incremental probabilities of a number being just so, we get the total probability of a number being less than or equal to the last “just so” value.

Confidence Intervals. So with (2) and (4) we can use the tabulated values for $z_a = \frac{a - \mu}{\sigma}$ and $z_b = \frac{b - \mu}{\sigma}$ to calculate the probability of $a \le t \le b$:

$$P(a \le t \le b) = P\!\left(z \le \frac{b - \mu}{\sigma}\right) - P\!\left(z \le \frac{a - \mu}{\sigma}\right) = P_{z_b} - P_{z_a} \qquad (5)$$

where $P_{z_a}$ and $P_{z_b}$ are obtained from the Cumulative Standard Normal Distribution tables. What this means is that for any Normally distributed set of data we can calculate the likelihood of its mean value being between two arbitrarily chosen values $a$ and $b$.
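Equation (5) can be sketched as a small function. The example values (a population with mean 50 and standard deviation 10, and the bounds 45 and 70) are hypothetical, and `interval_probability` is a name introduced only for this illustration; the standard library's `NormalDist` stands in for the cumulative tables.

```python
from statistics import NormalDist


def interval_probability(a, b, mu, sigma):
    """Equation (5): P(a <= t <= b) for a Normal(mu, sigma) variable."""
    z_a = (a - mu) / sigma          # standardized lower bound
    z_b = (b - mu) / sigma          # standardized upper bound
    std_normal = NormalDist()       # cumulative standard normal "table"
    return std_normal.cdf(z_b) - std_normal.cdf(z_a)


# Hypothetical bounds: 0.5 sigma below the mean to 2 sigma above it.
p = interval_probability(45.0, 70.0, mu=50.0, sigma=10.0)
print(round(p, 4))
```

For these bounds the probability is roughly 0.67.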
Application to Evaluating Simulated Data. We can collect data on a simulation engine for a range of parameters, collect data on live events for the same parameters, and use the process above to compare the closeness of the simulation data to the live data. For example, if we have a mean time to detect of 30 minutes in the simulation data and a mean time of 50 minutes from the live data, we can use the confidence interval to determine whether the simulation mean is within range of the population mean as established by the sampling of live data, that is, whether 30 minutes lies within the 100(1-$\alpha$)% interval about the sample mean of 50 minutes. If it is not in that interval then we need to ask why it is not, and what it would take to get it into that range.
This does not, however, immediately answer the question about sample size needed to reach these conclusions. At this point the literature gets really arcane.

However, we can simplify it. Recall that the sample and population statistics are related by a normal distribution for the mean and a Chi-squared distribution for the variance. These relationships allow us, for a given probability of containment, to calculate a confidence interval for the mean under different conditions. Specifically (and this is turn-the-crank stuff for an Operations Research / Systems Analysis [ORSA] person):
Unknown population distribution, known population mean and variance.

a. Select a desired probability of containment $P$.
b. Calculate the resulting probability of exclusion $\alpha/2 = \frac{1 - P}{2}$.
c. Calculate the sample mean $\bar{t}$.
d. Calculate the sample size $n$.
e. Calculate the value of $z$ for that exclusion probability, $z_{\alpha/2}$, which is the standardized value $\frac{a - \mu}{\sigma}$ of equation (1) satisfying $P(z \le z_{\alpha/2}) = 1 - \alpha/2$.
f. Calculate the interval within which the population mean is contained with a probability of $P = (1 - \alpha)$ from:

$$\bar{t} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{t} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
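Steps a through f can be sketched as follows, reusing the paper's example of a live mean of 50 minutes, a simulation mean of 30 minutes, and a 98% probability of containment. The population standard deviation (20 minutes) and sample size (16) are hypothetical values chosen only to make the example run, and `mean_interval_known_sigma` is an illustrative name. `NormalDist.inv_cdf` from the Python standard library replaces the table lookup for $z_{\alpha/2}$.

```python
from math import sqrt
from statistics import NormalDist


def mean_interval_known_sigma(t_bar, sigma, n, P):
    """Interval containing the population mean with probability P (known sigma)."""
    alpha = 1.0 - P                                # step b: total exclusion probability
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)    # step e: z for alpha/2 in each tail
    half_width = z * sigma / sqrt(n)               # step f: z * sigma / sqrt(n)
    return t_bar - half_width, t_bar + half_width


# Live data: mean 50 minutes (from the text); sigma and n are hypothetical.
low, high = mean_interval_known_sigma(t_bar=50.0, sigma=20.0, n=16, P=0.98)

simulation_mean = 30.0                             # simulation mean from the text
inside = low <= simulation_mean <= high

print(round(low, 2), round(high, 2), inside)
```

Under these assumed values the interval runs from about 38.4 to 61.6 minutes, so a simulation mean of 30 minutes falls outside it and the simulation would warrant a closer look.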
Normal population distribution, known population mean, unknown variance.

a. Select a desired probability of containment $P$.
b. Calculate the resulting probability of exclusion $\alpha/2 = \frac{1 - P}{2}$.
c. Calculate the sample mean $\bar{t}$ and variance $S^2$.
d. Calculate the degrees of freedom $= n - 1$ from a sample size $n$.
e. Use t-distribution tables to get the percentage point $\omega$ of the t-distribution with $n-1$ degrees of freedom and exclusion probability $\alpha/2$.
f. Use the Chi-squared tables to get the percentage points $\xi_{\alpha/2}$ and $\xi_{(1-\alpha/2)}$ of the Chi-squared distribution with $n-1$ degrees of freedom for both exclusion probabilities $\alpha/2$ and $(1-\alpha/2)$, that is, the values exceeded with those probabilities.
g. Calculate the interval within which the population mean is contained with a probability of $P = (1 - \alpha)$ from the following, where the sample standard deviation $S$ stands in for the unknown $\sigma$:

$$\bar{t} - \omega\frac{S}{\sqrt{n}} \le \mu \le \bar{t} + \omega\frac{S}{\sqrt{n}}$$

h. Calculate the interval within which the population variance is contained with a probability of $P = (1 - \alpha)$ from:

$$\frac{(n-1)S^2}{\xi_{\alpha/2}} \le \sigma^2 \le \frac{(n-1)S^2}{\xi_{(1-\alpha/2)}}$$
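Steps a through h can be sketched with table values passed in as arguments, mirroring the table lookups in steps e and f. Everything numeric here is an assumption for illustration: the ten live detection times are made up, $P$ is taken as 95%, and the percentage points for 9 degrees of freedom (2.262 for the t-distribution, 19.023 and 2.700 for Chi-squared) are copied from standard published tables. The function names are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def mean_interval_t(t_bar, s_squared, n, omega):
    """Step g: interval for the population mean using the t percentage point omega."""
    half_width = omega * sqrt(s_squared / n)
    return t_bar - half_width, t_bar + half_width


def variance_interval(s_squared, n, xi_upper, xi_lower):
    """Step h: interval for the population variance.

    xi_upper is the Chi-squared value exceeded with probability alpha/2,
    xi_lower the value exceeded with probability 1 - alpha/2.
    """
    return (n - 1) * s_squared / xi_upper, (n - 1) * s_squared / xi_lower


# Hypothetical live detection times, in minutes.
live = [44, 52, 47, 58, 49, 55, 41, 53, 50, 51]
n = len(live)
t_bar = mean(live)           # step c: sample mean
s_squared = variance(live)   # step c: sample variance (n-1 divisor)

# Table values for P = 0.95 (alpha/2 = 0.025) and n - 1 = 9 degrees of freedom.
OMEGA = 2.262
XI_UPPER, XI_LOWER = 19.023, 2.700

mean_low, mean_high = mean_interval_t(t_bar, s_squared, n, OMEGA)
var_low, var_high = variance_interval(s_squared, n, XI_UPPER, XI_LOWER)

print(round(mean_low, 2), round(mean_high, 2))
print(round(var_low, 2), round(var_high, 2))
```

Passing the percentage points in as arguments keeps the sketch free of third-party libraries; a statistics package with inverse t and Chi-squared functions could supply them instead.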
Evaluating the Validity of Simulation Data. We can use these methods to evaluate the validity of the simulation data. Specifically:

a. Calculate the simulation data statistics $\bar{t}$ and $S^2$.
b. Use standard methods to extrapolate to population $\mu$ and $\sigma^2$.
c. Postulate live sample sizes $n$ and sample mean $\bar{t}$ and variance $S^2$.
d. Calculate the population mean and variance confidence intervals as described.
e. If the simulation data mean and variance do not fall in those intervals then take steps to calibrate the program to have the results reflect the real data.
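The final comparison in step e amounts to two range checks. The sketch below assumes the confidence intervals have already been computed from the live data; the interval endpoints and the simulation statistics are hypothetical numbers, and `within` and `validate` are names introduced only for this illustration.

```python
def within(interval, value):
    """Return True when value lies inside the closed interval (low, high)."""
    low, high = interval
    return low <= value <= high


def validate(sim_mean, sim_variance, mean_interval, variance_interval):
    """Step e: compare simulation statistics against the live-data intervals."""
    return {
        "mean_ok": within(mean_interval, sim_mean),
        "variance_ok": within(variance_interval, sim_variance),
    }


# Hypothetical simulation statistics and live-data confidence intervals.
result = validate(
    sim_mean=30.0,
    sim_variance=40.0,
    mean_interval=(46.4, 53.6),
    variance_interval=(12.1, 85.2),
)
needs_calibration = not all(result.values())

print(result, needs_calibration)
```

Reporting the mean and variance checks separately shows which statistic is out of range, which may help direct the calibration effort.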
Alternative Approaches. There are, of course, many other methods of comparing data. “Curve Fitting” (regression theory) can be used to devise a closed form equation that describes the data. The equations can then be compared to evaluate their similarities. Another interesting approach is the Kolmogorov-Smirnov test. [9] The Kolmogorov-Smirnov D is a particularly simple measure: it is defined as the maximum value of the absolute difference between two cumulative distribution functions. This allows direct comparison between two sets of data after constructing a synthetic cumulative distribution function for each set. The simplicity of this approach is that no assumptions need be made about the actual distribution of the sample sets; you simply perform an absolute distance test between curves that represent each set.
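The Kolmogorov-Smirnov D described above can be sketched directly from its definition. This is a minimal two-sample version that builds an empirical (step-function) cumulative distribution for each data set; the function name and the sample values are hypothetical, and the sketch computes only D, not the significance level associated with it.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Maximum absolute difference between two empirical cumulative distributions."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    d = 0.0
    # Both empirical CDFs are step functions that change only at observed
    # values, so the largest gap must occur at one of those values.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)   # fraction of sample a <= x
        cdf_b = bisect_right(b, x) / len(b)   # fraction of sample b <= x
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Hypothetical live and simulated detection times.
live = [1, 2, 3, 4]
simulated = [3, 4, 5, 6]
print(ks_statistic(live, simulated))
```

D ranges from 0 for identical samples to 1 for samples that do not overlap at all; the two hypothetical samples here give 0.5.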
Conclusion. We can use statistical methods to evaluate the goodness of fit between simulation engines and real life. This requires the operator to select a probability of compliance that the operator judges adequate, e.g., 98%. With that value of probability of inclusion ($P$) we can calculate the probability of exclusion ($\alpha$), and with that and the sample size $n$ calculate the interval (in terms of the experimental average $\bar{t}$ and variance $S^2$) within which the simulation data must fall to be representative of real life. We can also use a variety of other methods to establish a confidence of similarity between separate data sets.

[9] http://www.ulib.org/webRoot/Books/Numerical_Recipes/bookcpdf/c14-3.pdf