Confidence Intervals for Validating Simulation Models [1]

Summary. This paper summarizes the processes for building and using confidence intervals to evaluate the validity of simulation models. [2]

We can use statistical methods to evaluate the goodness of fit between simulation engines and real life. This requires the operator to select an acceptable probability of compliance, e.g., 98%. With that value of probability of inclusion ($P$) we can calculate the probability of exclusion ($\alpha$), and with that and the sample size $n$ calculate the interval (in terms of the experimental average $\bar{t}$ and variance $S^2$) within which the simulation data must fall to be representative of real life.

The basic principle is to use live data sampling to establish a statistically valid estimate of the population mean and variance for a particular parameter, as well as a range of values about each (called confidence intervals) that depends on the percent confidence the operator requires. Given these intervals, the task is then to decide whether the corresponding mean and variance statistics of the simulation data fall within the acceptable confidence interval about the sample statistics.

Parameters other than mean and variance can be used for comparison. For example, the Universal Naval Task List (UNTL) contains a host of measures of effectiveness (MOEs) that can be used for making comparisons between systems and processes. We have restricted ourselves to the mean and variance in this paper only for illustrative purposes.

Confidence Interval. A confidence interval is the region in the vicinity of a specified value of a phenomenon’s parameter within which another value may lie with a given probability [100(1-α)%], based on the results of observing $n$ samples of the phenomenon:

Figure 1 - Generic Confidence Interval Graph

In general, experimental sample data are assumed to be only representative of some larger (and generally infinite) set of data elements. Suppose we are measuring the time $t$ to detect a target after it arrives within a specified range of the sensor. We can observe the detection phenomenon a number of times and calculate the average time to detect. But we might be more interested in knowing what the average detection time would be if we observed an infinite number of trials. This infinite set is said to be the global “population” of data.

Sample and Population Statistics. A “statistic” is some number calculated from data that is used to characterize the data set. There are many different statistics; the more commonly used are the arithmetic mean (“average”) and the variance. [3]

• The population arithmetic mean is generally indicated by the Greek “m” letter µ (“mu”), while the sample mean is indicated by a letter with a bar over the top, for example $\bar{t}$ (called “t-bar”). The sample mean is calculated as the sum of the observed values divided by their count: [4]

$$\bar{t} = \frac{1}{n}\sum_{i=1}^{n} t_i$$

[1] C. Andrews La Varre, Booz Allen & Hamilton Inc., Newport, RI, October 2000.

[2] Douglas C. Montgomery, Statistical Quality Control, John Wiley & Sons, 1985, ISBN 0-471-80870-9, Section 2-3, Chapter 3.

[3] Others include the standard deviation, the t statistic, and many others. We will highlight those of immediate interest for this problem.

[4] For example, the mean of 2, 4, and 6 = (2+4+6)/3 = 4.

• The population variance is generally indicated by the squared Greek “s” letter $\sigma^2$ (“sigma squared”), while the sample variance is indicated by the squared capital “S” letter $S^2$ (“S-squared”). The sample variance is the average squared distance of the observed values from their average: [5]

$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(t_i - \bar{t}\right)^2$$
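As a quick check of the mean and variance formulas, the sketch below computes both for the values 2, 4 and 6 used in the footnotes. The function names are illustrative only and do not come from the paper; the variance uses the n - 1 divisor.

```python
def sample_mean(values):
    """Sum of the observations divided by their count."""
    return sum(values) / len(values)


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    t_bar = sample_mean(values)
    return sum((t - t_bar) ** 2 for t in values) / (len(values) - 1)


data = [2, 4, 6]
print(sample_mean(data))      # 4.0
print(sample_variance(data))  # 4.0
```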

• The square root of the variance is called the “standard deviation”, for both the population ($\sigma$) and the sample ($S$).

It turns out that [6] the sample and population statistics are related mathematically by expressions that depend on the sample size. For the normal distribution discussed below these relationships are:

• The sample mean $\bar{t}$ is distributed normally about the true population mean $\mu$, but with a variance equal to the population variance $\sigma^2$ divided by the sample size $n$: $\sigma_{\bar{t}}^2 = \sigma^2 / n$.

• Similarly, the variance of the sample set is not the same as that of the population; here a function of the sample variance follows a particular curve called the Chi-squared distribution. The formula is rather arcane, so it is bypassed here for simplicity.
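The relationship between the variance of the sample mean and the population variance can be checked by simulation. The sketch below is an illustration only: the population values (mean 50, standard deviation 10), the sample size of 25 and the number of trials are assumptions chosen for the example, and the function name is hypothetical.

```python
import random
import statistics


def variance_of_sample_means(mu, sigma, n, trials, seed=0):
    """Draw `trials` samples of size n and return the variance of their means."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(trials)
    ]
    return statistics.variance(means)


# Assumed illustration values: sigma = 10 and n = 25, so sigma^2 / n = 4.
observed = variance_of_sample_means(mu=50.0, sigma=10.0, n=25, trials=5000)
print(observed, "should be close to", 10.0 ** 2 / 25)
```

The observed value will not equal 4 exactly, since it is itself an estimate, but it should sit close to it and move closer as the number of trials grows.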

Confidence intervals are used to evaluate how close sample statistics are to the population statistics. Effectively, they let us scale the observed data to a common (“standard”) interval in order to reach consistent conclusions about experimental data. The formulas are different for different kinds of events. We will describe the formulas for the most common type of event, whose data conform to the familiar bell-shaped curve, the “Normal” distribution.

Normal Distribution. A “Normal” distribution curve is a symmetric curve centered on an average, the “arithmetic mean” value. It gives a graph (a continuous histogram) of how often each value of a Normally-distributed parameter is expected to occur. The edges (or “skirts”) of the graph fall off in a manner defined by the “variance”, such that about 68% of the values occur within plus or minus one sigma (±σ) of the mean. The mean can be any value, as can the variance (provided it is positive).

Figure 2 - "Normal" Distribution

Standard Normal Distribution. The “Standard” Normal distribution is a specially scaled version of the ordinary “Normal” distribution. This curve is centered on zero and scaled so that its variance is 1.0, so that about 68% of the observations fall between the values -1 and +1:

Figure 3 - "Standard" Normal Distribution

[5] For the same example, 2, 4, 6 vary from their average (4) by -2, 0, 2 respectively, the squared amounts being 4, 0, 4, respectively. Their sum is 8, so the sample variance is 8/(3-1) = 4. The term is squared to nullify the effect of negative numbers, since we are interested in just the size of the distance. The sum is divided by n-1 rather than n so that the result more closely approaches the population variance.

[6] Ibid.

The process of converting a normal to a standard normal distribution is called standardization and involves using a conversion parameter $z$:

$$z = \frac{t - \mu}{\sigma} \qquad (1)$$

These characteristics are very useful as a “standard”, since any “Normal” curve can be scaled to the “Standard Normal” to allow making comparisons and drawing conclusions. [7]
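The scaling in equation (1) and the one-sigma figure can be reproduced with Python's standard library, which here stands in for the printed tables. This is a minimal sketch; the function name `standardize` and the example numbers (mean 50, standard deviation 10) are assumptions made for illustration.

```python
from statistics import NormalDist


def standardize(t, mu, sigma):
    """Equation (1): convert a value t to its standard normal score z."""
    return (t - mu) / sigma


standard = NormalDist(mu=0.0, sigma=1.0)

# Fraction of a standard normal population lying between -1 and +1.
within_one_sigma = standard.cdf(1.0) - standard.cdf(-1.0)
print(round(within_one_sigma, 4))  # 0.6827

# Assumed example: detection times with mean 50 and standard deviation 10.
print(standardize(60.0, mu=50.0, sigma=10.0))  # 1.0
```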

Normal and Standard Normal Probabilities. It turns out that the probability of a value being less than or equal to some value $t_0$ in the Normal distribution is calculated as the integral of the Normal distribution over the range from the far left edge up to the value $t_0$. [8] The value of this integral is mathematically equal to the probability of the variable $z$ being less than or equal to the expression above evaluated at $t_0$, or:

$$P(t \le t_0) \equiv P(z \le z_0), \qquad z_0 = \frac{t_0 - \mu}{\sigma} \qquad (2)$$

These “cumulative probability” values are tabulated in a number of different texts for various values of $z$.

It also turns out that the probability of a value being less than or equal to some other is equal to one minus the probability of it being bigger than the other:

$$P(u \le q) = 1 - P(u > q) \qquad (3)$$

Equations (2) and (3) can be used to calculate the probability that some value is between two other values. For example, the probability that the mean is between $a$ and $b$ is equal to the probability that it is less than or equal to $b$ minus the probability that it is less than or equal to $a$:

$$P(a \le t \le b) = P(t \le b) - P(t \le a) \qquad (4)$$
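Equations (1), (2) and (4) combine into a short calculation. The sketch below uses `statistics.NormalDist` from the Python standard library in place of a printed cumulative table; the function name `prob_between` and the example values are assumptions for illustration.

```python
from statistics import NormalDist


def prob_between(a, b, mu, sigma):
    """P(a <= t <= b) for a Normal(mu, sigma) variable via standard normal CDF values."""
    z = NormalDist()          # standard normal: mean 0, standard deviation 1
    z_a = (a - mu) / sigma    # standardized lower bound
    z_b = (b - mu) / sigma    # standardized upper bound
    return z.cdf(z_b) - z.cdf(z_a)


# Assumed example: mu = 50, sigma = 10, interval from 40 to 60 (one sigma each side).
print(round(prob_between(40.0, 60.0, mu=50.0, sigma=10.0), 4))  # 0.6827
```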

Confidence Intervals. So with (2) and (4) we can use the tabulated values for $z_a = \frac{a - \mu}{\sigma}$ and $z_b = \frac{b - \mu}{\sigma}$ to calculate the probability of $a \le t \le b$:

$$P(a \le t \le b) = P\left(z \le \frac{b - \mu}{\sigma}\right) - P\left(z \le \frac{a - \mu}{\sigma}\right) = P_{z_b} - P_{z_a} \qquad (5)$$

where $P_{z_a}$ and $P_{z_b}$ are obtained from the Cumulative Standard Normal Distribution tables. What this means is that for any Normally distributed set of data we can calculate the likelihood of its mean value being between two arbitrarily chosen values $a$ and $b$.

[7] The standard normal and cumulative standard normal distributions are widely published in tables. Additionally they are readily available in Microsoft Excel with the functions NORMDIST() and NORMSDIST(). The first computes the values for either a normal or standard normal distribution, the latter only for the cumulative standard normal distribution.

[8] The “integral” is simply a very precise way of adding things up. It “integrates” a range of incremental probabilities into a cumulative probability value. In a discrete problem it is represented by the summation symbol, the Greek capital S: Σ. The difference between a summation and an integral is essentially only the size of the samples being added up. So if we add up the incremental probabilities of a number being just so, we get the total probability of a number being less than or equal to the last “just so” value.

Application to Evaluating Simulated Data. We can collect data on a simulation engine for a range of parameters, collect data on live events for the same parameters, and use the process above to compare the closeness of the simulation data to the live data. For example, if we have a mean time to detect of 30 minutes in the simulation data and a mean time of 50 minutes from the live data, we can use the confidence interval to judge whether the simulation mean is within range of the population mean as established by the sampling of live data; that is, is 30 minutes within the 100(1-α)% interval about the sample mean of 50 minutes? If it is not in that interval, then we need to ask why it is not, and what it would take to get it into that range.

This does not, however, immediately answer the question about the sample size needed to reach these conclusions. At this point the literature gets really arcane.

However, we can simplify it. Recall that the sample and population statistics are related by a normal distribution for the mean and a Chi-squared distribution for the variance. These relationships allow us, for a given probability of containment, to calculate a confidence interval for the mean under different conditions. Specifically (and this is turn-the-crank stuff for an Operations Research / Systems Analysis [ORSA] person):

Unknown population distribution, known population mean and variance.

a. Select a desired probability of containment $P$.
b. Calculate the resulting probability of exclusion $\alpha = 1 - P$, so that $\alpha/2 = (1 - P)/2$.
c. Calculate the sample mean $\bar{t}$.
d. Determine the sample size $n$.
e. Calculate the value of $z$ for that exclusion probability, that is, the value $z_{\alpha/2}$ of $z = (a - \mu)/\sigma$ for which $P(z \le z_{\alpha/2}) = 1 - \alpha/2$.
f. Calculate the interval within which the population mean is contained with a probability of $P = (1 - \alpha)$ from:

$$\bar{t} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{t} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
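Steps a through f can be sketched in a few lines. The example reuses the 30-minute and 50-minute means from the detection-time illustration; the population standard deviation of 40 minutes, the two sample sizes and the function name `z_interval` are assumptions introduced only for this sketch, and `NormalDist().inv_cdf` stands in for a table lookup.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(t_bar, sigma, n, p):
    """Interval containing the population mean with probability p (known sigma)."""
    alpha = 1.0 - p                                  # probability of exclusion
    z_half = NormalDist().inv_cdf(1.0 - alpha / 2)   # z leaving alpha/2 in the upper tail
    half_width = z_half * sigma / sqrt(n)
    return t_bar - half_width, t_bar + half_width


# Assumed numbers: live sample mean 50 min, sigma 40 min, P = 0.98.
for n in (16, 64):
    low, high = z_interval(t_bar=50.0, sigma=40.0, n=n, p=0.98)
    inside = low <= 30.0 <= high  # does the simulation mean of 30 min fall inside?
    print(n, round(low, 2), round(high, 2), inside)
```

With these assumed numbers the 30-minute simulation mean falls inside the interval for 16 live trials but outside it for 64, which shows how strongly the conclusion depends on the sample size.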

Normal population distribution, known population mean, unknown variance.

a. Select a desired probability of containment $P$.
b. Calculate the resulting probability of exclusion $\alpha = 1 - P$, so that $\alpha/2 = (1 - P)/2$.
c. Calculate the sample mean $\bar{t}$ and variance $S^2$.
d. Calculate the degrees of freedom $= n - 1$ from a sample size $n$.
e. Use t-distribution tables to get the percentage point $\omega$ of the t-distribution with $n - 1$ degrees of freedom and exclusion probability $\alpha/2$.
f. Use the Chi-squared tables to get the percentage points $\xi_{\alpha/2}$ and $\xi_{(1-\alpha/2)}$ of the Chi-squared distribution with $n - 1$ degrees of freedom for both exclusion probabilities $\alpha/2$ and $(1 - \alpha/2)$.
g. Calculate the interval within which the population mean is contained with a probability of $P = (1 - \alpha)$, using the sample standard deviation $S$ in place of the unknown $\sigma$, from:

$$\bar{t} - \frac{\omega S}{\sqrt{n}} \le \mu \le \bar{t} + \frac{\omega S}{\sqrt{n}}$$

h. Calculate the interval within which the population variance is contained with a probability of $P = (1 - \alpha)$ from:

$$\frac{(n - 1) S^2}{\xi_{\alpha/2}} \le \sigma^2 \le \frac{(n - 1) S^2}{\xi_{(1-\alpha/2)}}$$
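The unknown-variance procedure can be sketched the same way. To stay close to the table-driven steps, the percentage points are passed in as arguments rather than computed; the values used below are the standard table entries for 9 degrees of freedom and P = 0.95. The detection times are made up for illustration, and the function name is hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def mean_and_variance_intervals(sample, omega, xi_upper, xi_lower):
    """Intervals for the population mean and variance, Normal data, unknown variance.

    omega    : t-table percentage point for alpha/2 with n - 1 degrees of freedom
    xi_upper : Chi-squared table point for alpha/2 (the larger value)
    xi_lower : Chi-squared table point for 1 - alpha/2 (the smaller value)
    """
    n = len(sample)
    t_bar = mean(sample)
    s2 = variance(sample)              # sample variance, divides by n - 1
    half_width = omega * sqrt(s2 / n)  # omega * S / sqrt(n)
    mean_ci = (t_bar - half_width, t_bar + half_width)
    var_ci = ((n - 1) * s2 / xi_upper, (n - 1) * s2 / xi_lower)
    return mean_ci, var_ci


# Made-up detection times in minutes; n = 10, so 9 degrees of freedom.
times = [42, 55, 48, 61, 50, 39, 57, 52, 46, 50]
# Table values for alpha/2 = 0.025 and 9 degrees of freedom.
mean_ci, var_ci = mean_and_variance_intervals(
    times, omega=2.262, xi_upper=19.023, xi_lower=2.700
)
print(mean_ci)
print(var_ci)
```

Passing the table values explicitly keeps the sketch free of third-party libraries; a statistics package could supply them instead.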

Evaluating the Validity of Simulation Data. We can use these methods to evaluate the validity of the simulation data. Specifically:

a. Calculate the simulation data statistics $\bar{t}$ and $S^2$.
b. Use standard methods to extrapolate to population $\mu$ and $\sigma^2$.
c. Postulate live sample sizes $n$ and sample mean $\bar{t}$ and variance $S^2$.
d. Calculate the population mean and variance confidence intervals as described.
e. If the simulation data mean and variance do not fall in those intervals, then take steps to calibrate the program so that its results reflect the real data.
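The final comparison in step e reduces to two interval checks. The sketch below is only an illustration: the function names, the interval endpoints and the simulation statistics are all hypothetical values chosen to show one passing and one failing check.

```python
def within(value, interval):
    """True when value lies inside the closed interval (low, high)."""
    low, high = interval
    return low <= value <= high


def validate_simulation(sim_mean, sim_variance, mean_ci, variance_ci):
    """Report which simulation statistics fall inside the live-data intervals."""
    return {
        "mean_ok": within(sim_mean, mean_ci),
        "variance_ok": within(sim_variance, variance_ci),
    }


# Hypothetical values: intervals from live data, statistics from the simulation.
result = validate_simulation(
    sim_mean=30.0, sim_variance=60.0,
    mean_ci=(45.2, 54.8), variance_ci=(21.2, 149.6),
)
print(result)  # {'mean_ok': False, 'variance_ok': True}
```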

Alternative Approaches. There are, of course, many other methods of comparing data. “Curve Fitting” (regression theory) can be used to devise a closed form equation that describes the data. The equations can then be compared to evaluate their similarities. Another interesting approach is the Kolmogorov-Smirnov test. [9] The Kolmogorov-Smirnov D is a particularly simple measure: it is defined as the maximum value of the absolute difference between two cumulative distribution functions. This allows direct comparison between two sets of data after constructing a synthetic cumulative distribution function for each set. The simplicity of this approach is that no assumptions need be made about the actual distribution of the sample sets; you simply perform an absolute distance test between the curves that represent each set.
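The Kolmogorov-Smirnov D described above can be computed directly from two data sets. The sketch below builds each empirical cumulative distribution from the sorted data and returns only the statistic D; it does not compute a significance level. The function name and the data are assumptions for illustration.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Maximum absolute gap between the two empirical cumulative distributions."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    d = 0.0
    # The empirical CDFs only change at observed values, so checking those suffices.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)  # fraction of sample_a <= x
        cdf_b = bisect_right(b, x) / len(b)  # fraction of sample_b <= x
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Made-up data sets for illustration.
live = [42, 55, 48, 61, 50]
simulated = [30, 35, 28, 40, 33]
print(ks_statistic(live, simulated))  # 1.0, the two samples do not overlap at all
print(ks_statistic(live, live))       # 0.0, identical samples
```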

Conclusion. We can use statistical methods to evaluate the goodness of fit between simulation engines and real life. This requires the operator to select an acceptable probability of compliance, e.g., 98%. With that value of probability of inclusion ($P$) we can calculate the probability of exclusion ($\alpha$), and with that and the sample size $n$ calculate the interval (in terms of the experimental average $\bar{t}$ and variance $S^2$) within which the simulation data must fall to be representative of real life. We can also use a variety of other methods to establish a confidence of similarity between separate data sets.

[9] http://www.ulib.org/webRoot/Books/Numerical_Recipes/bookcpdf/c14-3.pdf