Step 6: Document and approve the data
6.8 Distribution Fitting
6.8.2 Theoretical Distributions
As mentioned earlier, it may be appropriate to use the frequency distribution in the simulation model as it is, as an empirical distribution. Whenever possible, however, the underlying distribution from which the sample data came should be determined. This way the actual distribution of the population can be used to generate random variates in the simulation rather than relying only on a sampling of observations from the population. Defining the theoretical distribution that best fits sample data is called distribution fitting. Before discussing how theoretical distributions are fit to data, it is helpful to have a basic understanding of at least the most common theoretical distributions.
FIGURE 6.11 Histogram showing arrival count per five-minute interval. (x-axis: arrival count)
FIGURE 6.12 Frequency distribution for 100 observed inspection times.
FIGURE 6.13 Histogram distribution for 100 observed inspection times.
There are about 12 statistical distributions that are commonly used in simulation (Banks and Gibson 1997). Theoretical distributions can be defined by a simple set of parameters usually defining dispersion and density. A normal distribution, for example, is defined by a mean value and a standard deviation value. Theoretical distributions are either discrete or continuous, depending on whether a finite set of values within the range or an infinite continuum of possible values within a range can occur. Discrete distributions are seldom used in manufacturing and service system simulations because they can usually be defined by simple probability expressions. Below is a description of a few theoretical distributions sometimes used in simulation.
These particular ones are presented here purely because of their familiarity and ease of understanding. Beginners to simulation usually feel most comfortable using these distributions, although the precautions given for their use should be noted. An extensive list of theoretical distributions and their applications is given in Appendix A.
Binomial Distribution
The binomial distribution is a discrete distribution that expresses the probability that a particular condition or outcome occurs a given number of times in n trials, where p is the probability of that outcome on any single trial. We call an occurrence of the outcome of interest a success and its nonoccurrence a failure. For a binomial distribution to apply, each trial must be a Bernoulli trial: it must be independent and have only two possible outcomes (success or failure), and the probability of a success must remain constant from trial to trial. The mean of a binomial distribution is given by np, where n is the number of trials and p is the probability of success on any given trial. The variance is given by np(1 − p).
A common application of the binomial distribution in simulation is to test for the number of defective items in a lot or the number of customers of a particular type in a group of customers. Suppose, for example, it is known that the probability of a part being defective coming out of an operation is .1 and we inspect the parts in batch sizes of 10. The number of defectives for any given sample can be determined by generating a binomial random variate. The probability mass function for the binomial distribution in this example is shown in Figure 6.14.
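The defective-parts example can be sketched in a few lines of Python. This is a minimal illustration rather than the method used by any particular simulation package: the function names `binomial_pmf` and `binomial_variate` and the random seed are assumptions introduced here, and the variate is generated simply by counting successes over n Bernoulli trials.

```python
import math
import random

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n Bernoulli trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def binomial_variate(n, p, rng):
    """Generate one binomial variate by counting successes in n trials."""
    return sum(1 for _ in range(n) if rng.random() < p)

n, p = 10, 0.1                       # batch size and defect probability from the text
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]
mean = n * p                         # np
variance = n * p * (1 - p)           # np(1 - p)

rng = random.Random(42)              # arbitrary seed, only for repeatability
defectives = binomial_variate(n, p, rng)

for k, prob in enumerate(pmf):
    print(f"p({k}) = {prob:.4f}")
print("mean:", mean, "variance:", variance)
print("defectives in one simulated batch:", defectives)
```

The printed probabilities correspond to the bar heights of the probability mass function for n = 10 and p = .1.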
Uniform Distribution
A uniform or rectangular distribution is used to describe a process in which the outcome is equally likely to fall between the values of a and b. In a uniform distribution, the mean is $(a + b)/2$. The variance is expressed by $(b - a)^2/12$. The probability density function for the uniform distribution is shown in Figure 6.15.
FIGURE 6.14 The probability mass function of a binomial distribution (n = 10, p = .1). (x-axis: x; y-axis: p(x))
FIGURE 6.15 The probability density function of a uniform distribution. (x-axis: x, from a to b; y-axis: f(x))
FIGURE 6.16 The probability density function of a triangular distribution. (x-axis: x, with a, m, and b marked; y-axis: f(x))
The uniform distribution is often used in the early stages of simulation projects because it is a convenient and well-understood source of random variation. In the real world, it is extremely rare to find an activity time that is uniformly distributed because nearly all activity times have a central tendency or mode. Sometimes a uniform distribution is used to represent a worst-case test for variation when doing sensitivity analysis.
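As a quick check of the mean and variance expressions for the uniform distribution, the following sketch draws a large sample and compares the sample statistics with the formulas. The endpoints a = 2 and b = 8, the sample size, and the helper names are hypothetical values chosen only for illustration.

```python
import random
import statistics

def uniform_mean(a, b):
    """Theoretical mean of a uniform distribution on [a, b]."""
    return (a + b) / 2

def uniform_variance(a, b):
    """Theoretical variance of a uniform distribution on [a, b]."""
    return (b - a) ** 2 / 12

a, b = 2.0, 8.0                      # hypothetical minimum and maximum
rng = random.Random(1)               # arbitrary seed, only for repeatability
samples = [rng.uniform(a, b) for _ in range(100_000)]

sample_mean = statistics.fmean(samples)
sample_var = statistics.pvariance(samples)

print("theoretical mean:", uniform_mean(a, b), "sample mean:", round(sample_mean, 3))
print("theoretical variance:", uniform_variance(a, b), "sample variance:", round(sample_var, 3))
```

With a sample this large, the sample values should land close to the theoretical ones, although they will not match exactly.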
Triangular Distribution
A triangular distribution is a good approximation to use in the absence of data, especially if a minimum, maximum, and most likely value (mode) can be estimated. These are the three parameters of the triangular distribution. If a, m, and b represent the minimum, mode, and maximum values respectively of a triangular distribution, then the mean of a triangular distribution is $(a + m + b)/3$. The variance is defined by $(a^2 + m^2 + b^2 - am - ab - mb)/18$. The probability density function for the triangular distribution is shown in Figure 6.16.
The weakness of the triangular distribution is that values in real activity times rarely taper linearly, which means that the triangular distribution will probably create more variation than the true distribution. Also, extreme values that may be rare are not captured by a triangular distribution. This means that the full range of values of the true distribution of the population may not be represented by the triangular distribution.
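The triangular mean and variance expressions can be checked the same way. In this sketch the estimates a = 2, m = 3, and b = 7 are hypothetical, as are the helper names; the variates come from Python's standard `random.triangular`, which takes its arguments in the order low, high, mode. Note that every generated value falls between a and b, which illustrates the point that rare extreme values are not captured.

```python
import random
import statistics

def triangular_mean(a, m, b):
    """Theoretical mean for minimum a, mode m, maximum b."""
    return (a + m + b) / 3

def triangular_variance(a, m, b):
    """Theoretical variance for minimum a, mode m, maximum b."""
    return (a**2 + m**2 + b**2 - a * m - a * b - m * b) / 18

a, m, b = 2.0, 3.0, 7.0              # hypothetical minimum, mode, maximum
rng = random.Random(3)               # arbitrary seed, only for repeatability

# random.triangular expects (low, high, mode), so the mode goes last.
samples = [rng.triangular(a, b, m) for _ in range(100_000)]

sample_mean = statistics.fmean(samples)
sample_var = statistics.pvariance(samples)

print("theoretical mean:", triangular_mean(a, m, b), "sample mean:", round(sample_mean, 3))
print("theoretical variance:", round(triangular_variance(a, m, b), 4),
      "sample variance:", round(sample_var, 4))
print("smallest and largest values:", round(min(samples), 3), round(max(samples), 3))
```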
Normal Distribution
The normal distribution (sometimes called the Gaussian distribution) describes phenomena that vary symmetrically above and below the mean (hence the bell-shaped curve). While the normal distribution is often selected for defining activity times, in practice manual activity times are rarely ever normally distributed. They are nearly always skewed to the right (the ending tail of the distribution is longer than the beginning tail). This is because humans can sometimes take significantly longer than the mean time, but usually not much less than the mean time.
FIGURE 6.17 The probability density function for a normal distribution. (x-axis: x; y-axis: f(x))
Examples of normal distributions might be
• Physical measurements—height, length, diameter, weight.
• Activities involving multiple tasks (like loading a truck or filling a customer order).
The mean of the normal distribution is designated by the Greek letter mu (µ). The variance is $\sigma^2$, where $\sigma$ (sigma) is the standard deviation. The probability density function for the normal distribution is shown in Figure 6.17.
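The symmetry of the normal distribution can be illustrated with a short sketch. The parameter values (a mean of 10 and a standard deviation of 2), the sample size, and the helper `sample_skewness` are assumptions made for this example. A skewness near zero indicates a symmetric sample; a sample of right-skewed activity times would give a clearly positive value instead.

```python
import random
import statistics

def sample_skewness(values):
    """Average cubed standardized deviation; near 0 for symmetric data."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)

mu, sigma = 10.0, 2.0                # hypothetical mean and standard deviation
rng = random.Random(5)               # arbitrary seed, only for repeatability
samples = [rng.gauss(mu, sigma) for _ in range(100_000)]

sample_mean = statistics.fmean(samples)
sample_var = statistics.pvariance(samples)
skew = sample_skewness(samples)

print("sample mean:", round(sample_mean, 3))
print("sample variance:", round(sample_var, 3), "(sigma squared is", sigma**2, ")")
print("sample skewness:", round(skew, 3))
```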
Exponential Distribution
Sometimes referred to as the negative exponential, this distribution is used frequently in simulations to represent event intervals. The exponential distribution is defined by a single parameter, the mean (µ). This distribution is related to the Poisson distribution in that if the number of occurrences per unit of time is Poisson distributed, the time between occurrences is exponentially distributed. In other words, the mean of the exponential distribution is the inverse of the Poisson rate. For example, if the rate at which customers arrive at a bank is Poisson distributed with a rate of 12 per hour, the time between arrivals is exponentially distributed with a mean of 5 minutes. The exponential distribution has a memoryless or forgetfulness property that makes it well suited for modeling certain phenomena that occur independently of one another. For example, if times between arrivals are exponentially distributed with a mean of 5 minutes, then the expected time before the next arrival is 5 minutes regardless of how much time has elapsed since the previous arrival. It is as though there is no memory of how much time has already elapsed when predicting the next event, hence the term memoryless.
FIGURE 6.18 The probability density function for an exponential distribution. (x-axis: x; y-axis: f(x))
Examples of this distribution are
• Time between customer arrivals at a bank.
• Duration of telephone conversations.
• Time between arrivals of planes at a major airport.
• Time between failures of certain types of electronic devices.
• Time between interrupts of the CPU in a computer system.
For an exponential distribution, the standard deviation is the same as the mean, so the variance is the square of the mean (µ²). The probability density function of the exponential distribution is shown in Figure 6.18.
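The bank example and the memoryless property can be checked numerically with a short sketch. The sample size, the seed, and the 3-minute elapsed time are arbitrary choices made for illustration. Python's `random.expovariate` takes the rate (1 divided by the mean) as its argument, so a mean of 5 minutes corresponds to a rate of 0.2 per minute.

```python
import random
import statistics

arrivals_per_hour = 12                       # Poisson rate from the bank example
mean_minutes = 60 / arrivals_per_hour        # mean time between arrivals: 5 minutes

rng = random.Random(7)                       # arbitrary seed, only for repeatability
gaps = [rng.expovariate(1 / mean_minutes) for _ in range(200_000)]

sample_mean = statistics.fmean(gaps)
sample_sd = statistics.pstdev(gaps)

# Memoryless check: among gaps that have already lasted `elapsed` minutes,
# the average remaining wait should still be close to the overall mean.
elapsed = 3.0
remaining = [g - elapsed for g in gaps if g > elapsed]
mean_remaining = statistics.fmean(remaining)

print("mean time between arrivals:", round(sample_mean, 3))
print("standard deviation:", round(sample_sd, 3))
print("mean remaining wait after", elapsed, "minutes:", round(mean_remaining, 3))
```

All three printed values should be close to 5 minutes, which reflects both the equality of the mean and the standard deviation and the fact that elapsed time does not change the expected wait.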