Step 2: Develop Probability Distributions to Model Uncertainty

Uncertainty is best modeled with a probability distribution that accounts for all possible outcomes according to the probability that they will occur. Figure 18 gives an example of a known distribution that models all outcomes associated with rolling a pair of dice.

Figure 18: The Distribution of Sums from Rolling Two Dice

[Figure: probability (vertical axis) plotted against the value of the roll (horizontal axis). Annotations: 0 probability that the outcome is less than 2; 100% probability that the outcome will not exceed 12; 50% probability that the outcome is above or below 7 (this is the median); 7 is the most likely outcome.]

Source: GAO.

In figure 18, the horizontal axis shows the possible values of a roll of two dice, while the vertical axis shows the probability associated with each value. The value at the midpoint of all rolls is the median. In the example, the median is also the mean and the most likely value (a roll of 7), because the distribution of outcomes from rolling a pair of dice is symmetric.

Besides descriptive statistics, probability distributions provide other useful information, such as the boundaries of an outcome. For example, the lower bound in figure 18 is 2 and the upper bound is 12. By examining the distribution, it is easy to see that both the upper and lower bounds have the lowest probability of occurring, while the chances of rolling a 6, 7, or 8 are much greater.
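The dice distribution can be reproduced by enumerating all 36 equally likely outcomes. The following is a minimal sketch in Python; the variable names are illustrative and are not part of the guide.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate all 36 equally likely outcomes of rolling two six-sided dice.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
pmf = {total: Fraction(n, 36) for total, n in sorted(counts.items())}

lower_bound, upper_bound = min(pmf), max(pmf)
most_likely = max(pmf, key=pmf.get)
mean = sum(total * p for total, p in pmf.items())

# Cumulative probability that the sum is less than or equal to each value.
running = Fraction(0)
cdf = {}
for total, p in pmf.items():
    running += p
    cdf[total] = running

# Median: the smallest value whose cumulative probability reaches 50 percent.
median = min(total for total, c in cdf.items() if c >= Fraction(1, 2))

for total, p in pmf.items():
    print(f"{total:2d}  p={float(p):.3f}  cumulative={float(cdf[total]):.3f}")
print("bounds:", lower_bound, upper_bound)
print("most likely:", most_likely, "median:", median, "mean:", float(mean))
```

Because the distribution is symmetric, the most likely value, the median, and the mean coincide, and the two bounds share the lowest probability.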

It is difficult to pick an appropriate probability distribution for the point cost estimate as a whole, because it is composed of several subsidiary estimates based on the WBS. These WBS elements are often estimated with a variety of techniques, each with its own uncertainty distribution, which may be asymmetrical. Therefore, simply adding the most likely WBS element costs does not yield the most likely total cost estimate, because the risk distributions associated with the subelements differ in shape and skew.
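A small simulation illustrates the point. The sketch below assumes three hypothetical WBS elements, each with a right-skewed triangular distribution; the dollar values, the number of trials, and the $10K bin width used to estimate the most likely total are all assumptions made for illustration.

```python
import random
from collections import Counter

# Hypothetical WBS elements: (optimistic, most likely, pessimistic) costs in $K.
wbs_elements = [(80, 100, 160), (40, 50, 90), (150, 200, 350)]

sum_of_most_likely = sum(mode for _, mode, _ in wbs_elements)

random.seed(1)
trials = 100_000
totals = [
    # random.triangular takes its arguments in the order (low, high, mode).
    sum(random.triangular(low, high, mode) for low, mode, high in wbs_elements)
    for _ in range(trials)
]

# Estimate the most likely total as the center of the fullest $10K-wide bin.
bin_width = 10
bins = Counter(int(t // bin_width) for t in totals)
modal_bin = max(bins, key=bins.get)
most_likely_total = (modal_bin + 0.5) * bin_width

mean_total = sum(totals) / trials
share_at_or_below = sum(t <= sum_of_most_likely for t in totals) / trials

print("sum of most likely element costs:", sum_of_most_likely)
print("estimated most likely total cost:", most_likely_total)
print("mean total cost:", round(mean_total, 1))
print("share of trials at or below the sum of most likely costs:",
      round(share_at_or_below, 3))
```

With skewed inputs such as these, the sum of the most likely element costs falls below both the most likely total and the mean total, so only a minority of simulated outcomes come in at or under it.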

One way to resolve this issue is to create statistical probability distributions for each WBS element or risk by specifying the risk shape and bounds that reflect the relative spread and skewness of the distribution. The probability distribution represents the risk shape, and the tails of the distribution reflect the best and worst case outcomes. Even though the bounds are extremes and unlikely to occur, the distribution acknowledges the possibility and probability that they could happen. Probability distributions are typically determined using the 3-point estimates of optimistic, most likely, and pessimistic values to identify the amount of spread and skewness of the data. However, if risks are used directly, they will be assigned to specific cost elements or activities in a schedule and will perform appropriately in a simulation.[56]

[56] Risks can be entered directly or they can be assigned as multiplication factors to specific cost elements or schedule activities. If this “risk driver” approach is used, the data collected, including probability of occurrence and impact (typically a 3-point estimate), will be on the risks themselves. Hence, the focus is on the risks, not on their impact on activities or cost line items. This focus on the risks makes it easy to understand the results and to focus on mitigating risks directly.
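The risk driver approach described in the note can be sketched as follows. Each risk carries a probability of occurrence and a 3-point multiplicative impact and is assigned to the cost elements it affects. The element names, baseline costs, probabilities, and impact ranges are hypothetical, and the triangular distribution is one assumed choice for the impact.

```python
import random

# Hypothetical baseline costs ($K) for three cost elements.
baseline = {"design": 200.0, "hardware": 500.0, "testing": 150.0}

# Hypothetical risks: probability of occurrence, a 3-point multiplicative
# impact (optimistic, most likely, pessimistic), and the elements affected.
risks = [
    {"name": "requirements growth", "prob": 0.40,
     "impact": (1.00, 1.10, 1.30), "elements": ["design", "testing"]},
    {"name": "supplier delay", "prob": 0.25,
     "impact": (1.05, 1.15, 1.40), "elements": ["hardware"]},
]

def simulate_total(rng):
    """Run one trial: decide which risks occur and apply their factors."""
    factors = {name: 1.0 for name in baseline}
    for risk in risks:
        if rng.random() < risk["prob"]:  # Bernoulli draw: does the risk occur?
            low, mode, high = risk["impact"]
            factor = rng.triangular(low, high, mode)
            for element in risk["elements"]:
                factors[element] *= factor
    return sum(cost * factors[name] for name, cost in baseline.items())

rng = random.Random(7)
trials = 20_000
totals = [simulate_total(rng) for _ in range(trials)]

baseline_total = sum(baseline.values())
share_no_risk = sum(t == baseline_total for t in totals) / trials

print("baseline total:", baseline_total)
print("mean simulated total:", round(sum(totals) / trials, 1))
print("share of trials in which no risk occurred:", round(share_no_risk, 3))
```

Because the inputs are the risks themselves, changing a single risk's probability or impact shows directly how mitigating that risk would change the total.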

Using a simulation technique such as Monte Carlo analysis, a cost estimator can develop a statistical summation of all probable costs, allowing for a better understanding of how likely it is that the point estimate can be met. A Monte Carlo simulation also does a better job of capturing risk, because it takes into consideration that some risks will occur while others may not. Furthermore, the simulation can extend the risk ranges beyond the upper and lower bounds that experts provide, to account for the fact that experts do not typically think in extremes. Figure 19 shows why different WBS element distributions need to be statistically summed in order to develop the overall point estimate probability distribution.

Figure 19: A Point Estimate Probability Distribution Driven by WBS Distributions

[Figure: Inputs are the probability distributions for each cost element in a system’s work breakdown structure (x₁ through xₙ), each marked with its RPE. The distributions are statistically summed (Cost = x₁ + x₂ + x₃ + ... + xₙ). Outputs are a bell curve (probability density) and an S curve (confidence level, marked at 20, 50, 70, 85, and 100) that gives the cumulative probability distribution of the system’s total cost, with the sum of the RPEs marked on the cost axis.]

Source: NASA.

Note: RPE = reference point estimate.

In figure 19, the sum of the reference point estimates has a low level of probability on the S curve. In other words, there is only a 20 percent chance or less of meeting the point estimate cost. Therefore, funding must be added to the program cost estimate to reach a higher level of confidence.
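The reading of the S curve described above can be sketched with a small Monte Carlo simulation. The WBS element values are hypothetical, triangular distributions are an assumed choice, and the helper names `confidence_level` and `cost_at_confidence` are illustrative.

```python
import random
from bisect import bisect_right

# Hypothetical WBS elements: (optimistic, most likely, pessimistic) costs in $K.
wbs_elements = [(80, 100, 160), (40, 50, 90), (150, 200, 350)]

# Point estimate: the sum of the most likely element costs.
point_estimate = sum(mode for _, mode, _ in wbs_elements)

rng = random.Random(42)
trials = 50_000
# Sorted simulated totals form the empirical S curve.
totals = sorted(
    sum(rng.triangular(low, high, mode) for low, mode, high in wbs_elements)
    for _ in range(trials)
)

def confidence_level(cost):
    """Share of simulated totals at or below `cost` (height of the S curve)."""
    return bisect_right(totals, cost) / trials

def cost_at_confidence(level):
    """Simulated total cost that is not exceeded in `level` of the trials."""
    index = min(trials - 1, int(level * trials))
    return totals[index]

print("point estimate:", point_estimate)
print("confidence level of the point estimate:",
      round(confidence_level(point_estimate), 3))
for level in (0.50, 0.70, 0.85):
    cost = cost_at_confidence(level)
    print(f"{level:.0%} confidence: {cost:.1f} "
          f"(point estimate plus {cost - point_estimate:.1f})")
```

The difference between the cost at the chosen confidence level and the point estimate is the additional funding needed to reach that level of confidence.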

Next to knowing the bounds or 3-point estimates for the uncertainty of the WBS element or risk, choosing the right probability distribution for each WBS element is important for capturing the uncertainty correctly. For any WBS element, selecting the probability distribution should be based on how effectively it models actual outcomes. Since different distributions model different types of risk, knowing the shape of the distribution helps in visualizing how the risk will affect the overall cost estimate uncertainty. A variety of probability distribution shapes are available for modeling cost risk. Table 24 lists eight of the most common probability distributions used in cost estimating uncertainty analysis.

Table 24: Eight Common Probability Distributions

[The Shape column of the table contained a small plot of probability against values for each distribution; the plots are not reproduced here.]

Distribution: Bernoulli
Description: Assigns probabilities of “p” for success and “1 – p” for failure; mean = “p”; variance = “p(1 – p)”.
Typical application: With likelihood and consequence risk cube models; good for representing the probability of a risk occurring but not for the impact on the program.

Distribution: Beta
Description: Similar to the normal distribution but does not allow for negative cost or duration; this continuous distribution can be symmetric or skewed.
Typical application: To capture outcomes biased toward the tail ends of a range; often used with engineering data or analogy estimates; the shape parameters usually cannot be collected from interviewees.

Distribution: Lognormal
Description: A continuous distribution, positively skewed, with a limitless upper bound and a known lower bound; skewed to the right to reflect the tendency toward higher cost.
Typical application: To characterize uncertainty in nonlinear cost estimating relationships; it is important to know how to scale the standard deviation, which is needed for this distribution.

Distribution: Normal
Description: Used for outcomes likely to occur on either side of the average value; symmetric and continuous, allowing for negative costs and durations. In a normal distribution, about 68% of the values fall within one standard deviation of the mean.
Typical application: To assess uncertainty with cost estimating methods; the standard deviation or standard error of the estimate is used to determine dispersion. Since data must be symmetrical, it is not as useful for defining risk, which is usually asymmetrical, but it can be useful for scaling estimating error.

Distribution: Poisson
Description: Peaks early and has a long tail compared to other distributions.
Typical application: To predict counts of events, such as the number of software defects or test failures.

Distribution: Triangular
Description: Characterized by three points (most likely, pessimistic, and optimistic values); can be skewed or symmetric and is easy to understand because it is intuitive. One drawback is the absoluteness of the end points, although this is not a limitation in practice since it is used in a simulation.
Typical application: To express technical uncertainty, because it works for any system architecture or design; also used to determine schedule uncertainty.

Distribution: Uniform
Description: Has no peaks because all values, including the highest and lowest possible values, are equally likely.
Typical application: With engineering data or analogy estimates.

Distribution: Weibull
Description: Versatile; can take on the characteristics of other distributions, based on the value of the shape parameter “b”. For example, Rayleigh and exponential distributions can be derived from it.[a]
Typical application: In life data and reliability analysis, because it can mimic other distributions and because of its objective relationship to reliability modeling.

Source: DOD, NASA, SCEA, and Industry.

[a] The Rayleigh and exponential distributions are continuous probability distributions.
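The Weibull entry in table 24 can be checked numerically. A Weibull distribution with shape parameter b = 1 is an exponential distribution, and with b = 2 it is a Rayleigh distribution. The sketch below compares sample means against the known means of those two distributions; the scale value of 10 and the sample size are arbitrary assumptions.

```python
import math
import random

rng = random.Random(3)
n = 100_000
scale = 10.0  # hypothetical scale parameter

def sample_mean(shape):
    # random.weibullvariate(alpha, beta): alpha is the scale, beta is the shape.
    return sum(rng.weibullvariate(scale, shape) for _ in range(n)) / n

# Known means of the two special cases for the same scale parameter.
exponential_mean = scale                        # exponential with mean = scale
rayleigh_mean = scale * math.sqrt(math.pi) / 2  # Rayleigh, sigma = scale / sqrt(2)

mean_shape_1 = sample_mean(1.0)
mean_shape_2 = sample_mean(2.0)

print("Weibull, b = 1:", round(mean_shape_1, 3),
      "exponential mean:", exponential_mean)
print("Weibull, b = 2:", round(mean_shape_2, 3),
      "Rayleigh mean:", round(rayleigh_mean, 3))
```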

The triangular, lognormal, beta, uniform, and normal distributions in table 24 are the most common distributions that cost estimators may use to perform an uncertainty analysis. They are generally sufficient, given the quality of the information derived from interviews and the granularity of the results. However, many other types of distributions are discussed in the literature and are available through a variety of statistical tools.
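As an illustration of how these five distributions behave for the same cost element, the sketch below draws samples from each using Python's standard `random` module. The 3-point estimate, the beta shape parameters, the lognormal dispersion, and the normal standard deviation are all assumed values chosen for illustration; in practice they would come from data or interviews.

```python
import math
import random
import statistics

rng = random.Random(11)
low, mode, high = 80.0, 100.0, 160.0  # hypothetical 3-point estimate, $K

samplers = {
    # random.triangular takes its arguments in the order (low, high, mode).
    "triangular": lambda: rng.triangular(low, high, mode),
    # Beta(2, 4) rescaled onto [low, high]; its mode of 0.25 maps to 100 here.
    "beta": lambda: low + (high - low) * rng.betavariate(2.0, 4.0),
    # Lognormal with median equal to `mode`; sigma = 0.2 is an assumed dispersion.
    "lognormal": lambda: rng.lognormvariate(math.log(mode), 0.2),
    "uniform": lambda: rng.uniform(low, high),
    # Normal centered on `mode`; a standard deviation of 15 is assumed.
    "normal": lambda: rng.gauss(mode, 15.0),
}

summary = {}
for name, draw in samplers.items():
    xs = [draw() for _ in range(20_000)]
    summary[name] = (min(xs), statistics.median(xs), statistics.fmean(xs), max(xs))

for name, (lo_x, med, mean, hi_x) in summary.items():
    print(f"{name:10s} min={lo_x:7.1f} median={med:7.1f} "
          f"mean={mean:7.1f} max={hi_x:7.1f}")
```

For the skewed distributions the mean sits above the median, while the triangular, beta, and uniform samples never leave the stated bounds and the lognormal and normal samples can.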

The point to remember is that the shape of a distribution is determined by the characteristics of the risks it represents. If distributions are applied to WBS elements, they may combine the impact of several risks, so it may take some thought to determine the most appropriate distribution to use. For a cost estimating relationship (CER), it is a best practice to use prediction interval statistical analysis to determine the bounds of the probability distribution because it is an objective method for determining variability. The prediction interval captures the error around a regression estimate and results in a wider variance for the CER.
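A prediction interval for a simple linear CER can be sketched as follows. The weight and cost data are hypothetical, the CER is assumed to be a straight line fitted by ordinary least squares, and the Student's t critical value is hard-coded from a standard table rather than computed.

```python
import math

# Hypothetical CER data: weight (kg) versus cost ($K) for eight past programs.
weights = [10, 14, 18, 22, 26, 30, 34, 38]
costs = [120, 150, 205, 230, 280, 300, 365, 390]

n = len(weights)
x_bar = sum(weights) / n
y_bar = sum(costs) / n
s_xx = sum((x - x_bar) ** 2 for x in weights)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(weights, costs))

# Ordinary least squares fit: cost = intercept + slope * weight.
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

# Standard error of the estimate, with n - 2 degrees of freedom.
residuals = [y - (intercept + slope * x) for x, y in zip(weights, costs)]
std_error = math.sqrt(sum(r ** 2 for r in residuals) / (n - 2))

# Two-sided 95% critical value of Student's t for 6 degrees of freedom,
# taken from a standard t table (hard-coded to keep the sketch stdlib-only).
T_CRIT_95_DF6 = 2.447

def intervals(x_new):
    """Return the estimate and the half-widths of the two 95% intervals."""
    estimate = intercept + slope * x_new
    leverage = 1 / n + (x_new - x_bar) ** 2 / s_xx
    ci_half = T_CRIT_95_DF6 * std_error * math.sqrt(leverage)      # mean response
    pi_half = T_CRIT_95_DF6 * std_error * math.sqrt(1 + leverage)  # new program
    return estimate, ci_half, pi_half

estimate, ci_half, pi_half = intervals(42)
print(f"estimate at 42 kg: {estimate:.1f}")
print(f"confidence interval for the mean: +/- {ci_half:.1f}")
print(f"prediction interval for a new program: +/- {pi_half:.1f}")
```

The prediction interval is always wider than the confidence interval for the mean because it also includes the scatter of an individual new observation, and it widens further as the new input moves away from the center of the historical data.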

When there is no objective way to pick the distribution bounds, a cost estimator will resort to interviewing several people, especially experienced personnel both within and outside the program, about what the distribution bounds should be. Promising anonymity to the interviewees may help secure their unbiased thoughts. Separating the risk analysis function organizationally from the program and program manager often provides the independence needed to withstand political and other pressures for biased results. One way to counter the tendency of experts to be success oriented when choosing the upper and lower extremes of the distribution is to look for historical data that back up the distribution range. If historical data are not available, it may be necessary to skew the tails of the distribution to properly represent the overall risk, because overly optimistic estimates usually result in programs costing more and taking longer than planned. Organizations should, as a best practice, examine and publish default distribution bounds that cost estimators can use when the data cannot be obtained objectively.

Once all cost element risks have been identified in step 1 and distributions have been chosen to model them in step 2, correlation between the cost elements must be examined in order to fully capture risk, especially risk related to level-of-effort cost elements.

Source: GAO, Cost Estimating: Best Practices for Developing and Managing Capital Program Costs, Applied Research and Methods (pages 184-188).