
Probability Distribution Statistics

Most often we do not know every value and its probability, so we cannot apply the equations we have discussed to calculate statistics directly. However, if we know the probability distribution of the values, or can estimate what the probability function might be, then we can apply the statistics that have been derived for that distribution. And, appropriately for project management, we can do quite nicely using arithmetic approximations for the statistics rather than constantly referring to a table of values. Electronic spreadsheets offer much better approximations, if not exact values, so they are a quick and useful tool for statistical analysis.

Three-Point Estimate Approximations

Quite useful results for project statistics are obtainable by developing three-point estimates that can be used in equations to calculate expected value, variance, and standard deviation. The three points commonly used are:

• Most pessimistic value that yet has some small probability of happening.

• Most optimistic value that also has some small probability of happening.

• Most likely value for any single instance of the project. The most likely value is the mode of the distribution.

Variance | Yes | Yes | If the random variables are not independent, then a covariance must be computed.

Standard deviation | Cannot add or subtract | Yes | To add or subtract standard deviations, first compute the sum of the variances and then take the square root.
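The rule for standard deviations can be illustrated with a short sketch. It assumes independent random variables unless a covariance is supplied; the function names and the sample values are illustrative only.

```python
import math


def variance_of_sum(var_x, var_y, cov_xy=0.0):
    """Variance of X + Y; cov_xy is 0 when X and Y are independent."""
    return var_x + var_y + 2.0 * cov_xy


def std_dev_of_sum(std_devs):
    """Standard deviation of a sum of independent random variables."""
    # Sum the variances first, then take the square root.
    return math.sqrt(sum(s ** 2 for s in std_devs))


# Two independent task durations with standard deviations of 3 and 4 days
combined = std_dev_of_sum([3.0, 4.0])
print(combined)  # 5.0, not 3 + 4 = 7
```

Adding the standard deviations directly would overstate the spread of the total; the variances are what add.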

It is not uncommon for the optimistic and most likely values to be much closer to each other than either is to the pessimistic value. Many things can go wrong that drive the pessimistic estimate; usually, fewer things can go right. Table 2-5 provides the equations for the calculation of approximate values of statistics for the most common distributions.

It is useful to compare the more common distributions under the conditions of identical estimates.

Figure 2-6 provides the illustration. Rules of thumb can be inferred from this illustration:

• Among the Normal, BETA, and Triangular distributions, for the same estimates of optimism and pessimism (and the same mode for the BETA and Triangular), the expected value becomes more pessimistic moving from BETA to Triangular to Normal distribution.

Table 2-5: Statistics for Common Distributions

Statistic | Normal[*] | BETA[**] | Triangular | Uniform[***]

[*]Formulas are only approximations to more complex functions.

[**]BETA formulas apply to the curve used in PERT calculations. PERT is discussed in Chapter 7. In general, a BETA distribution has four parameters, two of which are fixed to ensure the area under the curve integrates to 1, and two, α and β, determine the shape of the curve. Normally, fixing or estimating α and β then provides the means to calculate mean and variance. However, for the BETA used in PERT, the mean and variance formulas have been worked out such that α and β become the calculated parameters.

Since in most project situations the exact shape of the BETA curve does not need to be known, the calculation for α and β is not usually performed. If α and β are equal, then the BETA curve is symmetrical.

If the range of values of the BETA distributed random variable is normalized to a range of 0 to 1, then for means less than 0.5 the BETA curve will be skewed to the right; the curve will be symmetrical for mean = 0.5 and skewed left if the mean is greater than 0.5.

[***]In general, variance is calculated as Var(X) = E(X²) − [E(X)]². This formula is used to derive the variance of the Triangular and Uniform distributions.

The variance for the Uniform reduces to (P − O)²/12, or P²/12 if the optimistic value is 0; similarly, the standard deviation reduces to (P − O)/√12, or about (P − O)/3.46.

Note: O = optimistic value, P = pessimistic value, ML = most likely value.
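The approximations usually quoted for these four distributions can be sketched in a few lines. The formulas below are the standard three-point forms and are an assumption here rather than a transcription of the table: the Normal case takes O and P at the ±3σ points, and the BETA case is the PERT curve. The function names are illustrative.

```python
import math


def three_point_stats(o, ml, p):
    """Return {distribution: (expected value, variance, standard deviation)}.

    o = optimistic, ml = most likely, p = pessimistic.
    Assumed standard approximations; Normal takes o and p at the ±3σ points.
    """
    spread = p - o
    triangular_var = (spread ** 2 + (ml - o) * (ml - p)) / 18
    mean_and_var = {
        "Normal": (o + spread / 2, spread ** 2 / 36),
        "BETA": ((o + 4 * ml + p) / 6, spread ** 2 / 36),
        "Triangular": ((o + ml + p) / 3, triangular_var),
        "Uniform": (o + spread / 2, spread ** 2 / 12),
    }
    return {
        name: (mean, var, math.sqrt(var))
        for name, (mean, var) in mean_and_var.items()
    }


def uniform_variance_from_moments(o, p):
    """Uniform variance via Var(X) = E(X^2) - [E(X)]^2."""
    e_x = (o + p) / 2
    e_x2 = (o * o + o * p + p * p) / 3
    return e_x2 - e_x ** 2


stats = three_point_stats(10, 12, 20)
for name, (mean, var, sd) in stats.items():
    print(f"{name:<10} mean={mean:.2f} variance={var:.3f} std dev={sd:.3f}")
```

With O = 10, ML = 12, and P = 20, the expected values come out as 13 (BETA), 14 (Triangular), and 15 (Normal), which matches the rule of thumb that the expected value becomes more pessimistic in that order.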

• The variance and standard deviation of the Normal and BETA distributions are about the same when the pessimistic and optimistic values are taken at the ±3σ points. However, since the BETA distribution is not symmetrical, the standard deviation is less meaningful as a measure of spread around the mean than it is for the symmetrical Normal distribution.

Figure 2-6: Statistical Comparison of Distributions.
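The asymmetry of the PERT BETA curve can be made concrete by backing out α and β from its mean and variance. The sketch below is one way to do this, not a method taken from the text: it assumes the usual PERT approximations, mean = (O + 4ML + P)/6 and σ = (P − O)/6, and a method-of-moments fit on the range normalized to 0 to 1. The function name `pert_beta_shape` is hypothetical.

```python
def pert_beta_shape(o, ml, p):
    """Estimate BETA shape parameters (alpha, beta) from a three-point estimate.

    Assumes the PERT mean and standard deviation and a method-of-moments fit
    on the range normalized to 0..1.
    """
    mean = (o + 4 * ml + p) / 6
    variance = ((p - o) / 6) ** 2
    m = (mean - o) / (p - o)          # normalized mean, between 0 and 1
    v = variance / (p - o) ** 2       # normalized variance
    common = m * (1 - m) / v - 1
    return m * common, (1 - m) * common


# Most likely value near the optimistic end: normalized mean below 0.5
alpha, beta = pert_beta_shape(10, 12, 20)
print(f"alpha={alpha:.3f} beta={beta:.3f}")  # alpha < beta: skewed right

# Most likely value at the midpoint: symmetrical curve
alpha, beta = pert_beta_shape(0, 5, 10)
print(f"alpha={alpha:.3f} beta={beta:.3f}")  # alpha and beta equal
```

When α is smaller than β, the long tail runs toward the pessimistic end of the range, which is the typical shape for project estimates.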

In addition to the estimates given above, there are a couple of exact statistics about the Normal distribution that are handy to keep in mind:

• 68.3% of the values of a Normal distribution fall within ±1σ of the mean value.

• 95.4% of the values of a Normal distribution fall within ±2σ of the mean value, and this figure goes up to 99.7% for ±3σ of the mean value.

• A process quality interpretation of 99.7% is that there are about three errors per thousand events. If software coding were the object of the error measurement, then "three errors per thousand lines of code" probably would not be acceptable. At ±6σ, the error rate is so small (about 99.9999998% of values fall within the limits) that it is more easily spoken of in terms of "two errors per billion events," roughly a million times better than 3σ.[20]

[20]The Six Sigma literature commonly speaks of 3.4 errors per million events, not 2 errors per billion. The difference arises from the fact that in the original program developed at Motorola, the mean of the distribution was allowed to "wander" ±1.5σ from the expected mean of the distribution. This "wandering" increases the error rate from about 2 errors per billion to 3.4 errors per million events. An older shorthand way of referring to error rates of a few per million is "five nines and an eight" (99.9998%) or perhaps "about six nines."
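These coverage figures can be checked with the error function from the Python standard library. The sketch below assumes a standard Normal model with fixed limits at ±kσ, and it models the "wandering" mean as a shift of the mean toward one limit; the function names are illustrative.

```python
import math


def within_k_sigma(k):
    """Fraction of a Normal distribution within ±k standard deviations."""
    return math.erf(k / math.sqrt(2))


def errors_per_million(k, shift=0.0):
    """Expected errors per million events outside limits set at ±k sigma.

    shift moves the mean toward the upper limit, in units of sigma.
    """
    # erfc keeps precision for the very small tail probabilities.
    upper_tail = 0.5 * math.erfc((k - shift) / math.sqrt(2))
    lower_tail = 0.5 * math.erfc((k + shift) / math.sqrt(2))
    return (upper_tail + lower_tail) * 1e6


for k in (1, 2, 3, 6):
    inside = within_k_sigma(k)
    outside = errors_per_million(k)
    print(f"±{k}σ: {inside:.9f} inside, {outside:.6g} errors per million")

shifted = errors_per_million(6, shift=1.5)
print(f"±6σ with a 1.5σ shift: {shifted:.3g} errors per million")
```

The tail is computed with `erfc` rather than as 1 minus `erf`, because at 6σ the subtraction would lose most of its significant digits.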