Probability Theory

APPENDIX A

This appendix is intended to serve as a brief review of the probability and statistics concepts used in this text. Students requiring more review than is available in this appendix should consult one of the texts listed in the bibliography.

A.1 PROBABILITY

Uncertainty in organizational decision making is a fact of life. Demand for an organization's output is uncertain. The number of employees who will be absent from work on any given day is uncertain. The price of a stock tomorrow is uncertain. Whether or not it will snow tomorrow is uncertain. Each of these events is more or less uncertain. We do not know exactly whether or not the event will occur, nor do we know the value that a particular random variable (e.g., price of stock, demand for output, number of absent employees) will assume.

In common terminology we reflect our uncertainty with such phrases as "not very likely," "not a chance," or "for sure." But, while these descriptive terms communicate one's feeling regarding the chances of a particular event's occurrence, they simply are not precise enough to allow analysis of chances and odds.

Simply put, probability is a number on a scale used to measure uncertainty. The range of the probability scale is from 0 to 1, with a 0 probability indicating that an event has no chance of occurring and a probability of 1 indicating that an event is absolutely sure to occur. The more likely an event is to occur, the closer its probability is to 1. This probability definition, which is general, needs to be further augmented to illustrate the various types of probability that decision makers can assess. There are three types of probability that the operations manager should be aware of:


Subjective probability

Logical probability

Experimental probability

Subjective Probability

Subjective probability is based on individual information and belief. Different individuals will assess the chances of a particular event in different ways, and the same individual may assess different probabilities for the same event at different points in time. For example, one need only watch the blackjack players in Las Vegas to see that different people assess probabilities in different ways. Also, daily trading in the stock market is the result of different probability assessments by those trading. The sellers sell because it is their belief that the probability of appreciation is low, and the buyers buy because they believe that the probability of appreciation is high. Clearly, these different probability assessments are about the same events.

Logical Probability

Logical probability is based on physical phenomena and on symmetry of events. For example, the probability of drawing a three of hearts from a standard 52-card playing deck is 1/52. Each card has an equal likelihood of being drawn. In flipping a coin, the chance of "heads" is 0.50. That is, since there are only two possible outcomes from one flip of a coin, each event has one-half the total probability, or 0.50. A final example is the roll of a single die. Since each of the six sides is identical, the chance of any one event occurring (i.e., a 6, a 3, etc.) is 1/6.

Experimental Probability

Experimental probability is based on the frequency of occurrence of events in trial situations. For example, in determining the appropriate inventory level to maintain in the raw material inventory, we might measure and record the demand each day from that inventory. If, in 100 days, demand was 20 units on 16 days, the probability of demand equaling 20 units is said to be 0.16 (i.e., 16/100). In general, the experimental probability of an event is given by

probability of event = (number of times event occurred) / (total number of trials)

Both logical and experimental probability are referred to as objective probability, in contrast to the individually assessed subjective probability. Each is based on, and directly computed from, facts.

A.2 EVENT RELATIONSHIPS AND PROBABILITY LAWS

Events are classified in a number of ways that allow us to state rules for probability computations. Some of these classifications and definitions follow.


1. Independent events: events are independent if the occurrence of one does not affect the probability of occurrence of the others.

2. Dependent events: events are termed dependent if the occurrence of one does affect the probability of occurrence of the others.

3. Mutually exclusive events: two events are termed mutually exclusive if the occurrence of one precludes the occurrence of the other. For example, in the birth of a child, the events "It's a boy!" and "It's a girl!" are mutually exclusive.

4. Collectively exhaustive events: a set of events is termed collectively exhaustive if on any one trial at least one of them must occur. For example, in rolling a die, one of the events 1, 2, 3, 4, 5, or 6 must occur; therefore, these six events are collectively exhaustive.
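These definitions can be made concrete by modeling events as sets of outcomes. The die events below are illustrative; only the 1-through-6 sample space comes from the text.

```python
# Model events as sets of outcomes from one roll of a die, then check
# the definitions of mutually exclusive and collectively exhaustive.
space = {1, 2, 3, 4, 5, 6}   # sample space for one roll
evens = {2, 4, 6}
odds = {1, 3, 5}
low = {1, 2, 3}

def mutually_exclusive(a, b):
    # No outcome belongs to both events.
    return not (a & b)

def collectively_exhaustive(events, space):
    # Together, the events cover every outcome in the sample space.
    return set().union(*events) == space

print(mutually_exclusive(evens, odds))                # True
print(mutually_exclusive(evens, low))                 # False: 2 is in both
print(collectively_exhaustive([evens, odds], space))  # True
```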

We can also define the union and intersection of two events. Consider two events A and B. The union of A and B includes all outcomes in A or B or in both A and B. For example, in a card game you will win if you draw a diamond or a jack. The union of these two events includes all diamonds (including the jack of diamonds) and the remaining three jacks (hearts, clubs, spades). The or in the union is the inclusive or. That is, in our example you will win with a jack or a diamond or a jack of diamonds (i.e., both events).

The intersection of two events includes all outcomes that are members of both events. Thus, in our previous example of jacks and diamonds, the jack of diamonds is the only outcome contained in both events and is therefore the only member of the intersection of the two events.
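The union and intersection counts in the jack-and-diamond example can be verified by enumerating the deck:

```python
# Build the 52-card deck and form the two events from the text:
# "draw a jack" and "draw a diamond".
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = {(rank, suit) for rank in ranks for suit in suits}

jacks = {card for card in deck if card[0] == "J"}
diamonds = {card for card in deck if card[1] == "diamonds"}

print(len(deck))              # 52
print(len(jacks | diamonds))  # union: 16 winning cards
print(len(jacks & diamonds))  # intersection: just the jack of diamonds
```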

Let us now consider the relevant probability laws based on our understanding of the above definitions and concepts. For ease of exposition let us define the following notation:

P(A) = probability that event A will occur
P(B) = probability that event B will occur

If two events are mutually exclusive, then their joint occurrence is impossible. Hence, P(A and B) = 0 for mutually exclusive events. If the events are not mutually exclusive, P(A and B) can be computed (as we will see in the next section); this probability is termed the joint probability of A and B. Also, if A and B are not mutually exclusive, then we can also define the conditional probability of A given that B has already occurred or the conditional probability of B given that A has already occurred. These probabilities are written as P(A|B) and P(B|A), respectively.

The Multiplication Rule

The joint probability of two events that are not mutually exclusive is found by using the multiplication rule. If the events are dependent, the joint probability is given by

P(A and B) = P(A) × P(B|A) or P(B) × P(A|B)

If the events are independent, then P(B|A) and P(A|B) are equal to P(B) and P(A), respectively, and therefore the joint probability is given by


P(A and B) = P(A) × P(B)
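As a check on the independence case, here is a sketch that enumerates two die rolls; the example is illustrative and not taken from the text:

```python
# Verify P(A and B) = P(A) * P(B) for two independent events by
# exhaustive enumeration: A = first roll even, B = second roll even.
from fractions import Fraction

outcomes = [(i, j) for i in range(1, 7) for j in range(1, 7)]

def prob(event):
    # Probability = favorable outcomes / total equally likely outcomes.
    hits = sum(1 for outcome in outcomes if event(outcome))
    return Fraction(hits, len(outcomes))

p_a = prob(lambda o: o[0] % 2 == 0)                           # 1/2
p_b = prob(lambda o: o[1] % 2 == 0)                           # 1/2
p_a_and_b = prob(lambda o: o[0] % 2 == 0 and o[1] % 2 == 0)

print(p_a_and_b)                # 1/4
print(p_a_and_b == p_a * p_b)   # True
```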

From these two relationships, we can find the conditional probability for two dependent events from

P(A|B) = P(A and B) / P(B)

and

P(B|A) = P(A and B) / P(A)

Also, P(A) and P(B) can be computed if the events are independent, as

P(A) = P(A and B) / P(B)

and

P(B) = P(A and B) / P(A)

The Addition Rule

The addition rule is used to compute the probability of the union of two events. If two events are mutually exclusive, then P(A and B) = 0, as we indicated previously. Therefore, the probability of either A or B or both is simply the probability of A or B. This is given by

P(A or B) = P(A) + P(B)

But, if the events are not mutually exclusive, then the probability of A or B is given by

P(A or B) = P(A) + P(B) − P(A and B)

We can see the reasonableness of this expression by looking at the following Venn diagram.

[Venn diagram: two overlapping circles labeled A and B; the shaded overlap is the intersection of A and B.]

The two circles represent the probabilities of the events A and B, respectively. The shaded area represents the overlap in the events; that is, the intersection of A and B. If we add the area of A and the area of B, we have included the shaded area twice. Therefore, to get the total area of A or B, we must subtract one of the areas of the intersection that we have added.
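The addition rule can be confirmed with the earlier jack-or-diamond example, where the jack of diamonds is the double-counted intersection:

```python
# P(jack or diamond) = P(jack) + P(diamond) - P(jack and diamond)
from fractions import Fraction

p_jack = Fraction(4, 52)
p_diamond = Fraction(13, 52)
p_jack_and_diamond = Fraction(1, 52)   # only the jack of diamonds

p_union = p_jack + p_diamond - p_jack_and_diamond
print(p_union)   # 4/13, i.e., 16 winning cards out of 52
```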


If two events are collectively exhaustive, then the probability of (A or B) is equal to 1. That is, for two collectively exhaustive events, one or the other or both must occur, and therefore, P(A or B) must be 1.

A.3 STATISTICS

Because events are uncertain, we must employ special analyses in organizations to ensure that our decisions recognize the chance nature of outcomes. We employ statistics and statistical analysis to

1. Concisely express the tendency and the relative uncertainty of a particular situation.

2. Develop inferences or understanding about a situation.

"Statistics" is an elusive and often misused term. Batting averages, birth weights, and student grade points are all statistics. They are descriptive statistics. That is, they are quantitative measures of some entity and, for our purposes, can be considered as data about the entity. The second use of the term "statistics" is in relation to the body of theory and methodology used to analyze available evidence (typically quantitative) and to develop inferences from the evidence.

Two descriptive statistics that are often used in presenting information about a population of items (and consequently in inferring some conclusions about the population) are the mean and the variance. The mean of a population (denoted as µ) can be computed in two ways, each of which gives identical results:

µ = Σ_{j=1}^{k} X_j P(X_j)

where

k = the number of discrete values that the random variable X_j may assume
X_j = the value of the random variable
P(X_j) = the probability (or relative frequency) of X_j in the population

Also, the mean can be computed as

µ = Σ_{i=1}^{N} X_i / N

where

N = the size of the population (the number of different items in the population)
X_i = the value of the ith item in the population

The mean is also termed the expected value of the population and is written as E(X).
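The two formulas for µ give identical results, which the sketch below checks for a small illustrative population (the values are not from the text):

```python
# Compute the population mean two ways: weighting distinct values by
# their relative frequencies, and averaging over all N items directly.
population = [1, 1, 2, 2, 2, 2, 3, 3, 3, 3]
N = len(population)

# Method 1: mu = sum of Xj * P(Xj) over the k distinct values.
distinct = sorted(set(population))
mu_weighted = sum(x * population.count(x) / N for x in distinct)

# Method 2: mu = sum of Xi / N over the whole population.
mu_direct = sum(population) / N

print(mu_weighted, mu_direct)   # both 2.2
```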


The variance of the items in the population measures the dispersion of the items about their mean. It is computed in one of the following two ways:

σ² = Σ_{j=1}^{k} (X_j − µ)² P(X_j)

or

σ² = Σ_{i=1}^{N} (X_i − µ)² / N

The standard deviation, another measure of dispersion, is simply the square root of the variance, or

σ = √σ²

Descriptive Versus Inferential Statistics

Organizations are typically faced with decisions for which a large portion of the relevant information is uncertain. In hiring graduates of your university, the "best" prospective employee is unknown to the organization. Also, in introducing a new product, proposing a tax law change to boost employment, drilling an oil well, and so on, the outcomes are always uncertain.

Statistics can often aid management in reducing this uncertainty. This is accomplished through the use of one or the other, or both, of the purposes of statistics. That is, statistics is divided according to its two major purposes: describing the major characteristics of a large mass of data and inferring something about a large mass of data from a smaller sample drawn from the mass. One methodology summarizes all the data; the other reasons from a small set of the data to the larger total.

Descriptive statistics uses such measures as the mean, median, mode, range, variance, and standard deviation, and such graphical devices as the bar chart and the histogram. When an entire population (a complete set of objects or entities with a common characteristic of interest) of data is summarized by computing such measures as the mean and the variance of a single characteristic, the measure is referred to as a parameter of that population. For example, if the population of interest is all female freshmen at your university and all their ages were used to compute an arithmetic average of 19.2 years, this measure is called a parameter of that population.

Inferential statistics also uses means and variances, but in a different manner. The objective of inferential statistics is to infer the value of a population parameter through the study of a small sample (a portion of a population) from that population. For example, a random sample of 30 freshman women could produce the information that there is 90 percent certainty that the average age of all freshman women is between 18.9 and 19.3 years. We do not have as much information as if we had used the entire population, but then we did not have to spend the time to find and determine the age of each member of the population either.


Before considering the logic behind inferential statistics, let us define the primary measures of central tendency and dispersion used in both descriptive and inferential statistics.

Measures of Central Tendency

The central tendency of a group of data represents the average, middle, or "normal" value of the data. The most frequently used measures of central tendency are the mean, the median, and the mode.

The mean of a population of values was given earlier as

µ = Σ_{i=1}^{N} X_i / N

where

µ = the mean (pronounced "mu")
X_i = the value of the ith data item
N = the number of data items in the population

The mean of a sample of items from a population is given by

X̄ = Σ_{i=1}^{n} X_i / n

where

X̄ = the sample mean (pronounced "X bar")
X_i = the value of the ith data item in the sample
n = the number of data items selected in the sample

The median is the middle value of a population of data (or sample) where the data are ordered by value. That is, in the following data set

3, 2, 9, 6, 1, 5, 7, 3, 4

4 is the median since (as you can see when we order the data)

1, 2, 3, 3, 4, 5, 6, 7, 9

50 percent of the data values are above 4 and 50 percent below 4. If there is an even number of data items, then the mean of the middle two is the median. For example, if there had also been an 8 in the above data set, the median would be 4.5 = (4 + 5)/2.

The mode of a population (or sample) of data items is the value that most frequently occurs. In the above data set, 3 is the mode of the set. A distribution can have more than one mode if there are two or more values that appear with equal frequency.
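The three measures can be computed directly for the text's nine-item data set:

```python
# Mean, median, and mode of the data set 3, 2, 9, 6, 1, 5, 7, 3, 4.
data = [3, 2, 9, 6, 1, 5, 7, 3, 4]

mean = sum(data) / len(data)                     # about 4.44

ordered = sorted(data)                           # 1, 2, 3, 3, 4, 5, 6, 7, 9
mid = len(ordered) // 2
if len(ordered) % 2 == 1:
    median = ordered[mid]                        # odd count: middle value
else:
    median = (ordered[mid - 1] + ordered[mid]) / 2   # even count: mean of middle two

mode = max(set(data), key=data.count)            # most frequent value

print(round(mean, 2), median, mode)   # 4.44 4 3
```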


Measures of Dispersion

Dispersion refers to the scatter around the mean of a distribution of values. Three measures of dispersion are the range, the variance, and the standard deviation.

The range is the difference between the highest and the lowest value of the data set; that is, X_high − X_low.

The variance of a population of items is given by

σ² = Σ_{i=1}^{N} (X_i − µ)² / N

where

σ² = the population variance (pronounced "sigma squared")

The variance of a sample of items is given by

S² = Σ_{i=1}^{n} (X_i − X̄)² / n

where

S² = the sample variance

The standard deviation is simply the square root of the variance. That is,

σ = √σ² and S = √S²

where σ and S are the population and sample standard deviations, respectively.

Inferential Statistics

A basis of inferential statistics is the interval estimate. Whenever we infer from partial data to an entire population, we are doing so with some uncertainty in our inference. Specifying an interval estimate (e.g., average weight is between 10 and 12 pounds) rather than a point estimate (e.g., the average weight is 11.3 pounds) simply helps to relate that uncertainty. The interval estimate is not as precise as the point estimate.
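A companion sketch computes the dispersion measures for the same nine-item data set used above, treated here as a complete population (so the population formulas with N apply):

```python
# Range, population variance, and population standard deviation.
data = [3, 2, 9, 6, 1, 5, 7, 3, 4]
N = len(data)

data_range = max(data) - min(data)               # X_high - X_low = 8

mu = sum(data) / N
variance = sum((x - mu) ** 2 for x in data) / N  # sigma squared
std_dev = variance ** 0.5                        # sigma = square root of variance

print(data_range, round(variance, 2), round(std_dev, 2))  # 8 5.8 2.41
```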


Inferential statistics uses probability samples where the chance of selection of each item is known. A random sample is one in which each item in the population has an equal chance of selection.

The procedure used to estimate a population mean from a sample is to

1. Select a sample of size n from the population.

2. Compute X̄, the mean, and S, the standard deviation.

3. Compute the precision of the estimate (i.e., the ± limits around X̄ within which the mean µ is believed to exist).

Steps 1 and 2 are straightforward, relying on the equations we have presented in earlier sections. Step 3 deserves elaboration.

The precision of an estimate for a population parameter depends on two things: the standard deviation of the sampling distribution, and the confidence you desire to have in the final estimate. Two statistical laws provide the logic behind Step 3.

First, the law of large numbers states that as the size of a sample increases toward infinity, the difference between the estimate of the mean and the true population mean tends toward zero. For practical purposes, a sample of size 30 is assumed to be “large enough’’ for the sample estimate to be a good estimate of the population mean.
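The law of large numbers can be seen in a quick simulation; the die-rolling population below is illustrative, not from the text:

```python
# As the sample size grows, the sample mean of fair die rolls drifts
# toward the population mean of 3.5.
import random

random.seed(1)                       # fixed seed for a repeatable run
rolls = [random.randint(1, 6) for _ in range(100_000)]

for n in (30, 1_000, 100_000):
    sample_mean = sum(rolls[:n]) / n
    print(n, round(sample_mean, 3))  # the estimates settle near 3.5
```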

Second, the central limit theorem states that if all possible samples of size n were taken from a population with any distribution, the distribution of the means of those samples would be normally distributed with a mean equal to the population mean and a standard deviation equal to the standard deviation of the population divided by the square root of the sample size. That is, if we took all of the samples of size 100 from the population shown in Figure 1, the sampling distribution would be as shown in Figure 2. The logic behind Step 3 is that

1. Any sample of size n from the population can be considered to be one observation from the sampling distribution with the mean µ_x̄ = µ and the standard deviation

σ_x̄ = σ / √n

[Figures 1 and 2: the population distribution, with µ = 50 and σ = 20, and the corresponding sampling distribution of the mean for samples of size n = 100, with µ_x̄ = 50 and σ_x̄ = σ/√n = 20/√100 = 2.]


2. From our knowledge of the normal distribution, we know that there is a number Z (see normal probability table, back cover) associated with each probability value of a normal distribution (e.g., the probability that an item will be within ±2 standard deviations of the mean of a normal distribution is 95.45 percent; Z = 2 in this case).

3. The value of the number Z is simply the number of standard deviations away from the mean that a given point lies. That is,

Z = (X − µ) / σ

or, in the case of Step 3,

Z = (X̄ − µ_x̄) / σ_x̄

4. The precision of a sample estimate is given by Zσ_x̄.

5. The interval estimate is given by the point estimate X̄ plus or minus the precision, or X̄ ± Zσ_x̄.

In the previous example shown in Figures 1 and 2, suppose that a sample estimate X̄ was 56 and the population standard deviation σ was 20. Also, suppose that the desired confidence was 90 percent. Since the associated Z value for 90 percent is 1.645, the interval estimate for µ is

56 ± 1.645(20/√100)

or

56 ± 3.29, or 52.71 to 59.29
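The arithmetic of this interval estimate is easy to reproduce:

```python
# Interval estimate: X-bar plus or minus Z * sigma / sqrt(n), using the
# example's numbers (X-bar = 56, sigma = 20, n = 100, Z = 1.645).
x_bar = 56
sigma = 20
n = 100
z = 1.645                              # Z value for 90 percent confidence

precision = z * sigma / n ** 0.5       # Z times the standard error
low, high = x_bar - precision, x_bar + precision

print(round(low, 2), round(high, 2))   # 52.71 59.29
```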

This interval estimate of the population mean is based solely on information derived from a sample and states that the estimator is 90 percent confident that the true mean is between 52.71 and 59.29. There are numerous other sampling methods and other parameters that can be estimated; the student is referred to one of the references in the bibliography for further discussion.
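The sampling-distribution facts used above (µ_x̄ = µ and σ_x̄ = σ/√n) can also be checked by simulation. The uniform population below is illustrative; nothing in this sketch comes from the text:

```python
# Draw many samples of size n = 100 from a uniform population and
# compare the spread of the sample means with sigma / sqrt(n).
import random
import statistics

random.seed(0)
population = range(0, 101)                       # uniform on 0..100
sigma = statistics.pstdev(population)            # about 29.15

n = 100
sample_means = [
    statistics.mean(random.choices(population, k=n))
    for _ in range(2_000)
]

predicted = sigma / n ** 0.5                     # sigma of the sampling distribution
observed = statistics.pstdev(sample_means)
print(round(predicted, 2), round(observed, 2))   # the two agree closely
```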

Standard Probability Distributions

The normal distribution, discussed and shown in Figure 2, is probably the most common probability distribution in statistics. Some other common distributions are the Poisson, a discrete distribution, and the negative exponential, a continuous distribution. In project management, the beta distribution plays an important role. A continuous distribution, it is generally skewed, as in Figure 1. Two positive parameters, alpha
and beta, determine the distribution's shape. Its mean, µ, and variance, σ², are given by

µ = α / (α + β)

and

σ² = αβ / [(α + β)²(α + β + 1)]

These are often approximated by a mean of

µ = (a + 4m + b)/6

and a standard deviation of

σ = (b − a)/6

where

a is the optimistic value that might occur once in a hundred times,
m is the most likely (modal) value, and
b is the pessimistic value that might occur once in a hundred times.

Recent research (Keefer and Verdini, 1993) has indicated that a much better approximation is given by

µ = 0.630d + 0.185(c + e)

σ² = 0.630(d − µ)² + 0.185[(c − µ)² + (e − µ)²]

where

c is an optimistic value at one in 20 times,
d is the median, and
e is a pessimistic value at one in 20 times.

See Chapter 8 for another method for approximating µ and σ².

BIBLIOGRAPHY

ANDERSON, D., D. SWEENEY, and T. WILLIAMS. Statistics for Business and Economics. 7th ed. Cincinnati, OH: South-Western, 1998.

BHATTACHARYYA, G., and R. A. JOHNSON. Mathematical Statistics. Paramus, NJ: Prentice-Hall, 1999.

KEEFER, D. L., and W. A. VERDINI. "Better Estimation of PERT Activity Time Parameters." Management Science, September 1993.

MENDENHALL, W., R. L. SCHAEFFER, and D. WACKERLY. Mathematical Statistics with Applications. 3rd ed. Boston: PWS-Kent, 1986.

NETER, J., W. WASSERMAN, and G. A. WHITMORE. Applied Statistics. 3rd ed. Boston: Allyn and Bacon, 1987.
