4.9 The Multinomial Distribution

An immediate generalization of the binomial distribution arises when each trial can have more than two possible outcomes. This happens, for example, when a manufactured product is classified as superior, average, or poor, when a student’s performance is graded as an A, B, C, D, or F, or when an experiment is judged successful, unsuccessful, or inconclusive. To treat this kind of problem in general, let us consider the case where there are n independent trials, with each trial permitting k mutually exclusive outcomes whose respective probabilities are

\[ p_1, p_2, \ldots, p_k \quad \text{with} \quad \sum_{i=1}^{k} p_i = 1 \]

Referring to the outcomes as being of the first kind, the second kind, . . . , and the kth kind, we shall be interested in the probability $f(x_1, x_2, \ldots, x_k)$ of getting $x_1$ outcomes of the first kind, $x_2$ outcomes of the second kind, . . . , and $x_k$ outcomes of the kth kind, with

\[ \sum_{i=1}^{k} x_i = n \]

Using arguments similar to those which we employed in deriving the equation for the binomial distribution in Section 4.2, it can be shown that the desired probability is given by

Multinomial distribution

\[ f(x_1, x_2, \ldots, x_k) = \frac{n!}{x_1!\, x_2! \cdots x_k!}\; p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k} \]

for $x_i = 0, 1, \ldots, n$ for each $i$, but with the $x_i$ subject to the restriction

\[ \sum_{i=1}^{k} x_i = n \]

The joint probability distribution whose values are given by these probabilities is called the multinomial distribution; it owes its name to the fact that for the various values of the $x_i$ the probabilities are given by the corresponding terms of the multinomial expansion of $(p_1 + p_2 + \cdots + p_k)^n$.
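As a quick consistency check (a worked special case added here for illustration, not part of the derivation in Section 4.2), setting k = 2 and writing $x_2 = n - x_1$ and $p_2 = 1 - p_1$ should recover the binomial distribution:

```latex
\[
f(x_1, x_2)
  = \frac{n!}{x_1!\, x_2!}\; p_1^{x_1} p_2^{x_2}
  = \frac{n!}{x_1!\,(n - x_1)!}\; p_1^{x_1} (1 - p_1)^{\,n - x_1}
  = \binom{n}{x_1} p_1^{x_1} (1 - p_1)^{\,n - x_1}
\]
```

The right-hand side is the binomial probability $b(x_1; n, p_1)$, so the multinomial formula agrees with the two-outcome case.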

EXAMPLE 27 Calculating a probability using the multinomial distribution

The probabilities that the light bulb of a certain kind of projector will last fewer than 40 hours of continuous use, anywhere from 40 to 80 hours of continuous use, or more than 80 hours of continuous use are 0.30, 0.50, and 0.20. Find the probability that among eight such bulbs 2 will last fewer than 40 hours, 5 will last anywhere from 40 to 80 hours, and 1 will last more than 80 hours.

Solution Substituting $n = 8$, $x_1 = 2$, $x_2 = 5$, $x_3 = 1$, $p_1 = 0.30$, $p_2 = 0.50$, and $p_3 = 0.20$ into the formula, we get

\[ f(2, 5, 1) = \frac{8!}{2!\,5!\,1!}\,(0.30)^2 (0.50)^5 (0.20)^1 = 0.0945 \]
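The calculation in Example 27 can also be checked numerically. The following is a minimal Python sketch; the function name `multinomial_pmf` and its argument layout are illustrative choices, not something prescribed by the text.

```python
from math import factorial, prod

def multinomial_pmf(counts, probs):
    """Multinomial probability of `counts` outcomes of each kind
    in n = sum(counts) independent trials (hypothetical helper)."""
    if len(counts) != len(probs):
        raise ValueError("counts and probs must have the same length")
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("the probabilities must sum to 1")
    n = sum(counts)
    # Multinomial coefficient n! / (x1! x2! ... xk!), kept as an exact integer.
    coefficient = factorial(n)
    for x in counts:
        coefficient //= factorial(x)
    # Multiply by p1^x1 * p2^x2 * ... * pk^xk.
    return coefficient * prod(p ** x for p, x in zip(probs, counts))

# Example 27: eight bulbs, with 2, 5, and 1 falling in the three lifetime classes.
example_27 = multinomial_pmf([2, 5, 1], [0.30, 0.50, 0.20])
print(round(example_27, 4))  # 0.0945
```

The same function can be applied to the exercises that follow by changing the two lists.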

Exercises

4.72 Suppose that the probabilities are, respectively, 0.40, 0.40, and 0.20 that in city driving a certain kind of imported car will average less than 22 miles per gallon, anywhere from 22 to 25 miles per gallon, or more than 25 miles per gallon. Find the probability that among 12 such cars tested, 4 will average less than 22 miles per gallon, 6 will average anywhere from 22 to 25 miles per gallon, and 2 will average more than 25 miles per gallon.

4.73 As can easily be shown, the probabilities of getting 0, 1, or 2 heads with a pair of balanced coins are 1/4, 1/2, and 1/4. What is the probability of getting 2 tails twice, 1 head and 1 tail 3 times, and 2 heads once in 6 tosses of a pair of balanced coins?

4.74 Suppose the probabilities are 0.89, 0.09, and 0.02 that the finish on a new car will be rated acceptable, easily repairable, or unacceptable. Find the probability that, among 20 cars painted one morning, 17 have acceptable finishes, 2 have repairable finishes, and 1 finish is unacceptable.

4.75 Using the same sort of reasoning as in the derivation of the formula for the hypergeometric distribution, we can derive a formula which is analogous to the multinomial distribution but applies to sampling without replacement. A set of N objects contains $a_1$ objects of the first kind, $a_2$ objects of the second kind, . . . , and $a_k$ objects of the kth kind, so that $a_1 + a_2 + \cdots + a_k = N$. The number of ways in which we can select $x_1$ objects of the first kind, $x_2$ objects of the second kind, . . . , and $x_k$ objects of the kth kind is given by the product of the number of ways in which we can select $x_1$ of the $a_1$ objects of the first kind, $x_2$ of the $a_2$ objects of the second kind, . . . , and $x_k$ of the $a_k$ objects of the kth kind. Thus, the probability of getting that many objects of each kind is simply this product divided by the total number of ways in which $x_1 + x_2 + \cdots + x_k = n$ objects can be selected from the whole set of N objects.

(a) Write a formula for the probability of obtaining $x_1$ objects of the first kind, $x_2$ objects of the second kind, . . . , and $x_k$ objects of the kth kind.

(b) If 20 defective glass bricks include 10 that have cracks but no discoloration, 7 that are discolored but have no cracks, and 3 that have cracks and discoloration, what is the probability that among 6 of the bricks chosen at random for further checks 3 will have cracks only, 2 will only be discolored, and 1 will have cracks as well as discoloration?

4.10 Simulation

In recent years, simulation techniques have been applied to many problems in the various sciences. If the processes being simulated involve an element of chance, these techniques are referred to as Monte Carlo methods. Very often, the use of Monte Carlo simulation eliminates the cost of building and operating expensive equipment. It is used, for instance, in the study of collisions of photons with electrons, the scattering of neutrons, and similar complicated phenomena. Monte Carlo methods are also useful in situations where direct experimentation is impossible, say, in studies of the spread of cholera epidemics, which, of course, cannot be induced experimentally on human populations. In addition, Monte Carlo techniques are sometimes applied to the solution of mathematical problems which cannot be solved by direct means, or where a direct solution is too costly or requires too much time.

A classical example of the use of Monte Carlo methods in the solution of a problem of pure mathematics is the determination of π (the ratio of the circumference of a circle to its diameter) by probabilistic means. Early in the eighteenth century, George de Buffon, a French naturalist, proved that if a very fine needle of length a is thrown at random on a board ruled with equidistant parallel lines, the probability that the needle will intersect one of the lines is 2a/(πb), where b is the distance between the parallel lines. What is remarkable about this fact is that it involves the constant π = 3.1415926 …, which in elementary geometry is approximated by the circumferences of regular polygons enclosed in a circle of radius 1/2. Buffon’s result implies that if such a needle is actually tossed a great many times, the proportion of the time it crosses one of the lines gives an estimate of 2a/(πb) and, hence, an estimate of π since a and b are known. Early experiments of this kind yielded an estimate of 3.1596 (based on 5,000 trials) and an estimate of 3.155 (based on 3,204 trials) in the middle of the nineteenth century.
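The needle experiment can be imitated on a computer. The sketch below is one possible Python version, under assumptions that are not in the text: the needle is no longer than the line spacing (a ≤ b), the distance from the needle's center to the nearest line is uniform on [0, b/2], and the needle's direction is drawn by rejection sampling from the unit disk so that the program never uses the value of π it is trying to estimate. The function name, seed, and trial count are illustrative.

```python
import random

def estimate_pi_buffon(trials, a=1.0, b=1.0, seed=0):
    """Estimate pi by simulated needle tosses (assumes needle length a <= spacing b)."""
    if a > b:
        raise ValueError("this sketch assumes a <= b")
    rng = random.Random(seed)
    crossings = 0
    for _ in range(trials):
        # Distance from the needle's center to the nearest ruled line.
        d = rng.uniform(0.0, b / 2)
        # Uniformly random direction: pick a point in the unit disk by rejection.
        while True:
            u, v = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
            r2 = u * u + v * v
            if 0.0 < r2 <= 1.0:
                break
        # |sin(theta)| for the angle theta between the needle and the lines.
        abs_sin = abs(v) / r2 ** 0.5
        # The needle crosses a line when its half-length, projected
        # perpendicular to the lines, reaches the nearest line.
        if (a / 2) * abs_sin >= d:
            crossings += 1
    if crossings == 0:
        raise ValueError("no crossings observed; use more trials")
    # The crossing proportion estimates 2a/(pi*b); solve for pi.
    return 2 * a * trials / (b * crossings)

pi_estimate = estimate_pi_buffon(200_000)
print(pi_estimate)
```

With a few hundred thousand simulated tosses the estimate is typically accurate to about two decimal places, which suggests why the nineteenth-century experiments with a few thousand tosses got only the first digits right.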

Although Monte Carlo methods are sometimes based on actual gambling devices (for example, the needle tossing in the estimation of π), it is usually expedient to use so-called random digits or random numbers generated by computer software. We will illustrate an application using a table of random numbers that consists of many pages on which the digits 0, 1, 2, …, and 9 are set down in a “random” fashion, much as they would appear if they were generated one at a time by a gambling device giving each digit an equal probability of being selected. Actually, we could also construct such tables ourselves (say, by repeatedly drawing numbered slips out of a hat or by using a perfectly constructed spinner), but in practice such tables are usually generated by means of computers.

Although tables of random numbers are constructed so that the digits can be looked upon as values of a random variable having the discrete uniform distribution

\[ f(x) = \frac{1}{10} \quad \text{for } x = 0, 1, 2, \ldots, \text{or } 9, \]

they can be used to simulate values of any discrete random variable, and even continuous random variables.

To illustrate the use of a table of random numbers, let us simulate, say, tossing three balanced coins. The distribution for the number of heads is

Number of Heads    Probability
0                  1/8 = 0.125
1                  3/8 = 0.375
2                  3/8 = 0.375
3                  1/8 = 0.125

Since the probabilities in this distribution are given to three decimal places, we use three-digit random numbers. Our scheme is to allocate 125 (or one-eighth) of the 1,000 random numbers from 000 to 999 to 0 heads, 375 (or three-eighths) to 1 head, 375 (or three-eighths) to 2 heads, and 125 (or one-eighth) to 3 heads.

We use the following scheme:

Number of Heads    Probability    Cumulative Probability    Random Numbers
0                  0.125          0.125                     000–124
1                  0.375          0.500                     125–499
2                  0.375          0.875                     500–874
3                  0.125          1.000                     875–999

The column of cumulative probabilities was added to facilitate the assignment of the random numbers. Observe that in each case the last random number assigned is one less than the number formed by the three decimal digits of the corresponding cumulative probability.

With this scheme, if we arbitrarily use the twenty-second, twenty-third, and twenty-fourth columns of the first page of Table 7W, starting with the sixth row and going down the page, we get 197, 365, 157, 520, 946, 951, 948, 568, 586, and 089, and we interpret this as 1, 1, 1, 2, 3, 3, 3, 2, 2, and 0 heads.
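The lookup from three-digit random numbers to numbers of heads can be expressed in a few lines of Python. This is only a sketch: the name `heads_from_random_number` is illustrative, and the block limits are taken from the cumulative probabilities 0.125, 0.500, 0.875, and 1.000, so that the blocks are 000–124, 125–499, 500–874, and 875–999.

```python
from bisect import bisect_right

# Exclusive upper limits of the four blocks of three-digit random numbers,
# read off the cumulative probabilities 0.125, 0.500, 0.875, 1.000.
UPPER_LIMITS = [125, 500, 875, 1000]

def heads_from_random_number(number):
    """Map a three-digit random number (0-999) to a simulated number of heads."""
    if not 0 <= number <= 999:
        raise ValueError("expected a three-digit random number from 000 to 999")
    # Count how many block limits are less than or equal to the number.
    return bisect_right(UPPER_LIMITS, number)

# The ten values read from the random number table in the text (089 written as 89).
table_values = [197, 365, 157, 520, 946, 951, 948, 568, 586, 89]
simulated_heads = [heads_from_random_number(v) for v in table_values]
print(simulated_heads)  # [1, 1, 1, 2, 3, 3, 3, 2, 2, 0]
```

Using the cumulative limits rather than four separate range checks means the same function works for any discrete distribution once the limits are changed.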

The method we have illustrated here with reference to a game of chance can be used to simulate observations of any random variable with a given probability distribution.

However, in practice it is much more efficient to use common computer software based on this scheme.

EXAMPLE 28 Simulation of arrival of cars at toll booth

Suppose that the probabilities are 0.082, 0.205, 0.256, 0.214, 0.134, 0.067, 0.028, 0.010, 0.003, and 0.001 that 0, 1, 2, 3, . . . , or 9 cars will arrive at a toll booth of a turnpike during any one-minute interval in the early afternoon.

Use computer software to simulate the arrival of cars at the toll booth during 20 one-minute intervals in the early afternoon.

Solution We illustrate using MINITAB with the values set in C1 and the probabilities in C2.

Data:

C1: 0, 1, . . . , 9

C2: 0.082, 0.205, . . . , 0.001

Dialog box:

Calc > Random Data > Discrete

Type 20 after Generate. Type C3 below Store. Type C1 in Values in:. Type C2 in Probabilities in:. Click OK.

Output:

4 1 5 4 1 2 5 0 1 4

3 3 1 0 1 1 2 5 1 2

Suppose we are interested in a somewhat complex event, say, 11 or more cars arrive in at least one three-minute interval among the 20 one-minute intervals. It is a simple matter to repeat the simulation of 20 one-minute periods 100 times. The probability that 11 or more cars arrive in at least one three-minute interval is estimated by the proportion of times that event occurs. In the single sample of size 20 here, that event does not occur.
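For readers without MINITAB, a comparable simulation can be sketched in Python with the standard `random` module. The function names, the seed, and the reading of "three-minute interval" as any three consecutive one-minute intervals are assumptions made for this illustration; the generated values will differ from the MINITAB output above.

```python
import random

ARRIVALS = list(range(10))  # 0, 1, ..., 9 cars per minute
PROBS = [0.082, 0.205, 0.256, 0.214, 0.134, 0.067, 0.028, 0.010, 0.003, 0.001]

def simulate_minutes(rng, minutes=20):
    """Simulate the number of cars arriving in each of `minutes` one-minute intervals."""
    return rng.choices(ARRIVALS, weights=PROBS, k=minutes)

def has_busy_window(counts, window=3, threshold=11):
    """True if some run of `window` consecutive minutes has at least `threshold` cars."""
    return any(sum(counts[i:i + window]) >= threshold
               for i in range(len(counts) - window + 1))

def estimate_busy_probability(repetitions=100, seed=1):
    """Proportion of simulated 20-minute periods containing a busy three-minute window."""
    rng = random.Random(seed)
    hits = sum(has_busy_window(simulate_minutes(rng)) for _ in range(repetitions))
    return hits / repetitions

# The single sample of size 20 shown in the output above.
textbook_sample = [4, 1, 5, 4, 1, 2, 5, 0, 1, 4, 3, 3, 1, 0, 1, 1, 2, 5, 1, 2]
print(has_busy_window(textbook_sample))  # False
print(estimate_busy_probability())
```

With only 100 repetitions the estimate is rough; increasing `repetitions` reduces the sampling variability.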

Exercises

4.76 Simulate tossing a coin.

(a) For a balanced coin, generate 100 flips.

(b) For a coin with probability of heads 0.8, generate 100 flips.

4.77 The probabilities that a quality control team will visit 0, 1, 2, 3, or 4 production sites on a single day are 0.15, 0.22, 0.35, 0.21, and 0.07.

(a) Simulate the inspection team’s visits on 30 days.

(b) Repeat the simulation of visits on 30 days a total of 100 times. Estimate the probability that there are more than 10 visits over five consecutive days.

4.78 Depending on the availability of parts, a company can manufacture 3, 4, 5, or 6 units of a certain item per week with corresponding probabilities of 0.10, 0.40, 0.30, and 0.20. The probabilities that there will be a weekly demand for 0, 1, 2, 3, …, or 8 units are, respectively, 0.05, 0.10, 0.30, 0.30, 0.10, 0.05, 0.05, 0.04, and 0.01. If a unit is sold during the week that it is made, it will yield a profit of $100; this profit is reduced by $20 for each week that a unit has to be stored. Simulate the operation of this company for 50 consecutive weeks and estimate its expected weekly profit.


Do’s and Don’ts

Do’s

1. Keep in mind that any scheme for assigning a numerical value to each possible outcome should quantify a feature of the outcome that is important to the scientist. That is, any random variable should convey pertinent information about the outcome.

2. Describe the chance behavior of a discrete random variable X by its probability distribution function

\[ f(x) = P[X = x] \quad \text{for each possible value } x \]

3. Summarize a probability distribution, or the random variable, by its

mean: \[ \mu = \sum_{\text{all } x} x \cdot f(x) \]

variance: \[ \sigma^2 = \sum_{\text{all } x} (x - \mu)^2 \cdot f(x) \]

standard deviation: \[ \sigma = \sqrt{\sum_{\text{all } x} (x - \mu)^2 \cdot f(x)} \]

4. Use a special family of distributions, for instance the binomial distribution

\[ b(x; n, p) = \binom{n}{x} p^x (1 - p)^{n - x} \quad \text{for } x = 0, 1, \ldots, n \]

having mean np and variance np(1− p), if the underlying assumptions are reasonable. The hypergeometric distribution might be entertained when sampling without replacement from a finite collection of units each of which is one of two possible types. It will be well approximated by the binomial when the sample size n is a small fraction of the population size N.

5. For counts whose possible values do not have a specified upper limit, consider the Poisson distribution

\[ f(x; \lambda) = \frac{\lambda^x e^{-\lambda}}{x!} \quad \text{for } x = 0, 1, 2, \ldots, \quad \lambda > 0 \]

having mean λ and variance λ. You do need to check that the Poisson distribution is reasonable. The sample mean and variance should be about the same size.
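The summary formulas for the mean, variance, and standard deviation can be applied directly to any tabulated distribution. The Python sketch below uses the three-coin distribution from Section 4.10 and compares the results with the binomial shortcuts np and np(1 − p); the function name `summarize` and the dictionary representation of the distribution are illustrative assumptions.

```python
from math import isclose, sqrt

def summarize(pmf):
    """Return (mean, variance, standard deviation) of a discrete distribution
    given as a dictionary {value: probability}."""
    if not isclose(sum(pmf.values()), 1.0, abs_tol=1e-9):
        raise ValueError("the probabilities must sum to 1")
    mean = sum(x * p for x, p in pmf.items())
    variance = sum((x - mean) ** 2 * p for x, p in pmf.items())
    return mean, variance, sqrt(variance)

# Number of heads in three tosses of a balanced coin: binomial with n = 3, p = 0.5.
three_coins = {0: 0.125, 1: 0.375, 2: 0.375, 3: 0.125}
mean, variance, sd = summarize(three_coins)

n, p = 3, 0.5
print(mean, n * p)                # both equal 1.5
print(variance, n * p * (1 - p))  # both equal 0.75
print(sd)
```

Agreement between the general formulas and the shortcuts is expected here only because the three-coin distribution really is binomial; for other distributions the general formulas must be used.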

Don’ts

1. Never apply the binomial distribution to counts without first checking that the conditions hold for Bernoulli trials: independent trials with the same probability of success for each trial. If the conditions are satisfied, then the binomial distribution is appropriate for the number of successes in a fixed number of trials.

2. Never use the formula np(1− p) for the variance of a count of successes without checking that the trials are independent.

Review Exercises

4.79 A manufacturer of smart phones has the following probability distribution for the number of defects per phone:

(a) Determine the probability of 2 or more defects.

(b) Is a randomly selected phone more likely to have 0 defects or 1 or more defects?

4.80 Upon reviewing recent use of conference rooms at an engineering consulting firm, an industrial engineer determined the following probability distribution for the number of requests for a conference room per half-day:

(a) Currently, the building has two conference rooms. What is the probability that the number of requests will exceed the number of rooms for a given half-day?

(b) What is the probability that the two conference rooms will not be fully utilized on a given half-day?

(c) How many additional conference rooms are required so that the probability of denying a request is not more than 0.10?

4.81 Refer to Exercise 4.80 and obtain the

(a) mean; (b) variance; (c) standard deviation for the number of requests for conference rooms.

4.82 Determine whether the following can be probability distributions of a random variable that can take on only the values of 0, 1, and 2:

(a) f(0) = 0.34, f(1) = 0.34, and f(2) = 0.34.

(b) f(0) = 0.2, f(1) = 0.6, and f(2) = 0.2.

(c) f(0) = 0.7, f(1) = 0.4, and f(2) = −0.1.

4.83 Check whether the following can define probability distributions, and explain your answers.

(a) f (x)= x

4.84 An engineering student correctly answers 85% of all questions she attempts. What is the probability that the first incorrect answer was the fourth one?

4.85 If the probability is 0.20 that a downtime of an automated production process will exceed 2 minutes, find the probability that 3 of 8 downtimes of the process will exceed 2 minutes using (a) the formula for the binomial distribution; (b) Table 1 or software.

4.86 If the probability is 0.90 that a new machine will produce 40 or more chairs, find the probabilities that among 16 such machines

(a) 12 will produce 40 or more chairs;

(b) at least 10 will produce 40 or more chairs;

(c) at most 3 will not produce 40 or more chairs.

4.87 In 16 experiments studying the electrical behavior of single cells, 12 use micro-electrodes made of metal and the other 4 use micro-electrodes made from glass tubing. If 2 of the experiments are to be terminated for financial reasons, and they are selected at random, what are the probabilities that

(a) neither uses micro-electrodes made from glass tubing?

(b) only one uses micro-electrodes made from glass tubing?

(c) both use micro-electrodes made from glass tubing?

4.88 As can be easily verified by means of the formula for the binomial distribution, the probabilities of getting 0, 1, 2, or 3 heads in 3 flips of a coin whose probability of heads is 0.4 are 0.216, 0.432, 0.288, and 0.064. Find the mean of this probability distribution using

(a) the formula that defines μ;

(b) the special formula for the mean of a binomial distribution.

4.89 With reference to Exercise 4.88, find the variance of the probability distribution using

(a) the formula that defines $\sigma^2$;

(b) the special formula for the variance of a binomial distribution.

4.90 Find the mean and the standard deviation of the distribution of each of the following random variables (having binomial distributions):

(a) The number of heads in 440 flips of a balanced coin.

(b) The number of 6’s in 300 rolls of a balanced die.


(c) The number of defectives in a sample of 700 parts made by a machine, when the probability is 0.03 that any one of the parts is defective.

4.91 Use the Poisson distribution to approximate the binomial probability b(1; 100, 0.02).

4.92 With reference to Exercise 4.87, find the mean and the variance of the distribution of the number of micro-electrodes made from glass tubing using

(a) the probabilities obtained in that exercise;

(b) the special formulas for the mean and the variance of a hypergeometric distribution.

4.93 The daily number of orders filled by the parts department of a repair shop is a random variable with μ = 142 and σ = 12. According to Chebyshev’s theorem, with what probability can we assert that on any one day it will fill between 82 and 202 orders?

4.94 Records show that the probability is 0.00008 that a truck will have an accident on a certain highway. Use the formula for the Poisson distribution to approximate the probability that at least 5 of 20,000 trucks on that highway will have an accident.

4.95 The number of weekly breakdowns of a computer is a random variable having a Poisson distribution with λ = 0.2. What is the probability that the computer will operate without a breakdown for 3 consecutive weeks?

4.96 A manufacturer determines that a big screen HDTV set has probabilities of 0.8, 0.15, and 0.05, respectively, of being placed in the categories acceptable, minor defect, or major defect. If 3 HDTVs are inspected,

(a) find the probability that 2 are acceptable and 1 is a minor defect;

(b) find the marginal distribution of the number in minor defect;

(c) compare your answer in part (b) with the binomial probabilities b(x; 3, 0.15). Comment.

4.97 Suppose that the probabilities are 0.2466, 0.3452, 0.2417, 0.1128, 0.0395, 0.0111, 0.0026, and 0.0005 that there will be 0, 1, 2, 3, 4, 5, 6, or 7 polluting spills in the Great Lakes on any one day. Simulate the numbers of polluting spills in the Great Lakes in 30 days.

4.98 A candidate invited for a visit has probability 0.6 of being hired. Let X be the number of candidates that visit before 2 are hired. Find

(a) $P(X \le 4)$;

Key Terms

kth moment about the mean
kth moment about the origin
Kurtosis
Law of large numbers
Mean