Solving problems by Monte Carlo methods - Computer Simulations and Monte Carlo Methods

Computer Simulations and Monte Carlo Methods


$$X = \max\left\{k : -\frac{1}{\lambda}\ln(U_1 \cdot \ldots \cdot U_k) \le 1\right\} = \max\left\{k : U_1 \cdot \ldots \cdot U_k \ge e^{-\lambda}\right\}. \tag{5.4}$$

This formula for generating Poisson variables is rather popular.
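A minimal MATLAB sketch of formula (5.4) is given here for illustration. The parameter value lambda = 4 and the run length N are assumptions chosen only for the demonstration; the sample mean and variance are computed because both equal λ for a Poisson(λ) variable.

```matlab
% Sketch of method (5.4): count how many Standard Uniform factors can be
% multiplied together before the product drops below exp(-lambda).
lambda = 4;            % illustrative parameter (an assumption)
N = 20000;             % illustrative number of generated variables
X = zeros(N,1);
for i = 1:N
    U = rand;          % first factor U1
    k = 0;
    while U >= exp(-lambda)
        U = U*rand;    % multiply in the next Uniform factor
        k = k + 1;
    end
    X(i) = k;          % largest k with U1*...*Uk >= exp(-lambda), or 0 if none
end
SampleMean = mean(X);  % should be close to lambda
SampleVar  = var(X);   % should also be close to lambda
```

The same inner loop reappears in Example 5.16, where the parameter changes from day to day.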

Normal distribution. The Box-Muller transformation

$$\begin{cases} Z_1 = \sqrt{-2\ln(U_1)}\,\cos(2\pi U_2) \\ Z_2 = \sqrt{-2\ln(U_1)}\,\sin(2\pi U_2) \end{cases}$$

converts a pair of generated Standard Uniform variables (U1, U2) into a pair of independent Standard Normal variables (Z1, Z2). This is a rather economical algorithm. To see why it works, solve Exercise 5.13.
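The transformation can be tried out with a few lines of MATLAB. This is only a sketch: the number of pairs N is an arbitrary choice, and the summary statistics at the end are an informal check, not a proof of normality.

```matlab
N = 50000;                         % illustrative number of pairs (an assumption)
U1 = rand(N,1);  U2 = rand(N,1);   % independent Standard Uniform variables
R  = sqrt(-2*log(U1));             % common radius, shared by Z1 and Z2
Z1 = R .* cos(2*pi*U2);
Z2 = R .* sin(2*pi*U2);
Means = [mean(Z1), mean(Z2)];      % both should be near 0
Stds  = [std(Z1),  std(Z2)];       % both should be near 1
CrossMoment = mean(Z1.*Z2);        % near 0 for uncorrelated variables
```

Note that both Z1 and Z2 use the same radius computed from U1; only the angle 2πU2 is split into its cosine and sine.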

5.3 Solving problems by Monte Carlo methods

We have learned how to generate random variables from any given distribution. Once we know how to generate one variable, we can put the algorithm in a loop and generate many variables, a “long run.” Then, we shall estimate probabilities by the long-run proportions, expectations by the long-run averages, etc.

5.3.1 Estimating probabilities

This section discusses the most basic and most typical application of Monte Carlo methods.

Keeping in mind that probabilities are long-run proportions, we generate a long run of experiments and compute the proportion of times when our event occurred.

For a random variable X, the probability p = P{X ∈ A} is estimated by

$$\hat{p} = \hat{P}\{X \in A\} = \frac{\text{number of } X_1, \ldots, X_N \in A}{N},$$

where N is the size of a Monte Carlo experiment, X1, . . . , XN are generated random variables with the same distribution as X, and a “hat” means the estimator. The latter is a very common and standard notation:

Notation: θ̂ = estimator of an unknown quantity θ

How accurate is this method? To answer this question, compute E(p̂) and Std(p̂). Since the number of X1, . . . , XN that fall within set A has Binomial(N, p) distribution with expectation Np and variance Np(1 − p), we obtain

$$E(\hat{p}) = \frac{1}{N}(Np) = p, \qquad \operatorname{Std}(\hat{p}) = \frac{1}{N}\sqrt{Np(1-p)} = \sqrt{\frac{p(1-p)}{N}}.$$

The first result, E(p̂) = p, shows that our Monte Carlo estimator of p is unbiased, so that over a long run, it will on average return the desired quantity p.

The second result, Std(p̂) = √(p(1 − p)/N), indicates that the standard deviation of our estimator p̂ decreases with N at the rate of 1/√N. Larger Monte Carlo experiments produce more accurate results. A 100-fold increase in the number of generated variables reduces the standard deviation (thereby enhancing accuracy) by a factor of 10.
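The factor of 10 can be observed empirically. The sketch below repeats a small and a large Monte Carlo study many times and compares the spread of the resulting estimates; the event {U < 0.3}, the sizes, and the number of repetitions are all illustrative assumptions.

```matlab
% Illustrative event (an assumption): A = {U < 0.3} for Standard Uniform U,
% so the true probability is p = 0.3.
p = 0.3;
Nsmall = 100;  Nlarge = 10000;          % a 100-fold increase in N
Runs = 500;                             % repeat each study to observe its spread
Psmall = mean(rand(Nsmall,Runs) < p);   % 1-by-Runs vector of estimates of p
Plarge = mean(rand(Nlarge,Runs) < p);
StdSmall = std(Psmall);                 % theory: sqrt(p*(1-p)/Nsmall), about 0.0458
StdLarge = std(Plarge);                 % theory: sqrt(p*(1-p)/Nlarge), about 0.00458
Ratio = StdSmall/StdLarge;              % should be close to sqrt(100) = 10
```

Because the standard deviations are themselves estimated from 500 repetitions, the ratio will fluctuate around 10 rather than equal it exactly.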

Accuracy of a Monte Carlo study

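In practice, how does it help to know the standard deviation of p̂?

First, we can assess the accuracy of our results. For large N, we use the Normal approximation of the Binomial distribution of N p̂, as in (4.19) on p. 94. According to it,

$$\frac{N\hat{p} - Np}{\sqrt{Np(1-p)}} = \frac{\hat{p} - p}{\sqrt{p(1-p)/N}} \approx \text{Normal}(0, 1),$$

therefore,

$$P\{|\hat{p} - p| > \varepsilon\} = P\left\{\frac{|\hat{p} - p|}{\sqrt{p(1-p)/N}} > \frac{\varepsilon}{\sqrt{p(1-p)/N}}\right\} \approx 2\Phi\left(-\frac{\varepsilon\sqrt{N}}{\sqrt{p(1-p)}}\right). \tag{5.5}$$

We have computed probabilities of this type in Section 4.2.4.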

Second, we can design a Monte Carlo study that attains desired accuracy. That is, we can choose some small ε and α and conduct a Monte Carlo study of such a size N that will guarantee an error not exceeding ε with high probability (1− α). In other words, we can find such N that

$$P\{|\hat{p} - p| > \varepsilon\} \le \alpha. \tag{5.6}$$

If we knew the value of p, we could equate the right-hand side of (5.5) to α and solve the resulting equation for N. This would show how many Monte Carlo simulations are needed in order to achieve the desired accuracy with the desired high probability. However, p is unknown (if p were known, why would we need a Monte Carlo study to estimate it?). Then, we have two possibilities:

1. Use an “intelligent guess” (preliminary estimate) of p, if it is available.

2. Bound p(1 − p) by its largest possible value (see Figure 5.5), p(1 − p) ≤ 0.25 for 0 ≤ p ≤ 1.

FIGURE 5.5: Function p(1 − p) attains its maximum at p = 0.5.

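In the first case, if p∗ is an “intelligent guess” of p, then we solve the inequality

$$2\Phi\left(-\frac{\varepsilon\sqrt{N}}{\sqrt{p^*(1-p^*)}}\right) \le \alpha$$

in terms of N. In the second case, we solve

$$2\Phi\left(-2\varepsilon\sqrt{N}\right) \le \alpha$$

to find the conservatively sufficient size of the Monte Carlo study.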

Solutions of these inequalities give us the following rule.

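Size of a Monte Carlo study

In order to guarantee that P{|p̂ − p| > ε} ≤ α, one needs to simulate

$$N \ge p^*(1-p^*)\left(\frac{z_{\alpha/2}}{\varepsilon}\right)^2$$

random variables, where p∗ is a preliminary estimator of p, or

$$N \ge 0.25\left(\frac{z_{\alpha/2}}{\varepsilon}\right)^2$$

random variables, if no such estimator is available.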
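The rule is easy to evaluate numerically. The following sketch is one possible way to do it; the inputs alpha, eps0, and pGuess are hypothetical values, and the critical value is computed from the base MATLAB function erfcinv instead of being read from Table A4.

```matlab
% Sketch of the sample-size rule; alpha, eps0 and pGuess are illustrative
% inputs (assumptions), not values prescribed by the text.
alpha  = 0.01;           % allowed probability of a large error
eps0   = 0.01;           % margin of error (named eps0 to avoid shadowing eps)
pGuess = 0.2;            % hypothetical preliminary estimate p*

% Critical value z_{alpha/2}:
% Phi(z) = 1 - alpha/2  is equivalent to  z = sqrt(2)*erfcinv(alpha).
z = sqrt(2)*erfcinv(alpha);

Nguess = ceil( pGuess*(1-pGuess) * (z/eps0)^2 );   % with a preliminary estimate
Nconservative = ceil( 0.25 * (z/eps0)^2 );         % without one
```

Since erfcinv returns z with more digits than a printed table, the resulting N may be slightly larger than the value obtained from a rounded table entry such as 2.575.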

Recall (from Example 4.12) that zα = Φ⁻¹(1 − α) is such a value of a Standard Normal variable Z that can be exceeded with probability α. It can be obtained from Table A4. The area under the Normal curve to the right of zα equals α (Figure 5.6).

FIGURE 5.6: Critical value zα. (Standard Normal density φ(x); the area to the right of zα equals α.)

If this formula returns N that is too small for the Normal approximation, we can use Chebyshev’s inequality (3.8) on p. 54. We then obtain that

$$N \ge \frac{p^*(1-p^*)}{\alpha\varepsilon^2}, \tag{5.7}$$

if an “intelligent guess” p∗ is available for p, and

$$N \ge \frac{1}{4\alpha\varepsilon^2}, \tag{5.8}$$

otherwise, satisfy the desired condition (5.6) (Exercise 5.14).
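To get a feeling for how conservative the Chebyshev bound is, consider the illustrative values α = 0.01 and ε = 0.01 (an assumption made here only for the comparison). Formula (5.8) gives

$$N \ge \frac{1}{4\alpha\varepsilon^2} = \frac{1}{4\,(0.01)\,(0.01)^2} = 250{,}000,$$

whereas the Normal-approximation rule with z0.005 = 2.575 requires only 0.25 (2.575/0.01)² ≈ 16,577 simulations. Chebyshev’s inequality makes no distributional assumption, and the price is a study about 15 times larger.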

Example 5.14 (Shared computer). The following problem does not have a simple analytic solution (by hand); therefore, we use the Monte Carlo method.

A supercomputer is shared by 250 independent subscribers. Each day, each subscriber uses the facility with probability 0.3. The number of tasks sent by each active user has Geometric distribution with parameter 0.15, and each task takes a Gamma(10, 3) distributed computer time (in minutes). Tasks are processed consecutively. What is the probability that all the tasks will be processed, that is, the total requested computer time is less than 24 hours?

Estimate this probability, attaining the margin of error ±0.01 with probability 0.99.

Solution. The total requested time T = T1 + . . . + TX consists of times Ti requested by X active users. The number of active users X is Binomial(C, p), and each of them sends a Geometric(q) number of tasks Yi. Thus, each Ti = Ti,1 + . . . + Ti,Yi is the sum of Yi Gamma(β, λ) random variables. Overall, the distribution of T is rather complicated, although a Monte Carlo solution is simple.

It is hard to come up with an “intelligent guess” of the probability of interest P{T < 24 hrs}. To attain the required accuracy (α = 0.01, ε = 0.01), we use

$$N \ge 0.25\left(\frac{z_{\alpha/2}}{\varepsilon}\right)^2 = 0.25\left(\frac{2.575}{0.01}\right)^2 \approx 16{,}577$$

simulations, where zα/2 = z0.005 = 2.575 is found from Table A4. Use this table to verify that Φ(2.575) ≈ 0.995. The obtained number N is large enough to justify the Normal approximation.

Next, we generate the number of active users, the number of tasks, and the time required by each task, repeat this procedure N times, and compute the proportion of times when the total time appears less than 24 hrs = 1440 min. The following MATLAB program can be used:

N=16577;                       % number of simulations
C=250; p=0.3; q=0.15;          % parameters
beta=10; lambda=3;             % parameters of Gamma distribution
TotalTime=zeros(N,1);          % save the total time for each run
for k=1:N;                     % do-loop of N runs
  X=sum( rand(C,1)<p );        % the number of active users is generated
                               % as a sum of Bernoulli(p) variables
  Y=ceil( log(1-rand(X,1))/log(1-q) );
                               % the number of tasks for each of X users
                               % is generated according to formula (5.3)
  TotalTasks=sum(Y);           % total daily number of tasks
  T=sum( -1/lambda * log(rand(beta,TotalTasks)) );
                               % generate Gamma times as in Example 5.11
  TotalTime(k)=sum(T);         % total time from the k-th Monte Carlo run
end;                           % end of simulations
P_est=mean(TotalTime<1440)     % proportion of runs with the total time less
                               % than 24 hours; this is our estimator of p.

The resulting estimated probability should be close to 0.17. It is a rather low probability that all the tasks will be processed. ♦

5.3.2 Estimating means and standard deviations

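Estimation of means, standard deviations, and other distribution characteristics is based on the same principle. We generate a Monte Carlo sequence of random variables X1, . . . , XN and compute the necessary long-run averages. The mean E(X) is estimated by the average (denoted by X̄ and pronounced “X-bar”)

$$\bar{X} = \frac{1}{N}(X_1 + \ldots + X_N).$$

If the generated variables X1, . . . , XN are independent with mean µ and standard deviation σ, then

$$E(\bar{X}) = \frac{1}{N}(EX_1 + \ldots + EX_N) = \frac{1}{N}(N\mu) = \mu, \tag{5.9}$$

$$\operatorname{Var}(\bar{X}) = \frac{1}{N^2}(\operatorname{Var}X_1 + \ldots + \operatorname{Var}X_N) = \frac{1}{N^2}(N\sigma^2) = \frac{\sigma^2}{N}. \tag{5.10}$$

From (5.9), we conclude that the estimator X̄ is unbiased for estimating µ. From (5.10), its standard deviation is Std(X̄) = σ/√N, and it decreases like 1/√N. For large N, we can use the Central Limit Theorem again to assess accuracy of our results and to design a Monte Carlo study that attains the desired accuracy (the latter is possible if we have a “guess” about σ).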
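The design step can be written out explicitly. The following is a sketch under the Normal approximation, by analogy with Section 5.3.1; in practice the unknown σ is replaced by the available guess.

$$P\{|\bar{X} - \mu| > \varepsilon\} \approx 2\Phi\left(-\frac{\varepsilon\sqrt{N}}{\sigma}\right) \le \alpha \quad\Longleftrightarrow\quad N \ge \left(\frac{z_{\alpha/2}\,\sigma}{\varepsilon}\right)^2.$$

For instance, halving the margin ε quadruples the required number of simulations.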

Variance of a random variable is defined as the expectation of (X − µ)². Similarly, we estimate it by a long-run average, replacing the unknown value of µ by its Monte Carlo estimator. The resulting estimator is usually denoted by s²,

$$s^2 = \frac{1}{N-1}\sum_{i=1}^{N}(X_i - \bar{X})^2.$$

Remark: Does the coefficient 1/(N − 1) seem surprising? Proper averaging should divide the sum of N numbers by N; however, only (N − 1) in the denominator guarantees that E(s²) = σ², that is, s² is unbiased for σ². This fact will be proved in Section 8.2.4. Sometimes, coefficients 1/N and even 1/(N + 1) are also used when estimating σ². The resulting estimates are not unbiased, but they have other attractive properties. For large N, the differences between all three estimators are negligible.
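The three coefficients can be compared directly in MATLAB. In this sketch the sample X is an arbitrary illustration; the built-in function var divides by N − 1 by default, and var(X,1) divides by N.

```matlab
X = randn(50,1);                      % illustrative Monte Carlo sample (an assumption)
N = length(X);
SS = sum( (X - mean(X)).^2 );         % sum of squared deviations from X-bar
s2_unbiased = SS/(N-1);               % the estimator s^2 defined above
s2_N        = SS/N;                   % coefficient 1/N
s2_Nplus1   = SS/(N+1);               % coefficient 1/(N+1)
BuiltIn     = var(X);                 % MATLAB's var uses 1/(N-1) by default
BuiltInN    = var(X,1);               % second argument 1 switches to 1/N
```

The same convention applies to std, which is the square root of var.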

Example 5.15 (Shared computer, continued). In Example 5.14, we generated the total requested computer time for each of N days and computed the proportion of days when this time was less than 24 hours. Further, we can estimate the expectation of requested time by computing the average,

ExpectedTime = mean(TotalTime);

The standard deviation of daily requested time will then be estimated by

StandardDeviation = std(TotalTime);

The mean requested time appears to be 1667 minutes, exceeding the 24-hour period by 227 minutes, and the standard deviation is around 243 minutes. It is clear now why all the tasks get processed with such a low probability as estimated in Example 5.14. ♦
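As an analytic cross-check of the simulated mean, one can multiply the expected values of the three layers of the model. This side calculation assumes that the Geometric(q) number of tasks has mean 1/q, that each Gamma(β, λ) time has mean β/λ, and that the counts and the task times are independent:

$$E(T) = E(X)\,E(Y_i)\,E(T_{i,j}) = (Cp)\cdot\frac{1}{q}\cdot\frac{\beta}{\lambda} = 75\cdot\frac{1}{0.15}\cdot\frac{10}{3} \approx 1666.7 \text{ min},$$

in agreement with the Monte Carlo estimate of about 1667 minutes.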

5.3.3 Forecasting

Forecasting the future is often an attractive but difficult task. In order to predict what happens in t days, one usually has to study events occurring every day between now and t days from now. It is often found that tomorrow’s events depend on today, yesterday, etc.

In such common situations, exact computation of probabilities may be long and difficult (although we’ll learn some methods in Chapters 6 and 7). However, Monte Carlo forecasts are feasible.

The currently known information can be used to generate random variables for tomorrow. Then, they can be considered “known” information, and we can generate random variables for the day after tomorrow, and so on, until day t.

The rest is similar to the previous sections. We generate N Monte Carlo runs for the next t days and estimate probabilities, means, and standard deviations of day t variables by long-run averaging.

In addition to forecasts for a specific day t, we can predict how long a certain process will last, when a certain event will occur, and how many events will occur.

Example 5.16 (New software release). Here is a stochastic model for the number of errors found in a new software release. Every day, software developers find a random number of errors and correct them. The number of errors Xt found on day t has Poisson(λt) distribution whose parameter is the lowest number of errors found during the previous 3 days,

$$\lambda_t = \min\{X_{t-1}, X_{t-2}, X_{t-3}\}.$$

Suppose that during the first three days, software developers found 28, 22, and 18 errors.

(a) Predict the time it will take to find all the errors.

(b) Estimate the probability that some errors will remain undetected after 21 days.

Solution. Let us generate N = 1000 Monte Carlo runs. In each run, we generate the number of errors found on each day until one day this number equals 0. According to our model, no more errors can be found after that, and we conclude that all errors have been detected.

The following MATLAB code uses method (5.4) to generate the Poisson variables:

N=1000;                        % number of Monte Carlo runs
Time=zeros(N,1);               % the day the last error is found
Nerrors=zeros(N,1);            % total number of errors
for k=1:N;                     % do-loop of N Monte Carlo runs
  Last3=[28,22,18];            % errors found during last 3 days
  DE=sum(Last3);               % detected errors so far
  T=0; X=min(Last3);           % T = # days, X = # errors on day T
  while X>0;                   % while-loop until no errors are found
    lambda=min(Last3);         % parameter λ for day T
    U=rand; X=0;               % initial values
    while U>=exp(-lambda);
      U=U*rand; X=X+1;         % according to (5.4), X is Poisson(λ)
    end;
    T=T+1; DE=DE+X;            % update after day T
    Last3=[Last3(2:3), X];
  end;                         % the loop ends when X=0 on day T
  Time(k)=T-1;
  Nerrors(k)=DE;
end;

Now we estimate the expected time it takes to detect all the errors by mean(Time), the probability of errors remaining after 21 days by mean(Time>21), and the expected total number of errors by mean(Nerrors). This Monte Carlo study should predict the expected time of about 16.75 days to detect all the errors, the probability 0.22 that errors remain after 21 days, and about 222 errors overall. ♦

5.3.4 Estimating lengths, areas, and volumes

Lengths

A Standard Uniform variable U has density fU(u) = 1 for 0 ≤ u ≤ 1. Hence, U belongs to a set A ⊂ [0, 1] with probability

$$P\{U \in A\} = \int_A 1\,du = \text{length of } A. \tag{5.11}$$

Monte Carlo methods can be used to estimate the probability on the left-hand side. At the same time, we estimate the right-hand side of (5.11), the length of A. Generate a long run of Standard Uniform random variables U1, U2, . . . , Un and estimate the length of A by the proportion of Ui that fall into A.

What if set A does not lie within a unit interval? Well, we can always choose a suitable system of coordinates, with a suitable origin and scale, to make the interval [0, 1] cover the given set, as long as the latter is bounded. Alternatively, we can cover A with some interval [a, b] and generate non-standard Uniform variables on [a, b], in which case the estimated probability P{U ∈ A} should be multiplied by (b − a).
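Both variants can be sketched in a few lines of MATLAB. The set A below, a union of two intervals with total length 0.4, and the covering interval [−1, 2] are hypothetical choices made only for illustration.

```matlab
% Hypothetical set A = [0.2,0.5] U [0.7,0.8] inside [0,1]; its length is 0.4.
inA = @(u) (u >= 0.2 & u <= 0.5) | (u >= 0.7 & u <= 0.8);
N = 100000;                          % illustrative number of simulations
U = rand(N,1);                       % Standard Uniform variables
LengthStd = mean( inA(U) );          % proportion of U_i that fall into A

% The same set covered by a wider interval [a,b] = [-1,2]:
a = -1;  b = 2;
X = a + (b-a)*rand(N,1);             % Uniform(a,b) variables
LengthAB = (b-a) * mean( inA(X) );   % proportion multiplied by (b - a)
```

The second estimate is noisier for the same N, because a smaller share of the generated points falls into A.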

Areas and volumes

Computing lengths rarely represents a serious problem; therefore, one would rarely use Monte Carlo methods for this purpose. However, the situation is different with estimating areas and volumes.

The method described for estimating lengths is directly translated into higher dimensions. Two independent Standard Uniform variables U and V have the joint density fU,V(u, v) = 1 for 0 ≤ u, v ≤ 1; hence,

$$P\{(U, V) \in B\} = \iint_B 1\,du\,dv = \text{area of } B$$

for any two-dimensional set B that lies within a unit square [0, 1] × [0, 1]. Thus, the area of B can be estimated as a long-run frequency of vectors (Ui, Vi) that belong to set B.

Algorithm 5.5 (Estimating areas)

1. Obtain a large even number of independent Standard Uniform variables from a random number generator, call them U1, . . . , Un; V1, . . . , Vn.

2. Count the number of pairs (Ui, Vi) such that the point with coordinates (Ui, Vi) belongs to set B. Call this number NB.

3. Estimate the area of B by NB/n.

Similarly, a long run of Standard Uniform triples (Ui, Vi, Wi) allows us to estimate the volume of any three-dimensional set.
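The three steps of Algorithm 5.5 might look as follows in MATLAB. The set B (a quarter of the unit disk) and the three-dimensional set (one eighth of the unit ball) are illustrative assumptions, chosen because their exact area π/4 and volume π/6 are known.

```matlab
% Illustrative set B (an assumption): the quarter of the unit disk
% u^2 + v^2 <= 1 inside the unit square; its exact area is pi/4.
n = 100000;                          % number of random points
U = rand(n,1);  V = rand(n,1);       % step 1: Standard Uniform coordinates
NB = sum( U.^2 + V.^2 <= 1 );        % step 2: points that belong to B
AreaB = NB/n;                        % step 3: estimated area of B

% Same idea in three dimensions: one eighth of the unit ball, volume pi/6.
W = rand(n,1);
VolumeEst = mean( U.^2 + V.^2 + W.^2 <= 1 );
```

Only the membership test changes from one set to another; the sampling part of the code stays the same.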

Areas of arbitrary regions with unknown boundaries

Notice that in order to estimate lengths, areas, and volumes by Monte Carlo methods, knowing exact boundaries is not necessary. To apply Algorithm 5.5, it is sufficient to determine which points belong to the given set.

Also, the sampling region does not have to be a square. With different scales along the axes, random points may be generated on a rectangle or even a more complicated figure. One way to generate a random point in a region of arbitrary shape is to draw a larger square or rectangle around it and generate uniformly distributed coordinates until the corresponding point belongs to the region. In fact, by estimating the probability for a random point to fall into the area of interest, we estimate the proportion this area makes of the entire sampling region.

FIGURE 5.7: Monte Carlo area estimation. Fifty sites are randomly selected; the marked sites belong to the exposed region, Example 5.17. (Sampling rectangle: 10 miles by 8 miles.)

Example 5.17 (Size of the exposed region). Consider the following situation. An emergency is reported at a nuclear power plant. It is necessary to assess the size of the region exposed to radioactivity. Boundaries of the region cannot be determined; however, the level of radioactivity can be measured at any given location.

Algorithm 5.5 can be applied as follows. A rectangle of 10 by 8 miles is chosen that is likely to cover the exposed area. Pairs of Uniform random numbers (Ui, Vi) are generated, and the level of radioactivity is measured at all the obtained random locations. The area of dangerous exposure is then estimated as the proportion of measurements above the normal level, multiplied by the area of the sampling rectangle.

In Figure 5.7, radioactivity is measured at 50 random sites, and it is found above the normal level at 18 locations. The exposed area is then estimated as

$$\frac{18}{50}\,(80 \text{ sq. miles}) = 28.8 \text{ sq. miles}.$$

♦

Notice that different scales on different axes in Figure 5.7 allowed us to represent a rectangle as a unit square. Alternatively, we could have generated points with Uniform(0,10) x-coordinate and Uniform(0,8) y-coordinate.
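The alternative with Uniform(0,10) and Uniform(0,8) coordinates can be sketched as follows. Since no real measurements are available here, the function isExposed is a purely hypothetical stand-in for the radioactivity measurement: it pretends that the exposed region is a disk of radius 3 miles centered at (5, 4).

```matlab
% Hypothetical stand-in for the radioactivity measurement: the exposed region
% is taken to be a disk of radius 3 miles centered at (5,4), so its true area
% is 9*pi, about 28.27 sq. miles. In a real study, this indicator would be
% replaced by an actual measurement at location (x,y).
isExposed = @(x,y) (x-5).^2 + (y-4).^2 <= 9;

n = 100000;                          % number of random sites (illustrative)
X = 10*rand(n,1);                    % Uniform(0,10) x-coordinate, miles
Y =  8*rand(n,1);                    % Uniform(0,8)  y-coordinate, miles
Proportion = mean( isExposed(X,Y) ); % share of sites in the exposed region
AreaEst = Proportion * (10*8);       % multiply by the area of the rectangle
```

With only 50 sites, as in Figure 5.7, the same calculation would carry a much larger standard deviation.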

5.3.5 Monte Carlo integration

We have seen how Monte Carlo methods can be used to estimate lengths, areas, and volumes. We can extend the method to definite integrals, estimating areas below or above the graphs of corresponding functions. A MATLAB code for estimating an integral

$$I = \int_0^1 g(x)\,dx$$

is

N = 1000;               % Number of simulations
U = rand(N,1);          % (U,V) is a random point
V = rand(N,1);          % in the bounding box
I = mean( V < g(U) )    % Estimator of integral I

Expression V < g(U) returns an N× 1 vector. Each component equals 1 if the inequality holds for the given pair (Ui, Vi), hence this point appears below the graph of g(x), and 0 otherwise. The average of these 0s and 1s, calculated by mean, is the proportion of 1s, and this is the Monte Carlo estimator of integral I.

Remark: This code assumes that the function g is already defined by the user in a file named “g.m.” Otherwise, its full expression should be written in place of g(U).

Remark: We also assumed that 0 ≤ x ≤ 1 and 0 ≤ g(x) ≤ 1. If not, we transform U and V into non-standard Uniform variables X = a + (b − a)U and Y = cV, as in the rejection method; the obtained estimate should then be multiplied by the box area c(b − a).
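The adjustment described in the last remark can be sketched as follows. The integrand g(x) = x² on [0, 2] and the bound c = 4 are illustrative assumptions; g is defined as an anonymous function so that no separate file “g.m” is needed.

```matlab
% Illustrative integrand and box (assumptions): g(x) = x^2 on [a,b] = [0,2],
% bounded above by c = 4; the exact integral is 8/3.
g = @(x) x.^2;
a = 0;  b = 2;  c = 4;
N = 100000;                          % number of simulations
X = a + (b-a)*rand(N,1);             % Uniform(a,b) abscissas
Y = c*rand(N,1);                     % Uniform(0,c) ordinates
Iest = c*(b-a) * mean( Y < g(X) );   % proportion below the graph times box area
```

Any c that bounds g from above on [a, b] will do, but a tighter bound wastes fewer points above the graph.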

Accuracy of results

So far, we estimated lengths, areas, volumes, and integrals by long-run proportions, the method described in Section 5.3.1. As we noted there, our estimates are unbiased, and their standard deviation is

$$\operatorname{Std}(\hat{I}) = \sqrt{\frac{I(1-I)}{N}}, \tag{5.12}$$

where I is the actual quantity of interest.

It turns out that there are Monte Carlo integration methods that can beat this rate. Next, we derive an unbiased area estimator with a lower standard deviation. Also, it will not be restricted to an interval [0, 1] or even [a, b].

Improved Monte Carlo integration method

First, we notice that a definite integral

$$I = \int_a^b g(x)\,dx = \frac{1}{b-a}\int_a^b (b-a)\,g(x)\,dx = E\{(b-a)\,g(X)\}$$

equals the expectation of (b − a)g(X) for a Uniform(a, b) variable X. Hence, instead of using proportions, we can estimate I by averaging (b − a)g(Xi) for some large number of Uniform(a, b) variables X1, . . . , XN.
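A short MATLAB sketch of this averaging estimator follows. The integrand g(x) = x² on [0, 2] is an illustrative assumption; the last line estimates the standard deviation of the result from the same sample.

```matlab
% Illustrative integrand (an assumption): g(x) = x^2 on [a,b] = [0,2],
% whose exact integral is 8/3.
g = @(x) x.^2;
a = 0;  b = 2;
N = 100000;                          % number of simulations
X = a + (b-a)*rand(N,1);             % Uniform(a,b) variables
R = (b-a)*g(X);                      % each R_i has expectation I
Iest = mean(R);                      % long-run average estimates I
StdIest = std(R)/sqrt(N);            % estimated standard deviation of Iest
```

Unlike the proportion-based code, this version needs no upper bound on g and generates only one random number per simulation.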

Furthermore, with a proper adjustment, we can use any continuous distribution in place of Uniform(a, b). To do this, we choose some density f(x) and write I as

$$I = \int_a^b g(x)\,dx = \int_a^b \frac{g(x)}{f(x)}\,f(x)\,dx = E\left(\frac{g(X)}{f(X)}\right),$$

where X has density f(x). Then all we need is to generate a Monte Carlo sequence X1, . . . , XN from this density and compute the average of g(Xi)/f(Xi).

In particular, we are no longer limited to a finite interval [a, b]. For example, by choosing f(x) to be a Standard Normal density, we can perform Monte Carlo integration from a = −∞ to b = +∞:

N = 1000;                           % Number of simulations
Z = randn(N,1);                     % Standard Normal variables
f = 1/sqrt(2*pi) * exp(-Z.^2/2);    % Standard Normal density evaluated at Z
Iest = mean( g(Z)./f )              % Estimator of the integral of g(x) dx
                                    % from -∞ to +∞

Remark: Recall that “dot” operations .^ and ./ stand for pointwise power and division. Without a dot, MATLAB performs standard matrix operations instead.
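As a usage example, the same steps can be applied to a concrete integrand. The choice g(x) = exp(−x²) is an assumption made for illustration; its integral over the whole real line is √π, about 1.7725, which gives a value to compare against.

```matlab
% Illustrative integrand (an assumption): g(x) = exp(-x^2), whose integral
% over the whole real line is sqrt(pi), about 1.7725.
g = @(x) exp(-x.^2);
N = 100000;                          % number of simulations
Z = randn(N,1);                      % Standard Normal variables
f = 1/sqrt(2*pi) * exp(-Z.^2/2);     % Standard Normal density at each Z_i
R = g(Z)./f;                         % ratios g(Z_i)/f(Z_i)
Iest = mean(R);                      % estimator of the integral
StdIest = std(R)/sqrt(N);            % estimated standard deviation of Iest
```

The ratio g/f is bounded here, so the estimator is stable; an integrand with tails heavier than the Normal density would make the ratio, and hence the variance, blow up.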

Accuracy of the improved method

As we already know from (5.9), using long-run averaging returns an unbiased estimator Î, hence E(Î) = I. We also know from (5.10) that

$$\operatorname{Std}(\hat{I}) = \frac{\sigma}{\sqrt{N}},$$

where σ is the standard deviation of a random variable R = g(X)/f(X). Hence, the estimator Î is more reliable if σ is small and N is large. Small σ can be obtained by choosing a density f(x) that is approximately proportional to g(x), making R nearly a constant with Std(R) ≈ 0. However, generating variables from such a density will usually be just as difficult as computing the integral I.

In fact, using a simple Standard Uniform distribution of X, we already obtain a lower standard deviation than in (5.12). Indeed, suppose that 0 ≤ g(x) ≤ 1 for 0 ≤ x ≤ 1. For a Uniform(0, 1) variable X, we have f(X) = 1 so that

$$\sigma^2 = \operatorname{Var} R = \operatorname{Var} g(X) = E g^2(X) - E^2 g(X) = \int_0^1 g^2(x)\,dx - I^2 \le I - I^2,$$

because g² ≤ g for 0 ≤ g ≤ 1. We conclude that for this method,

$$\operatorname{Std}(\hat{I}) \le \sqrt{\frac{I - I^2}{N}} = \sqrt{\frac{I(1-I)}{N}}.$$

Comparing with (5.12), we see that with the same number of simulations N, the latter method gives more accurate results. We can also say that to attain the same desired accuracy, the second method requires fewer simulations.

This can be extended to an arbitrary interval [a, b] and any function g with values in [0, c] (Exercise 5.16).
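A small worked comparison makes the gain concrete. Take, as an illustrative assumption, g(x) = x² on [0, 1], so that I = 1/3. For the proportion-based estimator and for the averaging estimator, respectively,

$$N\operatorname{Var}(\hat{I}_{\text{prop}}) = I(1-I) = \frac{1}{3}\cdot\frac{2}{3} = \frac{2}{9} \approx 0.222, \qquad N\operatorname{Var}(\hat{I}_{\text{avg}}) = \int_0^1 x^4\,dx - I^2 = \frac{1}{5} - \frac{1}{9} = \frac{4}{45} \approx 0.089.$$

The ratio of the two variances is 0.4, so in this example the averaging method reaches the same accuracy with about 40% of the simulations, and with the same N its standard deviation is smaller by the factor √0.4 ≈ 0.63.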

Summary and conclusions

Monte Carlo methods are effectively used for estimating probabilities, expectations, and other distribution characteristics in complex situations when computing these quantities by hand is difficult. According to Monte Carlo methodology, we generate a long sequence of random variables X1, . . . , XN from the distribution of interest and estimate probabilities by long-run proportions, expectations by long-run averages, etc. We extend these methods to the estimation of lengths, areas, volumes, and integrals. Similar techniques are used for forecasting.

All the discussed methods produce unbiased results. Standard deviations of the proposed estimators decrease at the rate of 1/√N. Knowing the standard deviation enables us to assess the accuracy of obtained estimates, and also, to design a Monte Carlo study that attains the desired accuracy with the desired high probability.

In this chapter, we learned the inverse transform method of generating random variables, rejection method, discrete method, and some special methods. Monte Carlo simulation of more advanced models will be considered in Chapters 6 and 7, where we simulate stochastic processes, Markov chains, and queuing systems.

Exercises

5.1. Derive a formula and explain how to generate a random variable with the density

$$f(x) = 1.5\sqrt{x} \quad \text{for } 0 < x < 1$$

if your random number generator produces a Standard Uniform random variable U. Use the inverse transform method. Compute this variable if U = 0.001.

5.2. Let U be a Standard Uniform random variable. Show all the steps required to generate

(a) an Exponential random variable with the parameter λ = 2.5;

(b) a Bernoulli random variable with the probability of success 0.77;

(d) a discrete random variable with the distribution P (x), where P (0) = 0.2, P (2) = 0.4, P (7) = 0.3, P (11) = 0.1;

(e) a continuous random variable with the density f (x) = 3x², 0 < x < 1;

(f) a continuous random variable with the density f (x) = 1.5x², −1 < x < 1;

(g) a continuous random variable with the density f(x) = (1/12) ∛x, 0 ≤ x ≤ 8.

If a computer generates U and the result is U = 0.3972, compute the variables generated

In document Probability and Statistics for Computer Scientists- Baron, Michael (Page 139-154)