
Other (noniterative) methods for generating random variates

In document Bayesian Econometric Methods (Page 181-193)


Table 11.8: Posterior results for θ for various M–H algorithms.

                Posterior Mean   Posterior Variance   Acceptance Probability

True Value           0.00              8.00

Independence Chain M–H Algorithm

d = 1.00             0.16              3.48                  0.65
d = 2.00             0.04              7.15                  0.84
d = 3.00             0.01              7.68                  0.79
d = 6.00            −0.05              8.03                  0.49
d = 20.00           −0.07              7.87                  0.16
d = 100.00           0.12              7.41                  0.03

Random Walk Chain M–H Algorithm

c = 0.10             0.33              1.13                  0.98
c = 0.50             0.20              6.87                  0.91
c = 1.00            −0.08              7.01                  0.83
c = 4.00            −0.03              7.93                  0.52
c = 10.00            0.05              7.62                  0.28
c = 100.00           0.23              6.25                  0.03

11.4 Other (noniterative) methods for generating random variates

In the examples considered in Exercises 11.7–11.12 and 11.15, the posterior simulators only involved sampling from well-known distributions. Routines for drawing from such distributions are found in most, if not all, statistical software packages, as Exercise 11.2 suggests. In some cases, however, a posterior simulator may require the researcher to generate draws from a nonstandard distribution. Although importance sampling (e.g., Exercises 11.4 and 11.5) and the M–H algorithm (e.g., Exercises 11.17 and 11.18) often provide useful solutions in such a situation, simpler or more efficient computational procedures may be available. Moreover, it is also useful to look inside the “black box” of existing routines in software packages to see how draws from standard distributions are often obtained. In the exercises that follow in the remainder of this chapter, we describe several alternative approaches for generating variates from nonuniform distributions. The first of these, discussed in the following two exercises, is the inverse transform method.

Exercise 11.19 (The inverse transform) Suppose that $X$ is a continuous random variable with distribution function $F$ and density $f$. Further, assume that the distribution function $F$ can be easily calculated. Let $U \sim U(0, 1)$, a uniform random variable on the unit interval, and define $Y = F^{-1}(U)$.

Derive the distribution of the random variable $Y$. How can this result help us to generate variates from the density $f$?

Figure 11.4: Graphical depiction of the inverse transform method. (Labels in the original figure: plot of the c.d.f. F(x), with vertical axis from 0 to 1; a hypothetical draw U ~ U(0,1); and the corresponding point F^{-1}(U).)

Solution

We describe two different approaches for deriving the distribution of $Y$. First, we consider the c.d.f. method, and Figure 11.4 helps to illustrate this argument. Note that

$$
\begin{aligned}
\Pr[Y \le a] &= \Pr[F^{-1}(U) \le a] \\
&= \Pr[U \le F(a)] \\
&= F(a),
\end{aligned}
$$

where the last line follows since $U \sim U(0, 1)$, and the intuition behind the second line can be seen from Figure 11.4. This derivation shows that $Y$ has distribution function $F$.

Another approach for establishing this result makes use of a change of variables. First note that $p(u) = I[0 \le u \le 1]$, with $I(\cdot)$ denoting an indicator function, and that $U = F(Y)$. The Jacobian of the transformation from $u$ to $y$ is $|du/dy| = f(y)$.

Thus, $p(y) = f(y) I[0 \le F(y) \le 1] = f(y)$.

This result is extremely useful, because it provides a way to generate draws from $f$. If the c.d.f. $F$ and its inverse $F^{-1}$ are easily calculated, we can first draw $U \sim U(0, 1)$ and then calculate $Y = F^{-1}(U)$. It follows that $Y$ is a draw from $f$. The next exercise provides two applications of this method.

Exercise 11.20 (Applications of the inverse transform: drawing exponential and truncated normal variates)

(a) Consider the exponential density (see Appendix Theorem 2) with density function $p(x \mid \theta) = \theta^{-1} \exp(-x/\theta)$, $x > 0$.

Show how the inverse transform method can be used to generate draws from the exponential density.

(b) Let $x \sim TN_{[a,b]}(\mu, \sigma^2)$ denote that $x$ is a truncated normal random variable. Specifically, this notation means that $x$ is generated from a normal density with mean $\mu$ and variance $\sigma^2$, which is truncated to lie in the interval $[a, b]$. The density function for $x$ in this case is given as

$$
p(x) = \frac{\sigma^{-1} \phi\left(\frac{x-\mu}{\sigma}\right)}{\Phi\left(\frac{b-\mu}{\sigma}\right) - \Phi\left(\frac{a-\mu}{\sigma}\right)} I(a \le x \le b),
$$

where $\phi$ and $\Phi$ denote the standard normal density and distribution function, respectively.

Show how the inverse transform method can be used to generate draws from the truncated normal density.

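Solution

(a) Note that the c.d.f. of the exponential density is $F(x) = 1 - \exp(-x/\theta)$ for $x > 0$. The results of Exercise 11.19 imply that if we solve for $x$ in the equation

$$
u = 1 - \exp\left(-\frac{x}{\theta}\right),
$$

with $u$ denoting a realized draw from a $U(0, 1)$ distribution, then $x$ has the desired exponential density. A little algebra provides

$$
x = -\theta \ln(1 - u)
$$

as the solution.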
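The solution $x = -\theta \ln(1 - u)$ can be sketched in a few lines of Python. This is only an illustration using the standard library, not the code that accompanies the book; the function names `draw_exponential` and `sample_mean` and the seed are assumptions made for the example.

```python
import math
import random


def draw_exponential(theta, rng):
    """Draw from p(x | theta) = exp(-x / theta) / theta by the inverse transform."""
    u = rng.random()                    # u ~ U(0, 1), always in [0, 1)
    return -theta * math.log(1.0 - u)   # x = F^{-1}(u) = -theta * ln(1 - u)


def sample_mean(theta, n, seed=0):
    """Average of n inverse-transform draws (hypothetical helper for checking)."""
    rng = random.Random(seed)           # seeded generator, for reproducibility
    return sum(draw_exponential(theta, rng) for _ in range(n)) / n


if __name__ == "__main__":
    # The exponential density above has mean theta, so this should be near 2.
    print(sample_mean(theta=2.0, n=100_000))
```

Since the density has mean $\theta$, comparing the sample mean of many draws with $\theta$ gives a quick sanity check on the transform.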

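(b) For $a \le x \le b$, the c.d.f. of the truncated normal random variable is

$$
F(x) = \frac{\Phi\left(\frac{x-\mu}{\sigma}\right) - \Phi\left(\frac{a-\mu}{\sigma}\right)}{\Phi\left(\frac{b-\mu}{\sigma}\right) - \Phi\left(\frac{a-\mu}{\sigma}\right)}.
$$

The results of Exercise 11.19 reveal that if $x$ is a solution to the equation

$$
u = \frac{\Phi\left(\frac{x-\mu}{\sigma}\right) - \Phi\left(\frac{a-\mu}{\sigma}\right)}{\Phi\left(\frac{b-\mu}{\sigma}\right) - \Phi\left(\frac{a-\mu}{\sigma}\right)},
$$

where $u$ is a realized draw from a $U(0, 1)$ distribution, then $x \sim TN_{[a,b]}(\mu, \sigma^2)$. It follows that

$$
x = \mu + \sigma \Phi^{-1}\left( \Phi\left(\frac{a-\mu}{\sigma}\right) + u \left[ \Phi\left(\frac{b-\mu}{\sigma}\right) - \Phi\left(\frac{a-\mu}{\sigma}\right) \right] \right).
$$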

Our ability to draw from the truncated normal density will be quite important for the estimation of many models considered in Chapter 14.
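A minimal Python sketch of the truncated normal formula is given below. It relies on `statistics.NormalDist` from the standard library (Python 3.8 or later) for $\Phi$ and $\Phi^{-1}$; the function name `draw_truncated_normal` is an assumption made for this example. The sketch also assumes the interval $[a, b]$ carries non-negligible probability under the untruncated normal, since the formula becomes numerically fragile when both $\Phi$ terms are essentially 0 or 1.

```python
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: .cdf is Phi, .inv_cdf is Phi^{-1}


def draw_truncated_normal(mu, sigma, a, b, rng):
    """Draw x ~ TN_[a,b](mu, sigma^2) by the inverse transform method."""
    phi_a = _STD_NORMAL.cdf((a - mu) / sigma)
    phi_b = _STD_NORMAL.cdf((b - mu) / sigma)
    if phi_b <= phi_a:
        # Assumption of this sketch: the interval has positive normal mass.
        raise ValueError("truncation interval has no numerical mass")
    u = rng.random()                      # u ~ U(0, 1)
    p = phi_a + u * (phi_b - phi_a)       # uniform on [Phi(alpha), Phi(beta))
    return mu + sigma * _STD_NORMAL.inv_cdf(p)


if __name__ == "__main__":
    rng = random.Random(0)
    draws = [draw_truncated_normal(1.0, 1.0, 0.0, 4.0, rng) for _ in range(5)]
    print(draws)  # every value should lie in [0, 4]
```

Unlike acceptance/rejection, every uniform draw yields one truncated normal draw, which is why this approach is attractive whenever $\Phi$ and $\Phi^{-1}$ are available.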

Exercise 11.21 (Acceptance/rejection sampling) Consider the following strategy for drawing from a density $f(x)$ defined over the compact support $a \le x \le b$:

1. Generate two independent uniform random variables $U_1$ and $U_2$ as follows:

$$
U_i \overset{iid}{\sim} U(0, 1), \quad i = 1, 2.
$$

2. Let

$$
M \equiv \max_{a \le x \le b} f(x).
$$

If

$$
M U_2 > f(a + [b - a]U_1),
$$

start over. That is, go back to the first step and generate new values for $U_1$ and $U_2$, and again determine if $M U_2 > f(a + [b - a]U_1)$. When

$$
M U_2 \le f(a + [b - a]U_1),
$$

set

$$
x = a + (b - a)U_1
$$

as a draw from $f(x)$.

(a) What is the probability that any particular iteration in this algorithm will produce a draw that is accepted?

(b) Sketch a proof as to why $x$, when it is accepted, has distribution function $F(x) = \int_a^x f(t)\, dt$.

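Solution

(a) Note that

$$
\begin{aligned}
\Pr(\text{acceptance}) &= \Pr[M U_2 \le f(a + [b-a]U_1)] \\
&= \Pr[U_2 \le M^{-1} f(a + [b-a]U_1)] \\
&= \int_0^1 \Pr[U_2 \le M^{-1} f(a + [b-a]u_1)]\, p(u_1)\, du_1 \\
&= \int_0^1 M^{-1} f(a + [b-a]u_1)\, p(u_1)\, du_1 \\
&= M^{-1} \int_a^b f(t) (b-a)^{-1}\, dt \\
&= [M(b-a)]^{-1}.
\end{aligned}
$$

The third line uses the fact that $U_1$ and $U_2$ are independent, the fourth and fifth lines follow from the fact that $U_i \sim U(0, 1)$, $i = 1, 2$, and the fifth line also applies a change of variable, setting $t = a + (b - a)U_1$. Thus the probability of accepting a candidate draw in the algorithm is $[M(b-a)]^{-1}$. Note that, when using this method to sample from a uniform distribution on $[a, b]$, all candidates from the algorithm are accepted.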

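(b) Consider $\Pr(x \le c \mid x \text{ is accepted})$. We seek to show that this probability equals the c.d.f. value $F(c) = \int_a^c f(t)\, dt$. Using the same steps as in part (a),

$$
\begin{aligned}
\Pr(x \le c \mid x \text{ is accepted}) &= \frac{\Pr(x \le c,\ x \text{ is accepted})}{\Pr(x \text{ is accepted})} \\
&= \frac{M^{-1} \int_a^c f(t) (b-a)^{-1}\, dt}{[M(b-a)]^{-1}} \\
&= \int_a^c f(t)\, dt = F(c).
\end{aligned}
$$

Therefore, a candidate draw that is accepted from the acceptance/rejection method has distribution function $F$, as desired.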
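The two-step algorithm of this exercise can be sketched as follows. This is an illustrative Python version under stated assumptions, not the book's code: the names `accept_reject`, `triangular`, and `sample_triangular` are hypothetical, and the triangular density $p(x) = 1 - |x|$ on $[-1, 1]$ (for which $M = 1$ and $b - a = 2$) is used only as a convenient test case.

```python
import random


def accept_reject(f, a, b, big_m, rng):
    """Return one accepted draw and the number of (U1, U2) pairs it took."""
    pairs = 0
    while True:
        pairs += 1
        u1, u2 = rng.random(), rng.random()   # step 1: independent U(0, 1) draws
        x = a + (b - a) * u1                  # candidate, uniform on [a, b]
        if big_m * u2 <= f(x):                # step 2: accept, otherwise start over
            return x, pairs


def triangular(x):
    """Triangular density p(x) = 1 - |x| on [-1, 1]."""
    return 1.0 - abs(x) if -1.0 <= x <= 1.0 else 0.0


def sample_triangular(n, seed=0):
    """Return n accepted draws and the observed acceptance rate."""
    rng = random.Random(seed)
    draws, total_pairs = [], 0
    for _ in range(n):
        x, pairs = accept_reject(triangular, -1.0, 1.0, 1.0, rng)
        draws.append(x)
        total_pairs += pairs
    return draws, n / total_pairs


if __name__ == "__main__":
    draws, rate = sample_triangular(25_000)
    print(rate)  # should be near 1 / (M * (b - a)) = 0.5
```

The observed acceptance rate gives a direct check on the result of part (a), since it should settle near $[M(b-a)]^{-1}$ as the number of draws grows.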

Exercise 11.22 (Using acceptance/rejection sampling, 1)

(a) Consider the triangular density function, given as

$$
p(x) = 1 - |x|, \quad x \in [-1, 1].
$$

Use the acceptance/rejection sampling method of Exercise 11.21 to generate 25,000 draws from this distribution.

(b) Suppose it is desired to sample a random variable $x$ from a $TN_{[0,4]}(1, 1)$ distribution, that is, a normal density with unit mean and variance that has been truncated to the interval $[0, 4]$. Apply the acceptance/rejection method to generate 25,000 draws from this truncated normal distribution.

(c) Using the 25,000 draws obtained from (a) and (b), estimate the density functions of the accepted draws in each case. Evaluate the performance of the acceptance/rejection method by comparing the density estimates to the actual densities.

Solution

(a–c) The Matlab code used to simulate draws from these distributions is provided on the Web site associated with this book. For (a), note that $M = 1$ and $b - a = 2$, so the overall acceptance rate is one-half. (That is, we would expect that 50,000 pairs of independent uniform variates in the acceptance/rejection algorithm are needed to produce a final sample of 25,000 draws.) For (b), $M \approx .475$ and $b - a = 4$, so the overall acceptance rate is approximately .53.

Graphs of the actual densities (dotted) and kernel density estimates based on the 25,000 draws (solid) are provided in Figure 11.5. For the triangular case, the estimated density is nearly indistinguishable from the actual density. For the truncated normal case, the two graphs are again very similar. The slight discrepancy at the lower limit of zero seems related to the performance of the kernel density estimator at the boundary.
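The constants quoted for part (b) can be checked with a short Python sketch. This is not the Matlab code mentioned above; it is a hedged illustration using `statistics.NormalDist`, and the names `tn_density`, `BIG_M`, `THEORETICAL_RATE`, and `empirical_rate` are assumptions made for the example. It uses the fact that the mode of the $N(1, 1)$ density lies inside $[0, 4]$, so the maximum of the truncated density occurs at $x = 1$.

```python
import random
from statistics import NormalDist

A, B = 0.0, 4.0                                  # truncation interval [a, b]
_NORMAL = NormalDist(mu=1.0, sigma=1.0)          # untruncated N(1, 1)
_MASS = _NORMAL.cdf(B) - _NORMAL.cdf(A)          # probability of [0, 4] under N(1, 1)


def tn_density(x):
    """Density of TN_[0,4](1, 1)."""
    return _NORMAL.pdf(x) / _MASS if A <= x <= B else 0.0


BIG_M = tn_density(1.0)                          # maximum of the density (at the mode)
THEORETICAL_RATE = 1.0 / (BIG_M * (B - A))       # [M (b - a)]^{-1}


def empirical_rate(n_pairs, seed=0):
    """Share of (U1, U2) pairs accepted by the algorithm of Exercise 11.21."""
    rng = random.Random(seed)
    accepted = 0
    for _ in range(n_pairs):
        x = A + (B - A) * rng.random()           # candidate from U(a, b)
        if BIG_M * rng.random() <= tn_density(x):
            accepted += 1
    return accepted / n_pairs


if __name__ == "__main__":
    print(BIG_M, THEORETICAL_RATE, empirical_rate(50_000))
```

The first two printed values should agree with $M \approx .475$ and an acceptance rate of roughly .53, and the simulated rate should be close to the theoretical one.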

Exercise 11.23 (A generalized acceptance/rejection algorithm) Suppose it is of interest to generate draws from a density $f(\theta)$ (henceforth referred to as the target density).

Let $\Theta$ denote the support of $f(\theta)$, and suppose there exists some approximating density $s(\theta)$, called the source density, with support $\Theta^*$, where $\Theta \subseteq \Theta^*$.

In many applications requiring posterior simulation, the normalizing constant of the target density is unknown, since the joint or conditional posteriors are only given up to proportionality. To this end, let us work with the kernels of both the source and target densities and write

$$
f(\theta) = c_f \tilde{f}(\theta), \quad s(\theta) = c_s \tilde{s}(\theta),
$$

so that $\tilde{f}$ and $\tilde{s}$ denote the target and source kernels, respectively, and $c_f$ and $c_s$ denote the associated normalizing constants. Finally, let

$$
\tilde{M} \equiv \sup_{\theta \in \Theta} \left\{ \frac{\tilde{f}(\theta)}{\tilde{s}(\theta)} \right\}.
$$

Figure 11.5: Analytical densities (dotted) and kernel density estimates (solid) obtained using 25,000 draws from the acceptance/rejection sampler: triangular density (left panel, X from −1 to 1) and truncated normal density (right panel, X from 0 to 4).

Consider the following algorithm:

1. Draw $U$ uniformly on $[0, 1]$ [i.e., $U \sim U(0, 1)$].

2. Draw a candidate from the source density $s(\theta)$ [i.e., $\theta_{cand} \sim s(\theta)$].

3. If

$$
U \le \frac{\tilde{f}(\theta_{cand})}{\tilde{M} \tilde{s}(\theta_{cand})},
$$

then set

$$
\theta = \theta_{cand}
$$

as a draw from $f(\theta)$. Otherwise, return to the first step and repeat the process until the condition in step 3 is satisfied.

(a) Show how this algorithm includes the one provided in Exercise 11.21 as a special case.

(b) What is the overall acceptance rate in this algorithm?

(c) Sketch a proof of why this algorithm provides a draw from $f(\theta)$.

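Solution

(a) Consider using this algorithm to generate a draw from $f(\theta)$ with compact support $[a, b]$, as described in Exercise 11.21. In addition, employ a source density $s(\theta)$ that is uniform over the interval $[a, b]$.

In this case we can write

$$
f(\theta) = c_f g(\theta) I(a \le \theta \le b) = c_f \tilde{f}(\theta), \quad s(\theta) = [b - a]^{-1} I(a \le \theta \le b) = c_s \tilde{s}(\theta),
$$

so that $\tilde{s}(\theta) = I(a \le \theta \le b)$, $c_s = [b - a]^{-1}$, and

$$
\tilde{M} = \max_{a \le \theta \le b} \frac{\tilde{f}(\theta)}{\tilde{s}(\theta)} = \max_{a \le \theta \le b} g(\theta) = \frac{M}{c_f},
$$

where $M$ is defined as the maximum of $f$ in Exercise 11.21.

To implement the algorithm with the given uniform source density, we first generate $\theta_{cand} \sim U(a, b)$, which is equivalent to writing $\theta_{cand} = a + (b - a)U_1$, where $U_1 \sim U(0, 1)$.

We then generate $U_2 \sim U(0, 1)$ and accept $\theta_{cand}$ provided

$$
U_2 \le \frac{\tilde{f}(\theta_{cand})}{\tilde{M} \tilde{s}(\theta_{cand})} = \frac{c_f g(\theta_{cand})}{M} = \frac{f(a + [b - a]U_1)}{M},
$$

that is, provided $M U_2 \le f(a + [b - a]U_1)$.

This decision rule and the random variables $U_1$ and $U_2$ are identical to those described in Exercise 11.21. So the algorithm provided in this exercise reduces to that described in Exercise 11.21 when the target density has compact support and a source density that is uniform over that same support is employed.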

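(b) The overall acceptance rate is

$$
\Pr\left( U \le \frac{\tilde{f}(\theta)}{\tilde{M} \tilde{s}(\theta)} \right) = \int_{\Theta} \frac{\tilde{f}(\theta)}{\tilde{M} \tilde{s}(\theta)} s(\theta)\, d\theta = \frac{c_s}{\tilde{M}} \int_{\Theta} \tilde{f}(\theta)\, d\theta = \frac{c_s}{\tilde{M} c_f}.
$$

(c) Following Geweke (2005, Section 4.2.1), we note that for any subset $A$ of $\Theta$,

$$
\begin{aligned}
\Pr(\theta \text{ is accepted},\ \theta \in A) &= \int_A \frac{\tilde{f}(\theta)}{\tilde{M} \tilde{s}(\theta)} s(\theta)\, d\theta \\
&= \frac{c_s}{\tilde{M}} \int_A \tilde{f}(\theta)\, d\theta.
\end{aligned}
$$

Since

$$
\begin{aligned}
\Pr(\theta \in A \mid \theta \text{ is accepted}) &= \frac{\Pr(\theta \text{ is accepted},\ \theta \in A)}{\Pr(\theta \text{ is accepted})} \\
&= \frac{c_s \tilde{M}^{-1} \int_A \tilde{f}(\theta)\, d\theta}{c_s [\tilde{M} c_f]^{-1}} \\
&= c_f \int_A \tilde{f}(\theta)\, d\theta \\
&= \int_A f(\theta)\, d\theta,
\end{aligned}
$$

it follows that when $\theta$ is accepted from the algorithm, it is indeed a draw from $f(\theta)$.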
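The generalized algorithm, which needs only the two kernels and the constant $\tilde{M}$, might be sketched in Python as below. Everything named here is an assumption made for illustration: the function `generalized_accept_reject`, and the test case of a target kernel $\tilde{f}(\theta) = \theta(1 - \theta)$ on $[0, 1]$ (so $c_f = 6$) with a $U(0, 1)$ source (kernel 1, $c_s = 1$), for which $\tilde{M} = 1/4$ and the acceptance rate $c_s / (\tilde{M} c_f)$ from part (b) equals $2/3$. This test case does not appear in the exercise.

```python
import random


def generalized_accept_reject(target_kernel, source_kernel, draw_source, m_tilde, rng):
    """Return one accepted draw and the number of candidates tried."""
    tries = 0
    while True:
        tries += 1
        u = rng.random()                       # step 1: U ~ U(0, 1)
        cand = draw_source(rng)                # step 2: candidate from the source
        # Step 3, written without division: U * M~ * s~(cand) <= f~(cand).
        if u * m_tilde * source_kernel(cand) <= target_kernel(cand):
            return cand, tries


def example_target_kernel(t):
    """Hypothetical target kernel t * (1 - t) on [0, 1]; normalizing constant 6."""
    return t * (1.0 - t) if 0.0 <= t <= 1.0 else 0.0


def simulate(n, seed=0):
    """Return n accepted draws and the observed acceptance rate."""
    rng = random.Random(seed)
    draws, total = [], 0
    for _ in range(n):
        theta, tries = generalized_accept_reject(
            example_target_kernel,
            lambda t: 1.0,                     # kernel of the U(0, 1) source
            lambda r: r.random(),              # draw from the U(0, 1) source
            0.25,                              # M~ = max of t * (1 - t)
            rng,
        )
        draws.append(theta)
        total += tries
    return draws, n / total


if __name__ == "__main__":
    draws, rate = simulate(25_000)
    print(rate)  # should be near c_s / (M~ * c_f) = 2/3
```

Note that neither normalizing constant enters the accept/reject decision itself; $c_f$ and $c_s$ matter only for computing the theoretical acceptance rate.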

Exercise 11.24 (Using acceptance/rejection sampling, 2)

(a) Using the algorithm of Exercise 11.23 and a $U(-1, 1)$ source density, generate 25,000 draws from the triangular distribution $p(\theta) = 1 - |\theta|$, $\theta \in [-1, 1]$. Comment on your results.

(b) Generate an equal number of draws from the triangular distribution using the same algorithm and an $N(0, \sigma^2)$ source density. First, consider a standard normal source with $\sigma^2 = 1$. Then, investigate the performance of the acceptance/rejection method with $\sigma^2 = 2$ and $\sigma^2 = 1/6$. Comment on your results.

Solution

(a) and (b). For the $U(-1, 1)$ source density, note that the solution to part (a) of the previous exercise shows that the algorithm of Exercise 11.23 reduces to the algorithm of Exercise 11.21. Thus, from part (a) of Exercise 11.21, it follows that the overall acceptance rate is .5. We regard this as a benchmark and seek to determine if an alternate choice of source density can lead to increased efficiency.

For the normal source density in part (b), when $\sigma^2 = 1$, the maximum of the target/source ratio occurs at $\theta = 0$, yielding a value of $\tilde{M} = 1$. Since the normalizing constant of the standard normal is $c_s = (2\pi)^{-1/2}$ (and $c_f = 1$ for the triangular density), it follows that the overall acceptance rate is $(2\pi)^{-1/2} \approx .40$. When comparing this algorithm to one with $\sigma^2 = 2$, it is clear that the standard normal source will be preferred. The maximum of the target/source ratio with $\sigma^2 = 2$ again occurs at $\theta = 0$, yielding $\tilde{M} = 1$. However, the overall acceptance probability reduces to $1/\sqrt{4\pi} \approx .28$.

The final choice of $\sigma^2 = 1/6$ is reasonably tailored to fit this application. One can easily show that the mean of the target is zero and the variance is $1/6$, so the $N(0, 1/6)$ source is chosen to match these two moments of the triangular density. With a little algebra, one can show that the maximum of the target/source ratio occurs at

$$
\theta = \frac{1 + \sqrt{1/3}}{2} \approx .789.
$$

(Note that another maximum also occurs at $-.789$ since both the target and source are symmetric about zero.) With this result in hand, the maximized value of the target/source ratio is $\tilde{M} \approx 1.37$, yielding a theoretical acceptance rate of $1/(1.37\sqrt{2\pi[1/6]}) \approx .72$.

Thus, the $N(0, 1/6)$ source density is the most efficient of the candidates considered here.
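The three acceptance rates above can be reproduced numerically. The sketch below is a rough check rather than the book's code: it finds $\tilde{M}$ by a simple grid search over the target's support (an assumption of this example, chosen for transparency over precision) and then applies the rate $c_s / (\tilde{M} c_f)$ with $c_f = 1$. The function names are hypothetical.

```python
import math


def target_kernel(t):
    """Triangular kernel 1 - |t| on [-1, 1]; its normalizing constant is 1."""
    return max(0.0, 1.0 - abs(t))


def source_kernel(t, var):
    """Kernel of the N(0, var) source density."""
    return math.exp(-t * t / (2.0 * var))


def m_tilde(var, grid_points=20001):
    """Grid-search approximation to the supremum of the target/source kernel ratio."""
    best = 0.0
    for i in range(grid_points):
        t = -1.0 + 2.0 * i / (grid_points - 1)   # evenly spaced points on [-1, 1]
        best = max(best, target_kernel(t) / source_kernel(t, var))
    return best


def acceptance_rate(var):
    """Overall acceptance rate c_s / (M~ * c_f) for an N(0, var) source."""
    c_s = 1.0 / math.sqrt(2.0 * math.pi * var)   # normal normalizing constant
    c_f = 1.0
    return c_s / (m_tilde(var) * c_f)


if __name__ == "__main__":
    for var in (1.0, 2.0, 1.0 / 6.0):
        print(var, m_tilde(var), acceptance_rate(var))
```

For $\sigma^2 = 1/6$ the grid search should return a value of $\tilde{M}$ close to 1.37, in line with the analytical maximum at $\theta \approx \pm .789$.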

In Figure 11.6 we plot the alternate source densities, scaled up by $\tilde{M}$, against the triangular density. The efficiency of the algorithm increases with the proximity of the scaled source density to the target. As the figure suggests, the $N(0, 1/6)$ source is most efficient, as its selection results in the fewest discarded draws. Although it might seem that the figure suggests a preference for the $N(0, 1)$ source over the $U(-1, 1)$ alternative, note that this figure is only plotted over the support of the target density. When a candidate outside the $[-1, 1]$ range is drawn from the $N(0, 1)$ source, it must be discarded, resulting in increased inefficiency of the $N(0, 1)$ choice relative to the $U(-1, 1)$ source.

Figure 11.6: Triangular density together with three different scaled source densities. (Horizontal axis: θ, from −1 to 1; vertical axis: density, from 0 to 1.4; curves: triangular density, N(0, 1/6), N(0, 1), and U(−1, 1).)

Exercise 11.25 (The weighted bootstrap) The weighted bootstrap of Smith and Gelfand (1992) and the highly related sampling-importance resampling (SIR) algorithm of Rubin (1987, 1988) circumvent the need to calculate the “blanketing constant” $\tilde{M}$ in acceptance sampling.

Consider the following procedure for obtaining a draw from a density of interest $f$:

1. Draw $\theta_1, \theta_2, \ldots, \theta_n$ from some approximating source density $s(\theta)$.

2. As in Exercise 11.23, let us work with the kernels of the target and source densities and thus define

$$
f(\theta) = c_f \tilde{f}(\theta), \quad s(\theta) = c_s \tilde{s}(\theta).
$$

Set

$$
w_i = w(\theta_i) = \frac{\tilde{f}(\theta_i)}{\tilde{s}(\theta_i)}
$$

and define the normalized weights

$$
\tilde{w}_i = \frac{w_i}{\sum_{i=1}^n w_i}.
$$

3. Draw $\theta^*$ from the discrete set $\{\theta_1, \theta_2, \ldots, \theta_n\}$ with $\Pr(\theta^* = \theta_j) = \tilde{w}_j$, $j = 1, 2, \ldots, n$.

Show that $\theta^*$ provides an approximate draw from $f(\theta)$, with the accuracy of the approach improving with $n$, the simulated sample size from the source density.

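Solution

Note that

$$
\begin{aligned}
\Pr(\theta^* \le c) &= \sum_{j=1}^n \tilde{w}_j I(\theta_j \le c) \\
&= \frac{\sum_{j=1}^n w_j I(\theta_j \le c)}{\sum_{j=1}^n w_j} \\
&= \frac{[1/n] \sum_{j=1}^n w_j I(\theta_j \le c)}{[1/n] \sum_{j=1}^n w_j} \\
&\to \frac{E[w(\theta) I(\theta \le c)]}{E[w(\theta)]} \\
&= \frac{\int_{-\infty}^{\infty} w(\theta) I(\theta \le c) s(\theta)\, d\theta}{\int_{-\infty}^{\infty} w(\theta) s(\theta)\, d\theta} \\
&= \frac{\int_{-\infty}^{c} [\tilde{f}(\theta)/\tilde{s}(\theta)] s(\theta)\, d\theta}{\int_{-\infty}^{\infty} [\tilde{f}(\theta)/\tilde{s}(\theta)] s(\theta)\, d\theta} \\
&= \frac{\int_{-\infty}^{c} \tilde{f}(\theta)\, d\theta}{\int_{-\infty}^{\infty} \tilde{f}(\theta)\, d\theta} \\
&= \int_{-\infty}^{c} f(\theta)\, d\theta \equiv F(c).
\end{aligned}
$$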

In the fourth line, we consider the limit as the simulated sample size $n$ approaches infinity.

To proceed from line 6 to line 7, we note that $s(\theta)/\tilde{s}(\theta) = c_s$, and similarly note that $f(\theta)/\tilde{f}(\theta) = c_f$ when moving from line 7 to line 8.

Again, this algorithm only provides an approximate draw from $f$, with its accuracy increasing with $n$. Of course, the performance of the algorithm depends on how accurately $s$ approximates $f$. If, for example, the mass under $s$ is concentrated in a region where $f$ places little mass, then the algorithm will typically perform poorly.
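A compact Python sketch of the three-step procedure follows. It is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, and the test case (triangular target kernel $1 - |\theta|$ with an $N(0, 1/6)$ source) is borrowed from Exercise 11.24 purely for convenience. The sketch draws several values in step 3 rather than one, and relies on `random.choices`, which normalizes the weights internally, so the $\tilde{w}_i$ are never formed explicitly.

```python
import math
import random


def weighted_bootstrap(target_kernel, source_kernel, draw_source, n, m, rng):
    """Return m approximate draws from the target, resampled from n source draws."""
    thetas = [draw_source(rng) for _ in range(n)]                         # step 1
    weights = [target_kernel(t) / source_kernel(t) for t in thetas]       # step 2
    # Step 3: sample with replacement with probabilities w_i / sum(w).
    return rng.choices(thetas, weights=weights, k=m)


def triangular_example(n, m, seed=0):
    """Hypothetical test case: triangular target, N(0, 1/6) source."""
    rng = random.Random(seed)
    sigma = math.sqrt(1.0 / 6.0)
    return weighted_bootstrap(
        lambda t: max(0.0, 1.0 - abs(t)),          # target kernel
        lambda t: math.exp(-3.0 * t * t),          # source kernel exp(-t^2 / (2/6))
        lambda r: r.gauss(0.0, sigma),             # draw from the source
        n,
        m,
        rng,
    )


if __name__ == "__main__":
    draws = triangular_example(n=50_000, m=25_000)
    mean = sum(draws) / len(draws)
    var = sum((x - mean) ** 2 for x in draws) / len(draws)
    print(mean, var)  # should be near 0 and 1/6 for large n
```

Source draws falling outside $[-1, 1]$ receive zero weight and are never resampled, which mirrors the role the support restriction plays in acceptance sampling.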
