Implementing an Adaptive-Rejection Sampler

2.6 Simulations

2.6.3 Implementing an Adaptive-Rejection Sampler

To compare the computational time and resources of the methods fairly, a level playing field is necessary. This need for an impartial comparison motivates us to build an adaptive-rejection sampler for our problem. In Section 2.3 we established a Gibbs sampler that meets the MCMC requirements of the problem at hand. The conditional distributions of τ | β, u (Equation 2.5) and β | τ, u (Equation 2.6) are well-known distributions that are easy to sample from. However, the conditional distribution of u | β, τ (Equation 2.3) is not a recognizable distribution, so we use an adaptive-rejection sampler to draw from it.

We followed the work of Gilks and Wild [7] to implement a method for sampling from this distribution. Because the distribution in Equation 2.3 is log-concave, the implementation is fairly simple. Given a set of starting points, we evaluate the log density (hereafter called simply the density) and its derivative at those points. We use the derivatives to build an upper boundary on the density by connecting the tangent lines at the points. Where the tangent lines at two adjacent points intersect, we place a node. The nodes partition the domain of the density function into regions. We then build a lower boundary on the density function by drawing chords between adjacent original points (not the nodes). The density function is now sandwiched between the upper and lower bounds. Beyond the first and last points, the lower bound is −∞.
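A minimal Python sketch of the two bounds follows. It assumes sorted points `xs`, log-density values `hs`, and derivatives `dhs` at those points. All function and argument names are hypothetical, and the thesis implementation itself was written in R. The node between adjacent points x_j and x_{j+1} is where their tangents meet: z_j = (h(x_{j+1}) − h(x_j) − x_{j+1}h′(x_{j+1}) + x_j h′(x_j)) / (h′(x_j) − h′(x_{j+1})).

```python
import math
from bisect import bisect_right

def tangent_nodes(xs, hs, dhs):
    """Nodes z_j where the tangents at adjacent points xs[j] and xs[j+1] intersect.
    Assumes strict log-concavity (strictly decreasing derivatives), so no denominator is zero."""
    return [(hs[j + 1] - hs[j] - xs[j + 1] * dhs[j + 1] + xs[j] * dhs[j])
            / (dhs[j] - dhs[j + 1]) for j in range(len(xs) - 1)]

def upper_bound(x, xs, hs, dhs, zs):
    """Upper hull: the tangent line of the point whose region (between nodes) contains x."""
    j = bisect_right(zs, x)          # number of nodes <= x selects the region
    return hs[j] + (x - xs[j]) * dhs[j]

def lower_bound(x, xs, hs):
    """Lower hull: chord between adjacent original points; -inf outside [xs[0], xs[-1]]."""
    if x < xs[0] or x > xs[-1]:
        return -math.inf
    j = min(bisect_right(xs, x) - 1, len(xs) - 2)
    return ((xs[j + 1] - x) * hs[j] + (x - xs[j]) * hs[j + 1]) / (xs[j + 1] - xs[j])
```

For a standard normal log density h(x) = −x²/2 with points −1, 0, 1, the nodes fall at −0.5 and 0.5, and every x satisfies lower_bound ≤ h(x) ≤ upper_bound.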

Since the upper boundary completely covers the density function and is divided into regions, we can sample from the upper boundary instead. To do this, integrate the exponential of the upper bound over each region. These integrals become the weights for choosing the region to draw from. Typically this is done by sampling from a multinomial distribution whose weights are the region integrals, normalized to sum to one. Once a region is selected, the next step is to find a proposal point, the candidate value that will form the next step in the Markov chain. Because the upper bound is linear within a region, the proposal is drawn from the density proportional to the exponential of the upper bound restricted to that region (a truncated exponential), for example by inverting its cumulative distribution function. At the proposal value, calculate the density, the value of the upper bound, and, if it exists, the lower bound. If the lower bound does not exist, set it equal to −∞.

Next, draw an independent sample w from a Unif(0, 1). The sample w is used in what Gilks and Wild called the squeeze test. The squeeze test measures how close the upper and lower bounds are to each other at the proposal. Since the bounds sandwich the density, the proposal point can be accepted as a sample from the density if the distance between them is small enough. Because the bounds are on the log density, the test is simply: accept the proposal value if w ≤ exp(l − u), where l and u are the lower and upper bounds, respectively, evaluated at the proposal value (here u denotes the upper bound, not the random effect). If, however, the squeeze test fails, there is a second test, the rejection test. The rejection test is similar to the squeeze test, but it measures how close the upper bound is to the density itself. That is, if w ≤ exp(g − u), where g is the log density evaluated at the proposal point, then accept the proposal point as a sample from the density.

[Figure: “Example of Adaptive-Rejection Sampling”, horizontal axis u35, panels (Step 1) to (Step 4), with Point 1, Point 2, Point 3 and the First, Second, and Third Regions labeled]

Figure 2.9: An example of how to implement the ARS using observation 35 from the HDP data. (Step 1) At the “points”, evaluate the density function and calculate its derivative. (Step 2) Connect the tangent lines. At their intersections, place nodes to divide up the domain. These lines (short-dash lines) form an upper boundary on the density function. (Step 3) Connect the density function at adjacent points, not the nodes. These lines (long-dash lines) form a lower bound on the density function between the points. (Step 4) Sample from the domain by first selecting a region, with probability proportional to the integral of the exponentiated upper boundary over that region. Then draw a value within the region to be the proposal point (the dot-dash line). Calculate the upper bound and the density at the proposal point. Next, calculate the lower bound at the proposal point. If the proposal point lies outside the original points, the lower bound there is −∞. If it lies between the original points, the lower bound is on the chord between two adjacent points. (Step 5) (Not shown) Carry out the squeeze test and, if necessary, the rejection test to see whether the proposal value is acceptable. If it is not accepted, treat the proposal point as an additional point (not a node) and repeat the steps until an acceptable point is found.
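Region selection, the within-region draw, and the two sequential tests can be sketched in Python as follows. This is a sketch under assumptions: the function names and arguments are hypothetical, the nodes `zs` are taken as given, and, following Gilks and Wild, the density is passed as a callable and evaluated only when the squeeze test fails. A production version would also compute the weights on the log scale to avoid overflow.

```python
import math
import random

def region_weight(h_j, d_j, x_j, lo, hi):
    """Integral of exp(h_j + d_j * (x - x_j)) over the region [lo, hi]."""
    if d_j == 0.0:
        return math.exp(h_j) * (hi - lo)
    return (math.exp(h_j + d_j * (hi - x_j)) - math.exp(h_j + d_j * (lo - x_j))) / d_j

def draw_in_region(d_j, lo, hi, u):
    """Inverse CDF of the density proportional to exp(d_j * x) on [lo, hi], for u in (0, 1)."""
    if d_j > 0:   # anchor at hi so the exponent stays <= 0 (lo may be -inf)
        return hi + math.log(u + (1 - u) * math.exp(d_j * (lo - hi))) / d_j
    if d_j < 0:   # anchor at lo so the exponent stays <= 0 (hi may be +inf)
        return lo + math.log((1 - u) + u * math.exp(d_j * (hi - lo))) / d_j
    return lo + u * (hi - lo)

def sample_upper_hull(xs, hs, dhs, zs, rng):
    """Choose a region with probability proportional to its weight, then draw within it.
    Returns (proposal, region index)."""
    bounds = [-math.inf] + list(zs) + [math.inf]
    weights = [region_weight(hs[j], dhs[j], xs[j], bounds[j], bounds[j + 1])
               for j in range(len(xs))]
    j = rng.choices(range(len(xs)), weights=weights)[0]
    u = rng.random()
    while u == 0.0:               # the inverse CDF needs u strictly inside (0, 1)
        u = rng.random()
    return draw_in_region(dhs[j], bounds[j], bounds[j + 1], u), j

def accept(w, lower, upper, log_density):
    """Squeeze test, then rejection test. Returns (accepted, squeeze_failed)."""
    if w <= math.exp(lower - upper):          # squeeze test: bounds only
        return True, False
    return w <= math.exp(log_density() - upper), True   # rejection test
```

For the standard normal hull with points −1, 0, 1 (nodes ±0.5), each of the three regions has weight exactly 1. When the squeeze test passes, `log_density` is never called.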

The tests are applied in sequence, rather than as a single test, for two reasons. The squeeze test is inexpensive because it needs only the upper and lower bounds. In addition, a failed squeeze test shows where the bounds need refinement, which makes additional samples from the density easier to obtain. The rejection test serves as a fallback when the squeeze test fails. If the density is close enough to the upper boundary, the proposal point is a valid sample, but there is still room for refinement. In either case, if the squeeze test fails, the proposal point becomes a new point and the equation of its tangent line is formed. New nodes are determined and the domain of the density function is further subdivided. Refining the bounds in this way makes it easier to obtain additional samples from the density. A graphical summary of adaptive-rejection sampling for physician 35 in the HDP data, with known values of β and τ, is given in Figure 2.9.
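Putting the pieces together, one adaptive-rejection draw can be sketched in Python. This is a minimal sketch, not the thesis implementation, and `ars_draw` and its helpers are hypothetical names. It assumes a concave, everywhere-finite log density `h` with derivative `dh`, and starting points whose first derivative is positive and last is negative. Unlike the recursive R implementation described in this section, it uses a plain loop. Each failed squeeze test adds the proposal as a new point, which tightens the bounds for the next attempt.

```python
import math
import random
from bisect import bisect_right, insort

def _weight(hj, dj, xj, lo, hi):
    # Integral of exp(hj + dj * (x - xj)) over [lo, hi].
    if dj == 0.0:
        return math.exp(hj) * (hi - lo)
    return (math.exp(hj + dj * (hi - xj)) - math.exp(hj + dj * (lo - xj))) / dj

def _invert(dj, lo, hi, u):
    # Inverse CDF of the density proportional to exp(dj * x) on [lo, hi].
    if dj > 0:
        return hi + math.log(u + (1 - u) * math.exp(dj * (lo - hi))) / dj
    if dj < 0:
        return lo + math.log((1 - u) + u * math.exp(dj * (hi - lo))) / dj
    return lo + u * (hi - lo)

def ars_draw(h, dh, start, rng, max_iter=1000):
    """One draw from the density proportional to exp(h). Returns (draw, current points)."""
    pts = sorted((x, h(x), dh(x)) for x in start)
    for _ in range(max_iter):
        x, hv, dv = (list(col) for col in zip(*pts))
        # Nodes: intersections of the tangents at adjacent points.
        z = [(hv[j + 1] - hv[j] - x[j + 1] * dv[j + 1] + x[j] * dv[j]) / (dv[j] - dv[j + 1])
             for j in range(len(x) - 1)]
        b = [-math.inf] + z + [math.inf]
        w = [_weight(hv[j], dv[j], x[j], b[j], b[j + 1]) for j in range(len(x))]
        j = rng.choices(range(len(x)), weights=w)[0]      # pick a region
        u = rng.random()
        while u == 0.0:
            u = rng.random()
        xp = _invert(dv[j], b[j], b[j + 1], u)            # proposal within the region
        upper = hv[j] + (xp - x[j]) * dv[j]
        if x[0] <= xp <= x[-1]:                           # chord lower bound
            k = min(bisect_right(x, xp) - 1, len(x) - 2)
            lower = ((x[k + 1] - xp) * hv[k] + (xp - x[k]) * hv[k + 1]) / (x[k + 1] - x[k])
        else:
            lower = -math.inf
        wu = rng.random()
        if wu <= math.exp(lower - upper):                 # squeeze test
            return xp, x
        g = h(xp)                                         # squeeze failed: evaluate density
        if xp not in x:
            insort(pts, (xp, g, dh(xp)))                  # refine: proposal becomes a point
        if wu <= math.exp(g - upper):                     # rejection test
            return xp, [p[0] for p in pts]
    raise RuntimeError("no proposal accepted within max_iter attempts")
```

For example, `ars_draw(lambda x: -0.5 * x * x, lambda x: -x, [-1.0, 1.0], random.Random(42))` returns one standard normal draw together with the points it ended with.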

The advantage of the adaptive-rejection sampler is that it always returns a draw from the appropriate density. However, there are a few caveats when implementing the sampler, which arise because the algorithm repeats until a proposal is accepted:

(1) The number and position of the starting points matter. For the algorithm to work, the first initial point must have a positive derivative and the last initial point must have a negative derivative. Otherwise the exponentiated upper boundary has infinite area in an outermost region and cannot be normalized. To ensure this, we calculate the derivative at each point when initializing the algorithm. If either of these derivatives is invalid, we use the concavity of the density function to locate the maximum and then redistribute the initial points to ensure compliance. This is easily done by placing the first initial point below the maximum and the last initial point above it.

(2) The starting points must be sufficiently far apart. This resembles the derivative requirement in (1), but differs slightly: the starting values must be far enough apart that the bounds sandwich the density well. In fact, Gilks and Wild recommend placing the starting points at the 0.1 and 0.9 quantiles of the distribution.

(3) Because we use the adaptive-rejection sampling algorithm to sample from u_i | β, τ, X_i, y_i, each iteration is unique. The iterates of β and τ differ from one iteration to the next, so the density of u_i also differs at each iteration. This complicates our choice of initialization points. To tackle issues (2) and (3), we use the 0.1 and 0.9 quantiles of the upper boundary to initialize the sampling for the next iteration, even though the density is not the same from iteration to iteration.

(4) We implemented the adaptive-rejection algorithm using recursive functions in R. When the squeeze test fails, a new point is added to the domain and the sampling process begins anew, which is why we used recursion. However, deep recursion can consume all of the available computational resources, run out of memory, and return an error. In our implementation, when this happens (very rarely), we simply return the sample from the previous iteration. For future work we would recommend alternative recursion techniques such as tail calls and trampolines.
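Caveat (1) can be sketched in Python as a check on the starting points, with a fallback that locates the maximum when the check fails. Because the log density is concave, its derivative is decreasing, so bisection on the derivative finds the mode. The names `find_mode` and `check_starting_points` are hypothetical. The bracket-doubling search and the `spread` placement on either side of the mode are assumptions, not the thesis's exact redistribution rule.

```python
def find_mode(dh, x0=0.0, step=1.0, tol=1e-10, max_doublings=60):
    """Maximum of a concave log density via bisection on its decreasing derivative dh."""
    lo, s = x0, step
    for _ in range(max_doublings):        # move left until the derivative is positive
        if dh(lo) > 0:
            break
        lo, s = lo - s, 2 * s
    else:
        raise ValueError("could not bracket the mode on the left")
    hi, s = x0, step
    for _ in range(max_doublings):        # move right until the derivative is negative
        if dh(hi) < 0:
            break
        hi, s = hi + s, 2 * s
    else:
        raise ValueError("could not bracket the mode on the right")
    while hi - lo > tol:                  # dh(lo) > 0 > dh(hi) throughout
        mid = 0.5 * (lo + hi)
        if dh(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def check_starting_points(dh, xs, spread=1.0):
    """Keep xs if dh(first) > 0 > dh(last); otherwise place points on both sides of the mode."""
    xs = sorted(xs)
    if dh(xs[0]) > 0 and dh(xs[-1]) < 0:
        return xs
    m = find_mode(dh, x0=xs[len(xs) // 2])
    return [m - spread, m + spread]
```

For a N(5, 1) log density (derivative −(x − 5)), the points −2, −1, 0 all have positive derivatives and fail the check. They are replaced by points near 4 and 6, on either side of the mode.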
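Items (2) and (3) initialize each iteration at the 0.1 and 0.9 quantiles of the upper boundary. This Python sketch reads that as the quantiles of the density proportional to the exponentiated upper hull, which is an assumption. The quantile is found by accumulating the region weights and inverting the truncated exponential within the region that holds the target probability. The name `hull_quantile` is hypothetical, and `p` must lie strictly between 0 and 1.

```python
import math

def hull_quantile(p, xs, hs, dhs):
    """p-quantile of the density proportional to exp(upper hull) built at points xs."""
    zs = [(hs[j + 1] - hs[j] - xs[j + 1] * dhs[j + 1] + xs[j] * dhs[j]) / (dhs[j] - dhs[j + 1])
          for j in range(len(xs) - 1)]
    b = [-math.inf] + zs + [math.inf]

    def weight(j):
        # Integral of exp(hs[j] + dhs[j] * (x - xs[j])) over region j.
        d, lo, hi = dhs[j], b[j], b[j + 1]
        if d == 0.0:
            return math.exp(hs[j]) * (hi - lo)
        return (math.exp(hs[j] + d * (hi - xs[j])) - math.exp(hs[j] + d * (lo - xs[j]))) / d

    w = [weight(j) for j in range(len(xs))]
    target, cum = p * sum(w), 0.0
    for j, wj in enumerate(w):
        if cum + wj >= target:
            frac = (target - cum) / wj     # share of region j's mass below the quantile
            d, lo, hi = dhs[j], b[j], b[j + 1]
            if d > 0:
                return hi + math.log(frac + (1 - frac) * math.exp(d * (lo - hi))) / d
            if d < 0:
                return lo + math.log((1 - frac) + frac * math.exp(d * (hi - lo))) / d
            return lo + frac * (hi - lo)
        cum += wj
    raise ValueError("p must lie strictly between 0 and 1")
```

For the standard normal hull with points −1, 0, 1, the 0.1 and 0.9 quantiles of the envelope are −0.5 + log 0.3 ≈ −1.70 and 0.5 − log 0.3 ≈ 1.70. These could then serve as the starting points for the next Gibbs iteration.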
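Item (4) mentions trampolines. A trampoline has the recursive step return a description of its next call instead of making it. A loop then performs each call, so the stack depth stays constant however many times the squeeze test fails. The Python sketch below is illustrative only. `Bounce`, `trampoline`, and `refine_until_accepted` are hypothetical names, and `refine_until_accepted` is a toy stand-in for one ARS attempt that "accepts" once enough points have been added. The optional `fallback` mirrors the thesis's practice of returning the previous iteration's sample.

```python
class Bounce:
    """A deferred call: the trampoline invokes it instead of growing the call stack."""
    def __init__(self, fn, *args):
        self.fn, self.args = fn, args

def trampoline(fn, *args, max_bounces=None, fallback=None):
    """Run fn, then keep executing returned Bounces in a loop (constant stack depth).
    If max_bounces is exceeded, return fallback (e.g. the previous Gibbs iterate)."""
    result, bounces = fn(*args), 0
    while isinstance(result, Bounce):
        bounces += 1
        if max_bounces is not None and bounces > max_bounces:
            return fallback
        result = result.fn(*result.args)
    return result

def refine_until_accepted(n_points, accept_at):
    """Toy stand-in for an ARS attempt: 'accepts' once n_points reaches accept_at.
    A recursive version would call itself here; this version returns a Bounce."""
    if n_points >= accept_at:
        return n_points
    return Bounce(refine_until_accepted, n_points + 1, accept_at)
```

`trampoline(refine_until_accepted, 3, 100_000)` returns 100000, whereas a directly recursive version would exceed Python's default recursion limit. A plain `while` loop is an equally valid alternative when the retry logic is simple.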

In document Asymptotic posterior approximation and efficient MCMC sampling for Generalized Linear Mixed Models (Page 55-59)