The Metropolis-Hastings Algorithm - Bayesian Methodology

1.5 Bayesian Methodology

1.5.3 The Metropolis-Hastings Algorithm

The Metropolis-Hastings algorithm provides a method for drawing samples from a given posterior distribution. The algorithm constructs a Markov chain whose stationary distribution is the required posterior. Values are drawn from a sensibly chosen proposal distribution and “corrected” so that, asymptotically, they behave as random observations from the posterior distribution. Like rejection sampling, the algorithm uses an acceptance/rejection rule, applied at every iteration, so that the chain converges to the posterior. Once the chain has converged, the simulated values can be treated as a sample from the posterior distribution and used to estimate the marginal posterior distributions of the model parameters and posterior summary statistics of interest.

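To initialise the Markov chain, the Metropolis-Hastings algorithm begins by setting starting values for the model parameters, denoted by $\boldsymbol{\theta}^1$, chosen arbitrarily. At iteration $h$ of the algorithm, where the current state of the chain is denoted by $\boldsymbol{\theta}^h$, the model parameters are updated using the following steps:

1. Sample a proposed value for $\boldsymbol{\theta}$, denoted by $\boldsymbol{\theta}^*$, from the proposal distribution $q(\boldsymbol{\theta}^* \mid \boldsymbol{\theta}^h)$.

2. Calculate the acceptance probability $\alpha(\boldsymbol{\theta}^h, \boldsymbol{\theta}^*)$, where

$$\alpha(\boldsymbol{\theta}^h, \boldsymbol{\theta}^*) = \min\left(1, \frac{\pi(\boldsymbol{\theta}^* \mid \boldsymbol{x})\, q(\boldsymbol{\theta}^h \mid \boldsymbol{\theta}^*)}{\pi(\boldsymbol{\theta}^h \mid \boldsymbol{x})\, q(\boldsymbol{\theta}^* \mid \boldsymbol{\theta}^h)}\right).$$

3. With probability $\alpha(\boldsymbol{\theta}^h, \boldsymbol{\theta}^*)$ accept $\boldsymbol{\theta}^*$ and set $\boldsymbol{\theta}^{h+1} = \boldsymbol{\theta}^*$; otherwise reject and set $\boldsymbol{\theta}^{h+1} = \boldsymbol{\theta}^h$.

In practice, to conduct this step, we draw a random variable $u$ from the Uniform[0,1] distribution and accept the move if $u \le \alpha(\boldsymbol{\theta}^h, \boldsymbol{\theta}^*)$.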
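The three steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the thesis: the function names, the Gamma(3, 2) target and the log-normal random walk proposal are all hypothetical choices made for the example. The proposal is deliberately asymmetric so that the $q$ terms in the acceptance probability do not cancel, and everything is computed on the log scale for numerical stability.

```python
import math
import random


def metropolis_hastings(log_post, propose, log_q, theta_init, n_iter, rng):
    """Run n_iter Metropolis-Hastings iterations for a single parameter.

    log_post(theta)     -> log posterior density, up to an additive constant
    propose(theta, rng) -> one draw theta_star from q(. | theta)
    log_q(to, given)    -> log q(to | given), up to an additive constant
    Returns the chain (including the starting value) and the acceptance rate.
    """
    theta = theta_init
    chain = [theta]
    n_accept = 0
    for _ in range(n_iter):
        # Step 1: sample a proposed value.
        theta_star = propose(theta, rng)
        # Step 2: log of the ratio inside min(1, .) in the acceptance probability.
        log_ratio = (log_post(theta_star) + log_q(theta, theta_star)
                     - log_post(theta) - log_q(theta_star, theta))
        # Step 3: accept with probability min(1, ratio). A strict "<" is used
        # instead of "<="; the two differ only on an event of probability zero.
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            theta = theta_star
            n_accept += 1
        chain.append(theta)
    return chain, n_accept / n_iter


# Hypothetical target: a Gamma(shape=3, rate=2) density known up to a constant.
def log_gamma_target(theta):
    return 2.0 * math.log(theta) - 2.0 * theta


STEP = 0.8  # proposal standard deviation on the log scale (arbitrary choice)


def propose_lognormal(theta, rng):
    # Multiplicative random walk: proposals stay positive but q is not symmetric.
    return theta * math.exp(rng.gauss(0.0, STEP))


def log_q_lognormal(to, given):
    # Log-normal proposal density, dropping constants that cancel in the ratio.
    z = (math.log(to) - math.log(given)) / STEP
    return -math.log(to) - 0.5 * z * z


chain, acc_rate = metropolis_hastings(
    log_gamma_target, propose_lognormal, log_q_lognormal,
    theta_init=1.0, n_iter=20_000, rng=random.Random(1))
kept = chain[2_000:]  # discard an initial stretch of the chain
print("estimated mean:", sum(kept) / len(kept), "acceptance rate:", acc_rate)
```

The exact mean of this illustrative target is 3/2, so the printed estimate gives a quick sanity check on the sampler.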

Importantly, the Metropolis-Hastings algorithm only requires the posterior distribution to be known up to proportionality, as the constants of proportionality cancel out in the expression for the acceptance probability, $\alpha(\boldsymbol{\theta}^h, \boldsymbol{\theta}^*)$.
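To make the cancellation explicit, the following short derivation writes the posterior via Bayes' theorem. The symbols $f$ (likelihood), $p$ (prior) and $m$ (normalising constant) are notation assumed here for illustration and are not taken from the text above.

```latex
\pi(\boldsymbol{\theta} \mid \boldsymbol{x})
  = \frac{f(\boldsymbol{x} \mid \boldsymbol{\theta})\, p(\boldsymbol{\theta})}{m(\boldsymbol{x})},
\qquad
m(\boldsymbol{x}) = \int f(\boldsymbol{x} \mid \boldsymbol{\theta})\, p(\boldsymbol{\theta})\, d\boldsymbol{\theta},

\frac{\pi(\boldsymbol{\theta}^* \mid \boldsymbol{x})}{\pi(\boldsymbol{\theta}^h \mid \boldsymbol{x})}
  = \frac{f(\boldsymbol{x} \mid \boldsymbol{\theta}^*)\, p(\boldsymbol{\theta}^*) / m(\boldsymbol{x})}
         {f(\boldsymbol{x} \mid \boldsymbol{\theta}^h)\, p(\boldsymbol{\theta}^h) / m(\boldsymbol{x})}
  = \frac{f(\boldsymbol{x} \mid \boldsymbol{\theta}^*)\, p(\boldsymbol{\theta}^*)}
         {f(\boldsymbol{x} \mid \boldsymbol{\theta}^h)\, p(\boldsymbol{\theta}^h)}.
```

Since $m(\boldsymbol{x})$ does not depend on $\boldsymbol{\theta}$, it never needs to be evaluated; only the product of likelihood and prior is required.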

Single Update Metropolis-Hastings Algorithm

Not all model parameters need be updated simultaneously in the Metropolis-Hastings algorithm; rather, parameters can be updated singly, one at a time, using the single update Metropolis-Hastings algorithm. Here each iteration of the Metropolis-Hastings algorithm consists of an updating step for every individual parameter, each analogous to the updating procedure described above. For example, say $\boldsymbol{\theta}$ is $d$-dimensional. At iteration $h$ suppose the current values are $\boldsymbol{\theta}^h = \{\theta_1^h, \ldots, \theta_d^h\}$. To update the first parameter, denoted by $\theta_1$, the algorithm proceeds as follows:

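1. Propose a value for $\theta_1$ from the proposal distribution $q_1(\theta_1^* \mid \theta_1^h)$, where $\theta_1^*$ and $\theta_1^h$ denote the proposed value and the current value respectively.

2. Let $\boldsymbol{\theta}_1^* = \{\theta_1^*, \theta_2^h, \ldots, \theta_d^h\}$ and $\boldsymbol{\theta}_1^h = \{\theta_1^h, \ldots, \theta_d^h\}$.

3. Calculate the acceptance probability $\alpha_1(\theta_1^h, \theta_1^*)$, where

$$\alpha_1(\theta_1^h, \theta_1^*) = \min\left(1, \frac{\pi(\boldsymbol{\theta}_1^* \mid \boldsymbol{x})\, q_1(\theta_1^h \mid \theta_1^*)}{\pi(\boldsymbol{\theta}_1^h \mid \boldsymbol{x})\, q_1(\theta_1^* \mid \theta_1^h)}\right).$$

4. With probability $\alpha_1(\theta_1^h, \theta_1^*)$ set $\theta_1^{h+1} = \theta_1^*$; otherwise set $\theta_1^{h+1} = \theta_1^h$.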

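The remaining $(d-1)$ parameters are updated in analogous steps, noting that for the $j$th parameter the proposal distribution is $q_j(\theta_j^* \mid \theta_j^h)$, and the parameter vectors $\boldsymbol{\theta}_j^*$ and $\boldsymbol{\theta}_j^h$ contain the previously updated values where appropriate, i.e.

$$\boldsymbol{\theta}_j^* = \{\theta_1^{h+1}, \ldots, \theta_{j-1}^{h+1}, \theta_j^*, \theta_{j+1}^h, \ldots, \theta_d^h\} \quad \text{and} \quad \boldsymbol{\theta}_j^h = \{\theta_1^{h+1}, \ldots, \theta_{j-1}^{h+1}, \theta_j^h, \theta_{j+1}^h, \ldots, \theta_d^h\}.$$

Iteration $h$ is completed when all $d$ parameters have been updated.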
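A minimal sketch of the single update scheme follows, assuming symmetric Normal random walk proposals (so the proposal terms cancel) and a hypothetical bivariate Normal target with correlation 0.5. The function names and tuning values are illustrative assumptions only. The key point is that the proposal vector for parameter $j$ is built from the current state, which already contains the updated values of parameters $1, \ldots, j-1$.

```python
import math
import random


def single_update_mh(log_post, theta_init, step_sd, n_iter, rng):
    """Single update random walk Metropolis-Hastings.

    log_post(theta) -> log posterior of the full parameter list, up to a constant
    step_sd[j]      -> proposal standard deviation for parameter j
    Returns the chain of parameter tuples and one acceptance rate per parameter.
    """
    theta = list(theta_init)
    d = len(theta)
    current_lp = log_post(theta)
    chain = [tuple(theta)]
    accepted = [0] * d
    for _ in range(n_iter):
        for j in range(d):
            # Copy the current state: parameters before j are already updated,
            # parameters after j still hold their values from iteration h.
            proposal = list(theta)
            proposal[j] = rng.gauss(theta[j], step_sd[j])
            proposal_lp = log_post(proposal)
            # Symmetric proposal, so only the posterior ratio is needed.
            log_ratio = proposal_lp - current_lp
            if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
                theta = proposal
                current_lp = proposal_lp
                accepted[j] += 1
        # The iteration is complete once all d parameters have been updated.
        chain.append(tuple(theta))
    return chain, [a / n_iter for a in accepted]


RHO = 0.5  # correlation of the hypothetical target


def log_binormal_target(theta):
    # Bivariate Normal, zero means, unit variances, correlation RHO (unnormalised).
    a, b = theta
    return -(a * a - 2.0 * RHO * a * b + b * b) / (2.0 * (1.0 - RHO * RHO))


chain, rates = single_update_mh(
    log_binormal_target, [3.0, -3.0], [1.5, 1.5], 20_000, random.Random(3))
print("acceptance rates per parameter:", rates)
```

Tracking one acceptance rate per parameter is a design choice that allows each proposal distribution to be adjusted separately.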

Block updating, where sets of parameters are simultaneously updated, is also possible (Chib and Greenberg, 1995). This is often useful for highly correlated parameters.

The Proposal Distribution

There are many choices for sensible proposal distributions. A common form, one that is adopted throughout this thesis, is to base the proposal distribution around the current value, resulting in a random walk Metropolis-Hastings algorithm. For example, random walk proposal distributions for $\theta_1^*$ include the Uniform$[\theta_1^h - \delta_1, \theta_1^h + \delta_1]$ distribution and the Normal$(\theta_1^h, \sigma_1^2)$ distribution. Note that $\delta_1$ and $\sigma_1^2$ are referred to as the proposal step length and proposal variance respectively.

As the Uniform and Normal proposal distributions above are symmetric, the acceptance probability reduces to the ratio of the posterior distributions evaluated at the proposed and current values, i.e.

$$\alpha_1(\theta_1^h, \theta_1^*) = \min\left(1, \frac{\pi(\boldsymbol{\theta}_1^* \mid \boldsymbol{x})}{\pi(\boldsymbol{\theta}_1^h \mid \boldsymbol{x})}\right).$$

Although the choice of proposal distribution is essentially arbitrary, if poorly chosen the Metropolis-Hastings algorithm will be inefficient. If small moves are proposed, the updates will generally be accepted, but it will take a long time to explore the posterior parameter space. Conversely, if large moves are proposed, the updates will generally be rejected, and the chain will remain at the same value for long periods. So that the chain efficiently traverses the posterior parameter space, acceptance rates (the proportion of times the proposed move is accepted) between 0.2 and 0.4 are considered optimal (Gelman et al., 1996). Once the proposal distribution is chosen, it can be tuned to achieve an acceptance rate within this desirable range by increasing or decreasing the proposal variance/step length. Throughout this thesis, a priori proposal tuning is achieved by running the algorithm for a “small” number of iterations, typically of the order of 10,000, calculating the resulting acceptance rates, and making appropriate adjustments to the proposal distributions. This is generally an iterative process.
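The pilot tuning procedure described above could be automated along the following lines. This is a hedged sketch rather than the procedure used in the thesis: the multiplicative adjustment factor of 1.5, the cap on the number of pilot runs, the function names and the standard Normal target are all assumptions made for illustration.

```python
import math
import random


def pilot_run(log_post, theta, step_sd, n_iter, rng):
    """Run a one-parameter random walk Metropolis chain.

    Returns the acceptance rate and the final state of the chain.
    """
    lp = log_post(theta)
    n_accept = 0
    for _ in range(n_iter):
        theta_star = rng.gauss(theta, step_sd)
        lp_star = log_post(theta_star)
        log_ratio = lp_star - lp
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            theta, lp = theta_star, lp_star
            n_accept += 1
    return n_accept / n_iter, theta


def tune_step(log_post, theta_init, step_sd, rng, pilot_iter=10_000,
              low=0.2, high=0.4, factor=1.5, max_rounds=20):
    """Adjust step_sd until the pilot acceptance rate lies in [low, high].

    Returns the last step_sd tried and its pilot acceptance rate; if
    max_rounds is exhausted the rate may still lie outside the range.
    """
    theta = theta_init
    rate = float("nan")
    for _ in range(max_rounds):
        rate, theta = pilot_run(log_post, theta, step_sd, pilot_iter, rng)
        if low <= rate <= high:
            break
        if rate > high:
            step_sd *= factor  # too many acceptances: propose larger moves
        else:
            step_sd /= factor  # too many rejections: propose smaller moves
    return step_sd, rate


def log_std_normal(theta):
    # Hypothetical target: standard Normal density, up to a constant.
    return -0.5 * theta * theta


step, rate = tune_step(log_std_normal, 0.0, 0.1, random.Random(0))
print("tuned proposal sd:", step, "pilot acceptance rate:", rate)
```

Tuning of this kind is carried out before the main run; the tuned proposal is then held fixed while the samples used for inference are drawn.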

The Gibbs Sampler

The Gibbs sampler is a special case of the single update Metropolis-Hastings algorithm. Here the proposal distribution $q_j$ for parameter $\theta_j$ is its conditional posterior distribution given the current values of the other parameters, i.e. $\theta_j^*$ is sampled from $\pi(\theta_j \mid \theta_1^{h+1}, \ldots, \theta_{j-1}^{h+1}, \theta_{j+1}^h, \ldots, \theta_d^h, \boldsymbol{x})$. The Gibbs sampler is highly efficient as the acceptance probability, $\alpha_j(\theta_j^h, \theta_j^*)$, is always one, but it does require the ability to sample from the posterior conditional distributions. Ideally, these conditional distributions will be of a standard distributional form. This can be engineered by using a conjugate prior: a prior distribution for which the resulting posterior distribution is of the same distributional form as the prior. For examples of conjugate priors see Gelman et al. (2004). Metropolis-Hastings updating steps can also be introduced into a Gibbs sampler, an approach known as Metropolis-Hastings-within-Gibbs. For parameters with standard conditional distributions Gibbs updates are used; otherwise Metropolis-Hastings single updates are used (see King et al., 2009).
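As an illustration, here is a minimal Gibbs sampler for a model chosen purely for this example (it is not a model from the thesis): observations $x_i \sim \text{Normal}(\mu, 1/\tau)$ with a Normal prior on the mean $\mu$ and a Gamma prior on the precision $\tau$. Under these assumed priors both conditional posterior distributions are of standard form, so every draw is accepted. The hyperparameter names `m0`, `p0`, `a0` and `b0` are hypothetical.

```python
import math
import random


def gibbs_normal(data, n_iter, rng, m0=0.0, p0=1e-4, a0=0.01, b0=0.01):
    """Gibbs sampler for x_i ~ Normal(mu, 1/tau).

    Assumed priors: mu ~ Normal(m0, 1/p0) and tau ~ Gamma(shape=a0, rate=b0).
    Returns a list of (mu, tau) draws, one per iteration.
    """
    n = len(data)
    total = sum(data)
    mu, tau = total / n, 1.0  # starting values
    draws = []
    for _ in range(n_iter):
        # mu | tau, x is Normal with precision p0 + n * tau.
        prec = p0 + n * tau
        mean = (p0 * m0 + tau * total) / prec
        mu = rng.gauss(mean, 1.0 / math.sqrt(prec))
        # tau | mu, x is Gamma(shape a0 + n/2, rate b0 + sum((x_i - mu)^2) / 2).
        rate = b0 + 0.5 * sum((x - mu) ** 2 for x in data)
        # random.gammavariate takes a scale parameter, i.e. 1 / rate.
        tau = rng.gammavariate(a0 + 0.5 * n, 1.0 / rate)
        draws.append((mu, tau))
    return draws


rng = random.Random(5)
data = [rng.gauss(5.0, 2.0) for _ in range(200)]  # simulated data for the example
draws = gibbs_normal(data, 3_000, rng)
kept = draws[500:]
print("posterior mean of mu:", sum(m for m, _ in kept) / len(kept))
print("posterior mean of tau:", sum(t for _, t in kept) / len(kept))
```

Note that the update for $\tau$ uses the value of $\mu$ just drawn in the same iteration, mirroring the use of previously updated values in the single update algorithm.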

Number of Iterations

The Metropolis-Hastings algorithm (or Gibbs sampler) must be run for a suitably large number of iterations to ensure that a) the chain has converged and b) enough samples are drawn from the posterior distribution (post-convergence) so that Monte Carlo (sampling) errors are small.

The initial samples drawn prior to convergence are discarded as burn-in. It is only the remaining samples that are used for inference. It is necessary to ensure that the burn-in period is long enough that the remaining samples can be assumed to arise from the posterior distribution of interest. Examining the trace-plots of the individual parameters, to check whether the samples have “settled down” to values based around a constant mean, provides a simple means of detecting a lack of convergence. Another simple approach is to repeat the simulations, running the chain from multiple overdispersed starting points. If essentially identical posterior estimates are obtained, this suggests convergence. The Brooks-Gelman-Rubin diagnostic, based upon the idea of using an “Analysis of Variance” to assess whether or not each of the multiple chains has the same distribution, formalises this approach (Brooks and Gelman, 1998). Other, more formal, techniques exist (Cowles and Carlin, 1996). However, it is important to note that convergence diagnostics can only provide evidence of a lack of convergence; no technique can prove that the Markov chain has converged to its stationary distribution. Standard convergence diagnostics are adopted throughout this thesis, although details are generally omitted. Further, deliberately conservative (long) burn-in periods are used.
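The “Analysis of Variance” idea can be sketched with the basic potential scale reduction factor, which compares between-chain and within-chain variability. The code below is a simplified illustration only: it omits the refinements of the full Brooks-Gelman-Rubin diagnostic, and the function name is hypothetical. Values close to 1 are consistent with the chains sharing a common distribution; values well above 1 indicate a lack of convergence.

```python
import math
import random
import statistics


def potential_scale_reduction(chains):
    """Basic potential scale reduction factor for one parameter.

    chains: list of m >= 2 equal-length lists of post-burn-in draws.
    """
    n = len(chains[0])
    chain_means = [statistics.fmean(c) for c in chains]
    # W: average of the within-chain sample variances.
    within = statistics.fmean([statistics.variance(c) for c in chains])
    # B: n times the sample variance of the chain means.
    between = n * statistics.variance(chain_means)
    # Pooled estimate of the posterior variance.
    pooled = (n - 1) / n * within + between / n
    return math.sqrt(pooled / within)


rng = random.Random(2)
# Four hypothetical chains drawn from the same distribution.
agree = [[rng.gauss(0.0, 1.0) for _ in range(2_000)] for _ in range(4)]
# Four hypothetical chains stuck around two different values.
disagree = [[rng.gauss(mu, 1.0) for _ in range(2_000)] for mu in (0.0, 0.0, 3.0, 3.0)]
print(potential_scale_reduction(agree), potential_scale_reduction(disagree))
```

In line with the caveat above, a value near 1 does not prove convergence; it only fails to detect a problem.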

After convergence, further iterations are needed to obtain samples for posterior inference. The more iterations, the more accurate posterior estimates will be. Accuracy of the posterior estimates can be assessed by the Monte Carlo error, the uncertainty arising from the finite number of samples (see Gilks et al., 1996).
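One common way to estimate the Monte Carlo error of a posterior mean is the method of batch means, sketched below. The text above does not state which estimator is used in the thesis, so this is an assumed choice for illustration; the function name and the default of 50 batches are likewise hypothetical. The post-burn-in chain is split into consecutive batches, and the variability of the batch means is used to estimate the standard error of the overall mean.

```python
import math
import random
import statistics


def batch_means_error(draws, n_batches=50):
    """Monte Carlo standard error of the sample mean via batch means.

    Draws left over after forming n_batches equal-sized batches are ignored.
    """
    size = len(draws) // n_batches
    means = [statistics.fmean(draws[i * size:(i + 1) * size])
             for i in range(n_batches)]
    return statistics.stdev(means) / math.sqrt(n_batches)


rng = random.Random(4)
# Independent draws, for comparison.
independent = [rng.gauss(0.0, 1.0) for _ in range(10_000)]
# A hypothetical strongly autocorrelated chain: x_t = 0.9 * x_{t-1} + noise.
correlated = [0.0]
for _ in range(9_999):
    correlated.append(0.9 * correlated[-1] + rng.gauss(0.0, 1.0))
print(batch_means_error(independent), batch_means_error(correlated))
```

The autocorrelated chain yields a noticeably larger Monte Carlo error than the independent draws for the same number of iterations, which is why slowly mixing chains need longer runs to reach a given accuracy.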

In document Statistical models for the long term monitoring of songbird populations: a Bayesian analysis of constant effort sites and ring recovery data (Page 42-46)