The Two-Component Gibbs Sampler (Data Augmentation)

1.6 Markov Chain Monte Carlo Methods

1.6.2 The Two-Component Gibbs Sampler (Data Augmentation)

Data augmentation was originally developed by Tanner and Wong (1987) for finding fixed-point solutions to integral equations that arise in statistical inference, and it can be viewed as the stochastic analogue of the EM algorithm (see Dempster et al., 1977). It is most often used to obtain samples from the joint distribution of $X = (X^{(1)}, X^{(2)})$, say, by sampling from the conditional distributions.

Such a scheme has a structure similar to that of the Gibbs sampler, and Gelfand and Smith (1990) showed that the latter is at least as efficient as the former. Following standard practice in the literature (see, for example, Liu et al., 1994; Meng and van Dyk, 2001), we identify data augmentation with the two-component Gibbs sampler throughout this thesis.

Data augmentation is by far the most widely adopted computational method for performing modern Bayesian analysis of missing data problems. The target distribution is the joint posterior of the missing data $X$ and the parameters $\theta$. By construction, simulation from the conditional distributions $\pi(\theta \mid X, Y)$ and $\pi(X \mid \theta, Y)$ is tractable and more feasible than simulation from the marginal distribution of the parameters given the observed data, $\pi(\theta \mid Y)$. Note that in many cases the latter is not even available in closed form because of the integration in (1.5). We therefore use the two-component Gibbs sampler, which updates $X$ and $\theta$ in turn, to obtain samples from $\pi(\theta, X \mid Y)$.
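As a minimal sketch of the two-component scheme, consider a toy target that is not taken from the text: a bivariate normal with zero means, unit variances and correlation rho, for which each full conditional is N(rho * other component, 1 - rho^2). The function name `gibbs_bivariate_normal` and all parameter values are illustrative assumptions.

```python
import math
import random


def gibbs_bivariate_normal(rho, n_iter, x_init=(0.0, 0.0), seed=0):
    """Two-component Gibbs sampler for a toy bivariate normal target.

    Assumed target (illustrative only): zero means, unit variances and
    correlation rho, so each full conditional is N(rho * other, 1 - rho**2).
    """
    rng = random.Random(seed)
    cond_sd = math.sqrt(1.0 - rho ** 2)
    x1, x2 = x_init
    draws = []
    for _ in range(n_iter):
        x1 = rng.gauss(rho * x2, cond_sd)  # draw X1 given the current X2
        x2 = rng.gauss(rho * x1, cond_sd)  # draw X2 given the new X1
        draws.append((x1, x2))
    return draws


draws = gibbs_bivariate_normal(rho=0.8, n_iter=20000)
mean_x1 = sum(d[0] for d in draws) / len(draws)
# For this target E[X1 X2] = rho, so this average estimates the correlation.
mean_product = sum(d[0] * d[1] for d in draws) / len(draws)
```

In a missing data problem the two components would be the missing data and the parameters, and the two draws would be replaced by simulation from the corresponding conditional posteriors.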

1.6.3 The Metropolis-Hastings Algorithm

The Metropolis algorithm (Metropolis et al., 1953) samples from $\pi$, at least approximately, in a way that does not require knowledge of its normalisation constant. In this section we describe the more general Metropolis-Hastings algorithm introduced by Hastings (1970). It is generally accepted that most MCMC algorithms can be regarded as special cases of this algorithm. We denote by $\pi_u$ the un-normalised density on $\mathbb{R}^d$ with respect to $d$-dimensional Lebesgue measure, $\mu_d^{\mathrm{Leb}}$. We also assume that it is possible to simulate a Markov chain with transition density $q(x, \cdot)$ with respect to the same measure. Such a transition density, called the proposal density, need not have any connection with $\pi_u$, although its choice is important because it can influence the efficiency of the resulting Markov chain.

The Metropolis-Hastings algorithm proceeds as follows. An initial value $X_0$ is chosen; then, given the current state of the chain, $X_n = x$, a candidate value $Y_{n+1} = y$ is generated from the proposal density $q(X_n, \cdot)$. The candidate is then accepted with probability $\alpha(x, y)$, given by

$$\alpha(x, y) = \begin{cases} \min\left\{ \dfrac{\pi_u(y)\, q(y, x)}{\pi_u(x)\, q(x, y)},\, 1 \right\}, & \text{if } \pi_u(x)\, q(x, y) > 0, \\ 0, & \text{if } \pi_u(x)\, q(x, y) = 0. \end{cases}$$

If the candidate value is accepted, we set $X_{n+1} = y$; otherwise we set $X_{n+1} = x$. It is easy to see that the Markov chain induced by such an algorithm has transition law $P$ with densities

$$p(x, y) = q(x, y)\, \alpha(x, y), \qquad x \neq y,$$

with respect to $\mu_d^{\mathrm{Leb}}$, and with probability of remaining at the same value equal to

$$r(x) = 1 - \int q(x, y)\, \alpha(x, y)\, \mu_d^{\mathrm{Leb}}(dy).$$

The algorithm is implemented as follows:

The Metropolis-Hastings Algorithm

1. Choose $X_0$;

2. Set $n = 0$;

3. Repeat the following steps:

   Sample $Y_{n+1} \sim q(X_n, \cdot)$;

   Sample $U_{n+1} \sim U(0, 1)$;

   If $U_{n+1} \le \alpha(X_n, Y_{n+1})$ then set $X_{n+1} = Y_{n+1}$;

   Else set $X_{n+1} = X_n$;

   Set $n = n + 1$.
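The steps above can be sketched in Python as follows. This is an illustration under stated assumptions rather than a definitive implementation: the state space is the real line, the starting value lies in the support of the target, and the names `metropolis_hastings`, `log_pi_u`, `propose` and `log_q` are hypothetical. The example target (a standard normal known up to a constant) and the asymmetric proposal $N(x/2, 1)$ are likewise chosen only for illustration.

```python
import math
import random


def metropolis_hastings(log_pi_u, propose, log_q, x0, n_iter, seed=0):
    """Generic Metropolis-Hastings sampler on the real line (sketch).

    log_pi_u(x):     log of the un-normalised target density.
    propose(x, rng): draws a candidate y from q(x, .).
    log_q(x, y):     log proposal density of moving from x to y,
                     up to an additive constant that does not depend on x.
    Assumes pi_u(x0) > 0.
    """
    rng = random.Random(seed)
    x = x0
    chain = [x]
    for _ in range(n_iter):
        y = propose(x, rng)
        # log of pi_u(y) q(y, x) / (pi_u(x) q(x, y))
        log_ratio = log_pi_u(y) + log_q(y, x) - log_pi_u(x) - log_q(x, y)
        alpha = 1.0 if log_ratio >= 0 else math.exp(log_ratio)
        if rng.random() <= alpha:
            x = y  # accept the candidate
        chain.append(x)  # on rejection the chain stays at x
    return chain


# Illustrative target: standard normal, known only up to a constant.
def log_pi_u(x):
    return -0.5 * x * x


# Illustrative asymmetric proposal: y | x ~ N(x / 2, 1).
def propose(x, rng):
    return rng.gauss(0.5 * x, 1.0)


def log_q(x, y):
    return -0.5 * (y - 0.5 * x) ** 2


chain = metropolis_hastings(log_pi_u, propose, log_q, x0=0.0, n_iter=20000)
chain_mean = sum(chain) / len(chain)
chain_var = sum((c - chain_mean) ** 2 for c in chain) / len(chain)
```

Working on the log scale avoids numerical underflow when the densities are very small; only the ratio of un-normalised densities is ever needed.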

It can easily be proven (see, for example, Lemma 2.4.1 of Roberts and Tweedie, 2006) that the algorithm ensures reversibility of the chain with respect to $\pi$, i.e. the chain satisfies the detailed balance condition

$$\pi(x)\, p(x, y) = \pi(y)\, p(y, x).$$

We should note that any $\alpha(\cdot, \cdot)$ which satisfies the following equation

$$\pi(x)\, q(x, y)\, \alpha(x, y) = \pi(y)\, q(y, x)\, \alpha(y, x)$$

can be used. A class of algorithms with other accept/reject rules can be found in Peskun (1973). However, it turns out that the accept/reject rule of the Metropolis-Hastings algorithm maximises the proportion of ultimately accepted moves. Therefore, it is also optimal in the sense of minimising the asymptotic variance of any ergodic average moment estimator (see, for example, Peskun, 1973; Tierney, 1998; Roberts and Tweedie, 2006).
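Detailed balance can be checked numerically on a small example. The sketch below assumes a finite state space instead of $\mathbb{R}^d$, so that the transition law is a matrix; the weights `pi_u`, the proposal matrix `q` and the function name `mh_transition_matrix` are all hypothetical.

```python
def mh_transition_matrix(pi_u, q):
    """Metropolis-Hastings transition matrix on a finite state space.

    pi_u: list of un-normalised target weights.
    q:    proposal matrix, q[i][j] = probability of proposing j from i.
    """
    n = len(pi_u)
    p = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and pi_u[i] * q[i][j] > 0:
                alpha = min(pi_u[j] * q[j][i] / (pi_u[i] * q[i][j]), 1.0)
                p[i][j] = q[i][j] * alpha
        # Probability of remaining at state i (p[i][i] is still 0 here).
        p[i][i] = 1.0 - sum(p[i])
    return p


pi_u = [1.0, 2.0, 3.0]        # hypothetical un-normalised target
q = [[0.2, 0.5, 0.3],
     [0.1, 0.6, 0.3],
     [0.4, 0.4, 0.2]]         # hypothetical asymmetric proposal
p = mh_transition_matrix(pi_u, q)

# Largest violation of pi(x) p(x, y) = pi(y) p(y, x) over all pairs;
# it should vanish up to floating-point rounding.
max_imbalance = max(abs(pi_u[i] * p[i][j] - pi_u[j] * p[j][i])
                    for i in range(3) for j in range(3))
```

Because the normalisation constant multiplies both sides of the detailed balance equation, the check can be carried out with the un-normalised weights.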

The framework of the Metropolis-Hastings algorithm is very general, since it imposes no restriction on the choice of $q(\cdot, \cdot)$. We therefore proceed by describing some special cases of this algorithm which have drawn much attention in the literature. The simplest possible choice for the proposal distribution takes $q(\cdot, \cdot)$ to be independent of its first argument:

$$q(x, y) = q(y),$$

and therefore we can write the accept/reject ratio as

$$\alpha(x, y) = \min\left\{ \frac{\pi_u(y)\, q(x)}{\pi_u(x)\, q(y)},\, 1 \right\}.$$

This algorithm is called the Independence Sampler, and it is clear that by taking $q(\cdot)$ to be proportional to $\pi_u(\cdot)$ the algorithm reduces to i.i.d. sampling from $\pi$.
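A minimal sketch of the Independence Sampler follows. The accept/reject ratio depends on the states only through the weights $\pi_u(\cdot)/q(\cdot)$, so the weight of the current state can be cached. The target (standard normal up to a constant), the $N(0, 2^2)$ proposal and the names `independence_sampler`, `sample_q` and `log_q` are assumptions made for illustration.

```python
import math
import random


def independence_sampler(log_pi_u, sample_q, log_q, x0, n_iter, seed=0):
    """Independence Sampler sketch: candidates do not depend on the state.

    log_q may omit its normalising constant, since constants cancel
    in the ratio of weights.
    """
    rng = random.Random(seed)
    x = x0
    log_w_x = log_pi_u(x) - log_q(x)  # log weight pi_u(x) / q(x)
    chain = [x]
    for _ in range(n_iter):
        y = sample_q(rng)
        log_w_y = log_pi_u(y) - log_q(y)
        # Accept with probability min{w(y) / w(x), 1}; 1 - random() lies
        # in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) <= log_w_y - log_w_x:
            x, log_w_x = y, log_w_y
        chain.append(x)
    return chain


# Illustrative target: standard normal, known only up to a constant.
def log_pi_u(x):
    return -0.5 * x * x


# Illustrative proposal: N(0, 2^2), with heavier tails than the target.
def sample_q(rng):
    return rng.gauss(0.0, 2.0)


def log_q(y):
    return -y * y / 8.0


chain = independence_sampler(log_pi_u, sample_q, log_q, x0=0.0, n_iter=20000)
chain_mean = sum(chain) / len(chain)
chain_var = sum((c - chain_mean) ** 2 for c in chain) / len(chain)
```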

The algorithm essentially introduced in Metropolis et al. (1953) is known as the Symmetric Random Walk Metropolis (RWM) algorithm. The proposal distribution is of the form

$$q(x, y) = q(|x - y|),$$

that is, the proposal density is a function only of the distance between $x$ and $y$. In this case the accept/reject ratio reduces to

$$\alpha(x, y) = \min\left\{ \frac{\pi_u(y)}{\pi_u(x)},\, 1 \right\}.$$

The accept/reject mechanism can be interpreted as follows. We accept all moves which increase $\pi_u$ and reject only some of the moves which decrease it: a move to a state $y$ with $\pi_u(y) < \pi_u(x)$ is accepted with probability $\pi_u(y)/\pi_u(x)$. Thus, the algorithm biases the random walk by moving towards modes of $\pi$ more often than moving away from them (Roberts and Tweedie, 2006). This algorithm became one of the most widely used MCMC methods because it is extremely easy to implement. Only $\pi_u(\cdot)$ is involved in the accept/reject ratio, while the proposal densities play no part at all, so many calculations can be avoided. Perhaps the most popular proposal for the RWM is of the form

$$q(x, \cdot) \equiv N(x, \sigma^2),$$

where $\sigma$ is a scaling factor chosen by the user to optimise the performance of the algorithm; see, for example, Roberts et al. (1997).
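The following sketch implements the RWM with a Gaussian proposal and records the acceptance rate, which is the quantity the user typically monitors when choosing $\sigma$. The standard normal target, the two values of $\sigma$ and the function name `random_walk_metropolis` are illustrative assumptions, not recommendations.

```python
import math
import random


def random_walk_metropolis(log_pi_u, x0, sigma, n_iter, seed=0):
    """Symmetric RWM with N(x, sigma^2) proposals (sketch).

    Returns the chain and the observed acceptance rate.
    """
    rng = random.Random(seed)
    x = x0
    chain = [x]
    n_accepted = 0
    for _ in range(n_iter):
        y = rng.gauss(x, sigma)
        # The proposal is symmetric, so only pi_u(y) / pi_u(x) is needed.
        if math.log(1.0 - rng.random()) <= log_pi_u(y) - log_pi_u(x):
            x = y
            n_accepted += 1
        chain.append(x)
    return chain, n_accepted / n_iter


# Illustrative target: standard normal, known only up to a constant.
def log_pi_u(x):
    return -0.5 * x * x


chain, rate_large_step = random_walk_metropolis(log_pi_u, 0.0, sigma=2.4, n_iter=20000)
_, rate_small_step = random_walk_metropolis(log_pi_u, 0.0, sigma=0.1, n_iter=20000)
chain_mean = sum(chain) / len(chain)
chain_var = sum((c - chain_mean) ** 2 for c in chain) / len(chain)
```

A very small $\sigma$ gives a high acceptance rate but tiny moves, while a very large $\sigma$ gives bold proposals that are mostly rejected; the scaling factor trades these two effects off against each other.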

Finally, the so-called Multiplicative Random Walk Metropolis algorithm offers an attractive alternative to the RWM when the state space is the positive half-line. Such an algorithm can be considered as a logarithmic random walk algorithm, in the sense that it is equivalent to the RWM with a $N(0, \sigma^2)$ proposal distribution and target distribution obtained by a logarithmic transformation of the original target. The proposed move is to a random multiple of the current state. Thus, from the current state $x$, we propose a candidate value $y = x \exp(U)$, where $U \sim N(0, \sigma^2)$. The accept/reject ratio turns out to be

$$\alpha(x, y) = \min\left\{ \frac{\pi_u(y)\, y}{\pi_u(x)\, x},\, 1 \right\}.$$

Simulations can be used to illustrate that such an algorithm can be much more efficient, making frequent short excursions into the tail of the target density, whereas the RWM makes rare but lengthy excursions.
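A minimal sketch of the Multiplicative Random Walk Metropolis algorithm is given below. The factor $y/x$ in the accept/reject ratio appears as the difference of logarithms. The Exponential(1) target, the value of $\sigma$ and the function name `multiplicative_rwm` are assumptions made for illustration, and the starting value is assumed to be strictly positive.

```python
import math
import random


def multiplicative_rwm(log_pi_u, x0, sigma, n_iter, seed=0):
    """Multiplicative RWM on the positive half-line (sketch).

    Proposes y = x * exp(U) with U ~ N(0, sigma^2) and accepts with
    probability min{pi_u(y) y / (pi_u(x) x), 1}. Assumes x0 > 0.
    """
    rng = random.Random(seed)
    x = x0
    chain = [x]
    for _ in range(n_iter):
        y = x * math.exp(rng.gauss(0.0, sigma))
        # log of pi_u(y) y / (pi_u(x) x)
        log_ratio = log_pi_u(y) + math.log(y) - log_pi_u(x) - math.log(x)
        if math.log(1.0 - rng.random()) <= log_ratio:
            x = y
        chain.append(x)
    return chain


# Illustrative target: Exponential(1) density, known up to a constant.
def log_pi_u(x):
    return -x


chain = multiplicative_rwm(log_pi_u, x0=1.0, sigma=1.5, n_iter=20000)
chain_mean = sum(chain) / len(chain)  # the Exponential(1) mean is 1
```

Since every candidate is a positive multiple of the current state, the chain never leaves the positive half-line, and no proposals are wasted on values outside the support.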

In document Efficient Bayesian inference for partially observed stochastic epidemics and a new class of semi parametric time series models (Page 37-41)