Chapter 3 MCMC
3.4 Marginal MCMC
In many situations a hidden data model is either the natural representation, or extending the model with augmented data provides a powerful tool for inference. Inference for epidemics in which the infection times are unobserved is one such example: the primary interest is usually in the distribution of the parameters rather than the conditional distribution of the number of infectives. This is considered in detail in chapter 4, where the GIMH and MCWM algorithms are applied. A generic hidden data model with parameters θ ∈ Θ has unobserved or augmented data X ∈ 𝒳 and observed data Y ∈ 𝒴, with a joint distribution that has the natural factorisation π(y, x, θ) = π(y|x, θ)π(x|θ)p(θ), where p is the prior. When interest is in the marginal posterior π(θ|y) rather than the joint posterior π(θ, x|y), and the marginal is both intractable and difficult to sample from, the pseudo-marginal algorithms of Andrieu and Roberts (2009) are often an appropriate choice.
3.4.1 Full posterior approach
Most previous approaches target π(θ, x|y) and then marginalise by ignoring x in the samples obtained from the Markov chain. A standard approach is to use a deterministic scan Gibbs sampler at the top level on x and θ. Often the exact conditional distributions π(θ|X, Y) and π(X|θ, Y) are unavailable, so MH steps are used. The resulting algorithm is given in algorithm 3.2.
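The MH within Gibbs scheme of algorithm 3.2 can be sketched in a few lines. The sketch below is only an illustration under assumed choices that are not taken from the text: a scalar toy model with θ ∼ N(0, 1), X|θ ∼ N(θ, 1) and Y|X ∼ N(X, 1), symmetric Gaussian random-walk proposals (so the q terms cancel in A), and hypothetical function names. For this toy model the marginal posterior is known, θ|y ∼ N(y/3, 2/3), which gives a simple check.

```python
import math
import random


def log_normal_pdf(x, mean, var):
    """Log density of N(mean, var) at x."""
    return -0.5 * math.log(2.0 * math.pi * var) - 0.5 * (x - mean) ** 2 / var


def log_joint(theta, x, y):
    """log pi(y|x) + log pi(x|theta) + log p(theta) for the assumed toy model."""
    return (log_normal_pdf(y, x, 1.0)
            + log_normal_pdf(x, theta, 1.0)
            + log_normal_pdf(theta, 0.0, 1.0))


def mh_within_gibbs(y, n_steps, step_theta=1.0, step_x=1.0, seed=0):
    """Deterministic-scan MH within Gibbs targeting pi(theta, x | y)."""
    rng = random.Random(seed)
    theta, x = 0.0, y  # arbitrary starting values
    thetas = []
    for _ in range(n_steps):
        # Steps 1-3: random-walk MH update of theta given x and y.
        # The proposal is symmetric, so the q terms in A cancel.
        theta_new = theta + rng.gauss(0.0, step_theta)
        log_a = log_joint(theta_new, x, y) - log_joint(theta, x, y)
        if rng.random() < math.exp(min(0.0, log_a)):
            theta = theta_new
        # Steps 4-6: random-walk MH update of x given theta and y.
        x_new = x + rng.gauss(0.0, step_x)
        log_a = log_joint(theta, x_new, y) - log_joint(theta, x, y)
        if rng.random() < math.exp(min(0.0, log_a)):
            x = x_new
        thetas.append(theta)  # marginalise by keeping theta and ignoring x
    return thetas
```

Calling `mh_within_gibbs(y=3.0, n_steps=20000)` and averaging the retained θ values after a burn-in should give a value close to y/3 = 1 for this toy model.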
3.4.2 GIMH
The grouped independence Metropolis-Hastings (GIMH) algorithm was introduced by Beaumont (2003) in a genetics context, and is described by Andrieu and Roberts in terms of the marginal π(θ) of π(θ, Z). An importance sampler estimate $\tilde{\pi}_N(\theta)$ of π(θ) is used within a Metropolis-Hastings step; the justification for this is given in more general terms in section 3.5. In their terms,
$$\tilde{\pi}_N(\theta) = \sum_{i=1}^{N} \frac{\pi(\theta, Z_i)}{q_\theta(Z_i)}$$
Algorithm 3.2 MH within Gibbs algorithm for hidden data
The target is π(θ, X|y); each outer step repeats these steps:
1. MH sample of θ | X, y ∝ π(X, y|θ)p(θ) by
2. Propose θ′ from q(θ′|θ)
3. Accept θ′ with probability min(A, 1), where
$$A = \frac{\pi(X, Y|\theta')\,p(\theta')\,q(\theta|\theta')}{\pi(X, Y|\theta)\,p(\theta)\,q(\theta'|\theta)}$$
4. MH sample of X | θ, y ∝ π(X, y|θ) by
5. Propose X′ from q(X′|X)
6. Accept X′ with probability min(A, 1), where
$$A = \frac{\pi(X', Y|\theta)\,q(X|X')}{\pi(X, Y|\theta)\,q(X'|X)}$$
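where Z_i ∼ q_θ(·). In the case we are considering, Z is identical to X and the full posterior is
$$\pi(\theta, Z) = \frac{\pi(\theta, X, Y)}{\pi(Y)} \propto \pi(Y|\theta, X)\,\pi(X|\theta)\,p(\theta).$$
The constant π(Y) cancels in the calculation of the acceptance ratio, and the estimate $\tilde{\pi}_N(\theta)$ becomes
$$\tilde{\pi}_N(\theta) = \sum_{i=1}^{n_z} \frac{\pi(y|\theta, x_i)\,\pi(x_i|\theta)\,p(\theta)}{q_\theta(x_i)} \tag{3.4.1}$$
where the x_i are n_z values drawn i.i.d. from q_θ(·).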
A simple (but rarely optimal) choice for q_θ(·) is π(X|θ), which is often easy to sample from. In this case the calculation of $\tilde{\pi}_N(\theta)$ simplifies because q_θ(·) cancels, which also speeds the calculation, especially when π(X|θ) is expensive to evaluate. In chapter 4 the GIMH is applied to the Indian buffet epidemic.
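A minimal sketch of GIMH with the simple choice q_θ(·) = π(X|θ) is given below. It is an illustration only, under assumptions that are not from the text: the scalar toy model θ ∼ N(0, 1), X|θ ∼ N(θ, 1), Y|X ∼ N(X, 1), a symmetric random-walk proposal for θ, and hypothetical function names. The estimate is divided by n_z here; with fixed n_z this constant cancels in the acceptance ratio. The key feature is that the estimate for the current θ is stored and reused rather than recomputed.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of N(mean, var) at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def estimate(theta, y, nz, rng):
    """Importance-sampling estimate of pi(theta, y) up to a constant.

    With q_theta = pi(x|theta) the weights reduce to pi(y|x_i) p(theta).
    """
    total = 0.0
    for _ in range(nz):
        x = rng.gauss(theta, 1.0)        # x_i ~ pi(x|theta)
        total += normal_pdf(y, x, 1.0)   # pi(y|x_i)
    return normal_pdf(theta, 0.0, 1.0) * total / nz


def gimh(y, n_steps, nz=20, step=1.0, seed=0):
    """GIMH chain for theta; the current estimate w is kept between steps."""
    rng = random.Random(seed)
    theta = 0.0
    w = estimate(theta, y, nz, rng)
    thetas = []
    for _ in range(n_steps):
        theta_new = theta + rng.gauss(0.0, step)   # symmetric proposal
        w_new = estimate(theta_new, y, nz, rng)
        # Accept with probability min(1, w_new / w); w is NOT refreshed.
        if rng.random() * w < w_new:
            theta, w = theta_new, w_new
        thetas.append(theta)
    return thetas
```

For this toy model the exact marginal posterior is N(y/3, 2/3), so the sample mean from `gimh(y=3.0, n_steps=20000)` should be close to 1 after a burn-in.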
A common problem with the GIMH algorithm is that the Markov chain can get stuck with a very small probability of moving. This happens when the estimate $\tilde{\pi}_N(\theta)$ is much larger than π(θ), and so also larger than π(θ′) for nearly all proposed θ′. Although eventual escape and convergence to the exact target are guaranteed as the number of steps → ∞, this may take an unacceptably long time. The solution is to improve the estimate of π(θ). The obvious approach of increasing n_z in generating $\tilde{\pi}_N(\theta)$ will help, but if q_θ(·) is such that the weight distribution has the typical heavy tail, then the results from section 3.2 show that n_z would have to be increased exponentially to achieve the desired improvement. An analysis of the reasons for the sticking of the chain is presented below in section 3.5. The analysis is given in more general terms, and to distinguish the original algorithm from the generalised algorithm the name stochastic exact Metropolis-Hastings (SEMH) is introduced.
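As a rough illustration of the sticking mechanism (a sketch only, not the analysis of section 3.5), suppose the estimates at the current and proposed values are written as multiples of the true marginal, with random weights W and W′. This weight notation is an assumption made here for the illustration. The GIMH acceptance ratio then reads

```latex
A \;=\; \frac{W'\,\pi(\theta')\,q(\theta \mid \theta')}{W\,\pi(\theta)\,q(\theta' \mid \theta)},
\qquad
\tilde{\pi}_N(\theta) \propto W\,\pi(\theta), \quad
\tilde{\pi}_N(\theta') \propto W'\,\pi(\theta').
```

Because W is retained from the step at which θ was accepted, a value of W far in the upper tail of its distribution makes W′/W small for almost every freshly drawn W′. The ratio A is then small whatever θ′ is proposed, and the chain stays where it is until a comparably extreme W′ occurs.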
3.4.3 MCWM
A closely related approximate algorithm, the Monte Carlo within Metropolis (MCWM) algorithm, was introduced by O’Neill et al. (2000); it is also analysed and generalised by Andrieu and Roberts (2009). In the absence of a better proposal q_θ(·) this provides an alternative which in general does not suffer from sticking but has a bias that in general is not known. The generalisation of MCWM is called the stochastic approximate Metropolis-Hastings (SAMH) algorithm, and the bias is analysed in section 3.5.4.
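The difference from GIMH can be sketched as follows: in MCWM the estimate at the current θ is recomputed at every step instead of being carried over. The code is an illustration under assumptions that are not from the text: the scalar toy model θ ∼ N(0, 1), X|θ ∼ N(θ, 1), Y|X ∼ N(X, 1), the choice q_θ(·) = π(X|θ), a symmetric random-walk proposal, and hypothetical function names.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of N(mean, var) at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def estimate(theta, y, nz, rng):
    """Monte Carlo estimate of pi(theta, y) up to a constant, with x_i ~ pi(x|theta)."""
    total = 0.0
    for _ in range(nz):
        x = rng.gauss(theta, 1.0)
        total += normal_pdf(y, x, 1.0)
    return normal_pdf(theta, 0.0, 1.0) * total / nz


def mcwm(y, n_steps, nz=50, step=1.0, seed=0):
    """MCWM chain for theta: both estimates are redrawn at every step."""
    rng = random.Random(seed)
    theta = 0.0
    thetas = []
    for _ in range(n_steps):
        theta_new = theta + rng.gauss(0.0, step)   # symmetric proposal
        w = estimate(theta, y, nz, rng)            # fresh estimate at current theta
        w_new = estimate(theta_new, y, nz, rng)    # fresh estimate at proposed theta
        # Accept with probability min(1, w_new / w).
        if rng.random() * w < w_new:
            theta = theta_new
        thetas.append(theta)
    return thetas
```

Since no unusually large estimate is ever retained, the chain cannot stick in the way GIMH can, but the stationary distribution is only an approximation to π(θ|y). In this toy model the sample mean should still be near y/3 for moderate n_z.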
Effect of Limited Support on the Proposal
In complex hidden data models it can be difficult to know the support of X given y, and so proposals will generate values of x that give zero likelihood. This is not a problem for standard MCMC or the GIMH algorithm, as these proposals are rejected; the only effect is to reduce the acceptance rate. However, in some of the examples considered in chapter 4 this happened sufficiently often that the probability of all n_z importance samples having zero weight, and hence W = 0, in one or more steps of a long run was significant. When both W = 0 and W′ = 0, MCWM requires the behaviour to be specified as accepting with probability p_0 ∈ [0, 1]; the subsequent examples all used p_0 = 0.
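One way to make the zero-weight rule explicit is sketched below. The function name and signature are hypothetical, and a symmetric proposal for θ is assumed so that the acceptance ratio is simply W′/W. Only the case W = W′ = 0 is specified in the text; treating W = 0 with W′ > 0 as certain acceptance is an assumption of this sketch.

```python
import random


def mcwm_accept(w, w_new, p0=0.0, rng=random):
    """Decide acceptance from the current (w) and proposed (w_new) estimates."""
    if w == 0.0 and w_new == 0.0:
        # Ratio 0/0 is undefined: accept with the user-specified probability p0.
        return rng.random() < p0
    if w == 0.0:
        # Assumption of this sketch: an infinite ratio means acceptance.
        return True
    # Usual Metropolis rule; w_new == 0 is always rejected.
    return rng.random() < min(1.0, w_new / w)
```

With the default `p0=0.0`, as used in the subsequent examples, a step in which both estimates are zero always leaves the chain where it is.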
3.4.4 MCWM bias
Simulations have been performed to illustrate the bias in the MCWM algorithm as the dimension d increases. The target chosen has the same dimension d for θ and Z, with a known marginal: π(θ) is multivariate normal, N(0_d, 1_d), and π(Z|θ) ∼ N(θ, 1_d). In order to compare the d-dimensional simulated distributions with the
median against the expectation. The points shown in figure 3.4.1 are for increasing dimension and three values of n_z: 10, 20 and 40.
Figure 3.4.1: Bias of MCWM for the multivariate normal target, sample median vs true median; n_z = 10 (black), 20 (red), 40 (blue)