3.5 MCMC inference via Uniformization
3.5.1 Comparison with existing sampling algorithms
A simple Monte Carlo approach to obtaining posterior samples from an endpoint-conditioned MJP (i.e. an MJP with noiseless observations at the endpoints of an observation interval) is rejection sampling: sample paths from the prior given the observed start state and reject those that do not end in the observed end state (Nielsen, 2002). For multiple noiseless observations over an interval, one uses the Markov property of the MJP to break the problem into a number of independent endpoint-conditioned inference problems.
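The rejection sampler just described can be sketched as follows. This is a minimal illustration rather than the implementation of Nielsen (2002): the function names `sample_prior_path` and `rejection_sample`, the `max_tries` cap, and the convention that `A[j][i]` holds the rate of jumping from state `i` to state `j` (so that the columns of `A` sum to zero) are all assumptions made for the example.

```python
import random

def sample_prior_path(A, start, t_end, rng):
    """Simulate an MJP path on [0, t_end] from the prior, starting in `start`.

    Assumed convention: A[j][i] (i != j) is the rate of jumping from i to j,
    and A[i][i] is minus the total rate of leaving i.
    Returns (states, times); states[k] holds from times[k] to the next jump.
    """
    n = len(A)
    states, times = [start], [0.0]
    s, t = start, 0.0
    while True:
        leave_rate = -A[s][s]
        if leave_rate <= 0.0:          # absorbing state: no further jumps
            break
        t += rng.expovariate(leave_rate)   # exponential holding time
        if t >= t_end:
            break
        targets = [j for j in range(n) if j != s]
        weights = [A[j][s] for j in targets]
        s = rng.choices(targets, weights=weights)[0]
        states.append(s)
        times.append(t)
    return states, times

def rejection_sample(A, start, end, t_end, rng, max_tries=100000):
    """Draw prior paths until one ends in the observed end state."""
    for tries in range(1, max_tries + 1):
        states, times = sample_prior_path(A, start, t_end, rng)
        if states[-1] == end:
            return states, times, tries
    raise RuntimeError("no path accepted within max_tries")
```

The acceptance probability of each proposal equals the prior probability of the observed end state given the start state, which is why an unlikely end state leads to a large number of rejections.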
Rejection sampling can be extended to the case of noisy observations by importance sampling or, more practically, by sequential Monte Carlo methods like particle filtering (Fan and Shelton, 2008). Recently, Golightly and Wilkinson (2011) have applied particle MCMC methods to correct the bias introduced by standard particle filtering methods. However, these methods are efficient only in situations where the data exerts a relatively weak influence on the trajectory (compared to the prior): a large state space or an unlikely end state can result in large numbers of rejections or small effective sample sizes. Though these algorithms are simple and general purpose, their flexibility means they do not fully exploit the structure of the MJP, and often require complicated modifications to make proposals that ‘hit the data’.
A second approach, more specific to the MJP, uses matrix exponentiation (equation (3.10)) to integrate out the infinitely many paths leading from the state at the time of one observation to the state at the next. In particular, let $t_i$ be the time of the $i$th observation, and let $P(S(t_i) \mid X_{[0,t_i]})$ be the vector of probabilities over states at time $t_i$, given all observations up to (and including) the $i$th observation. Then,

$$P(S(t_{i+1}) = s \mid X_{[0,t_{i+1}]}) \propto P(X_{t_{i+1}} \mid s) \left[ \exp\big(A(t_{i+1} - t_i)\big) \, P(S(t_i) \mid X_{[0,t_i]}) \right]_s \qquad (3.29)$$

where $[\cdot]_s$ denotes the $s$th component of a vector.
This suggests a dynamic programming algorithm to sample the MJP state at a finite set of times $\tilde{T} \equiv (\tilde{t}_1, \dots, \tilde{t}_m)$: make a forward pass through this set, successively calculating the marginal distribution over states using equation (3.29) (starting with the initial distribution over states, $\pi_0$). Having calculated the distribution at the end time $\tilde{t}_m$, sample the MJP state at this time. Now, make a backward pass through the times, conditionally sampling a new state at $\tilde{t}_i$ given the state at time $\tilde{t}_{i+1}$. For more details, see Hobolth and Stone (2009) and the references therein.
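The forward and backward passes above can be sketched as follows, under stated assumptions. The names `matmul`, `expm` and `ffbs` are hypothetical, and `lik[i][s]` stands for the observation likelihood at the $i$th time given state `s`. The matrix exponential is a small scaling-and-squaring stand-in written only to keep the example self-contained (in practice one would call a library routine such as `scipy.linalg.expm`). The code also assumes the convention that `A[j][i]` is the rate from state `i` to state `j`, under which the update is a plain matrix-vector product as in (3.29); with the row convention the transition matrix would need transposing.

```python
import math
import random

def matmul(X, Y):
    """Product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def expm(A, t, terms=20):
    """exp(A*t) by scaling and squaring with a truncated Taylor series."""
    n = len(A)
    norm = max(sum(abs(x) for x in row) for row in A) * t
    squarings = 0
    while norm > 0.5:                  # halve until the series converges fast
        norm /= 2.0
        squarings += 1
    scale = t / (2 ** squarings)
    M = [[x * scale for x in row] for row in A]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):      # accumulate M^k / k!
        term = [[x / k for x in row] for row in matmul(term, M)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    for _ in range(squarings):
        result = matmul(result, result)
    return result

def ffbs(A, pi0, times, lik, rng):
    """Sample states at `times` given likelihoods lik[i][s]; pi0 is at time 0."""
    n = len(A)
    alphas, trans = [], []
    prev, t_prev = pi0, 0.0
    for t, l in zip(times, lik):       # forward pass, equation (3.29)
        T = expm(A, t - t_prev)        # T[s][r] = P(S(t) = s | S(t_prev) = r)
        pred = [sum(T[s][r] * prev[r] for r in range(n)) for s in range(n)]
        alpha = [l[s] * pred[s] for s in range(n)]
        z = sum(alpha)
        alpha = [a / z for a in alpha]
        alphas.append(alpha)
        trans.append(T)
        prev, t_prev = alpha, t
    states = [None] * len(times)
    states[-1] = rng.choices(range(n), weights=alphas[-1])[0]
    for i in range(len(times) - 2, -1, -1):   # backward pass
        nxt = states[i + 1]
        w = [alphas[i][s] * trans[i + 1][nxt][s] for s in range(n)]
        states[i] = rng.choices(range(n), weights=w)[0]
    return states, alphas
```

Each time point costs one dense matrix exponential, and the returned states are defined only at the chosen times.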
This method has the advantage of exploiting the properties of the MJP to make ‘optimal’ proposals, which, unlike with the methods of the previous paragraph, are always accepted. One might view this difference as similar to that between running the standard forward-filtering backward-sampling algorithm and, say, a particle filter on a discrete-time Markov chain. However, this analogy breaks down computationally, since matrix exponentiation is an expensive operation that scales as $O(N^3)$, N being the number of states. Thus, this method does not scale well when the dimensionality of the MJP state space is large, and in particular, it does not extend to MJPs with infinite state spaces. Also, the matrix resulting from matrix exponentiation is dense, so any structure in the rate matrix A (e.g. sparsity) cannot be exploited. Note also that the set of times $\tilde{T}$ must include the set of observation times, and we therefore need at least as many matrix exponentiations as there are observations. As we will see in the section on Markov modulated Poisson processes, there are many situations where the frequency of observations is much higher than the frequency of state changes in the MJP, and ideally, we would like the number of expensive matrix exponentials to scale with the latter quantity. Our MCMC sampler does not require any expensive matrix exponentiations; moreover, the length of the discrete-time Markov chain scales with the number of Poisson events (and thus with the uniformization rate Ω). This is a property of the dynamics of the MJP, rather than, say, the observation process. We elaborate on this point in section 3.6.
Another limitation of the previous scheme is that we recover the MJP state only at a finite set of times. Having marginalized out the states at all remaining times, we need an additional step to fill in the rest of the trajectory. Sampling the entire trajectory is important in situations where one is performing inference on the MJP parameters (subsection 3.5.2); here one needs statistics like the total time spent in each state and the number of transitions between each pair of MJP states. One option to fill in the MJP trajectory is to use rejection sampling. A more popular approach is to use uniformization as outlined in Hobolth and Stone (2009). Like our sampler, these methods proceed by sampling the Poisson events W in the interval between observations, and then running a discrete-time Markov chain on this set of times to sample a new trajectory. However, sampling from the posterior distribution over the number of Poisson events can be tricky (depending crucially on the observation process), and usually requires a random number of $O(N^3)$ matrix multiplications (as the sampler iterates over the possible number of Poisson events). Our sampler is also based on uniformization, but unlike existing work, which produces independent samples, ours is an MCMC algorithm. By sampling the Poisson process conditioned on the current trajectory, the details of the observation process become irrelevant. The latter only enter when running the HMM forward-backward algorithm. In this sense, our sampler is a convenient general purpose sampler for MJP-based models, with the user only having to provide a function that calculates the probability of observations in any segment of time where the MJP remains in a fixed state. At the price of producing correlated samples, our method extends naturally to various extensions of MJPs, scales as $O(N^2)$, does not require matrix exponentiation, and easily exploits structure in the rate matrix.
Moreover, we demonstrate that our sampler mixes very rapidly.
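One sweep of a sampler of this kind can be sketched as follows. This is a minimal illustration under stated assumptions, not a definitive implementation. The function name `gibbs_sweep` and the user-supplied `seg_lik(s, a, b)` (the likelihood of the observations in the segment from `a` to `b` if the MJP stays in state `s`) are hypothetical. The code uses the standard uniformization construction in which, given the current path, extra Poisson events are drawn at rate Ω − |A_ss| while the path is in state `s`, and the discrete-time chain has transition matrix B = I + A/Ω. It also assumes the convention that `A[j][i]` is the rate from state `i` to state `j`, and that Ω exceeds every |A_ss|.

```python
import random

def gibbs_sweep(A, pi0, states, times, t_end, omega, seg_lik, rng):
    """Resample an MJP path on [0, t_end] given the current path.

    states[k] holds from times[k] to the next jump, with times[0] == 0.0.
    Returns the new (states, times) in the same format.
    """
    n = len(A)
    # Step 1: sample extra Poisson events conditioned on the current path.
    bounds = times + [t_end]
    events = list(times[1:])               # keep the current jump times
    for k, s in enumerate(states):
        rate = omega + A[s][s]             # omega - |A_ss|
        t = bounds[k]
        while rate > 0.0:
            t += rng.expovariate(rate)
            if t >= bounds[k + 1]:
                break
            events.append(t)
    grid = [0.0] + sorted(events)
    edges = grid + [t_end]
    # Step 2: forward filtering on the discrete-time chain with matrix B.
    B = [[float(i == j) + A[i][j] / omega for j in range(n)] for i in range(n)]
    alphas = []
    pred = pi0
    for k in range(len(grid)):
        if k > 0:
            pred = [sum(B[s][r] * alphas[-1][r] for r in range(n))
                    for s in range(n)]
        a = [pred[s] * seg_lik(s, edges[k], edges[k + 1]) for s in range(n)]
        z = sum(a)
        alphas.append([x / z for x in a])
    # Step 3: backward sampling of a state at every event time.
    new = [None] * len(grid)
    new[-1] = rng.choices(range(n), weights=alphas[-1])[0]
    for k in range(len(grid) - 2, -1, -1):
        w = [alphas[k][s] * B[new[k + 1]][s] for s in range(n)]
        new[k] = rng.choices(range(n), weights=w)[0]
    # Step 4: drop self-transitions to recover the new jump times.
    out_states, out_times = [new[0]], [0.0]
    for s, t in zip(new[1:], grid[1:]):
        if s != out_states[-1]:
            out_states.append(s)
            out_times.append(t)
    return out_states, out_times
```

The forward pass costs one N × N matrix-vector product per Poisson event, and B inherits any sparsity of A. The observation model enters only through `seg_lik`; an endpoint-conditioned problem, for instance, can be encoded by returning zero on the final segment for every state other than the observed end state.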