Exact Inference - Bayesian inference for continuous time Markov chains

MCMC methods rely on likelihood evaluation, which can be intractable for a CTMC, especially when the state space of the Markov chain is unbounded or very large, so direct inference is difficult. A pseudo-marginal method based on random truncation can nevertheless provide exact Bayesian inference for such problems (Georgoulas et al., 2017). The method builds on the auxiliary variable Gibbs scheme proposed in (Rao and Teh, 2013). The likelihood is then formulated so that a Russian Roulette random truncation procedure, as described in (Lyne et al., 2015) and (Filippone and Girolami, 2014), can be applied. This yields two pseudo-marginal sampling methods for Markov processes: a Metropolis-Hastings pseudo-marginal approach and an auxiliary variable pseudo-marginal Gibbs sampler. In this thesis, we focus on the second scheme, which will be applied to one of the biological case studies considered here.

3.8.1 Sampling a Trajectory

As mentioned in section 2.6.1, analysing a CTMC analytically through the CME can be difficult or computationally expensive. An alternative is uniformisation, also described in section 2.6.1. The main idea behind uniformisation is to build a DTMC by assuming a common exit rate, denoted q, for all states of the Markov chain. This results in a sample path (X, T) from the discrete process, and the resulting discrete system can be handled with standard methods. Hence, a new sample path of the process given the observations is drawn according to a forward filtering-backward sampling scheme (Rao and Teh, 2013), (Georgoulas et al., 2017).
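As a minimal sketch of the uniformisation step only, the following Python code builds the DTMC transition matrix B = I + Q/q from a finite generator matrix Q and samples an unconditioned path. The function names, the list-of-lists representation of Q, and the default choice of q as the largest exit rate are assumptions made for illustration; conditioning on observations would additionally require the forward filtering-backward sampling pass, which is not shown.

```python
import random


def uniformise(Q, q=None):
    """Return (B, q), where B = I + Q/q is the transition matrix of the DTMC."""
    n = len(Q)
    max_exit = max(-Q[i][i] for i in range(n))
    if q is None:
        q = max_exit  # assumption: smallest valid rate; any q >= max_exit works
    if q <= 0 or q < max_exit:
        raise ValueError("q must be positive and at least the largest exit rate")
    B = [[(1.0 if i == j else 0.0) + Q[i][j] / q for j in range(n)]
         for i in range(n)]
    return B, q


def sample_uniformised_path(Q, x0, t_end, rng, q=None):
    """Sample an unconditioned CTMC path on [0, t_end] as (states, times)."""
    B, q = uniformise(Q, q)
    states, times = [x0], [0.0]
    t, x = 0.0, x0
    while True:
        # Candidate jump times form a Poisson process with the common rate q.
        t += rng.expovariate(q)
        if t > t_end:
            return states, times
        x_new = rng.choices(range(len(B)), weights=B[x])[0]
        if x_new != x:  # self-transitions are virtual jumps and are dropped
            states.append(x_new)
            times.append(t)
            x = x_new
```

For example, `sample_uniformised_path([[-1.0, 1.0], [2.0, -2.0]], 0, 5.0, random.Random(0))` returns the visited states together with their jump times for a two-state chain.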

3.8.2 Gibbs Sampling for Finite State

In this section, an exact Gibbs sampler for the unknown parameters is considered, following section 3.1 of (Georgoulas et al., 2017). Let us consider the particular case in which the rate law of each reaction i in the system has the functional form:

$$h_i(x) = \theta_i \rho_i(x). \tag{3.41}$$

A conjugate prior for $\theta_i$ is then chosen to simplify the sampling step in the Gibbs algorithm.

Let $X = \{x_0, x_1, \dots, x_N\}$ be a sequence of states visited at the sequence of times $T = \{t_0, t_1, \dots, t_N\}$, which together form the full sample path of the process. Each reaction is associated with a different update vector $v$. The reaction that occurs at time $t_{n+1}$ is denoted by $v_n$, and its update vector is $x_{n+1} - x_n$. Based on equation (3.41), the total exit rate of state $x_n$ is given by:

$$h(x_n) = \sum_{i=1}^{L} \theta_i \rho_i(x_n),$$

where L is the number of distinct reaction types. Since the waiting time of the CTMC is assumed to follow an exponential distribution, its density can be written as:

$$\pi(t_{n+1} \mid t_n, x_n) = h(x_n)\, e^{-dt_n h(x_n)}, \quad \text{where } dt_n = t_{n+1} - t_n. \tag{3.42}$$

The probability that the next state is $x_{n+1}$ is $\theta_{v_n} \rho_{v_n}(x_n) / h(x_n)$, and hence the likelihood can be formulated as follows:

$$\pi(X, T \mid \theta) = \pi(X \mid \theta)\, \pi(T \mid X, \theta). \tag{3.43}$$

Substituting equation (3.42) and the transition probabilities into (3.43), we obtain:

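$$\begin{aligned}
\pi(X, T \mid \theta) &= \prod_{n=0}^{N-1} \frac{\theta_{v_n} \rho_{v_n}(x_n)}{h(x_n)}\, h(x_n)\, e^{-dt_n h(x_n)} \\
&= \prod_{n=0}^{N-1} \theta_{v_n} \rho_{v_n}(x_n)\, e^{-dt_n h(x_n)}.
\end{aligned} \tag{3.44}$$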

If we assume a Gamma prior for each parameter,

$$\pi(\theta_i) = \frac{b_i^{a_i}}{\Gamma(a_i)}\, \theta_i^{a_i - 1} e^{-b_i \theta_i}, \tag{3.45}$$

then combining equations (3.45) and (3.44) gives the conditional posterior:

$$\pi(\theta_i \mid X, T) \propto \theta_i^{a_i + K_i - 1}\, e^{-\left(b_i + \sum_{n=0}^{N-1} dt_n \rho_i(x_n)\right)\theta_i}, \tag{3.46}$$

which implies that each parameter is Gamma distributed with shape $a_i + K_i$ and rate $b_i + \sum_{n=0}^{N-1} dt_n \rho_i(x_n)$, where $K_i$ is the number of times that reaction $i$ is observed. This results in an exact Gibbs sampler for the unknown parameters.

However, uniformisation does not work for many interesting systems with infinite state spaces. An alternative is to introduce a random truncation, which provides an unbiased estimator of the likelihood. The estimated likelihood is then used within pseudo-marginal MCMC (Andrieu and Roberts, 2009), (Beaumont, 2003). This results in two methods relying on random truncation: a Metropolis-Hastings scheme, which targets the marginal likelihood, and an auxiliary variable Gibbs scheme.
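The conjugate update in equation (3.46) can be illustrated with a short Python sketch. It assumes a one-dimensional state, so that every reaction has a distinct scalar update, and a fully observed path; the function names and argument layout are hypothetical and chosen only for this example.

```python
import random


def gamma_posterior_params(states, times, propensities, updates, a, b):
    """Shape and rate of pi(theta_i | X, T) for each reaction i, as in (3.46).

    propensities[i] is rho_i, updates[i] is the scalar update of reaction i,
    and a[i], b[i] are the shape and rate of the Gamma prior (3.45).
    """
    L = len(propensities)
    counts = [0] * L      # K_i: number of times reaction i is observed
    exposure = [0.0] * L  # sum_n dt_n * rho_i(x_n)
    for n in range(len(states) - 1):
        dt = times[n + 1] - times[n]
        for i in range(L):
            exposure[i] += dt * propensities[i](states[n])
        # Assumption: the update x_{n+1} - x_n identifies the reaction uniquely.
        counts[updates.index(states[n + 1] - states[n])] += 1
    shapes = [a[i] + counts[i] for i in range(L)]
    rates = [b[i] + exposure[i] for i in range(L)]
    return shapes, rates


def sample_theta(shapes, rates, rng):
    """Draw each theta_i from its Gamma conditional posterior."""
    # random.gammavariate expects (shape, scale), so the rate is inverted.
    return [rng.gammavariate(k, 1.0 / r) for k, r in zip(shapes, rates)]
```

For a birth-death path with states `[0, 1, 2, 1]` at times `[0.0, 0.5, 1.2, 2.0]`, propensities `[lambda x: 1.0, lambda x: float(x)]` and updates `[1, -1]`, the counts are K = (2, 1), and the prior rates are increased by 2.0 and 2.3 respectively.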

3.8.3 Russian Roulette

Let us suppose that we want to estimate the following infinite sum:

$$f = \sum_{N=0}^{\infty} f_N,$$

which can be approximated by picking a single term $f_k$, where the index $k$ is drawn from a chosen discrete probability distribution $\pi_0, \pi_1, \dots$. The estimator of $f$ is then $\hat{f} = f_k / \pi_k$. The estimator is unbiased because its expectation is $E[\hat{f}] = \sum_{N=0}^{\infty} \frac{f_N}{\pi_N} \pi_N = f$. This method is a form of importance sampling, and its main problem is that the variance of the estimator $\hat{f}$ can be large or infinite (Georgoulas et al., 2017), (Lyne et al., 2015).
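The single-term estimator can be checked numerically with a small Python sketch. The geometric choice $\pi_k = (1 - r) r^k$ and the test series $f_N = 0.5^N$ (whose sum is 2) are assumptions made for illustration only.

```python
import math
import random


def single_term_estimate(term, r, rng):
    """One draw of f_hat = f_k / pi_k with pi_k = (1 - r) * r**k."""
    u = 1.0 - rng.random()              # uniform on (0, 1]
    k = int(math.log(u) / math.log(r))  # geometric draw: P(k = m) = (1 - r) * r**m
    return term(k) / ((1.0 - r) * r ** k)


def average_estimate(term, r, n_draws, seed=0):
    """Monte Carlo average of independent single-term estimates."""
    rng = random.Random(seed)
    return sum(single_term_estimate(term, r, rng) for _ in range(n_draws)) / n_draws
```

With `term = lambda n: 0.5 ** n` and `r = 0.6`, the average settles close to 2. For this particular series the second moment $\sum_k f_k^2 / \pi_k$ diverges when $r \le 0.25$, which illustrates how a poorly matched distribution leads to infinite variance.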

The variance of the estimator can be reduced through the Russian Roulette sampling procedure. The method approximates $f$ with a partial sum whose final term $j$ is chosen randomly. At term $j$, the sum is stopped with probability $1 - q_j$ (as described in section 3.2.2 of (Georgoulas et al., 2017)); otherwise, the process continues. This forms the partial sum $\hat{f} = \sum_{N=0}^{j} f_N / \pi_N$, where $\pi_N = \prod_{m=1}^{N-1} q_m$. This process results in an unbiased estimator of the full sum (a proof of unbiasedness can be found in (Lyne et al., 2015)). This method is known as Russian Roulette; for further details, see (Lyne et al., 2015) and (Georgoulas et al., 2017).
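The Russian Roulette procedure can be sketched in Python as follows. The sketch assumes the indexing of the weights given in the text, under which terms 0 and 1 are always included (their weight is an empty product), and it takes the continuation probabilities as a hypothetical callable `q(j)`; it is an illustration rather than the implementation used in the cited papers.

```python
import random


def russian_roulette(term, q, rng):
    """One draw of f_hat = sum_{N=0}^{j} f_N / pi_N with a random stopping index j.

    term(N) returns f_N and q(j) returns the continuation probability q_j,
    so the sum stops after term j with probability 1 - q_j.
    """
    estimate = 0.0
    weight = 1.0  # pi_N = q_1 * ... * q_{N-1}, equal to 1 for N = 0 and N = 1
    n = 0
    while True:
        estimate += term(n) / weight
        if n >= 1:
            q_n = q(n)
            if rng.random() >= q_n:  # stop with probability 1 - q_n
                return estimate
            weight *= q_n            # weight becomes pi_{n+1}
        n += 1
```

For the test series `term = lambda n: 0.5 ** n` with a constant `q = lambda j: 0.7`, averaging many draws gives a value close to the full sum 2, while each individual draw only evaluates a finite number of terms.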

3.8.4 Expanding the Likelihood

The Russian Roulette method requires the likelihood to be expressed as an infinite sum. Hence, the likelihood for a Markov process is written as an infinite series, in which the space of process paths is decomposed into "a nested sum over a subspace of trajectories which differ by at most N from the observations" (Georgoulas et al., 2017).

As explained in section 3.2.1 of (Georgoulas et al., 2017), the likelihood for a single observation $(x_t, t)$ of a one-dimensional process can be written as:

$$\pi(x_t \mid x_0, \theta) = \sum_{N=0}^{\infty} \pi_t\big(x_t, \max(x_{0:t} - x_v) = N \mid x_0, \theta\big), \tag{3.47}$$

where the initial state $x_0$ at time 0 is assumed to be known and $x_v = \max(x_0, x_t)$. The set of states visited in the interval $[0, t]$ is denoted by $x_{0:t}$, and the summation variable $N$ is the amount by which the maximum state reached in that interval exceeds $x_v$. The constraint on $x_{0:t}$ in each term does not define a state space, so the individual terms cannot be obtained directly as solutions of the CME. However, if we define

$$f^t_N(x_t, x_0) = \pi_t\big(x_t, \max(x_{0:t} - x_v) \le N \mid x_0, \theta\big),$$

every term in equation (3.47) decomposes as:

$$\pi^t_N(x_t, x_0) = \pi_t\big(x_t, \max(x_{0:t} - x_v) = N \mid x_0, \theta\big) = f^t_N(x_t, x_0) - f^t_{N-1}(x_t, x_0), \tag{3.48}$$

where each $f$ term on the right-hand side of equation (3.48) can be considered as the transient probability of a finite Markov process. For each finite state space, a generator matrix can be defined and the transient probability can then be computed using (2.7) (Georgoulas et al., 2017).
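The nested decomposition in (3.48) can be illustrated for a one-dimensional birth-death process with the following Python sketch. Two points are assumptions of this example rather than statements from the source: paths that leave the truncated space {0, ..., x_v + N} are treated as killed (the exit rate is kept on the diagonal of the truncated generator), and the transient probability is computed with a uniformisation series instead of equation (2.7). All function names are hypothetical.

```python
import math


def truncated_generator(birth, death, top):
    """Generator restricted to states 0..top; jumps above `top` are killed."""
    n = top + 1
    Q = [[0.0] * n for _ in range(n)]
    for x in range(n):
        Q[x][x] = -(birth(x) + death(x))  # full exit rate stays on the diagonal
        if x + 1 < n:
            Q[x][x + 1] = birth(x)
        if x > 0:
            Q[x][x - 1] = death(x)
    return Q


def transient_row(Q, x0, t, tol=1e-12, max_terms=10000):
    """Row x0 of exp(Q t) via the series sum_k Poisson(k; q t) * e_{x0} B^k."""
    n = len(Q)
    q = max(-Q[i][i] for i in range(n)) or 1.0
    B = [[(1.0 if i == j else 0.0) + Q[i][j] / q for j in range(n)]
         for i in range(n)]
    v = [1.0 if i == x0 else 0.0 for i in range(n)]
    weight = math.exp(-q * t)  # Poisson probability of zero candidate jumps
    result = [weight * vi for vi in v]
    mass, k = weight, 0
    while 1.0 - mass > tol and k < max_terms:
        k += 1
        v = [sum(v[i] * B[i][j] for i in range(n)) for j in range(n)]
        weight *= q * t / k
        mass += weight
        result = [r + weight * vj for r, vj in zip(result, v)]
    return result


def f_N(N, x0, xt, t, birth, death):
    """P(x_t = xt and max(x_{0:t}) - x_v <= N | x0), with f_{-1} = 0."""
    if N < 0:
        return 0.0
    Q = truncated_generator(birth, death, max(x0, xt) + N)
    return transient_row(Q, x0, t)[xt]


def truncation_terms(x0, xt, t, birth, death, n_max):
    """Terms f_N - f_{N-1} of (3.48) for N = 0..n_max."""
    values = [f_N(N, x0, xt, t, birth, death) for N in range(n_max + 1)]
    return [values[0]] + [values[N] - values[N - 1] for N in range(1, n_max + 1)]
```

For an immigration-death process with constant birth rate 1 and death rate 0.5x started at 0, the state at time t is Poisson distributed with mean 2(1 - e^{-0.5t}), so the partial sums of `truncation_terms(0, 2, 1.0, ...)` can be compared against that closed form.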

3.8.5 Modified Gibbs Sampler

It is only possible to sample trajectories directly when the state space is bounded, because the uniformisation process requires a finite number of states. To avoid this issue, an alternative approach is proposed in (Georgoulas et al., 2017), which works by sampling a truncation point through the Russian Roulette method.

Chapter 3. Bayesian Inference Methods 73

This truncation defines a finite state space, after which a parameter and a path are sampled as described in section 3.8.2. An auxiliary variable (the truncation point $z^*$) is thus introduced in the sampler.

Therefore, we are sampling from the conditional posterior given the chosen truncation $z^*$, i.e. $\pi(X, T \mid \theta', Y, z^*)$, instead of the correct conditional posterior $\pi(X, T \mid \theta', Y)$. Hence, an acceptance ratio must be introduced, since not every drawn path and parameter sample is accepted. The modified Gibbs sampler (as described in (Georgoulas et al., 2017)) is summarised in Algorithm 9.

Algorithm 9 Auxiliary variable Gibbs sampler for finite state Markov

1: Draw a parameter $\theta'$ from $\pi(\theta \mid X, T)$ according to equation (3.46).

2: Draw a sample $X^* \mid \theta', Y$:

    (a) Select a truncation point $z^*$ through the Russian Roulette method.

    (b) Draw a trajectory $(X^*, T^*)$ from $\pi(X, T \mid \theta', Y, z^*)$ according to section 3.8.1.

3: Calculate the acceptance ratio as follows:

    (a) Calculate $\pi_{i+1}(X^* \mid \theta', Y)$ and $\pi_i(X^* \mid \theta', Y)$, the conditional posterior of the proposed trajectory under the new and old truncations.

    (b) Calculate $\pi_{i+1}(X_i \mid \theta', Y)$ and $\pi_i(X_i \mid \theta', Y)$, the conditional posterior of the old trajectory under the proposed and old truncations.

    (c) Set $a = \dfrac{\pi_{i+1}(X^* \mid \theta', Y)\, \pi_{i+1}(X_i \mid \theta', Y)}{\pi_i(X^* \mid \theta', Y)\, \pi_i(X_i \mid \theta', Y)}$.

4: Accept the new sample with probability $\min(1, a)$ and set $\theta_{i+1} = \theta'$ and $(X_{i+1}, T_{i+1}) = (X^*, T^*)$; otherwise set $\theta_{i+1} = \theta_i$ and $(X_{i+1}, T_{i+1}) = (X_i, T_i)$.

The probabilities required to compute the acceptance ratio a can be evaluated through the forward-backward algorithm; an outline of this algorithm is given in section A.1 of Appendix A.
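Steps 3(c) and 4 of Algorithm 9 can be sketched in Python as below. The four log-probabilities are assumed to have been computed already (for instance by a forward-backward pass), working on the log scale is a choice made here for numerical stability, and the function and argument names are hypothetical.

```python
import math
import random


def log_acceptance_ratio(logp_new_prop, logp_old_prop, logp_new_curr, logp_old_curr):
    """log a for step 3(c) of Algorithm 9.

    logp_new_prop, logp_old_prop: log posterior of the proposed trajectory
        under the new and old truncations.
    logp_new_curr, logp_old_curr: log posterior of the current (old) trajectory
        under the new and old truncations.
    """
    return (logp_new_prop + logp_new_curr) - (logp_old_prop + logp_old_curr)


def accept(log_a, rng):
    """Return True with probability min(1, a), where a = exp(log_a)."""
    if log_a >= 0.0:
        return True
    u = 1.0 - rng.random()  # uniform on (0, 1], so log(u) is always defined
    return math.log(u) < log_a
```

If `accept` returns `True`, the proposed parameter and trajectory become the next state of the chain; otherwise the previous parameter and trajectory are kept, as in step 4.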

In document Bayesian inference for continuous time Markov chains (Page 91-95)