
4.2 Inference in CTBNs

4.2.1 Auxiliary Variable Gibbs sampling for CTBNs

In this section, we describe a Gibbs sampling algorithm to simulate the CTBN posterior over an interval $[t_{\text{start}}, t_{\text{end}}]$, given a set of observations $X$ at times $\{t_{o_1}, \ldots, t_{o_O}\}$. An iteration of the overall algorithm performs Gibbs updates on all nodes in the CTBN; in the following we describe the update step for a single node $n$. Thus, we are given the complete sample paths of all nodes in node $n$'s Markov blanket MB(n) and a starting distribution $\pi_0$ over states at time $t_{\text{start}}$. Importantly, our algorithm produces a dependent Gibbs update, so we also need the old trajectory of node $n$. To avoid notational clutter, we suppress all references to the node index $n$. We denote the old trajectory of node $n$ by $S(t) \equiv (S, T)$ and the new trajectory by $\tilde S(t) \equiv (\tilde S, \tilde T)$. Recall also that over the time interval $[t_{\text{start}}, t_{\text{end}}]$, the parents of node $n$ can change state; consequently, the rate matrix governing the dynamics of node $n$ changes in a piecewise-constant manner. We do not indicate the dependence of rate matrices on the configuration of the parents, and instead simply denote the relevant rate matrix at time $t$ by $A^t$.

The Gibbs update for node $n$ begins as depicted in subplot (a) of Figure 4.3, with the current trajectory of node $n$ and that of its Markov blanket. As in Chapter 3, we first reconstruct the thinned Poisson events, and then update the trajectory. In principle, we could imagine that $(S, T)$ had a uniformized construction from a subordinating Poisson process with rate $\Omega$, so that we resample the thinned events from a piecewise-inhomogeneous Poisson process with rate $(\Omega - |A^t_{S(t)}|)$. However, such an $\Omega$ would have to dominate the event rates corresponding to all configurations of the parents of node $n$. Abusing notation, let $A^p$ be the rate matrix when $P(n)$ takes on configuration $p$ (it will always be clear from the context whether the superscript refers to time or to a parent configuration). Then we need

$$\Omega \ge |A^p_s| \quad \forall\, p, s \qquad (4.8)$$

This can be inefficient, particularly in large CTBNs with a few unstable states. In such a situation, the subordinating Poisson rate $\Omega$ can be dictated by a possibly atypical configuration $p$ of $P(n)$ that leads to instability in node $n$ (and thus to large values of $|A^p_s|$ for some $s$). This can lead to a very large number of thinned events, and a consequent inefficiency in the forward-backward algorithm.
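To make this cost concrete, the following minimal Python sketch compares the expected number of events of the subordinating Poisson process (all of which the forward-backward pass must visit) under a single $\Omega$ chosen via (4.8) with what would be needed if each parent configuration used its own dominating rate. It assumes each rate matrix is a nested list whose diagonal entry `A[s][s]` plays the role of $A^p_s$, and that the time the parents spend in each configuration is known. The function names and the two-configuration example are hypothetical, chosen only for illustration.

```python
def max_leaving_rate(A):
    """max_s |A_s|: the largest absolute diagonal entry of a rate matrix."""
    return max(abs(A[s][s]) for s in range(len(A)))


def global_omega(rates):
    """A single Omega satisfying (4.8): it must dominate every configuration p."""
    return max(max_leaving_rate(A) for A in rates.values())


def per_config_omega(rates):
    """One dominating rate per parent configuration p: max_s |A^p_s|."""
    return {p: max_leaving_rate(A) for p, A in rates.items()}


def expected_num_events(omega_of, time_in_config):
    """Integral of the subordinating Poisson rate over [t_start, t_end], i.e. the
    expected number of candidate events, given the time spent in each configuration."""
    return sum(omega_of(p) * dt for p, dt in time_in_config.items())


# Hypothetical two-state node: a typical parent configuration, and a rarely
# visited one that makes the node unstable.
rates = {
    "typical": [[-1.0, 1.0], [1.0, -1.0]],
    "atypical": [[-50.0, 50.0], [1.0, -1.0]],
}
time_in_config = {"typical": 9.0, "atypical": 1.0}

omega = global_omega(rates)
omega_p = per_config_omega(rates)
print(expected_num_events(lambda p: omega, time_in_config))  # 500.0
print(expected_num_events(omega_p.get, time_in_config))      # 59.0
```

Although the parents spend only 10% of the interval in the unstable configuration, the single global rate produces more than eight times as many candidate events in expectation.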

Instead, since the rate matrix $A^t$ varies in a piecewise-constant manner, we might consider subordinating it to a piecewise-inhomogeneous Poisson process. For a rate matrix $A^p$, define a corresponding Poisson rate $\Omega_p \ge \max_s |A^p_s|$, and (abusing notation again) define $\Omega_t$ as the Poisson rate at time $t$. We then resample the thinned events, now from a Poisson process with rate

$$\Omega_t - |A^t_{S(t)}|.$$

Now, the Poisson rate at any time $t$ is dictated by the relevant configuration of the node's Markov blanket. As in Chapter 3, the posterior Poisson intensity of the thinned events is still piecewise constant, changing only when either $S(t)$ changes state (the times in $T$) or one of the parents changes state (we call this set of times $P$).
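A minimal sketch of this resampling step, assuming the interval has already been cut at the times in $T$ and $P$ into segments on which both $S(t)$ and the Markov blanket configuration are constant, so that $\Omega_t - |A^t_{S(t)}|$ is a single number per segment. The segment tuple layout and the name `sample_thinned_events` are hypothetical. On each segment the code simulates a homogeneous Poisson process by accumulating exponential inter-arrival times, which is one standard way to do this.

```python
import random


def sample_thinned_events(segments, rng=None):
    """Resample the thinned events U.

    segments: list of (start, end, omega, leave_rate), where omega is Omega_t and
    leave_rate is |A^t_{S(t)}|, both constant on [start, end).
    Returns the sorted event times of a Poisson process with piecewise-constant
    rate omega - leave_rate.
    """
    rng = rng if rng is not None else random.Random()
    events = []
    for start, end, omega, leave_rate in segments:
        rate = omega - leave_rate
        if rate < 0:
            raise ValueError("Omega_t must dominate the leaving rate |A^t_{S(t)}|")
        if rate == 0:
            continue  # no thinned events can occur on this segment
        # Homogeneous Poisson process on [start, end) via exponential gaps.
        t = start + rng.expovariate(rate)
        while t < end:
            events.append(t)
            t += rng.expovariate(rate)
    return events


segments = [
    (0.0, 1.0, 3.0, 1.0),  # rate 2
    (1.0, 2.0, 5.0, 5.0),  # rate 0: Omega_t equals the leaving rate
    (2.0, 4.0, 4.0, 1.0),  # rate 3
]
U = sample_thinned_events(segments, random.Random(0))
```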

Figure 4.3: Gibbs update for a node of a CTBN, in three panels: (a) the current trajectory of node $n$ and its Markov blanket, (b) the resampled thinned events, and (c) the new trajectory. The colours refer to the associated Markov blanket configuration.

The correctness of such an approach is obvious for a piecewise-inhomogeneous MJP: we can simply view it as a sequence of MJPs with different parameters. Our situation is a bit more subtle (though still straightforward). In particular, the rate matrix $A^t$ at any time $t$ is not fixed, but varies from one Gibbs iteration to the next as the configuration of MB(n) changes. One way to see why our scheme is still valid is to view the overall Gibbs update from $S(t)$ to $\tilde S(t)$ as a transition operator parametrized by the Poisson rate $\Omega$. We saw in Section 3.5 that any such operator with $\Omega > \max_s |A^t_s|$ has the correct stationary distribution. Under our scheme, we choose a particular $\Omega$ (and therefore a particular transition operator) depending on the configuration of the node's Markov blanket. Because this configuration is held fixed while node $n$ is updated, and the choice does not depend on node $n$'s own trajectory, the selected operator still leaves the conditional distribution of node $n$ invariant, so the scheme is valid.
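The argument rests on the fact that the uniformized construction of Chapter 3 describes the same MJP for every admissible $\Omega$; only the auxiliary variables change. The following minimal Python sketch checks this numerically for one fixed rate matrix by summing the uniformization series $\sum_k \mathrm{Poisson}(k; \Omega t)\,(I + A/\Omega)^k$ for two different values of $\Omega$ and comparing against the closed-form two-state transition probability. It is an illustration under stated assumptions (a truncated series and a hypothetical two-state $A$), not a proof of the validity of the scheme.

```python
import math


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def uniformized_transition(A, omega, t, terms=200):
    """Transition matrix of the MJP with rate matrix A over time t, computed as
    sum_k Poisson(k; omega*t) * B^k with B = I + A/omega (truncated at `terms`)."""
    n = len(A)
    if omega < max(abs(A[s][s]) for s in range(n)):
        raise ValueError("omega must be at least max_s |A_s|")
    B = [[(1.0 if i == j else 0.0) + A[i][j] / omega for j in range(n)] for i in range(n)]
    P = [[0.0] * n for _ in range(n)]
    Bk = identity(n)               # B^k, starting at k = 0
    weight = math.exp(-omega * t)  # Poisson(0; omega*t)
    for k in range(terms):
        for i in range(n):
            for j in range(n):
                P[i][j] += weight * Bk[i][j]
        Bk = matmul(Bk, B)
        weight *= omega * t / (k + 1)  # Poisson(k+1) obtained from Poisson(k)
    return P


# Hypothetical two-state rate matrix: 0 -> 1 at rate 1, 1 -> 0 at rate 2.
A = [[-1.0, 1.0], [2.0, -2.0]]
P_small = uniformized_transition(A, omega=2.0, t=0.7)
P_large = uniformized_transition(A, omega=5.0, t=0.7)
```

Both choices of $\Omega$ give the same matrix up to truncation error; they differ only in how many candidate events the construction introduces.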

Figure 4.3(b) shows the result of resampling the thinned events $U$ from the rate $(\Omega_t - |A^t_{S(t)}|)$; in each segment, this rate is set by the associated Markov blanket configuration.

Figure 4.3(c) shows the final step in the Gibbs update, where we thin the set $W \equiv T \cup U$ by constructing a subordinated Markov chain on the set of times $\tilde W \equiv T \cup U \cup P$. For this step, we include $P$ only to emphasize that the parameters of the Markov chain (its transition and emission matrices) change after events in $P$; it is important to realize that the MJP path for node $n$ will not change state at the times in $P$ (so that when $t \in P$ the transition matrix is simply $B^t = I$). At times $t \in T \cup U$, the transition matrix is

$$B^t = I + \frac{1}{\Omega_t} A^t \qquad (4.9)$$

Since the Poisson rate $\Omega_t$ varies with time, the transition operator $B^t$ must do so too.
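The following minimal sketch builds the sequence of transition matrices for the subordinated Markov chain. It assumes each time in $\tilde W$ is tagged with whether it belongs to $P$, together with the rate matrix $A^t$ and rate $\Omega_t$ in force at that time; the tuple layout and function names are hypothetical. Times in $P$ get the identity, since node $n$ cannot jump there, and all other times use equation (4.9).

```python
def uniformized_B(A, omega):
    """Equation (4.9): B = I + A / omega, a stochastic matrix when omega >= max_s |A_s|."""
    n = len(A)
    if omega < max(abs(A[s][s]) for s in range(n)):
        raise ValueError("omega must dominate every leaving rate |A_s|")
    return [[(1.0 if i == j else 0.0) + A[i][j] / omega for j in range(n)]
            for i in range(n)]


def transition_matrices(times):
    """times: list of (t, in_P, A, omega) for successive elements of W~.
    Returns one transition matrix per element: the identity at parent events
    (node n cannot change state there), equation (4.9) otherwise."""
    mats = []
    for _t, in_P, A, omega in times:
        n = len(A)
        if in_P:
            mats.append([[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)])
        else:
            mats.append(uniformized_B(A, omega))
    return mats


# Hypothetical rate matrices under two parent configurations.
A_slow = [[-1.0, 1.0], [2.0, -2.0]]
A_fast = [[-4.0, 4.0], [6.0, -6.0]]
W_tilde = [
    (0.3, False, A_slow, 4.0),   # an element of T or U under the slow configuration
    (0.8, True, A_fast, 12.0),   # a parent event in P: the configuration switches
    (1.1, False, A_fast, 12.0),  # Omega_t has changed along with the configuration
]
Bs = transition_matrices(W_tilde)
```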

Characterizing the emission matrix of the Markov chain is easy. Observe that if node $n$ had no children, we could resample the states of the subordinated hidden Markov model using the likelihood function $L_i(s)$ in equation (3.28). To account for the presence of children $C$, we must weight the probability of a complete trajectory $S(t)$ by the probabilities of the child trajectories under that path (see equation (4.7)). Each child factor $\phi(S_c, T_c \mid S_{P(c)}, T_{P(c)})$ is the density of an MJP, and by the Markov property it factorizes as

$$\phi(S_c, T_c \mid S_{P(c)}, T_{P(c)}) = \prod_{i=0}^{|\tilde W|-1} \phi_i(S_c, T_c \mid S_{P(c)}, T_{P(c)}) \qquad (4.10)$$

Here, $\phi_i(S_c, T_c \mid S_{P(c)}, T_{P(c)})$ is the density of the segment of the child trajectory over $(\tilde w_i, \tilde w_{i+1})$ for successive elements of $\tilde W$. As before, we define $\tilde w_0 = t_{\text{start}}$ and $\tilde w_{|\tilde W|} = t_{\text{end}}$. Evaluating $\phi_i$ under any value of $s_n$ is now a simple matter of counting how much time the child node spent in each state, as well as the number of transitions between each pair of states, under each setting of the other parents of $c$.

The total likelihood function for the state of node $n$ at step $i$ of the hidden Markov model (i.e. over the interval $[\tilde w_i, \tilde w_{i+1})$) must account for all children as well as the observations. It is simply the product of the individual terms:

$$\tilde L_i(s) = L_i(s) \prod_{c \in C} \phi_i(S_c, T_c \mid S_{P(c)}, T_{P(c)}) \qquad (4.11)$$
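A minimal sketch of the child factors and the total likelihood of equation (4.11) for one step $i$, working in log space to avoid underflow. It assumes the child's path over $(\tilde w_i, \tilde w_{i+1})$ has been cut into pieces on which the child's state and its other parents are constant, and that a hypothetical callback `child_rates(s, u)` returns the child's rate matrix when node $n$ is in state $s$ and the other parents are in configuration $u$. As described above, the log density needs only the dwell time in each child state and the transition counts.

```python
import math
from collections import defaultdict


def child_segment_stats(pieces, jumps):
    """Sufficient statistics of one child's path over (w~_i, w~_{i+1}).

    pieces: list of (duration, child_state, u), u = other parents' configuration
    jumps:  list of (from_state, to_state, u) for child transitions in the segment
    """
    dwell = defaultdict(float)  # (u, child_state) -> total time spent there
    counts = defaultdict(int)   # (u, from_state, to_state) -> number of jumps
    for duration, c, u in pieces:
        dwell[(u, c)] += duration
    for c_from, c_to, u in jumps:
        counts[(u, c_from, c_to)] += 1
    return dwell, counts


def log_child_factor(stats, child_rates, s):
    """log phi_i when node n is in state s (child_rates is a hypothetical callback)."""
    dwell, counts = stats
    logp = 0.0
    for (u, c), t in dwell.items():
        logp += child_rates(s, u)[c][c] * t  # diagonal entry = -(leaving rate)
    for (u, c_from, c_to), k in counts.items():
        logp += k * math.log(child_rates(s, u)[c_from][c_to])
    return logp


def log_total_likelihood(log_L_i, children):
    """Equation (4.11) in log space; children is a list of (stats, child_rates)."""
    return [log_L_i[s] + sum(log_child_factor(st, rates, s) for st, rates in children)
            for s in range(len(log_L_i))]


def child_rates(s, u):
    # Hypothetical child rate matrix: the child switches faster when node n is in state 1.
    r = 1.0 if s == 0 else 3.0
    return [[-r, r], [r, -r]]


stats = child_segment_stats(pieces=[(0.5, 0, ()), (0.25, 1, ())], jumps=[(0, 1, ())])
log_L_tilde = log_total_likelihood([0.0, math.log(0.5)], [(stats, child_rates)])
```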

Equation (4.11) is straightforward to calculate as we make a forward pass through the event times. Given the transition matrix (equation (4.9)) and the likelihood (equation (4.11)) of the Markov chain at step $i$, we use the forward filtering-backward sampling algorithm to obtain a new trajectory of node $n$ (subplot (c) in Figure 4.3).
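A minimal sketch of forward filtering-backward sampling for this chain in plain Python. It assumes $N$ steps $i = 0, \ldots, N-1$, a starting distribution $\pi_0$, the transition matrix applied at the start of each step $i \ge 1$ (so `transitions` has $N - 1$ entries), and per-step likelihood vectors from equation (4.11), already exponentiated. The function names are hypothetical; the forward messages are normalized at every step for numerical stability.

```python
import random


def _normalize(v):
    total = sum(v)
    if total <= 0.0:
        raise ValueError("all states have zero probability")
    return [x / total for x in v]


def _sample(p, rng):
    r = rng.random()
    acc = 0.0
    for s, ps in enumerate(p):
        acc += ps
        if r < acc:
            return s
    # Guard against rounding in the cumulative sum: last state with positive mass.
    return max(s for s, ps in enumerate(p) if ps > 0)


def ffbs(pi0, transitions, likelihoods, rng=None):
    """Forward filtering-backward sampling on the subordinated Markov chain.

    pi0:         distribution over states at t_start
    transitions: transitions[i - 1] is the matrix applied at the start of step i
    likelihoods: likelihoods[i][s] is the total likelihood of state s at step i
    Returns one sampled state per step.
    """
    rng = rng if rng is not None else random.Random()
    K, N = len(pi0), len(likelihoods)
    # Forward pass: alphas[i] is the filtering distribution of the state at step i.
    alphas = [_normalize([pi0[s] * likelihoods[0][s] for s in range(K)])]
    for i in range(1, N):
        B, prev = transitions[i - 1], alphas[-1]
        pred = [sum(prev[r] * B[r][s] for r in range(K)) for s in range(K)]
        alphas.append(_normalize([pred[s] * likelihoods[i][s] for s in range(K)]))
    # Backward pass: sample the last state, then each earlier state given its successor.
    states = [0] * N
    states[-1] = _sample(alphas[-1], rng)
    for i in range(N - 2, -1, -1):
        B, nxt = transitions[i], states[i + 1]
        states[i] = _sample(_normalize([alphas[i][r] * B[r][nxt] for r in range(K)]), rng)
    return states
```

Dropping the times at which the sampled state does not change (the thinning step described above) then gives the new trajectory of node $n$.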

Since the new trajectory $\tilde S(t)$ is obtained by introducing auxiliary variables and conditionally sampling a new path in the extended space, the MCMC sampler retains the conditional distribution as its stationary distribution. Ergodicity of the conditional update, and thus of the overall Gibbs sampler, is straightforward to see, so we have the following result:

Proposition 4.1. The auxiliary variable Gibbs sampler described above converges to the posterior distribution over the CTBN sample paths.

Note that unlike the Gibbs sampler of El-Hay et al. (2008), which produces independent samples from the conditional distribution, ours produces dependent Gibbs updates. Since each trajectory update is only one part of an overall Gibbs cycle, we find that a conditionally independent sample offers a negligible benefit to mixing and is significantly wasteful once its computational cost is factored in.

4.3 Experiments

In the following, we evaluate a C++ implementation of our algorithm on a number of CTBNs. As in Chapter 3, for a rate matrix $A^p$, the parameter $\Omega_p$ was set to $2 \max_s |A^p_s|$, so that for any node, the rate of the subordinating Poisson process varies with the configuration of its parents.

In document Markov chain Monte Carlo for continuous-time discrete-state systems (Page 61-65)