5.3 Sampling via Uniformization
5.4.5 Computational efficiency and mixing
For our final experiment, we compare our proposed blocked Gibbs sampler with the Metropolis-Hastings sampler of Adams et al. (2009). We ran both algorithms on two datasets: synthetic dataset 1 from section 5.4.1 and the coal mine disaster dataset.
§ Collected by Ariel Rokem at Andreas Herz’s lab; provided through the CRCNS program (http://crcns.org).
[Figure 5.9: two panels; the horizontal axis of the left panel is Time (ms).]
Figure 5.9: Left: Posterior mean intensity for neural data with 1 standard deviation error bars. Superimposed is the log stimulus (scaled and shifted). Right: Posterior over the gamma shape parameter.
       Synthetic dataset 1                             Coalmine dataset
       Mean ESS        Minimum ESS     Time (sec)      Mean ESS        Minimum ESS     Time (sec)
Gibbs  93.45 ± 6.91    50.94 ± 5.21    77.85           53.54 ± 8.15    24.87 ± 7.38    282.72
MH     56.37 ± 10.30   19.34 ± 11.55   345.44          47.83 ± 9.18    18.91 ± 6.45    1703
Table 5.3: Sampler comparisons. Numbers are per 1000 samples.
All comparisons involved 20 MCMC runs of 5000 iterations each (following a burn-in period of 1000 iterations). For both datasets, we evaluated the latent GP on a uniform grid of 200 points and calculated the effective sample size (ESS) of each component of the resulting Gaussian vectors (using R-CODA (Plummer et al., 2006)). For each run, we return the mean and the minimum ESS across all 200 components. Table 5.3 reports these numbers: not only does our sampler mix faster (resulting in larger ESSs), it also takes less computation time. Additionally, our sampler is simpler and more natural to the problem, and does not require any external tuning parameters.
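The per-run summary described above can be sketched in a few lines. This is not the R-CODA routine used for Table 5.3: as a stand-in, it assumes a simpler ESS estimator that sums the empirical autocorrelations up to the first non-positive lag, and the function names `effective_sample_size` and `summarize_run` are hypothetical.

```python
import numpy as np

def effective_sample_size(chain):
    """ESS of a 1-D chain: n / (1 + 2 * sum of leading positive autocorrelations).

    Assumption: the sum is truncated at the first non-positive autocorrelation.
    R-CODA instead estimates the spectral density at frequency zero.
    """
    x = np.asarray(chain, dtype=float)
    n = x.size
    x = x - x.mean()
    var = np.dot(x, x) / n
    if var == 0.0:
        return float(n)  # constant chain: no autocorrelation to estimate
    # Empirical autocorrelation at lags 0..n-1.
    acf = np.correlate(x, x, mode="full")[n - 1:] / (n * var)
    tau = 1.0  # integrated autocorrelation time
    for k in range(1, n):
        if acf[k] <= 0.0:
            break
        tau += 2.0 * acf[k]
    return n / tau

def summarize_run(samples):
    """samples has shape (iterations, grid_points).

    Returns the mean and minimum ESS across grid points, per 1000 samples.
    """
    samples = np.asarray(samples, dtype=float)
    ess = np.array([effective_sample_size(samples[:, j])
                    for j in range(samples.shape[1])])
    scale = 1000.0 / samples.shape[0]
    return ess.mean() * scale, ess.min() * scale
```

Applied to a run of 5000 post-burn-in iterations on the 200-point grid, `summarize_run` would return one (mean, minimum) pair; averaging such pairs over the 20 runs gives entries of the kind reported in the table, though the numerical values depend on the ESS estimator chosen.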
5.5 Discussion
In this chapter, we have described how uniformization allows us to produce exact samples from a nonstationary renewal process whose hazard function is modulated by a Gaussian process. As in the previous chapters, we exploited our uniformization-based construction to develop a novel and efficient MCMC sampler for posterior inference.
There are a number of interesting avenues worth following. The first concerns the restriction that the hazard function be bounded: while this covers a large and useful class of renewal processes, it would be useful to extend our approach to produce exact samples for renewal processes with unbounded hazard functions. We leave the description of such a scheme for the next chapter. In any case, following Ogata (1981), it is easy to extend our ideas to Bayesian inference for more general point processes with bounded hazard rates. For instance, the firing rate at any instant can depend not just on the time since the last event, but on the entire pattern of the previous event history. Such models are often more realistic descriptions of various phenomena (Paninski et al., 2007), and for the case of a completely observed point process, we can handle such extensions without incurring any additional computational burden.
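To make the bounded-hazard requirement concrete, the following is a minimal sketch of thinning a rate-Ω Poisson process into a modulated renewal process. It assumes a fixed, known modulating function in place of the Gaussian process, an event at time 0 that starts the first interval, and hypothetical names (`sample_modulated_renewal`, `hazard`, `modulation`, `omega`); it illustrates the generative construction only, not the posterior sampler.

```python
import math
import random

def sample_modulated_renewal(hazard, modulation, omega, t_end, rng):
    """Thin a rate-omega Poisson process on (0, t_end].

    A candidate at time t is kept with probability
    modulation(t) * hazard(t - last_event) / omega, so omega must dominate
    the product everywhere. Returns (kept events, thinned candidates).
    """
    events, thinned = [], []
    t, last = 0.0, 0.0  # assumption: a renewal occurs at time 0
    while True:
        t += rng.expovariate(omega)  # next candidate of the dominating process
        if t > t_end:
            break
        rate = modulation(t) * hazard(t - last)
        if rate > omega:
            raise ValueError("omega does not dominate the event rate")
        if rng.random() < rate / omega:
            events.append(t)
            last = t  # the hazard restarts from the new event
        else:
            thinned.append(t)
    return events, thinned

if __name__ == "__main__":
    # Gamma(shape=2, rate=1) renewal density: hazard tau / (1 + tau) < 1.
    ev, th = sample_modulated_renewal(
        hazard=lambda tau: tau / (1.0 + tau),
        modulation=lambda t: 1.0 + 0.5 * math.sin(t),  # bounded by 1.5
        omega=1.5, t_end=100.0, rng=random.Random(0))
    print(len(ev), "events kept,", len(th), "candidates thinned")
```

Because the acceptance probability is evaluated from whatever history the hazard function is given, replacing `t - last` with a function of all of `events` would give the history-dependent extension mentioned above, again provided the resulting rate stays below `omega`.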
Observe that while our generative process involved running a Markov chain on the set of Poisson events, unlike chapter 3, posterior inference did not involve running the forward-backward algorithm. The reason is that we are working in a framework where the renewal events are completely and perfectly observed. We can easily relax this restriction, allowing the event times to be observed noisily, and even allowing missing times. Such a situation will require a forward-backward sampling scheme, and can have a complexity that scales quadratically with the number of Poisson events (this is not linear as for the MJP because our system is no longer Markov). The next chapter, which describes inference for continuous-time semi-Markov jump processes, will make clear how one might deal with issues like noisy observation times and missing events.
A limitation of our proposal to use a Gaussian process prior on the modulating functions is that inference scales cubically with the total number of Poisson points (thinned or otherwise). Thus our approach will not scale well to large problems. As we suggested, because we are working with point processes (and therefore GPs) on the real line, it is possible to choose covariance kernels (other than the squared-exponential) that allow efficient, linear-time inference. The idea is essentially to use a kernel whose precision matrix has finite support, and then use efficient forward-backward sampling techniques. We also show in the next chapter how we can reduce the number of thinned events, making GP inference easier. There is also a vast literature concerning approximate sampling for Gaussian processes. An important question is how these approximations compare to approximations introduced via time-discretization. Additionally, even though we considered GP modulating functions, our uniformization-based sampler will also be useful for Bayesian inference involving simpler priors on modulating functions, e.g. splines or Markov jump processes.
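As a small illustration of a kernel whose precision matrix has finite support, the sketch below uses the exponential (Ornstein-Uhlenbeck) kernel. The source does not name a specific kernel, so this choice is an assumption, as are the function names. For sorted inputs this kernel is Markov, its precision matrix is tridiagonal, and a prior draw costs time linear in the number of points.

```python
import numpy as np

def ou_covariance(times, variance, length):
    """Dense covariance k(s, t) = variance * exp(-|s - t| / length)."""
    t = np.asarray(times, dtype=float)
    return variance * np.exp(-np.abs(t[:, None] - t[None, :]) / length)

def sample_ou_path(times, variance, length, rng):
    """Draw the GP at sorted, distinct times in O(n) using the Markov property.

    Each value depends on the past only through its left neighbour:
    x[i] | x[i-1] ~ N(rho * x[i-1], variance * (1 - rho**2)),
    with rho = exp(-(t[i] - t[i-1]) / length).
    """
    t = np.asarray(times, dtype=float)
    x = np.empty(t.size)
    x[0] = rng.normal(0.0, np.sqrt(variance))
    for i in range(1, t.size):
        rho = np.exp(-(t[i] - t[i - 1]) / length)
        x[i] = rho * x[i - 1] + rng.normal(0.0, np.sqrt(variance * (1.0 - rho ** 2)))
    return x

if __name__ == "__main__":
    times = [0.0, 0.7, 1.5, 2.1, 3.4, 4.0]
    precision = np.linalg.inv(ou_covariance(times, 1.0, 1.0))
    # Entries more than one step off the diagonal vanish up to rounding error.
    print(np.abs(np.triu(precision, k=2)).max())
```

The dense inverse in the demonstration is only a check on a handful of points; a linear-time sampler would never form it. Posterior sampling given observations would use a forward-backward pass over the same chain structure rather than the prior recursion shown here.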
Chapter 6
Beyond uniformization: subordinating to general continuous-time processes
6.1 Introduction
In the last three chapters, we studied a framework for efficient posterior inference in continuous-time discrete-state systems based on the idea of uniformization. We started with the Markov jump process, and after studying two extensions of this model (viz. the MMPP and the CTBN), we moved on to renewal processes. In this chapter, we extend our ideas to semi-Markov processes. Semi-Markov processes are essentially generalizations of the MJP where the waiting time in each state follows some general density on R+ rather than the memoryless exponential. Equivalently, these are generalizations of renewal processes that allow for multiple states with different dynamics. Working with these processes, we shall see that the uniformization framework of the previous chapters can prove restrictive, and we will develop methods beyond uniformization to carry out MCMC inference.
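A forward simulation makes the definition concrete. The sketch below is illustrative only: it assumes that the waiting-time distribution depends on the current state alone (in general it may also depend on the next state), and the names `sample_semi_markov`, `transition` and `wait_samplers` are hypothetical.

```python
import random

def sample_semi_markov(transition, wait_samplers, s0, t_end, rng):
    """Simulate a semi-Markov path on [0, t_end).

    transition[s] maps each possible next state to its probability.
    wait_samplers[s](rng) draws a waiting time for state s from any
    density on the positive reals (it need not be exponential).
    Returns a list of (jump time, new state) pairs, starting with (0.0, s0).
    """
    path = [(0.0, s0)]
    t, s = 0.0, s0
    while True:
        t += wait_samplers[s](rng)  # time spent in the current state
        if t >= t_end:
            break
        states = list(transition[s])
        weights = [transition[s][k] for k in states]
        s = rng.choices(states, weights=weights)[0]
        path.append((t, s))
    return path

if __name__ == "__main__":
    transition = {0: {1: 1.0}, 1: {0: 0.5, 2: 0.5}, 2: {0: 1.0}}
    wait_samplers = {
        0: lambda r: r.weibullvariate(1.0, 0.5),  # scale 1, shape 0.5: bursty
        1: lambda r: r.gammavariate(2.0, 1.0),    # shape 2, scale 1
        2: lambda r: r.expovariate(1.0),          # exponential, as in an MJP
    }
    for t, s in sample_semi_markov(transition, wait_samplers, 0, 10.0, random.Random(0)):
        print(round(t, 3), s)
```

With every waiting time exponential, this reduces to an MJP; with a single state that always returns to itself, it reduces to a renewal process. The Weibull waiting time with shape below one has a hazard that is unbounded near zero, which is the kind of case a constant dominating rate cannot cover.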
Recall that uniformization involves first sampling candidate event times from a Poisson process whose rate dominates all event rates in the system of interest. This restricts us to systems with bounded event rates; we saw, for example, that our methods in the last chapter do not extend to bursty renewal densities with unbounded hazard rates. Similarly, recall that in the Lotka-Volterra model (subsection 4.3.1), birth and death rates are proportional to the sizes of the relevant populations. Since we cannot a priori bound the maximum size of a population over any finite interval, we cannot construct a constant bound on all event rates in the system. In chapter 4, we got around this issue by approximating the original system with a truncated one that does have a bounded population size. This, however, introduces a bias into our inferences, and to keep this small, we needed a conservative bound on the population size (and thus on the maximum event rate in the system). This in turn can lead to bounding rates that are significantly larger than the typical rates witnessed in the system, introducing a large number of thinned events that have to be resampled at the time of inference. A related source of inefficiency is the presence of states in an MJP with widely differing event rates. Again, because we pick a single rate Ω that dominates all event rates in the system, the average number of Poisson events (and thus the computational cost of our algorithm) scales with the leaving rate of the most unstable state. At the same time, this state is often the one in which the system spends the least amount of time.
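The cost of a single dominating rate can be seen in a short simulation. The sketch below implements standard uniformization for a finite-state MJP with rate matrix `A`; the function name and the example matrix are assumptions made for illustration, not values from the experiments.

```python
import numpy as np

def uniformize(A, omega, s0, t_end, rng):
    """Sample an MJP path on [0, t_end] by uniformization.

    Candidate times come from a rate-omega Poisson process; at each one the
    state moves according to B = I + A / omega. Self-transitions of B are
    the thinned events. Returns (path, number of thinned events).
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    if omega < np.max(-np.diag(A)):
        raise ValueError("omega must dominate every leaving rate")
    B = np.eye(n) + A / omega
    n_candidates = rng.poisson(omega * t_end)
    times = np.sort(rng.uniform(0.0, t_end, size=n_candidates))
    path, s, n_thinned = [(0.0, s0)], s0, 0
    for t in times:
        s_new = int(rng.choice(n, p=B[s]))
        if s_new == s:
            n_thinned += 1
        else:
            path.append((float(t), s_new))
            s = s_new
    return path, n_thinned

if __name__ == "__main__":
    # State 1 is left 1000 times faster than state 0, so omega is set by state 1.
    A = [[-0.1, 0.1], [100.0, -100.0]]
    path, n_thinned = uniformize(A, 100.0, 0, 10.0, np.random.default_rng(0))
    print(len(path) - 1, "real jumps,", n_thinned, "thinned candidates")
```

In this example the expected number of candidates is Ω times the interval length, here 1000, while the chain spends almost all of its time in the slow state and makes only a handful of real jumps. Nearly all of the work therefore goes into thinned events.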
A hint of the ideas that follow was provided when we studied inference for CTBNs (section 4.2). There, rather than picking a single dominating Poisson rate for a node of a CTBN, we allowed the dominating rate to depend on the current configuration of the parents of the node. In this chapter, we extend this idea, allowing the dominating Poisson rate to vary not just with the configuration of a node’s Markov blanket (if any) but also with the state of the node itself. This will allow us to develop a general framework for MCMC inference for a much wider variety of continuous-time discrete-state systems. First, however, to provide ourselves with a concrete problem to address, we introduce semi-Markov processes.