Effective Sample Size - ABC SMC Algorithm Tuning

4.5 ABC SMC Algorithm Tuning

4.5.5 Effective Sample Size

ABC SMC can provide reasonable estimates of the model parameters. However, a critical issue with this sequential sampling method is weight degeneracy: after a few iterations, most particles carry negligible weight. This can be overcome by adding a resampling step to the algorithm, in which particles with negligible weight are eliminated and those with high weight are replicated.

However, resampling at every step means that only particles with high weight are retained, so diversity in the particle population can be lost, and it also adds computational cost to the algorithm. It is therefore more efficient to resample only when needed. The Effective Sample Size (ESS) of the weighted particle set is used to decide whether a resampling step is required: when the ESS falls below a predefined threshold, a resampling step is applied. The ESS can be defined as:

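$$\mathrm{ESS} = \frac{1}{\sum_{i=1}^{N} w_i^{2}},$$

where the weights $w_i$ are normalised. If $w'_i$ denote the unnormalised weights, we can define $w_i = w'_i / \sum_{j=1}^{N} w'_j$, so that

$$\mathrm{ESS} = \frac{1}{\sum_{i=1}^{N} w_i^{2}} = \frac{\left(\sum_{i=1}^{N} w'_i\right)^{2}}{\sum_{i=1}^{N} {w'_i}^{2}},$$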

where ESS takes values between 1 and N (Robert and Casella, 2010).
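As an illustration, the ESS and the resampling decision can be sketched in a few lines of Python. This is a minimal sketch rather than the implementation used in the thesis: the function names and the threshold of N/2 are assumptions made for the example, since the text only refers to a "defined threshold".

```python
def effective_sample_size(unnormalised_weights):
    """ESS = (sum of w')^2 / (sum of w'^2); equals 1 / sum(w^2) for normalised w."""
    total = sum(unnormalised_weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    sum_of_squares = sum(w * w for w in unnormalised_weights)
    return total * total / sum_of_squares


def needs_resampling(unnormalised_weights, threshold_fraction=0.5):
    """Assumed rule for illustration: resample when ESS < threshold_fraction * N."""
    n = len(unnormalised_weights)
    return effective_sample_size(unnormalised_weights) < threshold_fraction * n
```

With equal weights the ESS equals N, and when a single particle carries all the weight it equals 1, which matches the bounds stated above.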

4.6 Summary

In this chapter, we discussed the family of ABC algorithms for parameter inference in complex stochastic models with intractable likelihoods. The ABC method has proven to be a powerful tool for inference in a wide variety of intractable models. We began by introducing the basic likelihood-free technique and then considered the development from a perfect rejection method to the ABC rejection method. We identified the important factor that influences the quality of estimation in the rejection method, the tolerance $\epsilon$, and described the impact of its setting on accuracy and computational cost. To obtain a more efficient approximation of the posterior, ABC with SMC was introduced.

We also discussed the ABC SMC technique and investigated algorithmic settings that can improve its efficiency. The main goal within an ABC algorithm is to achieve accurate estimation in less computational time than other Bayesian inference methods. We also identified that the choice of the tolerance $\epsilon$ plays an essential role in the ABC SMC algorithm. An adaptive choice of the tolerance level, based on quantiles, was introduced to avoid setting the tolerance values manually. It is hard to determine which quantile should be used, but we conclude that a higher quantile gave more efficient estimation at the price of additional computational cost.
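A quantile-based tolerance update of this kind can be sketched as follows. This is only an illustrative sketch: it assumes that the next tolerance is taken as an empirical quantile of the distances attached to the current particle population, and it uses the nearest-rank definition of a quantile. The function name `next_tolerance` and the default quantile are hypothetical.

```python
import math


def next_tolerance(distances, quantile=0.5):
    """Return the nearest-rank empirical quantile of the current distances.

    Assumption for this sketch: the returned value is used as the
    tolerance for the next ABC SMC population.
    """
    if not distances:
        raise ValueError("distances must be non-empty")
    if not 0.0 < quantile <= 1.0:
        raise ValueError("quantile must lie in (0, 1]")
    ordered = sorted(distances)
    rank = max(1, math.ceil(quantile * len(ordered)))
    return ordered[rank - 1]
```

A lower quantile shrinks the tolerance faster between populations, while a higher quantile shrinks it more slowly.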

We also outlined an issue that can arise in practice: the main difficulty in performing ABC SMC is determining the approximate number of simulations required to provide a reasonable estimate. In some situations it is impractical to match the simulated and observed data precisely; hence, there is always a smallest attainable tolerance level. It may therefore be beneficial to specify an approximate target tolerance level, to avoid long running times that bring no improvement in estimation. This target could be identified from a pilot run of the model, from which an appropriate stopping time for the algorithm could then be determined. The main advantage of the ABC approach is that it applies to any complex model without any restriction,

Chapter 5 Application to the Lotka-Volterra Model

The Lotka-Volterra (LV) model is a stochastic reaction network that was developed in the context of ecological modelling (Lotka, 1932). This system has been modelled many times, and many inference methods have been applied to it, for example ABC SMC (Toni et al., 2009), PMCMC (Golightly and Wilkinson, 2011; Owen et al., 2015) and a pseudo-marginal sampler based on truncation (Georgoulas et al., 2017). In this chapter these methods are applied with slightly different settings, and a comprehensive comparison between them in terms of accuracy and computational cost is performed. The ABC SMC scheme of Toni et al. (2009) used a deterministic tolerance schedule; in this thesis, the method is run with an adaptively tuned tolerance schedule. In addition, different choices of tolerance schedule, a predefined target tolerance and the number of particles are investigated. Owen et al. (2015) use the ABC algorithm to initialise the PMCMC algorithm in order to obtain faster convergence. In contrast, in this thesis PMCMC is applied to different synthetic data sets, generated at different parameter settings, to assess the performance of the algorithm. In the pseudo-marginal sampler based on random truncation of Georgoulas et al. (2017), the LV model comprises four reactions, whereas the LV model in this thesis comprises three reactions. This method provides an exact result for the model, which makes it an attractive benchmark for assessing the accuracy of other approximate methods.

This chapter aims to build a stochastic simulation of the LV model. The time series data generated from the model are then used as synthetic data. The chapter also demonstrates inference of the reaction rates of the LV model from a Bayesian perspective, for continuous time Markov chain models with no explicit likelihood available. In addition, the inference methods are compared in terms of accuracy and computational cost.

5.1 The Lotka-Volterra Model (LV)

The LV model can be defined as a stochastic kinetic model of two species with nonnegative integer counts, $x_1$ for the prey and $x_2$ for the predators. The interaction between them within the population can be modelled by the following reaction equations:

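$$
\begin{aligned}
R_1 &: x_1 \rightarrow 2x_1 && \text{prey reproduction} \\
R_2 &: x_1 + x_2 \rightarrow 2x_2 && \text{predator reproduction} \\
R_3 &: x_2 \rightarrow \emptyset && \text{predator death}
\end{aligned}
\tag{5.1}
$$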

The LV model consists of biochemical reactions governed by hazard functions, which depend on the current state of the system as defined in section 2.5.1. The hazard function for the first-order reaction $R_1$ is:

$$h_1(x, c_1) = c_1 x_1.$$

The hazard function of the second-order reaction $R_2$ is determined by the number of combinations of the two species $x_1$ and $x_2$, and is given by:

$$h_2(x, c_2) = c_2 x_1 x_2.$$

For the reaction $R_3$, the hazard is:

$$h_3(x, c_3) = c_3 x_2.$$
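The three hazards can be collected in a single function, as in the following minimal Python sketch. The function name `lv_hazards` and its argument order are choices made for this illustration and are not taken from the thesis.

```python
def lv_hazards(x1, x2, c1, c2, c3):
    """Return (h1, h2, h3) for prey count x1 and predator count x2."""
    h1 = c1 * x1        # R1: prey reproduction, first order in x1
    h2 = c2 * x1 * x2   # R2: predator reproduction, one prey meets one predator
    h3 = c3 * x2        # R3: predator death, first order in x2
    return (h1, h2, h3)
```

For example, with arbitrary illustrative values `x1 = 4`, `x2 = 6`, `c1 = 2`, `c2 = 0.5` and `c3 = 3`, the hazards are 8, 12 and 18. All three hazards vanish when both populations are extinct.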

Having identified the reaction equations and the stochastic hazard functions, the LV model can be defined as a CTMC. The states of the CTMC are labelled with pairs of values (prey and predator counts), so that $x(t) = (x_1(t), x_2(t))$. The transitions of the CTMC correspond to the reaction equations in (5.1).

Figure 5.1: The CTMC that describes the LV model. States are labelled with the number of prey and predators, and transitions are associated with the corresponding reactions. All transitions represented with horizontal arrows have the transition rate $h_1 = c_1 x_1$, transitions represented with vertical arrows have the transition rate $h_3 = c_3 x_2$, and transitions represented with diagonal arrows have the transition rate $h_2 = c_2 x_1 x_2$.

There can be more than one outgoing transition from a given state. The transition to the next state is determined by the minimum of the exponentially distributed waiting times:

$$\text{waiting time} = \min(t_1, t_2, t_3),$$

where $t_1 \sim \mathrm{Exp}(h_1)$, $t_2 \sim \mathrm{Exp}(h_2)$ and $t_3 \sim \mathrm{Exp}(h_3)$. Figure 5.1 shows how the interactions between prey and predators can be modelled by a CTMC.
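The competing-waiting-times construction can be sketched in Python as follows. This is a hedged illustration rather than the simulator used in the thesis: it draws one exponential waiting time per reaction and fires the reaction with the smallest one, which corresponds to what is usually called the first reaction method. The names `lv_step`, `simulate_lv` and `STOICHIOMETRY`, the seed handling, and the treatment of states with all hazards equal to zero are assumptions made for the example.

```python
import random

# Change in (prey, predators) caused by R1, R2 and R3 in (5.1).
STOICHIOMETRY = ((1, 0), (-1, 1), (0, -1))


def lv_step(state, rates, rng):
    """Return (waiting time, next state), or None if no reaction can fire."""
    x1, x2 = state
    c1, c2, c3 = rates
    hazards = (c1 * x1, c2 * x1 * x2, c3 * x2)
    # One Exp(h_j) waiting time for each reaction with a positive hazard.
    waits = [(rng.expovariate(h), j) for j, h in enumerate(hazards) if h > 0]
    if not waits:
        return None  # all hazards are zero: the chain stays in this state
    dt, j = min(waits)  # the reaction with the smallest waiting time fires
    dx1, dx2 = STOICHIOMETRY[j]
    return dt, (x1 + dx1, x2 + dx2)


def simulate_lv(state, rates, t_end, seed=0):
    """Simulate the CTMC up to time t_end; return a list of (time, state)."""
    rng = random.Random(seed)
    t = 0.0
    path = [(t, state)]
    while True:
        step = lv_step(state, rates, rng)
        if step is None:
            break
        dt, new_state = step
        if t + dt > t_end:
            break  # the next event would occur after the time horizon
        t += dt
        state = new_state
        path.append((t, state))
    return path
```

A call such as `simulate_lv((10, 5), (1.0, 0.1, 0.5), t_end=1.0)` returns the jump times and visited states of one trajectory; the initial state and rate constants here are arbitrary illustrative values.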

In document Bayesian inference for continuous time Markov chains (Page 112-118)