Application of the ABC SMC - Inference Based on Synthetic Data

6.4 Inference Based on Synthetic Data

6.4.2 Application of the ABC SMC

The ABC SMC algorithm (described in chapter 4) was run for each data set generated from the Repressilator model according to Table 6.1. The target tolerance for the ABC SMC algorithm was selected as described in section 4.5.4. We randomly selected parameter values from the prior and simulated 2000 data sets from the model. We then computed the Euclidean distances between these data sets and chose the 1st percentile of these distances (for one fixed value of the model parameters) as our target tolerance. The resulting target tolerance for this study is 20.9.
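The tolerance selection described above can be sketched as follows. This is only one reading of the procedure, not the code used in this study: `toy_simulator` is a hypothetical stand-in for the stochastic Repressilator simulator, the parameter ranges used to draw `theta` are made up, and the example uses 200 rather than 2000 data sets to keep it fast.

```python
import numpy as np

def toy_simulator(theta, rng, n_obs=50):
    # Hypothetical stand-in for the Repressilator simulator (an assumption):
    # a noisy sine wave whose amplitude and frequency are given by theta.
    amplitude, frequency = theta
    t = np.linspace(0.0, 10.0, n_obs)
    return amplitude * np.sin(frequency * t) + rng.normal(0.0, 1.0, size=n_obs)

def target_tolerance(theta, simulate, rng, n_sets=2000, percentile=1.0):
    # Simulate n_sets data sets at one fixed parameter value.
    data = np.stack([simulate(theta, rng) for _ in range(n_sets)])
    # Pairwise squared Euclidean distances via the Gram matrix.
    sq_norms = np.sum(data**2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (data @ data.T)
    # Keep each unordered pair once (strict upper triangle).
    upper = np.triu_indices(n_sets, k=1)
    dists = np.sqrt(np.maximum(sq_dists[upper], 0.0))
    # The chosen percentile of the distances is the target tolerance.
    return float(np.percentile(dists, percentile))

rng = np.random.default_rng(0)
# Illustrative parameter draw; these ranges are not the priors of this chapter.
theta = (rng.uniform(0.0, 5.0), rng.uniform(0.5, 2.0))
eps_target = target_tolerance(theta, toy_simulator, rng, n_sets=200)
```

Because the distances are computed between simulations at the same parameter value, the resulting tolerance reflects the intrinsic stochastic variability of the model rather than the spread of the prior.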

In all experiments, the tolerance schedule was chosen adaptively, as the 0.5 quantile (the median) of the discrepancies between the observed and the simulated data sets. The prior distributions for each parameter are defined as previously:

π(α) ∼ Uniform(0, 2500),   π(α0) ∼ Uniform(0, 2),

π(n) ∼ Uniform(1, 3),   π(β) ∼ Uniform(0, 8).
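As a minimal sketch of these two ingredients, the uniform priors and the quantile-based tolerance update could be written as below. The names `PRIOR_BOUNDS`, `sample_prior` and `next_tolerance` are illustrative, the discrepancies in the usage lines are made-up numbers, and flooring the schedule at the target tolerance is an assumption rather than something stated in the text.

```python
import numpy as np

# Prior bounds as listed above; the dictionary layout is an illustrative choice.
PRIOR_BOUNDS = {
    "alpha": (0.0, 2500.0),
    "alpha0": (0.0, 2.0),
    "n": (1.0, 3.0),
    "beta": (0.0, 8.0),
}

def sample_prior(n_particles, rng):
    # Draw each parameter independently from its uniform prior.
    return {
        name: rng.uniform(low, high, size=n_particles)
        for name, (low, high) in PRIOR_BOUNDS.items()
    }

def next_tolerance(distances, target, quantile=0.5):
    # Adaptive schedule: the chosen quantile of the current discrepancies.
    # Assumption: the schedule is not allowed to go below the target tolerance.
    return max(float(np.quantile(distances, quantile)), target)

rng = np.random.default_rng(1)
particles = sample_prior(8000, rng)
# Made-up discrepancies, only to exercise the function.
distances = rng.gamma(shape=2.0, scale=50.0, size=8000)
eps = next_tolerance(distances, target=20.9)
```

With a quantile level of 0.5, roughly half of the current population already satisfies the next tolerance, which keeps the acceptance rate of the following stage manageable.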

In the first experiment, we consider the first data set simulated in section 6.3. This is the same data set as was used for exact inference in section 6.4.1. The data were simulated using the kinetic parameters θ = {α = 1000, α0 = 1, n = 2, β = 1}.

The resulting approximate posteriors obtained using the ABC SMC algorithm for the first synthetic data set are depicted in Figure 6.7. Being an overdispersed approximation to the exact posterior, the ABC SMC result is wider, but it still has substantial posterior support at the parameter values used for data generation. The marginal posteriors for α, α0 and β are recovered reasonably well, while the posterior for n is biased in comparison to the exact posterior. This may be due to the nontrivial and nonlinear contribution of n to the behaviour of the system, as it enters as the power parameter controlling inhibition affinity. The approximate and the exact posteriors overlap, but their maximum a posteriori estimates are still quite different.

Figure 6.8 depicts the ESS for the ABC SMC algorithm with N = 8000 particles. The ESS never drops below the threshold of 4000 particles, so population resampling was never needed. This demonstrates that a healthy population is maintained throughout the run of the sampler.

Figure 6.7: The posterior distribution of the Repressilator model parameters (α, α0, n and β) obtained using the ABC SMC algorithm. The dashed line represents the prior distribution.

For the second experiment, we used a synthetic data set generated using very small parameter values, θ = {α = 1, α0 = 0.001, n = 2, β = 0.2}. This was done to investigate whether the ABC SMC algorithm explores the extremes of the parameter space well when starting from a wide prior.

The approximate posterior distribution for this case was obtained through a sequence of 24 distributions, depicted in Figure 6.9 (initial distribution), Figure 6.10 (intermediate distribution) and Figure 6.11 (posterior distribution). Figure 6.12 demonstrates how the marginal posteriors evolve along the sequence of ABC approximations in the ABC SMC algorithm. The posterior is remarkably different from the prior. The marginal posterior distributions of α and β collapse to tight distributions around the values used for data generation. The marginal posterior distribution for α0 shows moderate divergence from the prior, while the marginal posterior for n hardly diverges from the prior at all, emphasising the difficulty of inferring n in this nonlinear model.

Figure 6.13 depicts the ESS for this experiment, using a population of N = 8000 particles. The ESS dropped below the threshold of 4000 particles only once, and we resampled the population in that case. Again, this shows that reasonable diversity is maintained in the population, and we do not observe any population degeneracy problems.
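The ESS diagnostic and the resampling rule can be illustrated with a short sketch. It assumes the common estimate ESS = 1 / Σ wᵢ² for normalised weights and multinomial resampling; the text above does not state which estimator or resampling scheme was used, so both are assumptions, as are the function names. The default threshold of N/2 matches the 4000-particle threshold for N = 8000.

```python
import numpy as np

def effective_sample_size(weights):
    # ESS = 1 / sum(w_i^2) for normalised weights (assumed estimator).
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return 1.0 / np.sum(w**2)

def resample_if_needed(particles, weights, rng, threshold_fraction=0.5):
    # Resample only when the ESS falls below threshold_fraction * N.
    n = len(weights)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    if effective_sample_size(w) >= threshold_fraction * n:
        return particles, w, False
    # Multinomial resampling (an assumption), followed by a weight reset.
    idx = rng.choice(n, size=n, replace=True, p=w)
    return particles[idx], np.full(n, 1.0 / n), True

rng = np.random.default_rng(2)
particles = rng.uniform(0.0, 8.0, size=8000)   # illustrative one-dimensional particles
weights = rng.exponential(size=8000)           # made-up importance weights
particles, weights, resampled = resample_if_needed(particles, weights, rng)
```

Equal weights give an ESS of N, while a population dominated by a single particle gives an ESS close to 1, which is the degeneracy the threshold is meant to catch.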

The population size is one of the key parameters of the ABC SMC algorithm. We investigated the impact of the population size by running the algorithm independently with N = {4000, 2000, 1000, 500}. For N = 1000, the ESS dropped below the resampling threshold only a few times (5 times at worst), while for N = 500 resampling was required at the majority of the stages, as shown in Figure 6.14. Judging by these diagnostics, it is reasonable to use N = 1000 particles, as this maintains a healthy population while requiring less computational time than our original setting of N = 8000. The algorithm required 25 stages to reach the target tolerance for N = {8000, 4000}, and 24 stages for N = {2000, 1000, 500}.

The marginal posterior distributions obtained using different population sizes are compared in Figure 6.15 and Figure 6.16. The results are remarkably similar, which confirms that the algorithm converges to the same distribution. In the case N = 500, the result is still acceptable despite the worse performance of the ESS. This confirms that the resampling stage of the algorithm mitigates the problem correctly, and the population never collapses to a single particle (or a few particles).

Figure 6.8: The plot shows the ESS obtained from performing the ABC SMC algorithm on the first synthetic data set (Experiment 1), with N = 8000 particles.

Additionally, we performed parameter inference for all nine synthetic data sets discussed in section 6.3. The results are similar to the case discussed above. For example, for the experiment with θ = {α = 100, α0 = 1, n = 2, β = 1}, we repeated the inference independently three times using different random number generator seeds; the resulting marginal posteriors are depicted in Figure 6.17. This demonstrates that the sampler converges to the same target distribution every time. The same convergence property was observed for all nine data sets.

The computational cost of performing ABC SMC is significantly impacted by the size of the population. The importance distribution density (i.e. the density of the perturbation kernel) needs to be evaluated for every proposed particle as a mixture of normal distributions with as many components as there are particles. The cost of doing this at every stage is, therefore, O(N²). The time required to run the ABC SMC algorithm with different population sizes is compared in Figure 6.18. The box plots are produced using the results for all nine data sets considered in this chapter. As expected, larger populations take longer to sample.
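The quadratic cost of the mixture evaluation can be made concrete with a small sketch. It assumes independent normal perturbations with a common per-parameter standard deviation `kernel_sd`; the kernel actually used is the one described in chapter 4, and the function name, the particle values and the standard deviation of 0.3 are illustrative only.

```python
import numpy as np

def kernel_mixture_density(proposed, previous, prev_weights, kernel_sd):
    """Density of each proposed particle under a weighted mixture of normals.

    proposed: (N, d) array, previous: (M, d) array,
    prev_weights: (M,) normalised weights, kernel_sd: (d,) standard deviations.
    """
    # All N x M standardised differences: the O(N^2) step when M == N.
    z = (proposed[:, None, :] - previous[None, :, :]) / kernel_sd
    # Log normalising constant of a d-dimensional diagonal normal density.
    log_norm = np.sum(np.log(kernel_sd)) + 0.5 * proposed.shape[1] * np.log(2.0 * np.pi)
    kernel = np.exp(-0.5 * np.sum(z**2, axis=2) - log_norm)  # shape (N, M)
    # Weighted sum over the M mixture components.
    return kernel @ prev_weights

rng = np.random.default_rng(3)
n_particles, dim = 500, 4
previous = rng.normal(size=(n_particles, dim))         # illustrative previous population
weights = np.full(n_particles, 1.0 / n_particles)
sd = np.full(dim, 0.3)                                 # illustrative kernel scale
proposed = previous + rng.normal(scale=sd, size=(n_particles, dim))
density = kernel_mixture_density(proposed, previous, weights, sd)
```

Under this scaling, reducing the population from N = 8000 to N = 1000 cuts the number of kernel evaluations per stage by a factor of 64, although this covers only the weight computation and not the cost of simulating the model. The vectorised form also stores an N × N × d array, so for large N the differences would need to be processed in chunks.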

Figure 6.9: The distribution of the model parameters at the first stage of the ABC SMC algorithm. All the samples at this stage come from the prior distribution.

In document Bayesian inference for continuous time Markov chains (Page 172-176)