Review of Likelihood-Free Methods - Bayesian inference for continuous time Markov chains

The likelihood-free approach is a common way to deal with an intractable model in the Bayesian context. Suppose a stochastic model is used to describe a dynamical system. If we are able to generate data from this model when the model parameter takes some value θ ∈ Θ, then the output of such a system is given by:

y∗ ∼ π(·|θ),

where y∗ ∈ X is the state of the model and π(·|θ) is the likelihood for this model. The main aim is to infer the unknown model parameter from a Bayesian perspective, where the posterior provides all the information about the parameter through the following joint distribution:

π(θ, y∗) ∝ π(y∗|θ)π(θ). (4.1)

A possible method when the likelihood is not available is Approximate Bayesian Computation (ABC). The basic idea of this likelihood-free method is that the target posterior can be obtained through an augmented model. An auxiliary variable is introduced into the model, so that the posterior π(θ|y) ∝ π(y|θ)π(θ) becomes, under this augmentation:

π(θ, y∗|y) ∝ π(y|y∗, θ)π(y∗|θ)π(θ), (4.2)

where the auxiliary variable y∗ ∈ X represents the ith simulated dataset among the data simulated from the model likelihood, y∗ ∼ π(y∗|θ), and y∗ lies in the same state space as y. The function π(y|y∗, θ) is defined to weight the intractable posterior π(θ|y): high values of the weight indicate regions where y∗ and y are similar. In the case that y∗ = y, the function π(y|y∗, θ) is taken to be constant with respect to the parameter θ, that is, π(y|y∗, θ) = c. Conversely, low values of the weight indicate regions where the datasets y∗ and y are dissimilar. The main interest lies in the marginal augmented posterior:

πABC(θ|y) ∝ ∫ π(y|y∗, θ)π(y∗|θ)π(θ) dy∗. (4.3)

The marginalisation is performed by integrating out the auxiliary simulated data y∗. The resulting posterior πABC(θ|y) is considered as an approximation to the target posterior π(θ|y).

Chapter 4. Approximate Bayesian Computation 79

πABC(θ|y) ∝ (π(θ)/N) Σ_{i=1}^{N} π(y|y∗_i, θ), (4.6)

where y∗_1, ..., y∗_N ∼ π(y∗|θ) represents N simulated datasets from the model. The approximated expectation in (4.6) was initially introduced by Marjoram et al. (2003); it was then studied and used by Toni et al. (2009).
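To make the Monte Carlo approximation concrete, the following is a minimal sketch rather than the implementation used in this work. It assumes a toy model in which each dataset consists of independent Normal(θ, 1) draws, and it assumes an indicator weight that equals 1 when the Euclidean distance between simulated and observed data is at most a tolerance eps. The names simulate, euclidean and mc_weight, and all numerical values, are hypothetical.

```python
import math
import random


def simulate(theta, n, rng):
    """Toy model (an assumption): n independent draws from Normal(theta, 1)."""
    return [rng.gauss(theta, 1.0) for _ in range(n)]


def euclidean(y, y_star):
    """Euclidean distance between two datasets of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_star)))


def mc_weight(theta, y_obs, eps, n_sims, rng):
    """Average an indicator weight over n_sims datasets simulated at theta.

    The weight of a simulated dataset is 1 if it lies within distance eps
    of the observed data and 0 otherwise (an assumed choice of weight).
    """
    hits = 0
    for _ in range(n_sims):
        y_star = simulate(theta, len(y_obs), rng)
        if euclidean(y_obs, y_star) <= eps:
            hits += 1
    return hits / n_sims


rng = random.Random(0)          # fixed seed so the sketch is reproducible
y_obs = [0.1, -0.3, 0.4]        # hypothetical observed data

# Estimated weight at a parameter value near the data and at one far away.
w_near = mc_weight(0.0, y_obs, eps=1.5, n_sims=2000, rng=rng)
w_far = mc_weight(5.0, y_obs, eps=1.5, n_sims=2000, rng=rng)
```

Multiplying the returned average by the prior density at θ gives an unnormalised estimate of the approximate posterior at that value. Parameter values that tend to reproduce the observed data receive a larger estimate than values that do not.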

Typically, two main approaches appear in the literature for simulating from the posterior πABC(θ|y) under the likelihood-free concept. The first aims to simulate from the augmented model π(θ, y∗|y), where the joint samples (y∗, θ) are obtained before the posterior is marginalised. The second works directly on the marginal space πABC(θ|y) through Monte Carlo integration, as shown in (4.6). The second approach is the one used throughout this thesis.

ABC approaches have developed and gained popularity rapidly since they were first introduced in population genetics by Tavaré et al. (1997) and Pritchard et al. (1999); Tavaré et al. (1997) also introduced likelihood-free approaches into the Bayesian literature. Research then focused on the development of ABC within the likelihood-free framework. A recent development was an adaptation of the ABC algorithm, which was used to handle an inference problem in statistical genetics. A sequence of developments then followed within the basic likelihood-free setting. A significant one replaced the direct comparison of the datasets in the weighting function with a tolerance level; this introduced the concept of approximation, with the resulting sample considered to come from the approximated posterior (Beaumont et al., 2002; Marjoram et al., 2003; Marin et al., 2012).

Further developments were introduced by Wilkinson (2013), who showed that ABC can provide an exact result under a model error assumption; more details of this method can be found in Wilkinson (2013). A major development in ABC methodology was made by Sisson et al. (2007a) and Peters et al. (2012), who proposed a novel approach using sequential Monte Carlo (SMC) within the ABC method. Several algorithms were developed to obtain a more efficient scheme for ABC than the standard rejection technique (Del Moral et al., 2006). There have also been various developments aimed at improving the efficiency of ABC SMC, taking into consideration the setting of the tolerance level and the choice of perturbation kernel (Filippi et al., 2013; Del Moral et al., 2012; Sisson et al., 2007a). The ABC SMC algorithm, its recent developments and its design are described in detail in this chapter.

The ABC approach is based on proposing a new candidate parameter θ′; given this candidate value, a new dataset is simulated from the model, y∗ ∼ π(y∗|θ′). If the simulated dataset is equal or "close enough" to the observed one, y∗ ≈ y, then the candidate parameter is accepted into the sample from the posterior π(θ|y). If the equality condition is satisfied, the proposed parameter is considered to be drawn from the exact posterior distribution; when the datasets are "close enough" but not equal, the proposed parameter is drawn from the approximated posterior distribution. To carry out this inference procedure, the following essential ingredients are required:

1. Ability to simulate a dataset: the key requirement for applying ABC, or any likelihood-free approach, is the ability to generate synthetic data y∗ from the model π(y∗|θ′). The essential term in Bayesian inference, the likelihood π(y|θ′), is replaced by an approximation based on the closeness of the simulated dataset to the observed dataset. Hence, the main component of the ABC method is the ability to simulate from the model.

2. Distance metric: once a synthetic dataset has been obtained from the model, ABC uses a distance metric to measure the distance between the simulated and observed data. A popular choice is the Euclidean distance, which is defined as follows:

ρ(y, y∗) = √( Σ_{i=1}^{N} (y_i − y∗_i)² ).

The metric ρ(·, ·) must satisfy the standard properties:

(a) ∀(y∗, y), ρ(y∗, y) ≥ 0

(b) ∀(y∗, y), ρ(y∗, y) = 0 iff y∗ = y

(c) ∀(y∗, y), ρ(y∗, y) = ρ(y, y∗)

(d) ∀(y, y∗, z), ρ(y∗, z) ≤ ρ(y∗, y) + ρ(y, z).

If ρ(y, y∗) = 0, the simulated data exactly match the observed data, and the corresponding sample θ′ is drawn from the exact posterior distribution.


3. Weighting function: having specified the distance function and measured the difference between the observed and synthetic datasets, the next step is to define the weighting function, or indicator function, π(y|y∗_i, θ). Numerous definitions have been proposed; one uses a direct comparison of both datasets:

π(y|y∗_i, θ) ∝ 1 if y∗_i = y, and 0 if y∗_i ≠ y. (4.7)

However, an exact comparison is impractical in most cases, as obtaining even one accepted sample can require a huge computational cost. Therefore, to obtain a practical sampler, the definition in (4.7) is relaxed by using the distance function and introducing a tolerance level ε, so that it can be expressed as follows:

π(y|y∗_i, θ) ∝ 1 if ρ(y∗_i, y) ≤ ε, and 0 if ρ(y∗_i, y) > ε. (4.8)

4. Tolerance schedule: the choice of the tolerance schedule plays an important role in the efficiency of ABC. Under the distance metric, the approximation tends to the exact posterior as ε goes to zero:

lim_{ε→0} π(θ|ρ(y∗, y) < ε) = π(θ|y).

A small ε increases the accuracy of the approximation, but the computational cost also increases. In contrast, large values of ε lead to an imprecise approximation. The limiting behaviour of the approximated posterior is:

if ε → 0, πABC(θ|y) → π(θ|y),

while

if ε → ∞, πABC(θ|y) → π(θ).

It is clear from the above that the accuracy of the resulting approximated posterior depends on the value of ε. A deterministic tolerance can be employed with the rejection algorithm and with the Sequential Monte Carlo Approximate Bayesian Computation (ABC SMC) algorithm (described later in this chapter). A deterministic, fixed sequence of tolerances ε_t can also be used. However, an effective alternative is an adaptive tolerance that is updated through the iterations. This adaptive tolerance method will be discussed in more detail in section 4.5.1.
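The four ingredients above can be combined into a basic rejection sampler. The sketch below is illustrative only and makes several assumptions that are not taken from this chapter: a toy model with independent Normal(θ, 1) observations, a Uniform(−5, 5) prior, a single fixed tolerance, and hypothetical function names and data values.

```python
import math
import random
import statistics


def simulate(theta, n, rng):
    """Ingredient 1: toy simulator (an assumption), n draws from Normal(theta, 1)."""
    return [rng.gauss(theta, 1.0) for _ in range(n)]


def distance(y, y_star):
    """Ingredient 2: Euclidean distance between observed and simulated data."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_star)))


def abc_rejection(y_obs, eps, n_accept, rng, max_tries=1_000_000):
    """Rejection ABC with an indicator weight and a fixed tolerance eps.

    Returns the accepted parameter values and the number of proposals used.
    """
    accepted = []
    tries = 0
    while len(accepted) < n_accept and tries < max_tries:
        tries += 1
        theta = rng.uniform(-5.0, 5.0)             # propose from the assumed prior
        y_star = simulate(theta, len(y_obs), rng)  # simulate a synthetic dataset
        # Ingredients 3 and 4: weight 1 if within the tolerance, 0 otherwise.
        if distance(y_obs, y_star) <= eps:
            accepted.append(theta)
    return accepted, tries


rng = random.Random(1)       # fixed seed so the sketch is reproducible
y_obs = [1.8, 2.3, 2.1]      # hypothetical observed data

loose, tries_loose = abc_rejection(y_obs, eps=4.0, n_accept=300, rng=rng)
tight, tries_tight = abc_rejection(y_obs, eps=1.0, n_accept=300, rng=rng)

# Spread and location of the accepted samples under each tolerance.
sd_loose = statistics.pstdev(loose)
sd_tight = statistics.pstdev(tight)
mean_tight = statistics.fmean(tight)
```

Running the sampler with two tolerances illustrates the trade-off described in item 4: the smaller tolerance needs more proposals to collect the same number of accepted samples, while its accepted values concentrate more closely around parameter values consistent with the observed data.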
