ABC Rejection - Bayesian inference for continuous time Markov chains

The ABC rejection procedure is based on two steps. First, a new value θ0 for the model parameter is proposed from the prior distribution. Then, data y∗ are simulated from the model given the sampled parameter θ0 and compared with the observed data y.

The proposed value is accepted if the simulated data are exactly equal to the observed data, and rejected otherwise. The accepted candidate values then form a sample from the exact posterior distribution. This procedure is known as the likelihood-free rejection algorithm, or the perfect rejection sampling algorithm, and is summarised as follows:

Algorithm 10 Perfect rejection sampling algorithm

1: Propose a parameter θ0 from the prior: θ0 ∼ π(θ).

2: Simulate data y∗ from the model using the sampled parameter θ0, y∗ ∼ π(·|θ0).

3: If y∗ = y, accept the sampled parameter θ0; otherwise, reject it.
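A minimal Python sketch of Algorithm 10 follows. It assumes, purely for illustration, a single observed count y from a Binomial(n, θ) model with a Uniform(0, 1) prior on θ; the function name `perfect_rejection_binomial` and all parameter values are hypothetical and do not come from the thesis. Because the data are discrete, the exact match y∗ = y occurs with non-zero probability, and the accepted draws can be checked against the known conjugate posterior Beta(y + 1, n − y + 1).

```python
import random


def perfect_rejection_binomial(y_obs, n_trials, n_samples, seed=0):
    """Perfect rejection sampling (Algorithm 10) for a hypothetical
    Binomial(n_trials, theta) model with a Uniform(0, 1) prior on theta.

    Returns the accepted parameter values and the number of proposals made.
    """
    rng = random.Random(seed)
    accepted = []
    n_proposals = 0
    while len(accepted) < n_samples:
        n_proposals += 1
        # Step 1: propose theta0 from the prior, here Uniform(0, 1).
        theta0 = rng.random()
        # Step 2: simulate y* ~ Binomial(n_trials, theta0) as a sum of Bernoulli draws.
        y_sim = sum(rng.random() < theta0 for _ in range(n_trials))
        # Step 3: accept theta0 only if the simulated data equal the observed data.
        if y_sim == y_obs:
            accepted.append(theta0)
    return accepted, n_proposals


if __name__ == "__main__":
    samples, n_proposals = perfect_rejection_binomial(y_obs=7, n_trials=10, n_samples=2000)
    print("posterior mean (accepted draws):", sum(samples) / len(samples))
    print("posterior mean (exact Beta(8, 4)):", 8 / 12)
    print("acceptance rate:", len(samples) / n_proposals)
```

Under a uniform prior, the prior predictive distribution of a binomial count is uniform on {0, …, n}, so in this setup the acceptance rate should be close to 1/(n + 1), i.e. roughly n + 1 simulations per accepted draw.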

This algorithm is exact. To see this, denote by πABC(θ0, y∗) the joint distribution of the proposed parameter θ0 and the accepted simulated data y∗, so that:

$$\pi_{ABC}(\theta_0, y^*) = \pi(y^* \mid \theta_0)\,\pi(\theta_0)\,\mathbb{1}_y(y^*),$$

where the indicator function 1_y(y∗), which equals 1 if y∗ = y and 0 otherwise, plays the role of the weighting function π(y|y∗, θ) in equation (4.7).

When the simulated data equal the observed data, the indicator function is 1_y(y∗) = 1. Since we are interested in the marginal distribution of the parameter, the simulated data y∗ are marginalised out:

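$$\pi_{LF}(\theta_0) = \int \pi_{ABC}(\theta_0, y^*)\,dy^* = \int \pi(y^* \mid \theta_0)\,\pi(\theta_0)\,\mathbb{1}_y(y^*)\,dy^* = \pi(y \mid \theta_0)\,\pi(\theta_0) \propto \pi(\theta_0 \mid y),$$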

which shows that every accepted candidate θ0 is a sample from the exact posterior distribution.
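As a worked instance of this derivation, assume (hypothetically) a single discrete observation y from a Binomial(n, θ) model with a Uniform(0, 1) prior, so that π(θ0) = 1 on [0, 1]. Summing the joint distribution over all possible simulated values y∗ ∈ {0, …, n} leaves only the term y∗ = y, which recovers the familiar conjugate posterior:

```latex
\begin{aligned}
\pi_{LF}(\theta_0)
  &= \sum_{y^*=0}^{n} \binom{n}{y^*} \theta_0^{y^*} (1-\theta_0)^{n-y^*}\,\mathbb{1}_y(y^*)
   = \binom{n}{y} \theta_0^{y} (1-\theta_0)^{n-y} \\
  &\propto \theta_0^{(y+1)-1} (1-\theta_0)^{(n-y+1)-1},
  \qquad \text{i.e. } \theta_0 \mid y \sim \mathrm{Beta}(y+1,\; n-y+1).
\end{aligned}
```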

The likelihood-free rejection sampling algorithm is feasible only if the simulated data equal the observed data with non-zero probability. Such exact matches are rare and can occur only when the data are discrete; for continuous data, exact equality occurs with probability zero, so no proposal would ever be accepted. Even for discrete data, the main disadvantage of the algorithm is its low acceptance rate: a large number of simulations from the model is needed to obtain a sample large enough to represent the posterior distribution.
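To make the acceptance-rate problem concrete, the sketch below computes the exact acceptance probability of Algorithm 10, which equals the prior predictive probability π(y) = ∫ π(y|θ)π(θ) dθ of the observed data. It assumes a hypothetical model of m independent Bernoulli(θ) observations containing k successes, with a Uniform(0, 1) prior, and the function names are illustrative. Matching the whole ordered sequence has probability B(k + 1, m − k + 1) = 1/((m + 1)·C(m, k)), whereas matching only the number of successes has probability 1/(m + 1); since the count is a sufficient statistic here, matching it still yields the exact posterior.

```python
from math import comb


def accept_prob_full_sequence(m, k):
    """Exact acceptance probability when the simulated Bernoulli sequence must
    equal the observed ordered sequence (m trials, k successes) under a
    Uniform(0, 1) prior (hypothetical example).

    Integral of theta^k (1 - theta)^(m - k) over [0, 1] = 1 / ((m + 1) * C(m, k)).
    """
    return 1.0 / ((m + 1) * comb(m, k))


def accept_prob_count_only(m):
    """Exact acceptance probability when only the number of successes must match.

    Under a Uniform(0, 1) prior the count is uniform on {0, ..., m}.
    """
    return 1.0 / (m + 1)


if __name__ == "__main__":
    for m in (10, 20, 40):
        k = m // 2
        p_seq = accept_prob_full_sequence(m, k)
        p_cnt = accept_prob_count_only(m)
        # 1 / p is the expected number of simulations per accepted sample.
        print(f"m={m:3d}  full sequence: 1 in {1 / p_seq:,.0f}   count only: 1 in {1 / p_cnt:,.0f}")
```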

Because of these restrictions of perfect rejection sampling, another ABC approach was proposed that mitigates the problem of low acceptance rates by modifying the acceptance condition. It introduces a distance function ρ(·, ·) that measures the discrepancy between simulated and observed data, and replaces the exact equality of the previous algorithm with a tolerance level ε > 0 on this distance. If the distance is below ε, the proposed parameter θ0 is accepted as a sample from the approximate posterior distribution; otherwise, it is discarded. The algorithm is summarised below:

Algorithm 11 ABC rejection sampling algorithm

1: Propose a parameter θ0 from the prior: θ0 ∼ π(θ)

2: Simulate data y∗ from the model using the sampled parameter θ0, y∗ ∼ π(·|θ0).

3: If ρ(y∗, y) < ε, accept the sampled parameter θ0; otherwise, reject it.

The accepted parameter θ0 and the simulated data y∗ are jointly distributed according to πABC(θ0, y∗|y), given by:

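$$\pi_{ABC}(\theta_0, y^* \mid y) = \pi\big(\theta_0, y^* \mid \rho(y^*, y) < \epsilon\big) \propto \pi(y^* \mid \theta_0)\,\pi(\theta_0)\,\mathbb{1}_{\rho(y^*, y) < \epsilon},$$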

where the indicator function corresponds to the weighting function defined in equation (4.8). Marginalising out the simulated data gives the approximate marginal posterior distribution:

$$\pi_{ABC}(\theta_0) = \int \pi_{ABC}(\theta_0, y^* \mid y)\,dy^*.$$
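Algorithm 11 can be sketched in Python as below. This is an illustration under stated assumptions, not the thesis's own implementation: the model (n = 20 observations from a Normal(μ, 1) distribution), the Uniform(−5, 5) prior, and the choice of ρ as the absolute difference between sample means are all hypothetical. Comparing sample means rather than full datasets is a simplifying choice; for this model the sample mean is sufficient for μ, so the ABC posterior approaches the exact posterior, approximately N(ȳ, 1/n), as ε → 0.

```python
import random
import statistics


def abc_rejection(y_obs, prior_sample, simulate, distance, eps, n_samples, rng,
                  max_proposals=1_000_000):
    """ABC rejection sampling (Algorithm 11).

    prior_sample(rng) -> theta, simulate(theta, rng) -> dataset,
    distance(y_sim, y_obs) -> float. Returns accepted thetas and proposal count
    (fewer than n_samples thetas if max_proposals is reached first).
    """
    accepted = []
    n_proposals = 0
    while len(accepted) < n_samples and n_proposals < max_proposals:
        n_proposals += 1
        theta0 = prior_sample(rng)            # step 1: theta0 ~ pi(theta)
        y_sim = simulate(theta0, rng)         # step 2: y* ~ pi(. | theta0)
        if distance(y_sim, y_obs) < eps:      # step 3: accept if rho(y*, y) < eps
            accepted.append(theta0)
    return accepted, n_proposals


# Hypothetical example: y_i ~ Normal(mu, 1), i = 1..20, prior mu ~ Uniform(-5, 5).
N_OBS = 20


def prior_sample(rng):
    return rng.uniform(-5.0, 5.0)


def simulate(mu, rng):
    return [rng.gauss(mu, 1.0) for _ in range(N_OBS)]


def mean_distance(y_sim, y_obs):
    # rho: absolute difference between sample means (an assumed choice of distance).
    return abs(statistics.fmean(y_sim) - statistics.fmean(y_obs))


if __name__ == "__main__":
    rng = random.Random(42)
    y_obs = simulate(2.0, rng)  # synthetic "observed" data with true mu = 2
    for eps in (1.0, 0.1):
        post, n_prop = abc_rejection(y_obs, prior_sample, simulate, mean_distance,
                                     eps, 500, rng)
        print(f"eps={eps}: acceptance rate {len(post) / n_prop:.3f}, "
              f"posterior mean {statistics.fmean(post):.3f}, "
              f"sd {statistics.stdev(post):.3f}")
```

In this toy setting, the ABC posterior variance is roughly 1/n + ε²/3 and the acceptance rate is roughly 2ε/10 (the prior width is 10), so shrinking ε sharpens the posterior at the price of proportionally more simulations.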

In this algorithm, the simulated data are marginalised out, so they need not be stored in order to obtain a sample from the posterior distribution of the parameter. The performance of the ABC scheme depends on the choice of the tolerance ε: a small ε improves the approximation of the posterior but increases the computational cost, because fewer proposals are accepted. Conversely, a large ε gives a poor approximation of the posterior, since most values proposed from the prior are accepted. A suitable tolerance therefore balances accuracy against computational time (McKinley et al., 2009).

Wilkinson (2013) proposes improving the inference by explicitly accounting for an error ε, which is either a measurement error on the data y or an error in the model π(·|θ). Assume that this error has density π_ε(·), that is, ε ∼ π_ε(·). The posterior can then be described in terms of this error, and the ABC rejection algorithm becomes:

Algorithm 12 Generalised ABC (GABC)

1: Draw θ0 from the prior π(θ)

2: Simulate the data y∗ from the model π(·|θ0)

3: Accept θ0 into the posterior sample with probability π_ε(y − y∗)/c.

As proposed by Wilkinson (2013), the constant c is chosen so that π_ε(y − y∗)/c is a valid probability, which requires c ≥ max_ε π_ε(ε). When the error density attains its maximum at ε = 0, as for a zero-mean Gaussian error, taking c = π_ε(0) satisfies this bound and maximises the acceptance rate of the algorithm. For more information on this method, see Wilkinson (2013).
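A minimal sketch of Algorithm 12 follows, assuming (hypothetically) a single scalar observation y from a Normal(θ, 1) model, a Uniform(−10, 10) prior, and a zero-mean Gaussian error density π_ε with standard deviation τ. Because this density peaks at zero, the sketch takes c = π_ε(0), so the acceptance probability π_ε(y − y∗)/c reduces to exp(−(y − y∗)²/(2τ²)). Under this setup the accepted draws target the posterior of the model-plus-error, in which y ~ Normal(θ, 1 + τ²); with the wide prior this is approximately Normal(y, 1 + τ²). The function name and values are illustrative only.

```python
import math
import random
import statistics


def gabc(y_obs, tau, n_samples, rng, prior_low=-10.0, prior_high=10.0):
    """Generalised ABC (Algorithm 12) for a hypothetical model y ~ Normal(theta, 1)
    with Gaussian error density pi_eps = Normal(0, tau^2) and c = pi_eps(0)."""
    accepted = []
    n_proposals = 0
    while len(accepted) < n_samples:
        n_proposals += 1
        theta0 = rng.uniform(prior_low, prior_high)  # step 1: draw from the prior
        y_sim = rng.gauss(theta0, 1.0)               # step 2: simulate y* ~ pi(. | theta0)
        # step 3: accept with probability pi_eps(y - y*) / pi_eps(0)
        accept_prob = math.exp(-((y_obs - y_sim) ** 2) / (2.0 * tau ** 2))
        if rng.random() < accept_prob:
            accepted.append(theta0)
    return accepted, n_proposals


if __name__ == "__main__":
    rng = random.Random(3)
    post, n_prop = gabc(y_obs=1.5, tau=0.5, n_samples=2000, rng=rng)
    print("acceptance rate:", len(post) / n_prop)
    print("posterior mean:", statistics.fmean(post),
          "variance:", statistics.variance(post))
```

Unlike Algorithm 11, no proposal is rejected outright for being slightly off; instead, acceptance decays smoothly with the discrepancy y − y∗, and τ plays the role that the tolerance ε plays in the hard-threshold version.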

However, Toni et al. (2009) note that the ABC rejection method still suffers from a low acceptance rate and poor accuracy, especially when the prior and the posterior differ substantially. Moreover, Wilkinson (2013) points out that repeatedly sampling from the prior can make the rejection algorithm inefficient.
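The effect of a mismatch between prior and posterior on the acceptance rate can be quantified exactly in a simple discrete case. The sketch below assumes (hypothetically) one observation y from a Binomial(n, θ) model with a Beta(a, b) prior. For perfect rejection sampling (the ε = 0 case), the acceptance probability equals the prior predictive probability of y, which here is the beta-binomial probability C(n, y)·B(y + a, n − y + b)/B(a, b). The helper names and prior settings are illustrative.

```python
from math import comb, exp, lgamma


def log_beta(a, b):
    # log B(a, b) = log Gamma(a) + log Gamma(b) - log Gamma(a + b)
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def acceptance_prob(y, n, a, b):
    """Prior predictive probability pi(y), i.e. the acceptance probability of
    perfect rejection sampling, for hypothetical Binomial(n, theta) data with a
    Beta(a, b) prior (the beta-binomial probability mass function)."""
    return comb(n, y) * exp(log_beta(y + a, n - y + b) - log_beta(a, b))


if __name__ == "__main__":
    y, n = 9, 10
    for a, b, label in [(1, 1, "flat prior"),
                        (20, 2, "prior agreeing with the data"),
                        (2, 20, "prior conflicting with the data")]:
        p = acceptance_prob(y, n, a, b)
        print(f"{label:32s} acceptance {p:.2e}, ~{1 / p:,.0f} simulations per accepted draw")
```

A prior concentrated far from the region supported by the data makes almost every proposal fail, which is the inefficiency of repeatedly sampling from the prior described above.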

Chapter 4. Approximate Bayesian Computation

4.4 Sequential Approximate Bayesian Computation
