Approximate Bayesian Computation - Bayesian approaches to parameter estimation

5.3 Bayesian approaches to parameter estimation

5.3.1 Approximate Bayesian Computation

Approximate Bayesian Computation (ABC) has been developed over the last 10 years in population genetics to estimate parameters of interest when the likelihood cannot be evaluated or is computationally intractable. Given a sample of SNP data with an unknown genealogy, the likelihood function requires integrating over all possible genealogies. Figure 1.3 shows that in a sample of size 6 from a neutral model, there are 2700 possible branching structures. For a parameter φ to be estimated, this method computes a set of summary statistics, denoted by S_obs, from the observed data and then, loosely speaking, simulates data under the proposed model for a range of values of φ, computing S_sim each time, and accepts values of φ when η(S_obs, S_sim) ≈ 0 for some distance measure η(·, ·). This replaces the full likelihood function p(x|φ) by p(S(x)|φ) for observed data x.

CHAPTER 5. ESTIMATING POPULATION PARAMETERS

More precisely, Robert et al. (2011) provide the underlying ABC procedure. Given observed data x ∈ X, summary statistics S and distance measure η, samples {φ₁, . . . , φ_m} are simulated from

$$f(\phi, S(z) \mid x) \propto \pi(\phi)\, f(S(z) \mid \phi)\, I_{A^{\eta}_{\epsilon,x}}(z),$$

where

$$A^{\eta}_{\epsilon,x} = \{ z \in \mathcal{X} \mid \eta\{S(z), S(x)\} \le \epsilon \},$$

and

$$I_{A^{\eta}_{\epsilon,x}}(z) = \begin{cases} 1, & \text{if } z \in A^{\eta}_{\epsilon,x}; \\ 0, & \text{otherwise,} \end{cases}$$

is an indicator function and ε is chosen to be 'close' to zero. The performance of this method depends on the choice of η and S. Stephens (2007) suggests that using statistics that are directly affected by the parameter of interest will increase the efficiency of this method.
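The acceptance rule defined by the indicator function can be illustrated with a minimal rejection sampler. This is a sketch under stated assumptions, not the procedure of Robert et al. (2011): the function name `abc_rejection`, the toy normal model, the uniform prior and the tolerance `eps = 0.1` are all hypothetical choices made for illustration.

```python
import numpy as np

def abc_rejection(x_obs, prior_sampler, simulator, summary, distance, eps, n_sims, rng):
    """Keep prior draws phi whose simulated data z satisfy
    distance(S(z), S(x_obs)) <= eps."""
    s_obs = summary(x_obs)
    accepted = []
    for _ in range(n_sims):
        phi = prior_sampler(rng)                  # phi drawn from the prior pi(phi)
        z = simulator(phi, rng)                   # z drawn from the model given phi
        if distance(summary(z), s_obs) <= eps:    # indicator that z lies in the acceptance set
            accepted.append(phi)
    return np.array(accepted)

# Toy example (assumed): estimate the mean of a normal with known unit variance,
# using the sample mean as the single summary statistic.
rng = np.random.default_rng(1)
x_obs = rng.normal(2.0, 1.0, size=50)
draws = abc_rejection(
    x_obs,
    prior_sampler=lambda r: r.uniform(-5.0, 5.0),
    simulator=lambda phi, r: r.normal(phi, 1.0, size=50),
    summary=np.mean,
    distance=lambda a, b: abs(a - b),
    eps=0.1,
    n_sims=20000,
    rng=rng,
)
print(len(draws), draws.mean())
```

Shrinking `eps` brings the accepted draws closer to the posterior given S(x), at the cost of a lower acceptance rate.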

However, it is useful to consider not only how informative a statistic is in parameter estimation but also how informative the statistic is given the set of statistics already included in the analysis. Joyce and Marjoram (2008) address the question: given statistics S₁, . . . , S_{k−1}, how beneficial will an additional statistic S_k be? They compare the log-likelihood function l{S₁, . . . , S_{k−1}|φ} to l{S₁, . . . , S_{k−1}, S_k|φ}, the difference being that the latter contains the additional term l{S_k|S₁, . . . , S_{k−1}, φ}. They attach a score δ_k to each new statistic S_k given S₁, . . . , S_{k−1} and φ. Once a score falls below a pre-specified threshold, the corresponding statistic is not included in the inference. Their aim was to find a set of approximately sufficient statistics. For example, if S₁, . . . , S_{k−1} are sufficient statistics then l(S_k|S₁, . . . , S_{k−1}, φ) = l(S_k|S₁, . . . , S_{k−1}) and so S_k adds no additional information about φ.

Barnes et al. (2011) take an information-theoretic approach to finding the set of statistics that minimises the loss of information that results from using summary statistics in an inference rather than the full data. Given a larger set S of statistics, they devise an algorithm to find the minimal set U ⊂ S such that S contains no more information about φ given U. In order to assess the performance of the set of statistics S, Nunes and Balding (2010) used the square root of the sum of squared errors, RSSE:

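$$\mathrm{RSSE} = \left( \frac{1}{n_a} \sum_{i=1}^{n_s} I_i\, |\phi_i - \phi_{\mathrm{obs}}|^2 \right)^{1/2},$$

where n_s is the number of simulations in the ABC algorithm, n_a is the number of accepted draws and

$$I_i = \begin{cases} 1, & \text{if } \phi_i \text{ is an accepted draw;} \\ 0, & \text{otherwise.} \end{cases}$$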

For each subset U ⊆ S, the ABC algorithm is repeated n_o times and the RSSE computed each time, with the intention of finding the set U^∗ which minimises

$$\mathrm{MRSSE} = \frac{1}{n_o} \sum_{j=1}^{n_o} \mathrm{RSSE}_j.$$
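The two error measures can be computed directly from the output of an ABC run. The sketch below is illustrative only: the function names `rsse` and `mrsse` are hypothetical, and MRSSE is taken to be the average of the RSSE values over the repetitions, which is an assumption about the normalisation.

```python
import numpy as np

def rsse(phi_draws, accepted, phi_obs):
    """RSSE over the accepted draws.

    phi_draws : all n_s simulated parameter values
    accepted  : boolean flags I_i, True when draw i was accepted
    phi_obs   : the parameter value the draws are compared against
    """
    phi_draws = np.asarray(phi_draws, dtype=float)
    accepted = np.asarray(accepted, dtype=bool)
    n_a = accepted.sum()
    if n_a == 0:
        raise ValueError("no accepted draws")
    squared_errors = accepted * np.abs(phi_draws - phi_obs) ** 2
    return float(np.sqrt(squared_errors.sum() / n_a))

def mrsse(rsse_values):
    """Average RSSE over repeated ABC runs (assumed normalisation)."""
    return float(np.mean(rsse_values))

# Two of three draws accepted, compared against phi_obs = 0
example = rsse([1.0, 2.0, 5.0], [True, True, False], 0.0)
print(example)
```

Candidate subsets U can then be ranked by calling `mrsse` on the RSSE values obtained from their repeated runs.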

Having considered methods for selecting the optimal set of statistics, the focus now moves to describing the evolution of ABC algorithms. Beaumont et al. (2002) proposed a method of inference based on summary statistics as an extension of a method described by Tavaré et al. (1997), who were interested in estimating the time to the most recent common ancestor of a sample, conditioning on the observed number of segregating sites S_n. They specified priors π_N and π_µ on the population size N and the mutation rate µ per site per generation. In a genealogy of length L, S_n ∼ Poi(Lθ/2). The algorithm used was:

1. Simulate N and µ from π_N and π_µ.

2. Simulate data under the standard coalescent model to find coalescent waiting times {W₁, . . . , W_n}.

3. Find the time to the most recent common ancestor T_MRCA and the total length of the genealogy T_total using {W₁, . . . , W_n}.

4. Keep T_MRCA, N and µ with probability

$$u \propto \Pr(S_n = k \mid T_{\mathrm{total}} = L) = \frac{1}{k!}\, e^{-L\theta/2} \left( \frac{L\theta}{2} \right)^{k}.$$
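The four steps above can be sketched as follows. Several details are assumptions rather than part of the description: θ is taken to be 4Nµ with time measured in coalescent units, the waiting time while j lineages remain is exponential with rate j(j − 1)/2, the uniform priors are hypothetical, and the acceptance probability is normalised by the largest value the Poisson probability can take (attained when Lθ/2 = k, which requires k ≥ 1).

```python
import numpy as np
from math import exp, lgamma, log

def simulate_tree(n, rng):
    """Simulate coalescent waiting times for a sample of size n and
    return (T_MRCA, T_total) in coalescent time units."""
    j = np.arange(2, n + 1)                        # number of lineages present
    w = rng.exponential(1.0 / (j * (j - 1) / 2.0)) # waiting time with j lineages
    return w.sum(), (j * w).sum()

def log_poisson_pmf(k, lam):
    return k * log(lam) - lam - lgamma(k + 1)

def tavare_sampler(n, k, n_sims, prior_N, prior_mu, rng):
    """Keep (T_MRCA, N, mu) with probability proportional to Pr(S_n = k | T_total = L)."""
    log_max = log_poisson_pmf(k, k)   # Poisson pmf at k is largest when its mean equals k
    kept = []
    for _ in range(n_sims):
        N, mu = prior_N(rng), prior_mu(rng)
        theta = 4.0 * N * mu          # ASSUMPTION: scaled mutation rate
        t_mrca, L = simulate_tree(n, rng)
        u = exp(log_poisson_pmf(k, L * theta / 2.0) - log_max)
        if rng.uniform() < u:
            kept.append((t_mrca, N, mu))
    return np.array(kept)

rng = np.random.default_rng(2)
kept = tavare_sampler(
    n=10, k=5, n_sims=5000,
    prior_N=lambda r: r.uniform(1e3, 1e4),     # hypothetical prior on N
    prior_mu=lambda r: r.uniform(1e-4, 1e-3),  # hypothetical prior on mu
    rng=rng,
)
t_mrca_hat = kept[:, 0].mean()                 # average of the kept T_MRCA values
print(len(kept), t_mrca_hat)
```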

This process simulates samples {T_MRCA,1, . . . , T_MRCA,n_sim} from the distribution of T_MRCA (together with N and µ) given S_n = k, and an estimate of T_MRCA is

$$\hat{T}_{\mathrm{MRCA}} = \frac{1}{n_{\mathrm{sim}}} \sum_{i=1}^{n_{\mathrm{sim}}} T_{\mathrm{MRCA},i}.$$

Pritchard et al. (1999) studied an ancestral population of size N_A that underwent a period of exponential expansion at rate r that began at time t₀. They used three summary statistics, namely the average heterozygosity H̄, the number of distinct haplotypes n, and the mean of the variance in repeat numbers V̄ (using microsatellite data). The rejection algorithm implemented was:

1. Calculate the observed statistics, s = {n, H̄, V̄}.

2. Simulate φ′ = {N_A′, t₀′, r′} from the prior distribution π_φ.

3. Given φ′, simulate a genealogy under the required model and then microsatellite data.

4. Compute the statistics s′ = {n′, H̄′, V̄′} on the simulated data.

5. If |s′_i − s_i| < δ for all i = 1, 2 and 3, then accept φ′.

Again, this process produces draws from the joint distribution of the parameters of interest φ given the observed summary statistics s. Beaumont et al. (2002) extended this method by following the first four steps to produce draws (φ_i, s_i) for i = 1, . . . , n_s and weighting each draw φ_i according to the distance |s_i − s|. They assume that φ_i can be modelled by

$$\phi_i = \alpha + (s_i - s)^{T} \beta + e_i, \quad i = 1, \dots, n_s,$$

for some coefficients α and β to be estimated and independent errors e_i. Since E(φ|S = s) = α, the authors treat

$$\phi_i^{*} = \phi_i - (s_i - s)^{T} \hat{\beta}$$

as a random sample from p(φ|S = s). Although the relationship between {φ_i} and {s_i} may not be linear, the authors assume that it is linear locally to s. They estimate (α, β) by minimising

$$\sum_{i=1}^{n_s} \{\phi_i - \alpha - (s_i - s)^{T} \beta\}^2\, K_\delta(|s_i - s|)$$

for bandwidth δ (which plays the role of the tolerance in the last step of the above algorithm) and kernel function K_δ.
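The weighted least-squares fit and the resulting adjustment can be sketched in a few lines. This is an illustration under assumptions: the function name `regression_adjust` is hypothetical, an Epanechnikov kernel is assumed as one common choice of K_δ, the distance |s_i − s| is taken to be Euclidean, and the toy model (a uniform prior with a noisy one-dimensional summary) is invented for the example.

```python
import numpy as np

def regression_adjust(phi, s, s_obs, delta):
    """Local-linear regression adjustment of ABC draws.

    Returns adjusted draws phi*, their kernel weights, and the fitted (alpha, beta).
    """
    phi = np.asarray(phi, dtype=float)
    diff = np.asarray(s, dtype=float).reshape(len(phi), -1) - np.ravel(s_obs)
    dist = np.linalg.norm(diff, axis=1)
    # ASSUMPTION: Epanechnikov kernel (up to a constant), zero beyond delta
    w = np.where(dist <= delta, 1.0 - (dist / delta) ** 2, 0.0)
    keep = w > 0
    # Weighted least squares: scale rows by sqrt(weight), then ordinary least squares
    X = np.column_stack([np.ones(keep.sum()), diff[keep]])
    sw = np.sqrt(w[keep])
    coef, *_ = np.linalg.lstsq(X * sw[:, None], phi[keep] * sw, rcond=None)
    alpha, beta = coef[0], coef[1:]
    phi_star = phi[keep] - diff[keep] @ beta      # phi_i* = phi_i - (s_i - s)^T beta_hat
    return phi_star, w[keep], alpha, beta

# Toy example (assumed): summary equals the parameter plus noise
rng = np.random.default_rng(3)
phi = rng.uniform(0.0, 10.0, size=5000)
s = phi + rng.normal(0.0, 0.5, size=5000)
phi_star, w, alpha, beta = regression_adjust(phi, s, s_obs=4.0, delta=1.0)
post_mean = np.average(phi_star, weights=w)
print(alpha, beta, post_mean)
```

Because the regression includes an intercept, the weighted mean of the adjusted draws coincides with the fitted α, which is the estimate of E(φ|S = s).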

Fearnhead and Prangle (2012) describe ABC algorithms based on rejection and MCMC sampling methods. If interest lies in estimating a parameter φ given observed data x_obs, the Metropolis-Hastings algorithm, detailed by Gelman et al. (2004), aims to make draws φ₁, . . . , φ_{N_sim} from the target distribution p(φ|x_obs). Given a prior distribution π(φ), the algorithm begins by drawing φ₀ from π(φ) and, for i = 1, . . . , N_sim, a draw φ′ is made from a proposal distribution q(φ′|φ_{i−1}) and is accepted with probability min(α, 1), where

$$\alpha = \frac{p(\phi' \mid x_{\mathrm{obs}}) / q(\phi' \mid \phi_{i-1})}{p(\phi_{i-1} \mid x_{\mathrm{obs}}) / q(\phi_{i-1} \mid \phi')}.$$
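A minimal Metropolis-Hastings sampler built around this acceptance ratio might look as follows. It is a sketch rather than the implementation in Gelman et al. (2004): the function and argument names are hypothetical, the starting value is passed in rather than drawn from the prior, the ratio is computed on the log scale for numerical stability, and the toy target (a normal density with mean 3) is assumed purely for illustration. Only an unnormalised target density is needed, since the normalising constant cancels in α.

```python
import numpy as np

def metropolis_hastings(log_target, phi0, propose, log_q, n_sims, rng):
    """Metropolis-Hastings chain of length n_sims + 1.

    log_target(phi) : log of the (unnormalised) target density
    propose(cur, r) : draw a proposal given the current value
    log_q(a, b)     : log proposal density of a given b
    """
    chain = np.empty(n_sims + 1)
    chain[0] = phi0
    for i in range(1, n_sims + 1):
        cur = chain[i - 1]
        prop = propose(cur, rng)
        log_alpha = (log_target(prop) - log_q(prop, cur)) - (
            log_target(cur) - log_q(cur, prop)
        )
        # Accept with probability min(alpha, 1)
        if rng.uniform() < np.exp(min(0.0, log_alpha)):
            chain[i] = prop
        else:
            chain[i] = cur
    return chain

# Toy example (assumed): normal target with mean 3, normal random-walk proposal
rng = np.random.default_rng(5)
chain = metropolis_hastings(
    log_target=lambda phi: -0.5 * (phi - 3.0) ** 2,
    phi0=0.0,
    propose=lambda cur, r: r.normal(cur, 1.0),
    log_q=lambda a, b: -0.5 * (a - b) ** 2,
    n_sims=20000,
    rng=rng,
)
print(chain[1000:].mean())
```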

The algorithm described by Fearnhead and Prangle requires specification of statistics S(x), with s_obs = S(x_obs), and a density K(x). It is initiated by simulating φ₀ from π(φ) and calculating s_obs, and for i = 1, . . . , n_sim,

1. Simulate φ′ from the proposal distribution g(φ′|φ_{i−1}).

2. Simulate data x_sim from the required model, and calculate s_sim = S(x_sim).

3. With probability

$$\alpha = \min\left\{ 1,\ \frac{K\{(s_{\mathrm{sim}} - s_{\mathrm{obs}})/h\}}{K\{(s_{i-1} - s_{\mathrm{obs}})/h\}} \cdot \frac{\pi(\phi')\, g(\phi_{i-1} \mid \phi')}{\pi(\phi_{i-1})\, g(\phi' \mid \phi_{i-1})} \right\}$$

set φ_i = φ′ and s_i = s_sim; otherwise set φ_i = φ_{i−1} and s_i = s_{i−1}, for some pre-specified bandwidth h > 0.

This algorithm may be used to estimate the population divergence time τ by setting

• S = F_st (= F, say),

• a N(τ_{i−1}, σ²) proposal distribution,

• a Uni(0.00001, 0.7) prior distribution for τ, and

• the kernel

$$K(x) = \begin{cases} 1, & \text{if } |x| \le 0.5; \\ 0, & \text{otherwise.} \end{cases}$$

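Therefore,

$$\alpha = \min\left\{ 1,\ \frac{K\{(F_{\mathrm{sim}} - F_{\mathrm{obs}})/h\}\, \pi(\tau)\, g(\tau_{i-1} \mid \tau)}{K\{(F_{i-1} - F_{\mathrm{obs}})/h\}\, \pi(\tau_{i-1})\, g(\tau \mid \tau_{i-1})} \right\}$$

and

$$K\{(F_{\mathrm{sim}} - F_{\mathrm{obs}})/h\} = \begin{cases} 1, & \text{if } |(F_{\mathrm{sim}} - F_{\mathrm{obs}})/h| \le \tfrac{1}{2}; \\ 0, & \text{otherwise.} \end{cases}$$

Necessarily, K{(F_{i−1} − F_obs)/h} = 1, because the current state was itself accepted, and

$$\frac{\pi(\tau)}{\pi(\tau_{i-1})} = \begin{cases} 1, & \text{if } \tau \in (0.00001, 0.7); \\ 0, & \text{otherwise.} \end{cases}$$

Since the normal proposal distribution is symmetric, g(τ_{i−1}|τ) = g(τ|τ_{i−1}), and so α = 1 or 0. Algorithm 1 (ABC MCMC) details how to estimate τ.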

Algorithm 1 (ABC MCMC).

For i = 1, . . . , N_sim:

1. Simulate τ from N(τ_{i−1}, σ²).

2. Simulate data using τ and calculate F_st = F_sim.

3. If α = 1 then set τ_i = τ and F_i = F_sim. If α = 0 then set τ_i = τ_{i−1} and F_i = F_{i−1}.

This algorithm produces a set of draws τ₁, τ₂, . . . , τ_{N_sim}. Figure 5.8 shows density estimates of the simulated τ values for a range of true τ values using this algorithm. On each plot, the red dot shows the true value of τ. For most values of τ, the density peaks around the true value, with a few exceptions. Taking the lower and upper 2.5% points of each distribution, figure 5.9 shows 95% credible bands for the range of τ values. This algorithm estimates τ well; the bands contain the line τ = τ̂ and are narrower than those produced by the F_st-based estimator (figure 5.4).
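The steps of Algorithm 1 can be sketched as below. The sketch does not reproduce the coalescent simulation used to generate the figures: `toy_simulate_fst` is a purely hypothetical stand-in that maps τ to a noisy value of F_st, and the values of σ, h and N_sim are arbitrary. The initial state is obtained by drawing from the prior until the simulated statistic falls inside the kernel window, which is an assumed way of guaranteeing K{(F₀ − F_obs)/h} = 1.

```python
import numpy as np

LOW, HIGH = 0.00001, 0.7   # support of the Uni prior on tau

def toy_simulate_fst(tau, rng):
    """HYPOTHETICAL stand-in for 'simulate data using tau and calculate F_st'."""
    return 1.0 - np.exp(-tau) + rng.normal(0.0, 0.02)

def within_window(f_sim, f_obs, h):
    """Indicator kernel: K{(F_sim - F_obs)/h} = 1 when |(F_sim - F_obs)/h| <= 1/2."""
    return abs((f_sim - f_obs) / h) <= 0.5

def abc_mcmc_tau(f_obs, sigma, h, n_sims, simulate_fst, rng):
    # ASSUMED initialisation: sample the prior until the kernel equals 1
    while True:
        tau0 = rng.uniform(LOW, HIGH)
        f0 = simulate_fst(tau0, rng)
        if within_window(f0, f_obs, h):
            break
    taus = np.empty(n_sims + 1)
    fs = np.empty(n_sims + 1)
    taus[0], fs[0] = tau0, f0
    for i in range(1, n_sims + 1):
        tau = rng.normal(taus[i - 1], sigma)          # step 1: propose tau
        alpha = 0
        if LOW < tau < HIGH:                          # prior ratio is 0 outside the support
            f_sim = simulate_fst(tau, rng)            # step 2: simulate and compute F_sim
            if within_window(f_sim, f_obs, h):
                alpha = 1
        if alpha == 1:                                # step 3: accept or stay put
            taus[i], fs[i] = tau, f_sim
        else:
            taus[i], fs[i] = taus[i - 1], fs[i - 1]
    return taus, fs

rng = np.random.default_rng(4)
f_obs = 1.0 - np.exp(-0.3)        # toy "observed" F_st for a true tau of 0.3
taus, fs = abc_mcmc_tau(f_obs, sigma=0.05, h=0.05, n_sims=5000,
                        simulate_fst=toy_simulate_fst, rng=rng)
lower, upper = np.quantile(taus, [0.025, 0.975])  # 95% credible interval
print(taus.mean(), lower, upper)
```

Proposals outside the prior support are rejected without simulating data, since the prior ratio already forces α = 0.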

In document Methods for demographic inference from single-nucleotide polymorphism data (Page 101-107)