
3.3 Bayesian analysis of radiocarbon dates

3.3.5 Bayesian computation

The key challenge of implementing any realistic Bayesian analysis is the computation. A small range of problems has an analytical solution through the use of conjugate priors. In these kinds of analyses the parameters feature as variables in well-defined functions (Hoff 2009, Ch. 3). In a number of other cases grid approximation and numerical solutions are a possibility. For example, the single-parameter problems of calibrating radiocarbon dates, or even basic Bayesian wiggle-matches, can be executed in this way (Bronk Ramsey et al. 2010). Most realistic applications, however, are beyond the scope of these low-computation methods. This is because each added parameter adds a new dimension to the problem. Hence, numerical computation of a single radiocarbon calibration on a prior with 1000 possible values requires 1000 calculations. The same computation for this date in a phase requires 1000³ computations: one dimension for the prior of the date and one for each of the two boundaries. With each added parameter the number of computations increases by several orders of magnitude, and approximating the entire grid of possible solutions becomes unviable (Kruschke 2015a, 235). For this reason Bayesian methods were of little use in applied research until the wider adoption of Markov chain Monte Carlo (MCMC) techniques from the 1980s onwards. A clear exposition of MCMC is provided by Kruschke (2015a, Ch. 7) and the summary below follows his exposition, unless referenced otherwise.
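The contrast between an analytical (conjugate) solution, a grid approximation and the growth of the grid with added parameters can be illustrated with a minimal Python sketch. It is a toy example and not a radiocarbon model: it assumes a Beta prior with a binomial likelihood (a standard conjugate pair), and the function names `grid_posterior` and `conjugate_posterior_mean`, as well as the data (7 successes in 10 trials), are hypothetical and chosen only for illustration.

```python
def grid_posterior(successes, trials, a=1.0, b=1.0, n_grid=1000):
    """Grid approximation: evaluate prior x likelihood at n_grid values."""
    # Midpoints of n_grid equal-width bins on (0, 1).
    grid = [(i + 0.5) / n_grid for i in range(n_grid)]
    # Unnormalised posterior: Beta(a, b) prior times binomial likelihood.
    unnorm = [
        theta ** (a - 1 + successes) * (1 - theta) ** (b - 1 + trials - successes)
        for theta in grid
    ]
    total = sum(unnorm)
    return grid, [u / total for u in unnorm]


def conjugate_posterior_mean(successes, trials, a=1.0, b=1.0):
    """Analytical result: the posterior is Beta(a + s, b + n - s)."""
    return (a + successes) / (a + b + trials)


# Illustrative data (an assumption): 7 successes in 10 trials.
grid, post = grid_posterior(7, 10)
grid_mean = sum(t * p for t, p in zip(grid, post))
exact_mean = conjugate_posterior_mean(7, 10)
print(f"grid mean = {grid_mean:.4f}, analytical mean = {exact_mean:.4f}")

# Number of grid evaluations needed with 1000 values per parameter.
for n_params in (1, 2, 3, 6):
    print(n_params, "parameter(s):", 1000 ** n_params, "evaluations")
```

With one parameter the 1000-point grid reproduces the analytical answer closely; the final loop shows why the same approach stops being practical once several parameters have to be evaluated jointly.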

Imagine that we want to describe a discrete probability distribution (Figure 3.20), but for whatever reason we are unable to conduct an exact survey of the values. One way of doing this is to begin at any value of the distribution and flip a coin: if heads, consider a move to the left; if tails, consider a move to the right. This is the proposal distribution. Next, compare the probability of the proposed location with that of the current location. If it is higher, move there. If it is lower, move there with a probability equal to the probability of the proposed parameter value divided by the probability of the current parameter value; otherwise, stay at the current location. Finally, add a counter to whatever location we occupy at the end of the step, and repeat. Over a large enough number of iterations, the distribution of counters converges on a distribution proportional to the target distribution. Hence the MCMC algorithm is a means of drawing samples representative of the target distribution without having to calculate approximations of entire grids. Note that any probability distribution can be used as a proposal distribution; it does not have to be a Bernoulli distribution (the coin flip). Indeed, most real applications use more complex distributions. The MCMC algorithm used throughout this thesis is the Metropolis-Hastings algorithm (Hastings 1970; Metropolis et al. 1953), which differs from the simple example of Figure 3.20 in that it can update any number of parameters in one step and can use any number of proposal distributions.
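The coin-flip procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the implementation used in this thesis: the target weights, the starting position, the number of iterations and the function name `metropolis_discrete` are all hypothetical. Proposals that fall outside the distribution are given probability zero and are therefore always rejected.

```python
import random


def metropolis_discrete(target, n_iter, start=0, seed=0):
    """Simple Metropolis sampler on a discrete target (unnormalised weights)."""
    rng = random.Random(seed)
    counts = [0] * len(target)
    current = start  # must be a location with non-zero weight
    for _ in range(n_iter):
        # Proposal distribution: coin flip, one step left or right.
        step = -1 if rng.random() < 0.5 else 1
        proposed = current + step
        # Locations outside the distribution have probability zero.
        p_proposed = target[proposed] if 0 <= proposed < len(target) else 0.0
        # Always move if the proposal is more probable; otherwise move
        # with probability p(proposed) / p(current).
        accept_prob = min(1.0, p_proposed / target[current])
        if rng.random() < accept_prob:
            current = proposed
        # Place a counter at the location occupied after the step.
        counts[current] += 1
    return counts


# Hypothetical target: only relative weights are needed, not probabilities.
target = [1, 2, 3, 4, 3, 2, 1]
counts = metropolis_discrete(target, n_iter=200_000, start=3)

total_weight = sum(target)
for weight, count in zip(target, counts):
    print(f"target {weight / total_weight:.3f}  sampled {count / sum(counts):.3f}")
```

Note that the sampler only ever uses the ratio of two target values, so the target does not need to be normalised. This is what makes the approach usable for posteriors whose normalising constant cannot be computed.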

All MCMC methods are based on samples from the target distribution. Although they will always describe the target distribution given a sufficient number of repetitions, in practice the computation times required mean that analyses stop short of the point at which this is guaranteed. This in turn means that the MCMC sample may be unrepresentative or inaccurate. The first thing to consider is whether the sample is representative of the posterior distribution; there is always some probability that the sampler will become trapped in a specific part of the posterior distribution and that the outcome will be unrepresentative. The usual way of checking for this is to run multiple chains and see how well they resemble one another: if they are indistinguishable, they are said to be well mixed and hence ought to be representative. Note also that each chain will have begun from a different initial value and for a number of iterations will not be mixed with the other chains (Figure 3.21). This period is called the burn-in and is discarded from the analyses. The other important question regards the size of the MCMC sample. In general, if the proposal distributions are too narrow or too wide relative to the problem, the chain values might be too dependent on their predecessors and hence provide a poor picture of the variability of the target distribution. For this reason, the number of iterations is not the same as the effective sample size (ESS). A good algorithm will ensure that the dependency is minimal and that the ESS is large relative to the number of iterations. If the ESS is too small, the results of the analyses may be unstable or biased.
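The checks described above (multiple chains, burn-in and ESS) can be illustrated with a short Python sketch. Several assumptions are made here: the chains are synthetic first-order autoregressive series standing in for real MCMC output; the ESS is estimated with a simple rule that sums autocorrelations up to the first non-positive lag; and chain agreement is summarised with the Gelman-Rubin potential scale reduction factor, which is one common diagnostic and not necessarily the one applied in this thesis. All function names, the burn-in length and the starting values are hypothetical.

```python
import random


def ar1_chain(n, phi, start, seed):
    """Stand-in for MCMC output: each value depends on its predecessor."""
    rng = random.Random(seed)
    x, out = start, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out


def mean(xs):
    return sum(xs) / len(xs)


def variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def effective_sample_size(chain):
    """N / (1 + 2 * sum of autocorrelations up to the first non-positive lag)."""
    n = len(chain)
    m = mean(chain)
    centred = [x - m for x in chain]
    denom = sum(c * c for c in centred)
    acf_sum = 0.0
    for lag in range(1, n):
        rho = sum(centred[i] * centred[i + lag] for i in range(n - lag)) / denom
        if rho <= 0.0:
            break
        acf_sum += rho
    return n / (1.0 + 2.0 * acf_sum)


def gelman_rubin(chains):
    """Potential scale reduction factor; values near 1 suggest the chains agree."""
    n = len(chains[0])
    within = mean([variance(c) for c in chains])
    between_over_n = variance([mean(c) for c in chains])
    pooled = (n - 1) / n * within + between_over_n
    return (pooled / within) ** 0.5


# Three chains started from deliberately dispersed initial values.
burn_in = 500
starts = [-50.0, 0.0, 50.0]
raw = [ar1_chain(5000, 0.9, s, seed) for seed, s in enumerate(starts)]
kept = [c[burn_in:] for c in raw]  # discard the burn-in period

rhat = gelman_rubin(kept)
ess = [effective_sample_size(c) for c in kept]
iid_ess = effective_sample_size(ar1_chain(4500, 0.0, 0.0, seed=99))

print(f"R-hat after burn-in: {rhat:.3f}")
print("ESS per chain (4500 iterations kept):", [round(e) for e in ess])
print("ESS of an independent chain of the same length:", round(iid_ess))
```

The strongly autocorrelated chains yield an ESS far below the number of iterations kept, whereas the independent chain yields an ESS close to its length, which is the distinction between iterations and effective sample size drawn in the text.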

Figure 3.20: Basic Metropolis Markov chain Monte Carlo algorithm. Begin at any arbitrary point of the target distribution; here it is K. Select the next candidate location according to a proposal distribution; here it is one step left or right with equal (50%) probability. If the probability of the candidate location is higher than that of the current location, move. Otherwise, move with a probability equal to that of the candidate location divided by that of the current location. After the step, place a counter at the location now occupied, whether or not the move was accepted, and begin anew. After sufficient repetitions the distribution of counters will approximate the target distribution.

Figure 3.21: MCMC diagnostics. The two MCMC simulations approximate the same distribution; however, the chains of the left simulation cluster together. In practice they would produce a poor description of the parameter, with over-represented low values. Compare with the much better mixed chains on the right.

In document Improving the 14C dating of south-west Scottish wetland sites (Page 93-97)