The Bayesian Trans-dimensional Approach - Advances in Trans dimensional Geophysical Inference

4.2 Method

4.2.2 The Bayesian Trans-dimensional Approach

The approach in this chapter uses Bayesian inference to assess probability density functions (PDFs) on model parameters representing conductivities of the subsurface. From these empirical PDFs, inferences on models of likely structure can be obtained using expected values, medians or modes. An additional benefit of this approach is the ability to estimate uncertainties and non-uniqueness by examining the spread of the ensemble at each point of the model. The Bayesian approach [Brooks et al., 2011, Gelman et al., 2004] uses Bayes' theorem,

$$
p(\mathbf{m}|\mathbf{d}) = \frac{p(\mathbf{m})\,p(\mathbf{d}|\mathbf{m})}{p(\mathbf{d})}, \tag{4.1}
$$

where m is the vector of M model parameters, d is the vector of N observed data, p(m) is independent prior information on the model parameters (e.g. physical constraints on the range of plausible conductivities), p(d|m) is the likelihood function and p(d) is a normalising term commonly referred to as the evidence. Since the time domain AEM problem involves a non-linear forward model, accurately estimating the evidence normalisation term is not feasible analytically, although numerical approximations are available [Skilling, 2006]. Fortunately, relative inferences, which do not require this term, are sufficient for both model inference and uncertainty estimates.
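As a short worked step (using only the symbols of Bayes' theorem, with m′ denoting any second model), the ratio of posterior densities of two models shows why the evidence is not needed for relative inferences: p(d) appears in both numerator and denominator and cancels.

$$
\frac{p(\mathbf{m}'|\mathbf{d})}{p(\mathbf{m}|\mathbf{d})}
= \frac{p(\mathbf{m}')\,p(\mathbf{d}|\mathbf{m}')\,/\,p(\mathbf{d})}{p(\mathbf{m})\,p(\mathbf{d}|\mathbf{m})\,/\,p(\mathbf{d})}
= \frac{p(\mathbf{m}')}{p(\mathbf{m})}\,\frac{p(\mathbf{d}|\mathbf{m}')}{p(\mathbf{d}|\mathbf{m})}
$$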

In time domain AEM, the observations at each point consist of one or more response curves representing the observed response of the secondary field from conducting bodies beneath the surface, an example of which is shown later. Under the assumption that a Gaussian noise model accurately approximates the noise resulting from measurement and theory error, the likelihood function can be written as

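$$
p(\mathbf{d}_i|\mathbf{m}) = \frac{1}{\sqrt{(2\pi)^{j}\,|\mathbf{C}_r|}} \exp\left\{ -\frac{1}{2} \left(G(\mathbf{m})_i - \mathbf{d}_i\right)^{T} \mathbf{C}_r^{-1} \left(G(\mathbf{m})_i - \mathbf{d}_i\right) \right\}, \tag{4.2}
$$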

where d_i is the ith AEM sounding along the flight path, j is the number of time windows in the sounding, G(m)_i is the predicted response as a function of the model parameters m, and C_r is the covariance matrix representing the potentially correlated noise on the data.
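The Gaussian likelihood for a single sounding can be sketched in a few lines. The following is a minimal illustration rather than the implementation used in this chapter: the function names (`cholesky`, `log_likelihood`) are hypothetical, the predicted response is passed in as a plain list instead of being computed by an AEM forward model, and the logarithm of the likelihood is returned because products of small densities underflow easily. A Cholesky factorisation of C_r is assumed as the way to handle correlated noise.

```python
import math


def cholesky(C):
    """Lower-triangular L with L L^T = C, for a symmetric positive-definite C."""
    n = len(C)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(i + 1):
            s = sum(L[i][p] * L[k][p] for p in range(k))
            if i == k:
                L[i][k] = math.sqrt(C[i][i] - s)
            else:
                L[i][k] = (C[i][k] - s) / L[k][k]
    return L


def log_likelihood(predicted, observed, C_r):
    """Log of the Gaussian likelihood for one sounding with j time windows.

    predicted: G(m)_i, observed: d_i, C_r: j x j noise covariance (list of lists).
    """
    j = len(observed)
    residual = [g - d for g, d in zip(predicted, observed)]
    L = cholesky(C_r)
    # Forward substitution solves L y = residual, so that
    # residual^T C_r^{-1} residual equals y . y
    y = []
    for i in range(j):
        s = sum(L[i][p] * y[p] for p in range(i))
        y.append((residual[i] - s) / L[i][i])
    misfit = sum(v * v for v in y)
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(j))
    return -0.5 * (j * math.log(2.0 * math.pi) + log_det + misfit)


# Two time windows with correlated noise (illustrative numbers only).
C_r = [[2.0, 1.0], [1.0, 2.0]]
print(log_likelihood([1.0, 1.0], [0.0, 0.0], C_r))
```

Working with the log-likelihood also means that the likelihood ratio needed by the sampler becomes a simple difference of two such values.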

Markov chain Monte Carlo (McMC) techniques are used to generate samples that converge to the target distribution, in this case the posterior probability density (PPD) given by p(m|d) in (4.1). This is an iterative approach that perturbs the current model by sampling a proposal density function Q(m → m′) to generate a new candidate model m′. The new model is either accepted, in which case it becomes the current model in the chain, or rejected, in which case the previous model is retained, according to the Metropolis-Hastings [Metropolis et al., 1953, Hastings, 1970] probability rule

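$$
\alpha(\mathbf{m} \rightarrow \mathbf{m}') = \min\left\{ 1, \frac{p(\mathbf{m}')}{p(\mathbf{m})}\, \frac{p(\mathbf{d}|\mathbf{m}')}{p(\mathbf{d}|\mathbf{m})}\, \frac{Q(\mathbf{m}' \rightarrow \mathbf{m})}{Q(\mathbf{m} \rightarrow \mathbf{m}')} \right\}. \tag{4.3}
$$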
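The accept/reject rule can be illustrated with a minimal fixed-dimension sampler. This is a sketch under stated assumptions, not the sampler used in this chapter: the model is a single scalar, the proposal is a symmetric Gaussian random walk (so the proposal ratio is one), and the toy prior and likelihood (`toy_log_prior`, `toy_log_like`) are invented for the example. All quantities are handled as logarithms, so ratios become differences.

```python
import math
import random


def mh_log_alpha(log_prior_cur, log_prior_new, log_like_cur, log_like_new,
                 log_q_reverse=0.0, log_q_forward=0.0):
    """Log of the Metropolis-Hastings acceptance probability.

    Sums the log prior ratio, log likelihood ratio and log proposal ratio,
    then caps the result at log(1) = 0.
    """
    log_ratio = ((log_prior_new - log_prior_cur)
                 + (log_like_new - log_like_cur)
                 + (log_q_reverse - log_q_forward))
    return min(0.0, log_ratio)


def run_chain(log_prior, log_like, m0, step, n_steps, rng):
    """Random-walk chain; returns the model retained at every iteration."""
    m = m0
    lp, ll = log_prior(m), log_like(m)
    chain = []
    for _ in range(n_steps):
        m_new = m + rng.gauss(0.0, step)      # symmetric proposal Q
        lp_new = log_prior(m_new)
        if lp_new > -math.inf:                # outside the prior: reject outright
            ll_new = log_like(m_new)
            u = 1.0 - rng.random()            # uniform on (0, 1], safe for log
            if math.log(u) < mh_log_alpha(lp, lp_new, ll, ll_new):
                m, lp, ll = m_new, lp_new, ll_new   # accept the candidate
        chain.append(m)                       # on rejection the old model is kept
    return chain


def toy_log_prior(m):
    """Uniform prior on [-10, 10] (hypothetical bounds)."""
    return 0.0 if -10.0 <= m <= 10.0 else -math.inf


def toy_log_like(m):
    """Gaussian likelihood centred on 2 with standard deviation 0.5."""
    return -0.5 * ((m - 2.0) / 0.5) ** 2


chain = run_chain(toy_log_prior, toy_log_like, 0.0, 1.0, 20000, random.Random(0))
kept = chain[2000:]                           # discard the early part of the chain
print(sum(kept) / len(kept))                  # should lie close to 2
```

Note that a rejected candidate still contributes a sample: the current model is appended again, which is what makes the chain's sampling density proportional to the posterior.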

The terms of the acceptance probability ensure correct convergence to the posterior by maintaining “detailed balance” of the Markov chain(s) [Brooks et al., 2011]. The more general Metropolis-Hastings-Green [Green, 1995] acceptance criterion, which allows for changes in model dimension, is

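$$
\alpha(\mathbf{m} \rightarrow \mathbf{m}') = \min\left\{ 1, \frac{p(\mathbf{m}')}{p(\mathbf{m})}\, \frac{p(\mathbf{d}|\mathbf{m}')}{p(\mathbf{d}|\mathbf{m})}\, \frac{Q(\mathbf{m}' \rightarrow \mathbf{m})}{Q(\mathbf{m} \rightarrow \mathbf{m}')}\, |\mathbf{J}| \right\}, \tag{4.4}
$$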

where now m′ may contain a different number of unknowns than m, and the additional term |J| is the determinant of the Jacobian that represents the variable transformation from m to m′.

Following Hawkins and Sambridge [2015] and Chapter 3, the trans-dimensional tree approach is used with a wavelet parameterisation to represent the image-based model. In this approach, the model m consists of a hierarchy of wavelet coefficients, from coarse scale to fine, that are trans-dimensionally sampled to reconstruct the subsurface distribution of conductivity. The benefit of this approach compared to simply sampling over all pixels is that the parameterisation can adapt to features of different scale lengths. This in turn results in better constraint on the parameters of the inversion and more robust estimates of the uncertainty. Earlier trans-dimensional approaches parameterise 2D regions of interest in terms of Voronoi cells; however, the trans-dimensional tree approach with a wavelet basis has been demonstrated to be more efficient for geophysical imaging problems, both in terms of computational time and convergence rates. The choice of a wavelet basis also leverages the innate ability of wavelets to decorrelate and compress images, meaning complex subsurface features can be represented with fewer parameters.

The operation of the trans-dimensional tree is briefly recapitulated here; full details are given in Hawkins and Sambridge [2015] and Chapter 3. In Figure 4.2, an example abstract tree of wavelet coefficients is shown on the left. In this schematic of the tree, active coefficients are shown as solid dots. Inactive nodes equate to having the corresponding wavelet coefficient set to zero. From top to bottom, each level corresponds to progressively finer structure. Through the application of the inverse wavelet transform using a chosen wavelet basis, this hierarchy of wavelet coefficients can be mapped into the conductivity image shown on the right. For the simulation studies presented, the bi-orthogonal wavelet basis commonly referred to as CDF 9/7 [Cohen et al., 1992] is used, which provides good compression of information, as evidenced by its use in the JPEG 2000 image compression standard [Unser and Blu, 2003]. The image constructed from the model of wavelet coefficients is then used by the forward model to generate synthetic response curve predictions. These predictions are then compared with the observations in the likelihood function (4.2).
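The mapping from a sparse tree of coefficients to an image can be sketched as follows. Several simplifications are assumed and should not be read as the method of this chapter: the example is one-dimensional rather than a 2D image, an unnormalised Haar wavelet is swapped in for CDF 9/7 because its inverse transform fits in a few lines, and the function name `inverse_haar` and the `(level, index)` keys are hypothetical. Inactive nodes are simply absent from the dictionary and therefore contribute zero.

```python
def inverse_haar(scaling, details, n_levels):
    """Reconstruct 2**n_levels samples from a sparse Haar coefficient tree.

    scaling:  the single coarsest (root) coefficient.
    details:  dict mapping (level, index) -> detail coefficient for active
              nodes only; any missing key is an inactive node, i.e. zero.
    """
    values = [scaling]
    for level in range(n_levels):
        finer = []
        for index, parent in enumerate(values):
            d = details.get((level, index), 0.0)   # inactive -> 0
            finer.extend((parent + d, parent - d)) # split parent into two children
        values = finer
    return values


# Three active detail coefficients forming a connected path down the tree.
active = {(0, 0): 0.5, (1, 0): 0.25, (2, 0): 0.125}
print(inverse_haar(1.0, active, 3))
# [1.875, 1.625, 1.25, 1.25, 0.5, 0.5, 0.5, 0.5]
```

With only four numbers the reconstruction has fine detail on the left and a single constant value across the right half, which is the local adaptation to scale length that the tree parameterisation is designed to provide.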

Figure 4.2: A cartoon illustration of the trans-dimensional tree method with wavelet parameterisation of the subsurface conductivity. On the left, the subsurface conductivity is represented abstractly as a hierarchy of wavelet coefficients with different scale lengths. On the right is a corresponding subsurface conductivity image illustrating how the trans-dimensional tree approach can adapt locally to varying scale lengths of heterogeneity.

At each step of the Markov chain, one of the following perturbations of the current model is selected at random: add a new wavelet coefficient to the tree, remove a wavelet coefficient from the tree, or change the value of an existing wavelet coefficient. The probability of adding a new wavelet coefficient is set equal to that of removing a wavelet coefficient to maintain detailed balance. In the general case, the starting model and the chain of models during convergence are often poor fits to the data and are discarded as part of the “burn-in” process. The remaining “chain” of candidate models then forms the ensemble from which inferences can be made.
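The choice between the three perturbations can be sketched as below. The layout is an assumption made for illustration only: a binary tree with integer node ids (children of node k are 2k+1 and 2k+2), an always-active root, and move probabilities of 0.25 (add), 0.25 (remove) and 0.5 (change value). All function names are hypothetical, and the rule that a coefficient may only be added below an active parent, or removed if it has no active children, is the assumption used here to keep the active set a connected tree.

```python
import random


def children(node):
    """Ids of the two children of a node in a binary tree stored by index."""
    return (2 * node + 1, 2 * node + 2)


def birth_candidates(active, max_node):
    """Inactive nodes whose parent is active: coefficients that may be added."""
    return sorted(c for n in active for c in children(n)
                  if c not in active and c <= max_node)


def death_candidates(active):
    """Active non-root nodes with no active children: may be removed."""
    return sorted(n for n in active
                  if n != 0 and not any(c in active for c in children(n)))


def propose_move(active, max_node, rng, p_birth=0.25, p_death=0.25):
    """Pick a move type and the node it acts on.

    active maps node id -> coefficient value. p_birth equals p_death by
    default, mirroring the equal add/remove probabilities in the text.
    """
    u = rng.random()
    if u < p_birth:
        candidates = birth_candidates(active, max_node)
        return ("birth", rng.choice(candidates)) if candidates else ("none", None)
    if u < p_birth + p_death:
        candidates = death_candidates(active)
        return ("death", rng.choice(candidates)) if candidates else ("none", None)
    return ("value", rng.choice(sorted(active)))


tree = {0: 1.0, 1: 0.5, 2: 0.1, 3: 0.2}   # node id -> coefficient (made-up values)
print(birth_candidates(tree, 14))          # [4, 5, 6, 7, 8]
print(death_candidates(tree))              # [2, 3]
print(propose_move(tree, 14, random.Random(1)))
```

The candidate lists matter beyond choosing a node: their lengths before and after a move would enter the proposal ratio of the acceptance criterion in a full implementation, a bookkeeping step omitted from this sketch.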

Common problems in sampling algorithms are poor convergence due to poor tuning of proposal distributions, sampling of local minima due to non-linear effects, and the related difficulty of sampling multi-modal posterior distributions. To overcome these issues, Parallel Tempering [Earl and Deem, 2005, Dosso et al., 2012, Sambridge, 2014] is used to explore the posterior space more effectively during inversion. In this approach, multiple Markov chains are run at different temperatures, which reduce the influence of the likelihood in the modified acceptance criterion

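$$
\alpha(\mathbf{m} \rightarrow \mathbf{m}') = \min\left\{ 1, \frac{p(\mathbf{m}')}{p(\mathbf{m})} \left[ \frac{p(\mathbf{d}|\mathbf{m}')}{p(\mathbf{d}|\mathbf{m})} \right]^{1/T} \frac{Q(\mathbf{m}' \rightarrow \mathbf{m})}{Q(\mathbf{m} \rightarrow \mathbf{m}')}\, |\mathbf{J}| \right\}, \tag{4.5}
$$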

where T is the temperature. A set of logarithmically spaced temperatures, with multiple chains at each temperature, is run, and statistical information is collected from the set of Markov chains at a temperature of one. At higher temperatures, the influence of the likelihood ratio is reduced, which allows the high temperature chains to explore the prior space more actively. Periodically, model exchanges are attempted between chains at different temperatures, which allows information about posterior regions of interest to be shared between chains. This results in better sampling of non-linear problems and more robust and effective sampling of the entire prior space, giving greater confidence in the final results; that is, a local minimum or a single mode of a multi-modal posterior does not bias the results. Similar probabilistic Bayesian approaches have previously been reported [Rosas-Carbajal et al., 2014, Hauser et al., 2015]; however, in trans-dimensional sampling the observations are used to adapt the resolution as required instead of fixing a global correlation length a priori. Additionally, parallel independent Markov chains with parallel tempering are used in this study to more thoroughly explore the range of possible solutions.
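The ingredients of parallel tempering described above can be sketched in log form. This is a hedged illustration, not the code used in this study: the function names and the maximum temperature of 100 are hypothetical, and the exchange rule shown is the form commonly given in the parallel tempering literature (the text above does not state it explicitly). It follows from each chain targeting the prior times the likelihood raised to the power 1/T, so that the priors cancel when two chains swap models.

```python
import math


def log_spaced_temperatures(t_max, n):
    """n temperatures from 1 to t_max, evenly spaced in log(T)."""
    if n == 1:
        return [1.0]
    return [t_max ** (k / (n - 1)) for k in range(n)]


def tempered_log_alpha(d_log_prior, d_log_like, d_log_q, log_jacobian, T):
    """Log of the tempered acceptance probability.

    Each d_* argument is a log ratio (candidate minus current); only the
    likelihood term is scaled by 1/T.
    """
    return min(0.0, d_log_prior + d_log_like / T + d_log_q + log_jacobian)


def exchange_log_alpha(log_like_i, T_i, log_like_j, T_j):
    """Log acceptance probability for swapping the models of chains i and j.

    Assumed standard parallel tempering rule; the prior terms cancel.
    """
    return min(0.0, (log_like_j - log_like_i) * (1.0 / T_i - 1.0 / T_j))


temperatures = log_spaced_temperatures(100.0, 3)
# A candidate that lowers the log-likelihood by 10 is far more likely to be
# accepted by a hot chain than by the T = 1 chain.
for T in temperatures:
    print(T, tempered_log_alpha(0.0, -10.0, 0.0, 0.0, T))

# A hot chain holding a better-fitting model than the T = 1 chain: the swap
# is always accepted (log probability 0).
print(exchange_log_alpha(-5.0, 1.0, -2.0, 10.0))
```

Only the chains at T = 1 sample the untempered posterior, which is why statistics are collected from those chains alone; the hotter chains serve to feed them models from other regions of the prior space through the exchange step.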

4.3 Application to Broken Hill Managed Aquifer
