
Literature review

1.2 Bayesian inference

1.2.4 MCMC in practice: software

An issue separate from, but related to, diagnosing mixing is writing correct MCMC code. Even if one designs a mathematically correct and well-tuned MCMC algorithm, buggy code can generate samples that are not from the desired distribution. In this section we outline some methods for testing MCMC code (see [25] for an overview of such methods).

There are several difficulties with testing MCMC code:

• The algorithm is stochastic, so there is no single “correct” output. The output is rather a distribution that is unknown before writing the software (if it were known, the software would not be necessary).

• The software may work in simple cases but fail in more complicated ones, so testing the software on a simple distribution may not find a bug that only appears when sampling from more complicated distributions.

A first thing to keep in mind is to write modular code, that is, to structure the code into reusable and loosely coupled blocks. For example, in an object-oriented paradigm one might create a move class which proposes samples and performs the Metropolis-Hastings accept-reject step, a backend class which stores previous samples and regularly saves them to file, and a sampler class which runs the main MCMC loop. The benefit of a modular structure is that it encourages the programmer to think about the structure of the program and to create reusable components, which are in turn easier to test.
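The move/backend/sampler split described above can be sketched as follows. This is only a minimal illustration under stated assumptions: the class names `RandomWalkMove`, `MemoryBackend` and `Sampler`, the one-dimensional Gaussian random walk proposal, and the standard normal target are all hypothetical choices made for the example, and the backend keeps samples in memory rather than writing them to file.

```python
import math
import random
import statistics


class RandomWalkMove:
    """Proposes a new state and performs the Metropolis-Hastings accept-reject step."""

    def __init__(self, log_target, step_size, rng):
        self.log_target = log_target
        self.step_size = step_size
        self.rng = rng

    def step(self, x):
        # Symmetric Gaussian proposal, so the Hastings correction vanishes.
        proposal = x + self.rng.gauss(0.0, self.step_size)
        log_ratio = self.log_target(proposal) - self.log_target(x)
        # 1 - random() lies in (0, 1], so the logarithm is always defined.
        accepted = math.log(1.0 - self.rng.random()) < log_ratio
        return (proposal if accepted else x), accepted


class MemoryBackend:
    """Stores samples; a real backend would also save them to file regularly."""

    def __init__(self):
        self.samples = []

    def save(self, x):
        self.samples.append(x)


class Sampler:
    """Runs the main MCMC loop, delegating to the move and the backend."""

    def __init__(self, move, backend):
        self.move = move
        self.backend = backend
        self.n_accepted = 0

    def run(self, x0, n_iter):
        x = x0
        for _ in range(n_iter):
            x, accepted = self.move.step(x)
            self.n_accepted += accepted
            self.backend.save(x)
        return self.backend.samples


# Usage: sample a standard normal target (assumed here purely for illustration).
rng = random.Random(0)
move = RandomWalkMove(lambda x: -0.5 * x * x, step_size=2.4, rng=rng)
sampler = Sampler(move, MemoryBackend())
samples = sampler.run(x0=0.0, n_iter=20000)
sample_mean = statistics.fmean(samples)
sample_var = statistics.pvariance(samples)
```

Because each class has a single responsibility, the move can be tested on its own (for instance with a fixed random seed), and the backend can be swapped for one that writes to disk without touching the main loop.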

We distinguish two types of tests: unit tests and integration tests [25]. Unit tests check that small chunks of code, such as individual functions, have the required functionality. Integration tests check the software as a whole, without reference to its individual parts. Writing tests at the same time as writing code also encourages the programmer to state explicitly what the expected output should be, as well as to consider under what conditions the code could fail. Much like MCMC diagnostics, comprehensive tests allow the practitioner (and any user of the software) to trust the outputs.
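As a hedged illustration of what a unit test might look like in this setting, the sketch below tests a single small function that computes the log Metropolis-Hastings ratio. The function `log_accept_ratio`, its argument names, and the toy densities are assumptions made for the example, not part of any particular library. An integration test would instead run the whole sampler end to end and check properties of its output.

```python
import unittest


def log_accept_ratio(log_target, x, y, log_q):
    """Log Metropolis-Hastings ratio for a move from x to y.

    log_q(a, b) is the (unnormalised) log density of proposing b from a.
    """
    return (log_target(y) - log_target(x)) + (log_q(y, x) - log_q(x, y))


class LogAcceptRatioTest(unittest.TestCase):
    def test_symmetric_proposal_reduces_to_target_ratio(self):
        # With a symmetric proposal the Hastings correction must cancel.
        log_target = lambda x: -0.5 * x * x
        log_q = lambda a, b: -0.5 * (b - a) ** 2
        self.assertAlmostEqual(
            log_accept_ratio(log_target, 0.0, 1.0, log_q), -0.5
        )

    def test_reverse_move_has_opposite_sign(self):
        # For any proposal, the ratio for y -> x is the negative of x -> y.
        log_target = lambda x: -x
        log_q = lambda a, b: -abs(b - 2 * a)  # deliberately asymmetric
        forward = log_accept_ratio(log_target, 1.0, 3.0, log_q)
        backward = log_accept_ratio(log_target, 3.0, 1.0, log_q)
        self.assertAlmostEqual(forward, -backward)
```

Each test states the expected output explicitly, which is the discipline the paragraph above refers to.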

We now present a non-exhaustive list of tests one could write (besides unit tests) for MCMC code:

• Test the software against analytically known solutions. For example, one can use the MCMC sampler to sample from a Gaussian (both one-dimensional and higher-dimensional) and check the marginal distributions along with the empirical covariance matrix of the samples. This also applies to deterministic software such as PDE solvers: one can test a solver for a PDE against initial conditions with a known solution.

• For a Gibbs sampler (as described in [25]), one can check that the conditional distributions are consistent with the joint distribution. Namely, for any two samples $x$ and $x'$ one can check that the following equality holds: $\frac{p(x \mid z)}{p(x' \mid z)} = \frac{p(x, z)}{p(x', z)}$. For most models, if the formula used in the conditional distribution is wrong, then this equality will not hold.

• For any MCMC sampler one can run the following test (sometimes called the prior reproduction test): sample from the prior $\theta_0 \sim \pi_0$, generate data using this prior sample $X \sim p(X \mid \theta_0)$, and run the to-be-tested sampler long enough to get an independent sample from the posterior $\theta_p \sim \pi(\theta \mid X)$. The sample from the posterior should be distributed according to the prior (derivation in theorem (A.0.1) in the appendix); one can repeat this procedure to obtain many samples $\theta_p$ and test whether they are distributed according to the prior. The authors in [14] suggest a slight variation on this idea: at each replication they sample from the posterior $L$ times (obtaining the samples $(\theta_1, \theta_2, \ldots, \theta_L)$) and estimate $\hat{q} = \frac{1}{L} \sum_{i=1}^{L} I_{\theta_0 > \theta_i}$. By repeating this test many times (thus obtaining many realisations of $\hat{q}$), they check that $\hat{q}$ is uniformly distributed. Whichever of the two versions one chooses, this type of test has high coverage (i.e. it tests many aspects of the software at once) but can be very computationally expensive to run, as one needs to run the sampler on the posterior for each prior sample $\theta_0$. Furthermore, if the MCMC sampler needs extensive hand tuning then it is not practical to get enough samples $\theta_p$ to reliably assess their sampling distribution. One can however mitigate this problem by running the test for blocks of parameters (i.e. conditional distributions) rather than for the entire posterior at once.

• In some cases, simply sampling from the prior using the to-be-tested sampler can be a powerful test. This can be very easy to implement (simply remove the likelihood from the posterior) and can run very quickly. If this method is practical for the sampling problem, it can test almost the entire framework (with the exception of the likelihood); for example, it can test that the Hastings corrections in the proposals were correctly calculated and implemented. It might however not be applicable in some cases, for example where the prior is not proper or where sampling from the prior is qualitatively different from sampling from the posterior. In the latter case, one might have to implement a different proposal, which partly defeats the purpose of testing the code. However, in cases such as the Bayesian inverse problem considered in this thesis, this test is very easy to perform and is almost as powerful as the prior reproduction test (which in this case is impractical to do on the entire posterior).
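As an illustration of the prior reproduction test in the variant that computes $\hat{q}$, the sketch below applies it to a toy conjugate model. Everything specific here is an assumption made for the example: the model ($\theta \sim N(0, 1)$, one observation $X \mid \theta \sim N(\theta, 1)$, so the posterior is $N(X/2, 1/2)$), the function names, the use of exact posterior draws as a stand-in for the sampler under test, and the Kolmogorov-Smirnov distance as the uniformity check. With a real MCMC sampler the posterior draws would need to be thinned until approximately independent.

```python
import math
import random

PRIOR_SD = 1.0            # theta ~ N(0, 1)
NOISE_SD = 1.0            # X | theta ~ N(theta, 1), a single observation
POST_SD = math.sqrt(0.5)  # exact posterior is N(x / 2, 1 / 2)


def correct_sampler(x, n, rng):
    """Stand-in for the sampler under test: exact posterior draws."""
    return [rng.gauss(0.5 * x, POST_SD) for _ in range(n)]


def buggy_sampler(x, n, rng):
    """Deliberate bug: the posterior variance is passed as the standard deviation."""
    return [rng.gauss(0.5 * x, 0.5) for _ in range(n)]


def rank_statistics(sampler, n_rep, n_post, rng):
    """One q-hat per replication: fraction of posterior draws below theta_0."""
    q_hats = []
    for _ in range(n_rep):
        theta0 = rng.gauss(0.0, PRIOR_SD)   # theta_0 ~ prior
        x = rng.gauss(theta0, NOISE_SD)     # X ~ p(X | theta_0)
        draws = sampler(x, n_post, rng)     # theta_1, ..., theta_L ~ posterior
        q_hats.append(sum(theta0 > t for t in draws) / n_post)
    return q_hats


def ks_distance_to_uniform(values):
    """Kolmogorov-Smirnov distance between the empirical CDF and Uniform(0, 1)."""
    ordered = sorted(values)
    n = len(ordered)
    return max(max((i + 1) / n - v, v - i / n) for i, v in enumerate(ordered))


rng = random.Random(1)
d_correct = ks_distance_to_uniform(rank_statistics(correct_sampler, 5000, 200, rng))
d_buggy = ks_distance_to_uniform(rank_statistics(buggy_sampler, 5000, 200, rng))
print(f"KS distance, correct sampler: {d_correct:.3f}")
print(f"KS distance, buggy sampler:   {d_buggy:.3f}")
```

In this toy setting the distance for the correct sampler should stay close to zero, while the underdispersed buggy sampler pushes $\hat{q}$ towards 0 and 1 and so produces a visibly larger distance. The cost noted above is also apparent: each replication requires a fresh run of the sampler on a new posterior.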

As with MCMC diagnostics, these tests do not “prove” that a piece of software works correctly; rather, they increase our confidence in it. Furthermore, the number and variety of automated tests must be roughly proportional to the complexity of the code base.