Assuming separability, we need only specify the diagonal elements $\theta_i$ of $\Theta$, and therefore may write
$$\rho(x, x') = \exp\left[-\sum_{i=1}^{p} \theta_i \left(x_i - x'_i\right)^2\right].$$
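As an illustration, the separable Gaussian correlation function can be evaluated in a few lines of Python with NumPy. This is a sketch under our own assumptions: inputs are stored as the rows of an array, and the function name `gaussian_correlation` is hypothetical.

```python
import numpy as np


def gaussian_correlation(X, Xp, theta):
    """Separable Gaussian correlation rho(x, x') = exp(-sum_i theta_i (x_i - x'_i)^2).

    X has shape (n, p), Xp has shape (m, p) and theta holds the p diagonal
    elements of Theta. Returns the (n, m) matrix of correlations.
    """
    X = np.atleast_2d(np.asarray(X, dtype=float))
    Xp = np.atleast_2d(np.asarray(Xp, dtype=float))
    theta = np.asarray(theta, dtype=float)
    # Pairwise differences, shape (n, m, p)
    diff = X[:, None, :] - Xp[None, :, :]
    # Weight each squared difference by its own theta_i, then sum over the p inputs
    return np.exp(-np.sum(theta * diff**2, axis=-1))


# Three points in two dimensions; the first input is given the larger theta
X = np.array([[0.0, 0.0], [0.5, 0.0], [0.0, 0.5]])
R = gaussian_correlation(X, X, theta=[4.0, 1.0])
```

The second and third points are the same distance from the first, yet their correlations with it differ, which is exactly what separability with unequal $\theta_i$ permits.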
A further simplification is to make the function isotropic by setting all the correlation lengths equal, $\theta_i = \theta$ for all $i$. This is only viable if the input data have been rescaled so that each input has the same range. It extends the stationarity property by asserting that the correlation between errors at two points depends only on the magnitude of the distance between them, and not on its direction.
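A sketch of the rescaling step and the isotropic special case follows. The function names and the input ranges in the example are hypothetical; here each input is mapped linearly onto $[0, 1]$ before a single $\theta$ is applied.

```python
import numpy as np


def rescale_to_unit(X, lower, upper):
    """Map each input column linearly from [lower_i, upper_i] onto [0, 1]."""
    X = np.asarray(X, dtype=float)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    return (X - lower) / (upper - lower)


def isotropic_gaussian_correlation(X, Xp, theta):
    """Isotropic case theta_i = theta for all i: depends only on ||x - x'||."""
    diff = X[:, None, :] - Xp[None, :, :]
    squared_distance = np.sum(diff**2, axis=-1)
    return np.exp(-theta * squared_distance)


# Hypothetical ranges: the first input lies in [0, 1], the second in [0, 1000]
X_raw = np.array([[0.2, 100.0], [0.6, 900.0], [1.0, 500.0]])
U = rescale_to_unit(X_raw, lower=[0.0, 0.0], upper=[1.0, 1000.0])
R = isotropic_gaussian_correlation(U, U, theta=2.0)
```

Without the rescaling, a single $\theta$ would be dominated by the second input, whose raw differences are three orders of magnitude larger than those of the first.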
Having decided upon a correlation function, we must deal with the correlation hyperparameters $\theta_i$. Although technically they are unknown parameters in a Bayesian framework, they are often treated as fixed, and so the problem becomes one of estimation. Finding an appropriate value is crucial to accurate prediction: a value that is too small will decrease the predictive variance, so that the emulator may be overly confident, whereas one that is too large will make the emulator too uncertain. The values can be validated by checking the performance of the emulator as a predictor, using simulator data that were not used to build the emulator (O’Hagan, 2006). A common approach, and the one used predominantly in this thesis, is maximum marginal likelihood estimation, in which $\beta$ and $\sigma^2$ are integrated out and the resulting log-likelihood is maximised over $\theta$. Diagnostics can be used to flag up problems arising from poorly chosen functions or misplaced assumptions (Bastos and O’Hagan, 2009), and these will be explored in Section 3.5.
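The maximum marginal likelihood step might be sketched as follows. For illustration only, we assume the weak prior of Equation 3.2, a linear mean function $h(x) = (1, x^T)^T$ with $q$ coefficients, and a flat prior on $\theta$. Under these assumptions, integrating out $\beta$ and $\sigma^2$ leaves, up to an additive constant,
$$\log L(\theta) = -\tfrac{1}{2}\log|\Sigma| - \tfrac{1}{2}\log\left|H^T \Sigma^{-1} H\right| - \tfrac{n-q}{2}\log\hat{\sigma}^2,$$
where $\Sigma$ is the correlation matrix $\Sigma(x)$, $\hat{\sigma}^2 = (y - H\hat{\beta})^T \Sigma^{-1} (y - H\hat{\beta})/(n-q-2)$ and $\hat{\beta}$ is the generalised least-squares estimate. The small nugget on the diagonal, the helper names and the toy "simulator" are our own assumptions, and the grid search stands in for a proper numerical optimiser.

```python
import numpy as np


def gaussian_correlation(X, Xp, theta):
    diff = X[:, None, :] - Xp[None, :, :]
    return np.exp(-np.sum(theta * diff**2, axis=-1))


def log_marginal_likelihood(theta, X, y, nugget=1e-8):
    """Log marginal likelihood of theta with beta and sigma^2 integrated out,
    up to an additive constant, assuming p(beta, sigma^2) proportional to 1/sigma^2."""
    n = X.shape[0]
    H = np.column_stack([np.ones(n), X])          # linear mean h(x) = (1, x)
    q = H.shape[1]
    # Correlation matrix Sigma(x); the nugget is an assumption for numerical stability
    Sigma = gaussian_correlation(X, X, theta) + nugget * np.eye(n)
    L = np.linalg.cholesky(Sigma)                 # Sigma = L L^T
    LH = np.linalg.solve(L, H)                    # L^{-1} H
    Ly = np.linalg.solve(L, y)                    # L^{-1} y
    K = LH.T @ LH                                 # H^T Sigma^{-1} H
    beta_hat = np.linalg.solve(K, LH.T @ Ly)      # generalised least-squares estimate
    w = Ly - LH @ beta_hat                        # L^{-1} (y - H beta_hat)
    sigma2_hat = (w @ w) / (n - q - 2)
    log_det_Sigma = 2.0 * np.sum(np.log(np.diag(L)))
    _, log_det_K = np.linalg.slogdet(K)
    return -0.5 * log_det_Sigma - 0.5 * log_det_K - 0.5 * (n - q) * np.log(sigma2_hat)


# Toy one-dimensional "simulator" (hypothetical), evaluated at 12 design points
X = np.linspace(0.0, 1.0, 12)[:, None]
y = np.sin(6.0 * X[:, 0])
# Crude grid search over theta; an optimiser on log(theta) would be used in practice
grid = np.logspace(-1, 3, 41)
scores = np.array([log_marginal_likelihood(np.array([t]), X, y) for t in grid])
theta_hat = grid[np.argmax(scores)]
```

Working with $L^{-1}H$ and $L^{-1}y$ keeps every quadratic form a sum of squares, so $\hat{\sigma}^2$ and $H^T\Sigma^{-1}H$ cannot become negative through rounding.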
3.4 Limitations of this approach
Emulation as described above is not automatically a good choice for any particular simulator. The model relies on assumptions which are not always appropriate.
Firstly, the model described in this chapter is only suitable if the simulator’s output is continuous everywhere in the input space, or if the regression surface can be chosen such that it perfectly captures any discontinuities. If this is not the case, the training data may combine with these wrong assumptions to badly damage the
emulator’s predictions. It is wise to ask a simulator expert whether they expect the output to be continuous and whether, given the output at a point $x$, they would expect to be informed about the output at a very nearby point $x'$ (Oakley, 2002).
The posterior distribution $s(\tilde{x}) \mid s(x)$ for a Gaussian process emulator is determined by the choice of prior $p(\beta, \sigma^2)$. As we have mentioned, this choice is usually biased towards convenience rather than an accurate representation of beliefs. One rarely sees a choice other than the conjugate Normal Inverse-Gamma prior or its weak form shown in Equation 3.2. Although neither of these will ever be exactly correct, they have the advantage of being conjugate, and therefore lead to relatively simple computations. Expecting a simulator expert to specify ‘the best’ prior distribution for $\beta$ and $\sigma^2$, without limiting them to such a family, would be quite unreasonable.
In using the weak and non-informative prior,
$$p(\beta, \sigma^2) \propto \frac{1}{\sigma^2}, \qquad (3.2)$$
the model asserts that we have no information at all about the coefficients $\beta$, which is unlikely to be true. The Normal Inverse-Gamma prior allows the user to specify some information, even if only with a high variance. This is usually done using a combination of two methods. Firstly, one can pose questions to an expert about the behaviour of the simulator at various inputs, and find parameters that fit the answers, a process known as elicitation, explained in more detail by Oakley (2002). Secondly, one can use the simulator data themselves to estimate appropriate parameter values. Craig et al. (1997) combine these approaches in their case study. When simulator training data are scarce, the specification of the prior distribution is crucial to making good predictions. The elicitation approach is demanding, however, and in the absence of a dedicated expert and the presence of many simulator runs, a non-informative prior is a pragmatic choice.
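For concreteness, here is a sketch of the conjugate update under one common parameterisation of the Normal Inverse-Gamma prior, which we assume rather than take from the text: $\beta \mid \sigma^2 \sim N(m, \sigma^2 V)$ and $\sigma^2 \sim \mathrm{IG}(a, b)$ with shape $a$ and scale $b$, combined with $y \mid \beta, \sigma^2 \sim N(H\beta, \sigma^2 \Sigma)$ for fixed correlation parameters. The posterior is $V^* = (V^{-1} + H^T\Sigma^{-1}H)^{-1}$, $m^* = V^*(V^{-1}m + H^T\Sigma^{-1}y)$, $a^* = a + n/2$ and $b^* = b + \tfrac{1}{2}(m^T V^{-1} m + y^T \Sigma^{-1} y - m^{*T} V^{*-1} m^*)$. The function name and the data are hypothetical.

```python
import numpy as np


def nig_posterior(H, y, Sigma, m, V, a, b):
    """Conjugate Normal Inverse-Gamma update for (beta, sigma^2).

    Assumed model: y | beta, sigma^2 ~ N(H beta, sigma^2 Sigma),
    beta | sigma^2 ~ N(m, sigma^2 V), sigma^2 ~ InvGamma(shape a, scale b).
    Returns the posterior parameters (m_star, V_star, a_star, b_star).
    """
    Sigma_inv_H = np.linalg.solve(Sigma, H)
    Sigma_inv_y = np.linalg.solve(Sigma, y)
    V_inv = np.linalg.inv(V)
    P = V_inv + H.T @ Sigma_inv_H                  # posterior precision, per unit sigma^2
    m_star = np.linalg.solve(P, V_inv @ m + H.T @ Sigma_inv_y)
    V_star = np.linalg.inv(P)
    a_star = a + 0.5 * len(y)
    b_star = b + 0.5 * (m @ V_inv @ m + y @ Sigma_inv_y - m_star @ P @ m_star)
    return m_star, V_star, a_star, b_star


# Hypothetical data: a linear trend plus a small smooth departure
x = np.linspace(0.0, 1.0, 8)
y = 1.0 + 2.0 * x + 0.1 * np.sin(10.0 * x)
H = np.column_stack([np.ones_like(x), x])
Sigma = np.eye(len(x))      # identity correlation, used here only as a simple check
m_star, V_star, a_star, b_star = nig_posterior(
    H, y, Sigma, m=np.zeros(2), V=1e8 * np.eye(2), a=2.0, b=1.0
)
```

As the prior precision $V^{-1}$ shrinks towards zero, the prior information about $\beta$ vanishes and $m^*$ approaches the generalised least-squares estimate (here ordinary least squares, since $\Sigma = I$), which gives a quick sanity check.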
The choice of correlation function further constrains the model, and is another source of contention. The Gaussian correlation function is often criticised for being too smooth. For example, Rougier (2009) prefers the Matérn class, even though this leads to less tractable results. Constraining the correlated error to be separable, or even isotropic, is another potentially inappropriate simplification. It may be that in order to capture the behaviour of the simulator, off-diagonal terms must be included
in the matrix $\Theta$, or there may at least be different levels of smoothness in different directions.
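To make the smoothness comparison concrete, the sketch below evaluates the Matérn correlation in its three common closed forms ($\nu = 1/2, 3/2, 5/2$) alongside a Gaussian function, as functions of the distance $r$. The length-scale parameterisation is our assumption; with it, the Gaussian $\exp(-r^2/2\ell^2)$ (that is, $\theta = 1/(2\ell^2)$ in the notation above) is the limit of the Matérn class as $\nu \to \infty$. A process with Matérn correlation is mean-square differentiable $k$ times only for $k < \nu$, whereas the Gaussian correlation makes it mean-square infinitely differentiable, which is the sense in which it is "too smooth".

```python
import numpy as np


def matern(r, length, nu):
    """Matérn correlation at distance r, using the closed forms for nu = 1/2, 3/2, 5/2."""
    s = np.abs(np.asarray(r, dtype=float)) / length
    if nu == 0.5:
        return np.exp(-s)
    if nu == 1.5:
        return (1.0 + np.sqrt(3.0) * s) * np.exp(-np.sqrt(3.0) * s)
    if nu == 2.5:
        return (1.0 + np.sqrt(5.0) * s + 5.0 * s**2 / 3.0) * np.exp(-np.sqrt(5.0) * s)
    raise ValueError("closed form only implemented for nu in {0.5, 1.5, 2.5}")


def gaussian(r, length):
    """Gaussian correlation exp(-r^2 / (2 length^2)), the nu -> infinity limit."""
    r = np.asarray(r, dtype=float)
    return np.exp(-0.5 * (r / length) ** 2)


r = np.linspace(0.0, 3.0, 61)
curves = {nu: matern(r, length=1.0, nu=nu) for nu in (0.5, 1.5, 2.5)}
curves["gaussian"] = gaussian(r, length=1.0)
```

Near $r = 0$ the Matérn curves fall away faster than the Gaussian, and more so for smaller $\nu$; it is this short-range behaviour that governs the smoothness of the emulator's mean surface.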
Although we have emphasised the emulator’s efficiency compared to the simulator’s, it is limited in one way that the simulator is not. Building an emulator requires the inverse or a factorisation of the correlation matrix $\Sigma(x)$. For training data containing $n$ points, this is an $n \times n$ positive definite matrix, so rather than a general-purpose inverse function, such as R’s ‘solve’, the Cholesky decomposition can be used, which improves numerical stability and increases the number of points that can be handled. Even so, because the cost of this operation grows as $n^3$, it limits the amount of training data an emulator can handle, and it can still lead to numerical instability.
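The following sketch shows how a single Cholesky factorisation $\Sigma = LL^T$ yields both $\Sigma^{-1}b$ and $\log|\Sigma|$ without forming an explicit inverse. It uses NumPy rather than R, and the function name is hypothetical. Note that `np.linalg.solve` treats $L$ as a general matrix; SciPy's `scipy.linalg.cho_factor` and `scipy.linalg.cho_solve` exploit the triangular structure directly.

```python
import numpy as np


def cholesky_solve_and_logdet(Sigma, b):
    """Return Sigma^{-1} b and log|Sigma| from a single Cholesky factorisation."""
    # Raises numpy.linalg.LinAlgError if Sigma is not numerically positive definite
    L = np.linalg.cholesky(Sigma)          # Sigma = L L^T, L lower triangular; O(n^3)
    z = np.linalg.solve(L, b)              # forward step: L z = b
    x = np.linalg.solve(L.T, z)            # backward step: L^T x = z
    # |Sigma| = prod(diag(L))^2; summing logs avoids overflow and underflow
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    return x, log_det


# Hypothetical example: Gaussian correlation matrix for 20 points on [0, 1], theta = 100
pts = np.linspace(0.0, 1.0, 20)
Sigma = np.exp(-100.0 * (pts[:, None] - pts[None, :]) ** 2)
b = np.cos(3.0 * pts)
x, log_det = cholesky_solve_and_logdet(Sigma, b)
```

The factorisation is the $O(n^3)$ step; once $L$ is stored, each further solve with a triangular solver costs only $O(n^2)$.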
Kaufman et al. (2011) propose the use of a correlation function with finite support, such that the correlation between points sufficiently far apart is zero. This makes the correlation matrix $\Sigma(x)$ sparse and, through the use of sparse matrix techniques, drastically increases the amount of training data the emulator can handle.
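The effect of finite support can be sketched with the Wendland function $(1-s)_+^4(4s+1)$, $s = r/c$, a well-known compactly supported correlation function that is positive definite for inputs of dimension up to three. It is our choice for illustration and not necessarily one of the functions used by Kaufman et al. (2011); the support radius $c$ and the design are hypothetical.

```python
import numpy as np


def wendland(r, support):
    """Compactly supported Wendland correlation (1 - s)_+^4 (4 s + 1), s = r / support.

    Positive definite for inputs of dimension up to 3; exactly zero for r >= support.
    """
    s = np.asarray(r, dtype=float) / support
    return np.where(s < 1.0, (1.0 - s) ** 4 * (4.0 * s + 1.0), 0.0)


rng = np.random.default_rng(0)
X = rng.uniform(size=(400, 2))                     # hypothetical design in [0, 1]^2
# Dense distance matrix, formed here only for illustration
dist = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
R = wendland(dist, support=0.1)
fraction_nonzero = np.count_nonzero(R) / R.size
```

Only a few per cent of the entries are non-zero. In practice the dense distance matrix would not be formed at all; the non-zero entries would be found with a neighbour search and stored in a sparse format such as `scipy.sparse.csr_matrix`, so that sparse factorisation routines can be applied.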
In general, building an emulator requires keeping track of a large number of quantities, and of the modelling choices made at each step. By the time one comes to use an emulator to predict a simulator’s behaviour at new points, the original training data, the regression and correlation length specifications and so on could easily have become confused or been lost. Although these issues can be avoided by careful organisation, they are real. With this in mind, in Chapter 7 we present an object-oriented framework for emulation which enforces a tight structure on the entire emulation process. This framework also brings computational savings and ease of adaptation. Indeed, once the core framework, which accommodates the techniques in this chapter, has been introduced, it will be extended to include the methods presented in later chapters.
Any of the issues raised in this section would be a rich area for study. In this thesis, however, the focus is on developing new and fairly general frameworks for emulating multiple simulators, rather than on building the best possible emulator for a particular setting. The methods developed in Chapters 5 and 6 can be used with any of the modelling choices described in this chapter, and so the choices made to illustrate them will often be fairly simple and pragmatic.
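The object-oriented idea can be illustrated with a minimal Python class that keeps the training data, mean basis and correlation parameters together and caches the Cholesky factor once. This is a hypothetical sketch, not the framework of Chapter 7: the class and method names are ours, and it assumes the weak prior of Equation 3.2, a linear mean, and a Gaussian correlation with fixed $\theta$. It returns the predictive mean $h(x)^T\hat{\beta} + t(x)^T\Sigma^{-1}(y - H\hat{\beta})$ and the predictive variance $\hat{\sigma}^2\left[1 - t^T\Sigma^{-1}t + (h - H^T\Sigma^{-1}t)^T (H^T\Sigma^{-1}H)^{-1} (h - H^T\Sigma^{-1}t)\right]$, with $\hat{\sigma}^2$ computed using the divisor $n - q - 2$.

```python
import numpy as np


class Emulator:
    """Hypothetical sketch: keeps the training data, mean basis and correlation
    parameters together, and computes the expensive factorisation only once.

    X has shape (n, p). Assumes p(beta, sigma^2) proportional to 1/sigma^2,
    a linear mean h(x) = (1, x) and a Gaussian correlation with fixed theta.
    """

    def __init__(self, X, y, theta):
        self.X = np.atleast_2d(np.asarray(X, dtype=float))
        self.y = np.asarray(y, dtype=float)
        self.theta = np.asarray(theta, dtype=float)
        n = self.X.shape[0]
        H = self._basis(self.X)
        q = H.shape[1]
        self.L = np.linalg.cholesky(self._corr(self.X, self.X))  # Sigma = L L^T
        self.LH = np.linalg.solve(self.L, H)                     # L^{-1} H
        Ly = np.linalg.solve(self.L, self.y)                     # L^{-1} y
        self.K = self.LH.T @ self.LH                             # H^T Sigma^{-1} H
        self.beta = np.linalg.solve(self.K, self.LH.T @ Ly)      # GLS estimate of beta
        w = Ly - self.LH @ self.beta                             # L^{-1}(y - H beta)
        self.sigma2 = (w @ w) / (n - q - 2)
        self.alpha = np.linalg.solve(self.L.T, w)                # Sigma^{-1}(y - H beta)

    @staticmethod
    def _basis(X):
        return np.column_stack([np.ones(X.shape[0]), X])

    def _corr(self, A, B):
        diff = A[:, None, :] - B[None, :, :]
        return np.exp(-np.sum(self.theta * diff**2, axis=-1))

    def predict(self, Xnew):
        """Predictive mean and variance at the rows of Xnew."""
        Xnew = np.atleast_2d(np.asarray(Xnew, dtype=float))
        h = self._basis(Xnew)                                    # (m, q)
        t = self._corr(self.X, Xnew)                             # (n, m)
        mean = h @ self.beta + t.T @ self.alpha
        Lt = np.linalg.solve(self.L, t)                          # L^{-1} t
        R = h - (self.LH.T @ Lt).T                               # h - t^T Sigma^{-1} H
        c = (1.0 - np.sum(Lt**2, axis=0)
             + np.sum(R * np.linalg.solve(self.K, R.T).T, axis=1))
        return mean, self.sigma2 * c


X = np.linspace(0.0, 1.0, 8)[:, None]
em = Emulator(X, np.sin(6.0 * X[:, 0]), theta=[20.0])
mean_train, var_train = em.predict(X)
mean_mid, var_mid = em.predict([[0.5]])
```

Because the factor $L$ is computed once in the constructor and reused by every call to `predict`, the training data, the modelling choices and the cached computations cannot drift apart. At the training points the emulator interpolates the data with (numerically) zero variance.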