Emulation of multivariate simulators using thin-plate splines with application to atmospheric dispersion

(1)

Thin-plate spline emulation of multivariate simulators with

application to atmospheric dispersion

Veronica E. Bowman1 _{and David C.Woods}2∗ 1_{Defence Science and Technology Laboratory}

2_{University of Southampton}

It is often desirable to build a statistical emulator of a complex computer simulator in order to perform analysis which would otherwise be computationally infeasible. We propose methodology to model multivariate output from a computer simulator taking into account spatial structure in the responses. The utility of this approach is demonstrated by applying it to a chemical and biological hazard prediction model. Predicting the hazard area which results from an accidental or deliberate chemical or biological release is imperative in civil and military planning and also in emergency re-sponse. The hazard area resulting from such a release is highly structured in space and we therefore propose the use of a thin-plate spline to capture the spatial structure and fit a Gaussian process emulator to the coefficients of the resultant basis functions. We compare and contrast three different techniques for emulating multivariate output: a fully Bayesian approach with a principal component basis, a fully Bayesian approach with a thin-plate spline basis, assuming that the basis coefficients are independent and a “plug-in” Bayesian approach with a thin-plate spline basis and a separable covariance structure. We develop methodology for these latter two emulators and demonstrate that a thin-plate spline emulator significantly outperforms the principal component emulator. Further, the separable emulator, which accounts for the dependence between basis coefficients, provides substan-tially more realistic quantification of uncertainty, and is also computationally more tractable, allowing almost instant emulation.

AMS 2000 subject classifications: Primary 62G08, 62H25; secondary 62P12.

Keywords and phrases: Dimension reduction; Gaussian process; Multivariate regression; Principal components; Separable emulator.

1. Introduction

The simulation of scientific and engineering systems via complex mathematical models has become a common method of gaining knowledge about processes where physical experimenta-tion is infeasible or unaffordable. Encapsulated in computer codes or simulators, many of these models require substantial computing time to evaluate the response for a given set of inputs. For even moderately expensive simulators, the computational resources required to perform, for example, Monte Carlo inference may be prohibitive in practice. Hence, building an emulator or surrogate for the computer model, trained on a, usually small, set of simulator evaluations, has become standard practice; see for example, the seminal paper of Sacks et al. (1989), Kennedy et al. (2005), who presented a number of case studies of such computer experiments, and the book-length treatments of Santner et al. (2003) and Fang et al. (2006). Computationally cheap emulators allow for real-time decision making and greater scientific understanding, for example through sensitivity and uncertainty analyses.

∗_{Address for correspondence: David Woods, Southampton Statistical Sciences Research Institute, University}

(2)

Statistical modelling is a common method for constructing emulators. Essentially, simula-tor output is treated as a realisation of a stochastic process, and regression models are fitted to the data in order to approximate the relationship between the simulator inputs and the outputs. The most common emulator is the Gaussian process (see, for example, Rasmussen and Williams, 2006), a smooth non-parametric interpolator. An emulator based on a statis-tical model allows for prediction of the simulator at untested inputs and quantification of the associated uncertainty. For deterministic simulators, as considered in this paper, this uncer-tainty is a result of incomplete knowledge of the simulator across the whole input space and approximation error from the regression model.

Increasingly, modern applications involve highly multivariate simulators, with each run of the simulator producing data from, for example, a curve, surface or other high-dimensional structure. The standard approach is to perform dimension reduction on the multivariate output using a set of appropriate basis functions and then use scalar emulation methods, such as the Gaussian process, on the basis coefficients. We delay our discussion of the related statistical literature to Section 2.

Motivated by a simulator of chemical and biological dispersion, the contribution of this paper is to propose the use of thin-plate splines as basis functions for two-dimensional multivariate data with spatial structure, and to develop the necessary methodology for their application. For a two-dimensional output, a thin-plate spline basis provides a spatial mapping using the proximity of data to a set of knots (see Section 3). We implement both a fully Bayesian thin-plate spline emulator using Markov Chain Monte Carlo and also a “plug-in” emulator (see, for example, Kennedy and O’Hagan, 2001) with a separable covariance structure (Rougier, 2008) and correlation parameters estimated using a validation data set. We provide a detailed comparison of these competing methodologies.

Dispersion simulators have widespread application in environmental monitoring, civilian emergency planning and military applications, for example in the protection against terrorism threats. Available simulators range from quite simple Gaussian plume models (e.g. Clarke, 1979), through Gaussian puff models (e.g. Sykes et al., 1998) to computationally expensive Lagrangian models (e.g. Jones et al., 2007).

In this paper, the problem of emulating a multivariate simulator built from a Gaussian puff model is considered. For each run of the simulator, the response of interest is the dosage, defined as the integrated concentration over time, measured at a large number of points in a two-dimensional geographical domain. Inputs to the simulator include meteorological variables (such as wind speed and direction) and variables related to the source of the dispersion (such as location and size of release). Although this simulator is relatively fast (∼ 1 minute per run), this can still be too slow for tasks which require a large number of evaluations in a limited time. The motivation for this work is the eventual need to replace evaluations of the simulator in a sensor placement algorithm.

Previously, a number of authors have considered the univariate problem of calibrating (rel-atively simple) dispersion models using actual dispersion data obtained under a single, but uncertain, set of meteorological and source conditions; see, for example, Smith and French (1993), Kennedy and O’Hagan (2001), Politis and Robertson (2004) and Robins et al. (2009). Typically, the aim of such work is to solve the inverse problem of identifying unknown features

(3)

of the source.

The remainder of this paper is organised as follows. In Section 2, we introduce and describe the multivariate emulator, dimension reduction techniques for multivariate outputs and separa-ble covariance structures. The necessary methods for thin-plate spline emulation are developed in Section 3, and applied to an illustrative dispersion model in Section 4. In this section, we also compare our methodology to applying dimension reduction via principal components. In Section 5, we conclude with some discussion and areas for future research. To aid the reader, Appendix A contains a full of list of the nomenclature developed in the following sections. Other appendices provide mathematical and computational details of the methods.

2. Multivariate emulation

Let xi = (x1i, . . . , xq1i) be the vector of input values at which the ith run of the simulator

is performed, i.e. the ith input point (i = 1, . . . , n), and let Yi = (Y1(s1), . . . , Yr(sr))T be

the vectorised output from this run. The vector sj = (s1j, . . . , sq2j) locates the jth output in

the q2 dimensional output domain. For example, for a two-dimensional simulator, q2 = 2 and

sj = (sij, s2j), which may be the geographical coordinates of the response.

Dimension reduction is obtained through assuming, for each output vector, the linear model

Yi = p

X

k=1

ak(s)βk(xi) + ei. (1)

In (1), a1(s), . . . , ap(s) are a set of r × 1 basis vectors which are assumed independent of

xi but which may depend on the indexes s = (sT1, . . . , sTr)T. The corresponding coefficients

β1(xi), . . . , βk(xi) may depend on the inputs xi, and ei is a r-vector of errors resulting from the

basis function approximation. Let β(xi) = (β1(xi), . . . , βp(xi))T. Assuming Yi has been

mean-centered and standardized to have variance equal to one, we then complete the hi-erarchical specification of the model through choice of the following prior distributions for β = (β(x1)T, . . . , β(xn)T)T and e = (eT1, . . . , eTn):

β|C ∼ N (0np, C) , (2)

and

e|σ2 ∼ N (0nr, Inrσ2) , (3)

with C a np × np covariance matrix, Inr the nr × nr identity matrix and hyper-parameter σ−2

given a gamma prior distribution with density π(σ−2) ∝ σ2−2aσ_e1/bσσ2_.

Similar models have been developed and applied by a variety of authors, usually assuming ak(s) = ak for all k = 1, . . . , p. Campbell et al. (2006) considered a variety of basis functions

for one-dimensional functional responses, including orthogonal polynomials and data-adaptive choices such as principal components and partial least squares. A wavelet basis was used by Bayarri et al. (2007) in a calibration problem with a functional response. Higdon et al. (2008) used a principal components basis for an example of cylinder deformation from the Manhattan

(4)

project. These latter authors had a two-dimensional functional response, dependent on angle and time, and were again concerned with model calibration.

Principal Components Analysis (PCA; Jolliffe, 2002) is a data-reduction technique that has been commonly used in the literature for defining a new, orthogonal basis for a set of multivariate data. For a r × n matrix Y = [Y1, . . . , Yn], the principal components are defined

through the singular value decomposition of Y = U DVT, where U is a r × n orthogonal matrix, D is a n×n diagonal matrix holding the singular values and V is a n×n orthonormal matrix. A p dimensional principal component basis a1, . . . , ap is given by the first p columns of n−1/2U D,

with weights βk(xi) given by entry (k, i) in n1/2V . The data-dependent nature of the principal

components provides a flexible non-parametric modelling approach; the basis functions are orthogonal and have the property of defining subspaces with the largest variance, see Hastie et al. (2009, ch.3).

For the specification of the covariance matrix, C, for β, we consider two simplifying cases.

(i) Model parameters βj(xu) and βk(xv) are assumed independent for j 6= k (j, k = 1, . . . , p; u, v =

1, . . . , n). That is, C is a block diagonal matrix with the ith n × n block having uvth entry defined via a stationary and isotropic correlation function

Ciuv = Wir(x)uvτi. (4)

Here, W_ir(x) is defined as an n×n between-run correlation matrix for β_k = (βk(x1), . . . , βk(xn)),

with uvth entry ρ(xu, xv; θi) and the τiare scale parameters, with each τi−1given a gamma

prior distribution. We define θi = (θ1i, . . . , θq1i) as a vector of parameters controlling the

correlation.

(ii) The covariance matrix C = W τ , with W a correlation matrix assumed to have a separable structure (Rougier, 2008). That is, W = Ws(s) ⊗ Wr(x). Here, Ws(s) is a p × p within-run correlation matrix common to the β(xi), x = (xT1, . . . , xTn)T and Wr(x) is

an n × n between-run correlation matrix, defined in (4), but under the assumption that W₁r(x) = . . . = W_pr(x), θ1 = . . . = θp and τ1 = . . . = τp. We discuss the choice of Ws(s)

for our thin-plate spline emulators in Section 3.

In this paper, we use the Gaussian correlation function

ρ(xu, xv; θ) = q1 Y k=1 θ4(xku−xkv)2 k . (5)

A fully Bayesian approach requires a prior distribution for each θk and we assume common

beta prior densities with π(θk) ∝ θaθ −1

k (1 − θk)bθ−1.

Point prediction of the response at a new input point, x?, is via

ˆ Y? = p X k=1 ak(s) ˆβk(x?) , (6)

(5)

where ˆβk(x?) is an appropriate summary of the posterior distribution of βk(x?). There is

no conjugate prior distribution available for unknown θ and so to obtain the full marginal distribution, Markov Chain Monte Carlo (MCMC; O’Hagan and Forster, 2004, ch. 10) must be used or a maximum a posteriori (MAP), or other estimator, of θ can be substituted into the conditional posterior distribution of βk(x∗).

3. Thin-plate spline emulators

To provide a set of flexible and data-driven basis vectors a1(s), . . . , ap(s) that maintain the

multidimensional spatial structure inherent in s, we use thin-plate regression splines; see Wood (2003). Define the r × r matrix E to have uvth entry ηlq2(||su − sv||), with || · || defined as

Euclidean distance and

ηlq2(t) =      (−1)l+1+q2/2 22l−1_π_q2/2_{(l−1)!(l−q} 2/2)!t 2l−q2_{log(t) for q} 2 even , Γ(q2/2−l) 22l_πq2/2(l−1)!t 2l−q2 _{for q} 2 odd .

A thin plate spline for the ith simulator run is then the solution to

minimize ||Yi− Eδi− T αi||2+ λiδTi Eδi subject to TTδi = 0 (7)

with respect to δi and αi for λi ≥ 0 (i = 1, . . . , n). The matrix T holds basis vectors

corre-sponding to orthogonal polynomials in Rq2 _{of degree less than d.}

To avoid problems of choosing “knot locations” in selecting a regression basis, Wood (2003) defined a thin plate regression spline basis as a rank p1 approximation to the spline, with basis

vectors given by the columns of Up1Dp1Zp1 and T , where Dp1 is a p1 × p1 diagonal matrix

holding the p1 largest eigenvalues of E ordered by absolute value, Up1 is an r × p1 matrix

holding the corresponding eigenvectors and Zp1 is a p1 dimensional orthogonal column basis

such that TT_U

p1Zp1 = 0. That is, an approximation to problem (7) with λi = 0 is given by

minimize ||Yi− Up1Dp1Zp1βp1(xi) − T βp−p1(xi)||

2_,

where δi = Up1Zp1βp1(xi), αi = βp−p1(xi), β(xi) = (βp1(xi)

T_{, β}

p−p1(xi)

T₎T _{and the condition}

TT_U

p1Zp1 = 0 ensures T

T_δ

i = 0. Hence, we take the first p1 basis vectors, a1(s), . . . , ap1(s),

to be the columns of Up1Dp1Zp1, and the second p − p1 basis vectors, ap1+1(s), . . . , ap(s), to be

the columns of T .

To define the common prior distribution on β(xi), we use the fact that βp1(xi) = Z

T p1U

T p1δi

and define the normal prior distribution δi ∼ N (0, Qτ ) with Q having uvth entry ρ(su, sv; ν),

with ρ(·) defined by (5) and correlation parameters ν = (ν1, . . . , νq2) (u, v = 1, . . . , r).

There-fore, we have the prior β_p₁(xi) ∼ N (0, ZpT1U

T

p1QUp1Zp1τ ). Further, we assume βp1(xi) and

(6)

(a) Perspective Plot x Coordinate y Coordinate 0 20 40 60 80 100 120 0 20 40 60 80 100 120 (b) Contour Plot

Figure 1: A Typical Dosage Output from the Dispersion Model Output on a Logged Scale

4. Application to the dispersion simulator

In this section, the methodology from Sections 2 and 3 is applied to an exemplar dispersion simulator. As described in Section 1, the response of interest is dosage (integrated concentration over-time). For hazard prediction, final dosage after a release is the main output of interest. The methodology could also be applied to emulate concentration at a given time or extended to include time as an extra dimension to either the input or output (cf Conti and O’Hagan, 2010).

Atmospheric dispersion is affected by both the local geography and meteorology of the studied area. Hence, dispersion simulators in general have a variety of inputs associated with both these classes of variables. Complex spatial features, both natural (e.g. hills, valleys or coasts) and man-made (e.g. buildings) can substantially affect the impact of meteorology. However, simulations involving complex geography have only limited applicability (i.e to that terrain) and can obscure general trends in dispersion due to changing meteorology. Hence, there is interest in investigating dispersion simulators on flat, non-urban, inland terrain, and we demonstrate our methodology on such a scenario.

In addition, there are numerous other inputs to a dispersion emulator, including those related to the release (see, for example, Bowman and Woods, 2012). For this application, the input space is limited to the q1 = 2 dimensions which, from previous experience, are known to

have most affect on dosage: mass of the release and wind speed. For a flat terrain scenario, it is standard to specify other variables known to affect dosage, such as wind direction and location of release, post-emulation through rotation and translation operations. Example output from one run of the simulator considered is given in Figure 1.

For the ith run, dosage is output on a q2 = 2 dimensional k × k grid (i = 1, . . . , n); here,

the grid is common to all runs. Hence, sj = (s11, s21) denotes the (standardised) geographical

location of the calculated dosages. The vector Yi holds the r = k2 dosages for the ith run; we

in fact model log (Yi + 1). The simulator input values are held in xi = (x1i, x2i).

We apply and compare three emulation approaches, following Section 2:

(7)

et al., 2008).

2. A fully Bayesian approach using a thin plate spline basis (Wood, 2003) and assuming independence of the elements of β(xi) (Independent TPS emulator).

3. A “plug-in” Bayesian approach using a thin plate spline basis and assuming a separable covariance structure (Rougier, 2008) for β (Separable TPS emulator).

For emulators 1 and 2, the posterior predictive distribution for Y is obtained using MCMC with convergence assessed graphically using diagnostic plots; a nugget term was added to improve the conditioning of the variance-covariance matrix of (2). For these emulators, we also assume Ws_{(s) = I}

p; that is, within-run the elements of β(xi) are assumed independent. While

for emulator 1 this assumption follows directly from the use of a PC basis, we acknowledge it may lead to overconfident prediction intervals and an over-estimate of the effective sample size for emulator 2. However, it substantially reduces the computational burden of the MCMC algorithm and provides a benchmark for the plug-in, separable emulator 3. We investigate these issues in the following subsections.

4.1. Design of the simulator runs

The emulators were trained using 80 simulator runs generated as a maximin Latin Hyper-cube sample (Morris and Mitchell, 1995). A further 75 simulator runs were generated randomly as a Monte Carlo sample, with 40 of these runs used for validation and choice of some prior hyper-parameters. The remaining 35 runs were used as test cases to assess emulator accuracy. All simulator runs were generated for a k = 8 grid of spatially uniform points.

4.2. PC Emulator

Bayesian inference for the PC emulator proceeded using MCMC and the likelihood derived by Higdon et al. (2008). Specification of the prior hyper-parameters for the common gamma distributions for each τ_i−1 is simplified by the standardization of the simulator output, which leads to an expectation of the variance of each β_i(x) being close to one. Hence, we choice an informative Gamma(5, 0.2) prior distribution, again following Higdon et al. (2008) (see Appendix E, Figure 9 for the parameterization of the gamma distribution being used). Other hyper-parameters, for the prior distributions for σ−2 in (3) and θi in (5), in addition to the

number of principal components, are chosen using the validation runs, in the spirit of empirical Bayes.

For a given number of principal components and values of the hyper-parameters, predictions for the 40 validation runs were obtained using the mean of the posterior predictive distribution for the βk(x). Figure 2 shows the mean squared error (MSE) for 45 different combinations

of (a) number of principal components; (b) hyper-parameters aθ and bθ, common to the beta

distributions assumed for θi; and (c) hyper-parameters aσ and bσ for the gamma distribution

assumed for σ−2. Hyper-parameters for this latter model parameter are treated as tuning constants, rather than being elicited from subject experts, as the response is deterministic and

(8)

● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0 10 20 30 40 13 14 15 16 Index MSE p=2 p=3 p=4 p=5 p=10 aθ bθ aσ bσ 1 0.1 2 0.001 2 0.01 ◦ 2 0.1 + 1 3 2 0.001 2 0.01 ◦ 2 0.1 + 2 5 2 0.001 2 0.01 ◦ 2 0.1 +

Figure 2: PC emulator: mean squared error (MSE) from the validation set of 40 runs for 45 combinations of numbers of principal components (PC) and hyper-parameter values. The dotted lines indicate breaks between the three different combinations of aθ, bθ, the common

hyper-parameters for θk. The different plotting symbols indicate different combinations of aσ

and bσ, the hyper-parameters for σ−2.

hence a non-zero σ2 _{only results from approximation error introduced by the truncated basis}

expansion.

In general, MSE is lowest for smaller numbers of principal components and lower values of bσ,

we choose aσ = 2 to define a gamma distribution with limz→0+f (z) = 0 and lim_z→0+f0(z) = b−2_σ ,

so that larger values of bσ result in higher prior density for larger values of σ−2; various shapes

of beta prior were tested and the results are fairly robust to the choices of aθ and bθ. The

probability density functions for both the beta and gamma distributions used throughout can be found in Appendix E. The upward trend in MSE for greater numbers of principal components is indicative of slight over fitting to the training data; lower values of bσ lead to greater prior

density for larger values of σ2_{, appropriate for low-order basis approximations.}

We choose hyper-parameters to give the minimum MSE: p = 3, aθ = 1, bθ = 3, aσ = 2

and bσ = 0.01; however, several similar parameter settings produce comparable sized errors.

The MCMC is run for 10000 iterations with a burn-in of 1000 which produces acceptable trace plots with no bias from starting location. No thinning is performed as the trace plots show the chains mixing well.

(9)

● ● ● ● ● ● ● ● ● ● ● ● 5 10 15 20 9.3 9.4 9.5 9.6 9.7 Index MSE p=5 p=10 p=15 aθ bθ aσ bσ 1 0.1 2 0.01 ◦ 2 0.1 + 2 5 2 0.01 ◦ 2 0.1 + 5 1 2 0.01 ◦ 2 0.1 + 0.5 0.5 2 0.01 ◦ 2 0.1 +

Figure 3: Independent TPS emulator: mean squared error (MSE) from the validation set of 40 runs for 24 combinations of numbers of thin-plate spline basis functions and hyper-parameter values. The dotted lines indicate breaks between the three different combinations of aθ, bθ, the

common hyper-parameters for θi. The different plotting symbols indicate different combinations

of aσ and bσ, the hyper-parameters for σ−2.

4.3. Independent TPS Emulator

The Independent TPS Emulator is implemented in a similar way to the PC emulator but using a thin-plate spline basis as described in Section 3. We adapt the Higdon et al. (2008) methodology for principal components for these non-orthogonal basis functions (see Ap-pendix C). To reduce computational complexity, we apply the Woodbury matrix identity (see, for example, Graybill, 1969, Ch.8) to update the inverse and determinant of the expanded model matrix within the MCMC iterations. This replaces the inversion of a np × np square matrix with the inversion of a n × n matrix when sampling each of the nugget, τi and θi. An

outline of this implementation is given in Appendix D.

The values of prior hyper-parameters and numbers of thin-plate spline basis functions are determined as for the PC emulator, using the same validation runs. Choices for the hyper-parameters of the beta distributions were made to test robustness over a wide range of potential shapes of the density function; choice of the gamma parameters were made to skew towards the expected lower values without giving zero support to high values in order to avoid bias. See Appendix E for the densities used. The validation results are given in Figure 3.

In this case, the choice of the number of basis functions dominates the choice of any other parameter. The logged dispersion surface is relatively smooth between sample points and therefore large numbers of basis functions will lead to over-fitting.

(10)

The chosen combination of parameters for the Independent TPS emulator is at the minimum MSE, p = 5, aθ = 1, bθ = 0.1 , aσ = 2 and bσ = 0.01; however, several similar hyper-parameter

setting produce comparable MSE.

4.4. Separable TPS emulator

For the separable emulator, outlined in Section 2, estimates of the correlation parameters θ = (θ1, θ2), for the between-run correlation striation, and ν = (ν1, ν2), for the within-run

correlation structure, are plugged in to the conditional posterior density (see also Rougier, 2008, and Rougier et al., 2009).

These parameters are again chosen using the validation set, along with the hyper-parameters aτ and bτ for the gamma prior on τ . Numerical results (not shown) indicate that the MSE

is robust to the choice of aτ and bτ, and a Gamma(1,1) prior distribution was chosen, giving

near linearly decreasing support between 0 and 1. For each of p = 5, 10, 15 basis functions, the MSE was calculated for a four-dimensional grid of values for θ1, θ2, ν1, ν2, with each correlation

parameter varied between 0 and 0.3 in steps of 0.05. These ranges were chosen using information gained from the choice of prior hyper-parameters for θi for the Independent TPS emulator.

Results of these studies (not shown, due to their high dimensional nature) once again indicated that the number of basis functions is the most important tuning parameter. The minimum

0.5 0.5 1 1 1.5 1.5 2 2 2.5 3 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 (a) x−Location y−Location 0.5 1 1 1.5 1.5 2 2 2.5 2.5 3 3 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 (b) x−Location y−Location 0.5 0.5 1 1 1.5 1.5 2 2 2.5 2.5 3 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 (c) x−Location y−Location 0.5 0.5 1 1 1.5 1.5 2 2 2.5 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 (d) x−Location y−Location

Figure 4: Log dosage for one test run: simulator response (top left); PC emulator (top right), Independent TPS emulator (bottom left) and the posterior predictive means for Separable TPS emulator(bottom right).

(11)

● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● SVD Independent TPS Seperable TPS 0 10 20 30 40 MSE

Figure 5: Mean squared errors for each of the PC, Independent TPS and Separable TPS emulators calculated using the posterior predictive mean across the test set.

MSE was obtained for p = 5 and θ = ν = (0.05, 0.05).

4.5. Comparison of Emulators

Each of the three emulators were applied to a set of 35 test runs and their performances evaluated. All three emulators produced a similar spatial structure to the true dispersion model, as is illustrated in the contour plots of the posterior predictive mean in Figure 4. However, the PC emulator has generally higher MSE than either TPS emulator across all of the test runs; see Figure 5. The median MSE for the PC emulator (13.68) is considerably greater than the upper quartile of either the Independent (5.22) or Separable TPS (3.42) emulators. The separable TPS emulator performs slightly better than the independent TPS emulator, with (lower quartiles of 1.81 for the separable emulator and 1.15 for the independent emulator and upper quartiles of 5.22 for the separable emulator and 3.42 for the independent emulator). This differences demonstrate the advantage of modeling the within-run correlation between basis functions.

4.6. Uncertainty quantification

The Bayesian approach to emulation allows the uncertainty associated with predictions of the simulator to be quantified and assessed. Realistic and appropriate uncertainty quan-tification is a key determinant of a good emulator. Prediction uncertainty for the PC and independent TPS emulators can be obtained from the MCMC samples. For the separable TPS emulator, we use (6) with the conditional posterior distribution for β and an additive error term corresponding to the approximation error (3). This latter term was estimated using the

(12)

0 10 20 30 40 50 60 −40 −35 −30 −25 −20 −15

Index of Sample Point

Logged Dosage V

alue

Figure 6: PC emulator: dosage (×), approximate posterior predictive mean (•) and intervals for the 64 spatial locations for one test run.

0 10 20 30 40 50 60 −40 −35 −30 −25 −20 −15 −10

Logged Dosage V

alue

Figure 7: Independent TPS emulator: dosage (×), approximate posterior predictive mean (•) and intervals for the 64 spatial locations for one test run.

mean squared error from the thin-plate spline approximation to the test data.

To demonstrate the predicted uncertainty from the three emulators of the dispersion sim-ulation, we present approximate prediction intervals (posterior predictive mean ±3 posterior predictive standard errors) for a single test run for each of the PC (Figure 6), Independent TPS (Figure 7) and Separable TPS (Figure 8) emulators; other test runs produce similar results. We also calculate the frequentist coverage of these prediction intervals using the test data. From

(13)

0 10 20 30 40 50 60

−40

−30

−20

−10

Logged Dosage V

alue

Figure 8: Separable TPS emulator: dosage (×), approximate posterior predictive mean (•) and intervals for the 64 spatial locations for one test run.

Figure 7, it is clear that excluding within-run correlation in the Independent TPS emulator has led to substantial under-estimation of the uncertainty (coverage of 28%), with almost all observed responses lying outside the, very narrow, prediction intervals. The MCMC chains had converged and sufficiently explored the parameter space, and hence the under-estimation of the uncertainty appears to be a consequence of modeling assumptions. The other emulators give more realistic assessment of uncertainty, with wider prediction intervals in the more complex (central) areas of the dosage plume and coverages of 83% (PC emulator) and 98% (independent TPS emulator).

5. Discussion and future work

We have discussed the emulation of multivariate simulators with two-dimensional spatial structure. Thin-plate splines were demonstrated to be an effective method for dimensional-reduction, that resulted in substantially increased prediction accuracy over the use of principal components. When a separable covariance structure is assumed for the basis coefficients, a computationally feasible emulator results that provides realistic measure of uncertainty. For the separable emulator, we have adopted a plug-in approach using the posterior predictive distribution conditional on the correlation parameters. Clearly, there is the potential for this approach to under-estimate the posterior uncertainty. However, the results in Section 4.6 suggest that it is more important to account for the within-run correlations caused by the non-orthogonal thin-plate spline basis functions.

For the dispersion problem, future work could include developing emulators with qualitative inputs (see, for example, Qian et al., 2008) particularly inputs such as release type relating to the source term, and building emulators for dynamic responses such as concentration over time.

(14)

More generally, further research is required into designing the computer experiments for these multivariate hierarchical problems, including the choice of the simulator inputs and also, in some applications, the selection of output domain locations.

Acknowledgments

This work was supported by the Defense Threat Reduction Agency. The first author was supported by a Dstl Associate Fellowship and the second author was partly supported by a Fellowship from the UK Engineering and Physical Sciences Research Council.

c

A. Nomenclature

n number of simulator runs.

p number of basis functions.

r the number of outputs per simulator run.

x q1-vector of input values.

Yi r-vector of responses from xi.

sj q2-vector giving the location of the jth

output in the q2-dimensional output domain.

β = (β(x1)T, . . . , β(xn)T)T np-vector of basis coefficients.

a(s) r × 1 basis vector.

ei r-vector of approximation errors due to the

basis expansion of Yi.

W np × np correlation matrix for the Gaussian

process prior for β.

τ scale of the Gaussian process prior for β.

σ2 _{scale of the error from the basis approximation.}

θi q1-vector of correlation lengths for the prior

on β(xi).

ν q2-vector of within-run correlation lengths.

B. Development of the conditional posterior density for the separable thin-plate spline emulator

For β|τ, W ∼ N (0np, W τ ), a mean-zero Gaussian process, and τ−1 ∼ Gamma(aτ, bτ), the

marginal density is

π(β|W ) ∝ 1 + βTW−1β/b−(np+a)/2 ,

a np-variate t density with aτ degrees of freedom, E(β|τ, W ) = 0, and Var(β|τ, W ) = _a_τbτ₋₂W

(see O’Hagan and Forster, 2004, ch. 11, and Rougier, 2008). The joint conditional poste-rior density for τ and the p-vector β? = (β1(x?), . . . , βp(x?))

T

can be written π(β?, τ |β) = π(β?|τ, β)π(τ |β), with the first density a normal distribution

(15)

β?|τ, β ∼ N (m, τ S) ,

with m = (W?₎T_W−1_{β, and S = W}??_{− (W}?₎T_W−1_W?_{. Here, W}? _{is the p × np matrix of}

correlations between β? and β, and W??is the p×p matrix of correlations between the elements of β?. The second density is

π(τ |β) ∝ τ−1−(aτ+np)/2_exp−(b

τ + βTW−1β)/2τ−1 .

That is, τ−1|β ∼ Gamma(aτ + np, (bτ + βTW−1β)−1).

Hence, for a separable correlation structure, point estimates of β? can be provided by

E(β?|τ, β) = (Ws_{⊗ W}r?₎T

W−s⊗ W−rβ ,

with Wr? the n-vector of correlations between x? and x. Similarly, uncertainty in β? can be assessed via

Var(β?|β, τ ) = τnW??− Ws⊗ (Wr?)TW−rWr? o

. (8)

A plug-in estimate of the uncertainty can be achieved by replacing τ in (8) with its posterior mean, E(τ |β) = bτ + βTW−1β /(aτ + np − 2).

C. Construction of the sampling distribution for the Independent TPS Simulator

As stated in equation (2),

β|C ∼ N (0np, C) , (9)

where C is controlled by parameter vectors τ and θ. We specify independent Gamma(aτ, bτ)

priors for each τi and independent beta(aθ, bθ) priors for each θi

π(τi) ∝ τiaτ−1exp

−τi/bτ _{i = 1 . . . pn ,}

π(θik) ∝ θaθ −1

ik (1 − θik)bθ−1 i = 1 . . . pn, k = 1 . . . q1p .

We expect the marginal variance to be close to one due to the standardisation of the simu-lator output, therefore following Higdon et al. (2008) we set aτ = 5, bτ = 0.2 and the values of

aθ and bθ are obtained using the validation data.

We define an nr vector ˜Y to be the concatenation of all n simulation output vectors

˜

Y = vec([Y1; . . . ; Yn]) .

Given the precision σ−2 of the errors the likelihood is then

L( ˜Y |β, σ2) ∝ σ−nrexp{−1 2σ

−2

(16)

where the nr × np matrix

A = [In⊗ a1; . . . ; In⊗ ap]

is constructed from the basis vectors ak. A Gamma(aσ, bσ) distribution is specified for the error

precision σ−2, with values of the hyperparameters found using the validation data. We then follow the factorization in Higdon et al. (2008),

L(χ|β, σ2) ∝ σ−npexp{−1 2σ −2 (β − ˆβ)T(ATA)(β − ˆβ)} × σ−n(r−p)exp{−1 2σ −2_˜ YT(I − A(ATA)−1AT) ˜Y } ,

to define a dimension reduced likelihood and a modified Gamma(a0_σ, b0_σ) prior for σ−2:

L( ˆβ|β, σ2) ∝ σ−npexp{−1 2σ −2 ( ˆβ − β)T(ATA)( ˆβ − β)} , a0_σ = aσ+ n(r − p) 2 , b0_σ = bσ+ 1 2Y˜ T (I − A(ATA)−1AT) ˜Y , ˆ β = (ATA)−1ATY .˜ Then the normal-gamma model

˜ Y |β, σ2 ∼ N (Aβ, σ−2Inr) , σ−2 ∼ Gamma(aσ, bσ) is equivalent to ˆ β|β, σ2 ∼ N (β, σ2_(AT_A)−1 ) , σ−2 ∼ Gamma(a0_σ, b0_σ) .

The likelihood depends on the simulator data only through ˆβ; therefore integrating out β with respect to its prior distribution (9) gives

π(σ−2, τ , θ| ˜Y ) ∝ |(σ−2ATA)−1+ C|−12 × exp{−1 2 ˆ βT([σ−2ATA)−1] + C])−1β} ×ˆ (σ−2)a0σ−1_exp−σ−2/b0σ× p Y i=1 τaτ−1 i exp −τi/bτ _× p Y i=1 ( _q₁ Y k=1 θaθ−1 ik (1 − θik)bθ−1 ) .

This posterior distribution is explored via MCMC using standard metropolis updates. The MCMC simulation is hampered by the inversion of the matrix

(17)

[(σ−2ATA)−1+ C] ,

which is of size np × np. However, only part of the matrix is updated at each step of the MCMC, therefore the matrix inversion is carried out once at the start of the chain and then updated via the method described in Appendix D. The MCMC is run for 1500 iterations with a burn-in of 250, which produced trace plots indicating convergence and good mixing.

D. Reduction of Computational Burden

Inversion of the np × np matrix [(σ−2AT_A)−1 _{+ C], see Appendix C, is computationally}

intensive when using a non-orthogonal basis such as a thin plate spline. To overcome this problem and make the MCMC updates feasible, we note that any update of the parameters τ , θ, or the nugget only change an n × n sub-matrix of the covariance matrix C. This sub-matrix depends on the parameter being updated, and hence care is required in locating the correct segment. Once the inverse of the whole matrix has been performed for the update of σ−2 the proceeding inverses can be computed using the Woodbury formula.

(D + P Q)−1 = D−1− D−1_{P (I + QD}−1_{P )}−1_QD−1_,

where D, P and Q all denote matrices of the correct size, and I is an identity matrix. For our problem, D = [(σ−2AT_A)−1_{+ C] and is size np × np, P is size np × n and Q is size n × np. In}

order to make use of this result, we need to find a P Q equal to the difference in the D matrix resulting from the update.

Let P =   0 P1 0  , Q = 0 Q1 0

where each 0 denotes a matrix of the correct dimension to position the change in the D matrix for the current update. Note that in the first updates, the initial 0 matrix will be of size 0 × 0 and in the last updates the final 0 matrix will be of size 0 × 0. Then

P Q =   0 0 0 0 P1Q1 0 0 0 0   .

We obtain P1Q1 by performing LU decomposition on the difference between the current D

matrix and the proposed D matrix, a simple np × np subtraction. Using block notation and remembering that D and therefore D−1 are symmetric, let

D−1 =   D₁₁−1 D−1₁₂ D₁₃−1 D−T₁₂ D−1₂₂ D₂₃−1 D−T₁₃ D₂₃−T D₃₃−1  .

(18)

Then QD−1P can be replaced by Q1D−122P1, and inversion of (I + QD−1P )−1 by inversion of

X = (I + Q1D−122P1)−1, an n × n matrix.

Finally simple block multiplication of the defined matrices, taking account of symmetry, results in (D + P Q)−1 = D−1−   D−1₁₁ D−1₁₂ D₁₃−1 D−T₁₂ D−1₂₂ D₂₃−1 D−T₁₃ D₂₃−T D₃₃−1     0 0 0 0 P1XQ1 0 0 0 0  ×   D−1₁₁ D−1₁₂ D₁₃−1 D−T₁₂ D−1₂₂ D₂₃−1 D−T₁₃ D₂₃−T D₃₃−1   = D−1−   D−1₁₂P1XQ1D12−T D −1 12P1XQ1D22−1 D −1 12P1XQ1D−123 D−1₂₂P1XQ1D12−T D −1 22P1XQ1D22−1 D −1 22P1XQ1D−123 D−T₂₃ P1XQ1D−T12 D −T 23 P1XQ1D−122 D −T 23 P1XQ1D23−1  .

This can be computed efficiently using bespoke multiplication routines. However, for large iteration cycles it is recommended that a full inversion be performed approximately once every 100 steps to ensure that small numerical errors do not appreciate. A similar derivation can be performed using the matrix determinant lemma (Harville, 1997).

E. Beta and Gamma Priors

0.0 0.2 0.4 0.6 0.8 1.0 0 1 2 3 4 5 x Beta(a,b) a=1, b=0.1 a=1,b=3 a=2, b=5 a=5, b=1 a=0.5, b=0.5 0.0 0.1 0.2 0.3 0.4 0 10 20 30 40 50 x G amma (c, d) c=2,d=0.001 c=2,d=0.01 c=2,d=0.1

Figure 9: Probability density functions for the Beta (top) and Gamma (bottom) distributions with the chosen hyper-parameter values; beta density f (z) ∝ za−1_{(1 − z)}b−1_{, z ∈ [0, 1], a, b > 0;}

(19)

References

Bayarri, M. J., Berger, J. O., Cafeo, J., Garcia-Donato, G., Liu, F., Palomo, J., Parthasarathy, R. J., Paulo, R., Sacks, J. and Walsh, D. (2007) Computer model validation with functional ouput. Annals of Statistics, 35, 1874–1906.

Bowman, V. E. and Woods, D. C. (2012) Weighted space-filling designs. Submitted for publi-cation.

Campbell, K., McKay, M. D. and Williams, B. J. (2006) Sensitivity analysis when model outputs are functions. Reliability Engineering and System Safety, 91, 1468–1472.

Clarke, R. H. (1979) The first report of a working group on atmospheric dispersion: A model for short and medium range dispersion of radionuclides released to the atmosphere. Tech. Rep. NRPB-R91, National Radiological Protection Board, Harwell.

Conti, S. and O’Hagan, A. (2010) Bayesian emulation of complex multi-output and dynamic computer models. Journal of Statistical Planning and Inference, 140, 640–651.

Fang, K., Li, R. and Sudjianto, A. (2006) Design and Modelling for Computer Experiments. Boca Raton: Chapman and Hall.

Graybill, F. A. (1969) Matrices with Applications in Statistics. Belmont, CA: Wadsworth. Harville, D. A. (1997) Matrix Algebra from a Statisticians Perspective. New York: Springer. Hastie, T., Tibshirani, R. and Friedman, J. H. (2009) Elements of Statistical Learning. New

York: Springer.

Higdon, D., Gattiker, J., Williams, B. J. and Rightly, M. (2008) Computer model validation using high-dimensional ouput. Journal of the American Statistical Association, 103, 570–583. Jolliffe, I. T. (2002) Principal Component Analysis. New York: Springer, 2nd edn.

Jones, A. R., Thomson, D. J., Hort, M. and Devenish, B. (2007) The U.K. Met Office’s next-generation atmospheric dispersion model, NAME III. In Air Pollution Modeling and its Application XVII (eds. C. Borrego and A.-L. Norman), 508–589. New York: Springer. Kennedy, M. C., Anderson, C. W. and O’Hagan, A. (2005) Case studies in Gaussian process

modelling of computer codes. In Sensitivity Analysis of Model Output, 476–485. Los Alamos National Laboratory.

Kennedy, M. C. and O’Hagan, A. (2001) Bayesian calibration of computer models (with dis-cussion). Journal of the Royal Statistical Society, B, 63, 425–464.

Morris, M. D. and Mitchell, T. J. (1995) Exploratory designs for computer experiments. Journal of Statistical Planning and Inference, 43, 381–402.

(20)

O’Hagan, A. and Forster, J. J. (2004) Bayesian Inference, vol. 2B of Kendall’s Advanced Theory of Statistics. London: Arnold, 2nd edn.

Politis, K. and Robertson, L. (2004) Bayesian updating of atmospheric dispersion after a nuclear accident. Journal of the Royal Statistical Society C, 53, 583–600.

Qian, P. Z. G., Wu, H. and Wu, C. F. J. (2008) Gaussian process models for computer experi-ments with qualitative and quantitative factors,. Technometrics, 50, 383–396.

Rasmussen, C. E. and Williams, C. K. I. (2006) Gaussian processes for machine learning. Cambridge, MA: MIT Press.

Robins, P., Rapley, V. E. and Green, N. (2009) Realtime sequential inference of static param-eters with expensive likelihood calculations. Journal of the Royal Statistical Society C, 58, 641–662.

Rougier, J. C. (2008) Efficient emulators for multivariate deterministic functions. Journal of Computational and Graphical Statistics, 17, 827–843.

Rougier, J. C., Guillas, S., Maute, A. and Richmond, A. D. (2009) Expert knowledge and multivariate emulation: the thermosphere-ionosphere electrodynamoics general circulation model (tie-gcm). Technometrics, 51, 414–424.

Sacks, J., Welch, W. J., Mitchell, T. J. and Wynn, H. P. (1989) Design and analysis of computer experiments (with discussion). Statistical Science, 4, 409–435.

Santner, T. J., Williams, B. J. and Notz, W. I. (2003) The Design and Analysis of Computer Experiments. New York: Springer.

Smith, J. and French, S. (1993) Bayesian updating of atmospheric dispersion models for use after an accidental release of radioactivity. Journal of the Royal Statistical Society D, 42, 501–511.

Sykes, R. I., Parker, S. F., Henn, D. S., Cerasoli, C. P. and Santos, L. P. (1998) PC-SCIPUFF Version 1.2pd Technical Documentation. Tech. Rep. ARAP 718, ARAP Group, Princeton, NJ.

Wood, S. N. (2003) Thin plate regression splines. Journal of the Royal Statistical Society B, 65, 95–114.