Fitting a risk emulator - Approximation of the backward induction calculation

3.4 Approximation of the backward induction calculation

3.4.4 Fitting a risk emulator

We fit the emulator (3.4.12) as described in Section 2.2.2. We first make a prior specification for the moments of the basis parameters, $\mathrm{E}[\alpha_j^{(i)}]$ and $\mathrm{Var}[\alpha_j^{(i)}]$, and for the covariance structure of the residual process, $\mathrm{Cov}[u_j^{(i)}(\cdot), u_j^{(i)}(\cdot')]$; then, we generate data from the risk function at this stage at known settings of the risk inputs, and use this data to adjust our prior moments. This allows us to make fast predictions for the risk at any new setting of its inputs with an uncertainty level which describes our confidence in these predictions. We consider some aspects of this model fit.

Generating risk evaluations: In order to fit the emulator, we generate evaluations of the risk at known input settings. At wave $i$, we denote the set of $N_j^{(i)}$ risk values that we use for the adjustment by $R_j^{(i)} = \{R_{j1}^{(i)}, R_{j2}^{(i)}, \ldots, R_{jN_j^{(i)}}^{(i)}\}$; $R_{jk}^{(i)}$ is the $k$th evaluation of the risk, obtained at input setting $\{z_{[j]k}, w_{[j]k}, d_{[j]k}\}$. When generating the risk evaluations, there are two separate cases to consider:

At the final stage, $j = n$, the data is generated directly from the risk (3.3.6) from an optimal terminal decision:

$$R_{nk}^{(i)} = \rho_n^t[z_{[n]k}, w_{[n]k}, d_{[n]k}]$$

The amount of data that we can use to update the emulator is therefore limited by the computational effort required to evaluate (3.3.6), and by the computational resources that we have available. In the case where this risk is closed-form (see Section 3.1.1) and the model is simple, generating data incurs almost no cost, and inverting the data covariance to update the emulator is the rate-limiting step; however, if the model is more complex, or if there is no closed-form expression for the risk, then we must use numerical methods, making the data generation more costly. Where the terminal risk must be evaluated numerically, we may do this using an MCMC algorithm, or using the Bayesian integration technique presented in Section 2.4.1; any uncertainty about the terminal risk value resulting from its numerical evaluation is accounted for as measurement error when fitting the risk emulator.

At any other stage $j = (n-1), \ldots, 1$, the risk is computed through comparison of the risk from an optimal terminal decision at the current stage with the risk from an optimally-designed experiment at the next stage:

$$R_{jk}^{(i)} = \min\left[ \rho_j^t[z_{[j]k}, w_{[j]k}, d_{[j]k}],\; s_{j+1}^{(i)}[z_{[j]k}, w_{[j]k}, d_{[j]k}] \right]. \qquad (3.4.16)$$

Our numerical approximations to the risks introduce uncertainty, and so $s_{j+1}^{(i)}$ is unknown. As discussed in Section 3.4.8, our emulator for the risk at stage $(j+1)$ induces an uncertainty specification for this quantity; we can compute the expectation $\mathrm{E}[s_{j+1}^{(i)}[z_{[j]k}, w_{[j]k}, d_{[j]k}]]$ for each input setting, and the covariances $\mathrm{Cov}[s_{j+1}^{(i)}[z_{[j]k}, w_{[j]k}, d_{[j]k}], s_{j+1}^{(i)}[z_{[j]l}, w_{[j]l}, d_{[j]l}]]$ between risks at each pair of inputs. We then use this uncertainty specification to characterise a multivariate Gaussian distribution, and we use samples drawn from this distribution to compute expectations $\mathrm{E}[R_{jk}^{(i)}]$ and covariances $\mathrm{Cov}[R_{jk}^{(i)}, R_{jl}^{(i)}]$ for the risk values. We fit our emulator to the expectations $\mathrm{E}[R_{jk}^{(i)}]$, using the covariances to characterise the measurement error structure.
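The sampling step for stages $j < n$ can be sketched as follows. This is a minimal illustration rather than the implementation used in this work: the function name `risk_moments_by_sampling`, the sample size and the toy numbers are all assumptions, and the terminal risks are treated as known constants.

```python
import numpy as np

def risk_moments_by_sampling(rho_terminal, s_mean, s_cov, n_samples=20000, seed=0):
    """Monte Carlo moments of R_k = min(rho_k, s_k) with s ~ N(s_mean, s_cov).

    rho_terminal : terminal-decision risks at the N input settings (assumed known)
    s_mean, s_cov: emulator-induced expectation and covariance of the next-stage risk
    """
    rng = np.random.default_rng(seed)
    # Joint draws of the next-stage risk at all N input settings.
    s_draws = rng.multivariate_normal(s_mean, s_cov, size=n_samples)
    # Apply the comparison in (3.4.16) to every draw.
    r_draws = np.minimum(rho_terminal[None, :], s_draws)
    r_mean = r_draws.mean(axis=0)            # estimates E[R_jk]
    r_cov = np.cov(r_draws, rowvar=False)    # estimates Cov[R_jk, R_jl]
    return r_mean, r_cov

# Toy inputs (hypothetical values, for illustration only).
rho = np.array([1.0, 0.5, 2.0])
s_mean = np.array([0.8, 0.9, 1.0])
s_cov = 0.04 * np.array([[1.0, 0.5, 0.2],
                         [0.5, 1.0, 0.5],
                         [0.2, 0.5, 1.0]])
r_mean, r_cov = risk_moments_by_sampling(rho, s_mean, s_cov)
```

The returned `r_mean` plays the role of the data to which the emulator is fitted, and `r_cov` that of the measurement error covariance.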

Modelling choices: When building our model for the risk at stage $j$, we want to choose the basis functions $h_{jp}^{(i)}(\cdot)$ and the covariance of the residual process $u_j^{(i)}(\cdot)$ in such a way that we can obtain an accurate representation of the risk at any given input point; however, we also want to make sure that we can carry out the integrals (3.4.13) as cheaply as possible, and that the sampling procedure used to characterise the minimum of the risk (Section 3.4.8) does not become too computationally expensive. The following specifications may be suitable in a wide range of problems:


Risk as a mean function: where we can compute the risk $\rho_j^t[\cdot]$ from an optimal terminal decision directly, or relatively inexpensively using numerical methods, it is often the case that using this as a basis function for the regression term will account for a large amount of the systematic variation in the true risk; in parts of the input space where it is optimal to make an immediate decision, this basis function will completely account for the risk behaviour, and in parts of the space where we will continue sampling, the variation can be absorbed using other mean functions and the residual process.

Where the expectation of $\rho_j^t[\cdot]$ over $z_j$ or $w_j$ cannot be computed easily, as required in equation (3.4.13), then using $\rho_j^t[\cdot]$ as a basis function can introduce complications at this stage of the approximation procedure; to get around this, we can instead use $\rho_j^t[\tilde{z}_{[j]}, \tilde{w}_{[j]}, d_{[j]}]$ as a basis function, where $\tilde{z}_{[j]} = \{z_1, \ldots, \mathrm{E}[z_j \mid z_{[j-1]}, w_{[j-1]}, d_{[j-1]}]\}$ and $\tilde{w}_{[j]} = \{w_1, \ldots, \mathrm{E}[w_j \mid w_{[j-1]}, d_{[j]}]\}$ are the data and external input sets with the value at the current stage $j$ replaced by the expectation conditional on the values from previous stages (for complex models where we use an approximating emulator, as in Section 3.4.3, we approximate the full conditional moments using the adjusted moments $\mathrm{E}_{z_{[j-1]}}[z_j \mid w_{[j-1]}, d_{[j-1]}]$ and $\mathrm{E}_{w_{[j-1]}}[w_j \mid d_{[j]}]$). The difference between this basis function and the true risk can then be explained using additional basis functions, the residual process or the nugget term. The terminal risk is used as part of the basis function for the emulator in both the simple example in Section 3.4.5 and the more complex one presented in Section 4.1.

Input space reduction: in a problem with many input variables or large amounts of data, it may be the case that the majority of the variability in the risk function is driven by a small number of linear combinations of the $\{z_{[j]}, w_{[j]}, d_{[j]}\}$; in this instance, modelling the risk in terms of only these linear combinations has the potential to significantly reduce the complexity of the emulator that we fit, and of the subsequent calculations. For example, in the case where we collect a large amount of data at each stage, we may choose to fit an emulator in terms of the sample mean of the $z_j$, or we might choose to identify the directions of canonical correlation between $z_j$ and $R_j^{(i)}$, and fit the emulator in terms of the linear combinations of $z_j$ that explain the most variability. Mardia et al. [1979] (chapter 11) provide an introduction to canonical correlation analysis.

Variable removal: where there is little evidence that a variable has a systematic effect on the value of the risk, it may be appropriate to remove it from the model entirely, absorbing any remaining variability using an uncorrelated residual term; for example, where all distributions are Gaussian, and the loss function is an un-weighted quadratic, the data $z_j$ has no effect on the risk, and so can safely be removed from the model without decreasing our ability to explain the behaviour of the risk.
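To see why the data drop out in the Gaussian, quadratic-loss case, consider a conjugate normal model as a worked illustration. The symbols $\theta$, $\mu_0$, $\sigma_0^2$, $\sigma^2$ and $m$ are assumptions introduced for this sketch and are not part of the notation used elsewhere in this section.

```latex
\begin{align*}
  \theta &\sim N(\mu_0, \sigma_0^2), \qquad
  z_{j1}, \ldots, z_{jm} \mid \theta \sim N(\theta, \sigma^2) \ \text{independently}, \\
  \mathrm{Var}[\theta \mid z_j] &= \left( \frac{1}{\sigma_0^2} + \frac{m}{\sigma^2} \right)^{-1}, \\
  L(\theta, d) = (\theta - d)^2 \;&\Rightarrow\;
  d^* = \mathrm{E}[\theta \mid z_j], \qquad
  \mathrm{E}\left[ L(\theta, d^*) \mid z_j \right] = \mathrm{Var}[\theta \mid z_j].
\end{align*}
```

The risk from the optimal terminal decision is the posterior variance, which depends on the number of observations $m$ (a design choice) but not on the observed values themselves.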

Specifying the prior: Often, our prior knowledge about the behaviour of the risk will be poor, and so specifying an appropriate basis function set and corresponding prior coefficient moments is challenging; the amount of risk data that we have available, however, is only limited by the amount of computer time that we can devote to generating it. In most cases, therefore, we can generate an additional, smaller set of risk evaluations which can be used to carry out an initial linear regression. This regression can be used to fix the prior moments $\mathrm{E}[\alpha_{jp}^{(i)}]$ and $\mathrm{Cov}[\alpha_{jp}^{(i)}, \alpha_{jq}^{(i)}]$ of the basis coefficients. We can also use the residuals of this regression fit to empirically fix the prior marginal variance $\mathrm{Var}[u_j^{(i)}(\cdot)]$ of the residual process.
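A minimal sketch of this initial regression is given below, assuming ordinary least squares and the usual OLS coefficient covariance. The function name `regression_prior`, the quadratic basis and the toy risk values are hypothetical and chosen only for illustration.

```python
import numpy as np

def regression_prior(H, R):
    """Least-squares fit R ~ H @ alpha, used to fix prior moments empirically.

    H : (n, p) matrix of basis functions evaluated at the pilot input settings
    R : (n,) vector of pilot risk evaluations
    """
    n, p = H.shape
    alpha_mean, *_ = np.linalg.lstsq(H, R, rcond=None)
    resid = R - H @ alpha_mean
    resid_var = resid @ resid / (n - p)              # unbiased residual variance
    alpha_cov = resid_var * np.linalg.inv(H.T @ H)   # standard OLS coefficient covariance
    return alpha_mean, alpha_cov, resid_var

# Toy pilot run (hypothetical): one input x and basis functions {1, x, x^2}.
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, size=50)
H = np.column_stack([np.ones_like(x), x, x**2])
R = 1.0 + 2.0 * x - x**2 + rng.normal(0.0, 0.01, size=50)
alpha_mean, alpha_cov, resid_var = regression_prior(H, R)
```

Here `alpha_mean` and `alpha_cov` would be carried forward as the prior coefficient moments, and `resid_var` as the marginal variance of the residual process.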

Determining the nugget variance: Where we include a nugget term $\xi_j^{(i)}$, we do so to account for variability in the risk which cannot be explained using the regression and residual terms which we have selected; for example, where we have specified that the systematic components depend only on a low-dimensional summary of the observed data, we must account for the risk variability which cannot be explained using this summary. In most cases, we can assess the level of this variability by generating further samples from the risk.

Where we can, we assess $\mathrm{Var}[\xi_j^{(i)}]$ by holding constant those inputs for which we expect the regression and residual components to explain risk variability, and then varying the inputs which will induce the variability that we want to capture using the nugget; this should be done at a number of different settings of the fixed inputs, to check that the level of variability does not change drastically across the input space. $\mathrm{Var}[\xi_j^{(i)}]$ is fixed to the variance of the sample (or the average variance of the samples from different settings of the fixed inputs).
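The procedure for assessing the nugget variance can be sketched as below. The names `estimate_nugget_variance`, `risk_fn` and `draw_nuisance`, and the toy risk function, are assumptions made for this illustration.

```python
import numpy as np

def estimate_nugget_variance(risk_fn, fixed_settings, draw_nuisance, n_rep=200, seed=0):
    """Average sample variance of the risk when only the 'nugget' inputs vary.

    risk_fn(x, v) : risk at systematic inputs x and nuisance inputs v
    fixed_settings: settings of the inputs handled by the regression and residual terms
    draw_nuisance : function rng -> one draw of the inputs absorbed by the nugget
    """
    rng = np.random.default_rng(seed)
    variances = []
    for x in fixed_settings:
        samples = np.array([risk_fn(x, draw_nuisance(rng)) for _ in range(n_rep)])
        variances.append(samples.var(ddof=1))
    variances = np.array(variances)
    # Return the average, plus the per-setting values so stability can be checked.
    return variances.mean(), variances

# Toy risk (hypothetical): systematic part x**2, unexplained part 0.1 * v.
nugget_var, per_setting = estimate_nugget_variance(
    risk_fn=lambda x, v: x**2 + 0.1 * v,
    fixed_settings=[0.0, 0.5, 1.0, 1.5],
    draw_nuisance=lambda rng: rng.normal(),
)
```

Inspecting `per_setting` gives the check described above: if the values differ drastically across the fixed settings, a single constant nugget variance may not be adequate.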

Fitting the emulator: For a particular setting of the correlation parameters, we can adjust our beliefs about $r_j^{(i)}$ using the calculations outlined in Section 2.2.2; our adjusted expectation $\mathrm{E}_{R_j^{(i)}}[r_j^{(i)}[\cdot]]$ is computed as in equation (2.2.12) and our adjusted covariance $\mathrm{Cov}_{R_j^{(i)}}[r_j^{(i)}[\cdot], r_j^{(i)}[\cdot']]$ between function values at different input settings is computed as in equation (2.2.13).
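Equations (2.2.12) and (2.2.13) are not reproduced in this section. The sketch below assumes that they take the standard Bayes linear form, $\mathrm{E}_D[B] = \mathrm{E}[B] + \mathrm{Cov}[B, D]\,\mathrm{Var}[D]^{-1}(D - \mathrm{E}[D])$ and $\mathrm{Var}_D[B] = \mathrm{Var}[B] - \mathrm{Cov}[B, D]\,\mathrm{Var}[D]^{-1}\,\mathrm{Cov}[D, B]$. The squared exponential correlation function, the function names and the toy data are further assumptions.

```python
import numpy as np

def sq_exp(A, B, length):
    """Squared exponential correlation between the rows of A and the rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / length**2)

def bayes_linear_adjust(X, D, X_new, basis, m, V, sigma2_u, length, noise_var):
    """Adjust beliefs about r(x) = h(x)^T alpha + u(x) by noisy evaluations D at X.

    m, V      : prior mean and covariance of the basis coefficients alpha
    sigma2_u  : marginal variance of the residual process u
    noise_var : nugget plus measurement error variance (assumed uncorrelated)
    """
    H, H_new = basis(X), basis(X_new)
    var_D = H @ V @ H.T + sigma2_u * sq_exp(X, X, length) + noise_var * np.eye(len(X))
    cov_new_D = H_new @ V @ H.T + sigma2_u * sq_exp(X_new, X, length)
    # Prior covariance of the regression plus residual part at the new inputs.
    var_new = H_new @ V @ H_new.T + sigma2_u * sq_exp(X_new, X_new, length)
    weights = np.linalg.solve(var_D, cov_new_D.T).T   # Cov[r*, D] Var[D]^{-1}
    adj_mean = H_new @ m + weights @ (D - H @ m)
    adj_cov = var_new - weights @ cov_new_D.T
    return adj_mean, adj_cov

def linear_basis(X):
    return np.column_stack([np.ones(len(X)), X[:, 0]])

# Toy data (hypothetical): eight evaluations of a smooth function on [0, 1].
X = np.linspace(0.0, 1.0, 8)[:, None]
D = np.sin(2 * np.pi * X[:, 0])
X_new = np.array([[0.25], [0.6]])
adj_mean, adj_cov = bayes_linear_adjust(
    X, D, X_new, linear_basis, m=np.zeros(2), V=10.0 * np.eye(2),
    sigma2_u=1.0, length=0.15, noise_var=1e-6)
```

The linear solve against `var_D` is the covariance inversion referred to earlier as the rate-limiting step when risk evaluations are cheap.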

Determining correlation parameters: Having used the initial regression and residual analysis to empirically fix the regression priors and the marginal variance of $u_j^{(i)}(\cdot)$, it remains to determine any parameters which govern the degree of correlation between input settings for the residual; since the dependence of the covariance function on these is typically non-linear, we cannot perform the usual Bayes linear analysis by specifying a prior and using the data to adjust. Instead, we fix the correlation parameters using a leave-one-out cross validation procedure [Rasmussen and Williams, 2006]; we leave out each point in turn, and predict it using the fit to the remainder. To assess the quality of the fit for each candidate correlation parameter setting, we use the sum of the predictive Gaussian likelihoods for all points. For further discussion of cross validation, see Section 2.2.3.
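A sketch of the leave-one-out selection for a single correlation length is given below. It works on the regression residuals with a zero-mean residual process and an assumed squared exponential correlation, and it scores each candidate by the sum of the log predictive Gaussian densities, the form given by Rasmussen and Williams; summing the raw densities instead would be a one-line change. All names and numbers are hypothetical.

```python
import numpy as np

def sq_exp(a, b, length):
    """Squared exponential correlation for one-dimensional inputs."""
    return np.exp(-((a[:, None] - b[None, :]) / length) ** 2)

def loo_log_score(x, resid, sigma2_u, noise_var, length):
    """Sum of leave-one-out log predictive Gaussian densities."""
    n = len(x)
    K = sigma2_u * sq_exp(x, x, length) + noise_var * np.eye(n)
    score = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        k_i = K[i, keep]
        w = np.linalg.solve(K[np.ix_(keep, keep)], k_i)
        mean_i = w @ resid[keep]        # prediction of point i from the rest
        var_i = K[i, i] - w @ k_i       # predictive variance for point i
        score += -0.5 * np.log(2 * np.pi * var_i) - 0.5 * (resid[i] - mean_i) ** 2 / var_i
    return score

def choose_length(x, resid, sigma2_u, noise_var, candidates):
    scores = [loo_log_score(x, resid, sigma2_u, noise_var, c) for c in candidates]
    return candidates[int(np.argmax(scores))], scores

# Toy residuals (hypothetical): a smooth function observed on a regular grid.
x = np.linspace(0.0, 1.0, 25)
resid = np.sin(2 * np.pi * x)
candidates = [0.01, 0.05, 0.2, 0.5]
best_length, scores = choose_length(x, resid, sigma2_u=0.5, noise_var=1e-4,
                                    candidates=candidates)
```

The explicit loop mirrors the description in the text; for larger data sets the same quantities can be obtained from a single inverse of the full covariance matrix.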

Checking inter-wave correspondence: Under procedure 2, we have the option of performing multiple waves of analysis. At the first wave ($i = 1$), we emulate the risk function at each stage for the first time; then, at subsequent waves ($i = 2, 3, \ldots$), we repeatedly re-emulate the same risk functions over sub-regions of the design input space. We hope that as we focus our models on sub-regions of the design input space, we will be able to model risk behaviour more accurately; at a minimum, however, we expect that the predictive error bars of the emulator fitted at wave $i$ should overlap with those of the emulators fitted at waves $1, \ldots, (i-1)$. This property can be checked to ensure that our models are behaving as they should. If we do not see overlap between emulators at consecutive waves, this could be an indication of problems with our model specification; for example, failure to fully acknowledge the level of our uncertainty about the risk at one or more stages.
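The overlap check can be sketched as below. The function name `intervals_overlap`, the interval half-width of `k = 3` standard deviations and the toy predictions are assumptions chosen for illustration; the source does not specify how the error bars are constructed.

```python
import numpy as np

def intervals_overlap(mean_a, var_a, mean_b, var_b, k=3.0):
    """Pointwise check that mean +/- k * sd intervals from two waves overlap."""
    lo_a, hi_a = mean_a - k * np.sqrt(var_a), mean_a + k * np.sqrt(var_a)
    lo_b, hi_b = mean_b - k * np.sqrt(var_b), mean_b + k * np.sqrt(var_b)
    return (lo_a <= hi_b) & (lo_b <= hi_a)

# Toy predictions (hypothetical) from waves i-1 and i at three common inputs.
wave_prev_mean = np.array([1.0, 2.0, 3.0])
wave_prev_var = np.array([0.04, 0.04, 0.04])
wave_curr_mean = np.array([1.1, 2.0, 5.0])
wave_curr_var = np.array([0.01, 0.01, 0.01])
overlap = intervals_overlap(wave_prev_mean, wave_prev_var,
                            wave_curr_mean, wave_curr_var)
```

In this toy case the third input fails the check, which would prompt a closer look at the uncertainty specification at that wave.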

In document Bayes Linear Strategies for the Approximation of Complex Numerical Calculations Arising in Sequential Design and Physical Modelling Problems. (Page 131-136)