Chapter 4 Bayesian Numerical Integration: Advanced Methods
4.1.3 Numerical Experiments
We now illustrate the performance of multi-output BQ on a range of toy problems and real-world applications, highlighting both the advantages and the limitations of the methodology.
Prior Specification
One of the main challenges with multi-output BQ is the selection of appropriate hyperparameters. In this section, we consider multi-output BQ with a covariance function $C$ which is parameterised by $\gamma = (\gamma_1, \ldots, \gamma_l) \in \mathbb{R}^l$. To optimise these parameters, we propose to use an empirical Bayes approach and maximise the log-marginal likelihood:
$$ l(\gamma) = -\frac{1}{2} f(X)^\top C(X,X)^{-1} f(X) - \frac{1}{2} \log |C(X,X)| - \frac{nP}{2} \log(2\pi). $$
This can be efficiently optimised by making use of gradients, given by:
$$ \frac{\partial l(\gamma)}{\partial \gamma_i} = \frac{1}{2} f(X)^\top C(X,X)^{-1} \frac{\partial C(X,X)}{\partial \gamma_i} C(X,X)^{-1} f(X) - \frac{1}{2} \mathrm{Tr}\left( C(X,X)^{-1} \frac{\partial C(X,X)}{\partial \gamma_i} \right) $$
for all $i = 1, \ldots, l$. Clearly, this is just one option for parameter selection, and the reader is referred to Chapter 3 for alternatives to empirical Bayes.
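As a minimal sketch of the two expressions above, the following Python code evaluates $l(\gamma)$ and its derivative for a single hyperparameter and compares the derivative with a finite-difference approximation. The squared-exponential kernel, its lengthscale parameterisation, the jitter term, the toy data and all function names are illustrative assumptions rather than choices made in the text; a single output is used, so $nP$ reduces to the number of observations.

```python
import numpy as np

def sq_exp_kernel(x, lengthscale):
    """Gram matrix of an (assumed) squared-exponential kernel, together with
    its derivative with respect to the lengthscale."""
    d2 = (x[:, None] - x[None, :]) ** 2
    K = np.exp(-0.5 * d2 / lengthscale**2)
    dK = K * d2 / lengthscale**3
    return K, dK

def log_marginal_likelihood(fX, C):
    """-1/2 f^T C^{-1} f - 1/2 log|C| - (N/2) log(2 pi), with N = len(fX)."""
    L = np.linalg.cholesky(C)
    alpha = np.linalg.solve(C, fX)
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * fX @ alpha - 0.5 * log_det - 0.5 * fX.size * np.log(2.0 * np.pi)

def log_marginal_likelihood_grad(fX, C, dC):
    """1/2 f^T C^{-1} dC C^{-1} f - 1/2 Tr(C^{-1} dC)."""
    alpha = np.linalg.solve(C, fX)
    return 0.5 * alpha @ dC @ alpha - 0.5 * np.trace(np.linalg.solve(C, dC))

# Hypothetical toy data: six evaluations of a smooth function on [0, 1].
x = np.linspace(0.0, 1.0, 6)
fX = np.sin(2.0 * np.pi * x)
jitter = 1e-6 * np.eye(x.size)  # numerical stabiliser, independent of the lengthscale

def objective(lengthscale):
    K, _ = sq_exp_kernel(x, lengthscale)
    return log_marginal_likelihood(fX, K + jitter)

K, dK = sq_exp_kernel(x, 0.2)
grad = log_marginal_likelihood_grad(fX, K + jitter, dK)

# Central finite difference as a sanity check on the analytic gradient.
eps = 1e-5
fd_grad = (objective(0.2 + eps) - objective(0.2 - eps)) / (2.0 * eps)
```

In practice the gradient would be passed to a generic optimiser; the finite-difference comparison is included only to check that the analytic expression has been coded consistently.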
Multi-fidelity modelling
Consider the problem of integrating some function $f^{\text{high}} : \mathcal{X} \to \mathbb{R}$ representing some complex engineering model of interest. We may be interested in such integrals for a variety of tasks, including statistical inference or optimisation. These models usually require the simulation of underlying physical systems, which can make each evaluation prohibitively expensive and therefore limits the number of integrand evaluations $n$ to the order of tens or hundreds.
To tackle this issue, multi-fidelity modelling proposes to build cheap, but less accurate, approximations $f^{\text{low}}_1, \ldots, f^{\text{low}}_{P-1} : \mathcal{X} \to \mathbb{R}$ to the model of interest $f^{\text{high}}$. The cheaper models can then be used to accelerate computation for the task of interest. Several approaches are possible. One could for example use surrogate models (e.g. support vector machines, GPs or neural networks), projection-based models (Krylov subspace or reduced basis methods) or models where the underlying physics is simplified; see Peherstorfer et al. [2016a] for an overview.
In this section, we consider the problem of numerical integration in such a multi-fidelity setup. Two related methods for MC estimation are the multi-fidelity MC estimator [Peherstorfer et al., 2016a] and the multilevel MC of Giles [2015], both of which are based on control variate identities.
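To make the control variate idea concrete, the sketch below implements a generic two-fidelity control variate MC estimator. It is not the exact estimator of either reference: the sample sizes, the plug-in coefficient `alpha`, the nested sampling scheme and the pair of step-shaped toy integrands on $[0, 2]$ are all assumptions chosen for illustration, and $\Pi$ is taken to be the uniform probability measure.

```python
import numpy as np

def f_high(x):
    # Hypothetical expensive model: a step from -1 to 2 at x = 1.
    return np.where(x <= 1.0, -1.0, 2.0)

def f_low(x):
    # Hypothetical cheap approximation: a step from 0 to 1 at x = 1.
    return np.where(x <= 1.0, 0.0, 1.0)

rng = np.random.default_rng(0)
n_high, n_low = 20, 10_000

# Nested samples: the expensive model is only run on the first n_high points.
x = rng.uniform(0.0, 2.0, size=n_low)
x_high = x[:n_high]

y_high = f_high(x_high)
y_low_paired = f_low(x_high)
y_low_all = f_low(x)

# Plug-in control variate coefficient: Cov(f_high, f_low) / Var(f_low).
cov = np.cov(y_high, y_low_paired)
alpha = cov[0, 1] / cov[1, 1]

# Plain MC on the expensive model versus the control variate correction.
mc_estimate = y_high.mean()
cv_estimate = y_high.mean() + alpha * (y_low_all.mean() - y_low_paired.mean())
```

Because the two toy integrands are perfectly correlated, the correction removes essentially all of the error due to the small high-fidelity sample; with less strongly related models the gain would be smaller.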
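We approach this problem with multi-output BQ on the vector-valued function $f = (f^{\text{high}}, f^{\text{low}}_1, \ldots, f^{\text{low}}_{P-1})^\top$. Note that multi-output GPs were already proposed for multi-fidelity modelling in [Perdikaris et al., 2016; Raissi and Karniadakis, 2016; Parussini et al., 2017], and we extend their methodologies to the task of numerical integration. We consider two toy problems from the work of Raissi and Karniadakis [2016] to highlight some of the advantages and disadvantages of our methodology:
1. A step function on $\mathcal{X} = [0,2]$:
$$ f^{\text{low}}_1(x) = \begin{cases} 0, & 0 \le x \le 1 \\ 1, & 1 < x \le 2 \end{cases} \qquad f^{\text{high}}(x) = \begin{cases} -1, & 0 \le x \le 1 \\ 2, & 1 < x \le 2 \end{cases} $$
2. The Forrester function with jump on $\mathcal{X} = [0,1]$:
$$ f^{\text{low}}_1(x) = \begin{cases} \left(\frac{3}{2}x - \frac{1}{2}\right)^2 \sin(12x - 4) + 10(x - 1), & x \le \frac{1}{2} \\ 3 + \left(\frac{3}{2}x - \frac{1}{2}\right)^2 \sin(12x - 4) + 10(x - 1), & x > \frac{1}{2} \end{cases} $$
$$ f^{\text{high}}(x) = \begin{cases} 2 f^{\text{low}}_1(x) - 20(x - 1), & x \le \frac{1}{2} \\ 4 + 2 f^{\text{low}}_1(x) - 20(x - 1), & x > \frac{1}{2} \end{cases} $$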
Of course, the theory developed in the previous section does not apply to this case since we are interested in evaluating the low-fidelity integrand more frequently than the high-fidelity integrand. An extension of the theory which fits this setting is reserved for future work.
The functions considered and the corresponding posteriors with credible intervals are given in Figure 4.1. The uni-output and multi-output BQ estimates for integration of these functions against a uniform measure $\Pi$ are given in the table in Figure 4.2. In both cases, 20 equidistant points are used, with points number 4, 10, 11, 14 and 17 used to evaluate the high-fidelity model and the others used for the low-fidelity model. The hyperparameters were chosen using empirical Bayes for both the separable and process convolution covariances.
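The design described above can be written down directly. The sketch below does so for the step function pair; it assumes that the 20 equidistant points include both endpoints of $[0, 2]$ and that the point numbers are 1-based, neither of which is stated explicitly in the text. Variable names are illustrative.

```python
import numpy as np

def f_low(x):
    # Low-fidelity step function on [0, 2].
    return np.where(x <= 1.0, 0.0, 1.0)

def f_high(x):
    # High-fidelity step function on [0, 2].
    return np.where(x <= 1.0, -1.0, 2.0)

# 20 equidistant points (assumed to include the endpoints 0 and 2).
x = np.linspace(0.0, 2.0, 20)

# Points number 4, 10, 11, 14 and 17 (assumed 1-based) go to the high-fidelity model.
high_idx = np.array([4, 10, 11, 14, 17]) - 1
is_high = np.zeros(x.size, dtype=bool)
is_high[high_idx] = True

X_high, X_low = x[is_high], x[~is_high]
y_high, y_low = f_high(X_high), f_low(X_low)
```

Under these conventions the high-fidelity model is evaluated only twice to the left of the jump and three times to the right, which gives a sense of how little information the uni-output model has about the discontinuity.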
Note that both of these problems are challenging for several reasons. Firstly, due to their discontinuity, the integrands are not in the RKHS $\mathcal{H}_C$ corresponding to the covariance function $C$ used in the multi-output BQ prior. More concerningly, the problems are misspecified in the sense that the true function is not even in the support of the prior. It is therefore difficult to interpret the posterior distribution on $\Pi[f]$, and we end up with credible intervals which are too wide. This is for example illustrated in the values of the posterior variance for the high-fidelity Forrester function.
Figure 4.1: Test functions and Gaussian process interpolants in multi-fidelity modelling. Plot of the Step function (top) and Forrester function (bottom) in blue with GP 95% credible intervals in red. The plots on the left correspond to uni-output BQ, the plots in the middle to multi-output BQ with the linear co-regionalisation model and the plots on the right to multi-output BQ with process convolution covariance.
[...] different scales and so require tuning of several kernel hyperparameters. This can of course be challenging for multi-output BQ, since the number of function evaluations $n$ is small and empirical Bayes will tend to be inefficient in those cases.
Despite these two issues, it is interesting to note that both of the multi-output BQ methods manage to significantly outperform uni-output BQ in terms of point estimate, as the sharing of data allows the multi-output models to better represent the main trends in the functions. Furthermore, multi-output BQ does not suffer from the overconfident posterior credible intervals present in uni-output BQ. To see this, contrast for example the posterior variances for the high-fidelity step function. The process convolution prior allows for much more complex functions, which likely explains why it provides significant gains over the linear co-regionalisation model.

Model      BQ              LMC-BQ          PC-BQ
Step (l)   0.024 (0.223)   0.021 (0.213)   0.016 (0.516)
Step (h)   0.405 (0.03)    0.09 (0.091)    0.036 (0.155)
For. (l)   0.076 (4.913)   0.076 (4.951)   0.075 (33.954)
For. (h)   3.962 (3.984)   2.856 (27.01)   1.063 (63.801)

Figure 4.2: Uni-output and multi-output Bayesian quadrature estimates for multi-fidelity modelling. Performance of uni-output BQ and multi-output BQ with linear model of co-regionalisation kernel (LMC-BQ) and process convolution kernel (PC-BQ) on the step function (Step) and the Forrester function with jump (For.) in the low-fidelity (l) and high-fidelity (h) cases. The values given are absolute errors with the variance in brackets.
Global illumination
Our second application of multi-output BQ revisits the global illumination example from Chapter 3. We follow the setup previously described and consider the problem $\Pi[f_{\omega_0}] = \int_{\mathbb{S}^2} f_{\omega_0}(\omega_i) \, \Pi(\mathrm{d}\omega_i)$, where $\Pi$ is the uniform measure on $\mathbb{S}^2$ and $f_{\omega_0}(\omega_i) = L_i(\omega_i) \, \rho(\omega_i, \omega_0) \, [\omega_i \cdot \omega_0]_+$ is a function which can be evaluated by making a call to an environment map (which we consider to be a black box). One scenario which is common in this type of problem is to look at an object from different angles $\omega_0$, with the camera moving. In this case, it is reasonable to assume that the different integrands $f_{\omega_0}$ will be very similar when the difference in the angle $\omega_0$ is small, and it is therefore natural to consider jointly estimating their integrals. In the experiments we consider $f_1, \ldots, f_5$ on a great circle of the sphere at intervals determined by an angle of $0.005\pi$.
We therefore consider two-output and five-output BQ with different IID realisations $X_1, \ldots, X_5$ from the uniform measure $\Pi$. We propose to use a separable covariance with scalar-valued RKHS $\mathcal{H}_c$ being a Sobolev space of smoothness $\frac{3}{2}$ over $\mathbb{S}^2$: $c(x, x') = \frac{8}{3} - \|x - x'\|_2^2$. For the matrix $B$ representing the covariance between outputs, we propose to make this covariance depend on the difference in angle at which the camera looks at the object, so that outputs with similar viewing angles are more strongly correlated. In particular we choose $(B)_{ij} = \exp(\omega_i \cdot \omega_j - 1)$ for simplicity. This could be generalised to include lengthscale and amplitude hyperparameters inferred by empirical Bayes; however, this would most likely require a larger value of $n$.
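The output covariance $(B)_{ij} = \exp(\omega_i \cdot \omega_j - 1)$ is straightforward to construct numerically. The sketch below does so for five viewing directions spaced by an angle of $0.005\pi$; placing the directions on the equator is an assumption made for illustration (any great circle gives the same matrix), as are the variable names.

```python
import numpy as np

n_outputs = 5
step = 0.005 * np.pi  # angular spacing between consecutive viewing directions

# Unit vectors on the equator of the sphere (an assumed choice of great circle).
angles = step * np.arange(n_outputs)
omega = np.stack([np.cos(angles), np.sin(angles), np.zeros(n_outputs)], axis=1)

# (B)_ij = exp(omega_i . omega_j - 1): equal to 1 on the diagonal and
# decreasing as the angle between the two directions grows.
B = np.exp(omega @ omega.T - 1.0)

# Eigenvalues, to check numerically that B is a valid covariance matrix.
eigvals = np.linalg.eigvalsh(B)
```

With such a small angular spacing every entry of $B$ is very close to one, which encodes a strong prior belief that the five integrands are nearly identical.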
Figure 4.3: Uni-output and multi-output Bayesian quadrature estimates in the global illumination problem. Plot of error estimates for $f_1$ (top) and $f_2$ (bottom), in the case of the red, green and blue channels. The log-error is plotted for uni-output BQ (red), two-output and five-output BQ based on the linear model of co-regionalisation (blue and magenta respectively) and standard MC (dotted black).
[...] integration error (for a fixed number of evaluations $n$ of each integrand) is significantly reduced by increasing the number of outputs $P$. Since the experiments use different point sets for each integrand, it is reasonable to assume that the convergence rate obtained in practice will be at least as good as that for identical point sets.
Proposition 10 (Consistency of multi-output BMC with separable covariance function on the sphere). Let $\mathcal{X}$ be the sphere $\mathbb{S}^2$ and suppose all integrands are evaluated on the same point set $X$ which consists of IID realisations from the uniform measure on $\mathcal{X}$. Furthermore, assume $C$ is a separable kernel with $c$ defined above. Then:
$$ e(\hat{\Pi}_{\text{BQ}}; \Pi, \mathcal{H}_C, p) = O_P\left(n^{-\frac{3}{4}}\right). $$
The same rate with improved rate constant was observed in Chapter 3 when using QMC point sets, and similar gains could be obtained in this multi-output case. Before concluding this section, we note that there are significant potential further gains from the use of multi-output BQ in this application. Similar integration problems need to be computed for three colours in every pixel of an image, and for every image in a video. This is challenging computationally and limits the use of
MC methods to a few dozen points. Designing specific matrix-valued kernels for this application could provide enormous gains since we usually end up with thousands of correlated integrands. Furthermore, the weights only depend on the choice of kernel and not on function values. They could therefore be precomputed off-line and later used in real-time in parallel at no more computational cost than MC weights.
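The remark that the weights depend only on the kernel and the point set can be illustrated in a deliberately simplified setting. The sketch below is uni-output and one-dimensional: a Brownian-motion kernel $c(x, x') = \min(x, x')$ on $[0, 1]$ with the uniform measure stands in for the sphere kernel, purely because its kernel mean $\int_0^1 \min(x, t) \, \mathrm{d}t = x - x^2/2$ has a simple closed form. The point set, the test integrands and the function name `bq_weights` are likewise assumptions.

```python
import numpy as np

def bq_weights(x):
    """BQ weights w = C(X, X)^{-1} z for the stand-in kernel min(x, x') and
    the uniform measure on [0, 1], where z_i = x_i - x_i^2 / 2 is the kernel mean."""
    C = np.minimum(x[:, None], x[None, :])
    z = x - 0.5 * x**2
    return np.linalg.solve(C, z)

# Offline step: the weights are computed once and involve no function values.
x = np.linspace(0.1, 1.0, 10)
weights = bq_weights(x)

# Online step: each new integrand costs only a weighted sum, as for MC.
integrands = [lambda t: t, lambda t: t**2, lambda t: np.sin(t)]
estimates = [weights @ f(x) for f in integrands]
```

The same separation between an offline solve and an online weighted sum carries over to matrix-valued kernels, although the linear system to be solved offline is then larger.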
Conclusion and Future work
There are several potential extensions of multi-output BQ which we reserve for future work. One important remaining question is that of the choice of sampling distribution. In the multi-output case, the problem is even more complex than in the uni-output case due to the interaction between the different integration problems. However, the literature on the design of experiments for co-kriging/multi-output GPs may provide some useful algorithms, and the use of more advanced sampling distributions is likely to provide significant gains.
The multi-output BQ methodology has the potential to impact a wide range of application domains, the most obvious being areas where co-kriging/multi-output GPs are already being used. Other areas include multivariate time series analysis and time-evolving computer models [Conti and O'Hagan, 2010], model comparison in Bayesian statistics or even the development of new probabilistic numerical methods.