
4.2 Gaussian Process Emulators

4.2.3 Latin Hypercube Design

GP emulators are constructed from a set of N simulator evaluations. However, because running a simulator is computationally expensive, each evaluation should be as informative as possible for making inferences about the simulator. The process of generating a strategy for where to evaluate a simulator sits within the Design of Experiments (DoE) field. The main objective of a DoE is to fill a given input domain, a property known as space-filling. In the context of a simulator, several parameters may need to be studied statistically and therefore require emulation. This leads to designing an experiment that covers a multi-dimensional domain in which simulator evaluations are to be run. A DoE method will look to fill that space in a manner that gives good coverage for a given budget of simulator runs.

For the majority of emulation applications an initial space-filling design is required (which may later be updated in order to improve emulator performance). Numerous strategies exist for generating a DoE, examples being Monte Carlo sampling techniques, Latin Hypercube Design (LHD), maximum entropy sampling (discussed further in Section 6.2.2), Sobol sampling and Halton sampling. Detailed explanations of these approaches are beyond the scope of this thesis, with the choice of DoE method being user and problem dependent; for a detailed review see [119]. Most of these approaches create a uniformly spaced design; however, when fitting a GP emulator, evaluation locations should also be close to the domain boundary in order to accurately capture the behaviour in these regions. To visualise this problem, an example is introduced in which the simulator is given by Eq. (4.39) and a GP emulator is fitted to 15 equidistant training points. The posterior prediction is presented in Fig. 4.6a (the hyperparameter estimates are $\hat{\omega} = 30.15$, $\hat{\sigma}_f^2 = 1.42$ and $\hat{\beta} = 0.66$, with a fixed nugget $\nu = 1 \times 10^{-9}$). Typical code uncertainty takes the form of Fig. 4.6b, where increases are seen around the boundary of the domain, meaning that to improve emulator performance a concentration of design points should be located at the boundary. A method for achieving this is called a Generalised Maximum Latin Hypercube Design (GMLHD) [120].

$$y = \eta(x) = 1.2x + \mathcal{N}(x \mid 0, 0.1) - \mathcal{N}(x \mid 0.3, 0.3) - \mathcal{N}(x \mid -0.1, 0.4) + \cos(2\pi \times 2x) \tag{4.39}$$
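As a rough illustration, the toy simulator of Eq. (4.39) and its 15 equidistant training inputs on [−1, 1] could be coded as below. This is only a sketch: the function names are hypothetical, and the second parameter of each Gaussian density is assumed to be a variance, which the text does not state explicitly.

```python
import math


def normal_pdf(x, mean, variance):
    """Gaussian density at x (ASSUMPTION: the second parameter is a variance)."""
    return math.exp(-0.5 * (x - mean) ** 2 / variance) / math.sqrt(2.0 * math.pi * variance)


def eta(x):
    """Toy simulator following the terms of Eq. (4.39)."""
    return (1.2 * x
            + normal_pdf(x, 0.0, 0.1)
            - normal_pdf(x, 0.3, 0.3)
            - normal_pdf(x, -0.1, 0.4)
            + math.cos(2.0 * math.pi * 2.0 * x))


# 15 equidistant training inputs on [-1, 1] and the matching simulator outputs.
train_x = [-1.0 + 2.0 * i / 14 for i in range(15)]
train_y = [eta(x) for x in train_x]
```

If the second parameter were instead a standard deviation, only `normal_pdf` would need to change.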

Latin Hypercubes

A Latin Hypercube (LHC) is a random space-filling DoE that is a D-dimensional extension of the Latin square sampling method. A sampling design is a Latin square if, given an N × N grid of possible sample locations in two dimensions (D = 2), there is only one sample in each row and each column. An example for N = 21 is displayed in Fig. 4.7a.

Figure 4.6: Panel (a) presents the posterior emulator prediction for a numerical example. Panel (b) demonstrates typical GP emulator uncertainty (one standard deviation, $\sqrt{\operatorname{diag}(V(\eta^*))}$) when equidistant training points are used in [−1, 1].

To construct a random LHC $L$ of N points in $\mathbb{R}^D$, the elements of a vector $\mathbf{x} = \{x_1, \ldots, x_N\}^T$ of N levels (typically in [0, 1] and then scaled) are randomly permuted, independently for each dimension, and the D permuted vectors form the columns of the design (i.e. $L = \{\mathbf{x}_1, \ldots, \mathbf{x}_D\}$). However, by construction a LHC will not necessarily be maximally separated. For this reason maximum (or optimised) LHCs are constructed.
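The construction just described can be sketched in a few lines of Python. The function name `random_lhc` is hypothetical, and placing the N levels at i/(N − 1), so that the grid includes both end points of [0, 1], is an assumption; cell-centred levels would work equally well.

```python
import random


def random_lhc(n_points, n_dims, rng=None):
    """Return a random Latin hypercube as a list of n_points rows in [0, 1]^n_dims."""
    rng = rng or random.Random()
    # ASSUMPTION: equally spaced levels that include the end points 0 and 1.
    levels = [i / (n_points - 1) for i in range(n_points)]
    columns = []
    for _ in range(n_dims):
        column = levels[:]
        rng.shuffle(column)  # independent random permutation for this dimension
        columns.append(column)
    # Transpose so that each row is one design point.
    return [list(point) for point in zip(*columns)]


design = random_lhc(21, 2, random.Random(0))
```

Each level appears exactly once in every dimension, which is the Latin property; nothing here controls how well separated the points are.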

Optimality criteria must be defined in order to generate a maximum LHC. Here two criteria are used. The first is a distance measure (Eq. (4.40)), based on the minimum squared Euclidean distance between any two design points, $d(L)$, together with the number of times that minimum distance occurs, $n(L)$. The second is a force measure (Eq. (4.41)), namely the sum of the repulsive forces $F(L)$ when the samples are considered to be electrically charged particles (the squared distance is used to avoid square root computations, increasing the speed with which $F(L)$ is calculated).

$$d(L) = \min_{1 \le i, j \le N,\; i \ne j} \| x_i - x_j \|^2 \tag{4.40}$$

$$F(L) = \sum_{i=1}^{N} \sum_{j=i+1}^{N} \frac{1}{\| x_i - x_j \|^2} \tag{4.41}$$


Figure 4.7: Latin squares where N = 21. Panel (a) demonstrates a random Latin square, panel (b) a maximum Latin square (using the force criterion), and panel (c) a generalised maximum Latin square (transforming (b) through Eq. (4.43) where a = 0).

With these definitions a LHC $L_1$ is better than $L_2$ when $d(L_1) > d(L_2)$ or, in scenarios where $d(L_1) = d(L_2)$, when $n(L_1) < n(L_2)$. For the force criterion, $L_1$ is better than $L_2$ when $F(L_1) < F(L_2)$. Either criterion can be framed as an optimisation problem to identify a maximum LHC in $\mathbb{R}^D$.
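To make the two criteria and their orderings concrete, the following sketch evaluates Eqs. (4.40) and (4.41) for a design stored as a list of points. All function names are hypothetical, and the tolerance used when counting repeats of the minimum distance is an assumption added to cope with floating point rounding.

```python
from itertools import combinations


def squared_distance(p, q):
    """Squared Euclidean distance between two design points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def distance_criterion(design, tol=1e-12):
    """Return (d, n): the minimum squared distance and how often it occurs."""
    dists = [squared_distance(p, q) for p, q in combinations(design, 2)]
    d = min(dists)
    n = sum(1 for v in dists if v - d <= tol)  # tol is an assumed rounding guard
    return d, n


def force_criterion(design):
    """Return F: the sum of inverse squared distances over all distinct pairs.

    Points of a LHC differ in every coordinate, so no distance is zero.
    """
    return sum(1.0 / squared_distance(p, q) for p, q in combinations(design, 2))


def better_by_distance(l1, l2):
    """True if l1 has the larger d, or an equal d with fewer repeats n."""
    (d1, n1), (d2, n2) = distance_criterion(l1), distance_criterion(l2)
    return (-d1, n1) < (-d2, n2)


def better_by_force(l1, l2):
    """True if l1 has the smaller total repulsive force."""
    return force_criterion(l1) < force_criterion(l2)
```

For example, the three-point Latin square `[(0.0, 0.5), (0.5, 0.0), (1.0, 1.0)]` beats the diagonal design `[(0.0, 0.0), (0.5, 0.5), (1.0, 1.0)]` under both orderings: the minimum squared distance is 0.5 in each case, but it occurs once rather than twice, and the force is lower.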

One approach to optimising a LHC is to use a genetic algorithm [121], as outlined in Algorithm 2. The fitness function is evaluated by assessing either $d(L)$ and $n(L)$ (if distance is the criterion) or $F(L)$ (when force is used). The best half of the population, namely the LHCs with the largest minimum distances (and the fewest repeats of that distance) or the smallest forces, survive and become parents. In the cross-over stage, children are created by keeping the best LHC (which becomes the 1st and $(N_{pop}/2 + 1)$th child) and performing cross-overs with the remaining survivors, indexed by i. The rest of the first $N_{pop}/2$ children are obtained by taking a random column of the ith parent and substituting it into the best LHC. The remaining children are generated by taking a random column of the best LHC and placing it inside the corresponding ith parent. Once all the children have been generated, mutation is performed on all but the 1st child. Here each column of every child is assigned a number in [0, 1] drawn from a uniform distribution and, when that number is lower than a threshold $p_{mut}$, two random elements are swapped in that column. The fitness of the new population is then assessed, and the best LHC from that new population is checked against the stopping criterion in Eq. (4.42) (for the force criterion). An example of an optimised LHC is presented in Fig. 4.7b.

$$F(L_k) - F(L_{k-n}) < \varepsilon \left( F(L_n) - F(L_0) \right) \tag{4.42}$$

Here n is a fixed interval of iterations (e.g. n = 50), used because there is no guarantee of improvement at every iteration; $L_0$ is the initial best LHC; k is the current iteration, with Eq. (4.42) only assessed when k is a multiple of n; and $\varepsilon$ is small (here $\varepsilon = 10^{-7}$).
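A possible check for this stopping rule is sketched below, given a list holding the best force value at each iteration (index 0 being the initial population). The function name is hypothetical. Because F decreases as the design improves, both differences in Eq. (4.42) are non-positive, so this sketch compares their magnitudes; that reading is an assumption rather than something stated in the text.

```python
def should_stop(best_force, n=50, eps=1e-7):
    """Evaluate the stopping rule of Eq. (4.42) on a history of best F values.

    best_force[k] is F of the best LHC at iteration k (k = 0 is the initial best).
    ASSUMPTION: the two differences are compared in magnitude.
    """
    k = len(best_force) - 1
    if k == 0 or k % n != 0:
        return False  # the rule is only assessed when k is a multiple of n
    recent = abs(best_force[k] - best_force[k - n])
    initial = abs(best_force[n] - best_force[0])
    return recent < eps * initial
```

With this reading the search stops once the improvement over the last n iterations is negligible relative to the improvement achieved over the first n iterations.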

Algorithm 2 Optimised Maximum Latin Hypercube

    Draw Npop random LHCs
    Evaluate fitness for all individuals in population
    Stop = false
    while Stop ≠ true do
        Select best half of population as survivors
        Cross-over survivors to generate Npop children
        Mutate children to generate new population
        Evaluate fitness for all individuals in new population
        if best LHC meets stopping criteria then
            Stop = true
        end if
    end while
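The steps of Algorithm 2 can be sketched in plain Python for the force criterion as follows. This is an illustrative reading of the description in the text, not the implementation used to produce the figures. All function names are hypothetical, a design is stored as D columns that are each a permutation of N equally spaced levels, the stopping rule compares magnitudes, and the `max_iter` cap is an added safeguard that the algorithm itself does not specify.

```python
import random
from itertools import combinations


def force(design):
    """F(L) of Eq. (4.41); a design is a list of D columns of N levels each."""
    points = list(zip(*design))
    return sum(1.0 / sum((a - b) ** 2 for a, b in zip(p, q))
               for p, q in combinations(points, 2))


def random_design(n_points, n_dims, rng):
    """One random LHC: an independent random permutation of the levels per column."""
    levels = [i / (n_points - 1) for i in range(n_points)]
    return [rng.sample(levels, n_points) for _ in range(n_dims)]


def copy_design(design):
    return [column[:] for column in design]


def crossover(survivors, rng):
    """Build 2 * len(survivors) children from survivors sorted best first."""
    best = survivors[0]
    first_half, second_half = [copy_design(best)], [copy_design(best)]
    for parent in survivors[1:]:
        # First half: the best LHC with one random column taken from the parent.
        child = copy_design(best)
        c = rng.randrange(len(best))
        child[c] = parent[c][:]
        first_half.append(child)
        # Second half: the parent with one random column taken from the best LHC.
        child = copy_design(parent)
        c = rng.randrange(len(best))
        child[c] = best[c][:]
        second_half.append(child)
    return first_half + second_half


def mutate(children, p_mut, rng):
    """Swap two random elements of a column with probability p_mut; child 0 is kept."""
    for child in children[1:]:
        for column in child:
            if rng.random() < p_mut:
                i, j = rng.sample(range(len(column)), 2)
                column[i], column[j] = column[j], column[i]
    return children


def optimise_lhc(n_points, n_dims, n_pop=20, p_mut=0.1, n=50, eps=1e-7,
                 max_iter=2000, seed=None):
    """Return (best design as a list of points, best F value at every iteration)."""
    if n_pop % 2:
        raise ValueError("n_pop must be even")
    rng = random.Random(seed)
    population = sorted((random_design(n_points, n_dims, rng) for _ in range(n_pop)),
                        key=force)
    history = [force(population[0])]
    for k in range(1, max_iter + 1):  # max_iter is an ASSUMED safeguard
        survivors = population[: n_pop // 2]
        population = sorted(mutate(crossover(survivors, rng), p_mut, rng), key=force)
        history.append(force(population[0]))
        # Eq. (4.42), compared in magnitude and assessed only every n iterations.
        if k % n == 0 and abs(history[k] - history[k - n]) < eps * abs(history[n] - history[0]):
            break
    return list(zip(*population[0])), history
```

A call such as `optimise_lhc(21, 2, seed=0)` would correspond to the Latin square setting of Fig. 4.7. Because the first child is an unmutated copy of the best LHC, the best force value can never increase from one iteration to the next.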

Generalised Maximum Latin Hypercube Design

A GMLHD aims to reduce the uncertainty at the edges of a GP emulator by placing design points near the boundary, whilst remaining well spaced [120]. The main approach is to take a uniform maximum LHC (in $[0, 1]^D$), where the (i, j)th element is denoted $z_{i,j}$, and transform the design points through a beta quantile function (an inverse CDF) with a tuning parameter $a \in [0, 1]$, as shown in Eq. (4.43).

LHD                     NMSE               D_MD             log p(y* | N − p, η*)
Random                  90.281 ± 62.162    5078 ± 1.7739    −186 ± 420
Maximum                 0.012              338              1494
Generalised (a = 0.8)   0.008              345              1507
Generalised (a = 0.6)   0.005              292              1502
Generalised (a = 0.4)   0.004              206              1474
Generalised (a = 0.2)   0.003              142              1429
Generalised (a = 0)     0.003              113              1374

Table 4.2: Comparison of GP emulator predictions when trained using different LHDs, where D = 2 and N = 21. The random LHD results are an average of 25 realisations, with the mean and standard deviation shown. The simulator is Eq. (4.44). Both $D_{MD}$ and $\log p(y^* \mid N - p, \eta^*)$ assume independent posterior variance.

$$x_{i,j} = F_a^{-1}(z_{i,j}), \qquad F_a(x) = \frac{1}{B\left(\frac{1+a}{2}, \frac{1+a}{2}\right)} \int_0^{x} t^{(a-1)/2} (1 - t)^{(a-1)/2} \, dt \tag{4.43}$$

Here B denotes the beta function. A beta quantile function is implemented because it is known that, for high-degree polynomial regression, an arc-sine distribution (when a = 0) is the limit distribution of its D-optimal design (see [120] for further mathematical justification). An arc-sine distribution will put more mass on the design space edges, whereas the other extreme, where a = 1, will result in a uniform distribution (leaving the maximum LHC unchanged). Figure 4.7c presents an example of a generalised maximum Latin square where Figure 4.7b is transformed through Eq. (4.43).
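The beta quantile transformation can be sketched with the standard library alone, as below. The function names, the midpoint-rule integration and the bisection inversion are all choices made for this illustration rather than details taken from [120]. The substitution t = sin²(θ) is used so that the integrand stays bounded for every a in [0, 1].

```python
import math


def beta_cdf(x, a, steps=400):
    """CDF of a Beta((1+a)/2, (1+a)/2) distribution at x in [0, 1].

    With t = sin^2(theta) the integrand becomes 2 * (sin(theta) * cos(theta))**a,
    which is integrated here with a simple midpoint rule (an illustrative choice).
    """
    p = (1.0 + a) / 2.0
    norm = math.gamma(p) ** 2 / math.gamma(2.0 * p)  # beta function B(p, p)
    upper = math.asin(math.sqrt(x))
    h = upper / steps
    total = sum((math.sin((i + 0.5) * h) * math.cos((i + 0.5) * h)) ** a
                for i in range(steps))
    return 2.0 * h * total / norm


def beta_quantile(z, a, tol=1e-10):
    """Invert beta_cdf by bisection: the x in [0, 1] whose CDF value is z."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if beta_cdf(mid, a) < z:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def generalise(design, a):
    """Apply the quantile transformation to every coordinate of a design in [0, 1]^D."""
    return [[beta_quantile(z, a) for z in point] for point in design]
```

For a = 0 the result agrees with the closed-form arc-sine quantile sin²(πz/2), and for a = 1 the design is returned essentially unchanged. Where SciPy is available, `scipy.stats.beta.ppf(z, (1 + a) / 2, (1 + a) / 2)` computes the same quantile directly.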

Table 4.2 presents a comparison of LHDs in which random, maximum and generalised maximum LHCs are used to determine the training points of a GP emulator, where D = 2 and N = 21. The numerical example uses the simulator shown in Eq. (4.44). The trained GP emulators were tested against an N × N grid, with the validation metrics displayed in Table 4.2.

y = η(X) = 2(x1− 2 + 10x2− 8x22)

2+ 2x

2+ 1 (2x2)2 (4.44)

It is demonstrated that, as expected, a random LHD performs worst on all validation metrics, with a maximum LHD being outperformed by the GMLHD. This agrees with the findings of Dette and Pepelyshev in [120], who show that a GMLHD will outperform a maximum LHD and Sobol sampling for a variety of N and D. Generally, a decrease in a coincides with better emulator performance, as shown by the NMSEs and