Surrogate models - Feedback loop: Surrogates, sensitivity and Bayesian model updating

2.8 Feedback loop: Surrogates, sensitivity and Bayesian model updating

2.8.2 Surrogate models

Surrogate models, also known in the literature as meta-models or emulators, are mathematical models used to mimic the input-output relation of a computationally expensive numerical model $M$. This is generally done by replacing the high-fidelity model with a cheaper analytical model $\hat{M}$. Some examples of surrogate models are Artificial Neural Networks (ANN) [166], Poly-Harmonic Splines [78], Extreme Learning Machines [58], Kriging models and response surfaces.

Artificial neural networks

An ANN is a mathematical model defining a function $\hat{M} : I \to Y$, where $\hat{M}(g)$ is a composition (e.g. a non-linear weighted sum) of other weighted functions $g_i(x)$. The basic architecture for a feed-forward ANN consists of one input layer, one or more hidden layers and one output layer [166]. Each layer employs several artificial neurons, also known as nodes, which are connected to the neurons of the adjacent layers by weighted links. In each neuron, the inputs are first weighted and then summed as follows:

$$g(x) = \sum_{i=1}^{n} \omega_i \cdot g_i(x) + b$$

where $\omega_i$ are the weights, $g_i(x)$ is the output of node $i$ in the previous layer, $n$ is the number of nodes in that layer, and $b$ is the bias, which is generally introduced in the hidden and output layers and acts as a threshold for the argument of the activation function. The sum $g(x)$ is processed by an activation function $K$ to produce the neuron's output. A commonly employed activation function is the sigmoid function, defined as follows:

$$K(g) = \frac{1}{1 + e^{-g}}$$
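The weighted sum and the sigmoid activation can be illustrated with a minimal sketch in plain Python. The function names (`sigmoid`, `neuron`, `layer`, `forward`), the single hidden layer and the linear output node are assumptions made for illustration only; they are not taken from the source, and the weights in the example are arbitrary.

```python
import math


def sigmoid(g):
    """Sigmoid activation K(g) = 1 / (1 + exp(-g))."""
    return 1.0 / (1.0 + math.exp(-g))


def neuron(inputs, weights, bias):
    """Weighted sum of the previous-layer outputs plus bias, passed through K."""
    g = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(g)


def layer(inputs, weight_rows, biases):
    """Evaluate every neuron of a layer on the same inputs."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]


def forward(x, hidden_w, hidden_b, out_w, out_b):
    """Feed-forward pass: one sigmoid hidden layer, then one output node.

    The output node is kept linear here (an assumption, common for regression).
    """
    h = layer(x, hidden_w, hidden_b)
    return sum(w * hi for w, hi in zip(out_w, h)) + out_b


# Illustrative (arbitrary) weights: 2 inputs, 2 hidden neurons, 1 output.
y_hat = forward(
    x=[1.0, 2.0],
    hidden_w=[[0.5, -0.25], [0.0, 0.0]],
    hidden_b=[0.0, 0.0],
    out_w=[1.0, 1.0],
    out_b=0.25,
)
print(y_hat)
```

In a trained emulator the weights and biases would be fitted to runs of the high-fidelity model rather than set by hand.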

Figure 2.11 shows an example of a neural network architecture and depicts the functionality of a node.

Figure 2.11: Conceptual scheme of an Artificial Neural Network architecture and the function of an artificial neuron.

Selecting a suitable ANN topology is a problem-specific task that can help maximise the emulator performance. Advanced optimisation approaches can be employed for the selection. A simple but computationally demanding method consists of heuristically testing different architectures, exploring different combinations of hidden neurons and hidden layers. The best ANN architecture is then selected based on a performance indicator. The coefficient of determination $R^2$ can be used as the performance index of the ANN regression and hence to select the most suitable ANN architecture. The $R^2$ coefficient is expressed as follows:

$$R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}$$

where $y_i$ is the $i$th output of the high-fidelity model, $\hat{y}_i$ is the corresponding output predicted by the surrogate model, and $\bar{y}$ is the average of the outputs of the high-fidelity model.
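As a rough illustration of the selection step, the sketch below computes $R^2$ for a set of candidate architectures and keeps the one with the highest score. The helper names (`r_squared`, `select_best`), the architecture labels and all numerical values are hypothetical toy data, not results from the source; in practice the predictions would come from trained networks evaluated on data held out from training.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - yh) ** 2 for y, yh in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


def select_best(y_true, candidate_predictions):
    """Score each candidate architecture and return the best label and all scores."""
    scores = {
        name: r_squared(y_true, pred)
        for name, pred in candidate_predictions.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Toy outputs of the high-fidelity model (hypothetical values).
y = [1.0, 2.0, 3.0, 4.0]

# Toy predictions of two hypothetical candidate architectures.
candidates = {
    "1 hidden layer, 5 neurons": [1.5, 2.5, 2.5, 3.5],
    "2 hidden layers, 10 neurons": [1.1, 1.9, 3.1, 3.9],
}

best, scores = select_best(y, candidates)
print(best, scores)
```

An $R^2$ close to 1 indicates that the surrogate reproduces most of the variability of the high-fidelity output.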

Gaussian process emulators

A Gaussian Process Emulator (GPE), also known as a Kriging emulator, is based on a Gaussian process: a stochastic process (i.e. a collection of random variables indexed by time and/or space) such that every finite linear combination of those variables is normally distributed. The concept is named after Carl Friedrich Gauss because it builds on the notion of the Gaussian (normal) distribution. Gaussian processes can be seen as an infinite-dimensional generalisation of multivariate normal distributions. The distribution of a Gaussian process is the joint distribution of all those (infinitely many) random variables and, as such, it is a distribution over functions with a continuous domain, e.g. time or space. Algorithms that involve Gaussian processes (GPs) use a measure of the similarity between points, known as the kernel function. The kernel is used to predict the value at an unvisited point in the input domain (i.e. a point not available in the training data). One of the main advantages of GPEs is that each prediction comes with an associated measure of uncertainty (i.e. the marginal distribution at that point). Gaussian processes are useful in statistical modelling, where they benefit from properties inherited from the normal distribution, and as replacements for computationally expensive models.

Let $f(x)$ be a function or a computer model mapping a multidimensional input onto the real line, $f : \mathbb{R}^m \to \mathbb{R}$. Let $X = \{x_1, \dots, x_n\}$ be a set of design points (i.e. a set of points in the input space) and $Y = \{y_1, \dots, y_n\}$ the corresponding set of outputs, such that $x_i \in \mathbb{R}^m \ \forall i = 1, \dots, n$ denotes a given input configuration and each output reads $y_i = f(x_i) \ \forall i = 1, \dots, n$. Each pair $(x_i, y_i)$ then denotes a training run for the Gaussian Process Emulator, which is assumed to be an interpolation model, $y_i = \hat{f}(x_i)$. If a fully parametrised Gaussian process prior is assumed for the outputs of the simulator, then the outputs at the design points have a joint Gaussian distribution. The general assumption is that the simulator output satisfies a statistical model with the following structure:

$$f(x) = h(x)^{T}\beta + Z(x \mid \sigma^2, \phi) \tag{2.29}$$

where $h(\cdot)$ is a vector of known basis (location) functions of the input, $\beta$ is a vector of regression coefficients, and $Z(x \mid \sigma^2, \phi)$ is a Gaussian process with zero mean and covariance function $\mathrm{cov}(x, x' \mid \phi, \sigma^2) = \sigma^2 k(x, x' \mid \phi)$, where $\sigma^2$ is the signal noise (the variance of the process) and $\phi \in \mathbb{R}^m$ denotes the length-scale parameters of the correlation (kernel) function $k(\cdot, \cdot)$. The kernel function measures how close different input (and corresponding output) configurations are. This measure is related to the Euclidean distance, weighted so that each input variable contributes differently.
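The structure above can be illustrated with a minimal interpolating emulator in plain Python. This is a sketch under several assumptions that are not stated in the source: the regression term $h(x)^{T}\beta$ is set to zero, the kernel is a squared-exponential correlation with one length-scale per input, the hyper-parameters $\phi$ and $\sigma^2$ are fixed by hand rather than estimated, and `math.sin` stands in for the expensive simulator. The names `kernel`, `solve` and `gp_predict` are illustrative.

```python
import math


def kernel(x1, x2, phi):
    """Squared-exponential correlation; each input has its own length-scale."""
    d2 = sum(((a - b) / l) ** 2 for a, b, l in zip(x1, x2, phi))
    return math.exp(-0.5 * d2)


def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (M[r][n] - s) / M[r][r]
    return z


def gp_predict(X, Y, x_new, phi, sigma2):
    """Posterior mean and variance at x_new for a zero-mean GP prior."""
    R = [[kernel(xi, xj, phi) for xj in X] for xi in X]  # correlation matrix
    r = [kernel(xi, x_new, phi) for xi in X]             # correlations with x_new
    alpha = solve(R, Y)                                  # R^{-1} y
    v = solve(R, r)                                      # R^{-1} r
    mean = sum(ri * ai for ri, ai in zip(r, alpha))
    var = sigma2 * (1.0 - sum(ri * vi for ri, vi in zip(r, v)))
    return mean, max(var, 0.0)  # clip tiny negative round-off


# Toy design points and outputs (math.sin is a stand-in simulator).
X = [[0.0], [1.0], [2.0], [3.0]]
Y = [math.sin(x[0]) for x in X]

mean_train, var_train = gp_predict(X, Y, [1.0], phi=[1.0], sigma2=2.0)
mean_mid, var_mid = gp_predict(X, Y, [1.5], phi=[1.0], sigma2=2.0)
mean_far, var_far = gp_predict(X, Y, [10.0], phi=[1.0], sigma2=2.0)
print(var_train, var_mid, var_far)
```

At a design point the emulator reproduces the training output and the predictive variance vanishes; away from the data the variance grows towards $\sigma^2$.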

Figure 2.12: An example of Gaussian process regression using 6 data points generated by an unknown function $f(x)$. The uncertainty (confidence interval) is larger in the areas of the input space where data are not provided and is zero at the training points.

In document Robust Computational Frameworks for Power Grid Reliability, Vulnerability and Resilience Analysis (Page 64-67)