Likelihood Function and Prediction Errors

2.2 Bayesian Inference and Uncertainty Quantification

2.2.3 Likelihood Function and Prediction Errors

Bayesian updating from prior to posterior belief depends crucially on the choice of likelihood function. It defines how the information about the system contained in the observations is transferred to the knowledge about the model parameters Θm, yielding the posterior. Mathematically, the likelihood function reflects the assumed distribution of errors or residuals r between model predictions y and data D. The likelihood originated as a frequentist concept (Fisher, 1922) but can likewise be used and interpreted under the Bayesian paradigm (Del Giudice et al., 2013).

In Bayes’ theorem (Equation 4), the likelihood is p(D|Θm, Mm), which means that it is a function of D given Θm of a certain Mm. However, when updating the parameter prior p(Θm|Mm), we are interested in the dependence on Θm. Therefore, we employ the original model formulation in Equation 1 to make the likelihood a function of the model parameters Θm given observed data D: L(Θm|D). Note that the re-labelled L(Θm|D) does not necessarily integrate to 1 and is therefore not a proper probability density function with respect to Θm.
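The remark on normalisation can be checked numerically. The following is a minimal sketch under stated assumptions: the one-parameter model y(θ) = 2θ, the error standard deviation of 1, and the helper names `normal_pdf`, `model` and `trapezoid` are all hypothetical and chosen only for illustration.

```python
import math

def normal_pdf(x, mean, sigma):
    """Density of a normal distribution with the given mean and standard deviation."""
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def model(theta):
    """Hypothetical one-parameter model y(theta) = 2 * theta (an assumption)."""
    return 2.0 * theta

def trapezoid(f, a, b, n=20000):
    """Composite trapezoidal rule for f on [a, b] with n intervals."""
    h = (b - a) / n
    return h * (0.5 * f(a) + sum(f(a + i * h) for i in range(1, n)) + 0.5 * f(b))

sigma = 1.0                       # assumed error standard deviation
theta_fixed, data_fixed = 0.5, 1.0

# Read as a density of the data D for fixed theta, p(D | theta) integrates to 1.
area_over_data = trapezoid(lambda d: normal_pdf(d, model(theta_fixed), sigma), -20.0, 20.0)

# Read as a function of theta for fixed D, L(theta | D) integrates to 0.5 here, not 1.
area_over_theta = trapezoid(lambda t: normal_pdf(data_fixed, model(t), sigma), -20.0, 20.0)
```

In this toy case the factor 2 in the model halves the area under the likelihood when it is integrated over θ, which is why the posterior requires the normalisation by the evidence in Bayes’ theorem.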

In the most general terms, errors are often assumed to be independent and identically distributed (i.i.d.). A specific instance of this is Gaussian white noise, i.e., uncorrelated, normally distributed errors with zero mean and finite variance. The corresponding likelihood function for Ns residuals given data D is a multivariate Gaussian:

$$L(\Theta_m \mid D) = (2\pi)^{-N_s/2}\,|R|^{-1/2}\exp\left(-\tfrac{1}{2}\, r(\Theta_m)^{T} R^{-1}\, r(\Theta_m)\right) \qquad (8)$$

It describes the normal distribution of residuals r(Θm) = y(Θm) − D, i.e., the distribution of predictions y(Θm) centred at the observed data D. In the case of uncorrelated errors, the variance-covariance matrix of errors R contains only diagonal elements. They represent the measurement uncertainty (see Equation 1) and can be interpreted as weighting factors for each residual: the larger the measurement uncertainty, the smaller the weight of the corresponding residual. Then, the argument of the exponential is, up to the factor −1/2, the weighted sum of squared errors (WSSE), so that the log-likelihood is, apart from an additive constant, proportional to the WSSE, a common error metric.
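For a diagonal R, the logarithm of Equation 8 and its relation to the WSSE can be sketched as follows. This is an illustrative sketch, not the author's implementation: the function name `gaussian_log_likelihood`, the argument `sigmas` (assumed measurement standard deviations, so that R has entries `sigmas[i] ** 2`) and all numbers are assumptions.

```python
import math

def gaussian_log_likelihood(predictions, data, sigmas):
    """Return (log-likelihood, WSSE) for uncorrelated Gaussian errors.

    sigmas[i] is the assumed measurement standard deviation of data[i],
    so R is diagonal with entries sigmas[i] ** 2.
    """
    residuals = [y - d for y, d in zip(predictions, data)]
    # Weighted sum of squared errors: each residual is scaled by its uncertainty.
    wsse = sum((r / s) ** 2 for r, s in zip(residuals, sigmas))
    # log|R| for a diagonal matrix is the sum of the log-variances.
    log_det_R = sum(2.0 * math.log(s) for s in sigmas)
    n = len(data)
    log_like = -0.5 * n * math.log(2.0 * math.pi) - 0.5 * log_det_R - 0.5 * wsse
    return log_like, wsse

data = [1.0, 2.0, 3.0]        # hypothetical observations
sigmas = [0.1, 0.1, 0.5]      # the third observation is assumed to be less certain

# The same misfit of 0.1, once at a precise and once at an uncertain observation.
ll_a, wsse_a = gaussian_log_likelihood([1.1, 2.0, 3.0], data, sigmas)
ll_b, wsse_b = gaussian_log_likelihood([1.0, 2.0, 3.1], data, sigmas)
```

Because the normalisation terms do not depend on the predictions, the difference `ll_a - ll_b` equals `-0.5 * (wsse_a - wsse_b)`, and the misfit at the uncertain observation is penalised less.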

Depending on the modelling task at hand, alternative likelihood formulations might be more suitable than Gaussian white noise. However, this normal error model makes it easy to shed light on the issues involved in assigning an adequate likelihood function with respect to the M-setting of the modelling task. Note that the “Gaussian white-noise likelihood model” in Equation 8 does not account for a systematic bias between model predictions and observations, which occurs when wrong models are used. Such bias can be accounted for within the likelihood function by modifying R (e.g., by introducing non-zero off-diagonal elements) or by adding a separate error model (e.g., Del Giudice et al., 2013). However, while such a statistical error treatment might help to increase predictive performance outside of the M-closed setting, it contradicts the idea behind identifying a true model in the M-closed case, according to which no systematic bias exists. Hence, in the M-closed setting, the likelihood function should only account for measurement uncertainty. Philosophically, another perspective is to account for errors implicitly by making the model itself stochastic instead of describing them with a likelihood function (Nearing et al., 2016). Mathematically, however, the corresponding equations of these two perspectives are equivalent.
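To illustrate what non-zero off-diagonal elements in R can do, the sketch below assumes one particular correlation structure, a stationary first-order autoregressive (AR(1)) error model with R[i][j] = σ² ρ^|i−j|. This choice, the function names and the residual values are assumptions made for illustration; the text above does not prescribe a specific structure.

```python
import math

def log_normal_pdf(x, mean, variance):
    """Log-density of a univariate normal distribution."""
    return -0.5 * math.log(2.0 * math.pi * variance) - 0.5 * (x - mean) ** 2 / variance

def ar1_log_likelihood(residuals, sigma, rho):
    """Log-likelihood of zero-mean Gaussian residuals with R[i][j] = sigma**2 * rho**|i-j|.

    Uses the sequential factorisation of a stationary AR(1) process (|rho| < 1),
    so no matrix inverse is needed. rho = 0 recovers Gaussian white noise.
    """
    total = log_normal_pdf(residuals[0], 0.0, sigma ** 2)
    innovation_variance = sigma ** 2 * (1.0 - rho ** 2)
    for previous, current in zip(residuals, residuals[1:]):
        total += log_normal_pdf(current, rho * previous, innovation_variance)
    return total

# Hypothetical residuals with a persistent offset, as a structurally wrong model might produce.
biased_residuals = [1.0, 1.0, 1.0, 1.0, 1.0]
white_noise = ar1_log_likelihood(biased_residuals, sigma=1.0, rho=0.0)
correlated = ar1_log_likelihood(biased_residuals, sigma=1.0, rho=0.9)
```

In this example `correlated` is larger than `white_noise`: the correlated error model treats the persistent offset as far more plausible, which is the sense in which such a treatment absorbs systematic bias into the error description.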

Further, the above likelihood formulation does not depend on the state of the model variables, i.e., it cannot represent errors relative to the values themselves. A common example from hydrologic modelling is that the measurement uncertainty of stream discharge depends on the flow regime, e.g., errors are relative to the magnitude of discharge under low, medium or high flow conditions. This problem of a variance that changes with the magnitude of the QoI is known as heteroscedasticity (Sorooshian and Dracup, 1980). Possible solutions in modelling are to make the elements of R dependent on the magnitude of the measurements, i.e., to assign relative errors, or to apply a transformation, e.g., the Box-Cox transformation (Box and Cox, 1964), to rescale the values and control the variance. This, however, might introduce additional (uncertain) transformation parameters and requires an adjustment of the likelihood function (see, e.g., Schöniger et al., 2014). Alternatively, the likelihood has to be evaluated over all possible states, which mathematically amounts to an expensive integration that is often analytically intractable and computationally infeasible (Albert et al., 2015). In such cases, so-called Approximate Bayesian Computation (ABC) methods allow Bayesian inference to be pursued by sampling the likelihood function rather than fully evaluating it. Further, ABC methods make it possible to infer an approximate posterior using summary statistics of the QoI if the model output space Y is high-dimensional and a likelihood function like Equation 8 becomes unsuitable (Albert et al., 2015).
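The basic idea of ABC can be sketched with its simplest variant, rejection sampling. The sketch below is a toy illustration under stated assumptions and not the scheme of Albert et al. (2015): the stochastic simulator, the uniform prior on [0, 10], the sample mean as summary statistic, the tolerance `epsilon` and all function names are hypothetical.

```python
import random

def simulate(theta, n, rng):
    """Hypothetical stochastic model: n noisy outputs scattered around theta."""
    return [rng.gauss(theta, 1.0) for _ in range(n)]

def summary(values):
    """Summary statistic of the output; here simply the sample mean (an assumption)."""
    return sum(values) / len(values)

def abc_rejection(observed_summary, n_obs, n_draws, epsilon, rng):
    """Keep prior draws whose simulated summary lies within epsilon of the observed one."""
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(0.0, 10.0)           # draw from the assumed uniform prior
        simulated_summary = summary(simulate(theta, n_obs, rng))
        # The likelihood is never evaluated; it is only sampled via the simulator.
        if abs(simulated_summary - observed_summary) < epsilon:
            accepted.append(theta)
    return accepted

rng = random.Random(42)                          # fixed seed for reproducibility
posterior_sample = abc_rejection(observed_summary=3.0, n_obs=20,
                                 n_draws=20000, epsilon=0.1, rng=rng)
posterior_mean = sum(posterior_sample) / len(posterior_sample)
```

The accepted parameter values form a sample from an approximate posterior. A smaller `epsilon` tightens the approximation but lowers the acceptance rate, so the tolerance is a trade-off between accuracy and computational cost.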

In contrast, some approaches do not employ a rigorous likelihood definition that would allow a full probability distribution of the model output y to be inferred. A popular method of this kind in hydrology is GLUE (generalized likelihood uncertainty estimation; Beven and Binley, 1992). It uses a rescaled error metric (related to “acceptable” rather than probable errors) to weight model predictions. Based on this, prediction envelopes are delimited and the whole model is rated by its forecast performance for comparison against alternatives. Note that while such approaches provide pragmatic estimates of acceptable predictions and corresponding ranges of variability, they should rather be considered a weighted sensitivity analysis (Montanari, 2007) and do not allow for rigorous probabilistic uncertainty quantification from a Bayesian perspective.
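A GLUE-type workflow as described above can be sketched roughly as follows. This is a simplified illustration, not the procedure of Beven and Binley (1992) in detail: the linear toy model, the Nash-Sutcliffe efficiency as informal measure, the threshold of 0.9 for "acceptable" runs and all names and numbers are assumptions.

```python
import random

def nash_sutcliffe(sim, obs):
    """Informal performance measure chosen for illustration: 1 is a perfect fit."""
    mean_obs = sum(obs) / len(obs)
    num = sum((s - o) ** 2 for s, o in zip(sim, obs))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den

def weighted_quantile(values, weights, q):
    """Smallest value whose cumulative weight reaches q (weights sum to 1)."""
    cumulative = 0.0
    for value, weight in sorted(zip(values, weights)):
        cumulative += weight
        if cumulative >= q:
            return value
    return max(values)

random.seed(1)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
obs = [2.1, 3.9, 6.2, 7.8, 10.1]     # hypothetical observations, roughly 2 * x

# 1) Sample parameters from a prior range and score each run of the toy model y = theta * x.
thetas = [random.uniform(0.0, 4.0) for _ in range(2000)]
scores = [nash_sutcliffe([t * xi for xi in x], obs) for t in thetas]

# 2) Keep "acceptable" runs above a subjective threshold and rescale their scores to weights.
threshold = 0.9
behavioural = [(t, s) for t, s in zip(thetas, scores) if s > threshold]
total = sum(s for _, s in behavioural)
weights = [s / total for _, s in behavioural]

# 3) Delimit a prediction envelope at a new input from the weighted predictions.
x_new = 6.0
preds = [t * x_new for t, _ in behavioural]
lower = weighted_quantile(preds, weights, 0.05)
upper = weighted_quantile(preds, weights, 0.95)
```

The resulting interval `[lower, upper]` depends directly on the subjective threshold and on the chosen performance measure, which illustrates why such envelopes are not probabilistic uncertainty statements in the Bayesian sense.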

In document Bayesian Multi-Model Frameworks - Properly Addressing Conceptual Uncertainty in Applied Modelling (Page 40-42)