4. Methods of Uncertainty Analysis
4.6 Guidance in choosing a methodology
It is impossible to suggest a universal uncertainty methodology; it is only possible to make clear the different assumptions and choices that are necessary in applying any of the methods presented above. Discussion about what might be the ‘right’ methodology continues (e.g. Gupta et al., 225; Beven and Young, 83). Any attempt to produce guidance on choosing a methodology for a particular application will therefore necessarily reflect the view of the author. The following decision tree is an initial (non-objective) attempt to provide such guidance. This tree will be refined within the FRMRC project. Note that no guidance is given on the type of model (physically based or data based) that should be used; this has to be decided on different criteria.
Figure 9 Decision tree for uncertainty analysis tools (blue boxes represent the questions to derive a decision for an uncertainty method, yellow boxes show the major classifications of several uncertainty methods and orange boxes stand for individual methods or small sub-groups of those)
The orange and yellow boxes in Figure 9 are explained in the previous chapter. The blue boxes are decisions or assumptions which have to be made in order to arrive at one specific uncertainty method. The boxes can be explained as follows:
Data available for model evaluation?
The question is whether data are available which have not been used in creating the model and which allow a comparison with the model results.
Uncertainties can be defined statistically
Distribution functions can be assumed or approximated for all uncertainties. If the answer is yes, it is possible, for example, to define a range of possible floodplain roughness values based on a Gaussian/normal distribution (see explanation below).
Model structure is 'simple' e.g. Manning equation
No numerical analysis or complex direct solution schemes are required to compute these equations.
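As an illustration of a 'simple' model structure combined with a statistically defined uncertainty, the following minimal sketch propagates a Gaussian roughness distribution through the Manning equation by random sampling. The function names, the roughness mean and standard deviation, the hydraulic radius and the slope are assumptions chosen for illustration only and are not taken from this report.

```python
import random
import statistics


def manning_velocity(n, hydraulic_radius, slope):
    """Mean flow velocity (m/s) from the Manning equation in SI units."""
    return (1.0 / n) * hydraulic_radius ** (2.0 / 3.0) * slope ** 0.5


def sample_velocities(n_mean, n_std, hydraulic_radius, slope, n_samples, seed=0):
    """Propagate Gaussian roughness uncertainty through the Manning equation."""
    rng = random.Random(seed)
    velocities = []
    while len(velocities) < n_samples:
        n = rng.gauss(n_mean, n_std)
        if n <= 0.0:
            # Roughness must be positive, so non-physical draws are rejected.
            continue
        velocities.append(manning_velocity(n, hydraulic_radius, slope))
    return velocities


# Illustrative values only (assumed, not from the report).
velocities = sample_velocities(
    n_mean=0.06, n_std=0.01, hydraulic_radius=1.5, slope=0.001, n_samples=5000
)
print(f"mean velocity: {statistics.mean(velocities):.3f} m/s")
print(f"standard deviation: {statistics.stdev(velocities):.3f} m/s")
```

Because each evaluation is a closed-form expression, several thousand samples are computed almost instantly, which is what makes direct sampling practical for this class of model.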
Model has a long runtime and / or many parameters
The more parameters a model has, the more runs are necessary to describe the response surface adequately. In this particular figure, parameters are taken as an example of, and a synonym for, sources of uncertainty (see chapter 2).
For example, a two-parameter model would need four simulations to sample a two-by-two grid, and 100 simulations would already be necessary to sample a 10-by-10 grid. The number of subdivisions needed for each parameter depends on the non-linearity of the response surface. The number of samples increases with the number of parameters. For our example it can be computed by
$$N_S = D^{N_p}$$
Equation 3
$N_S$: Number of samples
$D$: Number of subdivisions in the parameter space
$N_p$: Number of parameters
Figure 10 Example of sampling the parameter space
In this particular set-up the total processor run-time required would be the number of samples multiplied by the execution time of one model realisation. Note that this is just a crude example, as most uncertainty techniques use more efficient ways to quantify the response surface.
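The grid-sampling arithmetic described above can be sketched as follows. The function names and the 120-second execution time are assumptions for illustration; the sample count is the number of subdivisions raised to the power of the number of parameters, and the run-time is that count multiplied by the time for one realisation.

```python
import itertools


def number_of_samples(subdivisions, n_parameters):
    """Samples needed for a regular grid: subdivisions ** n_parameters."""
    return subdivisions ** n_parameters


def total_runtime_seconds(subdivisions, n_parameters, seconds_per_run):
    """Single-CPU run-time: number of samples times the time of one realisation."""
    return number_of_samples(subdivisions, n_parameters) * seconds_per_run


def grid_points(lower_bounds, upper_bounds, subdivisions):
    """Cell-centre sample points of a regular grid over the parameter space."""
    axes = []
    for lo, hi in zip(lower_bounds, upper_bounds):
        width = (hi - lo) / subdivisions
        axes.append([lo + (i + 0.5) * width for i in range(subdivisions)])
    return list(itertools.product(*axes))


# The two cases from the text: 2-by-2 and 10-by-10 grids for two parameters.
print(number_of_samples(2, 2), number_of_samples(10, 2))

# Growth with the number of parameters, assuming 120 s per run (hypothetical).
for n_parameters in (2, 4, 8):
    hours = total_runtime_seconds(10, n_parameters, 120.0) / 3600.0
    print(f"{n_parameters} parameters: {hours:.1f} CPU hours")
```

The exponential growth in the loop is the reason more efficient sampling schemes are normally preferred to a full grid.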
In summary, the answer to this question depends strongly on the type of model or model cascade used and should be based on previous experience. As a bold statement, we argue that a model with more than 8 parameters and an execution time of more than 2 minutes should be considered computationally intensive (if executed on a single CPU).
Model aim: Real-time forecasting
This question is based on practical considerations. A forecast is called real-time if the combined reaction and execution time of the forecast is shorter than the maximum delay that is allowed, in view of circumstances outside the forecast. In other words, a flood forecast including uncertainty analysis is computed as soon as all input data are available.
The two branches arising at this node could be combined, as all methods under this heading are examples of conditioning on data and could be applied for real-time forecasting or off-line analysis. The main difference is that methods quoted in the left leaf usually treat data sequentially, time step by time step, whereas methods in the right leaf usually work on the data set en bloc. However, even this distinction is blurred. The question is therefore motivated by preference, based on the extensive experience of the authors.
Model can be assumed linear with deterministic input and simple error structure
Linear
A system is linear if its response is directly proportional to changes in the quantities of the system, for every part of the system.
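The proportionality property can be probed numerically. The sketch below checks homogeneity and additivity for one pair of inputs; passing it is a necessary but not a sufficient condition for linearity, since the definition requires the property to hold everywhere. Both example models (`linear_store` and `power_law`) are hypothetical.

```python
def is_approximately_linear(model, x1, x2, scale=3.0, tol=1e-9):
    """Check homogeneity and additivity of `model` for one pair of inputs."""
    homogeneous = abs(model(scale * x1) - scale * model(x1)) <= tol
    additive = abs(model(x1 + x2) - (model(x1) + model(x2))) <= tol
    return homogeneous and additive


def linear_store(inflow):
    """Hypothetical linear model: output is a fixed fraction of the input."""
    return 0.4 * inflow


def power_law(depth):
    """Hypothetical nonlinear model: output grows with depth to the power 5/3."""
    return 2.0 * depth ** (5.0 / 3.0)


print(is_approximately_linear(linear_store, 1.5, 2.5))
print(is_approximately_linear(power_law, 1.5, 2.5))
```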
Deterministic input
Model input does not vary randomly in time. In contrast, stochastic input varies randomly in time. A simple example would be the difference between a single rainfall prediction (deterministic) and an ensemble of rainfall predictions (stochastic).
Simple error structure
The model errors can be explained by simple distributions, for example the normal distribution. Most error structures are more complicated and cannot be easily approximated.
Model is mildly nonlinear with deterministic input
Mildly nonlinear
A relationship between numerical quantities is called nonlinear if there is not a constant proportion relating changes in one quantity to changes in the other. Nonlinear systems are probably most easily understood as "everything except the relatively few systems which prove to be linear" (226). Mildly nonlinear systems are all those which could be approximated by linear systems subject to a small model error.
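One way to judge whether a model is only mildly nonlinear is to linearise it about an operating point and inspect the approximation error over the range of interest. The sketch below does this with a tangent-line approximation; the power-law `rating` function, the operating point and the depth range are assumptions for illustration, and what counts as a "small" error remains a judgement for the analyst.

```python
def linearise(model, x0, dx=1e-6):
    """Return the tangent-line approximation of `model` about x0."""
    slope = (model(x0 + dx) - model(x0 - dx)) / (2.0 * dx)  # central difference
    y0 = model(x0)
    return lambda x: y0 + slope * (x - x0)


def max_relative_error(model, approx, xs):
    """Largest relative difference between the model and its approximation."""
    return max(abs(approx(x) - model(x)) / abs(model(x)) for x in xs)


def rating(depth):
    """Hypothetical, mildly nonlinear depth-discharge relation."""
    return 10.0 * depth ** 1.2


tangent = linearise(rating, x0=2.0)
depths = [1.5 + 0.1 * i for i in range(11)]  # depths from 1.5 to 2.5
error = max_relative_error(rating, tangent, depths)
print(f"maximum relative linearisation error: {error:.2%}")
```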
Deterministic input (see above)
Errors can be assumed Gaussian
This assumption is mathematically convenient in that it means that advantage can be taken of a body of statistical theory. It is not always verified that the actual errors in an application are indeed Gaussian.
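A simple first check of the Gaussian assumption is to compute the sample skewness and excess kurtosis of the errors, both of which are close to zero for Gaussian data. The sketch below is a rough screening aid under that assumption, not a formal normality test; the two synthetic error series are generated purely for illustration.

```python
import math
import random
import statistics


def skewness_and_excess_kurtosis(values):
    """Moment-based skewness and excess kurtosis (both near 0 for Gaussian data)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    n = len(values)
    skew = sum(((v - mean) / std) ** 3 for v in values) / n
    kurt = sum(((v - mean) / std) ** 4 for v in values) / n - 3.0
    return skew, kurt


rng = random.Random(1)
# Synthetic errors: one Gaussian series and one right-skewed (lognormal) series.
gaussian_errors = [rng.gauss(0.0, 1.0) for _ in range(20000)]
skewed_errors = [math.exp(rng.gauss(0.0, 0.5)) for _ in range(20000)]

print("Gaussian series:", skewness_and_excess_kurtosis(gaussian_errors))
print("Skewed series:  ", skewness_and_excess_kurtosis(skewed_errors))
```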
Model Error
The difference between a measured or estimated value of a quantity and its true value, the latter being defined with respect to the whole population.
Gaussian / Normal Distribution
A normal distribution in a variate $X$ with mean $\mu$ and variance $\sigma^2$ is a statistical distribution with probability function
$$P(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}$$
Equation 4
on the domain $x \in (-\infty, \infty)$.
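Equation 4 can be evaluated directly. The sketch below implements it as a plain function (the name `normal_pdf` is an assumption) and compares the result with `statistics.NormalDist` from the Python standard library as a cross-check.

```python
import math
from statistics import NormalDist


def normal_pdf(x, mu, sigma):
    """Probability density of the normal distribution (Equation 4)."""
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return math.exp(exponent) / (sigma * math.sqrt(2.0 * math.pi))


# Cross-check against the standard library for arbitrary illustrative values.
x, mu, sigma = 0.07, 0.06, 0.01
print(normal_pdf(x, mu, sigma))
print(NormalDist(mu, sigma).pdf(x))
```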
Model residuals have simple structure
Model residual (after 227)
A residual is an observable estimate of the unobservable error. The simplest case involves a random sample of n men whose heights are measured. The sample average is used as an estimate of the population average. Then we have:
• The difference between the height of each man in the sample and the unobservable population average is an error, and
• The difference between the height of each man in the sample and the observable sample average is a residual.
Residuals are observable; errors are not (see 227).
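The distinction between errors and residuals in the height example can be reproduced with synthetic data. In the sketch below the population is simulated, so its mean is known and the errors can be computed; in a real application only the residuals would be available. The population size, mean height and spread are assumed values.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical population of heights in cm (simulated, so its mean is known here).
population = [rng.gauss(178.0, 7.0) for _ in range(100_000)]
population_mean = statistics.fmean(population)

# Random sample of n = 25 men and its observable average.
sample = rng.sample(population, 25)
sample_mean = statistics.fmean(sample)

# Errors use the population average; residuals use the sample average.
errors = [height - population_mean for height in sample]
residuals = [height - sample_mean for height in sample]

print(f"sum of errors:    {sum(errors):.3f}")
print(f"sum of residuals: {sum(residuals):.3f}")
```

The residuals sum to zero by construction, because they are measured from the sample average, whereas the errors generally do not.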
Simple structure
The model residuals can be explained by simple distributions, for example the normal distribution. Most residual structures are more complicated and cannot be easily approximated.
In general we would argue that most environmental models have a complicated residual structure which can rarely be approximated in advance or indeed after the analysis.