Uncertainty Characterization and Representation

Vicente J. Romero

2. Uncertainty Characterization and Representation

Here we discuss the sub-elements of the Uncertainty Characterization and Representation box in the middle row of Figure 1, and address the definition of uncertainties categorically, interpretationally, and quantitatively/mathematically.

Uncertainties are categorized into various types according to their nature in the context of a given uncertainty analysis. The different natures determine the different implications and treatments of the uncertainties in an analysis. The different natures or types of uncertainty discussed here are:

aleatory vs. epistemic; probabilistic vs. non-probabilistic; random vs. systematic; and traveling vs. non-traveling.

Aleatory uncertainty characterizes the inherent randomness or variability of a quantity in a set or population of units, events, etc. (paraphrased from Helton, et al. [9]). For example, random variability of material properties in a set of units being tested or modeled is an aleatory uncertainty. Alternative terminologies include: random variability, stochastic variability, and irreducible uncertainty.

Aleatory uncertainties are usually characterized by random variables that are assigned probability density functions (PDFs), or their transform-equivalent cumulative distribution functions (CDFs).

Probabilistic modeling is most firmly established when suitable experimental data exist as a basis for a frequency distribution of values. Even then, when only very limited (sparse) data samples are available, significant uncertainty exists regarding the true PDF from which the samples come.

Among five methods studied by Romero et al. [10], a simple Tolerance Interval approach for dealing with this PDF-form uncertainty was found to be most practical. The method has been found to be very robust in avoiding underestimation of true PDF variance from sparse samples of the PDF (for many different PDF shapes) [10, 18], but it can potentially be very conservative and exaggerate variance considerably. Despite this, the method is very simple and inexpensive, making it a pragmatic choice.
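As a rough illustration of the tolerance-interval idea, the following Python sketch computes an approximate two-sided normal tolerance interval from a sparse sample. It is a minimal sketch, not the procedure of Refs. 10 and 18: the normality assumption, the Howe-style factor formula, the Wilson-Hilferty chi-square approximation, the default coverage and confidence levels, and the sample values are all assumptions made here for illustration.

```python
import math
from statistics import NormalDist, mean, stdev


def chi2_quantile_wh(q, dof):
    """Approximate chi-square quantile (Wilson-Hilferty formula, an assumption here)."""
    z = NormalDist().inv_cdf(q)
    a = 2.0 / (9.0 * dof)
    return dof * (1.0 - a + z * math.sqrt(a)) ** 3


def tolerance_factor(n, coverage=0.95, confidence=0.90):
    """Approximate two-sided normal tolerance factor k (Howe-style formula)."""
    if n < 3:
        raise ValueError("this approximation is only sketched for n >= 3")
    z = NormalDist().inv_cdf((1.0 + coverage) / 2.0)
    chi2 = chi2_quantile_wh(1.0 - confidence, n - 1)
    if chi2 <= 0.0:
        raise ValueError("chi-square approximation broke down for these inputs")
    return z * math.sqrt((n - 1) * (1.0 + 1.0 / n) / chi2)


def tolerance_interval(samples, coverage=0.95, confidence=0.90):
    """Return (lower, upper) = sample mean -/+ k * sample standard deviation."""
    k = tolerance_factor(len(samples), coverage, confidence)
    m, s = mean(samples), stdev(samples)
    return m - k * s, m + k * s


if __name__ == "__main__":
    data = [9.8, 10.4, 10.1, 9.5, 10.2]  # hypothetical sparse sample
    print(tolerance_interval(data))
```

The factor k grows quickly as the sample size shrinks, which is the source of both the robustness against underestimating variance and the potential conservatism noted above.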

Epistemic uncertainty characterizes the lack of knowledge about the appropriate value to use for a quantity that is assumed to have a fixed value in the context of a specific application (paraphrased from Ref. 9). Alternative terminologies for epistemic uncertainty include: lack-of-knowledge uncertainty, subjective uncertainty, or reducible uncertainty.

Epistemic uncertainty regarding a quantity’s value can be probabilistic or non-probabilistic.

Uncertainty intervals are simple non-probabilistic representations of uncertainty, often used when the uncertainty is obtained largely from expert opinion and/or rough quantitative estimates.

Epistemic uncertainty can be modeled with various other types of uncertainty representations between the extremes of interval and probabilistic PDF representations, such as Possibility Theory and Dempster-Shafer belief structures from Evidence Theory; see e.g. Helton et al. [9].

Such intermediate representations are relatively involved and expensive and are beyond the scope of this chapter. A simplified probability box representation approach is discussed later in this section for handling uncertainties involving both interval and PDF sources, where the PDF can represent either aleatory or epistemic uncertainty.

Aleatory and epistemic uncertainties signify different types of uncertainty, so it is often important to keep them separate in model calibration, validation, and margin analysis. This helps in interpreting the different implications of these uncertainties in the analysis, e.g., Refs. 9, 19, 25, 26, 31, and 32.

Another common distinction made between types of uncertainties comes from the experimental uncertainty literature: systematic vs. random.² Systematic uncertainty exists when measurement errors and/or properties, parameters, and inputs (such as boundary conditions) of a system or systems under consideration do not vary over a population of nominally similar units or events, but the quantities are uncertain in the epistemic sense. Random uncertainty or random variability exists when measurement errors or system conditions vary randomly over a set or population of nominally similar units or events. This is the equivalent of aleatory uncertainty.

² The terms systematic uncertainty and random uncertainty are not strictly proper. Nevertheless, the terms systematic uncertainty, random uncertainty, and random variability are commonplace in engineering literature and practice, and are used here. Paraphrasing from Ref. 11: the concept of uncertainty varying randomly from test to test is perplexing, perhaps even nonsensical. It is measurement errors themselves, or physical quantities themselves, that are conceived as randomly varying over a set of units or events, or systematically having the same value in the multiple units or events.
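The systematic vs. random distinction can be illustrated with a small simulation. In the Python sketch below, every reading shares one fixed (systematic) error while a second error component varies randomly from reading to reading. All numerical values and names are hypothetical. The bias is known here only because the data are simulated; in practice it would be epistemically uncertain.

```python
import random
from statistics import mean, stdev


def simulate_readings(true_value, systematic_error, random_sd, n, rng):
    """Readings = truth + one shared systematic error + per-reading random error."""
    return [true_value + systematic_error + rng.gauss(0.0, random_sd)
            for _ in range(n)]


rng = random.Random(0)          # fixed seed so the sketch is repeatable
TRUE_VALUE = 100.0              # hypothetical true quantity
readings = simulate_readings(TRUE_VALUE, systematic_error=2.0,
                             random_sd=1.0, n=2000, rng=rng)

# Averaging many readings suppresses the random component but not the
# systematic one: the mean error stays close to the shared bias.
mean_error = mean(readings) - TRUE_VALUE
scatter = stdev(readings)

if __name__ == "__main__":
    print(f"mean error = {mean_error:.3f}, scatter = {scatter:.3f}")
```

Additional replicate tests would narrow the estimate of the mean but would leave the shared error untouched, which is why the two kinds of uncertainty call for different treatments.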

A distinction is also made between traveling and non-traveling uncertainties [12-15] in the context of model validation and calibration when using the Real Space framework for comparing experimental and simulation results and their uncertainties [14-16, 18, 31, 32]. Non-traveling uncertainties in the experiments and/or simulations are those that do not “travel” consistently to model use beyond the calibration or validation setting. Conversely, traveling uncertainties do extrapolate consistently to downstream applications of the model. For example, innate random variability of a material in a system being tested and modeled is a traveling uncertainty, whereas random measurement variability in the tests is a non-traveling uncertainty; it will not be innately present in model prediction scenarios where the model is being used subsequent to the calibration or validation activity. Both examples are sources of random variability in the calibration or validation activity, but one is a traveling uncertainty and one is not. Thus, these two sources of random variability (even if both are modeled probabilistically) are treated differently in the Real Space uncertainty accounting and comparison system. This reflects their different significance and consequences to prediction error and uncertainty; see References 13 and 15.

Experimental or simulation uncertainty often comes in a discrete form as a set of experimental or simulated values of an uncertain quantity. Very often the discrete values are proposed to come from a continuum of values or possible values. Thus, they can be modeled with continuum representations of uncertainty, such as PDFs and intervals. But sometimes uncertainty cannot be treated with a continuum approach and must be treated with less familiar discrete uncertainty approaches. Such cases include discrete model forms, examples of which are given in Table 2.

Reference 16 demonstrates a method for representing, propagating, and aggregating discrete model-form uncertainty (epistemic).

A situation not involving discrete model forms, but instead model inputs that come in discrete (not parametrically continuous) form, is exemplified in the following. In electronics modeling applications, the Gummel-Poon (GP) model parameters (often numbering 10 or more) determined through calibration are unique to each individual transistor tested [17]. When tests on nominally identical devices are performed, the resulting sets of GP parameters define different points in the calibration parameter space. However, the parameter space is generally not treated as a continuous parameter space that can be interpolated or extrapolated. The concern is that running the transistor model with parameter sets corresponding to other points in the space may not represent physically realizable devices, or may not reflect devices representative of the population from which the calibration devices came. These concerns appear applicable in many other calibration situations involving aleatory variations of the physical specimens or systems to which the model is calibrated. However, many popular calibration approaches do not address this concern.

Discrete random function inputs to models are also sometimes encountered. For example, consider multiple stress-strain curves from repeated (replicate) tests of a material. A continuum parametric or spectral representation is theoretically possible to infer from the multiple stress-strain curves from replicate tests of the material, but in practice this is very difficult to accomplish [18]. Instead, the authors of Refs. 18, 19, and 31-33 employ a very simple approach for approximately representing, propagating, and aggregating random variability information that comes in discrete form, whether discrete parameter sets as in the electronics case, or random functions like stress-strain curves.
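One plausible way to work with variability information in discrete form is to run the model once per tested realization, without interpolating between realizations, and then summarize the resulting outputs. The Python sketch below illustrates that idea only; it is an assumption made here, not necessarily the approach of Refs. 18, 19, and 31-33. The toy model, the parameter names, and the parameter values are hypothetical.

```python
from statistics import mean, stdev


def propagate_discrete(model, realizations):
    """Evaluate the model once for each discrete realization (no interpolation)."""
    return [model(r) for r in realizations]


def toy_device_model(params, v_in=1.0):
    """Hypothetical stand-in for a device model driven by a calibrated parameter set."""
    return params["gain"] * v_in + params["offset"]


# Each dict is one calibrated parameter set from one tested unit (made-up values).
parameter_sets = [
    {"gain": 2.0, "offset": 0.1},
    {"gain": 2.2, "offset": -0.1},
    {"gain": 1.9, "offset": 0.0},
]

outputs = propagate_discrete(toy_device_model, parameter_sets)
summary = {
    "min": min(outputs),
    "max": max(outputs),
    "mean": mean(outputs),
    "stdev": stdev(outputs),
}

if __name__ == "__main__":
    print(outputs, summary)
```

The same pattern applies when each realization is a whole curve (for example, a measured stress-strain curve passed to the model as a lookup table) rather than a parameter set.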

When frequency-based probabilistic (PDF) sources of uncertainty and epistemic sources contribute to the uncertainty of a given quantity, then the combined uncertainty involves many potential candidate PDFs, as shown figuratively in Figures 2–4. It is usually prohibitively costly to work with populations of PDFs, so simplifying approximate treatments are necessary. In the special case of mixed aleatory and epistemic uncertainty arising from data sample sparseness, a tolerance interval approach is often effective as explained previously. Otherwise, when a mix of epistemic interval or effective-interval [34] sources and frequency-based probabilistic sources are present, a simplified approach to handling these disparate types of uncertainty using probability boxes is explained next.

Probability Boxes (Pboxes) [20, 21] are simplified representations of families or populations of PDFs/CDFs, in which the CDFs in the family do not cross each other. Thus, the extreme upper and lower CDFs of the family form a “probability box” which bounds all the CDFs in the family.

This restriction is often met in real application situations, especially in model validation and calibration where relatively controlled (thus small) uncertainty magnitudes exist in the experiments and simulations and this limits significant interaction effects between the probabilistic and interval uncertainties. Because of the accommodating nature of this type of CDF family, uncertainty analysis often needs to be conducted with only the two bounding CDFs that comprise the Pbox in order to bound uncertainty results for the family of PDFs/CDFs. The concept is demonstrated in Refs. 19, 31, and 32 where even simpler and less expensive “Level 1” approximate Pboxes [34] are used to make the model validation assessments feasible by completely decoupling the probabilistic and the interval or effective-interval uncertainties.
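To make the bounding-CDF idea concrete, the Python sketch below builds a simple Pbox for a quantity modeled as a random (probabilistic) part plus an additive offset known only to lie within an interval. Shifting the empirical CDF by the two interval endpoints gives a family of non-crossing CDFs whose extremes form the box. This is an illustrative assumption in the spirit of decoupling the two kinds of uncertainty, not the "Level 1" construction of Ref. 34. The sample distribution and the interval are hypothetical.

```python
import random
from bisect import bisect_right


def empirical_cdf(sorted_values, x):
    """Fraction of the sorted sample values that are <= x."""
    return bisect_right(sorted_values, x) / len(sorted_values)


def pbox_at(samples, offset_interval, x):
    """Lower and upper bounding CDF values at x for (sample + offset)."""
    lo, hi = offset_interval
    s = sorted(samples)
    # CDF of (S + d) at x equals F_S(x - d); the smallest offset gives the
    # highest (left-most) CDF and the largest offset gives the lowest one.
    upper = empirical_cdf(s, x - lo)
    lower = empirical_cdf(s, x - hi)
    return lower, upper


rng = random.Random(1)                                  # repeatable sketch
aleatory = [rng.gauss(50.0, 2.0) for _ in range(500)]   # hypothetical variability
offset_interval = (-1.0, 1.5)                           # hypothetical epistemic interval

lower, upper = pbox_at(aleatory, offset_interval, 52.0)

if __name__ == "__main__":
    print(f"P(X <= 52) lies in [{lower:.3f}, {upper:.3f}]")
```

Only the two bounding CDFs are evaluated; every CDF obtained from an intermediate offset falls between them, so the family itself never needs to be sampled.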

Numerical Solution-Bias Uncertainty Estimation - Solution bias error typically comes from spatial and/or temporal discretization of the governing continuum physics equations and geometry, and from incomplete convergence in iterative solutions of the discrete equations (due to non-zero error tolerances needed for computational affordability). “Solution” or “calculation” verification attempts to quantify solution bias error.
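One widely used technique for estimating discretization error, not named in the text above and offered here only as an example, is Richardson extrapolation from solutions on three systematically refined grids. The Python sketch below assumes a constant refinement ratio and monotone convergence. The function name and the manufactured test data are hypothetical. It addresses discretization error only, not incomplete iterative convergence.

```python
import math


def richardson_estimate(f_fine, f_medium, f_coarse, refinement_ratio):
    """Return (observed order, extrapolated value, estimated error in f_fine)."""
    ratio = (f_coarse - f_medium) / (f_medium - f_fine)
    if ratio <= 0.0:
        raise ValueError("solutions are not converging monotonically")
    order = math.log(ratio) / math.log(refinement_ratio)
    f_extrap = f_fine + (f_fine - f_medium) / (refinement_ratio ** order - 1.0)
    return order, f_extrap, f_fine - f_extrap


def manufactured_solution(h):
    """Hypothetical result with exact value 1.0 and second-order error."""
    return 1.0 + 0.5 * h ** 2


order, f_extrap, fine_error = richardson_estimate(
    manufactured_solution(0.1),   # fine grid
    manufactured_solution(0.2),   # medium grid
    manufactured_solution(0.4),   # coarse grid
    refinement_ratio=2.0,
)

if __name__ == "__main__":
    print(order, f_extrap, fine_error)
```

For this manufactured case the observed order recovers the second-order behavior built into the test function, and the estimated fine-grid error matches the known error term.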

Model Prediction-Bias Uncertainty – As established in Section 2, model validation attempts to quantify model prediction-bias uncertainty, and model conditioning (which may include model calibration) attempts to reduce prediction bias and uncertainty. Model validation and model conditioning involve experimental and simulation sources of uncertainty, which potentially involve everything in Figure 1 except for Margin Assessment, which is discussed in Section 4.

In model validation or calibration, model results may not adequately match experimental results. But because validation and calibration are fairly complex and expensive endeavors, generally the model being worked with is the best that can be afforded or obtained, so the model will not typically be rejected and abandoned if it does not meet accuracy goals. Rather, efforts will usually first be made to better reconcile the model with reality. Usually the model or some modified version will be leveraged for prediction needs, even if that means lowered expectations and reduced prediction duties and domain of applicability.

Reconciliation can come from investigating experimental and modeling factors that are perceived to contribute most to the failure to meet the accuracy goals, with remediation of the factors as practical. This can take the form of: more accurate and precise measurements and control of experimental inputs and conditions; more experiments if experimental uncertainty is being significantly driven by too few experimental samples of stochastically varying phenomena; improved data processing procedures; reducing the discretization-related uncertainties in the model solutions; and improving the model form through modified and/or added behavioral mechanisms and parameters.

If these types of actions do not fully reconcile the agreement between the model and experiments, or cannot be afforded or conducted, then the best chance for success in upcoming uses of the model may be to condition it to match the experimental data as well as possible. An argument is constructed in Ref. 12 that model conditioning is a good strategy for likely reduction of risk in extrapolative prediction; the conditioned model will certainly be more accurate in at least a local neighborhood of extrapolative prediction, and the advantage may extend to larger extrapolations as well.

Thus, if the model is found inadequate initially, model conditioning can add value to it before going forward to other predictions. There is no single model-conditioning approach that works best in all circumstances. Approaches in at least the following two categories exist. A combination of these approaches can also be used.

• Approach 1 – Model Calibration

- Manipulate suitable model parameters to correct the model output results to match the experimental data. Here, this is generically called model calibration. Other terms are also used in engineering literature and practice.

• Approach 2 – Output Adjustment Function

- A corrective value or function (with or without uncertainty) can be applied to the prediction results from the unaltered model. (This is called “add factoring” in the work by Sterman [22].) For example, a prescribed amount may be added to or subtracted from the predicted result; or the added or subtracted amount might be a function of the response magnitude and/or boundary condition inputs of the problem [23]; or the correction might be a multiplicative scaling of the predicted result by a constant percentage or by a scaling function that varies according to the response magnitude and/or boundary conditions. An example of the latter in Ref. 12 gives a self-adjusting correction that is reasoned to be somewhat robust in extrapolation over the range of circumstances for the particular applications the model was to be used for. A calibration (Approach 1) was not feasible, given resource constraints. It would have been ineffective anyway because the calibratable traveling parameters of the model did not have the sensitivity needed to adequately address the large experimental uncertainty in the boundary condition—which drove the validation uncertainty.
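The two simplest forms of output adjustment described under Approach 2, a constant additive correction and a constant multiplicative scaling, can be sketched in Python as follows. The fitting rules used here (mean residual for the offset, least squares through the origin for the scale factor) and the data values are assumptions for illustration. They are not the self-adjusting correction of Ref. 12, and no correction uncertainty is carried.

```python
from statistics import mean


def fit_additive_offset(predicted, measured):
    """Constant to add to predictions: mean of (measured - predicted)."""
    return mean(m - p for m, p in zip(measured, predicted))


def fit_scale_factor(predicted, measured):
    """Constant multiplier for predictions: least squares through the origin."""
    numerator = sum(m * p for m, p in zip(measured, predicted))
    denominator = sum(p * p for p in predicted)
    return numerator / denominator


def adjust(prediction, offset=0.0, scale=1.0):
    """Apply the correction to a result from the unaltered model."""
    return scale * prediction + offset


# Hypothetical validation data: model predictions vs. experimental results.
predicted = [10.0, 20.0, 30.0]
measured = [11.0, 22.0, 33.0]

offset = fit_additive_offset(predicted, measured)
scale = fit_scale_factor(predicted, measured)

if __name__ == "__main__":
    new_prediction = 40.0  # hypothetical prediction beyond the validation data
    print(adjust(new_prediction, offset=offset))
    print(adjust(new_prediction, scale=scale))
```

With these made-up data the discrepancy grows in proportion to the response, so the two corrections diverge under extrapolation. This illustrates why the form of the adjustment function matters for predictions beyond the validation conditions.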

Model conditioning subsequent to a model validation finding of inadequate agreement between model predictions and experimental results may involve just Approach 1 (an initial calibration or a recalibration of the model based on the experiments/results from the validation activity), or just Approach 2, or Approach 1 followed by Approach 2.

The result of model conditioning is an adjusted prediction model and its associated uncertainty.

How best to use a model’s validation- or calibration-characterized prediction bias and uncertainty to potentially adjust or bias-correct the model to mitigate prediction risk beyond the validation or calibration conditions is a very difficult question and an active area of research. See Refs. 12-15, 19, and 23-30 for extended discussions and methodology proposals and demonstrations.

In document Simulation Credibility: Advances in Verification, Validation, and Uncertainty Quantification (Page 175-180)