5 UNCERTAINTY ANALYSIS

5.2 Methods to address uncertainty in energy modelling

5.2.3 Probability-based methods

Probability-based methods are an extension of the sensitivity analysis approach in the sense that a predefined number of realities of how an input variable will evolve over time is

evolution. Based on the probability of an input factor, uncertainty propagation creates a distribution function of the output parameter. Thus, probability-based methods give an indication of the likelihood of outputs dependent on the likelihood attached to uncertain model inputs (Rotmans and van Asselt 2001, p. 117).

The simplest form of implementation, called the Monte Carlo method, involves specifying a distribution (discrete or continuous) and a range for an input variable, e.g. the development of the demand for residential heating, and then propagating this uncertainty through to the model output. For this purpose the model is run many times, each time with input values drawn at random from the specified distribution (sampling). The last step is the evaluation of the resulting output distribution, which is, however, only an approximation of the exact distribution. An extension of this approach is the use of joint distributions for more than one input variable.
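The procedure can be illustrated with a minimal sketch. The model function, the base demand, the 30-year horizon and the triangular input distribution are all assumptions made for illustration only; they do not come from any of the models cited in this section.

```python
import random
import statistics


def toy_model(heating_demand_growth):
    """Hypothetical stand-in for an energy model: demand after 30 years."""
    base_demand = 100.0  # arbitrary units (assumed)
    return base_demand * (1.0 + heating_demand_growth) ** 30


def growth_sampler(rng):
    # Assumed input distribution: triangular on [0 %, 3 %] per year, mode 1 %.
    return rng.triangular(0.0, 0.03, 0.01)


def monte_carlo(model, sampler, n_runs, seed=0):
    """Run the model n_runs times, each time with a freshly sampled input."""
    rng = random.Random(seed)
    return [model(sampler(rng)) for _ in range(n_runs)]


outputs = sorted(monte_carlo(toy_model, growth_sampler, n_runs=10_000))

# Evaluate the (approximate) output distribution.
summary = {
    "mean": statistics.fmean(outputs),
    "p05": outputs[int(0.05 * len(outputs))],
    "p95": outputs[int(0.95 * len(outputs))],
}
print(summary)
```

The percentiles are read directly from the sorted sample, which is why they only approximate the exact output distribution; a larger `n_runs` narrows that approximation error.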

The appeal of Monte Carlo sampling is that its computational complexity is linear in the number of uncertain input variables, in contrast to discrete probability methods (Morgan et al. 1990, p. 199). Moreover, there is no need to discretise continuous distributions, since the values can be drawn directly from a continuous distribution.

Concerning the sampling process, i.e. how random numbers are drawn from a given distribution, two main methods can be compared: random sampling and stratified sampling. Random sampling is more precisely called pseudo-random sampling, because the numbers are machine-generated by a deterministic process and are therefore not random in a strict sense. The advantage of this method is that it produces unbiased estimates of the mean and the variance.

What matters most during the sampling process is not the randomness of the sample as such but the equidistribution of the data points over the distribution. This expresses the need for a better and more complete coverage of the sample space of the input factors than is possible with random sampling. Stratified sampling can improve the coverage by dividing the input space into strata. Input values are then obtained by sampling separately from within each stratum instead of from the whole distribution (Morgan et al. 1990, p. 204). A widely used method for stratified sampling is Latin Hypercube sampling (LHS).

For LHS, the range of each uncertain input variable is divided into equiprobable intervals or strata, and a single value is sampled at random from within each of these intervals according to the distribution function. This step is repeated as often as required. The division of the input space ensures that the sampled data points are more evenly spread out, so that the sample from each input represents the mean and variance of the distribution more accurately. This is especially the case if the model is roughly linear and if output uncertainty is dominated by only a few input variables. Problems can occur for models that exhibit periodicity with respect to an input (Morgan et al. 1990, p. 205).
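A minimal sketch of the stratification step is given below. The function name, the sample size and the standard normal input distribution are assumptions chosen for illustration; the sketch draws one point per equiprobable stratum for each input and pairs the strata of different inputs at random.

```python
import random
from statistics import NormalDist


def latin_hypercube(n_samples, n_inputs, rng):
    """Return n_samples points in [0, 1)^n_inputs with exactly one point
    in each of the n_samples equiprobable strata of every input."""
    columns = []
    for _ in range(n_inputs):
        # One uniform draw inside each stratum [k/n, (k+1)/n).
        column = [(k + rng.random()) / n_samples for k in range(n_samples)]
        rng.shuffle(column)  # random pairing of strata across inputs
        columns.append(column)
    return [tuple(col[i] for col in columns) for i in range(n_samples)]


rng = random.Random(1)
unit_points = latin_hypercube(n_samples=20, n_inputs=2, rng=rng)

# Map the stratified probabilities to input values through the inverse CDF
# of the assumed input distribution (here: standard normal).
assumed_input = NormalDist(mu=0.0, sigma=1.0)
lhs_values = [assumed_input.inv_cdf(u) for u, _ in unit_points]
```

Because every stratum contributes exactly one point, both tails of the distribution are represented even in a sample of only 20 points, which plain random sampling does not guarantee.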

In addition to the two categories discussed above, there are also quasi-random sampling, which is characterised by an enhanced convergence rate, and importance sampling. The latter technique generates more sample points in regions of special interest and fewer elsewhere, for cases in which the analyst is more interested in some parts of the output distribution than in others.

Although Monte Carlo analysis gives a distribution of an output variable and insights into the relative importance of different input variables, it has several drawbacks. Ultimately the accuracy of the outcome distribution depends on the accuracy of the probability density functions of the uncertain input variables. In most cases neither the mean, the range nor the probability distribution is known, which makes it very difficult to choose a meaningful distribution. Nordhaus (1994, p. 144) states that the definition of a distribution function of uncertain variables in this context sometimes resembles “fine arts more than high science”. In general, the selected range has a bigger influence than the assigned distribution (Saltelli et al. 2000, p. 21). This is because high-impact, low-probability events can be important to consider.

Another problem is the accuracy of the method. It can be improved by increasing the sample size, which in turn creates a further problem: the number and dimensionality of uncertain variables can render Monte Carlo analysis impractical. Today’s energy models rely on many uncertain input variables, which have a large dimensionality and show mutual interactions. The PAGE2002 model, for example, has 19 unrelated variables with independent distributions. To have, on average, at least one iteration from the most unlikely quintile (20%) for all 19 variables, it would be necessary to run the model 20 trillion times (Stanton et al. 2008, p. 7), which corresponds to 5^19 ≈ 1.9 × 10^13 runs. This makes it practically impossible to illuminate worst-case situations in most variables at the same time.
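The order of magnitude of this figure can be checked with a short calculation. The sketch assumes that "quintile" denotes the most unlikely fifth of each distribution, so that each variable independently falls into it with probability 1/5; the function name is hypothetical.

```python
from fractions import Fraction


def runs_per_joint_tail_hit(tail_probability, n_variables):
    """Average number of independent runs needed per run in which all
    variables fall into their tail simultaneously."""
    joint_probability = Fraction(tail_probability) ** n_variables
    return 1 / joint_probability


runs = runs_per_joint_tail_hit(Fraction(1, 5), 19)
print(f"{float(runs):.2e}")  # 1.91e+13, i.e. roughly 20 trillion
```

Under the independence assumption the required number of runs grows exponentially with the number of variables, which is why joint worst cases are effectively out of reach of plain random sampling.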

In addition, it is not always simple to identify policy-relevant variables via uncertainty propagation. An outcome variable, such as CO2 emissions, can vary greatly with changes in an input variable. But this pattern can be exactly the same across policy alternatives, so that the method will not necessarily identify the policy-relevant variables and parameters. An alternative is to vary certain policy-relevant parameters, such as a CO2 tax or a renewable share imposed as an additional constraint in the model.

A last problem is how to assess the correlation between different uncertain input variables and how to represent it appropriately in the probability function. It is very difficult to specify joint distributions because the extent of the correlations between variables is unknown. In the presence of significant interdependencies among variables, uncertainties can be grossly misrepresented if an independent distribution is specified for each variable.
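The size of this effect can be illustrated with a deliberately simple sketch. It assumes two standard normal inputs and a hypothetical model whose output is just their sum; the correlation of 0.8 is an arbitrary illustrative value.

```python
import math
import random
from statistics import pvariance


def sample_pair(rng, rho):
    """Draw two standard normal inputs with correlation rho."""
    z1 = rng.gauss(0.0, 1.0)
    z2 = rng.gauss(0.0, 1.0)
    return z1, rho * z1 + math.sqrt(1.0 - rho ** 2) * z2


def output_variance(rho, n_runs=20_000, seed=42):
    """Variance of the toy output x1 + x2 under a given input correlation."""
    rng = random.Random(seed)
    outputs = [sum(sample_pair(rng, rho)) for _ in range(n_runs)]
    return pvariance(outputs)


var_independent = output_variance(rho=0.0)  # theory: 2.0
var_correlated = output_variance(rho=0.8)   # theory: 2 + 2 * 0.8 = 3.6
print(var_independent, var_correlated)
```

In this toy case, treating the inputs as independent would understate the output variance by almost half, even though each marginal distribution is specified correctly.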

Examples of the application of Monte Carlo analysis in energy modelling include the ICAM, EPPA, MERGE and PAGE models (Dowlatabadi 1998; Webster et al. 2002; Kypreos 2008; Hope 2009). In addition, the Stern Review (Stern 2007, p. 229) was underpinned by a probabilistic model developed by Dennis Anderson. Further studies that have employed uncertainty propagation as a tool of uncertainty analysis can be found in an overview compiled by Peterson (2006, p. 14).

Another concept used in this context is rank transformation. This is a procedure in which the data points for all input factors are replaced with their corresponding ranks, from 1 (highest value) to N (lowest value). Once the model outcomes have been generated, they are also replaced by their corresponding ranks. In a next step, a regression analysis can be performed in which the outcome variable is the dependent variable and the input variables are the independent variables. Based on this regression, a partial rank correlation coefficient can be calculated that measures the specific contribution of each uncertain input to the output uncertainty. The difference in the coefficient of determination (R²) between the rank-transformed model and the one based on raw data indicates the nonlinearity of the model. Rank transformation can be particularly useful for regression analysis in a highly nonlinear model. One example in which rank transformation has been employed is the PAGE model (Hope et al. 1993, p. 336). This method can also serve to identify conceptual errors if the estimated sensitivities have the wrong sign (see e.g. Kleijnen 1994, p. 327). In principle, a ranking of uncertain inputs is also possible based on sensitivity analysis, but this does not enable the analyst to perform a meaningful regression analysis due to the lack of sufficient data.
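The effect of the transformation on R² can be sketched for a single input. The exponential model, the noise level and the sample size below are assumptions chosen for illustration, and with only one regressor the sketch reports a plain rank correlation rather than a partial one.

```python
import math
import random


def ranks(values):
    """Rank 1 for the highest value, N for the lowest (no ties assumed)."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def r_squared(x, y):
    """R^2 of a one-regressor least-squares fit (squared Pearson correlation)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


rng = random.Random(7)
inputs = [rng.uniform(0.0, 1.0) for _ in range(200)]
# Hypothetical monotone but strongly nonlinear model with a little noise.
outputs = [math.exp(6.0 * x) + rng.gauss(0.0, 0.5) for x in inputs]

r2_raw = r_squared(inputs, outputs)
r2_rank = r_squared(ranks(inputs), ranks(outputs))
print(r2_raw, r2_rank)
```

For a monotone nonlinear relationship of this kind the rank-based R² lies well above the raw one, and the gap between the two serves as the nonlinearity indicator described above.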

A limitation of this approach is that an altered model is being studied, so that the resulting sensitivity measures give information about a different model. Through the rank transformation, the importance of higher-order interactions is decreased to the benefit of first-order terms (Saltelli et al. 2000, p. 26). An analysis based on ranks may therefore overlook the influence of interactions.

In document Decomposing long-run carbon abatement cost curves - robustness and uncertainty (Page 169-173)