Dam Risk and Uncertainty
9.3 Sampling Methods
In water resource engineering, decisions about system operation and capacity depend strongly on how the system responds under foreseeable conditions. However, this response cannot be assessed with certainty, because the various system components are subject to different kinds of uncertainty. One problem with uncertain variables in complex, nonlinear models is deriving their probability density functions (PDFs) and determining the appropriate statistical moments or probability distribution of the model outputs. Furthermore, any real-world analysis is based on historical records, which are usually not long enough and contain all sorts of errors. Sampling is a practical way to combine several random input values and obtain results with appropriate accuracy. Sampling can be defined as the procedure of selecting individuals from a specific statistical population in order to evaluate characteristics of the entire population. In other words, it is an extrapolation from the sample to the population that helps engineers improve data quality and save time and cost. Hence, Monte Carlo simulation (MCS) and Latin hypercube sampling (LHS), two significant sampling techniques, are briefly explained in the following sections.
9.3.1 Monte Carlo Simulation
Simulation is the process of recreating a real situation, usually based on a set of hypotheses and mathematical formulas. It is a useful tool for evaluating system performance under different conditions and for testing new theories in the form of a computer program. The Monte Carlo (MC) process is a numerical simulation that replicates stochastic variables according to a given statistical distribution. In other words, MC uses random numbers to model a desired process. To generate continuous random numbers with MCS, consider X as a random variable and F_X(x) as its cumulative distribution function (CDF); for any value of u ∼ U(0,1), the inverse function can be written as
X = F_X^{-1}(u)    (9.1)
where
F_X^{-1}(·) is the inverse function of the CDF
u has a uniform distribution on (0,1)
It should be noted that the continuous CDFs used in hydrosystem engineering are strictly increasing for all random variables X; thus, there is a unique relationship between F_X(x) and u, namely u = F_X(x). To generate m random variates using the CDF-inverse method, the following steps should be repeated m times:
1. Draw a uniform random variate u ∼ U(0,1) (random number generator).
2. Find x such that x = F_X^{-1}(u).
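The two steps above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed implementation: the exponential distribution, the rate value of 0.5, and the function names are assumptions chosen only because the exponential CDF has a closed-form inverse, F^{-1}(u) = -ln(1 - u)/rate.

```python
import math
import random


def exponential_inverse_cdf(u, rate):
    """Inverse CDF of an exponential distribution (illustrative choice)."""
    return -math.log(1.0 - u) / rate


def sample_by_inverse_cdf(inverse_cdf, m, rng):
    """Repeat the two CDF-inverse steps m times and return the variates."""
    variates = []
    for _ in range(m):
        u = rng.random()                 # step 1: u ~ U(0,1), drawn from [0, 1)
        variates.append(inverse_cdf(u))  # step 2: x = F^{-1}(u)
    return variates


rng = random.Random(42)  # fixed seed so the run is repeatable
rate = 0.5               # hypothetical parameter; the exact mean is 1 / rate = 2
xs = sample_by_inverse_cdf(lambda u: exponential_inverse_cdf(u, rate), 10_000, rng)
sample_mean = sum(xs) / len(xs)
```

With 10,000 variates the sample mean should lie close to the theoretical mean of 2, which gives a quick sanity check on the generator.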
TABLE 9.1 Uncertainty Analysis Methods
Analytic Methods | Approximation Methods | Sampling Methods
Derived distribution | First-order variance estimation | Monte Carlo simulation
Fourier, Laplace, and exponential transforms | Rosenblueth’s probabilistic point estimation | Latin hypercube sampling
Mellin transforms | Harr’s probabilistic point estimation | Correlated sampling
Estimations of probability and quantile using moments | Li’s probabilistic point estimation | Antithetic variates
Source: Tung, Y.K. et al., Hydrosystems Engineering Reliability Assessment and Risk Analysis, McGraw-Hill Professional, New York, 2005.

There are two major concerns about MCS. First, generating the random values requires a large amount of computation; second, the accuracy of the results depends strongly on the number of iterations and simulations. In this method, a larger sample size is a prerequisite for higher precision, because the results carry a sampling error that is inversely related to the sample size. On the other hand, increasing the sample size increases the computer time needed for generating random variates and running the simulation.
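The trade-off between sample size and sampling error can be made concrete with a small experiment. The sketch below assumes a toy "model" that simply returns a standard normal variate, and it uses the standard error of the sample mean (sample standard deviation divided by the square root of n) as the error measure; both choices are illustrative assumptions, not part of the method described above.

```python
import math
import random
import statistics


def mc_mean_and_standard_error(n, rng):
    """Estimate the mean of a toy output from n MC draws, with its standard error."""
    draws = [rng.gauss(0.0, 1.0) for _ in range(n)]  # toy model output (assumed)
    mean = statistics.fmean(draws)
    standard_error = statistics.stdev(draws) / math.sqrt(n)
    return mean, standard_error


rng = random.Random(0)  # fixed seed so the run is repeatable
results = {n: mc_mean_and_standard_error(n, rng) for n in (100, 10_000)}
se_small = results[100][1]
se_large = results[10_000][1]
```

For this estimator, a hundredfold increase in sample size reduces the standard error by roughly a factor of ten, while the number of model evaluations (and hence computer time) grows a hundredfold.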
9.3.2 Latin Hypercube Sampling
As stated earlier, increasing the sample size in sampling-based methods reduces sampling errors, but it also increases the simulation effort and the computer time for generating random variates. For this reason, there are several variance reduction techniques that increase the precision of MCS outcomes without increasing the sample size [24]. Some of the most important are the antithetic-variate technique, control variates, importance sampling, LHS, correlated sampling, and stratified sampling. LHS is one of the main variance reduction techniques and can increase the efficiency of the estimated output statistics. It is frequently used to decrease the number of MCS runs needed to achieve a reasonably accurate random distribution. In this method, the range of each variable is divided into m nonoverlapping intervals of equal probability 1/m. Then, one random variate is selected from each interval according to the desired probability distribution [22]. A simple, basic algorithm for applying the LHS method is
1. Divide the range of each input variable into m nonoverlapping intervals of equal probability.
2. Generate m uniform random numbers from U(0,1/m).
3. Perform a random permutation of the integers 1, 2, ..., m.
4. Determine random variates (xi,j) by applying the following equation:
x_{i,j} = F_j^{-1}\left( \frac{1}{m} P_{i,j} - r_{i,j} \right)    (9.2)
where r_{i,j} and P_{i,j} are the random number and the random permutation value, respectively, for the ith sample of the jth variable [17].
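The four steps can be sketched as follows, reading Equation 9.2 as u = P/m - r with r drawn from U(0, 1/m), so that each u falls in the interval selected by the permutation. The function name `latin_hypercube`, the two example variables (a normally distributed "inflow" and a uniformly distributed "roughness"), and their parameters are hypothetical and serve only as an illustration.

```python
import random
from statistics import NormalDist


def latin_hypercube(inverse_cdfs, m, rng):
    """Return m samples (rows), one column per variable, following Eq. (9.2)."""
    n_vars = len(inverse_cdfs)
    sample = [[0.0] * n_vars for _ in range(m)]
    for j, inverse_cdf in enumerate(inverse_cdfs):
        # Step 3: random permutation of the interval indices 1..m for variable j
        permutation = list(range(1, m + 1))
        rng.shuffle(permutation)
        for i in range(m):
            # Step 2: uniform random number on (0, 1/m)
            r = rng.uniform(0.0, 1.0 / m)
            # Step 4: probability level inside interval P, then invert the CDF
            u = permutation[i] / m - r
            # Guard: some inverse CDFs are undefined at exactly 0 or 1
            u = min(max(u, 1e-12), 1.0 - 1e-12)
            sample[i][j] = inverse_cdf(u)
    return sample


rng = random.Random(7)  # fixed seed so the run is repeatable
inflow = NormalDist(mu=100.0, sigma=15.0).inv_cdf  # hypothetical variable
roughness = lambda u: 0.02 + 0.03 * u              # hypothetical, uniform on [0.02, 0.05]
design = latin_hypercube([inflow, roughness], m=5, rng=rng)
```

Each column of `design` contains exactly one value from each of the five equal-probability intervals of its variable, and the independent permutations pair the intervals of different variables at random.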
Based on the LHS method, each generated random variate falls in a separate interval of equal probability 1/m. For example, Figure 9.1 shows the range of a variable divided into m = 5 nonoverlapping intervals, each with probability 1/5 = 0.2.
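As a worked illustration of the m = 5 case, the interval boundaries can be computed from the inverse CDF at the cumulative probabilities 0.2, 0.4, 0.6, and 0.8. The sketch below assumes a standard normal variable, which is an illustrative choice only; any distribution with an invertible CDF could be used instead.

```python
from statistics import NormalDist

m = 5
dist = NormalDist(mu=0.0, sigma=1.0)  # assumed distribution, for illustration only

# Interior boundaries at cumulative probabilities 1/m, 2/m, ..., (m-1)/m
boundaries = [dist.inv_cdf(k / m) for k in range(1, m)]

# Probability content of each of the m intervals (the outer two are unbounded)
cumulative = [0.0] + [dist.cdf(b) for b in boundaries] + [1.0]
interval_probabilities = [hi - lo for lo, hi in zip(cumulative, cumulative[1:])]
```

For the standard normal case the boundaries are approximately -0.842, -0.253, 0.253, and 0.842, and each of the five intervals carries a probability of 0.2.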
On the other hand, the random variates generated by the MC technique are scattered randomly, so an equal-probability interval may contain more than one random variate or none at all. Figure 9.2 illustrates the main differences between the MCS and LHS techniques.
FIGURE 9.1 Nonoverlapping probability areas for m = 5 (intervals a to e, each with probability 0.2).
As can be seen from Figure 9.2, with the LHS strategy each row and each column contains exactly one black circle, which represents a generated random variate. With the MCS method, by contrast, some rows and columns contain no black circle, while others contain more than one random variate.
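The row and column occupancy described for the two techniques can be checked numerically. The sketch below works on the unit square in probability space with m = 5 and counts which of the m equal-probability rows and columns receive a point. The helper names are hypothetical, and the LHS points are generated in the equivalent form u = (p + U)/m, with p a permuted index from 0 to m - 1 and U drawn from U(0,1).

```python
import random


def occupied_strata(points, m, axis):
    """Set of equal-probability intervals (0..m-1) hit along one axis."""
    return {min(int(p[axis] * m), m - 1) for p in points}


def monte_carlo_points(m, rng):
    """m points with both coordinates drawn independently from U(0,1)."""
    return [(rng.random(), rng.random()) for _ in range(m)]


def latin_hypercube_points(m, rng):
    """m points with exactly one point per interval along each axis."""
    columns = []
    for _ in range(2):
        permutation = list(range(m))
        rng.shuffle(permutation)
        columns.append([(p + rng.random()) / m for p in permutation])
    return list(zip(columns[0], columns[1]))


m = 5
rng = random.Random(3)  # fixed seed so the run is repeatable
mc_points = monte_carlo_points(m, rng)
lhs_points = latin_hypercube_points(m, rng)

mc_columns_hit = len(occupied_strata(mc_points, m, axis=0))
lhs_columns_hit = len(occupied_strata(lhs_points, m, axis=0))
lhs_rows_hit = len(occupied_strata(lhs_points, m, axis=1))
```

The LHS points always occupy all m rows and all m columns, whereas the number of columns hit by the plain MC points varies from run to run and is frequently smaller than m.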