5 Describing random variables
5.3 The choice of distribution
One task is to determine the most appropriate type of distribution for each variable and the corresponding parameter values. The data forming the basis for the choice of a specific distribution are usually limited. This leads to the question, "How should the distribution be selected in order to represent the variable as accurately as possible?".
Firstly, as pointed out by Haimes et al. (1994), the distribution is not formally selected. The distribution is evidence of, and a result of, the underlying data. In many cases the distribution type is determined by what is previously known about the variable. For example, a strength variable cannot have negative values, which eliminates some distributions. Two cases can be distinguished, depending on the amount of data available:
• if the amount of data is large
• if the amount of data is small or irrelevant.
This implies that there are two methods available for the selection of a distribution and the corresponding parameters. The probability distribution of the event can be estimated either according to the classical approach, or according to the subjective approach, also known as the Bayesian approach, after the English mathematician Thomas Bayes (1702-1761).
5.3.1 The classical approach
If the data base is large, the distribution can be easily determined by fitting procedures. The parameters of the distribution can be derived by standard statistical methods. This is normally referred to as the classical approach.
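As a minimal sketch of such a fitting procedure, assume a strength-type variable that cannot be negative and is therefore modelled as lognormal. Both the lognormal choice and the synthetic sample are illustrative assumptions, not data from this report, and `fit_lognormal` is a hypothetical helper name. The parameters are estimated as the mean and standard deviation of the logarithms of the observations.

```python
import math
import random
import statistics


def fit_lognormal(sample):
    """Estimate lognormal parameters (mu, sigma) from positive observations.

    mu and sigma are the mean and standard deviation of ln(x).
    """
    logs = [math.log(x) for x in sample]
    mu = statistics.fmean(logs)
    sigma = statistics.stdev(logs)  # sample standard deviation (n - 1)
    return mu, sigma


# Hypothetical "large data base": 5000 synthetic observations drawn from a
# lognormal distribution with known parameters, so the fit can be checked.
rng = random.Random(1)
TRUE_MU, TRUE_SIGMA = 3.0, 0.25
sample = [rng.lognormvariate(TRUE_MU, TRUE_SIGMA) for _ in range(5000)]

mu_hat, sigma_hat = fit_lognormal(sample)
print(f"mu = {mu_hat:.3f}, sigma = {sigma_hat:.3f}")
```

With a sample of this size the estimates should lie close to the values used to generate the data; with only a handful of observations they would scatter widely, which is the situation the subjective approach addresses.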
The classical approach defines the probability on the basis of the frequency with which different outcome values occur in a long sequence of trials. This means that the parameters, describing the variable, are assigned based on past experiments. There is no judgement involved in this estimation. It is based purely on experimental data.
Additional trials will only enhance the credibility of the estimate by decreasing the variability. The errors of the estimate are usually expressed in terms of confidence limits.
An example of a probability defined according to the classical approach is the probability that throwing a dice will result in a ‘four’. The conditions of the experiment are well defined. Throwing the dice a thousand times will give a relative frequency for the result ‘four’ that is close to, but not exactly, 1/6. Increasing the number of trials will improve the estimate of the probability.
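The dice experiment can be simulated to show how the estimate and its confidence limits behave as the number of trials grows. The sketch below assumes a fair dice and uses the normal approximation to the binomial distribution for the 95 % limits; the function name `estimate_probability` and the seed are illustrative choices.

```python
import math
import random


def estimate_probability(throws, face=4, seed=0):
    """Relative frequency of `face` in `throws` simulated throws of a fair dice.

    Returns the estimate and the half-width of an approximate 95 %
    confidence interval (normal approximation to the binomial).
    """
    rng = random.Random(seed)
    hits = sum(1 for _ in range(throws) if rng.randint(1, 6) == face)
    p_hat = hits / throws
    half_width = 1.96 * math.sqrt(p_hat * (1.0 - p_hat) / throws)
    return p_hat, half_width


for n in (100, 1_000, 10_000, 100_000):
    p_hat, hw = estimate_probability(n)
    print(f"n = {n:>6}: p = {p_hat:.4f} +/- {hw:.4f}")
```

The half-width shrinks roughly in proportion to the inverse square root of the number of throws, so a hundredfold increase in trials narrows the limits by about a factor of ten.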
5.3.2 The Bayesian approach
If only a small amount of data is available, this data together with expert judgement can be used to form the basis for the choice of distribution, which has the highest degree of belief. The choice will thus be partially subjective. By applying the method of Bayes, the subjective distribution can be updated in a formal manner, as soon as new data become available.
Bayes’ method assumes that the parameters of the random
variables are also random variables and can therefore be combined with the variability of the basic random variable in a formal statistical way by using conditional probabilities. This assumption will reflect the probable uncertainty inherent in the variable. The estimate of a parameter which is based on subjective judgement is improved by including observation data in the estimate. The new estimate is a probability, on condition that experiments or other observations have been performed, and that these results are known. The method can be used for both discrete probability mass functions and continuous probability density functions.
Applying the dice example to this approach means that the person conducting the experiment does not have to throw the dice at all. He knows from past experience and assumptions that the probability will be 1/6 if the dice is correctly constructed. He makes this estimate by judgement. If the dice is biased and favours the outcome 'two', this will only be seen in an experiment conducted according to the classical approach. The subjective estimate will, therefore, be a false prediction of the true probability of the outcome 'four'. However, he can make a few throws to see whether the dice is correctly balanced. Based on the outcome of this new experience, he can update his earlier estimate of the true probability using Bayes' theorem. If further trials are performed and the probability is continuously updated, the subjective method will converge towards the classical estimate of the probability.
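The convergence described above can be sketched with a conjugate Beta-binomial update, which is one convenient way of applying Bayes' theorem to a single probability and is not a method prescribed in this text. The prior Beta(2, 10) has mean 1/6 and stands in for the judgement-based estimate; the loaded dice, its weights and the batch sizes are all hypothetical.

```python
import random


def update_beta(a, b, throws, face=4):
    """Update a Beta(a, b) belief about P(face) with observed throws."""
    hits = sum(1 for t in throws if t == face)
    return a + hits, b + (len(throws) - hits)


# Hypothetical loaded dice: 'two' is favoured, so P('four') = 0.14 < 1/6.
FACES = [1, 2, 3, 4, 5, 6]
WEIGHTS = [0.14, 0.30, 0.14, 0.14, 0.14, 0.14]

rng = random.Random(42)
a, b = 2.0, 10.0  # prior mean a / (a + b) = 1/6, from judgement alone
total, hits = 0, 0
for batch in (10, 100, 1_000, 10_000):
    throws = rng.choices(FACES, weights=WEIGHTS, k=batch)
    a, b = update_beta(a, b, throws)
    total += batch
    hits += throws.count(4)
    print(f"after {total:>5} throws: posterior mean = {a / (a + b):.4f}, "
          f"relative frequency = {hits / total:.4f}")
```

After a few throws the posterior mean is still dominated by the prior judgement; as the throws accumulate it approaches the relative frequency, which is the classical estimate.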
5.3.3 Bayes' theorem
In the following, a brief formal description of Bayes' theorem will be presented. A more detailed description can be found in, for example, Ang et al. (1975).
Each variable can be assigned a PDF which the engineer thinks represents the true distribution reasonably well. This first assumption is denoted the prior density function. The improved distribution, achieved by including new data, is denoted the posterior density function.
For a discrete variable, Bayes' theorem can be formulated as
$$
P(\Theta = \theta_i \mid \varepsilon) = \frac{P(\varepsilon \mid \Theta = \theta_i)\, P(\Theta = \theta_i)}{\sum_{j=1}^{n} P(\varepsilon \mid \Theta = \theta_j)\, P(\Theta = \theta_j)} \qquad [5.3]
$$

describing the posterior probability mass function for the random variable Θ, which can take n possible values θi, i = 1, ..., n. The posterior probability is the result of considering new experimental values, ε, in combination with the prior probability P(Θ = θi). The term P(ε|Θ=θi) is defined as the conditional probability that ε will occur, assuming that the value of the variable is θi. A short example illustrates the use of the theorem.
Assume that the probability of a fire, characterised by its fire growth rate, αf, can be expressed as the discrete function illustrated in Figure 5.2. The figure shows the probability (vertical axis) as a function of the fire growth rate, αf (horizontal axis). The mean value of αf can be calculated, giving 0.009 kW/s², as can be expected from the figure.
Figure 5.2. Prior probability mass function of αf.
After an extensive post-fire investigation of similar fire scenarios, the investigators' results indicate a slightly different probability function, as illustrated in Figure 5.3. This new information is used to update the prior probabilities. It is evident that the new data are more uniformly distributed.
Figure 5.3. New data on the variable αf after the post-fire investigation.
The posterior probabilities for αf = 0.005, 0.01 and 0.015 kW/s² can now be evaluated.

$$
P(\alpha_f = 0.005) = \frac{0.3 \cdot 0.4}{0.3 \cdot 0.4 + 0.6 \cdot 0.3 + 0.1 \cdot 0.3} = 0.36
$$

The other two probabilities can be derived in the same manner

$$
P(\alpha_f = 0.01) = 0.54 \qquad P(\alpha_f = 0.015) = 0.10
$$

and are illustrated in Figure 5.4. The new mean value of αf can be derived based on the posterior probability function

$$
\alpha_f = 0.36 \cdot 0.005 + 0.54 \cdot 0.01 + 0.10 \cdot 0.015 = 0.0087\ \text{kW/s}^2
$$
Figure 5.4. Posterior probability mass function of αf.
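The worked example can be reproduced with a few lines of code. The sketch below applies the discrete form of Bayes' theorem (Eq. 5.3) to the probabilities that appear in the calculation above, taking 0.3, 0.6 and 0.1 as the prior and 0.4, 0.3 and 0.3 as the new data; that reading of the two figures is inferred from the calculation and from the prior mean of 0.009 kW/s². The function names are illustrative.

```python
def posterior_pmf(prior, likelihood):
    """Discrete Bayes' theorem: P(theta_i | eps) for every theta_i.

    prior      -- dict mapping theta_i to P(Theta = theta_i)
    likelihood -- dict mapping theta_i to P(eps | Theta = theta_i)
    """
    joint = {theta: likelihood[theta] * p for theta, p in prior.items()}
    evidence = sum(joint.values())  # denominator of Eq. 5.3
    return {theta: value / evidence for theta, value in joint.items()}


def mean(pmf):
    """Expected value of a discrete variable given its PMF."""
    return sum(theta * p for theta, p in pmf.items())


# Values taken from the worked calculation (fire growth rate in kW/s^2).
prior = {0.005: 0.3, 0.010: 0.6, 0.015: 0.1}
new_data = {0.005: 0.4, 0.010: 0.3, 0.015: 0.3}

posterior = posterior_pmf(prior, new_data)
for alpha, p in posterior.items():
    print(f"P(alpha_f = {alpha:.3f}) = {p:.3f}")
print(f"prior mean     = {mean(prior):.4f} kW/s^2")
print(f"posterior mean = {mean(posterior):.4f} kW/s^2")
```

Carried through without intermediate rounding, the posterior probabilities are about 0.364, 0.545 and 0.091 and the posterior mean is about 0.0086 kW/s², which differs from the two-decimal figures quoted in the text only in the last digit.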
The theorem can also be used for continuous functions and the appearance is similar to that in the discrete situation. The solution usually requires numerical integration procedures.
$$
f''(\theta) = \frac{P(\varepsilon \mid \theta)\, f'(\theta)}{\int_{-\infty}^{\infty} P(\varepsilon \mid \theta)\, f'(\theta)\, d\theta}
$$
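A minimal sketch of the numerical integration is given below, reading f' as the prior density and f'' as the posterior density. The setup is hypothetical: θ is taken as the unknown mean of a normally distributed variable with known standard deviation, the prior for θ is normal, and the four observations are invented for illustration. The integral in the denominator is evaluated with the trapezoidal rule on a truncated grid.

```python
import math


def normal_pdf(x, mean, std):
    """Density of a normal distribution."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def trapezoid(ys, step):
    """Trapezoidal rule on an equally spaced grid."""
    return step * (sum(ys) - 0.5 * (ys[0] + ys[-1]))


def posterior_density(grid, prior, likelihood):
    """Posterior density f''(theta) evaluated on an equally spaced grid."""
    step = grid[1] - grid[0]
    unnormalised = [likelihood(t) * prior(t) for t in grid]
    evidence = trapezoid(unnormalised, step)  # integral in the denominator
    return [u / evidence for u in unnormalised]


# Hypothetical setup: theta is the unknown mean of a normal variable with
# known standard deviation SIGMA; the prior for theta is N(10, 2^2).
SIGMA = 1.5
observations = [11.2, 12.1, 10.7, 11.9]


def prior(theta):
    return normal_pdf(theta, 10.0, 2.0)


def likelihood(theta):
    return math.prod(normal_pdf(x, theta, SIGMA) for x in observations)


# The infinite integration limits are truncated to [0, 20]; outside this
# range the integrand is negligible for the values assumed here.
N = 2001
grid = [20.0 * i / (N - 1) for i in range(N)]
step = grid[1] - grid[0]
post = posterior_density(grid, prior, likelihood)
post_mean = trapezoid([t * f for t, f in zip(grid, post)], step)
print(f"posterior mean = {post_mean:.3f}")
```

This particular combination of normal prior and normal likelihood has a closed-form posterior with a mean of about 11.29, which makes it a convenient check on the numerical result; for most practical combinations no closed form exists and the numerical route is the only one available.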