Monte Carlo simulation for probabilistic analysis

4.1 Introduction

4.1.2 Monte Carlo simulation for probabilistic analysis

The most fundamental question to answer is: “Why use Monte Carlo (MC) simulation in this study?” Two reasons address this question: (1) The final outcome of the calculations depends on an unknown relationship between all the risks involved in the five construction phases of a typical building project. All the construction phases in the execution of a building are interconnected in terms of schedule and budget decisions, although without a clear mathematical connection. (2) All the risks involved in each cost element/construction phase are unknown variables. Consequently, a set of random distributions is needed to model them.

The MC method, as it is understood today, encompasses any technique of statistical sampling employed to approximate solutions to quantitative problems. A system or a real-life situation is developed and described by correlated or uncorrelated system-specific variables. These variables have different possible values, which are represented by a probability distribution function for each variable identified in the system. The MC method simulates the system many times (hundreds or thousands), each time randomly choosing a value for each variable from its probability distribution (Kwak & Ingall 2009, p. 45). The outcome is a probability distribution of the overall value of the system calculated through the iterations of the model (Kwak & Ingall 2009, p. 45).
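The simulation loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the model built in @RISK for this study: the function name `simulate_total`, the three (minimum, most likely, maximum) triples and the use of a triangular distribution for every variable are assumptions made for the example, and the system value is taken to be a plain sum of the variables.

```python
import random
import statistics

def simulate_total(n_iterations, variables, seed=0):
    """Return one simulated system total per iteration.

    `variables` is a list of (low, mode, high) tuples; each variable is
    sampled from a triangular distribution (an assumption of this sketch).
    """
    rng = random.Random(seed)
    totals = []
    for _ in range(n_iterations):
        # Draw one value per variable and combine them (here: a simple sum).
        total = sum(rng.triangular(low, high, mode)
                    for low, mode, high in variables)
        totals.append(total)
    return totals

# Hypothetical three-variable system (values are illustrative only).
variables = [(90, 100, 130), (45, 50, 70), (18, 20, 30)]
totals = simulate_total(5000, variables)

mean_total = statistics.fmean(totals)
# quantiles(n=100) returns 99 cut points; index 79 is the 80th percentile.
p80 = statistics.quantiles(totals, n=100)[79]
print(f"mean = {mean_total:.1f}, P80 = {p80:.1f}")
```

The list `totals` plays the role of the output distribution: any summary of interest (mean, percentiles, histogram) is read from it rather than from a single deterministic calculation.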

The Project Management Institute (PMI) defines the MC method as:

“A technique that computes or iterates the project cost or schedule many times using input values selected at random from probability distributions of possible costs or durations, to calculate a distribution of possible total project cost or completion dates” (PMI, 2004a).

In Monte Carlo simulation, risk and uncertainty are represented by probability distributions, which recognize that each value in a range of potential outcomes has its own probability of occurring. Probability distributions are therefore a much more realistic way of describing uncertainty in risk analysis (Wallace, 2010, p. 6). Monte Carlo becomes important in project managerial decisions because it enables the project manager to justify schedule reserves, budget reserves or both in terms of the issues that could adversely affect project completion. The accuracy of the MC output, regardless of the scope adopted (budgeting, scheduling or both), depends on the number of iterations and on the subjective input required before the simulation.

The Monte Carlo method, although a powerful tool in the hands of project or cost engineers, is effective only if it is provided with appropriate information to simulate. Among its limitations, MC requires accurate input values derived from the judgment of experts or professionals. The selection of probability distributions is a delicate decision step that affects the final outcome. Kwak and Ingall (2009, p. 50) noted this, emphasizing the prior experience and the detailed data from previous projects that project managers should use in reviewing estimates and choosing probability distributions. Two other drawbacks of the MC method were reported by Williams (2003): firstly, the requirement for high computing power plus the resources and time demanded to complete a simulation activity; and secondly, the very wide project duration distributions produced when iterations are executed unintelligently, assuming no management action.

With respect to the three limitations of the MC method identified above, this study takes the following measures:

1. The choice of probability distributions is based on the most recent trends in probabilistic cost estimating and is discussed below.

2. High computational power was not an inhibiting factor, as the software used (@RISK) is an Excel add-in that performed smoothly.

3. Project duration distributions were not examined in this study, as the risk analysis focuses only on cost.

Previously (in § 3.2.2.3), while analyzing the tools used for conducting risk assessment, we saw that in every stochastic-probabilistic process five questions have to be answered:

1. What data are available for each cost element (Xi)?

2. Are the individual cost elements (Xi) strongly correlated or are they relatively independent?

3. What data for the final cost (Yi) are required? Are the mean and variance sufficient, or is the probability distribution function for Y needed?

4. Does the model contain only additive or multiplicative combinations of Xis, or are there other more complex forms?

5. How many cost elements are there in the model?

The study addresses these questions as follows:

1. For each of the 27 risks collected, the available data are the probability of occurrence and the impact range of the risk. Both are modeled through appropriate distributions, and their combination provides an individual level of risk effect. The effect of each risk is then transformed into a cost impact value for the cost line (or cost element) assessed.

2. Determining whether or not correlations exist among the 27 individual risk factors (Xis) is a very difficult and arbitrary task. The preliminary talks with the participants showed that they were unable to provide correlation values. Thus we have to assume no correlations among the 27 individual risk factors.

3. The final cost (Yj) of each of the 5 elements assessed requires two monetary values: firstly, the base estimate (Ej) of each cost element; secondly, the quantification of the cost risks (CR). Both are obtained from the values participants provided in the survey form. The cost element (Yj) is simply the base estimate (Ej) plus the summation of the cost risks, as shown in the formula below:

$$Y_j = E_j + \sum_{i=1}^{n} CR_i X_i \qquad (12)$$

Where:

Y = cost element (or construction phase)

j = index of the cost element (range: 1 – 5)

E = estimate

n = number of risks assessed in the specific cost element

CR = cost risk

i = index of the individual risk factor (range: 1 – 27)

4. The cost risk estimates derived by the application of the model are considered additive. The summation of cost risks (Xis) leads to the total project cost.

5. The execution of building projects was decomposed into five cost elements. The model is constructed based on the sequence of the construction phases as follows: (1) land preparation, (2) foundations, (3) substructure, (4) superstructure, and (5) finishes.
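The additive structure of formula (12) can be illustrated with a short Python sketch for a single cost element. It is one possible reading under stated assumptions, not the study's actual model: each risk is assumed to occur with its probability of occurrence (a Bernoulli draw), its impact is assumed to be a triangular fraction of the base estimate, risks are treated as independent (as assumed in point 2), and the function name `simulate_cost_element` and all numbers are hypothetical.

```python
import random

def simulate_cost_element(base_estimate, risks, n_iterations=10_000, seed=1):
    """Simulate Y_j = E_j + sum of cost risks for one cost element.

    Each risk is (probability, (low, mode, high)), where the impact is a
    fraction of the base estimate. The Bernoulli occurrence and the
    triangular impact are assumptions of this sketch.
    """
    rng = random.Random(seed)
    outcomes = []
    for _ in range(n_iterations):
        cost_risks = 0.0
        for probability, (low, mode, high) in risks:
            if rng.random() < probability:           # does the risk occur?
                impact = rng.triangular(low, high, mode)
                cost_risks += impact * base_estimate
        outcomes.append(base_estimate + cost_risks)   # risks drawn independently
    return outcomes

# Hypothetical foundations phase with two risks (illustrative numbers only).
foundation_risks = [(0.30, (0.02, 0.05, 0.12)), (0.10, (0.05, 0.10, 0.25))]
foundation_costs = simulate_cost_element(200_000, foundation_risks)
mean_cost = sum(foundation_costs) / len(foundation_costs)
print(f"mean risk-adjusted cost: {mean_cost:,.0f}")
```

Repeating the same call for each of the five cost elements and adding the results iteration by iteration would give a simulated total project cost under the same independence assumption.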

4.1.3 Statistics, Tests & Probability distributions

Statistics is the broad area of mathematics concerned with summarizing and analyzing collected data. The field is divided into two basic groups: descriptive and inferential statistics. Descriptive statistics is a branch of statistics in which data are used only for descriptive purposes and are not employed to make predictions (Sheskin 2003, p. 1). The use of descriptive statistics therefore aims to present and summarize data. Inferential statistics employs data in order to draw inferences or make predictions (Sheskin 2003). Typically, sample data are used to draw inferences about one or more populations from which the samples have been derived.

Statistical tests are divided into parametric and non-parametric tests. A test is called parametric when the information about the population is known by means of its parameters. Frequently used parametric tests are the F-test, z-test, t-test and ANOVA. A test is called non-parametric when there is no information about the population and its parameters. A non-parametric test is often useful for testing hypotheses regarding the population examined. Frequently used non-parametric tests are the Mann-Whitney test, the Wilcoxon rank-sum test and the Kruskal-Wallis test. Some basic tests are summarized below (Table 17).

Various types of probability distributions play a role in modeling expert opinion (Table 18). Probability Distribution Functions (PDFs) can be categorized as parametric and non-parametric. Vose (1997) defined a parametric distribution as one based on a mathematical function whose shape and range are determined by one or more distribution parameters. These parameters will often have a limited obvious or intuitive relationship to the distribution’s shape. Vose defined non-parametric distributions as those whose shape and range are directly determined by their parameters in an obvious or intuitive way.

Table 17. Parametric and analogous non-parametric tests (Hoskin 2012)

Analysis type | Parametric tests | Non-parametric tests
Compare means between two distinct/independent groups | Two-sample t-test | Wilcoxon rank-sum test
Compare two quantitative measurements taken from the same individual | Paired t-test | Wilcoxon signed-rank test
Compare means between three or more distinct/independent groups | Analysis of variance (ANOVA) | Kruskal-Wallis test
Estimate the degree of association between two quantitative variables | Pearson coefficient of correlation | Spearman’s rank correlation

Table 18. Frequently used Probability Density Functions

Parametric | Non-parametric (or distribution-free)
Normal | Uniform
Log-normal | General
Beta | Triangle
Weibull | Double Triangle
Pareto | Cumulative
Log-logistic | Discrete (e.g. Binomial, Poisson)
Hyper-geometric | Beta PERT*
Gamma & Exponential |
Beta PERT |

* The Beta PERT distribution is frequently used to model an expert’s opinion. Although it is, strictly speaking, a parametric distribution, it has been adapted so that the expert need only provide estimates of the minimum, most likely and maximum values for the variable, and the Beta PERT distribution then finds a shape that fits (Vose 1997).

□ “(Three) Points estimating”

Traditionally, construction cost estimates have been “point-in-time” estimates that represent a single value for the cost of a project and its elements, which can lead to miscommunication among designers, project managers, funders and decision-makers (Hobbs 2010, p. 21). Point estimates are one methodology designed to fit the probability distributions describing the cost of each cost element (e.g. metals, finishes, earthworks) in building projects.

The three-point estimation technique is used in project management to construct an approximate probability distribution representing the outcome of future events. When information about risk sources is significantly limited, a judgment from experience is required. Three-point estimates are often made for the cost or schedule effects of project risk, although they may also be used in connection with other important variables. Three-point estimates originated with the Program Evaluation and Review Technique (PERT). This method uses three estimates to define an approximate range for an activity’s cost: Most Likely (M), Optimistic (O) (or best case), and Pessimistic (P) (or worst case). The cost estimate is calculated as a weighted average, as the next formula shows:

$$\text{Cost Base Estimate} = \frac{O + 4M + P}{6} \qquad (13)$$
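As a small worked example of formula (13), the weighted average can be computed directly. The function name `pert_estimate` and the three input values are hypothetical and chosen only for illustration.

```python
def pert_estimate(optimistic, most_likely, pessimistic):
    """Weighted three-point (PERT) estimate: (O + 4M + P) / 6."""
    return (optimistic + 4 * most_likely + pessimistic) / 6

# Hypothetical cost item: O = 80, M = 100, P = 150 (any currency unit).
estimate = pert_estimate(80, 100, 150)
print(estimate)  # (80 + 400 + 150) / 6 = 105.0
```

Because the pessimistic value lies further from the most likely value than the optimistic one, the estimate (105) is pulled above the most likely value (100), which is how the formula reflects a right-skewed cost range.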

When using a range-estimating methodology, the cost items are assumed to be random variables rather than known parameters. As a result, a probability distribution of total cost is obtained, as opposed to the single-point estimate that a deterministic cost approach would generate. The resulting probability distribution enables the cost estimator to define cost estimate values that can be associated with prescribed levels of confidence, thus permitting a quantification of the exposure to financial risk (Back et al. 2000). The PDFs most frequently associated with three-point estimates are the Triangular and Beta PERT distributions. The shapes of these distributions (Figure 37) and their basic parameters, mean and standard deviation (Table 19), are presented below.

Figure 37: Triangular and BetaPERT PDF shapes

Table 19. Mean & St. Deviation parameters of Triangular and Beta PERT distributions

PDF | Mean | Standard Deviation
Triangular | $\mu = \frac{a + b + c}{3}$ | $\sigma = \sqrt{\frac{a^2 + b^2 + c^2 - ab - ac - bc}{18}}$
Beta PERT | $\mu = \frac{a + 4b + c}{6}$ | $\sigma \cong \frac{c - a}{6}$
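The moments listed in Table 19 can be checked numerically. The sketch below assumes that a, b and c denote the minimum, most likely and maximum values respectively (consistent with the Beta PERT mean, where b carries the weight of 4). The function names and the input values are hypothetical, and the sampled figures vary slightly with the random seed.

```python
import math
import random
import statistics

def triangular_moments(a, b, c):
    """Mean and standard deviation of a triangular(min=a, mode=b, max=c)."""
    mean = (a + b + c) / 3
    variance = (a * a + b * b + c * c - a * b - a * c - b * c) / 18
    return mean, math.sqrt(variance)

def pert_moments(a, b, c):
    """Beta PERT mean and the common (c - a) / 6 approximation of its sd."""
    return (a + 4 * b + c) / 6, (c - a) / 6

a, b, c = 80.0, 100.0, 150.0   # hypothetical min, most likely, max
rng = random.Random(42)
# random.triangular takes (low, high, mode), hence the argument order.
samples = [rng.triangular(a, c, b) for _ in range(50_000)]

tri_mean, tri_sd = triangular_moments(a, b, c)
sample_mean = statistics.fmean(samples)
sample_sd = statistics.pstdev(samples)
print(f"triangular analytic: mean={tri_mean:.2f}, sd={tri_sd:.2f}")
print(f"triangular sampled:  mean={sample_mean:.2f}, sd={sample_sd:.2f}")
print("Beta PERT (mean, approx. sd):", pert_moments(a, b, c))
```

For the same three points, the Beta PERT mean sits closer to the most likely value than the triangular mean, because the most likely value is weighted four times rather than once.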

Several other PDFs are often investigated and used by scholars. However, the triangular distribution has been the dominant choice in cost engineering studies. Nemuth (2008, p. 8) also recognizes that for regular and practical cases the triangular distribution with its threshold values (minimum, most likely, maximum) is useful. Other continuous distributions, such as rectangular, beta, normal or uniform distributions, could be used. Humphreys (2008, p. 6) points out the use of the double triangular distribution in cases where the single triangular implies a probability of overrunning different from the probability assessed by the project team. Touran and Wiser (1992) used a multivariate lognormal distribution for generating correlated random numbers for various construction cost components.

Some researchers have objected to the assumption of triangular PDFs for modeling cost estimates in construction projects (Chau 1993, 1995b). In the study of Piccardi (Picardi 1972) the lognormal distribution is preferred over the triangular one. Hulett, Hornbacher, and Whitehead (Hulett et al. 2008, p. 14) recommended a switch from the triang to the trigen distribution.

4.1.4 Monte Carlo assumptions

4.1.4.1 Types of probabilities

“Subjective” and “objective” probabilities are the two types into which the term probability is categorised. The two are differentiated by how they are estimated and by their relation to real-world phenomena. A straightforward way of dealing with the two types is the interpretation of Anscombe and Aumann: objective probabilities are seen as “physical probabilities” or “chances”, and subjective probabilities are seen as “logical probabilities” (Anscombe & Aumann 1963). Subjective probabilities are based on expert experience or knowledge of the process at hand.

The likelihood perceived and expected by individuals depends on their past experience and knowledge. The observed preferences of decision-makers define their subjectivity in the measurement of probability, as has been widely examined in past studies (Ramsey 1931; Savage 1954; Kahneman & Tversky 1974).

Objective probabilities are based on historical data. Observed data on various outcomes (e.g. failure rates, accident rates, cost overruns) can be recorded and processed to produce a range of probability distributions representing them. Objective probabilities represent the frequentist school and are expressed mathematically by the “relative frequency”, as seen previously (see § 3.1).

Due to the uniqueness and fragmented nature of the construction industry, subjective probabilities seem to be used more than objective probabilities in construction risk management. The lack of historical data, or the limited availability of suitable data, hinders the prediction of variability in future outcomes (e.g. returns, cost outcomes, contingencies).

The present study deals with subjective probabilities assigned by the contractors to the risk factors assessed.

Taking the value (estimate) of each cost element as a random variable, Xi, the probability that the cost element takes a certain value can be described by a probability distribution function, P(Xi = x), where x is a possible value of the cost item (in practice, the risk-adjusted estimate) and P(Xi = x) is a function that assigns a probability to every possible cost element value, x.

This allows us to identify:

1. a value of how much the cost estimate is likely to vary within a given degree of (statistical) confidence

2. the contingency level at the desired confidence level

3. the contribution of each risk factor to the cost estimate for each cost element
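The first two quantities in the list above can be read directly from a simulated cost distribution, as the following sketch shows. It is illustrative only: the base estimate, the triangular multiplier and the nearest-rank `percentile` helper are assumptions of the example, and the third item (risk-factor contributions) is not covered here.

```python
import math
import random

def percentile(sorted_values, level):
    """Nearest-rank percentile of an already sorted list (0 < level <= 100)."""
    rank = max(1, math.ceil(level / 100 * len(sorted_values)))
    return sorted_values[rank - 1]

rng = random.Random(7)
base_estimate = 100_000   # hypothetical base estimate of one cost element
# Hypothetical risk-adjusted cost: base estimate times a triangular factor
# with minimum 0.95, most likely 1.05 and maximum 1.30.
simulated = sorted(base_estimate * rng.triangular(0.95, 1.30, 1.05)
                   for _ in range(20_000))

p10, p50, p80, p90 = (percentile(simulated, q) for q in (10, 50, 80, 90))
contingency_p80 = p80 - base_estimate   # reserve for an 80% confidence level
print(f"80% interval (P10 to P90): {p10:,.0f} to {p90:,.0f}")
print(f"median (P50): {p50:,.0f}")
print(f"contingency at P80: {contingency_p80:,.0f}")
```

Choosing a higher confidence level (for example P90 instead of P80) simply moves the cut-off further into the upper tail and raises the contingency accordingly.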

4.1.4.2 Choice of probability distributions

Choosing the most appropriate probability distribution was one of the most delicate issues I had to consider and implement as a researcher. The triangular distribution has been criticised, for example for underestimating the uncertainty in cost component variables (Chau 1995a, p. 376) or for predisposing experts to underestimate their best and worst case scenarios (Smith et al. 2006, p. 90). Nevertheless, it has valuable properties that should be seriously taken into consideration before proceeding to the simulation stage. An example of risk distributions is provided in Table 20 (Smith et al. 1999, p. 86).

Considerable arguments in favor of using triangular distributions when dealing with construction costs have been provided (Back et al. 2000, p. 30). Other studies follow this pattern and also apply the triangular distribution (Salling & Leleur 2011). Four conditions regarding probability distributions have been proposed for construction cost estimating activities:

1. On any estimate, upper and lower limits should exist. Beyond these limits an estimator is relatively certain that no values will occur. Consequently, a closed-ended distribution is desirable.

2. The distribution should be continuous. It is illogical to assume that the probability distribution for project costs is discrete. The construction cost may have any value within reasonably defined limits.

3. The distribution of construction cost will be unimodal. This is particularly relevant to construction cost, as it is expected that cost will have one most-probable value.

4. The distribution must have greater freedom to be higher than lower with respect to the estimate; skewness must be expected.

Table 21 below summarizes the four properties required of a probability density function for cost estimation of construction activities, as presented in Back et al. (2000, p. 30). As can be seen, the triangular distribution is the only distribution satisfying all four pre-set conditions (Table 21). Based on the table’s options and the decomposition of distribution options in the next figure, we conclude with the choice of the triangular distribution (Figure 38).

Table 20. Examples of probability density functions for building activities

Activity | Distribution
Construction | Log-normal
Equipment | Triangular
Market size | Triangular
Market share | Triangular
Unit price | Triangular
Operational cost | Triangular

Table 21. Properties of possible probability distributions

Distribution | Bounded | Continuous | Unimodal | Skewness
Normal | No | Yes | Yes | No
Log-normal | At one end | Yes | Yes | Yes
Uniform | Yes | Yes | No | No
Gamma | At one end | Yes | Yes | Yes
Triangular | Yes | Yes | Yes | Yes
Beta | Yes | Yes | Sometimes | Yes
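The screening in Table 21 can be expressed as a small lookup that keeps only the distributions meeting all four conditions. The entries are copied from the table; the variable names are hypothetical, and a value such as “At one end” or “Sometimes” is treated here as not fully satisfying the condition.

```python
# Properties copied from Table 21: (bounded, continuous, unimodal, skewness).
properties = {
    "Normal": ("No", "Yes", "Yes", "No"),
    "Log-normal": ("At one end", "Yes", "Yes", "Yes"),
    "Uniform": ("Yes", "Yes", "No", "No"),
    "Gamma": ("At one end", "Yes", "Yes", "Yes"),
    "Triangular": ("Yes", "Yes", "Yes", "Yes"),
    "Beta": ("Yes", "Yes", "Sometimes", "Yes"),
}

# Keep a distribution only if every one of the four entries is a plain "Yes".
satisfying = [name for name, flags in properties.items()
              if all(flag == "Yes" for flag in flags)]
print(satisfying)  # ['Triangular']
```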

Figure 38: Distributional choices

4.1.4.3 Interdependency of system variables

Correlation represents the co-movement of two cost elements: when one is more expensive, the other tends to cost more as well, or to cost less in the case of a negative correlation (Yang 2005, p. 275). Correlation is an indicator ranging from -1.0 to +1.0 that represents the strength of the interdependence between two or more variables and simply shows whether the cost elements are expected to “move together”. One of the more common sources of error in Monte Carlo simulation is that cost components are assumed to be independent, so that changes in one cost component do not affect any other component. This is clearly inaccurate in typical construction projects (Touran & Wiser 1992, p. 260). Reflecting the interdependencies among various risks is a major problem in risk analysis (Kaczmarzyk 2013). Wall (1997, p. 241) argued that “the effect of correlations is more significant than the effect of the choice of distributions”; the same observation was shared by others (Chau 1995a; Le Isidore et al. 2001, p. 419).

This frequently cited concern with Monte Carlo simulation lies in the assumption that “all of the cost components or system variables are independent” (Hobbs 2010, p. 25). In reality, however, there is usually significant correlation between risks in projects, since if one thing goes well or badly other things tend to follow (Wallace 2010, p. 19). Project managers therefore have to seriously consider the effect of correlation when interpreting their results, as correlation can increase the total project uncertainty.
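The effect of positive correlation on total uncertainty can be illustrated with two cost elements. The sketch below is a simplified assumption, not the study's model (which assumes no correlation): both elements are taken to be normally distributed purely for convenience, the means and standard deviations are hypothetical, and correlation is induced by mixing two independent standard normal draws.

```python
import math
import random
import statistics

def total_cost_sd(rho, n_iterations=50_000, seed=3):
    """Standard deviation of the sum of two cost elements with correlation rho.

    Both elements are modeled as normal variables purely for simplicity;
    this is an assumption of the sketch.
    """
    rng = random.Random(seed)
    totals = []
    for _ in range(n_iterations):
        z1 = rng.gauss(0.0, 1.0)
        # Build a second standard normal whose correlation with z1 is rho.
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        cost_a = 100.0 + 10.0 * z1   # hypothetical mean 100, sd 10
        cost_b = 200.0 + 20.0 * z2   # hypothetical mean 200, sd 20
        totals.append(cost_a + cost_b)
    return statistics.pstdev(totals)

sd_independent = total_cost_sd(0.0)
sd_correlated = total_cost_sd(0.8)
print(f"sd of total, rho = 0.0: {sd_independent:.1f}")
print(f"sd of total, rho = 0.8: {sd_correlated:.1f}")
```

For these inputs the theoretical values are sqrt(10² + 20²) ≈ 22.4 without correlation and sqrt(10² + 20² + 2·0.8·10·20) ≈ 28.6 with a correlation of 0.8, so ignoring a positive correlation understates the spread of the total cost.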

Hulett (2002) identified two basic reasons why correlation occurs. Firstly, and most importantly, a common “cost driver” can influence the total actual project cost if correlation is strongly positive. If a new ground freezing technique is used, several other cost elements, such as labor productivity and workers’ familiarity with the new technology, have to be successful too. Secondly, one element may depend on another. An example is providing an incomplete Ground Baseline Report (GBR) to an inexperienced contractor and demanding a high-quality, on-time earthwork project. The incompleteness of the GBR in combination with the contractor’s inexperience can lead to an important cost overrun and schedule

In document Risk sharing in traditional construction contracts for building projects A contractor's perspective in the Greek construction industry (Page 91-200)