
A novel multiple linear regression model for forecasting S-curves

Karl Blyth

White Building Services Ltd, Newton-le-Willows, UK, and

Ammar Kaka

School of the Built Environment, Heriot-Watt University, Edinburgh, UK

Abstract

Purpose – Cash flow forecasting is an indispensable tool for construction companies and is essential for the survival of any contractor at all stages of the work. Given the short time available and the associated cost, a simple and fast technique for forecasting cash flow accurately is required. This paper seeks to address this need.

Design/methodology/approach – The paper argues that, instead of producing an S-curve from a combination of historical projects (the state of the art classifies projects into groups and produces a standard curve for each group simply by fitting one curve to the historical data), an individual S-curve should be produced for each individual project. A sample of data from 50 projects was collected, and 20 criteria were identified to classify these projects. Using the most influential criteria, a multiple linear regression model was created to forecast the programme of works and hence the S-curves. A further six projects were used to validate and test the model.

Findings – The results of the model developed in this paper were compared with those of previous models and evaluated. It is concluded that the model produced more accurate results than most existing value and cost models.

Originality/value – The paper proposes an alternative and novel approach to the development of standard value and cost commitment S-curves. This approach is based on a multiple linear regression model of the programmes of works.

Keywords Cash flow, Financial forecasting, Project management, Construction industry

Paper type Research paper

Introduction

Cash is the most important of a construction company's resources, because more companies become bankrupt through a lack of liquidity to support their day-to-day activities than through inadequate management of other resources (Singh and Lakanathan, 1992). Insolvency is more likely to occur in this industry than in any other (Kaka and Price, 1993).

Cash flow forecasting is essential for the survival of any contractor at all stages of the work. Ideally, cash flow forecasts should be based on the construction programme and a bill of quantities (Allsop, 1980). However, cash flow forecasting at the tendering stage needs to be simple and fast, given the short time available and the associated cost. Contractors rarely prepare a detailed construction plan at this stage and usually wait until they have won the contract. A simple and fast technique for forecasting cash flow accurately is therefore required.

The majority of cash flow forecasting models developed so far have been based on standard value S-curves. These represent the running cumulative value of work and are derived from data on completed construction projects. S-curves are widely used in the industry for controlling projects throughout their execution phases. They are valuable to project management for stating the current status of projects and predicting their future. Although they are used in scheduling and planning, for reporting actual, earned and planned values and for resource loading the various activities of a project (Miskawi, 1989), their reliability and accuracy are still in question.

Engineering, Construction and Architectural Management Vol. 13 No. 1, 2006 pp. 82-95
© Emerald Group Publishing Limited 0969-9988

The paper begins with a statement of the main objectives of the research, followed by a detailed review of previous attempts to develop models for forecasting S-curves. It then describes how and where the data were collected and how they were used to develop the model. The model is developed and then tested on a set of projects from the sample that were not used to create the initial model. The accuracy of the present model is then compared with that of relevant previous models, and conclusions are drawn as to whether the current research improves on existing research.

Objectives of the research

The research objective was to produce a model that would standardise activities and forecast the duration, cost and end dates of these activities from defined project characteristics. Using this output, the model would then rapidly produce a forecast of cost flow and cash flow for a project at the pre-tender stage. If forecast accurately, cost flow and cash flow can be used to plan and control the financial resources of the project. This paper therefore attempts to produce a programme of works automatically from the characteristics of an individual project (i.e. regression models predict the project schedule and the cost of each activity), and this programme is then converted into an S-curve. The authors argue that, instead of producing an S-curve from a combination of historical projects (the state of the art classifies projects into groups and produces a standard curve for each group simply by fitting one curve to the historical data), an individual S-curve should be produced for each individual project. The authors believe that each project is unique and has a unique S-curve. To generate this curve, regression models are developed to predict what the programme of works should look like. The methodology was therefore to produce a multiple linear regression model that predicts S-curves for individual projects from project characteristics (function of building, size, location, construction form, etc.). To do this, the model must both forecast the overall cost and duration of a project and establish the costs and timings of individual activities within the construction programme. The latter required the programme of works for each building to be mapped onto a generic format and the time relationships (or lags) between individual activities to be determined. The model would then be tested to examine the hypothesis that its predictions are more accurate than those of previous models.

To the authors' knowledge, there has been no previous attempt to forecast automatically an S-curve that is unique to a project's characteristics (at least not at the level of detail reported in this paper). The paper also addresses a second question: whether greater accuracy can be achieved by calculating the S-curve from automatically generated work schedules than by forecasting the S-curve directly, without reference to the programme of works.

Previous attempts at developing models to forecast S-curves

There have been many attempts in the past to develop cash flow forecasting models. They were mainly part of more comprehensive models aimed at helping contractors or clients forecast their cash flow at the level of an individual project (Kaka and Price, 1996) or of the company (Kaka, 1994). The majority of these models were based on the idea of developing standard S-curves to represent the running value or cost of different types of construction projects. Typically, this was achieved by collecting data on the monthly valuations and general characteristics of projects. The projects were then classified into groups, and an average S-curve was fitted to each group (Balkau, 1975; Bromilow and Henderson, 1977; Hudson, 1978; Oliver, 1984; Miskawi, 1989; Khosrowshahi, 1991; Evans and Kaka, 1998). Several mathematical models were used to fit the S-curves (e.g. the alpha-beta cubic equation, the Weibull function and the DHSS model). Given the total value and duration of a project to be constructed, these models could be used to forecast its cumulative value or cost at monthly (or any other) intervals. The accuracy of these models is in question (see Kaka and Price (1993) for a detailed analysis of previous models).

Kenley and Wilson (1986) argued that the underlying principle of the idiographic approach is that value curves are generally unique and should be modelled separately; hence a curve should be fitted for each project. This concept forms the basis of the work in this paper.

Petros (1996) investigated the effect of different works plans on the cost flow curve of a single project. Four different planners scheduled the construction activities of the same industrial project. Each of the four plans was analysed and used to estimate a cost flow curve. The results showed significant variability in the possible S-curves for the same project arising from differences in planning.

Kaka (1999) suggested that, unless more accurate standard S-curves are produced (perhaps by using more detailed classification criteria), contractors should resort to detailed calculations using the works plan and cost estimates. He also indicated that, as construction projects are unique, future attempts to standardise the cost/value relationship were likely to fail. He instead used a stochastic model based on historical data, as it allowed users to incorporate variability and inaccuracy into their forecasts and decision making.

The literature review revealed that previous attempts to forecast S-curves have not been accurate, and it produced two main findings applicable to this research: first, cash flow forecasts are likely to be inaccurate because construction projects are unique and the progress of work varies greatly from one project to another; and second, the project groupings chosen in previous work have been poor. These problems shaped the research. The first hypothesis tested was that more accurate S-curves could be modelled if projects had the same base activities in their works programmes, because the projects could then be more easily related. The second hypothesis follows from the first: as a result, groups of projects of the same type can be modelled more easily.

Data collection

It was necessary to include a wide range of firms in the construction industry to obtain a variety of construction projects. Information was sought from clients, contractors and quantity surveyors, as these were believed to be the most likely sources of the detailed cost and programme information required. The contractors were identified from the New Civil Engineer Contractors File, 1999, and client and quantity surveying organisations were identified through general and employment publicity material. Information was not sought from, for example, architects or engineers, as they were not expected to have access to the detailed information on costs and programmes that was needed.

The sample consisted of over 200 firms. Each firm was sent a structured, multiple-choice questionnaire of 20 questions on the characteristics of a project, with a covering letter explaining the purpose of the data collection and why the firm had been asked to participate. Where a company found it easier to discuss the data collection, interviews were arranged to collect the data in person and to resolve any queries. The structure, content and format of the questionnaire were based on the study by Nkado (1992), with additional questions. The questionnaire also followed the principle that characteristics related to project size are effective for the current research, because respondents can supply such information quickly and easily. The questionnaire was split into two sections. Questions 1-8 sought non-numerical (non-interval level) data (e.g. project type, location), whereas questions 9-20 sought numerical (interval level) data (e.g. ground floor area, number of floors). The data required for each project for use in the model were as follows:

• the project's characteristics (based on the multiple-choice questionnaire);
• the project programme (i.e. the bar chart for the construction plan); and
• the individual activity/element and overall costs (or approximate estimates).

A total of 47 projects meeting the above specifications were received. Data on a further 14 projects were obtained from a series of cost models published in Building magazine using data provided by the QS practice Davis, Langdon & Everest (1988-1994). In this series, a building project was published each week, together with a breakdown of the programme of works and the cost of each activity. Five projects were unfinished and were therefore rejected, leaving a total of 56 building projects for use in the analysis. The sample consisted of building projects only. The projects were finished in the period 1993-1999. The costs were not adjusted using cost indices, as only the proportions of total cost matter in the S-curve analysis. The cost breakdown of the data is shown in Table I. Fifty randomly selected projects were used to develop the model, and the remaining six were set aside for testing. The sample sizes of both the development data and the test data were considered sufficient for this research.
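The argument that cost indices are unnecessary can be checked with a short Python sketch. It assumes that a single index is applied uniformly to all of a project's costs; the activity costs and the 15 per cent uplift are hypothetical and not taken from the data set. Under that assumption, expressing costs as percentages of the project total cancels the index.

```python
def cost_proportions(costs):
    """Express each activity cost as a percentage of the project's total cost."""
    total = sum(costs)
    if total <= 0:
        raise ValueError("total cost must be positive")
    return [100.0 * c / total for c in costs]


# Hypothetical activity costs (in £k) for one project.
costs = [120.0, 480.0, 300.0, 100.0]
# The same costs after a single, hypothetical 15 per cent index uplift.
indexed = [1.15 * c for c in costs]

print(cost_proportions(costs))    # [12.0, 48.0, 30.0, 10.0]
print(cost_proportions(indexed))  # same proportions, up to rounding error
```

The argument fails if different indices apply to different activities within one project, for example when the work spans a change in index.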

Table I. Cost breakdown of the data received

| Company type | Projects received | Projects used | Minimum value of project (£m) | Maximum value of project (£m) | Total value of projects (£m) |
|---|---|---|---|---|---|
| Contractor | 25 | 24 | 3.47 | 29.76 | 200.68 |
| Q.S. | 14 | 12 | 5.79 | 15.32 | 102.89 |
| Client | 8 | 6 | 3.31 | 23.44 | 51.94 |
| Cost models | 14 | 14 | 2.20 | 10.01 | 119.97 |
| Total | 61 | 56 | – | – | 475.49 |

When analysing the S-curves, the point of 100 per cent cost commitment for the contractor was taken to be the time of practical completion on site. The origin was taken to be the commencement of work on site, a convenient and easily identified point.

Developing the model

Standardising activities and amalgamating durations and costs

As the projects in the data set differed greatly, especially in the number of activities, they had to be standardised before they could be compared and combined. The Gantt charts for all 50 development projects were analysed for similarities in their activities. The frequency with which specific design options occurred suggested that many activities could be regarded as generic. Between 20 (minimum, for a one-storey building) and 39 (maximum, for a seven-storey building) standardised activities were identified and used as a template for the projects in the sample. This was achieved by combining sub-activities into logical groups, determined by analysing all the programmes in conjunction with advice from professional practitioners. The overall duration of a standardised activity was calculated from the start date of its first sub-activity to the end date of its last sub-activity. The costs of the sub-activities were aggregated in the same way to give the cost of each standardised activity.
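The standardisation step can be sketched in Python. In the paper the grouping itself was set by analysing the programmes with practitioner advice; here the mapping of each sub-activity to a standardised activity is a hypothetical input, and costs are assumed to be summed. Each group then spans the earliest start to the latest end of its sub-activities.

```python
from dataclasses import dataclass


@dataclass
class SubActivity:
    name: str
    standard_activity: str  # hypothetical mapping to a Table II activity
    start: float            # weeks from commencement on site
    end: float
    cost: float


def standardise_activities(sub_activities):
    """Group sub-activities into standardised activities.

    A standardised activity starts at its earliest sub-activity start, ends at
    its latest sub-activity end, and costs the sum of its sub-activity costs.
    """
    grouped = {}
    for sub in sub_activities:
        g = grouped.setdefault(
            sub.standard_activity, {"start": sub.start, "end": sub.end, "cost": 0.0}
        )
        g["start"] = min(g["start"], sub.start)
        g["end"] = max(g["end"], sub.end)
        g["cost"] += sub.cost
    for g in grouped.values():
        g["duration"] = g["end"] - g["start"]
    return grouped


programme = [
    SubActivity("Excavation", "Foundations", 2, 5, 40.0),
    SubActivity("Pile caps", "Foundations", 4, 8, 60.0),
    SubActivity("Ground beams", "Foundations", 6, 9, 25.0),
    SubActivity("Steel erection", "Frame", 9, 16, 210.0),
]
result = standardise_activities(programme)
print(result["Foundations"])
# {'start': 2, 'end': 9, 'cost': 125.0, 'duration': 7}
```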

Determination of time lags

The next step in the analysis was to calculate the time lags relating activities to those dependent upon them. This would determine when a specific activity would begin.

Initially, based on the practitioners’ advice referred to above and the specific programmes collected, a “normal” sequence of activities and activity relationships was determined and the associated lag times calculated. If any lag time determined in this manner was negative, then the previous activity in the programme was chosen as predecessor and the lag recalculated until there was a zero or positive lag for all activities in all of the projects. Table II shows the resulting set of relationships.
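The rule for avoiding negative lags can be sketched in Python. The text leaves "the previous activity in the programme" open to interpretation. This sketch assumes it means stepping back one position at a time in the programme order of Table II, starting from the current predecessor. The start weeks are hypothetical. Activities absent from a project, such as upper-floor activities in a one-storey building, are skipped. The activity itself is also skipped, because some normal predecessors (e.g. 1st floor for Lift/stairs) come later in the numbering.

```python
def choose_predecessor(starts, activity, normal_predecessor):
    """Return (predecessor, lag) such that the lag is zero or positive.

    `starts` maps activity number -> start time (weeks from commencement)
    for one project; activity numbers follow the programme order of Table II.
    Assumed reading of the rule: if the lag from the current predecessor is
    negative, step back to the activity immediately before it in programme
    order and recalculate.
    """
    candidate = normal_predecessor
    while candidate >= 1:
        # Skip the activity itself and activities absent from this project.
        if candidate != activity and candidate in starts:
            lag = starts[activity] - starts[candidate]
            if lag >= 0:
                return candidate, lag
        candidate -= 1
    return None, None  # no earlier activity gives a non-negative lag


# Hypothetical start weeks: 1 Site set-up, 2 Foundations, 5 Frame,
# 6 External walls ground, 13 Internal walls ground.
starts = {1: 0, 2: 2, 5: 6, 6: 9, 13: 8}
# Internal walls ground starts before External walls ground (lag -1),
# so the search falls back to Frame (activity 5).
print(choose_predecessor(starts, 13, 6))  # (5, 2)
```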

Using the above relationships, each activity in each programme was then analysed as follows:

$$\text{Time lag} = \text{Activity start} - \text{Start of predecessor activity} \tag{1}$$

$$\text{Duration of activity} = \text{End date of activity} - \text{Start date of activity} \tag{2}$$

The time lag was then expressed as a percentage of the duration of the predecessor activity:

$$\text{Percentage lag} = 100 \times \frac{\text{Time lag}}{\text{Duration of predecessor}} \tag{3}$$
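Equations (1) to (3) translate directly into Python. The last function inverts equations (1) and (3): given a predicted percentage lag, it places an activity relative to its predecessor. This inversion is implied by the equations but is not spelled out in the text. The dates are hypothetical weeks from commencement.

```python
def time_lag(activity_start, predecessor_start):
    """Equation (1): time lag measured from the predecessor's start."""
    return activity_start - predecessor_start


def duration(start, end):
    """Equation (2): activity duration."""
    return end - start


def percentage_lag(lag, predecessor_duration):
    """Equation (3): lag as a percentage of the predecessor's duration.

    Undefined for zero-duration predecessors such as milestones.
    """
    return 100.0 * lag / predecessor_duration


def start_from_percentage_lag(pct_lag, predecessor_start, predecessor_duration):
    """Invert equations (1) and (3) to place an activity from a predicted lag."""
    return predecessor_start + pct_lag / 100.0 * predecessor_duration


# Foundations run from week 2 to week 9; Frame starts in week 6.
pred_dur = duration(2, 9)            # 7 weeks
lag = time_lag(6, 2)                 # 4 weeks
pct = percentage_lag(lag, pred_dur)
print(round(pct, 2))                 # 57.14
```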

Regression modelling

Regression analysis fits an expression to a set of collected data. In multiple linear regression analysis, as used in this model, several independent variables are used to model a single response variable (Weisberg, 1980). Since the aims of the project include predicting overall duration and cost as well as cost flow, each activity in the programme may require three models: one whose dependent variable is activity duration, one whose dependent variable is lag time, and one whose dependent variable is cost. For time prediction alone, therefore, up to 78 models (39 activities × 2 time-related dependent variables) were anticipated. In fact, as outlined below, significantly more models were produced in order to deal with variability in the projects and their characteristics. In all types of model, the independent variables are the project characteristics collected in the survey.
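Each of these models is an ordinary least squares fit of one dependent variable on several project characteristics. The Python sketch below illustrates such a fit. It is not the authors' SPSS/Excel implementation: the solver (normal equations solved by Gauss-Jordan elimination), the predictors (gross floor area in 1,000 m² and number of floors) and the response (frame duration in weeks) are all hypothetical. The data are constructed so that the fit is exact.

```python
def fit_ols(X, y):
    """Least-squares coefficients [b0, b1, ..., bk] for y = b0 + sum(bi * xi).

    Solves the normal equations (A'A) b = A'y by Gauss-Jordan elimination with
    partial pivoting; adequate for the small predictor sets used here.
    """
    A = [[1.0] + [float(v) for v in row] for row in X]  # add intercept column
    n, k = len(A), len(A[0])
    # Augmented normal-equation matrix [A'A | A'y].
    M = [[sum(A[r][i] * A[r][j] for r in range(n)) for j in range(k)]
         + [sum(A[r][i] * y[r] for r in range(n))]
         for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("predictors are linearly dependent")
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(k):
            if r != col:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [M[i][k] for i in range(k)]


def predict(coefs, row):
    return coefs[0] + sum(b * x for b, x in zip(coefs[1:], row))


def r_squared(X, y, coefs):
    """Coefficient of determination (goodness-of-fit) on the fitting data."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - predict(coefs, row)) ** 2 for row, yi in zip(X, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical data: [gross floor area (1,000 m2), number of floors] -> frame
# duration (weeks), constructed to follow 2 + 0.5 * area + 1.5 * floors exactly.
X = [(4, 1), (6, 2), (10, 3), (8, 4), (12, 2)]
y = [5.5, 8.0, 11.5, 12.0, 11.0]
coefs = fit_ols(X, y)
print([round(c, 6) for c in coefs])      # [2.0, 0.5, 1.5]
print(round(r_squared(X, y, coefs), 6))  # 1.0
```

In the paper's scheme, a fit of this kind would be repeated for the duration, lag and cost of every standardised activity.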

The regression analysis combined largely automated analysis in SPSS with user-directed analysis in Microsoft Excel. Several hundred models were generated and tested, then modified or rejected, using a total of 16 different modelling strategies. Three methodological issues are of interest in this process: dealing with variability in the input data, dealing with non-numerical variables, and determining the optimum structure for each model.

Table II. The standardised building activities in order with their activity dependencies

| Activity no. | Activity name | Activity dependency |
|---|---|---|
| 1 | Site set-up | – |
| 2 | Foundations | Site set-up |
| 3 | Drainage | Foundations |
| 4 | Ground floor | Foundations |
| 5 | Frame | Foundations |
| 6 | External walls ground | Frame |
| 7 | External walls 1st | External walls ground |
| 8 | External walls 2nd | External walls ground |
| 9 | External walls 3rd | External walls ground |
| 10 | External walls 4th | External walls ground |
| 11 | External walls 5th | External walls ground |
| 12 | External walls 6th | External walls ground |
| 13 | Internal walls ground | External walls ground |
| 14 | Internal walls 1st | Internal walls ground |
| 15 | Internal walls 2nd | Internal walls ground |
| 16 | Internal walls 3rd | Internal walls ground |
| 17 | Internal walls 4th | Internal walls ground |
| 18 | Internal walls 5th | Internal walls ground |
| 19 | Internal walls 6th | Internal walls ground |
| 20 | Internal doors | Internal walls ground |
| 21 | Lift/stairs | 1st floor |
| 22 | 1st floor | Frame |
| 23 | 2nd floor | 1st floor |
| 24 | 3rd floor | 1st floor |
| 25 | 4th floor | 1st floor |
| 26 | 5th floor | 1st floor |
| 27 | 6th floor | 1st floor |
| 28 | Roof | Frame |
| 29 | Watertight (milestone) | Internal doors |
| 30 | Windows and ext. doors | Internal walls |
| 31 | Plumbing and sanitary-ware | Internal walls |
| 32 | Mechanical services | Internal walls |
| 33 | Electrical services | Internal walls |
| 34 | Floor finishes | Roof |
| 35 | Ceiling finishes | Internal walls |
| 36 | Wall finishes | Mechanical services |
| 37 | Fixtures and fittings | Mechanical services |
| 38 | External works | Foundations |
| 39 | Hand over and clean | Wall finishes |

As discussed above, the projects in the data set varied significantly in nature, not least in the functions and sizes of the buildings. It was therefore expected that "outliers", i.e. data points distant from the general trend lines, might prejudice the goodness-of-fit of the regression models. However, any decision to remove outliers is a compromise between improving accuracy and reducing the applicability of the data set. Two strategies were used to address this problem.

First, as previous studies have generally indicated that building function strongly influences times and costs, tests were performed to examine the relationship between other variables (e.g. size in terms of gross floor area) and building type. On the basis of these tests, it was decided to model a number of building types separately. Although this increased the number of models and the complexity of applying them, it was considered worthwhile because it greatly reduced the number of outliers across the range of variables.

Second, the remaining apparent outliers were examined in SPSS by assessing their uniqueness. Projects that appeared to be outliers for a given variable, but whose characteristics were common in the data set, were deemed "true" outliers (i.e. truly unrepresentative of the group of projects sharing those characteristics) and were removed from the analysis of that variable. Projects that appeared to be outliers, but whose characteristics were not represented elsewhere in the data set, were considered to form a logically different group of projects and were retained in the analysis. No change was finalised until its effects on both the goodness-of-fit (R-squared) and the accuracy of prediction for the test projects had been assessed. The two tests sometimes gave contradictory results; in such cases, the accuracy of prediction for the test projects took precedence.

Not all of the input variables can be expressed directly in numerical form (e.g. location), so they cannot be used in a regression analysis without further processing. Several strategies were tested to deal with this problem. SPSS addresses it by allowing the use of "boolean" variables, and this method was initially applied to the integer codes assigned to each non-interval variable. For example, in dealing with building function, each possible value (e.g. "office", "school") is assigned a variable that is either "on" (value = 1) or "off" (value = 0). For any single project, only one building-type variable can be "on". As an alternative, this approach was compared with a number of patterns of manual allocation of integer values to represent the different values of a variable (e.g. values in the range 1 to 11 in the case of building type). Ameen et al. (2003) also found this approach to be effective. A very large number of allocation patterns was possible, but several random patterns and patterns based on the frequency of occurrence of values were tried. In each case, the best method was chosen on the basis of goodness-of-fit to the sample data and the accuracy of the resulting regression models in predicting values for the test projects.
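The two coding strategies can be contrasted in a short Python sketch: one 0/1 indicator per building type, or a single integer column under a chosen allocation pattern. The allocation shown ranks types by frequency of occurrence, using the PF1-PF5 sample sizes from Table III. It is one illustrative pattern, not the one the authors selected. When indicators are used alongside an intercept, one category is normally omitted as a reference level to avoid perfect collinearity. For clarity, the sketch returns the full set.

```python
# Building types from Table III (PF1-PF5 only, for brevity).
BUILDING_TYPES = ["office", "retail", "housing", "education", "leisure"]


def boolean_code(value, categories):
    """One 0/1 indicator per category; exactly one indicator is 'on'."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == c else 0 for c in categories]


def integer_code(value, allocation):
    """A single integer column, using a chosen allocation pattern."""
    return allocation[value]


# One allocation pattern: rank building types by frequency of occurrence.
# Counts are the PF1-PF5 sample sizes in Table III; ties keep list order.
frequency = {"office": 11, "retail": 7, "housing": 5, "education": 5, "leisure": 5}
by_frequency = {t: rank for rank, t in
                enumerate(sorted(frequency, key=frequency.get, reverse=True), start=1)}

print(boolean_code("retail", BUILDING_TYPES))  # [0, 1, 0, 0, 0]
print(integer_code("housing", by_frequency))   # 3
```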

Not all input variables have the same influence on the predictions produced by the models. Furthermore, the importance of a variable may vary between models (i.e. between the different models for each activity and each dependent variable). It is therefore usually possible to simplify models by removing the least influential variables. As with the elimination of outliers described above, this process is a compromise, in this case between simplicity (and thus speed and ease of use) and accuracy. Again, SPSS offers automated methods for this process: it identifies the relative influence of each independent variable, adds or removes variables in a stepwise fashion (e.g. working from most to least influential), and tests the effect of each change. This method is also described by Draper and Smith (1998). As above, this process was compared with a number of manual manipulations of the model, including various random selections of variables as starting points for stepwise refinement. In each case, the best result, judged by goodness-of-fit to the sample data and accuracy of prediction for the test projects, was selected.
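A simplified forward stepwise procedure can be sketched in Python. At each step it adds the variable that most increases R-squared and stops once the gain falls below a threshold or a cap is reached. The default cap of 16 mirrors the Excel template limit described in the next section. This is not SPSS's procedure, which uses statistical entry and removal criteria. The 0.01 threshold, the variable names and the data are hypothetical, and the data are constructed so that only two of the three variables matter.

```python
def fit_ols(X, y):
    """OLS coefficients [b0, ..., bk] via the normal equations (Gauss-Jordan)."""
    A = [[1.0] + [float(v) for v in row] for row in X]
    n, k = len(A), len(A[0])
    M = [[sum(A[r][i] * A[r][j] for r in range(n)) for j in range(k)]
         + [sum(A[r][i] * y[r] for r in range(n))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("predictors are linearly dependent")
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(k):
            if r != col:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [M[i][k] for i in range(k)]


def r_squared(X, y, coefs):
    mean_y = sum(y) / len(y)
    fitted = [coefs[0] + sum(b * x for b, x in zip(coefs[1:], row)) for row in X]
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def forward_select(X, y, names, max_vars=16, min_gain=0.01):
    """Greedily add the variable giving the largest R-squared; stop at
    `max_vars` variables or when the best gain is below `min_gain`."""
    selected, best_r2 = [], 0.0
    while len(selected) < min(max_vars, len(names)):
        best = None
        for j in range(len(names)):
            if j in selected:
                continue
            sub = [[row[c] for c in selected + [j]] for row in X]
            try:
                r2 = r_squared(sub, y, fit_ols(sub, y))
            except ValueError:  # skip variables that are linearly dependent
                continue
            if best is None or r2 > best[1]:
                best = (j, r2)
        if best is None or best[1] - best_r2 < min_gain:
            break
        selected.append(best[0])
        best_r2 = best[1]
    return [names[j] for j in selected], best_r2


names = ["floor_area", "site_access_score", "storeys"]  # hypothetical
X = [(1, 3, 2), (2, 1, 2), (3, 4, 1), (4, 1, 3), (5, 5, 1), (6, 9, 2)]
y = [9, 11, 10, 18, 14, 19]  # built as 1 + 2 * floor_area + 3 * storeys
chosen, r2 = forward_select(X, y, names)
print(chosen, round(r2, 6))  # ['floor_area', 'storeys'] 1.0
```

In the paper, each candidate model was also judged by its accuracy on the test projects. A fuller sketch would add that check alongside the R-squared criterion used here.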

Utilizing the modelling framework

The final predictive models were built as an Excel template using the generated regression equations. The template performed any manipulation needed to prepare the input data for the equations and carried out the repeated calculations required for multi-storey buildings of varying sizes. When up to 21 interval and non-interval level variables (coded, where necessary, by the methods discussed previously) are entered, the template immediately predicts each activity's duration, cost, start and end dates and time lag from its chosen predecessor, and consolidates these into a single programme prediction. A maximum of 16 of the 21 input variables could be used in modelling each dependent variable in the Excel model. This limit was sufficient for most of the models, since SPSS indicated that the greatest accuracy could be achieved with fewer than 16 variables. In the few cases where SPSS identified more than 16 variables, the final model included only the most influential ones. Tests showed that this simplification had a negligible effect on the accuracy of prediction.
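The consolidation of predicted durations and lags into a single programme can be sketched in Python. The sketch assumes that each activity's start is its predecessor's start plus the predicted percentage lag applied to the predecessor's duration, which inverts equations (1) and (3). It also assumes that activities are listed so that every predecessor comes before its dependants. The predicted values are hypothetical.

```python
def build_programme(activities):
    """Place each activity from its predicted duration and percentage lag.

    `activities` is a list of dicts ordered so that every predecessor appears
    before its dependants, with keys: id, predecessor (id or None), duration
    (weeks) and pct_lag (% of the predecessor's duration).
    Returns ({id: (start, end)}, overall project duration).
    """
    schedule = {}
    for act in activities:
        pred = act["predecessor"]
        if pred is None:
            start = 0.0  # commencement of work on site
        else:
            p_start, p_end = schedule[pred]
            start = p_start + act["pct_lag"] / 100.0 * (p_end - p_start)
        schedule[act["id"]] = (start, start + act["duration"])
    overall = max(end for _, end in schedule.values())
    return schedule, overall


activities = [
    {"id": "Site set-up", "predecessor": None, "duration": 2, "pct_lag": 0},
    {"id": "Foundations", "predecessor": "Site set-up", "duration": 7, "pct_lag": 50},
    {"id": "Frame", "predecessor": "Foundations", "duration": 8, "pct_lag": 60},
    {"id": "Roof", "predecessor": "Frame", "duration": 4, "pct_lag": 100},
    {"id": "External works", "predecessor": "Foundations", "duration": 6, "pct_lag": 200},
]
schedule, overall = build_programme(activities)
print(overall)  # 21.0
```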

The predicted data were then automatically linked from Microsoft Excel and transferred into Microsoft Project, where they could be developed further in line with other research topics discussed in other papers.

Calculation of S-curves

The programme and cost data predicted by the regression model were transferred into Microsoft Project, which calculated the resulting monthly costs and hence the cumulative running and total costs for each standardised project. An S-curve was then fitted for each of the 50 projects using the logit transformation and standardised over a set duration of ten time periods so that it could be represented graphically. Figure 1 shows an example of the S-curve generated for a typical project. The logit transformation has been confirmed as an appropriate and accurate method for fitting S-curves; for more details, see Kenley and Wilson (1986).
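The fitting step can be sketched in Python. The sketch assumes the logit form commonly attributed to Kenley and Wilson (1986), logit(v) = a + b·logit(t), where t and v are the fractions of total duration and total cost. The paper does not state the exact equation it used, so this form is an assumption. The parameters a and b are found by simple least squares on the transformed points. Points at exactly 0 or 1 must be excluded because their logit is infinite. The data are synthetic.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def fit_logit_curve(times, values):
    """Fit logit(v) = a + b * logit(t) by simple least squares.

    `times` and `values` are fractions of total duration and total cost,
    strictly between 0 and 1.
    """
    xs = [logit(t) for t in times]
    ys = [logit(v) for v in values]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def s_curve(a, b, t):
    """Cumulative cost fraction at time fraction t (0 < t < 1)."""
    return 1.0 / (1.0 + math.exp(-(a + b * logit(t))))


def standardised_curve(a, b, periods=10):
    """Cumulative per cent of cost at the end of each of `periods` equal periods."""
    return [100.0 if k == periods else 100.0 * s_curve(a, b, k / periods)
            for k in range(1, periods + 1)]


ts = [k / 10 for k in range(1, 10)]           # 0.1, 0.2, ..., 0.9
vs = [s_curve(0.3, 1.4, t) for t in ts]       # synthetic "observed" data
a, b = fit_logit_curve(ts, vs)
print(round(a, 6), round(b, 6))               # 0.3 1.4
print([round(p, 1) for p in standardised_curve(a, b)])
```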

Measuring the accuracy of fit

The accuracy of fit of the S-curves for any given project needs to be measured for three reasons: first, to compare the actual and predicted curves of that project with each other; second, to compare a project's S-curves with other curves for the same project type; and third, to compare this model with other published models.

The measure chosen, the "standard deviation about the estimate of Y" or "SDY", was first put forward as a risk index by Jepson (1969) and was later adopted by Berny and Howes (1982). It was used by Kenley and Wilson (1986) in their idiographic value model, and Kaka and Price (1993), Evans et al. (1996), Petros (1996) and Al-Jifri (1998) also used this method. SDY adopts the common measure of dispersion and is calculated as follows:

$$\mathrm{SDY} = \sqrt{\frac{\sum (Y - Y_E)^2}{N}}$$

where $Y$ is the actual value at any accounting period, $Y_E$ is the estimated (or fitted) value, and $N$ is the number of observations/intervals (accounting periods).

This allows models to be compared with one another. The model with the lowest value of SDY has the best fit, and hence is the most accurate.
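The SDY formula translates directly into Python. Below, two hypothetical models are compared against the same actual cumulative percentages; the one with the lower SDY fits better.

```python
import math


def sdy(actual, estimated):
    """Standard deviation about the estimate of Y: sqrt(sum((Y - YE)^2) / N).

    Both sequences hold cumulative values (e.g. per cent of total cost) at the
    same N accounting periods.
    """
    if len(actual) != len(estimated) or not actual:
        raise ValueError("need two equal-length, non-empty sequences")
    ss = sum((y - ye) ** 2 for y, ye in zip(actual, estimated))
    return math.sqrt(ss / len(actual))


# Hypothetical cumulative percentages at six accounting periods.
actual = [5.0, 15.0, 32.0, 55.0, 78.0, 93.0]
model_a = [6.0, 14.0, 30.0, 57.0, 77.0, 94.0]
model_b = [4.0, 18.0, 36.0, 50.0, 80.0, 92.0]
print(round(sdy(actual, model_a), 3))  # 1.414
print(round(sdy(actual, model_b), 3))  # 3.055
```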

Kenley and Wilson (1986) devised a systematic trial-and-error process to locate an optimum exclusion range for their value curves. They concluded that an exclusion range of approximately 10 per cent at both extremes would produce the lowest average SDY value.

The extreme data points in a cash flow analysis are likely to be the least significant, as they involve small amounts of money, but they can dominate the regression analysis, because the logit transformation magnifies values close to 0 and 100 per cent. It therefore makes sense to leave out the data points that fall within the 10 per cent exclusion range at either extreme.

The exclusion range of 10 per cent was also used by Kaka and Price (1993), whereas Evans et al. (1996) used an exclusion range of between 10 and 16 per cent to achieve better results with their data.
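Applying an exclusion range can be sketched in Python. The sketch assumes the range is measured along the time axis: points whose time fraction lies below the lower bound or above 1 minus that bound are dropped before SDY is computed, and boundary points are kept. Both the axis and the boundary handling are assumptions, since neither is specified in the text. The exclusion parameter can be varied to mimic the 10 to 16 per cent ranges mentioned above. The data are hypothetical.

```python
import math


def sdy_with_exclusion(times, actual, estimated, exclusion=0.10):
    """SDY over points whose time fraction t satisfies
    exclusion <= t <= 1 - exclusion (assumed time-axis exclusion)."""
    kept = [(y, ye) for t, y, ye in zip(times, actual, estimated)
            if exclusion <= t <= 1.0 - exclusion]
    if not kept:
        raise ValueError("exclusion range leaves no points")
    ss = sum((y - ye) ** 2 for y, ye in kept)
    return math.sqrt(ss / len(kept))


# Hypothetical fractions of duration and cumulative per cent of cost.
times = [0.05, 0.12, 0.3, 0.5, 0.7, 0.88, 0.95]
actual = [1.0, 4.0, 20.0, 50.0, 80.0, 96.0, 99.0]
estimated = [4.0, 5.0, 21.0, 49.0, 82.0, 95.0, 96.0]
print(round(sdy_with_exclusion(times, actual, estimated), 3))       # 1.265
print(round(sdy_with_exclusion(times, actual, estimated, 0.0), 3))  # 1.927
```

In this example, the two extreme points carry the largest errors, and excluding them lowers the SDY.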

Analysis and testing

The first stage of analysis and testing was to apply the model to the 50 projects used to create it. The regression model results for each of the 50 projects were entered individually into MS Project, and their cash flow profiles were generated. S-curves were fitted for each project and plotted together with the actual curves for a quick visual comparison. The SDY between the modelled and actual curves was then calculated as a final, objective measure of the accuracy of each individual fit.

Figure 1. A typical example of the standardised S-curve over ten time periods

A total of 11 project groupings were identified on the basis of project function (e.g. offices, industrial). Table III shows the SDY results for the 50 sample projects, distributed across the 11 project types. Ignoring "airports", which had a sample size of one, the most accurate predictions were for shopping centres (PF9), with an average SDY of 1.088 per cent. Furthermore, the standard deviation of 0.357 per cent suggests that the variation around this mean is small. The highest mean SDY of any group, 4.735 per cent, came from "hospitals" (PF6). However, this group also had the lowest standard deviation (0.100 per cent), which might suggest strong similarities between the buildings, although the group contained only two projects. "General retail projects" (PF2) contained both the best and the least accurate fits of any individual project (0.533 and 8.852 per cent) and thus had the highest standard deviation of SDY values (2.722 per cent). This possibly indicates a large variation between the buildings in this group.

Although the SDY values were good overall, with a mean SDY for all projects of under 2.5 per cent, some group results were clearly of low significance because of their small sample sizes. Increasing the sample size for each group would therefore give a more realistic measure of accuracy.

The initial results of applying the model were very encouraging, but a more rigorous test is to apply it to a new set of projects used neither in the initial analysis nor in the development of the model. As mentioned previously, six projects were randomly selected from the initial data set for this purpose. As before, the programme and cost data were generated by entering the projects' characteristics into the regression template, and Microsoft Project was used to process these results into a cash flow prediction. The goodness-of-fit of the predicted curves to the actual curves was assessed; Table IV gives the individual SDY results.

As expected, the test results as a whole were less accurate than those for the original projects, but they were still deemed reliable. The overall mean SDY was a little under 4 per cent (3.885 per cent), and individual projects varied between about 1 per cent and 8 per cent (1.181 and 8.282 per cent).

Comparisons with previous models

To evaluate the accuracy of the present model, its results were compared with those of previous value and cost models. Table V compares the SDY results for individual S-curves.

Table III. The SDY results for the 11 project types for the 50 projects

| Project function | Mean (%) | Maximum (%) | Minimum (%) | Std dev. (%) | Sample size |
|---|---|---|---|---|---|
| PF1 (Offices) | 2.775 | 5.707 | 0.547 | 1.571 | 11 |
| PF2 (Retail) | 3.054 | 8.852 | 0.533 | 2.722 | 7 |
| PF3 (Housing) | 2.431 | 3.512 | 1.359 | 1.006 | 5 |
| PF4 (Education) | 3.042 | 3.661 | 1.619 | 0.830 | 5 |
| PF5 (Leisure) | 1.922 | 3.516 | 0.575 | 1.332 | 5 |
| PF6 (Hospital) | 4.735 | 4.805 | 4.664 | 0.100 | 2 |
| PF7 (Commercial) | 1.552 | 3.970 | 0.581 | 1.619 | 4 |
| PF8 (Industrial) | 2.455 | 5.000 | 1.006 | 2.211 | 4 |
| PF9 (Shopping centre) | 1.088 | 1.430 | 0.627 | 0.357 | 4 |
| PF10 (Airport) | 0.709 | 0.709 | 0.709 | – | 1 |
| PF11 (Museum/Gallery) | 1.559 | 2.099 | 1.019 | 0.764 | 2 |
| Overall | 2.493 | 8.852 | 0.533 | 1.636 | 50 |

A small difference in accuracy compared with the other models might demonstrate that construction projects are unique, and thus that contractors must prepare detailed estimates of the cost and schedule of work in order to forecast cash flow (Kaka and Price, 1993). Conversely, an improvement in accuracy would indicate that the causes of variability in value and cost commitment curves, mentioned earlier, have a significant influence.

Despite producing the lowest individual SDY value (0.250), the Al-Jifri (1998) study was the least accurate, with a high maximum of 17.660 and the highest mean of 7.580. This was mainly because the SDY was calculated for cost elements, because of the limited number of projects, and because of the absence of features connecting the projects, e.g. building type. Kenley and Wilson (1986) produced the most accurate results overall, with the lowest mean of 1.750. This can be attributed to the fact that, although their sample was larger than that of the present study, it contained only two building types. The buildings were therefore more likely to be similar and hence to use comparable building methods and techniques. Similar reasons can be identified for Kaka and Price (1993). Their SDY results were slightly less accurate than those of Kenley and Wilson (1986), although their sample was larger. They used only three building types (commercial, industrial and public), but did create specialised categories such as project size and type of contract.

Table V. A summary of the SDY results of previous models

| Study | Kenley and Wilson (1986) (a) | Kaka and Price (1993) | Petros (1996) | Evans and Kaka (1998) | Al-Jifri (1998) | Present study |
|---|---|---|---|---|---|---|
| Projects used | 72 | 150 | 4 | 29 | 39 | 50 |
| Figures based on | Value | Cost | Cost | Value | Cost | Cost |
| Minimum | 1.030 | 0.475 | 4.308 | 1.413 | 0.250 | 0.533 |
| Maximum | 7.330 | 8.987 | 7.618 | 8.644 | 17.660 | 8.852 |
| Mean | 1.750 | 2.996 | 5.581 | 5.229 | 7.580 | 2.493 |
| Median | 1.652 | 2.736 | 5.199 | 5.699 | 5.430 | 2.099 |
| Std dev. | 0.782 | 1.572 | 1.551 | 1.931 | 5.312 | 1.636 |

Note: (a) As Kenley and Wilson (1986) analysed a total of 72 projects spread into two sample groups, i.e. commercial and industrial, it was decided to combine the findings to produce one set of overall results. This would be a slightly more accurate indicator for comparison, as the present model encompasses a variety of building types.

Table IV. The SDY results for the six test projects

| Test project number and function | Individual SDY |
|---|---|
| 1 (Offices) | 2.623 |
| 2 (Industrial) | 7.674 |
| 3 (Retail) | 1.870 |
| 4 (Commercial) | 1.677 |
| 5 (Commercial) | 8.282 |
| 6 (Offices) | 1.181 |
| Mean | 3.885 |
| Std dev. | 3.210 |


The present cost curve model produced more accurate SDY results overall than four of the five previous models. This calls into question the conclusion that the cost commitment curves proposed by Kaka and Price (1993) are the most reliable. Its minimum SDY value of 0.533 was lower than the minima for Kenley and Wilson (1986), Evans and Kaka (1998) and Petros (1996), which were 1.030, 1.413 and 4.308 respectively. It also compared very well with the minima of the other two studies. Likewise, its maximum value of 8.852 compared very well with those of the more accurate models. Apart from the Kenley and Wilson (1986) study, the present model produced the smallest mean and median SDY, 2.493 and 2.099 respectively, which confirms the overall accuracy of this study. This is reinforced by the relatively small standard deviation of 1.636, which shows that the SDY values of the individual projects do not diverge greatly from the low mean. In their analysis of the Kenley and Wilson (1986) study, Kaka and Price (1993) concluded that a group containing a single building type would show less variability in project S-curves. This emphasises the accuracy of the present study, which encompasses 11 building groups.
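The "four out of five" comparison can be checked directly against the mean and median columns of Table V. The sketch below assumes that "more accurate" means a lower mean and a lower median SDY. The dictionary and function names are hypothetical.

```python
# Mean and median SDY for each previous study, transcribed from Table V.
STUDIES = {
    "Kenley and Wilson (1986)": (1.750, 1.652),
    "Kaka and Price (1993)": (2.996, 2.736),
    "Petros (1996)": (5.581, 5.199),
    "Evans and Kaka (1998)": (5.229, 5.699),
    "Al-Jifri (1998)": (7.580, 5.430),
}
PRESENT = (2.493, 2.099)  # mean and median SDY of the present study


def outperformed(present: tuple[float, float], others: dict) -> list[str]:
    """Studies whose mean and median SDY both exceed those of the present model."""
    return [name for name, (mean, median) in others.items()
            if mean > present[0] and median > present[1]]


beaten = outperformed(PRESENT, STUDIES)
print(f"{len(beaten)} of {len(STUDIES)} studies have a higher mean and median SDY")
```

Only Kenley and Wilson (1986) falls outside the list, which is consistent with the discussion above.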

Conclusions

A model that uses cost curves to produce standard S-curves has been developed and successfully tested. Previous value and cost models have been discussed and the inadequacies of each established. Kaka and Price (1993) produced a cost commitment curve that was said to eliminate some of the causes of these inadequacies, and concluded that the cost commitment model would be more reliable. Petros (1996) concluded that, although such curves were slightly more reliable, their accuracy was still in question. With this in mind, cost curves were chosen as the basis of the present model, as they are generally more accurate than value curves, for the reasons given earlier.

A total of 50 projects were used in the development of the regression model. Before this, the programmes of work for all of the projects were standardised so that they could be compared more accurately and realistically. This involved producing a set of between 20 and 39 standardised activities (depending on the number of storeys), into which sub-activity costs and durations were amalgamated. These data were then used to develop a multiple linear regression model that uses the characteristics of each project to predict each activity’s cost, duration, and start and end dates. Hence, the whole programme of works can be predicted, and from it the cumulative and monthly cash flow can be produced.
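The regression step can be sketched as one ordinary least-squares model per predicted quantity (an activity's cost, duration, start or end), with project characteristics as the predictors. This is a minimal sketch and not the authors' regression template. The predictors (gross floor area and number of storeys), the data and the function names are hypothetical. For self-containment, the normal equations are solved directly rather than with a statistics package.

```python
def fit_ols(X: list[list[float]], y: list[float]) -> list[float]:
    """Ordinary least squares with an intercept, solved via the normal equations.

    Returns [b0, b1, ..., bk] for y ~ b0 + b1*x1 + ... + bk*xk.
    """
    rows = [[1.0, *row] for row in X]  # prepend the intercept column
    k = len(rows[0])
    # Augmented system (X^T X | X^T y).
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        a[col], a[pivot] = a[pivot], a[col]
        piv = a[col][col]
        a[col] = [v / piv for v in a[col]]
        for r in range(k):
            if r != col:
                factor = a[r][col]
                a[r] = [vr - factor * vc for vr, vc in zip(a[r], a[col])]
    return [a[i][k] for i in range(k)]


def predict(coef: list[float], x: list[float]) -> float:
    """Apply fitted coefficients to one project's characteristics."""
    return coef[0] + sum(c * xi for c, xi in zip(coef[1:], x))


if __name__ == "__main__":
    # Hypothetical predictors: gross floor area (thousand m2) and number of storeys.
    features = [[2, 2], [3, 4], [5, 3], [8, 6], [4, 1]]
    # Hypothetical target: cost of one standardised activity (thousand GBP).
    activity_cost = [320, 470, 695, 1100, 545]
    coef = fit_ols(features, activity_cost)
    print([round(c, 3) for c in coef])      # intercept, area, storeys
    print(round(predict(coef, [6, 5]), 1))  # predicted cost for a new project
```

The hypothetical data were generated exactly from cost = 50 + 120 × area + 15 × storeys, so the fit recovers [50.0, 120.0, 15.0] and predicts 845.0 for the new project. Real project data would, of course, leave residuals. Repeating the fit for every activity quantity yields a predicted programme that can then be converted into a cash flow.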

The actual and fitted cost curves were then generated for the 50 projects. The projects were classified according to various criteria and distributed into 11 groups based on project function (building type). The cost curves were then subjected to the logit transformation, and the alpha and beta values derived were used to regenerate the average curves of the 11 groups. To determine the accuracy of the model, the SDY was measured both for the individual fits and against the average curve within each group. The minimum and maximum SDY values were 0.533 and 8.852 respectively, with a mean of 2.493. With regard to the grouping, the shopping centres (PF9) were the most accurate buildings as a whole. They produced the lowest mean SDY (1.088) of any group containing more than one project. By contrast, the hospitals (PF6) produced the highest mean SDY, 4.735.
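The sketch below assumes that the logit transformation takes the form used by Kenley and Wilson (1986): ln(y / (100 - y)) = alpha + beta × ln(x / (100 - x)), where x is the percentage of time elapsed and y is the cumulative percentage of cost. Under that assumption, alpha and beta can be estimated by simple least squares on the transformed points, and the curve can be regenerated from them. The function names are hypothetical. How the group-average curves were formed is not reproduced here.

```python
import math


def logit(p: float) -> float:
    """Logit of a percentage strictly between 0 and 100."""
    return math.log(p / (100.0 - p))


def fit_logit_curve(time_pct: list[float], cost_pct: list[float]) -> tuple[float, float]:
    """Estimate alpha and beta in logit(y) = alpha + beta * logit(x) by least squares.

    Points at 0 % or 100 % are skipped because their logit is infinite.
    """
    pts = [(logit(x), logit(y)) for x, y in zip(time_pct, cost_pct)
           if 0 < x < 100 and 0 < y < 100]
    if len(pts) < 2:
        raise ValueError("need at least two interior points")
    n = len(pts)
    mean_u = sum(u for u, _ in pts) / n
    mean_v = sum(v for _, v in pts) / n
    s_uu = sum((u - mean_u) ** 2 for u, _ in pts)
    s_uv = sum((u - mean_u) * (v - mean_v) for u, v in pts)
    beta = s_uv / s_uu
    alpha = mean_v - beta * mean_u
    return alpha, beta


def logit_curve(alpha: float, beta: float, x: float) -> float:
    """Regenerate the cumulative cost percentage at time percentage x."""
    if x <= 0:
        return 0.0
    if x >= 100:
        return 100.0
    f = alpha + beta * logit(x)
    return 100.0 / (1.0 + math.exp(-f))


if __name__ == "__main__":
    # Hypothetical cost curve generated from alpha = 0.3, beta = 1.4.
    times = list(range(0, 101, 10))
    costs = [logit_curve(0.3, 1.4, t) for t in times]
    alpha, beta = fit_logit_curve(times, costs)
    print(round(alpha, 6), round(beta, 6))
```

Because the example curve is generated from alpha = 0.3 and beta = 1.4, the fit recovers those values. With alpha = 0, the regenerated curve passes through 50 per cent cost at 50 per cent time, whatever the value of beta.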

The model was tested on a further six projects that were not included in the development of the multiple linear regression model, which provided a more objective test of its accuracy. As expected, the SDY results were slightly less accurate but very similar, with a minimum of 1.181, a maximum of 8.282 and a mean of 3.885. This reinforced the initial conclusion that the model was reliable.

The results were compared with those of previous value and cost models to establish the effectiveness of the present model. A total of five models were identified and used in the comparison. In general, the present model proved more accurate than four of the five. The most accurate model overall, that of Kenley and Wilson (1986), covered only two building types, so its results would be expected to be accurate because of the specific nature of the data. Apart from that study, the present model produced the lowest mean SDY (2.493) and median SDY (2.099), confirming its overall accuracy. Its results were also more accurate than those of the cost commitment model, which Kaka and Price (1993) proposed as an alternative to value curves. The majority of the SDY values for that model ranged between 3.2 and 8.4, compared with figures of between 4.8 and 11.6 for the Kenley and Wilson (1986) study. The bulk of the SDY values for the present model ranged from 1.580 to 3.493, with a median of 2.099. This confirms that better overall results are achieved with the multiple linear regression cost model.

The size of the data sample may explain why the results are not significantly lower. This limited difference in accuracy compared with the other models supports the conclusion that the construction projects in the sample are unique. It also suggests that they can be modelled more easily via a standardised works programme that consists of the same base activities for each project. The research has therefore shown that, subject to reservations about the moderate sample size used in development, a multiple linear regression model based on the framework of a standardised works programme can be used to model S-curves accurately. It provides a first step and a basis for further research. Applying the model to a larger number of projects would help to determine its true accuracy.

References

Al-Jifri, M.A. (1998), “A study of the accuracy of calculating cost flow curves for individual cost element”, unpublished MArch thesis, University of Liverpool, Liverpool.

Allsop, P. (1980), “Cash flow and resource aggregation from estimator’s data (computer program CAFLARR)”, MSc in Construction Management project report, Loughborough University of Technology, Loughborough.

Ameen, J.R.M., Neale, R.H. and Abrahamson, M. (2003), “An application of regression analysis to quantify a claim for increased costs”, Construction Management and Economics, Vol. 21, pp. 159-65.

Balkau, B.J. (1975), “A financial model for public works programmes”, paper presented at the National ASOR Conference, Sydney, 25-27 August.

Berny, J. and Howes, R. (1982), “Project management control using real time budgeting and forecasting models”, Construction Papers, Vol. 2, pp. 19-40.

Bromilow, F.J. and Henderson, J.A. (1977), Procedures for Reckoning the Performance of Building Contracts, 2nd ed., CSIRO, Division of Building Research, Highett.

Draper, N.R. and Smith, H. (1998), Applied Regression Analysis, 3rd ed., John Wiley & Sons, New York, NY.

Evans, R.C. and Kaka, A.P. (1998), “Analysis of the accuracy of standard/average value curves using food retail building projects as case studies”, Engineering, Construction and Architectural Management, Vol. 5, pp. 42-5.


Evans, R., Kaka, A.P. and Lewis, J. (1996), “Development of a computer based model to assist contractors to plan, forecast and control cash flow”, ARCOM: Proceedings of the 12th Annual Conference and Annual General Meeting, Vol. 2, pp. 556-65.

Hudson, K.W. (1978), “DHSS expenditure forecasting method”, Chartered Surveyor – Building and Quantity Surveying Quarterly, Vol. 5, pp. 42-5.

Jepson, W.B. (1969), “Financial control of construction and reducing the element of risk”, Contract Journal, 24 April, pp. 862-4.

Kaka, A.P. (1994), “Contractors’ financial budgeting using computer simulation”, Construction Management and Economics, Vol. 14, pp. 35-44.

Kaka, A.P. (1999), “The development of a benchmark model that uses historical data for monitoring the progress of current construction projects”, Engineering, Construction and Architectural Management, Vol. 6 No. 3, pp. 256-66.

Kaka, A.P. and Price, A.D.F. (1993), “Modelling standard cost commitment curves for contractors’ cash flow forecasting”, Construction Management and Economics, Vol. 11, pp. 271-83.

Kaka, A.P. and Price, A.D.F. (1996), “Towards more flexible and accurate cash flow forecasting”, Construction Management and Economics, Vol. 14, pp. 35-44.

Kenley, R. and Wilson, O. (1986), “A construction project cash flow model – an idiographic approach”, Construction Management and Economics, Vol. 4, pp. 213-32.

Khosrowshahi, F. (1991), “Simulation of expenditure patterns of construction projects”, Construction Management and Economics, Vol. 9, pp. 113-32.

Miskawi, Z. (1989), “An S-curve equation for project control”, Construction Management and Economics, Vol. 7, pp. 115-25.

Nkado, R.N. (1992), “Construction time information system for the building industry”, Construction Management and Economics, Vol. 10, pp. 489-509.

Oliver, J.C. (1984), “Modelling cash flow projections using a standard micro-computer spreadsheet program”, MSc project in Construction Management, Loughborough University of Technology, Loughborough.

Petros, H.S. (1996), “An investigation into the effects of construction planning on cost flow curves: a case study”, PhD thesis, Department of Architecture and Building Engineering, University of Liverpool, Liverpool.

Singh, S. and Lakanathan, G. (1992), “Computer based cash flow model”, Proceedings of the 36th Annual Trans., American Association of Cost Engineers, Morgantown, WV, pp. R5.1-R5.14.

Weisberg, S. (1980), Applied Linear Regression, John Wiley & Sons, Chichester.
