In the parametric method, a statistical relationship is developed between historical costs and program, physical, and performance characteristics. The method is sometimes referred to as a top-down approach. Types of physical characteristics used for parametric estimating are weight, power, and lines of code. Other program and performance characteristics include site deployment plans for information technology installations, maintenance plans, test and evaluation schedules, technical performance measures, and crew size. These are just some examples of what could be a cost driver for a particular program.
Sources for these cost drivers are often found in the technical baseline, cost analysis requirements document or cost analysis data requirement. The important thing is that the attributes used in a parametric estimate should be cost drivers of the program. The assumption driving the parametric approach is that the same factors that affected cost in the past will continue to affect future costs. This method is often used when little is known about a program except for a few key characteristics like weight or volume.
Using a parametric method requires access to historical data, which may be difficult to obtain. If the data are available, they can be used to determine the cost drivers and to provide statistical results and can be adjusted to meet the requirements of the new program. Unlike an analogy, parametric estimating relies on data from many programs and covers a broader range. Confidence in a parametric estimate’s results depends on how valid the relationships are between cost and the physical attributes or performance characteristics. Using this method, the cost estimator must always present the related statistics, assumptions, and sources for the data.
The goal of parametric estimating is to create a statistically valid cost estimating relationship using historical data. The parametric CER can then be used to estimate the cost of the new program by entering its specific characteristics into the parametric model. CERs established early in a program’s life cycle should be continually revisited to make sure they are current and the input range still applies to the new program. In addition, parametric CERs should be well documented, because serious estimating errors could occur if the CER is improperly used.
Parametric techniques can be used in a wide variety of situations, ranging from early planning estimates to detailed contract negotiations. It is always essential to have an adequate number of relevant data points, and care must be taken to normalize the dataset so that it is consistent and complete. In software, the development environment (that is, the extent to which the requirements are understood and the strength of the programmers’ skill and experience) is usually the major cost driver. Because parametric relationships are often used early in a program, when the design is not well defined, design changes can easily be reflected in the estimate simply by adjusting the values of the input parameters. It is important to make sure that the program attributes being estimated fall within (or, at least, not far outside) the range of the CER dataset. For example, if a new software program was expected to contain 1 million lines of code and the data points for a software CER were based on programs ranging from 10,000 to 250,000 lines of code, it would be inappropriate to use the CER to estimate the new program.
To develop a parametric CER, cost estimators must determine the cost drivers that most influence cost. After studying the technical baseline and analyzing the data through scatter charts and other methods, the cost estimator should verify the selected cost drivers by discussing them with engineers. The CER can then be developed with a mathematical expression, which can range from a simple rule of thumb (for example, dollars per pound) to a complex regression equation.
The more simplified CERs include rates, factors, and ratios. A rate uses a parameter to predict cost, using a multiplicative relationship. Since rate is defined to be cost as a function of a parameter, the units for rate are always dollars per something. The rate most commonly used in cost estimating is the labor rate, expressed in dollars per hour.
A factor uses the cost of another element to estimate a new cost using a multiplier. Since a factor is defined to be cost as a function of another cost, it is often expressed as a percentage. For example, travel costs may be estimated as 5 percent of program management costs.
A ratio is a function of another parameter and is often used to estimate effort. For example, the cost to build a component could be based on the industry standard of 20 hours per subcomponent.
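The three simplified CER types described above can be sketched in a few lines of Python. This is an illustration only: the function names, the $100 labor rate, the $2,000,000 program management cost, and the count of 12 subcomponents are assumptions made for the example, while the 5 percent travel factor and the 20 hours per subcomponent come from the text.

```python
def cost_from_rate(quantity, dollars_per_unit):
    """Rate: cost as a function of a parameter (dollars per something)."""
    return quantity * dollars_per_unit


def cost_from_factor(base_cost, factor):
    """Factor: cost as a fraction of another cost element."""
    return base_cost * factor


def effort_from_ratio(count, hours_per_item):
    """Ratio: effort (here, hours) as a function of another parameter."""
    return count * hours_per_item


# Assumed inputs, invented for illustration only.
labor_rate = 100.0                     # dollars per hour
program_management_cost = 2_000_000.0  # dollars
subcomponents = 12

# Ratio from the text: 20 hours per subcomponent.
build_hours = effort_from_ratio(subcomponents, 20)
# Rate: convert the hours to dollars with the labor rate.
build_cost = cost_from_rate(build_hours, labor_rate)
# Factor from the text: travel as 5 percent of program management cost.
travel_cost = cost_from_factor(program_management_cost, 0.05)

print(f"Build effort: {build_hours} hours")
print(f"Build cost:   ${build_cost:,.0f}")
print(f"Travel cost:  ${travel_cost:,.0f}")
```

Note that a ratio and a rate are often chained, as here: the ratio produces hours, and the labor rate turns those hours into dollars.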
Rates, factors, and ratios are often the result of simple calculations (like averages) and many times do not include statistics. Table 14 contains a parametric cost estimating example.
Table 14: An Example of the Parametric Cost Estimating Method

Program attribute | Calculation
A cost estimating relationship (CER) for site activation (SA) is a function of the number of workstations (NW) | SA = $82,800 + ($26,500 x NW)
Data range for the CER | 7–47 workstations based on 11 data points
Cost to site activate a program with 40 workstations | $82,800 + ($26,500 x 40) = $1,142,800

Source: © 2003, Society of Cost Estimating and Analysis (SCEA), “Costing Techniques.”
In table 14, the number of workstations is the cost driver. The equation is linear but has both a fixed component (that is, $82,800) and a variable component (that is, $26,500 x NW).
In addition, the range of the data is from 7 to 47 workstations, so it would be inappropriate to use this CER for estimating the activation cost of a site with as few as 2 or as many as 200 workstations. In fact, at one extreme, the CER estimates a cost of $82,800 for no workstation installations, which is not logical. Although we do not show any CER statistics for this example, the CERs should always be presented with their statistics. The reason for this is to enable the cost estimator to understand the level of variation within the data and model its effect with uncertainty analysis.
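The table 14 CER and its data range can be expressed as a short Python sketch. The coefficients and the 7 to 47 workstation range come from table 14; the function and constant names are assumptions, and returning an in-range flag alongside the cost is only one possible way to guard against extrapolation.

```python
FIXED_COST = 82_800            # fixed component of the CER, in dollars
COST_PER_WORKSTATION = 26_500  # variable component, dollars per workstation
DATA_RANGE = (7, 47)           # workstations covered by the 11 data points


def site_activation_cost(workstations):
    """Return (estimated cost, whether the input lies inside the CER data range)."""
    low, high = DATA_RANGE
    in_range = low <= workstations <= high
    cost = FIXED_COST + COST_PER_WORKSTATION * workstations
    return cost, in_range


for nw in (40, 2, 200):
    cost, in_range = site_activation_cost(nw)
    note = "" if in_range else "  (outside the CER data range; do not rely on this)"
    print(f"{nw:>3} workstations: ${cost:,}{note}")
```

For 40 workstations the sketch reproduces the $1,142,800 shown in table 14, and it flags the 2 and 200 workstation cases as extrapolations.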
CERs should be developed using regression techniques, so that statistical inferences may be drawn. To perform a regression analysis, the first step is to determine what relationship exists between cost (dependent variable) and its various drivers (independent variables). This relationship is determined by developing a scatter chart of the data. If the data are linear, they can be fit by a linear regression. If they are not linear and transformation of the data does not produce a linear fit, nonlinear regression can be used. The independent variables should have a high correlation with cost and should be logical. For example, software complexity can be considered a valid driver of the cost of developing software. The ultimate goal is to create a fit with the least variation between the data and the regression line. This process helps minimize the statistical error or uncertainty brought on by the regression equation.
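A fit with the least variation between the data and the regression line is commonly obtained with ordinary least squares. The sketch below assumes a single cost driver and a linear relationship. The 11 data points are invented for illustration and are not the dataset behind table 14, and the function names are hypothetical.

```python
def fit_linear_cer(drivers, costs):
    """Least-squares fit of cost = intercept + slope * driver."""
    n = len(drivers)
    if n != len(costs) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(drivers) / n
    mean_y = sum(costs) / n
    sxx = sum((x - mean_x) ** 2 for x in drivers)
    if sxx == 0:
        raise ValueError("driver values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(drivers, costs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def sum_squared_error(drivers, costs, intercept, slope):
    """Variation between the data and the line; least squares minimizes this."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(drivers, costs))


# Hypothetical observations: workstations per site and activation cost in dollars.
workstations = [7, 10, 14, 18, 22, 26, 30, 35, 39, 43, 47]
activation_cost = [290_000, 330_000, 470_000, 540_000, 700_000, 750_000,
                   880_000, 1_050_000, 1_090_000, 1_240_000, 1_310_000]

intercept, slope = fit_linear_cer(workstations, activation_cost)
sse = sum_squared_error(workstations, activation_cost, intercept, slope)
print(f"Fitted CER: cost = {intercept:,.0f} + {slope:,.0f} x NW")
print(f"Sum of squared errors: {sse:,.0f}")
```

Any other intercept and slope would give a larger sum of squared errors for the same data, which is the sense in which the fitted line has the least variation.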
The purpose of the regression is to predict with known accuracy the next real-world occurrence of the dependent variable (or the cost), based on knowledge of the independent variable (or some physical, operational, or program variable). Once the regression is developed, the statistics associated with the relationship must be examined to see if the CER is a strong enough predictor to be used in the estimate. Most statistics can be easily generated with the regression analysis function of spreadsheet software. Among important regression statistics are
■ R-squared,
■ statistical significance,
■ the F statistic, and
■ the t statistic.
R-squared
The R-squared (R²) value measures the strength of the association between the independent and dependent (or cost) variables. The R² value ranges between 0 and 1, where 0 indicates that there is no relationship between cost and its independent variable, and 1 means that there is a perfect relationship between them. Thus, the higher the R² value, the better. An R² of 91 percent for the CER in table 14, for example, would mean that the number of workstations (NW) explains 91 percent of the variation in site activation costs, indicating that it is a very good cost driver.
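As a sketch of how this value is computed for a single-driver linear CER, R-squared can be written as 1 minus the ratio of the unexplained (residual) variation to the total variation in cost. The function name and the data are assumptions. The data points are invented and will not reproduce the 91 percent used as an illustration above.

```python
def r_squared(drivers, costs):
    """R-squared of a least-squares line fitted to (driver, cost) pairs."""
    n = len(drivers)
    mean_x = sum(drivers) / n
    mean_y = sum(costs) / n
    sxx = sum((x - mean_x) ** 2 for x in drivers)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(drivers, costs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Variation left unexplained by the fitted line.
    ss_residual = sum((y - (intercept + slope * x)) ** 2
                      for x, y in zip(drivers, costs))
    # Total variation of cost around its mean.
    ss_total = sum((y - mean_y) ** 2 for y in costs)
    return 1.0 - ss_residual / ss_total


# Hypothetical observations: workstations per site and activation cost in dollars.
workstations = [7, 10, 14, 18, 22, 26, 30, 35, 39, 43, 47]
activation_cost = [290_000, 330_000, 470_000, 540_000, 700_000, 750_000,
                   880_000, 1_050_000, 1_090_000, 1_240_000, 1_310_000]

print(f"R-squared: {r_squared(workstations, activation_cost):.3f}")
```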
Statistical Significance
Statistical significance is the most important factor for deciding whether a statistical relationship is valid. An independent variable can be considered statistically significant if there is a small probability that its corresponding coefficient is equal to zero, because a coefficient of zero would indicate that the independent variable has no relationship to cost. Thus, it is desirable that the probability that the coefficient is equal to zero be as small as possible. How small is denoted by a predetermined value called the significance level. For example, a significance level of .05 would mean accepting a 5 percent probability of concluding that a variable is related to cost when it actually is not. Statistical significance is assessed for both the regression as a whole and each regression variable.
F Statistic
The F statistic is used to judge whether the CER as a whole is statistically significant by testing whether all of the variables’ coefficients are equal to zero. The F statistic is defined as the ratio of the equation’s regression mean square to its mean squared error, also called the residual mean square. The higher the F statistic is, the better the regression, but it is the level of significance that is important.
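The ratio just described can be sketched as follows. The function name and the three-point dataset are assumptions. The predicted values must come from a least-squares fit that includes an intercept, and the level of significance still has to be read from an F table or statistical software, which this sketch does not do.

```python
def f_statistic(actual, predicted, num_drivers):
    """Regression mean square divided by mean squared error.

    `predicted` must come from a least-squares fit with an intercept;
    `num_drivers` is the number of independent variables in the CER.
    """
    n = len(actual)
    df_residual = n - num_drivers - 1
    if df_residual <= 0:
        raise ValueError("not enough data points for this many drivers")
    mean_actual = sum(actual) / n
    ss_regression = sum((p - mean_actual) ** 2 for p in predicted)
    ss_residual = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ms_regression = ss_regression / num_drivers
    ms_error = ss_residual / df_residual
    return ms_regression / ms_error


# Hypothetical toy data: costs in thousands of dollars for sites with
# 10, 20, and 30 workstations. The least-squares line for these points is
# cost = 20 + 9 x NW, which gives the fitted values below.
costs_k = [100, 220, 280]
fitted_k = [110, 200, 290]

print(f"F statistic: {f_statistic(costs_k, fitted_k, num_drivers=1):.1f}")
```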
t Statistic
The t statistic is used to judge whether individual coefficients in the equation are statistically significant. It is defined as the ratio of the coefficient’s estimated value to its standard error (the estimated standard deviation of the coefficient). As with the F statistic, the larger the t statistic is in absolute value, the better, but it is the level of significance that is important.
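For a single-driver linear CER, the t statistic of the slope coefficient can be sketched as below. The function name and the 11 data points are assumptions, and the data are invented rather than taken from table 14. The critical value of 2.262 is the standard two-sided t table entry for a .05 significance level with 9 degrees of freedom (11 points minus 2 estimated coefficients). A different dataset size or significance level would need a different table entry.

```python
import math


def slope_t_statistic(drivers, costs):
    """t statistic of the slope in a least-squares fit of cost on one driver."""
    n = len(drivers)
    if n < 3:
        raise ValueError("need at least three observations")
    mean_x = sum(drivers) / n
    mean_y = sum(costs) / n
    sxx = sum((x - mean_x) ** 2 for x in drivers)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(drivers, costs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_residual = sum((y - (intercept + slope * x)) ** 2
                      for x, y in zip(drivers, costs))
    ms_error = ss_residual / (n - 2)           # two coefficients were estimated
    standard_error = math.sqrt(ms_error / sxx)  # standard error of the slope
    return slope / standard_error


# Hypothetical observations: workstations per site and activation cost in dollars.
workstations = [7, 10, 14, 18, 22, 26, 30, 35, 39, 43, 47]
activation_cost = [290_000, 330_000, 470_000, 540_000, 700_000, 750_000,
                   880_000, 1_050_000, 1_090_000, 1_240_000, 1_310_000]

CRITICAL_T = 2.262  # two-sided, .05 significance level, 9 degrees of freedom

t_value = slope_t_statistic(workstations, activation_cost)
print(f"t statistic for the slope: {t_value:.2f}")
print("significant at the .05 level:", abs(t_value) > CRITICAL_T)
```

In a regression with only one independent variable, the square of this t statistic equals the F statistic, so the two tests lead to the same conclusion.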