Cost Estimation Methods For Software Engineering


By

Andre Ladeira

Dissertation submitted in partial fulfillment of the requirements for the degree

Magister Ingeneriae in Engineering Management in the Faculty of Engineering

at the

Rand Afrikaans University

Supervisor: Prof L Pretorius

January 2002


Executive Summary

This dissertation summarizes several classes of software cost estimation models and techniques. Experience to date indicates that expertise-based techniques are less mature than the other classes of techniques (algorithmic models), but that all classes of techniques are challenged by the rapid pace of change in software technology. The primary conclusion is that no single technique is best for all situations, and that a careful comparison of the results of several approaches is most likely to produce realistic estimates.

As pressure for accurate cost estimation increases, research attention is now directed at gaining a better understanding of the software-engineering process as well as constructing and evaluating software cost estimation tools. This dissertation evaluated four of the most popular algorithmic models used to estimate software cost (SLIM, COCOMO II, Function Points and SLOC).

This dissertation also provides an overview of the baseline cost estimation model tailored to these new forms of software engineering. The major new modeling capabilities are an adaptable family of software sizing models, involving Function Points and Source Lines of Code. These models are serving as a framework for an extensive current data collection and analysis effort to further refine and calibrate the model's estimation capabilities.

Index

Chapter 1 Introduction
1.1. Background to the problem
1.2. Background literature
1.3. Problem Statement
1.4. Research objective
1.5. Conclusion

Chapter 2 Estimation Processes
2.1. Software Cost
2.2. Software Cost Estimation Process
2.3. Estimation and the software process
2.4. Inputs and Outputs to the Estimation Process
2.5. The Estimation Process
2.6. Timing of the estimates
2.7. Estimation Constraints
2.8. Data gathering
2.9. Problems with the Cost Estimation Process
2.10. Problems with Requirements
2.11. Conclusion

Chapter 3 Size Estimation
3.1 Lines of code
3.2 Function Point Analysis

Chapter 4 Estimation Methods
4.1. Software Life-cycle Management (SLIM) method
4.2. Constructive Cost Model (COCOMO II)
4.3. Expertise-Based Technique
4.4. Cost Estimation method
4.5. Conclusion

Chapter 5 Case Study
5.1 Project description
5.2 Project Size estimation
5.3 Effort estimation
5.4 Conclusion

Chapter 6 Conclusions and Recommendations
6.1 Conclusions
6.2 Recommendations
6.3 Further Investigation

References

Appendix A: Scaling Drivers
Appendix B: Architecture / Risk Resolution
Appendix C: Team Cohesion
Appendix D: Process Maturity
Appendix E: Product Complexity
Appendix F: Effort multipliers
Appendix G: Nu Metro Server Technical Specification

List of figures

Figure 1.1: Influencing factors to be evaluated to produce an accurate estimate
Figure 1.2: Information to be used to predict scenarios on future projects
Figure 1.3: Estimation principle
Figure 2.1: Classical view of software estimation process
Figure 3.1: Definition checklist for source statements counts
Figure 4.1: Rayleigh curve

List of tables

Table 1.1: Project levels of complexity
Table 3.1: Function point complexity matrix
Table 3.2: Function point complexity-weight matrix
Table 4.1: Rating scheme for the COCOMO II scale factors
Table 4.2: Effort multipliers cost driving rating for the post-architecture model

Chapter 1 Introduction

1.1. Background to the problem

"If there is one management danger zone to mark above all others, it is software cost estimation."

Robert Glass - Building Software Quality

The reason for the strong emphasis on software engineering cost estimation is that it provides the vital link between the general concepts and techniques of economic analysis and the particular world of software engineering. There is no good way to perform a software cost-benefit analysis, breakeven analysis, or make-or-buy analysis without some reasonably accurate method of estimating software engineering costs, and their sensitivity to various product, project, and environmental factors. Software engineering cost estimation techniques also provide an essential part of the foundation for good engineering management.

Cost in a project is also due to the requirements for software, hardware and human resources. The bulk of the cost of software development is due to the human resources needed, and most cost estimation procedures focus on this aspect. Most cost estimates are determined in terms of person-months (PM).

1.2. Background literature

As the cost of the project depends on the nature and characteristics of the project, at any point, the accuracy of the estimate will depend on the amount of reliable information that is available about the final product [4][27]. When the project is being initiated or during the feasibility study, the analysts have only some idea of the data the system will get and produce and the major functionality of the system. There is a great deal of uncertainty about the actual specifications of the system. As the user specifies the system more fully and accurately, the uncertainties are reduced and more accurate cost estimates can be made. Despite the limitations, cost estimation models have matured considerably and generally give fairly accurate estimates.

By far, the project sizing technique that delivers the greatest accuracy and flexibility is function point analysis [24]. Based upon logical, user-defined requirements, function points permit the early sizing of the software problem domain. In addition, the function point methodology presents the opportunity to size a user requirement regardless of the level of detail available. An accurate function point size can be determined from the detailed information included in a thorough user requirements document, or an adequate function point size can be derived from the limited information provided in an early proposal.

An alternative sizing method is counting lines of code [20]. It is dependent upon information that is not available until later in the development life cycle. Function points accurately size the stated requirement, but if the problem domain is not clearly or fully defined, the project will not be properly sized. When there are missing, brief, or vague requirements, a simple process using basic diagramming techniques with the requesting user can be executed to more fully define the requirements.

In addition to the project size, project complexity must be properly evaluated [Matson]. To some extent, complexity levels are evaluated by 14 general system characteristics, which include:

• Data communication

• Distributed data processing

• Performance

• Transaction rate

• Online data entry

• End-user efficiency

• Online update

• Complex processing

• Reusability

• Installation ease

• Operational ease

• Multiple sites

The assessment of a project's complexity should also take into consideration complex interfaces, database structures, and contained algorithms. The assessment of complexity can be based upon five varying levels of complexity as shown in Table 1.1:

Level 1:
• Simple addition/subtraction
• Simple logical algorithms
• Simple data relationships

Level 2:
• Many calculations, including multiplication/division in series
• More complex nested algorithms
• Multidimensional data relationships

Level 3:
• Significant number of calculations typically contained in payroll/actuarial/rating/scheduling applications
• Complex nested algorithms
• Multidimensional and relational data relationships with a significant number of attributive and associative relationships

Level 4:
• Differential equations typical
• Fuzzy logic
• Extremely complex logical and mathematical algorithms typically seen in military/telecommunications/real-time/automated process control/navigation systems
• Extremely complex data

Level 5:
• Online, continuously available, critically timed
• Event-driven outputs that occur simultaneously with inputs
• Buffer area or queue to determine processing priorities
• Memory, timing, and communication constraints

Table 1.1: Project levels of complexity

Figure 1.1: Influencing factors to be evaluated to produce an accurate estimate [19]. (Figure not reproduced; legible labels: Business Impact, Development Staff, Automation, Skills, Customer Involvement.)

Figure 1.2: Information to be used to predict scenarios on future projects. (Figure not reproduced; legible labels: Estimate Project Completion, Size, Complexity, Influence Factors, Rate of Delivery, Select a Baseline Profile, Create Profile, Baseline of Performance, Time to Market, Defects.)

An organization should develop profiles that reflect the rate of delivery for a project of a given size, complexity, and risk factors [12].

1.3. Problem Statement

At the core of the estimating challenge are two issues [14]: the need to understand and express (as early as possible) the engineering problem domain, and the need to understand the capability to deliver the required software solution within a specified environment. Only then is it possible to accurately predict the effort required to deliver a project.

The current engineering problem domain can be defined simply as the scope of the required software. The problem domain must be accurately assessed for its size and complexity. To complicate the situation, experience shows that at the point in time that an initial estimate is required (early in the system's life cycle) it cannot be presumed that all the necessary information is available. Therefore, there must be a rigorous process that permits a further clarification of the problem domain.

An effective estimating model considers three elements: size, complexity, and risk factors. When factored together, they result in a more accurate cost estimate (see Figure 1.3).

Figure 1.3: Estimation principle. (Figure not reproduced; legible content: Project Size × Project Complexity × Risk Factors = Estimates (Schedule, Effort, Costs), with the annotations "Definition" and "Capability".)

1.4. Research objective

The objectives of this dissertation are as follows:

• Determining the software engineering cost estimation principle and process.

• Investigating different size estimation methods and determining the difference between the different methods.

• Investigating different effort estimation methods or techniques and determining the difference between the methods or techniques.

1.5. Conclusion

The structure of the research dissertation will be as follows:

• Chapter two will cover a comprehensive literature review of the subject matter and related fields.

• Chapter three will cover an investigation into two different size estimation methods (function points calculation and source line of code count).

• Chapter four will cover an investigation into three different effort and cost estimation methods used in the software engineering industry.

• Chapter five will cover a case study where the estimation methods were used.

• Chapter six contains the conclusion and recommendations on the findings made.


Chapter 2 Estimation Processes

2.1. Software Cost

Despite the terminology, software engineering cost does not refer directly to a monetary value associated with software development. Such a value is almost impossible to arrive at and not always useful. The questions are "What's the effort involved?" and "How long will it take?" The answers to these two questions can then be translated to the monetary value. This leads to the following definition of software cost.

Software cost consists of three elements [9]:

• Manpower loading is the number of engineering and management personnel allocated to the project as a function of time.

• Effort is defined as the engineering and management effort required to complete a project, usually measured in units such as person-months. The types and the levels of skills for the resources influence the cost of the project.

• Duration is the amount of time (usually measured in months) required to complete the project.
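The relationship between these three elements, and the translation of effort into a monetary value, can be sketched as follows. This is a minimal illustration only: the class name `SoftwareCost`, the flat rate per person-month, and the use of an average staffing level (rather than loading as a full function of time) are assumptions made for the example, not part of the definition in [9].

```python
from dataclasses import dataclass


@dataclass
class SoftwareCost:
    """Effort and duration of a project; loading is derived from them."""
    effort_pm: float        # total effort in person-months
    duration_months: float  # calendar time in months

    def average_loading(self) -> float:
        # Simplification: average head count over the whole project.
        # Real manpower loading varies as a function of time.
        return self.effort_pm / self.duration_months

    def monetary_cost(self, rate_per_pm: float) -> float:
        # Assumed flat cost per person-month (hypothetical figure).
        return self.effort_pm * rate_per_pm


cost = SoftwareCost(effort_pm=24.0, duration_months=8.0)
print(cost.average_loading())        # 3.0
print(cost.monetary_cost(10_000.0))  # 240000.0
```

In practice the rate would differ by skill level, which is why the types and levels of skills of the resources influence the cost.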

Arriving at a cost estimate involves using a number of different factors to try to determine the overall cost of a system. Deciding which factors to include and combining them to arrive at the estimate make up the engineering cost estimation process that is defined as follows [14]:

Direct costs include items such as analysis, design, coding, testing and integration. Depending on who is doing the engineering and why, software cost may also include a number of other items such as training, customer support, installation, level of documentation, configuration management, and quality assurance.

2.2. Software Cost Estimation Process

A software cost estimation process is the set of techniques and procedures that an organization uses to arrive at a software cost estimate. Generally there is a set of inputs to the process (e.g., system requirements) and an output of effort, manpower loading, and/or duration.

It is very difficult to examine the software cost estimation process without the overall context of the software development process in use within a given organization.

The set of procedures, techniques, and standards that an organization uses for organizing, managing, and controlling software development projects is called the software process.

Organizations have different software processes, depending on the type of software they are developing. For many organizations, the development process is very informal; in other cases it is well documented and stringently monitored.

2.3. Estimation and the Software Process

The cost of a project can be estimated for a number of reasons. Why it is done is an important factor in determining when and how it is done. The reasons why a cost estimation process is undertaken include the following [14]:

• Project approval. For every project there must be a decision by the organization to undertake the project. Such a decision requires an estimate of the money and resources required to complete it.

• Project management. Project managers are responsible for planning and control of projects. Both activities require an estimate of the activities required to complete a project and the resources required for each activity.

• Project team understanding. For members of a project team to work together more efficiently on a project, it is necessary that each one understand his/her role in the project and the overall activities of the project. A project task definition, which can be used for this purpose, is generated by a cost estimate.

The "why" of the cost estimation process can be any of the above reasons and is one of the factors determining when the estimate is done. Project approval requires estimates to be performed very early in the project life cycle, often before requirements have been clearly specified. The project approval process typically has a number of points where a "go/no go" decision must be made. At each of these points, an estimate may be required to permit management to make the decision. Early in the project life cycle, these may be approximate order of magnitude estimates sufficient to allow the organization to determine whether they should continue to look at a project. Late in the project, management can get much more detailed estimates of cost to completion in order to decide whether to cancel an ongoing project.

For managing and understanding a project, an estimate can be done early in the development of the project to arrive at an initial estimate, and then repeated on a regular basis during development to keep the estimate current [1][2][3]. For these estimates the prime concern is not necessarily the absolute "cost," but the estimated set of tasks required to complete the project, the results of each of these tasks, how these tasks fit together, and the resources required to complete each task.


Re-estimates are required throughout the development cycle regardless of why the estimate is done. As a project progresses, more information is available on the product and the process being used to develop it. This information can be used to increase the accuracy and detail of the estimate.

2.4. Inputs and Outputs to the Estimation Process

The software cost estimation process computes a set of outputs as a function of a set of inputs. The inputs to the estimation process depend on when the estimate is being performed. Very early estimates are necessarily based on sparse and incomplete data regarding the project and the development process.

Preliminary estimates are needed before requirements are known or architecture has been defined [22]. Such estimates will necessarily be based on sketchy data and will not have a high degree of accuracy. Estimates performed late in the development cycle are based on a much wider set of information. Computing cost to completion late in the development cycle allows a great deal of project and process information to be used. Given that more information is available, more detailed estimates can be made, which have a much greater degree of accuracy than the initial estimates.

Most models of cost estimation view the estimation process as being a function computed from a set of cost drivers. These drivers are assumed to be the characteristics of a system that determine the final cost of production. In most of the advocated cost estimation techniques, the primary cost driver is assumed to be the software requirements [2][3][10]. In this model of software cost estimation (illustrated in Figure 2.1), the requirements are the primary input to the process and form the basis for the estimate. The estimate is then adjusted according to a number of other cost drivers (such as experience of personnel and complexity of system) to arrive at the final estimate.

In this classical view, the effort, duration, and loading are computed as fixed numbers (perhaps with tolerances), or a set of relationships between the values is given, allowing managers to trade off costs in order to minimize any of the three values.

Figure 2.1: Classical view of software estimation process [22]. (Figure not reproduced; legible labels: Cost drivers, Requirement, Other cost drivers, Software cost estimation process, Loading.)

In fact, the cost estimation process can be much more complex than that portrayed in Figure 2.1. There is interdependency between many items of information, all of which are relevant to the cost estimation process (Figure 2.2).

Many of the data items that are inputs to the cost estimation process are modified and output by the process. Thus, rather than viewing the cost estimation process as a function of the requirements, it is often more accurate to view this process as trying to satisfy a set of constraints. The inputs to the system are a set of constraints on the requirements, software architecture, financial resources, etc., while the outputs are a cost estimate and a set of assumptions that satisfy all the constraints.

This view allows the constraints to be imposed on any of the factors that affect the cost. These factors range far beyond requirements to include issues such as delivery date, finances and software process.

Requirements are viewed as constraints that must be satisfied. In a few cases, these requirements are fixed, complete, and correct. In most cases, however, during estimation the estimator detects inconsistencies and ambiguities in the requirements. As part of the estimation process, the estimator will resolve some of these ambiguities by imposing new constraints on the requirements. In other cases, the problems with the requirements remain, with a corresponding effect on the accuracy of the estimate.

Financial, calendar, manpower, architectural, and software process constraints are also significant to the cost estimation process. Financial, calendar, and manpower constraints limit the amount of resources that can be allocated to a project. Financial constraints limit the amount of money that can be budgeted for the project; calendar constraints specify a delivery date that must be met; and manpower constraints limit the number of people that can be allocated to the project. For example, if a fixed amount of money is available for a project, then the estimated cost should satisfy this financial constraint, perhaps by varying the functionality.
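The constraint-satisfaction view can be illustrated with a short sketch that checks a candidate estimate against financial, calendar, and manpower constraints. All names (`Constraints`, `Estimate`, `violated`) and all figures are hypothetical and chosen only for the example; an actual estimation process would weigh many more factors.

```python
from dataclasses import dataclass


@dataclass
class Constraints:
    budget: float           # financial constraint (currency units)
    deadline_months: float  # calendar constraint
    max_staff: float        # manpower constraint


@dataclass
class Estimate:
    effort_pm: float        # person-months
    duration_months: float
    rate_per_pm: float      # assumed flat cost per person-month

    @property
    def cost(self) -> float:
        return self.effort_pm * self.rate_per_pm

    @property
    def staff(self) -> float:
        return self.effort_pm / self.duration_months


def violated(est: Estimate, c: Constraints) -> list[str]:
    """Return the names of the constraints the estimate fails to satisfy."""
    problems = []
    if est.cost > c.budget:
        problems.append("financial")
    if est.duration_months > c.deadline_months:
        problems.append("calendar")
    if est.staff > c.max_staff:
        problems.append("manpower")
    return problems


limits = Constraints(budget=250_000.0, deadline_months=12.0, max_staff=4.0)
full_scope = Estimate(effort_pm=30.0, duration_months=10.0, rate_per_pm=10_000.0)
reduced_scope = Estimate(effort_pm=24.0, duration_months=10.0, rate_per_pm=10_000.0)

print(violated(full_scope, limits))     # ['financial']
print(violated(reduced_scope, limits))  # []
```

The second estimate mirrors the case described above: with a fixed amount of money, the functionality (and hence the effort) is varied until the financial constraint is met.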

The software architecture defines the different components used to construct the system and the interrelationships between these components. The stage in the development life cycle determines whether the software architecture is a factor for the estimation process. For example, maintenance organizations that are working with an existing system are constrained to use the existing architecture and can base their estimates on this architecture.

The cost estimation process for new development may make no assumptions about the software architecture and base the estimate entirely on system functionality. For many larger contracts, the software process becomes one of the constraints that must be satisfied by the estimating process. Many organizations have within their software process a standard Work Breakdown Structure (WBS), which defines the tasks to be performed to complete a project. Frequently, the estimating process will be working under the constraint that the standard WBS must be used for a project. The estimating process will then tailor the WBS to the specific project, adding sufficient detail.

For example, one situation where constraints on the software process affect the estimation process is the requirement to develop according to the ISO 9000 standard. Significant cost is incurred by adhering to this standard; for small changes, ISO 9000 can actually be the dominant cost factor. When estimating a system developed to this standard, estimators must be aware of the cost incurred by use of the standard.

Figure 2.2: Actual cost estimation process [22]. (Figure not reproduced; legible labels: Cost drivers, Constraints, Other inputs, Vague requirements, Other cost drivers, Software cost estimation process, Less vague (and modified) requirements, Loading, Contingency, Tentative WBS, Less fuzzy architecture.)

Aside from the various constraints, other factors that must be included as part of the estimation process are the risks associated with the project. These risks could include, for example, dependency on outside contractors, lack of experience in the application domain, etc. These risk factors should be identified as early as possible in order to include them in the decision making and project management processes.

2.5. The Estimation Process

An estimate is arrived at by taking the identified constraints, applying the estimation process, and generating results that satisfy all the constraints. A variety of techniques are used by different organizations to arrive at these estimates. The processes used can be classified as either model based or analogy based.

Model-based estimation builds a costing model of system development based on the characteristics of the system being built, the process being used to build it, and its development environment.

A model can be a formal mathematical model or a set of informal guidelines used by an estimator. Informal models are used by experienced developers who have gained sufficient knowledge about system development by working on previous projects. The informal model used by such an estimator is expressed as a set of "rules of thumb" or, at an even more primitive level, as a "gut feel" [30]. When questioned as to how they developed their model and how they apply it, estimators are usually unable to say exactly what it is they do. It appears to be an issue of gaining the required experience in order to arrive at accurate estimates.

Formal models attempt to quantify all inputs to the cost estimation process, and then apply a set of equations that describes the relationships between the inputs and the outputs of the cost estimation process. The equations are developed through analysis of historical data and must be calibrated to each individual development environment. The best known formal models are Boehm's COCOMO II [2][4], function points, and Putnam's application of Rayleigh curves to the development process [27].

The usual method of applying the formal model is to transform the requirements into a measure of the "size" of the system. This size measure, which can be either SLOC (Source Lines of Code) or FPs (Function Points), is used as the basis for creating the cost estimates. The estimator can also quantify a set of other cost drivers, examples of which include:

• Product attributes, e.g., required reliability, product complexity, etc.

• Computer attributes, e.g., memory constraints.

• Personnel attributes, e.g., applications experience, programming language experience.

These cost drivers become multipliers that can be used to increase or decrease the initial estimate. The bulk of the current literature and research on cost estimation is devoted to formal models, particularly as it relates to new system development [2][4][27].
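How multipliers adjust an initial estimate can be sketched as below. The function name `adjusted_effort`, the driver names, and every numeric value are illustrative assumptions; they are not the calibrated values published for COCOMO II or any other model. A multiplier above 1.0 increases the estimate and one below 1.0 decreases it.

```python
from math import prod


def adjusted_effort(nominal_pm: float, multipliers: dict[str, float]) -> float:
    """Scale a nominal effort estimate by the product of all cost-driver multipliers."""
    return nominal_pm * prod(multipliers.values())


# Hypothetical ratings for one project (values invented for illustration).
drivers = {
    "required_reliability": 1.10,     # product attribute, raises effort
    "product_complexity": 1.15,       # product attribute, raises effort
    "applications_experience": 0.90,  # personnel attribute, lowers effort
}

nominal = 100.0  # initial size-based estimate in person-months (assumed)
print(round(adjusted_effort(nominal, drivers), 2))  # 113.85
```

Because the multipliers are combined by multiplication, a few unfavourable ratings can compound quickly, which is one reason such models must be calibrated to the individual development environment.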

Analogy-based estimating processes estimate costs by comparing the current development project with previous development projects undertaken by the organization. An analogy-based technique requires maintenance of a history of past projects; this information can be used as a reference point. Past projects with properties similar to the current project are identified and their costs used as a basis for estimating the current project.

At the most informal level of analogy-based techniques, the history of past projects is maintained in the estimator's memory. Finding past projects with properties similar to the current project involves the estimator recalling similar projects and the costs involved in them. Such an approach is highly dependent on the memory of the individual estimators and on very low employee turnover.

The analogy-based approach can be made more rigorous in a number of ways. The history of past projects can be maintained as a computerized database, with detailed metrics and descriptions of characteristics recorded for each project. Using a historical database, an estimator can query the database searching for projects with similar characteristics and then base the estimate on actual costs and process of the previous projects. Such an approach avoids the fallibility of human memory and provides a much more detailed historic record of what occurred in the course of a project [9].
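A query against such a historical database might look like the sketch below. It is only one possible reading of the approach: the record fields, the similarity measure in `distance`, the choice of the two nearest projects, and the scaling by size ratio are all assumptions made for the example, and the project data is invented.

```python
from dataclasses import dataclass


@dataclass
class PastProject:
    name: str
    size_fp: float           # size in function points
    complexity: int          # 1 (simple) to 5 (very complex)
    actual_effort_pm: float  # recorded effort in person-months


def distance(p: PastProject, size_fp: float, complexity: int) -> float:
    # Assumed similarity measure: relative size gap plus complexity gap.
    return abs(p.size_fp - size_fp) / size_fp + abs(p.complexity - complexity)


def estimate_by_analogy(history, size_fp, complexity, k=2):
    """Average the size-scaled effort of the k most similar past projects."""
    nearest = sorted(history, key=lambda p: distance(p, size_fp, complexity))[:k]
    scaled = [p.actual_effort_pm * size_fp / p.size_fp for p in nearest]
    return sum(scaled) / len(scaled)


history = [
    PastProject("A", 200.0, 2, 20.0),
    PastProject("B", 400.0, 3, 50.0),
    PastProject("C", 250.0, 2, 30.0),
    PastProject("D", 1000.0, 5, 200.0),
]

# New project: 300 function points, complexity level 2.
print(estimate_by_analogy(history, size_fp=300.0, complexity=2))  # 33.0
```

Here projects C and A are the closest analogues; their efforts are scaled to the new size (36.0 and 30.0 person-months) and averaged.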

2.6. Timing of the estimates

Estimation is not a task done once, at the beginning of a project. Rather, estimates and re-estimates are undertaken throughout the life of a project [7][8][10]. The success of an estimator is not necessarily the accuracy of the initial estimates, but rather the rate at which the estimates converge to the actual costs. The timing of estimates depends on the type of organization involved and why the estimate is being performed.
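One simple way to examine how estimates converge is to track the relative error of each successive estimate once the actual cost is known. The sketch below is an illustration under assumptions: the function names and the sample figures are hypothetical, and "converging" is taken here to mean that the error never grows from one re-estimate to the next.

```python
def relative_errors(estimates, actual):
    """Relative error of each successive estimate against the final actual cost."""
    return [abs(e - actual) / actual for e in estimates]


def is_converging(estimates, actual):
    """True if no re-estimate is further from the actual cost than the one before it."""
    errs = relative_errors(estimates, actual)
    return all(later <= earlier for earlier, later in zip(errs, errs[1:]))


# Successive effort estimates (person-months) for a project that took 100.
steady = [60.0, 85.0, 95.0, 102.0]
erratic = [90.0, 130.0, 105.0]

print(is_converging(steady, 100.0))   # True
print(is_converging(erratic, 100.0))  # False
```

Under this measure, the first estimator started further from the actual cost but improved with every re-estimate, while the second started closer and then moved away.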

Contractors usually perform two estimates early in the development life cycle. The first is done to prepare a bid for the contract, usually in a relatively quick fashion, with the objective of arriving at a winning bid. The timing of this bid is very much dependent on the procuring agency that issues the Request for Proposal (RFP). The contractor is required to generate an estimate at this point, basing it on information within the RFP and obtained informally from the contracting agency.

Upon winning a bid, most contracting organizations immediately undertake a second, more detailed, estimation process. The objective of this estimate is to develop a more accurate and detailed cost estimate and project plan which are based on the previous estimate and WBS. Frequently, much discussion between the contractor and the agency is necessary to deal with previously undetected issues and problems in the requirements.

2.7. Estimation Constraints

An estimation process involves arriving at an estimate that satisfies the constraints. These constraints vary depending on the timing of the estimate and the organization performing the estimate, but can include:

• System requirements.

• Delivery date.

• Financial.

• Manpower resources.

• System architecture.

• Software process.

When preparing a bid to develop new software, a contracting organization is usually faced with constraints on system requirements, delivery date, manpower resources, and software process. Depending on the system under construction, constraints may be placed upon the architecture. The constraints on the requirements of the system vary considerably among projects. Some projects have requirements which are well understood and well documented within the RFP. In these cases, the constraints on the requirements are well understood by all parties involved. However, in many cases, requirements are not clearly understood up front, or are flexible in terms of the actual functionality to be delivered as part of the end product.

Delivery date and financial resources are constraints that are very firm and have a large impact upon a contractor's preparation of a bid for estimation purposes. There are two reasons that these constraints are imposed upon contractors. First, the procuring agency has a budget and timetable, which they are under pressure to meet and which they are not willing to exceed. Second, there will be competing bids submitted.


Once the bid has been won, the contractor performs another more detailed estimate [8][10]. This estimate is in many ways more realistic because there is less pressure to satisfy financial constraints; it is usually done by the project manager to determine how much the system is really going to cost. Although financial constraints affect the process, the manager usually defines in much more detail the functionality of the system and the process used to develop the system. This results in a more accurate estimate and can determine whether the system may be built for the contracted price.

Re-estimates done by contractors during development involve modifying the duration, effort, and functionality. As understanding of the tasks increases, more accurate estimates can be made regarding effort and duration. As the requirements of the system are better understood, they can be re-estimated and appropriate modifications made to the effort and duration estimates.

From a procuring agency's perspective, estimates are performed under a different set of constraints. Project Directors try to balance the following constraints while getting approval for the project:

• Financial. How much money is the organization willing to put into this project?

• Calendar. When do I have to show results to keep management satisfied?

• Requirements. What is the functionality required of the system?

Each of these constraints has a different level of priority, depending on the particular project. Once project development begins, control of the project passes from the Project Director to the Project Manager (PM). At this point budgetary approval has been received and all previous estimates are considered to be cast in stone. Thus, there is great pressure on the PM not to change any of the previous estimates.

The PM must decide in what order to sacrifice the financial, calendar, and requirements constraints. Different PMs have different approaches; generally they try to maintain the functionality of the system, but let either the calendar or financial constraints slip. In reality, however, it appeared that if the original estimates were incorrect, all of the constraints were affected.

2.8. Data gathering

It seems obvious that without knowledge of the past, it is impossible to predict what may happen on future projects. (Even with knowledge of the past, there is still no guarantee that the future can be predicted.) A corollary is that if an organization wants to improve its cost estimation process, it should gather relevant data on previous projects.

The simplest way to gather data is to have a stable work force so that project and process data are maintained in the memory of the individuals of the organization. The individuals can then use this information to estimate costs of other projects. However, relying on individuals' imperfect memories is barely sufficient for small projects; for large projects it is completely inadequate.

Even if this information is gathered, it is often done for financial purposes and is not used by software managers to estimate the cost of future projects. There are a number of reasons why this data may not be useful [27][30]:

• The data is not accurate. If the primary perceived purpose of time sheets is to monitor the staff, the accuracy of the figures in the time sheets must be questioned.

• The data is not accessible. Often time sheets are gathered for the benefit of the financial department rather than to assist estimators. Thus, they are kept on systems not easily accessible to estimators, or worse, are simply stored as masses of paper files.

• The data is not broken down in a useful way. The overall cost of a project has a limited usefulness. What is usually of more interest to an estimator is how the project was broken down into activities and the cost of each of these individual activities.
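One way to keep historical data in the activity-level form described above is sketched below. The `ProjectRecord` class, its field names, the activity labels, and the effort figures are all hypothetical and serve only to illustrate the idea of storing effort per activity rather than a single project total.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ProjectRecord:
    """Historical data for one completed project (hypothetical schema)."""

    name: str
    # Effort in person-hours per activity, e.g. "design" -> 300.0
    effort_by_activity: Dict[str, float] = field(default_factory=dict)

    def total_effort(self) -> float:
        """Overall project effort: the sum over all recorded activities."""
        return sum(self.effort_by_activity.values())

    def activity_share(self, activity: str) -> float:
        """Fraction of the total effort spent on one activity."""
        total = self.total_effort()
        if total == 0:
            return 0.0
        return self.effort_by_activity.get(activity, 0.0) / total


# Invented figures, for illustration only.
record = ProjectRecord(
    name="example-project",
    effort_by_activity={
        "analysis": 200.0,
        "design": 300.0,
        "coding": 250.0,
        "testing": 250.0,
    },
)
print(record.total_effort())
print(record.activity_share("coding"))
```

Keeping the breakdown in one record lets an estimator look up both the total and the share of each activity when sizing a similar future project.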

2.9. Problems with the Cost Estimation Process

What factors make software cost estimation difficult? There are situations where a high level of accuracy in cost estimation can be found; many of these situations were identified by the following characteristics [3]:

• The users are experienced in the system, know what they want, and can express what they want.

• The requirements are clear, precise, correct, and complete.

• The project duration is short.

• The manpower loading is small.

• The people doing the estimation are experienced in the application domain and have developed similar systems.

• The development environment and development process are familiar to all people involved.

• Staff turnover is low both among the developers and the users.

• No unfamiliar software or hardware from outside suppliers is to be integrated with the final product.

A project satisfying the above characteristics frequently resulted in accurate cost estimates. However, most of the projects did not satisfy the above conditions and therefore the estimates produced were not accurate. The characteristics needed for accurate estimates can be reversed in order to enumerate problems leading to inaccurate estimates:

• Problems with the requirements.

• Issues in maintenance.

• Procurement process.

• System size.

• Lack of historical data.

• Lack of application domain expertise.

• Embedded software.

2.10. Problems with Requirements

Almost universally, organizations blame problems with the requirements as a major reason why cost estimates were inaccurate. The problems are numerous: requirements may be incomplete, ambiguous, inconsistent, incorrect, or incomprehensible.

The problem of users not understanding the requirements existed for all types of systems and all types of developments. For new development projects, users would request systems (and quotes) before there was a complete understanding of the problem or the solution.

Cost estimates can be made without a clear understanding of the requirements of the system being built, but it must be accepted that these estimates have a very high likelihood of error.

Requirements creep. As projects progress and the knowledge of the problem increases, it seems inevitable that users (and developers) request more and more features and changes to be included in the product. Thus, over the development of the project, new features work their way into the requirements, leading to "requirements creep" (or, as Boehm described it, "requirements gallop"). New feature requests come from many sources and for many reasons, but the problem seems to be universal.

Correct and complete requirements for complex systems are impossible to achieve. A fact that must be accepted is that a complete statement of the requirements cannot be defined before development begins [14]. This has nothing to do with the competence of the users or the developers but rather is inherent in the nature of complex computer system applications. Unless the system being developed is almost identical to a previously developed system, the requirements will invariably be wrong and/or incomplete. As a project evolves, users and developers gain a better understanding of the problem and of the solutions. As people gain a better understanding of the problem being solved, the requirements evolve.

One frequent assumption is that the requirements will be firm before development begins. Anyone working under this assumption will meet serious problems when trying to estimate software costs accurately.

Since the requirements are probably wrong or incomplete, it is unlikely that the estimates based on those requirements will be accurate. If the requirements are included as part of the RFP put out by a procurement agency and a contractor is expected to submit a firm bid based on those requirements, a frequent result later in the development stages is confrontation between the contractor and the agency as they argue over the meaning of each requirement and the cost associated with the changing requirements.

Long development time, leading to requirements that are obsolete before the system is delivered [8][10]. The rate of change in technology is so fast that any attempt to predict what the technology will be in a few years is doomed to failure. As the technology changes, so does the range of solutions to problems, and so do the users' expectations of the solutions. Projects with a long time between initiation and expected delivery suffer in that the solution is usually obsolete by the time it is delivered. The customer is dissatisfied because the product does not satisfy the new requirements.

Large staff turnover for end users, resulting in changing requirements as new staff arrive. Developing software systems requires a consistent user base throughout the development cycle. If the user base changes too frequently, requirements continually change, and it is difficult for developers to obtain consistent answers and comments from the end users.

2.11. Conclusion

All private businesses have two goals in common:

• Ensure that a profit is made

• Ensure their survival

To ensure that this happens, every project taken on must leave the business no worse off than when the project started. This can be accomplished when the initial cost estimate is complete and accurate.

To determine the cost of a software project, whether low-level software integration or high-level web page development, the process is no different. The estimation process has many unknown factors that must be determined before it can be started. The following factors must be considered:

• The software process. Most software engineering firms or companies have a different management methodology for developing software. These differences can influence the cost estimation process. There can be more documentation or formal processes that must be completed before the development process can move into the next step.

• There are more inputs to be considered than in previous years of software cost estimation. Previously the only considerations taken into account were the system requirements and cost drivers. Today there are more factors to consider. Some of them are the company software process, financial constraints, risk factors and the specific software architecture.

• System requirement. In some cases the software to be engineered is a new system based on newly released technology. There is no data or experienced manpower available. A steep learning curve must be taken into consideration.

Another obstacle in the cost estimation process is the specific requirements set by the client. In many cases these requirements are vague, incomplete and ambiguous. The system analyst or project manager must set up a task team to determine the complete and correct requirements. This process can be time consuming and sometimes expensive.

The time allocated for responding to a request for proposal (RFP) is inadequate. The project manager, or the team member assigned to the RFP, must create a cost estimate from the vague information supplied. This in turn may make the estimate inaccurate.

These obstacles can be resolved by first estimating the size of the project from the available requirements. Different size estimation methods are available; the most popular are counting source lines of code and counting function points.

These methods will be discussed in more detail in the next chapter. The goal of the chapter is to determine what method would be best suited for one of the biggest obstacles, accurate estimations with limited requirements.

Chapter 3 Size Estimation

3.1 Lines of code

The traditional size metric for estimating software development effort and for measuring productivity has been lines of code (LOC). A large number of cost estimation models have been produced, most of which are functions of lines of code, or thousands of lines of code (KLOC). The definition of KLOC is important when comparing these models. Some models include comment lines, and others do not. Similarly, the definition of what effort (E) is being estimated is equally important. Effort may represent only coding at one extreme, or the total analysis, design, coding and testing effort at the other. As a result, it is difficult to compare these models.

The abbreviation NCLOC is used to represent a non-commented source line of code. NCLOC is also sometimes referred to as effective lines of code (ELOC). NCLOC is therefore a measure of the uncommented length.

The commented length is also a valid measure, depending on whether or not line documentation is considered to be a part of programming effort. The abbreviation CLOC is used to represent a commented source line of code [11].

By measuring NCLOC and CLOC separately the total length can be defined:

$$\text{Total length (LOC)} = \mathrm{NCLOC} + \mathrm{CLOC} \qquad \text{(Equation 3.1)}$$

KLOC is used to denote thousands of lines of code.
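The split in Equation 3.1 can be illustrated with a minimal line counter. This is only a sketch under stated assumptions that are not taken from the text: the language uses a single-line comment prefix (`#` by default), blank lines are counted in neither category, a line holding only a comment counts as CLOC, and a line with code followed by a trailing comment counts as NCLOC. The function name `count_lines` is hypothetical.

```python
def count_lines(source: str, comment_prefix: str = "#") -> dict:
    """Count NCLOC and CLOC for a language with single-line comments.

    Assumed counting rules (illustrative, not a standard):
    - blank lines are ignored,
    - comment-only lines are CLOC,
    - every other non-blank line is NCLOC.
    """
    ncloc = 0
    cloc = 0
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # blank line: counted in neither category
        if stripped.startswith(comment_prefix):
            cloc += 1
        else:
            ncloc += 1
    # Total length follows Equation 3.1: LOC = NCLOC + CLOC
    return {"NCLOC": ncloc, "CLOC": cloc, "LOC": ncloc + cloc}


sample = """# compute a sum
total = 0

for i in range(3):
    total += i  # accumulate
# done
"""
counts = count_lines(sample)
print(counts)
```

Changing any of the assumed rules (for example, counting trailing comments as CLOC) changes the result, which is why the definition in use must be stated whenever line counts from different models are compared.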

A logical source statement has been chosen as the standard line of code. Defining a line of code is difficult due to conceptual differences involved in accounting for executable statements and data declarations in different software languages. The goal is to measure the amount of intellectual work put into program development, but difficulties arise when trying to define consistent measures across different languages.

To minimize these problems, the Software Engineering Institute (SEI) definition checklist for a logical source statement is used in defining the line of code measure. The SEI developed this checklist as part of a system of definition checklists, report forms and supplemental forms to support measurement definitions [12][20].

Figure 3.1 shows a portion of the definition checklist as it is being applied to support the development of the COCOMO II model. Each checkmark in the "Includes" column identifies a particular statement type or attribute included in the definition, and each checkmark in the "Excludes" column identifies one that is excluded. Other sections in the definition clarify statement attributes for usage, delivery, functionality, replications and development status.

There are also clarifications for language specific statements for ADA, C, C++, CMS-2, COBOL, FORTRAN, JOVIAL and Pascal.

Definition Checklist for Source Statements Counts

Counts: physical source lines; logical source statements

Statement type (when a line or statement contains more than one type, classify it as the type with the highest precedence):
1 Executable
2 Nonexecutable
3 Declarations
4 Compiler directives
5 Comments
6 On their own lines
7 On lines with source code
8 Banners and nonblank spacers
9 Blank (empty) comments
10 Blank lines

How produced:
1 Programmed
2 Generated with source code generators
3 Converted with automated translators
4 Copied or reused without change
5 Modified

Origin:
1 New work: no prior existence
2 Prior work: taken or adapted from
3 A previous version, build, or release
4 Commercial, off-the-shelf software (COTS), other than libraries
5 Government furnished software (GFS), other than reuse libraries
6 Another product
7 A vendor-supplied language support library (unmodified)
8 A vendor-supplied operating system or utility (unmodified)
9 A local or modified language support library or operating system
10 Other commercial library
11 A reuse library (software designed for reuse)
12 Other software component or library

Figure 3.1: Definition checklist for source statements counts [4]

Some changes were made to the line-of-code definitions that depart from the default definition provided in [20]. These changes eliminate categories of software which are generally small sources of project effort. Not included in the definition are commercial-off-the-shelf software (COTS), government furnished software (GFS), other products, language support libraries and operating systems, or other commercial libraries. Code generated with source code generators is not included, though measurements will be taken with and without generated code to support analysis.

There are a number of problems with using LOC as the unit of measure for software size. The primary problem is the lack of a universally accepted definition of exactly what a line of code is.

Another difficulty with lines of code as a measure of system size is its language dependence. It is not possible to directly compare projects developed in different languages.

Still another problem with the lines of code measure is the fact that it is difficult to estimate the number of lines of code that will be needed to develop a system from the information available at requirements or design phase of development [7][8].

If cost models based on size are to be useful, it is necessary to be able to predict the size of the final product as early and accurately as possible. Unfortunately, estimating software size using the lines of code metric depends so much on previous experience with similar projects that experts can make radically different estimates.

Finally, the lines of code measure places undue emphasis on coding, which is only one part of the implementation phase of a software development project. It is stated that coding accounts only for 10% to 15% of the total effort on a large engineering system. It is also questioned whether the total effort is really linearly

3.2 Function Point Analysis

The function point cost estimation approach is based on the amount of functionality in a software project and a set of individual project factors [3][17][15]. Function points are useful estimators since they are based on information that is available early in the project life cycle.

Software engineers have been searching for a metric that is applicable for a broad range of software engineering environments. The metric should be technology independent and support the need for estimating, project management, measuring quality and gathering requirements. Function Point Analysis is the measure that accomplishes all these requirements.

There have been many misconceptions regarding the appropriateness of Function Point Analysis in evaluating emerging environments such as real time embedded code and Object Oriented programming. Since function points express the resulting work-product in terms of functionality as seen from the user's perspective, the measure is independent of the tools and technologies used to deliver it.

Introduction to Function Point Analysis

One of the initial design criteria for function points was to provide a mechanism that both software engineers and users could utilize to define functional requirements. It was determined that the best way to gain an understanding of the users' needs was to approach their problem from the perspective of how they view the results an automated system produces. Therefore, one of the primary goals of Function Point Analysis is to evaluate a system's capabilities from a user's point of view. To achieve this goal, the analysis is based upon the various ways users interact with computerized systems. From a user's perspective a system assists them in doing their job by providing five (5) basic functions. Two of these address the data requirements of an end user and are referred to as Data Functions. The remaining three address the user's need to access data and are referred to as Transactional Functions.

Function point calculations

Function points (FP) measure size in terms of the amount of functionality in a system. Function points are computed by first calculating an unadjusted function point count (UFP). Counts are made for the following categories [Fenton]:

Internal Logical Files - The first data function allows users to utilize data they are responsible for maintaining. For example, a pilot may enter navigational data through a display in the cockpit prior to departure. The data is stored in a file for use and can be modified during the mission. Therefore the pilot is responsible for maintaining the file that contains the navigational information. Logical groupings of data in a system, maintained by an end user, are referred to as Internal Logical Files (ILF).

External Interface Files - The second Data Function a system provides an end user is also related to logical groupings of data. In this case the user is not responsible for maintaining the data. The data resides in another system and is maintained by another user or system. The user of the system being counted requires this data for reference purposes only. For example, it may be necessary for a pilot to reference position data from a satellite or ground-based facility during flight. The pilot does not have the responsibility for updating data at these sites but must reference it during the flight. Groupings of data from another system that are used only for reference purposes are defined as External Interface Files (EIF).

The remaining functions address the user's capability to access the data contained in ILFs and EIFs. This capability includes maintaining, inquiring and

External Input - The first Transactional Function allows a user to maintain Internal Logical Files (ILFs) through the ability to add, change and delete the data. For example, a pilot can add, change and delete navigational information prior to and during the mission. In this case the pilot is utilizing a transaction referred to as an External Input (EI). An External Input gives the user the capability to maintain the data in ILFs through adding, changing and deleting its contents.

External Output - The next Transactional Function gives the user the ability to produce outputs. For example a pilot has the ability to separately display ground speed, true air speed and calibrated air speed. The results displayed are derived using data that is maintained and data that is referenced. In function point terminology the resulting display is called an External Output (EO).

External Inquiries - The final capability provided to users through a computerized system addresses the requirement to select and display specific data from files. To accomplish this a user inputs selection information that is used to retrieve data that meets the specific criteria. In this situation there is no manipulation of the data. It is a direct retrieval of information contained on the files. For example if a pilot displays terrain clearance data that was previously set, the resulting output is the direct retrieval of stored information. These transactions are referred to as External Inquiries (EQ).

In addition to the five functional components described above there are two adjustment factors that need to be considered in Function Point Analysis.

Functional Complexity - The first adjustment factor considers the Functional Complexity for each unique function. Functional Complexity is determined based on the combination of data groupings and data elements of a particular function. The number of data elements and unique groupings are counted and compared to a complexity matrix that will rate the function as low, average or high complexity. Each of the five functional components (ILF, EIF, EI, EO and EQ) has its own unique complexity matrix.

Table 3.1 shows the complexity rating matrix for the different categories.

For ILF and EIF:

Record Elements    Data Elements
                   1-19    20-50   51+
1                  Low     Low     Avg
2-5                Low     Avg     High
6+                 Avg     High    High

For EO and EQ:

File Types         Data Elements
                   1-5     6-19    20+
0 or 1             Low     Low     Avg
2-3                Low     Avg     High
4+                 Avg     High    High

For EI:

File Types         Data Elements
                   1-4     5-15    16+
0 or 1             Low     Low     Avg
2                  Low     Avg     High
3+                 Avg     High    High

Table 3.1: Function point complexity matrix [11]

Table 3.2 shows the complexity weight matrix that must be applied after the functions have been categorized and their complexities determined.

Function Type               Complexity Weight
                            Low     Average   High
Internal Logical Files      7       10        15
External Interface Files    5       7         10
External Inputs             3       4         6
External Outputs            4       5         7
External Inquiries          3       4         6

Table 3.2: Function point complexity-weight matrix [11]

All of the functional components are analyzed in this way and added together to derive an Unadjusted Function Point count (UFP).
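The lookup in Table 3.1 can be sketched as a small function. The names `BANDS`, `RATINGS` and `rate_complexity` are hypothetical, and the band limits are an assumption: they follow the usual function point layout, in which the middle row for External Inputs is exactly two file types.

```python
from bisect import bisect_left

# For each function type: upper limits of the first two bands for
# (record elements or file types, data elements). Anything above the
# second limit falls into the third band. Limits are assumed to follow
# the usual function point layout.
BANDS = {
    "ILF": ((1, 5), (19, 50)),
    "EIF": ((1, 5), (19, 50)),
    "EO": ((1, 3), (5, 19)),
    "EQ": ((1, 3), (5, 19)),
    "EI": ((1, 2), (4, 15)),
}

# Rating by (row band, column band); the same pattern applies to all types.
RATINGS = (
    ("Low", "Low", "Average"),
    ("Low", "Average", "High"),
    ("Average", "High", "High"),
)


def rate_complexity(function_type: str, types_count: int, data_elements: int) -> str:
    """Return 'Low', 'Average' or 'High' for one counted function."""
    row_limits, col_limits = BANDS[function_type]
    # bisect_left gives 0, 1 or 2: the index of the band the value falls in.
    row = bisect_left(row_limits, types_count)
    col = bisect_left(col_limits, data_elements)
    return RATINGS[row][col]


print(rate_complexity("ILF", 3, 25))
print(rate_complexity("EI", 2, 16))
```

Storing only the band limits keeps the three matrices in one place, since all three share the same Low/Average/High pattern and differ only in where the bands begin and end.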

$$\mathrm{UFP} = \sum_{i} X_i \, W_i \qquad \text{(Equation 3.2)}$$

where $X_i$ is the number of functions counted for a specific function type and complexity rating, and $W_i$ is the corresponding complexity weight listed in Table 3.2.
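Equation 3.2 can be sketched in a few lines of code. The weights are those of Table 3.2; the names `WEIGHTS` and `unadjusted_function_points` and the sample counts are hypothetical and chosen only for illustration.

```python
# Complexity weights from Table 3.2, keyed by function type and rating.
WEIGHTS = {
    "ILF": {"Low": 7, "Average": 10, "High": 15},
    "EIF": {"Low": 5, "Average": 7, "High": 10},
    "EI": {"Low": 3, "Average": 4, "High": 6},
    "EO": {"Low": 4, "Average": 5, "High": 7},
    "EQ": {"Low": 3, "Average": 4, "High": 6},
}


def unadjusted_function_points(counts: dict) -> int:
    """Sum X_i * W_i over all (function type, complexity) pairs.

    `counts` maps (function_type, rating) -> number of functions counted.
    """
    return sum(
        number * WEIGHTS[function_type][rating]
        for (function_type, rating), number in counts.items()
    )


# Invented counts for a small system.
example_counts = {
    ("ILF", "Average"): 2,  # 2 * 10 = 20
    ("EIF", "Low"): 1,      # 1 * 5  = 5
    ("EI", "Average"): 4,   # 4 * 4  = 16
    ("EO", "Average"): 3,   # 3 * 5  = 15
    ("EQ", "Low"): 2,       # 2 * 3  = 6
}
ufp = unadjusted_function_points(example_counts)
print(ufp)  # 62
```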

Value Adjustment Factor

The Unadjusted Function Point count is multiplied by the second adjustment factor, called the Value Adjustment Factor and referred to here as the technical complexity factor (TCF). This factor considers the system's technical and operational characteristics and is calculated by answering 14 questions [1][29]. The factors are:

• Data Communications. The data and control information used in the application are sent or received over communication facilities.

• Distributed Data Processing. Distributed data or processing functions are a characteristic of the application within the application boundary.

• Performance. Application performance objectives, stated or approved by the user, in either response or throughput, influence (or will influence) the design, development, installation and support of the application.

• Heavily Used Configuration. A heavily used operational configuration, requiring special design considerations, is a characteristic of the application.

• Transaction Rate. The transaction rate is high and influences the design, development, installation and support.

• On-line Data Entry. On-line data entry and control information functions are provided in the application.

• End-User Efficiency. The on-line functions provided emphasize a design for end-user efficiency.

• On-line Update. The application provides on-line update for the internal logical files.

• Complex Processing. Complex processing is a characteristic of the application.

• Reusability. The application and the code in the application have been specifically designed, developed and supported to be usable in other applications.

• Installation Ease. Conversion and installation ease are characteristics of the application. A conversion and installation plan and/or conversion tools were provided and tested during the system test phase.

• Operational Ease. Operational ease is a characteristic of the application. Effective start-up, backup and recovery procedures were provided and tested during the system test phase.

• Multiple Sites. The application has been specifically designed, developed and supported to be installed at multiple sites for multiple organizations.

• Facilitate Change. The application has been specifically designed, developed and supported to facilitate change.

Each component is rated from 0 to 5, where 0 means the component has no influence on the system and 5 means the component is essential [26]. The technical complexity factor (TCF) can then be calculated as [19]:

$$\mathrm{TCF} = 0.65 + 0.01 \sum_{i=1}^{14} F_i \qquad \text{(Equation 3.3)}$$

where $F_i$ is the rating (0 to 5) given to the $i$-th of the 14 factors listed above. The TCF can range from 0.65 to 1.35: a figure of 0.65 results if all the complexity factors have no influence (every rating is 0), and a figure of 1.35 results if all of them are essential (every rating is 5).
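A minimal sketch of Equation 3.3 follows, assuming that each of the 14 factors is given an integer rating from 0 to 5 as described above. The function name `technical_complexity_factor` and the sample ratings are hypothetical.

```python
def technical_complexity_factor(ratings: list) -> float:
    """Compute TCF = 0.65 + 0.01 * (sum of the 14 factor ratings)."""
    if len(ratings) != 14:
        raise ValueError("exactly 14 factor ratings are expected")
    if any(rating < 0 or rating > 5 for rating in ratings):
        raise ValueError("each rating must lie between 0 and 5")
    return 0.65 + 0.01 * sum(ratings)


# Boundary cases: no influence at all, and every factor essential.
lowest = technical_complexity_factor([0] * 14)
highest = technical_complexity_factor([5] * 14)
# An invented mid-range case: every factor rated 3.
typical = technical_complexity_factor([3] * 14)
print(round(lowest, 2), round(typical, 2), round(highest, 2))
```

The two boundary cases reproduce the 0.65 to 1.35 range, since the 14 ratings can sum to at most 70.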

Each of these factors is scored based on its influence on the system being counted. The resulting score will increase or decrease the Unadjusted Function Point count by up to 35%. This calculation provides us with the Adjusted Function Point count (FP):

$$\mathrm{FP} = \mathrm{UFP} \times \mathrm{TCF} \qquad \text{(Equation 3.4)}$$
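As a worked illustration of Equations 3.3 and 3.4, assume a hypothetical system with an Unadjusted Function Point count of 62 and 14 factor ratings that sum to 42 (an average rating of 3). Both figures are invented for the example.

```latex
\begin{aligned}
\mathrm{TCF} &= 0.65 + 0.01 \times 42 = 1.07 \\
\mathrm{FP}  &= \mathrm{UFP} \times \mathrm{TCF} = 62 \times 1.07 = 66.34
\end{aligned}
```

In this case the technical and operational characteristics raise the size by 7% over the unadjusted count.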

Function Points as a Sizing Metric

Function points are a synthetic measure, much like square feet or square meters, that permits the calculation of a relative size for individual software projects, applications, or subsystems even in their early requirements stages. Function point counting is typically performed when a developer wants to size and estimate development time and effort for an application or a project. In addition to functional size, other risk and complexity factors must be considered when estimating effort. These factors include, but are not limited to [19]:

• Development and/or maintenance tasks to be performed

• Application complexities; e.g., logical complexity, mathematical complexity, security requirements, etc.

• Performance considerations

• Source code languages used

• Extent of reusable components from previously developed documents and code

• Skill sets of both development and user personnel in all phases

• The process and technology to be applied in development and maintenance

• The environment in which development and/or maintenance will take place

When the impact of selected risk and complexity factors is considered, the effort required for development or maintenance of a certain range of function points can be estimated accurately.