Defect Density Models - The Science of Software Quality Engineering

2.2 The Science of Software Quality Engineering

2.2.1 Defect Density Models

2.2.1.1 Defect Density and Module Size

In 1971, Akiyama published the first attempt to quantify software quality, proposing a regression-based model that predicts defect density from module size [Aki71]. Akiyama’s model used defects discovered during testing as a measure of system complexity. This approach was later shown to be insufficient when N. Fenton [FNSS99] compared various defect prediction models and demonstrated that some complex systems have lower defect densities (see Figure 2.3). Fenton observed that the definition of defects differed from study to study and that, to predict defects accurately, models must consider not only size but also key issues such as:

• The unknown relationship between defects and failures.

• Problems of using size and complexity metrics as sole “predictors” of defects.

• False claims about software decomposition.

Fenton also suggested that factors such as code maturity, software reuse, and optimal module size may affect defect densities differently at various points in a product’s lifecycle. He noted that “most defects in a system are benign in the sense that in the same given period of time they will not lead to failures” and that, despite their usefulness from a developer’s perspective (i.e., for improving the quality of software before release), “defect counts cannot be used to predict reliability because,...it does not measure the quality of the system as a user is likely to experience it” [FNSS99]. In other words, pre-release defect removal may not translate into post-release reliability.
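The size-based approach discussed above can be illustrated with a minimal sketch. It computes defect density per thousand lines of code (KLOC) and fits a size-only straight line to defect counts, in the spirit of a regression on module size. The module sizes and defect counts are hypothetical illustration data, and the function names `defect_density` and `fit_line` are assumptions introduced here; nothing below reproduces the coefficients or data of [Aki71] or [FNSS99].

```python
# Minimal sketch: defect density and a size-only linear fit.
# All module sizes and defect counts are hypothetical illustration data.

def defect_density(defects, loc):
    """Defects per thousand lines of code (KLOC)."""
    return defects / (loc / 1000.0)

def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = intercept + slope * xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical modules: (lines of code, defects found in testing)
modules = [(1000, 22), (2000, 41), (4000, 77), (8000, 149)]
sizes = [loc for loc, _ in modules]
defects = [d for _, d in modules]

intercept, slope = fit_line(sizes, defects)

# Predicted defect count for a hypothetical 5000-LOC module
predicted = intercept + slope * 5000

# Density per module; in this made-up data it falls as size grows,
# even though the raw defect count rises.
densities = [defect_density(d, loc) for loc, d in modules]
```

In this invented data set the defect count grows with size while the density shrinks, which is one way a size-only model can mislead: the largest module has the most defects but the lowest density.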

Number | Hypothesis | Case study evidence?
--- | --- | ---
1a | A small number of modules contain most of the total faults discovered during pre-release testing | Yes: evidence of 20-60 rule
1b | If a small number of modules contain most of the faults discovered during pre-release testing, then this is simply because those modules constitute most of the code size | No
2a | A small number of modules contain most of the operational faults | Yes: evidence of 20-80 rule
2b | If a small number of modules contain most of the operational faults, then this is simply because those modules constitute most of the code size | No: strong evidence of a converse hypothesis
3 | Modules with higher incidence of faults in early pre-release likely to have higher incidence of faults in system testing | Weak support
4 | Modules with higher incidence of faults in all pre-release testing likely to have higher incidence of faults in post-release operation | No: strongly rejected
5a | Smaller modules are less likely to be failure prone than larger ones | No
5b | Size metrics (such as LOC) are good predictors of number of pre-release faults in a module | Weak support
5c | Size metrics (such as LOC) are good predictors of number of post-release faults in a module | No
5d | Size metrics (such as LOC) are good predictors of a module's (pre-release) fault-density | No
5e | Size metrics (such as LOC) are good predictors of a module's (post-release) fault-density | No
6 | Complexity metrics are better predictors than simple size metrics of fault and failure-prone modules | No (for cyclomatic complexity), but some weak support for metrics based on SigFF
7 | Fault densities at corresponding phases of testing and operation remain roughly constant between subsequent major releases of a software system | Yes
8 | Software systems produced in similar environments have broadly similar fault densities at similar testing and operational phases | Yes

Table 2.1: Results from testing fault metrics hypotheses (reproduced from [FNSS99])
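Hypotheses 1a and 1b in Table 2.1 can be made concrete with a short sketch: rank modules by fault count, take the most fault-prone 20%, and compare the share of faults they hold with the share of code they represent. The helper `top_share` and the ten-module data set are hypothetical and chosen only to mimic a 20-60 pattern; they are not the data from [FNSS99].

```python
def top_share(modules, fraction=0.2):
    """Return (fault_share, size_share) for the most fault-prone modules.

    modules:  list of (loc, faults) tuples.
    fraction: portion of modules to treat as the "small number" (default 20%).
    """
    ranked = sorted(modules, key=lambda m: m[1], reverse=True)
    k = max(1, round(len(ranked) * fraction))
    top = ranked[:k]
    total_faults = sum(faults for _, faults in modules)
    total_loc = sum(loc for loc, _ in modules)
    fault_share = sum(faults for _, faults in top) / total_faults
    size_share = sum(loc for loc, _ in top) / total_loc
    return fault_share, size_share

# Hypothetical data: ten modules as (lines of code, faults)
modules = [(500, 30), (800, 30), (1200, 8), (900, 7), (1500, 6),
           (1100, 5), (700, 5), (1300, 4), (1000, 3), (1000, 2)]

fault_share, size_share = top_share(modules)
```

If `fault_share` is high while `size_share` is low, as in this invented data, the concentration of faults cannot be explained by those modules simply being most of the code, which is the pattern the "No" result for hypothesis 1b describes.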

2.2.1.2 Defect Density Over the Lifetime of the Code

Emphasis on reliability as a characteristic of overall quality led to the acknowledgement of the need to distinguish between defects discovered at different life-cycle phases. Table 2.1 lists the results of a case study testing the validity of many of the hypotheses at the root of Software Reliability Engineering (SRE) metrics. Hypothesis number 4 is particularly interesting, because the result from this study suggests no clear evidence for the relationship between module complexity, pre-release discovered defects, and post-release faults (resulting in failure). This and other evidence [MD99, MCKS04] led to the current understanding that defect density, as a metric for software quality, must be measured “holistically” [HR06], i.e., over the entire lifetime of the software.

Studies looking at defect density over the code lifetime show that the most successful models for predicting failures from defects are those which measure contributions from LOC changes to a software module. These models show that large and/or recent changes to a module or code-base result in the highest fault potential.
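As a rough illustration of a change-based measure, the sketch below scores each module by its relative churn (changed LOC divided by module LOC), discounting older changes so that large and recent changes score highest. The exponential half-life weighting, the `change_score` function, and the module names and histories are all assumptions made for illustration; the text does not specify which formula the cited models use.

```python
def change_score(changes, module_loc, half_life_days=90.0):
    """Recency-weighted relative churn for one module.

    changes: list of (changed_loc, age_days) pairs.
    Each change contributes changed_loc / module_loc, halved for every
    `half_life_days` of age. The half-life is an illustrative assumption.
    """
    score = 0.0
    for changed_loc, age_days in changes:
        weight = 0.5 ** (age_days / half_life_days)
        score += weight * changed_loc / module_loc
    return score

# Hypothetical change histories: name -> (module LOC, [(changed LOC, age in days)])
history = {
    "parser": (2000, [(400, 10), (100, 200)]),  # large, recent change
    "logger": (2000, [(400, 360)]),             # large, but old change
    "utils": (2000, [(20, 5)]),                 # recent, but small change
}

# Rank modules from highest to lowest score
ranking = sorted(
    history,
    key=lambda name: change_score(history[name][1], history[name][0]),
    reverse=True,
)
```

Under these assumptions the module with a large and recent change ranks first, ahead of one with an equally large but old change and one with a recent but small change.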

2.2.1.3 Defect Density and Software Reuse

The recognition that releasing new software or changing old software resulted in more defects being discovered (presumably because more defects were being added) led to one of the most widely recommended software quality maxims: reuse software whenever possible. The belief that old, already-in-use software is relatively free of defects, and the practice of reusing old code (whether lines, sections, or entire modules), soon became widespread, and there is much software engineering evidence to support this belief. In fact, a meta-analysis looking at software reuse from 1994-2005 found that “Systematic software reuse is significantly related to lower problem (defect, fault, or error) density” [MCKS04]. As a result, even today, current software engineering models such as the Capability Maturity Model [Jel00] strongly recommend software reuse, and the consensus is that the longer software remains unchanged (the older it is), the fewer defects are likely to be found. The longer software has stood the test of time, the higher its apparent quality.

In document The Software Vulnerability Ecosystem: Software Development In The Context Of Adversarial Behavior (Page 43-45)