
The Pareto Principle Applied to Software Quality Assurance

6.4 Defect Identification

6.4.2 TRW Defect Data

Another source of data for a cost-by-type analysis is provided in Software Reliability [9]. This book presents an extensive collection of analyses of error data performed at TRW. Project TRW1 is broken down into four subprojects. Each is a project unto itself because of differing management, languages, development personnel, requirements, and so on.

Table 6.13 presents an analysis similar to the breakdown of the Rubey data. Although the definitions of error types do not agree completely between the two studies, the two sets of data are strikingly similar: logic errors and data-handling errors rank first and second in the serious error category in the Rubey data, and they likewise rank first and second in the TRW data (in fact, their respective percentages are similar) [10].

Table 6.12 Common Symptoms for Software Defects: Erroneous Arithmetic Computation

| Error Category | Total No. | Total % | Serious No. | Serious % | Moderate No. | Moderate % | Minor No. | Minor % |
|---|---|---|---|---|---|---|---|---|
| Wrong arithmetic operations performed | 69 | 61 | 12 | 55 | 47 | 64 | 10 | 56 |
| Loss of precision | 9 | 8 | 1 | 5 | 6 | 8 | 2 | 11 |
| Overflow | 8 | 7 | 3 | 14 | 3 | 4 | 2 | 11 |
| Poor scaling of intermediate results | 22 | 20 | 4 | 18 | 15 | 21 | 3 | 17 |
| Incompatible scaling | 5 | 4 | 2 | 9 | 2 | 3 | 1 | 5 |
| Total | 113 | 100 | 22 | 19 | 73 | 65 | 18 | 16 |

The TRW study further analyzes various subtypes of errors. For example, logic errors are divided into the following types:

• Incorrect operand in logical expression;
• Logic activities out of sequence;
• Wrong variable being checked;
• Missing logic or condition test.

It is both important and instructive to examine this more detailed analysis of the two most costly error types: logic and data handling. The results are shown for Project TRW1: Table 6.14 gives the breakdown for logic errors and Table 6.15 the breakdown for data handling errors. This data indicates that the most frequent error subtype (according to TRW's data) and the most serious subtype (according to Rubey's data) is missing logic or condition tests. The second most frequent and serious error subtype is data initialization done improperly.
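The claim that missing logic or condition tests is the most frequent logic-error subtype can be checked directly against Table 6.14. The following is a minimal Python sketch: the table is transcribed into a dictionary (the names `LOGIC_SUBTYPES` and `top_subtype_per_subproject` are illustrative, not from the TRW study), and the general B000 entry is skipped on the assumption that it holds logic errors not assigned to a specific subtype.

```python
# Table 6.14 (Project TRW1): percent of logic errors by detailed subtype.
# Column order follows the table: Applications, Simulator, Operating System, PA Tools.
SUBPROJECTS = ["Applications", "Simulator", "Operating System", "PA Tools"]
LOGIC_SUBTYPES = {
    "B000 LOGIC ERRORS": [2.1, 8.3, 0.0, 4.3],
    "B100 Incorrect operand in logical expression": [21.3, 6.2, 7.1, 4.3],
    "B200 Logic activities out of sequence": [17.0, 29.2, 10.7, 10.6],
    "B300 Wrong variable being checked": [4.3, 8.3, 14.3, 2.1],
    "B400 Missing logic or condition test": [46.8, 39.6, 60.7, 76.6],
}


def top_subtype_per_subproject(table, skip_prefix="B000"):
    """Return {subproject: (subtype, percent)} for the largest subtype in each column.

    Assumption: the B000 row is a general bucket rather than a subtype, so it is skipped.
    """
    result = {}
    for col, name in enumerate(SUBPROJECTS):
        candidates = {k: v[col] for k, v in table.items() if not k.startswith(skip_prefix)}
        best = max(candidates, key=candidates.get)
        result[name] = (best, candidates[best])
    return result


for name, (subtype, pct) in top_subtype_per_subproject(LOGIC_SUBTYPES).items():
    print(f"{name:<17} {subtype} ({pct}%)")
```

Every subproject reports B400 as its largest subtype, ranging from 39.6% (Simulator) to 76.6% (PA Tools).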

Another interesting study performed by TRW analyzed the sources of errors within the major error categories. A particular error will have its source in one of the following stages of development: requirements, specifications, design, or coding. TRW performed this detailed analysis for 23 major error categories during the design and coding stages of development for Project TRW2. The results are shown in Table 6.16.

Table 6.13 Percentage Breakdown of Code Change Errors into Major Error Categories

| Major Error Categories | Proj. TRW2 (%) | Proj. TRW3 (%) | Proj. TRW1 Applications Software (%) | Proj. TRW1 Simulator Software (%) | Proj. TRW1 Operating System (%) | Proj. TRW1 PA Tools (%) |
|---|---|---|---|---|---|---|
| Computational (A) | 9.0 | 1.7 | 13.5 | 19.6 | 2.5 | 0 |
| Logic (B) | 26.0 | 34.5 | 17.1 | 20.9 | 34.6 | 43.5 |
| Data input (C) | 16.4 | 8.9 | 7.3 | 9.3 | 8.6 | 5.5 |
| Data output (E) | | | | | | |
| Data handling (D) | 18.2 | 27.2 | 10.9 | 8.4 | 21.0 | 9.3 |
| Interface (F) | 17.0 | 22.5 | 9.8 | 6.7 | 7.4 | |
| Data definition (G) | 0.8 | 3.0 | 7.3 | 13.8 | 7.4 | 3.7 |
| Data base (H) | 4.1 | 2.2 | 24.7 | 16.4 | 4.9 | 2.8 |
| Other (J) | 8.5 | 0 | 9.4 | 4.9 | 13.6 | 35.2 |

Table 6.14 Project TRW1 Detailed Error Category Breakdown (Percent of Major Category)

| Detailed Error Categories | Applications Software | Simulator Software | Operating System S/W | PA Tools |
|---|---|---|---|---|
| B000 LOGIC ERRORS | 2.1 | 8.3 | 0 | 4.3 |
| B100 Incorrect operand in logical expression | 21.3 | 6.2 | 7.1 | 4.3 |
| B200 Logic activities out of sequence | 17.0 | 29.2 | 10.7 | 10.6 |
| B300 Wrong variable being checked | 4.3 | 8.3 | 14.3 | 2.1 |
| B400 Missing logic or condition test | 46.8 | 39.6 | 60.7 | 76.6 |
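The comparison with Rubey's ranking can be checked against Table 6.13. The Python sketch below transcribes that table (the names `MAJOR`, `column`, and `top_two` are illustrative), treats the blank cells as missing values, confirms that each column totals 100%, and lists the two largest categories per column.

```python
# Table 6.13: percentage of code change errors by major error category.
# None marks a blank cell (no value reported for that column).
COLUMNS = ["TRW2", "TRW3", "TRW1 Applications", "TRW1 Simulator",
           "TRW1 Operating System", "TRW1 PA Tools"]
MAJOR = {
    "Computational (A)": [9.0, 1.7, 13.5, 19.6, 2.5, 0.0],
    "Logic (B)": [26.0, 34.5, 17.1, 20.9, 34.6, 43.5],
    "Data input (C)": [16.4, 8.9, 7.3, 9.3, 8.6, 5.5],
    "Data output (E)": [None] * 6,
    "Data handling (D)": [18.2, 27.2, 10.9, 8.4, 21.0, 9.3],
    "Interface (F)": [17.0, 22.5, 9.8, 6.7, 7.4, None],
    "Data definition (G)": [0.8, 3.0, 7.3, 13.8, 7.4, 3.7],
    "Data base (H)": [4.1, 2.2, 24.7, 16.4, 4.9, 2.8],
    "Other (J)": [8.5, 0.0, 9.4, 4.9, 13.6, 35.2],
}


def column(col):
    """Return {category: percent} for one column, dropping blank cells."""
    return {cat: vals[col] for cat, vals in MAJOR.items() if vals[col] is not None}


def top_two(col):
    """Return the two categories with the largest percentages in a column."""
    ranked = sorted(column(col).items(), key=lambda kv: kv[1], reverse=True)
    return [cat for cat, _ in ranked[:2]]


for i, name in enumerate(COLUMNS):
    total = sum(column(i).values())
    print(f"{name:<22} sum={total:5.1f}  top two: {top_two(i)}")
```

Logic and data handling come out first and second for TRW2, TRW3, and the TRW1 operating system subproject. In the other TRW1 subprojects logic still ranks first or second, but the runner-up differs (data base, computational, or other).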


Table 6.15 Project TRW1 Detailed Error Category Breakdown (Percent of Major Category)

| Detailed Error Categories | Applications Software | Simulator Software | Operating System S/W | PA Tools |
|---|---|---|---|---|
| D000 DATA HANDLING ERRORS | 10.0 | 21.1 | 11.8 | 70.0 |
| D100 Data initialization not done | 6.7 | 10.5 | 17.6 | 0 |
| D200 Data initialization done improperly | 20.0 | 10.5 | 41.2 | 10.0 |
| D300 Variable used as a flag or index not set properly | 20.0 | 5.3 | 23.5 | 10.0 |
| D400 Variable referred to by wrong name | 6.7 | 21.1 | 0 | 0 |
| D500 Bit manipulation done incorrectly | 10.0 | 0 | 0 | 0 |
| D600 Incorrect variable type | 3.3 | 10.5 | 0 | 0 |
| D700 Data packing/unpacking error | 10.0 | 5.3 | 0 | 10.0 |
| D900 Subscripting error | 13.3 | 15.7 | 5.9 | 10.0 |

Table 6.16 Project TRW2 Error Sources

| Major Error Categories | % of Total Code Change Errors | Probable Source: % Design | Probable Source: % Code |
|---|---|---|---|
| Computational (AA) | 9.0 | 90 | 10 |
| Logic (BB) | 26.0 | 88 | 12 |
| I/O (CC) | 16.4 | 24 | 76 |
| Data handling (DD) | 18.2 | 25 | 75 |
| Operating system/system support software (EE) | 0.1 | (1) | (1) |
| Configuration (FF) | 3.1 | 24 | 76 |
| Routine/routine interface (GG) | 8.2 | 93 | 7 |
| Routine/system software interface (HH) | 1.1 | 73 | 27 |
| Tape processing interface (II) | 0.3 | 90 | 10 |
| User requested change (JJ) | 6.6 | 83 | 17 |
| Data base interface (KK) | 0.8 | 10 | 90 |
| User requested change (LL) | 0 | (2) | (2) |
| Preset data base (MM) | 4.1 | 79 | 21 |
| Global variable/compool definition (NN) | 0.8 | 62 | 38 |
| Recurrent (PP) | 1.3 | (1) | (1) |
| Documentation (QQ) | 0.8 | (1) | (1) |
| Requirements compliance (RR) | 0.4 | 89 | 11 |
| Unidentified (SS) | 1.0 | (1) | (1) |
| Operator (TT) | 0.7 | (1) | (1) |
| Questions (UU) | 1.1 | (1) | (1) |
| Averages | | 62% | 38% |

Notes: (1) Although errors in these categories required changes to the code, their source (design versus code) is not broken down here. The categories for which a breakdown is given encompass 95% of all code change errors. (2) For Project TRW2, product enhancements or changes to the design baseline were considered "out-of-scope" and are therefore not included here.
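The table does not state how the Averages row was derived. One calculation that reproduces it, offered here as an assumption rather than TRW's documented method, is a weighted average of the design/code split over the categories that have a breakdown, weighting each category by its share of all code change errors. A minimal Python sketch (the names `ROWS` and `weighted_split` are illustrative):

```python
# Table 6.16 (Project TRW2): category -> (% of total code change errors, % design, % code).
# Rows marked (1) or (2) in the table have no design/code breakdown and are omitted.
ROWS = {
    "Computational (AA)": (9.0, 90, 10),
    "Logic (BB)": (26.0, 88, 12),
    "I/O (CC)": (16.4, 24, 76),
    "Data handling (DD)": (18.2, 25, 75),
    "Configuration (FF)": (3.1, 24, 76),
    "Routine/routine interface (GG)": (8.2, 93, 7),
    "Routine/system software interface (HH)": (1.1, 73, 27),
    "Tape processing interface (II)": (0.3, 90, 10),
    "User requested change (JJ)": (6.6, 83, 17),
    "Data base interface (KK)": (0.8, 10, 90),
    "Preset data base (MM)": (4.1, 79, 21),
    "Global variable/compool definition (NN)": (0.8, 62, 38),
    "Requirements compliance (RR)": (0.4, 89, 11),
}


def weighted_split(rows):
    """Assumption: Averages row = share-weighted mean of each category's split.

    Returns (covered share of all errors, % design, % code).
    """
    covered = sum(share for share, _, _ in rows.values())
    design = sum(share * d for share, d, _ in rows.values()) / covered
    code = sum(share * c for share, _, c in rows.values()) / covered
    return covered, design, code


covered, design, code = weighted_split(ROWS)
print(f"coverage {covered:.1f}%  design {design:.1f}%  code {code:.1f}%")
```

This gives about 61.6% design and 38.4% code, which round to the 62% and 38% shown in the Averages row; the covered share of 95.0% also matches note (1).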

The following observations are offered about the data in Table 6.16. The overall result shown (62% of all errors being design errors and 38% coding errors) is very representative of what other studies of similar data have shown. A rule of thumb used in the industry is that about 65% of all errors will be design errors and 35% coding errors. The fact that roughly two-thirds of all errors are design errors suggests why the average cost of an error is so high. Another important point illustrated by Table 6.16 is the high cost of logic errors. Logic errors are the most frequent, and, considering that 88% of logic errors are design errors, they contribute enormously to the cost of a given development. This data and observation reinforce the point made by Rubey's data: logic errors are the most serious error type. One implication of this result is that work done by SQA personnel with specifications should be heavily concentrated in the areas of logic and data handling.
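The claim that logic errors contribute enormously to cost can be made concrete by multiplying each category's share of all code change errors by the fraction traced to design. This metric is an illustration, not one defined in the TRW study, and the sketch lists only six of the larger rows of Table 6.16 for brevity. Because no other category accounts for more than 18.2% of errors in total, none could exceed logic's figure even if all rows were included.

```python
# Selected rows of Table 6.16 (Project TRW2):
# category -> (% of all code change errors, % of that category traced to design).
SELECTED = {
    "Logic (BB)": (26.0, 88),
    "Data handling (DD)": (18.2, 25),
    "I/O (CC)": (16.4, 24),
    "Computational (AA)": (9.0, 90),
    "Routine/routine interface (GG)": (8.2, 93),
    "User requested change (JJ)": (6.6, 83),
}


def design_origin_share(rows):
    """Percent of ALL code change errors that are design-origin errors of each category."""
    return {cat: share * design / 100 for cat, (share, design) in rows.items()}


ranked = sorted(design_origin_share(SELECTED).items(), key=lambda kv: kv[1], reverse=True)
for cat, pct in ranked:
    print(f"{cat:<32} {pct:5.2f}% of all code change errors")
```

On this measure, design-origin logic errors account for about 22.9% of all code change errors, nearly three times the next-largest figure (computational, 8.1%).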

A further area to investigate is the identification of internal modules within a system that can result in high cost. That is, is there a way to identify the modules whose errors will have a large impact on the cost of the system? Specifically, a module's error becomes costly if that module has many effects on the rest of the modules in a system. A given module could be highly "coupled" with the rest of a system as a result of the parameters it passes, the global data it affects, the interrupts it can cause, or the modules it invokes. If such a highly coupled module has errors, it can be very costly, since erroneous assumptions made in the module can spread throughout the rest of the system. SQA personnel should examine module coupling to ensure that it is minimized. Note that the term module can be applied to any internal unit of a system.
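One way to make the coupling argument concrete, offered as an illustration rather than a TRW or SQA-standard metric, is to model the system as a directed graph in which an edge from A to B means a fault in A can propagate to B, and then count how many modules each module's fault can transitively reach. The module names and edges below are hypothetical.

```python
from collections import deque

# Hypothetical system: an edge A -> B means a fault in module A can propagate
# to module B (B receives A's parameters, reads global data A writes, handles
# an interrupt A raises, or is invoked by A). All names are illustrative.
AFFECTS = {
    "sensor_io": ["filter", "logger"],
    "filter": ["nav_calc"],
    "nav_calc": ["guidance", "display"],
    "guidance": ["actuator_cmd"],
    "global_state": ["filter", "nav_calc", "guidance", "display"],
    "logger": [],
    "display": [],
    "actuator_cmd": [],
}


def reach(graph, start):
    """Breadth-first search: every module a fault in `start` could reach."""
    seen, queue = set(), deque([start])
    while queue:
        for nxt in graph.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(start)  # a module reached through a cycle is not counted against itself
    return seen


impact = {m: len(reach(AFFECTS, m)) for m in AFFECTS}
for module, n in sorted(impact.items(), key=lambda kv: (-kv[1], kv[0])):
    print(f"{module:<13} can affect {n} other module(s)")
```

Here `sensor_io` (6 reachable modules) and `global_state` (5) would be the first candidates for SQA review. This counts reachability only; a fuller analysis might weight edges by the kind of coupling involved.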

Source: Handbook of Software Quality Assurance, Fourth Edition, pages 155-158.