Chapter 5. Applying the Seven Basic Quality Tools in Software Development
5.7 Control Chart
The control chart is a powerful tool for achieving statistical process control (SPC). However, in software development it is difficult to use control charts in the formal SPC manner. It is a formidable task, if not impossible, to define the process capability of a software development process. In production environments, process capability is the inherent variation of the process in relation to the specification limits. The smaller the process variation, the better the process's capability. Defective parts are parts that are produced with values of parameters outside the specification limits. Therefore, direct relationships exist among specifications, process control limits, process variations, and product quality. The smaller the process variations, the better the product quality will be. Such direct correlations, however, do not exist or at least have not been established in the software development environment.
In statistical terms, process capability is defined:
$$C_p = \frac{USL - LSL}{6\sigma}$$
where USL and LSL are the upper and lower engineering specification limits, respectively, σ is the standard deviation of the process, and 6σ represents the overall process variation.
If a unilateral specification is affixed to some characteristics, the capability index may be defined:
$$C_{pu} = \frac{USL - \mu}{3\sigma}$$
where μ is the process mean, or
$$C_{pl} = \frac{\mu - LSL}{3\sigma}$$
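To make the capability indices above concrete, here is a minimal Python sketch. The function name capability_indices and the sample values are hypothetical, and σ is estimated with the sample standard deviation of the supplied measurements, a simplification of how formal SPC estimates process variation. As the rest of this section argues, the hard part in software is not this arithmetic but obtaining meaningful USL and LSL values in the first place.

```python
import statistics


def capability_indices(samples, usl=None, lsl=None):
    """Return the capability indices that the given limits allow.

    Hypothetical helper: sigma is estimated with the sample standard
    deviation, which is only a rough stand-in for the process sigma
    used in formal statistical process control.
    """
    mu = statistics.mean(samples)      # process mean
    sigma = statistics.stdev(samples)  # estimated process standard deviation
    result = {}
    if usl is not None and lsl is not None:
        result["Cp"] = (usl - lsl) / (6 * sigma)   # two-sided specification
    if usl is not None:
        result["Cpu"] = (usl - mu) / (3 * sigma)   # upper specification only
    if lsl is not None:
        result["Cpl"] = (mu - lsl) / (3 * sigma)   # lower specification only
    return result
```

For example, capability_indices([9, 10, 11], usl=16, lsl=7) gives Cp = 1.5, Cpu = 2.0, and Cpl = 1.0: the process is centered closer to the lower limit, so its one-sided index on that side is the weaker one.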
In manufacturing environments where many parts are produced daily, process variation and process capability can be calculated in statistical terms and control charts can be used on a real-time basis.
Software differs from manufacturing in several respects, and these differences make it very difficult, if not impossible, to arrive at useful estimates of the process capability of a software development organization.
The difficulties include:
Specifications for most defined metrics are nonexistent or poorly related to real customer needs.
Well-defined specifications based on customer requirements that can be expressed in terms of metrics are lacking for practically all software projects (more accurately, they are extremely difficult to derive).
Software is design and development, not production, and it takes various phases of activity (architecture, design, code, test, etc.) and considerable time to complete one project. Therefore, the life-cycle concept is more applicable to software than control charts, which are more applicable to sequential data from ongoing operations.
Related to the above, metrics and models specific to software and the life-cycle concept have been and are still being developed (e.g., software reliability models, defect removal models, and various in-process metrics), and they are still maturing. These models and metrics seem to be more effective than control charts for interpreting patterns in software data and for managing product quality.
Addison Wesley: Metrics and Models in Software Quality Engineering, Second Edition 5.7 Control Chart
Even with the same development process, there are multiple common causes (e.g., tools, methods, types of software, types of components, types of program modules) that lead to variations in quality.
The typical use of control charts in software projects regularly mixes data from multiple common cause systems.
There are also the behavioral aspects of process implementation (e.g., skills, experience, rigor of process implementation) that cause variations in the quality of the product (Layman et al., 2002).
Many assumptions that underlie control charts are not being met in software data. Perhaps the most critical one is that data variation is from homogeneous sources of variation; this critical assumption is not usually met because of the aforementioned factors. Therefore, even with exact formulas and the most suitable type of control charts, the resultant control limits are not always useful. For instance, the control limits in software applications are often too wide to be useful.
Within a software development organization, multiple processes are often used, and technology and processes change fast.
Even when a process parameter is under control in the sense of control charts, without the direct connection between process limits and end-product quality, what does it mean in terms of process capability?
Despite these issues, control charts are useful for software process improvement, provided they are used in a relaxed manner. That is, control charts in software are not used in the sense of formal statistical process control and process capability. Rather, they are used as tools for improving consistency and stability. On many occasions, they are not used on a real-time basis for ongoing operations. They are more appropriately called pseudo-control charts.
There are many types of control charts. The most common are the X-bar and S charts for sample averages and standard deviations, and the X-bar and R charts for sample averages and sample ranges. There are also median charts, charts for individuals, the p chart for proportion nonconforming, the np chart for number nonconforming, the c chart for number of nonconformities, the u chart for nonconformities per unit, and so forth. For X-bar and S charts or X-bar and R charts, the quality characteristic is assumed to follow the normal distribution. For the p and np charts, the assumed distribution is the binomial distribution. For the c and u charts, the quality characteristic is assumed to follow the Poisson distribution. For details, see a textbook on statistical quality control (e.g., Montgomery, 1985).
The most appropriate charts for software applications are perhaps the p chart, when percentages are involved, and the u chart, when defect rates are used. The control limits are calculated as the value of the parameter of interest (X-bar or p, for example) plus/minus three standard deviations. One can also increase the sensitivity of the chart by adding a pair of warning limits, which are normally calculated as the value of the parameter plus/minus two standard deviations. Because the calculation of the standard deviation differs among types of parameters, the formulas for control limits (and warning limits) also differ.
For example, control limits for defect rates (u chart) can be calculated as follows:
$$UCL = \bar{u} + 3\sqrt{\frac{\bar{u}}{n_i}}, \qquad LCL = \bar{u} - 3\sqrt{\frac{\bar{u}}{n_i}}$$
where $\bar{u}$, the value for the center line, is the cumulative defect rate (weighted average of defect rates) across the subgroups, and $n_i$ is the size of subgroup i for the calculation of defect rate (e.g., the number of lines of source code or the number of function points). The subgroups used as the unit for calculating and controlling defect rates are typically program modules, components, design review sessions of similar length in time, design segments, code segments for inspections, and units of document reviews. Note that $n_i$ in the formula is the subgroup size, so the control limits are calculated for each sample and will be different for each data point (subgroup) in the control chart. This is the first approach, with variable control limits. The second approach is to base the control chart on an average sample size, resulting in an approximate set of control limits. This requires the assumption that future sample sizes (subgroup sizes) will not differ greatly from those previously observed. If this approach is used, the control limits will be constant and the resulting control chart will not look as complex as the control chart with variable limits (Montgomery, 1985). However, if the sample sizes vary greatly, the first approach should be used.
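Both approaches can be sketched in a few lines of Python. This is a minimal illustration, not code from the source: the function name u_chart_limits and its parameters are hypothetical, and clamping a negative lower limit at zero is our own convention, since a defect rate cannot be negative. Passing k=2 instead of the default k=3 yields the warning limits described earlier in this section.

```python
import math


def u_chart_limits(defects, sizes, k=3.0, constant=False):
    """Return (u_bar, [(lcl, ucl), ...]) with one limit pair per subgroup.

    defects[i] is the defect count of subgroup i and sizes[i] its size
    (e.g., KLOC or function points). With constant=True, the average
    subgroup size replaces n_i, giving the same limits for every subgroup.
    """
    u_bar = sum(defects) / sum(sizes)  # cumulative (weighted) defect rate
    avg_n = sum(sizes) / len(sizes)
    limits = []
    for n in sizes:
        n_used = avg_n if constant else n
        half_width = k * math.sqrt(u_bar / n_used)
        # Clamp the lower limit at zero (hypothetical convention).
        limits.append((max(0.0, u_bar - half_width), u_bar + half_width))
    return u_bar, limits
```

With hypothetical data u_chart_limits([20, 40, 136], [4, 9, 36]), the center line is 4.0 defects per unit and the variable limits are approximately (1, 7), (2, 6), and (3, 5): the larger the subgroup, the tighter its limits.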
Control limits for percentages (e.g., the effectiveness metric), expressed as proportions, can be calculated as follows:
$$UCL = \bar{p} + 3\sqrt{\frac{\bar{p}(1-\bar{p})}{n_i}}, \qquad LCL = \bar{p} - 3\sqrt{\frac{\bar{p}(1-\bar{p})}{n_i}}$$
where $\bar{p}$, the center line, is the weighted average of individual percentages and $n_i$ is the size of subgroup i. As with the u chart, either the approach for variable control limits or the approach for constant control limits (provided the sample sizes don't vary greatly) can be used. If the true value of p is known, or is specified by management (e.g., a specific target of defect removal effectiveness), then p should be used in the formulas instead of $\bar{p}$.
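A corresponding sketch for the p chart follows, again hypothetical in its names (p_chart_limits, p_target) and its clamping of both limits to the range [0, 1]. It shows the two options the text mentions: a known or management-specified p replaces the weighted average as the center line, and constant=True switches to limits based on the average subgroup size.

```python
import math


def p_chart_limits(counts, sizes, p_target=None, k=3.0, constant=False):
    """Return (center, [(lcl, ucl), ...]) for a proportion (p chart).

    counts[i] / sizes[i] is the proportion for subgroup i. If p_target is
    given (true or management-specified p), it is used as the center line
    instead of the weighted average p_bar.
    """
    p_bar = sum(counts) / sum(sizes)  # weighted average of the proportions
    p = p_bar if p_target is None else p_target
    avg_n = sum(sizes) / len(sizes)
    limits = []
    for n in sizes:
        n_used = avg_n if constant else n  # constant limits use the average size
        half_width = k * math.sqrt(p * (1 - p) / n_used)
        # Proportions lie in [0, 1], so clamp both limits to that range.
        limits.append((max(0.0, p - half_width), min(1.0, p + half_width)))
    return p, limits
```

For two hypothetical subgroups of 100 with 40 and 60 nonconforming, the center line is 0.5 and the limits are about (0.35, 0.65); with p_target=0.9 they become about (0.81, 0.99).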
Several metrics from the software development process can be control charted, for instance, inspection defects per thousand lines of source code (KLOC) or per function point, testing defects per KLOC or per function point, phase effectiveness, and the defect backlog management index (as discussed in Chapter 4). Figure 5.12 shows a pseudo-control chart on testing defects per KLOC by component for a project at IBM Rochester, from which error-prone components were identified for further in-depth analysis and actions. In this case, the use of the control chart involved more than one iteration. In the first iteration, components with defect rates outside the control limits (particularly high) were identified. (It should be noted that in this example the control chart is one-sided with only the upper control limit.)
Figure 5.12. Pseudo-Control Chart of Test Defect Rate: First Iteration
In the second iteration, the previously identified error-prone components were removed and the data were plotted again, with a new control limit (Figure 5.13). This process of "peeling the onion" permitted the identification of the next set of potentially defect-prone components, some of which may have been masked on the initial charts. This process can continue for a few iterations. Priority of improvement actions as they relate to available resources can also be determined based on the order of iteration in which problem components are identified (Craddock, 1988). At each iteration, the out-of-control points should be removed from the analysis only when their causes have been understood and plans put in place to prevent their recurrence.
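The "peeling the onion" procedure can be mechanized roughly as follows, assuming a one-sided u chart (upper control limit only) on defects per KLOC. The function peel_the_onion and its (defects, KLOC) input format are hypothetical. The code drops flagged components automatically only to show the mechanics of recomputing the center line and limit; as the text stresses, in practice a component should leave the analysis only after its causes have been understood.

```python
import math


def peel_the_onion(components, k=3.0, max_iterations=5):
    """Iteratively flag components whose defect rate exceeds a one-sided UCL.

    components maps a component name to (defects, kloc). Returns a list of
    iterations, each a sorted list of the component names flagged in it.
    """
    remaining = dict(components)
    iterations = []
    for _ in range(max_iterations):
        if not remaining:
            break
        total_defects = sum(d for d, _ in remaining.values())
        total_kloc = sum(n for _, n in remaining.values())
        u_bar = total_defects / total_kloc  # recomputed center line
        flagged = sorted(
            name
            for name, (d, n) in remaining.items()
            if d / n > u_bar + k * math.sqrt(u_bar / n)  # upper limit only
        )
        if not flagged:
            break
        iterations.append(flagged)
        # In practice, remove a component only once its causes are understood.
        for name in flagged:
            del remaining[name]
    return iterations


# Hypothetical data: B is masked by A in the first pass.
data = {"A": (100, 10), "B": (40, 10)}
data.update({f"C{i}": (20, 10) for i in range(8)})
print(peel_the_onion(data))  # [['A'], ['B']]
```

In this made-up example, component B (4 defects per KLOC) stays inside the first upper limit because A inflates the center line; once A is removed, the recomputed limit exposes B, which is the masking effect the text describes.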
Figure 5.13. Pseudo-Control Chart of Test Defect Rate: Second Iteration
Another example, also from IBM Rochester, is charting the inspection effectiveness by area for the several phases of reviews and inspections, as shown in Figure 5.14. Effectiveness is a relative measure in percentage, with the numerator being the number of defects removed in a development phase and the denominator the total number of defects found in that phase, plus defects found later (for detailed discussion on this subject, see Chapter 6). In the figure, each data point represents the inspection effectiveness of a functional development area. The four panels represent high-level design review (I0), low-level design review (I1), code inspection (I2), and overall effectiveness combining all three phases (lower right). Areas with low effectiveness (below the warning and control limits) as well as those with the highest effectiveness were studied and contributing factors identified. As a result of this control charting and subsequent work, the consistency of the inspection effectiveness across the functional areas was improved.
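The effectiveness chart combines the effectiveness definition with the p-chart limits and the warning limits described earlier. The sketch below is hypothetical in its names and data, and it assumes that the defects removed in a phase equal the defects found in that phase. The denominator of each area's effectiveness serves as its subgroup size $n_i$, and areas are classified against the lower warning (2σ) and control (3σ) limits.

```python
import math


def classify_effectiveness(areas):
    """Classify functional areas by inspection effectiveness.

    areas maps an area name to (found_in_phase, found_later). Effectiveness
    is found_in_phase / (found_in_phase + found_later); that denominator is
    the subgroup size used for the p-chart limits.
    """
    total_found = sum(f for f, _ in areas.values())
    total_defects = sum(f + later for f, later in areas.values())
    p_bar = total_found / total_defects  # weighted average effectiveness
    result = {}
    for name, (found, later) in areas.items():
        n = found + later
        effectiveness = found / n
        sigma = math.sqrt(p_bar * (1 - p_bar) / n)
        if effectiveness < p_bar - 3 * sigma:
            result[name] = "below control limit"
        elif effectiveness < p_bar - 2 * sigma:
            result[name] = "below warning limit"
        else:
            result[name] = "within limits"
    return p_bar, result
```

With four hypothetical areas of 100 defects each and effectiveness of 60%, 70%, 95%, and 95%, the center line is 80%, the lower control limit is about 68%, and the lower warning limit is about 72%. The first area falls below the control limit and the second only below the warning limit.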
Figure 5.14. Pseudo-Control Chart of Inspection Effectiveness
In recent years, control charts in software applications have attracted attention. The importance of using quantitative metrics in managing software development is certainly more recognized now than previously.
A related reason may be the promotion of quantitative management by the capability maturity model (CMM) of the Software Engineering Institute (SEI) at Carnegie Mellon University. The concept and terminology of control charts are very appealing to software process improvement professionals. A quick survey of the examples of control chart applications in software in the literature, however, confirmed the challenges discussed earlier. For instance, many of the control limits in the examples were too wide to be useful. For such cases, simple run charts combined with common-sense decision making would be more useful, and control charts might not be needed. There were also cases with a one-sided control limit or a lower control limit close to zero. Both types of cases were likely due to problems related to multiple common causes and sample size. The multiple common cause challenge was discussed earlier.
With regard to sample size, again, a production environment with ongoing operations is better able to meet the challenge. The subgroup sample size can be chosen according to statistical considerations in a production environment, such as specifying a sample large enough to ensure a positive lower control limit. In software environments, however, other factors often prohibit operations that are based on statistical considerations. At the same time, it is encouraging that experts have recognized the problems, begun identifying the specific issues, started the discussions, and embarked on the process of mapping possible solutions (e.g., Layman et al., 2002).
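How large "large enough" is follows directly from the attribute-chart formulas with 3σ limits. The following is a short derivation (our own algebra, not from the source); squaring is valid because both sides are positive:

$$\bar{p} - 3\sqrt{\frac{\bar{p}(1-\bar{p})}{n}} > 0 \iff \bar{p}^{2} > \frac{9\,\bar{p}(1-\bar{p})}{n} \iff n > \frac{9(1-\bar{p})}{\bar{p}} \qquad (0 < \bar{p} < 1)$$

$$\bar{u} - 3\sqrt{\frac{\bar{u}}{n}} > 0 \iff \bar{u}^{2} > \frac{9\,\bar{u}}{n} \iff n > \frac{9}{\bar{u}} \qquad (\bar{u} > 0)$$

For instance, with a hypothetical $\bar{p} = 0.05$, each subgroup would need more than $9 \times 0.95 / 0.05 = 171$ units, and with a hypothetical $\bar{u} = 0.5$ defects per KLOC, more than 18 KLOC per subgroup. Software subgroups such as individual modules often cannot be sized this way, which is consistent with the lower control limits near zero noted above.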
To make control charts more applicable and acceptable in the software environment, a high degree of ingenuity is required. Focused effort in the following three areas by experts of control charts and by software process improvement practitioners will yield fruitful results:
1. The control chart applications in software thus far are the Shewhart control charts. Alternative