
Related Work

6.7 Predicting the Fix Time of Bugs

Giger et al. [GPG10] use prediction models to support developers in cost/benefit analysis by recommending which defects should be fixed first. They investigated the relationship between the fix time of defect reports and their attributes in six sub-systems of open source software projects: Eclipse JDT, Eclipse Platform, Mozilla Core, Mozilla Firefox, Gnome GStreamer, and Gnome Evolution.

By analysing the change history of the defect reports, they compute measures for a set of defect attributes (e.g., the number of comments made on a defect report, the number of people in the CC list). In a second step, they build decision trees with the Exhaustive CHAID algorithm to classify defect reports as “Fast” or “Slow”, using as the threshold the median fix time, measured in hours from the opening of the report to its last fix.
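The labelling step can be illustrated with a short Python sketch that derives the “Fast”/“Slow” class from the median fix time. It is a minimal illustration under stated assumptions, not Giger et al.'s implementation: the function name `label_fix_times` is hypothetical, a report fixed exactly at the median is assumed to count as “Fast”, and the Exhaustive CHAID tree itself is not reproduced.

```python
from statistics import median


def label_fix_times(fix_hours):
    """Label each defect report 'Fast' or 'Slow' relative to the median fix time (in hours)."""
    threshold = median(fix_hours)
    # Assumption: a report fixed exactly at the median counts as 'Fast'
    labels = ["Fast" if hours <= threshold else "Slow" for hours in fix_hours]
    return labels, threshold


labels, threshold = label_fix_times([2, 10, 30, 100])  # toy fix times in hours
print(threshold, labels)  # 20.0 ['Fast', 'Fast', 'Slow', 'Slow']
```

Splitting at the median yields two classes of roughly equal size, so a classifier cannot score well simply by always predicting the majority class.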

To evaluate the approach, they used 10-fold cross validation: the data set is split into 10 subsets of equal size, the model is trained on 9 of them and tested on the remaining one, and this process is repeated 10 times so that each subset is used exactly once as validation data. The results of the 10 folds are then averaged to produce the performance measures. Besides precision and recall, they also used the area under the receiver operating characteristic curve (AUC) to measure the performance of the prediction models.

The authors identified assignee, reporter, and monthOpened (the month in which the defect report was opened) as the attributes with the strongest influence on the fix time of defects. The average precision varied from 60.8% (Mozilla Firefox) to 65.4% (Eclipse Platform), the recall ranged from 48.5% (Eclipse JDT) to 73.2% (Mozilla Firefox), and the AUC varied from 64.9% (Eclipse JDT) to 74.3% (Eclipse Platform). Detailed results are shown in Table 6.2.

Table 6.2 Performance of prediction models computed with initial attribute values.

Project            Precision   Recall   AUC
Eclipse JDT        63.5%       48.5%    64.9%
Eclipse Platform   65.4%       69.2%    74.3%
Mozilla Core       63.9%       64.1%    70.1%
Mozilla Firefox    60.8%       73.2%    70.1%
Gnome GStreamer    64.6%       69.4%    72.4%
Gnome Evolution    62.8%       69.5%    69.4%
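The 10-fold cross-validation procedure described above can be sketched as follows. This is a minimal, self-contained illustration, not the authors' setup: all helper names are hypothetical, a one-attribute threshold (`train_midpoint_stump`) stands in for the CHAID decision tree, “Slow” is assumed to be the positive class for precision and recall, and the folds are stratified by class (an assumption the text does not state) so that every fold contains both classes. AUC is computed as the probability that a randomly chosen “Slow” report receives a higher score than a randomly chosen “Fast” one.

```python
import random
from statistics import mean


def stratified_folds(labels, k=10, seed=0):
    """Split example indices into k folds, dealing each class round-robin so every fold keeps both classes."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    position = 0
    for cls in sorted(set(labels)):
        indices = [i for i, label in enumerate(labels) if label == cls]
        rng.shuffle(indices)
        for i in indices:
            folds[position % k].append(i)
            position += 1
    return folds


def precision_recall(y_true, y_pred, positive="Slow"):
    """Precision and recall of the positive class (assumed here to be 'Slow')."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def auc(y_true, scores, positive="Slow"):
    """AUC as the probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        return float("nan")  # undefined when a fold lacks one of the classes
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def train_midpoint_stump(xs, ys):
    """Hypothetical stand-in for the decision tree: cut one numeric attribute midway between class means."""
    fast_mean = mean(x for x, y in zip(xs, ys) if y == "Fast")
    slow_mean = mean(x for x, y in zip(xs, ys) if y == "Slow")
    cut = (fast_mean + slow_mean) / 2
    return lambda x: 1.0 if x > cut else 0.0  # score = belief that the report is 'Slow'


def cross_validate(xs, ys, train=train_midpoint_stump, k=10, seed=0):
    """Train on k-1 folds, test on the held-out fold, repeat k times and average the measures."""
    per_fold = []
    for test_idx in stratified_folds(ys, k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(xs)) if i not in held_out]
        score = train([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        y_true = [ys[i] for i in test_idx]
        scores = [score(xs[i]) for i in test_idx]
        y_pred = ["Slow" if s >= 0.5 else "Fast" for s in scores]
        per_fold.append((*precision_recall(y_true, y_pred), auc(y_true, scores)))
    # Average precision, recall and AUC over the k folds
    return tuple(mean(fold[m] for fold in per_fold) for m in range(3))


xs = list(range(1, 11)) + list(range(101, 111))  # toy attribute values
ys = ["Fast"] * 10 + ["Slow"] * 10
print(cross_validate(xs, ys))  # (1.0, 1.0, 1.0)
```

On this toy data the threshold separates the classes in every fold, so all three averaged measures are 1.0; real defect data would of course give values closer to those in Table 6.2.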

The authors also evaluated the prediction models over time, using post-submission data: in addition to the initial values, they obtained the attribute values 1 day, 3 days, 1 week, 2 weeks, and 1 month after a defect report was opened. With this approach, they improved the performance of the prediction models by 5% to 10%.
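The snapshot idea can be illustrated by recomputing two of the attributes mentioned earlier (number of comments and size of the CC list) at each observation point. The event format, the function names, the toy data, and the approximation of one month as 30 days are all assumptions made for illustration; the source does not describe how Giger et al. stored or replayed the change history.

```python
from datetime import datetime, timedelta

# Observation points: the initial values plus the post-submission offsets
SNAPSHOTS = {
    "initial": timedelta(0),
    "1 day": timedelta(days=1),
    "3 days": timedelta(days=3),
    "1 week": timedelta(weeks=1),
    "2 weeks": timedelta(weeks=2),
    "1 month": timedelta(days=30),  # assumption: one month approximated as 30 days
}


def attributes_at(opened, comment_times, cc_events, offset):
    """Hypothetical extraction: count comments and CC-list size as of opened + offset."""
    cutoff = opened + offset
    n_comments = sum(1 for t in comment_times if t <= cutoff)
    cc = set()
    for t, action, person in sorted(cc_events):  # replay CC changes in time order
        if t > cutoff:
            break
        if action == "add":
            cc.add(person)
        elif action == "remove":
            cc.discard(person)
    return {"comments": n_comments, "cc": len(cc)}


def snapshot_features(opened, comment_times, cc_events):
    return {name: attributes_at(opened, comment_times, cc_events, off) for name, off in SNAPSHOTS.items()}


opened = datetime(2010, 1, 1)  # toy report
comments = [opened + timedelta(hours=2), opened + timedelta(days=2), opened + timedelta(days=10)]
cc_events = [(opened, "add", "alice"), (opened + timedelta(days=5), "add", "bob"),
             (opened + timedelta(days=20), "remove", "alice")]
for name, values in snapshot_features(opened, comments, cc_events).items():
    print(name, values)
```

Each snapshot yields its own attribute vector, so a separate model can be trained per observation point and compared with the model built from initial values only.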

The approach developed by Giger et al. can be very useful to new developers, as it gives insight into how defects are prioritised in a software project.

6.8 PR-Miner

PR-Miner (Programming Rule Miner) [LZ05a] is a tool that uses a data mining technique called frequent itemset mining to extract implicit programming rules from source code. For example, the functions lock() and unlock() should always be used in pairs. Besides such well-known programming rules, large software contains many other implicit rules. Such rules are useful information for software development; unfortunately, they usually exist only in programmers’ minds, as they are too tedious to document manually. In addition, maintaining the rules is hard, since some of them change in new versions. Violations of these rules are easy to introduce, especially for new programmers who are unaware of them.
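Frequent itemset mining can be sketched by treating each function as a transaction whose items are the identifiers it uses. The version below is a naive Apriori-style enumeration written for clarity; PR-Miner's own, more scalable mining algorithm and its exact choice of items are not reproduced, and the function name and toy data are hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Naive Apriori: return {itemset: support} for itemsets found in >= min_support transactions."""
    transactions = [frozenset(t) for t in transactions]
    candidates = {frozenset([item]) for t in transactions for item in t}
    result = {}
    size = 1
    while candidates:
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {c: n for c, n in counts.items() if n >= min_support}
        result.update(frequent)
        size += 1
        # Join frequent sets; keep a candidate only if all its (size-1)-subsets are frequent
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) == size and all(frozenset(s) in frequent for s in combinations(union, size - 1)):
                candidates.add(union)
    return result


functions = [  # each set: identifiers used inside one (toy) function
    {"lock", "unlock", "read"},
    {"lock", "unlock", "write"},
    {"lock", "unlock"},
    {"lock", "read"},
    {"printf"},
]
for itemset, support in frequent_itemsets(functions, min_support=3).items():
    print(sorted(itemset), support)
```

With a minimum support of 3, only `{lock}`, `{unlock}` and `{lock, unlock}` survive, which is exactly the kind of co-usage that hints at an implicit rule.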

The tool not only identifies these recurrent patterns but also detects their violations. The authors report thousands of occurrences of such patterns in projects like Linux and PostgreSQL. Violations of these patterns are potential defects; once the list of suspects is generated, a manual inspection is needed to eliminate false positives. By analysing the top 60 violations, 16 defects were found in Linux, 6 in PostgreSQL, and 1 in Apache. Most of these defects violate complex rules that contain more than 2 elements and are therefore difficult for previous tools to detect.
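Once frequent itemsets are known, rules and their violations follow. The sketch below restricts itself to two-element rules {a} → {b}, whose confidence is the fraction of functions using a that also use b; every function that uses a without b is reported as a suspect. The thresholds, the function name, the toy data, and the restriction to pairs are assumptions for illustration (PR-Miner itself also handles rules with more than two elements), and the suspects would still need manual inspection.

```python
from collections import Counter
from itertools import combinations


def pair_rule_violations(functions, min_support=3, min_confidence=0.75):
    """For frequent pair rules {a} -> {b}, report functions that use a without b."""
    functions = {name: frozenset(items) for name, items in functions.items()}
    single = Counter(item for items in functions.values() for item in items)
    pair = Counter(frozenset(p) for items in functions.values() for p in combinations(sorted(items), 2))
    violations = []
    for ab, support in pair.items():
        if support < min_support:
            continue
        for a in ab:
            (b,) = ab - {a}
            confidence = support / single[a]  # share of functions using a that also use b
            if confidence >= min_confidence:
                for name, items in functions.items():
                    if a in items and b not in items:
                        violations.append((name, a, b, confidence))
    # Rank suspects by rule confidence, most confident rule first
    return sorted(violations, key=lambda v: -v[3])


functions = {
    "f1": {"lock", "unlock", "read"},
    "f2": {"lock", "unlock", "write"},
    "f3": {"lock", "unlock"},
    "f4": {"lock", "read"},
}
print(pair_rule_violations(functions))  # [('f4', 'lock', 'unlock', 0.75)]
```

Here the rule {lock} → {unlock} holds in three of the four functions that call `lock`, so `f4` is flagged as the one suspect to inspect.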

6.9 DynaMine

DynaMine [LZ05b] analyses source code check-ins and, just like PR-Miner [LZ05a], extracts coding patterns and their violations. Its method can learn both simple and complex patterns and scales to millions of lines of code.
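The mining side can be approximated by counting method calls that are inserted together in the same check-in. The sketch assumes each revision has already been reduced to the list of method calls it adds; that preprocessing step, the `min_count` threshold, the function name, and the toy data are hypothetical, and DynaMine's own pattern ranking is not reproduced.

```python
from collections import Counter
from itertools import combinations


def co_added_pairs(revisions, min_count=2):
    """Count method-call pairs inserted together in one check-in; keep pairs seen >= min_count times."""
    counts = Counter()
    for added_calls in revisions:
        for pair in combinations(sorted(set(added_calls)), 2):
            counts[pair] += 1
    return [(pair, n) for pair, n in counts.most_common() if n >= min_count]


revisions = [  # method calls added by each (toy) check-in
    ["addListener", "removeListener", "log"],
    ["addListener", "removeListener"],
    ["log"],
    ["addListener", "removeListener", "log"],
]
print(co_added_pairs(revisions))
```

Pairs such as `addListener`/`removeListener` rise to the top, but so does incidental noise like `log`, which is one reason a mined candidate still needs to be validated.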

The tool combines revision history information with dynamic analysis for the purpose of finding software errors. It largely automates the mining and dynamic execution steps and makes the results of both steps more accessible by presenting the discovered patterns as well as the results of dynamic checking to the user in custom Eclipse views.
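The dynamic checking step can be illustrated by replaying a recorded call trace against a mined pair pattern, for example that every `addListener` should later be matched by a `removeListener`. This is a simplified sketch assuming that a pattern is an ordered pair of calls; the function name and trace format are hypothetical, and how DynaMine collects traces at run time is not shown here.

```python
def check_pair_pattern(trace, first, second):
    """Replay a call trace; each `first` should be matched by a later `second`.

    Returns (matched, unmatched): unmatched counts `first` calls never followed by `second`.
    """
    pending = 0
    matched = 0
    for call in trace:
        if call == first:
            pending += 1
        elif call == second and pending:
            pending -= 1
            matched += 1
    return matched, pending


trace = ["addListener", "fire", "removeListener", "addListener", "fire"]
print(check_pair_pattern(trace, "addListener", "removeListener"))  # (1, 1)
```

A pattern that is mostly matched at run time is evidence that it is a real usage rule, while the unmatched occurrences are candidate violations.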

The authors applied the tool to two real software systems (Eclipse [Hol04] and jEdit) and found 57% of pattern occurrences, 66% of which were violations.

6.10 Concluding Remarks

In this chapter we presented some works that apply recommender systems and other artificial intelligence techniques to the software engineering domain. It was very difficult to find works closely related to ours. Most of the works concern recommender systems applied to software engineering in general: recommending people with the desired expertise [MM07, MA00, MH02], recommending developers to fix defects [AHM06], and allocating geographically distributed people to develop components together [PdSRE10]. We found only one work that applies recommender systems to testing: Kpodjedo et al. [KRGA08] developed a recommendation system that suggests which classes of an object-oriented system should be tested more thoroughly. Even the book “Artificial Intelligence Methods In Software Testing” [LKB04], which collects a representative sample of artificial intelligence applications in software testing, does not mention any application of recommender systems to software testing. To the best of our knowledge, no existing work uses recommender systems to assign test cases to testers.

On the one hand, the lack of closely related work prevented us from performing a comparative analysis against prior approaches that could have shown improvements brought by our method. On the other hand, it is strong evidence that we are addressing a new problem: the adoption of recommender systems to help test managers assign test cases to testers seems to be a new contribution.

In document Recommender Systems for Manual Testing (Page 83-86)