4. AUTOMATED FAULT LOCALIZATION

4.5. TARANTULA+

A second fitness-guided fault localization system has been created, based on the trace comparison and TBLS techniques in the FGFL system. This system is called Tarantula+¹. Tarantula+ removes many of the assumptions that Tarantula makes, and it improves and further automates the process originally proposed by Jones.

Tarantula+ uses the results of the trace comparison and TBLS techniques, along with static analysis of the faulty program, to discover the lines most likely responsible for causing an error. The TBLS technique has been altered so that test cases with more extreme fitness values alter line suspicion more dramatically. This is done by using Algorithm 4.

Algorithm 4 Suspicion Adjustment Amount Calculation used in Tarantula+

SAA = −2 · fitness + 1
if SAA < 0 then
    SAA = −(SAA²)
else
    SAA = SAA²
end if
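Algorithm 4 can be sketched in Python as follows. This is a minimal illustration rather than the system's actual code: the function name is hypothetical, and the sketch assumes that fitness is normalized to the range [0, 1], which this section does not state explicitly.

```python
def suspicion_adjustment_amount(fitness: float) -> float:
    """Sketch of Algorithm 4 (hypothetical name; fitness assumed in [0, 1])."""
    # Linear map: under the [0, 1] assumption, fitness 0 gives +1 and fitness 1 gives -1.
    saa = -2.0 * fitness + 1.0
    # Square the magnitude while preserving the sign, so that values near the
    # extremes move line suspicion far more than values near the midpoint.
    if saa < 0:
        return -(saa ** 2)
    return saa ** 2


if __name__ == "__main__":
    for f in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(f, suspicion_adjustment_amount(f))
```

Under the stated assumption, a fitness of 0.25 yields an SAA of 0.25 and a fitness of 0.75 yields −0.25, while the extremes 0 and 1 yield +1 and −1. Squaring therefore shrinks the adjustment for mid-range fitness values relative to the linear mapping.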

Additionally, support was added to the modified TBLS technique allowing the user to indicate error branches in the program through the use of an error comment (the specific comment used is indicated to the Tarantula+ system via the system's configuration file). All lines in an identified error branch would automatically be given zero suspicion. This was done because lines in positive test case execution traces would receive a negative SAA value, making them less suspicious than lines in error branches that were never executed, which could confuse the system's results.

¹The Tarantula+ system is the work of the author and his undergraduate mentee Alex Bertels. This section summarizes the preliminary studies performed on the system as presented in Alex's CS390 Undergraduate Research report and which are the basis for a conference paper in preparation.

Tarantula+ can provide additional automated static analysis of the faulty program using the parse trees produced by the system's parser (the same parser used by the CASC system, described in Section 5.2.2.1). By using these trees to find relationships between statements and code elements, additional suspicion can be applied to lines indirectly responsible for the incorrect outcome. For instance, the conditions within branch and loop statements determine whether or not certain lines are executed, and so indirectly affect the final outcome of the program.

Under the trace comparison and TBLS techniques, a branch or loop statement that contains the error can be run by both positive and negative executions, and so would not receive as much suspicion as the lines within its scope. By allowing the loop or branch statement to hold as much suspicion as the most suspicious line in its scope, the statement can take responsibility for running lines that should not have been run.
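The scope rule described above can be sketched as follows. The data layout is an assumption made for illustration: `suspicion` maps line numbers to suspicion values, and `scopes` maps the line of each branch or loop statement to every line in its scope, including lines of nested statements. The function name is hypothetical, and keeping the statement's own value when it is already the highest is also an assumption.

```python
def propagate_scope_suspicion(suspicion, scopes):
    """Raise each branch/loop statement to the maximum suspicion in its scope.

    suspicion: dict mapping line number -> suspicion value (assumed layout)
    scopes:    dict mapping statement line -> list of all lines in its scope,
               nested lines included (assumed layout)
    """
    result = dict(suspicion)
    for header, body in scopes.items():
        if not body:
            continue
        most_suspicious = max(suspicion.get(line, 0.0) for line in body)
        # Assumption: the statement keeps its own value if that is already higher.
        result[header] = max(suspicion.get(header, 0.0), most_suspicious)
    return result


if __name__ == "__main__":
    suspicion = {10: 0.1, 11: 0.7, 12: 0.3}
    scopes = {10: [11, 12]}  # line 10 is a loop whose scope holds lines 11 and 12
    print(propagate_scope_suspicion(suspicion, scopes))
```

Because each scope list is assumed to include nested lines, the result does not depend on the order in which the statements are visited.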

Another observation made by the system is that if an incorrect condition causes an 'if' statement to evaluate to false when it should have been true, then the corresponding 'else if' and 'else' statements are given the opportunity to execute the lines within their scopes. This could result in the 'else if' or 'else' statements receiving suspicion that belongs to an 'if' or another 'else if' with a wrong condition. The solution adopted was to sum the suspicion of the corresponding 'if', 'else if', and 'else' statements and apply that sum to each of the statements.
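A minimal sketch of this summing rule is given below. The representation is assumed for illustration: each chain is a list of the line numbers of one 'if' statement and its corresponding 'else if' and 'else' statements, and the function name is hypothetical.

```python
def share_chain_suspicion(suspicion, chains):
    """Give every statement in an if / else-if / else chain the chain's total.

    suspicion: dict mapping line number -> suspicion value (assumed layout)
    chains:    list of lists; each inner list holds the statement lines of
               one if / else-if / else chain (assumed layout)
    """
    result = dict(suspicion)
    for chain in chains:
        # Sum over the original values so earlier updates are not counted twice.
        total = sum(suspicion.get(line, 0.0) for line in chain)
        for line in chain:
            result[line] = total
    return result


if __name__ == "__main__":
    suspicion = {5: 0.25, 8: 0.5, 11: 0.0}  # lines of 'if', 'else if', 'else'
    print(share_chain_suspicion(suspicion, [[5, 8, 11]]))
```

With this rule, an 'if' statement whose faulty condition diverted execution into an 'else' branch ends up with the same suspicion as the 'else' statement that collected it.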

The last observation currently made by the system addresses the idea that an incorrect assignment to a variable can affect any other statement in which that variable appears. Within each function, each variable accumulates suspicion from every suspicious line in which it appears. These suspicion totals are then assigned to any line in which that variable is assigned a value, incremented, or decremented.
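The variable rule can be sketched for a single function as shown below. Several details are assumptions, since the text does not fix them: a line counts as suspicious when its suspicion is greater than zero, the variable's total is added to the existing suspicion of each assigning line rather than replacing it, and the `uses` and `defs` maps (with all names hypothetical) would in practice be derived from the parse tree.

```python
from collections import defaultdict


def distribute_variable_suspicion(suspicion, uses, defs):
    """Push per-variable suspicion totals onto the lines that modify the variable.

    suspicion: dict mapping line number -> suspicion value (assumed layout)
    uses:      dict mapping line number -> set of variable names appearing on it
    defs:      dict mapping line number -> set of variable names that are
               assigned, incremented, or decremented on it
    """
    # Accumulate, per variable, the suspicion of every suspicious line it appears in.
    totals = defaultdict(float)
    for line, names in uses.items():
        value = suspicion.get(line, 0.0)
        if value > 0:  # assumption: "suspicious" means suspicion above zero
            for name in names:
                totals[name] += value

    # Assumption: the total is added to the modifying line's existing suspicion.
    result = dict(suspicion)
    for line, names in defs.items():
        for name in names:
            result[line] = result.get(line, 0.0) + totals[name]
    return result


if __name__ == "__main__":
    suspicion = {1: 0.0, 2: 0.5, 3: 0.25}
    uses = {1: {"x"}, 2: {"x", "y"}, 3: {"x"}}
    defs = {1: {"x"}}  # x is assigned on line 1
    print(distribute_variable_suspicion(suspicion, uses, defs))
```

In the example, line 1 was not suspicious on its own, but it assigns `x`, which appears on two suspicious lines, so line 1 inherits their combined suspicion.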

These three observations take advantage of the suspicion applied by the techniques and of the relationships between statements to distribute suspicion correctly. This process is crucial to avoid misrepresenting the likelihood that a line contains the fault.

4.5.1. Preliminary Tarantula+ Results. Some preliminary experimentation has been performed using Tarantula+. The results of these experiments are summarized in Table 4.10. In these experiments, the trace comparison technique was used only for the analysis of single-function programs or of programs where the divergent path was contained in one function. The current implementation of the technique does not provide meaningful results if the divergent path starts and ends in different functions. More work is currently being done to ensure that the technique adds suspicion only to those lines in the divergent path and not to every line that falls between the start and end.

Two programs of the Siemens Suite, a widely used set of programs for fault localization, were tested along with some additional programs. These programs are listed with a description of some of the errors tested, the techniques used, and how the faulty line ranked in comparison to the other lines in the program. Programs such as print_tokens2 and replace, which contain many branch and loop statements, allow for more variety in the execution traces. Having a unique execution trace is ideal for any fault localization system. The results for shorter programs like remainder and triangleClassification benefit less from the additional suspicion added by the analysis of the program and more from the direct results of the techniques. Future work will include finding a balance between the analysis suspicion and technique suspicion based on the program length.

Table 4.10: Preliminary Tarantula+ Results