5. COEVOLUTIONARY AUTOMATED SOFTWARE CORRECTION
5.2. DESIGN
5.2.1. Approach Overview
Initialization, Testing and Verification, and Testing and Correction. The System Initialization module is passed through one time at the start of a run and is not reentered. This module performs setup tasks needed by the other modules.
Figure 5.5: CASC Testing, Correction, and Verification Process
After initialization is complete, control is passed to the Testing and Verification module. This module attempts to identify test cases that demonstrate buggy behavior in the current candidate solution (in the first pass, this is the source program). If a bug is not demonstrated then the system exits; otherwise the system attempts to create additional test cases that demonstrate the bug and then passes control to the Testing and Correction module.
The Testing and Correction module is responsible for creating a program (based on the source program) that passes all test cases created by the system. If no such program is found, then the system exits; otherwise the created program is marked as the candidate solution and control is passed back to the Testing and Verification module.
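The control flow between the two modules can be sketched as a simple loop. The following is a minimal sketch, not the CASC implementation: the names `Program`, `TestCase`, `Correction`, `Modules`, `RunResult`, and `runCasc` are hypothetical, a program is assumed to be represented by its source text, and the one-time System Initialization step is assumed to have completed before the loop is entered.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-ins: a program is its source text, a test case an opaque id.
using Program = std::string;
using TestCase = int;

// Result of one pass through the (hypothetical) Testing and Correction module.
struct Correction {
    bool found;       // false if no program passing all test cases was found
    Program program;  // the created program when found is true
};

// The two modules are supplied as callables so the loop itself stays generic.
struct Modules {
    // Returns test cases demonstrating a bug in the candidate (empty if none).
    std::function<std::vector<TestCase>(const Program&)> testAndVerify;
    // Tries to create a program, based on the source, that passes all test cases.
    std::function<Correction(const Program&, const std::vector<TestCase>&)> testAndCorrect;
};

enum class Outcome { NoBugDemonstrated, NoCorrectionFound };

struct RunResult {
    Outcome outcome;
    Program candidate;  // candidate solution at the time the system exited
    int corrections;    // number of times a new candidate was accepted
};

RunResult runCasc(const Program& source, const Modules& modules) {
    assert(modules.testAndVerify && modules.testAndCorrect);
    Program candidate = source;   // in the first pass this is the source program
    std::vector<TestCase> tests;  // every test case created by the system so far
    int corrections = 0;
    for (;;) {
        const std::vector<TestCase> failing = modules.testAndVerify(candidate);
        if (failing.empty()) {
            return RunResult{Outcome::NoBugDemonstrated, candidate, corrections};
        }
        tests.insert(tests.end(), failing.begin(), failing.end());
        const Correction correction = modules.testAndCorrect(source, tests);
        if (!correction.found) {
            return RunResult{Outcome::NoCorrectionFound, candidate, corrections};
        }
        candidate = correction.program;  // mark as the new candidate solution
        ++corrections;
    }
}
```

The loop has exactly the two exits described above: no bug can be demonstrated in the current candidate, or no program passing all test cases can be found.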
CASC utilizes multiple EAs, which are described in this section. For each run, a set of EA strategy parameters used by the system is provided via a configuration file. Except when specifically noted, the set of EA strategy parameters provided in this file is used uniformly by the EAs in the system.
This section is organized based on the major phases in the flow chart shown in Figure 5.5:
• System initialization is discussed in Section 5.2.2, with program parsing and program population initialization discussed in Sections 5.2.2.1 and 5.2.2.2, respectively.
• The Testing and Verification module is described in Section 5.2.3. Test case initialization is discussed in Section 5.2.3.1, coverage-based test case set creation is discussed in Section 5.2.3.2, and the testing and verification EA is described in Section 5.2.3.3.
• The Testing and Correction module is described in Section 5.2.4. Program and test case evaluation is described in Section 5.2.4.1, with the optimization methods supported in CASC described in Section 5.2.4.2. The program reproduction operators used are described in Section 5.2.4.4. Search stagnation detection is discussed in Section 5.2.4.5.
The method of communicating problem specific information to an automated repair system is an important aspect of any such system; in order to be practical, the system must have the versatility to address a variety of problems. In the CASC system the test case object embodies all problem specific information needed by the system and supporting subsystems. This is accomplished through the use of the dynamic polymorphism design paradigm [14]. An Abstract Test Case (ATC) object is provided by the system, laying out guidelines for the expected functionality that a Problem Specific Test Case (PSTC) object will need. The goal of this design is to centralize all problem specific implementation in a single object, making the transition to new problems as smooth as possible. Key functionality included in the current ATC object is:
• Mutation, crossover, and randomization of test cases
• Creation of input needed by the program from the test case
• Creation of expected output for the test case
• Reading in and storing the output of an execution
• Scoring an execution (i.e., calculation of objective scores for the problem)
This functionality essentially makes the system able to serve as its own oracle, removing the need for an external oracle. Clearly, this object is limited to a specific class of problems, namely those for which expected output can be generated for the test case. Through use of the polymorphic design of the system, additional ATC objects could be designed for other classes of problems.
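The functionality listed above can be sketched as an abstract base class with one problem specific subclass. This is a minimal sketch under stated assumptions, not the CASC interface: the class names `AbstractTestCase` and `SortTestCase`, all method names, the gene value range of 0 to 99, the space-separated input and output format, and the single pass/fail score are hypothetical choices made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <random>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical ATC: one virtual method per item of functionality listed above.
class AbstractTestCase {
public:
    virtual ~AbstractTestCase() {}
    virtual void randomize(std::mt19937& rng) = 0;
    virtual void mutate(std::mt19937& rng) = 0;
    virtual std::unique_ptr<AbstractTestCase> crossover(
        const AbstractTestCase& other, std::mt19937& rng) const = 0;
    virtual std::string programInput() const = 0;    // input for the program
    virtual std::string expectedOutput() const = 0;  // acts as the oracle
    virtual void readOutput(const std::string& output) = 0;
    virtual double score() const = 0;
};

// Hypothetical PSTC for a sorting problem; genes are the values to be sorted.
class SortTestCase : public AbstractTestCase {
public:
    explicit SortTestCase(std::vector<int> genes) : genes_(std::move(genes)) {}

    void randomize(std::mt19937& rng) override {
        std::uniform_int_distribution<int> value(0, 99);  // assumed value range
        for (int& g : genes_) g = value(rng);
    }
    void mutate(std::mt19937& rng) override {
        if (genes_.empty()) return;
        std::uniform_int_distribution<std::size_t> index(0, genes_.size() - 1);
        std::uniform_int_distribution<int> value(0, 99);
        genes_[index(rng)] = value(rng);  // replace one randomly chosen gene
    }
    std::unique_ptr<AbstractTestCase> crossover(
        const AbstractTestCase& other, std::mt19937& rng) const override {
        const SortTestCase* mate = dynamic_cast<const SortTestCase*>(&other);
        assert(mate != nullptr);  // both parents must be of the same PSTC type
        const std::size_t limit = std::min(genes_.size(), mate->genes_.size());
        std::uniform_int_distribution<std::size_t> cut(0, limit);
        const std::size_t c = cut(rng);  // one-point crossover position
        std::vector<int> child(genes_.begin(), genes_.begin() + c);
        child.insert(child.end(), mate->genes_.begin() + c, mate->genes_.end());
        return std::unique_ptr<AbstractTestCase>(new SortTestCase(child));
    }
    std::string programInput() const override { return join(genes_); }
    std::string expectedOutput() const override {
        std::vector<int> sorted = genes_;
        std::sort(sorted.begin(), sorted.end());
        return join(sorted);
    }
    void readOutput(const std::string& output) override { actual_ = output; }
    // Simplified to a single pass/fail objective: 1.0 on an exact match.
    double score() const override {
        return actual_ == expectedOutput() ? 1.0 : 0.0;
    }

private:
    static std::string join(const std::vector<int>& values) {
        std::ostringstream out;
        for (std::size_t i = 0; i < values.size(); ++i) {
            if (i > 0) out << ' ';
            out << values[i];
        }
        return out.str();
    }
    std::vector<int> genes_;
    std::string actual_;  // output stored from the most recent execution
};
```

Because the rest of the system would only ever hold `AbstractTestCase` references in such a design, moving to a new problem would mean writing a new subclass rather than touching the search machinery.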
The described ATC object may, at first, appear to impose steep requirements on the user of the CASC system; however, any system that removes the need for a priori generated test case sets made through the use of an external oracle will need to impose similar requirements. In other words, the system will need to be told what defines a test case for the problem and how to use and manipulate such a test case. Through the use of the described ATC object (or another similar object), automated software engineering systems (such as CASC) can be expected to achieve a higher degree of automation and perform more comprehensive and intelligent search than those that rely on a priori generated test sets and external oracles.
The comprehensiveness of the testing performed by the CASC system is directly related to that of the PSTC being used. For example, if a bug is only demonstrated for test cases with duplicate genes, but the PSTC disallows the creation of duplicate genes in a test case, then the system will not be able to demonstrate the bug and, as such, will be unable to correct it. If known beforehand, information regarding the nature of the error in the program could be exploited during the design of a PSTC by restricting the test cases that can be generated by the PSTC, making the implementation simpler. However, such restrictions introduce bias into the system and, as such, limit its effectiveness, so they should be used with caution. Following the previous example, assume a PSTC is implemented that focuses exclusively on the generation and manipulation of test cases that contain duplicate genes. The CASC system would be expected to quickly identify the error in question and would likely correct it; however, if the error was masking a second error or, in the process of correcting the error, the system introduced a new error that was not reliant on duplicate genes, then the solution presented by the system would essentially be a false positive. These dangers are also present when using a priori generated test case sets, since the system is then completely reliant on the generating entity's ability to generate a test case set that is comprehensive enough to correct the problem effectively.
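The effect of a restricted test case generator can be illustrated with a small experiment. This is a sketch under stated assumptions, not part of CASC: `sortLosingDuplicates` is a hypothetical buggy program whose error only shows for inputs with duplicate values, and `genNoDuplicates`, `genAllowingDuplicates`, and `countFailures` are hypothetical helpers, with a test case length of 8 chosen arbitrarily.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <set>
#include <vector>

// Hypothetical buggy program: sorts correctly but silently drops duplicates.
std::vector<int> sortLosingDuplicates(const std::vector<int>& input) {
    std::set<int> unique(input.begin(), input.end());
    return std::vector<int>(unique.begin(), unique.end());
}

// Restricted generator: a random permutation of 0..n-1, so never a duplicate.
std::vector<int> genNoDuplicates(std::size_t n, std::mt19937& rng) {
    std::vector<int> genes(n);
    std::iota(genes.begin(), genes.end(), 0);
    std::shuffle(genes.begin(), genes.end(), rng);
    return genes;
}

// Less restricted generator: each gene drawn independently from 0..n-1,
// so duplicates may (and for n = 8 usually do) occur.
std::vector<int> genAllowingDuplicates(std::size_t n, std::mt19937& rng) {
    assert(n > 0);
    std::uniform_int_distribution<int> value(0, static_cast<int>(n) - 1);
    std::vector<int> genes(n);
    for (int& g : genes) g = value(rng);
    return genes;
}

typedef std::vector<int> (*Generator)(std::size_t, std::mt19937&);

// Counts how many generated test cases demonstrate the bug.
int countFailures(Generator generate, int trials, unsigned seed) {
    assert(generate != nullptr);
    std::mt19937 rng(seed);
    int failures = 0;
    for (int t = 0; t < trials; ++t) {
        const std::vector<int> input = generate(8, rng);
        std::vector<int> expected = input;
        std::sort(expected.begin(), expected.end());
        if (sortLosingDuplicates(input) != expected) ++failures;
    }
    return failures;
}
```

With the restricted generator the failure count is zero regardless of the number of trials, so a search built on it could never demonstrate this bug; the less restricted generator exposes it as soon as a test case containing a duplicate is produced.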
For the remainder of this discussion, a running example will be used to demonstrate various aspects of CASC using the buggy bubble sort function shown in Figure 5.6. For the sake of the example, assume that this function is being focused on for correction and, as such, the specifications being used are specifically for the sorting function; namely, that after execution the resulting data array is a permutation of the input data array and that it is in sorted order. Also assume that the function shown is embedded in an otherwise complete program that in some way communicates the input and output data array. In this program the error is on line 6, which should be data[j] = data[j+1].
const int SIZE = 10;
...
void sort(int data[]) {
1   int i, j, temp;
2   for(i = 0; i < SIZE; ++i) {
3     for(j = 0; j < SIZE - 1; ++j) {
4       if(data[j] > data[j + 1]) {
5         temp = data[j];
6         data[j+1] = data[j];
7         data[j+1] = temp;
        }
      }
    }
}
Figure 5.6: Buggy Bubble Sort Function
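The specification used for the running example (the output is a permutation of the input and is in sorted order) can be checked directly in code. The following is a minimal sketch: `buggySort` reproduces the function of Figure 5.6 without its line numbers, `correctedSort` applies the fix to line 6 stated in the text, and `meetsSpecification` and `passes` are hypothetical helper names introduced only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

const int SIZE = 10;

// The function from Figure 5.6, reproduced with its error intact.
void buggySort(int data[]) {
    int i, j, temp;
    for (i = 0; i < SIZE; ++i) {
        for (j = 0; j < SIZE - 1; ++j) {
            if (data[j] > data[j + 1]) {
                temp = data[j];
                data[j + 1] = data[j];  // line 6: the error
                data[j + 1] = temp;
            }
        }
    }
}

// The same function with line 6 changed to data[j] = data[j+1].
void correctedSort(int data[]) {
    int i, j, temp;
    for (i = 0; i < SIZE; ++i) {
        for (j = 0; j < SIZE - 1; ++j) {
            if (data[j] > data[j + 1]) {
                temp = data[j];
                data[j] = data[j + 1];
                data[j + 1] = temp;
            }
        }
    }
}

// Specification: output is a permutation of the input and is in sorted order.
bool meetsSpecification(const std::vector<int>& input,
                        const std::vector<int>& output) {
    return input.size() == output.size() &&
           std::is_permutation(input.begin(), input.end(), output.begin()) &&
           std::is_sorted(output.begin(), output.end());
}

// Runs a sort function on a copy of the input and checks the specification.
bool passes(void (*sortFunction)(int[]), const std::vector<int>& input) {
    assert(input.size() == static_cast<std::size_t>(SIZE));
    std::vector<int> output = input;
    sortFunction(output.data());
    return meetsSpecification(input, output);
}
```

For a reverse-ordered input, `buggySort` overwrites every element with the largest value, so the result is in sorted order but is not a permutation of the input. An input that is already sorted passes through unchanged, so not every test case demonstrates the bug.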
5.2.2. System Initialization Module. This module is primarily