
5. COEVOLUTIONARY AUTOMATED SOFTWARE CORRECTION

5.2. DESIGN


  void sort(int data[]) {
1     int i, j, temp;
2     for(i = 0; i < SIZE; ++i) {
3         for(j = 0; j < SIZE - 1; ++j) {
4             if(data[j] > data[j + 1]) {
5                 temp = data[j];
6                 data[j+1] = data[j];
7                 data[j+1] = temp; } } }
  }

Figure 5.6: Buggy Bubble Sort Function
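To make the fault in Figure 5.6 concrete, the following sketch places the function as printed next to one plausible correction. The names sortBuggy and sortFixed and the value of SIZE are illustrative assumptions, and the source does not state which repair CASC is expected to find; the fix shown is simply the standard swap.

```cpp
#include <cassert>

const int SIZE = 5;  // assumed value; Figure 5.6 leaves SIZE undefined

// Figure 5.6 as printed. Lines 6 and 7 both write data[j + 1], so
// data[j] is never updated and smaller values are overwritten.
void sortBuggy(int data[]) {
    assert(data != nullptr);
    int i, j, temp;
    for (i = 0; i < SIZE; ++i) {
        for (j = 0; j < SIZE - 1; ++j) {
            if (data[j] > data[j + 1]) {
                temp = data[j];
                data[j + 1] = data[j];  // line 6: wrong assignment target
                data[j + 1] = temp;
            }
        }
    }
}

// One plausible repair (an assumption, not taken from the source):
// line 6 assigns data[j + 1] to data[j], completing the swap.
void sortFixed(int data[]) {
    assert(data != nullptr);
    int i, j, temp;
    for (i = 0; i < SIZE; ++i) {
        for (j = 0; j < SIZE - 1; ++j) {
            if (data[j] > data[j + 1]) {
                temp = data[j];
                data[j] = data[j + 1];  // line 6 repaired
                data[j + 1] = temp;
            }
        }
    }
}
```

With the input {4, 2, 5, 1, 3}, the function as printed yields {4, 4, 5, 5, 5}. The output is ordered but is no longer a permutation of the input, so a test that checks ordering alone would not expose the fault.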

5.2.2. System Initialization Module. This module is primarily responsible for parsing the source program and creating the initial program population from the source program. After the first pass through, this module is never reentered during the run.

5.2.2.1. CASC parsing. The first major task in CASC is to parse the source program, resulting in a tree representation of the program that can then be analyzed and modified. A tree representation is used because it is a natural representation for the code elements, cleanly displaying relationships between them. This makes it relatively simple to perform modifications to, and generate the code represented by, the trees.

Special comment tags can be used in the source program to mark the start and end of the Evolvable Sections (ESs) of code (i.e., sections where a semantic error is suspected). If these tags are used, code that is not part of an ES is not modified during evolution. If no tags are present, the bodies of all the routines in the program are considered to be ESs (CASC does not yet support evolution at file/global scope). Using the ES tags can dramatically reduce the problem space for the program population. It is therefore highly recommended to first apply fault localization software or techniques to the buggy software artifact to identify the ESs (although preliminary scalability experiments showed CASC having a sublinear relationship with problem size [105]).
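The tagging scheme can be sketched as a scan over the source lines. The tag spellings "// ES-BEGIN" and "// ES-END" and the function name findEvolvableSections are hypothetical; the source does not give the actual comment syntax used by CASC.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical tag spellings; CASC's real tag syntax is not given in the text.
const std::string kEsBegin = "// ES-BEGIN";
const std::string kEsEnd = "// ES-END";

// Half-open range [first, last) of 0-based line indices inside one ES.
using LineRange = std::pair<std::size_t, std::size_t>;

// Returns the tagged sections. An empty result means no tags were found,
// in which case the caller would treat every routine body as an ES.
std::vector<LineRange> findEvolvableSections(const std::vector<std::string>& lines) {
    std::vector<LineRange> sections;
    bool inside = false;
    std::size_t start = 0;
    for (std::size_t i = 0; i < lines.size(); ++i) {
        if (lines[i].find(kEsBegin) != std::string::npos) {
            assert(!inside && "nested ES tags are not handled in this sketch");
            inside = true;
            start = i + 1;  // first line after the opening tag
        } else if (lines[i].find(kEsEnd) != std::string::npos) {
            assert(inside && "closing tag without an opening tag");
            inside = false;
            sections.emplace_back(start, i);  // excludes the tag lines
        }
    }
    return sections;
}
```

Only the lines inside the returned ranges would be candidates for modification; everything else is carried through unchanged.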

The CASC parser first converts the source code to srcML [21] using the srcML toolkit. srcML is an XML representation of source code containing both the code text and selective abstract syntax tree information. The srcML toolkit currently supports C, C++, and Java; hence these are the languages currently supported by CASC, though development has focused on C++. The resulting XML document is processed using the pugixml [53] library, which creates object trees based on the innate tree structure of XML. ESs are identified using the XML node information and are converted to ES objects, which are added to a Program object. A CASC Program is essentially a set of one or more ES objects along with the information typically stored for individuals in an EA (e.g., fitness, objective scores, bookkeeping data). Each ES object contains a forest of trees (representing the code for the ES, one tree per line of code) and a variable name registry for the ES (indicating valid variable names to use during code modification for the ES).
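The relationship between Program, ES, and tree objects can be sketched with a few plain structures. All type and member names below are illustrative assumptions rather than the actual CASC classes, and the name registry is reduced to a list of strings. The sketch relies on std::vector accepting an incomplete element type, which is guaranteed from C++17.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One node of a parsed code tree (illustrative; not CASC's actual class).
struct TreeNode {
    std::string text;                // e.g. "=", "temp", "data"
    std::vector<TreeNode> children;  // operands or sub-expressions
};

// An Evolvable Section: a forest with one tree per line of code.
struct EvolvableSection {
    std::vector<TreeNode> forest;
    std::vector<std::string> variableNames;  // simplified name registry
};

// A Program: one or more ESs plus typical EA bookkeeping.
struct Program {
    std::vector<EvolvableSection> sections;
    double fitness = 0.0;
    std::vector<double> objectiveScores;
};

// Counts the nodes of one tree, including the root.
std::size_t countNodes(const TreeNode& node) {
    std::size_t total = 1;
    for (const TreeNode& child : node.children) total += countNodes(child);
    return total;
}

// Counts the nodes of every tree in every ES of a program.
std::size_t countNodes(const Program& program) {
    assert(!program.sections.empty());  // a Program holds at least one ES
    std::size_t total = 0;
    for (const EvolvableSection& es : program.sections) {
        for (const TreeNode& root : es.forest) total += countNodes(root);
    }
    return total;
}
```

Because the trees are held by value, copying a Program copies all of its trees, which suits a design in which the population is built by modifying copies of the source program.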

Name registries are created during the parsing process from global declarations, function parameter lists for the function containing the ES, and local declarations. Each name has an associated type (e.g., int, char, float), along with qualifier and modifier information. Additionally, all name registries share an object registry, which lists the known user-defined objects; each registered user-defined object has a name registry associated with it, containing information on the public members of the object. The object registry allows for intelligent modification and use of user-defined types during evolution. Member references are treated as atomic subtrees during code modification (i.e., the object, the member access operator, and the member being accessed are kept together), which keeps modification compatible between primitive variables and object instances.
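A name registry with a shared object registry might be organized as follows. The type names, the string encoding of qualifiers and modifiers, and the helper candidatesOfType are assumptions made for illustration; only the "." member access operator is handled.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

struct NameInfo {
    std::string type;        // e.g. "int", "char", "float", or a user-defined type
    std::string qualifiers;  // e.g. "const"
    std::string modifiers;   // e.g. "[]" for arrays, "*" for pointers
};

using NameRegistry = std::map<std::string, NameInfo>;
// User-defined type name -> registry of that type's public members.
using ObjectRegistry = std::map<std::string, NameRegistry>;

struct EsNames {
    NameRegistry names;                             // names valid inside one ES
    std::shared_ptr<const ObjectRegistry> objects;  // shared by all registries
};

// Lists every name usable where a plain value of `type` is expected:
// unmodified variables of that type, plus member references such as "p.x",
// each of which is treated as a single atomic unit.
std::vector<std::string> candidatesOfType(const EsNames& es, const std::string& type) {
    assert(es.objects != nullptr);
    std::vector<std::string> result;
    for (const auto& entry : es.names) {
        const std::string& name = entry.first;
        const NameInfo& info = entry.second;
        if (!info.modifiers.empty()) continue;  // skip arrays and pointers
        if (info.type == type) {
            result.push_back(name);
            continue;
        }
        auto object = es.objects->find(info.type);
        if (object == es.objects->end()) continue;
        for (const auto& member : object->second) {
            if (member.second.type == type && member.second.modifiers.empty()) {
                result.push_back(name + "." + member.first);
            }
        }
    }
    return result;
}
```

Returning member references as single strings mirrors the idea of atomic subtrees: a variable node can be swapped for "p.x" without the modification logic needing to know that an object is involved.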

Nodes in the ES trees are assigned a type indicating the nature of the node. Each node's type belongs to a node class, which is used during evolution to help maintain syntactic validity in generated programs. The node classes used by CASC for C++ programs are shown in Table 5.1. The Misc node class contains specific names (e.g., cout, cin, NULL), operators, and other code elements that should not be generated during code evolution.
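One way node classes could help preserve syntactic validity is to restrict replacements to node types of the same class. The sketch below assumes that rule, which the text does not spell out, and covers only a subset of Table 5.1; the enum and function names are hypothetical, and operators that appear in more than one class (such as -) are listed only in their binary form.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

enum class NodeClass {
    Function, Terminal, TernaryOp, BinaryOp, UnaryOp,
    LogicalBinaryOp, BitwiseOp, Branch, Loop, Misc
};

// A deliberately incomplete subset of Table 5.1.
const std::map<std::string, NodeClass>& nodeClassTable() {
    static const std::map<std::string, NodeClass> table = {
        {"+", NodeClass::BinaryOp}, {"-", NodeClass::BinaryOp},
        {"*", NodeClass::BinaryOp}, {"/", NodeClass::BinaryOp},
        {"=", NodeClass::BinaryOp},
        {"&&", NodeClass::LogicalBinaryOp}, {"||", NodeClass::LogicalBinaryOp},
        {"<", NodeClass::LogicalBinaryOp}, {">", NodeClass::LogicalBinaryOp},
        {"<=", NodeClass::LogicalBinaryOp}, {">=", NodeClass::LogicalBinaryOp},
        {"==", NodeClass::LogicalBinaryOp}, {"!=", NodeClass::LogicalBinaryOp},
        {"!", NodeClass::UnaryOp}, {"++", NodeClass::UnaryOp},
        {"--", NodeClass::UnaryOp},
        {"if", NodeClass::Branch}, {"else", NodeClass::Branch},
        {"for", NodeClass::Loop}, {"while", NodeClass::Loop},
        {"cout", NodeClass::Misc}, {"NULL", NodeClass::Misc},
        {"return", NodeClass::Misc}
    };
    return table;
}

// Unknown node types fall back to Misc, the conservative choice.
NodeClass classify(const std::string& nodeType) {
    auto it = nodeClassTable().find(nodeType);
    return it == nodeClassTable().end() ? NodeClass::Misc : it->second;
}

// Assumed rule: a node may be replaced only by another node type of the
// same class, and Misc nodes are never generated.
std::vector<std::string> replacementsFor(const std::string& nodeType) {
    assert(!nodeType.empty());
    std::vector<std::string> result;
    const NodeClass cls = classify(nodeType);
    if (cls == NodeClass::Misc) return result;
    for (const auto& entry : nodeClassTable()) {
        if (entry.second == cls && entry.first != nodeType) {
            result.push_back(entry.first);
        }
    }
    return result;
}
```

Under this rule the > on line 4 of Figure 5.6 could become <, >=, or any other logical binary operator, while a cout node would never be offered a replacement.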

The CASC parser monitors scope level during the parsing process; when ES trees are created, each root node is assigned the appropriate scope level. CASC uses scope to indicate lines affected by control statements.
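Scope tracking can be approximated by counting braces, as in the sketch below. This is an assumption about the mechanism: CASC derives scope from the srcML parse rather than from raw text, and the function name scopeLevels is hypothetical. Braces inside string literals or comments are not handled.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Assigns each line the brace depth in effect where its first statement
// character appears. Lines holding only closing braces take the depth
// reached after those braces.
std::vector<int> scopeLevels(const std::vector<std::string>& lines) {
    std::vector<int> levels;
    int depth = 0;
    for (const std::string& line : lines) {
        bool recorded = false;
        for (char c : line) {
            if (!recorded && c != '}' && c != ' ' && c != '\t') {
                levels.push_back(depth);
                recorded = true;
            }
            if (c == '{') {
                ++depth;
            } else if (c == '}') {
                assert(depth > 0 && "unbalanced closing brace");
                --depth;
            }
        }
        if (!recorded) levels.push_back(depth);
    }
    return levels;
}
```

Applied to lines 2-7 of Figure 5.6, the levels are 0, 1, 2, 3, 3, 3: the three statements of the swap share the deepest level, which marks them as governed by the if on line 4 and both enclosing loops.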

For the running example, assume that lines 2-7 in the function shown in Figure 5.6 are indicated as an ES. The trees that would be generated for this program are shown in Figure 5.7 along with the associated name registry in Table 5.2.

5.2.2.2. Program population initialization. The program population is initialized by modifying copies of the source program employing the mutation and architecture alter operators. These operators are described in detail in Section 5.2.4.4; the primary difference in their application in this phase is that when doing mutation, the proportion of program nodes mutated is randomly selected from a Gaussian distribution.
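Drawing the mutation proportion from a Gaussian distribution can be sketched as follows. The mean and standard deviation are parameters here because the text does not give their values, and clamping the sample to [0, 1] is an assumption about how out-of-range draws are handled; both function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>

// Samples the fraction of program nodes to mutate from a Gaussian with the
// given mean and standard deviation, clamped to [0, 1] (assumed handling).
double sampleMutationProportion(std::mt19937& rng, double mean, double stddev) {
    assert(stddev > 0.0);
    std::normal_distribution<double> gauss(mean, stddev);
    return std::min(1.0, std::max(0.0, gauss(rng)));
}

// Converts a proportion into a node count for a program of the given size.
std::size_t nodesToMutate(std::size_t nodeCount, double proportion) {
    assert(proportion >= 0.0 && proportion <= 1.0);
    return static_cast<std::size_t>(
        std::lround(proportion * static_cast<double>(nodeCount)));
}
```

Sampling a fresh proportion for each copy of the source program gives an initial population in which most individuals differ slightly from the original and a few differ substantially, depending on the chosen spread.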

Table 5.1: Currently Supported C++ Node Classes

Node Class                Associated Node Types
Function                  Function Calls
Terminal                  Numeric Literal, Variable, Array, Logic (true, false), Obj. Reference, Obj. Dereference
Ternary Operator          ?:
Binary Operator           +, -, *, /, =, Modulus, Comma
Unary Operator            !, -, ++, --, new, delete, &, *
Logical Binary Operator   &&, ||, <, >, <=, >=, ==, !=
Bitwise Operator          Bitwise And, Bitwise Or, Bitwise Xor, Bitwise Not
Branch                    if, else, else if
Loop                      for, while
Misc                      Declaration, return, Comment, Insertion, Extraction, cout, cin, cerr, endl, stdout, stdin, String Literal, NULL, switch, case, default, break

Figure 5.7: Parsing Result for Running Example

5.2.3. Testing and Verification Module. This module is responsible for