2.2 Evolving Imperative Programs
2.2.3 Grammar-Guided Genetic Programming
The term grammar-guided genetic programming refers to a number of different techniques for introducing language grammars into the evolutionary algorithm, such that the syntactic structure of programs may be constrained [105]. This makes them well suited to introducing both the data-type constraints and the structural constraints required by high-level imperative programs. The rest of this section outlines some of the most popular grammar-guided GP approaches, together with the attempts at evolving programs with an imperative structure that have used them.
Whigham proposed context-free grammar genetic programming (CFG-GP) [166], which makes use of grammars written in Backus-Naur Form (BNF) notation. BNF grammars are context-free, so they are unable to express any of the formal semantic constraints of a language. An example BNF grammar is shown in Algorithm 2.1. Whigham’s modifications to the standard GP algorithm construct solutions, represented as parse trees, by stepping through the grammar and randomly selecting from the available set of productions in each rule. All solutions are thus created valid according to the grammar. This syntactic validity is maintained by
CHAPTER 2. BACKGROUND 25
genetic operators which replace subtrees with parse trees rooted at the same non-terminal, either randomly generated in the same way as the initial construction (for mutation) or copied from another individual in the population (for crossover). An additional benefit of grammars that Whigham explores in his work is that the grammar can be modified to bias the search towards specific constructs.
Algorithm 2.1 Example grammar in Backus-Naur Form (BNF) notation, expressing the syntax of simple arithmetic expressions
<expr>    ::= '(' <expr> <op> <expr> ')'
            | <var> | <literal>
<op>      ::= '+' | '-' | '*'
<var>     ::= 'x' | 'y'
<literal> ::= '1.0' | '2.0'
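The construction and variation steps described for CFG-GP can be illustrated with a short Python sketch over the grammar of Algorithm 2.1. This is not Whigham's implementation: the `Node` class, the function names and the `max_depth` limit on recursive productions are all assumptions made for the purpose of illustration.

```python
import copy
import random

# Grammar of Algorithm 2.1: each non-terminal maps to its list of productions.
GRAMMAR = {
    "expr": [["(", "expr", "op", "expr", ")"], ["var"], ["literal"]],
    "op": [["+"], ["-"], ["*"]],
    "var": [["x"], ["y"]],
    "literal": [["1.0"], ["2.0"]],
}

class Node:
    """Parse-tree node (hypothetical helper); terminals have no children."""
    def __init__(self, symbol, children=None):
        self.symbol = symbol
        self.children = children or []

def derive(symbol, rng, depth=0, max_depth=6):
    """Grow a parse tree rooted at `symbol` by choosing productions at random."""
    if symbol not in GRAMMAR:
        return Node(symbol)
    productions = GRAMMAR[symbol]
    if depth >= max_depth:
        # Assumed depth limit: fall back to non-recursive productions.
        productions = [p for p in productions if symbol not in p] or productions
    production = rng.choice(productions)
    return Node(symbol, [derive(s, rng, depth + 1, max_depth) for s in production])

def leaves(node):
    """Read the program text from the leaf nodes, left to right."""
    if not node.children:
        return [node.symbol]
    return [token for child in node.children for token in leaves(child)]

def nonterminal_nodes(node):
    found = [node] if node.symbol in GRAMMAR else []
    for child in node.children:
        found.extend(nonterminal_nodes(child))
    return found

def is_valid(node):
    """Check that every expansion in the tree is a production of the grammar."""
    if node.symbol not in GRAMMAR:
        return not node.children
    expansion = [child.symbol for child in node.children]
    return expansion in GRAMMAR[node.symbol] and all(is_valid(c) for c in node.children)

def mutate(root, rng):
    """Replace a random subtree with a new one rooted at the same non-terminal."""
    target = rng.choice(nonterminal_nodes(root))
    target.children = derive(target.symbol, rng).children

def crossover(receiver, donor, rng):
    """Copy into `receiver` a donor subtree rooted at the same non-terminal."""
    target = rng.choice(nonterminal_nodes(receiver))
    candidates = [n for n in nonterminal_nodes(donor) if n.symbol == target.symbol]
    if candidates:  # left unchanged if the donor has no matching non-terminal
        target.children = copy.deepcopy(rng.choice(candidates)).children

rng = random.Random(0)
parent_a = derive("expr", rng)
parent_b = derive("expr", rng)
mutate(parent_a, rng)
crossover(parent_a, parent_b, rng)
print(" ".join(leaves(parent_a)), is_valid(parent_a))
```

Because both operators only ever swap in a subtree rooted at the same non-terminal as the one removed, `is_valid` holds after any sequence of mutations and crossovers.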
Grammatical Evolution (GE) [122] is an alternative grammar-based approach which uses separate genotype and phenotype representations. The genotype, which is the representation modified by the genetic operators, is a simple sequence of codons, where each codon is an integer (or a bit string representing an integer). During evaluation a mapping operation is used, whereby the phenotypic parse tree is constructed from the grammar by selecting productions in the grammar rules according to the values of the codons. One of the key advantages of GE over other grammar-guided approaches is that the linear genotype is simple and inexpensive for genetic operators to modify. However, GE has been criticised on the topic of locality [21, 23, 139]: its search operators appear to exhibit low locality, where a small modification to the genotype results in a large change to the phenotype and to the resulting fitness of individuals.
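The genotype-to-phenotype mapping can be sketched in Python as follows, again using the grammar of Algorithm 2.1. The sketch assumes the commonly described rule in which a production is chosen as the codon value modulo the number of productions for the leftmost non-terminal. The function name `map_genotype`, the `max_wraps` limit on reusing the codon sequence, and the choice to consume a codon at every non-terminal are illustrative assumptions rather than details taken from [122].

```python
# Grammar of Algorithm 2.1: each non-terminal maps to its list of productions.
GRAMMAR = {
    "expr": [["(", "expr", "op", "expr", ")"], ["var"], ["literal"]],
    "op": [["+"], ["-"], ["*"]],
    "var": [["x"], ["y"]],
    "literal": [["1.0"], ["2.0"]],
}

def map_genotype(codons, start="expr", max_wraps=2):
    """Map a list of integer codons to a program string.

    Returns None if the mapping does not terminate within the assumed
    number of passes over the codon sequence.
    """
    symbols = [start]   # sentential form; the leftmost symbol is expanded first
    output = []
    used = 0
    limit = len(codons) * (max_wraps + 1)
    while symbols:
        symbol = symbols.pop(0)
        if symbol not in GRAMMAR:
            output.append(symbol)          # terminal: emit as program text
            continue
        if used >= limit:
            return None                    # ran out of codons: invalid individual
        productions = GRAMMAR[symbol]
        codon = codons[used % len(codons)] # wrap around the genotype if needed
        used += 1
        chosen = productions[codon % len(productions)]
        symbols = list(chosen) + symbols
    return " ".join(output)

print(map_genotype([0, 1, 0, 2, 2, 1]))  # prints "( x * 2.0 )"
print(map_genotype([1, 1, 0, 2, 2, 1]))  # prints "y"
```

The two calls differ only in the first codon, yet the phenotype changes from `( x * 2.0 )` to `y`, since every later codon is reinterpreted against a different non-terminal. This is a small instance of the low locality discussed above.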
The use of grammars provides a powerful mechanism for constraining the structure of solutions, which can be used to introduce a more naturally imperative control structure. This was demonstrated in O’Neill and Ryan’s work on evolving
<stmt>
  <if-stmt>
    'if('
    <condition>
      <var> 'x'
      <op> '<'
      <var> 'y'
    ')'
    <block>
      '{'
      <stmt>
        <assignment>
          <var> 'x'
          '='
          <var> 'y'
      '}'
(a) Concrete Syntax Tree

IfStatement
  LessThan
    Variable[x]
    Variable[y]
  Block
    Assignment
      Variable[x]
      Variable[y]
(b) Abstract Syntax Tree

Figure 2.5: Example computer program represented as both a concrete syntax tree (CST) and an abstract syntax tree (AST). In the CST, the actual syntax of the program can be read from left to right in the leaf nodes. The AST uses a higher level of abstraction to represent the semantics of the program.
multi-line C programs in GE to generate caching algorithms [119] and to solve the Santa Fe ant trail problem [120]. It has been suggested that complete BNF grammars for languages such as C can be “easily plugged in to GE” [121], but no known attempts at this appear in the literature. It seems likely that in practice it is far from easy to use GE with such large and complex grammars. One difficulty is that the context-free grammars used by GE and CFG-GP lack the expressiveness to describe the semantic constraints associated with many high-level programming constructs, for example the requirement that a variable declaration precede any use of that variable. Other authors [31, 37, 125] have described extensions to GE that do use context-sensitive grammars, but none have used these extensions to evolve imperative programs.
Other grammar-based approaches have been designed to make use of context-sensitive grammars, such as DCTG-GP [136] and LOGENPRO [172, 173], the latter of which uses definite clause grammars to induce programs in a range of languages, including imperative C programs. Definite clause grammars allow symbols to include arguments that can be used to enforce context-dependency. Wong and Leung demonstrate using the additional context information to enforce data-type constraints [173] and to evolve recursive structures that solve the general even-n-parity problem [174].
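The idea of attaching arguments to grammar symbols can be illustrated with a small Python sketch in which every non-terminal carries a type argument, and a rule written with the placeholder `"T"` passes the type of its head on to the symbols in its body. This is only an analogy to definite clause grammars, not the notation or mechanism of LOGENPRO or DCTG-GP; the rule set, the types `num` and `bool`, the variable names and all function names are hypothetical.

```python
import random

# Rules as (head symbol, head argument, body). "T" is a placeholder that is
# bound to the requested type and substituted throughout the body.
RULES = [
    ("expr", "T",    ["(", ("expr", "T"), ("op", "T"), ("expr", "T"), ")"]),
    ("expr", "bool", ["(", ("expr", "num"), ("cmp", "num"), ("expr", "num"), ")"]),
    ("expr", "T",    [("var", "T")]),
    ("op",   "num",  ["+"]),
    ("op",   "num",  ["*"]),
    ("op",   "bool", ["&&"]),
    ("op",   "bool", ["||"]),
    ("cmp",  "num",  ["<"]),
    ("var",  "num",  ["x"]),
    ("var",  "num",  ["y"]),
    ("var",  "bool", ["flag"]),
]
LEAF_TYPES = {"x": "num", "y": "num", "flag": "bool"}

def matching(symbol, arg):
    """Bodies of all rules whose head matches symbol(arg), with T bound to arg."""
    bodies = []
    for head, head_arg, body in RULES:
        if head == symbol and head_arg in ("T", arg):
            bodies.append([(s[0], arg if s[1] == "T" else s[1])
                           if isinstance(s, tuple) else s for s in body])
    return bodies

def generate(symbol, arg, rng, depth=0, max_depth=4):
    """Randomly derive a token list from symbol(arg)."""
    options = matching(symbol, arg)
    if depth >= max_depth:
        # Assumed depth limit: prefer bodies that contain no nested expression.
        flat = [b for b in options
                if not any(isinstance(s, tuple) and s[0] == "expr" for s in b)]
        options = flat or options
    tokens = []
    for s in rng.choice(options):
        if isinstance(s, tuple):
            tokens += generate(s[0], s[1], rng, depth + 1, max_depth)
        else:
            tokens.append(s)
    return tokens

def type_of(tokens):
    """Independent type check of a generated expression; raises TypeError."""
    def parse(i):
        if tokens[i] != "(":
            return LEAF_TYPES[tokens[i]], i + 1
        left, i = parse(i + 1)
        op = tokens[i]
        right, i = parse(i + 1)
        want = "bool" if op in ("&&", "||") else "num"
        result = "num" if op in ("+", "*") else "bool"
        if left != want or right != want or tokens[i] != ")":
            raise TypeError("ill-typed expression")
        return result, i + 1
    typ, end = parse(0)
    if end != len(tokens):
        raise TypeError("trailing tokens")
    return typ

rng = random.Random(0)
print(" ".join(generate("expr", "bool", rng)))
```

Because the argument is threaded through every expansion, an operator such as `+` can never be applied to `flag`, a constraint that the grammar of Algorithm 2.1 has no means of stating without duplicating its rules for each type.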
The parse trees used to represent programs in CFG-GP, GE and other grammar-based systems are concrete syntax trees (CSTs), as opposed to the abstract syntax trees (ASTs) used in other GP representations. Concrete syntax trees contain explicit elements of a language’s syntax, while abstract syntax trees are abstracted from the specific syntax of any one language and instead model the semantic constructs themselves. In this way, ASTs are a higher-level abstraction than CSTs and can be interpreted to represent any of a number of syntactic structures. This means that a computer program represented as an AST can be converted very simply to source code in any programming language that supports the same programming constructs, regardless of the syntax used to express them. Figures 2.5a and 2.5b show comparable CST and AST representations of the same program.
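The claim that an AST can be rendered in more than one concrete syntax can be made concrete with a short Python sketch. The node classes below mirror the node labels of Figure 2.5b, but their fields and the two printer functions `to_c` and `to_python` are assumptions introduced here for illustration.

```python
from dataclasses import dataclass

@dataclass
class Variable:
    name: str

@dataclass
class LessThan:
    left: Variable
    right: Variable

@dataclass
class Assignment:
    target: Variable
    value: Variable

@dataclass
class Block:
    statements: list

@dataclass
class IfStatement:
    condition: LessThan
    body: Block

def to_c(node):
    """Render the AST using C-like syntax."""
    if isinstance(node, Variable):
        return node.name
    if isinstance(node, LessThan):
        return f"{to_c(node.left)} < {to_c(node.right)}"
    if isinstance(node, Assignment):
        return f"{to_c(node.target)} = {to_c(node.value)};"
    if isinstance(node, Block):
        return "{ " + " ".join(to_c(s) for s in node.statements) + " }"
    if isinstance(node, IfStatement):
        return f"if ({to_c(node.condition)}) {to_c(node.body)}"
    raise TypeError(f"unknown node: {node!r}")

def to_python(node, indent=0):
    """Render the same AST using Python-like syntax."""
    pad = "    " * indent
    if isinstance(node, Variable):
        return node.name
    if isinstance(node, LessThan):
        return f"{to_python(node.left)} < {to_python(node.right)}"
    if isinstance(node, Assignment):
        return f"{pad}{to_python(node.target)} = {to_python(node.value)}"
    if isinstance(node, Block):
        lines = [to_python(s, indent) for s in node.statements]
        return "\n".join(lines) if lines else pad + "pass"
    if isinstance(node, IfStatement):
        return f"{pad}if {to_python(node.condition)}:\n{to_python(node.body, indent + 1)}"
    raise TypeError(f"unknown node: {node!r}")

# The program of Figure 2.5 as an AST.
program = IfStatement(
    LessThan(Variable("x"), Variable("y")),
    Block([Assignment(Variable("x"), Variable("y"))]),
)
print(to_c(program))       # prints: if (x < y) { x = y; }
print(to_python(program))
```

The tree itself contains no brackets, braces or keywords. Those appear only in the printers, which is why a single AST can serve several target languages, whereas a CST is tied to the grammar that produced it.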