The presentation of the PARTE algorithm is organized as follows:
First, an overview of PARTE and its algorithmic context is given. Next, the parse tree topology used to represent programs in PARTE is introduced, followed by the details of how the parse trees are encoded in genomes and by the genotype-phenotype mapping, which is a prerequisite for evaluating tree individuals. The subsequent section shows how PARTE complexifies the tree topologies during evolution. Next, the details of the selection, mutation, and crossover procedures are presented. Finally, the appearance of code bloat in GP and the way PARTE avoids it are discussed.
3.1 Overview of PARTE and its Algorithmic Context
Parse Tree Evolution (PARTE) is an evolutionary algorithm that, similar to Genetic Programming (GP), evolves a population of programs represented as parse trees. PARTE differs from GP in various aspects, though: Contrary to GP, the tree structure is an explicit part of the program’s functionality. Furthermore, a tree node in PARTE consists of both a function and the variables/constants that serve as arguments to the function. Consequently, the evolutionary search mechanisms are partly different too.
In PARTE, each parse tree genome consists of one or several binary-encoded genes and of the information on how the genes are interconnected.
In order to evaluate a parse tree, the tree genome is mapped to the corresponding phenome (for a comprehensive discussion of how to generate legal phenotypes, see for example Yu and Bentley (1998)). Subsequently, the phenomes are evaluated to obtain the fitness of the parse trees.
Search and solution space are thus separated. This helps to ensure the validity of the program output (as discussed in Banzhaf (1994) and Keller and Banzhaf (1996)).
For the genotype-phenotype mapping, PARTE uses a Backus-Naur Form (BNF) grammar. Several attempts to use grammars in evolutionary algorithms have been undertaken. For an interesting application of BNF in evolutionary algorithms, see Ryan, Collins and O’Neill (1998) or O’Neill and Ryan (2001). Other examples of grammar usage are Whigham (1995) and Freeman (1998).
To select the parents that will breed the following generation, PARTE uses the ’Lexicographic Parsimony Pressure’ tournament, a method proposed by Luke and Panait (2002).
Part of the offspring is produced by mutating one parent, the other part by crossing over two parents. PARTE’s crossover mechanism is inspired by the “NeuroEvolution of Augmenting Topologies” (NEAT) algorithm developed by Stanley and Miikkulainen (2002). It allows for an incremental complexification of the tree topology. Mutation either changes the content of existing genes, changes the information on how they are interconnected, or adds a completely new gene to the genome.
Using crossover and mutation, an entire new generation of parse trees is produced. It replaces the parent generation, unless the best parent is better than the best offspring. In this case, the best parent replaces the worst offspring in the new generation.
3.2 Parse Tree Topology
PARTE evolves a population of programs that are represented as parse trees. Each node of a tree consists of three elements:
• A function f
• A number a of variables or constants as an input to the function
• An up-level pointer
Note that f ∈ ℱ, where ℱ is the previously defined set of all functions to be used in the program, and that a corresponds to the arity of f.
In a tree with depth d ≥ 2, the up-level pointer indicates the program node that is situated in the tree hierarchy a level above. If there is only one level of program depth, the pointer of the nodes is zero.
The tree hierarchy represents an explicit functionality itself. There are two possible situations for a program node in the parse tree: It can be dependent on another program node, or it can be parallel to it. Thus, we can define two functions with arity a = 2 that will be invoked when either structure is present.
An obvious choice would be to define the following function mapping:
Position     Arithmetic function     Boolean function
parallel     *                       OR
dependent    +                       AND
However, the function mapping is not restricted, as long as the arity a = 2 is respected.
The following example illustrates the above: Assume that the trading strategy program of tree i is to buy an asset when the 20-day price trend is upward sloping, and either the
5-day momentum is accelerating or the Chaikin Oscillator is not negative. Write the strategy program as:
$$s_i = \mathrm{AND}\Big(\mathrm{IS}(\bar{p}'_{20,k} > 0),\ \mathrm{OR}\big(\mathrm{NOT}(co_k < 0),\ \mathrm{IS}(momentum_{5,k} > 0)\big)\Big)$$
As the function mapping of the tree structure, choose the Boolean AND for dependence and the Boolean OR for parallelism. The example parse tree thus contains three nodes. The second and third nodes depend on the first node, and their pointers indicate this dependence. As they are situated in parallel at the same level, they are evaluated with the function OR. The AND function subsequently evaluates the first node together with the result of the OR operation.
The set of node functions ℱ comprises two functions: IS and NOT. Both have arity a = 1. Each program node thus contains one variable. The set of variables Ψ corresponds to the set of raw signals Θ (as defined in Section 2.3). The variables and functions used in the program nodes are:
• $(\bar{p}'_{20,k} > 0)$ input to IS in the first program node
• $(co_k < 0)$ input to NOT in the second program node
• $(momentum_{5,k} > 0)$ input to IS in the third program node
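The evaluation order described above can be illustrated with a minimal Python sketch. It assumes the Boolean mapping chosen in the example (AND for dependence, OR for parallelism) and that higher-level nodes are addressed by a 1-based identifier; the names `Node`, `node_id`, and `evaluate` are hypothetical and not part of PARTE itself.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Node:
    node_id: int                  # 1-based identifier (assumed convention)
    func: Callable[[bool], bool]  # node function with arity 1 (IS or NOT)
    arg: bool                     # truth value of the raw signal fed to func
    up: int                       # up-level pointer; 0 means no node above


def IS(x: bool) -> bool:
    return x


def NOT(x: bool) -> bool:
    return not x


def evaluate(nodes: List[Node], parent: int = 0) -> bool:
    """Combine parallel nodes with OR and dependent nodes with AND."""
    results = []
    for node in nodes:
        if node.up != parent:
            continue
        value = node.func(node.arg)
        has_children = any(child.up == node.node_id for child in nodes)
        if has_children:
            # Dependence: the node is AND-ed with the result of the level below.
            value = value and evaluate(nodes, node.node_id)
        results.append(value)
    # Parallelism: nodes sharing the same up-level pointer are OR-ed.
    return any(results)


def strategy(trend_up: bool, co_negative: bool, momentum_up: bool) -> bool:
    """The three-node example strategy of this section."""
    return evaluate([
        Node(1, IS, trend_up, 0),
        Node(2, NOT, co_negative, 1),
        Node(3, IS, momentum_up, 1),
    ])
```

In this sketch, `strategy(True, False, False)` signals a buy because the trend is upward and the Chaikin Oscillator is not negative, whereas any input with `trend_up=False` is rejected by the AND at the top level.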
3.3 Parse Tree Genomes
The information about the structure and content of a parse tree is stored in the genome of each individual of the PARTE population. Each gene in the genome of an individual represents a program node. The number of genes in the genome thus depends on the complexity of the corresponding parse tree.
Part of the information contained in a gene is binary encoded, namely:
• a pointer to a function f ∈ ℱ
• a number a of pointers to variables/constants that serve as arguments for f
The up-level pointer is encoded as an integer value.
Additionally, each gene contains an innovation number. This innovation number records the moment in evolution at which the gene was created. It is non-modifiable.
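As an illustration only, the gene layout described in this section could be held in a small Python record. The field names, the use of character strings for the binary parts, and the 7-bit width are assumptions of this sketch, not a specification of PARTE.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Gene:
    function_bits: str        # binary-encoded pointer to a function f
    argument_bits: List[str]  # one binary-encoded pointer per argument of f
    up_pointer: int           # integer-encoded up-level pointer (0 = none)
    innovation: int           # set at creation time


# A genome is a list of genes, one gene per program node.
genome: List[Gene] = [
    Gene("0000100", ["0000011"], up_pointer=0, innovation=1),
    Gene("0000101", ["0000100"], up_pointer=1, innovation=2),
    Gene("0000110", ["0000101"], up_pointer=1, innovation=3),
]
```

Here the genome length equals the number of program nodes, so a more complex tree simply corresponds to a longer list.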
3.4 Genotype-Phenotype Mapping
PARTE transcribes the genotype of a tree individual into its phenotype using a Backus-Naur Form (BNF) grammar. A BNF grammar is represented as a tuple (S, N, T, P), where S is a start symbol, N is the set of non-terminals, T is the set of terminals, and P is a set of production rules that maps the elements of N to T (see Naur (1963)). To avoid possible confusion, note that a “terminal” in the BNF context does not correspond to a “terminal” as used in the context of GP.⁴
The transcription procedure takes four steps:
• First, the integer value of the binary string is calculated.
• Second, the result of the expression (integer value) MODULO (number of BNF production rules) is evaluated.
• The resulting integer is then used to select one of a set of BNF-production rules.
• Finally, the production rule is evaluated.
The procedure is repeated until all binary strings of all genes are transcribed and the parse tree is defined.
⁴ Note that the language to which the BNF is applied can be chosen freely. In many cases it will be optimal to select a language close to a given problem.
To illustrate the above, consider again the example trading strategy of Section 3.2. Note that f indicates a function in the set of all functions ℱ, and ψ denotes one variable in the set of all variables Ψ. The brackets { } indicate a set.
S = { f }
N = { f, ψ }
T = { ℱ, Ψ }
P = { f ::= IS ( ψ )
        | NOT ( ψ )
      ψ ::= $(\bar{p}'_{20,k} > 0)$
        | $(co_k < 0)$
        | $(momentum_{5,k} > 0)$
        | .... }
It is possible to define S = { f }, as a program node always contains a function. There are two functions f in ℱ. The first production rule, which maps an element of ℱ to f, is thus twofold: f ::= IS( ψ ), or f ::= NOT( ψ ). The second production rule maps an element of Ψ to ψ. It contains as many cases as there are indicators in Ψ. In the trading strategy example, the first three cases of the second production rule would have been executed.
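The transcription steps can be sketched in Python for the example grammar. This is a minimal illustration under stated assumptions: the indicator list is cut to the three cases shown (the grammar above leaves further cases open), the indicators are written as plain text labels, and the names `RULES`, `decode`, and `transcribe` as well as the 7-bit strings are hypothetical.

```python
from typing import Dict, List

# Production rules of the example grammar; the indicator list is truncated
# to the three cases written out above (assumption of this sketch).
RULES: Dict[str, List[str]] = {
    "f": ["IS", "NOT"],
    "psi": ["p20_trend>0", "co<0", "momentum5>0"],
}


def decode(bits: str, symbol: str) -> str:
    """Select one production of `symbol` from a binary string."""
    value = int(bits, 2)           # step 1: integer value of the binary string
    options = RULES[symbol]
    index = value % len(options)   # step 2: modulo the number of productions
    return options[index]          # step 3: pick the production rule


def transcribe(function_bits: str, argument_bits: str) -> str:
    """Step 4: evaluate f ::= IS(psi) | NOT(psi) for one gene."""
    return f"{decode(function_bits, 'f')}({decode(argument_bits, 'psi')})"
```

For instance, `transcribe("0000101", "0000100")` decodes 5 mod 2 = 1 and 4 mod 3 = 1, which yields the node `NOT(co<0)`. Different binary strings can decode to the same production, for example `"0000001"` and `"0000011"` both select NOT.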
3.5 Evolutionary Complexification of the Tree Topology
PARTE starts its evolutionary search with a population of trees that contain a single program node. The genes that correspond to the node are initialized at random. This implies that both the initial function of a node and the arguments are randomly selected from their respective sets ℱ and Ψ.
Once all trees are created, the population is evaluated with the target fitness function ξ.
ξ has to be defined according to the problem to be solved. In the case of the trading strategy example, ξ may be the profitability of the trading strategy over time.
The next, potentially more complex, generation is created by mutating individual parents and by crossing over two parents. The number of offspring nbo that is bred corresponds to the number of parents nbp.⁵
The offspring population replaces the parent population, with one exception: If the best parent is better than the best offspring, it replaces the worst offspring in the new population.⁶
The evolutionary search continues until one of the following stop criteria is met:
• The solution found is at least as good as an initially defined quality target
• The maximum number of generations has been reached
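The replacement rule and the stop criteria can be written down compactly. The following Python sketch assumes that a larger fitness value is better and that nbo = nbp; the function names `next_generation` and `should_stop` are hypothetical.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def next_generation(parents: List[T], offspring: List[T],
                    fitness: Callable[[T], float]) -> List[T]:
    """Offspring replace the parents; the best parent survives only if it
    beats the best offspring, in which case it replaces the worst offspring."""
    best_parent = max(parents, key=fitness)
    best_child = max(offspring, key=fitness)
    new_population = list(offspring)
    if fitness(best_parent) > fitness(best_child):
        worst = min(range(len(new_population)),
                    key=lambda i: fitness(new_population[i]))
        new_population[worst] = best_parent
    return new_population


def should_stop(best_fitness: float, target: float,
                generation: int, max_generations: int) -> bool:
    """Stop when the quality target is met or the generation budget is used."""
    return best_fitness >= target or generation >= max_generations
```

With this rule the best fitness in the population never decreases from one generation to the next, while the population size stays constant.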
3.6 Selection
The parents that will breed the new generation are selected using the ’Lexicographic Parsimony Pressure’ tournament method proposed by Luke and Panait (2002). A subset of the population is randomly chosen to compete in a tournament. The tournament is played, and the winner of the tournament is selected as a parent. If two equally fit competitors meet during the tournament, the one with the shorter genome wins.
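A tournament with this tie-breaking rule can be sketched in a few lines of Python. The sketch assumes that a larger fitness value is better and that genome length is given by `len`; the function name `tournament` and its parameters are hypothetical.

```python
import random
from typing import Callable, List, Sequence


def tournament(population: List[Sequence], fitness: Callable[[Sequence], float],
               size: int, rng: random.Random) -> Sequence:
    """Return the tournament winner: highest fitness first, and among
    equally fit competitors the one with the shorter genome."""
    competitors = rng.sample(population, size)
    # Lexicographic comparison: fitness is the primary key, and the negated
    # genome length is the secondary key, so shorter genomes win ties.
    return max(competitors, key=lambda genome: (fitness(genome), -len(genome)))
```

Because size enters only as a tie-breaker, it never overrides a genuine fitness difference between two competitors.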
⁵ However, this is not a constraint: A highly elitist setup is possible where nbo > nbp and only the best offspring are kept, as well as an approach closer to the steady state where nbo < nbp.
⁶ Again, this is not a constraint: One can decide to keep an arbitrary number ∈ [0, nbp] of trees of the old generation.
3.7 Mutation
A mutation in PARTE can happen in two ways: It either adds a new gene or modifies an existing one.
When a new gene is added, the node function, its argument(s), and the up-level pointer are randomly initialized. A global innovation number is incremented and attributed to the gene. Note that the innovation numbers thus order the genes according to their appearance in evolution. The information content of the innovation numbers will be exploited during crossover.
The mutation of an existing gene either changes the binary string of the node function or of the function argument by one bit, or it modifies the up-level pointer. Note that the range of possible values for the pointer is restricted to those nodes that are not dependent on the node being mutated. When the up-level pointer is modified, a new innovation number is attributed to the gene.
The genotype mutation of the binary strings does not have to change the phenotype in all cases, as different gene specifications can lead to the same phene (e.g., the gene string can represent 128 different integers, but there are only 4 production rules). This genetic code degeneracy is observed in biological organisms, too. As suggested by O’Neill and Ryan (1999), this may, in the context of evolutionary algorithms, maintain genotypic diversity and preserve valid individuals.
The mutations cause three effects:
• The genomes of the individuals in the population will gradually grow and contain different numbers of genes
• The binary strings within the genes will start to vary
• The interconnection of the program nodes will start to vary
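The mutation operators described in this section can be sketched in Python as follows. The sketch assumes that the up-level pointer stores the 1-based position of the higher-level gene in the genome (0 for none), that binary strings have 7 bits, and that each node has a single argument; all names are hypothetical.

```python
import itertools
import random
from dataclasses import dataclass
from typing import Iterator, List, Set


@dataclass
class Gene:
    function_bits: str
    argument_bits: str
    up: int          # 1-based position of the node above, 0 = none (assumed)
    innovation: int


def flip_one_bit(bits: str, position: int) -> str:
    """Point mutation: invert exactly one bit of a binary string."""
    flipped = "1" if bits[position] == "0" else "0"
    return bits[:position] + flipped + bits[position + 1:]


def descendants(genome: List[Gene], node: int) -> Set[int]:
    """Positions of all nodes that depend, directly or indirectly, on `node`."""
    found: Set[int] = set()
    frontier = [node]
    while frontier:
        current = frontier.pop()
        for pos, gene in enumerate(genome, start=1):
            if gene.up == current and pos not in found:
                found.add(pos)
                frontier.append(pos)
    return found


def allowed_up_pointers(genome: List[Gene], node: int) -> List[int]:
    """Pointer targets that exclude the node itself and its dependents."""
    blocked = descendants(genome, node) | {node}
    return [p for p in range(len(genome) + 1) if p not in blocked]


def random_bits(rng: random.Random, n: int = 7) -> str:
    return "".join(rng.choice("01") for _ in range(n))


def add_gene(genome: List[Gene], rng: random.Random,
             counter: Iterator[int]) -> None:
    """Append a randomly initialized gene with a fresh innovation number."""
    up = rng.randint(0, len(genome))
    genome.append(Gene(random_bits(rng), random_bits(rng), up, next(counter)))


def mutate_pointer(genome: List[Gene], node: int, rng: random.Random,
                   counter: Iterator[int]) -> None:
    """Re-attach a node and give its gene a new innovation number."""
    gene = genome[node - 1]
    choices = [p for p in allowed_up_pointers(genome, node) if p != gene.up]
    if choices:
        gene.up = rng.choice(choices)
        gene.innovation = next(counter)
```

Restricting the pointer to nodes outside the mutated node's own subtree keeps the structure free of cycles, so the genome always describes a valid tree.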
3.8 Crossover
The crossover mechanism in PARTE is inspired by the “NeuroEvolution of Augmenting Topologies” algorithm proposed by Stanley and Miikkulainen (2002). It uses the innovation numbers of the genes to allow for crossover of parents with potentially diverse genomes.
Tree genomes are crossed over in five steps:
• The genes of both parents are first lined up according to their innovation number.
• Next, their genes are classified as
– matching (those occurring in both parents),
– disjoint (if they are situated within the range of the other parent’s innovation numbers),
– excess (if they are situated outside the range of the other parent’s innovation numbers).
• To create the offspring, the matching genes are drawn randomly from either parent.
• Disjoint and excess genes are included if they belong to the fitter of both parents.
• In case both parents are equally fit, all disjoint and excess genes are included.
Note that only in the case of equally fit parents, a new genome structure is created. If one parent is fitter, its genome structure is transferred to the offspring and only the gene strings of the parents are crossed over. Note also that two matching genes can have differing content (due to mutations of the node function and/or the function arguments).
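The five steps can be sketched in Python. The sketch reduces a gene to its innovation number and a single content string, assumes that a larger fitness value is better and that both genomes are non-empty, and leaves out any handling of the up-level pointers; the names `Gene`, `classify`, and `crossover` are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Gene:
    innovation: int
    bits: str  # gene content, reduced to one string for this sketch


def classify(own: List[Gene], other: List[Gene]) -> Dict[int, str]:
    """Label each gene of `own` relative to the genome `other`."""
    other_ids = {g.innovation for g in other}
    low, high = min(other_ids), max(other_ids)
    labels = {}
    for gene in own:
        if gene.innovation in other_ids:
            labels[gene.innovation] = "matching"
        elif low <= gene.innovation <= high:
            labels[gene.innovation] = "disjoint"
        else:
            labels[gene.innovation] = "excess"
    return labels


def crossover(a: List[Gene], b: List[Gene], fitness_a: float,
              fitness_b: float, rng: random.Random) -> List[Gene]:
    """Line up genes by innovation number and build one offspring genome."""
    genes_a = {g.innovation: g for g in a}
    genes_b = {g.innovation: g for g in b}
    child = []
    for innovation in sorted(genes_a.keys() | genes_b.keys()):
        in_a, in_b = innovation in genes_a, innovation in genes_b
        if in_a and in_b:
            # Matching genes are drawn at random from either parent.
            child.append(rng.choice([genes_a[innovation], genes_b[innovation]]))
        elif fitness_a == fitness_b:
            # Equally fit parents: keep all disjoint and excess genes.
            child.append(genes_a[innovation] if in_a else genes_b[innovation])
        elif in_a and fitness_a > fitness_b:
            child.append(genes_a[innovation])
        elif in_b and fitness_b > fitness_a:
            child.append(genes_b[innovation])
    return child
```

When one parent is fitter, the offspring in this sketch has exactly that parent's set of innovation numbers, which mirrors the statement that the fitter parent's genome structure is transferred unchanged.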
3.9 Bloat: PARTE vs. GP
Many empirical studies have shown that programs evolved with Genetic Programming (GP) tend to bloat, i.e., to increase rapidly in size (see e.g., Koza (1992), Angeline (1994), Altenberg (1994), Soule, Foster and Dickinson (1996), Soule (1998), Banzhaf, Nordin, Keller and Francone (1998), or Luke (2000)). According to Banzhaf and Langdon (2002), the GP community agrees that there is a relationship between the appearance of neutral code (introns) and code growth.
Bleuler, Brack, Thiele and Zitzler (2001) explain the emergence of introns with the fact that GP crossover is inhomologous, i.e., it does not exchange code fragments that have the same functionality in both parents. They point out that “...crossover most often reduces the fitness of offspring relative to their parents by disrupting valuable code segments or placing them in a different context. Because crossover points are chosen randomly within an individual, the risk of disrupting blocks of functional code can be reduced substantially by adding introns.” Introns are thus self-protective during evolution.
Various attempts to avoid bloat have been undertaken. For example, a size or depth limit can be enforced on the programs. This solution, however, suffers from the same drawback as fixed-size genome algorithms: A limit that is too small hampers the evolutionary search in finding optimal solutions, while a limit that is too large slows the search process down unnecessarily. Another approach is to include an explicit size element in the program fitness. This Constant Parsimony Pressure has the disadvantage that in some runs, the entire population is driven to the minimal possible size (see Soule and Foster (1999)). Other approaches to control bloat are the Lexicographic Parsimony Pressure proposed by Luke and Panait (2002) (see Section 3.6) and the Adaptive Parsimony Pressure proposed by Zhang and Muehlenbein (1995). Another promising approach has been suggested by Bleuler et al. (2001). They use multiobjective optimization for evolving compact GP programs by introducing the program size as a second, independent objective.
PARTE goes yet another way to avoid bloat. It tries to tackle the appearance of introns at the root of the problem: crossover. The PARTE crossover mechanism is specifically designed to avoid the disruption of valuable code segments. By construction, the output of a PARTE program node is always valid, also after crossover. Introns are no longer self-protective, as they lose their reason for existence: They no longer increase the chances of survival of an individual. Furthermore, as long as one parent is fitter, crossover does not lead to an increase in program size. This applies additional pressure in favor of compact solutions.