
Linear Genetic Programming

In document Global Optimization Algorithms (Page 192-197)


4.6 Linear Genetic Programming

4.6.1 Introduction

In the beginning of this chapter, we have learned that the major goal of Genetic Programming is to find programs that solve a given set of problems. We have seen that tree genomes are suitable to encode such programs and how the genetic operators can be applied to them.

Nevertheless, we have also seen that trees are not the only way to represent programs. As a matter of fact, a computer processes them as sequences of instructions instead. These sequences may contain branches in the form of jumps to other places in the code. Every possible flowchart describing the behavior of a program can be translated into such a sequence. It is therefore only natural that the first approach to automated program generation, developed by Friedberg [10] at the end of the 1950s, used a fixed-length instruction sequence genome [10, 895]. The area of Genetic Programming focused on such instruction string genomes is called linear Genetic Programming (LGP).

Linear Genetic Programming can be distinguished from approaches like Grammatical Evolution (see Section 4.5.6 on page 164) by the fact that strings there are just genotypic, intermediate representations that encode the program trees. In LGP, they are the center of the whole evolution and contain the program code directly. Some of the most important early contributions to this field come from [1026]:

• Banzhaf [1130], who used a genotype-phenotype mapping with repair mechanisms to translate a bit string into a sequence of simple arithmetic instructions in 1993,

• Perkis [1131] (1994), whose stack based GP evaluated arithmetic expressions in Reverse Polish Notation (RPN),

• Openshaw and Turton [1132] (1994), who also used Perkis’s approach but had already represented mathematical equations as fixed-length bit strings back in the 1980s [1133], and

• Crepeau [1134], who developed a machine code GP system around an emulator for the Z80 processor.

Besides the methods discussed in this section, other interesting approaches to linear Genetic Programming are the LGP variants developed by Eklund [1135] and Leung et al. [1136, 1137] on specialized hardware, the commercial system by Foster [1138], and the MicroGP (µGP) system for test program induction by Corno et al. [1139, 1140].

4.6.2 Advantages and Disadvantages

The advantage of linear Genetic Programming lies in the straightforward evaluation of the evolved algorithms. Its structure furthermore makes it easy to limit the runtime during program evaluation and even to simulate parallelism. The drawback is that simply reusing the genetic operators for variable-length string genomes (discussed in Section 3.5 on page 131), which randomly insert, delete, or toggle bits, is not really feasible. In LGP forms that allow arbitrary jump and call instructions to shape the control flow, this becomes even more pronounced because of a high degree of epistasis (see Section 1.4.6 and Section 4.8).
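To illustrate why evaluating a linear program and bounding its runtime is straightforward, the following is a minimal sketch of an interpreter for a small register machine. The instruction set (`OP_ADD`, `OP_SUB`, `OP_MUL`, `OP_JMP`, `OP_JGT`), the `Instr` layout, and the function `run` are assumptions invented for this illustration and are not taken from any of the systems discussed in this section.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_REGS 4

/* Hypothetical instruction set, invented for this sketch. */
typedef enum { OP_ADD, OP_SUB, OP_MUL, OP_JMP, OP_JGT } OpCode;

typedef struct {
    OpCode op;
    int dst;     /* destination register (arithmetic instructions) */
    int a, b;    /* source registers, assumed to be valid indices */
    int target;  /* absolute jump target (OP_JMP, OP_JGT) */
} Instr;

/* Executes prog on regs. Returns the number of executed instructions,
   or -1 if the step budget max_steps is exhausted before the program
   counter leaves the program. A jump outside the program ends it. */
long run(const Instr *prog, size_t len, double regs[NUM_REGS], long max_steps)
{
    size_t pc = 0;
    long steps = 0;

    assert(prog != NULL && regs != NULL && max_steps >= 0);
    while (pc < len) {
        const Instr *in = &prog[pc];
        if (steps == max_steps)
            return -1;               /* runtime limit reached */
        steps++;
        switch (in->op) {
        case OP_ADD: regs[in->dst] = regs[in->a] + regs[in->b]; pc++; break;
        case OP_SUB: regs[in->dst] = regs[in->a] - regs[in->b]; pc++; break;
        case OP_MUL: regs[in->dst] = regs[in->a] * regs[in->b]; pc++; break;
        case OP_JMP: pc = (size_t)in->target; break;
        case OP_JGT:
            pc = (regs[in->a] > regs[in->b]) ? (size_t)in->target : pc + 1;
            break;
        }
    }
    return steps;
}
```

Because the program is a flat array and the only state is a program counter and a register file, the step counter is enough to cut off endless loops, which would otherwise stall the fitness evaluation.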

We can visualize, for example, that the alternatives and loops which we know from high-level programming languages are mapped to conditional and unconditional jump instructions in machine code. These jumps target either absolute or relative addresses inside the program. Let us consider the insertion of a single new command into the instruction string, maybe as the result of a mutation or recombination operation. If we do not perform any further corrections after this insertion, it is quite possible that the resulting shift of the absolute addresses of the subsequent instructions in the program invalidates the control flow and renders the whole program useless. This issue is illustrated in Fig. 4.21.a. Nordin et al. [1141, 1141] point out that standard crossover is highly disruptive. Even though the sub-tree crossover in tree genomes has been shown to be not very efficient either [1038], tree-based genomes are, in comparison, less vulnerable in this respect. The loop in Fig. 4.21.b, for instance, stays intact although it is now one useless instruction richer. If precautions are taken in LGP to mitigate these problems, linear Genetic Programming becomes more competitive with Standard Genetic Programming, also in terms of the robustness of the recombination operations.

(before insertion)

... 50 0150 019A 10 38 33 83 8F 03 50 ...

(after insertion: loop begin shifted)

... 50 0150 019A 102338 33 83 8F 03 50 ...

Fig. 4.21.a: Inserting into an instruction string.

[Tree diagram (after insertion) with the node labels while, >, {block}, appendList, Pop, create, i, n, 0, 1]

Fig. 4.21.b: Inserting in a tree representation.

Fig. 4.21: The impact of insertion operations in Genetic Programming

One way to mitigate these problems is to create intelligent mutation and crossover operators which preserve the control flow of the program when inserting or deleting instructions. Such operations could, for instance, analyze the program structure and automatically correct jump targets. Operations which are restricted from the start to have only a minimal effect on the control flow can also easily be introduced. In Section 4.6.6, we briefly outline some of the work of Brameier and Banzhaf, who define some interesting approaches to this issue. Section 4.6.7 discusses the homologous crossover operation, which represents another method for decreasing the destructive effects of reproduction in LGP.
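The idea of automatically correcting jump targets after an insertion can be sketched as follows. This is only an illustration under stated assumptions: the `Instr` layout, the opcodes, and the function `insert_with_fixup` are hypothetical, jumps are assumed to use absolute instruction indices, and the sketch does not reproduce the operators of any particular LGP system.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical instruction format with absolute jump targets. */
typedef enum { OP_NOP, OP_ADD, OP_JMP, OP_JGT } OpCode;

typedef struct {
    OpCode op;
    int dst, a, b;   /* register operands, unused by this sketch */
    size_t target;   /* absolute index of the jump destination */
} Instr;

static int is_jump(OpCode op) { return op == OP_JMP || op == OP_JGT; }

/* Inserts ins at index pos and shifts the rest of the program right.
   Every existing jump whose target is at or behind pos is incremented,
   so it still lands on the same instruction as before the insertion.
   The target of ins itself is left as supplied by the caller.
   Returns 1 on success, 0 if the buffer is full or pos is invalid. */
int insert_with_fixup(Instr *prog, size_t *len, size_t cap,
                      size_t pos, Instr ins)
{
    size_t i;

    assert(prog != NULL && len != NULL);
    if (*len >= cap || pos > *len)
        return 0;
    for (i = 0; i < *len; i++)
        if (is_jump(prog[i].op) && prog[i].target >= pos)
            prog[i].target++;
    memmove(&prog[pos + 1], &prog[pos], (*len - pos) * sizeof(Instr));
    prog[pos] = ins;
    (*len)++;
    return 1;
}
```

With relative jump offsets, the correction would instead have to adjust every jump whose source and destination lie on opposite sides of the insertion point.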

4.6.3 The Compiling Genetic Programming System

The roots of this system go back to Nordin [1143], who was dissatisfied with the performance of GP systems written in an interpreted language which, in turn, interpret the programs evolved using a tree-shaped genome. In 1994, he published his work on a new Compiling Genetic Programming System (CGPS), written in the C programming language (footnote 35) [1144], which directly manipulates individuals represented as machine code.

Each solution candidate consisted of a prologue for shoveling the input from the stack into registers, a set of instructions for information processing, and an epilogue for terminating the function [1145]. The prologue and epilogue were never modified by the genetic operations.

As instructions for the middle part, the Genetic Programming system had arithmetical operations and bit-shift operators at its disposal in [1143], but no control flow manipulation primitives like jumps or procedure calls. These were added in [1146] along with ADFs, making this LGP approach Turing complete.

Nordin [1143] used the classification of Swedish words as the task in the first experiments with this new system. He found that it had approximately the same capability for growing classifiers as artificial neural networks but performed much faster. Another interesting application of his system was the compression of images and audio data [1148].

4.6.4 Automatic Induction of Machine Code by Genetic Programming

CGPS originally evolved code for the Sun SPARC processors, which are members of the RISC (footnote 36) processor class. This had the advantage that all instructions have the same size. In the Automatic Induction of Machine Code with GP system (AIM-GP, AIMGP), the successor of CGPS, support for multiple other architectures was added by Nordin, Banzhaf, and Francone [1149, 1150], including Java bytecode (footnote 37) and CISC (footnote 38) CPUs with variable instruction widths such as Intel 80x86 processors. Another interesting application of linear Genetic Programming tackled with AIMGP is the evolution of robot behavior such as obstacle avoidance and wall following [1151].

35 http://en.wikipedia.org/wiki/C_(programming_language) [accessed 2008-09-16]

36 http://de.wikipedia.org/wiki/Reduced_Instruction_Set_Computing [accessed 2008-09-16]

4.6.5 Java Bytecode Evolution

Besides AIMGP, there exist numerous other approaches to the evolution of linear Java bytecode functions. The Java Bytecode Genetic Programming system (JBGP, also Java Method Evolver, JME) by Lukschandl et al. [1152, 1153, 1154, 1155] is written in Java.

A genotype in JBGP contains the maximum allowed stack depth together with a linear list of instruction descriptors. Each instruction descriptor holds information such as the corresponding bytecode and the branch offset. The genotypes are transformed by the genotype-phenotype mapping into methods of a Java class, which can then be loaded into the JVM, executed, and evaluated [1156, 1157].

In the JAPHET system of Klahold et al. [1158], the user provides an initial Java class at startup. Classes are divided into a static and a dynamic part. The static parts, which contain things like version information, are not affected by the reproduction operations. The dynamic parts, containing the methods, are modified by the genetic operations, which add new bytecode [1156, 1157].

Harvey et al. [1156, 1157] introduce bytecode GP (bcGP), where the whole population of each generation is represented by one class file. As in AIMGP, each individual is a linear sequence of Java bytecode and is surrounded by a prologue and an epilogue. Furthermore, by adding buffer space, each individual has the same size and, thus, the whole population can be kept inside a byte array of a fixed size, too.

4.6.6 Brameier and Banzhaf: LGP with Implicit Intron removal

In the Genetic Programming system developed by Brameier and Banzhaf [1142] based on former experience with AIMGP, an individual is represented as a linear sequence of simple C instructions, as outlined in the example Listing 4.8 (a slightly modified version of the example from [1142]). Due to reproduction operations such as mutation and crossover, such genotypes may contain introns, i.e., instructions not influencing the result (see Definition 3.2 and Section 4.10.3). Given that the program defined in Listing 4.8 stores its outputs in v[0] and v[1], none of the lines marked with (I) contributes to the overall functional fitness. Brameier and Banzhaf [1142] introduce an algorithm which removes these introns during the genotype-phenotype mapping, before the fitness evaluation. This linear Genetic Programming method was successfully tested with several classification tasks [1142, 1159, 129], function approximation, and Boolean function synthesis [1160].
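How such introns can be detected may be sketched with a single backward scan that tracks which registers still influence the outputs. The sketch below is an assumption-laden illustration and not necessarily identical to the algorithm of [1142]: the `Instr` layout, `NO_REG`, and `mark_effective` are hypothetical names, constants are simply ignored, and a branch is assumed to guard exactly the instruction that follows it. The array `LISTING_4_8` encodes the fifteen instructions of Listing 4.8 that follow the elided part.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_REGS 8
#define NO_REG (-1)  /* operand slot holding a constant or nothing */

typedef struct {
    int is_branch;   /* 1: "if (a cmp b)", 0: "v[dst] = a op b" */
    int dst;         /* destination register, ignored for branches */
    int a, b;        /* operand registers or NO_REG */
} Instr;

/* The instructions of Listing 4.8 after the elided part. */
static const Instr LISTING_4_8[15] = {
    {0, 0, 5, NO_REG},  /* v[0] = v[5] + 73        */
    {0, 7, 0, NO_REG},  /* v[7] = v[0] - 59        */
    {1, 0, 1, NO_REG},  /* if (v[1] > 0)           */
    {1, 0, 5, NO_REG},  /* if (v[5] > 23)          */
    {0, 4, 2, 1},       /* v[4] = v[2] * v[1]      */
    {0, 2, 5, 4},       /* v[2] = v[5] + v[4]      */
    {0, 6, 0, NO_REG},  /* v[6] = v[0] * 25        */
    {0, 6, 4, NO_REG},  /* v[6] = v[4] - 4         */
    {0, 1, 6, NO_REG},  /* v[1] = sin(v[6])        */
    {1, 0, 0, 1},       /* if (v[0] > v[1])        */
    {0, 3, 5, 5},       /* v[3] = v[5] * v[5]      */
    {0, 7, 6, NO_REG},  /* v[7] = v[6] * 2         */
    {0, 5, 7, NO_REG},  /* v[5] = v[7] + 115       */
    {1, 0, 1, 6},       /* if (v[1] <= v[6])       */
    {0, 1, 7, NO_REG}   /* v[1] = sin(v[7])        */
};

/* Writes 1 (effective) or 0 (intron) into effective[i] for every
   instruction and returns the number of effective instructions. */
size_t mark_effective(const Instr *prog, size_t len,
                      const int *outputs, size_t n_out, int *effective)
{
    int needed[NUM_REGS] = {0};
    size_t i, count = 0;

    assert(prog != NULL && effective != NULL);
    for (i = 0; i < n_out; i++)
        needed[outputs[i]] = 1;
    for (i = len; i-- > 0; ) {
        const Instr *in = &prog[i];
        if (in->is_branch) {
            /* A branch matters only if the guarded instruction does. */
            effective[i] = (i + 1 < len) && effective[i + 1];
        } else {
            effective[i] = needed[in->dst];
            /* An unconditional write kills earlier values of dst;
               a guarded write may be skipped, so dst stays needed. */
            if (effective[i] && !(i > 0 && prog[i - 1].is_branch))
                needed[in->dst] = 0;
        }
        if (effective[i]) {
            if (in->a != NO_REG) needed[in->a] = 1;
            if (in->b != NO_REG) needed[in->b] = 1;
            count++;
        }
    }
    return count;
}
```

Called with the output registers 0 and 1, this scan flags exactly the six instructions that Listing 4.8 marks with (I) as introns and keeps the remaining nine.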

In his doctoral dissertation, Brameier [1161] elaborates that, because of jump and call instructions, the control flow of linear Genetic Programming resembles a graph more than a tree.

In the earlier work of Brameier and Banzhaf [1142] mentioned above, introns were only excluded by the genotype-phenotype mapping but preserved in the genotypes because they were expected to make the programs robust against variations. In [1161], Brameier concludes that such implicit introns representing unreachable or ineffective code have no real protective effect but reduce the efficiency of the reproduction operations and, thus, should be avoided or at least minimized by them. Instead, the concept of explicitly defined introns (EDIs) proposed by Nordin et al. [1162] is utilized in the form of something like nop instructions in order to decrease the destructive effect of crossover. Brameier finds that introducing EDIs decreases the proportion of introns arising from unreachable or ineffective code and leads to better results. In comparison with standard tree-based GP, his linear Genetic Programming approach performed better during experiments with classification, regression, and Boolean function evolution benchmarks.

    void ind(double v[8]) {
      ...
      v[0] = v[5] + 73;
      v[7] = v[0] - 59;        // (I)
      if (v[1] > 0)
        if (v[5] > 23)
          v[4] = v[2] * v[1];
      v[2] = v[5] + v[4];      // (I)
      v[6] = v[0] * 25;        // (I)
      v[6] = v[4] - 4;
      v[1] = sin(v[6]);
      if (v[0] > v[1])         // (I)
        v[3] = v[5] * v[5];    // (I)
      v[7] = v[6] * 2;
      v[5] = v[7] + 115;       // (I)
      if (v[1] <= v[6])
        v[1] = sin(v[7]);
    }

Listing 4.8: A genotype of an individual in Brameier and Banzhaf’s LGP system.

37 http://en.wikipedia.org/wiki/Bytecode [accessed 2008-09-16]

38 http://de.wikipedia.org/wiki/Complex_Instruction_Set_Computing [accessed 2008-09-16]

4.6.7 Homologous Crossover

According to Banzhaf et al. [959], natural crossover is very restricted and usually exchanges only genes that express the same functionality and are located at the same positions (loci) on the chromosomes.

Definition 4.3 (Homology). In genetics, homology (footnote 39) of protein-coding DNA sequences means that they code for the same protein, which may indicate common function. Homologous chromosomes (footnote 40) are either chromosomes in a biological cell that pair during meiosis or non-identical chromosomes which code for the same functional feature by containing similar genes in different allelic states.

In other words, homologous genetic material is very similar, and in nature only such material is exchanged in sexual reproduction. In linear Genetic Programming, however, two individuals often differ in their structure and in the function of the genes at the same loci.

Francone et al. [1043, 1149] introduce a sticky crossover operator which resembles homology by allowing the exchange of instructions between two genotypes (programs) only if they reside at the same loci. It first chooses a sequence of code in the first genotype and then swaps it with the sequence at exactly the same position in the second parent.
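A position-preserving exchange of this kind might be sketched as follows. The instruction encoding (`Instr` as a plain word) and the function `sticky_crossover` are assumptions made for this illustration; choosing `start` and `count` at random is left to the caller, and clamping the segment to the shorter parent is a design choice of the sketch rather than something stated in [1043, 1149].

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int Instr;  /* one instruction word, encoding is hypothetical */

/* Swaps the instructions at the loci [start, start + count) between the
   two genotypes in place. The segment is clamped to the shorter parent,
   so both lengths stay unchanged and every instruction keeps its locus.
   Returns the number of loci that were actually exchanged. */
size_t sticky_crossover(Instr *a, size_t len_a, Instr *b, size_t len_b,
                        size_t start, size_t count)
{
    size_t shorter = (len_a < len_b) ? len_a : len_b;
    size_t i, end;

    assert(a != NULL && b != NULL);
    if (start >= shorter)
        return 0;
    end = (count > shorter - start) ? shorter : start + count;
    for (i = start; i < end; i++) {
        Instr tmp = a[i];
        a[i] = b[i];
        b[i] = tmp;
    }
    return end - start;
}
```

Since no instruction ever changes its index, absolute jump targets in both offspring keep pointing at the same positions, which is one reason why such an operator tends to be less disruptive than a free two-point exchange.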

4.6.8 Page-based LGP

A similar approach is the Page-based linear Genetic Programming of Heywood and Zincir-Heywood [1163], where programs are described as sequences of pages, each including the same number of instructions. Here, crossover exchanges only a single page between the parents and, as a result, becomes less destructive. This approach should be distinguished from the fixed block size approach of Nordin et al. [1149] for CISC architectures which was developed to accommodate variable instruction lengths in AIMGP.
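The exchange of a single page can be sketched in the same spirit. The page size of four instructions, the `Instr` encoding, and the function `page_crossover` are assumptions for illustration only; in particular, letting the two page indices be chosen independently is a choice of this sketch, since the text above only states that one page is exchanged.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE 4          /* instructions per page, assumed value */

typedef unsigned int Instr;  /* one instruction word, encoding is hypothetical */

/* Exchanges page number page_a of parent a with page number page_b of
   parent b. Both programs consist of whole pages (pages_a and pages_b
   of them), so their lengths never change. Returns 1 on success and 0
   if one of the page indices is out of range. */
int page_crossover(Instr *a, size_t pages_a, size_t page_a,
                   Instr *b, size_t pages_b, size_t page_b)
{
    Instr *pa, *pb;
    size_t i;

    assert(a != NULL && b != NULL);
    if (page_a >= pages_a || page_b >= pages_b)
        return 0;
    pa = a + page_a * PAGE_SIZE;
    pb = b + page_b * PAGE_SIZE;
    for (i = 0; i < PAGE_SIZE; i++) {
        Instr tmp = pa[i];
        pa[i] = pb[i];
        pb[i] = tmp;
    }
    return 1;
}
```

Because exactly one fixed-size block moves, all other pages of both parents remain untouched, which limits how much of a working program a single crossover can destroy.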

39 http://en.wikipedia.org/wiki/Homology_(biology) [accessed 2008-06-17]

40 http://en.wikipedia.org/wiki/Homologous_chromosome [accessed 2008-06-17]
