
The Representation Problem for Genetic Algorithms

4.2 Introduction to LISP

As will be seen, the genetic programming paradigm described in this book applies many of the key ideas of the conventional genetic algorithm to structures that are more complex than character strings patterned after chromosome strings and considerably more general and expressive than the specialized structures used in past work on extending the conventional genetic algorithm. In particular, genetic programming operates with very general, hierarchical computer programs.

Virtually any programming language (e.g., PASCAL, FORTRAN, C, FORTH, LISP) is capable of expressing and executing such general, hierarchical computer programs.

For reasons that are detailed in the next section, I have chosen the LISP (LISt Processing) programming language for the work with genetic programming. In particular, I have chosen the Common LISP dialect (Steele 1990).

This section provides a brief outline of the LISP programming language. The reader already familiar with LISP may wish to skip it.

LISP has only two main types of entities: atoms and lists. The constant 7 and the variable TIME are examples of atoms in LISP. A list in LISP is written as an ordered set of items inside a pair of parentheses. Examples of lists are (A B C D) and (+ 1 2).

A symbolic expression (S-expression) is a list or an atom in LISP. The S-expression is the only syntactic form in pure versions of the LISP programming language. In particular, the programs of LISP are S-expressions.
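The following Python sketch reads LISP text into nested Python lists. It makes the distinction between atoms and lists concrete outside a LISP system. The representation is an assumption made only for illustration:

- Numeric atoms become Python integers.
- Symbolic atoms (such as TIME or +) become strings.
- Each LISP list becomes a Python list.

The helper names `tokenize`, `read`, and `parse` are hypothetical.

```python
import re

def tokenize(text: str) -> list[str]:
    """Split LISP source into parenthesis tokens and atom tokens."""
    return re.findall(r"[()]|[^\s()]+", text)

def read(tokens: list[str]):
    """Consume tokens and return one S-expression (an atom or a nested list)."""
    token = tokens.pop(0)
    if token == "(":
        items = []
        while tokens[0] != ")":
            items.append(read(tokens))
        tokens.pop(0)  # discard the closing ")"
        return items
    if token == ")":
        raise SyntaxError("unexpected ')'")
    try:
        return int(token)  # constant atom, e.g. 7
    except ValueError:
        return token       # symbolic atom, e.g. TIME or +

def parse(text: str):
    """Parse a single S-expression from LISP text."""
    return read(tokenize(text))

print(parse("(A B C D)"))          # ['A', 'B', 'C', 'D']
print(parse("(+ 1 2)"))            # ['+', 1, 2]
print(parse("7"), parse("TIME"))   # 7 TIME
```

Because a list may contain lists, `read` calls itself. A nested S-expression therefore comes back as a nested Python list.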

The LISP compiler and operating system work so as to evaluate whatever they see. When seen by LISP, constant atoms (e.g., 7) evaluate to themselves and variable atoms (e.g., TIME) evaluate to their current value. When LISP sees a list, it evaluates the list by treating the first element of the list (i.e., whatever is just inside the opening parenthesis) as a function and applying that function to the remaining items of the list. That is, these remaining items are themselves evaluated and then passed as arguments to the function.

For example, (+ 1 2) is a LISP S-expression. In this S-expression, the addition function + appears just inside the opening parenthesis of the S-expression. This S-expression calls for the application of the addition function + to two arguments (i.e., the atoms 1 and 2). The value returned as a result of the evaluation of the S-expression (+ 1 2) is 3. LISP S-expressions are examples of Polish notation (also called "prefix notation").
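The evaluation rule described above can be modeled in a few lines. This is a minimal Python sketch, not LISP itself, and it rests on these assumptions:

- S-expressions are nested Python lists.
- The `FUNCTIONS` table and the `env` dictionary of variable values are hypothetical stand-ins for LISP's function definitions and variable bindings.
- Only + and * are supported.

```python
import math

# Hypothetical function table standing in for LISP's function definitions.
FUNCTIONS = {"+": sum, "*": math.prod}

def evaluate(expr, env):
    """Evaluate an S-expression represented as nested Python lists."""
    if isinstance(expr, int):
        return expr                      # constant atoms evaluate to themselves
    if isinstance(expr, str):
        return env[expr]                 # variable atoms evaluate to their current value
    head, *rest = expr                   # the first element names the function
    args = [evaluate(arg, env) for arg in rest]  # the remaining items are evaluated...
    return FUNCTIONS[head](args)         # ...and passed to the function as arguments

print(evaluate(["+", 1, 2], {}))         # 3
print(evaluate("TIME", {"TIME": 12}))    # 12
```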

If any of the arguments in an S-expression are themselves lists (rather than atoms that can be immediately evaluated), LISP first evaluates these arguments (in a recursive, depth-first way, starting from the left, in Common LISP).

The LISP S-expression

(+ (* 2 3) 4)

illustrates the way that computer programs in LISP can be viewed as compositions of functions. This S-expression calls for the application of the addition function + to two arguments, namely the sub-S-expression (* 2 3) and the constant atom 4. In order to complete the evaluation of the entire S-expression, LISP must first evaluate the argument (* 2 3). The sub-S-expression (* 2 3) calls for the application of the multiplication function * to the two constant atoms 2 and 3. This sub-S-expression evaluates to 6, and the entire S-expression evaluates to 10.
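The following sketch records each function application as it happens, which makes the depth-first, left-to-right order visible. The inner (* 2 3) is completed before the outer addition can be applied. As before, this is a Python model under stated assumptions: S-expressions are nested lists, and only + and * are supported.

```python
import math

FUNCTIONS = {"+": sum, "*": math.prod}  # assumption: only + and * are modeled

def evaluate(expr, trace):
    """Evaluate a nested-list S-expression, logging each function application."""
    if not isinstance(expr, list):
        return expr                                  # constant atom
    head, *rest = expr
    args = [evaluate(arg, trace) for arg in rest]    # arguments first, left to right
    result = FUNCTIONS[head](args)
    trace.append(f"({head} {' '.join(map(str, args))}) => {result}")
    return result

trace = []
print(evaluate(["+", ["*", 2, 3], 4], trace))  # 10
print("\n".join(trace))
# (* 2 3) => 6
# (+ 6 4) => 10
```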

Other programming languages apply functions to arguments in a similar manner. For example, the FORTH programming language uses reverse Polish notation; thus, the above S-expression would be written in FORTH as

2 3 * 4 +

FORTH first evaluates the subexpression 2 3 * by applying the function * to the 2 and the 3 to get 6. It then applies the function + to the 6 and the 4 to get 10.
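The correspondence between the prefix and postfix forms can be checked mechanically. This Python sketch first flattens a prefix tree into postfix order. It then evaluates that order with an explicit stack in the FORTH style. The helper names are hypothetical. The sketch also assumes every function is binary, which holds for this example but not for LISP's variadic +.

```python
def to_postfix(expr) -> list:
    """Flatten a prefix S-expression (nested lists) into postfix tokens."""
    if not isinstance(expr, list):
        return [expr]
    head, *rest = expr
    tokens = []
    for arg in rest:
        tokens.extend(to_postfix(arg))  # emit all arguments first...
    tokens.append(head)                 # ...then the function that consumes them
    return tokens

def eval_postfix(tokens):
    """Evaluate binary + and * written in postfix order, using a stack."""
    stack = []
    for tok in tokens:
        if tok in ("+", "*"):
            right, left = stack.pop(), stack.pop()
            stack.append(left + right if tok == "+" else left * right)
        else:
            stack.append(tok)           # operands are pushed as they appear
    return stack.pop()

tokens = to_postfix(["+", ["*", 2, 3], 4])
print(" ".join(map(str, tokens)))  # 2 3 * 4 +
print(eval_postfix(tokens))        # 10
```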

The term "computer program," of course, carries the connotation of the ability to do more than merely perform compositions of simple arithmetic operations. Among the connotations of the term "computer program" is the ability to perform alternative computations conditioned on the outcome of intermediate calculations, to perform operations in a hierarchical way, and to perform computations on variables of many different types. LISP goes about doing all these seemingly different things in the same way: LISP treats the item just inside the outermost left parenthesis as a function and then applies that function to the remaining items of the list (i.e., the arguments). For example, the LISP S-expression

(+ 1 2 (IF (> TIME 10) 3 4))

illustrates how LISP views conditional and relational elements of computer programs as applications of functions to arguments. In the sub-S-expression (> TIME 10), the relation > is viewed as a function and is applied to the variable atom TIME and the constant atom 10. The subexpression (> TIME 10) then evaluates to either T (True) or NIL (False), depending on the current value of the variable atom TIME. The conditional operator IF is then viewed as a function which is applied to three arguments: the logical value (T or NIL) returned by the subexpression (> TIME 10), the constant atom 3, and the constant atom 4. If its first argument evaluates to T (more precisely, anything other than NIL), the function IF returns the result of evaluating its second argument (i.e., the constant atom 3), but if its first argument evaluates to NIL, the function IF returns the result of evaluating its third argument (i.e., the constant atom 4).

Thus, the S-expression evaluates to either 6 or 7, depending on whether the current value of the variable atom TIME is or is not greater than 10.
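Conditionals and relations fit the same evaluation scheme. The Python sketch below is a minimal evaluator that supports +, > and IF. Modeling NIL as Python's None and T as True is an assumption of the sketch.

One design choice deserves mention. In Common LISP, IF is a special operator that evaluates only the branch it selects. The sketch therefore evaluates the test first and then just one of the two remaining arguments.

```python
NIL = None   # assumption: LISP's NIL modeled as Python's None
T = True     # assumption: LISP's T modeled as Python's True

def evaluate(expr, env):
    """Evaluate a nested-list S-expression supporting +, > and IF."""
    if isinstance(expr, int):
        return expr                          # constant atom
    if isinstance(expr, str):
        return env[expr]                     # variable atom, e.g. TIME
    head, *rest = expr
    if head == "IF":
        test, then_form, else_form = rest
        # Anything other than NIL counts as true; only the chosen branch is evaluated.
        chosen = then_form if evaluate(test, env) is not NIL else else_form
        return evaluate(chosen, env)
    args = [evaluate(arg, env) for arg in rest]
    if head == "+":
        return sum(args)
    if head == ">":
        return T if args[0] > args[1] else NIL
    raise ValueError(f"unknown function: {head}")

program = ["+", 1, 2, ["IF", [">", "TIME", 10], 3, 4]]
print(evaluate(program, {"TIME": 12}))   # 6
print(evaluate(program, {"TIME": 5}))    # 7
```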

Any LISP S-expression can be graphically depicted as a rooted point-labeled tree with ordered branches. Figure 4.1 shows the tree corresponding to the above LISP S-expression.

In this graphical depiction, the three internal points of the tree are labeled with functions (i.e., +, IF, and >). The six external points (leaves) of the tree are labeled with terminals (i.e., the variable atom TIME and the constant atoms 1, 2, 10, 3, and 4). The root of the tree is labeled with the function (i.e., +) appearing just inside the leftmost opening parenthesis of the S-expression.
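These counts of internal and external points can be verified by walking the nested-list form of the S-expression. In this Python sketch, `count_points` is a hypothetical name. Every list contributes one internal point for its function, and every atom contributes one leaf.

```python
def count_points(expr):
    """Return (internal, external) point counts for a nested-list S-expression."""
    if not isinstance(expr, list):
        return 0, 1                       # an atom is a leaf labeled with a terminal
    internal, external = 1, 0             # this list's function labels one internal point
    for arg in expr[1:]:                  # the arguments hang below it as ordered branches
        i, e = count_points(arg)
        internal += i
        external += e
    return internal, external

program = ["+", 1, 2, ["IF", [">", "TIME", 10], 3, 4]]
print(count_points(program))  # (3, 6)
```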

Note that this tree form of a LISP S-expression is equivalent to the parse tree which many compilers construct internally to represent a given computer program.
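Python offers one concrete point of comparison, because it exposes the tree built by its own parser through the standard `ast` module. The sketch below parses a Python expression written to mirror the S-expression above. The result is a tree of nodes such as `BinOp`, `IfExp`, and `Compare`, whose form differs from the source text. An S-expression, by contrast, already is its tree.

Python's + is binary, so the tree nests as ((1 + 2) + ...). The exact `ast.dump` text varies between Python versions, so only the structure is checked.

```python
import ast

# A Python expression chosen to mirror (+ 1 2 (IF (> TIME 10) 3 4)).
source = "1 + 2 + (3 if TIME > 10 else 4)"
tree = ast.parse(source, mode="eval")    # Python's parser builds the tree
print(ast.dump(tree.body))               # exact text differs across versions

root = tree.body                         # BinOp: (1 + 2) + (...)
assert isinstance(root, ast.BinOp) and isinstance(root.op, ast.Add)
conditional = root.right                 # IfExp: 3 if TIME > 10 else 4
assert isinstance(conditional, ast.IfExp)
assert isinstance(conditional.test, ast.Compare)
```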

An important feature of LISP is that all LISP computer programs have just one syntactic form (i.e., the S-expression). The programs of the LISP programming language are S-expressions, and an S-expression is, in effect, the parse tree of the program.

Figure 4.1 The LISP S-expression (+ 1 2 (IF (> TIME 10) 3 4)) depicted as a rooted, point-labeled tree with ordered branches.

4.3 Reasons for Choosing LISP

It is possible to implement genetic programming using any programming language that can manipulate computer programs as data and that can then compile, link, and execute the new programs (or support an interpreter to execute the new programs). As previously mentioned, virtually any programming language (e.g., PASCAL, FORTRAN, C, FORTH, LISP) is capable of expressing and evaluating the compositions of functions and terminals necessary to implement genetic programming.

No one reason is decisive in my choice of LISP as the programming language for the work with genetic programming, but the cumulative effect of the following reasons strongly favors the choice of LISP.

First, in the LISP programming language, both programs and data have the same form (i.e., S-expressions). Thus, it is both possible and convenient to treat a computer program in the genetic population as data so that it can first be genetically manipulated. Then, it is both possible and convenient to immediately execute the result of the manipulation as a program.
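The same structure can serve as data and then as a program. In this Python sketch, a nested list stands in for an S-expression, and a small hypothetical evaluator covers only + and *. A copy of the program is altered with ordinary list operations and then executed immediately.

```python
import copy
import math

FUNCTIONS = {"+": sum, "*": math.prod}   # hypothetical two-function set

def evaluate(expr):
    """Evaluate a nested-list S-expression built from constants, + and *."""
    if not isinstance(expr, list):
        return expr
    head, *rest = expr
    return FUNCTIONS[head]([evaluate(arg) for arg in rest])

program = ["+", ["*", 2, 3], 4]    # a program held as ordinary data
print(evaluate(program))           # 10

mutant = copy.deepcopy(program)    # manipulate a copy as data...
mutant[1][0] = "+"                 # ...turning (* 2 3) into (+ 2 3)
print(mutant)                      # ['+', ['+', 2, 3], 4]
print(evaluate(mutant))            # 9, executed right away as a program
```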

Second, the above-mentioned common form for both programs and data in LISP (i.e., S-expressions) is equivalent to the parse tree for the computer program. In spite of their outwardly different appearance and syntax, most compiled programming languages internally convert, at the time of compilation, a given program into a parse tree representing the underlying composition of functions and terminals of that program. In most programming languages, this parse tree is not accessible (or at least not conveniently accessible) to the programmer. And, if it were accessible, it would have a different appearance and syntax than the programming language itself. We need access to the parse tree of the computer program because we want to genetically manipulate the parts of the programs (i.e., subtrees of the parse tree). LISP provides this access because a LISP program is, in effect, its own parse tree.
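Once a program is held as a tree, reaching any of its subtrees is straightforward. The Python sketch below enumerates every point of a tree by its index path and exchanges one subtree between two programs. Nested lists stand in for S-expressions, and all helper names are hypothetical. The subtree choices are fixed here for reproducibility. This illustrates subtree access only and is not a complete genetic operator.

```python
import copy

def subtree_paths(expr, path=()):
    """Yield the index path of every point (subtree) in a nested-list S-expression."""
    yield path
    if isinstance(expr, list):
        for i, arg in enumerate(expr[1:], start=1):  # index 0 is the function label
            yield from subtree_paths(arg, path + (i,))

def get_subtree(expr, path):
    """Follow an index path down to a subtree."""
    for i in path:
        expr = expr[i]
    return expr

def replace_subtree(expr, path, new):
    """Return a copy of expr whose subtree at path is replaced by a copy of new."""
    if not path:
        return copy.deepcopy(new)
    result = copy.deepcopy(expr)
    parent = get_subtree(result, path[:-1])
    parent[path[-1]] = copy.deepcopy(new)
    return result

a = ["+", 1, 2, ["IF", [">", "TIME", 10], 3, 4]]
b = ["+", ["*", 2, 3], 4]
print(list(subtree_paths(b)))  # [(), (1,), (1, 1), (1, 2), (2,)]

# Exchange the IF subtree of a (path (3,)) with the (* 2 3) subtree of b (path (1,)).
child1 = replace_subtree(a, (3,), get_subtree(b, (1,)))
child2 = replace_subtree(b, (1,), get_subtree(a, (3,)))
print(child1)  # ['+', 1, 2, ['*', 2, 3]]
print(child2)  # ['+', ['IF', ['>', 'TIME', 10], 3, 4], 4]
```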

Third, the EVAL function of LISP provides an almost effortless way of executing a computer program that was just created or genetically manipulated.

Fourth, LISP facilitates the programming of structures whose size and shape change dynamically (rather than being determined in advance). Moreover, LISP's dynamic storage allocation and garbage collection provide administrative support for the programming of dynamically changing structures. The underlying philosophy of all aspects of the LISP programming language is to impose no limitation on programs beyond the limits inherently imposed by the physical and virtual memory of the computer on which the program is being run. While it is possible to handle structures whose size and shape change dynamically in many programming languages, LISP is especially well suited for this.

Fifth, LISP facilitates the convenient handling of hierarchical structures.

Sixth, the basic PRINT function of the LISP programming language provides ways to present parse trees in an understandable manner.
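A parse tree held as nested lists can be presented in the same two forms discussed above. The Python sketch below shows both. `to_lisp` writes the tree back in LISP's parenthesized prefix syntax, much as PRINT displays an S-expression. `to_outline` draws it as an indented tree with one point per line. Both function names are hypothetical.

```python
def to_lisp(expr) -> str:
    """Render a nested-list S-expression in LISP's parenthesized prefix syntax."""
    if isinstance(expr, list):
        return "(" + " ".join(to_lisp(item) for item in expr) + ")"
    return str(expr)

def to_outline(expr, depth=0) -> str:
    """Render the same S-expression as an indented tree, one point per line."""
    pad = "  " * depth
    if not isinstance(expr, list):
        return pad + str(expr)                           # a leaf (terminal)
    lines = [pad + str(expr[0])]                         # an internal point (function)
    lines += [to_outline(arg, depth + 1) for arg in expr[1:]]
    return "\n".join(lines)

program = ["+", 1, 2, ["IF", [">", "TIME", 10], 3, 4]]
print(to_lisp(program))     # (+ 1 2 (IF (> TIME 10) 3 4))
print(to_outline(program))
# +
#   1
#   2
#   IF
#     >
#       TIME
#       10
#     3
#     4
```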

Seventh, software environments offering an unusually rich collection of programmer tools are commercially available for the LISP programming language.

It is important to note that I did not choose the LISP programming language because genetic programming makes any use of the list data structure from LISP or the list manipulation functions unique or peculiar to LISP (such as CONS, CAR, CDR, or APPEND).
