return auxn
4.4 Fitness Functions
The fitness of each individual program is evaluated by calling its constituent operations (i.e. the trees: makenull, front, dequeue, enqueue and empty) in a series of test sequences and comparing the result they return with the anticipated result. Only if they are the same is the individual's fitness increased. NB whilst all testing is black box, with no information about the program's internal implementation being used, following the discovery of memory hungry solutions (Section 4.7) the fitness function was modified to include penalties for excessive resource (i.e. memory) usage and we show that memory efficient implementations can be evolved (see Sections 4.8, 4.9 and 4.10.5).
The two operations makenull and enqueue do not return an answer. As with the stack makenull and push operations, they can only be tested indirectly by seeing if the other operations work correctly when called after them. They are both scored as if they had returned the correct result.
As with the empty operation in the stack data structure, the empty operation returns either true or false (cf. Table 4.1); however, like the other four operations, it is composed of signed integer functions and terminals and the evolved code returns a signed integer. Therefore a wrapper is used to convert the signed integer to a boolean value before fitness checks are performed. The wrapper converts a zero value to true and all other values to false. The evolved code can compare two values using subtraction (no explicit comparison operators are provided). If they are equal, subtracting them yields zero, which the wrapper converts to true. This wrapper also avoids the potential bias in the wrapper used with the empty stack operation (Section 3.4).
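The wrapper described above can be sketched in a few lines. This is a minimal illustration only: the names `empty_wrapper` and `evolved_empty` are hypothetical, and the subtraction-based body stands in for whatever expression an evolved empty tree might actually compute.

```python
def empty_wrapper(raw: int) -> bool:
    """Convert the signed integer returned by the empty tree to a boolean.

    Zero maps to True (queue reported empty); every other value maps to False.
    """
    return raw == 0


def evolved_empty(head: int, tail: int) -> int:
    """Hypothetical evolved body: compare two values by subtraction,
    since no explicit comparison operators are available."""
    return head - tail


print(empty_wrapper(evolved_empty(3, 3)))  # True: equal values subtract to zero
print(empty_wrapper(evolved_empty(3, 5)))  # False: any non-zero result
```

Because only zero maps to true, a tree that returns arbitrary values is far more likely to report "not empty" than "empty", so the wrapper is not symmetric between the two answers.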
As was explained in Section 4.1, the queue is defined to exclude error checks and so the fitness test case is designed to avoid causing the logical errors that these checks would trap. I.e. the test sequences never try to enqueue more than nine integers, never use dequeue or front when the queue should be empty, and always initialise the data structure with makenull before any other operation is called. All storage (i.e. the indexed memory and the auxiliary registers) is initialised to zero before each test sequence is started.
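The three constraints on test sequences can be expressed as a small checker. The sketch below is an assumption-laden illustration, not the code used in the experiments: the function name, the `(name, argument)` tuple encoding, and the reading of "more than nine integers" as "more than nine items held in the queue at once" are all assumptions.

```python
MAX_QUEUE_LENGTH = 9  # from the text; read here as a limit on items held at once


def is_legal_sequence(ops):
    """Return True if a test sequence avoids the errors the queue does not check.

    ops is a list of (name, argument) tuples, e.g. ("enqueue", 5) or ("front", None).
    """
    # The data structure must be initialised by makenull before anything else.
    if not ops or ops[0][0] != "makenull":
        return False
    size = 0  # number of items a correct queue would hold at this point
    for name, _arg in ops:
        if name == "makenull":
            size = 0
        elif name == "enqueue":
            if size == MAX_QUEUE_LENGTH:
                return False  # would overflow the queue
            size += 1
        elif name in ("front", "dequeue"):
            if size == 0:
                return False  # would read from an empty queue
            if name == "dequeue":
                size -= 1
        elif name != "empty":
            return False  # unknown operation
    return True
```

A generator of test sequences could call such a checker to reject candidate sequences before they are used for scoring.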
Initially the fitness function was identical to that used for the stack, with the substitution of enqueue for push etc. (cf. Section 3.4). However, unlike the stack, various programs were evolved which passed the whole test case but did not correctly implement a FIFO (First-In First-Out) list. As these were produced the fitness test case was changed to include more tests, different test orders and to enqueue different numbers.
Initially the fitness function, like the stack, was simply the sum of the number of tests passed. Different fitness scalings were tried, which gave less weight to "easy" tests. In the experiments in Sections 4.7 and 4.8, makenull and enqueue tests are equivalent to only 5% of dequeue, front and empty tests. In later experiments a single fitness value for each program was replaced by "Pareto" scoring (see Sections 2.3.8 and 4.10.1). The details of each fitness function used are described with each experiment.
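One way to read the 5% scaling is as a per-test weight, so that each makenull or enqueue test passed counts one twentieth as much as a dequeue, front or empty test. The sketch below assumes that reading; the function name, the dictionary layout and the per-test interpretation are illustrative assumptions, and the exact fitness function used in each experiment is the one described with that experiment.

```python
# Assumed per-test weights: "easy" operations count 5% of the others.
TEST_WEIGHTS = {
    "makenull": 0.05,
    "enqueue": 0.05,
    "dequeue": 1.0,
    "front": 1.0,
    "empty": 1.0,
}


def scaled_fitness(passed):
    """Weighted sum of tests passed.

    passed maps an operation name to the number of its tests that were passed.
    """
    return sum(TEST_WEIGHTS[op] * count for op, count in passed.items())


# Forty "easy" passes contribute no more than two "hard" passes.
print(scaled_fitness({"makenull": 20, "enqueue": 20}))
print(scaled_fitness({"dequeue": 1, "front": 1}))
```

Under this weighting a program cannot reach a high score merely by being called often through makenull and enqueue, which are always scored as correct.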
4.4.1 Test Case
Initially the fitness test case, like the rest of the fitness function, was identical to that used for the stack, with the substitution of enqueue for push etc., and so the argument of enqueue was identical to that used with push (Table 3.5, page 68), i.e. an integer between -1000 and 999. As memory was initialised to zero and all enqueued data is non-zero (cf. Table 3.5), an effective test of whether a memory cell has been used or not is to see if it contains a non-zero value. In the case of the queue, partial solutions were produced which exploited this and used it to estimate whether the queue was empty or not. Such solutions could fail if tested on a queue containing the value zero. The GP found and exploited this and a few similar "holes" in the test sequences to produce high scoring individuals which solve the test case rather than the queue problem. Therefore (from Section 4.8 onwards) the test data values were changed to increase the proportion of small integers and a fifth long test sequence was added to test for memory hungry solutions.
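The kind of "hole" described here can be reproduced with a hand-written queue that decides emptiness by looking for a non-zero memory cell. The class below is an illustration written for this purpose, not one of the evolved programs; its name, its ten-cell circular layout and its method bodies are all assumptions.

```python
class NonZeroCheatQueue:
    """Flawed queue: treats a zero cell at the head as 'queue is empty'."""

    def __init__(self, size=10):
        self.mem = [0] * size  # indexed memory, initialised to zero
        self.head = 0
        self.tail = 0

    def makenull(self):
        self.mem = [0] * len(self.mem)
        self.head = self.tail = 0

    def enqueue(self, value):
        self.mem[self.tail] = value
        self.tail = (self.tail + 1) % len(self.mem)

    def dequeue(self):
        value = self.mem[self.head]
        self.mem[self.head] = 0  # clear the cell so it looks unused again
        self.head = (self.head + 1) % len(self.mem)
        return value

    def empty(self):
        # The cheat: correct only while every enqueued value is non-zero.
        return self.mem[self.head] == 0


q = NonZeroCheatQueue()
q.makenull()
q.enqueue(7)
print(q.empty())  # False: correct, the queue holds one item
q.dequeue()
q.enqueue(0)
print(q.empty())  # True: wrong, the queue holds the value zero
```

A test case whose data never includes zero cannot distinguish this program from a correct queue, which is why changing the distribution of test values closes the hole.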
A possible explanation for why additional measures were needed in the fitness testing of the queue that were not required with the stack is that without appropriate cursor primitives (such as MIncn), the queue is a harder problem and the absence of a solution allows the GP to explore the fitness function more fully and then exploit "holes" in it.
4.5 Parameters
The default values for parameters given in Section E.3 were used except for: the population size, the maximum program length, the length of each run and the use of a fine grained demic population (see Section 2.3.7). The values of these parameters were changed between the various experiments described in this chapter; details are given in Tables 4.4, 4.5, 4.7 and 4.10.
4.5.1 Population size
Initial runs with a population of 1,000 were very disappointing. Whilst the initial fitness function, the initial primitives used or loss of genetic diversity caused by premature convergence (i.e. when the population converges to a local optimum rather than the global optimum) may have contributed to this, it was decided to follow advice in [Kinnear, Jr., 1994c] and [Koza, 1994, page 617] and make the population as big as possible. All the queue experiments described in this chapter have a population size of 10,000.
4.5.2 Maximum Program Size
As was discussed with the stack problem (see Section 3.5) each genetic program is composed of six trees (cf. Section 4.2) which must fit into a fixed length table (cf. Section E.1). There are no restrictions on each tree's size; however, their combined lengths must sum to no more than the size of the table. The table size was the same as in Chapter 3, despite having an additional tree (adf1). This is reasonable as [Koza, 1994, page 644] suggests using an ADF generally reduces the total size of the program.
As Figures 4.28 and 4.29 show, individual programs within the population typically grew towards the maximum available space and so the effects of the size limit cannot be neglected. Section 2.3.5 described how the crossover operator ensures this limit is not violated.
Random queue programs are bigger on average than random stack ones (compare Figures 4.28 and 4.29 with Figures 3.15 and 3.16 (pages 74 and 75)), principally because of the higher proportion of functions with two arguments amongst the primitives (45% versus 25%, cf. Table 4.10 and Table 3.2 (page 62)). Random trees are created from the root (using the "ramped-half-and-half" method [Koza, 1992, page 93]), so the higher the number of branches (i.e. function arguments) at each level the bigger the tree will be and (when using the grow mechanism) the greater the chance of growing to another level. Thus the limit on total tree size (i.e. program length) is more of a constraint in this chapter than it was in Chapter 3. (Alternative means of creating random trees for the initial population are proposed in [Bohm and Geyer-Schulz, 1996] and [Iba, 1996b].)
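The effect of the proportion of two-argument functions on tree size can be illustrated with a deliberately simplified model of the grow mechanism. The sketch assumes that every node is either a two-argument function (chosen with probability `p_binary` while depth remains) or a terminal; one-argument functions, the "full" half of ramped-half-and-half and the size limit are all ignored, and the depth of 6 is an arbitrary choice. The function name is hypothetical.

```python
def expected_grow_size(p_binary: float, max_depth: int) -> float:
    """Expected node count of a tree grown under the simplified model.

    With E(0) = 1 (only a terminal fits), a node at remaining depth d is a
    two-argument function with probability p_binary, giving
    E(d) = 1 + 2 * p_binary * E(d - 1).
    """
    size = 1.0
    for _ in range(max_depth):
        size = 1.0 + 2.0 * p_binary * size
    return size


# Proportions taken from the text: 45% (queue) versus 25% (stack).
queue_like = expected_grow_size(0.45, 6)
stack_like = expected_grow_size(0.25, 6)
print(queue_like > stack_like)  # True
```

Even in this crude model the expected size grows with the branching proportion, and the growth accelerates as `2 * p_binary` approaches one, which is consistent with the size limit mattering more for the queue primitives.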
[Gathercole and Ross, 1996] consider the impact of restrictions on program size in the case of programs consisting of a single tree and show that the standard GP crossover operator can lead to loss of diversity at the root of the tree. Whilst the analysis does not include multi-tree programs or treat in detail restrictions on total program size rather than tree height, it may be the case that the restriction on total program size does cause problems in these experiments. There is some evidence that roots of trees in this chapter may converge to inappropriate primitives which the GP then has to work around to evolve operational code.
When the initial random population is created, the trees within the individual are created sequentially. As each primitive is added to the current tree the code ensures that the individual remains within the total restriction on program size. Thus the total size limit has little impact on the first trees created but, as each new tree makes the total program longer, the size limit has a disproportionate effect on the last tree to be created (i.e. adf1). If, as a random program is being created, its length nears the length limit, the chance of adding a terminal (rather than a function) to the program is increased to restrict the addition of new branches to the tree and thus constrain its growth. This leads to asymmetric trees. (Section 5.7 introduces a per tree restriction on program size which ensures the effects do not fall disproportionately on the last tree.) The crossover operator used (Section 2.3.5) ensures the offspring will never exceed the total size limit and so the effect of the size limit does not fall unduly on the last tree after the creation of the initial population.
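The disproportionate effect on the last tree can be seen in a small simulation of sequential creation under a shared size limit. Everything specific in this sketch is an assumption made for illustration: the function names, the rule that the chance of a function shrinks linearly with the remaining free space, the restriction to two-argument functions and terminals, and the limit of 50 nodes (which is not the table size used in the experiments).

```python
import random


def create_individual(n_trees, max_total, p_function, rng):
    """Create n_trees tree sizes one after another under a shared node limit."""
    sizes, used = [], 0
    for t in range(n_trees):
        reserve = n_trees - t - 1  # one node kept back for each later tree's root
        size, open_slots = 0, 1    # the root is the first unfilled slot
        while open_slots > 0:
            # Nodes left over once every open slot and later root is filled.
            free = max_total - used - size - open_slots - reserve
            # Assumed rule: functions become less likely as free space runs out.
            p = p_function * max(free, 0) / max_total
            if free >= 2 and rng.random() < p:
                open_slots += 1    # a 2-argument function fills one slot, opens two
            else:
                open_slots -= 1    # a terminal fills one slot
            size += 1
        sizes.append(size)
        used += size
    return sizes


def mean_tree_sizes(samples, n_trees, max_total, p_function, seed=0):
    """Average size of each tree position over many random individuals."""
    rng = random.Random(seed)
    totals = [0] * n_trees
    for _ in range(samples):
        for i, s in enumerate(create_individual(n_trees, max_total, p_function, rng)):
            totals[i] += s
    return [total / samples for total in totals]


# Six trees, as in the queue programs; the last position stands in for adf1.
print(mean_tree_sizes(500, 6, 50, 0.9))
```

In this model the earlier trees consume most of the shared space, so the later positions are created when little room remains and are built mostly from terminals. A per tree limit would remove that dependence on creation order.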