

8.1.1 Saturation

Jackson’s initial study focused on the effect of varying saturation levels. Having noted that “programs that pass all four test cases [of the first stage] were often found in the initial population”, he concluded that the second stage’s starting population was consequently “often very similar to that which would have been obtained [had incremental evolution not been used]”, and that, as a result, “performance hardly differed”. His solution was to let the evolutionary process continue until a specified percentage of the population, the saturation level, was made up of 100%-correct individuals.
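The stopping rule described above can be sketched as follows. This is a minimal illustration rather than Jackson’s implementation: the names `saturation` and `run_stage_to_saturation`, the `score` and `breed` callables, and the generation cap are all assumptions introduced for the example.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

Individual = TypeVar("Individual")


def saturation(scores: Sequence[int], n_cases: int) -> float:
    """Fraction of the population that passes every fitness case of the stage."""
    return sum(1 for s in scores if s == n_cases) / len(scores)


def run_stage_to_saturation(
    population: List[Individual],
    score: Callable[[Individual], int],  # hypothetical: fitness cases passed
    breed: Callable[[List[Individual], List[int]], List[Individual]],  # hypothetical GP step
    n_cases: int,
    target: float,
    max_generations: int,
) -> Tuple[List[Individual], int, bool]:
    """Evolve until the saturation target is met or the generation cap is hit.

    Returns the final population, the generation reached, and whether the
    target saturation level was actually attained.
    """
    for generation in range(max_generations + 1):
        scores = [score(ind) for ind in population]
        if saturation(scores, n_cases) >= target:
            return population, generation, True
        if generation < max_generations:
            population = breed(population, scores)
    return population, max_generations, False
```

Returning the attained flag alongside the population makes it possible to report whether a requested saturation level was ever reached within the generation limit.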

Figure 8.1 plots the efficiency ratio of fitness-based incremental evolution, as calculated from the final success proportion¹ data given in Jackson’s work [67].

¹ Although minimum computational effort measures were given in the paper, insufficient

[Plot: Jackson’s results on even-4-parity in three panels, each showing efficiency ratio (vertical axis) against saturation percentage (horizontal axis, 0 to 50): (a) with 4 of 16 fitness cases in the first of two stages; (b) with 8 of 16 cases in the first of two stages; (c) with the first 8 fitness cases in the first stage and the second 8 fitness cases in the second of three stages.]

Figure 8.1: Jackson’s results [67] comparing the efficiency of using fitness-based incremental evolution to direct evolution on the even-4-parity problem given varying saturation levels in the initial stages. Neither technique used ADFs. Ratios are based on quoted final success proportions. 95% confidence intervals are included (calculated using the method in table 2.3 from run sizes and observed success data provided by Jackson). The grey line is plotted at the break-even point of 1.0.

Although his initial fitness-based incremental evolution study covered only even-4-parity, we can draw some conclusions about the usefulness of saturation. Two general trends stand out.

The first general trend is that the vast majority of experiments produced results that fell below the easier of the two bars worth achieving: canonical (standard) GP without ADFs had superior performance in 24 of the 30 experiments. Any hope for this technique would most likely be pinned on an increase in relative performance for more difficult problems. Jackson, however, does not consider this in his paper. We address this topic (among others) in the following sections.

The second general trend is that reducing the saturation level increases performance. Although this is demonstrated by the trends in figure 8.1, it is more powerful to consider the raw measures (without comparison to the baseline). Unfortunately, even then the majority of results are statistically indistinguishable: at 95% confidence, the top result cannot be separated from, at the very least, saturation rates of 1 to 10%. From this evidence it would seem that only very small saturation rates should be considered.
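To illustrate what it means for two results to be inseparable at 95% confidence, the sketch below compares two success proportions using Wilson score intervals. This is an assumption made for illustration only: it is not necessarily the method of table 2.3, the function names are hypothetical, and the counts in the usage note are invented rather than taken from Jackson’s data.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, runs: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a success proportion (z = 1.96 gives about 95%)."""
    p = successes / runs
    denom = 1.0 + z * z / runs
    centre = (p + z * z / (2 * runs)) / denom
    half = z * math.sqrt(p * (1 - p) / runs + z * z / (4 * runs * runs)) / denom
    return centre - half, centre + half


def separable(a: Tuple[int, int], b: Tuple[int, int], z: float = 1.96) -> bool:
    """True when the two intervals do not overlap.

    Each argument is (successes, runs). Non-overlap is a conservative
    criterion: overlapping intervals do not prove the results are equal.
    """
    lo_a, hi_a = wilson_interval(a[0], a[1], z)
    lo_b, hi_b = wilson_interval(b[0], b[1], z)
    return hi_a < lo_b or hi_b < lo_a
```

With invented counts, `separable((60, 100), (55, 100))` is false because the intervals overlap, whereas `separable((90, 100), (50, 100))` is true.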

A concerning issue with Jackson’s methodology is whether the target saturation rates were ever actually attained. It was not made clear how, given a fixed limit of 15 generations in stage one, saturation levels of up to 50% could ever have been achieved. However, the concern is of no practical significance, given that high levels of saturation were shown to have a negative impact on performance.

The original motivation for the idea does not hold water when the difficulty of the initial stage is increased. My experiments showed that, with eight of the sixteen fitness cases, the initial stage took a number of generations before finding solutions to all its allocated fitness cases (compare the performance in figure 8.2 with the result in figure 8.3 that 35% of the initial population represented a solution of the first four fitness cases).

One might claim that the motivation for saturation should really be to drive the entire population into a genetic space that allows for even more success in the next stage. This, however, is the motivation of incremental evolution itself. So perhaps just as important would be to consider other, potentially less costly, techniques, such as duplicating the best individuals so that they are over-represented in the initial population of the next stage, or changing how the intermediate stages are chosen. The latter of these two ideas is considered in this chapter.

[Plot: “First 8 fitness cases of even-4-parity with ADFs”, M=500, N=500. Horizontal axis: generation (0 to 50). Left vertical axis: cumulative probability of success (P), 0.0 to 1.0. Right vertical axis: computational effort (I), individuals to be processed (thousands), 0 to 150. Annotations: E=40,100; P(j)=0.468; R(j,z)=7.3; j=10; P(50)=0.75.]

Figure 8.2: Computational effort and success proportion curves for the first eight of the 16 fitness cases of even-4-parity. A solution in this instance is a score of eight out of the eight fitness cases.
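The annotations in figure 8.2 are consistent with Koza-style computational effort, as the sketch below shows. Two assumptions are made here that the figure does not state: the required overall success probability is z = 0.99, and the number of runs R is left unrounded. The function names are illustrative.

```python
import math


def runs_required(p_success: float, z: float = 0.99) -> float:
    """Independent runs needed to find a solution with probability z,
    given a per-run cumulative success probability p_success (unrounded)."""
    return math.log(1.0 - z) / math.log(1.0 - p_success)


def computational_effort(pop_size: int, generation: int, p_success: float,
                         z: float = 0.99) -> float:
    """Individuals to be processed: M * (j + 1) * R(j, z)."""
    return pop_size * (generation + 1) * runs_required(p_success, z)


# Values read from figure 8.2: M = 500, j = 10, P(j) = 0.468.
r = runs_required(0.468)
effort = computational_effort(500, 10, 0.468)
print(f"R = {r:.1f}, E = {effort:,.0f}")
```

Under these assumptions R comes out at about 7.3 and the effort at a little over 40,100 individuals, which agrees with the figure’s E=40,100 once rounded to three significant figures.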

up, one might consider the idea of saturation to be dead. However, possibly the most telling evidence is Jackson’s lack of use of saturation levels in either his second or third study on the topic [65, 66]; he even labelled his own technique “disappointing” [65].

In document Developing and evaluating incremental evolution using high quality performance measures for genetic programming : a thesis presented in partial fulfillment of the requirements for the degree of Doctor of Philosphy in Computer Science at Massey University, (Page 149-152)