A Better Statistic?

Although mean generation tells you how long, on average, it takes to terminate a run, it does not tell you how often a solution was found. And although success proportion tells you how often a solution was found, if it is stated for only the final generation, then it gives no idea how long the runs spent evolving.

Mean best fitness is similar to success proportion except that its sensitivity is greater. Mean best fitness can say how the runs are improving even if none has found a solution. But like success proportion, if it is only stated for the final generation, the measure gives no idea how much effort was required to obtain that level of fitness.

There are partial solutions to these issues. If the vast majority of runs completed successfully, then that may be sufficient information to give meaning to a mean generation statistic. But if it’s desirable to quantify “vast majority”, then both mean generation and success proportion can be quoted together, thus giving an indication of both the success rate and the amount of evolution required. Equally, mean generation and mean best fitness could be paired too.

However, comparing two pairs of values can be quite tricky. If both statistics are better, or both are worse, than their competitor’s, then the comparison is obvious. Indeed, even if one of the statistics is equal, the other statistic can be easily compared. But what if the comparison is against a result that has both a higher mean generation (i.e. it takes longer) and a higher success rate? In this case a conclusive comparison is not obvious.
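The comparison rule described above amounts to a dominance check, which can be sketched as follows. This is only an illustrative sketch: the names `RunSummary` and `compare` are hypothetical and do not come from the thesis, and it assumes that a lower mean generation and a higher success proportion are both "better".

```python
from typing import NamedTuple


class RunSummary(NamedTuple):
    """Hypothetical container for one paired result."""
    mean_generation: float     # lower is better (less evolution needed)
    success_proportion: float  # higher is better (more runs succeeded)


def compare(a: RunSummary, b: RunSummary) -> str:
    """Return 'a', 'b', 'equal', or 'inconclusive' for a pair of results."""
    a_no_worse = (a.mean_generation <= b.mean_generation
                  and a.success_proportion >= b.success_proportion)
    b_no_worse = (b.mean_generation <= a.mean_generation
                  and b.success_proportion >= a.success_proportion)
    if a_no_worse and b_no_worse:
        return "equal"
    if a_no_worse:
        return "a"
    if b_no_worse:
        return "b"
    # One statistic favours a, the other favours b: no obvious winner.
    return "inconclusive"


# Faster and more successful: an obvious comparison.
print(compare(RunSummary(20.0, 0.8), RunSummary(30.0, 0.6)))
# Slower but more successful: the pair alone cannot decide.
print(compare(RunSummary(30.0, 0.9), RunSummary(20.0, 0.6)))
```

The second call illustrates the problematic case: neither result dominates, so the pair of statistics gives no conclusive ordering.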

Another approach is to quote success proportion or mean best fitness for every generation. This is commonly achieved through the use of a graph. As well as being cumbersome, this approach does not achieve its purpose. How should one compare two graphs that intersect with one another? Such a situation occurs when one GP variation performs well early on but is out-performed later. In such a scenario there is no obvious choice as to which is better. Indeed, in sections 3.5 and 4.3 we demonstrated that analysis based on this approach may produce a misleading conclusion.

The problem lies in how the level of success and the length of time should be combined. Both minimum computational effort and success effort attempt to find an acceptable answer to this.

It is useful to consider the question that each of minimum computational effort and success effort answers. Koza’s statistic tells you how much effort would be required to find a solution 99% of the time were you to execute the optimal number of runs to a fixed generation (the minimum generation), irrespective of the success or failure of any run.
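As a rough illustration of Koza’s statistic, the sketch below follows the standard formulation: for each generation i, estimate the cumulative success probability P(M, i), compute the number of independent runs R(z) = ⌈ln(1 − z) / ln(1 − P(M, i))⌉ needed to reach confidence z, and take the minimum of M × (i + 1) × R(z) over all i. The function name, parameter names, and sample data are hypothetical, and the thesis’s own definition, given in an earlier chapter, takes precedence over this sketch.

```python
import math
from typing import Optional, Sequence, Tuple


def minimum_computational_effort(
    success_generations: Sequence[int],  # generation at which each successful run solved the problem
    num_runs: int,                       # total number of runs, including failures
    population_size: int,                # M in Koza's notation
    max_generation: int,                 # last generation any run was allowed to reach
    z: float = 0.99,                     # required probability of finding a solution
) -> Optional[Tuple[int, int]]:
    """Return (minimum effort, generation at which it occurs), or None if no run succeeded."""
    best: Optional[Tuple[int, int]] = None
    for i in range(max_generation + 1):
        successes = sum(1 for g in success_generations if g <= i)
        if successes == 0:
            continue  # P(M, i) = 0: no finite number of runs would suffice
        if successes == num_runs:
            runs_needed = 1  # P(M, i) = 1: a single run is enough
        else:
            p = successes / num_runs
            runs_needed = math.ceil(math.log(1 - z) / math.log(1 - p))
        effort = population_size * (i + 1) * runs_needed
        if best is None or effort < best[0]:
            best = (effort, i)
    return best


# Hypothetical data: 4 runs, 3 of which succeeded at generations 5, 5 and 10.
print(minimum_computational_effort([5, 5, 10], num_runs=4,
                                   population_size=100, max_generation=10))
```

Note that the statistic assumes every one of the R(z) runs is executed to generation i regardless of outcome, which is the "irrespective of the success or failure of any run" condition described above.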

Success effort, in comparison, answers the question: given the specific settings, how many generations will be required (on average) before a solution is found? As a consequence, success effort includes the cut-off generation, and therefore, if the cost of failure is constant, the number of restarts that will be required. For the statistic to be meaningful, runs would have to be performed sequentially.
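One way to read this description is as the total number of generations spent across all runs, failures included, divided by the number of successful runs. The sketch below implements that reading only. It is an assumption about the definition rather than the thesis’s own formula, which is given in an earlier chapter, and the function name and sample data are hypothetical.

```python
import math
from typing import Sequence, Tuple


def success_effort(runs: Sequence[Tuple[int, bool]]) -> float:
    """Estimate generations spent per solution found (assumed definition).

    Each run is (generations_used, succeeded). A failed run is charged
    the generations it consumed before being cut off.
    """
    total_generations = sum(generations for generations, _ in runs)
    successes = sum(1 for _, succeeded in runs if succeeded)
    if successes == 0:
        return math.inf  # no solution was ever found
    return total_generations / successes


# Hypothetical data with a cut-off generation of 50:
# two runs succeed (at generations 12 and 30), two run to the cut-off and fail.
runs = [(12, True), (50, False), (30, True), (50, False)]
print(success_effort(runs))  # (12 + 50 + 30 + 50) / 2 = 71.0
```

Under this reading the cost of each failed run is the cut-off generation, so changing the cut-off changes the statistic, which is why the cut-off is part of what success effort reports.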

If genetic programming is to be used on hard problems, Luke has shown that longer run lengths are to be preferred over many shorter runs [83]. As a result we could expect practitioners to dedicate their resources to a single run, rather than split them into an “optimal” number of runs. Such practitioners will be very interested in the cut-off generation, which tells them when their effort on the current run should be aborted. They will also be interested in a statistic that offers a direct indication of the cost that they will incur if they use GP. In this light, success effort can be seen to be a more desirable measure than computational effort.

5.4 Summary

In this part we have:

• Introduced methods to produce confidence intervals for Koza’s minimum computational effort measure and concluded that the Wilson-Dependent method is reliable.

• Re-introduced the success effort statistic and defined two confidence interval methods for it. We concluded the simulated parametric approach was reliable.

• Shown that, for Koza’s minimum computational effort, mean best-of-run fitness, mean generation, and success proportion, the confidence intervals produced are all reliable (bar the zero-width intervals of mean best fitness).

• Shown that success effort and minimum computational effort are philosophically more desirable than the other statistics if you are interested in both the proportion of success and the length of time it took to find solutions.

• Shown that success effort had generally narrower confidence interval width ratios and is a somewhat more desirable statistic than computational effort.

Because mean best fitness, mean generation, and success proportion only deal with one of the two parameters of general interest, their confidence intervals are notably tighter than those for computational effort and success effort. If you are in the unlikely position of being interested in only one of the two variables, then using one of mean best fitness, mean generation, or success proportion is a good choice.

If you are in the typical situation of being interested in both the proportion of runs that find a solution and the number of generations that were required to find the solutions, then the use of minimum computational effort or success effort is preferable. We have shown that success effort is philosophically more desirable, and statistically a possibly more powerful measure, than computational effort. We thus recommend that the use of success effort at least be considered.

In the following chapters we further compare the practicality of success effort and minimum computational effort.

Developing

Review: Incremental Evolution

This chapter provides an introduction to incremental evolution. It offers a review of some of the uses and previous research in the area. This review acts as a starting point for the following chapters where incremental evolution techniques are developed.

6.1 Introduction

Incremental evolution is the sequential use of simpler evolutionary environments that gradually increase in difficulty until the goal environment is reached. Depending on the researcher, the motivation for incrementally increasing the difficulty of the evolutionary environment is to increase the likelihood of finding a solution, to decrease the cost of finding a solution, to increase the quality of solutions, or, ideally, all of these. Incremental evolution offers the human an opportunity to coach the evolutionary system’s development. It can be seen as a way to add domain-specific knowledge.

Throughout this thesis we will refer to each of the evolutionary environments as a stage. By definition, incremental evolution has a minimum of two stages (the most common use in the literature), but the number of stages can be much larger, with automated options sometimes using hundreds of stages to evolve a solution [53, 122].

In document Developing and evaluating incremental evolution using high quality performance measures for genetic programming : a thesis presented in partial fulfillment of the requirements for the degree of Doctor of Philosphy in Computer Science at Massey University, (Page 112-119)