

3.1.5 When Minimum Generation is Unknown

Unfortunately, it is extremely unlikely that a researcher will know the generation at which the true minimum computational effort occurs. This section discusses how confidence intervals can be established using an estimate of the true minimum generation.

For a sample of genetic programming runs, the minimum generation can be estimated using the technique Koza described: calculate the computational effort, I(i), for every generation, i, from 0 to the maximum in the experiment. The estimated minimum generation is the generation where I(i) is minimal.

1. Obtain n independent runs using a population of size M. Obtain the observed success proportion, p, and observed minimum generation, j, using all n runs.

2. Obtain the 1 − α confidence limits for the true success proportion using Wilson’s method (equations 2.3 and 2.4). Label these upper and lower limits p_u and p_l.

3. Approximate 1 − α confidence limits for the true minimum computational effort are given by

E_l = (j + 1) · M · R(p_u, z)

and

E_u = (j + 1) · M · R(p_l, z)

where E_l is the approximate lower limit and E_u is the approximate upper limit.

Table 3.4: Algorithm for the Wilson-Dependent method.

For the generation of confidence intervals, the estimated minimum generation is used in place of the true minimum generation, but otherwise the three methods remain unchanged. Table 3.4 describes the Wilson-Dependent algorithm.
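The steps of the Wilson-Dependent method can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used for the experiments. It assumes that R(p, z) takes Koza's usual form, the ceiling of ln(1 − z) / ln(1 − p), with z = 0.99, and that equations 2.3 and 2.4 are the standard Wilson score limits. The function names, the `success_gens` input format and the handling of p = 0 and p = 1 are hypothetical choices made for the example. The normal quantile is called `q` in the code so that it is not confused with Koza's z.

```python
import math
from statistics import NormalDist


def runs_required(p, z=0.99):
    """R(p, z): runs needed to succeed with probability z (assumed Koza form)."""
    if p <= 0.0:
        return math.inf  # no successes: effort is unbounded (assumption)
    if p >= 1.0:
        return 1  # every run succeeds, so one run suffices (assumption)
    return math.ceil(math.log(1.0 - z) / math.log1p(-p))


def estimate_min_generation(success_gens, max_gen, M, z=0.99):
    """Return (j, p): the generation minimising I(i) and the success proportion at j.

    success_gens holds, for each run, the generation at which it first
    succeeded, or None if it never succeeded (hypothetical input format).
    """
    n = len(success_gens)
    best_effort, best_gen, best_p = math.inf, 0, 0.0
    for i in range(max_gen + 1):
        # Cumulative proportion of runs that have succeeded by generation i.
        p = sum(1 for g in success_gens if g is not None and g <= i) / n
        effort = (i + 1) * M * runs_required(p, z)
        if effort < best_effort:  # strict '<' keeps the earliest minimum
            best_effort, best_gen, best_p = effort, i, p
    return best_gen, best_p


def wilson_limits(p, n, alpha=0.05):
    """Wilson score limits (lower, upper) for a proportion p observed over n runs."""
    q = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # normal quantile, not Koza's z
    centre = p + q * q / (2 * n)
    half = q * math.sqrt(p * (1 - p) / n + q * q / (4 * n * n))
    denom = 1 + q * q / n
    # The limits are exactly 0 and 1 at the boundaries; avoid rounding noise.
    lower = 0.0 if p == 0.0 else (centre - half) / denom
    upper = 1.0 if p == 1.0 else (centre + half) / denom
    return lower, upper


def wilson_dependent_limits(success_gens, max_gen, M, z=0.99, alpha=0.05):
    """Approximate (E_l, E_u) using the estimated minimum generation j."""
    n = len(success_gens)
    j, p = estimate_min_generation(success_gens, max_gen, M, z)  # step 1
    p_l, p_u = wilson_limits(p, n, alpha)                        # step 2
    e_l = (j + 1) * M * runs_required(p_u, z)                    # step 3
    e_u = (j + 1) * M * runs_required(p_l, z)
    return e_l, e_u
```

Note that the same runs supply both j and p, which is exactly the dependence discussed in this section. When no run succeeds, the lower Wilson limit is 0 and the sketch returns an infinite upper limit.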

From a statistical perspective this introduces dependence between the measurements of minimum generation and the minimum computational effort. Keijzer et al. suggested that the runs in a GP experiment could be divided into two halves: the first half used to estimate the minimum generation and the second half used to estimate the minimum computational effort. However, a GP run is typically so expensive that using only half the runs to establish computational effort is not seriously considered. This work follows that pragmatic approach and accepts the dependence (although we reconsider this in section 3.2).

Because no effort has been made to account for the increased variability in the estimated computational effort that is due to estimating the minimum generation, it should be expected that the confidence intervals produced using these methods would achieve less than 95% coverage.

Results and Discussion

For each problem domain and confidence interval generation method, table 3.5 gives the average coverage and the average number of valid confidence intervals that were produced. Table 3.6 gives the same statistics but by run size and method.

Method \ Problem     Ant      Parity   Symbreg  Multiplexor  Average
Normal               96.1%    63.8%    94.8%    93.1%        86.9%
                     7,012    1,892    9,839    3,684        5,606
Wilson-Dependent     92.9%    94.0%    94.9%    95.7%        94.4%
                     9,950    10,000   10,000   10,000       9,988
Resampling           92.4%    65.3%    91.2%    72.3%        80.3%
                     10,000   10,000   10,000   10,000       10,000

Table 3.5: Average coverage percentages and average validity statistics by problem domain when the minimum generation is estimated. For each method, the first row is the coverage and the second row is the number of valid confidence intervals. Averages are over 25–500 runs.

Method \ Runs        25       50       75       100      200      500      Average
Normal               65.1%    72.3%    94.7%    97.3%    96.7%    95.4%    86.9%
                     2,497    3,770    4,695    5,595    7,212    9,870    5,606
Wilson-Dependent     93.0%    94.4%    94.7%    93.8%    94.9%    95.3%    94.4%
                     9,928    9,998    10,000   10,000   10,000   10,000   9,988
Resampling           62.2%    73.9%    80.2%    84.5%    89.0%    91.9%    80.3%
                     10,000   10,000   10,000   10,000   10,000   10,000   10,000

Table 3.6: Average coverage percentages and average validity statistics by run size when the minimum generation is estimated. For each method, the first row is the coverage and the second row is the number of valid confidence intervals. Averages are over the four problem domains.
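As a reading aid for the two statistics in tables 3.5 and 3.6, the sketch below shows one way coverage and validity could be computed from a batch of simulated experiments. It rests on assumptions that this section does not spell out: an interval that could not be produced is represented as `None`, coverage is measured over the valid intervals only, and an interval with an infinite upper limit still counts as valid. The function name and input format are hypothetical.

```python
import math


def coverage_and_validity(intervals, true_effort):
    """Return (coverage, n_valid) for a list of (lower, upper) intervals.

    An entry of None marks an experiment for which no interval could be
    produced (assumed representation of an invalid interval).
    """
    valid = [iv for iv in intervals if iv is not None]
    if not valid:
        return math.nan, 0
    # An interval covers the true value when the value lies within its limits.
    covered = sum(1 for lower, upper in valid if lower <= true_effort <= upper)
    return covered / len(valid), len(valid)
```

For example, with the toy intervals `[(100, 300), (250, 400), None, (150, math.inf)]` and a true effort of 200, three intervals are valid and two of them cover the true value.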

Figure 3.1 depicts box and whisker plots of the width of the confidence intervals produced using each of the three methods for each of the six run sizes on the Ant domain. The grey line across each plot indicates the value of the best estimate of the true computational effort. This line is added to assist understanding of the magnitude of the widths. The whiskers (indicated by the dashed line) in the plots extend to the most extreme data point or 1.5 times the interquartile range from the box, whichever is smaller. In the latter case, points past the whiskers are considered outliers and are marked with a small circle. The box-plot for 25 runs using the Resampling method is incomplete as more than 50% of the simulated experiments produced infinite confidence interval widths.
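The whisker rule described above can be sketched as follows. This is an illustration only: it assumes finite widths and uses the "inclusive" quartile convention from Python's `statistics` module, which may differ slightly from the convention of the plotting software used for figure 3.1. The function name is hypothetical.

```python
from statistics import quantiles


def box_plot_summary(widths):
    """Return quartiles, whisker ends and outliers for a list of finite widths."""
    data = sorted(widths)
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    reach = 1.5 * (q3 - q1)  # 1.5 times the interquartile range
    # Whiskers stop at the most extreme data points within reach of the box.
    inside = [w for w in data if q1 - reach <= w <= q3 + reach]
    outliers = [w for w in data if w < q1 - reach or w > q3 + reach]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": outliers,
    }
```

Infinite widths, such as those produced by the Resampling method at 25 runs, would need to be handled separately before calling such a function, since the quartiles themselves can then become infinite.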

[Figure 3.1: box and whisker plots, one panel per run size; axis "Width of confidence interval", ticks from 0 to 2,000,000. Coverage for each configuration:]

Runs   Normal   Wilson   Resampling
25     92.1%    90.6%    89.1%
50     97.3%    92.5%    92.1%
75     97.5%    93%      92.6%
100    97%      94%      93%
200    96.7%    94.8%    93.1%
500    96%      94.2%    94%

Figure 3.1: Confidence interval widths for the Ant problem domain when the minimum generation is estimated. Percentages indicate coverage for the specific configurations.

Surprisingly, the use of an estimated minimum generation had very little negative impact on the coverage of the three methods. Excluding the Parity domain, the Normal method did well with an average coverage of 94.7% (as against the intended 95%). The Wilson-Dependent method did even better as, over all problem domains and run sizes, it dropped only slightly to an average of 94.4% (as compared to 95.2% when the true minimum generation was known). From these results it appears that, even when an estimated minimum generation is used, the confidence intervals produced by the Wilson-Dependent method are a good approximation to a 95% confidence interval.

It is hypothesised that the use of an estimated minimum generation had so little negative effect because the computational effort performance curves flatten out around the true minimum generation, and that the use of an estimate provides a result “good enough” for the production of a confidence interval.

Finally, it is worth noting that the median widths of the confidence intervals are almost always greater than the best estimate of the true value.