Efforts
In this section we first discuss a method to produce random numbers distributed according to the likelihood of the true minimum computational effort. Using that method we can then offer methods to find confidence intervals for two related measures: (i) the difference between, and (ii) the ratio of, two observed values of minimum computational effort.
3.4.1 Simulating Minimum Computational Effort
Wilson’s method can be replaced with a Beta-distribution-based simulation method, since both serve the same purpose: producing a confidence interval for a proportion. Making that change in the Wilson-Dependent algorithm (table 3.4) lets us produce random numbers distributed according to the likelihood of the true minimum computational effort. Table 3.17 describes this algorithm.
Although this method (like the Wilson-Dependent method) assumes there is no variability associated with the minimum generation, this approximation has been shown to produce acceptable coverage for typical and even atypical GP results (see section 3.2).
1. Obtain the minimum generation (j), the success proportion at the minimum generation (P(j)), the number of runs executed (n) and the population size (M) for a given experiment.
2. Obtain a random number which follows a Beta distribution with an α0 parameter of (P(j)·n) + 1 and a β0 parameter of ((1−P(j))·n) + 1. Label this Prand.
3. Transform Prand with the function
Erand = (j + 1)·R(Prand)·M
to obtain a random number distributed according to the likelihood of the minimum computational effort for the given parameters.
Table 3.17: Algorithm to produce a random number distributed according to the likelihood of a minimum computational effort with parameters j, P(j), n, and M.
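The steps in table 3.17 can be sketched in Python with NumPy. This is a minimal illustration, not the implementation used for the results reported here. The function R is not defined in this section, so the sketch assumes Koza's usual definition, R(P) = ⌈ln(1−z)/ln(1−P)⌉ with z = 0.99. The names `runs_required` and `simulate_effort` are hypothetical.

```python
import numpy as np


def runs_required(p, z=0.99):
    """Runs needed to reach success probability z when one run succeeds with probability p.

    ASSUMPTION: Koza's usual definition, R = ceil(ln(1 - z) / ln(1 - p)), with z = 0.99.
    """
    p = np.asarray(p, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        r = np.ceil(np.log(1.0 - z) / np.log1p(-p))
    r = np.where(p >= 1.0, 1.0, r)     # a certain success needs a single run
    r = np.where(p <= 0.0, np.inf, r)  # no successes: effort is unbounded
    return r


def simulate_effort(j, p_j, n, m, size=10_000, rng=None):
    """Draw `size` values of Erand following the three steps of table 3.17."""
    rng = np.random.default_rng() if rng is None else rng
    # Step 2: Beta draw with parameters (P(j)*n) + 1 and ((1 - P(j))*n) + 1.
    p_rand = rng.beta(p_j * n + 1.0, (1.0 - p_j) * n + 1.0, size=size)
    # Step 3: Erand = (j + 1) * R(Prand) * M.
    return (j + 1) * runs_required(p_rand) * m
```

Passing a seeded generator (for example `np.random.default_rng(0)`) makes the draws repeatable, which partly offsets the non-determinism of the simulation approach.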
Confidence Intervals
We could use the algorithm in table 3.17 to produce an approximate confidence interval for minimum computational effort. Given, say, 10,000 Erand values, the α/2 and 1−α/2 quantiles would represent the lower and upper limits of a confidence interval at the (1−α) level for the true minimum computational effort.
The Wilson-Dependent method is, however, superior for our purposes: it is deterministic and so produces repeatable results, and it is algorithmically and computationally much simpler.
3.4.2 Minimum Computational Effort Differences
We used the simulation algorithm just developed to form approximate confidence intervals for the difference between two minimum computational effort measures. Table 3.18 details the algorithm.
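The final step of table 3.18 can be sketched as follows. The sketch assumes the two arrays of simulated efforts (ER1 and ER2) have already been produced by the algorithm in table 3.17, are the same length, and contain only finite values. The function name `effort_difference_interval` is hypothetical.

```python
import numpy as np


def effort_difference_interval(er1, er2, alpha=0.05):
    """Percentile interval at the (1 - alpha) level for E1 - E2.

    ASSUMPTION: er1 and er2 are equally sized arrays of finite simulated
    efforts; an infinite value in both arrays would give inf - inf = nan.
    """
    er1 = np.asarray(er1, dtype=float)
    er2 = np.asarray(er2, dtype=float)
    if er1.shape != er2.shape:
        raise ValueError("er1 and er2 must contain the same number of draws")
    diffs = er1 - er2  # element-wise ER1 - ER2
    lower, upper = np.quantile(diffs, [alpha / 2.0, 1.0 - alpha / 2.0])
    return lower, upper
```

If the resulting interval excludes zero, the two efforts differ at the chosen level.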
3.4.3 Minimum Computational Effort Ratios
In his second book, Genetic Programming II, Koza introduced a measure he termed the efficiency ratio (RE) of two minimum computational effort measurements:
1. Obtain the minimum generation (j1), the success proportion at the minimum generation (P1(j1)), the number of runs executed (n1), and the population size (M1) for the first experiment.
2. Obtain the same values (j2, P2(j2), n2, M2) for the second experiment.
3. The computational effort for the first experiment is:
E1 = (j1 + 1)·R(P1(j1))·M1
The computational effort for the second experiment is:
E2 = (j2 + 1)·R(P2(j2))·M2
The minimum computational effort difference is then:
∆E = E1 − E2
4. Obtain X random numbers which follow the expected distribution for the first experiment’s parameters (as described in table 3.17). Label them ER1.
5. Obtain another X random numbers for the second experiment. Label these ER2.
6. Find the α/2 and 1−α/2 quantiles of ER1 − ER2. These provide the lower and upper limits of a 1−α confidence interval for the minimum computational effort difference.
Table 3.18: An algorithm to produce a confidence interval at the 1−α level for the difference between two minimum computational effort measurements.
1. Obtain the minimum generation (j1), the success proportion at the minimum generation (P1(j1)), the number of runs executed (n1), and the population size (M1) for the first experiment.
2. Obtain the same values (j2, P2(j2), n2, M2) for the second experiment.
3. The efficiency ratio is then:
RE = E1 / E2 = ((j1 + 1)·R(P1(j1))·M1) / ((j2 + 1)·R(P2(j2))·M2)
4. Obtain X random numbers which follow the expected distribution for the first experiment’s parameters (as described in table 3.17). Label them ER1.
5. Obtain another X random numbers for the second experiment. Label these ER2.
6. Find the α/2 and 1−α/2 quantiles of ER1 / ER2. These provide the lower and upper limits of a 1−α confidence interval for the efficiency ratio RE.
Table 3.19: Algorithm to produce a confidence interval at the 1−α level for the efficiency ratio of two minimum computational effort measurements.
RE = (Computational effort without ADFs) / (Computational effort with ADFs) = Ewithout / Ewith
It was used throughout the book as an aid to demonstrate the benefits of genetic programming with automatically defined functions (ADFs).
A ratio is not, however, specific to ADFs: it can compare any two minimum computational effort measurements. If you have two methods ‘A’ and ‘B’ and expect ‘A’ to outperform ‘B’ then, given two minimum computational effort measurements EA and EB, the ratio EB/EA will be greater than one if ‘A’ had the better (lower) measure.
Table 3.19 introduces a method to obtain an approximate confidence interval for the ratio of two computational effort statistics. If the confidence interval does not include one then we can be confident (at the 1−α level) that the two results are statistically different.
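The quantile step of table 3.19 and the decision rule just described can be sketched as follows. As before, the sketch assumes the simulated efforts ER1 and ER2 come from the algorithm in table 3.17 and are finite and equal in length. The function names `efficiency_ratio_interval` and `excludes_one` are hypothetical.

```python
import numpy as np


def efficiency_ratio_interval(er1, er2, alpha=0.05):
    """Percentile interval at the (1 - alpha) level for the efficiency ratio E1 / E2.

    ASSUMPTION: er1 and er2 are equally sized arrays of finite, positive
    simulated efforts; infinite values in both arrays would give nan ratios.
    """
    er1 = np.asarray(er1, dtype=float)
    er2 = np.asarray(er2, dtype=float)
    if er1.shape != er2.shape:
        raise ValueError("er1 and er2 must contain the same number of draws")
    ratios = er1 / er2  # element-wise ER1 / ER2
    lower, upper = np.quantile(ratios, [alpha / 2.0, 1.0 - alpha / 2.0])
    return lower, upper


def excludes_one(interval):
    """True when the interval does not contain a ratio of one."""
    lower, upper = interval
    return not (lower <= 1.0 <= upper)
```

For instance, `excludes_one((0.60, 0.97))` is true, whereas an interval that straddles one would not indicate a difference at the chosen level.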
An example of the increased power offered by this method can be found in the work in chapter 9. We were experimenting with the even-4-parity problem and had two results whose confidence intervals overlapped.⁹ Figure 3.11 graphs those two intervals. When the algorithm in table 3.19 was used, the ratio of 0.77 had a 95% confidence interval of 0.60–0.97; the two measurements are thus statistically significantly different.
[Figure 3.11 plot: intervals for the Direct and Incremental results; horizontal axis: minimum computational effort (×1000), ticks from 120 to 200.]
Figure 3.11: Example of two minimum computational effort measures whose individual confidence intervals overlap but whose ratio is significantly different from one.