Chapter 4: Experiments, Results, and Analysis
4.2 Experimental setup
4.2.1 Competing algorithms and performance measures
A meta-optimizer with a fixed budget is compared to a meta-optimizer with a flexible budget to test the null hypothesis H0: the difference in solution quality obtained by the two methods is zero for various computational budgets, against the alternative hypothesis H1: the difference is not zero. To do so, the Flexible Budget is first run once, optimizing parameters for any budget up to nmax. Then, several computational budgets $\{b_1, \dots, b_k : b_i \le n_{max}\ \forall i = 1, \dots, k\}$ are selected for the Fixed Budget, which is run k times, optimizing parameters for a specific b_i in each run. Finally, the solution qualities $\{q_{b_1}, \dots, q_{b_k}\}_{Fixed}$ and $\{q_{b_1}, \dots, q_{b_k}\}_{Flexible}$ are compared. The expectation is that, over many replications, these quantities will be statistically indistinguishable at a given significance level. If this is the case, then the Flexible Budget saves tuning time, since it needs a single run where the Fixed Budget needs k, while maintaining the same solution quality. See Figure 4.1.
Figure 4.1. Comparison of the Fixed and Flexible Budget methods at different computational budgets. In this example, the Flexible Budget, using L, AUC, or AL, was able to find a better (lower) solution quality at each of the four selected budgets less than 3000 (nmax).
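The comparison described above can be sketched for a single budget b_i as a paired test on the qualities obtained over replications. The text does not name the statistical test, so the exact sign-flip (paired permutation) test below is only a stand-in chosen for illustration; the function name `sign_flip_p_value` and its arguments are hypothetical.

```python
from itertools import product


def sign_flip_p_value(q_fixed, q_flexible):
    """Exact two-sided sign-flip test of H0: the mean paired difference is zero.

    q_fixed and q_flexible hold solution qualities from matching replications
    at one budget. This test is an assumption of the sketch, not necessarily
    the test used in the experiments.
    """
    diffs = [a - b for a, b in zip(q_fixed, q_flexible)]
    observed = abs(sum(diffs))
    extreme = 0
    total = 0
    # Under H0 each difference is equally likely to carry either sign,
    # so enumerate all 2^n sign assignments (feasible for small n only).
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        stat = abs(sum(s * d for s, d in zip(signs, diffs)))
        if stat >= observed - 1e-12:
            extreme += 1
    return extreme / total
```

A p-value above the chosen significance level would be read as failing to reject H0 at that budget. The enumeration grows as 2^n, so a sampled version would be needed for many replications.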
The lower-level algorithm is the derandomized Evolution Strategy with Covariance Matrix Adaptation (CMA-ES). This (μW, λ)-CMA-ES generates λ offspring for the next generation g+1 from the current population of μ individuals, according to a multivariate normal distribution with a weighted mean and a covariance matrix:
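$$x_k^{(g+1)} \sim \mathcal{N}\left(x_w^{(g)},\ \left(\sigma^{(g)}\right)^2 C^{(g)}\right), \quad k = 1, \dots, \lambda, \qquad (4.1)$$
where the recombination point $x_w^{(g)} = \sum_{i=1}^{\mu} w_i x_{i:\lambda}^{(g)}$ is a weighted average of the selected individuals ($x_{i:\lambda}$ denotes the i-th best of the λ offspring), such that the weights $w_i > 0$ for all $i = 1, \dots, \mu$ and $\sum_{i=1}^{\mu} w_i = 1$. C is the covariance matrix and σ is the step size. The detailed workings of CMA-ES are beyond the scope of this work; the reader is directed to Hansen and Kern (2004) for more details.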
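As a minimal sketch of the sampling step in Equation 4.1, the code below draws λ offspring around a recombination point using a Cholesky factor of C, and computes a weighted recombination point from already-selected individuals. It covers only sampling and recombination, not the adaptation of σ or C, and the helper names (`cholesky`, `sample_offspring`, `recombination_point`) are hypothetical.

```python
import math
import random


def cholesky(C):
    """Lower-triangular L with L * L^T = C, for a symmetric positive-definite C."""
    n = len(C)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(C[i][i] - s)
            else:
                L[i][j] = (C[i][j] - s) / L[j][j]
    return L


def sample_offspring(x_w, sigma, C, lam, rng):
    """Draw lam points from N(x_w, sigma^2 * C) as x_w + sigma * L * z."""
    L = cholesky(C)
    n = len(x_w)
    offspring = []
    for _ in range(lam):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]  # z ~ N(0, I)
        y = [sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(n)]
        offspring.append([x_w[i] + sigma * y[i] for i in range(n)])
    return offspring


def recombination_point(selected, weights):
    """Weighted average of the mu selected individuals (weights sum to one)."""
    n = len(selected[0])
    return [sum(w * x[d] for w, x in zip(weights, selected)) for d in range(n)]
```

In a full generation, the offspring would be ranked by fitness and the μ best passed to `recombination_point`; that ranking step is omitted here.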
Five parameters of CMA-ES are to be tuned: λ, μ, σ0 (the initial step size), the recombination type, and the update type. The recombination type parameter specifies the weights w_i. Three types are used: Equal, where $w_i = 1/\mu$; Linear, where $w_i = \mu - i$; and Superlinear (non-linear), where $w_i = \log(\mu + 1) - \log(i + 1)$. The update type parameter specifies the rank of the matrix used to update C: either a Rank-μ matrix or a Rank-1 matrix. These parameters are to be tuned for different nmax and bi values; see Table 4.1.
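The three recombination types can be sketched as follows. Two assumptions are made that the text does not state explicitly: the rank index i is taken as zero-based (i = 0, ..., μ-1, with 0 the best individual), which keeps every Linear and Superlinear weight strictly positive, and the raw weights are divided by their sum so that they add up to one. The function name `recombination_weights` is hypothetical.

```python
import math


def recombination_weights(mu, kind):
    """Normalized recombination weights for the mu selected individuals.

    Assumes a zero-based rank index and normalization to sum one; both are
    assumptions of this sketch.
    """
    ranks = range(mu)
    if kind == "equal":
        raw = [1.0 for _ in ranks]
    elif kind == "linear":
        raw = [float(mu - i) for i in ranks]
    elif kind == "superlinear":
        raw = [math.log(mu + 1) - math.log(i + 1) for i in ranks]
    else:
        raise ValueError("unknown recombination type: " + str(kind))
    total = sum(raw)
    return [w / total for w in raw]
```

For Linear and Superlinear the weights decrease with rank, so better individuals contribute more to the recombination point.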
Hansen and Ostermeier (2001) showed improved solution quality when using Superlinear recombination over Equal and Linear. They also noted that the best setting of this parameter depends on the function being optimized. Rank-μ is expected to reduce the number of generations required to reach the optimal solution, as reported by Hansen et al. (2003). Their experimental results showed a reduction in the time complexity of CMA-ES from quadratic to linear over various functions.
Table 4.1. nmax and bi for which parameters of CMA-ES are to be tuned.
Set  Dimension  nmax  bi
1    5          3000  {600, 1200, 1800, 2400}
2    5          3000  {600, 1200, 1800, 2400, 3000}
     10         3000  {600, 1200, 1800, 2400, 3000}
     15         4500  {3000, 3500, 4000, 4500}
3    5          1000  {400, 600, 800, 1000}
     10
     15
The meta-level is a basic EA with crossover and mutation operators (detailed shortly). The population size is 15 and the number of generations is 40. Parents are selected by tournament selection of size 2, while survival selection is elitist with a constant population size. The initial values for the recombination and update types are randomly selected from their respective domains, while λ, μ, and σ0 are randomly initialized from [10, 20], [2, λ], and [0.01, 2], respectively. Note that 2 ≤ μ ≤ λ (Hansen, 2006). Each individual is represented as ⟨λ, μ, σ0, recombination type, update type⟩ and is evaluated over k = 6 replications of CMA-ES. Finally, the meta-EA is replicated 30 times.
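The initialization and parent selection described above might look like the sketch below. It assumes uniform sampling within the stated ranges, integer values for λ and μ, tournament contestants drawn without replacement, and lower fitness being better; the type labels and function names are hypothetical.

```python
import random

# Hypothetical labels for the two categorical parameters.
RECOMBINATION_TYPES = ("equal", "linear", "superlinear")
UPDATE_TYPES = ("rank-mu", "rank-1")


def random_individual(rng):
    """One individual <lambda, mu, sigma0, recombination type, update type>."""
    lam = rng.randint(10, 20)        # lambda from [10, 20]
    mu = rng.randint(2, lam)         # mu from [2, lambda], so 2 <= mu <= lambda
    sigma0 = rng.uniform(0.01, 2.0)  # initial step size from [0.01, 2]
    return [lam, mu, sigma0,
            rng.choice(RECOMBINATION_TYPES), rng.choice(UPDATE_TYPES)]


def tournament_select(population, fitness, rng, size=2):
    """Pick `size` distinct contestants at random and return the fittest.

    Lower fitness is treated as better (an assumption of this sketch).
    """
    contestants = rng.sample(range(len(population)), size)
    best = min(contestants, key=lambda i: fitness[i])
    return population[best]
```

A population for one meta-EA run would then be `[random_individual(rng) for _ in range(15)]`, with each fitness value averaged over the replications of CMA-ES.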
The crossover and mutation operators work as follows: following parent selection, a random cut-off point c is chosen from the set {2, 3, 4}, where c = 2 means that the cut occurs after the second "gene", and so on. Each parent is then cut at c and the two parts are swapped. Because c ≥ 2, λ and μ are never separated, which ensures that μ ≤ λ is maintained. Mutating λ, μ, and σ0 of the new individuals is done by drawing a number at random from a normal distribution with its mean centered at the current value and its standard deviation equal to 1. For λ and μ, the new values are rounded to integers. This implies that each of λ and μ keeps its current value about 38% of the time, since a standard normal deviate rounds to zero with probability P(|Z| < 0.5) ≈ 0.38. If σ0 becomes negative, it is reset to a value of 1.
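The two operators can be sketched as below, with individuals stored as five-element lists in gene order. The text does not describe any repair when mutation moves μ outside [2, λ], so none is applied here; the function names are hypothetical. The last helper checks the 38% figure using the standard normal distribution.

```python
import math
import random


def crossover(parent_a, parent_b, rng):
    """One-point crossover with the cut after gene c, c drawn from {2, 3, 4}."""
    c = rng.choice((2, 3, 4))
    # Genes 1 and 2 (lambda, mu) always stay together because c >= 2.
    child_a = parent_a[:c] + parent_b[c:]
    child_b = parent_b[:c] + parent_a[c:]
    return child_a, child_b


def mutate(individual, rng):
    """Gaussian mutation (standard deviation 1) of lambda, mu, and sigma0."""
    lam, mu, sigma0, recomb, update = individual
    lam = round(rng.gauss(lam, 1.0))  # rounded to an integer
    mu = round(rng.gauss(mu, 1.0))    # rounded to an integer
    sigma0 = rng.gauss(sigma0, 1.0)
    if sigma0 < 0:
        sigma0 = 1.0                  # negative step size is reset to 1
    return [lam, mu, sigma0, recomb, update]


def prob_integer_unchanged():
    """P(|Z| < 0.5) for Z ~ N(0, 1): chance a rounded mutation changes nothing."""
    return math.erf(0.5 / math.sqrt(2.0))
```

`prob_integer_unchanged()` evaluates to roughly 0.383, in line with the 38% stated in the text.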
4.2.2 Training and testing sets
Three sets of optimization functions are used, each using different random seeds for training and testing. Set-1 contains eight functions from which different instances cannot be generated; thus, each function is used for both training and testing. These functions are Ackley, Griewangk, Rastrigin, Schwefel, Schwefel ellipsoid, Schwefel ellipsoid rotated, Rosenbrock, and Rosenbrock rotated, all in 5 dimensions and implemented as in Igel et al. (2008).
Set-2 and Set-3 are the hump and quadratic families of functions, from which various instances can be created. These functions are taken from the generator by Rönkkönen et al. (2011), and are implemented in 5, 10, and 15 dimensions. See Figures 4.2-4.3.