2.3 Evolution Strategies Overview
2.3.2 Evolution Strategies Techniques
The earliest ES technique was the (1+1)-ES, which maintains two individuals. One of them is the parent individual, which produces one offspring individual in each generation (iteration). Each individual is represented by a pair of float-valued vectors $ind_i = (X, \sigma)$, where $i = parent$ for the parent (current) individual and $i = offspring$ for the offspring individual. The first float-valued vector $X$ represents the proposed solution in the search problem space; it is also called the proposed solution chromosome. Thus, the parent chromosome $X_{parent}$ belongs to the parent individual $ind_{parent} = (X_{parent}, \sigma)$, while the offspring chromosome $X_{offspring}$ belongs to the offspring individual $ind_{offspring} = (X_{offspring}, \sigma)$. The second float-valued vector $\sigma$ holds the standard deviations of the Gaussian random numbers that are used to mutate the current parent chromosome into the new offspring chromosome. This mutation procedure can be represented by the following equation:
\[ X_{offspring} = X_{parent} + N(0, \sigma) \tag{2.3.30} \]
where $N(0, \sigma)$ represents a vector of Gaussian random numbers with zero mean and standard deviations equal to $\sigma$. The next step in each evolving iteration of the (1+1)-ES technique is to examine the quality of the parent and offspring chromosomes (the evolved solutions). The function used to check the quality of the proposed solution chromosomes is called the fitness or objective function. In the IR problem domain, accuracy and similarity matching functions such as Mean Average Precision, Precision, Error Rate and Cosine Similarity, among others (Baeza-Yates and Ribeiro-Neto, 2011), are used as fitness functions for evolutionary and machine learning techniques (Li, 2014; Cordon et al., 2003; Cummins, 2008). For example, the Cosine Similarity$(d, q)$ fitness function is defined by:
\[ \text{Cosine Similarity}(d, q) = \frac{\sum_{i=1}^{n} W_{id} \cdot W_{iq}}{\sqrt{\sum_{i=1}^{n} W_{id}^{2} \cdot \sum_{i=1}^{n} W_{iq}^{2}}} \tag{2.3.31} \]
In the above equation, Cosine Similarity$(d, q)$ is the similarity function between the query $q$ and document $d$ vectors, $n$ is the number of index terms that exist in the document $d$ and query $q$, $W_{id}$ is the weight of term $i$ in document $d$ and $W_{iq}$ is the weight of the same term $i$ in query $q$. The optimisation target for (1+1)-ES is to find an evolved document representation for the relevant document $d$ corresponding to its query $q$. Assume that $n = 2$ in Equation 2.3.31, so that the document $d$ and the query $q$ have only the index terms $t_1$ and $t_2$, and that the query has the weight vector $q = (0.25, 0.35)$. In the current evolving iteration $j$ of (1+1)-ES, let the current evolved representation (parent chromosome) of $d$ be $w^{parent_j}_{1d} = 0$ and $w^{parent_j}_{2d} = 0.45$. If the $\sigma$ vector in Equation 2.3.30 is $\sigma = (1, 1)$, then the offspring chromosome of weight representations after mutation in the current evolving iteration is given by:
\[ w^{offspring_j}_{1d} = w^{parent_j}_{1d} + N(0, 1) = 0 + 0.4 = 0.4, \qquad w^{offspring_j}_{2d} = w^{parent_j}_{2d} + N(0, 1) = 0.45 - 0.1 = 0.35 \tag{2.3.32} \]
where $N(0, 1)$ is a Gaussian random number with zero mean and a standard deviation of 1. The cosine similarity values for the parent and offspring chromosomes are given by:
\[ \text{Cosine Similarity}(d^{parent_j}, q) = 0.814, \qquad \text{Cosine Similarity}(d^{offspring_j}, q) = 0.9733 \tag{2.3.33} \]
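The two fitness values in this worked example can be checked numerically. The following is a minimal Python sketch, not the thesis implementation; the function name `cosine_similarity` and the variable names are illustrative assumptions, and the two Gaussian draws are fixed to the values used in the text (0.4 and -0.1) rather than sampled.

```python
import math

def cosine_similarity(d, q):
    """Cosine similarity between document and query weight vectors (Equation 2.3.31)."""
    dot = sum(wd * wq for wd, wq in zip(d, q))
    norm = math.sqrt(sum(wd * wd for wd in d) * sum(wq * wq for wq in q))
    # Assumption: a zero vector is given similarity 0 to avoid division by zero.
    return dot / norm if norm > 0 else 0.0

q = (0.25, 0.35)          # query weights for terms t1 and t2
d_parent = (0.0, 0.45)    # parent chromosome in iteration j
noise = (0.4, -0.1)       # Gaussian draws, fixed to the values in the text

# Mutation: offspring = parent + noise, component by component.
d_offspring = tuple(w + z for w, z in zip(d_parent, noise))

fit_parent = cosine_similarity(d_parent, q)
fit_offspring = cosine_similarity(d_offspring, q)
print(round(fit_parent, 3), round(fit_offspring, 4))
```

Under these assumptions the sketch reproduces the values 0.814 and 0.9733 up to rounding.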
From Equation 2.3.33, the fitness function value for the offspring chromosome $d^{offspring_j}$ is higher than the fitness function value for the parent chromosome $d^{parent_j}$. In other words, the offspring chromosome, which represents the relevant document $d$, is more similar to the query $q$ than the parent chromosome, so the parent chromosome $d^{parent_j}$ is replaced by the offspring chromosome $d^{offspring_j}$ for the next evolving iteration $j + 1$ of the ES technique. In (1+1)-ES, the standard deviation vector $\sigma = (\sigma_1, \sigma_2, ..., \sigma_n)$ of the mutation is usually updated in each iteration according to the performance of the offspring chromosome. Moreover, the success rule differs between fitness functions, in the hope of improving the convergence of the search towards a better evolved solution. The first success rule, derived for the corridor and sphere models, was proposed by Rechenberg (Beyer and Schwefel, 2002) and is called the 1/5 success rule. Under this rule in (1+1)-ES, the components of the standard deviation vector $\sigma$ are increased when more than 1/5 of the mutations are successful and decreased when fewer than 1/5 are successful.
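As an illustration, the (1+1)-ES loop with a 1/5 success rule might be sketched as follows. This is a hedged sketch under stated assumptions, not the procedure used in this thesis: it assumes the common formulation in which the standard deviations are enlarged when more than 1/5 of recent mutations improved the fitness and shrunk when fewer did. The adaptation window of 20 iterations, the scaling factor `c = 0.85`, and all function and parameter names are illustrative choices.

```python
import math
import random

def cosine_similarity(d, q):
    """Cosine similarity between two weight vectors (fitness to maximise)."""
    dot = sum(a * b for a, b in zip(d, q))
    norm = math.sqrt(sum(a * a for a in d) * sum(b * b for b in q))
    return dot / norm if norm > 0 else 0.0

def one_plus_one_es(fitness, x, sigma, iterations=200, window=20, c=0.85, seed=0):
    """Maximise `fitness` with a (1+1)-ES and a 1/5 success rule (sketch)."""
    rng = random.Random(seed)
    best = fitness(x)
    successes = 0
    for t in range(1, iterations + 1):
        # Mutation: offspring = parent + N(0, sigma), component by component.
        child = [xi + rng.gauss(0.0, si) for xi, si in zip(x, sigma)]
        f_child = fitness(child)
        if f_child > best:
            # The offspring replaces the parent only when it is fitter.
            x, best = child, f_child
            successes += 1
        if t % window == 0:
            # 1/5 rule (assumed form): compare the recent success rate with 1/5.
            rate = successes / window
            if rate > 0.2:
                sigma = [si / c for si in sigma]   # many successes: larger steps
            elif rate < 0.2:
                sigma = [si * c for si in sigma]   # few successes: smaller steps
            successes = 0
    return x, best, sigma

q = [0.25, 0.35]
start = [0.0, 0.45]
x, best, sigma = one_plus_one_es(lambda d: cosine_similarity(d, q), start, [1.0, 1.0])
print(x, best, sigma)
```

Because the parent is only ever replaced by a fitter offspring, the returned fitness is never lower than the fitness of the starting chromosome.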
The multi-membered ES differs from the (1+1)-ES in the population size (the number of individuals) in each generation (iteration) (Beyer and Schwefel, 2002; Schwefel, 1981). Furthermore, individuals are selected at random for mating. Multi-membered ES also use a recombination procedure, which is similar to crossover in Genetic Algorithms. The best-known multi-membered ES techniques are the $(\mu+\lambda)$-ES and the $(\mu, \lambda)$-ES. In the $(\mu+\lambda)$-ES, the $\mu$ parent individuals are used to create $\lambda$ offspring individuals in each evolving generation, where $\lambda \geq 1$. The worst $\lambda$ individuals out of all $(\mu+\lambda)$ individuals are discarded, and the best $\mu$ individuals are used as the parent individuals for the next generation. The $(\mu, \lambda)$-ES, on the other hand, has a different selection procedure. The $\mu$ parent individuals are used to create $\lambda$ offspring individuals in each evolving generation. The parent individuals are then discarded, and the best $\mu$ individuals selected from the $\lambda$ offspring are used as the parent individuals for the next generation. As in (1+1)-ES, fitness functions are used in the $(\mu+\lambda)$-ES and the $(\mu, \lambda)$-ES to check the quality of the proposed evolved solutions. The standard deviations of the mutation are no longer constant, nor are they changed by a deterministic rule such as the "1/5 success rule"; instead, they are incorporated into the individuals and evolve with them. To create the offspring individuals from the parent individuals, the ES technique works as follows:
1) The algorithm selects two or more individuals for recombination. Assume that the following two individuals are selected:
\[ (x^1, \sigma^1) = ((x^1_1, \ldots, x^1_n), (\sigma^1_1, \ldots, \sigma^1_n)) \quad \text{and} \quad (x^2, \sigma^2) = ((x^2_1, \ldots, x^2_n), (\sigma^2_1, \ldots, \sigma^2_n)) \tag{2.3.34} \]
where $x^1$ and $x^2$ are the chromosomes of individuals one and two, while $\sigma^1$ and $\sigma^2$ are the standard deviation (step-size) vectors of the mutations in individuals one and two respectively. There are two well-known ways of applying the recombination (crossover) operator:
A) Discrete recombination, which produces the new offspring
\[ (x', \sigma') = ((x^{pop_1}_1, \ldots, x^{pop_n}_n), (\sigma^{pop_1}_1, \ldots, \sigma^{pop_n}_n)) \tag{2.3.35} \]
where $pop_i = 1$ or $pop_i = 2$ for $i = 1, \ldots, n$. Thus, each component is selected from either individual one or individual two.
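B) Intermediate recombination, which is similar to the uniform crossover in Genetic Algorithms (GA) (Le, 2011). When using the intermediate recombination, the new offspring becomes:
\[ (x', \sigma') = \left( \left( \frac{x^1_1 + x^2_1}{2}, \ldots, \frac{x^1_n + x^2_n}{2} \right), \left( \frac{\sigma^1_1 + \sigma^2_1}{2}, \ldots, \frac{\sigma^1_n + \sigma^2_n}{2} \right) \right) \tag{2.3.36} \]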
These recombination types can help the proposed evolved solutions converge towards the globally optimal solutions. They can be applied to pairs of individuals or to multiple individuals to produce a new population of offspring individuals.
2) The offspring $(x', \sigma')$ obtained from the recombination is then mutated as follows:
\[ \sigma_{offspring} = \sigma' \cdot e^{N(0, \Delta\sigma')} \quad \text{and} \quad x_{offspring} = x' + N(0, \sigma_{offspring}) \tag{2.3.37} \]
where $\Delta\sigma'$ is the variation parameter of the mutation standard deviation. To control the convergence rate, Schwefel proposed an additional control parameter (Beyer and Schwefel, 2002). Assuming that the additional control parameter is $\theta$ and that each individual is $(x, \sigma, \theta)$, the mutation in Equation 2.3.37 becomes:
\[ \sigma_{offspring} = \sigma' \cdot e^{N(0, \Delta\sigma')}, \quad \theta_{offspring} = \theta' + N(0, \Delta\theta') \quad \text{and} \quad x_{offspring} = x' + C(0, \sigma_{offspring}, \theta_{offspring}) \tag{2.3.38} \]
where $C(0, \sigma_{offspring}, \theta_{offspring})$ is a vector of Gaussian random numbers with zero mean and appropriate probability density.
Recently, Hansen proposed the use of covariance matrix adaptation to adapt and control the convergence rate (Hansen, 2016; Back et al., 2013). This is called the Covariance Matrix Adaptation Evolution Strategy (CMA-ES). The historical development from (1+1)-ES to the state-of-the-art ES (p-sep-lmm-CMA-ES) is presented in (Back et al., 2013). However, memory size and run-time are issues in CMA-ES, because storing the covariance matrix and updating it in each evolving iteration require additional memory and computational run-time. Thus, recent research (Back et al., 2013) proposed (1+1)-CMA-ES with various adaptations to reduce the memory size and the computational run-time. Igel et al. showed that (1+1)-Cholesky-CMA-ES is more efficient than multi-membered CMA-ES on some unimodal fitness functions (Ackley, Rastrigin and Griewangk) (Igel et al., 2006). Recently, Ilya Loshchilov proposed a computationally efficient limited-memory CMA-ES for large-scale optimisation (LM-CMA-ES) (Loshchilov, 2014). In this technique, vectors of random weights are used for the adaptation of the mutation and convergence rate. One of these vectors is a vector of random Ziggurat numbers. The Ziggurat random numbers are positive Gaussian random numbers in the range between 0 and 1. This type of random number is also used in the (1+1)-Evolutionary Gradient Strategy for the evolving global weight technique. The use of the gradient step-size gives more control over the convergence rate for new fitness functions such as Cumulative Cosine Similarity. The ES techniques proposed in this thesis are presented in detail in Chapters 6 and 7.
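To make the multi-membered procedure described earlier in this section more concrete (recombination in step 1, self-adaptive mutation in step 2, then plus or comma selection), one generation might be sketched in Python as below. This is an illustrative sketch under stated assumptions, not the implementation used in this thesis. Intermediate recombination is taken here to be the component-wise mean of the two parents, which is the common definition and an assumption with respect to this text. The value `delta_sigma = 0.3`, the use of discrete recombination inside `next_generation`, and all function names are hypothetical choices. The correlated mutation with $\theta$ and CMA-ES are not covered.

```python
import math
import random

def cosine_similarity(d, q):
    """Cosine similarity between two weight vectors (fitness to maximise)."""
    dot = sum(a * b for a, b in zip(d, q))
    norm = math.sqrt(sum(a * a for a in d) * sum(b * b for b in q))
    return dot / norm if norm > 0 else 0.0

def discrete_recombination(p1, p2, rng):
    """Each component (and its step size) is copied from parent one or parent two."""
    (x1, s1), (x2, s2) = p1, p2
    picks = [rng.random() < 0.5 for _ in x1]
    x = [a if pick else b for a, b, pick in zip(x1, x2, picks)]
    s = [a if pick else b for a, b, pick in zip(s1, s2, picks)]
    return x, s

def intermediate_recombination(p1, p2):
    """Assumed form: component-wise mean of the two parents."""
    (x1, s1), (x2, s2) = p1, p2
    return ([(a + b) / 2 for a, b in zip(x1, x2)],
            [(a + b) / 2 for a, b in zip(s1, s2)])

def mutate(ind, delta_sigma, rng):
    """Self-adaptive mutation: step sizes first, then the chromosome."""
    x, s = ind
    new_s = [si * math.exp(rng.gauss(0.0, delta_sigma)) for si in s]
    new_x = [xi + rng.gauss(0.0, si) for xi, si in zip(x, new_s)]
    return new_x, new_s

def next_generation(parents, fitness, lam, plus=True, delta_sigma=0.3, rng=None):
    """One generation of (mu+lambda)-ES (plus=True) or (mu,lambda)-ES (plus=False)."""
    rng = rng or random.Random(0)
    mu = len(parents)
    if not plus and lam < mu:
        raise ValueError("comma selection needs at least mu offspring")
    offspring = []
    for _ in range(lam):
        p1, p2 = rng.sample(parents, 2)      # two distinct parents chosen at random
        child = discrete_recombination(p1, p2, rng)
        offspring.append(mutate(child, delta_sigma, rng))
    # Plus selection keeps parents in the pool; comma selection discards them.
    pool = parents + offspring if plus else offspring
    pool = sorted(pool, key=lambda ind: fitness(ind[0]), reverse=True)
    return pool[:mu]

q = [0.25, 0.35]
fit = lambda d: cosine_similarity(d, q)
rng = random.Random(1)
parents = [([rng.random(), rng.random()], [1.0, 1.0]) for _ in range(3)]
initial_best = max(fit(x) for x, _ in parents)
for _ in range(30):
    parents = next_generation(parents, fit, lam=12, plus=True, rng=rng)
best = fit(parents[0][0])
print(initial_best, best)
```

With plus selection the best fitness cannot decrease between generations, because the parents compete with their offspring. With `plus=False` the parents are always discarded, so the best fitness may temporarily drop.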