
2.3 Evolution Strategies Overview

2.3.2 Evolution Strategies Techniques

The earliest ES technique was the (1+1)-ES, which uses two individuals: a parent individual and the single offspring individual that it produces in each generation (iteration). Each individual is represented by a pair of float-valued vectors $ind_i = (X, \sigma)$, where $i = parent$ denotes the parent (current) individual and $i = offspring$ denotes the offspring individual. The first float-valued vector $X$ represents the proposed solution in the search space of the problem. It is also called the proposed solution chromosome. Thus, the parent chromosome $X_{parent}$ belongs to the parent individual $ind_{parent} = (X_{parent}, \sigma)$, while the offspring chromosome $X_{offspring}$ belongs to the offspring individual $ind_{offspring} = (X_{offspring}, \sigma)$. The second float-valued vector $\sigma$ holds the standard deviations of the Gaussian random numbers that are used to mutate the current parent chromosome into the new offspring chromosome. This mutation procedure can be represented by the following equation:

$$X_{offspring} = X_{parent} + N(0, \sigma) \tag{2.3.30}$$

where $N(0, \sigma)$ represents a vector of Gaussian random numbers with zero mean and standard deviations given by $\sigma$. The next step in each iteration of the (1+1)-ES technique is to examine the quality of the parent and offspring chromosomes (the evolved solutions). The function used to check the quality of a proposed solution chromosome is called the fitness or objective function. In the IR problem domain, accuracy and similarity matching functions such as Mean Average Precision, Precision, Error Rate and Cosine Similarity, among others (Baeza-Yates and Ribeiro-Neto, 2011), are used as fitness functions for evolutionary and machine learning techniques (Li, 2014; Cordon et al., 2003; Cummins, 2008). For example, the Cosine Similarity fitness function is defined by:

$$\text{Cosine Similarity}(d, q) = \frac{\sum_{i=1}^{n} W_{id} \cdot W_{iq}}{\sqrt{\sum_{i=1}^{n} W_{id}^{2} \cdot \sum_{i=1}^{n} W_{iq}^{2}}} \tag{2.3.31}$$

In the above equation, Cosine Similarity(d, q) is the similarity between the query vector $q$ and the document vector $d$, $n$ is the number of index terms that exist in the document $d$ and the query $q$, $W_{id}$ is the weight of term $i$ in document $d$ and $W_{iq}$ is the weight of the same term $i$ in query $q$. The optimisation target for the (1+1)-ES is to find an evolved representation of the relevant document $d$ that corresponds to its query $q$. Assume that $n = 2$ in Equation 2.3.31, so that the document $d$ and the query $q$ contain only the index terms $t_1$ and $t_2$, and that the query has the weight vector $q = (0.25, 0.35)$. In the current iteration $j$ of the (1+1)-ES, the current evolved representation (parent chromosome) of $d$ is $w_{1d}^{parent_j} = 0$ and $w_{2d}^{parent_j} = 0.45$. If the $\sigma$ vector in Equation 2.3.30 is $\sigma = (1, 1)$, then the offspring chromosome of weights after mutation in the current iteration is given by:

$$w_{1d}^{offspring_j} = w_{1d}^{parent_j} + N(0, 1) = 0 + 0.4 = 0.4, \qquad w_{2d}^{offspring_j} = w_{2d}^{parent_j} + N(0, 1) = 0.45 - 0.1 = 0.35 \tag{2.3.32}$$

where $N(0, 1)$ is a Gaussian random number with zero mean and a standard deviation of 1. The cosine similarity values of the parent and offspring chromosomes are given by:

$$\text{Cosine Similarity}(d^{parent_j}, q) = 0.814, \qquad \text{Cosine Similarity}(d^{offspring_j}, q) = 0.9733 \tag{2.3.33}$$
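The worked example in Equations 2.3.32 and 2.3.33 can be checked numerically. The following is a minimal sketch, not the thesis implementation: the function name `cosine_similarity` and the variable names are illustrative assumptions, and the two Gaussian draws are fixed to the values 0.4 and -0.1 used in Equation 2.3.32 rather than sampled.

```python
import math


def cosine_similarity(d, q):
    """Cosine similarity between two equal-length weight vectors (Equation 2.3.31)."""
    dot = sum(w_d * w_q for w_d, w_q in zip(d, q))
    norm = math.sqrt(sum(w * w for w in d) * sum(w * w for w in q))
    return dot / norm if norm else 0.0


q = (0.25, 0.35)          # query weights for terms t1 and t2
d_parent = (0.0, 0.45)    # parent chromosome at iteration j
draws = (0.4, -0.1)       # the two N(0, 1) draws assumed in Equation 2.3.32

# Mutation step of Equation 2.3.32 with the fixed draws.
d_offspring = tuple(w + z for w, z in zip(d_parent, draws))

fit_parent = cosine_similarity(d_parent, q)
fit_offspring = cosine_similarity(d_offspring, q)
print(round(fit_parent, 3), round(fit_offspring, 4))   # 0.814 0.9733
```

Because the offspring value is larger, a (1+1)-ES that maximises this fitness would keep the offspring for iteration j + 1.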

From Equation 2.3.33, the fitness function value of the offspring chromosome $d^{offspring_j}$ is higher than the fitness function value of the parent chromosome $d^{parent_j}$. In other words, whenever the offspring chromosome, which represents the relevant document $d$, is more similar to the query $q$ than the parent chromosome, as it is here, the parent chromosome $d^{parent_j}$ is replaced by the offspring chromosome $d^{offspring_j}$ for the next iteration $j + 1$ of the ES technique. In the (1+1)-ES, the standard deviation vector $\sigma = (\sigma_1, \sigma_2, \ldots, \sigma_n)$ of the mutation is usually updated in each iteration according to the performance of the offspring chromosome. The update rule, known as a success rule, differs from one fitness function to another and is intended to improve the search so that it converges to a better evolved solution. The first success rule was proposed by Rechenberg for the corridor model and the sphere model (Beyer and Schwefel, 2002). It is called the 1/5 success rule. For the sphere or corridor fitness functions in the (1+1)-ES, this rule increases the components of $\sigma = (\sigma_1, \sigma_2, \ldots, \sigma_n)$ when the observed proportion of successful mutations is above 1/5 and reduces them when it is below 1/5.
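The interaction between the replacement step and the 1/5 success rule can be sketched as follows. This is a minimal illustration under stated assumptions, not the method used in the thesis: it minimises the sphere model instead of maximising a similarity function, it uses one scalar step size instead of a vector $\sigma$, and the adaptation window of 20 mutations and the factor 0.85 are illustrative choices. All function and parameter names are hypothetical.

```python
import random


def sphere(x):
    """Sphere model, minimised at the origin."""
    return sum(v * v for v in x)


def one_plus_one_es(f, x, sigma=1.0, iterations=2000, window=20, factor=0.85, seed=0):
    """Minimise f with a (1+1)-ES whose step size follows the 1/5 success rule."""
    rng = random.Random(seed)
    fx = f(x)
    successes = 0
    for t in range(1, iterations + 1):
        # Mutation: add zero-mean Gaussian noise to every component.
        child = [v + rng.gauss(0.0, sigma) for v in x]
        f_child = f(child)
        if f_child < fx:            # the offspring is better, so it replaces the parent
            x, fx = child, f_child
            successes += 1
        if t % window == 0:         # adapt sigma once per window of mutations
            rate = successes / window
            if rate > 0.2:
                sigma /= factor     # more than 1/5 successes: enlarge the steps
            elif rate < 0.2:
                sigma *= factor     # fewer than 1/5 successes: shrink the steps
            successes = 0
    return x, fx, sigma


start = [5.0, -3.0, 2.0]
best, best_f, final_sigma = one_plus_one_es(sphere, start)
print(best_f < sphere(start))   # True
```

For a maximised fitness such as Cosine Similarity, only the comparison `f_child < fx` would change direction.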

The multi-membered ES differs from the (1+1)-ES in its population size (the number of individuals) in each generation (iteration) (Beyer and Schwefel, 2002; Schwefel, 1981). Furthermore, individuals are selected for mating at random. Multi-membered ES also use a recombination procedure, which is similar to crossover in Genetic Algorithms. The best-known multi-membered ES techniques are the (µ+λ)-ES and the (µ, λ)-ES. In the (µ+λ)-ES, the µ parent individuals are used to create λ offspring individuals in each generation, where λ ≥ 1. The worst λ individuals out of all µ+λ individuals are discarded, and the best µ individuals are then used as the parent individuals for the next generation. The (µ, λ)-ES, on the other hand, has a different selection procedure. The µ parent individuals are again used to create λ offspring individuals in each generation, but the parent individuals are then discarded and the best µ of the λ offspring individuals are used as the parent individuals for the next generation. As in the (1+1)-ES, fitness functions are used in the (µ+λ)-ES and the (µ, λ)-ES to check the quality of the proposed evolved solutions. The standard deviations of the mutation are no longer constant, nor are they changed by a deterministic rule such as the "1/5 success rule". Instead, they are incorporated into the individuals and evolve with them. To create offspring individuals from parent individuals, the ES technique works as follows:

1) The algorithm selects two or more individuals for recombination. Assume that the following two individuals are selected:

$$(x^1, \sigma^1) = ((x_1^1, \ldots, x_n^1), (\sigma_1^1, \ldots, \sigma_n^1)) \quad \text{and} \quad (x^2, \sigma^2) = ((x_1^2, \ldots, x_n^2), (\sigma_1^2, \ldots, \sigma_n^2)) \tag{2.3.34}$$

where $x^1$ and $x^2$ are the chromosomes of individuals one and two, while $\sigma^1$ and $\sigma^2$ are the vectors of mutation step sizes (standard deviations) of individuals one and two, respectively. There are two well-known ways of applying the recombination (crossover) operator:

A) Discrete recombination, which produces the new offspring

$$(x', \sigma') = ((x_1^{pop_1}, \ldots, x_n^{pop_n}), (\sigma_1^{pop_1}, \ldots, \sigma_n^{pop_n})) \tag{2.3.35}$$

where $pop_i = 1$ or $pop_i = 2$ for $i = 1, \ldots, n$. Thus, each component is taken from either individual one or individual two.

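B) Intermediate recombination, which is similar to the uniform crossover in the Genetic Algorithm (GA) (Le, 2011). When using intermediate recombination, the new offspring becomes:

$$(x', \sigma') = \left(\left(\frac{x_1^1 + x_1^2}{2}, \ldots, \frac{x_n^1 + x_n^2}{2}\right), \left(\frac{\sigma_1^1 + \sigma_1^2}{2}, \ldots, \frac{\sigma_n^1 + \sigma_n^2}{2}\right)\right) \tag{2.3.36}$$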

These recombination types can be used to drive the proposed evolved solutions towards the global optimal solution. They can be applied to pairs of individuals or to more than two individuals at a time to produce a new population of offspring individuals.
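The two recombination operators can be sketched as follows. This is an illustrative sketch under stated assumptions: an individual is stored as a tuple `(x, sigma)` of two lists, discrete recombination picks the source parent of each component with probability 1/2, and intermediate recombination is assumed to take the component-wise mean of the two parents, which is its usual definition. The function names are hypothetical.

```python
import random


def discrete_recombination(parent1, parent2, rng):
    """Copy each (x_i, sigma_i) pair from parent one or parent two at random."""
    x_new, s_new = [], []
    for i in range(len(parent1[0])):
        source = parent1 if rng.random() < 0.5 else parent2   # pop_i = 1 or pop_i = 2
        x_new.append(source[0][i])
        s_new.append(source[1][i])
    return x_new, s_new


def intermediate_recombination(parent1, parent2):
    """Average the two parents component by component."""
    (x1, s1), (x2, s2) = parent1, parent2
    x_new = [(a + b) / 2 for a, b in zip(x1, x2)]
    s_new = [(a + b) / 2 for a, b in zip(s1, s2)]
    return x_new, s_new


rng = random.Random(1)
p1 = ([1.0, 2.0, 3.0], [0.25, 0.5, 0.75])
p2 = ([5.0, 6.0, 7.0], [0.75, 1.0, 1.25])
print(discrete_recombination(p1, p2, rng))
print(intermediate_recombination(p1, p2))   # ([3.0, 4.0, 5.0], [0.5, 0.75, 1.0])
```

In the discrete case the same index `pop_i` selects both the chromosome component and its step size, so each component keeps the step size it was evolved with.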

2) The offspring $(x', \sigma')$ obtained from the recombination is then mutated as follows:

$$\sigma_{offspring} = \sigma' \cdot e^{N(0, \Delta\sigma')}, \quad \text{and} \quad x_{offspring} = x' + N(0, \sigma_{offspring}) \tag{2.3.37}$$

where $\Delta\sigma'$ is the variation parameter of the mutation standard deviation. To control the convergence rate, Schwefel proposed an additional control parameter (Beyer and Schwefel, 2002). Assuming that the additional control parameter is $\theta$ and that each individual is $(x, \sigma, \theta)$, the mutation in Equation 2.3.37 becomes:

$$\sigma_{offspring} = \sigma' \cdot e^{N(0, \Delta\sigma')}, \quad \theta_{offspring} = \theta' + N(0, \Delta\theta'), \quad \text{and} \quad x_{offspring} = x' + C(0, \sigma_{offspring}, \theta_{offspring}) \tag{2.3.38}$$

where $C(0, \sigma_{offspring}, \theta_{offspring})$ is a vector of Gaussian random numbers with zero mean and the appropriate probability density.

More recently, Hansen proposed the use of covariance matrix adaptation to adapt and control the convergence rate (Hansen, 2016; Back et al., 2013). This technique is called the Covariance Matrix Adaptation Evolution Strategy (CMA-ES). The historical development from the (1+1)-ES to the state-of-the-art ES (p-sep-lmm-CMA-ES) is presented in (Back et al., 2013). However, memory size and run-time are issues in CMA-ES, because the covariance matrix and the calculations on it in each iteration require more memory and more computational run-time. Thus, recent research (Back et al., 2013) proposed the (1+1)-CMA-ES with various adaptations to reduce the memory size and the computational run-time. Igel et al. showed that the (1+1)-Cholesky-CMA-ES is more efficient than the multi-membered CMA-ES on some unimodal fitness functions (Ackley, Rastrigin and Griewangk) (Igel et al., 2006). Recently, Loshchilov proposed a computationally efficient limited-memory CMA-ES for large-scale optimisation (LM-CMA-ES) (Loshchilov, 2014). In this technique, vectors of random weights are used to adapt the mutation and the convergence rate. One of these vectors is a vector of random Ziggurat numbers. The Ziggurat random numbers are positive Gaussian random numbers in the range between 0 and 1. This type of random number is also used in the (1+1)-Evolutionary Gradient Strategy for the evolving global weight technique (Chapter 6). The use of a gradient step size gives more control over the convergence rate for new fitness functions such as Cumulative Cosine Similarity. The ES techniques proposed in this thesis are presented in detail in Chapters 6 and 7.
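To connect the self-adaptive mutation of Equation 2.3.37 with the (µ+λ) and (µ, λ) selection schemes described earlier, one generation of a multi-membered ES can be sketched as below. This is a simplified illustration, not the CMA-ES variants discussed above and not the thesis implementation: recombination and the rotation parameter $\theta$ are omitted, the sphere function is minimised as a stand-in fitness, and the names `mutate`, `next_generation` and `delta_sigma` (with its value 0.3) are hypothetical.

```python
import math
import random


def sphere(x):
    """Stand-in fitness to minimise."""
    return sum(v * v for v in x)


def mutate(x, sigma, delta_sigma, rng):
    """Equation 2.3.37: log-normal update of sigma, then Gaussian update of x."""
    new_sigma = [s * math.exp(rng.gauss(0.0, delta_sigma)) for s in sigma]
    new_x = [v + rng.gauss(0.0, s) for v, s in zip(x, new_sigma)]
    return new_x, new_sigma


def next_generation(parents, lam, fitness, rng, plus=True, delta_sigma=0.3):
    """Create lam offspring by mutation and return the best len(parents) survivors."""
    mu = len(parents)
    # Each offspring is produced from a parent picked at random.
    offspring = [mutate(*rng.choice(parents), delta_sigma, rng) for _ in range(lam)]
    # (mu + lambda): parents compete with offspring; (mu, lambda): offspring only.
    pool = parents + offspring if plus else offspring
    return sorted(pool, key=lambda ind: fitness(ind[0]))[:mu]


rng = random.Random(0)
mu, lam = 3, 12
population = [([rng.uniform(-5, 5) for _ in range(2)], [1.0, 1.0]) for _ in range(mu)]
for _ in range(100):
    population = next_generation(population, lam, sphere, rng, plus=False)
print(sphere(population[0][0]))
```

With `plus=True` the best fitness can never get worse from one generation to the next, while `plus=False` lets every parent be forgotten, which is why the (µ, λ) scheme needs at least as many offspring as parents.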

In document Evolutionary algorithms and machine learning techniques for information retrieval (Page 66-71)