Evolutionary Algorithms - Constructive and evolutionary algorithms for airport baggage sorting

An Evolutionary Algorithm (EA) is a population-based mechanism inspired by the processes of biological evolution, such as reproduction, mutation, recombination (population selection), and parent selection (member selection). These processes are based on the theory of natural selection of Darwin and Wallace (1858), as developed in Darwin’s classic foundational work (On the Origin of Species by Means of Natural Selection, or the Preservation of Favoured Races in the Struggle for Life, Darwin (1859)), and on Mendelian genetics (Experiments in Plant Hybridisation, Mendel (1865)), which are recognised as the foundation of evolutionary biology.

2.7. EVOLUTIONARY ALGORITHMS 25

Genetic Algorithms (GAs) have been used in the solution of a wide range of problems and are one of the methodologies belonging to the population-based model of EAs. GAs encode the problem solutions on a chromosome-like data structure, the population being composed of solutions. Solutions are selected according to their allocated reproductive opportunities, following which recombination operators are applied in order to produce new solutions in the solution search space. The genetic principles were taken from biology and then applied to artificial systems, based on the work of Holland (1975) and DeJong (1975), which constituted the origin of GAs.

The early theoretical studies of GAs included such works as Vose and Liepins (1991), which aimed at achieving a better understanding of the Simple Genetic Algorithm (alternatively titled the Canonical Genetic Algorithm (CGA)) with the support of matrices (the Walsh matrix), and Prügel-Bennett and Shapiro (1994), which applied a statistical mechanics-style approach in order to explain GA behaviour. Hinton and Nowlan (1987) investigated the way in which learning can mould the fitness landscape, since an individual’s fitness consists of a genetic contribution and a learned contribution. Goldberg (1990), Whitley (1991) and Holland (1975) explored the problems of exploiting linkage and the recombination of tagged representations. Eiben et al (1995), Tsutsui and Jain (1998), and Eiben (2003) studied the effect of using multiple parents and multiple crossover points. These studies emphasise the importance of operators. Blickle and Thiele (1996) presented an analysis of several selection schemes, with the objective of overcoming the premature convergence problem, in which the population converges on a sub-optimal solution and offspring no longer improve on their parents. Some typical selection operators are shown in Section 2.7.1.

Theoretical studies of GAs, however, were and still are based on a binary problem representation, which arguably restricts their applicability but undoubtedly assists an overall understanding of how GAs work.

The terms phenotype and genotype are typically used in genetics to assist in the explanation and comparison of individuals. The phenotype is the observable realisation of an individual (in this thesis an individual is the equivalent of a solution), whereas the genotype refers to the genetic makeup of the same individual. For example, when considering the two genes determining the organism’s gender (X and Y), two of these genes are necessary to represent the gender, so that XX represents a female and YX represents a male. The genotype is the combination of these genes, that is to say YX or XX, and being male or female is the phenotype, as shown in Figure 2.8.

Figure 2.8: Examples of phenotype and genotype.

GAs differ from other methods in that they search among a population of solutions (hence the term population-based algorithm), and work with the encoded parameter set, which constitutes the genotype, rather than with the parameter values themselves.

The CGA was introduced by Holland (1975), using a binary model, and the Schema theorem was then developed to explain it. The next population of solutions, of a predetermined size, is generated by applying a replacement strategy to the current population, here referred to as the population selector. A replacement strategy selects solutions from a given population to take part in creating the next population. The members of the next population are used as parents in producing a new population of solutions. The selection of the parents in generating a new solution is called parent selection, or the member selector. A crossover operator is then applied, with a high probability, to the solutions taken from the next population (which constitute the parent solutions) to produce the new solutions, which may be modified once more by a mutation operator applied with a low probability, finally constituting the current population. The process described represents one generation. These operations are repeated until one of the stopping conditions is reached, whereupon the new solutions are assessed for use in the final solutions, as demonstrated in Figure 2.9.


Figure 2.9: Flowchart of the Canonical Genetic Algorithm (CGA).
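The generational cycle described above can be illustrated with a minimal sketch. It is not the implementation used in this thesis: the OneMax fitness, the one-point crossover, the fitness-proportional parent selection and every parameter value are assumptions chosen only to make the example self-contained.

```python
import random


def one_max(bits):
    """Illustrative fitness (an assumption): the number of 1-bits."""
    return sum(bits)


def canonical_ga(n_bits=20, pop_size=30, p_cross=0.9, p_mut=0.01,
                 generations=50, seed=0):
    """Toy generational GA on binary chromosomes; pop_size is assumed even."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=one_max)
    for _ in range(generations):
        # Parent (member) selection: fitness-proportional; +1 avoids all-zero weights.
        weights = [one_max(ind) + 1 for ind in pop]
        parents = rng.choices(pop, weights=weights, k=pop_size)
        offspring = []
        for a, b in zip(parents[0::2], parents[1::2]):
            a, b = a[:], b[:]
            if rng.random() < p_cross:      # crossover with a high probability
                cut = rng.randrange(1, n_bits)
                a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
            offspring += [a, b]
        # Bit-flip mutation with a low per-bit probability.
        pop = [[bit ^ 1 if rng.random() < p_mut else bit for bit in ind]
               for ind in offspring]
        best = max(pop + [best], key=one_max)   # remember the best solution seen
    return best
```

Here the stopping condition is simply a fixed number of generations, and the offspring replace the whole population, which corresponds to the simplest replacement strategy.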

Evolutionary Strategies (ESs) are a sub-class of nature-inspired search methods belonging to the class of EAs and are based on the work of Rechenberg (1971). The canonical versions of the ESs are denoted by (µ, λ)-ES and (µ+λ)-ES, where µ is the number of parents and λ is the number of offspring. The (µ, λ)-ES is closer to the generational model used in the CGA, where the offspring replace the parents and take part in the next generation, which requires λ ≥ µ. In the (µ+λ)-ES, µ parents produce λ offspring and the new population of µ parents is selected from the combined population of offspring and parents.
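The difference between the two schemes lies only in the pool from which the next µ parents are taken. The sketch below assumes that fitness is maximised and that the µ fittest candidates are kept; the function names are hypothetical.

```python
def survivors_comma(parents, offspring, mu, fitness):
    """(mu, lambda)-ES: the next parents come from the offspring only."""
    if len(offspring) < mu:
        raise ValueError("(mu, lambda) selection needs lambda >= mu")
    return sorted(offspring, key=fitness, reverse=True)[:mu]


def survivors_plus(parents, offspring, mu, fitness):
    """(mu + lambda)-ES: the next parents come from parents and offspring combined."""
    return sorted(parents + offspring, key=fitness, reverse=True)[:mu]
```

With parents `[5.0, 4.0]`, offspring `[3.0, 4.5, 1.0]`, µ = 2 and the identity as fitness, the comma scheme keeps `[4.5, 3.0]` and discards the best parent, whereas the plus scheme keeps `[5.0, 4.5]`.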

2.7.1 Selection Approaches

The fitness function defines a scalar value for each individual, which the selection method uses to compare individuals. As the population loses distinct fitness values, the selection pressure on individuals sharing the same fitness is reduced. Some common selection approaches (selectors) are presented here and are used in the study conducted in this thesis.

Elitist Selection

The Elitist Selection (ES) selects the fittest µ population members from the current population.

Roulette Wheel Parent Selection

The Roulette Wheel Member Selection (RWMS) was originally used by Holland (1975). Each solution i in the population of λ solutions (1 ≤ i ≤ λ) is assigned a selection probability $p_i$ which is proportional to its fitness $f_i$, Equation 2.1.

$$p_i = \frac{f_i}{\sum_{j=1}^{\lambda} f_j} \quad \text{for } i \in [1 \ldots \lambda] \tag{2.1}$$

A section of a roulette wheel is assigned to each of the solutions based on its probability: with $s_0 = 0$ and $s_i = \sum_{j=1}^{i} p_j$, solution i is assigned the section $[s_{i-1}, s_i)\ \forall i \in [1 \ldots \lambda]$.

A random number between zero (included) and one (excluded), represented here as rnd[0,1), is obtained, and the section within which the random number falls identifies the solution to select, i.e. solution i is selected when $s_{i-1} \le \mathrm{rnd}[0,1) < s_i$.
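One spin of the wheel can be sketched as follows, assuming non-negative fitness values with a positive sum. The function name and the optional `rng` argument are illustrative; indices are 0-based, so the returned value is i − 1 in the notation above.

```python
import bisect
import itertools
import random


def roulette_select(fitness, rng=random):
    """Return the 0-based index of one solution, chosen with probability f_i / sum(f)."""
    total = sum(fitness)
    # Cumulative section boundaries s_1 .. s_lambda; s_0 = 0 is implicit.
    bounds = list(itertools.accumulate(f / total for f in fitness))
    r = rng.random()  # rnd[0, 1)
    # Index of the first boundary above r; min() guards against round-off in s_lambda.
    return min(bisect.bisect_right(bounds, r), len(fitness) - 1)
```

Selecting µ solutions requires µ independent calls, one spin per solution.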

One spin of the roulette wheel (rnd[0,1)) is required per solution to be selected, whereas Stochastic Universal Sampling (SUS) obtains all of the required solutions with a single spin. Given that the selections are independent of each other in both the Tournament Member Selection (TMS) and the RWMS, Blickle and Thiele (1996) showed that there is a relatively high mean variation in the outcome of selecting the solutions in a population, which can be almost completely eliminated by using SUS (Baker (1987)).

Blickle and Thiele (1996) looked at different selection methods for discrete and continuous problems, and at their selection variance (fitness before and after selection). Based on the assumption that higher variance is advantageous, they concluded that the Roulette Wheel Selection method is not appropriate as a selection scheme and that the Exponential Ranking Selection is the best selection scheme. They also pointed out that, for a better understanding of the behaviour, it is necessary to consider the operators used.

Tournament Selection

In the TMS, commonly called Tournament Selection, a few individuals are chosen at random from the population, where all members of the population have the same probability of being chosen (Goldberg (1990); Goldberg and Deb (1991)). The fittest of the chosen individuals is then selected.
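A minimal sketch of one tournament follows. The tournament size `k` and the choice to draw contestants with replacement are assumptions, since the text only states that a few individuals are chosen uniformly at random.

```python
import random


def tournament_select(fitness, k=2, rng=random):
    """Return the index of the fittest of k individuals drawn uniformly at random."""
    # Contestants are drawn with replacement (an assumption of this sketch).
    contestants = [rng.randrange(len(fitness)) for _ in range(k)]
    return max(contestants, key=lambda i: fitness[i])
```

A larger `k` increases the selection pressure, because the best individual is more likely to take part in each tournament.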

Stochastic Universal Sampling

The SUS was introduced by Baker (1987) to reduce bias and inefficiency in the selection of individuals. The SUS exhibits less bias and spread (range of possible values for the number of an individual’s offspring) than the Roulette Wheel Selection. The λ members of the population are mapped to sections, as in the Roulette Wheel Selection, in the range $[p_{i-1}, p_i)\ \forall i \in [1 \ldots \lambda]$, with $p_0 = 0$, based on their fitness $f_i$:

$$p_i = \frac{\sum_{j=1}^{i} f_j}{\sum_{j=1}^{\lambda} f_j}$$

µ individuals are selected by obtaining an initial random number within $[0, \frac{1}{\mu})$, i.e. $r_0 = \mathrm{rnd}[0, \frac{1}{\mu})$, with each subsequent pointer placed $\frac{1}{\mu}$ after the previous one. The solution i is selected once for each $j \in [1 \ldots \mu]$ for which $p_{i-1} \le \frac{j-1}{\mu} + r_0 < p_i$.
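The single-spin procedure can be sketched as below, again assuming non-negative fitness values with a positive sum. The function name is hypothetical and indices are 0-based.

```python
import itertools
import random


def sus_select(fitness, mu, rng=random):
    """Return mu 0-based indices using one random draw and mu equally spaced pointers."""
    total = sum(fitness)
    # Cumulative boundaries p_1 .. p_lambda; p_0 = 0 is implicit.
    bounds = list(itertools.accumulate(f / total for f in fitness))
    r0 = rng.random() / mu  # r_0 = rnd[0, 1/mu)
    chosen, i = [], 0
    for j in range(mu):
        pointer = j / mu + r0  # (j - 1) / mu + r_0 with 1-based j
        # Advance to the section that contains the pointer.
        while i < len(bounds) - 1 and pointer >= bounds[i]:
            i += 1
        chosen.append(i)
    return chosen
```

Because the pointers are equally spaced, a solution with cumulative share $p_i - p_{i-1}$ receives either the floor or the ceiling of µ times that share, which is the reduced spread referred to above.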

Remainder Stochastic Sampling

Remainder Stochastic Sampling (RSS) is based upon the ratio between the fitness of a solution and the average population fitness. Each solution first receives as many copies as the integer part of this ratio. In Remainder Stochastic Sampling with Replacement (RSSR), the fractional parts of the ratios are then used as weights in a roulette wheel selection, which produces the remaining population.

In Remainder Stochastic Sampling Without Replacement (RSSWR), the fractional part of an individual is set to zero once it has been selected during the fractional phase of the selection. According to Goldberg (1989), RSSR is more likely to preserve population diversity than the roulette wheel technique and provides zero bias (similarly to Stochastic Universal Modified Sampling (SUMS) and SUS).
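The two phases of RSSR can be sketched as follows. The sketch assumes non-negative fitness values with a positive mean and selects as many solutions as there are in the population; the function name is hypothetical.

```python
import math
import random


def rss_with_replacement(fitness, rng=random):
    """Select len(fitness) indices: integer parts first, the remainder by roulette wheel."""
    n = len(fitness)
    mean = sum(fitness) / n
    expected = [f / mean for f in fitness]  # ratio of fitness to average fitness
    chosen = []
    for i, e in enumerate(expected):
        chosen += [i] * math.floor(e)       # deterministic copies (integer phase)
    fractions = [e - math.floor(e) for e in expected]
    remaining = n - len(chosen)
    if remaining > 0:
        # Fractional phase: the fractional parts act as roulette wheel weights.
        chosen += rng.choices(range(n), weights=fractions, k=remaining)
    return chosen
```

An RSSWR variant would instead set `fractions[i]` to zero after solution i has been picked in the fractional phase.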

Linear Ranking Selection

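The Linear Ranking Selection (LRS) was first suggested by Baker (1989), Whitley (1989) and Bäck and Hoffmeister (1991). For a population ordered in ascending fitness, the probability assigned to an individual i for a population of size λ is provided by Equation 2.2, where $p_1$ is the probability of the worst individual being selected and $p_\lambda$ is the probability of the best individual being selected.

$$p_i = p_1 + (p_\lambda - p_1)\,\frac{i-1}{\lambda-1} \quad \forall\, i \in [1 \ldots \lambda], \qquad p_\lambda = \frac{2}{\lambda} - p_1 \quad \text{and} \quad 0 \le p_1 \le 1 \tag{2.2}$$

For the best individual to be at least as likely to be selected as the worst ($p_\lambda \ge p_1$), $p_1$ must in addition satisfy $p_1 \le \frac{1}{\lambda}$. All individuals have a different rank so all receive a different probability, even if they are of the same fitness.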
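Equation 2.2 can be evaluated directly, as in the sketch below; the function name is hypothetical and the ranks run from 1 (worst) to λ (best).

```python
def linear_ranking_probs(lam, p1):
    """Selection probabilities for ranks 1..lam (worst to best), following Equation 2.2."""
    if lam < 2:
        raise ValueError("linear ranking needs at least two individuals")
    p_lam = 2 / lam - p1  # probability of the best individual
    return [p1 + (p_lam - p1) * (i - 1) / (lam - 1) for i in range(1, lam + 1)]
```

For λ = 4 and p1 = 0.1 the probabilities rise linearly from 0.1 to 0.4 and sum to one, up to floating-point rounding.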

Exponential Ranking Selection

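The Exponential Ranking Selection (ERS) differs from LRS only in that the assigned probabilities are exponentially weighted, Equation 2.3. Blickle and Thiele (1996) discussed the meaning and the influence of the parameter c in detail.

$$p_i = \frac{c^{\lambda-i}}{\sum_{j=1}^{\lambda} c^{\lambda-j}} \quad \forall\, i \in [1 \ldots \lambda] \quad \text{and} \quad 0 < c < 1 \tag{2.3}$$

Given that $\sum_{i=1}^{\lambda} p_i = 1$, and since the denominator is a geometric series with sum $\frac{c^{\lambda}-1}{c-1}$, Equation 2.3 can be written as

$$p_i = \frac{c-1}{c^{\lambda}-1}\, c^{\lambda-i} \quad \text{for } i \in [1 \ldots \lambda] \quad \text{and} \quad 0 < c < 1 \tag{2.4}$$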
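The equivalence of Equations 2.3 and 2.4 can be checked numerically with a short sketch; the function names are hypothetical.

```python
def exponential_ranking_probs(lam, c):
    """Equation 2.3: weights c**(lam - i), normalised over the population."""
    if not 0 < c < 1:
        raise ValueError("c must lie strictly between 0 and 1")
    weights = [c ** (lam - i) for i in range(1, lam + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def exponential_ranking_closed_form(lam, c):
    """Equation 2.4: the same probabilities using the geometric-series sum."""
    return [(c - 1) / (c ** lam - 1) * c ** (lam - i) for i in range(1, lam + 1)]
```

Values of c close to one give nearly uniform probabilities, whereas small values of c concentrate the probability on the best-ranked individuals.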

2.7.2 Multi-objective Optimisation

Multi-objective optimisation has been applied in many areas where decisions need to be taken in the presence of trade-offs between two or more conflicting objectives. The difficulty arises because, in the case of a non-trivial Multi-Objective Optimisation Problem (MOOP), a single solution which simultaneously optimises each objective does not exist. Given this trade-off, a solution is known as non-dominated, or Pareto optimal (Pareto (1909), Tarascio (1968)), when none of its objectives can be improved without degrading one or more of the other objectives. The non-dominated solutions constitute what is known as the Pareto front. It is therefore necessary to find as many Pareto-optimal (non-dominated) solutions as possible (Michalewicz and Fogel (2002) and Burke and Kendall (2005)).
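The notion of dominance can be made concrete with a small sketch, assuming that every objective is to be minimised and that solutions are represented by tuples of objective values. The function names are hypothetical.

```python
def dominates(a, b):
    """True if objective vector a dominates b (all objectives are minimised)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points):
    """Return the non-dominated objective vectors, in their original order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

For the points (1, 5), (2, 2), (3, 3) and (5, 1), only (3, 3) is dominated, by (2, 2), so the other three form the Pareto front of the set.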

The MOOP has been solved as a single-objective optimisation problem where a single fitness function is used, i.e. a weighted sum of all the objectives (Prem Kumar and Bierlaire (2013), Dorndorf et al (2010), Hu and Di Paolo (2009), Pesch et al (2008), Dorndorf et al (2007a), Lim et al (2005) in the AGAP, and Ascó et al (2013), Ascó et al (2012), Ascó et al (2011) and Abdelghany et al (2006) in the ABSSAP). In a single-objective optimisation problem, the aim is to find one solution which optimises the combined fitness function. In MOOPs, the aim is more than merely finding optimal solutions for each objective. The objective functions in multi-objective problems constitute a multi-dimensional space (the objective space), in addition to the decision variable space common to all optimisation problems. Although the search process of an algorithm takes place in the decision variable space, multi-objective EAs use the objective space information in their search operators. In a multi-objective approach the aims are commonly convergence to the Pareto front and maintenance of a set of maximally-spread Pareto-optimal solutions. Most multi-objective optimisation algorithms use the idea of dominance in their search for solutions to reach and build the Pareto front.

The weighted sum approach is a commonly used classical multi-objective optimisation approach, which consists of converting the multi-objective problem into a single objective as the combined weighted sum of each objective. Its conceptual simplicity is complicated by the need to determine appropriate weights, the answer to which is not unique, as it depends on the importance given to each objective. This approach of combining multiple objectives into a single one is used in this study. Another classic approach is the ε-Constraint method introduced in Haimes et al (1971), which keeps one objective whilst restricting the remaining objectives.
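The dependence of the result on the weights can be illustrated with a minimal sketch. The objective values and weights below are invented for the illustration and are not those used in this study; all objectives are assumed to be minimised.

```python
def weighted_sum(objectives, weights):
    """Combine several objective values into one scalar fitness."""
    if len(objectives) != len(weights):
        raise ValueError("one weight is needed per objective")
    return sum(w * f for w, f in zip(weights, objectives))


# Three mutually non-dominated candidate solutions (illustrative values).
candidates = [(1, 5), (2, 2), (5, 1)]


def best_for(weights):
    """Candidate with the smallest weighted sum for the given weights."""
    return min(candidates, key=lambda s: weighted_sum(s, weights))
```

With equal weights (0.5, 0.5) the compromise (2, 2) is preferred, whereas the weights (0.9, 0.1) favour (1, 5) and the weights (0.1, 0.9) favour (5, 1), which shows that each choice of weights yields a different preferred solution.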

EAs combine methodologies which allow an efficient means of finding multiple Pareto-optimal solutions in a single run. Srinivas and Deb (1994) introduced the Non-dominated Sorting Genetic Algorithm (NSGA), which was later modified in Kalyanmoy Deb and Meyarivan (2002), who introduced the Elitist Nondominated Sorting Genetic Algorithm II (NSGA-II) with the intention of overcoming some of the problems of the original NSGA. More recently, Hanne (2009) presented a GA known as the Primal-Dual Multiobjective Optimisation Algorithm (PDMOEA), which considers infeasible solutions and uses populations of variable size. The results show that by extending the search to infeasible regions, the population may more easily reach new parts of the Pareto front.

The Strength Pareto Evolutionary Algorithm (SPEA) was introduced in Zitzler and Thiele (1999) and further improved in Zitzler et al (2001) (SPEA2), which incorporates a fine-grained fitness assignment strategy, a density estimation technique, and an enhanced archive truncation method. The Improved Strength Pareto Evolutionary Algorithm 2 (ISPEA2) presented in Sheng et al (2012) is a more recent extension of the SPEA. Other multi-objective optimisation approaches are the Vector Evaluated Genetic Algorithm (VEGA) (Schaffer (1984) and Schaffer (1985)), and the Pareto Archived Evolution Strategy (PAES) (Knowles and Corne (2003), Knowles and Corne (2000), Knowles and Corne (1999b) and Knowles and Corne (1999a)), which uses a simple (1+1) local search evolution strategy.

Coello et al (2007) provided a comprehensive survey of EAs for multi-objective optimisation, and the survey in Castillo Tapia and Coello Coello (2007) concentrated on multi-objective optimisation in the areas of economics and finance.

2.7.3 Diversity

The population diversity of an EA is an important factor in avoiding premature convergence (Michalewicz (1996)). For many EAs a key obstacle to finding the global optimal solution is insufficient solution diversity, which causes the algorithm to become trapped in a local optimum. The solution diversity can be influenced by algorithm parameters such as the population size, the operators and the diversity preservation approaches. The selection methods, some of which are presented in Section 2.7.1, are one such diversity preservation approach. A survey of measures used to capture diversity in genetic programming was provided in Burke et al (2004).

Other approaches used to promote diversity are as follows:

Ageing: Syswerda (1990) used ageing to help maintain diversity in the population. Arabas et al (1994) and Kubota and Fukuda (1997) used ageing approaches to resolve the premature convergence problem. Ghosh et al (1998) incorporated an ageing approach in which new individuals begin with an age of zero that increases at every iteration; the age is then used to calculate their effective fitness value, which therefore changes dynamically.

Island model: this model considers the geographical distribution of individuals (Martin et al (1997)). It is used in parallel distributed GAs, surveyed in Knysh and Kureichik (2010) and Cantú-Paz (1998).

Crowding technique: this was introduced by DeJong (1975) as a technique for preserving population diversity and preventing premature convergence. Crowding is applied to generate the next generation in GAs. The next generation is composed of the individuals selected using the crowding technique among those in the current population and their offspring. Crowding is composed of two main stages: pairing and replacement. In the pairing stage, offspring individuals are paired with individuals in the current population according to a similarity metric. In the replacement stage, it is decided for each pair of individuals which of them will remain in the population. A review of crowding approaches for GAs can be found in Mengshoel and Goldberg (2008).
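The pairing and replacement stages can be sketched as below. The sketch is only one possible reading of the description: the rule of keeping the fitter member of each pair, the maximisation of fitness and the function name are assumptions, and the crowding variants reviewed in the literature differ in exactly these choices.

```python
def crowding_replacement(population, offspring, fitness, distance):
    """Pair each offspring with its most similar member and keep the fitter of the two."""
    survivors = list(population)
    for child in offspring:
        # Pairing stage: the nearest current member under the similarity metric.
        j = min(range(len(survivors)), key=lambda k: distance(child, survivors[k]))
        # Replacement stage (assumed rule): the child takes the slot only if it is fitter.
        if fitness(child) > fitness(survivors[j]):
            survivors[j] = child
    return survivors
```

Because an offspring can only displace the member it most resembles, distinct regions of the search space keep their representatives, which is how the technique preserves diversity.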

In document Constructive and evolutionary algorithms for airport baggage sorting station and gate assignment problems (Page 46-55)