Fitness Assignment - Evolutionary Algorithms


Ghosh and Jain [574]: Evolutionary Computation in Data Mining
Miettinen, Mäkelä, Neittaanmäki, and Periaux [565]: Evolutionary Algorithms in Engineering and Computer Science
Fogel [714]: Evolutionary Computation: Principles and Practice for Signal Processing
Ashlock [715]: Evolutionary Computation for Modeling and Optimization
Watanabe and Hashem [584]: Evolutionary Computations – New Algorithms and their Applications to Evolutionary Robots
Cagnoni, Lutton, and Olague [585]: Genetic and Evolutionary Computation for Image Processing and Analysis
Kramer [716]: Self-Adaptive Heuristics for Evolutionary Computation
Lobo, Lima, and Michalewicz [717]: Parameter Setting in Evolutionary Algorithms
Spears [718]: Evolutionary Algorithms – The Role of Mutation and Recombination
Eiben and Michalewicz [719]: Evolutionary Computation
Jin [720]: Knowledge Incorporation in Evolutionary Computation
Grosan, Abraham, and Ishibuchi [721]: Hybrid Evolutionary Algorithms
Abraham, Jain, and Goldberg [722]: Evolutionary Multiobjective Optimization
Kallel, Naudts, and Rogers [723]: Theoretical Aspects of Evolutionary Computing
Ghosh and Tsutsui [724]: Advances in Evolutionary Computing – Theory and Applications
Yang, Shan, and Bui [725]: Success in Evolutionary Computation
Pereira and Tavares [726]: Bio-inspired Algorithms for the Vehicle Routing Problem

2.3 Fitness Assignment

2.3.1 Introduction

With the concept of prevalence comparisons introduced in Section 1.2.2 on page 9, we define a partial order on the elements of the problem space X. Many selection algorithms, however, require that a total order be imposed on the individuals p of a population. Such a total order can be created by assigning a single real number v(p.x) to each solution candidate p.x: its fitness.

The fitness assigned to an individual may not just reflect its rank in the population, but can also incorporate density/niching information. This way, not only the quality of a solution candidate is considered, but also the overall diversity of the population. This can improve the chance of finding the global optima as well as the performance of the optimization algorithm significantly. If many individuals in the population occupy the same rank or do not dominate each other, for instance, such information will be very helpful.

The fitness v(p.x) thus not only depends on the solution candidate p.x itself, but on the whole population Pop of the evolutionary algorithm and on the archive Arc of optimal elements, if available. In practical realizations, the fitness values are often stored in a special member variable in the individual records. Therefore, v(p.x) can be considered as a mapping that returns the value of such a variable which has previously been stored there by a fitness assignment process “assignFitness”.

Definition 2.5 (Fitness Assignment). A fitness assignment process “assignFitness” creates a function v : X ↦ R⁺ which relates a scalar fitness value to each solution candidate in the population Pop (Equation 2.1) and, if an archive is available, in the archive Arc (Equation 2.2).

$$v = \mathrm{assignFitness}(\mathit{Pop}, \mathit{cmp}_F) \Rightarrow v(p.x) \in V \subseteq \mathbb{R}^+ \quad \forall p \in \mathit{Pop} \qquad (2.1)$$

$$v = \mathrm{assignFitness}(\mathit{Pop}, \mathit{Arc}, \mathit{cmp}_F) \Rightarrow v(p.x) \in V \subseteq \mathbb{R}^+ \quad \forall p \in \mathit{Pop} \cup \mathit{Arc} \qquad (2.2)$$

In the context of this book, we generally minimize fitness values, i.e., the lower the fitness of a solution candidate, the better. Therefore, many of the fitness assignment processes based on the prevalence relation will obey Equation 2.3. This equation represents a general relation; sometimes it is useful to violate it for some individuals in the population, especially when crowding information is incorporated.

$$p_1.x \succ p_2.x \Rightarrow v(p_1.x) < v(p_2.x) \quad \forall p_1, p_2 \in \mathit{Pop} \cup \mathit{Arc} \qquad (2.3)$$
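Whether a given fitness assignment obeys Equation 2.3 can be checked mechanically. The following is only a minimal sketch under stated assumptions: Pareto dominance under minimization stands in for the prevalence relation, and the names `dominates` and `respects_prevalence` as well as the three candidates are hypothetical and not part of this book's notation.

```python
from itertools import permutations


def dominates(a, b):
    """Pareto dominance under minimization (assumed stand-in for prevalence)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def respects_prevalence(objectives, fitness):
    """True if 'a prevails b' always implies fitness[a] < fitness[b]."""
    return all(
        fitness[a] < fitness[b]
        for a, b in permutations(objectives, 2)
        if dominates(objectives[a], objectives[b])
    )


# Three hypothetical candidates with objective vectors (f1, f2).
objectives = {"a": (1, 7), "b": (2, 4), "c": (2, 7)}

# "a" and "b" both prevail "c"; the first assignment honors this, the second does not.
ok = respects_prevalence(objectives, {"a": 0, "b": 0, "c": 2})
broken = respects_prevalence(objectives, {"a": 1, "b": 0, "c": 1})
print(ok, broken)
```

Such a check is useful mainly as a debugging aid, since fitness assignment processes that incorporate crowding information may violate the relation on purpose.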

2.3.2 Weighted Sum Fitness Assignment

The most primitive fitness assignment strategy is to assign a weighted sum of the objective values. This approach is very static and comes with the same problems as the weighted-sum-based approach for defining what an optimum is, introduced in Section 1.2.2 on page 11. It makes no use of the prevalence relation.

For computing the weighted sum of the different objective values of a solution candidate, we reuse Equation 1.4 on page 11 from the weighted sum optimum definition. The weights have to be chosen in a way that ensures that v(p.x) ∈ R⁺ holds for all individuals p.

$$v(p.x) = \mathrm{assignFitnessWeightedSum}(\mathit{Pop}) \Leftrightarrow \forall p \in \mathit{Pop} \Rightarrow v(p.x) = g(p.x) \qquad (2.4)$$
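As an illustration of Equation 2.4, a minimal sketch could look as follows. It assumes that g is the plain weighted sum of the objective values; the function name, the two objective functions, and the weights are hypothetical choices made only for this example.

```python
def assign_fitness_weighted_sum(population, objective_functions, weights):
    """Map every solution candidate to the weighted sum of its objective values."""
    fitness = {}
    for x in population:
        g = sum(w * f(x) for w, f in zip(weights, objective_functions))
        if g < 0:
            # The weights must be chosen so that all fitness values stay in R+.
            raise ValueError(f"negative fitness {g} for candidate {x!r}")
        fitness[x] = g
    return fitness


def f1(x):
    """Hypothetical first objective: the first coordinate."""
    return x[0]


def f2(x):
    """Hypothetical second objective: the second coordinate."""
    return x[1]


population = [(1, 7), (2, 4), (10, 1)]
v = assign_fitness_weighted_sum(population, [f1, f2], [1.0, 0.5])
print(v)
```

Note that the resulting order of the candidates depends entirely on the weights, which is exactly the static behavior criticized above.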

2.3.3 Pareto Ranking

Another very simple method of fitness assignment is to use fitness values that directly reflect the prevalence relation. Figure 2.4 and Table 2.1 illustrate the Pareto relations in a population of 15 individuals and their corresponding objective values f1 and f2, both subject to minimization. To derive fitness values from the prevalence relation, we can imagine two different approaches.

Fig. 2.4: An example scenario for Pareto ranking. (Scatter plot of the 15 individuals in the objective space spanned by f1 and f2, with the Pareto frontier marked.)

In the first approach, we assign to each individual a value inversely proportional to the number of other individuals it prevails, like v(p1.x) ≡ 1 / (|{p2 ∈ Pop : p1.x ≻ p2.x}| + 1). We have written such fitness values in the column “Ap. 1” of Table 2.1 for Pareto optimization, i.e., the special case where the Pareto dominance relation is used to define prevalence. Individuals that dominate many others will here receive a lower (better) fitness value than those which prevail only a few.

| x | prevails | is prevailed by | Ap. 1 | Ap. 2 |
|---|---|---|---|---|
| 1 | {5, 6, 8, 9, 14, 15} | ∅ | 1/7 | 0 |
| 2 | {6, 7, 8, 9, 10, 11, 13, 14, 15} | ∅ | 1/10 | 0 |
| 3 | {12, 13, 14, 15} | ∅ | 1/5 | 0 |
| 4 | ∅ | ∅ | 1 | 0 |
| 5 | {8, 15} | {1} | 1/3 | 1 |
| 6 | {8, 9, 14, 15} | {1, 2} | 1/5 | 2 |
| 7 | {9, 10, 11, 14, 15} | {2} | 1/6 | 1 |
| 8 | {15} | {1, 2, 5, 6} | 1/2 | 4 |
| 9 | {14, 15} | {1, 2, 6, 7} | 1/3 | 4 |
| 10 | {14, 15} | {2, 7} | 1/3 | 2 |
| 11 | {14, 15} | {2, 7} | 1/3 | 2 |
| 12 | {13, 14, 15} | {3} | 1/4 | 1 |
| 13 | {15} | {2, 3, 12} | 1/2 | 3 |
| 14 | {15} | {1, 2, 3, 6, 7, 9, 10, 11, 12} | 1/2 | 9 |
| 15 | ∅ | {1, 2, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14} | 1 | 13 |

Table 2.1: The Pareto domination relation of the individuals illustrated in Figure 2.4.

When taking a look at these values, the disadvantage of this approach becomes clear: It promotes individuals that reside in crowded regions of the problem space and underrates those in sparsely explored areas.

By doing so, it achieves exactly the opposite of what we want. Instead of exploring the problem space and delivering a wide scan of the frontier of best possible solution candidates, it will focus all effort on one single trajectory. We will only obtain a subset of the best solutions, and it is even possible that this fitness assignment method leads to premature convergence to a local optimum. A good example of this problem is provided by the four non-prevailed individuals {1, 2, 3, 4} from the Pareto frontier. The best fitness is assigned to the element 2, followed by individual 1. Although individual 7 is dominated (by 2), its fitness is better than the fitness of the non-dominated element 3.

The solution candidate 4 gets the worst possible fitness 1, since it prevails no other element. Its chances for reproduction are as low as those of individual 15, which is dominated by all other elements except 4. Hence, both solution candidates will most probably not be selected and will vanish in the next generation. The loss of solution candidate 4 will greatly decrease the diversity and further increase the focus on the crowded area near 1 and 2.
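The fitness values of this first approach can be recomputed with a few lines of code. This is a sketch under the assumption that prevalence is plain Pareto dominance under minimization; the coordinates are the objective values (f1, f2) of the 15 individuals of Figure 2.4 as listed in Table 2.2, and the names `POINTS`, `dominates`, and `fitness_inverse_prevail_count` are hypothetical.

```python
from fractions import Fraction

# Objective values (f1, f2) of the 15 example individuals, both minimized.
POINTS = {
    1: (1, 7), 2: (2, 4), 3: (6, 2), 4: (10, 1), 5: (1, 8),
    6: (2, 7), 7: (3, 5), 8: (2, 9), 9: (3, 7), 10: (4, 6),
    11: (5, 5), 12: (7, 3), 13: (8, 4), 14: (7, 7), 15: (9, 9),
}


def dominates(a, b):
    """Pareto dominance under minimization (assumed stand-in for prevalence)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def fitness_inverse_prevail_count(points):
    """Fitness 1 / (number of prevailed individuals + 1) for every individual."""
    return {
        i: Fraction(1, 1 + sum(dominates(a, b) for b in points.values()))
        for i, a in points.items()
    }


ap1 = fitness_inverse_prevail_count(POINTS)
for i in sorted(ap1):
    print(i, ap1[i])
```

Exact fractions are used here only so that the values can be compared directly with the “Ap. 1” column; floating-point numbers would serve equally well in an optimizer.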

A much better approach for fitness assignment directly based on prevalence was first proposed by Goldberg [197]. Here, the idea is to assign to each solution candidate the number of individuals it is prevailed by [727, 118, 728, 729]. This way, the previously mentioned negative effects will not occur. As the column “Ap. 2” in Table 2.1 shows, all four non-prevailed individuals now have the best possible fitness 0. Hence, the exploration pressure is applied to a much wider area of the Pareto frontier. A closely related form of this so-called Pareto ranking works in layers: first, all non-prevailed individuals are removed from the population and assigned the rank 0. Then, the same is performed with the rest of the population, and the individuals only dominated by those on rank 0 (now non-dominated) are removed and get the rank 1. This is repeated until all solution candidates have a proper fitness assigned to them. Both variants agree on the rank 0 but can differ for prevailed individuals: individual 8, for example, is prevailed by four others (Ap. 2 = 4) but is removed in the third pass of the layered procedure and would thus receive the rank 2. Algorithm 2.3 outlines a simple way to perform Pareto ranking by counting.
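The layer-wise removal procedure described above can be sketched as follows. This is a minimal illustration, assuming Pareto dominance under minimization as the prevalence relation and the objective values of the 15 example individuals as listed in Table 2.2; the names `POINTS`, `dominates`, and `layered_ranks` are hypothetical.

```python
# Objective values (f1, f2) of the 15 example individuals, both minimized.
POINTS = {
    1: (1, 7), 2: (2, 4), 3: (6, 2), 4: (10, 1), 5: (1, 8),
    6: (2, 7), 7: (3, 5), 8: (2, 9), 9: (3, 7), 10: (4, 6),
    11: (5, 5), 12: (7, 3), 13: (8, 4), 14: (7, 7), 15: (9, 9),
}


def dominates(a, b):
    """Pareto dominance under minimization (assumed stand-in for prevalence)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def layered_ranks(points):
    """Repeatedly strip the non-prevailed individuals and give them the next rank."""
    remaining = dict(points)
    ranks = {}
    rank = 0
    while remaining:
        front = [
            i for i, a in remaining.items()
            if not any(dominates(b, a) for b in remaining.values())
        ]
        for i in front:
            ranks[i] = rank
            del remaining[i]
        rank += 1
    return ranks


ranks = layered_ranks(POINTS)
print(sorted(ranks.items()))
```

Each pass removes at least one individual, because a finite set under a non-circular order relation always contains a non-prevailed element, so the loop terminates.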

Since we follow the idea of the freer prevalence comparators instead of Pareto dominance relations, we will synonymously refer to this approach as Prevalence ranking.

As already mentioned, the fitness values of all non-prevailed elements in our example (Figure 2.4 and Table 2.1) are equally 0. However, the region around 1 and 2 has probably already been explored exhaustively, whereas the surroundings of 4 are rather unknown. A better approach to fitness assignment should incorporate such information and put a bit more pressure into the direction of 4, in order to make the evolutionary algorithm explore this area more thoroughly.

Algorithm 2.3: v ← assignFitnessParetoRank(Pop, cmp_F)
Input: Pop: the population to assign fitness values to
Input: cmp_F: the prevalence comparator defining the prevalence relation
Data: i, j, cnt: the counter variables
Output: v: a fitness function reflecting the Prevalence ranking
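The counting form of Pareto ranking, where the fitness of an individual is the number of individuals it is prevailed by, can be sketched in a few lines. This is not a transcription of Algorithm 2.3 but an illustration under assumptions: the comparator `cmp_pareto` is a hypothetical stand-in for cmp_F that returns a negative value if its first argument prevails the second, and the population consists of the objective vectors of the 15 example individuals as listed in Table 2.2.

```python
def dominates(a, b):
    """Pareto dominance under minimization."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def cmp_pareto(a, b):
    """Assumed comparator: < 0 if a prevails b, > 0 if b prevails a, else 0."""
    if dominates(a, b):
        return -1
    if dominates(b, a):
        return 1
    return 0


def assign_fitness_pareto_rank(population, cmp_f):
    """Fitness of each individual = number of individuals that prevail it."""
    fitness = []
    for i, p in enumerate(population):
        cnt = sum(1 for j, q in enumerate(population) if j != i and cmp_f(q, p) < 0)
        fitness.append(cnt)
    return fitness


# Objective vectors of individuals 1..15, in that order.
population = [
    (1, 7), (2, 4), (6, 2), (10, 1), (1, 8), (2, 7), (3, 5), (2, 9),
    (3, 7), (4, 6), (5, 5), (7, 3), (8, 4), (7, 7), (9, 9),
]
ranks = assign_fitness_pareto_rank(population, cmp_pareto)
print(ranks)
```

The two nested loops make the quadratic number of comparisons explicit; any other prevalence comparator with the same sign convention could be passed in place of `cmp_pareto`.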

2.3.4 Sharing Functions

Previously, we have mentioned that the drawback of Pareto ranking is that it does not incorporate any information about whether the solution candidates reside close to each other or in regions of the problem space which are only sparsely covered by individuals.

Sharing, as a method for including such diversity information into the fitness assignment process, was introduced by Holland [136] and later refined by Deb [730], Goldberg and Richardson [731], and Deb and Goldberg [732] [733, 115].

Definition 2.6 (Sharing Function). A sharing function Sh : R⁺ ↦ R⁺ is a function used to relate two individuals p1 and p2 to a value that decreases with their distance¹⁴ d = dist(p1, p2) in a way that it is 1 for d = 0 and 0 if the distance exceeds a specified threshold σ.

Sharing functions can be employed in many different ways and are used by a variety of fitness assignment processes [731, 734]. Typically, the simple triangular function Sh_tri [735] or one of its either convex (Sh_cvex,p) or concave (Sh_ccav,p) counterparts with the power p ∈ R⁺, p > 0 are applied. Besides using different powers of the distance-σ-ratio, another approach is the exponential sharing method Sh_exp.

$$\mathrm{Sh}_{\mathrm{tri},\sigma}(d) = \begin{cases} 1 - \dfrac{d}{\sigma} & \text{if } 0 \le d < \sigma \\ 0 & \text{otherwise} \end{cases}$$
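To make the shapes of such sharing functions concrete, the following sketch implements three candidates that satisfy Definition 2.6, i.e., they return 1 for d = 0 and 0 for d ≥ σ. The concrete formulas are assumptions chosen for illustration: the triangular form follows directly from the definition, whereas the power variant and the exponential variant are only plausible forms and are not claimed to be identical to the equations of this book. All function names are hypothetical.

```python
import math


def share_triangular(d, sigma):
    """Linear decay from 1 at d = 0 to 0 at d = sigma."""
    return 1.0 - d / sigma if 0.0 <= d < sigma else 0.0


def share_power(d, sigma, p):
    """Assumed power variant: uses the p-th power of the distance-sigma ratio."""
    return 1.0 - (d / sigma) ** p if 0.0 <= d < sigma else 0.0


def share_exponential(d, sigma, p):
    """Assumed exponential variant, normalized to hit exactly 1 and 0 at the ends."""
    if d <= 0.0:
        return 1.0
    if d >= sigma:
        return 0.0
    return (math.exp(-p * d / sigma) - math.exp(-p)) / (1.0 - math.exp(-p))


sigma = 2.0
for d in (0.0, 0.5, 1.0, 1.5, 2.0):
    print(d, share_triangular(d, sigma), share_power(d, sigma, 2.0),
          share_exponential(d, sigma, 16.0))
```

With a large power such as 16, the exponential variant drops off very quickly, so that only close neighbors contribute noticeably to a sum of sharing values.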

14 The concept of distance and a set of different distance measures are defined in Section 29.1 on page 537.

For sharing, the distance of the individuals in the search space G as well as their distance in the problem space X or the objective space Y may be used. Often, the Euclidean distance measure dist_eucl(F(p1.x), F(p2.x)) in the objective space is applied in sharing functions, but of course, any other distance measure could be applied. If the solution candidates are real vectors in Rⁿ, we could as well measure the Euclidean distance of the phenotypes of the individuals directly, i.e., compute dist_eucl(p1.x, p2.x). In genetic algorithms, where the search space is the set of all bit strings G = Bⁿ of the length n, another suitable approach would be to use the Hamming distance¹⁵ dist_Ham(p1.g, p2.g) of the genotypes. The work of Deb [730], however, indicates that phenotypical sharing will often be superior to genotypical sharing.
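Both distance measures mentioned here are easy to state in code. The sketch below is a minimal illustration; the function names are hypothetical, and genotypes are represented as strings of the characters 0 and 1 purely for convenience.

```python
import math


def dist_euclidean(a, b):
    """Euclidean distance of two real vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dist_hamming(g1, g2):
    """Number of positions in which two bit strings of equal length differ."""
    if len(g1) != len(g2):
        raise ValueError("genotypes must have the same length")
    return sum(b1 != b2 for b1, b2 in zip(g1, g2))


# Distance of two objective vectors (phenotypic or objective-space sharing).
print(dist_euclidean((1, 7), (4, 3)))

# Distance of two genotypes (genotypic sharing).
print(dist_hamming("10110", "11100"))
```

Either function could be handed to a sharing function as the distance measure; which one is appropriate depends on whether sharing is to be performed in the search space, the problem space, or the objective space.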

Definition 2.7 (Niche Count). The niche count m(p, P) [737, 733] of an individual p is the sum of its sharing values with all individuals in a list P.

$$\forall p \in P \Rightarrow m(p, P) = \sum_{i=0}^{\mathrm{len}(P)-1} \mathrm{Sh}_\sigma\left(\mathrm{dist}(p, P[i])\right) \qquad (2.10)$$

The niche count m is always greater than zero (in fact, at least 1), since p ∈ P and, hence, Sh_σ(dist(p, p)) = 1 is computed and added up at least once.

The original sharing approach was developed for single-objective optimization where only one objective function f was subject to maximization. In this case, it was simply divided by the niche count, punishing solutions in crowded regions [733]. The goal of sharing was to distribute the population over a number of different peaks in the fitness landscape, with each peak receiving a fraction of the population proportional to its height [735]. The results of dividing the fitness by the niche counts strongly depend on the height differences of the peaks and thus, on the complexity class¹⁶ of f. On f1 ∈ O(n), for instance, the influence of m is much bigger than on an f2 ∈ O(eⁿ).

By multiplying predetermined fitness values v′ with the niche count m, we can use this approach for fitness minimization in conjunction with a variety of other fitness assignment processes, but also inherit its shortcomings:

$$v(p.x) = v'(p.x) \cdot m(p, \mathit{Pop}), \quad v' \equiv \mathrm{assignFitness}(\mathit{Pop}, \mathit{cmp}_F) \qquad (2.11)$$

Sharing was traditionally combined with fitness proportionate, i.e., roulette wheel selection¹⁷. Oei et al. [115] have shown that if the sharing function is computed using the parental individuals of the “old” population and then naively combined with the more sophisticated tournament selection¹⁸, the resulting behavior of the evolutionary algorithm may be chaotic.

They suggested using the partially filled “new” population to circumvent this problem. The layout of evolutionary algorithms, as defined in this book, bases the fitness computation on the whole set of “new” individuals and assumes that their objective values have already been completely determined. In other words, such issues simply do not exist in multi-objective evolutionary algorithms as introduced here and the chaotic behavior does not occur.

For computing the niche count m, O(n²) comparisons are needed. According to Goldberg et al. [738], sampling the population can be sufficient to approximate m in order to avoid this quadratic complexity.

2.3.5 Variety Preserving Ranking

15 See Definition 29.6 on page 537 for more information on the Hamming distance.
16 See Section 30.1.3 on page 550 for a detailed introduction into complexity and the O-notation.
17 Roulette wheel selection is discussed in Section 2.4.3 on page 104.
18 You can find an outline of tournament selection in Section 2.4.4 on page 109.

Using sharing and the niche counts plainly leads to more or less unpredictable effects. Of course, it promotes solutions located in sparsely populated niches, but how much their fitness will be improved is rather unclear. Using distance measures which are not normalized can lead to strange effects, too. Imagine two objective functions f1 and f2. If the values of f1 span from 0 to 1 for the individuals in the population whereas those of f2 range from 0 to 10000, the components of f1 will most often be negligible in the Euclidean distance of two individuals in the objective space Y. Another problem is that the effect of simple sharing on the pressure into the direction of the Pareto frontier is not obvious either, or depends on the sharing approach applied. Some methods simply add a niche count to the Pareto rank, which may cause non-dominated individuals to have worse fitness than other, dominated individuals in the population. Other approaches scale the niche count into the interval [0, 1) before adding it, which not only ensures that non-dominated individuals have the best fitness but also leaves the relation between individuals at different ranks intact, which does not further variety very much.
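The effect of unnormalized objectives on the Euclidean distance can be made tangible with three hypothetical objective vectors. In the sketch below, f1 is assumed to span [0, 1] and f2 to span [0, 10000], as in the example above; the function name `scaled_euclidean` and the concrete points are illustrative assumptions.

```python
import math


def scaled_euclidean(a, b, scales=(1.0, 1.0)):
    """Euclidean distance after multiplying each dimension with a scale factor."""
    return math.sqrt(sum(((x - y) * s) ** 2 for x, y, s in zip(a, b, scales)))


a = (0.0, 5000.0)
b = (1.0, 5000.0)   # differs from a by the full range of f1
c = (0.0, 5010.0)   # differs from a by 0.1% of the range of f2

# Without normalization, the tiny change in f2 outweighs the maximal change in f1.
raw_ab = scaled_euclidean(a, b)
raw_ac = scaled_euclidean(a, c)

# Scaling each objective by the inverse of its range restores the proportions.
scales = (1.0 / 1.0, 1.0 / 10000.0)
norm_ab = scaled_euclidean(a, b, scales)
norm_ac = scaled_euclidean(a, c, scales)
print(raw_ab, raw_ac, norm_ab, norm_ac)
```

After scaling, the individual that differs by the full range of f1 is far away, and the one that differs only marginally in f2 is close, which is what a sharing function should see.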

Variety Preserving Ranking is a fitness assignment approach based on Pareto ranking using prevalence comparators and sharing. We have developed it in order to mitigate the previously mentioned side effects and to balance the evolutionary pressure between optimizing the objective functions and maximizing the variety inside the population. In the following, we will describe the process of Variety Preserving Ranking-based fitness assignment, which is defined in Algorithm 2.4.

Before the process can begin, all individuals with infinite objective values must be removed from the population Pop. If such a solution candidate is optimal, i.e., if it has negative infinitely large objectives in a minimization process, for instance, it should receive fitness zero, since fitness is subject to minimization. If the individual is infeasible, on the other hand, its fitness should be set to len(Pop) + √len(Pop) + 1, which exceeds every other fitness value that may be assigned by Algorithm 2.4 by at least one.

In lines 2 to 9, we create a list ranks which we use to efficiently compute the Pareto rank of every solution candidate in the population. Strictly speaking, the term prevalence rank would be more precise here, since we use prevalence comparisons as introduced in Section 1.2.4. Therefore, Variety Preserving Ranking is not limited to Pareto optimization but may also incorporate External Decision Makers (Section 1.2.4) or the method of inequalities (Section 1.2.3). The highest rank encountered in the population is stored in the variable maxRank. This value may be zero if the population contains only non-prevailed elements. The lowest rank will always be zero since the prevalence comparators cmp_F define order relations which are non-circular by definition.¹⁹ We will use maxRank to determine the maximum penalty for solutions in an overly crowded region of the search space later on.

From line 10 to 18, we determine the maximum and the minimum values that each objective function takes on when applied to the individuals in the population. These values are used to store the inverse of their ranges in the array rangeScales, which we will use to scale all distances in each dimension (objective) of the individuals into the interval [0, 1].

There are |F| objective functions in F and, hence, the maximum Euclidean distance between two solution candidates in the (scaled) objective space becomes √|F|. It occurs if all the distances in the single dimensions are 1.

The most complicated part of the Variety Preserving Ranking algorithm is between lines 19 and 33. Here we compute the scaled distance from every individual to each other solution candidate in the objective space and use this distance to aggregate share values (in the array shares). Therefore, again two nested loops are needed (lines 22 and 24). The squared, scaled distance components of two individuals Pop[i].x and Pop[j].x are summed up in a variable dist in line 27. The Euclidean distance between them is √dist, which we use to determine a sharing value in line 28. We have chosen exponential sharing with power 16 and σ = √|F|, as introduced in Equation 2.9 on page 94. For every individual, we sum up all the shares (see line 30). While doing so, we also determine the minimum and maximum such total share in the variables minShare and maxShare in lines 32 and 33.

19 In all order relations imposed on finite sets there is always at least one “smallest” element. See Section 27.7.2 on page 463 for more information.

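Algorithm 2.4: v ← assignFitnessVarietyPreserving(Pop, cmp_F)
Input: Pop: the population
Input: cmp_F: the comparator function
Input: [implicit] F: the set of objective functions
Data: . . . : sorry, no space here, we’ll discuss this in the text
Output: v: the fitness function

 1  begin
      /* If needed: Remove all elements with infinite objective values from Pop and assign fitness 0 or len(Pop) + √len(Pop) + 1 to them. Then compute the prevalence ranks. */
 2    ranks ← createList(len(Pop), 0)
 3    maxRank ← 0
 4    for i ← len(Pop) − 1 down to 0 do
 5      for j ← i − 1 down to 0 do
 6        k ← cmp_F(Pop[i].x, Pop[j].x)
 7        if k < 0 then ranks[j] ← ranks[j] + 1
 8        else if k > 0 then ranks[i] ← ranks[i] + 1
 9      if ranks[i] > maxRank then maxRank ← ranks[i]
      // determine the ranges of the objectives
10    mins ← createList(|F|, +∞)
11    maxs ← createList(|F|, −∞)
12    foreach p ∈ Pop do
13      for i ← |F| down to 1 do
14        if f_i(p.x) < mins[i−1] then mins[i−1] ← f_i(p.x)
15        if f_i(p.x) > maxs[i−1] then maxs[i−1] ← f_i(p.x)
16    rangeScales ← createList(|F|, 1)
17    for i ← |F| − 1 down to 0 do
18      if maxs[i] > mins[i] then rangeScales[i] ← 1 / (maxs[i] − mins[i])
      // Base a sharing value on the scaled Euclidean distance of all elements
19    shares ← createList(len(Pop), 0)
20    minShare ← +∞
21    maxShare ← −∞
22    for i ← len(Pop) − 1 down to 0 do
23      curShare ← shares[i]
24      for j ← i − 1 down to 0 do
25        dist ← 0
26        for k ← |F| down to 1 do
27          dist ← dist + [(f_k(Pop[i].x) − f_k(Pop[j].x)) ∗ rangeScales[k−1]]²
28        s ← Sh_exp(√dist) with σ = √|F| and power 16
29        curShare ← curShare + s
30        shares[j] ← shares[j] + s
31      shares[i] ← curShare
32      if curShare < minShare then minShare ← curShare
33      if curShare > maxShare then maxShare ← curShare
      // Finally, compute the fitness values
34    scale ← 1 / (maxShare − minShare) if maxShare > minShare, 1 otherwise
35    for i ← len(Pop) − 1 down to 0 do
36      if ranks[i] > 0 then
37        v(Pop[i].x) ← ranks[i] + √maxRank ∗ scale ∗ (shares[i] − minShare)
38      else v(Pop[i].x) ← scale ∗ (shares[i] − minShare)
39    return v
40  end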

We will use these variables to scale all sharing values again into the interval [0, 1] (line 34), so the individual in the most crowded region always has a total share of 1 and the most remote individual always has a share of 0. So basically, we now know two things about the individuals in Pop:

1. their Pareto rank, stored in the array ranks, giving information about their relative quality according to the objective values and

2. their sharing values, held in shares, denoting how densely crowded the area around them is.

With this information, we determine the final fitness value of an individual p as follows: If p is non-prevailed, i.e., its rank is zero, its fitness is its scaled total share (line 38). Otherwise, we multiply the square root of the maximum rank, √maxRank, with the scaled share and add it to its rank (line 37). By doing so, we preserve the supremacy of non-prevailed individuals in the population but allow them to compete with each other based on the crowdedness of their location in the objective space. All other solution candidates may degenerate in rank, but at most by the square root of the worst rank.
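The whole process can be condensed into a compact sketch. It is written under several assumptions that should be kept in mind: Pareto dominance under minimization stands in for the comparator cmp_F, the exponential sharing function is an assumed form with power 16 and σ = √|F|, individuals with infinite objective values are taken to have been removed beforehand, and all names are hypothetical. The example data are the objective values of the 15 individuals of Figure 2.4 as listed in Table 2.2.

```python
import math


def dominates(a, b):
    """Pareto dominance under minimization (assumed stand-in for cmp_F)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def share_exp(d, sigma, p=16.0):
    """Assumed exponential sharing function: 1 at d = 0, 0 for d >= sigma."""
    if d <= 0.0:
        return 1.0
    if d >= sigma:
        return 0.0
    return (math.exp(-p * d / sigma) - math.exp(-p)) / (1.0 - math.exp(-p))


def assign_fitness_variety_preserving(objs):
    """Return one fitness value per objective vector in objs (lower is better)."""
    n, dims = len(objs), len(objs[0])

    # Prevalence ranks: number of individuals each individual is prevailed by.
    ranks = [sum(dominates(q, p) for q in objs) for p in objs]
    max_rank = max(ranks)

    # Inverse ranges of the objectives; constant objectives keep the scale 1.
    scales = []
    for k in range(dims):
        lo, hi = min(o[k] for o in objs), max(o[k] for o in objs)
        scales.append(1.0 / (hi - lo) if hi > lo else 1.0)

    # Total share of every individual with all others (pairs counted once).
    sigma = math.sqrt(dims)
    shares = [0.0] * n
    for i in range(n):
        for j in range(i):
            d = math.sqrt(sum(((objs[i][k] - objs[j][k]) * scales[k]) ** 2
                              for k in range(dims)))
            s = share_exp(d, sigma)
            shares[i] += s
            shares[j] += s

    # Scale the shares into [0, 1] and combine them with the ranks.
    lo, hi = min(shares), max(shares)
    scale = 1.0 / (hi - lo) if hi > lo else 1.0
    fitness = []
    for rank, share in zip(ranks, shares):
        scaled = scale * (share - lo)
        fitness.append(rank + math.sqrt(max_rank) * scaled if rank > 0 else scaled)
    return fitness


# Objective vectors (f1, f2) of individuals 1..15, in that order.
OBJECTIVES = [
    (1, 7), (2, 4), (6, 2), (10, 1), (1, 8), (2, 7), (3, 5), (2, 9),
    (3, 7), (4, 6), (5, 5), (7, 3), (8, 4), (7, 7), (9, 9),
]
fitness = assign_fitness_variety_preserving(OBJECTIVES)
for number, value in enumerate(fitness, start=1):
    print(number, round(value, 3))
```

Non-prevailed individuals end up with a fitness in [0, 1], whereas every prevailed individual receives at least its rank, so the order between the first rank and all others is preserved while crowding decides within the frontier.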

Example

Fig. 2.5: The sharing potential in the Variety Preserving Ranking example. (Plot of the share potential, on a scale from 0 to 1, over the objective space of the example individuals, with the Pareto frontier marked.)

Let us now apply Variety Preserving Ranking to the example for Pareto ranking from Section 2.3.3. In Table 2.2, we again list all the solution candidates from Figure 2.4 on page 92, this time with their objective values obtained with f1 and f2 corresponding to their coordinates in the diagram. In the column rank, you can find the Pareto rank of the individuals as it has been listed in Table 2.1 on page 93. The columns share/u and share/s correspond to the total sharing sums of the individuals, unscaled and scaled into [0, 1].

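| x | f1 | f2 | rank | share/u | share/s | v(x) |
|---|---|---|---|---|---|---|
| 1 | 1 | 7 | 0 | 0.71 | 0.779 | 0.779 |
| 2 | 2 | 4 | 0 | 0.239 | 0.246 | 0.246 |
| 3 | 6 | 2 | 0 | 0.201 | 0.202 | 0.202 |
| 4 | 10 | 1 | 0 | 0.022 | 0 | 0 |
| 5 | 1 | 8 | 1 | 0.622 | 0.679 | 3.446 |
| 6 | 2 | 7 | 2 | 0.906 | 1 | 5.606 |
| 7 | 3 | 5 | 1 | 0.531 | 0.576 | 3.077 |
| 8 | 2 | 9 | 4 | 0.314 | 0.33 | 5.191 |
| 9 | 3 | 7 | 4 | 0.719 | 0.789 | 6.845 |
| 10 | 4 | 6 | 2 | 0.592 | 0.645 | 4.325 |
| 11 | 5 | 5 | 2 | 0.363 | 0.386 | 3.39 |
| 12 | 7 | 3 | 1 | 0.346 | 0.366 | 2.321 |
| 13 | 8 | 4 | 3 | 0.217 | 0.221 | 3.797 |
| 14 | 7 | 7 | 9 | 0.094 | 0.081 | 9.292 |
| 15 | 9 | 9 | 13 | 0.025 | 0.004 | 13.01 |

Table 2.2: An example for Variety Preserving Ranking based on Figure 2.4.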

But let us start from the beginning. As already mentioned, we know the Pareto ranks of the solution candidates from Table 2.1, so the next step is to determine the ranges of
