Algorithm analysis: PICEA-w - Preference-inspired Co-evolutionary Algorithms

4.4 PICEA-w

4.4.2 Algorithm analysis: PICEA-w

PICEA-w co-evolves candidate solutions with weight vectors during the search. The candidate solutions are evaluated by the employed weights in a similar way to other decomposition based algorithms (e.g., MOGLS, MSOPS and MOEA/D). However, the weights are neither randomly generated nor initialised as an even distribution. Instead, they are adaptively modified in a co-evolutionary manner during the search. PICEA-w is therefore expected to be less sensitive to the problem geometry than other decomposition based algorithms that use random, evenly distributed or adaptive weights, and to perform better than them on many-objective problems.

The convergence of PICEA-w is affected by the chosen scalarising function (as described in Section 4.2.1). We apply the Chebyshev scalarising function in PICEA-w. Compared with other L_p scalarising functions, the Chebyshev scalarising function leads to slower convergence, but it is able to identify Pareto optimal solutions in both convex and non-convex regions.
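The claim about non-convex regions can be illustrated with a small Python sketch. It compares a weighted-sum (L_1) function with a weighted Chebyshev (L_infinity) function on 101 points sampled from the front f1^2 + f2^2 = 1. This is the "concave" shape in WFG terminology, and it is non-convex for minimisation. The Chebyshev form used here, max_i w_i |f_i - z*_i| with ideal point z*, is the common textbook version and is an assumption. Section 4.2.1 may define it differently, for example by dividing by w_i instead of multiplying.

```python
import math

def weighted_sum(f, w):
    """Weighted-sum (L1) scalarising value: sum_i w_i * f_i."""
    return sum(wi * fi for fi, wi in zip(f, w))

def chebyshev(f, w, z_star):
    """Weighted Chebyshev value max_i w_i * |f_i - z*_i| (assumed form)."""
    return max(wi * abs(fi - zi) for fi, wi, zi in zip(f, w, z_star))

# 101 points on the quarter circle f1^2 + f2^2 = 1 with f1, f2 >= 0.
step = (math.pi / 2) / 100
front = [(math.cos(k * step), math.sin(k * step)) for k in range(101)]

w = (0.5, 0.5)        # equal weights
z_star = (0.0, 0.0)   # ideal point of this front

best_ws = min(front, key=lambda f: weighted_sum(f, w))
best_tch = min(front, key=lambda f: chebyshev(f, w, z_star))

print("weighted sum picks:", best_ws)   # an end point of the front
print("Chebyshev picks:", best_tch)     # the knee, about (0.707, 0.707)
```

On this front the weighted sum can only reach the two end points, whatever the weights. The Chebyshev function returns the interior point at which w_1 f_1 = w_2 f_2.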

The diversity performance of PICEA-w is affected by the distribution of the employed weights. Co-evolution allows suitable weights to be constructed adaptively, so the search effort is distributed appropriately across different parts of the Pareto front. The two criteria used to select weights effectively balance exploration and exploitation.

With respect to time complexity, evaluating a population of candidate solutions in PICEA-w runs in O(M × N), where M is the number of objectives and N is the number of candidate solutions. The main cost of PICEA-w lies in the function coEvolve, which involves three sub-functions. The sub-function rankingSW ranks all candidate solutions on each weight vector and therefore runs in O(N^2 × N_w), where N_w is the number of weight vectors (assuming bubble sort is used). The sub-function selectS selects the best N solutions from 2N solutions and runs in O(N^2). The sub-function selectW calculates the angle between every candidate solution and every weight vector, and runs in O(N × N_w). Therefore, the overall time complexity of PICEA-w is O(N^2 × N_w).
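Summing the per-generation costs listed above shows which term dominates. Constant factors arising from the joint populations of size 2N and 2N_w are dropped.

```latex
\begin{aligned}
T_{\text{gen}} &= \underbrace{O(MN)}_{\text{evaluation}}
  + \underbrace{O(N^2 N_w)}_{\texttt{rankingSW}}
  + \underbrace{O(N^2)}_{\texttt{selectS}}
  + \underbrace{O(N N_w)}_{\texttt{selectW}} \\
&= O(N^2 N_w) \qquad \text{whenever } M \le N N_w .
\end{aligned}
```

With the settings of Section 4.5.3 (N = N_w = 100, M ≤ 10), N^2 N_w = 10^6. By comparison, N^2 = N N_w = 10^4 and MN ≤ 10^3, so the ranking step dominates.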

4.5 Experiment description

4.5.1 Test problems

Table 4.3 shows the employed test problems. These test problems are constructed by applying different shape functions provided in the WFG toolkit to the standard WFG4 benchmark problem (Huband et al., 2006). Details are provided in Appendix A.3.

Table 4.3: Problem geometries of the WFG4X test problems.

Test problem    Geometry
WFG41           concave
WFG42           convex
WFG43           strong concave
WFG44           strong convex
WFG45           mixed
WFG46           hyperplane
WFG47           disconnected, concave
WFG48           disconnected, convex

These problems are instantiated with 2, 4, 7 and 10 objectives. The WFG parameters k (position parameter) and l (distance parameter) are set to 18 and 14, respectively. The number of decision variables is therefore n = k + l = 32 for each problem instance.

Optimal solutions of these problems satisfy the condition in Equation 4.9:

$$x_i = 2i \times 0.35, \qquad i = k+1, \dots, n \tag{4.9}$$

where n = k + l is the number of decision variables. To approximate the Pareto optimal front, we first randomly generate 20,000 optimal solutions for each test problem and compute their objective values. We then use the clustering technique of SPEA2 to select a set of evenly distributed solutions from all the generated solutions.
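The following minimal Python sketch shows this two-step procedure. It assumes the usual WFG conventions that decision variable x_i ranges over [0, 2i] and that the first k variables are position parameters. The function names are hypothetical, and the WFG4X objective functions themselves (Appendix A.3) are not reproduced. The truncation applies the plain SPEA2 rule: repeatedly delete the point whose ascending list of distances to the other points is lexicographically smallest. It is written for clarity rather than speed, so it would be slow on 20,000 points.

```python
import math
import random

def sample_optimal_wfg4x(num_samples, k=18, l=14, seed=None):
    """Sample Pareto optimal decision vectors using Eq. (4.9).

    Position parameters x_1..x_k are drawn uniformly from [0, 2i] (assumed
    WFG domain); distance parameters x_{k+1}..x_n are fixed at 2i * 0.35.
    """
    rng = random.Random(seed)
    n = k + l
    samples = []
    for _ in range(num_samples):
        x = [rng.uniform(0.0, 2.0 * i) for i in range(1, k + 1)]
        x += [2.0 * i * 0.35 for i in range(k + 1, n + 1)]
        samples.append(x)
    return samples

def spea2_truncate(points, size):
    """Reduce a list of objective vectors to `size` points with SPEA2 truncation."""
    pts = [tuple(p) for p in points]
    while len(pts) > size:
        # For each point: distances to all other points, in ascending order.
        dist_lists = [
            sorted(math.dist(p, q) for j, q in enumerate(pts) if j != i)
            for i, p in enumerate(pts)
        ]
        # Lists compare lexicographically, so min() picks the point with the
        # smallest nearest-neighbour distance, ties broken by the 2nd nearest,
        # and so on. That point is deleted.
        victim = min(range(len(pts)), key=lambda i: dist_lists[i])
        pts.pop(victim)
    return pts
```

In the workflow described above, each sampled vector would first be evaluated with the WFG4X objective functions. The resulting objective vectors are then passed to `spea2_truncate`. For instance, `spea2_truncate([(0,), (1,), (2,), (2.1,), (3,)], 4)` removes the crowded point (2.1,).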

The Pareto optimal fronts of the WFG4X problems have the same trade-off magnitudes and lie within [0, 2] in every objective. Thus, the nadir point for these problems is [2, 2, ..., 2]. Figures 4.13 and 4.14 show the Pareto optimal fronts, together with the corresponding optimal distributions of weights, for all the 2-objective WFG4X problems. We also plot the Pareto optimal fronts of the 3-objective WFG4X problems in Appendix A.3.

4.5.2 The considered competitor MOEAs

To benchmark the performance of PICEA-w, four competitor decomposition based algorithms are considered. All the competitors use the same algorithmic framework as PICEA-w, together with the Chebyshev scalarising function. They differ only in the way JointW is constructed.

• The first algorithm (denoted as RMOEA) forms JointW by combining N_w weights randomly selected from the current JointW with another set of N_w randomly generated weights. RMOEA represents decomposition based algorithms that use random weights, e.g., I-MOGLS (Ishibuchi and Murata, 1998) and J-MOGLS (Jaszkiewicz, 2002).

[Figure: panels (a) WFG41-2, (b) WFG42-2, (c) WFG43-2, (d) WFG44-2; each panel plots the Pareto optimal front and the optimal weights, with both axes ranging from 0 to 2]

Figure 4.13: Pareto optimal fronts and the optimal distributions of weights for WFG41-2 to WFG44-2.

• The second competitor (denoted as UMOEA) uses 2N_w evenly distributed weights as JointW. UMOEA represents another class of decomposition based algorithms, such as MSOPS and MOEA/D, that use uniform weights.

• The other two competitors use adaptive weights. Their weight adaptation strategies are taken from DMOEA/D (Gu et al., 2012) and EMOSA (Li and Landa-Silva, 2011), respectively. These two algorithms were chosen for two reasons. DMOEA/D has been shown to obtain evenly distributed solutions for bi- and three-objective problems with complex geometries. EMOSA has been found to outperform MOGLS and MOEA/D on bi- and three-objective problems. Note that the neighbourhood size used in DMOEA/D and EMOSA is set to T = 10, which gave good performance in our previous comparative study (Wang et al., 2013).
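The two non-adaptive ways of forming JointW (RMOEA and UMOEA) can be sketched in Python as follows. The text does not specify how RMOEA generates its random weights, or how UMOEA obtains 2N_w evenly distributed weights. This sketch therefore assumes uniform sampling on the unit simplex for RMOEA, using normalised exponential variates. For UMOEA it assumes the simplex-lattice design commonly used with MOEA/D. All function names are hypothetical.

```python
import itertools
import random

def random_weights(count, m, rng):
    """`count` weight vectors drawn uniformly from the unit simplex (assumed scheme)."""
    weights = []
    for _ in range(count):
        # Normalised i.i.d. exponential variates are uniform on the simplex.
        e = [rng.expovariate(1.0) for _ in range(m)]
        total = sum(e)
        weights.append([v / total for v in e])
    return weights

def rmoea_joint_w(current_joint_w, n_w, m, rng):
    """RMOEA: n_w weights sampled from the current JointW plus n_w fresh random ones."""
    kept = rng.sample(current_joint_w, n_w)
    return kept + random_weights(n_w, m, rng)

def simplex_lattice(m, h):
    """UMOEA-style uniform weights: entries in {0, 1/h, ..., 1} summing to 1.

    Produces C(h + m - 1, m - 1) vectors via a stars-and-bars enumeration.
    """
    weights = []
    slots = h + m - 1
    for bars in itertools.combinations(range(slots), m - 1):
        counts, prev = [], -1
        for b in bars:
            counts.append(b - prev - 1)   # stars between consecutive bars
            prev = b
        counts.append(slots - prev - 1)   # stars after the last bar
        weights.append([c / h for c in counts])
    return weights
```

For M = 2 and 2N_w = 200, `simplex_lattice(2, 199)` yields exactly 200 weights. For M > 2, the lattice size C(H + M - 1, M - 1) rarely equals 2N_w exactly. For example, M = 4 gives 165 weights for H = 8 and 220 for H = 9. How this mismatch is handled is not stated here.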

[Figure: panels (a) WFG45-2, (b) WFG46-2, (c) WFG47-2, (d) WFG48-2; each panel plots the Pareto optimal front and the optimal weights, with both axes ranging from 0 to 2]

Figure 4.14: Pareto optimal fronts and the optimal distributions of weights for WFG45-2 to WFG48-2.

For the reader's convenience, the weight adaptation strategies in DMOEA/D and EMOSA are briefly described below. More details can be found in Section 2.4.2 (p. 40).

(i) DMOEA/D: first, a piecewise linear interpolation method is used to fit a curve (M = 2) or a hyper-surface (M > 2) to the current non-dominated solutions. Second, a set of evenly distributed points is sampled from the curve (or hyper-surface). Finally, an optimal distribution of weights corresponding to these evenly distributed points is generated to guide the search.

(ii) EMOSA: for each member Fs_i in the current population, with associated weight vector w_i, first find the closest neighbour of Fs_i (denoted Fs_j) and its associated weight vector w_j. Second, identify the weights in JointW that satisfy two conditions: their Euclidean distance to w_j is larger than the distance between w_i and w_j, and they are closer to w_i than to any of the neighbours of w_i. The neighbourhood is defined as in MOEA/D. If several weights qualify, one of them is picked at random.
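The two adaptation rules can be sketched in Python as follows. For DMOEA/D, only the bi-objective case (M = 2) is covered. That part assumes minimisation, linear interpolation between consecutive non-dominated points sorted by f1, and resampling at equal arc-length intervals. Each sampled point F is then mapped to a weight with w_i proportional to 1/(F_i - z*_i). Under the weighted Chebyshev form max_i w_i |F_i - z*_i|, this is the weight whose optimum lies at F. For EMOSA, the sketch assumes that the neighbours of w_i are its T closest weights among the current population's weights. It also assumes that Fs_i keeps w_i when no weight qualifies. All function names are hypothetical, and neither function reproduces the original authors' code.

```python
import math
import random

def dmoead_weights_2d(nondominated, num_points, z_star=(0.0, 0.0), eps=1e-12):
    """M = 2 sketch: fit a polyline, resample it evenly, map points to weights.

    Needs at least two non-dominated points and num_points >= 2.
    """
    pts = sorted(nondominated)                    # sort by f1
    cum = [0.0]                                   # cumulative arc length
    for a, b in zip(pts, pts[1:]):
        cum.append(cum[-1] + math.dist(a, b))
    total, seg, samples = cum[-1], 0, []
    for s in range(num_points):
        target = total * s / (num_points - 1)     # evenly spaced arc positions
        while seg < len(pts) - 2 and cum[seg + 1] < target:
            seg += 1
        seg_len = cum[seg + 1] - cum[seg]
        t = 0.0 if seg_len == 0 else (target - cum[seg]) / seg_len
        a, b = pts[seg], pts[seg + 1]
        samples.append((a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1])))
    weights = []
    for f in samples:
        # Chebyshev-optimal weight for f: w_i proportional to 1 / (f_i - z*_i).
        inv = [1.0 / max(fi - zi, eps) for fi, zi in zip(f, z_star)]
        weights.append([v / sum(inv) for v in inv])
    return samples, weights

def neighbours(w, current_w, t):
    """The t weights in current_w closest to w (Euclidean distance), excluding w."""
    others = [v for v in current_w if v != w]
    return sorted(others, key=lambda v: math.dist(v, w))[:t]

def emosa_new_weight(w_i, w_j, current_w, joint_w, t=10, rng=random):
    """Pick a weight from joint_w that moves Fs_i away from its neighbour Fs_j."""
    nbrs = neighbours(w_i, current_w, t)
    d_ij = math.dist(w_i, w_j)
    candidates = [
        w for w in joint_w
        # condition 1: farther from w_j than w_i is
        if math.dist(w, w_j) > d_ij
        # condition 2: closer to w_i than to every neighbour of w_i
        and all(math.dist(w, w_i) < math.dist(w, nb) for nb in nbrs)
    ]
    return rng.choice(candidates) if candidates else w_i
```

The DMOEA/D hyper-surface case (M > 2) needs a multivariate interpolation scheme and is not attempted here.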

4.5.3 General parameters

Each algorithm is run 31 times, and each run uses 25,000 function evaluations (i.e., 250 generations of N = 100 evaluations). For all algorithms, the population sizes of candidate solutions and weights are set to N = 100 and N_w = 100, respectively. Simulated binary crossover (SBX) and polynomial mutation (PM) are applied as genetic operators. The recombination probability p_c of SBX is set to 1 per individual, and the mutation probability p_m of PM is set to 1/n per decision variable. The distribution indices η_c of SBX and η_m of PM are set to 15 and 20, respectively.

These parameter settings are summarised in Table 4.4 and are fixed across all algorithm runs.
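For reference, here is a compact Python sketch of the two variation operators with the settings above (p_c = 1, η_c = 15, p_m = 1/n, η_m = 20). It uses the basic textbook forms of SBX and polynomial mutation and simply clips offspring to the variable bounds. Widely used implementations apply bounded variants and per-variable exchange probabilities. This is therefore an illustrative simplification, not the exact operator code used in the experiments.

```python
import random

def sbx(p1, p2, lower, upper, eta_c=15.0, p_c=1.0, rng=random):
    """Simulated binary crossover (basic form); offspring clipped to the bounds."""
    c1, c2 = list(p1), list(p2)
    if rng.random() > p_c:                 # crossover test made once per pair
        return c1, c2
    for i in range(len(p1)):
        u = rng.random()                   # u in [0, 1), so 1 - u > 0
        if u <= 0.5:
            beta = (2.0 * u) ** (1.0 / (eta_c + 1.0))
        else:
            beta = (1.0 / (2.0 * (1.0 - u))) ** (1.0 / (eta_c + 1.0))
        x1, x2 = p1[i], p2[i]
        c1[i] = 0.5 * ((1 + beta) * x1 + (1 - beta) * x2)
        c2[i] = 0.5 * ((1 - beta) * x1 + (1 + beta) * x2)
        c1[i] = min(max(c1[i], lower[i]), upper[i])
        c2[i] = min(max(c2[i], lower[i]), upper[i])
    return c1, c2

def polynomial_mutation(x, lower, upper, eta_m=20.0, p_m=None, rng=random):
    """Polynomial mutation (basic form); default rate 1/n per decision variable."""
    n = len(x)
    p_m = 1.0 / n if p_m is None else p_m
    y = list(x)
    for i in range(n):
        if rng.random() < p_m:
            u = rng.random()
            if u < 0.5:
                delta = (2.0 * u) ** (1.0 / (eta_m + 1.0)) - 1.0
            else:
                delta = 1.0 - (2.0 * (1.0 - u)) ** (1.0 / (eta_m + 1.0))
            y[i] = min(max(x[i] + delta * (upper[i] - lower[i]), lower[i]), upper[i])
    return y
```

Larger distribution indices keep offspring closer to their parents. Without clipping, SBX preserves the mean of the two parents in every variable.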

Table 4.4: Algorithm testing parameter setting.

Parameter            Value
N                    100
N_w                  100
maxGen               250
Crossover operator   SBX (p_c = 1, η_c = 15)
Mutation operator    PM (p_m = 1/n, η_m = 20)

4.5.4 Performance assessment

The hypervolume (HV) metric is used as the performance metric. A favourable (i.e., larger) hypervolume implies a better combination of proximity and diversity. The approximation sets used in the HV calculation are the members of the offline archive of all non-dominated points found during the search, since this is the set most relevant to a posteriori decision-making. For computational feasibility, this set is pruned to a maximum size of 100 using the SPEA2 truncation procedure (Zitzler et al., 2002) before analysis. Note that, before calculating HV, we normalise all objective values to lie within [0, 1] using the nadir point (which assumes equal relative importance of the normalised objectives across the search domain). The reference point for the hypervolume calculation is set to r_i = 1.2, i = 1, 2, ..., M.
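The normalisation and hypervolume computation can be sketched in Python for the bi-objective case. The sketch assumes the ideal point is the origin, so normalising by the nadir point [2, ..., 2] amounts to dividing by 2. It handles only M = 2. The 4-, 7- and 10-objective instances need a dedicated many-objective hypervolume algorithm. The function names are hypothetical.

```python
def normalise(points, nadir, ideal=None):
    """Map each objective into [0, 1] using the ideal and nadir points."""
    if ideal is None:
        ideal = [0.0] * len(nadir)
    return [tuple((f - z) / (zn - z) for f, z, zn in zip(p, ideal, nadir))
            for p in points]

def hypervolume_2d(points, ref=(1.2, 1.2)):
    """Area dominated by `points` (minimisation) and bounded by `ref`."""
    # Ignore points that do not strictly dominate the reference point.
    pts = sorted(p for p in points if p[0] < ref[0] and p[1] < ref[1])
    area, prev_f2 = 0.0, ref[1]
    for f1, f2 in pts:                  # sweep in increasing f1
        if f2 < prev_f2:                # skip points dominated by an earlier one
            area += (ref[0] - f1) * (prev_f2 - f2)
            prev_f2 = f2
    return area
```

For example, `normalise([(0, 2), (1, 1), (2, 0)], nadir=(2, 2))` gives (0, 1), (0.5, 0.5) and (1, 0). The hypervolume of these points is 0.24 + 0.35 + 0.1 ≈ 0.69, summed over the three horizontal slabs.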

Performance comparisons between algorithms based on the HV metric follow a rigorous non-parametric statistical framework, drawing on recommendations in Zitzler et al. (2003). Specifically, we first test the hypothesis that all algorithms perform equally using the Kruskal-Wallis test (Hollander and Wolfe, 1999). If this hypothesis is rejected at the 95% confidence level, we then make pair-wise comparisons between the algorithms using the two-sided Wilcoxon rank-sum test (Hollander and Wolfe, 1999) at the 95% confidence level. These comparisons employ the Šidák correction to control Type I errors across the multiple comparisons (Curtin and Schulz, 1998).
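The pair-wise stage of this procedure can be sketched in plain Python. The sketch implements the two-sided Wilcoxon rank-sum test with the large-sample normal approximation (no tie correction). It uses the Šidák-adjusted significance level α' = 1 - (1 - α)^(1/m) for m pair-wise comparisons. The omnibus Kruskal-Wallis step is not reproduced. SciPy's `scipy.stats.kruskal` is one existing implementation of that step, and `scipy.stats.ranksums` implements the same normal-approximation rank-sum test. The helper names are hypothetical. With the five algorithms compared here, m = 10, so α' ≈ 0.0051 at α = 0.05.

```python
import math
from itertools import combinations

def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1            # mean of ranks pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks

def ranksum_pvalue(x, y):
    """Two-sided Wilcoxon rank-sum p-value (normal approximation, no tie correction)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    r1 = sum(ranks[:n1])                     # rank sum of the first sample
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (r1 - mean) / sd
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))

def sidak_alpha(alpha, m):
    """Per-comparison level keeping the family-wise error rate at alpha over m tests."""
    return 1 - (1 - alpha) ** (1 / m)

def pairwise_significance(hv_samples, alpha=0.05):
    """hv_samples: {algorithm name: list of HV values over the runs}."""
    pairs = list(combinations(hv_samples, 2))
    level = sidak_alpha(alpha, len(pairs))
    return {(a, b): ranksum_pvalue(hv_samples[a], hv_samples[b]) < level
            for a, b in pairs}
```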

4.6 Experiment results

First, the Pareto fronts obtained by PICEA-w are plotted. Then, the results of the statistical comparisons between PICEA-w and the competitor decomposition based algorithms are presented.
