3.2 The MOLA Method
3.2.3 The process of searching and learning
The calculation of MOLA consists of two parts: the process of searching and the process of learning from neighborhood. They are introduced in the following two subsections respectively.
The process of searching
The process of searching is carried out through a tournament that is held between two types of search, Pareto global search and Pareto local search. The rule of the tournament will be introduced in Section 3.2.4. In this subsection, Pareto global search and Pareto local search are introduced respectively as follows:
Pareto global search (PGS): Starting with the first stateX, by selecting automa- toni, at dimensional statexi, a search is undertaken by calculating the path values
3.2 The MOLA Method 74
of two possible paths using the cell values obtained in the previous iterations, as given in (3.2.1), and then movingxi towards a path, which is selected according to the probability calculated using the path values, as in (3.2.2), with a step lengthη, which is determined by the probability calculated from the cell values of the cells locating on the path, as in (3.2.3). A reinforcement signal is generated using di- mensional state xi, according to (3.2.5). Then the estimated path values and the
reinforcement signal are used to update the cell value ofci,j, in which dimensional statexi locates, using (3.2.4). In this case the objective function values of the mul- tiple objective functions are calculated once, and the cell value of V(ci,j)|xi∈ci,j is updated to replace the cell value that was stored previously when any dimensional state located within cellci,jwas visited. At the same time,Xbest and elite listLare updated according to (3.2.6) and (3.2.7) respectively. If the reinforcement signal is not equal to 1, the current iteration ends and the next automaton is selected in order for MOLA computation to continue. Otherwise searching on this path is regarded worthy, as the cells locating on this path could be very sensitive and need more ex- ploitation. In order to further exploit the potential of the dimensional states located on this path, the search action is taken continuously forIemax times in the same way presented in (3.2.1), (3.2.2), (3.2.3), (3.2.5), (3.2.4), (3.2.6) and (3.2.7). An iteration completes after the exploitation, which has updatedXbestand all values of the cells
visited, before the next automaton is selected. After all automata are sequentially selected once, in order to increase the diversity and fully explore the probability of finding the Pareto set, different perturbations are introduced to the non-dominated solutions stored in the elite list, according to the following rule:
X′ ←X+ ∆ +β(Xbest−X) (3.2.9)
where∆i = sign(κ)ζ(xmax,i−xmin,i), ζ is a random variable and ζ ∈ [0,Dk]. The
sign function is used to choose the moving direction ofxi. Supposexi is located in cellci,j, then the input to the sign function is the subtraction of the two adjacent cell values ofci,j, that is,κ= (V(ci,j+1)−V(ci,j−1)). Ifκ≥0,sign(κ) = 1, otherwise,
sign(κ) = −1. It can be seen that xi moves towards the direction in which the adjacent cell has a larger cell value. Once completing the perturbation operation, the obtained solutions are used to updateXbest andL, according to (3.2.6) and (3.2.7)
Minimize f> M inim iz e f ? d@ dA d> dB Obtained solutions dA C> dD F*
Figure 3.2: The illustration of findingF∗
respectively.
Pareto local search (PLS): PLS is similar to PGS with the exception of the fol- lowing two aspects:
1. The perturbation operation used in PGS,i.e.(3.2.9), is not performed in PLS. 2. The calculation of reinforcement signal is different from that used in PGS. To obtain the reinforcement signal, a target objective function vector,F∗, which is regarded as the aim of the PLS, should be set at the beginning of the search.
F∗can be determined by the following method. First, the Euclidean distance between every two consecutive objective function vectors is calculated, as given in Fig. 3.2, in which solid circles denote the objective function vectors of the solutions stored inL. The pair of the two consecutive objective function vectors, which has the largest Euclidean distance, is taken as two base vertexes to create an isoceles triangle, in which the height is half of the base length. Then the third vertex of the triangle is selected asF∗, as given in Fig. 3.2. During the process of Pareto local search, the fitness value of state X(x′
i)
can be obtained by calculating the Euclidean distance between F∗ and the
objective function vector of the solution, i.e. F(X(x′
i)). For simplicity, the
Euclidean distance betweenF∗andF(X(x′
3.2 The MOLA Method 76
Then a reinforcement signal is generated by the following rule:
r(X(x′i)) = (
1 if d(F∗, F(X(x′
i)))≤d(F∗, F(Xlbest))
0 otherwise (3.2.10)
where Xlbest (called the local best solution) denotes the latest best solution
found during the process of Pareto local search. In other words, among all the solutions found during the process of Pareto local search, the objective func- tion vector of solutionXlbest is closest toF∗. It can be seen that r = 1rep- resents a favorable response, as the objective function vector of the obtained solution X(x′
i) is closer to F∗ than that of Xlbest; r = 0 is an unfavorable
response.
The relationship between stateX(x′
i)andXlbest can be described by the fol-
lowing equation: Xlbest← ( X(x′ i) if r= 1 Xlbest otherwise (3.2.11) whereX(x′
i) = [x1, . . . , xi−1, x′i, xi+1, . . . , xN]. In the process of Pareto local
search,Xlbestis updated according to (3.2.11), and at the same time,Xlbestis used to updateXbest andLaccording to (3.2.6) and (3.2.7) respectively.
The process of learning
Suppose the current state, X, is one of the non-dominated solutions stored in elite listL, as shown in Fig. 3.3, which plots the objective function vectors,F(·), of the non-dominated solutions in the objective space. The solutions whose objective function vectors locate at the adjacent area ofF(X),e.g.areaE1, are the neighbor-
ing solutions ofX. It is believed that the neighboring solutions of X carry more
useful information that can benefit the search around X, and they can make more
contributions than remote solutions,e.g.the solutions whoseF(·)locates at areaE2.
Here, a method of clarifying the relationship of neighborhood among the non- dominated solutions is given first. Then the operation of learning from neighbor- hood can be performed based on the obtained solutions.
E E F G H F G IJKLMNO P L Q R S T U V W XY U Z [\]^ _` _a bcdefgdh O ei NL g
Figure 3.3: The illustration ofX’s neighborhood
• To find out the relationship among the non-dominated solutions, an uniform
spread ofM weight vectors, (W1,· · · ,Wi,· · ·,WM), whereWi = [w1,· · · , wmf]andP(W) = 1, are initialized at the beginning of the learning process
[103]. Then the Euclidean distances between any two weight vectors are cal- culated, and theM/4closest weight vectors to each weight vector are found
according to the Euclidean distances. Let the indexes of the M/4 closest
weight vectors toWibe denoted as setD(i).
Each weight vector,Wi (i = 1,2,· · · , M), is associated with a subsolution, denoted asXi
sub, which is selected fromLaccording to Tchebycheff rule as
follows:
Xsubi = argminX∈Lgte(X|Wi, z∗) (3.2.12) where Tchebycheff value is defined as:
gte(X|Wi, z∗) = max
1≤j≤mf{wj
|fj(X)−zj∗ |
fj,range } (3.2.13)
andz∗ = (z∗
1,· · ·, zj∗,· · · , zmf∗ )is the reference point,i.e. zj∗ = min{fj(X)| X ∈L};fj,range is the estimated range offj,i.e. fj,range = max{fj(Y)|Y ∈
L} −min{fj(X)|X ∈L}.
After each weight vector is assigned with a subsolution, the neighboring solu- tions ofXi
3.2 The MOLA Method 78
belong to setD(i). The set of the neighboring solutions ofXi
subis denoted as
setNei(Xi
sub), and it can be formulated as:
Nei(Xsubi ) ={Xsubj |j ∈D(i)} (3.2.14)
• With the neighborhood classification introduced above, each subsolutionXi sub
is updated based on its neighboring solutions. Two indexes will be randomly selected fromD(i). Suppose the two indexes are denoted asr1andr2. Then a new solution can be generated according to the following learning operation:
¯
y=Xi
sub+ 0.5×(Xsubr1 −Xsubr2 ) (3.2.15)
¯
y will be used to replace Xi
sub if its Tchebycheff value is smaller than that
ofXi
sub. Simultaneously,y¯will be used to update Xbest andL according to
(3.2.6) and (3.2.7) respectively if it is not dominated by any solution stored in L.