The process of searching and learning

3.2 The MOLA Method

3.2.3 The process of searching and learning

The calculation of MOLA consists of two parts: the process of searching and the process of learning from neighborhood. They are introduced in the following two subsections respectively.

The process of searching

The process of searching is carried out through a tournament that is held between two types of search, Pareto global search and Pareto local search. The rule of the tournament will be introduced in Section 3.2.4. In this subsection, Pareto global search and Pareto local search are introduced respectively as follows:

Pareto global search (PGS): Starting with the first stateX, by selecting automa- toni, at dimensional statexi, a search is undertaken by calculating the path values

3.2 The MOLA Method 74

of two possible paths using the cell values obtained in the previous iterations, as given in (3.2.1), and then movingxi towards a path, which is selected according to the probability calculated using the path values, as in (3.2.2), with a step lengthη, which is determined by the probability calculated from the cell values of the cells locating on the path, as in (3.2.3). A reinforcement signal is generated using dimensional state xi, according to (3.2.5). Then the estimated path values and the

reinforcement signal are used to update the cell value ofci,j, in which dimensional statexi locates, using (3.2.4). In this case the objective function values of the mul- tiple objective functions are calculated once, and the cell value of V(ci,j)_|xi∈ci,j is updated to replace the cell value that was stored previously when any dimensional state located within cellci,jwas visited. At the same time,Xbest and elite list_Lare updated according to (3.2.6) and (3.2.7) respectively. If the reinforcement signal is not equal to 1, the current iteration ends and the next automaton is selected in order for MOLA computation to continue. Otherwise searching on this path is regarded worthy, as the cells locating on this path could be very sensitive and need more exploitation. In order to further exploit the potential of the dimensional states located on this path, the search action is taken continuously forIemax times in the same way presented in (3.2.1), (3.2.2), (3.2.3), (3.2.5), (3.2.4), (3.2.6) and (3.2.7). An iteration completes after the exploitation, which has updatedXbestand all values of the cells

visited, before the next automaton is selected. After all automata are sequentially selected once, in order to increase the diversity and fully explore the probability of finding the Pareto set, different perturbations are introduced to the non-dominated solutions stored in the elite list, according to the following rule:

X′ _←X+ ∆ +β(Xbest₋X) (3.2.9)

where∆i = sign(κ)ζ(xmax,i−xmin,i), ζ is a random variable and ζ ∈ [0,_Dk]. The

sign function is used to choose the moving direction ofxi. Supposexi is located in cellci,j, then the input to the sign function is the subtraction of the two adjacent cell values ofci,j, that is,κ= (V(ci,j+1)₋V(ci,j−1)). Ifκ≥0,sign(κ) = 1, otherwise,

sign(κ) = ₋1. It can be seen that xi moves towards the direction in which the adjacent cell has a larger cell value. Once completing the perturbation operation, the obtained solutions are used to updateXbest and_L, according to (3.2.6) and (3.2.7)

Minimize _f> M inim iz e f ? d@ dA d> dB Obtained solutions dA C> dD F*

Figure 3.2: The illustration of findingF∗

respectively.

Pareto local search (PLS): PLS is similar to PGS with the exception of the following two aspects:

1. The perturbation operation used in PGS,i.e.(3.2.9), is not performed in PLS. 2. The calculation of reinforcement signal is different from that used in PGS. To obtain the reinforcement signal, a target objective function vector,F∗_{, which} is regarded as the aim of the PLS, should be set at the beginning of the search.

F∗_{can be determined by the following method. First, the Euclidean distance} between every two consecutive objective function vectors is calculated, as given in Fig. 3.2, in which solid circles denote the objective function vectors of the solutions stored in_L. The pair of the two consecutive objective function vectors, which has the largest Euclidean distance, is taken as two base vertexes to create an isoceles triangle, in which the height is half of the base length. Then the third vertex of the triangle is selected asF∗_{, as given in Fig. 3.2.} During the process of Pareto local search, the fitness value of state X(x′

can be obtained by calculating the Euclidean distance between F∗ _{and the}

objective function vector of the solution, i.e. F(X(x′

i)). For simplicity, the

Euclidean distance betweenF∗_and_F₍_X₍_x′

3.2 The MOLA Method 76

Then a reinforcement signal is generated by the following rule:

r(X(x′_i)) = (

1 if d(F∗_{, F}₍_X₍_x′

i)))≤d(F∗, F(Xlbest))

0 otherwise (3.2.10)

where Xlbest (called the local best solution) denotes the latest best solution

found during the process of Pareto local search. In other words, among all the solutions found during the process of Pareto local search, the objective function vector of solutionXlbest is closest toF∗_{. It can be seen that} _r _{= 1}_rep- resents a favorable response, as the objective function vector of the obtained solution X(x′

i) is closer to F∗ than that of Xlbest; r = 0 is an unfavorable

response.

The relationship between stateX(x′

i)andXlbest can be described by the fol-

lowing equation: Xlbest_← ( X(x′ i) if r= 1 Xlbest otherwise (3.2.11) whereX(x′

i) = [x1, . . . , xi−1, x′i, xi+1, . . . , xN]. In the process of Pareto local

search,Xlbestis updated according to (3.2.11), and at the same time,Xlbestis used to updateXbest andLaccording to (3.2.6) and (3.2.7) respectively.

The process of learning

Suppose the current state, X, is one of the non-dominated solutions stored in elite list_L, as shown in Fig. 3.3, which plots the objective function vectors,F(_·), of the non-dominated solutions in the objective space. The solutions whose objective function vectors locate at the adjacent area ofF(X),e.g.areaE1, are the neighbor-

ing solutions ofX. It is believed that the neighboring solutions of X carry more

useful information that can benefit the search around X, and they can make more

contributions than remote solutions,e.g.the solutions whoseF(_·)locates at areaE2.

Here, a method of clarifying the relationship of neighborhood among the non- dominated solutions is given first. Then the operation of learning from neighborhood can be performed based on the obtained solutions.

E E F G H F G IJKLMNO P L Q R S T U V W XY U Z [\]^ _` _a bcdefgdh O ei NL g

Figure 3.3: The illustration ofX’s neighborhood

• To find out the relationship among the non-dominated solutions, an uniform

spread ofM weight vectors, (W1,_{· · ·} ,Wi,_{· · ·},WM), whereWi = [w1,_{· · ·} , wmf]andP(W) = 1, are initialized at the beginning of the learning process

[103]. Then the Euclidean distances between any two weight vectors are calculated, and theM/4closest weight vectors to each weight vector are found

according to the Euclidean distances. Let the indexes of the M/4 closest

weight vectors toWibe denoted as setD(i).

Each weight vector,Wi (i = 1,2,_{· · ·} , M), is associated with a subsolution, denoted asXi

sub, which is selected fromLaccording to Tchebycheff rule as

follows:

X_subi = argmin_X_∈_Lgte(X_|Wi, z∗) (3.2.12) where Tchebycheff value is defined as:

gte(X_|Wi, z∗) = max

1≤j≤mf{wj

|fj(X)₋zj∗ |

fj,range } (3.2.13)

andz∗ _{= (}_z∗

1,· · ·, zj∗,· · · , zmf∗ )is the reference point,i.e. zj∗ = min{fj(X)| X _∈L};fj,range is the estimated range offj,i.e. fj,range = max_{fj(Y)_|Y _∈

L} −min_{fj(X)_|X _∈L}.

After each weight vector is assigned with a subsolution, the neighboring solutions ofXi

3.2 The MOLA Method 78

belong to setD(i). The set of the neighboring solutions ofXi

subis denoted as

setNei(Xi

sub), and it can be formulated as:

Nei(X_subi ) =_{X_subj _|j _∈D(i)_} (3.2.14)

• With the neighborhood classification introduced above, each subsolutionXi sub

is updated based on its neighboring solutions. Two indexes will be randomly selected fromD(i). Suppose the two indexes are denoted asr1andr2. Then a new solution can be generated according to the following learning operation:

y=Xi

sub+ 0.5×(Xsubr1 −Xsubr2 ) (3.2.15)

y will be used to replace Xi

sub if its Tchebycheff value is smaller than that

ofXi

sub. Simultaneously,y¯will be used to update Xbest andL according to

(3.2.6) and (3.2.7) respectively if it is not dominated by any solution stored in L.

In document Multi-objective optimisation using learning automata and its applications in power systems (Page 91-96)