SMS-EMOA customisation for feature selection

3. Feature Selection

3.2. Evolutionary feature selection

3.2.4. SMS-EMOA customisation for feature selection

The S-metric selection evolutionary multi-objective algorithm (SMS-EMOA) was introduced in [54]. It is a (µ + 1)-EA, i.e., it generates one offspring per generation. It uses a hypervolume-based metric to select individuals, so that both the quality and the distribution of the solutions are evaluated. The contribution of a solution $a_i$ to the S-metric (the hypervolume) of the complete population is measured as follows:

$$\Delta S(a_i) = S(a_1, \ldots, a_{P_{ND}}) - S(a_1, \ldots, a_{i-1}, a_{i+1}, \ldots, a_{P_{ND}}). \quad (3.7)$$
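For two objectives, both S and $\Delta S(a_i)$ from Eq. (3.7) can be computed with a simple sweep. The Python sketch below assumes that both objectives are minimised and that a reference point `ref` is given; the function names are illustrative. The leave-one-out computation of $\Delta S$ is chosen for readability, not efficiency, and is not the efficient method referred to in [10].

```python
from typing import List, Tuple

Point = Tuple[float, float]


def hypervolume_2d(points: List[Point], ref: Point) -> float:
    """S-metric of `points` for two minimised objectives, bounded by `ref`."""
    # Only points that strictly dominate the reference point add area.
    pts = sorted(p for p in points if p[0] < ref[0] and p[1] < ref[1])
    area = 0.0
    best_f2 = ref[1]  # lowest second objective seen so far in the sweep
    for f1, f2 in pts:  # sweep in ascending order of the first objective
        # Points that do not lower f2 are weakly dominated by an earlier point.
        if f2 < best_f2:
            area += (ref[0] - f1) * (best_f2 - f2)
            best_f2 = f2
    return area


def delta_s(points: List[Point], ref: Point) -> List[float]:
    """Delta S(a_i) from Eq. (3.7): hypervolume lost when a_i is left out."""
    total = hypervolume_2d(points, ref)
    return [total - hypervolume_2d(points[:i] + points[i + 1:], ref)
            for i in range(len(points))]


front = [(1.0, 3.0), (2.0, 1.0), (3.0, 0.0)]
print(hypervolume_2d(front, (4.0, 4.0)))  # 8.0
print(delta_s(front, (4.0, 4.0)))         # [1.0, 2.0, 1.0]
```

In this example, the middle solution exclusively covers the rectangle $[2, 3] \times [1, 3]$, so its $\Delta S$ is twice that of the two outer solutions.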

Figure 3.3 illustrates the difference between S and $\Delta S(a_i)$. The filled area in the left subfigure corresponds to the hyperarea covered by the population of solutions, which are marked with small squares. In the right subfigure, the $\Delta S(a_i)$ areas correspond to the large filled rectangles.

Figure 3.3.: Examples for the estimation of hypervolume (left) and $\Delta S(a_i)$ (right). The solutions in the objective space are marked with squares. The reference point is marked with an asterisk. The first two non-dominated fronts $F_1$, $F_2$ are marked with thin lines in the right subfigure.

SMS-EMOA applies the fast non-dominated sorting [39] before selection. The solution fronts are built according to the Pareto dominance relation. First, the individuals that are not Pareto-dominated by any other solution are assigned to the first front. Then, the same procedure is applied to the remaining individuals, and it is repeated until the complete population is assigned to fronts. The right subfigure of Fig. 3.3 shows two fronts $F_1$, $F_2$, marked with thin lines.
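The front construction can be sketched in Python using the counting scheme of the fast non-dominated sorting: for every solution, we store the set of solutions it dominates and the number of solutions that dominate it. The sketch assumes that all objectives are minimised; the function names are illustrative and not taken from the thesis implementation.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a Pareto-dominates b (all objectives minimised)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated_sort(objs: List[Sequence[float]]) -> List[List[int]]:
    """Split a population into fronts F1, F2, ... (lists of indices into objs)."""
    n = len(objs)
    dominated_sets: List[List[int]] = [[] for _ in range(n)]  # solutions p dominates
    dom_counts = [0] * n  # number of solutions that dominate p
    fronts: List[List[int]] = [[]]
    for p in range(n):
        for q in range(n):
            if dominates(objs[p], objs[q]):
                dominated_sets[p].append(q)
            elif dominates(objs[q], objs[p]):
                dom_counts[p] += 1
        if dom_counts[p] == 0:  # not dominated by any solution: first front
            fronts[0].append(p)
    i = 0
    while fronts[i]:
        next_front: List[int] = []
        for p in fronts[i]:
            for q in dominated_sets[p]:
                dom_counts[q] -= 1  # discount domination by the current front
                if dom_counts[q] == 0:  # q is dominated only by earlier fronts
                    next_front.append(q)
        i += 1
        fronts.append(next_front)
    return fronts[:-1]  # drop the trailing empty front


print(non_dominated_sort([(1, 3), (2, 1), (2, 3), (3, 3)]))  # [[0, 1], [2], [3]]
```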

The SMS-EMOA selection operator removes the individual $a_j$ with the smallest $\Delta S(a_j)$ from the worst front. The advantage of this method is that, although the number of solutions that are non-comparable according to the Pareto dominance relation increases strongly with the number of objectives, $\Delta S(a_j)$ can still be estimated to compare such solutions, and this can be done efficiently [10].⁴
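The selection step can be sketched as follows for a population of µ + 1 solutions. This sketch is restricted to two minimised objectives so that the hypervolume can be computed by a simple sweep; with more objectives, dedicated hypervolume algorithms are required. The worst front is found by repeatedly peeling off non-dominated sets, and all names are illustrative.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a Pareto-dominates b (all objectives minimised)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def worst_front(objs: List[Sequence[float]]) -> List[int]:
    """Indices of the last front, found by repeatedly peeling off non-dominated sets."""
    remaining = list(range(len(objs)))
    while True:
        front = [i for i in remaining
                 if not any(dominates(objs[j], objs[i]) for j in remaining)]
        rest = [i for i in remaining if i not in front]
        if not rest:
            return front
        remaining = rest


def hypervolume_2d(points: List[Tuple[float, float]], ref: Tuple[float, float]) -> float:
    """Area dominated by points (two minimised objectives) and bounded by ref."""
    area, best_f2 = 0.0, ref[1]
    for f1, f2 in sorted(p for p in points if p[0] < ref[0] and p[1] < ref[1]):
        if f2 < best_f2:
            area += (ref[0] - f1) * (best_f2 - f2)
            best_f2 = f2
    return area


def sms_emoa_reduce(objs: List[Tuple[float, float]], ref: Tuple[float, float]) -> int:
    """Index of the individual removed from a population of mu + 1 solutions."""
    front = worst_front(objs)
    if len(front) == 1:
        return front[0]
    pts = [objs[i] for i in front]
    total = hypervolume_2d(pts, ref)
    contributions = [total - hypervolume_2d(pts[:k] + pts[k + 1:], ref)
                     for k in range(len(pts))]
    # Remove the solution with the smallest Delta S on the worst front.
    k_min = min(range(len(pts)), key=lambda k: contributions[k])
    return front[k_min]


population = [(1.0, 3.0), (2.0, 1.0), (3.0, 0.5)]
print(sms_emoa_reduce(population, (4.0, 4.0)))  # 2
```

In the example, all three solutions form a single front, and the third one has the smallest exclusive contribution (0.5), so it is discarded.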

For the solution representation, the natural choice was an $F$-dimensional bit vector $q$, where $q_j = 1$ if the feature $X_j$ is selected, and $q_j = 0$ otherwise ($j \in \{1, \ldots, F\}$).
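How a bit vector maps to the feature subset passed to a classifier can be illustrated in a few lines of Python. The helper names are hypothetical, and the sketch uses 0-based indices, whereas the text counts features from 1 to F.

```python
from typing import List, Sequence


def selected_features(q: Sequence[int]) -> List[int]:
    """Indices j with q_j = 1 (0-based, whereas the text counts j = 1, ..., F)."""
    return [j for j, bit in enumerate(q) if bit == 1]


def project(x: Sequence[float], q: Sequence[int]) -> List[float]:
    """Keep only the values of the features selected by q in one feature vector x."""
    if len(x) != len(q):
        raise ValueError("x and q must both have length F")
    return [value for value, bit in zip(x, q) if bit == 1]


q = [1, 0, 1, 0]                         # F = 4, features X_1 and X_3 selected
print(selected_features(q))              # [0, 2]
print(project([0.5, 1.2, 3.4, 0.0], q))  # [0.5, 3.4]
```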

As the mutation operator, we integrated the asymmetric bit flip, where the probability of switching bit $j$ is equal to:

$$p_q(j) = \frac{\gamma}{F} \cdot |q_j - p_{01}|, \quad (3.8)$$

where $\gamma$ controls the general mutation probability and is equal to the expected number of flips during the generation of one offspring for the symmetric variant of the bit flip mutation. In the asymmetric mutation, the flip probability $\gamma / F$ is multiplied by the factor $|q_j - p_{01}|$, as proposed in [91]. Here, $p_{01}$ controls the probability of a zero-to-one switch, and the probability of a one-to-zero switch is set to $p_{10} = 1 - p_{01}$; that is, a zero bit flips with probability $\frac{\gamma}{F} p_{01}$ and a one bit with probability $\frac{\gamma}{F} p_{10}$. Because we try to discard as many irrelevant, noisy, and redundant features as possible, it is reasonable to set $p_{01} \ll p_{10}$. In our previous studies, $p_{01} \in \{0.01; 0.1\}$ performed quite well [217, 219].
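The asymmetric bit flip from Eq. (3.8) can be sketched as follows. The function names and the use of Python's `random.Random` are assumptions for illustration; the bit vector is assumed to be a list of 0/1 integers, and each bit is flipped independently.

```python
import random
from typing import List, Optional


def flip_probability(bit: int, F: int, gamma: float, p01: float) -> float:
    """Eq. (3.8): p_q(j) = gamma / F * |q_j - p01|."""
    return gamma / F * abs(bit - p01)


def asymmetric_bit_flip(q: List[int], gamma: float, p01: float,
                        rng: Optional[random.Random] = None) -> List[int]:
    """Return a mutated copy of the bit vector q; the parent is left unchanged."""
    rng = rng or random.Random()
    F = len(q)
    child = list(q)
    for j, bit in enumerate(q):
        # A zero flips with probability gamma/F * p01,
        # a one with probability gamma/F * (1 - p01) = gamma/F * p10.
        if rng.random() < flip_probability(bit, F, gamma, p01):
            child[j] = 1 - bit
    return child


parent = [1, 0, 0, 1, 0, 0, 0, 0, 0, 1]
child = asymmetric_bit_flip(parent, gamma=1.0, p01=0.1, rng=random.Random(42))
print(child)
```

For this parent ($\gamma = 1$, $F = 10$, $p_{01} = 0.1$, three selected features), the expected number of flips is $\frac{1}{10}(7 \cdot 0.1 + 3 \cdot 0.9) = 0.34$, of which $0.27$ are one-to-zero switches, so mutation mostly removes features.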

As the first crossover operator, we implemented a uniform crossover (UC), which takes each bit value from either the first or the second parent with equal probability. The second operator was a commonality-based crossover (CBC), which was proposed for FS in [53]. Here, the non-shared bits of both parents are inherited from parent $k$ with the probability

$$p_c(k) = \frac{n_k - n_c}{n_u}, \quad (3.9)$$

where $n_k$ is the number of ones of parent $k$, $n_c$ is the number of ones shared by both parents, and $n_u$ is the number of non-shared ones of both parents.
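Both crossover operators can be sketched as follows. This is an illustrative reading of the description above, not the thesis implementation: we assume that bits on which both parents agree are copied unchanged and that each non-shared bit is drawn independently. Because $n_u = (n_1 - n_c) + (n_2 - n_c)$, the two probabilities from Eq. (3.9) sum to one.

```python
import random
from typing import List, Optional


def uniform_crossover(a: List[int], b: List[int],
                      rng: Optional[random.Random] = None) -> List[int]:
    """UC: each bit is taken from parent a or parent b with probability 0.5."""
    rng = rng or random.Random()
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]


def commonality_based_crossover(a: List[int], b: List[int],
                                rng: Optional[random.Random] = None) -> List[int]:
    """CBC: shared bits are kept, each non-shared bit comes from parent k w.p. p_c(k)."""
    rng = rng or random.Random()
    n_a, n_b = sum(a), sum(b)
    n_c = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)  # shared ones
    n_u = (n_a - n_c) + (n_b - n_c)  # non-shared ones of both parents
    if n_u == 0:
        return list(a)  # identical parents: nothing to choose
    p_a = (n_a - n_c) / n_u  # Eq. (3.9) for parent a; p_c(b) = 1 - p_a
    child = []
    for x, y in zip(a, b):
        if x == y:
            child.append(x)  # bit on which both parents agree
        else:
            child.append(x if rng.random() < p_a else y)
    return child


print(commonality_based_crossover([1, 1, 0, 0], [1, 0, 0, 0]))  # [1, 1, 0, 0]
```

In the example, only parent a has a non-shared one, so $p_c(a) = 1$ and the child always equals parent a.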

However, in [219] we could not observe any significant advantage of either the UC or the CBC operator. Therefore, we omitted crossover in the subsequent studies, which are described in Sections 5.1.2 to 5.2.

Besides, we experimented with different settings of the other SMS-EMOA parameters:

• The initial feature rate ifr controls the expected number of features in the first population after the initialisation. Here, each bit is set to one with the probability ifr, and we used ifr ∈ {0.05; 0.2; 0.5}. In [219], we observed that this parameter interacted with the choice of classifier: SVM performed worse for lower ifr values, whereas this behaviour was not observed for other classifiers. In general, it is hard to give an exact recommendation for this parameter. Small ifr values correspond to solutions with larger hypervolumes at the beginning. This may sometimes be advantageous, but it may also lead to fast convergence to a local optimum. Therefore, we used two or three different ifr values in further studies.

• Population size µ should be large enough to provide a good distribution of solutions. It was set to 30 for instrument recognition, described in Section 5.1.1, and increased to 50 for the other experiments, described in Sections 5.1.2 to 5.2.

• As the stopping condition, we chose the number of SMS-EMOA generations. After preliminary experiments, it was set to 2,000 for the studies described in Section 5.1 and to 3,000 for the recognition of genres and styles discussed in Section 5.2. Higher values may further increase the classification performance, but at the cost of longer computing times.

⁴ With four or more objectives, the related optimisation problems are referred to as many-objective [89]. Such scenarios are currently unexplored for music classification. They may be reasonable if several conflicting metrics listed in Sections 4.1.1 and 4.1.2 are considered.
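The initialisation controlled by ifr, together with the (µ + 1) loop and the generation-based stopping condition from the list above, can be sketched as a generic skeleton. The fitness function, the mutation, and the removal rule are hypothetical callables passed in by the caller (in SMS-EMOA, the removal rule is the $\Delta S$-based selection described earlier in this section). Choosing the parent uniformly at random is an assumption of this sketch.

```python
import random
from typing import Callable, List, Tuple

Bits = List[int]
Objectives = Tuple[float, ...]


def init_population(mu: int, F: int, ifr: float, rng: random.Random) -> List[Bits]:
    """Each of the F bits is set to one independently with probability ifr."""
    return [[1 if rng.random() < ifr else 0 for _ in range(F)] for _ in range(mu)]


def mu_plus_one_loop(mu: int, F: int, ifr: float, generations: int,
                     evaluate: Callable[[Bits], Objectives],
                     mutate: Callable[[Bits, random.Random], Bits],
                     choose_removal: Callable[[List[Objectives]], int],
                     rng: random.Random) -> Tuple[List[Bits], List[Objectives]]:
    """Steady-state (mu + 1) loop, stopped after a fixed number of generations."""
    pop = init_population(mu, F, ifr, rng)
    objs = [evaluate(q) for q in pop]
    for _ in range(generations):
        parent = pop[rng.randrange(mu)]  # assumption: uniform random parent choice
        child = mutate(parent, rng)
        pop.append(child)
        objs.append(evaluate(child))
        idx = choose_removal(objs)  # e.g. remove the smallest Delta S on the worst front
        del pop[idx]
        del objs[idx]
    return pop, objs
```

With the settings listed above, the genre and style experiments would correspond to `mu=50` and `generations=3000`.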

A more exhaustive search for the optimal parameter settings was beyond the scope of our study, although further investigations in that direction would be reasonable in the future.

In document Improving supervised music classification by means of multi-objective evolutionary feature selection (Page 66-68)