3. Feature Selection
3.2. Evolutionary feature selection
3.2.4. SMS-EMOA customisation for feature selection
The S-metric selection evolutionary multi-objective algorithm (SMS-EMOA) was introduced in [54]. It is a (µ + 1)-EA that uses a hypervolume-related metric for the selection of individuals, so that both the quality and the distribution of the solutions are evaluated. The contribution of a solution ai to the S-metric S of the complete population is measured as follows:
$$ \Delta S(a_i) = S(a_1, \ldots, a_{P_{ND}}) - S(a_1, \ldots, a_{i-1}, a_{i+1}, \ldots, a_{P_{ND}}). \tag{3.7} $$
Figure 3.3 illustrates the difference between S and ∆S(ai). The filled area in the left subfigure corresponds to the hyperarea covered by the population of solutions. The solutions are marked with small squares. In the right subfigure, the ∆S(ai) areas correspond to the large filled rectangles.
Figure 3.3: Examples for the estimation of hypervolume (left) and ∆S(ai) (right). The solutions in the objective space are marked with squares. The reference point is marked with an asterisk. The first two non-dominated fronts F1, F2 are marked with thin lines in the right subfigure.
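To make Eq. (3.7) concrete, the following minimal Python sketch computes S and ∆S(ai) for the two-objective case. It assumes that both objectives are minimised and that the reference point is worse than the solutions in both objectives. The function names hypervolume_2d and contributions are illustrative and are not taken from [54].

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def hypervolume_2d(points: Sequence[Point], ref: Point) -> float:
    """Area dominated by `points` and bounded by `ref` (both objectives minimised)."""
    # Only points that are strictly better than the reference point cover any area.
    pts = sorted(p for p in points if p[0] < ref[0] and p[1] < ref[1])
    area = 0.0
    best_f2 = ref[1]  # lowest value of the second objective seen so far
    for f1, f2 in pts:  # sweep in order of increasing first objective
        if f2 < best_f2:  # dominated points add no new area
            area += (ref[0] - f1) * (best_f2 - f2)
            best_f2 = f2
    return area


def contributions(points: Sequence[Point], ref: Point) -> List[float]:
    """Delta S(a_i) as in Eq. (3.7): S of all points minus S without a_i."""
    pts = list(points)
    total = hypervolume_2d(pts, ref)
    return [total - hypervolume_2d(pts[:i] + pts[i + 1:], ref)
            for i in range(len(pts))]


if __name__ == "__main__":
    population = [(1.0, 4.0), (2.0, 2.0), (4.0, 1.0)]
    reference = (5.0, 5.0)
    print(hypervolume_2d(population, reference))  # S of the whole population
    print(contributions(population, reference))   # one Delta S value per solution
```

Recomputing S once per solution is simple but not the fastest option; efficient contribution algorithms such as the one referenced in [10] avoid this repeated work.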
SMS-EMOA applies the fast non-dominated sorting [39] before selection. The solution fronts are built according to the Pareto dominance relation. First, the individuals that are not Pareto dominated by any other solution are assigned to the first front. Then, the same procedure is applied to the remaining individuals, and it is repeated until the complete population is assigned to fronts. The right subfigure of Fig. 3.3 shows two fronts F1, F2, marked with thin lines.
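The front-building procedure described above can be sketched in Python as follows. This is a deliberately simple version that peels off one front after another; it does not reproduce the bookkeeping that makes the algorithm of [39] fast. Minimisation of all objectives is assumed, and the names dominates and sort_into_fronts are illustrative.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` Pareto dominates `b` (all objectives minimised)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def sort_into_fronts(objectives: Sequence[Sequence[float]]) -> List[List[int]]:
    """Return the fronts F1, F2, ... as lists of indices into `objectives`."""
    remaining = list(range(len(objectives)))
    fronts: List[List[int]] = []
    while remaining:
        # The current front holds every remaining individual that is not
        # dominated by any other remaining individual.
        front = [i for i in remaining
                 if not any(dominates(objectives[j], objectives[i])
                            for j in remaining if j != i)]
        fronts.append(front)
        remaining = [i for i in remaining if i not in front]
    return fronts


if __name__ == "__main__":
    objs = [(1, 4), (2, 2), (4, 1), (3, 3), (4, 4)]
    print(sort_into_fronts(objs))
```

The last list returned by sort_into_fronts is the worst front, from which the selection operator removes an individual.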
The SMS-EMOA selection operator removes the individual aj with the smallest ∆S(aj) from the worst front. This method is advantageous because, with an increasing number of objectives, the number of solutions that are non-comparable according to the Pareto dominance
relation increases strongly, but it is still possible to estimate ∆S(aj) for the comparison of solutions and to do so efficiently [10].⁴
As the solution representation, a natural choice is an F-dimensional bit vector q, where qj = 1 if the feature Xj is selected and qj = 0 otherwise (j ∈ {1, ..., F}).
As a mutation operator, we integrated the asymmetric bit flip, where the probability of switching a bit is equal to:
$$ p_q(j) = \frac{\gamma}{F} \cdot |q_j - p_{01}|, \tag{3.8} $$
where γ controls the general mutation probability and is equal to the expected number of flips during the generation of an offspring for the symmetric variant of the bit flip mutation. In the asymmetric mutation, the probability of a bit flip is scaled down by the factor |qj − p01|, as proposed in [91]. p01 controls the probability of a zero-to-one switch, and the probability of a one-to-zero switch is set to p10 = 1 − p01. Because we try to discard as many irrelevant, noisy, and redundant features as possible, it is reasonable to set p01 ≪ p10. In our previous studies, p01 ∈ {0.01; 0.1} performed quite well [217, 219].
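A minimal Python sketch of the asymmetric bit flip may help to illustrate Eq. (3.8). It reads the equation as (γ/F) · |qj − p01|, which matches the statement that the symmetric probability γ/F is reduced by the factor |qj − p01|. The function names and the explicit random generator argument are illustrative assumptions.

```python
import random
from typing import List, Sequence


def flip_probability(bit: int, gamma: float, n_features: int, p01: float) -> float:
    """Eq. (3.8): (gamma / F) * |q_j - p01| for a single bit q_j."""
    return gamma / n_features * abs(bit - p01)


def asymmetric_bit_flip(q: Sequence[int], gamma: float, p01: float,
                        rng: random.Random) -> List[int]:
    """Return a mutated copy of the bit vector `q`."""
    n_features = len(q)  # F in the text
    child = []
    for bit in q:
        # A zero bit is scaled by p01, a one bit by p10 = 1 - p01.
        if rng.random() < flip_probability(bit, gamma, n_features, p01):
            child.append(1 - bit)
        else:
            child.append(bit)
    return child


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed only for reproducibility of the demo
    parent = [1 if rng.random() < 0.2 else 0 for _ in range(100)]
    offspring = asymmetric_bit_flip(parent, gamma=1.0, p01=0.01, rng=rng)
    print(sum(parent), sum(offspring))  # number of selected features before and after
```

With γ = 1, F = 100, and p01 = 0.01, a zero bit is switched on with probability 0.0001 and a one bit is switched off with probability 0.0099, so the mutation pushes the search towards smaller feature sets.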
As the first crossover operator, we implemented a uniform crossover (UC), which selects each bit value either from the first or from the second parent with equal probability. The second operator was a commonality-based crossover (CBC), which was proposed for FS in [53]. Here, the non-shared bits of both parents are inherited from the parent k with the probability
$$ p_c(k) = \frac{n_k - n_c}{n_u} \tag{3.9} $$
(nk is the number of ones for parent k, nc is the number of the shared ones for both
parents, and nu is the number of non-shared ones for both parents).
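Both crossover operators can be sketched in Python as follows. For CBC, the sketch assumes that the choice between the parents is made independently for each non-shared bit with the probability from Eq. (3.9); this is one possible reading of the description and not necessarily the exact procedure of [53]. The fallback value 0.5 for identical parents (nu = 0) is also an assumption, and all function names are illustrative.

```python
import random
from typing import List, Sequence


def uniform_crossover(p1: Sequence[int], p2: Sequence[int],
                      rng: random.Random) -> List[int]:
    """UC: each bit comes from either parent with equal probability."""
    return [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]


def cbc_probability(p1: Sequence[int], p2: Sequence[int]) -> float:
    """Eq. (3.9) for the first parent: (n_1 - n_c) / n_u."""
    n_1 = sum(p1)                               # ones in parent 1
    n_c = sum(a & b for a, b in zip(p1, p2))    # ones shared by both parents
    n_u = sum(a ^ b for a, b in zip(p1, p2))    # ones present in only one parent
    # Assumption: with identical parents nothing is non-shared, so any value works.
    return (n_1 - n_c) / n_u if n_u else 0.5


def commonality_based_crossover(p1: Sequence[int], p2: Sequence[int],
                                rng: random.Random) -> List[int]:
    """CBC: shared bits are copied, non-shared bits follow Eq. (3.9)."""
    pc1 = cbc_probability(p1, p2)
    return [a if a == b or rng.random() < pc1 else b for a, b in zip(p1, p2)]


if __name__ == "__main__":
    rng = random.Random(7)  # fixed seed only for reproducibility of the demo
    parent1 = [1, 1, 1, 0, 0]
    parent2 = [1, 0, 0, 1, 0]
    print(cbc_probability(parent1, parent2))
    print(commonality_based_crossover(parent1, parent2, rng))
    print(uniform_crossover(parent1, parent2, rng))
```

Because nu = (n1 − nc) + (n2 − nc), the probabilities pc(1) and pc(2) sum to one, so the parent with more non-shared ones is favoured.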
However, in [219] we could not observe any significant advantage of either the UC or the CBC operator. Therefore, we left out the crossover in the further studies, which are described in Sections 5.1.2 to 5.2.
Besides, we have experimented with different settings of the other SMS-EMOA parameters:
• Initial feature rate ifr controls the expected number of features in the first population after the initialisation. Here, each bit is set to one with the probability ifr, and we used ifr ∈ {0.05; 0.2; 0.5}. In [219], we observed that the effect of this parameter depended on the classifier: SVM performed worse for lower ifr values, and this behaviour was not observed for other classifiers. In general, it is hard to provide an exact recommendation for this parameter. Small ifr values correspond to solutions with larger hypervolumes at the beginning. This may sometimes be advantageous but may also lead to a fast convergence to a local optimum. Therefore, we used two or three different ifr values in further studies.
⁴ In the case of four or more objectives, the related optimisation problems are referred to as many-objective [89]. Such scenarios are currently unexplored for music classification. They can be reasonable if several conflicting metrics listed in Sections 4.1.1 and 4.1.2 are considered.
• Population size µ should be large enough to provide a good distribution of solutions. It was set to 30 for the instrument recognition described in Section 5.1.1 and increased to 50 for the other experiments, which are described in Sections 5.1.2 to 5.2.
• As a stopping condition, we have chosen the number of SMS-EMOA generations, which was set after the preliminary experiments to 2,000 for the studies described in Section 5.1 and to 3,000 for the recognition of genres and styles, which is discussed in Section 5.2. Setting this number to higher values may further increase the classification performance, but at the cost of longer computing times.
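The role of ifr and µ in the initialisation can be illustrated with a short Python sketch. It only draws the first population; the function name initial_population and the feature count used in the demo are hypothetical.

```python
import random
from typing import List


def initial_population(mu: int, n_features: int, ifr: float,
                       rng: random.Random) -> List[List[int]]:
    """Draw `mu` bit vectors; each bit is set to one with probability `ifr`."""
    return [[1 if rng.random() < ifr else 0 for _ in range(n_features)]
            for _ in range(mu)]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed only for reproducibility of the demo
    n_features = 1000       # hypothetical number of features F
    for ifr in (0.05, 0.2, 0.5):
        population = initial_population(mu=50, n_features=n_features, ifr=ifr, rng=rng)
        mean_selected = sum(map(sum, population)) / len(population)
        print(ifr, mean_selected)  # close to ifr * F on average
```

Each individual starts with about ifr · F selected features, which is why a small ifr yields compact initial feature sets.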
It is important to mention that a more exhaustive search for the optimal parameter settings was beyond the scope of our study. Further investigations in that direction are a reasonable subject for future work.