High-level feature selection - Recognition of genres and styles

5. Application of Feature Selection

5.2. Recognition of genres and styles

5.2.2. High-level feature selection

Since the detailed comparison of the LL and HL feature sets is discussed in Section 5.2.3, we describe here only the results of the classification based on high-level descriptors, as was done for the LL feature set in the previous section.

Figure 5.11 plots the final ND solutions after EMO-FS when only high-level features were used for classification. Two SVM ND solutions, one for Classic and one for HeavyMetal, are not plotted, because both corresponding models had ms_BRE = 0.5, classifying all instances to one category.

The main tendencies are similar to the outcomes from the other studies:

• The large non-dominated fronts provide different trade-off solutions. However, the characteristics of these fronts are not the same: for example, for the category Classic, relatively large feature sets using up to 13.07% of the features lead to the classification with the smallest ms_BRE. For ProgRock, the situation is similar (maximal m_SFR = 0.106). On the other hand, for HeavyMetal it does not make sense to increase the number of features above approximately 4% of the complete feature set, and the number of solutions in the overall ND front across all classification methods is rather low. A similar trend can be observed for Rap.

• For all categories except one (ProgRock), at least three different classifiers contribute to the overall ND front.

• The complexities of the categories are very different: Classic is the easiest category, where at least one solution with ms_BRE < 0.02 is provided by each classifier. The most complex categories are Pop (smallest ms_BRE = 0.1186) and ClubDance (smallest ms_BRE = 0.1252). A possible explanation is that Pop is a rather general genre: e.g., negative Pop examples, i.e. songs which belong to the categories Rap and R’n’B, can in principle also be described as popular, and may share several distributions of high-level characteristics with Pop songs. ClubDance, on the other hand, is a very specific subgenre, which is more difficult to distinguish from other music with strong beat impulses, e.g., dance pop or alternative rock.
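The notion of a non-dominated (ND) front used in the observations above can be illustrated with a minimal sketch. It assumes that both objectives, the selected feature rate m_SFR and the error ms_BRE, are minimised; the function name and the numeric values are hypothetical and are not taken from the experiments.

```python
def non_dominated(points):
    """Return the points not dominated by any other point (both objectives minimised)."""
    front = []
    for p in points:
        # p is dominated if some q is no worse in both objectives and better in at least one
        dominated = any(
            q[0] <= p[0] and q[1] <= p[1] and (q[0] < p[0] or q[1] < p[1])
            for q in points
        )
        if not dominated:
            front.append(p)
    return sorted(set(front))


# Hypothetical (m_SFR, ms_BRE) pairs of trained models
solutions = [(0.02, 0.20), (0.04, 0.15), (0.05, 0.18), (0.10, 0.12), (0.13, 0.12)]
front = non_dominated(solutions)
print(front)
```

In this toy example, the model with m_SFR = 0.13 is removed because another model reaches the same error with fewer features, which mirrors the observation that adding features beyond a certain rate does not pay off for some categories.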

As in the previous section, we first measure the increase of the multi-objective performance after the optimisation. Figure 5.12 plots the increase of the mean dominated hypervolume on the holdout set. For all combinations of a classifier and a categorisation task, EMO-FS proves its general ability to create fronts with solutions which perform better w.r.t. both metrics on the independent holdout set. The increase of hypervolume is higher for ifr = 0.5, because the initial populations start with a significantly larger number of features. The large increase of hypervolume for SVM with ifr = 0.5 comes from the poor performance of the linear kernel with default parameters on larger feature sets: here, all instances are assigned to the same category. It also means that the implemented multi-objective feature selection strongly reduces this disadvantage of the linear kernel.
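For two minimised objectives, the dominated hypervolume is the area between the front and a reference point. The following sketch assumes the reference point (1, 1), i.e. all features selected and all instances misclassified; this choice, the function name and the example fronts are assumptions made for illustration only.

```python
def hypervolume_2d(points, ref=(1.0, 1.0)):
    """Area dominated by `points` and bounded by `ref`; both objectives minimised."""
    # Only points that improve on the reference point in both objectives contribute.
    pts = sorted(p for p in points if p[0] < ref[0] and p[1] < ref[1])
    area = 0.0
    best_y = ref[1]
    for x, y in pts:
        if y < best_y:
            # New horizontal strip added by a point with a lower second objective
            area += (ref[0] - x) * (best_y - y)
            best_y = y
    return area


# Hypothetical fronts of (m_SFR, ms_BRE) pairs before and after the optimisation
initial_front = [(0.5, 0.3)]
final_front = [(0.02, 0.20), (0.04, 0.15), (0.10, 0.12)]

hv_init = hypervolume_2d(initial_front)
hv_final = hypervolume_2d(final_front)
print(hv_final - hv_init)
```

A front whose solutions use fewer features and have lower errors covers a larger area, so a positive difference corresponds to an improvement w.r.t. both metrics.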

The increase of hypervolume is again confirmed as significant by the Wilcoxon signed rank test with the following test setup (repeated from Section 5.2.1); the p-values are equal to 0.002 in all cases:


Figure 5.11.: The best ND fronts after genre and style recognition with the HL set. Circles: C4.5, squares: RF, diamonds: NB, triangles: SVM. The ND fronts for each classifier are indicated with thin lines. The ND fronts across all classifiers are indicated with thick lines, and the markers of the corresponding models are enlarged.

Figure 5.12.: Increase of the relative mean holdout dominated hypervolume after the optimisation. Circles: C4.5, squares: RF, diamonds: NB, triangles: SVM. Large markers: ifr = 0.5, small markers: ifr = 0.2.

• For a fixed classifier and ifr setting, denoted by the index i ∈ {1, ..., 8}, and a fixed classification task, denoted by its index j ∈ {1, ..., 6}, let u(i, j, HL) be the vector of the initial dominated hypervolumes estimated on the holdout set for the experiments with the HL feature set, so that u_k(i, j, HL) = S_{init}H(i, j, k, HL) corresponds to the hypervolume value from the k-th statistical repetition, k ∈ {1, ..., 10}. Similarly, let v(i, j, HL) be the vector of the final dominated hypervolumes estimated on the holdout set, so that v_k(i, j, HL) = S_{fin}H(i, j, k, HL).

• H0: u and v belong to the same probability distribution.
• H1: The distributions are not equal.
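The test setup above can be sketched in plain Python. The sketch assumes the exact two-sided variant of the Wilcoxon signed rank test, computed by enumerating all sign assignments; the function name and the hypervolume values are hypothetical. Under this assumption, ten paired repetitions in which every final value exceeds the initial one give p = 2/2^10 ≈ 0.002, which would be consistent with identical p-values of 0.002 in all cases.

```python
from itertools import product


def wilcoxon_exact_p(u, v):
    """Two-sided exact p-value of the Wilcoxon signed rank test for paired samples."""
    d = [b - a for a, b in zip(u, v) if b != a]  # zero differences are dropped
    n = len(d)
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:  # assign average ranks to tied absolute differences
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    w_plus = sum(r for r, x in zip(ranks, d) if x > 0)
    # Under H0, each difference is positive or negative with probability 1/2.
    sums = [
        sum(r for r, s in zip(ranks, signs) if s)
        for signs in product((0, 1), repeat=n)
    ]
    p_low = sum(s <= w_plus for s in sums) / len(sums)
    p_high = sum(s >= w_plus for s in sums) / len(sums)
    return min(1.0, 2 * min(p_low, p_high))


# Hypothetical initial and final holdout hypervolumes from 10 repetitions
u_init = [0.61, 0.58, 0.63, 0.60, 0.59, 0.62, 0.57, 0.64, 0.60, 0.61]
v_final = [0.80, 0.79, 0.83, 0.78, 0.81, 0.82, 0.77, 0.85, 0.80, 0.84]

p = wilcoxon_exact_p(u_init, v_final)
print(round(p, 3))
```

The enumeration covers 2^10 = 1024 sign assignments, which is cheap for ten repetitions; for larger samples a normal approximation or a library routine would be the usual choice.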

The goal of the next investigation is to measure the increase of the single-objective performance. Figure 5.13 plots the mean ms_BRE decrease of the best-ms_BRE solutions for each statistical repetition, compared to the full feature sets. For almost all combinations of a category and a classifier, it is possible to achieve more than 20% reduction of the error. The first exception is the Classic category, which is characterised by smaller error decreases for C4.5 and RF. For Rap and C4.5, it even seems to be slightly preferable to use the complete feature set for classification. The design and integration of further high-level features, which have highly distinctive characteristics for Rap, might help to overcome this problem. It should also not be forgotten that classification with the complete feature set is significantly slower, has higher storage demands for features and models, and yields models with a higher tendency to overfit.
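As a small illustration of the quantity discussed above, the following sketch computes the mean relative error decrease over statistical repetitions. It assumes that the decrease is measured relative to the error of the complete feature set; the function name and all error values are hypothetical.

```python
def mean_relative_decrease(err_all, err_best):
    """Mean of (e_all - e_best) / e_all over paired repetitions."""
    ratios = [(a - b) / a for a, b in zip(err_all, err_best)]
    return sum(ratios) / len(ratios)


# Hypothetical ms_BRE values: complete feature set vs. best solution after selection
err_all = [0.20, 0.25, 0.40]
err_best = [0.15, 0.20, 0.30]
improved = mean_relative_decrease(err_all, err_best)

# A negative value means that the complete feature set performed better
worse = mean_relative_decrease([0.10], [0.11])
print(improved, worse)
```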

For the estimation of the significance of the error decrease, we repeat the application of the Wilcoxon signed rank test with the following setup:

• For a fixed classifier and ifr setting, denoted by the index i ∈ {1, ..., 8}, and a fixed classification task, denoted by its index j ∈ {1, ..., 6}, let u(i, j, HL, Φ_best) be the vector of the smallest ms_BRE estimated on the holdout set for the experiments with the HL feature set, so that u_k(i, j, HL, Φ_best) = ms_BRE(i, j, k, HL, Φ_best) corresponds to the ms_BRE-best value from the k-th statistical repetition, k ∈ {1, ..., 10}. Similarly, let v(i, j, HL, Φ_all) be the vector of ms_BRE estimated on the holdout set, if all features are switched on, so that v_k(i, j, HL, Φ_all) = ms_BRE(i, j, k, HL, Φ_all) (in this case,


Figure 5.13.: Decrease of ms_BRE for the best-ms_BRE solution after the optimisation, compared to the error using the complete feature set. Circles: C4.5, squares: RF, diamonds: NB, triangles: SVM. Large markers: ifr = 0.5, small markers: ifr = 0.2.

• H0: u and v belong to the same probability distribution.
• H1: The distributions are not equal.

Since the error decrease rates were low or negative for several cases, it should not be expected that H0 will be rejected for all combinations of a classifier and a task. This is illustrated by Fig. 5.14, which plots the corresponding p-values. H0 is not rejected for 4 Rap experiments, and 3 Classic experiments. However, the overall H0 rejection rate is 83%.

Figure 5.14.: p-values after the Wilcoxon signed rank test, comparing the best ms_BRE solutions to solutions with the complete feature set (the exact test description is provided in the text). H0 is rejected if p < 0.05. p = 0.05 is marked with the thick horizontal line. Circles: C4.5, squares: RF, diamonds: NB, triangles: SVM. Large markers: ifr = 0.5, small markers: ifr = 0.2.

In document Improving supervised music classification by means of multi-objective evolutionary feature selection (Page 112-115)