5.2 Fuzzy rough set based classifiers and self-labelling
5.2.2 Interaction with self-labelling schemes
As a next step in our evaluation, we integrate the OWA based lower approximation classifier, with our weighting scheme selection guidelines from Section 3.3, into the well-performing self-labelling schemes discussed in Section 5.1.1. This step evaluates whether our classifier benefits from self-labelling at all, or whether it suffices to use only the information in L to derive confident predictions. We include the following classification models:
• Lower: the method evaluated in the previous section. It uses the OWA based fuzzy rough lower approximation to classify instances. Within this operator, the method uses our weighting scheme selection strategy proposed in Section 3.3.
• SelfTr(FR), CoTr(FR), TriTr(FR), CoBag(FR): the standard self-training [462], co-training [51], TriTraining [502] and CoBagging [201, 202] methods using the above fuzzy rough classifier as base classifier during self-labelling as well as final classifier.
• CoTr(SMO)+FR, TriTr(C45)+FR, CoBag(C45)+FR: during the self-labelling phases, these methods coincide with the ones listed in Section 5.1.1, namely the co-training algorithm with SMO as base classifier, the TriTraining method with C4.5 as base classifier and the CoBagging method with C4.5 as base classifier respectively. The final classification of U and the test set is performed by the fuzzy rough classifier.
• FR-SSL: the study of [311] proposed a naive fuzzy rough self-labelling method that labels all instances in U by using the fuzzy rough approximation operators. This is a straightforward fuzzy rough set based self-labelling approach which, as opposed to the other self-labelling methods listed above, always results in a fully labelled training set. We use our classifiers from Chapter 3 in these labelling steps in order to verify how they interact with this approach.
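The distinction between base classifier and final classifier in the list above can be illustrated with a minimal Python sketch of plain self-training. Everything in it is an assumption made for illustration: the function names, the confidence threshold, the stopping rule, and the 1-D nearest-neighbour stand-in that takes the place of the fuzzy rough classifier. It is not the implementation used in [406] or in this work, and co-training, TriTraining and CoBagging involve further machinery that is not shown.

```python
def make_nn(labelled):
    """Hypothetical stand-in classifier: 1-D nearest neighbour on [0, 1].

    Returns predict(x) -> (label, confidence), where the confidence is
    the similarity 1 - |x - nearest labelled point|.
    """
    def predict(x):
        nearest_x, nearest_y = min(labelled, key=lambda pair: abs(pair[0] - x))
        return nearest_y, 1.0 - abs(nearest_x - x)
    return predict


def self_train(labelled, unlabelled, make_base, make_final,
               threshold=0.9, max_rounds=10):
    """Plain self-training with separate base and final classifiers.

    labelled:   list of (x, y) pairs, playing the role of L
    unlabelled: list of x values, playing the role of U
    make_base:  builds the classifier used during self-labelling
    make_final: builds the classifier used for the final predictions
    threshold and max_rounds are illustrative assumptions.
    """
    labelled = list(labelled)
    pool = list(unlabelled)
    for _ in range(max_rounds):
        if not pool:
            break
        base = make_base(labelled)
        scored = [(x, *base(x)) for x in pool]
        # Only confidently predicted instances receive a label this round.
        confident = [(x, y) for x, y, conf in scored if conf >= threshold]
        if not confident:
            break
        labelled.extend(confident)
        pool = [x for x, _, conf in scored if conf < threshold]
    # The final classifier is trained on the enlarged labelled set.
    return make_final(labelled), labelled


L = [(0.0, "a"), (1.0, "b")]
U = [0.05, 0.95, 0.5]
final, enlarged = self_train(L, U, make_base=make_nn, make_final=make_nn)
```

Passing the same factory for `make_base` and `make_final` mirrors the SelfTr(FR) style of setup, while passing two different factories mirrors the '+FR' variants, in which only the final classifier is replaced. In this toy run the ambiguous point 0.5 never reaches the threshold and stays unlabelled, unlike in FR-SSL, which labels all of U.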
Table 5.4: Mean balanced accuracy results of the included self-labelling methods and classifiers. For algorithms containing random components, we report the standard deviation of the mean balanced accuracy across ten runs. For each setting, the highest mean value is printed in boldface.
Transductive   10%              20%              30%              40%
Lower          0.6396           0.6807           0.6987           0.7142
SelfTr(FR)     0.6197           0.6602           0.6831           0.6972
CoTr(FR)       0.6182±0.0106    0.6611±0.0094    0.6838±0.0079    0.6975±0.0082
CoTr(SMO)+FR   0.6170±0.0107    0.6530±0.0095    0.6730±0.0078    0.6860±0.0089
TriTr(FR)      0.6302±0.0072    0.6726±0.0054    0.6942±0.0045    0.7086±0.0049
TriTr(C45)+FR  0.6280±0.0100    0.6672±0.0083    0.6919±0.0059    0.7044±0.0060
CoBag(FR)      0.6156±0.0091    0.6546±0.0111    0.6820±0.0082    0.7008±0.0079
CoBag(C45)+FR  0.6227±0.0099    0.6587±0.0109    0.6860±0.0081    0.7038±0.0076
FR-SSL         0.5123           0.5466           0.5715           0.5930

Inductive      10%              20%              30%              40%
Lower          0.6447           0.6847           0.7024           0.7208
SelfTr(FR)     0.6234           0.6577           0.6841           0.7004
CoTr(FR)       0.6231±0.0136    0.6644±0.0103    0.6890±0.0094    0.7040±0.0092
CoTr(SMO)+FR   0.6233±0.0145    0.6609±0.0115    0.6854±0.0088    0.6969±0.0094
TriTr(FR)      0.6323±0.0093    0.6742±0.0076    0.6965±0.0070    0.7139±0.0052
TriTr(C45)+FR  0.6379±0.0124    0.6769±0.0107    0.6976±0.0077    0.7111±0.0072
CoBag(FR)      0.6211±0.0127    0.6607±0.0124    0.6854±0.0107    0.7062±0.0107
CoBag(C45)+FR  0.6282±0.0138    0.6647±0.0124    0.6887±0.0105    0.7087±0.0110
FR-SSL         0.5050           0.5521           0.5672           0.5915
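As a reminder of what the entries of Table 5.4 measure, the sketch below computes balanced accuracy under its usual definition, the unweighted mean of the per-class recalls. The function name is a hypothetical choice, and the assumption that this standard definition is the one used here is ours. The table entries are then means of such values over the datasets.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of per-class recall (assumed standard definition)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    # Recall of class c: fraction of true c instances predicted as c.
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Three instances of class 0 (two recognised) and one of class 1 (recognised):
score = balanced_accuracy([0, 0, 0, 1], [0, 0, 1, 1])  # (2/3 + 1) / 2
```

Because every class contributes equally regardless of its size, a classifier that favours the majority class is not rewarded for doing so.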
For the self-labelling methods apart from FR-SSL, we use the same parameter settings as in [406]. The combinations CoTr(SMO), TriTr(C45) and CoBag(C45) were put forward as the best-performing self-labelling alternatives in that study. In this evaluation, we modify these settings in two ways. In a first version, we replace their base classifier by the ‘Lower’ method (represented by CoTr(FR), TriTr(FR) and CoBag(FR)). The second version corresponds to the CoTr(SMO), TriTr(C45) and CoBag(C45) methods themselves, in which we have only replaced the final classifier by our fuzzy rough method. In the next section, we will compare the results of the fuzzy rough methods with the original methods used in [406] as well. Note that we do not include the SEGSSC framework from [405] at this point, since we first wish to study the precise interplay of our fuzzy rough classifier with the pure self-labelling techniques without the results being affected by the further interaction between SEGSSC and self-labelling. We also do not test the third possible modification of the self-labelling methods evaluated in [406], wherein we would use the fuzzy rough method as base classifier and maintain SMO or C4.5 as final classifier, since this evaluation carries no information on the effect of self-labelling on the fuzzy rough set based classifier.
The mean results are reported in Table 5.4 and the accompanying statistical analysis can be found in Table 5.5. We use the Wilcoxon test to compare the performance of our OWA based fuzzy rough lower approximation classifier with and without self-labelling, as we wish to verify whether it benefits from a self-labelling step. The results in Table 5.4 indicate
that this may not be the case. For both the transductive and inductive performance, and for all evaluated percentages of labelled instances in the training set, the highest average result is obtained by the classifier without self-labelling. It only uses the information in L to classify the instances in U and the test set, and it outperforms the other methods, which have extended the set L by labelling some additional training instances. The statistical analysis in Table 5.5 confirms that no increase in classification performance of our fuzzy rough method is obtained after self-labelling. Instead, almost all pairwise comparisons show that a statistically significant performance drop follows from trying to extend the set of labelled training instances. TriTr(C45)+FR is the only combination for which none of the settings yields a significant difference (TriTr(FR) also misses significance in the inductive 30% setting), although the high R+ values for the classifier without self-labelling again express that the TriTraining step does more harm than good. We also note that the performance of the FR-SSL method is particularly poor. It labels the full set U, which is clearly too extreme an option and results in a considerable and significant decrease in prediction performance.
Table 5.5: Results of the Wilcoxon test comparing our fuzzy rough classifier ‘Lower’ to the other algorithms in Table 5.4 in the format ‘R+/R−/p’. The R+ value always corresponds to the fuzzy rough method. P-values implying statistically significant differences at the 5% significance level are printed in boldface.
Transductive   10%                   20%                   30%                   40%
SelfTr(FR)     346.5/118.5/0.018219  382.0/83.0/0.00198    376.5/88.5/0.002813   367.5/97.5/0.005088
CoTr(FR)       398.5/66.5/0.000616   406.5/58.5/0.000332   392.5/42.5/0.000136   431.0/34.0/0.000043
CoTr(SMO)+FR   426.0/39.0/0.000066   435.0/30.0/0.00003    435.0/30.0/0.00003    412.0/53.0/0.000214
TriTr(FR)      384.0/81.0/0.00177    416.0/49.0/0.000154   385.0/80.0/0.00165    414.0/51.0/0.000182
TriTr(C45)+FR  297.0/168.0/0.180119  319.5/145.5/0.071174  240.5/194.5/0.610386  275.0/16.0/0.202373
CoBag(FR)      446.0/19.0/0.000011   463.0/2.0/0.000002    455.0/10.0/0.000005   453.0/12.0/0.000005
CoBag(C45)+FR  353.0/112.0/0.012819  418.0/47.0/0.00013    417.0/48.0/0.000142   407.0/58.0/0.000319
FR-SSL         436.0/29.0/0.000027   438.0/27.0/0.000023   438.0/27.0/0.000023   408.0/57.0/0.000295

Inductive      10%                   20%                   30%                   40%
SelfTr(FR)     375.0/90.0/0.003269   372.0/63.0/0.000803   337.0/98.0/0.009465   372.5/62.5/0.000748
CoTr(FR)       372.5/62.5/0.000748   354.0/81.0/0.002974   363.5/71.5/0.001447   386.5/48.5/0.000247
CoTr(SMO)+FR   390.0/75.0/0.001155   389.0/76.0/0.001241   388.0/77.0/0.001334   418.0/47.0/0.00013
TriTr(FR)      402.0/63.0/0.000471   333.0/132.0/0.037764  313.0/152.0/0.095706  414.0/51.0/0.000182
TriTr(C45)+FR  309.5/155.5/0.110008  293.0/172.0/0.206079  242.5/192.5/0.57828   308.5/156.5/0.114673
CoBag(FR)      449.0/16.0/0.000008   442.0/23.0/0.000016   451.0/14.0/0.000007   444.0/21.0/0.000013
CoBag(C45)+FR  377.0/88.0/0.00286    415.0/50.0/0.000167   422.0/43.0/0.000093   411.0/54.0/0.000232
FR-SSL         432.0/33.0/0.000039   426.0/39.0/0.000066   416.0/49.0/0.000154   403.0/62.0/0.000436
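To make the 'R+/R−' part of these entries concrete, the following Python sketch computes the two rank sums of the Wilcoxon signed-rank test from paired per-dataset scores. The function name is hypothetical, and two conventions are assumed rather than taken from this study: tied absolute differences receive their average rank, and the ranks of zero differences are split evenly between R+ and R−. The p-value computation is omitted.

```python
def wilcoxon_rank_sums(scores_a, scores_b):
    """Return (R+, R-) for paired scores; R+ favours method A.

    Assumed conventions: average ranks for tied absolute differences,
    and ranks of zero differences split evenly between R+ and R-.
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    pos = 0
    while pos < len(order):
        # Find the run of equal absolute differences starting at pos.
        end = pos
        while (end + 1 < len(order)
               and abs(diffs[order[end + 1]]) == abs(diffs[order[pos]])):
            end += 1
        avg_rank = (pos + end) / 2 + 1  # ranks are 1-based
        for k in range(pos, end + 1):
            ranks[order[k]] = avg_rank
        pos = end + 1
    r_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    r_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    zero_share = sum(r for d, r in zip(diffs, ranks) if d == 0) / 2
    return r_plus + zero_share, r_minus + zero_share


# Toy scores on four datasets (illustrative integers, not data from this study):
r_plus, r_minus = wilcoxon_rank_sums([7, 3, 9, 5], [6, 5, 6, 5])
```

Under these conventions R+ and R− always sum to n(n+1)/2 for n paired datasets, so a large R+ means that the first method wins on most datasets, by the largest margins, or both.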
Starting from overall sparse training data, a self-labelling process creates denser regions of same-class elements. The average similarity of labelled elements with other labelled instances of the same class increases, while the average similarity with labelled elements in opposite classes decreases. As our fuzzy rough classifier heavily relies on instance similarity values in its class predictions, the creation of such class islands in feature space can result in misclassification errors in sparser regions. Our method performs better on the original datasets, where the sparsity is distributed more evenly across the feature space. In fact, the question whether or not knowledge of unlabelled training instances truly improves the performance of a classifier trained on semi-supervised data is an important and ongoing topic of discussion in the literature. Several studies have shown that genuine semi-supervised classifiers can be outperformed by supervised methods trained on the labelled data only (e.g. [37, 82, 266, 283, 285, 383, 486]). In a way, this is not altogether surprising, as the self-labelling process shows a close relation
to imputation and the true signal present in the labelled part of the training set may be diminished or averaged out in this process, rendering prediction more cumbersome.