
6.4 Experimental study of our fuzzy multi-instance classifiers

6.4.2 The IFMIC family

As described in Section 6.3.1.1, our instance-based fuzzy classifiers depend on three parameters, namely the ways to compute (i) the affinity B(x) of an instance x with a bag B, (ii) the membership degree C(x) of instance x to class C and (iii) the membership degree C(X) of bag X to class C. Each can be set to one of five alternatives (Max, MaxExp, MaxInvadd, MaxAdd and Avg), which yields 125 IFMIC variants in total.
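The three steps and the resulting family of variants can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: the function `bag_membership`, the toy similarity and the toy data are hypothetical, and the nesting of the three aggregations is our reading of the description above (the precise definitions are those of Section 6.3.1.1). Only Max and Avg are instantiated here, since the OWA-based settings require weight vectors that are defined outside this section.

```python
from itertools import product
from statistics import mean

SETTINGS = ["Max", "MaxExp", "MaxInvadd", "MaxAdd", "Avg"]

# One name per (B(x), C(x), C(X)) combination, e.g. "IFMIC-Max-MaxInvadd-Avg".
CONFIGS = ["IFMIC-" + "-".join(combo) for combo in product(SETTINGS, repeat=3)]


def bag_membership(X, class_bags, similarity, b_agg, cx_agg, cX_agg):
    """Membership degree C(X) of bag X to a class, given that class's training bags.

    ASSUMED structure (see Section 6.3.1.1 for the actual definitions):
      B(x) = b_agg of similarity(x, y) over the instances y in bag B
      C(x) = cx_agg of B(x) over the training bags B of class C
      C(X) = cX_agg of C(x) over the instances x in X
    """
    cx_values = []
    for x in X:
        affinities = [b_agg([similarity(x, y) for y in B]) for B in class_bags]
        cx_values.append(cx_agg(affinities))
    return cX_agg(cx_values)


# Hypothetical similarity on one-dimensional toy instances, for illustration only.
def toy_similarity(x, y):
    return 1.0 / (1.0 + abs(x - y))


X = [0.0, 1.0]
class_bags = [[0.0, 5.0], [1.0]]

print(len(CONFIGS))
print(bag_membership(X, class_bags, toy_similarity, max, max, mean))    # Max-Max-Avg
print(bag_membership(X, class_bags, toy_similarity, mean, mean, mean))  # Avg-Avg-Avg
```

A bag would then be assigned to the class with the highest C(X) value; passing the aggregations as plain callables keeps the three parameters independent, which mirrors the way they are varied in the experiments.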

We use three evaluation measures: the overall accuracy, the accuracy on class 0 (negative class) and the accuracy on class 1 (positive class). Table 6.6 lists the five highest scoring IFMIC methods for each measure based on their average performance across the datasets in Table 6.5. To complement these results, the worst obtained mean values for these measures are included as well. A striking point is that each classifier in the top five for each evaluation measure uses the Max alternative for B(x). With respect to the C(x) step, the preference for a procedure closer to a maximum (Max or MaxExp) or to an average (Avg or MaxAdd) seems to depend on the particular class, as evident from the different high-ranking versions for the class 0 and class 1 accuracy. Based on the overall accuracy, MaxExp seems to be the preferred choice. A preference for one of the C(X) alternatives is less clear-cut based on the results included in Table 6.6. We study and explain the differences in performance of the possible settings in the following sections.

Table 6.6: Five best performing IFMIC methods for each evaluation measure. The results are taken as averages over the datasets in Table 6.5. The bottom row lists the methods with the lowest obtained mean results for these measures.

| Rank | Classifier | Acc cl0 | Classifier | Acc cl1 | Classifier | Accuracy |
|---|---|---|---|---|---|---|
| 1 | Max-MaxExp-MaxInvadd | 0.8068 | Max-Avg-MaxAdd | 0.8918 | Max-MaxExp-MaxAdd | 0.8168 |
| 2 | Max-MaxExp-MaxAdd | 0.8068 | Max-Avg-MaxExp | 0.8893 | Max-MaxExp-MaxInvadd | 0.8155 |
| 3 | Max-MaxExp-Avg | 0.8039 | Max-Avg-MaxInvadd | 0.8886 | Max-MaxExp-Avg | 0.8153 |
| 4 | Max-Max-Avg | 0.8022 | Max-MaxAdd-MaxAdd | 0.8808 | Max-MaxInvadd-Avg | 0.8128 |
| 5 | Max-Max-MaxAdd | 0.7991 | Max-MaxAdd-MaxInvadd | 0.8802 | Max-MaxInvadd-MaxAdd | 0.8121 |
| Worst | Avg-Avg-Max | 0.5024 | Avg-Max-Avg | 0.5971 | Avg-Avg-Max | 0.6502 |

6.4.2.1 Instance-to-bag affinity B(x)

The mean results obtained by the five alternatives for B(x) are listed in Table 6.7. These results were derived as averages over the datasets in Table 6.5 and all IFMIC methods using a particular setting. For example, the results for Avg are taken as the mean value across the 25 included IFMIC-Avg-*-* methods. The reported standard deviations are taken across these methods as well.
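The grouping described here can be sketched as below. The scores are arbitrary placeholders, not experimental results, and the helper `summarise_by_setting` is a hypothetical name; whether the reported standard deviations are population or sample deviations is not stated, so the use of `pstdev` is an assumption.

```python
from itertools import product
from statistics import mean, pstdev

SETTINGS = ["Max", "MaxExp", "MaxInvadd", "MaxAdd", "Avg"]


def summarise_by_setting(method_scores, position):
    """Group per-method scores by the setting used at one step.

    method_scores maps a (B(x), C(x), C(X)) triple to a score that has already
    been averaged over the datasets; position is 0 for B(x), 1 for C(x), 2 for C(X).
    Returns {setting: (mean, standard deviation, number of methods)}.
    """
    summary = {}
    for setting in SETTINGS:
        scores = [s for combo, s in method_scores.items() if combo[position] == setting]
        summary[setting] = (mean(scores), pstdev(scores), len(scores))
    return summary


# PLACEHOLDER scores, constructed only so that the example runs.
method_scores = {
    combo: 0.5 + 0.1 * (combo[0] == "Max") + 0.01 * SETTINGS.index(combo[2])
    for combo in product(SETTINGS, repeat=3)
}

for setting, (avg, std, n) in summarise_by_setting(method_scores, position=0).items():
    print(f"{setting}: {avg:.4f} ± {std:.4f} over {n} methods")
```

Calling the helper with `position=1` or `position=2` gives the corresponding summaries for the C(x) and C(X) steps.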

Table 6.7: Setting rankings of the B(x) alternatives of the IFMIC methods.

| B(x) | Acc cl0 | B(x) | Acc cl1 | B(x) | Accuracy |
|---|---|---|---|---|---|
| Max | 0.7175 ± 0.0789 | Max | 0.8324 ± 0.0524 | Max | 0.7786 ± 0.0304 |
| MaxAdd | 0.6904 ± 0.0679 | MaxExp | 0.8034 ± 0.0550 | MaxExp | 0.7537 ± 0.0276 |
| MaxInvadd | 0.6839 ± 0.0723 | MaxInvadd | 0.7795 ± 0.0699 | MaxInvadd | 0.7459 ± 0.0322 |
| MaxExp | 0.6803 ± 0.0794 | MaxAdd | 0.7605 ± 0.0667 | MaxAdd | 0.7381 ± 0.0298 |
| Avg | 0.6729 ± 0.0729 | Avg | 0.7248 ± 0.0682 | Avg | 0.7121 ± 0.0275 |

As was already evident from the results listed in Table 6.6, the Max setting is clearly preferred. It attains the highest mean accuracy, both overall and on the two classes separately. The performance of Avg is lowest for all measures and that of the OWA alternatives is found between Max and Avg. This indicates that there is no point in softening the strict maximum (and thereby bringing it closer to an average) in the calculation of B(x).

The B(x) step determines the membership of an instance x to a bag B. The results show that the most similar instance y ∈ B contains the most information. Involving all instances in B in this calculation, either by assigning them all equal weights (Avg) or not (MaxAdd, MaxInvadd, MaxExp), deteriorates the performance. This phenomenon is explained as follows. In a multi-instance dataset, the variety between instances in a bag can be quite large. Indeed, considering the standard two-class multi-instance hypothesis that states that a bag is positive when at least one of its instances belongs to the positive class, a positive bag can contain both instances affiliated with the positive concept and instances affiliated with the negative concept. Assume that we draw two instances x1 and x2 from bag B that are affiliated with the positive and negative class respectively and that, based on their feature values, their similarity is (unsurprisingly) low. If we involve the value R_I(x1, x2) in the calculation of B(x1) and B(x2), it would unjustifiably lower the results, even though x1 and x2 both belong to B and their affinity with the bag should be high. We also observe that the OWA approaches, as intermediate options between the average and maximum, do not provide an advantage over the strict maximum and we conclude that they are not useful for the estimation of B(x).
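The argument can be made concrete with a small numerical sketch. The bag below is hypothetical, with two instances near one concept and two near another, and the cosine similarity stands in for R_I; including x1 itself among the instances of B is also an assumption made for simplicity.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical positive bag: the first two instances lie near one concept,
# the last two near a very different one.
bag = [(1.0, 0.1), (0.9, 0.2), (0.1, 1.0), (0.2, 0.9)]
x1 = bag[0]

similarities = [cosine_similarity(x1, y) for y in bag]

affinity_max = max(similarities)                      # B(x1) with the Max setting
affinity_avg = sum(similarities) / len(similarities)  # B(x1) with the Avg setting

print(f"Max: {affinity_max:.3f}  Avg: {affinity_avg:.3f}")
```

Although x1 is a member of the bag, the dissimilar instances pull its averaged affinity well below the value obtained with the strict maximum, which is the effect described above.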

6.4.2.2 Instance-to-class membership degree C(x)

Table 6.8 lists the results for the five alternatives for the C(x) calculations, obtained in the same way as described in the previous section. MaxInvadd has the highest mean accuracy. We observe that this is due to its more or less balanced performance on the two classes, that is, it does not obtain extremely good or poor results on one of them. The Max and Avg alternatives sit at the bottom of the accuracy table, which is due to their inferior performance on one class (low class 0 accuracy for Avg, low class 1 accuracy for Max). MaxInvadd attains the best trade-off between the maximum and average aggregations.

Table 6.8: Setting rankings of the C(x) alternatives of the IFMIC methods.

| C(x) | Acc cl0 | C(x) | Acc cl1 | C(x) | Accuracy |
|---|---|---|---|---|---|
| Max | 0.7579 ± 0.0301 | Avg | 0.8434 ± 0.0342 | MaxInvadd | 0.7696 ± 0.0308 |
| MaxExp | 0.7539 ± 0.0331 | MaxAdd | 0.8272 ± 0.0415 | MaxExp | 0.7555 ± 0.0375 |
| MaxInvadd | 0.7142 ± 0.0304 | MaxInvadd | 0.8010 ± 0.0442 | MaxAdd | 0.7423 ± 0.0317 |
| MaxAdd | 0.6383 ± 0.0322 | MaxExp | 0.7332 ± 0.0513 | Max | 0.7376 ± 0.0358 |
| Avg | 0.5807 ± 0.0319 | Max | 0.6957 ± 0.0549 | Avg | 0.7235 ± 0.0285 |

We need to explain why Avg works better for class 1, while Max works better for class 0. The largest differences can be found on the TREC and WIR datasets from Table 6.5, which respectively belong to the text classification and web mining domains. On this group of ten datasets, the average class 0 accuracies are 0.3125 (Avg) and 0.7233 (Max), while the average class 1 accuracies are 0.9557 (Avg) and 0.7683 (Max). Based on these results, we only need to consider the behaviour of Avg, since Max achieves a more or less balanced performance on the two classes. What remains to be explained is why Avg assigns the class 1 label more easily than the class 0 label for these text datasets.

We momentarily fix B(x) to Max (the favoured setting of this parameter as discussed in Section 6.4.2.1) and consider the distribution of the average B(x) values. These values (and therefore the C(x) values as computed by Avg) of instances belonging to class 0 bags are close together for bags B of class 0 and class 1 and are even a little higher for those of class 1. For instances belonging to class 1 bags, on the other hand, the values do differ: their average B(x) values are higher for bags of class 1 than for bags of class 0. As a result, we can expect the membership degree to class 1 to be higher for all instances, regardless of the label of their parent bag. Clearly, there is an attracting force in class 1 for the ten text datasets. Recall that class 1 is the positive class and the characteristics of the text datasets may lead to an attraction of this class when Avg is used to compute the membership degree of instances to classes. We can also note that these datasets further stand out as having a relatively low similarity of instances in different bags, that is, instances in a bag B are usually more similar to each other than they are to instances in other bags. We observe the same characteristics for the image datasets Elephant, Fox and Tiger, where the difference in class accuracy results is present as well, albeit less pronounced. Finally, we should note that the observed behaviour cannot be solely due to our choice of instance similarity relation R_I(·,·). All these datasets have far more than 20 features and are therefore processed with the cosine similarity, but other datasets using the cosine similarity do not exhibit this behaviour. Instead, the positive class of these datasets must have some sort of attracting property, which results in an overly easy assignment of the class 1 label by Avg.

The overall preference of MaxInvadd can be deduced from its orness value, which places it between Max and Avg, but closer to the former. The observant reader will have noted that the first two columns in Table 6.8 rank the C(x) settings according to their orness value, namely in decreasing order for the class 0 accuracy and in increasing order for the class 1 accuracy. The MaxInvadd setting achieves the best trade-off between a high class 1 accuracy and an acceptable class 0 accuracy.
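The orness ordering can be checked with a short sketch. The weight formulas below are assumptions: they are common definitions of additive, exponential and inverse additive OWA weights for maximum-like aggregation, and the definitions actually used for the five settings are those given earlier in this chapter. The orness of a weight vector (w_1, ..., w_p), applied to values sorted in decreasing order, is taken as the standard measure (1/(p-1)) Σ (p-i) w_i. The choice p = 5 is arbitrary.

```python
def owa_weights(kind, p):
    """ASSUMED weight vectors for p values sorted in decreasing order."""
    if kind == "Max":
        return [1.0] + [0.0] * (p - 1)
    if kind == "Avg":
        return [1.0 / p] * p
    if kind == "MaxAdd":      # additive: weights decrease linearly
        return [2.0 * (p - i + 1) / (p * (p + 1)) for i in range(1, p + 1)]
    if kind == "MaxExp":      # exponential: each weight is half the previous one
        return [0.5 ** i / (1.0 - 0.5 ** p) for i in range(1, p + 1)]
    if kind == "MaxInvadd":   # inverse additive: weights proportional to 1/i
        harmonic = sum(1.0 / j for j in range(1, p + 1))
        return [1.0 / (i * harmonic) for i in range(1, p + 1)]
    raise ValueError(f"unknown setting: {kind}")


def orness(weights):
    """Orness of an OWA weight vector: 1 for the maximum, 0.5 for the average."""
    p = len(weights)
    return sum((p - i) * w for i, w in enumerate(weights, start=1)) / (p - 1)


KINDS = ["Max", "MaxExp", "MaxInvadd", "MaxAdd", "Avg"]
orness_values = {kind: orness(owa_weights(kind, 5)) for kind in KINDS}

for kind in KINDS:
    print(f"{kind}: {orness_values[kind]:.4f}")
```

Under these assumed weights and p = 5, the orness decreases in the order Max, MaxExp, MaxInvadd, MaxAdd, Avg, which matches the class 0 ranking in Table 6.8 and is the reverse of the class 1 ranking. For the exponential and inverse additive weights the orness depends on p, so the exact values shift with the number of aggregated elements.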

6.4.2.3 Bag-to-class membership degree C(X)

Table 6.9 lists the results for the five alternatives for the C(X) calculations, once again obtained by averaging the results of all IFMIC methods using a particular setting. The average-related options yield the best performance. In particular, the Max setting provides clearly inferior results compared to the other alternatives. It is interesting to note that the ranking obtained for the overall accuracy is exactly the opposite of the one observed for the B(x) calculations in Table 6.7. The C(X) membership degrees are aggregations of the C(x) values for all instances x ∈ X. It is reasonable to expect that all instances in X should contribute (more or less) equally to this calculation for a proper class estimation. The experimental results confirm this.

Table 6.9: Setting rankings of the C(X) alternatives of the IFMIC methods.

| C(X) | Acc cl0 | C(X) | Acc cl1 | C(X) | Accuracy |
|---|---|---|---|---|---|
| Avg | 0.7244 ± 0.0647 | MaxAdd | 0.8002 ± 0.0677 | Avg | 0.7680 ± 0.0290 |
| MaxAdd | 0.7008 ± 0.0721 | MaxInvadd | 0.7926 ± 0.0705 | MaxAdd | 0.7624 ± 0.0285 |
| MaxInvadd | 0.6942 ± 0.0737 | Avg | 0.7916 ± 0.0731 | MaxInvadd | 0.7551 ± 0.0293 |
| MaxExp | 0.6774 ± 0.0736 | MaxExp | 0.7773 ± 0.0709 | MaxExp | 0.7387 ± 0.0294 |
| Max | 0.6483 ± 0.0735 | Max | 0.7388 ± 0.0643 | Max | 0.7042 ± 0.0261 |

6.4.2.4 Conclusion

Overall, we can advise the use of the IFMIC-Max-MaxInvadd-Avg classifier. Among the settings evaluated in this chapter, Max for B(x), MaxInvadd for C(x) and Avg for C(X) stand out as the best choices on average, for the reasons discussed above. We include this classifier in the global comparison of our proposed methods to state-of-the-art multi-instance classifiers in Section 6.7.