CHAPTER 4: THE PROPOSED FUZZY-ENSEMBLE METHOD
5.1 Fuzzy rule-based system
5.1.2 Selection of a suitable fuzzy rule-based system among non-dominated solutions
As we previously stated, in real situations the user has to select one solution among the non-dominated solutions according to his/her preferences. In our case, we prefer to select the fuzzy rule-based systems with the best classification ability, which are generally, but not always, the fuzzy systems with the highest numbers of rules and antecedent conditions (i.e. the least interpretable ones). To adopt a consistent criterion, we choose the fuzzy rule-based system with the highest number of rules, as it is likely to be the most accurate solution.
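To make the notion of non-dominance concrete, here is a minimal Python sketch. It assumes, purely for illustration, that each candidate fuzzy system is described by three objectives that are all minimized (training error rate, number of rules, total number of antecedent conditions); the names `dominates` and `non_dominated` and the example values are hypothetical and are not taken from the experiments.

```python
from typing import List, Tuple

# Hypothetical objective vector of one fuzzy system, all to be MINIMIZED:
# (training error rate in %, number of rules, total antecedent conditions).
Objectives = Tuple[float, int, int]


def dominates(a: Objectives, b: Objectives) -> bool:
    """a dominates b if it is no worse on every objective and strictly better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated(candidates: List[Objectives]) -> List[Objectives]:
    """Keep the candidates that are not dominated by any other candidate."""
    return [a for a in candidates if not any(dominates(b, a) for b in candidates)]


candidates = [(5.0, 4, 8), (3.0, 6, 15), (6.0, 5, 9), (2.0, 9, 30)]
front = non_dominated(candidates)        # (6.0, 5, 9) is dominated by (5.0, 4, 8)
chosen = max(front, key=lambda s: s[1])  # the non-dominated system with the most rules
print(chosen)                            # (2.0, 9, 30)
```

In this toy front the most complex system also has the lowest training error, which is the tendency the selection criterion relies on, although it does not hold in every case.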
These fuzzy systems are selected as follows. For each data set, we ran 100 training and testing experiments (10 iterations of tenfold cross-validation, or simply 10×10cv). In each run, all the solutions obtained for the training data are sorted in descending order by two criteria: first by the number of rules and then, in case of a tie, by the number of antecedent conditions. The solution (i.e. the fuzzy rule-based system) at the top of the sorted list is selected for each training set and then used to calculate the corresponding testing error rate. Finally, the average error rate is obtained by averaging the error rates of the 100 runs. Table 5.13 shows the results of both Proposal1 and Proposal2, which are quite similar to the results in Tables 5.11 and 5.12. The differences arise because the fuzzy rule-based system with the highest numbers of rules and antecedent conditions is not necessarily the one with the highest training accuracy. Table 5.13 confirms the previous finding that Proposal2 achieves better results than Proposal1 on both the training and the testing data (average ranks of 1.17 versus 1.83 for training and 1.33 versus 1.67 for testing). This indicates that using a feature selection-based approach to select the “don’t care” antecedent conditions in the initial population may give better results than the random selection method used in Proposal1 and Original. This conclusion is consistent with previous findings that starting from a good initial population may help a genetic algorithm find better solutions than starting from a randomly generated one (Grosan & Abraham, 2007). Since we have to select one fuzzy rule-based system to combine with the ensemble method, we chose Proposal2.
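The selection and averaging steps can be sketched in Python as follows. This is an illustrative sketch, not the code used in this study: the `FuzzySystem` record, its field names and the toy values are hypothetical, and each inner list stands for the solutions obtained in one of the 10×10cv runs.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List


@dataclass
class FuzzySystem:
    # Hypothetical record for one solution obtained in one run.
    n_rules: int
    n_conditions: int   # total number of antecedent conditions
    test_error: float   # error rate (%) on the corresponding test fold


def select_most_complex(solutions: List[FuzzySystem]) -> FuzzySystem:
    """Sort by number of rules, then by number of conditions (both descending); take the top."""
    ranked = sorted(solutions, key=lambda s: (s.n_rules, s.n_conditions), reverse=True)
    return ranked[0]


def average_test_error(runs: List[List[FuzzySystem]]) -> float:
    """Average the test error of the selected system over all runs (100 for 10x10cv)."""
    return mean(select_most_complex(run).test_error for run in runs)


runs = [
    [FuzzySystem(3, 5, 4.0), FuzzySystem(5, 9, 3.0), FuzzySystem(5, 7, 6.0)],
    [FuzzySystem(4, 6, 5.0), FuzzySystem(2, 2, 8.0)],
]
print(average_test_error(runs))  # 4.0
```

Sorting on the tuple `(n_rules, n_conditions)` with `reverse=True` implements the two-level criterion in one pass: the number of conditions only matters when two systems have the same number of rules, as with the two 5-rule systems in the first toy run.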
Table 5.13 Average error rates (%) of the selected fuzzy rule-based systems

| Data set | Proposal1 training | Proposal1 testing | Proposal2 training | Proposal2 testing |
|---|---|---|---|---|
| Breast W | 1.52 | 3.96 | 1.54 | 3.82 |
| Diabetes | 18.43 | 24.29 | 18.33 | 24.56 |
| Glass | 25.91 | 37.99 | 25.62 | 38.10 |
| Heart C | 32.50 | 47.41 | 31.35 | 46.03 |
| Sonar | 5.48 | 23.35 | 5.36 | 22.85 |
| Wine | 0.50 | 5.27 | 0.44 | 5.03 |
| Average rank | 1.83 | 1.67 | 1.17 | 1.33 |
5.1.3 Comparison between the selected fuzzy rule-based system and benchmark methods
One may object that selecting the less interpretable fuzzy systems, i.e. the more complex ones, is a disadvantage of our solution. However, this choice is justified by the observation that even the least interpretable fuzzy rule-based systems obtained from Proposal2 have modest numbers of rules and antecedent conditions.
To evaluate our choice, we compare the fuzzy rule-based systems obtained by Proposal2 that have the highest numbers of rules and antecedent conditions with commonly used fuzzy rule-based systems proposed in the literature. The comparison is based on two criteria: accuracy, and interpretability expressed in terms of the number of rules and the number of antecedent conditions per rule. The results for FURIA, SLAVE and CHI listed in Tables 5.14 and 5.15 are taken from Hühn and Hüllermeier (2009). In that article, the authors proposed a fuzzy rule-based classifier called FURIA and included SLAVE and CHI as benchmark methods. They estimated the error rates as follows: each data set was randomly split into 2/3 for training and 1/3 for testing, and this process was repeated 100 times to stabilize the results. Results on the Heart C data set were not reported because the authors used a different version of that data set, with two classes instead of the five classes used in our study. The remaining algorithm, which we call Hybrid (labelled Ishi in Tables 5.14 and 5.15), was proposed by Ishibuchi, Yamamoto, et al. (2005), and its reported results were obtained with the same procedure as in our study, i.e. 10×10cv. In fact, Hybrid is an early variant of the Original algorithm. Table 5.14, which summarizes the testing error rates, shows that Proposal2 obtained the best average rank (1.67), followed by Hybrid (1.83) and FURIA (2.40).
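The two evaluation protocols differ in how the training and testing sets are drawn. A minimal Python sketch of both is given below; it assumes plain random (non-stratified) splits and a fixed seed, neither of which is specified in the text, and the function names `repeated_kfold` and `repeated_holdout` are hypothetical.

```python
import random
from typing import Iterator, List, Tuple

Split = Tuple[List[int], List[int]]  # (training indices, testing indices)


def repeated_kfold(n: int, k: int = 10, repeats: int = 10, seed: int = 0) -> Iterator[Split]:
    """10x10cv: shuffle the indices, cut them into k folds, use each fold once for testing."""
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        folds = [idx[i::k] for i in range(k)]  # fold sizes differ by at most one
        for i, test in enumerate(folds):
            train = [j for f, fold in enumerate(folds) if f != i for j in fold]
            yield train, test


def repeated_holdout(n: int, runs: int = 100, train_frac: float = 2 / 3,
                     seed: int = 0) -> Iterator[Split]:
    """Random 2/3 training and 1/3 testing split, repeated `runs` times."""
    rng = random.Random(seed)
    cut = round(n * train_frac)
    for _ in range(runs):
        idx = list(range(n))
        rng.shuffle(idx)
        yield idx[:cut], idx[cut:]


cv_splits = list(repeated_kfold(150))
holdout_splits = list(repeated_holdout(150))
print(len(cv_splits), len(holdout_splits))  # 100 100
```

For a hypothetical data set of 150 examples, both protocols produce 100 runs, but each 10×10cv run tests on 15 examples and every example is tested exactly once per repetition, whereas each holdout run tests on 50 randomly drawn examples.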
If we compare only Proposal2 and Hybrid in terms of accuracy, Proposal2 outperforms Hybrid on 3 data sets (Diabetes, Heart C and Sonar) and is inferior on the other 3 (Breast W, Glass and Wine). Thus, the two can be regarded as equally accurate. However, as Table 5.15 shows, Proposal2 has fewer rules than Hybrid on 5 of the 6 data sets, which indicates that, when both accuracy and interpretability are taken into account, Proposal2 achieves better overall results than Hybrid.
For FURIA, Table 5.14 shows that Proposal2 achieved lower error rates on 4 of the 5 data sets, which indicates a better classification ability. In terms of interpretability, Proposal2 and FURIA perform comparably: Proposal2 has fewer rules on 3 of the 5 data sets, while FURIA has shorter rules on 4 of the 5 data sets. Taking both measures into account, we conclude that Proposal2 performed better overall than FURIA.
For SLAVE, Proposal2 was more accurate on all 5 data sets. In terms of interpretability, the two algorithms perform comparably: SLAVE has fewer rules on 3 of the 5 data sets, while Proposal2 has shorter rules on 4 of the 5. Overall, Proposal2 performed better than SLAVE.
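The win/loss counts used in these pairwise comparisons can be checked mechanically. The sketch below (hypothetical helper `wins_losses`) counts, over the data sets for which both methods report a value, how often Proposal2 has the lower value; lower is better for error rates and for numbers of rules alike. The values are copied from Tables 5.14 and 5.15, with `None` marking results that were not reported.

```python
from typing import Dict, Optional, Tuple

Row = Dict[str, Optional[float]]  # data set -> value, None if not reported


def wins_losses(ours: Row, other: Row) -> Tuple[int, int, int]:
    """Return (wins, losses, compared), where a win means `ours` has the lower value."""
    shared = [d for d, v in ours.items() if v is not None and other.get(d) is not None]
    wins = sum(1 for d in shared if ours[d] < other[d])
    losses = sum(1 for d in shared if ours[d] > other[d])
    return wins, losses, len(shared)


# Testing error rates (%) from Table 5.14.
p2_err = {"Breast W": 3.82, "Diabetes": 24.56, "Glass": 38.10,
          "Heart C": 46.03, "Sonar": 22.85, "Wine": 5.03}
furia_err = {"Breast W": 4.32, "Diabetes": 25.29, "Glass": 31.78,
             "Heart C": None, "Sonar": 22.99, "Wine": 6.75}

# Average numbers of rules from Table 5.15.
p2_rules = {"Breast W": 5.2, "Diabetes": 10.44, "Glass": 7.61,
            "Heart C": 8.77, "Sonar": 8.78, "Wine": 4.07}
slave_rules = {"Breast W": 5.8, "Diabetes": 9.3, "Glass": 12.3,
               "Heart C": None, "Sonar": 6.9, "Wine": 3.8}

print(wins_losses(p2_err, furia_err))      # (4, 1, 5)
print(wins_losses(p2_rules, slave_rules))  # (2, 3, 5)
```

The first call reproduces "lower error rates on 4 of the 5 data sets" for FURIA, and the second reproduces "SLAVE has fewer rules on 3 of the 5 data sets"; Heart C is skipped automatically because neither benchmark reports it.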
For CHI, because the algorithm lacks a mechanism for reducing the number of rules and antecedent conditions, it produces many more rules than the other methods. In addition to being more complex, CHI is less accurate than Proposal2 on all 5 data sets.
As this comparison with benchmark methods shows, Proposal2 is competitive with, or even better than, these methods in terms of both accuracy and interpretability. Hence, selecting the most complex fuzzy rule-based systems appears to be an acceptable approach in our case.
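Table 5.14 Average testing error rates (%) for Proposal2 and some benchmark methods

| Data set | Proposal2 | FURIA | SLAVE | Hybrid (Ishi) | CHI |
|---|---|---|---|---|---|
| Breast W | 3.82 | 4.32 | 4.51 | 3.54 | 9.8 |
| Diabetes | 24.56 | 25.29 | 26.35 | 25.08 | 27.45 |
| Glass | 38.10 | 31.78 | 38.17 | 37.80 | 38.61 |
| Heart C | 46.03 | / | / | 46.50 | / |
| Sonar | 22.85 | 22.99 | 31.5 | 23.70 | 25.39 |
| Wine | 5.03 | 6.75 | 7.54 | 4.94 | 7.23 |
| Average rank | 1.67 | 2.40 | 4.40 | 1.83 | 4.60 |

/ : not reported.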
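The Average rank row of Table 5.14 can be reproduced with the sketch below. The ranking convention is an assumption inferred from the reported numbers: on each data set the available methods are ranked by testing error (1 = lowest, tied values would share the mean rank), unreported entries are skipped, and each method's ranks are averaged over the data sets on which it was evaluated. The function name `average_ranks` is hypothetical.

```python
from typing import Dict, List, Optional

METHODS = ["Proposal2", "FURIA", "SLAVE", "Hybrid", "CHI"]

# Testing error rates (%) from Table 5.14, in METHODS order; None = not reported.
ERRORS: Dict[str, List[Optional[float]]] = {
    "Breast W": [3.82, 4.32, 4.51, 3.54, 9.8],
    "Diabetes": [24.56, 25.29, 26.35, 25.08, 27.45],
    "Glass": [38.10, 31.78, 38.17, 37.80, 38.61],
    "Heart C": [46.03, None, None, 46.50, None],
    "Sonar": [22.85, 22.99, 31.5, 23.70, 25.39],
    "Wine": [5.03, 6.75, 7.54, 4.94, 7.23],
}


def average_ranks(table: Dict[str, List[Optional[float]]], n_methods: int) -> List[float]:
    """Rank the reported methods on each data set and average each method's ranks."""
    ranks: List[List[float]] = [[] for _ in range(n_methods)]
    for row in table.values():
        # (error, method index) pairs for the methods reported on this data set, best first
        present = sorted((v, m) for m, v in enumerate(row) if v is not None)
        i = 0
        while i < len(present):
            j = i  # extend j over a run of tied error values
            while j + 1 < len(present) and present[j + 1][0] == present[i][0]:
                j += 1
            shared_rank = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
            for _, m in present[i : j + 1]:
                ranks[m].append(shared_rank)
            i = j + 1
    return [sum(r) / len(r) for r in ranks]


for name, r in zip(METHODS, average_ranks(ERRORS, len(METHODS))):
    print(f"{name}: {r:.2f}")
```

This prints 1.67, 2.40, 4.40, 1.83 and 4.60 for Proposal2, FURIA, SLAVE, Hybrid and CHI, matching the table. Because FURIA, SLAVE and CHI are averaged over five data sets and Proposal2 and Hybrid over six, the ranks are only roughly comparable across methods.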
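Table 5.15 Average number of rules (#Rules) and antecedent conditions per rule (#Cond.) for Proposal2 and some benchmark methods

| Data set | Proposal2 #Rules | Proposal2 #Cond. | FURIA #Rules | FURIA #Cond. | Hybrid (Ishi) #Rules | Hybrid (Ishi) #Cond. | SLAVE #Rules | SLAVE #Cond. | CHI #Rules | CHI #Cond. |
|---|---|---|---|---|---|---|---|---|---|---|
| Breast W | 5.2 | 2.7 | 12.2 | 2.9 | 10 | / | 5.8 | 3.7 | 172.4 | / |
| Diabetes | 10.44 | 3.48 | 8.5 | 2.6 | 10 | / | 9.3 | 3.7 | 98.6 | / |
| Glass | 7.61 | 3.51 | 11.3 | 2.2 | 10 | / | 12.3 | 3.3 | 42.7 | / |
| Heart C | 8.77 | 4.09 | / | / | 10 | / | / | / | / | / |
| Sonar | 8.78 | 4.36 | 8.1 | 2.3 | 10 | / | 6.9 | 4.7 | 137.1 | / |
| Wine | 4.07 | 2.11 | 6.2 | 1.9 | 10 | / | 3.8 | 2.9 | 101.2 | / |
| Average rank | 2 | 2 | 2.6 | 1.2 | 3 | / | 2 | 2.8 | 5 | / |

/ : not reported.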
University
of Malaya
5.2 Ensemble methods
In this section, we conduct a series of comparisons to determine which ensemble method is the most accurate, so that it can be selected for the fuzzy-ensemble method. Some of these comparisons also analyze the performance of the classifiers both as single classifiers and within ensembles.