In our research we used the same data sets that were used for the experiments in [9]. All of these data sets have numerical attributes and are completely specified (i.e., for every attribute and every case the corresponding attribute value is specified). These ten data sets are presented in Table 2. For the experiments we used three different approaches: the original LEM2 algorithm with discretization based on entropy as preprocessing, and two versions of the MLEM2 algorithm. The first version of MLEM2 was not equipped with a mechanism for merging intervals within the same rule. For example, from the Pima data set, a typical induced rule was:
6, 38, 38
(Diabetes, 0.078..0.2995) & (Pressure, 57..122) & (Diabetes, 0.1655..2.42) & (Age, 21..38.5) & (Pressure, 0..83) & (Glucose, 0..99.5) -> (Class, 0)
It is clear that the two conditions associated with the same attribute Diabetes, namely
(Diabetes, 0.078..0.2995) and (Diabetes, 0.1655..2.42),
can be merged into one condition, the intersection of the two intervals:
(Diabetes, 0.1655..0.2995).
Similarly, for the attribute Pressure, the two conditions
(Pressure, 57..122) and (Pressure, 0..83)
can also be merged into one condition:
(Pressure, 57..83).
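The merging step described above can be illustrated with a minimal Python sketch. It is not the MLEM2 implementation; the function name `merge_conditions` and the representation of a condition as an (attribute, low, high) triple are assumptions made only for this illustration. The sketch intersects all intervals that refer to the same attribute.

```python
def merge_conditions(conditions):
    """Intersect all intervals that refer to the same attribute.

    `conditions` is a list of (attribute, low, high) triples, read as
    low..high. Attributes keep the order of their first appearance.
    (Hypothetical helper, not part of LERS or MLEM2.)
    """
    merged = {}
    for attribute, low, high in conditions:
        if attribute in merged:
            current_low, current_high = merged[attribute]
            # The intersection keeps the larger lower bound
            # and the smaller upper bound.
            low, high = max(current_low, low), min(current_high, high)
        if low > high:
            raise ValueError(f"empty interval for attribute {attribute}")
        merged[attribute] = (low, high)
    return [(a, lo, hi) for a, (lo, hi) in merged.items()]


# Left-hand side of the Pima rule quoted in the text.
rule = [
    ("Diabetes", 0.078, 0.2995),
    ("Pressure", 57, 122),
    ("Diabetes", 0.1655, 2.42),
    ("Age", 21, 38.5),
    ("Pressure", 0, 83),
    ("Glucose", 0, 99.5),
]
merged_rule = merge_conditions(rule)
for attribute, low, high in merged_rule:
    print(f"({attribute}, {low}..{high})")
```

For the rule above, the six conditions reduce to four, with Diabetes restricted to 0.1655..0.2995 and Pressure to 57..83.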
The third way to induce rules was the newest version of the MLEM2 algorithm, which is able to merge conditions with intervals. We used the results of experiments for the first two approaches that were reported in [9]. The same data sets were used for experiments with the newest version of the MLEM2 algorithm with merging of intervals. Results are presented in Tables 3–5. Table 3 presents the total number of rules for all three approaches, while Table 4 shows the total number of conditions in those rule sets. Note that the two versions of MLEM2 were implemented independently, using different heuristics, so the number of rules may differ (e.g., Pima in Table 3). Furthermore, for some data sets even the total number of conditions may be larger for the algorithm that was supposed to induce rules with the smallest number of conditions (e.g., Bank and Buses in Table 4).

Table 2. Data sets

                Number of
Data set        Cases   Attributes   Concepts
Bank               66            5          2
Bricks            216           10          2
Bupa              345            6          2
Buses              76            8          2
German          1,000           24          2
Glass             214            9          6
HSV               122           11          2
Iris              150            4          3
Pima              768            8          2
Segmentation      210           19          7

Table 3. Number of rules

Data set        Discretization     MLEM2 without   MLEM2 with
                based on           merging         merging
                entropy and LEM2   conditions      conditions
Bank               10                  3               3
Bricks             25                 12              12
Bupa              169                 73              71
Buses               3                  2               2
German            290                160             159
Glass             111                 33              30
HSV                62                 23              23
Iris               14                  9               8
Pima              252                113             116
Segmentation      108                 14              14

Table 4. Number of conditions

Data set        Discretization     MLEM2 without   MLEM2 with
                based on           merging         merging
                entropy and LEM2   conditions      conditions
Bank               13                  5               6
Bricks             61                 40              35
Bupa              501                345             241
Buses               4                  4               5
German          1,226              1,044             814
Glass             262                137              87
HSV               206                101              80
Iris               33                 23              17
Pima              895                599             428
Segmentation      322                 48              37
Table 5 shows the accuracy for all ten data sets and all three approaches to rule induction. Accuracy was computed using ten-fold cross-validation.
Table 5. Accuracy (%)

Data set        Discretization     MLEM2 without   MLEM2 with
                based on           merging         merging
                entropy and LEM2   conditions      conditions
Bank               97                 95              95
Bricks             92                 92              87
Bupa               66                 65              64
Buses              99                 96              93
German             74                 70              69
Glass              67                 72              69
HSV                56                 60              65
Iris               97                 95              94
Pima               74                 71              70
Segmentation       64                 89              84
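The ten-fold cross-validation used to obtain the accuracies in Table 5 can be sketched in Python as follows. This is only a generic outline under stated assumptions: the paper does not describe how folds were formed, so the random shuffling, the function names, and the `train`/`predict` callables are all hypothetical, and a trivial majority-class learner stands in for the rule induction algorithms so that the block runs on its own.

```python
import random
from collections import Counter


def k_fold_indices(n_cases, k=10, seed=0):
    """Shuffle the case indices and split them into k nearly equal folds."""
    indices = list(range(n_cases))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validated_accuracy(cases, labels, train, predict, k=10, seed=0):
    """Fraction of cases classified correctly when each fold is held out once."""
    correct = 0
    for test_fold in k_fold_indices(len(cases), k, seed):
        held_out = set(test_fold)
        train_idx = [i for i in range(len(cases)) if i not in held_out]
        # Induce a model from the nine remaining folds (hypothetical callable).
        model = train([cases[i] for i in train_idx],
                      [labels[i] for i in train_idx])
        # Classify the held-out fold and count correct decisions.
        correct += sum(predict(model, cases[i]) == labels[i] for i in test_fold)
    return correct / len(cases)


# Stand-in learner (assumption): always predicts the most frequent class.
def train_majority(train_cases, train_labels):
    return Counter(train_labels).most_common(1)[0][0]


def predict_majority(model, case):
    return model


toy_cases = [[float(i)] for i in range(50)]
toy_labels = [0] * 40 + [1] * 10
folds = k_fold_indices(len(toy_cases))
accuracy = cross_validated_accuracy(
    toy_cases, toy_labels, train_majority, predict_majority
)
print(f"accuracy = {accuracy:.2f}")
```

In a real experiment, `train` would wrap a rule induction algorithm such as LEM2 or MLEM2 and `predict` would wrap the corresponding classification scheme.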
6 Conclusions
To compare our three approaches to rule induction from numerical data, the Wilcoxon matched-pairs signed rank test was used (two-tailed, 5% level of significance) [11]. All three approaches were compared pairwise. The total number of rules is the largest for discretization based on entropy, used as preprocessing of the original data sets, followed by the LEM2 algorithm for rule induction. For the two versions of MLEM2, the difference in the total number of rules is statistically insignificant.
Similarly, for the total number of conditions in the induced rule sets, the worst result (the largest number of conditions) was produced by the first approach: discretization based on entropy followed by the LEM2 algorithm for rule induction. As expected, MLEM2 with merging intervals performed better than MLEM2 without merging intervals.
Surprisingly, the three approaches show no significant difference in the most important parameter: accuracy.
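The pairwise comparisons above can be reproduced in outline with a short Python sketch of the Wilcoxon matched-pairs signed rank test. It makes several assumptions that are not taken from the paper: zero differences are dropped, tied absolute differences share their average rank, and the two-tailed p-value is computed exactly by enumerating all sign patterns rather than by the table lookup described in [11]. The function names are hypothetical. The example data are the Table 4 columns for the two MLEM2 versions.

```python
from itertools import product


def signed_ranks(x, y):
    """Return (difference, rank) pairs for the nonzero differences x[i] - y[i].

    Ranks follow the absolute differences; ties share the average rank.
    """
    diffs = sorted((a - b for a, b in zip(x, y) if a != b), key=abs)
    ranked = []
    i = 0
    while i < len(diffs):
        j = i
        while j < len(diffs) and abs(diffs[j]) == abs(diffs[i]):
            j += 1
        average_rank = (i + 1 + j) / 2  # mean of the ranks i+1 .. j
        ranked.extend((d, average_rank) for d in diffs[i:j])
        i = j
    return ranked


def wilcoxon_signed_rank(x, y):
    """Return (T, n, p) for a two-tailed matched-pairs signed rank test."""
    ranked = signed_ranks(x, y)
    ranks = [r for _, r in ranked]
    total = sum(ranks)
    w_plus = sum(r for d, r in ranked if d > 0)
    t = min(w_plus, total - w_plus)
    # Exact p-value: under the null hypothesis, every assignment of signs
    # to the ranks is equally likely, so count the patterns at least as
    # extreme as the observed one.
    hits = 0
    for signs in product((0, 1), repeat=len(ranks)):
        plus = sum(r for s, r in zip(signs, ranks) if s)
        if min(plus, total - plus) <= t:
            hits += 1
    return t, len(ranks), hits / 2 ** len(ranks)


# Table 4: total number of conditions, MLEM2 without and with merging.
conditions_without = [5, 40, 345, 4, 1044, 137, 101, 23, 599, 48]
conditions_with = [6, 35, 241, 5, 814, 87, 80, 17, 428, 37]

t, n, p = wilcoxon_signed_rank(conditions_without, conditions_with)
print(f"T = {t}, n = {n}, two-tailed p = {p:.4f}")
print("significant at 5%" if p < 0.05 else "not significant at 5%")
```

For these two columns the sketch gives T = 3 with n = 10, which falls below the 5% threshold and agrees with the conclusion that merging intervals reduces the total number of conditions.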
References
1. Booker LB, Goldberg DE, Holland JH (1990) Classifier systems and genetic algorithms. In: Carbonell JG (ed) Machine learning. Paradigms and methods. MIT, Menlo Park, CA, 235–282
2. Chmielewski MR, Grzymala-Busse JW (1996) Global discretization of continuous attributes as preprocessing for machine learning, International Journal of Approximate Reasoning 15: 319–331
3. Fayyad UM, Irani KB (1992) On the handling of continuous-valued attributes in decision tree generation, Machine Learning 8: 87–102
4. Grzymala-Busse JW (1992) LERS – A system for learning from examples based on rough sets. In: Slowinski R (ed) Intelligent decision support. Handbook of applications and advances of the rough set theory. Kluwer, Dordrecht, 3–18
5. Grzymala-Busse JW (1997) A new version of the rule induction system LERS, Fundamenta Informaticae 31: 27–39
6. Grzymala-Busse JW (2002) Discretization of numerical attributes. In: Kloesgen W, Zytkow J (eds) Handbook of data mining and knowledge discovery, Oxford University Press, New York, 218–225
7. Grzymala-Busse JW (2002) MLEM2: A new algorithm for rule induction from imperfect data. Proceedings of the 9th International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems, IPMU 2002, July 1–5, Annecy, France, 243–250
8. Grzymala-Busse JW (2004) Three strategies to rule induction from data with numerical attributes, Transactions on Rough Sets, Lecture Notes in Computer Science Journal Subline, Springer, Berlin Heidelberg New York 2: 54–62
9. Grzymala-Busse JW, Stefanowski J (2001) Three discretization methods for rule induction, International Journal of Intelligent Systems 16: 29–38
10. Grzymala-Busse JW, Hsiao TY (1998) Dropping conditions in rules induced by ID3. Proceedings of the 6th International Workshop on Rough Sets, Data Mining and Granular Computing RSDMGrC’98 at the 4th Joint Conference on Information Sciences (JCIS’98), October 1998, Research Triangle Park, NC, 351–354
11. Hamburg M (1983) Statistical Analysis for Decision Making. Harcourt Brace Jovanovich, New York, 546–550 and 721
12. Holland JH, Holyoak KJ, Nisbett RE (1986) Induction. Processes of inference, learning, and discovery. MIT, Boston
13. Pawlak Z (1982) Rough sets, International Journal of Computer and Information Sciences 11: 341–356
14. Pawlak Z (1991) Rough Sets. Theoretical Aspects of Reasoning about Data. Kluwer, Dordrecht
15. Quinlan JR (1993) C4.5: Programs for Machine Learning. Morgan Kaufmann, San Francisco, CA