This chapter discusses an extension of the theory of Rough Sets for generating monotone classifiers from monotone data sets. Our approach uses the concepts of monotone discernibility matrix/function and monotone (object) reduct, together with the theory of monotone discrete functions. It has a number of advantages over previous research on the problem, as summarized in section 2.3.3 and in the discussion of the experiment with the bankruptcy data set in section 2.4.
Compared to the previous research, the approach presented here produces smaller sets of rules that are consistent (do not predict conflicting class values), cover the whole input space and form a monotone classifier. When the maximal extension is used, the predictions are of single class values and not sets of values as in the previous research.
Another difference is the use of the discernibility matrix for computing the monotone reducts. This approach provides a comprehensive method for generating all monotone reducts instead of using heuristics for generating one short (but not necessarily of minimum length) reduct.
Figure 2.2: Monotone decision tree for the bankruptcy data set (node tests: a7 > 2, a6 > 0, a5 > 0, a8 > 0, a9 > 2; leaf classes 0, 1 and 2).
Compared to monotone decision trees, our method produces a more compact classifier since the decision tree contains the information of both the extension and its dual.
Furthermore, it appears that there is a close relationship between the decision rules obtained using the rough set approach and the prime implicants of the maximal extension. Although this has been shown here for the monotone case, it also holds at least for non-monotone Boolean data sets. We have discussed how to compute this extension by using dualization.
The generalization of the discrete function approach to non-monotone data sets and the comparison with the theory of Rough Sets is a topic of further research. Finally, the sometimes striking similarity we have found between Rough Set Theory and Logical Analysis of Data remains an interesting research topic.
Monotone Decision Trees
3.1 Introduction
In the previous chapter we considered a classifier expressed in decision rules. Another frequently used representation is a decision tree. A decision tree is a directed, acyclic, connected graph with a designated starting node (a root) and a designated set of terminal nodes (leaves). At each non-terminal node a test is performed on a certain attribute value(s) and at each leaf a class value is assigned.
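The definition above can be illustrated with a minimal sketch of a binary decision tree with threshold tests. The names `Leaf`, `Node`, `classify` and `example_tree` are hypothetical and introduced only for illustration; the tests have the form "attribute > threshold", and the example tree is invented rather than taken from any data set discussed in the text.

```python
from dataclasses import dataclass
from typing import Sequence, Union


@dataclass
class Leaf:
    label: int  # class value assigned at this terminal node


@dataclass
class Node:
    attribute: int    # index of the attribute tested at this node
    threshold: float  # the test performed is x[attribute] > threshold
    low: "Tree"       # subtree followed when the test fails
    high: "Tree"      # subtree followed when the test succeeds


Tree = Union[Leaf, Node]


def classify(tree: Tree, x: Sequence[float]) -> int:
    """Follow the tests from the root down to a leaf and return its class."""
    while isinstance(tree, Node):
        tree = tree.high if x[tree.attribute] > tree.threshold else tree.low
    return tree.label


# Hypothetical tree (an assumption for illustration only):
# root tests x[0] > 2; its "high" branch tests x[1] > 0.
example_tree = Node(0, 2, Leaf(0), Node(1, 0, Leaf(1), Leaf(2)))
```

Each call to `classify` visits one node per level, so the cost of a prediction is bounded by the depth of the tree.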
Decision trees were first introduced in Machine Learning by Quinlan with the ID3 algorithm [66, 67]. ID3 was originally applied to discrete domains and was later extended to C4.5, which is also applicable to continuous domains [69]. C4.5 is now one of the most popular decision tree algorithms. The statistical point of view on the problem is represented by the other mainstream decision tree algorithm, CART (Classification and Regression Trees) [26].
A number of attempts have been made to apply decision trees to classification problems with monotonicity constraints, see [9, 18, 55, 64], of which the most successful is the monotone decision tree algorithm introduced in [18, 64].
This chapter addresses the problem of classification with monotonicity constraints in the context of monotone decision trees (MDT). It extends the algorithm presented in [64] in several directions in order to provide a set of tools for solving real-life problems comparable to the one available for classical decision trees.
Data noise is one problem that frequently occurs in real-life classification problems and has been studied extensively by a number of authors from different perspectives. In classification problems with monotonicity constraints, noise often causes an additional problem not relevant for the general case: violation of the monotonicity constraint. The MDT algorithm requires a strictly monotone data set. This chapter proposes an extension for dealing with monotonicity noise which allows the generation of a monotone tree from any non-monotone data set.
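To make the notion of monotonicity noise concrete, the following sketch lists the pairs of examples that violate the monotonicity constraint. It assumes the usual definition, namely that a data set is monotone when x ≤ y componentwise implies class(x) ≤ class(y); the function names `dominates` and `monotonicity_violations` are hypothetical, and the check is a plain pairwise comparison rather than part of the MDT algorithm itself.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], int]  # (attribute vector, class value)


def dominates(y: Sequence[float], x: Sequence[float]) -> bool:
    """Return True when x <= y holds in every component."""
    return len(x) == len(y) and all(a <= b for a, b in zip(x, y))


def monotonicity_violations(data: List[Example]) -> List[Tuple[int, int]]:
    """Return index pairs (i, j) with x_i <= x_j but class_i > class_j."""
    violations = []
    for (i, (xi, ci)), (j, (xj, cj)) in combinations(enumerate(data), 2):
        if dominates(xj, xi) and ci > cj:
            violations.append((i, j))
        elif dominates(xi, xj) and cj > ci:
            violations.append((j, i))
    return violations
```

A data set is monotone under this definition exactly when the returned list is empty. The comparison is quadratic in the number of examples, which is acceptable for a sketch but may need refinement for large data sets.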
Decision tree pruning is another area that has attracted a lot of attention (see [27] for a survey). A number of successful methods are available for reducing the tree size and avoiding overfitting to the particular properties of the data set. However, the monotonicity constraint raises new questions, the most important of which is how to label the new leaves so that the tree remains monotone. This chapter tries to answer this question in the setting of pre-pruning as well as post-pruning. Furthermore, we address the more general problem of labelling any tree in a consistent way so that it becomes monotone.
Most of the chapter is based on the publications [17, 16].
The chapter is organized as follows. Section 3.2 presents the original monotone decision tree algorithm. An extension for generating monotone trees from noisy non-monotone data is described in section 3.3. A number of pruning approaches and labelling functions which guarantee the monotonicity of the tree are presented in section 3.4.
Section 3.5 investigates the performance of two different splitting criteria in monotone decision tree generation in order to give more insight into which one better fits the classification problems with monotonicity restrictions. Section 3.6 explores the problem of missing attribute values in the context of monotone classification. A simple preprocessing method is proposed as an extension of a number of general approaches for filling in the unknown values so that the monotonicity property of the resulting data set is guaranteed.
The methods discussed in the chapter are tested experimentally, and the experimental settings, data sets and results are given in section 3.7. The conclusions of the chapter are given in section 3.8.