Experiments - Knowledge Discovery and Monotonicity

2.4.1 The Bankruptcy Prediction Problem

The research in bankruptcy prediction has a long history dating back to the 1930s. A number of statistical methods were applied for developing models able to predict in advance whether a company will go bankrupt or not. The analysis is based on financial information about the company in the form of financial indicators and ratios obtained from the company’s annual reports. The main goal is to describe the relationship between these indicators and bankruptcy using available data about companies that have already gone bankrupt and data about “healthy” companies.

A major breakthrough in the research was achieved in the 1960s by applying the method of discriminant analysis for bankruptcy prediction. In 1968 Altman proposed multivariate discriminant analysis for developing the prediction model [4] and the approach has been further improved and tested in a number of studies since then. However, this method makes several assumptions that are not always present in real-life data. This encouraged the researchers to look for alternatives. One of them is the logistic analysis that was proposed in the 1980s. It was applied on bankruptcy data and gave very good results.

The success of the machine learning methods in a number of application domains suggested that they might be useful for predicting bankruptcy as well. Neural networks, decision trees, genetic algorithms, rough sets and other ma-

chine learning approaches were applied to bankruptcy data with promising results [6, 52, 62, 74, 75]. In a number of studies these methods are tested and compared to the traditional statistical techniques. Some of the articles suggest that the new approaches outperform the classical methods on data sets from a number of areas including bankruptcy prediction.

In this research the bankruptcy prediction problem is interpreted as a clas- sification problem with monotonicity constraints.

2.4.2 The Bankruptcy Data set

The data set used in the experiments is discussed in [74, 40]. The sample consists of 39 objects denoted by F 1 to F 39 – firms that are described by 12 financial parameters. To each company a decision value is assigned – the expert evaluation of its category of risk for the year 1988. The condition attributes denoted by A1 to A12 take integer values from 0 to 4.

The decision attribute is denoted by d and takes integer values in the range 0 to 2 where: 0 means unacceptable, 1 means uncertainty and 2 means acceptable. This and the other data sets used in the following chapters are described in more details in Appendix A.

The data was first analyzed for monotonicity. The problem is obviously monotone (if one company outperforms another on all condition attributes then it should not have a lower value of the decision attribute). Nevertheless, one noisy example was discovered, namely F 24. It was removed from the data set and was not considered further.

2.4.3 Reducts and Decision Rules

The minimal reducts have been computed using our program Dualizer. There are 25 minimal general reducts (minimum length 3) and 15 monotone reducts (minimum length 4). They are given in tables 2.16 and 2.17 respectively.

We have also compared the heuristics to approximate a minimum reduct: the “Best reduct” method (for general reducts) and the Johnson strategy (for general and monotone reducts). The results are given in tables 2.18 and 2.19. Note that for the “Best reduct” method the results do not change for the monotone case since we use no additional information. For the Johnson’s heuristic, however, the results change because we use the monotone discernibility matrix instead of the general discernibility matrix. The set of generated reducts here is also influenced by whether we use simplification (remove redundancies, etc.) or not both for the general and for the monotone case.

Table 2.20 shows the two sets of decision rules obtained by computing the object (value)-reducts for the monotone reduct (A1, A3, A7, A9). Both sets of rules have minimal covers, of which the ones with minimum length are shown in table 2.21. A minimal cover can be transformed into an extension if the rules

Minimal general reducts: A1, A7, A11

A3, A6, A7, A12 A1, A3, A7, A9 A5, A6, A7, A8 A1, A3, A7, A12 A5, A6, A7, A9 A1, A5, A7, A9 A5, A6, A7, A10 A1, A6, A7, A12 A5, A6, A7, A11 A1, A7, A8, A9 A5, A6, A7, A12 A1, A7, A9, A12 A6, A7, A8, A10 A2, A3, A6, A7 A6, A7, A8, A11 A2, A5, A6, A7 A6, A7, A8, A12 A2, A6, A7, A11 A6, A7, A9, A11 A2, A6, A7, A12 A6, A7, A9, A12 A3, A6, A7, A8 A3, A6, A7, A9 A3, A6, A7, A10

Minimal monotone reducts: A1, A3, A7, A9

A1, A5, A7, A9 A2, A3, A6, A7 A2, A5, A6, A7 A3, A6, A7, A9 A3, A6, A7, A10 A3, A6, A7, A12 A5, A6, A7, A8 A5, A6, A7, A9 A5, A6, A7, A10 A5, A6, A7, A11 A5, A6, A7, A12 A1, A3, A6, A7, A8 A1, A3, A6, A7, A11

Table 2.17: All minimal monotone reducts

The Best Reduct method: Johnson’s algorithm:

with simplification without simplification

A1, A7, A8, A11 A1, A3, A7, A9 A1, A7, A8, A11

A6, A7, A8, A11 A1, A5, A7, A9 A6, A7, A8, A11

A3, A6, A7, A9 A3, A6, A7, A8

A5, A6, A7, A9

Table 2.18: General reducts using the two heuristics

Johnson’s algorithm:

with simplification without simplification

A1, A3, A7, A9 A2, A5, A6, A7

A1, A5, A7, A9 A5, A6, A7, A8

A3, A6, A7, A9 A5, A6, A7, A9

A5, A6, A7, A9 A5, A6, A7, A10

A5, A6, A7, A11 A5, A6, A7, A12

class d≥ 2 class d≥ 1 A1≥ 3 A1≥ 3 A7≥ 4 A3≥ 3 A9≥ 4 A7≥ 3 A1≥ 2 ∧ A7 ≥ 3 A9≥ 4 A3≥ 2 ∧ A7 ≥ 3 A1≥ 1 ∧ A3 ≥ 2 A7≥ 3 ∧ A9 ≥ 3 A1≥ 1 ∧ A9 ≥ 3 A3≥ 2 ∧ A7 ≥ 2 A3≥ 2 ∧ A7 ≥ 1 ∧ A9 ≥ 3 class d≤ 0 class d≤ 1 A7≤ 0 A7≤ 2 A9≤ 1 A9≤ 2 A1≤ 0 ∧ A3 ≤ 0 A1≤ 0 ∧ A3 ≤ 2 ∧ A7 ≤ 1 A1≤ 0 ∧ A3 ≤ 1 ∧ A7 ≤ 2 A1≤ 0 ∧ A3 ≤ 2 ∧ A9 ≤ 2 A3≤ 0 ∧ A9 ≤ 2 A3≤ 1 ∧ A7 ≤ 2 ∧ A9 ≤ 2 A3≤ 2 ∧ A7 ≤ 1 ∧ A9 ≤ 2

Table 2.20: The rules for (A1, A3, A7, A9)

are considered as minimal/maximal vectors in a decision list representation. In this sense the minimal cover of the first set of rules can be described by the following function:

f = 2.x73x93∨ 1.(x33∨ x73∨ x11x93∨ x32x72). (2.30) The maximal extension corresponding to the monotone reduct (A1, A3, A7, A9) is represented in table 2.22.

Our results show that the function f or equivalently its minlist consists of only 5 decision rules (prime implicants). They cover the whole input space. Moreover, each possible vector is classified as d = 0, 1 or 2 and not as d≥ 1 or d≥ 2 like in [40, 41].

The method presented in [40, 41] uses both formats shown in table 2.20 to describe a minimal cover, resulting in a system of 11 rules. Using both formats at the same time can result in much (possibly exponential) larger sets of rules. As mentioned before, using both formats of rules at the same time may also result in conflicting predictions.

We also computed a monotone decision tree [18, 64] for the bankruptcy data set discussed here. The tree is given in figure 2.2. It appears that monotone decision trees are larger because they contain the information of both an extension and its dual. The topic of generating monotone decision trees is discussed

class d≥ 2 class d≥ 1 A7≥ 3 ∧ A9 ≥ 3 A3≥ 3 A7≥ 3 A1≥ 1 ∧ A9 ≥ 3 A3≥ 2 ∧ A7 ≥ 2 class d≤ 0 class d≤ 1 A1≤ 0 ∧ A3 ≤ 2 ∧ A7 ≤ 1 A7 ≤ 2 A1≤ 0 ∧ A3 ≤ 1 ∧ A7 ≤ 2 A9 ≤ 2 A3≤ 1 ∧ A7 ≤ 2 ∧ A9 ≤ 2

Table 2.21: The minimal covers for (A1, A3, A7, A9)

class d = 2 class d = 1 A1≥ 3 A3≥ 3 A3≥ 4 A7≥ 3 A7≥ 4 A1≥ 1 ∧ A3 ≥ 2 A9≥ 4 A1≥ 1 ∧ A9 ≥ 3 A1≥ 2 ∧ A7 ≥ 3 A3 ≥ 2 ∧ A7 ≥ 2 A3≥ 2 ∧ A7 ≥ 3 A3 ≥ 2 ∧ A7 ≥ 1 ∧ A9 ≥ 3 A7≥ 3 ∧ A9 ≥ 3

Table 2.22: The maximal extension for (A1, A3, A7, A9)

in detail in the next chapter.

In document Knowledge Discovery and Monotonicity (Page 53-58)