Chapter 3: A Genetic Rule Mining Method
3.5 GA vs. Decision Tree Rule Mining Methods
This section compares the performance of the proposed GA method with that of a decision tree method. The comparison is based on the quality of the mined rules, measured by class support and class error, two popular measures in the DM literature. A higher class support and a lower class error are more desirable in rule mining problems. The genome data set, which is reasonably large, was chosen to test the two methods. For the decision tree method, the C4.5 algorithm was used. The node split criterion was the Information Gain Ratio, and the minimum number of instances per leaf was set to two. The resulting tree had 96 leaves. After pruning at a confidence level of 0.25, 38 leaves remained.
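C4.5 chooses the attribute to split on by its Information Gain Ratio. This is the information gain of the split divided by its split information, which penalises attributes with many values. Below is a minimal Python sketch of that criterion for one categorical attribute, such as the nucleotide at one sequence position. The function names are illustrative only. The sketch omits C4.5's other refinements, for example considering only tests whose gain is at least average.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels):
    """Information Gain Ratio of splitting `labels` on one categorical attribute.

    values: the attribute value of each example (e.g. the base at position A31)
    labels: the class label of each example (e.g. "IE" or "EI")
    """
    n = len(labels)
    # One branch per distinct attribute value.
    branches = {}
    for v, y in zip(values, labels):
        branches.setdefault(v, []).append(y)
    # Expected class entropy after the split, weighted by branch size.
    remainder = sum(len(b) / n * entropy(b) for b in branches.values())
    gain = entropy(labels) - remainder
    # Split information is the entropy of the attribute values themselves.
    split_info = entropy(values)
    # A single-valued attribute cannot split the node.
    return gain / split_info if split_info > 0 else 0.0
```

An attribute whose value perfectly separates two balanced classes gets a ratio of 1.0. One that is independent of the class gets 0.0.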
Rule Description
1. (A30= G) and (A34= G)==> (EI), rcs=0.855, rce =0.092
2. (A31= T) and (A34= G)==> (EI), rcs=0.849, rce =0.059
3. (A30= G) and (A31= T) and (A34= G)==> (EI), rcs=0.849, rce =0.028
Table 3.11 Mined classification rules of DNA data (rcs=Rule class support, rce=Rule class error)
These 38 leaves were translated into rules and are shown in Table 3.12 with their class support and error values.
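Each leaf of the tree yields one conjunctive rule. The attribute tests on the path from the root to the leaf form the antecedent, and the leaf's class forms the consequent. The sketch below illustrates this translation and the computation of the two quality measures. The exact definitions of class support and class error are assumed here for illustration: support is the fraction of the target class covered by the rule, and error is the fraction of covered examples that belong to other classes. The `Rule` type, the helper names and the toy data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    conditions: dict  # attribute -> required value, e.g. {"A31": "T", "A30": "G"}
    target: str       # class predicted at the leaf, e.g. "EI"

    def covers(self, example: dict) -> bool:
        # A conjunctive rule covers an example when every condition holds.
        return all(example.get(attr) == val for attr, val in self.conditions.items())


def path_to_rule(path, leaf_class):
    """Convert one root-to-leaf path [(attr, value), ...] into a rule."""
    return Rule(dict(path), leaf_class)


def class_support_and_error(rule, examples, labels):
    """ASSUMED definitions, for illustration only:
    class support = covered examples of the target class / all examples of that class
    class error   = covered examples of other classes / all covered examples
    """
    n_class = sum(1 for y in labels if y == rule.target)
    covered = [y for x, y in zip(examples, labels) if rule.covers(x)]
    hits = sum(1 for y in covered if y == rule.target)
    support = hits / n_class if n_class else 0.0
    error = (len(covered) - hits) / len(covered) if covered else 0.0
    return support, error


# Hypothetical toy data, not the genome data set.
examples = [
    {"A30": "G", "A31": "T", "A34": "G"},
    {"A30": "G", "A31": "T", "A34": "G"},
    {"A30": "G", "A31": "T", "A34": "G"},
    {"A30": "G", "A31": "T", "A34": "C"},
    {"A30": "A", "A31": "G", "A34": "G"},
]
labels = ["EI", "EI", "IE", "EI", "IE"]

rule = path_to_rule([("A31", "T"), ("A30", "G"), ("A34", "G")], "EI")
support, error = class_support_and_error(rule, examples, labels)
print(f"rcs={support:.3f}, rce={error:.3f}")
```

On the toy data the rule covers two of the three EI examples and one IE example, so it prints rcs=0.667, rce=0.333.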
The results show that ((A31 = G))==>IE is the best rule found by the decision tree method for the IE class, since it has the highest class support among all rules of that class. Similarly, ((A31 = T) and (A30 = G) and (A34 = G))==>EI is the best rule mined by the decision tree method for the EI class.

To compare these results with the GA rule mining method, the same data set was used. For the IE class, the minimum class support and maximum class error were set to 0.2 and 0.001. The aim was for the GA method to mine IE rules better than those found by the decision tree method, whose best IE rule has a class support of 0.228 and a class error of 0.001. The results are shown in Table 3.13. The best rule found by the GA method for the IE class was (A30= A)==> IE, with a class support of 0.284. This is clearly higher than the support (0.228) of the best decision tree rule, and its class error is also lower (0.000 versus 0.001).

For the EI class, the minimum class support and maximum class error were set to 0.5 and 0.04, and the resulting rules are also reported in Table 3.13. In this experiment the best EI rule was ((A30= G) and (A31= T) and (A34= G))==> EI, which has the same conditions as the best rule found by the decision tree method. In addition, the GA method mined other EI rules with class supports of 0.551 to 0.718. These are far higher than the supports of the remaining decision tree rules for that class, which are at most 0.038. The results also show that the GA method mines only rules of user interest, i.e. rules that satisfy the user-set thresholds. The decision tree method, in contrast, produces one rule for each of its many leaves.
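Choosing the best rule for a class, as done above, amounts to ranking that class's rules by class support. The following Python sketch applies this selection to a few rows copied from Tables 3.12 and 3.13. The `MinedRule` type and the `best_rule_per_class` helper are hypothetical, and breaking ties by the lower class error is an added assumption.

```python
from typing import NamedTuple


class MinedRule(NamedTuple):
    antecedent: str
    target: str
    rcs: float  # rule class support
    rce: float  # rule class error


def best_rule_per_class(rules):
    """Return {class: rule}, keeping the highest-support rule for each class.

    Ties in support are broken by the lower class error (an assumption;
    the text ranks rules by support only).
    """
    best = {}
    for r in rules:
        cur = best.get(r.target)
        if cur is None or (r.rcs, -r.rce) > (cur.rcs, -cur.rce):
            best[r.target] = r
    return best


# A few rows copied from Table 3.12 (decision tree) and Table 3.13 (GA).
dt_rules = [
    MinedRule("(A31 = G)", "IE", 0.228, 0.001),
    MinedRule("(A31 = C)", "IE", 0.214, 0.005),
    MinedRule("(A31 = T) and (A30 = A)", "IE", 0.081, 0.000),
    MinedRule("(A31 = T) and (A30 = G) and (A34 = G)", "EI", 0.850, 0.029),
    MinedRule("(A31 = T) and (A30 = G) and (A34 = T) and (A32 = A)", "EI", 0.038, 0.005),
]
ga_rules = [
    MinedRule("(A30 = A)", "IE", 0.284, 0.000),
    MinedRule("(A28 = A) and (A30 = A)", "IE", 0.282, 0.000),
    MinedRule("(A30 = G) and (A31 = T) and (A34 = G)", "EI", 0.850, 0.028),
    MinedRule("(A30 = G) and (A31 = T) and (A33 = A)", "EI", 0.718, 0.038),
]

for name, rules in (("Decision tree", dt_rules), ("GA", ga_rules)):
    for cls, r in sorted(best_rule_per_class(rules).items()):
        print(f"{name} best {cls}: {r.antecedent}, rcs={r.rcs}, rce={r.rce}")
```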
This experiment shows that, in terms of class support and class error, the GA method can mine better rules than the decision tree method. The rule set mined by the decision tree method can be large. The rule set mined by the GA method, by contrast, is compact and consists of better rules. This is because the user can set the minimum support and the maximum allowed error, which reduces the number of unwanted rules in the mining result.
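The effect of user-set thresholds can be illustrated by applying the same acceptance test to a list of rules: class support must be at least the minimum and class error at most the maximum. This sketch uses the per-class thresholds from the experiment and a few decision tree rules from Table 3.12. It is only a post-hoc filter with hypothetical names and does not model how the GA uses the thresholds during its search.

```python
from typing import NamedTuple


class MinedRule(NamedTuple):
    antecedent: str
    target: str
    rcs: float  # rule class support
    rce: float  # rule class error


def accept(rule, thresholds):
    """Keep a rule only if it meets the user-set limits for its class.

    thresholds: {class: (min_support, max_error)}
    """
    min_support, max_error = thresholds[rule.target]
    return rule.rcs >= min_support and rule.rce <= max_error


# Per-class thresholds reported for the GA experiment in this section.
thresholds = {"IE": (0.2, 0.001), "EI": (0.5, 0.04)}

# A few decision tree rules copied from Table 3.12.
dt_rules = [
    MinedRule("(A31 = G)", "IE", 0.228, 0.001),
    MinedRule("(A31 = C)", "IE", 0.214, 0.005),
    MinedRule("(A31 = T) and (A30 = A)", "IE", 0.081, 0.000),
    MinedRule("(A31 = T) and (A30 = G) and (A34 = G)", "EI", 0.850, 0.029),
    MinedRule("(A31 = T) and (A30 = G) and (A34 = T) and (A32 = A)", "EI", 0.038, 0.005),
]

kept = [r for r in dt_rules if accept(r, thresholds)]
for r in kept:
    print(r)
```

Of the five decision tree rules listed, only ((A31 = G))==>IE and ((A31 = T) and (A30 = G) and (A34 = G))==>EI pass the test.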
Rule Description
1. ((A31 = G))==>IE, rcs= 0.228, rce= 0.001
2. ((A31 = C))==>IE, rcs= 0.214, rce= 0.005
3. ((A31 = T) and (A30 = A))==>IE, rcs= 0.081, rce= 0.000
4. ((A31 = T) and (A30 = C))==>IE, rcs= 0.066, rce= 0.000
5. ((A31 = T) and (A30 = T))==>IE, rcs= 0.036, rce= 0.000
6. ((A31 = T) and (A30 = G) and (A34 = C) and (A32 = C))==>IE, rcs= 0.017, rce= 0.000
7. ((A31 = T) and (A30 = G) and (A34 = C) and (A32 = T))==>IE, rcs= 0.012, rce= 0.000
8. ((A31 = T) and (A30 = G) and (A34 = T) and (A32 = G) and (A21 = T))==>IE, rcs= 0.012, rce= 0.000
9. ((A31 = T) and (A30 = G) and (A34 = A) and (A27 = C) and (A48 = C))==>IE, rcs= 0.009, rce= 0.001
10. ((A31 = T) and (A30 = G) and (A34 = C) and (A32 = G) and (A21 = T))==>IE, rcs= 0.008, rce= 0.000
11. ((A31 = T) and (A30 = G) and (A34 = T) and (A32 = T))==>IE, rcs= 0.008, rce= 0.001
12. ((A31 = T) and (A30 = G) and (A34 = A) and (A27 = C) and (A48 = A))==>IE, rcs= 0.007, rce= 0.000
13. ((A31 = T) and (A30 = G) and (A34 = C) and (A32 = G) and (A21 = C) and (A22 = C))==>IE, rcs= 0.005, rce= 0.000
14. ((A31 = T) and (A30 = G) and (A34 = T) and (A32 = C))==>IE, rcs= 0.005, rce= 0.001
15. ((A31 = T) and (A30 = G) and (A34 = A) and (A27 = C) and (A48 = G) and (A10 = C))==>IE, rcs= 0.004, rce= 0.000
16. ((A31 = T) and (A30 = G) and (A34 = T) and (A32 = G) and (A21 = C))==>IE, rcs= 0.004, rce= 0.000
17. ((A31 = T) and (A30 = G) and (A34 = A) and (A27 = C) and (A48 = G)
18. ((A31 = T) and (A30 = G) and (A34 = A) and (A27 = C) and (A48 = G) and (A10 = T))==>IE, rcs= 0.003, rce= 0.000
19. ((A31 = T) and (A30 = G) and (A34 = A) and (A27 = T) and (A13 = A))==>IE, rcs= 0.003, rce= 0.000
20. ((A31 = T) and (A30 = G) and (A34 = A) and (A27 = T) and (A13 = C))==>IE, rcs= 0.003, rce= 0.000
21. ((A31 = T) and (A30 = G) and (A34 = C) and (A32 = G) and (A21 = C) and (A22 = G))==>IE, rcs= 0.003, rce= 0.000
22. ((A31 = T) and (A30 = G) and (A34 = G))==>EI, rcs= 0.850, rce= 0.029
23. ((A31 = T) and (A30 = G) and (A34 = T) and (A32 = A))==>EI, rcs= 0.038, rce= 0.005
24. ((A31 = T) and (A30 = G) and (A34 = C) and (A32 = A))==>EI, rcs= 0.033, rce= 0.003
25. ((A31 = T) and (A30 = G) and (A34 = A) and (A27 = A))==>EI, rcs= 0.018, rce= 0.000
26. ((A31 = T) and (A30 = G) and (A34 = A) and (A27 = C) and (A48 = T))==>EI, rcs= 0.009, rce= 0.000
27. ((A31 = T) and (A30 = G) and (A34 = A) and (A27 = G))==>EI, rcs= 0.008, rce= 0.000
28. ((A31 = T) and (A30 = G) and (A34 = C) and (A32 = G) and (A21 = A))==>EI, rcs= 0.007, rce= 0.000
29. ((A31 = T) and (A30 = G) and (A34 = T) and (A32 = G) and (A21 = A))==>EI, rcs= 0.007, rce= 0.000
30. ((A31 = T) and (A30 = G) and (A34 = C) and (A32 = G) and (A21 = G))==>EI, rcs= 0.005, rce= 0.001
31. ((A31 = T) and (A30 = G) and (A34 = A) and (A27 = C) and (A48 = G) and (A10 = G))==>EI, rcs= 0.003, rce= 0.000
32. ((A31 = T) and (A30 = G) and (A34 = A) and (A27 = T) and (A13 = T))==>EI, rcs= 0.003, rce= 0.000
33. ((A31 = T) and (A30 = G) and (A34 = C) and (A32 = G) and (A21 = C)
34. ((A31 = N))==>EI, rcs= 0.001, rce= 0.000
35. ((A31 = T) and (A30 = G) and (A34 = A) and (A27 = T) and (A13 = G))==>EI, rcs= 0.001, rce= 0.000
36. ((A31 = T) and (A30 = G) and (A34 = C) and (A32 = G) and (A21 = C) and (A22 = A))==>EI, rcs= 0.001, rce= 0.000
37. ((A31 = T) and (A30 = G) and (A34 = T) and (A32 = G) and (A21 = G))==>EI, rcs= 0.001, rce= 0.000
Table 3.12 Rule Mining on DNA data by Decision Tree (rcs=Rule class support, rce=Rule class error)
Rule Description
1. (A30= A)==> IE, rcs= 0.284, rce= 0.000
2. ((A28= A) and (A30= A))==> IE, rcs= 0.282, rce= 0.000
3. ((A29= G) and (A30= A))==> IE, rcs= 0.282, rce= 0.000
4. ((A30= G) and (A31= T) and (A34= G))==> EI, rcs= 0.850, rce= 0.028
5. ((A30= G) and (A31= T) and (A33= A))==> EI, rcs= 0.718, rce= 0.038
6. ((A30= G) and (A33= A) and (A34= G))==> EI, rcs= 0.642, rce= 0.032
7. ((A31= T) and (A33= A) and (A34= G))==> EI, rcs= 0.638, rce= 0.018
8. ((A30= G) and (A31= T) and (A33= A) and (A34= G))==> EI, rcs= 0.638, rce= 0.008
9. ((A29= G) and (A30= G) and (A31= T) and (A34= G))==> EI, rcs= 0.671, rce= 0.028
10. ((A29= G) and (A30= G) and (A31= T) and (A33= A))==> EI, rcs= 0.551, rce= 0.378
Table 3.13 Rule Mining on DNA data by GA (rcs=Rule class support, rce=Rule class error)