GA Vs Decision Tree Rule Mining Method - A Genetic Rule Mining Method

Chapter 3: A Genetic Rule Mining Method

3.5 GA Vs Decision Tree Rule Mining Method

This section compares the performance of the proposed GA method with that of the decision tree method. The comparison is based on the quality of the rules, as determined by class support and class error, two popular measures in the DM literature. For rule mining problems, a higher class support and a lower class error are more desirable. The genome data set, which is reasonably large, was chosen to test the two methods. For the decision tree method, the C4.5 algorithm was used, with the Information Gain Ratio as the split criterion of the node. The minimum number of leaves in the node was chosen to be two. The constructed decision tree had 96 leaves. The tree was then pruned at a confidence level of 0.25, which left 38 leaves.
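The split criterion mentioned above can be illustrated with a minimal sketch of the information gain ratio for one categorical attribute. This is not the C4.5 implementation used in the experiment; the function names `entropy` and `gain_ratio` are hypothetical, and the sketch assumes complete data with no missing values.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (in bits) of a list of categorical values."""
    total = len(items)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(items).values())


def gain_ratio(values, labels):
    """Information gain ratio of splitting `labels` by the attribute `values`.

    ASSUMPTION: one categorical attribute, no missing values.
    """
    total = len(labels)
    # Group the class labels by attribute value.
    partitions = {}
    for value, label in zip(values, labels):
        partitions.setdefault(value, []).append(label)
    # Information gain: entropy before the split minus the weighted
    # entropy of the partitions after the split.
    remainder = sum(len(part) / total * entropy(part)
                    for part in partitions.values())
    gain = entropy(labels) - remainder
    # Split information penalises attributes with many distinct values.
    split_info = entropy(values)
    return gain / split_info if split_info > 0 else 0.0
```

An attribute that separates the classes perfectly into two equal halves gets a ratio of 1.0, while an attribute whose values are unrelated to the class gets 0.0.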

Rule Description

1. (A30= G) and (A34= G)==> (EI), rcs=0.855, rce =0.092

2. (A31= T) and (A34= G)==> (EI), rcs=0.849, rce =0.059

3. (A30= G) and (A31= T) and (A34= G)==> (EI), rcs=0.849, rce =0.028

Table 3.11 Mined classification rules of DNA data (rcs=Rule class support, rce=Rule class error)
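As a rough illustration of how values such as rcs and rce might be computed for a rule, the sketch below uses two assumed definitions, since this section does not restate them: rcs is taken as the fraction of records of the target class that satisfy the rule antecedent, and rce as the fraction of records outside the target class that satisfy it. The function names and the toy records are hypothetical and are not taken from the genome data set.

```python
def matches(record, antecedent):
    """True if the record satisfies every (attribute, value) condition."""
    return all(record.get(attr) == value for attr, value in antecedent.items())


def rule_class_support(records, antecedent, target):
    # ASSUMPTION: share of target-class records covered by the antecedent.
    in_class = [r for r in records if r["class"] == target]
    if not in_class:
        return 0.0
    return sum(matches(r, antecedent) for r in in_class) / len(in_class)


def rule_class_error(records, antecedent, target):
    # ASSUMPTION: share of records outside the target class that the
    # antecedent still covers (i.e. would misclassify as the target).
    out_class = [r for r in records if r["class"] != target]
    if not out_class:
        return 0.0
    return sum(matches(r, antecedent) for r in out_class) / len(out_class)


# Toy records, invented for illustration only.
records = [
    {"A30": "G", "A34": "G", "class": "EI"},
    {"A30": "G", "A34": "T", "class": "EI"},
    {"A30": "G", "A34": "G", "class": "IE"},
    {"A30": "A", "A34": "G", "class": "IE"},
    {"A30": "A", "A34": "C", "class": "N"},
]
antecedent = {"A30": "G", "A34": "G"}
support = rule_class_support(records, antecedent, "EI")  # 1 of 2 EI records
error = rule_class_error(records, antecedent, "EI")      # 1 of 3 other records
```

If the thesis defines class error relative to the records covered by the rule rather than to the records outside the class, only the denominator in `rule_class_error` would change.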

These 38 leaves were translated into rules and are shown in Table 3.12 with their class support and error values.
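The translation of leaves into rules can be sketched as a walk over every root-to-leaf path, with the attribute tests along the path forming the rule antecedent. The nested-dictionary tree layout and the function names below are assumptions made for illustration; the small tree is a hand-built, incomplete fragment consistent with a few of the decision tree rules (most branches are omitted), not the actual pruned tree.

```python
def leaves_to_rules(node, path=()):
    """Yield (conditions, label) for every root-to-leaf path of the tree."""
    if "label" in node:
        # Leaf: the tests collected along the path form the antecedent.
        yield list(path), node["label"]
        return
    for value, child in node["children"].items():
        yield from leaves_to_rules(child, path + ((node["attr"], value),))


def format_rule(conditions, label):
    """Render a rule in the notation used for the decision tree rules."""
    body = " and ".join(f"({attr} = {value})" for attr, value in conditions)
    return f"({body})==>{label}"


# Hypothetical fragment of a tree; most branches are left out.
tree = {
    "attr": "A31",
    "children": {
        "G": {"label": "IE"},
        "C": {"label": "IE"},
        "T": {
            "attr": "A30",
            "children": {
                "A": {"label": "IE"},
                "G": {"attr": "A34", "children": {"G": {"label": "EI"}}},
            },
        },
    },
}

rules = [format_rule(conds, label) for conds, label in leaves_to_rules(tree)]
```

Each leaf produces exactly one rule, so a tree with many leaves yields an equally long rule list, many of them with very low support.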

The results show that ((A31 = G))==>IE is the best rule found for the IE class by the decision tree method, since it has the highest support among the rules of that class. Similarly, ((A31 = T) and (A30 = G) and (A34 = G))==>EI is the best rule mined by the decision tree method for the EI class. To compare these results with the GA rule mining method, the same data set was used. The minimum class support and maximum error for the IE class were set to 0.2 and 0.001, in the hope of mining better rules for the IE class than those found by the decision tree method (whose best IE rule had a class support of 0.228 and a class error of 0.001). The results are shown in Table 3.13. The best rule found by the GA method for the IE class was (A30= A)==> IE, with a class support of 0.284, which is higher than the best support (0.228) found by the decision tree method. The class error was nearly the same for both methods (0.000 for the GA method and 0.001 for the decision tree method). For the EI class, the minimum class support and maximum error were set to 0.5 and 0.04, and the resulting rules are also reported in Table 3.13. In this experiment the best rule was ((A30= G) and (A31= T) and (A34= G))==> EI, which is the same as the best rule found by the decision tree method. In addition to this rule, the GA method mined other rules whose supports are higher than those of the remaining EI rules found by the decision tree method. The results also show that the GA method mines only rules of interest to the user, whereas the decision tree method derives its rules from a large number of leaves.

This experiment shows that the GA method can mine better rules than the decision tree method. The rule set mined by the decision tree method can be large, whereas the rule set mined by the GA method is compact and made up of better rules. This is because the user can set the minimum support and the maximum allowed error, which reduces the number of unwanted rules in the mining result.
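The effect of the two user-set parameters can be illustrated by applying them as a simple filter to a handful of rules taken from Table 3.12, using the thresholds stated above for each class. This is only a sketch: the GA method applies these limits during its search, whereas the post-hoc filter here merely shows which rules would count as interesting. The `Rule` class and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    antecedent: str
    label: str
    rcs: float  # rule class support
    rce: float  # rule class error


# (minimum class support, maximum class error) per class, as in the text.
THRESHOLDS = {"IE": (0.2, 0.001), "EI": (0.5, 0.04)}

# A few decision tree rules copied from Table 3.12.
tree_rules = [
    Rule("(A31 = G)", "IE", 0.228, 0.001),
    Rule("(A31 = C)", "IE", 0.214, 0.005),
    Rule("(A31 = T) and (A30 = A)", "IE", 0.081, 0.000),
    Rule("(A31 = T) and (A30 = G) and (A34 = G)", "EI", 0.850, 0.029),
    Rule("(A31 = T) and (A30 = G) and (A34 = T) and (A32 = A)", "EI", 0.038, 0.005),
]


def keep_interesting(rules, thresholds):
    """Keep rules that meet the support and error limits of their class."""
    kept = []
    for rule in rules:
        min_rcs, max_rce = thresholds[rule.label]
        if rule.rcs >= min_rcs and rule.rce <= max_rce:
            kept.append(rule)
    return kept


def best_per_class(rules):
    """Return the highest-support rule for each class."""
    best = {}
    for rule in rules:
        if rule.label not in best or rule.rcs > best[rule.label].rcs:
            best[rule.label] = rule
    return best


interesting = keep_interesting(tree_rules, THRESHOLDS)
best = best_per_class(interesting)
```

With these five rules, only the first IE rule and the first EI rule pass the limits; the others are rejected for low support or high error.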

Rule Description

1. ((A31 = G))==>IE, rcs= 0.228, rce= 0.001

2. ((A31 = C))==>IE, rcs= 0.214, rce= 0.005

3. ((A31 = T) and (A30 = A))==>IE, rcs= 0.081, rce= 0.000

4. ((A31 = T) and (A30 = C))==>IE, rcs= 0.066, rce= 0.000

5. ((A31 = T) and (A30 = T))==>IE, rcs= 0.036, rce= 0.000

6. ((A31 = T) and (A30 = G) and (A34 = C) and (A32 = C))==>IE, rcs= 0.017, rce= 0.000

7. ((A31 = T) and (A30 = G) and (A34 = C) and (A32 = T))==>IE, rcs= 0.012, rce= 0.000

8. ((A31 = T) and (A30 = G) and (A34 = T) and (A32 = G) and (A21 = T))==>IE, rcs= 0.012, rce= 0.000

9. ((A31 = T) and (A30 = G) and (A34 = A) and (A27 = C) and (A48 = C))==>IE, rcs= 0.009, rce= 0.001

10. ((A31 = T) and (A30 = G) and (A34 = C) and (A32 = G) and (A21 = T))==>IE, rcs= 0.008, rce= 0.000

11. ((A31 = T) and (A30 = G) and (A34 = T) and (A32 = T))==>IE, rcs= 0.008, rce= 0.001

12. ((A31 = T) and (A30 = G) and (A34 = A) and (A27 = C) and (A48 = A))==>IE, rcs= 0.007, rce= 0.000

13. ((A31 = T) and (A30 = G) and (A34 = C) and (A32 = G) and (A21 = C) and (A22 = C))==>IE, rcs= 0.005, rce= 0.000

14. ((A31 = T) and (A30 = G) and (A34 = T) and (A32 = C))==>IE, rcs= 0.005, rce= 0.001

15. ((A31 = T) and (A30 = G) and (A34 = A) and (A27 = C) and (A48 = G) and (A10 = C))==>IE, rcs= 0.004, rce= 0.000

16. ((A31 = T) and (A30 = G) and (A34 = T) and (A32 = G) and (A21 = C))==>IE, rcs= 0.004, rce= 0.000

17. ((A31 = T) and (A30 = G) and (A34 = A) and (A27 = C) and (A48 = G)

18. ((A31 = T) and (A30 = G) and (A34 = A) and (A27 = C) and (A48 = G) and (A10 = T))==>IE, rcs= 0.003, rce= 0.000

19. ((A31 = T) and (A30 = G) and (A34 = A) and (A27 = T) and (A13 = A))==>IE, rcs= 0.003, rce= 0.000

20. ((A31 = T) and (A30 = G) and (A34 = A) and (A27 = T) and (A13 = C))==>IE, rcs= 0.003, rce= 0.000

21. ((A31 = T) and (A30 = G) and (A34 = C) and (A32 = G) and (A21 = C) and (A22 = G))==>IE, rcs= 0.003, rce= 0.000

22. ((A31 = T) and (A30 = G) and (A34 = G))==>EI, rcs= 0.850, rce= 0.029

23. ((A31 = T) and (A30 = G) and (A34 = T) and (A32 = A))==>EI, rcs= 0.038, rce= 0.005

24. ((A31 = T) and (A30 = G) and (A34 = C) and (A32 = A))==>EI, rcs= 0.033, rce= 0.003

25. ((A31 = T) and (A30 = G) and (A34 = A) and (A27 = A))==>EI, rcs= 0.018, rce= 0.000

26. ((A31 = T) and (A30 = G) and (A34 = A) and (A27 = C) and (A48 = T))==>EI, rcs= 0.009, rce= 0.000

27. ((A31 = T) and (A30 = G) and (A34 = A) and (A27 = G))==>EI, rcs= 0.008, rce= 0.000

28. ((A31 = T) and (A30 = G) and (A34 = C) and (A32 = G) and (A21 = A))==>EI, rcs= 0.007, rce= 0.000

29. ((A31 = T) and (A30 = G) and (A34 = T) and (A32 = G) and (A21 = A))==>EI, rcs= 0.007, rce= 0.000

30. ((A31 = T) and (A30 = G) and (A34 = C) and (A32 = G) and (A21 = G))==>EI, rcs= 0.005, rce= 0.001

31. ((A31 = T) and (A30 = G) and (A34 = A) and (A27 = C) and (A48 = G) and (A10 = G))==>EI, rcs= 0.003, rce= 0.000

32. ((A31 = T) and (A30 = G) and (A34 = A) and (A27 = T) and (A13 = T))==>EI, rcs= 0.003, rce= 0.000

33. ((A31 = T) and (A30 = G) and (A34 = C) and (A32 = G) and (A21 = C)

34. ((A31 = N))==>EI, rcs= 0.001, rce= 0.000

35. ((A31 = T) and (A30 = G) and (A34 = A) and (A27 = T) and (A13 = G))==>EI, rcs= 0.001, rce= 0.000

36. ((A31 = T) and (A30 = G) and (A34 = C) and (A32 = G) and (A21 = C) and (A22 = A))==>EI, rcs= 0.001, rce= 0.000

37. ((A31 = T) and (A30 = G) and (A34 = T) and (A32 = G) and (A21 = G))==>EI, rcs= 0.001, rce= 0.000

Table 3.12 Rule Mining on DNA data by Decision Tree (rcs=Rule class support, rce=Rule class error)

Rule Description

1. (A30= A)==> IE, rcs= 0.284, rce= 0.000

2. ((A28= A) and (A30= A))==> IE, rcs= 0.282, rce= 0.000

3. ((A29= G) and (A30= A))==> IE, rcs= 0.282, rce= 0.000

4. ((A30= G) and (A31= T) and (A34= G))==> EI, rcs= 0.850, rce= 0.028

5. ((A30= G) and (A31= T) and (A33= A))==> EI, rcs= 0.718, rce= 0.038

6. ((A30= G) and (A33= A) and (A34= G))==> EI, rcs= 0.642, rce= 0.032

7. ((A31= T) and (A33= A) and (A34= G))==> EI, rcs= 0.638, rce= 0.018

8. ((A30= G) and (A31= T) and (A33= A) and (A34= G))==> EI, rcs= 0.638, rce= 0.008

9. ((A29= G) and (A30= G) and (A31= T) and (A34= G))==> EI, rcs= 0.671, rce= 0.028

10. ((A29= G) and (A30= G) and (A31= T) and (A33= A))==> EI, rcs= 0.551, rce= 0.378

Table 3.13 Rule Mining on DNA data by GA (rcs=Rule class support, rce=Rule class error)

In document Data Mining Using Neural Networks (Page 104-108)