Mining Efficiency - Experimental Evaluation

3.6 Experimental Evaluation

3.6.4 Mining Efficiency

In this section, we present our experiments for comparing the efficiency of several pattern mining methods. The purpose is to show that our proposed pruning techniques can greatly

Figure 18: Comparing the classification accuracies for different methods on the UCI datasets against MPP. Every point corresponds to a specific method on a specific dataset: its y-coordinate represents its accuracy and its x-coordinate represents the accuracy of MPP for the same dataset. Points above the diagonal correspond to methods that outperform MPP and points below the diagonal correspond to methods that are outperformed by MPP.

improve the efficiency of frequent pattern mining.

3.6.4.1 Compared Methods We compare the running time of the following methods:

1. FPM: A frequent pattern mining method, which corresponds to the first phase of any two-phase method [Cheng et al., 2007,Webb, 2007,Xin et al., 2006,Kavsek and Lavraˇc, 2006, Exarchos et al., 2008,Deshpande et al., 2005,Li et al., 2001b]. We use the algorithm by [Zaki, 2000], which applies the vertical data format (see Section2.2.3).

2. MBST: The Model Based Search Tree (MBST) method [Fan et al., 2008], which uses frequent pattern mining to build a decision tree (see Section5.7.2.1).

3. MPP-naïve: This method mines MPPs using a naive two-phase implementation, which first applies FPM (the first phase) and then evaluate all frequent patterns (the second phase).

4. MPP-lossless: This method mines MPPs by integrating pattern evaluation with fre- quent pattern mining. It applies only the lossless pruning described in Section3.5.4.1. 5. MPP-lossy: Our proposed method, which mines MPPs by integrating pattern evaluation

with frequent pattern mining. It applies both the lossless pruning and the lossy pruning described in Section3.5.4.2.

The experiments are conducted on a Dell Precision T1600 machine with an Intel Xeon 3GHz CPU and 16GB of RAM. All methods are implemented in MATLAB.

3.6.4.2 Results on UCI Datasets Table 8shows the execution time (in seconds) of the compared methods on the UCI datasets. We use the same settings as before (see Section 5.7.2.1): For FPM, MPP-naïve, MPP-lossless and MPP-lossy, we set the local minimum support (σy) to 10%. For MBST, we set the invocation minimum support to 25%.

The results show that scalability is a concern for FPM, MPP-naïve and MBST. For example, these methods are very slow on the parkinson, hepatitis, ionoshpere and WDBC datasets. In comparison, MPP-lossless is faster due to its lossless pruning and MPP-lossy is much faster due to its lossy pruning. Consider for instance the parkinson dataset. Mining all frequent patterns took 9,866 seconds, MBST took 5,159 seconds and MPP-naïve took 10,832 seconds. On the other hand, MPP-lossless took 828 seconds (an order of magnitude faster than FPM) and MPP-lossy took only 37 seconds (more than two orders of magnitude faster than FPM). This shows that our pruning techniques can significantly improve the mining efficiency.

Effectiveness of the lossy pruning: Remember that the lossy pruning heuristic used

by MPP-lossy does not guarantee the completeness of the result (see Section3.5.4.2). Here, we discuss its effectiveness by examining the number of MPPs that are missed when we apply the lossy pruning. In turned out that on ten out of the fifteen datasets (namely, lymphography, heart, hepatitis, diabetes, breast cancer, nursery, mammographic, tic tac toe, ionosphere and zoo), MPP-lossy did not miss any MPP. On the parkinson, red wine, Kr vs

Dataset FPM MBST MPP-naïve MPP-lossless MPP-lossy Lymphography 335 568 344 154 22 Parkinson 9,866 5,159 10,832 828 37 Heart 41 143 57 38 9 Hepatitis 1,121 1,808 1,156 394 35 Diabetes 3 16 6 6 2 Breast cancer 3 13 4 4 2 Nursery 2 3 11 9 8 Red wine 24 73 52 52 11 Mammographic 1 2 1 1 1

Tic tac toe 2 7 4 4 3

Ionosphere 16,580 3,522 16,809 1,080 814

Kr vs kp 175 538 535 479 104

Pen digits 70 23 137 135 121

Zoo 183 116 242 23 4

WDBC 2,305 1,834 5,182 270 73

Table 8: The mining time (in seconds) for the different pattern mining methods (see Section 3.6.4.1) on the UCI datasets.

KP, pen digits and WDBC datasets, MPP-lossy misses on average 1.6%, 2.1%, 1.5%, 2.1% and 0.9% (respectively) of the total number of MPPs. This small loss in completeness can be often tolerated in exchange for the large gain in efficiency that is obtained by using the lossy pruning.

of the different methods using different minimum support thresholds. Figure19shows the execution time (on logarithmic scale) of FPM, MPP-naïve, MPP-lossless and MPP-lossy on the lymphography dataset and on the hepatitis dataset. We did not include MBST because it is very inefficient for low minimum support thresholds.

(a) (b)

Figure 19: The mining time (on logarithmic scale) for the different pattern mining methods using different support thresholds on the lymphography dataset (a) and the hepatitis dataset (b).

We can see that the execution time of FPM and MPP-naïve exponentially blows up when the minimum support decreases. On the other hand, MPP-lossless controls the complexity and the execution time increases much slower when the minimum support decreases. For example, by setting the minimum support to 5% for hepatitis, MPP-lossless becomes more than 15 times faster than FPM. Finally, MPP-lossy is very fast and it scales up much better than the other methods when the minimum support is low.

3.7 SUMMARY

In this chapter, we studied pattern mining in the supervised setting and presented the minimal predictive patterns (MPP) framework. Our framework relies on Bayesian inference to evaluate the predictiveness of patterns. It also considers the structure of the patterns to ensure that each pattern in the result offers a significant predictive advantage over all of its generalizations. We presented an efficient algorithm for mining the MPP set. In contrast to the widely used two-phase approach, our algorithm integrates pattern evaluation with frequent pattern mining and applies several pruning techniques to improve the efficiency.

Our experimental evaluation on several synthetic and real-world data illustrates the following benefits of our work:

1. The MPP framework is able to explain and cover the data with fewer rules than existing rule mining methods, which facilitates the process of knowledge discovery.

2. Pattern-based classification using MPPs outperforms many well known classifiers. 3. Mining MPPs is more efficient than mining all frequent patterns.

In document Mining Predictive Patterns and Extension to Multivariate Temporal Data (Page 78-84)