Effective Tree Structures for Mining High Utility Itemsets from Transaction Databases

(1)

Mrs.S.Subhasri, IJRIT 18 International Journal of Research in Information Technology (IJRIT) International Journal of Research in Information Technology (IJRIT) International Journal of Research in Information Technology (IJRIT) International Journal of Research in Information Technology (IJRIT)

www.ijrit.comwww.ijrit.comwww.ijrit.comwww.ijrit.com

ISSN 2001-5569

Effective Tree Structures for Mining High Utility Itemsets from Transaction

Databases

Mrs.S.Subhasri¹, Mr.B.Jai Kumar²

1PG Scholar, CSE Department, Computer Science and Engineering, Anna University, Chennai ²Assistant professor, CSE Department.

Vivekanandha college of technology for women, Elayampalayam, Thiruchengode.

Tamilnadu, India.

1[email protected]

2[email protected]

Abstract - In Data Mining, Extraction of high utility item sets from transaction database with high utility like profit, quantity, cost. High utility item sets are useful decision making process in most marketing and various places. It is also useful for customer behavior prediction. Mining high utility item sets from databases is very difficult task. Because downward closure property not used in high utility item sets mining. More Candidate item sets generation for mining high utility item sets degrades the mining performance in terms of memory space and execution time. In other words, pruning search space for high utility item set mining is difficult because a superset of a low-utility item set may be a high utility itemset. . Larger search space increases the candidate generation and computation time, especially when database contains lots of transactions. Propose two novel algorithms as well as a compact data structure for efficiently discovering high utility item sets from transactional databases. High utility item sets and maintaining important information related to utility patterns within databases are used. High-utility item sets can be generated from UP-Tree efficiently with only two scans of original databases. The concept of weighted downward closure property is used.

By using transaction weight, weighted supports not only reflect the importance of an item set but also maintain the downward closure property during the mining process.

Index Terms: Data mining, Utility mining, frequent Itemset mining, Candidate pruning.

I.INTRODUCTION

Data mining is the process of extracting the nontrivial, previously unknown data and potentially useful information from larger databases. Discovering useful patterns hidden in a database plays an essential role in several data mining tasks, such as frequent pattern mining, weighted frequent pattern mining, and high utility pattern mining. Frequent pattern mining is traditional data mining concept, this applicable for various databases like transaction databases, streaming databases and so on. Frequent pattern mining only consider presence of an items in transaction database, not consider the importance of an items. Weighted association rule mining not considers quantity of items. With this concept, even if some items appear infrequently, they might still be found if they have high weights. An important topic in data mining field is utility mining. Mining high utility item sets from databases

(2)

Mrs.S.Subhasri, IJRIT 19

refers to finding the item sets with high profits. An item set is called a high utility item set if its utility is no less than a user-specified minimum utility threshold; otherwise, it is called a low-utility item set. Mining high utility item sets from databases is an important task has a wide range of applications such as website click stream analysis, business promotion in chain hypermarkets, cross marketing in retail stores, online e-commerce management, mobile commerce environment planning, and even finding important patterns in biomedical applications. Mining high utility item sets from databases is not an easy task since downward closure property in frequent item set mining does not hold. In other words, pruning search space for high utility item set mining is difficult because a superset of a low-utility item set may be a high utility item set. A method to address this problem is to enumerate all item sets from databases by the principle of exhaustion. Obviously, this method suffers from the problems of a large search space, especially when databases contain lots of long transactions or a low minimum utility threshold is set. The huge number of PHUIs forms a challenging problem to the mining performance since the more PHUIs the algorithm generates, the higher processing time it consumes. We propose two novel algorithms as well as a compact data structure for efficiently discovering high utility item sets from transactional databases. The requirements of users who are interested in discovering the item sets with high sales profits, since the profits are composed of unit profits, i.e., weights, and purchased quantities.

A. Utility Mining

Mining the high utility itemsets from databases refers finding the itemsets with high profits. Here utility means most usefulness, importance, interestingness i.e. mostly used by users

B. Pruning

Cutting away the unwanted branches i.e. avoid the unwanted candidate itemsets or low utility itemsets from database. Low utility itemsets means utility value less than the user specified minimum utility threshold.

C. Utility Types

Utility of items in a database is of two types are following:

1. Internal Utility: It considers the presence of an items in database i.e. quantity of an items in database.

2. External Utility: It considers the importance of items in database i.e. profit of an item to user.

Utility of an itemsets is defined as the product of its internal utility and external utility.

II.RELATED WORK

In the past years, several algorithms are described for mining the high utility itemsets from database. Among them, apriori algorithm [1] for mining frequent itemsets from database. But it generates more candidate itemsets. So, mining performance degraded. To overcome this, frequent pattern growth algorithm [2] without candidate generation proposed. But it also degrades mining performance and frequent itemsets only reflects the presence of an item not considers the importance of items in a database. To overcome problem of frequent itemsets utility mining plays an important role in data mining. Utility mining is the extraction of high utility itemsets from larger transaction database. Utility means most usefulness, importance to user point of view. High utility itemsets are having utility like high profit, high quantity, high cost. High utility itemsets also having utility not lesser than the user specified minimum utility threshold.

Liu et al proposed two phase algorithm[3] .It has two mining phases. In phase1, apriori method is used. In phase 2, mining high utility itemsets. But, it degrades mining performance in terms of candidate generation.Ahmed et al proposed IHUP (Incremental High Utility Pattern) algorithm. It is a tree based algorithm. In this paper, IHUP Tree maintains the information about items and utilities of items. It is better than previous algorithms. In the previously lots of algorithms have been proposed, all algorithms generate more candidate itemsets it leads to larger search space, more memory space, more execution time. So it degrades mining performance.

III.EXISTING SYSTEM

(3)

Mrs.S.Subhasri, IJRIT 20

Today’s database size increases to Giga bytes. But, existing algorithms are only applicable for small transaction databases. For example, in marketing places lots of daily transactions stored in databases, in that we mine mostly purchased items by user. That most purchased items increase the sales profits, based on that promote those useful items in market. In existing algorithms, Potential high utility item sets (PHUIs) are found first, and then an additional database scan is performed for identifying their utilities. Existing methods generate a large set of PHUIs and their mining performance is degraded.

Figure1. Existing System structure

This situation become difficult when databases contain many long transactions or low thresholds are set.

The huge number of PHUIs forms a challenging problem to the mining performance since the more PHUIs the algorithm generates, the higher processing time it consumes and more memory space, multiple scans for identify the high utility itemsets.

IV.PROPOSED SYSTEM

We propose two novel algorithms as well as a compact data structure for efficiently discovering high utility item sets from transactional databases.

Larger database

Scan the database

Frequent itemset mining

High utility itemset mining

More candidates

Time consumes

Degrades performance

Larger database

Scan the database

Enhanced UP-Growth

algorithm

Construct UP- Tree

Reduce candidates High utility itemsets mining

Improve performance

(4)

Mrs.S.Subhasri, IJRIT 21

Figure 2. Proposed System structure

High utility item sets and maintaining important information related to utility patterns within databases are used.

High-utility item sets can be generated from UP-Tree efficiently with only two scans of original databases. The concept of weighted downward closure property is used. By using transaction weight, weighted support not only reflects the importance of an item set but also maintains the downward closure property during the mining process.

A. Candidate Pruning

Unwanted candidate itemsets or low utility itemsets are cut down from tree based structure or from database.

Here low utility itemsets having utility less than minimum utility threshold.

B. Frequent Itemsets

Frequent itemsets mining is also called association rule mining. It is widely recognized that FP-Growth achieves a better performance than Apriori-based algorithms since it finds frequent itemsets without generating any candidate itemsets and scans database just twice.

C.High Utility Itemsets

An itemsets utility is defined as the product of its external utility and its internal utility. An itemsets is called a high utility itemsets if its utility is no less than a user-specified minimum utility threshold; otherwise, it is called a low-utility itemsets. Mining high utility itemsets from databases is an important task has a wide range of applications such as website click stream analysis.

D.Utility Mining

The performance of utility mining is compared to previous algorithm. In these algorithms, potential high utility itemsets (PHUIs) are found first, and then an additional database scan is performed for identifying their utilities.

However, existing methods often generate a large set of PHUIs and their mining performance is degraded consequently. We describe several strategies to decrease overestimated utility and enhance the performance of utility mining. Based on the algorithm, utility mining will works efficiently in terms of running time and space consumption, so performance will also increases.

VI.CONCLUSION

We have proposed two efficient algorithms named UP-Growth and UP-Growth+ for mining high utility itemsets from transaction databases. A data structure named UP-Tree was proposed for maintaining the information of high utility itemsets. PHUIs can be efficiently generated from UP-Tree with only two database scans. Moreover, we developed several strategies to decrease overestimated utility and enhance the performance of utility mining.

The proposed algorithms, especially UP-Growth+, well perform especially when databases contain lots of long transactions or a low minimum utility threshold is used. Reduce candidate generation, memory consumption, execution time and improve mining performance. Based on the algorithm, utility mining predict the customer purchasing behavior and decision making in various places.

REFERENCES

(5)

Mrs.S.Subhasri, IJRIT 22

[1] R. Agrawal and R. Srikant, “Fast Algorithms for Mining Association Rules,” Proc. 20th Int’l Conf. Very Large

Data Bases (VLDB), pp. 487-499, 1994.

[ 2] J. Han, J. Pei, and Y. Yin, “Mining Frequent Patterns without Candidate Generation,” Proc. ACM-SIGMOD Int’l Conf. Management of Data, pp. 1-12, 2000.

[3] Y. Liu, W. Liao, and A. Choudhary, “A Fast High Utility Itemsets Mining Algorithm,” Proc. Utility-Based Data Mining Workshop,2005.

[4] C.F. Ahmed, S.K. Tanbeer, B.-S. Jeong and Y.-K. Lee, “Efficient Tree Structures for High Utility Pattern Mining in Incremental Databases,” IEEE Trans. Knowledge and Data Eng., vol. 21, no. 12, pp. 1708-1721, Dec.

2009.

[5] C.H. Cai, A.W.C. Fu, C.H. Cheng, and W.W. Kwong, “Mining Association Rules with Weighted Items,” Proc.

Int’l Database Eng. and Applications Symp. (IDEAS ’98), pp. 68-77, 1998.

[6] A. Erwin, R.P. Gopalan, and N.R. Achuthan, “Efficient Mining of High Utility Itemsets from Large Data Sets,”

Proc. 12th Pacific-Asia Conf. Advances in Knowledge Discovery and Data Mining (PAKDD), pp. 554-561, 2008.

[7] V.S. Tseng, C.J. Chu, and T. Liang, “Efficient Mining of Temporal High Utility Itemsets from Data Streams,”

Proc. ACM KDD Workshop Utility-Based Data Mining Workshop (UBDM ’06), Aug.2006

[8] B.-E. Shie, V.S. Tseng, and P.S. Yu, “Online Mining of Temporal Maximal Utility Itemsets from Data Streams,”

Proc. 25th Ann. ACM Symp. Applied Computing, Mar. 2010.

[9] S.K. Tanbeer, C.F. Ahmed, B.-S. Jeong and Y.-K. Lee, “Efficient Frequent Pattern Mining over Data Streams,”

Proc. ACM 17^th Conf. Information and Knowledge Management, 2008.

[10] V.S. Tseng, C.-W. Wu, B.-E. Shie, and P.S. Yu, “UP Growth: An Efficient Algorithm for High Utility Itemsets Mining,” Proc. 16^th ACM SIGKDD Conf. Knowledge Discovery and Data Mining (KDD ’10), pp. 253-262, 2010.

[11] M.-S. Chen, J.-S. Park, and P.S. Yu, “Efficient Data Mining for Path Traversal Patterns,” IEEE Trans.

Knowledge and Data Eng., vol. 10, no. 2, pp. 209-221, Mar. 1998.

[12] C. Creighton and S. Hanash, “Mining Gene Expression Databases for Association Rules,” Bioinformatics, vol.

19, no. 1, pp. 79-86, 2003.

[13] B.-E. Shie, H.-F. Hsiao, V., S. Tseng, and P.S. Yu, “Mining High Utility Mobile Sequential Patterns in Mobile Commerce Environments”, Proc. 16th Int’l Conf. Database Systems for Advanced Applications (DASFAA

’11), vol. 6587/2011, pp. 224-238, 2011.

[14] J. Pei, J. Han, H. Lu, S. Nishio, S. Tang, and D. Yang, “H-Mine: Fast and Space-Preserving Frequent Pattern Mining in Large Databases,” IIE Trans. Inst. of Industrial Engineers, vol. 39, no. 6, pp. 593-605, June 2007.

[15] Vincent S. Tseng, Bai-En Shie, Cheng-Wei Wu, and Philip S. Yu, “Efficient Algorithms for Mining High Utility Itemsets from Transactional Databases” IEEE Trans. Knowledge and Data Eng., vol. 21, no. 12, Aug.

2013