Journal of Environmental Science, Computer Science and Engineering & Technology


JECET; September 2013 – November 2013; Vol. 2, No. 4, 1230-2235.


An International Peer Review E-3 Journal of Sciences and Technology

Available online at www.jecet.org

Computer Science

Research Article

Mining Frequent Patterns Based on Agglomerative Clustering and Apriori Algorithm

Suneetha Merugula¹ and Rachitha Sony Krotha²

GMR Institute of Technology, Rajam, Andhra Pradesh, India

Received: 5 October 2013; Revised: 25 October 2013; Accepted: 29 October 2013

Abstract: Decision making is considered one of the most difficult tasks in association rule mining. Traditional data mining methods of market basket analysis can be used to analyze transactional data, that is, to determine which products are purchased together and how frequently. However, these methods are time-consuming because they must scan the entire database many times. This paper proposes a method that groups the data using the Agglomerative clustering algorithm. These clusters are in turn given as input to the Apriori association rule mining algorithm and the Most Frequent Pattern (MFP) mining algorithm to generate frequent patterns.

Keywords: Frequent pattern mining, Apriori, Agglomerative clustering.

INTRODUCTION

Sales data typically contain information such as the date, the items in each transaction, their quantities and their prices. The data mining process is intended to turn these sales data into useful information: it is designed to identify relationships, patterns and trends that may be present in the data. Traditional data mining methods of market basket analysis can be used to analyze transactional data. They help determine which products are purchased together and how often, and they also help examine customers' purchasing preferences. However, these methods are time-consuming because they must scan the entire database many times, so an efficient method for generating frequent patterns is needed. Other data mining techniques, such as clustering and association rule mining, are better suited to finding more meaningful patterns for future predictions.


Clustering is used to generate groups of related patterns, while association rule mining provides a way to derive generalized rules describing dependencies between variables^{3, 4}.

To deal with this issue, this paper aims to group the food items according to their selling frequency. Applying association rule mining to these clusters then yields efficient patterns of single items or of multiple items. We use a method that combines the Agglomerative clustering algorithm with the Apriori association rule mining algorithm and the Most Frequent Pattern (MFP) mining algorithm to generate frequent patterns. These frequent patterns can assist managers in formulating marketing strategies and maximizing profit. The paper "Frequent Patterns Mining of Stock Data using Hybrid Clustering Association Algorithm"^{1} describes dividing sales data into groups based on their sold quantities using the K-Means clustering algorithm and then generating frequent patterns with the Most Frequent Pattern mining algorithm. This K-Means-based approach has some limitations^{5, 6}.

Disadvantages:

1. The K-Means algorithm is sensitive to the selection of the initial cluster centres.

2. It is sensitive to outliers.

3. The Most Frequent Pattern algorithm gives frequent patterns of single items only. In business, however, it is often necessary to analyze which particular combinations of items are sold together, and this type of analysis is not possible with the Most Frequent Pattern mining algorithm.
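Limitation 1 can be illustrated with a minimal one-dimensional K-Means (Lloyd's algorithm) sketch. The function name `kmeans_1d` and the six sold quantities are hypothetical and are not taken from the paper's data set. The sketch only shows that the same data, started from two different initial centres, can converge to two different partitions.

```python
def kmeans_1d(points, centres, max_iter=100):
    """Lloyd's K-Means on 1-D values, started from the given initial centres."""
    centres = list(centres)
    clusters = []
    for _ in range(max_iter):
        # Assignment step: each value joins its nearest centre (ties go to the lower index).
        clusters = [[] for _ in centres]
        for p in points:
            nearest = min(range(len(centres)), key=lambda i: abs(p - centres[i]))
            clusters[nearest].append(p)
        # Update step: move each centre to the mean of its cluster (an empty cluster keeps its centre).
        new_centres = [sum(c) / len(c) if c else centres[i] for i, c in enumerate(clusters)]
        if new_centres == centres:
            break  # converged: no centre moved
        centres = new_centres
    return clusters


quantities = [1, 2, 10, 11, 20, 21]  # hypothetical sold quantities
print(kmeans_1d(quantities, [1, 20]))  # [[1, 2, 10], [11, 20, 21]]
print(kmeans_1d(quantities, [1, 2]))   # [[1, 2], [10, 11, 20, 21]]
```

Both runs converge, but to different groupings of the same quantities. The final result therefore depends on where the centres start.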

Proposed work: In this paper, clustering and association are combined to overcome the limitations of the K-Means-based approach:

1. Clustering is done using the Agglomerative clustering algorithm.

2. Frequent patterns of single items are found using the Most Frequent Pattern mining algorithm.

3. Frequent patterns of combinations of items are found using the Apriori association rule mining algorithm.

Advantages:

1. Uses the large itemset property.

2. Easily parallelized.

3. Easy to implement.
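Advantage 2 rests on a simple fact: support counts are additive over disjoint partitions of the transactions. Each partition can therefore be counted independently, and the partial counts can then be summed. This is the idea behind count-distribution style parallel Apriori.

The sketch below is a simplified illustration under that assumption. The helper names and transactions are hypothetical. `ThreadPoolExecutor` only shows the structure; for real speed-up on CPU-bound counting in CPython, a process pool or separate machines would be needed.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def count_partition(partition, k):
    """Count every k-itemset that occurs in one partition of the transactions."""
    counts = Counter()
    for row in partition:
        counts.update(combinations(sorted(set(row)), k))
    return counts


def parallel_support(transactions, k, n_parts=2):
    """Count k-itemset supports partition by partition, then merge the partial counts."""
    parts = [transactions[i::n_parts] for i in range(n_parts)]  # disjoint, round-robin split
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        partial = list(pool.map(count_partition, parts, [k] * n_parts))
    total = Counter()
    for counts in partial:
        total.update(counts)  # support counts add up across partitions
    return total


tx = [["bread", "milk"], ["bread", "butter", "milk"], ["milk"], ["bread", "butter"]]
print(parallel_support(tx, 2)[("bread", "milk")])  # 2
```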

METHODOLOGY

The proposed method consists of three phases:

1. Clustering the given data set

2. Applying the MFP algorithm on each cluster

3. Applying Apriori on each cluster

Clustering the given data set: In this phase, the given data set is clustered using the agglomerative (i.e., hierarchical) clustering algorithm, with the number of items in each transaction as the clustering criterion. The result is two clusters: the small transactions cluster and the big transactions cluster.

Applying MFP Algorithm on Each Cluster: In this phase, the Most Frequent Pattern mining algorithm is applied to each cluster (the small transactions cluster and the big transactions cluster) to obtain the most frequent items along with their counts.


Applying APRIORI on Each Cluster: In this phase, the Apriori association rule mining algorithm is applied to each cluster (the small transactions cluster and the big transactions cluster). The Apriori algorithm is based on the Apriori property, “every subset of a frequent itemset must also be frequent”. Candidate itemsets are generated using this property. This yields the frequent itemsets, including combinations of items, together with their counts.
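The Apriori property can be sketched as the usual join-and-prune candidate generation step. Two frequent k-itemsets are joined into a (k+1)-itemset, and the result is discarded if any of its k-subsets is not frequent. This is a minimal illustration under that reading; the function name `apriori_gen` and the items A to D are hypothetical.

```python
from itertools import combinations


def apriori_gen(frequent_k):
    """Join frequent k-itemsets into (k+1)-candidates and prune them with the Apriori property."""
    frequent_k = {frozenset(s) for s in frequent_k}
    if not frequent_k:
        return set()
    k = len(next(iter(frequent_k)))
    candidates = set()
    for a, b in combinations(frequent_k, 2):
        union = a | b
        # Join step: only pairs sharing k-1 items give a (k+1)-itemset.
        if len(union) != k + 1:
            continue
        # Prune step: every k-subset of a frequent itemset must itself be frequent.
        if all(frozenset(sub) in frequent_k for sub in combinations(union, k)):
            candidates.add(union)
    return candidates


frequent_2 = [{"A", "B"}, {"A", "C"}, {"B", "C"}, {"B", "D"}]
print([sorted(c) for c in apriori_gen(frequent_2)])  # [['A', 'B', 'C']]
```

Here {A, B, D} and {B, C, D} are pruned because {A, D} and {C, D} are not frequent. Their support never needs to be counted against the database.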

ALGORITHM

Step 1: Select the database and give it as input to the agglomerative clustering algorithm.

Step 1.1: Initially, each row (transaction) is a cluster.

Step 1.2: Calculate the number of items in each transaction.

Step 1.3: Based on this count, insert each row into the cluster corresponding to its count.

Step 1.4: Merge the clusters whose count is below the threshold into one cluster (the small transactions cluster) and the remaining clusters into the other (the big transactions cluster).
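Steps 1.1 to 1.4 can be sketched as follows. This is a minimal reading of the count-based grouping described above, not a general hierarchical clustering with a distance-based linkage. The function name `cluster_by_size`, the `threshold` value and the transactions are hypothetical. "Below the threshold" is taken to mean an item count strictly less than `threshold`.

```python
from collections import defaultdict


def cluster_by_size(transactions, threshold):
    """Split transactions into (small, big) clusters by the number of items in each row."""
    # Steps 1.1-1.3: rows with the same item count end up in the same per-count cluster.
    by_count = defaultdict(list)
    for row in transactions:
        by_count[len(row)].append(row)
    # Step 1.4: merge per-count clusters below the threshold into the small cluster,
    # and the remaining ones into the big cluster.
    small, big = [], []
    for count in sorted(by_count):
        (small if count < threshold else big).extend(by_count[count])
    return small, big


tx = [["milk"], ["bread", "butter", "milk"], ["bread", "milk"], ["eggs", "bread", "butter", "jam"]]
small, big = cluster_by_size(tx, threshold=3)
print(small)  # [['milk'], ['bread', 'milk']]
print(big)    # [['bread', 'butter', 'milk'], ['eggs', 'bread', 'butter', 'jam']]
```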

Step 2: Input each cluster obtained from agglomerative clustering to the Most Frequent Pattern algorithm.

Step 2.1: Count the number of occurrences of each item using the Most Frequent Pattern algorithm.
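Step 2.1 amounts to counting item occurrences within one cluster. A minimal sketch using `collections.Counter` follows. The function name `most_frequent_items`, the optional `min_count` cut-off and the sample cluster are hypothetical. Ties are broken alphabetically only to make the output deterministic.

```python
from collections import Counter


def most_frequent_items(cluster, min_count=1):
    """Return (item, count) pairs for one cluster, most frequent first."""
    counts = Counter(item for row in cluster for item in row)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))  # ties: alphabetical
    return [(item, n) for item, n in ranked if n >= min_count]


cluster = [["bread", "milk"], ["milk"], ["bread", "butter", "milk"]]
print(most_frequent_items(cluster))               # [('milk', 3), ('bread', 2), ('butter', 1)]
print(most_frequent_items(cluster, min_count=2))  # [('milk', 3), ('bread', 2)]
```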

Step 3: Input each cluster obtained from agglomerative clustering to the Apriori algorithm.

Step 3.1: First, generate all the unique items.

Step 3.2: Generate Cartesian products (combinations) of the surviving itemsets and check whether each candidate set satisfies the minimum support threshold. Only the candidates that satisfy it go on to the next step.

Step 3.3: Repeat Step 3.2 until all the Cartesian products have been obtained and checked against the minimum support.
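Steps 3.1 to 3.3 can be sketched as a level-wise loop. This is a minimal illustration, not the authors' implementation. It makes three assumptions:

- The "Cartesian products" of Step 3.2 are read as pairwise unions of surviving itemsets that are exactly one item larger.
- The standard Apriori subset pruning is added.
- `min_support` is an absolute transaction count.

The function name and the sample transactions are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Level-wise Apriori: return {frozenset(itemset): support_count} for all frequent itemsets."""
    rows = [frozenset(t) for t in transactions]

    def support(itemset):
        # Number of transactions that contain every item of the itemset.
        return sum(1 for r in rows if itemset <= r)

    # Step 3.1: all unique items form the 1-item candidates.
    candidates = {frozenset([item]) for r in rows for item in r}
    frequent = {}
    k = 1
    while candidates:
        # Step 3.2: keep only the candidates that meet the minimum support.
        level = {c: support(c) for c in candidates}
        level = {c: n for c, n in level.items() if n >= min_support}
        frequent.update(level)
        # Combine surviving k-itemsets into (k+1)-itemsets, dropping any candidate
        # that has an infrequent k-subset (Apriori property).
        candidates = set()
        for a, b in combinations(list(level), 2):
            union = a | b
            if len(union) == k + 1 and all(frozenset(s) in level for s in combinations(union, k)):
                candidates.add(union)
        k += 1  # Step 3.3: repeat until no new candidates remain.
    return frequent


tx = [["bread", "milk"], ["bread", "butter", "milk"], ["milk"],
      ["bread", "butter"], ["bread", "butter", "milk"]]
result = apriori(tx, min_support=2)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)  # the last line printed is ['bread', 'butter', 'milk'] 2
```

In practice, the same function would be called once on the small transactions cluster and once on the big transactions cluster.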

RESULTS

Fig. 1: MFP for the big transactions cluster


Fig. 2: MFP for the small transactions cluster

Fig. 3: Apriori for the small transactions cluster


Fig. 4: Apriori for the big transactions cluster

CONCLUSION

Decision making in the business sector is considered one of the critical tasks. A hybrid clustering and association mining approach is used to group data and find a compact form of associated sales patterns. The experimental results indicate that the proposed approach is efficient for mining patterns from large data sets and for predicting the factors affecting the sale of products. The Agglomerative clustering algorithm gives better results than the K-Means clustering algorithm. MFP gives frequent patterns of single items, and Apriori gives frequent patterns of combinations of items. We therefore conclude that applying Agglomerative clustering together with Apriori and MFP yields useful patterns of multiple items or single items. These frequent patterns may assist managers in formulating marketing strategies and maximizing profit.

REFERENCES

1. Aurangzeb Khan, Khairullah Khan and Baharum B. Baharudin, “Frequent patterns mining of stock data using hybrid clustering association algorithm”, Proceedings of the International Conference on Information Management and Engineering (ICIME), Malaysia, 2009, 1, 667-671.

2. S. Kotsiantis and Kanellopoulos, “Association Rules Mining: A Recent Overview”, GESTS International Transactions on Computer Science and Engineering, 2006, 32, 1, 71-82.

3. R. Agrawal, T. Imielinski and A. Swami, “Mining Association Rules between Sets of Items in Large Databases”, Proceedings of the ACM SIGMOD Conference, Washington DC, USA, May 1993.

4. V. Umarani and M. Punithavalli, “A study on effective mining of association rules from huge database”, International Journal of Computer Science and Research, 2010, 1, 1, 30-34.


5. K. Shyamala and S. P. Rajagopalan, “Mining Essential and Interesting Rules for Efficient Prediction”, Asian Journal of Information Technology (AJIT), 2009, 6, 11, 1192-1195.

6. Madhav N. Segal and Ralph W. Giacobbe, “Market Segmentation and Competitive Analysis for Supermarket Retailing”, International Journal of Retail and Distribution Management, 1994, 22, 1, 38-48.

*Corresponding Author: Suneetha Merugula,

GMR Institute of Technology, Rajam, Andhra Pradesh, India
