Reducing the number of database scans is one of the major concerns in designing algorithms for mining frequent itemsets and **association** **rules**. The partition **algorithm** [16] reduces the number of database scans to two by dividing the database into partitions small enough for each to fit in main memory. The AS–CPA **algorithm** [18] also applies a partitioning technique and needs at most two scans of the database. The DLG **algorithm** [17] uses the TIDs to convert itemsets into memory-resident bit vectors, and then computes frequent itemsets by logical AND operations on those bit vectors. The Dynamic Itemset Counting (DIC) **algorithm** [19] counts itemsets of different cardinality simultaneously during the database scans. Pincer-Search [20], All MFS [21] and MaxMiner [22] are algorithms for finding maximal frequent itemsets.
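
The bit-vector idea attributed to DLG above can be illustrated with a minimal sketch. This is not the DLG algorithm itself: it only shows how a support count can be obtained with logical AND once each item has a bit vector indexed by TID. The function names and the toy transactions are assumptions made for illustration.

```python
def build_bit_vectors(transactions):
    """Map each item to an int whose bit i is set when transaction i contains it."""
    vectors = {}
    for tid, items in enumerate(transactions):
        for item in items:
            vectors[item] = vectors.get(item, 0) | (1 << tid)
    return vectors


def support_count(itemset, vectors):
    """AND the per-item bit vectors and count the bits that survive."""
    items = list(itemset)
    acc = vectors[items[0]]
    for item in items[1:]:
        acc &= vectors[item]
    return bin(acc).count("1")


# Hypothetical toy database; the position in the list acts as the TID.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
vectors = build_bit_vectors(transactions)
pair_support = support_count(("bread", "milk"), vectors)
```

After the vectors are built in one pass, every further support count is an in-memory operation and needs no additional database scan.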


Data mining is the process of analyzing data from different perspectives and summarizing it into useful information: information that can be used to increase revenue, cut costs, or both [10]. More precisely, data mining is the process of finding correlations or patterns among dozens of fields in large relational databases. One of the most important data mining applications is mining **association** **rules**. Data mining has both virtues and vices and is used in many fields. Examples include (1) banking, to identify patterns that help decide the outcome of a customer's loan application; (2) satellite research, to identify potential undetected natural resources or disaster situations; (3) medicine, to protect patients from infectious diseases; and (4) market strategy, to predict profit and loss in purchasing. Data mining functions include clustering, classification, prediction, and link **analysis** (associations). One of the popular techniques used in knowledge discovery in databases for pattern **discovery** is the **association** rule, which implies certain **association** relationships among a set of objects. A well-known **algorithm** for **association** rule induction is the **Apriori** **algorithm**, one of the popular data mining techniques used to extract associations between item sets in large amounts of data. Many algorithms fall under **association** rule mining, but the **Apriori** **algorithm** is one of the most typical. The **rules** produced by the **Apriori** **algorithm** make it easier for the user to understand and apply the results. It was introduced by Agrawal and Srikant in 1994 and is a strong **algorithm** for finding associations between itemsets. A basic property of the **Apriori** **algorithm** is that "every subset of a frequent item set is also a frequent item set, and every superset of a non-frequent item set is not a frequent item set". This property is used in the **Apriori** **algorithm** to discover all the frequent item sets. The **Apriori** **algorithm** steps are described in more detail later in the paper.
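
The subset property quoted above is what allows candidates to be pruned before they are counted. The following is a minimal, unoptimized sketch of level-wise Apriori under the assumption that `min_support` is an absolute transaction count; the function name and the toy data are illustrative and not taken from the source.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset: count} for every itemset with count >= min_support."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        prev = set(current)
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: a candidate with any infrequent (k-1)-subset cannot be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        # Count step: one pass over the transactions per level.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: n for s, n in counts.items() if n >= min_support}
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical toy database.
data = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
result = apriori(data, min_support=3)
```

In this toy run every pair is frequent, so the triple `{a, b, c}` survives pruning and is counted, but it appears in only two transactions and is therefore dropped.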

Data mining has provided many opportunities to mine customer purchasing patterns and uncover hidden knowledge from data. Researchers have explored rule extraction using **association** **analysis**. Oladipupo and Oyelade [6] identified students' failure patterns using **association** mining. Their research considered 30 courses for 100-level and 200-level students to discover patterns of failed courses. The discovered patterns can assist academic planners in making constructive recommendations and in structuring and modifying the curriculum in order to improve students' performance. Agarwal et al. [7] explored the application of the **Apriori** **algorithm** in a grocery store. Tissera et al. [8] presented a real-world experiment conducted in an ICT educational institute in Sri Lanka for analyzing students' performance. They applied a series of data mining tasks to find relationships between subjects in the undergraduate syllabi. They used **association** **rules** to identify possibly related combinations of two subjects in the syllabi, and applied the correlation coefficient to determine the strength of the relationships of the subject combinations identified by the **association** **rules**. Researchers have also carried out surveys of data mining algorithms in market basket **analysis** [9, 10]. The use of **association**-based classification for relational data in a web environment was presented in [11]. The author's intention is to put forward an alteration of the fundamental **association**-based classification technique that can be helpful in gathering data from Web pages. Sumithra and Paul [12] presented distributed **Apriori** **association** rule mining and classical **Apriori** mining algorithms for grid-based knowledge **discovery**. Qiang et al. [13] proposed an **association** classification method based on the compactness of **rules**. The proposed approach suffers from overfitting because classification **rules** that merely satisfy the minimum support and minimum confidence are returned to the classifier as strong **association** **rules**.

Aggarwal [17] proposed criteria that emphasize the actual correlation of items with one another, together with an **algorithm** that has good computational efficiency while maintaining statistical robustness. The author also designed methods for **association** **rules** in data sets of varying density, and even for negative **association** **rules**. Temporal **association** **rules** are studied in [18], which takes an innovative approach that considers time constraints and evaluates the performance of the proposed methods. Dunkel and Soparkar [19] considered three important aspects: the representation, organization and access of the data. When input and output costs are considered, data access may significantly affect performance. The authors also compared column-wise and row-wise data access for the **Apriori** **association** **algorithm** and found that counting in the **Apriori** **algorithm** with column-wise access is better because it reduces the degree to which the data and counters need to be repeatedly brought into memory. Zhu et al. [20] developed a methodology based on the principle of maximum entropy for studying the privacy consequences of data mining results. Aggarwal and Yu [21] applied a graph-theoretic search **algorithm**, which is proportional to the size of the output, to the stored pre-processed data in such a way that online

In everyday life, information is collected almost everywhere. For example, at supermarket checkouts, information about customer purchases is recorded. When payback or discount cards are used, information about customer purchasing behavior and personal details can be linked. Evaluating this information can help retailers devise more efficient and tailored marketing strategies. Most established organizations have accumulated masses of information from their customers over decades. With e-commerce applications growing quickly, organizations will accumulate a vast quantity of data in months rather than years. Data mining, also called Knowledge **Discovery** in Databases, aims to determine the trends, patterns, correlations and anomalies in these databases that can help in making accurate future decisions. Manual **analysis** of the huge amounts of information stored in modern databases is very difficult. Data mining provides tools to reveal unknown information in large databases that are already stored. A well-known data mining technique is **Association** Rule Mining. It can discover the interesting relationships, called associations, in a database. **Association** **rules** are very efficient at revealing interesting relationships in a relatively large database with a huge amount of data. The large quantity of information collected through the set of **association** **rules** can be used not only for illustrating the relationships in the database, but also for differentiating between different kinds of classes in a database. **Association** rule mining identifies remarkable associations or relationships within a large set of data items. With a huge quantity of data constantly being obtained and stored in databases, several industries are becoming interested in mining **association** **rules** from their databases.

Mining **association** **rules** is one of several data mining tasks and has a large share of data mining research, which is attributed to its wide area of applications [4]. It aims to extract interesting correlations, frequent patterns, associations or causal structures among sets of items in transactional databases or other data repositories. Applications of **association** rule mining span a wide area of business, from market basket **analysis** to **analysis** of promotions and catalogue design, and from store layout design to customer segmentation based on buying patterns. Other applications include health insurance, fraud **discovery** and loss-leader **analysis**, telecommunication networks, market and risk management, and inventory control. Various **association** mining techniques and algorithms are briefly introduced and compared later. **Association** rule mining faces the same challenges as data mining in general.


Abstract: Mining is a process that provides useful information on surfing and access pattern information based on capturing the behaviour of the user. Semantic knowledge helps to understand how the users will interact with the system. In this paper, we propose a Boolean based **APriori** Pattern (APP) **algorithm** to discover pattern based on human interaction using behavioural **analysis**. In the process of data mining, we have used a Boolean expression that helps to determine the pattern **discovery** based on the use of frequent pattern by applying **association** **rules**. The behavioural **analysis** is proposed based on the classification of ideas based on comments concerning positive opinion /contrary opinion during human interaction in the practical scenarios. The behavioural **analysis** is represented as a tree hierarchy where tree based mining is performed by the tree construction and interaction of flow patterns i.e., frequent patterns. The study shows that the successful pattern can be extracted based on the behavioural **analysis** of human interaction such as frequent pattern, flow interaction and relationships between the interactions.

Several works have proposed to eliminate redundancy from a complete set of frequent items, which include frequent closed itemsets[10] and generators[11]. Frequent closed itemsets are the maximal itemsets among the itemsets appearing in the same set of transactions, and generators are the minimal itemsets. Several studies[12,13] show that the frequent closed itemsets are condensed representation of complete set of frequent itemsets. The number of frequent closed itemsets is much smaller than total number of frequent itemsets. However, frequent closed itemsets still contain some amount of redundancy. Generators are minimal itemsets among itemsets appearing in the same set of transactions. Generators are non-redundant and concise in nature. The set of generators in a dataset uniquely represents the patterns in the dataset. Though the number of generators obtained can be little more than the number of frequent closed itemsets, the length of a classification rule based on generators is on average smaller than the length of a classification rule based on frequent closed itemsets. Generators are a concise, loss-less and non-redundant representation of all frequent itemsets for the purpose of generating **association** **rules** [13]. In this paper, we propose to develop Associative Classifier from generators and prove that the set of class **association** **rules** thus generated is much more effective compared to the set of class **association** **rules** generated from all frequent itemsets. To the best of our knowledge, no previous work has considered using Associative Classifier based on frequent generators.
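
The definitions above (closed itemsets as the maximal members, and generators as the minimal members, among itemsets appearing in the same set of transactions) can be made concrete with a brute-force sketch. It enumerates every non-empty itemset, so it is only suitable for tiny examples and is not the mining procedure of the cited works; the function name, the absolute `min_support` threshold and the toy data are assumptions.

```python
from itertools import combinations


def closed_and_generators(transactions, min_support):
    """Brute force: group frequent itemsets by their exact set of supporting TIDs."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted(set().union(*transactions))

    classes = {}  # frozenset of TIDs -> itemsets occurring in exactly those TIDs
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            itemset = frozenset(combo)
            tids = frozenset(
                i for i, t in enumerate(transactions) if itemset <= t
            )
            if len(tids) >= min_support:
                classes.setdefault(tids, []).append(itemset)

    closed, generators = set(), set()
    for members in classes.values():
        for m in members:
            # Closed: no proper superset shares the same transactions.
            if not any(m < other for other in members):
                closed.add(m)
            # Generator: no proper subset shares the same transactions.
            if not any(other < m for other in members):
                generators.add(m)
    return closed, generators


# Hypothetical toy database.
data = [{"a", "b", "c"}, {"a", "b"}, {"a", "b", "c"}, {"c"}]
closed, generators = closed_and_generators(data, min_support=2)
```

On this toy data there are three closed itemsets (`{a, b}`, `{c}`, `{a, b, c}`) and five generators (`{a}`, `{b}`, `{c}`, `{a, c}`, `{b, c}`), which matches the remark that generators can outnumber closed itemsets while being shorter.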

iv) Classification based on **Association** **Rules** Generated in a Bidirectional Approach (CARGBA): CARGBA [16] is essentially a bidirectional rule generation approach that generates not only general but also specific **association** **rules**. General **rules** are produced by the **apriori** approach, while specific **rules** are generated by considering longer rules, which capture more specific detail but have lower support. The classifier is then built by constructing a final rule set consisting of essential **rules** drawn from both kinds of **rules**, taking the confidence, support and length of each rule into consideration. A measure such as the correlation coefficient is used for pruning these **rules**. When a new tuple is to be classified, the classifier uses the first rule in the final rule set that covers the new tuple.

In this paper we envision a distributed clustering **algorithm** that is scalable and provides cooperation while preserving a high degree of independence for each site. Clustering is a discipline aimed at revealing groups, or clusters, of similar entities in data. As clustering is an essential technique for data mining, distributed clustering algorithms were developed as part of distributed data mining research. Clustering, or unsupervised learning, is the task of grouping together related data objects. Unlike supervised learning, there is no predefined set of discrete classes to assign the objects to. Instead, new classes, in this case called clusters, have to be found. There are many possible definitions of what a cluster is, but most of them are based on two properties: objects in the same cluster should be related to each other, while objects in different clusters should be different. Clustering is a very natural way for humans to discover new patterns and has been studied since ancient times. Regarded as unsupervised learning, clustering is one of the basic tasks of machine learning, but it is also used in a whole range of other domains. The focus of this survey is the application of clustering algorithms in data mining.


In 2010, Jiabin Deng et al. [3] proposed the use of Power-law Distributions and Improved Cubic Spline Interpolation for multi-perspective **analysis** of shareware download frequency. The tasks include mining the usage patterns and building a mathematical model. Through **analysis** and checks, and in accordance with changes in usage requirements, the proposed methods intelligently adjust the data redundancy of cloud storage. Thus, storage resources are fine-tuned and storage efficiency is greatly enhanced.

One of the data mining techniques commonly used in web mining is **Association** **Rules**. In brief, an **association** rule is an expression X => Y, where X and Y are sets of items. The meaning of such **rules** is quite intuitive: given a database D of transactions, where each transaction T ∈ D is a set of items, X => Y expresses that whenever a transaction T contains X, then T probably contains Y also. The probability, or rule confidence, is defined as the percentage of transactions containing Y in addition to X relative to the overall number of transactions containing X. According to [9], the task of **Association** Rule mining has received a great deal of attention.
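
The confidence definition above translates directly into code. The sketch below is illustrative only: the helper names and the page-visit sessions, chosen to fit the web mining setting, are assumptions and do not come from the source.

```python
def support(itemset, transactions):
    """Fraction of all transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)


def confidence(x, y, transactions):
    """Among transactions containing X, the fraction that also contain Y."""
    x, y = set(x), set(y)
    containing_x = [t for t in transactions if x <= set(t)]
    if not containing_x:
        return 0.0  # the rule is never triggered, so report zero confidence
    return sum(1 for t in containing_x if y <= set(t)) / len(containing_x)


# Hypothetical web sessions: each transaction is the set of pages visited.
sessions = [
    {"/home", "/products", "/cart"},
    {"/home", "/products"},
    {"/home", "/about"},
    {"/products", "/cart"},
]
rule_confidence = confidence({"/products"}, {"/cart"}, sessions)
rule_support = support({"/products", "/cart"}, sessions)
```

Here three sessions contain `/products` and two of those also contain `/cart`, so the rule `/products => /cart` has confidence 2/3, while the two pages appear together in half of all sessions.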

Badri Patel et al. [6] describe the **Apriori** **algorithm** and **association** rule mining, and improve the **algorithm** by using the Ant Colony Optimization (ACO) **algorithm**. ACO was introduced by Dorigo and has developed substantially in the last few years. Many organizations and public-sector bodies have collected huge amounts of data and store these datasets in database systems. Analyzing such an information system raises two major problems. The first is to reduce redundant objects and attributes so as to obtain the minimum subset of attributes that ensures a good estimation of classes and a good quality of classification. The second is to represent the information system as a decision table that shows the dependencies between the minimum subset of attributes and particular class numbers without redundancy. The **Apriori** **algorithm** works in steps: it is a two-step process, join and prune, used to find the frequent item sets. The ACO **algorithm** was inspired by the natural behaviour of ant colonies. It is used to solve many hard optimization problems, including the travelling salesman problem. The ACO system consists of two **rules**: first, the local pheromone update rule, which is applied while constructing solutions; second, the global pheromone update rule, which is applied in ant construction. The ACO **algorithm** describes two more methods, namely trail evaporation and, optionally, daemon actions. Here the ACO **algorithm** is used for the particular problem of minimizing the number of **association** **rules**. The **Apriori** **algorithm** takes a transaction data set and user-defined support and confidence values, and then generates the **association** **rules**. This **association** rule set is discrete and continuous. The weak **rules** need to be discarded. Parag Deoskar [7] proposed an **algorithm**, based on ant colony optimization, for detecting lung cancer patients. It aims to improve the chances of detecting lung cancer early and of reaching a correct decision, which proves useful in battling the disease.

4.2 Apriori Algorithm

All data recorded in the transaction database are fed as input to the Apriori algorithm, which generates rules based on the support and confidence measures.

Charu C. Aggarwal [4] presented detailed information about data streams. He discussed how to apply various data mining technologies to data streams in order to extract useful and previously unknown data. He also discussed data stream clustering, data stream classification, **association** rule mining algorithms for data streams, and frequent pattern mining. Kuldeep Malik et al. [13] explained the FP-Growth **algorithm** and proposed an Enhanced FP-Growth **Algorithm**. They state that Enhanced FP-Growth works without a prefix tree or any other complex data structure, and report that it produces better results than FP-Growth.

Goswami D.N., Chaturvedi Anshu and Raghuvanshi C.S. [7]: Having described the Record Filter approach based on **Apriori** in the previous section, the authors suggest another change to **Apriori** that gives better results than the Record Filter approach. The Intersection **Algorithm** is designed to improve efficiency and memory management and to reduce the complexity of **Apriori**. They propose a different way of counting the support of a candidate item set in the **Apriori** **algorithm**. This approach is more appropriate for a vertical data layout, whereas **Apriori** basically works on a horizontal data layout. The new approach uses the set-theoretic concept of intersection. In the classical **Apriori** **algorithm**, the support of a candidate set is counted by scanning each record one by one and checking for the existence of each candidate; if the candidate exists, its support is increased by one. This process takes a lot of time and requires repeated scans of the whole database, as many as the maximum length of the candidate item sets. In the modified approach, the support is calculated by counting the transactions common to every element of the candidate set, using the INTERSECT query of SQL. This approach requires much less time than classical **Apriori**.
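
The contrast between the two counting strategies can be sketched as follows. The source describes the intersection step as an SQL INTERSECT query; this sketch swaps in Python sets as a stand-in, and all function names and data are hypothetical.

```python
def to_vertical(transactions):
    """Convert a horizontal layout (TID -> items) into item -> set of TIDs."""
    tidlists = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidlists.setdefault(item, set()).add(tid)
    return tidlists


def support_by_scan(candidate, transactions):
    """Classical counting: test the candidate against every record."""
    candidate = set(candidate)
    return sum(1 for t in transactions if candidate <= set(t))


def support_by_intersection(candidate, tidlists):
    """Intersection counting: TIDs common to every item of the candidate."""
    lists = [tidlists.get(item, set()) for item in candidate]
    return len(set.intersection(*lists)) if lists else 0


# Hypothetical toy database in horizontal layout.
transactions = [
    {"pen", "ink", "paper"},
    {"pen", "paper"},
    {"ink"},
    {"pen", "ink"},
]
tidlists = to_vertical(transactions)
candidate = ("pen", "ink")
scan_count = support_by_scan(candidate, transactions)
intersect_count = support_by_intersection(candidate, tidlists)
```

Both functions return the same count; the difference is that the intersection variant touches only the TID lists of the items in the candidate instead of re-reading every record.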

The test results show that the network forensics system based on the **Apriori** application can detect attack crimes accurately. As the duration of an attack in seconds increases, the detection ratio improves, and the system achieves a high detection ratio and a low false alarm ratio. In addition, the system can detect some new attack crimes. The speed and detection ratio of the mining **algorithm** were tested under different minimal support values. The results show that the smaller the minimal support, the slower the speed, the longer and more numerous the generated **rules**, and the higher the false alarm ratio. If the support is set too high, the opposite occurs, but the detection ratio may be lower and the miss probability higher. Many experiments show that setting the minimal support to 0.3 balances speed, detection ratio and false alarm ratio.

From Figs. 10, 11, 12 and 13 we observe that when the support% increases from 10 to 60, the response time to generate CCFDs is reduced. For support% ≥ 40, all four figures show the same response time. Likewise, when the confidence% increases from 70 to 100 and the support is kept constant, we observe a reduction in the response time to generate CCFDs for all 3 algorithms. Among the 3 algorithms, CCFD-FPgrowth took the least time to generate CCFDs, because FPGrowth computes frequent itemsets faster than the others. The algorithms CCFD-AprioriClosed and CCFD-ZartMNR incur an additional overhead for computing closed itemsets. The response time of CCFD-ZartMNR is always observed to be high, as it involves the cost of both computing closed itemsets and mining minimal non-redundant **rules**.

In existing associative classification techniques, only one class label is associated with each rule derived, and thus **rules** are not suitable for the prediction of multiple labels. However, multi-label classification may often be useful in practice. Consider for example, a document which has two class labels “Health” and “Government”, and assume that the document is associated 50 times with the “Health” label and 48 times with the “Government” label, and the number of times the document appears in the training data is 98. A traditional associative technique like CBA generates the rule associated with the “Health” label simply because it has a larger representation, and discards the other rule. However, it is very useful to generate the other rule, since it brings up useful knowledge having a large representation in the training data, and thus could take a role in classification.
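
The "Health" / "Government" example above can be worked through numerically. The sketch below uses the counts from the text (50, 48 and 98); the function names and the 0.4 confidence threshold are assumptions chosen for illustration, and the single-label function only mimics the "keep the largest class" behaviour described for CBA rather than reproducing CBA itself.

```python
def single_label_rule(label_counts):
    """Keep only the label with the largest count, discarding the rest."""
    return max(label_counts, key=label_counts.get)


def multi_label_rules(label_counts, total, min_confidence):
    """Keep every label whose confidence count / total reaches the threshold."""
    return {
        label: count / total
        for label, count in label_counts.items()
        if count / total >= min_confidence
    }


# Counts taken from the example in the text.
label_counts = {"Health": 50, "Government": 48}
total_occurrences = 98

best = single_label_rule(label_counts)
kept = multi_label_rules(label_counts, total_occurrences, min_confidence=0.4)
```

The two confidences are 50/98 (about 0.51) and 48/98 (about 0.49), so a single-label approach keeps only "Health" even though "Government" is almost equally well represented, whereas the thresholded variant retains both rules.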
