Previous Studies for Mining Association Rules

Chapter 2 Literature Review

2.2 Data Mining

2.2.3 Previous Studies for Mining Association Rules

It is well know that the association rule mining task is first introduced by Agrawal et al. (Agrawal et al. 1993). They develop Apriori algorithm which generates a candidate set and tests it in a breadth first manner. Before moving to its next level (k+1), this approach discovers all the frequent item sets at level k. The support value of each node is calculated at level k and prunes those nodes if the support values of those nodes do not satisfy a user define support value. It generates candidate item sets at each level and scans the database so frequently that it is costly, especially when there is a long pattern and large transaction database. After generating frequent item sets, the next step is to generate rules form those frequent patterns. A rule is valid if it satisfies a user defined confidence value. Confidence value of a rule 𝑋 → 𝑌 is defined by

𝑠𝑢𝑝𝑝(𝑋∪𝑌)

𝑠𝑢𝑝𝑝(𝑋) , 𝑤ℎ𝑒𝑟𝑒 𝑋, 𝑌 ⊆ 𝐼, 𝑠𝑢𝑝𝑝(𝐼) ≥ min _𝑠𝑢𝑝𝑝. Apriori algorithm follows support-

confidence framework, introduces the following major issues:

1) This algorithm highly depends on user define support and confidence values, although users have no knowledge regarding the database.

2) If the support value is too high, generation of frequent item sets is less and hence few rules are mined. If the support value is too low, then al- most possible patterns will become frequent and a large number of un- necessary rules are generated.

3) Since this algorithm only use a single criteria i.e. confidence to evaluate the quality of a generated rule, it generates misleading rules as well (Zhou & Yau 2007).

Some recent studies focused on designing different efficient association rule mining algorithms where the first task is to find frequent item sets (Webb 2000; Toivonen 1996; Zhou & Yau 2007; Wu et al. 2002; Wu et al. 2004). The main limitation of these ap- proaches need multiple passes over the transaction database to find frequent item sets. A large number of disk I/O is required for a large database since database are disk resident and for each pass it needs to read the database completely. The number of disk I/O depends on the size of the database (Yan et al. 2005).

The diversity, application, usefulness and probability of genetic algorithm for mining high utility item sets containing negative item values for transaction database is showed by Kannimuthu and Premalatha (Kannimuthu & Premalatha 2014). Mutation operator is used to maintain diversity from one generation of population to the next one. Generally, the mutation probability is set to low. The searching task is becoming primitive random search if the mutation probability is set too high. To avoid this, instead of using fixed mutation rate they use a ranked mutation approach because it gives better result (Premalatha & Natarajan 2009). Initially a large mutation rate is applied for exploring more on the search space. The ranking of offspring depends on the fitness value. The mutation rate is assigned for an offspring based on the rank of that offspring. The rate of mutation is set low for a higher ranked offspring and through this way the offspring containing highest fitness value may reach to the optimal solution in high possibility. To mine quantitative association rules researchers propose a new algorithm which is based on genetic algorithm named QUANTMINER (Salleb-aouissi et al. 2013; Salleb- aouissi et al. 2007). By optimizing support and confidence value, this system dynamical- ly identify good intervals in association rules. Researchers apply this algorithm in different data sets and showed the usefulness of this algorithm as a data mining tool.

R.J.kuo and C.W.Shih use a new meta-heuristic technique name ant colony system (Kuo & Shih 2007) to mine large database for efficient searching of association rules. Multi- dimensional constraints are considered in this approach. In addition this approach also considers user’s assign constraint. The experimental result shows that it gives more con- densed rules than Apriori algorithm. The computational time of this approach is less than Apriori algorithm. Although this system provides promising results but this system still faces some issues which need to resolve. After analyzing the results it found that lots of similar rules are generated so the researchers suggest another technique like fuzzy approach to merge those similar rules into one class.

Researchers propose a novel particle swarm optimization algorithm, named rough particle swarm optimization algorithm (RPSOA) to mine numeric association rules. Based on notion of rough patterns, this algorithm uses rough decision variables and rough particles. As opposed to precise values, a rough particle consists of upper and lower values. Conventional and rough particles and variables are used by this proposed method. This approach is designed in such a way that it simultaneously search the intervals of numeric

attributes and extract the numeric association rules based on these intervals. Furthermore, this approach directly mines association rules from a data set without generating frequent item sets. Since there was no study for mining association rules, researchers conclude that this approach gives satisfactory results in its first application (Alatas & Akin 2008b). Most of the association rule mining tasks assume that items in a dataset have a uniform distribution. By weighting individual items, weighted association rule mining tasks are used to provide a notion of importance to an individual item. To assign meaningful weights to each item for mining association rules, researchers introduces a new approach named Weighted Association Rule Mining using Particle Swarm Optimization (WARM SWARM). In this proposed approach researchers use PSO to search the vector space of possible solutions for finding the optimal solution of a problem. Unlike Apriori, this approach multiplies the support of an item by the sum of the weights of its constituent items during the generation of candidate item sets (Pears & Koh 2012).

To mine association rules most researchers focus on ameliorating computational effi- ciency. To determine the threshold values of support and confidence which are the key factors for association rule mining task, researchers propose a new approach which is based on a particle swarm optimization technique. Suitable fitness values and their corre- sponding support and confidence values of identified swarms are searched through this approach. Their result show that particle swarm optimization algorithm quickly finds suitable threshold fitness values of item sets and quality rules are obtained through this way. Users can mine specific rules from a large database by setting support or fitness values. Since this technique free from support constraint, the main problem of this approach is users have no control over mining techniques. Apart from this their result only shows two or three dimensional rules instead of more dimensional rules which could be interesting for the policy makers of the industry (Kuo et al. 2011).

In document New evolutionary algorithms for mining interesting association rules (Page 46-48)