Multi-Class Classification Based on Association rule (MCAR)

2.7 Associative Classification Algorithms

2.7.5 Multi-Class Classification Based on Association rule (MCAR)

MCAR employs an effective technique to produce frequent ruleitems and uses a rule-ranking method that keeps the rules with high confidence values for prediction. MCAR works in two stages: rule generation and classifier building. The first stage scans the training data to extract the frequent 1-ruleitems, which are then combined across different attributes to generate candidate 2-ruleitems. A ruleitem that passes the specified minimum support (minsupp) and minimum confidence (minconf) thresholds is treated as a frequent rule. In the second stage, the effectiveness

Input: Training data (D), minsupp and minconf thresholds

Output: A rule set

Preprocessing phase

If D has integer/real attributes

Discretise continuous columns using a multi-interval discretisation method.

Shuffle the locations of the training objects randomly

The Algorithm

Scan D for the set of frequent one-ruleitems

while {

}

Rank R according to the method shown in fig 3.7.

Evaluate R on D

Remove all rules from R where there is some rule of a higher rank and .

Figure 2.3 MCAR algorithm

of the rules on the training dataset is measured to build the classifier. The rules that cover, or can classify, a certain number of training instances are placed in the classifier. The MCAR algorithm is presented in Figure 2.3, and Figure 2.5 shows the rule discovery procedure used by MCAR. As shown in Figure 2.3, the multi-interval discretisation technique of Fayyad and Irani (1993) is applied in MCAR to real and integer attributes. MCAR scans the training dataset and calculates the frequencies of the itemsets. The itemsets, along with their classes, whose support count exceeds the minsupp value are stored in vertical format in an array; the rest of the itemsets are discarded. The produce function described in Figure 2.5 discovers ruleitems of size k by combining itemsets of size k-1 from different columns and intersecting their rowIds. The result of intersecting the rowIds of two itemsets is the set of rowIds at which both itemsets occur together in the training data. This set, together with the array of class-label frequencies generated in the first scan, is used to compute the support and confidence of each new combination of ruleitems.
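The rowId intersection described above can be illustrated with a minimal Python sketch. The toy rows, the names `item_rows`, `class_rows` and `support_and_confidence`, and the use of relative support (a fraction of the training rows rather than a count) are assumptions made for illustration and are not taken from the MCAR implementation. Class labels are kept as rowId sets here instead of a frequency array, purely to keep the sketch short.

```python
from collections import defaultdict

# Hypothetical toy training data: (attribute values, class label) per row.
rows = [
    ({"outlook": "sunny", "windy": "no"}, "play"),
    ({"outlook": "sunny", "windy": "yes"}, "stay"),
    ({"outlook": "rainy", "windy": "no"}, "play"),
    ({"outlook": "sunny", "windy": "no"}, "play"),
]

# Vertical format: each (attribute, value) item maps to the rowIds where it occurs.
item_rows = defaultdict(set)
class_rows = defaultdict(set)
for row_id, (values, label) in enumerate(rows):
    class_rows[label].add(row_id)
    for attribute, value in values.items():
        item_rows[(attribute, value)].add(row_id)


def support_and_confidence(items, label):
    """Support and confidence of the ruleitem <items, label> via rowId intersection."""
    covered = set.intersection(*(item_rows[item] for item in items))
    matching = covered & class_rows[label]
    support = len(matching) / len(rows)
    confidence = len(matching) / len(covered) if covered else 0.0
    return support, confidence


support, confidence = support_and_confidence(
    [("outlook", "sunny"), ("windy", "no")], "play"
)
```

In this toy data the two items co-occur in rows 0 and 3, both labelled "play", so no second pass over the training data is needed to evaluate the combined ruleitem.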

Input: set of created rules (R), an array (Tr), class array (C)

A rule r in R has the following properties: items, class, rowIds (tid-list)

The class array, C, contains the occurrences of class labels in the training data

Output: classifier (Cl)

R’ = sort(R)

insert r1.rowIds into Tr

for each rule r ∈ R’ in sequence do

if r classifies at least a single case begin

insert r at the end of Cl

insert r.rowIds into Tr

end if

end

If Tr.size > 0 then

select the majority class as a default class from (C-Tr)

else

select the majority class as a default class from the current Cl and add it to Cl

end if

for each rule ( )Cl in sequence do if there is a lower ranked rule where prune

end if end

Figure 2.4 MCAR classifier builder algorithm
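The covering loop of the classifier builder can be sketched in Python as follows. This is one reading of the figure, not the authors' code: it assumes that a rule "classifies a case" when it matches a not-yet-covered training row of the same class, that a kept rule marks all rows matching its items as covered, and that the default class is the majority class of the uncovered rows (or of the kept rules when no rows remain). The function name `build_classifier` and the rule layout `(items, label, row_ids)` are hypothetical.

```python
from collections import Counter


def build_classifier(ranked_rules, labels):
    """ranked_rules: (items, label, row_ids) tuples, best rule first.
    labels: class label of every training row, indexed by rowId."""
    classifier = []
    covered = set()
    for items, rule_label, row_ids in ranked_rules:
        fresh = row_ids - covered  # rows no higher-ranked rule has covered
        if any(labels[r] == rule_label for r in fresh):
            classifier.append((items, rule_label))
            covered |= row_ids
    uncovered = [labels[r] for r in range(len(labels)) if r not in covered]
    if uncovered:
        default = Counter(uncovered).most_common(1)[0][0]
    else:
        default = Counter(label for _, label in classifier).most_common(1)[0][0]
    return classifier, default


labels = ["a", "a", "b", "b", "a"]
ranked_rules = [
    (("x=1",), "a", {0, 1}),
    (("y=2",), "a", {1}),  # adds nothing: row 1 is already covered
    (("z=3",), "b", {2, 3}),
]
classifier, default_class = build_classifier(ranked_rules, labels)
```

Here the second rule is dropped because the first rule already covers its only row, and row 4 is left uncovered, so its class becomes the default.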

Function produce

Input: A set of ruleitems S

Output: set of produced ruleitems

For each pair of disjoint items I1, I2 in S Do

If (<I1I2>, c) passes the minsupp threshold

if (<I1I2>, c) passes the minconf threshold

end if

end if

end

Return

Figure 2.5 Rule discovery algorithm of MCAR
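A minimal Python sketch of a produce-style step is given below, assuming itemsets are stored with their rowId sets. The signature, the dictionary layout, the relative thresholds and the returned pair `(next_frequent, rules)` are illustrative assumptions; the figure itself does not specify them.

```python
from itertools import combinations


def produce(frequent, class_rows, n_rows, minsupp, minconf):
    """Join frequent k-itemsets pairwise into (k+1)-ruleitems.

    frequent maps a frozenset of (attribute, value) items to its rowId set;
    class_rows maps each class label to its rowId set.
    """
    next_frequent, rules = {}, []
    for (items_a, rows_a), (items_b, rows_b) in combinations(frequent.items(), 2):
        merged = items_a | items_b
        attributes = {attribute for attribute, _ in merged}
        # Keep only joins that add exactly one item from a new attribute.
        if len(merged) != len(items_a) + 1 or len(attributes) != len(merged):
            continue
        if merged in next_frequent:  # already produced by an earlier pair
            continue
        rows = rows_a & rows_b  # rowIds where every item of `merged` occurs
        for label, label_rows in class_rows.items():
            hits = len(rows & label_rows)
            if hits and hits / n_rows >= minsupp:
                next_frequent[merged] = rows
                if hits / len(rows) >= minconf:
                    rules.append((merged, label, hits / n_rows, hits / len(rows)))
    return next_frequent, rules


sunny = frozenset({("outlook", "sunny")})
calm = frozenset({("windy", "no")})
frequent_1 = {sunny: {0, 1, 3}, calm: {0, 2, 3}}
class_rows = {"play": {0, 2, 3}, "stay": {1}}
frequent_2, rules_2 = produce(frequent_1, class_rows, n_rows=4, minsupp=0.5, minconf=0.8)
```

As in the figure, the confidence check is nested inside the support check, so a ruleitem can be frequent (and reused in the next iteration) without becoming a rule.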

The function in Figure 2.5 is called in each iteration of the algorithm; it takes the frequent ruleitems found at iteration k and produces the frequent ruleitems of iteration k+1. The number of rules generated by AC algorithms is known to be large (Baralis et al., 2004; Li et al., 2001). MCAR therefore ranks the rules and then prunes the redundant ones to form a classifier with fewer rules, which is easier to understand and handle. The rule ordering procedure of MCAR is shown in Figure 2.5.

MCAR uses a different approach to ranking rules. In addition to the confidence, support and cardinality measures, it considers the distribution of class frequencies in the training data and gives priority to rules associated with the dominant classes. If two rules have the same confidence, support and number of items, MCAR prefers the rule associated with the more frequent class. Rules whose classes have the same frequency are ordered randomly.
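The ranking criteria can be expressed as a composite sort key, sketched here in Python. The dictionary layout of a rule and the function name `rank_rules` are hypothetical, and the preference for rules with fewer items is an assumption, since the text only says that cardinality is compared.

```python
import random


def rank_rules(rules, class_counts, seed=0):
    """Sort rules by confidence, support, cardinality, class frequency, then randomly."""
    rng = random.Random(seed)
    return sorted(
        rules,
        key=lambda rule: (
            -rule["confidence"],           # higher confidence first
            -rule["support"],              # then higher support
            len(rule["items"]),            # then fewer items (assumed direction)
            -class_counts[rule["label"]],  # then the more frequent class
            rng.random(),                  # remaining ties are broken randomly
        ),
    )


rules = [
    {"items": ("a=1",), "label": "no", "confidence": 0.9, "support": 0.3},
    {"items": ("b=2",), "label": "yes", "confidence": 0.9, "support": 0.3},
    {"items": ("a=1", "c=3"), "label": "yes", "confidence": 0.95, "support": 0.2},
]
class_counts = {"yes": 60, "no": 40}
ranked = rank_rules(rules, class_counts)
```

The first two rules tie on confidence, support and length, so the one predicting the more frequent class "yes" is placed ahead of the one predicting "no".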

Figure 2.4 shows the classifier builder algorithm. Once the classifier is built, it is used to classify the test data. MCAR scans the ranked rules in order, and the first rule whose condition matches the test instance classifies it. If no rule matches the test instance, the default class is assigned to it.
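The first-match prediction step is simple enough to sketch in a few lines of Python. The rule representation (a tuple of (attribute, value) pairs plus a class label), the function name `predict` and the toy classifier are assumptions for illustration.

```python
def predict(classifier, default_class, instance):
    """Return the class of the first ranked rule whose items all match the instance."""
    for items, label in classifier:
        if all(instance.get(attribute) == value for attribute, value in items):
            return label
    return default_class  # no rule matched the instance


# Hypothetical ranked classifier: the more specific rule is listed first.
classifier = [
    ((("outlook", "sunny"), ("windy", "no")), "play"),
    ((("outlook", "sunny"),), "stay"),
]

first = predict(classifier, "play", {"outlook": "sunny", "windy": "no"})
second = predict(classifier, "play", {"outlook": "sunny", "windy": "yes"})
fallback = predict(classifier, "play", {"outlook": "rainy", "windy": "yes"})
```

Because only the first matching rule fires, the order produced by the ranking step directly determines the prediction whenever several rules match the same instance.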

MCAR has been found to be very competitive in predictive accuracy when compared with traditional approaches such as C4.5 and RIPPER on 20 data sets collected from the UCI data repository. MCAR has also shown good scalability when compared with the well-known AC technique CBA (Liu et al., 1998) with regard to prediction capacity, efficiency and rule features. MCAR achieved 2-5% higher accuracy than CBA and C4.5.
