2.7 Associative Classification Algorithms
2.7.5 Multi-Class Classification Based on Association Rule (MCAR)
MCAR employs an efficient technique to produce frequent ruleitems and a rule-ranking method that retains the rules with high confidence values for prediction. MCAR works in two stages: rule generation and classifier building. The first stage scans the training data to extract the frequent 1-ruleitems, which are then combined across different attributes to generate candidate 2-ruleitems. A ruleitem that passes the minimum support (minsupp) and minimum confidence (minconf) thresholds is treated as a frequent rule. In the second stage, the effectiveness
Input: Training data (D), minsupp and minconf thresholds
Output: A rule set
Preprocessing phase
  If D has integer/real attributes
    Discretise continuous columns using a multi-interval discretisation method.
  Shuffle the locations of the training objects randomly.
The algorithm
  Scan D for the set of frequent one-ruleitems
  while {
  }
  Rank R according to the method shown in fig 3.7.
  Evaluate R on D
  Remove all rules from R where there is some rule of a higher rank and .
Figure 2.3 MCAR algorithm
of the rules on the training data is measured to build the classifier. Rules that cover (can classify) a certain number of training instances are placed in the classifier. The MCAR algorithm is presented in Figure 2.3, and Figure 2.5 shows the rule discovery procedure used by MCAR. As Figure 2.3 shows, the multi-interval discretisation technique of Fayyad and Irani (1993) is applied to real and integer attributes. MCAR then scans the training dataset and counts the frequencies of the itemsets. Itemsets that, together with their classes, have a support count above the minsupp value are stored in vertical format in an array; the rest are discarded. The produce function described in Figure 2.5 discovers ruleitems of size k by combining itemsets of size k-1 from different columns and intersecting their rowIds. The result of intersecting the rowIds of two itemsets is the set of rowIds at which both itemsets occur together in the training data. This set, together with the array of class-label frequencies generated in the first scan, is used to compute the support and confidence of the new ruleitem.
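The vertical layout and rowId intersection described above can be illustrated with a minimal Python sketch. The toy data, the names `tidlists`, `class_rows` and `support_and_confidence`, and the use of rowId sets per class (rather than a plain frequency array) are assumptions made for illustration, not details taken from MCAR itself.

```python
from collections import defaultdict

# Toy training data (hypothetical): each row is (attribute values, class label).
rows = [
    ({"outlook": "sunny", "windy": "no"}, "play"),
    ({"outlook": "sunny", "windy": "yes"}, "stay"),
    ({"outlook": "rainy", "windy": "no"}, "play"),
    ({"outlook": "sunny", "windy": "no"}, "play"),
]

# Vertical layout: each (attribute, value) item maps to the rowIds where it occurs.
tidlists = defaultdict(set)
class_rows = defaultdict(set)
for row_id, (attrs, label) in enumerate(rows):
    class_rows[label].add(row_id)
    for item in attrs.items():
        tidlists[item].add(row_id)


def support_and_confidence(items, label):
    """Support and confidence of the ruleitem <items, label> via rowId intersection."""
    # Rows in which every item of the condition occurs.
    covered = set.intersection(*(tidlists[i] for i in items))
    # Rows in which the condition occurs together with the class.
    matched = covered & class_rows[label]
    support = len(matched) / len(rows)
    confidence = len(matched) / len(covered) if covered else 0.0
    return support, confidence


supp, conf = support_and_confidence([("outlook", "sunny"), ("windy", "no")], "play")
print(supp, conf)
```

No further pass over the training data is needed once the rowId sets exist, which is the main appeal of the vertical format.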
Input: set of created rules (R), an array (Tr), class array (C)
A rule r in R has the following properties: items, class, rowIds (tid-list)
The class array, C, contains the occurrences of class labels in the training data
Output: classifier (Cl)
R' = sort(R); insert r1.rowIds into Tr
for each rule r ∈ R' in sequence do
  if r classifies at least a single case
  begin
    insert r at the end of Cl; insert r.rowIds into Tr
  end if
end
if Tr.size > 0 then
  select the majority class as a default class from (C - Tr)
else
  select the majority class as a default class from the current Cl and add it to Cl
end if
for each rule in Cl in sequence do
  if there is a lower ranked rule where prune
  end if
end
Figure 2.4 MCAR classifier builder algorithm
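The coverage loop and default-class step of Figure 2.4 can be sketched in Python as follows. This is one possible reading of the figure, not a definitive implementation: the rule representation (a dict with `items`, `label` and `row_ids`), the function name `build_classifier`, and the interpretation of the default-class step (majority class of the uncovered rows if any remain, otherwise the majority class among the selected rules) are assumptions. The final pruning loop of the figure is omitted because its condition is not recoverable from the text.

```python
from collections import Counter


def build_classifier(ranked_rules, labels):
    """Select rules by coverage of the training rows.

    ranked_rules: rules sorted best first; each is a dict with
                  'items', 'label' and 'row_ids' (rows matching the condition).
    labels:       class label of each training row, indexed by rowId.
    """
    covered = set()      # plays the role of Tr in Figure 2.4
    classifier = []      # plays the role of Cl
    for rule in ranked_rules:
        # Rows not yet covered that this rule classifies correctly.
        newly = {r for r in rule["row_ids"] - covered if labels[r] == rule["label"]}
        if newly:
            classifier.append(rule)
            covered |= rule["row_ids"]
    # Default class: majority among uncovered rows, else among selected rules.
    uncovered = [labels[r] for r in range(len(labels)) if r not in covered]
    pool = uncovered if uncovered else [rule["label"] for rule in classifier]
    default = Counter(pool).most_common(1)[0][0] if pool else None
    return classifier, default


labels = ["a", "a", "b", "b", "b"]
ranked = [
    {"items": (("x", 1),), "label": "a", "row_ids": {0, 1}},
    {"items": (("x", 1), ("y", 2)), "label": "a", "row_ids": {0}},  # adds no new coverage
    {"items": (("z", 3),), "label": "b", "row_ids": {2}},
]
clf, default_class = build_classifier(ranked, labels)
print([r["items"] for r in clf], default_class)
```

In this toy run the second rule is dropped because every row it classifies is already covered by a higher-ranked rule.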
Function produce
Input: A set of ruleitems S
Output: set of produced ruleitems
Do
  For each pair of disjoint items I1, I2 in S Do
    If (<I1I2>, c) passes the minsupp threshold
      if (<I1I2>, c) passes the minconf threshold
      end if
    end if
  end
end
Return
Figure 2.5 Rule discovery algorithm of MCAR
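A rough Python sketch of one call to a produce-style function is given below. It is an illustration under stated assumptions rather than MCAR's actual code: itemsets are represented as frozensets of (attribute, value) pairs mapped to rowId sets, class rowIds are kept in a separate hypothetical `class_rows` dictionary, and the actions taken when a threshold is passed (keep the itemset as frequent, emit a rule) are filled in by assumption because Figure 2.5 does not show them.

```python
from itertools import combinations


def produce(frequent, class_rows, n_rows, minsupp, minconf):
    """Join frequent itemsets of size k-1 into ruleitems of size k.

    frequent:   dict mapping frozenset of (attribute, value) items -> set of rowIds.
    class_rows: dict mapping class label -> set of rowIds.
    Returns (next_frequent, rules), where each rule is
    (items, label, support, confidence).
    """
    next_frequent, rules = {}, []
    for (a, rows_a), (b, rows_b) in combinations(frequent.items(), 2):
        merged = a | b
        attrs = {attr for attr, _ in merged}
        # Join only itemsets that differ in one item and use distinct attributes.
        if len(merged) != len(a) + 1 or len(attrs) != len(merged):
            continue
        covered = rows_a & rows_b  # rowId intersection
        if not covered or merged in next_frequent:
            continue
        for label, rows_c in class_rows.items():
            matched = covered & rows_c
            if len(matched) / n_rows >= minsupp:
                next_frequent[merged] = covered
                if len(matched) / len(covered) >= minconf:
                    rules.append((merged, label,
                                  len(matched) / n_rows,
                                  len(matched) / len(covered)))
    return next_frequent, rules


frequent_1 = {
    frozenset({("outlook", "sunny")}): {0, 1, 3},
    frozenset({("windy", "no")}): {0, 2, 3},
    frozenset({("outlook", "rainy")}): {2},
}
class_rows = {"play": {0, 2, 3}, "stay": {1}}
frequent_2, rules_2 = produce(frequent_1, class_rows, n_rows=4, minsupp=0.5, minconf=0.8)
print(rules_2)
```

Calling the function repeatedly on its own `next_frequent` output until that output is empty would give a level-wise search.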
The function in Figure 2.5 is called in each iteration of the algorithm and produces the frequent ruleitems at iteration K, which are then used to discover the frequent ruleitems at iteration K+1. The number of rules generated by AC algorithms is known to be large (Baralis et al., 2004; Li et al., 2001). MCAR therefore ranks the rules and prunes the redundant ones to form a classifier with fewer rules, which is easier to understand and handle. The rule ordering procedure of MCAR is shown in Figure 2.5.
MCAR uses a different approach to ranking rules. In addition to the confidence, support and cardinality measures, it takes into account the distribution of class frequencies in the training data and gives priority to rules associated with the dominant classes. If two rules have the same confidence, support and number of items, MCAR prefers the rule associated with the more frequent class. If the class frequencies are also equal, one of the rules is chosen at random.
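The ranking criteria can be expressed as a composite sort key, as in the following Python sketch. The rule representation, the name `rank_rules`, and the seeded random tie-break are illustrative assumptions; the sketch also assumes that a rule with fewer items ranks higher, which the text above does not state explicitly.

```python
import random


def rank_rules(rules, class_freq, seed=0):
    """Sort rules best first.

    Order of criteria: higher confidence, higher support, fewer items
    (assumed direction), more frequent class, then a random tie-break.
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    return sorted(
        rules,
        key=lambda r: (
            -r["conf"],
            -r["supp"],
            len(r["items"]),
            -class_freq[r["label"]],
            rng.random(),
        ),
    )


class_freq = {"x": 3, "y": 7}  # occurrences of each class label in the training data
rules = [
    {"items": ("a",), "label": "x", "conf": 0.90, "supp": 0.3},
    {"items": ("b",), "label": "y", "conf": 0.90, "supp": 0.3},
    {"items": ("c", "d"), "label": "y", "conf": 0.95, "supp": 0.2},
]
ranked = rank_rules(rules, class_freq)
print([r["items"] for r in ranked])
```

The first two rules tie on confidence, support and length, so the one predicting the more frequent class `y` is placed ahead of the one predicting `x`.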
Figure 2.4 shows the classifier builder algorithm. Once the classifier is built, it is used to classify test cases. MCAR applies the first rule in the ranked list whose condition matches the test instance. If no rule matches, the default class is assigned to the test instance.
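The prediction step amounts to a first-match scan over the ranked rules, as the short Python sketch below shows. The rule format (a tuple of (attribute, value) conditions paired with a class label) and the function name `predict` are assumptions made for illustration.

```python
def predict(classifier, default_class, instance):
    """Return the class of the first ranked rule whose condition matches the instance."""
    for items, label in classifier:
        # A rule matches when every (attribute, value) pair in its condition
        # appears in the test instance.
        if all(instance.get(attr) == value for attr, value in items):
            return label
    # No rule matched: fall back to the default class.
    return default_class


# Hypothetical ranked classifier, best rule first.
classifier = [
    ((("outlook", "sunny"), ("windy", "no")), "play"),
    ((("windy", "yes"),), "stay"),
]

print(predict(classifier, "play", {"outlook": "sunny", "windy": "no"}))
print(predict(classifier, "play", {"outlook": "rainy", "windy": "yes"}))
print(predict(classifier, "play", {"outlook": "rainy", "windy": "no"}))
```

Because only the first matching rule fires, the quality of the ranking directly determines which rule decides each test case.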
MCAR was found to be very competitive in predictive accuracy when compared with traditional approaches such as C4.5 and RIPPER on 20 data sets collected from the UCI data repository. MCAR also showed good scalability when compared with the well-known AC technique CBA (Liu et al., 1998) with regard to prediction capability, efficiency and rule features. MCAR achieved 2-5% higher accuracy than CBA and C4.5.