3.3 Research experiment
3.3.3 Redundancy elimination
This is a pre-processing step in which redundant or highly correlated features are removed from the data. Redundant features supply information that already exists in other attributes and can thereby reduce predictive performance (Xu et al. 2016:370-381). In this study, a hybrid algorithm based on the fast correlation-based filter (FCBF) was used to eliminate redundant features from all sets selected by the feature selection algorithms.
Symmetric uncertainty (SU)
In information theory, SU is a normalised measure that evaluates the dependency between features using entropy and conditional entropy. For a random variable X whose values $x_i$ occur with probability $P(x_i)$, the entropy of X is defined as:
$$H(X) = -\sum_{i} P(x_i) \log_2 P(x_i) \tag{3.1}$$
The conditional entropy, also known as the conditional uncertainty, of X given the values of an attribute Y is:
$$H(X \mid Y) = -\sum_{j} P(y_j) \sum_{i} P(x_i \mid y_j) \log_2 P(x_i \mid y_j) \tag{3.2}$$
An SU value of 0 implies that two features are completely independent, while an SU value of 1 signifies that the value of one feature completely predicts the value of the other.
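The quantities above can be illustrated with a minimal Python sketch. It assumes discrete-valued features and the commonly used form $SU(X, Y) = 2\,[H(X) - H(X \mid Y)] / [H(X) + H(Y)]$, which is not written out in this section; the function names are illustrative and are not taken from the study's Java implementation.

```python
from collections import Counter
from math import log2


def entropy(xs):
    """H(X) = -sum_i P(x_i) * log2 P(x_i), as in Eq. (3.1)."""
    n = len(xs)
    return -sum((c / n) * log2(c / n) for c in Counter(xs).values())


def conditional_entropy(xs, ys):
    """H(X|Y) = sum_j P(y_j) * H(X | Y = y_j), equivalent to Eq. (3.2)."""
    n = len(xs)
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(y, []).append(x)
    return sum((len(g) / n) * entropy(g) for g in groups.values())


def symmetric_uncertainty(xs, ys):
    """SU(X, Y) = 2 * [H(X) - H(X|Y)] / [H(X) + H(Y)] (assumed standard form)."""
    hx, hy = entropy(xs), entropy(ys)
    if hx + hy == 0:
        # Assumption: two constant features carry no information, so return 0.
        return 0.0
    return 2.0 * (hx - conditional_entropy(xs, ys)) / (hx + hy)


print(symmetric_uncertainty([0, 0, 1, 1], [0, 0, 1, 1]))  # 1.0 (identical features)
print(symmetric_uncertainty([0, 0, 1, 1], [0, 1, 0, 1]))  # 0.0 (independent features)
```

The two calls reproduce the boundary cases described above: identical features give an SU of 1 and independent features give an SU of 0.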
According to Yu & Liu (2003:1-11), an attribute that is correlated with one concept (for example, the class) to a certain degree may be correlated with other concepts to the same or an even higher degree. Thus, even if the correlation between the attribute and the target concept exceeds a specific threshold $\delta$, which makes the attribute relevant to the class concept, this correlation is by no means predominant or significant in determining the target concept. The concept of predominant correlation is defined as follows (Singh, Kushwaha & Vyas 2014:95-105):
Definition 1 – Predominant correlation
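The correlation between a feature $F_i$ and the class $C$ is predominant iff $SU_{i,c} \geq \delta$ and, $\forall F_j \in S'$ ($j \neq i$), there exists no $F_j$ such that $SU_{j,i} \geq SU_{i,c}$.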
Definition 2 – Predominant feature
A feature is predominant to the class iff its correlation to the class is predominant or can become predominant after its redundant peers are eliminated.
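As a hedged illustration of Definition 1, the check below tests whether the correlation of one feature with the class is predominant, given SU values that have already been computed. The function name `is_predominant`, the dictionary `su_class` and the callable `su_pair` are hypothetical names introduced for this sketch only.

```python
def is_predominant(i, features, su_class, su_pair, delta):
    """Definition 1: SU(i, c) >= delta and no other feature j in `features`
    satisfies SU(j, i) >= SU(i, c).

    su_class: dict mapping feature -> SU(feature, class)   (assumed precomputed)
    su_pair:  callable (j, i) -> SU between two features    (assumed precomputed)
    """
    if su_class[i] < delta:
        return False  # not even relevant to the class
    return all(su_pair(j, i) < su_class[i] for j in features if j != i)


# Toy values (made up for illustration): "b" is more correlated with "a"
# (0.85) than with the class (0.80), so its class correlation is not predominant.
su_class = {"a": 0.90, "b": 0.80}
su_pair = lambda j, i: 0.85
print(is_predominant("a", ["a", "b"], su_class, su_pair, delta=0.1))  # True
print(is_predominant("b", ["a", "b"], su_class, su_pair, delta=0.1))  # False
```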
As the preceding definitions indicate, an attribute is good if it is predominant in predicting the class concept. Feature selection for classification is therefore a procedure that identifies all attributes that are predominant to the class and eliminates the rest.
The following three heuristics can efficiently identify predominant attributes and eliminate redundancy among all relevant attributes without testing each attribute in $S'$ for redundant peers, thereby avoiding pairwise correlation analysis between all relevant attributes. When two attributes are found to be redundant to each other, eliminating the one that is less relevant to the class retains more information for class prediction while reducing redundancy.
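Heuristic 1
Let $S_{P_i}$ be the set of all redundant peers of $F_i$, with $S_{P_i}^{+}$ and $S_{P_i}^{-}$ denoting the peers whose correlation with the class is higher than, and no higher than, that of $F_i$, respectively.
If $S_{P_i}^{+} = \emptyset$, consider $F_i$ a predominant feature, eliminate all attributes in $S_{P_i}^{-}$, and skip identifying redundant peers for them (Yu & Liu 2003).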
Heuristic 2
If $S_{P_i}^{+} \neq \emptyset$, process all attributes in $S_{P_i}^{+}$ before deciding on $F_i$. If none of them becomes predominant, follow Heuristic 1; otherwise eliminate only $F_i$ and decide whether or not to eliminate any attributes in $S_{P_i}^{-}$ based on the other attributes in $S'$.
Heuristic 3
The attribute with the largest $SU_{i,c}$ value is always a predominant attribute and can be used as the starting point for eliminating other attributes.
3.3.3.1 Fast correlation-based filter
The FCBF is a fast filter algorithm, with less than quadratic time complexity, that selects good features on the basis of predominant correlation (Yu & Liu 2003). The FCBF uses SU as its correlation measure (Wu et al. 2006). It consists of two stages. The first stage selects relevant attributes: an attribute p is relevant to the target attribute c iff $SU(p, c) \geq \delta$, where $\delta$ is a predefined threshold.
In the second stage, redundant features are identified among the relevant ones according to the FCBF definition of redundancy: a feature q is redundant iff there is a predominant feature p such that $SU(p, c) > SU(q, c)$ and $SU(p, q) \geq SU(q, c)$. The inequalities imply that p is a better predictor of the class c than q and that q is more similar to p than to c (Yang et al. 2016).
Redundant features are identified by (1) choosing a predominant attribute, (2) removing all attributes for which it forms an approximate Markov blanket, and (3) repeating steps (1) and (2) until no more predominant attributes can be found. An optimal feature subset can therefore be approximated by a set of predominant features without redundancy.
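The two stages and the three steps described above can be sketched in a few lines of Python. This is an illustrative reading of FCBF that works on precomputed SU values, not the original code of Yu & Liu (2003); the names `fcbf`, `su_class` and `su_pair` are assumptions made for the example, and ties in $SU(\cdot, c)$ are resolved simply by ranking order.

```python
def fcbf(su_class, su_pair, delta):
    """Sketch of FCBF on precomputed symmetric uncertainty values.

    su_class: dict mapping feature -> SU(feature, class)
    su_pair:  callable (p, q) -> SU(p, q) between two features
    delta:    relevance threshold
    """
    # Stage 1: keep relevant features (SU(p, c) >= delta), highest SU first.
    ranked = sorted(
        (f for f in su_class if su_class[f] >= delta),
        key=lambda f: su_class[f],
        reverse=True,
    )
    selected = []
    while ranked:
        # Step (1): the top-ranked remaining feature is predominant.
        p = ranked.pop(0)
        selected.append(p)
        # Step (2): drop every q with SU(p, q) >= SU(q, c), i.e. every q for
        # which p acts as an approximate Markov blanket.
        ranked = [q for q in ranked if su_pair(p, q) < su_class[q]]
        # Step (3): the loop repeats with the next remaining feature.
    return selected


# Toy SU values, made up for illustration only.
su_class = {"a": 0.90, "b": 0.80, "c": 0.50, "d": 0.05}
pairs = {frozenset("ab"): 0.85, frozenset("ac"): 0.10, frozenset("bc"): 0.10}
su_pair = lambda p, q: pairs[frozenset((p, q))]
print(fcbf(su_class, su_pair, delta=0.1))  # ['a', 'c']
```

In the toy data, d is dropped as irrelevant in the first stage, and b is dropped as redundant because it is more similar to a than to the class.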
3.3.3.2 FastCR
The proposed Fast Correlation-based Redundancy elimination (FastCR) algorithm is based on the MIC and FCBF methods and is implemented in Java. The original FCBF code both selects relevant features and removes redundant attributes. The FastCR algorithm used in this research only removes redundant attributes; relevant attributes are selected using the MIC algorithm, resulting in a hybrid MICFastCR algorithm.
In lines 2-4 of Table 3.2, the algorithm calculates the MIC values for all the features and saves them in a list. In the second part (lines 6-20), a feature is removed as redundant if the SU between it and a higher-ranked feature equals 1, that is, if the two features are identical.
As stated in Heuristic 1, a feature $F_p$ that has been established as a predominant attribute can always be used to eliminate other attributes that are ranked lower than $F_p$ and have $F_p$ as one of their redundant peers (Yu & Liu 2003:1-11).
The loop begins from the first element (Heuristic 3) in $S'_{list}$ (line 7) and runs as follows. For all the remaining attributes (from the one right next to $F_p$ to the last one in $S'_{list}$), if $F_p$ turns out to be a redundant peer of a feature $F_q$, $F_q$ is eliminated from $S'_{list}$ (Heuristic 2). After one round of filtering attributes based on $F_p$, the algorithm takes the remaining feature right next to $F_p$ as the new reference (line 18) and repeats the filtering. The algorithm ends when there are no more attributes that can be eliminated from $S'_{list}$.
Table 3.2 MIC and FastCR Algorithm (Zhao, Deng & Shi 2013:70-79; Yu & Liu 2003:1-8)
Input: $S(F_1, F_2, \ldots, F_N, C)$ // training dataset
       $\delta$ // a predefined threshold
Output: $S_{best}$ // an optimal subset
1 begin
2   for all $f_i, f_j \in D$, $i \neq j$ do begin
3     calculate MIC values and set $m_{i,j} = MIC(f_i, f_j)$;
4   end for
5   sort distinct values of $m_{i,j}$ in descending order as $S'_{list}$;
7   $F_p$ = getFirstElement($S'_{list}$);
8   do begin
9     $F_q$ = getNextElement($S'_{list}$, $F_p$);
11    do begin
12      $F'_q$ = $F_q$;
13      if ($SU_{p,q} = 1$) // if features are identical
14        remove $F_q$ from $S'_{list}$;
15        $F_q$ = getNextElement($S'_{list}$, $F'_q$);
16      else $F_q$ = getNextElement($S'_{list}$, $F_q$);
17    end until ($F_q$ == NULL);
18    $F_p$ = getNextElement($S'_{list}$, $F_p$);
19  end until ($F_p$ == NULL);
20  $S_{best}$ = $S'_{list}$;
21 end;
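The redundancy-elimination loop of Table 3.2 can be approximated by the following Python sketch. It is not the study's Java implementation: it assumes that a relevance score per feature (for example, an MIC value) has already been computed, that `su_pair` returns the SU between two features, and that a small tolerance `tol` is acceptable when testing whether SU equals 1. All three names are hypothetical.

```python
def fast_cr(relevance, su_pair, tol=1e-9):
    """Sketch of the FastCR loop: drop lower-ranked features identical to a
    higher-ranked reference feature (SU == 1, within a tolerance).

    relevance: dict mapping feature -> relevance score (assumed precomputed)
    su_pair:   callable (p, q) -> SU(p, q) between two features
    tol:       assumed tolerance for the floating-point comparison with 1
    """
    # Rank features by descending relevance (the S'_list of Table 3.2).
    s_list = sorted(relevance, key=relevance.get, reverse=True)
    i = 0
    while i < len(s_list):
        p = s_list[i]  # current reference feature F_p
        # Keep only the lower-ranked features F_q that are not identical to F_p.
        tail = [q for q in s_list[i + 1:] if su_pair(p, q) < 1.0 - tol]
        s_list = s_list[: i + 1] + tail
        i += 1  # the next remaining feature becomes the new reference
    return s_list  # S_best


# Toy data, made up for illustration: f3 is an exact duplicate of f1.
relevance = {"f1": 0.9, "f2": 0.7, "f3": 0.7, "f4": 0.2}
identical = {frozenset(("f1", "f3"))}
su_pair = lambda p, q: 1.0 if frozenset((p, q)) in identical else 0.3
print(fast_cr(relevance, su_pair))  # ['f1', 'f2', 'f4']
```

Rebuilding the list on each pass avoids the explicit getNextElement bookkeeping of the pseudocode while visiting reference features in the same order.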