2.4 A review of Evolutionary Algorithms for Feature Selection in a Data
2.4.2 Fitness Function
Another component of GAs is the fitness function, which evaluates the quality of individuals. The vast majority of GAs for feature selection follow the wrapper approach, where the fitness function involves the predictive performance of a classifier built using the features selected by the corresponding individual. However, the filter approach can also be used, in which case fitness is computed without using a classifier’s performance [34].
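A wrapper-style fitness function can be sketched as follows, assuming a bit-string individual and leave-one-out 1-NN accuracy as the measure of predictive performance. The function name `loo_1nn_accuracy` and the toy dataset are illustrative assumptions, not taken from any of the cited works.

```python
def loo_1nn_accuracy(X, y, mask):
    """Leave-one-out accuracy of 1-NN using only features where mask[j] == 1.

    X is a list of numeric rows, y a list of class labels, and mask a
    bit-string individual (illustrative representation).
    """
    selected = [j for j, bit in enumerate(mask) if bit]
    if not selected:
        return 0.0  # an individual selecting no features gets the worst fitness
    correct = 0
    for i, xi in enumerate(X):
        best_dist, best_label = None, None
        for k, xk in enumerate(X):
            if k == i:
                continue  # leave the query instance out of the training set
            dist = sum((xi[j] - xk[j]) ** 2 for j in selected)
            if best_dist is None or dist < best_dist:
                best_dist, best_label = dist, y[k]
        if best_label == y[i]:
            correct += 1
    return correct / len(X)


# Toy data (hypothetical): feature 0 separates the classes, feature 1 does not.
X = [[0.0, 5.0], [0.1, 1.0], [0.2, 9.0], [1.0, 5.1], [1.1, 0.9], [1.2, 9.1]]
y = [0, 0, 0, 1, 1, 1]
fitness_informative = loo_1nn_accuracy(X, y, [1, 0])
fitness_noisy = loo_1nn_accuracy(X, y, [0, 1])
```

On this toy data the individual that selects only the informative feature receives a higher fitness than the one that selects only the uninformative feature, which is the behaviour a wrapper-based GA relies on.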
Several types of feature ranking techniques are used in the literature, such as the between-group to within-group sum of squares ratio (BW ratio) [15, 47], entropy-based ranking [108], information gain [5, 6, 15], t-statistics [108], the relative proximity degree [82] and the Wilcoxon rank sum test [75].
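As an illustration of one such ranking technique, the BW ratio of a feature can be computed as the between-group sum of squares divided by the within-group sum of squares, and features can then be ranked in descending order of that ratio. The sketch below assumes numeric features and uses the hypothetical helper names `bw_ratio` and `rank_features`; it is not the implementation used in the cited works.

```python
def bw_ratio(column, labels):
    """Between-group to within-group sum of squares ratio for one feature."""
    overall_mean = sum(column) / len(column)
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(label, []).append(value)
    between, within = 0.0, 0.0
    for values in groups.values():
        group_mean = sum(values) / len(values)
        # Each instance contributes the squared distance of its class mean
        # from the overall mean (between) and of itself from its class mean.
        between += len(values) * (group_mean - overall_mean) ** 2
        within += sum((v - group_mean) ** 2 for v in values)
    return between / within if within > 0 else float("inf")


def rank_features(X, y):
    """Return feature indexes sorted from highest to lowest BW ratio."""
    n_features = len(X[0])
    scores = [bw_ratio([row[j] for row in X], y) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)


# Toy data (hypothetical): feature 0 is class-discriminative, feature 1 is not.
X = [[0.0, 5.0], [0.1, 1.0], [0.2, 9.0], [1.0, 5.1], [1.1, 0.9], [1.2, 9.1]]
y = [0, 0, 0, 1, 1, 1]
ranking = rank_features(X, y)
```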
A search method using the correlation coefficient as the evaluation function [15] and a search for the Markov blanket [125] of the class attribute are examples of search-based methods following the filter approach for feature selection.
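To make the idea of a correlation-based evaluation function concrete, the sketch below scores a feature subset with a CFS-style merit, which rewards correlation with the class and penalises correlation among the selected features. This particular formula is an assumption chosen for illustration; the source does not state which correlation-based function [15] uses, and the names `pearson` and `cfs_merit` are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (z - mean_b) for x, z in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((z - mean_b) ** 2 for z in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant column carries no correlation information
    return cov / math.sqrt(var_a * var_b)


def cfs_merit(X, y, subset):
    """CFS-style merit: k * r_cf / sqrt(k + k * (k - 1) * r_ff)."""
    k = len(subset)
    if k == 0:
        return 0.0
    cols = [[row[j] for row in X] for j in subset]
    # Mean absolute feature-class correlation.
    r_cf = sum(abs(pearson(c, y)) for c in cols) / k
    # Mean absolute feature-feature correlation over all pairs.
    pairs = [(a, b) for a in range(k) for b in range(a + 1, k)]
    r_ff = (sum(abs(pearson(cols[a], cols[b])) for a, b in pairs) / len(pairs)
            if pairs else 0.0)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


# Toy data (hypothetical): feature 1 duplicates feature 0, feature 2 is noise.
X = [[0.0, 0.0, 5.0], [0.1, 0.1, 1.0], [0.2, 0.2, 9.0],
     [1.0, 1.0, 5.1], [1.1, 1.1, 0.9], [1.2, 1.2, 9.1]]
y = [0, 0, 0, 1, 1, 1]
merit_single = cfs_merit(X, y, [0])
merit_redundant = cfs_merit(X, y, [0, 1])
```

With this merit, adding an exact duplicate of an already selected feature does not increase the score, so no classifier needs to be trained to discourage redundant subsets.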
[Figure 2.10: General scheme of GAs based on the filter approach. Flowchart steps: Individual Pool; Evaluate each individual using fitness function; Apply Genetic operators; Survival Selection.]
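The general scheme can be sketched as a simple loop over the four steps: individual pool, fitness evaluation, genetic operators and survival selection. The operator choices below (single-point crossover, bit-flip mutation, truncation parent selection and elitist survival over parents plus offspring), the parameter defaults and the toy per-feature scores are all assumptions made for illustration, not the configuration of any cited GA.

```python
import random


def run_ga(fitness, n_features, pop_size=20, generations=30, p_mut=0.05, seed=0):
    """Minimal GA over bit strings; requires n_features >= 2 and pop_size >= 4."""
    rng = random.Random(seed)
    # Individual pool: random bit strings, one bit per candidate feature.
    pool = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        # Evaluate each individual using the fitness function.
        scored = sorted(pool, key=fitness, reverse=True)
        history.append(fitness(scored[0]))
        parents = scored[: pop_size // 2]
        # Apply genetic operators: single-point crossover, then bit-flip mutation.
        offspring = []
        while len(offspring) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randint(1, n_features - 1)
            child = a[:cut] + b[cut:]
            child = [1 - bit if rng.random() < p_mut else bit for bit in child]
            offspring.append(child)
        # Survival selection: keep the best pop_size of parents and offspring.
        pool = sorted(pool + offspring, key=fitness, reverse=True)[:pop_size]
    best = pool[0]
    history.append(fitness(best))
    return best, history


# Toy filter-style fitness (hypothetical scores): reward high-scoring features
# and charge a fixed cost per selected feature.
scores = [0.9, 0.1, 0.8, 0.05, 0.7, 0.02]


def toy_fitness(mask):
    return sum(s for s, bit in zip(scores, mask) if bit) - 0.3 * sum(mask)


best, history = run_ga(toy_fitness, n_features=len(scores))
```

Because survival selection keeps the best individuals among parents and offspring, the best fitness recorded in `history` cannot decrease from one generation to the next.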
based on the accuracy of a classifier [6, 14, 40, 49, 70, 73, 86]. Some papers combine the accuracy of the classifier with another criterion in the fitness function. For instance, [70] uses the accuracy of k-NN and the ratio of the number of features selected by the individual to the total number of features in the dataset; [14] uses the accuracy, the simplicity of the decision tree (tree size) and the number of features in the feature subset; and [22] uses the accuracy of an SVM and the number of selected features. A list of the different types of fitness functions used by many GAs proposed in the literature is provided in Table 2.1.
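One simple way to combine accuracy with the proportion of selected features is a weighted difference, sketched below. The cited papers are not described here in enough detail to say how they combine the two criteria, so the linear form, the weight `alpha` and the function name `combined_fitness` are assumptions for illustration only.

```python
def combined_fitness(accuracy, mask, alpha=0.1):
    """Accuracy penalised by the proportion of selected features.

    accuracy: classifier accuracy in [0, 1] obtained with the selected features.
    mask:     bit-string individual (1 = feature selected).
    alpha:    hypothetical weight of the subset-size penalty.
    """
    proportion_selected = sum(mask) / len(mask)
    return accuracy - alpha * proportion_selected


# Two individuals with equal accuracy: the smaller subset gets higher fitness.
small_subset = combined_fitness(0.9, [1, 0, 0, 0])
large_subset = combined_fitness(0.9, [1, 1, 1, 0])
```

Under this formulation, `alpha` controls how much accuracy the search is willing to give up for a smaller feature subset.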
[Figure: flowchart steps: Individual Pool; Run Classification algorithm on each individual; Evaluate Fitness function; Apply Genetic operators; Survival Selection.]
Table 2.1: A summary of the literature on Genetic Algorithms for Feature Selection in a data preprocessing phase

| References | Feature Selection Approach | Ind. Rep. | Fitness Function | Crossover | Mutation | Other Operation |
|---|---|---|---|---|---|---|
| [69] | Filt & Wrap | List of feature indexes | BW ratio for filter approach; the accuracy of k-NN for wrapper approach | Dynamic | Dynamic | Elitist strategy |
| [5] | Filt & Wrap | Bit string | Information content for filter approach; the accuracy of Decision Tree and the classification cost for wrapper approach | not mentioned | not mentioned | not mentioned |
| [70] | Wrap | Bit string | The accuracy of k-NN | Adaptive probability | Adaptive probability | Elitist strategy |
| [120] | Filt & Wrap | Bit string | PCA for filter approach; the accuracy of MLNB for wrapper approach | Uniform | not mentioned | Elitist strategy |
| [6] | Wrap | Bit string | The accuracy of Decision Tree and size of the feature subset | not mentioned | not mentioned | not mentioned |
| [108] | Filt & Wrap | Bit string | Entropy based, T-statistics, SVM-recursive elimination for filter approach; the accuracy of SVM for wrapper approach | Single-point | Bit-flip | not mentioned |
| [40] | Wrap | Bit string | The accuracy of GRNN | Half uniform | Bit-flip | Simulated Annealing |
| [14] | Wrap | List of feature indexes | The accuracy and simplicity of Decision Tree | Uniform | Bit-flip | Delete Feature |
| [86] | Wrap | Bit string | Feature subset cardinality and the accuracy of 1-NN | Multi-point | Bit-flip | Problem-specific operation |
| [83] | Filt & Wrap | Bit string | The relative proximity degree for filter approach; the accuracy of k-NN for wrapper approach | Multiple-point | Bit-flip | not mentioned |
| [73] | Wrap | Bit string | The accuracy of SVM | Single-point | Bit-flip | not mentioned |
| [15] | Filt & Wrap | Bit string | The correlation based feature weights for each feature for filter approach; the accuracy of k-NN for wrapper approach | Standard | Bit-flip | Taguchi method |
| [22] | Filt & Wrap | Bit string | M Ranked method for filter approach; the accuracy of SVM for wrapper approach | Single-point | Bit-flip | not mentioned |
| [117] | Filt & Wrap | Bit string | Information Gain for filter approach; the accuracy of k-NN for wrapper approach | Two-point | Bit-flip | not mentioned |
| [51] | Filt & Wrap | Bit string | Cosine amplitude method and alpha cut method for filter approach; the accuracy of SVM for wrapper approach | One-point | Multi-uniform | Elitist strategy |
| [75] | Filt & Wrap | Bit string | Wilcoxon rank sum test for filter approach; the accuracy of SVM for wrapper approach | Double one-point | Bit-flip | not mentioned |
| [50] | Wrap | List of feature indexes | The accuracy of ANN | One-point | Bit-flip | Speciation, Elitist strategy |
| [47] | Filt & Wrap | 2 parts bit string | BW ratio, the correlation coefficient, the Fisher’s discriminant criterion for filter approach; the accuracy of SVM | Specialized | Specialized | Elitist strategy |
Considering the feature selection approach, most works listed in the table (second column) use the filter and wrapper approaches together, in a sequential fashion. The advantage of applying the filter approach before the GA is that it reduces the number of features in the feature space, which makes the subsequent wrapper approach feasible. In contrast, applying only the wrapper approach to all original features would be much more computationally expensive. On the other hand, works such as [125] do not need the filter approach (for feature elimination) because the datasets mined in those papers have no more than 100 features, which does not seem too large for a wrapper-based GA for feature selection.
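The sequential use of the two approaches can be sketched as follows: a filter score keeps only the top-ranked features, and the wrapper-based GA then evolves bit strings over that reduced set, which are decoded back to original feature indexes before a classifier is trained. The helper names `filter_top_k` and `decode`, the scores and the value of `k` are hypothetical and serve only to illustrate the reduction of the search space.

```python
def filter_top_k(scores, k):
    """Filter step: keep the indexes of the k features with the highest score."""
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def decode(mask, kept):
    """Map a bit string over the reduced feature set to original feature indexes."""
    return [kept[i] for i, bit in enumerate(mask) if bit]


# Hypothetical filter scores for five original features.
scores = [0.2, 3.1, 0.05, 1.7, 0.9]
kept = filter_top_k(scores, k=3)

# A wrapper-stage individual now needs only len(kept) bits instead of len(scores).
individual = [1, 0, 1]
selected_original = decode(individual, kept)

# Number of candidate subsets before and after the filter step.
full_space = 2 ** len(scores)
reduced_space = 2 ** len(kept)
```

Since the number of candidate subsets grows as 2 to the power of the number of features, removing features in the filter step shrinks the space that the more expensive wrapper evaluation has to explore.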