
2.2 Single-Label Feature Selection for Classification

2.2.1 Feature Selection Approaches

Feature selection methods can be separated into three approaches: (1) the filter approach, (2) the wrapper approach and (3) the embedded approach [19, 20, 39, 67, 76, 77, 79, 97].

There are two groups of methods following the filter approach: (I) feature ranking-based methods and (II) search-based methods. In general, a feature ranking-based method applies statistical techniques to measure the relevance (broadly speaking, the correlation with the class attribute) of each feature separately, ranks the features according to their relevance and selects the top k features from the ranked list (where k is a predefined number). The drawback of this technique is that it considers only one feature at a time (it is a univariate method) and ignores the correlations between features. As a result, it can discard a feature that is irrelevant by itself but significantly informative when considered together with other features [43]. Moreover, it tends to select a redundant feature subset, because highly correlated features receive similar relevance scores.
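The ranking procedure described above can be sketched as follows. This is a minimal illustration, not a method taken from the cited works: it assumes the absolute Pearson correlation with the class as the relevance measure, and the function names `pearson` and `rank_features` and the toy data are hypothetical. The toy data is built so that both drawbacks appear at once.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences.
    Returns 0.0 if either sequence is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def rank_features(columns, labels, k):
    """Score each feature separately (univariate), rank, keep the top k."""
    scores = [abs(pearson(col, labels)) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return ranked[:k], scores

# Hypothetical toy data: the class is the XOR of features 0 and 1,
# feature 2 is a copy of the class with one value flipped,
# and feature 3 is an exact duplicate of feature 2.
f0 = [0, 0, 1, 1, 0, 0, 1, 1]
f1 = [0, 1, 0, 1, 0, 1, 0, 1]
labels = [a ^ b for a, b in zip(f0, f1)]
f2 = labels[:-1] + [1 - labels[-1]]
f3 = list(f2)

selected, scores = rank_features([f0, f1, f2, f3], labels, k=2)
print(selected)  # [2, 3]
print(scores)
```

On this toy data, features 0 and 1 each score zero although together they determine the class exactly, and the top-2 list consists of feature 2 and its duplicate, i.e. a redundant subset.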

Another type of filter approach consists of search-based methods. This type of method considers the relationships between features in a feature subset (making it a multivariate method) by searching the space of possible feature subsets. Each feature subset considered by the search method represents a candidate solution, which is evaluated by an evaluation function (e.g. a correlation-based function). The advantage of this approach is the elimination of feature redundancy, assuming the evaluation function penalizes redundant feature subsets. On the other hand, in some cases, features with a moderate degree of redundancy are significantly informative when considered together with other features [43].

[Figure: flowchart. Phase 1 (Feature Selection): full set of features, feature subset generation, evaluation function (feature subset quality), "Good?" (No/Yes), best feature subset. Phase 2 (Evaluation): classification algorithm, testing classifier, accuracy.]

Figure 2.2: The filter approach for feature selection (adapted from [76])

According to Figure 2.2, phase 1 of the search-based filter approach starts by generating feature subsets from the full set of features using a search method. Next, each feature subset is evaluated based on a specific criterion (or evaluation function). Both steps in phase 1 are repeated until a stopping criterion is satisfied, e.g. until a fixed number of iterations has been performed or the quality of the current best feature subset cannot be improved. Note that all steps in phase 1 are independent of the classification algorithm. The classification algorithm is used only in phase 2, which is executed after the best feature subset has been obtained. This approach was applied in the design of several feature selection methods, such as Correlation-based Feature Selection [44] and Fast Correlation-based Feature Selection [39, 115, 119].

[Figure: flowchart. Phase 1 (Feature Selection): full set of features, feature subset generation, classification algorithm (subset accuracy), "Good?" (No/Yes), best feature subset. Phase 2 (Evaluation): classification algorithm, testing classifier, accuracy.]

Figure 2.3: The wrapper approach for feature selection (adapted from [76])

The filter approach is fast, scalable and independent of the classifier. Moreover, [78] highlighted that the filter approach is the most used feature selection approach in real-world applications where the number of features is very large (such as microarray data and text mining), because filter algorithms have a simple structure and provide a simple way to calculate the relevance of features in large-scale data in a short time.
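Phase 1 of the search-based filter approach (Figure 2.2) can be illustrated with a short sketch. The assumptions are ours, not the source's: a greedy forward search as the search method, a CFS-style merit as the evaluation function, and the absolute Pearson correlation as a stand-in for whatever correlation measure a real method would use. The names `cfs_merit` and `forward_filter_search` and the toy data are hypothetical. Note that no classifier is involved at any point.

```python
import math

def pearson(xs, ys):
    """Pearson correlation; 0.0 if either sequence is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return 0.0 if vx == 0 or vy == 0 else cov / math.sqrt(vx * vy)

def cfs_merit(subset, columns, labels):
    """CFS-style merit: high feature-class correlation is rewarded,
    high feature-feature correlation (redundancy) is penalized."""
    k = len(subset)
    if k == 0:
        return 0.0
    r_cf = sum(abs(pearson(columns[j], labels)) for j in subset) / k
    if k == 1:
        return r_cf
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    r_ff = sum(abs(pearson(columns[a], columns[b])) for a, b in pairs) / len(pairs)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)

def forward_filter_search(columns, labels, max_iters=None):
    """Repeat subset generation and evaluation until no candidate improves
    the merit, or until max_iters features have been added."""
    selected, best = [], 0.0
    remaining = list(range(len(columns)))
    iters = 0
    while remaining and (max_iters is None or iters < max_iters):
        # Generation: extend the current subset by one feature at a time.
        scored = [(cfs_merit(selected + [j], columns, labels), j) for j in remaining]
        merit, j = max(scored, key=lambda t: t[0])
        if merit <= best:  # stopping criterion: quality cannot be improved
            break
        selected.append(j)
        remaining.remove(j)
        best = merit
        iters += 1
    return selected, best

# Hypothetical toy data: features 0 and 1 are identical; feature 2 is equally
# relevant to the class but only moderately correlated with feature 0.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
a = [0, 0, 0, 1, 1, 1, 1, 1]
b = list(a)
c = [1, 0, 0, 0, 1, 1, 1, 1]
selected, merit = forward_filter_search([a, b, c], labels)
print(sorted(selected), round(merit, 3))
```

On this toy data the search keeps features 0 and 2 and leaves out the duplicate feature 1, because adding it lowers the merit. A ranking-based method with k = 2 would have no way to tell the duplicate apart from the other two features.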

On the other hand, the wrapper approach selects the best feature subset by doing a search in the feature space guided by a classifier’s performance, i.e. using a classifier’s accuracy as the evaluation function (Figure 2.3). In the wrapper approach, the classification algorithm used in phase 1 is the same as the algorithm in phase 2, which will use the selected features to build a classifier to be applied to the test set.

The wrapper approach is usually more effective (in terms of maximizing predictive accuracy) than the filter approach, because it directly uses the accuracy of the classification model as the evaluation function of a feature subset. However, there is a risk of model overfitting [39, 97]. Moreover, the wrapper approach is usually much more computationally expensive than the filter approach, because a classification algorithm has to be run for each candidate feature subset, which is not the case in the filter approach.
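A minimal sketch of the wrapper approach makes the cost argument concrete. The choices below are assumptions made for illustration only: a greedy forward search, a 1-nearest-neighbour classifier, and leave-one-out accuracy on the training data as the evaluation function. The names `loo_accuracy` and `wrapper_forward_selection` and the toy data are hypothetical.

```python
def loo_accuracy(rows, labels, subset):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier
    that only looks at the features listed in `subset`."""
    correct = 0
    for i, row in enumerate(rows):
        best_dist, best_label = None, None
        for j, other in enumerate(rows):
            if i == j:
                continue  # the held-out example cannot be its own neighbour
            dist = sum((row[f] - other[f]) ** 2 for f in subset)
            if best_dist is None or dist < best_dist:
                best_dist, best_label = dist, labels[j]
        if best_label == labels[i]:
            correct += 1
    return correct / len(rows)

def wrapper_forward_selection(rows, labels):
    """Greedy forward search guided by classifier accuracy.
    Also counts how many times the classifier had to be evaluated."""
    selected, best_acc, evaluations = [], 0.0, 0
    remaining = list(range(len(rows[0])))
    while remaining:
        scored = []
        for f in remaining:
            # One full classifier evaluation per candidate feature subset.
            scored.append((loo_accuracy(rows, labels, selected + [f]), f))
            evaluations += 1
        acc, f = max(scored, key=lambda t: t[0])
        if acc <= best_acc:  # stop when accuracy no longer improves
            break
        selected.append(f)
        remaining.remove(f)
        best_acc = acc
    return selected, best_acc, evaluations

# Hypothetical toy data: feature 0 separates the classes, feature 1 is noise.
rows = [(0.1, 5), (0.2, 1), (0.3, 9), (0.4, 3),
        (0.9, 4), (1.0, 8), (1.1, 2), (1.2, 6)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
selected, acc, evaluations = wrapper_forward_selection(rows, labels)
print(selected, acc, evaluations)  # [0] 1.0 3
```

Even with two features, the classifier is evaluated three times, and each evaluation here is itself a full leave-one-out pass. Because the accuracy is estimated on the same data that drives the search, the selected subset may also be tuned to that data, which is the overfitting risk mentioned above.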

In the third approach, namely the embedded approach, the search for a good feature subset is embedded into the classifier construction process. Hence, this approach is classifier-specific too, and it also tends to be more computationally expensive than the filter approach. Decision tree algorithms [93] are an example of classification algorithms that perform embedded feature selection: during the tree construction process, a feature is selected at each internal node of the tree.
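The decision tree example can be sketched as follows. This is a simplified ID3-style tree for categorical features that assumes information gain as the split criterion; it is not the algorithm of [93], and the names `entropy`, `info_gain` and `build_tree` and the toy data are hypothetical. The set `used` records the feature chosen at each internal node, which is the embedded feature selection.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, f):
    """Reduction in class entropy obtained by splitting on feature f."""
    n = len(rows)
    remainder = 0.0
    for value in set(row[f] for row in rows):
        part = [y for row, y in zip(rows, labels) if row[f] == value]
        remainder += len(part) / n * entropy(part)
    return entropy(labels) - remainder

def build_tree(rows, labels, features, used):
    """Grow the tree recursively; `used` collects every feature that is
    selected at an internal node."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features:
        return majority  # leaf node
    best = max(features, key=lambda f: info_gain(rows, labels, f))
    if info_gain(rows, labels, best) <= 0:
        return majority  # no feature helps: stop splitting
    used.add(best)
    children = {}
    for value in set(row[best] for row in rows):
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        children[value] = build_tree(
            [rows[i] for i in idx],
            [labels[i] for i in idx],
            [f for f in features if f != best],
            used,
        )
    return (best, children)

# Hypothetical toy data: class = feature 0 AND feature 2; feature 1 is irrelevant.
rows = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
labels = [a & c for a, b, c in rows]
used = set()
tree = build_tree(rows, labels, [0, 1, 2], used)
print(sorted(used))  # [0, 2]
```

The irrelevant feature 1 never appears in the tree, so the features in `used` are the subset selected as a by-product of building the classifier.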

Note that both the filter and the wrapper approaches are performed in a preprocessing step, before applying the classification algorithm, whilst the embedded approach is performed as part of the run of a classification algorithm. In this chapter we focus only on feature selection methods performed in a preprocessing phase using the filter approach; the wrapper and the embedded approaches are outside the scope of this work, for the sake of computational efficiency and scalability.

In the context of the filter approach, we can classify feature selection methods into two types based on whether or not the method takes into account relationships among features [76]. First, in the univariate filter feature selection approach, the feature selection method measures the quality of just one feature at a time using a given evaluation function, e.g. the t-test, the F-statistic or information gain. The advantage of the univariate filter approach is that it is fast and scalable [97], but it has drawbacks, chiefly that it ignores the dependencies and correlations between features in the feature space.

Second, in the multivariate filter feature selection approach, the feature selection method measures the quality of a feature subset as a whole. That is, the correlation between features in the subset is taken into account. This approach takes more time to generate feature subsets and measure the quality of each feature subset, so it is usually slower than the univariate approach. Examples of evaluation functions used to measure a feature subset’s quality are those of Correlation-based Feature Selection (CFS) [44] and Maximum Relevance Minimum Redundancy (MRMR) [25, 90]. These evaluation functions will be discussed later in this chapter.