2.6 Related Work: Classification with Unbalanced Data

2.6.4 Ensemble Methods

As discussed, combining bagging and boosting with sampling (such as under-sampling, over-sampling and SMOTE) to create balanced bootstrap samples is a popular approach to classification with unbalanced data [123][118][170][37]. Because the bootstrap samples are artificially balanced, the base learners can be trained using traditional measures such as the overall error or classification accuracy. While bagging and boosting with balanced bootstrap sampling represent a large area of work for classification with unbalanced data, bagging has also recently been combined with EMO [123][124], and with NCL in the fitness function [168], for class imbalance. These two main approaches (traditional bagging and boosting, and bagging with NCL and EMO) and their limitations are discussed below.

Bagging and Boosting with Balanced Bootstrap Sampling

Most work in this area uses ANNs, decision trees, NB or SVMs as the base classifiers in the ensembles. Some examples include the following. In [118], two new under-sampling methods are developed to create balanced bootstrap samples for a boosting algorithm with decision trees; these are compared to several other boosting approaches from the literature. A similar under-sampling approach is developed in [164] using SVMs, where the support vectors are iteratively learned on balanced bootstrap samples. Both [118] and [164] use benchmark tasks from the UCI repository. In [135], an ensemble of linear regression classifiers is trained using under-sampling and AdaBoost for a (binary) classification task from the PAKDD [1] data mining competition. In [159], a pool of decision tree, NB and rule-based base classifiers is combined into a “meta-classifier” for an e-Commerce fraud detection task. Here the base classifiers are trained using a combination of balanced and unbalanced bootstrap samples.
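The balanced bootstrap sampling that these works share can be sketched as follows. This is only an illustrative sketch and not the procedure of any cited work: the function name `balanced_bootstrap`, the `per_class` parameter and the 90/10 toy labels are assumptions made for the example. Each class is sampled with replacement and, by default, contributes as many instances as the minority class, which under-samples the majority class.

```python
import random
from collections import Counter


def balanced_bootstrap(labels, rng, per_class=None):
    """Return indices of a bootstrap sample with equal counts per class.

    Every class is sampled with replacement. By default each class
    contributes as many instances as the smallest (minority) class,
    so the majority class is under-sampled.
    """
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    if per_class is None:
        per_class = min(len(members) for members in by_class.values())
    sample = []
    for members in by_class.values():
        sample.extend(rng.choices(members, k=per_class))
    rng.shuffle(sample)
    return sample


# Hypothetical unbalanced task: 90 majority and 10 minority instances.
labels = [0] * 90 + [1] * 10
rng = random.Random(0)
sample = balanced_bootstrap(labels, rng)
counts = Counter(labels[i] for i in sample)
```

One such sample would be drawn per base classifier; passing a smaller `per_class` value gives smaller bootstrap samples.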

In [170] and [169], a decision tree-based bagging approach is developed where ensemble performances on balanced and unbalanced tasks are compared when the level of diversity in the ensembles is varied during training. Diversity is varied by increasing the balanced bootstrap sample sizes, where the smaller the bootstrap size, the better the diversity. These findings show that, as expected, the accuracies on the minority and majority class improve together when ensemble diversity is increased in balanced data sets. However, on the unbalanced test sets, ensembles with high diversity rates show high minority class accuracies but poor majority class accuracies. These experiments used eight binary and multi-class benchmark tasks from the UCI repository.

Bagging with NCL, and Bagging with EMO

Recently, bagging (with balanced bootstrap sampling) has been compared with NCL in the fitness function (for ensemble diversity) to train ANN-based ensembles [168]. In this work, two formulations of NCL are evaluated in the fitness function (and compared to bagging) on the same UCI tasks as [169]. However, the original unbalanced data set is first re-balanced using sampling before NCL is measured. In the first formulation, NCL is applied to all training instances in the (re-balanced) training set; in the second formulation, NCL is only applied to minority class instances in the (re-balanced) training set while majority class instances are ignored. Both NCL-trained ensembles show better minority class accuracies than the bagging approach, and the first NCL formulation shows the best overall diversity of the three approaches. The authors attribute this to very high diversity on the minority class alone (as in the second NCL formulation) producing high minority class accuracies but poor majority class accuracies.
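The difference between the two formulations can be illustrated with a small sketch. It assumes the commonly used NCL penalty, in which member i is penalised on an instance by (f_i - f_mean) multiplied by the sum of (f_j - f_mean) over all other members j; the exact form used in [168] may differ. The names `ncl_penalties` and `mean_penalty`, and the toy outputs and labels, are hypothetical.

```python
def ncl_penalties(outputs):
    """Compute NCL penalties for every ensemble member and instance.

    outputs[i][n] is the output of member i on instance n.
    penalties[i][n] = (f_i - f_mean) * sum over j != i of (f_j - f_mean),
    where f_mean is the mean member output on instance n.
    """
    n_members = len(outputs)
    n_instances = len(outputs[0])
    penalties = [[0.0] * n_instances for _ in range(n_members)]
    for n in range(n_instances):
        mean = sum(member[n] for member in outputs) / n_members
        devs = [member[n] - mean for member in outputs]
        total = sum(devs)
        for i in range(n_members):
            penalties[i][n] = devs[i] * (total - devs[i])
    return penalties


def mean_penalty(member_penalties, labels, only_class=None):
    """Average one member's penalties over all instances or over one class."""
    selected = [p for p, y in zip(member_penalties, labels)
                if only_class is None or y == only_class]
    return sum(selected) / len(selected)


# Hypothetical outputs of three members on three instances.
outputs = [[1.0, 0.0, 1.0],
           [0.0, 0.0, 1.0],
           [1.0, 1.0, 1.0]]
labels = [1, 0, 0]  # class 1 is assumed to be the minority class

penalties = ncl_penalties(outputs)
all_instances = mean_penalty(penalties[0], labels)                 # first formulation
minority_only = mean_penalty(penalties[0], labels, only_class=1)   # second formulation
```

A more negative penalty indicates that the member disagrees more with the rest of the ensemble. Restricting the average to the minority class, as in the second formulation, leaves disagreement on the majority class unmeasured.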

In [123][124], a co-evolutionary approach with bagging and EMO has been used in ensemble learning with grammatical evolution (GE). These works use a problem-decomposition approach (e.g. one-vs-rest) to evolve a population of classifiers for two multi-class tasks (from the UCI repository) with many minority classes. Two populations are co-evolved for ensemble diversity: (binary) classifiers and “points” (which are balanced bootstrap samples). The Pareto-based learning objectives in fitness include the overall error of each classifier, the level of overlap between correctly learned “points”, and a parsimony objective favouring smaller solutions. A winner-takes-all approach over the Pareto front determines the final ensemble prediction. This approach is shown to outperform traditional single-objective GP on the tasks.
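Pareto-based selection over several objectives can be sketched as below. This is a generic illustration of Pareto dominance and not the algorithm of [123][124]: it assumes that all three objectives (error, overlap and solution size) are minimised, and the candidate values are invented for the example.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimised)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points):
    """Return the points that no other point dominates, in input order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical classifiers described by (error, overlap, size).
candidates = [
    (0.10, 0.30, 12),
    (0.20, 0.10, 8),
    (0.25, 0.35, 15),  # dominated by the first candidate
    (0.10, 0.30, 20),  # same error and overlap as the first, but larger
]
front = pareto_front(candidates)
```

The first two candidates trade error against overlap and size, so neither dominates the other and both remain on the front.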

In one related work [37], a bagging approach with unbalanced bootstrap samples is used for ensemble learning, where base classifiers from 16 different learning algorithms are trained with the F-measure as the training criterion. However, this work focuses on ensemble selection, where a second training phase (using GA) optimises the weights that specify which base classifiers are represented in the final ensemble. Experiments on five tasks with unbalanced data from the UCI repository show that the GA-based ensemble selection strategy outperforms two previous approaches: a fitness-weighted majority vote strategy, and a traditional majority voting approach where all members contribute equally in the ensembles. However, this approach represents the only related work which does not use balanced bootstrap sampling.
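The F-measure is a suitable criterion for unbalanced bootstrap samples because, unlike overall accuracy, it gives no credit to a classifier that ignores the minority class. A minimal sketch follows, assuming the standard F-beta definition with the minority class treated as the positive class; the function name and the toy predictions are illustrative and are not taken from [37].

```python
def f_measure(y_true, y_pred, positive=1, beta=1.0):
    """F-beta score for the positive class (beta=1 gives the usual F1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical task with 3 minority (1) and 7 majority (0) instances.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
all_majority = [0] * 10

score = f_measure(y_true, y_pred)
trivial_score = f_measure(y_true, all_majority)
trivial_accuracy = sum(t == p for t, p in zip(y_true, all_majority)) / len(y_true)
```

In this example the classifier that always predicts the majority class reaches an accuracy of 0.7 but an F-measure of zero.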

Limitations of Ensemble Methods

While these approaches show good results on some unbalanced data sets, there are some limitations which this thesis tries to address. Most works use ANNs, decision trees and NB as the base classifiers [159][168][40][170][37]. In addition, these works rely on sampling techniques either to create balanced bootstrap samples in bagging [159][123][124][170], or to re-balance the training data when diversity measures (such as NCL) are used in fitness evaluation [168].

GP has shown much success in evolving reliable and accurate classifiers for traditional single-predictor classification [176][157][60][141][57][77]. However, there is very little related work, particularly in GP, which does not rely on sampling techniques (for cost adjustment) when data is unbalanced. This thesis uses the original unbalanced training data directly in the GP learning process (using the EMO component for cost adjustment), without the need to first artificially re-balance the data. This allows us to concentrate on the cost-adjustment and diversity measures in GP, and removes the dependence on a sampling algorithm.

There has also been very little work which focuses on adapting the ensemble diversity measures in fitness to account for the skewed class distributions [168]. As discussed, most related works measure diversity relative to all examples, irrespective of class, as the classes are first re-balanced using sampling [168]. While the work in [168] measures NCL separately for the two classes, diversity on the majority class is ignored. In contrast, this thesis compares two ensemble-diversity measures in the fitness function (NCL and PFC) where diversity is calculated separately for each class using the original unbalanced training data, and diversity on the minority and the majority classes then contributes equally in fitness evaluation. This is to ensure that the ensembles are equally diverse on both classes.
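The effect of letting each class contribute equally can be illustrated with a small sketch. It is not the fitness function used in this thesis: it assumes that some per-instance diversity score is already available (for example from NCL or PFC), and the function name `class_balanced_mean` and the toy values are hypothetical.

```python
def class_balanced_mean(scores, labels):
    """Average the scores within each class, then average the class means.

    Each class contributes equally to the result, regardless of how many
    instances it has in the unbalanced training data.
    """
    totals = {}
    counts = {}
    for score, label in zip(scores, labels):
        totals[label] = totals.get(label, 0.0) + score
        counts[label] = counts.get(label, 0) + 1
    class_means = [totals[c] / counts[c] for c in totals]
    return sum(class_means) / len(class_means)


# Hypothetical per-instance diversity scores: 9 majority and 1 minority instance.
scores = [0.2] * 9 + [0.8]
labels = [0] * 9 + [1]

plain = sum(scores) / len(scores)              # dominated by the majority class
balanced = class_balanced_mean(scores, labels)  # both classes weighted equally
```

With these values the plain average is about 0.26, which mostly reflects the majority class, while the class-balanced average is about 0.5.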