
3.3 Relationship Between Diversity and Single-Class Measures

3.4.1 Impact of Diversity on Classification Performance on Artificial Im-

To clearly observe the impact of diversity on balanced and imbalanced data sets and

investigate how the performance measures behave as diversity varies, we build ensembles

show the relationship between Q-statistic and other performance measures, and present

corresponding decision boundary plots to explain the obtained results. The correlations

are further confirmed by discussing another non-pairwise diversity measure – generalized

diversity (GD). Then, we examine if the findings are also applicable to imbalanced data

with a larger training size and a larger feature space.

Experimental Setup

Artificial data is generated from two Gaussian distributions with equal covariance and a

small overlapping area close to the separating line. Three different data sizes are con-

sidered: one class always contains 200 training points, while the size of the other class

is set to 10 (very imbalanced), 50 (imbalanced) and 200 (balanced) respectively. They

are denoted by “200-10”, “200-50” and “200-200”. By applying the same method, a

corresponding testing file is created with 50 points in each class.
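The data setup above can be sketched as follows. This is only an illustration under stated assumptions: the source does not give the class means, the covariance, or the dimensionality, so the two-dimensional points, the identity covariance, the means in `mean_maj` and `mean_min`, and the function name `make_gaussian_data` are all hypothetical.

```python
import random

def make_gaussian_data(n_maj, n_min, seed=0):
    """Draw labelled 2-D points from two Gaussians with equal covariance.

    Assumptions (not from the source): identity covariance and the
    class means below, chosen so the classes overlap only slightly.
    """
    rng = random.Random(seed)
    mean_maj, mean_min = (0.0, 0.0), (3.0, 3.0)  # hypothetical means
    data = []
    for label, mean, n in ((0, mean_maj, n_maj), (1, mean_min, n_min)):
        for _ in range(n):
            point = (rng.gauss(mean[0], 1.0), rng.gauss(mean[1], 1.0))
            data.append((point, label))
    return data

# Three training sets: the majority class always has 200 points.
sizes = [("200-10", 10), ("200-50", 50), ("200-200", 200)]
train_sets = {name: make_gaussian_data(200, n_min, seed=i)
              for i, (name, n_min) in enumerate(sizes)}

# Testing file with 50 points in each class.
test_set = make_gaussian_data(50, 50, seed=99)
```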

As we know, Bagging (Breiman, 1996) achieves diversity through resampling, where

each training subset is kept different from each other. In our experiments, we apply the

Bagging training strategy and manipulate diversity by tuning the sampling rate r%. A

smaller r means that fewer points join the training. Each training subset is less likely to be

similar to each other. Thus, the prediction becomes less stable and a larger diversity degree

is expected (Skurichina et al., 2002). Different from the conventional Bagging, a different

sampling rate is applied to each class of the imbalanced data set in our experiments.

Concretely, the majority class is randomly sampled with replacement at rate r%; the

minority class is randomly sampled with replacement at rate r(1−θ)/θ%, where θ is the imbalance rate as defined before. By doing so, every training subset has a balanced

class distribution. This prevents the classifier from ignoring the minority class, which

would otherwise yield misleading results in our experimental analysis. Some preliminary

experiments showed that, when the training data is highly skewed and no rebalancing is

applied, the recall of the minority class always remains 0, which means that

simple majority vote gives the final decision.

Table 3.3: Bagging-based training strategy (Wang and Yao, 2009b).

Training:

1. Given the training set Z with imbalance rate θ; resampling rate at r%; number of classifiers L.

Zmin is the minority class subset of Z.

Zmaj is the majority class subset of Z.

2. Construct subset Zk by executing the following:

2a. Bootstrap Zmaj at rate r%

and add chosen examples into Zk;

2b. Bootstrap Zmin at rate r(1−θ)/θ%

and add chosen examples into Zk.

3. Train a classifier using Zk.

4. Repeat steps 2 and 3 until k equals L.

Testing on a new example (majority voting):

1. Collect decisions from each classifier.

2. Return the class label that receives the most votes.
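The strategy in Table 3.3 can be sketched in a few lines. This is a minimal sketch, not the authors' implementation: it assumes θ is the fraction of minority-class examples in Z (the source defines θ earlier, outside this section), rounds the subset sizes to integers, and treats the base learner as a hypothetical callable `train_fn` that returns a prediction function (the source uses C4.5 trees). The names `build_subset`, `train_ensemble` and `majority_vote` are illustrative.

```python
import random
from collections import Counter

def build_subset(z_maj, z_min, r, theta, rng):
    """Steps 2a/2b: bootstrap each class at its own rate (in percent)."""
    n_maj = round(len(z_maj) * r / 100)
    n_min = round(len(z_min) * r * (1 - theta) / theta / 100)
    # Sampling with replacement, so r may exceed 100.
    return rng.choices(z_maj, k=n_maj) + rng.choices(z_min, k=n_min)

def train_ensemble(z_maj, z_min, r, n_classifiers, train_fn, seed=0):
    """Steps 2-4: train one classifier per resampled subset."""
    rng = random.Random(seed)
    theta = len(z_min) / (len(z_maj) + len(z_min))  # assumed definition
    return [train_fn(build_subset(z_maj, z_min, r, theta, rng))
            for _ in range(n_classifiers)]

def majority_vote(classifiers, x):
    """Testing: return the label that receives the most votes."""
    votes = Counter(clf(x) for clf in classifiers)
    return votes.most_common(1)[0][0]

# Toy usage with placeholder data (labels: 0 = majority, 1 = minority).
z_maj = [((0.0, float(i)), 0) for i in range(200)]
z_min = [((5.0, float(i)), 1) for i in range(10)]
theta = len(z_min) / (len(z_maj) + len(z_min))
subset = build_subset(z_maj, z_min, r=50, theta=theta, rng=random.Random(1))
```

Under the assumed definition of θ, the minority rate r(1−θ)/θ% applied to the "200-10" set draws as many minority points as majority points, which is why each subset comes out balanced.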

The training method is run 50 times for each setting of sampling rate, and the av-

erages are computed. The sampling rate r% is varied in the range of [3%,1000%]. Ev-

ery ensemble consists of 15 classifiers. C4.5 decision trees are used as the base learner.

For the correlation analysis, we output 10 measures, including overall Q-statistic (Q),

minority-class Q-statistic (Qmin), majority-class Q-statistic (Qmaj), minority-class recall

(Rmin), minority-class precision (Pmin), minority-class F-measure (Fmin), majority-class

recall (Rmaj), majority-class precision (Pmaj), majority-class F-measure (Fmaj) and over-

all accuracy (Povr). Qmin and Qmaj assess the diversity degree of an ensemble only within

the minority and majority data subsets respectively.
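As an illustration of how Q, Qmin and Qmaj might be computed, the sketch below uses the standard pairwise Q-statistic, Q = (N11·N00 − N01·N10) / (N11·N00 + N01·N10), averaged over all classifier pairs, where N11 and N00 count the examples both classifiers get right or both get wrong. The function names, the `oracle` representation (a correct/incorrect flag per classifier and example) and the value 0.0 returned for a zero denominator are assumptions made for this example.

```python
from itertools import combinations

def pairwise_q(correct_a, correct_b):
    """Q-statistic for two classifiers given per-example correctness flags."""
    n11 = n00 = n10 = n01 = 0
    for a, b in zip(correct_a, correct_b):
        if a and b:
            n11 += 1
        elif not a and not b:
            n00 += 1
        elif a:
            n10 += 1
        else:
            n01 += 1
    denom = n11 * n00 + n01 * n10
    # Assumption: report 0.0 when Q is undefined (zero denominator).
    return (n11 * n00 - n01 * n10) / denom if denom else 0.0

def ensemble_q(oracle, labels=None, target=None):
    """Average pairwise Q; restrict to one class when `target` is given.

    oracle[k][j] is True if classifier k is correct on example j.
    """
    keep = [j for j in range(len(oracle[0]))
            if target is None or labels[j] == target]
    rows = [[row[j] for j in keep] for row in oracle]
    pairs = list(combinations(rows, 2))
    return sum(pairwise_q(a, b) for a, b in pairs) / len(pairs)
```

Calling `ensemble_q(oracle)` gives the overall value, while `ensemble_q(oracle, labels, target=1)` restricts the computation to the examples of one class, in the spirit of Qmin and Qmaj. Lower values indicate a more diverse ensemble.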

Correlation Analysis of Q-statistic

We conduct correlation analysis to study the relationship between diversity and single-

class performance. We compute Spearman’s rank correlation coefficient between overall

dependence between two variables, and insensitive to how the measures are scaled. It

ranges in [−1,1], where 1 (or -1) indicates a perfect monotone increasing (or decreasing) relationship. Table 3.4 summarizes the correlation coefficients in two sampling ranges. A

significant correlation is indicated in boldface at the 95% confidence level.
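For reference, Spearman's rank correlation can be computed as the Pearson correlation of the ranks. The sketch below is a plain-Python illustration with average ranks for ties; the function names are hypothetical, and it does not include the significance test used for the boldface entries.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)  # undefined if either input is constant
```

Because only ranks enter the computation, any monotone rescaling of a measure leaves the coefficient unchanged.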

Table 3.4: Rank correlation coefficient (in %) between overall diversity Q and 9 measures on 3 artificial data sets. Numbers in boldface indicate significant correlations.

r∈[3,100]      Qmin  Qmaj  Rmin  Pmin  Fmin  Rmaj  Pmaj  Fmaj  Povr
200-200          93    52    27   -54    -9   -50    26   -22   -16
200-50           97    98   -98    54   -98    58   -98   -98   -98
200-10            6    98   -38    33   -36    33   -36   -36   -36

r∈[100,1000]   Qmin  Qmaj  Rmin  Pmin  Fmin  Rmaj  Pmaj  Fmaj  Povr
200-200          98    89   -79   -89   -98   -81   -88   -98   -96
200-50          100    93   -81    58   -81    58   -81   -81   -81
200-10           99    27   -81    77   -61    77   -55   -55   -61

First, between Q and Qmin/Qmaj, we observe that all coefficients from the three data

sets are positive, which shows that ensemble diversity for each class has the same changing

tendency as the overall diversity, regardless of whether the data set is balanced. On the one

hand, it suggests that increasing the classification diversity over the whole data set

also increases diversity over each class. On the other hand, it suggests that the diversity

measure Q-statistic is not sensitive to imbalanced distributions.

For the balanced data set “200-200”, the overall accuracy and most single-class measures

do not present clear correlations with Q when r∈[3,100]. In this range, increasing diversity does not necessarily lead to better performance, due to the mutual effect of

individual accuracy and diversity. When r varies in [100,1000] with a low diversity range,

all 7 performance measures have strong negative correlations with Q. It suggests that

diversity is beneficial to both classes and the overall accuracy. This observation agrees

with the mathematical relations under the independence assumption in section 3.3.1. It

corresponds to the situation 1 in Table 3.2. Generally speaking, when ensemble diversity

is small, increasing diversity improves the classification performance of both classes in the

Imbalanced data sets “200-50” and “200-10” present similar impacts of diversity on

each performance measure and more significant correlations than the balanced data set.

Overall accuracy Povr gets higher in both ranges of r as the ensemble becomes more diverse.

Single-class measures behave differently between classes. With respect to the minority

class, Rmin and Fmin have significant negative correlations with Q; Pmin has a significant

positive correlation with Q. It implies that increasing diversity can find more minority

class examples (i.e. better recall) but lose some classification precision, due to the trade-off

between recall and precision. Better F-measure indicates that the improvement of recall

is greater than the reduction of precision. The observation corresponds to the situation 2

in Table 3.2. As to the majority class, Rmaj has a significant positive correlation with Q;

Pmaj and Fmaj have significant negative correlations with Q. It means that the

majority-class recall gets smaller along with the increase of diversity, but precision and F-measure

are improved. The measure behaviours of the majority class correspond to the situation 3

in Table 3.2. It implies the performance trade-off between minority and majority classes.
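The statement that a better F-measure reflects a recall gain outweighing a precision loss can be checked with a small worked example. The sketch assumes the balanced F-measure (harmonic mean of precision and recall), and the before/after values are made up for illustration; they are not results from the experiments.

```python
def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical minority-class values before and after increasing diversity:
# recall rises from 0.40 to 0.70 while precision drops from 0.90 to 0.80.
f_before = f_measure(precision=0.90, recall=0.40)
f_after = f_measure(precision=0.80, recall=0.70)

# The recall gain outweighs the precision loss, so the F-measure improves.
improved = f_after > f_before
```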

Table 3.5 summarizes the measure tendencies for both balanced and imbalanced cases

along with the decrease of Q (i.e. the increase of ensemble diversity) and their corre-

sponding situations in Table 3.2.

Table 3.5: Changing directions of single-class measures and overall accuracy by increasing diversity on artificial data sets and their corresponding situations in Table 3.2.

Class                        Povr   R   P   F   Situation
Single-class (200-200)        ↑     ↑   ↑   ↑   (1) (good pattern)
Minority (200-50, 200-10)     ↑     ↑   ↓   ↑   (2) (good pattern)
Majority (200-50, 200-10)     ↑     ↓   ↑   ↑   (3)

Considering the learning objective in class imbalance learning and the importance of

the minority class, we conclude that diversity has a positive effect in classifying these

and precision with more minority class examples identified. Moreover, the different be-

haviours of recall between the minority and majority classes agree with the two arguments

in section 3.2.3.

Decision Boundary Analysis

To show the radical effects of diversity visually, we produce classification boundary plots

for data sets “200-200” and “200-10” at three specific sampling rates in Fig. 3.2: r = 1000

with a low diversity degree, r = 100 with a moderate diversity degree, and r = 20 with a

high diversity degree.