

5.2 AdaBoost.NC for Two-Class Imbalance Problems

5.2.4 The Analysis of Parameter Effects

The experimental results so far show that, to tackle class imbalance problems effectively, AdaBoost.NC requires the use of random oversampling, and a large λ value is preferable. In the current section, we provide an analysis of the impact of the oversampling rate and λ across different base learners and data sets. The analysis suggests that, even though AdaBoost.NC has two parameters to be tuned (oversampling rate and λ), it is robust to the choice of oversampling rate; the choice of λ does not depend on the imbalance rate of training data. So, the substantial advantage of our approach still holds.

We perform a mixed (split-plot) factorial analysis of variance (ANOVA) (Montgomery, 2004). The factors analyzed are the oversampling rate, λ, the base learner and the imbalance rate of training data. A mixed design is necessary because the oversampling rate, λ and the base learner are within-subject factors (their levels vary within a data set), whereas the imbalance rate is a between-subjects factor (its levels vary depending on the data set being used). The factorial design is a commonly adopted method for effect analysis, when there is more than one factor. It allows the effects of a factor to be

are considered for each factor. In our case, the artificial data sets “200-50” and “200-10” are used as training data with different imbalance rates. The oversampling rate, defined as the size ratio of the minority class to the majority class after oversampling is applied, is set to 100% and 150%. λ is set to 2 (conservative level) and 9 (aggressive level) as before. The C4.5 decision tree and the MLP neural network are chosen as the base learners. The effects of the factors on both AUC and minority-class recall are analyzed. These performance measures are referred to as responses in the context of ANOVA. AdaBoost.NC composed of 51 classifiers is run 30 times for each combination of factor levels.
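The oversampling rate defined above can be illustrated with a minimal sketch of random oversampling. It assumes a two-class problem and assumes that “200-50” denotes 200 majority and 50 minority examples; the function name `random_oversample` and the synthetic Gaussian features are hypothetical and are not taken from the experiments.

```python
import numpy as np

def random_oversample(X, y, minority_label, rate, rng):
    """Replicate randomly chosen minority examples until
    n_minority / n_majority reaches `rate` (1.0 = 100%, 1.5 = 150%).
    Assumes a two-class problem: every non-minority example is majority."""
    X = np.asarray(X)
    y = np.asarray(y)
    minority_idx = np.flatnonzero(y == minority_label)
    n_majority = len(y) - len(minority_idx)
    target = int(round(rate * n_majority))
    # Only replicate; never remove examples if the target is already met.
    n_extra = max(target - len(minority_idx), 0)
    extra = rng.choice(minority_idx, size=n_extra, replace=True)
    keep = np.concatenate([np.arange(len(y)), extra])
    return X[keep], y[keep]

rng = np.random.default_rng(0)
# Hypothetical stand-in for a "200-50" training set (assumed class sizes).
X = rng.normal(size=(250, 2))
y = np.array([0] * 200 + [1] * 50)

for rate in (1.0, 1.5):
    X_os, y_os = random_oversample(X, y, minority_label=1, rate=rate, rng=rng)
    print(rate, int((y_os == 0).sum()), int((y_os == 1).sum()))
```

Because the added examples are exact copies of existing ones, no new feature values are introduced; only the class proportions seen by the learner change.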

The ANOVA results are presented in Table 5.5, including the p-value and eta-squared (η²). A p-value smaller than 0.05 indicates a significant effect, i.e. the null hypothesis is rejected at the 5% significance level. η² is a measure in the range [0,1] describing the effect size (Pierce et al., 2004). The larger the η², the greater the effect of the factor.
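As an illustration of how an effect size of this kind is obtained, the sketch below computes the classical eta-squared of a single factor as the between-level sum of squares divided by the total sum of squares. This is an assumption about the definition: the values in Table 5.5 come from a mixed factorial ANOVA, which may report a partial variant. The function name and the AUC values are hypothetical.

```python
import numpy as np

def eta_squared_main_effect(values, levels):
    """Classical eta-squared for one factor: SS_between / SS_total."""
    values = np.asarray(values, dtype=float)
    levels = np.asarray(levels)
    grand_mean = values.mean()
    ss_total = ((values - grand_mean) ** 2).sum()
    ss_between = 0.0
    for lv in np.unique(levels):
        group = values[levels == lv]
        # Each level contributes its size times its squared mean deviation.
        ss_between += len(group) * (group.mean() - grand_mean) ** 2
    return ss_between / ss_total

# Hypothetical AUC responses at two lambda levels (not the thesis data).
auc = [0.80, 0.82, 0.81, 0.90, 0.91, 0.92]
lam = [2, 2, 2, 9, 9, 9]
print(eta_squared_main_effect(auc, lam))
```

A value near 1 means that almost all of the variation in the response is explained by the factor, while a value near 0 means that the level means barely differ relative to the overall spread.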

Table 5.5: ANOVA results: factor effects and interaction effects on AUC and minority-class recall involving the oversampling rate (abbr. over) and λ in terms of the base learner (abbr. learner) and training data (abbr. data). The symbol “*” indicates the interaction between two factors.

                      AUC               Minority-class recall
Factor                p-val    η²       p-val    η²
over                  .955     .000     .087     .002
over*learner          .385     .000     .207     .001
over*data             .166     .000     .062     .003
over*learner*data     .952     .000     .034     .004
λ                     .032     .006     .000     .176
λ*learner             .000     .140     .000     .286
λ*data                .233     .000     .010     .005
λ*learner*data        .003     .006     .025     .004

The results show that the oversampling rate does not have a significant effect on AUC and minority-class recall in general. The effects of “over” and interactions involving “over”

the interaction between “over”, “learner” and “data” has a significant effect on minority-class recall, its effect size is very small (η² = 0.004). The weak interactions here imply that the oversampling rate has very little effect on AdaBoost.NC regardless of which base learner and training data are used. This is reasonable behaviour, as random oversampling is the most conservative sampling technique: it neither loses nor generates any data information. Data replication does not change the decision boundaries. As long as the minority class draws attention from the learning algorithm comparable to that drawn by the majority class, the oversampling rate should not affect AdaBoost.NC’s performance significantly.

Different observations are obtained on λ. It has a significant effect on both AUC and minority-class recall when varying from low to high. In particular, the interaction effect of λ*learner appears to be quite strong, with much higher η² values than the others (0.140 on AUC and 0.286 on minority-class recall). This means that the impact of λ depends on the base learner. However, the interaction effect of λ*data is rather weak: it is not significant on AUC and has a very small η² value (0.005) on minority-class recall. So, the effect of λ is not affected much by the training data. For a clear understanding of λ’s effect, we further plot the marginal means of performance for λ*learner and λ*data in Fig. 5.11. The plots for λ*learner*data are similar to the two-factor ones due to its low effect size, and are omitted here for space considerations.
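To make the notion of marginal means concrete, the following sketch averages a response over every factor that is not being plotted. The helper `marginal_means`, the record layout and all AUC values are hypothetical and are chosen only for illustration; they are not the experimental results.

```python
from collections import defaultdict

def marginal_means(records, factors, response):
    """Average `response` over all records that share the same levels
    of `factors`, collapsing every other factor."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for rec in records:
        key = tuple(rec[f] for f in factors)
        sums[key] += rec[response]
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}

# Hypothetical AUC values, one record per (lam, learner, data) cell.
records = [
    {"lam": 2, "learner": "tree", "data": "200-50", "auc": 0.90},
    {"lam": 2, "learner": "tree", "data": "200-10", "auc": 0.80},
    {"lam": 9, "learner": "tree", "data": "200-50", "auc": 0.94},
    {"lam": 9, "learner": "tree", "data": "200-10", "auc": 0.84},
    {"lam": 2, "learner": "nn", "data": "200-50", "auc": 0.98},
    {"lam": 2, "learner": "nn", "data": "200-10", "auc": 0.92},
    {"lam": 9, "learner": "nn", "data": "200-50", "auc": 0.97},
    {"lam": 9, "learner": "nn", "data": "200-10", "auc": 0.91},
]

# Marginal means for lam*learner: the data factor is averaged out.
by_learner = marginal_means(records, ("lam", "learner"), "auc")
for (lam, learner), mean in sorted(by_learner.items()):
    print(f"lambda={lam} learner={learner} mean AUC={mean:.3f}")
```

Plotting these means against λ, with one line per base learner, gives the kind of two-factor plot described above; passing `("lam", "data")` instead would give the λ*data counterpart.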

Fig. 5.11(a) shows different effects of λ in AdaBoost.NC between base learners. As λ varies from low to high, the tree-based AdaBoost.NC improves AUC and minority-class recall, whereas the NN-based AdaBoost.NC reduces them. The NN-based AdaBoost.NC presents better AUC than the tree-based one at both λ values. The tree-based AdaBoost.NC with a large λ improves minority-class recall greatly, which shows even better ability to recognize minority class examples than the NN-based AdaBoost.NC.

To explain why the effectiveness of AdaBoost.NC varies between base learners, a possible reason could be that a NN is less sensitive to the change of number of training

[Fig. 5.11: marginal means of performance; panels (a) λ*learner and (b) λ*data]

Khoshgoftaar et al., 2010) and confirmed in our experiments. Neural networks can be thought of as a less global approach to partitioning the data space than decision trees, since they are modified by each data point or batch of data points sequentially and repeatedly through the error function (Japkowicz and Stephen, 2002). In contrast, a decision tree is grown on the whole training set by counting the class frequency at each node. We conjecture that the sensitivity of a classifier to the characteristics of training examples affects the effectiveness of AdaBoost.NC in handling imbalanced data sets. This point will be further discussed in the next section by providing more empirical evidence.

From the view of training data, the two response lines in each plot of Fig. 5.11(b) are nearly parallel to each other, which further confirms the weak interaction between λ and the imbalance rate of training data previously indicated by the low η² value. The result is reasonable, because λ does not operate on training data directly, so its impact should not depend greatly on the given data set.
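The link between parallel response lines and a weak interaction can be checked numerically with a difference of differences on a 2×2 table of cell means. The sketch below is only an illustration: `interaction_contrast` is a hypothetical helper and all cell means are made-up numbers, not values read from Fig. 5.11.

```python
def interaction_contrast(cell_means, a_levels, b_levels):
    """Difference of differences for a 2x2 table of cell means.
    `cell_means` maps (a_level, b_level) to a mean response.
    The result is zero when the two response lines are parallel."""
    a0, a1 = a_levels
    b0, b1 = b_levels
    change_at_b0 = cell_means[(a1, b0)] - cell_means[(a0, b0)]
    change_at_b1 = cell_means[(a1, b1)] - cell_means[(a0, b1)]
    return change_at_b1 - change_at_b0

# Hypothetical recall means: lambda shifts both data sets by the same amount.
parallel = {
    (2, "200-50"): 0.70, (9, "200-50"): 0.80,
    (2, "200-10"): 0.50, (9, "200-10"): 0.60,
}
# Hypothetical recall means: lambda helps one learner and hurts the other.
crossing = {
    (2, "tree"): 0.60, (9, "tree"): 0.90,
    (2, "nn"): 0.80, (9, "nn"): 0.70,
}

print(interaction_contrast(parallel, (2, 9), ("200-50", "200-10")))
print(interaction_contrast(crossing, (2, 9), ("tree", "nn")))
```

A contrast close to zero corresponds to nearly parallel lines, whereas a contrast that is large relative to the main effects corresponds to lines that diverge or cross.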

In summary, the ANOVA results suggest that AdaBoost.NC is insensitive to the oversampling rate if the minority class draws attention from the learning algorithm roughly equal to that drawn by the majority class. λ is the main factor that decides the generalization of AdaBoost.NC. Its effect depends on the base learner. When the decision tree is used as the base learner, a large λ is recommended. Its effect is not affected much by the training data we tested on, because λ does not work on the data level. As a remarkable improvement over other resampling-based ensemble methods, AdaBoost.NC simply learns from the original imbalanced data without generating new minority class data or removing majority class data. It reduces the dependence of the algorithm on resampling techniques and training data, which is supported by the ANOVA experiment in this section. Remaining questions for the future are the behaviour of AdaBoost.NC using other base learning

5.3 Comparisons with Other NCL Algorithms and