5.2 AdaBoost.NC for Two-Class Imbalance Problems
5.2.4 The Analysis of Parameter Effects
The experimental results so far show that, to tackle class imbalance problems effectively,
AdaBoost.NC requires random oversampling, and a large λ value is preferable.
In this section, we analyze the impact of the oversampling rate and λ across different
base learners and data sets. The analysis suggests that, although AdaBoost.NC has two
parameters to tune (the oversampling rate and λ), it is robust to the choice of
oversampling rate, and the choice of λ does not depend on the imbalance rate of the
training data. Therefore, the substantial advantage of our approach still holds.
We perform a mixed (split-plot) factorial analysis of variance (ANOVA) (Montgomery,
2004). The factors analyzed are the oversampling rate, λ, the base learner and the
imbalance rate of the training data. A mixed design is necessary because the oversampling
rate, λ and the base learner are within-subject factors (their levels vary within a data
set), whereas the imbalance rate is a between-subjects factor (its levels vary depending
on the data set being used). The factorial design is a commonly adopted method for
effect analysis when there is more than one factor. It allows the effects of a factor to be
estimated at several levels of the other factors. Two levels are considered for each factor.
In our case, the artificial data sets “200-50” and “200-10” are used as training data with
different imbalance rates. The oversampling rate is set to 100% and 150%, where the rate
is defined as the size ratio of the minority class to the majority class after oversampling
is applied. λ is set to 2 (conservative level) and 9 (aggressive level) as before. The C4.5
decision tree and the MLP neural network are chosen as the base learners.
The effects of the factors on both AUC and minority-class recall are analyzed; in the
context of ANOVA, these performance measures are referred to as responses. For each
combination of factor levels, AdaBoost.NC composed of 51 classifiers is run 30 times.
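The size of the experiment follows directly from the design described above: four factors with two levels each give a full factorial grid of cells, and each cell is repeated 30 times. The sketch below only enumerates that grid and counts runs; the dictionary keys are labels chosen for this illustration, and no training is performed.

```python
from itertools import product

# Factor levels taken from the text; the short keys ("over", "lambda",
# "learner", "data") are labels chosen for this sketch only.
FACTORS = {
    "over": ["100%", "150%"],        # oversampling rate (within-subject)
    "lambda": [2, 9],                # λ: conservative vs aggressive (within-subject)
    "learner": ["C4.5", "MLP"],      # base learner (within-subject)
    "data": ["200-50", "200-10"],    # training data / imbalance rate (between-subjects)
}
N_REPEATS = 30      # runs of AdaBoost.NC per combination of factor levels
ENSEMBLE_SIZE = 51  # classifiers per AdaBoost.NC ensemble


def design_cells(factors):
    """Return every combination of factor levels (the full factorial grid)."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]


cells = design_cells(FACTORS)
n_runs = len(cells) * N_REPEATS
n_base_classifiers = n_runs * ENSEMBLE_SIZE
# Within one training data set, only the three within-subject factors vary.
cells_per_dataset = len(cells) // len(FACTORS["data"])
print(len(cells), cells_per_dataset, n_runs, n_base_classifiers)
```

With two levels per factor this yields 16 cells (8 per training data set), 480 ensemble runs and 24,480 trained base classifiers in total.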
The ANOVA results are presented in Table 5.5, including the p-value and eta-squared
(η²) of each effect. A p-value smaller than 0.05 means that the null hypothesis of no
effect is rejected at the 5% significance level, i.e., the effect is significant. η² is a
measure in the range [0, 1] that describes the effect size (Pierce et al., 2004): the larger
the η², the greater the effect of the factor.
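As a rough illustration of what η² measures, the sketch below assumes the classical definition η² = SS_effect / SS_total; the text does not state whether classical or partial η² was reported. For a main effect in a balanced design, grouping all observations by the levels of that factor gives SS_effect. The p-values in Table 5.5 come from F-tests with error terms appropriate to the split-plot design, which this sketch does not compute. The function name and toy numbers are hypothetical.

```python
def eta_squared(groups):
    """Classical eta-squared, SS_effect / SS_total, for one factor.

    `groups` maps each level of the factor to the list of all responses
    (e.g. AUC values) observed at that level. In a balanced design the
    between-level sum of squares computed this way is the main-effect SS.
    """
    values = [y for ys in groups.values() for y in ys]
    grand_mean = sum(values) / len(values)
    ss_total = sum((y - grand_mean) ** 2 for y in values)
    ss_effect = sum(
        len(ys) * (sum(ys) / len(ys) - grand_mean) ** 2
        for ys in groups.values()
    )
    # Guard against a constant response, where the ratio is undefined.
    return ss_effect / ss_total if ss_total > 0 else 0.0


# Toy responses, invented for illustration only (not thesis results).
toy = {"low": [0.80, 0.82, 0.81], "high": [0.90, 0.92, 0.91]}
print(round(eta_squared(toy), 3))
```

Here almost all of the variation lies between the two level means, so η² is close to 1; if both levels had the same mean, η² would be 0.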
Table 5.5: ANOVA results: factor effects and interaction effects on AUC and minority-class recall involving the oversampling rate (abbr. over) and λ in terms of the base learner (abbr. learner) and training data (abbr. data). The symbol “*” indicates the interaction between factors.
                       AUC              Minority-class recall
Factor                 p-val    η²      p-val    η²
over                   .955     .000    .087     .002
over*learner           .385     .000    .207     .001
over*data              .166     .000    .062     .003
over*learner*data      .952     .000    .034     .004
λ                      .032     .006    .000     .176
λ*learner              .000     .140    .000     .286
λ*data                 .233     .000    .010     .005
λ*learner*data         .003     .006    .025     .004
The results show that, in general, the oversampling rate does not have a significant
effect on AUC or minority-class recall. The main effect of “over” and its interactions with
the other factors are not significant at the 5% level, with one exception: although the
interaction between “over”, “learner” and “data” has a significant effect on minority-class
recall (p = 0.034), its effect size is very small (η² = 0.004). These weak effects imply
that the oversampling rate has very little effect on AdaBoost.NC regardless of which
base learner and training data are used. This behaviour is reasonable, as random
oversampling is the most conservative sampling technique: it neither loses nor generates
any data information. Replicating data does not change the decision boundaries. As long
as the minority class receives attention from the learning algorithm comparable to that
of the majority class, the oversampling rate should not affect AdaBoost.NC's performance
significantly.
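To make the oversampling rate concrete, here is a minimal sketch of random oversampling to a target rate, assuming the common variant that draws minority examples uniformly with replacement; the exact implementation used in the experiments is not given in the text, and the function name and toy data are hypothetical. It also shows the point made above: the oversampled minority class contains no example that was not already in the original data.

```python
import random


def random_oversample(minority, majority, rate, seed=0):
    """Randomly replicate minority examples (sampling with replacement) until
    len(minority_out) / len(majority) equals `rate`, e.g. 1.0 for 100% or
    1.5 for 150%. Existing examples are only copied: nothing is removed
    and no synthetic example is created.
    """
    rng = random.Random(seed)
    target = round(rate * len(majority))
    n_extra = max(0, target - len(minority))
    extra = [rng.choice(minority) for _ in range(n_extra)]
    return list(minority) + extra, list(majority)


# Hypothetical toy data: 200 majority and 50 minority points.
majority = [(i, 0) for i in range(200)]
minority = [(i, 1) for i in range(50)]
for rate in (1.0, 1.5):
    new_min, new_maj = random_oversample(minority, majority, rate)
    # rate, minority size, majority size, number of distinct minority points
    print(rate, len(new_min), len(new_maj), len(set(new_min)))
```

Whatever the rate, the set of distinct minority examples stays the same; only their multiplicity, and hence the relative attention the learner pays to the minority class, changes.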
The picture is different for λ. Varying λ from low to high has a significant effect on
both AUC and minority-class recall. In particular, the λ*learner interaction appears to
be quite strong, with much higher η² values than the other effects (0.140 and 0.286).
This means that the impact of λ depends on the base learner. In contrast, the λ*data
interaction is rather weak: it is not significant on AUC and has a very small η² value
(0.005) on minority-class recall. Hence, the effect of λ is largely independent of the
training data. To understand λ's effect more clearly, we plot the marginal means of
performance for λ*learner and λ*data in Fig. 5.11. The plots for λ*learner*data are
similar to the two-factor ones owing to the low effect size of this interaction, and are
omitted here for space considerations.
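The marginal means plotted in Fig. 5.11 are cell means of a response for each pair of levels of two factors, averaged over all other factors and repetitions. A minimal sketch of that computation follows; the record layout, field names and numbers are hypothetical and serve only to illustrate the averaging.

```python
from collections import defaultdict
from statistics import mean


def marginal_means(records, factor_a, factor_b, response):
    """Mean response for every (level of factor_a, level of factor_b) pair,
    averaging over all other factors and repetitions."""
    cells = defaultdict(list)
    for rec in records:
        cells[(rec[factor_a], rec[factor_b])].append(rec[response])
    return {key: mean(values) for key, values in cells.items()}


# Toy records, invented for illustration only (not thesis results).
records = [
    {"lambda": 2, "learner": "tree", "data": "A", "auc": 1.0},
    {"lambda": 2, "learner": "tree", "data": "B", "auc": 3.0},
    {"lambda": 9, "learner": "tree", "data": "A", "auc": 4.0},
    {"lambda": 2, "learner": "nn", "data": "A", "auc": 5.0},
]
print(marginal_means(records, "lambda", "learner", "auc"))
```

Each key of the result is one point in a λ*learner interaction plot; the "data" field is averaged out.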
Fig. 5.11(a) shows that λ affects AdaBoost.NC differently depending on the base learner.
As λ varies from low to high, the tree-based AdaBoost.NC improves in both AUC and
minority-class recall, whereas the NN-based AdaBoost.NC degrades in both. The NN-based
AdaBoost.NC achieves better AUC than the tree-based one at both λ values. With a large
λ, the tree-based AdaBoost.NC improves minority-class recall greatly, showing an even
better ability to recognize minority-class examples than the NN-based AdaBoost.NC.
Fig. 5.11: Plots of the marginal means of performance for (a) λ*learner and (b) λ*data.
To explain why the effectiveness of AdaBoost.NC varies between base learners, one
possible reason is that an NN is less sensitive to changes in the number of training [...]
Khoshgoftaar et al., 2010) and confirmed in our experiments. Neural networks can be
thought of as a less global approach to partitioning the data space than decision trees,
since they are modified by each data point, or batch of data points, sequentially and
repeatedly through the error function (Japkowicz and Stephen, 2002). A decision tree,
with a different training strategy, grows based on the whole training set by counting the
class frequency at each node. We conjecture that the sensitivity of a classifier to the
characteristics of the training examples affects the effectiveness of AdaBoost.NC in
handling imbalanced data sets. This point is discussed further in the next section, with
more empirical evidence.
From the perspective of the training data, the two response lines in each plot of
Fig. 5.11(b) are nearly parallel, which further confirms the weak interaction between λ
and the imbalance rate of the training data indicated by the low η² value. This result is
expected: λ does not operate on the training data directly, so its impact should not
depend strongly on the given data set.
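"Nearly parallel" lines have a simple numerical counterpart in a 2×2 table of marginal means: the interaction contrast, i.e. the difference between the effect of λ at one level of the second factor and its effect at the other. The sketch below computes only this point estimate; whether it is significant is what the ANOVA tests. The function name and toy means are hypothetical.

```python
def interaction_contrast(means, a_levels, b_levels):
    """Difference of differences for a 2x2 table of marginal means.

    means[(a, b)] is the mean response at level a of factor A and level b
    of factor B. The result is zero when the two response lines are exactly
    parallel, i.e. when the effect of A is the same at both levels of B.
    """
    (a_lo, a_hi), (b1, b2) = a_levels, b_levels
    effect_at_b1 = means[(a_hi, b1)] - means[(a_lo, b1)]
    effect_at_b2 = means[(a_hi, b2)] - means[(a_lo, b2)]
    return effect_at_b1 - effect_at_b2


# Toy means, invented for illustration only (not thesis results).
parallel = {(2, "x"): 1.0, (9, "x"): 2.0, (2, "y"): 3.0, (9, "y"): 4.0}
crossing = {(2, "x"): 1.0, (9, "x"): 2.0, (2, "y"): 3.0, (9, "y"): 2.0}
print(interaction_contrast(parallel, (2, 9), ("x", "y")))
print(interaction_contrast(crossing, (2, 9), ("x", "y")))
```

In the first case λ raises the response by the same amount at both levels (contrast 0), as in Fig. 5.11(b); in the second the effect of λ reverses sign, which is the kind of pattern a strong λ*learner interaction produces.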
In summary, the ANOVA results suggest that AdaBoost.NC is insensitive to the
oversampling rate as long as the minority class receives attention from the learning
algorithm comparable to that of the majority class. λ is the main factor that determines
the generalization performance of AdaBoost.NC, and its effect depends on the base
learner: when the decision tree is used as the base learner, a large λ is recommended.
Its effect is not affected much by the training data we tested on, because λ does not
work at the data level. As a remarkable improvement over other resampling-based
ensemble methods, AdaBoost.NC simply learns from the original imbalanced data without
generating new minority-class data or removing majority-class data. It reduces the
dependence of the algorithm on resampling techniques and training data, which is
supported by the ANOVA experiment in this section. Remaining questions for future
work are the behaviour of AdaBoost.NC using other base learning