CS570 Data Mining Classification:
Ensemble Methods
Cengiz Günay
Dept. Math & CS, Emory University
Fall 2013
Some slides courtesy of Han-Kamber-Pei, Tan et al., and Li Xiong
Günay (Emory) Classification: Ensemble Methods Fall 2013 1 / 6
Today
Due today midnight:
Homework #2 – Frequent itemsets
Given today:
Homework #3 – Classification
Today’s menu:
Classification: Ensemble Methods
Ensemble Methods
• Given a data set, generate multiple models and combine the results
• Bagging
• Random Forests
• Boosting
– PAC learning significance
General Idea
Why does it work?
Suppose there are 25 base classifiers
Each classifier has error rate, ε = 0.35
Assume classifiers are independent
Probability that the ensemble classifier makes a wrong prediction:
$$\sum_{i=13}^{25} \binom{25}{i} \varepsilon^{i} (1-\varepsilon)^{25-i} = 0.06$$
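The majority-vote error can be checked numerically. The sketch below sums the binomial tail for 25 independent classifiers with ε = 0.35; the function name `ensemble_error` is an illustrative choice, not something from the slides.

```python
from math import comb


def ensemble_error(n_classifiers: int, eps: float) -> float:
    """Probability that a majority of independent base classifiers are wrong."""
    majority = n_classifiers // 2 + 1  # 13 out of 25
    return sum(
        comb(n_classifiers, i) * eps**i * (1 - eps) ** (n_classifiers - i)
        for i in range(majority, n_classifiers + 1)
    )


p_wrong = ensemble_error(25, 0.35)
print(f"ensemble error: {p_wrong:.3f}")  # roughly 0.06, far below 0.35
```

Trying `ensemble_error(25, 0.5)` or an error rate above 0.5 shows why the base classifiers must be better than random guessing: the ensemble then stops helping, or makes things worse.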
Types of Ensemble Methods
Can be obtained by manipulating:
1 Training set:
– Bagging
– Boosting
2 Input features:
– Random forests
– Multi-objective evolutionary algorithms
– Forward/backward elimination?
3 Class labels:
– Multi-classes
– Active learning
4 Learning algorithm:
– ANNs
– Decision trees
Bagging
• Create a data set by sampling data points with replacement
• Create model based on the data set
• Generate more data sets and models
• Predict by combining votes
– Classification: majority vote
– Numeric prediction (regression): average
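The steps above can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical 1-D decision stump as the base model and a small made-up data set of `(x, label)` pairs; neither is prescribed by the slides.

```python
import random
from collections import Counter


def bootstrap(data, rng):
    """Draw len(data) points from data uniformly, with replacement."""
    return [rng.choice(data) for _ in data]


def train_stump(sample):
    """Hypothetical base model: the single threshold with the fewest errors."""
    best = None
    for t in sorted({x for x, _ in sample}):
        for left, right in ((-1, 1), (1, -1)):
            errors = sum((left if x <= t else right) != y for x, y in sample)
            if best is None or errors < best[0]:
                best = (errors, t, left, right)
    _, t, left, right = best
    return lambda x: left if x <= t else right


def bagging(data, n_models, seed=0):
    """Train one model per bootstrap sample; predict by majority vote."""
    rng = random.Random(seed)
    models = [train_stump(bootstrap(data, rng)) for _ in range(n_models)]

    def predict(x):
        votes = Counter(model(x) for model in models)
        return votes.most_common(1)[0][0]

    return predict


# Illustrative data: no single threshold separates the two classes.
data = [(0.1, 1), (0.2, 1), (0.3, 1), (0.4, -1), (0.5, -1),
        (0.6, -1), (0.7, -1), (0.8, 1), (0.9, 1), (1.0, 1)]
predict = bagging(data, n_models=11)  # odd count avoids tied votes
print([predict(x) for x, _ in data])
```

For numeric prediction, the vote in `predict` would be replaced by the mean of the models' outputs.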
Bagging
Sampling with replacement
Build classifier on each bootstrap sample
Each record has probability 1 – (1 – 1/n)^n of appearing in a given bootstrap sample (about 0.632 for large n)
Original Data        1   2   3   4   5   6   7   8   9  10
Bagging (Round 1)    7   8  10   8   2   5  10  10   5   9
Bagging (Round 2)    1   4   9   1   2   3   2   7   3   2
Bagging (Round 3)    1   8   5  10   5   5   9   6   3   7
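When n records are drawn with replacement, a given record is left out of the sample with probability (1 − 1/n)^n, so it appears at least once with probability 1 − (1 − 1/n)^n. The sketch below compares that closed form with a small simulation; the function names and the number of simulation rounds are illustrative assumptions.

```python
import math
import random


def inclusion_probability(n: int) -> float:
    """P(a given record appears at least once in a bootstrap sample of size n)."""
    return 1 - (1 - 1 / n) ** n


def simulate(n: int, rounds: int, seed: int = 0) -> float:
    """Fraction of bootstrap samples that contain record 0."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(rounds):
        sample = {rng.randrange(n) for _ in range(n)}
        hits += 0 in sample
    return hits / rounds


print(inclusion_probability(10))     # closed form for n = 10
print(simulate(10, rounds=20000))    # empirical estimate, close to the line above
print(1 - math.exp(-1))              # large-n limit, about 0.632
```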
Bagging
Advantages:
– Less overfitting
– Helps when classifier is unstable (has high variance)
Disadvantages:
– Not useful when classifier is stable and has large bias
PAC learning
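• A model of learning in which, with probability at least 1 − δ (confidence), the learner outputs a hypothesis with error at most ε (accuracy), using a number of samples polynomial in 1/ε and 1/δ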
• References:
– L. Valiant. A theory of the learnable.
• http://web.mit.edu/6.435/www/Valiant84.pdf
– D. Haussler. Overview of the Probably Approximately Correct (PAC) Learning Framework
• http://www.cs.iastate.edu/~honavar/pac.pdf
Boosting
• Combine weak learners to form a strong learner, in the PAC-learning sense
• Learn using a weak learner
• Boost the accuracy by reweighting the examples misclassified by the previous weak learner, forcing the next weak learner to focus on the “hard” examples
• Predict by using a weighted combination of the weak learners
– Each learner's weight is determined by its accuracy
Boosting
An iterative procedure that adaptively changes the distribution of the training data by focusing more on previously misclassified records
Initially, all N records are assigned equal weights
Unlike bagging, the weights may change at the end of each boosting round
Boosting
Records that are wrongly classified will have their weights increased
Records that are classified correctly will have their weights decreased
Original Data         1   2   3   4   5   6   7   8   9  10
Boosting (Round 1)    7   3   2   8   7   9   4  10   6   3
Boosting (Round 2)    5   4   9   4   2   5   1   7   4   2
Boosting (Round 3)    4   4   8  10   4   5   4   6   3   4
• Example 4 is hard to classify
• Its weight is increased, therefore it is more likely to be chosen again in subsequent rounds
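The effect of a raised weight on resampling can be seen with a short simulation. The weight values below are hypothetical (record 4 is given five times the weight of the others); they are not the weights behind the table above.

```python
import random
from collections import Counter


def weighted_resample(records, weights, rng):
    """Draw len(records) records with replacement, proportionally to weights."""
    return rng.choices(records, weights=weights, k=len(records))


records = list(range(1, 11))
weights = [1.0] * 10
weights[3] = 5.0  # hypothetical: record 4 was misclassified, so its weight grew

rng = random.Random(0)
counts = Counter()
for _ in range(1000):
    counts.update(weighted_resample(records, weights, rng))

# Record 4 is drawn far more often than any other record.
print(counts.most_common(3))
```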
Boosting
Advantages:
– Focuses on samples that are hard to classify
Sample weights can be used for:
1 Sampling probability
2 Weighting inside the classifier, so that heavier samples count more
AdaBoost:
– Weights each classifier by its importance instead of using a plain vote
– Exponential weight update rules
– But susceptible to overfitting
Example: AdaBoost
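Base classifiers: $C_1, C_2, \ldots, C_T$
Error rate:
$$\varepsilon_i = \frac{1}{N} \sum_{j=1}^{N} w_j \, \delta\big(C_i(x_j) \neq y_j\big)$$
Importance of a classifier:
$$\alpha_i = \frac{1}{2} \ln\left(\frac{1-\varepsilon_i}{\varepsilon_i}\right)$$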
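To get a feel for the importance formula, the sketch below evaluates α for a few error rates. The function name and the chosen error rates are illustrative only.

```python
import math


def classifier_importance(eps: float) -> float:
    """alpha = 1/2 * ln((1 - eps) / eps) for an error rate 0 < eps < 1."""
    return 0.5 * math.log((1 - eps) / eps)


for eps in (0.05, 0.25, 0.5, 0.75):
    print(eps, round(classifier_importance(eps), 3))
```

The importance is large when the error rate is near 0, exactly zero at ε = 0.5 (random guessing), and negative above 0.5.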
Example: AdaBoost
Weight update:
$$w_i^{(j+1)} = \frac{w_i^{(j)}}{Z_j} \begin{cases} \exp(-\alpha_j) & \text{if } C_j(x_i) = y_i \\ \exp(\alpha_j) & \text{if } C_j(x_i) \neq y_i \end{cases}$$
where $Z_j$ is the normalization factor
If any intermediate round produces an error rate higher than 50%, the weights are reverted back to 1/n and the resampling procedure is repeated
Classification:
$$C^*(x) = \arg\max_y \sum_{j=1}^{T} \alpha_j \, \delta\big(C_j(x) = y\big)$$
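The error rate, importance, weight update and final vote fit together as in the sketch below. It is a minimal illustration under several assumptions that are not in the slides: the weak learner is a hypothetical 1-D decision stump that uses the weights directly (no resampling), the weights are kept normalized to sum to 1 so the weighted error is simply the total weight of the misclassified records, and ε is clamped away from 0 to avoid a division by zero. Because this stump never exceeds 50% weighted error, the reset-to-1/n rule is omitted.

```python
import math


def train_stump(xs, ys, w):
    """Hypothetical weak learner: threshold with the lowest weighted error."""
    best = None
    for t in sorted(set(xs)):
        for left, right in ((-1, 1), (1, -1)):
            err = sum(wi for x, y, wi in zip(xs, ys, w)
                      if (left if x <= t else right) != y)
            if best is None or err < best[0]:
                best = (err, t, left, right)
    err, t, left, right = best
    return err, (lambda x: left if x <= t else right)


def adaboost(xs, ys, rounds):
    n = len(xs)
    w = [1 / n] * n                      # equal initial weights
    ensemble = []                        # pairs (alpha_j, C_j)
    for _ in range(rounds):
        eps, clf = train_stump(xs, ys, w)
        eps = min(max(eps, 1e-10), 1 - 1e-10)   # assumption: guard against log(0)
        alpha = 0.5 * math.log((1 - eps) / eps)
        # Decrease weights of correct records, increase weights of wrong ones.
        w = [wi * math.exp(-alpha if clf(x) == y else alpha)
             for x, y, wi in zip(xs, ys, w)]
        z = sum(w)                       # normalization factor Z_j
        w = [wi / z for wi in w]
        ensemble.append((alpha, clf))
    return ensemble


def classify(ensemble, x, labels=(-1, 1)):
    """Pick the label with the largest total importance among agreeing classifiers."""
    return max(labels, key=lambda y: sum(a for a, c in ensemble if c(x) == y))


# Illustrative data: a single stump misclassifies at least three records.
xs = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
ys = [1, 1, 1, -1, -1, -1, -1, 1, 1, 1]
ensemble = adaboost(xs, ys, rounds=3)
print([classify(ensemble, x) for x in xs])
```

On this data one stump alone gets three records wrong, while the weighted combination of three stumps classifies all ten training records correctly.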
(C) Vipin Kumar, Parallel Issues in Data Mining, V ECPAR 2002
Illustrating AdaBoost
[Figure: data points for training, with the initial weights for each data point]
Random Forests
• Sample a data set with replacement
• Select m variables at random from p variables
• Create a tree
• Similarly create more trees
• Combine the results
• Reference:
– Hastie, Tibshirani, Friedman, The Elements of Statistical Learning, Chapter 15
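The sampling steps above can be sketched as follows. To stay short, this illustration uses depth-1 trees (stumps) as a stand-in for fully grown trees, so the feature subset is drawn once per tree, which here coincides with drawing it at the single split. The data set, the function names, and the default m = log2(p) + 1 are assumptions made for the example.

```python
import math
import random
from collections import Counter


def train_stump(rows, labels, features):
    """Depth-1 tree restricted to `features` (stand-in for a full tree)."""
    best = None
    for f in features:
        for t in sorted({r[f] for r in rows}):
            for left, right in ((0, 1), (1, 0)):
                err = sum((left if r[f] <= t else right) != y
                          for r, y in zip(rows, labels))
                if best is None or err < best[0]:
                    best = (err, f, t, left, right)
    _, f, t, left, right = best
    return lambda r: left if r[f] <= t else right


def random_forest(rows, labels, n_trees, m=None, seed=0):
    rng = random.Random(seed)
    p = len(rows[0])
    if m is None:
        m = min(p, int(math.log2(p)) + 1)    # assumed default subset size
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]   # sample with replacement
        feats = rng.sample(range(p), m)                  # m of the p variables
        trees.append(train_stump([rows[i] for i in idx],
                                 [labels[i] for i in idx], feats))

    def predict(r):
        # Combine the trees by majority vote.
        return Counter(tree(r) for tree in trees).most_common(1)[0][0]

    return predict


# Hypothetical data: 4 features, only feature 0 determines the label.
data_rng = random.Random(1)
rows = [[data_rng.random() for _ in range(4)] for _ in range(60)]
labels = [int(r[0] > 0.5) for r in rows]

predict = random_forest(rows, labels, n_trees=51)
accuracy = sum(predict(r) == y for r, y in zip(rows, labels)) / len(rows)
print(f"training accuracy: {accuracy:.2f}")
```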
Random Forests
Only for decision trees
Advantages:
– Lowers generalization error
– Uses randomization in tree construction: number of randomly selected features = log2(d) + 1, where d is the number of input features
– Equivalent accuracy to AdaBoost, but faster
See table in Tan et al p. 294 for comparison of ensemble methods.