
(1)

CS570 Data Mining Classification:

Ensemble Methods

Cengiz Günay

Dept. Math & CS, Emory University

Fall 2013

Some slides courtesy of Han-Kamber-Pei, Tan et al., and Li Xiong

Günay (Emory) Classification: Ensemble Methods Fall 2013 1 / 6

(2)

Today

Due today midnight:

Homework #2 – Frequent itemsets

Given today:

Homework #3 – Classification

Today’s menu:

Classification: Ensemble Methods

(3)

Ensemble Methods

• Given a data set, generate multiple models and combine the results

• Bagging

• Random Forests

• Boosting

– PAC learning significance

(4)

General Idea

(5)

Why does it work?

Suppose there are 25 base classifiers

Each classifier has error rate, ε = 0.35

Assume classifiers are independent

Probability that the ensemble classifier makes a wrong prediction:

$$\sum_{i=13}^{25} \binom{25}{i}\, \varepsilon^{i} (1-\varepsilon)^{25-i} = 0.06$$
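The figure above can be checked numerically. The following is a minimal sketch, assuming a simple majority vote over independent base classifiers; the function name `ensemble_error` is illustrative and not from the slides.

```python
from math import comb


def ensemble_error(n_classifiers: int, eps: float) -> float:
    """Probability that a majority of independent base classifiers are wrong."""
    majority = n_classifiers // 2 + 1  # 13 when there are 25 classifiers
    return sum(
        comb(n_classifiers, i) * eps**i * (1 - eps) ** (n_classifiers - i)
        for i in range(majority, n_classifiers + 1)
    )


# 25 base classifiers, each with error rate 0.35
print(round(ensemble_error(25, 0.35), 2))  # 0.06
```

With ε = 0.5 the same sum gives 0.5, so the ensemble only helps when each base classifier does better than random guessing.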

(9)

Types of Ensemble Methods

Can be obtained by manipulating:

1 Training set:

– Bagging

– Boosting

2 Input features:

– Random forests

– Multi-objective evolutionary algorithms

– Forward/backward elimination?

3 Class labels:

– Multi-classes

– Active learning

4 Learning algorithm:

– ANNs

– Decision trees

(10)

Bagging

• Create a data set by sampling data points with replacement

• Create model based on the data set

• Generate more data sets and models

• Predict by combining votes

– Classification: majority vote

– Numeric prediction: average
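The steps above can be sketched in a few lines. This is only an illustration under stated assumptions: the base learner `train_stump` (a one-dimensional threshold classifier), the toy data, and the choice of 11 models are all hypothetical and not taken from the slides.

```python
import random
from collections import Counter


def train_stump(points):
    """Fit a 1-D threshold classifier (hypothetical base learner)."""
    best = None
    for t in sorted({x for x, _ in points}):
        for left, right in ((0, 1), (1, 0)):
            errors = sum((left if x <= t else right) != y for x, y in points)
            if best is None or errors < best[0]:
                best = (errors, t, left, right)
    _, t, left, right = best
    return lambda x: left if x <= t else right


def bagging(points, n_models, rng):
    """Train one model per bootstrap sample (same size, drawn with replacement)."""
    models = []
    for _ in range(n_models):
        sample = [rng.choice(points) for _ in points]
        models.append(train_stump(sample))
    return models


def predict(models, x):
    """Classification: majority vote over the individual models."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]


data = [(0.1, 0), (0.2, 0), (0.3, 0), (0.4, 0),
        (0.6, 1), (0.7, 1), (0.8, 1), (0.9, 1)]
models = bagging(data, n_models=11, rng=random.Random(0))
print(predict(models, 0.05), predict(models, 0.95))
```

For numeric prediction, the vote in `predict` would be replaced by the mean of the individual model outputs.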

(11)

Bagging

Sampling with replacement

Build classifier on each bootstrap sample

Each record has probability 1 - (1 - 1/n)^n of being selected at least once in a bootstrap sample of size n (about 0.632 for large n)

Original Data       1   2   3   4   5   6   7   8   9  10
Bagging (Round 1)   7   8  10   8   2   5  10  10   5   9
Bagging (Round 2)   1   4   9   1   2   3   2   7   3   2
Bagging (Round 3)   1   8   5  10   5   5   9   6   3   7
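A given record is missed by one draw with probability 1 - 1/n, so it appears at least once in n draws with replacement with probability 1 - (1 - 1/n)^n. The sketch below evaluates this for n = 10 and counts the distinct records in the three rounds of the table; the variable names are illustrative.

```python
import math


def inclusion_probability(n: int) -> float:
    """Chance that a given record appears at least once in a bootstrap sample of size n."""
    return 1 - (1 - 1 / n) ** n


rounds = [
    [7, 8, 10, 8, 2, 5, 10, 10, 5, 9],
    [1, 4, 9, 1, 2, 3, 2, 7, 3, 2],
    [1, 8, 5, 10, 5, 5, 9, 6, 3, 7],
]
distinct_fractions = [len(set(r)) / len(r) for r in rounds]

print(round(inclusion_probability(10), 3))  # 0.651
print(round(1 - 1 / math.e, 3))             # 0.632, the limit for large n
print(distinct_fractions)                   # [0.6, 0.6, 0.8]
```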

(12)

Bagging

Advantages:

Less overfitting

Helps when classifier is unstable (has high variance)

Disadvantages:

Not useful when classifier is stable and has large bias

(13)

PAC learning

• Model defining learning with given accuracy and confidence using polynomial sample complexity

• References:

– L. Valiant. A theory of the learnable.

• http://web.mit.edu/6.435/www/Valiant84.pdf

– D. Haussler. Overview of the Probably Approximately Correct (PAC) Learning Framework.

• http://www.cs.iastate.edu/~honavar/pac.pdf

(14)

Boosting

• Combine weak learners to form a strong learner in the PAC learning sense

• Learn using a weak learner

• Boost the accuracy by reweighting the examples misclassified by the previous weak learner and forcing the next weak learner to focus on the “hard” examples

• Predict by using a weighted combination of the weak learners

– Weight is determined by their accuracy

(15)

Boosting

An iterative procedure to adaptively change distribution of training data by focusing more on previously misclassified records

Initially, all N records are assigned equal weights

Unlike bagging, the record weights may change at the end of each boosting round

(16)

Boosting

Records that are wrongly classified will have their weights increased

Records that are classified correctly will have their weights decreased

Original Data        1   2   3   4   5   6   7   8   9  10
Boosting (Round 1)   7   3   2   8   7   9   4  10   6   3
Boosting (Round 2)   5   4   9   4   2   5   1   7   4   2
Boosting (Round 3)   4   4   8  10   4   5   4   6   3   4

• Example 4 is hard to classify

• Its weight is increased, therefore it is more likely to be chosen again in subsequent rounds
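When the weights are used as sampling probabilities, each round's training set can be drawn as below. This is a rough sketch: the weight of 4.0 for record 4 is a made-up value for illustration, and `resample` is a hypothetical helper built on the standard library's `random.choices`.

```python
import random


def resample(records, weights, rng):
    """Draw len(records) records with replacement, proportionally to their weights."""
    return rng.choices(records, weights=weights, k=len(records))


records = list(range(1, 11))
weights = [1.0] * 10
weights[3] = 4.0  # hypothetical: record 4 was misclassified, so its weight grew

sample = resample(records, weights, random.Random(0))
print(sample.count(4), "of", len(sample), "draws are record 4")
```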

(17)

Boosting

Advantages:

Focuses on samples that are hard to classify

Sample weights can be used for:

1 Sampling probability

2 Weighting by the classifier, so that heavier samples count more

AdaBoost:

Weights each classifier's vote by its importance instead of using a simple majority vote

Exponential weight update rules

But susceptible to overfitting

(18)

Example: AdaBoost

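Base classifiers: $C_1, C_2, \ldots, C_T$

Error rate:

$$\varepsilon_i = \frac{1}{N} \sum_{j=1}^{N} w_j \, \delta\bigl(C_i(x_j) \neq y_j\bigr)$$

Importance of a classifier:

$$\alpha_i = \frac{1}{2} \ln\left(\frac{1-\varepsilon_i}{\varepsilon_i}\right)$$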
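The two quantities can be computed as in the sketch below. One assumption is made loudly here: the error is divided by the total weight rather than by N, so that it comes out as a fraction whether the weights sum to 1 or to N. The function names and the four-record example are illustrative.

```python
import math


def weighted_error(weights, predictions, labels):
    """Share of the total weight that sits on misclassified records.

    Assumption: normalizes by sum(weights) instead of N, so the result is a
    fraction regardless of the scale of the weights.
    """
    wrong = sum(w for w, p, y in zip(weights, predictions, labels) if p != y)
    return wrong / sum(weights)


def importance(eps):
    """alpha = 1/2 * ln((1 - eps) / eps); requires 0 < eps < 1."""
    return 0.5 * math.log((1 - eps) / eps)


weights = [0.25, 0.25, 0.25, 0.25]
predictions = [1, 1, 0, 0]
labels = [1, 1, 0, 1]  # the last record is misclassified

eps = weighted_error(weights, predictions, labels)
print(eps, round(importance(eps), 4))  # 0.25 0.5493
```

The importance is large when the error is close to 0 and drops to 0 when the error reaches 0.5.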

(19)

Example: AdaBoost

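Weight update (here i indexes records and j indexes boosting rounds):

$$w_i^{(j+1)} = \frac{w_i^{(j)}}{Z_j} \times \begin{cases} \exp(-\alpha_j) & \text{if } C_j(x_i) = y_i \\ \exp(\alpha_j) & \text{if } C_j(x_i) \neq y_i \end{cases}$$

where $Z_j$ is the normalization factor

If any intermediate round produces an error rate higher than 50%, the weights are reverted back to 1/n and the resampling procedure is repeated

Classification:

$$C^*(x) = \arg\max_{y} \sum_{j=1}^{T} \alpha_j \, \delta\bigl(C_j(x) = y\bigr)$$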
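A minimal sketch of the weight update and the final weighted vote follows. It assumes the normalization factor is the sum of the unnormalized weights, so that the new weights sum to 1; the helper names and the toy classifiers are hypothetical.

```python
import math
from collections import defaultdict


def update_weights(weights, predictions, labels, alpha):
    """One reweighting step: shrink correctly classified records, grow the rest."""
    raw = [
        w * math.exp(-alpha if p == y else alpha)
        for w, p, y in zip(weights, predictions, labels)
    ]
    z = sum(raw)  # normalization factor (assumed: makes the new weights sum to 1)
    return [r / z for r in raw]


def classify(classifiers, alphas, x):
    """Return the label with the largest total importance among the votes."""
    score = defaultdict(float)
    for clf, alpha in zip(classifiers, alphas):
        score[clf(x)] += alpha
    return max(score, key=score.get)


# Four equally weighted records, the last one misclassified (error rate 0.25)
alpha = 0.5 * math.log((1 - 0.25) / 0.25)
new_weights = update_weights([0.25] * 4, [1, 1, 0, 0], [1, 1, 0, 1], alpha)
print([round(w, 3) for w in new_weights])  # [0.167, 0.167, 0.167, 0.5]

# One important classifier can outvote two less important ones
toy = [lambda x: 1, lambda x: 0, lambda x: 0]
print(classify(toy, [1.0, 0.4, 0.4], x=None))  # 1
```

After the update, the misclassified record carries half of the total weight, which is what forces the next round to concentrate on it.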

(20)

Illustrating AdaBoost

[Figure: data points for training and the initial weights for each data point. (C) Vipin Kumar, Parallel Issues in Data Mining, VECPAR 2002]

(22)

Random Forests

• Sample a data set with replacement

• Select m variables at random from p variables

• Create a tree

• Similarly create more trees

• Combine the results

• Reference:

– Hastie, Tibshirani, Friedman, The Elements of Statistical Learning, Chapter 15
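The sampling steps in the list above might look like the following sketch, which prepares the inputs for each tree without building the trees themselves. It follows the list literally and draws the m variables once per tree; many random forest implementations instead redraw them at every split. All names and the toy rows are hypothetical.

```python
import random


def draw_tree_inputs(rows, p, m, rng):
    """Bootstrap the rows and pick m of the p variable indices at random."""
    bootstrap = [rng.choice(rows) for _ in rows]      # sample with replacement
    features = sorted(rng.sample(range(p), m))        # m distinct variables
    return bootstrap, features


def forest_inputs(rows, p, m, n_trees, seed=0):
    """Repeat the draw once per tree; a tree learner would consume each pair."""
    rng = random.Random(seed)
    return [draw_tree_inputs(rows, p, m, rng) for _ in range(n_trees)]


rows = [(5.1, 3.5, 1.4, 0.2), (6.2, 2.9, 4.3, 1.3), (7.7, 3.0, 6.1, 2.3)]
inputs = forest_inputs(rows, p=4, m=2, n_trees=5)
for bootstrap, features in inputs:
    print(features, len(bootstrap))
```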

(23)

Random Forests

Advantages:

Only for decision trees

Lowers generalization error

Uses randomization in tree construction: #features = log2(d) + 1

Equivalent accuracy to AdaBoost, but faster

See table in Tan et al p. 294 for comparison of ensemble methods.
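As a quick worked example of the feature-count rule, the sketch below evaluates log2(d) + 1 for a few values of d, taken here to be the number of input features. Truncating the logarithm to an integer is an assumption, and `n_random_features` is an illustrative name.

```python
import math


def n_random_features(d: int) -> int:
    """log2(d) + 1, with the logarithm truncated to an integer (assumption)."""
    return int(math.log2(d)) + 1


for d in (4, 16, 100, 1000):
    print(d, n_random_features(d))
# 4 3
# 16 5
# 100 7
# 1000 10
```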
