Machine Learning Algorithms for Classification. Rob Schapire Princeton University

Academic year: 2022

(1)

Machine Learning Algorithms for Classification

Rob Schapire Princeton University

(2)

Machine Learning

studies how to automatically learn to make accurate predictions based on past observations

classification problems:

classify examples into given set of categories

labeled training examples → machine learning algorithm → classification rule

new example → classification rule → predicted classification

(3)

Examples of Classification Problems

text categorization (e.g., spam filtering)

fraud detection

optical character recognition

machine vision (e.g., face detection)

natural-language processing (e.g., spoken language understanding)

market segmentation (e.g., predict if customer will respond to promotion)

bioinformatics (e.g., classify proteins according to their function)

...

(4)

Characteristics of Modern Machine Learning

primary goal: highly accurate predictions on test data

goal is not to uncover underlying “truth”

methods should be general purpose, fully automatic and “off-the-shelf”

however, in practice, incorporation of prior, human knowledge is crucial

rich interplay between theory and practice

emphasis on methods that can handle large datasets

(5)

Why Use Machine Learning?

advantages:

often much more accurate than human-crafted rules (since data driven)

humans often incapable of expressing what they know (e.g., rules of English, or how to recognize letters), but can easily classify examples

don’t need a human expert or programmer

automatic method to search for hypotheses explaining data

cheap and flexible — can apply to any learning task

disadvantages:

need a lot of labeled data

error prone: usually impossible to get perfect accuracy

(6)

This Talk

machine learning algorithms:

decision trees

conditions for successful learning

boosting

support-vector machines

others not covered:

neural networks

nearest neighbor algorithms

Naive Bayes

bagging

random forests ...

practicalities of using machine learning algorithms

(7)

Decision Trees

(8)

Example: Good versus Evil

problem: identify people as good or bad from their appearance

training data

            sex     mask  cape  tie  ears  smokes  class
batman      male    yes   yes   no   yes   no      Good
robin       male    yes   yes   no   no    no      Good
alfred      male    no    no    yes  no    no      Good
penguin     male    no    no    yes  no    yes     Bad
catwoman    female  yes   no    no   yes   no      Bad
joker       male    no    no    no   no    no      Bad

test data

            sex     mask  cape  tie  ears  smokes  class
batgirl     female  yes   yes   no   yes   no      ??
riddler     male    yes   no    no   no    no      ??

(9)

A Decision Tree Classifier

tie
  no: cape
    no: bad
    yes: good
  yes: smokes
    no: good
    yes: bad
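The tree can be written as a few nested tests. This is a minimal sketch that assumes the diagram is read as follows: tie = yes leads to a test on smokes, tie = no leads to a test on cape. The names `classify`, `TRAIN`, and `TEST` are illustrative and do not come from the slides; the attribute values are copied from the training and test tables.

```python
# Attribute order used in the "Good versus Evil" table.
COLUMNS = ("sex", "mask", "cape", "tie", "ears", "smokes")

# name -> (attribute values, class), copied from the training data.
TRAIN = {
    "batman":   (("male", "yes", "yes", "no", "yes", "no"), "good"),
    "robin":    (("male", "yes", "yes", "no", "no", "no"), "good"),
    "alfred":   (("male", "no", "no", "yes", "no", "no"), "good"),
    "penguin":  (("male", "no", "no", "yes", "no", "yes"), "bad"),
    "catwoman": (("female", "yes", "no", "no", "yes", "no"), "bad"),
    "joker":    (("male", "no", "no", "no", "no", "no"), "bad"),
}

# name -> attribute values for the unlabeled test data.
TEST = {
    "batgirl": ("female", "yes", "yes", "no", "yes", "no"),
    "riddler": ("male", "yes", "no", "no", "no", "no"),
}


def classify(values):
    """Apply the tree: test tie first, then smokes (tie = yes) or cape (tie = no)."""
    person = dict(zip(COLUMNS, values))
    if person["tie"] == "yes":
        return "bad" if person["smokes"] == "yes" else "good"
    return "good" if person["cape"] == "yes" else "bad"


# Predictions for the two test examples.
predictions = {name: classify(values) for name, values in TEST.items()}
print(predictions)
```

Under this reading the tree agrees with the class of every training example, which can be checked by comparing `classify(values)` with the stored class for each entry of `TRAIN`.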

(10)

How to Build Decision Trees

choose rule to split on

divide data using splitting rule into disjoint subsets

all training examples: batman, robin, alfred, penguin, catwoman, joker

split on tie:
  no: batman, robin, joker, catwoman
  yes: alfred, penguin

(11)

How to Build Decision Trees

repeat recursively for each subset

stop when leaves are (almost) “pure”
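The recursive procedure can be sketched in a few lines. This is an illustration under stated assumptions, not the talk's own implementation: it picks each split greedily by the lowest weighted Gini index, stops only when a subset is completely pure, and uses hypothetical names (`build`, `predict`, `ROWS`). The data is the six-example training table from the Good versus Evil slide.

```python
from collections import Counter

COLUMNS = ("sex", "mask", "cape", "tie", "ears", "smokes")

# (attribute values, class) for the six training examples.
ROWS = [
    (("male", "yes", "yes", "no", "yes", "no"), "good"),    # batman
    (("male", "yes", "yes", "no", "no", "no"), "good"),     # robin
    (("male", "no", "no", "yes", "no", "no"), "good"),      # alfred
    (("male", "no", "no", "yes", "no", "yes"), "bad"),      # penguin
    (("female", "yes", "no", "no", "yes", "no"), "bad"),    # catwoman
    (("male", "no", "no", "no", "no", "no"), "bad"),        # joker
]


def gini(labels):
    """Gini index p(1 - p), where p is the fraction of 'good' labels."""
    if not labels:
        return 0.0
    p = sum(1 for y in labels if y == "good") / len(labels)
    return p * (1 - p)


def split(rows, col):
    """Divide rows into disjoint subsets, one per value of column col."""
    parts = {}
    for x, y in rows:
        parts.setdefault(x[col], []).append((x, y))
    return parts


def weighted_impurity(rows, col):
    """Average impurity of the subsets produced by splitting on col."""
    parts = split(rows, col).values()
    return sum(len(p) / len(rows) * gini([y for _, y in p]) for p in parts)


def build(rows):
    """Return a class label (leaf) or (attribute name, {value: subtree})."""
    labels = [y for _, y in rows]
    if gini(labels) == 0.0:
        return labels[0]                      # pure leaf: stop
    candidates = [c for c in range(len(COLUMNS)) if len(split(rows, c)) > 1]
    if not candidates:                        # nothing left to split on
        return Counter(labels).most_common(1)[0][0]
    best = min(candidates, key=lambda c: weighted_impurity(rows, c))
    branches = {v: build(part) for v, part in split(rows, best).items()}
    return (COLUMNS[best], branches)


def predict(tree, x):
    """Follow the branches matching x until a leaf label is reached."""
    while not isinstance(tree, str):
        name, branches = tree
        tree = branches[x[COLUMNS.index(name)]]
    return tree


tree = build(ROWS)
print(tree)
```

Because this sketch stops only at completely pure leaves, it always fits the training data exactly; a practical implementation would stop earlier or prune. `predict` raises `KeyError` for an attribute value that never appeared in the subset used to build a branch.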

(12)

How to Choose the Splitting Rule

key problem: choosing best rule to split on:

split on tie:
  no: batman, robin, joker, catwoman
  yes: alfred, penguin

split on cape:
  no: alfred, penguin, joker, catwoman
  yes: batman, robin

(13)

How to Choose the Splitting Rule

idea: choose rule that leads to greatest increase in “purity”

(14)

How to Measure Purity

want (im)purity function to look like this (p = fraction of positive examples):

[Figure: impurity plotted against p, with p running from 0 to 1 and a tick at 1/2]

commonly used impurity measures:

entropy: −p ln p − (1 − p) ln(1 − p)

Gini index: p(1 − p)
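Both measures are easy to compute directly. The sketch below evaluates them and uses them to compare the tie and cape splits on the six training examples. It assumes a split is scored by the size-weighted average impurity of its subsets, and that 0 ln 0 is taken as 0; the helper names and the `(good count, subset size)` encoding are illustrative, not from the slides.

```python
import math


def entropy(p):
    """Entropy impurity: -p ln p - (1 - p) ln(1 - p), with 0 ln 0 taken as 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log(p) - (1 - p) * math.log(1 - p)


def gini(p):
    """Gini index: p(1 - p)."""
    return p * (1 - p)


def split_impurity(groups, impurity):
    """Size-weighted average impurity over subsets given as (n_good, n_total)."""
    total = sum(n for _, n in groups)
    return sum(n / total * impurity(good / n) for good, n in groups)


# (good count, subset size) per branch, read off the training table.
TIE_SPLIT = [(2, 4), (1, 2)]    # tie = no, tie = yes
CAPE_SPLIT = [(1, 4), (2, 2)]   # cape = no, cape = yes

for name, groups in (("tie", TIE_SPLIT), ("cape", CAPE_SPLIT)):
    print(name,
          "gini:", split_impurity(groups, gini),
          "entropy:", split_impurity(groups, entropy))
```

Before any split, half of the training examples are good, so the Gini index is 0.25. Splitting on tie leaves both subsets at p = 1/2 and so does not reduce impurity, while splitting on cape yields one pure subset and a lower weighted impurity.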

(15)

Kinds of Error Rates

training error = fraction of training examples misclassified

test error = fraction of test examples misclassified

generalization error = probability of misclassifying new random example
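The three definitions can be written symbolically. The notation is an assumption made here for illustration and does not appear in the slides: h is the classifier, (x_i, y_i) for i = 1..m are the training examples, (x'_j, y'_j) for j = 1..n are the test examples, and D is the distribution from which new examples are drawn.

```latex
\[
\widehat{\mathrm{err}}_{\mathrm{train}}(h)
  = \frac{1}{m} \sum_{i=1}^{m} \mathbf{1}\{ h(x_i) \neq y_i \},
\qquad
\widehat{\mathrm{err}}_{\mathrm{test}}(h)
  = \frac{1}{n} \sum_{j=1}^{n} \mathbf{1}\{ h(x'_j) \neq y'_j \},
\qquad
\mathrm{err}(h)
  = \Pr_{(x,y) \sim D}\bigl[ h(x) \neq y \bigr].
\]
```

The first two are averages over finite samples and can be computed; the third is a probability under D and can only be estimated, typically by the test error.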

(16)

A Possible Classifier

[Figure: a large decision tree with tests on mask, cape, ears, tie, sex, and smokes, and leaves labeled good or bad]

perfectly classifies training data

BUT: intuitively, overly complex

(17)

Another Possible Classifier

mask
  no: bad
  yes: good

overly simple

doesn’t even fit available data
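A quick check on the training table shows how this one-test tree fails to fit the data. The sketch assumes the diagram is read as mask = yes predicts good and mask = no predicts bad; `mask_rule` and `training_error` are hypothetical helper names.

```python
# (name, mask value, true class) for the six training examples.
TRAIN = [
    ("batman", "yes", "good"),
    ("robin", "yes", "good"),
    ("alfred", "no", "good"),
    ("penguin", "no", "bad"),
    ("catwoman", "yes", "bad"),
    ("joker", "no", "bad"),
]


def mask_rule(mask):
    """One-test tree: predict good when a mask is worn, bad otherwise."""
    return "good" if mask == "yes" else "bad"


def training_error(rule, data):
    """Return the fraction misclassified and the names that were missed."""
    wrong = [name for name, mask, label in data if rule(mask) != label]
    return len(wrong) / len(data), wrong


error, misclassified = training_error(mask_rule, TRAIN)
print(error, misclassified)
```

Under this reading, alfred (good, no mask) and catwoman (bad, masked) are misclassified, giving a training error of 2 out of 6.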

(18)

Tree Size versus Accuracy

[Figure: two plots of performance against tree size (0 to 100), one showing accuracy on training data and on test data, the other showing train and test error (%)]

trees must be big enough to fit training data (so that “true” patterns are fully captured)

BUT: trees that are too big may overfit

(capture noise or spurious patterns in the data)

significant problem: can’t tell best tree size from training error

(19)

Overfitting Example

fitting points with a polynomial

underfit (degree = 1)

ideal fit (degree = 3)

overfit (degree = 20)
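The two extremes can be reproduced with the standard library alone. This is a rough sketch under several assumptions that are not in the slides: the data comes from a hypothetical cubic with Gaussian noise, the underfit model is a least-squares straight line, and the overfit model is the degree-10 polynomial that passes through all 11 training points (rather than the degree-20 fit shown on the slide). The degree-3 "ideal fit" is not implemented here.

```python
import random


def true_curve(x):
    """Hypothetical ground truth, chosen only for illustration."""
    return x ** 3 - x


def fit_line(xs, ys):
    """Least-squares straight line (degree 1), returned as a function."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return lambda x: mean_y + slope * (x - mean_x)


def fit_interpolant(xs, ys):
    """Lagrange polynomial of degree len(xs) - 1 through every training point."""
    def poly(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return poly


def mse(f, xs, ys):
    """Mean squared error of f on the points (xs, ys)."""
    return sum((f(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


rng = random.Random(0)                                   # fixed seed for repeatability
train_x = [-1 + 2 * i / 10 for i in range(11)]           # 11 evenly spaced points
train_y = [true_curve(x) + rng.gauss(0, 0.1) for x in train_x]
test_x = [-0.95 + 2 * i / 10 for i in range(10)]         # points between training xs
test_y = [true_curve(x) + rng.gauss(0, 0.1) for x in test_x]

line = fit_line(train_x, train_y)
interp = fit_interpolant(train_x, train_y)
for name, f in (("line (degree 1)", line), ("interpolant (degree 10)", interp)):
    print(name,
          "train MSE:", mse(f, train_x, train_y),
          "test MSE:", mse(f, test_x, test_y))
```

The interpolant reproduces every training point, so its training error says nothing about how well it predicts the held-out points; the line cannot match the training data exactly. Comparing the printed test errors shows how each model behaves on points it did not see.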

(20)

Building an Accurate Classifier

for good test performance, need:

enough training examples

good performance on training set

classifier that is not too “complex” (“Occam’s razor”)

classifiers should be “as simple as possible, but no simpler”

“simplicity” closely related to prior expectations

(21)

Building an Accurate Classifier

measure “complexity” by:

number of bits needed to write it down

number of parameters

VC-dimension

(22)

Example

Training data:
