• No results found

Supervised Learning: The Setup

N/A
N/A
Protected

Academic year: 2021

Share "Supervised Learning: The Setup"

Copied!
38
0
0

Loading.... (view fulltext now)

Full text

(1)

1

Decision Trees: Representation

Machine Learning Fall 2017

Supervised Learning: The Setup

1

Machine Learning

Spring 2018

(2)

Last Lecture: Supervised Learning Settings

1. What is our instance space?

What are the inputs to the problem? What are the features?

2. What is our label space?

What is the prediction task?

3. What is our hypothesis space?

What functions should the learning algorithm search over?

4. What is our learning algorithm?

How do we learn from the labeled data?

5. What is our loss function or evaluation metric?

What is success?

Formulation

Implementation

(3)

Coming up… (the rest of the semester)

Different hypothesis spaces and learning algorithms

– Decision trees and the ID3 algorithm – Linear classifiers

Perceptron

Winnow

SVM

Logistic regression

– Combining multiple classifiers

Boosting, bagging

– Non-linear classifiers – Nearest neighbors

3

(4)

Coming up… (the rest of the semester)

Different hypothesis spaces and learning algorithms

– Decision trees and the ID3 algorithm – Linear classifiers

Perceptron

Winnow

SVM

Logistic regression

– Combining multiple classifiers

Boosting, bagging

– Non-linear classifiers – Nearest neighbors

Important issues to consider

1. What do these hypotheses represent?

2. Implicit assumptions and tradeoffs 3. Generalization?

4. How do we learn?

(5)

This lecture: Learning Decision Trees

1. Representation : What are decision trees?

2. Algorithm: Learning decision trees

– The ID3 algorithm: A greedy heuristic

3. Some extensions

5

(6)

This lecture: Learning Decision Trees

1. Representation : What are decision trees?

2. Algorithm: Learning decision trees

– The ID3 algorithm: A greedy heuristic

3. Some extensions

(7)

Representing data

Data can be represented as a big table, with columns denoting different attributes

Name Label

Claire Cardie

-

Peter Bartlett

+

Eric Baum

+

Haym Hirsh

+

Shai Ben-David

+

Michael I. Jordan

-

7

(8)

Representing data

Data can be represented as a big table, with columns denoting different features/attributes

Name Special

character in name?

Second character of

first name

Length of first

name>5? Gender Label

Claire Cardie No l Yes Female

-

Peter Bartlett No e No Male

+

Eric Baum No r No Male

+

Haym Hirsh No a No Male

+

Shai Ben-

David Yes h No Male

+

Michael I.

Jordan Yes i Yes Male

-

(9)

Representing data

Data can be represented as a big table, with columns denoting different attributes

Name Special

character in name?

Second character of

first name

Length of first

name>5? Gender Label

Claire Cardie No l Yes Female

-

Peter Bartlett No e No Male

+

Eric Baum No r No Male

-

Haym Hirsh No a No Male

+

Shai Ben-

David Yes h No Male

-

Michael I.

Jordan Yes i Yes Male

+

With these four attributes, how many unique rows are possible?

2¢ 26¢ 26¢ 2 = 2704

If there are 100 attributes, all binary, how many unique rows are possible?

2100

9

(10)

Representing data

Data can be represented as a big table, with columns denoting different attributes

Name Special

character in name?

Second character of

first name

Length of first

name>5? Gender Label

Claire Cardie No l Yes Female

-

Peter Bartlett No e No Male

+

Eric Baum No r No Male

-

Haym Hirsh No a No Male

+

Shai Ben-

David Yes h No Male

-

Michael I.

Jordan Yes i Yes Male

+

With these four attributes, how many unique rows are possible?

2 × 26 × 2 × 2 = 208

If there are 100 attributes, all binary, how many unique rows are possible?

2100

(11)

Representing data

Data can be represented as a big table, with columns denoting different attributes

Name Special

character in name?

Second character of

first name

Length of first

name>5? Gender Label

Claire Cardie No l Yes Female

-

Peter Bartlett No e No Male

+

Eric Baum No r No Male

-

Haym Hirsh No a No Male

+

Shai Ben-

David Yes h No Male

-

Michael I.

Jordan Yes i Yes Male

+

With these four attributes, how many unique rows are possible?

2 × 26 × 2 × 2 = 208

If there are 100 attributes, all binary, how many unique rows are possible?

2100

11

(12)

Representing data

Data can be represented as a big table, with columns denoting different attributes

Name Special

character in name?

Second character of

first name

Length of first

name>5? Gender Label

Claire Cardie No l Yes Female

-

Peter Bartlett No e No Male

+

Eric Baum No r No Male

-

Haym Hirsh No a No Male

+

Shai Ben-

David Yes h No Male

-

Michael I.

Jordan Yes i Yes Male

+

With these four attributes, how many unique rows are possible?

2 × 26 × 2 × 2 = 208

If there are 100 attributes, all binary, how many unique rows are possible?

2100

(13)

Representing data

Data can be represented as a big table, with columns denoting different attributes

Name Special

character in name?

Second character of

first name

Length of first

name>5? Gender Label

Claire Cardie No l Yes Female

-

Peter Bartlett No e No Male

+

Eric Baum No r No Male

-

Haym Hirsh No a No Male

+

Shai Ben-

David Yes h No Male

-

Michael I.

Jordan Yes i Yes Male

+

With these four attributes, how many unique rows are possible?

2 × 26 × 2 × 2 = 208

If there are 100 attributes, all binary, how many unique rows are possible for a function ? 2100

13

We need to figure out how to represent in a better, more efficient way

(14)

What are decision trees?

A hierarchical data structure that represents data using a divide-and-conquer strategy

Can be used as flexible hypothesis class for classification or regression

General idea: Given a collection of labeled examples,

construct a decision tree that represents it

(15)

What are decision trees?

• Decision trees are a family of classifiers for instances that are represented by collections of attributes (i.e. features)

• Nodes are tests for feature values

• There is one branch for every value that the feature can take

• Leaves of the tree specify the class labels

15

(16)

Let’s build a decision tree for classifying shapes

Label=A

Label=C Label=B

(17)

Let’s build a decision tree for classifying shapes

Label=A

Label=C Label=B

17

Before building a decision tree:

What is the label for a red triangle? And why?

(18)

Let’s build a decision tree for classifying shapes

What are some attributes of the examples?

Label=A

Label=C Label=B

(19)

Let’s build a decision tree for classifying shapes

What are some attributes of the examples?

Color , Shape

19

Label=A

Label=C Label=B

(20)

Let’s build a decision tree for classifying shapes

What are some attributes of the examples?

Color , Shape

Color?

Label=A

Label=C Label=B

(21)

Let’s build a decision tree for classifying shapes

What are some attributes of the examples?

Color , Shape

Color?

Blue Red Green

21

Label=A

Label=C Label=B

(22)

Let’s build a decision tree for classifying shapes

What are some attributes of the examples?

Color , Shape

Color?

Blue Red Green

B

Label=A

Label=C Label=B

(23)

Let’s build a decision tree for classifying shapes

What are some attributes of the examples?

Color , Shape

Color?

Blue Red Green

B square

triangle circle A C

B

Shape?

23

Label=A

Label=C Label=B

(24)

Let’s build a decision tree for classifying shapes

What are some attributes of the examples?

Color , Shape

Color?

Shape?

circle square

B A

Blue Red Green

B square

triangle circle A C

B

Shape?

Label=A

Label=C Label=B

(25)

Let’s build a decision tree for classifying shapes

What are some attributes of the examples?

Color , Shape

Label=A

Label=C Label=B

Color?

Shape?

circle square

B A

Blue Red Green

square

triangle circle A C

B

Shape?

1. How to use a decision tree for prediction?

What is the label for a red triangle?

Just follow a path from the root to a leaf

What about a green triangle?

25

B

(26)

Expressivity of Decision trees

What Boolean functions can decision trees represent?

Every path from the tree to a root is a rule

The full tree is equivalent to the conjunction of all the rules

(27)

Expressivity of Decision trees

What Boolean functions can decision trees represent?

(Color=blue AND Shape=triangle ) Label=B) AND (Color=blue AND Shape=square ) Label=A) AND (Color=blue AND Shape=circle ) Label=C) AND….

Every path from the tree to a root is a rule

The full tree is equivalent to the conjunction of all the rules

Any Boolean function can be represented as a decision tree.

27

(28)

Expressivity of Decision trees

What Boolean functions can decision trees represent?

Any Boolean function can be represented as a decision tree.

Why?

(29)

Decision Trees

• Outputs are discrete categories

• But real valued outputs are also possible (regression trees)

• Methods for handling noisy data (noise in the label or in the features) and for handling missing attributes

Pruning trees helps with noise More on this later…

29

(30)

Numeric attributes and decision boundaries

• We have seen instances represented as attribute-value pairs

(color=blue, second letter=e, etc.)

Values have been categorical

• How do we deal with numeric feature values?

(eg length = ?)

Discretize them or use thresholds on the numeric values

This example divides the feature space into axis parallel rectangles

(31)

Numeric attributes and decision boundaries

• We have seen instances represented as attribute-value pairs

(color=blue, second letter=e, etc.)

Values have been categorical

• How do we deal with numeric feature values?

(eg length = ?)

Discretize them or use thresholds on the numeric values

This example divides the feature space into axis parallel rectangles

31

(32)

Numeric attributes and decision boundaries

• We have seen instances represented as attribute-value pairs

(color=blue, second letter=e, etc.)

Values have been categorical

• How do we deal with numeric feature values?

(eg length = ?)

Discretize them or use thresholds on the numeric values

This example divides the feature space into axis parallel rectangles

(33)

Numeric attributes and decision boundaries

• We have seen instances represented as attribute-value pairs

(color=blue, second letter=e, etc.)

Values have been categorical

• How do we deal with numeric feature values?

(eg length = ?)

Discretize them or use thresholds on the numeric values

This example divides the feature space into axis parallel rectangles

1 3 X 7

5 Y

- +

+ +

+ +

- - +

33

(34)

Numeric attributes and decision boundaries

• We have seen instances represented as attribute-value pairs

(color=blue, second letter=e, etc.)

Values have been categorical

• How do we deal with numeric feature values?

(eg length = ?)

Discretize them or use thresholds on the numeric values

This example divides the feature space into axis parallel rectangles

1 3 X 7

5 Y

- +

+ +

+ +

- - +

34

X<3

Y<5

no yes

Y>7

yes no

X < 1

no yes

- + +

+

no yes

-

(35)

Numeric attributes and decision boundaries

• We have seen instances represented as attribute-value pairs

(color=blue, second letter=e, etc.)

Values have been categorical

• How do we deal with numeric feature values?

(eg length = ?)

Discretize them or use thresholds on the numeric values

This example divides the feature space into axis parallel rectangles

1 3 X 7

5 Y

- +

+ +

+ +

- - +

Decision boundaries can be non-linear

35

X<3

Y<5

no yes

Y>7

yes no

X < 1

no yes

- + +

+

no yes

-

(36)

Summary: Decision trees

• Decision trees can represent any Boolean function

• A way to represent lot of data

• A natural representation (think 20 questions)

• Predicting with a decision tree is easy

• Clearly, given a dataset, there are many decision trees that can represent it. Why?

• Learning a good representation from data is the next

question

(37)

Summary: Decision trees

• Decision trees can represent any Boolean function

• A way to represent lot of data

• A natural representation (think 20 questions)

• Predicting with a decision tree is easy

• Clearly, given a dataset, there are many decision trees that can represent it. Why?

• Learning a good representation from data is the next question

37

(38)

Exercise

Write down the decision tree for the shapes data if the root node was Shape instead of Color

Will the two trees make the same predictions for unseen shapes/color combinations?

Label=A

Label=C Label=B

References

Related documents

[r]

Kho du lieu duqc xay dung de tien loi cho viec truy cap theo nhieu nguon, nhieu kieu du lieu khac nhau sao cho co the ket hop duqc ca nhung ung dung cua cac cong nghe hien dai va

There are currently two proposals for the socket outlet and at least five for the vehicle inlet. The early supply of EVs into the European market are mainly equipped with vehicle

However, this would likely give rise to much litigation to determine whether this was consistent with the MSFCMA and not an overly broad reading of section 1856(a)(3)(A) of that

Temperature can be thought of as a subset of hue, but it’s such an important form of contrast, that it is helpful to consider it as its own form of contrast. Color

Tema pameran fotografi ini DGDODK ³3HQGLGLNDQ GDQ .HDULIDQ /RNDO´ Setelah kegiatan permbelajaran yang bersifat teori dan praktek telah selesai, pada pertengahan semester

The Toshiba e- STUDIO 2330c/2830c are powerful, multifunction color copiers that combine the performance of a multifunction system with the quality of a professional graphics

Minors who do not have a valid driver’s license which allows them to operate a motorized vehicle in the state in which they reside will not be permitted to operate a motorized