
Lecture 34.pdf


(1)

((even) more) Decision Trees (and regression)

CS540-002, Spring 2015 Lecture 34

(2)

● Upcoming AIRG

○ Causality -- An Introduction (Monday)

● WHW2 Released

○ Due Wednesday, April 22 (before class)

● Remember Project 4

○ Due April 17

○ Please turn in latedays.txt no matter what

(3)

Today:

● Decision Trees

○ Overfitting (and pruning)

● Regression

○ Linear

○ Polynomial

(4)

So Far:

Given data, we can build a consistent DT if:

● The setting is Classification.

● All features are categorical.

(5)

Roadmap:

● More on decision trees

○ Broadening Applicability (18.3.6)

○ Overfitting

■ Early Stopping

■ Pruning

● Regression (18.6)

○ Linear

○ Polynomial

● Practical Considerations (18.4)

(6)

Overfitting (Remedy 1):

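Suppose we find ourselves in the following position.

[Figure: a partially built decision tree over examples 1 through 13. The root splits on f139 into {1, 2, 3, 4, 5, 6, 7} and {8, 9, 10, 11, 12, 13}; the second group splits on f10 into {8, 9} and {10, 11, 12, 13}; that last group splits on f15 into {12, 13} and {10, 11}.]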

(7)

Overfitting (Remedy 1):

Early stopping.

Basic idea: if you’re close to 0 entropy, don’t split.
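A minimal sketch of this rule, assuming Shannon entropy in bits and a cutoff of 0.2. The cutoff and the function names are hypothetical choices for illustration, not values given in the lecture.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return sum((c / total) * math.log2(total / c) for c in counts.values())

def should_stop(labels, threshold=0.2):
    """Early stopping: do not split a node whose entropy is already low.

    The threshold is an assumed value, chosen only for this example.
    """
    return entropy(labels) <= threshold

pure = ["yes"] * 8                    # entropy 0: nothing left to separate
nearly_pure = ["yes"] * 39 + ["no"]   # one stray example out of 40
mixed = ["yes"] * 5 + ["no"] * 5      # entropy 1: worth splitting

for name, node in [("pure", pure), ("nearly_pure", nearly_pure), ("mixed", mixed)]:
    print(name, round(entropy(node), 3), "stop" if should_stop(node) else "split")
```

In practice the cutoff would be tuned, for example on held-out data, since a value that is too high stops the tree before it has learned real structure.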

Alternative early stopping:

(8)

Overfitting (Remedy 2):

Basic idea:

Suppose we find that splitting on a feature has a low (but non-zero) information gain.

Maybe the feature in question is pure noise, and it just so happens that it helps separate the classes. Can we determine if it is actually an indicative

(9)

Overfitting (Remedy 2):

Basic idea:

Suppose this feature is pure noise (the ‘null-hypothesis’)

We got Δ better than that.

What’s the probability we ‘lucked out’ by so much? Less than 5% you say? Then it’s probably a good split.

(10)

Overfitting (Remedy 2):

Suppose we split on a feature and get p positive, and n negative examples in the kth split.

(11)

Overfitting (Remedy 2):

(12)

Δ is distributed as
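As a hedged illustration of this significance test, the sketch below assumes the usual χ² pruning statistic: under the null hypothesis each branch k keeps the parent's class ratio, giving expected counts p̂_k and n̂_k, and Δ sums the squared deviations (p_k − p̂_k)²/p̂_k + (n_k − n̂_k)²/n̂_k over the branches. It further assumes a two-way split, so Δ is compared against a χ² distribution with 1 degree of freedom. The function names are hypothetical.

```python
import math

def chi_squared_delta(splits):
    """Total deviation from the null hypothesis for one candidate split.

    `splits` is a list of (p_k, n_k) pairs: the positive and negative
    counts in each branch. Assumes the parent has at least one positive
    and one negative example and that no branch is empty.
    """
    p = sum(pk for pk, _ in splits)
    n = sum(nk for _, nk in splits)
    delta = 0.0
    for pk, nk in splits:
        frac = (pk + nk) / (p + n)          # share of examples in branch k
        p_hat, n_hat = p * frac, n * frac   # expected counts if the feature is noise
        delta += (pk - p_hat) ** 2 / p_hat + (nk - n_hat) ** 2 / n_hat
    return delta

def p_value_binary_split(delta):
    """P(chi-squared with 1 degree of freedom >= delta).

    Valid only for a two-way split (2 branches - 1 = 1 degree of freedom),
    where the tail probability reduces to erfc(sqrt(delta / 2)).
    """
    return math.erfc(math.sqrt(delta / 2))

noisy = [(5, 5), (5, 5)]          # both branches mirror the parent ratio
informative = [(9, 1), (1, 9)]    # branches are nearly pure

for name, split in [("noisy", noisy), ("informative", informative)]:
    d = chi_squared_delta(split)
    print(name, round(d, 3), p_value_binary_split(d) < 0.05)
```

A split is kept only when the p-value falls below the chosen significance level (5% in the slides); splits with more than two branches would need the general χ² tail probability rather than the one-degree-of-freedom shortcut used here.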

(13)

Continuous or Ordinal Features:

Suppose we have a feature Height.

Example: [60, 64, 80, 70, 71, 68, 81, 55, 48, 70, 71]

Step 1: Sort.

[48, 55, 60, 64, 68, 70, 70, 71, 71, 80, 81]

Step 2: Consider each possible split.
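The two steps can be sketched as follows. One common convention, assumed here rather than stated in the slides, is to place each candidate threshold at the midpoint between consecutive distinct values; the function name is hypothetical.

```python
def candidate_thresholds(values):
    """Return the midpoints between consecutive distinct sorted values."""
    distinct = sorted(set(values))   # Step 1: sort (duplicates add no new splits)
    # Step 2: one candidate split between each pair of neighbouring values.
    return [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]

heights = [60, 64, 80, 70, 71, 68, 81, 55, 48, 70, 71]
thresholds = candidate_thresholds(heights)
print(thresholds)
# [51.5, 57.5, 62.0, 66.0, 69.0, 70.5, 75.5, 80.5]
```

Each threshold t defines a binary test such as "Height <= t", which can then be scored with information gain exactly like a categorical feature with two values.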

(14)

Regression Trees:

Basic idea:

Replace leaves with simple regression models.

Two (sub)problems arise:

● How do we learn a regression model?

(15)

Regression:

Recall: In Classification, our target variable is one of a discrete set.

E.g., Author, WillWait, WillGetHeartDisease

In Regression, the target variable is real.

(16)

Univariate Linear Regression:

Given x ∈ ℝ, predict y where

y = f(x) = w1 x + w0 + noise

(17)
(18)
(19)

Univariate Linear Regression:

Basic idea:

Find the line which minimizes the sum of squared errors between the predictions and the true values.

(20)

Univariate Linear Regression:

Minimizing the loss:

Of all the pairs (w0, w1) ...

(21)

Univariate Linear Regression:

The loss is a sum over all points of the loss incurred from the prediction for the jth point and the true value.
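Written out, and assuming the squared-error loss that the later "the loss is quadratic" remark implies, the quantity being minimized over all pairs (w0, w1) would be the following. The symbol N for the number of data points is introduced here for illustration.

```latex
\mathrm{Loss}(w_0, w_1) = \sum_{j=1}^{N} \bigl( y_j - (w_1 x_j + w_0) \bigr)^2,
\qquad
(w_0^*, w_1^*) = \operatorname*{arg\,min}_{(w_0, w_1)} \mathrm{Loss}(w_0, w_1)
```

Here w1 x_j + w0 is the prediction for the jth point and y_j is its true value.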

(22)
(23)

Let There be Calculus...

Key point:

Because the loss is quadratic, its derivative is linear. This yields a linear system.
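To make the linear system concrete, here is a minimal sketch for the univariate case, assuming the squared-error loss. Setting the partial derivatives with respect to w0 and w1 to zero gives two linear equations in two unknowns, solved below with Cramer's rule. The function name is hypothetical, and the code assumes the x values are not all identical (otherwise the determinant is zero).

```python
def fit_line(xs, ys):
    """Least-squares fit of y = w1 * x + w0; returns (w0, w1)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    # Zeroing both partial derivatives of the loss yields
    #   w0 * n  + w1 * sx  = sy
    #   w0 * sx + w1 * sxx = sxy
    det = n * sxx - sx * sx
    w1 = (n * sxy - sx * sy) / det
    w0 = (sy * sxx - sx * sxy) / det
    return w0, w1

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]   # exactly y = 2x + 1, no noise
w0, w1 = fit_line(xs, ys)
print(w0, w1)
```

With noise-free data the fit recovers the generating line exactly; with noisy data it returns the line with the smallest sum of squared errors.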

(24)
(25)
(26)

Multivariate Linear Regression:

Notational note:

The book denotes the ith feature of the jth datapoint as: x_{j,i}

(27)

Detour: This w0 business is ugly

Note how w0 is ‘different’ (it’s not multiplied by any feature).

We can get rid of it by augmenting x:

[x1, x2, x3] → [1, x1, x2, x3]

so that w0 + w1x1 + w2x2 + w3x3 is the dot product w · [1, x1, x2, x3].
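A short sketch of the augmentation trick: prepending a constant 1 to x lets w0 be treated like every other weight, so the prediction is a single dot product. The weight and feature values below are hypothetical.

```python
def augment(x):
    """Prepend a constant 1 so that w0 multiplies a 'feature' like any other weight."""
    return [1.0] + list(x)

def predict(w, x):
    """Dot product of the weight vector with the augmented feature vector."""
    return sum(wi * xi for wi, xi in zip(w, augment(x)))

w = [0.5, 2.0, -1.0, 3.0]   # [w0, w1, w2, w3], made-up values
x = [1.0, 2.0, 3.0]         # [x1, x2, x3], made-up values

explicit = w[0] + w[1] * x[0] + w[2] * x[1] + w[3] * x[2]
print(predict(w, x), explicit)
```

Both expressions give the same number; the augmented form simply removes the special case for w0.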

(28)

Multivariate Linear Regression:

(29)

(Univariate) ‘Polynomial’ Regression:

Given x ∈ ℝ, predict y where

y = f(x) = w2 x² + w1 x + w0 + noise

(30)

Remember this?

Detour: This w0 business is ugly

Note how w0 is ‘different’ (it’s not multiplied by any feature).

We can get rid of it by augmenting x: [x1] → [1, x1], so that w0 + w1x1 becomes w · [1, x1].

w2 is not multiplied by an original feature either, but rather by a new ‘feature’, x². Add it to the augmented vector:

[1, x1, x1²]

(31)

Now we’re back to the linear regression case!
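A minimal sketch of this reduction, under stated assumptions: each scalar x is expanded to [1, x, x², ...], and the weights are then found by ordinary least squares via the normal equations (XᵀX) w = Xᵀy. The small Gauss-Jordan solver and all function names are hypothetical helpers written for this example; the code assumes XᵀX is invertible (enough distinct x values for the chosen degree).

```python
def poly_features(x, degree):
    """Map a scalar x to [1, x, x^2, ..., x^degree]."""
    return [x ** d for d in range(degree + 1)]

def solve(a, b):
    """Solve a @ w = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_linear(rows, ys):
    """Least squares via the normal equations (X^T X) w = X^T y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve(xtx, xty)

def fit_polynomial(xs, ys, degree):
    """Polynomial regression is linear regression on expanded features."""
    return fit_linear([poly_features(x, degree) for x in xs], ys)

xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [3 * x * x - 2 * x + 1 for x in xs]   # exactly y = 3x^2 - 2x + 1
w = fit_polynomial(xs, ys, degree=2)        # [w0, w1, w2]
print([round(wi, 6) for wi in w])
```

The model stays linear in the weights even though it is quadratic in x, which is why the same fitting routine applies unchanged; only the feature expansion differs between the linear and polynomial cases.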

(32)

(Univariate) ‘Polynomial’ Regression:

Given x ∈ ℝ, predict y where

y = f(x) = w3 x³ + w2 x² + w1 x + w0 + noise

(33)

No problem!

(34)

Let’s look at the [1, x] case.
