• No results found

Machine Learning and Data Mining. Regression Problem. (adapted from) Prof. Alexander Ihler

N/A
N/A
Protected

Academic year: 2021

Share "Machine Learning and Data Mining. Regression Problem. (adapted from) Prof. Alexander Ihler"

Copied!
29
0
0

Loading.... (view fulltext now)

Full text

(1)

Machine Learning and Data Mining

Regression Problem

(adapted from) Prof. Alexander Ihler

(2)

Overview

Regression Problem Definition and define

parameters

ϴ.

Prediction using

ϴ as parameters

Measure the error

Finding good parameters

ϴ (direct

minimization problem)

(3)

Example of Regression

Vehicle price estimation problem

Features x: Fuel type, The number of doors, Engine size

Targets y: Price of vehicle

– Training Data: (fit the model)

– Testing Data: (evaluate the model)

Fuel type The number of doors Engine size (61 – 326) Price 1 4 70 11000 1 2 120 17000 2 2 310 41000

Fuel type The number of doors

Engine size (61 – 326)

Price

(4)

Example of Regression

Vehicle price estimation problem

ϴ = [-300, -200 , 700 , 130]

– Ex #1 = -300 + -200 *1 + 4*700 + 70*130 = 11400

– Ex #2 = -300 + -200 *1 + 2*700 + 120*130 = 16500

– Ex #3 = -300 + -200 *2 + 2*700 + 310*130 = 41000

• Mean Absolute Training Error = 1/3 *(400 + 500 + 0) = 300

– Test = -300 + -200 *2 + 4*700 + 150*130 = 21600

• Mean Absolute Testing Error = 1/1 *(600) = 600

Fuel type #of doors Engine size Price

1 4 70 11000

1 2 120 17000

2 2 310 41000

Fuel type The number of

doors

Engine size (61 – 326)

Price

(5)

Supervised learning

Notation

Features x (input variables)

Targets y (output variables)

– Predictions ŷ – Parameters θ Program (“Learner”) Characterized by some “parameters” θ Procedure (using θ) that outputs a prediction

Training data (examples) Features Learning algorithm Change θ Improve performance Feedback / Target values Evaluation of the model (measure error)

(6)

Overview

Regression Problem Definition and

parameters.

Prediction using

ϴ as parameters

Measure the error

Finding good parameters

ϴ (direct

minimization problem)

(7)

Linear regression

Define form of function f(x) explicitly

Find a good f(x) within that family

0 10 20 0 20 40 T ar ge t y Feature X1 “Predictor”: Evaluate line: ӯ = ϴ0 + ϴ1 * X1 return ӯ ӯ = Predicted target value (Black line)

(c) Alexander Ihler

New instance with X1=8 Predicted value =17

(8)

More dimensions?

0 10 20 30 40 0 10 20 30 20 22 24 26 0 10 20 30 40 0 10 20 30 20 22 24 26

x

1

x

2

y

x

1

x

2

y

(9)

Notation

Define “feature” x

0

= 1 (constant)

Then

(c) Alexander Ihler

n = the number of features in dataset

(10)

Overview

Regression Problem Definition and

parameters.

Prediction using

ϴ as parameters

Measure the error

Finding good parameters

ϴ (direct

minimization problem)

(11)

Supervised learning

Notation

Features x (input variables)

Targets y (output variables)

– Predictions ŷ – Parameters θ Program (“Learner”) Characterized by some “parameters” θ Procedure (using θ) that outputs a prediction

Training data (examples) Features Learning algorithm Change θ Improve performance Feedback / Target values Evaluation of the model (measure error)

(12)

Measuring error

0 20 0

Error or “residual”

Prediction

Observation

Red points = Real target values Black line = ӯ (predicted value) ӯ = ϴ0 + ϴ1 * X

Blue lines = Error (Difference between real value y and predicted value ӯ)

(13)

Mean Squared Error

How can we quantify the error?

• Y= Real target value in dataset,

• ӯ = Predicted target value by ϴ*X

• Training Error: m= the number of training instances,

• Testing Error: Using a partition of Training error to check predicted

values. m= the number of testing instances,

(c) Alexander Ihler

(14)

MSE cost function

Rewrite using matrix form

m=number of instance of data n = the number of features,

X = input variables in dataset y= output variable in dataset

(15)

Visualizing the error function

-1 -0.5 0 0.5 1 1.5 2 2.5 3 -40 -30 -20 -10 0 10 20 30 40 (c) Alexander Ihler

θ

1

J(θ

)

Dimensions are ϴ0 and ϴ1 instead of

X1 and X2

Output is J instead of y as target value

Representation of J in 2D space. Inner red circles has less value of J Outer red circles has higher value of J

J is error function.

θ

0

θ

1

The plane is the value of J, not the plane fitted to output values.

(16)

Overview

Regression Problem Definition and

parameters.

Prediction using

ϴ as parameters

Measure the error

Finding good parameters

ϴ (direct

minimization problem)

(17)

Supervised learning

Notation

Features x Targets y – Predictions ŷ – Parameters θ Program (“Learner”) Characterized by some “parameters” θ Procedure (using θ) that outputs a prediction

Training data (examples) Features Learning algorithm Change θ Improve performance Feedback / Target values Evaluation of the model (measure error)

(18)

Finding good parameters

Want to find parameters which minimize our error…

(19)

MSE Minimum (m <= n+1)

Consider a simple problem

– One feature, two data points

– Two unknowns and two equations:

Can solve this system directly:

(c) Alexander Ihler

m=number of instance of data n = the number of features,

Theta gives a line or plane that exactly fit to all target values. n +1=1+1 = 2

(20)

SSE Minimum (m > n+1)

Reordering, we have

•Most of the time, m > n

– There may be no linear function that hits all the data exactly

– Minimum of a function has gradient equal to zero (gradient is a

horizontal line.)

Just need to know how to compute

n +1=1+1 = 2 m=3

(21)

0 2 4 6 8 10 12 14 16 18 20 0 2 4 6 8 10 12 14 16 18

Effects of Mean Square Error choice

16 2 cost for this one datum

Heavy penalty for large errors

Distract line from other points.

(c) Alexander Ihler

(22)

Absolute error

2 4 6 8 10 12 14 16

18 MSE, original data

Abs error, original data

(23)

Error functions for regression

“Arbitrary” Error functions

can’t be solved in closed form… So as alternative way, use

gradient descent (Mean Square Error)

(Mean Absolute Error)

Something else entirely…

(???)

(24)

Overview

Regression Problem Definition and

parameters.

Prediction using

ϴ as parameters

Measure the error

Finding good parameters

ϴ (direct

minimization problem)

(25)

Nonlinear functions

Single feature x, predict target y:

Sometimes useful to think of “feature transform”

– Convert a non-linear function to linear function and then

solve it.

Add features:

Linear regression in new features

(26)

Higher-order polynomials

Are more features better?

“Nested” hypotheses

– 2nd order more general than 1st,

– 3rd order “ “ than 2nd, …

Fits the observed data better

18nd order

Y = ϴ0

Y = ϴ0+ ϴ1 * X Y = ϴ0+ ϴ1 * X + ϴ2*X2 + ϴ

(27)

Test data

After training the model

Go out and get more data from the world

– New observations (x,y)

How well does our model perform?

(c) Alexander Ihler

Training data

(28)

0 1 2 3 4 5 6 7 8 9 10 0 5 10 15 20 25 30

Training data

Training versus test error

• Plot MSE as a function

of model complexity

– Polynomial order

• Decreases

– More complex function

fits training data better

• What about new data?

Mea n s q u a red err o r Polynomial order

New, “test” data

• 0th to 2st order – Error decreases – Underfitting • Higher order – Error increases Overfitting Under fitting

(29)

Summary

Regression Problem Definition

Vehicle Price estimation

Prediction using

ϴ:

Measure the error: difference between y and

ŷ

e.g. Absolute error, MSE

direct minimization problem

Two cases m<=n+1 and m > n+1

Non-Linear regression problem

Finding best n

th

order polynomial function for

each problem (not overfitting and not under

fitting)

References

Related documents

While there are numerous strategies to be deployed to move Canada to a financially sustainable future, this report addresses two critically important issues:

Homeowner and Lender get additional mediation sessions 5 1 Roughly 50% of homeowners have a pending “Home Affordable Modification Program” (HAMP) application when entering

In no event this document and any information included herein shall be construed as providing any warranties or performance guarantees either express or implied, including without

For the last thirty years, X-ray fluorescence spectrometry (XRF), adopted mainly from geological applications, has been used particularly for the analysis of volcanic rocks

Lalramnghinghlova &amp; Jha (1998) de- scribed more than 200 medicinal plants used to treat dis- eases like: bleeding from nose, fever, malarial fever, asth- ma, tuberculosis,

More specifically, the results show that contingent reward has stronger rela- tionship with internal locus of control, whereas passive management by exception has stronger relation

One in eight registered athletes incurred an injury during the 9 days of the Championships or 135 injuries per 1000 accred- ited athletes, which is a higher incidence than at the 2007