Introduction to Machine Learning

Computer Science & Engineering State University of New York at Buffalo

Outline

Taking the next step

Linear Regression

Two Interpretations

Two Interpretations

Learning Parameters - MLE Approach

Learning Parameters - Least Squares Approach

Gradient Descent Based Method

Recap - Linear Regression

Issues with Linear Regression

Putting a Prior on w

Posterior Estimates of the Weight Vector

Parameter Estimation for Bayesian Regression

Prediction with Bayesian Regression

Full Bayesian Treatment

Handling Non-linear Relationships

How to Control Overfitting?

Examples of Regularization

Parameter Estimation for Ridge Regression

Linear Regression

Varun Chandola

Outline

Linear Regression

Problem Formulation

Geometric Interpretation

Learning Parameters

Recap

Issues with Linear Regression

Bayesian Linear Regression

Bayesian Regression

Estimating Bayesian Regression Parameters

Prediction with Bayesian Regression

Handling Non-linear Relationships

Handling Overfitting via Regularization

Taking the next step

Hypothesis Space, H

Conjunctive

Disjunctive

Disjunctions of k attributes

Linear hyperplanes

c ∉ H

Non-linear network

Input Space, x

x ∈ {0, 1}^d

x ∈ R^d

Output Space, y

y ∈ {0, 1}

y ∈ {−1, +1}

y ∈ R

Linear Regression

There is one scalar target variable y (instead of hidden)

There is one vector input variable x

Inductive bias: $y = \mathbf{w}^\top \mathbf{x}$

Linear Regression Learning Task

Learn w given training examples, ⟨X, y⟩.

Two Interpretations

1. Probabilistic Interpretation

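y is assumed to be normally distributed: $y \sim \mathcal{N}(\mathbf{w}^\top \mathbf{x}, \sigma^2)$

or, equivalently:

$y = \mathbf{w}^\top \mathbf{x} + \epsilon$, where $\epsilon \sim \mathcal{N}(0, \sigma^2)$

y is a linear combination of the input variables

Given $\mathbf{w}$ and $\sigma^2$, one can find the probability distribution of y for a given x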
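
The probabilistic interpretation can be sketched in a few lines of Python: the mean of y is the inner product of w and x, and Gaussian noise with standard deviation σ is added on top. This is only an illustrative sketch. The values of w, x, and σ are made up, and the helper names `predictive_mean`, `sample_y`, and `density_y` are hypothetical, not part of the lecture.

```python
import math
import random

def predictive_mean(w, x):
    """Mean of y given x: the inner product w^T x."""
    return sum(wi * xi for wi, xi in zip(w, x))

def sample_y(w, x, sigma, rng):
    """Draw y = w^T x + eps, with eps ~ N(0, sigma^2)."""
    return predictive_mean(w, x) + rng.gauss(0.0, sigma)

def density_y(y, w, x, sigma):
    """Gaussian density of y under N(w^T x, sigma^2)."""
    z = (y - predictive_mean(w, x)) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

# Illustrative values only (assumptions, not taken from the lecture).
w = [2.0, -1.0]
x = [1.0, 3.0]
sigma = 0.5
rng = random.Random(0)  # fixed seed so the sketch is repeatable

mean = predictive_mean(w, x)  # 2*1 + (-1)*3 = -1.0
samples = [sample_y(w, x, sigma, rng) for _ in range(10000)]
sample_mean = sum(samples) / len(samples)  # should lie close to the mean
peak_density = density_y(mean, w, x, sigma)  # density is highest at the mean
```

With many samples, `sample_mean` settles near `mean`, which is the sense in which w and σ fully determine the distribution of y for a given x.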

Two Interpretations

2. Geometric Interpretation

Fitting a straight line to d-dimensional data: $y = \mathbf{w}^\top \mathbf{x}$

$y = \mathbf{w}^\top \mathbf{x} = w_1 x_1 + w_2 x_2 + \ldots + w_d x_d$

Will pass through origin

Add intercept

$y = w_0 + w_1 x_1 + w_2 x_2 + \ldots + w_d x_d$

Equivalent to adding another column of 1s to X.
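
The equivalence between an explicit intercept and an extra column of 1s can be checked directly. The sketch below uses made-up numbers, and the helpers `add_intercept_column` and `predict` are hypothetical names chosen for illustration.

```python
def add_intercept_column(X):
    """Prepend a constant 1 to every row of X."""
    return [[1.0] + list(row) for row in X]

def predict(w, x):
    """Inner product w^T x."""
    return sum(wi * xi for wi, xi in zip(w, x))

# Illustrative values only (assumptions, not taken from the lecture).
X = [[2.0, 3.0], [0.0, 0.0], [-1.0, 4.0]]
w0 = 0.5            # intercept
w = [1.5, -2.0]     # weights for the two input dimensions

# Form 1: intercept written out explicitly.
explicit = [w0 + predict(w, row) for row in X]

# Form 2: intercept folded into the weight vector, with a column of 1s in X.
X_aug = add_intercept_column(X)
w_aug = [w0] + w
augmented = [predict(w_aug, row) for row in X_aug]

# Without an intercept, the prediction at the origin is forced to zero.
at_origin_no_intercept = predict(w, [0.0, 0.0])
```

Both forms give the same predictions, so the rest of the derivation can work with the model without an intercept term and treat the intercept as one more weight.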

Learning Parameters - MLE Approach

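Find $\mathbf{w}$ and $\sigma^2$ that maximize the likelihood of training data

$\hat{\mathbf{w}} = (X^\top X)^{-1} X^\top \mathbf{y}$

$\hat{\sigma}^2 = \frac{1}{N} (\mathbf{y} - X\mathbf{w})^\top (\mathbf{y} - X\mathbf{w})$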
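
As a rough sketch of how these closed-form estimates could be computed, the code below solves the normal equations with a small Gauss-Jordan routine and then evaluates the noise variance at the fitted weights. It uses only the standard library so that it stays self-contained. The toy data and the function names `solve` and `fit_mle` are assumptions made for illustration. In practice a numerical library's least-squares routine would usually be preferred over explicitly solving with the matrix of inner products.

```python
def solve(A, b):
    """Solve A z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + [b[i]] for i in range(n)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("X^T X is singular; the estimate is not unique")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [mr - factor * mc for mr, mc in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def fit_mle(X, y):
    """Return (w_hat, sigma2_hat) for the linear-Gaussian model."""
    columns = [list(c) for c in zip(*X)]  # columns of X, i.e. rows of X^T
    XtX = [[sum(a * b for a, b in zip(ci, cj)) for cj in columns]
           for ci in columns]
    Xty = [sum(a * b for a, b in zip(ci, y)) for ci in columns]
    w_hat = solve(XtX, Xty)  # solves (X^T X) w = X^T y
    residuals = [yi - sum(wj * xj for wj, xj in zip(w_hat, row))
                 for row, yi in zip(X, y)]
    sigma2_hat = sum(r * r for r in residuals) / len(y)  # divides by N
    return w_hat, sigma2_hat

# Toy data (made up): the first column of 1s plays the role of the intercept.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 8.0]

w_hat, sigma2_hat = fit_mle(X, y)
# By hand: w_hat = [0.8, 2.3], residuals = [0.2, -0.1, -0.4, 0.3],
# so sigma2_hat = 0.30 / 4 = 0.075.
```

Note that the variance estimate divides by N rather than N minus the number of weights, which matches the maximum likelihood formula rather than the unbiased estimator.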

Learning Parameters - Least Squares Approach

Minimize squared loss

x + where ∼ N (0, σ