
(1)

Non-Parametric Methods

+ Multiclass Classification

CS540-002, Spring 2015 Lecture 41

(2)

● Remember WHW3

● We have a final exam coming up

○ Thursday, May 14

○ 5:05 - 7:05 PM

○ You can bring TWO sheets (8.5x11) of notes

○ Exam is cumulative, but focuses on material from after the midterm.

(3)

Today:

● Non-Parametric Methods

(4)

Recall: Parametric Methods

With previous methods, we produce a model:

● DTs: A tree

● Linear Regression: A vector*

● Perceptrons: A vector*

● Logistic Regression: A vector*

(5)

Non-Parametric Methods

k-Nearest Neighbors classification:

Basic idea:

Given a test point x, find the k closest points in the training set, then predict the most common label among those k neighbors.
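A minimal sketch of this rule in Python, assuming Euclidean distance and a brute-force scan of the whole training set. The function name `knn_classify` and the (point, label) data layout are illustrative choices, not from the lecture.

```python
import math
from collections import Counter

def knn_classify(train, query, k=3):
    """Predict a label for `query` by majority vote among its k nearest training points.

    train: list of (point, label) pairs, where each point is a tuple of numbers.
    """
    # Brute force: sort every training point by Euclidean distance to the query.
    neighbors = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    # Majority vote. Counter.most_common keeps first-encountered order on ties,
    # so a tie goes to the label whose nearest member is closest to the query.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two clusters in two dimensions.
train = [((1, 1), "cat"), ((1, 2), "cat"), ((5, 5), "dog"), ((6, 5), "dog"), ((5, 6), "dog")]
print(knn_classify(train, (1.5, 1.5), k=3))  # cat
```

Nothing is "trained" here: every prediction rescans the stored data, which is what makes the method non-parametric.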

(6)

[Figure: 2-D data plotted on axes x₁ and x₂]

(7)

kNN: Implementation

The “learning” phase of kNN is putting the data into a structure such that queries are fast.

For example:

● A k-d Tree
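A small pure-Python sketch of a k-d tree, assuming Euclidean distance: building it is the "learning" phase, and a k-nearest query can skip whole subtrees whose splitting plane is farther away than the current k-th best point. The names `KDNode`, `build_kdtree`, and `kdtree_knn` are hypothetical; a production version would more likely use a library implementation.

```python
import heapq
import math
from dataclasses import dataclass
from typing import Optional

@dataclass
class KDNode:
    point: tuple
    label: str
    axis: int                  # coordinate this node splits on
    left: Optional["KDNode"]   # points with values <= this node's on `axis`
    right: Optional["KDNode"]  # points with values >= this node's on `axis`

def build_kdtree(items, depth=0):
    """Build a k-d tree from (point, label) pairs, cycling through the axes."""
    if not items:
        return None
    axis = depth % len(items[0][0])
    items = sorted(items, key=lambda item: item[0][axis])
    mid = len(items) // 2  # the median point becomes this node
    point, label = items[mid]
    return KDNode(point, label, axis,
                  build_kdtree(items[:mid], depth + 1),
                  build_kdtree(items[mid + 1:], depth + 1))

def kdtree_knn(root, query, k):
    """Return the k nearest (distance, label) pairs to `query`, closest first."""
    best = []  # max-heap of the k best so far, stored as (-distance, tiebreak, label)

    def visit(node):
        if node is None:
            return
        d = math.dist(node.point, query)
        if len(best) < k:
            heapq.heappush(best, (-d, id(node), node.label))
        elif d < -best[0][0]:  # closer than the current k-th best: replace it
            heapq.heapreplace(best, (-d, id(node), node.label))
        diff = query[node.axis] - node.point[node.axis]
        near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
        visit(near)
        # The far side can only help if the splitting plane is closer than the k-th best.
        if len(best) < k or abs(diff) < -best[0][0]:
            visit(far)

    visit(root)
    return sorted((-neg_d, label) for neg_d, _, label in best)

# Hypothetical training points with labels.
points = [((2, 3), "a"), ((5, 4), "b"), ((9, 6), "c"), ((4, 7), "d"), ((8, 1), "e"), ((7, 2), "f")]
tree = build_kdtree(points)
print(kdtree_knn(tree, (9, 2), k=2))
```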

(8)

kNN: A Potential Problem

[Figure: the data plotted with Height (Feet) on one axis]

(9)

kNN: A Potential Problem

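[Figure: the same data plotted with Height (cm) on one axis]

Changing the units of a feature stretches or shrinks its axis, which can change which training points are nearest to a query.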
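A tiny hypothetical example makes the problem concrete. We assume two features, height and weight (kg), and two made-up training points A and B; only the height unit changes between the two runs, yet the nearest neighbor flips.

```python
import math

FEET_TO_CM = 30.48

def nearest(train, query):
    """Name of the training point closest to `query` (Euclidean distance)."""
    return min(train, key=lambda name: math.dist(train[name], query))

# Hypothetical (height, weight in kg) data, with height in feet.
train_ft = {"A": (5.0, 60.0), "B": (6.0, 62.0)}
query_ft = (5.9, 59.0)

def to_cm(p):
    """Convert only the height coordinate from feet to centimeters."""
    return (p[0] * FEET_TO_CM, p[1])

train_cm = {name: to_cm(p) for name, p in train_ft.items()}
query_cm = to_cm(query_ft)

print(nearest(train_ft, query_ft))  # A
print(nearest(train_cm, query_cm))  # B
```

In feet, the weight difference dominates the distance; in centimeters, the height difference does.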

(10)

Data Normalization (z-score scaling):

Basic Idea:

The units used shouldn’t make a difference. So, convert everything to a “unitless” measure. For the ith feature:

Let m_i be the mean of that feature over the training data, and

let s_i be its standard deviation.

Replace x_i(j) by (x_i(j) - m_i) / s_i for every training example j.
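A minimal sketch of z-score scaling, assuming the population standard deviation (`statistics.pstdev`) and mapping constant features to 0 to avoid dividing by zero. Fitting and applying are split so that queries can reuse the training m_i and s_i, which is the usual practice. The function names and data are hypothetical.

```python
import statistics

def fit_zscore(rows):
    """Per-feature mean m_i and standard deviation s_i, computed from training rows."""
    columns = list(zip(*rows))
    means = [statistics.mean(col) for col in columns]
    stds = [statistics.pstdev(col) for col in columns]  # population std (an assumption)
    return means, stds

def apply_zscore(rows, means, stds):
    """Replace each x_i(j) by (x_i(j) - m_i) / s_i; constant features (s_i == 0) map to 0."""
    return [
        tuple((x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, stds))
        for row in rows
    ]

# Hypothetical (height, weight) rows; heights in feet, then the same heights in cm.
rows_ft = [(5.0, 60.0), (5.5, 70.0), (6.0, 80.0)]
rows_cm = [(h * 30.48, w) for h, w in rows_ft]
z_ft = apply_zscore(rows_ft, *fit_zscore(rows_ft))
z_cm = apply_zscore(rows_cm, *fit_zscore(rows_cm))
print(z_ft)
print(z_cm)  # same values as z_ft, up to floating-point rounding
```

Because z-scores divide out the unit, the feet and centimeter versions of the data end up identical.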

(11)

Data Standardization (min-max scaling):

Basic Idea:

The units used shouldn’t make a difference. So, convert everything to a “unitless” measure. For the ith feature:

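Let m_i be the max of that feature over the training data.

Let n_i be the min of that feature.

Let r_i = m_i - n_i (the observed range).

Replace x_i(j) with (x_i(j) - n_i) / r_i for every training example j. On the training data, every feature then lies in [0, 1].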
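The min-max version can be sketched the same way. As before, constant features (r_i = 0) are mapped to 0 by assumption, and the function names are hypothetical. Note that a query outside the training range lands outside [0, 1].

```python
def fit_minmax(rows):
    """Per-feature min n_i and range r_i = m_i - n_i, computed from training rows."""
    columns = list(zip(*rows))
    mins = [min(col) for col in columns]
    ranges = [max(col) - min(col) for col in columns]
    return mins, ranges

def apply_minmax(rows, mins, ranges):
    """Replace each x_i(j) by (x_i(j) - n_i) / r_i; constant features (r_i == 0) map to 0."""
    return [
        tuple((x - n) / r if r > 0 else 0.0 for x, n, r in zip(row, mins, ranges))
        for row in rows
    ]

train = [(1.0, 10.0), (3.0, 20.0), (5.0, 30.0)]
mins, ranges = fit_minmax(train)
print(apply_minmax(train, mins, ranges))          # [(0.0, 0.0), (0.5, 0.5), (1.0, 1.0)]
print(apply_minmax([(7.0, 15.0)], mins, ranges))  # a new query can fall outside [0, 1]
```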

(12)

Non-Parametric Regression

(13)

Non-parametric Regression:

Piecewise Linear Non-Parametric Regression

Given a query x and training set T:

Let (L, y(L)) be the point in T with the largest input L such that L ≤ x.

Let (R, y(R)) be the point in T with the smallest input R such that R ≥ x.

h(x; T) = α y(R) + (1 - α) y(L), where α = (x - L) / (R - L), so α runs from 0 at x = L to 1 at x = R.
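A sketch of this rule for 1-D inputs, assuming the training set is a list of (x_j, y_j) pairs already sorted by x_j. The slide does not say what happens outside the training range (where L or R does not exist); clamping to the nearest endpoint is our assumption, as is the name `piecewise_linear`.

```python
import bisect

def piecewise_linear(train, x):
    """Interpolate between the training points just left (L) and right (R) of x.

    train: list of (x_j, y_j) pairs with scalar inputs, sorted by x_j.
    """
    xs = [xj for xj, _ in train]
    i = bisect.bisect_left(xs, x)  # first index with xs[i] >= x, i.e. the R point
    if i < len(xs) and xs[i] == x:
        return train[i][1]  # query coincides with a training input (L = R = x)
    if i == 0:
        return train[0][1]   # no L exists: clamp to the first point (assumption)
    if i == len(xs):
        return train[-1][1]  # no R exists: clamp to the last point (assumption)
    (L, yL), (R, yR) = train[i - 1], train[i]
    alpha = (x - L) / (R - L)
    return alpha * yR + (1 - alpha) * yL

train = [(0.0, 0.0), (2.0, 4.0), (4.0, 0.0)]
print(piecewise_linear(train, 1.0))  # 2.0
```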

(14)

Non-Parametric Regression:

[Figure: piecewise linear fit, with y on the vertical axis and the interpolation weight α labeled]

(15)

Non-parametric Regression:

kNN Averaging

Given a query x and training set T:

Let (x₁, y₁), …, (x_k, y_k) be the k closest* points to x in T.

h(x; T) = mean(y₁, …, y_k)
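A minimal sketch of kNN averaging, assuming Euclidean distance and a brute-force search. The name `knn_average` and the toy data are hypothetical.

```python
import math
from statistics import mean

def knn_average(train, query, k=3):
    """kNN regression: average the targets of the k training points closest to `query`.

    train: list of (point, y) pairs, where each point is a tuple of numbers.
    """
    neighbors = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    return mean(y for _, y in neighbors)

# Hypothetical 1-D data (points are 1-tuples) sampled from y = x^2.
train = [((0,), 0.0), ((1,), 1.0), ((2,), 4.0), ((3,), 9.0)]
print(knn_average(train, (1.2,), k=2))  # 2.5, the mean of y at x = 1 and x = 2
```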

(16)

(17)

Non-parametric Regression:

kNN (Linear) Regression

Given a query x and training set T:

Let (x₁, y₁), …, (x_k, y_k) be the k closest* points to x in T.

Let w be the linear regression weights fit to just those k points, and return h_w(x).
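A 1-D sketch of kNN linear regression, under our reading of the method: fit an ordinary least-squares line y = w0 + w1·x to only the k nearest training points, then evaluate it at the query. The scalar-input assumption, the fallback when all neighbors share one x value, and the function name are ours.

```python
def knn_linear_regression(train, x, k=3):
    """Least-squares line through the k nearest training points, evaluated at x.

    train: list of (x_j, y_j) pairs with scalar inputs (1-D assumption).
    """
    neighbors = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    n = len(neighbors)
    x_bar = sum(xj for xj, _ in neighbors) / n
    y_bar = sum(yj for _, yj in neighbors) / n
    sxx = sum((xj - x_bar) ** 2 for xj, _ in neighbors)
    if sxx == 0:  # all neighbors share one x value: fall back to their average
        return y_bar
    w1 = sum((xj - x_bar) * (yj - y_bar) for xj, yj in neighbors) / sxx
    w0 = y_bar - w1 * x_bar
    return w0 + w1 * x

# Hypothetical data from the curve y = x^2; the local line follows the curve near x = 5.
train = [(x, x ** 2) for x in range(11)]
print(knn_linear_regression(train, 5, k=3))  # about 25.67, the mean of 16, 25, 36
```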

(18)

(19)

Non-parametric Regression:

Given a query point x, a training set T, and a kernel K:

Compute a weight vector w by fitting linear regression to an (imaginary) dataset in which each training point (x_j, y_j) is weighted by K(x, x_j), so points near x count the most.

Now return h_w(x).
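A 1-D sketch of this locally weighted fit. The slide leaves K generic; here we assume a Gaussian kernel K(x, x_j) = exp(-(x - x_j)² / (2τ²)) with a hypothetical bandwidth τ (`tau`), and solve the weighted least-squares problem for a line in closed form.

```python
import math

def lwlr(train, x, tau=1.0):
    """Locally weighted linear regression at a scalar query x (1-D sketch).

    Each (x_j, y_j) is weighted by a Gaussian kernel (an assumption) with bandwidth tau;
    we fit y = w0 + w1 * x by weighted least squares and evaluate the line at x.
    """
    k = [math.exp(-((x - xj) ** 2) / (2 * tau ** 2)) for xj, _ in train]
    total = sum(k)
    if total == 0:
        raise ValueError("query is too far from every training point for this tau")
    # Weighted means of the inputs and targets.
    x_bar = sum(kj * xj for kj, (xj, _) in zip(k, train)) / total
    y_bar = sum(kj * yj for kj, (_, yj) in zip(k, train)) / total
    # Closed-form weighted least-squares slope and intercept.
    sxx = sum(kj * (xj - x_bar) ** 2 for kj, (xj, _) in zip(k, train))
    sxy = sum(kj * (xj - x_bar) * (yj - y_bar) for kj, (xj, yj) in zip(k, train))
    if sxx == 0:
        return y_bar
    w1 = sxy / sxx
    w0 = y_bar - w1 * x_bar
    return w0 + w1 * x

# Hypothetical data from y = |x|: a single global line fits poorly, a local one does not.
train = [(x, abs(x)) for x in range(-5, 6)]
print(lwlr(train, 3.0, tau=0.5))  # very close to 3
```

A small τ makes the fit very local; a large τ approaches ordinary (global) linear regression.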

(20)

(21)

LWLR (Locally Weighted Linear Regression):

(22)

Advantages (of Non-Parametric Methods):

● They can easily leverage locality in the data.

○ Example: Suppose we can separate Small Cats from Small Dogs using some particular decision boundary, but a completely different boundary works better for Big Cats vs Big Dogs.

(23)

Issues in the Real World:

Suppose we’re classifying cats vs dogs.

(24)

Multi-Class Classification

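● Not to be confused with Multi-Label Classification (in multi-class, each example has exactly one of several labels; in multi-label, an example can have several labels at once)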

● No big difference from the binary case for some algorithms

○ kNN

(25)

Multiclass Classification

With Linear Models: Perceptrons

(26)

Multiclass Classification

Training:

Relabel the data as: Class 1 vs. Not Class 1

Learn Perceptron h₁

Repeat for Class 2, Class 3, ...

(27)

Multiclass Classification

Testing:

Given x, compute h₁(x), h₂(x), ...

Take whichever one says “Positive”.
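A pure-Python sketch of this one-vs-rest scheme, assuming a standard perceptron with a bias term, learning rate 1, and an epoch cap we chose. `predict_positive` follows the slide's testing rule literally; `predict_argmax` (pick the class with the largest raw score) is a common variant we add as an assumption, since the slide's rule can return zero or several classes. All names and data are hypothetical.

```python
def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def train_perceptron(X, y, max_epochs=1000):
    """Binary perceptron for targets y in {+1, -1}; returns (weights, bias)."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, t in zip(X, y):
            if t * (dot(w, x) + b) <= 0:  # wrong side of (or on) the boundary
                w = [wi + t * xi for wi, xi in zip(w, x)]
                b += t
                mistakes += 1
        if mistakes == 0:  # every training point is classified correctly
            break
    return w, b

def train_one_vs_rest(X, labels):
    """For each class c, relabel the data as c (+1) vs. not c (-1) and learn a perceptron."""
    return {
        c: train_perceptron(X, [1 if lab == c else -1 for lab in labels])
        for c in sorted(set(labels))
    }

def predict_positive(models, x):
    """Testing rule from the slide: every class whose perceptron says Positive."""
    return [c for c, (w, b) in models.items() if dot(w, x) + b > 0]

def predict_argmax(models, x):
    """Common variant (our assumption): the class whose perceptron gives the largest score."""
    return max(models, key=lambda c: dot(models[c][0], x) + models[c][1])

X = [(0, 5), (1, 6), (5, 0), (6, 1), (-5, -5), (-6, -4)]
labels = ["bird", "bird", "plane", "plane", "frog", "frog"]
models = train_one_vs_rest(X, labels)
print(predict_positive(models, (0, 5)))  # ['bird']
```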

(28)

Multiclass Classification

With Linear Models: Perceptrons

[Figure: data plotted on axes x₁ and x₂]

(29)

Multiclass Classification

Problem:

x might not be a bird, nor a plane, nor even a frog (all Perceptrons say Negative).

When your data isn’t linearly separable, multiple Perceptrons might say Positive.

Solution?

(30)

An Additional Problem:

Suppose there are C classes.

(31)

1-of-k Encoding:

Point   x₁   x₂   y
A        4   -1   2
B       -1    5   3
C       -1   -2   1
D       -1    4   1

Point   y₁   y₂   y₃
A        0    1    0
B        0    0    1
C        1    0    0
D        1    0    0
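The encoding in the table above can be sketched in a few lines, assuming labels are numbered 1 through C as in the table. The function name `one_hot` is our choice.

```python
def one_hot(label, num_classes):
    """1-of-k encoding: a label in 1..num_classes becomes a 0/1 vector with a single 1."""
    return [1 if c == label else 0 for c in range(1, num_classes + 1)]

# Labels y for points A, B, C, D from the table.
ys = [2, 3, 1, 1]
print([one_hot(y, 3) for y in ys])  # [[0, 1, 0], [0, 0, 1], [1, 0, 0], [1, 0, 0]]
```

Each column y_c then serves as the "Class c vs. Not Class c" target for one binary classifier.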

(32)

1-of-k Encoding:

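We’re using 1 bit per class, i.e., C bits (and C binary classifiers) for C classes. We can do better: ⌈log₂ C⌉ bits are enough to give every class a distinct code.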
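One compact alternative, sketched under our own assumption that each class simply gets its 0-based index written in binary: C classes then need only ⌈log₂ C⌉ bits, and each bit is the target of one binary classifier. The function names are hypothetical.

```python
def code_length(num_classes):
    """Bits needed to give each of num_classes classes a distinct binary code."""
    return max(1, (num_classes - 1).bit_length())

def encode(class_index, num_classes):
    """Binary code of a 0-based class index, most significant bit first.

    Bit i is the target that binary classifier i is trained to predict.
    """
    n = code_length(num_classes)
    return [(class_index >> (n - 1 - i)) & 1 for i in range(n)]

def decode(bits):
    """Combine the classifiers' predicted bits back into a class index."""
    index = 0
    for bit in bits:
        index = (index << 1) | bit
    return index

classes = ["Car", "Truck", "Cat", "Dog"]
for i, name in enumerate(classes):
    print(name, encode(i, len(classes)))
# Car [0, 0], Truck [0, 1], Cat [1, 0], Dog [1, 1]
```

With this ordering, the first bit separates Car/Truck from Cat/Dog and the second separates Car/Cat from Truck/Dog. When C is not a power of two, some codes go unused, and a decoder would need to handle predictions that land on one (not shown here).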

(33)

1-of-k Encoding:

(34)

Fancier Encoding:

(35)

1-of-k Encoding: Example

Classifying Cars vs Trucks vs Cats vs Dogs

Classifier 1: Car or Not Car

Classifier 2: Truck or Not Truck

Classifier 3: Cat or Not Cat

(36)

Fancier Encoding: Example

Classifier 1: Car/Truck vs Cat/Dog

Classifier 2: Car vs Truck

(37)

Fancier Encoding: Example

Classifier 1: Car/Cat vs Truck/Dog

Classifier 2: Car vs Cat