Non-Parametric Methods
+ Multiclass Classification
CS540-002, Spring 2015 Lecture 41
● Remember WHW3
● We have a final exam coming up
○ Thursday, May 14
○ 5:05 - 7:05 PM
○ You can bring TWO sheets (8.5x11) of notes
○ Exam is cumulative, but focuses on material from after the midterm.
Today:
● Non-Parametric Methods
Recall: Parametric Methods
With previous methods, we produce a model:
● DTs: A tree
● Linear Regression: A vector*
● Perceptrons: A vector*
● Logistic Regression: A vector*
Non-Parametric Methods
k-Nearest Neighbors classification:
Basic idea:
Given a test point x, find the k closest points in the training set, then predict the most common label among those k points.
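The rule above fits in a few lines. This is a minimal sketch, assuming Euclidean distance and a plain majority vote (the slide fixes neither the distance metric nor the tie-breaking rule); the function name `knn_classify` is hypothetical.

```python
import math
from collections import Counter

def knn_classify(train, x, k):
    """Predict a label for query x from its k nearest training points.

    train: list of (point, label) pairs, where each point is a tuple of numbers.
    Distance is Euclidean (an assumption of this sketch). Vote ties go to the
    label first seen among the nearest points, because Counter.most_common
    keeps first-encountered order for equal counts.
    """
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

There is no training step at all here: every query scans the whole training set, which is exactly what the next slide's data structures are meant to speed up.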
kNN: Implementation
The “learning” phase of kNN is putting the data into a structure such that queries are fast.
For example:
● A k-d Tree
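To make the "learning as building a structure" idea concrete, here is a small pure-Python k-d tree sketch: building splits at the median along one coordinate per level, and the query prunes any subtree whose splitting plane is farther away than the current k-th best distance. The names `KDNode`, `build_kdtree`, and `kdtree_knn` are hypothetical; real code would more likely use a library tree such as SciPy's `scipy.spatial.KDTree`.

```python
import heapq
import math

class KDNode:
    """One node of a k-d tree: a training point, its label, and the axis it splits on."""
    def __init__(self, point, label, axis, left, right):
        self.point, self.label, self.axis = point, label, axis
        self.left, self.right = left, right

def build_kdtree(items, depth=0):
    """Build a k-d tree from (point, label) pairs, splitting at the median
    along one coordinate per level and cycling through the coordinates."""
    if not items:
        return None
    axis = depth % len(items[0][0])
    items = sorted(items, key=lambda item: item[0][axis])
    mid = len(items) // 2
    point, label = items[mid]
    return KDNode(point, label, axis,
                  build_kdtree(items[:mid], depth + 1),
                  build_kdtree(items[mid + 1:], depth + 1))

def kdtree_knn(root, x, k):
    """Return the k nearest (distance, point, label) triples to x, closest first."""
    best = []  # max-heap via negated distances: (-distance, tiebreak, point, label)

    def visit(node):
        if node is None:
            return
        d = math.dist(node.point, x)
        entry = (-d, id(node), node.point, node.label)
        if len(best) < k:
            heapq.heappush(best, entry)
        elif d < -best[0][0]:
            heapq.heapreplace(best, entry)  # evict the current k-th nearest
        diff = x[node.axis] - node.point[node.axis]
        near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
        visit(near)
        # The far side can only hold a closer point if the splitting plane
        # is nearer than the current k-th best distance.
        if len(best) < k or abs(diff) < -best[0][0]:
            visit(far)

    visit(root)
    return sorted(((-neg_d, p, lbl) for neg_d, _, p, lbl in best), key=lambda t: t[0])
```

The build cost is paid once; afterwards a query on well-spread, low-dimensional data typically touches only a small part of the tree instead of every training point.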
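kNN: A Potential Problem
[Figures: the same data plotted with Height (Feet), then with Height (cm)]
Changing the units of a feature stretches or shrinks its axis, which can change which training points are the k nearest to a query.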
Data Normalization:
Basic Idea:
The units used shouldn’t make a difference. So, convert everything to a “unitless” measure. For the ith feature:
Let $m_i$ be the mean of that feature (that we observed), and let $s_i$ be its standard deviation.
Replace $x_i^{(j)}$ by $(x_i^{(j)} - m_i) / s_i$ for every training example j.
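A minimal sketch of this per-feature transform, assuming the data is a list of rows of numbers. Two choices here are assumptions the slide does not make: the population standard deviation (divide by n), and using $s_i = 1$ for a constant feature to avoid dividing by zero. The name `zscore_normalize` is hypothetical.

```python
import statistics

def zscore_normalize(rows):
    """Replace each feature value x_i^(j) by (x_i^(j) - m_i) / s_i.

    rows: list of examples, each a list of numeric features.
    Returns (scaled_rows, means, stdevs); keep means/stdevs so the same
    transform can be applied to test points later.
    """
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # Population standard deviation (an assumption of this sketch).
    # A constant feature (s_i = 0) gets s_i = 1 to avoid dividing by zero.
    stdevs = [statistics.pstdev(col) or 1.0 for col in columns]
    scaled = [[(v - m) / s for v, m, s in zip(row, means, stdevs)] for row in rows]
    return scaled, means, stdevs
```

Heights recorded in cm and the same heights recorded in feet come out with the same normalized values (up to rounding), which is the "units shouldn't make a difference" goal.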
Data Standardization:
Basic Idea:
The units used shouldn’t make a difference. So, convert everything to a “unitless” measure. For the ith feature:
Let $m_i$ be the max of that feature (that we observed).
Let $n_i$ be the min of that feature.
Let $r_i = m_i - n_i$.
Replace $x_i^{(j)}$ with $(x_i^{(j)} - n_i) / r_i$ for every j.
Every training value of the feature then lies in [0, 1].
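The same idea with the min/max rescaling, as a minimal sketch under the same list-of-rows assumption. Treating a zero range as 1 is an assumption added here to avoid dividing by zero; the name `minmax_scale` is hypothetical.

```python
def minmax_scale(rows):
    """Replace each feature value x_i^(j) by (x_i^(j) - n_i) / r_i, where
    n_i is the observed min and r_i = m_i - n_i is the observed range.

    Returns (scaled_rows, mins, ranges) so test points can be scaled the same way.
    """
    columns = list(zip(*rows))
    mins = [min(col) for col in columns]
    # A constant feature has range 0; use 1 instead (an assumption of this sketch).
    ranges = [(max(col) - lo) or 1.0 for col, lo in zip(columns, mins)]
    scaled = [[(v - n) / r for v, n, r in zip(row, mins, ranges)] for row in rows]
    return scaled, mins, ranges
```

Note that a later test point outside the observed min/max will land outside [0, 1], since $n_i$ and $r_i$ come from the training data only.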
Non-Parametric Regression
Non-parametric Regression:
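Piecewise Linear Non-Parametric Regression
Given a query x and training set T:
Let (L, y(L)) be the largest* point in T such that L ≤ x.
Let (R, y(R)) be the smallest* point in T such that R ≥ x.
$h(x; T) = \alpha\, y(R) + (1 - \alpha)\, y(L)$, where $\alpha = \frac{x - L}{R - L}$.
So $\alpha$ runs from 0 at x = L to 1 at x = R, and h traces the straight line between the two neighboring training points.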
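A minimal sketch of this interpolation for scalar inputs, assuming T is sorted by input with no duplicate inputs. What to do for a query outside the range of the training inputs is not covered by the slide, so this sketch simply raises an error; the name `piecewise_linear` is hypothetical.

```python
import bisect

def piecewise_linear(T, x):
    """h(x; T) = alpha * y(R) + (1 - alpha) * y(L), alpha = (x - L) / (R - L).

    T: list of (x_j, y_j) pairs with scalar inputs, sorted by x_j, no
    duplicate x_j (assumptions of this sketch).
    """
    xs = [xj for xj, _ in T]
    i = bisect.bisect_left(xs, x)  # first index with xs[i] >= x, so R = xs[i]
    if i < len(xs) and xs[i] == x:
        return T[i][1]  # x is itself a training input: L = R = x
    if i == 0 or i == len(xs):
        # No L or no R exists; this sketch refuses to extrapolate.
        raise ValueError("query lies outside the range of the training inputs")
    (L, yL), (R, yR) = T[i - 1], T[i]
    alpha = (x - L) / (R - L)
    return alpha * yR + (1 - alpha) * yL
```

Using binary search on the sorted inputs finds L and R in logarithmic time, so the "model" is just the sorted training set.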
Non-Parametric Regression:
Given a query x and training set T:
Let (x1, y1), …, (xk, yk) be the k closest* points to x in T.
h(x; T) = mean(y1, …, yk)
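A minimal sketch of kNN averaging for scalar inputs, taking "closest" to mean smallest $|x_j - x|$ (an assumption; the slide's distance is unspecified). The name `knn_average` is hypothetical.

```python
def knn_average(T, x, k):
    """h(x; T) = mean of the y values of the k training points closest to x.

    T: list of (x_j, y_j) pairs with scalar inputs; distance is |x_j - x|
    (an assumption of this sketch).
    """
    nearest = sorted(T, key=lambda pair: abs(pair[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)
```

With k = 1 this returns the y value of the single nearest training point; larger k smooths the prediction.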
Non-parametric Regression:
kNN Averaging
Given a query x and training set T:
Let (x1, y1), …, (xk, yk) be the k closest* points to x in T.
Let:
Non-parametric Regression:
kNN (Linear) Regression
Given a query point x, a training set T, and a kernel K:
Compute a weight vector w for an (imaginary) dataset in which each training point $(x_j, y_j)$ is weighted by $K(x, x_j)$, i.e. fit linear regression with those per-point weights.
Now return $h_{\mathbf{w}}(x)$.
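A minimal sketch of this procedure for a single scalar feature, using the closed-form weighted least-squares line. The Gaussian kernel and its bandwidth `tau` are assumptions (the slide leaves K open), as are the names `gaussian_kernel` and `kernel_linear_regression`.

```python
import math

def gaussian_kernel(x, xj, tau=1.0):
    """K(x, x_j) = exp(-(x - x_j)^2 / (2 tau^2)); tau (the bandwidth) is an assumed choice."""
    return math.exp(-((x - xj) ** 2) / (2 * tau ** 2))

def kernel_linear_regression(T, x, kernel=gaussian_kernel):
    """Fit h_w(z) = w0 + w1 * z by least squares with each training pair
    (x_j, y_j) weighted by kernel(x, x_j), then return h_w(x).

    T: list of (x_j, y_j) pairs with scalar inputs. Assumes at least one
    kernel weight is non-zero.
    """
    k = [kernel(x, xj) for xj, _ in T]
    total = sum(k)
    # Weighted means of the inputs and targets.
    xbar = sum(kj * xj for kj, (xj, _) in zip(k, T)) / total
    ybar = sum(kj * yj for kj, (_, yj) in zip(k, T)) / total
    # Closed-form weighted least-squares slope and intercept for one feature.
    sxy = sum(kj * (xj - xbar) * (yj - ybar) for kj, (xj, yj) in zip(k, T))
    sxx = sum(kj * (xj - xbar) ** 2 for kj, (xj, _) in zip(k, T))
    w1 = sxy / sxx if sxx > 0 else 0.0  # flat line if all weight sits on one x value
    w0 = ybar - w1 * xbar
    return w0 + w1 * x
```

A new w is fitted for every query, which is what makes the method non-parametric. Passing a constant kernel such as `lambda a, b: 1.0` weights every point equally and reduces this to ordinary least squares.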
Non-parametric Regression:
LWLR (Locally Weighted Linear Regression):
Advantages (of Non-Parametric Methods):
● They can easily leverage locality in the data.
○ Example:
Suppose we can separate Small Cats from Small Dogs using one particular decision boundary, but a completely different boundary works better for Big Cats vs Big Dogs.
Issues in the Real World:
Suppose we’re classifying cats vs dogs.
Multi-Class Classification
● Not to be confused with Multi-Label Classification
● No big difference from the binary case for some algorithms
○ kNN
Multiclass Classification
With Linear Models: Perceptrons
Training:
Relabel the data as: Class 1 vs. Not Class 1
Learn Perceptron h1
Repeat for Class 2, Class 3, ...
Multiclass Classification
Testing:
Given x, compute h1(x), h2(x), ...
Take whichever one says “Positive”.
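A minimal sketch of this one-vs-rest scheme. The perceptron details (classic mistake-driven update, a constant 1 appended as a bias input, 20 passes over the data) are assumptions, and all function names are hypothetical. Testing returns every class whose perceptron says Positive, rather than forcing a single answer, so the empty and multiple-answer cases stay visible.

```python
def train_perceptron(X, y, epochs=20):
    """Classic perceptron on labels y_j in {+1, -1}; a constant 1 is appended
    to each input so the last weight acts as a bias."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        for xj, yj in zip(X, y):
            xb = list(xj) + [1.0]
            if yj * sum(wi * xi for wi, xi in zip(w, xb)) <= 0:  # mistake or on boundary
                w = [wi + yj * xi for wi, xi in zip(w, xb)]
    return w

def says_positive(w, x):
    """True if the perceptron with weights w puts x on the positive side."""
    return sum(wi * xi for wi, xi in zip(w, list(x) + [1.0])) > 0

def train_one_vs_rest(X, labels):
    """Training: for each class c, relabel the data as c (+1) vs not c (-1) and learn h_c."""
    return {c: train_perceptron(X, [1 if lab == c else -1 for lab in labels])
            for c in sorted(set(labels))}

def classes_saying_positive(models, x):
    """Testing: compute every h_c(x) and report each class whose perceptron says Positive."""
    return [c for c, w in models.items() if says_positive(w, x)]
```

With three well-separated, hypothetical clusters labeled "bird", "plane", and "frog", a point inside a cluster gets exactly one class, while a point between the clusters can get an empty list.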
Multiclass Classification
With Linear Models: Perceptrons
Problems:
x might be not a bird, nor a plane, nor even a frog (all Perceptrons say Negative).
When your data isn't linearly separable, more than one Perceptron might say Positive.
Solution?
Multiclass Classification
An Additional Problem:
Suppose there are C classes.
1-of-k Encoding:
Point   x1   x2   y
A        4   -1   2
B       -1    5   3
C       -1   -2   1
D       -1    4   1

Encoded with one bit per class (here C = 3), with a 1 in position y:
Point   y1   y2   y3
A        0    1    0
B        0    0    1
C        1    0    0
D        1    0    0
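The encoding in the table can be produced mechanically. A minimal sketch, assuming classes are numbered 1 to C as in the y column; the name `one_hot` is hypothetical.

```python
def one_hot(labels, num_classes):
    """1-of-k encoding: class c (numbered 1..C) becomes a length-C list
    with a 1 in position c and 0 everywhere else."""
    return [[1 if c == label else 0 for c in range(1, num_classes + 1)]
            for label in labels]
```

Each column of the result is the target for one of the "Class c vs. Not Class c" classifiers.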
1-of-k Encoding:
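We're using 1 bit (one classifier) per class. We can do better: $\lceil \log_2 C \rceil$ bits are enough to give each of the C classes a distinct code.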
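One way to use only $\lceil \log_2 C \rceil$ bits is to write each class number in binary. This is a minimal sketch of that idea, assuming classes are numbered 1 to C; the binary numbering and the name `binary_code` are illustrative choices, not necessarily the encoding the slides have in mind.

```python
def binary_code(label, num_classes):
    """Encode class `label` (numbered 1..C) with ceil(log2 C) bits, most
    significant bit first; each bit position would get its own binary classifier."""
    bits = max(1, (num_classes - 1).bit_length())  # equals ceil(log2 C) for C >= 2
    index = label - 1
    return [(index >> b) & 1 for b in reversed(range(bits))]
```

Numbering Car = 1, Truck = 2, Cat = 3, Dog = 4 (an assumption), the first bit separates Car/Truck from Cat/Dog and the second separates Car/Cat from Truck/Dog, so 2 classifiers replace 4.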
1-of-k Encoding:
Fancier Encoding:
1-of-k Encoding: Example
Classifying Cars vs Trucks vs Cats vs Dogs
Classifier 1: Car or Not Car
Classifier 2: Truck or Not Truck
Classifier 3: Cat or Not Cat
Fancier Encoding: Example
Classifying Cars vs Trucks vs Cats vs Dogs
Classifier 1: Car/Truck vs Cat/Dog
Classifier 2: Car vs Truck
Fancier Encoding: Example
Classifying Cars vs Trucks vs Cats vs Dogs
Classifier 1: Car/Cat vs Truck/Dog
Classifier 2: Car vs Cat