Lecture 2: Image Classification pipeline

(1)

Fei-Fei Li & Andrej Karpathy, Lecture 2, 7 Jan 2015

(2)

Image Classification: a core task in Computer Vision

Given a set of discrete labels, e.g. {dog, cat, truck, plane, ...}, assign one of them to an input image (in the slide's example: cat).

(3)

The problem:

semantic gap

Images are represented as arrays of numbers, with each pixel a vector in R^d

•  E.g. d = 3 with integers in [0, 255], one per color channel (RGB)
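A minimal sketch of this representation, assuming plain nested Python lists stand in for an image array. The 2x2 image and its pixel values are made up for illustration.

```python
# A tiny 2x2 RGB "image" as nested lists: image[row][col] is one pixel,
# and each pixel holds d = 3 integers in [0, 255] (red, green, blue).
image = [
    [[255, 0, 0], [0, 255, 0]],      # red pixel, green pixel
    [[0, 0, 255], [255, 255, 255]],  # blue pixel, white pixel
]

height = len(image)
width = len(image[0])
channels = len(image[0][0])

# Flatten to one long vector of height * width * channels numbers,
# which is the form a pixel-wise distance would operate on.
flat = [value for row in image for pixel in row for value in pixel]

print(height, width, channels)  # 2 2 3
print(len(flat))                # 12
```

To the computer the image is only these 12 numbers; the label "cat" is not stored anywhere in them, which is the semantic gap.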

(11)

An image classifier

Unlike e.g. sorting a list of numbers, there is no obvious way to hard-code an algorithm for recognizing a cat, or other classes.

(12)

Data-driven approach:

1.  Collect a dataset of images and label them

2.  Use Machine Learning to train an image classifier

3.  Evaluate the classifier on a withheld set of test images
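The three steps can be sketched as a train / predict / evaluate interface. The function names, the toy one-pixel "images", and the placeholder model (always predict the most common training label) are assumptions for illustration, not the classifier used in the lecture.

```python
from collections import Counter

def train(train_images, train_labels):
    """Build a model from labeled examples.

    Placeholder model: just the most common training label.
    """
    return Counter(train_labels).most_common(1)[0][0]

def predict(model, test_images):
    """Return one predicted label per test image."""
    return [model for _ in test_images]

def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

# 1. Collect and label data (toy 1-pixel "images", made up for illustration).
train_images = [[10], [12], [200], [11]]
train_labels = ["cat", "cat", "dog", "cat"]
test_images = [[9], [210]]
test_labels = ["cat", "dog"]

# 2. Train a classifier, 3. evaluate it on the withheld test images.
model = train(train_images, train_labels)
predictions = predict(model, test_images)
print(predictions, accuracy(predictions, test_labels))  # ['cat', 'cat'] 0.5
```

Any real classifier would replace the bodies of train and predict while keeping the same three-step flow.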

Example training set

(13)

First classifier: Nearest Neighbor Classifier

Remember all training images and their labels

Predict the label of the most similar training image

(14)

Example dataset: CIFAR-10

10 labels

50,000 training images

10,000 test images.

(15)

Example dataset: CIFAR-10

10 labels

50,000 training images

10,000 test images.

For every test image (first column), examples of nearest neighbors in rows

(16)

How do we compare the images? What is the distance metric?

L1 distance:

$$d_1(I_1, I_2) = \sum_p \left| I_1^p - I_2^p \right|$$

where $I_1$ denotes image 1, $I_2$ denotes image 2, and p denotes each pixel
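A small worked example of the L1 distance (sum of absolute pixel-wise differences), assuming both images have already been flattened to lists of the same length. The four pixel values are made up for illustration.

```python
def l1_distance(image_a, image_b):
    """Sum of absolute pixel-wise differences between two flattened images."""
    if len(image_a) != len(image_b):
        raise ValueError("images must have the same number of values")
    return sum(abs(a - b) for a, b in zip(image_a, image_b))

test_image = [56, 32, 10, 18]
train_image = [10, 20, 24, 17]

# |56-10| + |32-20| + |10-24| + |18-17| = 46 + 12 + 14 + 1
print(l1_distance(test_image, train_image))  # 73
```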

(17)

Nearest Neighbor classifier

(18)

Nearest Neighbor classifier

remember the training data

(19)

Nearest Neighbor classifier

for every test image:

-  find nearest train image with L1 distance

-  predict the label of nearest training image
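The code shown on these slides did not survive extraction. The following is a minimal sketch of the same two steps in plain Python; the class name, method names, and toy data are assumptions, not the lecture's own code.

```python
class NearestNeighbor:
    def train(self, images, labels):
        # "Training" only remembers the data.
        self.images = images
        self.labels = labels

    def predict(self, test_images):
        predictions = []
        for test in test_images:
            # L1 distance from this test image to every training image
            distances = [
                sum(abs(a - b) for a, b in zip(test, train))
                for train in self.images
            ]
            # Index of the closest training image (first one wins on ties)
            nearest = distances.index(min(distances))
            predictions.append(self.labels[nearest])
        return predictions

nn = NearestNeighbor()
nn.train(
    [[0, 0, 0], [255, 255, 255], [250, 10, 10]],  # toy single-pixel RGB images
    ["black", "white", "red"],
)
print(nn.predict([[10, 5, 0], [240, 30, 20]]))  # ['black', 'red']
```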

(20)

Nearest Neighbor classifier

Q: What is the complexity of the NN classifier w.r.t. a training set of N images and a test set of M images?

1.  at training time?

2.  at test time?

(21)

Nearest Neighbor classifier

Q: What is the complexity of the NN classifier w.r.t. a training set of N images and a test set of M images?

1.  at training time? O(1)

2.  at test time? O(NM)

(22)

Nearest Neighbor classifier

1.  at training time? O(1)

2.  at test time? O(NM)

This is backwards:

-  test time performance is usually much more important.

-  CNNs flip this: expensive training, cheap test evaluation
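To make the O(NM) test-time cost concrete, this sketch counts image-to-image comparisons for a brute-force nearest neighbor search. The helper name is hypothetical, and each comparison itself costs time proportional to the number of pixel values.

```python
def count_distance_computations(num_train, num_test):
    """Count image-to-image comparisons made by brute-force nearest neighbor."""
    comparisons = 0
    for _ in range(num_test):        # every test image ...
        for _ in range(num_train):   # ... is compared with every training image
            comparisons += 1
    return comparisons

print(count_distance_computations(50, 10))  # 500

# With the CIFAR-10 sizes quoted earlier (N = 50,000 and M = 10,000):
cifar_comparisons = 50_000 * 10_000
print(cifar_comparisons)  # 500000000
```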

(23)

Aside: Approximate Nearest Neighbor

find approximate nearest neighbors quickly

(24)

The choice of distance is a hyperparameter

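L1 (Manhattan) distance: $d_1(I_1, I_2) = \sum_p \left| I_1^p - I_2^p \right|$

L2 (Euclidean) distance: $d_2(I_1, I_2) = \sqrt{\sum_p \left( I_1^p - I_2^p \right)^2}$

-  The two most commonly used special cases of the p-norm (the "p" in p-norm is the exponent, not the pixel index p used in the sums)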
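A minimal sketch of a general p-norm distance, assuming flattened images as lists. Here the argument p is the norm's exponent: p = 1 gives the L1 (Manhattan) distance and p = 2 gives the L2 (Euclidean) distance. The two-value vectors are made up for illustration.

```python
def p_norm_distance(image_a, image_b, p):
    """(sum_i |a_i - b_i|^p)^(1/p); p=1 is L1, p=2 is L2."""
    return sum(abs(a - b) ** p for a, b in zip(image_a, image_b)) ** (1.0 / p)

a = [0, 0]
b = [3, 4]
print(p_norm_distance(a, b, 1))  # 7.0  (3 + 4)
print(p_norm_distance(a, b, 2))  # 5.0  (sqrt(9 + 16))
```

Treating p as a function argument is what makes the distance a hyperparameter: the classifier code stays the same and only this value changes.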

(25)

k-Nearest Neighbor

find the k nearest images, have them vote on the label

[Figure panels: the data, NN classifier, 5-NN classifier]

http://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm
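A minimal sketch of the k-nearest-neighbor vote, assuming L1 distance and a simple majority vote. The function name and the toy one-dimensional data are made up for illustration; the training point [3] carries a deliberately odd label to show how a larger k smooths over it.

```python
from collections import Counter

def knn_predict(train_images, train_labels, test_image, k):
    """Label chosen by majority vote among the k nearest training images."""
    distances = [
        (sum(abs(a - b) for a, b in zip(test_image, train)), label)
        for train, label in zip(train_images, train_labels)
    ]
    distances.sort(key=lambda pair: pair[0])
    votes = Counter(label for _, label in distances[:k])
    # On a tied vote, the label seen first (i.e. the nearer one) wins.
    return votes.most_common(1)[0][0]

train_images = [[0], [1], [2], [10], [11], [3]]
train_labels = ["cat", "cat", "cat", "dog", "dog", "dog"]

print(knn_predict(train_images, train_labels, [4], k=1))  # dog
print(knn_predict(train_images, train_labels, [4], k=3))  # cat
```

With k = 1 the single nearest point [3] decides the label; with k = 3 the two "cat" neighbors outvote it.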

(26)

What is the best distance to use?

What is the best value of k to use?

i.e. how do we set the hyperparameters?

(27)

What is the best distance to use?

What is the best value of k to use?

i.e. how do we set the hyperparameters?

Very problem-dependent.

Must try them all out and see what works best.

(28)

Trying out what hyperparameters work best on test set:

Very bad idea. The test set is a proxy for the generalization performance, so it must not influence any tuning decision.

(29)

Validation data

use to tune hyperparameters

evaluate on test set ONCE at the end
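A sketch of this workflow, assuming a k-NN classifier with L1 distance. The helper names, the candidate values of k, and the toy one-dimensional data (including one deliberately mislabeled training point at [4]) are made up for illustration.

```python
from collections import Counter

def knn_predict(train_x, train_y, query, k):
    """Majority label among the k training points closest to `query` (L1 distance)."""
    by_distance = sorted(
        zip(train_x, train_y),
        key=lambda pair: sum(abs(a - b) for a, b in zip(query, pair[0])),
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

def accuracy(train_x, train_y, eval_x, eval_y, k):
    hits = sum(knn_predict(train_x, train_y, x, k) == y for x, y in zip(eval_x, eval_y))
    return hits / len(eval_y)

# Toy data: three disjoint splits.
train_x = [[0], [1], [2], [3], [4], [9], [10], [11], [12]]
train_y = ["low", "low", "low", "low", "high", "high", "high", "high", "high"]
val_x, val_y = [[5], [8], [1], [13]], ["low", "high", "low", "high"]
test_x, test_y = [[3], [11], [0]], ["low", "high", "low"]

# Tune k on the validation split only.
candidate_ks = [1, 3, 5]
val_accuracy = {k: accuracy(train_x, train_y, val_x, val_y, k) for k in candidate_ks}
best_k = max(candidate_ks, key=lambda k: val_accuracy[k])  # first maximum wins ties

# Touch the test split once, with the chosen k.
test_accuracy = accuracy(train_x, train_y, test_x, test_y, best_k)
print(val_accuracy, best_k, test_accuracy)
```

The test split appears in exactly one line, after best_k has been fixed.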

(30)

Cross-validation

cycle through the choice of which fold is the validation fold, average results.
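A minimal cross-validation sketch, assuming a k-NN classifier with L1 distance and folds formed by taking every num_folds-th example. The function names and the toy data are made up for illustration; real data would usually be shuffled before splitting.

```python
from collections import Counter
from statistics import mean

def knn_predict(train_x, train_y, query, k):
    """Majority label among the k training points closest to `query` (L1 distance)."""
    by_distance = sorted(
        zip(train_x, train_y),
        key=lambda pair: sum(abs(a - b) for a, b in zip(query, pair[0])),
    )
    return Counter(label for _, label in by_distance[:k]).most_common(1)[0][0]

def cross_validate(xs, ys, k, num_folds):
    """Return one validation accuracy per fold for k-NN with the given k."""
    fold_accuracies = []
    for fold in range(num_folds):
        # Every num_folds-th example (offset by `fold`) forms the validation fold.
        val_idx = set(range(fold, len(xs), num_folds))
        train_x = [x for i, x in enumerate(xs) if i not in val_idx]
        train_y = [y for i, y in enumerate(ys) if i not in val_idx]
        hits = sum(knn_predict(train_x, train_y, xs[i], k) == ys[i] for i in val_idx)
        fold_accuracies.append(hits / len(val_idx))
    return fold_accuracies

xs = [[0], [1], [2], [3], [4], [10], [11], [12], [13], [14]]
ys = ["low"] * 5 + ["high"] * 5

# Average the per-fold results for each candidate k.
mean_accuracy = {k: mean(cross_validate(xs, ys, k, num_folds=5)) for k in (1, 3)}
print(mean_accuracy)
```

Each example is used for validation exactly once and for training in the other folds, which is why this suits small datasets.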

(31)

Example of 5-fold cross-validation for the value of k.

Each point: single outcome. The line goes through the mean, bars indicate standard deviation.

(Seems that k = 7 works best for this data)
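The quantities behind such a plot can be sketched as follows: per-fold accuracies are reduced to a mean and a standard deviation for each k, and the k with the highest mean is picked. The accuracy values below are placeholders made up for illustration, not the results from the lecture's figure.

```python
from statistics import mean, stdev

# Hypothetical per-fold validation accuracies for each candidate k.
fold_accuracies = {
    1: [0.5, 0.6, 0.7],
    3: [0.7, 0.8, 0.9],
    5: [0.6, 0.7, 0.8],
}

# (mean, sample standard deviation) per k: the line and the error bars.
summary = {k: (mean(accs), stdev(accs)) for k, accs in fold_accuracies.items()}

# Pick the k whose mean validation accuracy is highest.
best_k = max(summary, key=lambda k: summary[k][0])

for k, (avg, spread) in summary.items():
    print(f"k={k}: mean={avg:.2f} std={spread:.2f}")
print("best k:", best_k)
```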

(32)

Summary

-  Image Classification: We are given a Training Set of labeled images and asked to predict labels on a Test Set. It is common to report the Accuracy of predictions (fraction of correctly predicted images).

-  We introduced the k-Nearest Neighbor Classifier, which predicts labels based on the nearest images in the training set.

-  We saw that the choice of distance and the value of k are hyperparameters that are tuned using a validation set, or through cross-validation if the size of the data is small.

-  Once the best set of hyperparameters is chosen, the classifier is evaluated once on the test set.
