• No results found

Support Vector Machine for Image Classification

N/A
N/A
Protected

Academic year: 2022

Share "Support Vector Machine for Image Classification"

Copied!
22
0
0

Loading.... (view fulltext now)

Full text

(1)

Support Vector Machine for Image Classification

Bruno Galerne

[email protected]

Université d’Orléans

Vendredi 27/03/2020 =Confinement COVID-19 J11 Statistiques pour le traitement d’images

Master 1 Statistique & Data Science, Ingénierie Mathématique

(2)

Course Plan

Supervised classification

Kernel SVM for linearly separable training set

Kernel SVM for non-linearly separable training set

Multi-Class SVM

Practical session

(3)

Main references:

I C. M. Bishop, Pattern Recognition and Machine Learning, Information Science and Statistics, Springer, 2006

Freely available:

https://www.microsoft.com/en-us/research/people/

cmbishop/prml-book/

I Various tutorials of[Scikit-learn]

(4)

Outline

Supervised classification

Kernel SVM for linearly separable training set

Kernel SVM for non-linearly separable training set

Multi-Class SVM

Practical session

(5)

Supervised classification

I The points x ∈ Rd are partitioned into K classes.

I We have atraining set of N labeled points:

T = {(xn,tn)16n6N, xn∈ Rd,tn∈ {1, . . . , K }}.

I The goal is to use this training set to define a classification function Ψ : Rd→ {1, . . . , K }.

I The performance of the classifier is measured using atest set that is different from the training set.

Training points are solid, testing points are semi-transparent.1

1Image from: https://scikit-learn.org/stable/auto_examples/

classification/plot_classifier_comparison.html

(6)

Outline

Supervised classification

Kernel SVM for linearly separable training set

Kernel SVM for non-linearly separable training set

Multi-Class SVM

Practical session

(7)

Feature map and kernel

I The initial data xn∈ Rdis often not well-described in its original form.

I We transform it using a feature map φ : Rd → RD. I This feature map is associated to a kernel function:

k (x , x0) = hφ(x ), φ(x0)i.

I The kernel k : Rd× Rd→ [0, +∞) is a symmetric and positive function.

I As will be shown, for SVM the mapping φ need not to be explicit since one only needs to compute the kernel.

Examples:

I φ = id, k (x , x0) = hx , x0i.

I Gaussian RBF kernel (RBF = Radial Basis Function).

k (x , x0) =e−γkx−x0k.

Remark: The mapping φ such that k (xm,xn) = hφ(xm), φ(xn)iis often from Rdto an infinite dimensional Hilbert space H (called a reproducing kernel Hilbert space (RKHS)).

I Nice YouTube video: https://youtu.be/9NrALgHFwTo

(8)

Separating hyperplane

I The main idea of SVM is to find a separating hyperlane that has the largest margin from the dataset.

I Which hyperplane separate the data ? Which one is better?

(9)

SVM theory

SVM theory relies on Lagrange duality of constrained convex problems.

Below are slides from: Stephen Boyd and Lieven Vandenberghe, Convex Optimization http://web.stanford.edu/~boyd/cvxbook/

(10)

Lagrangian

standard form problem (not necessarily convex) minimize f0(x)

subject to fi(x)≤ 0, i = 1, . . . , m hi(x) = 0, i = 1, . . . , p variable x∈ Rn, domainD, optimal value p

Lagrangian: L : Rn× Rm× Rp→ R, with dom L = D × Rm× Rp,

L(x, λ, ν) = f0(x) + Xm i=1

λifi(x) + Xp i=1

νihi(x)

• weighted sum of objective and constraint functions

• λiis Lagrange multiplier associated with fi(x)≤ 0

• νiis Lagrange multiplier associated with hi(x) = 0

Duality 5–2

(11)

Lagrange dual function

Lagrange dual function: g : Rm× Rp→ R, g(λ, ν) = inf

x∈DL(x, λ, ν)

= inf

x∈D f0(x) + Xm i=1

λifi(x) + Xp

i=1

νihi(x)

!

g is concave, can be−∞ for some λ, ν

lower bound property: if λ  0, then g(λ, ν) ≤ p proof: if ˜x is feasible and λ 0, then

f0(˜x)≥ L(˜x, λ, ν) ≥ inf

x∈DL(x, λ, ν) = g(λ, ν) minimizing over all feasible ˜x gives p≥ g(λ, ν)

Duality 5–3

(12)

The dual problem

Lagrange dual problem

maximize g(λ, ν) subject to λ 0

• finds best lower bound on p, obtained from Lagrange dual function

• a convex optimization problem; optimal value denoted d

• λ, ν are dual feasible if λ  0, (λ, ν) ∈ dom g

• often simplified by making implicit constraint (λ, ν) ∈ dom g explicit example: standard form LP and its dual (page 5–5)

minimize cTx subject to Ax = b

x 0

maximize −bTν subject to ATν + c 0

Duality 5–9

(13)

Weak and strong duality

weak duality: d≤ p

• always holds (for convex and nonconvex problems)

• can be used to find nontrivial lower bounds for difficult problems for example, solving the SDP

maximize −1Tν

subject to W + diag(ν) 0

gives a lower bound for the two-way partitioning problem on page 5–7 strong duality: d= p

• does not hold in general

• (usually) holds for convex problems

• conditions that guarantee strong duality in convex problems are called constraint qualifications

Duality 5–10

(14)

SVM theory

Please readpages 326 to 331 of

I C. M. Bishop, Pattern Recognition and Machine Learning, Information Science and Statistics, Springer, 2006

Freely available:

https://www.microsoft.com/en-us/research/people/

cmbishop/prml-book/

(15)

Outline

Supervised classification

Kernel SVM for linearly separable training set

Kernel SVM for non-linearly separable training set

Multi-Class SVM

Practical session

(16)

Kernel SVM for non-linearly separable training set

I Datasets are generally not linearly separable.

I A single outlier can make the hypothesis false.

I To overcome this problem, the optimization problem must be relaxed.

(17)

SVM theory

Please read Section “7.1.1 Overlapping class distributions”pages 332 to 336 of

I C. M. Bishop, Pattern Recognition and Machine Learning, Information Science and Statistics, Springer, 2006

Freely available:

https://www.microsoft.com/en-us/research/people/

cmbishop/prml-book/

(18)

Outline

Supervised classification

Kernel SVM for linearly separable training set

Kernel SVM for non-linearly separable training set

Multi-Class SVM

Practical session

(19)

Multi-Class SVM

I SVM is designed to separate two classes.

I Next we see how to use it with K > 2 classes, although it is an open issue.

Please read Section “7.1.3 Multiclass SVMs”pages 338 to 339 of I C. M. Bishop, Pattern Recognition and Machine Learning, Information

Science and Statistics, Springer, 2006 Freely available:

https://www.microsoft.com/en-us/research/people/

cmbishop/prml-book/

(20)

Outline

Supervised classification

Kernel SVM for linearly separable training set

Kernel SVM for non-linearly separable training set

Multi-Class SVM

Practical session

(21)

The practical session is here:

https://github.com/bgalerne/M1MAS_Stat_Images/blob/

master/TP_SVM_images.ipynb

(22)

C. M. Bishop, Pattern Recognition and Machine Learning, Information Science and Statistics, Springer, 2006

Stephen BOYDand Lieven VANDENBERGHE, Convex Optimization, Cambridge University Press, 2004

Pedregosa et al., Scikit-learn: Machine Learning in Python, JMLR 12, pp. 2825-2830, 2011

References

Related documents

Our proposed ap- proach differs from (Le et al., 2011) in three re- spects: (1) we enrich the queries, but not the cat- egories (2) we employ a machine learning system and

Classification is a data mining (machine learning) technique used to predict group membership for data instances. Classification is a two step process. 1st step

LNAI 3651 Relation Extraction Using Support Vector Machine R Dale et al (Eds ) IJCNLP 2005, LNAI 3651, pp 366 ? 377, 2005 ? Springer Verlag Berlin Heidelberg 2005 Relation Extraction

Quantum machine learning can take two forms: where classical machine learning algorithms are trans- formed into their quantum counterparts; to be implemented on a quantum information

In the present work machine learning techniques namely Support Vector Machine [SVM] and Random Forest [RF] are used to learn, classify and compare cancer, liver and

Text categorization is a wide domain to make machine able to understand and organize the document (text, document), as a part of Natural Language Processing that focus to make

Support vector machine is one of the techniques that are used to classify a handwritten character based on handwriting recognition and identify the digitized output

Clustering model selection for reduced support vector machines, Lecture Notes in Computer Science.. Making large scale support vector machine