UVA CS : Machine Learning. S4 Lecture 21 Extra Extra: Optimization with Dual Form and for SVM

Academic year: 2022

(1)

UVA CS :

Machine Learning

S4 Lecture 21 Extra Extra:

Optimization with Dual Form and for SVM

Dr. Yanjun Qi

(2)

Today Extra

• Optimization of SVM

✓ SVM as QP

✓ A simple example of constrained optimization

✓ SVM Optimization with dual form

✓ KKT condition

✓ SMO algorithm for fast SVM dual optimization

(3)

This: Kernel Support Vector Machine

Task: Classification / Regression / Ranking

Representation: Kernel Trick Func $K(x_i, x_j)$, with $K(x, z) := \Phi(x)^T \Phi(z)$

Score Function: Margin + Hinge Loss, $\arg\min_{w,b} \sum_{i=1}^{p} w_i^2 + C \sum_{i=1}^{n} \varepsilon_i$

Search/Optimization: QP with Dual form, $\max_{\alpha} \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j x_i^T x_j$

Models, Parameters: Dual Weights, $w = \sum_i \alpha_i x_i y_i$

Data: Tabular / Text / Image / Structured

(4)

Optimization with Quadratic programming (QP)

Quadratic programming solves optimization problems of the following form:

$\min_{u} \frac{u^T R u}{2} + d^T u + c$

where $\frac{u^T R u}{2}$ is the quadratic term,

subject to n inequality constraints:

$a_{11} u_1 + a_{12} u_2 + \dots \le b_1$

$\vdots$

$a_{n1} u_1 + a_{n2} u_2 + \dots \le b_n$

and k equality constraints:

$a_{n+1,1} u_1 + a_{n+1,2} u_2 + \dots = b_{n+1}$

$\vdots$

$a_{n+k,1} u_1 + a_{n+k,2} u_2 + \dots = b_{n+k}$

When a problem can be specified as a QP problem, we can use solvers that are better than gradient descent or simulated annealing.

10/16/19 Dr. Yanjun Qi / UVA 4

(5)

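SVM as a QP problem

General QP form:

$\min_{u} \frac{u^T R u}{2} + d^T u + c$

subject to n inequality constraints $a_{i1} u_1 + a_{i2} u_2 + \dots \le b_i$ for $i = 1, \dots, n$, and k equality constraints $a_{n+j,1} u_1 + a_{n+j,2} u_2 + \dots = b_{n+j}$ for $j = 1, \dots, k$.

[Figure: separating hyperplane $w^T x + b = 0$ between the margin boundaries $w^T x + b = +1$ (predict class +1) and $w^T x + b = -1$ (predict class -1); $x^+$ and $x^-$ lie on the two boundaries; margin width $M = \frac{2}{\sqrt{w^T w}}$.]

SVM:

$\min_{w,b} \frac{w^T w}{2}$

subject to the following inequality constraints:

For all x in class +1: $w^T x + b \ge 1$

For all x in class -1: $w^T x + b \le -1$

A total of n constraints if there are n training samples.

As a QP with $u = (w, b)$: R is the identity matrix on the w entries (b does not appear in the objective), d is the zero vector, and c is 0.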
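
The mapping from the hard-margin SVM to the generic QP form can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not a solver: the stacking $u = (w_1, \dots, w_p, b)$, the helper names (`svm_as_qp`, `qp_objective`, `is_feasible`) and the toy data are all hypothetical choices made for this example.

```python
def svm_as_qp(X, y):
    """Build (R, d, c, A, b_vec) so that the hard-margin SVM reads
    min_u u^T R u / 2 + d^T u + c  subject to  A u <= b_vec,
    with u = (w_1, ..., w_p, b). The stacking order is an assumption."""
    p = len(X[0])
    dim = p + 1
    # Identity on the w entries, zero on the bias entry.
    R = [[1.0 if (i == j and i < p) else 0.0 for j in range(dim)]
         for i in range(dim)]
    d = [0.0] * dim
    c = 0.0
    # y_i (w^T x_i + b) >= 1   <=>   -y_i * (x_i, 1) . u <= -1
    A = [[-yi * xij for xij in xi] + [-float(yi)] for xi, yi in zip(X, y)]
    b_vec = [-1.0] * len(X)
    return R, d, c, A, b_vec


def qp_objective(u, R, d, c):
    """Evaluate u^T R u / 2 + d^T u + c."""
    n = len(u)
    quad = sum(u[i] * R[i][j] * u[j] for i in range(n) for j in range(n))
    return quad / 2 + sum(di * ui for di, ui in zip(d, u)) + c


def is_feasible(u, A, b_vec, tol=1e-9):
    """Check every inequality constraint A u <= b_vec."""
    return all(sum(a * ui for a, ui in zip(row, u)) <= bi + tol
               for row, bi in zip(A, b_vec))


# Hypothetical toy data: two points per class in 2-D.
X = [[1.0, 1.0], [2.0, 2.0], [-1.0, -1.0], [-2.0, -1.0]]
y = [1, 1, -1, -1]
R, d, c, A, b_vec = svm_as_qp(X, y)

u = [0.5, 0.5, 0.0]  # candidate (w_1, w_2, b)
print(is_feasible(u, A, b_vec), qp_objective(u, R, d, c))
```

The matrices produced this way could be handed to any QP solver; one constraint row is generated per training sample, which matches the count of n constraints on the slide.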

(6)

Optimization Review: Ingredients

• Objective function

• Variables

• Constraints

Find values of the variables that minimize or maximize the objective function while satisfying the constraints.

(7)

Today Extra

• Optimization of SVM

✓ SVM as QP

✓ A simple example of constrained optimization and dual

✓ Optimization with dual form

✓ KKT condition

✓ SMO algorithm for fast SVM dual optimization

(8)

Optimization Review: Lagrangian Duality

• The Primal Problem

Primal: $\min_{w} f_0(w)$ s.t. $f_i(w) \le 0,\ i = 1, \dots, k$

The generalized Lagrangian: $L(w, \alpha) = f_0(w) + \sum_{i=1}^{k} \alpha_i f_i(w)$

The $\alpha_i$'s ($\alpha_i \ge 0$) are called the Lagrange multipliers.

Lemma: $\max_{\alpha, \alpha_i \ge 0} L(w, \alpha) = f_0(w)$ if w satisfies the primal constraints, and $\infty$ otherwise.

A re-written Primal: $\min_{w} \max_{\alpha, \alpha_i \ge 0} L(w, \alpha)$

"Method of Lagrange multipliers": convert to a higher-dimensional problem.

(9)

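Optimization Review: Lagrangian Duality, cont.

• Recall the Primal Problem: $\min_{w} \max_{\alpha, \alpha_i \ge 0} L(w, \alpha)$

• The Dual Problem: $\max_{\alpha, \alpha_i \ge 0} \min_{w} L(w, \alpha)$

Theorem (weak duality): $d^* = \max_{\alpha, \alpha_i \ge 0} \min_{w} L(w, \alpha) \le \min_{w} \max_{\alpha, \alpha_i \ge 0} L(w, \alpha) = p^*$

Theorem (strong duality): if there exists a saddle point of $L(w, \alpha)$, then $d^* = p^*$.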
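
Weak duality can be checked numerically on a small instance. The sketch below uses the one-dimensional problem $\min_u u^2$ s.t. $u \ge b$ that these slides work through, and approximates both the min-max (primal) and max-min (dual) values by brute force over finite grids. The grid ranges, step counts, and function names are assumptions chosen for illustration.

```python
def lagrangian(u, alpha, b):
    """L(u, alpha) = u^2 + alpha * (b - u), for the constraint b - u <= 0."""
    return u * u + alpha * (b - u)


def grid(lo, hi, steps):
    """Evenly spaced values from lo to hi inclusive."""
    return [lo + (hi - lo) * k / steps for k in range(steps + 1)]


def primal_and_dual(b, us, alphas):
    """Brute-force p* = min_u max_alpha L and d* = max_alpha min_u L."""
    p_star = min(max(lagrangian(u, a, b) for a in alphas) for u in us)
    d_star = max(min(lagrangian(u, a, b) for u in us) for a in alphas)
    return p_star, d_star


us = grid(-3.0, 3.0, 300)       # candidate primal values u
alphas = grid(0.0, 10.0, 200)   # candidate multipliers, all >= 0

for b in (1.0, -1.0):
    p_star, d_star = primal_and_dual(b, us, alphas)
    print(b, p_star, d_star, d_star <= p_star + 1e-12)
```

On this convex toy problem the two values coincide for both choices of b, which is what strong duality predicts; on a finite grid the inequality $d^* \le p^*$ holds for any function.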

(10)

$\min_{u} u^2$ s.t. $u \ge b$

(11)

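Optimization Review: Constrained Optimization

$\min_{u} u^2$ s.t. $u \ge b$

[Figure: two plots of $u^2$ with the bound b marked, labeled Case 1 and Case 2. Each plot marks the global min and the allowed min; the cases differ in whether the global min satisfies $u \ge b$.]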

(12)


(13)


(14)

$\min_{u} u^2$ s.t. $u \ge b$

(15)


(16)


(17)

$\min_{u} u^2$ s.t. $u \ge b$

Dual
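
The dual of this one-dimensional problem can be worked out by hand. The derivation below is a sketch that writes the constraint as $b - u \le 0$ with a single multiplier $\alpha \ge 0$; the symbol $g(\alpha)$ for the dual objective is introduced here for illustration.

```latex
\begin{align*}
L(u,\alpha) &= u^2 + \alpha\,(b - u), \qquad \alpha \ge 0 \\
\frac{\partial L}{\partial u} = 2u - \alpha = 0 \;&\Rightarrow\; u = \frac{\alpha}{2} \\
g(\alpha) = \min_{u} L(u,\alpha) &= b\,\alpha - \frac{\alpha^2}{4} \\
\max_{\alpha \ge 0} g(\alpha): \quad \alpha^* &= \max(2b,\, 0), \qquad u^* = \frac{\alpha^*}{2} = \max(b,\, 0) \\
d^* = g(\alpha^*) &= \max(b,\, 0)^2 = p^*
\end{align*}
```

When $b \le 0$ the constraint is inactive, $\alpha^* = 0$, and the global minimum $u = 0$ is allowed. When $b > 0$ the constraint is active, $\alpha^* = 2b > 0$, and the minimum sits on the boundary $u = b$.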

(18)


(19)


(20)


(21)
(22)
(23)

Today Extra

• Optimization of SVM

✓ SVM as QP

✓ A simple example of constrained optimization

✓ SVM Optimization with dual form

✓ KKT condition

✓ SMO algorithm for fast SVM dual optimization

(24)

$\min_{w,b} \max_{\alpha} \; \frac{w^T w}{2} - \sum_{i} \alpha_i \left[ (w^T x_i + b)\, y_i - 1 \right]$

$\alpha_i \ge 0 \;\; \forall i$

(25)


(26)

$L_{primal} = \frac{1}{2} \|w\|^2 - \sum_{i=1}^{N} \alpha_i \left( y_i (w \cdot x_i + b) - 1 \right)$
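
Setting the derivatives of $L_{primal}$ with respect to w and b to zero and substituting back gives the dual form. The steps below are a sketch of that standard derivation; the name $L_{dual}$ is introduced here for illustration.

```latex
\begin{align*}
\frac{\partial L_{primal}}{\partial w} = 0 \;&\Rightarrow\; w = \sum_{i=1}^{N} \alpha_i y_i x_i \\
\frac{\partial L_{primal}}{\partial b} = 0 \;&\Rightarrow\; \sum_{i=1}^{N} \alpha_i y_i = 0 \\
L_{dual}(\alpha) &= \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j \, x_i^T x_j \\
\max_{\alpha} L_{dual}(\alpha) \quad &\text{s.t. } \alpha_i \ge 0 \;\; \forall i, \qquad \sum_{i=1}^{N} \alpha_i y_i = 0
\end{align*}
```

The training data enter the dual only through the inner products $x_i^T x_j$, which is what allows them to be replaced by a kernel $K(x_i, x_j)$.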

(27)

Optimization Review: Dual Problem

• Solve the dual problem when the dual form is easier than the primal form

• Need to change primal minimization to dual maximization (or, conversely, primal maximization to dual minimization)

• Only valid when strong duality holds for the original optimization problem

[Figure: Dual Problem and Primal Problem, related by strong duality]

(28)

Today Extra

• Optimization of SVM

✓ SVM as QP

✓ A simple example of constrained optimization

✓ SVM Optimization with dual form

✓ KKT condition

✓ SMO algorithm for fast SVM dual optimization

(29)

KKT Condition for Strong Duality

Dual Problem, Primal Problem

Strong duality
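
For reference, the KKT conditions for an inequality-constrained primal $\min_w f_0(w)$ s.t. $f_i(w) \le 0$ with Lagrangian $L(w, \alpha) = f_0(w) + \sum_i \alpha_i f_i(w)$ are usually stated as below. This is a sketch in the notation of the Lagrangian duality slides and assumes differentiable $f_0$ and $f_i$.

```latex
\begin{align*}
\nabla_w L(w^*, \alpha^*) &= 0 && \text{(stationarity)} \\
f_i(w^*) &\le 0, \quad i = 1, \dots, k && \text{(primal feasibility)} \\
\alpha_i^* &\ge 0, \quad i = 1, \dots, k && \text{(dual feasibility)} \\
\alpha_i^* \, f_i(w^*) &= 0, \quad i = 1, \dots, k && \text{(complementary slackness)}
\end{align*}
```

For a convex problem, a pair $(w^*, \alpha^*)$ that satisfies these conditions is primal and dual optimal with $d^* = p^*$. Complementary slackness is the condition that makes $\alpha_i = 0$ for every training sample whose constraint is not active.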

(30)

Optimization Review: Lagrangian (even more general standard form)

(31)

Optimization Review: Lagrange dual function

Inf(.): greatest lower bound
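
A common way to write the general standard form and its Lagrange dual function is sketched below. The symbols ($x$, $\lambda$, $\nu$, $h_j$, $m$, $r$, $g$) follow the usual convex-optimization textbook convention and are assumptions here rather than this deck's own notation.

```latex
\begin{align*}
\text{minimize } f_0(x) \quad &\text{s.t. } f_i(x) \le 0, \; i = 1, \dots, m, \qquad h_j(x) = 0, \; j = 1, \dots, r \\
L(x, \lambda, \nu) &= f_0(x) + \sum_{i=1}^{m} \lambda_i f_i(x) + \sum_{j=1}^{r} \nu_j h_j(x) \\
g(\lambda, \nu) &= \inf_{x} L(x, \lambda, \nu) \\
g(\lambda, \nu) &\le p^* \quad \text{for all } \lambda \ge 0 \text{ and all } \nu
\end{align*}
```

The last line is the lower-bound property: every dual-feasible $(\lambda, \nu)$ gives a lower bound on the primal optimum $p^*$, and the dual problem searches for the best such bound.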

(32)

Optimization Review:

inf (.): greatest lower bound

(33)

Optimization Review:

inf (.): greatest lower bound

(34)
(35)

Optimization Review:

Key for SVM Dual

(36)

Dual formulation for the linearly non-separable case (soft SVM)

(37)

Dual formulation for the linearly non-separable case
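
The soft-margin primal and its dual are commonly written as below. This is a sketch that assumes the $\frac{1}{2}\|w\|^2$ scaling of the objective and slack variables $\varepsilon_i$ with penalty C.

```latex
\begin{align*}
\text{Primal:} \quad & \min_{w, b, \varepsilon} \; \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{N} \varepsilon_i
\quad \text{s.t. } y_i (w \cdot x_i + b) \ge 1 - \varepsilon_i, \;\; \varepsilon_i \ge 0 \;\; \forall i \\
\text{Dual:} \quad & \max_{\alpha} \; \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j \, x_i^T x_j
\quad \text{s.t. } 0 \le \alpha_i \le C \;\; \forall i, \;\; \sum_{i=1}^{N} \alpha_i y_i = 0
\end{align*}
```

Compared with the separable case, the only change in the dual is the upper bound $\alpha_i \le C$ on each multiplier.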

(38)

Support Vectors for the Soft-case

• Support vectors are

• Samples on the margin: $y_i (x_i \cdot w + b) = 1$, $0 < \alpha_i < C$

• Samples that violate the margin (mostly inside the margin area): $y_i (x_i \cdot w + b) < 1$, $\alpha_i = C$
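
Given the dual solution, the two kinds of support vectors can be told apart from the multipliers alone. The function below is a minimal sketch; the name `classify_by_alpha`, the label strings, the tolerance, and the example values are hypothetical.

```python
def classify_by_alpha(alphas, C, tol=1e-8):
    """Label each training sample by where its multiplier sits in [0, C]."""
    labels = []
    for a in alphas:
        if a <= tol:
            # alpha_i = 0: not a support vector.
            labels.append("non-support")
        elif a >= C - tol:
            # alpha_i = C: the slide's margin-violating case.
            labels.append("bound support vector")
        else:
            # 0 < alpha_i < C: the sample lies exactly on the margin.
            labels.append("margin support vector")
    return labels


# Hypothetical multipliers from a solved soft-margin dual with C = 1.
print(classify_by_alpha([0.0, 0.3, 1.0, 0.999999999], C=1.0))
```

Only samples with $\alpha_i > 0$ contribute to $w = \sum_i \alpha_i y_i x_i$, so the first category can be dropped at prediction time.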

(39)

Today Extra

• Optimization of SVM

✓ SVM as QP

✓ A simple example of constrained optimization

✓ SVM Optimization with dual form

✓ KKT condition

✓ SMO algorithm for fast SVM dual optimization

(40)

Fast SVM Implementations

• SMO: Sequential Minimal Optimization

• SVM-Light

• LibSVM

• BSVM

• ……

J. Platt (1999), Fast Training of Support Vector Machines Using Sequential Minimal Optimization

(41)

SMO: Sequential Minimal Optimization

• Key idea

• Divide the large QP problem of the SVM into a series of the smallest possible QP problems, each of which can be solved analytically. This avoids running a time-consuming numerical QP solver in the loop (a kind of SQP method).

• Space complexity: O(n).

• Since the QP is greatly simplified, the most time-consuming part of SMO is evaluating the decision function; SMO is therefore very fast for linear SVMs and sparse data.

(42)

SMO

• At each step, SMO chooses two Lagrange multipliers to optimize jointly, finds the optimal values for these multipliers, and updates the SVM to reflect the new values.

• Three components

• An analytic method to solve for the two Lagrange multipliers

• A heuristic for choosing which (next) two multipliers to optimize

• A method for computing b at each step, so that the KKT conditions are fulfilled for both examples (those corresponding to the two multipliers)
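
The first component, the analytic two-multiplier step, can be sketched as follows. This follows the update commonly attributed to Platt's SMO paper, with the assumption that $E_i = f(x_i) - y_i$ is the current prediction error and $k_{ij} = K(x_i, x_j)$; the function name, argument order, and the choice to skip the degenerate case are simplifications made for this example.

```python
def smo_pair_update(a1, a2, y1, y2, E1, E2, k11, k12, k22, C, eps=1e-12):
    """One analytic SMO step on the multipliers (a1, a2).

    Returns (a1_new, a2_new), or None when no progress is possible.
    """
    # Ends of the segment for a2 that keep both multipliers in [0, C]
    # while y1*a1 + y2*a2 stays constant.
    if y1 != y2:
        low, high = max(0.0, a2 - a1), min(C, C + a2 - a1)
    else:
        low, high = max(0.0, a1 + a2 - C), min(C, a1 + a2)
    if high - low < eps:
        return None
    # Curvature of the objective along the constraint line.
    eta = k11 + k22 - 2.0 * k12
    if eta <= eps:
        return None  # degenerate case, not handled in this sketch
    # Unconstrained optimum for a2, then clip to [low, high].
    a2_new = a2 + y2 * (E1 - E2) / eta
    a2_new = min(high, max(low, a2_new))
    # Move a1 so that the equality constraint still holds.
    a1_new = a1 + y1 * y2 * (a2 - a2_new)
    return a1_new, a2_new


# Hypothetical 1-D example: x1 = +1 (class +1), x2 = -1 (class -1),
# linear kernel, all multipliers and b initially zero, so f(x) = 0.
print(smo_pair_update(0.0, 0.0, 1, -1, E1=-1.0, E2=1.0,
                      k11=1.0, k12=-1.0, k22=1.0, C=10.0))
```

In this toy case a single step lands on multipliers that give $w = \sum_i \alpha_i y_i x_i = 1$, the maximum-margin separator for the two points. A full SMO loop would also recompute b and the errors after every step.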

(43)

Choosing Which Multipliers to Optimize

• First multiplier

• Iterate over the entire training set and find an example that violates the KKT conditions.

• Second multiplier

• Maximize the size of the step taken during joint optimization.

• The step size is approximated by |E1 - E2|, where Ei is the error on the i-th example.
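
The two selection rules can be sketched together. The code below assumes $E_i = f(x_i) - y_i$ and labels in {-1, +1}, so that $y_i E_i = y_i f(x_i) - 1$; the function names and the tolerance are hypothetical, and the scan order is simplified to a single pass over the training set.

```python
def violates_kkt(alpha, y, E, C, tol=1e-3):
    """True if a sample violates the soft-margin KKT conditions."""
    r = y * E  # equals y * f(x) - 1 when y is -1 or +1
    # r < 0 means the sample is inside the margin, so alpha should be C.
    # r > 0 means it is safely outside the margin, so alpha should be 0.
    return (r < -tol and alpha < C) or (r > tol and alpha > 0)


def choose_pair(alphas, ys, errors, C, tol=1e-3):
    """Pick (i1, i2): first KKT violator, then the largest |E1 - E2|."""
    for i1, (a, y, e) in enumerate(zip(alphas, ys, errors)):
        if violates_kkt(a, y, e, C, tol):
            i2 = max((j for j in range(len(errors)) if j != i1),
                     key=lambda j: abs(errors[i1] - errors[j]))
            return i1, i2
    return None  # every sample satisfies the KKT conditions


# Hypothetical state: three samples, all multipliers still zero.
print(choose_pair([0.0, 0.0, 0.0], [1, -1, 1], [-1.0, 1.0, 0.2], C=1.0))
```

Returning `None` when no sample violates the conditions gives a natural stopping test for the outer loop.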

(44)

References

• Big thanks to Prof. Ziv Bar-Joseph and Prof. Eric Xing @ CMU for allowing me to reuse some of their slides

• Elements of Statistical Learning, by Hastie, Tibshirani and Friedman

• Prof. Andrew Moore @ CMU’s slides

• Tutorial slides from Dr. Tie-Yan Liu, MSR Asia

• A Practical Guide to Support Vector Classification, by Chih-Wei Hsu, Chih-Chung Chang, and Chih-Jen Lin, 2003-2010

• Tutorial slides from Stanford "Convex Optimization I", Boyd & Vandenberghe
