• No results found

Support Vector Machines

N/A
N/A
Protected

Academic year: 2021

Share "Support Vector Machines"

Copied!
65
0
0

Loading.... (view fulltext now)

Full text

(1)

1

Support Vector Machines

Machine Learning Fall 2017

Supervised Learning: The Setup

Machine Learning Spring 2018

The slides are mainly from Vivek Srikumar

(2)

Big picture

Linear models

(3)

Big picture

Linear models

How good is a learning

(4)

Big picture

Linear models

How good is a learning algorithm?

Mistake- driven learning Perceptron,

Winnow

(5)

Big picture

Linear models

How good is a learning Mistake-

driven learning

PAC, Agnostic learning Perceptron,

Winnow

(6)

Big picture

Linear models

How good is a learning algorithm?

Mistake- driven learning

PAC, Agnostic learning Perceptron,

Winnow Support Vector

Machines

The Risk Minimization Principle

(7)

Big picture

Linear models

How good is a learning Mistake-

driven learning

PAC, Agnostic learning Perceptron,

Winnow Support Vector

Machines

….

The Risk Minimization Principle

….

(8)

This lecture: Support vector machines

• Training by maximizing margin

• The SVM objective

• Solving the SVM optimization problem

• Support vectors, duals and kernels

(9)

This lecture: Support vector machines

• Training by maximizing margin

• The SVM objective

• Solving the SVM optimization problem

• Support vectors, duals and kernels

(10)

VC dimensions and linear classifiers

What we know so far

1. If we have m examples, then with probability 1 - !, the true error of a hypothesis h with training error err

S

(h) is bounded by

Generalization error Training error A function of VC dimension.

Low VC dimension gives tighter bound

(11)

VC dimensions and linear classifiers

What we know so far

1. If we have m examples, then with probability 1 - !, the true error of a hypothesis h with training error err

S

(h) is bounded by

2. VC dimension of a linear classifier in d dimensions = d + 1

Generalization error Training error A function of VC dimension.

Low VC dimension gives tighter bound

(12)

VC dimensions and linear classifiers

What we know so far

1. If we have m examples, then with probability 1 - !, the true error of a hypothesis h with training error err

S

(h) is bounded by

2. VC dimension of a linear classifier in d dimensions = d + 1

Generalization error Training error A function of VC dimension.

Low VC dimension gives tighter bound

But are all linear classifiers the same?

(13)

Recall: Margin

The margin of a hyperplane for a dataset is the distance between the hyperplane and the data point nearest to it.

+

+ +

+ ++ + - +

- - - - - - - -

-

- - -

- - -

-

-

(14)

Recall: Margin

The margin of a hyperplane for a dataset is the distance between the hyperplane and the data point nearest to it.

+

+ +

+ ++ + - +

- - - - - - - -

-

- - -

- - - -

-

Margin with respect to this hyperplane

(15)

Which hyperplane is better?

h1

h2

Recall: Margin

(16)

Which hyperplane is better?

h1

h2

The farther from the data points, the less chance to make wrong prediction

Recall: Margin

(17)

Data dependent VC dimension

• Intuitively, larger margins are better

• Suppose we only consider linear classifiers with margins !

1

and !

2

– H

1

= linear classifiers that have a margin at least !

1

– H

2

= linear classifiers that have a margin at least !

2

– And !

1

> !

2

• The entire set of functions H

1

is “better”

(18)

Data dependent VC dimension

Theorem (Vapnik):

– Let H be the set of linear classifiers that separate the training set by a margin at least !

– Then

– R is the radius of the smallest sphere containing the data

(19)

Data dependent VC dimension

Theorem (Vapnik):

– Let H be the set of linear classifiers that separate the training set by a margin at least !

– Then

– R is the radius of the smallest sphere containing the data Larger margin ⟹ Lower VC dimension

Lower VC dimension ⟹ Better generalization bound

(20)

Learning strategy

Find the linear classifier that maximizes the margin

(21)

This lecture: Support vector machines

• Training by maximizing margin

• The SVM objective

• Solving the SVM optimization problem

• Support vectors, duals and kernels

(22)

Recall: The geometry of a linear classifier

Prediction = sgn(b +w

1

x

1

+ w

2

x

2

)

+ + + + ++ +

+

- - - -

- - - - -

-

- - -

- - - - -

b +w1 x1 + w2x2=0

(23)

Recall: The geometry of a linear classifier

Prediction = sgn (b +w

1

x

1

+ w

2

x

2

)

+ + + + ++ +

+

- - - -

- - - - -

-

- - -

- - - - -

b +w1 x1 + w2x2=0

We only care about the sign, not the magnitude

(24)

Recall: The geometry of a linear classifier

Prediction = sgn(b +w

1

x

1

+ w

2

x

2

)

b +w1 x1 + w2x2=0 2b +2w1 x1 + 2w2x2=0

1000b +1000w1 x1 + 1000w2x2=0

+

+ +

+ ++ + +

- - - -

- - - - -

-

- - -

- - - - -

We only care about the sign, not the magnitude

(25)

Maximizing margin

• Margin = distance of the closest point from the hyperplane

• We want max

w

°

Some people call this the geometric margin

The numerator alone is called the functional margin

(26)

Maximizing margin

• Margin = distance of the closest point from the hyperplane

• We want max

w

!

Some people call this the geometric margin

The numerator alone is called the functional margin

(27)

Recall: The geometry of a linear classifier

Prediction = sgn(b +w

1

x

1

+ w

2

x

2

)

b +w1 x1 + w2x2=0

We only care about the sign, not the magnitude

+ + + + ++ +

+

- - - -

- - - - -

-

- - -

- - -

-

-

(28)

Recall: The geometry of a linear classifier

Prediction = sgn(b +w

1

x

1

+ w

2

x

2

)

b +w1 x1 + w2x2=0

We only care about the sign, not the magnitude

+ + + + ++ +

+

- - - -

- - - - -

-

- - -

- - - - -

We have the freedom to scale up/down w and b so that the numerator is 1.

(29)

Maximizing margin

• Margin = distance of the closest point from the hyperplane

• We want max

w

!

We only care about the sign of w

T

x

i

+b in the end and not the magnitude

– Set the absolute score (functional margin) of the closest point to be 1 and allow w to adjust itself

max

%

! is equivalent to max

%

&

%

in this setting

(30)

Max-margin classifiers

• Learning problem:

30

216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269

min

w

1

2 w

>

w

s.t. 8i, y

i

(w

>

x

i

+ b) 1

min

w

1

2 w

>

w + C X

i

i

s.t. 8i, y

i

(w

>

x

i

+ b) 1 ⇠

i

i

0

min

w,b

1

2 w

>

w + C X

i

max 0, 1 y

i

(w

>

x

i

+ b)

L

Hinge

(y, x, w, b) = max 0, 1 y(w

>

x + b)

min

w

1

2 w

0>

w

0

+ C X

i

max(0, 1 y

i

w

>

x

i

)

J

t

(w) = 1

2 w

0>

w

0

+ C X

i

max(0, 1 y

i

w

>

x

i

)

rJ

t

=

⇢ [w

0

; 0] if max(0, 1 y

i

w

ix

) = 0

w Cy

i

x

i

otherwise

(31)

Max-margin classifiers

• Learning problem:

31

Minimizing gives us max

$

%

$

216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269

min

w

1

2 w

>

w

s.t. 8i, y

i

(w

>

x

i

+ b) 1

min

w

1

2 w

>

w + C X

i

i

s.t. 8i, y

i

(w

>

x

i

+ b) 1 ⇠

i

i

0

min

w,b

1

2 w

>

w + C X

i

max 0, 1 y

i

(w

>

x

i

+ b)

L

Hinge

(y, x, w, b) = max 0, 1 y(w

>

x + b)

min

w

1

2 w

0>

w

0

+ C X

i

max(0, 1 y

i

w

>

x

i

)

J

t

(w) = 1

2 w

0>

w

0

+ C X

i

max(0, 1 y

i

w

>

x

i

)

rJ

t

=

⇢ [w

0

; 0] if max(0, 1 y

i

w

ix

) = 0

w Cy

i

x

i

otherwise

(32)

Max-margin classifiers

• Learning problem:

32

This condition is true for every example, specifically, for the example closest to the

separator

Minimizing gives us max

$

%

$

216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269

min

w

1

2 w

>

w

s.t. 8i, y

i

(w

>

x

i

+ b) 1

min

w

1

2 w

>

w + C X

i

i

s.t. 8i, y

i

(w

>

x

i

+ b) 1 ⇠

i

i

0

min

w,b

1

2 w

>

w + C X

i

max 0, 1 y

i

(w

>

x

i

+ b)

L

Hinge

(y, x, w, b) = max 0, 1 y(w

>

x + b)

min

w

1

2 w

0>

w

0

+ C X

i

max(0, 1 y

i

w

>

x

i

)

J

t

(w) = 1

2 w

0>

w

0

+ C X

i

max(0, 1 y

i

w

>

x

i

)

rJ

t

=

⇢ [w

0

; 0] if max(0, 1 y

i

w

ix

) = 0

w Cy

i

x

i

otherwise

(33)

Max-margin classifiers

• Learning problem:

• This is called the “hard” Support Vector Machine

We will look at how to solve this optimization problem later

33

This condition is true for every example, specifically, for the example closest to the

separator

Minimizing gives us max

$

%

$

216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269

min

w

1

2 w

>

w

s.t. 8i, y

i

(w

>

x

i

+ b) 1

min

w

1

2 w

>

w + C X

i

i

s.t. 8i, y

i

(w

>

x

i

+ b) 1 ⇠

i

i

0

min

w,b

1

2 w

>

w + C X

i

max 0, 1 y

i

(w

>

x

i

+ b)

L

Hinge

(y, x, w, b) = max 0, 1 y(w

>

x + b)

min

w

1

2 w

0>

w

0

+ C X

i

max(0, 1 y

i

w

>

x

i

)

J

t

(w) = 1

2 w

0>

w

0

+ C X

i

max(0, 1 y

i

w

>

x

i

)

rJ

t

=

⇢ [w

0

; 0] if max(0, 1 y

i

w

ix

) = 0

w Cy

i

x

i

otherwise

(34)

Hard SVM

• This is a constrained optimization problem

If the data is not separable, there is no w that will classify the data

• Infeasible problem, no solution!

What if the data is not separable?

34

Maximize margin Every example has an

functional margin of at least 1 216

217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269

min

w

1

2 w

>

w

s.t. 8i, y

i

(w

>

x

i

+ b) 1

min

w

1

2 w

>

w + C X

i

i

s.t. 8i, y

i

(w

>

x

i

+ b) 1 ⇠

i

i

0

min

w,b

1

2 w

>

w + C X

i

max 0, 1 y

i

(w

>

x

i

+ b)

L

Hinge

(y, x, w, b) = max 0, 1 y(w

>

x + b)

min

w

1

2 w

0>

w

0

+ C X

i

max(0, 1 y

i

w

>

x

i

)

J

t

(w) = 1

2 w

0>

w

0

+ C X

i

max(0, 1 y

i

w

>

x

i

)

rJ

t

=

⇢ [w

0

; 0] if max(0, 1 y

i

w

xi

) = 0

w Cy

i

x

i

otherwise

(35)

What if the data is not separable?

Hard SVM

• This is a constrained optimization problem

If the data is not separable, there is no w that will classify the data

• Infeasible problem, no solution!

35

Maximize margin Every example has an

functional margin of at least 1 216

217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269

min

w

1

2 w

>

w

s.t. 8i, y

i

(w

>

x

i

+ b) 1

min

w

1

2 w

>

w + C X

i

i

s.t. 8i, y

i

(w

>

x

i

+ b) 1 ⇠

i

i

0

min

w,b

1

2 w

>

w + C X

i

max 0, 1 y

i

(w

>

x

i

+ b)

L

Hinge

(y, x, w, b) = max 0, 1 y(w

>

x + b)

min

w

1

2 w

0>

w

0

+ C X

i

max(0, 1 y

i

w

>

x

i

)

J

t

(w) = 1

2 w

0>

w

0

+ C X

i

max(0, 1 y

i

w

>

x

i

)

rJ

t

=

⇢ [w

0

; 0] if max(0, 1 y

i

w

xi

) = 0

w Cy

i

x

i

otherwise

(36)

Dealing with non-separable data

Key idea: Allow some examples to “break into the margin”

+

+ +

+ ++ + - +

- - - - - - - -

-

- - -

- - - - -

+ -

(37)

Dealing with non-separable data

Key idea: Allow some examples to “break into the margin”

+

+ +

+ ++ + - +

- - - - - - - -

-

- - -

- - - - -

+ -

(38)

Dealing with non-separable data

Key idea: Allow some examples to “break into the margin”

+

+ +

+ ++ + - +

- - - - - - - -

-

- - -

- - - - -

+ -

This separator has a large enough margin that it should generalize well.

(39)

Dealing with non-separable data

Key idea: Allow some examples to “break into the margin”

So, when computing margin, ignore the

examples that make the margin smaller or the data inseparable.

+

+ +

+ ++ + - +

- - - - - - - -

-

- - -

- - - - -

+ -

This separator has a large enough margin that it should generalize well.

(40)

Soft SVM

• Hard SVM:

• Introduce one slack variable »

i

per example – And require

yiwTxi ¸ 1 - »i and »i ¸ 0

40

Maximize margin

Every example has a functional margin of at least 1

Intuition: The slack variable allows examples to “break” into the margin

If the slack value is zero, then the example is either on or outside the margin 216

217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269

min

w

1

2 w

>

w

s.t. 8i, y

i

(w

>

x

i

+ b) 1

min

w

1

2 w

>

w + C X

i

i

s.t. 8i, y

i

(w

>

x

i

+ b) 1 ⇠

i

i

0

min

w,b

1

2 w

>

w + C X

i

max 0, 1 y

i

(w

>

x

i

+ b)

L

Hinge

(y, x, w, b) = max 0, 1 y(w

>

x + b)

min

w

1

2 w

0>

w

0

+ C X

i

max(0, 1 y

i

w

>

x

i

)

J

t

(w) = 1

2 w

0>

w

0

+ C X

i

max(0, 1 y

i

w

>

x

i

)

rJ

t

=

⇢ [w

0

; 0] if max(0, 1 y

i

w

xi

) = 0

w Cy

i

x

i

otherwise

(41)

Soft SVM

• Hard SVM:

• Introduce one slack variable per example

– And require and

41

Maximize margin

Every example has a functional margin of at least 1

Intuition: The slack variable allows examples to “break” into the margin

If the slack value is zero, then the example is either on or outside the margin

i

216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269

maxw

1

2w>w

s.t.8i, yi(w>xi + b) 1

maxw

1

2w>w + C X

i

i

s.t.8i, yi(w>xi + b) 1 i

i 0

5

i 0 216

217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269

min

w

1

2 w

>

w

s.t. 8i, y

i

(w

>

x

i

+ b) 1

min

w

1

2 w

>

w + C X

i

i

s.t. 8i, y

i

(w

>

x

i

+ b) 1 ⇠

i

i

0

min

w,b

1

2 w

>

w + C X

i

max 0, 1 y

i

(w

>

x

i

+ b)

L

Hinge

(y, x, w, b) = max 0, 1 y(w

>

x + b)

min

w

1

2 w

0>

w

0

+ C X

i

max(0, 1 y

i

w

>

x

i

)

J

t

(w) = 1

2 w

0>

w

0

+ C X

i

max(0, 1 y

i

w

>

x

i

)

rJ

t

=

⇢ [w

0

; 0] if max(0, 1 y

i

w

xi

) = 0

w Cy

i

x

i

otherwise

(42)

Soft SVM

• Hard SVM:

• Introduce one slack variable per example

– And require and

42

Maximize margin

Every example has a functional margin of at least 1

Intuition: The slack variable allows examples to “break” into the margin

If the slack value is zero, then the example is either on or outside the margin

i

216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269

maxw

1

2w>w

s.t.8i, yi(w>xi + b) 1

maxw

1

2w>w + C X

i

i

s.t.8i, yi(w>xi + b) 1 i

i 0

5

i 0 216

217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269

min

w

1

2 w

>

w

s.t. 8i, y

i

(w

>

x

i

+ b) 1

min

w

1

2 w

>

w + C X

i

i

s.t. 8i, y

i

(w

>

x

i

+ b) 1 ⇠

i

i

0

min

w,b

1

2 w

>

w + C X

i

max 0, 1 y

i

(w

>

x

i

+ b)

L

Hinge

(y, x, w, b) = max 0, 1 y(w

>

x + b)

min

w

1

2 w

0>

w

0

+ C X

i

max(0, 1 y

i

w

>

x

i

)

J

t

(w) = 1

2 w

0>

w

0

+ C X

i

max(0, 1 y

i

w

>

x

i

)

rJ

t

=

⇢ [w

0

; 0] if max(0, 1 y

i

w

xi

) = 0

w Cy

i

x

i

otherwise

(43)

Soft SVM

• Hard SVM:

• Introduce one slack variable per example

– And require and

• New optimization problem for learning

43

Maximize margin

Every example has a functional margin of at least 1

i

i 0

216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269

maxw

1

2w>w

s.t.8i, yi(w>xi + b) 1

maxw

1

2w>w + C X

i

i

s.t.8i, yi(w>xi + b) 1 i

i 0

5

216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269

min

w

1

2 w

>

w

s.t. 8i, y

i

(w

>

x

i

+ b) 1

min

w

1

2 w

>

w + C X

i

i

s.t. 8i, y

i

(w

>

x

i

+ b) 1 ⇠

i

i

0

min

w,b

1

2 w

>

w + C X

i

max 0, 1 y

i

(w

>

x

i

+ b)

L

Hinge

(y, x, w, b) = max 0, 1 y(w

>

x + b)

min

w

1

2 w

0>

w

0

+ C X

i

max(0, 1 y

i

w

>

x

i

)

J

t

(w) = 1

2 w

0>

w

0

+ C X

i

max(0, 1 y

i

w

>

x

i

)

rJ

t

=

⇢ [w

0

; 0] if max(0, 1 y

i

w

xi

) = 0 w Cy

i

x

i

otherwise

216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269

minw

1

2w>w

s.t.8i, yi(w>xi + b) 1

minw

1

2w>w + C X

i

i

s.t.8i, yi(w>xi + b) 1 ⇠i

i 0

minw,b

1

2w>w + C X

i

max 0, 1 yi(w>xi + b)

LHinge(y, x, w, b) = max 0, 1 y(w>x + b)

minw

1

2w0>w0 + C X

i

max(0, 1 yiw>xi)

Jt(w) = 1

2w0>w0 + C X

i

max(0, 1 yiw>xi)

rJt =

⇢[w0; 0] if max(0, 1 yiwxi ) = 0 w Cyixi otherwise

5

(44)

216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269

min

w

1

2 w

>

w

s.t. 8i, y

i

(w

>

x

i

+ b) 1

min

w

1

2 w

>

w + C X

i

i

s.t. 8i, y

i

(w

>

x

i

+ b) 1 ⇠

i

i

0

min

w,b

1

2 w

>

w + C X

i

max 0, 1 y

i

(w

>

x

i

+ b)

L

Hinge

(y, x, w, b) = max 0, 1 y(w

>

x + b)

min

w

1

2 w

0>

w

0

+ C X

i

max(0, 1 y

i

w

>

x

i

)

J

t

(w) = 1

2 w

0>

w

0

+ C X

i

max(0, 1 y

i

w

>

x

i

)

rJ

t

=

⇢ [w

0

; 0] if max(0, 1 y

i

w

xi

) = 0 w Cy

i

x

i

otherwise

Soft SVM

• Hard SVM:

• Introduce one slack variable per example

– And require and

• New optimization problem for learning

44

Maximize margin

Every example has a functional margin of at least 1

i

216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269

maxw

1

2w>w

s.t.8i, yi(w>xi + b) 1

maxw

1

2w>w + C X

i

i

s.t.8i, yi(w>xi + b) 1 i

i 0

5

i 0 216

217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269

minw

1

2w>w

s.t.8i, yi(w>xi + b) 1

minw

1

2w>w + C X

i

i

s.t.8i, yi(w>xi + b) 1 ⇠i

i 0

minw,b

1

2w>w + C X

i

max 0, 1 yi(w>xi + b)

LHinge(y, x, w, b) = max 0, 1 y(w>x + b)

minw

1

2w0>w0 + C X

i

max(0, 1 yiw>xi)

Jt(w) = 1

2w0>w0 + C X

i

max(0, 1 yiw>xi)

rJt =

⇢[w0; 0] if max(0, 1 yiwxi ) = 0 w Cyixi otherwise

5

(45)

Soft SVM

45

216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269

minw

1

2w>w

s.t.8i, yi(w>xi + b) 1

minw

1

2w>w + C X

i

i

s.t.8i, yi(w>xi + b) 1 ⇠i

i 0

minw,b

1

2w>w + C X

i

max 0, 1 yi(w>xi + b)

LHinge(y, x, w, b) = max 0, 1 y(w>x + b)

minw

1

2w0>w0 + C X

i

max(0, 1 yiw>xi)

Jt(w) = 1

2w0>w0 + C X

i

max(0, 1 yiw>xi)

rJt =

⇢[w0; 0] if max(0, 1 yiwix) = 0 w Cyixi otherwise

(46)

Soft SVM

46

Maximize margin

216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269

minw

1

2w>w

s.t.8i, yi(w>xi + b) 1

minw

1

2w>w + C X

i

i

s.t.8i, yi(w>xi + b) 1 ⇠i

i 0

minw,b

1

2w>w + C X

i

max 0, 1 yi(w>xi + b)

LHinge(y, x, w, b) = max 0, 1 y(w>x + b)

minw

1

2w0>w0 + C X

i

max(0, 1 yiw>xi)

Jt(w) = 1

2w0>w0 + C X

i

max(0, 1 yiw>xi)

rJt =

⇢[w0; 0] if max(0, 1 yiwix) = 0 w Cyixi otherwise

References

Related documents

 Fill out ClassMate Internal Help Desk Form on the Barry Tech/LIHSA website under “Staff Resources.”  When filling out form, please be as detailed as possible..  The

Recall that α b is the greatest fractional error in task-duration which the project can tolerate, in each task, without time over-run, while β b is the smallest ‘error’ (early

The variation in state-level stringency is likely to influence the impact of Federal policies on water quality, hence the need to create a theoretically consistent measure

The increase in RWE kinetic and soft violence in Canada over the last few years begs the question: is there a gap in Canada’s hate crime laws.. This paper argues that the legal use

Constitution and Rules may be made at an Annual General Meeting or Special General Meeting, as prescribed in Rule 14.1 of the Official Club Constitution and Rules, provided that

The Impact of the Educational Intervention on Knowledge, Attitude and Practice of Phar- macovigilance toward Adverse Drug Reactions Reporting among Health-care Professionals in

From a large-scale hospital-based registry in Japan, we demonstrated the predictive accuracy of lactate, pH, and potassium for 1-month survival in OHCA patients with

The difference of overweight and obesity observed between sex was not significant (p=0.978) (Table 2). Similar insignificant difference between sex was observed in the