1
Support Vectors, Duals, and the Kernel Trick
Machine Learning Fall 2017
Supervised Learning: The Setup
1
Machine Learning Spring 2018
The slides are partly from Vivek Srikumar
Support vector machines
• Training by maximizing margin
• The SVM objective
• Solving the SVM optimization problem
• Support vectors, duals and kernels
2
This lecture
1. Dual forms, and support vectors 2. Kernels & kernel trick
3. Properties of kernels 4. Nonlinear SVM
3
This lecture
1. Dual forms, and support vectors
2. Kernels & kernel trick 3. Properties of kernels 4. Nonlinear SVM
4
So far we have seen
• Support vector machines
• Hinge loss and optimizing the regularized loss
More broadly, different algorithms for learning linear classifiers
5
So far we have seen
• Support vector machines
• Hinge loss and optimizing the regularized loss
More broadly, different algorithms for learning linear classifiers
What about non-linear models?
6
One way to learn non-linear models
Explicitly introduce non-linearity into the feature space
7
If the true separator is quadratic
One way to learn non-linear models
Explicitly introduce non-linearity into the feature space
8
If the true separator is quadratic Transform all input points as
One way to learn non-linear models
Explicitly introduce non-linearity into the feature space
9
If the true separator is quadratic Transform all input points as
Now, we can try to find a weight vector in this higher dimensional space That is, predict using wT!(x1, x2) + b≥ 0
Primal and dual forms: constraint optimization
Primal: general constraint optimization problem
10
216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269
min
w1
2 w
>w
s.t. 8i, y
i(w
>x
i+ b) 1
min
w1
2 w
>w + C X
i
⇠
is.t. 8i, y
i(w
>x
i+ b) 1 ⇠
i⇠
i0
min
w,b1
2 w
>w + C X
i
max 0, 1 y
i(w
>x
i+ b)
L
Hinge(y, x, w, b) = max 0, 1 y(w
>x + b)
min
w1
2 w
0>w
0+ C X
i
max(0, 1 y
iw
>x
i)
J
t(w) = 1
2 w
>0w
0+ C · N max(0, 1 y
iw
>x
i)
rJ
t=
⇢ [w
0; 0] if max(0, 1 y
iw
xi) = 0 [w
0; 0] C · Ny
ix
iotherwise
f (x) f (u) + rf(u)
>(x u)
min
xf (x)
s.t. g
1(x) 0, . . . , g
m(x) 0
5
Primal and dual forms: constraint optimization
Primal: general constraint optimization problem
11
216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269
min
w1
2 w
>w
s.t. 8i, y
i(w
>x
i+ b) 1
min
w1
2 w
>w + C X
i
⇠
is.t. 8i, y
i(w
>x
i+ b) 1 ⇠
i⇠
i0
min
w,b1
2 w
>w + C X
i
max 0, 1 y
i(w
>x
i+ b)
L
Hinge(y, x, w, b) = max 0, 1 y(w
>x + b)
min
w1
2 w
0>w
0+ C X
i
max(0, 1 y
iw
>x
i)
J
t(w) = 1
2 w
>0w
0+ C · N max(0, 1 y
iw
>x
i)
rJ
t=
⇢ [w
0; 0] if max(0, 1 y
iw
xi) = 0 [w
0; 0] C · Ny
ix
iotherwise
f (x) f (u) + rf(u)
>(x u)
min
xf (x)
s.t. g
1(x) 0, . . . , g
m(x) 0
5
Lagrange form
216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269
minw
1
2w>w
s.t.8i, yi(w>xi + b) 1
minw
1
2w>w + C X
i
⇠i
s.t.8i, yi(w>xi + b) 1 ⇠i
⇠i 0
minw,b
1
2w>w + C X
i
max 0, 1 yi(w>xi + b)
LHinge(y, x, w, b) = max 0, 1 y(w>x + b)
minw
1
2w>0 w0 + C X
i
max(0, 1 yiw>xi)
Jt(w) = 1
2w0>w0 + C · N max(0, 1 yiw>xi)
rJt =
⇢[w0; 0] if max(0, 1 yiwxi ) = 0 [w0; 0] C · Nyixi otherwise
f (x) f (u) + rf(u)>(x u)
minx f (x)
s.t. g1(x) 0, . . . , gm(x) 0
L(x, ) = f (x) + Xm
i=1
igi(x)
5
Primal and dual forms: constraint optimization
Primal: general constraint optimization problem
12
216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269
min
w1
2 w
>w
s.t. 8i, y
i(w
>x
i+ b) 1
min
w1
2 w
>w + C X
i
⇠
is.t. 8i, y
i(w
>x
i+ b) 1 ⇠
i⇠
i0
min
w,b1
2 w
>w + C X
i
max 0, 1 y
i(w
>x
i+ b)
L
Hinge(y, x, w, b) = max 0, 1 y(w
>x + b)
min
w1
2 w
0>w
0+ C X
i
max(0, 1 y
iw
>x
i)
J
t(w) = 1
2 w
>0w
0+ C · N max(0, 1 y
iw
>x
i)
rJ
t=
⇢ [w
0; 0] if max(0, 1 y
iw
xi) = 0 [w
0; 0] C · Ny
ix
iotherwise
f (x) f (u) + rf(u)
>(x u)
min
xf (x)
s.t. g
1(x) 0, . . . , g
m(x) 0
5
Lagrange form
216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269
minw
1
2w>w
s.t.8i, yi(w>xi + b) 1
minw
1
2w>w + C X
i
⇠i
s.t.8i, yi(w>xi + b) 1 ⇠i
⇠i 0
minw,b
1
2w>w + C X
i
max 0, 1 yi(w>xi + b)
LHinge(y, x, w, b) = max 0, 1 y(w>x + b)
minw
1
2w>0 w0 + C X
i
max(0, 1 yiw>xi)
Jt(w) = 1
2w0>w0 + C · N max(0, 1 yiw>xi)
rJt =
⇢[w0; 0] if max(0, 1 yiwxi ) = 0 [w0; 0] C · Nyixi otherwise
f (x) f (u) + rf(u)>(x u)
minx f (x)
s.t. g1(x) 0, . . . , gm(x) 0
L(x, ) = f (x) + Xm
i=1
igi(x)
5
Equivalent formulation, removing constraints on x
216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269
min
w1
2 w
>w
s.t. 8i, y
i(w
>x
i+ b) 1
min
w1
2 w
>w + C X
i
⇠
is.t. 8i, y
i(w
>x
i+ b) 1 ⇠
i⇠
i0
min
w,b1
2 w
>w + C X
i
max 0, 1 y
i(w
>x
i+ b)
L
Hinge(y, x, w, b) = max 0, 1 y(w
>x + b)
min
w1
2 w
0>w
0+ C X
i
max(0, 1 y
iw
>x
i)
J
t(w) = 1
2 w
>0w
0+ C · N max(0, 1 y
iw
>x
i)
rJ
t=
⇢ [w
0; 0] if max(0, 1 y
iw
ix) = 0 [w
0; 0] C · Ny
ix
iotherwise
f (x) f (u) + rf(u)
>(x u)
min
xf (x)
s.t. g
1(x) 0, . . . , g
m(x) 0
L(x, ) = f (x) +
X
m i=1i
g
i(x)
min
xmax L(x, ) = f (x) +
X
m i=1i
g
i(x) s.t.
10, . . . ,
m0
5
Primal and dual forms: constraint optimization
Primal: general constraint optimization problem
13
216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269
min
w1
2 w
>w
s.t. 8i, y
i(w
>x
i+ b) 1
min
w1
2 w
>w + C X
i
⇠
is.t. 8i, y
i(w
>x
i+ b) 1 ⇠
i⇠
i0
min
w,b1
2 w
>w + C X
i
max 0, 1 y
i(w
>x
i+ b)
L
Hinge(y, x, w, b) = max 0, 1 y(w
>x + b)
min
w1
2 w
0>w
0+ C X
i
max(0, 1 y
iw
>x
i)
J
t(w) = 1
2 w
>0w
0+ C · N max(0, 1 y
iw
>x
i)
rJ
t=
⇢ [w
0; 0] if max(0, 1 y
iw
xi) = 0 [w
0; 0] C · Ny
ix
iotherwise
f (x) f (u) + rf(u)
>(x u)
min
xf (x)
s.t. g
1(x) 0, . . . , g
m(x) 0
5
Lagrange form
216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269
minw
1
2w>w
s.t.8i, yi(w>xi + b) 1
minw
1
2w>w + C X
i
⇠i
s.t.8i, yi(w>xi + b) 1 ⇠i
⇠i 0
minw,b
1
2w>w + C X
i
max 0, 1 yi(w>xi + b)
LHinge(y, x, w, b) = max 0, 1 y(w>x + b)
minw
1
2w>0 w0 + C X
i
max(0, 1 yiw>xi)
Jt(w) = 1
2w0>w0 + C · N max(0, 1 yiw>xi)
rJt =
⇢[w0; 0] if max(0, 1 yiwxi ) = 0 [w0; 0] C · Nyixi otherwise
f (x) f (u) + rf(u)>(x u)
minx f (x)
s.t. g1(x) 0, . . . , gm(x) 0
L(x, ) = f (x) + Xm
i=1
igi(x)
5
Equivalent formulation, removing constraints on x
216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269
min
w1
2 w
>w
s.t. 8i, y
i(w
>x
i+ b) 1
min
w1
2 w
>w + C X
i
⇠
is.t. 8i, y
i(w
>x
i+ b) 1 ⇠
i⇠
i0
min
w,b1
2 w
>w + C X
i
max 0, 1 y
i(w
>x
i+ b)
L
Hinge(y, x, w, b) = max 0, 1 y(w
>x + b)
min
w1
2 w
0>w
0+ C X
i
max(0, 1 y
iw
>x
i)
J
t(w) = 1
2 w
>0w
0+ C · N max(0, 1 y
iw
>x
i)
rJ
t=
⇢ [w
0; 0] if max(0, 1 y
iw
ix) = 0 [w
0; 0] C · Ny
ix
iotherwise
f (x) f (u) + rf(u)
>(x u)
min
xf (x)
s.t. g
1(x) 0, . . . , g
m(x) 0
L(x, ) = f (x) +
X
m i=1i
g
i(x)
min
xmax L(x, ) = f (x) +
X
m i=1i
g
i(x) s.t.
10, . . . ,
m0
min
xmax
0
L(x, ) = f (x) +
X
m i=1i
g
i(x)
(2)
5
Primal and dual forms: constraint optimization
Primal: general constraint optimization problem
14
216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269
min
w1
2 w
>w
s.t. 8i, y
i(w
>x
i+ b) 1
min
w1
2 w
>w + C X
i
⇠
is.t. 8i, y
i(w
>x
i+ b) 1 ⇠
i⇠
i0
min
w,b1
2 w
>w + C X
i
max 0, 1 y
i(w
>x
i+ b)
L
Hinge(y, x, w, b) = max 0, 1 y(w
>x + b)
min
w1
2 w
0>w
0+ C X
i
max(0, 1 y
iw
>x
i)
J
t(w) = 1
2 w
>0w
0+ C · N max(0, 1 y
iw
>x
i)
rJ
t=
⇢ [w
0; 0] if max(0, 1 y
iw
xi) = 0 [w
0; 0] C · Ny
ix
iotherwise
f (x) f (u) + rf(u)
>(x u)
min
xf (x)
s.t. g
1(x) 0, . . . , g
m(x) 0
5 Why equivalent?
216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269
minw
1
2w>w
s.t.8i, yi(w>xi + b) 1
minw
1
2w>w + C X
i
⇠i
s.t.8i, yi(w>xi + b) 1 ⇠i
⇠i 0
minw,b
1
2w>w + C X
i
max 0, 1 yi(w>xi + b)
LHinge(y, x, w, b) = max 0, 1 y(w>x + b)
minw
1
2w>0 w0 + C X
i
max(0, 1 yiw>xi)
Jt(w) = 1
2w>0 w0 + C · N max(0, 1 yiw>xi)
rJt =
⇢[w0; 0] if max(0, 1 yiwix) = 0 [w0; 0] C · Nyixi otherwise
f (x) f (u) + rf(u)>(x u)
minx f (x)
s.t. g1(x) 0, . . . , gm(x) 0
L(x, ) = f (x) + Xm
i=1
igi(x)
minx max L(x, ) = f (x) +
Xm i=1
igi(x) s.t. 1 0, . . . , m 0
minx max
0 L(x, ) = f (x) + Xm
i=1
igi(x)
(2)
max0 f (x) +
Xm i=1
igi(x) =
⇢f (x) if x satisfy all the constraints 1 otherwise
5
Primal and dual forms: constraint optimization
Primal: general constraint optimization problem
15
216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269
min
w1
2 w
>w
s.t. 8i, y
i(w
>x
i+ b) 1
min
w1
2 w
>w + C X
i
⇠
is.t. 8i, y
i(w
>x
i+ b) 1 ⇠
i⇠
i0
min
w,b1
2 w
>w + C X
i
max 0, 1 y
i(w
>x
i+ b)
L
Hinge(y, x, w, b) = max 0, 1 y(w
>x + b)
min
w1
2 w
0>w
0+ C X
i
max(0, 1 y
iw
>x
i)
J
t(w) = 1
2 w
>0w
0+ C · N max(0, 1 y
iw
>x
i)
rJ
t=
⇢ [w
0; 0] if max(0, 1 y
iw
xi) = 0 [w
0; 0] C · Ny
ix
iotherwise
f (x) f (u) + rf(u)
>(x u)
min
xf (x)
s.t. g
1(x) 0, . . . , g
m(x) 0
5 Why equivalent?
216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269
minw
1
2w>w
s.t.8i, yi(w>xi + b) 1
minw
1
2w>w + C X
i
⇠i
s.t.8i, yi(w>xi + b) 1 ⇠i
⇠i 0
minw,b
1
2w>w + C X
i
max 0, 1 yi(w>xi + b)
LHinge(y, x, w, b) = max 0, 1 y(w>x + b)
minw
1
2w>0 w0 + C X
i
max(0, 1 yiw>xi)
Jt(w) = 1
2w>0 w0 + C · N max(0, 1 yiw>xi)
rJt =
⇢[w0; 0] if max(0, 1 yiwix) = 0 [w0; 0] C · Nyixi otherwise
f (x) f (u) + rf(u)>(x u)
minx f (x)
s.t. g1(x) 0, . . . , gm(x) 0
L(x, ) = f (x) + Xm
i=1
igi(x)
minx max L(x, ) = f (x) +
Xm i=1
igi(x) s.t. 1 0, . . . , m 0
minx max
0 L(x, ) = f (x) + Xm
i=1
igi(x)
(2)
max0 f (x) +
Xm i=1
igi(x) =
⇢f (x) if x satisfy all the constraints 1 otherwise
5
1
= . . . =
m= 0
<latexit sha1_base64="KwPfTpSGDOCgPJNEw2zpRNa+MCg=">AAACC3icbZDLSsNAFIYnXmu9RV26GVoEVyURQTdC0Y3LCvYCbQiTyaQdOpcwMxFK6N6Nr+LGhSJufQF3vo3TNAtt/WHg4z/ncOb8UcqoNp737aysrq1vbFa2qts7u3v77sFhR8tMYdLGkknVi5AmjArSNtQw0ksVQTxipBuNb2b17gNRmkpxbyYpCTgaCppQjIy1Qrc2YLY5RqEPr+CAxdLoAuYmt+yFbt1reIXgMvgl1EGpVuh+DWKJM06EwQxp3fe91AQ5UoZiRqbVQaZJivAYDUnfokCc6CAvbpnCE+vEMJHKPmFg4f6eyBHXesIj28mRGenF2sz8r9bPTHIZ5FSkmSECzxclGYNGwlkwMKaKYMMmFhBW1P4V4hFSCBsbX9WG4C+evAyds4bvNfy783rzuoyjAo5BDZwCH1yAJrgFLdAGGDyCZ/AK3pwn58V5dz7mrStOOXME/sj5/AHZk5j+</latexit><latexit sha1_base64="KwPfTpSGDOCgPJNEw2zpRNa+MCg=">AAACC3icbZDLSsNAFIYnXmu9RV26GVoEVyURQTdC0Y3LCvYCbQiTyaQdOpcwMxFK6N6Nr+LGhSJufQF3vo3TNAtt/WHg4z/ncOb8UcqoNp737aysrq1vbFa2qts7u3v77sFhR8tMYdLGkknVi5AmjArSNtQw0ksVQTxipBuNb2b17gNRmkpxbyYpCTgaCppQjIy1Qrc2YLY5RqEPr+CAxdLoAuYmt+yFbt1reIXgMvgl1EGpVuh+DWKJM06EwQxp3fe91AQ5UoZiRqbVQaZJivAYDUnfokCc6CAvbpnCE+vEMJHKPmFg4f6eyBHXesIj28mRGenF2sz8r9bPTHIZ5FSkmSECzxclGYNGwlkwMKaKYMMmFhBW1P4V4hFSCBsbX9WG4C+evAyds4bvNfy783rzuoyjAo5BDZwCH1yAJrgFLdAGGDyCZ/AK3pwn58V5dz7mrStOOXME/sj5/AHZk5j+</latexit><latexit sha1_base64="KwPfTpSGDOCgPJNEw2zpRNa+MCg=">AAACC3icbZDLSsNAFIYnXmu9RV26GVoEVyURQTdC0Y3LCvYCbQiTyaQdOpcwMxFK6N6Nr+LGhSJufQF3vo3TNAtt/WHg4z/ncOb8UcqoNp737aysrq1vbFa2qts7u3v77sFhR8tMYdLGkknVi5AmjArSNtQw0ksVQTxipBuNb2b17gNRmkpxbyYpCTgaCppQjIy1Qrc2YLY5RqEPr+CAxdLoAuYmt+yFbt1reIXgMvgl1EGpVuh+DWKJM06EwQxp3fe91AQ5UoZiRqbVQaZJivAYDUnfokCc6CAvbpnCE+vEMJHKPmFg4f6eyBHXesIj28mRGenF2sz8r9bPTHIZ5FSkmSECzxclGYNGwlkwMKaKYMMmFhBW1P4V4hFSCBsbX9WG4C+evAyds4bvNfy783rzuoyjAo5BDZwCH1yAJrgFLdAGGDyCZ/AK3pwn58V5dz7mrStOOXME/sj5/AHZk5j+</latexit><latexit sha1_base64="KwPfTpSGDOCgPJNEw2zpRNa+MCg=">AAACC3icbZDLSsNAFIYnXmu9RV26GVoEVyURQTdC0Y3LCvYCbQiTyaQdOpcwMxFK6N6Nr+LGhSJufQF3vo3TNAtt/WHg4z/ncOb8UcqoNp737aysrq1vbFa2qts7u3v77sFhR8tMYdLGkknVi5AmjArSNtQw0ksVQTxipBuNb2b17gNRmkpxbyYpCTgaCppQjIy1Qrc2YLY5RqEPr+CAxdLoAuYmt+yFbt1reIXgMvgl1EGpVuh+DWKJM06EwQxp3fe91AQ5UoZiRqbVQaZJivAYDUnfokCc6CAvbpnCE+vEMJHKPmFg4f6eyBHXesIj28mRGenF2sz8r9bPTHIZ5FSkmSECzxclGYNGwlkwMKaKYMMmFhBW1P4V4hFSCBsbX9WG4C+evAyds4bvNfy783rzuoyjAo5BDZwCH1yAJrgFLdAGGDyCZ/AK3pwn58V5dz7mrStOOXME/sj5/AHZk5j+</latexit>
Primal and dual forms: constraint optimization
Primal: general constraint optimization problem
16
216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269
minw
1
2w>w
s.t.8i, yi(w>xi + b) 1
minw
1
2w>w + C X
i
⇠i
s.t.8i, yi(w>xi + b) 1 ⇠i
⇠i 0
minw,b
1
2w>w + C X
i
max 0, 1 yi(w>xi + b)
LHinge(y, x, w, b) = max 0, 1 y(w>x + b)
minw
1
2w>0 w0 + C X
i
max(0, 1 yiw>xi)
Jt(w) = 1
2w>0 w0 + C · N max(0, 1 yiw>xi)
rJt =
⇢[w0; 0] if max(0, 1 yiwix) = 0 [w0; 0] C · Nyixi otherwise
f (x) f (u) + rf(u)>(x u)
minx f (x)
s.t. g1(x) 0, . . . , gm(x) 0
L(x, ) = f (x) + Xm
i=1
igi(x)
minx max L(x, ) = f (x) +
Xm i=1
igi(x) s.t. 1 0, . . . , m 0
minx max
0 L(x, ) = f (x) + Xm
i=1
igi(x)
(2)
max0 f (x) +
Xm i=1
igi(x) =
⇢f (x) if x satisfy all the constraints 1 otherwise
5
216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269
min
w1
2 w
>w
s.t. 8i, y
i(w
>x
i+ b) 1
min
w1
2 w
>w + C X
i
⇠
is.t. 8i, y
i(w
>x
i+ b) 1 ⇠
i⇠
i0
min
w,b1
2 w
>w + C X
i
max 0, 1 y
i(w
>x
i+ b)
L
Hinge(y, x, w, b) = max 0, 1 y(w
>x + b)
min
w1
2 w
0>w
0+ C X
i
max(0, 1 y
iw
>x
i)
J
t(w) = 1
2 w
0>w
0+ C · N max(0, 1 y
iw
>x
i)
rJ
t=
⇢ [w
0; 0] if max(0, 1 y
iw
ix) = 0 [w
0; 0] C · Ny
ix
iotherwise
f (x) f (u) + rf(u)
>(x u)
min
xf (x)
s.t. g
1(x) 0, . . . , g
m(x) 0
L(x, ) = f (x) +
X
m i=1i
g
i(x)
min
xmax L(x, ) = f (x) +
X
m i=1i
g
i(x) s.t.
10, . . . ,
m0
min
xmax
0
L(x, ) = f (x) +
X
m i=1i
g
i(x)
(2)
5
Outer minimization:
Search x over constraint space To minimize f(x)
Primal and dual forms: constraint optimization
Primal: general constraint optimization problem
17
216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269
min
w1
2 w
>w
s.t. 8i, y
i(w
>x
i+ b) 1
min
w1
2 w
>w + C X
i
⇠
is.t. 8i, y
i(w
>x
i+ b) 1 ⇠
i⇠
i0
min
w,b1
2 w
>w + C X
i
max 0, 1 y
i(w
>x
i+ b)
L
Hinge(y, x, w, b) = max 0, 1 y(w
>x + b)
min
w1
2 w
0>w
0+ C X
i
max(0, 1 y
iw
>x
i)
J
t(w) = 1
2 w
>0w
0+ C · N max(0, 1 y
iw
>x
i)
rJ
t=
⇢ [w
0; 0] if max(0, 1 y
iw
xi) = 0 [w
0; 0] C · Ny
ix
iotherwise
f (x) f (u) + rf(u)
>(x u)
min
xf (x)
s.t. g
1(x) 0, . . . , g
m(x) 0
5
216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269
min
w1
2 w
>w
s.t. 8i, y
i(w
>x
i+ b) 1
min
w1
2 w
>w + C X
i
⇠
is.t. 8i, y
i(w
>x
i+ b) 1 ⇠
i⇠
i0
min
w,b1
2 w
>w + C X
i
max 0, 1 y
i(w
>x
i+ b)
L
Hinge(y, x, w, b) = max 0, 1 y(w
>x + b)
min
w1
2 w
0>w
0+ C X
i
max(0, 1 y
iw
>x
i)
J
t(w) = 1
2 w
>0w
0+ C · N max(0, 1 y
iw
>x
i)
rJ
t=
⇢ [w
0; 0] if max(0, 1 y
iw
ix) = 0 [w
0; 0] C · Ny
ix
iotherwise
f (x) f (u) + rf(u)
>(x u)
min
xf (x)
s.t. g
1(x) 0, . . . , g
m(x) 0
L(x, ) = f (x) +
X
m i=1i
g
i(x)
min
xmax L(x, ) = f (x) +
X
m i=1i
g
i(x) s.t.
10, . . . ,
m0
min
xmax
0
L(x, ) = f (x) +
X
m i=1i
g
i(x)
(2)
5
Totally equivalent!
It is very trial to incorporate equality constraints! How?
Primal and dual forms: constraint optimization
Wo focus on this primal form
18
216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269
min
w1
2 w
>w
s.t. 8i, y
i(w
>x
i+ b) 1
min
w1
2 w
>w + C X
i
⇠
is.t. 8i, y
i(w
>x
i+ b) 1 ⇠
i⇠
i0
min
w,b1
2 w
>w + C X
i
max 0, 1 y
i(w
>x
i+ b)
L
Hinge(y, x, w, b) = max 0, 1 y(w
>x + b)
min
w1
2 w
0>w
0+ C X
i
max(0, 1 y
iw
>x
i)
J
t(w) = 1
2 w
>0w
0+ C · N max(0, 1 y
iw
>x
i)
rJ
t=
⇢ [w
0; 0] if max(0, 1 y
iw
xi) = 0 [w
0; 0] C · Ny
ix
iotherwise
f (x) f (u) + rf(u)
>(x u)
min
xf (x)
s.t. g
1(x) 0, . . . , g
m(x) 0
L(x, ) = f (x) +
X
m i=1i
g
i(x)
min
xmax L(x, ) = f (x) +
X
m i=1i
g
i(x) s.t.
10, . . . ,
m0
min
xmax
0
L(x, ) = f (x) +
X
m i=1i