Summer course
on Convex Optimization
Fifth Lecture
Interior-Point Methods (1)
Michel Baes, K.U.Leuven
Interior-Point Methods:
the rebirth of an old idea
Suppose that $f$ is convex and $g_1, \dots, g_m$ are concave:
$$\min f(x) \quad \text{s.t.} \quad g_i(x) \ge 0, \; 1 \le i \le m \quad (\equiv x \in X).$$
We want to solve this by Newton's method.
- Constraints are difficult to handle with this method.
Idea: put them in the objective:
$$\min f(x) + \mu \sum_{i=1}^m \varphi(g_i(x)),$$
where $\varphi$ is convex, nonincreasing, and $\varphi(t) \to +\infty$ if $t \downarrow 0$.
Then solve it for various $\mu \to 0$. $\Phi(x) := \sum_{i=1}^m \varphi(g_i(x))$ is a barrier, as $\Phi$ is convex, and $\Phi(x) \to +\infty$ as $x \to \partial X$.
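As a minimal sketch of the barrier idea, consider the toy problem min x s.t. x - 1 ≥ 0 with the logarithmic barrier φ(t) = -ln(t). The problem, the function name `barrier_minimizer` and the step-halving safeguard are illustrative assumptions, not part of the lecture. The barrier problem min x - µ ln(x - 1) has the closed-form minimizer 1 + µ, which tends to the true solution x = 1 as µ → 0.

```python
def barrier_minimizer(mu, x=2.0, iters=50):
    """Newton's method on F(x) = x - mu*ln(x - 1), the barrier form of min x s.t. x - 1 >= 0."""
    for _ in range(iters):
        grad = 1.0 - mu / (x - 1.0)        # F'(x)
        hess = mu / (x - 1.0) ** 2         # F''(x) > 0 on the domain x > 1
        step = grad / hess
        # Halve the step until the trial point stays strictly feasible (x > 1).
        while x - step <= 1.0:
            step *= 0.5
        x -= step
    return x

# The minimizers approach the constrained solution x = 1 as mu decreases.
for mu in (1.0, 0.1, 0.01, 0.001):
    print(mu, barrier_minimizer(mu))
```

The halving loop is only there because a full Newton step can leave the domain when µ is small, which is one of the practical difficulties of this approach.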
Problems:
Interior-Point Methods:
the rebirth of an old idea
1969 - 1984: Reign of Augmented Lagrangian Methods
In the language of the lecture of yesterday,
$$F = \left\{ x \mapsto \frac{\mu}{2}\|u\|^2 - \sum_i \left(u_i - \frac{x_i}{\mu}\right)_+^2 \; : \; \mu > 0 \right\}$$
1984: Narendra Karmarkar creates a new polynomial-time algorithm for Linear Programming
People realized it fits
The blooming of Interior-Point Methods
1988: Yurii Nesterov and Arkadii Nemirovski generalize Interior-Point Methods to Convex Optimization. Nonlinearity is not an issue anymore.
Largest nonlinear Optimization problem solved: $10^9$ variables, $3 \cdot 10^8$ constraints (Gondzio, 2006)
1992: Yurii Nesterov and Mike Todd define efficient algorithms for Semidefinite Optimization. A new way of modelling appears, with applications in Mechanics, Control, Finance, Structural Design, ... (Boyd, Vandenberghe)
Something odd in Black-Box methods for
convex programming
How do Black-Box methods deal with convexity?
First, you realize that your problem is convex
(or even strongly convex)
Thus, you investigate its global properties
Then you hide your problem in a mysterious black box
You only interact with it through an oracle that gives you local information
(if $x$ is the current point, gives $f(x)$, and/or $\nabla f(x)$, and/or $\nabla^2 f(x)$ ...)
By the way, how do you check convexity?
- Directly from the definition
  Try this one: for $x > 0$,
  $$f(x) := \max\left\{ \exp(\|x\|_2^2), \; \lambda_{\max}\left(\sum_{i=1}^n x_i A_i\right) - \ln(x_1) \right\} + 5x_n^4.$$
- By using the structure of the function
  You know several "simple" convex functions: $t^2$, $\exp(t)$, ... and several operations that preserve convexity: max, +, ...
And after all this work, you give this beautiful structure to a Black-Box method that doesn't care about it!
But interior-point methods explicitly use this structure to construct a barrier for the feasible set (see below).
Newton’s Method under scrutiny
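$$x_{k+1} = x_k - \nabla^2 f(x_k)^{-1} \nabla f(x_k)$$
We use $|||\cdot|||$ for the induced matrix norm.
Theorem 1 (Kantorovich)
Suppose that the function $f : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ satisfies:
- $f$ is twice continuously differentiable,
- $\exists M > 0$ such that $\forall x, y$: $|||\nabla^2 f(x) - \nabla^2 f(y)||| \le M \|x - y\|$,
- $\nabla^2 f(x^*) \succeq lI \succ 0$ (where $\nabla f(x^*) = 0$);
then, when $\|x_0 - x^*\| < 2l/(3M)$, the iterates $x_k$ of Newton's method are well-defined and:
$$\|x_{k+1} - x^*\| \le \frac{M \|x_k - x^*\|^2}{2(l - M\|x_k - x^*\|)}.$$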
Newton’s Method under scrutiny
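Kantorovich's proof
$$x_{k+1} = x_k - \nabla^2 f(x_k)^{-1} \nabla f(x_k)$$
We have:
$$x_{k+1} - x^* = x_k - x^* - \nabla^2 f(x_k)^{-1} \nabla f(x_k) = \nabla^2 f(x_k)^{-1} \int_0^1 \left[\nabla^2 f(x_k) - \nabla^2 f(x^* + t(x_k - x^*))\right](x_k - x^*)\,dt.$$
Hence, with $r_k := \|x_k - x^*\|$,
$$r_{k+1} \le |||\nabla^2 f(x_k)^{-1}||| \left( \int_0^1 |||\nabla^2 f(x_k) - \nabla^2 f(x^* + t(x_k - x^*))|||\,dt \right) r_k \le |||\nabla^2 f(x_k)^{-1}||| \, \frac{M}{2} r_k^2 \le \frac{M r_k^2}{2(l - M r_k)},$$
because $\nabla^2 f(x_k) - \nabla^2 f(x^*) \succeq -M r_k I_n$, and $\nabla^2 f(x_k) \succeq (l - M r_k) I_n$. Note that $r_{k+1} < r_k$ when $r_k < 2l/(3M)$, because
$$r_{k+1} \le \frac{M r_k^2}{2(l - M r_k)} < r_k.$$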
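The quadratic decrease of the error can be observed numerically. The sketch below assumes the one-dimensional test function f(x) = exp(x) - 2x (an illustrative choice, not from the lecture), whose minimizer is x* = ln 2, and records the errors r_k = |x_k - x*| of the pure Newton iteration.

```python
import math

def newton_errors(x0, steps=6):
    """Errors |x_k - x*| of Newton's method on f(x) = exp(x) - 2x, with x* = ln 2."""
    x_star = math.log(2.0)
    x, errors = x0, []
    for _ in range(steps):
        errors.append(abs(x - x_star))
        grad = math.exp(x) - 2.0   # f'(x)
        hess = math.exp(x)         # f''(x) > 0
        x -= grad / hess           # pure Newton step
    return errors

# Each error is roughly half the square of the previous one.
for k, r in enumerate(newton_errors(1.0)):
    print(k, r)
```

Starting close enough to x*, the number of correct digits roughly doubles at every step until machine precision is reached.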
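Kantorovich's result is very "strange"
The iterates of Newton's Method are affine invariant
Proof:
Consider $A$ invertible, $x_0$, $x_{k+1} = x_k - \nabla^2 f(x_k)^{-1} \nabla f(x_k)$, $\varphi(y) := f(Ay)$, and $y_0 = A^{-1} x_0$. Note that
$$\langle \nabla\varphi(y_0), h \rangle = \lim_{t \to 0} \frac{f(Ay_0 + tAh) - f(Ay_0)}{t} = \langle \nabla f(Ay_0), Ah \rangle,$$
and $\nabla\varphi(y_0) = A^* \nabla f(Ay_0)$. Similarly $\nabla^2\varphi(y_0) = A^* \nabla^2 f(Ay_0) A$. Hence
$$y_1 = y_0 - \nabla^2\varphi(y_0)^{-1} \nabla\varphi(y_0) = A^{-1}\left( x_0 - \nabla^2 f(Ay_0)^{-1} \nabla f(Ay_0) \right) = A^{-1} x_1.$$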
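The invariance can be checked numerically in one dimension, where the invertible map A is just a nonzero scalar. In this sketch the test function f(x) = exp(x) - 2x, the value a = 5 and the helper `newton_iterates` are all illustrative assumptions.

```python
import math

def newton_iterates(grad, hess, x0, steps):
    """Return [x_0, ..., x_steps] for the pure Newton iteration x - grad(x)/hess(x)."""
    xs = [x0]
    for _ in range(steps):
        xs.append(xs[-1] - grad(xs[-1]) / hess(xs[-1]))
    return xs

a = 5.0  # hypothetical invertible map A (a nonzero scalar in 1-D)

f_grad = lambda x: math.exp(x) - 2.0
f_hess = lambda x: math.exp(x)
phi_grad = lambda y: a * f_grad(a * y)       # A* grad f(Ay)
phi_hess = lambda y: a * f_hess(a * y) * a   # A* hess f(Ay) A

xs = newton_iterates(f_grad, f_hess, 1.0, 5)
ys = newton_iterates(phi_grad, phi_hess, 1.0 / a, 5)

# Every iterate satisfies y_k = A^{-1} x_k, up to rounding.
for x, y in zip(xs, ys):
    print(x, a * y)
```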
What’s wrong with the assumptions?
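$$x_{k+1} = x_k - \nabla^2 f(x_k)^{-1} \nabla f(x_k)$$
We use $|||\cdot|||$ for the induced matrix norm.
Theorem 1 (Kantorovich)
Suppose that the function $f : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ satisfies:
- $f$ is twice continuously differentiable,
- $\exists M > 0$ such that $\forall x, y$: $|||\nabla^2 f(x) - \nabla^2 f(y)||| \le M \|x - y\|$,
- $\nabla^2 f(x^*) \succeq lI \succ 0$ (where $\nabla f(x^*) = 0$);
then, when $\|x_0 - x^*\| < 2l/(3M)$, the iterates $x_k$ of Newton's method are well-defined and:
$$\|x_{k+1} - x^*\| \le \frac{M \|x_k - x^*\|^2}{2(l - M\|x_k - x^*\|)}.$$
The constants $M$ and $l$ and the norm $\|\cdot\|$ all depend on the choice of coordinates, whereas the iterates of Newton's method do not.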
Nesterov and Nemirovski’s solution
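Instead of using the Euclidean norm $\|\cdot\|$, use a local norm $u \mapsto \|u\|_x = \langle \nabla^2 f(x) u, u \rangle^{1/2}$.
This norm is affine invariant.
Let $\varphi(y) := f(Ay)$, $x = Ay$ and $v = A^{-1} u$. We have
$$\langle \nabla^2\varphi(y) v, v \rangle = \langle (A^* \nabla^2 f(Ay) A)(A^{-1} u), A^{-1} u \rangle = \langle \nabla^2 f(x) u, u \rangle.$$
The property
$$\forall x, y \quad |||\nabla^2 f(x) - \nabla^2 f(y)||| \le M \|x - y\|$$
should then be replaced by: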
Self-concordancy:
one of the two "big" properties
There exists $M > 0$ for which:
$$\forall x, y, h \quad \nabla^3 f(x)[h, h, x - y] \le M \|h\|_x^2 \|x - y\|_x \iff \forall x, h \quad \nabla^3 f(x)[h, h, h] \le M \|h\|_x^3.$$
After rescaling $f$, one can take $M = 2$:
$$\forall x, h \quad \nabla^3 f(x)[h, h, h] \le 2 \|h\|_x^3.$$
Such functions are called self-concordant.
Examples (check it as exercise):
- $-\ln(t)$ (domain: $\mathbb{R}_{++}$)
- $-\ln\det(X)$ (domain: $S^N_{++}$)
- $-\ln(t^2 - \|x\|^2)$ (domain: ice-cream cone)
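For the first example, the exercise can be worked out in a few lines. The following is a sketch of the computation for $f(t) = -\ln(t)$ on $t > 0$ only; the other two examples need more work.

```latex
\begin{align*}
f(t) &= -\ln(t), \qquad f'(t) = -\frac{1}{t}, \qquad f''(t) = \frac{1}{t^2}, \qquad f'''(t) = -\frac{2}{t^3}, \\
\|h\|_t &= \langle f''(t) h, h \rangle^{1/2} = \frac{|h|}{t}, \\
\nabla^3 f(t)[h, h, h] &= -\frac{2 h^3}{t^3} \le \frac{2 |h|^3}{t^3} = 2 \|h\|_t^3 .
\end{align*}
```

The bound holds with equality for $h < 0$, so the constant 2 cannot be improved for this function.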
Self-concordant functions:
the right thing for Newton’s method
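These functions have MANY properties, among which: for every $x \in \operatorname{dom} f$, $\{y : \|y - x\|_x < 1\} \subseteq \operatorname{dom} f$
(interesting for Karmarkar's method)
Proof: Let $x \in \operatorname{dom}(f)$, $h \in \mathbb{R}^n$. Let
$$\varphi(t) = \frac{1}{\|h\|_{x+th}} = \frac{1}{\langle \nabla^2 f(x + th) h, h \rangle^{1/2}}.$$
Then $\varphi(t) \to 0^+$ when $x + th \to \partial\operatorname{dom}(f)$ (because the Hessian goes to $\infty$).
As long as $\varphi(t) > 0$, $x + th \in \operatorname{dom}(f)$. Moreover,
$$\varphi'(t) = -\frac{\nabla^3 f(x + th)[h, h, h]}{2 \langle \nabla^2 f(x + th) h, h \rangle^{3/2}}.$$
Thus $|\varphi'(t)| \le 1$, and $\varphi(t) > 0$ for $-\varphi(0) < t < \varphi(0)$, i.e. for $\|th\|_x < 1$.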
Self-concordant functions:
the right thing for Newton’s method
These functions have MANY properties, among which:
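- If $\|\nabla f(x)\|_x^* := \sqrt{\langle \nabla^2 f(x)^{-1} \nabla f(x), \nabla f(x) \rangle} \le \frac{3 - \sqrt{5}}{2}$, then $x$ is in the quadratic convergence zone (automatic test, no $x^*$ needed)
- The following method ALWAYS converges:
$$x_{k+1} = x_k - \frac{\nabla^2 f(x_k)^{-1} \nabla f(x_k)}{1 + \|\nabla f(x_k)\|_{x_k}^*}$$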
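A small experiment illustrates why the damping factor matters. The sketch below assumes the self-concordant test function f(x) = x - ln(x) on x > 0 (an illustrative choice, with minimizer x* = 1); the function names are hypothetical. From x = 3 the pure Newton step leaves the domain, while the damped step does not.

```python
def newton_step(x, damped):
    """One Newton step on f(x) = x - ln(x), x > 0, optionally damped by 1 + ||grad f(x)||*_x."""
    grad = 1.0 - 1.0 / x
    hess = 1.0 / x ** 2
    decrement = abs(grad) / hess ** 0.5          # local dual norm of the gradient
    scale = 1.0 + decrement if damped else 1.0
    return x - (grad / hess) / scale

def damped_newton(x0, steps=50):
    """Run the damped Newton method for a fixed number of steps."""
    x = x0
    for _ in range(steps):
        x = newton_step(x, damped=True)
    return x

print(newton_step(3.0, damped=False))  # negative: outside the domain x > 0
print(damped_newton(3.0))              # close to the minimizer 1
print(damped_newton(1e-3))             # also close to 1, from a far starting point
```

The damped step always stays inside the unit ball of the local norm around the current point, which is contained in the domain of f.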
Finally! Interior-point methods
Main idea: formulate your problem in its conic form, and use as barrier for your cone a self-concordant function $f$:
$$\min \langle c, x \rangle \;\; \text{s.t.} \;\; Ax = b, \; x \in K \quad \longleftarrow \quad \min \langle c, x \rangle + \mu f(x) \;\; \text{s.t.} \;\; Ax = b$$
The set of minimizers $x(\mu)$ is called the primal central path, and $x(\mu) \to x^*$ when $\mu \to 0$.
But wait: is $\langle c, x \rangle + \mu f(x)$ a self-concordant function? Yes!
How to decrease $\mu$?
Main goal: we want to decrease it linearly: $\mu^- = (1 - \theta)\mu$.
Main idea: use our knowledge of the quadratic convergence zone.
Current point: $x(\mu)$. Target: $x(\mu^-)$.
We have $c + \mu \nabla f(x(\mu)) = 0$, and we want
$$\|c + (1 - \theta)\mu \nabla f(x(\mu))\|_{x(\mu)}^* < \frac{3 - \sqrt{5}}{2}, \quad \text{i.e.} \quad \theta\mu \|\nabla f(x(\mu))\|_{x(\mu)}^* < \frac{3 - \sqrt{5}}{2};$$
hence, we would like to have a bound for $\|\nabla f(x)\|_x^*$.
Note: this bound is responsible for the complexity. The smaller it is, the bigger the decrease $\theta$ can be.
The two crucial properties of barriers
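Self-concordancy: $\forall x, h \quad \nabla^3 f(x)[h, h, h] \le 2 \|h\|_x^3$
Bound for $\|\nabla f(x)\|_x^*$: $\forall x \in \operatorname{dom} f \quad \langle \nabla^2 f(x)^{-1} \nabla f(x), \nabla f(x) \rangle \le \nu$
These functions are called $\nu$-self-concordant barriers.
The theoretical complexity of the best IPM is $O(\sqrt{\nu} \ln(C/\varepsilon))$ Newton iterations.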
An interior-point algorithm
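Algorithm 1
Let $\varepsilon > 0$, $\mu_0 > 0$ and $x_0$ feasible such that $\|c + \mu_0 \nabla f(x_0)\|_{x_0}^* \le \frac{3 - \sqrt{5}}{2}$.
Let $\theta := 1/(1.5 + 14.3\sqrt{\nu})$ and $k := 0$.
While $2.58 \mu_k \sqrt{\nu} > \varepsilon$:
  1. $\mu_{k+1} := \mu_k (1 - \theta)$
  2. $x_{k+1} := x_k - \nabla^2 f(x_k)^{-1} (\nabla f(x_k) + c/\mu_{k+1})$
  3. Increment $k$
Complexity upper bound: $(1.03 + 14.3\sqrt{\nu}) \ln(2.58 \mu_0 \sqrt{\nu} / \varepsilon)$
Proof of constants: PhD Thesis of François Glineur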
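The loop can be sketched on a toy problem. Everything specific here is an assumption made for illustration: the problem min ⟨c, x⟩ over the box 0 < x_i < 1 (no equality constraints), the separable barrier f(x) = -Σ ln(x_i) - Σ ln(1 - x_i) with ν = 2n, and the function name `short_step_ipm`. The step is the Newton step on ⟨c, x⟩ + µ f(x), that is -∇²f(x)⁻¹(∇f(x) + c/µ). The start is the analytic center x_0 = (1/2, ..., 1/2), with µ_0 chosen by the caller large enough for x_0 to be close to the central path.

```python
import math

def short_step_ipm(c, mu0, eps):
    """Short-step path following for min <c, x> over the box 0 < x_i < 1 (toy sketch)."""
    n = len(c)
    nu = 2.0 * n                                   # parameter of the box barrier
    theta = 1.0 / (1.5 + 14.3 * math.sqrt(nu))     # update factor from the slide
    x, mu, iters = [0.5] * n, mu0, 0               # analytic center: grad f(x0) = 0
    while 2.58 * mu * math.sqrt(nu) > eps:
        mu *= 1.0 - theta
        # The Hessian is diagonal, so the Newton step splits per coordinate.
        for i in range(n):
            grad = -1.0 / x[i] + 1.0 / (1.0 - x[i])
            hess = 1.0 / x[i] ** 2 + 1.0 / (1.0 - x[i]) ** 2
            x[i] -= (grad + c[i] / mu) / hess
        iters += 1
    return x, iters

c = [1.0, -2.0]                      # the optimum over the box is x = (0, 1), value -2
x, iters = short_step_ipm(c, mu0=4.0, eps=1e-3)
print(x, iters, sum(ci * xi for ci, xi in zip(c, x)))
```

Only one Newton step is taken per update of µ, which is what makes the iteration count proportional to √ν times a logarithm.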
How do you construct
self-concordant barriers?
1- Basic barriers:
Domain              Barrier                          Complexity parameter
$\mathbb{R}_+$      $-\ln(t)$                        1
$S^n_+$             $-\ln(\det(X))$                  $n$
epi $\|x\|_2$       $-\ln(t^2 - \|x\|_2^2)$          2
epi $\exp(x)$       $-\ln(t) - \ln(\ln(t) - x)$      2
2- Combining barriers:
- Let $f_1$ be a barrier for $K_1$ with param. $\nu_1$, and $f_2$ be a barrier for $K_2$ with param. $\nu_2$. Then $f_1 + f_2$ is a barrier for $K_1 \cap K_2$ with param. $\nu_1 + \nu_2$.
- Let $f$ be the barrier for $K$ with param. $\nu$. Then $f^*(s) := \sup_{x \in \mathbb{R}^n} \{-\langle s, x \rangle - f(x)\}$ is a barrier for the dual cone $K^*$ with param. $\nu$.
- Let $f$ be the barrier for $K$ with param. $\nu$. The restriction of $f$ to an affine space $S$ is a barrier for the set $S \cap K$ with param. $\nu$.
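The complexity parameter can be checked numerically for the simplest case. This sketch evaluates ⟨∇²f(x)⁻¹∇f(x), ∇f(x)⟩ for f(x) = -Σ ln(x_i), a sum of n copies of the barrier -ln(t) for R_+; by the sum rule its parameter should be n. The function name and the sample point are illustrative assumptions.

```python
def barrier_parameter_estimate(x):
    """Evaluate <hess f(x)^{-1} grad f(x), grad f(x)> for f(x) = -sum_i ln(x_i), x > 0."""
    total = 0.0
    for xi in x:
        grad = -1.0 / xi
        hess = 1.0 / xi ** 2          # the Hessian is diagonal
        total += grad * grad / hess   # each coordinate contributes exactly 1
    return total

# Three copies of -ln(t): the value is 3 at any point of the positive orthant.
print(barrier_parameter_estimate([0.3, 2.0, 17.0]))
```

For this barrier the quantity is constant over the whole domain, so the bound ν = n is attained everywhere.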
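What about primal-dual problems?
Everything is the same:
$$\min \langle c, x \rangle \;\; \text{s.t.} \;\; Ax = b, \; x \in K \quad \ge \quad \max \langle y, b \rangle \;\; \text{s.t.} \;\; A^* y + s = c, \; s \in K^*$$
i.e. (What's the optimal value?)
$$\min \langle c, x \rangle - \langle b, y \rangle \;\; \text{s.t.} \;\; Ax = b, \; A^* y + s = c, \; x \in K, \; s \in K^*$$
$$\longleftarrow \quad \min \langle s, x \rangle + \mu (f(x) + f^*(s)) \;\; \text{s.t.} \;\; Ax = b, \; A^* y + s = c$$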
Strangely enough,
primal-dual IPM work very well
- All IPM optimization software (SeDuMi, MOSEK, ...) is primal-dual.
- Efficient IPMs can solve:
  - Linear problems,
  - Second-order cone problems (ice-cream cone), in particular Quadratic problems,
  - Semidefinite problems, and (sometimes)
  - Geometric problems, i.e. involving posynomials (see on Thursday)
And in practice?
Many speed-ups and tricks are used
- For computing the starting point (and dealing with infeasible starting points)
- For solving the Newton system (reduction of variables)
- For updating $\mu$: decrease $\mu$ much faster than in the theory, then do several steps targeting the central path
Some references
[1] - Y. Nesterov, Introductory lectures on convex optimization: a basic course, Kluwer, 2003
[2] - Y. Nesterov and A. Nemirovski, Interior Point Algorithms in Convex Programming, SIAM, 1993
[3] - J. Renegar, A Mathematical View of Interior-Point Methods in Convex Optimization, SIAM, 2001