Summer course on Convex Optimization. Fifth Lecture Interior-Point Methods (1) Michel Baes, K.U.Leuven Bharath Rangarajan, U.


(2)

Interior-Point Methods:

the rebirth of an old idea

Suppose that $f$ is convex and $g_1, \dots, g_m$ are concave:

$$\min f(x) \quad \text{s.t.} \quad g_i(x) \ge 0,\ 1 \le i \le m \quad (\equiv x \in X)$$

We want to solve this by Newton's method.

- Constraints are difficult to handle with this method.

Idea: put them in the objective:

$$\min f(x) + \mu \sum_{i=1}^m \varphi(g_i(x))$$

where $\varphi$ is convex, nonincreasing, and $\varphi(t) \to +\infty$ as $t \downarrow 0$ (for example $\varphi(t) = -\ln t$).

Then solve it for various $\mu \to 0$. $\Phi(x) := \sum_{i=1}^m \varphi(g_i(x))$ is a barrier, as $\Phi$ is convex, and $\Phi(x) \to +\infty$ as $x \to \partial X$.
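A minimal sketch of the barrier idea on a toy problem. The problem itself is an assumption chosen for illustration, not taken from the lecture: $f(x) = (x+1)^2$, one constraint $g(x) = x \ge 0$, and $\varphi(t) = -\ln t$. Setting the derivative of the barrier objective to zero gives $2x^2 + 2x - \mu = 0$, so $x(\mu) = (-1 + \sqrt{1 + 2\mu})/2$, which tends to the constrained solution $x^* = 0$ as $\mu \to 0$. The code finds the same point numerically by bisection on the derivative.

```python
import math

def barrier_minimizer(mu, iters=200):
    """Minimize (x + 1)^2 - mu * ln(x) over x > 0 (toy problem, assumed for illustration).

    The derivative 2(x + 1) - mu / x is increasing on x > 0, so bisection on its sign works.
    """
    lo, hi = 1e-300, 1.0 + mu   # derivative is negative at lo and positive at hi
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if 2.0 * (mid + 1.0) - mu / mid < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def closed_form(mu):
    """Positive root of 2x^2 + 2x - mu = 0."""
    return (-1.0 + math.sqrt(1.0 + 2.0 * mu)) / 2.0

mus = [1.0, 1e-2, 1e-4, 1e-6]
minimizers = [barrier_minimizer(mu) for mu in mus]
```

The list `minimizers` shrinks toward 0 as $\mu$ decreases, while every entry stays strictly inside the feasible set.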

Problems:

(3)

Interior-Point Methods:

the rebirth of an old idea

1969 - 1984: Reign of Augmented Lagrangian Methods

In the language of the lecture of yesterday,

$$F = \left\{ x \mapsto \frac{\mu}{2} \left[ \|u\|^2 - \sum_i \left( u_i - \frac{x_i}{\mu} \right)_+^2 \right] : \mu > 0 \right\}$$

1984: Narendra Karmarkar creates a new polynomial-time algorithm for Linear Programming

People realized it fits

(4)

The blooming of Interior-Point Methods

1988: Yurii Nesterov and Arkadii Nemirovski generalize Interior-Point Methods to Convex Optimization. Nonlinearity is not an issue anymore.

Largest nonlinear Optimization problem solved: $10^9$ variables, $3 \cdot 10^8$ constraints (Gondzio, 2006)

1992: Yurii Nesterov and Mike Todd define efficient algorithms for Semidefinite Optimization. A new way of modelling appears, with applications in Mechanics, Control, Finance, Structural Design, ... (Boyd, Vandenberghe)

(5)

Something odd in Black-Box methods for

convex programming

How do Black-Box methods deal with convexity?

First, you realize that your problem is convex

(or even strongly convex)

Thus, you investigate its global properties

Then you hide your problem in a mysterious black box

You only interact with it through an oracle that gives you local information

(if $x$ is the current point, it gives $f(x)$, and/or $\nabla f(x)$, and/or $\nabla^2 f(x)$, ...)

(6)

By the way, how do you check convexity?

- Directly from the definition

Try this one: for $x > 0$,

$$f(x) := \max\left\{ \exp(\|x\|_2^2),\ \lambda_{\max}\left( \sum_{i=1}^n x_i A_i \right) - \ln(x_1) \right\} + 5 x_n^4.$$

- By using the structure of the function

You know several "simple" convex functions: $t^2$, $\exp(t)$, ... and several operations that preserve convexity: max, +, ... And after all this work, you give this beautiful structure to a Black-Box method that doesn't care about it!

But interior-point methods explicitly use this structure to construct a barrier for the feasible set (see below)

(7)

Newton’s Method under scrutiny

$$x_{k+1} = x_k - \nabla^2 f(x_k)^{-1} \nabla f(x_k)$$

We use $|||\cdot|||$ for the induced matrix norm

Theorem 1 (Kantorovich)

Suppose that the function $f : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ satisfies:

- $f$ is twice continuously differentiable,
- $\exists M > 0$ such that $\forall x, y$: $|||\nabla^2 f(x) - \nabla^2 f(y)||| \le M \|x - y\|$,
- $\nabla^2 f(x^*) \succeq lI \succ 0$ at a minimizer $x^*$,

then, when $\|x_0 - x^*\| < 2l/(3M)$, the iterates $x_k$ of Newton's method are well-defined and:

$$\|x_{k+1} - x^*\| \le \frac{M \|x_k - x^*\|^2}{2(l - M \|x_k - x^*\|)}$$

(8)

Newton’s Method under scrutiny

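Kantorovich's proof

$$x_{k+1} = x_k - \nabla^2 f(x_k)^{-1} \nabla f(x_k)$$

We have (using $\nabla f(x^*) = 0$):

$$x_{k+1} - x^* = x_k - x^* - \nabla^2 f(x_k)^{-1} \nabla f(x_k) = \nabla^2 f(x_k)^{-1} \int_0^1 \left[ \nabla^2 f(x_k) - \nabla^2 f(x^* + t(x_k - x^*)) \right] (x_k - x^*)\, dt$$

Hence, with $r_k := \|x_k - x^*\|$,

$$r_{k+1} \le |||\nabla^2 f(x_k)^{-1}||| \left( \int_0^1 |||\nabla^2 f(x_k) - \nabla^2 f(x^* + t(x_k - x^*))|||\, dt \right) r_k \le |||\nabla^2 f(x_k)^{-1}|||\, \frac{M}{2} r_k^2 \le \frac{M r_k^2}{2(l - M r_k)},$$

because $\nabla^2 f(x_k) - \nabla^2 f(x^*) \succeq -M r_k I_n$, and $\nabla^2 f(x_k) \succeq (l - M r_k) I_n$. Note that $r_{k+1} < r_k$ when $r_k < 2l/(3M)$, because

$$r_{k+1} \le \frac{M r_k^2}{2(l - M r_k)} < r_k \quad \text{whenever } 3 M r_k < 2l.$$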
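The quadratic decrease of $r_k$ can be observed numerically. The sketch below uses a one-dimensional test function chosen here for illustration, $f(x) = e^x - 2x$ with minimizer $x^* = \ln 2$; it is not from the lecture, and its second derivative is only locally Lipschitz, so it illustrates the local behaviour rather than the exact constants of the theorem.

```python
import math

def newton_errors(x0, steps):
    """Run Newton's method on f(x) = exp(x) - 2x and record r_k = |x_k - x*|."""
    x_star = math.log(2.0)          # f'(x) = exp(x) - 2 vanishes at ln 2
    x = x0
    errors = [abs(x - x_star)]
    for _ in range(steps):
        gradient = math.exp(x) - 2.0
        hessian = math.exp(x)
        x = x - gradient / hessian  # Newton step
        errors.append(abs(x - x_star))
    return errors

errors = newton_errors(1.0, 4)
# Ratios r_{k+1} / r_k^2 stay bounded, which is the signature of quadratic convergence.
ratios = [errors[k + 1] / errors[k] ** 2 for k in range(4)]
```

With this starting point the ratios settle near 1/2, since for this function $r_{k+1} = r_k - 1 + e^{-r_k} \approx r_k^2/2$.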

(9)

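Kantorovich's result is very "strange"

The iterates of Newton's Method are affine invariant

Proof:

Consider $A$ invertible, $x_0$, and $x_{k+1} = x_k - \nabla^2 f(x_k)^{-1} \nabla f(x_k)$.

Let $\varphi(y) := f(Ay)$, and $y_0 = A^{-1} x_0$. Note that

$$\langle \nabla\varphi(y_0), h \rangle = \lim_{t \to 0} \frac{f(Ay_0 + tAh) - f(Ay_0)}{t} = \langle \nabla f(Ay_0), Ah \rangle$$

and $\nabla\varphi(y_0) = A^* \nabla f(Ay_0)$. Similarly $\nabla^2\varphi(y_0) = A^* \nabla^2 f(Ay_0) A$. Therefore

$$y_1 = y_0 - \nabla^2\varphi(y_0)^{-1} \nabla\varphi(y_0) = A^{-1} \left( x_0 - \nabla^2 f(Ay_0)^{-1} \nabla f(Ay_0) \right) = A^{-1} x_1.$$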
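The affine invariance argument above can be checked numerically. In this sketch the test function $f(x) = \sum_i e^{x_i} + \tfrac12 \|x\|^2$, the matrix `A` and the starting point `x0` are arbitrary choices made for illustration; the only thing being verified is that one Newton step on $\varphi(y) = f(Ay)$ from $y_0 = A^{-1} x_0$ lands on $A^{-1} x_1$.

```python
import numpy as np

def grad_f(x):
    # f(x) = sum(exp(x_i)) + 0.5 * ||x||^2, a smooth strictly convex test function
    return np.exp(x) + x

def hess_f(x):
    return np.diag(np.exp(x)) + np.eye(x.size)

def newton_step(x, grad, hess):
    return x - np.linalg.solve(hess(x), grad(x))

A = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 3.0]])      # determinant 7, hence invertible
x0 = np.array([0.5, -1.0, 2.0])
y0 = np.linalg.solve(A, x0)          # y0 = A^{-1} x0

# Chain rule for phi(y) = f(Ay): gradient A^T grad f(Ay), Hessian A^T hess f(Ay) A
grad_phi = lambda y: A.T @ grad_f(A @ y)
hess_phi = lambda y: A.T @ hess_f(A @ y) @ A

x1 = newton_step(x0, grad_f, hess_f)
y1 = newton_step(y0, grad_phi, hess_phi)
```

Up to rounding, `A @ y1` coincides with `x1`, whereas a gradient step would not have this property.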

(10)

What’s wrong with the assumptions?

$$x_{k+1} = x_k - \nabla^2 f(x_k)^{-1} \nabla f(x_k)$$

We use $|||\cdot|||$ for the induced matrix norm

Theorem 1 (Kantorovich)

Suppose that the function $f : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ satisfies:

- $f$ is twice continuously differentiable,
- $\exists M > 0$ such that $\forall x, y$: $|||\nabla^2 f(x) - \nabla^2 f(y)||| \le M \|x - y\|$,
- $\nabla^2 f(x^*) \succeq lI \succ 0$ at a minimizer $x^*$,

then, when $\|x_0 - x^*\| < 2l/(3M)$, the iterates $x_k$ of Newton's method are well-defined and:

$$\|x_{k+1} - x^*\| \le \frac{M \|x_k - x^*\|^2}{2(l - M \|x_k - x^*\|)}$$

(11)

Nesterov and Nemirovski’s solution

Instead of using the Euclidean norm $\|\cdot\|$, use a local norm $u \mapsto \|u\|_x = \langle \nabla^2 f(x) u, u \rangle^{1/2}$

This norm is affine invariant

Let $\varphi(y) := f(Ay)$, $x = Ay$, and $v = A^{-1} u$. We have

$$\langle \nabla^2\varphi(y) v, v \rangle = \langle (A^* \nabla^2 f(Ay) A)(A^{-1} u), A^{-1} u \rangle = \langle \nabla^2 f(x) u, u \rangle$$

The property

$$\forall x, y \quad |||\nabla^2 f(x) - \nabla^2 f(y)||| \le M \|x - y\|$$

should then be replaced by:

(12)

Self-concordancy:

one of the two ”big” properties

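There exists $M > 0$ for which:

$$\forall x, y, h \quad \nabla^3 f(x)[h, h, x - y] \le M \|h\|_x^2 \|x - y\|_x$$

$$\Leftrightarrow \quad \forall x, h \quad \nabla^3 f(x)[h, h, h] \le M \|h\|_x^3$$

$$\Leftrightarrow \quad \forall x, h \quad \nabla^3 f(x)[h, h, h] \le 2 \|h\|_x^3 \quad \text{(after replacing } f \text{ by } (M^2/4) f\text{)}$$

Such functions are called self-concordant

Examples (check it as exercise):

- $-\ln(t)$ (domain: $\mathbb{R}_{++}$)
- $-\ln\det(X)$ (domain: $\mathbb{S}^N_{++}$)
- $-\ln(t^2 - \|x\|^2)$ (domain: ice-cream cone)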
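As a partial answer to the exercise, here is a numerical sanity check for the separable barrier $f(x) = -\sum_i \ln x_i$ on the positive orthant (the $n$-dimensional version of the first example). The formulas $\nabla^3 f(x)[h,h,h] = -2 \sum_i (h_i/x_i)^3$ and $\|h\|_x^2 = \sum_i (h_i/x_i)^2$ follow by direct differentiation; the sampling ranges are arbitrary choices. This is a spot check, not a proof.

```python
import numpy as np

def third_derivative(x, h):
    # f(x) = -sum(ln x_i):  D^3 f(x)[h, h, h] = -2 * sum((h_i / x_i)^3)
    return -2.0 * np.sum((h / x) ** 3)

def local_norm(x, h):
    # ||h||_x = <hess f(x) h, h>^(1/2) with hess f(x) = diag(1 / x_i^2)
    return np.sqrt(np.sum((h / x) ** 2))

rng = np.random.default_rng(1)
ratios = []
for _ in range(1000):
    x = rng.uniform(0.1, 10.0, size=4)   # random point in the open positive orthant
    h = rng.normal(size=4)               # random direction
    ratios.append(abs(third_derivative(x, h)) / local_norm(x, h) ** 3)
worst_ratio = max(ratios)

# In one dimension, f(t) = -ln(t), the bound holds with equality.
t, d = np.array([0.7]), np.array([1.3])
one_dim_ratio = abs(third_derivative(t, d)) / local_norm(t, d) ** 3
```

The ratio never exceeds 2 and equals 2 in dimension one, consistent with the normalized definition.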

(13)

Self-concordant functions:

the right thing for Newton’s method

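These functions have MANY properties, among which: for every $x \in \operatorname{dom} f$, $\{y : \|y - x\|_x < 1\} \subseteq \operatorname{dom} f$

(interesting for Karmarkar's method)

Proof: Let $x \in \operatorname{dom}(f)$, $h \in \mathbb{R}^n$. Let

$$\varphi(t) = \frac{1}{\|h\|_{x+th}} = \frac{1}{\langle \nabla^2 f(x+th) h, h \rangle^{1/2}}$$

Then $\varphi(t) \to 0^+$ when $x + th \to \partial\operatorname{dom}(f)$ (because the Hessian goes to $\infty$).

As long as $\varphi(t) > 0$, $x + th \in \operatorname{dom}(f)$.

$$\varphi'(t) = -\frac{\nabla^3 f(x+th)[h, h, h]}{2 \langle \nabla^2 f(x+th) h, h \rangle^{3/2}}$$

Thus $|\varphi'(t)| \le 1$, and $\varphi(t) > 0$ for $-\varphi(0) < t < \varphi(0)$, i.e. for $\|th\|_x < 1$.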

(14)

Self-concordant functions:

the right thing for Newton’s method

These functions have MANY properties, among which:

- If $\|\nabla f(x)\|_x^* := \sqrt{\langle \nabla^2 f(x)^{-1} \nabla f(x), \nabla f(x) \rangle} \le \frac{3 - \sqrt{5}}{2}$, then $x$ is in the quadratic convergence zone (automatic test, no $x^*$ needed)

- The following method ALWAYS converges

$$x_{k+1} = x_k - \frac{\nabla^2 f(x_k)^{-1} \nabla f(x_k)}{1 + \|\nabla f(x_k)\|_{x_k}^*}$$
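A minimal sketch of this damped Newton iteration, on a test function chosen here for illustration: $f(x) = \langle c, x \rangle - \sum_i \ln x_i$ with $c > 0$, which is self-concordant on $x > 0$ and has minimizer $x_i = 1/c_i$. The names `damped_newton`, `tol` and `max_iter` are hypothetical, as are the data. Because the damped step has local norm $\lambda/(1+\lambda) < 1$, the iterates never leave the domain, even from a poor starting point.

```python
import numpy as np

def damped_newton(c, x0, tol=1e-10, max_iter=500):
    """Damped Newton for f(x) = <c, x> - sum(ln x_i) on x > 0 (illustrative test function)."""
    x = np.asarray(x0, dtype=float).copy()
    for k in range(max_iter):
        grad = c - 1.0 / x
        newton_dir = -(x ** 2) * grad                  # -hess^{-1} grad, hess = diag(1 / x_i^2)
        decrement = np.sqrt(np.sum((x * grad) ** 2))   # ||grad f(x)||*_x
        if decrement <= tol:
            return x, k
        x = x + newton_dir / (1.0 + decrement)         # step length 1 / (1 + decrement)
    return x, max_iter

c = np.array([1.0, 2.0, 0.5])
x_final, iterations = damped_newton(c, x0=np.array([100.0, 1e-3, 5.0]))
threshold = (3.0 - np.sqrt(5.0)) / 2.0   # quadratic convergence zone test from the slide
```

Once the decrement drops below `threshold`, the remaining iterations converge quadratically, so the stopping test needs no knowledge of $x^*$.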

(15)

Finally ! Interior-point methods

Main idea:

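formulate your problem in its conic form, and use as barrier for your cone a self-concordant function $f$:

$$\begin{array}{ll} \min & \langle c, x \rangle \\ \text{s.t.} & Ax = b \\ & x \in K \end{array} \qquad \longleftarrow \qquad \begin{array}{ll} \min & \langle c, x \rangle + \mu f(x) \\ \text{s.t.} & Ax = b \end{array}$$

The set of minimizers $x(\mu)$ is called the primal central path, and $x(\mu) \to x^*$ when $\mu \to 0$

But wait: is $\langle c, x \rangle + \mu f(x)$ a self-concordant function? Yes!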
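To make the central path concrete, here is a sketch on a small linear program over the simplex: $K = \mathbb{R}^n_+$, $Ax = b$ reads $\sum_i x_i = 1$, and $f(x) = -\sum_i \ln x_i$. The cost vector and the function name `central_path_point` are assumptions for illustration. Stationarity with a multiplier $\lambda$ for the equality constraint gives $x_i(\mu) = \mu / (c_i + \lambda)$, and $\lambda$ is found by bisection so that the coordinates sum to one.

```python
import numpy as np

def central_path_point(c, mu, iters=200):
    """Minimizer of <c, x> - mu * sum(ln x_i) subject to sum(x) = 1 (illustrative LP)."""
    # Stationarity: c_i - mu / x_i + lam = 0, so x_i = mu / (c_i + lam).
    # sum(x) is decreasing in lam, which makes bisection applicable.
    c_min, n = c.min(), c.size
    lo, hi = -c_min + mu, -c_min + n * mu   # sum(x) >= 1 at lo and <= 1 at hi
    for _ in range(iters):
        lam = 0.5 * (lo + hi)
        if np.sum(mu / (c + lam)) > 1.0:
            lo = lam
        else:
            hi = lam
    return mu / (c + 0.5 * (lo + hi))

c = np.array([3.0, 1.0, 2.0])
mus = (1.0, 1e-2, 1e-6)
path = {mu: central_path_point(c, mu) for mu in mus}
objective_values = [float(c @ path[mu]) for mu in mus]
```

As $\mu$ decreases, the points move from the interior of the simplex toward the vertex with the smallest cost, here the second coordinate, and the objective values decrease toward the optimal value 1.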

(16)

How to decrease $\mu$?

Main goal: we want to decrease it linearly: $\mu^- = (1 - \theta)\mu$.

Main idea: use our knowledge of the quadratic convergence zone

Current point: $x(\mu)$. Target: $x(\mu^-)$.

We have $c + \mu \nabla f(x(\mu)) = 0$, and we want

$$\|c + (1 - \theta)\mu \nabla f(x(\mu))\|_{x(\mu)}^* < \frac{3 - \sqrt{5}}{2}, \quad \text{i.e.} \quad \theta\mu \|\nabla f(x(\mu))\|_{x(\mu)}^* < \frac{3 - \sqrt{5}}{2},$$

hence, we would like to have a bound for $\|\nabla f(x)\|_x^*$

Note: this bound is responsible for the complexity. The smaller it is, the bigger the decrease $\theta$ can be

(17)

The two crucial properties of barriers

Self-concordancy:

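$$\forall x, h \quad \nabla^3 f(x)[h, h, h] \le 2 \|h\|_x^3$$

Bound for $\|\nabla f(x)\|_x^*$:

$$\forall x \in \operatorname{dom} f \quad \langle \nabla^2 f(x)^{-1} \nabla f(x), \nabla f(x) \rangle \le \nu$$

These functions are called $\nu$-self-concordant barriers

The theoretical complexity of the best IPM is $O(\sqrt{\nu} \ln(C/\epsilon))$ Newton iterations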
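The quantity $\langle \nabla^2 f(x)^{-1} \nabla f(x), \nabla f(x) \rangle$ can be evaluated in closed form for the two standard logarithmic barriers. The sketch below does so at arbitrary test points (chosen here for illustration). For $f(X) = -\ln\det X$ it uses $\nabla f(X) = -X^{-1}$, the fact that the inverse Hessian maps $G$ to $XGX$, and the trace inner product on symmetric matrices.

```python
import numpy as np

def nu_log_barrier(x):
    # f(x) = -sum(ln x_i): grad = -1/x, hess = diag(1 / x_i^2)
    grad = -1.0 / x
    hess_inv_grad = (x ** 2) * grad
    return float(hess_inv_grad @ grad)

def nu_logdet_barrier(X):
    # f(X) = -ln det X: grad = -X^{-1}; the inverse Hessian maps G to X G X
    grad = -np.linalg.inv(X)
    hess_inv_grad = X @ grad @ X
    return float(np.trace(hess_inv_grad @ grad))   # trace inner product <G, H> = tr(G H)

x = np.array([0.3, 2.0, 7.5, 1e-2])                # a point in the open positive orthant
B = np.array([[2.0, 0.5, 0.0],
              [0.5, 1.0, 0.3],
              [0.0, 0.3, 4.0]])                    # symmetric positive definite

value_orthant = nu_log_barrier(x)
value_psd = nu_logdet_barrier(B)
```

Both values equal the dimension ($4$ and $3$ here) regardless of the point, so the bound holds with $\nu = n$ for these barriers.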

(18)

An interior-point algorithm

Algorithm 1. Let $\epsilon > 0$, $\mu_0 > 0$ and $x_0$ feasible such that $\|c + \mu_0 \nabla f(x_0)\|_{x_0}^* \le \frac{3 - \sqrt{5}}{2}$

Let $\theta := 1/(1.5 + 14.3\sqrt{\nu})$ and $k := 0$

While $2.58\,\mu_k \sqrt{\nu} > \epsilon$

1. $\mu_{k+1} := \mu_k (1 - \theta)$

2. $x_{k+1} := x_k - \nabla^2 f(x_k)^{-1} (\nabla f(x_k) + c/\mu_{k+1})$

3. Increment $k$

Complexity upper bound: $(1.03 + 14.3\sqrt{\nu}) \ln(2.58\,\mu_0 \sqrt{\nu}/\epsilon)$

Proof of constants: PhD Thesis of François Glineur
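A hedged sketch of a short-step path-following loop in the spirit of Algorithm 1, on the simplest possible instance: $\min \langle c, x \rangle$ subject to $x \ge 0$ with $c > 0$ and barrier $f(x) = -\sum_i \ln x_i$ ($\nu = n$, no equality constraints). Several choices here are assumptions rather than the lecture's statement: the barrier subproblem is taken to be $\min \langle c, x \rangle + \mu f(x)$, so the Newton direction is $-\nabla^2 f(x)^{-1}(\nabla f(x) + c/\mu)$; the loop runs until $2.58\,\mu\sqrt{\nu} \le \epsilon$; and proximity to the path is measured by $\|\nabla f(x) + c/\mu\|_x^*$. The function name and data are hypothetical.

```python
import numpy as np

def short_step_ipm(c, mu0, eps):
    """Path-following sketch for min <c, x> s.t. x >= 0 with f(x) = -sum(ln x_i)."""
    nu = c.size                                   # barrier parameter of f
    theta = 1.0 / (1.5 + 14.3 * np.sqrt(nu))      # constants quoted on the slide
    threshold = (3.0 - np.sqrt(5.0)) / 2.0
    mu = mu0
    x = mu0 / c                                   # exact central-path point x(mu0)
    worst_proximity = 0.0
    iterations = 0
    while 2.58 * mu * np.sqrt(nu) > eps:
        mu *= 1.0 - theta                         # step 1: shrink mu linearly
        grad = -1.0 / x + c / mu                  # grad f(x) + c / mu
        x = x - (x ** 2) * grad                   # step 2: Newton step, hess^{-1} = diag(x_i^2)
        # Assumed proximity measure: ||grad f(x) + c / mu||*_x at the new point
        proximity = np.sqrt(np.sum((x * (c / mu - 1.0 / x)) ** 2))
        worst_proximity = max(worst_proximity, proximity)
        iterations += 1
    return x, mu, iterations, worst_proximity, threshold

c = np.array([1.0, 2.0, 0.5])
x, mu, iterations, worst_proximity, threshold = short_step_ipm(c, mu0=1.0, eps=1e-6)
```

On this instance a single Newton step per update of $\mu$ keeps the iterate well inside the quadratic convergence zone, and the final objective $\langle c, x \rangle$ is close to $\nu\mu$, the value on the central path.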

(19)

How do you construct

self-concordant barriers?

1- Basic barriers:

Domain              Barrier                         Complexity parameter
$\mathbb{R}_+$      $-\ln(t)$                       1
$\mathbb{S}^n_+$    $-\ln(\det(X))$                 $n$
epi $\|x\|_2$       $-\ln(t^2 - \|x\|_2^2)$         2
epi $\exp(x)$       $-\ln(t) - \ln(\ln(t) - x)$     2

2- Combining barriers:

- Let $f_1$ be a barrier for $K_1$ with param. $\nu_1$, and $f_2$ be a barrier for $K_2$ with param. $\nu_2$. Then $f_1 + f_2$ is a barrier for $K_1 \cap K_2$ with param. $\nu_1 + \nu_2$

- Let $f$ be the barrier for $K$ with param. $\nu$. Then $f_*(s) := \sup_{x \in \mathbb{R}^n} \left( -\langle s, x \rangle - f(x) \right)$

(20)

How do you construct

self-concordant barriers?

2- Combining barriers:

- Let $f$ be the barrier for $K$ with param. $\nu$. The restriction of $f$ to an affine space $S$ is a barrier for the set $S \cap K$ with param. $\nu$

(21)

What about primal-dual problems ?

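Everything is the same:

$$\begin{array}{ll} \min & \langle c, x \rangle \\ \text{s.t.} & Ax = b \\ & x \in K \end{array} \qquad \ge \qquad \begin{array}{ll} \max & \langle y, b \rangle \\ \text{s.t.} & A^* y + s = c \\ & s \in K^* \end{array}$$

i.e. (What's the optimal value?)

$$\begin{array}{ll} \min & \langle c, x \rangle - \langle b, y \rangle \\ \text{s.t.} & Ax = b \\ & A^* y + s = c \\ & x \in K,\ s \in K^* \end{array} \qquad \longleftarrow \qquad \begin{array}{ll} \min & \langle s, x \rangle + \mu (f(x) + f_*(s)) \\ \text{s.t.} & Ax = b \\ & A^* y + s = c \end{array}$$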

(22)

Strangely enough,

primal-dual IPM work very well

- All IPM optimization software (SeDuMi, MOSEK, ...) is primal-dual.

- Efficient IPMs can solve:

  - Linear problems,
  - Second Order problems (ice-cream cone), in particular Quadratic problems,
  - Semidefinite problems, and (sometimes)
  - Geometric problems, i.e. involving posynomials (see on Thursday)

(23)

And in practice?

Many speed-ups and tricks are used

- For computing the starting point (and dealing with infeasible starting points)

- For solving the Newton system (reduction of variables)

- For updating $\mu$: decrease $\mu$ much faster than in the theory, then do several steps targeting the central path

(24)

Some references

[1] - Y. Nesterov, Introductory lectures on convex optimization: a basic course, Kluwer, 2003

[2] - Y. Nesterov and A. Nemirovski, Interior Point Algorithms in Convex Programming, SIAM, 1993

[3] - J. Renegar, A Mathematical View of Interior-Point Methods in Convex Optimization, SIAM, 2001