
Summer course on Convex Optimization. Fifth Lecture Interior-Point Methods (1) Michel Baes, K.U.Leuven Bharath Rangarajan, U.


Academic year: 2021


(1)

Summer course

on Convex Optimization

Fifth Lecture

Interior-Point Methods (1)

Michel Baes, K.U.Leuven

(2)

Interior-Point Methods:

the rebirth of an old idea

Suppose that $f$ is convex and $g_1, \dots, g_m$ are concave:

$$\min\ f(x) \quad \text{s.t.} \quad g_i(x) \ge 0,\ 1 \le i \le m, \quad x \in X$$

We want to solve this by Newton's method

• Constraints are difficult to handle with this method

Idea: put them in the objective

$$\min\ f(x) + \mu \sum_{i=1}^{m} \varphi(g_i(x))$$

where $\varphi$ is convex, nonincreasing, and $\varphi(t) \to +\infty$ if $t \to 0$

Then solve it for various values of $\mu > 0$ decreasing to 0. $\Phi(x) := \sum_{i=1}^{m} \varphi(g_i(x))$ is a barrier, as $\Phi$ is convex, and $\Phi(x) \to +\infty$ as $x \to \partial X$
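A minimal sketch of the barrier idea on a toy problem that is not from the lecture: min x subject to g(x) = x - 1 ≥ 0. The helper names and the two barrier choices φ(t) = -ln(t) and φ(t) = 1/t are assumptions made for illustration. The penalized minimizers are 1 + µ and 1 + √µ respectively, and both approach the constrained solution x = 1 as µ decreases.

```python
def minimize_on_interval(dphi, lo, hi, iters=200):
    """Bisection on an increasing derivative dphi over (lo, hi)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if dphi(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def barrier_minimizer(mu, dphi_barrier):
    # Toy problem (assumption): min x  s.t.  g(x) = x - 1 >= 0.
    # Penalized objective x + mu * phi(x - 1) has derivative 1 + mu * phi'(x - 1).
    d = lambda x: 1.0 + mu * dphi_barrier(x - 1.0)
    return minimize_on_interval(d, 1.0, 1.0 + 1e3)

log_barrier = lambda t: -1.0 / t        # derivative of phi(t) = -ln(t)
inv_barrier = lambda t: -1.0 / t ** 2   # derivative of phi(t) = 1/t

for mu in (1.0, 1e-2, 1e-4):
    print(mu, barrier_minimizer(mu, log_barrier), barrier_minimizer(mu, inv_barrier))
```

Bisection is used here only because it is robust in one dimension; the lecture's point is that Newton's method should do this job.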

Problems:

(3)

Interior-Point Methods:

the rebirth of an old idea

1969 - 1984: Reign of Augmented Lagrangian Methods

In the language of the lecture of yesterday,

$$F = \left\{ x \mapsto \frac{\mu}{2}\left( \|u\|^2 - \sum_i \left(u_i - \frac{x_i}{\mu}\right)_+^2 \right) : \mu > 0 \right\}$$

1984: Narendra Karmarkar creates a new polynomial-time algorithm for Linear Programming

People realized it fits

(4)

The blooming of Interior-Point Methods

1988: Yurii Nesterov and Arkadii Nemirovski generalize Interior-Point Methods to Convex Optimization. Nonlinearity is not an issue anymore.

Largest nonlinear Optimization problem solved: $10^9$ variables, $3 \cdot 10^8$ constraints (Gondzio, 2006)

1992: Yurii Nesterov and Mike Todd define efficient algorithms for Semidefinite Optimization. A new way of modelling appears, with applications in Mechanics, Control, Finance, Structural Design, ... (Boyd, Vandenberghe)

(5)

Something odd in Black-Box methods for

convex programming

How do Black-Box methods deal with convexity?

First, you realize that your problem is convex

(or even strongly convex)

Thus, you investigate its global properties

Then you hide your problem in a mysterious black box

You only interact with it through an oracle that gives you local information

(if $x$ is the current point, it gives $f(x)$, and/or $\nabla f(x)$, and/or $\nabla^2 f(x)$, ...)

(6)

By the way, how do you check convexity ?

• Directly from the definition

Try this one: for $x > 0$,

$$f(x) := \max\left\{ \exp(\|x\|_2^2),\ \lambda_{\max}\left(\sum_{i=1}^{n} x_i A_i\right) - \ln(x_1) \right\} + 5 x_n^4$$

• By using the structure of the function

You know several "simple" convex functions: $t^2$, $\exp(t)$, ... and several operations that preserve convexity: max, +, ... And after all this work, you give this beautiful structure to a Black-Box method that doesn't care about it!

But interior-point methods explicitly use this structure to construct a barrier for the feasible set (see below)
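For comparison, here is what a black-box style check looks like: a rough sketch that samples the midpoint inequality for the function above with n = 2. The matrices `A1`, `A2` are hypothetical symmetric 2x2 examples, and the sampling box is an arbitrary choice. Such a test can only refute convexity, never prove it, which is exactly why the structural argument is preferable.

```python
import math
import random

# Hypothetical symmetric 2x2 matrices (not from the lecture), stored as (a, b, d)
# for [[a, b], [b, d]].
A1 = (1.0, 2.0, -1.0)
A2 = (0.5, -1.0, 3.0)

def lambda_max(a, b, d):
    """Largest eigenvalue of the symmetric matrix [[a, b], [b, d]]."""
    return 0.5 * (a + d) + math.hypot(0.5 * (a - d), b)

def f(x):
    x1, x2 = x
    a, b, d = (x1 * A1[k] + x2 * A2[k] for k in range(3))
    first = math.exp(x1 * x1 + x2 * x2)
    second = lambda_max(a, b, d) - math.log(x1)
    return max(first, second) + 5.0 * x2 ** 4

def midpoint_violations(trials=10000, seed=0):
    rng = random.Random(seed)
    bad = 0
    for _ in range(trials):
        x = (rng.uniform(0.05, 1.5), rng.uniform(0.05, 1.5))
        y = (rng.uniform(0.05, 1.5), rng.uniform(0.05, 1.5))
        m = (0.5 * (x[0] + y[0]), 0.5 * (x[1] + y[1]))
        # Convexity requires f(m) <= (f(x) + f(y)) / 2 (small slack for rounding).
        if f(m) > 0.5 * (f(x) + f(y)) * (1 + 1e-12) + 1e-12:
            bad += 1
    return bad

print(midpoint_violations())
```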

(7)

Newton’s Method under scrutiny

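$$x_{k+1} = x_k - \nabla^2 f(x_k)^{-1} \nabla f(x_k)$$

We use $|||\cdot|||$ for the induced matrix norm

Theorem 1 (Kantorovich)

Suppose that the function $f : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ satisfies:

• $f$ is twice continuously differentiable,

• $\exists M > 0$ such that $\forall x, y$: $|||\nabla^2 f(x) - \nabla^2 f(y)||| \le M \|x - y\|$,

• $\nabla^2 f(x^*) \succeq lI \succ 0$ at a local minimizer $x^*$ of $f$,

then, when $\|x_0 - x^*\| < 2l/(3M)$, the iterates $x_k$ of Newton's method are well-defined and:

$$\|x_{k+1} - x^*\| \le \frac{M \|x_k - x^*\|^2}{2\,(l - M\|x_k - x^*\|)}$$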
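A small numerical sketch of the theorem in one dimension. The test function f(x) = x² + sin(x) is an assumption chosen for illustration: f''(x) = 2 - sin(x) ≥ 1 and |f'''(x)| ≤ 1, so M = 1 is a valid Lipschitz constant. The loop prints each error next to the bound M r_k² / (2(l - M r_k)).

```python
import math

# Illustrative function (an assumption, not from the lecture): f(x) = x^2 + sin(x).
def grad(x):
    return 2.0 * x + math.cos(x)

def hess(x):
    return 2.0 - math.sin(x)

def newton(x0, steps):
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        xs.append(x - grad(x) / hess(x))
    return xs

x_star = newton(0.0, 50)[-1]      # high-accuracy reference minimizer
M, l = 1.0, hess(x_star)          # Lipschitz constant and curvature at x_star

xs = newton(1.0, 6)               # x0 = 1 lies within distance 2l/(3M) of x_star
for xk, xk1 in zip(xs, xs[1:]):
    rk, rk1 = abs(xk - x_star), abs(xk1 - x_star)
    bound = M * rk ** 2 / (2.0 * (l - M * rk))
    print(f"r_k = {rk:.3e}  r_k+1 = {rk1:.3e}  bound = {bound:.3e}")
```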

(8)

Newton’s Method under scrutiny

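Kantorovich's proof

$$x_{k+1} = x_k - \nabla^2 f(x_k)^{-1} \nabla f(x_k)$$

We have:

$$x_{k+1} - x^* = x_k - x^* - \nabla^2 f(x_k)^{-1} \nabla f(x_k) = \nabla^2 f(x_k)^{-1} \int_0^1 \left[\nabla^2 f(x_k) - \nabla^2 f(x^* + t(x_k - x^*))\right](x_k - x^*)\,dt$$

Hence, with $r_k := \|x_k - x^*\|$,

$$r_{k+1} \le |||\nabla^2 f(x_k)^{-1}||| \left( \int_0^1 |||\nabla^2 f(x_k) - \nabla^2 f(x^* + t(x_k - x^*))|||\,dt \right) r_k \le |||\nabla^2 f(x_k)^{-1}|||\,\frac{M}{2}\,r_k^2 \le \frac{M r_k^2}{2\,(l - M r_k)},$$

because $\nabla^2 f(x_k) - \nabla^2 f(x^*) \succeq -M r_k I_n$, and $\nabla^2 f(x_k) \succeq (l - M r_k) I_n$. Note that $r_{k+1} < r_k$ when $r_k < 2l/(3M)$, because

$$r_{k+1} \le \frac{M r_k^2}{2\,(l - M r_k)} < r_k$$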

(9)

Kantorovich's result is very "strange"

The iterates of Newton’s Method are affine invariant

Proof:

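Consider $A$ invertible, $x_0$, $x_{k+1} = x_k - \nabla^2 f(x_k)^{-1} \nabla f(x_k)$, $\varphi(y) := f(Ay)$, and $y_0 = A^{-1} x_0$. Note that

$$\langle \nabla\varphi(y_0), h \rangle = \lim_{t \to 0} \frac{f(Ay_0 + tAh) - f(Ay_0)}{t} = \langle \nabla f(Ay_0), Ah \rangle$$

and $\nabla\varphi(y_0) = A^* \nabla f(Ay_0)$. Similarly $\nabla^2\varphi(y_0) = A^* \nabla^2 f(Ay_0) A$. Hence

$$y_1 = y_0 - \nabla^2\varphi(y_0)^{-1} \nabla\varphi(y_0) = A^{-1}\left( x_0 - \nabla^2 f(Ay_0)^{-1} \nabla f(Ay_0) \right) = A^{-1} x_1$$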
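A quick numerical illustration of this invariance in the scalar case, where the invertible map A is just a nonzero number `a`. The function f(x) = exp(x) - 2x and the value of `a` are assumptions picked for the example. The Newton iterates on φ(y) = f(ay) should coincide with x_k / a up to rounding.

```python
import math

def newton_iterates(grad, hess, x0, steps):
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        xs.append(x - grad(x) / hess(x))
    return xs

a = 7.5                                       # the invertible map A, here a nonzero scalar
f_grad = lambda x: math.exp(x) - 2.0          # f(x) = exp(x) - 2x (illustrative choice)
f_hess = lambda x: math.exp(x)
phi_grad = lambda y: a * f_grad(a * y)        # A* grad f(Ay)
phi_hess = lambda y: a * a * f_hess(a * y)    # A* hess f(Ay) A

x0 = 2.0
xs = newton_iterates(f_grad, f_hess, x0, 8)
ys = newton_iterates(phi_grad, phi_hess, x0 / a, 8)
max_gap = max(abs(y - x / a) for x, y in zip(xs, ys))
print(max_gap)
```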

(10)

What’s wrong with the assumptions?

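$$x_{k+1} = x_k - \nabla^2 f(x_k)^{-1} \nabla f(x_k)$$

We use $|||\cdot|||$ for the induced matrix norm

Theorem 1 (Kantorovich)

Suppose that the function $f : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ satisfies:

• $f$ is twice continuously differentiable,

• $\exists M > 0$ such that $\forall x, y$: $|||\nabla^2 f(x) - \nabla^2 f(y)||| \le M \|x - y\|$,

• $\nabla^2 f(x^*) \succeq lI \succ 0$ at a local minimizer $x^*$ of $f$,

then, when $\|x_0 - x^*\| < 2l/(3M)$, the iterates $x_k$ of Newton's method are well-defined and:

$$\|x_{k+1} - x^*\| \le \frac{M \|x_k - x^*\|^2}{2\,(l - M\|x_k - x^*\|)}$$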

(11)

Nesterov and Nemirovski’s solution

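Instead of using the Euclidean norm $\|\cdot\|$, use a local norm $u \mapsto \|u\|_x = \langle \nabla^2 f(x) u, u \rangle^{1/2}$

This norm is affine invariant

Let $\varphi(y) := f(Ay)$, $x = Ay$, and $v = A^{-1} u$. We have

$$\langle \nabla^2\varphi(y) v, v \rangle = \langle (A^* \nabla^2 f(Ay) A)(A^{-1} u), A^{-1} u \rangle = \langle \nabla^2 f(x) u, u \rangle$$

The property

$$\forall x, y:\ |||\nabla^2 f(x) - \nabla^2 f(y)||| \le M \|x - y\|$$

should then be replaced by: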

(12)

Self-concordancy:

one of the two "big" properties

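There exists $M > 0$ for which:

$$\forall x, y, h:\ \nabla^3 f(x)[h, h, x - y] \le M \|h\|_x^2 \|x - y\|_x \iff \forall x, h:\ \nabla^3 f(x)[h, h, h] \le M \|h\|_x^3$$

$$\iff \forall x, h:\ \nabla^3 f(x)[h, h, h] \le 2 \|h\|_x^3 \quad \text{(after rescaling } f \text{ so that } M = 2\text{)}$$

Such functions are called self-concordant

Examples (check it as exercise):

$-\ln(t)$ (domain: $\mathbb{R}_{++}$)

$-\ln\det(X)$ (domain: $S^N_{++}$)

$-\ln(t^2 - \|x\|^2)$ (domain: ice-cream cone)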
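The first example can be checked by hand: for f(t) = -ln(t), f''(t) = 1/t² and f'''(t) = -2/t³, so the ratio between the third derivative and the cube of the local norm has absolute value exactly 2. A minimal sketch confirming this numerically (the function name and sampling ranges are illustrative assumptions):

```python
import random

def third_over_norm_cubed(t, h):
    """Ratio f'''(t)[h,h,h] / ||h||_t^3 for f(t) = -ln(t), t > 0."""
    second = 1.0 / t ** 2          # f''(t)
    third = -2.0 / t ** 3          # f'''(t)
    local_norm = (second * h * h) ** 0.5
    return third * h ** 3 / local_norm ** 3

rng = random.Random(0)
ratios = [
    third_over_norm_cubed(rng.uniform(0.01, 100.0),
                          rng.choice([-1, 1]) * rng.uniform(0.1, 10.0))
    for _ in range(1000)
]
print(min(ratios), max(ratios))
```

The ratio is -2 for h > 0 and +2 for h < 0, so the inequality with constant 2 holds with equality in one direction.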

(13)

Self-concordant functions:

the right thing for Newton’s method

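These functions have MANY properties, among which: for every $x \in \operatorname{dom} f$, $\{y : \|y - x\|_x < 1\} \subseteq \operatorname{dom} f$

(interesting for Karmarkar's method)

Proof: Let $x \in \operatorname{dom}(f)$, $h \in \mathbb{R}^n$. Let

$$\varphi(t) = \frac{1}{\|h\|_{x+th}} = \frac{1}{\langle \nabla^2 f(x + th) h, h \rangle^{1/2}}$$

Then $\varphi(t) \to 0^+$ when $x + th \to \partial\operatorname{dom}(f)$ (because the Hessian goes to $\infty$).

As long as $\varphi(t) > 0$, $x + th \in \operatorname{dom}(f)$

$$\varphi'(t) = -\frac{\nabla^3 f(x + th)[h, h, h]}{2 \langle \nabla^2 f(x + th) h, h \rangle^{3/2}}$$

Thus $|\varphi'(t)| \le 1$, and $\varphi(t) > 0$ for $-\varphi(0) < t < \varphi(0)$, i.e. whenever $\|th\|_x < 1$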
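A sampling sketch of this inclusion for the barrier f(x) = -Σ ln(x_i) on the positive orthant, whose Hessian is diag(1/x_i²). The dimension, sampling ranges and function names are assumptions for illustration. Every sampled point with local distance below 1 must have strictly positive coordinates.

```python
import math
import random

def in_unit_local_ball(x, y):
    """||y - x||_x < 1 for f(x) = -sum(ln x_i), whose Hessian is diag(1 / x_i^2)."""
    return math.sqrt(sum(((yi - xi) / xi) ** 2 for xi, yi in zip(x, y))) < 1.0

def sample_check(trials=10000, seed=0):
    rng = random.Random(seed)
    inside = 0
    for _ in range(trials):
        x = [rng.uniform(0.1, 5.0) for _ in range(3)]
        y = [xi + rng.uniform(-1.5, 1.5) * xi for xi in x]
        if in_unit_local_ball(x, y):
            inside += 1
            # Every point of the unit local ball must stay in the domain (y > 0).
            assert all(yi > 0 for yi in y)
    return inside

print(sample_check())
```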

(14)

Self-concordant functions:

the right thing for Newton’s method

These functions have MANY properties, among which:

• If $\|\nabla f(x)\|_x := \sqrt{\langle \nabla^2 f(x)^{-1} \nabla f(x), \nabla f(x) \rangle} \le \frac{3 - \sqrt{5}}{2}$, then $x$ is in the quadratic convergence zone (automatic test, no $x^*$ needed)

• The following method ALWAYS converges:

$$x_{k+1} = x_k - \frac{\nabla^2 f(x_k)^{-1} \nabla f(x_k)}{1 + \|\nabla f(x_k)\|_{x_k}}$$
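A one-dimensional sketch of the damped method. The function f(x) = x - ln(x) on x > 0 (minimizer x = 1) is an assumed example; for it the quantity ||∇f(x)||_x equals |x - 1|. The last line shows that an undamped Newton step from x = 3 jumps to -3, outside the domain, while the damped iteration never leaves it.

```python
import math

ZONE = (3.0 - math.sqrt(5.0)) / 2.0   # quadratic convergence threshold from the slide

# Illustrative self-concordant function (an assumption): f(x) = x - ln(x), x > 0.
grad = lambda x: 1.0 - 1.0 / x
hess = lambda x: 1.0 / x ** 2

def decrement(x):
    """||grad f(x)||_x = sqrt(grad * hess^{-1} * grad)."""
    return abs(grad(x)) / math.sqrt(hess(x))

def damped_newton(x, tol=1e-12, max_iter=200):
    first_in_zone = None
    for k in range(max_iter):
        lam = decrement(x)
        if first_in_zone is None and lam <= ZONE:
            first_in_zone = k          # automatic test: no knowledge of x* needed
        if lam <= tol:
            return x, k, first_in_zone
        x = x - (grad(x) / hess(x)) / (1.0 + lam)
    return x, max_iter, first_in_zone

print(damped_newton(0.01))
print(damped_newton(100.0))
print(3.0 - grad(3.0) / hess(3.0))     # full Newton step from x = 3 leaves the domain
```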

(15)

Finally ! Interior-point methods

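Main idea:

formulate your problem in its conic form, and use as barrier for your cone a self-concordant function $f$

$$\begin{array}{ll} \min & \langle c, x \rangle \\ \text{s.t.} & Ax = b \\ & x \in K \end{array} \qquad\qquad \begin{array}{ll} \min & \langle c, x \rangle + \mu f(x) \\ \text{s.t.} & Ax = b \end{array}$$

The set of minimizers $x(\mu)$ is called the primal central path, and $x(\mu) \to x^*$ when $\mu \to 0$

But wait: is $\langle c, x \rangle + \mu f(x)$ a self-concordant function? Yes!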
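To make the central path concrete, here is a sketch on an assumed toy LP: min x1 + 2·x2 subject to x1 + x2 = 1, x ≥ 0, with barrier f(x) = -ln(x1) - ln(x2). Eliminating the equality constraint with x = (s, 1 - s) leaves a one-dimensional problem, solved here by bisection for simplicity. The printed points move toward the solution x* = (1, 0) as µ decreases.

```python
def central_point(c1, c2, mu, iters=200):
    """Minimizer of c1*s + c2*(1-s) - mu*(ln s + ln(1-s)) over 0 < s < 1, by bisection."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        s = 0.5 * (lo + hi)
        if s <= lo or s >= hi:   # interval has collapsed to machine precision
            break
        # Derivative of the barrier objective; it is increasing in s.
        d = (c1 - c2) - mu / s + mu / (1.0 - s)
        if d > 0:
            hi = s
        else:
            lo = s
    return 0.5 * (lo + hi)

# Toy LP (an assumption): min x1 + 2*x2  s.t.  x1 + x2 = 1, x >= 0; solution x* = (1, 0).
path = [(mu, central_point(1.0, 2.0, mu)) for mu in (1.0, 1e-1, 1e-2, 1e-4, 1e-6)]
for mu, s in path:
    print(f"mu = {mu:g}  x(mu) = ({s:.6f}, {1 - s:.6f})")
```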

(16)

How to decrease $\mu$?

Main goal: we want to decrease it linearly: $\mu^- = (1 - \theta)\mu$.

Main idea: use our knowledge of the quadratic convergence zone

Current point: $x(\mu)$. Target: $x(\mu^-)$.

We have $c + \mu \nabla f(x(\mu)) = 0$, and we want

$$\|c + (1 - \theta)\mu \nabla f(x(\mu))\|_{x(\mu)} < \frac{3 - \sqrt{5}}{2}, \quad \text{i.e.} \quad \theta\mu \|\nabla f(x(\mu))\|_{x(\mu)} < \frac{3 - \sqrt{5}}{2},$$

hence, we would like to have a bound for $\|\nabla f(x)\|_x$

Note: this bound is responsible for the complexity. The smaller it is, the bigger is the decrease $\theta$
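The "i.e." step above is a one-line substitution of the central path condition; spelled out with the same symbols as the slide:

```latex
\begin{aligned}
c + (1-\theta)\mu\nabla f(x(\mu))
  &= \underbrace{c + \mu\nabla f(x(\mu))}_{=\,0} \;-\; \theta\mu\nabla f(x(\mu)) \\
  &= -\theta\mu\nabla f(x(\mu)),
\end{aligned}
\qquad\text{so}\qquad
\|c + (1-\theta)\mu\nabla f(x(\mu))\|_{x(\mu)}
  = \theta\mu\,\|\nabla f(x(\mu))\|_{x(\mu)}.
```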

(17)

The two crucial properties of barriers

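Self-concordancy:

$$\forall x, h:\ \nabla^3 f(x)[h, h, h] \le 2 \|h\|_x^3$$

Bound for $\|\nabla f(x)\|_x$:

$$\forall x \in \operatorname{dom} f:\ \langle \nabla^2 f(x)^{-1} \nabla f(x), \nabla f(x) \rangle \le \nu$$

These functions are called $\nu$-self-concordant barriers

The theoretical complexity of the best IPM is $O(\sqrt{\nu} \ln(C/\epsilon))$ Newton iterations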
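As a sanity check of the second property, take f(x) = -Σ ln(x_i) on the positive orthant of R^n: the gradient is (-1/x_i) and the Hessian is diag(1/x_i²), so the quantity above equals n at every point, i.e. ν = n. A minimal sketch (the function name and sampling ranges are illustrative):

```python
import random

def barrier_parameter_value(x):
    """<H^{-1} g, g> for f(x) = -sum(ln x_i): g_i = -1/x_i, H = diag(1/x_i^2)."""
    return sum((xi ** 2) * (-1.0 / xi) ** 2 for xi in x)

rng = random.Random(1)
for n in (1, 3, 10):
    x = [rng.uniform(0.01, 50.0) for _ in range(n)]
    print(n, barrier_parameter_value(x))
```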

(18)

An interior-point algorithm

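Algorithm 1

Let $\epsilon > 0$, $\mu_0 > 0$ and $x_0$ feasible such that $\|c + \mu_0 \nabla f(x_0)\|_{x_0} \le \frac{3 - \sqrt{5}}{2}$

Let $\theta := 1/(1.5 + 14.3\sqrt{\nu})$ and $k := 0$

While $2.58\,\mu_k \sqrt{\nu} > \epsilon$

1. $\mu_{k+1} := \mu_k (1 - \theta)$

2. $x_{k+1} := x_k - \nabla^2 f(x_k)^{-1} (\nabla f(x_k) + c/\mu_{k+1})$

3. Increment $k$

Complexity upper bound: $(1.03 + 14.3\sqrt{\nu}) \ln(2.58\,\mu_0 \sqrt{\nu}/\epsilon)$

Proof of constants: PhD Thesis of François Glineur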
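A rough sketch of this loop on an assumed toy problem without equality constraints: min ⟨c, x⟩ over x ≥ 0 with c > 0, barrier f(x) = -Σ ln(x_i), hence ν = n. The Newton direction is taken for ⟨c, x⟩ + µ f(x), which gives the step x - ∇²f(x)⁻¹(∇f(x) + c/µ); the proximity measure ||∇f(x) + c/µ||_x tracked below is an assumption about how the local norm is scaled. The function name `short_step_ipm` is hypothetical.

```python
import math

def short_step_ipm(c, mu0, eps):
    """Path following for min <c, x> s.t. x >= 0 with f(x) = -sum(ln x_i), nu = n."""
    nu = float(len(c))
    theta = 1.0 / (1.5 + 14.3 * math.sqrt(nu))
    mu = mu0
    x = [mu0 / ci for ci in c]          # exactly on the central path: c + mu0 * grad f(x) = 0
    k, worst = 0, 0.0
    while 2.58 * mu * math.sqrt(nu) > eps:
        mu *= 1.0 - theta
        # Proximity ||grad f(x) + c/mu||_x before the step (assumed scaling of the norm).
        prox = math.sqrt(sum((ci * xi / mu - 1.0) ** 2 for ci, xi in zip(c, x)))
        worst = max(worst, prox)
        # Newton step x - H^{-1}(grad f(x) + c/mu), with H^{-1} = diag(x_i^2).
        x = [xi - xi ** 2 * (-1.0 / xi + ci / mu) for ci, xi in zip(c, x)]
        k += 1
    return x, mu, k, worst

c, mu0, eps = [1.0, 2.0, 0.5], 1.0, 1e-6
x, mu, k, worst = short_step_ipm(c, mu0, eps)
bound = (1.03 + 14.3 * math.sqrt(len(c))) * math.log(2.58 * mu0 * math.sqrt(len(c)) / eps)
print(k, bound, worst, sum(ci * xi for ci, xi in zip(c, x)))
```

The printout compares the iteration count with the stated upper bound and reports the largest proximity value seen, which should stay well inside the quadratic convergence zone with such a cautious θ.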

(19)

How do you construct

self-concordant barriers?

1- Basic barriers:

Domain                 Barrier                          Complexity parameter
$\mathbb{R}_+$         $-\ln(t)$                        1
$S^n_+$                $-\ln(\det(X))$                  $n$
epi $\|x\|_2$          $-\ln(t^2 - \|x\|_2^2)$          2
epi $\exp(x)$          $-\ln(t) - \ln(\ln(t) - x)$      2

2- Combining barriers:

• Let $f_1$ be a barrier for $K_1$ with param. $\nu_1$, and $f_2$ be a barrier for $K_2$ with param. $\nu_2$. Then $f_1 + f_2$ is a barrier for $K_1 \times K_2$ with param. $\nu_1 + \nu_2$

• Let $f$ be the barrier for $K$ with param. $\nu$. Then $f_*(s) := \sup_{x \in \mathbb{R}^n} \{ -\langle s, x \rangle - f(x) \}$
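For the simplest barrier the supremum in the last rule can be computed by hand: with f(x) = -ln(x) on x > 0, the function -s·x + ln(x) is maximized at x = 1/s, giving -1 - ln(s) for s > 0. A crude grid search, sketched below with assumed grid settings, agrees with that closed form.

```python
import math

def conjugate_of_neg_log(s, grid=20000, x_max=50.0):
    """Approximate sup_{x>0} (-s*x + ln x) for f(x) = -ln(x) on a uniform grid."""
    best = -math.inf
    for i in range(1, grid + 1):
        x = x_max * i / grid
        best = max(best, -s * x + math.log(x))
    return best

for s in (0.5, 1.0, 4.0):
    print(s, conjugate_of_neg_log(s), -1.0 - math.log(s))
```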

(20)

How do you construct

self-concordant barriers?

2- Combining barriers:

• Let $f$ be the barrier for $K$ with param. $\nu$. The restriction of $f$ to an affine space $S$ is a barrier for the set $S \cap K$ with param. $\nu$

(21)

What about primal-dual problems ?

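Everything is the same:

$$\begin{array}{ll} \min & \langle c, x \rangle \\ \text{s.t.} & Ax = b \\ & x \in K \end{array} \qquad\qquad \begin{array}{ll} \max & \langle y, b \rangle \\ \text{s.t.} & A^* y + s = c \\ & s \in K^* \end{array}$$

i.e. (What's the optimal value?)

$$\begin{array}{ll} \min & \langle c, x \rangle - \langle b, y \rangle \\ \text{s.t.} & Ax = b \\ & A^* y + s = c \\ & x \in K,\ s \in K^* \end{array} \qquad\qquad \begin{array}{ll} \min & \langle s, x \rangle + \mu (f(x) + f_*(s)) \\ \text{s.t.} & Ax = b \\ & A^* y + s = c \end{array}$$

where $f_*$ denotes the barrier used for $K^*$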
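The passage from the objective ⟨c, x⟩ - ⟨b, y⟩ to ⟨s, x⟩ uses only the two linear constraints. A short derivation, which also suggests the answer to the question on the slide under the assumption that strong duality holds (the optimal value is then 0):

```latex
\begin{aligned}
\langle c, x\rangle - \langle b, y\rangle
  &= \langle A^*y + s,\, x\rangle - \langle Ax,\, y\rangle \\
  &= \langle y,\, Ax\rangle + \langle s,\, x\rangle - \langle Ax,\, y\rangle \\
  &= \langle s,\, x\rangle \;\ge\; 0 \qquad (x \in K,\ s \in K^*).
\end{aligned}
```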

(22)

Strangely enough,

primal-dual IPM work very well

• All IPM optimization software packages (SeDuMi, MOSEK, ...) are primal-dual.

• Efficient IPMs can solve:

  Linear problems,

  Second Order problems (ice-cream cone), in particular Quadratic problems,

  Semidefinite problems, and (sometimes)

  Geometric problems, i.e. involving posynomials (see on Thursday)

(23)

And in practice?

Many speed-ups and tricks are used

• For computing the starting point (and dealing with infeasible starting points)

• For solving the Newton system (reduction of variables)

• For updating $\mu$: decrease $\mu$ much faster than in the theory, then do several steps targeting the central path

(24)

Some references

[1] - Y. Nesterov, Introductory lectures on convex optimization: a basic course, Kluwer, 2003

[2] - Y. Nesterov and A. Nemirovski, Interior Point Algorithms in Convex Programming, SIAM, 1993

[3] - J. Renegar, A Mathematical View of Interior-Point Methods in Convex Optimization, SIAM, 2001
