Efficiency of coordinate descent methods on huge-scale optimization problems


Yu. Nesterov

Center for Operations Research

and Econometrics

Voie du Roman Pays, 34

B-1348 Louvain-la-Neuve

Belgium

http://www.uclouvain.be/core


Yu. NESTEROV¹

January 2010

Abstract

In this paper we propose new methods for solving huge-scale optimization problems. For problems of this size, even the simplest full-dimensional vector operations are very expensive. Hence, we propose to apply an optimization technique based on random partial update of decision variables. For these methods, we prove the global estimates for the rate of convergence. Surprisingly enough, for certain classes of objective functions, our results are better than the standard worst-case bounds for deterministic algorithms. We present constrained and unconstrained versions of the method, and its accelerated variant. Our numerical test confirms a high efficiency of this technique on problems of very big size.

Keywords: Convex optimization, coordinate relaxation, worst-case efficiency estimates, fast gradient schemes, Google problem.

¹ Université catholique de Louvain, CORE and INMA, B-1348 Louvain-la-Neuve, Belgium.

E-mail: [email protected]. This author is also member of ECORE, the association between CORE and ECARES.

The research results presented in this paper have been supported by a grant "Action de recherche concertées ARC 04/09-315" from the "Direction de la recherche scientifique – Communauté française de Belgique".

This paper presents research results of the Belgian Program on Interuniversity Poles of Attraction initiated by the Belgian State, Prime Minister's Office, Science Policy Programming. The scientific responsibility is assumed by the author.


1 Introduction

Motivation. Coordinate descent methods were among the first optimization schemes suggested for solving smooth unconstrained minimization problems (see [1, 2] and references therein). The main advantage of these methods is the simplicity of each iteration, both in generating the search direction and in performing the update of variables. However, very soon it became clear that the coordinate descent methods can be criticized in several aspects.

1. Theoretical justification. The simplest variant of the coordinate descent method is based on a cyclic coordinate search. However, for this strategy it is difficult to prove convergence, and almost impossible to estimate the rate of convergence.¹⁾ Another possibility is to move along the direction corresponding to the component of gradient with maximal absolute value. For this strategy, justification of the convergence rate is trivial. For future reference, let us present this result right now. Consider the unconstrained minimization problem

$$\min_{x \in \mathbb{R}^n} f(x), \tag{1.1}$$

where the convex objective function $f$ has component-wise Lipschitz continuous gradient:

$$|\nabla_i f(x + h e_i) - \nabla_i f(x)| \leq M |h|, \quad x \in \mathbb{R}^n, \; h \in \mathbb{R}, \; i = 1, \dots, n, \tag{1.2}$$

where $e_i$ is the $i$th coordinate vector in $\mathbb{R}^n$. Consider the following method:

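Choose $x_0 \in \mathbb{R}^n$. For $k \geq 0$ iterate:

1. Choose $i_k = \arg\max_{1 \leq i \leq n} |\nabla_i f(x_k)|$.

2. Update $x_{k+1} = x_k - \frac{1}{M} \nabla_{i_k} f(x_k) e_{i_k}$.

(1.3)

Then

$$f(x_k) - f(x_{k+1}) \overset{(1.2)}{\geq} \frac{1}{2M} |\nabla_{i_k} f(x_k)|^2 \geq \frac{1}{2nM} \|\nabla f(x_k)\|^2 \geq \frac{1}{2nMR^2} (f(x_k) - f^*)^2,$$

where $R \geq \|x_0 - x^*\|$, the norm is Euclidean, and $f^*$ is the optimal value of problem (1.1). Hence,

$$f(x_k) - f^* \leq \frac{2nMR^2}{k+4}, \quad k \geq 0. \tag{1.4}$$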

Note that at each test point, method (1.3) requires computation of the whole gradient vector. However, if this vector is available, it seems better to apply the usual full-gradient methods. It is also important that for convex functions with Lipschitz-continuous gradient,

$$\|\nabla f(x) - \nabla f(y)\| \leq L(f) \|x - y\|, \quad x, y \in \mathbb{R}^n, \tag{1.5}$$

it can happen that $M \geq O(L(f))$. Hence, in general, the rate (1.4) is worse than the rate of convergence of the simple Gradient Method (e.g. Section 2.1 in [6]).
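Method (1.3) can be sketched in a few lines of Python. This is only an illustration under assumed data: the separable quadratic test function, the names `greedy_coordinate_descent`, `d`, `c`, and the iteration count are hypothetical and not taken from the paper.

```python
def greedy_coordinate_descent(grad, x0, M, iters):
    """Method (1.3): step along the coordinate with the largest |partial derivative|."""
    x = list(x0)
    for _ in range(iters):
        g = grad(x)                                   # the whole gradient is needed at every step
        i = max(range(len(x)), key=lambda j: abs(g[j]))
        x[i] -= g[i] / M                              # step of size 1/M along e_i
    return x

# Assumed test problem: f(x) = 0.5 * sum_j d_j * (x_j - c_j)^2, so M = max(d) and f* = 0.
d = [1.0, 2.0, 4.0]
c = [1.0, -1.0, 0.5]
f = lambda x: 0.5 * sum(dj * (xj - cj) ** 2 for dj, xj, cj in zip(d, x, c))
grad = lambda x: [dj * (xj - cj) for dj, xj, cj in zip(d, x, c)]

x0 = [0.0, 0.0, 0.0]
M = max(d)
iters = 50
x_final = greedy_coordinate_descent(grad, x0, M, iters)
R2 = sum((xj - cj) ** 2 for xj, cj in zip(x0, c))     # squared distance to the minimizer
bound = 2 * len(x0) * M * R2 / (iters + 4)            # right-hand side of (1.4)
```

The sketch makes the drawback discussed above visible: every iteration calls `grad`, that is, it pays for a full gradient while using only one of its components.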

2. Computational complexity. From the theory of fast differentiation it is known that for all functions defined by an explicit sequence of standard operations, the complexity of computing the whole gradient is proportional to the computational complexity of the value of corresponding function. Moreover, the coefficient in this proportion is a small absolute constant. This observation suggests that for coordinate descent methods the line-search strategies based on the function values are too expensive. Provided that the general directional derivative of the function has the same complexity as the function value, it seems that no room is left for supporting the coordinate descent idea. The versions of this method still appear in the literature. However, they are justified only by local convergence analysis for rather particular problems (e.g. [3, 8]).

At the same time, in recent years we can observe an increasing interest in optimization problems of very big size (Internet applications, telecommunications). In such problems, even computation of a function value can require substantial computational effort. Moreover, some parts of the problem's data can be distributed in space and in time. The problem's data may be only partially available at the moment of evaluating the current test point. For problems of this type, we adopt the term huge-scale problems. These applications strongly push us back to the framework of coordinate minimization. Therefore, let us look again at the above criticism. It appears that there is a small chance for these methods to survive.

1. We can also think about the random coordinate search with pre-specified probabilities for each coordinate move. As we will see later, the complexity analysis of corresponding methods is quite straightforward. On the other hand, from the technological point of view, this strategy fits very well the problems with distributed or unstable data.

2. It appears that the computation of a coordinate directional derivative can be much simpler than computation of either a function value, or a directional derivative along an arbitrary direction.

In order to support this claim, let us look at the following optimization problem:

$$\min_{x \in \mathbb{R}^n} \left\{ f(x) \overset{\mathrm{def}}{=} \sum_{i=1}^n f_i(x^{(i)}) + \frac{1}{2} \|Ax - b\|^2 \right\}, \tag{1.6}$$

where $f_i$ are convex differentiable univariate functions, $A = (a_1, \dots, a_n) \in \mathbb{R}^{p \times n}$ is a $p \times n$-matrix, and $\|\cdot\|$ is the standard Euclidean norm in $\mathbb{R}^p$. Then

$$\nabla_i f(x) = f_i'(x^{(i)}) + \langle a_i, g(x) \rangle, \quad i = 1, \dots, n, \qquad g(x) = Ax - b.$$

If the residual vector $g(x)$ is already computed, then the computation of the $i$th directional derivative requires $O(p_i)$ operations, where $p_i$ is the number of nonzero elements in vector $a_i$. On the other hand, the coordinate move $x_+ = x + \alpha e_i$ results in the following change in the residual:

$$g(x_+) = g(x) + \alpha a_i.$$

Therefore, the $i$th coordinate step for problem (1.6) needs $O(p_i)$ operations. Note that computation of either the function value, or the whole gradient, or an arbitrary directional derivative requires $O\left( \sum_{i=1}^n p_i \right)$ operations. The reader can easily find many other examples of optimization problems with cheap coordinate directional derivatives.
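The $O(p_i)$ cost of a coordinate step for problem (1.6) can be illustrated as follows. The sketch assumes a column-wise sparse storage of $A$ as lists of (row, value) pairs and a hypothetical choice $f_i(t) = 0.05\,t^2$; the names `cols`, `fprime`, `coordinate_derivative` and `coordinate_move` are illustrative only.

```python
def coordinate_derivative(i, x, g, cols, fprime):
    """Partial derivative of (1.6): f_i'(x_i) + <a_i, g>, touching only the nonzeros of a_i."""
    return fprime(i, x[i]) + sum(v * g[r] for r, v in cols[i])

def coordinate_move(i, alpha, x, g, cols):
    """Apply x <- x + alpha * e_i and refresh the residual g = Ax - b in O(p_i) operations."""
    x[i] += alpha
    for r, v in cols[i]:
        g[r] += alpha * v

# Assumed data: A is 3x2, stored by columns as (row, value) pairs.
cols = [[(0, 1.0), (2, 2.0)], [(1, -1.0)]]
b = [1.0, 0.0, -1.0]
x = [0.0, 0.0]
g = [-bi for bi in b]                      # residual Ax - b at x = 0
fprime = lambda i, t: 0.1 * t              # derivative of the assumed f_i(t) = 0.05 * t^2

d0 = coordinate_derivative(0, x, g, cols, fprime)
coordinate_move(0, 0.5, x, g, cols)
```

Neither function ever forms $Ax$ from scratch; the residual is kept up to date incrementally, which is what makes the coordinate step cheap.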

The goal of this paper is to provide the random coordinate descent methods with the worst-case efficiency estimates. We show that for functions with cheap coordinate derivatives the new methods are always faster than the corresponding full-gradient schemes. A surprising result is that even for functions with expensive coordinate derivatives the new methods can be also faster since their rate of convergence depends on an upper estimate for the average diagonal element of the Hessian of the objective function. This value can be much smaller than the maximal eigenvalue of the Hessian, entering the worst-case complexity estimates of the black-box gradient schemes.

Contents. The paper is organized as follows. In Section 2 we analyze the expected rate of convergence of the simplest Random Coordinate Descent Method (RCDM). It is shown that it is reasonable to define probabilities for particular coordinate directions using the estimates of Lipschitz constants for partial derivatives. In Section 3 we show that for the class of strongly convex functions RCDM converges with a linear rate. By applying a regularization technique, this allows us to solve the unconstrained minimization problems with arbitrarily high confidence level. In Section 4 we analyze a modified version of RCDM as applied to constrained optimization problems. In Section 5 we show that the unconstrained version of RCDM can be accelerated up to the rate of convergence $O\left(\frac{1}{k^2}\right)$, where $k$ is the iteration counter. Finally, in Section 6 we discuss the implementation issues. In Section 6.1 we show that good estimates for coordinate Lipschitz constants can be efficiently computed. In Section 6.2 we present an efficient strategy for generating random coordinate directions. And in Section 6.3 we present preliminary computational results.

Notation. In this paper we work with real coordinate spaces $\mathbb{R}^n$ composed by column vectors. For $x, y \in \mathbb{R}^n$ denote

$$\langle x, y \rangle = \sum_{i=1}^n x^{(i)} y^{(i)}.$$

We use the same notation $\langle \cdot, \cdot \rangle$ for spaces of different dimension. Thus, its actual sense is defined by the space containing the arguments. If we fix a norm $\|\cdot\|$ in $\mathbb{R}^n$, then the dual norm is defined in the standard way:

$$\|s\|^* = \max_{\|x\| = 1} \langle s, x \rangle. \tag{1.7}$$

We denote by $s^\#$ an arbitrary vector from the set

$$\mathrm{Arg}\max_x \left[ \langle s, x \rangle - \frac{1}{2} \|x\|^2 \right]. \tag{1.8}$$

Clearly, $\|s^\#\| = \|s\|^*$.

For function $f(x)$, $x \in \mathbb{R}^n$, we denote by $\nabla f(x)$ its gradient, which is a vector from $\mathbb{R}^n$ composed by partial derivatives. By $\nabla^2 f(x)$ we denote the Hessian of $f$ at $x$. In the


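Lemma 1 Let us fix a decomposition of $\mathbb{R}^n$ on $k$ subspaces. For positive semidefinite symmetric matrix $A \in \mathbb{R}^{n \times n}$, denote by $A_{i,i}$ the corresponding diagonal blocks. If

$$A_{i,i} \preceq L_i B_i, \quad i = 1, \dots, k,$$

where all $B_i \succeq 0$, then

$$A \preceq \left( \sum_{i=1}^k L_i \right) \cdot \mathrm{diag}\{B_i\}_{i=1}^k,$$

where $\mathrm{diag}\{B_i\}_{i=1}^k$ is a block-diagonal $(n \times n)$-matrix with diagonal blocks $B_i$. In particular,

$$A \preceq n \cdot \mathrm{diag}\{A_{i,i}\}_{i=1}^n.$$

Proof:

Denote by $x^{(i)}$ the part of vector $x \in \mathbb{R}^n$ belonging to the $i$th subspace. Since $A$ is positive semidefinite, we have

$$\langle Ax, x \rangle = \sum_{i=1}^k \sum_{j=1}^k \langle A_{i,j} x^{(i)}, x^{(j)} \rangle \leq \left( \sum_{i=1}^k \langle A_{i,i} x^{(i)}, x^{(i)} \rangle^{1/2} \right)^2 \leq \left( \sum_{i=1}^k L_i^{1/2} \langle B_i x^{(i)}, x^{(i)} \rangle^{1/2} \right)^2 \leq \sum_{i=1}^k L_i \cdot \sum_{i=1}^k \langle B_i x^{(i)}, x^{(i)} \rangle. \qquad \Box$$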
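The last statement of Lemma 1 is easy to probe numerically. The sketch below assumes the coordinate decomposition (scalar blocks) and a randomly generated positive semidefinite matrix; the seed, sizes and names are arbitrary choices made for the illustration.

```python
import random

def quad_form(A, x):
    """Compute <Ax, x> for a dense matrix stored as a list of rows."""
    m = len(x)
    return sum(A[i][j] * x[i] * x[j] for i in range(m) for j in range(m))

random.seed(0)
n = 5
G = [[random.gauss(0, 1) for _ in range(n)] for _ in range(n)]
# A = G^T G is symmetric positive semidefinite by construction.
A = [[sum(G[k][i] * G[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

gaps = []
for _ in range(1000):
    x = [random.gauss(0, 1) for _ in range(n)]
    diag_part = n * sum(A[i][i] * x[i] ** 2 for i in range(n))   # <n * diag{A_ii} x, x>
    gaps.append(diag_part - quad_form(A, x))
min_gap = min(gaps)   # Lemma 1 predicts that this is nonnegative
```

Random sampling of test vectors is of course not a proof; it only checks that the inequality $\langle Ax, x \rangle \leq n \langle \mathrm{diag}\{A_{i,i}\} x, x \rangle$ is not violated on the sampled points.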

2 Coordinate relaxation for unconstrained minimization

Consider the following unconstrained minimization problem

$$\min_{x \in \mathbb{R}^N} f(x), \tag{2.1}$$

where the objective function $f$ is convex and differentiable on $\mathbb{R}^N$. We assume that the optimal set $X^*$ of this problem is nonempty and bounded.

Let us fix a decomposition of $\mathbb{R}^N$ on $n$ subspaces:

$$\mathbb{R}^N = \bigotimes_{i=1}^n \mathbb{R}^{n_i}, \quad N = \sum_{i=1}^n n_i.$$

Then we can define the corresponding partition of the unit matrix

$$I_N = (U_1, \dots, U_n) \in \mathbb{R}^{N \times N}, \quad U_i \in \mathbb{R}^{N \times n_i}, \quad i = 1, \dots, n.$$

Thus, any $x = (x^{(1)}, \dots, x^{(n)})^T \in \mathbb{R}^N$ can be represented as

$$x = \sum_{i=1}^n U_i x^{(i)}.$$

Then the partial gradient of $f(x)$ in $x^{(i)}$ is defined as

$$f_i'(x) = U_i^T \nabla f(x) \in \mathbb{R}^{n_i}, \quad x \in \mathbb{R}^N.$$

For spaces $\mathbb{R}^{n_i}$, let us fix some norms $\|\cdot\|_{(i)}$, $i = 1, \dots, n$. We assume that the gradient of function $f$ is coordinate-wise Lipschitz continuous with constants $L_i = L_i(f)$:

$$\|f_i'(x + U_i h_i) - f_i'(x)\|_{(i)}^* \leq L_i \|h_i\|_{(i)}, \quad h_i \in \mathbb{R}^{n_i}, \; i = 1, \dots, n, \; x \in \mathbb{R}^N. \tag{2.2}$$

For the sake of simplicity, let us assume that these constants are known. By the standard reasoning (e.g. Section 2.1 in [6]), we can prove that

$$f(x + U_i h_i) \leq f(x) + \langle f_i'(x), h_i \rangle + \frac{L_i}{2} \|h_i\|_{(i)}^2, \quad x \in \mathbb{R}^N, \; h_i \in \mathbb{R}^{n_i}, \; i = 1, \dots, n. \tag{2.3}$$

Let us define the optimal coordinate steps:

$$T_i(x) \overset{\mathrm{def}}{=} x - \frac{1}{L_i} U_i f_i'(x)^\#, \quad i = 1, \dots, n.$$

Then, in view of the bound (2.3), we get

$$f(x) - f(T_i(x)) \geq \frac{1}{2L_i} \left( \|f_i'(x)\|_{(i)}^* \right)^2, \quad i = 1, \dots, n. \tag{2.4}$$

In our random algorithm, we need a special random counter $\mathcal{R}_\alpha$, $\alpha \in \mathbb{R}$, which generates an integer number $i \in \{1, \dots, n\}$ with probability

$$p_\alpha^{(i)} = L_i^\alpha \cdot \left[ \sum_{j=1}^n L_j^\alpha \right]^{-1}, \quad i = 1, \dots, n. \tag{2.5}$$

Thus, the operation $i = \mathcal{R}_\alpha$ means that an integer random value, chosen from the set $\{1, \dots, n\}$ in accordance with probabilities (2.5), is assigned to variable $i$. Note that $\mathcal{R}_0$ generates a uniform distribution.

Now we can present the scheme of Random Coordinate Descent Method. It needs a starting point $x_0 \in \mathbb{R}^N$ and value $\alpha \in \mathbb{R}$ as input parameters.

Method $RCDM(\alpha, x_0)$

For $k \geq 0$ iterate:

1) Choose $i_k = \mathcal{R}_\alpha$.

2) Update $x_{k+1} = T_{i_k}(x_k)$.

(2.6)
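A minimal Python sketch of scheme (2.6) is given below for the simplest setting: scalar blocks ($n_i = 1$) with absolute-value norms, so that $T_i(x) = x - \frac{1}{L_i} \nabla_i f(x) e_i$. The quadratic test function, the seed and all names (`rcdm`, `partial_grad`, `A`, `b`) are assumptions made for the illustration. The counter $\mathcal{R}_\alpha$ is emulated here by `random.Random.choices`, which is not the $O(\ln n)$ strategy of Section 6.2.

```python
import random

def rcdm(partial_grad, L, x0, alpha, iters, rng):
    """RCDM(alpha, x0) for scalar blocks: x <- x - (1/L_i) * grad_i f(x) * e_i."""
    x = list(x0)
    weights = [Li ** alpha for Li in L]        # probabilities (2.5), up to normalization
    indices = range(len(x))
    for _ in range(iters):
        i = rng.choices(indices, weights=weights)[0]
        x[i] -= partial_grad(i, x) / L[i]
    return x

# Assumed test problem: f(x) = 0.5 * <Ax, x> - <b, x>, for which L_i = A[i][i].
A = [[2.0, 1.0], [1.0, 3.0]]
b = [1.0, 1.0]
L = [A[0][0], A[1][1]]

def f(x):
    quad = sum(A[i][j] * x[i] * x[j] for i in range(2) for j in range(2))
    return 0.5 * quad - sum(bi * xi for bi, xi in zip(b, x))

def partial_grad(i, x):
    return sum(A[i][j] * x[j] for j in range(2)) - b[i]

x0 = [0.0, 0.0]
x_final = rcdm(partial_grad, L, x0, alpha=1.0, iters=200, rng=random.Random(0))
```

For this toy problem the minimizer is $(0.4, 0.2)$. By (2.4), each step cannot increase the objective, whatever coordinate is drawn.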

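For estimating the rate of convergence of RCDM, we introduce the following norms:

$$\|x\|_\alpha = \left[ \sum_{i=1}^n L_i^\alpha \|x^{(i)}\|_{(i)}^2 \right]^{1/2}, \; x \in \mathbb{R}^N, \qquad \|g\|_\alpha^* = \left[ \sum_{i=1}^n L_i^{-\alpha} \left( \|g^{(i)}\|_{(i)}^* \right)^2 \right]^{1/2}, \; g \in \mathbb{R}^N. \tag{2.7}$$

Clearly, these norms satisfy the Cauchy-Schwarz inequality:

$$\|g\|_\alpha^* \cdot \|x\|_\alpha \geq \langle g, x \rangle, \quad x, g \in \mathbb{R}^N. \tag{2.8}$$

In the sequel, we use notation $S_\alpha = S_\alpha(f) = \sum_{i=1}^n L_i^\alpha(f)$ with $\alpha \geq 0$. Note that $S_0(f) \equiv n$.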

Let us link our main assumption (2.2) with a full-dimensional Lipschitz condition for the gradient of the objective function.

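Lemma 2 Let $f$ satisfy condition (2.2). Then for any $\alpha \in \mathbb{R}$ we have

$$\|\nabla f(x) - \nabla f(y)\|_{1-\alpha}^* \leq S_\alpha \|x - y\|_{1-\alpha}, \quad x, y \in \mathbb{R}^N. \tag{2.9}$$

Therefore,

$$f(x) \leq f(y) + \langle \nabla f(y), x - y \rangle + \frac{1}{2} S_\alpha \|x - y\|_{1-\alpha}^2, \quad x, y \in \mathbb{R}^N. \tag{2.10}$$

Proof:

Indeed,

$$f(x) - f^* \overset{(2.4)}{\geq} \max_{1 \leq i \leq n} \frac{1}{2L_i} \left( \|f_i'(x)\|_{(i)}^* \right)^2 \geq \frac{1}{2S_\alpha} \sum_{i=1}^n \frac{L_i^\alpha}{L_i} \left( \|f_i'(x)\|_{(i)}^* \right)^2 = \frac{1}{2S_\alpha} \left( \|\nabla f(x)\|_{1-\alpha}^* \right)^2.$$

Applying this inequality to function $\phi(x) = f(x) - f(y) - \langle \nabla f(y), x - y \rangle$, we obtain

$$f(x) \geq f(y) + \langle \nabla f(y), x - y \rangle + \frac{1}{2S_\alpha} \left( \|\nabla f(x) - \nabla f(y)\|_{1-\alpha}^* \right)^2, \quad x, y \in \mathbb{R}^N. \tag{2.11}$$

Adding two variants of (2.11) with $x$ and $y$ interchanged, we get

$$\frac{1}{S_\alpha} \left( \|\nabla f(x) - \nabla f(y)\|_{1-\alpha}^* \right)^2 \leq \langle \nabla f(x) - \nabla f(y), x - y \rangle \overset{(2.8)}{\leq} \|\nabla f(x) - \nabla f(y)\|_{1-\alpha}^* \cdot \|x - y\|_{1-\alpha}.$$

Inequality (2.10) can be derived from (2.9) by simple integration. $\Box$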

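After $k$ iterations, $RCDM(\alpha, x_0)$ generates a random output $(x_k, f(x_k))$, which depends on the observed implementation of random variable

$$\xi_k = \{i_0, \dots, i_k\}.$$

Let us show that the expected value

$$\phi_k \overset{\mathrm{def}}{=} E_{\xi_{k-1}} f(x_k)$$

converges to the optimal value $f^*$ of problem (2.1).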

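Theorem 1 For any $k \geq 0$ we have

$$\phi_k - f^* \leq \frac{2}{k+4} \cdot \left[ \sum_{j=1}^n L_j^\alpha \right] \cdot R_{1-\alpha}^2(x_0), \tag{2.12}$$

where $R_\beta(x_0) = \max_x \left\{ \max_{x^* \in X^*} \|x - x^*\|_\beta : \; f(x) \leq f(x_0) \right\}$.

Proof:

Let RCDM generate implementation $x_k$ of corresponding random variable. Then

$$f(x_k) - E_{i_k}(f(x_{k+1})) = \sum_{i=1}^n p_\alpha^{(i)} \cdot [f(x_k) - f(T_i(x_k))] \overset{(2.4)}{\geq} \sum_{i=1}^n \frac{p_\alpha^{(i)}}{2L_i} \left( \|f_i'(x_k)\|_{(i)}^* \right)^2 \overset{(2.5)}{=} \frac{1}{2S_\alpha} \left( \|\nabla f(x_k)\|_{1-\alpha}^* \right)^2. \tag{2.13}$$

Note that $f(x_k) \leq f(x_0)$. Therefore,

$$f(x_k) - f^* \leq \min_{x^* \in X^*} \langle \nabla f(x_k), x_k - x^* \rangle \overset{(2.8)}{\leq} \|\nabla f(x_k)\|_{1-\alpha}^* R_{1-\alpha}(x_0).$$

Hence,

$$f(x_k) - E_{i_k}(f(x_{k+1})) \geq \frac{1}{C} (f(x_k) - f^*)^2,$$

where $C \overset{\mathrm{def}}{=} 2 S_\alpha R_{1-\alpha}^2(x_0)$. Taking the expectation of both sides of this inequality in $\xi_{k-1}$, we obtain

$$\phi_k - \phi_{k+1} \geq \frac{1}{C} E_{\xi_{k-1}} [(f(x_k) - f^*)^2] \geq \frac{1}{C} (\phi_k - f^*)^2.$$

Thus,

$$\frac{1}{\phi_{k+1} - f^*} - \frac{1}{\phi_k - f^*} = \frac{\phi_k - \phi_{k+1}}{(\phi_{k+1} - f^*)(\phi_k - f^*)} \geq \frac{\phi_k - \phi_{k+1}}{(\phi_k - f^*)^2} \geq \frac{1}{C},$$

and we conclude that $\frac{1}{\phi_k - f^*} \geq \frac{1}{\phi_0 - f^*} + \frac{k}{C} \overset{(2.10)}{\geq} \frac{k+4}{C}$ for any $k \geq 0$. $\Box$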

Let us look at the most important variants of the estimate (2.12).

• $\alpha = 0$. Then $S_0 = n$, and we get

$$\phi_k - f^* \leq \frac{2n}{k+4} \cdot R_1^2(x_0). \tag{2.14}$$

Note that problem (2.1) can be solved by the standard full-gradient method endowed with the metric $\|\cdot\|_1$. Then its rate of convergence can be estimated as

$$f(x_k) - f^* \leq \frac{\gamma}{k} R_1^2(x_0),$$

where the constant $\gamma$ is big enough to ensure

$$f''(x) \preceq \gamma \cdot \mathrm{diag}\{L_i \cdot I_{n_i}\}_{i=1}^n, \quad x \in \mathbb{R}^N,$$

and $I_k$ is a unit matrix in $\mathbb{R}^k$. (Assume for a moment that $f$ is twice differentiable.)

However, since the constants $L_i$ are the upper bounds for the block-diagonal elements of the Hessian, in the worst case we have $\gamma = n$. Hence, the worst-case rate of convergence of this variant of the gradient method is proportional to that of RCDM.

• $\alpha = \frac{1}{2}$. Consider the case $n_i = 1$, $i = 1, \dots, n$. Denote by $D_\infty(x_0)$ the size of the initial level set measured in the infinity-norm:

$$D_\infty(x_0) = \max_x \left\{ \max_{y \in X^*} \max_{1 \leq i \leq n} |x^{(i)} - y^{(i)}| : \; f(x) \leq f(x_0) \right\}.$$

Then $R_{1/2}^2(x_0) \leq S_{1/2} D_\infty^2(x_0)$, and we obtain

$$\phi_k - f^* \leq \frac{2}{k+4} \cdot \left[ \sum_{i=1}^n L_i^{1/2} \right]^2 \cdot D_\infty^2(x_0). \tag{2.15}$$

Note that for the first order methods, the worst-case dimension-independent complexity of minimizing the convex functions over an $n$-dimensional box is infinite [4]. Since for some problems the value $S_{1/2}$ can be bounded even for very big (or even infinite) dimension, the estimate (2.15) shows that RCDM can work in situations where the usual gradient methods have no theoretical justification.

• $\alpha = 1$. Consider the case when all norms $\|\cdot\|_{(i)}$ are the standard Euclidean norms of $\mathbb{R}^{n_i}$, $i = 1, \dots, n$. Then $R_0(x_0)$ is the size of the initial level set in the standard Euclidean norm of $\mathbb{R}^N$, and the rate of convergence of $RCDM(1, x_0)$ is as follows:

$$\phi_k - f^* \leq \frac{2}{k+4} \cdot \left[ \sum_{i=1}^n L_i \right] \cdot R_0^2(x_0) \equiv \frac{2n}{k+4} \cdot \left[ \frac{1}{n} \sum_{i=1}^n L_i \right] \cdot R_0^2(x_0). \tag{2.16}$$

At the same time, the rate of convergence of the standard gradient method can be estimated as

$$f(x_k) - f^* \leq \frac{\gamma}{k} R_0^2(x_0),$$

where $\gamma$ satisfies condition

$$f''(x) \preceq \gamma \cdot I_N, \quad x \in \mathbb{R}^N.$$

Note that the maximal eigenvalue of a symmetric matrix can reach its trace. Hence, in the worst case, the rate of convergence of the gradient method is the same as the rate of RCDM. However, the latter method has a much better chance to accelerate.
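The gap between the trace and the maximal eigenvalue can be seen on two assumed quadratic objectives $f(x) = \frac{1}{2} \langle Ax, x \rangle$ with $n_i = 1$, for which $L_i = A_{i,i}$, $S_1 = \mathrm{trace}(A)$ and the smallest admissible $\gamma$ is $\lambda_{\max}(A)$. These two matrices are illustrative choices, with $e$ denoting the vector of all ones:

$$
\begin{array}{lll}
A = e e^T: & S_1 = n, \quad \lambda_{\max}(A) = n, & \text{so } \gamma = S_1; \\[4pt]
A = I_n: & S_1 = n, \quad \lambda_{\max}(A) = 1, & \text{so } \gamma = \frac{1}{n} S_1.
\end{array}
$$

In the first case the constant of the gradient method reaches the sum $S_1$ entering (2.16). In the second case it equals only the average $\frac{1}{n} S_1$, so that a group of $n$ coordinate iterations has the same guarantee, up to an absolute constant, as one full gradient step.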

3 Minimizing strongly convex functions

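Let us estimate now the performance of RCDM on strongly convex functions. Recall that $f$ is called strongly convex on $\mathbb{R}^N$ with convexity parameter $\sigma = \sigma(f) > 0$ if for any $x$ and $y$ from $\mathbb{R}^N$ we have

$$f(y) \geq f(x) + \langle \nabla f(x), y - x \rangle + \frac{1}{2} \sigma(f) \|y - x\|^2. \tag{3.1}$$

Minimizing both sides of this inequality in $y$, we obtain a useful bound

$$f(x) - f^* \leq \frac{1}{2\sigma(f)} \left( \|\nabla f(x)\|^* \right)^2. \tag{3.2}$$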

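Theorem 2 Let function $f(x)$ be strongly convex with respect to the norm $\|\cdot\|_{1-\alpha}$ with convexity parameter $\sigma_{1-\alpha} = \sigma_{1-\alpha}(f) > 0$. Then, for the sequence $\{x_k\}$ generated by $RCDM(\alpha, x_0)$ we have

$$\phi_k - f^* \leq \left( 1 - \frac{\sigma_{1-\alpha}(f)}{S_\alpha(f)} \right)^k (f(x_0) - f^*). \tag{3.3}$$

Proof:

In view of inequality (2.13), we have

$$f(x_k) - E_{i_k}(f(x_{k+1})) \geq \frac{1}{2S_\alpha} \left( \|\nabla f(x_k)\|_{1-\alpha}^* \right)^2 \overset{(3.2)}{\geq} \frac{\sigma_{1-\alpha}}{S_\alpha} (f(x_k) - f^*).$$

It remains to compute the expectation in $\xi_{k-1}$. $\Box$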
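Bound (3.3) translates directly into an iteration count for a target expected accuracy. The helper below is a small sketch with hypothetical names; it only inverts the right-hand side of (3.3) and says nothing about any particular run of the method.

```python
import math

def iterations_for_accuracy(S_alpha, sigma, gap0, eps):
    """Smallest k with (1 - sigma/S_alpha)^k * gap0 <= eps, following bound (3.3)."""
    if gap0 <= eps:
        return 0
    rate = 1.0 - sigma / S_alpha
    if rate <= 0.0:
        return 1                       # the bound already vanishes after one step
    return math.ceil(math.log(eps / gap0) / math.log(rate))

# Assumed numbers: S_alpha = 100, sigma = 1, initial gap 1, target accuracy 1e-6.
k_needed = iterations_for_accuracy(100.0, 1.0, 1.0, 1e-6)
```

Since $\ln \frac{1}{1 - t} \geq t$, the count grows at most like $\frac{S_\alpha}{\sigma_{1-\alpha}} \ln \frac{f(x_0) - f^*}{\epsilon}$.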

At this moment, we are able to prove only that the expected quality of the output of RCDM is good. However, in practice we are not going to run this method many times on the same problem. What is the probability that our single run can give us also a good result? In order to answer this question, we need to apply RCDM to a regularized objective of the initial problem. For the sake of simplicity, let us endow the spaces $\mathbb{R}^{n_i}$ with some Euclidean norms:

$$\|h^{(i)}\|_{(i)}^2 = \langle B_i h^{(i)}, h^{(i)} \rangle, \quad h^{(i)} \in \mathbb{R}^{n_i}, \tag{3.4}$$

where $B_i \succ 0$, $i = 1, \dots, n$. We present the complexity results for two ways of measuring distances in $\mathbb{R}^N$.

1. Let us use the norm $\|\cdot\|_1$:

$$\|h\|_1^2 = \sum_{i=1}^n L_i \langle B_i h^{(i)}, h^{(i)} \rangle \overset{\mathrm{def}}{=} \langle B_L h, h \rangle, \quad h \in \mathbb{R}^N. \tag{3.5}$$

Since this norm is Euclidean, the regularized objective function

$$f_\mu(x) = f(x) + \frac{\mu}{2} \|x - x_0\|_1^2$$

is strongly convex with respect to $\|\cdot\|_1$ with convexity parameter $\mu$. Moreover, $S_0(f_\mu) = n$ for any value of $\mu > 0$. Hence, by Theorem 2, method $RCDM(0, x_0)$ can quickly approach the value $f_\mu^* = \min_{x \in \mathbb{R}^N} f_\mu(x)$. The following result will be useful.

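Lemma 3 Let random point $x_k$ be generated by $RCDM(0, x_0)$ as applied to function $f_\mu$. Then

$$E_{\xi_k}\left( \|\nabla f_\mu(x_k)\|_1^* \right) \leq \left[ 2(n + \mu) \cdot (f(x_0) - f^*) \cdot \left( 1 - \frac{\mu}{n} \right)^k \right]^{1/2}. \tag{3.6}$$

Proof:

In view of Lemma 2, for any $h = (h^{(1)}, \dots, h^{(n)}) \in \mathbb{R}^N$ we have

$$f_\mu(x + h) \leq f(x) + \langle \nabla f(x), h \rangle + \frac{n}{2} \|h\|_1^2 + \frac{\mu}{2} \|x + h - x_0\|_1^2 \overset{(3.5)}{=} f_\mu(x) + \langle \nabla f_\mu(x), h \rangle + \frac{n + \mu}{2} \|h\|_1^2.$$

Thus, function $f_\mu$ has Lipschitz-continuous gradient (e.g. (2.1.6) in Theorem 2.1.5, [6]) with constant $L(f_\mu) \overset{\mathrm{def}}{=} n + \mu$. Therefore (see, for example, (2.1.7) in Theorem 2.1.5, [6]),

$$\frac{1}{2L(f_\mu)} \left( \|\nabla f_\mu(x)\|_1^* \right)^2 \leq f_\mu(x) - f_\mu^*, \quad x \in \mathbb{R}^N.$$

Hence,

$$E_{\xi_k}\left( \|\nabla f_\mu(x_k)\|_1^* \right) \leq E_{\xi_k}\left( \left[ 2(n + \mu) \cdot (f_\mu(x_k) - f_\mu^*) \right]^{1/2} \right) \leq \left[ 2(n + \mu) \cdot (E_{\xi_k}(f_\mu(x_k)) - f_\mu^*) \right]^{1/2} \overset{(3.3)}{\leq} \left[ 2(n + \mu) \cdot \left( 1 - \frac{\mu}{n} \right)^k (f(x_0) - f_\mu^*) \right]^{1/2}.$$

It remains to note that $f_\mu^* \geq f^*$. $\Box$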

Now we can estimate the quality of the random point $x_k$ generated by $RCDM(0, x_0)$, taken as an approximate solution to problem (2.1). Let us fix the desired accuracy of the solution $\epsilon > 0$ and the confidence level $\beta \in (0, 1)$.

Theorem 3 Let us define $\mu = \frac{\epsilon}{4R_1^2(x_0)}$, and choose

$$k \geq 1 + \frac{8nR_1^2(x_0)}{\epsilon} \ln \frac{2nR_1^2(x_0)}{\epsilon(1 - \beta)}. \tag{3.7}$$

If the random point $x_k$ is generated by $RCDM(0, x_0)$ as applied to function $f_\mu$, then

$$\mathrm{Prob}(f(x_k) - f^* \leq \epsilon) \geq \beta.$$

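Proof:

Note that $f(x_k) \leq f_\mu(x_k) \leq f_\mu(x_0) = f(x_0)$. Therefore, there exists $x^* \in X^*$ such that $\|x_k - x^*\|_1 \leq R_1(x_0) \overset{\mathrm{def}}{=} R$. Hence, $\|x_k - x_0\|_1 \leq \|x_k - x^*\|_1 + \|x^* - x_0\|_1 \leq 2R$. Since

$$\nabla f_\mu(x) = \nabla f(x) + \mu \cdot B_L (x - x_0), \tag{3.8}$$

we conclude that

$$\mathrm{Prob}(f(x_k) - f^* \geq \epsilon) \leq \mathrm{Prob}\left( \|\nabla f(x_k)\|_1^* \cdot R \geq \epsilon \right) \overset{(3.8)}{\leq} \mathrm{Prob}\left( \|\nabla f_\mu(x_k)\|_1^* \cdot R + 2\mu R^2 \geq \epsilon \right) = \mathrm{Prob}\left( \|\nabla f_\mu(x_k)\|_1^* \geq \frac{\epsilon}{2R} \right).$$

Now, using Chebyshev inequality, we obtain

$$\mathrm{Prob}\left( \|\nabla f_\mu(x_k)\|_1^* \geq \frac{\epsilon}{2R} \right) \leq \frac{2R}{\epsilon} \cdot E_{\xi_k}\left( \|\nabla f_\mu(x_k)\|_1^* \right) \overset{(3.6)}{\leq} \frac{2R}{\epsilon} \cdot \left[ 2(n + \mu) \cdot (f(x_0) - f^*) \cdot \left( 1 - \frac{\mu}{n} \right)^k \right]^{1/2}.$$

Since the gradient of function $f$ is Lipschitz continuous with constant $n$, we get

$$f(x_0) - f^* \leq \frac{n}{2} R^2.$$

Taking into account that $(n + \mu)\left(1 - \frac{\mu}{n}\right)^k \leq n \left(1 - \frac{\mu}{n}\right)^{k-1}$, we obtain the following bound:

$$\mathrm{Prob}\left( \|\nabla f_\mu(x_k)\|_1^* \geq \frac{\epsilon}{2R} \right) \leq \frac{2nR^2}{\epsilon} \cdot \left( 1 - \frac{\mu}{n} \right)^{\frac{k-1}{2}} \leq \frac{2nR^2}{\epsilon} \cdot e^{-\frac{\mu(k-1)}{2n}} \leq 1 - \beta.$$

This proves the statement of the theorem. $\Box$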
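The parameter choice of Theorem 3 is simple to evaluate. The sketch below uses hypothetical names and assumed input values; it computes $\mu$ and the smallest integer $k$ satisfying (3.7), and then evaluates the last probability bound of the proof to confirm that it does not exceed $1 - \beta$.

```python
import math

def theorem3_parameters(n, R1, eps, beta):
    """Regularization parameter mu and iteration count k from Theorem 3, bound (3.7)."""
    mu = eps / (4.0 * R1 ** 2)
    k = 1.0 + (8.0 * n * R1 ** 2 / eps) * math.log(2.0 * n * R1 ** 2 / (eps * (1.0 - beta)))
    return mu, math.ceil(k)

# Assumed inputs: n = 1000 blocks, R_1(x_0) = 1, accuracy 1e-2, confidence level 0.99.
n, R1, eps, beta = 1000, 1.0, 1e-2, 0.99
mu, k = theorem3_parameters(n, R1, eps, beta)
# Final estimate from the proof: (2 n R^2 / eps) * exp(-mu (k - 1) / (2 n)).
failure_bound = (2.0 * n * R1 ** 2 / eps) * math.exp(-mu * (k - 1) / (2.0 * n))
```

The confidence level enters only through $\ln \frac{1}{1 - \beta}$, so tightening $\beta$ from 0.99 to 0.9999 adds a modest number of iterations relative to the leading factor $\frac{8nR_1^2(x_0)}{\epsilon}$.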

For the current choice of norm (3.5), we can guarantee only $L(f) \leq n$. Therefore, the standard gradient method as applied to the problem (2.1) has the worst-case complexity bound of $O\left( \frac{nR_1^2(x_0)}{\epsilon} \right)$ full-dimensional gradient iterations. Up to a logarithmic factor, this bound coincides with (3.7).

2. Consider now the following norm:

$$\|h\|_0^2 = \sum_{i=1}^n \langle B_i h^{(i)}, h^{(i)} \rangle \overset{\mathrm{def}}{=} \langle Bh, h \rangle, \quad h \in \mathbb{R}^N. \tag{3.9}$$

Again, the regularized objective function

$$f_\mu(x) = f(x) + \frac{\mu}{2} \|x - x_0\|_0^2$$

is strongly convex with respect to $\|\cdot\|_0$ with convexity parameter $\mu$. However, now

$$S_1(f_\mu) = \sum_{i=1}^n [L_i(f) + \mu] = S_1(f) + n\mu.$$

Since $L(f_\mu) = S_1(f) + \mu$, using the same arguments as in the proof of Lemma 3, we get the following bound.

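Lemma 4 Let random point $x_k$ be generated by $RCDM(1, x_0)$ as applied to function $f_\mu$. Then

$$E_{\xi_k}\left( \|\nabla f_\mu(x_k)\|_0^* \right) \leq \left[ 2(S_1(f) + \mu) \cdot (f(x_0) - f^*) \cdot \left( 1 - \frac{\mu}{S_1(f) + n\mu} \right)^k \right]^{1/2}. \tag{3.10}$$

Using this lemma, we can prove the following theorem.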

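Theorem 4 Let us define $\mu = \frac{\epsilon}{4R_0^2(x_0)}$, and choose

$$k \geq 2 \left[ n + \frac{4 S_1(f) R_0^2(x_0)}{\epsilon} \right] \cdot \left( \ln \frac{1}{1 - \beta} + \ln \left( \frac{1}{2} + \frac{2 S_1(f) R_0^2(x_0)}{\epsilon} \right) \right). \tag{3.11}$$

If the random point $x_k$ is generated by $RCDM(1, x_0)$ as applied to function $f_\mu$, then

$$\mathrm{Prob}(f(x_k) - f^* \leq \epsilon) \geq \beta.$$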

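Proof:

As in the proof of Theorem 3, we can prove that $\|x_k - x_0\|_0 \leq 2R \equiv 2R_0(x_0)$. Since

$$\nabla f_\mu(x) = \nabla f(x) + \mu \cdot B (x - x_0), \tag{3.12}$$

we conclude that

$$\mathrm{Prob}(f(x_k) - f^* \geq \epsilon) \leq \mathrm{Prob}\left( \|\nabla f(x_k)\|_0^* \cdot R \geq \epsilon \right) \overset{(3.12)}{\leq} \mathrm{Prob}\left( \|\nabla f_\mu(x_k)\|_0^* \cdot R + 2\mu R^2 \geq \epsilon \right) = \mathrm{Prob}\left( \|\nabla f_\mu(x_k)\|_0^* \geq \frac{\epsilon}{2R} \right).$$

Now, using Chebyshev inequality, we obtain

$$\mathrm{Prob}\left( \|\nabla f_\mu(x_k)\|_0^* \geq \frac{\epsilon}{2R} \right) \leq \frac{2R}{\epsilon} \cdot E_{\xi_k}\left( \|\nabla f_\mu(x_k)\|_0^* \right) \overset{(3.10)}{\leq} \frac{2R}{\epsilon} \cdot \left[ 2(S_1(f) + \mu) \cdot (f(x_0) - f^*) \cdot \left( 1 - \frac{\mu}{S_1(f) + n\mu} \right)^k \right]^{1/2}.$$

Since the gradient of function $f$ is Lipschitz continuous with constant $S_1(f)$, we get

$$f(x_0) - f^* \leq \frac{1}{2} S_1(f) R^2.$$

Thus, we obtain the following bound:

$$\mathrm{Prob}\left( \|\nabla f_\mu(x_k)\|_0^* \geq \frac{\epsilon}{2R} \right) \leq \frac{2R^2}{\epsilon} \cdot (S_1(f) + \mu) \cdot \left( 1 - \frac{\mu}{S_1(f) + n\mu} \right)^{\frac{k}{2}} \leq \frac{2R^2}{\epsilon} \cdot (S_1(f) + \mu) \cdot e^{-\frac{\mu k}{2(S_1(f) + n\mu)}} \leq 1 - \beta.$$

This proves the statement of the theorem. $\Box$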

We conclude the section with some remarks.

• The dependence of complexity bounds (3.7), (3.11) on the confidence level $\beta$ is very moderate. Hence, even a very high confidence level is easily achievable.

• The standard gradient method (GM) has the complexity bound of $O\left( \frac{L(f) R_0^2(x_0)}{\epsilon} \right)$ full-dimensional gradient iterations. Note that in the worst case $L(f)$ can reach $S_1(f)$. Hence, for the class of objective functions treated in Theorem 4, the worst-case complexity bounds of $RCDM(1, x_0)$ and GM are essentially the same. Note that RCDM needs a certain number of full cycles. But this number grows proportionally to the logarithms of accuracy and of the confidence level. Note that the computational cost of a single iteration of RCDM is very often much smaller than that of GM.

• Consider very sparse problems with the cost of $n$ coordinate iterations being proportional to a single full-dimensional gradient iteration. Then the complexity bound of $RCDM(1, x_0)$ in terms of the groups of iterations of size $n$ becomes $O^*\left( 1 + \frac{S_1}{n} \cdot \frac{R_0^2(x_0)}{\epsilon} \right)$. As compared with the gradient method, in this complexity bound the largest eigenvalue of the Hessian is replaced by an estimate for its average eigenvalue.


4 Constrained minimization

Consider now the constrained minimization problem

$$\min_{x \in Q} f(x), \tag{4.1}$$

where $Q = \bigotimes_{i=1}^n Q_i$, and the sets $Q_i \subseteq \mathbb{R}^{n_i}$, $i = 1, \dots, n$, are closed and convex. Let us endow the spaces $\mathbb{R}^{n_i}$ with some Euclidean norms (3.4), and assume that the objective function $f$ of problem (4.1) is convex and satisfies our main smoothness assumption (2.2).

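We can define now the constrained coordinate update as follows:

$$u^{(i)}(x) = \arg\min_{u^{(i)} \in Q_i} \left[ \langle f_i'(x), u^{(i)} - x^{(i)} \rangle + \frac{L_i}{2} \|u^{(i)} - x^{(i)}\|_{(i)}^2 \right], \qquad V_i(x) = x + U_i (u^{(i)}(x) - x^{(i)}), \quad i = 1, \dots, n. \tag{4.2}$$

The optimality conditions for these optimization problems can be written in the following form:

$$\langle f_i'(x) + L_i B_i (u^{(i)}(x) - x^{(i)}), u^{(i)} - u^{(i)}(x) \rangle \geq 0 \quad \forall u^{(i)} \in Q_i, \; i = 1, \dots, n. \tag{4.3}$$

Using this inequality for $u^{(i)} = x^{(i)}$, we obtain

$$f(V_i(x)) \overset{(2.3)}{\leq} f(x) + \langle f_i'(x), u^{(i)}(x) - x^{(i)} \rangle + \frac{L_i}{2} \|u^{(i)}(x) - x^{(i)}\|_{(i)}^2 \overset{(4.3)}{\leq} f(x) + \langle L_i B_i (u^{(i)}(x) - x^{(i)}), x^{(i)} - u^{(i)}(x) \rangle + \frac{L_i}{2} \|u^{(i)}(x) - x^{(i)}\|_{(i)}^2.$$

Thus,

$$f(x) - f(V_i(x)) \geq \frac{L_i}{2} \|u^{(i)}(x) - x^{(i)}\|_{(i)}^2, \quad i = 1, \dots, n. \tag{4.4}$$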

Let us apply to problem (4.1) the uniform coordinate descent method (UCDM).

Method $UCDM(x_0)$

For $k \geq 0$ iterate:

1) Choose randomly $i_k$ by uniform distribution on $\{1, \dots, n\}$.

2) Update $x_{k+1} = V_{i_k}(x_k)$.

(4.5)
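For scalar blocks with $B_i = 1$ and box sets $Q_i = [l_i, u_i]$, the update (4.2) reduces to clipping the unconstrained coordinate step to the interval. The sketch below relies on exactly these assumptions; the quadratic objective, the box $[0, 0.3]^2$, the seed and the names are illustrative choices.

```python
import random

def ucdm_box(partial_grad, L, lower, upper, x0, iters, rng):
    """UCDM (4.5) for scalar blocks and box sets Q_i = [lower[i], upper[i]]."""
    x = list(x0)
    for _ in range(iters):
        i = rng.randrange(len(x))                      # uniform choice of the coordinate
        step = x[i] - partial_grad(i, x) / L[i]        # unconstrained minimizer of the model in (4.2)
        x[i] = min(max(step, lower[i]), upper[i])      # clipping yields u^(i)(x) when B_i = 1
    return x

# Assumed test problem: f(x) = 0.5 * <Ax, x> - <b, x> on the box [0, 0.3]^2.
A = [[2.0, 1.0], [1.0, 3.0]]
b = [1.0, 1.0]
L = [A[0][0], A[1][1]]

def f(x):
    quad = sum(A[i][j] * x[i] * x[j] for i in range(2) for j in range(2))
    return 0.5 * quad - sum(bi * xi for bi, xi in zip(b, x))

def partial_grad(i, x):
    return sum(A[i][j] * x[j] for j in range(2)) - b[i]

x0 = [0.0, 0.0]
lower, upper = [0.0, 0.0], [0.3, 0.3]
x_final = ucdm_box(partial_grad, L, lower, upper, x0, iters=200, rng=random.Random(1))
```

For this data the constrained minimizer is $(0.3, 0.7/3)$: the first coordinate sits on its upper bound, and the second minimizes the objective along its own axis. Every iterate stays feasible, and by (4.4) the objective never increases.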

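Theorem 5 For any $k \geq 0$ we have

$$\phi_k - f^* \leq \frac{n}{n + k} \cdot \left[ \frac{1}{2} R_1^2(x_0) + f(x_0) - f^* \right].$$

If $f$ is strongly convex in $\|\cdot\|_1$ with constant $\sigma$, then

$$\phi_k - f^* \leq \left( 1 - \frac{2\sigma}{n(1 + \sigma)} \right)^k \cdot \left( \frac{1}{2} R_1^2(x_0) + f(x_0) - f^* \right). \tag{4.6}$$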

Proof:

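We will use notation of Theorem 1. Let UCDM generate an implementation $x_k$ of corresponding random variable. Denote

$$r_k^2 = \|x_k - x^*\|_1^2 = \sum_{i=1}^n L_i \langle B_i (x_k^{(i)} - x_*^{(i)}), x_k^{(i)} - x_*^{(i)} \rangle.$$

Then

$$\begin{array}{rcl}
r_{k+1}^2 & = & r_k^2 + 2 L_{i_k} \langle B_{i_k} (u^{(i_k)}(x_k) - x_k^{(i_k)}), x_k^{(i_k)} - x_*^{(i_k)} \rangle + L_{i_k} \|u^{(i_k)}(x_k) - x_k^{(i_k)}\|_{(i_k)}^2 \\[4pt]
& = & r_k^2 + 2 L_{i_k} \langle B_{i_k} (u^{(i_k)}(x_k) - x_k^{(i_k)}), u^{(i_k)}(x_k) - x_*^{(i_k)} \rangle - L_{i_k} \|u^{(i_k)}(x_k) - x_k^{(i_k)}\|_{(i_k)}^2 \\[4pt]
& \overset{(4.3)}{\leq} & r_k^2 + 2 \langle f_{i_k}'(x_k), x_*^{(i_k)} - u^{(i_k)}(x_k) \rangle - L_{i_k} \|u^{(i_k)}(x_k) - x_k^{(i_k)}\|_{(i_k)}^2 \\[4pt]
& = & r_k^2 + 2 \langle f_{i_k}'(x_k), x_*^{(i_k)} - x_k^{(i_k)} \rangle - 2 \left[ \langle f_{i_k}'(x_k), u^{(i_k)}(x_k) - x_k^{(i_k)} \rangle + \frac{1}{2} L_{i_k} \|u^{(i_k)}(x_k) - x_k^{(i_k)}\|_{(i_k)}^2 \right] \\[4pt]
& \overset{(2.3)}{\leq} & r_k^2 + 2 \langle f_{i_k}'(x_k), x_*^{(i_k)} - x_k^{(i_k)} \rangle + 2 [f(x_k) - f(V_{i_k}(x_k))].
\end{array}$$

Taking the expectation in $i_k$, we obtain

$$E_{i_k}\left( \frac{1}{2} r_{k+1}^2 + f(x_{k+1}) - f^* \right) \leq \frac{1}{2} r_k^2 + f(x_k) - f^* + \frac{1}{n} \langle \nabla f(x_k), x^* - x_k \rangle. \tag{4.7}$$

Thus, for any $k \geq 0$ we have

$$\begin{array}{rcl}
\frac{1}{2} r_0^2 + f(x_0) - f^* & \geq & \phi_{k+1} - f^* + \frac{1}{n} \left[ \langle \nabla f(x_0), x_0 - x^* \rangle + \sum_{i=1}^k E_{\xi_{i-1}}\left( \langle \nabla f(x_i), x_i - x^* \rangle \right) \right] \\[4pt]
& \geq & \phi_{k+1} - f^* + \frac{1}{n} \sum_{i=0}^k (\phi_i - f^*) \overset{(4.4)}{\geq} \left( 1 + \frac{k+1}{n} \right) (\phi_{k+1} - f^*).
\end{array}$$

Finally, let $f$ be strongly convex in $\|\cdot\|_1$ with convexity parameter $\sigma$.²⁾ Then (e.g. Section 2.1 in [6]),

$$\langle \nabla f(x), x - x^* \rangle \geq f(x) - f^* + \frac{\sigma}{2} \|x - x^*\|_1^2 \geq \sigma \|x - x^*\|_1^2.$$

Define $\beta = \frac{2\sigma}{1 + \sigma} \in [0, 1]$. Then, in view of inequality (4.7), we have

$$\begin{array}{rcl}
E_{i_k}\left( \frac{1}{2} r_{k+1}^2 + f(x_{k+1}) - f^* \right) & \leq & \frac{1}{2} r_k^2 + f(x_k) - f^* - \frac{1}{n} \left[ \beta \left( f(x_k) - f^* + \frac{\sigma}{2} r_k^2 \right) + (1 - \beta) \sigma r_k^2 \right] \\[4pt]
& = & \left( 1 - \frac{2\sigma}{n(1 + \sigma)} \right) \cdot \left( \frac{1}{2} r_k^2 + f(x_k) - f^* \right).
\end{array}$$

It remains to take expectation in $\xi_{k-1}$. $\Box$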


5 Accelerated coordinate descent

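It is well known that the usual gradient method can be transformed into a faster scheme by applying an appropriate multistep strategy [5]. Let us show that this can be done also for random coordinate descent methods. Consider the following accelerated scheme as applied to the unconstrained minimization problem (2.1) with strongly convex objective function. We assume that the convexity parameter $\sigma = \sigma_1(f) \geq 0$ is known.

Method $ACDM(x_0)$

1. Define $v_0 = x_0$, $a_0 = \frac{1}{n}$, $b_0 = 2$.

2. For $k \geq 0$ iterate:

1) Compute $\gamma_k \geq \frac{1}{n}$ from equation $\gamma_k^2 - \frac{\gamma_k}{n} = \left( 1 - \frac{\gamma_k \sigma}{n} \right) \frac{a_k^2}{b_k^2}$. Set $\alpha_k = \frac{n - \gamma_k \sigma}{\gamma_k (n^2 - \sigma)}$, and $\beta_k = 1 - \frac{1}{n} \gamma_k \sigma$.

2) Select $y_k = \alpha_k v_k + (1 - \alpha_k) x_k$.

3) Choose $i_k = \mathcal{R}_0$ and update

$$x_{k+1} = T_{i_k}(y_k), \qquad v_{k+1} = \beta_k v_k + (1 - \beta_k) y_k - \frac{\gamma_k}{L_{i_k}} U_{i_k} f_{i_k}'(y_k)^\#.$$

4) Set $b_{k+1} = \frac{b_k}{\sqrt{\beta_k}}$, and $a_{k+1} = \gamma_k b_{k+1}$.

(5.1)

For $n = 1$ and $\sigma = 0$ this method coincides with method [5]. Its parameters satisfy the following identity:

$$\gamma_k^2 - \frac{\gamma_k}{n} = \frac{\beta_k a_k^2}{b_k^2} = \frac{\beta_k \gamma_k}{n} \cdot \frac{1 - \alpha_k}{\alpha_k}. \tag{5.2}$$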
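The scalar sequences of scheme (5.1) do not depend on the objective function, so they can be generated and checked separately. The sketch below (hypothetical names, assumed values $n = 10$ and $\sigma = 0.5$) solves the quadratic equation for $\gamma_k$ in closed form by taking its positive root, which is one natural way to satisfy $\gamma_k \geq \frac{1}{n}$ when $\sigma \leq n^2$.

```python
import math

def acdm_parameters(n, sigma, iters):
    """Scalar sequences of ACDM (5.1): list of tuples (gamma_k, alpha_k, beta_k, a_k, b_k)."""
    a, b = 1.0 / n, 2.0
    history = []
    for _ in range(iters):
        c = (a / b) ** 2
        # gamma^2 - gamma/n = (1 - gamma*sigma/n) * c  <=>  gamma^2 - p*gamma - c = 0
        p = (1.0 - sigma * c) / n
        gamma = 0.5 * (p + math.sqrt(p * p + 4.0 * c))   # positive root
        alpha = (n - gamma * sigma) / (gamma * (n * n - sigma))
        beta = 1.0 - gamma * sigma / n
        history.append((gamma, alpha, beta, a, b))
        b = b / math.sqrt(beta)                          # b_{k+1}
        a = gamma * b                                    # a_{k+1}
    return history

n, sigma = 10, 0.5
history = acdm_parameters(n, sigma, iters=50)
```

With these sequences one can verify identity (5.2) at every step, as well as the growth of $a_k$ that drives the convergence rate of the method.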

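Theorem 6 For any $k \geq 0$ we have

$$\begin{array}{rcl}
\phi_k - f^* & \leq & \sigma \left[ 2 \|x_0 - x^*\|_1^2 + \frac{1}{n^2} (f(x_0) - f^*) \right] \cdot \left[ \left( 1 + \frac{\sqrt{\sigma}}{2n} \right)^{k+1} - \left( 1 - \frac{\sqrt{\sigma}}{2n} \right)^{k+1} \right]^{-2} \\[4pt]
& \leq & \left( \frac{n}{k+1} \right)^2 \cdot \left[ 2 \|x_0 - x^*\|_1^2 + \frac{1}{n^2} (f(x_0) - f^*) \right].
\end{array} \tag{5.3}$$

Proof: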

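Let $x_k$ and $v_k$ be the implementations of corresponding random variables generated by $ACDM(x_0)$ after $k$ iterations. Denote $r_k^2 = \|v_k - x^*\|_1^2$. Then using representation

$$v_k = y_k + \frac{1 - \alpha_k}{\alpha_k} (y_k - x_k),$$

we obtain

$$\begin{array}{rcl}
r_{k+1}^2 & = & \|\beta_k v_k + (1 - \beta_k) y_k - x^*\|_1^2 + \frac{\gamma_k^2}{L_{i_k}} \left( \|f_{i_k}'(y_k)\|_{(i_k)}^* \right)^2 + 2\gamma_k \langle f_{i_k}'(y_k), (x^* - \beta_k v_k - (1 - \beta_k) y_k)^{(i_k)} \rangle \\[4pt]
& \leq & \|\beta_k v_k + (1 - \beta_k) y_k - x^*\|_1^2 + 2\gamma_k^2 (f(y_k) - f(T_{i_k}(y_k))) + 2\gamma_k \langle f_{i_k}'(y_k), \left( x^* - y_k + \frac{\beta_k (1 - \alpha_k)}{\alpha_k} (x_k - y_k) \right)^{(i_k)} \rangle.
\end{array}$$

Taking the expectation of both sides in $i_k$, we obtain:

$$\begin{array}{rcl}
E_{i_k}(r_{k+1}^2) & \leq & \beta_k r_k^2 + (1 - \beta_k) \|y_k - x^*\|_1^2 + 2\gamma_k^2 [f(y_k) - E_{i_k}(f(x_{k+1}))] + \frac{2\gamma_k}{n} \langle \nabla f(y_k), x^* - y_k + \frac{\beta_k (1 - \alpha_k)}{\alpha_k} (x_k - y_k) \rangle \\[4pt]
& \leq & \beta_k r_k^2 + (1 - \beta_k) \|y_k - x^*\|_1^2 + 2\gamma_k^2 [f(y_k) - E_{i_k}(f(x_{k+1}))] + \frac{2\gamma_k}{n} \left[ f^* - f(y_k) - \frac{1}{2} \sigma \|y_k - x^*\|_1^2 + \frac{\beta_k (1 - \alpha_k)}{\alpha_k} (f(x_k) - f(y_k)) \right] \\[4pt]
& \overset{(5.2)}{=} & \beta_k r_k^2 - 2\gamma_k^2 E_{i_k}(f(x_{k+1})) + \frac{2\gamma_k}{n} \left[ f^* + \frac{\beta_k (1 - \alpha_k)}{\alpha_k} f(x_k) \right].
\end{array}$$

Note that $b_{k+1}^2 = \frac{1}{\beta_k} b_k^2$, $a_{k+1}^2 = \gamma_k^2 b_{k+1}^2$, and $\frac{\gamma_k \beta_k (1 - \alpha_k)}{n \alpha_k} = \frac{a_k^2}{b_{k+1}^2}$. Therefore, multiplying the last inequality by $b_{k+1}^2$, we obtain

$$b_{k+1}^2 E_{i_k}(r_{k+1}^2) \leq b_k^2 r_k^2 - 2a_{k+1}^2 (E_{i_k}(f(x_{k+1})) - f^*) + 2a_k^2 (f(x_k) - f^*).$$

Taking now the expectation of both sides of this inequality in $\xi_{k-1}$, we get

$$2a_{k+1}^2 (\phi_{k+1} - f^*) + b_{k+1}^2 E_{\xi_k}(r_{k+1}^2) \leq 2a_k^2 (\phi_k - f^*) + b_k^2 r_k^2 \leq 2a_0^2 (f(x_0) - f^*) + b_0^2 \|x_0 - x^*\|_1^2.$$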

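It remains to estimate the growth of coefficients $a_k$ and $b_k$. We have:

$$b_k^2 = \beta_k b_{k+1}^2 = \left( 1 - \frac{\sigma}{n} \gamma_k \right) b_{k+1}^2 = \left( 1 - \frac{\sigma}{n} \frac{a_{k+1}}{b_{k+1}} \right) b_{k+1}^2.$$

Thus, $\frac{\sigma}{n} a_{k+1} b_{k+1} \leq b_{k+1}^2 - b_k^2 \leq 2 b_{k+1} (b_{k+1} - b_k)$, and we conclude that

$$b_{k+1} \geq b_k + \frac{\sigma}{2n} a_k. \tag{5.4}$$

On the other hand,

$$\frac{a_{k+1}^2}{b_{k+1}^2} - \frac{a_{k+1}}{n b_{k+1}} \overset{(5.2)}{=} \frac{\beta_k a_k^2}{b_k^2} = \frac{a_k^2}{b_{k+1}^2}.$$

Therefore, $\frac{1}{n} a_{k+1} b_{k+1} \leq a_{k+1}^2 - a_k^2 \leq 2 a_{k+1} (a_{k+1} - a_k)$, and we obtain

$$a_{k+1} \geq a_k + \frac{1}{2n} b_k. \tag{5.5}$$

Further, denoting $Q_1 = 1 + \frac{\sqrt{\sigma}}{2n}$ and $Q_2 = 1 - \frac{\sqrt{\sigma}}{2n}$ and using inequalities (5.4) and (5.5), it is easy to prove by induction that

$$a_k \geq \frac{1}{\sqrt{\sigma}} \left[ Q_1^{k+1} - Q_2^{k+1} \right], \qquad b_k \geq Q_1^{k+1} + Q_2^{k+1}.$$

Finally, using trivial inequality $(1 + t)^k - (1 - t)^k \geq 2kt$, $t \geq 0$, we obtain

$$Q_1^{k+1} - Q_2^{k+1} \geq \frac{k+1}{n} \sqrt{\sigma}. \qquad \Box$$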

The rate of convergence (5.3) of ACDM is much better than the rate (2.14). However, for some applications (e.g. Section 6.3), the complexity of one iteration of the accelerated scheme is rather high since for computing $y_k$ it needs to operate with full-dimensional vectors.

6 Implementation details and numerical test

6.1 Dynamic adjustment of Lipschitz constants

In RCDM (2.6) we use the valid upper bounds for Lipschitz constants $\{L_i\}_{i=1}^n$ of the directional derivatives. For some applications (e.g. Section 6.3) this information is easily available. However, for more complicated functions we need to apply a dynamic adjustment procedure for finding appropriate bounds. Let us estimate the efficiency of a simple backtracking strategy with restore (e.g. [7]) inserted in $RCDM(0, x_0)$. As we have already discussed, such a strategy should not be based on computation of the function values. Consider the Random Adaptive Coordinate Descent Method. For the sake of notation, we assume that $N = n$.

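Method $RACDM(x_0)$

Setup: lower bounds $\hat L_i := L_i^0 \in (0, L_i]$, $i = 1, \dots, n$.

For $k \geq 0$ iterate:

1) Choose $i_k = \mathcal{R}_0$.

2) Set $x_{k+1} := x_k - \hat L_{i_k}^{-1} \cdot \nabla_{i_k} f(x_k) \cdot e_{i_k}$.

while $\nabla_{i_k} f(x_k) \cdot \nabla_{i_k} f(x_{k+1}) < 0$ do $\left\{ \hat L_{i_k} := 2 \hat L_{i_k}, \;\; x_{k+1} := x_k - \hat L_{i_k}^{-1} \cdot \nabla_{i_k} f(x_k) \cdot e_{i_k} \right\}$

3) Set $\hat L_{i_k} := \frac{1}{2} \hat L_{i_k}$.

(6.1)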
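Scheme (6.1) can be sketched as follows. The code assumes scalar coordinates, a user-supplied function `partial_grad(i, x)` and a quadratic test problem; these names and data are hypothetical. The counter `evals` records how many partial derivatives this particular implementation evaluates; its bookkeeping may differ from the counting convention behind (6.3).

```python
import random

def racdm(partial_grad, L_hat0, x0, iters, rng):
    """Random Adaptive Coordinate Descent (6.1): backtracking on the sign of the partial derivative."""
    x, L_hat, evals = list(x0), list(L_hat0), 0
    for _ in range(iters):
        i = rng.randrange(len(x))                 # uniform counter R_0
        g = partial_grad(i, x)
        xi_old = x[i]
        x[i] = xi_old - g / L_hat[i]              # trial step with the current estimate
        g_new = partial_grad(i, x)
        evals += 2
        while g * g_new < 0:                      # derivative changed sign: the step was too long
            L_hat[i] *= 2.0
            x[i] = xi_old - g / L_hat[i]
            g_new = partial_grad(i, x)
            evals += 1
        L_hat[i] *= 0.5                           # the "restore" part of step 3
    return x, L_hat, evals

# Assumed test problem: f(x) = 0.5 * <Ax, x> - <b, x>, true constants L_i = A[i][i].
A = [[2.0, 1.0], [1.0, 3.0]]
b = [1.0, 1.0]
L_true = [A[0][0], A[1][1]]

def f(x):
    quad = sum(A[i][j] * x[i] * x[j] for i in range(2) for j in range(2))
    return 0.5 * quad - sum(bi * xi for bi, xi in zip(b, x))

def partial_grad(i, x):
    return sum(A[i][j] * x[j] for j in range(2)) - b[i]

x0 = [5.0, -3.0]
x_final, L_hat, evals = racdm(partial_grad, [0.1, 0.1], x0, iters=20, rng=random.Random(0))
```

No function values are used inside the loop: the backtracking test compares only the signs of two partial derivatives, in line with the requirement stated before the method.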

Theorem 7 1. At the beginning of each iteration, we have $\hat L_i \leq L_i$, $i = 1, \dots, n$.

2. RACDM has the following rate of convergence:

$$\phi_k - f^* \leq \frac{8nR_1^2(x_0)}{16 + 3k}, \quad k \geq 0. \tag{6.2}$$

3. After iteration $k$, the total number $N_k$ of computations of directional derivatives in method (6.1) satisfies inequality

$$N_k \leq 2(k + 1) + \sum_{i=1}^n \log_2 \frac{L_i}{L_i^0}. \tag{6.3}$$

Proof:

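For proving the first statement, we assume that at the entrance to the internal cycle in method (6.1), we have $\hat L_{i_k} \leq L_{i_k}$. If during this cycle we get $\hat L_{i_k} > L_{i_k}$ then immediately

$$|\nabla_{i_k} f(x_{k+1}) - \nabla_{i_k} f(x_k)| \overset{(2.2)}{\leq} \hat L_{i_k}^{-1} L_{i_k} |\nabla_{i_k} f(x_k)| < |\nabla_{i_k} f(x_k)|.$$

In this case, the termination criterion is satisfied. Thus, during the internal cycle

$$\hat L_{i_k} \leq 2 L_{i_k}. \tag{6.4}$$

Further, the internal cycle is terminated with $\hat L_{i_k}$ satisfying inequality (6.4). Therefore, using inequality (2.1.17) in [6], we have:

$$\begin{array}{rcl}
f(x_k) - f(x_{k+1}) & \geq & \nabla_{i_k} f(x_{k+1}) \cdot \left( x_k^{(i_k)} - x_{k+1}^{(i_k)} \right) + \frac{1}{2L_{i_k}} \left( \nabla_{i_k} f(x_{k+1}) - \nabla_{i_k} f(x_k) \right)^2 \\[4pt]
& = & \hat L_{i_k}^{-1} \cdot \nabla_{i_k} f(x_{k+1}) \cdot \nabla_{i_k} f(x_k) + \frac{1}{2L_{i_k}} \left( \nabla_{i_k} f(x_{k+1}) - \nabla_{i_k} f(x_k) \right)^2 \\[4pt]
& \overset{(6.4)}{\geq} & \frac{1}{2L_{i_k}} \left[ (\nabla_{i_k} f(x_{k+1}))^2 + (\nabla_{i_k} f(x_k))^2 - \nabla_{i_k} f(x_{k+1}) \cdot \nabla_{i_k} f(x_k) \right] \\[4pt]
& \geq & \frac{3}{8L_{i_k}} \left( \nabla_{i_k} f(x_k) \right)^2.
\end{array}$$

Thus, we obtain the following bound:

$$f(x_k) - E_{i_k}(f(x_{k+1})) \geq \frac{3}{8n} \left( \|\nabla f(x_k)\|_1^* \right)^2$$

(compare with (2.13)). Now, using the same arguments as in the proof of Theorem 1, we can prove that

$$\phi_k - \phi_{k+1} \geq \frac{1}{C} (\phi_k - f^*)^2$$

with $C = \frac{8}{3} n R_1^2(x_0)$. And we conclude that

$$\frac{1}{\phi_k - f^*} \geq \frac{1}{f(x_0) - f^*} + \frac{k}{C} \geq \frac{2}{nR_1^2(x_0)} + \frac{3k}{8nR_1^2(x_0)}.$$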

Finally, let us find an upper estimate for $N_k$. Denote by $K_i$ the subset of iteration numbers $\{0, \dots, k\}$ at which the index $i$ was active. Denote by $p_i(j)$ the number of computations of the $i$th partial derivative at iteration $j \in K_i$. If $\hat L_i$ is the current estimate of the Lipschitz constant $L_i$ at the beginning of iteration $j$, and $\hat L_i'$ is the value of this estimate at the end of this iteration, then these values are related as follows:

$$\hat L_i' = \frac{1}{2} \cdot 2^{p_i(j) - 1} \cdot \hat L_i .$$

In other words, $p_i(j) = 2 + \log_2 \left[ \hat L_i' / \hat L_i \right]$. Taking into account the statement of Item 1, we obtain the following estimate for $M_i$, the total number of computations of the $i$th partial derivative in the first $k$ iterations:

$$M_i \le 2 \cdot |K_i| + \log_2 \frac{L_i}{L_i^0} .$$

It remains to note that $\sum_{i=1}^n |K_i| = k + 1$. □
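The estimate for $M_i$ used at the end of this proof is a telescoping sum, which can be written out as follows. The symbols $\hat L_i(j)$ and $\hat L_i'(j)$ for the estimate at the beginning and at the end of iteration $j$, and $\hat L_i^{\mathrm{end}}$ for its value after the last iteration of $K_i$, are notation assumed only for this remark.

```latex
% The estimate of coordinate i changes only at iterations j in K_i,
% so the end value of one such iteration is the start value of the next.
M_i \;=\; \sum_{j \in K_i} p_i(j)
    \;=\; 2\,|K_i| + \sum_{j \in K_i} \log_2 \frac{\hat L_i'(j)}{\hat L_i(j)}
    \;=\; 2\,|K_i| + \log_2 \frac{\hat L_i^{\mathrm{end}}}{L_i^0}
    \;\le\; 2\,|K_i| + \log_2 \frac{L_i}{L_i^0},
\qquad
N_k \;=\; \sum_{i=1}^n M_i \;\le\; 2(k+1) + \sum_{i=1}^n \log_2 \frac{L_i}{L_i^0} .
```

The last inequality for $M_i$ is Item 1 of the theorem applied to the final estimate.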

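Note that a similar technique can also be used in the accelerated version (5.1). For that, we need to choose its parameters from the equations

$$\frac{4}{3} \gamma_k^2 - \frac{\gamma_k}{n} = \left( 1 - \frac{\gamma_k \sigma}{n} \right) \frac{a_k^2}{b_k^2}, \qquad \beta_k = 1 - \frac{\gamma_k}{n} \sigma, \qquad \frac{1 - \alpha_k}{\alpha_k} = \frac{n}{\beta_k} \cdot \left( \frac{4}{3} \gamma_k - \frac{1}{n} \right). \tag{6.5}$$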

6.2 Random counters

For problems of very big size, we should treat carefully any operation with multidimensional vectors. In RCD-methods there is one such operation that must be performed at each step of the algorithm: the random generation of the active coordinate. In a straightforward way, this operation can be implemented in $O(n)$ operations. However, for huge-scale problems this complexity can be prohibitive. Therefore, before presenting our computational results, we describe a strategy for generating random coordinates with complexity $O(\ln n)$ operations.

Given the values $L_i^\alpha$, $i = 1, \dots, n$, we need to generate efficiently random integer numbers $i \in \{1, \dots, n\}$ with probabilities

$$\mathrm{Prob}[i = k] = L_k^\alpha \Big/ \left[ \sum_{j=1}^n L_j^\alpha \right], \quad k = 1, \dots, n.$$

Without loss of generality, we assume that $n = 2^m$. Define $m + 1$ vectors $P_k \in R^{2^{m-k}}$, $k = 0, \dots, m$, as follows:

$$\begin{aligned}
P_0^{(i)} &= L_i^\alpha, \quad i = 1, \dots, n, \\
P_k^{(i)} &= P_{k-1}^{(2i)} + P_{k-1}^{(2i-1)}, \quad i = 1, \dots, 2^{m-k}, \; k = 1, \dots, m.
\end{aligned} \tag{6.6}$$

Clearly, $P_m^{(1)} = S_\alpha$. Note that the preliminary computations (6.6) need $n - 1$ additions in total ($n/2$ at the first level, $n/4$ at the second, and so on).

Let us now describe our random generator.

1. Choose $i = 1$ in $P_m$.

2. For $k = m$ down to $1$ do: Let the element $i$ of $P_k$ be chosen. Choose in $P_{k-1}$ either $2i$ or $2i - 1$ with probabilities

$$\frac{P_{k-1}^{(2i)}}{P_k^{(i)}} \quad \text{or} \quad \frac{P_{k-1}^{(2i-1)}}{P_k^{(i)}}. \tag{6.7}$$

Clearly, this procedure correctly implements the random counter $\mathcal{R}_\alpha$. Note that its complexity is $O(\ln n)$ operations. In the above form, its execution requires generating $m$ random numbers. However, a simple modification can reduce this number to one. Note also that correcting the vectors $P_k$, $k = 0, \dots, m$, after a change of a single entry in the initial data needs $O(\ln n)$ operations.
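A minimal Python sketch of such a counter is given below. The class name `RandomCounter`, the 0-based indexing (children `2i` and `2i+1` instead of `2i-1` and `2i`), and the particular way of using a single uniform number per draw (rescaling the remainder while descending the tree) are assumptions of this sketch; the text above does not specify which modification is meant.

```python
import random

class RandomCounter:
    """Binary sum tree over n = 2^m weights, in the spirit of (6.6)-(6.7)."""

    def __init__(self, weights):
        n = len(weights)
        if n == 0 or n & (n - 1):
            raise ValueError("number of weights must be a power of two")
        # levels[0] holds the weights, levels[m] holds the single total sum
        self.levels = [[float(w) for w in weights]]
        while len(self.levels[-1]) > 1:
            prev = self.levels[-1]
            self.levels.append(
                [prev[2 * i] + prev[2 * i + 1] for i in range(len(prev) // 2)]
            )

    def total(self):
        return self.levels[-1][0]

    def sample(self, u):
        """Map one uniform number u in [0, 1) to an index in O(log n) steps."""
        i = 0
        t = u * self.total()
        for k in range(len(self.levels) - 1, 0, -1):
            left = self.levels[k - 1][2 * i]
            if t < left:               # descend into the left child
                i = 2 * i
            else:                      # descend into the right child
                t -= left
                i = 2 * i + 1
        return i

    def update(self, i, w):
        """Change one weight and repair the sums on the path to the root."""
        self.levels[0][i] = float(w)
        for k in range(1, len(self.levels)):
            i //= 2
            self.levels[k][i] = (
                self.levels[k - 1][2 * i] + self.levels[k - 1][2 * i + 1]
            )

counter = RandomCounter([1, 2, 3, 4])          # hypothetical values of L_i^alpha
index = counter.sample(random.random())        # one draw, one random number
```

Both `sample` and `update` touch one entry per level, so their cost is proportional to $m = \log_2 n$.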

6.3 Numerical test

Let us now describe our test problem (sometimes it is called the Google problem). Let $E \in R^{n \times n}$ be an incidence matrix of a graph. Denote $e = (1, \dots, 1)^T$ and

$$\bar E = E \cdot \mathrm{diag}(E^T e)^{-1} .$$

Since $\bar E^T e = e$, it is a stochastic matrix. Our problem consists in finding a right maximal eigenvector of the matrix $\bar E$:

$$\text{Find } x^* \ge 0 : \quad \bar E x^* = x^*, \quad \langle e, x^* \rangle = 1.$$

Clearly, this problem can be rewritten in an optimization form:

$$f(x) \overset{\mathrm{def}}{=} \frac{1}{2} \|\bar E x - x\|^2 + \frac{\gamma}{2} \left[ \langle e, x \rangle - 1 \right]^2 \;\to\; \min_{x \in R^n}, \tag{6.8}$$

where $\gamma > 0$ is a penalty parameter for the equality constraint, and the norm $\| \cdot \|$ is Euclidean. If the degree of each node in the graph is small, then the computation of partial derivatives of the function $f$ is cheap. Hence, we can apply RCDM to (6.8) even if the size of the matrix $\bar E$ is very big.
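To see why the partial derivatives of (6.8) are cheap, one can keep the residual $r = \bar E x - x$ and the scalar $s = \langle e, x \rangle - 1$ in memory; then $\nabla_i f(x) = \langle \bar E e_i - e_i, r \rangle + \gamma s$ involves only the nonzeros of column $i$. The Python sketch below illustrates this under stated assumptions: the class name `GoogleObjective`, the column-list storage format and the tiny three-node example are hypothetical and are not taken from the experiments reported here.

```python
class GoogleObjective:
    """f(x) = 0.5*||Ebar x - x||^2 + 0.5*gamma*(sum(x) - 1)^2 with cached residual."""

    def __init__(self, cols, gamma, x):
        # cols[i] lists the nonzeros (row, value) of column i of Ebar
        self.cols, self.gamma = cols, gamma
        self.x = list(x)
        self.r = [-xi for xi in x]                 # r = Ebar x - x
        for i, col in enumerate(cols):
            for row, val in col:
                self.r[row] += val * x[i]
        self.s = sum(x) - 1.0                      # s = <e, x> - 1

    def partial(self, i):
        """i-th partial derivative; cost proportional to the nonzeros of column i."""
        g = sum(val * self.r[row] for row, val in self.cols[i]) - self.r[i]
        return g + self.gamma * self.s

    def lipschitz(self, i):
        """Second derivative along coordinate i: ||Ebar e_i - e_i||^2 + gamma."""
        sq = 1.0
        for row, val in self.cols[i]:
            sq += val * val - (2.0 * val if row == i else 0.0)
        return sq + self.gamma

    def move(self, i, delta):
        """x_i += delta, updating r and s without any full-vector operation."""
        self.x[i] += delta
        for row, val in self.cols[i]:
            self.r[row] += val * delta
        self.r[i] -= delta
        self.s += delta

    def value(self):
        """Full objective value (O(n); used here only for checking)."""
        return 0.5 * sum(t * t for t in self.r) + 0.5 * self.gamma * self.s ** 2

# Hypothetical 3-node graph; every column of Ebar sums to one.
cols = [[(1, 0.5), (2, 0.5)], [(0, 1.0)], [(0, 0.5), (1, 0.5)]]
obj = GoogleObjective(cols, gamma=0.5, x=[0.2, 0.5, 0.1])
i = 0
obj.move(i, -obj.partial(i) / obj.lipschitz(i))   # one coordinate step
```

Since $f$ is quadratic, the step of length $1/L_i$ along coordinate $i$ is an exact minimization in that coordinate, so the $i$th partial derivative vanishes after the step.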

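In our numerical experiments we applied $RCDM(1, 0)$ to a randomly generated graph with average node degree $p$. The termination criterion was

$$\|\bar E x - x\| \le \epsilon \cdot \|x\|$$

with $\epsilon = 0.01$. In the table below, $k$ denotes the total number of groups of $n$ coordinate iterations. The computations were performed on a standard Pentium-4 computer with frequency 1.6GHz.

| $n$ | $p$ | $\gamma$ | $k$ | Time (s) |
|---|---|---|---|---|
| 65536 | 10 | $1/n$ | 47 | 7.41 |
| 65536 | 20 | $1/n$ | 30 | 5.97 |
| 65536 | 10 | $1/\sqrt{n}$ | 65 | 10.5 |
| 65536 | 20 | $1/\sqrt{n}$ | 39 | 7.84 |
| 262144 | 10 | $1/n$ | 47 | 42.7 |
| 262144 | 20 | $1/n$ | 32 | 39.1 |
| 262144 | 10 | $1/\sqrt{n}$ | 72 | 76.5 |
| 262144 | 20 | $1/\sqrt{n}$ | 45 | 62.0 |
| 1048576 | 10 | $1/n$ | 49 | 247 |
| 1048576 | 20 | $1/n$ | 31 | 240 |
| 1048576 | 10 | $1/\sqrt{n}$ | 82 | 486 |
| 1048576 | 20 | $1/\sqrt{n}$ | 64 | 493 |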

We can see that the number of $n$-iteration groups grows very moderately with the dimension of the problem. The increase of the factor $\gamma$ also does not create significant difficulties for RCDM. Note that in the standard Euclidean norm we have $L(f) \approx \gamma n$. Thus, for the black-box gradient methods the problems with $\gamma = \frac{1}{\sqrt{n}}$ are very difficult. Note also that the accelerated scheme (5.1) is not efficient on problems of this size. Indeed, each coordinate iteration of this method needs an update of an $n$-dimensional vector. Therefore, one group of $n$ iterations takes at least $n^2 \approx 10^{12}$ operations (this is for the maximal dimension in the above table). For our computer, this amount of computations takes at least 20 minutes.

Note that the dominance of RCDM on sparse problems can be supported by a comparison of the efficiency estimates. Let us put in one table the complexity results related to RCDM (2.6), the accelerated scheme ACDM (5.1), and the fast gradient method (FGM) working with the full gradient (e.g. Section 2.2 in [6]). Denote by $T$ the complexity of computation of a single directional derivative, and by $F$ the complexity of computation of the function value. We assume that $F = nT$. Note that this can be true even for dense problems (e.g. quadratic functions). We will measure distances in $\| \cdot \|_1$. For this metric, denote by $\gamma$ the Lipschitz constant for the gradient of the objective function. Note that $1 \le \gamma \le n$.

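In the table below, we compare the cost of iteration, the cost of the oracle and the iteration complexity for these three methods.

| | RCDM | ACDM | FGM |
|---|---|---|---|
| Iteration | $1$ | $n$ | $n$ |
| Oracle | $T$ | $T$ | $F$ |
| Complexity | $n \frac{R^2}{\epsilon}$ | $n \frac{R}{\sqrt{\epsilon}}$ | $\sqrt{\gamma} \frac{R}{\sqrt{\epsilon}}$ |
| Total | $(F + n) \frac{R^2}{\epsilon}$ | $(F + n^2) \frac{R}{\sqrt{\epsilon}}$ | $(F + n) \frac{\sqrt{\gamma} R}{\sqrt{\epsilon}}$ |

(6.9)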

We can see that RCDM is better than FGM if $\frac{R}{\sqrt{\epsilon}} < \sqrt{\gamma}$. On the other hand, FGM is better than ACDM if $F < \frac{n^2}{\sqrt{\gamma}}$. For our test problem, both conditions are satisfied.
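The comparison in table (6.9) can be explored numerically with a few lines of Python. This is only a sketch: the function name `total_costs` and the parameter values in the example are illustrative assumptions (they are not measurements from the experiments above), and all absolute constants are dropped, as in the table.

```python
import math

def total_costs(n, T, R, eps, gamma):
    """Total cost estimates from the last row of table (6.9), with F = n*T."""
    F = n * T
    return {
        "RCDM": (F + n) * R ** 2 / eps,
        "ACDM": (F + n ** 2) * R / math.sqrt(eps),
        "FGM": (F + n) * math.sqrt(gamma) * R / math.sqrt(eps),
    }

# Hypothetical sparse instance: cheap derivatives (T = 10), gamma at its upper bound n.
n = 2 ** 20
costs = total_costs(n=n, T=10, R=1.0, eps=1e-4, gamma=n)
best = min(costs, key=costs.get)
```

With these illustrative values $R/\sqrt{\epsilon} = 100 < \sqrt{\gamma} = 1024$ and $F = 10n < n^2/\sqrt{\gamma}$, so the ordering RCDM, FGM, ACDM is the one predicted by the two conditions above.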

References

[1] A. Auslender. Optimisation. Méthodes Numériques. Masson, Paris (1976).

[2] D. Bertsekas. Nonlinear Programming. 2nd edition, Athena Scientific, Belmont (1999).

[3] Z.Q. Luo, P. Tseng. On the convergence rate of dual ascent methods for linearly constrained convex minimization. Mathematics of Operations Research, 18(2), 846-867 (1993).

[4] A. Nemirovsky, D. Yudin. Problem complexity and method efficiency in optimization. John Wiley & Sons, Somerset, NJ (1983).

[5] Yu. Nesterov. A method for unconstrained convex minimization problem with the rate of convergence $O(\frac{1}{k^2})$. Doklady AN SSSR (translated as Soviet Math. Dokl.), 269(3), 543-547 (1983).

[6] Yu. Nesterov. Introductory Lectures on Convex Optimization. Kluwer, Boston (2004).

[7] Yu. Nesterov. Gradient methods for minimizing composite objective function. CORE Discussion Paper 2007/76 (2007).

[8] P. Tseng. Convergence of a block coordinate descent method for nondifferentiable minimization. JOTA, 109(3), 475-494 (2001).

CORE Discussion Papers

2010/2. Yu. NESTEROV. Efficiency of coordinate descent methods on huge-scale optimization problems.
