• No results found

An alternating linearization bundle method for a class of nonconvex nonsmooth optimization problems

N/A
N/A
Protected

Academic year: 2020

Share "An alternating linearization bundle method for a class of nonconvex nonsmooth optimization problems"

Copied!
23
0
0

Loading.... (view fulltext now)

Full text

(1)

R E S E A R C H

Open Access

An alternating linearization bundle

method for a class of nonconvex nonsmooth

optimization problems

Chunming Tang

1

, Jinman Lv

1

and Jinbao Jian

2*

*Correspondence:

[email protected]

2College of Science, Guangxi

University of Nationalities, Nanning, P.R. China

Full list of author information is available at the end of the article

Abstract

In this paper, we propose an alternating linearization bundle method for minimizing the sum of a nonconvex function and a convex function, both of which are not necessarily differentiable. The nonconvex function is first locally “convexified” by imposing a quadratic term, and then a cutting-planes model of the local

convexification function is generated. The convex function is assumed to be “simple” in the sense that finding its proximal-like point is relatively easy. At each iteration, the method solves two subproblems in which the functions are alternately represented by the linearizations of the cutting-planes model and the convex objective function. It is proved that the sequence of iteration points converges to a stationary point. Numerical results show the good performance of the method.

Keywords: Bundle method; Alternating linearization; Local convexification; Global convergence

1 Introduction

In this paper, we consider the structured nonconvex minimization problem

min

x∈Rn

F(x) :=f(x) +h(x), (1)

wheref :RnRis possibly a nonconvex nonsmooth function andh:Rn(–,] is a closed proper convex function.

Problems of the form (1) often appear in practice, such as signal processing, image re-construction, engineering, optimal control, and so on. Three typical examples are given below.

Example1 (Unconstrained transformation of a constrained problem) Consider the con-strained problem

minf(x) :xC, (2)

(2)

where f is possibly a nonsmooth nonconvex function andC is a convex subset ofRn. Problem (2) can be written equivalently as

min

x∈Rnf(x) +ıC(x), (3)

whereıC is the indicator function ofC, i.e.,ıC(x) equals 0 onCand infinity elsewhere. Clearly, problem (3) is a special case of problem (1) withh(x) =ıC(x). We note that the proximal point ofıCcan easily be calculated or even has a closed-form solution ifChas some special structure.

Example2 (Nonconvex regularization of a convex function) Consider thelq(0 <q< 1) regularization problem

min

x∈Rn 1

2Axb 2+λx

q, (4)

which has many practical applications in compressed sensing and imaging science (see e.g., [1]), wherexq= (

n

i=1|xi|q)1/q. The objective function of problem (4) is also the sum of a convex function and a nonconvex function.

Example3 (Convex regularization of a nonconvex function) Hare et al. [2] studied the function of the form

F(x) = n

i=1 fi(x)+

1 2x

2, (5)

wherefi(x) :RnR,i= 1, . . . ,nare Ferrier polynomials defined as

fi(x) =ix2i – 2xi+ n

j=1 xj.

It is well known thatf(x) =ni=1|fi(x)|is a nonconvex nonsmooth function, andh(x) = 1

2x

2is a simple convex function.

(3)

In this paper, we consider to minimize the sum of a nonconvex function and a convex function with the form of (1). In particular, we assume thatf is lower-C2 andhis “sim-ple” in the sense that minimizinghplus a quadratic term is relatively easy. The method presented in this paper can be viewed as a generalized version of the methods given in [9] and [13]. On one hand, we generalize the method of [9] from minimizing the sum of two convex functions to the sum of a nonconvex function and a convex function. On the other hand, we generalize the method of [13] from minimizing a single nonconvex nonsmooth function to the sum of two functions.

Our method will produce three sequences of points:{z}, {y}and{xk()}, where{z}

is the sequence of proximal points,{y}is the sequence of trial points, and{xk()}is the sequence of stability centers (i.e.,xk()∈ {y}is the “best” point obtained so far for iteration , which will be abbreviated asxkif there is no confusion). More precisely, our method will alternately solve the following two subproblems:

z+1:=arg min ϕˇ (·) +h¯

–1(·) +

1

2μ·–x k2

, (6)

y+1:=arg min ϕ¯ (·) +h(·) +1

2μ·–x k2

, (7)

whereϕˇ is a cutting-planes model [14, 15] of the local convexification function off at iteration, which is based on the idea of the redistributed proximal bundle method in [13] and will be made more precise later;h¯–1is a linearization ofhat iteration– 1;ϕ¯ is a linearization ofϕˇ ;μis the proximal parameter. Our convergence analysis shows that, under suitable assumptions, any accumulation point of the sequence{xk}is a stationary point ofFif there is an infinite number of serious steps; otherwise, the last stability center is a stationary point ofF.

This paper is organized as follows. In Sect. 2, we review some basic definitions and re-sults required for this work. In Sect. 3, we present the alternating linearization bundle method for problem (1). Section 4 examines the convergence properties of the algorithm. Some preliminary numerical results are given in Sect. 5. The Euclidean inner product in

Rnis denoted byx,y=xTy, and the associated norm by · .

2 Preliminaries

In this section, we recall some basic definitions and results that are closely relevant to our method, which can be found in [13, 16, 17].

– Thelimiting subdifferentialoff atx¯is defined by

∂f(x) :=¯ lim

x→¯xf(x)supf(x)¯ ˆf(x),

whereˆf(x)¯ is theregular subdifferentialdefined by

ˆ

∂f(x) :=¯ g∈Rn:lim

x→¯xinfx=x¯

f(x) –f(x) –¯ g,xx¯ xx¯ ≥0

.

(4)

– The functionf isprox-boundedif there existsR≥0such that the function

f(·) +12R · 2is bounded below. The corresponding threshold is the smallestrpb0

such thatf(·) +12R · 2is bounded below for allR>rpb.

– The functionf islower-C2on an open setVif for eachx¯V there is a neighborhood Vofx¯upon which a representationf(x) =maxtTft(x)holds, whereT is a compact

set and the functionsftare of classC2onVsuch thatft,ft and2ftdepend continuously on(t,x)T×V.

– Theproximal point mappingof the functionf at the pointx∈Rnis defined by

pRf(x) :=arg min ω∈Rn f(ω) +

1

2x 2

.

Lemma 1([16]) Suppose that the function f is lower-C2on V andx¯V.Then there exist

ε> 0,K> 0,andρ> 0such that

(i) for any pointx0and parameterRρthe functionf+12R ·–x02is convex and finite valued on the closed ballB¯ε(x)¯ ,and

(ii) the functionf is Lipschitz continuous with constantKonB¯ε(x)¯ .

Theorem 1([16]) Suppose that the lower semicontinuous function f is prox-bounded with threshold rpband lower-C2 on V.Letx¯V and letε> 0,K> 0andρ> 0be given by Lemma 1.Then x is a stationary point of f if and only if¯ x¯=pRf(x)¯ for any R>Rx¯ :=

max{4K/ε,ρ,rpb}.

Assumption 1([13]) Givenx0∈RnandM0≥0, there exist an open bounded setOand a functionHsuch thatL0:={x∈Rn:f(x)f(x0) +M0} ⊂O, andHis lower-C2onOwith Hf onL0.

Theorem 2([13]) For a function f satisfying Assumption1,the following results hold:

(i) The level setL0is nonempty and compact.

(ii) There existsρid> 0such that,for anyρρidand any givenyL0,the function f +1

2ρ ·–y

2is convex onL0.

(iii) The functionf is Lipschitz continuous onL0.

3 The alternating linearization bundle method

3.1 Motivation and framework

The classic proximal point algorithm (see e.g. [18]) for solving problem (1) generates the new iterate by

y+1=arg min f(·) +h(·) +1 2R·–y

2

, (8)

whereR> 0 is the proximal parameter.

However, sincef is a nonconvex function, solving problem (8) may not be easy and is usually as difficult as the original problem (1). Therefore, we will tackle the difficulty via the following three steps.

(5)

ηandμ which are nonnegative and satisfyR=η+μ. Then the problem (8) (after

replacingybyxk) can be written as

y+1=arg min ϕ(·) +h(·) +1

2μ·–x k2

, (9)

where

ϕ(·) =f(·) +

1 2η·–x

k2

(10)

is called the local convexification function of f, since it is convex wheneverη is large enough (see Theorem 2).

2.Generate the cutting-planes model of ϕ. Letbe the current iteration index,yi,i J⊆ {0, 1, . . . ,}be trial points generated in the previous iterations, andgfi∂f(yi). Define

the cutting-planes model ofϕby

ˇ

ϕ(·) =max

iJ

fyi+1 2ηy

ixk2

+gfi+η

yixk,·–yi, (11)

wheregfi=gf(yi)∈∂f(yi). Therefore, we obtain an approximate version of problem (9) as follows:

y+1:=arg min ϕˇ (·) +h(·) +1

2μ·–x k2

. (12)

3.Apply the alternating linearization bundle strategy to solve problem(12). Since prob-lem (12) may still be difficult, motivating by the idea of the alternating linearization bun-dle [9], we consider to alternately solve the following two subproblems:

z+1:=arg min ϕˇ (·) +h¯–1(·) +

1

2μ·–x k2

, (13)

y+1:=arg min ϕ¯ (·) +h(·) +1

2μ·–x k2

. (14)

The above two subproblems are much easier to solve, whose objective functions are alter-nately represented by linear models ofh(·) andϕˇ (·), respectively.

3.2 Further description via bundle terminologies

Bundle methods [19–21] are among the most robust and reliable methods to solve gen-eral nonsmooth optimization problems, which can be considered stabilized variants of cutting-planes method [14, 15]. In general, for a convex functionh, bundle methods store the trial pointsyi,iJwith their function values and subgradients in a bundle of

infor-mation:

iJ

(6)

and a pointxk:=xk() (calledstability center) which is the “best” point obtained so far. A storage-saving form of (15) (refer to the current stability centerxk) is given by

iJ

ei,kh ,ghi

eih,kh

xk,

where ∂ehis thee-subdifferential ofhin convex analysis, and ei,kh are the linearization errors ofhdefined by

ei,kh =hxkhyi+gi h,xkyi

. (16)

Following the notations above, the bundle information of the functionϕ(·) can be writ-ten as (see also [13]):

iJ

ei,kf ,dki,ki,gfi with ⎧ ⎪ ⎪ ⎪ ⎪ ⎪ ⎨ ⎪ ⎪ ⎪ ⎪ ⎪ ⎩

ei,kf =f(xk) – (f(yi) +gi

f,xkyi), dki =12yixk2,

gi

f∂f(yi),

ki =yixk,

(17)

whereei,kf anddk

i are the linearization errors off and12 ·–xk2, respectively,gfiis a sub-gradient off atyi, andki is the gradient of 1

2 ·–x

k2atyi. From (17), we know thatei,k f , dikandki depend on the pointxk, so they should be updated whenever a new stability center is generated (details are given below).

By the optimality conditions of subproblem (13), there exists a multiplier vector (αi,iJ)∈Ssuch that

z+1=xk– 1

μ

iJ

αigfi+ηki+gh–1

, (18)

whereSdenotes the unit simplex inR|J|andg–1

h =∇ ¯h–1(z+1).

As iterations go along, the number of elements in the bundle may increase infinitely, which could lead to serious problems with storage and computation. The subgradient ag-gregation strategy [22] is the most popular and efficient way to overcome such a difficulty. We use the notationg

η to denote the aggregate subgradient, i.e.,

g η :=

iJ

αigfi+ηki∂ϕˇ z+1. (19)

Define the strongly active set of subgradients by

Jact:=

iJ:αi> 0

.

Then the corresponding aggregate bundle elements are given by

ef,dk,k,g

f

:=

iJ

αiei,kf ,dki,ki,gfi=

jJact

(7)

Therefore

=

iJ

αigfi+ηki

=gf+ηk=μ

xkz+1–gh–1.

Here, as in [13, 23], we use negative index –to express the aggregate bundle elements, henceJ⊆ {–, –+ 1, . . . , 0, 1, . . . ,– 1,}in general.

By making use of the notations above, the cutting-planes modelϕˇ in (11) can be rewrit-ten as:

ˇ

ϕ(·) =fxk+max

iJ

ei,kf +ηdki+gif+ηki,·–xk. (21)

Note that, for alljJact

we have

ˇ

ϕz+1=fxkej,kfηdjk+gfj+ηkj,z+1–xk, (22)

and the aggregate model ofϕˇ inJis

˜

ϕ(·) =f

xkef,k–ηdk+

gf+ηk,·–xk.

For a new stability centerxk+1, the bundle elements can be updated by (see [13])

ei,k+1f =ei,kf +fxk+1fxk+gfi,xk+1xk,

dik+1=dik+1 2x

k+1xk2+k

i,xk+1xk

,

k+1i =ki +xkxk+1.

(23)

On the other hand, sincefis possibly nonconvex, the linearization errorsei,kf +ηdik,iJ

may be negative, and therefore the modelϕˇ is not necessarily a lower approximation to

ϕ. In the case ofei,kf +ηdki ≥0, one has

gfi+ηkiei,k f +ηdkiϕˇ

xk. (24)

In order to ensure that the linearization errors are all nonnegative, the convexification parameterηshould be adjusted to asymptotically estimate the ideal convexity threshold

ρidin Theorem 2. Hare et al. [2] suggested a lower bound forηas follows:

ηmin := max

iJ,dik>0 –e

i,k f

dki , (25)

which guarantees thatei,kf +ηdki ≥0 for alliJwheneverηηmin .

Finally, in our algorithm, we define the predicted descentδand the linearization error

εas follows:

δ:=fxk+1 2ηy

(8)

ε:=Fxkϕ¯xk+h¯

xk. (27)

For a fixed parameterκ∈(0, 1), adescent stepis taken if

Fy+1≤Fxkκδ, (28)

holds, and then update the stability centerxk+1=y+1. Otherwise, anull stepoccurs, and then the aggregate linearization and the new linearization are used to produce a better modelϕˇ+1.

3.3 The algorithm

Algorithm 1

Step0. (Initialization). Select a starting pointy0and setx0=y0. Set parametersM> 0, R0> 0,κ∈(0, 1),≥0, and≥1. Initialize the iteration counter= 0, the descent step counterk:=k() = 0 withi0= 0. Set (μ0,η0) = (R0, 0) andJ0:={0}. Computef(x0),gf0∈

∂f(x0) and the bundle information (e0,0

f ,d00,00) := (0, 0, 0). Sets–1h =gh0∈∂h(x0). Step1. Findz+1by solving subproblem (13), and set

¯

ϕ(·) =ϕˇ z+1+s ϕ,·–z

+1 with s ϕ=μ

xkz+1s–1

h . (29)

Step2. Findy+1by solving subproblem (14), and set

¯ h(·) =h

y+1+s

h,·–y+1

with s

h=μ

xky+1s

ϕ. (30)

Step3. (Stopping criterion). Computef(y+1),h(y+1),g+1 f∂f(y

+1) andg+1 h∂h(y

+1). Ifδ, then STOP. Otherwise, compute the new bundle elements by

k+1:=y+1xk, dk

+1:=k+1 2

/2,

ef+1,k:=fxkfy+1+g+1 f ,k+1

.

Select a new index setJ+1satisfying

J+1⊇ {+ 1,ik} and ⎧ ⎨ ⎩

either J+1⊇Jact,

or J+1⊇ {–}.

(31)

Step4. (Descent test). If (28) holds, declare a descent step, setk(+ 1) =k+ 1,ik+1=

+ 1,xk+1=y+1, and update the bundle elements by (23). Otherwise, declare a null step, and setk(+ 1) =k().

Step5. (Updateη). Update the convexification parameter by ⎧

⎨ ⎩

η+1:=η ifηmin+1≤η,

η+1:=ηmin+1 and R+1:=μ+η+1 otherwise.

(32)

Step6. (Updateμ). IfF(y+1) >F(xk) +M, then the objective increase is unacceptable, letμ+1:=μand loop to Step 1; otherwise, setμ+1:=μ.

(9)

Remark1 (1) The predicted descentδand the linearization errorεare nonnegative (the details are given below); (2) in Step 6, ifF(y+1) >F(xk) +Mholds, then the current model is considered as “bad”, so we should become more “conservative”, and therefore increase the proximal parameterμby settingμ+1=μ. In the next section, we will prove that the number of increasingμis finite; (3) the parametersμ,ηin the algorithm will be stable eventually.

Lemma 2 The predicted descentδ and the linearization errorε are nonnegative,and

satisfy

ε=δR+μ 2μ2 s

2, with s=s ϕ+s

h. (33)

Proof In Step 2, from (30) and (18) we knowg–1

h =sh–1, hence

g η =μ

xkz+1g–1 h =μ

xkz+1s–1

h =. (34)

Next, we proveδ≥0 andε≥0. From (26) and (29) we have

δ=fxk+1 2ηy

+1xk2+hxkϕ¯ y+1+h¯

y+1

=fxk+1 2ηy

+1xk2

+hxkϕˇ z+1s

ϕ,y+1–z+1

hy+1

=fxkϕˇ z+1–,y

+1z+1+1 2ηy

+1xk2+hxkhy+1. (35)

Letj= –in (22) and from (34) we can obtain

fxkϕˇ z+1–,y

+1z+1

=fxkfxkefηdk+

gf+ηk,z+1–xk,y

+1z+1

=ef+ηdk

,z

+1xks ϕ,y

+1z+1

=ef+ηdk+

,xky

+1. (36)

On the other hand, from (30), we have

–hy+1= –hy+1–sh,xky+1+sh,xky+1

= –h¯

xk+sh,xky+1. (37)

Hence, by combining (36), (37) and (30), (35) can be written as

δ=ef+ηdk+

,xky +1+1

2ηy

+1xk2+hxkh¯

xk

+s

h,xky+1

(38)

=ef+ηdk+

μxky+1,xky+1+1 2ηy

+1xk2

+hxkh¯

xk

=ef+ηdk+

R+μ

2 y

+1xk2+hxkh¯

(10)

In Step 5, the update forηis done to ensureηηmin

for all iterations, so thatef+ηdk

0. Therefore, the predicted descentδ≥0 sincehis convex. Forε, from (27), one has

ε=Fxkϕ¯ xk+h¯

xk

=fxk+hxkϕˇ z+1–,xkz +1h¯

xk

=fxkϕˇ z+1s ϕ,x

kz+1+hxkh¯

xk. (40)

Similar to (36), we have

fxkϕˇ z+1–,xkz +1

=fxkfxkefηdk+

gf+ηk,z+1–xk

,xkz +1

=ef+ηdk

,z

+1xks ϕ,xkz

+1

=ef+ηdk. (41)

Thus, we have

ε=ef+ηdk+h

xkh¯

xk≥0. (42)

Equation (33) follows immediately from (39) and (42).

From (33), We know thatδε. Therefore, ifδ, thenε. So, we only useδ

as the termination criterion in Step 5.

Lemma 3 The vectors s ϕand s

hof(30)and(29)are in fact subgradients,i.e.,

∂ϕˇ

z+1 and sh∂h

y+1. (43)

Furthermore,we have

¯

ϕ≤ ˇϕ and h¯h. (44)

Proof Letφfandφhdenote the objectives of (13) and (14), respectively, i.e.,

φf(·) :=ϕˇ(·) +h¯–1(·) + 1

2μ·–x k2

, (45)

φh(·) :=ϕ¯(·) +h(·) +1

2μ·–x

k2. (46)

By (13), (29) and the optimality condition of (45), we have

(11)

which impliess

ϕ∂ϕˇ(z+1). Similarly, by (14) and the optimality condition of (46), we

obtain

0∈∂hy+1+s ϕ+μ

xky+1=hy+1s

h,

which impliessh∂h(y+1). So (43) holds.

Equation (44) follows immediately from (43).

4 Convergence

In this section, we will study the convergence properties of Algorithm 1. Firstly, based on the objective function of problem (1), we need to slightly modify Assumption 1 as follows.

Assumption 2 Givenx0∈RnandM0≥0, there exist an open bounded setOand a func-tionHsuch thatL0:={x∈Rn:F(x)F(x0) +M0} ⊂O, andH is lower-C2onOwith Hf onL0.

For convenience, we assume that Assumption 2 holds throughout the rest of convergence analysis.

In addition, from [24] we know that, iff is a locally Lipschitz continuous function, then the subgradients off are locally bounded, i.e.,

gfis bounded ifyis bounded. (47)

Further, as in [9], it follows that the model subgradientss

ϕin (43) satisfy

is bounded ifyis bounded. (48)

Remark2 Note that (47) implies that{g

ϕ:=gf+ηi} (gϕ∂ϕˇ) is bounded if{y}is

bounded, since {

i} in (17) is bounded if {y

} is bounded. Sinces

ϕ∂ϕˇ , then s ϕ ∈ conv{gϕj}jJ, thus we haves

ϕ ≤maxj=1g j

ϕ, and the modelϕˇ satisfies condition (48)

automatically when (47) holds.

The following lemma shows the properties of the model functionϕˇ, whose proof can

be found in [13, 16].

Lemma 4 For the model functionϕˇ and convexification parameterη,we have

(i) ϕˇ is a convex function. (ii) Ifηηmin,then

ˇ

ϕxkfxk. (49)

(iii) Ifη+1=η,and eitherJ+1⊇JactorJ+1⊇ {–},then

ˇ

ϕ+1(·)≥ ˇϕz+1+,·–z +1

(12)

(iv) IfJ⊇ {},then

ˇ

ϕ(·)≥fy+1 2ηy

xk2+gf+ηyxk,·–y,

for someg

f∂f(y

).

(v) Ifηρid,then

ˇ

ϕ(ω)≤f(ω) +

1

2ηωx k2

for allωL0. (50)

From the updating rule in Step 5 of Algorithm 1, the convexification parameterηis either unchanged or increasing. The following lemma shows thatηcan be fixed in a finite number of iterations, whose proof can be found in [13].

Lemma 5 There exist an index1and a positive constantη> 0such that

ηη, for all1.

Lemma 6 Suppose that there exists an integer K such that,for allK,only null steps occur without increasingμ.Then the following results hold:

(i) The sequences

φfz+1=ϕˇ

z+1+h¯–1

z+1+1 2μz

+1xk2

K ,

φhy+1=ϕ¯ y+1+hy+1+1 2μy

+1xk2

K

are nondecreasing and convergent.

(ii) The sequences{y+1}and{z+1}are bounded,z+1y+10and z+2y+10as→ ∞.

Proof First, using partial linearizations of the subproblems to show (i) is hold. FixedK. By the definitions in (13) and (29), we haveϕ¯ (z+1) =ϕˇ (z+1) and

z+1=arg min φ¯f(·) :=ϕ¯(·) +h¯–1(·) +

1

2μ·–x k2

, (51)

from∇ ¯φf(z+1) = 0. Sinceφ¯

f is quadratic andφ¯

f(z

+1) =φ

f(z

+1), by Taylor’s expansion

¯

φf(·) =φ¯fz+1+∇ ¯φfz+1·–z+1+1 2μ·–z

+12

=φfz+1+1 2μ·–z

+12. (52)

Similarly, by the definitions in (14) and (30), we haveh¯(y+1) =h(y+1), and

y+1=arg min φ¯

h(·) :=ϕ¯ (·) +h¯(·) +

1

2μ·–x k2

(13)

¯

φh(·) =φhy+1+1

2μ·–y

+12. (54)

Next, to bound the objective values of the linearized subproblem (51) and (53) from above, we useϕ¯ ≤ ˇϕandh¯–1≤h,h¯hof (44) andϕˇ(xk)≤f(xk) in (ii) of Lemma 4

φfz+1+1 2μx

kz+12=φ¯

f

xk≤ ˇϕxk+hxkFxk, (55)

φhy+1+1 2μx

ky+12

=φ¯hxk≤ ˇϕxk+hxkFxk. (56)

From (14) and (51), we haveφ¯fφh. On the other hand, since only null step occurred, so xk+1=xk, the algorithm ensures thatμ=μ

+1, andϕ¯ ≤ ˇϕ+1by (iii) of Lemma 4, we can obtainφ¯

hφf+1. By (52) and (54), we see that

φfz+1+1 2μy

+1z+12

=φ¯fy+1φ

h

y+1, (57)

φhy+1+1 2μz

+2y+12

=φ¯hz+2≤φf+1z+2. (58)

In particular, from (57) and (58), we have the relation

φfz+1φ

h

y+1φ+1 f

z+2

which implies that{φf(z+1)}

Kand{φh(y+1)}Kare nondecreasing sequences. Together with the bound ofF(xk) from (55) and (56), the convergence is established.

For (ii), we have proved the convergence of{φ

f(z+1)}and{φh(y+1)}in (i) whenK, so there must have a common limit, sayφ∞≤F(xk), such that

φfz+1→φ∞, φhy+1→φ∞ (59)

and we havez+1y+10 andz+2y+10 from (57) and (58),{y+1}and{z+1} are bounded from (55) and (56). Then the sequences{g

f}and{s

ϕ}are bounded by (47)

and (48).

The following lemma shows that the number of times of increasingμis finite.

Lemma 7 Suppose that ikJ,and let Nbe the number of times of increasingμ.Then

there exists a positive constant L such that

N

ln L2

0

ln

, (60)

whereais the smallest integer greater than or equal to a.As a result,there exists an index

2such that

(14)

Proof Letrbe the index corresponding to therth time thatμincreases, then whenr+ 1≤<r+1, we have

μ=0. (61)

SinceikJ, from (24), we obtain gik∂ϕˇ(xk) by writingi=ik, and it also holds that

gfik∂f(xk) from (21), sogik

f is bounded. Hence

(ϕ¯ +h)

xk

=arg min ϕ¯ (y) +h(y) +1

2μyx k2

¯ (y) +h(y) +1

2μyx k2

≤ ¯ϕxk+hxk

¯ xk+gfik,yxk+hxk+ghk,yxk

+1

2μyx

k2≤ ¯ϕxk+hxk

= ygik f ,yx

k+gk h,yxk

+1

2μyx k2

≤0

= y1

2μyx

k2gik

f +ghk,yxk

y1

2μyx k2

gfik+ghkyxk

= yyxk≤2g ik

f + 2ghk

μ

.

Whenr+ 1≤<r+1, ify+1is a null step, we knowgf is bounded from Lemma 6, else y+1is a descent step, and the corresponding subgradientgik(+1)∈f(xk+1) is also bounded.

Therefore, there exists a constant L> 0 such that max{gfik,g+1

f ,gkh,gh+1} ≤ L2. Thus we have

y+1=(ϕ¯ +h)

xkyyxk≤2L

μ

.

This together with (61) shows that Fy+1Fxk

=fxk+g+1

f ,y+1–xk

fxk+hxk+g+1

h ,y+1–xk

hxk

g+1

f +gh+1y+1–xk ≤2·L

2 · 2L

rμ 0

M.

Thus, if

r≥ln L2 0

(15)

then

fy+1fxk+M,

r+ 1≤r+1.

This means that the number of timesNof increasingμsatisfies (60). The latter part of

the lemma follows immediately from the above result.

Theorem 3 Ifδ= 0andηρid,then xkis a stationary point of F.

Proof From (33), we have the relation

δ=ε+R+μ 2μ2

s2.

Ifδ= 0, thenε= 0 ands= 0, and therefore

xk=y+1=(ϕ¯ +h)

xk. (62)

From the last result of Theorem 1, we know that xk is a stationary point ofϕ¯

+h. In

addition, fromε= 0 in (27), we have

fxk+hxk=ϕ¯xk+h¯

xk.

This together withh¯(xk)≤h(xk) shows that

fxk≤ ¯ϕxk. (63)

On one hand, forωL0, ifηρid, we obtain by (62) and (63)

Fxk=fxk+hxk≤ ¯ϕxk+hxk≤ ¯ϕ(ω) +h(ω) +1

2μωx

k2. (64)

From the convexity ofϕˇ and (50), we have

¯

ϕ(ω)≤ ˇϕ(ω)≤f(ω) +1

2ηωx k2.

So, (64) can be written as

Fxkf(ω) +1

2ηωx k2

+h(ω) +1

2μωx k2

. (65)

On the other hand, forω∈/L0, from (28) we can obtain

FxkFx0≤Fx0+MF(ω) =F(ω) +1

2x

k2. (66)

Combining (64) and (66), we have

FxkF(ω) +1

2x

(16)

Hence

xk=pRF

xk,

which together with Theorem 1 shows thatxkis a stationary point ofF.

We are now in a position to present the main convergence result of our algorithm. As usual in bundle methods, two cases are considered: the algorithm generates finite number of descent steps; and the algorithm generates infinite number of descent steps. We set the stopping parameter= 0.

Theorem 4 Letη¯be stabilized value for the convexification parameter sequence and as-sumeη¯≥ρid.Then the following mutually exclusive situations hold:

(i) Algorithm1generates finite number of descent steps followed by infinitely many null steps.Letx¯be the last stability center.Theny+1→ ¯x,andx¯is a stationary point of F.

(ii) Algorithm1generates an infinite sequence{xk}of stability centers.Then any accumulation point of{xk}is a stationary point ofF.

Proof For (i), without loss of generality, we may assumeη=η,μ=μ, andR=R

through-out. As in Lemma 6, for the bounded sequences{y}and{z}we showed thaty–z0

andz+1y0 as→ ∞. Thereforeyipasi→ ∞implieszipandzi+1p asi→ ∞. ForωL0nearp, by (50) and the convexity ofh, we have

F(ω) =f(ω) +h(ω)

≥ ˇϕi(ω) – 1

2η¯ωx¯ 2+h(ω)

≥ ¯ϕi(ω) – 1

2η¯ωx¯ 2+h¯

i–1(ω)

=ϕˇizi+1+si

ϕ,ωz i+11

2η¯ωx¯

2+hyi+si–1 h ,ωy

i. (67)

Letxk=x,¯ μ=μ¯, from (29) and the boundedness ofs

ϕ+sh–1=μ(x¯–z+1), we know

si

ϕ,ωz

i+1+si–1 h ,ωy

i

=si

ϕ,ωz

i+1+si–1 h ,ωz

i+1+zi+1yi

=μ¯x¯–zi+1,ωzi+1+shi–1,zi+1yi. (68)

Note that

–1

2η¯ωx¯

2 (69)

= –1 2η¯ωz

i+1+zi+1x¯2

= –1 2η¯ωz

i+12 –1

2η¯z

i+1x¯2

(17)

Combining (68) and (70), (67) can be written as

F(ω) =ϕˇizi+1–1 2η¯z

i+1x¯2

+hyi+ (η¯+μ¯)x¯zi+1,ωzi+1

+shi–1,zi+1yi1 2η¯ωz

i+12.

By Claim (iv) of Lemma 4 written with=iforω=zi+1, we have the following inequality:

ˇ

ϕizi+1–1 2ηz

i+1x¯2fyi+1 2ηy

ix¯21 2ηz

i+1x¯2

+gi f +η

yix¯,zi+1yi.

Then

F(ω) =f(ω) +h(ω)

≥ ˇϕizi+1–1 2η¯z

i+1x¯2+hyi1 2η¯ωz

i+12

+ (η¯+μ¯)x¯–zi+1,ωzi+1+si–1 h ,z

i+1yi

fyi+hyi+1 2η¯y

ix¯21 2η¯z

i+1x¯21 2η¯ωz

i+12

+gi f +η¯

yix¯,zi+1yi+ (η¯+μ¯)x¯zi+1,ωzi+1

+shi–1,zi+1yi.

Froms–1

h =μ(x¯–z

+1) –s

ϕ, the bounded sequence{μ(x¯–z+1)},{},{gf}and{y

}, we

know that{s–1 h }and{g

i

f +η(yix)¯}are bounded. Taking the limit asi→ ∞, and using the fact thatf is continuous atp, we obtain

F(ω) =f(ω) +h(ω)

≥ lim

i→∞ϕˇi

zi+11

2η¯px¯

2+h(p) –1

2η¯ωp

2+R¯¯xp,ωp

f(p) +h(p) –1

2η¯ωp

2+R¯¯xp,ωp

=F(p) –1

2η¯ωp

2+R¯¯xp,ωp, (71)

for all ωL0 near p. Since 12ηωp2 =o(ωp), the last inequality means that R(x¯–p)∈∂F(p) by Definition 8.3 in [17], which impliesp=pR¯F(x) by Theorem 1. Since¯ fis continuous and condition (50) holds at all accumulation points of{y}, then the entire

se-quence{y}converges to the proximal pointp

¯

RF(x). Furthermore, evaluating the relations¯ atω=pshows that the following equation holds for the entire sequence by Theorem 2 in [16]:

lim

i→∞ϕˇi

zi+1=f(p) +1

2ηpx¯

(18)

So as→ ∞, the whole sequence

yp=pR¯F(x)¯ withϕˇ

z+1→f(p) +1

2ηpx¯ 2.

Thus, from (26) we have

δ=f(x) +¯ 1 2η¯y

+1x¯2+h(x) –¯ ϕ¯ y+1+hy+1

=f(x) +¯ 1 2η¯y

+1x¯2

+h(x) –¯ ϕˇ z+1s ϕ,y

+1z+1hy+1

f(x) +¯ 1

2η¯px¯

2+h(x) –¯ f(p) –1

2η¯px¯ 2h(p)

=f(x) +¯ h(x) –¯ f(p) –h(p)

=F(x) –¯ F(p).

Since null step does not satisfy the descent test in Step 4 of the algorithm, we haveF(y+1) > F(x) –¯ κδ. Taking the limit as→ ∞gives the relationF(p)F(x) –¯ κ(F(x) –¯ F(p)), so F(x)¯ ≤F(p) becauseκ∈(0, 1). Butp=pR¯F(x) implies¯

F(p) +R¯px¯2≤F(x),¯

which shows thatx¯=p. That is,x¯=pR¯F(x), so¯ x¯is a stationary point ofFfrom Theorem 1. For (ii),L0is a compact set, and the sequence{xk} ⊂L0, so it has an accumulation point, i.e., there exists some infinite setKsuch thatxk→ ˆxL0asKk→ ∞. Sincexk+1=yik+1,

letjk=ik+1– 1 so thatxk+1=¯(ϕ¯jk+h)(xk). The descent test

Fxk+1Fxkκδjk

implies that, ask→ ∞, eitherF(xk), orδ

jk→0. By Assumption 2,F(x

k) is bounded

below, therefore,δjk →0. From (39), this means that

yjk+1xk, e–jk

f +η¯dk–jk, h

xkhjk¯ xk

must converge to 0. By

zjk+1xkzjk+1yjk+1+yjk+1xk

and

zjk+1yjk+1→0

in Lemma 6, we have

(19)

By (21),ϕˇjk(z

jk+1) –f(xk)0 ask→ ∞, from (29) we know that

¯

ϕjk

yjk+1=ϕˇjk

zjk+1+sjkϕ,yjk+1zjk+1

.

Therefore,

¯

ϕjk

yjk+1–fxk→0

ask→ ∞. Consider nowkK. Sincexk+1xk=yjk+1xk0, bothxk+1andxk converge toxinfasKk→ ∞with

¯

ϕjk

xk+1f(x).ˆ

And fromxk+1=¯(ϕ¯jk+h)(xk),η¯≥ρidand (50), for allωL0,

¯

ϕjk

xk+1+hxk+1+1 2μ¯x

k+1xk2

≤ ¯ϕjk(ω) +h(ω) + 1

2μ¯ωx k2

≤ ˇϕjk(ω) +h(ω) + 1

2μ¯ωx k2

f(ω) +h(ω) +1 2R¯ωx

k2.

Therefore, taking the limitkK, we have

f(x) +ˆ h(x)ˆ ≤f(ω) +h(ω) +1

2R¯ωxˆ

2, for allωL0.

On the other hand,xinfL0and for anyω/L0, it follows

F(x)ˆ ≤Fx0≤Fx0+M<F(ω) <F(ω) +1

2R¯ωxˆ 2.

Hence,

F(x)ˆ ≤F(ω) +1

2R¯ωxˆ

2, for allωRn.

Therefore,xˆ=pR¯F(x) withˆ R¯≥ρid, hencexˆis a stationary point ofFfrom Theorem 1.

5 Numerical results

This section aims to test the practical effectiveness of Algorithm 1. We tested a set of nine problems. The first set of seven problems are generalized from the unconstrained versions in [25] by imposing suitable constraints, the second set of two nonconvex unconstrained problems are taken from [13, 26] which are the sum of a nonconvex function and a convex function.

(20)

form of (2) withC={x:xab}, wherea∈Rnand 0 <bRare given below. These problems are transformed to the form of (3) by using the indicator function. The detailed data for the seven problems are listed below. For simplicity, we use the MATLAB nota-tions:ones(p,q)andzeros(p,q)denotep-by-qmatrices of ones and zeros, respec-tively.

CB2:f(x) =max{x21+x42, (2 –x1)2+ (2 –x2)2, 2ex2–x1},y0= (3, 3)T,a= (0, 0)T,b= 1.

CB3:f(x) =max{x4

1+x22, (2 –x1)2+ (2 –x2)2, 2ex2–x1},y0= (3, 3)T,a= (3, 3)T,b= 1.

LQ:f(x) =max{–x1–x2, –x1x2+x2

1+x22– 1},y0= (1, 1)T,a= (1, –1)T,b= 1.

Mifflin1:f(x) = –x1+ 20max{x21+x22– 1, 0},y0= (1.5, 0.5)T,a= (–2, 2)T,b= 1.

Rosen-Suzuki:f(x) =max1i4fi(x),y0= (1, 2.1, –3, –0.9)T,a= (1, 2, 3, 4)T,b= 2, with

f1(x) =x21+x22+ 2x23+x24– 5x1– 5x2– 21x3+ 7x4,

f2(x) =f1(x) + 10x21+x22+x23+x24+x1x2+x3x4– 8,

f3(x) =f1(x) + 10x21+ 2x22+x23+ 2x24x1x4– 10,

f4(x) =f1(x) + 102x21+x22+x23+ 2x1–x2x4– 5.

Shor: f(x) =max1≤i≤10{di 5

j=1(xj–cij)2)}, y0=zeros(10,1), a=zeros(10,1), b= 3,d= (1, 5, 10, 2, 4, 3, 1.7, 2.5, 6, 3.5)T,

C= ⎛ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎝

0 2 1 1 3 0 1 1 0 1

0 1 2 4 2 2 1 0 0 1

0 1 1 1 1 1 1 1 2 2

0 1 1 2 0 0 1 2 1 0

0 3 2 2 1 1 1 1 0 0

⎞ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎠ .

MAXL:f(x) =max1≤i≤20|xi|,a= (–ones(1,10),ones(1,10))T,b= 4,y0= (1, 1.1, 3, 1.1, 5, 1.1, 7, 1.1, 9, 1.1, –11, 0.1, –13, 0.1, –15, 0.1, –17, 0.1, –19, 0.1)T.

The second set of two problems are:

Regular:F(x) =ni=1|fi(x)|+12x2, where

fi(x) :=ix2i – 2xi+ n

j=1

xj, i= 1, 2, . . . ,n

are the Ferrier polynomials.

L-Mifflin:F(x) = 2(x2

1+x22– 1) + 1.75|x21+x22– 1|.

The nonconvexity of the above two problems can be seen from Fig. 1.

(21)
[image:21.595.117.480.81.201.2] [image:21.595.118.477.252.411.2]

Figure 1The 3D images of Regular and L-Mifflin

Table 1 Numerical results for the first set of seven problems

Problem n Algorithm NI ND NF F

CB2 2 Algorithm 1 2 1 2 3.343146

PPBM 2 1 2 3.343146

CB3 2 Algorithm 1 9 4 9 24.479795

PPBM 11 8 11 24.479795

LQ 2 Algorithm 1 15 12 15 –0.999989

PPBM 18 17 18 –0.999996

Mifflin1 2 Algorithm 1 3 2 3 48.153612

PPBM 4 2 4 48.153612

Rosen-Suzuki 4 Algorithm 1 7 6 7 39.715617

PPBM 9 8 9 39.715617

Shor 5 Algorithm 1 5 1 5 50.250278

PPBM 4 1 4 50.250278

MAXL 20 Algorithm 1 19 6 19 0.552786

PPBM 22 2 22 0.552786

Table 2 Numerical results for “Regular” compared with RedistProx

ALBM RedistProx

n Fδ∗ NF Fδ∗ NF

1 0.000000 0.000001 52 0.500000 0.000010 5

2 0.012188 0.038685 35 0.000188 0.002720 301

3 0.000001 0.000001 88 0.000006 0.002360 90

4 0.000000 0.000001 105 0.000005 0.000682 301

5 0.000000 0.000001 134 7.948708 1.179115 3

6 0.000001 0.000001 220 0.000002 0.000001 289

7 0.000004 0.000007 301 0.008876 0.037044 103

8 0.064289 0.004921 301 0.000043 0.000240 301

9 0.000920 0.003238 301 0.281491 0.020101 301

10 0.004182 0.014245 301 0.426285 0.056585 301

Table 3 Numerical results for “Regular” and “L-Mifflin”

Regular L-Mifflin

x0 xFxF

[image:21.595.117.480.442.572.2] [image:21.595.116.479.605.674.2]

[1, 1] [0.0267, 0.0283] 0.003103 [–0.1362e–009, –0.1362e–009] –0.250000

[–1, –1] [0.0208, 0.0204] 0.001707 [0.2353e–009, 0.2353e–009] –0.250000

[10, 10] [0.0779, 0.1313] 0.089998 [0.4776e–008, 0.4776e–008] –0.250000

[–10, –10] [0.0626, 0.0635] 0.015955 [–0.4817e–008, –0.4817e–008] –0.250000

(22)

Acknowledgements

Project supported by the National Natural Science Foundation (11761013, 11771383) and Guangxi Natural Science Foundation (2013GXNSFAA019013, 2014GXNSFFA118001, 2016GXNSFDA380019) of China.

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

All authors read and approved the final manuscript. CT mainly contributed to the algorithm design and convergence analysis; JL mainly contributed to the convergence analysis and numerical results; and JJ mainly contributed to the idea of the method and algorithm design.

Author details

1College of Mathematics and Information Science, Guangxi University, Nanning, P.R. China.2College of Science, Guangxi

University of Nationalities, Nanning, P.R. China.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Received: 2 March 2018 Accepted: 28 March 2018

References

1. Chartrand, R.: Exact reconstruction of sparse signals via nonconvex minimization. IEEE Signal Process. Lett.14(10), 707–710 (2007)

2. Hare, W., Sagastizábal, C., Solodov, M.V.: A proximal bundle method for nonsmooth nonconvex functions with inexact information. Comput. Optim. Appl.63(1), 1–28 (2016)

3. Attouch, H., Bolte, J., Redont, P., Soubeyran, A.: Proximal alternating minimization and projection methods for nonconvex problems: an approach based on the Kurdyka–Łojasiewicz inequality. Math. Oper. Res.35(2), 438–457 (2010)

4. Bolte, J., Sabach, S., Teboulle, M.: Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Math. Program.146, 459–494 (2014)

5. Dinh, Q.T., Diehl, M.: Proximal methods for minimizing the sum of a convex function and a composite function (2011). arXiv:1105.0276

6. Eckstein, J., Svaiter, B.F.: A family of projective splitting methods for the sum of two maximal monotone operators. Math. Program.111(1), 173–199 (2007)

7. Goldfarb, D., Ma, S., Scheinberg, K.: Fast alternating linearization methods for minimizing the sum of two convex functions. Math. Program.141, 349–382 (2013)

8. Kiwiel, K.C.: A method for minimizing the sum of a convex function and a continuously differentiable function. J. Optim. Theory Appl.48(3), 437–449 (1986)

9. Kiwiel, K.C.: An alternating linearization bundle method for convex optimization and nonlinear multicommodity flow problems. Math. Program.130(1), 59–84 (2011)

10. Li, D., Pang, L., Chen, S.: A proximal alternating linearization method for nonconvex optimization problems. Optim. Methods Softw.29(4), 771–785 (2014)

11. Mine, H., Fukushima, M.: A minimization method for the sum of a convex function and a continuously differentiable function. J. Optim. Theory Appl.33(1), 9–23 (1981)

12. Tuy, H., Tam, B.T., Dan, N.D.: Minimizing the sum of a convex function and a specially structured nonconvex function. Optimization28, 237–248 (1994)

13. Hare, W.L., Sagastizábal, C.: A redistributed proximal bundle method for nonconvex optimization. SIAM J. Optim.

20(5), 2442–2473 (2010)

14. Cheney, E.W., Goldstein, A.A.: Newton’s method for convex programming and Tchebycheff approximations. Numer. Math.1, 253–268 (1959)

15. Kelley, J.E.: The cutting-plane method for solving convex programs. J. Soc. Ind. Appl. Math.8, 703–712 (1960) 16. Hare, W., Sagastizábal, C.A.: Computing proximal points of nonconvex functions. Math. Program.116(1), 221–258

(2009)

17. Rockafellar, R.T., Wets, R.J.B.: Variational Analysis. Springer, Berlin (1998)

18. Rockafellar, R.T.: Monotone operators and the proximal point algorithm. SIAM J. Control Optim.14(5), 877–898 (1976) 19. Lemaréchal, C.: An extension of Davidon methods to nondifferentiable problems. Math. Program. Stud.3, 95–109

(1975)

20. Wolfe, P.: A method of conjugate subgradients for minimizing nondifferentiable functions. Math. Program. Stud.3, 145–173 (1975)

21. Bonnans, J.F., Gilbert, J.C., Lemaréchal, C., Sagastizábal, C.: Numerical Optimization: Theoretical and Practical Aspects, 2nd edn. Springer, Berlin (2006)

22. Kiwiel, K.C.: Methods of Descent for Nondifferentiable Optimization. Lecture Notes in Mathematics, vol. 1133. Springer, Berlin (1985)

23. Kiwiel, K.C.: A method of centers with approximate subgradient linearizations for nonsmooth convex optimization. SIAM J. Optim.18(4), 1467–1489 (2008)

24. Kiwiel, K.C.: An algorithm for linearly constrained convex nondifferentiable minimization problems. J. Math. Anal. Appl.105(2), 452–465 (1985)

(23)

26. Bagirov, A.M., Karmitsa, N., Mäkelä, M.: Introduction to Nonsmooth Optimization: Theory, Practice and Software. Springer, Cham (2014)

Figure

Figure 1 The 3D images of Regular and L-Mifflin

References

Related documents