Convergence Analysis of a Proximal Point Algorithm for Minimizing Differences of Functions

(1)

Portland State University

PDXScholar

Mathematics and Statistics Faculty Publications and

Presentations

Fariborz Maseeh Department of Mathematics and

Statistics

2017

Convergence Analysis of a Proximal Point Algorithm for

Minimizing Differences of Functions

Thai An Nguyen

Institute of Research and Development, Duy Tan University

Mau Nam Nguyen

Portland State University, [email protected]

Let us know how access to this document benefits you.

Follow this and additional works at:

https://pdxscholar.library.pdx.edu/mth_fac

Part of the

Analysis Commons

This Post-Print is brought to you for free and open access. It has been accepted for inclusion in Mathematics and Statistics Faculty Publications and Presentations by an authorized administrator of PDXScholar. For more information, please [email protected].

Citation Details

Nguyen, Thai An and Nguyen, Mau Nam, "Convergence Analysis of a Proximal Point Algorithm for Minimizing Differences of Functions" (2017).Mathematics and Statistics Faculty Publications and Presentations. 167.

(2)

CONVERGENCE ANALYSIS OF A PROXIMAL POINT ALGORITHM FOR MINIMIZING DIFFERENCES OF FUNCTIONS

Nguyen Thai An1_{, Nguyen Mau Nam}2_,

Abstract. Several optimization schemes have been known for convex optimization problems. How-ever, numerical algorithms for solving nonconvex optimization problems are still underdeveloped. A progress to go beyond convexity was made by considering the class of functions representable as differences of convex functions. In this paper, we introduce a generalized proximal point algorithm to minimize the difference of a nonconvex function and a convex function. We also study conver-gence results of this algorithm under the main assumption that the objective function satisfies the Kurdyka - Lojasiewicz property.

Key words. DC programming, proximal point algorithm, diﬀerence of convex functions, Kurdyka - Lojasiewicz inequality.

AMS subject classifications. 49J52, 49J53, 90C31.

1 Introduction

In this paper, we introduce and study the convergence analysis of an algorithm for solving optimization problems in which the objective functions can be represented as differences of nonconvex and convex functions. The structure of the problem under consideration is flexible enough to include the problem of minimizing a smooth function on a closed set or minimizing a DC function, where DC stands for Difference of Convex functions. It is worth noting that DC programming is one of the most successful approaches to go beyond convexity. The class of DC functions is closed under many operations usually considered in optimization and is quite large to contain many objective functions in applications of optimization. Moreover, this class of functions possesses beautiful generalized differentiation properties and is favorable for applying numerical optimization schemes; see [1–3] and the references therein.

A pioneer in this research direction is Pham Dinh Tao who introduced a simple algorithm called the (DCA) based on generalized differentiation of the functions involved as well as their Fenchel conjugates [4]. Over the past three decades, Pham Dinh Tao, Le Thi Hoai An and many others have contributed to providing mathematical foundation for the algorithm and making it accessible for applications. The (DCA) nowadays becomes a classical tool in the field of optimization due to several key features including simplicity, inexpensiveness, flexibility and efficiency; see [5–8].

1

Thua Thien Hue College of Education, 123 Nguyen Hue, Hue City, Vietnam ([email protected]). The research of Nguyen Thai An was supported by the Vietnam National Foundation for Science and Technology Development (NAFOSTED) under grant number 101.01-2014.37.

2

Fariborz Maseeh Department of Mathematics and Statistics, Portland State University, PO Box 751, Portland, OR 97207, United States ([email protected]). The research of Nguyen Mau Nam was partially supported by the USA National Science Foundation under grant DMS-1411817 and the Simons Foundation under grant #208785.

(3)

The proximal point algorithm(PPA for short) was suggested by Martinet [9] for solving convex optimization problems and was extensively developed by Rockafellar [10] in the con-text of monotone variational inequalities. The main idea of this method consists of replacing the initial problem with a sequence of regularized problems, so that each particular auxil-iary problem can be solved by one of the well-known algorithms. Along with the (DCA), a number of proximal point optimization schemes have been proposed in [11–14] to minimize differences of convex functions. Although convergence results for the (DCA) and the proxi-mal point algorithms for minimizing differences of convex functions have been addressed in some recent research, it is still an open research question to study the convergence analysis of algorithms for minimizing differences of functions in which convexity is not assumed.

Based on the method developed recently in [15–17], we study a proximal point algorithm for minimizing the diﬀerence of nonsmooth functions in which only the second function involved is required to be convex. Under the main assumption that the objective function satisﬁes the Kurdyka - Lojiasiewicz property, we are able to analyze the convergence of the algorithm. Our results further recent progress in using the Kurdyka - Lojiasiewicz property and variational analysis to study nonsmooth numerical algorithms pioneered by Attouch, Bolte, Redont, Soubeyran, and many others. The paper is organized as follows. In Section 2, we provide tools of variational analysis used throughout the paper. Section 3 is the main section of the paper devoted to the generalized proximal point algorithm and its convergence results. Applications to trust-region subproblems and nonconvex feasibility problems are introduced in Section 4.

2 Tools of Variational Analysis

In this section, we recall some basic concepts and results of generalized diﬀerentiation for nonsmooth functions used throughout the paper; see, e.g., [18–21] for more details. We use Rn _{to denote the} _n _{- dimensional Euclidean space,} _h·_,_·i _{to denote the inner product,} and k · k to denote the associated Euclidean norm. For an extended-real-value function

f :Rn_→R_{∪ {+∞}, the domain of}_f _{is the set}

domf ={x∈Rn_: _f₍_x₎_<_+∞}_. The function f is said to beproper if its domain is nonempty.

Given a lower semicontinuous functionf :Rn_→R_{∪ {+∞}}_{with ¯}_x_∈_dom_f_{, the}_Fréchet subdifferential of f at ¯x is defined by ∂Ff(¯x) = v ∈Rn: lim inf x→¯x f(x)−f(¯x)− hv, x−x¯i kx−x¯k ≥0 .

We set ∂F_f_(¯_x_{) =}_∅ _{if ¯}_{x /}_∈_dom_f_{. Note that the Fréchet subdifferential mapping does not} have a closed graph, so it is unstable computationally. Based on the Fréchet subdifferential, thelimiting/Mordukhovich subdifferential of f at ¯x∈domf is defined by

∂Lf(¯x) = Lim sup x−→f ¯x

(4)

where the notation x −→f x¯ means that x → x¯ and f(x) → f(¯x). We also set ∂Lf(¯x) = ∅ if ¯x /∈domf. It follows from the deﬁnition the following robustness/closedness property of

∂L_f_:

v∈Rn_: _∃_xk f₋_→_{x, v}_¯ k_→_{v, v}k _∈_∂L_f₍_xk₎

=∂Lf(¯x).

Obviously, we have ∂F_f₍_x₎ _⊂ _∂L_f₍_x_{) for every} _x _∈ _Rn_{, where the first set is closed and} convex while the second one is closed; see [22, Theorem 8.6, p 302]. Iff is differentiable at ¯x, then∂F_(¯_x_{) =}_{∇_f_(¯_x_{)}. Moreover, if} _f _{is continuously differentiable on a neighborhood of} ¯

x, then∂Lf(¯x) ={∇f(¯x)}. Whenf is convex, the Fréchet and the limiting subdifferentials reduce to the subdifferential in the sense of convex analysis:

∂f(¯x) ={v∈Rn_: _h_{v, x}₋_x_¯_{i ≤}_f₍_x₎₋_f_(¯_x₎_, _∀_x_∈Rn_}_. For a convex subset Ω of Rn_{and ¯}_x_∈_{Ω, the}_{normal cone} _{to Ω at ¯}_x_{is the set}

N(¯x; Ω) ={v∈Rn_: _h_{v, x}₋_x_¯_{i ≤}₀_, _∀_x_∈_Ω}_.

This normal cone can be represented as the subdiﬀerential at the point under consideration of the indicator function:

δ(x; Ω) =

(

0 if ¯x∈Ω,

+∞ if ¯x /∈Ω,

i.e., N(¯x; Ω) = ∂δ(¯x; Ω). We use the notation dist(¯x; Ω) to denote the distance from ¯x to Ω, i.e., dist(¯x; Ω) = infx∈Ωkx−x¯k. The notation PΩ(¯x) ={w¯∈Ω : k¯x−w¯k= dist(¯x; Ω)}

stands for the projection from ¯xonto Ω. We also usedΩ(¯x) for dist(¯x; Ω) where convenience.

Another subdifferential concept called the Clarke subdifferential was defined in [18] based on generalized directional derivatives. The Clarke subdifferential of a locally Lipschitz continuous function f around ¯x can be represented in terms of the limiting subdifferential:

∂Cf(¯x) = co∂Lf(¯x).

Here co Ω denotes the convex hull of an arbitrary set Ω.

Proposition 2.1 ([22, Exercise 8.8, p. 304]). Letf =g+hwheregis lower semicontinuous and let h is continuously diﬀerentiable on a neighborhood of x¯. Then

∂Ff(¯x) =∂Fg(¯x) +∇h(¯x) and ∂Lf(¯x) =∂Lg(¯x) +∇h(¯x).

Proposition 2.2 ([22, Theorem 10.1, p. 422]). If a lower semicontinuous function f : Rn _→ R_{∪ {+∞}} _{has a local minimum at} _x_¯ _∈ _dom_f_{, then} ₀ _∈ _∂F_f_(¯_x₎ _⊂_∂L_f_(¯_x_{). In the} convex case, this condition is not only necessary for a local minimum but also suﬃcient for a global minimum.

Proposition 2.3 Let h:Rn_→R_{be a ﬁnite convex function on} Rn_{. If} _yk _∈_∂h₍_xk₎_{for all} k and {xk_} _{is bounded, then the sequence} _{_yk_} _{is also bounded.}

(5)

ProofThe result follows from the fact thath is locally Lipschitz continuous onRn _{and [22,} Deﬁnition 5.14, Proposition 5.15, Theorem 9.13].

Following [15, 16], a lower semicontinuous function f: Rn _→ R_{∪ {+∞}} _{satisﬁes the} Kurdyka - Lojasiewicz property at x∗∈dom∂L_f _{if there exist} _{ν >}_{0, a neighborhood}_V _of

x∗_{, and a continuous concave function} _ϕ_{: [0}_{, ν}_[→_[0_,_{+∞[ with}

(i)ϕ(0) = 0.

(ii) ϕis of classC1 on ]0, ν[.

(iii) ϕ′ _>_{0 on ]0}_{, ν}_[.

(iv)For every x∈ V withf(x∗)< f(x)< f(x∗) +ν, we have

ϕ′(f(x)−f(x∗)) dist 0, ∂Lf(x)

≥1.

We say thatf satisﬁes thestrong Kurdyka - Lojasiewicz property atx∗if the same assertion holds for the Clarke subdiﬀerential∂Cf(x).

According to [15, Lemma 2.1], a proper lower semicontinuous function f : Rn _→ R_∪ {+∞}has the Kurdyka - Lojasiewicz property at any point ¯x∈Rn _{such that 0}_∈_/ _∂L_f_(¯_x_). Recall that a subset Ω ofRnis calledsemi-algebraic if it can be represented as a ﬁnite union of sets of the form

{x∈Rn_: _p_i(_x_{) = 0}_{, q}_i(_x₎_<_{0 for all}_i_{= 1}_{, . . . , m}_}_,

where pi and qi for i = 1, . . . , m are polynomial functions. A function f is said to be semi-algebraic if its graph is a semi-algebraic subset of Rn+1_{. It is known that a proper} lower semicontinuous semi-algebraic function always satisfies the Kurdyka - Lojasiewicz property; see [15, 23]. In a recent paper, Bolte et al. [23, Theorem 14] showed that the class of definable functions, which contains the class of semi-algebraic functions, satisfies the strong Kurdyka - Lojasiewicz property at each point of dom∂C_f_.

3 A Generalized Proximal Point Algorithm for Minimizing

Differences of Functions

We focus on the convergence analysis of a proximal point algorithm for solving nonconvex optimization problems of the following type

min

f(x) =g1(x) +g2(x)−h(x) : x∈Rn

, (3.1)

where g1(x) : Rn → R∪ {+∞} is proper and lower semicontinuous, g2(x) : Rn → R is

differentiable withL- Lipschitz gradient, andh:Rn_→R_{is convex. The specific structure} of (3.1) is flexible enough to include the problem of minimizing a smooth function on a closed constraint set:

min{g(x) : x∈Ω},

and the general DC problem: min

(6)

whereg:Rn_→R_{∪ {+∞}}_{is a proper lower semicontinuous convex function and}_h_:Rn_→R is convex.

It is well-known that if ¯x∈domf is a local minimizer of (3.2), then

∂h(¯x)⊂∂g(¯x). (3.3) Any point ¯x∈domf that satisﬁes (3.3) is called astationary point of (3.2), and any point ¯

x ∈ domf such that ∂g(¯x)∩∂h(¯x) 6= ∅ is called a critical point of this problem. Since

h is a ﬁnite convex function, its subdiﬀerential at any point is nonempty, and hence any stationary point of (3.2) is a critical point; see [5, 24, 25] and the references therein for more details.

Let us recall below a necessary optimality condition from [26] for minimizing diﬀerences of functions in the nonconvex settings.

Proposition 3.1 ([26, Proposition 4.1]) Consider the diﬀerence functionf =g−h, where

g:Rn_→ R_{∪ {+∞}} _and _h _:Rn_→ R _{are lower semicontinuous functions. If} _x_¯ _∈_dom_f _is a local minimizer of f, then we have the inclusion

∂Fh(¯x)⊂∂Fg(¯x).

If in addition h is convex, then ∂h(¯x)⊂∂L_g_(¯_x₎_.

When adapting to the setting of (3.1), we obtain the following optimality condition.

Proposition 3.2 If x¯∈domf is a local minimizer of the function f considered in (3.1), then

∂h(¯x)⊂∂Lg1(¯x) +∇g2(¯x). (3.4)

Proof The assertion follows from Proposition 2.1 and Proposition 3.1.

Following the DC case, any point ¯x ∈ domf satisfying condition (3.4) is called a sta-tionary point of (3.1). In general, this condition is hard to be reached and we may relax it to

[∂Lg1(¯x) +∇g2(¯x)]∩∂h(¯x)6=∅ (3.5)

and call ¯x a critical point of f. Obviously, every stationary point ¯x is a critical point. Moreover, by [26, Corollary 3.4] at any point ¯x withg1(¯x)<+∞, we have

∂L(g1+g2−h)(¯x)⊂∂Lg1(¯x) +∇g2(¯x)−∂h(¯x).

Thus, if 0∈∂L_f_(¯_x_{), then ¯}_x_{is a critical point of}_f _{in the sense of (3.5). The converse is not} true in general as shown by the following example. Consider the functions below

f(x) = 2|x|+ 3x, g1(x) = 3|x|, g2(x) = 3x, and h(x) =|x|.

In this case, ¯x = 0 satisﬁes (3.5) but 0 ∈/ ∂L_f_(¯_x_{) since} _∂g

1(0) = [−3,3], ∇g2(0) = 3,

∂h(0) = [−1,1] and ∂f(0) = [1,5]. However, it is easy to check that these two conditions are equivalent when h is diﬀerentiable onRn_.

(7)

We recall now theMoreau/Moreau-Yoshida proximal mapping for a nonconvex function; see [22, page 20]. Letg:Rn_→R_{∪ {+∞}}_{be a proper lower semicontinuous function. The} Moreau proximal mapping with regularization parameter t > 0, proxg_t : Rn _→ ₂Rn_{, is} deﬁned by proxg_t(x) = argmin g(u) + t 2ku−xk 2 _: _u_∈_Rn .

As an interesting case, wheng is the indicator functionδ(·; Ω) associated with a nonempty closed set Ω, proxg_t(x) coincides with the projection mapping.

Under the assumption infx∈Rng(x) > −∞, the lower semicontinuity of g and the co-ercivity of the squared norm imply that the proximal mapping is well-deﬁned; see [27, Proposition 2.2].

Proposition 3.3 Let g : Rn _→ R_{∪ {+∞}} _{be a proper lower semicontinuous function}

with infx∈Rng(x) >−∞. Then, for every t ∈(0,+∞), the set proxg_t(x) is nonempty and compact for everyx ∈Rn_.

We now introduce a new generalized proximal point algorithm for solving (3.1). Let us begin with the lemma below regarding an upper bound for a smooth function with Lipschitz continuous gradient; see [28, 29].

Proposition 3.4 If g : Rn _→ R _{is a diﬀerentiable function with} _L _{- Lipschitz gradient,} then

g(y)≤g(x) +h∇g(x), y−xi+L

2ky−xk

2 _{for all} _{x, y}_∈_Rn_. _(3.6) Let us introduce the generalized proximal point algorithm (GPPA) below to solve (3.1).

Generalized Proximal Point Algorithm (GPPA)

1. Initialization: Choosex0 _∈_dom_g

1 and a tolerance ǫ >0. Fix anyt > L.

2. Find yk∈∂h(xk). 3. Findxk+1 _{as follows} xk+1 ∈proxg1_t xk−∇g2(x k₎₋_yk t . (3.7)

4. Ifkxk₋_xk+1_{k ≤}_ǫ_{, then exit. Otherwise, increase} _k _{by 1 and go back to step 2.}

From the deﬁnition of proximal mapping, (3.7) is equivalent to saying that

xk+1∈argmin x∈Rn g1(x)− hyk− ∇g2(xk), x−xki+ t 2kx−x k_k2 . (3.8)

Theorem 3.5 Consider the (GPPA)for solving (3.1) in which g1(x) : Rn→R∪ {+∞}is

proper and lower semicontinuous with infx∈Rng₁(x)>−∞,g₂(x) :Rn→Ris diﬀerentiable withL - Lipschitz gradient, and h:Rn_→R _{is convex. Then}

(i) For any k≥1, we have

f(xk)−f(xk+1)≥ t−L 2 kx

(8)

(ii) If α= inf x∈Rnf(x)>−∞, then _k_→lim₊_∞f(x k_{) =}_ℓ∗ _≥_α _and _lim k→+∞kx k₋_xk+1_k_{= 0.} (iii) If α = inf x∈Rnf(x) > −∞ and {x

k_} _{is bounded, then every cluster point of} _{_xk_} _{is a} critical point of f.

Proof (i)By Proposition 2.1 and Proposition 4.3, it follows from (3.8) that

yk− ∇g2(xk)∈∂Lg1(xk+1) +t xk+1−xk. (3.10) Since yk _∈_∂h₍_xk_), h(xk+1)≥h(xk) +hyk, xk+1−xki. (3.11) From (3.8), we have g1(xk)≥g1(xk+1)− hyk− ∇g2(xk), xk+1−xki+ t 2kx k₋_xk+1_k2_. _(3.12)

Adding (3.11) and (3.12) and using (3.6), we get

g1(xk)−h(xk)≥g1(xk+1)−h(xk+1) +h∇g2(xk), xk+1−xki+ t 2kx k₋_xk+1_k2 ≥g1(xk+1)−h(xk+1) + g2(xk+1)−g2(xk)− L 2kx k₋_xk+1_k2 + t 2kx k₋_xk+1_k2_. This implies f(xk)−f(xk+1)≥ t−L 2 kx k₋_xk+1_k2_.

Assertion (i)has been proved.

(ii)It follows from the assumptions made and(i)that{f(xk)}is monotone decreasing and bounded below, so the ﬁrst assertion of (ii)is obvious. Observe that

m X k=1 kxk−xk+1k2≤ 2 t−L f(x 1₎₋_f₍_xm+1₎ ≤ 2 t−L f(x 1₎₋_α for all m∈N_. Thus, the sequence{kxk₋_xk+1_k} _{converges to 0.}

(iii) From (3.8), for allx∈Rn_{, we have} g1(xk+1)− hwk, xk+1−xki+ t 2kx k+1₋_xk_k2_≤_g 1(x)− hwk, x−xki+ t 2kx−x k_k2_, _(3.13) where wk ₌ _yk_{− ∇}_g

2(xk). Now suppose further that {xk} is bounded. Since h is ﬁnite

convex function onRn_,_yk_∈_∂h₍_xk_{) and}_{_xk_}_{is bounded, from Proposition 2.3,}_{_yk_}_{is also} bounded. We can take two subsequences: {xkℓ}of{xk} and{ykℓ}of{yk}that converge to x∗ andy∗, respectively. Becausekxkℓ−_xkℓ+1k →_{0 as}_ℓ→+∞, we deduce from (3.13) that

lim sup ℓ→+∞ g1(xkℓ+1)≤g1(x)− hy∗− ∇g2(x∗), x−x∗i+ t 2kx−x ∗_k2 _{for all}_x_∈_Rn_.

In particular, for x=x∗, we get lim sup

ℓ→+∞

(9)

Combining this with the lower semicontinuity ofg1, we get

lim ℓ→+∞g1(x

kℓ+1_{) =}_g

1(x∗).

From the closed property of the subdiﬀerential mapping ∂h(·), we have y∗ ∈ ∂h(x∗). It follows from (3.10) that there existszkℓ+1_∈_∂L_g

1(xkℓ+1) satisfying

kykℓ_{− ∇}_g

2(xkℓ)−zkℓ+1k=tkxkℓ−xkℓ+1k.

By (ii)and the Lipschitz continuity of ∇g2,

lim ℓ→+∞z

kℓ+1₌_y∗_{− ∇}_g

2(x∗) :=z∗.

Thus, xkℓ+1 −→g1 _x∗_, _zkℓ+1 ∈ _∂L_g

1(xkℓ+1), zkℓ+1 → z∗ as ℓ → +∞, it follows from the

robustness of limiting subdiﬀerential thatz∗ ∈∂L_g

1(x∗). Therefore,

y∗ ∈[∂Lg1(x∗) +∇g2(x∗)]∩∂h(x∗).

This implies thatx∗ is a critical point off and the proof is complete. Proposition 3.6 Suppose that infx∈Rnf(x)>−∞,f is proper and lower semicontinuous. If the (GPPA) sequence {xk} has a cluster point x∗, then lim

k→+∞f(x

k_{) =} _f₍_x∗_{). Thus,} _f

has the same value at all cluster points of {xk_}.

Proof Since infx∈Rnf(x) > −∞, it follows from (3.9) that the sequence of real numbers {f(xk_)} _{is non-increasing and bounded below. Thus, lim}

k→+∞f(x

k_{) =}_ℓ∗ _{exists. If}_{_xkℓ_} _{is a} subsequence converging tox∗, then by the lower semicontinuity off, we have lim inf

ℓ→+∞f(x

kℓ₎≥ f(x∗_{). Observe from the structure of}_f _{that dom}_f _{= dom}_g

1. Sinceg2 andhare continuous,

f is proper and lower semicontinuous if and only if g1 is proper and lower semicontinuous.

To prove the opposite inequality, we employ the proof of (iii)of Theorem 3.5 and get lim sup ℓ→+∞ f(xkℓ_{) = lim sup} ℓ→+∞ g1(xkℓ) +g2(xkℓ)−h(xkℓ) ≤lim sup ℓ→+∞ g1(xkℓ) + lim sup ℓ→+∞ g2(xkℓ)−lim inf ℓ→+∞h(x kℓ₎ ≤g1(x∗) +g2(x∗)−h(x∗) =f(x∗).

Combining this with the uniqueness of limit, we haveℓ∗ ₌_f₍_x∗_{). The proof is complete.}

Remark 3.7 (i)If g is also convex, we can get a stronger inequality than (3.9) and relax the range of the regularization parameter t. Indeed, using deﬁnition of the subdiﬀerential in the sense of convex analysis in (3.10), we have

hyk− ∇g2(xk)−t(xk+1−xk), xk−xk+1i ≤g1(xk)−g1(xk+1).

Since yk _∈_∂h₍_xk_),

(10)

Adding these inequalities and using (3.6) give f(xk)−f(xk+1)≥ t−L 2 kxk−xk+1k2.

Thus, we can choose t > L

2 instead of t > Las before.

(ii) When h(x) = 0, the (GPPA) reduces to the proximal forward - backward algorithm for minimizing f =g1+g2 considered in [30]. If h(x) = 0 andg1 is the indicator function

δ(·; Ω) associated with a nonempty closed set Ω, then the (GPPA) reduces to the projected gradient method (PGM) for minimizing the smooth functiong2 on a nonconvex constraint

set Ω: xk+1=PΩ xk−1 t∇g2(xk) .

(iii) If g2 = 0, then the (GPPA) reduces to the (PPA) with constant stepsize proposed in

[11, 31].

In the theorem below, we establish suﬃcient conditions that guarantee the convergence of the sequence {xk} generated by the (GPPA). These conditions include the Kurdyka Lojasiewicz property of the function f and the diﬀerentiability with Lipschitz gradient of

h. In what follows, let C∗ _{denote the set of cluster points of the sequence} _{_xk_{}. We follow} the method from [15, 16].

Theorem 3.8 Suppose that infx∈Rnf(x) >−∞, and f is lower semicontinuous. Suppose further that∇hisL(h) - Lipschitz continuous andf has the Kurdyka - Lojasiewicz property at any point x∈domf. If C∗ 6=∅, then the (GPPA) sequence {xk_} _{converges to a critical} point of f.

ProofTake anyx∗ ∈C∗and a subsequence{xkℓ}_{that converges to}_x∗_{. Applying Proposition} 3.6 yields

lim k→+∞f(x

k_{) =}_ℓ∗ ₌_f₍_x∗₎_.

If f(xk_{) =} _ℓ∗ _{for some} _k _≥ _{1, then} _f₍_xk_{) =} _f₍_xk+p_{) for any} _p _≥ _{0 since the sequence} {f(xk_)} _{is monotone decreasing by (3.9). Therefore,} _xk ₌ _xk+p _{for all} _p _≥ _{0. Thus, the} (GPPA) terminates after a ﬁnite number of steps. Without loss of generality, from now on, we assume thatf(xk₎_{> ℓ}∗ _{for all} _k_.

Recall that the (GPPA) starts from a point x0 _∈ _dom_g

1 and generates two sequences

{xk_} _and _{_yk_} _with_yk_∈_∂h₍_xk_{) =}_∇_h₍_xk_{) and}

yk−1− ∇g2(xk−1)−t(xk−xk−1)∈∂Lg1(xk).

Thus, from Proposition 2.1 we have

yk−1− ∇g2(xk−1)−t(xk−xk−1)

+∇g2(xk)−yk∈∂Lg1(xk) +∇g2(xk)− ∇h(xk) =∂Lf(xk).

Using the Lipschitz continuity of ∇g2 and ∇h, we have

y k−1_{− ∇}_g 2(xk−1)−t(xk−xk−1) +∇g2(xk)−yk = = ∇h(xk−1)− ∇h(xk)+∇g2(xk)− ∇g2(xk−1) −t(xk−xk−1) ≤(L(h) +L+t)kxk−1−xkk ≤Mkxk−1−xkk,

(11)

whereM :=L(h) +L+t. Therefore,

dist0;∂Lf(xk)≤Mkxk−1−xkk. (3.14) According to the assumption thatf has the strong Kurdyka - Lojasiewicz property at x∗, there exist ν > 0, a neighborhood V of x∗, and a continuous concave function ϕ: [0, ν[→ [0,+∞[ so that for allx∈ V satisfying ℓ∗ _{< f}₍_x₎_{< ℓ}∗₊_ν_{, we have}

ϕ′(f(x)−ℓ∗) dist 0;∂Lf(x)

≥1. (3.15)

Let δ > 0 small enough such that B₍_x∗_;_δ₎ _{⊂ V. Using the facts that} _lim ℓ→+∞x kℓ ₌ _x∗_, lim k→+∞kx k+1₋_xk_k _{= 0, lim} k→+∞f(x

k_{) =} _ℓ∗_{, and} _f₍_xk₎ _{> ℓ}∗ _{for all} _k_{, we can ﬁnd a natural}

number N large enough satisfying

xN ∈B₍_x∗_;_δ₎_{, ℓ}∗ _{< f}₍_xN₎_{< ℓ}∗₊_ν, _(3.16) and kxN ₋_x∗_k₊ kxN −xN−1k 4 +γϕ f(x N₎₋_ℓ∗ < 3δ 4 , (3.17) where γ = _t2₋M_L >0. We will show that for allk ≥N,xk ∈B₍_x∗_;_δ_{). To this end, we ﬁrst} show that whenever xk_∈_B₍_x∗_;_δ_{) and}_ℓ∗ _{< f}₍_xk₎_{< ℓ}∗₊_ν _{for some}_k_{, we have}

kxk−xk+1k ≤ kx

k−1₋_xk_k 4 +γ

h

ϕf(xk)−ℓ∗−ϕf(xk+1)−ℓ∗i. (3.18) Indeed, by (3.14), the concavity ofϕ, (3.15), and (3.9), we have

Mkxk−1₋_xk_kh_ϕ_f₍_xk₎₋_ℓ∗₋_ϕ_f₍_xk+1₎₋_ℓ∗i ≥dist0;∂Lf(xk) hϕf(xk)−ℓ∗−ϕf(xk+1)−ℓ∗i ≥dist0;∂Lf(xk)ϕ′f(xk)−ℓ∗ hf(xk)−f(xk+1)i ≥ t−L 2 kx k₋_xk+1_k2_. It follows that ϕf(xk)−ℓ∗−ϕf(xk+1)−ℓ∗≥ 1 γ kxk₋_xk+1_k2 kxk−1₋_xk_k ≥ 1 γ kxk−xk+1k − kx k−1₋_xk_k 4 , (3.19) where the last inequality holds since a2

b ≥a− b

4 for any positive real numbersaandb. This

implies (3.18).

We next show thatxk_∈_B₍_x∗_;_δ_{) for all}_k_≥_N _{by induction. The claim is true for}_k₌_N

by the construction above. Now suppose the assertion holds for k =N, . . . , N +k−1 for somek≥1, i.e.,xN_{, . . . , x}N+k−1_∈_B₍_x∗_;_δ_{). Since} _f₍_xk_{) is a non-increasing sequence that} converges to ℓ∗_{, our choice of} _N _{implies that} _ℓ∗ _{< f}₍_xk₎ _{< ℓ}∗ ₊_ν _{for all} _k _≥ _N_{. In}

(12)

particular, (3.18) can be applied for allk=N, . . . , N+k−1. Using the estimation (3.18) fork=N, . . . , N +k−1, we have kxN −xN+1k ≤ kx N−1₋_xN_k 4 +γ ϕ f(xN)−ℓ∗ −ϕ f(xN+1)−ℓ∗ , kxN+1−xN+2k ≤ kx N ₋_xN+1_k 4 +γ ϕ f(xN+1)−ℓ∗ −ϕ f(xN+2)−ℓ∗ , . . . kxN+k−1−xN+kk ≤ kx N+k−2₋_xN+k−1_k 4 +γ h ϕf(xN+k−1)−ℓ∗−ϕf(xN+k)−ℓ∗i. Therefore, k X j=1 kxN+j−xN+j−1k ≤ 1 4 k X j=1 kxN+j−xN+j−1k+ kx N−1₋_xN_k 4 − kxN+k−1₋_xN+k_k 4 +γhϕ f(xN)−ℓ∗ −ϕf(xN+k)−ℓ∗i.

Making use of the non-negativity of ϕ, we get k X j=1 kxN+j−xN+j−1k ≤ 4 3 kxN−1₋_xN_k 4 +γϕ f(x N )−ℓ∗ . (3.20) It follows that kxN+k−x∗k ≤ kxN −x∗k+ k X j=1 kxN+j−xN+j−1k ≤ 4 3 kxN −x∗k+ kx N−1₋_xN_k 4 +γϕ f(x N₎₋_ℓ∗ < δ.

Thus, xk _∈ _B₍_x∗_;_δ_{) for all} _k _≥ _N_{. Since} _xk _∈ _B₍_x∗_{, δ}_{) and} _ℓ∗ _{< f}₍_xk₎ _{< ℓ}∗₊_ν _{for all}

k≥N, it follows from (3.20) by lettingk→+∞thatP+∞

k=1kxk+1−xkk<+∞. Therefore,

{xk} is a Cauchy sequence and hence it is a convergent sequence.

Below is another theorem which gives suﬃcient conditions that guarantee the conver-gence of the sequence {xk} generated by (GPPA). In contrast to Theorem 3.8, we require the diﬀerentiability with Lipschitz gradient of the functiong1+g2 instead ofh along with

the strong Kurdyka - Lojasiewicz property off. In this case, without loss of generality, we can assume thatg1(x) = 0. In the next result, for convenience, we putg2(x) =g(x).

Theorem 3.9 Consider the difference of functions f = g−h with infx∈Rnf(x) > −∞. Suppose thatgis differentiable and∇gisL- Lipschitz continuous,f has the strong Kurdyka - Lojasiewicz property at any point x∈domf, and h is a finite convex function. IfC∗ 6=∅, then the (GPPA)sequence {xk_} _{converges to a critical point of}_f_.

Proof The proof is very similar to that of Theorem 3.8, except a few adjustments. Note thatf is locally Lipschitz continuous under the assumptions made since gis aC1 function and h is a ﬁnite convex function. By (3.10), we have

(13)

This implies,

yk−yk−1−t(xk−xk−1)=∇g(xk+1)− ∇g(xk) +t(xk+1−xk).

Making use of the Lipschitz continuity of ∇gyields

y k₋_yk−1₋_t₍_xk₋_xk−1₎ = ∇g(x k+1₎_{− ∇}_g₍_xk_{) +}_t₍_xk+1₋_xk₎ ≤(L+t)kxk−xk+1k.

On the other hand,

yk−yk−1−t(xk−xk−1)∈∂h(xk)− ∇g(xk) =∂F(−f)(xk)⊂∂C(−f)(xk).

Since ∂C₍₋_f₎₍_xk_{) =}₋_∂C_f₍_xk_{), we have}

dist0;∂Cf(xk)= dist0;∂C(−f)(xk)≤(L+t)kxk−xk+1k.

ChooseN as in (3.16) and (3.17) with γ = 2_tL₋+2_Lt instead of _t2₋M_L as before. For all klarge enough such that xk_∈_B₍_x∗_;_r_{) and}_ℓ∗ _{< f}₍_xk₎_{< ℓ}∗₊_ν_{, we have}

(L+t)kxk−xk+1khϕf(xk)−ℓ∗−ϕf(xk+1)−ℓ∗i ≥dist0;∂Cf(xk) hϕf(xk)−ℓ∗−ϕf(xk+1)−ℓ∗i ≥dist0;∂Cf(xk)ϕ′f(xk)−ℓ∗ hf(xk)−f(xk+1)i ≥ t−L 2 kx k₋_xk+1_k2_. It follows that kxk−xk+1k ≤γhϕf(xk)−ℓ∗−ϕf(xk+1)−ℓ∗i. (3.21) From this, the induction to prove thatxk_∈_B₍_x∗_;_r_{) for all}_k_≥_N _{can be carried out similarly}

to the proof of Theorem 3.8. Indeed, suppose the assertion holds for k=N, . . . , N +k−1 for somek≥1, i.e., xN_{, . . . , x}N+k−1 _∈_B₍_x∗_;_r_{). Observe that}

kxN+k−x∗k ≤ kxN−x∗k+ k X j=1 kxN+j−1−xN+jk ≤ kxN−x∗k+γ k X j=1 ϕ f(xN+j−1)−ℓ∗ −ϕ f(xN+j)−ℓ∗ ≤ kxN−x∗k+γϕ f(xN)−ℓ∗ < r.

Thus, xk _∈ _B₍_x∗_;_r_{) for all} _k _≥ _N_{. Since} _xk _∈ _B₍_x∗_{, r}_{) and} _ℓ∗ _{< f}₍_xk₎ _{< ℓ}∗₊_ν _{for all}

k ≥N, we can sum (3.21) from k =N to some N1 greater than N and take the limit as

N1 →+∞, showing that P∞k=1kxk+1−xkk<+∞. This completes the proof.

In the proposition below, we give suﬃcient conditions for the set of cluster points C∗ of the (GPPA) sequence{xk} to be nonempty.

(14)

Proposition 3.10 Consider the function f =g−h, where g=g1+g2 in (3.1). Let {xk}

be sequence generated by the(GPPA)for solving (3.2). The set of critical points C∗ _of _{_x_k}

is nonempty if one of the following conditions is satisﬁed:

(i) For any α, the lower level set L≤α :={x∈Rn: f(x)≤α} is bounded.

(ii) lim inf

kxk→+∞h(x) = +∞ and klim infxk→+∞

g(x)

h(x) >1.

Proof The conclusion under(i) follows directly form the facts thatf(xk₎ _≤_f₍_x0_{) for all} _k

and L_≤f(x0

) is bounded. Now assume that (ii) is satisﬁed. Then there exist M > 1 and

R >0 such thatg(x)≥M h(x) for all x satisfying kxk ≥R. It follows that lim inf

kxk→+∞f(x) = lim infkxk→+∞[g(x)−h(x)]≥(M −1) lim infkxk→+∞h(x) = +∞.

Thus,f is coercive. Combining this with the descent property of the sequence{f(xk_{)}, we} can conclude that {xk_} _{is bounded.} It is known from [23, Corollary 16] and [15, Section 4.3] that a proper lower semicontin-uous semi-algebraic function f on Rn _{always satisﬁes the Kurdyka - Lojasiewicz property} at all points in dom∂f with ϕ(s) = cs1−θ for some θ ∈ [0,1[ and c > 0. We now derive convergence rates of the (GPPA) sequence by examining the range of the exponent.

Theorem 3.11 Consider the settings of Theorems 3.8 and 3.9. Suppose further that f is a proper closed semi-algebraic function so that the function ϕin the Kurdyka - Lojasiewicz property has the form ϕ(s) = cs1−θ _{for some} _θ _∈ _[0_,_1[ _and _{c >} _{0. Then we have the} following conclusions.

(i) If θ= 0, then the sequence {xk_} _{converges in a ﬁnite number of steps.}

(ii) If 0< θ≤ 1₂, then there exist µ >0 and q∈(0,1) satisfying kxk−x∗k ≤µqk.

(iii) If 1₂ < θ <1, then there exists µ >0 such that kxk−x∗k ≤µk11₋−2θθ. Proof For each k ≥1, set ∆k =P+∞

p=kkxp+1−xpk and set ℓk =f(xk)−ℓ∗. It is obvious from the triangle inequality that kxk₋_x∗_{k ≤} _{∆k. From Kurdyka - Lojasiewicz property}

with the special form of ϕ, we have

c(1−θ)ℓ−_kθdist0;∂Lf(xk)≥1. (3.22) From the proof of Theorem 3.9, if ∇g is L- Lipschitz continuous, then

dist0;∂Lf(xk)≤(L+t)kxk+1−xkk,

for all suﬃciently large k. Combining this with (3.21) yields ∆k≤γϕ(ℓk)≤γϕ(ℓk−1) =γcℓk1−−θ1 ≤γc[(L+t)c(1−θ)] 1 −θ θ kxk−xk−1k 1₋θ θ ,

(15)

whereγ = 2L_t₋+2_Lt.

In the case of Theorem 3.8 where ∇h isL(h) - Lipschitz continuous, we have dist0;∂Lf(xk)≤Mkxk−xk−1k,

for all suﬃciently large k, whereM =L(h) +L+t. It follows from (3.20) that ∆k≤ 4 3 kxk−xk−1k 4 +γϕ(ℓk) ≤ kx k₋_xk−1_k 3 + 4γ 3 [M c(1−θ)] 1 −θ θ kxk−xk−1k 1 −θ θ ,

whereγ = 2L_t₋+2_Lt. Thus, in both cases it always holds that ∆k≤C1(∆k−1−∆k) +C2(∆k−1−∆k)

1

−θ θ ,

for someC1, C2>0.The result now follows from the proof of [32, Theorem 2].

4 Examples

Trust-Region SubProblem. Consider thetrust-region subproblem

min φ(x) = 1 2x ⊤_Ax₊_b⊤_x_: _k_x_k2_≤_r2 , (4.1)

where A is an n×n real symmetric matrix and b ∈Rn _{is given. Since} _A _{is not required} to be positive-semideﬁnite, (4.1) is a nonconvex optimization problem. LetE ={x∈Rn _: kxk ≤ r} and deﬁne the function

f(x) =φ(x) +δ(x;E), x∈Rn_.

The trust-region subproblem (4.1) can be solved by the (DCA) with the following DC decomposition f =g−h with g(x) = 1 2ρkxk 2₊_b⊤_x₊_δ₍_x_;_E_{) and}_h₍_x_{) =} 1 2x ⊤₍_ρI₋_A₎_x,

whereρ is a positive number such that ρI−Ais positive-semideﬁnite; see [6]. The conver-gence analysis of the (DCA) sequence for solving (4.1) was proved in [34].

Deﬁne g2(x) = 1 2ρkxk 2₊_b⊤_x_and _g 1(x) =δ(x;E).

In this case, g2 and h have Lipschitz gradient with Lipschitz constants L=ρ and L(h) =

λmax(ρI −A), respectively. Applying the (GPPA) for (4.1), we have yk = ∇h(xk) =

(ρI−A)xk _and yk− ∇g2(xk+1)−t xk+1−xk∈∂g1(xk+1). This implies yk+txk−b∈(t+ρ)xk+1+N(xk+1;E). Thus, xk+1=PE 1 t+ρ (t+ρ)xk−Axk−b .

(16)

Proposition 4.1 Consider the trust-region subproblem(4.1). ThenC∗ 6=∅and the(GPPA) sequence{xk_} _{converges to a critical point of}_f ₌_g

1+g2−h.

Proof We only need to verify that all assumptions of Theorem 3.8 are satisﬁed in this particular case. Note that f(x) = φ(x) + δ(x;E). Obviously, infx∈Rnf(x) > −∞ and C∗₆₌_{∅. Let us show that}_f _{is a semi-algebraic function. Note that}

E={x∈Rn_: _p₍_x₎_≤_r2_}_, where p is the polynomial p(x) =Pn

i=1x2i. Thus, E is a semialgebraic set, which implies that its associated indicator function is a semi-algebraic function; see, e.g., [15].

It is also straightforward that φis also a semi-algebraic function since its graph gphφ={(x, y)∈Rn_×R_: _x⊤_Ax₊_b⊤_x₋_y _{= 0}}

is a semi-algebraic set. It follows thatf is a semialgebraic function as it is the sum of two semi-algebraic functions; see, e.g., [15]. Therefore, f satisﬁes the Kurdyka - Lojasiewicz property. Obviously, hhas Lipschitz continuous gradient. We have shown that all assump-tions of Theorem 3.8 are satisﬁed and the conclusion follows from Theorem 3.8.

Nonconvex Feasibility Problems. In this part, we show how the (GPPA) can be applied

to solve nonconvex feasibility problems. Let A and B be two nonempty closed sets in Rn_. It is implicitly assumed that A and B are simple enough so that the projection onto each set is easy to compute. The feasibility problem asks for a point in A∩B. It is clear that

A∩B 6=∅if and only if the following optimization problem has the zero optimal value: min 1 2d 2 B(x) : x∈A . (4.2)

This problem is of the type (3.1) with the objective function f(x) =g1(x) +g2(x)−h(x),

where g1(x) =δ(x;A), g2(x) = 1 2kxk 2_{, h}₍_x_{) =} 1 2 kxk 2₋_d2 B(x) .

Obviously, the function g2 is diﬀerentiable with L−Lipschitz gradient where L = 1. We

have h(x) = 1 2kxk 2₋1 2inf{kxk 2₊_k_y_k2₋_2h_{x, y}_i_: _y_∈_B_} = sup{hx, yi −kyk 2 2 : y∈B} = sup{fy(x) : y∈B},

where fy(x) =hx, yi − ky₂k2. Therefore, h is a pointwise supremum of a collection of aﬃne functions so it is a convex function. DenoteS(¯x) ={y∈B : fy(¯x) =h(¯x)}. We have

S(¯x) ={y∈B : k¯x−yk2 =d2_B(¯x)}=PB(¯x).

SinceB is a nonempty closed subset ofRn_{, the set}_S_(¯_x_{) =}_P_B_(¯_x_{) is nonempty and compact} for any ¯x∈Rn_{. By [33, Theorem 3, p. 201], we have}

∂h(¯x) = co   [ y∈S(¯x) ∂fy(¯x)  = co   [ y∈S(¯x) {y}  = co P_B(¯x).

(17)

Making use of Proposition 3.1, we now can state the necessary condition for a local minimum of (4.2).

Proposition 4.2 If x¯∈A is a local optimal solution of (4.2), then

PB(¯x)⊂x¯+NL(¯x;A), (4.3) where NL_(¯_x_;_A₎ _{is the limiting normal cone to} _A _at _x_¯ _{deﬁned by}_NL_(¯_x_;_A_{) =}_∂L_δ_(¯_x_;_A_). Note that the optimality condition (4.3) is not suﬃcient to ensure that ¯xis a local minimizer of (4.2) as shown in the next example.

Example 4.3 Consider the following subsets of R2_:

A=

(x1, x2)∈R2: x2 ≥1 and B =

(x1, x2)∈R2 : x2 ≤αx21 ,

whereα < 1₂. Put ¯x= (0,1) ∈A. Since α < 1₂, the system

(

x2₁+ (x2−1)2 ≤1,

x2−αx21 ≤0,

has a unique solution (x1, x2) = (0,0). This implies PB(¯x) = {(0,0)} and dB(¯x) = 1.

Obviously, ¯x satisﬁes condition (4.3) since

PB(¯x) ={(0,0)} ⊂ {(0, γ) : γ ≤1}= ¯x+N(¯x;A).

However, for any neighborhood U of ¯x, there always exists ǫ > 0 small enough such that

xǫ = (ǫ,1)∈U and

dB(xǫ)≤1−αǫ2 <1. Thus, ¯z cannot be a local minimizer of (4.2).

Based on the (GPPA), we now propose the following simple algorithm for solving (4.2). For a given initial point x0 ∈ A, the (GPPA) sequence {xk_} _{with the starting point}_x0 _is

deﬁned by xk+1∈PA (1−1 t)x k₊1 ty k , (4.4)

whereyk _{is an element chosen in co} _P

B(xk). Note that, this scheme is diﬀerent from some other well-known methods such as the alternating projection algorithm or the averaged pro-jection algorithm. Moreover, it cannot be obtained from the proximal forward - backward schemes in [27, 30].

Theorem 4.4 Let A and B are nonempty closed sets in Rn _{and let} _{t >} _{1. Then the}

sequence{xk_{} ⊂}_A _{satisﬁes the following:} (i) For any k≥1,

d2_B(xk)−d2_B(xk+1)≥2(t−1)kxk−xk+1k2.

(ii) lim k→+∞kx

k₋_xk+1_k_{= 0}_.

(iii) If{xk_} _{is bounded, then every cluster point is a critical point of} _f ₌_δ_(·;_A_{) +}_d2

(18)

Proposition 4.5 Let A and B are nonempty closed sets in Rn _{such that both of them are} semi-algebraic sets and B is convex. Suppose further that either A or B is bounded. Then the sequence {xk} generated by the (GPPA) converges to a critical point of (4.2).

ProofAsAis a semi-algebraic set, the indicator functionδ(·;A) is a semi-algebraic function. On the other hand,B is also semi-algebraic, sox7→ 1

2d2B(x) is also a semi-algebraic function; see [30, Lemma 2.3]. Therefore, f(x) = δ(x;A) +1₂d2_B(x) is a semi-algebraic function. If

B is closed and convex, it is well known that the function x 7→ d2

B(x) is smooth with 1 - Lipschitz continuous gradient; see [35, Corollary 12.30]. The result now follows directly from Theorem 3.8 since the boundedness of {xk_} _{is ensured by the coercivity of} _f _under the assumption that either Aor B is bounded.

5 Concluding Remarks

Based on recent progress in using the Kurdyka - Lojasiewicz property and variational anal-ysis in analyzing nonsmooth optimization algorithms, we introduce and study convergence analysis of a proximal point algorithm for minimizing diﬀerences of functions. We are able to relax some convexity in the classical DC programming to deal with a more general class of problems. The results open up the possibility of understanding the convergence of the (DCA) and other algorithms for minimizing diﬀerences of convex functions used in numerous applications.

Acknowledgment. The authors are very grateful to Prof. J´erˆome Bolte for his

helpful suggestions to improve the paper. The authors are also thankful to Prof. Nguyen Dong Yen and Dr. Hoang Ngoc Tuan for useful discusions on the subject. This work was completed while the ﬁrst author was visiting the Vietnam Institute for Advanced Study in Mathematics (VIASM). He would like to thank the VIASM for ﬁnancial support and hospitality. The research of the second author was partially supported by the USA National Science Foundation under grant DMS-1411817 and the Simons Foundation under grant #208785.

References

[1] Tuy, H.: Convex Analysis and Global Optimization, Kluwer Academic Publishers, Dor-drecht, The Netherlands (1998)

[2] Horst, R., Tuy, H.: Global Optimization, Springer Verlag (1990)

[3] Bac´ak, M., Borwein, J. M.: On diﬀerence convexity of locally Lipschitz functions. Op-timization, 60, 961-978 (2011)

[4] Pham, D. T., Souad, E. B.: Algorithms for solving a class of nonconvex optimization problems: Methods of subgradient, Fermat days 85, Mathematics for optimization, Else-vier, North Holland, 249-270 (1986)

[5] Pham Dinh, T., Le Thi, H. A.: Convex analysis approach to D.C. programming: Theory, algorithms and applications, ActaMath. Vietnam. 22, 289-355 (1997)

(19)

[6] Pham Dinh, T., Le Thi, H. A.: A d.c. optimization algorithm for solving the trust-region subproblem, SIAM J. Optim. 8, 476-505 (1998)

[7] Pham, D. T., An, L. T. H., Akoa, F.: The DC (Diﬀerence of Convex functions) program-ming and DCA revisited with DC models of real world nonconvex optimization problems, Annals of Operations Research, 133, 23-46 (2005)

[8] Muu, L. D., Quoc, T. D.: One step from DC optimization to DC mixed variational inequalities, Optimization, 59, 63-76 (2010)

[9] Martinet, B.: Regularisation, d’in´equations variationelles par approximations succesives, Rev. Francaise d’Inform. Recherche Oper., 4, 154-159 (1970)

[10] Rockafellar, R.T.: Monotone operator and the proximal point algorithm, SIAM J. Control. Opt., 14:5, 877–898 (1976)

[11] Sun, W., Sampaio, R. J. B., Candido, M. A. B.: Proximal point algorithm for mini-mization of DC Functions, Journal of Computational Mathematics, 21, 451-462 (2003) [12] Moudaﬁ, A., Maing´e, P. E.: On the convergence of an approximate proximal method

for DC functions, Journal of Computational Mathematics, 24, 475-480 (2006)

[13] Bento, G. C., Ferreira, O. P., Oliveira, P. R.: Proximal point method for a special class of nonconvex functions on Hadamard manifolds, Optimization, 64(2), 289-319 (2015) [14] Souza, S. S., Oliveira, P. R., Cruz Neto, J. X., Soubeyran, A.: A proximal method with

separable Bregman distances for quasiconvex minimization over the nonnegative orthant, European Journal of Operational Research, 201, 365-376 (2010)

[15] Attouch, H., Bolte, J., Redont, P., Soubeyran, A.: Proximal alternating minimization and projection methods for nonconvex problems: an approach based on the Kurdyka Lojasiewicz inequality. Mathematics of Operations Research, 35, 438–457 (2010)

[16] Bolte, J., Pauwels, E.: Majorization-minimization procedures and convergence of SQP methods for semi-algebraic and tame programs, preprint (2015)

[17] Pham Dinh, T., Ngai, H. V., Le Thi, H. A.: Convergence analysis of DC algorithm for DC programming with subanalytic data, preprint (2013)

[18] Clarke, F. H.: Optimization and Nonsmooth Analysis, SIAM, Philadelphia (1990) [19] Mordukhovich, B.S.: Variational Analysis and Generalized Diﬀerentiation, I: Basic

Theory, Springer, Berlin (2006)

[20] Mordukhovich, B. S., Nam, N. M.: An Easy Path to Convex Analysis and Applications, Morgan & Claypool Publishers (2014)

[21] Rockafellar, R. T.: Convex Analysis, Princeton University Press, Princeton, NJ (1970) [22] Rockafellar, R. T., Wets, R.: Variational Analysis, Grundlehren der Mathematischen

(20)

[23] Bolte, J., Daniilidis, A., Lewis, A.S., Shiota., M.: Clarke subgradients of stratiﬁable functions. SIAM J. Optim., 18, 556-572 (2007)

[24] Hiriart-Urruty, J. B.: Generalized diﬀerentiability, duality and optimization for prob-lems dealing with diﬀerences of convex functions, Lecture Note in Economics and Math. Systems 256, 37-70 (1985)

[25] Horst, R., Thoai, N. V.: DC programming: overview, Journal of Optimization Theory and Applications, 103(1), 1-43 (1999)

[26] Mordukhovich, B. S., Nam, N. M., Yen, N. D.: Fréchet subdifferential calculus and optimality conditions in nondifferentiable programming, Optimization, 55, 685-708 (2006) [27] Bolte, J., Sabach, S., Teboulle, M.: Proximal Alternating Linearized Minimization for

Nonconvex and Nonsmooth Problems, Math. Program., Ser. A, (146), 459-494 (2014) [28] Nesterov, Y.: Introductory lectures on convex optimization: a basic course, Applied

optimization, Kluwer Academic Publ., Boston, Dordrecht, London (2004)

[29] Ortega, J. M., Rheinboldt, W. C.: Iterative Solution of Nonlinear Equations in Several Variables, Academic Press, New-York (1970)

[30] Attouch, H., Bolte, J., Svaiter, B.: Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods. Math. Program., Ser. A,137 (1), 91-124 (2011)

[31] Souza, J. C. , Oliveira, P. R., Soubeyran, A.: A modiﬁed generalized proximal point algorithm for DC functions with application to the optimal size of the ﬁrm problem, submitted to European J. Oper. Res. (2015)

[32] Attouch, H., Bolte, J.: On the convergence of the proximal algorithm for nonsmooth functions involving analytic features. Math. Program. 116, no. 1-2, Ser. B, 5-16 (2009) [33] Ioﬀe, A. D., Tihomirov, V. M.: Theory of extremal problems, Studies in Mathematics

and its Applications, vol. 6, North-Holland Publishing Co., Amsterdam - New York, 1979. Translated from the Russian by Karol Makowski.

[34] Tuan, H. N., Yen, N. D.: Convergence of Pham DinhLe This algorithm for the trust-region subproblem, J. Global Optim.,55 (2013), 337–347.

[35] Bauschke, H. H., Combettes, P. L.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces, Springer (2011).