Inertial proximal algorithm for difference of two maximal monotone operators



M. Alimohammady and M. Ramazannejad

Department of Mathematics, Faculty of Mathematical Sciences,

University of Mazandaran, Babolsar, Iran, 47416-1468

e-mails:{m.alimohammady, m.ramzannezhad}@gmail.com

(Received 30 November 2013; after final revision 24 February 2015;

accepted 6 July 2015)

In this note, a new algorithm is presented for finding a zero of the difference of two maximal monotone operators $T$ and $S$, i.e., of $T - S$, in a finite dimensional real Hilbert space $H$, where the operator $S$ has the local boundedness property. This condition is weaker than Moudafi's condition on the operator $S$ in [13]. Moreover, by imposing suitable conditions on the inertia term in the new algorithm, one can improve the speed of convergence of the sequence.

Key words: Maximal monotone operator; proximal point algorithm.

1. PRELIMINARIES

Let $H$ be a Hilbert space. The notation $\langle \cdot, \cdot \rangle$ will be used for the inner product on $H \times H$ and $\|\cdot\|$ for the corresponding norm. A set-valued operator $T : H \to 2^H$ is said to be monotone if

$$\langle x^* - y^*, x - y \rangle \ge 0, \quad \forall (x, x^*), (y, y^*) \in G(T),$$

where $G(T) := \{(x, y) \in H \times H;\ y \in Tx\}$ is the graph of $T$. The domain of $T$ is $D(T) := \{x \in H;\ T(x) \neq \emptyset\}$.

A monotone operator $T$ is called maximal monotone if its graph is maximal in the sense of inclusion.


when $T$ is maximal monotone. For any $x \in H$, $\lim_{\lambda \to 0} J_\lambda^T(x) = \mathrm{Proj}_{\overline{D(T)}}\, x$, where $\mathrm{Proj}_{\overline{D(T)}}$ is the orthogonal projection onto the closure of the domain of $T$. One of the best known tools in optimization theory related to resolvent operators is the Yosida approximation $T_\lambda := (I - J_\lambda^T)/\lambda$ of a maximal monotone operator $T$, which satisfies:

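(i) For all $x \in H$, $T_\lambda(x) \in T(J_\lambda^T(x))$,

(ii) $T_\lambda$ is Lipschitz with constant $1/\lambda$ and maximal monotone,

(iii) $T_\lambda(x)$ converges strongly to $T^0(x)$ as $\lambda \to 0$, for $x \in D(T)$,

(iv) $\|T_\lambda(x)\| \le \|T^0(x)\|$ for every $x \in D(T)$, $\lambda > 0$, where $T^0$ is the minimal selection

$$T^0(x) := \{y \in T(x);\ \|y\| = \min_{z \in T(x)} \|z\|\}, \quad x \in D(T).$$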
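The properties of the Yosida approximation can be checked numerically in a simple one-dimensional case. The sketch below is only an illustration under an assumed choice of operator, $T = \partial|\cdot|$ on $\mathbb{R}$, for which the resolvent is the soft-thresholding map; the function names are hypothetical and do not come from the paper. It tests (i), (ii) and (iv) on a few sample points.

```python
def resolvent_abs(x: float, lam: float) -> float:
    """J_lam^T(x) for the assumed T = subdifferential of |.| on R (soft-thresholding)."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0


def yosida_abs(x: float, lam: float) -> float:
    """Yosida approximation T_lam(x) = (x - J_lam^T(x)) / lam."""
    return (x - resolvent_abs(x, lam)) / lam


def in_subdiff_abs(y: float, x: float, tol: float = 1e-9) -> bool:
    """Check y in T(x), where T(x) = {sign(x)} for x != 0 and [-1, 1] for x = 0."""
    if x > 0:
        return abs(y - 1.0) <= tol
    if x < 0:
        return abs(y + 1.0) <= tol
    return -1.0 - tol <= y <= 1.0 + tol


def minimal_selection_abs(x: float) -> float:
    """T^0(x): the element of T(x) with the smallest absolute value."""
    if x > 0:
        return 1.0
    if x < 0:
        return -1.0
    return 0.0


def check_yosida_properties(points, lam, tol=1e-9):
    """Return True if (i), (ii) and (iv) hold on the given sample points."""
    for x in points:
        t = yosida_abs(x, lam)
        if not in_subdiff_abs(t, resolvent_abs(x, lam), tol):    # property (i)
            return False
        if abs(t) > abs(minimal_selection_abs(x)) + tol:         # property (iv)
            return False
        for y in points:                                          # property (ii)
            if abs(t - yosida_abs(y, lam)) > abs(x - y) / lam + tol:
                return False
    return True


samples = [-2.0, -0.7, -0.05, 0.0, 0.05, 0.3, 2.0]
print(check_yosida_properties(samples, lam=0.5))
print(check_yosida_properties(samples, lam=0.1))
```

For this operator the Yosida approximation reduces to clipping $x/\lambda$ to $[-1, 1]$, so the checks are a sanity test of the definitions rather than evidence for the general statements.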

The aim of this note is to present an inertial proximal algorithm for the problem

$$\text{find } x \in H \text{ such that } 0 \in T(x) - S(x), \tag{1.1}$$

where $T, S : H \to 2^H$ are two maximal monotone operators on a finite dimensional real Hilbert space $H$. This problem is equivalent to

$$\text{find } x \in H \text{ such that } T(x) \cap S(x) \neq \emptyset. \tag{1.2}$$

This study is important because finding the critical points of the difference of two convex functions is a special case of finding the zeros of the difference of two maximal monotone operators. In fact, an algorithm for the difference of two maximal monotone operators plays a central role in the study of DC programming [8, 9]. Moreover, variational inclusions corresponding to the difference of two monotone operators arise in prox-regularity, multicommodity networks, image restoration, tomography, molecular biology and optimization; see [1, 4, 6, 10] and the references therein.

Problem (1.1) has not been studied extensively; the recent studies are limited to Moudafi [12, 13]. Following [13], a regularization of problem (1.1) is

$$\text{find } x \in H \text{ such that } 0 \in T(x) - S_\lambda(x). \tag{1.3}$$

To find a solution of (1.1), Moudafi [13] proposed a sequence $\{x_n\}$ generated by an iteration, referred to below as (1.4), in which $\mu_n > 0$ and $x_0$ is an initial point.

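Here, problem (1.1) is studied via the following generalization of Moudafi's algorithm in [13]:

$$x_{k+1} = J_{\beta_k}^T\bigl(x_k + \alpha_k (x_k - x_{k-1}) + \beta_k S_{\mu_k} x_k\bigr) \quad \forall k \in \mathbb{N}, \tag{1.5}$$

with starting points $x_0, x_1 \in H$ and sequences $\{\mu_k\}$, $\{\alpha_k\}$ and $\{\beta_k\} \subset [0, +\infty)$ such that

(a) $\lim_{k \to +\infty} \mu_k = 0$;

(b) $\sum_{k=1}^{+\infty} \frac{\beta_k}{\mu_k} < +\infty$;

(c) $\lim_{k \to +\infty} \frac{\alpha_k}{\beta_k} = 0$;

we also suppose that

(d) $\sum_{k=1}^{+\infty} \alpha_k \|x_k - x_{k-1}\| < +\infty$;

(e) $\lim_{k \to +\infty} \frac{\|x_{k+1} - x_k\|}{\beta_k} = 0$.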
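A single step of iteration (1.5) can be sketched in one dimension as follows. This is a minimal illustration, not the authors' implementation: the operators $T(x) = 2x$ and $S = \partial|\cdot|$ are assumed here only because their resolvent and Yosida approximation have closed forms, and all function names are hypothetical. The parameters are held constant for simplicity, so they do not satisfy the summability condition (b); the run only shows the mechanics of the update.

```python
def clip(v: float, lo: float, hi: float) -> float:
    """Clamp v to the interval [lo, hi]."""
    return max(lo, min(hi, v))


def resolvent_T(z: float, beta: float) -> float:
    """J_beta^T(z) for the assumed T(x) = 2x, i.e. the solution x of x + 2*beta*x = z."""
    return z / (1.0 + 2.0 * beta)


def yosida_S(x: float, mu: float) -> float:
    """S_mu(x) for the assumed S = subdifferential of |x|: x/mu clipped to [-1, 1]."""
    return clip(x / mu, -1.0, 1.0)


def inertial_step(x: float, x_prev: float, alpha: float, beta: float, mu: float) -> float:
    """One update of (1.5): resolvent of T applied to the inertial, S-shifted point."""
    return resolvent_T(x + alpha * (x - x_prev) + beta * yosida_S(x, mu), beta)


def run(x0: float, x1: float, alpha: float, beta: float, mu: float, iters: int) -> float:
    """Iterate (1.5) with constant parameters (an assumption made for illustration)."""
    x_prev, x = x0, x1
    for _ in range(iters):
        x_prev, x = x, inertial_step(x, x_prev, alpha, beta, mu)
    return x


# Constant parameters: chosen for illustration, NOT satisfying condition (b).
x_final = run(2.0, 2.0, alpha=0.3, beta=0.5, mu=0.1, iters=200)
print(x_final)
```

With these assumed operators, the point $x = 1/2$ satisfies $2x \in \partial|x|$, so it solves (1.1), and the iterates started at $2$ settle near it. Each step also satisfies the inclusion obtained by unfolding the resolvent, $S_{\mu_k} x_k - (x_{k+1} - x_k)/\beta_k + (\alpha_k/\beta_k)(x_k - x_{k-1}) \in T x_{k+1}$, which can be verified directly from `inertial_step`.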

We note that (1.5) originates from the evolution equation

$$x''(t) + \gamma x'(t) + \nabla f(x(t)) - \nabla g(x(t)) = 0, \tag{1.6}$$

where $\gamma > 0$, while algorithm (1.4) can be derived from

$$x'(t) + \nabla f(x(t)) - \nabla g(x(t)) = 0, \tag{1.7}$$

in which $f, g : H \to \mathbb{R}$ are differentiable convex functions and $\nabla f$ and $\nabla g$ play the roles of the operators $T$ and $S$ in (1.1), respectively.

If $\nabla g(x(t)) = 0$, then (1.6) is the heavy ball with friction (HBF) system, and (1.5) is equivalent to the standard gradient descent iteration (1.4) with an additional inertia (or momentum) term $\alpha_k (x_k - x_{k-1})$. Owing to the inertia term, the solution trajectories of the (HBF) system can converge to a stationary point of $f$ faster than those of the first order system (1.7) with $\nabla g(x(t)) = 0$ [14].
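The effect of the inertia term can be seen on a toy problem. The sketch below assumes $S = 0$ and $T(x) = a x$ on $\mathbb{R}$ (the gradient of $f(x) = a x^2 / 2$), with constant, hand-picked parameters; the function name and the parameter values are illustrative assumptions, and other choices of $\alpha$ may slow the iteration down or cause oscillation.

```python
def proximal_iterates(a: float, beta: float, alpha: float, iters: int, x0: float = 1.0) -> float:
    """Iterate x_{k+1} = J_beta^T(x_k + alpha*(x_k - x_{k-1})) for the assumed T(x) = a*x.

    For this T the resolvent is J_beta^T(z) = z / (1 + beta*a). Both starting
    points are set to x0, so the first step carries no inertia.
    """
    x_prev, x = x0, x0
    for _ in range(iters):
        x_prev, x = x, (x + alpha * (x - x_prev)) / (1.0 + beta * a)
    return x


# Same step size and iteration count, without and with the inertia term.
plain = proximal_iterates(a=1.0, beta=0.01, alpha=0.0, iters=100)
inertial = proximal_iterates(a=1.0, beta=0.01, alpha=0.8, iters=100)
print(abs(plain), abs(inertial))
```

With a small step $\beta a = 0.01$, the plain proximal iteration contracts by the factor $1/1.01$ per step, whereas for $\alpha = 0.8$ the dominant root of the recurrence $1.01\, r^2 - 1.8\, r + 0.8 = 0$ is smaller, so the inertial iterate ends up closer to the zero of $T$ after the same number of steps.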

Another important advantage of algorithm (1.5) over algorithm (1.4) is that it requires only local boundedness of $S$, instead of the boundedness required in (1.4).

In this note, we present different conditions under which (1.5) converges to a solution of (1.1).

Now, we recall some required results and definitions.


Lemma 1.2 [16]. Suppose that $E$ is a reflexive Banach space. A maximal monotone operator $T : E \to 2^{E^*}$ is locally bounded at a point $\bar{x} \in D(T)$ if and only if $\bar{x}$ belongs to the interior of $D(T)$.

Definition 1.3. A set-valued operator $T : H \to 2^H$ is upper semicontinuous at $\bar{x}$ if for any $\varepsilon > 0$ there exists $\delta > 0$ such that

$$\|x - \bar{x}\| \le \delta \Rightarrow T(x) \subseteq T(\bar{x}) + B(0, \varepsilon). \tag{1.8}$$

Lemma 1.4 [2]. Suppose that $E$ is a Banach space. A maximal monotone operator $T : E \to 2^{E^*}$ is demiclosed, i.e., the following conditions hold.

(1) If $\{x_k\} \subset E$ converges strongly to $x_0$ and $\{u_k \in T(x_k)\}$ converges weak* to $u_0$ in $E^*$, then $u_0 \in T(x_0)$.

(2) If $\{x_k\} \subset E$ converges weakly to $x_0$ and $\{u_k \in T(x_k)\}$ converges strongly to $u_0$ in $E^*$, then $u_0 \in T(x_0)$.

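Lemma 1.5 [11]. Suppose that $\{a_n\}$, $\{b_n\}$ and $\{c_n\}$ are three sequences of nonnegative numbers such that

$$a_{n+1} \le (1 + b_n) a_n + c_n \quad \text{for all } n \ge 1.$$

If $\sum_{n=1}^{\infty} b_n < +\infty$ and $\sum_{n=1}^{\infty} c_n < +\infty$, then $\lim_{n \to \infty} a_n$ exists.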

2. MAIN RESULTS

In the following, we improve the conditions of Theorem 2.1 in [13].

Theorem 2.1. Assume that $S$ is locally bounded on $D(S)$ and the solution set $\Omega$ of problem (1.1) is nonempty. If conditions (a), ..., (e) are satisfied and $D(T) \subset D(S)$, then the sequence $\{x_k\}$ generated by (1.5) converges to a solution of (1.1).

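PROOF: Take $x^* \in \Omega$. According to (1.2), there exists $y^* \in T(x^*) \cap S(x^*)$, and, since $y^* \in T(x^*)$, $x^* = J_{\beta_k}^T(x^* + \beta_k y^*)$. From the triangle inequality, (iv), the nonexpansivity of $J_{\beta_k}^T$ and the fact that $S_{\mu_k}$ is Lipschitz with constant $1/\mu_k$, one quickly deduces that

$$\begin{aligned}
\|x_{k+1} - x^*\| &= \|J_{\beta_k}^T(x_k + \alpha_k(x_k - x_{k-1}) + \beta_k S_{\mu_k} x_k) - J_{\beta_k}^T(x^* + \beta_k y^*)\| \\
&\le \|x_k + \alpha_k(x_k - x_{k-1}) + \beta_k S_{\mu_k}(x_k) - x^* - \beta_k y^*\| \\
&\le \|x_k - x^*\| + \alpha_k \|x_k - x_{k-1}\| + \beta_k \|S_{\mu_k}(x_k) - y^*\| \\
&\le \|x_k - x^*\| + \alpha_k \|x_k - x_{k-1}\| + \beta_k \bigl(\|S_{\mu_k}(x_k) - S_{\mu_k}(x^*)\| + \|S_{\mu_k}(x^*) - y^*\|\bigr) \\
&\le \Bigl(1 + \frac{\beta_k}{\mu_k}\Bigr) \cdots
\end{aligned}$$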

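Applying (a) and (b), $\sum_{k=0}^{\infty} \beta_k < \infty$. Also, by (d) and Lemma 1.5, $\lim_{k \to +\infty} \|x_k - x^*\|$ exists. Hence, $\{x_k\}$ is bounded. Since $H$ is finite dimensional, there exist $\tilde{x}$ and a subsequence $\{x_{k_\nu}\}$ such that $\lim_{\nu \to \infty} x_{k_\nu} = \tilde{x}$. We see that $J_{\mu_{k_\nu}}^S x_{k_\nu}$ tends to $\tilde{x}$, because

$$\begin{aligned}
\|J_{\mu_{k_\nu}}^S x_{k_\nu} - \tilde{x}\| &\le \|J_{\mu_{k_\nu}}^S x_{k_\nu} - J_{\mu_{k_\nu}}^S \tilde{x}\| + \|J_{\mu_{k_\nu}}^S \tilde{x} - \tilde{x}\| \\
&\le \|x_{k_\nu} - \tilde{x}\| + \|J_{\mu_{k_\nu}}^S \tilde{x} - \tilde{x}\|,
\end{aligned}$$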

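and $\lim_{\nu \to +\infty} J_{\mu_{k_\nu}}^S \tilde{x} = \mathrm{Proj}_{\overline{D(S)}}\, \tilde{x} = \tilde{x}$. This fact and the local boundedness of $S$ imply that

$$\{S_{\mu_{k_\nu}} x_{k_\nu}\} \subseteq S\bigl(\{J_{\mu_{k_\nu}}^S x_{k_\nu}\}\bigr) \subseteq B, \tag{2.1}$$

where $B$ is a bounded set. Therefore, $\{S_{\mu_{k_\nu}} x_{k_\nu}\}$ is bounded and there exist $\tilde{y}$ and a subsequence $\{S_{\mu_{k_{\nu'}}} x_{k_{\nu'}}\}$ such that $\lim_{\nu' \to \infty} S_{\mu_{k_{\nu'}}} x_{k_{\nu'}} = \tilde{y}$. Then $\tilde{y} \in S(\tilde{x})$ follows from

$$S_{\mu_{k_{\nu'}}} x_{k_{\nu'}} \in S\bigl(J_{\mu_{k_{\nu'}}}^S x_{k_{\nu'}}\bigr), \tag{2.2}$$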

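and Lemma 1.4. Next, by (1.5), we have

$$S_{\mu_{k_{\nu'}}} x_{k_{\nu'}} - \Bigl(\frac{x_{k_{\nu'}+1} - x_{k_{\nu'}}}{\beta_{k_{\nu'}}}\Bigr) + \frac{\alpha_{k_{\nu'}}}{\beta_{k_{\nu'}}} (x_{k_{\nu'}} - x_{k_{\nu'}-1}) \in T x_{k_{\nu'}+1}. \tag{2.3}$$

Letting $\nu' \to +\infty$ in (2.3) and using conditions (c), (e), the boundedness of $\{x_k\}$ and Lemma 1.4, we obtain $\tilde{y} \in T\tilde{x}$. By a procedure similar to the proof of Theorem 2.1 in [13], $\tilde{x}$ is unique. The proof is complete. $\square$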

Example 2.2: A notable example for Theorem 2.1 arises in digital halftoning, a procedure for producing a pattern of pixels from a limited number of colors (here a binary system) so that it resembles a continuous-tone image. In this context, Teuber et al. [17] minimized the difference of two functions, one corresponding to the attraction of the dots by the image gray values and the other to the repulsion between the dots. They denoted a black pixel by 0 and a white pixel by 1 and considered images $u : G \to [0, 1]$ on an integer grid $G := \{1, ..., n_x\} \times \{1, ..., n_y\}$. If $m$ is the number of black pixels generated by the dithering procedure and $p := (p_k)_{k=1}^m = ((p_{k,x}, p_{k,y})^T)_{k=1}^m \in \mathbb{R}^{2m}$ is their position vector, then $|p_k| := \sqrt{p_{k,x}^2 + p_{k,y}^2}$ is the Euclidean norm of the position of the $k$-th black pixel.

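In [17], a minimizer $\hat{p}$ of the functional

$$E(p) = \underbrace{\sum_{k=1}^{m} \sum_{(i,j) \in G} w(i, j) \left| p_k - \begin{pmatrix} i \\ j \end{pmatrix} \right|}_{F(p)} - \underbrace{\lambda \sum_{k=1}^{m} \sum_{l=k+1}^{m} |p_k - p_l|}_{G(p)} \tag{2.4}$$

is sought, where $w := 1 - u$ is the corresponding weight distribution and $\lambda := \frac{1}{m} \sum_{(i,j) \in G} w(i, j)$.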

The two functions $F(p)$ and $G(p)$ are continuous and convex. Since $\partial F$ and $\partial G$ are maximal monotone operators [15] and $\partial G$ is locally bounded on $\mathbb{R}^{2m}$ [7], the problem of finding a minimizer of (2.4) is a special case of (1.1). If conditions (a), ..., (e) are satisfied and $D(\partial F) \subset D(\partial G)$, then by Theorem 2.1 the sequence $\{x_k\}$ generated by (1.5) converges to a minimizer of (2.4).
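To make the functional (2.4) concrete, the following sketch evaluates the attraction term, the repulsion term and their difference for a tiny image. It is only an illustration: the function name, the dictionary representation of the image $u$ and the sample data are assumptions, not taken from [17].

```python
import math


def halftoning_energy(u, points):
    """Evaluate E(p) = F(p) - G(p) as in (2.4).

    u      : dict mapping each grid node (i, j) to a gray value in [0, 1]
    points : list of (x, y) positions of the m black pixels
    """
    m = len(points)
    w = {node: 1.0 - value for node, value in u.items()}   # weights w = 1 - u
    lam = sum(w.values()) / m                               # lambda = (1/m) * sum of w

    # F(p): weighted distances between every point and every grid node.
    attraction = sum(
        w_ij * math.hypot(px - i, py - j)
        for (px, py) in points
        for (i, j), w_ij in w.items()
    )
    # Pairwise distances between distinct points (each pair counted once).
    repulsion = sum(
        math.hypot(points[k][0] - points[l][0], points[k][1] - points[l][1])
        for k in range(m)
        for l in range(k + 1, m)
    )
    return attraction - lam * repulsion


# Assumed sample data: an all-black 2 x 2 image and two points on its diagonal.
u = {(i, j): 0.0 for i in (1, 2) for j in (1, 2)}
energy = halftoning_energy(u, [(1.0, 1.0), (2.0, 2.0)])
print(energy)
```

For this sample, each point is at distances $0, 1, 1, \sqrt{2}$ from the four grid nodes, so $F(p) = 4 + 2\sqrt{2}$, while $\lambda = 2$ and the two points are $\sqrt{2}$ apart, giving $E(p) = 4$ up to rounding.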

In the next result, the local boundedness condition on $S$ in Theorem 2.1 is dropped, and the domain of $S$ is taken to be all of $H$.

Corollary 2.3. Assume that the solution set $\Omega$ of problem (1.1) is nonempty, conditions (a), ..., (e) are satisfied and $D(S) = H$. Then the sequence $\{x_k\}$ generated by (1.5) converges to a solution of (1.3).

PROOF: Since $D(S)$ is open, by Lemma 1.2 the operator $S$ is locally bounded at every point of $D(S)$. The rest of the proof is similar to that of Theorem 2.1. $\square$

Remark 2.4: If $D(S) = H$ and $T - S$ is a monotone operator, then by [3, Theorem 2.1], $T - S$ is maximal monotone. Hence, (1.1) reduces to finding a zero of the maximal monotone operator $T - S$, and the iteration (1.5) becomes $x_{k+1} = J_{\beta_k}^{T-S}(x_k + \alpha_k(x_k - x_{k-1}))$.

Corollary 2.5. Assume that $S$ is bounded-valued (i.e., for all $x \in H$, $Sx$ is a bounded set) and upper semicontinuous at every point of $D(S)$, and that the solution set $\Omega$ of problem (1.1) is nonempty. If conditions (a), ..., (e) are satisfied and $D(T) \subset D(S)$, then the sequence $\{x_k\}$ generated by (1.5) converges to a solution of (1.1).

PROOF: Since $S$ is bounded-valued and upper semicontinuous at every point of $D(S)$, it is locally bounded. The rest of the proof is similar to that of Theorem 2.1. $\square$

Two interesting particular instances of (1.1) are

$$\text{find } x^* \in H \text{ such that } y^* \in T(x^*), \tag{2.5}$$

and

$$\text{find } x^* \in H \text{ such that } x^* \in T(x^*). \tag{2.6}$$

Here it is assumed that $G(S) := H \times \{y\}$ for an arbitrary point $y \in H$ in (2.5), and $G(S) := \{(x, x);\ x \in H\}$ in (2.6).

In the following, we present results for these types of problems.


problem (1.1) is nonempty. If conditions (a), ..., (e) are satisfied and $D(T) \subset D(S)$, then the sequence $\{x_k\}$ generated by (1.5) converges to a solution of (1.1).

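PROOF: It is easy to check that the sequence $\{x_k\}$ is bounded and that there exist $\tilde{x}$ and a subsequence $\{x_{k_\nu}\}$ such that $\lim_{\nu \to \infty} x_{k_\nu} = \tilde{x}$. In the proof of Theorem 2.1 it was shown that $\lim_{\nu \to \infty} J_{\mu_{k_\nu}}^S x_{k_\nu} = \tilde{x}$. Consequently, from

$$S_{\mu_{k_\nu}} x_{k_\nu} - \Bigl(\frac{x_{k_\nu+1} - x_{k_\nu}}{\beta_{k_\nu}}\Bigr) + \frac{\alpha_{k_\nu}}{\beta_{k_\nu}} (x_{k_\nu} - x_{k_\nu-1}) \in T x_{k_\nu+1}, \tag{2.7}$$

the identity $S_{\mu_{k_\nu}}(x_{k_\nu}) = S(J_{\mu_{k_\nu}}^S(x_{k_\nu}))$, the continuity of $S$ and passage to a subsequence, we can arrange that the left side of (2.7) converges to $S(\tilde{x})$. By Lemma 1.4, we see that $S(\tilde{x}) \in T(\tilde{x})$, i.e., $0 \in T(\tilde{x}) - S(\tilde{x})$. $\square$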

Corollary 2.7. Assume that $S : H \to H$ is Lipschitz continuous, the solution set $\Omega$ of problem (1.1) is nonempty and $D(T) \subset D(S)$. If conditions (c), ..., (e) are satisfied and conditions (a) and (b) are replaced by the condition $\sum_{k=1}^{\infty} \beta_k < \infty$, then the sequence $\{x_k\}$ generated by the method

$$x_{k+1} = J_{\beta_k}^T\bigl(x_k + \alpha_k(x_k - x_{k-1}) + \beta_k S(x_k)\bigr)$$

converges to a solution of problem (1.1).

Remark 2.8: All results of this paper are derived from Lemma 1.4. In an infinite dimensional real Hilbert space, the boundedness of the sequence $\{x_k\}$ in Theorem 2.1 implies that there exist a subsequence $\{x_{k_\nu}\}$ and $\tilde{x} \in H$ such that $\{x_{k_\nu}\}$ converges weakly to $\tilde{x}$. The fundamental difficulty in proving $\tilde{y} \in S(\tilde{x})$ and $\tilde{y} \in T(\tilde{x})$ lies in showing the strong convergence of either $\{J_{\mu_{k_\nu}}^S x_{k_\nu}\}$ to $\tilde{x}$ or $\{S_{\mu_{k_\nu}} x_{k_\nu}\}$ to $\tilde{y}$, and of the left side of (2.3) to $\tilde{y}$.

ACKNOWLEDGEMENT

The authors thank the referees for their pertinent and constructive comments.

REFERENCES

1. S. Adly and W. Oettli, Solvability of generalized nonlinear symmetric variational inequalities, J. Austral. Math. Soc. Ser., 40 (1999), 289-300.

2. Y. Alber and I. Ryazantseva, Nonlinear Ill-posed Problems of Monotone Type, Springer, New York, (2006).


4. L. T. H. An and D. T. Pham, The DC programming and DCA revised with DC models of real world nonconvex optimization problems, Ann. Oper. Res., 133 (2005), 25-46.

5. J. P. Aubin and H. Frankowska, Set-valued analysis, Reprint of the 1990 Edition.

6. S. Chandra, Strong pseudo-convex programming, Indian J. Pure Appl. Math., 3 (1972), 278-282.

7. F. H. Clarke, R. J. Stern and G. Sabidussi, Nonlinear analysis, differential equations and control, ser. NATO Sciences Series, Series C: Mathematical and Physical Sciences, Kluwer Academic Publishers, 528 (1999).

8. A. Hamdi, A modified Bregman proximal schemes to minimize the difference of two convex functions, Appl. Math. E-Notes., 6 (2006), 132-140.

9. A. Hamdi, A Moreau-Yosida regularization of a DC functions: application to variational inequality problems, Appl. Math. E-Notes., 5 (2005), 164-170.

10. S. Huda and R. Mukerjee, Minimax second-order designs over cuboidal regions for the difference between two estimated responses, Indian J. Pure Appl. Math., 41(1) (2010), 303-312.

11. D. Lei and L. Shenghong, Ishikawa iteration process with errors for nonexpansive mappings in uniformly convex Banach spaces, Internat. J. Math. and Math. Sci., 24(1) (2000), 49-53.

12. A. Moudafi, On the difference of two maximal monotone operators: Regularization and algorithmic approaches, Appl. Math. Comput., 202 (2008), 446-452.

13. A. Moudafi, On critical points of the difference of two maximal monotone operators, Afr. Mat., (2013). DOI 10.1007/s13370-013-0218-7.

14. N. Qian, On the momentum term in gradient descent learning algorithms, Neural networks, 12(1) (1999), 145-151.

15. R. T. Rockafellar, On the maximal monotonicity of subdifferential mappings, Pacific J. Math., 33 (1970), 209-216.

16. R. T. Rockafellar, Local boundedness of nonlinear monotone operators, Michigan Math. J., 16 (1969), 397-407.