• No results found

Proximal Methods for Elliptic Optimal Control Problems with Sparsity Cost Functional

N/A
N/A
Protected

Academic year: 2020

Share "Proximal Methods for Elliptic Optimal Control Problems with Sparsity Cost Functional"

Copied!
26
0
0

Loading.... (view fulltext now)

Full text

(1)

http://dx.doi.org/10.4236/am.2016.79086

Proximal Methods for Elliptic Optimal

Control Problems with Sparsity

Cost Functional

Andreas Schindele, Alfio Borzì

Institut für Mathematik, Universität Würzburg, Würzburg, Germany

Received 17 March 2016; accepted 27 May 2016; published 30 May 2016

Copyright © 2016 by authors and Scientific Research Publishing Inc.

This work is licensed under the Creative Commons Attribution International License (CC BY). http://creativecommons.org/licenses/by/4.0/

Abstract

First-order proximal methods that solve linear and bilinear elliptic optimal control problems with a sparsity cost functional are discussed. In particular, fast convergence of these methods is proved. For benchmarking purposes, inexact proximal schemes are compared to an inexact semismooth Newton method. Results of numerical experiments are presented to demonstrate the computa-tional effectiveness of proximal schemes applied to infinite-dimensional elliptic optimal control problems and to validate the theoretical estimates.

Keywords

Optimal Control, Elliptic PDE, Nonsmooth Optimization, Proximal Method, Semismooth Newton Method

1. Introduction

In recent years, a great research effort has been made to solve optimization problems governed by Partial Differential Equations (PDEs); see, e.g., [1]-[3] and references therein. In many cases, this research has focused on objective functionals with differentiable L2 terms and non-smoothness resulted from the presence of control and state constraints. However, more recently, the investigation of L1 cost functionals has become a central topic in PDE-based optimization [4]-[6], because they give rise to sparse controls that are advantageous in many applications like optimal actuator placement [4] or impulse control [7].

A representative formulation of optimal control problems with L1 control costs is the following

( ) ( ) ( )

( )

2 2 1

1 2

0

2 2

,

1 min

2 2

s.t. , 0, a.e. in ,

L L L

y u H L

a b

y z u u

c y u u u u

α β

∈ Ω × Ω − + +

= ≤ ≤ Ω

(2)

where c y u

( )

, =0 represents a PDE for the state y including the control u. This problem has been discussed in

[4][5] for the case where c y u

( )

, represents a linear elliptic operator. Nonlinear PDE constraints have been considered in [6]. However, in these references a linear control mechanism is discussed. Concerning the optimization methodology for (1.1), the semi-smooth Newton method has been the solver of choice in [4]-[6].

On the other hand, in the field of signal acquisition and reconstruction, l1-based optimization and sparsity have been exploited to successfully recover “functions” from few samples; see, e.g., [8]-[10].

In this framework, it was shown [11] that l1-based inverse problems in signal recovery can be very efficiently solved by proximal methods. Nowadays, these iterative schemes are the method of choice in magnetic resonance imaging and a special proximal method called “Fast Iterative Soft Thresholding Algorithm” (FISTA) [12] is con- sidered the state-of-the-art method for solving finite-dimensional optimization problems of the following form

2

2 1

min ,

n

x

Ax b

β

x

∈ − +

where the rectangular matrix A represents a blur operator.

We remark that the research and successful application of proximal schemes are attracting attention of many scientists and practitioners, which result in many new developments in this field. We refer to, e.g., [13] for recent results and additional references.

The purpose of our work is to contribute to the field of PDE-based optimization with 1

L control costs by investigating proximal methods in this infinite-dimensional setting. In particular, we aim at implementing and analysing proximal schemes for solving (1.1) that exploit first-order optimality conditions. Our investigation is motivated by the fact that proximal methods may have a computational performance that is comparable to that of semismooth Newton methods. However, in contrast to the latter, proximal schemes do not require the con- struction of second-order derivatives and the implementation of, e.g., a Krylov solver.

For our investigation, we consider (1.1) with elliptic operators and linear and bilinear control mechanisms. Notice that the latter case has been a much less investigated problem. One of our main contributions is to prove convergence for all variants of the proximal schemes that we discuss in this paper. In particular, we prove an

( )

1k2

 convergence rate of the value of reduced cost functional, where k is the number of proximal iterations. This notion of convergence is used in l1-based optimization and in some application fields [14].

We remark that many arguments in our analysis are similar to those presented in the finite-dimensional case. However, some additional arguments are necessary in infinite dimensions, especially regarding the structure of our differential constraints and the discussion of our inexact proximal schemes. We refer to [13] for further results concerning the formulation of proximal schemes for infinite-dimensional optimization problems from a different perspective.

In the next section, we discuss linear and bilinear elliptic optimal control problems, where for completeness, some conditions for the existence of a unique control-to-state operators are considered. Section 3 is devoted to optimal control problems with sparsity costs and governed by elliptic equations with linear and bilinear control mechanisms. We discuss conditions for convexity of the bilinear problem and state the optimality conditions. In Section 4, we present a Fast Inexact Proximal method (FIP) that represents an infinite-dimensional extension of the FISTA method. In Section 5, the convergence rate of this method is proven to be

( )

2

1k

 . In Section 6, an inexact semismooth Newton method in function spaces is presented as the state of the art method for comparison purposes. For completeness, the theory of this method is extended to the case of elliptic bilinear control pro- blems. A numerical comparison of the FIP and Semismooth-Newton methods is presented in Section 7. A section of conclusion completes this work.

2. Elliptic Models with Linear and Bilinear Control Mechanisms

In this section, we discuss elliptic PDE models with linear and bilinear control structures. Notice that these models are already discussed in many references; see, e.g., [2][15]-[17]. However, in this section, we report the main results required for our analysis of convergence of the proposed proximal methods.

Consider the following boundary value problem

in

Ay+ =u f Ω (2.1)

0 on ,

(3)

represents a second-order linear elliptic differential operator of the following form 0 , 1 , n ij

i j j j

Ay a y a y

x x =   ∂ ∂ = − + ∂

such that ai j, ,a0 L

( )

∈ Ω , and ai j, satisfies the coercivity condition

( )

2 1 2

, 1 1

n n

ij j

i j=a x ξ ξ ≥θ j

a.e. in Ω

for some θ>0 and a0≥0. For the existence and uniqueness of solutions to (2.1) see ([15], Section 6). Further, we consider the following bilinear elliptic control problem

in

Ay+uy= f Ω (2.3)

0 on .

y= ∂Ω (2.4) In both linear and bilinear control settings, we require uUad, with the following set of feasible controls

( )

{

2

}

: : a.e. in ,

ad a b

U = uLu ≤ ≤u u Ω (2.5) where ua≤ ≤0 ub, ua <ub.

Now, we discuss the existence of a unique weak solution to (2.3)-(2.4). For this purpose, we need the Poincaré-Friedrichs lemma and denote with cΩ the Poincaré-Friedrichs constant; see, e.g., [15].

We denote ⋅ = ⋅: L2( ) induced by the inner product ⋅ ⋅ = ⋅ ⋅, : , L2( ) . Theorem 2.1. Let uUad, where

0 . a u a c θ Ω

> − − (2.6)

Then, there exists a unique weak solution yH10

( )

to the bilinear elliptic problem (2.3)-(2.4) and the following property holds

( )

1 1 .

H

y C f (2.7) Proof. The proof is immediate using the Lemma of Lax-Milgram and the following result

(

)

(

)

(

)

(

)

(

)

(

)

(

)

1 1 2 0 , 1 2 2 0

2 2 2

0

2 2 2

0

2 2

0 0

, : , d

d

d d d

d d d

min , .

n

ij

i j j i

a

a

a

a H H

y y

a y y Ay uy y a a u y x

x x

y a u y x

c y x c y x a u y x

c y x y x a u y x

c

c u a y C y

c

θ

κ θ κ

θ κ κ θ κ κ Ω = Ω

Ω Ω Ω Ω

Ω Ω Ω  ∂ ∂  = + =  + +  ∂ ∂   ≥ ∇ + + = ∇ + − ∇ + +   ≥ ∇ + + +     ≥ + + − =  

With 0

: 0

1 a

u a c

c θ κ Ω Ω + + = >

+ , we have that κcΩ=ua+a0+θ cΩ−κ and therefore C0:=κcΩ. In the forth

line the Poincaré-Friedrichs was used. Hence, C1:=C0−1. □ Remark 2.1. In the case of Ω =

( )

0,1n, n≤3, and A= −∆, including homogeneous Dirichlet boundary conditions, we have a0 =0, θ=1 and cΩ=1 2n such that we can ensure invertibility for ua> −2n.

Remark 2.2. In order to ensure a unique solution, we require condition (2.6) for the choice of ua in the bilinear case.

Next, we recall the following theorem stating higher regularity of solutions to (2.3)-(2.4); see ([18], Theorem 4.3.1.4).

Theorem 2.2. Let Ω ⊂n, n≤3, be a convex and bounded polygonal or polyhedral domain. If in addition to the assumptions of Theorem 2.1, we have that 1

( )

ij

(4)

holds

( )

(

)

2 ,

H

y Cf + yC f (2.8) for some appropriate constants C C, >0 that only depend on Ω.

Remark 2.3. Because H2

( )

can be embedded in L

( )

for n≤3 [19]and using (2.8), we obtain

( ) .

L

yC f (2.9) Theorem 2.1 and Theorem 2.2 ensure the existence of a unique control-to-state operator

( )

( )

( )

1 2

0

: ad , ,

S UH Ω ∩HuS u (2.10) where in the linear case S ul

( )

=A−1

(

fu

)

represents the unique solution to (2.1) and in the bilinear case

( ) (

)

1

b

S u = A u+ − f is the unique solution to (2.3). In the following, we use the expression S u

( )

when it is valid for both the linear and the bilinear systems.

Remark 2.4. The control-to-state operator S ub

( )

is not Fréchet-differentiable in the 2

L topology since for every ε >0 there is always an hL2

( )

with h

ε

such that u+ ∉h Uad and therefore it is not neces- sarily defined. However, we only need the following weaker form of differentiability, which is a directional dif- ferentiability in all vUad in the directions

(

u v

)

for uUad. This is called Q-differentiability; see [16].

Definition 2.1. (Q-differentiability). Let UX be a convex set and T U: →Y . Then T is called Q-differentiable in vU, if there exists a mapping TU

( )

v ∈

(

X Y,

)

, such that for all uU the following

holds

(

)

( )

( )(

)

0 if 0.

U Y

X X

T v u v T v T v u v

u v u v

+ − − − −

→ − →

In the following, we omit the index U and write T′=TU′. The Q-derivatives of S ub

( )

have the following properties.

Lemma 2.3. The control-to-state-operator Sb is at least two times Q-differentiable in Uad with

( )

2 ,

X Y=L and its derivatives have the following properties for all directions

φ φ

1, 2L2

( )

Ω : i) Sb

( )( )

u

φ

1H01

( )

Ω ∩H2

( )

is the solution yof

( )

1 b .

Ay′+uy′= −

φ

S u (2.11) ii) Sb′′

( )(

u

φ φ

1, 2

)

H01

( )

Ω ∩H2

( )

is the solution y′′ of

( )( )

( )( )

2 b 1 1 b 2 .

Ay′′+uy′′= −

φ

S u

φ

φ

S u

φ

(2.12) iii) The following inequalities hold

( )( )

1 2 1 ,

b

Su φ ≤C φ f (2.13)

( )(

1, 2

)

3 1 2 .

b

S′′ u φ φ ≤C φ φ f (2.14) Proof. Part (i) and (ii) can be shown by direct calculation (see ([16], Lemma 2.9). So part (iii) is left to be proved. If y′:=Sb

( )( )

u

φ

1H10

( )

Ω ∩H2

( )

Ω is a solution to

1

= b( ),

Ay′+uy′ −ϕS u

for uUad and fL2

( )

Ω , by using (2.9), we obtain

( )

( )

( )

1 b 1 b 2 1 ,

L L

y′ ≤C y′ ∞ ≤C φS uCφ S uC φ f (2.15) where the constants depend on the measure of Ω and not on y. Therefore we obtain (2.13) and y′∈L

( )

Ω . conclude Furthermore, let y′′:=Sb′′

( )(

u

φ φ

1, 2

)

H01

( )

Ω ∩H2

( )

Ω be a solution to the following problem

( )( )

( )( )

2 b 1 1 b 2 ,

Ay′′+uy′′= −

φ

S u

φ

φ

S u

φ

for uUad and fL2

( )

Ω . With the same arguments as above and using (2.15), we obtain

( )( )

( )( )

2 b 1 1 b 2 3 1 2 .

(5)

Therefore, we obtain (2.14), which completes the proof. □

3. Elliptic Optimal Control Problems with Sparsity Cost Functional

In this section, we discuss optimal control problems governed by the linear- and bilinear-control elliptic systems discussed in the previous section. We consider the following cost functional

( )

1

2 2

1

, : ,

2 2 L

J y u = yzuu (3.1) where zL2

( )

Ω , yH01

( )

Ω , uUad and

α β

, >0. This functional is made of a Fréchet-differentiable classical tracking type cost with L2-regularization and a nondifferentiable L1-control cost. Using the control- to-state operator (2.10), we have the following reduced optimal control problem

( )

( )

1

2 2

1 ˆ

min : .

2 2

ad L

u U J u S u z u u

α β

∈ = − + + (3.2) The nondifferentiable part J uˆ1

( )

:=

β

u L1 is convex. Therefore, in order to state convexity of the reduced functional J uˆ

( )

, we investigate the second-derivative of the differentiable part ˆ2

( )

: 1

( )

2 2

2 2

J u = S uzu . We have

( )(

)

( )( ) ( )( )

( )

( )(

)

2

ˆ , , , , , .

J′′ u v w = S uv S uw + S uz S′′ u v wv w In particular, in the linear case, we have

( )( )

1 2 2 2

( )

2

ˆ , 0, for all , 0.

J′′ u v v = A v− +α v > vLv ≠ (3.3) We conclude that the reduced functional is strictly convex in the linear case.

In the bilinear case, we have a non-convex optimization problem. However, local convexity can be guar- anteed under some conditions. To be specific, we chose the sufficient condition stated in the following theorem.

Lemma 3.1. Let C u′′

( )

: sup= v1 Sb′′

( )( )

u v v, , if the following inequality holds

( ) ( )

b ,

C′′ u S u − <z α (3.4) then the reduced functional J uˆ

( )

is strictly convex in a neighborhood of uUad.

Proof. Since J uˆ1

( )

:=

β

u L1 is convex, we have to prove that

( )

( )

2 2

2 1 ˆ :

2 b 2

J u = S uzu is strictly convex in u. Therefore we show that the reduced Hessian is positive definite in Uad as follows

( )( )

( )( ) ( )( )

( )

( )( )

( ) ( )

(

)

2

2

ˆ , , , , ,

,

b b b b

b

J u v v S u v S u v S u z S u v v v v

C u S u z v

α α

′′ = ′ ′ + − ′′ +

′′

≥ − −

and thus J uˆ

( )

is strictly convex in u. □

We remark that the result of Lemma 3.1 is well known. It expresses local convexity of the reduced objective when the state function is sufficiently close to the target and the weight of the quadratic L2 cost of the control is sufficiently large. Indeed, local convexity may result with much weaker assumptions. However, since our focus is the investigation of proximal schemes, we make the following strong assumption.

Assumption 1. We assume that (3.4) holds for all uUad.

Because of Lemma 2.3, this assumption holds if the regularization parameter α>C3 f

(

C f + z

)

. In the next step, the first-order optimality conditions for (3.2) are derived. First, we need the definition of the subdifferential.

Definition 3.1. Let H be a Hilbert space and F H: → be convex. We call the set-valued mapping

* :

F H H

∂  given by

( )

:

{

*: ,

( )

( )

for all

}

.

F u g H g v u F v F u v H

∂ = ∈ − ≤ − ∈

(6)

From ([20], Remark 3.2), we obtain that u is a solution of (3.2), if and only if there exists a

λ

∈ ∂J uˆ1

( )

such that

( ) ( )

*

(

)

, 0, for all ad,

S uS u − +z

α

u+

λ

u u− ≥ uU (3.5) where * denotes the adjoint operator. From (3.5), one can derive the optimality system by using the Lagrange multipliers

λ λ

a, bL2

( )

Ω . We have the following theorem (see [4], Theorem 2.1).

Theorem 3.2. The optimal solution u of (3.2) is characterized by the existence of

(

, ,

)

2

( )

2

( )

2

( )

a b L L L

λ λ λ ∈ Ω × Ω × Ω such that

( ) ( )

*

(

)

0,

b a

S uS u − +z αu+ +λ λ λ− = (3.6)

(

)

0, 0, 0,

b ub u b ub u

λ

≥ − ≥

λ

− = (3.7)

(

)

0, 0, 0,

a u ua a u ua

λ

≥ − ≥

λ

− = (3.8)

{

}

a.e. on x :u 0 ,

λ β

= ∈ Ω > (3.9)

{

}

a.e. on x :u 0 ,

λ ≤β ∈ Ω = (3.10)

{

}

a.e. on x :u 0 .

λ

= −

β

∈ Ω < (3.11) If one introduces the parameter µ λ λ λ:= + ba, it is shown in [4] that conditions (3.7)-(3.11) are equivalent

to

(

,

)

0,

B u

µ

= (3.12) where

(

)

{

(

)

}

{

(

)

}

(

)

{

}

{

(

)

}

, : max 0, min 0,

max 0, b min 0, a ,

B u u u c u c

u u c u u c

µ µ β µ β

µ β µ β

= − + − − + +

+ − + − + − + +

where c>0 is arbitrary. With this setting, the optimality system (3.6)-(3.11) reduces to the following

( ) ( )

*

(

)

0,

S uS u − +z αu+ =µ (3.13)

(

,

)

0.

B u

µ

= (3.14) In the linear-control case, the Equation (3.13) becomes the following

(

)

(

)

* 1 0,

AAf u z αu µ

− − − + + = (3.15) where

( )

* 1

A−∗ = A − . By setting y=A−1

(

fu

)

and p:= −A−∗

(

yz

)

, (3.15) becomes p+

α

u+ =

µ

0. We summarize the previous considerations into the following theorem.

Theorem 3.3. (Linear optimality conditions) The optimal solution

(

y u,

)

H01

( )

Ω ×L2

( )

to (3.2) in the linear control case is characterized by the existence of the dual pair

(

p,

µ

)

H01

( )

Ω ×L2

( )

such that

0

Ay+ − =u f (3.16)

*

0

A p+ − =y z (3.17)

0

p+

α

u+ =

µ

(3.18)

(

,

)

0.

B u

µ

= (3.19) Furthermore, the reduced gradient and the reduced Hessian of Jˆ2

( )

u are given by

( )

( )

* 1

2 2

ˆ , and ˆ .

J u

α

u p J′′ u

α

I A A− −

∇ = + = + (3.20)

(7)

For the bilinear-control system, we have S u

( )( )

φ

1 = −

(

A u+

)

−1

φ

1

(

A u+

)

−1f and therefore

( ) ( )

*

(

)

1

(

)

1 1

S u′ φ = − A u+ − f A u+ −∗φ such that (3.13) becomes the following

(

)

1

(

) (

(

)

1

)

0.

A uf A u −∗ A uf z

α

u

µ

− + + + − + + = (3.21) By setting y=

(

A u+

)

−1f and p:= −

(

A u+

) (

−∗ yz

)

this can be written as follows, yp+

α

u+ =

µ

0. We summarize the previous considerations into the following theorem.

Theorem 3.4. (Bilinear optimality system) The optimal solution

(

y u,

)

H01

( )

Ω ×L2

( )

to (3.2) in the bilinear control case is characterized by the existence of the dual pair

(

p,

µ

)

H01

( )

Ω ×L2

( )

such that

(

)

*

0 0 0

, 0.

Ay uy f A p y up z

yp u

B u

α µ

µ

+ − =

+ + − =

+ + =

=

(3.22)

Furthermore, the explicit reduced gradient and the reduced Hessian of Jˆ2

( )

u are given by

( )

2

ˆ ,

J u

α

u py

∇ = + (3.23) and

( )(

)

(

) (

) ( ) (

) (

)

(

) ( )

* 1 *

2 1 2 1 2 2

1

1 2 2

ˆ , ,

, .

J u v v v y A u A u yv y A u pv

v p A u yv αv

− − −

′′ = + + − +

− + +

4. Proximal Methods for Elliptic Control Problems

In this section, we discuss first-order proximal methods to solve our linear and bilinear optimal control problems. The starting point to discuss proximal methods consists of identifying a smooth and a nonsmooth part in the reduced objective J uˆ

( )

. That is, we consider the following optimization problem

( )

1

( )

2

( )

ˆ ˆ ˆ

min : ,

ad

u UJ u =J u +J u

(4.1) where, we assume

( )

1

ˆ is continuous, convex and nondifferentiable

J u (4.2)

( )

2

ˆ is Q-differentiable with respect to , convex, and has Lipschitz-continuous gradient:

ad

J u U

(4.3)

( )

( )

( )

2 2 2

ˆ ˆ ˆ , , ,

ad

J u J v L J u v u v U

∇ − ∇ ≤ − ∀ ∈ (4.4) where L J

( )

ˆ2 >0. Notice that our optimal control problem (3.2) has this additive structure where (4.2) holds for

( )

1

1 ˆ

L

J u =

β

u and ˆ2

( )

1

( )

2 2

2 2

J u = S uzu is at least two times Q-differentiable, it is convex under appropriate conditions discussed in the previous section, and it has Lipschitz-continuous gradient that we prove in the following lemma.

Lemma 4.1. The functional ˆ2

( )

1

( )

2 2

2 2

J u = S uzu has a Lipschitz-continuous gradient for

( )

1

(

)

S u =Afu (linear-control case) and for S u

( ) (

= A u+

)

−1 f (bilinear-control case). Proof. For the linear-control case, we have

( )

( )

(

)

(

)

(

2 2

)

1

2 2

1 1

,

ˆ ˆ

, L L

J u J v u v A A u v

u v A A u v A A u v

α

α α

−∗ −

−∗ − −∗ −

∇ − ∇ = − + −

(8)

such that we have the Lipschitz-constant

( )

2

(

1 2 2

)

, ˆ

L L

L J = α+ A A−∗ − .

For the bilinear-control case, we use the mean value theorem. There exists a ξ ∈Uad such that

( )

( )

( )

( )( )

( )( )

( )

( )(

)

( )

( )(

) ( )( )

( )(

) ( )

(

)

2 2

2

2 2 2 2 2

, 1 , 1

, 1

2 2

2

2 1 3 3

ˆ ˆ sup ˆ ˆ sup ˆ ,

sup , , , ,

,

h L h h L h

b b b b

h L h

J u J v J u h J v h J h u v

S u v S h S u v h S z u v h

C f C C f C f z u v

ξ

ξ ξ ξ ξ α

α

∈ Ω ≤ ∈ Ω ≤

∈ Ω ≤

′ ′ ′′

∇ − ∇ ≤ − = −

′ ′ ′′

= − + − − + −

≤ + − + −

for the last inequality, we use (2.7), (2.13), (2.14), which completes the proof. □ The following lemma is essential in the formulation of proximal methods.

Lemma 4.2. Let J2 be Q-differentiable with respect to Uad, and it has a Lipschitz continuous gradient with Lipschitz constant L J

( )

2 . Then for all LL J

( )

2 >0, the following holds

( )

( )

( )

2

2 2 2

ˆ ˆ ˆ , , , .

2 ad

L

J uJ v + ∇J v u− +v uvu vU (4.5) Proof.

( )

( )

( )

(

(

)

)

( )

( )

( )

(

(

)

)

( )

( )

( )

( )

( )

1

2 2 2 0 2 2

1

2 2 0 2 2

1 2

2 2 0

2

2 2

ˆ ˆ ˆ , ˆ ˆ , d

ˆ ˆ , ˆ ˆ d

ˆ ˆ , d

ˆ ˆ , .

2

J u J v J v u v J v t u v J v u v t

J v J v u v J v t u v J v u v t

J v J v u v Lt u v t

L

J v J v u v u v

= + ∇ − + ∇ + − − ∇ −

≤ + ∇ − + ∇ + − − ∇ −

≤ + ∇ − + −

≤ + ∇ − + −

Notice that L:=L J

( )

ˆ2 represents the smallest value of L such that (4.5) is satisfied. We remark that the discussion that follows is valid for LL J

( )

ˆ2 as in Lemma 4.5. However, as we discuss below, the efficiency of our proximal schemes depends on how close is the chosen L to the minimal and optimal value L J

( )

ˆ2 . Now, since this value is usually not available analytically, we discuss and implement below some numerical strategies for determining a sufficiently accurate approximation of L J

( )

ˆ2 . In particular, we consider a power iteration

[21], and the backtracking approach discussed in Remark 5.1.

Further, notice that also in the case of choosing LL J

( )

ˆ2 , our proximal scheme still converges with rate

1k (resp. 1k2) times a convergence constant. However, this convergence constant grows considerably as L becomes larger and therefore the convergence of the proximal method appears recognizably slower. On the other hand, if L is chosen smaller than the Lipschitz constant, then convergence cannot be guaranteed.

The strategy of the proximal scheme is to minimize an upper bound of the objective functional at each iteration, instead of minimizing the functional directly. Lemma 4.2 gives us the following upper bound for all

ad

vU . We have

( )

( )

{

}

( )

( )

( )

2

1 2 1 2 2

ˆ ˆ ˆ ˆ ˆ

min min , ,

2

ad ad

u U u U

L

J u J u J u J v J y u v u v

∈ ∈

 

+ ≤ + + ∇ − + −

 

where, we have equality if u=v. Furthermore, we have the following equation

( )

( )

( )

( )

( )

2

1 2 2

2

1 2

ˆ ˆ ˆ

argmin ,

2 1

ˆ ˆ

argmin .

2

ad

ad

u U

u U

L

J u J v J v u v u v

L

J u u v J v

L

+ + ∇ − +

 

 

 

   

=  + − − ∇ 

 

 

(4.6)

(9)

( )

( )

1

2 2

1

argmin for any ,

2

ad ad

U

u U∈ τ uL u v τ v v L

+ =

 

  

where the projected soft thresholding function is defined as follows

( )

{

}

{

( )

}

( )

{

}

{

}

{

( )

}

min , on :

: 0 on :

max , on :

ad

b

U

a

v u x v x

v x v x

v u x v x

τ

τ τ

τ

τ τ

 − ∈ Ω >



= ∈ Ω ≤

+ ∈ Ω < −



Proof. There exists a

γ

( )

u ∈ ∂ u L1, the subdifferential of 1

L

⋅ such that the solution 1

2 1 : argmin

2

ad

u U L

u = ∈ τ u + uv 

  fulfills the following variational inequality; see, e.g., ([20], Remark 3.2);

( )

, 0, ad.

u− +v τγ u uu ≥ ∀ ∈u U (4.7) Now, we show that uˆ : Uad

( )

v

τ

= fulfills (4.7). The following investigation of the different cases is meant to be pointwise. We have

v− >τ ub≥0:

It follows that uˆ=ub and therefore

γ

( )

uˆ =1 such that

(

ub− +v

τ

)(

u ub

)

≥ ∀ ∈0, u Uad.

• 0< − <v τ ub:

It follows that ˆu= − >v τ 0 and

γ

( )

uˆ =1 such that

(

u vˆ− +

τ

)(

u ub

)

= ∀ ∈0, u Uad.

v

τ

:

It follows that ˆu=0 and γ

( )

uˆ v B1

( )

0

τ

= ∈ such that

(

)

ˆ v ˆ 0, ad

u v τ u u u U

τ

− +   = ∀ ∈

  

 

  .

ua< + <v τ 0:

It follows that ˆu= + <v τ 0 and therefore

γ

( )

uˆ = −1 such that

(

u vˆ− −

τ

)(

u u−ˆ

)

= ∀ ∈0, u Uad

v+ <τ ua≤0:

It follows that uˆ=ua and therefore

γ

( )

uˆ = −1 such that

(

ua− −v

τ

)(

u ua

)

≥ ∀ ∈0, u Uad.

Based on this lemma, we conclude that the solution to (4.6) is given by

( )

( )

2

( )

1 2 2

1 1

ˆ ˆ ˆ

argmin ,

2

ad ad

U u U

L L

J u u v J v v J v

L β L

 

+ − ∇  =− ∇

   

 

  

thus obtaining an approximation to the optimal u sought. Therefore we can use this result to define an iterative scheme as follows

(

)

1 2 1

1 ˆ

,

ad

U

k k k

L

u u J u

L

β  − − 

− ∇

 

(10)

Algorithm 1 (Proximal (P) method) Require:

β

, Jˆ2, u0, Uad, TOL

Initialize: L=L J

( )

ˆ2 ; v0 =u0; t0=1; B0 =1, k=1; while Bk1 >TOL do

1. Uad 1 1 ˆ2

(

1

)

k k k

L

u u J u

L

β  − − 

= − ∇

 

2. µk = −αukS u

( ) ( )

k *

(

S ukz

)

(3.13) 3. Bk =B u

(

k,

µ

k

)

4. k= +k 1 end while

This scheme is discussed in [9][12] for the case of finite-dimensional optimization problems. Notice that the iterated thresholding scheme discussed in [9] coincides with Algorithm 1 for the special case L=1. The con- vergence results for Algorithm 1 presented in [12] can be extended to our elliptic control problems, using the theoretical results presented above. Therefore we can state the following theorem.

Theorem 4.4. Let

{ }

uk be a sequence generated by Algorithm 1 and *

u be the solution to (3.2) with linear- or bilinear-control elliptic equality constraints. Then, for every k≥1 the following holds

( )

( )

( )

2 *

2 0

* ˆ

ˆ ˆ .

2 k

L J u u

J u J u

k

− ≤

In [22], an acceleration strategy for proximal methods applied to convex optimization problems fulfilling (4.4) is formulated, which improves the rate of convergence of these schemes from 

( )

1k to

( )

2

1k

 . Speci- fically, one defines the sequence

{

t vk, k

}

with

2

0 1, k: 1 1 4k 1 2 ,

t = t = + + t − (4.8) and

(

1

) (

)

0 0 1

1

: , k: k k k k .

k

t

v u v u u u

t

= = + − (4.9)

Correspondingly, the optimization variable uk is updated by the following

(

)

1 2 1

1 ˆ

.

ad

U

k k k

L

u v J v

L

β − −

 

− ∇

 

This procedures is summarized in the following algorithm Algorithm 2 (Fast proximal (FP) method) Require:

β

, Jˆ2, u0, Uad, TOL

Initialize: L=L J

( )

ˆ2 ; v0 =u0; t0=1; B0 =1, k=1; while Bk1 >TOL do

1. Uad 1 1 ˆ2

( )

1

k k k

L

u v J v

L

β − −

 

= − ∇

 

2. µk = −αukS u

( ) ( )

k *

(

S ukz

)

(3.13) 3. Bk =B u

(

k,

µ

k

)

4.

2 1

1 1 4

2 k k

t

t = + + −

5. 1

(

)

1 1

k

k k k k

k t

v u u u

t

 − 

= +  −

 

(11)

The following convergence result represents an extension of ([12], Theorem 4.4). We have Theorem 4.5. Let

{ }

uk be a sequence generated by Algorithm 2 and

*

u be the solution of (3.2) with linear- or bilinear-control elliptic equality constraints. Then, for every k≥1 the following holds

( )

( )

( )

(

)

2 *

2 0

*

2 ˆ 2

ˆ ˆ .

1 k

L J u u

J u J u

k

− ≤

+

Algorithm 1 and Algorithm 2 require the calculation of

( )

*

(

1

(

)

)

(

)

2

ˆ linear

J u AAf u z αu

∇ = − − − +

( )

(

) (

(

)

1

)

(

)

1

(

)

2 ˆ

resp. ∇J u = − A u+ −∗ A u+ − fz A u+ − f +

α

u bilinear .

However, the exact inversion of a discretized elliptic differential operator A may become too expensive. Therefore one has to use iterative methods; e.g., the conjugate gradient method [23]. For this reason, we discuss an inexact version of the proximal scheme, where the equality constraints and the corresponding adjoint equa- tions are solved up to a given tolerance quantified by ε>0. In the following, we denote with ∇εJˆ2

( )

u the inexact gradient that corresponds to an inexact inversion of the equation Ay= −f u, resp.

(

A u y+

)

= f , that results in an approximated state variable yε, resp. pε, in the following sense

(

)

, resp. .

Ayε − +f u ≤ε A u y+ ε− f ≤ε Hence, there exists an e∈L2

( )

Ω with e <

ε

such that

(

)

(

) (

1

)

1 , resp. .

yε =Af − +u eyε = A u+ − f +e (4.10) We denote the inexact inversion method for the problem y=g, with an error yε −g ≤ε, with

(

, ,

)

invg

ε

. With this notation, the inexact gradient computation is illustrated in Algorithm 3. Algorithm 3 (Calculation of the inexact gradient ∇εJˆ2

( )

u )

Require:

β

, Jˆ2, u0, Uad, TOL

1. yε =inv A f

(

, −u

)

, resp. yε =inv A u f

(

+ , ,ε

)

2.

(

*

)

(

*

)

, , , resp. , ,

pε =inv A zyε ε pε =inv A +u zyε ε 3. ∇εJˆ2

( )

u = pε+αu, resp. ∇εJˆ2

( )

u = p yε ε+αu

With this preparation, we formulate our inexact proximal (IP) scheme with Algorithm 4. Algorithm 4 (Inexact proximal (IP) method)

Require:

β

, Jˆ2, u0, Uad, TOL, ε0

Initialize: L=L J

( )

ˆ2 ; v0 =u0; t0=1; B0 =1; k=1 while Bk−1 >TOL do

1. : 0

k k

ε

ε =

2. ad 1 1 ˆ2

(

1

)

k

U

k k k

L

u u J u

L ε

β − −

 

= − ∇

 

3. µk = −αukS u

( ) ( )

k *

(

S ukz

)

(3.13) 4. Bk =B u

(

k,

µ

k

)

5. k= +k 1 end while

(12)

method.

Algorithm 5 (Fast inexact proximal (FIP) method) Require:

β

, Jˆ2, u0, Uad, TOL, ε0

Initialize: L=L J

( )

ˆ2 ; v0 =u0; t0=1; B0 =1, k=1; while Bk1 >TOL do

1.

(

0

)

3 :

1 k

k

ε

ε

=

+

2. ad 1 1 ˆ2

( )

1 k

U

k k k

L

u v J v

L ε

β − −

 

= − ∇

 

3. µk = −αukS u

( ) ( )

k *

(

S ukz

)

(3.13) 4. Bk =B u

(

k,

µ

k

)

5.

2 1

1 1 4

2 k k

t

t = + + −

6. 1

(

)

1 1

k

k k k k

k t

v u u u

t

 − 

= +

 

7. k= +k 1 end while

5. Convergence Analysis of Inexact Proximal Methods

In this section, we investigate the convergence of our IP and FIP schemes. Notice that our analysis differs from that presented in [12] where finite-dimensional problems and exact inversion are considered. We start investigat- ing the error of the inexact gradient Jˆ2′ε

( )

u .

Lemma 5.1. The following estimate holds

( )

( )

2 2

ˆ ˆ ,

J u J u c

ε ε

∇ − ∇ ≤

for somec > 0.

Proof. In the linear case, we have

( )

*

(

1

(

)

)

2

ˆ

J u AAf u z αu

∇ = − − − + . Using (4.10) there exist the errors

( )

2 1, 2

e eL

  with e 1 ,e2 <

ε

such that

( )

( )

*

(

1

(

)

)

*

(

1

(

)

)

2 2 1 2

* 1 *

1 2

ˆ ˆ

,

J u J u A A f u e z e A A f u z

A A e A e c

ε

ε

− − − −

− − −

∇ − ∇ = − − + − + + − −

= − + <

 

  (5.1)

where

1 .

c= A A−∗ − + A−∗ (5.2) In the bilinear case, we have ∇Jˆ2

( )

u = −

(

A u+

) (

−∗

(

A u+

)

−1 fz

)

(

A u+

)

−1f +

α

u. Furthermore, Theorem 2.1 implies that the solution y:=

(

A u+

)

−1g of the equation Ay uy+ =g has the following property

(

)

1 2

( )

1 , for all .

A u+ − gC g gL Ω (5.3)

We also have

(

A u+

)

−∗g =

(

A*+u

)

−1gC g1 .

(13)

( )

( )

(

) (

(

) (

)

)

(

) (

)

(

) (

(

)

)

(

)

(

) (

(

)

)

(

) (

)

(

) (

(

)

)

(

)

(

)

(

)

(

(

)

)

(

) (

(

)

)

(

)

(

)

(

)

(

)

2 2

1 1

1 2 3

1 1

1 1

1 2 3

1 1

3

1 1

1 2 3

1 1

3

1

1 1 2 1 1 3

1 1

ˆ ˆ

( )

J u J u

A u A u f e z e A u f e

A u A u f z A u f

A u A u e e A u f e

A u A u f z A u e

A u A u e e y A u e

A u A u f z A u e

C A u e e C f C e

C y z C

ε

−∗ − −

−∗ − −

−∗ − −

−∗ − −

−∗ − −

−∗ − −

∇ − ∇

= − + + + − + + +

+ + + − +

= − + + + + +

− + + − +

≤ + + + + +

+ + + − +

≤ + + +

+ +

  

  

  

  

(

)

(

) (

)

3 2

1 1 1 ,

e

CCε ε f ε C f z ε cε

+ + + +

where

2

1 2 1 1 1 .

c=C C f + f +C + + z (5.4) For the three last inequalities, we use (5.3), (2.7), and ε<1. □ We refer to the estimation error of the inexact gradient in step k as follows

( )

( )

2 2

ˆ ˆ

: , where .

k k k k k

e = ∇εJ u − ∇J u ec

ε

Now, we define

( )

1

( )

( )

2

2 2

ˆ ˆ

, : , ,

2

L L

L Q u vu +J v + ∇J v u− +v uv

( )

1

( )

( )

2

2 2

ˆ ˆ

, : , , ,

2

e

L L

L

Q u vu +J v + ∇J v u− +v uv + e uv

and

( )

: argmin

{

( )

,

}

,

ad

e e

L u U L

P v = Q u v (5.5) such that one step of Algorithm 4, resp. Algorithm 5, can be written as follows

(

)

( )

1 1

1 , resp. 1 .

k k

e e

k L k k L k

u Pu u Pv

− −

= =

In order to prove the convergence of the IP method, we need the following two lemmas.

Lemma 5.2. For any vUad, one has w=PLe

( )

v iff there exists

γ

( )

v ∈ ∂ wL1, the subdifferential of 1

L

⋅ , such that

( )

(

)

( )

2

ˆ , 0, .

ad

J v L w v βγ v e u w u U

∇ + − + + − ≥ ∀ ∈ (5.6) Proof. This is immediate from the variational inequality of (5.5). For a proof see, e.g., [20]. □ Lemma 5.3. Let vUad and L>L J

( )

ˆ2 , then for any uUad, we have

( )

(

( )

)

( )

2

( )

( )

ˆ ˆ , , .

2

e e e e

L L L L

L

J uJ P vP vv +L v u Pv − +v P vu e

Proof. From (4.5), we have

( )

(

)

(

( )

)

ˆ e e , ,

L L L

(14)

and therefore

( )

(

( )

)

( )

(

( )

)

ˆ ˆ e ˆ e , .

L L L

J uJ P vJ uQ P v v (5.7) Now since

β

L1 and Jˆ2 are convex, we have

( )

( )

( )

( )

( )

( )

1 1 , and ˆ2 ˆ2 , ˆ2 .

e e

L L

L L

u P v u P v v J u J v u v J v

β ≥β + − βγ ≥ + − ∇

Summing the above inequalities gives

( )

( )

1

( )

( )

2

( )

2

( )

ˆ e e , ˆ , ˆ ,

L L L

J u ≥β P v + uP v βγ v +J v + u− ∇v J v (5.8) so using (5.6), (5.8), and the definition of QL in (5.7) gives the following

( )

(

( )

)

( )

( )

( )

( )

( )

( )

( )

( )

( )

( )

( )

2 2 2 2

ˆ ˆ , ˆ

2

, ,

2

, , .

2

e e e

L L L

e e e e

L L L L

e e e

L L L

L

J u J P v P v v u P v J v v

L

P v v L u P v v P v P v u e

L

P v v L v u P v v P v u e

βγ

− ≥ − − + − ∇ +

≥ − − + − − + −

= − + − − + −

Now, we prove a  convergence rate for Algorithm 4 (IP scheme).

Theorem 5.4. Let

( )

uk be the sequence generated by Algorithm 4 and u* be the solution of (3.2) with linear or bilinear elliptic equality constraints; let c be determined by (5.2) resp. (5.4). Then for any k≥1, we have

( )

( )

( )

2 *

2 0 0

*

ˆ 2

ˆ ˆ .

2

b a

k

L J u u c u u

J u J u

k

ε

− + −

− ≤ (5.9)

Proof. Using Lemma 5.3 with *

u=u , v=un and L=L J

( )

ˆ2 we obtain

( )

(

)

(

*

)

2 * *

1 1 1 1

2 2

* * *

1 1

2 2

2 , ,

2

, .

n n n n n n n k

n n n k

J u J u u u u u u u u u e

L L

u u u u u u e

L

+ + + +

+ +

− ≥ − + − − + −

= − − − + −

Summing this inequality over n=0,,k−1 gives

( )

1

(

)

2 2 1

* * * *

1 0 1

0 0

2 2 2

, , .

k k

n k n k k

n n

kJ u J u u u u u u e k u e

L L L

− −

+ +

= =

+ +

 

(5.10)

Using again Lemma 5.3 with u= =v un, we obtain

( )

(

)

(

)

2

1 1 1

2 2

, .

n n n n n n k

J u J u u u u u e

L − + ≥ − + +L + −

Multiplying this inequality by n and summing again over n=0,,k−1 gives

( ) (

) (

)

(

)

(

)

(

)

(

)

1 1 1 0 1 1 2

1 1 1

0 0

2

1

2

, 1 , , ,

k

n n n

n

k k

n n n k n k n k

n n

nJ u n J u J u

L

n u u n u e n u e u e

L − + + = − − + + + = = − + + ≥ − + − + + −

which simplifies to the following

( )

1

(

)

1 1

2

1 1 1

0 0 0

2 2 2

, , .

k k k

k n n n k k n k

n n n

kJ u J u n u u k u e u e

L L L

− − −

+ + +

= = =

+ +

 

(5.11)

(15)

( )

( )

(

)

2 1 2

2 * * * * 1 0 0 2 2 , , k

k k n n k k

n

k

J u J u u u n u u u u k u u e

L L

+ =

− ≥ − +

− − − + −

and hence with 0

k k

ε

ε = and uau uk, *≤ub, it follows that

( )

( )

( )

( )

2 2 * * 0 0 * * * 2 2 * * *

2 0 0 2 0 0

,

2 2

ˆ 2 ˆ 2

.

2 2

k k k k

k b a

L u u L u u

J u J u L u u e c u u

k k

L J u u c u u L J u u c u u

k k ε ε ε − − − ≤ + − ≤ + − − + − − + − ≤ ≤ □

Next, we present a convergence result for the FIP method. For this purpose, we need the following lemma. Lemma 5.5. Let

( )

uk ,

( )

vk and

( )

tk be the sequences generated by Algorithm 5, let ek be the error of the inexact gradient, and let u* be the solution to (3.2), then for any k≥1, we have

2 2

2

1 1 1 1

2 2 2

, ,

k k k k k k k k k

t w t w r r t r e

L − −L + ≥ + − +L + with

( )

( )

*

:

k k

w =J uJ u , rk:=tk1uk

(

tk1−1

)

uk1u*.

Proof. We apply Lemma 4.2 at the points

(

u:=u vk, :=vk

)

and likewise at the points

(

*

)

: , : k u =u v =v . We obtain the following

(

)

2

1 1

1 1 1 1

2

1 * 1 *

1 1 1 1

2 2 , 2 , ,

2 2 , 2 , ,

k k k k k k k k k k k

k k k k k k k k

L w w u v u v v u L u u e

L w u v u v v u L u u e

− − − + + + − − − + + + − ≥ − + − − + − − ≥ − + − − + −

where we used the fact that uk+1=PLe

( )

vk . Now, we multiply the first inequality above by

(

tk−1

)

and add it

to the second inequality to obtain the following

(

)

(

)

2

(

)

*

1 1 1

* 1

2

1 2 , 1

2 2

, , .

k k k k k k k k k k k k k

k k k k k k

t w t w t u v u v t v t u u

L

t u u e u u e

L L

+ + +

+

− − ≥ − + − − − −

+ − + −

Multiplying this inequality by tk and using tk21= −tk2 tk, which holds due to (4.8), we obtain

(

)

(

)

(

)

(

)

2

2 2 *

1 1 1 1

* 1

2

2 , 1

2

1 , .

k k k k k k k k k k k k k k

k k k k k k

t w t w t u v t u v t v t u u

L

t t u t u u e

L

− + + +

+

− ≥ − + − − − −

+ − − −

Applying the Pythagoras relation ab2+2 ba a, −c = bc2− −a c2, to the right-hand side of the last inequality with a:=t v bk k, :=t uk k+1, :c =

(

tk −1

)

uk+u*, we obtain

(

)

(

)

(

)

(

)

2 2

2 2 * *

1 1 1

* 1

2

1 1

2

1 , .

k k k k k k k k k k k k

k k k k k k

t w t w t u t u u t v t u u

L

t t u t u u e

L

− + +

+

− ≥ − − − − − − −

+ − − −

Therefore, with vk (see (4.9)) and rk defined as

(

)(

)

(

)

*

1 1 1 , : 1 1 1 1 ,

k k k k k k k k k k k k

t v =t u + t uu r =tutuu

it follows that

2 2

2

1 1 1 1

2 2 2

, .

k k k k k k k k k

t w t w r r t r e

L − −L + ≥ + − +L +

Figure

Table 1sin πx11 256
Table 1. Example 1: Comparison of the FIP, FIPB and ISSN methods.
Figure 1. Validation of the theoretical upper bound (Theorem 58).

References

Related documents

The Bendigo Health Outreach is modelled in part on the Cancer Patients’ Legal Service at the Peter McCallum Institute where practitioners from Baker Mackenzie Lawyers provide pro

It was the purpose of this investigation to describe the radiographic hip joint phenotype of the Pembroke Welsh Corgi and to test two diametrically opposed

FIGURE 1-17: Sample customer data for Colonial Adventure Tours.. Introduction to Colonial Adventure Tours Database

When the OneSource link on the ReferenceUSA Home page is selected the OneSource Global Business Browser database is opened. For instructions on how to use this database please refer

Five attributes that may be key to a good surgical teaching program were identified: relevant course content; reinforcement of material (covered in lectures, then again in

In the event the Rate of Interest cannot be determined in accordance with the foregoing provisions by the Principal Paying Agent (or such other party responsible for

Th is clear correlation demonstrated between skin and transplanted organ histo- pathology after solid organ (kidney) transplantation concurs with a previous report of a