Primal-dual algorithm and ADMM for digital image processing

by

Huichen Liang

u5606165

degree of Master of Mathematical Science


Acknowledgements

I would like to express my special appreciation and thanks to my supervisor, Dr. Qinian Jin. You have been a tremendous mentor for me. I would like to thank you for encouraging my research and for your brilliant comments. The biweekly meetings are very much appreciated.

Special thanks go to my best friend Jingyue Lu. Words cannot express how grateful I am to you. Your advice on my study and life has been invaluable.

I would especially like to thank my family. You always encourage me to try new things. Your support was vital for me to change my program from geophysics to mathematics.

This thesis introduces two algorithms for removing noise and blur from images. In the first part, we discuss the primal-dual algorithm, which is efficient for solving non-smooth convex problems. For the general problem, this method converges to a saddle point with rate $O(1/N)$ in finite-dimensional Hilbert spaces. Furthermore, when either the primal or the dual objective is uniformly convex, the convergence rate improves to $O(1/N^2)$. When both the primal and the dual objectives are uniformly convex, the convergence rate improves to $O(\omega^{N/2})$. Since the primal-dual algorithm is sensitive to the regularization parameter and depends on the solvability of the dual problem, in the second part we introduce a method that uses the alternating direction method of multipliers (ADMM) strategy by adding just one new variable. This method converges to the solution when the data are exact.

Acknowledgements

1 Introduction

2 First order primal-dual algorithms
   2.1 The primal-dual algorithm
   2.2 Acceleration: $G$ is strongly convex
   2.3 Acceleration: $G$ and $F^*$ are both strongly convex

3 ADMM regularization Algorithm
   3.1 ADMM algorithm and basic estimates
   3.2 Convergence analysis of exact data case
   3.3 Regularization: noisy data case

4 Experiments
   4.1 The Primal-Dual Algorithm
      4.1.1 Total Variation Based Image Denoising
         4.1.1.1 The ROF Model
         4.1.1.2 The TV-$L^1$ Model
         4.1.1.3 The Huber-ROF Model
      4.1.2 Advanced Imaging Problems
         4.1.2.1 Image Deconvolution and Zooming
   4.2 Experiments of ADMM Algorithm
   4.3 Conclusion

Bibliography

Introduction

The first permanent photograph of a camera image was made in 1826 by Joseph Nicéphore Niépce. Since then, more and more people have become accustomed to using cameras to record meaningful scenes. We would like photos to be as clear as possible; however, every photo is more or less blurry. Noise and blur are produced not only by the camera but also by the sensor and circuitry of a scanner. Since noise and blur are unavoidable, this thesis introduces some algorithms, with applications, for removing these degradations.

A digital image is a numeric representation of a two-dimensional image. The elements that form a digital picture are called pixels. We can use an intensity value to represent each pixel; for example, (1,0,0) represents the colour red in Matlab. Furthermore, a digital image can be stored as a matrix. In particular, a grayscale image can be represented by a two-dimensional matrix in which each element determines the intensity of the corresponding pixel. A colour image can be represented by a three-dimensional array whose elements, integers between 0 and 255, determine the intensity of each pixel in the corresponding colour channel. Once the image has been transformed into a matrix, we can perform calculations on it.

Many methods have been proposed in the literature for solving ill-posed inverse imaging problems. These problems can usually be divided into two classes: convex problems and non-convex problems. For convex problems, we can usually obtain a global optimum whose accuracy depends only on the model and is independent of the initialisation and the optimisation algorithm. For non-convex problems, we can compute a more precise solution; however, the solution is sensitive to the initialisation and the optimisation algorithm.

Therefore, in this paper, we will introduce two algorithms: the primal-dual algorithm

and the alternating direction method of multipliers (ADMM) to solve ill-posed inverse

imaging problems.

Total variation minimisation is vital to convex methods for imaging. It is based on the principle that noisy signals have a high total variation. The method was introduced for image denoising by Rudin, Osher and Fatemi [11]. Its advantage is that it allows sharp discontinuities in the result. However, because the total variation is non-smooth, it is not easy to minimise. Therefore we first introduce the primal-dual algorithm for non-smooth convex optimisation problems.

The second chapter of this thesis, which is based on [4], discusses the primal-dual algorithm. In this chapter we focus on using the primal-dual algorithm to solve the generic saddle-point problem

\[
\min_{x\in X}\max_{y\in Y}\ \langle Kx, y\rangle + G(x) - F^*(y),
\]

where $X$ and $Y$ are two finite-dimensional real vector spaces. We analyse the algorithm under several assumptions. First, we discuss the basic algorithm under the assumption that $G$ and $F^*$ are both proper, convex, lower semi-continuous functions. Then Algorithm 1 converges at the rate $O(1/N)$, and we have $x_n \to x^*$, $y_n \to y^*$. In the second section of Chapter 2, we additionally assume that $G$ is uniformly convex. Algorithm 1 with constant step length still converges only at the rate $O(1/N)$, so we consider updating the step lengths in each iteration. This gives Algorithm 2, in which the step lengths depend on $n$, and for which we deduce the convergence rate $O(1/N^2)$. In the next section, assuming that both $G$ and $F^*$ are uniformly convex and using a specific constant step length, we deduce that Algorithm 3 has the convergence rate $O(\omega^{N/2})$.

Variational regularization methods are a common tool for solving these inverse problems. The total variation regularization mentioned above is one kind of variational regularization method. From the experiments, however, we find that the choice of the regularization parameter is very important for denoising. When it is very small, the denoising effect is not apparent, while when it is very large, the total variation term plays an

adding a variable $y = Wx$ such that

\[
\begin{aligned}
&\text{minimize} && f(y)\\
&\text{subject to} && Ax = b,\quad Wx = y,\quad x\in \operatorname{dom}(W).
\end{aligned}
\]

Next, the chapter carries out the convergence analysis under four assumptions: $A$ is a bounded linear operator; $f$ is a proper, lower semi-continuous, strongly convex function; $W$ is a densely defined, closed, linear operator; and there exists a positive constant $c_1$ such that $\|Ax\|^2 + \|Wx\|^2 \ge c_1\|x\|^2$. Under these assumptions the problem admits a unique solution.

Furthermore, we observe that both $x$ and $y$ converge as the iteration number increases. Let $x^*$ be the unique solution and $y^* = Wx^*$. Then there holds

\[
x_k \to x^*,\quad y_k \to y^*,\quad Wx_k \to y^*,\quad f(y_k)\to f(y^*),\quad D_{\mu_k} f(y^*, y_k)\to 0
\]

as $k\to\infty$, where $k$ is the iteration number. In practice the data always contain error, so we use $\delta$ to denote the noise level, with $\|b^\delta - b\|_2 \le \delta$. Then there holds

\[
x_k^\delta \to x^*,\quad y_k^\delta \to y^*,\quad Wx_k^\delta \to y^*,\quad f(y_k^\delta)\to f(y^*),\quad D_{\mu_k^\delta} f(y^*, y_k^\delta)\to 0
\]

as $\delta\to 0$.

Furthermore, after adding a stopping criterion to the ADMM algorithm, there holds

\[
x_{k_\delta}^\delta \to x^*,\quad y_{k_\delta}^\delta \to y^*,\quad Wx_{k_\delta}^\delta \to y^*,\quad f(y_{k_\delta}^\delta)\to f(y^*),\quad D_{\mu_{k_\delta}^\delta} f(y^*, y_{k_\delta}^\delta)\to 0
\]

as $\delta\to 0$, where $k_\delta$ is the first integer satisfying the stopping criterion.

The first section of the final chapter compares the experimental results of the primal-dual algorithms using the total variation method. There are two major categories of image deblurring: image sharpening and image restoration by deconvolution. Sharpening enhances the definition of edges or high-frequency components in an image to bring out details that are otherwise invisible. Image restoration based on deconvolution exploits a different concept. The blurred image is modelled as the original image convolved with a 2D filter, the point spread function (PSF), which degrades the image. The goal of image restoration is to undo the convolution and in turn eliminate the degradation or blurring.

The first model introduced in Chapter 4 is the Rudin Osher Fatemi (ROF) Model

noise from a noisy digital image while preserving sharp discontinuities. This method seeks a minimizer of the sum of a data fidelity term, measured by the squared $\ell^2$-norm, and the total variation regularization term:

\[
\min_{u} \int_\Omega |Du| + \frac{\lambda}{2}\|u-g\|_2^2 .
\]

We also call it the TV-$L^2$ model since the data fidelity term is measured in the two-norm. The second model is the TV-$L^1$ model, which simply replaces the $L^2$ norm in the ROF model by the $L^1$ norm:

\[
\min_{u} \int_\Omega |Du| + \lambda\|u-g\|_1 .
\]

The problem is then no longer strictly convex and its global minimizer need not be unique [12]. Although the TV-$L^1$ model changes the ROF model only slightly, it typically produces a cleaner image than the ROF model, especially for salt-and-pepper noise, and it is contrast invariant [16]. The third model is the Huber-ROF model. This model is smooth in both terms; it replaces the $L^1$ norm by the Huber norm

\[
|x|_\alpha =
\begin{cases}
\dfrac{|x|^2}{2\alpha} & \text{if } |x|\le\alpha,\\[6pt]
|x| - \dfrac{\alpha}{2} & \text{if } |x|>\alpha .
\end{cases}
\]
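The Huber norm above is easy to evaluate directly. The following is a minimal sketch in Python; the function name `huber` is chosen here for illustration and does not come from the thesis.

```python
def huber(x: float, alpha: float) -> float:
    """Huber norm |x|_alpha: quadratic for |x| <= alpha, linear beyond."""
    ax = abs(x)
    if ax <= alpha:
        return ax * ax / (2.0 * alpha)
    return ax - alpha / 2.0
```

Both branches give the value $\alpha/2$ at $|x| = \alpha$, so the function is continuous there, which is what makes it a smooth replacement for the absolute value near zero.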

This model is employed to avoid undesired staircasing effects and yields more natural results. Finally, we extend the ROF model to image deconvolution and digital zooming:

\[
\min_{u} \int_\Omega |Du| + \frac{\lambda}{2}\|Au-g\|_2^2 ,
\]

where $A$ is convolution with the point spread function. The total variation based zooming brings an image with sharp edges with a very blurry result.
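To make the ROF objective concrete, here is a small sketch that evaluates a discrete version of it for an image stored as a list of rows. The discretisation (forward differences, with the difference set to zero at the last row and column) and the names `total_variation` and `rof_energy` are assumptions made for this illustration, not the discretisation used in the experiments.

```python
import math

def total_variation(u):
    """Isotropic discrete TV: sum over pixels of the forward-difference gradient length."""
    rows, cols = len(u), len(u[0])
    tv = 0.0
    for i in range(rows):
        for j in range(cols):
            dx = u[i][j + 1] - u[i][j] if j + 1 < cols else 0.0
            dy = u[i + 1][j] - u[i][j] if i + 1 < rows else 0.0
            tv += math.hypot(dx, dy)
    return tv

def rof_energy(u, g, lam):
    """Discrete ROF objective: TV(u) + (lam / 2) * ||u - g||_2^2."""
    fidelity = sum((u[i][j] - g[i][j]) ** 2
                   for i in range(len(u)) for j in range(len(u[0])))
    return total_variation(u) + 0.5 * lam * fidelity
```

A constant image has zero total variation, while an image with a single horizontal edge of height one across two columns has total variation 2.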

The second section of the final chapter compares the experimental results of the ADMM algorithm with a stopping criterion. By setting $f(y) = \|y\|_1 + \frac{\nu}{2}\|y\|_2^2$, for which the subproblem can be solved explicitly by soft thresholding, we can see this method can obtain a clear result from

First order primal-dual algorithms

Blurring in images can arise from many sources, such as limitations of the optical system, camera and object motion, astigmatism, and environmental effects [8]. Image deblurring is the process of making a blurry image clearer so that it better represents the true scene.

Let $x$ and $b$ denote the true image and the blurred image respectively. The blurring process can be described by a bounded linear operator $K: X\to Y$ between two Hilbert spaces $X$ and $Y$ such that $b = Kx$. Because random noise is unavoidable, one usually has only noisy data $\tilde b$ of the form

\[
\tilde b = Kx + \varepsilon,
\]

where $\varepsilon$ denotes the noise. How to reconstruct the true image from the noisy data therefore becomes an important question.

The variational regularization method reconstructs the true image by considering the minimization problem

\[
\min_{x\in X}\ \{G(x) + F(Kx)\}, \tag{2.1}
\]

where $G: X\to[0,\infty]$ and $F: Y\to[0,\infty]$ are proper, lower semi-continuous, convex functions. The first term $G(x)$ is the regularization term, which is used to capture features of the sought solution, and the second term $F(Kx)$ is the data-fidelity term, which measures the smallness of the residual.

Let $F^*$ denote the Fenchel conjugate of $F$, i.e.

\[
F^*(y) = \sup_{z\in\mathbb{R}^n}\ \{\langle y, z\rangle - F(z)\},
\]

then $F^*: Y\to[0,\infty]$ is also a proper, lower semi-continuous, convex function with the property that

\[
F(z) = \sup_{y\in\mathbb{R}^n}\ \{\langle z, y\rangle - F^*(y)\}.
\]
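As a quick sanity check of the definition, the conjugate of $F(z) = \tfrac12 z^2$ on the real line is $F^*(y) = \tfrac12 y^2$. The sketch below approximates the supremum by a maximum over a finite grid; the grid, the test function and the helper name `conjugate_on_grid` are all choices made for this illustration.

```python
def conjugate_on_grid(F, y, grid):
    """Approximate F*(y) = sup_z { y*z - F(z) } by a maximum over grid points."""
    return max(y * z - F(z) for z in grid)

# Grid on [-10, 10] with spacing 0.001; it contains z = 3 exactly.
grid = [i / 1000.0 for i in range(-10000, 10001)]
F = lambda z: 0.5 * z * z
value = conjugate_on_grid(F, 3.0, grid)  # the supremum is attained at z = y = 3
```

Here the supremum is attained at $z = y$, so for $y = 3$ the value should agree with $\tfrac12\cdot 3^2$ up to rounding.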

Consequently, (2.1) can be reformulated as the following equivalent saddle-point problem

\[
\min_{x\in X}\max_{y\in Y}\ \{G(x) + \langle Kx, y\rangle - F^*(y)\}, \tag{2.2}
\]

where $\langle\cdot,\cdot\rangle$ denotes the inner product, whose induced norm is denoted by $\|\cdot\|$. Therefore, finding a solution $\hat x$ of (2.1) is equivalent to finding a solution $(\hat x,\hat y)$ of (2.2), which is called a saddle point in the sense that

\[
L(\hat x, y)\le L(\hat x,\hat y)\le L(x,\hat y)
\]

for all $x\in X$ and $y\in Y$, where

\[
L(x, y) = G(x) + \langle Kx, y\rangle - F^*(y).
\]
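The saddle-point inequalities can be checked numerically on a one-dimensional toy problem. The choice $G(x) = \tfrac12(x-3)^2$, $K = 1$, $F^*(y) = \tfrac12 y^2$ below is an assumption made only for this sketch; for it, the saddle point works out to $(\hat x,\hat y) = (1.5, 1.5)$.

```python
def lagrangian(x, y, g=3.0):
    # Toy choice (assumption): G(x) = 0.5*(x - g)^2, K = 1, F*(y) = 0.5*y^2.
    return 0.5 * (x - g) ** 2 + x * y - 0.5 * y * y

x_hat, y_hat = 1.5, 1.5
grid = [i / 10.0 for i in range(-50, 51)]

# Check L(x_hat, y) <= L(x_hat, y_hat) <= L(x, y_hat) on a grid of test points.
is_saddle = all(
    lagrangian(x_hat, y) <= lagrangian(x_hat, y_hat) + 1e-12
    and lagrangian(x_hat, y_hat) <= lagrangian(x, y_hat) + 1e-12
    for x in grid
    for y in grid
)
```

For this toy problem $L(\hat x, y) = 2.25 - \tfrac12(y-1.5)^2$ and $L(x,\hat y) = 2.25 + \tfrac12(x-1.5)^2$, so the two inequalities hold for every test point.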

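It is easy to see that $(\hat x,\hat y)$ is a saddle point if and only if

\[
G(x)\ge G(\hat x) + \langle -K^*\hat y,\ x-\hat x\rangle,
\qquad
F^*(y)\ge F^*(\hat y) + \langle K\hat x,\ y-\hat y\rangle
\]

for all $x\in X$ and $y\in Y$, which is equivalent to

\[
K\hat x\in\partial F^*(\hat y),\qquad -(K^*\hat y)\in\partial G(\hat x), \tag{2.3}
\]

where $\partial F^*$ and $\partial G$ denote the subdifferentials of $F^*$ and $G$ respectively. Recall that, for a convex function $f: X\to(-\infty,\infty]$, its subdifferential $\partial f(x)$ at $x\in X$ is defined by

\[
\partial f(x) := \{\xi\in X:\ f(\bar x)\ge f(x) + \langle\xi,\bar x - x\rangle\ \ \forall\,\bar x\in X\}.
\]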
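As a standard illustration of this definition (not taken from the thesis), consider $f(x) = |x|$ on $\mathbb{R}$. At $x = 0$ the defining inequality reads $|\bar x|\ge \xi\bar x$ for all $\bar x$, which holds exactly when $|\xi|\le 1$, so

\[
\partial f(0) = [-1, 1],
\qquad
\partial f(x) = \{\operatorname{sign}(x)\}\quad\text{for } x\ne 0 .
\]

In particular $0\in\partial f(0)$, which recovers the fact that $x = 0$ minimizes $|x|$.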

In this chapter we assume that (2.2) admits a saddle point and we will introduce the

2.1 The primal-dual algorithm

We first recall that the classical Uzawa algorithm [1] builds approximate solutions iteratively by first updating $y$ via a proximal maximization problem and then updating $x$ via a proximal minimization problem. To be more precise, the Uzawa algorithm takes the form

\[
\begin{aligned}
y_{n+1} &= \arg\max_{y\in Y}\Big\{ L(x_n, y) - \frac{1}{2\sigma}\|y-y_n\|^2 \Big\},\\
x_{n+1} &= \arg\min_{x\in X}\Big\{ L(x, y_{n+1}) + \frac{1}{2\tau}\|x-x_n\|^2 \Big\},
\end{aligned}
\]

which, according to the definition of $L(x, y)$, can be written as

\[
\begin{aligned}
y_{n+1} &= \arg\max_{y\in Y}\Big\{ \langle Kx_n, y\rangle - F^*(y) - \frac{1}{2\sigma}\|y-y_n\|^2 \Big\},\\
x_{n+1} &= \arg\min_{x\in X}\Big\{ G(x) + \langle Kx, y_{n+1}\rangle + \frac{1}{2\tau}\|x-x_n\|^2 \Big\},
\end{aligned}
\]

where $\sigma$ and $\tau$ are suitably chosen positive numbers. This Uzawa algorithm has been used in [12] to solve the total variation regularization problem, and the resulting method is called the primal-dual hybrid gradient method.

In [4] Chambolle and Pock revisited the Uzawa algorithm and proposed a class of first-order primal-dual algorithms by introducing extrapolation steps. Their algorithm takes the following form.

Algorithm 1

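• Choose $\tau, \sigma > 0$, $\theta\in[0,1]$, $(x_0, y_0)\in X\times Y$ and set $\bar x_0 = x_0$.

• Iterations ($n\ge 0$): Update $x_n, y_n, \bar x_n$ as follows:

\[
\begin{aligned}
y_{n+1} &= \arg\max_{y\in Y}\Big\{ \langle K\bar x_n, y\rangle - F^*(y) - \frac{1}{2\sigma}\|y-y_n\|^2 \Big\},\\
x_{n+1} &= \arg\min_{x\in X}\Big\{ G(x) + \langle Kx, y_{n+1}\rangle + \frac{1}{2\tau}\|x-x_n\|^2 \Big\},\\
\bar x_{n+1} &= x_{n+1} + \theta(x_{n+1}-x_n).
\end{aligned}
\]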
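The three update steps can be sketched on a scalar toy problem where both subproblems have closed forms. The problem $\min_x \tfrac12(x-g)^2 + \lambda|kx|$, the step sizes and the function name `primal_dual_toy` are assumptions made for this illustration; here $G(x) = \tfrac12(x-g)^2$ and $F^*$ is the indicator function of $[-\lambda,\lambda]$, so the dual step is a projection.

```python
def primal_dual_toy(g=3.0, lam=1.0, k=1.0, tau=0.5, sigma=0.5, theta=1.0, iters=200):
    """Algorithm 1 for min_x 0.5*(x - g)^2 + lam*|k*x| on the real line (toy sketch)."""
    x, y = 0.0, 0.0
    x_bar = x
    for _ in range(iters):
        # Dual step: F* is the indicator of [-lam, lam], so the argmax is a projection.
        y = max(-lam, min(lam, y + sigma * k * x_bar))
        # Primal step: closed-form minimizer of G(x) + k*x*y + (x - x_n)^2 / (2*tau).
        x_new = (x + tau * (g - k * y)) / (1.0 + tau)
        # Extrapolation step with parameter theta.
        x_bar = x_new + theta * (x_new - x)
        x = x_new
    return x, y

x_star, y_star = primal_dual_toy()
```

With $g = 3$, $\lambda = 1$, $k = 1$ the minimizer is the soft-thresholded value $x = 2$ with dual variable $y = 1$, and the default step sizes satisfy $\tau\sigma\|K\|^2 < 1$. Setting `theta=0.0` gives the Uzawa iteration on the same problem.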

We note that Algorithm 1 with $\theta = 0$ becomes the Uzawa algorithm. In the following we give the convergence analysis of this algorithm for $\theta = 1$.

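(a) For any $n\in\mathbb{N}$ there holds

\[
\frac{\|y_n-\hat y\|^2}{2\sigma} + \frac{\|x_n-\hat x\|^2}{2\tau}
\le C\left(\frac{\|y_0-\hat y\|^2}{2\sigma} + \frac{\|x_0-\hat x\|^2}{2\tau}\right), \tag{2.4}
\]

where $C := (1-\tau\sigma L^2)^{-1}$;

(b) Let $\hat x_N = \frac{1}{N}\sum_{n=1}^N x_n$ and $\hat y_N = \frac{1}{N}\sum_{n=1}^N y_n$. Then the weak cluster points of $(\hat x_N,\hat y_N)$ are saddle points of (2.2).

(c) If both $X$ and $Y$ have finite dimensions, then there exists a saddle point $(x^*, y^*)$ of (2.2) such that $(x_n, y_n)\to(x^*, y^*)$ as $n\to\infty$.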

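Proof. From the definition of $y_{n+1}$ and $x_{n+1}$ and the first-order optimality conditions, we can deduce that

\[
\begin{aligned}
\frac{y_n-y_{n+1}}{\sigma} + K\bar x_n &\in \partial F^*(y_{n+1}),\\
\frac{x_n-x_{n+1}}{\tau} - K^*y_{n+1} &\in \partial G(x_{n+1}).
\end{aligned} \tag{2.5}
\]

Therefore, by the definition of the subdifferential, we have for any $(x, y)\in X\times Y$ that

\[
F^*(y)\ge F^*(y_{n+1}) + \Big\langle \frac{y_n-y_{n+1}}{\sigma} + K\bar x_n,\ y-y_{n+1}\Big\rangle, \tag{2.6}
\]

\[
G(x)\ge G(x_{n+1}) + \Big\langle \frac{x_n-x_{n+1}}{\tau} - K^*y_{n+1},\ x-x_{n+1}\Big\rangle. \tag{2.7}
\]

Using the identity $2\langle a-b, a\rangle = \|a\|^2 - \|b\|^2 + \|a-b\|^2$, we can deduce

\[
\frac{1}{\sigma}\langle y_n-y_{n+1},\ y-y_{n+1}\rangle
= \frac{1}{2\sigma}\big(\|y_n-y_{n+1}\|^2 - \|y_n-y\|^2 + \|y-y_{n+1}\|^2\big), \tag{2.8}
\]

\[
\frac{1}{\tau}\langle x_n-x_{n+1},\ x-x_{n+1}\rangle
= \frac{1}{2\tau}\big(\|x_n-x_{n+1}\|^2 - \|x_n-x\|^2 + \|x-x_{n+1}\|^2\big). \tag{2.9}
\]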

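Summing (2.6) and (2.7), and using (2.8) and (2.9), we have

\[
\begin{aligned}
\frac{\|y-y_n\|^2}{2\sigma} + \frac{\|x-x_n\|^2}{2\tau}
&\ge \frac{\|y-y_{n+1}\|^2}{2\sigma} + \frac{\|x-x_{n+1}\|^2}{2\tau} + \frac{\|y_n-y_{n+1}\|^2}{2\sigma} + \frac{\|x_n-x_{n+1}\|^2}{2\tau}\\
&\quad + [G(x_{n+1}) - F^*(y)] - [G(x) - F^*(y_{n+1})]\\
&\quad + \langle K\bar x_n,\ y-y_{n+1}\rangle - \langle K^*y_{n+1},\ x-x_{n+1}\rangle\\
&= \frac{\|y-y_{n+1}\|^2}{2\sigma} + \frac{\|x-x_{n+1}\|^2}{2\tau} + \frac{\|y_n-y_{n+1}\|^2}{2\sigma} + \frac{\|x_n-x_{n+1}\|^2}{2\tau}\\
&\quad + L(x_{n+1}, y) - L(x, y_{n+1}) + \langle K(x_{n+1}-\bar x_n),\ y_{n+1}-y\rangle.
\end{aligned} \tag{2.10}
\]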

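By using $\bar x_n = 2x_n - x_{n-1}$, we can see that

\[
\begin{aligned}
\langle K(x_{n+1}-\bar x_n),\ y_{n+1}-y\rangle
&= \langle K((x_{n+1}-x_n) - (x_n-x_{n-1})),\ y_{n+1}-y\rangle\\
&= \langle K(x_{n+1}-x_n),\ y_{n+1}-y\rangle - \langle K(x_n-x_{n-1}),\ y_n-y\rangle\\
&\quad - \langle K(x_n-x_{n-1}),\ y_{n+1}-y_n\rangle\\
&\ge \langle K(x_{n+1}-x_n),\ y_{n+1}-y\rangle - \langle K(x_n-x_{n-1}),\ y_n-y\rangle\\
&\quad - \|K\|\,\|x_n-x_{n-1}\|\,\|y_{n+1}-y_n\|.
\end{aligned} \tag{2.11}
\]

Combining (2.10) and (2.11), we can deduce for all $(x, y)\in X\times Y$ that

\[
\begin{aligned}
\frac{\|y-y_n\|^2}{2\sigma} + \frac{\|x-x_n\|^2}{2\tau}
&\ge \frac{\|y-y_{n+1}\|^2}{2\sigma} + \frac{\|x-x_{n+1}\|^2}{2\tau} + \frac{\|y_n-y_{n+1}\|^2}{2\sigma} + \frac{\|x_n-x_{n+1}\|^2}{2\tau}\\
&\quad + L(x_{n+1}, y) - L(x, y_{n+1}) - \|K\|\,\|x_n-x_{n-1}\|\,\|y_{n+1}-y_n\|\\
&\quad + \langle K(x_{n+1}-x_n),\ y_{n+1}-y\rangle - \langle K(x_n-x_{n-1}),\ y_n-y\rangle.
\end{aligned}
\]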

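By using the Cauchy-Schwarz inequality, we have

\[
\|K\|\,\|x_n-x_{n-1}\|\,\|y_{n+1}-y_n\|
\le \frac{\alpha\|K\|}{2}\|x_n-x_{n-1}\|^2 + \frac{\|K\|}{2\alpha}\|y_{n+1}-y_n\|^2 \tag{2.12}
\]

for any $\alpha > 0$. By taking $\alpha = \sqrt{\sigma/\tau}$ in (2.12) we therefore have

\[
\begin{aligned}
\frac{\|y-y_n\|^2}{2\sigma} + \frac{\|x-x_n\|^2}{2\tau}
&\ge \frac{\|y-y_{n+1}\|^2}{2\sigma} + \frac{\|x-x_{n+1}\|^2}{2\tau} + (1-\sqrt{\sigma\tau}\|K\|)\frac{\|y_n-y_{n+1}\|^2}{2\sigma}\\
&\quad + \frac{\|x_n-x_{n+1}\|^2}{2\tau} - \sqrt{\sigma\tau}\|K\|\frac{\|x_{n-1}-x_n\|^2}{2\tau} + L(x_{n+1}, y) - L(x, y_{n+1})\\
&\quad + \langle K(x_{n+1}-x_n),\ y_{n+1}-y\rangle - \langle K(x_n-x_{n-1}),\ y_n-y\rangle.
\end{aligned} \tag{2.13}
\]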

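Summing (2.13) over $n$ from $n = 0$ to $n = N-1$ it follows that

\[
\begin{aligned}
&\sum_{n=1}^{N}\big(L(x_n, y) - L(x, y_n)\big) + \frac{\|y-y_N\|^2}{2\sigma} + \frac{\|x-x_N\|^2}{2\tau} + \frac{\|x_N-x_{N-1}\|^2}{2\tau}\\
&\quad + (1-\sqrt{\sigma\tau}\|K\|)\sum_{n=1}^{N}\frac{\|y_n-y_{n-1}\|^2}{2\sigma} + (1-\sqrt{\sigma\tau}\|K\|)\sum_{n=1}^{N-1}\frac{\|x_n-x_{n-1}\|^2}{2\tau}\\
&\le \frac{\|y-y_0\|^2}{2\sigma} + \frac{\|x-x_0\|^2}{2\tau} - \langle K(x_N-x_{N-1}),\ y_N-y\rangle
\end{aligned}
\]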

for all $(x, y)\in X\times Y$, where $x_{-1} = x_0$. By using again the Cauchy-Schwarz inequality we have

\[
|\langle K(x_N-x_{N-1}),\ y_N-y\rangle|
\le \frac{\|x_N-x_{N-1}\|^2}{2\tau} + \frac{\tau\|K\|^2}{2}\|y-y_N\|^2 .
\]

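Consequently

\[
\begin{aligned}
&\sum_{n=1}^{N}\big(L(x_n, y) - L(x, y_n)\big) + (1-\sigma\tau\|K\|^2)\frac{\|y-y_N\|^2}{2\sigma} + \frac{\|x-x_N\|^2}{2\tau}\\
&\quad + (1-\sqrt{\sigma\tau}\|K\|)\sum_{n=1}^{N}\frac{\|y_n-y_{n-1}\|^2}{2\sigma} + (1-\sqrt{\sigma\tau}\|K\|)\sum_{n=1}^{N-1}\frac{\|x_n-x_{n-1}\|^2}{2\tau}\\
&\le \frac{\|y-y_0\|^2}{2\sigma} + \frac{\|x-x_0\|^2}{2\tau}.
\end{aligned} \tag{2.14}
\]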

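Note that for a saddle point $(\hat x,\hat y)$ we have $L(x_n,\hat y)\ge L(\hat x,\hat y)\ge L(\hat x, y_n)$. By setting $(x, y) = (\hat x,\hat y)$ in (2.14) and using $\sigma\tau\|K\|^2 < 1$, we can derive that

\[
\frac{\|y_N-\hat y\|^2}{2\sigma} + \frac{\|x_N-\hat x\|^2}{2\tau}
\le C\left(\frac{\|y_0-\hat y\|^2}{2\sigma} + \frac{\|x_0-\hat x\|^2}{2\tau}\right)
\]

and

\[
(1-\sqrt{\sigma\tau}\|K\|)\sum_{n=1}^{N-1}\left(\frac{\|y_n-y_{n-1}\|^2}{2\sigma} + \frac{\|x_n-x_{n-1}\|^2}{2\tau}\right)
\le \frac{\|y_0-\hat y\|^2}{2\sigma} + \frac{\|x_0-\hat x\|^2}{2\tau} \tag{2.15}
\]

for all $N$, where $C := (1-\tau\sigma L^2)^{-1}$. This completes the proof of (a).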

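Next we prove (b). Since $L(\cdot, y)$ is convex and $L(x,\cdot)$ is concave, from (2.14) we can deduce for $\hat x_N = \frac{1}{N}\sum_{n=1}^N x_n$ and $\hat y_N = \frac{1}{N}\sum_{n=1}^N y_n$ that

\[
L(\hat x_N, y) - L(x,\hat y_N)
\le \frac{1}{N}\sum_{n=1}^{N}\big(L(x_n, y) - L(x, y_n)\big)
\le \frac{1}{N}\left(\frac{\|y-y_0\|^2}{2\sigma} + \frac{\|x-x_0\|^2}{2\tau}\right) \tag{2.16}
\]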

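for any $(x, y)\in X\times Y$. Let $(x^*, y^*)$ denote a weak cluster point of $(\hat x_N,\hat y_N)$. Then there is a subsequence $(\hat x_{N_j},\hat y_{N_j})$ such that $\hat x_{N_j}\rightharpoonup x^*$ and $\hat y_{N_j}\rightharpoonup y^*$ as $j\to\infty$, where "$\rightharpoonup$" denotes weak convergence. We have

\[
\langle \hat x_{N_j}, K^*y\rangle \to \langle x^*, K^*y\rangle
\quad\text{and}\quad
\langle Kx, \hat y_{N_j}\rangle \to \langle Kx, y^*\rangle .
\]

Since $F^*$ and $G$ are convex and lower semi-continuous, they are weakly lower semi-continuous. Thus

\[
G(x^*)\le\liminf_{j\to\infty} G(\hat x_{N_j})
\quad\text{and}\quad
F^*(y^*)\le\liminf_{j\to\infty} F^*(\hat y_{N_j}).
\]

By setting $N = N_j$ in (2.16) and taking $j\to\infty$, we can deduce that

\[
L(x^*, y) - L(x, y^*)\le\liminf_{j\to\infty}\big(L(\hat x_{N_j}, y) - L(x,\hat y_{N_j})\big)\le 0 \tag{2.17}
\]

and thus $L(x^*, y)\le L(x^*, y^*)\le L(x, y^*)$ for all $(x, y)\in X\times Y$. Therefore $(x^*, y^*)$ is a saddle point of (2.2).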

Finally we prove (c). From (a) we know that $(x_n, y_n)$ is a bounded sequence. Since both $X$ and $Y$ have finite dimension, there must exist a convergent subsequence $(x_{n_k}, y_{n_k})$ whose limit is denoted by $(x^*, y^*)$. From (2.15) we have $x_n - x_{n-1}\to 0$ and $y_n - y_{n-1}\to 0$ as $n\to\infty$. Therefore $\bar x_{n_k}\to x^*$, $y_{n_k+1}\to y^*$ and $x_{n_k+1}\to x^*$ as $k\to\infty$. By setting $n = n_k$ in (2.6) and (2.7), letting $k\to\infty$, and using the lower semi-continuity of $F^*$ and $G$, we can obtain

\[
F^*(y)\ge F^*(y^*) + \langle Kx^*,\ y-y^*\rangle,
\qquad
G(x)\ge G(x^*) - \langle K^*y^*,\ x-x^*\rangle .
\]

Consequently $L(x^*, y^*) - L(x^*, y)\ge 0$ and $L(x, y^*) - L(x^*, y^*)\ge 0$, i.e. $L(x^*, y)\le L(x^*, y^*)\le L(x, y^*)$ for all $(x, y)\in X\times Y$. Therefore $(x^*, y^*)$ is a saddle point of (2.2).

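Now we take $(x, y) = (x^*, y^*)$ in (2.13). For any $N > n_k$ we sum (2.13) over $n$ from $n = n_k$ to $n = N-1$ to obtain

\[
\begin{aligned}
&\frac{\|y^*-y_N\|^2}{2\sigma} + \frac{\|x^*-x_N\|^2}{2\tau} - \frac{\|x_{n_k}-x_{n_k-1}\|^2}{2\tau}\\
&\quad + (1-\sqrt{\sigma\tau}\|K\|)\sum_{n=n_k+1}^{N}\frac{\|y_n-y_{n-1}\|^2}{2\sigma} + (1-\sqrt{\sigma\tau}\|K\|)\sum_{n=n_k}^{N-1}\frac{\|x_n-x_{n-1}\|^2}{2\tau}\\
&\quad + \langle K(x_N-x_{N-1}),\ y_N-y^*\rangle - \langle K(x_{n_k}-x_{n_k-1}),\ y_{n_k}-y^*\rangle\\
&\le \frac{\|y^*-y_{n_k}\|^2}{2\sigma} + \frac{\|x^*-x_{n_k}\|^2}{2\tau}.
\end{aligned}
\]

Using $x_N - x_{N-1}\to 0$ and $\sigma\tau\|K\|^2 < 1$ we can deduce that

\[
\begin{aligned}
\limsup_{N\to\infty}\left(\frac{\|y^*-y_N\|^2}{2\sigma} + \frac{\|x^*-x_N\|^2}{2\tau}\right)
&\le \frac{\|x_{n_k}-x_{n_k-1}\|^2}{2\tau} + \langle K(x_{n_k}-x_{n_k-1}),\ y_{n_k}-y^*\rangle\\
&\quad + \frac{\|y^*-y_{n_k}\|^2}{2\sigma} + \frac{\|x^*-x_{n_k}\|^2}{2\tau}
\end{aligned}
\]

for all $k$. By taking $k\to\infty$ and using $x_{n_k}-x_{n_k-1}\to 0$, $x_{n_k}\to x^*$ and $y_{n_k}\to y^*$ as $k\to\infty$, we can obtain

\[
\limsup_{N\to\infty}\left(\frac{\|y^*-y_N\|^2}{2\sigma} + \frac{\|x^*-x_N\|^2}{2\tau}\right)\le 0 .
\]

Therefore $x_N\to x^*$ and $y_N\to y^*$ as $N\to\infty$.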

2.2 Acceleration: $G$ is strongly convex

In this section we present an accelerated version of Algorithm 1 when $G$ is strongly convex in the sense that there is a constant $\gamma > 0$ such that

\[
G(tx_1 + (1-t)x_2) + \frac{\gamma}{2}t(1-t)\|x_1-x_2\|^2 \le tG(x_1) + (1-t)G(x_2) \tag{2.18}
\]

for all $x_1, x_2\in X$ and $0\le t\le 1$. By using variable steps and variable relaxation parameters, the accelerated algorithm takes the following form.

Algorithm 2

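Initialization: Choose $\tau_0, \sigma_0 > 0$ with $\tau_0\sigma_0\|K\|^2\le 1$, $(x_0, y_0)\in X\times Y$ and set $\bar x_0 = x_0$;

Iterations ($n\ge 0$): Update $x_n, y_n, \bar x_n, \theta_n, \tau_n, \sigma_n$ as follows:

\[
\begin{aligned}
y_{n+1} &= \arg\max_{y\in Y}\Big\{ \langle K\bar x_n, y\rangle - F^*(y) - \frac{1}{2\sigma_n}\|y-y_n\|^2 \Big\},\\
x_{n+1} &= \arg\min_{x\in X}\Big\{ G(x) + \langle Kx, y_{n+1}\rangle + \frac{1}{2\tau_n}\|x-x_n\|^2 \Big\},\\
\theta_n &= 1/\sqrt{1+2\gamma\tau_n},\qquad \tau_{n+1} = \theta_n\tau_n,\qquad \sigma_{n+1} = \sigma_n/\theta_n,\\
\bar x_{n+1} &= x_{n+1} + \theta_n(x_{n+1}-x_n).
\end{aligned}
\]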
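A minimal sketch of these updates on a scalar toy problem follows. The problem $\min_x \tfrac12(x-g)^2 + \lambda|kx|$, for which $G(x) = \tfrac12(x-g)^2$ satisfies (2.18) with $\gamma = 1$, the initial step sizes and the function name `accelerated_primal_dual_toy` are assumptions made for this illustration only.

```python
import math

def accelerated_primal_dual_toy(g=3.0, lam=1.0, k=1.0, gamma=1.0,
                                tau=1.0, sigma=1.0, iters=2000):
    """Algorithm 2 for min_x 0.5*(x - g)^2 + lam*|k*x| on the real line (toy sketch)."""
    x, y = 0.0, 0.0
    x_bar = x
    for _ in range(iters):
        # Dual step with the current sigma_n: projection onto [-lam, lam].
        y = max(-lam, min(lam, y + sigma * k * x_bar))
        # Primal step with the current tau_n (closed form for quadratic G).
        x_new = (x + tau * (g - k * y)) / (1.0 + tau)
        # Update the relaxation parameter and the step sizes.
        theta = 1.0 / math.sqrt(1.0 + 2.0 * gamma * tau)
        tau, sigma = theta * tau, sigma / theta
        # Extrapolation with theta_n.
        x_bar = x_new + theta * (x_new - x)
        x = x_new
    return x, y, tau, sigma

x_acc, y_acc, tau_end, sigma_end = accelerated_primal_dual_toy()
```

The product $\tau_n\sigma_n$ stays equal to $\tau_0\sigma_0$ throughout, while $\tau_n$ itself decreases; the primal iterate approaches the minimizer $x = 2$ of the toy problem.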

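In the following we will show that Algorithm 2 has the convergence rate $O(1/N^2)$. We will use $(\hat x,\hat y)$ to denote any saddle point of (2.2). Since $G$ is strongly convex, we can show that [3]

\[
G(\bar x)\ge G(x) + \langle\xi,\ \bar x - x\rangle + \frac{\gamma}{2}\|\bar x - x\|^2 \tag{2.19}
\]

for all $\xi\in\partial G(x)$ and $\bar x\in X$. According to the definition of $x_{n+1}$ we have $\frac{x_n-x_{n+1}}{\tau_n} - K^*y_{n+1}\in\partial G(x_{n+1})$. Therefore

\[
G(x)\ge G(x_{n+1}) + \Big\langle \frac{x_n-x_{n+1}}{\tau_n} - K^*y_{n+1},\ x-x_{n+1}\Big\rangle + \frac{\gamma}{2}\|x-x_{n+1}\|^2
\]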

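for all $x\in X$. By using the definition of $y_{n+1}$ we can also derive that

\[
F^*(y)\ge F^*(y_{n+1}) + \Big\langle \frac{y_n-y_{n+1}}{\sigma_n} + K\bar x_n,\ y-y_{n+1}\Big\rangle,\qquad \forall y\in Y .
\]

With the help of the above two inequalities, we can use an argument similar to the one used for deriving (2.10) to obtain

\[
\begin{aligned}
\frac{\|\hat y-y_n\|^2}{2\sigma_n} + \frac{\|\hat x-x_n\|^2}{2\tau_n}
&\ge \frac{\|\hat y-y_{n+1}\|^2}{2\sigma_n} + \frac{\|\hat x-x_{n+1}\|^2}{2\tau_n} + \frac{\|y_n-y_{n+1}\|^2}{2\sigma_n} + \frac{\|x_n-x_{n+1}\|^2}{2\tau_n}\\
&\quad + L(x_{n+1},\hat y) - L(\hat x, y_{n+1}) + \frac{\gamma}{2}\|\hat x-x_{n+1}\|^2\\
&\quad + \langle K(x_{n+1}-\bar x_n),\ y_{n+1}-\hat y\rangle .
\end{aligned}
\]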

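Since $(\hat x,\hat y)$ is a saddle point of (2.2), we may use (2.3), the strong convexity of $G$ and the convexity of $F^*$ to derive that

\[
\begin{aligned}
L(x_{n+1},\hat y) - L(\hat x, y_{n+1})
&= [G(x_{n+1}) - G(\hat x) - \langle -K^*\hat y,\ x_{n+1}-\hat x\rangle]\\
&\quad + [F^*(y_{n+1}) - F^*(\hat y) - \langle K\hat x,\ y_{n+1}-\hat y\rangle]\\
&\ge \frac{\gamma}{2}\|x_{n+1}-\hat x\|^2 .
\end{aligned}
\]

Therefore

\[
\begin{aligned}
\frac{\|\hat y-y_n\|^2}{2\sigma_n} + \frac{\|\hat x-x_n\|^2}{2\tau_n}
&\ge \frac{\|\hat y-y_{n+1}\|^2}{2\sigma_n} + \frac{\|\hat x-x_{n+1}\|^2}{2\tau_n} + \frac{\|y_n-y_{n+1}\|^2}{2\sigma_n} + \frac{\|x_n-x_{n+1}\|^2}{2\tau_n}\\
&\quad + \gamma\|\hat x-x_{n+1}\|^2 + \langle K(x_{n+1}-\bar x_n),\ y_{n+1}-\hat y\rangle .
\end{aligned}
\]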

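Recalling that $\bar x_n = x_n + \theta_{n-1}(x_n-x_{n-1})$, we can further obtain

\[
\begin{aligned}
\frac{\|\hat y-y_n\|^2}{2\sigma_n} + \frac{\|\hat x-x_n\|^2}{2\tau_n}
&\ge \gamma\|\hat x-x_{n+1}\|^2 + \frac{\|\hat y-y_{n+1}\|^2}{2\sigma_n} + \frac{\|\hat x-x_{n+1}\|^2}{2\tau_n}\\
&\quad + \frac{\|y_n-y_{n+1}\|^2}{2\sigma_n} + \frac{\|x_n-x_{n+1}\|^2}{2\tau_n}\\
&\quad + \langle K(x_{n+1}-x_n),\ y_{n+1}-\hat y\rangle - \theta_{n-1}\langle K(x_n-x_{n-1}),\ y_n-\hat y\rangle\\
&\quad - \theta_{n-1}\|K\|\,\|x_n-x_{n-1}\|\,\|y_{n+1}-y_n\| .
\end{aligned} \tag{2.21}
\]

By the Cauchy-Schwarz inequality we have

\[
\theta_{n-1}\|K\|\,\|x_n-x_{n-1}\|\,\|y_{n+1}-y_n\|
\le \frac{\|y_n-y_{n+1}\|^2}{2\sigma_n} + \frac{\theta_{n-1}^2\|K\|^2\sigma_n}{2}\|x_n-x_{n-1}\|^2 .
\]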

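Combining this with (2.21) it follows that

\[
\begin{aligned}
\frac{\|\hat y-y_n\|^2}{\sigma_n} + \frac{\|\hat x-x_n\|^2}{\tau_n}
&\ge (1+2\gamma\tau_n)\frac{\tau_{n+1}}{\tau_n}\frac{\|\hat x-x_{n+1}\|^2}{\tau_{n+1}} + \frac{\sigma_{n+1}}{\sigma_n}\frac{\|\hat y-y_{n+1}\|^2}{\sigma_{n+1}} + \frac{\|x_n-x_{n+1}\|^2}{\tau_n}\\
&\quad + 2\langle K(x_{n+1}-x_n),\ y_{n+1}-\hat y\rangle - 2\theta_{n-1}\langle K(x_n-x_{n-1}),\ y_n-\hat y\rangle\\
&\quad - \theta_{n-1}^2\|K\|^2\sigma_n\tau_{n-1}\frac{\|x_n-x_{n-1}\|^2}{\tau_{n-1}} .
\end{aligned}
\]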

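By the definition of $\theta_n$, $\tau_n$ and $\sigma_n$ we can see that

\[
(1+2\gamma\tau_n)\frac{\tau_{n+1}}{\tau_n} = \frac{1}{\theta_n} = \frac{\sigma_{n+1}}{\sigma_n} = \frac{\tau_n}{\tau_{n+1}},
\qquad
\sigma_n\tau_n = \sigma_{n-1}\tau_{n-1} = \cdots = \sigma_0\tau_0
\]

and

\[
\theta_{n-1}^2\|K\|^2\sigma_n\tau_{n-1}
= \theta_{n-1}^2\frac{\tau_{n-1}}{\tau_n}\|K\|^2\sigma_0\tau_0
\le \theta_{n-1}^2\frac{\tau_{n-1}}{\tau_n}
= \frac{\tau_n}{\tau_{n-1}} .
\]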

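Therefore

\[
\begin{aligned}
\frac{\|\hat y-y_n\|^2}{\sigma_n} + \frac{\|\hat x-x_n\|^2}{\tau_n}
&\ge \frac{\tau_n}{\tau_{n+1}}\left(\frac{\|\hat x-x_{n+1}\|^2}{\tau_{n+1}} + \frac{\|\hat y-y_{n+1}\|^2}{\sigma_{n+1}}\right)\\
&\quad + \frac{\|x_n-x_{n+1}\|^2}{\tau_n} - \frac{\tau_n}{\tau_{n-1}}\frac{\|x_n-x_{n-1}\|^2}{\tau_{n-1}}\\
&\quad + 2\langle K(x_{n+1}-x_n),\ y_{n+1}-\hat y\rangle - 2\theta_{n-1}\langle K(x_n-x_{n-1}),\ y_n-\hat y\rangle .
\end{aligned}
\]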

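By dividing both sides by $\tau_n$ and noting that $\theta_{n-1}/\tau_n = 1/\tau_{n-1}$, we can deduce that

\[
\begin{aligned}
\frac{\Delta_n}{\tau_n}
&\ge \frac{\Delta_{n+1}}{\tau_{n+1}} + \frac{\|x_n-x_{n+1}\|^2}{\tau_n^2} - \frac{\|x_n-x_{n-1}\|^2}{\tau_{n-1}^2}\\
&\quad + \frac{2}{\tau_n}\langle K(x_{n+1}-x_n),\ y_{n+1}-\hat y\rangle - \frac{2}{\tau_{n-1}}\langle K(x_n-x_{n-1}),\ y_n-\hat y\rangle,
\end{aligned} \tag{2.22}
\]

where

\[
\Delta_n = \frac{\|\hat y-y_n\|^2}{\sigma_n} + \frac{\|\hat x-x_n\|^2}{\tau_n} .
\]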

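Summing the inequality (2.22) over $n$ from $n = 0$ to $n = N-1$ and using $x_{-1} = x_0$ we deduce that

\[
\begin{aligned}
\frac{\Delta_0}{\tau_0}
&\ge \frac{\Delta_N}{\tau_N} + \frac{\|x_{N-1}-x_N\|^2}{\tau_{N-1}^2} + \frac{2}{\tau_{N-1}}\langle K(x_N-x_{N-1}),\ y_N-\hat y\rangle\\
&\ge \frac{\Delta_N}{\tau_N} + \frac{\|x_{N-1}-x_N\|^2}{\tau_{N-1}^2} - \frac{\|x_{N-1}-x_N\|^2}{\tau_{N-1}^2} - \|K\|^2\|y_N-\hat y\|^2\\
&= \frac{1}{\tau_N}\left((1-\sigma_N\tau_N\|K\|^2)\,\tau_N\frac{\|y_N-\hat y\|^2}{\sigma_N\tau_N} + \frac{\|x_N-\hat x\|^2}{\tau_N}\right)\\
&= \frac{1}{\tau_N}\left(\frac{1-\sigma_0\tau_0\|K\|^2}{\sigma_0\tau_0}\,\tau_N\|y_N-\hat y\|^2 + \frac{\|x_N-\hat x\|^2}{\tau_N}\right),
\end{aligned}
\]

which implies

\[
\frac{1-\sigma_0\tau_0\|K\|^2}{\sigma_0\tau_0}\,\tau_N^2\|\hat y-y_N\|^2 + \|\hat x-x_N\|^2 \le \frac{\Delta_0}{\tau_0}\tau_N^2 .
\]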

This in particular shows that

\[
\|\hat x-x_N\|^2 \le \frac{\Delta_0}{\tau_0}\tau_N^2 . \tag{2.23}
\]

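We next show that $\tau_N = O(1/N)$. Since $\theta_n = 1/\sqrt{1+2\gamma\tau_n}$ and $\tau_{n+1} = \theta_n\tau_n$, we have

\[
\tau_{n+1} = \frac{\tau_n}{\sqrt{1+2\gamma\tau_n}} .
\]

Therefore

\[
\frac{1}{\tau_{n+1}} = \sqrt{\frac{1}{\tau_n^2} + \frac{2\gamma}{\tau_n}} .
\]

This clearly shows that $1/\tau_{n+1}\ge 1/\tau_n$ and thus $\tau_{n+1}\le\tau_n\le\tau_0$. Moreover

\[
\begin{aligned}
\frac{1}{\tau_{n+1}}
&= \sqrt{\Big(\frac{1}{\tau_n}+\gamma\Big)^2 - \gamma^2}
= \Big(\frac{1}{\tau_n}+\gamma\Big)\sqrt{1 - \frac{\gamma^2}{(1/\tau_n+\gamma)^2}}\\
&\ge \Big(\frac{1}{\tau_n}+\gamma\Big)\Big(1 - \frac{\gamma^2}{(1/\tau_n+\gamma)^2}\Big)
= \frac{1}{\tau_n} + \gamma - \frac{\gamma^2}{1/\tau_n+\gamma}\\
&= \frac{1}{\tau_n} + \frac{\gamma}{1+\gamma\tau_n}
\ge \frac{1}{\tau_n} + \frac{\gamma}{1+\gamma\tau_0} .
\end{aligned}
\]

Consequently

\[
\frac{1}{\tau_N}\ge\frac{1}{\tau_0} + \frac{\gamma N}{1+\gamma\tau_0},
\]

which in particular implies that $\tau_N = O(1/N)$. Combining this with (2.23) we therefore obtain the following convergence rate result.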
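The decay of the step sizes can be checked numerically. The sketch below iterates $\tau_{n+1} = \tau_n/\sqrt{1+2\gamma\tau_n}$ and compares $1/\tau_n$ with the lower bound $1/\tau_0 + \gamma n/(1+\gamma\tau_0)$; the helper name `step_sizes` and the values $\tau_0 = 1$, $\gamma = 1$ are assumptions made for this illustration.

```python
import math

def step_sizes(tau0, gamma, n_steps):
    """Return [tau_0, ..., tau_{n_steps}] for tau_{n+1} = tau_n / sqrt(1 + 2*gamma*tau_n)."""
    taus = [tau0]
    for _ in range(n_steps):
        tau = taus[-1]
        taus.append(tau / math.sqrt(1.0 + 2.0 * gamma * tau))
    return taus

tau0, gamma = 1.0, 1.0
taus = step_sizes(tau0, gamma, 1000)

# Check 1/tau_n >= 1/tau_0 + gamma*n/(1 + gamma*tau_0) for every n (small slack for rounding).
bound_holds = all(
    1.0 / taus[n] >= 1.0 / tau0 + gamma * n / (1.0 + gamma * tau0) - 1e-9
    for n in range(len(taus))
)
```

The sequence is decreasing, and the bound shows that $\tau_n$ shrinks at least as fast as a constant multiple of $1/n$.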

Theorem 2.2. Assume that $F^*$ is convex and $G$ is strongly convex. Let $(\hat x,\hat y)\in X\times Y$ be a saddle point of (2.2). For Algorithm 2 there holds

\[
\|\hat x-x_N\|^2 \le \frac{C}{N^2}\left(\frac{\|\hat x-x_0\|^2}{\tau_0^2} + \frac{\|\hat y-y_0\|^2}{\tau_0\sigma_0}\right),
\]

2.3 Acceleration: $G$ and $F^*$ are both strongly convex

In this section we will show that the primal-dual algorithm converges linearly when both $F^*$ and $G$ are strongly convex. We assume that $G$ is strongly convex in the sense of (2.18) and we also assume that $F^*$ is strongly convex in the sense that there is $\delta > 0$ such that

\[
F^*(ty_1 + (1-t)y_2) + \frac{\delta}{2}t(1-t)\|y_1-y_2\|^2 \le tF^*(y_1) + (1-t)F^*(y_2)
\]

for all $y_1, y_2\in Y$ and $0\le t\le 1$. The primal-dual algorithm is then modified into the following form.

Algorithm 3

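Initialization: Choose $\mu = 2\sqrt{\gamma\delta}/\|K\|$, $\tau = \mu/(2\gamma)$, $\sigma = \mu/(2\delta)$, $\theta\in[1/(1+\mu), 1]$, $(x_0, y_0)\in X\times Y$ and set $\bar x_0 = x_0$;

Iterations ($n\ge 0$): Update $x_n, y_n, \bar x_n$ as follows:

\[
\begin{aligned}
y_{n+1} &= \arg\max_{y\in Y}\Big\{ \langle K\bar x_n, y\rangle - F^*(y) - \frac{1}{2\sigma}\|y-y_n\|^2 \Big\},\\
x_{n+1} &= \arg\min_{x\in X}\Big\{ G(x) + \langle Kx, y_{n+1}\rangle + \frac{1}{2\tau}\|x-x_n\|^2 \Big\},\\
\bar x_{n+1} &= x_{n+1} + \theta(x_{n+1}-x_n).
\end{aligned}
\]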
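The parameter choices of Algorithm 3 can be sketched on a scalar toy problem in which both functions are quadratic. The choice $G(x) = \tfrac{\gamma}{2}(x-g)^2$, $F^*(y) = \tfrac{\delta}{2}y^2$, $K = k$, the value $\theta = 1/(1+\mu)$ and the function name `linear_primal_dual_toy` are assumptions made for this illustration; the saddle point is then $\hat x = \gamma g/(\gamma + k^2/\delta)$, $\hat y = k\hat x/\delta$.

```python
import math

def linear_primal_dual_toy(g=3.0, k=1.0, gamma=1.0, delta=1.0, iters=20):
    """Algorithm 3 for G(x) = gamma/2*(x - g)^2, F*(y) = delta/2*y^2, K = k (toy sketch)."""
    mu = 2.0 * math.sqrt(gamma * delta) / abs(k)
    tau, sigma = mu / (2.0 * gamma), mu / (2.0 * delta)
    theta = 1.0 / (1.0 + mu)  # smallest admissible value in [1/(1+mu), 1]
    x, y = 0.0, 0.0
    x_bar = x
    for _ in range(iters):
        # Dual step: closed-form maximizer for quadratic F*.
        y = (y + sigma * k * x_bar) / (1.0 + sigma * delta)
        # Primal step: closed-form minimizer for quadratic G.
        x_new = (x + tau * (gamma * g - k * y)) / (1.0 + tau * gamma)
        x_bar = x_new + theta * (x_new - x)
        x = x_new
    return x, y, mu, theta

x20, y20, mu, theta = linear_primal_dual_toy(iters=20)
omega = (1.0 + theta) / (2.0 + mu)
delta_0 = 1.0 * (0.0 - 1.5) ** 2 + 1.0 * (0.0 - 1.5) ** 2   # Delta_0 at (x0, y0) = (0, 0)
delta_20 = 1.0 * (y20 - 1.5) ** 2 + 1.0 * (x20 - 1.5) ** 2  # Delta_20 with gamma = delta = 1
```

With the default values the saddle point is $(1.5, 1.5)$, and the weighted error $\delta\|\hat y-y_n\|^2 + \gamma\|\hat x-x_n\|^2$ can be compared against $\omega^n/(1-\omega)$ times its initial value, with $\omega = (1+\theta)/(2+\mu)$.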

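Let $(\hat x,\hat y)\in X\times Y$ be the saddle point of (2.2). By the strong convexity of $F^*$ and $G$ together with the definition of $y_{n+1}$ and $x_{n+1}$, we can see that

\[
\begin{aligned}
F^*(y) &\ge F^*(y_{n+1}) + \Big\langle \frac{y_n-y_{n+1}}{\sigma} + K\bar x_n,\ y-y_{n+1}\Big\rangle + \frac{\delta}{2}\|y-y_{n+1}\|^2,\\
G(x) &\ge G(x_{n+1}) + \Big\langle \frac{x_n-x_{n+1}}{\tau} - K^*y_{n+1},\ x-x_{n+1}\Big\rangle + \frac{\gamma}{2}\|x-x_{n+1}\|^2
\end{aligned}
\]

and

\[
L(x,\hat y) - L(\hat x, y)\ge\frac{\gamma}{2}\|x-\hat x\|^2 + \frac{\delta}{2}\|y-\hat y\|^2
\]

for all $x\in X$ and $y\in Y$. By virtue of these three inequalities, we may use the same argument as for deriving (2.21) to obtain

\[
\begin{aligned}
\frac{\|\hat y-y_n\|^2}{2\sigma} + \frac{\|\hat x-x_n\|^2}{2\tau}
&\ge \Big(2\delta+\frac{1}{\sigma}\Big)\frac{\|\hat y-y_{n+1}\|^2}{2} + \Big(2\gamma+\frac{1}{\tau}\Big)\frac{\|\hat x-x_{n+1}\|^2}{2}\\
&\quad + \frac{\|y_n-y_{n+1}\|^2}{2\sigma} + \frac{\|x_n-x_{n+1}\|^2}{2\tau}\\
&\quad + \langle K(x_{n+1}-\bar x_n),\ y_{n+1}-\hat y\rangle .
\end{aligned}
\]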

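By the choices of $\mu$, $\sigma$ and $\tau$ we have

\[
\tau = \frac{\mu}{2\gamma} = \frac{1}{\|K\|}\sqrt{\frac{\delta}{\gamma}},
\qquad
\sigma = \frac{\mu}{2\delta} = \frac{1}{\|K\|}\sqrt{\frac{\gamma}{\delta}}, \tag{2.24}
\]

\[
2\delta + \frac{1}{\sigma} = 2\delta\Big(1+\frac{1}{\mu}\Big),
\qquad
2\gamma + \frac{1}{\tau} = 2\gamma\Big(1+\frac{1}{\mu}\Big).
\]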

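Therefore

\[
\begin{aligned}
\frac{1}{\mu}\big(\delta\|\hat y-y_n\|^2 + \gamma\|\hat x-x_n\|^2\big)
&\ge \Big(1+\frac{1}{\mu}\Big)\big(\delta\|\hat y-y_{n+1}\|^2 + \gamma\|\hat x-x_{n+1}\|^2\big)\\
&\quad + \frac{1}{\mu}\big(\delta\|y_n-y_{n+1}\|^2 + \gamma\|x_n-x_{n+1}\|^2\big)\\
&\quad + \langle K(x_{n+1}-\bar x_n),\ y_{n+1}-\hat y\rangle .
\end{aligned}
\]

Let

\[
\Delta_n := \delta\|\hat y-y_n\|^2 + \gamma\|\hat x-x_n\|^2 .
\]

Multiplying the above inequality by $\mu$ we obtain

\[
\Delta_n\ge(1+\mu)\Delta_{n+1} + \delta\|y_n-y_{n+1}\|^2 + \gamma\|x_n-x_{n+1}\|^2 + \mu\langle K(x_{n+1}-\bar x_n),\ y_{n+1}-\hat y\rangle . \tag{2.25}
\]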

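Recall that $\bar x_n = x_n + \theta(x_n-x_{n-1})$. For any $0 < \omega\le\theta$ and $\alpha > 0$ we may use the Cauchy-Schwarz inequality to deduce that

\[
\begin{aligned}
&\mu\langle K(x_{n+1}-\bar x_n),\ y_{n+1}-\hat y\rangle\\
&= \mu\langle K(x_{n+1}-x_n),\ y_{n+1}-\hat y\rangle - \mu\theta\langle K(x_n-x_{n-1}),\ y_{n+1}-\hat y\rangle\\
&= \mu\langle K(x_{n+1}-x_n),\ y_{n+1}-\hat y\rangle - \mu\omega\langle K(x_n-x_{n-1}),\ y_n-\hat y\rangle\\
&\quad - \mu\omega\langle K(x_n-x_{n-1}),\ y_{n+1}-y_n\rangle - \mu(\theta-\omega)\langle K(x_n-x_{n-1}),\ y_{n+1}-\hat y\rangle\\
&\ge \mu\langle K(x_{n+1}-x_n),\ y_{n+1}-\hat y\rangle - \mu\omega\langle K(x_n-x_{n-1}),\ y_n-\hat y\rangle\\
&\quad - \mu\omega\|K\|\Big(\frac{\alpha\|x_n-x_{n-1}\|^2}{2} + \frac{\|y_{n+1}-y_n\|^2}{2\alpha}\Big)\\
&\quad - \mu(\theta-\omega)\|K\|\Big(\frac{\alpha\|x_n-x_{n-1}\|^2}{2} + \frac{\|y_{n+1}-\hat y\|^2}{2\alpha}\Big)\\
&= \mu\langle K(x_{n+1}-x_n),\ y_{n+1}-\hat y\rangle - \mu\omega\langle K(x_n-x_{n-1}),\ y_n-\hat y\rangle\\
&\quad - \mu\theta\|K\|\alpha\frac{\|x_n-x_{n-1}\|^2}{2} - \mu\omega\|K\|\frac{\|y_{n+1}-y_n\|^2}{2\alpha} - \mu(\theta-\omega)\|K\|\frac{\|y_{n+1}-\hat y\|^2}{2\alpha} .
\end{aligned}
\]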

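Now we choose $\alpha = \omega\sqrt{\gamma/\delta}$. Then

\[
\mu\theta\|K\|\alpha = 2\theta\omega\gamma,
\qquad
\frac{\mu\omega\|K\|}{\alpha} = 2\delta .
\]

Therefore

\[
\begin{aligned}
\mu\langle K(x_{n+1}-\bar x_n),\ y_{n+1}-\hat y\rangle
&\ge \mu\langle K(x_{n+1}-x_n),\ y_{n+1}-\hat y\rangle - \mu\omega\langle K(x_n-x_{n-1}),\ y_n-\hat y\rangle\\
&\quad - \theta\omega\gamma\|x_n-x_{n-1}\|^2 - \delta\|y_{n+1}-y_n\|^2 - \frac{\theta-\omega}{\omega}\delta\|y_{n+1}-\hat y\|^2 .
\end{aligned}
\]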

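Combining this with (2.25) gives

\[
\begin{aligned}
\Delta_n
&\ge (1+\mu)\Delta_{n+1} + \gamma\|x_n-x_{n+1}\|^2 - \omega\theta\gamma\|x_n-x_{n-1}\|^2\\
&\quad + \mu\langle K(x_{n+1}-x_n),\ y_{n+1}-\hat y\rangle - \mu\omega\langle K(x_n-x_{n-1}),\ y_n-\hat y\rangle\\
&\quad - \frac{\theta-\omega}{\omega}\delta\|y_{n+1}-\hat y\|^2 .
\end{aligned}
\]

Now we choose $\omega$ such that

\[
1+\mu-\frac{1}{\omega} = \frac{\theta-\omega}{\omega} .
\]

This implies that $\omega = \frac{1+\theta}{2+\mu}$. By using $\theta\in[1/(1+\mu), 1]$ one can easily see that $\omega\le\theta$ and $0 < \omega < 1$. Since $\delta\|y_{n+1}-\hat y\|^2\le\Delta_{n+1}$, it follows that

\[
\begin{aligned}
\Delta_n
&\ge \frac{1}{\omega}\Delta_{n+1} + \gamma\|x_n-x_{n+1}\|^2 - \omega\theta\gamma\|x_n-x_{n-1}\|^2\\
&\quad + \mu\langle K(x_{n+1}-x_n),\ y_{n+1}-\hat y\rangle - \mu\omega\langle K(x_n-x_{n-1}),\ y_n-\hat y\rangle\\
&\ge \frac{1}{\omega}\Delta_{n+1} + \gamma\|x_n-x_{n+1}\|^2 - \omega\gamma\|x_n-x_{n-1}\|^2\\
&\quad + \mu\langle K(x_{n+1}-x_n),\ y_{n+1}-\hat y\rangle - \mu\omega\langle K(x_n-x_{n-1}),\ y_n-\hat y\rangle .
\end{aligned}
\]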

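Multiplying both sides by $\omega^{-n}$ we obtain

\[
\begin{aligned}
\omega^{-n}\Delta_n
&\ge \omega^{-n-1}\Delta_{n+1} + \omega^{-n}\gamma\|x_n-x_{n+1}\|^2 - \omega^{-n+1}\gamma\|x_n-x_{n-1}\|^2\\
&\quad + \omega^{-n}\mu\langle K(x_{n+1}-x_n),\ y_{n+1}-\hat y\rangle\\
&\quad - \omega^{-n+1}\mu\langle K(x_n-x_{n-1}),\ y_n-\hat y\rangle .
\end{aligned}
\]

Set $x_{-1} = x_0$. By summing the above inequality over $n$ from $n = 0$ to $n = N-1$ we can deduce that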

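By virtue of the Cauchy-Schwarz inequality and (2.24) we have

\[
\begin{aligned}
\mu\langle K(x_N-x_{N-1}),\ y_N-\hat y\rangle
&\ge -\frac{1}{2}\mu\|K\|\left(\sqrt{\frac{\gamma}{\delta}}\|x_N-x_{N-1}\|^2 + \sqrt{\frac{\delta}{\gamma}}\|y_N-\hat y\|^2\right)\\
&= -\gamma\|x_N-x_{N-1}\|^2 - \delta\|y_N-\hat y\|^2 .
\end{aligned}
\]

Therefore

\[
\Delta_0\ge\omega^{-N}\Delta_N - \omega^{-N+1}\delta\|y_N-\hat y\|^2 \ge \omega^{-N}\Delta_N - \omega^{-N+1}\Delta_N = \omega^{-N}(1-\omega)\Delta_N,
\]

which implies that $\Delta_N\le\frac{\Delta_0}{1-\omega}\,\omega^N$ with $0 < \omega < 1$. We thus obtain the following linear convergence result.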

Theorem 2.3. Assume that both $G$ and $F^*$ are strongly convex. Let $(\hat x,\hat y)$ be the unique saddle point of (2.2). Then for the sequence $(x_n, y_n)$ defined by Algorithm 3 there holds

\[
\delta\|\hat y-y_n\|^2 + \gamma\|\hat x-x_n\|^2 \le \frac{\omega^n}{1-\omega}\big(\delta\|\hat y-y_0\|^2 + \gamma\|\hat x-x_0\|^2\big),
\]

ADMM regularization Algorithm

In this chapter we consider the following convex minimization problem

\[
\begin{aligned}
&\text{minimize} && f(Wx)\\
&\text{subject to} && Kx = b,\quad x\in\operatorname{dom}(W)
\end{aligned} \tag{3.1}
\]

arising from linear inverse problems, where $K: X\to H$ is a bounded linear operator between two Hilbert spaces $X$ and $H$, $W: X\to Y$ is a linear operator from $X$ to another Hilbert space $Y$ with domain $\operatorname{dom}(W)$, and $f: Y\to(-\infty,\infty]$ is a proper, lower semi-continuous, convex function which is used to capture features of the sought solution under the transform $W$.

For inverse problems the operator $K$ is usually either non-invertible or ill-conditioned with a huge condition number. Thus, a small perturbation of the data may cause the problem (3.1) to have no solution; even if it has a solution, this solution may not depend continuously on the data. Because noise in the data is unavoidable, a regularization method should be employed to solve (3.1) in a stable manner.

Let $b^\delta$ be the noisy data, which is a perturbation of $b$. The variational regularization method turns (3.1) into the well-posed unconstrained minimization problem

\[
\min_{x\in\operatorname{dom}(W)}\Big\{\frac{1}{2}\|Kx-b^\delta\|^2 + \alpha f(Wx)\Big\}, \tag{3.2}
\]

where the regularization parameter $\alpha > 0$ should be chosen carefully in order to guarantee good performance. The primal-dual algorithms in the previous chapter can be used to solve (3.2). There also exist other algorithms for solving (3.1). In particular, the alternating direction method of multipliers (ADMM) is among the most famous ones. ADMM is a versatile splitting method introduced in [6,7] around the mid 1970s by Gabay, Mercier, Glowinski and Marrocco and it has been analyzed in [5, 10] for well-posed problems. This method has been revitalized and popularized for solving structured convex optimization problems in recent years; see [2] and references therein.

Although these methods perform well for solving (3.2), they suffer from the following drawbacks for finding approximate solutions of (3.1):

(i) The performance of (3.2) depends on the choice of the regularization parameter $\alpha$. One has to tune the value of $\alpha$, and hence has to solve (3.2) for many different values of $\alpha$, in order to find a reasonable approximate solution. This can be time-consuming.

(ii) All the available convergence analysis of these methods depends on the solvability of the dual problem or the existence of saddle points for the corresponding Lagrangian function. Unfortunately, these conditions may not hold for (3.1) arising from inverse problems.

In the following we will discuss the alternating direction method of multipliers developed in [9] for solving the inverse problem (3.1) as a regularization method which avoids the above two drawbacks. To formulate the method, we introduce an additional variable $y = Wx$ and rewrite (3.1) as the equivalent problem

\[
\begin{aligned}
&\text{minimize} && f(y)\\
&\text{subject to} && Kx = b,\quad Wx = y,\quad x\in\operatorname{dom}(W).
\end{aligned} \tag{3.3}
\]

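For (3.3) the corresponding augmented Lagrangian function is

\[
L_{\rho_1,\rho_2}(x, y;\lambda,\mu) = f(y) + \langle\lambda, Kx-b\rangle + \langle\mu, Wx-y\rangle + \frac{\rho_1}{2}\|Kx-b\|^2 + \frac{\rho_2}{2}\|Wx-y\|^2, \tag{3.4}
\]

where $\rho_1$ and $\rho_2$ are two positive constants. The ADMM algorithm proposed in [9] takes the form

\[
\begin{aligned}
x_{n+1} &= \arg\min_{x\in\operatorname{dom}(W)} L_{\rho_1,\rho_2}(x, y_n;\lambda_n,\mu_n),\\
y_{n+1} &= \arg\min_{y\in Y} L_{\rho_1,\rho_2}(x_{n+1}, y;\lambda_n,\mu_n),\\
\lambda_{n+1} &= \lambda_n + \rho_1(Kx_{n+1}-b),\\
\mu_{n+1} &= \mu_n + \rho_2(Wx_{n+1}-y_{n+1}).
\end{aligned} \tag{3.5}
\]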
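The four updates in (3.5) can be sketched on a tiny example. Everything specific below is an assumption made for illustration: $X = Y = \mathbb{R}^2$, $K = (1\ \ 1)$, $W$ the identity, $b = 2$, and $f(y) = \|y\|_1 + \frac{\nu}{2}\|y\|_2^2$, for which the $y$-subproblem is solved by soft thresholding. The $x$-subproblem is the linear system $(\rho_1K^*K + \rho_2W^*W)x = K^*(\rho_1b-\lambda) + W^*(\rho_2y-\mu)$, solved here in closed form for the 2-by-2 case.

```python
def soft(v, t):
    """Soft thresholding: sign(v) * max(|v| - t, 0)."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def admm_toy(b=2.0, nu=1.0, rho1=1.0, rho2=1.0, iters=200):
    """ADMM (3.5) for min ||x||_1 + nu/2*||x||^2 subject to x1 + x2 = b (toy sketch)."""
    x, y = [0.0, 0.0], [0.0, 0.0]
    lam, mu = 0.0, [0.0, 0.0]
    for _ in range(iters):
        # x-step: solve [[a, c], [c, a]] x = r with a = rho1 + rho2, c = rho1.
        r = [rho1 * b - lam + rho2 * y[i] - mu[i] for i in range(2)]
        a, c = rho1 + rho2, rho1
        det = a * a - c * c
        x = [(a * r[0] - c * r[1]) / det, (a * r[1] - c * r[0]) / det]
        # y-step: minimizer of f(y) - <mu, y> + rho2/2*||x - y||^2 via soft thresholding.
        y = [soft(mu[i] + rho2 * x[i], 1.0) / (nu + rho2) for i in range(2)]
        # Multiplier updates.
        lam += rho1 * (x[0] + x[1] - b)
        mu = [mu[i] + rho2 * (x[i] - y[i]) for i in range(2)]
    return x, y

x_admm, y_admm = admm_toy()
```

By symmetry and strict convexity the solution of this toy problem is $x = (1, 1)$, and the iterates $x_n$ and $y_n$ both approach it.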

Obviously, the $x$-subproblem is a quadratic minimization problem, and the $y$-subproblem

For the convenience of the analysis, we make the following assumptions.

Assumption 1. $K: X\to H$ is a bounded linear operator, and $K^*: H\to X$ denotes the adjoint of $K$.

Assumption 2. $f: Y\to(-\infty,\infty]$ is a proper, lower semi-continuous, strongly convex function in the sense that there exists a constant $c_0 > 0$ such that

\[
f(ty_1 + (1-t)y_2) + c_0t(1-t)\|y_1-y_2\|^2 \le tf(y_1) + (1-t)f(y_2) \tag{3.6}
\]

for all $y_1, y_2\in Y$ and $0\le t\le 1$.

Assumption 3. $W: X\to Y$ is a densely defined, closed, linear operator with domain $\operatorname{dom}(W)$. This implies that the adjoint $W^*$ of $W$ is weakly closed and densely defined.

Assumption 4. There exists a positive constant $c_1$ such that

\[
\|Kx\|^2 + \|Wx\|^2 \ge c_1\|x\|^2,\qquad \forall x\in\operatorname{dom}(W). \tag{3.7}
\]

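To analyze the method (3.5), we will make use of the subdifferential $\partial f$ of $f$, which was introduced in Chapter 2. Let

\[
\operatorname{dom}(\partial f) = \{y \in Y : \partial f(y) \neq \emptyset\}.
\]

Then for any $y \in \operatorname{dom}(\partial f)$ and $\mu \in \partial f(y)$, the quantity

\[
D_\mu f(\bar y, y) = f(\bar y) - f(y) - \langle \mu, \bar y - y\rangle, \quad \forall \bar y \in Y \tag{3.8}
\]

is called the Bregman distance induced by $f$ at $y$ in the direction $\mu$. From Assumption 2 it is easy to deduce that

\[
D_\mu f(\bar y, y) \ge c_0\|\bar y - y\|^2 \tag{3.9}
\]

for all $\bar y \in Y$, $y \in \operatorname{dom}(\partial f)$ and $\mu \in \partial f(y)$, and

\[
\langle \mu - \bar\mu, y - \bar y\rangle \ge 2c_0\|y - \bar y\|^2 \tag{3.10}
\]

for all $y, \bar y \in \operatorname{dom}(\partial f)$, $\mu \in \partial f(y)$, $\bar\mu \in \partial f(\bar y)$.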
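As a quick sanity check of (3.8)–(3.10), the sketch below evaluates the Bregman distance for the illustrative choice $f(y) = \frac12\|y\|^2$ on $\mathbb{R}^2$. This $f$ is an assumption made for the example; it satisfies (3.6) with $c_0 = \frac12$, and for it both (3.9) and (3.10) hold with equality. The helper names are hypothetical.

```python
def f(y):
    # Assumed example: f(y) = 0.5*||y||^2, strongly convex with c0 = 0.5 in (3.6).
    return 0.5 * sum(t * t for t in y)


def grad_f(y):
    # For this smooth f the subdifferential at y is the single element {y}.
    return list(y)


def bregman(y_bar, y):
    """Bregman distance (3.8) with mu taken as the gradient of f at y."""
    mu = grad_f(y)
    inner = sum(m * (a - b) for m, a, b in zip(mu, y_bar, y))
    return f(y_bar) - f(y) - inner


def sq_dist(a, b):
    return sum((s - t) ** 2 for s, t in zip(a, b))


c0 = 0.5
y_bar, y = [1.0, -2.0], [0.5, 3.0]
d = bregman(y_bar, y)
# Left-hand side of (3.10): <mu - mu_bar, y - y_bar> with mu = y and mu_bar = y_bar.
mono = sum((m - mb) * (a - b) for m, mb, a, b in zip(grad_f(y), grad_f(y_bar), y, y_bar))
print(d, c0 * sq_dist(y_bar, y), mono)
```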

Under Assumptions 1–4, we can deduce that (3.1) has a unique solution whenever $b$ is consistent in the sense that $b = Kx$ for some $x \in \operatorname{dom}(W)$ with $Wx \in \operatorname{dom}(f)$.

Proof. Denote $f^* := \inf\{f(Wz) : Kz = b,\ z \in \operatorname{dom}(W)\}$. Since $b$ is consistent, we have $f^* < \infty$. Let $\{z_n\}$ be a sequence such that

\[
z_n \in \operatorname{dom}(W), \quad Kz_n = b \quad \text{and} \quad \lim_{n\to\infty} f(Wz_n) = f^*.
\]

Since $f$ is strongly convex, it is coercive [3] and consequently $\{Wz_n\}$ is bounded in $Y$. From Assumption 4 we can also see that $\{z_n\}$ is bounded in $X$. By the Bolzano-Weierstrass theorem (in its weak form for bounded sequences in Hilbert spaces), $\{z_n\}$ has a subsequence, still denoted by the same notation, such that

\[
z_n \rightharpoonup x^* \quad \text{and} \quad Wz_n \rightharpoonup y^*
\]

for some $x^* \in X$ and $y^* \in Y$. By Assumption 3 and $\{z_n\} \subset \operatorname{dom}(W)$, we can deduce that $x^* \in \operatorname{dom}(W)$ and $y^* = Wx^*$. From Assumption 1 we also have $Kx^* = b$. According to Assumption 2, $f$ is convex and lower semi-continuous, and hence $f$ is also weakly lower semi-continuous [3]. Thus

\[
f(Wx^*) \le \liminf_{n\to\infty} f(Wz_n) = f^*,
\]

which implies that $x^*$ must be a solution of (3.1).

Next we show the uniqueness. Assume that $\bar x$ and $x^*$ are two solutions of (3.1). Then

\[
f(Wx^*) = f(W\bar x) = \inf\{f(Wz) : Kz = b,\ z \in \operatorname{dom}(W)\}.
\]

It then follows from the strong convexity of $f$ that $Wx^* = W\bar x$. Note also that $Kx^* = b = K\bar x$. It then follows from Assumption 4 that

\[
c_1\|x^* - \bar x\|^2 \le \|K(x^* - \bar x)\|^2 + \|W(x^* - \bar x)\|^2 = 0
\]

and thus $x^* = \bar x$.

3.1 ADMM algorithm and basic estimates

In practical applications, data are usually obtained by measurements and hence contain errors. Thus, instead of consistent data $b$ we only have noisy data $b^\delta$. In order to use $b^\delta$ to produce approximate solutions to the true solution of (3.1), we will use (3.5) with $b$ replaced by $b^\delta$ to produce an iterative sequence. In order to indicate the dependence on the noisy data, we will place a superscript "$\delta$" on every element of the sequence.

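Algorithm 4

Initialization: Choose $\rho_1 > 0$ and $\rho_2 > 0$. Take $y_0 \in Y$, $\lambda_0 \in H$, and $\mu_0 \in Y$. Set $y_0^\delta = y_0$, $\lambda_0^\delta = \lambda_0$ and $\mu_0^\delta = \mu_0$;

Iterations ($n \ge 0$): Update $x_n^\delta$, $y_n^\delta$, $\lambda_n^\delta$ and $\mu_n^\delta$ as follows:

\[
\begin{aligned}
x_{n+1}^\delta &= \arg\min_{x\in\operatorname{dom}(W)} \Big\{ \langle\lambda_n^\delta, Kx\rangle + \langle\mu_n^\delta, Wx\rangle + \frac{\rho_1}{2}\|Kx - b^\delta\|^2 + \frac{\rho_2}{2}\|Wx - y_n^\delta\|^2 \Big\}, \\
y_{n+1}^\delta &= \arg\min_{y\in Y} \Big\{ f(y) - \langle\mu_n^\delta, y\rangle + \frac{\rho_2}{2}\|Wx_{n+1}^\delta - y\|^2 \Big\}, \\
\lambda_{n+1}^\delta &= \lambda_n^\delta + \rho_1(Kx_{n+1}^\delta - b^\delta), \\
\mu_{n+1}^\delta &= \mu_n^\delta + \rho_2(Wx_{n+1}^\delta - y_{n+1}^\delta).
\end{aligned}
\]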

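We first show that Algorithm 4 is well-defined. It suffices to show that the $x$- and $y$-subproblems are well-defined. By rewriting these two subproblems into the equivalent forms

\[
\begin{aligned}
x_{n+1}^\delta &= \arg\min_{x\in\operatorname{dom}(W)} \Big\{ \frac{\rho_1}{2}\|Kx - b^\delta + \lambda_n^\delta/\rho_1\|^2 + \frac{\rho_2}{2}\|Wx - y_n^\delta + \mu_n^\delta/\rho_2\|^2 \Big\}, \\
y_{n+1}^\delta &= \arg\min_{y\in Y} \Big\{ f(y) + \frac{\rho_2}{2}\|y - Wx_{n+1}^\delta - \mu_n^\delta/\rho_2\|^2 \Big\},
\end{aligned}
\]

we can obtain the well-definedness from the following result.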

Lemma 3.2. Let Assumptions 1–4 hold.

(i) For any $h \in H$ and $v \in Y$, the minimization problem

\[
\min_{z\in\operatorname{dom}(W)} \Big\{ \frac{\rho_1}{2}\|Kz - h\|^2 + \frac{\rho_2}{2}\|Wz - v\|^2 \Big\} \tag{3.11}
\]

has a unique solution $z$, and $z$ and $Wz$ depend continuously on $v$ and $h$.

(ii) For any $v \in Y$, the minimization problem

\[
\min_{y\in Y} \Big\{ f(y) + \frac{\rho_2}{2}\|y - v\|^2 \Big\} \tag{3.12}
\]

has a unique solution $y$, and $y$ and $f(y)$ depend continuously on $v$.
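Problem (3.12) is a proximal-type problem, and for simple $f$ it can be solved in closed form. The sketch below uses the one-dimensional function $f(y) = \frac12 y^2 + |y|$, an assumed example that is strongly convex and non-smooth; the names `soft`, `prox_step` and `objective` are hypothetical. The optimality condition $0 \in y + \partial|y| + \rho_2(y - v)$ gives $y = \operatorname{soft}(\rho_2 v, 1)/(1 + \rho_2)$, from which the continuous (indeed non-expansive) dependence on $v$ is visible.

```python
def soft(t, kappa):
    """Soft-thresholding: shrink t toward zero by kappa."""
    if t > kappa:
        return t - kappa
    if t < -kappa:
        return t + kappa
    return 0.0


def prox_step(v, rho2):
    """Minimizer of 0.5*y^2 + |y| + (rho2/2)*(y - v)^2 over real y (assumed example f)."""
    return soft(rho2 * v, 1.0) / (1.0 + rho2)


def objective(y, v, rho2):
    return 0.5 * y * y + abs(y) + 0.5 * rho2 * (y - v) ** 2


rho2 = 2.0
for v in (1.0, 0.25, -3.0):
    print(v, prox_step(v, rho2))
```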

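Proof. (i) We first show the existence. Let $m^*$ denote the infimum of the objective function in (3.11) and let $\{z_n\}$ be a minimizing sequence, i.e.

\[
\frac{\rho_1}{2}\|Kz_n - h\|^2 + \frac{\rho_2}{2}\|Wz_n - v\|^2 \to m^* \quad \text{as } n \to \infty.
\]

Then $\{Kz_n\}$ is bounded in $H$ and $\{Wz_n\}$ is bounded in $Y$. By Assumption 4, we can conclude that $\{z_n\}$ is bounded in $X$. Hence $\{z_n\}$ has a subsequence, still denoted by the same notation, such that

\[
z_n \rightharpoonup z, \quad Wz_n \rightharpoonup y
\]

as $n \to \infty$. By Assumption 3 we have $z \in \operatorname{dom}(W)$ and $y = Wz$. Moreover

\[
\frac{\rho_1}{2}\|Kz - h\|^2 + \frac{\rho_2}{2}\|Wz - v\|^2 \le \liminf_{n\to\infty} \Big\{ \frac{\rho_1}{2}\|Kz_n - h\|^2 + \frac{\rho_2}{2}\|Wz_n - v\|^2 \Big\} = m^*.
\]

Therefore, $z$ is a minimizer of (3.11). In view of Assumption 4, we can see that the objective function in (3.11) is strictly convex and hence (3.11) has a unique solution.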

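Next we show the continuous dependence. Let $\{(h_n, v_n)\} \subset H \times Y$ be a sequence such that $(h_n, v_n) \to (h, v)$ and let $z_n$ be the solution of (3.11) with $(h, v)$ replaced by $(h_n, v_n)$. We need to show that $z_n \to z$ and $Wz_n \to Wz$. By the minimizing property of $z_n$ we have

\[
\frac{\rho_1}{2}\|Kz_n - h_n\|^2 + \frac{\rho_2}{2}\|Wz_n - v_n\|^2 \le \frac{\rho_1}{2}\|Kz - h_n\|^2 + \frac{\rho_2}{2}\|Wz - v_n\|^2.
\]

This implies that $\frac{\rho_1}{2}\|Kz_n - h_n\|^2 + \frac{\rho_2}{2}\|Wz_n - v_n\|^2$ is bounded by a constant independent of $n$. Consequently $\{Kz_n\}$ and $\{Wz_n\}$ are bounded and thus $\{z_n\}$ is bounded in $X$. By taking a subsequence if necessary we have

\[
z_n \rightharpoonup \hat z \quad \text{and} \quad Wz_n \rightharpoonup W\hat z
\]

for some $\hat z \in \operatorname{dom}(W)$. Since the norms are weakly lower semi-continuous, we have

\[
\begin{aligned}
\frac{\rho_1}{2}\|K\hat z - h\|^2 + \frac{\rho_2}{2}\|W\hat z - v\|^2
&\le \liminf_{n\to\infty} \Big\{ \frac{\rho_1}{2}\|Kz_n - h_n\|^2 + \frac{\rho_2}{2}\|Wz_n - v_n\|^2 \Big\} \\
&\le \limsup_{n\to\infty} \Big\{ \frac{\rho_1}{2}\|Kz_n - h_n\|^2 + \frac{\rho_2}{2}\|Wz_n - v_n\|^2 \Big\} \\
&\le \limsup_{n\to\infty} \Big\{ \frac{\rho_1}{2}\|Kz - h_n\|^2 + \frac{\rho_2}{2}\|Wz - v_n\|^2 \Big\} \\
&= \frac{\rho_1}{2}\|Kz - h\|^2 + \frac{\rho_2}{2}\|Wz - v\|^2.
\end{aligned}
\]

Since $z$ is the unique minimizer of (3.11), we must have $\hat z = z$. Thus

\[
z_n \rightharpoonup z, \quad Wz_n \rightharpoonup Wz \quad \text{and} \quad Kz_n \rightharpoonup Kz \tag{3.13}
\]

and

\[
\lim_{n\to\infty} \Big\{ \frac{\rho_1}{2}\|Kz_n - h_n\|^2 + \frac{\rho_2}{2}\|Wz_n - v_n\|^2 \Big\} = \frac{\rho_1}{2}\|Kz - h\|^2 + \frac{\rho_2}{2}\|Wz - v\|^2. \tag{3.14}
\]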

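Furthermore, we have

\[
\begin{aligned}
\|Kz - h\|^2 &\le \liminf_{n\to\infty} \|Kz_n - h_n\|^2 \le \limsup_{n\to\infty} \|Kz_n - h_n\|^2 \\
&= \|Kz - h\|^2 + \frac{\rho_2}{\rho_1}\|Wz - v\|^2 - \frac{\rho_2}{\rho_1}\liminf_{n\to\infty} \|Wz_n - v_n\|^2.
\end{aligned}
\]

Since $\|Wz - v\|^2 \le \liminf_{n\to\infty}\|Wz_n - v_n\|^2$, we have

\[
\liminf_{n\to\infty} \|Kz_n - h_n\|^2 = \limsup_{n\to\infty} \|Kz_n - h_n\|^2 = \|Kz - h\|^2.
\]

By the same procedure, we can also deduce that $\lim_{n\to\infty}\|Wz_n - v_n\|^2 = \|Wz - v\|^2$. Thus, by virtue of (3.13) and $(h_n, v_n) \to (h, v)$, we can conclude that $Kz_n \to Kz$ and $Wz_n \to Wz$. In view of Assumption 4 we thus have $z_n \to z$.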

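(ii) By using the same argument as in (i) we can easily show that (3.12) has a solution. Since $f$ is strictly convex, the solution is unique.

Next we show the continuous dependence. Suppose we have $v_n \to v$ in $Y$. Let $y_n$ be the solution of (3.12) with $v$ replaced by $v_n$. Then

\[
\rho_2(v - y) \in \partial f(y) \quad \text{and} \quad \rho_2(v_n - y_n) \in \partial f(y_n).
\]

Thus, by the monotonicity of the subdifferential, we have

\[
0 \le \langle \rho_2(v_n - y_n) - \rho_2(v - y), y_n - y\rangle,
\]

which implies that

\[
\|y_n - y\|^2 \le \langle v_n - v, y_n - y\rangle \le \|v_n - v\|\,\|y_n - y\|.
\]

Therefore $\|y_n - y\| \le \|v_n - v\|$ and thus $y_n \to y$ as $n \to \infty$. By an argument similar to the one used for deriving (3.14), we have

\[
\lim_{n\to\infty} \Big\{ f(y_n) + \frac{\rho_2}{2}\|y_n - v_n\|^2 \Big\} = f(y) + \frac{\rho_2}{2}\|y - v\|^2.
\]

Thus $f(y_n) \to f(y)$ as $n \to \infty$.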

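Using the above lemma, we immediately obtain the following stability result for the iterates.

Lemma 3.3. Let $\{b^\delta\}$ be a sequence of noisy data satisfying $\|b^\delta - b\| \to 0$ as $\delta \to 0$. Then for each fixed integer $n \ge 0$ there hold

\[
\begin{aligned}
& x_n^\delta \to x_n, \quad y_n^\delta \to y_n, \quad Wx_n^\delta \to Wx_n, \\
& \lambda_n^\delta \to \lambda_n, \quad \mu_n^\delta \to \mu_n, \quad f(y_n^\delta) \to f(y_n)
\end{aligned}
\tag{3.15}
\]

as $\delta \to 0$, where $(x_n, y_n, \lambda_n, \mu_n)$ are defined by (3.5).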
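The statement of Lemma 3.3 can be observed numerically on a small example. The sketch below is only an illustration under assumed toy data ($K = (1\ 1)$, $W = I$ on $\mathbb{R}^2$, $f(y) = \frac12\|y\|^2$, zero initial guesses); the names `admm_iterate` and `gap` are hypothetical. It runs a fixed number of steps with exact data $b$ and with perturbed data $b + \delta$, and reports how far apart the two $x$-iterates are.

```python
def admm_iterate(b_data, n, rho1=1.0, rho2=1.0):
    """Run n steps of (3.5) on the assumed toy instance and return the x-iterate."""
    y, lam, mu = [0.0, 0.0], 0.0, [0.0, 0.0]
    x = [0.0, 0.0]
    for _ in range(n):
        # x-update: solve [[a, o], [o, a]] x = (r1, r2) by Cramer's rule
        c = rho1 * b_data - lam
        r1, r2 = c + rho2 * y[0] - mu[0], c + rho2 * y[1] - mu[1]
        a, o = rho1 + rho2, rho1
        det = a * a - o * o
        x = [(a * r1 - o * r2) / det, (a * r2 - o * r1) / det]
        # y-update in closed form for f(y) = 0.5*||y||^2
        y = [(mu[i] + rho2 * x[i]) / (1.0 + rho2) for i in range(2)]
        lam += rho1 * (x[0] + x[1] - b_data)
        mu = [mu[i] + rho2 * (x[i] - y[i]) for i in range(2)]
    return x


def gap(delta, n=5, b=2.0):
    """Largest componentwise distance between the exact-data and noisy-data x-iterate."""
    x_exact = admm_iterate(b, n)
    x_noisy = admm_iterate(b + delta, n)
    return max(abs(p - q) for p, q in zip(x_exact, x_noisy))


for delta in (1e-2, 1e-4, 1e-6):
    print(delta, gap(delta))
```

For this linear toy problem the gap shrinks proportionally to $\delta$ for fixed $n$, in line with (3.15).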

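By using the definition of $x_{n+1}^\delta$ and $y_{n+1}^\delta$ in Algorithm 4, it is easy to see that

\[
\begin{aligned}
& K^*\lambda_n^\delta + \rho_1 K^*(Kx_{n+1}^\delta - b^\delta) + W^*[\mu_n^\delta + \rho_2(Wx_{n+1}^\delta - y_n^\delta)] = 0, \\
& 0 \in \partial f(y_{n+1}^\delta) - \mu_n^\delta - \rho_2(Wx_{n+1}^\delta - y_{n+1}^\delta).
\end{aligned}
\]

By introducing the residuals

\[
r_n^\delta = Kx_n^\delta - b^\delta, \quad s_n^\delta = Wx_n^\delta - y_n^\delta
\]

we can deduce that

\[
\begin{aligned}
& \lambda_{n+1}^\delta - \lambda_n^\delta = \rho_1 r_{n+1}^\delta, && (3.16) \\
& \mu_{n+1}^\delta - \mu_n^\delta = \rho_2 s_{n+1}^\delta, && (3.17) \\
& \mu_{n+1}^\delta \in \partial f(y_{n+1}^\delta), && (3.18) \\
& K^*\lambda_n^\delta + \rho_1 K^* r_{n+1}^\delta = -W^*[\mu_n^\delta + \rho_2(Wx_{n+1}^\delta - y_n^\delta)]. && (3.19)
\end{aligned}
\]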

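Lemma 3.4. For all $n \ge 1$ there hold

\[
\langle \lambda_{n+1}^\delta, Kx\rangle + \langle \mu_{n+1}^\delta, Wx\rangle = \rho_2 \langle y_n^\delta - y_{n+1}^\delta, Wx\rangle \tag{3.20}
\]

and

\[
\rho_1 \langle r_{n+1}^\delta, Kx\rangle + \rho_2 \langle s_{n+1}^\delta, Wx\rangle = \rho_2 \langle (y_n^\delta - y_{n+1}^\delta) - (y_{n-1}^\delta - y_n^\delta), Wx\rangle \tag{3.21}
\]

for all $x \in \operatorname{dom}(W)$.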

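Proof. Multiplying (3.16) by $K^*$ and using (3.17), (3.19) we obtain for $n \ge 0$ that

\[
\begin{aligned}
K^*\lambda_{n+1}^\delta &= K^*\lambda_n^\delta + \rho_1 K^* r_{n+1}^\delta = -W^*\big[\mu_n^\delta + \rho_2(Wx_{n+1}^\delta - y_n^\delta)\big] \\
&= -W^*\big[\mu_{n+1}^\delta - \rho_2(Wx_{n+1}^\delta - y_{n+1}^\delta) + \rho_2(Wx_{n+1}^\delta - y_n^\delta)\big] \\
&= -W^*\big[\mu_{n+1}^\delta + \rho_2(y_{n+1}^\delta - y_n^\delta)\big],
\end{aligned}
\]

which gives (3.20). Using again (3.16), (3.17) and (3.19), we then obtain for $n \ge 1$ that

\[
\begin{aligned}
\rho_1 K^* r_{n+1}^\delta &= K^*\lambda_{n+1}^\delta - K^*\lambda_n^\delta = W^*\big[\mu_n^\delta - \mu_{n+1}^\delta + \rho_2(y_n^\delta - y_{n+1}^\delta) - \rho_2(y_{n-1}^\delta - y_n^\delta)\big] \\
&= \rho_2 W^*\big[(y_n^\delta - y_{n+1}^\delta) - (y_{n-1}^\delta - y_n^\delta) - s_{n+1}^\delta\big],
\end{aligned}
\]

which gives (3.21). We thus complete the proof.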

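Since $\mu_{n+1}^\delta \in \partial f(y_{n+1}^\delta)$, we have from (3.10) that

\[
\begin{aligned}
2c_0\|y_{n+1}^\delta - y_n^\delta\|^2 &\le \langle \mu_{n+1}^\delta - \mu_n^\delta, y_{n+1}^\delta - y_n^\delta\rangle = \rho_2\langle s_{n+1}^\delta, y_{n+1}^\delta - y_n^\delta\rangle \\
&= \rho_2\langle s_{n+1}^\delta, y_{n+1}^\delta - Wx_{n+1}^\delta + Wx_{n+1}^\delta - Wx_n^\delta + Wx_n^\delta - y_n^\delta\rangle \\
&= \rho_2\langle s_{n+1}^\delta, s_n^\delta - s_{n+1}^\delta\rangle + \rho_2\langle s_{n+1}^\delta, W(x_{n+1}^\delta - x_n^\delta)\rangle.
\end{aligned}
\]

By using (3.21) in Lemma 3.4 with $x = x_{n+1}^\delta - x_n^\delta$, and noting that $K(x_{n+1}^\delta - x_n^\delta) = r_{n+1}^\delta - r_n^\delta$, we then obtain

\[
\begin{aligned}
2c_0\|y_{n+1}^\delta - y_n^\delta\|^2 \le{} & \rho_2\langle s_{n+1}^\delta, s_n^\delta - s_{n+1}^\delta\rangle + \rho_1\langle r_{n+1}^\delta, r_n^\delta - r_{n+1}^\delta\rangle \\
& + \rho_2\langle (y_n^\delta - y_{n+1}^\delta) - (y_{n-1}^\delta - y_n^\delta), W(x_{n+1}^\delta - x_n^\delta)\rangle.
\end{aligned}
\]

By the definition of $s_n^\delta$, we have

\[
s_{n+1}^\delta - s_n^\delta = Wx_{n+1}^\delta - y_{n+1}^\delta - (Wx_n^\delta - y_n^\delta) = W(x_{n+1}^\delta - x_n^\delta) - (y_{n+1}^\delta - y_n^\delta).
\]

Therefore, we can deduce that

\[
\begin{aligned}
2c_0\|y_{n+1}^\delta - y_n^\delta\|^2 \le{} & \rho_1\langle r_{n+1}^\delta, r_n^\delta - r_{n+1}^\delta\rangle + \rho_2\langle s_{n+1}^\delta, s_n^\delta - s_{n+1}^\delta\rangle \\
& + \rho_2\langle (y_n^\delta - y_{n+1}^\delta) - (y_{n-1}^\delta - y_n^\delta), (s_{n+1}^\delta - s_n^\delta) + (y_{n+1}^\delta - y_n^\delta)\rangle \\
={} & -\frac{\rho_1}{2}\big(\|r_{n+1}^\delta\|^2 - \|r_n^\delta\|^2 + \|r_{n+1}^\delta - r_n^\delta\|^2\big) \\
& - \frac{\rho_2}{2}\big(\|s_{n+1}^\delta\|^2 - \|s_n^\delta\|^2 + \|s_{n+1}^\delta - s_n^\delta\|^2\big) \\
& - \frac{\rho_2}{2}\big(\|y_{n+1}^\delta - y_n^\delta\|^2 - \|y_n^\delta - y_{n-1}^\delta\|^2 + \|(y_{n+1}^\delta - y_n^\delta) - (y_n^\delta - y_{n-1}^\delta)\|^2\big) \\
& - \rho_2\langle (y_{n+1}^\delta - y_n^\delta) - (y_n^\delta - y_{n-1}^\delta), s_{n+1}^\delta - s_n^\delta\rangle.
\end{aligned}
\tag{3.22}
\]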
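The equality in (3.22) rests on an elementary Hilbert-space identity that is used three times, with $(a, b) = (r_{n+1}^\delta, r_n^\delta)$, $(s_{n+1}^\delta, s_n^\delta)$ and $(y_{n+1}^\delta - y_n^\delta,\ y_n^\delta - y_{n-1}^\delta)$. For the reader's convenience, it can be derived as follows for arbitrary elements $a, b$ of a real Hilbert space:

```latex
\begin{aligned}
\|a - b\|^2 &= \|a\|^2 - 2\langle a, b\rangle + \|b\|^2
  \quad\Longrightarrow\quad
  \langle a, b\rangle = \tfrac12\big(\|a\|^2 + \|b\|^2 - \|a - b\|^2\big), \\
\langle a, b - a\rangle &= \langle a, b\rangle - \|a\|^2
  = -\tfrac12\big(\|a\|^2 - \|b\|^2 + \|a - b\|^2\big).
\end{aligned}
```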

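Consequently, by introducing

\[
E_n^\delta = \rho_1\|r_n^\delta\|^2 + \rho_2\|s_n^\delta\|^2 + \rho_2\|y_n^\delta - y_{n-1}^\delta\|^2,
\]

we have

\[
2c_0\|y_{n+1}^\delta - y_n^\delta\|^2 \le E_n^\delta - E_{n+1}^\delta - \frac{\rho_1}{2}\|r_{n+1}^\delta - r_n^\delta\|^2 - \frac{\rho_2}{2}\|(y_{n+1}^\delta - y_n^\delta) - (y_n^\delta - y_{n-1}^\delta) + (s_{n+1}^\delta - s_n^\delta)\|^2. \tag{3.23}
\]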

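Summing both sides of (3.23) over the iteration index from $n$ to $\infty$, we obtain

\[
2c_0 \sum_{k=n}^{\infty} \|y_{k+1}^\delta - y_k^\delta\|^2 + \frac{\rho_1}{2} \sum_{k=n}^{\infty} \|r_{k+1}^\delta - r_k^\delta\|^2 + \frac{\rho_2}{2} \sum_{k=n}^{\infty} \|(y_{k+1}^\delta - y_k^\delta) - (y_k^\delta - y_{k-1}^\delta) + (s_{k+1}^\delta - s_k^\delta)\|^2 \le E_n^\delta. \tag{3.24}
\]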

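Lemma 3.5. $\{E_n^\delta\}$ is monotonically decreasing along the iteration and there is a constant $C$ such that

\[
\sum_{k=n}^{\infty} \Big( \|y_{k+1}^\delta - y_k^\delta\|^2 + \|x_{k+1}^\delta - x_k^\delta\|^2 + \|K(x_{k+1}^\delta - x_k^\delta)\|^2 + \|W(x_{k+1}^\delta - x_k^\delta)\|^2 \Big) \le C E_n^\delta
\]

for all $n \ge 0$.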

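Proof. The monotonicity of $\{E_n^\delta\}$ follows from (3.23). From (3.24) we can derive that

\[
\begin{aligned}
& \sum_{k=n}^{\infty} \|y_{k+1}^\delta - y_k^\delta\|^2 \le C E_n^\delta, && (3.25) \\
& \sum_{k=n}^{\infty} \|r_{k+1}^\delta - r_k^\delta\|^2 \le C E_n^\delta, && (3.26) \\
& \sum_{k=n}^{\infty} \|W(x_{k+1}^\delta - x_k^\delta) - (y_k^\delta - y_{k-1}^\delta)\|^2 \le C E_n^\delta. && (3.27)
\end{aligned}
\]

Since $\|a + b\|^2 \le 2(\|a\|^2 + \|b\|^2)$, by using (3.25) and (3.27) we can obtain

\[
\sum_{k=n}^{\infty} \|W(x_{k+1}^\delta - x_k^\delta)\|^2 \le C E_n^\delta.
\]

From (3.26), we have

\[
\sum_{k=n}^{\infty} \|K(x_{k+1}^\delta - x_k^\delta)\|^2 = \sum_{k=n}^{\infty} \|r_{k+1}^\delta - r_k^\delta\|^2 \le C E_n^\delta.
\]

In view of Assumption 4, we can derive

\[
c_1\|x_{k+1}^\delta - x_k^\delta\|^2 \le \|K(x_{k+1}^\delta - x_k^\delta)\|^2 + \|W(x_{k+1}^\delta - x_k^\delta)\|^2.
\]

Therefore we can obtain

\[
\sum_{k=n}^{\infty} \|x_{k+1}^\delta - x_k^\delta\|^2 \le C E_n^\delta.
\]

The proof is complete.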
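The monotone decay of $E_n$ asserted in Lemma 3.5 is easy to observe numerically. The following sketch is an illustration only: it assumes the toy data $K = (1\ 1)$, $W = I$ on $\mathbb{R}^2$, $f(y) = \frac12\|y\|^2$, exact data and zero initial guesses, and the name `energies` is hypothetical. It records $E_n = \rho_1\|r_n\|^2 + \rho_2\|s_n\|^2 + \rho_2\|y_n - y_{n-1}\|^2$ for $n \ge 1$.

```python
def energies(b, rho1=1.0, rho2=1.0, iters=25):
    """Return [E_1, E_2, ...] for iteration (3.5) on the assumed toy instance."""
    y, lam, mu = [0.0, 0.0], 0.0, [0.0, 0.0]
    out = []
    for _ in range(iters):
        y_prev = y
        # x-update: solve [[a, o], [o, a]] x = (r1, r2) by Cramer's rule
        c = rho1 * b - lam
        r1, r2 = c + rho2 * y[0] - mu[0], c + rho2 * y[1] - mu[1]
        a, o = rho1 + rho2, rho1
        det = a * a - o * o
        x = [(a * r1 - o * r2) / det, (a * r2 - o * r1) / det]
        # y-update and multiplier updates
        y = [(mu[i] + rho2 * x[i]) / (1.0 + rho2) for i in range(2)]
        lam += rho1 * (x[0] + x[1] - b)
        mu = [mu[i] + rho2 * (x[i] - y[i]) for i in range(2)]
        # residuals r_n = K x_n - b, s_n = W x_n - y_n, and the increment y_n - y_{n-1}
        r = x[0] + x[1] - b
        s = [x[i] - y[i] for i in range(2)]
        dy = [y[i] - y_prev[i] for i in range(2)]
        out.append(rho1 * r * r
                   + rho2 * (s[0] ** 2 + s[1] ** 2)
                   + rho2 * (dy[0] ** 2 + dy[1] ** 2))
    return out


E = energies(2.0)
print(E[:5])
```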

3.2 Convergence analysis of exact data case

In this section, we give the convergence analysis of the ADMM (3.5) for the exact data case. We will assume that $b$ is consistent and let $(\hat x, \hat y)$ denote any feasible point of (3.3), i.e. $\hat x \in \operatorname{dom}(W)$ and $\hat y \in \operatorname{dom}(f)$ with $K\hat x = b$ and $W\hat x = \hat y$. We will use $\{x_n, y_n, \lambda_n, \mu_n\}$ to denote the sequence defined by (3.5).

Lemma 3.6. The sequences $\{x_n\}$ and $\{y_n\}$ are bounded and

\[
\sum_{n=1}^{\infty} \big\{ D_{\mu_n} f(y_{n+1}, y_n) + E_n \big\} < \infty. \tag{3.28}
\]

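Proof. According to (3.17) and Lemma 3.4, we can deduce that

\[
\begin{aligned}
& D_{\mu_{k+1}} f(\hat y, y_{k+1}) - D_{\mu_k} f(\hat y, y_k) + D_{\mu_k} f(y_{k+1}, y_k) \\
&\quad = \langle \mu_k - \mu_{k+1}, \hat y - y_{k+1}\rangle = -\rho_2\langle s_{k+1}, W(\hat x - x_{k+1}) + s_{k+1}\rangle \\
&\quad = -\rho_2\|s_{k+1}\|^2 - \rho_2\langle (y_k - y_{k+1}) - (y_{k-1} - y_k), W(\hat x - x_{k+1})\rangle + \rho_1\langle r_{k+1}, K(\hat x - x_{k+1})\rangle \\
&\quad = -\rho_1\|r_{k+1}\|^2 - \rho_2\|s_{k+1}\|^2 + \rho_2\langle y_{k-1} - y_k, W(\hat x - x_{k+1})\rangle - \rho_2\langle y_k - y_{k+1}, W(\hat x - x_{k+1})\rangle.
\end{aligned}
\tag{3.29}
\]

After summing the above equality from $k = m$ to $k = n-1$, we can obtain

\[
\begin{aligned}
& D_{\mu_n} f(\hat y, y_n) - D_{\mu_m} f(\hat y, y_m) + \sum_{k=m}^{n-1} D_{\mu_k} f(y_{k+1}, y_k) \\
&\quad = -\sum_{k=m+1}^{n} \big(\rho_1\|r_k\|^2 + \rho_2\|s_k\|^2\big) + \rho_2\langle y_{m-1} - y_m, W(\hat x - x_{m+1})\rangle \\
&\qquad + \rho_2 \sum_{k=m+1}^{n-1} \langle y_{k-1} - y_k, W(x_k - x_{k+1})\rangle - \rho_2\langle y_{n-1} - y_n, W(\hat x - x_n)\rangle.
\end{aligned}
\tag{3.30}
\]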

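We need to estimate the last two terms of (3.30). For the first term $\sum_{k=m+1}^{n-1}\langle y_{k-1} - y_k, W(x_k - x_{k+1})\rangle$, the Cauchy-Schwarz inequality and Lemma 3.5 yield

\[
\sum_{k=m+1}^{n-1} \langle y_k - y_{k-1}, W(x_{k+1} - x_k)\rangle \le \sum_{k=m+1}^{n-1} \big(\|y_k - y_{k-1}\|^2 + \|W(x_{k+1} - x_k)\|^2\big) \le C E_m.
\]

For the second term, we have

\[
\begin{aligned}
-\langle y_{n-1} - y_n, W(\hat x - x_n)\rangle &= -\langle y_{n-1} - y_n, W(\hat x - x_m)\rangle - \sum_{k=m}^{n-1} \langle y_{n-1} - y_n, -W(x_{k+1} - x_k)\rangle \\
&\le \|W(\hat x - x_m)\|^2 + \sum_{k=m}^{n-1} \|W(x_{k+1} - x_k)\|^2 + (n - m + 1)\|y_{n-1} - y_n\|^2 \\
&\le \|W(\hat x - x_m)\|^2 + C E_m,
\end{aligned}
\]

where, for the last inequality, we used Lemma 3.5.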

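Choosing $m = 1$ in (3.30), it is then easy to deduce that there exists a constant $C_1$ independent of $n$ such that

\[
\begin{aligned}
& D_{\mu_n} f(\hat y, y_n) + \sum_{k=1}^{n-1} D_{\mu_k} f(y_{k+1}, y_k) + \rho_1 \sum_{k=2}^{n} \|r_k\|^2 + \rho_2 \sum_{k=2}^{n} \|s_k\|^2 \\
&\quad \le D_{\mu_1} f(\hat y, y_1) + \rho_2\|W(\hat x - x_1)\|^2 + C E_1 \le C_1.
\end{aligned}
\tag{3.31}
\]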

This implies that

\[
\sum_{k=1}^{\infty} D_{\mu_k} f(y_{k+1}, y_k) + \rho_1 \sum_{k=1}^{\infty} \|r_k\|^2 + \rho_2 \sum_{k=1}^{\infty} \|s_k\|^2 \le C_1 < \infty.
\]

By (3.25) and the above inequality, we see that $\sum_{k=1}^{\infty} E_k < \infty$. From (3.31) it follows that $D_{\mu_n} f(\hat y, y_n) \le C_1$. Since $f$ is strongly convex, we have $c_0\|\hat y - y_n\|^2 \le D_{\mu_n} f(\hat y, y_n) \le C_1$, which means that $\{y_n\}$ is bounded. Furthermore, since $\sum_{k=1}^{\infty} E_k < \infty$, we obtain $E_k \to 0$ as $k \to \infty$, which means $r_k \to 0$, $s_k \to 0$ and $y_k - y_{k+1} \to 0$. Therefore we can conclude that $Kx_k \to b$, $Wx_k - y_k \to 0$ and $y_k - y_{k+1} \to 0$ as $k \to \infty$. Thus $\{Kx_k\}$ and $\{Wx_k\}$ are both bounded. By Assumption 4, we can conclude that $\{x_k\}$ is bounded.

Lemma 3.7. Let $(\hat x, \hat y)$ be a feasible point of (3.3). Then $\{D_{\mu_k} f(\hat y, y_k)\}$ is a convergent sequence.

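Proof. Let $0 < m < n$ be any two integers. By using (3.30) we have

\begin{align}
& |D_{\mu_n} f(\hat y, y_n) - D_{\mu_m} f(\hat y, y_m)| \notag \\
&\quad \le \sum_{k=m}^{n-1} D_{\mu_k} f(y_{k+1}, y_k) + \sum_{k=m+1}^{n} \big(\rho_1\|r_k\|^2 + \rho_2\|s_k\|^2\big) + \rho_2|\langle y_{m-1} - y_m, W(\hat x - x_{m+1})\rangle| \notag \\
&\qquad + \rho_2 \sum_{k=m+1}^{n-1} |\langle y_{k-1} - y_k, W(x_k - x_{k+1})\rangle| + \rho_2|\langle y_{n-1} - y_n, W(\hat x - x_n)\rangle| \notag \\
&\quad \le \sum_{k=m}^{n} \big(D_{\mu_k} f(y_{k+1}, y_k) + \rho_1\|r_k\|^2 + \rho_2\|s_k\|^2\big) + 2\rho_2 C E_m \tag{3.32} \\
&\qquad + \rho_2\|y_{m-1} - y_m\|\,\|W(\hat x - x_{m+1})\| + \rho_2\|y_{n-1} - y_n\|\,\|W(\hat x - x_n)\|. \tag{3.33}
\end{align}

By Lemma 3.6, we can deduce that $|D_{\mu_n} f(\hat y, y_n) - D_{\mu_m} f(\hat y, y_m)| \to 0$ as $m, n \to \infty$. Therefore $\{D_{\mu_k} f(\hat y, y_k)\}$ is a Cauchy sequence, and hence $\{D_{\mu_k} f(\hat y, y_k)\}$ is convergent.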

Theorem 3.8. Assume that $b$ is consistent and that Assumptions 1–4 hold. Let $x^*$ be the unique solution of (3.1), and let $y^* = Wx^*$. Then for the ADMM (3.5) we have, as $k \to \infty$,

\[
x_k \to x^*, \quad y_k \to y^*, \quad Wx_k \to y^*, \quad f(y_k) \to f(y^*) \quad \text{and} \quad D_{\mu_k} f(y^*, y_k) \to 0. \tag{3.34}
\]

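Proof. The first part is to show that $\{y_k\}$ is a Cauchy sequence.

As above, let $(\hat x, \hat y)$ denote a feasible point of (3.3). By the definition of the Bregman distance we have

\[
D_{\mu_m} f(y_n, y_m) = D_{\mu_m} f(\hat y, y_m) - D_{\mu_n} f(\hat y, y_n) + \langle \mu_n - \mu_m, y_n - \hat y\rangle. \tag{3.35}
\]

By equation (3.17), the identity $y_n - \hat y = -s_n + W(x_n - \hat x)$ and Lemma 3.4, we can deduce

\[
\begin{aligned}
\langle \mu_n - \mu_m, y_n - \hat y\rangle
&= \sum_{k=m}^{n-1} \langle \mu_{k+1} - \mu_k, y_n - \hat y\rangle = \rho_2 \sum_{k=m}^{n-1} \langle s_{k+1}, y_n - \hat y\rangle \\
&= -\rho_2 \sum_{k=m}^{n-1} \langle s_{k+1}, s_n\rangle + \rho_2 \sum_{k=m}^{n-1} \langle s_{k+1}, W(x_n - \hat x)\rangle \\
&= -\sum_{k=m}^{n-1} \big(\rho_2\langle s_{k+1}, s_n\rangle + \rho_1\langle r_{k+1}, r_n\rangle\big) + \rho_2\langle y_{n-1} - y_n, W(x_n - \hat x)\rangle \\
&\quad - \rho_2\langle y_{m-1} - y_m, W(x_n - \hat x)\rangle.
\end{aligned}
\tag{3.36}
\]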

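By the Cauchy-Schwarz inequality, we can deduce

\[
\begin{aligned}
|\langle \mu_n - \mu_m, y_n - \hat y\rangle| &\le \sum_{k=m}^{n-1} \big(\rho_2|\langle s_{k+1}, s_n\rangle| + \rho_1|\langle r_{k+1}, r_n\rangle|\big) \\
&\quad + \rho_2|\langle y_{n-1} - y_n, W(x_n - \hat x)\rangle| + \rho_2|\langle y_{m-1} - y_m, W(x_n - \hat x)\rangle| \\
&\le \frac12 \sum_{k=m+1}^{n} \big(\rho_1\|r_k\|^2 + \rho_2\|s_k\|^2\big) + \frac{n-m}{2}\big(\rho_1\|r_n\|^2 + \rho_2\|s_n\|^2\big) \\
&\quad + \rho_2|\langle y_{n-1} - y_n, W(x_n - \hat x)\rangle| + \rho_2|\langle y_{m-1} - y_m, W(x_n - \hat x)\rangle|.
\end{aligned}
\tag{3.37}
\]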

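Since $E_k$ is monotonically decreasing along the iteration, we can further deduce that

\[
\begin{aligned}
|\langle \mu_n - \mu_m, y_n - \hat y\rangle|
&\le \frac12 \sum_{k=m+1}^{n} E_k + \frac{n-m}{2}\big(\rho_1\|r_n\|^2 + \rho_2\|s_n\|^2\big) \\
&\quad + \rho_2|\langle y_{n-1} - y_n, W(x_n - \hat x)\rangle| + \rho_2|\langle y_{m-1} - y_m, W(x_n - \hat x)\rangle| \\
&\le \sum_{k=m+1}^{n} E_k + \rho_2|\langle y_{n-1} - y_n, W(x_n - \hat x)\rangle| + \rho_2|\langle y_{m-1} - y_m, W(x_n - \hat x)\rangle| \\
&\le \sum_{k=m+1}^{n} E_k + \rho_2\big(\|y_n - y_{n-1}\|^2 + \|y_m - y_{m-1}\|^2 + 2\|W(x_n - \hat x)\|^2\big).
\end{aligned}
\tag{3.38}
\]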

According to Lemma 3.5, Lemma 3.6 and (3.38), we have $\langle \mu_n - \mu_m, y_n - \hat y\rangle \to 0$ as $m, n \to \infty$. According to Lemma 3.7, we have $D_{\mu_m} f(\hat y, y_m) - D_{\mu_n} f(\hat y, y_n) \to 0$ as $m, n \to \infty$. Thus by (3.35), we obtain that $D_{\mu_m} f(y_n, y_m) \to 0$ as $m, n \to \infty$. Since $f$ is a strongly convex function, from (3.9) we obtain $\|y_n - y_m\| \to 0$ as $m, n \to \infty$. Thus $\{y_k\}$ is a Cauchy sequence. Therefore, there is $\tilde y \in Y$ such that $y_k \to \tilde y$ as $k \to \infty$.

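The second part is to show that $\{x_k\}$ is a Cauchy sequence.

Using Lemma 3.6 and $y_k \to \tilde y$ from above, we obtain $Wx_k \to \tilde y$ and $Kx_k \to b$. Under Assumption 4, we have

\[
c_1\|x_n - x_k\|^2 \le \|Kx_n - Kx_k\|^2 + \|Wx_n - Wx_k\|^2 \tag{3.39}
\]

for any integers $n, k$. Hence, $\|x_n - x_k\| \to 0$ as $n, k \to \infty$. Therefore $\{x_k\}$ is a Cauchy sequence in $X$ with elements in $\operatorname{dom}(W)$. Consequently, there is $\tilde x \in X$ such that $x_k \to \tilde x$ as $k \to \infty$. Furthermore, we have $b = \lim_{k\to\infty} Kx_k = K\tilde x$. Since $W$ is a closed operator and $\{x_k\}$ is a sequence in $\operatorname{dom}(W)$, we have $\tilde x \in \operatorname{dom}(W)$ and $W\tilde x = \tilde y$.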

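The third part is to show $\tilde y \in \operatorname{dom}(f)$, $\lim_{k\to\infty} f(y_k) = f(\tilde y)$ and $\lim_{k\to\infty} D_{\mu_k} f(\tilde y, y_k) = 0$.

Since $\mu_k \in \partial f(y_k)$, we have

\[
f(y_k) \le f(\hat y) + \langle \mu_k, y_k - \hat y\rangle. \tag{3.40}
\]

Using equation (3.38), we have

\[
\langle \mu_k, y_k - \hat y\rangle \le \langle \mu_1, y_k - \hat y\rangle + \sum_{i=2}^{k} E_i + \rho_2\|y_{k-1} - y_k\|\,\|W(x_k - \hat x)\| + \rho_2\|y_0 - y_1\|\,\|W(x_k - \hat x)\|. \tag{3.41}
\]

Combining these two inequalities, and using Lemma 3.6, we have

\[
f(y_k) \le C < \infty \tag{3.42}
\]

where $C$ is a constant independent of $k$.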

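From the above, $(\tilde x, \tilde y)$ is a feasible point of (3.3). Using (3.38) and $y_k \to \tilde y$, $Wx_k \to W\tilde x$, we have

\[
\limsup_{k\to\infty} |\langle \mu_k, y_k - \tilde y\rangle| \le \sum_{i=m+1}^{\infty} E_i \tag{3.43}
\]

for all integers $m$.

Combining equation (3.43) and Lemma 3.6, we deduce that $\langle \mu_k, y_k - \tilde y\rangle \to 0$ as $k \to \infty$. Since $(\tilde x, \tilde y)$ is a feasible point of (3.3), we can replace $\hat y$ by $\tilde y$ in (3.40). Thus $\limsup_{k\to\infty} f(y_k) \le f(\tilde y)$. Combining it with (3.42) implies $\lim_{k\to\infty} f(y_k) = f(\tilde y)$. Therefore

\[
\lim_{k\to\infty} D_{\mu_k} f(\tilde y, y_k) = 0.
\]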

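The final part is to show $\tilde x = x^*$, $\tilde y = y^*$.

Let $\varepsilon > 0$ be any small number. Since $\langle \mu_n - \mu_m, y_n - \hat y\rangle \to 0$ as $m, n \to \infty$, and by Lemma 3.6, there exists $k_0$ such that

\[
|\langle \mu_k - \mu_{k_0}, y_k - \hat y\rangle| \le \varepsilon, \quad \rho_2|\langle y_{k_0-1} - y_{k_0}, W(x_k - \hat x)\rangle| \le \varepsilon
\]

for $k_0 \le k$. From equation (3.40), we can get

\[
f(y_k) \le f(\hat y) + \langle \mu_k - \mu_{k_0}, y_k - \hat y\rangle + \langle \mu_{k_0}, y_k - \hat y\rangle \le f(\hat y) + \varepsilon + \langle \mu_{k_0}, y_k - \hat y\rangle.
\]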

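By equation (3.17) and Lemma 3.4, we have

\[
\begin{aligned}
\langle \mu_{k_0}, y_k - \hat y\rangle &= -\langle \mu_{k_0}, s_k\rangle + \langle \mu_1, W(x_k - \hat x)\rangle + \rho_2 \sum_{i=2}^{k_0} \langle s_i, W(x_k - \hat x)\rangle \\
&= -\langle \mu_{k_0}, s_k\rangle - \langle \lambda_1, r_k\rangle + \rho_2\langle y_0 - y_1, W(x_k - \hat x)\rangle - \sum_{i=2}^{k_0} \rho_1\langle r_i, r_k\rangle \\
&\quad + \sum_{i=2}^{k_0} \rho_2\langle (y_{i-1} - y_i) - (y_{i-2} - y_{i-1}), W(x_k - \hat x)\rangle \\
&= -\langle \mu_{k_0}, s_k\rangle - \langle \lambda_1, r_k\rangle - \sum_{i=2}^{k_0} \rho_1\langle r_i, r_k\rangle + \rho_2\langle y_{k_0-1} - y_{k_0}, W(x_k - \hat x)\rangle.
\end{aligned}
\]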

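Thus it follows that

\[
|\langle \mu_{k_0}, y_k - \hat y\rangle| \le \|\mu_{k_0}\|\,\|s_k\| + \|\lambda_1\|\,\|r_k\| + \rho_1\|r_k\| \sum_{i=2}^{k_0} \|r_i\| + \varepsilon.
\]

Therefore

\[
f(y_k) \le f(\hat y) + 2\varepsilon + \|\mu_{k_0}\|\,\|s_k\| + \|\lambda_1\|\,\|r_k\| + \rho_1\|r_k\| \sum_{i=2}^{k_0} \|r_i\|.
\]

By Lemma 3.6, we have $r_k \to 0$ and $s_k \to 0$, and hence

\[
\limsup_{k\to\infty} f(y_k) \le f(\hat y) + 2\varepsilon.
\]

Since $f$ is lower semi-continuous and $y_k \to \tilde y$, we get

\[
f(\tilde y) \le \liminf_{k\to\infty} f(y_k) \le \limsup_{k\to\infty} f(y_k) \le f(\hat y) + 2\varepsilon.
\]

Since $\varepsilon > 0$ is arbitrary, $f(\tilde y) \le f(\hat y)$. Since $(\tilde x, \tilde y)$ is a feasible point of (3.3) and $(\hat x, \hat y)$ is an arbitrary feasible point, we obtain

\[
f(W\tilde x) = f(\tilde y) = \min\{f(Wx) : x \in \operatorname{dom}(W),\ Kx = b\} = f(Wx^*) = f(y^*).
\]

By Theorem 3.1, we can conclude $\tilde x = x^*$ and hence $\tilde y = y^*$.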

3.3 Regularization: noisy data case

In the real world, data are always obtained by measurement and therefore contain errors. We assume

\[
\|b^\delta - b\| \le \delta,
\]

where $\delta$ is a small positive number representing the noise level. In this section we consider Algorithm 4 with noisy data $b^\delta$. Due to the ill-posedness of inverse problems, this algorithm exhibits the semi-convergence property, i.e. the iterates converge toward the sought solution at the beginning and, after a critical number of iterations, eventually diverge from it due to the amplification of noise. Thus a stopping rule should be introduced to terminate the method so that a regularization property can be ensured.

Rule 1. Let $\tau > 1$ be a fixed number and define $k_\delta$ to be the integer such that

\[
E_{k_\delta}^\delta \le \max\{\rho_1, \rho_2\}\,\tau^2\delta^2 < E_k^\delta, \quad 0 \le k < k_\delta.
\]
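In an implementation, Rule 1 amounts to monitoring $E_k^\delta$ and stopping at the first index where it falls below the threshold $\max\{\rho_1, \rho_2\}\tau^2\delta^2$. The sketch below shows this check in isolation; the function name `stopping_index` and the sample energy values are hypothetical and serve only to illustrate the rule.

```python
def stopping_index(energies, delta, tau, rho1, rho2):
    """Return the first index k with E_k <= max(rho1, rho2) * tau^2 * delta^2.

    `energies` is assumed to list E_0, E_1, ... in iteration order.
    Returns None if the threshold is not reached within the given values.
    """
    threshold = max(rho1, rho2) * tau ** 2 * delta ** 2
    for k, e in enumerate(energies):
        if e <= threshold:
            return k
    return None


# Hypothetical energy values; with delta = 0.1 and tau = 2 the threshold is about 0.04.
sample = [1.0, 0.5, 0.01, 0.2]
print(stopping_index(sample, delta=0.1, tau=2.0, rho1=1.0, rho2=1.0))
```

In practice the values $E_k^\delta$ would be computed inside the iteration of Algorithm 4 and the loop terminated as soon as the condition holds, rather than stored in a list.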

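We will show the regularization property of Algorithm 4 terminated by Rule 1.

Lemma 3.9. Rule 1 defines a finite integer $k_\delta$. Moreover, there exist positive constants $c$ and $C$ such that

\[
\begin{aligned}
D_{\mu_n^\delta} f(\hat y, y_n^\delta) + c \sum_{k=m}^{n} E_k^\delta \le{} & D_{\mu_m^\delta} f(\hat y, y_m^\delta) + \rho_2\langle y_{m-1}^\delta - y_m^\delta, W(\hat x - x_{m+1}^\delta)\rangle \\
& + C\big(\|W(\hat x - x_m^\delta)\|^2 + \|s_m^\delta\|^2 + E_m^\delta\big).
\end{aligned}
\tag{3.44}
\]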

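Proof. Similar to the derivation of (3.29), and using the assumption that $\|b^\delta - b\| \le \delta$, we can also derive

\[
\begin{aligned}
& D_{\mu_{k+1}^\delta} f(\hat y, y_{k+1}^\delta) - D_{\mu_k^\delta} f(\hat y, y_k^\delta) + D_{\mu_k^\delta} f(y_{k+1}^\delta, y_k^\delta) \\
&\quad = -\rho_1\|r_{k+1}^\delta\|^2 - \rho_2\|s_{k+1}^\delta\|^2 + \rho_2\langle y_{k-1}^\delta - y_k^\delta, W(\hat x - x_{k+1}^\delta)\rangle \\
&\qquad - \rho_2\langle y_k^\delta - y_{k+1}^\delta, W(\hat x - x_{k+1}^\delta)\rangle + \rho_1\langle r_{k+1}^\delta, b - b^\delta\rangle \\
&\quad \le -\rho_1\|r_{k+1}^\delta\|^2 - \rho_2\|s_{k+1}^\delta\|^2 + \rho_2\langle y_{k-1}^\delta - y_k^\delta, W(\hat x - x_{k+1}^\delta)\rangle \\
&\qquad - \rho_2\langle y_k^\delta - y_{k+1}^\delta, W(\hat x - x_{k+1}^\delta)\rangle + \rho_1\delta\|r_{k+1}^\delta\|.
\end{aligned}
\tag{3.45}
\]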

Under the stopping criterion of Rule 1, we find that for $1 \le k < k_\delta - 1$,

ρ1krδk+1kδ≤

p

ρ1krk+1k2+ρ2ksk+1k2

s

ρ1E_kδ

τ2_max(_ρ 1, ρ2)

≤ 1

τE δ