Primal-dual algorithm and ADMM for
digital image processing
by
Huichen Liang
u5606165
degree of Master of Mathematical Science
Acknowledgements
I would like to express my special appreciation and thanks to my supervisor, Dr. Qinian
Jin; you have been a tremendous mentor for me. I would like to thank you for
encouraging my research and for your brilliant comments. The biweekly meetings are very
much appreciated.
A special thanks to my best friend Jingyue Lu. Words cannot express how grateful I
am to you. Your advice on my study and life has been invaluable.
I would especially like to thank my family. You always encourage me to try new things.
Your support was vital for me in changing my program from geophysics to mathematics.
This thesis introduces two algorithms for removing noise and blur from images. In
the first part, we discuss the primal-dual algorithm, which is efficient for solving
non-smooth convex problems. For the general problem, this method converges
to a saddle point with rate O(1/N) in finite-dimensional Hilbert spaces. Furthermore,
when either the primal objective or the dual objective is uniformly convex, the
convergence rate improves to O(1/N^2). When both the primal objective and the dual objective are
uniformly convex, the convergence is linear, with rate O(ω^{N/2}). Since
the primal-dual algorithm is sensitive to the regularization parameter and relies on
the solvability of the dual problem, in the second part we introduce a method based on
the alternating direction method of multipliers (ADMM), obtained by adding one new
variable. This method converges to the solution when the data are exact.
Acknowledgements
1 Introduction
2 First order primal-dual algorithms
2.1 The primal-dual algorithm
2.2 Acceleration: G is strongly convex
2.3 Acceleration: G and F∗ are both strongly convex
3 ADMM regularization Algorithm
3.1 ADMM algorithm and basic estimates
3.2 Convergence analysis of exact data case
3.3 Regularization: noisy data case
4 Experiments
4.1 The Primal-Dual Algorithm
4.1.1 Total Variation Based Image Denoising
4.1.1.1 The ROF Model
4.1.1.2 The TV-L1 Model
4.1.1.3 The Huber-ROF Model
4.1.2 Advanced Imaging Problems
4.1.2.1 Image Deconvolution and Zooming
4.2 Experiments of ADMM Algorithm
4.3 Conclusion
Bibliography
Introduction
The first permanent photograph of a camera image was made in 1826 by Joseph Nicéphore
Niépce. Since then, more and more people have become accustomed to using cameras to
record meaningful scenes. We would like photographs to be as clear as possible; however,
every photograph is more or less blurry. Noise and blur are produced not only by the
camera, but also by the sensor and circuitry of a scanner. Since noise and blur are
unavoidable, this thesis introduces some algorithms, with applications, for removing
these unwanted elements.
A digital image is a numeric representation of a two-dimensional image. The elements
that form a digital image are called pixels. Each pixel can be represented by an
intensity value; for example, (1,0,0) represents the colour red in Matlab.
Furthermore, a digital image can be stored as a matrix. In particular, a grayscale
image can be represented by a two-dimensional matrix in which each element determines
the intensity of the corresponding pixel, and a colour image can be represented by a
three-dimensional matrix whose elements, integers between 0 and 255, determine the
intensity of each pixel with respect to the corresponding colour channel. Once the
image has been transformed into a matrix, we can perform computations on it.
Many methods have been proposed in the literature for solving ill-posed inverse
imaging problems. Usually, these problems can be divided into two classes: convex
problems and non-convex problems. For convex problems, we can usually obtain
a global optimum whose accuracy depends only on the model and is independent of
the initialisation and the optimisation algorithm. For non-convex problems, we can
compute a more precise solution; however, the solution is sensitive to the initialisation
and the optimisation algorithm.
Therefore, in this thesis, we introduce two algorithms for solving ill-posed inverse
imaging problems: the primal-dual algorithm and the alternating direction method of
multipliers (ADMM).
Total variation minimisation plays a vital role in convex methods for imaging. It is
based on the principle that noisy signals have a high total variation. This method
was introduced for image denoising by Rudin, Osher and Fatemi [11]. Its
advantage is that it allows for sharp discontinuities in the result.
However, since the total variation is non-smooth, it is not easy to minimise.
Therefore, we first introduce the primal-dual algorithm for
non-smooth convex optimisation problems.
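The principle that noise raises the total variation can be illustrated with a small sketch. The one-dimensional discretisation (sum of absolute differences of neighbouring samples), the helper name `total_variation` and the sample signals below are assumptions chosen for illustration; they are not taken from the thesis.

```python
def total_variation(signal):
    """Discrete 1D total variation: sum of absolute neighbour differences."""
    return sum(abs(b - a) for a, b in zip(signal, signal[1:]))


# A clean step edge and the same edge with a small oscillating perturbation.
clean = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
perturbation = [0.1, -0.1, 0.1, -0.1, 0.1, -0.1]
noisy = [c + p for c, p in zip(clean, perturbation)]

tv_clean = total_variation(clean)  # only the single jump contributes
tv_noisy = total_variation(noisy)  # every oscillation adds to the sum
```

The single sharp jump costs no more than a gradual ramp of the same height, which is why penalising the total variation removes oscillations without forcing edges to be smoothed out.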
The second chapter of this thesis, which is based on [4], discusses the
primal-dual algorithm. In this chapter, we focus on using the primal-dual algorithm to
solve the generic saddle-point problem
\[ \min_{x\in X}\max_{y\in Y}\; \langle Kx, y\rangle + G(x) - F^*(y), \]
where X and Y are two finite-dimensional real vector spaces. We analyze this
algorithm under several assumptions. Firstly, we discuss the basic algorithm
under the assumption that G and F∗ are both proper, convex, lower semi-continuous
functions. Then Algorithm 1 converges at the rate O(1/N), and we have $x_n \to x^*$,
$y_n \to y^*$. In the second section of the chapter, we add the assumption that G is
uniformly convex. Algorithm 1 with constant step lengths then still converges only at
the rate O(1/N). We therefore update the step lengths in each iteration and obtain
Algorithm 2, in which the step lengths depend on n. For this algorithm we deduce that
the convergence rate is O(1/N^2). In the next section, assuming that both G and F∗ are
uniformly convex and using specific constant step lengths, we deduce that the
convergence rate of Algorithm 3 is O(ω^{N/2}).
Variational regularization methods are a common tool for solving these inverse problems.
The total variation regularization mentioned above is one such variational
regularization method. From the above experiments, however, we can see that the choice of
the regularization parameter is very important for denoising. When it is very small, the
denoising effect is not apparent, while when it is very large, the total variation term plays an
adding a variable y = W x such that
\[ \text{minimize } f(y) \quad \text{subject to } Ax = b,\; Wx = y,\; x \in \mathrm{dom}(W). \]
Next, this chapter carries out the convergence analysis under four assumptions:
A is a bounded linear operator; f is a proper, lower semi-continuous, strongly convex
function; W is a densely defined, closed, linear operator; and there exists a positive
constant $c_1$ such that $\|Ax\|^2 + \|Wx\|^2 \ge c_1\|x\|^2$. Under these assumptions
the problem admits a unique solution.
Furthermore, we observe that both x and y converge as the iteration number increases.
Let $x^*$ be the unique solution and $y^* = Wx^*$. Then there holds
\[ x_k \to x^*, \quad y_k \to y^*, \quad Wx_k \to y^*, \quad f(y_k) \to f(y^*), \quad D_{\mu_k} f(y^*, y_k) \to 0 \]
as $k \to \infty$, where k is the iteration number. In practice, the data obtained contain
errors. We use δ to denote the noise level, so that $\|b^\delta - b\|_2 \le \delta$. Then there
holds
\[ x_k^\delta \to x^*, \quad y_k^\delta \to y^*, \quad Wx_k^\delta \to y^*, \quad f(y_k^\delta) \to f(y^*), \quad D_{\mu_k^\delta} f(y^*, y_k^\delta) \to 0 \]
as $\delta \to 0$.
Furthermore, after adding a stopping criterion to the ADMM algorithm, there
holds
\[ x_{k_\delta}^\delta \to x^*, \quad y_{k_\delta}^\delta \to y^*, \quad Wx_{k_\delta}^\delta \to y^*, \quad f(y_{k_\delta}^\delta) \to f(y^*), \quad D_{\mu_{k_\delta}^\delta} f(y^*, y_{k_\delta}^\delta) \to 0 \]
as $\delta \to 0$, where $k_\delta$ is the first integer satisfying the stopping criterion.
The first section of the final chapter compares the experimental results of the
primal-dual algorithms using total variation methods. There are two major categories of
image deblurring: image sharpening and image restoration by deconvolution.
Sharpening enhances the definition of edges or high-frequency components in an image to
bring out invisible details. Image restoration based on deconvolution exploits a different
concept. The blurred image is modeled as the original image convolved with a 2D
filter, the point spread function (PSF), which degrades the image. The goal of image
restoration is to undo the convolution and in turn eliminate the degradation or the blurring.
The first model introduced there is the Rudin-Osher-Fatemi (ROF) model, which removes
noise from a noisy digital image while preserving sharp discontinuities. This method seeks
a minimizer of the sum of a data fidelity term, measured by the squared l2-norm, and
the total variation regularization term:
\[ \min_{u} \int_\Omega |Du| + \frac{\lambda}{2}\|u - g\|_2^2. \]
We also call it the TV-L2 model since the data fidelity term is measured in the L2 norm.
The second model is the TV-L1 model, which replaces the L2 norm in the ROF model by the
L1 norm:
\[ \min_{u} \int_\Omega |Du| + \lambda\|u - g\|_1. \]
The problem then becomes non-strictly convex, with a non-unique global minimizer [12].
Although the TV-L1 model changes only slightly, it usually yields a cleaner image than the
ROF model, especially for salt-and-pepper noise, and it is contrast invariant [16]. The
third model is the Huber-ROF model. This model is smooth in both terms; it
replaces the L1 norm by the Huber norm
\[ |x|_\alpha = \begin{cases} \dfrac{|x|^2}{2\alpha}, & |x| \le \alpha, \\[1ex] |x| - \dfrac{\alpha}{2}, & |x| > \alpha. \end{cases} \]
This model is employed to avoid undesired staircasing effects and yields more natural
results. Finally, we extend the ROF model to image deconvolution and digital
zooming:
\[ \min_{u} \int_\Omega |Du| + \frac{\lambda}{2}\|Au - g\|_2^2, \]
where A is the convolution with the point spread function. The total variation based
zooming brings an image with sharp edges out of a very blurry one.
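The Huber norm used in the Huber-ROF model is quadratic near zero and linear away from zero. A minimal sketch of the scalar function follows; the function name `huber` and the validity check on `alpha` are illustrative choices, not part of the thesis.

```python
def huber(x, alpha):
    """Huber norm |x|_alpha for a scalar x and a threshold alpha > 0."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    a = abs(x)
    if a <= alpha:
        return a * a / (2.0 * alpha)  # quadratic branch near zero
    return a - alpha / 2.0            # linear branch, shifted to stay continuous
```

Both branches give the value alpha/2 at |x| = alpha, so the function is continuous there, and the quadratic branch is what damps small oscillations instead of turning them into staircase steps.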
The second section of the final chapter compares the experimental results of the ADMM
algorithm with a stopping criterion. By setting $f(y) = \|y\|_1 + \frac{\nu}{2}\|y\|_2^2$, for which the
y-subproblem can be solved explicitly by soft thresholding, we can see that this method can obtain a clear result from
First order primal-dual algorithms
Blurring in images can arise from many sources, such as limitations of the optical
system, camera and object motion, astigmatism, and environmental effects [8]. Image
deblurring is the process of making a blurry image clearer so that it better represents
the true scene.
Let x and b denote the true image and the blurred image respectively. The blurring
process can be described by a bounded linear operator K : X → Y between two Hilbert
spaces X and Y such that b = Kx. Because random noise is unavoidable,
one usually only has noisy data $\tilde b$ of the form
\[ \tilde b = Kx + \varepsilon, \]
where ε denotes the noise. How to reconstruct the true image from the noisy
data therefore becomes an important question.
The variational regularization method reconstructs the true image by considering
the minimization problem
\[ \min_{x\in X}\{G(x) + F(Kx)\}, \tag{2.1} \]
where G : X → [0,∞] and F : Y → [0,∞] are proper, lower semi-continuous, convex
functions. The first term G(x) is the regularization term, which is used to capture the
features of the sought solution, and the second term F(Kx) is the fidelity-to-data term, which
measures the smallness of the residual.
Let F∗ denote the Fenchel conjugate of F, i.e.
\[ F^*(y) = \sup_{z\in Y}\{\langle y, z\rangle - F(z)\}. \]
Then F∗ : Y → (−∞,∞] is also a proper, lower semi-continuous, convex function with the
property that
\[ F(z) = \sup_{y\in Y}\{\langle z, y\rangle - F^*(y)\}. \]
Consequently, (2.1) can be reformulated as the following equivalent saddle-point
problem
\[ \min_{x\in X}\max_{y\in Y}\{G(x) + \langle Kx, y\rangle - F^*(y)\}, \tag{2.2} \]
where ⟨·,·⟩ denotes the inner product, whose induced norm is denoted by ‖·‖. Therefore,
finding a solution $\hat x$ of (2.1) is equivalent to finding a solution $(\hat x, \hat y)$ of (2.2), which is called
a saddle point in the sense that
\[ L(\hat x, y) \le L(\hat x, \hat y) \le L(x, \hat y) \]
for all x ∈ X and y ∈ Y, where
\[ L(x, y) = G(x) + \langle Kx, y\rangle - F^*(y). \]
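As an illustration of the reformulation from (2.1) to (2.2), consider the ROF model from the introduction. The choices below are assumptions made for this example only: K is taken to be a discrete gradient operator, the total variation is taken in its anisotropic form $\|Kx\|_1$, and g denotes the noisy image.

```latex
% ROF model written in the form (2.1), with K a discrete gradient (assumption):
G(x) = \frac{\lambda}{2}\|x - g\|_2^2, \qquad F(z) = \|z\|_1 .
% The Fenchel conjugate of the l1-norm is the indicator of the unit l-infinity ball:
F^*(y) = \sup_{z}\{\langle y, z\rangle - \|z\|_1\}
       = \begin{cases} 0, & \|y\|_\infty \le 1, \\ +\infty, & \text{otherwise.} \end{cases}
% Hence the saddle-point problem (2.2) becomes
\min_{x}\;\max_{\|y\|_\infty \le 1}\;
   \Big\{ \frac{\lambda}{2}\|x - g\|_2^2 + \langle Kx, y\rangle \Big\}.
```

In this form the non-smooth term has been replaced by a bilinear coupling and a simple box constraint on y, which is what makes the proximal steps of a primal-dual method cheap to evaluate.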
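It is easy to see that $(\hat x, \hat y)$ is a saddle point if and only if
\[ G(x) \ge G(\hat x) + \langle -K^*\hat y, x - \hat x\rangle \quad \text{for all } x \in X, \]
\[ F^*(y) \ge F^*(\hat y) + \langle K\hat x, y - \hat y\rangle \quad \text{for all } y \in Y, \]
which is equivalent to
\[ K\hat x \in \partial F^*(\hat y), \qquad -(K^*\hat y) \in \partial G(\hat x), \tag{2.3} \]
where ∂F∗ and ∂G denote the subdifferentials of F∗ and G respectively. Recall that, for
a convex function f : X → (−∞,∞], its subdifferential ∂f(x) at x ∈ X is defined by
\[ \partial f(x) := \{\xi \in X : f(\bar x) \ge f(x) + \langle \xi, \bar x - x\rangle \ \ \forall \bar x \in X\}. \]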
In this chapter we assume that (2.2) admits a saddle point and we will introduce the
2.1 The primal-dual algorithm
We first recall that the classical Uzawa algorithm [1] builds approximate solutions
iteratively by first updating y via a proximal maximization problem and then updating x
via a proximal minimization problem. To be more precise, the Uzawa algorithm takes
the form
\[ y_{n+1} = \arg\max_{y\in Y}\Big\{ L(x_n, y) - \frac{1}{2\sigma}\|y - y_n\|^2 \Big\}, \]
\[ x_{n+1} = \arg\min_{x\in X}\Big\{ L(x, y_{n+1}) + \frac{1}{2\tau}\|x - x_n\|^2 \Big\}, \]
which, according to the definition of L(x, y), can be written as
\[ y_{n+1} = \arg\max_{y\in Y}\Big\{ \langle Kx_n, y\rangle - F^*(y) - \frac{1}{2\sigma}\|y - y_n\|^2 \Big\}, \]
\[ x_{n+1} = \arg\min_{x\in X}\Big\{ G(x) + \langle Kx, y_{n+1}\rangle + \frac{1}{2\tau}\|x - x_n\|^2 \Big\}, \]
where σ and τ are suitably chosen positive numbers. This Uzawa algorithm has been
used in [12] to solve the total variation regularization problem, and the resulting method
is called the primal-dual hybrid gradient method.
In [4] Chambolle and Pock revisited the Uzawa algorithm and proposed a class of
first-order primal-dual algorithms by introducing extrapolation steps. Their algorithms take
the following form.
Algorithm 1
• Choose τ, σ > 0, θ ∈ [0,1], $(x_0, y_0) \in X \times Y$ and set $\bar x_0 = x_0$.
• Iterations (n ≥ 0): Update $x_n, y_n, \bar x_n$ as follows:
\[ y_{n+1} = \arg\max_{y\in Y}\Big\{ \langle K\bar x_n, y\rangle - F^*(y) - \frac{1}{2\sigma}\|y - y_n\|^2 \Big\}, \]
\[ x_{n+1} = \arg\min_{x\in X}\Big\{ G(x) + \langle Kx, y_{n+1}\rangle + \frac{1}{2\tau}\|x - x_n\|^2 \Big\}, \]
\[ \bar x_{n+1} = x_{n+1} + \theta(x_{n+1} - x_n). \]
We note that Algorithm 1 with θ = 0 becomes the Uzawa algorithm. In the following
we will give the convergence analysis of this algorithm for θ = 1.
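The iteration of Algorithm 1 can be sketched on a small one-dimensional ROF problem. Everything specific in this sketch is an assumption made for illustration: K is the forward-difference operator on a signal, G(x) = (λ/2)‖x − g‖², F∗ is the indicator of the box [−1, 1]^(n−1), and the step sizes τ = σ = 0.45 are chosen so that τσ‖K‖² < 1, using the bound ‖K‖² ≤ 4 for forward differences. With these choices both subproblems have closed forms: the dual step is a projection onto the box and the primal step is a weighted average.

```python
def forward_diff(x):
    """K: forward differences, mapping R^n to R^(n-1)."""
    return [x[i + 1] - x[i] for i in range(len(x) - 1)]


def forward_diff_adjoint(y, n):
    """K^*: adjoint of forward_diff, mapping R^(n-1) to R^n."""
    out = [0.0] * n
    for i, v in enumerate(y):
        out[i] -= v
        out[i + 1] += v
    return out


def rof_energy(x, g, lam):
    """Discrete ROF energy: ||Kx||_1 + (lam/2) ||x - g||^2."""
    tv = sum(abs(d) for d in forward_diff(x))
    return tv + 0.5 * lam * sum((a - b) ** 2 for a, b in zip(x, g))


def primal_dual_rof_1d(g, lam, tau=0.45, sigma=0.45, theta=1.0, iterations=500):
    """Sketch of Algorithm 1 for the 1D ROF problem (illustrative choices)."""
    n = len(g)
    x = list(g)
    x_bar = list(g)
    y = [0.0] * (n - 1)
    for _ in range(iterations):
        # Dual step: proximal maximisation in y, here a projection onto [-1, 1].
        kx_bar = forward_diff(x_bar)
        y = [max(-1.0, min(1.0, yi + sigma * ki)) for yi, ki in zip(y, kx_bar)]
        # Primal step: proximal minimisation in x, available in closed form.
        kty = forward_diff_adjoint(y, n)
        x_new = [(xi - tau * ki + tau * lam * gi) / (1.0 + tau * lam)
                 for xi, ki, gi in zip(x, kty, g)]
        # Extrapolation step with parameter theta.
        x_bar = [xn + theta * (xn - xo) for xn, xo in zip(x_new, x)]
        x = x_new
    return x
```

For example, `primal_dual_rof_1d([0.1, -0.1, 0.1, 0.9, 1.1, 0.9], lam=1.0)` returns a signal whose ROF energy is lower than that of the noisy input. Setting `theta=0.0` recovers the iteration without extrapolation described above.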
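Theorem 2.1. Let L = ‖K‖ and assume that τσL² < 1. Then for the sequence $(x_n, y_n)$
defined by Algorithm 1 with θ = 1 the following statements hold.
(a) For any n ∈ N there holds
\[ \frac{\|y_n - \hat y\|^2}{2\sigma} + \frac{\|x_n - \hat x\|^2}{2\tau} \le C\left( \frac{\|y_0 - \hat y\|^2}{2\sigma} + \frac{\|x_0 - \hat x\|^2}{2\tau} \right), \tag{2.4} \]
where $C := (1 - \tau\sigma L^2)^{-1}$;
(b) Let $\hat x_N = \frac{1}{N}\sum_{n=1}^N x_n$ and $\hat y_N = \frac{1}{N}\sum_{n=1}^N y_n$. Then the weak cluster points of $(\hat x_N, \hat y_N)$
are saddle points of (2.2).
(c) If both X and Y have finite dimensions, then there exists a saddle point $(x^*, y^*)$
of (2.2) such that $(x_n, y_n) \to (x^*, y^*)$ as n → ∞.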
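Proof. From the definition of $y_{n+1}$ and $x_{n+1}$ and the first order optimality conditions, we
can deduce that
\[ \frac{y_n - y_{n+1}}{\sigma} + K\bar x_n \in \partial F^*(y_{n+1}), \qquad \frac{x_n - x_{n+1}}{\tau} - K^* y_{n+1} \in \partial G(x_{n+1}). \tag{2.5} \]
Therefore, by the definition of the subdifferential, we have for any (x, y) ∈ X × Y that
\[ F^*(y) \ge F^*(y_{n+1}) + \Big\langle \frac{y_n - y_{n+1}}{\sigma} + K\bar x_n,\; y - y_{n+1} \Big\rangle, \tag{2.6} \]
\[ G(x) \ge G(x_{n+1}) + \Big\langle \frac{x_n - x_{n+1}}{\tau} - K^* y_{n+1},\; x - x_{n+1} \Big\rangle. \tag{2.7} \]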
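Using the identity $2\langle a - b, a\rangle = \|a\|^2 - \|b\|^2 + \|a - b\|^2$, we can deduce
\[ \frac{1}{\sigma}\langle y_n - y_{n+1}, y - y_{n+1}\rangle = \frac{1}{2\sigma}\big( \|y_n - y_{n+1}\|^2 - \|y_n - y\|^2 + \|y - y_{n+1}\|^2 \big), \tag{2.8} \]
\[ \frac{1}{\tau}\langle x_n - x_{n+1}, x - x_{n+1}\rangle = \frac{1}{2\tau}\big( \|x_n - x_{n+1}\|^2 - \|x_n - x\|^2 + \|x - x_{n+1}\|^2 \big). \tag{2.9} \]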
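Summing (2.6) and (2.7), and using (2.8) and (2.9) we have
\[
\begin{aligned}
\frac{\|y - y_n\|^2}{2\sigma} + \frac{\|x - x_n\|^2}{2\tau}
&\ge \frac{\|y - y_{n+1}\|^2}{2\sigma} + \frac{\|x - x_{n+1}\|^2}{2\tau} + \frac{\|y_n - y_{n+1}\|^2}{2\sigma} + \frac{\|x_n - x_{n+1}\|^2}{2\tau} \\
&\quad + [G(x_{n+1}) - F^*(y)] - [G(x) - F^*(y_{n+1})] \\
&\quad + \langle K\bar x_n, y - y_{n+1}\rangle - \langle K^* y_{n+1}, x - x_{n+1}\rangle \\
&= \frac{\|y - y_{n+1}\|^2}{2\sigma} + \frac{\|x - x_{n+1}\|^2}{2\tau} + \frac{\|y_n - y_{n+1}\|^2}{2\sigma} + \frac{\|x_n - x_{n+1}\|^2}{2\tau} \\
&\quad + L(x_{n+1}, y) - L(x, y_{n+1}) + \langle K(x_{n+1} - \bar x_n), y_{n+1} - y\rangle.
\end{aligned} \tag{2.10}
\]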
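By using $\bar x_n = 2x_n - x_{n-1}$, we can see that
\[
\begin{aligned}
\langle K(x_{n+1} - \bar x_n), y_{n+1} - y\rangle
&= \langle K((x_{n+1} - x_n) - (x_n - x_{n-1})), y_{n+1} - y\rangle \\
&= \langle K(x_{n+1} - x_n), y_{n+1} - y\rangle - \langle K(x_n - x_{n-1}), y_n - y\rangle \\
&\quad - \langle K(x_n - x_{n-1}), y_{n+1} - y_n\rangle \\
&\ge \langle K(x_{n+1} - x_n), y_{n+1} - y\rangle - \langle K(x_n - x_{n-1}), y_n - y\rangle \\
&\quad - \|K\|\,\|x_n - x_{n-1}\|\,\|y_{n+1} - y_n\|.
\end{aligned} \tag{2.11}
\]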
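Combining (2.10) and (2.11), we can deduce for all (x, y) ∈ X × Y that
\[
\begin{aligned}
\frac{\|y - y_n\|^2}{2\sigma} + \frac{\|x - x_n\|^2}{2\tau}
&\ge \frac{\|y - y_{n+1}\|^2}{2\sigma} + \frac{\|x - x_{n+1}\|^2}{2\tau} + \frac{\|y_n - y_{n+1}\|^2}{2\sigma} + \frac{\|x_n - x_{n+1}\|^2}{2\tau} \\
&\quad + L(x_{n+1}, y) - L(x, y_{n+1}) - \|K\|\,\|x_n - x_{n-1}\|\,\|y_{n+1} - y_n\| \\
&\quad + \langle K(x_{n+1} - x_n), y_{n+1} - y\rangle - \langle K(x_n - x_{n-1}), y_n - y\rangle.
\end{aligned}
\]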
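By using the elementary inequality $2ab \le \alpha a^2 + b^2/\alpha$, we have
\[ \|K\|\,\|x_n - x_{n-1}\|\,\|y_{n+1} - y_n\| \le \frac{\alpha\|K\|}{2}\|x_n - x_{n-1}\|^2 + \frac{\|K\|}{2\alpha}\|y_{n+1} - y_n\|^2 \tag{2.12} \]
for any α > 0. By taking $\alpha = \sqrt{\sigma/\tau}$ in (2.12) we therefore have
\[
\begin{aligned}
\frac{\|y - y_n\|^2}{2\sigma} + \frac{\|x - x_n\|^2}{2\tau}
&\ge \frac{\|y - y_{n+1}\|^2}{2\sigma} + \frac{\|x - x_{n+1}\|^2}{2\tau} + (1 - \sqrt{\sigma\tau}\|K\|)\frac{\|y_n - y_{n+1}\|^2}{2\sigma} \\
&\quad + \frac{\|x_n - x_{n+1}\|^2}{2\tau} - \sqrt{\sigma\tau}\|K\|\frac{\|x_{n-1} - x_n\|^2}{2\tau} + L(x_{n+1}, y) - L(x, y_{n+1}) \\
&\quad + \langle K(x_{n+1} - x_n), y_{n+1} - y\rangle - \langle K(x_n - x_{n-1}), y_n - y\rangle.
\end{aligned} \tag{2.13}
\]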
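Summing (2.13) over n from n = 0 to n = N − 1 it follows that
\[
\begin{aligned}
&\sum_{n=1}^{N} \big( L(x_n, y) - L(x, y_n) \big) + \frac{\|y - y_N\|^2}{2\sigma} + \frac{\|x - x_N\|^2}{2\tau} + \frac{\|x_N - x_{N-1}\|^2}{2\tau} \\
&\quad + (1 - \sqrt{\sigma\tau}\|K\|)\sum_{n=1}^{N} \frac{\|y_n - y_{n-1}\|^2}{2\sigma} + (1 - \sqrt{\sigma\tau}\|K\|)\sum_{n=1}^{N-1} \frac{\|x_n - x_{n-1}\|^2}{2\tau} \\
&\quad + \langle K(x_N - x_{N-1}), y_N - y\rangle \\
&\le \frac{\|y - y_0\|^2}{2\sigma} + \frac{\|x - x_0\|^2}{2\tau}
\end{aligned}
\]
for all (x, y) ∈ X × Y, where $x_{-1} = x_0$. By using again the Cauchy-Schwarz inequality
we have
\[ |\langle K(x_N - x_{N-1}), y_N - y\rangle| \le \frac{\|x_N - x_{N-1}\|^2}{2\tau} + \frac{\tau\|K\|^2}{2}\|y - y_N\|^2. \]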
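Consequently
\[
\begin{aligned}
&\sum_{n=1}^{N} \big( L(x_n, y) - L(x, y_n) \big) + (1 - \sigma\tau\|K\|^2)\frac{\|y - y_N\|^2}{2\sigma} + \frac{\|x - x_N\|^2}{2\tau} \\
&\quad + (1 - \sqrt{\sigma\tau}\|K\|)\sum_{n=1}^{N} \frac{\|y_n - y_{n-1}\|^2}{2\sigma} + (1 - \sqrt{\sigma\tau}\|K\|)\sum_{n=1}^{N-1} \frac{\|x_n - x_{n-1}\|^2}{2\tau} \\
&\le \frac{\|y - y_0\|^2}{2\sigma} + \frac{\|x - x_0\|^2}{2\tau}.
\end{aligned} \tag{2.14}
\]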
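Note that for a saddle point $(\hat x, \hat y)$ we have $L(x_n, \hat y) \ge L(\hat x, \hat y) \ge L(\hat x, y_n)$. By setting
$(x, y) = (\hat x, \hat y)$ in (2.14) and using $\sigma\tau\|K\|^2 < 1$, we can derive that
\[ \frac{\|y_N - \hat y\|^2}{2\sigma} + \frac{\|x_N - \hat x\|^2}{2\tau} \le C\left( \frac{\|y_0 - \hat y\|^2}{2\sigma} + \frac{\|x_0 - \hat x\|^2}{2\tau} \right) \]
and
\[ (1 - \sqrt{\sigma\tau}\|K\|)\sum_{n=1}^{N-1}\left( \frac{\|y_n - y_{n-1}\|^2}{2\sigma} + \frac{\|x_n - x_{n-1}\|^2}{2\tau} \right) \le \frac{\|y_0 - \hat y\|^2}{2\sigma} + \frac{\|x_0 - \hat x\|^2}{2\tau} \tag{2.15} \]
for all N, where $C := (1 - \tau\sigma L^2)^{-1}$. This completes the proof of (a).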
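Next we prove (b). Since L(·, y) is convex and L(x, ·) is concave, from (2.14) we can
deduce for $\hat x_N = \frac{1}{N}\sum_{n=1}^N x_n$ and $\hat y_N = \frac{1}{N}\sum_{n=1}^N y_n$
that
\[ L(\hat x_N, y) - L(x, \hat y_N) \le \frac{1}{N}\sum_{n=1}^{N} \big( L(x_n, y) - L(x, y_n) \big) \le \frac{1}{N}\left( \frac{\|y - y_0\|^2}{2\sigma} + \frac{\|x - x_0\|^2}{2\tau} \right) \tag{2.16} \]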
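for any (x, y) ∈ X × Y. Let $(x^*, y^*)$ denote a weak cluster point of $(\hat x_N, \hat y_N)$. Then there
is a subsequence $(\hat x_{N_j}, \hat y_{N_j})$ such that $\hat x_{N_j} \rightharpoonup x^*$ and $\hat y_{N_j} \rightharpoonup y^*$ as j → ∞, where “⇀”
denotes weak convergence. We have
\[ \langle \hat x_{N_j}, K^* y\rangle \to \langle x^*, K^* y\rangle \quad \text{and} \quad \langle Kx, \hat y_{N_j}\rangle \to \langle Kx, y^*\rangle. \]
Since F∗ and G are convex and lower semi-continuous, they are weakly lower
semi-continuous. Thus
\[ G(x^*) \le \liminf_{j\to\infty} G(\hat x_{N_j}) \quad \text{and} \quad F^*(y^*) \le \liminf_{j\to\infty} F^*(\hat y_{N_j}). \]
By setting $N = N_j$ in (2.16) and taking j → ∞, we can deduce that
\[ L(x^*, y) - L(x, y^*) \le \liminf_{j\to\infty} \big( L(\hat x_{N_j}, y) - L(x, \hat y_{N_j}) \big) \le 0 \tag{2.17} \]
and thus $L(x^*, y) \le L(x^*, y^*) \le L(x, y^*)$ for all (x, y) ∈ X × Y. Therefore $(x^*, y^*)$ is a
saddle point of (2.2).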
Finally we prove (c). From (a) we know that $(x_n, y_n)$ is a bounded sequence. Since both
X and Y have finite dimension, there must exist a convergent subsequence $(x_{n_k}, y_{n_k})$
whose limit is denoted by $(x^*, y^*)$. From (2.15) we have $x_n - x_{n-1} \to 0$ and $y_n - y_{n-1} \to 0$
as n → ∞. Therefore, $\bar x_{n_k} \to x^*$, $y_{n_k+1} \to y^*$ and $x_{n_k+1} \to x^*$ as k → ∞. By
setting $n = n_k$ in (2.6) and (2.7), letting k → ∞, and using the lower semi-continuity of
F∗ and G, we can obtain
\[ F^*(y) \ge F^*(y^*) + \langle Kx^*, y - y^*\rangle, \qquad G(x) \ge G(x^*) - \langle K^* y^*, x - x^*\rangle. \]
Consequently $L(x^*, y^*) - L(x^*, y) \ge 0$ and $L(x, y^*) - L(x^*, y^*) \ge 0$, i.e. $L(x^*, y) \le
L(x^*, y^*) \le L(x, y^*)$ for all (x, y) ∈ X × Y. Therefore $(x^*, y^*)$ is a saddle point of (2.2).
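Now we take $(x, y) = (x^*, y^*)$ in (2.13). For any $N > n_k$ we sum (2.13) over n from
$n = n_k$ to $n = N - 1$ to obtain
\[
\begin{aligned}
&\frac{\|y^* - y_N\|^2}{2\sigma} + \frac{\|x^* - x_N\|^2}{2\tau} - \frac{\|x_{n_k} - x_{n_k-1}\|^2}{2\tau} \\
&\quad + (1 - \sqrt{\sigma\tau}\|K\|)\sum_{n=n_k+1}^{N} \frac{\|y_n - y_{n-1}\|^2}{2\sigma} + (1 - \sqrt{\sigma\tau}\|K\|)\sum_{n=n_k}^{N-1} \frac{\|x_n - x_{n-1}\|^2}{2\tau} \\
&\quad + \langle K(x_N - x_{N-1}), y_N - y^*\rangle - \langle K(x_{n_k} - x_{n_k-1}), y_{n_k} - y^*\rangle \\
&\le \frac{\|y^* - y_{n_k}\|^2}{2\sigma} + \frac{\|x^* - x_{n_k}\|^2}{2\tau}.
\end{aligned}
\]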
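Using $x_N - x_{N-1} \to 0$ and $\sigma\tau\|K\|^2 < 1$ we can deduce that
\[
\begin{aligned}
\limsup_{N\to\infty}\left( \frac{\|y^* - y_N\|^2}{2\sigma} + \frac{\|x^* - x_N\|^2}{2\tau} \right)
&\le \frac{\|x_{n_k} - x_{n_k-1}\|^2}{2\tau} + \langle K(x_{n_k} - x_{n_k-1}), y_{n_k} - y^*\rangle \\
&\quad + \frac{\|y^* - y_{n_k}\|^2}{2\sigma} + \frac{\|x^* - x_{n_k}\|^2}{2\tau}
\end{aligned}
\]
for all k. By taking k → ∞ and using $x_{n_k} - x_{n_k-1} \to 0$, $x_{n_k} \to x^*$ and $y_{n_k} \to y^*$ as k → ∞,
we can obtain
\[ \limsup_{N\to\infty}\left( \frac{\|y^* - y_N\|^2}{2\sigma} + \frac{\|x^* - x_N\|^2}{2\tau} \right) \le 0. \]
Therefore $x_N \to x^*$ and $y_N \to y^*$ as N → ∞.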
2.2 Acceleration: G is strongly convex
In this section we will present an accelerated version of Algorithm 1 when G is strongly
convex in the sense that there is a constant γ > 0 such that
\[ G(tx_1 + (1-t)x_2) + \frac{\gamma}{2}t(1-t)\|x_1 - x_2\|^2 \le tG(x_1) + (1-t)G(x_2) \tag{2.18} \]
for all $x_1, x_2 \in X$ and 0 ≤ t ≤ 1. By using variable steps and variable relaxation
parameters, the accelerated algorithm takes the following form.
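Algorithm 2
Initialization: Choose $\tau_0, \sigma_0 > 0$ with $\tau_0\sigma_0\|K\|^2 \le 1$, $(x_0, y_0) \in X \times Y$ and set $\bar x_0 = x_0$;
Iterations (n ≥ 0): Update $x_n, y_n, \bar x_n, \theta_n, \tau_n, \sigma_n$ as follows:
\[ y_{n+1} = \arg\max_{y\in Y}\Big\{ \langle K\bar x_n, y\rangle - F^*(y) - \frac{1}{2\sigma_n}\|y - y_n\|^2 \Big\}, \]
\[ x_{n+1} = \arg\min_{x\in X}\Big\{ G(x) + \langle Kx, y_{n+1}\rangle + \frac{1}{2\tau_n}\|x - x_n\|^2 \Big\}, \]
\[ \theta_n = 1/\sqrt{1 + 2\gamma\tau_n}, \qquad \tau_{n+1} = \theta_n\tau_n, \qquad \sigma_{n+1} = \sigma_n/\theta_n, \]
\[ \bar x_{n+1} = x_{n+1} + \theta_n(x_{n+1} - x_n). \]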
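The step-size update of Algorithm 2 can be iterated on its own to see how the parameters evolve. The sketch below only implements the three update formulas for θ_n, τ_n and σ_n; the function name `step_size_schedule` and the sample values are illustrative assumptions.

```python
import math


def step_size_schedule(tau0, sigma0, gamma, iterations):
    """Iterate theta_n = 1/sqrt(1 + 2*gamma*tau_n), tau_{n+1} = theta_n*tau_n,
    sigma_{n+1} = sigma_n/theta_n and return the three sequences."""
    taus, sigmas, thetas = [tau0], [sigma0], []
    tau, sigma = tau0, sigma0
    for _ in range(iterations):
        theta = 1.0 / math.sqrt(1.0 + 2.0 * gamma * tau)
        tau, sigma = theta * tau, sigma / theta
        thetas.append(theta)
        taus.append(tau)
        sigmas.append(sigma)
    return taus, sigmas, thetas


taus, sigmas, thetas = step_size_schedule(tau0=0.5, sigma0=0.5, gamma=1.0, iterations=1000)
products = [t * s for t, s in zip(taus, sigmas)]  # tau_n * sigma_n stays constant
```

Two features are visible numerically: the product τ_n σ_n never changes, so the condition τ_0 σ_0 ‖K‖² ≤ 1 is preserved along the iteration, and 1/τ_n grows roughly linearly in n, which is the O(1/N) decay of τ_N behind the O(1/N²) rate.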
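In the following we will show that Algorithm 2 has the convergence rate O(1/N^2). We
will use $(\hat x, \hat y)$ to denote any saddle point of (2.2). Since G is strongly convex, we can
show that [3]
\[ G(\bar x) \ge G(x) + \langle \xi, \bar x - x\rangle + \frac{\gamma}{2}\|\bar x - x\|^2 \tag{2.19} \]
for all ξ ∈ ∂G(x) and $\bar x \in X$. According to the definition of $x_{n+1}$ we have
$\frac{x_n - x_{n+1}}{\tau_n} - K^* y_{n+1} \in \partial G(x_{n+1})$. Therefore
\[ G(x) \ge G(x_{n+1}) + \Big\langle \frac{x_n - x_{n+1}}{\tau_n} - K^* y_{n+1},\; x - x_{n+1} \Big\rangle + \frac{\gamma}{2}\|x - x_{n+1}\|^2 \]
for all x ∈ X. By using the definition of $y_{n+1}$ we can also derive that
\[ F^*(y) \ge F^*(y_{n+1}) + \Big\langle \frac{y_n - y_{n+1}}{\sigma_n} + K\bar x_n,\; y - y_{n+1} \Big\rangle, \quad \forall y \in Y. \]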
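With the help of the above two inequalities, we can use an argument similar to the one for deriving
(2.10) to obtain
\[
\begin{aligned}
\frac{\|\hat y - y_n\|^2}{2\sigma_n} + \frac{\|\hat x - x_n\|^2}{2\tau_n}
&\ge \frac{\|\hat y - y_{n+1}\|^2}{2\sigma_n} + \frac{\|\hat x - x_{n+1}\|^2}{2\tau_n} + \frac{\|y_n - y_{n+1}\|^2}{2\sigma_n} + \frac{\|x_n - x_{n+1}\|^2}{2\tau_n} \\
&\quad + L(x_{n+1}, \hat y) - L(\hat x, y_{n+1}) + \frac{\gamma}{2}\|\hat x - x_{n+1}\|^2 \\
&\quad + \langle K(x_{n+1} - \bar x_n), y_{n+1} - \hat y\rangle.
\end{aligned}
\]
Since $(\hat x, \hat y)$ is a saddle point of (2.2), we may use (2.3), the strong convexity of G and
the convexity of F∗ to derive that
\[
\begin{aligned}
L(x_{n+1}, \hat y) - L(\hat x, y_{n+1}) &= [G(x_{n+1}) - G(\hat x) - \langle -K^*\hat y, x_{n+1} - \hat x\rangle] \\
&\quad + [F^*(y_{n+1}) - F^*(\hat y) - \langle K\hat x, y_{n+1} - \hat y\rangle] \\
&\ge \frac{\gamma}{2}\|x_{n+1} - \hat x\|^2.
\end{aligned}
\]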
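Therefore
\[
\begin{aligned}
\frac{\|\hat y - y_n\|^2}{2\sigma_n} + \frac{\|\hat x - x_n\|^2}{2\tau_n}
&\ge \frac{\|\hat y - y_{n+1}\|^2}{2\sigma_n} + \frac{\|\hat x - x_{n+1}\|^2}{2\tau_n} + \frac{\|y_n - y_{n+1}\|^2}{2\sigma_n} + \frac{\|x_n - x_{n+1}\|^2}{2\tau_n} \\
&\quad + \gamma\|\hat x - x_{n+1}\|^2 + \langle K(x_{n+1} - \bar x_n), y_{n+1} - \hat y\rangle.
\end{aligned}
\]
Recalling that $\bar x_n = x_n + \theta_{n-1}(x_n - x_{n-1})$, we can further obtain
\[
\begin{aligned}
\frac{\|\hat y - y_n\|^2}{2\sigma_n} + \frac{\|\hat x - x_n\|^2}{2\tau_n}
&\ge \gamma\|\hat x - x_{n+1}\|^2 + \frac{\|\hat y - y_{n+1}\|^2}{2\sigma_n} + \frac{\|\hat x - x_{n+1}\|^2}{2\tau_n} \\
&\quad + \frac{\|y_n - y_{n+1}\|^2}{2\sigma_n} + \frac{\|x_n - x_{n+1}\|^2}{2\tau_n} \\
&\quad + \langle K(x_{n+1} - x_n), y_{n+1} - \hat y\rangle - \theta_{n-1}\langle K(x_n - x_{n-1}), y_n - \hat y\rangle \\
&\quad - \theta_{n-1}\|K\|\,\|x_n - x_{n-1}\|\,\|y_{n+1} - y_n\|.
\end{aligned} \tag{2.21}
\]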
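By the Cauchy-Schwarz inequality we have
\[ \theta_{n-1}\|K\|\,\|x_n - x_{n-1}\|\,\|y_{n+1} - y_n\| \le \frac{\|y_n - y_{n+1}\|^2}{2\sigma_n} + \frac{\theta_{n-1}^2\|K\|^2\sigma_n}{2}\|x_n - x_{n-1}\|^2. \]
Combining this with (2.21) it follows that
\[
\begin{aligned}
\frac{\|\hat y - y_n\|^2}{\sigma_n} + \frac{\|\hat x - x_n\|^2}{\tau_n}
&\ge (1 + 2\gamma\tau_n)\frac{\tau_{n+1}}{\tau_n}\frac{\|\hat x - x_{n+1}\|^2}{\tau_{n+1}} + \frac{\sigma_{n+1}}{\sigma_n}\frac{\|\hat y - y_{n+1}\|^2}{\sigma_{n+1}} + \frac{\|x_n - x_{n+1}\|^2}{\tau_n} \\
&\quad + 2\langle K(x_{n+1} - x_n), y_{n+1} - \hat y\rangle - 2\theta_{n-1}\langle K(x_n - x_{n-1}), y_n - \hat y\rangle \\
&\quad - \theta_{n-1}^2\|K\|^2\sigma_n\tau_{n-1}\frac{\|x_n - x_{n-1}\|^2}{\tau_{n-1}}.
\end{aligned}
\]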
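By the definition of $\theta_n$, $\tau_n$ and $\sigma_n$ we can see that
\[ (1 + 2\gamma\tau_n)\frac{\tau_{n+1}}{\tau_n} = \frac{1}{\theta_n} = \frac{\sigma_{n+1}}{\sigma_n} = \frac{\tau_n}{\tau_{n+1}}, \qquad \sigma_n\tau_n = \sigma_{n-1}\tau_{n-1} = \cdots = \sigma_0\tau_0 \]
and
\[ \theta_{n-1}^2\|K\|^2\sigma_n\tau_{n-1} = \theta_{n-1}^2\frac{\tau_{n-1}}{\tau_n}\|K\|^2\sigma_0\tau_0 \le \theta_{n-1}^2\frac{\tau_{n-1}}{\tau_n} = \frac{\tau_n}{\tau_{n-1}}. \]
Therefore
\[
\begin{aligned}
\frac{\|\hat y - y_n\|^2}{\sigma_n} + \frac{\|\hat x - x_n\|^2}{\tau_n}
&\ge \frac{\tau_n}{\tau_{n+1}}\left( \frac{\|\hat x - x_{n+1}\|^2}{\tau_{n+1}} + \frac{\|\hat y - y_{n+1}\|^2}{\sigma_{n+1}} \right) \\
&\quad + \frac{\|x_n - x_{n+1}\|^2}{\tau_n} - \frac{\tau_n}{\tau_{n-1}}\frac{\|x_n - x_{n-1}\|^2}{\tau_{n-1}} \\
&\quad + 2\langle K(x_{n+1} - x_n), y_{n+1} - \hat y\rangle - 2\theta_{n-1}\langle K(x_n - x_{n-1}), y_n - \hat y\rangle.
\end{aligned}
\]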
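By dividing both sides by $\tau_n$ and noting the fact that $\theta_{n-1}/\tau_n = 1/\tau_{n-1}$, we can deduce
that
\[
\begin{aligned}
\frac{\Delta_n}{\tau_n} &\ge \frac{\Delta_{n+1}}{\tau_{n+1}} + \frac{\|x_n - x_{n+1}\|^2}{\tau_n^2} - \frac{\|x_n - x_{n-1}\|^2}{\tau_{n-1}^2} \\
&\quad + \frac{2}{\tau_n}\langle K(x_{n+1} - x_n), y_{n+1} - \hat y\rangle - \frac{2}{\tau_{n-1}}\langle K(x_n - x_{n-1}), y_n - \hat y\rangle,
\end{aligned} \tag{2.22}
\]
where
\[ \Delta_n = \frac{\|\hat y - y_n\|^2}{\sigma_n} + \frac{\|\hat x - x_n\|^2}{\tau_n}. \]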
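Summing the inequality (2.22) over n from n = 0 to n = N − 1 and using $x_{-1} = x_0$ we
deduce that
\[
\begin{aligned}
\frac{\Delta_0}{\tau_0} &\ge \frac{\Delta_N}{\tau_N} + \frac{\|x_{N-1} - x_N\|^2}{\tau_{N-1}^2} + \frac{2}{\tau_{N-1}}\langle K(x_N - x_{N-1}), y_N - \hat y\rangle \\
&\ge \frac{\Delta_N}{\tau_N} + \frac{\|x_{N-1} - x_N\|^2}{\tau_{N-1}^2} - \frac{\|x_{N-1} - x_N\|^2}{\tau_{N-1}^2} - \|K\|^2\|y_N - \hat y\|^2 \\
&= \frac{1}{\tau_N}\left[ (1 - \sigma_N\tau_N\|K\|^2)\,\tau_N\frac{\|y_N - \hat y\|^2}{\sigma_N\tau_N} + \frac{\|x_N - \hat x\|^2}{\tau_N} \right] \\
&= \frac{1}{\tau_N}\left[ \frac{1 - \sigma_0\tau_0\|K\|^2}{\sigma_0\tau_0}\,\tau_N\|y_N - \hat y\|^2 + \frac{\|x_N - \hat x\|^2}{\tau_N} \right],
\end{aligned}
\]
which implies
\[ \frac{1 - \sigma_0\tau_0\|K\|^2}{\sigma_0\tau_0}\,\tau_N^2\|\hat y - y_N\|^2 + \|\hat x - x_N\|^2 \le \frac{\Delta_0}{\tau_0}\tau_N^2. \]
This in particular shows that
\[ \|\hat x - x_N\|^2 \le \frac{\Delta_0}{\tau_0}\tau_N^2. \tag{2.23} \]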
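We next show that $\tau_N = O(1/N)$. Since $\theta_n = 1/\sqrt{1 + 2\gamma\tau_n}$ and $\tau_{n+1} = \theta_n\tau_n$, we have
\[ \tau_{n+1} = \frac{\tau_n}{\sqrt{1 + 2\gamma\tau_n}}. \]
Therefore
\[ \frac{1}{\tau_{n+1}} = \sqrt{\frac{1}{\tau_n^2} + \frac{2\gamma}{\tau_n}}. \]
This clearly shows that $1/\tau_{n+1} \ge 1/\tau_n$ and thus $\tau_{n+1} \le \tau_n \le \tau_0$. Moreover
\[
\begin{aligned}
\frac{1}{\tau_{n+1}} &= \sqrt{\Big(\frac{1}{\tau_n} + \gamma\Big)^2 - \gamma^2} = \Big(\frac{1}{\tau_n} + \gamma\Big)\sqrt{1 - \frac{\gamma^2}{(1/\tau_n + \gamma)^2}} \\
&\ge \Big(\frac{1}{\tau_n} + \gamma\Big)\Big(1 - \frac{\gamma^2}{(1/\tau_n + \gamma)^2}\Big) = \frac{1}{\tau_n} + \gamma - \frac{\gamma^2}{1/\tau_n + \gamma} \\
&= \frac{1}{\tau_n} + \frac{\gamma}{1 + \gamma\tau_n} \ge \frac{1}{\tau_n} + \frac{\gamma}{1 + \gamma\tau_0}.
\end{aligned}
\]
Consequently
\[ \frac{1}{\tau_N} \ge \frac{1}{\tau_0} + \frac{N\gamma}{1 + \gamma\tau_0}, \]
which in particular implies that $\tau_N = O(1/N)$. Combining this with (2.23) we therefore
obtain the following convergence rate result.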
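Theorem 2.2. Assume that F∗ is convex and G is strongly convex. Let $(\hat x, \hat y) \in X \times Y$ be a saddle point of (2.2). For Algorithm 2 there holds
\[ \|\hat x - x_N\|^2 \le \frac{C}{N^2}\left( \frac{\|\hat x - x_0\|^2}{\tau_0^2} + \frac{\|\hat y - y_0\|^2}{\tau_0\sigma_0} \right), \]
where C > 0 is a constant independent of N.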
2.3 Acceleration: G and F∗ are both strongly convex
In this section we will show that the primal-dual algorithm converges linearly when both
F∗ and G are strongly convex. We assume that G is strongly convex in the sense of
(2.18) and we also assume that F∗ is strongly convex in the sense that there is δ > 0
such that
\[ F^*(ty_1 + (1-t)y_2) + \frac{\delta}{2}t(1-t)\|y_1 - y_2\|^2 \le tF^*(y_1) + (1-t)F^*(y_2) \]
for all $y_1, y_2 \in Y$ and 0 ≤ t ≤ 1. The primal-dual algorithm is then modified into the
following form.
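Algorithm 3
Initialization: Choose $\mu = 2\sqrt{\gamma\delta}/\|K\|$, $\tau = \mu/(2\gamma)$, $\sigma = \mu/(2\delta)$, $\theta \in [1/(1+\mu), 1]$, $(x_0, y_0) \in X \times Y$ and set $\bar x_0 = x_0$;
Iterations (n ≥ 0): Update $x_n, y_n, \bar x_n$ as follows:
\[ y_{n+1} = \arg\max_{y\in Y}\Big\{ \langle K\bar x_n, y\rangle - F^*(y) - \frac{1}{2\sigma}\|y - y_n\|^2 \Big\}, \]
\[ x_{n+1} = \arg\min_{x\in X}\Big\{ G(x) + \langle Kx, y_{n+1}\rangle + \frac{1}{2\tau}\|x - x_n\|^2 \Big\}, \]
\[ \bar x_{n+1} = x_{n+1} + \theta(x_{n+1} - x_n). \]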
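Let $(\hat x, \hat y) \in X \times Y$ be the saddle point of (2.2). By the strong convexity of F∗ and G
together with the definition of $y_{n+1}$ and $x_{n+1}$, we can see that
\[ F^*(y) \ge F^*(y_{n+1}) + \Big\langle \frac{y_n - y_{n+1}}{\sigma} + K\bar x_n,\; y - y_{n+1} \Big\rangle + \frac{\delta}{2}\|y - y_{n+1}\|^2, \]
\[ G(x) \ge G(x_{n+1}) + \Big\langle \frac{x_n - x_{n+1}}{\tau} - K^* y_{n+1},\; x - x_{n+1} \Big\rangle + \frac{\gamma}{2}\|x - x_{n+1}\|^2 \]
and
\[ L(x, \hat y) - L(\hat x, y) \ge \frac{\gamma}{2}\|x - \hat x\|^2 + \frac{\delta}{2}\|y - \hat y\|^2 \]
for all x ∈ X and y ∈ Y. By virtue of these three inequalities, we may use the same
argument as for deriving (2.21) to obtain
\[
\begin{aligned}
\frac{\|\hat y - y_n\|^2}{2\sigma} + \frac{\|\hat x - x_n\|^2}{2\tau}
&\ge \Big(2\delta + \frac{1}{\sigma}\Big)\frac{\|\hat y - y_{n+1}\|^2}{2} + \Big(2\gamma + \frac{1}{\tau}\Big)\frac{\|\hat x - x_{n+1}\|^2}{2} \\
&\quad + \frac{\|y_n - y_{n+1}\|^2}{2\sigma} + \frac{\|x_n - x_{n+1}\|^2}{2\tau} + \langle K(x_{n+1} - \bar x_n), y_{n+1} - \hat y\rangle.
\end{aligned}
\]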
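By the choices of µ, σ and τ we have
\[ \tau = \frac{\mu}{2\gamma} = \frac{1}{\|K\|}\sqrt{\frac{\delta}{\gamma}}, \qquad \sigma = \frac{\mu}{2\delta} = \frac{1}{\|K\|}\sqrt{\frac{\gamma}{\delta}}, \tag{2.24} \]
\[ 2\delta + \frac{1}{\sigma} = 2\delta\Big(1 + \frac{1}{\mu}\Big), \qquad 2\gamma + \frac{1}{\tau} = 2\gamma\Big(1 + \frac{1}{\mu}\Big). \]
Therefore
\[
\begin{aligned}
\frac{1}{\mu}\big( \delta\|\hat y - y_n\|^2 + \gamma\|\hat x - x_n\|^2 \big)
&\ge \Big(1 + \frac{1}{\mu}\Big)\big( \delta\|\hat y - y_{n+1}\|^2 + \gamma\|\hat x - x_{n+1}\|^2 \big) \\
&\quad + \frac{1}{\mu}\big( \delta\|y_n - y_{n+1}\|^2 + \gamma\|x_n - x_{n+1}\|^2 \big) \\
&\quad + \langle K(x_{n+1} - \bar x_n), y_{n+1} - \hat y\rangle.
\end{aligned}
\]
Let
\[ \Delta_n := \delta\|\hat y - y_n\|^2 + \gamma\|\hat x - x_n\|^2. \]
Multiplying the above inequality by µ we obtain
\[ \Delta_n \ge (1 + \mu)\Delta_{n+1} + \delta\|y_n - y_{n+1}\|^2 + \gamma\|x_n - x_{n+1}\|^2 + \mu\langle K(x_{n+1} - \bar x_n), y_{n+1} - \hat y\rangle. \tag{2.25} \]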
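Recall that $\bar x_n = x_n + \theta(x_n - x_{n-1})$. For any 0 < ω ≤ θ and α > 0 we may use the
Cauchy-Schwarz inequality to deduce that
\[
\begin{aligned}
&\mu\langle K(x_{n+1} - \bar x_n), y_{n+1} - \hat y\rangle \\
&= \mu\langle K(x_{n+1} - x_n), y_{n+1} - \hat y\rangle - \mu\theta\langle K(x_n - x_{n-1}), y_{n+1} - \hat y\rangle \\
&= \mu\langle K(x_{n+1} - x_n), y_{n+1} - \hat y\rangle - \mu\omega\langle K(x_n - x_{n-1}), y_n - \hat y\rangle \\
&\quad - \mu\omega\langle K(x_n - x_{n-1}), y_{n+1} - y_n\rangle - \mu(\theta - \omega)\langle K(x_n - x_{n-1}), y_{n+1} - \hat y\rangle \\
&\ge \mu\langle K(x_{n+1} - x_n), y_{n+1} - \hat y\rangle - \mu\omega\langle K(x_n - x_{n-1}), y_n - \hat y\rangle \\
&\quad - \mu\omega\|K\|\Big( \frac{\alpha\|x_n - x_{n-1}\|^2}{2} + \frac{\|y_{n+1} - y_n\|^2}{2\alpha} \Big) \\
&\quad - \mu(\theta - \omega)\|K\|\Big( \frac{\alpha\|x_n - x_{n-1}\|^2}{2} + \frac{\|y_{n+1} - \hat y\|^2}{2\alpha} \Big) \\
&= \mu\langle K(x_{n+1} - x_n), y_{n+1} - \hat y\rangle - \mu\omega\langle K(x_n - x_{n-1}), y_n - \hat y\rangle \\
&\quad - \mu\theta\|K\|\alpha\frac{\|x_n - x_{n-1}\|^2}{2} - \mu\omega\|K\|\frac{\|y_{n+1} - y_n\|^2}{2\alpha} - \mu(\theta - \omega)\|K\|\frac{\|y_{n+1} - \hat y\|^2}{2\alpha}.
\end{aligned}
\]
Now we choose $\alpha = \omega\sqrt{\gamma/\delta}$. Then
\[ \mu\theta\|K\|\alpha = 2\theta\omega\gamma, \qquad \frac{\mu\omega\|K\|}{\alpha} = 2\delta. \]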
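Therefore
\[
\begin{aligned}
\mu\langle K(x_{n+1} - \bar x_n), y_{n+1} - \hat y\rangle
&\ge \mu\langle K(x_{n+1} - x_n), y_{n+1} - \hat y\rangle - \mu\omega\langle K(x_n - x_{n-1}), y_n - \hat y\rangle \\
&\quad - \theta\omega\gamma\|x_n - x_{n-1}\|^2 - \delta\|y_{n+1} - y_n\|^2 - \frac{\theta - \omega}{\omega}\delta\|y_{n+1} - \hat y\|^2.
\end{aligned}
\]
Combining this with (2.25) gives
\[
\begin{aligned}
\Delta_n &\ge (1 + \mu)\Delta_{n+1} + \gamma\|x_n - x_{n+1}\|^2 - \omega\theta\gamma\|x_n - x_{n-1}\|^2 \\
&\quad + \mu\langle K(x_{n+1} - x_n), y_{n+1} - \hat y\rangle - \mu\omega\langle K(x_n - x_{n-1}), y_n - \hat y\rangle \\
&\quad - \frac{\theta - \omega}{\omega}\delta\|y_{n+1} - \hat y\|^2.
\end{aligned}
\]
Since $\delta\|y_{n+1} - \hat y\|^2 \le \Delta_{n+1}$, the last term is bounded below by $-\frac{\theta - \omega}{\omega}\Delta_{n+1}$. Now we choose ω such that
\[ 1 + \mu - \frac{1}{\omega} = \frac{\theta - \omega}{\omega}. \]
This implies that $\omega = \frac{1 + \theta}{2 + \mu}$. By using θ ∈ [1/(1+µ), 1] one can easily see that ω ≤ θ and
0 < ω < 1. Consequently
\[
\begin{aligned}
\Delta_n &\ge \frac{1}{\omega}\Delta_{n+1} + \gamma\|x_n - x_{n+1}\|^2 - \omega\theta\gamma\|x_n - x_{n-1}\|^2 \\
&\quad + \mu\langle K(x_{n+1} - x_n), y_{n+1} - \hat y\rangle - \mu\omega\langle K(x_n - x_{n-1}), y_n - \hat y\rangle \\
&\ge \frac{1}{\omega}\Delta_{n+1} + \gamma\|x_n - x_{n+1}\|^2 - \omega\gamma\|x_n - x_{n-1}\|^2 \\
&\quad + \mu\langle K(x_{n+1} - x_n), y_{n+1} - \hat y\rangle - \mu\omega\langle K(x_n - x_{n-1}), y_n - \hat y\rangle.
\end{aligned}
\]
Multiplying both sides by $\omega^{-n}$ we obtain
\[
\begin{aligned}
\omega^{-n}\Delta_n &\ge \omega^{-n-1}\Delta_{n+1} + \omega^{-n}\gamma\|x_n - x_{n+1}\|^2 - \omega^{-n+1}\gamma\|x_n - x_{n-1}\|^2 \\
&\quad + \omega^{-n}\mu\langle K(x_{n+1} - x_n), y_{n+1} - \hat y\rangle - \omega^{-n+1}\mu\langle K(x_n - x_{n-1}), y_n - \hat y\rangle.
\end{aligned}
\]
Set $x_{-1} = x_0$. By summing the above inequality over n from n = 0 to n = N − 1 we
can deduce that
\[ \Delta_0 \ge \omega^{-N}\Delta_N + \omega^{-N+1}\gamma\|x_N - x_{N-1}\|^2 + \omega^{-N+1}\mu\langle K(x_N - x_{N-1}), y_N - \hat y\rangle. \]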
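By virtue of the Cauchy-Schwarz inequality and (2.24) we have
\[
\begin{aligned}
\mu\langle K(x_N - x_{N-1}), y_N - \hat y\rangle
&\ge -\frac{1}{2}\mu\|K\|\left( \sqrt{\frac{\gamma}{\delta}}\|x_N - x_{N-1}\|^2 + \sqrt{\frac{\delta}{\gamma}}\|y_N - \hat y\|^2 \right) \\
&= -\gamma\|x_N - x_{N-1}\|^2 - \delta\|y_N - \hat y\|^2.
\end{aligned}
\]
Therefore
\[ \Delta_0 \ge \omega^{-N}\Delta_N - \omega^{-N+1}\delta\|y_N - \hat y\|^2 \ge \omega^{-N}\Delta_N - \omega^{-N+1}\Delta_N = \omega^{-N}(1 - \omega)\Delta_N, \]
which implies that $\Delta_N \le \frac{\omega^N}{1 - \omega}\Delta_0$ with 0 < ω < 1. We thus obtain the following linear
convergence result.
Theorem 2.3. Assume that both G and F∗ are strongly convex. Let $(\hat x, \hat y)$ be the unique saddle point of (2.2). Then for the sequence $(x_n, y_n)$ defined by Algorithm 3 there holds
\[ \delta\|\hat y - y_n\|^2 + \gamma\|\hat x - x_n\|^2 \le \frac{\omega^n}{1 - \omega}\big( \delta\|\hat y - y_0\|^2 + \gamma\|\hat x - x_0\|^2 \big), \]
where $\omega = (1 + \theta)/(2 + \mu)$.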
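The parameter choices of Algorithm 3 and the contraction factor ω = (1 + θ)/(2 + µ) obtained in the analysis above can be collected in a small helper. The function name `algorithm3_parameters` and the sample values are illustrative assumptions; only the formulas come from the text.

```python
import math


def algorithm3_parameters(gamma, delta, norm_K, theta=1.0):
    """Return (mu, tau, sigma, omega) for strong convexity constants gamma of G
    and delta of F*, operator norm norm_K, and relaxation parameter theta."""
    mu = 2.0 * math.sqrt(gamma * delta) / norm_K
    if not (1.0 / (1.0 + mu) <= theta <= 1.0):
        raise ValueError("theta must lie in [1/(1+mu), 1]")
    tau = mu / (2.0 * gamma)
    sigma = mu / (2.0 * delta)
    omega = (1.0 + theta) / (2.0 + mu)  # linear rate factor from the analysis
    return mu, tau, sigma, omega


mu, tau, sigma, omega = algorithm3_parameters(gamma=1.0, delta=4.0, norm_K=2.0)
```

Note that these choices always give τσ‖K‖² = 1, and that a larger µ (stronger convexity relative to ‖K‖) yields a smaller ω and hence faster linear convergence.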
ADMM regularization Algorithm
In this chapter we consider the following convex minimization problem
\[ \text{minimize } f(Wx) \quad \text{subject to } Kx = b,\; x \in \mathrm{dom}(W) \tag{3.1} \]
arising from linear inverse problems, where K : X → H is a bounded linear operator
between two Hilbert spaces X and H, W : X → Y is a linear operator from X to
another Hilbert space Y with domain dom(W), and f : Y → (−∞,∞] is a proper,
lower semi-continuous, convex function which is used to capture the features of the sought
solution under the transform W.
For inverse problems the operator K is usually either non-invertible or ill-conditioned
with a huge condition number. Thus, a small perturbation on the data may lead the
problem (3.1) to have no solution; even if it has a solution, this solution may not depend
continuously on the data. Due to the unavoidable appearance of noise in the data,
regularization method should be employed to solve (3.1) in a stable manner.
Let $b^\delta$ be the noisy data, which is a perturbation of b. The variational regularization
method turns (3.1) into the well-posed unconstrained minimization problem
\[ \min_{x\in \mathrm{dom}(W)}\Big\{ \frac{1}{2}\|Kx - b^\delta\|^2 + \alpha f(Wx) \Big\}, \tag{3.2} \]
where the regularization parameter α > 0 should be chosen carefully in order to
guarantee a good performance. The primal-dual algorithms in the previous chapter can
be used to solve (3.2). There exist also other algorithms for solving (3.1). In
particular, the alternating direction method of multipliers (ADMM) is among the most
famous ones. ADMM is a versatile splitting method introduced in [6,7] around the mid
1970s by Gabay, Mercier, Glowinski and Marrocco and it has been analyzed in [5, 10]
for well-posed problems. This method has been revitalized and popularized for solving
structured convex optimization problems in recent years; see [2] and references therein.
Although these methods perform well for solving (3.2), they suffer from the following
drawbacks for finding approximate solutions of (3.1):
(i) The performance of (3.2) depends on the choice of the regularization
parameter. One has to tune the value of α and hence has to solve (3.2) for many
different values of α in order to find a reasonable approximate solution. This can
be time-consuming.
(ii) All the available convergence analysis of these methods depends on the solvability
of the dual problem or the existence of saddle points for the corresponding
Lagrangian function. Unfortunately, these conditions may not hold for (3.1) arising
from inverse problems.
In the following we will discuss the alternating direction method of multipliers developed
in [9] for solving the inverse problem (3.1) as a regularization method which avoids the
above two drawbacks. To formulate the method, we introduce an additional variable
y = W x and rewrite (3.1) as the equivalent problem
\[ \text{minimize } f(y) \quad \text{subject to } Kx = b,\; Wx = y,\; x \in \mathrm{dom}(W). \tag{3.3} \]
For (3.3) the corresponding augmented Lagrangian function is
\[ L_{\rho_1,\rho_2}(x, y; \lambda, \mu) = f(y) + \langle \lambda, Kx - b\rangle + \langle \mu, Wx - y\rangle + \frac{\rho_1}{2}\|Kx - b\|^2 + \frac{\rho_2}{2}\|Wx - y\|^2, \tag{3.4} \]
where $\rho_1$ and $\rho_2$ are two positive constants. The ADMM algorithm proposed in [9] takes
the form
\[
\begin{aligned}
x_{n+1} &= \arg\min_{x\in \mathrm{dom}(W)} L_{\rho_1,\rho_2}(x, y_n; \lambda_n, \mu_n), \\
y_{n+1} &= \arg\min_{y\in Y} L_{\rho_1,\rho_2}(x_{n+1}, y; \lambda_n, \mu_n), \\
\lambda_{n+1} &= \lambda_n + \rho_1(Kx_{n+1} - b), \\
\mu_{n+1} &= \mu_n + \rho_2(Wx_{n+1} - y_{n+1}).
\end{aligned} \tag{3.5}
\]
Obviously, the x-subproblem is a quadratic minimization problem, and the y-subproblem
For the convenience of the analysis, we make the following assumptions.
Assumption 1. K : X → H is a bounded linear operator, and K∗ : H → X is used to denote the adjoint of K.
Assumption 2. f : Y → (−∞,∞] is a proper, lower semi-continuous, strongly convex function in the sense that there exists a constant $c_0 > 0$ such that
\[ f(ty_1 + (1-t)y_2) + c_0 t(1-t)\|y_1 - y_2\|^2 \le tf(y_1) + (1-t)f(y_2) \tag{3.6} \]
for all $y_1, y_2 \in Y$ and 0 ≤ t ≤ 1.
Assumption 3. W : X → Y is a densely defined, closed, linear operator with domain dom(W). This implies that the adjoint W∗ of W is weakly closed and densely defined.
Assumption 4. There exists a positive constant $c_1$ such that
\[ \|Kx\|^2 + \|Wx\|^2 \ge c_1\|x\|^2, \quad \forall x \in \mathrm{dom}(W). \tag{3.7} \]
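To analyze the method (3.5), we will make use of the subdifferential ∂f of f, which has
been introduced in Chapter 2. Let
\[ \mathrm{dom}(\partial f) = \{y \in Y : \partial f(y) \ne \emptyset\}. \]
Then for any y ∈ dom(∂f) and µ ∈ ∂f(y), the quantity
\[ D_\mu f(\bar y, y) = f(\bar y) - f(y) - \langle \mu, \bar y - y\rangle, \quad \forall \bar y \in Y \tag{3.8} \]
is called the Bregman distance induced by f at y in the direction µ. From Assumption
2 it is easy to deduce that
\[ D_\mu f(\bar y, y) \ge c_0\|\bar y - y\|^2 \tag{3.9} \]
for all $\bar y \in Y$, y ∈ dom(∂f) and µ ∈ ∂f(y), and
\[ \langle \mu - \bar\mu, y - \bar y\rangle \ge 2c_0\|y - \bar y\|^2 \tag{3.10} \]
for all $y, \bar y \in \mathrm{dom}(\partial f)$, µ ∈ ∂f(y), $\bar\mu \in \partial f(\bar y)$.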
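The Bregman distance (3.8) and the lower bound (3.9) can be checked numerically for the function f(y) = ‖y‖₁ + (ν/2)‖y‖₂² mentioned in the introduction. In this sketch the identification c₀ = ν/2 is an assumption based on a direct computation for the quadratic part, and the helper names and test vectors are illustrative.

```python
def f(y, nu):
    """f(y) = ||y||_1 + (nu/2) ||y||_2^2 for a list of floats y."""
    return sum(abs(v) for v in y) + 0.5 * nu * sum(v * v for v in y)


def subgradient(y, nu):
    """One element of the subdifferential of f at y (sign(0) is taken as 0)."""
    return [((v > 0) - (v < 0)) + nu * v for v in y]


def bregman_distance(y_bar, y, mu, nu):
    """D_mu f(y_bar, y) = f(y_bar) - f(y) - <mu, y_bar - y>."""
    inner = sum(m * (a - b) for m, a, b in zip(mu, y_bar, y))
    return f(y_bar, nu) - f(y, nu) - inner


nu = 2.0
y = [1.0, -2.0, 0.0]
y_bar = [-0.5, 0.5, 3.0]
mu = subgradient(y, nu)
distance = bregman_distance(y_bar, y, mu, nu)
squared_norm = sum((a - b) ** 2 for a, b in zip(y_bar, y))
lower_bound = 0.5 * nu * squared_norm  # c0 * ||y_bar - y||^2 with c0 = nu/2
```

Here the quadratic part of f contributes exactly (ν/2)‖ȳ − y‖² to the Bregman distance and the l1 part contributes a non-negative amount, so the distance dominates the squared norm as in (3.9).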
Under Assumptions 1–4, we can deduce that (3.1) has a unique solution whenever b is
consistent in the sense that b = Kx for some x ∈ dom(W) with W x ∈ dom(f).
Proof. Denote $f_* := \inf\{f(Wz) : Kz = b,\ z \in \mathrm{dom}(W)\}$. Since b is consistent, we have
$f_* < \infty$. Let $\{z_n\}$ be a sequence such that
\[ z_n \in \mathrm{dom}(W), \quad Kz_n = b \quad \text{and} \quad \lim_{n\to\infty} f(Wz_n) = f_*. \]
Since f is strongly convex, it is coercive [3] and consequently $\{Wz_n\}$ is bounded in Y.
From Assumption 4 we can also see that $\{z_n\}$ is bounded in X. By the Bolzano-Weierstrass
theorem, $\{z_n\}$ has a subsequence, still denoted by the same notation, such that
\[ z_n \rightharpoonup x^* \quad \text{and} \quad Wz_n \rightharpoonup y^* \]
for some $x^* \in X$ and $y^* \in Y$. By Assumption 3 and $\{z_n\} \subset \mathrm{dom}(W)$, we can deduce that
$x^* \in \mathrm{dom}(W)$ and $y^* = Wx^*$. From Assumption 1 we also have $Kx^* = b$. According to
Assumption 2, f is convex and lower semi-continuous, and hence f is also weakly lower
semi-continuous [3]. Thus
\[ f(Wx^*) \le \liminf_{n\to\infty} f(Wz_n) = f_*, \]
which implies that $x^*$ must be a solution of (3.1).
Next we show the uniqueness. Assume that $\bar x$ and $x^*$ are two solutions of (3.1). Then
\[ f(Wx^*) = f(W\bar x) = \inf\{f(Wz) : Kz = b,\ z \in \mathrm{dom}(W)\}. \]
It then follows from the strong convexity of f that $Wx^* = W\bar x$. Note also that $Kx^* =
b = K\bar x$. It then follows from Assumption 4 that
\[ c_1\|x^* - \bar x\|^2 \le \|K(x^* - \bar x)\|^2 + \|W(x^* - \bar x)\|^2 = 0 \]
and thus $x^* = \bar x$.
3.1 ADMM algorithm and basic estimates
In practical applications, data are usually obtained by measurements and hence contain
errors. Thus, instead of consistent data $b$ we only have noisy data $b^\delta$. In order
to use $b^\delta$ to produce approximate solutions to the true solution of (3.1), we will use
(3.5) with $b$ replaced by $b^\delta$ to produce an iterative sequence. In order to indicate the
dependence on the noisy data, we will place a superscript “δ” on every element of the
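Algorithm 4
Initialization: Choose $\rho_1 > 0$ and $\rho_2 > 0$. Take $y_0 \in Y$, $\lambda_0 \in H$, and $\mu_0 \in Y$. Set $y_0^\delta = y_0$, $\lambda_0^\delta = \lambda_0$ and $\mu_0^\delta = \mu_0$;
Iterations ($n \ge 0$): Update $x_n^\delta$, $y_n^\delta$, $\lambda_n^\delta$ and $\mu_n^\delta$ as follows:
\[
\begin{aligned}
x_{n+1}^\delta &= \arg\min_{x \in \mathrm{dom}(W)} \Big\{ \langle \lambda_n^\delta, Kx \rangle + \langle \mu_n^\delta, Wx \rangle + \frac{\rho_1}{2} \|Kx - b^\delta\|^2 + \frac{\rho_2}{2} \|Wx - y_n^\delta\|^2 \Big\}, \\
y_{n+1}^\delta &= \arg\min_{y \in Y} \Big\{ f(y) - \langle \mu_n^\delta, y \rangle + \frac{\rho_2}{2} \|W x_{n+1}^\delta - y\|^2 \Big\}, \\
\lambda_{n+1}^\delta &= \lambda_n^\delta + \rho_1 (K x_{n+1}^\delta - b^\delta), \\
\mu_{n+1}^\delta &= \mu_n^\delta + \rho_2 (W x_{n+1}^\delta - y_{n+1}^\delta).
\end{aligned}
\]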
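The four updates of Algorithm 4 can be sketched in a few lines for a finite-dimensional toy setting. The sketch below assumes $X = Y = H = \mathbb{R}^d$, a diagonal $K$, $W = I$ and $f(y) = \|y\|_1 + \frac12\|y\|^2$; these choices, and the names `soft` and `admm`, are hypothetical and serve only to make both subproblems solvable in closed form. It is not a general implementation.

```python
def soft(t, kappa):
    """Soft-thresholding: the minimizer of kappa*|y| + 0.5*(y - t)^2 over real y."""
    return max(abs(t) - kappa, 0.0) * (1.0 if t > 0 else -1.0)


def admm(k_diag, b, rho1=1.0, rho2=1.0, iters=300):
    """Run Algorithm 4 for the toy setting K = diag(k_diag), W = I and
    f(y) = ||y||_1 + 0.5*||y||^2 (all three are assumptions of this sketch).
    Returns the final iterates (x, y, lam, mu)."""
    d = len(b)
    x = [0.0] * d
    y = [0.0] * d
    lam = [0.0] * d
    mu = [0.0] * d
    for _ in range(iters):
        # x-subproblem: quadratic, solved componentwise from its normal equation
        # (rho1*k^2 + rho2) * x = k*(rho1*b - lam) + rho2*y - mu
        x = [(k * (rho1 * bi - li) + rho2 * yi - mi) / (rho1 * k * k + rho2)
             for k, bi, li, yi, mi in zip(k_diag, b, lam, y, mu)]
        # y-subproblem: optimality reads (1 + rho2)*y + sign(y) = mu + rho2*x
        y = [soft(mi + rho2 * xi, 1.0) / (1.0 + rho2) for mi, xi in zip(mu, x)]
        # multiplier updates
        lam = [li + rho1 * (k * xi - bi) for li, k, xi, bi in zip(lam, k_diag, x, b)]
        mu = [mi + rho2 * (xi - yi) for mi, xi, yi in zip(mu, x, y)]
    return x, y, lam, mu
```

For instance, with `k_diag = [1.0, 0.5, 0.0]` and `b = [2.0, -0.5, 0.0]` the constraint $Kx = b$ fixes the first two components of $x$ at $2$ and $-1$ and leaves the third free, so the minimizer of $f(Wx)$ is $(2, -1, 0)$; the iterates returned by `admm` approach this point.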
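We first show that Algorithm 4 is well-defined. It suffices to show that the $x$- and $y$-subproblems are well-defined. By rewriting these two subproblems into the equivalent forms
\[
\begin{aligned}
x_{n+1}^\delta &= \arg\min_{x \in \mathrm{dom}(W)} \Big\{ \frac{\rho_1}{2} \|Kx - b^\delta + \lambda_n^\delta/\rho_1\|^2 + \frac{\rho_2}{2} \|Wx - y_n^\delta + \mu_n^\delta/\rho_2\|^2 \Big\}, \\
y_{n+1}^\delta &= \arg\min_{y \in Y} \Big\{ f(y) + \frac{\rho_2}{2} \|y - W x_{n+1}^\delta - \mu_n^\delta/\rho_2\|^2 \Big\},
\end{aligned}
\]
we can obtain the well-definedness from the following result.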
Lemma 3.2. Let Assumptions 1–4 hold.
(i) For any $h \in H$ and $v \in Y$, the minimization problem
\[ \min_{z \in \mathrm{dom}(W)} \Big\{ \frac{\rho_1}{2} \|Kz - h\|^2 + \frac{\rho_2}{2} \|Wz - v\|^2 \Big\} \tag{3.11} \]
has a unique solution $z$, and $z$ and $Wz$ depend continuously on $v$ and $h$.
(ii) For any $v \in Y$, the minimization problem
\[ \min_{y \in Y} \Big\{ f(y) + \frac{\rho_2}{2} \|y - v\|^2 \Big\} \tag{3.12} \]
has a unique solution $y$, and $y$ and $f(y)$ depend continuously on $v$.
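Proof. (i) We first show the existence. Let $m^*$ denote the infimum of the objective in (3.11) and let $\{z_n\}$ be a minimizing sequence, i.e.
\[ \frac{\rho_1}{2} \|K z_n - h\|^2 + \frac{\rho_2}{2} \|W z_n - v\|^2 \to m^* \quad \text{as } n \to \infty. \]
Then $\{K z_n\}$ is bounded in $H$ and $\{W z_n\}$ is bounded in $Y$. By Assumption 4, $\{z_n\}$ is bounded in $X$, so we can find a subsequence, still denoted by the same notation, such that
\[ z_n \rightharpoonup z, \quad W z_n \rightharpoonup y \]
as $n \to \infty$. By Assumption 3 we have $z \in \mathrm{dom}(W)$ and $y = Wz$. Moreover
\[ \frac{\rho_1}{2} \|Kz - h\|^2 + \frac{\rho_2}{2} \|Wz - v\|^2 \le \liminf_{n\to\infty} \Big\{ \frac{\rho_1}{2} \|K z_n - h\|^2 + \frac{\rho_2}{2} \|W z_n - v\|^2 \Big\} = m^*. \]
Therefore, $z$ is a minimizer of (3.11). In view of Assumption 4, we can see that the objective function in (3.11) is strictly convex and hence (3.11) has a unique solution.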
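Next we show the continuous dependence. Let $\{(h_n, v_n)\} \subset H \times Y$ be a sequence such that $(h_n, v_n) \to (h, v)$ and let $z_n$ be the solution of (3.11) with $(h, v)$ replaced by $(h_n, v_n)$. We need to show that $z_n \to z$ and $W z_n \to W z$. By the minimizing property of $z_n$ we have
\[ \frac{\rho_1}{2} \|K z_n - h_n\|^2 + \frac{\rho_2}{2} \|W z_n - v_n\|^2 \le \frac{\rho_1}{2} \|K z - h_n\|^2 + \frac{\rho_2}{2} \|W z - v_n\|^2. \]
This implies that $\frac{\rho_1}{2} \|K z_n - h_n\|^2 + \frac{\rho_2}{2} \|W z_n - v_n\|^2$ is bounded by a constant independent of $n$. Consequently $\{K z_n\}$ and $\{W z_n\}$ are bounded and thus, by Assumption 4, $\{z_n\}$ is bounded in $X$. By taking a subsequence if necessary we have
\[ z_n \rightharpoonup \hat z \quad \text{and} \quad W z_n \rightharpoonup W \hat z \]
for some $\hat z \in \mathrm{dom}(W)$. Since the norms are weakly lower semi-continuous, we have
\[
\begin{aligned}
\frac{\rho_1}{2} \|K \hat z - h\|^2 + \frac{\rho_2}{2} \|W \hat z - v\|^2
&\le \liminf_{n\to\infty} \Big\{ \frac{\rho_1}{2} \|K z_n - h_n\|^2 + \frac{\rho_2}{2} \|W z_n - v_n\|^2 \Big\} \\
&\le \limsup_{n\to\infty} \Big\{ \frac{\rho_1}{2} \|K z_n - h_n\|^2 + \frac{\rho_2}{2} \|W z_n - v_n\|^2 \Big\} \\
&\le \limsup_{n\to\infty} \Big\{ \frac{\rho_1}{2} \|K z - h_n\|^2 + \frac{\rho_2}{2} \|W z - v_n\|^2 \Big\} \\
&= \frac{\rho_1}{2} \|K z - h\|^2 + \frac{\rho_2}{2} \|W z - v\|^2.
\end{aligned}
\]
Since $z$ is the unique minimizer of (3.11), we must have $\hat z = z$. Thus
\[ z_n \rightharpoonup z, \quad W z_n \rightharpoonup W z \quad \text{and} \quad K z_n \rightharpoonup K z \tag{3.13} \]
and
\[ \lim_{n\to\infty} \Big\{ \frac{\rho_1}{2} \|K z_n - h_n\|^2 + \frac{\rho_2}{2} \|W z_n - v_n\|^2 \Big\} = \frac{\rho_1}{2} \|K z - h\|^2 + \frac{\rho_2}{2} \|W z - v\|^2. \tag{3.14} \]
Furthermore, we have
\[
\begin{aligned}
\|K z - h\|^2 &\le \liminf_{n\to\infty} \|K z_n - h_n\|^2 \le \limsup_{n\to\infty} \|K z_n - h_n\|^2 \\
&= \|K z - h\|^2 + \frac{\rho_2}{\rho_1} \|W z - v\|^2 - \frac{\rho_2}{\rho_1} \liminf_{n\to\infty} \|W z_n - v_n\|^2.
\end{aligned}
\]
Since $\|W z - v\|^2 \le \liminf_{n\to\infty} \|W z_n - v_n\|^2$, we have
\[ \liminf_{n\to\infty} \|K z_n - h_n\|^2 = \limsup_{n\to\infty} \|K z_n - h_n\|^2 = \|K z - h\|^2. \]
By the same procedure, we can also deduce that $\lim_{n\to\infty} \|W z_n - v_n\|^2 = \|W z - v\|^2$. Thus, by virtue of (3.13) and $(h_n, v_n) \to (h, v)$, we can conclude that $K z_n \to K z$ and $W z_n \to W z$. In view of Assumption 4 we thus have $z_n \to z$.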
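(ii) By using the same argument as in (i) we can easily show that (3.12) has a solution. Since $f$ is strictly convex, the solution is unique.
Next we show the continuous dependence. Suppose we have $v_n \to v$ in $Y$. Let $y_n$ be the solution of (3.12) with $v$ replaced by $v_n$. Then
\[ \rho_2 (v - y) \in \partial f(y) \quad \text{and} \quad \rho_2 (v_n - y_n) \in \partial f(y_n). \]
Thus, by the monotonicity of the subdifferential, we have
\[ 0 \le \langle \rho_2 (v_n - y_n) - \rho_2 (v - y),\ y_n - y \rangle, \]
which implies that
\[ \|y_n - y\|^2 \le \langle v_n - v, y_n - y \rangle \le \|v_n - v\| \|y_n - y\|. \]
Therefore $\|y_n - y\| \le \|v_n - v\|$ and thus $y_n \to y$ as $n \to \infty$. By an argument similar to the derivation of (3.14), we have
\[ \lim_{n\to\infty} \Big\{ f(y_n) + \frac{\rho_2}{2} \|y_n - v_n\|^2 \Big\} = f(y) + \frac{\rho_2}{2} \|y - v\|^2. \]
Thus $f(y_n) \to f(y)$ as $n \to \infty$.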
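The estimate $\|y_n - y\| \le \|v_n - v\|$ in part (ii) says that the solution map of (3.12) is nonexpansive. A minimal sketch, assuming the one-dimensional sample function $f(y) = |y| + \frac12 y^2$ (for which (3.12) has a closed-form solution), checks this on random inputs. The function names and the choice of $f$ are illustrative assumptions.

```python
import random


def prox(v, rho2):
    """Solve (3.12) for the sample f(y) = |y| + 0.5*y^2 on the real line, i.e.
    return the minimizer of f(y) + rho2/2 * (y - v)^2."""
    t = rho2 * v
    shrunk = max(abs(t) - 1.0, 0.0)
    return (shrunk if t > 0 else -shrunk) / (1.0 + rho2)


def max_lipschitz_ratio(rho2=2.0, trials=2000, seed=1):
    """Largest observed |prox(v1) - prox(v2)| / |v1 - v2| over random pairs."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        v1, v2 = rng.uniform(-5, 5), rng.uniform(-5, 5)
        if v1 != v2:
            ratio = abs(prox(v1, rho2) - prox(v2, rho2)) / abs(v1 - v2)
            worst = max(worst, ratio)
    return worst
```

For this particular $f$ the observed ratio stays below $\rho_2 / (1 + \rho_2)$ up to rounding, which is stronger than the bound $1$ obtained in the proof; the general argument only uses monotonicity of $\partial f$.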
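Using the above lemma, we can immediately obtain the following result concerning the
Lemma 3.3. Let $\{b^\delta\}$ be a sequence of noisy data satisfying $\|b^\delta - b\| \to 0$ as $\delta \to 0$. Then for each fixed integer $n \ge 0$ there hold
\[ x_n^\delta \to x_n, \quad y_n^\delta \to y_n, \quad W x_n^\delta \to W x_n, \quad \lambda_n^\delta \to \lambda_n, \quad \mu_n^\delta \to \mu_n, \quad f(y_n^\delta) \to f(y_n) \tag{3.15} \]
as $\delta \to 0$, where $(x_n, y_n, \lambda_n, \mu_n)$ are defined by (3.5).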
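By using the definition of $x_{n+1}^\delta$ and $y_{n+1}^\delta$ in Algorithm 4, it is easy to see that
\[
\begin{aligned}
& K^* \lambda_n^\delta + \rho_1 K^* (K x_{n+1}^\delta - b^\delta) + W^* [\mu_n^\delta + \rho_2 (W x_{n+1}^\delta - y_n^\delta)] = 0, \\
& 0 \in \partial f(y_{n+1}^\delta) - \mu_n^\delta - \rho_2 (W x_{n+1}^\delta - y_{n+1}^\delta).
\end{aligned}
\]
By introducing the residuals
\[ r_n^\delta = K x_n^\delta - b^\delta, \quad s_n^\delta = W x_n^\delta - y_n^\delta \]
we can deduce that
\[ \lambda_{n+1}^\delta - \lambda_n^\delta = \rho_1 r_{n+1}^\delta, \tag{3.16} \]
\[ \mu_{n+1}^\delta - \mu_n^\delta = \rho_2 s_{n+1}^\delta, \tag{3.17} \]
\[ \mu_{n+1}^\delta \in \partial f(y_{n+1}^\delta), \tag{3.18} \]
\[ K^* \lambda_n^\delta + \rho_1 K^* r_{n+1}^\delta = -W^* [\mu_n^\delta + \rho_2 (W x_{n+1}^\delta - y_n^\delta)]. \tag{3.19} \]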
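Lemma 3.4. For all $n \ge 1$ there hold
\[ \langle \lambda_{n+1}^\delta, Kx \rangle + \langle \mu_{n+1}^\delta, Wx \rangle = \rho_2 \langle y_n^\delta - y_{n+1}^\delta, Wx \rangle \tag{3.20} \]
and
\[ \rho_1 \langle r_{n+1}^\delta, Kx \rangle + \rho_2 \langle s_{n+1}^\delta, Wx \rangle = \rho_2 \langle (y_n^\delta - y_{n+1}^\delta) - (y_{n-1}^\delta - y_n^\delta), Wx \rangle \tag{3.21} \]
for all $x \in \mathrm{dom}(W)$.
Proof. Multiplying (3.16) by $K^*$ and using (3.17), (3.19) we obtain for $n \ge 0$ that
\[
\begin{aligned}
K^* \lambda_{n+1}^\delta = K^* \lambda_n^\delta + \rho_1 K^* r_{n+1}^\delta
&= -W^* [\mu_n^\delta + \rho_2 (W x_{n+1}^\delta - y_n^\delta)] \\
&= -W^* [\mu_{n+1}^\delta - \rho_2 (W x_{n+1}^\delta - y_{n+1}^\delta) + \rho_2 (W x_{n+1}^\delta - y_n^\delta)] \\
&= -W^* [\mu_{n+1}^\delta + \rho_2 (y_{n+1}^\delta - y_n^\delta)],
\end{aligned}
\]
which gives (3.20). Using again (3.16), (3.17) and (3.19), we then obtain for $n \ge 1$ that
\[
\begin{aligned}
\rho_1 K^* r_{n+1}^\delta = K^* \lambda_{n+1}^\delta - K^* \lambda_n^\delta
&= W^* [\mu_n^\delta - \mu_{n+1}^\delta + \rho_2 (y_n^\delta - y_{n+1}^\delta) - \rho_2 (y_{n-1}^\delta - y_n^\delta)] \\
&= \rho_2 W^* [(y_n^\delta - y_{n+1}^\delta) - (y_{n-1}^\delta - y_n^\delta) - s_{n+1}^\delta],
\end{aligned}
\]
which gives (3.21). We thus complete the proof.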
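Since $\mu_{n+1}^\delta \in \partial f(y_{n+1}^\delta)$, we have from (3.10) that
\[
\begin{aligned}
2 c_0 \|y_{n+1}^\delta - y_n^\delta\|^2
&\le \langle \mu_{n+1}^\delta - \mu_n^\delta, y_{n+1}^\delta - y_n^\delta \rangle = \rho_2 \langle s_{n+1}^\delta, y_{n+1}^\delta - y_n^\delta \rangle \\
&= \rho_2 \langle s_{n+1}^\delta, y_{n+1}^\delta - W x_{n+1}^\delta + W x_{n+1}^\delta - W x_n^\delta + W x_n^\delta - y_n^\delta \rangle \\
&= \rho_2 \langle s_{n+1}^\delta, s_n^\delta - s_{n+1}^\delta \rangle + \rho_2 \langle s_{n+1}^\delta, W(x_{n+1}^\delta - x_n^\delta) \rangle.
\end{aligned}
\]
By using (3.21) in Lemma 3.4 with $x = x_{n+1}^\delta - x_n^\delta$, we then obtain
\[
\begin{aligned}
2 c_0 \|y_{n+1}^\delta - y_n^\delta\|^2
&\le \rho_2 \langle s_{n+1}^\delta, s_n^\delta - s_{n+1}^\delta \rangle + \rho_1 \langle r_{n+1}^\delta, r_n^\delta - r_{n+1}^\delta \rangle \\
&\quad + \rho_2 \langle (y_n^\delta - y_{n+1}^\delta) - (y_{n-1}^\delta - y_n^\delta), W(x_{n+1}^\delta - x_n^\delta) \rangle.
\end{aligned}
\]
By the definition of $s_n^\delta$, we have
\[ s_{n+1}^\delta - s_n^\delta = W x_{n+1}^\delta - y_{n+1}^\delta - (W x_n^\delta - y_n^\delta) = W(x_{n+1}^\delta - x_n^\delta) - (y_{n+1}^\delta - y_n^\delta). \]
Therefore, we can deduce that
\[
\begin{aligned}
2 c_0 \|y_{n+1}^\delta - y_n^\delta\|^2
&\le \rho_1 \langle r_{n+1}^\delta, r_n^\delta - r_{n+1}^\delta \rangle + \rho_2 \langle s_{n+1}^\delta, s_n^\delta - s_{n+1}^\delta \rangle \\
&\quad + \rho_2 \langle (y_n^\delta - y_{n+1}^\delta) - (y_{n-1}^\delta - y_n^\delta), (s_{n+1}^\delta - s_n^\delta) + (y_{n+1}^\delta - y_n^\delta) \rangle \\
&= -\frac{\rho_1}{2} \big( \|r_{n+1}^\delta\|^2 - \|r_n^\delta\|^2 + \|r_{n+1}^\delta - r_n^\delta\|^2 \big) \\
&\quad - \frac{\rho_2}{2} \big( \|s_{n+1}^\delta\|^2 - \|s_n^\delta\|^2 + \|s_{n+1}^\delta - s_n^\delta\|^2 \big) \\
&\quad - \frac{\rho_2}{2} \big( \|y_{n+1}^\delta - y_n^\delta\|^2 - \|y_n^\delta - y_{n-1}^\delta\|^2 + \|(y_{n+1}^\delta - y_n^\delta) - (y_n^\delta - y_{n-1}^\delta)\|^2 \big) \\
&\quad - \rho_2 \langle (y_{n+1}^\delta - y_n^\delta) - (y_n^\delta - y_{n-1}^\delta), s_{n+1}^\delta - s_n^\delta \rangle. 
\end{aligned}
\tag{3.22}
\]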
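The passage from inner products to squared norms in (3.22) rests on an elementary polarization identity. It is written out below for generic elements $a$, $b$ of a Hilbert space; the symbols $a$ and $b$ are placeholders introduced here and are not notation from the derivation itself.

```latex
\[
\langle a, b - a \rangle
  = -\tfrac12 \bigl( \|a\|^2 - \|b\|^2 + \|a - b\|^2 \bigr).
\]
% Used with (a, b) = (r_{n+1}^\delta, r_n^\delta) and (a, b) = (s_{n+1}^\delta, s_n^\delta).
% For the y-terms, take a = y_{n+1}^\delta - y_n^\delta and b = y_n^\delta - y_{n-1}^\delta
% and use the equivalent form
\[
\langle a - b, a \rangle
  = \tfrac12 \bigl( \|a\|^2 - \|b\|^2 + \|a - b\|^2 \bigr).
\]
```

Both forms follow by expanding $\|a - b\|^2 = \|a\|^2 - 2\langle a, b \rangle + \|b\|^2$.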
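Consequently, by introducing
\[ E_n^\delta = \rho_1 \|r_n^\delta\|^2 + \rho_2 \|s_n^\delta\|^2 + \rho_2 \|y_n^\delta - y_{n-1}^\delta\|^2, \]
we have
\[
\begin{aligned}
2 c_0 \|y_{n+1}^\delta - y_n^\delta\|^2 \le E_n^\delta - E_{n+1}^\delta
&- \frac{\rho_1}{2} \|r_{n+1}^\delta - r_n^\delta\|^2 \\
&- \frac{\rho_2}{2} \|(y_{n+1}^\delta - y_n^\delta) - (y_n^\delta - y_{n-1}^\delta) + (s_{n+1}^\delta - s_n^\delta)\|^2.
\end{aligned}
\tag{3.23}
\]
Summing both sides over the iteration index from $n$ to $\infty$ we obtain
\[
\begin{aligned}
2 c_0 \sum_{k=n}^\infty \|y_{k+1}^\delta - y_k^\delta\|^2 &+ \frac{\rho_1}{2} \sum_{k=n}^\infty \|r_{k+1}^\delta - r_k^\delta\|^2 \\
&+ \frac{\rho_2}{2} \sum_{k=n}^\infty \|(y_{k+1}^\delta - y_k^\delta) - (y_k^\delta - y_{k-1}^\delta) + (s_{k+1}^\delta - s_k^\delta)\|^2 \le E_n^\delta.
\end{aligned}
\tag{3.24}
\]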
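Lemma 3.5. $\{E_n^\delta\}$ is monotonically decreasing along the iteration and there is a constant $C$ such that
\[ \sum_{k=n}^\infty \big( \|y_{k+1}^\delta - y_k^\delta\|^2 + \|x_{k+1}^\delta - x_k^\delta\|^2 + \|K(x_{k+1}^\delta - x_k^\delta)\|^2 + \|W(x_{k+1}^\delta - x_k^\delta)\|^2 \big) \le C E_n^\delta \]
for all $n \ge 0$.
Proof. The monotonicity of $\{E_n^\delta\}$ follows from (3.23). From (3.24) we can derive that
\[ \sum_{k=n}^\infty \|y_{k+1}^\delta - y_k^\delta\|^2 \le C E_n^\delta, \tag{3.25} \]
\[ \sum_{k=n}^\infty \|r_{k+1}^\delta - r_k^\delta\|^2 \le C E_n^\delta, \tag{3.26} \]
\[ \sum_{k=n}^\infty \|W(x_{k+1}^\delta - x_k^\delta) - (y_k^\delta - y_{k-1}^\delta)\|^2 \le C E_n^\delta. \tag{3.27} \]
Since $\|a + b\|^2 \le 2(\|a\|^2 + \|b\|^2)$, by using (3.25) and (3.27) we can obtain
\[ \sum_{k=n}^\infty \|W(x_{k+1}^\delta - x_k^\delta)\|^2 \le C E_n^\delta. \]
From (3.26), we have
\[ \sum_{k=n}^\infty \|K(x_{k+1}^\delta - x_k^\delta)\|^2 = \sum_{k=n}^\infty \|r_{k+1}^\delta - r_k^\delta\|^2 \le C E_n^\delta. \]
In view of Assumption 4, we have
\[ c_1 \|x_{k+1}^\delta - x_k^\delta\|^2 \le \|K(x_{k+1}^\delta - x_k^\delta)\|^2 + \|W(x_{k+1}^\delta - x_k^\delta)\|^2. \]
Therefore we can obtain
\[ \sum_{k=n}^\infty \|x_{k+1}^\delta - x_k^\delta\|^2 \le C E_n^\delta. \]
The proof is complete.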
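The monotonicity of $E_n$ can be observed numerically on a scalar instance of the method. The sketch below assumes $X = Y = H = \mathbb{R}$, $K = W = 1$ and $f(y) = |y| + \frac12 y^2$ with exact data, and records $E_1, E_2, \dots$; these choices and the name `energy_history` are hypothetical and only meant to make the quantities in Lemma 3.5 concrete.

```python
def energy_history(b=2.0, rho1=1.0, rho2=1.0, iters=40):
    """Scalar instance of Algorithm 4 with K = W = 1 and f(y) = |y| + 0.5*y^2
    (assumptions of this sketch). Returns the list [E_1, E_2, ..., E_iters]."""
    y, lam, mu = 0.0, 0.0, 0.0
    energies = []
    for _ in range(iters):
        y_old = y
        # x-subproblem: (rho1 + rho2) * x = rho1*b - lam + rho2*y - mu
        x = (rho1 * b - lam + rho2 * y - mu) / (rho1 + rho2)
        # y-subproblem: soft-thresholding followed by scaling
        t = mu + rho2 * x
        shrunk = max(abs(t) - 1.0, 0.0)
        y = (shrunk if t > 0 else -shrunk) / (1.0 + rho2)
        # residuals r_n = K x_n - b and s_n = W x_n - y_n
        r, s = x - b, x - y
        lam += rho1 * r
        mu += rho2 * s
        # E_n = rho1*|r_n|^2 + rho2*|s_n|^2 + rho2*|y_n - y_{n-1}|^2
        energies.append(rho1 * r * r + rho2 * s * s + rho2 * (y - y_old) ** 2)
    return energies
```

With the default arguments the first values are $E_1 = 2$, $E_2 = 1.5$ and $E_3 = 0.75$, and the recorded sequence is non-increasing, in agreement with (3.23).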
3.2 Convergence analysis of exact data case
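In this section, we give the convergence analysis of the ADMM (3.5) for the exact data case. We will assume that $b$ is consistent and let $(\hat x, \hat y)$ denote any feasible point of (3.3), i.e. $\hat x \in \mathrm{dom}(W)$ and $\hat y \in \mathrm{dom}(f)$ with $K \hat x = b$ and $W \hat x = \hat y$. We will use $\{x_n, y_n, \lambda_n, \mu_n\}$ to denote the sequence defined by (3.5).
Lemma 3.6. The sequences $\{x_n\}$ and $\{y_n\}$ are bounded and
\[ \sum_{n=1}^\infty \{ D_{\mu_n} f(y_{n+1}, y_n) + E_n \} < \infty. \tag{3.28} \]
Proof. According to (3.17) and Lemma 3.4, we can deduce that
\[
\begin{aligned}
& D_{\mu_{k+1}} f(\hat y, y_{k+1}) - D_{\mu_k} f(\hat y, y_k) + D_{\mu_k} f(y_{k+1}, y_k) \\
&= \langle \mu_k - \mu_{k+1}, \hat y - y_{k+1} \rangle = -\rho_2 \langle s_{k+1}, W(\hat x - x_{k+1}) + s_{k+1} \rangle \\
&= -\rho_2 \|s_{k+1}\|^2 - \rho_2 \langle (y_k - y_{k+1}) - (y_{k-1} - y_k), W(\hat x - x_{k+1}) \rangle + \rho_1 \langle r_{k+1}, K(\hat x - x_{k+1}) \rangle \\
&= -\rho_1 \|r_{k+1}\|^2 - \rho_2 \|s_{k+1}\|^2 + \rho_2 \langle y_{k-1} - y_k, W(\hat x - x_{k+1}) \rangle \\
&\quad - \rho_2 \langle y_k - y_{k+1}, W(\hat x - x_{k+1}) \rangle.
\end{aligned}
\tag{3.29}
\]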
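The first equality in (3.29) is a three-point identity for Bregman distances. As a worked check, it can be obtained by writing out the three distances with the definition (3.8):

```latex
\begin{align*}
D_{\mu_{k+1}} f(\hat y, y_{k+1}) &= f(\hat y) - f(y_{k+1}) - \langle \mu_{k+1}, \hat y - y_{k+1} \rangle, \\
D_{\mu_k} f(\hat y, y_k) &= f(\hat y) - f(y_k) - \langle \mu_k, \hat y - y_k \rangle, \\
D_{\mu_k} f(y_{k+1}, y_k) &= f(y_{k+1}) - f(y_k) - \langle \mu_k, y_{k+1} - y_k \rangle.
\end{align*}
% First line minus second line plus third line: all values of f cancel, leaving
\[
-\langle \mu_{k+1}, \hat y - y_{k+1} \rangle
+ \langle \mu_k, \hat y - y_k \rangle
- \langle \mu_k, y_{k+1} - y_k \rangle
= \langle \mu_k - \mu_{k+1}, \hat y - y_{k+1} \rangle .
\]
```

The remaining steps then use $\mu_{k+1} - \mu_k = \rho_2 s_{k+1}$, the decomposition $\hat y - y_{k+1} = W(\hat x - x_{k+1}) + s_{k+1}$, and $K(\hat x - x_{k+1}) = -r_{k+1}$ for exact data.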
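After summing the above equality from $k = m$ to $k = n-1$, we can obtain
\[
\begin{aligned}
& D_{\mu_n} f(\hat y, y_n) - D_{\mu_m} f(\hat y, y_m) + \sum_{k=m}^{n-1} D_{\mu_k} f(y_{k+1}, y_k) \\
&= -\sum_{k=m+1}^{n} \big( \rho_1 \|r_k\|^2 + \rho_2 \|s_k\|^2 \big) + \rho_2 \langle y_{m-1} - y_m, W(\hat x - x_{m+1}) \rangle \\
&\quad + \rho_2 \sum_{k=m+1}^{n-1} \langle y_{k-1} - y_k, W(x_k - x_{k+1}) \rangle - \rho_2 \langle y_{n-1} - y_n, W(\hat x - x_n) \rangle.
\end{aligned}
\tag{3.30}
\]
We need to estimate the last two terms of (3.30). For the first term $\sum_{k=m+1}^{n-1} \langle y_{k-1} - y_k, W(x_k - x_{k+1}) \rangle$, the Cauchy-Schwarz inequality and Lemma 3.5 yield
\[ \sum_{k=m+1}^{n-1} \langle y_k - y_{k-1}, W(x_{k+1} - x_k) \rangle \le \sum_{k=m+1}^{n-1} \big( \|y_k - y_{k-1}\|^2 + \|W(x_{k+1} - x_k)\|^2 \big) \le C E_m. \]
For the second term, we have
\[
\begin{aligned}
-\langle y_{n-1} - y_n, W(\hat x - x_n) \rangle
&= -\langle y_{n-1} - y_n, W(\hat x - x_m) \rangle - \sum_{k=m}^{n-1} \langle y_{n-1} - y_n, -W(x_{k+1} - x_k) \rangle \\
&\le \|W(\hat x - x_m)\|^2 + \sum_{k=m}^{n-1} \|W(x_{k+1} - x_k)\|^2 + (n - m + 1) \|y_{n-1} - y_n\|^2 \\
&\le \|W(\hat x - x_m)\|^2 + C E_m,
\end{aligned}
\]
where, for the last inequality, we used Lemma 3.5.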
Choosing $m = 1$ in (3.30), it is then easy to deduce that there exists a constant $C_1$ independent of $n$ such that
\[
\begin{aligned}
& D_{\mu_n} f(\hat y, y_n) + \sum_{k=1}^{n-1} D_{\mu_k} f(y_{k+1}, y_k) + \rho_1 \sum_{k=2}^{n} \|r_k\|^2 + \rho_2 \sum_{k=2}^{n} \|s_k\|^2 \\
&\le D_{\mu_1} f(\hat y, y_1) + \rho_2 \|W(\hat x - x_1)\|^2 + C E_1 \le C_1.
\end{aligned}
\tag{3.31}
\]
This implies that
\[ \sum_{k=1}^\infty D_{\mu_k} f(y_{k+1}, y_k) + \rho_1 \sum_{k=1}^\infty \|r_k\|^2 + \rho_2 \sum_{k=1}^\infty \|s_k\|^2 \le C_1 < \infty. \]
By (3.25) and the above inequality, we see that $\sum_{k=1}^\infty E_k < \infty$. From (3.31) it follows that $D_{\mu_n} f(\hat y, y_n) \le C_1$. Since $f$ is strongly convex, we have $c_0 \|\hat y - y_n\|^2 \le D_{\mu_n} f(\hat y, y_n) \le C_1$, which means $\{y_n\}$ is bounded. Furthermore, since $\sum_{k=1}^\infty E_k < \infty$, we obtain $E_k \to 0$ as $k \to \infty$, which means $r_k \to 0$, $s_k \to 0$ and $y_k - y_{k+1} \to 0$. Therefore we can conclude that $K x_k \to b$, $W x_k - y_k \to 0$ and $y_k - y_{k+1} \to 0$ as $k \to \infty$. Thus $\{K x_k\}$ and $\{W x_k\}$ are both bounded. By Assumption 4, we can conclude that $\{x_k\}$ is bounded.
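Lemma 3.7. Let $(\hat x, \hat y)$ be a feasible point of (3.3). Then $\{D_{\mu_k} f(\hat y, y_k)\}$ is a Cauchy sequence and hence convergent.
Proof. Let $0 < m < n$ be any two integers. By using (3.30) we have
\[
\begin{aligned}
& |D_{\mu_n} f(\hat y, y_n) - D_{\mu_m} f(\hat y, y_m)| \\
&\le \sum_{k=m}^{n-1} D_{\mu_k} f(y_{k+1}, y_k) + \sum_{k=m+1}^{n} \big( \rho_1 \|r_k\|^2 + \rho_2 \|s_k\|^2 \big) + \rho_2 |\langle y_{m-1} - y_m, W(\hat x - x_{m+1}) \rangle| \\
&\quad + \rho_2 \sum_{k=m+1}^{n-1} |\langle y_{k-1} - y_k, W(x_k - x_{k+1}) \rangle| + \rho_2 |\langle y_{n-1} - y_n, W(\hat x - x_n) \rangle| \\
&\le \sum_{k=m}^{n} \big( D_{\mu_k} f(y_{k+1}, y_k) + \rho_1 \|r_k\|^2 + \rho_2 \|s_k\|^2 \big) + 2 \rho_2 C E_m \qquad (3.32) \\
&\quad + \rho_2 \|y_{m-1} - y_m\| \|W(\hat x - x_{m+1})\| + \rho_2 \|y_{n-1} - y_n\| \|W(\hat x - x_n)\|. \qquad (3.33)
\end{aligned}
\]
By Lemma 3.6, we can deduce that $|D_{\mu_n} f(\hat y, y_n) - D_{\mu_m} f(\hat y, y_m)| \to 0$ as $m, n \to \infty$. Therefore $\{D_{\mu_k} f(\hat y, y_k)\}$ is a Cauchy sequence, and hence it is convergent.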
Theorem 3.8. Assume that $b$ is consistent and that Assumptions 1–4 hold. Let $x^*$ be the unique solution of (3.1), and let $y^* = W x^*$. Then for the ADMM we have, as $k \to \infty$,
\[ x_k \to x^*, \quad y_k \to y^*, \quad W x_k \to y^*, \quad f(y_k) \to f(y^*) \quad \text{and} \quad D_{\mu_k} f(y^*, y_k) \to 0. \tag{3.34} \]
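Proof. The first part is to show that $\{y_k\}$ is a Cauchy sequence.
As above, we let $(\hat x, \hat y)$ denote a feasible point of (3.3). By the definition (3.8) of the Bregman distance we have
\[ D_{\mu_m} f(y_n, y_m) = D_{\mu_m} f(\hat y, y_m) - D_{\mu_n} f(\hat y, y_n) + \langle \mu_n - \mu_m, y_n - \hat y \rangle. \tag{3.35} \]
By equation (3.17) and Lemma 3.4, together with $y_n - \hat y = -s_n + W(x_n - \hat x)$ and $K(x_n - \hat x) = r_n$, we can deduce
\[
\begin{aligned}
\langle \mu_n - \mu_m, y_n - \hat y \rangle
&= \sum_{k=m}^{n-1} \langle \mu_{k+1} - \mu_k, y_n - \hat y \rangle = \rho_2 \sum_{k=m}^{n-1} \langle s_{k+1}, y_n - \hat y \rangle \\
&= -\rho_2 \sum_{k=m}^{n-1} \langle s_{k+1}, s_n \rangle + \rho_2 \sum_{k=m}^{n-1} \langle s_{k+1}, W(x_n - \hat x) \rangle \\
&= -\sum_{k=m}^{n-1} \big( \rho_2 \langle s_{k+1}, s_n \rangle + \rho_1 \langle r_{k+1}, r_n \rangle \big) \\
&\quad + \rho_2 \langle y_{n-1} - y_n, W(x_n - \hat x) \rangle - \rho_2 \langle y_{m-1} - y_m, W(x_n - \hat x) \rangle.
\end{aligned}
\tag{3.36}
\]
By the Cauchy-Schwarz inequality, we can deduce
\[
\begin{aligned}
|\langle \mu_n - \mu_m, y_n - \hat y \rangle|
&\le \sum_{k=m}^{n-1} \big( \rho_2 |\langle s_{k+1}, s_n \rangle| + \rho_1 |\langle r_{k+1}, r_n \rangle| \big) \\
&\quad + \rho_2 |\langle y_{n-1} - y_n, W(x_n - \hat x) \rangle| + \rho_2 |\langle y_{m-1} - y_m, W(x_n - \hat x) \rangle| \\
&\le \frac12 \sum_{k=m+1}^{n} \big( \rho_1 \|r_k\|^2 + \rho_2 \|s_k\|^2 \big) + \frac{n-m}{2} \big( \rho_1 \|r_n\|^2 + \rho_2 \|s_n\|^2 \big) \\
&\quad + \rho_2 |\langle y_{n-1} - y_n, W(x_n - \hat x) \rangle| + \rho_2 |\langle y_{m-1} - y_m, W(x_n - \hat x) \rangle|.
\end{aligned}
\tag{3.37}
\]
Since $E_k$ is monotonically decreasing along the iteration, we have $(n - m) E_n \le \sum_{k=m+1}^{n} E_k$, and we can further deduce that
\[
\begin{aligned}
|\langle \mu_n - \mu_m, y_n - \hat y \rangle|
&\le \frac12 \sum_{k=m+1}^{n} E_k + \frac{n-m}{2} \big( \rho_1 \|r_n\|^2 + \rho_2 \|s_n\|^2 \big) \\
&\quad + \rho_2 |\langle y_{n-1} - y_n, W(x_n - \hat x) \rangle| + \rho_2 |\langle y_{m-1} - y_m, W(x_n - \hat x) \rangle| \\
&\le \sum_{k=m+1}^{n} E_k + \rho_2 |\langle y_{n-1} - y_n, W(x_n - \hat x) \rangle| + \rho_2 |\langle y_{m-1} - y_m, W(x_n - \hat x) \rangle| \\
&\le \sum_{k=m+1}^{n} E_k + \rho_2 \big( \|y_n - y_{n-1}\|^2 + \|y_m - y_{m-1}\|^2 + 2 \|W(x_n - \hat x)\|^2 \big).
\end{aligned}
\tag{3.38}
\]
According to Lemma 3.5, Lemma 3.6 and (3.38), we have $\langle \mu_n - \mu_m, y_n - \hat y \rangle \to 0$ as $m, n \to \infty$. According to Lemma 3.7, we also have $D_{\mu_m} f(\hat y, y_m) - D_{\mu_n} f(\hat y, y_n) \to 0$ as $m, n \to \infty$. Thus by (3.35), we obtain that $D_{\mu_m} f(y_n, y_m) \to 0$ as $m, n \to \infty$. Since $f$ is a strongly convex function, from (3.9) we obtain $\|y_n - y_m\| \to 0$ as $m, n \to \infty$. Thus $\{y_k\}$ is a Cauchy sequence. Therefore, there is $\tilde y \in Y$ such that $y_k \to \tilde y$ as $k \to \infty$.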
The second part is to show that $\{x_k\}$ is a Cauchy sequence.
Using Lemma 3.6 and $y_k \to \tilde y$ from above, we obtain $W x_k \to \tilde y$ and $K x_k \to b$. Under Assumption 4, we have
\[ c_1 \|x_n - x_k\|^2 \le \|K x_n - K x_k\|^2 + \|W x_n - W x_k\|^2 \tag{3.39} \]
for any integers $n, k$. Hence, $\|x_n - x_k\| \to 0$ as $n, k \to \infty$. Therefore $\{x_k\}$ is a Cauchy sequence in $X$ whose elements lie in $\mathrm{dom}(W)$. Consequently, there is $\tilde x \in X$ such that $x_k \to \tilde x$ as $k \to \infty$. Furthermore, we have $b = \lim_{k\to\infty} K x_k = K \tilde x$. Since $W$ is a closed operator and $\{x_k\}$ is a sequence in $\mathrm{dom}(W)$, we have $\tilde x \in \mathrm{dom}(W)$ and $W \tilde x = \tilde y$.
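The third part is to show $\tilde y \in \mathrm{dom}(f)$, $\lim_{k\to\infty} f(y_k) = f(\tilde y)$ and $\lim_{k\to\infty} D_{\mu_k} f(\tilde y, y_k) = 0$.
Since $\mu_k \in \partial f(y_k)$, we have
\[ f(y_k) \le f(\hat y) + \langle \mu_k, y_k - \hat y \rangle. \tag{3.40} \]
Using equation (3.38), we have
\[
\begin{aligned}
\langle \mu_k, y_k - \hat y \rangle &\le \langle \mu_1, y_k - \hat y \rangle + \sum_{i=2}^{k} E_i \\
&\quad + \rho_2 \|y_{k-1} - y_k\| \|W(x_k - \hat x)\| + \rho_2 \|y_0 - y_1\| \|W(x_k - \hat x)\|.
\end{aligned}
\tag{3.41}
\]
Combining these two inequalities, and using Lemma 3.6, we have
\[ f(y_k) \le C < \infty \tag{3.42} \]
where $C$ is a constant independent of $k$.
From the above, we can see that $(\tilde x, \tilde y)$ is a feasible point of (3.3). Using (3.38) and $y_k \to \tilde y$, $W x_k \to W \tilde x$, we have
\[ \limsup_{k\to\infty} |\langle \mu_k, y_k - \tilde y \rangle| \le \sum_{i=m+1}^{\infty} E_i \tag{3.43} \]
for all integers $m$.
Combining equation (3.43) and Lemma 3.6, we deduce that $\langle \mu_k, y_k - \tilde y \rangle \to 0$ as $k \to \infty$. Since $(\tilde x, \tilde y)$ is a feasible point of (3.3), we can replace $\hat y$ by $\tilde y$ in (3.40). Thus $\limsup_{k\to\infty} f(y_k) \le f(\tilde y)$. Combining it with (3.42) and the lower semi-continuity of $f$ implies $\lim_{k\to\infty} f(y_k) = f(\tilde y)$. Therefore
\[ \lim_{k\to\infty} D_{\mu_k} f(\tilde y, y_k) = 0. \]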
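The final part is to show $\tilde x = x^*$, $\tilde y = y^*$.
Let $\varepsilon > 0$ be any small number. Since $\langle \mu_n - \mu_m, y_n - \hat y \rangle \to 0$ as $m, n \to \infty$, and by Lemma 3.6, there exists $k_0$ such that
\[ |\langle \mu_k - \mu_{k_0}, y_k - \hat y \rangle| \le \varepsilon, \quad \rho_2 |\langle y_{k_0-1} - y_{k_0}, W(x_k - \hat x) \rangle| \le \varepsilon \]
for $k \ge k_0$. From equation (3.40), we can get
\[ f(y_k) \le f(\hat y) + \langle \mu_k - \mu_{k_0}, y_k - \hat y \rangle + \langle \mu_{k_0}, y_k - \hat y \rangle \le f(\hat y) + \varepsilon + \langle \mu_{k_0}, y_k - \hat y \rangle. \]
By equation (3.17) and Lemma 3.4, we have
\[
\begin{aligned}
\langle \mu_{k_0}, y_k - \hat y \rangle
&= -\langle \mu_{k_0}, s_k \rangle + \langle \mu_1, W(x_k - \hat x) \rangle + \rho_2 \sum_{i=2}^{k_0} \langle s_i, W(x_k - \hat x) \rangle \\
&= -\langle \mu_{k_0}, s_k \rangle - \langle \lambda_1, r_k \rangle + \rho_2 \langle y_0 - y_1, W(x_k - \hat x) \rangle - \sum_{i=2}^{k_0} \rho_1 \langle r_i, r_k \rangle \\
&\quad + \sum_{i=2}^{k_0} \rho_2 \langle (y_{i-1} - y_i) - (y_{i-2} - y_{i-1}), W(x_k - \hat x) \rangle \\
&= -\langle \mu_{k_0}, s_k \rangle - \langle \lambda_1, r_k \rangle - \sum_{i=2}^{k_0} \rho_1 \langle r_i, r_k \rangle + \rho_2 \langle y_{k_0-1} - y_{k_0}, W(x_k - \hat x) \rangle.
\end{aligned}
\]
Thus
\[ |\langle \mu_{k_0}, y_k - \hat y \rangle| \le \|\mu_{k_0}\| \|s_k\| + \|\lambda_1\| \|r_k\| + \rho_1 \|r_k\| \sum_{i=2}^{k_0} \|r_i\| + \varepsilon. \]
Therefore
\[ f(y_k) \le f(\hat y) + 2\varepsilon + \|\mu_{k_0}\| \|s_k\| + \|\lambda_1\| \|r_k\| + \rho_1 \|r_k\| \sum_{i=2}^{k_0} \|r_i\|. \]
By Lemma 3.6, we deduce that
\[ \limsup_{k\to\infty} f(y_k) \le f(\hat y) + 2\varepsilon. \]
Since $f$ is lower semi-continuous and $y_k \to \tilde y$, we get
\[ f(\tilde y) \le \liminf_{k\to\infty} f(y_k) \le \limsup_{k\to\infty} f(y_k) \le f(\hat y) + 2\varepsilon. \]
Since $\varepsilon > 0$ is arbitrary, $f(\tilde y) \le f(\hat y)$. Since $(\tilde x, \tilde y)$ is a feasible point of (3.3) and $(\hat x, \hat y)$ is an arbitrary feasible point, we obtain
\[ f(W \tilde x) = f(\tilde y) = f(y^*) = f(W x^*) = \min f(\hat y) = \min\{ f(Wx) : x \in \mathrm{dom}(W),\ Kx = b \}. \]
By Theorem 3.1, we can conclude $\tilde x = x^*$ and hence $\tilde y = y^*$.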
3.3 Regularization: noisy data case
In the real world, data are always obtained by measurement and therefore contain errors. We assume
\[ \|b^\delta - b\| \le \delta \]
where $\delta$ is a small positive number representing the noise level. In this section we consider Algorithm 4 with noisy data $b^\delta$. Due to the ill-posedness of inverse problems, this algorithm exhibits the semi-convergence property, i.e. the iterate converges toward the sought solution at the beginning and, after a critical number of iterations, eventually diverges from the sought solution due to the amplification of noise. Thus a stopping rule should be introduced to terminate the method so that a regularization property can be ensured.
Rule 1. Let $\tau > 1$ be a fixed number and define $k_\delta$ to be the integer such that
\[ E_{k_\delta}^\delta \le \max\{\rho_1, \rho_2\} \tau^2 \delta^2 < E_k^\delta, \quad 0 \le k < k_\delta. \]
We will show the regularization property of Algorithm 4 terminated by Rule 1.
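The mechanics of Rule 1 can be sketched on a scalar toy problem. The code below assumes $K = W = 1$ on $\mathbb{R}$ and $f(y) = |y| + \frac12 y^2$, and it starts checking the rule at $k = 1$, since $E_k$ involves $y_{k-1}$; all of these are assumptions of the sketch, as is the name `run_with_rule1`. Because this toy problem is well-posed, it does not exhibit semi-convergence; it only shows how the stopping index is determined.

```python
def run_with_rule1(b_delta, delta, tau=1.5, rho1=1.0, rho2=1.0, max_iter=10000):
    """Scalar Algorithm 4 (K = W = 1, f(y) = |y| + 0.5*y^2; assumptions of this
    sketch) stopped by Rule 1. Returns (k_delta, x at k_delta), or (None, x)
    if the threshold is not reached within max_iter iterations."""
    threshold = max(rho1, rho2) * tau ** 2 * delta ** 2
    x, y, lam, mu = 0.0, 0.0, 0.0, 0.0
    for k in range(1, max_iter + 1):
        y_old = y
        # x-subproblem with noisy data b_delta
        x = (rho1 * b_delta - lam + rho2 * y - mu) / (rho1 + rho2)
        # y-subproblem
        t = mu + rho2 * x
        shrunk = max(abs(t) - 1.0, 0.0)
        y = (shrunk if t > 0 else -shrunk) / (1.0 + rho2)
        # residuals and multiplier updates
        r, s = x - b_delta, x - y
        lam += rho1 * r
        mu += rho2 * s
        energy = rho1 * r * r + rho2 * s * s + rho2 * (y - y_old) ** 2
        # Rule 1: stop at the first k with E_k <= max(rho1, rho2) * tau^2 * delta^2
        if energy <= threshold:
            return k, x
    return None, x
```

With $\rho_1 = \rho_2 = 1$, the stopping condition gives $|x_{k_\delta} - b^\delta| \le \tau\delta$, so in this toy setting the returned iterate lies within $(\tau + 1)\delta$ of the exact datum $b$.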
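Lemma 3.9. Rule 1 defines a finite integer $k_\delta$. Moreover, there exist positive constants $c$ and $C$ such that
\[
\begin{aligned}
D_{\mu_n^\delta} f(\hat y, y_n^\delta) + c \sum_{k=m}^{n} E_k^\delta
&\le D_{\mu_m^\delta} f(\hat y, y_m^\delta) + \rho_2 \langle y_{m-1}^\delta - y_m^\delta, W(\hat x - x_{m+1}^\delta) \rangle \\
&\quad + C \big( \|W(\hat x - x_m^\delta)\|^2 + \|s_m^\delta\|^2 + E_m^\delta \big).
\end{aligned}
\tag{3.44}
\]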
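Proof. Similar to the derivation of (3.29), and using the assumption that $\|b^\delta - b\| \le \delta$, we can also derive
\[
\begin{aligned}
& D_{\mu_{k+1}^\delta} f(\hat y, y_{k+1}^\delta) - D_{\mu_k^\delta} f(\hat y, y_k^\delta) + D_{\mu_k^\delta} f(y_{k+1}^\delta, y_k^\delta) \\
&= -\rho_1 \|r_{k+1}^\delta\|^2 - \rho_2 \|s_{k+1}^\delta\|^2 + \rho_2 \langle y_{k-1}^\delta - y_k^\delta, W(\hat x - x_{k+1}^\delta) \rangle \\
&\quad - \rho_2 \langle y_k^\delta - y_{k+1}^\delta, W(\hat x - x_{k+1}^\delta) \rangle + \rho_1 \langle r_{k+1}^\delta, b - b^\delta \rangle \\
&\le -\rho_1 \|r_{k+1}^\delta\|^2 - \rho_2 \|s_{k+1}^\delta\|^2 + \rho_2 \langle y_{k-1}^\delta - y_k^\delta, W(\hat x - x_{k+1}^\delta) \rangle \\
&\quad - \rho_2 \langle y_k^\delta - y_{k+1}^\delta, W(\hat x - x_{k+1}^\delta) \rangle + \rho_1 \delta \|r_{k+1}^\delta\|.
\end{aligned}
\tag{3.45}
\]
Under the stopping criterion, we find that for $1 \le k < k_\delta - 1$,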
ρ1krδk+1kδ≤
p
ρ1krk+1k2+ρ2ksk+1k2
s
ρ1Ekδ
τ2max(ρ 1, ρ2)
≤ 1
τE δ