Parallel Total Variation Minimization

Diploma thesis (Diplomarbeit)

Institut für Numerische und Angewandte Mathematik

submitted by

Jahn Philipp Müller

supervised by

Prof. Dr. Martin Burger

Prof. Dr. Sergei Gorlatch

Münster, 03.11.2008


Abstract

In [ROF92] Rudin, Osher and Fatemi introduced a denoising algorithm using total variation regularization. In this work we provide a parallel algorithm for this variational minimization problem. It is based on the primal-dual formulation and hence leads to a saddle point problem for the primal and the dual variable, which we solve with a damped Newton method. The arising constraint on the dual variable is approximated with a penalty method.

We apply domain decomposition methods to divide the original problem into several subproblems. The transmission conditions arising at the interfaces of the subdomains are handled via an overlapping decomposition and the well-known Schwarz method.

To make the Message Passing Interface (MPI) available in MATLAB® we use MatlabMPI, a set of scripts provided by MIT. The numerical results show good convergence behavior and an excellent speedup of the computation.

6 Basic Parallel Principles
  6.1 MPI
    6.1.1 MatlabMPI
  6.2 Speedup
7 Numerical Realization
  7.1 Sequential Implementation
  7.2 Schur Complement
  7.3 Parallel Implementation
    7.3.1 Additive Version
    7.3.2 Multiplicative Version
    7.3.3 Remarks
8 Results
  8.1 Convergence Results
  8.2 Computation Time and Speedup

List of Figures

1.1 Parallel Computing
3.1 Anisotropic vs. isotropic total variation
4.1 Barrier method vs. penalty method, $\varepsilon = 0.1$
4.2 Barrier method vs. penalty method, $\varepsilon = 10^{-5}$
5.1 Non-overlapping domain decomposition
5.2 Poisson equation with artifacts
5.3 Overlapping domain decomposition
5.4 Poisson equation solved with the multiplicative Schwarz method
5.5 Coloring of decompositions with two colors
5.6 Coloring of decompositions with four colors
7.1 The degrees of freedom of the primal-dual problem
7.2 Sequential version
7.3 Parallel version with four processors
7.4 Simple decomposition coloring with two colors
8.1 Original and noisy test image
8.2 Iterations of the sequential algorithm
8.3 Iterations of the multiplicative parallel algorithm (2 CPUs)
8.4 Iterations of the additive parallel algorithm (4 CPUs)
8.5 Iterations of the multiplicative parallel algorithm (32 CPUs)
8.6 Iterations of the additive parallel algorithm (32 CPUs)
8.7 Additive algorithm, image size 16×16
8.8 Multiplicative algorithm, image size 16×16
8.9 Additive algorithm, image size 64×64
8.10 Multiplicative algorithm, image size 64×64
8.11 Computation time

Notation

$\bar{\mathbb{R}}$    $\mathbb{R} \cup \{+\infty\} \cup \{-\infty\}$
$\mathcal{U}^*$    dual space of $\mathcal{U}$
$L^p$    Lebesgue space, i.e. space of $p$-power integrable functions
$L^1_{loc}$    space of locally integrable functions
$BV$    space of functions of bounded variation
$W^{1,1}$    Sobolev space of functions with weak derivatives in $L^1$
$C_0^\infty$    space of infinitely differentiable functions with compact support
$H_{div}$    functions in $L^2$ with weak divergence in $L^2$
$dJ(u;v)$    directional derivative of $J$ at $u$ in direction $v$
$\chi_K$    indicator function of the set $K$ in terms of convex analysis, i.e. $\chi_K(x) = 0$ if $x \in K$ and $\infty$ otherwise
$1_K$    indicator function of the set $K$, i.e. $1_K(x) = 1$ if $x \in K$ and $0$ otherwise
$\mathrm{sgn}(p)$    signum function
$\mathrm{dom}\,J$    effective domain of a functional $J$
$J^*$    convex conjugate of a functional $J$
$J^{**}$    biconjugate of a functional $J$
$u_k \rightharpoonup u$    weak convergence
$u_k \rightharpoonup^* u$    weak* convergence
$|\cdot|_{TV}$    total variation norm
$\|\cdot\|_2$    $L^2$ norm
$\|\cdot\|_{l^p}$    natural norm of the sequence space $l^p$
$\Pi_K(x)$    projection of $x$ to $K$
$\bar{\Omega}$    closure of the set $\Omega$
$\partial\Omega$    boundary of the set $\Omega$
$\oplus$    direct sum
$\langle u, v \rangle$    duality product of $u \in \mathcal{U}$ and $v \in \mathcal{U}^*$
$\partial J(u)$    subdifferential of $J$ at point $u$

Acknowledgments

First of all I would like to thank Prof. Dr. Martin Burger for giving me the opportunity to work on this challenging and interesting topic, and for taking the time to assist me with my problems and answer all my questions.

Additionally I thank Prof. Dr. Sergei Gorlatch for being my co-advisor and also his doctoral candidate Mareike Schellmann, especially for her help in the final phase of this thesis.

I would further like to thank

• Martin Benning, Oleg Reichmann, Martin Drohmann, Alex Sawatzky and Christoph Brune for many helpful discussions and for proof-reading this thesis.

• all staff members of the Institute for Computational and Applied Mathematics. I had a great time working here.

• all my friends who have supported me during the last years.

Last but not least I would like to thank my whole family for all the support throughout the time of my studies.

Chapter 1 Introduction

The subject of this thesis is the parallelization of nonlinear imaging algorithms, particularly in the case of total variation minimization. We will limit ourselves to the consideration of the ROF (Rudin-Osher-Fatemi) model, introduced in [ROF92], but the concept might be easily adapted to other models based on convex variational problems with gradient energies.

Since total variation regularization provides some advantageous properties, like preserving edges, it is one of the most widely used denoising methods today [CS05]. It is used, e.g., in combination with the EM (Expectation Maximization) algorithm [SBW+08], [BSW+] for reconstructing measured PET^1 data.

^1 Positron emission tomography (PET) is a nuclear medicine imaging technique, where pairs of gamma photons arising from annihilation events of an injected radioactive tracer isotope are measured.

Parallelization is getting more and more important in many applications. Especially in imaging it is desirable to extend the existing 2D algorithms to the three-dimensional case. But due to the enormous computational effort, currently used workstations reach their technical limitations. One expedient is to divide the original problem into several subproblems, solve them independently on several CPUs and merge them together to a solution of the complete problem (see Fig. 1.1 for an illustration). This can be done in parallel and promises a speedup for the computation. More importantly, due to the reduction of the problem size, restrictions by technical requirements (e.g. too little main memory) become negligible. Unfortunately data dependences may arise between the subproblems, necessitating communication between the CPUs. Neglecting these dependences results in undesirable effects at the interfaces of the divisions. Hence parallel algorithms are needed to handle this issue and provide a solution coinciding with the original one.

Figure 1.1: Instead of computing the whole problem on one CPU ((a) Sequential), one can divide it into several subproblems and solve each of them on a different CPU ((b) Parallel). Communication between the CPUs may be necessary.

In Chapter 2 we are going to provide some definitions and results of the theory of convexity, duality, and optimization which we will need for the analysis of the ROF model. The latter will be introduced in Chapter 3 where also the different formulations (primal, dual and primal-dual) will be discussed in detail. In Chapter 4 we will give an overview of existing solution methods for the ROF model and subsequently discuss a primal-dual Newton method, which is the basis of our parallel algorithm.

A short introduction to domain decomposition and especially to the well-known Schwarz methods will be presented in Chapter 5. Since we use the Message Passing Interface (MPI) for the parallel implementation, we are going to illustrate the concept of MPI and how it can be made available in MATLAB® in Chapter 6. We will also mention some aspects of parallelization and speedup.

The numerical realization of the proposed primal-dual algorithm will be explained in Chapter 7 where also two similar parallel versions will be presented. Finally in Chapter 8 the convergence results as well as the attained speedup will be illustrated.

Chapter 2 Mathematical Preliminaries

In this chapter we will provide some mathematical background needed later in this thesis. Since we are interested in finding (unique) global minima of (strictly) convex functionals we will state how these minima can be computed if they exist. Therefore we will introduce the concept of derivatives and convexity as well as some important properties of duality. We will mainly follow [Bur03].

2.1 Derivatives

Similar to functions in $\mathbb{R}^n$ we want to introduce a concept of derivatives for functionals defined on Banach spaces.

Definition 2.1. Let $J : \mathcal{U} \to \mathcal{V}$ be a continuous nonlinear operator where $\mathcal{U}, \mathcal{V}$ are Banach spaces. The directional derivative of $J$ at a point $u$ in direction $v$ is defined as

\[ dJ(u;v) := \lim_{t \downarrow 0} \frac{J(u+tv) - J(u)}{t}, \]

if the limit exists.

$J$ is called Gâteaux-differentiable at $u$ if $dJ(u;v)$ exists for all $v \in \mathcal{U}$, and $dJ(u;\cdot)$ is called Gâteaux-derivative. If additionally $dJ(u;\cdot) : \mathcal{U} \to \mathcal{V}$ is continuous and linear, $J$ is called Fréchet-differentiable with Fréchet-derivative

\[ J'(u)v := dJ(u;v) \quad \forall v \in \mathcal{U}. \]

The second Fréchet-derivative is defined by

\[ J''(u)(v,w) := \lim_{t \downarrow 0} \frac{J'(u+tw)v - J'(u)v}{t}. \]

In an analogous way higher derivatives can be defined inductively. The directional derivative is also called first variation.

Remark 2.1. Note that the directional derivative $dJ(u;v)$ equals $\Phi'(t)|_{t=0}$ with $\Phi(t) := J(u+tv)$.

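Example 2.1. Let $J : L^2(\Omega) \to \mathbb{R}_+$ be defined by

\[ J(u) := \frac{\lambda}{2} \int_\Omega (u-f)^2 \, dx \]

with $f \in L^2(\Omega)$, $\lambda \in \mathbb{R}$ and $\Omega \subset \mathbb{R}^n$ being open and bounded. We set

\[ \Phi(t) := J(u+tv) = \frac{\lambda}{2} \int_\Omega (u+tv-f)^2 \, dx \]

with an arbitrary $v \in L^2(\Omega)$. Differentiating leads to

\[ \Phi'(t) = \frac{\lambda}{2} \frac{d}{dt} \int_\Omega (u+tv-f)^2 \, dx = \frac{\lambda}{2} \int_\Omega \frac{d}{dt} (u+tv-f)^2 \, dx = \lambda \int_\Omega (u+tv-f) v \, dx \]

and we obtain the first Gâteaux-derivative of $J$ as

\[ dJ(u;v) = \Phi'(0) = \lambda \int_\Omega (u-f) v \, dx = J'(u)v. \]

Since $dJ(u;\cdot)$ is continuous and linear, $J$ is Fréchet-differentiable.

With $\tilde{\Phi}(t) := J'(u+tw)v$ we can compute the second Fréchet-derivative as

\[ \tilde{\Phi}'(t)|_{t=0} = \lambda \int_\Omega w v \, dx = J''(u)(v,w). \]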
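The closed-form derivative of Example 2.1 can be checked numerically. The following is a minimal sketch, assuming a uniform grid on $\Omega = (0,1)$ with the integral replaced by a Riemann sum; the function names `J` and `dJ`, the grid size and the random test data are illustrative choices and not part of the thesis.

```python
import numpy as np

def J(u, f, lam, h):
    """Discrete data term: lam/2 * sum((u - f)^2) * h approximates the integral."""
    return 0.5 * lam * np.sum((u - f) ** 2) * h

def dJ(u, v, f, lam, h):
    """Closed-form directional derivative lam * integral of (u - f) * v."""
    return lam * np.sum((u - f) * v) * h

rng = np.random.default_rng(0)
n = 200
h = 1.0 / n                       # grid spacing on Omega = (0, 1) (assumption)
f, u, v = rng.standard_normal((3, n))
lam = 2.0

# One-sided difference quotient from Definition 2.1 with a small t > 0
t = 1e-6
finite_difference = (J(u + t * v, f, lam, h) - J(u, f, lam, h)) / t
closed_form = dJ(u, v, f, lam, h)
```

The two values differ only by the term $t \, \frac{\lambda}{2} \int_\Omega v^2 \, dx$, which vanishes as $t \downarrow 0$.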

2.2 Convexity

We will see that convex functionals provide some advantageous properties like the concept of subdifferentials or the uniqueness of global minima in the case of strict convexity.

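Definition 2.2. A set $\mathcal{C} \subset \mathcal{U}$ is called convex, if for all $\alpha \in [0,1]$ and $u, v \in \mathcal{C}$:

\[ \alpha u + (1-\alpha) v \in \mathcal{C}. \]

Let $\mathcal{U}$ be a Banach space and $\mathcal{C} \subset \mathcal{U}$ convex. A functional $J : \mathcal{C} \to \bar{\mathbb{R}}$ is called convex, if for all $\alpha \in [0,1]$ and $u, v \in \mathcal{C}$:

\[ J(\alpha u + (1-\alpha) v) \le \alpha J(u) + (1-\alpha) J(v). \tag{2.1} \]

If the inequality (2.1) holds strictly (except for $u = v$ or $\alpha \in \{0,1\}$), $J$ is called strictly convex.

An optimization problem

\[ J(u) \to \min_{u \in \mathcal{C}} \]

is called convex, if $J$ as well as $\mathcal{C}$ is convex.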

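Example 2.2. The indicator function of a convex set $\mathcal{C}$ is convex. Let

\[ J(u) = \chi_{\mathcal{C}}(u) := \begin{cases} 0 & \text{if } u \in \mathcal{C} \\ +\infty & \text{else,} \end{cases} \]

then for $\alpha \in (0,1)$

\[ \alpha J(u) + (1-\alpha) J(v) = \begin{cases} 0 & \text{if } u, v \in \mathcal{C} \\ +\infty & \text{else} \end{cases} \]

and

\[ J(\alpha u + (1-\alpha) v) = \begin{cases} 0 & \text{if } \alpha u + (1-\alpha) v \in \mathcal{C} \\ +\infty & \text{else.} \end{cases} \]

Therefore $J(\alpha u + (1-\alpha) v) > \alpha J(u) + (1-\alpha) J(v)$ would only be possible in the case where $u, v \in \mathcal{C}$. But then $\alpha u + (1-\alpha) v \in \mathcal{C}$, due to the convexity of $\mathcal{C}$, and hence

\[ J(\alpha u + (1-\alpha) v) = 0 = \alpha J(u) + (1-\alpha) J(v). \]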

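Remark 2.2. Due to the fact that $\tilde{J}$ defined by

\[ \tilde{J}(u) := \begin{cases} J(u) & \text{if } u \in \mathcal{C} \\ \infty & \text{else} \end{cases} \]

is again convex for a convex functional $J : \mathcal{C} \to \bar{\mathbb{R}}$, we may regard convex functionals as being defined on the whole space $\mathcal{U}$.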

In order to generalize differentiability for non-Fréchet-differentiable convex functionals, we introduce the concept of subgradients:

Definition 2.3. Let $\mathcal{U}$ be a Banach space with dual $\mathcal{U}^*$ and $J : \mathcal{U} \to \bar{\mathbb{R}}$ be convex. Then the subdifferential $\partial J(u)$ at a point $u$ is defined as:

\[ \partial J(u) := \{ p \in \mathcal{U}^* \mid J(w) \ge J(u) + \langle p, w-u \rangle, \ \forall w \in \mathcal{U} \}. \tag{2.2} \]

$J$ is called subdifferentiable at $u$ if $\partial J(u)$ is not empty.

An element $p \in \partial J(u)$ is called subgradient of $J$ at point $u$.

Example 2.3. As an example we take a look at the Euclidean norm $f : \mathbb{R}^n \to \mathbb{R}$, $f(x) = |x|$. Although it is not differentiable in $x = 0$, it is subdifferentiable at every $x \in \mathbb{R}^n$. The subdifferential is given by

\[ \partial f(x) = \begin{cases} \{ \hat{x} \in \mathbb{R}^n \mid |w| \ge \langle \hat{x}, w \rangle, \ \forall w \in \mathbb{R}^n \} & \text{if } x = 0 \\ \left\{ \frac{x}{|x|} \right\} & \text{else.} \end{cases} \]

Thus in $x = 0$ the subdifferential consists of the whole Euclidean unit ball, for instance the interval $[-1,1]$ in the case $n = 1$.
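The subgradient inequality (2.2) for the Euclidean norm can be probed numerically. The sketch below samples random test points $w$ and checks $|w| \ge |x| + \langle p, w - x \rangle$; the helper name `subgradient_inequality_holds`, the sampling range and the specific vectors are assumptions made for illustration. A passed random test is only evidence, not a proof, whereas a single violated sample does disprove membership in $\partial f(x)$.

```python
import numpy as np

def subgradient_inequality_holds(p, x, trials=1000, seed=0):
    """Check |w| >= |x| + <p, w - x> for many random test points w."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal((trials, x.size)) * 3.0
    lhs = np.linalg.norm(w, axis=1)
    rhs = np.linalg.norm(x) + (w - x) @ p
    return bool(np.all(lhs >= rhs - 1e-12))

x0 = np.zeros(2)
inside = np.array([0.6, -0.3])      # |p| <= 1: a subgradient at 0
outside = np.array([1.2, 0.0])      # |p| > 1: not a subgradient at 0

x1 = np.array([3.0, 4.0])
gradient = x1 / np.linalg.norm(x1)  # the unique subgradient away from 0
```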

Remark 2.3. It can easily be seen that if $J : \mathcal{U} \to \bar{\mathbb{R}}$ is a convex Fréchet-differentiable functional then

\[ \partial J(u) = \{ J'(u) \} \]

holds (see [Bur03, Proposition 3.6]). In general it is not true that $\partial J(u)$ is a singleton, as we have seen in Example 2.3.

We now want to state a criterion for strict convexity.

Theorem 2.1. Let $\mathcal{C} \subset \mathcal{U}$ be open and convex and let the functional $J : \mathcal{C} \to \bar{\mathbb{R}}$ be twice continuously Fréchet-differentiable. Then $J''(u)(v,v) > 0$ for all $u \in \mathcal{C}$ and $v \in \mathcal{U} \setminus \{0\}$ implies strict convexity of $J$.

For a proof see [Bur03, Proposition 3.3].

One tremendous advantage of strictly convex functionals is the uniqueness of a global minimum.

Theorem 2.2. Let $J : \mathcal{U} \to \bar{\mathbb{R}}$ and

\[ J(u) \to \min_{u \in \mathcal{U}} \]

be a strictly convex optimization problem. Then there exists at most one local minimum, which is a global one.

Proof. Let $u$ be a local minimum of $J$ and assume that it is not a global minimum. Then there exists $\hat{u} \in \mathcal{U}$ with $J(\hat{u}) < J(u)$. Let us define

\[ u_\alpha := \alpha \hat{u} + (1-\alpha) u \in \mathcal{U} \quad \text{for all } \alpha \in [0,1]. \]

Due to the (strict) convexity of $J$,

\[ J(u_\alpha) \le \alpha J(\hat{u}) + (1-\alpha) J(u) < J(u) \quad \text{for all } \alpha \in (0,1]. \]

Since $u_\alpha \to u$ as $\alpha \to 0$, this is a contradiction to $u$ being a local minimum. Hence $u$ is a global minimum.

Now let $u, v$ be two global minima of $J$. For $u \ne v$ this implies

\[ J(\alpha u + (1-\alpha) v) < \alpha J(u) + (1-\alpha) J(v) = \inf J \]

for $\alpha \in \, ]0,1[$, which is a contradiction to the assumption.

2.3 Duality

The concept of duality is very important in the theory of optimization. Instead of considering the given primal problem, one can deduce the complementary dual problem, which may be easier to solve. Let us recall some definitions first.

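Definition 2.4. A functional $J : \mathcal{U} \to \bar{\mathbb{R}}$ is called proper if $J(u) \ne -\infty$ for all $u \in \mathcal{U}$ and there exists $u \in \mathcal{U}$ with $J(u) \ne +\infty$.

The set

\[ \mathrm{dom}\,J := \{ u \in \mathcal{U} \mid J(u) < \infty \} \]

is called the effective domain of $J$.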

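In the following we consider proper functionals only. We also need a weaker concept of continuity:

Definition 2.5. A functional $J : \mathcal{U} \to \bar{\mathbb{R}}$ is called lower semi-continuous at a point $u$ if

\[ J(u) \le \liminf_{k \to \infty} J(u_k) \]

for every sequence $u_k \to u$. $J$ is called upper semi-continuous if $-J$ is lower semi-continuous.

Obviously a functional is continuous at a point $u$ if and only if it is upper and lower semi-continuous at $u$.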

Definition 2.6. Let $J : \mathcal{U} \to \bar{\mathbb{R}}$ (not necessarily convex), then the convex conjugate (or Legendre-Fenchel transform) $J^* : \mathcal{U}^* \to \bar{\mathbb{R}}$ is defined by

\[ J^*(p) := \sup_{u \in \mathcal{U}} \{ \langle u, p \rangle - J(u) \}. \]

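Example 2.4. Consider again the indicator function of a convex set $\mathcal{C}$:

\[ J(u) = \chi_{\mathcal{C}}(u) := \begin{cases} 0 & \text{if } u \in \mathcal{C} \\ +\infty & \text{else.} \end{cases} \]

Then

\[ J^*(v) = \sup_{u \in \mathcal{U}} \{ \langle u, v \rangle - \chi_{\mathcal{C}}(u) \} = \sup_{u \in \mathcal{C}} \{ \langle u, v \rangle \}. \]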
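As a concrete instance of Example 2.4, consider the one-dimensional case $\mathcal{U} = \mathbb{R}$ with $\mathcal{C} = [-1,1]$ (a choice made here purely for illustration). The supremum of the linear function $u \mapsto uv$ over the interval is attained at the endpoint $u = \mathrm{sgn}(v)$:

```latex
\[
  \chi_{[-1,1]}^*(v)
  = \sup_{u \in [-1,1]} \{ u v \}
  = \mathrm{sgn}(v)\, v
  = |v| .
\]
```

So the conjugate of the indicator function of the unit interval is the absolute value, which is the scalar prototype of the pairing between the constraint $\|p\|_\infty \le 1$ and the total variation.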

As its name implies, the convex conjugate is always convex.

Lemma 2.1. Let $\mathcal{U}$ be a Banach space and $J : \mathcal{U} \to \bar{\mathbb{R}}$. Then $J^*$ is convex and lower semi-continuous.

Proof. The convex conjugate of $J$ is given by

\[ J^*(p) = \sup_{u \in \mathcal{U}} \{ \langle u, p \rangle - J(u) \} = \sup_{u \in \mathrm{dom}\,J} \{ \langle u, p \rangle - J(u) \}, \]

i.e. $J^*$ is the pointwise supremum of the family of continuous affine functions $\langle u, \cdot \rangle - J(u)$, $u \in \mathrm{dom}\,J$, of $\mathcal{U}^*$ into $\mathbb{R}$, and hence $J^*$ is lower semi-continuous and convex [ET76, Definition 4.1, p.17].

One may also build the biconjugate J∗∗ (as the convex conjugate of the convex conjugate) and achieve the following result:

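Theorem 2.3. Let $\mathcal{U}$ be a reflexive Banach space (i.e. $\mathcal{U} = \mathcal{U}^{**}$), $J : \mathcal{U} \to \bar{\mathbb{R}}$ and $J^{**}$ its biconjugate. Then $J = J^{**}$ if and only if $J$ is convex and lower semi-continuous.

Proof. Let $J$ be convex and lower semi-continuous and $\bar{u} \in \mathcal{U}$ arbitrary. We will show that $J^{**}(\bar{u}) = J(\bar{u})$. First,

\[
\begin{aligned}
J^{**}(\bar{u}) &= \sup_{p \in \mathcal{U}^*} \{ \langle p, \bar{u} \rangle - J^*(p) \} \\
&= \sup_{p \in \mathcal{U}^*} \Big\{ \langle p, \bar{u} \rangle - \sup_{v \in \mathcal{U}} \{ \langle p, v \rangle - J(v) \} \Big\} \\
&= \sup_{p \in \mathcal{U}^*} \inf_{v \in \mathcal{U}} \{ \langle p, \bar{u} - v \rangle + J(v) \} \\
&\le J(\bar{u}),
\end{aligned}
\]

since for every $p$ the infimum is bounded by its value at $v = \bar{u}$, i.e. by $\langle p, \bar{u} - \bar{u} \rangle + J(\bar{u}) = J(\bar{u})$.

Since $J$ is proper there exists $\bar{a} \in \mathbb{R}$ with $\bar{a} < J(\bar{u})$. Take such an $\bar{a}$ arbitrary. Furthermore, due to the convexity and lower semi-continuity, the epigraph of $J$

\[ \mathrm{epi}\,J = \{ (u,a) \in \mathcal{U} \times \mathbb{R} \mid J(u) \le a \} \]

is a closed convex set (cf. [ET76, Proposition 2.1 and 2.3]) which does not contain the point $(\bar{u}, \bar{a})$. Hence, applying the Hahn-Banach Theorem, we can strictly separate the epigraph of $J$ and the point $(\bar{u}, \bar{a})$ by a closed affine hyperplane $H$ of $\mathcal{U} \times \mathbb{R}$ given by

\[ H = \{ (u,a) \in \mathcal{U} \times \mathbb{R} \mid \langle q, u \rangle + \alpha a = \beta \} \]

with $\alpha, \beta \in \mathbb{R}$. We thus have:

\[ \langle q, \bar{u} \rangle + \alpha \bar{a} < \beta \tag{2.3} \]

\[ \langle q, u \rangle + \alpha a > \beta \quad \forall (u,a) \in \mathrm{epi}\,J. \tag{2.4} \]

If $J(\bar{u}) < \infty$ we can take $u = \bar{u}$ and $a = J(\bar{u})$ in (2.4) and obtain together with (2.3):

\[ \langle q, \bar{u} \rangle + \alpha J(\bar{u}) > \beta > \langle q, \bar{u} \rangle + \alpha \bar{a}. \]

This implies

\[ \alpha \underbrace{(J(\bar{u}) - \bar{a})}_{> 0} > 0 \]

and hence $\alpha > 0$. When (2.4) is divided by $\alpha$ we can conclude, taking $(u,a) = (v, J(v))$,

\[ \frac{1}{\alpha} \langle q, v \rangle + J(v) > \frac{\beta}{\alpha} \quad \forall v \in \mathrm{dom}\,J, \]

and with $p = -\frac{q}{\alpha}$ we obtain:

\[
\begin{aligned}
J^{**}(\bar{u}) &= \sup_{\tilde{p} \in \mathcal{U}^*} \inf_{v \in \mathcal{U}} \{ \langle \tilde{p}, \bar{u} - v \rangle + J(v) \} \\
&\ge \inf_{v \in \mathcal{U}} \big\{ \langle p, \bar{u} \rangle + \underbrace{J(v) - \langle p, v \rangle}_{> \beta / \alpha} \big\} \\
&\ge \langle p, \bar{u} \rangle + \frac{\beta}{\alpha} = -\frac{1}{\alpha} \langle q, \bar{u} \rangle + \frac{\beta}{\alpha} \\
&\overset{(2.3)}{\ge} \frac{1}{\alpha} (\alpha \bar{a} - \beta) + \frac{\beta}{\alpha} = \bar{a}.
\end{aligned}
\]

Hence $J^{**}(\bar{u}) \ge \bar{a}$ for all $\bar{a} < J(\bar{u})$, which implies

\[ J^{**}(\bar{u}) \ge J(\bar{u}). \]

If $J(\bar{u}) = +\infty$ then, by letting $\bar{a}$ tend to $+\infty$ (resp. $\bar{a}$ to $-\infty$) for $\alpha > 0$ ($\alpha < 0$), (2.3) yields $\alpha = 0$. Thus we have (cf. (2.3) and (2.4)):

\[ \langle q, \bar{u} \rangle < \beta \tag{2.5} \]

\[ \langle q, u \rangle > \beta \quad \forall u \in \mathrm{dom}\,J. \tag{2.6} \]

Now let

\[ \beta - \langle q, u \rangle = \gamma < 0 \tag{2.7} \]

and $p = -cq$, with $\gamma \in \mathbb{R}$ and $c > 0$. Then

\[
\begin{aligned}
J^{**}(\bar{u}) &= \sup_{\tilde{p} \in \mathcal{U}^*} \inf_{v \in \mathcal{U}} \{ \langle \tilde{p}, \bar{u} - v \rangle + J(v) \} \\
&\ge \inf_{v \in \mathcal{U}} \big\{ \underbrace{-c \langle q, \bar{u} \rangle}_{\overset{(2.5)}{>} -c\beta} + c \underbrace{\langle q, v \rangle}_{\overset{(2.7)}{=} \beta - \gamma} + J(v) \big\} \\
&\ge \inf_{v \in \mathcal{U}} \{ -c\beta + c\beta - c\gamma + J(v) \} \\
&= \inf_{v \in \mathcal{U}} \{ J(v) - c\gamma \}.
\end{aligned}
\]

Since $\gamma < 0$, $-c\gamma$ tends to $\infty$ for $c \to \infty$ and hence

\[ J^{**}(\bar{u}) \ge \infty = J(\bar{u}). \]

Conversely, assume that $J$ is not convex and lower semi-continuous. Since Lemma 2.1 yields the convexity and lower semi-continuity of $J^{**} = (J^*)^*$, $J$ cannot be equal to $J^{**}$.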

Example 2.5. In Example 2.4 we have computed the convex conjugate of $J(u) = \chi_{\mathcal{C}}(u)$ as $J^*(v) = \sup_{u \in \mathcal{C}} \{ \langle u, v \rangle \}$. With the convexity of $J(u)$ (see Example 2.2) we obtain

\[ J^{**}(u) = J(u) \]

and hence the convex conjugate of $\sup_{u \in \mathcal{C}} \{ \langle u, v \rangle \}$ is given by $\chi_{\mathcal{C}}(u)$.

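Lemma 2.2. Let $J : \mathcal{U} \to \bar{\mathbb{R}}$ (not necessarily convex) and $J^* : \mathcal{U}^* \to \bar{\mathbb{R}}$ its convex conjugate, then

\[ p \in \partial J(u) \Rightarrow u \in \partial J^*(p). \]

Proof. Let $p \in \partial J(u)$; then by definition

\[ J(w) \ge J(u) + \langle p, w-u \rangle \tag{2.8} \]

holds for all $w \in \mathcal{U}$. Let $v \in \mathcal{U}^*$ be arbitrary, then

\[
\begin{aligned}
J^*(p) + \langle u, v-p \rangle &= \sup_{w \in \mathcal{U}} \{ \langle p, w \rangle - J(w) \} + \langle u, v-p \rangle \\
&= \sup_{w \in \mathcal{U}} \{ \langle p, w-u \rangle - J(w) \} + \langle u, v \rangle \\
&\overset{(2.8)}{\le} \sup_{w \in \mathcal{U}} \{ -J(u) \} + \langle u, v \rangle \\
&\le \sup_{u \in \mathcal{U}} \{ -J(u) + \langle u, v \rangle \} \\
&= J^*(v)
\end{aligned}
\]

holds. Since $v$ was chosen arbitrarily we have $J^*(v) \ge J^*(p) + \langle u, v-p \rangle$ for all $v \in \mathcal{U}^*$, which is equivalent to $u \in \partial J^*(p)$.

Note that if $J$ is convex and lower semi-continuous, then equivalence holds in Lemma 2.2. This follows from $u \in \partial J^*(p) \Rightarrow p \in \partial J^{**}(u)$ and $J = J^{**}$.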

2.4 Optimization

To conclude this chapter we want to state how to obtain existence of a global minimum and which optimality conditions have to be fulfilled for convex functionals. The fundamental theorem of optimization provides the existence of a global minimum from lower semi-continuity in combination with compactness.

Theorem 2.4. Let $J : \mathcal{U} \to \bar{\mathbb{R}}$ be a proper, lower semi-continuous functional and let there exist a non-empty and compact level set

\[ S := \{ u \in \mathcal{U} \mid J(u) \le M \} \]

for some $M \in \mathbb{R}$. Then $J$ attains a global minimum, i.e. $\min_{u \in \mathcal{U}} J(u)$ exists.

For a proof see [Bur03, Theorem 2.3].

In the case of infinite-dimensional problems compactness does not follow from boundedness. Fortunately a similar property holds for (the dual of) Banach spaces. Therefore let us recall the definition of weak and weak* convergence.

Definition 2.7. Let $\mathcal{U}$ be a Banach space and $\mathcal{U}^*$ its dual space. Then weak convergence is defined as

\[ u_k \rightharpoonup u \ :\Leftrightarrow \ \langle v, u_k \rangle \to \langle v, u \rangle \quad \forall v \in \mathcal{U}^* \]

and weak* convergence is defined as

\[ v_k \rightharpoonup^* v \ :\Leftrightarrow \ \langle v_k, u \rangle \to \langle v, u \rangle \quad \forall u \in \mathcal{U}. \]

The theorem of Banach-Alaoglu provides the compactness of the set $\{ v \in \mathcal{U}^* \mid \|v\|_{\mathcal{U}^*} \le C \}$, $C \in \mathbb{R}_+$, in the weak* topology.

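Let $\mathcal{U}$ be a Banach space, $\mathcal{C} \subset \mathcal{U}$ be convex, and

\[ J : \mathcal{U} \to \bar{\mathbb{R}} \]

be a functional. Since we are interested in solutions of a constrained minimization problem

\[ J(u) \to \min_{u \in \mathcal{C}} \]

we can also set $\tilde{J} : \mathcal{U} \to \bar{\mathbb{R}}$ with

\[ \tilde{J}(u) = \begin{cases} J(u) & \text{if } u \in \mathcal{C} \\ \infty & \text{if } u \in \mathcal{U} \setminus \mathcal{C} \end{cases} \]

and consider the following unconstrained minimization problem:

\[ \tilde{J}(u) \to \min_{u \in \mathcal{U}}. \]

It is obvious from Definition 2.5 and Remark 2.2 that $\tilde{J}$ is convex and lower semi-continuous, and hence without loss of generality we can assume $J$ to be a functional $J : \mathcal{U} \to \bar{\mathbb{R}}$ defined on the whole space $\mathcal{U}$.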

Another advantage of convex functionals is that one has to consider only the first derivative to characterize a minimum. For general functionals we have the following necessary first-order condition:

Lemma 2.3. Let $J : \mathcal{U} \to \bar{\mathbb{R}}$ be Fréchet-differentiable and let $u$ be a local minimum of $J$. Then $J'(u) = 0$ holds.

For a proof see [Bur03, Proposition 2.8].

Also for non Fr´echet-differentiable convex functionals we can state a necessary and sufficient criterion for a minimum in the sense of subgradients:

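Lemma 2.4. Let $J : \mathcal{U} \to \bar{\mathbb{R}}$ be a convex functional. Then $u \in \mathcal{U}$ is a minimum of $J$ if and only if $0 \in \partial J(u)$.

Proof. Let $0 \in \partial J(u)$; then we have with (2.2):

\[ J(w) \ge J(u) + \underbrace{\langle 0, w-u \rangle}_{= 0} \quad \forall w \in \mathcal{U}. \]

So $u$ is a global minimum of $J$. On the other hand let $0 \notin \partial J(u)$; then there exists at least one $w \in \mathcal{U}$ with

\[ J(w) < J(u) + \underbrace{\langle 0, w-u \rangle}_{= 0} = J(u), \]

and hence $u$ cannot be a minimum of $J$.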
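A scalar illustration of Lemma 2.4: for the convex function $u \mapsto \frac{\lambda}{2}(u-f)^2 + |u|$ on $\mathbb{R}$, the condition $0 \in \lambda(u-f) + \partial|u|$ is solved by the well-known soft-thresholding formula. The sketch below is an assumption-laden toy example (scalar unknown, hypothetical function names `soft_threshold` and `satisfies_optimality`), using the subdifferential of the absolute value from Example 2.3.

```python
import numpy as np

def soft_threshold(f, lam):
    """Minimizer of lam/2 * (u - f)^2 + |u| over real u."""
    return np.sign(f) * np.maximum(np.abs(f) - 1.0 / lam, 0.0)

def satisfies_optimality(u, f, lam, tol=1e-12):
    """Check 0 in lam * (u - f) + d|u|, i.e. lam * (f - u) in d|u|."""
    p = lam * (f - u)                  # candidate subgradient of |.| at u
    if u == 0.0:
        return abs(p) <= 1.0 + tol     # d|0| = [-1, 1]
    return abs(p - np.sign(u)) <= tol  # d|u| = {sgn(u)} for u != 0

lam = 2.0
samples = [-2.0, -0.3, 0.0, 0.2, 1.5]
minimizers = [soft_threshold(f, lam) for f in samples]
```

Data values with $|f| \le 1/\lambda$ are mapped to $u = 0$, where the subdifferential is the whole interval $[-1,1]$; this is the mechanism by which a non-differentiable term can produce exactly flat regions.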

Chapter 3 Total Variation Regularization and the ROF Model

In this chapter we want to give a brief introduction to total variation regularization, in particular by means of the ROF model. We will specify three different formulations (primal, dual and primal-dual) and their derivation. In the following $\Omega \subset \mathbb{R}^d$ will denote an open bounded set with Lipschitz boundary.

Let $f : \Omega \subset \mathbb{R}^d \to \mathbb{R}$ be a noisy version of a given image $u_0$ with noise variance given by

\[ \int_\Omega (u_0 - f)^2 \, dx \le \sigma^2. \tag{3.1} \]

The ROF (Rudin-Osher-Fatemi) model, first introduced in [ROF92], is based on using the total variation as a regularization term to find a denoised image $\hat{u}$. It is defined by

\[ \hat{u} = \arg\min_{u \in BV(\Omega)} \Big\{ \underbrace{\frac{\lambda}{2} \int_\Omega (u-f)^2 \, dx}_{\text{data fitting}} + \underbrace{|u|_{TV}}_{\text{regularization}} \Big\} \tag{3.2} \]

where $\lambda$ is a positive parameter specifying the intensity of regularization, which should be set depending on the noise variance $\sigma$ (i.e. $\lambda \to \infty$ for $\sigma \to 0$). Here $|u|_{TV}$ denotes the so-called total variation of $u$

\[ |u|_{TV} := \sup_{\substack{\varphi \in C_0^\infty(\Omega)^d \\ \|\varphi\|_\infty \le 1}} \int_\Omega u \, \nabla \cdot \varphi \, dx. \tag{3.3} \]

In the literature the notation $\int_\Omega |Du|$ is also used for the total variation of $u$, corresponding to the interpretation of $Du$ as a vector measure.

For $u \in W^{1,1}(\Omega)$ we have

\[ |u|_{TV} = \int_\Omega |\nabla u| \, dx. \]

$BV$ denotes the space of functions with bounded total variation

\[ BV(\Omega) = \{ u \in L^1(\Omega) \mid |u|_{TV} < \infty \}, \]

which is a Banach space endowed with the norm

\[ \|u\|_{BV} := |u|_{TV} + \|u\|_{L^1}. \]

$|\cdot|_{TV}$ is lower semi-continuous with respect to the strong topology in $L^1_{loc}(\Omega)$ ([AFP00, Proposition 3.6]) and hence, due to the embedding $L^2(\Omega) \subset L^1(\Omega)$ for bounded $\Omega$, also with respect to $L^2(\Omega)$.

Note that (3.3) is not unique for $d > 1$. Depending on the exact definition of the supremum norm

\[ \|p\|_\infty := \operatorname*{ess\,sup}_{x \in \Omega} \|p(x)\|_{l^r} \]

we obtain a family of equivalent seminorms:

\[ \int_\Omega |Du|_{l^s} = \sup_{\substack{\varphi \in C_0^\infty(\Omega)^d \\ \|\varphi\|_\infty \le 1}} \int_\Omega u \, \nabla \cdot \varphi \, dx \]

with $1 \le s \le \infty$ and its Hölder conjugate $r$ (i.e. $\frac{1}{s} + \frac{1}{r} = 1$).

For example (cf. [Bur08]) we obtain the isotropic total variation ($r = 2$) which coincides with

\[ |u|_{TV} = \int_\Omega \sqrt{\sum_i (u_{x_i})^2} \, dx \quad \forall u \in C^1 \]

or a (cubically) anisotropic total variation ($r = \infty$) which coincides with

\[ |u|_{TV} = \int_\Omega \sum_i |u_{x_i}| \, dx \quad \forall u \in C^1. \tag{3.5} \]

As expected, the different definitions affect the nature of the minimizers of (3.2). In the case of the isotropic total variation, corners in the edge set will not be allowed [Mey01], whereas orthogonal corners are favored by the anisotropic variant [EO04]. See Figure 3.1 for an illustration.
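The difference between the two variants can be made tangible on a pixel grid. The following sketch assumes a simple forward-difference discretization with unit grid spacing (one of several possible choices, not necessarily the one used later in this thesis) and compares both discrete total variations for an axis-aligned and a diagonal edge; all function names are illustrative.

```python
import numpy as np

def forward_gradient(u):
    """Forward differences, set to zero in the last column/row."""
    ux = np.zeros_like(u)
    uy = np.zeros_like(u)
    ux[:, :-1] = u[:, 1:] - u[:, :-1]
    uy[:-1, :] = u[1:, :] - u[:-1, :]
    return ux, uy

def tv_isotropic(u):
    """Sum over pixels of the Euclidean norm of the discrete gradient."""
    ux, uy = forward_gradient(u)
    return float(np.sum(np.sqrt(ux ** 2 + uy ** 2)))

def tv_anisotropic(u):
    """Sum over pixels of the l1 norm of the discrete gradient, cf. (3.5)."""
    ux, uy = forward_gradient(u)
    return float(np.sum(np.abs(ux) + np.abs(uy)))

n = 64
rows, cols = np.mgrid[0:n, 0:n]
axis_edge = (cols >= n // 2).astype(float)        # edge parallel to an axis
diagonal_edge = (rows + cols >= n).astype(float)  # edge at 45 degrees
```

For the axis-aligned edge both values agree, while for the diagonal edge the anisotropic value is larger by a factor close to $\sqrt{2}$. The anisotropic variant therefore penalizes edges that are not aligned with the coordinate axes more strongly, which is consistent with its preference for orthogonal corners.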

Figure 3.1: Different definitions of the supremum norm have effects on the nature of minimizers of (3.2). Panels: (a) original image, (b) noisy image, (c) isotropic TV, (d) anisotropic TV. Images are taken from [BBD+06].

Overall the aim is to minimize the following functional to obtain a denoised version of the noisy image $f$:

\[ J(u) := \underbrace{\frac{\lambda}{2} \int_\Omega (u-f)^2 \, dx}_{=: J_d(u)} + \underbrace{|u|_{TV}}_{=: J_r(u)}. \tag{3.6} \]

This is a strictly convex optimization problem and hence provides the advantages stated in Chapter 2.

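Lemma 3.1. $J$ as defined in (3.6) is strictly convex.

Proof. In Example 2.1 we have computed the second Fréchet-derivative of the data fitting term $J_d(u)$ as $J_d''(u)(v,w) = \lambda \int_\Omega w v \, dx$. Hence, since $\lambda > 0$,

\[ J_d''(u)(v,v) = \lambda \int_\Omega v^2 \, dx > 0 \quad \forall u \in BV(\Omega) \text{ and } v \ne 0, \]

and with Theorem 2.1 we obtain strict convexity of $J_d$.

Furthermore $|u|_{TV}$ is convex:

\[
\begin{aligned}
|\alpha u + (1-\alpha) v|_{TV} &= \sup_{\|\varphi\|_\infty \le 1} \int_\Omega (\alpha u + (1-\alpha) v) \, \nabla \cdot \varphi \, dx \\
&\le \alpha \sup_{\|\varphi\|_\infty \le 1} \int_\Omega u \, \nabla \cdot \varphi \, dx + (1-\alpha) \sup_{\|\varphi\|_\infty \le 1} \int_\Omega v \, \nabla \cdot \varphi \, dx \\
&= \alpha |u|_{TV} + (1-\alpha) |v|_{TV}.
\end{aligned}
\]

All in all $J(u) = J_d(u) + |u|_{TV}$ is strictly convex.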

Due to the convexity of $J(u)$ we can apply Lemma 2.4 and obtain an optimality condition in terms of subgradients for a minimum as:

\[ 0 \in \partial J(u). \]

$J_d$ and $|\cdot|_{TV}$ are convex (see Lemma 3.1), lower semi-continuous and do not take the value $-\infty$ (actually both functionals are non-negative). Hence $J$ is a lower semi-continuous convex functional defined over a Banach space and thus continuous over the interior of its effective domain (see [ET76, Corollary 2.5]). We thus can conclude the existence of $\tilde{u} \in \mathrm{dom}\,J_d \cap \mathrm{dom}\,|\cdot|_{TV}$ where $J_d$ is continuous, and with [ET76, Proposition 5.6] we have

\[ \partial J(u) = \partial (J_d + |\cdot|_{TV})(u) = \partial J_d(u) + \partial |u|_{TV}. \]

Recall that Remark 2.3 gives us $\partial J_d(u) = \{ J_d'(u) \}$ and hence the optimality condition can be stated as

\[ 0 \in \lambda (u-f) + \partial |u|_{TV}. \tag{3.7} \]

Remark 3.1. Defining $K$ as the closure of the convex set

\[ \{ \nabla \cdot p \mid p \in C_0^\infty(\Omega)^d, \ \|p\|_\infty \le 1 \} \]

and using Example 2.5 we obtain the convex conjugate of $J_r(u) = |u|_{TV}$ as

\[ J_r^*(v) = \chi_K(v) := \begin{cases} 0 & \text{if } v \in K \\ +\infty & \text{else.} \end{cases} \]

3.1 Primal Formulation

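The primal formulation of the ROF model is given by

\[ J(u) = \frac{\lambda}{2} \int_\Omega (u-f)^2 \, dx + \int_\Omega |\nabla u| \, dx \tag{3.8} \]

for $u$ sufficiently smooth, particularly $u \in W^{1,1}(\Omega)$. To obtain the associated Euler-Lagrange equation, we compute the first Gâteaux-derivative of $J$. For this purpose we set

\[ \Phi(t) := J(u+tv) = \frac{\lambda}{2} \int_\Omega (u+tv-f)^2 \, dx + \int_\Omega |\nabla (u+tv)| \, dx \]

with an arbitrary $v \in BV(\Omega)$. Differentiating leads to (assuming $|\nabla (u+tv)| \ne 0$)

\[ \Phi'(t) = \lambda \int_\Omega (u+tv-f) v \, dx + \int_\Omega \frac{\nabla (u+tv)}{|\nabla (u+tv)|} \cdot \nabla v \, dx. \]

Hence (assuming $|\nabla u| \ne 0$)

\[
\begin{aligned}
\Phi'(0) &= \lambda \int_\Omega (u-f) v \, dx + \int_\Omega \frac{\nabla u}{|\nabla u|} \cdot \nabla v \, dx \\
&= \lambda \int_\Omega (u-f) v \, dx - \int_\Omega \nabla \cdot \left( \frac{\nabla u}{|\nabla u|} \right) v \, dx + \int_{\partial\Omega} v \, \frac{\nabla u}{|\nabla u|} \cdot n \, ds.
\end{aligned}
\]

Since $v$ was chosen arbitrarily, and assuming homogeneous Neumann boundary conditions for $u$, which is a natural choice for images, we obtain the Euler-Lagrange equation:

\[ \lambda (u-f) - \nabla \cdot \left( \frac{\nabla u}{|\nabla u|} \right) = 0. \tag{3.9} \]

To overcome the issue with the singularity at $\nabla u = 0$ the TV norm is often perturbed as follows:

\[ |\nabla u|_\beta := \sqrt{|\nabla u|^2 + \beta}, \tag{3.10} \]

or in the anisotropic case:

\[ |\nabla u|_\beta := \sum_i \sqrt{|u_{x_i}|^2 + \beta}, \tag{3.11} \]

with a small positive parameter $\beta$. The choice of $\beta$ is of great importance: a value chosen too small leaves the problem close to degenerate, whereas an overly large $\beta$ leads to undesirably smoothed edges.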
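The trade-off in the choice of $\beta$ can be quantified in the scalar case. Writing $\phi_\beta(x) := \sqrt{x^2 + \beta}$ for the perturbed absolute value (the symbol $\phi_\beta$ is introduced here only for illustration), a short calculation gives:

```latex
\[
  \phi_\beta'(x) = \frac{x}{\sqrt{x^2 + \beta}} \in (-1, 1), \qquad
  \phi_\beta''(x) = \frac{\beta}{(x^2 + \beta)^{3/2}}, \qquad
  \phi_\beta''(0) = \frac{1}{\sqrt{\beta}},
\]
\[
  0 \le \phi_\beta(x) - |x| = \frac{\beta}{\sqrt{x^2 + \beta} + |x|} \le \sqrt{\beta}.
\]
```

The curvature at the origin grows like $1/\sqrt{\beta}$, so for very small $\beta$ the smoothed problem inherits the near-singular behavior of the original one, while the deviation from $|x|$ is of order $\sqrt{\beta}$, which for large $\beta$ visibly rounds off jumps.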


3.2 Dual Formulation

Since the regularization term in the primal formulation is not differentiable, we want to derive another formulation of the TV minimization. As we will see, we can achieve differentiability at the cost of side conditions. In the following we write $\sup_{\|p\|_\infty \le 1}$ for the exact $\sup_{p \in C_0^\infty(\Omega)^d, \, \|p\|_\infty \le 1}$. Let us start with the exact formulation of the TV regularization:

\[ \inf_{u \in BV(\Omega)} \left[ \frac{\lambda}{2} \int_\Omega (u-f)^2 \, dx + \sup_{\|p\|_\infty \le 1} \int_\Omega u \, \nabla \cdot p \, dx \right]. \tag{3.12} \]

Bounded sets in $BV(\Omega)$ are weak* compact [AFP00, Proposition 3.13] and due to [ET76, Corollary 2.2], $|\cdot|_{TV}$ is lower semi-continuous with respect to the weak* topology. Hence $J(u)$ attains a minimum in $BV$ [Zei85], which is unique (cf. Theorem 2.2), and (3.12) can be rewritten as

\[ \min_{u \in BV(\Omega)} \left[ \frac{\lambda}{2} \int_\Omega (u-f)^2 \, dx + \sup_{\|p\|_\infty \le 1} \int_\Omega u \, \nabla \cdot p \, dx \right]. \tag{3.13} \]

To allow a consideration in $L^2$ we set

\[ \tilde{J}(u) = \begin{cases} J(u) & \text{if } u \in BV(\Omega) \\ \infty & \text{if } u \in L^2(\Omega) \setminus BV(\Omega). \end{cases} \]

Setting $\mathcal{A} = L^2(\Omega)$ and $\mathcal{B} = \{ p \in H_{div}(\Omega) \mid \|p\|_\infty \le 1 \}$ we obtain

\[ \min_{u \in \mathcal{A}} \sup_{p \in \mathcal{B}} \underbrace{\left\{ \frac{\lambda}{2} \int_\Omega (u-f)^2 \, dx + \int_\Omega u \, \nabla \cdot p \, dx \right\}}_{=: L(u,p)} \tag{3.14} \]

as an equivalent form of (3.13).

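Lemma 3.2. With $\mathcal{A}$, $\mathcal{B}$ and $L(u,p)$ defined as above, the following four conditions hold:

\[ \mathcal{A} \text{ and } \mathcal{B} \text{ are convex, closed and non-empty,} \tag{3.15} \]

\[ \forall u \in \mathcal{A}: \ p \mapsto L(u,p) \text{ is concave and upper semi-continuous,} \tag{3.16} \]

\[ \forall p \in \mathcal{B}: \ u \mapsto L(u,p) \text{ is convex and lower semi-continuous,} \tag{3.17} \]

\[ \exists p_0 \in \mathcal{B} \text{ such that } \lim_{u \in \mathcal{A}, \, \|u\|_2 \to \infty} L(u,p_0) = +\infty \text{ (coercivity).} \tag{3.18} \]

Proof. For (3.15), the convexity, closedness and non-emptiness of $\mathcal{A} = L^2(\Omega)$ are clear, as are the convexity and the non-emptiness of $\mathcal{B}$. The closedness of $\mathcal{B}$ can be seen as follows:

For $p_k \in \mathcal{B}$ let the sequence $p_k$ converge to $p$ in $H_{div}$. Then $p_k \to p$ in $L^2$ and hence a subsequence, again denoted by $p_k$, converges pointwise to $p$ almost everywhere (a.e.), and due to $\|p_k\|_\infty \le 1$:

\[ |p(x)| = \lim_{k \to \infty} |p_k(x)| \le 1 \quad \text{a.e.} \]

Therefore $\|p\|_\infty \le 1$ and hence $p \in \mathcal{B}$.

$L(u,p)$ is linear in $p$ and therefore (3.16) holds.

We have shown the convexity of $\frac{\lambda}{2} \int_\Omega (u-f)^2 \, dx$ in Lemma 3.1, and the lower semi-continuity is given by Remark I.2.2 [ET76]. Together with the fact that $\int_\Omega u \, \nabla \cdot p \, dx$ is linear in $u$ this yields (3.17).

To obtain (3.18) we choose $p_0 = 0$.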

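Theorem 3.1. With $\mathcal{A}$, $\mathcal{B}$ and $L(u,p)$ defined as in (3.14) and the line before we have

\[ \min_{u \in \mathcal{A}} \sup_{p \in \mathcal{B}} L(u,p) = \max_{p \in \mathcal{B}} \min_{u \in \mathcal{A}} L(u,p). \]

Proof. Due to Lemma 3.2 all assumptions for [ET76, Proposition 2.3, p. 175] are fulfilled and we obtain:

\[ \min_{u \in \mathcal{A}} \sup_{p \in \mathcal{B}} L(u,p) = \sup_{p \in \mathcal{B}} \inf_{u \in \mathcal{A}} L(u,p). \tag{3.19} \]

Let us take a look at the right-hand side of (3.19):

\[ \sup_{p \in \mathcal{B}} \inf_{u \in \mathcal{A}} \left\{ \frac{\lambda}{2} \int_\Omega (u-f)^2 \, dx + \int_\Omega u \, \nabla \cdot p \, dx \right\}. \]

The first order optimality condition for the infimum leads to

\[ \lambda (u-f) + \nabla \cdot p = 0 \quad \Leftrightarrow \quad u = f - \frac{1}{\lambda} \nabla \cdot p. \tag{3.20} \]

Since $f$ and $\nabla \cdot p$ are in $L^2$ we have $u \in L^2$, and due to the strict convexity of $L(u,p)$ in $u$, (3.20) is the unique minimizer.

Reinserting into the right-hand side of (3.19) yields:

\[
\begin{aligned}
& \sup_{p \in \mathcal{B}} \left\{ \frac{1}{2\lambda} \int_\Omega (\nabla \cdot p)^2 \, dx + \int_\Omega \Big( f - \frac{1}{\lambda} \nabla \cdot p \Big) \nabla \cdot p \, dx \right\} \\
={}& \sup_{p \in \mathcal{B}} \left\{ \frac{1}{2\lambda} \int_\Omega (\nabla \cdot p)^2 \, dx + \int_\Omega f \, \nabla \cdot p \, dx - \frac{1}{\lambda} \int_\Omega (\nabla \cdot p)^2 \, dx \right\} \\
={}& \sup_{p \in \mathcal{B}} \left\{ -\frac{1}{2\lambda} \int_\Omega (\nabla \cdot p)^2 \, dx + \int_\Omega f \, \nabla \cdot p \, dx \right\}.
\end{aligned}
\]

Multiplying by the positive constant $\frac{2}{\lambda}$, which does not change the maximizer, gives the equivalent problem

\[ \sup_{p \in \mathcal{B}} \left\{ -\frac{1}{\lambda^2} \int_\Omega (\nabla \cdot p)^2 \, dx + \frac{2}{\lambda} \int_\Omega f \, \nabla \cdot p \, dx \right\}. \]

By adding the constant term $-\int_\Omega f^2 \, dx$ (which does not affect the maximizer) and completing the square, we obtain

\[ \sup_{p \in \mathcal{B}} \left\{ -\int_\Omega \Big( \frac{1}{\lambda} \nabla \cdot p - f \Big)^2 dx \right\}. \tag{3.21} \]

Instead of computing the supremum, we can minimize the negative term and obtain

\[ \inf_{p \in \mathcal{B}} \int_\Omega \Big( \frac{1}{\lambda} \nabla \cdot p - f \Big)^2 dx = \inf_{p \in \mathcal{B}} \underbrace{\Big\| \frac{1}{\lambda} \nabla \cdot p - f \Big\|_2^2}_{=: G(p)}. \tag{3.22} \]

Now let us consider the sublevel set $S := \{ p \in \mathcal{B} \mid \| \frac{1}{\lambda} \nabla \cdot p - f \|_2^2 \le \|f\|_2^2 \}$. There

\[ \Big\| \frac{1}{\lambda} \nabla \cdot p - f \Big\|_2 \le \|f\|_2 \]

holds and with the triangle inequality we have $\|\nabla \cdot p\|_2 \le 2\lambda \|f\|_2$. Furthermore $\|p\|_\infty \le 1$ implies $\|p\|_2 \le \sqrt{|\Omega|}$ and we obtain

\[ \|p\|_{H_{div}} = \sqrt{\|\nabla \cdot p\|_2^2 + \|p\|_2^2} \le 2\lambda \|f\|_2 + \sqrt{|\Omega|}. \]

This gives us the boundedness of the sublevel set $S$ in $H_{div}$ (and obviously in $L^\infty$), and with the theorem of Banach-Alaoglu this implies the weak compactness of the sublevel set in $H_{div}$ and its weak* compactness in $L^\infty$.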

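Now let $p_k$, $\|p_k\|_\infty \le 1$, be a minimizing sequence, i.e.

\[ \Big\| \frac{1}{\lambda} \nabla \cdot p_k - f \Big\|_2^2 \xrightarrow[k \to \infty]{} \inf_{p \in \mathcal{B}} \Big\| \frac{1}{\lambda} \nabla \cdot p - f \Big\|_2^2. \]

Due to the compactness shown above, there exists a subsequence of $p_k$, again denoted by $p_k$, with

\[ p_k \rightharpoonup p \quad \text{in } H_{div} \tag{3.23} \]

\[ p_k \rightharpoonup^* p \quad \text{in } L^\infty. \tag{3.24} \]

From [ET76, Corollary I.2.2] $G$, as defined in (3.22), is lower semi-continuous on $S$ for the weak topology of $H_{div}$ and hence:

\[ G(p) \le \liminf_{k \to \infty} G(p_k) = \inf_{p \in \mathcal{B}} \Big\| \frac{1}{\lambda} \nabla \cdot p - f \Big\|_2^2, \]

i.e. $p$ is a solution of (3.22). Also due to (3.24),

\[ \|p\|_\infty \le \liminf_{k \to \infty} \|p_k\|_\infty \le 1 \]

is fulfilled (cf. proof of Lemma 3.2). Summarizing we have shown

\[ \sup_{p \in \mathcal{B}} \inf_{u \in \mathcal{A}} L(u,p) = \max_{p \in \mathcal{B}} \min_{u \in \mathcal{A}} L(u,p) \]

and together with (3.19) this proves the assertion.

Due to Theorem 3.1 we may consider the so-called dual problem

\[ \min_{p \in \mathcal{B}} \Big\| \frac{1}{\lambda} \nabla \cdot p - f \Big\|_2^2 \tag{3.25} \]

and after solving (3.25) for $p$ we obtain the solution for the primal variable $u$ from (3.20).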
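To illustrate how (3.25) and (3.20) interact, here is a minimal one-dimensional sketch. It assumes a forward-difference gradient with its negative adjoint as discrete divergence, and it solves the discrete dual problem with plain projected gradient descent. This solver is chosen only for brevity and is not the Newton-type method developed in this thesis; all function names and parameters are hypothetical.

```python
import numpy as np

def grad(u):
    """Forward differences D u (length n - 1)."""
    return np.diff(u)

def div(p):
    """Discrete divergence, the negative adjoint of grad: div = -D^T."""
    return np.diff(np.concatenate(([0.0], p, [0.0])))

def rof_dual_1d(f, lam, iterations=2000):
    """Projected gradient descent for min_{|p| <= 1} ||div(p)/lam - f||^2."""
    p = np.zeros(f.size - 1)
    step = lam / 4.0                    # (1/L) * (2/lam) with L = 8 / lam^2
    for _ in range(iterations):
        u = f - div(p) / lam            # primal variable from (3.20)
        p = np.clip(p - step * grad(u), -1.0, 1.0)  # step, then projection
    return f - div(p) / lam, p

def primal_energy(u, f, lam):
    """Discrete version of (3.6) with the anisotropic (= 1D) total variation."""
    return 0.5 * lam * np.sum((u - f) ** 2) + np.sum(np.abs(grad(u)))

def dual_energy(p, f, lam):
    """Value of min_u L(u, p), obtained by inserting (3.20) into L."""
    d = div(p)
    return -np.sum(d ** 2) / (2.0 * lam) + np.sum(f * d)

rng = np.random.default_rng(0)
f = np.concatenate((np.zeros(30), np.ones(30))) + 0.1 * rng.standard_normal(60)
lam = 5.0
u, p = rof_dual_1d(f, lam)
gap = primal_energy(u, f, lam) - dual_energy(p, f, lam)
```

The quantity `gap` is non-negative for every feasible `p` and tends to zero at a saddle point, so it can serve as a stopping criterion in such a sketch.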

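An alternative derivation using the convex conjugate (see Definition 2.6) is presented in [Cha04]. In the following we give a brief summary of this approach, adapted to our problem:

We have stated the Euler equation for the TV regularization (3.6) in (3.7) as

\[ 0 \in \lambda (u-f) + \partial |u|_{TV}, \]

which is equivalent to

\[
\begin{aligned}
& \lambda (f-u) \in \partial |u|_{TV} \\
\overset{\text{Lem. 2.2}}{\Leftrightarrow} \quad & u \in \partial J_r^*(\lambda (f-u)) \\
\Leftrightarrow \quad & 0 \in -u + \partial J_r^*(\lambda (f-u)) \\
\Leftrightarrow \quad & 0 \in -\lambda u + \lambda f - \lambda f + \lambda \, \partial J_r^*(\lambda (f-u)) \\
\Leftrightarrow \quad & 0 \in \lambda (f-u) - \lambda f + \lambda \, \partial J_r^*(\lambda (f-u)).
\end{aligned}
\]

This implies that $w = \lambda (f-u)$ is a minimum of

\[ \frac{1}{2} \int_\Omega (w - \lambda f)^2 \, dx + \lambda J_r^*(w). \]

Therefore (recalling the definition of $J_r^*$ as stated in Remark 3.1) $w$ is the projection of $\lambda f$ to $K$, the closure of $\{ \nabla \cdot p \mid p \in C_0^\infty(\Omega)^d, \ \|p\|_\infty \le 1 \}$, i.e. $w = \Pi_K(\lambda f)$. Since $w = \lambda (f-u)$ we obtain:

\[ u = f - \frac{w}{\lambda} = f - \frac{\Pi_K(\lambda f)}{\lambda} = f - \Pi_{\frac{1}{\lambda} K}(f). \]

Computing this nonlinear projection amounts exactly to solving (3.25).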

3.3 Primal-Dual Formulation

Another approach aims at solving directly the saddle point problem for u and p, given by the exact formulation of the TV regularization:

\[ \inf_{u\in BV(\Omega)}\ \sup_{\|p\|_\infty\le 1}\ \underbrace{\frac{\lambda}{2}\int_\Omega (u-f)^2\,dx + \int_\Omega u\,\nabla\cdot p\,dx}_{=:L(u,p)}. \tag{3.26} \]

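For a saddle point we obtain the following optimality conditions:
\[ \frac{\partial L}{\partial u} = \lambda(u-f) + \nabla\cdot p = 0 \tag{3.27} \]
and
\[ L(u,p) \ge L(u,q) \quad \forall q,\ \|q\|_\infty\le 1, \tag{3.28} \]
where (3.28) can be rewritten as
\[ \int_\Omega u\,\nabla\cdot(p-q) \ge 0 \quad \forall q,\ \|q\|_\infty\le 1. \tag{3.29} \]
In the following we will see that the conditions (3.27) and (3.28) imply the optimality condition (3.7).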

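Lemma 3.3. (3.29) implies that $\nabla\cdot p \in \partial|u|_{TV}$.

Proof. Let $w\in BV(\Omega)$ be arbitrary. We have
\[ \int_\Omega u\,\nabla\cdot(p-q) \ge 0 \quad \forall q,\ \|q\|_\infty\le 1. \]
In particular this inequality holds for the supremum over $q$:
\[ \sup_{\|q\|_\infty\le 1}\int_\Omega u\,\nabla\cdot(p-q) \ge 0, \tag{3.30} \]
which implies
\[ \sup_{\|q\|_\infty\le 1}\int_\Omega u\,\nabla\cdot(q-p) \le 0. \tag{3.31} \]
Obviously we have
\[ \sup_{\|q\|_\infty\le 1}\int w\,\nabla\cdot q \ \ge\ \int w\,\nabla\cdot q \quad \forall q,\ \|q\|_\infty\le 1. \tag{3.32} \]
In particular (3.32) is true for $q = p$, and with (3.31) we obtain
\begin{align*}
 & \sup_{\|q\|_\infty\le 1}\int w\,\nabla\cdot q \ \ge\ \underbrace{\sup_{\|q\|_\infty\le 1}\int_\Omega u\,\nabla\cdot(q-p)}_{\le 0} + \int w\,\nabla\cdot p \\
 \Longleftrightarrow\ & \sup_{\|q\|_\infty\le 1}\int w\,\nabla\cdot q \ \ge\ \sup_{\|q\|_\infty\le 1}\int_\Omega u\,\nabla\cdot q - \int u\,\nabla\cdot p + \int w\,\nabla\cdot p \\
 \Longleftrightarrow\ & |w|_{TV} \ \ge\ |u|_{TV} + \langle \nabla\cdot p,\ w-u\rangle. \tag{3.33}
\end{align*}
Since $w$ was chosen arbitrarily, (3.33) is equivalent to $\nabla\cdot p\in\partial|u|_{TV}$ (see Def. 2.2).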

Alternatively, (3.29) implies that $\nabla u\in\partial\chi(p)$ with
\[ \chi(p) := \begin{cases} 0 & \text{if } \|p\|_\infty\le 1 \\ \infty & \text{else.} \end{cases} \]

Chapter 4 Solution Methods

4.1 Primal Methods

For the primal formulation of the TV regularization (3.8) there already exist several numerical solution methods. Most of them aim at solving the associated Euler-Lagrange equation (3.9). In this section we will give a brief overview, without raising the claim of completeness.

4.1.1 Steepest Descent

Rudin et al. proposed in their original paper [ROF92] an artificial time marching scheme to solve the Euler-Lagrange equation (3.9). Considering the image u as a function of space and time, they seek the steady state of the parabolic equation

\[ \frac{\partial u}{\partial t} = \nabla\cdot\left(\frac{\nabla u}{|\nabla u|_\beta}\right) - \lambda(u-f) \]

with initial condition $u_0 = f$ at time $t = 0$. Here $|\cdot|_\beta$ denotes the perturbed norm as introduced in (3.10). Using an explicit forward Euler scheme for the time discretization, one usually observes slow convergence due to the Courant-Friedrichs-Lewy (CFL) condition, especially in regions where $|\nabla u|\approx 0$.
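The time marching idea can be illustrated by a minimal one dimensional sketch. It is not the scheme of [ROF92] itself: it assumes the perturbed norm has the form $|s|_\beta = \sqrt{s^2+\beta^2}$, uses unit grid spacing and zero flux at both ends, and all function names are hypothetical.

```python
import math

def perturbed_abs(s, beta):
    # Assumed form of the perturbed norm: |s|_beta = sqrt(s^2 + beta^2).
    return math.sqrt(s * s + beta * beta)

def rof_energy(u, f, lam, beta):
    # Discrete 1D energy: lam/2 * sum (u - f)^2 + sum |u_{i+1} - u_i|_beta.
    fidelity = 0.5 * lam * sum((ui - fi) ** 2 for ui, fi in zip(u, f))
    tv = sum(perturbed_abs(u[i + 1] - u[i], beta) for i in range(len(u) - 1))
    return fidelity + tv

def time_marching(f, lam, beta, dt, steps):
    # Explicit Euler steps for u_t = (u_x / |u_x|_beta)_x - lam * (u - f).
    u = list(f)
    n = len(u)
    for _ in range(steps):
        # flux[i] sits between samples i and i + 1; zero flux at both ends.
        flux = [(u[i + 1] - u[i]) / perturbed_abs(u[i + 1] - u[i], beta)
                for i in range(n - 1)]
        new_u = []
        for i in range(n):
            right = flux[i] if i < n - 1 else 0.0
            left = flux[i - 1] if i > 0 else 0.0
            new_u.append(u[i] + dt * ((right - left) - lam * (u[i] - f[i])))
        u = new_u
    return u

# Hypothetical noisy step signal; dt is chosen below 1 / (lam + 4 / beta).
noisy = [0.1, -0.1, 0.05, 0.0, 1.1, 0.9, 1.0, 1.05]
smoothed = time_marching(noisy, lam=1.0, beta=0.1, dt=0.02, steps=200)
```

The admissible step size shrinks with $\beta$ (here the bound is of order $\beta/4$), which is one way to see why the explicit scheme converges slowly.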

4.1.2 Fixed Point Iteration

In [VO96] Vogel and Oman suggested using a lagged diffusivity fixed point iteration scheme to solve the Euler-Lagrange equation (3.9) directly:
\[ \nabla\cdot\left(\frac{\nabla u^{k+1}}{|\nabla u^k|_\beta}\right) - \lambda(u^{k+1}-f) = 0, \]
which requires solving the linear system
\[ \left(\lambda - \nabla\cdot\left(\frac{\nabla}{|\nabla u^k|_\beta}\right)\right)u^{k+1} = \lambda f \]

for each iteration $k$. In spite of only linear convergence, one obtains good results after a few iterations.
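A one dimensional sketch of this fixed point iteration is given below. It assumes $|s|_\beta = \sqrt{s^2+\beta^2}$, unit grid spacing and zero flux at the ends, so that each step is a tridiagonal solve; the helper names are hypothetical and the code is not taken from [VO96].

```python
import math

def solve_tridiagonal(lower, diag, upper, rhs):
    # Thomas algorithm; lower[0] and upper[-1] are not used.
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x

def lagged_diffusivity(f, lam, beta, iterations):
    n = len(f)
    u = list(f)
    for _ in range(iterations):
        # Diffusivity 1 / |u_{i+1} - u_i|_beta, frozen at the previous iterate.
        w = [1.0 / math.sqrt((u[i + 1] - u[i]) ** 2 + beta ** 2)
             for i in range(n - 1)]
        lower = [0.0] * n
        diag = [lam] * n
        upper = [0.0] * n
        for i in range(n - 1):
            diag[i] += w[i]
            diag[i + 1] += w[i]
            upper[i] = -w[i]
            lower[i + 1] = -w[i]
        # Linear system (lam - d/dx (w d/dx)) u_new = lam * f.
        u = solve_tridiagonal(lower, diag, upper, [lam * fi for fi in f])
    return u

noisy = [0.1, -0.1, 0.05, 0.0, 1.1, 0.9, 1.0, 1.05]
denoised = lagged_diffusivity(noisy, lam=1.0, beta=0.1, iterations=20)
```

Because the diffusivity is lagged, every iteration only needs a symmetric, diagonally dominant linear solve.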

4.1.3 Newton’s Method

Vogel and Oman (cf. [VO96]) as well as Chan, Chan and Zhou (cf. [CZC95]) proposed to apply Newton’s method:

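\[ u^{k+1} = u^k - H_\phi^{-1}(u^k)\,\phi(u^k) \]
with
\[ \phi(u) := \lambda(u-f) - \nabla\cdot\left(\frac{\nabla u}{|\nabla u|_\beta}\right) \]
being the gradient of $J$ and $H_\phi(u)$ its Hessian, given by
\[ H_\phi(u) = \lambda - \nabla\cdot\left(\frac{1}{|\nabla u|_\beta}\left(I - \frac{\nabla u\,\nabla u^t}{|\nabla u|_\beta^2}\right)\nabla\right). \]
So in each step one has to solve
\[ \left(\lambda - \nabla\cdot\left(\frac{1}{|\nabla u^k|_\beta}\left(I - \frac{\nabla u^k\,(\nabla u^k)^t}{|\nabla u^k|_\beta^2}\right)\nabla\right)\right)\delta u = -\phi(u^k) \tag{4.1} \]
with an update $\delta u$: $u^{k+1}\leftarrow u^k+\delta u$. We have locally quadratic convergence, but especially in the case where $\beta$ is small the domain of convergence turns out to be small. So alternatively one can use a continuation procedure for $\beta$, i.e. start with a large value (where (4.1) is well defined) and successively decrease it to the desired value (cf. [CZC95]).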

4.2 Dual Methods

In [Cha04] Chambolle presents a duality based algorithm. It amounts to solving the following problem:

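\[ \left\|\tfrac{1}{\lambda}\nabla\cdot p - f\right\|_2^2 \to \min \quad\text{subject to}\quad |p_{i,j}|^2 - 1 \le 0 \quad \forall i,j = 1,\dots,n, \tag{4.2} \]
which is just the discrete version of (3.25). Here the discrete divergence is given by
\[ (\nabla\cdot p)_{i,j} =
\begin{cases} p^1_{i,j} - p^1_{i-1,j} & \text{if } 1<i<n \\ p^1_{i,j} & \text{if } i=1 \\ -p^1_{i-1,j} & \text{if } i=n \end{cases}
\;+\;
\begin{cases} p^2_{i,j} - p^2_{i,j-1} & \text{if } 1<j<n \\ p^2_{i,j} & \text{if } j=1 \\ -p^2_{i,j-1} & \text{if } j=n. \end{cases} \tag{4.3} \]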

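The Karush-Kuhn-Tucker conditions (cf. [Roc70], Theorem 28.3) yield the existence of Lagrange multipliers $\alpha_{i,j}$ (the index indicates the constraint in (4.2) to which the multiplier belongs) such that
\[ -\left(\nabla\left(\tfrac{1}{\lambda}\nabla\cdot p - g\right)\right)_{i,j} + \alpha_{i,j}\,p_{i,j} = 0, \tag{4.4} \]
with $\alpha_{i,j}\ge 0$ and $\alpha_{i,j}(|p_{i,j}|^2-1) = 0$; here $g$ denotes the given data, written $f$ in (4.2). Hence, either $\alpha_{i,j}>0$ and $|p_{i,j}| = 1$, or $|p_{i,j}|<1$ and $\alpha_{i,j} = 0$. In both cases this leads to
\[ \alpha_{i,j} = \left|\left(\nabla\left(\tfrac{1}{\lambda}\nabla\cdot p - g\right)\right)_{i,j}\right|. \]
Note that $\left(\nabla\left(\tfrac{1}{\lambda}\nabla\cdot p - g\right)\right)_{i,j} = 0$ for $\alpha_{i,j} = 0$. Thus (4.4) can be solved by a fixed point iteration:
\[ p^{n+1}_{i,j} = p^n_{i,j} + \tau\Big( (\nabla(\nabla\cdot p^n - \lambda g))_{i,j} - \big|(\nabla(\nabla\cdot p^n - \lambda g))_{i,j}\big|\,p^{n+1}_{i,j} \Big) \]
with initial value $p^0 = 0$ and $\tau>0$. Rewriting leads to the following projection algorithm:
\[ p^{n+1}_{i,j} = \frac{p^n_{i,j} + \tau\,(\nabla(\nabla\cdot p^n - \lambda g))_{i,j}}{1 + \tau\,\big|(\nabla(\nabla\cdot p^n - \lambda g))_{i,j}\big|}. \]
Convergence is guaranteed for $\tau\le\frac18$, although in practice the optimal choice for $\tau$ appears to be $\frac14$. Alternatively one can apply a simpler projection:
\[ p^{n+1}_{i,j} = \frac{p^n_{i,j} + \tau\,(\nabla(\nabla\cdot p^n - \lambda g))_{i,j}}{\max\big\{1,\ \big|p^n_{i,j} + \tau\,(\nabla(\nabla\cdot p^n - \lambda g))_{i,j}\big|\big\}}. \]
This algorithm simply projects $p^n$ back to the unit ball if the constraint $|p^n|\le 1$ is violated. Stability is ensured up to $\tau\le\frac14$ (cf. [Cha05]) and in practice the algorithm also converges for that choice of $\tau$.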
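The semi-implicit projection iteration can be tried out on a one dimensional signal. The following sketch is an illustration under assumptions, not Chambolle's reference implementation: it uses the 1D analogues of the discrete gradient and divergence, recovers the primal variable via $u = f - \frac{1}{\lambda}\nabla\cdot p$ (cf. (3.27)), and all names are hypothetical.

```python
def gradient(u):
    # Forward differences; the last entry is set to zero.
    n = len(u)
    return [u[i + 1] - u[i] if i < n - 1 else 0.0 for i in range(n)]

def divergence(p):
    # Negative adjoint of `gradient`, the 1D analogue of (4.3); needs len(p) >= 2.
    n = len(p)
    div = []
    for i in range(n):
        if i == 0:
            div.append(p[0])
        elif i < n - 1:
            div.append(p[i] - p[i - 1])
        else:
            div.append(-p[i - 1])
    return div

def chambolle_1d(f, lam, tau, iterations):
    n = len(f)
    p = [0.0] * n  # initial value p^0 = 0
    for _ in range(iterations):
        d = divergence(p)
        g = gradient([d[i] - lam * f[i] for i in range(n)])
        # Semi-implicit update; it keeps |p_i| <= 1 automatically.
        p = [(p[i] + tau * g[i]) / (1.0 + tau * abs(g[i])) for i in range(n)]
    d = divergence(p)
    # Primal variable from the dual one: u = f - (1/lam) * div p.
    return [f[i] - d[i] / lam for i in range(n)]

# Two-sample signal whose minimizer can be computed by hand: (0.25, 0.75) for lam = 4.
u = chambolle_1d([0.0, 1.0], lam=4.0, tau=0.125, iterations=300)
```

For the two-sample signal the minimizer of $\frac{\lambda}{2}\sum_i (u_i-f_i)^2 + |u_2-u_1|$ with $\lambda = 4$ is $(0.25, 0.75)$, which gives a simple check of the iteration.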


4.3 Primal-dual Methods

For solving the primal-dual formulation we will use Newton's method with damping. Before going into detail we have to approximate the constraint on $p$. The constraint $\|p\|_\infty\le 1$ in (3.26) can be expressed by subtracting the characteristic function $\chi$, defined by
\[ \chi(p) := \begin{cases} 0 & \text{if } \|p\|_\infty\le 1 \\ \infty & \text{else,} \end{cases} \]
from $L(u,p)$:
\[ \inf_{u\in BV(\Omega)}\ \sup_p\ \left[ \frac{\lambda}{2}\int_\Omega (u-f)^2\,dx + \int_\Omega u\,\nabla\cdot p\,dx - \chi(p) \right]. \tag{4.5} \]

Since $\chi$ is not differentiable (actually it is not a function in the classical sense), we use an approximation instead. For this purpose we present two different techniques in the next subsection.

4.3.1 Barrier and Penalty Method

In the following we specify two alternatives for the approximation. Instead of using the exact formulation with χ as in (4.5) we replace L(u, p) by

\[ L_\varepsilon(u,p) = L(u,p) - \frac{1}{\varepsilon}F(\|p\|-1) \tag{4.6} \]
with $\varepsilon>0$ small and a term $F$ penalizing $\|p\|-1>0$. A typical example for $F$ is
\[ F(s) = \frac{1}{2}\max\{s,0\}^2. \tag{4.7} \]

This so-called penalty approximation still allows violations of the constraint. Alternatively, barrier methods (also called "interior-point methods") can be used. Their idea is to add a continuous barrier term $G(p)$ to $L$ such that $G(p) = \infty$ if the constraint is violated. Since the constraint $\|p\|\le 1$ is equivalent to $\|p\|^2\le 1$, we may replace $\|p\|$ by its square to achieve differentiability:
\[ L_\varepsilon(u,p) = L(u,p) - \varepsilon\,G(\|p\|^2-1). \tag{4.8} \]
For example one can choose $G(s) = -\log(-s)$.

The choice of approximation affects the shape of the solution $u$ of (4.5), which can be seen well in the one dimensional case. Therefore we want to solve the saddle point problem (3.26) with either of the two methods on the given domain $\Omega = [-1,1]$.

Example 4.1. (Penalty approximation)

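Using the penalty approximation as introduced in (4.6), with $F$ as defined in (4.7), we obtain the saddle point problem
\[ \inf_{u\in BV(\Omega)}\ \sup_p\ \underbrace{\frac{\lambda}{2}\int_\Omega (u-f)^2\,dx + \int_\Omega u\,p'\,dx - \frac{1}{2\varepsilon}\max\{|p|-1,0\}^2}_{=:L_P(u,p)}. \]

Therefore we have the following optimality conditions:
\begin{align*}
\frac{\partial L_P}{\partial u} &= \lambda(u-f) + p' = 0, \\
\frac{\partial L_P}{\partial p} &= -u' - g = 0,
\end{align*}
with $g$ defined by
\[ g := \begin{cases} \frac{1}{\varepsilon}(|p|-1)\,\mathrm{sign}(p) & \text{if } |p|\ge 1 \\ 0 & \text{else.} \end{cases} \]
This leads to
\[ u' = \begin{cases} \frac{1}{\varepsilon}(1-|p|)\,\mathrm{sign}(p) & \text{if } |p|\ge 1 \\ 0 & \text{else.} \end{cases} \tag{4.9} \]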

Example 4.2. (Barrier approximation)

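Using the barrier approximation as introduced in (4.8), with $G(s) = -\log(-s)$, we obtain the saddle point problem
\[ \inf_{u\in BV(\Omega)}\ \sup_p\ \underbrace{\frac{\lambda}{2}\int_\Omega (u-f)^2\,dx + \int_\Omega u\,p'\,dx + \varepsilon\log\!\big(-(|p|^2-1)\big)}_{=:L_B(u,p)} \]

with optimality conditions
\begin{align}
\frac{\partial L_B}{\partial u} &= \lambda(u-f) + p' = 0, \tag{4.10} \\
\frac{\partial L_B}{\partial p} &= -u' + \varepsilon\,\frac{2p}{|p|^2-1} = 0, \tag{4.11}
\end{align}
leading to
\[ u' = \varepsilon\,\frac{2p}{|p|^2-1}. \tag{4.12} \]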

We can see that in the case of the penalty approximation (Example 4.1) we have $u' = 0$ for $|p|\le 1$. But we cannot really achieve $u' = \pm\infty$, since $\frac{1}{\varepsilon}(|p|-1)\,\mathrm{sgn}(p)\to\pm\infty$ is only fulfilled for $|p|\to\infty$ and this is prevented by the penalty term. This leads to the stair-casing effect typical of TV regularization, but with smoothed edges of order $\frac{1}{\varepsilon}$ (Fig. 4.1(a)).

In the case of the barrier method (Example 4.2), $u' = 0$ is only possible at $p = 0$. However, we have $p = 0$ on an interval $[a,b]\subset[-1,1]$ only if $f = c$ on $[a,b]$ with $c$ constant:
\begin{align*}
p = 0 \text{ on } [a,b] \ &\Rightarrow\ p' = 0 \text{ on } [a,b] \\
&\overset{(4.10)}{\Longleftrightarrow}\ \lambda(u-f) = 0 \text{ on } [a,b] \\
&\Longleftrightarrow\ f = u \text{ on } [a,b] \\
&\overset{u'=0}{\Longleftrightarrow}\ f = c \text{ on } [a,b],
\end{align*}
which is very unlikely for the noisy data $f$.

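Furthermore, for a minimum the barrier term could be bounded by a constant $c$, i.e.
\begin{align}
& -\varepsilon\log\!\big(-(|p|^2-1)\big) \le c \tag{4.13} \\
\Longleftrightarrow\ & |p|^2-1 \le -e^{-c/\varepsilon}. \tag{4.14}
\end{align}
Hence $|p|^2-1$ might behave like $-e^{-c/\varepsilon}$ in some points, in which
\[ |u'| \approx \frac{\varepsilon}{e^{-c/\varepsilon}} = \varepsilon\,e^{c/\varepsilon} \tag{4.15} \]
holds. As we can see, the gradient of $u$ can take very large values even for a comparatively large $\varepsilon$ (e.g. $\varepsilon = 0.1$). Therefore we obtain sharp edges but no homogeneous areas (Fig. 4.1(b)).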
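The different behaviour of the two approximations can be made concrete by evaluating the slopes (4.9) and (4.12) for a few values of $p$. The following sketch simply transcribes these two formulas for scalar $p$; the function names are hypothetical.

```python
import math

def penalty_slope(p, eps):
    # u' from (4.9): zero as long as the constraint |p| <= 1 holds.
    if abs(p) >= 1.0:
        return (1.0 - abs(p)) * math.copysign(1.0, p) / eps
    return 0.0

def barrier_slope(p, eps):
    # u' from (4.12); the barrier term is only defined for |p| < 1.
    if abs(p) >= 1.0:
        raise ValueError("barrier term requires |p| < 1")
    return eps * 2.0 * p / (p * p - 1.0)

# Flat regions (u' = 0) exist for every |p| <= 1 with the penalty term,
# but only at p = 0 with the barrier term.
flat_penalty = penalty_slope(0.5, 0.1)
tilted_barrier = barrier_slope(0.5, 0.1)
steep_barrier = barrier_slope(0.999, 0.1)
```

With $\varepsilon = 0.1$ the barrier slope is already nonzero at $p = 0.5$ and grows rapidly as $|p|\to 1$, in line with the discussion above.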

Remark 4.1. Note that if ε is small enough (i.e. smaller than the step size) this effect does not occur anymore (see Fig. 4.2).

Figure 4.1: Example for Barrier and Penalty method in the one dimensional case, with step size $h = 10^{-3}$ and $\varepsilon = 0.1$. Panels: (a) Penalty method, (b) Barrier method.

Figure 4.2: Example for Barrier and Penalty method in the one dimensional case, with step size $h = 10^{-3}$ and $\varepsilon = 10^{-5}$. Panels: (a) Penalty method, (b) Barrier method.

4.3.2 Newton Method with Damping

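Using the penalty approximation as introduced before, we obtain the following optimality conditions for our problem:
\begin{align*}
\frac{\partial L_\varepsilon}{\partial u} &= \lambda(u-f) + \nabla\cdot p = 0, \\
\frac{\partial L_\varepsilon}{\partial p} &= -\nabla u - \frac{2}{\varepsilon}H(p) = 0,
\end{align*}
with $H(p)$ being the derivative of $F(\|p\|-1)$.

We linearize the nonlinear term $H(p)$ via a first-order Taylor approximation, i.e.
\[ H(p^{k+1}) \approx H(p^k) + H'(p^k)(p^{k+1}-p^k). \]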

Adding a damping term, we have to solve the following linear system in each step:
\begin{align}
\lambda(u^{k+1}-f) + \nabla\cdot p^{k+1} &= 0, \notag \\
-\nabla u^{k+1} - \frac{2}{\varepsilon}H'(p^k)(p^{k+1}-p^k) - \frac{2}{\varepsilon}H(p^k) - \tau^k(p^{k+1}-p^k) &= 0. \tag{4.16}
\end{align}

Here the parameter $\tau^k$ controls the damping. This linear system can be discretized easily and is the basis of our parallel algorithm. To achieve fast convergence we choose $\varepsilon = \varepsilon^k\downarrow 0$ during the iteration.

For better performance of the algorithm it is recommended to start with a small value of $\tau$ and to increase it during the iteration to avoid oscillations.

The starting values of $\tau$ and $\varepsilon$ we have used, as well as their adaptation process, were chosen from some experimental runs of the algorithm. Further research to find optimal values for these parameters is certainly needed.
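To see the structure of the linear system (4.16), one may eliminate $u^{k+1}$ using the first equation. The following worked step is only a sketch of such an elimination and is not necessarily the form used for the Schur complement in Section 7.2.

```latex
% First equation of (4.16), solved for u^{k+1}:
u^{k+1} = f - \frac{1}{\lambda}\,\nabla\cdot p^{k+1}
% Inserted into the second equation of (4.16):
-\nabla f + \frac{1}{\lambda}\,\nabla(\nabla\cdot p^{k+1})
  - \frac{2}{\varepsilon}H'(p^k)(p^{k+1}-p^k)
  - \frac{2}{\varepsilon}H(p^k) - \tau^k (p^{k+1}-p^k) = 0
% Collecting the terms in p^{k+1} on the left-hand side:
\Big( \frac{1}{\lambda}\,\nabla\,\nabla\cdot{}
  - \frac{2}{\varepsilon}H'(p^k) - \tau^k \Big)\, p^{k+1}
  = \nabla f - \Big( \frac{2}{\varepsilon}H'(p^k) + \tau^k \Big)\, p^k
    + \frac{2}{\varepsilon}H(p^k)
```

In this reduced form only the dual variable remains as an unknown, and the damping parameter $\tau^k$ shifts the operator on the left-hand side by a multiple of the identity.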


Chapter 5 Domain Decomposition

As mentioned in the introduction, one would like to divide the original problem into several subproblems and solve them in parallel. One idea is to split the given domain $\Omega$ of the problem into subdomains $\Omega_i$, $i = 1,\dots,S$. This approach is called domain decomposition, and depending on the choice of subdomains one obtains overlapping or non overlapping decompositions. If all unknowns of the problem are coupled, a straightforward splitting with independent computation on each subdomain results in significant errors across the interfaces.

5.1 Non Overlapping Decomposition

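Let $\Omega\subset\mathbb{R}^d$. We split $\Omega$ into $S$ subregions $\Omega_i$ such that
\[ \bigcup_{i=1}^{S}\bar\Omega_i = \bar\Omega \quad\text{with}\quad \Omega_i\cap\Omega_j = \emptyset \ \text{ for } i\ne j. \]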

For a better understanding let us restrict to the case of a decomposition into two subdomains in the two dimensional case (cf. Fig. 5.1). As an example let us consider the Poisson equation with homogeneous Dirichlet boundary conditions.

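Example 5.1. Let $\Omega = [-1,1]^2\subset\mathbb{R}^2$ and consider
\begin{align*}
-\Delta u &= f && \text{in } \Omega, \\
u &= 0 && \text{on } \partial\Omega.
\end{align*}
Denoting by $u_i$ the restriction of $u$ to $\Omega_i$ ($i = 1,2$), this is equivalent to (cf. [TW05]):
\begin{align*}
-\Delta u_1 &= f && \text{in } \Omega_1, & u_1 &= 0 && \text{on } \partial\Omega_1\setminus\Gamma, \\
-\Delta u_2 &= f && \text{in } \Omega_2, & u_2 &= 0 && \text{on } \partial\Omega_2\setminus\Gamma, \\
u_1 &= u_2 && \text{on } \Gamma, & \frac{\partial u_1}{\partial n_1} &= -\frac{\partial u_2}{\partial n_2} && \text{on } \Gamma.
\end{align*}

Figure 5.1: Non overlapping decomposition with $S = 2$ and $d = 2$ into the subdomains $\Omega_1$ and $\Omega_2$. Here $\Gamma := \partial\Omega_1\cap\partial\Omega_2$ denotes the interface between the subdomains.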

As one can see, there are conditions on the interface $\Gamma$, the so-called transmission conditions. If they are neglected (as is the case for an ad hoc approach), artifacts may arise at the interface (see Fig. 5.2 for an example). There are some algorithms to avoid this issue (e.g. the Dirichlet-Neumann algorithm or the Neumann-Neumann algorithm). We restrict ourselves to a more detailed consideration of overlapping domain decompositions; for further information see [TW05].

5.2 Overlapping Decomposition

To avoid the computation of the transmission conditions required by non-overlapping methods, one can apply overlapping partitions. At the cost of having redundant degrees of freedom, and thus larger systems to solve, the update of the boundary data can be easily obtained from exactly this redundancy. Expanding the $\Omega_i$ from the previous section to $\Omega_i'$, such that
\[ d(\partial\Omega_i'\cap\Omega_j,\ \partial\Omega_j'\cap\Omega_i) \ge \delta \quad\text{for } i\ne j \text{ and } \partial\Omega_i'\cap\Omega_j\ne\emptyset, \]
whereby $\Omega_i'$ is truncated at the boundary of $\Omega$, we obtain an overlapping domain decomposition. In the case of $\Omega$ being a uniform lattice with step size $h$, $\delta$ is given by $\delta = mh$ with $m\in\mathbb{N}$.

Figure 5.2: Poisson equation with $f = 3x^2$ on $\Omega = [-1,1]^2$ and homogeneous Dirichlet boundary conditions. Neglecting the transmission conditions results in artifacts at the interface (here $x = 0$). Panels: (a) Solution without decomposition, (b) Solution with a $1\times 2$ decomposition.

Figure 5.3: Overlapping decomposition with $S = 2$ and $d = 2$, showing the subdomains $\Omega_1$, $\Omega_2$, their extensions $\Omega_1'$, $\Omega_2'$, the interfaces $\Gamma_1$, $\Gamma_2$ and the overlap $\delta$. Here $\Gamma_1 := \partial\Omega_1\cap\Omega_2$ and $\Gamma_2 := \partial\Omega_2\cap\Omega_1$.

5.3 Schwarz Iteration

One of the first approaches to domain decomposition was the multiplicative Schwarz method, introduced in 1870 by H.A. Schwarz [Sch70]. The proof of convergence can be obtained via a maximum principle (see, e.g., [Lio88]). A similar formulation leads to the additive Schwarz method, and as we will see there is a close relation to well-known techniques for solving systems of linear equations.

Figure 5.4: Poisson equation with $f = 3x^2$ on $\Omega = [-1,1]^2$ and homogeneous Dirichlet boundary conditions. The step size is $h = 1/32$ and the overlap is $\delta = 2h$. Panels: (a) After 1 iteration, (b) After 2 iterations, (c) After 6 iterations, (d) After 28 iterations.

5.3.1 Multiplicative Schwarz Method

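The multiplicative Schwarz algorithm consists of two fractional steps: Let $u^{(0)}$ be an initial function, then we successively solve
\[
\begin{cases} Lu_1^{(k+1)} = f & \text{in } \Omega_1 \\ u_1^{(k+1)} = u^{(k)}|_{\Gamma_1} & \text{on } \Gamma_1 \\ u_1^{(k+1)} = 0 & \text{on } \partial\Omega_1\setminus\Gamma_1 \end{cases}
\quad\text{and}\quad
\begin{cases} Lu_2^{(k+1)} = f & \text{in } \Omega_2 \\ u_2^{(k+1)} = u_1^{(k+1)}|_{\Gamma_2} & \text{on } \Gamma_2 \\ u_2^{(k+1)} = 0 & \text{on } \partial\Omega_2\setminus\Gamma_2. \end{cases}
\]

The next iterate is computed by
\[ u^{(k+1)}(x) = \begin{cases} u_2^{(k+1)}(x) & \text{if } x\in\Omega_2 \\ u_1^{(k+1)}(x) & \text{if } x\in\Omega\setminus\Omega_2. \end{cases} \]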

For an example let us look again at the Poisson equation (cf. Example 5.1).

Example 5.2. Let $\Omega$ be a uniform lattice with step size $h$ on $[-1,1]^2\subset\mathbb{R}^2$ and $\Omega_1'$, $\Omega_2'$ a decomposition of $\Omega$ as in Figure 5.3. Choosing an overlap of $\delta = 2h$ and applying the multiplicative Schwarz algorithm provides a good approximation of the solution after a few iterations (see Fig. 5.4).

The multiplicative Schwarz method is related to the well-known Gauss-Seidel method, and at first glance this approach is not convenient for a parallel implementation, since $u_1^{(k+1)}$ is needed for the computation of $u_2^{(k+1)}$. But dividing the domain $\Omega$ into several subdomains and painting them in two colors (let us say black and white), such that subdomains of the same color do not overlap, allows a parallel computation. A simple example of such a colored division is shown in Figure 5.5. Solving on all black subdomains first can be done in parallel and provides the boundary conditions for the white subdomains, on which the solution can afterwards also be computed in parallel. In a parallel implementation one would assign one subdomain of each color to each processor.

Note that in the case of a two dimensional domain and a splitting in each direction one would need four colors to obtain a decomposition as mentioned above (see Fig. 5.6).

Figure 5.5: Coloring with two colors in the case of $1\times 4$ subdomains. The shaded areas indicate the overlap.

Figure 5.6: Coloring with four colors in the case of $4\times 4$ subdomains (the overlap is not indicated).

5.3.2 Additive Schwarz Method

Alternatively one can use the additive Schwarz method, which lends itself directly to parallelization:
\[
\begin{cases} Lu_1^{(k+1)} = f & \text{in } \Omega_1 \\ u_1^{(k+1)} = u^{(k)}|_{\Gamma_1} & \text{on } \Gamma_1 \\ u_1^{(k+1)} = 0 & \text{on } \partial\Omega_1\setminus\Gamma_1 \end{cases}
\quad\text{and}\quad
\begin{cases} Lu_2^{(k+1)} = f & \text{in } \Omega_2 \\ u_2^{(k+1)} = u_1^{(k)}|_{\Gamma_2} & \text{on } \Gamma_2 \\ u_2^{(k+1)} = 0 & \text{on } \partial\Omega_2\setminus\Gamma_2. \end{cases}
\]

The next iterate is computed by
\[ u^{(k+1)}(x) = \begin{cases} u_2^{(k+1)}(x) & \text{if } x\in\Omega\setminus\Omega_1 \\ u_1^{(k+1)}(x) & \text{if } x\in\Omega\setminus\Omega_2 \\ \dfrac{u_1^{(k+1)}(x)+u_2^{(k+1)}(x)}{2} & \text{if } x\in\Omega_1\cap\Omega_2. \end{cases} \]

As we can see, there are no dependencies between the subdomains. Hence all subdomains can be assigned to different processors and computed in parallel without further modification. A coloring as used by the multiplicative Schwarz method is not necessary. Since in iteration $k+1$ the boundary values are taken from iteration $k$, this approach is akin to the well-known Jacobi method.
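Both Schwarz variants can be compared on a one dimensional model problem. The sketch below is an illustration under assumptions, not the implementation used for Figure 5.4: it solves $-u'' = 3x^2$ on $(-1,1)$ with homogeneous Dirichlet conditions on a grid with $h = 1/32$, uses two subdomains overlapping by $2h$, and counts the iterations until the iterate agrees with a direct solve. All function names and the tolerance are hypothetical.

```python
def solve_dirichlet(f_vals, h, left, right):
    # Thomas algorithm for -u'' = f on the interior nodes of one interval,
    # with Dirichlet values `left` and `right` at its two end points.
    n = len(f_vals)
    rhs = [h * h * fv for fv in f_vals]
    rhs[0] += left
    rhs[-1] += right
    c = [0.0] * n
    d = [0.0] * n
    c[0] = -0.5
    d[0] = rhs[0] / 2.0
    for i in range(1, n):
        denom = 2.0 + c[i - 1]
        c[i] = -1.0 / denom
        d[i] = (rhs[i] + d[i - 1]) / denom
    u = [0.0] * n
    u[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        u[i] = d[i] - c[i] * u[i + 1]
    return u

def multiplicative_step(u, fv, h, b1, a2):
    # Subdomain 1 has interior nodes 1..b1-1, subdomain 2 has a2+1..n.
    n = len(u) - 2
    new = list(u)
    new[1:b1] = solve_dirichlet(fv[1:b1], h, 0.0, new[b1])
    # The second solve already uses the updated value at node a2.
    new[a2 + 1:n + 1] = solve_dirichlet(fv[a2 + 1:n + 1], h, new[a2], 0.0)
    return new

def additive_step(u, fv, h, b1, a2):
    n = len(u) - 2
    # Both solves only use boundary data from the previous iterate.
    u1 = solve_dirichlet(fv[1:b1], h, 0.0, u[b1])
    u2 = solve_dirichlet(fv[a2 + 1:n + 1], h, u[a2], 0.0)
    new = list(u)
    for i in range(1, n + 1):
        in1, in2 = i < b1, i > a2
        if in1 and in2:
            new[i] = 0.5 * (u1[i - 1] + u2[i - a2 - 1])  # average in the overlap
        elif in1:
            new[i] = u1[i - 1]
        else:
            new[i] = u2[i - a2 - 1]
    return new

def iterations_needed(step, n=63, b1=33, a2=31, tol=1e-8, max_iter=2000):
    h = 2.0 / (n + 1)
    fv = [3.0 * (-1.0 + i * h) ** 2 for i in range(n + 2)]
    reference = [0.0] + solve_dirichlet(fv[1:n + 1], h, 0.0, 0.0) + [0.0]
    u = [0.0] * (n + 2)
    for k in range(1, max_iter + 1):
        u = step(u, fv, h, b1, a2)
        if max(abs(a - b) for a, b in zip(u, reference)) < tol:
            return k
    return None

k_multiplicative = iterations_needed(multiplicative_step)
k_additive = iterations_needed(additive_step)
```

In this setting the additive variant needs noticeably more iterations than the multiplicative one, mirroring the relation between the Jacobi and Gauss-Seidel methods, but its two subdomain solves are independent of each other.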

5.4 Application

Using domain decomposition methods for image processing was inspired by an approach of M. Fornasier and C.-B. Schönlieb (cf. [FS07], [For07]). In analogy to the multiplicative Schwarz algorithm they present a subspace correction method to solve the minimization problem (3.6), based on the following iteration procedure:
\[ \begin{cases} u_1^{(k+1)} \approx \arg\min_{v_1\in V_1} J(v_1+u_2^{(k)}) \\ u_2^{(k+1)} \approx \arg\min_{v_2\in V_2} J(u_1^{(k+1)}+v_2) \\ u^{(k+1)} := u_1^{(k+1)}+u_2^{(k+1)} \end{cases} \]
with initial condition $u^{(0)} = u_1^{(0)}+u_2^{(0)}\in V_1\oplus V_2$ and $V_1$, $V_2$ a splitting of the original space into two orthogonal subspaces.

The subspace minimizations are solved by oblique thresholding, where the projection is computed by the algorithm proposed by Chambolle [Cha04] (see also Section 4.2). They also provide a modification for parallel computation.

[...] convergence to a point where $J$ is smaller than for the initial choice. However, the numerical results are still promising.

The algorithm requires the computation of a fixed point $\eta$, which can be restricted to a small strip around the interface. Unfortunately the width of the strip depends on the parameter $\lambda$; in particular, for increasing $\lambda$ (i.e. stronger smoothing) the strip size decreases.

Since the primal-dual Newton method as introduced in Section 4.3.2 amounts to solving a linear system, we can apply the Schwarz approach directly (see Section 7.3) and thus only need an overlap of one pixel, independent of the choice of $\lambda$.


Chapter 6 Basic Parallel Principles

Traditionally, software has been written for sequential computation, which means that the program runs on a single computer having a single CPU (Central Processing Unit), whereas parallelization allows programs to run on multiple CPUs. The computing resource for a parallel computation can be a single computer with multiple CPUs, a network of computers with single CPUs or a hybrid of both. The aim of parallelization is to save computational time. More importantly, since some computations are limited by their memory requirements, they only become feasible when the given data is divided among several processors.

Once we have decomposed our domain into several subdomains we want to distribute them among multiple tasks. Our choice of a parallel programming model is the Message Passing Model. It can be used on Shared Memory machines, Distributed Memory machines as well as on hybrid architectures. Since modern computers, like our test system ZIVHP (see Section 8.2 for more details), employ both shared and distributed memory architectures, this is of great importance.

For the implementation of our algorithms we have used MATLAB®. In order to make the Message Passing Interface (MPI) available in MATLAB®, we additionally used MatlabMPI, provided by the Lincoln Laboratory of the Massachusetts Institute of Technology (MIT). In this chapter we want to state what MPI is and why we have used it for our problem. First we want to give a brief introduction to MPI.

6.1 MPI

The Message Passing Interface (MPI) standard in its first version was introduced in 1994 by the MPI Forum. It supplies a set of C, C++ and Fortran functions for writing parallel programs by using explicit communication between the tasks. At the time of writing, MPI is available in version 2.1.

The actual number of processes used is declared at startup. Processes that should communicate with each other are grouped in so-called communicators, where a priori all processes belong to a predefined communicator called MPI_COMM_WORLD, which is the only one we will use. All processes are numbered consecutively starting at 0; this number is called the rank of a process and can be obtained at runtime with the function MPI_Comm_rank. Analogously the size of the communicator (i.e. the number of all processes) can be obtained via MPI_Comm_size. Point-to-point communication (e.g. MPI_Recv, MPI_Send) as well as collective operations (e.g. MPI_Bcast) are available.

Since we only use MatlabMPI, where just a few of the communication methods are implemented, we restrict a detailed consideration to these methods.

6.1.1 MatlabMPI

MatlabMPI provides a set of MATLAB® scripts implementing some of the essential MPI routines, in particular MPI_Send, MPI_Recv and MPI_Bcast. One main difference to MPI is the fact that MatlabMPI uses the MATLAB® standard I/O, i.e. the buffer files are saved in .mat files.

The MatlabMPI routines are structured as follows:

• MPI_Send(dest,tag,comm,var1,var2,...) where

    dest             rank of the destination task
    tag              unique tag for communication
    comm             an MPI communicator (typically MPI_COMM_WORLD)
    var1,var2,...    variables to be sent

• [var1,var2,...] = MPI_Recv(source,tag,comm) where

    source           rank of the source task
    tag              unique tag for communication
    comm             an MPI communicator (typically MPI_COMM_WORLD)
    var1,var2,...    variables to be received

Note that MPI_Send is non-blocking in the sense that the next statement can be executed immediately after the message has been saved in the .mat file, whereas MPI_Recv is blocking, i.e. the program is suspended until the message has been received. To make MatlabMPI available in MATLAB®, the src folder has to be added to the path definitions, as well as the folder the m-file is started from. One has to ensure that the path definitions are set a priori at each start of a MATLAB® session.

A script using MatlabMPI can be run as follows:

eval(MPI_Run(m-File, processes, machines));

where m-File is the script to be started, processes the number of processes and machines the machines that should be used. For running on local processors, machines is set to {}; otherwise it contains the list of nodes, which are then connected via SSH (resp. RSH). Before running a script, all files created by MatlabMPI have to be removed using the MatMPI_Delete_all command. For more detailed information see also the README files or the introduction given on the MatlabMPI homepage.

6.2 Speedup

To measure how much faster a parallel algorithm on $p$ processors is in comparison to its corresponding sequential version, one can look at the speedup, which is defined by
\[ S_p := \frac{T_1}{T_p}. \]

Here $T_1$ denotes the execution time of the sequential algorithm and $T_p$ the time needed on $p$ processors. One says that the speedup is linear or ideal if $S_p = p$. In most cases the speedup is lower than linear ($S_p<p$), which results from the overhead arising in parallel programming. The overhead occurs e.g. due to the communication effort between the processors, extra redundant computations, or a changed algorithm. Contrary to first impressions, a superlinear speedup ($S_p>p$) is also possible. A main reason for superlinear speedup is the cache effect: Due to the smaller problem size on each CPU the cache swapping is reduced and the memory access time decreases sharply.
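As a small illustration of the definition, the following sketch computes the speedup and the related efficiency $S_p/p$ (a quantity not used in the text above) from measured run times. The timings in the usage example are made up and the function names are hypothetical.

```python
def speedup(t_sequential, t_parallel):
    # S_p = T_1 / T_p
    return t_sequential / t_parallel

def efficiency(t_sequential, t_parallel, processors):
    # Fraction of the ideal (linear) speedup that is actually reached.
    return speedup(t_sequential, t_parallel) / processors

# Made-up timings: 120 s sequentially, 40 s on 4 processors.
s4 = speedup(120.0, 40.0)
e4 = efficiency(120.0, 40.0, 4)
```

Here $S_4 = 3 < 4$, so the speedup in this made-up example would be lower than linear.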


Chapter 7 Numerical Realization

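In this chapter we specify the numerical realization of the primal-dual Newton method with damping as introduced in Section 4.3.2. Since we apply the anisotropic total variation (cf. (3.5)), we make use of a slightly changed penalty term $F$:
\[ F(p^1,p^2) = \frac{1}{2}\max\{|p^1|-1,0\}^2 + \frac{1}{2}\max\{|p^2|-1,0\}^2 \]
and hence
\[ H(p) = \begin{pmatrix} \mathrm{sgn}(p^1)\cdot(|p^1|-1)\cdot\mathbf{1}_{\{|p^1|\ge 1\}} \\ \mathrm{sgn}(p^2)\cdot(|p^2|-1)\cdot\mathbf{1}_{\{|p^2|\ge 1\}} \end{pmatrix} \]
with derivative (the Hessian of $F$)
\[ H'(p) = \begin{pmatrix} \mathbf{1}_{\{|p^1|\ge 1\}} & 0 \\ 0 & \mathbf{1}_{\{|p^2|\ge 1\}} \end{pmatrix}. \tag{7.1} \]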
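For a single pair $(p^1,p^2)$ the two expressions above can be transcribed directly. The sketch below is only meant to make the case distinction explicit; the function names are hypothetical.

```python
def penalty_gradient(p1, p2):
    # H(p): derivative of F, nonzero only where a component violates |p| <= 1.
    def component(s):
        if abs(s) >= 1.0:
            return (abs(s) - 1.0) * (1.0 if s > 0 else -1.0)
        return 0.0
    return (component(p1), component(p2))

def penalty_hessian(p1, p2):
    # H'(p) from (7.1): a diagonal matrix of indicator values.
    def indicator(s):
        return 1.0 if abs(s) >= 1.0 else 0.0
    return ((indicator(p1), 0.0), (0.0, indicator(p2)))

gradient_example = penalty_gradient(1.5, -0.5)
hessian_example = penalty_hessian(1.5, -0.5)
```

Only the first component violates the constraint in this example, so only the first diagonal entry of $H'(p)$ is active.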

Note that we restrict ourselves to the two dimensional case, but an adaptation to higher dimensions can easily be done. First let us assume that $f\in[0,1]^{n\times n}$ is a given noisy image. The aim is to find a denoised image $u$ by solving the linear system (4.16).

Placing the degrees of freedom of the dual variable $p$ in the center between the pixels of $u$ allows us to compute the divergence of $p$ (using a one-sided difference quotient) efficiently as a value in each pixel (cf. Fig. 7.1). For our problem this looks as follows:

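Define a discrete gradient $\nabla u$ by
\[ \nabla u := ((\nabla u)^1, (\nabla u)^2) \tag{7.2} \]
with
\[ (\nabla u)^1_{i,j} = \begin{cases} u_{i+1,j}-u_{i,j} & \text{if } i<n \\ 0 & \text{if } i=n \end{cases} \qquad (\nabla u)^2_{i,j} = \begin{cases} u_{i,j+1}-u_{i,j} & \text{if } j<n \\ 0 & \text{if } j=n \end{cases} \]
for $i,j = 1,\dots,n$.

\[
\begin{array}{ccccccc}
\bullet\,u_{11} & \ast\,p^2_{11} & \bullet\,u_{12} & \ast\,p^2_{12} & \cdots & \ast\,p^2_{1,n-1} & \bullet\,u_{1n} \\
\ast\,p^1_{11} & & \ast\,p^1_{12} & & \cdots & & \ast\,p^1_{1n} \\
\bullet\,u_{21} & \ast\,p^2_{21} & \bullet\,u_{22} & \ast\,p^2_{22} & \cdots & \ast\,p^2_{2,n-1} & \bullet\,u_{2n} \\
\ast\,p^1_{21} & & \ast\,p^1_{22} & & \cdots & & \ast\,p^1_{2n} \\
\vdots & & \vdots & & \ddots & & \vdots \\
\ast\,p^1_{n-1,1} & & \ast\,p^1_{n-1,2} & & \cdots & & \ast\,p^1_{n-1,n} \\
\bullet\,u_{n1} & \ast\,p^2_{n1} & \bullet\,u_{n2} & \ast\,p^2_{n2} & \cdots & \ast\,p^2_{n,n-1} & \bullet\,u_{nn}
\end{array}
\]

Figure 7.1: The degrees of freedom of the vector field $p = (p^1,p^2)$ lie between the centers of the pixels of $u$, where $u$ is represented by an $(n\times n)$ matrix. So $p^1$ can be construed as an $((n-1)\times n)$ matrix and $p^2$ as an $(n\times(n-1))$ matrix.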

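Hence the discrete divergence, as the negative adjoint operator of the gradient, is given by
\[ (\nabla\cdot p)_{i,j} =
\begin{cases} p^1_{i,j}-p^1_{i-1,j} & \text{if } 1<i<n \\ p^1_{i,j} & \text{if } i=1 \\ -p^1_{i-1,j} & \text{if } i=n \end{cases}
\;+\;
\begin{cases} p^2_{i,j}-p^2_{i,j-1} & \text{if } 1<j<n \\ p^2_{i,j} & \text{if } j=1 \\ -p^2_{i,j-1} & \text{if } j=n. \end{cases} \tag{7.3} \]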
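That (7.3) is indeed the negative adjoint of (7.2) can be checked numerically via $\langle\nabla u,p\rangle = -\langle u,\nabla\cdot p\rangle$. The following sketch uses 0-based indices and stores $p^1$, $p^2$ as full $n\times n$ arrays whose last row and last column, respectively, are simply never used; these storage choices and all names are assumptions made for the illustration.

```python
def gradient(u):
    # Discrete gradient (7.2) with 0-based indices.
    n = len(u)
    g1 = [[u[i + 1][j] - u[i][j] if i < n - 1 else 0 for j in range(n)]
          for i in range(n)]
    g2 = [[u[i][j + 1] - u[i][j] if j < n - 1 else 0 for j in range(n)]
          for i in range(n)]
    return g1, g2

def divergence(p1, p2):
    # Discrete divergence (7.3) with 0-based indices; requires n >= 2.
    n = len(p1)

    def backward_difference(current, previous, k):
        if k == 0:
            return current           # `previous` is ignored in this case
        if k == n - 1:
            return -previous
        return current - previous

    return [[backward_difference(p1[i][j], p1[i - 1][j], i)
             + backward_difference(p2[i][j], p2[i][j - 1], j)
             for j in range(n)] for i in range(n)]

def inner(a, b):
    return sum(a[i][j] * b[i][j] for i in range(len(a)) for j in range(len(a)))

# Arbitrary integer test data, so that the identity holds exactly.
n = 4
u = [[(3 * i + j * j) % 7 for j in range(n)] for i in range(n)]
p1 = [[(i + 2 * j) % 5 - 2 for j in range(n)] for i in range(n)]
p2 = [[(2 * i + j) % 3 - 1 for j in range(n)] for i in range(n)]
g1, g2 = gradient(u)
adjoint_defect = inner(g1, p1) + inner(g2, p2) + inner(u, divergence(p1, p2))
```

The quantity `adjoint_defect` vanishes for any choice of `u`, `p1` and `p2`, which is exactly the adjointness relation.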

7.1 Sequential Implementation

After having discretized the linear system (4.16), we describe the implementation process in detail. We first want to construct a matrix $A$ and a vector $b$ such that, under an appropriate renumbering of $u$ and $p$, the matrix equation
\[ Ax = b \tag{7.4} \]
is equivalent to the discretized linear system. For this purpose let us write $u$, $p^1$ and $p^2$ column-wise in vectors:
\begin{align*}
\vec{u} &= (u_{11},\dots,u_{n1},\ u_{12},\dots,u_{n2},\ \dots,\ u_{nn})^T, \\
\vec{p}^{\,1} &= (p^1_{11},\dots,p^1_{n-1,1},\ p^1_{12},\dots,p^1_{n-1,2},\ \dots,\ p^1_{n-1,n})^T, \\
\vec{p}^{\,2} &= (p^2_{11},\dots,p^2_{n1},\ p^2_{12},\dots,p^2_{n2},\ \dots,\ p^2_{n,n-1})^T.
\end{align*}
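The column-wise renumbering corresponds to stacking the columns of each matrix on top of each other. A minimal sketch (with hypothetical helper names, matrices given as lists of rows) could look as follows; the position formula uses the 1-based indices of the text.

```python
def flatten_columnwise(matrix):
    # Stack the columns of a matrix (list of rows) into one vector.
    rows, cols = len(matrix), len(matrix[0])
    return [matrix[i][j] for j in range(cols) for i in range(rows)]

def vector_position(i, j, rows):
    # 1-based position of the entry (i, j) in the stacked vector;
    # rows = n for u and p^2, rows = n - 1 for p^1.
    return (j - 1) * rows + i

stacked = flatten_columnwise([[11, 12], [21, 22]])
```

For $u$ this maps $u_{ij}$ to position $(j-1)n+i$, while for $p^1$, which has only $n-1$ rows, the position is $(j-1)(n-1)+i$.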
