Explorations on anisotropic regularisation of dynamic inverse problems by bilevel optimisation

(1)

inverse problems by bilevel optimisation

Martin Benning1_{, Carola-Bibiane Schönlieb}1_{, Tuomo} Valkonen1, Verner Vlačić1

1 _{Department of Applied Mathematics and Theoretical Physics, University of} Cambridge, United Kingdom

Abstract. We explore anisotropic regularisation methods in the spirit of [1].

Based on ground truth data, we propose a bilevel optimisation strategy to compute the optimal regularisation parameters of such a model for the application of video denoising. The optimisation poses a challenge in itself, as the dependency on one of the regularisation parameters is non-linear such that the standard existence and convergence theory does not apply. Moreover, we analyse numerical results of the proposed parameter learning strategy based on three exemplary video sequences and discuss the impact of these results on the actual modelling of dynamic inverse problems.

(2)

1. Introduction

In this paper, we employ bilevel optimisation to explore different choices of spatial-and temporal regularisation in a variational image reconstruction model.

Variational regularisation methods are extremely popular and versatile tools when it comes to computing approximate solutions to ill-posed inverse problems. Given the assumption of normal distributed noise, they usually have the form of a generalised Tikhonov-type regularisation, i.e.

uα∈R(α) := arg min u∈L2_(Ω)_∩U

₁

2kKu−gk 2

2+G(u;α)

, (1.1)

whereK ∈ L(L2(Ω), L2(Σ))is a bounded, linear operator mapping from the Hilbert spaceL2(Ω) to the Hilbert spaceL(Σ), withΩandΣ being bounded and connected domains. The function g ∈ L2(Σ) represents the given measurement data, and the normk · k2 simply denotes theL2(Σ)-norm. Given a signalu∈ U and regularisation

parametersα∈ V, the functionalG:U ×V →Rrepresents the regulariser, for Banach

spacesU and V.

Particularly in imaging and image processing applications, proper, lower semi-continuous (l.s.c), convex and non-smooth regularisers have attracted a lot of attention over the last two decades. Various types of total variation regularisation [2] and `1

regularisation of unitary transformed signals [3] have been proposed, in order to exploit sparsity of a signal with respect to a given representation.

Despite allowing for significant improvements in terms of reconstruction quality, non-smooth regularisation methods suffer from introducing systematic modelling artefacts like any other regularisation method; in case of total variation regularisation for instance, the regularisation method is well-known to introduce piecewise constant approximations of noisy, non-constant regions, which is known as the stair-casing effect (cf. [4, Section 4.2]).

In order to compensate for modelling artefacts, the concept of infimal convolutions can be used for combining the advantages of different regularisers into one. The infimal convolution of two proper, l.s.c. and convex functionalsJ1 andJ2 is defined as

(J1J2)(u) := inf

u=v+wJ1(v) +J2(w). (IC)

In [5, 6, 7, 8] it has been shown that an infimal convolution of the total variation and a higher-order total variation is beneficial for fighting the stair-casing phenomenon, and that infimal convolutions in general are useful in order to reconstruct functions that are additive compositions of functions which can individually be recovered by different regularisers.

(3)

also the decomposition of the latter into a sequence of images encoding dominantly temporal information and another sequence encoding dominantly spatial information. In what follows we want to learn optimal decompositions in this regularisation in the context of video de-noising by an appropriate bilevel optimisation approach. In the context of (1.1) the associated bilevel learning problem we will discuss looks for an optimal parameter vector αthat solves for some convex, proper, weak* lower semicontinuous cost functionalF :X →Rthe problem

min

α∈PF(uα) (P)

subject touαbeing a solution of (1.1).

In the context of computer vision and image processing bilevel optimisation is considered as a supervised learning method that optimally adapts itself to a given dataset of measurements and desirable solutions. In [9, 10, 11, 12], for instance the authors consider bilevel optimization for finite dimensional Markov random field models. In inverse problems the optimal inversion and experimental acquisition setup is discussed in the context of optimal model design in works by Haber, Horesh and Tenorio [13, 14], as well as Ghattas et al. [15, 16]. Recently parameter learning in the context of functional variational regularisation models (1.1) also entered the image processing community with works by the authors [17, 18, 19, 20, 21, 22], Kunisch, Pock and co-workers [23, 24, 25], Chung et al. [26], Hintermüller et al. [27] and others [28, 29, 30]. Interesting recent works also include bilevel learning approaches for image segmentation [31], learning and optimal setup of support vector machines [32] and learning discrete reaction-diffusion filters [33].

What we show is closest in flavour to recent applications of bilevel optimisation to supervised learning of optimal parameters in a total variation type regularisation model [34, 23, 35, 36]. The main difference in the theory and computational realisation to these works and the model discussed in this paper is due to the nonlinear dependence of the lower level problem on the parameter vectorαthat the model is optimised for. To our knowledge, this is the first publication that deals with learning of non-linear regularisation parameters in the context of regularisation of inverse problems.

The paper is organised as follows. First, we are going to recall the concept of spatio-temporal regularisation via infimal convolutions of regularisers. Then, we are going to present the bilevel optimisation framework for learning the regularisation parameters of the infimal convolution regularisers. Subsequently, we are going to address the numerical aspects of the parameter learning strategy. We then conclude with numerical examples and their discussion. We particularly want to address the question of how realistic the assumption of the coupling of the spatial and temporal regularisation is, given three different types of images sequences.

2. Regularisation of dynamic inverse problems

Let g = (g1, . . . , gm) ∈ L1(Ω;Rm). We denote by k · k2,1 the L1(Ω)-norm of the

two-norm of vector-valued functions, namely

kgk2,1:= Z

Ω

(4)

wherekg(x)k2= p

g1(x)2+. . .+gm(x)2. Based on [1], we introduce the anisotropic derivative∇κ and its negative adjointdivκdefined for a scalar κ∈(0,1) as

∇κ=

κ ∂ ∂x, κ

∂

∂y,(1−κ) ∂ ∂t

, and

divκ=κ ∂ ∂x+κ

∂

∂y+ (1−κ) ∂ ∂t,

(2.1)

With these, and for u ∈ W1,1_(Ω) _{we define the following dynamic regularisation}

functionals: Here (IC-TVTV) is a direct adaptation of the ICTV functional proposed

G(u;α1, α2, κ) = inf

u=v+wα1k∇κvk2,1+α2k∇1−κwk2,1 (IC-TVTV)

G(u;α1, α2, κ) = inf

u=v+w α1

2 k∇κvk 2

2+α2k∇1−κwk2,1 (IC-L2TV)

G(u;α1, α2) =α1k∇uk2,1+α2k

∂

∂tuk1 (Rigid TVTV)

G(u;α1, α2) =

α1 2 k∇uk

2 2+α2k

∂

∂tuk1 (RigidL

2_TV)

Table 1: The different dynamic regularisers used throughout this paper.

in [1]. Regulariser (IC-L2_{TV) is a modification that allows for different regularisation}

models; in case we have picked two rather complementary regularisation functionals, with the L2 _{norm of the gradient complementing the total variation regularisation.}

Regularisers (Rigid TVTV) and (RigidL2TV) can almost be seen as limiting cases of (IC-TVTV) and (IC-L2TV), respectively. Choosing κ ∈ {0,1} and restricting solutions to v = w converts (IC-TVTV) and (IC-L2TV) into (Rigid TVTV) and (RigidL2TV).

The basic motivation for these models is a decomposition of a dynamic image sequence into a spatial and a temporal component, such that these components are penalised individually with suitable regularisation functionals. However, in order to allow for space-time correspondence in these sequences, an additional anisotropy parameterκis introduced that ensures neither one of the components to be penalised by the spatial or the temporal regulariser alone. Speaking of spatial and temporal components, a spatial component is regularised in space only, whereas the temporal component is regularised in only in time. In (2.1) this corresponds to the extreme cases κ= 0 (only temporal regularisation) and κ = 1 (only spatial regularisation). The restriction toκ∈(0,1), which further ensures spatial and temporal regularity in both components, also guarantees well-posedness of (1.1).

A major challenge for successful regularisation of dynamic image sequences via any of the regularisers listed in Table 1 is the ’optimal’ choice of parametersα1, α2

and κ. On the one hand, this is due to the number of parameters itself, making it difficult to employ heuristic parameter choice rules that may succeed in case of single parameters. On the other hand, the dependency of the model on the regularisation parameterκis non-linear, making it even harder to optimise for.

(5)

3. Optimising the parameters—the theory

It is not immediately clear, which parameter choices forα1,α2andκmay be optimal

for dynamic inverse problems regularised by one of the anisotropic regularisers given in Table 1. Parameter learning approaches, as discussed in the introduction, provide a means towards studying optimal parameter choices with respect to a known ground truth. We concentrate in particular on the bilevel optimisation framework, first presented in [34, 23] and further studied in the context of multi-parameter regularisers in [35, 36]. The general form therein is given as

min α∈P

1

2kR(α)−gk 2

2 s.t. R(α) = arg min

u∈L1_(Ω)

1

2kf−Kuk 2

2+G(u;α). (3.1)

Here α denotes the vector of parameters we are optimising for, andP is our space of allowed parameters; generally α = (α1, α2, κ) with P = [0,∞)2×[0,1] for the

infimal convolution regularisers, and α = (α1, α2) with P = [0,∞)2 for the rigid

regularisers. Here,kR(α)−gk2is the cost functionalF(R(α))measuring the distance

of the denoised solutionR(α)from the ground truth originalg, andGis a regulariser for the lower level problem of reconstructingufrom noisyf.

3.1. Existence of solutions

A general existence and convergence theory for solutions α to (3.1) is presented in [35] whenGis linear inα and defined over the space of bounded variation functions, i.e. k∇uk2,1 in Table 1 is replaced by

sup

ϕ∈C∞₀ (Ω;Rm) kϕk2,∞≤1

Z

Ω

u(x) (divκϕ)(x)dx,

where the notationk · k2,∞is analogous tok · k2,1, but with theL∞(Ω)-norm replacing

theL1_(Ω)_{-norm. Our regularisers are not linear in}_κ_{however, so an extended theory}

would be needed. With an additional elliptic regularisation, however, we can still show existence of solutions easily.

Proposition 3.1. Consider the problem

min α∈P

1

2kR(α)−gk 2

2 s.t. R(α) = arg min

u∈L1_(Ω)

1

2kf−Kuk 2

2+G(u;α) +

2k∇uk 2_, _(3.2)

where G is one of the regularisers from Table 1, P the corresponding admissible set of parameters, and >0. Suppose constant functions are not in the null space ofK. Then there exists an optimal solution α∈ P to (3.2).

Proof. Note that all regularisers presented in Table 1 are lower semi-continuous with respect to the mutual convergence of α in R3 and of u in L1. They are moreover

continuous with respect toαfor fixedu. Let us first consider the inner problem

arg min

u∈L1_(Ω)

J(u;α) := 1

2kf−Kuk 2

2+G(u;α) +

2k∇uk 2

(6)

first. The term ₂k∇uk2_{and the fact that constant functions are not in the kernel of}

Kguarantees weak convergence inH1_(Ω)_{of a (subsequence of a) minimising sequence} {uk_}_{regardless of the choice of}_α_{in the inner problem. This implies the convergence in} L1_{, and consequently by the standard argument of calculus of variations, an existence}

of a solutionR(α)to the inner problem.

For a minimising sequence{αk_}_{of the whole problem (3.2) we therefore get weak} convergence inL2_of_uk _:=_R₍_αk₎_{to some}_{u. But since}_ˆ _k_uk₋_g_k2

2→ ku−gk22for such

a minimising sequence, we have so-called strict convergence ofuk₋_g _to_u₋_{g. In} _L2

this implies strong convergence of firstuk₋_g_to_u₋_{g, and then of}_uk _to_{u. Suppose}_ˆ alsoαk _→_α_ˆ_{. We then compute}

J(u; ˆα) = lim inf

k→∞ J(u;α

k_{) +}_G₍_u_{; ˆ}_α₎₋_G₍_u_;_αk₎

≥lim inf

k→∞ J(u;α

k_{) + lim inf}

k→∞ G(u; ˆα)−G(u;α

k₎

.

Using the continuity ofGwith respect toα, we therefore obtain

J(u; ˆα) = lim inf

k→∞ J(u;α

k₎_≥_{lim inf} k→∞ J(u

k_;_αk₎_≥_J_(ˆ_u_{; ˆ}_α₎_.

This shows thatuˆ=R( ˆα), and hence thatαˆ solves (3.2).

Remark 3.1. Note that we only used to show existence of solutions to the inner problem. In particular, we do not require > 0 to be constant, but we can allow =(α) to vary continuously with respect toα.

3.2. Derivative of the solution map

In order to use standard optimisation methods, such as BFGS, to minimise (3.1) we need to calculate a gradient of the solution mapR; equivalently, following the PDE approach of [34], we need to solve a so-called adjoint equation. A solution is given by the classical implicit function theorem [37, Theorem 4.E], and in particular the version in [38, Corollary 4.34] with relaxed assumptions. Let us suppose Gis differentiable, such thatR(α)is given as the solutionu=uα to

0 =S(u,α) :=K∗(Ku−f) +∇uG(u;α).

Then, ifS is strictly differentiable, and∇uS(u,α)is invertible, we have

∇R(α) = [∇uS(u,α)]−1∇αS(u,α). (3.4)

The strict differentiability of S may be achieved by a “second-degree” Huber regularisation [39]. The invertibility of∇uS(u,α)can be guaranteed by an additional elliptic regularisation term₂k∇uk2

2. Both extra regularisations are the same as already

employed in [34], where an alternative adjoint equation route was taken to avoid direct construction of∇R. We now take a closer look at the derivatives of the solution maps for the regularisers (IC-TVTV) and (IC-L2_{TV). Furthermore, as we are going to}

restrict ourselves to the regularisation of dynamic video sequences, we considerK to be the identity operator mapping fromL2_(Ω) _to _L2_(Ω) _{throughout the remainder of}

(7)

3.3. Derivative of the IC TVTV solution map

To use the formula (3.4), we need the derivative of the map

R(α1, α2, κ) =P1Rˆ(α1, α2, κ)

forP1(u, w) :=u, and

ˆ

R(α1, α2, κ) := arg min (u,w)

1

2ku−gk 2

2+α1k∇κ(u−w)k2,1+α2k∇1−κ(w)k2,1.

Clearly

∇R(α1, α2, κ) =P1∇Rˆ(α1, α2, κ) (3.5)

if the latter derivative exists. ForRˆ(α1, α2, κ)to have a unique minimiser and to be

differentiable, we further replaceRˆ(α1, α2, κ)with

ˆ

Rγ,(α1, α2, κ) = arg min (u,w)

Jγ,(u, w, α1, α2, κ) (3.6)

for

Jγ,(u, w, α1, α2, κ) := 1 2ku−gk

2

2+α1Θ(∇κ(u−w)) +α2Θ(∇1−κ(w)) +

1 2

Z

Ω

w dx

2

.

Here we use the notation

Θ(v) :=kvkγ+

2kvk 2

2 (3.7)

for the sum of the Huber-regularised1-normkvkγ =

R

ΩHγ(kv(x)k2)dxwith

Hγ(r) =

(

|r| −γ₂ |r| ≥γ

1 2γ|r|

2 _|_r_|_{< γ}

and parameter γ > 0, and additional Hilbert space regularisation with parameter >0. The last term of (3.6) eliminates the translational invariance inRwith respect

tow, thus forcing unique solutions as needed for the application of (3.4).

Note that existence of a solution(u, w)to (3.6) is guaranteed by simple application of Proposition 3.1 and Remark 3.1 provided κ 6∈ {0,1}. In that case Θ(∇κu) ≥ 0k∇uk2

2 for some0=0(κ, ).

For the following propositions we define the shorthand notations

Ψ1_κ(u) := [∇Θ](∇κ(u−w)), Ψ2κ(u) := [∇

2_Θ](_∇

κ(u−w))

and

Q:= I₀−_II,

where we denote by I the identity operator. We also model by c := χΩ the term R

Ωw dx=hc, wiin (3.6).

(8)

Proposition 3.2 (Derivative of the IC TVTV solution map). Let Rˆγ, be defined by (3.6)for someγ, >0, and

R(α1, α2, κ) :=P1Rγ,ˆ (α1, α2, κ).

Then

∇R(α1, α2, κ) =−P1

_∂S

∂(u, w)

−1 _∂S

∂(α1, α2, κ)

, (3.8a)

whereS=∇(u,w)Jγ, satisfies

∂S ∂(u, w) :=Q

∗

I−α1divκΨκ2(u−w)∇κ I

I I−α2div1−κΨ21−κ(w)∇1−κ+c⊗c

Q, (3.8b)

∂S ∂(α1, α2, κ)

:=Q∗

−divκΨ1κ(u−w) 0 T1

0 −div1−κΨ11−κ(w) T2

, (3.8c)

where

T1:=−α1

div− ∂

∂t∗

Ψ1κ(u−w)−α1divκΨ2κ(u−w)

∇ − ∂

∂t

(u−w), and (3.8d)

T2:=−α2

_∂

∂t∗−div

Ψ11−κ(w)−α2div1−κΨ21−κ(w) _∂

∂t− ∇

w. (3.8e)

The problem of finding the derivative of the solution map thus reduces to the problem of solving three linear systems with the operator (3.8b) for the right-hand side vectors the columns of (3.8c).

Proof. The optimality conditions for the solution u = Rγ,(α1, α2, κ) are obtained

from its Euler-Lagrange equationS(u, w, α1, α2, κ) = 0, which we split into

S1:=u−f −α1divκΨ1κ(u−w) = 0, and (3.9a)

S2:=α1divκΨ1κ(u−w)−α2div1−κΨ11−κ(w) +

Z

Ω

w dx= 0 (3.9b)

Now (S1, S2) = 0 defines (u, w)as an implicit function of (α1, α2, κ). By (3.4), the

Jacobian of the functionRγ,ˆ : (α1, α2, κ)7→(u, w)is given by

∂(u, w)

∂(α1, α2, κ) =−

_∂S

∂(u, w)

−1 _∂S

∂(α1, α2, κ)

(3.10)

Calculating all the partial derivatives and using (3.5), we obtain (3.8).

3.4. Derivative of the IC L2_{TV solution map}

Again we have a problem of the form

R(α1, α2, κ) =P1Rˆ(α1, α2, κ)

forP1(u, w) :=u, and

ˆ

R(α1, α2, κ) = arg min (u,w)

1

2ku−gk 2 2+

α1

2 k∇κ(u−w)k 2

2,1+α2k∇1−κ(w)k2,1. (3.11)

To employ the formula (3.4), we need to regulariseRˆ(α1, α2, κ)by replacing it with

ˆ

Rγ,(α1, α2, κ) = arg min

u,w

(9)

for

Jγ,(u, w, α1, α2, κ) := 1 2ku−gk

2 2+

α1

2 k∇κ(u−w)k 2

2+α2Θ(∇1−κ(w)) +

1 2 Z Ω w dx 2

Similar to Section 3.3, we obtain the following result.

Proposition 3.3 (Derivative of the IC L2_{TV solution map)}_. _Let _Rγ,ˆ _{be defined by}

(3.12)for someγ, >0, and

R(α1, α2, κ) :=P1Rγ,ˆ (α1, α2, κ).

Then

∇R(α1, α2, κ) =−P1

_∂S

∂(u, w)

−1 _∂S

∂(α1, α2, κ)

, (3.13a)

whereS=∇(u,w)Jγ, satisfies

∂S

∂(u, w) :=Q ∗

I−α1divκ∇κ I

I I−α2div1−κΨ21−κ(w)∇1−κ+c×c

Q (3.13b)

∂S ∂(α1, α2, κ)

:=Q∗

−divκ∇κ(u−w) 0 T1 0 −div1−κΨ11−κ(w) T2

(3.13c)

where

T2:=−α2 _∂

∂t∗ −div

Ψ1₁₋_κ(w)−α2div1−κΨ21−κ(w)

_∂

∂t− ∇

w, (3.13d)

T2:=−α1

div− ∂

∂t∗

∇κ(u−w)−α1divκ

∇ − ∂

∂t

(u−w). (3.13e)

HereQandcare as in Section 3.3.

Proof. As in Section 3.3, we find the optimal conditions for the minimization problem in (3.12) are

S1:=u−f−α1divκ∇κ(u−w) = 0 (3.14a)

S2:=α1divκ∇κ(u−w)−α2div1−κΨ11−κ(w) +

Z

Ω

w dx= 0 (3.14b)

The Jacobian of the functionRγ,: (α1, α2, κ)→uis again given by

∂u ∂(α1, α2, κ)

=−P1

_∂S

∂(u, w)

−1 _∂S

∂(α1, α2, κ)

. (3.15)

We thus calculate the partial derivatives with respect to all the variables to obtain (3.13).

4. Computational realisation

(10)

we update the quasi-Hessian only if it remains positive semi-definite. The following Armijo condition

F(α+σd)−F(α)≤σc∇F(α)·d (4.1) is used, whereF(α) =1₂kR(α)−gk2

2,dis the search direction,σthe step-length and

ca positive constant. The relative residual is used as a stopping criterion, so that the algorithm terminates if

kαi−αi−1k2< ρkαik2 (4.2)

is satisfied for a fixed, positive parameter ρ. As all models have to be differentiable in order to solve the upper level problem, we used theL2_norm

2k · k 2

2 with= 10−8

and the Huberised L1 normk · kγ with γ = 0.01 as our cost functionals in order to optimise the parameters(α1, α2, κ)for the regularisers in Table 1. We have further

used MATLAB’s inbuilt gmres function with a diagonally compensated incomplete Cholesky preconditioner in order to solve (3.10) and (3.15), respectively.

To compute numerical solutionsu(andw) of the lower-level problem for fixedα, the lower level problem of the bilevel optimisation (3.1) is solved via the primal-dual hybrid gradient method (PDHGM) as presented in [40]. In order to apply the PDHGM to the lower level problem, we need to recast it into a saddle-point formulation. In case of (IC-L2_{TV) for instance, this saddle-point formulation reads as}

min (u,w)

max (p,q) 1

2ku−gk 2 2+hA(

u w),(

p q)i −₂_α1

1kpk

2

2−δα2P(q), (4.3)

whereP ={p| kpk2,∞≤1} is the unit ball with respect to the supremum norm and

Ais a linear operator given by

A=

∇κ −∇κ

0 ∇1−κ

. (4.4)

To determine the step sizes in the PDHGM, we need to find a bound onkAk. Assuming κ∈[0,1], it is easy to show thatkAk2 _<₂₄_{. The other formulations can be cast to}

saddle-point formulations in the same fashion.

In order to discretise∇κ and∇1−κ, we simply use forward finite differences.

Note that the parameters , γ in (3.7) are numerical regularisation parameters only. As we run a relatively small number of PDHGM iterations to solve each inner problem, the solutions will be numerically inaccurate. This therefore allows us to ignoreγ and in the inner denoising problem, and to solve the original non-smooth problem instead. In the outer problem, we further do not restrict κ∈[0,1], as this is not strictly necessary. For reporting the results in a uniform fashion, we use the identity

k∇κvk1=|2κ−1| k∇ κ

2κ−1vk1, (4.5) which holds for the unsmoothed inner problems. Therefore every triple (α1, α2, κ)

withκ /∈[0,1]corresponds to a triple withκ∈(0,1).

5. Numerical results

For the implementation of the BFGS scheme we use the following numerical setup. The constant c in (4.1) is set to c = 10−4 _{throughout all experiments, whereas we}

use ρ = 10−8 _{in (4.2). In all cases the parameters} _α

(11)

lie in (10−5_,₁₀₀₎_{, and} _κ _{was constrained to lie in} ₍₋₅₀_,₅₀₎_{. We ran BFGS with}

100 different initialisations of the parameters α drawn from a 3rd order χ2_-squared

distribution with mean rescaled to the values (0.15,0.15) for TVTV and (3.9,0.15)

forL2_{TV, obtained by previous experimentation. The}_χ2_{-squared distribution is used}

in order to force the sampled parameters to be positive, but to not impose an upper bound. We report the afterwards optimised parameter α for which we obtained the best PSNR values for visual and quantitative comparison in the following figures and tables.

In order to solve the lower level problem, the PDHGM is run with a fixed number of iterations - 50 iterations in case the accelerated variant is applicable, otherwise 200 iterations, respectively. These were applied to the non-relaxed variants of the regularisers, as this allows us to leaveκunconstrained (see (4.5)). Each iteration uses warm initialisation with the optimal solution from the previous iteration.

We use three different2 + 1D video sequences for our denoising experiments: the sequences ’hand’, ’flight’ and ’harlem shake’. The sequence ’hand’, showing a hand falling onto a table, is of grid-size54×96and consists of 65 frames. It contains steady objects (like the table) and a moving object (the hand) that, at first not seen, moves onto the table where it becomes a steady object for the rest of the scene. The camera is fixed to a specific position. The sequence ’flight’ is filmed from a flying glider. Here the background scenery passes by, while also the camera is at movement. The movie has grid-size 96×54 and consists of 90 frames. The third sequence ’harlem shake’ is taken fromhttps://www.youtube.com/watch?v=-_ZG2xgNAr4. The original RGB video has grid-size540×720and consists of 711 frames. It has been converted to gray-scale double precision, and down-sampled to grid-size70×93and 73 frames in order to make processing in a reasonable amount of time possible. The video shows a room inside a lodge in which the majority of people are in a rather steady position, whereas one person is dancing (and therefore moving). Approximately half-way through the video the scene changes, and everyone is at movement. As in case of the video ’hand’, the camera is in a fixed position.

For all videos the underlying mesh-sizes are considered to be h = 1 in each dimension. Further, noisy versions of the video sequences have been created by perturbing the original sequences with Gaußian noise with mean zero and variance σ2_{= 0}_.₀₂_{respectively.}

5.1. Discussion of results

(12)

Table 2: “Hand” test video optimal results. The star ∗ in the optimal parameter means that the originalκfor the optimal result was outside the range[0,1], and the conversion (4.5) has been used to derive the presented values.

Model (α1, α2, κ) Opt. value PSNR SSIM

TVTV (0.162, 0.0844, 0.0466)∗ 104.5 32.08 0.9271 L2TV (9.85, 0.0674, 0.0529)∗ 127.5 31.22 0.9125

Table 3: “Flight” test video optimal results. The star ∗ in the optimal parameter means that the originalκfor the optimal result was outside the range[0,1], and the conversion (4.5) has been used to derive the presented values.

TVTV (0.0141, 0.017, 0.337)∗ 46.41 36.91 0.9569 L2TV (4.81, 0.0163, 0.542) 46.87 32.84 0.942

of the numerical reconstruction of (3.2) with (IC-L2TV) as a regulariser, for optimal parameters that are also given in Table 2. The results are similar to the TV-TV case; however, the spatial component shows a much smoother transition and therefore picks up the hand much earlier than TV-TV does. This also explains the quality-measure results in Table 2, as those tell us that TV-TV outperformsL2-TV for this sequence

both in terms of PSNR and SSIM.

For the next sequence, ’flight’, the situation looks quite similar in terms of PSNR and SSIM, however, the components are very different in this case. Figure 2a shows six consecutive frames of the original sequence, and the same frames from the sequence contaminated with noise. Figure 2b shows the reconstructions with (IC-TVTV) as a regulariser. Clearly, the temporal component in this case fails to pick up any information about the sequence, whereas all the information is stored in the spatial component. In case of (IC-L2TV), the optimal value forκis approximately 0.5 according to Table 3. Hence, we can hardly speak of a temporal and a spatial component in this case, but rather of an L2- and a TV-penalised component that are shown in Figure 2c. The first component is rather blurry, and more or less approximating the homogeneous sky and a blurred version of the cockpit. The second component on the other hand approximates the sharp features and structures, but not the homogeneous background parts.

For the last sequence shown in Figure 3, Harlem shake, we figure out that both (IC-TVTV) and (IC-L2_{TV) seem to perform almost equally well, indicating a small}

value ofκwhich is confirmed by Table 4. Given the nature of the scene, the temporal part captures most of the scene, as there are only very few pixels that remain (almost) unchanged over time. Some of those are captured in the spatial component, like parts of the table in the background on the left hand side for instance.

6. Conclusions & Outlook

(13)

Original ₀_.₈₀_s

Original ₀_.₈₇_s

Original ₁_.₃₇_s

Original ₁_.₄₃_s

Original ₁_.₅₀_s

Original ₁_.₆₀_s

NoisyNoisyNoisyNoisyNoisyNoisy

(a) Original and noisy videos

Reconstr.Reconstr.Reconstr.Reconstr.Reconstr.Reconstr.

T

emp

oral

T

emp

oral

T

emp

oral

T

emp

oral

T

emp

oral

T

emp

oral

SpatialSpatialSpatialSpatialSpatialSpatial

(b) TVTV reconstruction

T

emp

oral

T

emp

oral

T

emp

oral

T

emp

oral

T

emp

oral

T

emp

oral

(c)L2TV reconstruction

Figure 1: “Hand” test video reconstructions with optimally learned parameters using (3.2) for the ICTVTV and ICL2_{TV regularisation models. The reconstructions are}

depicted together with their temporal and spatial components. Note howL2_{TV picks}

(14)

Original ₁_.₃₃_s

Original ₁_.₆₇_s

Original ₂_.₀₀_s

Original ₂_.₃₃_s

Original ₂_.₆₇_s

Original ₃_.₀₀_s

T

emp

oral

T

emp

oral

T

emp

oral

T

emp

oral

T

emp

oral

T

emp

oral

T

emp

oral

T

emp

oral

T

emp

oral

T

emp

oral

T

emp

oral

T

emp

oral

(c)L2TV reconstruction

Figure 2: “Flight” test video reconstructions with optimally learned parameters using (3.2) for the ICTVTV and ICL2_{TV regularisation models. The reconstructions are}

depicted together with their temporal and spatial components. Note how TVTV manages to extract no temporal component, while L2_{TV manages to extract the}

(15)

Original

4.77s

Original

9.54s

Original

14.30s

Original

19.07s

Original

23.84s

Original

28.61s

T

emp

oral

T

emp

oral

T

emp

oral

T

emp

oral

T

emp

oral

T

emp

oral

T

emp

oral

T

emp

oral

T

emp

oral

T

emp

oral

T

emp

oral

T

emp

oral

(c)L2_{TV reconstruction}

(16)

Table 4: “Harlem shake” test video optimal results. The star ∗ in the optimal parameter means that the original κ for the optimal result was outside the range

[0,1], and the conversion (4.5) has been used to derive the presented values.

TVTV (0.0649, 0.0122, 0.0319) 39.52 37.67 0.9621 L2TV (23, 0.0114, 0.0265) 41.02 37.53 0.9604

for three distinctive video sequences. The sequences considered allow the assumption, that the use of the infimal convolution models for video denoising highly depends on the corresponding video sequence. As for sequences with stationary backgrounds, the decomposition into spatial and temporal components seems to work well, with the optimal κ being close to 0 or 1 allowing for a clear distinction of a temporal and a spatial component. For sequences like the ’flight’-sequence that lack stationary parts, the distinction is not clear at all, which is also underpinned by the optimalκ value being close to 0.5. In this setup the modelling assumption of multiple infimal convolutions of the same functional also breaks down, as they will be the same. A choice ofκ close to 0.5 only makes sense for infimal convolutions of functionals that promote very different information (like theL2TV model in our case).

Future research should address different error measures for the comparison of the denoised video sequences to the ground truth, in order to see, if different error measures will lead to similar or different conclusions. Further can the proposed research be easily extended to general, bounded linear operatorsK, which has been omitted here for the sake of brevity. Research on infimal convolutions of different, complementary regularisation functionals seems to be another promising direction that future research can head for.

Acknowledgements

M. Benning, C.-B. Schönlieb and T. Valkonen acknowledge support from the EPSRC grant Nr. EP/M00483X/1 and from the Leverhulme grant ‘Breaking the non-convexity barrier’. V. Vlačić was supported by a Bridgwater summer internship.

A data statement for the EPSRC

The code and data will be put into a repository as the final version is submitted.

References

[1] Holler M and Kunisch K 2014SIAM Journal on Imaging Sciences 72258–2300 [2] Rudin L I, Osher S and Fatemi E 1992Physica D: Nonlinear Phenomena60259–268 [3] Donoho D L 2006Information Theory, IEEE Transactions on521289–1306

[4] Burger M and Osher S 2013 A guide to the tv zooLevel Set and PDE Based Reconstruction Methods in Imaging(Springer) pp 1–70

[5] Chambolle A and Lions P L 1997Numerische Mathematik 76167–188

[6] Benning M, Brune C, Burger M and Müller J 2013Journal of Scientific Computing54269–310 [7] Benning M 2011Singular regularization of inverse problems Ph.D. thesis

(17)

[9] Roth and Black M J 2005 Fields of experts: A framework for learning image priorsComputer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on vol 2 (IEEE) pp 860–867

[10] Tappen M F 2007 Utilizing variational optimization to learn Markov random fieldsComputer Vision and Pattern Recognition, 2007. CVPR’07. IEEE Conference on (IEEE) pp 1–8 [11] Domke J 2012 Generic methods for optimization-based modelingInternational Conference on

Artificial Intelligence and Statisticspp 318–326

[12] Chen Y, Ranftl R and Pock T 2014Image Processing, IEEE Transactions onTo appear [13] Haber E and Tenorio L 2003Inverse Problems19611

[14] Haber E, Horesh L and Tenorio L 2010Inverse Problems26025002

[15] Bui-Thanh T, Willcox K and Ghattas O 2008SIAM Journal on Scientific Computation 30

3270–3288

[16] Biegler L, Biros G, Ghattas O, Heinkenschloss M, Keyes D, Mallick B, Tenorio L, van Bloemen Waanders B, Willcox K and Marzouk Y 2011 Large-scale inverse problems and quantification of uncertaintyvol 712 (John Wiley & Sons)

[17] De los Reyes J C and Schönlieb C B 2013Inverse Problems and Imaging 7

[18] Calatroni L, De los Reyes J C and Schönlieb C B 2014 Dynamic sampling schemes for optimal noise learning under multiple nonsmooth constraints System Modeling and Optimization (Springer Verlag) pp 85–95

[19] Reyes J C D L, Schönlieb C B and Valkonen T 2015 Journal of Mathematical Analysis and ApplicationsAccepted (PreprintarXiv:1505.01953)

[20] Reyes J C D L, Schönlieb C B and Valkonen T 2015 Bilevel parameter learning for higher-order total variation regularisation models submitted (PreprintarXiv:1508.07243)

[21] Calatroni L, De los Reyes J C and Schönlieb C B A variational model for mixed noise distribution in preparation

[22] Chung C V and De los Reyes J C Learning optimal spatially-dependent regularization parameters in total variation image restoration in preparation

[23] Kunisch K and Pock T 2013SIAM Journal on Imaging Sciences 6938–983

[24] Chen Y, Pock T and Bischof H 2012 Learning`1-based analysis and synthesis sparsity priors using bi-level optimizationWorkshop on Analysis Operator Learning vs. Dictionary Learning, NIPS 2012

[25] Ochs P, Ranftl R, Brox T and Pock T 2015 Bilevel Optimization with Nonsmooth Lower Level ProblemsInternational Conference on Scale Space and Variational Methods in Computer Vision (SSVM)to appear

[26] Chung J, Español M I and Nguyen T 2014arXiv preprint arXiv:1407.1911

[27] Hintermüller M and Wu T 2014 Bilevel optimization for calibrating point spread functions in blind deconvolution preprint

[28] Baus F, Nikolova M and Steidl G 2014Journal of Mathematical Imaging and Vision48295–307 [29] Schmidt U and Roth S 2014 Shrinkage fields for effective image restorationComputer Vision

and Pattern Recognition (CVPR), 2014 IEEE Conference on(IEEE) pp 2774–2781 [30] Fehrenbach J, Nikolova M, Steidl G and Weiss P 2015 Bilevel image denoising using gaussianity

testsScale Space and Variational Methods in Computer Vision(Lecture Notes in Computer Science vol 9087) ed Aujol J F, Nikolova M and Papadakis N (Springer International Publishing) pp 117–128 URLhttp://dx.doi.org/10.1007/978-3-319-18461-6_10

[31] Ranftl R and Pock T 2014 A deep variational model for image segmentation 36th German Conference on Pattern Recognition (GCPR)

[32] Klatzer T and Pock T 2015 Continuous Hyper-parameter Learning for Support Vector Machines Computer Vision Winter Workshop (CVWW)

[33] Chen Y, Yu W and Pock T 2015 On learning optimized reaction diffusion processes for effective image restorationIEEE Conference on Computer Vision and Pattern Recognition to appear [34] Reyes J C D L and Schönlieb C B 2013Inverse Problems and Imaging 71183–1214 (Preprint

arXiv:1207.3425)

[35] Reyes J C D L, Schönlieb C B and Valkonen T 2015 Journal of Mathematical Analysis and Applications In press (Preprint arXiv:1505.01953) URL http://iki.fi/tuomov/ mathematics/interior.pdf

[36] Reyes J C D L, Schönlieb C B and Valkonen T 2015 Bilevel parameter learning for higher-order total variation regularisation models submitted (Preprint arXiv:1508.07243) URL

http://iki.fi/tuomov/mathematics/tgv_learn.pdf

[37] Zeidler E 2012 Applied Functional Analysis: Applications to Mathematical Physics Applied Mathematical Sciences (Springer New York) ISBN 9781461208150

(18)