The structure of optimal parameters for image restoration problems

(1)

Contents lists available atScienceDirect

Journal

of

Mathematical

Analysis

and

Applications

www.elsevier.com/locate/jmaa

The

structure

of

optimal

parameters

for

image

restoration

problems

J.C. De Los Reyesa, C.-B. Schönliebb, T. Valkonena,b,∗

a _{Research Center on Mathematical Modelling (MODEMAT), Escuela Politécnica Nacional, Ladrón de} Guevara E11-253, Quito, Ecuador

b _{Department of Applied Mathematics and Theoretical Physics, University of Cambridge,} Wilberforce Road, CB3 0WA, Cambridge, United Kingdom

a r t i c l e i n f o a b s t r a c t

Article history: Received 8 May 2015

Available online 16 September 2015 Submitted by H. Frankowska Keywords:

Total variation

Total generalised variation Bi-level optimisation Optimality

Parameter choice

We study the qualitative properties of optimal regularisation parameters in variationalmodels for imagerestoration.The parametersare solutionsof bilevel optimisationproblemswiththeimagerestorationproblemasconstraint.Ageneral type of regulariser is considered, which encompasses total variation (TV), total generalised variation(TGV) andinﬁmal-convolution total variation (ICTV). We provethatundercertainconditionsonthegivendataoptimalparametersderived bybileveloptimisationproblemsexist.Acrucialpointintheexistenceproofturns outtobetheboundednessoftheoptimalparametersawayfrom0 whichweprove inthispaper.Theanalysisisdoneontheoriginal–inimagerestorationtypically non-smooth variational problem – as well as on a smoothed approximation set inHilbertspacewhich istheoneconsidered innumerical computations.Forthe smoothedbilevelproblemwealsoprovethatitΓ convergestotheoriginalproblem asthesmoothingvanishes.Allanalysisisdoneinfunctionspacesrather thanon thediscretisedlearningproblem.

1. Introduction

In this paper we consider the generalvariational image reconstructionproblem that, given parameters

α= (α1,. . . ,αN),N≥1,aimstocomputeanimage

uα∈arg min u∈X

J(u;α).

The image depends on α and belongs in our setting to a generic function space X. Here J is a generic energymodellingourpriorknowledgeontheimageuα.Thequalityofthesolutionuαofvariationalimaging

* Corresponding author.

E-mail addresses:[email protected](J.C. De Los Reyes), [email protected](C.-B. Schönlieb), tuomo.valkonen@iki.ﬁ(T. Valkonen).

http://dx.doi.org/10.1016/j.jmaa.2015.09.023

(2)

approacheslikethisonecruciallyreliesonagoodchoiceoftheparametersα.Weareparticularlyinterested inthecase J(u;α) = Φ(Ku) + N j=1 αjAjuj,

withKagenericboundedforwardoperator,Φ aﬁdelityfunction,andAj linearoperatorsactingonu.The

valuesAjuarepenalisedinthetotalvariationorRadonnormμj =μ_M(Ω;Rmj₎,andcombinedconstitute

theimageregulariser.Inthiscontext,αrepresentstheregularisationparameterthatbalancesthestrength of regularisation against the ﬁtness Φ of the solution to the idealised forward model K. The size of this parameterdependsonthelevelofrandomnoiseandthepropertiesoftheforwardoperator.Choosingittoo largeresultsinover-regularisationofthe solutionandinturn maycausethe lossof potentially important details in the image; choosing it too small under-regularises the solution and may result in a noisy and unstableoutput. Inthis workwewilldiscuss andthoroughly analyseabileveloptimisationapproachthat isabletodeterminetheoptimalchoiceofαinJ(;α).

Recently,bilevelapproaches forvariationalmodels havegainedincreasingattentioninimage processing and inverse problems in general. Based on prior knowledge of the problem in terms of a training set of imagedataandcorrespondingmodelsolutionsorknowledgeofothermodeldeterminantssuchasthenoise level,optimalreconstructionmodelsareconceivedbyminimisingacostfunctional –calledF inthesequel –constrainedtothevariationalmodelinquestion.Wewillexplainthisapproachinmoredetailinthenext section. Before,letus giveanaccount ofthestateof theartof bileveloptimisationformodellearning.In machinelearningbileveloptimisationiswellestablished.Itisasupervisedlearningmethodthatoptimally adaptsitselftoagivendatasetofmeasurementsanddesirablesolutions.In [41,42,28,29,18,19],forinstance the authors consider bilevel optimisation for ﬁnite dimensional Markov random ﬁeld (MRF) models. In inverse problems the optimal inversion and experimental acquisition setup is discussed in the context of optimal model designin worksby Haber, Horesh and Tenorio [34,32,33], as well as Ghattas et al.[13,8]. Recentlyparameterlearninginthecontextoffunctional variationalregularisationmodels alsoenteredthe image processingcommunitywith worksbytheauthors [23,14], Kunisch,Pockand co-workers [37,17]and Chungetal. [20].AveryinterestingcontributioncanbefoundinapreprintbyFehrenbachetal. [31]where the authors determine an optimal regularisation procedureintroducing particular knowledgeof the noise distributionintothelearningapproach.

Apartfromtheworkoftheauthors [23,14],allapproachesforbilevellearninginimage processingsofar areformulatedandoptimised inthediscrete setting.Oursubsequentmodelling,analysis andoptimisation willbecarriedoutinfunctionspaceratherthanonadiscretisationofthevariationalmodel.Inthiscontext, acareful analysis of the bilevelproblem is ofgreat relevancefor its applicationinimage processing. The structure of optimal regularisers is important, among others, for thedevelopment of solution algorithms. Inparticular, iftheparametersarebounded andlieintheinteriorofaclosedconnectedset,theneﬃcient optimisation methods can be used for solving the problem. Previous results on optimal parameters for inverseproblemswithpartial diﬀerentialequationshavebeen obtainedin,e.g., [16].

Inthispaperwestudythequalitativestructureofregularisation parametersarisingassolutionsfromthe bileveloptimisationofvariationalmodels.Inourframeworkthevariationalmodelsaretypicallyconvexbut non-smoothand posedinBanachspaces.Thetotalvariationandtotalgeneralised variationregularisation models are particular instances.Alongside theoptimisation of thenon-smooth variationalmodel, we also consider a smoothed approximation in Hilbert space which is typically the one considered in numerical computation. Under suitable conditions, we prove that – for both the original non-smooth optimisation problem as well as the regularised Hilbertspace problem – the optimal regularisers are bounded and lie inthe interiorof thepositiveorthant. Theconditions necessary to provethis turn outto be verynatural

(3)

conditionsonthegivendatainthecaseofanL2_cost_functional_F_._Indeed,_for_the_total_variation_regularisers

with L2_-squared_cost_and_ﬁdelity,_we_will _merely_require

TV(f)>TV(f0),

withf0theground-truthandf thenoisyimage.Thatis,thenoisyimageshouldoscillatemoreintermsof

the totalvariationfunctional,than theground-truth. Forsecond-ordertotal generalisedvariation [11],we obtain an analogous condition. Apartfrom the standard L2 _costs, _we _also _discuss _costs_that _constitute_a

smoothedL1_norm_of_the_gradient_of_the_original_data_–_we_will_call_this_the_Huberised_total_variation_cost

inthesequel–typicallyresultinginoptimalsolutionssuperiortotheones minimisinganL2 cost.Forthis case,however,theinteriorpropertyofoptimalparameterscouldbeveriﬁedforaﬁnitedimensionalversion ofthecostonly.Eventually,wealsoshowthatasthenumericalsmoothingvanishestheoptimalparameters forthesmoothedmodels tendtooptimalparametersoftheoriginalmodel.

Theresultsderivedinthispaperaremotivatedbyproblemsinimageprocessing.However,their applica-bility goeswell beyondthatand canbe generallyappliedto parameterestimation problemsof variational inequalitiesofthesecondkind,forinstancetheparameterestimationprobleminBinghamﬂow [22]. Previ-ousanalysisinthiscontexteitherrequiredboxconstraintsontheparametersinordertoproveexistenceof solutionsor theadditionof aparameterpenaltyto thecostfunctional [3,6,7,23]. Inthispaper,we require neither butrather provethat under certain conditions on the variational model and for reasonable data and cost functional, optimal parameters are indeed positiveand guaranteed to be bounded away from 0 and ∞.Aswe willsee laterthisisenoughforproving existenceofsolutionsandcontinuityof thesolution map.Thenextstepfromourworkinthishereisderivingnumericallyusefulcharacterisationsofsolutions to theensuingbi-levelprograms.Forthemostbasicproblemsconsideredhereinthishasbeendonein [23]

under numerical H1 _{regularisation.}_We _will _consider _in_the _follow-up _work _[24] _the _optimality_conditions

forhigher-orderregularisersandthenewcostfunctionalsintroducedinthiswork.Foranextension charac-terisationofoptimalitysystemsforbi-leveloptimisationinﬁnite dimensions,wepointthereaderto [27]as astartingpoint.

Outline of the paper InSection2weintroducethegeneralbilevellearningproblem,statingassumptionson thesetupofthecostfunctionalFandthelowerlevelproblemgivenbyavariationalregularisationapproach. The bilevel problem is discussed in its original non-smooth form (P) as well as in asmoothed form ina Hilbert space setting (Pγ,₎ _in_Section_2.2_, _that_will _be _the_one _used _in_the_numerical _{computations.}_The

bilevelproblemisputincontextwithparameterlearningfornon-smoothvariationalregularisationmodels, typical in image processing, by proving the validity of the assumptions for examples such as TV, TGV and ICTVregularisation.Themainresultsofthepaper –existenceofpositiveoptimalparametersforL2_,

Huberised TV and L1 _type _costs_and _the_convergence _of _the_smoothed _numerical_problem _to _the _original

non-smooth problem – are stated inSection 3. Auxiliary results, such as coercivity, lower semicontinuity and compactness results for the involved functionals, is the topic of Section 4. Proofs for existence and convergence of optimal parameters are contained in Section5. The paper ﬁnishes with abrief numerical discussion inSection6.

2. Thegeneralproblemsetup

Let Ω ⊂ Rm be an open bounded domain with Lipschitz boundary. This will be our image domain. Usually Ω = (0,w)×(0,h) for w and h the width and height of a two-dimensional image, although no such assumptions are made in this work. Our noisy or corrupted data f is assumed to lie in a Banach space Y, which isthe dual of Y_∗, while ourground-truth f0 lies inageneral Banachspace Z. Usually, in

(4)

data is just acorrupted version of the original image.Further d= 1 for grayscale images, and d = 3 for colourimagesintypicalcolourspaces.Forsub-sampledreconstructionfromFouriersamples,wemightuse aﬁnite-dimensionalspaceY =Cn_–_there_are_however_some_subtleties_with_that,_and_we_refer_the_interested

readerto [1].

Asourparameterspaceforregularisationfunctional weightswetake

P∞

α := [0,∞]N,

where N isthedimensionoftheparameterspace, thatisα= (α1,. . . ,αN),N ≥1.Observethatweallow

inﬁnite and zero valuesfor α. The reasonfor the former is that incase of TGV2, it is not reasonable to expect thatboth α1 and α2 are bounded; such conditionswould imply thatTGV2 performs betterthan

both TV2 and TV. But we want our learning algorithm to ﬁnd out whetherthat is the case! Regarding zerovalues,oneofourmain tasksisproving thatforreasonabledata,optimalparametersinfactlieinthe interior

intP_α∞= (0,∞]N.

This is required for the existence of solutions and the continuity of the solution map parametrised by additionalregularisationparametersneededforthenumericalrealisationofthemodel.Wealsoset

Pα:= [0,∞)N

forsomeoccasionswhenweneedaboundedparameter.

Remark1. Inmuch ofour treatment,we could allowfor spatiallydependentparametersα. However,the parameters would need to lie in a ﬁnite-dimensional subspace of C0(Ω;RN) in our theory. Minding our

generaldeﬁnitionofthefunctional Jγ,0₍_·_;_α_{) below,}_no_generality_is _lost_by_taking _α_to _be_vectors_in_RN_.

Wecould simplyreplace thesuminthe functional as alargersummodellingintegration over parameters withvaluesinaﬁnite-dimensionalsubspaceof C0(Ω;RN).

Inour generallearning problem,we look for α= (α1,. . . ,αN) solving for someconvex, proper, weak*

lowersemicontinuous costfunctionalF :X →Rtheproblem min α∈P_α∞F(uα) (P) subjectto uα∈arg min u∈X J(u;α), (Dα) with J(u;α) := Φ(Ku) + N j=1 αjAjuj.

Here wedenoteforshortthetotalvariationnorm

μj:=μM(Ω;Rmj), (μ∈ M(Ω;Rmj)).

ThefollowingcoversourassumptionswithregardtoA,K, andΦ.Wediscussvariousspeciﬁcexamplesin Section2.1immediatelyafter statingtheassumptions.

(5)

Assumption A-KA (Operators AandK). We assume thatY is aBanach space, and X a normed linear space, boththemselvesduals ofY_∗ andX_∗,respectively.Wethen assumethatthelinearoperators

Aj:X→ M(Ω;Rmj), (j= 1, . . . , N),

and

K:X →Y,

are bounded. Regarding K, we also assume theexistence of a bounded aright-inverse K† :R(K)→ X, where R(K) denotestherangeoftheoperatorK. Thatis,KK†=I onR(K).Wefurtherassumethat

u_X :=

N

j=1

Ajuj+KuY (1)

is a norm on X, equivalent to the standard norm. In particular, by the Banach–Alaoglu theorem, any sequence {ui_}∞

i=1 ⊂X withsupiuiX<∞possessesaweakly*convergentsubsequence.

Assumption A-Φ (The ﬁdelity Φ). Wesuppose Φ :Y →(−∞,∞] is convex, proper, weakly* lower semi-continuous, andcoerciveinthesense that

Φ(v)→+∞ as vY →+∞. (2)

We assumethat0∈dom Φ,andthe existenceoff ∈arg min_v_∈_Y Φ(v) suchthatf =Kf¯for somef¯∈X. Finally,we requirethateitherK iscompact,or Φ iscontinuous andstronglyconvex.

Remark2.Whenf¯exists,wecanchoosef¯=K†f.

Remark3.Insteadof0∈dom Φ,itwouldsuﬃcetoassume,moregenerally,thatdom Φ∩N_j₌₁kerAj=∅.

Wealsorequirethefollowingtechnicalassumptionontherelationshipoftheregularisationtermsandthe ﬁdelity Φ.Roughlyspeaking,inmostinterestingcases,itsaysthatforeach,we cancloselyapproximate thenoisydataf with functionsfδ,of“order ”.Theorderhereisnotdirectlyintermsofderivatives,but

as deﬁnedbytheoperators {Aj}throughtheconditions (3): fδ,j = 0 forj =, sothere isno information

at otherorders,while Afδ,isbounded, sotheinformation atorderisbounded.

AssumptionA-δ(Order reduction). Weassumethatforevery∈ {1,. . . ,N}andδ >0,thereexistsf¯δ,∈X

suchthat Φ(Kf¯δ,)< δ+ inf Φ, (3a) Af¯δ,<∞, and, (3b) j= Ajf¯δ,j= 0. (3c)

Example 1 (Orders related to TGV2 andICTV). ForICTV,asformulatedin Example 6,“order2”means that fδ,2 ∈ BV2(Ω),while “order 1”means fδ,1 ∈BV(Ω). For the deﬁnitionof BV2(Ω),we referto [26].

ForTGV2,asformulatedin Example 5,“order1”islikewiseBV(Ω),but“order2”isthespaceoffunctions

(6)

2.1. Speciﬁc examples

Wenowdiscussafewexamplestomotivatetheabstractframeworkabove. Example2 (Squared L2 _ﬁdelity)._With_Y ₌_L2_(Ω),_f _∈_Y_,_and

Φ(v) = 1

2v−f

2

Y,

werecover thestandardL2_-squared_ﬁdelity, _modelling_Gaussian_noise._On_a_bounded _domain_Ω, Assump-tion A-Φfollowsimmediately.

With ﬁdelity functionals as in Example 2, the operator K has a double-role in (Dα). It models any

forwardoperatorT,suchas ablurringkernelor asub-sampledFouriertransform,butalsoactsto extract theinterestingcomponentsofuinthecorrectspace.Initslatterrole,theoperatorisalsorequiredwithinF. WethereforeintroducetheoperatorK0forthatpurpose.ThenfordenoisingwewanttotakeK=K0,but

forotherimage processingtasks,wewouldcombinethiswiththeforwardoperatorT,takingK=T K0.

Example3 (Cost functionals). Forthecostfunctional F,givennoise-freedataf0∈Z=L2(Ω),weconsider

inparticulartheL2 cost

F_L2 2(u) := 1 2f0−K0u 2 L2_(Ω),

aswell astheHuberisedtotalvariationcost

FL1

η∇(u) :=D(f0−K0u)γ

with noise-free data f0 ∈ Y := BV(Ω). For the deﬁnition of the Huberised total variation, we refer to

Section2.2 onthenumericsofthebi-levelframework (P).Inthenumericalcounterpart [24]tothepresent paper,wehaveobservedthattheHuberised totalvariationcostproducesimprovedresultscomparedtothe

L2 costwhenmeasuredwiththeSSIM [46],whilehaving theadvantageof beingconvexunliketheSSIM. Example4 (Total variation denoising). WetakeK0astheembeddingofX= BV(Ω)∩L2(Ω) intoZ=L2(Ω),

andA1=D.WeequipX withthenorm

uX :=uL2_(Ω)+Du_M_(Ω;_Rn₎.

This makes K0 a bounded linear operator. If the domain Ω has Lipschitz boundary, and the dimension

satisﬁesn∈ {1,2},thespaceBV(Ω) continuouslyembedsintoL2(Ω)[2,Corollary3.49].Therefore,wemay identifyXwithBV(Ω) asanormedspace.Otherwise,ifn≥3,withoutgoingintothedetailsofconstructing

X asadualspace,1_we_deﬁne_weak*_convergence_in_X_as_combined_weak*_convergence_in_{BV(Ω) and}_L2_(Ω).

Anybounded sequenceinX will thenhaveaweak*convergentsubsequence.This istheonlypropertywe wouldusefromX beingadualspace.

Now,combinedwith Example 2andthechoiceK=K0,Y =Z,wegettotalvariationdenoisingforthe

sub-problem (Dα). Assumption A-KA holdingisimmediate fromthepreviousdiscussion. Assumption A-δ

is also easily satisﬁed, as with f ∈ BV(Ω)∩L2_(Ω), _we _may _simply _pick _f¯_δ,₁ ₌ _f _for _the _only _possible case = 1.Observe however that K is not compact unless n = 1, see [2, Corollary 3.49], so the strong

1 _{This can be achieved by allowing}_φ

(7)

convexity of Φ is crucial here. If the data f ∈ L∞(Ω), then it is well-known that solutions uˆ to (Dα)

satisfyuˆL∞(Ω)≤ fL∞(Ω).Wecouldthereforeconstructacompactembeddingbyaddingsomeartiﬁcial

constraintstothedataf.Thischangesinthenexttwoexamples,asboundednessofsolutionsforhigher-order regularisersisunknown;see also [43].

Example 5 (Second order total generalised variation denoising). Wetake

X= (BV(Ω)∩L2(Ω))×BD(Ω),

theﬁrstpartwiththesametopologyasin Example 4.WealsotakeZ=L2_(Ω),_denote _u_{= (}_v,_w_),_and_set

K0(v, w) =v, A1u=Dv−w, and A2u=Ew

for E the symmetrised diﬀerential. With K = K0 and Y = Z, this yields second-order total generalised

variation (TGV2) denoising [11] for the sub-problem (Dα). Assuming for simplicity that α1,α2 > 0 are

constants,toshow Assumption A-KA,werecallfrom [12]theexistenceofaconstantc=c(Ω) suchthat

c−1vBV(Ω)≤ vL1_(Ω)+Dv_M_(Ω;_Rn₎≤cv_BV(Ω),

where thenorm

vBV(Ω):= TGV2(1,1)(v) +vL1_(Ω).

Wemaythusapproximate

wL1_(Ω;_Rn₎≤ Dv−w_M_(Ω;_Rn₎+Dv_M_(Ω;_Rn₎

≤ Dv−w_M(Ω;Rn₎+cTGV2₍₁_,₁₎(v) +u_L1_(Ω) ≤(1 +c)Dv−w_M(Ω;Rn₎+Ew_M_(Ω;_Rn×n₎

+cvL1_(Ω).

ForsomeC >0,itfollows

uX = vL2_(Ω)+Dv_M_(Ω;_Rn₎ +wL1_(Ω;_Rn₎+Ew_M_(Ω;_Rn×n₎ ≤CDv−w_M(Ω;Rn₎+Ew_M_(Ω;_Rn×n₎+v_L2_(Ω) =C _N i=1 Ajuj+KuL2_(Ω) .

ThisshowsuX≤CuX.TheinequalityuX ≥ uX followseasilyfromthetriangleinequality,namely

uX≥ vL2_(Ω)+Dv−w_M_(Ω;_Rn₎+Ew_M_(Ω;_Rn×n₎.

Thus ·_X isequivalent to· X.

NextweobservethatclearlyDvk−wk Dv∗ −winM(Ω;Rn) andEwk Ew∗ inM(Ω;Rn×n) ifvk v∗

inBV(Ω) andwk_w∗ _weakly*_in_BD(Ω)._Thus_A

1andA2areweak*continuous. Assumption A-KAfollows.

Considerthenthesatisfactionof Assumption A-δ.If= 1,wemaythenpickf¯δ,1= (f,0),inwhichcase

A1fδ,1 =Df, and A2fδ,1 = 0.If = 2,which isthe only other case,we maypick asmoothapproximate

fδ,2to f suchthat

1

2f−fδ,2

2

(8)

Thenwesetf¯δ,2:= (fδ,2,∇fδ,2),yieldingA1f¯δ,2= 0 andA2f¯δ,2=∇2fδ,2.Byfδ,2beingsmoothweseethat A2f¯δ,2M<∞.Thus Assumption A-δissatisﬁedforthesquaredL2ﬁdelityΦ(v)=f −v2_L2_(Ω).

Example6 (Inﬁmal convolution TV denoising). Letus takeZ =L2_(Ω),_and _X _{= (BV(Ω)}_∩_L2_(Ω))_×_X 2,

where

X2:={v∈W1,1(Ω)| ∇v∈BV(Ω;Rn)},

andBV(Ω)∩L2(Ω) againhasthetopologyof Example 4.Settingu= (v,w),aswell as

K0(v, w) =v+w, A1u=Dv, and A2u=D∇w,

weobtainfor (Dα)withK=K0andY =Z theinﬁmalconvolutiontotalvariationdenoisingmodelof [15]. Assumptions A-KA,A-δ,andA-H areveriﬁedanalogouslytoTGV2 in Example 5.

Example7 (Deblurring and sub-sampled Fourier transforms for MRI). LetK0 andtheAjsbe givenbyone

ofthe regularisersof Example 4to Example 6. AlsotakethecostF =FL2

2 or FL1η∇ as in Example 3,and

Φ as the squaredL2 _ﬁdelity _of _{Example 2}_. _However,_let _us _now _take_K ₌_{T K}

0 for some bounded linear

operatorT :Z →Y. TheoperatorT couldbe,for example,ablurring kernelor a(sub-sampled) Fourier transform, in which case we obtain a modelfor learning the parameters for deblurring or reconstruction from Fouriersamples. Thelatterwouldbe important, forexamplefor magneticresonance imaging(MRI)

[5,44,45].Unfortunately,ourtheorydoesnoextendtomanyofthesecasesbecausewewillrequire,roughly,

K₀∗K0 ≤CK∗K forsomeconstantC >0. Inother words, the cost functional cannot see what the ﬁdelity does not.

Example8 (Parameter estimation in Bingham ﬂow). Binghamﬂuids arematerialsthatbehaveas solidsif themagnitudeofthestresstensorstaysbelowaplasticitythreshold,andasliquidsifthatquantitysurpasses thethreshold.Inacrosssectionalpipe,ofsectionΩ,thebehaviourismodelled bytheenergyminimisation functional min u∈H1 0(Ω) μ 2u 2 H1 0(Ω)−(f|u)H−1(Ω),H 1 0(Ω)+α Ω |∇u|dx, (4)

where μ > 0 stands for the viscosity coeﬃcient, α > 0 for the plasticity threshold and f ∈ H−1_(Ω). _In

many practicalsituations,the plasticitythreshold is notknowninadvance and hasto be estimated from experimentalmeasurements.Onethenaimsatminimising a leastsquaresterm

FL2 2 = 1 2u−f0 2 L2_(Ω) subjectto (4).

The bileveloptimisation problem can then be formulated as problem (P)–(Dα), with the choices X = Y =H1

0(Ω),K theidentity,A1=D and

φ(v) = μ 2v 2 H1 0(Ω)−(f|u)H−1(Ω),H 1 0(Ω).

Concentratingintherestofthispaperprimarilyonimageprocessingapplications,wewillhoweverbrieﬂy returntoBinghamﬂow in Example 13.

(9)

2.2. Considerations for numerical implementation

Forthenumericalsolutionofthedenoisingsub-problem,wewillinafollow-upwork [24]expanduponthe infeasible semi-smooth quasi-Newton approach taken in [35] for L2_{-TV image}_restoration _problems. _This

depends onHuber-regularisation of thetotalvariation measures,as well as enforcingsmoothnessthrough Hilbert spaces. This isusually doneby asquaredpenalty onthe gradient,i.e., H1 _{regularisation,} _but_we

formalisethismoreabstractlyinordertosimplifyournotationandargumentslateron.Therefore,wetake a convex, proper, and weak* lower-semicontinuous smoothing functional H : X → [0,∞], and generally expectittosatisfythefollowing.

AssumptionA-H (Smoothing). Weassumethat0∈domH andforeveryδ≥0,everyα∈ Pα∞,andevery

u∈X,theexistenceofuδ_∈_X _satisfying

H(uδ)<∞ and J(uδ;α)≤J(u;α) +δ. (5) Example 9 (H1 _{smoothing in}_BV(Ω)_)._Usually, _with_H1_(Ω;_Rm₎_∩_X ₌_∅_, _we_take

H(u) := 1 2∇u 2 L2_(Ω;_Rm×n₎, u∈H1(Ω;Rm)∩X, ∞, otherwise.

This isinparticular thecasewith Example 4(TV),where X = BV(Ω)∩L2_(Ω)_⊃_H1_(Ω),_and_{Example 5}

(TGV2), where X = (BV(Ω)∩L2(Ω))×BD(Ω)⊃H1(Ω)×H1(Ω;Rn) on abounded domain Ω. In both of these cases, weak* lower semicontinuity is apparent; for completeness we record this in Lemma 1 in Section 4.2.In caseof Example4, (5)is immediate from approximatingustrictly by functionsinC∞(Ω) usingstandardstrictapproximationresultsinBV(Ω)[2].Incaseof Example 5,thisalsofollowsbyasimple generalisation ofthesameargumentto TGV-strictapproximation,aspresented in [43,10].

Forparameters≥0 andγ∈(0,∞],wethenconsider theproblem min

α∈Pα∞

F(uα,γ,) (Pγ,)

where uα,γ,∈X∩domHsolves

min u∈XJ γ,₍_u_;_α₎ _(Dγ,₎ for Jγ,(u;α) :=H(u) + Φ(Ku) + N j=1 αjAjuγ,j.

Here wedenoteforshort theHuber-regularisedtotalvariationnorm

μγ,j :=μγ,M(Ω;Rmj), (μ∈ M(Ω;Rmj)),

asgivenbythefollowingdeﬁnition.Thereweinterpretγ=∞togivebackthestandardunregularisedtotal variationmeasure.ClearlyJ∞,0₌_J_,_and_(D

(10)

Deﬁnition1.Givenγ∈(0,∞], wedeﬁneforthenorm· 2 onRm,theHuberregularisation |g|γ = g2−₂1_γ, g2≥1/γ, γ 2g 2 2, g2<1/γ.

Weobservethatthiscanequivalentlybewrittenusingconvexconjugatesas

α|g|γ = sup q, g − 1 2γαq 2 2q2≤α . (6)

Thenifμ=fLn₊_μs_is_the_Lebesgue_{decomposition}_of_μ_{∈ M}_(Ω;_Rm_{) into}_the_absolutely_continuous_part

fLn _and _the_singular_part_μs_,_we _set

|μ|γ(V) :=

V

|f(x)|γdx+|μs|(V), (V ⊂Ω Borel-measureable).

Themeasures |μ|γ istheHuber-regularisationof thetotalvariation measures|μ|, andwedeﬁne itsRadon

normas theHuberregularisation oftheRadonnorm ofμ,thatis

μγ,M(Ω;Rm₎:=|μ|_γ_M_(Ω;_Rm₎.

Remark 4. The parameter γ varies in the literature. In this paper, we use the convention of [23], where largeγmeanssmallHuberregularisation.In [35],smallγ meanssmallHuberregularisation.Thatis, their regularisationparameteris1/γ inournotation.

2.3. Shorthand notation

Writingfornotationallightness

Au:= (A1u, . . . , ANu), and Rγ(μ1, . . . , μN;α) := N j=1 αjμjγ,j, R(·;α) :=R∞(·;α),

ourproblem (Pγ,₎_becomes

min

α∈P∞

α

F(uα) subject to uα∈arg min u∈X

Jγ,(u;α)

for

Jγ,(u;α) :=H(u) + Φ(Ku) +Rγ(Au;α).

Further,givenα∈ Pα,wedeﬁnethe“marginalised”regularisationfunctional

Tγ(v;α) := inf

¯

v∈K−1vR

(11)

HereK−1_stands_for_the_preimage,_so_the_constraint_is_¯_v_∈_X _with_v₌_K_¯_v_._Then_in_the_case₍_γ,₎_{= (}_∞_,₀₎

and F =F0◦K,ourproblemmayalsobewrittenas

min α∈Pα∞ F0(vα) subjectto vα∈arg min v∈Y Φ(v) +T(v;α).

This givestheproblem amuchmoreconventionalﬂair,asthefollowingexamplesdemonstrate.

Example 10 (TV as a marginal). Considerthe totalvariationregularisation of Example 4. Then A1=D,

N = 1, and

T(v;α) =R(Av;α) =αTV(v).

Example11 (TGV2 as a marginal). IncaseoftheTGV2 regularisationof Example 5,wehaveK(v,w)=v

and K†v= (v,0).ThusKK†f =f,etc., so

T(v;α) = inf

w∈BD(Ω)R(Dv−w, Ew;α) = TGV 2

α(v).

3. Mainresults

Ourtasknowistostudythecharacteristicsofoptimalsolutions,andtheirexistence.Ourresults,based on natural assumptions on the data and the original problem (P), derive properties of the solutions to thisproblemandallnumericallyregularisedproblems (Pγ,₎_suﬃciently_close_to_the_original_problem:_large γ >0 and small >0.We denoteby uα,γ, any solutionto (Dγ,) forany givenα∈ Pα,and byαγ, any

solutionto (Pγ,₎_._Solutions_to_(D

α)and (P)wedenote,respectively,byuα=uα,∞,0, andαˆ=α∞,0.

3.1. L2_{-squared cost}_and_L2_-squared_ﬁdelity

Ourmain existenceresultregardingL2_-squared_costs_and_L2_-squared_ﬁdelities_is_the_following.

Theorem 1. Let Y and Z be Hilbert spaces, F(u) = 1

2K0u−f0 2

Z, and Φ(v) = 12f −v 2

Y for some

f ∈ R(K), f0∈Z, and a bounded linear operator K0:X →Z satisfying

K0uZ≤C0KuY, for allu∈X for some constantC0>0. (8) Suppose Assumptions A-KA and A-δhold. If for some α¯∈intPα and t∈(0,1/C0] holds

T(f; ¯α)> T(f −t(K0K†)∗(K0K†f−f0); ¯α), (9) then problem (P)admits a solution αˆ∈intP_α∞.

If, moreover, Assumption A-H holds, then there exist ¯γ ∈(0,∞) and ¯ ∈(0,∞) such that the problem (Pγ,₎_with₍_γ,₎_∈_[¯_γ,_∞_]_×_[0_,_¯_] _admits_a_solution_α

γ,∈intPα∞, and the solution map

S(γ, ) := arg min

α∈Pα∞

F(uα,γ,)

(12)

We provethis result inSection 5. Outer semicontinuity of aset-valued map S :Rk _⇒ _Rm _means _[39]

thatforany convergent sequencexk _→_x_and _S₍_xk₎_yk _→_y_,_we _have_y_∈_S₍_x_)._In_particular, _the_outer

semicontinuity of S means that as the numerical regularisation vanishes, the optimal parameters for the regularisedmodels (Pγ,₎_tend_to_optimal_parameters_of_the_original_model_(P)_.

Remark5.LetZ =Y and K0=K in Theorem 1.Then (9)reducesto

T(f; ¯α)> T(f0; ¯α). (10)

Alsoobserve thatourresultrequires,Φ◦K to measure allthedata thatF measures,inthemoreprecise sense givenby (8).If (8) didnothold,anoscillatingsolutionuα forα∈∂Pα∞,could largelypassthrough

the nullspace of K, hence havelow valuefor the objective J of theinner problem, yet havea large cost givenbyF.

Corollary1 (Total variation Gaussian denoising). Suppose f,f0∈BV(Ω)∩L2(Ω), and

TV(f)>TV(f0). (11)

Then there exist ¯,γ >¯ 0such that any optimal solution αγ, to the problem

αγ,∈arg min α≥0 1 2f0−uα,γ, 2 L2_(Ω) with uα,γ,∈arg min u∈BV(Ω) ₁ 2f −u 2 L2_(Ω)+αDuγ,M+ 2∇v 2 L2_(Ω;_Rn₎ satisﬁes αγ,>0whenever ∈[0,¯], γ∈[¯γ,∞].

Thatis,fortheoptimalparametertobestrictlypositive,thenoisyimagef should,intermsofthetotal variation,oscillatemorethanthenoise-freeimagef0.Thisisaverynaturalcondition:ifthenoisesomehow

hadsmoothedoutfeaturesfrom f0,thenweshouldnotsmoothitanymorebyTV regularisation!

Proof. Assumptions A-KA, A-δ, and A-H we have already veriﬁed in Example 4. We then observe that

K0 =K,so we are inthesetting of Remark 5. Followingthe mappingof the TVproblem to the general

frameworkusingtheconstructionin Example 4,wehaveK=I and K†=I embeddingswith Y =L2_(Ω).

K†is boundedonR(K)=L2_(Ω)_∩_BV(Ω)._Moreover,_by _{Example 10}_, _T₍_v_;_α_¯₎_{= ¯}_α_TV(_v_)._Thus₍₁₀₎_with

thechoicet= 1 reducesto (11). 2

ForTGV2 wealsohaveaverynaturalcondition.

Corollary 2 (Second-order total generalised variation Gaussian denoising). Suppose that the data f,f0 ∈

L2_(Ω)_∩_BV(Ω) _{satisﬁes for some}_α

2>0the condition

TGV2₍_α₂_,₁₎(f)>TGV2₍_α₂_,₁₎(f0). (12) Then there exist,¯γ >¯ 0such that any optimal solution αγ,= ((αγ,)1,(αγ,)2)to the problem

αγ, ∈arg min α≥0 1 2f0−vα,γ, 2 L2_(Ω)

(13)

with (vα,γ,, wα,γ,)∈ arg min v∈BV(Ω) w∈BD(Ω) ₁ 2f −v 2 L2_(Ω)+α1Dv−wγ,M+α2Ewγ,M + 2(∇v,∇w) 2 L2_(Ω;_Rn_×_Rn×n₎ satisﬁes (αγ,)1,(αγ,)2>0whenever ∈[0,¯], γ∈[¯γ,∞].

Proof. Assumptions A-KA,A-δ, and A-H we have already veriﬁed inExample 5. We then observe that

K0 =K,so we areinthe setting of Remark 5.By Example 11, T(v;α¯)= TGV2α¯(v).Finally, similarlyto (11),wegetfor (10)witht= 1 thecondition (12). 2

Example 12 (Fourier reconstructions). Let K0 be given, for example as constructed inExample 4 or Ex-ample 5.If wetakeK =FK0 forF theFouriertransform –or anyother unitary transform–then (8)is

satisﬁedandK†=K₀†F∗.Thus (9)becomes

T(f; ¯α)> T(f −t(K0K0†F∗)∗(K0K0†F∗f−f0); ¯α).

WithF∗f,f0∈ R(K0) andt= 1 thisjustreduces to

T(f; ¯α)> T(Ff0; ¯α).

Unfortunately,ourresultsdonotcoverparameterlearningforreconstructionfrompartialFouriersamples exactly because of (8). What we can do is to ﬁnd the optimal parameters if we only know apart of the ground-truth, buthavefullnoisydata.

3.2. Huberised total variation and other L1_-type_{costs with}_L2_{-squared ﬁdelity}

We now consider the alternative“Huberised total variation”cost functional from Example 3. Unfortu-nately, weare unableto deriveforFL1

η∇ easilyinterpretableconditionsas fortheFL22.Theworkaroundis

to “discretise” thenorm, inthe senseof testingonlywith aﬁnitenumberof testfunctions.To be precise, recall that FL1 η∇(u) =D(f0−K0u)η, where μη = sup ξ∈C∞c (Ω;Rn), ξ ≤1 (μ|ξ)− 1 2ηξL2(Ω;Rn).

Theideaistoreplacetheset oftestfunctionsξbyaﬁniteset

V ={ξ1, . . . , ξM} ⊂ {ξ∈Cc∞(Ω;Rn)| ξ ≤1}.

Thatis,weapproximateFL1

η∇ by F_LV1 η∇(u) := sup ξ∈V (μ|ξ)− 1 2ηξL2(Ω;Rn), where μ=D(f0−K0u).

(14)

ChoosingV suitably, thisamountsto takingtheﬁnite-dimensionalHuberised 1-normofadiscretised gra-dientoff0−K0u.

To study bi-level optimisationproblems with thecost F_LV1

η∇, we rewrite the functional using a generic

HuberisedL1-cost FL1 η(z) := sup λ Z∗≤1 (λ|z−f0)− 1 2ηλ 2 Z∗

onareﬂexiveBanachspaceZ,forsomef0∈Z.Indeed,deﬁning

DVv:={(−divξ1|v), . . . ,(−divξM|v)}, (v∈BV(Ω)),

wehave

F_LV1

η∇:=FL1η◦D

V _where_f

0:=DVf0 andZ =RM with∞-norm.

OurresultsonFV L1

η∇ arethenbasedonthefollowing generalresult.

Theorem 2. Let Y be a Hilbert space, and Z a reﬂexive Banach space. Let F = FL1

η ◦K0, and Φ(v) =

1

2f −v 2

Y for some compact linear operator K0 :X →Z satisfying (8)and f ∈ R(K). Suppose

Assump-tions A-KA and A-δ hold. If for some α¯∈intPα and t>0holds

T(f; ¯α)> T(f −t(K0K†)∗λ; ¯α), λ∈∂FL1

η(K0K

†_f₎_, ₍₁₃₎

then the problem (P)admits a solution αˆ∈intP_α∞.

If, moreover, Assumption A-H holds, then there exist ¯γ ∈ (0,∞) and ¯ ∈ (0,∞) such that the prob-lem (Pγ,) with (γ,) ∈ [¯γ,∞]×[0,¯] admits a solution αγ, ∈ intPα∞, and the solution map S is outer

semicontinuous within [¯γ,∞]×[0,¯].

Weprovethis resultinSection5.

Remark6.IfK0=K,thecondition (13)hasthemuchmorelegibleform

T(f; ¯α)> T(f −tλ; ¯α), λ∈∂FL1

η(f). (14)

Also if K is compact, then the compactness of K0 follows from (8). In the following applications, K is

howevernot compactfor typical domains Ω⊂R2 _or _R3_, _so _we _have_to _make _K

0 compact bymaking the

rangeﬁnite-dimensional.

Corollary 3 (Total variation Gaussian denoising with discretised Huber-TV cost). Suppose that the data satisﬁes f,f0∈BV(Ω)∩L2(Ω) and for some t>0and ξ∈V the condition

TV(f)>TV(f+tdivξ), −divξ∈∂F_LV1

η∇(f). (15)

Then there exist,¯γ >¯ 0such any optimal solution αγ, to the problem

min α≥0F V L1 η∇(f0−vα) with

(15)

uα∈arg min u∈BV(Ω) ₁ 2f −u 2 L2_(Ω)+αDuγ,M+ 2∇v 2 L2_(Ω;_Rn₎ satisﬁes αγ, >0whenever ∈[0,¯], γ∈[¯γ,∞].

This saysthatfortheoptimalparametertobestrictly positive,thenoisyimagef shouldoscillatemore than the image f +tdivξ in the direction of the (discrete) total variation ﬂow. This is a very natural condition, andweobservethatthenon-discretisedcounterpart of (15)forγ=∞wouldbe

TV(f)>TV(f+tdivξ), ξ∈Sgn(Df0−Df),

where wedeﬁneforameasureμ∈ M(Ω;Rm) thesign

Sgn(μ) :={ξ∈L1(Ω;μ)|μ=ξ|μ|}.

Thatis,−divξ isthetotalvariationﬂow.

Proof. Analogous to Corollary 1 regarding the L2 _cost. _Note _that _in _the _application _of _{Theorem 2}_, _the

operatorDV isabsorbedbyK0.Precisely,wetakeK0=DV,whileK=I. 2

ForTGV2 wealsohaveananalogousnaturalcondition.

Corollary4 (TGV2Gaussian denoising with discretised Huber-TV cost). Suppose that the data f,f0∈L2(Ω) satisﬁes for some t,α2>0and ξ∈V the condition

TGV2₍_α₂_,₁₎(f)>TGV2₍_α₂_,₁₎(f+tdivξ), −divξ∈∂F_LV1

η∇(f). (16)

Then there exist¯,¯γ >0such any optimal solution αγ, = ((αγ,)1,(αγ,)2)to the problem

min α≥0F V L1 η∇(f0−vα) with (vα, wα)∈ arg min v∈BV(Ω) w∈BD(Ω) ₁ 2f −v 2 L2_(Ω)+α1Dv−wγ,M+α2Ewγ,M + 2(∇v,∇w) 2 L2_(Ω;_Rn_×_Rn×n₎ satisﬁes (αγ,)1,(αγ,)2>0whenever ∈[0,¯], γ∈[¯γ,∞].

Proof. Analogousto Corollary 2. Notethatfortheapplicationof Theorem 2,where inthepresent setting

u= (v,w),wetakeK0u=DVv,while Ku=v. 2

4. Afewauxiliaryresults

We record in this section some general results that will be useful in the proofs of the main results. These includethecoercivity of thefunctional Jγ,0₍_·_;_λ,_α_),_recorded _in_Section_4.1_. _We_then _discuss_some

elementarylowersemicontinuityfactsinSection4.2.WeprovideinSection4.3somenewresultsforpassing from strictconvergenceto strongconvergence.

(16)

4.1. Coercivity Observethat Rγ(μ1, . . . , μN; ¯α) = sup ψj∈Cc∞(Ω;Rmj×Rn) supx∈Ω ψj(x)22≤1 N j=1 ¯ αj μj(ψj)− 1 2γψj 2 L2_(Ω;_Rn₎ . Thus R(μ;α)≥Rγ(μ;α)≥R(μ;α)−C 2γL n_(Ω)

for someC =C( ¯α). Since Ω is bounded, it follows that given δ >0, forlarge enough γ >0 and every

≥0 holds

J(u;α)−δ≤Jγ,0(u;α)≤Jγ,(u;α)≤J0,(u;α). (17) We will use these properties frequently. Based on the coercivity and norm equivalence properties in As-sumptions A-KA and A-Φ, the following proposition statesthe important fact thatJγ,0 _is _coercive _with

respectto · _X andthusalsothestandardnorm ofX.

Proposition 1.Suppose Assumptions A-KA and A-Φhold, and that α∈intP_α∞. Let ≥0 and γ∈(0,∞]. Then

Jγ,(u;α)→+∞ as uX →+∞. (18)

Proof. Let {ui_}∞

i=1 ⊂ X, and suppose supiJγ,(ui;α) < ∞. Then in particular supiΦ(Kui) < ∞. By

Assumption A-Φthensup_iKui

Y <∞.But Assumption A-KAsays

ui_X=

N

j=1

Ajuij+KuiY ≤Jγ,(ui;α) +KuiY.

This implies sup_iui_X <∞. Bythe equivalenceof norms in Assumption A-KA, we immediately obtain

(18). 2

4.2. Lower semicontinuity

Werecord thefollowing elementarylower semicontinuityfactsthatwehave alreadyused tojustify our examples.

Lemma1. The following functionals are lower semicontinuous.

(i) μ→ ν−μγ,M with respect to weak* convergence in M(Ω;Rd).

(ii) v → f −v2

L2_(Ω;_Rd₎ with respect to weak convergence in Lp(Ω;Rd) for any 1< p≤2 on a bounded

domain Ω.

(17)

Proof. In each case, let {vi_}∞

i=1 converge to v. Denoting by G the involved functional, we write it as a

convex conjugate,G(v)= sup{v,ϕ−G∗(ϕ)}. Takingasupremisingsequence{ϕj}∞_j₌₁ forthisfunctional at any point v, weeasily see lowersemicontinuity byconsideringthesequences {vi,ϕj−G∗(ϕj)}∞_i₌₁ for eachj.Incase(ii)weusethefactthatϕj_∈_L2_(Ω;_Rd₎_⊂_[_Lp_(Ω;_Rd_)]∗ _when_{Ω is}_bounded.

In case(i), how exactlywe write G(μ)=ν−μγ,M as aconvex conjugate demandsexplanation. We

ﬁrstof allrecallthatforg∈Rn_,_the_{Huber-regularised}_norm_may_be_written_in_dual_form _as

|g|γ= sup q, g −γ 2q 2 2q2≤1 . Therefore, weﬁndthat G(μ) = sup ⎧ ⎨ ⎩μ(ϕ)− Ω γ 2ϕ(x) 2 2dx ϕ∈C_c∞(Ω), sup x∈Ω ϕ(x)2≤α ⎫ ⎬ ⎭. This hastherequiredform. 2

Wealsoshowherethatthemarginalregularisationfunctional Tγ₍_·_;_α_¯_{) is}_weakly*_lower _{semicontinuous}

onY.ChoosingKas in Example 4and Example 5,thisprovides inparticularaproofthatTV andTGV2 are lowersemicontinuouswithrespect toweakconvergenceinL2(Ω) whenn= 1,2.

Lemma 2. Suppose α¯∈intP_α∞, and Assumption A-KA holds. Then Tγ(·;α¯)is lower semicontinuous with respect to weak* convergence in Y, and continuous with respect to strong convergence in R(K).

Proof. Letvk _v∗ _weakly*_in_Y. _By_the_{Banach–Steinhaus}_theorem,_the_sequence _is_bounded_in_Y_._From

thedeﬁnition

Tγ(v; ¯α) := inf

¯

v∈K−1vR

γ₍_A_v_¯_{; ¯}_α₎_.

Therefore, ifwe pick>0 andv¯k∈K−1vk suchthat

Rγ(Av¯k; ¯α)≤Tγ(vk; ¯α) +,

then referralto Assumption A-KA, yieldsforsomeconstantc>0 the bound

cv¯kX ≤ vk+Rγ(Av¯k; ¯α)≤ vk+Tγ(vk; ¯α) +.

Without lossofgenerality, wemayassumethat lim inf

k→∞ T

γ₍_vk_{; ¯}_α₎_<_∞_,

becauseotherwisethereisnothingtoprove.Then{v¯k}∞_k₌₁isboundedinX,andthereforeadmitsaweakly* convergent subsequence. Letv¯be the limitof this, unrelabelled,sequence. SinceK iscontinuous, we ﬁnd thatK¯v=v. ButRγ₍_·_;_α_¯_{) is}_clearly_weak*_lower _{semicontinuous}_in_X_;_see_{Lemma 1}_. _Thus

Tγ(v; ¯α)≤Rγ(Av¯; ¯α)≤lim inf

k→∞ R

γ₍_A_¯_vk_{; ¯}_α₎_≤_{lim inf} k→∞ T

γ₍_vk_{; ¯}_α_{) +}_.

(18)

To seecontinuitywithrespect tostrongconvergenceinR(K),weobservethatifv=Ku∈ R(K),then bytheboundednessoftheoperators{Aj}Nj=1 weget

Tγ(v; ¯α)≤Rγ(u; ¯α)≤R(u; ¯α)≤Cu,

for some constant C > 0. So we know that Tγ(·;α¯)|R(K) is ﬁnite-valued and convex. Therefore it is continuous [30, LemmaI.2.1]. 2

4.3. From Φ-strict to strong convergence

In Proposition 3, formingpart of the proof of ourmain theorems, we will need to passfrom “Φ-strict convergence” of Kuk _to _v _to _strong _convergence, _using _the _following _lemmas. _The _former _means _that

Φ(Kuk) → Φ(v) and Kuk v weakly* in Y. By strong convexity in a Banach space Y, we mean the existenceofγ >0 suchthatforeveryy∈Y andz∈∂Φ(y)⊂Y∗holds

Φ(y)−Φ(y)≥(z|y−y) +γ 2y

₋_y2

Y, (y∈Y),

where (z|y) denotes the dual product, and the subdiﬀerential ∂Φ(y) is deﬁned by z satisfying the same expression with γ = 0. With regard to more advanced strict convergence results, we point the reader to

[25,38,36].

Lemma 3. Suppose Y is a Banach space, and Φ : Y → (−∞,∞] strongly convex. If vk _v_ˆ_∈ _dom_∂_Φ

weakly* in Y and Φ(vk₎_→_Φ(ˆ_v₎_{, then}_vk _→_v_ˆ_{strongly in}_Y_.

Remark7. Bystandardconvex analysis [30], v ∈dom∂Φ if Φ has aﬁnite-valued point of continuityand

v∈int dom Φ.

Proof. We ﬁrst of all note that −∞ < Φ(ˆv) < ∞ because v ∈ dom∂Φ implies v ∈ dom Φ. Let us pick

z∈∂Φ(ˆv).From thestrongconvexityofΦ,forsomeγ >0 then Φ(vk)−Φ(ˆv)≥(z|vk−ˆv) +γ

2v

k₋_v_ˆ2

Y.

Takingthelimitinﬁmum,weobserve

0 = Φ(ˆv)−Φ(ˆv)d≥lim inf k→∞ γ 2v k₋_v_ˆ2 Y.

Thisproves strongconvergence. 2

Wenowusethelemmatoshow strongconvergenceofminimisingsequences.

Lemma 4.Suppose Φis strongly convex, satisﬁes Assumption A-Φ, and that C ⊂Y is non-empty, closed, and convex with intC∩dom Φ=∅. Let

ˆ

v:= arg min

v∈C

Φ(v).

If {vk_}∞

(19)

Proof. BythestrictconvexityofΦ,impliedbystrongconvexity,andtheassumptionsonC,ˆvisuniqueand well-deﬁned.Moreovervˆ∈dom∂Φ.Indeed,ourassumptionsshowtheexistenceofapointv∈intC∩dom Φ. The indicator function δC is thencontinuous at v, and so standardsubdiﬀerential calculus(see, e.g., [30,

Proposition I.5.6]) implies that ∂(Φ+δC)(ˆv) = ∂Φ(ˆv)+∂δC(ˆv). But 0 ∈ ∂(Φ+δC)(ˆv) because ˆv ∈

arg minv∈YΦ(v)+δC(v).Thisimpliesthat∂Φ(ˆv)=∅.Consequentlyalsovˆ∈dom Φ,andΦ(ˆv)∈R.

Using thecoercivityof Φ in Assumption A-Φ wethen ﬁndthat{vk_}∞

k=1 isboundedinY, atleastafter

movingtoanunrelabelledtailofthesequencewithΦ(vk₎_≤_Φ(ˆ_v₎_{+ 1.}_Since_Y _is_a_dual_space,_the_unit_ball

isweak*compact,andwededucetheexistenceofasubsequence,unrelabelled,andv∈Y suchthatvk v

weakly* inY.Bytheweak*lowersemicontinuity(Assumption A-Φ),wededuce Φ(v)≤lim inf

k→∞ Φ(v

k_{) = Φ(ˆ}_v₎_.

Since eachvk _∈_C_,_and _C _is _closed,_also _v _∈_C_. _Therefore,_by _the_strict _convexity_and_the_deﬁnition _of_v_ˆ_,

necessarily v = ˆv. Therefore vk vˆweakly*inY, and Φ(vk)→Φ(ˆv). Lemma 3now shows thatvk →vˆ strongly inY. 2

5. Proofsofthe mainresults

Wenowprovetheexistence,continuity,andnon-degeneracy(interiorsolution)resultsofSection3through a series of lemmas and propositions, starting from general ones that are then specialised to provide the naturalconditionspresentedinSection3.

5.1. Existence and lower semicontinuity under lower bounds

Our principaltool for proving existenceis the following proposition.We will inthe rest ofthis section concentrateonprovingtheexistenceofthesetKinthestatement.Webase thisonthenaturalconditions of Section3.

Proposition 2 (Existence on compact parameter domain). Suppose Assumptions A-KA and A-Φhold. With

≥0and γ∈(0,∞]ﬁxed, if there exists a compact set K ⊂intP_α∞ with

inf

α∈P_α∞_\KF(uα,γ,)>α∈Pinfα∞

F(uα,γ,), (19)

then there exists a solution αγ,∈intPα∞ to (Pγ,). Moreover, the mapping

Iγ,(α) :=F(uα,γ,),

is lower semicontinuous within intPα∞.

Theproofdepends onthefollowing twolemmasthatwillbeusefullateronas well.

Lemma5 (Lower semicontinuity of the ﬁdelity with varying parameters). Suppose Assumptions A-KA, A-Φ, and A-H hold. Let uk _u∗ _{weakly* in}_X_{, and}₍_αk_,_γk_,k₎_→₍_α,_γ,₎_∈_int_P∞

α ×(0,∞]×(0,∞). Then

Jγ,(u;α)≤lim inf

k→∞ J γk_,k

(20)

Proof. Let ˆ >0 be such thatαj ≥ˆ, (j = 1,. . . ,N). Wethen deducefor largek andsomeC =C(Ω,γ) that lim sup k Jγ,k(uk; ˆ, . . . ,ˆ)≤lim sup k Jγk,k(uk; ˆ, . . . ,ˆ) +C ≤lim sup k Jγk,k(uk;αk) +C <∞. (20) Herewehaveassumedtheﬁnalinequalitytohold.Thiscomeswithoutloss ofgenerality,becauseotherwise there is nothingto prove.Observethatthis holds evenif {αk}∞_k₌₁ isnot bounded.In particular, if>0, restrictingktobe large,wemayassumethat

C1:=H(uk)<∞. (21) Werecallthat Jγ,(u;α) :=H(u) + Φ(Ku) + N j=1 αjAjuγ,j. (22)

Wewantto showlower semicontinuity ofeachof thetermsinturn.Westartwith thesmoothingterm. If

>0,using (21),wewrite

kH(uk) = (k−)H(uk) +H(uk)≤(k−)C1

2 +H(u

k₎_.

Bytheconvergencek_→_,_and _the_weak*_lower _{semicontinuity}_of_H_,_we_ﬁnd_that

H(u)≤lim inf k→∞ k_H₍_uk₎_. ₍₂₃₎ If= 0, wehave H(u) = 0· ∞= 0, whilestill 0≤sup k kH(uk)<∞. Thus (23)follows.

The ﬁdelity term Φ◦K is weak* lower semicontinuous by the continuity of K and the weak* lower semicontinuity ofΦ.Itthereforeremainstoconsiderthetermsin (22)involvingboththeregularisation pa-rametersα,aswellastheHuberisationparameterγ.Indeedusingthedualformulation (6)oftheHuberised norm,we haveforsomeconstantC=C(ˆ,Ω) that

Ajukγk_,j= sup ϕ(x) ≤1 ϕ dAjuk− 1 2γkϕ 2_dx ≥ sup ϕ(x) ≤1 Ω ϕ dAjuk+ 1 2γϕ 2_dx₋_C_|_γ−1₋₍_γk₎−1_| =Ajukγ,j−C|γ−1−(γk)−1|.

(21)

Thus, ifα∈ Pα, weget

αk_jAjukγk_,j=α_jA_juk_γk_,j+ (αk_j −α_j)A_juk_γk_,j

≥αjAjukγk_,j− |αk_j −α_j|A_juk_γk_,j

≥αjAjukγ,j−C|γ−1−(γk)−1| − |αkj −αj|Ajukγk_,j.

It follows from (20) thatthe sequence {Ajukγk_,j}_k∞₌₁ is bounded in M(Ω;Rmj) for eachj = 1,. . . ,N.

Thus lim inf k→∞ α k jAjukγk_,j≥lim inf k→∞ αjAju k γ,j ≥αjAjuγ,j,

where theﬁnalstepfollowsfrom Lemma 1.

Itremainstoconsider thecasethatα∈ Pα∞\ Pα,i.e.,whenα=∞forsome.Wemaypicksequences

{β

j}∞=1, (j = 1,. . . ,N), such that βj αj. Further, we may ﬁnd {βjk,}∞=1 such that β

k,

j ≤ αkj with

αk

j = lim→∞βjk, andβj= limk→∞βjk,.Then, bythebounded casestudiedabove

lim inf k→∞ α k jAjukγk_,j≥lim inf k→∞ β k, j Ajukγk_,j≥β_jA_ju_γ,j.

But{αk_jAjukγk_,j}∞_k₌₁ isboundedby (20),and

lim inf

→∞ β

jAjuγ,j ≥αjAjuγ,j.

Thus lowersemicontinuityfollows. 2

Lemma 6 (Convergence of reconstructions away from boundary). Suppose Assumptions A-KA, A-Φ, and A-H hold. Let (αk_,_γk_,k₎_→₍_α,_γ,₎_in_int_P∞

α ×(0,∞]×[0,∞). Then we can ﬁnd uα,γ,∈arg minJγ,(·;α)

and extract a subsequence satisfying

Jγk,k(u_αk_,γk_,k;αk)→Jγ,(u_α,γ,;α), (24a)

uαk_,γk_,k u∗ _α,γ, weakly* X, and (24b) Kuαk_,γk_,k→Ku_α,γ, strongly inY. (24c)

Proof. By Lemma 5, wehave lim inf k→∞ J γk,k₍_u αk_,γk_,k;αk)≥Jγ,(u_α,γ,;α) = min u∈XJ γ,₍_u_;_α₎_.

Wealsowanttheoppositeinequality lim sup

k→∞ J γk,k₍_u

αk_,γk_,k;αk)≤Jγ,(u_α,γ,;α). (25)

Letδ >0.If= 0,weuse Assumption A-H onu=uα,γ,,toproduceuδ.Otherwise,wesetuδ =uα,γ,.In

eithercase