Contents lists available atScienceDirect
Journal
of
Mathematical
Analysis
and
Applications
www.elsevier.com/locate/jmaa
The
structure
of
optimal
parameters
for
image
restoration
problems
J.C. De Los Reyesa, C.-B. Schönliebb, T. Valkonena,b,∗
a Research Center on Mathematical Modelling (MODEMAT), Escuela Politécnica Nacional, Ladrón de Guevara E11-253, Quito, Ecuador
b Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Wilberforce Road, CB3 0WA, Cambridge, United Kingdom
a r t i c l e i n f o a b s t r a c t
Article history: Received 8 May 2015
Available online 16 September 2015 Submitted by H. Frankowska Keywords:
Total variation
Total generalised variation Bi-level optimisation Optimality
Parameter choice
We study the qualitative properties of optimal regularisation parameters in variationalmodels for imagerestoration.The parametersare solutionsof bilevel optimisationproblemswiththeimagerestorationproblemasconstraint.Ageneral type of regulariser is considered, which encompasses total variation (TV), total generalised variation(TGV) andinfimal-convolution total variation (ICTV). We provethatundercertainconditionsonthegivendataoptimalparametersderived bybileveloptimisationproblemsexist.Acrucialpointintheexistenceproofturns outtobetheboundednessoftheoptimalparametersawayfrom0 whichweprove inthispaper.Theanalysisisdoneontheoriginal–inimagerestorationtypically non-smooth variational problem – as well as on a smoothed approximation set inHilbertspacewhich istheoneconsidered innumerical computations.Forthe smoothedbilevelproblemwealsoprovethatitΓ convergestotheoriginalproblem asthesmoothingvanishes.Allanalysisisdoneinfunctionspacesrather thanon thediscretisedlearningproblem.
© 2015TheAuthors.PublishedbyElsevierInc.Thisisanopenaccessarticle undertheCCBYlicense(http://creativecommons.org/licenses/by/4.0/).
1. Introduction
In this paper we consider the generalvariational image reconstructionproblem that, given parameters
α= (α1,. . . ,αN),N≥1,aimstocomputeanimage
uα∈arg min u∈X
J(u;α).
The image depends on α and belongs in our setting to a generic function space X. Here J is a generic energymodellingourpriorknowledgeontheimageuα.Thequalityofthesolutionuαofvariationalimaging
* Corresponding author.
E-mail addresses:[email protected](J.C. De Los Reyes), [email protected](C.-B. Schönlieb), tuomo.valkonen@iki.fi(T. Valkonen).
http://dx.doi.org/10.1016/j.jmaa.2015.09.023
0022-247X/© 2015 The Authors. Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
approacheslikethisonecruciallyreliesonagoodchoiceoftheparametersα.Weareparticularlyinterested inthecase J(u;α) = Φ(Ku) + N j=1 αjAjuj,
withKagenericboundedforwardoperator,Φ afidelityfunction,andAj linearoperatorsactingonu.The
valuesAjuarepenalisedinthetotalvariationorRadonnormμj =μM(Ω;Rmj),andcombinedconstitute
theimageregulariser.Inthiscontext,αrepresentstheregularisationparameterthatbalancesthestrength of regularisation against the fitness Φ of the solution to the idealised forward model K. The size of this parameterdependsonthelevelofrandomnoiseandthepropertiesoftheforwardoperator.Choosingittoo largeresultsinover-regularisationofthe solutionandinturn maycausethe lossof potentially important details in the image; choosing it too small under-regularises the solution and may result in a noisy and unstableoutput. Inthis workwewilldiscuss andthoroughly analyseabileveloptimisationapproachthat isabletodeterminetheoptimalchoiceofαinJ(;α).
Recently,bilevelapproaches forvariationalmodels havegainedincreasingattentioninimage processing and inverse problems in general. Based on prior knowledge of the problem in terms of a training set of imagedataandcorrespondingmodelsolutionsorknowledgeofothermodeldeterminantssuchasthenoise level,optimalreconstructionmodelsareconceivedbyminimisingacostfunctional –calledF inthesequel –constrainedtothevariationalmodelinquestion.Wewillexplainthisapproachinmoredetailinthenext section. Before,letus giveanaccount ofthestateof theartof bileveloptimisationformodellearning.In machinelearningbileveloptimisationiswellestablished.Itisasupervisedlearningmethodthatoptimally adaptsitselftoagivendatasetofmeasurementsanddesirablesolutions.In [41,42,28,29,18,19],forinstance the authors consider bilevel optimisation for finite dimensional Markov random field (MRF) models. In inverse problems the optimal inversion and experimental acquisition setup is discussed in the context of optimal model designin worksby Haber, Horesh and Tenorio [34,32,33], as well as Ghattas et al.[13,8]. Recentlyparameterlearninginthecontextoffunctional variationalregularisationmodels alsoenteredthe image processingcommunitywith worksbytheauthors [23,14], Kunisch,Pockand co-workers [37,17]and Chungetal. [20].AveryinterestingcontributioncanbefoundinapreprintbyFehrenbachetal. [31]where the authors determine an optimal regularisation procedureintroducing particular knowledgeof the noise distributionintothelearningapproach.
Apartfromtheworkoftheauthors [23,14],allapproachesforbilevellearninginimage processingsofar areformulatedandoptimised inthediscrete setting.Oursubsequentmodelling,analysis andoptimisation willbecarriedoutinfunctionspaceratherthanonadiscretisationofthevariationalmodel.Inthiscontext, acareful analysis of the bilevelproblem is ofgreat relevancefor its applicationinimage processing. The structure of optimal regularisers is important, among others, for thedevelopment of solution algorithms. Inparticular, iftheparametersarebounded andlieintheinteriorofaclosedconnectedset,thenefficient optimisation methods can be used for solving the problem. Previous results on optimal parameters for inverseproblemswithpartial differentialequationshavebeen obtainedin,e.g., [16].
Inthispaperwestudythequalitativestructureofregularisation parametersarisingassolutionsfromthe bileveloptimisationofvariationalmodels.Inourframeworkthevariationalmodelsaretypicallyconvexbut non-smoothand posedinBanachspaces.Thetotalvariationandtotalgeneralised variationregularisation models are particular instances.Alongside theoptimisation of thenon-smooth variationalmodel, we also consider a smoothed approximation in Hilbert space which is typically the one considered in numerical computation. Under suitable conditions, we prove that – for both the original non-smooth optimisation problem as well as the regularised Hilbertspace problem – the optimal regularisers are bounded and lie inthe interiorof thepositiveorthant. Theconditions necessary to provethis turn outto be verynatural
conditionsonthegivendatainthecaseofanL2costfunctionalF.Indeed,forthetotalvariationregularisers
with L2-squaredcostandfidelity,wewill merelyrequire
TV(f)>TV(f0),
withf0theground-truthandf thenoisyimage.Thatis,thenoisyimageshouldoscillatemoreintermsof
the totalvariationfunctional,than theground-truth. Forsecond-ordertotal generalisedvariation [11],we obtain an analogous condition. Apartfrom the standard L2 costs, we also discuss coststhat constitutea
smoothedL1normofthegradientoftheoriginaldata–wewillcallthistheHuberisedtotalvariationcost
inthesequel–typicallyresultinginoptimalsolutionssuperiortotheones minimisinganL2 cost.Forthis case,however,theinteriorpropertyofoptimalparameterscouldbeverifiedforafinitedimensionalversion ofthecostonly.Eventually,wealsoshowthatasthenumericalsmoothingvanishestheoptimalparameters forthesmoothedmodels tendtooptimalparametersoftheoriginalmodel.
Theresultsderivedinthispaperaremotivatedbyproblemsinimageprocessing.However,their applica-bility goeswell beyondthatand canbe generallyappliedto parameterestimation problemsof variational inequalitiesofthesecondkind,forinstancetheparameterestimationprobleminBinghamflow [22]. Previ-ousanalysisinthiscontexteitherrequiredboxconstraintsontheparametersinordertoproveexistenceof solutionsor theadditionof aparameterpenaltyto thecostfunctional [3,6,7,23]. Inthispaper,we require neither butrather provethat under certain conditions on the variational model and for reasonable data and cost functional, optimal parameters are indeed positiveand guaranteed to be bounded away from 0 and ∞.Aswe willsee laterthisisenoughforproving existenceofsolutionsandcontinuityof thesolution map.Thenextstepfromourworkinthishereisderivingnumericallyusefulcharacterisationsofsolutions to theensuingbi-levelprograms.Forthemostbasicproblemsconsideredhereinthishasbeendonein [23]
under numerical H1 regularisation.We will consider inthe follow-up work [24] the optimalityconditions
forhigher-orderregularisersandthenewcostfunctionalsintroducedinthiswork.Foranextension charac-terisationofoptimalitysystemsforbi-leveloptimisationinfinite dimensions,wepointthereaderto [27]as astartingpoint.
Outline of the paper InSection2weintroducethegeneralbilevellearningproblem,statingassumptionson thesetupofthecostfunctionalFandthelowerlevelproblemgivenbyavariationalregularisationapproach. The bilevel problem is discussed in its original non-smooth form (P) as well as in asmoothed form ina Hilbert space setting (Pγ,) inSection2.2, thatwill be theone used inthenumerical computations.The
bilevelproblemisputincontextwithparameterlearningfornon-smoothvariationalregularisationmodels, typical in image processing, by proving the validity of the assumptions for examples such as TV, TGV and ICTVregularisation.Themainresultsofthepaper –existenceofpositiveoptimalparametersforL2,
Huberised TV and L1 type costsand theconvergence of thesmoothed numericalproblem to the original
non-smooth problem – are stated inSection 3. Auxiliary results, such as coercivity, lower semicontinuity and compactness results for the involved functionals, is the topic of Section 4. Proofs for existence and convergence of optimal parameters are contained in Section5. The paper finishes with abrief numerical discussion inSection6.
2. Thegeneralproblemsetup
Let Ω ⊂ Rm be an open bounded domain with Lipschitz boundary. This will be our image domain. Usually Ω = (0,w)×(0,h) for w and h the width and height of a two-dimensional image, although no such assumptions are made in this work. Our noisy or corrupted data f is assumed to lie in a Banach space Y, which isthe dual of Y∗, while ourground-truth f0 lies inageneral Banachspace Z. Usually, in
data is just acorrupted version of the original image.Further d= 1 for grayscale images, and d = 3 for colourimagesintypicalcolourspaces.Forsub-sampledreconstructionfromFouriersamples,wemightuse afinite-dimensionalspaceY =Cn–therearehoweversomesubtletieswiththat,andwerefertheinterested
readerto [1].
Asourparameterspaceforregularisationfunctional weightswetake
P∞
α := [0,∞]N,
where N isthedimensionoftheparameterspace, thatisα= (α1,. . . ,αN),N ≥1.Observethatweallow
infinite and zero valuesfor α. The reasonfor the former is that incase of TGV2, it is not reasonable to expect thatboth α1 and α2 are bounded; such conditionswould imply thatTGV2 performs betterthan
both TV2 and TV. But we want our learning algorithm to find out whetherthat is the case! Regarding zerovalues,oneofourmain tasksisproving thatforreasonabledata,optimalparametersinfactlieinthe interior
intPα∞= (0,∞]N.
This is required for the existence of solutions and the continuity of the solution map parametrised by additionalregularisationparametersneededforthenumericalrealisationofthemodel.Wealsoset
Pα:= [0,∞)N
forsomeoccasionswhenweneedaboundedparameter.
Remark1. Inmuch ofour treatment,we could allowfor spatiallydependentparametersα. However,the parameters would need to lie in a finite-dimensional subspace of C0(Ω;RN) in our theory. Minding our
generaldefinitionofthefunctional Jγ,0(·;α) below,nogeneralityis lostbytaking αto bevectorsinRN.
Wecould simplyreplace thesuminthe functional as alargersummodellingintegration over parameters withvaluesinafinite-dimensionalsubspaceof C0(Ω;RN).
Inour generallearning problem,we look for α= (α1,. . . ,αN) solving for someconvex, proper, weak*
lowersemicontinuous costfunctionalF :X →Rtheproblem min α∈Pα∞F(uα) (P) subjectto uα∈arg min u∈X J(u;α), (Dα) with J(u;α) := Φ(Ku) + N j=1 αjAjuj.
Here wedenoteforshortthetotalvariationnorm
μj:=μM(Ω;Rmj), (μ∈ M(Ω;Rmj)).
ThefollowingcoversourassumptionswithregardtoA,K, andΦ.Wediscussvariousspecificexamplesin Section2.1immediatelyafter statingtheassumptions.
Assumption A-KA (Operators AandK). We assume thatY is aBanach space, and X a normed linear space, boththemselvesduals ofY∗ andX∗,respectively.Wethen assumethatthelinearoperators
Aj:X→ M(Ω;Rmj), (j= 1, . . . , N),
and
K:X →Y,
are bounded. Regarding K, we also assume theexistence of a bounded aright-inverse K† :R(K)→ X, where R(K) denotestherangeoftheoperatorK. Thatis,KK†=I onR(K).Wefurtherassumethat
uX :=
N
j=1
Ajuj+KuY (1)
is a norm on X, equivalent to the standard norm. In particular, by the Banach–Alaoglu theorem, any sequence {ui}∞
i=1 ⊂X withsupiuiX<∞possessesaweakly*convergentsubsequence.
Assumption A-Φ (The fidelity Φ). Wesuppose Φ :Y →(−∞,∞] is convex, proper, weakly* lower semi-continuous, andcoerciveinthesense that
Φ(v)→+∞ as vY →+∞. (2)
We assumethat0∈dom Φ,andthe existenceoff ∈arg minv∈Y Φ(v) suchthatf =Kf¯for somef¯∈X. Finally,we requirethateitherK iscompact,or Φ iscontinuous andstronglyconvex.
Remark2.Whenf¯exists,wecanchoosef¯=K†f.
Remark3.Insteadof0∈dom Φ,itwouldsufficetoassume,moregenerally,thatdom Φ∩Nj=1kerAj=∅.
Wealsorequirethefollowingtechnicalassumptionontherelationshipoftheregularisationtermsandthe fidelity Φ.Roughlyspeaking,inmostinterestingcases,itsaysthatforeach,we cancloselyapproximate thenoisydataf with functionsfδ,of“order ”.Theorderhereisnotdirectlyintermsofderivatives,but
as definedbytheoperators {Aj}throughtheconditions (3): fδ,j = 0 forj =, sothere isno information
at otherorders,while Afδ,isbounded, sotheinformation atorderisbounded.
AssumptionA-δ(Order reduction). Weassumethatforevery∈ {1,. . . ,N}andδ >0,thereexistsf¯δ,∈X
suchthat Φ(Kf¯δ,)< δ+ inf Φ, (3a) Af¯δ,<∞, and, (3b) j= Ajf¯δ,j= 0. (3c)
Example 1 (Orders related to TGV2 andICTV). ForICTV,asformulatedin Example 6,“order2”means that fδ,2 ∈ BV2(Ω),while “order 1”means fδ,1 ∈BV(Ω). For the definitionof BV2(Ω),we referto [26].
ForTGV2,asformulatedin Example 5,“order1”islikewiseBV(Ω),but“order2”isthespaceoffunctions
2.1. Specific examples
Wenowdiscussafewexamplestomotivatetheabstractframeworkabove. Example2 (Squared L2 fidelity). WithY =L2(Ω),f ∈Y,and
Φ(v) = 1
2v−f
2
Y,
werecover thestandardL2-squaredfidelity, modellingGaussiannoise.Onabounded domainΩ, Assump-tion A-Φfollowsimmediately.
With fidelity functionals as in Example 2, the operator K has a double-role in (Dα). It models any
forwardoperatorT,suchas ablurringkernelor asub-sampledFouriertransform,butalsoactsto extract theinterestingcomponentsofuinthecorrectspace.Initslatterrole,theoperatorisalsorequiredwithinF. WethereforeintroducetheoperatorK0forthatpurpose.ThenfordenoisingwewanttotakeK=K0,but
forotherimage processingtasks,wewouldcombinethiswiththeforwardoperatorT,takingK=T K0.
Example3 (Cost functionals). Forthecostfunctional F,givennoise-freedataf0∈Z=L2(Ω),weconsider
inparticulartheL2 cost
FL2 2(u) := 1 2f0−K0u 2 L2(Ω),
aswell astheHuberisedtotalvariationcost
FL1
η∇(u) :=D(f0−K0u)γ
with noise-free data f0 ∈ Y := BV(Ω). For the definition of the Huberised total variation, we refer to
Section2.2 onthenumericsofthebi-levelframework (P).Inthenumericalcounterpart [24]tothepresent paper,wehaveobservedthattheHuberised totalvariationcostproducesimprovedresultscomparedtothe
L2 costwhenmeasuredwiththeSSIM [46],whilehaving theadvantageof beingconvexunliketheSSIM. Example4 (Total variation denoising). WetakeK0astheembeddingofX= BV(Ω)∩L2(Ω) intoZ=L2(Ω),
andA1=D.WeequipX withthenorm
uX :=uL2(Ω)+DuM(Ω;Rn).
This makes K0 a bounded linear operator. If the domain Ω has Lipschitz boundary, and the dimension
satisfiesn∈ {1,2},thespaceBV(Ω) continuouslyembedsintoL2(Ω)[2,Corollary3.49].Therefore,wemay identifyXwithBV(Ω) asanormedspace.Otherwise,ifn≥3,withoutgoingintothedetailsofconstructing
X asadualspace,1we defineweak*convergenceinXascombinedweak*convergenceinBV(Ω) andL2(Ω).
Anybounded sequenceinX will thenhaveaweak*convergentsubsequence.This istheonlypropertywe wouldusefromX beingadualspace.
Now,combinedwith Example 2andthechoiceK=K0,Y =Z,wegettotalvariationdenoisingforthe
sub-problem (Dα). Assumption A-KA holdingisimmediate fromthepreviousdiscussion. Assumption A-δ
is also easily satisfied, as with f ∈ BV(Ω)∩L2(Ω), we may simply pick f¯δ,1 = f for the only possible case = 1.Observe however that K is not compact unless n = 1, see [2, Corollary 3.49], so the strong
1 This can be achieved by allowing φ
convexity of Φ is crucial here. If the data f ∈ L∞(Ω), then it is well-known that solutions uˆ to (Dα)
satisfyuˆL∞(Ω)≤ fL∞(Ω).Wecouldthereforeconstructacompactembeddingbyaddingsomeartificial
constraintstothedataf.Thischangesinthenexttwoexamples,asboundednessofsolutionsforhigher-order regularisersisunknown;see also [43].
Example 5 (Second order total generalised variation denoising). Wetake
X= (BV(Ω)∩L2(Ω))×BD(Ω),
thefirstpartwiththesametopologyasin Example 4.WealsotakeZ=L2(Ω),denote u= (v,w),andset
K0(v, w) =v, A1u=Dv−w, and A2u=Ew
for E the symmetrised differential. With K = K0 and Y = Z, this yields second-order total generalised
variation (TGV2) denoising [11] for the sub-problem (Dα). Assuming for simplicity that α1,α2 > 0 are
constants,toshow Assumption A-KA,werecallfrom [12]theexistenceofaconstantc=c(Ω) suchthat
c−1vBV(Ω)≤ vL1(Ω)+DvM(Ω;Rn)≤cvBV(Ω),
where thenorm
vBV(Ω):= TGV2(1,1)(v) +vL1(Ω).
Wemaythusapproximate
wL1(Ω;Rn)≤ Dv−wM(Ω;Rn)+DvM(Ω;Rn)
≤ Dv−wM(Ω;Rn)+cTGV2(1,1)(v) +uL1(Ω) ≤(1 +c)Dv−wM(Ω;Rn)+EwM(Ω;Rn×n)
+cvL1(Ω).
ForsomeC >0,itfollows
uX = vL2(Ω)+DvM(Ω;Rn) +wL1(Ω;Rn)+EwM(Ω;Rn×n) ≤CDv−wM(Ω;Rn)+EwM(Ω;Rn×n)+vL2(Ω) =C N i=1 Ajuj+KuL2(Ω) .
ThisshowsuX≤CuX.TheinequalityuX ≥ uX followseasilyfromthetriangleinequality,namely
uX≥ vL2(Ω)+Dv−wM(Ω;Rn)+EwM(Ω;Rn×n).
Thus ·X isequivalent to· X.
NextweobservethatclearlyDvk−wk Dv∗ −winM(Ω;Rn) andEwk Ew∗ inM(Ω;Rn×n) ifvk v∗
inBV(Ω) andwk w∗ weakly*inBD(Ω).ThusA
1andA2areweak*continuous. Assumption A-KAfollows.
Considerthenthesatisfactionof Assumption A-δ.If= 1,wemaythenpickf¯δ,1= (f,0),inwhichcase
A1fδ,1 =Df, and A2fδ,1 = 0.If = 2,which isthe only other case,we maypick asmoothapproximate
fδ,2to f suchthat
1
2f−fδ,2
2
Thenwesetf¯δ,2:= (fδ,2,∇fδ,2),yieldingA1f¯δ,2= 0 andA2f¯δ,2=∇2fδ,2.Byfδ,2beingsmoothweseethat A2f¯δ,2M<∞.Thus Assumption A-δissatisfiedforthesquaredL2fidelityΦ(v)=f −v2L2(Ω).
Example6 (Infimal convolution TV denoising). Letus takeZ =L2(Ω),and X = (BV(Ω)∩L2(Ω))×X 2,
where
X2:={v∈W1,1(Ω)| ∇v∈BV(Ω;Rn)},
andBV(Ω)∩L2(Ω) againhasthetopologyof Example 4.Settingu= (v,w),aswell as
K0(v, w) =v+w, A1u=Dv, and A2u=D∇w,
weobtainfor (Dα)withK=K0andY =Z theinfimalconvolutiontotalvariationdenoisingmodelof [15]. Assumptions A-KA,A-δ,andA-H areverifiedanalogouslytoTGV2 in Example 5.
Example7 (Deblurring and sub-sampled Fourier transforms for MRI). LetK0 andtheAjsbe givenbyone
ofthe regularisersof Example 4to Example 6. AlsotakethecostF =FL2
2 or FL1η∇ as in Example 3,and
Φ as the squaredL2 fidelity of Example 2. However,let us now takeK =T K
0 for some bounded linear
operatorT :Z →Y. TheoperatorT couldbe,for example,ablurring kernelor a(sub-sampled) Fourier transform, in which case we obtain a modelfor learning the parameters for deblurring or reconstruction from Fouriersamples. Thelatterwouldbe important, forexamplefor magneticresonance imaging(MRI)
[5,44,45].Unfortunately,ourtheorydoesnoextendtomanyofthesecasesbecausewewillrequire,roughly,
K0∗K0 ≤CK∗K forsomeconstantC >0. Inother words, the cost functional cannot see what the fidelity does not.
Example8 (Parameter estimation in Bingham flow). Binghamfluids arematerialsthatbehaveas solidsif themagnitudeofthestresstensorstaysbelowaplasticitythreshold,andasliquidsifthatquantitysurpasses thethreshold.Inacrosssectionalpipe,ofsectionΩ,thebehaviourismodelled bytheenergyminimisation functional min u∈H1 0(Ω) μ 2u 2 H1 0(Ω)−(f|u)H−1(Ω),H 1 0(Ω)+α Ω |∇u|dx, (4)
where μ > 0 stands for the viscosity coefficient, α > 0 for the plasticity threshold and f ∈ H−1(Ω). In
many practicalsituations,the plasticitythreshold is notknowninadvance and hasto be estimated from experimentalmeasurements.Onethenaimsatminimising a leastsquaresterm
FL2 2 = 1 2u−f0 2 L2(Ω) subjectto (4).
The bileveloptimisation problem can then be formulated as problem (P)–(Dα), with the choices X = Y =H1
0(Ω),K theidentity,A1=D and
φ(v) = μ 2v 2 H1 0(Ω)−(f|u)H−1(Ω),H 1 0(Ω).
Concentratingintherestofthispaperprimarilyonimageprocessingapplications,wewillhoweverbriefly returntoBinghamflow in Example 13.
2.2. Considerations for numerical implementation
Forthenumericalsolutionofthedenoisingsub-problem,wewillinafollow-upwork [24]expanduponthe infeasible semi-smooth quasi-Newton approach taken in [35] for L2-TV imagerestoration problems. This
depends onHuber-regularisation of thetotalvariation measures,as well as enforcingsmoothnessthrough Hilbert spaces. This isusually doneby asquaredpenalty onthe gradient,i.e., H1 regularisation, butwe
formalisethismoreabstractlyinordertosimplifyournotationandargumentslateron.Therefore,wetake a convex, proper, and weak* lower-semicontinuous smoothing functional H : X → [0,∞], and generally expectittosatisfythefollowing.
AssumptionA-H (Smoothing). Weassumethat0∈domH andforeveryδ≥0,everyα∈ Pα∞,andevery
u∈X,theexistenceofuδ∈X satisfying
H(uδ)<∞ and J(uδ;α)≤J(u;α) +δ. (5) Example 9 (H1 smoothing inBV(Ω)). Usually, withH1(Ω;Rm)∩X =∅, wetake
H(u) := 1 2∇u 2 L2(Ω;Rm×n), u∈H1(Ω;Rm)∩X, ∞, otherwise.
This isinparticular thecasewith Example 4(TV),where X = BV(Ω)∩L2(Ω)⊃H1(Ω),and Example 5
(TGV2), where X = (BV(Ω)∩L2(Ω))×BD(Ω)⊃H1(Ω)×H1(Ω;Rn) on abounded domain Ω. In both of these cases, weak* lower semicontinuity is apparent; for completeness we record this in Lemma 1 in Section 4.2.In caseof Example4, (5)is immediate from approximatingustrictly by functionsinC∞(Ω) usingstandardstrictapproximationresultsinBV(Ω)[2].Incaseof Example 5,thisalsofollowsbyasimple generalisation ofthesameargumentto TGV-strictapproximation,aspresented in [43,10].
Forparameters≥0 andγ∈(0,∞],wethenconsider theproblem min
α∈Pα∞
F(uα,γ,) (Pγ,)
where uα,γ,∈X∩domHsolves
min u∈XJ γ,(u;α) (Dγ,) for Jγ,(u;α) :=H(u) + Φ(Ku) + N j=1 αjAjuγ,j.
Here wedenoteforshort theHuber-regularisedtotalvariationnorm
μγ,j :=μγ,M(Ω;Rmj), (μ∈ M(Ω;Rmj)),
asgivenbythefollowingdefinition.Thereweinterpretγ=∞togivebackthestandardunregularisedtotal variationmeasure.ClearlyJ∞,0=J,and (D
Definition1.Givenγ∈(0,∞], wedefineforthenorm· 2 onRm,theHuberregularisation |g|γ = g2−21γ, g2≥1/γ, γ 2g 2 2, g2<1/γ.
Weobservethatthiscanequivalentlybewrittenusingconvexconjugatesas
α|g|γ = sup q, g − 1 2γαq 2 2q2≤α . (6)
Thenifμ=fLn+μsistheLebesguedecompositionofμ∈ M(Ω;Rm) intotheabsolutelycontinuouspart
fLn and thesingularpartμs,we set
|μ|γ(V) :=
V
|f(x)|γdx+|μs|(V), (V ⊂Ω Borel-measureable).
Themeasures |μ|γ istheHuber-regularisationof thetotalvariation measures|μ|, andwedefine itsRadon
normas theHuberregularisation oftheRadonnorm ofμ,thatis
μγ,M(Ω;Rm):=|μ|γM(Ω;Rm).
Remark 4. The parameter γ varies in the literature. In this paper, we use the convention of [23], where largeγmeanssmallHuberregularisation.In [35],smallγ meanssmallHuberregularisation.Thatis, their regularisationparameteris1/γ inournotation.
2.3. Shorthand notation
Writingfornotationallightness
Au:= (A1u, . . . , ANu), and Rγ(μ1, . . . , μN;α) := N j=1 αjμjγ,j, R(·;α) :=R∞(·;α),
ourproblem (Pγ,)becomes
min
α∈P∞
α
F(uα) subject to uα∈arg min u∈X
Jγ,(u;α)
for
Jγ,(u;α) :=H(u) + Φ(Ku) +Rγ(Au;α).
Further,givenα∈ Pα,wedefinethe“marginalised”regularisationfunctional
Tγ(v;α) := inf
¯
v∈K−1vR
HereK−1standsforthepreimage,sotheconstraintis¯v∈X withv=K¯v.Theninthecase(γ,)= (∞,0)
and F =F0◦K,ourproblemmayalsobewrittenas
min α∈Pα∞ F0(vα) subjectto vα∈arg min v∈Y Φ(v) +T(v;α).
This givestheproblem amuchmoreconventionalflair,asthefollowingexamplesdemonstrate.
Example 10 (TV as a marginal). Considerthe totalvariationregularisation of Example 4. Then A1=D,
N = 1, and
T(v;α) =R(Av;α) =αTV(v).
Example11 (TGV2 as a marginal). IncaseoftheTGV2 regularisationof Example 5,wehaveK(v,w)=v
and K†v= (v,0).ThusKK†f =f,etc., so
T(v;α) = inf
w∈BD(Ω)R(Dv−w, Ew;α) = TGV 2
α(v).
3. Mainresults
Ourtasknowistostudythecharacteristicsofoptimalsolutions,andtheirexistence.Ourresults,based on natural assumptions on the data and the original problem (P), derive properties of the solutions to thisproblemandallnumericallyregularisedproblems (Pγ,)sufficientlyclosetotheoriginalproblem:large γ >0 and small >0.We denoteby uα,γ, any solutionto (Dγ,) forany givenα∈ Pα,and byαγ, any
solutionto (Pγ,).Solutionsto (D
α)and (P)wedenote,respectively,byuα=uα,∞,0, andαˆ=α∞,0.
3.1. L2-squared cost and L2-squared fidelity
Ourmain existenceresultregardingL2-squaredcostsandL2-squaredfidelitiesisthefollowing.
Theorem 1. Let Y and Z be Hilbert spaces, F(u) = 1
2K0u−f0 2
Z, and Φ(v) = 12f −v 2
Y for some
f ∈ R(K), f0∈Z, and a bounded linear operator K0:X →Z satisfying
K0uZ≤C0KuY, for allu∈X for some constantC0>0. (8) Suppose Assumptions A-KA and A-δhold. If for some α¯∈intPα and t∈(0,1/C0] holds
T(f; ¯α)> T(f −t(K0K†)∗(K0K†f−f0); ¯α), (9) then problem (P)admits a solution αˆ∈intPα∞.
If, moreover, Assumption A-H holds, then there exist ¯γ ∈(0,∞) and ¯ ∈(0,∞) such that the problem (Pγ,)with (γ,)∈[¯γ,∞]×[0,¯] admits a solution α
γ,∈intPα∞, and the solution map
S(γ, ) := arg min
α∈Pα∞
F(uα,γ,)
We provethis result inSection 5. Outer semicontinuity of aset-valued map S :Rk ⇒ Rm means [39]
thatforany convergent sequencexk →xand S(xk)yk →y,we havey∈S(x).Inparticular, theouter
semicontinuity of S means that as the numerical regularisation vanishes, the optimal parameters for the regularisedmodels (Pγ,)tendtooptimalparametersoftheoriginalmodel (P).
Remark5.LetZ =Y and K0=K in Theorem 1.Then (9)reducesto
T(f; ¯α)> T(f0; ¯α). (10)
Alsoobserve thatourresultrequires,Φ◦K to measure allthedata thatF measures,inthemoreprecise sense givenby (8).If (8) didnothold,anoscillatingsolutionuα forα∈∂Pα∞,could largelypassthrough
the nullspace of K, hence havelow valuefor the objective J of theinner problem, yet havea large cost givenbyF.
Corollary1 (Total variation Gaussian denoising). Suppose f,f0∈BV(Ω)∩L2(Ω), and
TV(f)>TV(f0). (11)
Then there exist ¯,γ >¯ 0such that any optimal solution αγ, to the problem
αγ,∈arg min α≥0 1 2f0−uα,γ, 2 L2(Ω) with uα,γ,∈arg min u∈BV(Ω) 1 2f −u 2 L2(Ω)+αDuγ,M+ 2∇v 2 L2(Ω;Rn) satisfies αγ,>0whenever ∈[0,¯], γ∈[¯γ,∞].
Thatis,fortheoptimalparametertobestrictlypositive,thenoisyimagef should,intermsofthetotal variation,oscillatemorethanthenoise-freeimagef0.Thisisaverynaturalcondition:ifthenoisesomehow
hadsmoothedoutfeaturesfrom f0,thenweshouldnotsmoothitanymorebyTV regularisation!
Proof. Assumptions A-KA, A-δ, and A-H we have already verified in Example 4. We then observe that
K0 =K,so we are inthesetting of Remark 5. Followingthe mappingof the TVproblem to the general
frameworkusingtheconstructionin Example 4,wehaveK=I and K†=I embeddingswith Y =L2(Ω).
K†is boundedonR(K)=L2(Ω)∩BV(Ω).Moreover,by Example 10, T(v;α¯)= ¯αTV(v).Thus (10)with
thechoicet= 1 reducesto (11). 2
ForTGV2 wealsohaveaverynaturalcondition.
Corollary 2 (Second-order total generalised variation Gaussian denoising). Suppose that the data f,f0 ∈
L2(Ω)∩BV(Ω) satisfies for some α
2>0the condition
TGV2(α2,1)(f)>TGV2(α2,1)(f0). (12) Then there exist,¯γ >¯ 0such that any optimal solution αγ,= ((αγ,)1,(αγ,)2)to the problem
αγ, ∈arg min α≥0 1 2f0−vα,γ, 2 L2(Ω)
with (vα,γ,, wα,γ,)∈ arg min v∈BV(Ω) w∈BD(Ω) 1 2f −v 2 L2(Ω)+α1Dv−wγ,M+α2Ewγ,M + 2(∇v,∇w) 2 L2(Ω;Rn×Rn×n) satisfies (αγ,)1,(αγ,)2>0whenever ∈[0,¯], γ∈[¯γ,∞].
Proof. Assumptions A-KA,A-δ, and A-H we have already verified inExample 5. We then observe that
K0 =K,so we areinthe setting of Remark 5.By Example 11, T(v;α¯)= TGV2α¯(v).Finally, similarlyto (11),wegetfor (10)witht= 1 thecondition (12). 2
Example 12 (Fourier reconstructions). Let K0 be given, for example as constructed inExample 4 or Ex-ample 5.If wetakeK =FK0 forF theFouriertransform –or anyother unitary transform–then (8)is
satisfiedandK†=K0†F∗.Thus (9)becomes
T(f; ¯α)> T(f −t(K0K0†F∗)∗(K0K0†F∗f−f0); ¯α).
WithF∗f,f0∈ R(K0) andt= 1 thisjustreduces to
T(f; ¯α)> T(Ff0; ¯α).
Unfortunately,ourresultsdonotcoverparameterlearningforreconstructionfrompartialFouriersamples exactly because of (8). What we can do is to find the optimal parameters if we only know apart of the ground-truth, buthavefullnoisydata.
3.2. Huberised total variation and other L1-type costs with L2-squared fidelity
We now consider the alternative“Huberised total variation”cost functional from Example 3. Unfortu-nately, weare unableto deriveforFL1
η∇ easilyinterpretableconditionsas fortheFL22.Theworkaroundis
to “discretise” thenorm, inthe senseof testingonlywith afinitenumberof testfunctions.To be precise, recall that FL1 η∇(u) =D(f0−K0u)η, where μη = sup ξ∈C∞c (Ω;Rn), ξ ≤1 (μ|ξ)− 1 2ηξL2(Ω;Rn).
Theideaistoreplacetheset oftestfunctionsξbyafiniteset
V ={ξ1, . . . , ξM} ⊂ {ξ∈Cc∞(Ω;Rn)| ξ ≤1}.
Thatis,weapproximateFL1
η∇ by FLV1 η∇(u) := sup ξ∈V (μ|ξ)− 1 2ηξL2(Ω;Rn), where μ=D(f0−K0u).
ChoosingV suitably, thisamountsto takingthefinite-dimensionalHuberised 1-normofadiscretised gra-dientoff0−K0u.
To study bi-level optimisationproblems with thecost FLV1
η∇, we rewrite the functional using a generic
HuberisedL1-cost FL1 η(z) := sup λ Z∗≤1 (λ|z−f0)− 1 2ηλ 2 Z∗
onareflexiveBanachspaceZ,forsomef0∈Z.Indeed,defining
DVv:={(−divξ1|v), . . . ,(−divξM|v)}, (v∈BV(Ω)),
wehave
FLV1
η∇:=FL1η◦D
V wheref
0:=DVf0 andZ =RM with∞-norm.
OurresultsonFV L1
η∇ arethenbasedonthefollowing generalresult.
Theorem 2. Let Y be a Hilbert space, and Z a reflexive Banach space. Let F = FL1
η ◦K0, and Φ(v) =
1
2f −v 2
Y for some compact linear operator K0 :X →Z satisfying (8)and f ∈ R(K). Suppose
Assump-tions A-KA and A-δ hold. If for some α¯∈intPα and t>0holds
T(f; ¯α)> T(f −t(K0K†)∗λ; ¯α), λ∈∂FL1
η(K0K
†f), (13)
then the problem (P)admits a solution αˆ∈intPα∞.
If, moreover, Assumption A-H holds, then there exist ¯γ ∈ (0,∞) and ¯ ∈ (0,∞) such that the prob-lem (Pγ,) with (γ,) ∈ [¯γ,∞]×[0,¯] admits a solution αγ, ∈ intPα∞, and the solution map S is outer
semicontinuous within [¯γ,∞]×[0,¯].
Weprovethis resultinSection5.
Remark6.IfK0=K,thecondition (13)hasthemuchmorelegibleform
T(f; ¯α)> T(f −tλ; ¯α), λ∈∂FL1
η(f). (14)
Also if K is compact, then the compactness of K0 follows from (8). In the following applications, K is
howevernot compactfor typical domains Ω⊂R2 or R3, so we haveto make K
0 compact bymaking the
rangefinite-dimensional.
Corollary 3 (Total variation Gaussian denoising with discretised Huber-TV cost). Suppose that the data satisfies f,f0∈BV(Ω)∩L2(Ω) and for some t>0and ξ∈V the condition
TV(f)>TV(f+tdivξ), −divξ∈∂FLV1
η∇(f). (15)
Then there exist,¯γ >¯ 0such any optimal solution αγ, to the problem
min α≥0F V L1 η∇(f0−vα) with
uα∈arg min u∈BV(Ω) 1 2f −u 2 L2(Ω)+αDuγ,M+ 2∇v 2 L2(Ω;Rn) satisfies αγ, >0whenever ∈[0,¯], γ∈[¯γ,∞].
This saysthatfortheoptimalparametertobestrictly positive,thenoisyimagef shouldoscillatemore than the image f +tdivξ in the direction of the (discrete) total variation flow. This is a very natural condition, andweobservethatthenon-discretisedcounterpart of (15)forγ=∞wouldbe
TV(f)>TV(f+tdivξ), ξ∈Sgn(Df0−Df),
where wedefineforameasureμ∈ M(Ω;Rm) thesign
Sgn(μ) :={ξ∈L1(Ω;μ)|μ=ξ|μ|}.
Thatis,−divξ isthetotalvariationflow.
Proof. Analogous to Corollary 1 regarding the L2 cost. Note that in the application of Theorem 2, the
operatorDV isabsorbedbyK0.Precisely,wetakeK0=DV,whileK=I. 2
ForTGV2 wealsohaveananalogousnaturalcondition.
Corollary4 (TGV2Gaussian denoising with discretised Huber-TV cost). Suppose that the data f,f0∈L2(Ω) satisfies for some t,α2>0and ξ∈V the condition
TGV2(α2,1)(f)>TGV2(α2,1)(f+tdivξ), −divξ∈∂FLV1
η∇(f). (16)
Then there exist¯,¯γ >0such any optimal solution αγ, = ((αγ,)1,(αγ,)2)to the problem
min α≥0F V L1 η∇(f0−vα) with (vα, wα)∈ arg min v∈BV(Ω) w∈BD(Ω) 1 2f −v 2 L2(Ω)+α1Dv−wγ,M+α2Ewγ,M + 2(∇v,∇w) 2 L2(Ω;Rn×Rn×n) satisfies (αγ,)1,(αγ,)2>0whenever ∈[0,¯], γ∈[¯γ,∞].
Proof. Analogousto Corollary 2. Notethatfortheapplicationof Theorem 2,where inthepresent setting
u= (v,w),wetakeK0u=DVv,while Ku=v. 2
4. Afewauxiliaryresults
We record in this section some general results that will be useful in the proofs of the main results. These includethecoercivity of thefunctional Jγ,0(·;λ,α),recorded inSection4.1. Wethen discusssome
elementarylowersemicontinuityfactsinSection4.2.WeprovideinSection4.3somenewresultsforpassing from strictconvergenceto strongconvergence.
4.1. Coercivity Observethat Rγ(μ1, . . . , μN; ¯α) = sup ψj∈Cc∞(Ω;Rmj×Rn) supx∈Ω ψj(x)22≤1 N j=1 ¯ αj μj(ψj)− 1 2γψj 2 L2(Ω;Rn) . Thus R(μ;α)≥Rγ(μ;α)≥R(μ;α)−C 2γL n(Ω)
for someC =C( ¯α). Since Ω is bounded, it follows that given δ >0, forlarge enough γ >0 and every
≥0 holds
J(u;α)−δ≤Jγ,0(u;α)≤Jγ,(u;α)≤J0,(u;α). (17) We will use these properties frequently. Based on the coercivity and norm equivalence properties in As-sumptions A-KA and A-Φ, the following proposition statesthe important fact thatJγ,0 is coercive with
respectto · X andthusalsothestandardnorm ofX.
Proposition 1.Suppose Assumptions A-KA and A-Φhold, and that α∈intPα∞. Let ≥0 and γ∈(0,∞]. Then
Jγ,(u;α)→+∞ as uX →+∞. (18)
Proof. Let {ui}∞
i=1 ⊂ X, and suppose supiJγ,(ui;α) < ∞. Then in particular supiΦ(Kui) < ∞. By
Assumption A-ΦthensupiKui
Y <∞.But Assumption A-KAsays
uiX=
N
j=1
Ajuij+KuiY ≤Jγ,(ui;α) +KuiY.
This implies supiuiX <∞. Bythe equivalenceof norms in Assumption A-KA, we immediately obtain
(18). 2
4.2. Lower semicontinuity
Werecord thefollowing elementarylower semicontinuityfactsthatwehave alreadyused tojustify our examples.
Lemma1. The following functionals are lower semicontinuous.
(i) μ→ ν−μγ,M with respect to weak* convergence in M(Ω;Rd).
(ii) v → f −v2
L2(Ω;Rd) with respect to weak convergence in Lp(Ω;Rd) for any 1< p≤2 on a bounded
domain Ω.
Proof. In each case, let {vi}∞
i=1 converge to v. Denoting by G the involved functional, we write it as a
convex conjugate,G(v)= sup{v,ϕ−G∗(ϕ)}. Takingasupremisingsequence{ϕj}∞j=1 forthisfunctional at any point v, weeasily see lowersemicontinuity byconsideringthesequences {vi,ϕj−G∗(ϕj)}∞i=1 for eachj.Incase(ii)weusethefactthatϕj∈L2(Ω;Rd)⊂[Lp(Ω;Rd)]∗ whenΩ isbounded.
In case(i), how exactlywe write G(μ)=ν−μγ,M as aconvex conjugate demandsexplanation. We
firstof allrecallthatforg∈Rn,theHuber-regularisednormmaybewrittenindualform as
|g|γ= sup q, g −γ 2q 2 2q2≤1 . Therefore, wefindthat G(μ) = sup ⎧ ⎨ ⎩μ(ϕ)− Ω γ 2ϕ(x) 2 2dx ϕ∈Cc∞(Ω), sup x∈Ω ϕ(x)2≤α ⎫ ⎬ ⎭. This hastherequiredform. 2
Wealsoshowherethatthemarginalregularisationfunctional Tγ(·;α¯) isweakly*lower semicontinuous
onY.ChoosingKas in Example 4and Example 5,thisprovides inparticularaproofthatTV andTGV2 are lowersemicontinuouswithrespect toweakconvergenceinL2(Ω) whenn= 1,2.
Lemma 2. Suppose α¯∈intPα∞, and Assumption A-KA holds. Then Tγ(·;α¯)is lower semicontinuous with respect to weak* convergence in Y, and continuous with respect to strong convergence in R(K).
Proof. Letvk v∗ weakly*inY. BytheBanach–Steinhaustheorem,thesequence isboundedinY.From
thedefinition
Tγ(v; ¯α) := inf
¯
v∈K−1vR
γ(Av¯; ¯α).
Therefore, ifwe pick>0 andv¯k∈K−1vk suchthat
Rγ(Av¯k; ¯α)≤Tγ(vk; ¯α) +,
then referralto Assumption A-KA, yieldsforsomeconstantc>0 the bound
cv¯kX ≤ vk+Rγ(Av¯k; ¯α)≤ vk+Tγ(vk; ¯α) +.
Without lossofgenerality, wemayassumethat lim inf
k→∞ T
γ(vk; ¯α)<∞,
becauseotherwisethereisnothingtoprove.Then{v¯k}∞k=1isboundedinX,andthereforeadmitsaweakly* convergent subsequence. Letv¯be the limitof this, unrelabelled,sequence. SinceK iscontinuous, we find thatK¯v=v. ButRγ(·;α¯) isclearlyweak*lower semicontinuousinX;see Lemma 1. Thus
Tγ(v; ¯α)≤Rγ(Av¯; ¯α)≤lim inf
k→∞ R
γ(A¯vk; ¯α)≤lim inf k→∞ T
γ(vk; ¯α) +.
To seecontinuitywithrespect tostrongconvergenceinR(K),weobservethatifv=Ku∈ R(K),then bytheboundednessoftheoperators{Aj}Nj=1 weget
Tγ(v; ¯α)≤Rγ(u; ¯α)≤R(u; ¯α)≤Cu,
for some constant C > 0. So we know that Tγ(·;α¯)|R(K) is finite-valued and convex. Therefore it is continuous [30, LemmaI.2.1]. 2
4.3. From Φ-strict to strong convergence
In Proposition 3, formingpart of the proof of ourmain theorems, we will need to passfrom “Φ-strict convergence” of Kuk to v to strong convergence, using the following lemmas. The former means that
Φ(Kuk) → Φ(v) and Kuk v weakly* in Y. By strong convexity in a Banach space Y, we mean the existenceofγ >0 suchthatforeveryy∈Y andz∈∂Φ(y)⊂Y∗holds
Φ(y)−Φ(y)≥(z|y−y) +γ 2y
−y2
Y, (y∈Y),
where (z|y) denotes the dual product, and the subdifferential ∂Φ(y) is defined by z satisfying the same expression with γ = 0. With regard to more advanced strict convergence results, we point the reader to
[25,38,36].
Lemma 3. Suppose Y is a Banach space, and Φ : Y → (−∞,∞] strongly convex. If vk vˆ∈ dom∂Φ
weakly* in Y and Φ(vk)→Φ(ˆv), then vk →vˆstrongly in Y.
Remark7. Bystandardconvex analysis [30], v ∈dom∂Φ if Φ has afinite-valued point of continuityand
v∈int dom Φ.
Proof. We first of all note that −∞ < Φ(ˆv) < ∞ because v ∈ dom∂Φ implies v ∈ dom Φ. Let us pick
z∈∂Φ(ˆv).From thestrongconvexityofΦ,forsomeγ >0 then Φ(vk)−Φ(ˆv)≥(z|vk−ˆv) +γ
2v
k−vˆ2
Y.
Takingthelimitinfimum,weobserve
0 = Φ(ˆv)−Φ(ˆv)d≥lim inf k→∞ γ 2v k−vˆ2 Y.
Thisproves strongconvergence. 2
Wenowusethelemmatoshow strongconvergenceofminimisingsequences.
Lemma 4.Suppose Φis strongly convex, satisfies Assumption A-Φ, and that C ⊂Y is non-empty, closed, and convex with intC∩dom Φ=∅. Let
ˆ
v:= arg min
v∈C
Φ(v).
If {vk}∞
Proof. BythestrictconvexityofΦ,impliedbystrongconvexity,andtheassumptionsonC,ˆvisuniqueand well-defined.Moreovervˆ∈dom∂Φ.Indeed,ourassumptionsshowtheexistenceofapointv∈intC∩dom Φ. The indicator function δC is thencontinuous at v, and so standardsubdifferential calculus(see, e.g., [30,
Proposition I.5.6]) implies that ∂(Φ+δC)(ˆv) = ∂Φ(ˆv)+∂δC(ˆv). But 0 ∈ ∂(Φ+δC)(ˆv) because ˆv ∈
arg minv∈YΦ(v)+δC(v).Thisimpliesthat∂Φ(ˆv)=∅.Consequentlyalsovˆ∈dom Φ,andΦ(ˆv)∈R.
Using thecoercivityof Φ in Assumption A-Φ wethen findthat{vk}∞
k=1 isboundedinY, atleastafter
movingtoanunrelabelledtailofthesequencewithΦ(vk)≤Φ(ˆv)+ 1.SinceY isadualspace,theunitball
isweak*compact,andwededucetheexistenceofasubsequence,unrelabelled,andv∈Y suchthatvk v
weakly* inY.Bytheweak*lowersemicontinuity(Assumption A-Φ),wededuce Φ(v)≤lim inf
k→∞ Φ(v
k) = Φ(ˆv).
Since eachvk ∈C,and C is closed,also v ∈C. Therefore,by thestrict convexityandthedefinition ofvˆ,
necessarily v = ˆv. Therefore vk vˆweakly*inY, and Φ(vk)→Φ(ˆv). Lemma 3now shows thatvk →vˆ strongly inY. 2
5. Proofsofthe mainresults
Wenowprovetheexistence,continuity,andnon-degeneracy(interiorsolution)resultsofSection3through a series of lemmas and propositions, starting from general ones that are then specialised to provide the naturalconditionspresentedinSection3.
5.1. Existence and lower semicontinuity under lower bounds
Our principaltool for proving existenceis the following proposition.We will inthe rest ofthis section concentrateonprovingtheexistenceofthesetKinthestatement.Webase thisonthenaturalconditions of Section3.
Proposition 2 (Existence on compact parameter domain). Suppose Assumptions A-KA and A-Φhold. With
≥0and γ∈(0,∞]fixed, if there exists a compact set K ⊂intPα∞ with
inf
α∈Pα∞\KF(uα,γ,)>α∈Pinfα∞
F(uα,γ,), (19)
then there exists a solution αγ,∈intPα∞ to (Pγ,). Moreover, the mapping
Iγ,(α) :=F(uα,γ,),
is lower semicontinuous within intPα∞.
Theproofdepends onthefollowing twolemmasthatwillbeusefullateronas well.
Lemma5 (Lower semicontinuity of the fidelity with varying parameters). Suppose Assumptions A-KA, A-Φ, and A-H hold. Let uk u∗ weakly* in X, and (αk,γk,k)→(α,γ,)∈intP∞
α ×(0,∞]×(0,∞). Then
Jγ,(u;α)≤lim inf
k→∞ J γk,k
Proof. Let ˆ >0 be such thatαj ≥ˆ, (j = 1,. . . ,N). Wethen deducefor largek andsomeC =C(Ω,γ) that lim sup k Jγ,k(uk; ˆ, . . . ,ˆ)≤lim sup k Jγk,k(uk; ˆ, . . . ,ˆ) +C ≤lim sup k Jγk,k(uk;αk) +C <∞. (20) Herewehaveassumedthefinalinequalitytohold.Thiscomeswithoutloss ofgenerality,becauseotherwise there is nothingto prove.Observethatthis holds evenif {αk}∞k=1 isnot bounded.In particular, if>0, restrictingktobe large,wemayassumethat
C1:=H(uk)<∞. (21) Werecallthat Jγ,(u;α) :=H(u) + Φ(Ku) + N j=1 αjAjuγ,j. (22)
Wewantto showlower semicontinuity ofeachof thetermsinturn.Westartwith thesmoothingterm. If
>0,using (21),wewrite
kH(uk) = (k−)H(uk) +H(uk)≤(k−)C1
2 +H(u
k).
Bytheconvergencek→,and theweak*lower semicontinuityofH,wefindthat
H(u)≤lim inf k→∞ kH(uk). (23) If= 0, wehave H(u) = 0· ∞= 0, whilestill 0≤sup k kH(uk)<∞. Thus (23)follows.
The fidelity term Φ◦K is weak* lower semicontinuous by the continuity of K and the weak* lower semicontinuity ofΦ.Itthereforeremainstoconsiderthetermsin (22)involvingboththeregularisation pa-rametersα,aswellastheHuberisationparameterγ.Indeedusingthedualformulation (6)oftheHuberised norm,we haveforsomeconstantC=C(ˆ,Ω) that
Ajukγk,j= sup ϕ(x) ≤1 ϕ dAjuk− 1 2γkϕ 2dx ≥ sup ϕ(x) ≤1 Ω ϕ dAjuk+ 1 2γϕ 2dx−C|γ−1−(γk)−1| =Ajukγ,j−C|γ−1−(γk)−1|.
Thus, ifα∈ Pα, weget
αkjAjukγk,j=αjAjukγk,j+ (αkj −αj)Ajukγk,j
≥αjAjukγk,j− |αkj −αj|Ajukγk,j
≥αjAjukγ,j−C|γ−1−(γk)−1| − |αkj −αj|Ajukγk,j.
It follows from (20) thatthe sequence {Ajukγk,j}k∞=1 is bounded in M(Ω;Rmj) for eachj = 1,. . . ,N.
Thus lim inf k→∞ α k jAjukγk,j≥lim inf k→∞ αjAju k γ,j ≥αjAjuγ,j,
where thefinalstepfollowsfrom Lemma 1.
Itremainstoconsider thecasethatα∈ Pα∞\ Pα,i.e.,whenα=∞forsome.Wemaypicksequences
{β
j}∞=1, (j = 1,. . . ,N), such that βj αj. Further, we may find {βjk,}∞=1 such that β
k,
j ≤ αkj with
αk
j = lim→∞βjk, andβj= limk→∞βjk,.Then, bythebounded casestudiedabove
lim inf k→∞ α k jAjukγk,j≥lim inf k→∞ β k, j Ajukγk,j≥βjAjuγ,j.
But{αkjAjukγk,j}∞k=1 isboundedby (20),and
lim inf
→∞ β
jAjuγ,j ≥αjAjuγ,j.
Thus lowersemicontinuityfollows. 2
Lemma 6 (Convergence of reconstructions away from boundary). Suppose Assumptions A-KA, A-Φ, and A-H hold. Let (αk,γk,k)→(α,γ,)in intP∞
α ×(0,∞]×[0,∞). Then we can find uα,γ,∈arg minJγ,(·;α)
and extract a subsequence satisfying
Jγk,k(uαk,γk,k;αk)→Jγ,(uα,γ,;α), (24a)
uαk,γk,k u∗ α,γ, weakly* X, and (24b) Kuαk,γk,k→Kuα,γ, strongly inY. (24c)
Proof. By Lemma 5, wehave lim inf k→∞ J γk,k(u αk,γk,k;αk)≥Jγ,(uα,γ,;α) = min u∈XJ γ,(u;α).
Wealsowanttheoppositeinequality lim sup
k→∞ J γk,k(u
αk,γk,k;αk)≤Jγ,(uα,γ,;α). (25)
Letδ >0.If= 0,weuse Assumption A-H onu=uα,γ,,toproduceuδ.Otherwise,wesetuδ =uα,γ,.In
eithercase