Variational Problem - Kullback-Leibler minimisation

Chapter 4 Bayesian Approach to Bar Code Denoising

5.3 Kullback-Leibler minimisation

5.3.3 Variational Problem

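Recall that

$$\mathcal{A} = \big\{ N\big(m, 2(-\partial_t^2 + B_\varepsilon)^{-1}\big) : (m, A) \in \mathcal{H} \big\},$$

where $B_\varepsilon = \varepsilon^{-2}A^2 - \varepsilon^{-1}A'$, and that the set $\mathcal{A}_a$ is defined in the same way with $\mathcal{H}$ replaced by $\mathcal{H}_a$ for some $a > 0$. Given the measure $\mu_\varepsilon$ defined by (5.17), i.e. the law of transition paths, we aim to find optimal Gaussian measures $\nu_\varepsilon$ from $\mathcal{A}$ or $\mathcal{A}_a$ minimising the Kullback-Leibler divergence $D_{\mathrm{KL}}(\nu_\varepsilon \| \mu_\varepsilon)$. To that end, first note that, in view of (5.40), the constants $\frac{|x_1 - x_0|^2}{4}$ and $\log(Z_{\mu,\varepsilon})$ can be neglected in the minimisation process since they do not depend on the choice of $\nu_\varepsilon$. Hence we are only concerned with minimising the modified Kullback-Leibler divergence $\widetilde{D}_{\mathrm{KL}}(\nu_\varepsilon \| \mu_\varepsilon)$. Furthermore, instead of minimising $\widetilde{D}_{\mathrm{KL}}(\nu_\varepsilon \| \mu_\varepsilon)$ itself, we consider the variational problem

$$\inf_{\nu_\varepsilon \in \mathcal{A}} \Big( \varepsilon \widetilde{D}_{\mathrm{KL}}(\nu_\varepsilon \| \mu_\varepsilon) + \varepsilon^{\gamma} \|A\|^2_{H^1(0,1)} \Big), \tag{5.42}$$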

where $\gamma > 0$ and $\mathcal{A}$ is given by (5.31). We will also study the minimisation problem over the set $\mathcal{A}_a$. The problem (5.42) is of interest to us for the following reasons. First, multiplying $\widetilde{D}_{\mathrm{KL}}(\nu_\varepsilon \| \mu_\varepsilon)$ by $\varepsilon$ does not change the minimisers. Yet after this scaling the $m$-dependent terms of $\widetilde{D}_{\mathrm{KL}}(\nu_\varepsilon \| \mu_\varepsilon)$ (the first two terms on the right-hand side of (5.41)) and the $A$-dependent terms (the middle line of (5.41)) are well balanced, since they are all quantities of order one with respect to $\varepsilon$. Moreover, the regularisation term $\varepsilon^{\gamma} \|A\|^2_{H^1(0,1)}$ is necessary because the matrix $B_\varepsilon$, along any infimising sequence for $\varepsilon \widetilde{D}_{\mathrm{KL}}(\nu_\varepsilon \| \mu_\varepsilon)$, will only converge weakly, and the minimiser may not be attained in $\mathcal{A}$. This issue is illustrated in [187, Example 3.8 and Example 3.9], where a similar regularisation is used.

Remark 5.3.3. The normalisation constant $Z_{\mu,\varepsilon}$ in (5.40) is dropped in our minimisation problem. This is one of the advantages of quantifying measure approximations by means of the Kullback-Leibler divergence. However, understanding the asymptotic behaviour of $Z_{\mu,\varepsilon}$ in the limit $\varepsilon \to 0$ is quite important, even though this is difficult. In particular, it allows us to study the asymptotic behaviour of the scaled Kullback-Leibler divergence $\varepsilon D_{\mathrm{KL}}(\nu_\varepsilon \| \mu_\varepsilon)$, whereby quantitative information on the quality of the Gaussian approximation in the small temperature limit can be extracted. In the next section we study the behaviour of the minimisers of the functional defined in (5.42) in the limit $\varepsilon \to 0$; we postpone the study of $\varepsilon D_{\mathrm{KL}}(\nu_\varepsilon \| \mu_\varepsilon)$, which requires analysis of $Z_{\mu,\varepsilon}$ in the limit $\varepsilon \to 0$, to future work.

Remark 5.3.4. We choose the small weight $\varepsilon^{\gamma}$ with some $\gamma > 0$ in front of the regularisation term with the aim of weakening the contribution from the regularisation so that it disappears in the limit $\varepsilon \to 0$. For the study of the $\Gamma$-limit of $F_\varepsilon$, we will consider $\gamma \in (0, \tfrac{1}{2})$; see Theorem 5.4.5 in the next section.

Remark 5.3.5. The Kullback-Leibler divergence is not symmetric in its arguments. We do not study $\widetilde{D}_{\mathrm{KL}}(\mu_\varepsilon \| \nu_\varepsilon)$ because minimisation of this functional over the class of Gaussian measures leads simply to moment matching, and this is not appropriate for problems with multiple minimisers; see [19, Section 10.7].
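The asymmetry described in Remark 5.3.5 can be seen in a toy one-dimensional computation. The sketch below is only an illustration under stated assumptions: the target is a hypothetical two-component Gaussian mixture (not the path measure $\mu_\varepsilon$), the divergences are approximated by Riemann sums on a grid, and the minimisation is a brute-force grid search over the mean and standard deviation of the Gaussian.

```python
import numpy as np

# Assumption: a hypothetical bimodal target, the equal mixture of N(-3, 1) and N(3, 1).
x = np.linspace(-30.0, 30.0, 6001)
dx = x[1] - x[0]

def log_gauss(x, mean, std):
    """Log-density of the one-dimensional Gaussian N(mean, std^2)."""
    return -0.5 * ((x - mean) / std) ** 2 - np.log(std * np.sqrt(2.0 * np.pi))

log_mu = np.logaddexp(log_gauss(x, -3.0, 1.0), log_gauss(x, 3.0, 1.0)) + np.log(0.5)

def kl(log_p, log_q):
    """Riemann-sum approximation of D_KL(p || q) from log-densities on the grid."""
    p = np.exp(log_p)
    return float(np.sum(p * (log_p - log_q)) * dx)

def best_gaussian(objective):
    """Brute-force search for the Gaussian (mean, std) minimising the objective."""
    means = np.linspace(-4.0, 4.0, 81)
    stds = np.linspace(0.5, 5.0, 91)
    best = min(
        ((objective(log_gauss(x, m, s)), m, s) for m in means for s in stds),
        key=lambda item: item[0],
    )
    return float(best[1]), float(best[2])

# Minimising D_KL(mu || nu) over Gaussians nu: moment matching.
mean_mu_nu, std_mu_nu = best_gaussian(lambda log_nu: kl(log_mu, log_nu))
# Minimising D_KL(nu || mu) over Gaussians nu: the divergence used in this section.
mean_nu_mu, std_nu_mu = best_gaussian(lambda log_nu: kl(log_nu, log_mu))

print("argmin of D_KL(mu || nu):", mean_mu_nu, std_mu_nu)
print("argmin of D_KL(nu || mu):", mean_nu_mu, std_nu_mu)
```

In this toy setting the minimiser of $D_{\mathrm{KL}}(\mu \| \nu)$ matches the mean and variance of the mixture and therefore places most of its mass between the two modes, whereas the minimiser of $D_{\mathrm{KL}}(\nu \| \mu)$ concentrates on a single mode.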

The following theorem establishes the existence of minimisers for the problem (5.42).

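Theorem 5.3.6. Let $\mu_\varepsilon$ be the measure defined by (5.17) with fixed $\varepsilon > 0$. Then there exists at least one measure $\nu \in \mathcal{A}$ (or $\mathcal{A}_a$) minimising the functional

$$\nu \mapsto \varepsilon \widetilde{D}_{\mathrm{KL}}(\nu \| \mu_\varepsilon) + \varepsilon^{\gamma} \|A\|^2_{H^1(0,1)} \tag{5.43}$$

over $\mathcal{A}$ (or $\mathcal{A}_a$).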

Proof. We only prove the theorem for the case where the minimisation problem is defined over $\mathcal{A}_a$, since the other case can be treated in the same manner. First we show that the infimum of (5.43) over $\mathcal{A}_a$ is finite for any fixed $\varepsilon > 0$. Indeed, consider $A^* = a \cdot \mathrm{Id}$ with $a > 0$ and let $m^*$ be any fixed function in $H^1_\pm(0,1)$. We show that $F(m^*, A^*)$ is finite. By the formula (5.41), we only need to show that

$$\mathbb{E}^{\nu_\varepsilon} \int_0^1 \Psi_\varepsilon\big(z(t) + m^*(t)\big)\,dt < \infty.$$

Since $A^* = a \cdot \mathrm{Id}$, from (5.28) one can see that $z(t) \sim N(0, 2G_\varepsilon(t,t))$ under the measure $\nu_\varepsilon$. In addition, it follows from (5.71) that $|G_\varepsilon(t,t)|_F \le C\varepsilon$ a.e. on $(0,1)$ for some $C > 0$.

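Then from the growth condition (A-4) on $\Psi_\varepsilon$ and the fact that $m^* \in L^\infty(0,1)$,

$$\begin{aligned}
\mathbb{E}^{\nu_\varepsilon} \int_0^1 \Psi_\varepsilon\big(z(t) + m^*(t)\big)\,dt
&= \int_0^1 \int_{\mathbb{R}^d} \frac{1}{\sqrt{(4\pi)^d \det(G_\varepsilon(t,t))}}\, e^{-\frac{1}{4} x^T G_\varepsilon(t,t)^{-1} x}\, \Psi_\varepsilon\big(x + m^*(t)\big)\,dx\,dt \\
&= \int_0^1 \int_{\mathbb{R}^d} \frac{1}{(4\pi)^{d/2}}\, e^{-\frac{1}{4}|x|^2}\, \Psi_\varepsilon\big(G_\varepsilon(t,t)^{1/2} x + m^*(t)\big)\,dx\,dt \\
&\le C_1 \exp\big(\|m^*\|^{\alpha}_{L^\infty(0,1)}\big) \int_{\mathbb{R}^d} e^{-\frac{1}{4}|x|^2 + C_2 \varepsilon^{\alpha} |x|^{\alpha}}\,dx < \infty,
\end{aligned}$$

where the last integral is finite since $\alpha \in [0,2)$.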
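The change of variables $x \mapsto G_\varepsilon(t,t)^{1/2} x$ behind the second equality can be checked numerically in one dimension. The following is a minimal sketch under stated assumptions: $d = 1$, a fixed scalar value `G` standing in for $G_\varepsilon(t,t)$ at a single time $t$, a constant shift `m`, and a hypothetical double-well potential `psi` in place of $\Psi_\varepsilon$ (chosen only because its Gaussian moments are available in closed form).

```python
import numpy as np

def psi(x):
    """Hypothetical double-well potential standing in for Psi_eps."""
    return 0.25 * (x**2 - 1.0) ** 2

def expectation_original(G, m, half_width=12.0, n=40001):
    """Integrate psi(x + m) against the N(0, 2G) density (first form)."""
    x = np.linspace(-half_width, half_width, n)
    dx = x[1] - x[0]
    density = np.exp(-x**2 / (4.0 * G)) / np.sqrt(4.0 * np.pi * G)
    return float(np.sum(density * psi(x + m)) * dx)

def expectation_rescaled(G, m, half_width=12.0, n=40001):
    """Same quantity after substituting x -> sqrt(G) x (second form)."""
    x = np.linspace(-half_width, half_width, n)
    dx = x[1] - x[0]
    density = np.exp(-x**2 / 4.0) / np.sqrt(4.0 * np.pi)
    return float(np.sum(density * psi(np.sqrt(G) * x + m)) * dx)

def expectation_exact(G, m):
    """Closed form from the second and fourth moments of N(m, 2G)."""
    s2 = 2.0 * G
    second = m**2 + s2
    fourth = m**4 + 6.0 * m**2 * s2 + 3.0 * s2**2
    return 0.25 * (fourth - 2.0 * second + 1.0)

G, m = 0.05, 0.7  # illustrative values only
value_original = expectation_original(G, m)
value_rescaled = expectation_rescaled(G, m)
value_exact = expectation_exact(G, m)
print(value_original, value_rescaled, value_exact)
```

The three values agree up to quadrature error, which is the one-dimensional content of the identity; the rescaled form is the convenient one for the final bound because the Gaussian weight no longer depends on $\varepsilon$.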

Next, we prove that the minimiser exists. By examining the proof of [187, Theorem 3.10], one can see that the theorem is proved if the following statement is valid: if a sequence $\{A_n\} \subset H^1_a(0,1)$ satisfies $\sup_n \|A_n\|_{H^1(0,1)} < \infty$, then the sequence $\{B_n\}$ with $B_n = \varepsilon^{-2}A_n^2 - \varepsilon^{-1}A_n'$, viewed as multiplication operators, contains a subsequence that converges to $B = \varepsilon^{-2}A^2 - \varepsilon^{-1}A'$ in $\mathcal{L}(H^{\beta}, H^{-\beta})$ for some $A \in H^1_a(0,1)$ and some $\beta \in (0,1)$. Hence we only need to show that the latter statement is true. In fact, if $\sup_n \|A_n\|_{H^1(0,1)} < \infty$, then there exist a subsequence $\{A_{n_k}\}$ and some $A \in H^1(0,1)$ such that $A_{n_k} \rightharpoonup A$ in $H^1(0,1)$. By Rellich's compact embedding theorem, $A_{n_k} \to A$ in $L^2(0,1)$, and passing to a further subsequence we may assume that $A_{n_k} \to A$ a.e. on $[0,1]$. This implies that $A$ is symmetric and $A \ge a \cdot \mathrm{Id}$ a.e., and hence $A \in H^1_a(0,1)$. In addition, it is clear that $B_{n_k} \rightharpoonup B$ in $L^2(0,1)$. According to Lemma 5.7.9, for any $\alpha, \beta > 0$ such that $\beta > \max(\alpha, \alpha/2 + 1/4)$, a matrix-valued function in $H^{-\alpha}(0,1)$ can be viewed as a multiplication operator in $\mathcal{L}(H^{\beta}, H^{-\beta})$. Thanks to the compact embedding from $L^2(0,1)$ to $H^{-\alpha}(0,1)$, we obtain $B_{n_k} \to B$ in $\mathcal{L}(H^{\beta}, H^{-\beta})$. The proof is complete.
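The mechanism used at the end of the proof, namely that a sequence converging only weakly in $L^2(0,1)$ converges strongly in a negative-order Sobolev space, can be illustrated on a scalar oscillating sequence. The sketch below rests on assumptions not taken from the text: the functions are $f_n(t) = \sqrt{2}\sin(n\pi t)$, and the $H^{-\alpha}(0,1)$ norm is taken to be the weighted sine-series norm $\big(\sum_k (k\pi)^{-2\alpha} |\hat f_k|^2\big)^{1/2}$, which may differ from the definition used elsewhere in this document.

```python
import numpy as np

N = 4096
t = (np.arange(N) + 0.5) / N            # midpoint grid on (0, 1)
K = 256
k = np.arange(1, K + 1)
basis = np.sqrt(2.0) * np.sin(np.pi * np.outer(k, t))   # sine basis, shape (K, N)

def l2_norm(f_values):
    """Midpoint-rule approximation of the L^2(0,1) norm."""
    return float(np.sqrt(np.mean(f_values**2)))

def negative_sobolev_norm(f_values, alpha):
    """Assumed H^{-alpha} norm: sine coefficients weighted by (k*pi)^(-alpha)."""
    coefficients = basis @ f_values / N
    return float(np.sqrt(np.sum((np.pi * k) ** (-2.0 * alpha) * coefficients**2)))

g = t * (1.0 - t)                        # a fixed test function in L^2(0,1)
alpha = 0.5
rows = []
for n in (1, 4, 16, 64):
    f = np.sqrt(2.0) * np.sin(np.pi * n * t)
    pairing = float(np.mean(f * g))      # approximates the L^2 inner product <f_n, g>
    rows.append((n, l2_norm(f), pairing, negative_sobolev_norm(f, alpha)))

for n, l2, pairing, hneg in rows:
    print(f"n={n:3d}  L2 norm={l2:.6f}  <f_n, g>={pairing:.3e}  H^-alpha norm={hneg:.6f}")
```

The $L^2$ norm stays equal to one, so $f_n$ does not converge strongly to zero in $L^2(0,1)$, while both the pairing with the fixed test function and the assumed $H^{-\alpha}$ norm decay as $n$ grows.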

Remark 5.3.7. Minimisers of (5.43) are not unique in general. The uniqueness issue is outside the scope of this chapter; for further discussion of the uniqueness of minimisers of the Kullback-Leibler divergence, see [187, Section 3.4].

In document Asymptotic analysis and computations of probability measures (Page 131-133)