Chapter 4 Bayesian Approach to Bar Code Denoising
5.1 Introduction
Determining the behaviour of transition paths of complex molecular dynamics is essential for understanding many problems in physics, chemistry and biology. Direct simulation of these systems can be prohibitively expensive, mainly because such dynamical systems can exhibit metastability, which involves disparate time scales: the transition between metastable states takes a time that is logarithmic in the inverse temperature, whilst the fluctuations within the metastable states have durations that are exponential in the inverse temperature. In many systems the interest is focused on the transitions between metastable states and not on the local fluctuations within them. This chapter addresses the problem of characterising the most likely transition paths of molecular models of chemical reactions.
We focus on the Brownian dynamics model from molecular dynamics which takes the form of a gradient flow in a potential, subject to small additive thermal noise:
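$$dx(t) = -\nabla V(x(t))\,dt + \sqrt{2\varepsilon}\,dW(t); \tag{5.1}$$
we study the equation subject to the end-point conditions
$$x(0) = x_-, \qquad x(T) = x_+. \tag{5.2}$$
Here $V : \mathbb{R}^d \to \mathbb{R}$ is the potential function, $W$ is a standard Brownian motion in $\mathbb{R}^d$ and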
$\varepsilon > 0$ is a small parameter related to the temperature of the thermal system. The Brownian dynamics model is widely used in the study of molecular dynamics [148]. It is also referred to as the overdamped Langevin equation, and can be derived from the second-order Langevin dynamics model, which has the form of damped-driven Newtonian dynamics with potential energy $V$, by taking a large-friction or small-mass limit; see [184, Chapter 7, Exercise 8] and [148] for explicit derivations.
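To make (5.1) concrete, the following minimal sketch integrates it with the Euler-Maruyama scheme. It rests on assumptions that are not part of the chapter: a one-dimensional double-well potential $V(x) = (x^2-1)^2/4$ with minima at $\pm 1$, and hypothetical helper names `grad_V` and `euler_maruyama`. It also drops the end-point condition $x(T) = x_+$, so it samples the unconditioned dynamics rather than the bridge defined by (5.1), (5.2).

```python
import numpy as np

def grad_V(x):
    # Hypothetical 1D double well V(x) = (x^2 - 1)^2 / 4, minima at x = -1 and x = +1
    return x * (x**2 - 1)

def euler_maruyama(x0, T, n_steps, eps, rng):
    """Euler-Maruyama for dx = -grad V(x) dt + sqrt(2 eps) dW on [0, T].

    Only x(0) = x0 is imposed; the end-point condition x(T) = x+ is NOT enforced.
    """
    dt = T / n_steps
    x = np.empty(n_steps + 1)
    x[0] = x0
    for k in range(n_steps):
        dW = rng.normal(scale=np.sqrt(dt))  # Brownian increment ~ N(0, dt)
        x[k + 1] = x[k] - grad_V(x[k]) * dt + np.sqrt(2 * eps) * dW
    return x

rng = np.random.default_rng(0)
path = euler_maruyama(x0=-1.0, T=50.0, n_steps=5000, eps=0.05, rng=rng)
print(path.shape)  # (5001,)
```

Setting `eps=0.0` reduces the scheme to the explicit Euler method for the gradient flow $\dot x = -\nabla V(x)$, which gives a quick sanity check: starting from $x_0 = 0.5$ the path settles at the minimum $x = 1$.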
Mathematically, we understand the process $x(t)$, $t \in [0,T]$, satisfying (5.1), (5.2) to be the solution of the initial value problem for (5.1) starting from $x(0) = x_-$, conditioned on $x(T) = x_+$ [112]. We propose to study the sample paths of this conditioned process as a model for the temporal evolution of molecules making a transition between two atomistic configurations $x_\pm$. In this chapter we assume that $x_\pm$ are critical points of $V$; indeed
most interest focuses on the case where both endpoints are chosen to be local minima of $V$. When the temperature $\varepsilon$ is small and the end-point condition on $x(T)$ is removed, typical realisations of (5.1) fluctuate around the local minima of $V$ for long stretches of time (exponential in $\varepsilon^{-1}$), while the occasional rapid transitions between different minima occur on a much shorter time scale, which is only logarithmic in $\varepsilon^{-1}$. The difference between these time scales makes it difficult to sample transition paths when $\varepsilon$ is small. As an alternative to direct sampling, several notions of “most likely transition paths” have been proposed; of particular interest here are the Freidlin-Wentzell and Onsager-Machlup theories.
In the zero temperature limit $\varepsilon \to 0$, the behaviour of transition paths can be predicted with overwhelming probability using Freidlin-Wentzell theory [98]. For any fixed $T$, the solution process $\{x(t), t \in [0,T]\}$ of (5.1), (5.2) satisfies a large deviation principle with rate (or action) functional given by
$$S_T(\phi) := \frac{1}{4}\int_0^T |\phi'(t) + \nabla V(\phi(t))|^2\,dt \tag{5.3}$$
with $\phi \in H^1_\pm(0,T;\mathbb{R}^d) := \{x \in H^1(0,T;\mathbb{R}^d) : x(0) = x_-,\ x(T) = x_+\}$. Loosely speaking, the large deviation principle states that for any small $\delta > 0$, the probability that the solution $x$ lies in a tube of width $\delta$ around a given path $\phi$ is approximately given by
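$$\mathbb{P}\Big\{x : \sup_{t\in[0,T]} |x(t) - \phi(t)| \le \delta\Big\} \approx \exp\big(-\varepsilon^{-1} S_T(\phi)\big) \tag{5.4}$$
for $\varepsilon$ small enough. Here $\mathbb{P}$ denotes the law of the process defined in (5.1), (5.2). The large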
deviation principle thus characterises the exponential tail of the distribution of the transition paths; but what is of most interest to us is that it leads to a natural variational definition of the most likely path: the minimiser of the rate functional $S_T$ can be interpreted as the most likely path, in the sense that the probability of a trajectory lying in a small neighbourhood of this minimiser is exponentially larger in $\varepsilon^{-1}$ than the probability of hitting neighbourhoods of any other path.
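In view of the boundary conditions (5.2), one can rewrite the functional $S_T$ as
$$\begin{aligned} S_T(\phi) &= \frac14\int_0^T |\phi'(t) + \nabla V(\phi(t))|^2\,dt \\ &= \frac14\int_0^T \big(|\phi'(t)|^2 + |\nabla V(\phi(t))|^2\big)\,dt + \frac12\int_0^T \phi'(t)\cdot\nabla V(\phi(t))\,dt \\ &= \frac14\int_0^T \big(|\phi'(t)|^2 + |\nabla V(\phi(t))|^2\big)\,dt + \frac12\big(V(x_+) - V(x_-)\big), \end{aligned} \tag{5.5}$$
where the last step uses $\phi'(t)\cdot\nabla V(\phi(t)) = \frac{d}{dt}V(\phi(t))$.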
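The identity (5.5) can be checked numerically on a discretised path. The sketch below is illustrative only. The tilted double well $V(x) = (x^2-1)^2/4 + 0.1x$, the test path $\phi(t) = -\cos(\pi t/T)$ from $-1$ to $1$, the midpoint discretisation and the helper names `action_53`/`action_56` are all assumptions made for the example. The tilt makes the boundary term $\frac12(V(x_+) - V(x_-))$ nonzero; since (5.5) uses only the boundary values, the endpoints need not be critical points here. The two sides agree up to quadrature error.

```python
import numpy as np

C = 0.1  # hypothetical tilt so that V(x+) != V(x-)

def V(x):
    return 0.25 * (x**2 - 1)**2 + C * x

def grad_V(x):
    return x * (x**2 - 1) + C

def action_53(phi, T):
    """Midpoint rule for S_T(phi) = 1/4 int |phi' + grad V(phi)|^2 dt, eq. (5.3)."""
    dt = T / (len(phi) - 1)
    dphi = np.diff(phi) / dt
    mid = 0.5 * (phi[1:] + phi[:-1])
    return 0.25 * np.sum((dphi + grad_V(mid))**2) * dt

def action_56(phi, T):
    """Midpoint rule for 1/4 int (|phi'|^2 + |grad V(phi)|^2) dt, eq. (5.6)."""
    dt = T / (len(phi) - 1)
    dphi = np.diff(phi) / dt
    mid = 0.5 * (phi[1:] + phi[:-1])
    return 0.25 * np.sum(dphi**2 + grad_V(mid)**2) * dt

T, N = 5.0, 2000
t = np.linspace(0.0, T, N + 1)
phi = -np.cos(np.pi * t / T)  # a path from x- = -1 to x+ = +1
lhs = action_53(phi, T)
rhs = action_56(phi, T) + 0.5 * (V(phi[-1]) - V(phi[0]))
print(abs(lhs - rhs))  # small: only the midpoint-rule quadrature error remains
```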
The last term in this expression depends only on the boundary conditions and not on the specific choice of $\phi$. Hence minimising $S_T(\phi)$ is equivalent to minimising the following Freidlin-Wentzell functional, which, with a slight abuse of notation, we again denote by $S_T$:
$$S_T(\phi) := \frac14\int_0^T \big(|\phi'(t)|^2 + |\nabla V(\phi(t))|^2\big)\,dt \tag{5.6}$$
over $H^1_\pm(0,T;\mathbb{R}^d)$. From now on we refer to the minimisation of this functional as the Freidlin-Wentzell approach. The Freidlin-Wentzell viewpoint has been enormously influential in the study of chemical reactions. For example, the elastic band method [135] and the string method [80, 82] are numerical methods for finding minimal energy paths based on minimisation of the action functional (5.3). See the review article [220] for recent developments in transition path theory.
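As a rough illustration of the Freidlin-Wentzell approach, one can discretise (5.6) and hand it to a general-purpose optimiser. This is a sketch under stated assumptions. The one-dimensional double well, the midpoint rule, the straight-line initial guess and SciPy's L-BFGS-B with finite-difference gradients are all choices made for the example, and the name `fw_action` is hypothetical. It is not the elastic band or string method cited above.

```python
import numpy as np
from scipy.optimize import minimize

def grad_V(x):
    # Hypothetical 1D double well V(x) = (x^2 - 1)^2 / 4 with minima at -1 and +1
    return x * (x**2 - 1)

def fw_action(interior, x_minus, x_plus, T):
    """Midpoint discretisation of (5.6) for the path (x_minus, interior..., x_plus)."""
    phi = np.concatenate(([x_minus], interior, [x_plus]))
    dt = T / (len(phi) - 1)
    dphi = np.diff(phi) / dt
    mid = 0.5 * (phi[1:] + phi[:-1])
    return 0.25 * np.sum(dphi**2 + grad_V(mid)**2) * dt

T, N = 10.0, 100
x_minus, x_plus = -1.0, 1.0
init = np.linspace(x_minus, x_plus, N + 1)[1:-1]  # straight line as initial guess
# Only the interior points are optimised; the end-point conditions (5.2) stay fixed.
res = minimize(fw_action, init, args=(x_minus, x_plus, T), method="L-BFGS-B")
print(fw_action(init, x_minus, x_plus, T), res.fun)
```

For this symmetric double well, splitting any path from $-1$ to $1$ at its first crossing of $0$ and applying $a^2 + b^2 \ge \pm 2ab$ on each piece shows that the continuous action (5.6) is at least $V(0) - V(-1) = 1/4$. Optimised values close to $1/4$ are therefore expected once $T$ is moderately large.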
At finite temperature $\varepsilon > 0$, optimal transition paths can be defined as minimisers of the Onsager-Machlup functional [78]. This functional is defined by maximising small ball probabilities for paths $x(\cdot)$ solving (5.1), (5.2). To be more precise, we denote by $\mathbb{P}_0$ the law of the Brownian bridge on $[0,T]$ connecting $x_-$ and $x_+$, which corresponds to vanishing drift ($V = 0$) in (5.1), (5.2) and depends on $\varepsilon$. Then, under certain conditions on $V$ (see (ii) of Remark 5.2.2), the measure $\mathbb{P}$ is absolutely continuous with respect to $\mathbb{P}_0$ and the Radon-Nikodym density is given by
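$$\frac{d\mathbb{P}}{d\mathbb{P}_0}(x) = \frac{1}{Z}\exp\Big(-\frac{1}{2\varepsilon}\int_0^T \Psi_\varepsilon(x(t))\,dt\Big), \tag{5.7}$$
where
$$\Psi_\varepsilon(x) := \frac12|\nabla V(x)|^2 - \varepsilon\,\Delta V(x). \tag{5.8}$$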
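Because the normalisation constant $Z$ in (5.7) is usually unknown, in practice one evaluates the exponent; its difference between two paths gives a log density ratio. The sketch below computes $\Psi_\varepsilon$ from (5.8) and a trapezoidal approximation of the exponent for a path sampled on a uniform grid. The quadratic potential $V(x) = |x|^2/2$ on $\mathbb{R}^d$ (so $\nabla V(x) = x$ and $\Delta V = d$) and the function names are hypothetical choices that keep everything explicit. Only values $x(t)$ enter the exponent, not derivatives, so it can be evaluated on rough (Brownian-bridge-like) grid paths as well.

```python
import numpy as np

def grad_V(x):
    # Hypothetical quadratic potential V(x) = |x|^2 / 2 on R^d
    return x

def laplacian_V(x):
    # For V(x) = |x|^2 / 2, Delta V = d everywhere
    return float(x.shape[-1])

def psi(x, eps):
    """Psi_eps(x) = 1/2 |grad V(x)|^2 - eps * Delta V(x), eq. (5.8)."""
    return 0.5 * np.sum(grad_V(x)**2) - eps * laplacian_V(x)

def log_density_unnormalised(path, T, eps):
    """-1/(2 eps) * int_0^T Psi_eps(x(t)) dt via the trapezoidal rule; log Z is omitted.

    `path` has shape (n_points, d) on a uniform grid over [0, T].
    """
    vals = np.array([psi(x, eps) for x in path])
    dt = T / (len(path) - 1)
    integral = dt * (vals.sum() - 0.5 * (vals[0] + vals[-1]))
    return -integral / (2 * eps)
```

For example, on the constant path $x(t) \equiv 0$ in $\mathbb{R}^3$ with $T = 2$ and $\varepsilon = 0.1$, $\Psi_\varepsilon \equiv -0.3$ and the exponent equals $3$.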
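We define the Onsager-Machlup functional $I_\varepsilon$ over the space $H^1_\pm(0,T;\mathbb{R}^d)$ by
$$I_\varepsilon(x) := \frac12\int_0^T \Big(\frac12|x'(t)|^2 + \Psi_\varepsilon(x(t))\Big)\,dt = S_T(x) - \frac{\varepsilon}{2}\int_0^T \Delta V(x(t))\,dt. \tag{5.9}$$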
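A discretised version of (5.9) makes its relation to the Freidlin-Wentzell functional (5.6) explicit: the two differ only by $\frac{\varepsilon}{2}\int_0^T \Delta V\,dt$. The sketch reuses a hypothetical one-dimensional double well $V(x) = (x^2-1)^2/4$, for which $\Delta V = V'' = 3x^2 - 1$, together with a midpoint rule. Both are assumptions for illustration, as are the function names.

```python
import numpy as np

def grad_V(x):
    # Hypothetical 1D double well V(x) = (x^2 - 1)^2 / 4
    return x * (x**2 - 1)

def lap_V(x):
    # In one dimension Delta V = V''(x) = 3x^2 - 1
    return 3 * x**2 - 1

def om_functional(phi, T, eps):
    """Midpoint rule for I_eps(phi) = 1/2 int (1/2 |phi'|^2 + Psi_eps(phi)) dt, eq. (5.9)."""
    dt = T / (len(phi) - 1)
    dphi = np.diff(phi) / dt
    mid = 0.5 * (phi[1:] + phi[:-1])
    psi = 0.5 * grad_V(mid)**2 - eps * lap_V(mid)  # Psi_eps from (5.8)
    return 0.5 * np.sum(0.5 * dphi**2 + psi) * dt

def fw_functional(phi, T):
    """Same midpoint rule for S_T(phi) in (5.6)."""
    dt = T / (len(phi) - 1)
    dphi = np.diff(phi) / dt
    mid = 0.5 * (phi[1:] + phi[:-1])
    return 0.25 * np.sum(dphi**2 + grad_V(mid)**2) * dt

T, N, eps = 5.0, 1000, 0.05
t = np.linspace(0.0, T, N + 1)
phi = -np.cos(np.pi * t / T)
I = om_functional(phi, T, eps)
S = fw_functional(phi, T)
mid = 0.5 * (phi[1:] + phi[:-1])
correction = 0.5 * eps * np.sum(lap_V(mid)) * (T / N)  # (eps/2) int Delta V dt
print(I, S - correction)  # equal up to floating-point rounding
```

Letting `eps` shrink with `T` fixed recovers $S_T$, in line with the fixed-$T$ limit $I_\varepsilon \to S_T$.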
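In [78] it was shown that for any $x_1, x_2 \in H^1_\pm(0,T;\mathbb{R}^d)$,
$$\lim_{\delta\to 0}\frac{\mathbb{P}(B_\delta(x_1))}{\mathbb{P}(B_\delta(x_2))} = \exp\Big(\frac{1}{\varepsilon}\big(I_\varepsilon(x_2) - I_\varepsilon(x_1)\big)\Big),$$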
where $B_r(x)$ denotes a ball in $C([0,T];\mathbb{R}^d)$ with centre $x$ and radius $r$. Hence, for any fixed $x_2$, the above ratio of small ball probabilities, as a function of $x_1$, is maximised at minimisers of $I_\varepsilon$. In this sense, minimisers of $I_\varepsilon$ are analogous to Maximum A Posteriori (MAP) estimators, which arise for the posterior distribution $\mathbb{P}$ in Bayesian inverse problems; see [63].
The Onsager-Machlup functional (5.9) differs from the Freidlin-Wentzell functional (5.6) only by the integral of the Itô correction term $\varepsilon\Delta V$. This difference arises because of the order in which the limits $\varepsilon \to 0$ and $\delta \to 0$ are taken: in Freidlin-Wentzell theory the radius $\delta$ of the ball is fixed and the limit $\varepsilon \to 0$ is studied, while in Onsager-Machlup theory $\varepsilon$ is fixed and the limit $\delta \to 0$ is studied. For fixed $T > 0$, it is clear that $I_\varepsilon(\phi) \to S_T(\phi)$ as $\varepsilon \to 0$.
Hence, for a fixed time scale $T$, the Onsager-Machlup theory agrees with the Freidlin-Wentzell theory in the low temperature limit. However, the picture can be different for large $T$, more precisely when $T \to \infty$ as $\varepsilon \to 0$. In fact, as demonstrated in [188], when $T \gg 1$ the MAP transition path may spend a vast amount of time at a saddle point of $V$ rather than at minima; moreover, of two paths with the same energy barrier, the one passing through steeper confining walls is always preferred, since a larger value of $\Delta V$ gives rise to a lower value of $I_\varepsilon$. The discussion about the order of limits gives a clue as to why this apparent contradiction occurs: by studying the limit $\delta \to 0$ in Onsager-Machlup theory, for fixed temperature $\varepsilon$, we remove entropic effects.
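A one-line computation shows why a larger $\Delta V$ is favoured. Consider, purely for illustration, a constant path $\phi(t) \equiv x^*$ at a critical point $x^*$ of $V$. Such a path satisfies (5.2) only if $x_- = x_+ = x^*$, but a transition path that lingers near $x^*$ for most of $[0,T]$ picks up approximately the same contribution. The Hessian eigenvalues in the second part are hypothetical.

```latex
% Constant path at a critical point: \phi' = 0 and \nabla V(x^*) = 0, so by (5.8)-(5.9)
I_\varepsilon(\phi) = \frac12\int_0^T \Psi_\varepsilon(x^*)\,dt
                    = -\frac{\varepsilon T}{2}\,\Delta V(x^*).
% Hypothetical example in R^2: a minimum with Hessian eigenvalues (1, 1), so \Delta V = 2,
% and a saddle with Hessian eigenvalues (-1, 10), so \Delta V = 9. Then
I_\varepsilon(\phi_{\mathrm{saddle}}) - I_\varepsilon(\phi_{\mathrm{min}})
  = -\frac{\varepsilon T}{2}\,(9 - 2) = -\frac{7\varepsilon T}{2} < 0.
```

The gap grows linearly in $T$, so for $T \gg 1$ the Onsager-Machlup functional rewards lingering at the steep saddle. By contrast, the Freidlin-Wentzell functional (5.6) assigns the value $0$ to both constant paths.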
Both minimising the Onsager-Machlup functional (5.9) and finding MAP estimators are attempts to capture key properties of the distribution $\mathbb{P}$ by identifying a single most likely path. This can be viewed as approximating the measure $\mathbb{P}$ by a Dirac measure at a well-chosen point. The key idea in this chapter is to find better approximations to $\mathbb{P}$ by working in a larger class of measures than Dirac measures. We will study the best Gaussian approximations with respect to the Kullback-Leibler divergence. The mean of an optimal Gaussian should capture the concentration of the target measure, while its fluctuation characteristics are described by the covariance of the Gaussian. Furthermore, the fluctuations can capture entropic effects. Thus, by using the Gaussian approximation, we aim to overcome the shortcomings of the Onsager-Machlup approach. The idea of finding Gaussian approximations for non-Gaussian measures by means of the Kullback-Leibler divergence is not new. For example, in the machine learning community [194], Gaussian processes have been widely used together with Bayesian inference for regression and prediction. Similar ideas have also been used to study models in ocean-atmosphere science [161] and computational quantum mechanics [14]. Recently, the problem of minimising the Kullback-Leibler divergence between non-Gaussian measures and certain Gaussian classes was studied from the calculus of variations point of view [187], and numerical algorithms for Kullback-Leibler minimisation were discussed in [190].
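The idea of a Kullback-Leibler-optimal Gaussian can be tried out in one dimension before moving to path space. The sketch below is a finite-dimensional toy analogue, not the method of this chapter. The target is assumed to be $\mu_\varepsilon(dx) \propto \exp(-V(x)/\varepsilon)\,dx$ on $\mathbb{R}$, and we minimise $D_{\mathrm{KL}}(\nu\,\|\,\mu_\varepsilon)$ (Gaussian first, target second) over $\nu = N(m, s^2)$. Up to the constant $\log Z$, this equals $\mathbb{E}[V(m + s\xi)]/\varepsilon - \log s$ plus a constant, where $\xi \sim N(0,1)$. The expectation uses Gauss-Hermite quadrature, the optimiser is Nelder-Mead, and the helper names are hypothetical.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss
from scipy.optimize import minimize

# Probabilists' Gauss-Hermite rule: nodes/weights for the weight exp(-x^2 / 2).
NODES, WEIGHTS = hermegauss(40)
WEIGHTS = WEIGHTS / np.sqrt(2.0 * np.pi)  # now sum(WEIGHTS * f(NODES)) ~ E[f(xi)], xi ~ N(0, 1)

def kl_objective(params, V, eps):
    """D_KL(N(m, s^2) || mu_eps) up to an additive constant, with mu_eps ~ exp(-V / eps)."""
    m, log_s = params
    s = np.exp(log_s)
    expected_V = np.sum(WEIGHTS * V(m + s * NODES))
    # E_nu[V] / eps minus the Gaussian entropy (log s + const); constants are dropped.
    return expected_V / eps - log_s

def best_gaussian(V, eps, m0, s0):
    """Minimise over (m, log s) with Nelder-Mead; returns (m, s)."""
    res = minimize(kl_objective, x0=np.array([m0, np.log(s0)]), args=(V, eps),
                   method="Nelder-Mead",
                   options={"xatol": 1e-10, "fatol": 1e-12, "maxiter": 5000})
    m, log_s = res.x
    return m, np.exp(log_s)

# Quadratic check: V(x) = (x - 0.5)^2, i.e. a = 2, c = 0.5, so m = 0.5 and s^2 = eps / 2 exactly.
m_q, s_q = best_gaussian(lambda x: (x - 0.5)**2, eps=0.1, m0=0.0, s0=1.0)

# Double well V(x) = (x^2 - 1)^2 / 4, started in the right-hand well.
m_dw, s_dw = best_gaussian(lambda x: 0.25 * (x**2 - 1)**2, eps=0.01, m0=0.8, s0=0.1)
print(m_q, s_q**2, m_dw, s_dw**2)
```

For a quadratic $V(x) = a(x-c)^2/2$ the target is itself Gaussian, and the optimum is exactly $m = c$, $s^2 = \varepsilon/a$. For the double well at small $\varepsilon$, the optimal mean sits near the minimum $x = 1$ and the optimal precision $1/s^2$ is close to $V''(m)/\varepsilon$. In this toy setting, then, the fluctuations are governed by the local curvature of $V$.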
The present chapter builds on the theory developed in [187] and extends it to transition path theory. More specifically, the set of Gaussian measures used for the approximations is parameterised by a pair of functions $(m, A)$, where $m$ represents the mean and $A$ (defined in (5.18)) is used to define the covariance operator of the underlying Gaussian measure. For a fixed temperature $\varepsilon$, the Kullback-Leibler divergence is expressed as a functional $F_\varepsilon$ depending on $(m, A)$, and the existence of minimisers is shown in this framework. Then the asymptotic behaviour of the best Gaussian approximations in the low temperature limit is studied in terms of the $\Gamma$-convergence of the functionals $\{F_\varepsilon\}$. The limiting functional (see (5.57)) is identified as the sum of two parts. The first part, depending only on $m$, is identical to the $\Gamma$-limit of the rescaled Freidlin-Wentzell action functional. This implies that, as $\varepsilon \to 0$, the most likely transition paths, defined as the means $m$ of the best Gaussian approximations, coincide with large deviation paths. The second part takes entropic effects into account and expresses the penalty for the fluctuations in terms of $A$. It vanishes if $A = D^2V(m(t))$, but this choice of $A$ is only admissible if the Hessian $D^2V(m)$ is positive definite. A strictly positive penalty occurs when $D^2V(m(t))$ has a negative eigenvalue. Therefore, minimising the limiting functional amounts to selecting those optimal paths $m$ among the large deviation paths that do not spend time in saddles or local maximisers. We stress that although at finite noise intensity $\varepsilon > 0$ there is no explicit characterisation of our most likely transition paths, it is possible to determine them approximately by numerical methods, as demonstrated in [190]; see also Section 5.5.
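The admissibility condition above is easy to check along a discretised candidate path, since it only requires the smallest eigenvalue of $D^2V(m(t))$ at each time. The sketch below reports the fraction of grid points where that eigenvalue is negative, that is, where the choice $A = D^2V(m(t))$ would not be admissible. It is a diagnostic under stated assumptions, not the limiting functional or its penalty term. The two-dimensional potential $V(x,y) = (x^2-1)^2/4 + y^2/2$, with Hessian $\mathrm{diag}(3x^2 - 1, 1)$, the path, and the function names are all hypothetical.

```python
import numpy as np

def hessian_V(z):
    # Hypothetical 2D potential V(x, y) = (x^2 - 1)^2 / 4 + y^2 / 2
    # whose Hessian is diag(3x^2 - 1, 1)
    x, _ = z
    return np.array([[3 * x**2 - 1, 0.0], [0.0, 1.0]])

def fraction_indefinite(path):
    """Fraction of grid points at which D^2 V(m(t)) has a negative eigenvalue."""
    # eigvalsh returns eigenvalues of a symmetric matrix in ascending order
    smallest = np.array([np.linalg.eigvalsh(hessian_V(z))[0] for z in path])
    return np.mean(smallest < 0)

T, N = 1.0, 10_000
t = np.linspace(0.0, T, N + 1)
path = np.stack([-np.cos(np.pi * t / T), np.zeros_like(t)], axis=1)
print(fraction_indefinite(path))  # about 0.39 for this path
```

For this path the $x$-component lies in the region $|x| < 1/\sqrt{3}$, where $\partial_x^2 V = 3x^2 - 1 < 0$, for a fraction $1 - \frac{2}{\pi}\arccos(1/\sqrt{3}) \approx 0.39$ of the time. A path that stays at the minima $(\pm 1, 0)$ returns $0$.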
This chapter is organised as follows. In the next section we introduce a time rescaling of the governing Langevin equation, in terms of $\varepsilon$, in which the undesirable effects of the Onsager-Machlup minimisation become manifest. We also introduce some notation used throughout this chapter and discuss assumptions on the potential $V$. In Section 5.3, we define the subset of Gaussian measures over which Kullback-Leibler minimisation is conducted; the existence of minimisers for the variational problem is established at the end of that section. Then, in Section 5.4, we study the low temperature limit of the Gaussian approximation using $\Gamma$-convergence. The main $\Gamma$-convergence result is given in Theorem 5.4.5. Section 5.5 discusses some important consequences of the $\Gamma$-convergence result, with emphasis on the link with the theories of Freidlin-Wentzell and Onsager-Machlup. The proofs of Theorem 5.4.5 and some related results are presented in Section 5.6.