Characterisation of gradient flows on finite state Markov chains

(1)

ISSN:1083-589X _{in PROBABILITY}

Characterisation of gradient flows

on finite state Markov chains

*

Helge Dietert

†

Abstract

In his 2011 work, Maas has shown that the law of any time-reversible continuous-time Markov chain with finite state space evolves like a gradient flow of the relative entropy with respect to its stationary distribution. In this work we show the converse to the above by showing that if the relative law of a Markov chain with finite state space evolves like a gradient flow of the relative entropy functional, it must be time-reversible. When we allow general functionals in place of the relative entropy, we show that the law of a Markov chain evolves as gradient flow if and only if the generator of the Markov chain is real diagonalisable. Finally, we discuss what aspects of the functional are uniquely determined by the Markov chain.

Keywords:Gradient flows; Finite state Markov chains; Time-reversibility. AMS MSC 2010:60J27.

Submitted to ECP on May 12, 2014, final version accepted on March 1, 2015. SupersedesarXiv:1405.2552v2.

1 Introduction

The seminal paper of Jordan, Kinderlehrer, and Otto [3] identified Markov processes in the continuous setting as gradient flows of the entropy using the Wasserstein distance. This understanding lead to many new results (see Villani [6] for an overview). More recently, Maas [4] considered Markov chains with finite state space and showed that, in this case, the Wasserstein distance does not allow this identification. Instead, assuming a time-reversible Markov chain, he was able to construct a different metric that allows this identification. Different constructions have been given in [1, 5] and the setting used is also described in [2].

The construction of the metric is involved and uses time-reversibility at several places. This further motivates our study of the converse of these statements. For this we will first introduce the setting used and define the gradient flow.

We consider continuous-time irreducible Markov chain with finite state spaceX = {0,1, . . . , d}. We denote its generator byQ∈_RX ×X _{where for}_i₆₌_j_{the entry}_Q

ij is the transition rate from stateito statejandQii=−Pj6=iQij.

Given the initial probability distribution µ of the Markov chain, the probability distribution after timetwill be given byµetQ_{. Note that the transition matrix}_etQ_acts on the right on the row vectorµand the evolution of the law is captured by the Markov

*_{Support: UK Engineering and Physical Sciences Research Council (EPSRC) grant EP/H023348/1 for the} University of Cambridge Centre for Doctoral Training, the Cambridge Centre for Analysis.

(2)

semigroupetQ_{. Since the Markov chain is irreducible, there exists a unique stationary} distributionπto whichµetQ_{will converge as}_t_{→ ∞}_.

For the definition of a gradient flow, letP be the space of probability distributions on X with positive mass for any state. Then P can be naturally understood as the

d-dimensional sub-manifold{v∈RX :P

i∈Xvi = 1andvi>0∀i∈ X }ofRX. Under this identification, the tangent space at any point is

T ={v∈_RX _:X

i∈X

vi= 0}.

Given a functionalF :P 7→ _Rand a Riemannian metricg onP, the gradient flow

ρ:_R+_{7→ P} _{is determined by the differential equation}

g|ρ(t)( ˙ρ(t), v) =−dF|ρ(t)(v) ∀v∈T, t∈R+.

That isρ˙ equals−dFunder the identification of the tangent space and the cotangent space through the metricg.

We say that the gradient flow undergofFequals the flow associated to the semigroup

etQ_{if for all}_µ_{∈ P}_{the trajectory}_t_7→_µetQ_{equals the gradient flow}_ρ₍_t₎_with_ρ_{(0) =}_µ_. From the continuous setting a natural functional is the relative entropyHwith respect to the stationary stateπ, which is defined by

H(ρ) =−X i∈X ρilog πi ρi .

Throughout this work the relative entropy is understood with respect to the stationary distribution of the considered Markov chain.

The result by Maas [4] now is: There exists a metricgonP such that the gradient flow undergof the relative entropyHequals the flow associated to the semigroupetQ.

First we show that the metric is not unique:

Theorem 1.1.Ford≥2, consider a continuous-time irreducible Markov chain with finite state spaceX ={0,1, . . . , d}, generatorQ, and stationary distributionπ. LetF:P 7→_R be a differentiable functional andga Riemannian metric onP. If the flow associated to the semigroupetQequals the gradient flow ofFunder the metricg, then forρ∈ Pwith

ρ6=π, there exists another metricg˜onP such thatg˜6=gatρand such that the gradient flow ofFunder the metricg˜still equals the flow associated to the semigroupetQ_.

As converse of the construction we show:

Theorem 1.2.Consider a continuous-time irreducible Markov chain with finite state spaceX ={0,1, . . . , d}, generatorQ, and stationary distributionπ. If the flow associated to the semigroupetQ _{equals the gradient flow of} _F _∈_C2 _{under a Riemannian metric}

g∈C1onP, then, withg|πas metric atπ, (a) g|πis uniquely determined byQandF, (b) Qcan be computed fromg|πandF, (c) Qis real diagonalisable,

(d) Qis time-reversible ifFis the relative entropyH.

Conversely, ifQis real diagonalisable, then there exists a Riemannian metricgand a smooth functionalFonP such that the gradient flow ofFunder the metricgequals the flow associated to the semigroupetQ.

(3)

This result shows that the assumption of time-reversibility in the construction of the metric by Maas is necessary and cannot be relaxed. Moreover, the results for the generatorQcome from the differentiability around the equilibrium distributionπ, so that the theorem holds as long as there is a neighbourhood ofπin which the Markov semigroupetQequals the gradient flow.

Remark 1.3.Theorems 1.1 and 1.2 are obtained by analysing the Riemannian structure of the gradient flow and thus can be formulated for general gradient flows on finite-dimensional manifolds. For this, part (d) of Theorem 1.2 can be formulated with a weighted `2-norm, i.e. Qis symmetric with respect to this norm, if Fis the squared distance toπunder this`2_{-norm. In fact, relating the results to}_Q_{acting on the bigger} space_RX makes the proofs slightly longer.

Combining this theorem with Maas’ result gives our main theorem.

Theorem 1.4 (Characterisation of Markov chains).Consider a continuous-time irre-ducible Markov chain with finite state space and generatorQ.

• The Markov chain is time-reversible if and only if there exists a metricgsuch that the flow associated to the semigroupetQequals the gradient flow underg∈C1of the relative entropy with respect to the stationary distribution.

• The Markov chain has a real diagonalisable generatorQif and only if there exists a metric g ∈ C1 _{and a functional} _F _∈ _C2 _{such that the flow associated to the} semigroupetQ_{is the gradient flow of}_F_under_g_.

The characterisation of real diagonalisable generatorsQ shows for example that Markov chains that also have an oscillatory behaviour cannot be described by gradient flows.

Finally, we remark that the relative entropy Hdepends on the generator Qonly through the equilibrium distribution π. Moreover, any functional F that allows to construct a gradient flow for all time-reversible Markov chains must have a similar Taylor expansion aroundπ. More precisely:

Theorem 1.5.Fix a finite state spaceX, a distributionπ∈ Pand a functionalF:P 7→_R in C2_{. Suppose that, for every generator} _Q _{defining an irreducible time-reversible} Markov chain with state spaceXand stationary distributionπ, there exists a Riemannian metricg∈C1onP such that the gradient flow ofFunder the metricgequals the flow associated to the semigroupetQ. Then there exists a positive constantαsuch that

dF|π= dH|π= 0andd2F|π=αd2H|π, whereHis the relative entropy with respect toπ.

Here we use the notationdF|πto denote the first derivative ofFat the pointπ, which we understand as linear map fromT to_R. Withd2_F|

πwe denote the second derivative at the pointπ, which is a linear map fromT×T to_R.

The assumption on the functionalFis not empty, because Maas’ result states that the relative entropyHwith respect toπis a functional satisfying the assumption. Moreover, it cannot be strengthened to uniqueness. For this, another functional is the quadratic formFdefined byF|π = dF|π = 0andd2F|π = αd2H|π for someα > 0. This satisfies the assumption, because for any generatorQthe constant metricggiven by the value

g|π in part (a) of Theorem 1.2 indeed defines a Riemannian metric with the required identification.

Remark 1.6.In [4], Maas considered continuous-time Markov chains obtained from an irreducible discrete-time Markov chain with transition matrixKby choosing the jump times according to a Poisson process. The resulting generator isQ=K−I and, by

(4)

analogy with continuous-time diffusion processes, the semigroupetQ_{is also called a heat} flow.

By time-rescaling, this is no restriction to the class of Markov chains for the study of gradient flows because, for any generator Q, we can find some α > 0 such that

K=I+αQis non-negative along the diagonal andKdefines a transition matrix. Now given a functional F, if we can find a metric g such that the flow associated to the semigroupet(αQ)equals the gradient flow ofFunderg, then the flow associated to the semigroupetQequals the gradient flow ofFunder the rescaled metricg/α.

2 Characterisation of Markov chains

Recall that for an irreducible Markov chain the evolution µetQ _{converges to the} unique stationary distributionπfor anyµ∈ P. HenceµQvanishes if and only ifµ=π. With this observation, we can construct a perturbation of the metric in order to prove Theorem 1.1.

Proof of Theorem 1.1. Let e1 be the vector field given by µQ atµ ∈ P. For a small enough neighbourhoodV ⊂ Pofρ, we can find smooth vector fieldse2, . . . , edsuch that

e1, . . . , edis a basis ofT at everyµ∈V. Letηbe a smooth cutoff functional with compact support, vanishing outsideV, and satisfyingη(ρ)6= 0. Then define another metric˜gby

˜

g|µ(ei, ej) =

(

g(ei, ej)|µ+η(µ)a ifi=j= 2,

g(ei, ej)|µ otherwise, forµ∈V and a constanta∈R. Outside ofV, defineg˜=g.

Ifais small enough,g˜is still positive definite and is therefore a metric. Moreover,g˜

creates the same gradient flow, because for everyµ∈ Pand anyv∈T

˜

g|µ(µQ, v) = ˜g|µ(e1, v) =g|µ(e1, v) =g|µ(µQ, v).

For the characterisation at the equilibrium distribution, we use the assumed differen-tiability ofgandF.

Proof of Theorem 1.2. The equality of the flow associated to the semigroupetQ_{and the} gradient flow implies that the time derivatives of both evolutions agree at every state

π+hwithh∈T. This means that, for allv∈T,

g|π+h((π+h)Q, v) =−dF|π+h(v). (2.1) SinceπQ= 0, this implies thatdFmust vanish atπ. Moreover, it simplifies Equation (2.1) to

g|π+h(hQ, v) =−dF|π+h(v). LetM = d2_F|

π, then the Taylor expansion aroundh= 0shows by the assumed regularity ofgandFthat

g|π(hQ, v) +O(khk2) =−M(h, v) +O(khk2). As this holds for arbitraryh∈T, the linear terms must agree. Hence,

g|π(wQ, v) =−M(w, v) ∀v, w∈T. (2.2) Furthermore, we claim that the restriction ofQtoT defines an automorphism on

T. SinceQpreserves the probability mass (i.e. P

i∈XQji = 0forj ∈ X), its range is insideT. IfQwas not an automorphism, av ∈T satisfyingvQ= 0would exist by the

(5)

Rank-Nullity Theorem. But then, for small enoughα, alsoπ+αvwould be a stationary state, contradicting the irreducibility of the Markov chain.

Hence Equation (2.2) determines the value ofg|π for all arguments, which proves part (a) of the theorem.

Giveng|π andM, Equation (2.2) determinesvQ·wfor allv, w∈T, becauseg|πis a positive form. By the mass conservationvQ∈T, so that this determinesvQfor allv∈T. SinceπQ= 0, this determinesQand shows part (b).

In order to prove part (c) letf1, . . . , fd be a basis ofT which is orthonormal under

g|π, i.e. g|π(fi, fj) =δij. LetQ¯ be the matrix corresponding to the generatorQin this basis, i.e.fiQ=P

d

j=1Q¯ijfjfori= 1, . . . , d. Also letM¯ be the matrix corresponding to

M in this basis, i.e.M¯ij =M(fi, fj). Then Equation (2.2) becomes

xQ¯·y=−xM¯ ·y ∀x, y∈Rd.

HenceQ¯=−M¯. Since the partial derivatives ofFcommute,M¯ is symmetric. Therefore,

¯

Qis real diagonalisable andQ¯hasdreal eigenvectors inT. Sinceπis another eigenvector ofQnot inT, this implies thatQis real diagonalisable, which is the statement of part (c).

For the converse of the theorem, we assume thatQis diagonalisable and we need to construct a suitable functional and metric onP. For this, fix eigenvectorsπ, f1, . . . , fd ofQwith eigenvalues0, λ1, . . . , λd. Sinceπis the only stationary distribution, λi 6= 0 fori= 1, . . . , d. AsQmaps intoT, this shows thatf1, . . . , fdis a basis ofT. Define the constant Riemannian metricgonP by

g(fi, fj) =δij,

and the functionalF:P 7→_Rby

F(π+ d X i=1 aifi) = 1 2 d X i=1 (−λi)a2i.

Then at any stateµ=π+Pd

i=1aifi∈ Pwe have, forj = 1, . . . , d,

g(µQ, fj) =λjajand −dF|µ(fj) =λjaj.

Hence the flow associated to the semigroupetQ_{and the gradient flow agree, because} their time derivatives agree for every probability distributionµ∈ P.

For the remaining part (d), the functionalFis the relative entropyH. The second derivatived2H|π ofHatπis given by

M(w, v) = X

α∈X

wαvα

πα

forv, w∈T. LetΠ∈_RX ×X _{be the diagonal matrix with diagonal entries}₍_π₎

i∈X. Then,

by the calculated form ofM, we havevΠ−1·w=M(v, w)for allv, w∈T.

OverT, the metricghas an inversebatπwhich is defined byg|π(v, wb) =v·wfor

v, w∈T and is a positive definite symmetric automorphism onT. Define the symmetric matrixa∈_RX ×X _by_va₌_vb_for_v_∈_T _and₁_a_{= 0}_{, where}₁_{is the vector with all entries} 1.

Sincebis an automorphism onT, Equation (2.2) impliesg(vQ, ua) =−vΠ−1_·₍_ua₎_for allu, v∈T. By the construction ofa, this shows that for allu, v∈T

(6)

SinceπQandπΠ−1_a_{both vanish, Equation (2.3) also holds for}_v₌_π_{. Hence it holds} for allv∈_RX _{which shows}_Q_·_u₌₋_Π−1_a_·_u_{for all}_u_∈_T_.

By the conservation of probabilityQ·1= 0and by the symmetry ofaalsoa·1= 0, so thatQ·u=−Π−1_a_·_u_{holds for all}_u_∈

RX. This shows

ΠQ=−a.

Sinceais symmetric, this shows thatQsatisfies the detailed balance equation, i.e. that the Markov chain is time-reversible.

The remaining Theorem 1.5 is reduced to the following lemma.

Lemma 2.1.Assume the hypothesis of Theorem 1.5. ThenFandHhave a minimum atπ, and the derivativesM = d2F|π andN = d2H|π are positive non-degenerate forms onT. For everyv ∈ T vanishing in exactly one state, there existsαv ∈R+ such that

M(v,·) =αvN(v,·).

With this lemma the theorem can be proved.

Proof of Theorem 1.5. By the previous lemma F andHhave a minimum at π so that

dFπ = dHπ = 0.

Ifd= 1, thenM andN from the lemma correspond to positive scalars so that there exists a positive scalarαsatisfying the required relationM =αN.

Hence, assumed≥2. For two statesv,˜v∈T vanishing only in one common state, the constantsαv andαv˜of the lemma must agree. Ifv and˜v are linearly dependent, then this follows directly from the bilinearity ofM andN. Otherwise, for small enough

λ∈R, alsov+λv˜is inT and vanishing in exactly one state. Hence the lemma applies to this state and by linearity follows

αv+λ˜vN(v+λv,˜ ·) =αvN(v,·) +λαv˜N(˜v,·).

SinceN is non-degenerate, the functionalsN(v,·)andN(˜v,·)are linearly independent, so that the equation impliesαv+λ˜v=αv andαv+λ˜v =αv˜and thus, as claimed,αv=α˜v.

Hence, fori∈ X, there existsα¯i such thatM(v,·) = ¯αiN(v,·)holds forv ∈T with

vi= 0andvj 6= 0forj 6=i. By continuity ofM(v,·)andα¯iN(v,·)with respect tov, the result also holds for allv∈T withvi = 0.

Ifd≥3then fori, j∈ X there existsv∈T withvi=vj= 0which implies that allα¯i fori∈ X agree. Ifd= 2, we can considerv= (1−1 0)andv˜= (0 1−1)withαv = ¯α2 andα˜v= ¯α0. Thenv+ ˜vsatisfies the lemma as well so that as beforeα¯0= ¯α2. Likewise,

¯

α0= ¯α1and allα¯iagree.

The common valueαof theα¯ifori∈ X is the claimed constant satisfyingd2F|π=

αd2_H|_{π. Since the derivatives}_d2_F|

π andd2H|πare positive forms, the constantαmust be positive.

The remaining lemma is proved by considering suitable Markov chains.

Proof of Theorem 2.1. Since π has positive mass for every state i ∈ X, there exists a time-reversible irreducible Markov chain with stationary stateπ. Consider such a Markov chain with generatorQ and letg be a metric such that the flow associated to the Markov semigroupetQ _{is the gradient flow of} _F _under_g_{. Then for any initial} probability stateµ∈ Pthe valueF(µetQ₎_{is decaying as}_t_{increases. Hence}_F_{must have} a minimum atπ. As in the proof of Theorem 1.4, Equation (2.2) must hold andg|πandQ are non-degenerated overT. HenceM = d2_F|

π must be non-degenerated and positive, becauseFhas a minimum atπ. SinceHsatisfies the assumptions imposed onF, this also holds forH.

(7)

Ifd= 1, there exists no such statev∈T. Thus assumed≥2henceforth. Moreover, label the states such thatvd = 0 and identifyM, N and g|π with the corresponding matrix, i.e. M(v, w) = vM ·w, and N(v, w) = vN·w, andg|π(v, w) = vg|π ·w for all

v, w∈T.

Then, Equation (2.2) impliesM−1_Q₌₋_g_|−1

π . HenceM−1Qis symmetric and thus

M−1QM =QT. LikewiseN−1QN =QT and thusM N−1QN M−1=Q. Therefore, ifvis an eigenvector ofQ, thenvM N−1_{is again an eigenvector of}_Q_{with the same eigenvalue.}

We finish the proof of the lemma by showing thatvandw:=vM N−1_{are proportional.} For this, we showwd = 0andwivj =viwj for alli, j = 0,1, . . . , d−1by constructing suitable Markov chains whose generatorQhas the eigenvectorv.

For the first part, letλ >0and consider the time-reversible irreducible Markov chain with stationary stateπand generator matrix

Q= Π−1          −π0λ 0 0 . . . 0 π0λ 0 −π1λ 0 . . . 0 π1λ 0 0 −π2λ . . . 0 π2λ .. . ... ... . .. ... ... 0 0 0 . . . −πd−1λ πd−1λ π0λ π1λ π3λ . . . πd−1λ −(1−πd)λ          .

The eigenspace for−λis{u∈T :ud= 0}and containsv. Hence,wd= 0.

By further relabelling the states, it suffices to show for the second casew0v1=v0w1. For this, letλ >0and consider the Markov chain with generator

Q= Π−1          −π0λ−β0 µ 0 . . . 0 π0λ+β0−µ µ −π1λ−β1 0 . . . 0 π1λ+β1−µ 0 0 −π2λ . . . 0 π2λ .. . ... ... . .. ... ... 0 0 0 . . . −πd−1λ πd−1λ π0λ+β0−µ π1λ+β1−µ π2λ . . . πd−1λ γ          where β0=µ π0v1 v0π1 , β1=µ π1v0 v1π0 , γ=−(1−πd)λ−µ π0v1 v0π1 −µπ1v0 v1π0 + 2µ.

By choosing µ >0 small enough, this defines an irreducible time-reversible Markov chain with stationary stateπand vQ=−λv. Hence,wis again an eigenvector with eigenvalue−λ. From the first component, we find the required ratio

−λw0=−λw0−µ v1w0 v0π1 +µw1 π1 ⇒v1w0=v0w1.

References

[1] Shui-Nee Chow, Wen Huang, Yao Li, and Haomin Zhou,Fokker-Planck equations for a free energy functional or Markov process on a graph, Arch. Ration. Mech. Anal.203(2012), no. 3, 969–1008. MR-2928139

[2] Matthias Erbar and Jan Maas,Gradient flow structures for discrete porous medium equations, Discrete Contin. Dyn. Syst.34(2014), no. 4, 1355–1374. MR-3117845

[3] Richard Jordan, David Kinderlehrer, and Felix Otto,The variational formulation of the Fokker-Planck equation, SIAM J. Math. Anal.29(1998), no. 1, 1–17. MR-1617171

[4] Jan Maas,Gradient flows of the entropy for finite Markov chains, J. Funct. Anal.261(2011), no. 8, 2250–2292. MR-2824578

(8)

[5] Alexander Mielke,A gradient structure for reaction-diffusion systems and for energy-drift-diffusion systems, Nonlinearity24(2011), no. 4, 1329–1346. MR-2776123

[6] Cédric Villani,Optimal transport, Grundlehren der Mathematischen Wissenschaften [Funda-mental Principles of Mathematical Sciences], vol. 338, Springer-Verlag, Berlin, 2009, Old and new. MR-2459454

Acknowledgments. The author would like to thank James Norris for many helpful