A Large Deviation Principle and an Expression of the Rate Function for a Discrete Stationary Gaussian Process

(1)

HAL Id: hal-01096758

https://hal.inria.fr/hal-01096758

Submitted on 18 Dec 2014

HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci-entific research documents, whether they are pub-lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.

Rate Function for a Discrete Stationary Gaussian

Process

Olivier Faugeras, James Maclaurin

To cite this version:

Olivier Faugeras, James Maclaurin. A Large Deviation Principle and an Expression of the Rate Func-tion for a Discrete StaFunc-tionary Gaussian Process. Entropy, MDPI, 2014, pp.21. �10.3390/e16126722�. �hal-01096758�

(2)

entropy

ISSN 1099-4300 www.mdpi.com/journal/entropy

Article

A Large Deviation Principle and an Expression of the

Rate Function for a Discrete Stationary Gaussian

Process

Olivier Faugeras1,*, James MacLaurin1

1 _{INRIA Sophia Antipolis Mediterannee, 2004 Route Des Lucioles, Sophia Antipolis,}

France

*Author to whom correspondence should be addressed; [email protected]

Received: xx / Accepted: xx / Published: xx

Abstract: We prove a Large Deviation Principle for a stationary Gaussian process over ❘b, indexed by ❩d (for some positive integers d and b), with positive definite spectral density and provide an expression of the corresponding rate function in terms of the mean of the process and its spectral density. This result is useful in applications where such an expression is needed.

Keywords: Stationary Gaussian process; Large Deviation Principle; Spectral Density

(3)

1. Introduction

In this paper we prove a Large Deviation Principle (LDP) for a spatially stationary, indexed by❩d, Gaussian Process over ❘b, and obtain an expression for the rate function. Our work in mathematical neuroscience involves the search for asymptotic descriptions of large ensembles of neurons [1,11,14]. Since there are many sources of noise in the brains of mammals [15], the mathematician interested in modelling certain aspects of brain functioning is often led to consider spatial Gaussian processes that (s)he uses to model these noise sources. This motivates us to use Large Deviations techniques. Being also interested in formulating predictions that can be experimentally tested in neuroscience laboratories, we strive to obtain analytical results i.e. effective results from which for example numerical simulations can be developed. This is why we determine a more tractable expression for the rate function in this article.

Our result concerns the Large Deviations of ergodic phenomena, the literature of which we now briefly survey. Donsker and Varadhan obtained a Large Deviations estimate for the law governing the empirical process generated by a Markov Process [9]. They then determined a Large Deviations Principle for a ❩-indexed Stationary Gaussian Process, obtaining a particularly elegant expression for the rate function using spectral theory. [4,5,7] obtain a Large Deviations estimate for the empirical measure generated by ergodic processes satisfying certain mixing conditions. [2] obtain a variety of results for the Large Deviations of ergodic phenomena, including one for the Large Deviations of ❩-indexed

❘b_{-valued stationary Gaussian processes. [}₁₇_{] have proved an LDP for a stationary}

❩d_{-indexed Gaussian Process over}_❘_{, and [}₃_{] obtain an LDP for an}_❘_-indexed,_❘_-valued

stationary Gaussian process.

In the work we are developing [12], we need Large Deviation results for spatially ergodic Ornstein-Uhlenbeck processes. This requires Theorem1of this paper.

In the first section we make some preliminary definitions and state the chief theorem, Theorem 1, for zero mean processes. In the second section we prove the theorem. In appendixA, we state and prove several identities involving the relative entropy which are necessary for the proof of Theorem 1. In appendix B we prove a general result for the Large Deviations of exponential approximations. We prove Corollary2which extends the result in Theorem1to non-zero mean processes in appendixC.

(4)

2. Preliminary Definitions For some topological space Ω equipped with its Borelian

σ-algebraB(Ω), we denote the set of all probability measures byM(Ω). We equipM(Ω) with the topology of weak convergence. Our process is indexed by❩d. Forj ∈ ❩d_{, we}

write j = (j(1), . . . , j(d)). For some positive integer n > 0, we let Vn = {j ∈ ❩d :

|j(δ)| ≤ nfor all1 ≤ δ ≤ d}. Let T = ❘b_{, for some positive integer}_b_{. We equip} _T

with the Euclidean topology,T❩d with the cylindrical topology, and denote the Borelian

σ-algebra generated by this topology by B(T❩d). For some µ ∈ M(T❩d) governing a process(Xj₎

j∈❩d, we letµVn ∈ M(TVn) denote the marginal governing (Xj)_j_∈_V_n. For somej ∈ ❩d_{, let the shift operator}_Sj _:_T❩d

→ T❩d beS(ω)k ₌_ωj+k_{. We let}_M s(T❩

d ) be the set of all stationary probability measures µ on (T❩d,B(T❩d)) such that for all

j ∈❩d_,_µ_◦₍_Sj₎−1 ₌_µ_{. We use Herglotz’ Theorem to characterise the law}_Q_{∈ M}

s(T❩

d ) governing a stationary process(Xj₎_{in the following manner. We assume that}_E_[_Xj_{] = 0}

and E[X0†Xk] = 1 (2π)d Z [−π,π]d exp(ihk, θi) ˜G(θ)dθ. (1)

Our convention throughout this paper is to denote the transpose ofXas†Xand the spectral density with a tilde. h·,·iis the standard inner product on❘d. HereG˜(θ)is a continuous function[−π, π]d_→_Cb×b_{, where we consider}_[−_{π, π}_]d_{to have the topology of a torus. In}

additionG˜(−θ) = †_G_˜₍_θ_{) = ¯˜}_G₍_x_¯_{indicates the complex conjugate of}_x_{). We assume that} for allθ ∈ [−π, π]d_,_{det ˜}_G₍_θ₎_>_G_˜

min for someG˜min > 0, from which it follows that for

eachθ,G˜(θ)is Hermitian positive definite. Ifx∈Cb_{, then}†_xXj _{is a stationary sequence}

with spectral density†xG˜x¯. We employ the operator norm overCb×b_{. Let}_p_n_:_T❩d _{→ T}❩d

be such thatpn(ω)k =ωk modVn. Here, and throughout the paper, we takek mod Vnto

be the elementl ∈Vnsuch that for all1≤γ ≤d,l(γ) = k(γ) mod (2n+ 1). Define the

process-level empirical measureµˆn :T❩

d → Ms T❩das ˆ µn(ω) = 1 (2n+ 1)d X k∈Vn δSk_p n(ω). (2) LetΠn_{∈ M}_M s(T❩ d

)be the image law ofQunderµˆn. We note that in this definition

we need not have chosenVnto have an odd side length(2n+1): this choice is for notational

convenience, and these results could easily be reproduced in the case that Vn has side

(5)

In the context of mathematical neuroscience, (Xj₎ _{could correspond to a model of}

interacting neurons on a lattice (d = 1,2 or 3), as in [12,13]. We note that the Large Deviation Principle of this paper may be used to obtain an LDP for other processes using standard methods, such as the Contraction Principle or Lemma13.

Definition 1. Let(Ω,H)be a measurable space, andµ,νprobability measures.

I(2)₍_µ_||_ν_{) = sup}

f∈B

{Eµ_[_f_]₋_logEν_[exp(_f_)]}_,

whereB is the set of all bounded measurable functions. IfΩ is Polish and H = B(Ω), then we only need to take the supremum over the set of all continuous bounded functions. Let(Yj₎_{be a stationary Gaussian Process on}_T _{such that}_E_[_Yj_{] = 0}_,_E_[_Yj†_Yk_{] = 0}

andE[Yj†_Yj_{] = Id}

b. EachYj is governed by a lawP, and we write the governing law in

Ms(T❩

d

)asP❩d. It is clear that the governing law overVnmay be written asP⊗Vn (that

is the product measure ofP, indexed overVn).

Definition 2. LetE2be the subset ofMs(T❩

d

)defined by E2 ={µ∈ Ms(T❩

d

)|Eµ_[k_ω0_k2_]_<_∞}_.

We define the process-level entropy to be, forµ∈ Ms(T❩

d ) I(3)(µ) = lim n→∞ 1 (2n+ 1)dI (2) _µVn_||_P⊗Vn_.

It is a consequence of Lemma11that ifµ /∈ E2, thenI(3)(µ) =∞. For further discussion

of this rate function, and a proof thatI(3) _{is well-defined, see [}₇_].

Definition 3. A sequence of probability laws(Γn₎_{on some topological space}_Ω_equipped

with its Borelianσ-algebra is said to satisfy a strong large deviation principle (LDP) with rate functionI : Ω→❘ifI is lower semicontinuous, for all open setsO,

lim n→∞ 1 (2n+ 1)dlog Γ n₍_O₎_{≥ −}_inf x∈OI(x)

and for all closed setsF

lim n→∞ 1 (2n+ 1)dlog Γ n₍_F₎_{≤ −}_inf x∈FI(x).

If furthermore the set{x : I(x) ≤ α}is compact for allα ≥ 0, we say that I is a good rate function.

(6)

Definition 4. For µ ∈ E2, we denote the Cb × Cb-valued spectral measure on

([−π, π]d_,_B([−_{π, π}_]d₎₎_{(which exists due to Herglotz’s Theorem) by}_µ_˜_{. We have}

1 (2π)d Z [−π,π]d exp(ihk, θi)dµ˜(θ) =Eµ_ω0†_ωk_. Forθ ∈ [−π, π]d_{, let}_H_˜₍_θ_{) = ˜}_G₍_θ₎−1

2 be the Hermitian positive definite square root of ˜ G(θ)−1_and ˜ H(θ) = X j∈❩d Hjexp (−ihj, θi). (3)

Theb×bmatricesHj_{are the coefficients of the absolutely convergent Fourier series (due}

to Wiener’s Theorem) ofG˜−1/2_{. Define}_β _:_T❩d

→ T❩das follows (β(ω))k = X

j∈❩d

†_Hj_ωk−j_.

The Theorem below is the chief result of this paper.

Theorem 1. (Πn₎_{satisfies a strong LDP with good rate function}_I(3)₍_µ_◦_β−1₎_{. Here}

I(3) µ◦β−1 =    I(3)₍_µ₎₋_Γ(_µ₎ _if_µ_{∈ E} 2 ∞ otherwise, (4) where Γ(µ) =    Γ1+ Γ2(µ) ifµ∈ E2 0 otherwise. Here Γ1 =− 1 2(2π)d Z [−π,π]d log det ˜G(θ)dθ Γ2(µ) = 1 2(2π)d Z [−π,π]d trIdb−G(˜ θ)−1 d˜µ(θ).

Finally, the rate function uniquely vanishes atµ=Q.

Corollary 2. Suppose that the underlying processQis defined as previously, except with meanEQ_[_ωj_{] =} _c_{for all}_j _∈ _❩d_{and some constant}_c _∈ _❘b_{. If we denote the image law}

(7)

of the empirical measure byΠn

c, then(Πnc)satisfies a strong LDP with good rate function (forµ∈ E2)

I(3)(µ)−Γ(µ) + 1 2

†_c_G_˜₍₀₎−1_c₋†_c_G_˜₍₀₎−1_mµ_. ₍₅₎

Heremµ ₌_Eµ(ω)_[_ωj_]_{for all}_j _∈_❩d_{. If}_{µ /}_{∈ E}

2, then the rate function is infinite. The rate

function has a unique minimum, i.e. I(3)₍_µ₎₋_Γ(_µ_{) +} 1 2

†_c_G_˜₍₀₎−1_c₋†_c_G_˜₍₀₎−1_mµ _{= 0}_if and only ifµ=Q.

We prove this in appendixC.

3. Proof of Theorem 1 In this proof we essentially adapt the methods of [2,10]. We introduce the following metric overT❩d. Forj ∈ ❩d_{, let}_λj _{= 3}−dQd

δ=12−|j(δ)|. Define

the metricdλ_{as follows,}

dλ(x, y) = X

j∈❩d

λjmin kxj−yjk,1

,

where the above is the Euclidean norm. Letdλ,M _{be the induced Prohorov metric over}

M(T❩d). These metrics are compatible with their respective topologies.

Forθ ∈[−π, π]d_{, let}_F˜₍_θ_{) = ˜}_G₍_θ₎1/2_{be the Hermitian positive square root of}_G˜_and

˜

F(θ) = X

j∈❩d

Fjexp (−ihj, θi). (6)

Theb×b matricesFj _{are the coefficients of the absolutely convergent Fourier series of}

the positive square root. Defineτ :T❩d → T❩d andτ(M):T❩

d → T❩d as follows, (τ(ω))k = X j∈❩d †_Fj_ωk−j_, τ(n)(ω) k = X j∈Vn †_Fj_ωk−j d Y δ=1 1− |j(δ)| 2n+ 1 . (7)

We note thatτ =β−1 _{(on a suitable domain where the series are convergent) and that}_τ (n)

(8)

We note (using Lemma6) thatP❩d ◦τ−1 _{has spectral density}_G_˜_{. We define} ˜ F(n)(θ) = X j∈Vn Fjexp (−ihj, θi) d Y δ=1 1− |j(δ)| 2n+ 1 . (8) We write ε(n) = sup_θ_∈_[₋_π,π_]d F˜(θ)−F˜(n)(θ) 2 . By Fejer’s Theorem, ε(n) → 0 as

n→ ∞. We defineG˜(n)(θ) = ˜F(n)(θ)2, noting that this is the spectral density ofP❩

d ◦τ₍−_n1₎. LetΓ(n)(µ) = Γ1,(n)+ Γ2,(n)(µ), where forµ∈ E2,

Γ1,(n) =− 1 2(2π)d Z [−π,π]d log det ˜G(n)(θ)dθ, (9) Γ2,(n)(µ) = 1 2(2π)d Z [−π,π]d trIdb−G˜−_(n)1(θ) d˜µ(θ). (10) If µ /∈ E2, let Γ1,(n) = Γ2,(n)(µ) = 0. Let Rn ∈ M(Ms(T❩ d

))be the law governing ˆ

µn(Y), where we recall that the stationary process(Yj)is defined just below definition1.

LetΠn

(m) ∈ M(Ms(T❩

d

))be the law governingµˆn(τ(m)(Y)).

Lemma 3. (Rn₎_{satisfies a Large Deviation Principle with good rate function}_I(3)₍_µ₎_{. If}

µ /∈ E2, thenI(3)(µ) =∞.

Proof. The first statement is proved in [7]. The last statement follows from Lemma11

below.

Lemma 4. (Πn

(m))satisfies a strong LDP with good rate function given by, forµ∈ E2

I₍(3)_m₎(µ) = inf

ν∈E2:µ=ν◦τ₍−m1)

I(3)(ν) = I(3)(µ)−Γ(m)(µ). (11)

Ifµ /∈ E2, thenI₍(3)_m₎(µ) =∞.

Proof. The sequence of laws governingµˆn(Y)◦τ₍−_m1₎(asn→ ∞, withmfixed) satisfies a

strong LDP with good rate function as a consequence of an application of the contraction principle to Lemma3(sinceτ(m) is continuous). Now, through the same reasoning as in

[10, Lemma 2.1,Theorem 2.3], it follows from this that(Πn

(m))satisfies a strong LDP with

the same rate function as that ofµˆn(Y)◦τ₍−_m1₎. The last identification in (11) follows from

Lemmas6 and9 in Appendix A. We only need to take the infimum over E2 because by

Lemma3,I(3)₍_ν₎_{is infinite for}_{ν /}_{∈ E} 2.

(9)

Lemma 5. If0<b_< 1

2ε(m) then for alln >0

1 (2n+ 1)dlogE P❩d " exp bX k∈Vn kτ(m)(ω)k−τ(ω)kk2 !# ≤ −1 2log (1−2bε(m)).

The proof is almost identical to that in [10, Lemma 2.4]. We are now ready to prove Theorem1.

Proof. We apply Lemma 13 in the Appendix to the above result. We substituteY(m) =

τ(m)(ω)and W = τ(ω). Taking m → ∞ in the equation in Lemma 5, we find (38) is

satisfied if we stipulate thatκ= 0. After noting the LDP governingΠn

(m)in (11), we may

thus conclude that(Πn₎_{satisfies a strong LDP with good rate function}

lim

δ→0_mlim_→∞γ∈infBδ₍_µ₎I

(3)

(m)(γ), (12)

whereBδ₍_µ_{) =} _{_γ _:_dλ,M₍_{µ, γ}₎_≤_δ_}_.

It remains for us to identify (12) with the rate function in (4). We claim that for each

δ >0, lim m→∞γ∈infBδ₍_µ₎ I₍(3)_m₎(γ)= inf γ∈Bδ₍_µ₎(I (3)₍_γ₎₋_Γ(_γ₎₎_. ₍₁₃₎

To see this, we have from Lemma 11 that for all m and all γ, and constants α1 < 1,

α3 >1andα2, α4 ∈❘, (note that ifγ /∈ E2 the inequalities below are immediate from the

definitions)

I(3)(γ)−Γ(γ)≥(1−α1)I(3)(γ)−α2 (14)

I₍(3)_m₎(γ)≥(1−α1)I(3)(γ)−α2 (15)

I(3)(γ)−Γ(γ)≤(1 +α3)I(3)(γ)−α4

I₍(3)_m₎(γ)≤(1 +α3)I(3)(γ)−α4

We thus see that if I(3)₍_γ_{) =} _∞ _{for all} _γ _∈ _Bδ₍_µ₎_{, then (}₁₃_{) is identically infinite on}

both sides. Otherwise, it may be seen from (14) and (15) that it suffices to establish (13) in the case thatBδ₍_µ_{) =} _Bδ

l(µ) := {γ : dλ,M(µ, γ) ≤ δ, I(3)(γ) ≤ l}for somel < ∞.

But it follows from (29) and (34) that for allγ ∈Bδ

l(µ), there exist constants(αm5 )which

converge to0asm → ∞and such that

I (3) (m)(γ)−I(3)(γ) + Γ(γ) ≤ α m 5 . We may thus

(10)

conclude (13). The expression for the rate function in (4) now follows sinceI(3)₍_γ₎₋_Γ(_γ₎

is lower semicontinuous, by Lemma12.

For the second statement in the Theorem, ifI(3)₍_µ_◦_β−1_{) = 0}_{, then}

I(2) ₍_µ_◦_β−1₎Vn_||_P⊗Vn _{= 0}_{for all} _n_{. However since the relative entropy has a unique} zero, this means that(µ◦β−1₎Vn ₌_P⊗Vn _{for all}_n_{. But this means that} _µ_◦_β−1 ₌ _P❩d_, and therefore (using Lemma6)µ=Q.

Acknowledgements: This work was partially supported by the European Union Seventh Framework Programme (FP7/2007- 2013) under grant agreement no. 269921 (BrainScaleS), no. 318723 (Mathemacs), and by the ERC advanced grant NerVi no. 227747.

A. Properties of the Entropy LetK˜ : [−π, π]d _→_Cb×b_{possess an absolutely convergent}

Fourier Series, and be such that the eigenvalues ofK(˜ θ)are strictly greater than zero for allθ. We require thatK˜ is the density of a stationary sequence, which means that we must also assume that for allθ

˜

K(−θ) =†K(˜ θ) = ¯˜K(θ).

This means, in particular, thatK(˜ θ)is Hermitian. We write

(∆(ω))k = X j∈❩d †_Sj_ωk−j_, _where ₍₁₆₎ X j∈❩d Sj_{exp (−}_i_h_{j, θ}_{i) = ˜}_K(_θ₎−1₂_. ₍₁₇₎

Here K˜−1₂ _{is understood to be the positive Hermitian square root of} _K_˜−1_{. The Fourier}

Series of K˜−1₂ _{is absolutely convergent as a consequence of Wiener’s Theorem. In this} section, we determine a general expression for I(3)₍_ξ_◦_∆−1₎_{. We are generalising the}

result forb = 1 with❩-indexing given in [10]. These results are necessary for the proofs in the previous section.

(11)

We similarly write that (Υ(ω))k = X j∈❩d †_Rj_ωk−j_, _where ₍₁₈₎ X j∈❩d Rjexp (−ihj, θi) = ˜K(θ)12. (19)

As previously,K˜12 is understood to be the positive definite Hermitian square root ofK˜. We note thatR−j ₌†_Rj _and_S−j ₌†_Sj_.

Lemma 6. For allξ∈ E2,ξ◦∆−1andξ◦Υ−1are inE2 and

ξ◦∆−1◦Υ−1 =ξ◦Υ−1◦∆−1 =ξ.

Proof. We make use of the following standard Lemma from [16], to which the reader is referred for the definition of an orthogonal stochastic measure. Let (Uj₎ _∈ _Rb _{be a}

zero-mean stationary sequence governed by ξ ∈ E2. Then there exists an orthogonal ❘b_{-valued stochastic measure}_Zξ₌_Zξ_(∆)₍_∆_{∈ B([−}_{π, π}_[d₎₎_{, such that for every}_j _∈_❩d

(ξa.s.) Uj = 1 2π Z [−π,π]d exp(ihj, θi)Zξ(dθ). (20)

Conversely any orthogonal stochastic measure defines a zero-mean stationary sequence through (20). It may be inferred from this representation that

Zξ◦∆−1

(dθ) =†K(˜ θ)−12Zξ(dθ),

Zξ◦Υ−1(dθ) =†K(˜ θ)12Zξ(dθ).

The proof that this is well-defined makes use of the fact thatK˜12 andK˜−12 are uniformly continuous, since their Fourier Series’ each converge uniformly. This gives us the Lemma. We note for future reference that, ifξhas spectral measuredξ˜(θ), then the spectral density ofξ◦∆−1 _is

˜

K−1₂₍_θ₎_d_ξ_˜₍_θ_{) ˜}_K−1₂₍_θ₎_. ₍₂₁₎

(12)

Definition 5. Ifξ ∈ E2, we define Γ∆(ξ) = 1 2 Eξ_k_ω0_k2₋Eξ◦∆−1_k_ω0_k2₋ 1 2(2π)d Z [−π,π]d log detK(˜ θ)dθ. (22) Otherwise we defineΓ∆₍_ξ_{) = 0}_. Lemma 7. Ifξ ∈ E2, Γ∆(ξ) = 1 2(2π)d Z [−π,π]d trIdb−K(˜ θ)−1 d ˜ξ(θ)− 1 2(2π)d Z [−π,π]d log detK(˜ θ)dθ. (23) Proof. We see from (21) thatξ◦∆−1 _{has spectral measure}_K_˜−1

2(θ)dξ˜(θ) ˜K−12(θ). We thus find that Γ∆(ξ) = 1 2(2π)d Z [−π,π]d trd ˜ξ(θ)− 1 2(2π)d Z [−π,π]d trK˜−12(θ)d ˜ξ(θ) ˜K− 1 2(θ) − 1 2(2π)d Z [−π,π]d log detK(˜ θ)dθ. = 1 2(2π)d Z [−π,π]d trd ˜ξ(θ)− 1 2(2π)d Z [−π,π]d trK(˜ θ)−1d ˜ξ(θ) − 1 2(2π)d Z [−π,π]d log detK(˜ θ)dθ.

Lemma 8. For allξ∈ E2,

I(3) ξ◦∆−1

≤I(3)(ξ)−Γ∆(ξ).

Proof. We assume for now that there exists aqsuch thatSj _{= 0}_{for all}_|_j_{| ≥} _q_{, denoting}

the corresponding map by∆q. LetNqm : TVm → TVm be the following linear operator.

Forj ∈Vm, let(Nqmω)j =

P

k∈VmS

(k−j) modVm_ωk_{. Let}_ξ`

q,n =ξVn◦(Nqn)−1. It follows

from this assumption that theVl marginals of ξ`q,n and ξ ◦∆−q1 are the same, as long as

l≤n−q. Thus I(2) (ξ◦∆−_q1)Vl||_P⊗Vl₌_I(2) ( `ξq,n)Vl||P⊗Vl ≤I(2)ξ`q,n||P⊗Vn . (24)

(13)

This last inequality follows from a property of the Relative Entropy I(2)_{, namely that it}

is nondecreasing as we take a ‘finer’σ-algebra (it is a direct consequence of [9, Lemma 2.3]). IfξVn _{does not have a density for some}_n_then_I(3)₍_ξ₎_{is infinite and the Lemma is} trivial. If otherwise, we may readily evaluateI(2)_ξ_`

q,n||P⊗Vn

using a change of variable to find that I(2)ξ`q,n||P⊗Vn =I(2) ξVn_||_P⊗Vn₊1 2E ξVn kNn qωk2− kωk2 + 1 2log det N n q .

We divide (24) by(2l+ 1)d_{, substitute the above result, and finally take}_l _{→ ∞}_(while

fixingn =l+q) to find that

I(3) ξ◦∆−_q1 ≤I(3)(ξ) + 1 2 Eξ◦(∆)−1 kω0k2 −Eξ kω0k2 + 1 2(2π)d Z [−π,π]d log detK(˜ θ)dθ =I(3)(ξ)−Γ∆_q(ξ).

HereΓ∆q(ξ)is equal toΓ∆(ξ)as defined above, subject to the above assumption thatSj =

0 for |j| > q. On taking q → ∞, it may be readily seen that Γ∆

q → Γ∆ pointwise.

Furthermore the lower semicontinuity ofI(3)_{dictates that}

I(3) ξ◦∆−1

≤ lim

q→∞

I(3) ξ◦∆−_q1

,

which gives us the Lemma.

Lemma 9. For allξ∈ E2,I(3)(ξ◦∆−1) =I(3)(ξ)−Γ∆(ξ).

Proof. We find, similarly to the previous Lemma, that ifγ ∈ Ms(T❩

d )then I(3) γ◦Υ−1 ≤I(3)(γ)+1 2 h Eγ◦Υ−1_k_ω0_k2₋Eγ_k_ω0_k2i₋ 1 2(2π)d Z [−π,π]d log det ˜K(θ)dθ. (25) We substituteγ =ξ◦∆−1_{into the above and, after noting Lemma}₆_{, we find that}

I(3)(ξ)≤I(3) ξ◦∆−1 +1 2 Eξ_k_ω0_k2₋E(ξ◦∆−1)_k_ω0_k2 − 1 2(2π)d Z [−π,π]d log det ˜K(θ)dθ =I(3) ξ◦∆−1 + Γ∆(ξ). (26)

(14)

We next prove some matrix identities which are needed in the proof of Lemma11. Lemma 10. IfA, B ∈Cl×l_{(for some positive integer}_l_{) are both Hermitian, then}

tr(AB)≤bkAkkBk ≤

(

lkAktr(B) if B is positive

ltr(A)kBk if Ais positive (27) IfAandB are positive and in additionAis invertible then

tr (B)≤lkA−1_k2_tr(ABA)_. ₍₂₈₎

Proof. The first part of (27) follows from the von Neumann’s trace inequality. Given two matricesAandB inCl×l |tr(AB)| ≤ l X t=1 αtβt,

where the αt’s and βt’s are the singular values of A and B. In the case where A and

B are Hermitian the singular values are the magnitudes of the (real) eigenvalues. By Cauchy-Schwartz,Pl t=1αtβt≤ q Pl t=1α2t× q Pl t=1βt2 ≤lkAk kBk. IfBis positive, so

are its eigenvalues andkBk ≤tr(B), hence (27). IfAis invertible,tr(B) = tr(A−2_ABA)_.

If moreover and A andB are both Hermitian positive, we obtain the second identity by applying (27) to the Hermitian matrixA−2_{and the Hermitian positive matrix}_ABA_.

Lemma 11. For all0< a < 1 2,

aEµ_[k_ω0_k2_{] +} b

2log(1−2a)≤I

(3)₍_µ₎_. ₍₂₉₎

For allµ ∈ E2, there exist constantsα1 <1, α2 ∈ ❘, α3 > 1, α4 ∈ ❘andm >ˇ 0such

that for allm >mˇ,

Γ(m)(µ)≤α1I(3)(µ) +α2, (30)

Γ(µ)≤α1I(3)(µ) +α2. (31)

Γ(m)(µ)≥ −α3I(3)(µ) +α4 (32)

Γ(µ)≥ −α3I(3)(µ) +α4 (33)

There exist constantsαm

3 , αm4 >0which converge to zero asm→ ∞, and such that

Γ₍_m₎(µ)−Γ(µ)

(15)

Proof. It is a standard result that ifµ /∈ Ms(T❩

d

), thenI(3)₍_µ_{) =} _∞_{, for which the first}

result is evident. Letµ ∈ Ms(T❩

d

). Forw ∈ TVn _{we let}_f₍_ω_{) =} P

k∈Vnkω

k_k2_{. The}

functionfM(ω) = f(ω)1af(ω)≤M is bounded, and hence from definition1we have

a Z TVn fMdµVn ≤log Z TVn exp(afM)dP⊗Vn+I(2)(µVn||P⊗Vn).

We obtain using an easy Gaussian computation that log Z TVn exp(af)dP⊗Vn ₌₋(2n+ 1) d_b 2 log(1−2a).

Upon takingM → ∞and applying the dominated convergence theorem we obtain

aEµ_[k_ω0_k2_]_{≤ −}b

2log(1−2a) + 1 (2n+ 1)dI

(2)₍_µVn_||_P⊗Vn₎_.

We have used the stationarity of µ. By taking the limit n → ∞ we obtain the first inequality (29).

It follows from the definition that (30)-(33) are true ifµ /∈ E2. Thus we may assume

thatµ ∈ E2. We choosemˇ to be such that the eigenvalues ofF˜(m) (as defined in (8)) are

strictly greater than zero for allm >mˇ. It may be easily verified that

b X s=1 ˜ µss [−π, π]d = (2π)dEµ_k_ω0_k2_. ₍₃₅₎

We observe the following upper and lower bounds which hold for allm > mˇ (and forΓ1

too),

Γ1,(m)≤ −

1

2m≥m,θˇ inf∈[−π,π]dlog det ˜G(m)(θ)<∞ (36) Γ1,(m)≥ −

1

2m≥m,θˇsup∈[−π,π]d

log det ˜G(m)(θ)>−∞ (37)

We recall that, sinceG˜(m) = ˜F(2m),

Γ2,(m)(µ) = 1 2(2π)d Z [−π,π]d tr (d˜µ(θ))− 1 2(2π)d Z [−π,π]d trF˜−_(m)2(θ)d˜µ(θ).

(16)

Note that tr( ˜F₍−_m2₎(θ)dµ˜(θ)) = tr( ˜F₍−_m1₎(θ)( ˜F₍−_m1₎(θ)dµ˜(θ) ˜F₍−_m1₎(θ)) ˜F(m)(θ)) =

tr( ˜F₍−_m1₎(θ)dµ˜(θ) ˜F₍−_m1₎(θ)), apply Lemma10, (28), to this, obtaining

Γ2,(m)(µ)≤ 1 2(1−α ∗ 1)Eµ kω0k2 , where α∗₁ = 1 b θ∈[−π,πinf]d_,m_≥_m_ˇ kF˜(m)(θ)k−2 >0. Ifα∗

1 ≥ 1, then (30) is clear becauseI(3)(µ) ≥ 0, the above inequality would imply that

Γ2,(m)is negative and (36). Otherwise, we may use (29) and (36) to find that

Γ(m)(µ)≤ 1 2a(1−α ∗ 1)I(3)(µ)− b(1−α∗ 1) 4a log(1−2a)− 1

2m≥m,θˇ inf∈[−π,π]dlog det ˜G(m). We may substitutea > 1₂(1−α∗

1), lettingα1 = ₂1_a(1−α∗1), into the above to obtain (30).

The second inequality (31) follows by takingm → ∞in the first. For the third inequality, we find using (27) that

Γ2,(m)(µ)≥ 1 2(1−α ∗ 3)Eµ[kω0k2], where α∗₃ =b inf θ∈[−π,π]d_,m_≥_m_ˇ kF˜(m)(θ)k2 . If α∗

3 ≤ 1, then (32) is clear because of the fact that I(3)(µ) ≥ 0, the above inequality

would mean thatΓ2,(m)is positive, and (37). Otherwise, we may use (29) and (37) to find

that Γ2,(m)(µ)≥ 1−α∗ 3 2a I (3)₍_µ_{) +} b(1−α∗3) 4a log(1−2a)− 1 2m≥m,θˇsup∈[−π,π]d log det ˜G(m)(θ).

This yields (32) on takinga < α∗3−1

2 . Taking limits asm → ∞yields (33).

Now, making use of Lemma10, (27), it may be seen that

Γ₂_,₍_m₎(µ)−Γ₂(µ) = 1 2(2π)d Z [−π,π]d trG˜−1(θ)−G˜_(m)−1(θ)d˜µ(θ) . ≤ b 2(2π)dGm Z [−π,π]d tr(d˜µ(θ)).

HereGm = supθ∈[−π,π]dkG˜−1(θ)−G˜−₍_m1₎(θ)k. The convergence of Γ₁_,₍_m₎ toΓ₁ is clear. We thus obtain (34).

(17)

Lemma 12. I(3)₍_µ₎₋_Γ(_µ₎_{is lower-semicontinuous.}

Proof. SinceI(3)₍_µ₎₋_Γ(_µ₎_{is infinite for all} _{µ /}_{∈ E}

2, we only need to prove the Lemma

forµ ∈ E2. We need to prove that ifµ(j) → µthen lim

j→∞

I(3)₍_µ(j)_◦_β−1₎_≥ _I(3)₍_µ_◦_β−1₎_.

We may assume without loss of generality that lim

j→∞I

(3)₍_µ(j)_◦_β−1_{) = lim}

j→∞I

(3)₍_µ(j)_◦_β−1₎_∈_R_{∪ ∞}_.

IfEµ(j)_[k_v0_k2_]_{→ ∞}_{, then by (}₂₉_{) in Lemma}₁₁_,_lim_j_→∞_I(3)₍_µ(j)_◦_β−1_{) =} _∞_{, satisfying}

the requirements of the Lemma.

Thus we may assume that there exists a constantlsuch thatEµ(j)

[kω0_k2_]_≤ _l_{for all}_j_.

We therefore have that, for allm, because of (11) and (4),

lim j→∞ I(3)(µ(j)◦β−1) = lim j→∞ I₍(3)_m₎ µ(j) + Γ(m) µ(j) −Γ µ(j) .

Making use of (32), we may thus conclude that lim j→∞ I(3)(µ(j)◦β−1)≥ lim j→∞ I₍(3)_m₎(µ(j))−ǫ∗₍_m₎, for someǫ∗

(M) which goes to0as m → ∞. In addition, lim

j→∞

I₍(3)_m₎(µ(j)₎ _≥ _I(3)

(m)(µ)due to

the lower semi-continuity ofI₍(3)_m₎. On takingm → ∞, sinceI₍(3)_m₎(µ)→I(3)₍_µ_◦_β−1₎_{, we}

find that

lim

j→∞

I(3)(µ(j)◦β−1)≥I(3)(µ(j)◦β−1).

B. A lemma on the Large Deviations of Stationary Random Variables The following lemma is an adaptation of [2, Theorem 4.9] to❩d. We state it in a general context. LetB

be a Banach Space with normk · k. Forj ∈ ❩d_{, let}_λ

j = 3−d

Qd

δ=12−|j(δ)|. We note that

P

k∈❩dλk = 1. Define the metricdλonB❩

d by dλ(x, y) = X j∈❩d λjmin kxj−yjk,1 .

(18)

Let the induced Prohorov metric on M(B❩d₎ _be _dλ,M_{. For} _ω _∈ B❩d _and _j _∈ ❩d_{, we}

define the shift operator as Sj₍_ω₎k ₌ _ωj+k_{. Let} _Bδ₍_A_{) =} _{_x _∈ _B❩d

: dλ₍_{x, y}₎ _≤

δfor somey ∈A}be the closed blowup, andB(δ)be the closed blowup of{0}. Suppose that form ∈ ❩+, Y(m), W ∈ B❩

d

are stationary random variables, governed by a probability law P_{. We suppose that} _µ_ˆ_n₍_Y₍_m₎₎ _{is governed by} _Πn

(m) and µˆn(W) is

governed byΠn

W - these being the empirical process measures, defined analogously to (2).

Suppose that for eachm, (Πn

(m))satisfies an LDP with good rate functionJ(m). Suppose

thatW =Y(m)+Z(m)for some series of stationary random variablesZ(m)onB❩

d . Lemma 13. If there exists a constantκ >0such that for allb_>₀

lim m→∞nlim→∞ 1 (2n+ 1)dlogE " exp bX j∈Vn minkZ₍j_m₎k2_,₁ !# < κ, (38) then(Πn

W)satisfies an LDP with good rate function

J(x) = lim

ǫ→0_mlim_→∞y∈infBǫ₍_x₎J(m)(y) = lim_ǫ_→₀_mlim_→∞_y_∈inf_Bǫ₍_x₎J(m)(y).

Proof. It suffices, thanks to [6, Theorem 4.2.16, Ex 4.2.29], to prove that for allǫ >0,

lim m→∞nlim→∞ 1 (2n+ 1)dlogP d λ,M_(ˆ_µn₍_W₎_,_µ_ˆn₍_Y (m)))> ǫ =−∞. (39)

Forx ∈B❩d_{, write}_|_x_|_λ _:= _dλ₍_x,₀₎_{. Let}_B _{∈ B(}B❩d₎_{. Then, noting the definition of}_p_n

just above (2), ˆ µn(W)(B) = 1 (2n+ 1)d X j∈Vn 1B Sjpn(Y(m)) +Sjpn(Z(m)) ≤ 1 (2n+ 1)d X j∈Vn 1B Sjpn(Y(m)) +Sjpn(Z(m)) 1B(ǫ)(Sjpn(Z(m))) + 1B(ǫ)c Sjp_n(Z₍_m₎) ≤ 1 (2n+ 1)d X j∈Vn 1Bǫ(Sjp_n(Y₍_m₎)) + 1 (2n+ 1)d#{j ∈Vn :|S j_p n(Z(m))|λ > ǫ} ≤µˆn(Y(m))(Bǫ) + 1 (2n+ 1)d# j ∈Vn :|Sjpn(Z(m))|λ > ǫ .

(19)

Thus P _dλ,M_(ˆ_µn₍_W₎_,_µ_ˆn₍_Y₍_m₎₎₎_{> ǫ} _≤P 1 (2n+ 1)d# j ∈Vn :|Sjpn(Z(m))|λ > ǫ > ǫ ≤P X j∈Vn |Sjpn(Z(m))|2λ >(2n+ 1)dǫ3 ! ≤exp −b₍₂_n_{+ 1)}d_ǫ3_EP " exp bX j∈Vn |Sjpn(Z(m))|2λ !# ≤exp −b₍₂_n_{+ 1)}d_ǫ3_EP  exp  b X j∈Vn,k∈❩d λkmin(kpn(Z(m))j+kk2,1)    

for an arbitraryb _> ₀_{. Since} P

k∈❩dλk = 1 and the exponential function is convex, by

Jensen’s Inequality exp −b₍₂_n_{+ 1)}d_ǫ3_EP  exp  b X j∈Vn,k∈❩d λkmin(kpn(Z(m))j+kk2,1)     ≤exp −b₍₂_n_{+ 1)}d_ǫ3_EP " X k∈❩d λkexp b X j∈Vn min(kpn(Z(m))j+kk2,1) !# = exp −b₍₂_n_{+ 1)}d_ǫ3_EP " exp bX j∈Vn min(kZ₍j_m₎k2_,₁₎ !# ,

by the stationarity ofZ(m) and the fact thatP_k_∈_❩dλk= 1. We may thus infer, using (38),

that lim m→∞nlim→∞ 1 (2n+ 1)dlogP d λ,M_(ˆ_µn₍_W₎_,_µ_ˆn₍_Y (m)))> ǫ ≤ −b_ǫ3₊_κ. ₍₄₀₎

Sinceb_{is arbitrary, we may take}b_{→ ∞}_{to obtain (}₃₉_).

C. Proof of Corollary 2 We now prove Corollary2.

Proof. Letφ:T → T beφ(ω) =ω+c, andφ❩d :T❩d → T❩d beφ❩d(ω)j ₌_φ₍_ωj₎_{. Let}

Ψ❩d : Ms(T❩ d )→ Ms(T❩ d )beΨ❩d(µ) = µ◦(φ❩d)−1_{, and}_{Ψ :}_M s(T)→ Ms(T)be

(20)

respective topologies. SinceΠn

c = Πn◦(Ψ❩

d

)−1_{, we have by a contraction principle [}₆_,

theorem 4.2.1] that(Πn

c)satisfies a strong LDP with good rate function

I(3)((Ψ❩d)−1(µ))−Γ((Ψ❩d)−1(µ)). (41)

Clearly(Ψ❩d)−1₍_µ₎_{is in}_E

2if and only ifµis inE2. Letν = (Ψ❩

d

)−1₍_µ₎_{. It is well known}

that if(Ψ❩d)−1₍_µ₎Vn _{is absolutely continuous relative to} _P⊗Vn_{, then the relative entropy} may be written as I(2)(((Ψ❩d)−1(µ))Vn_||_P⊗Vn_{) =} _E((Ψ❩ d )−1₍_µ₎₎Vn₍_x₎ " logd((Ψ ❩d )−1₍_µ₎₎Vn dP⊗Vn (x) # =EµVn(x) log dµ Vn d(Ψ(P))⊗Vn(x) =I(2)(µVn_||(Ψ(_P₎₎⊗Vn₎_.

Otherwise the relative entropy is infinite. Thus if the relative entropy is finite, (Ψ❩d)−1₍_µ₎Vn _{must possess a density, and this means that} _µVn _{possesses a density which} we denote byr(x) :TVn → TVn_{. We note that the density of}_(Ψ(_P₎₎⊗Vn _is

ρVn₍_x_{) = (2}_π₎−(2n+1)₂ db _exp ₋1 2 X j∈Vn kxj−ck2 ! . (42)

Accordingly we find that

I(2) µVn_||(Ψ(_P₎₎⊗Vn ₌_EµVn(x) log r(x) ρVn(x) =I(2) µVn_||_P⊗Vn₋₍₂_n_{+ 1)}d†_mµ_c₊ (2n+ 1) d 2 kck 2_.

We divide by(2n+ 1)d_{and take}_n_{to infinity to obtain}

I(3)((Ψ❩d)−1(µ)) =I(3)(µ)−†mµc+ 1 2kck

2_.

IfµVn _{does not possess a density for some} _n_{, then both sides of the above equation are} infinite. It may be verified that the spectral density ofνis given by

(21)

On substituting this into the expression in Theorem1, we find that Γ2(ν) =Γ2(µ) + 1 2tr Idb−G(0)˜ −1 c†c− 1 2tr Idb−G(0)˜ −1 mµ†c + c†mµ =Γ2(µ) + 1 2 †_c_Id b−G(0)˜ −1 c−†cIdb−G(0)˜ −1 mµ.

We have used the fact thatG˜(0) is symmetric. We thus obtain (5). This minimum of the rate function remains unique because of the bijectivity of(Ψ❩d).

References

1. Javier Baladron, Diego Fasoli, Olivier Faugeras, and Jonathan Touboul. Mean-field description and propagation of chaos in networks of Hodgkin-Huxley and Fitzhugh-Nagumo neurons. The Journal of Mathematical Neuroscience, 2(1), 2012. 2. John R. Baxter and Naresh C. Jain. An approximation condition for large deviations

and some applications. In Vitaly Bergulson, editor,Convergence in Ergodic Theory and Probability. Ohio State University Mathematical Research Institute Publications, 1993.

3. W. Bryc and A. Dembo. On large deviations of empirical measures for stationary gaussian processes. Stochastic Processes and Their Applications, 58(1):23–34, 1995. 4. W. Bryc and A. Dembo. Large deviations and strong mixing. InAnnales de l’IHP

Probabilités et statistiques, volume 32, pages 549–569. Elsevier, 1996.

5. T. Chiyonobu and S. Kusuoka. The large deviation principle for hypermixing processes. Probability Theory and Related Fields, 78:627–649, 1988.

6. A. Dembo and O. Zeitouni. Large deviations techniques. Springer, 1997. 2nd Edition.

7. J.D. Deuschel, D.W. Stroock, and H. Zessin. Microcanonical distributions for lattice gases. Communications in Mathematical Physics, 139, 1991.

8. Jean-Dominique Deuschel and Daniel W. Stroock. Large Deviations, volume 137 of

Pure and Applied Mathematics. Academic Press, 1989.

9. M.D. Donsker and S.R.S Varadhan. Asymptotic evaluation of certain markov process expectations for large time, iv. Communications on Pure and Applied Mathematics, XXXVI:183–212, 1983.

(22)

10. M.D. Donsker and S.R.S. Varadhan. Large deviations for stationary Gaussian processes. Commun. Math. Phys., 97:187–210, 1985.

11. Olivier Faugeras and James Maclaurin. Asymptotic description of neural networks with correlated synaptic weights. Rapport de recherche RR-8495, INRIA, March 2014.

12. Olivier Faugeras and James MacLaurin. Asymptotic description of stochastic neural networks. I. Existence of a large deviation principle. vol. 352, number 10. 2014. Comptes Rendus Mathematiques.

13. Olivier Faugeras and James MacLaurin. Large deviations of a spatially-ergodic neural network with learning. Arxiv depot, INRIA Sophia Antipolis, 2014.

14. Olivier Faugeras, Jonathan Touboul, and Bruno Cessac. A constructive mean field analysis of multi population neural networks with random synaptic weights and stochastic inputs. Frontiers in Computational Neuroscience, 3(1), 2009.

15. E.T. Rolls and G. Deco. The noisy brain: stochastic dynamics as a principle of brain function. Oxford university press, 2010.

16. A.N. Shiryaev. Probability. Springer, 1996.

17. Y. Steinberg and O. Zeitouni. On tests for normality. IEEE Transactions on Information Theory, 38(6), 1992.

c

2014 by the authors; licensee MDPI, Basel,

Switzerland. This article is an open access article

distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/4.0/).