Conditional large deviations for a sequence of words

(1)

www.elsevier.com/locate/spa

Conditional large deviations for a sequence of words

Matthias Birkner

Weierstraß-Institut f¨ur Angewandte Analysis und Stochastik, Mohrenstraße 39, 10117 Berlin, Germany

Received 7 August 2006; received in revised form 15 May 2007; accepted 25 May 2007 Available online 8 June 2007

Abstract

Cut an i.i.d. sequence(Xi)of ‘letters’ into ‘words’ according to an independent renewal process. Then

one obtains an i.i.d. sequence of words, and thus the level 3 large deviation behaviour of this sequence of words is governed by the specific relative entropy. We consider the corresponding problem for the

conditionalempirical process of words, where one conditions on a typical underlying(Xi). We find that if

the tails of the word lengths decay exponentially, the large deviations under the conditional distribution are almost surely again governed by the specific relative entropy, but the set of attainable limits is restricted.

We indicate potential applications of such a conditional LDP to the computation of the quenched free energy for directed polymer models with random disorder.

c

MSC:60F10; 60G10; 82D60

Keywords:Conditional process level large deviations; Quenched free energy

1. Scenario and main result

LetEbe a countable set (‘letters’ or ‘symbols’),ν∈P(E)a probability measure onEwith ν(x) > 0 for all x ∈ E. Let(Xi)i∈N be an i.i.d.-ν sequence,(τj)j∈N an independent i.i.d.-ρ

sequence with values inN. We assume thatρhas exponentially bounded tails

∃C, λ: ∀n:ρ_n ≤Cexp(−λn) (1)

and that theτs generate an aperiodic renewal process, i.e. gcd{i:ρ_i >0} =1. Cut out theX-sequence according toτ: PutT0:=0,Ti :=Ti−1+τi fori ≥1,

Yi = XTi−1+1,XTi−1+2, . . . ,XTi

, i ∈_N, (2)

E-mail address:[email protected].

(2)

with values inE˜ = ∪∞

k=1Ek(‘words’). We write|y| =kfor the ‘length’ ofy=(y1, . . . ,yk)∈

˜

E. By the independence properties of the ingredients,Y =(Yi)i=1,2,...is then an i.i.d. sequence

with marginal distribution

q0((x1, . . . ,xk)):=P(Y1=(x1, . . . ,xk))=ρk k

Y

i=1

ν(xi). (3)

For a sequence(Yi)with values inE˜Nwe writeLi = |Yi|for the length of thei-th word (in the

present scenario, we haveLi =τi, but it will be convenient to have a variable for word lengths

also ifY does not arise from a construction with aτ-sequence). Note that we have a (left) shift θ:EN→ ENon letter sequences and a (left) shiftθ˜: ˜EN→ ˜ENon word sequences. Let

RN := 1 N N−1 X i=0 δ_θ˜i(₍_Y1_,...,_YN₎per) (4)

be the empirical distribution process of the words with values inP(_E˜N₎_{, the probability measures}

on sequences of words. Here, y1, . . . ,ymperdenotes the periodic extension of(y1, . . . ,ym)∈ ˜

Em to an element ofE˜N.

The setsE andE˜ are countable, so they are Polish spaces with the discrete metric. ThenEN

andE˜Nare again metric spaces e.g. via

d_AN (z1,z2, . . .), (z0₁,z0₂, . . .):= ∞ X n=1 2−|n| dA(zn,z0n)∧1

for A=Eor A= ˜E. This metric induces the product topology onENorEÑ, respectively. We equipP(_EÑ₎_{with the topology of weak convergence. Write}_Pshift₍_EÑ₎_{for the shift invariant}

probability measures onEÑ, andPerg₍_EÑ₎_{for the set of (}_θ˜_{-shift) ergodic probability measures} onEÑ. Note thatPshift₍_EÑ₎_{is a closed subset of}_P₍_EÑ₎_.

It is well known that the family of distributionsL(RN)satisfies a large deviation principle;

the ‘good’ rate function is given by

H(Q|Q0)= lim N→∞ 1 Nh Q|_F N |Q 0_| FN , (5)

the specific relative entropy with respect to Q0 := L(Y)= (q0)⊗N; see e.g. [6,5], Chap. IX or [3], Chapter 6.5. HereFN =σ (Y1, . . . ,YN),Q|FN isQrestricted to the firstN words, and

for probability measuresµ,µ0on some measurable space,

h(µ|µ0)=    Z log dµ dµ0dµ ifµis absolutely continuous w.r.t.µ 0_, ∞ otherwise,

denotes the relative entropy ofµwith respect to µ0. Our aim is to understand the almost sure large deviation behaviour of the family of random probability distributions

L(RN | X).

AsP(EN)andP(_E˜N₎_{are Polish, we can and shall think in the following of a family of regular}

(3)

L(RN| X) = X j1<···<jN N Y i=1 ρ(ji −ji−1) N−1 X k=0 1 Nδθ˜k X|[1···j1],X|[j1+1···j2],X|[jN−1+1···jN] per , (6) where forx=(xi)∈ EN,k< ` x|[k···`]:=(xk,xk+1, . . . ,x`)∈ ˜E. (7)

Quantities involving the conditional expectation of exponential functionals of RN appear

naturally in the computation of the quenched free energy for polymer models in disordered media. In particular, the asymptotic evaluation of the free energy can be formulated as a conditional large deviation problem, and variational formulas as in Corollary 1 make the energy–entropy trade-off explicit. This potential application motivated our original interest in the question studied in this note; see Section2for more details.

It is natural to invert the cutting by concatenation: Let the concatenation operatorκ : ˜EN→

ENbe defined in the obvious way by

κ(y1,y2,y3, . . .)=y₁1,y₂1, . . . ,y_`1 1,y 2 1,y22, . . . ,y2`2,y 3 1, . . .

for yi = (y₁i, . . . ,yi_`i) ∈ E˜. For finite sequences of words,κ(y1, . . . ,yn) ∈ E|y1|+···+|yn| is defined analogously.

One can imagine that because of the conditioning, which fixes atypicalrealisation of theX -sequence, the conditional lawL(RN | X)feels restrictions, and that some deviations, which are

simply exponentially unlikely under the unconditional law, become actually impossible once a typicalXis fixed. Let

R:= ( Q∈P(_E˜N₎_:_{w- lim} L→∞ 1 L L−1 X j=0 δ_θj_κ(_Y₎=ν⊗N Q-a.s. ) , (8)

where w-lim denotes the limit with respect to the weak topology onP(EN).Q∈Rmeans that underQ, the concatenation of words has almost surely the same asymptotic statistics as a typical realisation of(Xi). ObviouslyQ0∈R.

Our main result is a full LDP for the (random) familyL(RN|X),N ∈ N; it roughly states

that underP(RN ∈ · |X), only such deviations can be realised which respect the restriction set

R.

Theorem 1. Under Assumption(1), the following events occur with probability 1:

lim sup

N→∞

1

N logP(RN ∈ F|X) ≤ −Q∈F∩Rinf∩Pshift₍_E˜N₎

H(Q|Q0)

for all closed F⊂P₍_E˜N_), ₍₉₎

lim inf

N

1

N logP(RN ∈G| X)≥ −Q∈G∩Rinf∩Pshift₍_E˜N₎

H(Q|Q0)

for all open G⊂P₍_E˜N_). ₍₁₀₎

(4)

Corollary 1. For any bounded continuous functionΦ: ˜EN→_Rwe have lim N 1 N logE exp N Z Φ(y)RN(dy) |X = sup Q∈R∩Pshift₍_E˜N₎ Z Φ(y)Q(dy)−H(Q|Q0) a.s. (11)

Remark 1. The same results hold for the ‘non-periodic’ flavour of the empirical process,

Rnon-per_N := 1 N N−1 X i=0 δ_θ˜i_Y.

Furthermore, the restriction to aperiodicρ is not severe. Ifρhas periodd >1, simply consider

E0:=Edas a new alphabet.

Remark 2. Theorem 1does not hold in this form without assumptions on the tails ofρ. In fact, in a situation whereρndecays only algebraically, one can probe exponentially (inN, the number

of pieces one wants to cut) far ahead into theX-sequence in order to find regions whereXlooks atypical.

For a concrete example, consider the following scenario: Let(Xi)be i.i.d. Ber(1/2),ρn =

C/na,a>2, som_ρ :=P

nnρn<∞. Put

σN :=min{k∈N: Xk=Xk+1=Xk+[N(m_ρ+)]=1}.

Let q1(x1, . . . ,xm) := ρm1(x1 = · · · = xm = 1), and let O ⊂ P(E˜N) be a (small)

neighbourhood of(q1)⊗N. Under(q1)⊗N, all words consist entirely of 1s. Note that logσN ∼

N(m_ρ+)log 2 by the Erd˝os–R´enyi law andP(RN ∈ O | X)≥e−NρσN byLemma 9below

(note that forQ=(q1)⊗N, we haveH_Lc(Q)= −_EQlogρL1 in this case; cfLemma 3) for large

enoughN, so

lim inf

N→∞

1

N logP(RN ∈ O|X)≥lim infN→∞

1

N logρσN >−∞.

On the other hand, if(9)held true in this scenario, the answer would have to be−∞, because (q1)⊗N6∈R.

ByLemma 8,(9)will hold withRreplaced byR, but in view ofRemark 8in Section3, this amounts essentially only to the unconditional upper bound, which we expect not to be sharp. The intuitive argument advocated on Section1, that any limitingQmust be built ‘on top’ of a typical

X-sequence, is not valid in general. In fact, whenρhas algebraic tails, there will be a trade-off on the exponential scale between how deep one probes into the fixedX-sequence, which allows one to find more atypical regions, and the price for those long jumps. In view of the potential application to the computation of quenched free energies for polymer models in random media considered in Section2, it appears a very interesting problem to find a quantitative description of this phenomenon. This question will be pursued in future work.

Remark 3. In many applications, see e.g. Section2below, one is actually interested in a level 2 large deviation problem, i.e. the behaviour of the empirical distributionN−1PN

i=1δYi. This

(5)

formulation of the conditional large deviation behaviour on level 2, as the restriction setRcan only be expressed in terms of the empirical process (i.e. a level 3 object).

Remark 4. It is conceivable that the results continue to hold if the discrete setEis replaced by a Polish space. A technical difficulty one will encounter when transferring the arguments to a general context is in giving a suitably generalised definition of the (conditional) specific entropy appearing inLemmas 3and4. We have not pursued this issue further.

The rest of this paper is organised as follows. In Section2we indicate how Corollary 1, or rather, its analogue in a scenario where in contrast to Assumption(1),ρhas algebraic tails, could be used to represent the quenched free energy of directed polymer models with random disorder via a variational formula. We illustrate the use ofCorollary 1by expressing the quenched free energy of a modified polymer model. Coming back to the main plot, we give in Section3 a useful characterisation of the property Q ∈ Runder the additional constraint that Qhas finite mean word lengths. This characterisation allows us to make a connection between Q and an ‘underlying’ i.i.d.-νsequence, and to decompose the relative entropy into a part derived from the concatenated letter sequence plus a part related to the word lengths, given the concatenation. In Section4, we prove the upper bound(9); Section5treats the lower bound(10).

2. Relation to quenched free energy computations

Computations involving conditional expectations of exponential functionals of RN appear

in studies of directed polymer models in random environments. As an example let us consider the (modified) quenched specific free energy for the random heteropolymer model (see [1] and references there), defined as

fque(λ,h):=lim 1 N logZ ∗ N,X, where Z∗_N,X =E " exp λ N X n=1 (Xn+h)sign(Sn) ! ;SN =0 # .

Here,λ,h≥0 are parameters,(Sn)is a symmetric simple random walk onZstarting atS0=0,

(Xn)are i.i.d. random variables, independent of S, taking the values±1 with probability 1/2

each, andErefers to expectation with respect to(Sn). In this context, if Sn =0, ‘sign(Sn)’ is

defined as sign(Sn−1)—one thinks of the ‘bonds’ between the steps of the random walk being

above or below the axis. We implicitly assume that N is even; otherwise Z∗_N_,_X =0. This is a model for a polymer with a random composition of hydrophilic and hydrophobic monomers near an oil–water interface. The ‘letter’Ximodels the affinity of monomeritowards different parts of

the solvent.hmodels differences in the affinity of the two types of monomers, andλis an inverse temperature parameter. The free energy itself uses the same expression without the restriction on {SN =0}; this difference is irrelevant in the limit (see [1], Lemma 2).

Note that for the computation of the free energy, the details of the a priori measure on paths (Sn)are not important. All that matters is the fact that excursions from 0 are independent and

symmetric, the only datum that is required for computing Z∗_N_,_X is the distribution(ρn)of the

(6)

assigning independent random signs to the excursions, we can rewrite Z∗_N_,_X =X k X j1<···<jk=N k Y i=1 ρji−ji−1× k Y `=1 cosh  λ j_` X i=j_`−1+1 (Xi +h)  , (12)

whereρn = P0(S1, . . . ,Sn−1 6= 0,Sn = 0)are the return probabilities for the random walk.

Thus forz≥0 the (random) generating function ofZ∗_N_,_X is given by

θ(z)=X N zNZ∗_N_,_X =X N X k X j1<···<jk=N k Y i=1 ρji−ji−1× k Y `=1    zji−ji−1_cosh  λ j_` X i=j_`−1+1 (Xi+h)      = ∞ X k=1 Fk(X;z), where Fk(X;z):= X j1<···<jk k Y i=1 ρji−ji−1exp k X `=1 fz (Xj_`−1+1, . . . ,Xj`) ! (13) with

fz((x1, . . . ,x`)):=`logz+log cosh λ

` X i=1 (xi +h) ! . (14)

By introducing an auxiliary i.i.d.-ρsequence(τi)as in Section1and defining(Yi)as in(2), this

can be expressed as Fk(X;z)=E exp k Z fz(y) π1Rk(dy) |X , (15)

whereπ1 : ˜EN→ ˜E is the projection to the first coordinate (and henceπ1Rk := Rk◦(π1)−1,

the empirical distribution of the firstkwords).

Thus if we could (at least in principle) compute the almost sure asymptotic growth rate ϕ(z):= lim

k→∞

1

klogFk(X;z)

via an analogue ofCorollary 1, we obtained that the radius of convergence ofθ(z)is given by

r_θ :=sup{z≥0:ϕ(z) <0}, and hence the quenched specific free energy

fque(λ,h)= −log sup{z≥0:ϕ(z) <0} = −logr_θ.

Note that the tails ofρn, the return probability of a one-dimensional random walk, decay only

algebraically in this scenario. In particular, ρ does not satisfy Assumption (1), so that the application of Corollary 1to the computation of ϕ(z)is not justified (and would, in view of Remark 2, almost certainly yield an incorrect result). We reiterate our statement from the end of Remark 2that in view of the above considerations, it would be very interesting to extend Theorem 1to the general case.

(7)

In order to illustrate the application of the conditional large deviation principle stated in Section1, let us consider a modified model, where

the partition functionZ∗_N_,_Xis given by(12)withρsatisfying lim sup

n→∞ (

logρn) /n<0.(16)

This is a model for a situation where the polymer has a strong attraction towards the interface, as under the a priori measure excursions have short tails. We do not advertise this model as particularly physically relevant, we would rather view it as an illustration of the use of the techniques developed in this paper under the restriction of Assumption(1). There can never be a depinning transition (as is the case for the original model; see [1]), but still for fixed realisation of (Xi), the polymer can try to optimise its configuration by grouping excursions according

to stretches of Xis with the same sign, and there will be an energy–entropy trade-off. In this

situation, the application ofCorollary 1will be justified.

Let us briefly discuss the corresponding annealed scenario, where one also averages over the sequenceXdescribing the polymer composition. Let

fann(λ,h):=lim 1

N logE

Z∗_N,X

be the annealed specific free energy and θann(z)be the generating function of the sequence

E[Z∗N,X]. Arguing as above we haveθann(z)=

P∞

k=1Fkann(z)whereFkann(z):= E[Fk(X;z)].

As under the annealed measure the ‘marked excursions’(Yi)i=1,2,...are i.i.d., we see from(15)

thatF_kann(z)= F₁ann(z)k; hence ϕann₍_z₎_:= _lim k→∞ 1 klogF ann k (X;z)=logF1ann(z). Note that F₁ann(z)= ∞ X j=1 zjρjE " cosh λ j X i=1 (Xi+h) !# = ∞ X j=1 zjρj j X m=0 2−j _j m cosh(λ(j−2m+j h)) .

This can be viewed as a power series in zwith positive coefficients; let Rann₁ be its radius of convergence (note thatR₁ann>0 as cosh(λ(1+h)j)grows only exponentially inj). Letzann_∗ be the (unique) solution of F₁ann(zann_∗ )=1 (which exists becauseF₁ann(0)= 0,F₁ann(z) → ∞as

z%R₁ann); hence

fann(λ,h)= −log sup{z≥0:ϕann(z) <0}= −log(zann∗ ).

An application ofCorollary 1yields

Lemma 1. For the modified model(16)we have for any0≤z<Rann₁

ϕ(z)= sup Q∈R∩Pshift₍_E˜N₎ Z fz(y)(π1Q)(dy)−H(Q|Q0) a.s., (17)

where in the notation of Section1, E = {±1},ν(±1) = 1/2, q0((x1, . . . ,x_`)) = 2−`ρ_` for

(8)

Note that(15)actually requires only a level 2 large deviation analysis, but it seems that in order to express the restriction setR, one is forced to use a level 3 formulation—the empirical distribution of words alone seems too weak to capture the restrictions coming from conditioning on a typical

X-sequence.

An explicit evaluation of the variational problem in(17)appears extremely difficult in general. Still, we can obtain fromLemma 1that the ‘quenched to annealed bound’ is always strict in this model, i.e.

fque(λ,h) < fann(λ,h) ∀λ >0,h≥0 (18)

so there is no so-called weak disorder regime. This is not very surprising; we will see below that in the unconditional problem, the sequenceXand the excursions both behave atypically in order to maximise the free energy, while in the quenched case,X is forced to be typical.

Lemma 1is basicallyCorollary 1applied to the asymptotic evaluation of(15). There is a slight complication because fzis not bounded, but (at least) forz<R₁annwe can find >0 such

that lim sup k→∞ 1 klogE exp (1+)k Z fz(y) (π1Rk)(dy) X <∞ a.s., (19)

which suffices for an application of Varadhan’s Lemma; see e.g. Condition 4.3.3 in [3]. In order to check(19)note that fz(y)≤ C0|y|; thus forz < R₁annwe can find >0 andz0 ∈ (z,R₁ann)

such that(1+)fz(y)≤ fz0(y)for ally∈ ˜E. AsFann

k (z

0₎_{grows only exponentially, the same}

will hold true for the sequence of conditional expectations inside the log in(19), e.g. by a simple combination of Markov’s Inequality and the Borel–Cantelli Lemma as in the proof ofLemma 8. In order to prove(18), it suffices to check that ϕ(z) < ϕann(z)for all z ∈ (0,R₁ann). For this it is instructive to apply Varadhan’s Lemma to the unconditional distribution and use the representation

ϕann₍_z₎₌_log_Fann

1 (z):= sup Q∈Pshift₍_E˜N₎ Z fz(y)(π1Q)(dy)−H(Q|Q0) = sup q∈P(_E˜₎ Z fz(y)q(dy)−h(q|q0)

=logF₁ann(z)− inf

q∈P(_E˜₎

h(q|q∗,ann), (20) whereq∗,ann((x1. . . ,x_`)) = 1

F₁ann(z)ρ`

Q`

i=1ν(xi)×exp f ((x1, . . . ,x`)) is (the marginal of)

the unconstrained maximiser, which depends implicitly on z. Equality between the two sup-terms above stems from the fact that among all Qwith given marginalπ1Q = q, the specific

relative entropyH(Q|Q0)is minimised by the product measure Q=q⊗N.

Fix z ∈ (0,Rann₁ ); note that Q∗,ann := (q∗,ann)⊗N 6∈ R. A quick way to check this is as follows: In the case h > 0, we see easily that P

yy1q∗,ann(y) > 0, so

limL→∞L−1PL_j₌₁κ(Y)j >0 almost surely underQ∗,ann, and henceQ∗,ann6∈R. On the other

hand, ifh = 0 we can observe thatP

|y|=`yiyjq∗,ann(y) > 0 for any` ≥ 2, 1 ≤ i,j ≤ `,

i.e. letters are positively correlated underq∗,ann, so limL→∞L−1PL_j₌₁κ(Y)jκ(Y)j+1 > 0

almost surely underQ∗,ann, and hence againQ∗,ann6∈R.

AsR∩A_M is compact (see Remark 8), where AM = {Q : H(Q|Q0) ≤ M}is the M

(9)

B_δ(Q∗,ann)∩A_M ⊂Rc, and so byLemma 1 ϕ(z)≤ sup Q∈Pshift₍_E˜N₎_∩(_(B_δ_(Q∗,ann₎₎c_∪_Ac M) Z f(y)(π1Q)(dy)−H(Q|Q0) < ϕann₍_z₎

for a suitable choice ofMandδin view of(20).

3. A characterisation of the restriction set

Imagine cutting the sequenceXinto pieces and then looking at the empirical process of these pieces. Then obviously the concatenationκ(Y)under a limitingQ∈Pshift₍_E˜N₎_{need not be shift}

invariant. For example, if we arrange theτs in such a way that the cut-points tend to occur before a certain pattern, then under RN, the law of the concatenated sequence will have a (possibly

atypical underν⊗N) inclination to begin with this pattern.

A way to reinstate shift invariance (and in some way ‘get back the underlying i.i.d. sequence’) which works whenQhas finite mean word lengths is to size-biasQaccording toL1:= |Y1|and

then ‘randomise out the origin’ – this is familiar from the theory of stationary renewal processes. Using this idea we obtain in this section a characterisation of the setRdefined in(8).

ForQ∈Pshift₍_E˜N₎_with_m

Q:=EQL1<∞letQˆ ∈P(E˜N)be defined by ˆ Q(Yi, . . . ,Yk)∈ Bk = 1 mQEQ h L11Bk (Yi, . . . ,Yk)i (21)

(for anyk ∈ _N, and measurable Bk ⊂ ˜Ek). Let(Yˆi)i∈Nhave law Qˆ; givenYˆ, V uniform on

{0,1, . . . ,L1−1}, put

Z :=θVκ(Yˆ). (22) We denote the distribution of Z obtained in this way byΨQ ∈ P(EN)to stress that it depends

onQ. Explicitly, for measurableA⊂EN,

ΨQ(A)= 1 mQEQ "_L 1−1 X i=0 1A θi_(κ(_Y₎₎ # . (23)

We check thatΨQ is shift invariant: Fixm∈N,Bm ⊂Em measurable. We have

P (Z1, . . . ,Zm)∈ Bm | ˆY = 1 | ˆY1_| | ˆY1_| X i=1 1Bm (κ(_Yˆ₎ i, . . . , κ(Yˆ)i+m−1) , and hence (with a slight abuse of notation)

ΨQ((Z1, . . . ,Zm)∈Bm)= 1 mQ EQ " L1 1 L1 L1 X i=1 1Bm (κ(_Yˆ₎ i, . . . , κ(Yˆ)i+m−1) # = 1 mQ EQ "_L 1 X i=1 1Bm (κ(_Yˆ₎ i, . . . , κ(Yˆ)i+m−1) # .

(10)

AsQisθ˜-shift invariant, EQ "_L 1 X i=1 1Bm (κ(_Yˆ₎ i, . . . , κ(Yˆ)i+m−1) # =_EQ   L1+···+Lk X i=L1+···+Lk−1+1 1Bm (κ(_Yˆ₎ i, . . . , κ(Yˆ)i+m−1)  

for anyk∈_N; hence

ΨQ((Z1, . . . ,Zm)∈ Bm)= 1 MmQEQ "_L 1+···+LM X i=1 1Bm (κ(_Yˆ₎ i, . . . , κ(Yˆ)i+m−1) #

for allM∈_N. Similarly, we have

ΨQ((Z2, . . . ,Zm+1)∈Bm)= 1 MmQEQ "_L 1+···+LM X i=1 1Bm (κ(_Yˆ₎ i+1, . . . , κ(Yˆ)i+m) # , and consequently ΨQ((Z1, . . . ,Zm)∈ Bm)−ΨQ((Z2, . . . ,Zm+1)∈ Bm) ≤ 2 MmQ . TakingM→ ∞we see thatΨQ is shift invariant.

Remark 5. IfQisθ-shift ergodic and has finite mean word lengths˜ EQ|Y1| <∞, thenΨQ is

θ-shift ergodic.

Proof. LetA ⊂ EN beθ-shift invariant. Then for y = (y1,y2, . . .) ∈ ˜E,κ(y) ∈ Aimplies θi_(κ(_y₎₎_∈_A_{for any}_i_{, so in particular}_κ(_θ(˜ _y₎₎₌_θ|y1|_(κ(_y₎₎_∈_{A. Thus, the event}_{_κ(_Y₎_∈_A_}

isθ˜-shift invariant, so Q(κ(Y)∈ A)∈ {0,1}by assumption. On the other hand, we see from (23)and the discussion above that

ΨQ(A)= 1 mQ EQ   |Y1|−1 X i=0 1A θi_(κ(_Y₎₎  = 1 mQ EQ h |Y1|1A(κ(Y)) i ∈ {0,1}.

Lemma 2. Assume that Q ∈ Pshift₍_E˜N₎_satisfies

EQ|Y1| <∞. Then we have Q ∈ Rif and

only ifΨQ=ν⊗N. In this case,LQ(κ(Y))ν⊗N.

Proof. Let ΨQ = ν⊗N. Then under Qˆ, the sequence κ(Y) almost surely has the ‘right’

asymptotic pattern frequencies (i.e.

lim N→∞ 1 N N X i=1 1Bk((κ(Y)i, . . . , κ(Y)i+k−1))=ν ⊗k₍_B k) a.s.

for any measurableBk ⊂ Ek,k ∈ N). As Q ˆQ(in fact, the density(EQL1)/L1is strictly

(11)

Now assume thatQ∈R. AsQˆ Q, the sequenceZi,i ∈N, underΨQalso has the ‘right’

asymptotic pattern frequencies, i.e. lim N→∞ 1 N N X i=1 1Bk((Zi, . . . ,Zi+k−1))=ν ⊗k₍_B k) a.s. (24)

for anyk ∈ _N,Bk ⊂ Ek measurable. It suffices to verify that any shift invariant sequence(Zi)

satisfying(24)is in fact an i.i.d.-νsequence. The limit on the left hand side of(24)is equal to

P((Z1, . . . ,Zk)∈ Bk |I)

whereI is the shift invariantσ-field. Thus

P((Z1, . . . ,Zk)∈Bk)=E[P((Z1, . . . ,Zk)∈Bk |I)]=ν⊗k(Bk)

so that indeedL(Z)=ν⊗N.

Now assume thatΨQ=ν⊗Nand letA⊂ENbe a (measurable)ν⊗N-null set. Then we have

0=ν⊗N(A)=ΨQ(A)= 1 EQL1EQ "_L 1−1 X i=0 1A(θiκ(Y)) # ≥ 1 EQL1 Q(κ(Y)∈A_{) .} This proves thatLQ(κ(Y))ν⊗N.

Remark 6. IfQ∈RandEQL1<∞, by the above there is a random(Y,V)such thatY ∼ ˆQ

andθVκ(Y)is distributed like an i.i.d.-νsequence. We can ‘invert’ this relation, at least in the two-sided scenario: There is (on some probability space) a random pair (∆, Z) with values in

Z×EZsuch thatL(Z)=ν⊗ZandL(θ∆Z)=L_Qˆ(κ(Y)). For example, one can take(Y,V)

as above and then defineZ :=θVκ(Y),∆:= −V.

Remark 7. Note that the mappings Q 7→ Qˆ, Q 7→ ΨQ are not continuous with respect to

the weak topology on Pshift₍_E˜N₎_(as _E˜N ₃ ₍_yi₎

i 7→ |y1| is not bounded, weak convergence

need not imply convergence of the first moment of piece lengths). On the other hand, assume that QN ∈Pshift(E˜N)converge weakly toQ∞and that additionallyEQN[L1]→EQ∞[L1] as

N → ∞. Then

ˆ

QN → ˆQ∞ weakly onP(E˜N) and ΨQN →ΨQ∞ weakly onP(E N_).

Proof. Note that by the assumptions, the family {L_QN(L1),N ∈ N}is uniformly integrable.

Hence also for anyk ∈ _N,yi ∈ ˜E, the family{L_Q

N(L11(Y

i ₌ _yi_,_i ₌₁_{, . . . ,}_k_)),_N _∈ N}is

uniformly integrable. This implies ˆ QN(Yi =yi,i =1, . . . ,k)→ ˆQ∞(Yi =yi,i=1, . . . ,k). Similarly, because 0 ≤ PL1 i=11(κ(Y)i = z1, . . . , κ(Y)i+m = zm+1) ≤ L1 (for anym ∈ N, zj ∈E), we conclude that ΨQN(Z1=z1, . . . ,Zm+1=zm+1)→ΨQ∞(Z1=z1, . . . ,Zm+1=zm+1).

Remark 8. Much of the difficulty in the proofs below stems from the fact that the setRis not closed in the weak topology. In fact,R∩Pshift₍_E˜N₎₌_Pshift₍_E˜N₎_{. On the other hand, let}

(12)

be the level sets of the rate functionQ 7→ H(Q|Q0). One can see from the considerations in Lemma 5andProposition 2that

for anyM,the setR∩A_M is closed(in the weak topology onP(_E˜N_)). ₍₂₅₎

Proof. For the first claim it suffices to showR⊃ {Q∈Pshift₍_E˜N₎_:

EQ|Y1|<∞}, as this set

is dense inPshift₍_E˜N₎_{. Fix an arbitrary}_Q_in_Pshift₍_E˜N₎_satisfying

EQ|Y1|<∞. Letq˜ ∈P(E˜) be given by ˜ q((x1, . . . ,xn))= C n−3/2 n Y i=1 ν(xi),

i.e. the length of the word has heavy tails; given that the length isn, it looks likenindependent draws fromν. Define QN as follows: underQ˜N, the blocks(Yk N+1,Yk N+2, . . . ,Y(k+1)N−1),

k ∈ _N₊, are i.i.d, L_Q˜

N((Y

1_{, . . . ,}_YN₎₎ _{= ˜}_q _⊗ _Q_|

σ(Y1_,...,_YN−1₎. QN is defined as Q˜N with

randomised origin, formally QN = N−1PNi=0−1Q˜N ◦ ˜θi. Then we have QN ∈ Pshift(E˜N)

(in fact even QN ∈ Perg(E˜N)), QN → Q weakly. Finally, each QN ∈ R because the

word length underq˜ has no mean: imagine pointing at positionU inκ(Y)under QN, where

U ∼ Unif({1, . . . ,L}). As L → ∞, the probability tends to 1 that one actually looks inside a ‘q˜-word’ of the concatenation, where the pattern frequencies are what they ought to be in a ν⊗N_-sequence.

In order to verify(25), note thatA_Mc is open because H(·|Q0)is lower semicontinuous. By

combining Lemmas 7and 5 we can choose for any Q ∈ A_M \ R an open neighbourhood

UQ 3 Q such that lim sup_N1 logP RN ∈UQ|X

≤ −2M. ByProposition 2, we must have UQ∩R⊂AMc. Hence(R∩AM)cis open.

3.1. A decomposition of the specific relative entropy

In this section we study how the specific entropy (and the specific relative entropy w.r.t.Q0) of aQcan be expressed in terms involvingΨQ, which will be useful later on. Here and in the

following, for a probability measurePand a discrete random variableUwe will be writingP(U)

for the random variable f(U), where f(u)= P(U =u). Similarly,P(U|V)meansg(U,V), whereg(u, v)=P(U =u|V =v).

Lemma 3. Let Y = (Yi)i∈Nhave distribution Q; write Li := |Yi|, KN := κ(Y1, . . . ,YN).

Assume Q∈Perg₍_E˜N₎_{satisfies m}

Q :=EQL1<∞. Then we have lim N→∞ −1 N logQ(K N₎₌_m QH(ΨQ) Q-a.s., (26) lim N→∞ −1 N logQ L1, . . . ,LN|KN =:H_Lc(Q) (27)

exists Q-almost surely, the limit H_Lc(Q)is a constant. In particular, the specific entropy of Q can be represented as H(Q)= lim N→∞ −1 N logQ Y1, . . . ,YN =mQH(ΨQ)+HLc(Q). (28)

(13)

We call H_Lc(Q) the conditional specific entropy of word lengths under Q, given the concatenation. Intuitively, a ‘ΨQ-typical’ wordx∈ ˜Eof length|x| ≈N mQcan be decomposed

in ≈ exp(N H_Lc(Q)) different ways into ‘Q|_F

N-typical’ N-vectors of words (y

1_{, . . . ,}_yN₎

satisfying κ(y1, . . . ,yN) = x. See the proof of Lemma 9for a rigorous implementation of this notion.

Proof. WriteSN :=L1+ · · · +LN (= |KN|), fix >0. Note that in the event

AN:= {N(mQ−)≤SN ≤N(mQ+)} we have Q(κ(Y)|_[1···_N₍_m Q+)],SN)≤ Q(K N₎_≤ _Q_(κ(_Y₎_| [1···N(mQ−)]).

The second inequality together with the facts that lim infN→∞1AN = 1 almost surely by

ergodicity ofQandLQ(κ(Y))ΨQbyLemma 2shows that

lim sup N→∞ 1 N logQ(K N₎_{≤ −}₍_m Q−)H(ΨQ) a.s. (29) because lim n→∞ 1

n logΨQ|[1···n]((Z1, . . . ,Zn))= −H(ΨQ) forΨQ-a.a.Z =(Z1,Z2, . . .),

whereH(ΨQ)is the specific entropy ofΨQ (recall thatΨQisθ-shift ergodic byRemark 5). On

the other hand, writing

Q(κ(Y)|_[1···_N₍_m

Q+)],SN)=Q(κ(Y)|[1···N(mQ+)])Q SN|κ(Y)|[1···N(mQ+)]

and noting that lim N→∞ 1 N logQ SN|κ(Y)|[1···N(mQ+)] =0 a.s. (30) we obtain lim inf 1 N logQ(K (N)₎_{≥ −}₍_m Q+)H(ΨQ) (31)

almost surely as above. Taking → 0 in(29)and(31), we obtain(26). Intuitively,(30)holds true because the conditional distribution concentrates on a set of size≈ const×N; a formal argument might be as follows: For anyx∈E[N(mQ+)]_,δ >_{0 we have}

[N(mQ+)]

X

k=[N(m Q−₎]_,

Q(SN=k|κ(Y)|[1···N(m Q+₎]=x)≤exp(−δN)

Q SN =k|κ(Y)|[1···N(mQ+)]=x≤2Nexp(−δN)

which is summable in N. Thus the Borel–Cantelli Lemma together with lim inf1AN = 1 a.s.

shows that lim sup N→∞ −1 N logQ SN|κ(Y)|[1···N(mQ+)] _≤_δ a.s. for anyδ >0.

(14)

Finally, we know by ergodicity ofQthat lim N→∞ −1 N logQ Y1, . . . ,YN

exists almost surely and equalsH(Q), the specific entropy ofQ. Writing

QY1, . . . ,YN=Q(KN)QL1, . . . ,LN|KN

, this gives(27)and(28).

The following result decomposes the specific entropy of Q with respect to Q0 into a part which comes from the concatenated letters and a part describing the different word length distributions.

Lemma 4. Assume Q∈Perg₍_E˜N₎_{satisfies m}

Q :=EQL1<∞. Then we have H(Q|Q0)=mQH(ΨQ|ν⊗N)−EQ logρL1−H

c

L(Q). (32)

Note that the term−_EQ logρL1−H

c

L(Q)can be interpreted as the conditional specific relative

entropy of word lengths underQwith respect toρ⊗N, given the concatenation.

Proof. We haveQ-a.s. by ergodicity ofQ

H(Q|Q0)= lim N→∞ 1 N log Q Y1, . . . ,YN Q0 _Y1_{, . . . ,}_YN = −H(Q)− lim N→∞ 1 N N X i=1 logρLi − lim N→∞ 1 N L1+···+LN X j=1 logν(κ(Y)j) = −H_Lc(Q)−_EQ logρL1 −mQH(ΨQ)− lim N→∞ 1 N L1+···+LN X j=1 logν(κ(Y)j)

byLemma 3. Furthermore note that

lim N→∞ 1 N L1+···+LN X j=1 logν(κ(Y)j)=mQ Z EN logρ(z1)ΨQ(d z) Q-a.s. (33)

because(L1+ · · · +LN)/N →mQ,LQ(κ(Y))ΨQbyLemma 2andΨQ isθ-shift ergodic

byRemark 5.

Finally note that (becauseν⊗Nis a product measure)

−mQH(ΨQ)−mQ

Z

EN

logν(z1)ΨQ(dz)=mQH(ΨQ|ν⊗N)

to complete the proof.

4. Conditioning and the restriction set, upper bound

In this section we prove the upper bound inTheorem 1. First we show thatP(RN ≈ Q|X)

is super-exponentially expensive for any typicalX and Q 6∈ R. Intuitively, this is so because thenRN ≈ Qrequires inclusion of substantial (i.e. with length of order N) atypical pieces of

theX-sequence in the sum(6), which requires that at least some of the j-increments appearing in(6)are exponentially long inN. Because of(1), all such terms will be extremely small.

(15)

Lemma 5. Let Q ∈Pshift₍_E˜N₎_\_R_{satisfy m}

Q =EQ|Y(1)|<∞. Then for any B ≥0there is

an open neighbourhoodU ⊂Pshift₍_E˜N₎_{of Q such that}

lim sup

N→∞

1

N logP(RN ∈U|X)≤ −B a.s. (34)

Proof. Step1. First we claim that there existε1, ε2 >0 such that for any large enoughM ∈N

there is a subsetBM ⊂ ˜EM of ‘X-unlikely sentences’ with the following properties:

X

(w1_,...,wM₎_∈_B M

Q(Y1, . . . ,YM)=(w1, . . . , wM)≥ε₁ and (35)

P(the sequenceXbegins with an element ofκ(BM))≤exp(−ε2M). (36)

In order to see this note that ifQis alsoθ˜-ergodic, by combining(26)and(33)we have (recall

KN =κ(Y1, . . . ,YN)) 1 N log Q(KN) |KN| Q i=1 ν(K_iN) −→ N→∞mQH(ΨQ |ν⊗N) Q-a.s. (37)

and the right hand side is strictly positive forQ6∈R. In the general case we see by decomposing

Qinto its ergodic components (cf e.g. [4], Thm. 5.2.16) that there is a random variableZ˜ ≥ 0, adapted to the shift invariant sigma-field, such that

1 N log Q(KN) |KN| Q i=1 ν(K_iN) −→ N→∞ ˜ Z Q-a.s., (38)

and the event{ ˜Z >0}has strictly positive probability underQif and only if Q6∈ R. Thus by assumption we can findε1, ε2>0 such thatQ(Z˜ > ε2) >2ε1; hence

Q      1 N log Q(KN) |KN| Q i=1 ν(K_iN) > ε2      >2ε1 (39)

forN large enough. AsE˜ is countable, for any large enoughN we can find (pairwise different) wordsw1, . . . , wL ∈ ˜E(thewi andLwill depend onN, but we suppress this dependency in the notation) such that

L X j=1 Q(KN =wj) >2ε1 and (40) logQ(KN =wj)≥ε₂N+ |wj| X i=1 logν(w_ij), j=1, . . . ,L. (41)

(16)

Note that(40)implies L X j=1 |wj| Y i=1 ν(wj i)≤exp(−ε2N) L X j=1 QKN =wj≤exp(−ε₂N). (42) Finally, for each of the wordswj (j = 1, . . . ,L) choose Mj (pairwise different) ordered

decompositions into N subwords wj,k,1, . . . , wj,k,N (k = 1, . . . ,Mj) such that wj =

κ(wj,k,1_{, . . . , w}j,k,N₎_{for each} _j_,_k_and Mj X k=1 Q(Y1, . . . ,YN)=(wj,k,1, . . . , wj,k,N)≥ 1 2Q KN =wj, j =1, . . . ,L.

This yields(35)and(36)with BN := n (wj,k,1_{, . . . , w}j,k,N₎_:₁_≤_k_≤ _M j,1≤ j≤ L o .

Step2. LetA⊂_Nhave asymptotic density p∈(0,1), i.e. lim

n→∞

|A∩ {1, . . . ,n}|

n = p, (43)

andε∈(0,1). We claim that for any p0> pwe have for allNlarge enough

X 1≤s1<s2<···<_sN |{s₁,...,sN}∩A|≥_εN N Y j=1 ρ(sj−sj−1)≤ (C˜)Nexp N εlog1 ε +(1−ε)log 1 1−ε × exp(− ˜λ/p 0₎ 1−exp(− ˜λ/p0₎ !εN , (44)

whereC˜,λ˜ are given byLemma 6. The left hand side of(44)is not more than

X 1≤i1<···<id_εNe≤N X j₁<···<jdεNe j₁,...,jdεNe ∈A dεNe Y `=1 ρ∗(i_`−i_`−1)(_j `−j_`−1) ≤ X 1≤i1<···<id_εNe≤N X j1<···<jd_εNe j1,...,jd_εNe ∈A dεNe Y `=1 ˜ Ci`−i`−1_exp_{− ˜}λ(_j `−j_`−1) ≤ ˜CN _N dεNe X j1<···<jd_εNe j1,...,jd_εNe ∈A exp− ˜λjd_εNe = ˜CN N dεNe ∞ X r=dεNe exp(− ˜λtr(A)) r−1 dεNe −1 , (45)

where we usedLemma 6in the first inequality and

(17)

is the position of ther-th element ofA. Note that for largeN _N dεNe ≤const×exp N εlog1 ε +(1−ε)log 1 1−ε (47) by Stirling’s Formula, that fors∈ [0,1)

∞ X r=n sr _r₋₁ n−1 = ∞ X r1,...,rn=1 sr1+···+rn ₌ _s 1−s n (48) and that by(43) lim r→∞ tr(A) r = 1 p. (49)

Finally, combine(45)and(47)–(49)to obtain the claim(44).

Step3. Consider (a large)M ∈_N, letε1, ε2andBM be as chosen in Step 1, put

U :=nQ0∈Pshift(_E˜N₎_:_Q0₍ Y1, . . . ,YM)∈BM > ε1/2 o . (50)

Note that by(35),U is an open neighbourhood ofQ. Let

A:= {i ∈_N:(Xi,Xi+1, . . . ,Xi+k)∈κ(BM)for somek} (51)

be the (random) set of positions where some element ofκ(BM)starts on the given X. As X

is i.i.d. and|κ(BM)| < ∞, A has a non-random asymptotic density p, and p ≤ exp(−ε2M)

by(36). Furthermore we see from(6)that for large enough N (so that boundary terms coming from the periodisation become negligible) only such summands(j1, . . . ,jN)will contribute to P(RN ∈U|X)which have the property that

#{1≤i ≤ N: ji ∈ A} ≥ ε1

4M =:ε. (52)

(We divide byM to account for possible overlaps of the concatenations of different elements of BM.) Now(44)yields lim sup N→∞ 1 N logP(RN ∈U|X) ≤logC˜+ εlog1 ε+(1−ε)log 1 1−ε +εlog2 exp− ˜λ/p ≤logC˜+ εlog1 ε+(1−ε)log 1 1−ε +εlog 2− ε1 4M × ˜λexp(ε2M) .

The expression in the last line can be made arbitrarily negative by picking a largeM (note that the terms involvingεare uniformly bounded forε∈(0,1)).

Lemma 6. Letρsatisfy(1). There areC and˜ λ >˜ 0such that

(18)

Proof. We have ρ∗k₍_n₎_≤ ∞ X n₁,...,nk=1 n1+···+_nk=n k Y i=1 Ce−λni =Cke−λn _n₋₁ k−1 . (54) Fixε∈(0,1/2). Ask7→n−1 k−1

is increasing fork< (n−1)/2, the right hand side of(54)for

k≤εnis not more than

Cke−λn n−1 dεne ≤const×Ckexp −λn+n εlog1 ε+(1−ε)log 1 1−ε

by Stirling’s Formula, while fork> εnthe observationn_k₋₁−1≤2n−1yields the bound

Cke−λn2k/ε =(21/εC)ke−λn.

Putλ˜ :=λ+εlogε+(1−ε)log(1−ε),C˜ :=21/εC. Note thatλ < λ˜ can be chosen arbitrarily close toλ, at the expense of enlargingC˜.

Lemma 7. Let ρ satisfy (1). Then any Q ∈ Pshift₍_E˜N₎ _{with H}₍_Q_|_Q0_{) <} _∞ _{has m}

Q =

EQ|Y1|<∞.

Proof. Letµ := L_Q(|Y1|) ∈ P(_N)be the marginal distribution of word lengths under Q. As

N−1h(Q|_F N |Q 0_| FN)% H(Q|Q 0_{) <}_∞_and_h_(µ_|_ρ)_≤_h₍_Q_| F1 | Q 0_| F1)it suffices to check that h(µ|ρ)= ∞ X n=1 ρnµn ρn logµn ρn < ∞ (55) impliesP

nnµn<∞. This must be well known; for completeness and lack of a reference, here

is a short argument: Split the sum in(55)into

∞ X n=1 µn≥ρn ρn µn ρn logµn ρn + ∞ X n=1 µn<ρn ρn µn ρn logµn ρn.

Asx 7→xlogxis continuous on[0,1], the second sum has some finite value∈(−∞,0], so the assumption implies ∞> ∞ X n=1 µn≥ρn µnlogµ n ρn = ∞ X n=1 µn≥ρn µn   logµn−logρn | {z } ≥0    ≥ ∞ X n=1 µn≥Cexp(−_λn/2) µn(logµn−logρn)≥ ∞ X n=1 µn≥Cexp(−_λn/2) µnλ n 2 by(1). On the other hand,

∞ X

n=1

µn<Cexp(−λn/2)

nµn<∞

(19)

Next we observe that an unconditional upper bound is automatically also an upper bound for the conditional distributions:

Lemma 8. For any closed F ⊂P(_E˜N₎_{we have}

lim sup

N→∞

1

N logP(RN ∈ F| X)≤ −Q∈F∩Pinfshift₍_E˜N₎

H(Q|Q0) a.s. (56) This is well known; here is a short proof for the sake of completeness.

Proof. WriteI(F):=inf_Q_∈_F_∩_Pshift₍_E˜N₎H(Q|Q0). For >0 we have by Markov’s Inequality

and the unconditional LDP

P(P(RN ∈F | X)≥exp(−N(I(F)−2)))

≤eN(I(F)−2)E[P(RN ∈F | X)]=eN(I(F)−2)P(RN ∈F)

≤eN(I(F)−2)e−N(I(F)−)=e−N forN large enough, and hence

lim sup

N→∞

1

N logP(RN ∈F | X)≤ −I(F)−2 a.s.

by the Borel–Cantelli Lemma. Take→0 to conclude.

The following is the main result of this section:

Proposition 1. For any closed F⊂P(_E˜N₎_{we have a.s.}

lim sup

N→∞

1

N logP(RN ∈ F| X)≤ −Q∈F∩Pinfshift₍_E˜N₎_∩_R

H(Q|Q0). (57) In particular, forF ∩R = ∅the conditional probabilityP(RN ∈ F | X)decays almost surely

super-exponentially.

Remark 9. As the weak topology onPshift₍_E˜N₎_{is separable, it is standard to strengthen}₍₅₇₎_to

hold with probability 1 simultaneously for all closed setsF; see e.g. [2], proof of Prop. III.2.

Proof of Proposition 1. First note that even thoughRNis not exactly shift invariant because of

boundary terms, it is nearly so: for any weak neighbourhood OofPshift₍_E˜N₎_{, there is}_n 0such

that RN ∈ O for N ≥ n0. AsPshift(E˜N)is closed in the weak topology, we can restrict to F∩Pshift₍_E˜N₎_{on the right hand side of}_(57).

Fix B >0 andε > 0 for the moment; letAB = {Q : H(Q|Q0)≤ B}be the B-level set

of H(· |Q0). Recall thatAB is compact with respect to the weak topology onP(E˜N). For any

Q∈F∩A_B choose an open neighbourhoodUQofQas follows:

(1) IfQ6∈R, takeUQ =Uas guaranteed byLemma 5, so that(34)is satisfied.

(2) If Q ∈ R, chooseUQ such that inf_Q0_∈_U

Q H(Q

0_|_Q0₎_≥ _H₍_Q_|_Q0₎₋_ε_{. This is possible by}

lower semicontinuity of H(· |Q0).

AsF∩A_Bis compact, we can pick a finite sub-coverUQ1, . . . ,UQm. Note thatF∩ ∪

m i=1UQi

c

is closed and contained inA_Bc, so inf

Q∈F∩ ∪m i=1UQi

cH(Q|Q

(20)

and hence lim sup N→∞ 1 N logP RN∈ F∩ ∪ m i=1UQi c | X≤ −B a.s.

byLemma 8. On the other hand, fori =1, . . . ,mwe have by construction (employingLemma 5 ifQi 6∈RandLemma 8ifQi ∈R) lim sup N→∞ 1 N logP(RN∈UQi |X)≤(−B)∨ − inf Q∈F∩RH(Q |Q0)+ε a.s., and consequently lim sup N→∞ 1 N logP(RN∈ F |X)≤(−B)∨ − inf Q∈F∩R H(Q|Q0)+ε a.s. TakeB→ ∞,ε→0 to conclude. 5. Lower bound

Proposition 2.Let Q∈R∩Pshift₍_E˜N₎_{, and let O}_⊂_P₍_E˜N₎_{be an open neighbourhood of Q.} Then we have lim inf N→∞ 1 N logP(RN ∈O|X)≥ −H(Q|Q 0₎ _a_._s_. ₍₅₈₎

Remark 10.Again it is standard to strengthen(58)to hold with probability 1 simultaneously for all open setsO; see e.g. [2], proof of Proposition III.3.

We will have occasion to consider open neighbourhoods ofQ∈Pshift₍_E˜N₎_{of the following}

form: ˜ OQ := Q0∈Pshift(_E˜N₎_: Z gidQ0− Z gidQ < ˜ εi,i =1, . . . ,A_O˜ Q (59) wheregi : ˜EN→R(i =1, . . . ,A_O˜

Q) satisfykgik∞≤1 and depend only ony

1_{, . . . ,}_yBO˜_Q

for someB_O˜

Q

∈_N. Note that such sets generate the weak topology onPshift₍_E˜N₎_.

Forx∈ Em,m∈ {n,n+1, . . .} ∪ {∞}, let Rn(x):= 1 n n−1 X i=0 δ_θi(₍_x|[1···n])per)∈P(E N₎ ₍₆₀₎

be the correspondingn-th empirical letter process measure. Furthermore, for 1≤ j1<· · ·< jn

let ˜ Rn_j 1,...,jn(x):= 1 n n−1 X i=0 δ_θ˜i₍_x_| [1···j1],x|[j1+1···j2],...,x|[_jn−1+1···jn])per ∈P(E˜N) (61)

be then-th empirical word process measure obtained by cuttingxat the cut-points ji. Note that

in this notation (see Eqs.(2)and(4)),

(21)

Proof of Proposition 2. We can assume that H(Q|Q0) < ∞, and hence in view ofLemma 7 we may also assume thatmQ := EQ|Y1| < ∞. Let us first consider a shift ergodic Q. Note

that Q ∈ R ∩Pshift₍_E˜N₎_with

EQL1 < ∞ implies ΨQ = ν⊗N, and hence H(Q|Q0) =

−_EQ logρL1 − H

c

L(Q) by Lemma 4. We can find a neighbourhood O˜Q ⊂ O of the type

defined in(59), and it suffices to restrict toO˜Q. For givenε >0, take the open neighbourhood

U ⊂Pshift₍_EN₎_of_ν⊗N_{guaranteed by}_{Lemma 9. By the strong law, the event}

{Rn(X)∈U for all sufficiently largen}

has probability 1. As P(RN ∈ ˜OQ|X)≥ X 0<j1<···<_jN=[_{m Q N}], ˜ RN_j 1,...,jN(X)∈ ˜OQ N Y i=1 ρji−ji−1, we obtain lim inf N→∞ 1 N logP(RN ∈O|X)≥ −H(Q|Q 0₎₋_ε

byLemma 9. Takeε→0 to conclude the proof in the ergodic case. Now consider a general Q ∈ R ∩ Pshift₍_E˜N₎ _with

EQ|Y1| < ∞. By the Ergodic

Decomposition Theorem (cf e.g. [4], Thm. 5.2.16), we can represent

Q=

Z

Perg₍_E˜N₎

RρQ(d R), (62)

whereρQ is a probability measure onPerg(E˜N). The event

( w− lim L→∞ 1 L L−1 X j=0 δ_θj_κ(_Y₎ =ν⊗N )

is invariant under the (word-level) shift _θ˜ _{and has} _Q_{-probability one; thus by} _{(62), we have} ρQ(R) = 1. Furthermore, as EQ|Y1| = R ER|Y1|ρQ(dR), ρQ must be concentrated on

{Q0 : _EQ|Y1| < ∞}. As R 7→ H(R|Q0)is lower semicontinuous and affine,(62) implies

H(Q|Q0)=R

H(R|Q0) ρQ(dR), andQcan be approximated by finite convex combinations of

Qi ∈R∩Perg(E˜N)in such a way that the corresponding specific relative entropies converge as

well (see e.g. [4], Lemma 5.4.24 and its proof). More precisely, for anyδ >0, we can findn∈_N, λ1, . . . , λn ∈(0,1)withPni=1λi =1 andQi ∈R∩Perg(E˜N)withmQi =EQi |Y1|<∞(so

in particularΨQi =ν⊗N) such that

˜

Q:=λ1Q1+ · · · +λ_nQn∈O and

H(Q|Q0)≥H(Q˜|Q0)−δ=λ₁H(Q1|Q0)+ · · · +λnH(Qn|Q0)−δ. (63)

Let O˜ ⊂ O be an open neighbourhood of Q˜ of the type defined in (59), and let O˜m be

corresponding open neighbourhoods of Qm,m = 1, . . . ,n, but with ε˜i replaced by ε˜i/(2n).

(22)

¯

Nm := [N1mQ1]+ · · · [NmmQm]. Note that by construction for anyx∈ E

¯ Nn _and_j 1<· · ·< j_Nn˜ , ˜ RNm_j ˜ Nm−1+1,j_Nm˜ ₋ 1+2,...,j_Nm˜ x| (_N¯ m−1+1)··· ¯Nm ∈ ˜Om form=1, . . . ,n =⇒ ˜RNn_j˜ 1,...,j_Nn˜ (x)∈ ˜O.

Let Um be a neighbourhood of ΨQm (= ν⊗N) as constructed in Lemma 9 corresponding

to Q = Qm and ε = δ. Applying Lemma 9 separately on the stretches X|[(_N¯

m−1+1)··· ¯Nm],

m = 1, . . . ,n, and ‘glueing together’ the corresponding vectors of cut-points, we obtain from the discussion above that in the event

GN := {R[Nmm_Qm](X|[(_N¯ m−1+1)··· ¯Nm])∈Um,m=1, . . . ,n} we have P(RN∈ ˜O|X)≥ n Y m=1 exp−Nm H(Qm|Q0)+δ ≥ exp−Nλ1H(Q1|Q0)+ · · · +λnH(Qn|Q0)+2δ ≥ exp−NH(Q|Q0)+3δ

when N is sufficiently large. Now∪_M∩_N_≥_MGN occurs almost surely (one can e.g. use large

deviation results for the empirical distribution ofX to see thatP((GN)c)decays exponentially

inN); hence lim inf N→∞ 1 N logP(RN ∈O|X)≥ −H(Q|Q 0₎₋₃_δ. Now takeδ→0.

The following lemma is the combinatorial core of the lower bound; its intuitive content is that for a wordxof length≈ N mQ which looks ‘ΨQ-typical’, there are≈exp(N H_Lc(Q))ways of

cutting it intoN subwords in such a way that a ‘Q-typical’ sequence arises. The ‘price’ for any such pattern of cut-points will then be≈exp(NEQ logρ(|Y1|)).

Lemma 9. Let Q ∈ Perg₍_E˜N₎ _{with m}

Q := EQ[L1] < ∞ be given, and let O˜Q be a

neighbourhood of Q as defined in (59). For anyε > 0 there exists an open neighbourhood

U ⊂Pshift₍_EN₎_of_Ψ

Qand N0∈Nsuch that

N≥ N0, x∈ E[mQN]with R[mQN](x)∈U implies X 0<j1<···<jN=[mQN], ˜ R N j1,...,jN(x)∈ ˜OQ N Y i=1 ρ(ji −ji−1)≥exp N EQ logρL1+H c L(Q)−ε _. (64)

(23)

Proof. Step1. LetO˜0

Q be defined as in(59)withε˜i replaced byε˜i/2 (i = 1, . . . ,AOQ), and

similarlyO˜00

Qwithε˜i replaced byε˜i/4. For M∈N,ε1>0,x∈E

[MmQ]_let JM,ε1(x):=      (j1, . . . ,jM): 0≤ j1<· · ·< jM = [MmQ],R˜Mj1,...,jM(x)∈ ˜O 0 Q, 1 M M X i=1

logρji−ji−1 ∈ [EQlogρL1 −ε1,EQlogρL1+ε1]

     .

This is the set of all cut-vectors which are ‘suitable’ for the given wordx. We claim that for given ε2>0 we can chooseM sufficiently large and pairwise different wordsξ1, . . . , ξL ∈ E[MmQ]

such that L X i=1 ΨQ|[1···MmQ](ξi)≥1−ε2 and (65) |J_M_,ε 1(ξ i₎_{| ≥}_exp _M₍_Hc L(Q)−ε2), i=1, . . . ,L. (66)

In order to check this letQˆ be defined as in(21), recall(dQˆ/dQ)(Y)= |Y1|/mQ. ByLemma 3

and the fact thatQˆ Qwe haveQˆ-a.s.

lim N→∞ 1 N|κ(Y 1_{, . . . ,}_YN₎_{| =}_m Q lim N→∞ 1 N log ˆ Q(κ(Y1, . . . ,YN))= −mQH(ΨQ), lim N→∞ 1 N log ˆ Q(Y1, . . . ,YN)= −H(Q), lim N→∞ 1 N N X i=1 logρ(|Yi|)=_EQ logρ(|Y1|), lim N→∞RN =Q∈ ˜O00 Q.

Thus, for large enoughNwe can findApairwise differentzi ∈ ˜Eand for eachzkwe can choose

Bkdifferent decompositions(yk,j,1, . . . ,yk,j,N)∈ ˜EN, j =1, . . . ,Bk, where

κ(yk,j,1, . . . ,yk,j,N)=zk for eachj,k,

such that each|zk| ∈ [N(mQ−ε1),N(mQ−ε1)],

A

X

k=1

ˆ

(24)

and the following holds fork=1, . . . ,Aand j =1, . . . ,Bk(unless otherwise quantified): ˆ Q(KN=zk)≥exp(−N(mQH(ΨQ)+ε1)), (68) ˆ Q (Y1, . . . ,YN)=(yk,j,1, . . . ,yk,j,N) ≤exp(−N(H(Q)−ε₁)), (69) Bk X j=1 ˆ Q(Y1, . . . ,YN)=(yk,j,1, . . . ,yk,j,N)≥(1−ε₁)Qˆ(KN =zk), (70) 1 N N X i=1 logρ(|yk,j,i|)∈h_EQlogρ(|Y1|)−ε1 2,EQlogρ(|Y 1_|₎₊ε1 2 i , (71) 1 N N−1 X i=1 δ_θ˜i(₍_yk,j,1_,...,_yk,j,N₎per)∈ ˜O00_Q. (72)

Note that(69),(70)and(68)imply for eachkthat

Bke−N(H(Q)−ε1) ≥ Bk X j=1 ˆ Q (Y1, . . . ,YN)=(yk,j,1, . . . ,yk,j,N) ≥ (1−ε₁)Qˆ(KN =zk)≥(1−ε₁)e−N(mQH(ΨQ)+ε1) ≥ exp −N(mQH(ΨQ)+2ε1)

forNlarge enough; hence

Bk≥exp N(H(Q)−mQH(ΨQ)−3ε1)=exp N(HLc(Q)−3ε1)

fork=1, . . . ,AbyLemma 3. Note that this together with(71)and(72)shows that

|J_N_,ε

1(z

k₎_{| ≥}_exp₍_N₍_Hc

L(Q)−3ε1)) fork=1, . . . ,A (73)

(with a notational grain of salt because|zk|is not exactly[N mQ]). This is almost what we need

to prove(65)and(66), except for the slight nuisance that thezk have not exactly length[N mQ]

and the fact that(67)guarantees that one of the zk is very likely to occur as the concatenation of the first N words under Qˆ, whereas (65)speaks about the first [N mQ] letters under ΨQ.

Remembering the Definition(22)ofΨQinvolving Qˆ, this can e.g. be remedied as follows: Pick

Mso large that(67)–(73)are satisfied forN=M. Consider the set of words { ˜ξr :r =1, . . . ,R}

:= {θiκ(yk,j,1, . . . ,yk,j,N):0≤i<|yk,j,1|,j =1, . . . ,Bk, k=1, . . . ,A},

and truncate each of them at[M(mQ−2ε1)]letters. Thus in view of(22)and(67),

R X r=1 ΨQ|[1···M(mQ−2ε1)](ξ˜ r_| [1···M(mQ−2ε1)])≥1−ε1.

Now generate a set{ξi :i =1, . . . ,L}from{ ˜ξr :r =1, . . . ,R}by attaching various suffixes of length[MmQ] − [M(mQ−2ε1)]to eachξ˜r in such a way thatP_iL=1ΨQ|[1···MmQ](ξ

i₎_≥₁₋₂_ε

1.

As eachξi agrees with somezk except for a very short initial piece and a short final piece,(73) implies|J_M,ε

1(z

k₎_{| ≥}_exp _M₍_Hc

L(Q)−4ε1)

for eachiwhenMis large enough. By choosing ε1small enough, this proves(65)and(66).

(25)

Step2. LetA := {ξi :i =1, . . . ,L}denote the set of words of length[MmQ]constructed in

Step 1. ForK ≥ [MmQ](we think ofK [MmQ]) andε3>0 denote the set of allx ∈ EK

such that 1 N#{1≤ j≤ K− [MmQ] :(xj, . . . ,xj+[MmQ]−1)=ξ i_{} −}_Ψ Q|[1···MmQ](ξ i₎ < ε3 / L

for allξi ∈A byD_K,ε₃. Note thatx∈D_K,ε

3 means that the letter sequencexis typical forΨQ

in the sense that the frequency of all the patternsξi (i =1, . . . ,L) chosen above is close to the theoretical value.

We claim that for anyε >0 we can choose the aboveε1, ε2, ε3sufficiently small and L,M

sufficiently large andN0∈Nsuch that

N ≥N0,x∈D_[_mQ_N_]_,ε 3 =⇒ X 0<j1<···<jN=[mQN], ˜ R N_j 1,...,jN(x)∈ ˜OQ N Y i=1 ρji−ji−1 ≥exp N EQ logρL1+H c L(Q)−ε . (74)

Note that(74)implies the claim of the lemma by choosingU as

n Ψ ∈Pshift(EN): |Ψ|_[1···_m QM](ξ i₎₋_Ψ Q|[1···mQM](ξi)|< ε3/(2L),i=1, . . . ,L o .

Step3. It remains to prove(74); the idea is as follows:x∈D_[_m

QN],ε3 implies that we can cover

x with≈ N/Mnon-overlapping patterns fromA := {ξi,i =1, . . . ,L}, up to a small fraction of remaining ‘gaps’. On each of the patterns from the ‘almost covering’, we have by construction sufficiently many choices of ‘good cut-points’, and the probability that the jumps bridge exactly the given ‘gaps’ is controlled on the exponential scale because the total gap length is only a small fraction ofN. Here are the details:

x∈D_[_m

QN],ε3 implies

#j ≤ [mQN] − [mQM] :one of the words fromA starts at j ≥ [N mQ](1−ε2−ε3)

(asPL

i=1(ΨQ(ξi)−ε3/L)≥1−ε2−ε3). Letn1:=n0∨(2mQ/δ0), wheren0,δ0are as given

byLemma 10. WhenNis large enough, we can find ˜

ε∈ [ε/2,2ε] (75)

(˜εwill implicitly depend onN because we require certain expressions below to be integers, but (75)will be satisfied independently ofN) such that

k=(1− ˜ε)N

M ∈N

and we can findkpositions

n1≤r1<· · ·<rk ≤ [N mQ] −n1

where one of the patterns fromA is written onx, i.e.

x|[rj···rj+[MmQ]−1]=ξ

i _{for some}_i _{∈ {}₁_{, . . . ,}_L_}_,_j₌₁_{, . . . ,}_k_,

and

(26)

Note that between the end of the(j−1)-th and the beginning of the j-th subword fromA onx, there is a ‘gap’ of length

sj :=rj −rj−1− [mQM] (≥n1).

The total length of these gaps is

s1+ · · · +sk+1= [N mQ] −(1− ˜ε)

N

M[MmQ] =ε 0

N mQ.

The display above implicitly definesε0; whenN(andM) are large enough, it will satisfy ε0_∈

˜

ε(1−δ₀/2),ε(˜ 1+δ₀/2), (76)

whereδ0is as given byLemma 10.

The sum appearing in(74)hasN summation variables, and on each of thek‘good subwords ofx’ fixed above, we will useMof them. Thus there remain

N−k M= ˜εN

summation variables which we can use to ‘fill the gaps’. We can findm1, . . . ,mk+1 ∈ Nsuch

that m1+ · · · +mk+1= ˜εN and (1−δ₀) sj mQ ≤mj ≤(1+δ0) sj mQ, j =1, . . . ,k+1.

To see this consider firstm˜i :=(si/mQ)(ε/ε˜ 0). Then we havePm˜i = ˜εN, but them˜i need not

be integers. On the other hand we have

si mQ( 1−δ₀)≤ ˜mi−1≤ d ˜mie ≤ ˜mi+1≤ si mQ( 1+δ₀) andS := Pk+1

i=1(m˜i − d ˜mie) ∈ {0,1, . . . ,k+1}. Then put e.g. mi := d ˜mie +1 ifi ≤ S,

mi := d ˜mieotherwise. In order to generate vectors(j1, . . . ,jN)suitable for the right hand side

of(74), we can proceed as follows: On the ‘good subwords’, choose anyM-vector of cut-points from the correspondingJM,ε1, and usemi summation variables to generate the ‘jump’ over the

i-th gap. UsingLemma 10, we can choose for each gapmi cut-lengths whose total probability

underρis at least exp(−C si). By the definition ofJM,ε1, any such vector(j1, . . . ,jN)will have

the property that RN_j

1,...,jN(x)∈ ˜OQ because the contribution to R N

j1,...,jN(x)from the ‘gaps’ is

negligible. Furthermore, again by the definition ofJM,ε1and the choice of the cut-points on the

gaps, we have N Y i=1 ρji−ji−1 ≥ exp k M EQlogρL1 −ε1 ×exp −Cε0mQN =exp N (1− ˜ε)(_EQlogρL1 −ε1)−ε 0 C mQ

for each such choice. Finally, by(66)there are at least

(27)

admissible choices of cut-points on the good subwords. Combining, we obtain that the right hand side of(74)is at least exp N (1− ˜ε)(_EQlogρL1+H c L(Q)−ε1−ε2)−ε 0 C mQ

wheneverN is sufficiently large.

The following lemma is a standard result for aperiodic renewal processes:

Lemma 10. Let mQ ∈ [1,∞). There exist n0 ∈ N, C >0 andδ0 >0such that for any pair

(m,n)∈_N2satisfying n≥n0, n mQ (1−δ₀)≤m≤ n mQ (1+δ₀)

there are`1, . . . , `mwith

`1+ · · · +`m =n and

m

Y

i=1

ρ`i ≥exp(−C n).

Remark 11. Note that the proof ofProposition 2viaLemma 9is rather combinatoric. At least in the case of a neighbourhood of anergodic Q ∈ R, one can use the coupling betweenX and a shift ofκ(Y)under Qˆ given byRemark 6to employ a ‘conditional tilting’ argument which is more in the probabilistic spirit of classical proofs of lower large deviation bounds.

Acknowledgements

The author would like to thank Andreas Greven and Frank den Hollander for very stimulating discussions on this subject.

References

[1] Erwin Bolthausen, Frank den Hollander, Localization transition for a polymer near an interface, Ann. Probab. 25 (3) (1997) 1334–1366.

[2] Francis Comets, Large deviation estimates for a conditional probability distribution. Applications to random interaction Gibbs measures, Probab. Theory Related Fields 80 (3) (1989) 407–432.

[3] Amir Dembo, Ofer Zeitouni, Large Deviations Techniques and Applications, Jones and Bartlett Publishers, Boston, 1993.

[4] Jean-Dominique Deuschel, Daniel W. Stroock, Large Deviations, Academic Press, Boston, 1989. [5] Richard S. Ellis, Entropy, Large Deviations, and Statistical Mechanics, Springer-Verlag, New York, 1985. [6] Hans F¨ollmer, Steven Orey, Large deviations for the empirical field of a Gibbs measure, Ann. Probab. 16 (3) (1988)