Large Deviations* - Probability: Theory and Examples. Rick Durrett. Edition 4.1, April 21, 2013

2.6 Large Deviations*

Let $X_1, X_2, \ldots$ be i.i.d. and let $S_n = X_1 + \cdots + X_n$. In this section, we will investigate the rate at which $P(S_n > na) \to 0$ for $a > \mu = EX_i$. We will ultimately conclude that if the moment-generating function $\varphi(\theta) = E\exp(\theta X_i) < \infty$ for some $\theta > 0$, then $P(S_n \ge na) \to 0$ exponentially rapidly, and we will identify

$$\gamma(a) = \lim_{n\to\infty} \frac{1}{n} \log P(S_n \ge na)$$

Our first step is to prove that the limit exists. This is based on an observation that will be useful several times below. Let $\pi_n = P(S_n \ge na)$. Then

$$\pi_{m+n} \ge P(S_m \ge ma,\ S_{n+m} - S_m \ge na) = \pi_m \pi_n$$

since $S_m$ and $S_{n+m} - S_m$ are independent. Letting $\gamma_n = \log \pi_n$ transforms multiplication into addition.

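Lemma 2.6.1. If $\gamma_{m+n} \ge \gamma_m + \gamma_n$ then as $n \to \infty$, $\gamma_n/n \to \sup_m \gamma_m/m$.

Proof. Clearly, $\limsup \gamma_n/n \le \sup_m \gamma_m/m$. To complete the proof, it suffices to prove that for any $m$, $\liminf \gamma_n/n \ge \gamma_m/m$. Writing $n = km + \ell$ with $0 \le \ell < m$ and making repeated use of the hypothesis gives $\gamma_n \ge k\gamma_m + \gamma_\ell$. Dividing by $n = km + \ell$ gives

$$\frac{\gamma_n}{n} \ge \left(\frac{km}{km+\ell}\right) \frac{\gamma_m}{m} + \frac{\gamma_\ell}{n}$$

Letting $n \to \infty$ and recalling $n = km + \ell$ with $0 \le \ell < m$ gives the desired result.

Lemma 2.6.1 implies that $\lim_{n\to\infty} \frac{1}{n} \log P(S_n \ge na) = \gamma(a)$ exists and is $\le 0$. It follows from the formula for the limit that

$$P(S_n \ge na) \le e^{n\gamma(a)} \tag{2.6.1}$$

The last two observations give us some useful information about $\gamma(a)$.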

Exercise 2.6.1. The following are equivalent: (a) $\gamma(a) = -\infty$, (b) $P(X_1 \ge a) = 0$, and (c) $P(S_n \ge na) = 0$ for all $n$.

Exercise 2.6.2. Use the definition to conclude that if $\lambda \in [0,1]$ is rational then

$$\gamma(\lambda a + (1-\lambda)b) \ge \lambda\gamma(a) + (1-\lambda)\gamma(b).$$

Use monotonicity to conclude that the last relationship holds for all $\lambda \in [0,1]$, so $\gamma$ is concave and hence Lipschitz continuous on compact subsets of $\gamma(a) > -\infty$.
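The superadditivity observation can be checked numerically. The sketch below is only an illustration under an added assumption: it takes the $X_i$ to be fair $\pm 1$ coin flips so that $\pi_n = P(S_n \ge na)$ can be computed exactly with rational arithmetic. The function names `tail_prob`, `is_superadditive`, and `gamma_over_n` are hypothetical helpers, not notation from the text.

```python
from fractions import Fraction
from math import ceil, comb, log

def tail_prob(n, a):
    """Exact pi_n = P(S_n >= n*a) for n fair +/-1 coin flips (a is a Fraction)."""
    # S_n = 2H - n with H ~ Binomial(n, 1/2), so S_n >= n*a iff H >= n(1 + a)/2.
    h_min = max(0, ceil(n * (1 + a) / 2))
    return Fraction(sum(comb(n, h) for h in range(h_min, n + 1)), 2 ** n)

def is_superadditive(a, max_n=30):
    """Check pi_{m+n} >= pi_m * pi_n for all 1 <= m, n <= max_n."""
    pi = {n: tail_prob(n, a) for n in range(1, 2 * max_n + 1)}
    return all(pi[m + n] >= pi[m] * pi[n]
               for m in range(1, max_n + 1)
               for n in range(1, max_n + 1))

def gamma_over_n(n, a):
    """gamma_n / n = (1/n) log P(S_n >= n*a)."""
    return log(tail_prob(n, a)) / n

a = Fraction(1, 2)  # illustrative choice of a
print(is_superadditive(a))
for n in (10, 100, 1000):
    print(n, gamma_over_n(n, a))
```

Exact fractions are used for the comparison $\pi_{m+n} \ge \pi_m\pi_n$ so that rounding cannot affect the check; floating point enters only when taking logarithms.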

The conclusions above are valid for any distribution. For the rest of this section, we will suppose:

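(H1) $\varphi(\theta) = E\exp(\theta X_i) < \infty$ for some $\theta > 0$

Let $\theta_+ = \sup\{\theta : \varphi(\theta) < \infty\}$, $\theta_- = \inf\{\theta : \varphi(\theta) < \infty\}$ and note that $\varphi(\theta) < \infty$ for $\theta \in (\theta_-, \theta_+)$. (H1) implies that $EX_i^+ < \infty$ so $\mu = EX^+ - EX^- \in [-\infty, \infty)$. If $\theta > 0$, Chebyshev's inequality implies

$$e^{\theta n a} P(S_n \ge na) \le E\exp(\theta S_n) = \varphi(\theta)^n$$

or, letting $\kappa(\theta) = \log\varphi(\theta)$,

$$P(S_n \ge na) \le \exp(-n\{a\theta - \kappa(\theta)\}) \tag{2.6.2}$$

Our first goal is to show: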

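Lemma 2.6.2. If $a > \mu$ and $\theta > 0$ is small then $a\theta - \kappa(\theta) > 0$.

Proof. $\kappa(0) = \log\varphi(0) = 0$, so it suffices to show that (i) $\kappa$ is continuous at 0, (ii) differentiable on $(0, \theta_+)$, and (iii) $\kappa'(\theta) \to \mu$ as $\theta \to 0$. For then

$$a\theta - \kappa(\theta) = \int_0^\theta a - \kappa'(x)\,dx > 0 \quad \text{for small } \theta.$$

Let $F(x) = P(X_i \le x)$. To prove (i) we note that if $0 < \theta < \theta_0 < \theta_+$

$$e^{\theta x} \le 1 + e^{\theta_0 x} \tag{$*$}$$

so by the dominated convergence theorem as $\theta \to 0$

$$\int e^{\theta x}\,dF \to \int 1\,dF = 1$$

To prove (ii) we note that if $|h| < h_0$ then

$$|e^{hx} - 1| = \left|\int_0^{hx} e^y\,dy\right| \le |hx|\,e^{h_0 x}$$

so an application of the dominated convergence theorem shows that

$$\varphi'(\theta) = \lim_{h\to 0} \frac{\varphi(\theta+h) - \varphi(\theta)}{h} = \lim_{h\to 0} \int \frac{e^{hx} - 1}{h}\, e^{\theta x}\,dF(x) = \int x e^{\theta x}\,dF(x) \quad \text{for } \theta \in (0, \theta_+)$$

From the last equation, it follows that $\kappa(\theta) = \log\varphi(\theta)$ has $\kappa'(\theta) = \varphi'(\theta)/\varphi(\theta)$. Using ($*$) and the dominated convergence theorem gives (iii) and the proof is complete.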
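To see the bound (2.6.2) and Lemma 2.6.2 in action, here is a minimal Python sketch. It assumes, purely for illustration, that the $X_i$ are fair $\pm 1$ coin flips, so that $\varphi(\theta) = \cosh\theta$, $\mu = 0$, and the tail probability can be computed exactly. The helper names `tail_prob`, `kappa`, `exponent`, and `chernoff_bound` are hypothetical.

```python
from fractions import Fraction
from math import ceil, comb, cosh, exp, log

def tail_prob(n, a):
    """Exact P(S_n >= n*a) for n fair +/-1 coin flips (a given as a Fraction)."""
    # S_n = 2H - n with H ~ Binomial(n, 1/2), so S_n >= n*a iff H >= n(1 + a)/2.
    h_min = max(0, ceil(n * (1 + Fraction(a)) / 2))
    return sum(comb(n, h) for h in range(h_min, n + 1)) / 2 ** n

def kappa(theta):
    """kappa(theta) = log phi(theta), with phi(theta) = cosh(theta) for coin flips."""
    return log(cosh(theta))

def exponent(a, theta):
    """The quantity a*theta - kappa(theta) appearing in (2.6.2)."""
    return float(a) * theta - kappa(theta)

def chernoff_bound(n, a, theta):
    """Right-hand side of (2.6.2): exp(-n {a*theta - kappa(theta)})."""
    return exp(-n * exponent(a, theta))

n, a = 100, Fraction(3, 10)  # illustrative values with a > mu = 0
p = tail_prob(n, a)
for theta in (0.05, 0.1, 0.3, 0.5, 0.7):
    print(theta, exponent(a, theta), p, chernoff_bound(n, a, theta))
```

For small $\theta$ the exponent is positive, as the lemma asserts. For larger $\theta$ it can turn negative, in which case the bound exceeds 1 and says nothing, which is one reason to optimize over $\theta$.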

Having found an upper bound on $P(S_n \ge na)$, it is natural to optimize it by finding the maximum of $\theta a - \kappa(\theta)$:

$$\frac{d}{d\theta}\{\theta a - \log\varphi(\theta)\} = a - \varphi'(\theta)/\varphi(\theta)$$

so (assuming things are nice) the maximum occurs when $a = \varphi'(\theta)/\varphi(\theta)$. To turn the parenthetical clause into a mathematical hypothesis we begin by defining

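$$F_\theta(x) = \frac{1}{\varphi(\theta)} \int_{-\infty}^{x} e^{\theta y}\,dF(y)$$

whenever $\varphi(\theta) < \infty$. It follows from the proof of Lemma 2.6.2 that if $\theta \in (\theta_-, \theta_+)$, $F_\theta$ is a distribution function with mean

$$\int x\,dF_\theta(x) = \frac{1}{\varphi(\theta)} \int_{-\infty}^{\infty} x e^{\theta x}\,dF(x) = \frac{\varphi'(\theta)}{\varphi(\theta)}$$

Repeating the proof in Lemma 2.6.2, it is easy to see that if $\theta \in (\theta_-, \theta_+)$ then

$$\varphi''(\theta) = \int_{-\infty}^{\infty} x^2 e^{\theta x}\,dF(x)$$

So we have

$$\frac{d}{d\theta} \frac{\varphi'(\theta)}{\varphi(\theta)} = \frac{\varphi''(\theta)}{\varphi(\theta)} - \left(\frac{\varphi'(\theta)}{\varphi(\theta)}\right)^2 = \int x^2\,dF_\theta(x) - \left(\int x\,dF_\theta(x)\right)^2 \ge 0$$

since the last expression is the variance of $F_\theta$. If we assume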

(H2) the distribution $F$ is not a point mass at $\mu$

then $\varphi'(\theta)/\varphi(\theta)$ is strictly increasing and $a\theta - \log\varphi(\theta)$ is concave. Since we have $\varphi'(0)/\varphi(0) = \mu$, this shows that for each $a > \mu$ there is at most one $\theta_a \ge 0$ that solves $a = \varphi'(\theta_a)/\varphi(\theta_a)$, and this value of $\theta$ maximizes $a\theta - \log\varphi(\theta)$. Before discussing the existence of $\theta_a$ we will consider some examples.

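Example 2.6.1. Normal distribution.

$$\int e^{\theta x} (2\pi)^{-1/2} \exp(-x^2/2)\,dx = \exp(\theta^2/2) \int (2\pi)^{-1/2} \exp(-(x-\theta)^2/2)\,dx$$

The integrand in the last integral is the density of a normal distribution with mean $\theta$ and variance 1, so $\varphi(\theta) = \exp(\theta^2/2)$, $\theta \in (-\infty, \infty)$. In this case, $\varphi'(\theta)/\varphi(\theta) = \theta$ and

$$F_\theta(x) = e^{-\theta^2/2} \int_{-\infty}^{x} e^{\theta y} (2\pi)^{-1/2} e^{-y^2/2}\,dy$$

is a normal distribution with mean $\theta$ and variance 1.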

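Example 2.6.2. Exponential distribution with parameter $\lambda$. If $\theta < \lambda$

$$\int_0^\infty e^{\theta x} \lambda e^{-\lambda x}\,dx = \lambda/(\lambda - \theta)$$

so $\varphi'(\theta)/\varphi(\theta) = 1/(\lambda - \theta)$ and

$$F_\theta(x) = \frac{\lambda - \theta}{\lambda} \int_0^x e^{\theta y} \lambda e^{-\lambda y}\,dy$$

is an exponential distribution with parameter $\lambda - \theta$ and hence mean $1/(\lambda - \theta)$.

Example 2.6.3. Coin flips. $P(X_i = 1) = P(X_i = -1) = 1/2$

$$\varphi(\theta) = (e^\theta + e^{-\theta})/2$$

$$\varphi'(\theta)/\varphi(\theta) = (e^\theta - e^{-\theta})/(e^\theta + e^{-\theta})$$

$F_\theta(\{x\})/F(\{x\}) = e^{\theta x}/\varphi(\theta)$ so

$$F_\theta(\{1\}) = e^\theta/(e^\theta + e^{-\theta}) \quad \text{and} \quad F_\theta(\{-1\}) = e^{-\theta}/(e^\theta + e^{-\theta})$$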

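Example 2.6.4. Perverted exponential. Let $g(x) = Cx^{-3}e^{-x}$ for $x \ge 1$, $g(x) = 0$ otherwise, and choose $C$ so that $g$ is a probability density. In this case,

$$\varphi(\theta) = \int e^{\theta x} g(x)\,dx < \infty$$

if and only if $\theta \le 1$, and when $\theta \le 1$, we have

$$\frac{\varphi'(\theta)}{\varphi(\theta)} \le \frac{\varphi'(1)}{\varphi(1)} = \int_1^\infty Cx^{-2}\,dx \bigg/ \int_1^\infty Cx^{-3}\,dx = 2$$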

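Recall $\theta_+ = \sup\{\theta : \varphi(\theta) < \infty\}$. In Examples 2.6.1 and 2.6.2, we have $\varphi'(\theta)/\varphi(\theta) \uparrow \infty$ as $\theta \uparrow \theta_+$ so we can solve $a = \varphi'(\theta)/\varphi(\theta)$ for any $a > \mu$. In Example 2.6.3, $\varphi'(\theta)/\varphi(\theta) \uparrow 1$ as $\theta \to \infty$, but we cannot hope for much more since $F$ and hence $F_\theta$ is supported on $\{-1, 1\}$.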
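The behavior of the tilted distributions $F_\theta$ can be checked numerically for a discrete $F$. The following is a small sketch, assuming $F$ is given as a dictionary mapping support points to probabilities; the names `tilt`, `mean`, and `variance` are hypothetical helpers introduced here. For the coin flips of Example 2.6.3, the mean of $F_\theta$ should equal $\varphi'(\theta)/\varphi(\theta) = \tanh\theta$, and the variance of $F_\theta$ should be positive.

```python
from math import exp, tanh

def tilt(pmf, theta):
    """Tilted pmf: F_theta({x}) = e^{theta x} F({x}) / phi(theta)."""
    phi = sum(p * exp(theta * x) for x, p in pmf.items())
    return {x: p * exp(theta * x) / phi for x, p in pmf.items()}

def mean(pmf):
    """Mean of a discrete distribution given as {point: probability}."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """Variance of a discrete distribution given as {point: probability}."""
    m = mean(pmf)
    return sum((x - m) ** 2 * p for x, p in pmf.items())

coin = {1: 0.5, -1: 0.5}  # Example 2.6.3
for theta in (0.0, 0.5, 1.0, 3.0):
    f_theta = tilt(coin, theta)
    # Columns: theta, mean of F_theta, tanh(theta), variance of F_theta.
    print(theta, mean(f_theta), tanh(theta), variance(f_theta))
```

The printed means increase with $\theta$ and stay below 1, in line with the remark that $F_\theta$ is supported on $\{-1, 1\}$.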

Exercise 2.6.3. Let $x_o = \sup\{x : F(x) < 1\}$. Show that if $x_o < \infty$ then $\varphi(\theta) < \infty$ for all $\theta > 0$ and $\varphi'(\theta)/\varphi(\theta) \to x_o$ as $\theta \uparrow \infty$.

Example 2.6.4 presents a problem since we cannot solve $a = \varphi'(\theta)/\varphi(\theta)$ when $a > 2$. Theorem 2.6.5 will cover this problem case, but first we will treat the cases in which we can solve the equation.

Theorem 2.6.3. Suppose in addition to (H1) and (H2) that there is a $\theta_a \in (0, \theta_+)$ so that $a = \varphi'(\theta_a)/\varphi(\theta_a)$. Then, as $n \to \infty$,

$$n^{-1} \log P(S_n \ge na) \to -a\theta_a + \log\varphi(\theta_a)$$

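Proof. The fact that the limsup of the left-hand side $\le$ the right-hand side follows from (2.6.2). To prove the other inequality, pick $\lambda \in (\theta_a, \theta_+)$, let $X_1^\lambda, X_2^\lambda, \ldots$ be i.i.d. with distribution $F_\lambda$ and let $S_n^\lambda = X_1^\lambda + \cdots + X_n^\lambda$. Writing $dF/dF_\lambda$ for the Radon-Nikodym derivative of the associated measures, it is immediate from the definition that $dF/dF_\lambda = e^{-\lambda x}\varphi(\lambda)$. If we let $F_\lambda^n$ and $F^n$ denote the distributions of $S_n^\lambda$ and $S_n$, then

Lemma 2.6.4. $\dfrac{dF^n}{dF_\lambda^n} = e^{-\lambda x}\varphi(\lambda)^n$.

Proof. We will prove this by induction. The result holds when $n = 1$. For $n > 1$, we note that

$$\begin{aligned}
F^n = F^{n-1} * F(z) &= \int_{-\infty}^{\infty} dF^{n-1}(x) \int_{-\infty}^{z-x} dF(y) \\
&= \int dF_\lambda^{n-1}(x) \int dF_\lambda(y)\, 1_{(x+y \le z)}\, e^{-\lambda(x+y)} \varphi(\lambda)^n \\
&= E\left( 1_{(S_{n-1}^\lambda + X_n^\lambda \le z)}\, e^{-\lambda(S_{n-1}^\lambda + X_n^\lambda)} \varphi(\lambda)^n \right) \\
&= \int_{-\infty}^{z} dF_\lambda^n(u)\, e^{-\lambda u} \varphi(\lambda)^n
\end{aligned}$$

where in the last two equalities we have used Theorem 1.6.9 for $(S_{n-1}^\lambda, X_n^\lambda)$ and $S_n^\lambda$.

If $\nu > a$, then the lemma and monotonicity imply

$$P(S_n \ge na) \ge \int_{na}^{n\nu} e^{-\lambda x} \varphi(\lambda)^n\,dF_\lambda^n(x) \ge \varphi(\lambda)^n e^{-\lambda n \nu} \left(F_\lambda^n(n\nu) - F_\lambda^n(na)\right) \tag{$*$}$$

$F_\lambda$ has mean $\varphi'(\lambda)/\varphi(\lambda)$, so if we have $a < \varphi'(\lambda)/\varphi(\lambda) < \nu$, then the weak law of large numbers implies

$$F_\lambda^n(n\nu) - F_\lambda^n(na) \to 1 \quad \text{as } n \to \infty$$

From the last conclusion and ($*$) it follows that

$$\liminf_{n\to\infty} n^{-1} \log P(S_n > na) \ge -\lambda\nu + \log\varphi(\lambda)$$

Since $\lambda > \theta_a$ and $\nu > a$ are arbitrary, the proof is complete.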
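The change of measure in Lemma 2.6.4 also suggests a way to estimate $P(S_n \ge na)$ by simulation: sample $S_n^\lambda$ under $F_\lambda$ and weight each sample by $e^{-\lambda S_n^\lambda}\varphi(\lambda)^n$. The sketch below is an illustration only, not part of the proof. It assumes fair $\pm 1$ coin flips so that the exact answer is available for comparison, and it takes $\lambda = \theta_a$ (rather than $\lambda > \theta_a$ as in the proof) so that $F_\lambda$ has mean exactly $a$. The names `exact_tail` and `tilted_estimate`, the seed, and the sample size are arbitrary choices made here.

```python
import random
from math import atanh, comb, cosh, exp

def exact_tail(n, k):
    """Exact P(S_n >= k) for n fair +/-1 coin flips; k must have the parity of n."""
    # S_n = 2H - n with H ~ Binomial(n, 1/2), so S_n >= k iff H >= (n + k)/2.
    return sum(comb(n, h) for h in range((n + k) // 2, n + 1)) / 2 ** n

def tilted_estimate(n, k, lam, samples, rng):
    """Estimate P(S_n >= k) by sampling S_n under F_lam and reweighting."""
    p_up = exp(lam) / (exp(lam) + exp(-lam))  # F_lam({1}) from Example 2.6.3
    phi_n = cosh(lam) ** n                    # phi(lam)^n
    total = 0.0
    for _ in range(samples):
        s = sum(1 if rng.random() < p_up else -1 for _ in range(n))
        if s >= k:
            # Weight dF^n/dF^n_lam evaluated at s, as in Lemma 2.6.4.
            total += exp(-lam * s) * phi_n
    return total / samples

rng = random.Random(0)   # fixed seed so the run is repeatable
n, k = 50, 20            # k = n*a with a = 0.4
lam = atanh(0.4)         # solves tanh(lam) = a
exact = exact_tail(n, k)
estimate = tilted_estimate(n, k, lam, 20000, rng)
print(exact, estimate)
```

Under the tilted measure the event $\{S_n \ge na\}$ is no longer rare, which is the same mechanism that drives the lower bound in the proof.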

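To get a feel for what the answers look like, we consider our examples. To prepare for the computations, we recall some important information:

$$\kappa(\theta) = \log\varphi(\theta) \qquad \kappa'(\theta) = \varphi'(\theta)/\varphi(\theta) \qquad \theta_a \text{ solves } \kappa'(\theta_a) = a$$

$$\gamma(a) = \lim_{n\to\infty} (1/n) \log P(S_n \ge na) = -a\theta_a + \kappa(\theta_a)$$

Normal distribution (Example 2.6.1)

$$\kappa(\theta) = \theta^2/2 \qquad \kappa'(\theta) = \theta \qquad \theta_a = a$$

$$\gamma(a) = -a\theta_a + \kappa(\theta_a) = -a^2/2$$

Exercise 2.6.4. Check the last result by observing that $S_n$ has a normal distribution with mean 0 and variance $n$, and then using Theorem 1.2.3.

Exponential distribution (Example 2.6.2) with $\lambda = 1$

$$\kappa(\theta) = -\log(1-\theta) \qquad \kappa'(\theta) = 1/(1-\theta) \qquad \theta_a = 1 - 1/a$$

$$\gamma(a) = -a\theta_a + \kappa(\theta_a) = -a + 1 + \log a$$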
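When $\kappa'(\theta) = a$ has no closed-form solution, $\theta_a$ can be found numerically, since $\kappa'$ is increasing. The following is a minimal sketch using bisection; the function names `solve_theta` and `rate`, the search intervals, and the values of $a$ are assumptions made for illustration. It is checked against the two closed forms above.

```python
from math import log

def solve_theta(kappa_prime, a, lo, hi, tol=1e-12):
    """Bisection for kappa'(theta) = a on (lo, hi), assuming kappa' is increasing."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if kappa_prime(mid) < a:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def rate(kappa, kappa_prime, a, lo, hi):
    """Return (theta_a, gamma(a)) with gamma(a) = -a*theta_a + kappa(theta_a)."""
    theta_a = solve_theta(kappa_prime, a, lo, hi)
    return theta_a, -a * theta_a + kappa(theta_a)

# Normal(0, 1): kappa(theta) = theta^2/2; the search interval [0, 50] is arbitrary.
print(rate(lambda t: t * t / 2, lambda t: t, 1.5, 0.0, 50.0))

# Exponential(1): kappa(theta) = -log(1 - theta), finite only for theta < 1.
print(rate(lambda t: -log(1 - t), lambda t: 1 / (1 - t), 3.0, 0.0, 1.0 - 1e-9))
```

For the normal case with $a = 1.5$ the output should be close to $\theta_a = 1.5$ and $\gamma(a) = -a^2/2$; for the exponential case with $a = 3$ it should be close to $\theta_a = 1 - 1/a$ and $\gamma(a) = -a + 1 + \log a$.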

With these two examples as models, the reader should be able to do

Exercise 2.6.5. Let $X_1, X_2, \ldots$ be i.i.d. Poisson with mean 1, and let $S_n = X_1 + \cdots + X_n$. Find $\lim_{n\to\infty} (1/n) \log P(S_n \ge na)$ for $a > 1$. The answer and another proof can be found in Exercise 3.1.4.

Coin flips (Example 2.6.3). Here we take a different approach. To find the $\theta$ that makes the mean of $F_\theta$ equal to $a$, we set $F_\theta(\{1\}) = e^\theta/(e^\theta + e^{-\theta}) = (1+a)/2$. Letting $x = e^\theta$ gives

$$2x = (1+a)(x + x^{-1}) \qquad (a-1)x^2 + (1+a) = 0$$

So $x = \sqrt{(1+a)/(1-a)}$ and $\theta_a = \log x = \{\log(1+a) - \log(1-a)\}/2$.

$$\varphi(\theta_a) = \frac{e^{\theta_a} + e^{-\theta_a}}{2} = \frac{e^{\theta_a}}{1+a} = \frac{1}{\sqrt{(1+a)(1-a)}}$$

$$\gamma(a) = -a\theta_a + \kappa(\theta_a) = -\{(1+a)\log(1+a) + (1-a)\log(1-a)\}/2$$

In Exercise 3.1.3, this result will be proved by a direct computation. Since the formula for $\gamma(a)$ is rather ugly, the following simpler bound is useful.

Exercise 2.6.6. Show that for coin flips $\varphi(\theta) \le \exp(\varphi(\theta) - 1) \le \exp(\beta\theta^2)$ for $\theta \le 1$ where $\beta = \sum_{n=1}^\infty 1/(2n)! = \cosh(1) - 1 \approx 0.543$, and use (2.6.2) to conclude that $P(S_n \ge an) \le \exp(-na^2/4\beta)$ for all $a \in [0,1]$. It is customary to simplify this further by using $\beta \le \sum_{n=1}^\infty 2^{-n} = 1$.
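The coin-flip rate and the simpler quadratic bound can be compared numerically. The sketch below is illustrative only: it computes the exact tail for fair $\pm 1$ coin flips, the rate $\gamma(a)$ from the closed form, and the exponent $-a^2/4\beta$ with $\beta$ summed from its series. The names `tail_prob` and `gamma`, and the values of $a$ and $n$, are choices made here.

```python
from fractions import Fraction
from math import ceil, comb, factorial, log

def tail_prob(n, a):
    """Exact P(S_n >= n*a) for n fair +/-1 coin flips (a given as a Fraction)."""
    # S_n = 2H - n with H ~ Binomial(n, 1/2), so S_n >= n*a iff H >= n(1 + a)/2.
    h_min = max(0, ceil(n * (1 + Fraction(a)) / 2))
    return sum(comb(n, h) for h in range(h_min, n + 1)) / 2 ** n

def gamma(a):
    """Coin-flip rate -{(1+a) log(1+a) + (1-a) log(1-a)}/2 for 0 <= a < 1."""
    return -((1 + a) * log(1 + a) + (1 - a) * log(1 - a)) / 2

# beta = sum_{j >= 1} 1/(2j)!, truncated where the terms are negligible.
beta = sum(1 / factorial(2 * j) for j in range(1, 20))

a = Fraction(3, 10)
for n in (10, 100, 1000):
    # Columns: n, (1/n) log P(S_n >= na), gamma(a), -a^2/(4 beta).
    print(n, log(tail_prob(n, a)) / n, gamma(float(a)), -float(a) ** 2 / (4 * beta))
```

In each row the three exponents should appear in increasing order, reflecting $P(S_n \ge na) \le e^{n\gamma(a)} \le \exp(-na^2/4\beta)$.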

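Turning now to the problematic values for which we cannot solve $a = \varphi'(\theta_a)/\varphi(\theta_a)$, we begin by observing that if $x_o = \sup\{x : F(x) < 1\}$ and $F$ is not a point mass at $x_o$ then $\varphi'(\theta)/\varphi(\theta) \uparrow x_o$ as $\theta \uparrow \infty$ but $\varphi'(\theta)/\varphi(\theta) < x_o$ for all $\theta < \infty$. However, the result for $a = x_o$ is trivial:

$$\frac{1}{n} \log P(S_n \ge n x_o) = \log P(X_i = x_o) \quad \text{for all } n$$

Exercise 2.6.7. Show that as $a \uparrow x_o$, $\gamma(a) \downarrow \log P(X_i = x_o)$.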

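Theorem 2.6.5. Suppose $x_o = \infty$, $\theta_+ < \infty$, and $\varphi'(\theta)/\varphi(\theta)$ increases to a finite limit $a_0$ as $\theta \uparrow \theta_+$. If $a_0 \le a < \infty$

$$n^{-1} \log P(S_n \ge na) \to -a\theta_+ + \log\varphi(\theta_+)$$

i.e., $\gamma(a)$ is linear for $a \ge a_0$.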

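Proof. Since $(\log\varphi(\theta))' = \varphi'(\theta)/\varphi(\theta)$, integrating from 0 to $\theta_+$ shows that $\log(\varphi(\theta_+)) < \infty$. Letting $\theta = \theta_+$ in (2.6.2) shows that the limsup of the left-hand side $\le$ the right-hand side. To get the other direction we will use the transformed distribution $F_\lambda$, for $\lambda = \theta_+$. Letting $\theta \uparrow \theta_+$ and using the dominated convergence theorem for $x \le 0$ and the monotone convergence theorem for $x \ge 0$, we see that $F_\lambda$ has mean $a_0$. From ($*$) in the proof of Theorem 2.6.3, we see that if $a_0 \le a < \nu = a + 3\epsilon$

$$P(S_n \ge na) \ge \varphi(\lambda)^n e^{-n\lambda\nu} \left(F_\lambda^n(n\nu) - F_\lambda^n(na)\right)$$

and hence

$$\frac{1}{n} \log P(S_n \ge na) \ge \log\varphi(\lambda) - \lambda\nu + \frac{1}{n} \log P(S_n^\lambda \in (na, n\nu])$$

Letting $X_1^\lambda, X_2^\lambda, \ldots$ be i.i.d. with distribution $F_\lambda$ and $S_n^\lambda = X_1^\lambda + \cdots + X_n^\lambda$, we have

$$\begin{aligned}
P(S_n^\lambda \in (na, n\nu]) &\ge P\{S_{n-1}^\lambda \in ((a_0 - \epsilon)n, (a_0 + \epsilon)n]\} \cdot P\{X_n^\lambda \in ((a - a_0 + \epsilon)n, (a - a_0 + 2\epsilon)n]\} \\
&\ge \frac{1}{2} P\{X_n^\lambda \in ((a - a_0 + \epsilon)n, (a - a_0 + \epsilon)(n+1)]\}
\end{aligned}$$

for large $n$ by the weak law of large numbers. To get a lower bound on the right-hand side of the last equation, we observe that

$$\limsup_{n\to\infty} \frac{1}{n} \log P(X_1^\lambda \in ((a - a_0 + \epsilon)n, (a - a_0 + \epsilon)(n+1)]) = 0$$

for if the lim sup was $< 0$, we would have $E\exp(\eta X_1^\lambda) < \infty$ for some $\eta > 0$ and hence $E\exp((\lambda + \eta)X_1) < \infty$, contradicting the definition of $\lambda = \theta_+$. To finish the argument now, we recall that Lemma 2.6.1 implies that

$$\lim_{n\to\infty} \frac{1}{n} \log P(S_n \ge na) = \gamma(a)$$

exists, so our lower bound on the lim sup is good enough.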

By adapting the proof of the last result, you can show that (H1) is necessary for exponential convergence:

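Exercise 2.6.8. Suppose $EX_i = 0$ and $E\exp(\theta X_i) = \infty$ for all $\theta > 0$. Then

$$\frac{1}{n} \log P(S_n \ge na) \to 0 \quad \text{for all } a > 0$$

Exercise 2.6.9. Suppose $EX_i = 0$. Show that if $\epsilon > 0$ then

$$\liminf_{n\to\infty} P(S_n \ge na)/nP(X_1 \ge n(a + \epsilon)) \ge 1$$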

Chapter 3

Central Limit Theorems

The first four sections of this chapter develop the central limit theorem. The last five treat various extensions and complements. We begin this chapter by considering special cases of these results that can be treated by elementary computations.

3.1 The De Moivre-Laplace Theorem

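Let $X_1, X_2, \ldots$ be i.i.d. with $P(X_1 = 1) = P(X_1 = -1) = 1/2$ and let $S_n = X_1 + \cdots + X_n$. In words, we are betting \$1 on the flipping of a fair coin and $S_n$ is our winnings at time $n$. If $n$ and $k$ are integers

$$P(S_{2n} = 2k) = \binom{2n}{n+k} 2^{-2n}$$

since $S_{2n} = 2k$ if and only if there are $n+k$ flips that are $+1$ and $n-k$ flips that are $-1$ in the first $2n$. The first factor gives the number of such outcomes and the second the probability of each one. Stirling's formula (see Feller, Vol. I. (1968), p. 52) tells us

$$n! \sim n^n e^{-n} \sqrt{2\pi n} \quad \text{as } n \to \infty \tag{3.1.1}$$

where $a_n \sim b_n$ means $a_n/b_n \to 1$ as $n \to \infty$, so

$$\binom{2n}{n+k} = \frac{(2n)!}{(n+k)!\,(n-k)!} \sim \frac{(2n)^{2n}}{(n+k)^{n+k}(n-k)^{n-k}} \cdot \frac{(2\pi(2n))^{1/2}}{(2\pi(n+k))^{1/2}(2\pi(n-k))^{1/2}}$$

and we have

$$\binom{2n}{n+k} 2^{-2n} \sim \left(1 + \frac{k}{n}\right)^{-n-k} \cdot \left(1 - \frac{k}{n}\right)^{-n+k} \cdot (\pi n)^{-1/2} \cdot \left(1 + \frac{k}{n}\right)^{-1/2} \cdot \left(1 - \frac{k}{n}\right)^{-1/2} \tag{3.1.2}$$

The first two terms on the right are

$$= \left(1 - \frac{k^2}{n^2}\right)^{-n} \cdot \left(1 + \frac{k}{n}\right)^{-k} \cdot \left(1 - \frac{k}{n}\right)^{k}$$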

A little calculus shows that:

Lemma 3.1.1. If $c_j \to 0$, $a_j \to \infty$ and $a_j c_j \to \lambda$ then $(1 + c_j)^{a_j} \to e^\lambda$.

Proof. As $x \to 0$, $\log(1+x)/x \to 1$, so $a_j \log(1 + c_j) \to \lambda$ and the desired result follows.
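A quick numerical check of Lemma 3.1.1 is given below. The particular sequences $c_j = \lambda/j + 1/j^2$ and $a_j = j$ are an arbitrary choice made here to satisfy the hypotheses, and `power_term` is a hypothetical helper name.

```python
from math import exp

def power_term(lam, j):
    """(1 + c_j)^{a_j} for the illustrative choice c_j = lam/j + 1/j^2, a_j = j."""
    c_j = lam / j + 1 / j ** 2   # c_j -> 0
    a_j = j                      # a_j -> infinity and a_j * c_j -> lam
    return (1 + c_j) ** a_j

lam = 2.0
for j in (10, 1000, 100000, 10000000):
    # Columns: j, (1 + c_j)^{a_j}, and the claimed limit e^lam.
    print(j, power_term(lam, j), exp(lam))
```

The extra $1/j^2$ term in $c_j$ is there to show that only the limit of $a_j c_j$ matters, not the exact form of $c_j$.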

Exercise 3.1.1. Generalize the last proof to conclude that if $\max_{1 \le j \le n} |c_{j,n}| \to 0$, $\sum_{j=1}^n c_{j,n} \to \lambda$, and $\sup_n \sum_{j=1}^n |c_{j,n}| < \infty$ then $\prod_{j=1}^n (1 + c_{j,n}) \to e^\lambda$.

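Using Lemma 3.1.1 now, we see that if $2k = x\sqrt{2n}$, i.e., $k = x\sqrt{n/2}$, then

$$\left(1 - \frac{k^2}{n^2}\right)^{-n} = \left(1 - x^2/2n\right)^{-n} \to e^{x^2/2}$$

$$\left(1 + \frac{k}{n}\right)^{-k} = \left(1 + x/\sqrt{2n}\right)^{-x\sqrt{n/2}} \to e^{-x^2/2}$$

$$\left(1 - \frac{k}{n}\right)^{k} = \left(1 - x/\sqrt{2n}\right)^{x\sqrt{n/2}} \to e^{-x^2/2}$$

For this choice of $k$, $k/n \to 0$, so

$$\left(1 + \frac{k}{n}\right)^{-1/2} \cdot \left(1 - \frac{k}{n}\right)^{-1/2} \to 1$$

and putting things together gives:

Theorem 3.1.2. If $2k/\sqrt{2n} \to x$ then $P(S_{2n} = 2k) \sim (\pi n)^{-1/2} e^{-x^2/2}$.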
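Theorem 3.1.2 can be checked against exact binomial probabilities. In the sketch below, the names `exact_prob` and `local_approx` and the choice $x = 1$ are illustrative assumptions; $k$ is rounded to an integer and the approximation uses the resulting value of $2k/\sqrt{2n}$.

```python
from math import comb, exp, pi, sqrt

def exact_prob(n, k):
    """P(S_{2n} = 2k) = C(2n, n+k) 2^{-2n}."""
    return comb(2 * n, n + k) / 4 ** n

def local_approx(n, k):
    """(pi n)^{-1/2} e^{-x^2/2} with x = 2k / sqrt(2n)."""
    x = 2 * k / sqrt(2 * n)
    return exp(-x * x / 2) / sqrt(pi * n)

x = 1.0  # illustrative target value of 2k / sqrt(2n)
for n in (50, 500, 5000, 50000):
    k = round(x * sqrt(n / 2))
    # Ratio of exact probability to the approximation; should approach 1.
    print(n, k, exact_prob(n, k) / local_approx(n, k))
```

Python integers are exact, so the binomial coefficient is computed without overflow before the single division that converts to floating point.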

Our next step is to compute

$$P(a\sqrt{2n} \le S_{2n} \le b\sqrt{2n}) = \sum_{m \in [a\sqrt{2n},\, b\sqrt{2n}] \cap 2\mathbb{Z}} P(S_{2n} = m)$$

Changing variables $m = x\sqrt{2n}$, we have that the above is

$$\approx \sum_{x \in [a,b] \cap (2\mathbb{Z}/\sqrt{2n})} (2\pi)^{-1/2} e^{-x^2/2} \cdot (2/n)^{1/2}$$

where $2\mathbb{Z}/\sqrt{2n} = \{2z/\sqrt{2n} : z \in \mathbb{Z}\}$. We have multiplied and divided by $\sqrt{2}$ since the space between points in the sum is $(2/n)^{1/2}$, so if $n$ is large the sum above is

$$\approx \int_a^b (2\pi)^{-1/2} e^{-x^2/2}\,dx$$

The integrand is the density of the (standard) normal distribution, so changing notation we can write the last quantity as $P(a \le \chi \le b)$ where $\chi$ is a random variable with that distribution.
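The normal approximation described above can be compared with exact binomial sums. The following sketch is illustrative: the names `normal_cdf` and `exact_interval_prob` and the endpoints $a = -1$, $b = 2$ are choices made here, and the standard normal distribution function is computed from the error function.

```python
from math import ceil, comb, erf, floor, sqrt

def normal_cdf(x):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1 + erf(x / sqrt(2)))

def exact_interval_prob(m, a, b):
    """Exact P(a <= S_m / sqrt(m) <= b) for m fair +/-1 coin flips."""
    # S_m = 2H - m with H ~ Binomial(m, 1/2), so the event is
    # (m + a sqrt(m))/2 <= H <= (m + b sqrt(m))/2.
    h_lo = max(0, ceil((m + a * sqrt(m)) / 2))
    h_hi = min(m, floor((m + b * sqrt(m)) / 2))
    return sum(comb(m, h) for h in range(h_lo, h_hi + 1)) / 2 ** m

a, b = -1.0, 2.0  # illustrative endpoints
for m in (10, 100, 1000, 10000):
    # Columns: m, exact probability, normal approximation.
    print(m, exact_interval_prob(m, a, b), normal_cdf(b) - normal_cdf(a))
```

Because $S_m$ lives on a lattice, the exact values jump as $m$ changes and approach the normal value only up to an error that shrinks as $m$ grows.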

It is not hard to fill in the details to get:

Theorem 3.1.3. The De Moivre-Laplace Theorem. If $a < b$ then as $m \to \infty$

$$P(a \le S_m/\sqrt{m} \le b) \to \int_a^b (2\pi)^{-1/2} e^{-x^2/2}\,dx$$
