The concept of contiguity plays an essential role in deriving the distribution of test statistics under a sequence of local alternatives. For this, we need to consider a sequence of hypotheses, $P_n := P^n_{\theta_0}$ and $Q_n := Q^n_{\theta_n}$, where $\theta_n = \theta_0 + \delta/\sqrt{n}$.
$Q_n$ is said to be contiguous to $P_n$ if, for any sequence of events $A_n$, $[P_n(A_n) \to 0] \Rightarrow [Q_n(A_n) \to 0]$.
The relation of contiguity to the asymptotic properties of test statistics under alternative hypotheses is established by Le Cam’s celebrated third lemma. In most cases, as in the continuous-time HMM, deriving the distributions of these statistics directly is very difficult. The following lemma helps us surmount these obstacles:
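Le Cam’s Third Lemma: If $Q_n$ is contiguous to $P_n$ and, under $P_n$, the pair of statistics $(S_n, L_n)$ is asymptotically bivariate normal with means $(\mu_1, \mu_2)$, variances $(\sigma_1^2, \sigma_2^2)$ and covariance $\sigma_{12}$, with $\mu_2 = -\tfrac{1}{2}\sigma_2^2$ and $L_n$ being the log-likelihood ratio, then under $Q_n$
$S_n$ is asymptotically normal $N(\mu_1 + \sigma_{12}, \sigma_1^2)$.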
Thus we can translate the asymptotic normality under null distributions to that under alternatives.
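The mean shift in the third lemma can be checked numerically in a much simpler setting than the HMM. The sketch below assumes an iid $N(\theta, 1)$ model with $\theta_0 = 0$ and $\theta_n = \delta/\sqrt{n}$, takes $S_n = \sqrt{n}\,\bar{X}_n$, and uses the exact log-likelihood ratio $L_n = \delta S_n - \delta^2/2$. The model, the function name `simulate_shift` and all parameter values are illustrative assumptions, not part of the derivation in this section.

```python
import math
import random
import statistics


def simulate_shift(delta=1.5, n=50, reps=3000, seed=0):
    """Monte Carlo sketch of Le Cam's third lemma in an iid N(theta, 1) model.

    Assumed toy setting: null theta_0 = 0, local alternative
    theta_n = delta / sqrt(n), S_n = sqrt(n) * sample mean.
    """
    rng = random.Random(seed)
    theta_n = delta / math.sqrt(n)
    s_null, l_null, s_alt = [], [], []
    for _ in range(reps):
        # Sample under the null P_n.
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        s = math.sqrt(n) * statistics.fmean(x)
        s_null.append(s)
        # Exact log-likelihood ratio log(dQ_n/dP_n) for this Gaussian model.
        l_null.append(delta * s - 0.5 * delta ** 2)
        # Independent sample under the local alternative Q_n.
        y = [rng.gauss(theta_n, 1.0) for _ in range(n)]
        s_alt.append(math.sqrt(n) * statistics.fmean(y))
    # Sample covariance of (S_n, L_n) under the null estimates sigma_12.
    ms, ml = statistics.fmean(s_null), statistics.fmean(l_null)
    cov = sum((a - ms) * (b - ml) for a, b in zip(s_null, l_null)) / (reps - 1)
    return {
        "mean_S_null": ms,
        "cov_S_L_null": cov,
        "mean_S_alt": statistics.fmean(s_alt),
        "var_S_alt": statistics.variance(s_alt),
    }


if __name__ == "__main__":
    print(simulate_shift())
```

In this toy model $\mu_1 = 0$ and $\sigma_{12} = \delta$, so the lemma predicts that the mean of $S_n$ under the alternative is close to the covariance estimated under the null, with the variance unchanged.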
Le Cam’s first lemma gives us a very useful and widely used characterization of contiguity for a sequence of hypotheses.
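Le Cam’s first lemma: If under $P_n$ the log-likelihood ratio $L_n$ is asymptotically normal $N(-\tfrac{1}{2}\sigma^2, \sigma^2)$ (so that the likelihood ratio itself is asymptotically log-normal), then $Q_n$ is contiguous to $P_n$.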
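The role of the mean $-\sigma^2/2$ can be seen from the fact that a limiting likelihood ratio $e^{L}$ with $L \sim N(\mu, \sigma^2)$ has expectation $\exp(\mu + \sigma^2/2)$, which equals 1 exactly when $\mu = -\sigma^2/2$. The sketch below checks this by numerical integration; the function name and grid settings are illustrative assumptions.

```python
import math


def expected_likelihood_ratio(mu, sigma, grid=20001, width=12.0):
    """Trapezoid-rule approximation of E[exp(L)] for L ~ N(mu, sigma^2)."""
    lo, hi = mu - width * sigma, mu + width * sigma
    step = (hi - lo) / (grid - 1)
    norm = sigma * math.sqrt(2.0 * math.pi)
    total = 0.0
    for i in range(grid):
        x = lo + i * step
        density = math.exp(-0.5 * ((x - mu) / sigma) ** 2) / norm
        # Endpoints get half weight in the trapezoid rule.
        weight = 0.5 if i in (0, grid - 1) else 1.0
        total += weight * math.exp(x) * density
    return total * step


if __name__ == "__main__":
    sigma = 2.0
    # Mean -sigma^2/2: no mass is lost in the limit.
    print(expected_likelihood_ratio(-0.5 * sigma ** 2, sigma))
    # Mean -sigma^2: the limit has expectation exp(-sigma^2/2) < 1.
    print(expected_likelihood_ratio(-sigma ** 2, sigma))
```

When the expectation of the limiting likelihood ratio falls below 1, some of the mass of $Q_n$ escapes to sets that are asymptotically negligible under $P_n$, which is precisely what contiguity rules out.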
This lemma is used to establish contiguity for a class of iid models, essentially by expanding the log-likelihood around the null in a Taylor expansion. We extend that approach to the continuous-time HMM. For this, we approximate the log-likelihood by a sum of stationary ergodic increments. Specifically, we invoke the following propositions of Douc et al.:
• Proposition 1: For all $\theta$, there exists a stationary ergodic sequence $h_{k,\infty}$ such that
$|l_n(\theta) - \sum_{k=0}^{n} h_{k,\infty}(\theta)| \to 0$ as $n \to \infty$.
• Proposition 2: $h_{k,\infty}$ is obtained as a uniform limit of continuous functions, and
its gradient is denoted $h'_{k,\infty}(\theta)$.
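To make the idea of writing a log-likelihood as a sum of increments concrete, the sketch below uses a discrete-time, two-state HMM with unit-variance Gaussian emissions. This is an assumed toy model, not the continuous-time model of this section. The normalized forward recursion returns the finite-past increments $\log p(y_k \mid y_0, \dots, y_{k-1})$, whose sum is $l_n(\theta)$; these are the kind of quantities that a stationary sequence such as $h_{k,\infty}$ is meant to approximate. All function names and parameter values are hypothetical.

```python
import itertools
import math


def normal_pdf(y, mean, sd=1.0):
    return math.exp(-0.5 * ((y - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def loglik_increments(ys, trans, means, init):
    """Normalized forward recursion for a finite-state HMM.

    Returns the increments log p(y_k | y_0, ..., y_{k-1}); their sum is the
    log-likelihood of the whole observation sequence.
    """
    m = len(init)
    pred = list(init)  # predictive distribution of the current hidden state
    increments = []
    for y in ys:
        joint = [pred[i] * normal_pdf(y, means[i]) for i in range(m)]
        c = sum(joint)                 # p(y_k | earlier observations)
        increments.append(math.log(c))
        filt = [j / c for j in joint]  # filtering distribution
        pred = [sum(filt[i] * trans[i][j] for i in range(m)) for j in range(m)]
    return increments


def brute_force_loglik(ys, trans, means, init):
    """Log-likelihood by summing over every hidden path (tiny inputs only)."""
    m = len(init)
    total = 0.0
    for path in itertools.product(range(m), repeat=len(ys)):
        p = init[path[0]] * normal_pdf(ys[0], means[path[0]])
        for k in range(1, len(ys)):
            p *= trans[path[k - 1]][path[k]] * normal_pdf(ys[k], means[path[k]])
        total += p
    return math.log(total)


if __name__ == "__main__":
    ys = [0.3, -1.2, 2.1, 0.7]
    trans = [[0.9, 0.1], [0.2, 0.8]]
    inc = loglik_increments(ys, trans, [0.0, 2.0], [0.5, 0.5])
    print(sum(inc), brute_force_loglik(ys, trans, [0.0, 2.0], [0.5, 0.5]))
```

The brute-force sum over hidden paths is included only as a check that the increments add up to the full log-likelihood.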
These propositions help us mimic the Taylor expansion argument used in the iid model. However, our proof is not identical. We employ the trick of approximating the log-likelihood by a sum over a stationary sequence, a property that holds for our models. We then combine theorems of Douc et al. that guarantee certain properties of the stationary processes and the observed information, and use the latter in our main proof to establish contiguity. We write
\[
l_n(\theta_n) - l_n(\theta_0) = \Big[ l_n(\theta_n) - \sum_{k=0}^{n} h_{k,\infty}(\theta_n) \Big] + \Big[ -l_n(\theta_0) + \sum_{k=0}^{n} h_{k,\infty}(\theta_0) \Big] + \Big[ \sum_{k=0}^{n} h_{k,\infty}(\theta_n) - \sum_{k=0}^{n} h_{k,\infty}(\theta_0) \Big].
\]
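The first two terms tend to 0 by Proposition 1. For the third term we can write
\[
\sum_{k=0}^{n} h_{k,\infty}(\theta_n) - \sum_{k=0}^{n} h_{k,\infty}(\theta_0)
= \sum_{k=0}^{n} \Big[ (\theta_n - \theta_0)\, h'_{k,\infty}(\theta_0) + \tfrac{1}{2}(\theta_n - \theta_0)^2 \big( h''_{k,\infty}(\theta_0) + r(k, \theta_0) \big) \Big].
\]
Now recall that $\theta_n = \theta_0 + \delta/\sqrt{n}$, and note that $r(k, \theta_0) \to 0$ as $\theta_n \to \theta_0$.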
The ergodicity and stationarity of $h_{k,\infty}(\theta)$ were proved by Leroux (1992). The ergodicity and stationarity of $h'_{k,\infty}(\theta)$ and $h''_{k,\infty}(\theta)$ are a consequence of Theorems 4 and 5 of Douc et al.
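So from Birkhoff’s ergodic theorem we have $\sum_{k=0}^{n} r(k, \theta_0)/n \to 0$. From Section 5.4 on normality we know that the normalized score function, i.e. $l_n'(\theta_0)/\sqrt{n} = \sum_{k=0}^{n} h'_{k,\infty}(\theta_0)/\sqrt{n}$, is asymptotically normal. Thus we have
\[
\Delta \log(L_n) = \delta\, N(0, J_0) - \tfrac{1}{2}\delta^2 J_1 .
\]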
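For contiguity we need to show that $J_0 = J_1$ asymptotically, where
\[
J_0 = E_{\theta_0}\big[ h'(1, \infty, \theta_0)\, h'(1, \infty, \theta_0)^{T} \big]
\quad\text{and}\quad
J_1 = -\lim_{n \to \infty} n^{-1} \nabla^2_\theta\, l_n(\theta_0),
\]
the latter being the limit of the normalized observed information.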
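The two quantities being compared, an outer-product form of the score and a normalized negative Hessian, can be illustrated in a simple setting. The sketch below assumes an iid Poisson($\theta$) model, where both equal $1/\theta$; it is not the HMM argument of this section, and the function names, sample size and seed are hypothetical.

```python
import math
import random


def poisson_draw(rng, lam):
    """Knuth's multiplication method; adequate for small lam."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def information_estimates(theta=3.0, n=20000, seed=1):
    """Estimate J0 (score outer product) and J1 (observed information)."""
    rng = random.Random(seed)
    xs = [poisson_draw(rng, theta) for _ in range(n)]
    scores = [x / theta - 1.0 for x in xs]     # d/dtheta of log f(x; theta)
    hessians = [-x / theta ** 2 for x in xs]   # d^2/dtheta^2 of log f(x; theta)
    j0 = sum(s * s for s in scores) / n        # outer-product form
    j1 = -sum(hessians) / n                    # normalized negative Hessian
    return j0, j1


if __name__ == "__main__":
    print(information_estimates())
```

Both estimates should be close to $1/\theta$ for large $n$; the work in the HMM setting lies in showing that the analogous equality survives the dependence between observations.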
The concluding part of this section establishes the asymptotic equivalence of the observed information $J_1$ and the asymptotic covariance matrix $J_0$ (Note that in iid
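Recall that we had defined
\[
h'(k, 0, x_0, \theta) = E_\theta\Big[ \sum_{i=1}^{k} \phi_\theta(x_{i-1}, x_i, y_i) \,\Big|\, y_{0:k}, x_0 = x \Big] - E_\theta\Big[ \sum_{i=1}^{k-1} \phi_\theta(x_{i-1}, x_i, y_i) \,\Big|\, y_{0:k-1}, x_0 = x \Big],
\]
where $\phi_\theta(x, x', y) = \nabla_\theta \log[ q_\theta(x, x')\, g_\theta(x', y) ]$.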
We drop the subscripts $x_0$ and $\theta$ for notational convenience. Note that
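\[
\nabla^2_\theta\, l_n(\theta_0) = E_\theta\Big[ \sum_{i=1}^{n} \phi_\theta(x_{i-1}, x_i, y_i) \,\Big|\, y_{0:n}, x_0 \Big] + \mathrm{var}_\theta\Big[ \sum_{i=1}^{n} \phi_\theta(x_{i-1}, x_i, y_i) \,\Big|\, y_{0:n}, x_0 \Big].
\]
As in the expression for the score function, we can break up the above as
\[
E_\theta\Big[ \sum_{i=1}^{n} \phi_\theta(x_{i-1}, x_i, y_i) \,\Big|\, y_{0:n}, x_0 = x \Big] = \sum_{k=1}^{n} \Big( E_\theta\Big[ \sum_{i=1}^{k} \phi_\theta(x_{i-1}, x_i, y_i) \,\Big|\, y_{0:k}, x_0 \Big] - E_\theta\Big[ \sum_{i=1}^{k-1} \phi_\theta(x_{i-1}, x_i, y_i) \,\Big|\, y_{0:k-1}, x_0 \Big] \Big)
\]
and
\[
\mathrm{var}_\theta\Big[ \sum_{i=1}^{n} \phi_\theta(x_{i-1}, x_i, y_i) \,\Big|\, y_{0:n}, x_0 = x \Big] = \sum_{k=1}^{n} \Big( \mathrm{var}_\theta\Big[ \sum_{i=1}^{k} \phi_\theta(x_{i-1}, x_i, y_i) \,\Big|\, y_{0:k}, x_0 \Big] - \mathrm{var}_\theta\Big[ \sum_{i=1}^{k-1} \phi_\theta(x_{i-1}, x_i, y_i) \,\Big|\, y_{0:k-1}, x_0 \Big] \Big).
\]
Define
\[
\tau_{1,k}(\theta) = \mathrm{var}_\theta\Big[ \sum_{i=1}^{k} \phi_\theta(x_{i-1}, x_i, y_i) \,\Big|\, y_{0:k-1}, x_0 \Big].
\]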
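From Proposition 5 of Douc et al., under the assumptions $E_\theta[\sup_x \sup_\theta \phi(\theta, x_1, Y_1)] < \infty$ and $E_\theta[(\sup_x \sup_\theta \phi(\theta, x_1, Y_1))^2] < \infty$, $h'_{1,k}(\theta)$ and $\tau_{1,k}(\theta)$ both have limits as $k \to \infty$, $P_\theta$-a.s. The assumptions hold trivially because the transition function and the emission densities are bounded. Let $h_{k,\infty}(\theta)$ and $\tau_{k,\infty}(\theta)$ denote these limits. It follows from the definitions above that $[h_{k,\infty}]_{k=1}^{\infty}$ and $[\tau_{k,\infty}]_{k=1}^{\infty}$ are $P_{\theta^*}$-stationary and ergodic. Also, the limit of the observed Fisher information will be $-E_{\theta_0}[h_{0,\infty}(\theta_0) + \tau_{0,\infty}(\theta_0)]$. This is equal to
\[
E_{\theta_0}\big[ h'(1, \infty, \theta_0)\, h'(1, \infty, \theta_0)^{T} \big].
\]
Thus we can conclude that
\[
\Delta \log(L_n) = \delta\, N(0, J_0) - \tfrac{1}{2}\delta^2 J_0 .
\]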
Hidden Markov models therefore admit a contiguous structure. The equivalence of the asymptotic information and Fisher’s matrix now yields the asymptotic normality of the MLE. To see this, we recall the asymptotic normality of the score function: $n^{-1/2} \nabla_\theta l_n(\theta^*) \to N(0, J(\theta^*))$ weakly.
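Now we can apply a Taylor expansion around $\theta^*$ to get
\begin{align}
0 &= \nabla_\theta l_n(\hat\theta_n) \tag{5.5.1} \\
  &= \nabla_\theta l_n(\theta^*) + \nabla^2_\theta l_n[\theta^* + t(\hat\theta_n - \theta^*)]\,(\hat\theta_n - \theta^*) \tag{5.5.2} \\
\Rightarrow\quad n^{1/2}(\hat\theta_n - \theta^*) &= -\big( n^{-1} \nabla^2_\theta l_n[\theta^* + t(\hat\theta_n - \theta^*)] \big)^{-1}\, n^{-1/2} \nabla_\theta l_n(\theta^*) \tag{5.5.3}
\end{align}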
By strong consistency we have $\hat\theta_n \to \theta^*$ almost surely, so the first factor converges to $J(\theta^*)^{-1}$ a.s. The second factor, by virtue of the equivalence, converges weakly to $N(0, J(\theta^*))$. Hence the following result:
\[
n^{1/2}(\hat\theta_n - \theta^*) \to N(0, J^{-1}(\theta^*)),
\]
which is the standard asymptotic result of the MLE.
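As a sanity check of the limit $N(0, J^{-1}(\theta^*))$ in a setting where everything is available in closed form, the sketch below assumes an iid exponential model with rate $\theta^*$, for which the MLE is $n/\sum_i x_i$ and $J(\theta) = 1/\theta^2$. The model, the function name `mle_scaled_errors` and the parameter values are illustrative assumptions and say nothing about finite-sample behaviour in the HMM.

```python
import math
import random
import statistics


def mle_scaled_errors(theta_star=2.0, n=400, reps=2000, seed=2):
    """Replicates of sqrt(n) * (theta_hat - theta_star) for an exponential rate."""
    rng = random.Random(seed)
    out = []
    for _ in range(reps):
        xs = [rng.expovariate(theta_star) for _ in range(n)]
        theta_hat = n / sum(xs)  # MLE of the exponential rate
        out.append(math.sqrt(n) * (theta_hat - theta_star))
    return out


if __name__ == "__main__":
    errs = mle_scaled_errors()
    # The asymptotic variance J^{-1}(theta*) equals theta*^2 in this model.
    print(statistics.fmean(errs), statistics.variance(errs))
```

The sample variance of the scaled errors should be near $\theta^{*2}$, with a small positive mean reflecting the finite-sample bias of the rate MLE.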