Integer-Valued Time Series.


DISSERTATION

to obtain the degree of doctor at Tilburg University, on the authority of the rector magnificus, prof.dr. F.A. van der Duyn Schouten, to be defended in public before a committee appointed by the doctorate board, in the auditorium of the University on Wednesday 7 November 2007 at 14:15, by

RAMON VAN DEN AKKER

OTHER COMMITTEE MEMBERS:
prof.dr. John H.J. Einmahl
prof.dr. Marc Hallin
prof.dr. Chris A.J. Klaassen
prof.dr. Brendan P.M. McCabe
dr. Johan J.J. Segers

THOMAS STIELTJES INSTITUTE FOR MATHEMATICS


First of all I want to express my gratitude and appreciation to my academic advisors, Feico Drost and Bas Werker. I really enjoyed our cooperation and appreciate their kind support. I also enjoyed the collaboration with Johan Segers and Marc Hallin. Many thanks to John Einmahl, Marc Hallin, Chris Klaassen, Brendan McCabe, and Johan Segers for their willingness to serve on my thesis committee. I am grateful to my colleagues of the department of Econometrics & OR for the relaxed and stimulating working environment. In particular, I want to thank Alex, Hendri, and Ruud. Finally, but above all, I thank Vera, Aitor, Eddy, Grizzly, my parents, and Daan for their love and support.


This thesis consists of two parts. The first part contributes statistical methodology for nonnegative integer-valued time series. The second part of this thesis consists of two chapters. One chapter is concerned with the development of efficient estimators of the marginal distribution functions from multivariate data if one has knowledge of the dependence structure. The other chapter considers semiparametric estimation for general (continuous) time series models with innovations that are not necessarily independently and identically distributed. A short overview of both parts is presented below.

Part I. In many sciences one encounters nonnegative discrete valued time series, often as counts of events or objects at consecutive points in time. Especially in economics and medicine many interesting variables are (nonnegative) integer-valued. For example: the number of transactions in SNS-Reaal during each day, the number of patients in a hospital at the end of the day, the number of claims an insurance company receives during each day, the number of epileptic seizures a patient suffers each day, etcetera. Hence the need for adequate probabilistic models and statistical techniques for nonnegative discrete valued time series is apparent. However, until the early eighties this area of research did not attract much attention. As a possible explanation McKenzie (2003) mentions that modeling discrete valued time series is a challenging topic in time series analysis, since most traditional representations of dependence become either impossible or impractical. Over the last two decades there have been attempts to develop suitable classes of models; the class of INteger-valued AutoRegressive (INAR) processes can, presently, be considered as the major model for discrete valued time series. Part I of this thesis contributes statistical methods for INAR processes.

Chapter 1 contains some probabilistic results on INAR processes. The existence of a strictly stationary solution, the existence of moments under the stationary distribution, and the (uniform) ergodicity of INAR processes are investigated.

In Chapter 2 parametric INAR processes are discussed: the innovation distribution G belongs to a parametric family. The main result of this chapter is that parametric INAR models enjoy the Local Asymptotic Normality (LAN) property. The proofs are made tractable by a certain representation of the transition-scores, which is motivated by an information-loss interpretation of the model. Furthermore, a new computationally attractive, asymptotically efficient estimator of the parameters is provided.

Using a parametric model exposes the researcher to misspecification. Therefore Chapter 3 considers a semiparametric model, where hardly any assumptions are made on the innovation distribution. The focus is on efficient estimation of the Euclidean parameters as well as the distribution of the innovations. Even inefficient estimation of the innovation distribution has, to the best of my knowledge, not been addressed before. A possible explanation for this is that, even if the parameters are known, the innovations cannot be calculated from the observations. Consequently, estimation of the innovation distribution cannot be based on residuals (as is the case for AR processes). However, estimation of the innovation distribution is, just as for standard AR models, an important topic. For INAR processes this might be even more important, since in some applications the innovation distribution has a physical interpretation. We provide an estimator which might be viewed as a nonparametric maximum likelihood estimator. It turns out that we cannot prove efficiency by standard semiparametric methodology. Efficiency is proved by using the special representation of the limit distribution.

Chapters 2 and 3 only consider the ‘stationary part’ of the parameter space. To analyze the INAR model on the boundary of the parameter space, Chapter 4 considers a nearly nonstationary INAR(1) model and derives its limit experiment (in the Le Cam framework). The main result of this chapter is that this limit experiment is based on one observation from a Poisson distribution. This is rather surprising since limit experiments are usually Locally Asymptotically Quadratic (LAQ; see Jeganathan (1995) and Le Cam and Yang (1990)) and even non-regular models often enjoy a shift structure (see Hirano and Porter (2003a)), whereas the Poisson limit experiment does not enjoy these two properties. To illustrate the statistical consequences of the convergence to a Poisson limit experiment, we exploit this limit experiment to construct efficient estimators of the autoregression parameter in various models, and to construct an efficient test for the null hypothesis of a unit root. Related to this, we show that the Dickey-Fuller test for a unit root has no (local asymptotic) power.

Part II. Chapter 5 discusses estimation of the marginals from a bivariate random sample. The only assumption on the marginals is that they are absolutely continuous. By Sklar’s theorem, the joint distribution is uniquely determined by the copula (the dependence structure) and the marginal distributions. Of course, the marginal empirical distribution functions are √n-consistent estimators of the marginal distribution functions. If the components are independent then these estimators are known to be efficient. We prove that, amongst smooth copulas, this is actually the only case in which the marginal empirical distribution functions are efficient. So the natural question is how knowledge of the copula should be exploited to improve on the empirical distribution functions. Motivated by an empirical likelihood argument we provide a new estimator of the marginal distribution functions. Since the tangent space is the sum of two non-orthogonal spaces, traditional semiparametric arguments cannot be used to prove efficiency of our estimator. We derive, by ad hoc arguments, a special representation of the limiting distribution of our estimator. Using this representation we prove efficiency.

Chapter 6 derives semiparametric efficiency bounds for parametric components in general semiparametric time series models. The time series models considered are not, as is the case in the usual semiparametric time series approach, assumed to be driven by a sequence of independent innovations with an unknown distribution. Instead, the dependence between the innovations is seen as an additional nonparametric nuisance parameter. A Local Asymptotic Normality (LAN) result is derived under quite natural and economical conditions, implying a lower bound on the asymptotic performance of (regular) estimators.

This thesis is partly based on the following research papers.

Chapter 1:

Drost, F. C., R. van den Akker, and B. J. M. Werker (2007). Note on integer-valued bilinear time series models, CentER Discussion Paper, 2007-47, Tilburg University.

Chapter 2:

Drost, F. C., R. van den Akker, and B. J. M. Werker (2006). Local asymptotic normality and efficient estimation for INAR(p) models, CentER Discussion Paper, 2006-45, Tilburg University.

Chapter 3:

Drost, F. C., R. van den Akker, and B. J. M. Werker (2007). Efficient estimation of autoregression parameters and innovation distribution for semiparametric nonnegative integer-valued AR(p) models, CentER Discussion Paper, 2007-23, Tilburg University.

Chapter 4:

Drost, F. C., R. van den Akker, and B. J. M. Werker (2006). An asymptotic analysis of nearly unstable INAR(1) models, CentER Discussion Paper, 2006-44, Tilburg University.

Chapter 5:

Segers, J. J. J., R. van den Akker, and B. J. M. Werker (2007). Efficient estimation of marginals by exploiting knowledge on the copula, Working paper.

Chapter 6:

Drost, F. C., R. van den Akker, and B. J. M. Werker (2007). Semiparametric efficiency bounds for time series models with non-i.i.d. innovations, Working paper.

Acknowledgements

Preface

I Nonnegative integer-valued autoregressive processes

1 Preliminaries
1.1 Definition
1.2 Stationarity, moments & auxiliaries

2 Parametric stationary INAR(p) models
2.1 Local Asymptotic Normality
2.2 Efficient estimation
2.2.1 Innovation distribution is known
2.2.2 Innovation distribution belongs to a parametric model

3 Semiparametric stationary INAR(p) models
3.1 The estimator
3.2 Limit distribution
3.2.1 Likelihood equations
3.2.2 Asymptotic normality
3.2.3 Proof of Lemma 3.2.1
3.3 Efficiency
3.4 Monte Carlo study & Empirical Application

4 The limit experiment of nearly unstable INAR(1) models
4.1 Preliminaries
4.2 The limit experiment: one observation from a Poisson distribution
4.3 Applications
4.3.1 Efficient estimation of h in nearly unstable INAR models
4.3.2 Efficient estimation in the global model in case G is known
4.3.3 Testing for a unit root
4.4 Auxiliaries

II Other essays

5 Efficient estimation of marginals by exploiting knowledge on the copula
5.1 Introduction
5.2 Assumptions and Notation
5.3 Copula and second marginal known
5.3.1 Information lower-bound & inefficiency of F_n
5.3.2 The estimator
5.3.3 The limit distribution
5.3.4 Efficiency proof
5.4 Copula known
5.4.1 Tangent space & inefficiency of (F_n, G_n)
5.4.2 The estimator
5.4.3 Efficiency proof

6 Semiparametric efficiency bounds for time series models with non-i.i.d. innovations
6.1 Introduction
6.2 Setup
6.2.1 Innovation structure
6.2.2 Group structure
6.2.3 Examples
6.3 Assumptions
6.4 Main Results
6.4.1 Parametric LAN theorem
6.4.2 Semiparametric lower bound
6.5 Appendix: a general LAN theorem

Bibliography

I Nonnegative integer-valued autoregressive processes

1 Preliminaries

In Section 1.1 we recall the definition of INAR processes. Section 1.2, the main part of this chapter, contributes some probabilistic results on INAR processes. In particular, conditions for the existence of a strictly stationary solution and the existence of moments under the stationary distribution are provided.

1.1 Definition

The nonnegative INteger-valued AutoRegressive process of order 1 (INAR(1)) was introduced by Al-Osh and Alzaid (1987). The INAR(1) process is defined by the recursion
\[ X_t = \vartheta\circ X_{t-1} + \varepsilon_t, \qquad t\in\mathbb{Z}_+ = \mathbb{N}\cup\{0\}, \tag{1.1} \]
where¹
\[ \vartheta\circ X_{t-1} = \sum_{j=1}^{X_{t-1}} Z_j^{(t)}. \]
The variables $(Z_j^{(t)})_{j\in\mathbb{N},\,t\in\mathbb{Z}_+}$ are i.i.d. Bernoulli distributed variables with success probability $\theta\in[0,1]$, independent of the i.i.d. innovation sequence $(\varepsilon_t)_{t\in\mathbb{Z}_+}$ with distribution $G$ on $\mathbb{Z}_+$. Finally, the starting value $X_{-1}$, with distribution $\nu$ on $\mathbb{Z}_+$, is independent of $(\varepsilon_t)_{t\in\mathbb{Z}_+}$ and $(Z_j^{(t)})_{j\in\mathbb{N},\,t\in\mathbb{Z}_+}$. The random variable $\vartheta\circ X_{t-1}$ is called the Binomial thinning of $X_{t-1}$ (this operator was introduced by Steutel and Van Harn (1979)); conditionally on $X_{t-1}$, it follows a Binomial distribution with success probability $\theta$ and number of trials equal to $X_{t-1}$. Display (1.1) can be interpreted as a branching process with immigration. The outcome $X_t$ is composed of the surviving elements of $X_{t-1}$ during the period $(t-1,t]$, $\vartheta\circ X_{t-1}$, and the number of immigrants during this period, $\varepsilon_t$. Each element of $X_{t-1}$ survives with probability $\theta$ and its survival has no effect on the survival of the other elements, nor on the number of immigrants. In the literature on statistical inference for branching processes with immigration it is assumed that one observes both the $X$ process and the $\varepsilon$ process. We consider the empirically more common situation where the number of immigrants $\varepsilon_t$ is not observed. Note that, even if the true parameter $\theta$ were known, the number of immigrants cannot be derived from the $X$ process in the INAR(1) model.

¹ An empty sum equals, by definition, 0. Although it would be more accurate to write $\vartheta^{(t)}\circ$ instead of $\vartheta\circ$, this superscript is dropped to keep in line with the literature.
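As an illustration of recursion (1.1), the following minimal Python sketch simulates an INAR(1) path by drawing the Binomial thinning and the innovation separately. The function names, the seed, and the geometric innovation law are assumptions made for this example only; the text leaves $G$ general.

```python
import random


def binomial_thinning(x, theta, rng):
    """Return theta o x: the number of successes among x Bernoulli(theta) trials."""
    return sum(1 for _ in range(x) if rng.random() < theta)


def geometric_innovation(rng, q=0.5):
    """Hypothetical innovation law G: number of failures before the first success."""
    k = 0
    while rng.random() >= q:
        k += 1
    return k


def simulate_inar1(n, theta, draw_innovation, x_start=0, seed=0):
    """Simulate X_0, ..., X_{n-1} from recursion (1.1), started at X_{-1} = x_start."""
    rng = random.Random(seed)
    path, x_prev = [], x_start
    for _ in range(n):
        survivors = binomial_thinning(x_prev, theta, rng)  # theta o X_{t-1}
        immigrants = draw_innovation(rng)                  # epsilon_t (unobserved in practice)
        x_prev = survivors + immigrants
        path.append(x_prev)
    return path


path = simulate_inar1(20, theta=0.6, draw_innovation=geometric_innovation)
print(path)
```

Only `path` would be available to the statistician; the split into survivors and immigrants is lost, which is the information-loss feature of the model noted above.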

The more general INAR(p) processes were first introduced by Al-Osh and Alzaid (1990), but Du and Li (1991) proposed a different setup. In the setup of Du and Li (1991) the autocorrelation structure of an INAR(p) process is the same as that of an AR(p) process, whereas it corresponds to that of an ARMA(p, p−1) process in the setup of Al-Osh and Alzaid (1990). The setup of Du and Li (1991) has been followed by most authors, and we use their setup as well. The INAR(p) process is an analogue of (1.1) with $p$ lags. An INAR(p) process is recursively defined by
\[ X_t = \vartheta_1\circ X_{t-1} + \vartheta_2\circ X_{t-2} + \cdots + \vartheta_p\circ X_{t-p} + \varepsilon_t, \qquad t\in\mathbb{Z}_+, \tag{1.2} \]
where, for $i=1,\dots,p$,
\[ \vartheta_i\circ X_{t-i} = \sum_{j=1}^{X_{t-i}} Z_j^{(t,i)}. \]
Here $(Z_j^{(t,i)})_{j\in\mathbb{N},\,t\in\mathbb{Z}_+}$, $i\in\{1,\dots,p\}$, are $p$ mutually independent collections of i.i.d. Bernoulli variables with success probabilities $\theta_i\in[0,1]$, $i=1,\dots,p$, independent of the $\mathbb{Z}_+$-valued i.i.d. $G$-distributed innovations $(\varepsilon_t)_{t\in\mathbb{Z}_+}$. The starting value $(X_{-1},\dots,X_{-p})'$ is independent of $(\varepsilon_t)_{t\in\mathbb{Z}_+}$ and $(Z_j^{(t,i)})_{i\in\{1,\dots,p\},\,j\in\mathbb{N},\,t\in\mathbb{Z}_+}$, and has distribution $\nu$ on $\mathbb{Z}_+^p$. The corresponding probability space is denoted by $(\Omega,\mathcal{F},P_{\nu,\theta,G})$, where $\theta=(\theta_1,\dots,\theta_p)'$.

Without going into details, let us mention some empirical applications of INAR processes. Applications in the medical sciences can be found in, for example, Franke and Seligmann (1993) (epileptic seizure counts), Bélisle et al. (1998) (spike trains), and Cardinal et al. (1999) (infectious disease incidence). An application to psychometrics can be found in Böckenholt (1999a) (daily emotion experiences), and an application to environmentology in Thyregod et al. (1999) (rainfall; rain data is most often collected by means of a tipping bucket rain gauge, which is a discrete sampler counting the number of times a bucket is filled in each sampling time interval). Recent applications to economics can be found in, for example, Böckenholt (1999b), Berglund and Brännäs (2001) (number of plants in Swedish municipalities), Brännäs and Hellström (2001), Rudholm (2001), Böckenholt (2003), Freeland and McCabe (2004), Gouriéroux and Jasiak (2004) (number of claims an insurance company receives), and McCabe and Martin (2005). Pickands III and Stine (1997) and Ahn et al. (2000) considered queueing applications.

1.2 Stationarity, moments & auxiliaries

This section introduces notation for Part I of this thesis, and provides some probabilistic results, which we need in later chapters.

Throughout Part I the number of lags, $p\in\mathbb{N}$, is fixed. The following notation is used: $\mathcal{G}$ denotes the set of all probability measures on $\mathbb{Z}_+=\mathbb{N}\cup\{0\}$. The Binomial distribution with parameters $\theta\in[0,1]$ and $n\in\mathbb{Z}_+$ is denoted by $\mathrm{Bin}_{n,\theta}$ ($\mathrm{Bin}_{0,\theta}$ is the Dirac measure concentrated in 0), $b_{n,\theta}$ denotes the corresponding probability mass function, and $\delta_x$ denotes the Dirac measure concentrated in $x$. In general, we denote a probability measure on $\mathbb{Z}_+$ by a capital, and denote the associated probability mass function by the corresponding lower case. For $G\in\mathcal{G}$, $\mu_G$ denotes the mean of $G$, and $\sigma_G^2$ denotes its variance. As usual $E_{\nu,\theta,G}(\cdot)$ is shorthand for $\int(\cdot)\,\mathrm{d}P_{\nu,\theta,G}$. For (probability) measures $F$ and $G$, $F*G$ denotes the convolution of $F$ and $G$. Finally, $\mathbb{F}=(\mathcal{F}_t)_{t\ge -p}$ is the filtration generated by $X$, i.e. $\mathcal{F}_t=\sigma(X_{-p},\dots,X_t)$. Note that, contrary to classical AR(p) processes,
\[ \mathcal{F}_t \neq \sigma(X_{-p},\dots,X_{-1},\varepsilon_0,\dots,\varepsilon_t). \]

Next, we compute the first two conditional moments of an INAR(p) process to gain some insight into its dependence structure. It immediately follows from (1.2) that, for $t\in\mathbb{Z}_+$,
\[ E_{\theta,G}[X_t\mid\mathcal{F}_{t-1}] = E_{\theta,G}[X_t\mid X_{t-1},\dots,X_{t-p}] = \mu_G + \sum_{i=1}^p \theta_i X_{t-i} \in [0,\infty], \]
and
\[ \mathrm{var}_{\theta,G}[X_t\mid\mathcal{F}_{t-1}] = \mathrm{var}_{\theta,G}[X_t\mid X_{t-1},\dots,X_{t-p}] = \sigma_G^2 + \sum_{i=1}^p \theta_i(1-\theta_i)X_{t-i} \in [0,\infty]. \]
Hence an INAR(p) process has the same autoregression function as an AR(p) process. However, an INAR(p) process has conditional heteroskedasticity of autoregressive form (actually it is an ARCH(p) process), whereas the conditional variance is constant for AR(p) processes. For computations on higher-order moments we refer to Silva and Oliveira (2004, 2005).

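From (1.2) it follows, for $t\in\mathbb{Z}_+$,
\[ P_{\theta,G}\{X_t=x_t\mid\mathcal{F}_{t-1}\} = P_{\theta,G}\{X_t=x_t\mid X_{t-1},\dots,X_{t-p}\} = P^{\theta,G}_{(X_{t-1},\dots,X_{t-p}),x_t}, \]
where, for $x_{t-p},\dots,x_t\in\mathbb{Z}_+$, the transition probability $P^{\theta,G}_{(x_{t-1},\dots,x_{t-p}),x_t}$ is given by
\[ P^{\theta,G}_{(x_{t-1},\dots,x_{t-p}),x_t} = P_{\theta,G}\Bigl\{\sum_{i=1}^p \vartheta_i\circ X_{t-i}+\varepsilon_t = x_t \Bigm| X_{t-1}=x_{t-1},\dots,X_{t-p}=x_{t-p}\Bigr\} = \bigl(\mathrm{Bin}_{x_{t-1},\theta_1}*\cdots*\mathrm{Bin}_{x_{t-p},\theta_p}*G\bigr)\{x_t\}. \tag{1.3} \]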
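The convolution in (1.3) can be evaluated directly when the innovation distribution has finite support. The sketch below is an illustration under that assumption (the helper names and the numerical values of the lags, thinning probabilities, and innovation probabilities are made up for the example); it also checks numerically that the resulting distribution has the conditional mean and variance stated earlier in this section.

```python
from math import comb


def binomial_pmf(n, theta):
    """Probability mass function of Bin_{n,theta} as a list indexed by 0..n."""
    return [comb(n, k) * theta**k * (1 - theta) ** (n - k) for k in range(n + 1)]


def convolve(f, g):
    """Convolution of two pmfs given as lists indexed from 0."""
    out = [0.0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            out[i + j] += fi * gj
    return out


def transition_pmf(lags, theta, g):
    """pmf of X_t given (X_{t-1}, ..., X_{t-p}) = lags, as in display (1.3).

    g is the innovation pmf on {0, ..., len(g) - 1} (finite support assumed here).
    """
    pmf = list(g)
    for x, th in zip(lags, theta):
        pmf = convolve(pmf, binomial_pmf(x, th))
    return pmf


# Illustrative values (assumptions): p = 2, lags (x_{t-1}, x_{t-2}) = (3, 2).
lags, theta, g = (3, 2), (0.5, 0.25), [0.2, 0.5, 0.3]
pmf = transition_pmf(lags, theta, g)

mean = sum(k * q for k, q in enumerate(pmf))
var = sum(k * k * q for k, q in enumerate(pmf)) - mean**2
mu_g = sum(k * q for k, q in enumerate(g))
var_g = sum(k * k * q for k, q in enumerate(g)) - mu_g**2
print(mean, mu_g + sum(th * x for th, x in zip(theta, lags)))
print(var, var_g + sum(th * (1 - th) * x for th, x in zip(theta, lags)))
```

Each evaluation convolves $p+1$ distributions, which gives a feel for why likelihood computations in INAR(p) models become expensive for large counts.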

Note that $X=(X_t)_{t\ge -p}$ is a $p$th order Markov chain. To exploit this Markovian structure we introduce the $\mathbb{Z}_+^p$-valued process $Y=(Y_t)_{t\ge 0}$ defined by
\[ Y_t = (X_{t-1},X_{t-2},\dots,X_{t-p})', \qquad t\in\mathbb{Z}_+. \tag{1.4} \]
Under $P_{\nu,\theta,G}$ the process $Y$ is a (first-order) Markov chain in $\mathbb{Z}_+^p$. It is easy to see that, in case $g(0)<1$ and $\theta\in(0,1)^p$, the Markov chain $Y$ is irreducible on $\{\alpha,\alpha+1,\dots\}^p$, where $\alpha=\min\{k\in\mathbb{Z}_+\mid g(k)>0\}$. It is also easily seen that, under these conditions, the chain is aperiodic.

Franke and Seligmann (1993) gave conditions for the existence of a (strictly) stationary INAR(1) process using generating functions. Du and Li (1991), Dion et al. (1995), and Latour (1998) proved the existence of a stationary INAR(p) process in case $E_G\varepsilon_0^2<\infty$ and $\sum_{i=1}^p\theta_i<1$. Using only an elementary result on Markov chains, we give an alternative, shorter proof.

Theorem 1.1. For all $G\in\mathcal{G}$ with $g(0)\in[0,1)$, $\mu_G<\infty$, and $\theta\in(0,1)^p$ with $\sum_{i=1}^p\theta_i<1$, there exists a probability measure $\nu_{\theta,G}$ on $\mathbb{Z}_+^p$ such that $X$ is a strictly stationary process under $P_{\nu_{\theta,G},\theta,G}$. The support of $\nu_{\theta,G}$ is given by $\{\alpha,\alpha+1,\dots\}^p$, where $\alpha=\min\{k\in\mathbb{Z}_+\mid g(k)>0\}$.

Remark 1. Clearly, in case $g(0)=1$, a strictly stationary solution is given by $X_t=0$ for all $t$, i.e. $\nu_{\theta,G}=\delta_0$.

Proof.

First note that it suffices to prove that the Markov chain $Y$ (see (1.4)) has a stationary distribution. We prove this for the case $g(0)>0$. The case $g(0)=0$ runs along the same lines.

By $Q^n_{i,j}$ we denote the $n$-step probability of moving from state $i$ to $j$ of the process $Y$, i.e., $Q^n_{i,j}=P_{\delta_i,\theta,G}\{Y_n=j\}$, $i,j\in\mathbb{Z}_+^p$. Since, under the imposed conditions, $Y$ is aperiodic and irreducible on $\mathbb{Z}_+^p$, it suffices, by, for example, Theorem 8.8 in Billingsley (1995), to prove that there exist states $i,j\in\mathbb{Z}_+^p$ for which $Q^n_{i,j}$ does not converge to 0 as $n\to\infty$.

It is easy to see that, for all $t\in\mathbb{Z}_+$, $E_{\delta_0,\theta,G}X_t<\infty$ when $E_G\varepsilon_0<\infty$. We first show that we even have $\sup_{t\in\mathbb{Z}_+}E_{\delta_0,\theta,G}X_t<\infty$. Note that this statement indeed holds true if we can show that $E_{\delta_0,\theta,G}X_t\le\mu_G\sum_{j=0}^t\theta_*^j$, where $\theta_*=\sum_{i=1}^p\theta_i$, which is less than 1 by assumption. Obviously we have $E_{\delta_0,\theta,G}X_{-1}=\cdots=E_{\delta_0,\theta,G}X_{-p}=0$ and $E_{\delta_0,\theta,G}X_0=\mu_G$. Hence the statement holds for $t\in\{-p,\dots,0\}$. Let $N\in\mathbb{Z}_+$. Assuming that $E_{\delta_0,\theta,G}X_t\le\mu_G\sum_{j=0}^t\theta_*^j$ is valid for all $t\in\{-p,\dots,N\}$ we obtain
\[ E_{\delta_0,\theta,G}X_{N+1} = \mu_G+\sum_{i=1}^p\theta_i E_{\delta_0,\theta,G}X_{N+1-i} \le \mu_G+\sum_{i=1}^p\theta_i\,\mu_G\sum_{j=0}^N\theta_*^j = \mu_G\sum_{j=0}^{N+1}\theta_*^j, \]
which concludes the induction argument.
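The induction bound can be checked numerically: started from zero, the means satisfy the linear recursion $E X_t=\mu_G+\sum_{i=1}^p\theta_iE X_{t-i}$, which follows from the conditional mean formula. The following sketch iterates this recursion and compares it with the geometric bound; the function names and the parameter values are illustrative assumptions.

```python
def mean_path(theta, mu_g, horizon):
    """E X_0, ..., E X_horizon for the chain started at X_{-1} = ... = X_{-p} = 0."""
    p = len(theta)
    means = [0.0] * p  # E X_{-p}, ..., E X_{-1}
    for _ in range(horizon + 1):
        lagged = means[::-1][:p]  # E X_{t-1}, ..., E X_{t-p}
        means.append(mu_g + sum(th * m for th, m in zip(theta, lagged)))
    return means[p:]


def geometric_bound(theta, mu_g, t):
    """The bound mu_G * sum_{j=0}^{t} theta_*^j with theta_* = theta_1 + ... + theta_p."""
    theta_star = sum(theta)
    return mu_g * sum(theta_star**j for j in range(t + 1))


theta, mu_g = [0.3, 0.2], 1.5  # illustrative values with theta_* = 0.5 < 1
means = mean_path(theta, mu_g, 60)
for t in (0, 1, 5, 60):
    print(t, means[t], geometric_bound(theta, mu_g, t))
```

In this example both sequences approach $\mu_G/(1-\theta_*)=3$, so the means are uniformly bounded, which is what the Markov inequality step that follows requires.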

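Using $\sup_{t\in\mathbb{Z}_+}E_{\delta_0,\theta,G}X_t<\infty$, Markov’s inequality yields, for $M>0$,
\[ \sup_{t\in\mathbb{Z}_+}P_{\delta_0,\theta,G}\Bigl\{\max_{i=1,\dots,p}X_{t-i}>M\Bigr\} \le \frac{p}{M}\sup_{t\in\mathbb{Z}_+}E_{\delta_0,\theta,G}X_t. \]
Hence there exists $M\in\mathbb{N}$ such that, for all $t\in\mathbb{Z}_+$, $P_{\delta_0,\theta,G}\{\max_{i=1,\dots,p}X_{t-i}\le M\}\ge 1/2$. Define $B_M=\{(x_1,\dots,x_p)\in\mathbb{Z}_+^p\mid\forall i\in\{1,\dots,p\}: x_i\le M\}$; then, for $n\ge 1$,
\[ Q^{n+p}_{0,0} = P_{\delta_0,\theta,G}\{Y_{n+p}=0\} \ge P_{\delta_0,\theta,G}\{Y_n\in B_M,\,Y_{n+p}=0\} = \sum_{\substack{(i_1,\dots,i_{n-1})\in\mathbb{Z}_+^{(n-1)p}\\ i_n\in B_M\\ (i_{n+1},\dots,i_{n+p-1})\in\mathbb{Z}_+^{(p-1)p}}} Q_{0,i_1}\cdots Q_{i_{n+p-1},0} = \sum_{\substack{(i_1,\dots,i_{n-1})\in\mathbb{Z}_+^{(n-1)p}\\ i_n\in B_M}} Q_{0,i_1}\cdots Q_{i_{n-1},i_n}\,Q^p_{i_n,0}. \]
Using $i_n\in B_M$ we obtain the (very crude) bound $Q^p_{i_n,0}\ge\bigl[g(0)(1-\theta^*)^{pM}\bigr]^p$, where $\theta^*=\max_{i=1,\dots,p}\theta_i$. Since
\[ \sum_{\substack{(i_1,\dots,i_{n-1})\in\mathbb{Z}_+^{(n-1)p}\\ i_n\in B_M}} Q_{0,i_1}\cdots Q_{i_{n-1},i_n} = P_{\delta_0,\theta,G}\{X_{n-p}\le M,\dots,X_{n-1}\le M\} \ge \frac12, \]
we obtain, for all $n\in\mathbb{N}$,
\[ Q^{n+p}_{0,0} \ge \bigl[g(0)(1-\theta^*)^{pM}\bigr]^p \sum_{\substack{(i_1,\dots,i_{n-1})\in\mathbb{Z}_+^{(n-1)p}\\ i_n\in B_M}} Q_{0,i_1}\cdots Q_{i_{n-1},i_n} \ge \frac12\bigl[g(0)(1-\theta^*)^{pM}\bigr]^p > 0, \]
so $Q^n_{0,0}$ does not converge to 0 as $n\to\infty$, which concludes the proof.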

Only in special cases is it possible to derive explicit formulas for $\nu_{\theta,G}$. As an example: for $p=1$ and $G=\mathrm{Poisson}(\mu)$ we have $\nu_{\theta,G}=\mathrm{Poisson}(\mu/(1-\theta_1))$ (this is well-known and is, using generating functions, easy to check). For this specific case it is immediate that, under the stationary distribution, all moments exist. In general this is not the case: if, for example, $\sigma_G^2=\infty$ then, under $\nu_{\theta,G}$, $X_0$ cannot have a finite second moment. The next lemma gives sufficient conditions for the existence of moments under the stationary distribution. Oddly enough, it appears that the existence of higher order moments has not been considered before.
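The Poisson example can also be verified numerically rather than through generating functions. The sketch below checks that $\mathrm{Poisson}(\mu/(1-\theta_1))$ is (up to truncation of the state space) invariant under the INAR(1) transition probabilities with Poisson($\mu$) innovations. The truncation level `x_max`, the range `y_max`, and the parameter values are assumptions chosen for the illustration.

```python
from math import comb, exp, factorial


def poisson_pmf(k, lam):
    """Probability that a Poisson(lam) variable equals k."""
    return exp(-lam) * lam**k / factorial(k)


def transition(x, y, theta, mu):
    """P(X_t = y | X_{t-1} = x) for INAR(1) with Poisson(mu) innovations."""
    return sum(
        comb(x, k) * theta**k * (1 - theta) ** (x - k) * poisson_pmf(y - k, mu)
        for k in range(min(x, y) + 1)
    )


def invariance_gap(theta, mu, x_max=60, y_max=15):
    """Largest |sum_x nu(x) P(x, y) - nu(y)| over y <= y_max, with nu = Poisson(mu/(1-theta))."""
    lam = mu / (1 - theta)
    gaps = []
    for y in range(y_max + 1):
        pushed = sum(poisson_pmf(x, lam) * transition(x, y, theta, mu) for x in range(x_max + 1))
        gaps.append(abs(pushed - poisson_pmf(y, lam)))
    return max(gaps)


print(invariance_gap(theta=0.5, mu=1.0))
```

The printed gap is at the level of floating-point rounding, consistent with the fact that thinning a Poisson($\lambda$) variable gives a Poisson($\theta\lambda$) variable and that $\theta\lambda+\mu=\lambda$ exactly when $\lambda=\mu/(1-\theta)$.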

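Lemma 1.2.1. Let $G\in\mathcal{G}$ with $g(0)\in[0,1)$, and $\theta\in(0,1)^p$ with $\sum_{i=1}^p\theta_i<1$. Then, for $k\in\mathbb{N}$, $E_G\varepsilon_0^k<\infty$ if and only if $E_{\nu_{\theta,G},\theta,G}X_0^k<\infty$.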

Proof.

Of course, we only have to prove the ‘only if’ part.

We give the proof for the case $g(0)>0$; for $g(0)=0$ the argument is almost the same. Under the assumption $E_G\varepsilon_0^k<\infty$ the stationary distribution $\nu_{\theta,G}$ exists. Since $Y$ is an irreducible, aperiodic Markov chain with stationary distribution $\nu_{\theta,G}$, we have $\mathcal{L}(Y_t\mid P_{\delta_0,\theta,G})\to\nu_{\theta,G}$. Hence (use the Portmanteau Lemma) we have
\[ E_{\nu_{\theta,G},\theta,G}X_0^k \le \liminf_{t\to\infty}E_{\delta_0,\theta,G}X_t^k \le \sup_{t\ge 0}E_{\delta_0,\theta,G}X_t^k. \]
Thus it suffices to prove
\[ \sup_{t\ge 0}E_{\delta_0,\theta,G}X_t^k<\infty. \tag{1.5} \]
For $k=1$ we have shown in the proof of Theorem 1.1 that $E_{\delta_0,\theta,G}X_t$ is bounded in $t\in\mathbb{Z}_+$. Let $K\ge 2$ and suppose that (1.5) holds for $k=1,\dots,K-1$. If we prove that (1.5) then also holds for $k=K$, the proof of the lemma is complete by induction. If $Z_1,\dots,Z_n$ are i.i.d. Bernoulli($\theta$) variables then we have, for $k\ge 2$, the bound (which easily follows from elementary martingale theory; see, for example, Dharmadhikari et al. (1968))
\[ E\Bigl|\sum_{i=1}^n(Z_i-\theta)\Bigr|^k \le C_k n^{k/2}, \]
where the constant $C_k>0$ only depends on $k$. Using that $\vartheta_i\circ X_{t-i}$, conditional on $X_{t-i}$, follows a Binomial($X_{t-i},\theta_i$) distribution, this yields the following inequality,
\[ E_{\delta_0,\theta,G}\bigl|\vartheta_i\circ X_{t-i}-\theta_iX_{t-i}\bigr|^K \le C_K E_{\delta_0,\theta,G}X_{t-i}^{K/2}. \]
So, using the induction hypothesis ($K/2\le K-1$), we obtain
\[ M = \|\varepsilon_0\|_K + p\sup_{t\in\mathbb{Z}_+,\,1\le i\le p}\bigl(E_{\delta_0,\theta,G}|\vartheta_i\circ X_{t-i}-\theta_iX_{t-i}|^K\bigr)^{1/K} < \infty, \]
where we denote $\|Z\|_K=\bigl(E_{\delta_0,\theta,G}|Z|^K\bigr)^{1/K}$, i.e. the $L_K(P_{\delta_0,\theta,G})$ norm. We prove that, for all $s\ge 0$,
\[ \|X_s\|_K \le \frac{M}{1-\sum_{i=1}^p\theta_i}. \tag{1.6} \]
We have $E_{\delta_0,\theta,G}X_{-i}^K=0$ for $i=1,\dots,p$, and $E_{\delta_0,\theta,G}X_0^K=\|\varepsilon_0\|_K^K$. So (1.6) holds for $s=-p,\dots,0$. Let $t\in\mathbb{N}$. Suppose now that (1.6) holds for $-p\le s\le t-1$. We have
\[ \|X_t\|_K \le \Bigl\|X_t-\sum_{i=1}^p\theta_iX_{t-i}\Bigr\|_K + \Bigl\|\sum_{i=1}^p\theta_iX_{t-i}\Bigr\|_K \le \|\varepsilon_t\|_K + \sum_{i=1}^p\|\vartheta_i\circ X_{t-i}-\theta_iX_{t-i}\|_K + \sum_{i=1}^p\theta_i\|X_{t-i}\|_K \le M + \sum_{i=1}^p\theta_i\,\frac{M}{1-\sum_{i=1}^p\theta_i} \le \frac{M}{1-\sum_{i=1}^p\theta_i}. \]
Hence (1.6) holds for $s=t$. By induction we conclude that (1.6) holds for all $t\in\mathbb{Z}_+$. This completes the proof.

In subsequent chapters we repeatedly have to deal with objects that are built from terms $f(X_{t-p},\dots,X_t)$, i.e. they depend on two consecutive observations of $Y$. Therefore we introduce the process $Z=(Z_t)_{t\ge 0}$, defined by
\[ Z_t = (X_t,\dots,X_{t-p})', \qquad t\in\mathbb{Z}_+. \tag{1.7} \]
It is easy to see that, in case $g(0)<1$ and $\theta\in(0,1)^p$, $Z$ is an irreducible, aperiodic Markov chain on the state space² $\mathcal{Z}=\mathrm{support}(\nu_{\theta,G}\otimes P^{\theta,G})\subset\mathbb{Z}_+^{p+1}$. The next proposition contains some auxiliary results.

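Proposition 1.2.1. Let $G\in\mathcal{G}$ with $g(0)\in(0,1)$, $\mu_G<\infty$, $\theta\in(0,1)^p$ with $\sum_{i=1}^p\theta_i<1$. The following results hold.

1. Let $\nu$ be a probability measure on $\mathbb{Z}_+^p$. If $h:\mathbb{Z}_+^{p+1}\to\mathbb{R}$ satisfies $E_{\nu_{\theta,G},\theta,G}h^2(Z_0)<\infty$, then
\[ \frac{1}{\sqrt n}\sum_{t=0}^n\bigl(h(Z_t)-E_{\theta,G}[h(Z_t)\mid\mathcal{F}_{t-1}]\bigr) \xrightarrow{d} N(0,\sigma^2), \quad\text{under } P_{\nu,\theta,G}, \]
where $\sigma^2$ is given by
\[ \sigma^2 = E_{\nu_{\theta,G},\theta,G}h^2(Z_0) - E_{\nu_{\theta,G},\theta,G}\bigl(E_{\theta,G}[h(Z_0)\mid\mathcal{F}_{-1}]\bigr)^2. \]

² As usual, $\nu\otimes P^{\theta,G}$ denotes the joint distribution of (X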

2. Let $C>0$. The Markov chain $Z$ is $V_1$-uniformly ergodic³ for $V_1:\mathbb{Z}_+^{p+1}\to[1,\infty)$ given by $V_1(Z_t)=1+C\sum_{i=0}^pX_{t-i}$. If also $\sigma_G^2<\infty$ then $Z$ is also $V_2$-uniformly ergodic for $V_2(Z_t)=1+C\bigl(\sum_{i=0}^pX_{t-i}\bigr)^2$.

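3. Add the assumption $\sigma_G^2<\infty$. Let $\nu$ be a probability measure on $\mathbb{Z}_+^p$. Let $K$ be a compact subset of $\mathbb{R}^k$. Let, for every $\kappa\in K$, $f(\cdot;\kappa):\mathbb{Z}_+^{p+1}\to\mathbb{R}$ be such that
\[ \sup_{\kappa\in K}\bigl|f(x_{-p},\dots,x_0;\kappa)\bigr| \le C\Bigl(\sum_{i=0}^px_{-i}\Bigr)^2, \]
for some constant $C>0$, and such that for every $x_{-p},\dots,x_0\in\mathbb{Z}_+$ the map $\kappa\mapsto f(x_{-p},\dots,x_0;\kappa)$ is continuous. Then we have, under $P_{\nu,\theta,G}$,
\[ \sup_{\kappa\in K}\Bigl|\frac1n\sum_{t=0}^nf(X_{t-p},\dots,X_t;\kappa)-E_{\nu_{\theta,G},\theta,G}f(X_{-p},\dots,X_0;\kappa)\Bigr| \xrightarrow{p} 0. \tag{1.8} \]
And, for $K\ni\kappa_n\to\kappa_0$,
\[ \frac1n\sum_{t=0}^nf(X_{t-p},\dots,X_t;\kappa_n) \xrightarrow{p} E_{\nu_{\theta,G},\theta,G}f(X_{-p},\dots,X_0;\kappa_0), \quad\text{under } P_{\nu,\theta,G}. \tag{1.9} \]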

4. Under $P_{\nu_{\theta,G},\theta,G}$ the $\beta$-mixing (also called: absolute regularity mixing) coefficients⁴ of $Z$ satisfy $\beta(n)\le C\rho^n$, for all $n\in\mathbb{N}$, for some constants $C>0$ and $0<\rho<1$.

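5. Add the assumption $E_G\varepsilon_1^3<\infty$. Let $\mathbb{Z}_n$ denote the empirical process of $Z$, i.e. for $f:\mathbb{Z}_+^{p+1}\to\mathbb{R}$ satisfying $E_{\nu_{\theta,G},\theta,G}f^2(Z_0)<\infty$,
\[ \mathbb{Z}_nf = \frac{1}{\sqrt n}\sum_{t=0}^n\bigl(f(Z_t)-E_{\nu_{\theta,G},\theta,G}f(Z_0)\bigr). \]
Let $\mathcal{F}$ be a collection of $\mathbb{R}$-valued functions on $\mathbb{Z}_+^{p+1}$ with, for some $C>0$, $\sup_{f\in\mathcal{F}}|f(x_{-p},\dots,x_0)|\le C(x_{-p}+\cdots+x_0)$, and such that its bracketing numbers⁵ with respect to the $L_2(\nu_{\theta,G}\otimes P^{\theta,G})$-norm, denoted by $N_{[\,]}(\delta,\mathcal{F})$, $\delta>0$, satisfy
\[ \log N_{[\,]}(x,\mathcal{F}) = O(x^{-2\zeta}), \]
with $\zeta\in(0,1)$. Then the process $\{\mathbb{Z}_nf\mid f\in\mathcal{F}\}$ weakly converges, under $P_{\nu_{\theta,G},\theta,G}$, in $\ell^\infty(\mathcal{F})$ to a tight Gaussian process.

6. Define $V:\mathbb{Z}_+^p\to[1,\infty)$ by $V(x_{-1},\dots,x_{-p})=1+\sum_{i=1}^pa_ix_{-i}$, where $a_i=\theta_i+\cdots+\theta_p$ for $i=1,\dots,p$. Let $(\theta_n,G_n)$ be a sequence with, for all $n\in\mathbb{N}$, $\theta_{n,i}>0$ for all $i$, $\sum_{i=1}^p\theta_{n,i}<1$, $g_n(0)\in(0,1)$ and $E_{G_n}\varepsilon_0<\infty$. Then
\[ \lim_{n\to\infty}\sup_{y\in\mathbb{Z}_+^p}\sup_{f:|f|\le V}\frac{\bigl|E_{\delta_y,\theta_n,G_n}f(Y_1)-E_{\delta_y,\theta,G}f(Y_1)\bigr|}{V(y)} = 0 \]
implies
\[ \lim_{n\to\infty}\sup_{f:|f|\le V}\Bigl|\int f\,\mathrm{d}\nu_{\theta,G}-\int f\,\mathrm{d}\nu_{\theta_n,G_n}\Bigr| = 0. \]

³ Recall that the Markov chain $Z$ is $V$-uniformly ergodic, with $V:\mathcal{Z}\to[1,\infty)$, if $\sup_{z\in\mathcal{Z}}\sup_{f:|f|\le V}\bigl|E_{\theta,G}[f(Z_t)\mid Z_0=z]-E_{\nu_{\theta,G},\theta,G}f(Z_0)\bigr|/V(z)\to 0$ as $t\to\infty$.

⁴ For the definition see, for example, Davydov (1973) or Doukhan (1994, page 3 and pages 87-88).

⁵ A bracket is a pair of elements $[f,g]$ of $L_2(\nu_{\theta,G}\otimes P^{\theta,G})$ such that $f\le g$. For $\delta>0$ the bracketing number $N_{[\,]}(\delta,\mathcal{F})$ is the smallest cardinality of collections $S(\delta)$ of brackets such that for all $f\in\mathcal{F}$ there exists $[g,h]\in S(\delta)$ such that $g\le f\le h$ and $\int(h-g)\,\mathrm{d}(\nu_{\theta,G}\otimes P^{\theta,G})\le\delta^2$.

Proof.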

Proof of Part 1: Since $Y$ is a positive recurrent Markov chain, this follows from Theorem 4.3.16 in Dacunha-Castelle and Duflo (1986) in case $\nu=\delta_y$. From this, the result extends to arbitrary $\nu$ by looking at pointwise convergence of characteristic functions, conditioning on the initial value and using dominated convergence.

Proof of Part 2: We consider the $V_1$-uniform ergodicity first. Introduce $V:\mathbb{Z}_+^{p+1}\to[1,\infty)$ given by $V(Z_t)=1+\sum_{i=1}^{p+1}a_iX_{t+1-i}$, with $a_i=\theta_i+\cdots+\theta_p$ for $i=1,\dots,p$ and $a_{p+1}=(1-a_1)/2$. If we verify that there exists a constant $\delta>0$ such that we have, for all $z_{t-1}\in\mathcal{Z}$, except for some finite set, the inequality
\[ E_{\theta,G}\bigl[V(Z_t)\mid Z_{t-1}=z_{t-1}=(x_{t-1},\dots,x_{t-p-1})'\bigr]-V(z_{t-1}) \le -\delta V(z_{t-1}), \]
i.e. that a Foster-Lyapunov drift criterion holds, we obtain the first result from Meyn and Tweedie (1994, Theorem 16.0.1). Let $0<\delta<(1-a_1)(\min_{j=1,\dots,p}\theta_j)/2<1$. We have
\[ (1-\delta)V(z_{t-1})-E_{\theta,G}[V(Z_t)\mid Z_{t-1}=z_{t-1}] = a+(1-\delta)a_{p+1}x_{t-p-1}+\sum_{i=1}^p\bigl[(1-\delta)a_i-c_i\bigr]x_{t-i}, \]
for some constant $a$ and $c_i$ given by $c_i=a_1\theta_i+a_{i+1}$, $i=1,\dots,p$. We show that, for $i=1,\dots,p$, $(1-\delta)a_i-c_i>0$, which implies that
\[ (1-\delta)V(z_{t-1})-E_{\theta,G}[V(Z_t)\mid Z_{t-1}=z_{t-1}]>0 \]
outside a finite set. We have, for $i=1,\dots,p-1$ (use $a_i=\theta_i+a_{i+1}$),
\[ (1-\delta)a_i-a_1\theta_i-a_{i+1} = -\delta a_i+(1-a_1)\theta_i > -\delta+(1-a_1)\min_{j=1,\dots,p}\theta_j > 0, \]
and $(1-\delta)a_p-a_1\theta_p-a_{p+1}>-\delta+0.5\,\theta_p(1-a_1)/2$, which concludes the proof of the $V_1$-uniform ergodicity.

Next we prove the $V_2$-uniform ergodicity. Introduce $\tilde V:\mathbb{Z}_+^{p+1}\to[1,\infty)$ given by $\tilde V(Z_t)=1+\bigl(\sum_{i=1}^{p+1}a_iX_{t+1-i}\bigr)^2$, with $a_i=\theta_i+\cdots+\theta_p$ for $i=1,\dots,p$, and $a_{p+1}=(1-a_1^2)(\min_j\theta_j)^2/4$. Let $0<\delta<a_{p+1}/4$. After some calculus we find
\[ E_{\theta,G}\bigl[\tilde V(Z_t)\mid Z_{t-1}=z_{t-1}\bigr] = a+\sum_{i=1}^p\alpha_iX_{t-i}+\sum_{i=1}^p\sum_{j=1}^p\beta_{ij}X_{t-i}X_{t-j}, \]
where $\beta_{ij}=a_1^2\theta_i\theta_j+\theta_ia_1a_{j+1}+\theta_ja_1a_{i+1}+a_{i+1}a_{j+1}$ for $i,j=1,\dots,p$. We show that $(1-\delta)a_ia_j-\beta_{ij}>0$ for $i,j=1,\dots,p$. Using that, for $i,j=1,\dots,p-1$,
\[ a_ia_j = \theta_i\theta_j+\theta_i(\theta_{j+1}+\cdots+\theta_p)+\theta_j(\theta_{i+1}+\cdots+\theta_p)+a_{i+1}a_{j+1}, \]
and $a_ia_j<(\theta_i+\cdots+\theta_p)$, we obtain, for $i,j=1,\dots,p-1$,
\[ (1-\delta)a_ia_j-\beta_{ij} > (1-a_1^2)\theta_i\theta_j-\delta a_ia_j \ge (1-a_1^2)(\min_j\theta_j)^2-\delta > 0. \]
For $i=1,\dots,p-1$ we have $(1-\delta)a_ia_p-\beta_{ip}>(1-a_1^2)\theta_i\theta_p-a_{p+1}-\delta>0$, and $(1-\delta)a_p^2-\beta_{pp}>-\delta+(1-a_1^2)\theta_p^2-a_{p+1}(2a_1\theta_p+a_{p+1})>0$. We conclude that, outside a finite set, we have
\[ (1-\delta)\tilde V(z_{t-1})-E_{\theta,G}\bigl[\tilde V(Z_t)\mid Z_{t-1}=z_{t-1}\bigr]>0. \]
So another application of the drift criterion in Meyn and Tweedie (1994, Theorem 16.0.1) shows that $Z$ is $V_2$-uniformly ergodic.

Proof of Part 3: Since $Z$ is $V_2$-uniformly ergodic, a combination of Part 2 with Meyn and Tweedie (1994, Theorem 16.0.1) yields a constant $M>0$ and $\rho\in(0,1)$ such that, for all $t\in\mathbb{Z}_+$,
\[ \sup_{z\in\mathcal{Z}}\sup_{f:|f|\le V_2}\frac{\bigl|E_{\theta,G}[f(Z_t)\mid Z_0=z]-E_{\nu_{\theta,G},\theta,G}f(Z_0)\bigr|}{V_2(z)} \le M\rho^t. \]
Using that $V_2(Z_0)$ is $P_{\nu_{\theta,G},\theta,G}$-integrable (by Lemma 1.1) we easily obtain
\[ \lim_{t\to\infty}\sup_{f:|f|\le V_2}\bigl|E_{\nu,\theta,G}f(Z_t)-E_{\nu_{\theta,G},\theta,G}f(Z_0)\bigr| = 0. \tag{1.10} \]
Display (1.10) yields, for $M>0$,
\[ \lim_{t\to\infty}E_{\nu,\theta,G}V_2(Z_t)1\{V_2(Z_t)\ge M\} = E_{\nu_{\theta,G},\theta,G}V_2(Z_0)1\{V_2(Z_0)\ge M\}. \]
Hence we obtain, for $M>0$,
\[ \lim_{n\to\infty}\frac1n\sum_{t=0}^nE_{\nu,\theta,G}V_2(Z_t)1\{V_2(Z_t)\ge M\} = E_{\nu_{\theta,G},\theta,G}V_2(Z_0)1\{V_2(Z_0)\ge M\}. \]
Dominated convergence now yields
\[ \lim_{M\to\infty}\lim_{n\to\infty}\frac1n\sum_{t=0}^nE_{\nu,\theta,G}V_2(Z_t)1\{V_2(Z_t)\ge M\} = 0. \]
Hence Assumption DM in Andrews (1992) is satisfied. Assumption TSE-1B in that paper is also satisfied, by the compactness of $K$, by the continuity of $\kappa\mapsto f(z;\kappa)$, and because we have, from (1.10),
\[ \lim_{n\to\infty}\frac1n\sum_{t=0}^nP_{\nu,\theta,G}\{Z_t\in A\} = \nu_{\theta,G}\otimes P^{\theta,G}(A), \quad\text{for } A\subset\mathbb{Z}_+^{p+1}. \]
A combination of the law of large numbers for Markov chains (see, for example, Dacunha-Castelle and Duflo (1986, Theorem 4.3.15)) with Theorem 4 in Andrews (1992) now yields
\[ \sup_{\kappa\in K}\Bigl|\frac1n\sum_{t=0}^nf(X_{t-p},\dots,X_t;\kappa)-E_{\nu,\theta,G}f(X_{-p},\dots,X_0;\kappa)\Bigr| \xrightarrow{p} 0, \quad\text{under } P_{\nu,\theta,G}. \]
This yields (1.8), since (1.10) yields
\[ \sup_{\kappa\in K}\bigl|E_{\nu,\theta,G}f(X_{t-p},\dots,X_t;\kappa)-E_{\nu_{\theta,G},\theta,G}f(X_{-p},\dots,X_0;\kappa)\bigr| \to 0 \quad\text{as } t\to\infty. \]
Display (1.9) follows by dominated convergence.

Proof of Part 4: LetQndenote then-step transition-operator ofZ (drop the su-perscriptθ,G). From well-known results on mixing-numbers for Markov chains (see, for example, Doukhan (1994, pages 87-89) it follows that it is sufficient to prove that there exists a functionA:Z₊p+1→(0,∞) such thatR Ad(νθ,G⊗Pθ,G)<

∞and

kQn(z,·)−ν_θ_,_G⊗Pθ,Gk_TV≤A(z)ρn, z∈Z, (1.11) for some 0<ρ<1, wherek · kTVthe total variational norm of a signed measure.

By Part 2Z isV1-uniformly ergodic. Meyn and Tweedie (1994, Theorem 16.0.1)

now yields, for a constantM>0 andρ∈(0, 1), such that for allt∈Z+,

sup z∈Z sup_f_:_|_f_|≤V₂¯¯Eθ,G[f(Zt)|Z0=z]−Eνθ,G,θ,Gf(Z0) ¯ ¯ V1(z) ≤Mρt.

SinceE_ν_θ_,_G_,_θ_,_GV₁(Z₀)< ∞(by Lemma 1.1) andV ≥1 (1.11) immediately follows. Proof of Part 5: this follows from Part (1) and Doukhan et al. (1995, Theorem 1, Application 4, and Display (2.16)). In their setup proceed as follows. Taker = 3/2, notice that, using Markov’s inequality andEνθ,GX

3

(28)

envelope belongs to $\Lambda_3(P)=\Lambda_{x\sqrt{x}}(P)$. Next, note that (2.16) in Doukhan et al. (1995) is satisfied, since we have $\sum_{n=1}^{\infty}n^{-1/2}\beta(n)^{(1-\zeta)(r-1)/(2r)}<\infty$, by Part 4.

Proof of Part 6: Analogously to the proof of Part 2, it follows that the Markov chain $Y$ on $\mathbb{Z}_+^p$ is $V$-uniformly ergodic for $V(Y_t)=1+\sum_{i=1}^{p}a_iX_{t-i}$, $a_i=\theta_i+\dots+\theta_p$. An application of Kartashov (1985, Theorem B) yields that $Y$ is strongly stable in this norm, i.e. that Part 6 holds.
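The drift computation behind this choice of $V$ can be sketched as follows. This is only an illustration of the geometric drift inequality, under the assumptions (not restated in the proof above) that $Y_t=(X_{t-1},\dots,X_{t-p})$, that $G$ has finite mean $\mu_G$, and with the conventions $a_{p+1}:=0$ and $\lambda$ as defined below; the remaining ingredients of a drift criterion (such as petiteness of level sets) are not covered here.

```latex
\begin{align*}
E\bigl[V(Y_{t+1})\mid Y_t\bigr]
  &= 1 + a_1\Bigl(\sum_{i=1}^{p}\theta_i X_{t-i} + \mu_G\Bigr)
       + \sum_{i=1}^{p-1} a_{i+1} X_{t-i} \\
  &= 1 + a_1\mu_G + \sum_{i=1}^{p}\bigl(a_1\theta_i + a_{i+1}\bigr) X_{t-i}.
\end{align*}
% Since a_1 = \theta_1 + \dots + \theta_p < 1 and \theta_i > 0, we have
% a_1\theta_i + a_{i+1} < \theta_i + a_{i+1} = a_i for every i. Hence, with
\[
\lambda := \max_{1\le i\le p}\frac{a_1\theta_i + a_{i+1}}{a_i} < 1,
\qquad
E\bigl[V(Y_{t+1})\mid Y_t\bigr] \le \lambda V(Y_t) + (1-\lambda) + a_1\mu_G .
\]
```

The weights $a_i$ are thus chosen so that each coefficient of $X_{t-i}$ shrinks strictly after one step, which is what the stationarity condition $\sum_{i=1}^{p}\theta_i<1$ makes possible.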

Let us briefly comment on this proposition. Part 1 is stated for easy reference; its purpose is clear. In Chapter 2, where we discuss the LAN-property for parametric INAR models, we encounter remainder terms which we will handle with Part 3. We proved this uniform law of large numbers by exploiting a high-level result of Andrews (1992). The $V$-uniform ergodicity, Part 2, which we prove by a drift criterion, appears to be new to the literature. Using this property, Part 4, Part 5 and Part 6 follow quite easily from the literature. Part 5 is used in Chapter 3 to demonstrate weak convergence of the infinite-dimensional part of a ‘score-process’. And Part 6 shows that, in appropriate topologies, the stationary distribution $\nu_{\theta,G}$ is a continuous mapping of $(\theta,G)$.


models

This chapter considers parametric INAR($p$) models: $G$ belongs to a parametric class of distributions, say $(G_\alpha\mid\alpha\in A\subset\mathbb{R}^q)$. Estimators of the parameters are provided by several authors. For $p=1$ and $G_\alpha=\text{Poisson}(\alpha)$, Franke and Seligmann (1993) analyzed maximum likelihood. Du and Li (1991) and Freeland and McCabe (2005) derived the limit distribution of the OLS estimator of $\theta$. Brännäs and Hellström (2001) considered GMM estimation, Silva and Oliveira (2004) proposed a frequency-domain-based estimator of $\theta$, and Silva and Silva (2006) considered a Yule-Walker estimator. Jung et al. (2005) analyzed, by a Monte Carlo study, the finite-sample behavior of several estimators for the case $p=1$. Zheng et al. (2006) analyzed random coefficient INAR($p$) processes. And Enciso-Mora et al. (2006) and Neal and Subba Rao (2007) considered MCMC estimation.

In this chapter we are interested in asymptotically efficient estimation of the parameters in an INAR($p$) model. Maximum likelihood is, in general, considered to be computationally unattractive, since the transition densities are convolutions of $p+1$ distributions. The main result of this chapter is that parametric INAR models enjoy the Local Asymptotic Normality property. A key step, which makes the analysis tractable, is a certain conditional expectation representation of the transition-scores. This representation is motivated by an information-loss interpretation of the model. As a consequence of the LAN-property, we obtain an efficient estimator of $(\theta,\alpha)$ if a $\sqrt{n}$-consistent estimator is available. We prove that such an initial estimator always exists. This yields a computationally attractive and efficient estimator.

2.1 Local Asymptotic Normality

We always restrict ourselves to the stationary parameter regime, i.e., $\theta\in(0,1)^p$ with $\sum_{i=1}^{p}\theta_i<1$ (see Chapter 4 for the asymptotic structure of an INAR(1) model


immigration-distribution $G$ and the initial distribution $\nu$ are completely known. Observing $(X_{-p},\dots,X_n)$ leads to the following sequence of statistical experiments

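$$\mathcal{E}_1^{(n)}(\nu,G)=\left(\mathbb{Z}_+^{n+1+p},\ 2^{\mathbb{Z}_+^{n+1+p}},\ \left(P^{(n)}_{\nu,\theta,G}\mid\theta\in\Theta\right)\right),\quad n\in\mathbb{Z}_+,$$
where the initial distribution $\nu$ and the immigration distribution $G\in\mathcal{G}$ are fixed, $\Theta=\{\theta\in(0,1)^p\mid\sum_{i=1}^{p}\theta_i<1\}$, and $P^{(n)}_{\nu,\theta,G}$ denotes the law of $(X_{-p},\dots,X_n)$ on the measurable space $\left(\mathbb{Z}_+^{n+1+p},\ 2^{\mathbb{Z}_+^{n+1+p}}\right)$ under $P_{\nu,\theta,G}$. In this model $G$ is completely known, but we also want to consider the case where $G$ belongs to a parametric model, for example, $G=\text{Poisson}(\alpha)$. So let $A\subset\mathbb{R}^q$ and $\mathcal{G}_A=(G_\alpha)_{\alpha\in A}$ be a family of elements in $\mathcal{G}$, such that $\alpha\mapsto G_\alpha$ is sufficiently smooth (this will be made precise later). We then consider the sequence of experiments, induced by observing $(X_{-p},\dots,X_n)$,
$$\mathcal{E}_2^{(n)}(\nu,\mathcal{G}_A)=\left(\mathbb{Z}_+^{n+1+p},\ 2^{\mathbb{Z}_+^{n+1+p}},\ \left(P^{(n)}_{\nu,\theta,\alpha}\mid\theta\in\Theta,\alpha\in A\right)\right),\quad n\in\mathbb{Z}_+,$$
where, for notational convenience, we abbreviate $G_\alpha$ in sub- and superscripts by $\alpha$. In particular, $\nu_{\theta,G_\alpha}$ is denoted by $\nu_{\theta,\alpha}$. In this section we prove the LAN-property for the sequence of experiments $\mathcal{E}_2^{(n)}(\nu,\mathcal{G}_A)$, $n\in\mathbb{Z}_+$, immediately implying the LAN-property for the sequence of experiments $\mathcal{E}_1^{(n)}(\nu,G)$, $n\in\mathbb{Z}_+$.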

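Let $\mathcal{G}_A=(G_\alpha\mid\alpha\in A)$ be a parametric family of innovation distributions, where $A$ is an open, convex subset of $\mathbb{R}^q$ such that,

(A1) the support of $G_\alpha$ does not depend on $\alpha$ and we have $0<g_\alpha(0)<1$;

(A2) for all $e\in\mathbb{Z}_+$ and $\alpha\in A$, the expressions
$$h_\alpha(e)=\frac{\partial}{\partial\alpha}\log\bigl(g_\alpha(e)\bigr)1_{(0,1]}\bigl(g_\alpha(e)\bigr)\in\mathbb{R}^q,\qquad \dot h_\alpha(e)=\frac{\partial^2}{\partial\alpha^T\partial\alpha}\log\bigl(g_\alpha(e)\bigr)1_{(0,1]}\bigl(g_\alpha(e)\bigr)\in\mathbb{R}^{q\times q},$$
are defined and, for all $e\in\mathbb{Z}_+$, they are continuous in $\alpha$;

(A3) for every $(\theta,\alpha)\in\Theta\times A$, there exist $\delta>0$ and a constant $C>0$ such that
$$\sup_{(\tilde\theta,\tilde\alpha):\,|(\tilde\theta,\tilde\alpha)-(\theta,\alpha)|<\delta}E_{\tilde\theta,\tilde\alpha}\left[|h_{\tilde\alpha}(\varepsilon_0)|^2\mid X_0,\dots,X_{-p}\right]\le C\left(\sum_{i=0}^{p}X_{-i}\right)^2, \tag{2.1}$$
and, for $i,j=1,\dots,q$,
$$\sup_{(\tilde\theta,\tilde\alpha):\,|(\tilde\theta,\tilde\alpha)-(\theta,\alpha)|<\delta}E_{\tilde\theta,\tilde\alpha}\left[\bigl|\dot h_{\tilde\alpha,ij}(\varepsilon_0)\bigr|\mid X_0,\dots,X_{-p}\right]\le C\left(\sum_{i=0}^{p}X_{-i}\right)^2; \tag{2.2}$$

(A4) the information-equality $E_\alpha h_\alpha h_\alpha^T(\varepsilon_0)=-E_\alpha\dot h_\alpha(\varepsilon_0)$ is satisfied, and the $q\times q$ matrix $E_\alpha h_\alpha h_\alpha^T(\varepsilon_0)$ is non-singular and continuous in $\alpha$;

(A5) $E_\alpha\varepsilon_0^2<\infty$ for $\alpha\in A$;

(A6) $G_\alpha=G_{\alpha_0}$ implies $\alpha=\alpha_0$.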

Remark 2. Assumption (A1) is necessary to make sure that the INAR process can reach state 0. This is a reasonable assumption for virtually all applications. From a technical point of view, this assumption will help us to prove invertibility of the Fisher information.

Remark 3. It is well-known that Assumptions (A2) and (A4) ensure that $\mathcal{G}_A$ is differentiable in quadratic mean with score $h_\alpha(\varepsilon_0)$ (see, for example, Lemma 7.6 in Van der Vaart (2000)) and consequently $E_\alpha h_\alpha(\varepsilon_0)=0$, see the proof of Theorem 7.2 in Van der Vaart (2000).

Remark 4. Assumptions (A1)-(A6) are of the Cramér-type. Conditions (2.1) and (2.2) in Assumption (A3) are rather awkward. A simple sufficient condition is given by $|h_{\alpha,i}(e)|\le a_\alpha+c_\alpha e$ and $|\dot h_{\alpha,ij}(e)|\le b_\alpha+d_\alpha e^2$ for $a_\alpha$, $b_\alpha$, $c_\alpha$ and $d_\alpha$ that are (locally) bounded in $\alpha$. Now it is easy to see that the (in the literature often-used) example $A=(0,\infty)$ and $G_\alpha=\text{Poisson}(\alpha)$ satisfies the conditions above. We note that in (2.1) and (2.2) the upper bound $C\left(\sum_{i=0}^{p}X_{-i}\right)^2$ can be replaced by $P_{\nu_{\theta,\alpha},\theta,\alpha}$-integrable variables $M_1^{\theta,\alpha}$ and $M_2^{\theta,\alpha}$ in case the initial distribution $\nu$ has finite support (in that case ergodicity instead of $V_2$-uniform ergodicity (see Proposition 1.2.1.2) suffices).
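For the Poisson example mentioned in this remark, the relevant quantities can be written out explicitly. The following is a short worked check rather than part of the original argument; the specific constants chosen for $a_\alpha$, $b_\alpha$, $c_\alpha$, $d_\alpha$ are one possible choice.

```latex
% Poisson(\alpha), \alpha \in A = (0,\infty):  g_\alpha(e) = e^{-\alpha}\alpha^{e}/e!
\begin{align*}
\log g_\alpha(e) &= -\alpha + e\log\alpha - \log e!, \\
h_\alpha(e) &= \frac{e}{\alpha} - 1, \qquad
  |h_\alpha(e)| \le 1 + \frac{e}{\alpha}
  \quad (a_\alpha = 1,\ c_\alpha = 1/\alpha), \\
\dot h_\alpha(e) &= -\frac{e}{\alpha^{2}}, \qquad
  |\dot h_\alpha(e)| \le \frac{e^{2}}{\alpha^{2}}
  \quad (b_\alpha = 0,\ d_\alpha = 1/\alpha^{2}), \\
E_\alpha h_\alpha^{2}(\varepsilon_0)
  &= \frac{\operatorname{Var}_\alpha(\varepsilon_0)}{\alpha^{2}}
   = \frac{1}{\alpha}
   = -E_\alpha \dot h_\alpha(\varepsilon_0).
\end{align*}
% Support \mathbb{Z}_+ does not depend on \alpha, and g_\alpha(0) = e^{-\alpha} \in (0,1).
```

The bound on $\dot h_\alpha$ uses $e\le e^2$ for $e\in\mathbb{Z}_+$, and all four constants are locally bounded on $(0,\infty)$.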

To see that the sequence of experiments $(\mathcal{E}_2^{(n)}(\nu,\mathcal{G}_A))_{n\in\mathbb{Z}_+}$ has the LAN-property, we need to determine the asymptotic behavior of a localized log-likelihood ratio. To that end we first write down the likelihood. By the $p$-th order Markov structure, the likelihood is given by
$$L_n(\theta,\alpha\mid X_{-p},\dots,X_n)=\nu\{X_{-1},\dots,X_{-p}\}\prod_{t=0}^{n}P^{\theta,\alpha}_{(X_{t-1},\dots,X_{t-p}),X_t}.$$
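To make the structure of this likelihood concrete, here is a minimal Python sketch that evaluates a transition probability as a convolution of $p$ binomial distributions with the immigration distribution, and sums the logarithms of the transition probabilities. It assumes Poisson($\alpha$) immigration, and it drops the factor for the initial distribution $\nu$; the function names are hypothetical and chosen only for this illustration.

```python
import math

def binom_pmf(x, theta):
    """pmf of Bin(x, theta) as a list indexed by k = 0..x."""
    return [math.comb(x, k) * theta**k * (1 - theta)**(x - k) for k in range(x + 1)]

def poisson_pmf(alpha, upto):
    """pmf of Poisson(alpha) on 0..upto (assumed immigration law)."""
    return [math.exp(-alpha) * alpha**e / math.factorial(e) for e in range(upto + 1)]

def convolve(a, b, upto):
    """Convolution of two pmfs on the nonnegative integers, kept on 0..upto."""
    out = [0.0] * (upto + 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j <= upto:
                out[i + j] += ai * bj
    return out

def transition_prob(past, x_t, theta, alpha):
    """P{X_t = x_t | X_{t-1}, ..., X_{t-p}}, where past[i-1] = X_{t-i}."""
    pmf = poisson_pmf(alpha, x_t)
    for x_lag, th in zip(past, theta):
        # Truncating at x_t is exact here: all summands are nonnegative integers.
        pmf = convolve(pmf, binom_pmf(x_lag, th), x_t)
    return pmf[x_t]

def log_likelihood(xs, theta, alpha):
    """Sum of log transition probabilities; the first p values are conditioned on."""
    p = len(theta)
    total = 0.0
    for t in range(p, len(xs)):
        past = [xs[t - i] for i in range(1, p + 1)]
        total += math.log(transition_prob(past, xs[t], theta, alpha))
    return total
```

For instance, `log_likelihood([1, 0, 2, 1, 3], [0.3, 0.2], 1.0)` evaluates the conditional log-likelihood of a short INAR(2) path. Each transition probability costs a chain of $p$ convolutions, which illustrates why direct maximum likelihood is regarded as computationally unattractive for larger $p$.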

Since the likelihood is extremely smooth in $(\theta,\alpha)$, it seems appropriate to establish the LAN-property directly, using a Taylor expansion. This is the path we take. To obtain useful expressions for the transition-scores for $\theta$ and $\alpha$, we briefly discuss how the model can be viewed as an information-loss model. Suppose that, instead of just observing $X_{-p},\dots,X_n$, we would also be able to observe $\vartheta_i\circ X_{t-i}$, $i=1,\dots,p$, $t=0,\dots,n$. Then $\varepsilon_t=X_t-\sum_{i=1}^{p}\vartheta_i\circ X_{t-i}$ also belongs to the information set at time $t$, just as in the classical AR($p$) model. In our model, with only observations on $X_{-p},\dots,X_n$, this does not hold true; there is loss of information. The ‘information-loss principle’, see for example Le Cam and Yang (1988) or Bickel et al. (1998, Proposition A.5.5), suggests that the transition-score for $\theta_i$ in the model where we only observe $X_{t-p},\dots,X_t$ equals the conditional expectation, given $X_{t-p},\dots,X_t$, of the transition-score for $\theta_i$ in the model with also observations on $\vartheta_i\circ X_{t-i}$, $i=1,\dots,p$. It is not difficult to see that the transition-score for $\theta_i$ in the model with the additional observations $\vartheta_i\circ X_{t-i}$ is nothing but the score of a $\text{Bin}_{X_{t-i},\theta_i}$ distribution. Recall that the score of a $\text{Bin}_{x,\theta}$ distribution is given by, for $\theta\in(0,1)$,
$$\dot s_{x,\theta}(k)=\left(\frac{\partial}{\partial\theta}\log b_{x,\theta}(k)\right)1_{(0,1]}\bigl(b_{x,\theta}(k)\bigr)=\frac{k-\theta x}{\theta(1-\theta)}1_{\{0,\dots,x\}}(k),\quad x\in\mathbb{Z}_+. \tag{2.3}$$
Hence, the information-loss structure suggests that the transition-score for $\theta_i$ in our model equals
$$E_{\theta,\alpha}\left[\dot s_{X_{t-i},\theta_i}(\vartheta_i\circ X_{t-i})\mid X_t,\dots,X_{t-p}\right].$$

Similarly, the transition-score for $\alpha$ is conjectured to be equal to
$$E_{\theta,\alpha}\left[h_\alpha(\varepsilon_t)\mid X_t,\dots,X_{t-p}\right].$$
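The two conjectured conditional-expectation representations can be checked numerically in a simple special case. The sketch below assumes $p=1$ and Poisson($\alpha$) immigration, for which the immigration score is $e/\alpha-1$, and compares each conditional expectation with a central finite difference of the log transition probability. All function names are hypothetical, and this is an illustration, not a proof.

```python
import math

def binom_pmf(x, theta, k):
    """Bin(x, theta) pmf at k (zero outside 0..x)."""
    if k < 0 or k > x:
        return 0.0
    return math.comb(x, k) * theta**k * (1 - theta)**(x - k)

def poisson_pmf(alpha, e):
    """Poisson(alpha) pmf at e (zero for negative e)."""
    if e < 0:
        return 0.0
    return math.exp(-alpha) * alpha**e / math.factorial(e)

def trans_prob(x_prev, x, theta, alpha):
    """INAR(1) transition probability: Bin(x_prev, theta) convolved with Poisson(alpha)."""
    return sum(binom_pmf(x_prev, theta, k) * poisson_pmf(alpha, x - k)
               for k in range(min(x_prev, x) + 1))

def score_theta(x_prev, x, theta, alpha):
    """Conditional expectation of the binomial score given (X_{t-1}, X_t) = (x_prev, x)."""
    num = sum((k - theta * x_prev) / (theta * (1 - theta))
              * binom_pmf(x_prev, theta, k) * poisson_pmf(alpha, x - k)
              for k in range(min(x_prev, x) + 1))
    return num / trans_prob(x_prev, x, theta, alpha)

def score_alpha(x_prev, x, theta, alpha):
    """Conditional expectation of the Poisson score e/alpha - 1 given (x_prev, x)."""
    num = sum((e / alpha - 1.0) * poisson_pmf(alpha, e) * binom_pmf(x_prev, theta, x - e)
              for e in range(x + 1))
    return num / trans_prob(x_prev, x, theta, alpha)

def numeric_grad(x_prev, x, theta, alpha, h=1e-6):
    """Central finite differences of the log transition probability in theta and alpha."""
    d_theta = (math.log(trans_prob(x_prev, x, theta + h, alpha))
               - math.log(trans_prob(x_prev, x, theta - h, alpha))) / (2 * h)
    d_alpha = (math.log(trans_prob(x_prev, x, theta, alpha + h))
               - math.log(trans_prob(x_prev, x, theta, alpha - h))) / (2 * h)
    return d_theta, d_alpha
```

Comparing `numeric_grad(4, 3, 0.4, 1.5)` with `score_theta(4, 3, 0.4, 1.5)` and `score_alpha(4, 3, 0.4, 1.5)` gives agreement up to finite-difference error, consistent with the information-loss interpretation.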

One way to make this reasoning precise is to show that the model is differentiable in quadratic mean with respect to $(\theta,\alpha)$. Instead, since the model is extremely smooth, we may derive the transition-scores directly by calculating the partial derivatives of $\log P^{\theta,\alpha}_{(x_{t-1},\dots,x_{t-p}),x_t}$ with respect to both $\theta$ and $\alpha$. It is easy to see that, for $x_{t-p},\dots,x_t\in\mathbb{Z}_+$, $i=1,\dots,p$, $\theta\in(0,1)^p$, we have
$$\begin{aligned}
\dot\ell_{\theta,i}(x_{t-p},\dots,x_{t-1},x_t;\theta,\alpha)
&=\frac{\partial}{\partial\theta_i}\log\left(P^{\theta,\alpha}_{(x_{t-1},\dots,x_{t-p}),x_t}\right)1_{(0,1]}\left(P^{\theta,\alpha}_{(x_{t-1},\dots,x_{t-p}),x_t}\right)\\
&=\frac{\sum_{k}\dot s_{x_{t-i},\theta_i}(k)\,b_{x_{t-i},\theta_i}(k)\left(G_\alpha*\mathop{*}_{j\neq i}\text{Bin}_{x_{t-j},\theta_j}\right)\{x_t-k\}}{P^{\theta,\alpha}_{(x_{t-1},\dots,x_{t-p}),x_t}}\,1_{(0,1]}\left(P^{\theta,\alpha}_{(x_{t-1},\dots,x_{t-p}),x_t}\right)\\
&=E_{\theta,\alpha}\left[\dot s_{X_{t-i},\theta_i}(\vartheta_i\circ X_{t-i})\mid X_t=x_t,\dots,X_{t-p}=x_{t-p}\right],
\end{aligned} \tag{2.4}$$
where we put $E_{\theta,\alpha}\left[\,\cdot\mid X_t=x_t,\dots,X_{t-p}=x_{t-p}\right]=0$ if $P_{\nu,\theta,\alpha}\{X_{t-p}=x_{t-p},\dots,X_t=x_t\}=0$. Similarly we find, for $x_{t-p},\dots,x_t\in\mathbb{Z}_+$, and $i=1,\dots,q$,
$$\begin{aligned}
\dot\ell_{\alpha,i}(x_{t-p},\dots,x_{t-1},x_t;\theta,\alpha)
&=\frac{\partial}{\partial\alpha_i}\log\left(P^{\theta,\alpha}_{(x_{t-1},\dots,x_{t-p}),x_t}\right)1_{(0,1]}\left(P^{\theta,\alpha}_{(x_{t-1},\dots,x_{t-p}),x_t}\right)\\
&=\frac{\sum_{e}h_{\alpha,i}(e)\,g_\alpha(e)\left(\mathop{*}_{j=1,\dots,p}\text{Bin}_{x_{t-j},\theta_j}\right)\{x_t-e\}}{P^{\theta,\alpha}_{(x_{t-1},\dots,x_{t-p}),x_t}}\,1_{(0,1]}\left(P^{\theta,\alpha}_{(x_{t-1},\dots,x_{t-p}),x_t}\right)\\
&=E_{\theta,\alpha}\left[h_{\alpha,i}(\varepsilon_t)\mid X_t=x_t,\dots,X_{t-p}=x_{t-p}\right].
\end{aligned} \tag{2.5}$$
For the case $p=1$ and $G_\alpha=\text{Poisson}(\alpha)$, representation (2.4) was also found by Freeland and McCabe (2004). Although we established (2.4) and (2.5) also by direct calculations, we stress that the structure is due to the information-loss interpretation of the model. From the representation it immediately follows that the score is a martingale. If we did not have the representations available, this would be a tedious matter. A Taylor expansion of the localized log-likelihood ratio, a martingale central limit theorem, and a law of large numbers now suggest that the sequence of experiments $(\mathcal{E}_2^{(n)}(\nu,\mathcal{G}_A))_{n\in\mathbb{Z}_+}$ has the LAN-property. The following theorem gives the precise result.

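Theorem 2.1. Let $\mathcal{G}_A\subset\mathcal{G}$ satisfy Assumptions (A1)-(A5), $\nu$ a probability measure on $\mathbb{Z}_+^p$, and $(\theta,\alpha)\in\Theta\times A$. Then the sequence of experiments $(\mathcal{E}_2^{(n)}(\nu,\mathcal{G}_A))_{n\in\mathbb{Z}_+}$ has the LAN-property in $(\theta,\alpha)$, i.e. for every $u=(u_1,u_2)\in\mathbb{R}^p\times\mathbb{R}^q$ the following expansion holds,
$$\log\frac{dP^{(n)}_{\nu,\theta+u_1/\sqrt{n},\alpha+u_2/\sqrt{n}}}{dP^{(n)}_{\nu,\theta,\alpha}}(X_{-p},\dots,X_n)=\log\frac{L_n\left(\theta+\frac{u_1}{\sqrt{n}},\alpha+\frac{u_2}{\sqrt{n}}\mid X_{-p},\dots,X_n\right)}{L_n\left(\theta,\alpha\mid X_{-p},\dots,X_n\right)}=u^TS_n-\frac{1}{2}u^TJu+R_n,$$
where the score (also called central sequence),
$$S_n=S_n(\theta,\alpha)=\frac{1}{\sqrt{n}}\sum_{t=0}^{n}\begin{pmatrix}\dot\ell_\theta(X_{t-p},\dots,X_t;\theta,\alpha)\\ \dot\ell_\alpha(X_{t-p},\dots,X_t;\theta,\alpha)\end{pmatrix}, \tag{2.6}$$
satisfies
$$S_n\xrightarrow{d}N(0,J),\quad\text{under } P_{\nu,\theta,\alpha}. \tag{2.7}$$
The Fisher information, defined by
$$J=J(\theta,\alpha)=\begin{pmatrix}J_\theta & J_{\theta,\alpha}\\ J_{\alpha,\theta} & J_\alpha\end{pmatrix}=\begin{pmatrix}E_{\nu_{\theta,\alpha},\theta,\alpha}\dot\ell_\theta\dot\ell_\theta^T(X_{-p},\dots,X_0;\theta,\alpha) & E_{\nu_{\theta,\alpha},\theta,\alpha}\dot\ell_\theta\dot\ell_\alpha^T(X_{-p},\dots,X_0;\theta,\alpha)\\ E_{\nu_{\theta,\alpha},\theta,\alpha}\dot\ell_\alpha\dot\ell_\theta^T(X_{-p},\dots,X_0;\theta,\alpha) & E_{\nu_{\theta,\alpha},\theta,\alpha}\dot\ell_\alpha\dot\ell_\alpha^T(X_{-p},\dots,X_0;\theta,\alpha)\end{pmatrix},$$
is non-singular, and $R_n=R_n(u,\theta,\alpha)\xrightarrow{p}0$ under $P_{\nu,\theta,\alpha}$.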

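Remark 5. If one wants to draw the initial value, $(X_{-1},\dots,X_{-p})'$, according to the stationary distribution, one considers the sequence of experiments
$$\tilde{\mathcal{E}}_2^{(n)}(\mathcal{G}_A)=\left(\mathbb{Z}_+^{n+1+p},\ 2^{\mathbb{Z}_+^{n+1+p}},\ \left(P^{(n)}_{\nu_{\theta,\alpha},\theta,\alpha}\mid\theta\in\Theta,\alpha\in A\right)\right),\quad n\in\mathbb{Z}_+.$$
If the conditions in Theorem 2.1 are satisfied and if the initial value is negligible, that is, for all $u=(u_1,u_2)\in\mathbb{R}^p\times\mathbb{R}^q$ we have
$$\nu_{\theta+u_1/\sqrt{n},\alpha+u_2/\sqrt{n}}\{X_{-1},\dots,X_{-p}\}-\nu_{\theta,\alpha}\{X_{-1},\dots,X_{-p}\}=o(P_{\nu_{\theta,\alpha},\theta,\alpha};1),$$
then we also have the LAN-property for $(\tilde{\mathcal{E}}_2^{(n)}(\mathcal{G}_A))_{n\in\mathbb{Z}_+}$. In case $p=1$, $A=(0,\infty)$, and $G_\alpha=\text{Poisson}(\alpha)$, it is easy to see, using generating functions, that $\nu_{\theta,\alpha}=\text{Poisson}(\alpha/(1-\theta))$. For this case the negligibility of the initial value readily follows. See the proof of Lemma 3.3.1 for how to verify, in general, the negligibility.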
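The claim that Poisson($\alpha/(1-\theta)$) is the stationary law of the Poisson INAR(1) model can also be checked numerically. The following sketch verifies the invariance equation $\sum_x\pi(x)P_{x,y}=\pi(y)$ on a truncated state space; the truncation level `x_max`, the range `y_max`, and the function names are assumptions made only for this illustration.

```python
import math

def binom_pmf(x, theta, k):
    """Bin(x, theta) pmf at k (zero outside 0..x)."""
    if k < 0 or k > x:
        return 0.0
    return math.comb(x, k) * theta**k * (1 - theta)**(x - k)

def poisson_pmf(lam, e):
    """Poisson(lam) pmf at e (zero for negative e)."""
    if e < 0:
        return 0.0
    return math.exp(-lam) * lam**e / math.factorial(e)

def trans_prob(x_prev, x, theta, alpha):
    """INAR(1) transition probability with Poisson(alpha) immigration."""
    return sum(binom_pmf(x_prev, theta, k) * poisson_pmf(alpha, x - k)
               for k in range(min(x_prev, x) + 1))

def stationary_residual(theta, alpha, x_max=80, y_max=10):
    """Largest |sum_x pi(x) P(x, y) - pi(y)| over y = 0..y_max, pi = Poisson(alpha/(1-theta))."""
    lam = alpha / (1 - theta)
    worst = 0.0
    for y in range(y_max + 1):
        lhs = sum(poisson_pmf(lam, x) * trans_prob(x, y, theta, alpha)
                  for x in range(x_max + 1))
        worst = max(worst, abs(lhs - poisson_pmf(lam, y)))
    return worst
```

A call such as `stationary_residual(0.4, 1.5)` returns a value at the level of floating-point rounding, reflecting the fact that binomial thinning of a Poisson($\lambda$) variable is Poisson($\theta\lambda$), so that $\theta\lambda+\alpha=\lambda$ holds exactly when $\lambda=\alpha/(1-\theta)$.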


Remark 6. For the case $p=1$ and $G_\alpha=\text{Poisson}(\alpha)$, the non-singularity of $J$ was obtained, via direct calculation, by Franke and Seligmann (1993).

Proof. Using Assumption (A5) on $\mathcal{G}_A$ and Lemma 1.2.1 we obtain $E_{\nu_0,\theta,\alpha}X_0^2<\infty$, where, for notational convenience, we denote $\nu_0=\nu_{\theta,\alpha}$.

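Expansion of log-likelihood ratio: Let $u=(u_1,u_2)\in\mathbb{R}^p\times\mathbb{R}^q$, $u\neq 0$ (the case $u=0$ is, of course, trivial). Since $\Theta\times A$ is open and convex we obtain, by Taylor’s theorem,
$$\log\frac{L_n\left(\theta+\frac{u_1}{\sqrt{n}},\alpha+\frac{u_2}{\sqrt{n}}\mid X_{-p},\dots,X_n\right)}{L_n\left(\theta,\alpha\mid X_{-p},\dots,X_n\right)}=u^TS_n(\theta,\alpha)-\frac{1}{2}u^TJ_n(\tilde\theta_n,\tilde\alpha_n)u, \tag{2.8}$$
where $(\tilde\theta_n,\tilde\alpha_n)$ is a random point on the line segment between $(\theta,\alpha)$ and $(\theta+u_1/\sqrt{n},\alpha+u_2/\sqrt{n})$ and
$$J_n(\theta,\alpha)=-\frac{1}{\sqrt{n}}\frac{\partial}{\partial(\theta,\alpha)^T}S_n(\theta,\alpha). \tag{2.9}$$

First, we give some auxiliary calculations in Part 0. Part 1 shows that $S_n(\theta,\alpha)\xrightarrow{d}N(0,J)$ under $P_{\nu,\theta,\alpha}$, in Part 2 we prove that $J_n(\tilde\theta_n,\tilde\alpha_n)\xrightarrow{p}J$ under $P_{\nu,\theta,\alpha}$, and, finally, in Part 3 we prove the non-singularity of $J$.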

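Part 0: auxiliary calculations. In this part we show that certain expressions are integrable, which is needed in Part 1 and Part 2.

It is easy to see that, for $\theta\in(0,1)$, $\ell\in\mathbb{N}$, we have
$$\frac{\partial^\ell}{\partial\theta^\ell}\log b_{x,\theta}(k)=(-1)^{\ell+1}(\ell-1)!\frac{k}{\theta^\ell}-(\ell-1)!\frac{x-k}{(1-\theta)^\ell},$$
and hence
$$\left|\frac{\partial^\ell}{\partial\theta^\ell}\log b_{x,\theta}(k)\right|\le(\ell-1)!\,x\left(\frac{1}{(1-\theta)^\ell}\vee\frac{1}{\theta^\ell}\right)\le(\ell-1)!\frac{x}{(1-\theta)^\ell\theta^\ell}. \tag{2.10}$$
For notational convenience we denote
$$\dot s_{x,\theta}(k)=\frac{\partial}{\partial\theta}\log b_{x,\theta}(k),\quad\text{and}\quad\ddot s_{x,\theta}(k)=\frac{\partial^2}{\partial\theta^2}\log b_{x,\theta}(k).$$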

From (2.4) and (2.10) we obtain the bound

¯

¯_`˙_θ_,_i₍_X_−p_{, . . . ,}_X₀_;_θ_,_α₎¯¯_≤ 1