Accelerated Asymptotics for Diffusion Model Estimation

(1)

Accelerated Asymptotics for Di

ﬀ

usion Model Estimation

∗

Federico M. Bandi

GSB, The University of Chicago

and

Peter C. B. Phillips

Cowles Foundation for Research in Economics

Yale University

First Draft:

October

1 7,

1

999 This Version

: January

1 8, 2000

Abstract

We propose a semiparametric estimation procedure for scalar homogeneous stochastic dif-ferential equations. We specify a parametric class for the underlying diffusion process and identify the parameters of interest by minimizing criteria given by the integrated squared differ-ence between kernel estimates of drift and diffusion function and their parametric counterparts. The nonparametric estimates are simplified versions of those in Bandi and Phillips (1998). A complete asymptotic theory for the semiparametric estimates is developed. The limit theory relies on infill and long span asymptotics and the asymptotic distributions are shown to de-pend on thechronological local timeof the underlying diffusion process. The estimation method and asymptotic results apply to both stationary and nonstationary processes. As is standard with semiparametric approaches in other contexts, faster convergence rates are attained than is possible in the fully functional case.

Keywords: Diﬀusion, Drift, Infill asymptotics, Kernel density, Local time, Martingale, Nonpara-metric estimation, Semimartingale, SemiparaNonpara-metric estimation, Stochastic diﬀerential equation.

JEL Classification: C14, C22

∗_{PRELIMINARY VERSION. This paper was presented at the Cowles Foundation Conference “New Developments}

in Time Series Econometrics”, Yale University, October 23-24,1999. Bandi thanks the Graduate School of Business of the University of Chicago forfinancial support. Phillips thanks the NSF for support under Grant No. SBR 97-30295. Comments are welcome and may be sent to [email protected].

(2)

1 Introduction

The estimation of continuous-time models, such as those described by potentially nonlinear stochas-tic differential equations, has been intensively studied in recent research. Stanton (1998) provides a recent concise survey and discussion of applications in finance. In the last few years, this litera-ture has shown a tendency to turn to fully functional procedures to identify and estimate the two functions that describe the solution to the stochastic differential equation of interest, that is the drift and diffusion functions [c.f. Jiang and Knight (1997), Stanton (1998) and Bandi and Phillips (1998) — hereafter BP].The motivation for this focus is clear. By not imposing a specific parametric structure, fully functional methods reduce the extent of potential misspecifications. Unfortunately, they do so at the expense of slower convergence rates and the potential of greater estimation er-ror over their parametric counterparts. Yet, the informational content of accurately implemented functional methods can be put to work as a useful descriptive tool to understand more about the underlying dynamics from a general perspective and to investigate more effective procedures for parametric inference.

This paper seeks to design an estimation methodology that exploits the generality of functional methods while improving on their convergence properties. Also, we wish to utilize any available information about possible parametrizations for the two functions of interest. A natural way to proceed is to define a semiparametric estimation procedure that matches functional estimates to their parametric counterparts. In order to do so, we specify a parametric class for the underlying diffusion process and estimate the parameters of interest by minimizing two criteria which can be readily interpreted as the integrated squared differences between kernel estimates of the drift and diffusion function and their corresponding parametric expressions.

The nonparametric estimates we use here are simplified versions of those in BP (1998). As discussed in BP (1998), drift and diﬀusion functions can be identified separately using functional analogues of the true theoretical functions. Only minimal requirements need to be placed on the data generating mechanism for this approach to be justified. In particular, we do not require the existence of a time-invariant marginal data density, so stationarity is not needed.

The present work develops an asymptotic theory for the new semiparametric estimates. The limit theory relies on infill and long span asymptotics, just as in the fully nonparametric case [c.f. BP (1998) and Bandi (1999)], and the asymptotic distributions are shown to depend on the

chronological local time of the underlying diﬀusion process, that is on the time that the process spends in the vicinity of each spatial point [see Protter (1990) for a general discussion and Phillips and Park (1998) for an introduction to this concept in econometrics]. As expected, semiparametric methods entail eﬃciency gains with respect to fully functional procedures by virtue of their faster convergence rates. The same intuition as in the standard semiparametric regression context carries over to the continuous-time model examined in this paper [c.f. Andrews (1989)].

From a purely technical point of view, this work merges two strands of the most recent econo-metrics literature, namely the estimation of nonlinear models of integrated time-series [Park and Phillips (1999, 2000)] and the functional identification of diﬀusions under minimal assumptions on the dynamics of the underlying process [Florens-Zmirou (1993), Jacod (1997) and BP (1998)]. In eﬀect, the ‘minimum distance’ type of estimation that is presented in this paper can be interpreted as extremum estimation for potentially nonstationary and nonlinear continuous-time models.

The paper proceeds as follows. Section 2 discusses the model and objects of econometric interest. Section 3 details the estimation procedure. Section 4 presents the main results. In Section 5 we outline the Brownian motion case. Section 6 discusses covariance matrix estimation. Section 7 concludes. Notation is listed in Section 8 and proofs are given in Section 9.

(3)

2 The model

We consider a diﬀusion process_{Xt:t≥0}generated by

dXt=µ(Xt,θdrif t)dt+σ(Xt,θdif f)dBt, (1)

with initial condition X0 =X and where Bt is a standard Brownian motion defined on thefiltered

probability space (Ω,₌B,(₌B_t )t≥0, P). The initial conditionX∈L2and is taken to be independent

of _{Bt : t ≥ 0}. The parameter vectors θdrif t and θdif f are such that ¡

θdrif t,θdif f¢ = θ _∈ Θ

where Θis a compact subset of RM _{for a generic}_M_{. More speci}_fi_cally, _θdrif t _∈_Θdrif t_⊂_Rm1 _and

θdif f _∈Θdif f _⊂Rm2 _with _m

1+m2 =M. The vectors θdrif t and θdif f jointly define a parametric

family for (1). Since we will be dealing with extremum estimation procedures, it is convenient to denote the true values of these parameters by θ₀drif t and θ₀dif f.

We now define the left-continuousfiltration

=t:=σ(X)_∨₌B_t =σ(X, Bs; 0≤s≤t) 0≤t <∞

and the collection of null sets

ℵ:=_{N _⊆Ω;_∃G_∈₌_∞ with N _⊆G and P(G) = 0_}. We create the augmented filtration

e

=Xt :=σ(=t∪ ℵ) 0≤t <∞.

As in BP (1998), the following conditions are used in the study of (1). They guarantee the existence and pathwise uniqueness of a nonexplosive solution to (1) that is adapted to the augmentedfiltration

{=eXt }.

2.1 Assumption

(A) µ(_·,θdrif t) and σ(_·,θdif f) are time-homogeneous, B-measurable functions on D= (l, u) with

−∞ ≤ l < u _{≤ ∞} where B is the σ-field generated by Borel sets on D. Both functions are at least twice continuously diﬀerentiable. Hence, they satisfy local Lipschitz and growth conditions. Thus, for every compact subset J = [1/H, H] with H > 0 of the range of the process, there exist constants C1 and C2 such that, for all x and y in J,

|µ(x,θdrif t)₋µ(y,θdrif t)_|+_|σ(x,θdif f)₋σ(y,θdif f)_|_≤C1|x−y|,

and

|µ(x,θdrif t)_|+_|σ(x,θdif f)_|_≤C2{1+|x|}.

(B) σ2(_·,θdif f)>0on D.

(C) [Feller’s (1952) necessary and suﬃcient condition for nonexplosion]. We define V(α,θ)as

Z α 0 S0(y,θ)_{ Z y 0 · 2 S0₍_x,_θ₎_σ2₍_x,_θdif f₎ ¸ dx_}dy 3

(4)

where S0₍_x,_θ₎ _{is the} _fi_{rst derivative of the natural scale measure,} S(α,θ) = Z α 0 exp_{ Z y 0 · −2µ(x,θ drif t₎ σ2₍_x,_θdif f₎ ¸ dx_}dy.

We require V(α,θ) to diverge at the boundaries of D, i.e.

lim

α→lV(α,θ) = limα→uV(α,θ) =∞.

(D) µ(_·,θdrif t) and σ(_·,θdif f) are at least twice continuously diﬀerentiable in θdrif t and θdif f. As discussed in BP (1998), under conditions (A), (B) and (C), the stochastic diﬀerential equation has a strong solutionXt that is unique, recurrent and continuous int∈[0, T].Assumption (D) will

be used in the development of our asymptotics.

The objects of econometric interest are the drift,µ(.,θdrif t), and the diﬀusion term,σ2(.,θdif f). Their conditional moment definitions are well known. They can be interpreted as representing the ‘instantaneous’ conditional mean and the ‘instantaneous’ conditional variance of increments in the process [c.f. Karlin and Taylor (1981), for example]. More precisely, µ(.,θdrif t) describes the conditional expected rate of change of the process for infinitesimal time changes, whereasσ2(.,θdif f) gives the conditional rate of change of volatility.

3 The Econometric procedure

We define a ‘minimum distance’ type of estimation that exploits the consistency of accurately defined functional estimators and provides estimates of the parameters of interest by ‘matching’ the parametric expressions to their nonparametric counterparts.

The first step consists of defining the functional estimates. We consider a simplified version of the estimators in BP (1998). Assume the dataXtis recorded discretely at {t=t1, t2, .., tn} in the

time interval [0, T], with T _≥ T0 > 0, where T0 is a positive constant. Also, assume equispaced

data. Hence,

{Xt=X∆n,T, X2∆n,T, X3∆n,T, ..., Xn∆n,T}

are nobservations at

{t1 =∆n,T, t2 = 2∆n,T, t3 = 3∆n,T, ..., tn=n∆n,T}

where ∆n,T =T /n. The diﬀusion function estimator is

b σ2₍_n,T₎(x) = 1 ∆n,T Pn₋1 j=1K( Xj∆_n,T−x hn,T )[X(j+1)∆n,T −Xj∆n,T] 2 Pn j=1K( Xj∆_n,T−x hn,T ) . (2)

The drift function estimator is

b µ₍_n,T₎(x) = 1 ∆n,T Pn₋1 j=1 K( Xj∆_n,T−x hn,T )[X(j+1)∆n,T −Xj∆n,T] Pn j=1K( Xj∆_n,T−x hn,T ) . (3)

(5)

Remark 3.1 The estimators above are defined as rather straightforward sample analogues of the theoretical functions. BP (1998) discuss the properties of consistency and asymptotic mixed normality of more general functional analogues to the true functions than (2) and (3) above. They show that recurrence (which is implied by Assumption 2.1 above), rather than stationarity, is all that is needed to achieve identification. They derive the asymptotics as (i) the time span (T) and (ii) the number of data points (n) increase with (iii) the frequency of observations (T_n _→0). Condition (iii) is necessary for the consistent estimation of continuous-time models using fully functional methods under general assumptions. Condition (i)is crucial only for drift estimation as the local dynamic of the process does not contain suﬃcient information to identify its infinitesimal

first moment. By letting the time span increase to infinity, the drift can be recovered in the limit since the process continues to make repeated visits to diﬀerent spatial points by virtue of recurrence.

Remark 3.2 Formulae (2) and (3) can be interpreted as estimates of Stanton’s first order ap-proximations to the infinitesimal moments of a diﬀusion [c.f. Stanton (1997)]. They are proven to be consistent and asymptotically mixed normal under the same conditions on the time span and the sample size that were described in the previous remark [c.f. Bandi (1999)].

Remark 3.3 More general sample analogues to the true functions of the type described in BP (1998) could be used to derive the functional estimates. We decided to employ specifications based on simple smoothing rather than on convoluted kernels [as in the most general case examined by BP (1998)] for simplicity. In eﬀect, the use of more involved specifications would not qualitatively change the asymptotic results given here.

In finite samples the use of convoluted kernels does make a diﬀerence [c.f. Bandi and Nguyen (1999)]. In particular, we know that the choice of the optimal smoothing parameter for the drift is empirically cumbersome. Yet, the use of convoluted kernels limit the eﬀects of potentially subopti-mal choices. Extension to more general kernels can be easily derived from the apparatus discussed below.

Consider the criteria

Qdrif t_n,T = T n n X i=1 ³ b µ₍_n,T₎(X_i_∆_n,T)₋µ(X_i_∆_n,T,θdrif t)´2 (4) and Qdif f_n,T = T n n X i=1 ³ b σ2₍_n,T₎(X_i_∆_n,T)₋σ2(X_i_∆_n,T,θdif f)´2 (5)

where µ_b₍_n,T₎(.) and σ_b2₍_n,T₎(.) are defined in (3) and (2), respectively. Formulae (4) and (5) can be interpreted as the integrated mean squared diﬀerences between the kernel estimates and their corresponding parametric specifications. We consider averaged squared diﬀerences over a fixed time span T = T. Despite the fact that we fix T to define the criteria to be minimized, the nonparametric estimates rely on asymptotic sampling over an enlarging time span for the reasons discussed in Remark 3.1. In the sequel we will often refer to the expression “fixed time span” to describe the situation where kernel estimation is assumed to be implemented (in the limit) over a

fixed observation span. As far as the criteria (4) and (5) are concerned, we will assume throughout that the X0

i∆n,Ts are recorded discretely over a fixed time period T. This assumption will be

extended in Remark 5.1.

(6)

The semiparametric estimatesbθdrif t_n,T andbθdif f_n,T are obtained as follows:

b

θdrif t_n,T : = arg min θdrif t_∈_Θdrif t_⊂_Θ Qdrif t_n,T = arg min θdrif t_∈_Θdrif t_⊂_Θ T n n X i=1 ³ b µ₍_n,T₎(X_i_∆_n,T)₋µ(X_i_∆_n,T,θdrif t)´2 (6) and

bθdif f_n,T : = arg min θdif f_∈_Θdif f_⊂_Θ Qdif f_n,T = arg min θdif f_∈_Θdif f_⊂_Θ T n n X i=1 ³ b σ2₍_n,T₎(X_i_∆_n,T)₋σ2(X_i_∆_n,T,θdif f)´2. (7)

Remark 3.4 As in the fully nonparametric case, we identify the drift and diﬀusion parameters (bθdrif t_n,T andbθdif f_n,T , that is) separately. This is of particular importance when we are interested in the parametrization of a specific function in situations where the other function is treated as a nuisance parameter.

Remark 3.5 In the next section we derive the consistency and asymptotic mixed normality of

b

θdrif t_n,T andbθdif f_n,T asT _{→ ∞}, n_{→ ∞}and T_n _→0. As in the fully nonparametric case, we show that

b

θdif f_n,T can be identified over a fixed time span (T =T) provided T_n _→0, that is as the frequency of observations increases.

4 Limit theory

First, we consider the drift case.

Theorem 4.1 (Consistency of the drift estimates) Assume n _{→ ∞}, T _{→ ∞} and hn,T → 0 (as n, T _{→ ∞}) such that ∆n,T

hn,T → 0,

LX(T,x) hn,T (∆n,T)

α ₌ _O

a.s.(1) for some α ∈ (0,1₂) and LX(T, x)hn,T a.s.→ ∞,then

Qdrif t_n,T (θdrif t)a.s._→ Qdrif t(θdrif t,θdrif t₀ ) =

Z _∞

−∞

³

µ(s,θdrif t₀ )₋µ(s,θdrif t)´2LX(T , s)ds, (8)

uniformly in θdrif t_. _{Also, let} _B₍_θdrif t_,_ε₎ _{denote an open ball in} _Θdrif t _{of radius} _ε _around _θdrif t_.

Assume that for _∀δ>0and _∀ε>0 inf θdrif t_∈_/_B³_θdrif t 0 ,ε ´ Z |s|≤δ ³ µ(s,θ₀drif t)₋µ(s,θdrif t)´2ds >0. (9) Then, b

(7)

Theorem 4.2 (The limit distribution of the drift estimates) Given n_{→ ∞}, T _{→ ∞} and

hn,T →0 (as n, T → ∞) such that ∆_hn,T

n,T →0,

α ₌_O

a.s.(1) for some α∈(0,1₂) and LX(T, x)hn,T a.s.→ ∞,then (Ξmu(T))−1/2 ³ bθdrif t_n,T ₋θ₀drif t´_→dN(0,I), (10) where Ξmu(T) =B(T)−mu1Ω(T)muB(T)−mu1, and Bmu= µZ _∞ −∞ ∂µ0(a) ∂θ ∂µ0(a) ∂θ0 LX(T , a)da ¶ , Ωmu= ÃZ _∞ −∞ σ2(a) µ_∂_µ 0(a) ∂θ ∂µ0(a) ∂θ0 ¶ ¡_L X(T , a) ¢2 LX(T, a) da ! .

Remark 4.3 The result is consistent with what we would expect to obtain in a correctly specified standard nonlinear regression context with heteroskedastic errors [c.f. Davidson and MacKinnon (1993) for a classical treatment]. The only diﬀerence is that we replace integrals with respect to probability measures with spatial integrals [c.f. Park and Phillips (1998b)]. This is due to the generality of the approach adopted here. In particular, it is due to the robustness to deviations from stationarity.

Remark 4.4 Coherently with the fully nonparametric case, the rate of convergence is path-dependent and is driven by the rate of divergence to infinity of the chronological local time of the underlying process through the integral Ωmu.By virtue of the averaging, the rate is faster than in

fully functional context, i.e.

q

hn,TLX(T, x).

Remark 4.5 As usual, the limit theory clarifies the sense in which enlarging the time span (T _{→ ∞}) is crucial for consistent estimation of the infinitesimalfirst moment of a diﬀusion. In eﬀect, if wefixT(=T), thenLX(T , x) =Op(1) andΞmu(T) =Op(1). In consequence,bθ

drif t n,T

p

9θ₀drif t [c.f. formula (10) above] when T isfixed.

We now turn to the diﬀusion parameter estimates.

Theorem 4.6 (Consistency of the diﬀusion estimates) Assume n_{→ ∞}, T _{→ ∞} and hn,T →0 (as n, T _{→ ∞}) such that ∆n,T

hn,T →0 and

α ₌_O

a.s.(1) for some α∈(0,1₂),then

Qdif f_n,T (θdif f)a.s._→Qdif f(θdif f,θ₀dif f) =

Z _∞

−∞

³

σ2(s,θdif f₀ )₋σ2(s,θdif f)´2LX(T , s)ds. (11)

(8)

uniformly in θdif f_. _{Also, let} _B₍_θdif f

0 ,ε) denote an open ball of radius ε around θ

dif f

0 in Θdif f.

Assume that for _∀δ >0 and_∀ ε>0 inf θdif f_∈_/_B³_θdif f 0 ,ε ´ Z |s|≤δ ³ σ2(s,θdif f₀ )₋σ2(s,θdif f)´2ds >0. (12) Then, b

θdif f_n,T a.s._→ θ₀dif f.

Theorem 4.7 (The limit distribution of the diﬀusion estimates) Given n_{→ ∞}, T _{→ ∞}

and hn,T →0 (as n, T → ∞) such that ∆_hn,T

n,T →0,

α ₌_O

a.s.(1) for some α∈(0,1₂)

and h 3 n,T ∆n,T →0,then 1 p ∆n,T

Ξ−_sigma1/2 (T)³bθdif f_n,T ₋θdif f₀ ´_→dN(0,I) (13)

where

Ξsigma(T) =B(T)−_sigma1 Ω(T)sigmaB(T)−_sigma1

and Bsigma= µZ _∞ −∞ ∂σ₀2(a) ∂θ ∂σ2₀(a) ∂θ0 LX(T , a)da ¶ Ωsigma= ÃZ _∞ −∞ 4σ4(a) µ ∂σ2₀(a) ∂θ ∂σ2₀(a) ∂θ0 ¶ ¡_L X(T , a) ¢2 LX(T, a) da ! .

Remark 4.8 In light of Remark 4.3, the integralsBsigma and Ωsigma can be readily interpreted

as spatial analogues of the expectations that would arise from the standard nonlinear estimation of conditional expectations. The term 4σ4(a) is generated by the quadratic nature of the nonpara-metric estimator of the infinitesimal second moment of a diﬀusion.

Remark 4.9 As in the drift case, the rate of convergence is path-dependent. Also, the semipara-metric estimates entail eﬃciency gains with respect to their nonparametric counterparts. In eﬀect, the functional estimates have slower pointwise convergence rates given by

√

hn,TLX(T,x)

√

∆n,T

.

Remark 4.10 The rate of convergence of the diﬀusion estimates is faster than the rate of con-vergence of the drift estimates. The diﬀerence is remarkable and is given by the multiplicative factor √1

∆n,T

=p_Tn. This is a standard result that reflects perfectly the diﬀerence in the pointwise convergence rates of the nonparametric estimates (that is,

q hn,TLX(T, x) versus √ hn,TLX(T,x) √_∆ n,T ).

(9)

Remark 4.11 The condition h

3

n,T

∆n,T → 0 guarantees that the limit distribution is driven by the

‘variance term’ in the estimation error decomposition [see the proof of Theorem 4.7]. Should

h3_n,T

∆n,T → ∞, then the ‘bias term’ would dominate, but the semiparametric estimates would still

converge to a Gaussian distribution. No choice of the smoothing parameter can make the ‘bias term’ dominate in the drift case due to the slow rate of convergence of the ‘variance term’.

Remark 4.12 As usual, the diﬀusion parameters can be identified over a fixed time span [c.f. BP (1998)]. In this case the convergence rate ceases to be path-dependent: we experience √n -convergence for the semiparametric estimates andqnh_n,T-convergence for the fully nonparametric counterpart in (2) above. The gain in eﬃciency which is assured by the adoption of the semipara-metric approach is noteworthy and is perfectly consistent with more traditional semiparasemipara-metric models [c.f. Andrews (1989), for example]. To summarize, if T isfixed (=T) then,

√ n³bθdif f_n,T ₋θdif f₀ ´_→dM N µ 0, 1 TΞsigma(T) ¶ , where

Ξsigma(T) =B(T)−_sigma1 Ω(T)sigmaB(T)−_sigma1

and Bsigma= µZ _∞ −∞ ∂σ₀2(a) ∂θ ∂σ2₀(a) ∂θ0 LX(T , a)da ¶ Ωsigma= µZ _∞ −∞ 4σ4(a) µ ∂σ2 0(a) ∂θ ∂σ2 0(a) ∂θ0 ¶ LX(T , a)da ¶ .

5 An instructive example: Brownian Motion

Assume the data is generated from a Brownian motionB=σW with varianceσ2.We parametrize the diﬀusion as

dXt=µdt+σdWt

and minimize the criteria (4) and (5). It follows that

b θdrif t_n,T = 1 n n X i=1 b µ₍_n,T₎(Xi∆_n,T) and b θdif f_n,T = v u u t1 n n X i=1 b σ2₍_n,T₎(Xi∆_n,T).

The limit theories can be derived in closed form since the rate of divergence to infinity of the Brownian local time is known. In particular,

(10)

Bmu = µZ _∞ −∞ ∂µ0(a) ∂θ ∂µ0(a) ∂θ0 LX(T , a)da ¶ = Z _∞ −∞ LX(T , a)da = 1 σ2[B]T = Z _∞ −∞ 1 σT 1 2_L W Ã 1, 1 T 1 2 a σ ! da = Z _∞ −∞ T LW (1, x)dx = T[W]1 = T , and Ωmu = ÃZ _∞ −∞ σ2(a) µ ∂µ0(a) ∂θ ∂µ0(a) ∂θ0 ¶ ¡_L X(T , a) ¢2 LX(T, a) da ! = Z _∞ −∞ σ2 1 σ2 µ T 1 2_L W µ 1, 1 T 1 2 a σ ¶¶2 1 σT 1 2L_W ³ 1, 1 T12 a σ ´ da = Z _∞ −∞ σ21 σ T 1 2 T 1 2 µ T 1 2_L W µ 1, 1 T12 a σ ¶¶2 T12L_W µ 1, 1 T12 T12 T 1 2 a σ ¶ da = Z _∞ −∞ σ2T 1 2 ³ T 1 2_L W (1, x) ´2 T12L_W µ 1,T 1 2 T12 x ¶dx = 1 T12L_W(1,0 +o(1)) Z _∞ −∞ σ2T 3 2 ₍_L W(1, x))2dx. Then, T14(bθdrif t n,T −θ drif t 0 )→dM N Ã 0, Z _∞ −∞ σ2 Ã (LW (1, x))2 T 1 2_L W (1,0) ! dx ! .

The rate of convergence, T14, is faster than in the fully nonparametric case, where it is known to

be T14h1/2

n,T [BP (1998)]. We now turn to diﬀusion estimation. Write

Bsigma = µZ ∞ −∞ ∂σ₀2(a) ∂θ ∂σ2₀(a) ∂θ0 LX(T , a)da ¶

(11)

= Z _∞ −∞ 4σ2LX(T , a)da = 4σ2 Z _∞ −∞ LX(T , a)da = 4σ2T , and Ωsigma = ÃZ _∞ −∞ 4σ4(a) µ ∂σ2₀(a) ∂θ ∂σ2₀(a) ∂θ0 ¶ ¡_L X(T , a) ¢2 LX(T, a) da ! = Z _∞ −∞ 4σ4¡4σ2¢ Ã¡ LX(T , a) ¢2 LX(T, a) ! da = 1 T12L_W(1,0 +o(1)) Z _∞ −∞ 4σ4(4σ2)T 3 2 ₍_L W(1, x))2dx. In consequence, T14 p ∆n,T (bθdif f_n,T ₋θdif f₀ )_→dM N Ã 0, Z _∞ −∞ σ2 Ã (LW(1, x))2 T 1 2_L W(1,0) ! dx ! .

As in the previous case, the rate of convergence that would emerge from purely functional estimation is slower, viz. T 1 4 √_∆ n,T h1_n,T/2.

Remark 5.1 It appears that we can increase further the rate of converge by working with squared diﬀerences defined over an enlarging time span. In this case,

T12(bθ drif t n,T −θ drif t 0 )→dN ¡ 0,σ2¢ and T12 p ∆n,T(bθ dif f n,T −θ dif f 0 )→dN ¡ 0,σ2¢.

Of course, this result holds more generally. For more general diﬀusion processes than Brownian motion we have (Ξmu(T))−1/2 ³ bθdrif t_n,T ₋θ₀drif t´_→dN(0,I) where Ξmu(T) =B(T)−mu1Ω(T)muB(T)−mu1, and Bmu= µZ _∞ −∞ ∂µ0(a) ∂θ ∂µ0(a) ∂θ0 LX(T, a)da ¶ , 11

(12)

Ωmu= µZ _∞ −∞ σ2(a) µ ∂µ0(a) ∂θ ∂µ0(a) ∂θ0 ¶ LX(T, a)da ¶ . Also, 1 p ∆n,TΞ −1/2 sigma(T) ³ b θdif f_n,T ₋θdif f₀ ´_→dN(0,I) where

Ξsigma(T) =B(T)−sigma1 Ω(T)sigmaB(T)−sigma1

and Bsigma= µZ _∞ −∞ ∂σ2₀(a) ∂θ ∂σ2₀(a) ∂θ0 LX(T, a)da ¶ , Ωsigma= µZ _∞ −∞ 4σ4(a) µ ∂σ₀2(a) ∂θ ∂σ2₀(a) ∂θ0 ¶ LX(T, a)da ¶ .

6 Covariance matrix estimation

We will only consider the drift case. The results readily extend to covariance matrix estimation in the diﬀusion case. From Theorem 4.2 above, write

asicov(bθdrif t_n,T ) = Ξmu(θdrif t0 ,θ dif f 0 ) = ³ Bmu(θ₀drif t) ´₋1³ Ωmu(θ₀drif t,θ₀dif f) ´ ³ Bmu(θ₀drif t) ´₋1 with Bmu(θ₀drif t) = ÃZ ∞ −∞ ∂µ(a,θdrif t₀ ) ∂θ ∂µ(a,θ₀drif t) ∂θ0 LX(T , a)da ! and Ωmu(θdrif t0 ,θ dif f 0 ) = ÃZ _∞ −∞ σ2(a,θ₀dif f) Ã ∂µ(a,θ₀drif t) ∂θ ∂µ(a,θdrif t₀ ) ∂θ0 ! ¡ LX(T , a) ¢2 LX(T, a) da ! . It is straightforward to prove [see proof of Theorem 4.7] that

b Bmu(θdrif t)_n,T = ∆_n,T n X i=1 ∂µ(Xi∆_n,T,θdrif t) ∂θ ∂µ(Xi∆_n,T,θdrif t) ∂θ0 a.s. → Z _∞ −∞ ∂µ(a,θdrif t) ∂θ ∂µ(a,θdrif t) ∂θ0 LX(T , a)da = Bmu(θdrif t)

(13)

and b Ωmu(θdrif t,θdif f)_n,T,T = hn,T h_n,T ³ ∆_n,T´2 ∆n,T n X i=1 σ2(Xi∆_n,T,θdif f) ∂µ(Xi∆_n,T,θdrif t) ∂θ ∂µ(Xi∆_n,T,θdrif t) ∂θ0 Pn j=1K( Xj∆ n,T−Xi∆n,T h_n,T ) Pn j=1K( Xj∆_n,T−Xi∆ n,T hn,T ) a.s. → Z _∞ −∞ σ2(a,θdif f) µ ∂µ(a,θdrif t₎ ∂θ ∂µ(a,θdrif t₎ ∂θ0 ¶ ¡_L X(T , a) ¢2 LX(T, a) da = Ωmu(θdrif t,θdif f.)

uniformly in Θdrif t and Θdif f.. We combine this result with the continuity of ∂µ(.,θ_∂θdrif t) and

σ2₍_.,_θdif f.₎ _at_θdrif t

0 andθ

dif f.

0 (c.f. Assumption 2.1) and the consistency ofbθ

drif t n,T andbθ

dif f n,T (from

Theorems 4.1and 4.6) to yield

b Bmu(bθ drif t n,T )n,T a.s.→Bmu(θ0drif t) and b Ωmu(bθ drif t n,T ,bθ dif f n,T )n,T a.s. →Ωmu(θ0drif t,θ dif f 0 ).

The proof follows standard arguments in extremum estimation [see the proof of Theorem 4.2 for a similar derivation]. In consequence,

³ b Bmu(bθ drif t n,T )n,T ´−1³ b Ωmu(bθ drif t n,T ,bθ dif f. n,T )n,T ´ ³ b Bmu(bθ drif t n,T )n,T ´−1 a.s. → ³ Bmu(θ0drif t) ´₋1 Ωmu(θdrif t0 ,θ dif f 0 ) ³ Bmu(θdrif t0 ) ´₋1 = Ξmu(θ0drif t,θ dif f 0 ).

Defining the criterion over an enlarging time span as in Remark 5.1 above, we obtain

b Bmu(bθ drif t n,T )n,T = ∆n,T n X i=1 ∂µ(Xi∆n,T,bθ drif t n,T ) ∂θ ∂µ(Xi∆n,T,bθ drif t n,T ) ∂θ0 a.s. → Z _∞ −∞ ∂µ(a,θ₀drif t) ∂θ ∂µ(a,θdrif t₀ ) ∂θ0 LX(T, a)da = Bmu(θ0drif t) and b Ωmu(bθ drif t n,T ,bθ dif f n,T )n,T = ∆n,T n X i=1 σ2(Xi∆n,T,bθ dif f n,T ) ∂µ(Xi∆n,T,bθ drif t n,T ) ∂θ ∂µ(Xi∆n,T,bθ drif t n,T ) ∂θ0 13

(14)

a.s. → Z _∞ −∞ σ2(a,θ₀dif f) Ã ∂µ(a,θ₀drif t) ∂θ ∂µ(a,θdrif t₀ ) ∂θ0 ! LX(T, a)da = Ωmu(θdrif t0 ,θ dif f 0 ).

The last two expressions further clarify the analogy between the methods developed here and more standard nonlinear estimation problems. As conventional in correctly specified nonlinear models with heterogeneous errors, the asymptotic variance-covariance matrix is simply estimated as a convolution of averages involving the outer-product of the gradient of the conditional expectation calculated at the estimated parameter vector.

7 Conclusion

This paper shows how to utilize the informational content of carefully implemented nonparametric methods in the estimation of continuous time models of the diﬀusion type while improving on their generally poor convergence properties. From a practical point of view, the semiparametric procedure suggested in this work combines the simplicity of limit theories that can be interpreted as spatial counterparts of the standard asymptotics for nonlinear econometric models with the generality of methods that are robust to deviations from strong distributional assumptions, such as stationarity. Furthermore, the general estimation strategy given here provides a framework which may be particularly appealing to researchers with strong beliefs about potential parametrizations for the two functions of interest.

The next step is naturally to study a testing procedure for alternative parametric specifications based on the quadratic criteria used in this paper. Due to the broadly applicable identifying information that is embodied in the estimated functional drift and diﬀusion functions and thefinite sample accuracy of the asymptotics of the functional estimates [c.f. Bandi and Nguyen (1999)], we believe such test procedures are likely to be attractive. They can, for instance, be expected to have better size properties and more power than testing methods that are based on density-matching procedures which rely on stationarity [c.f. Pritsker (1998)]. Research on this subject will be reported by the authors in subsequent work.

8 Notation

→a.s. almost sure convergence →p convergence in probability ⇒,_→d weak convergence

:= definitional equality op(1) tends to zero in probability Op(1) bounded in probability oa.s.(1) tends to zero almost surely Oa.s.(1) bounded almost surely

=d distributional equivalence

MN (0, V) mixed normal distribution with varianceV

(15)

9 Proofs

Proof of theorem 4.1 First, we prove uniform strong convergence of the criterionQdrif t_n,T (θdrif t) as in (8). Write Qdrif t_n,T (θdrif t) = T n n X i=1 ³ b µ₍_n,T₎(X_i_∆_n,T)₋µ(X_i_∆_n,T,θdrif t)´2 = T n n X i=1   _∆1 n,T Pn−1 j=1 K( Xj∆_n,T−Xi∆ n,T hn,T )[X(j+1)∆n,T −Xj∆n,T] Pn j=1K( Xj∆_n,T−Xi∆ n,T hn,T ) −µ(Xi∆_n,T,θdrif t)    2 = T n n X i=1   _∆1 n,T Pn₋1 j=1 K( Xj∆_n,T−Xi∆ n,T hn,T )[X(j+1)∆n,T −Xj∆n,T] Pn j=1K( Xj∆_n,T−Xi∆ n,T hn,T )    2 | {z } (a) + T n n X i=1 µ2(Xi∆_n,T,θdrif t) | {z } (b) −2T n n X i=1 µ(Xi∆_n,T,θdrif t)   _∆1 n,T Pn₋1 j=1 K( Xj∆_n,T−Xi∆ n,T hn,T )[X(j+1)∆n,T −Xj∆n,T] Pn j=1K( Xj∆_n,T−Xi∆ n,T hn,T )    | {z } (c)

Then, using the results in BP (1998),

(b) = T n n X i=1 µ2(Xi∆_n,T,θdrif t) = Z T 0 µ2(Xs,θdrif t)ds+oa.s.(1) = Z +∞ −∞ µ2(a,θdrif t)LX(T , a)da+oa.s.(1). Furthermore, (c) = ₋2T n n X i=1 µ(Xi∆_n,T,θdrif t)   _∆1 n,T Pn₋1 j=1 K( Xj∆_n,T−Xi∆ n,T hn,T )[X(j+1)∆n,T −Xj∆n,T] Pn j=1K( Xj∆_n,T−Xi∆ n,T hn,T )    = ₋2T n n X i=1 µ(Xi∆_n,T,θdrif t)   _∆n,T1 Pn₋1 j=1 K( Xj∆_n,T−Xi∆ n,T hn,T ) R(j+1)∆n,T j∆n,T µ(Xs)ds Pn j=1K( Xj∆_n,T−Xi∆ n,T hn,T )    | {z } (c1) 15

(16)

−2T n n X i=1 µ(Xi∆_n,T,θdrif t)   _∆n,T1 Pn₋1 j=1 K( Xj∆_n,T−Xi∆ n,T hn,T ) R(j+1)∆n,T j∆n,T σ(Xs)dBs Pn j=1K( Xj∆_n,T−Xi∆ n,T hn,T )    | {z } (c2) . It follows that, (c1) = ₋2T n n X i=1 µ(Xi∆_n,T,θdrif t)   _∆1 n,T Pn₋1 j=1 K( Xj∆_n,T−Xi∆ n,T hn,T ) R(j+1)∆n,T j∆n,T µ(Xs)ds Pn j=1K( Xj∆_n,T−Xi∆ n,T hn,T )    = ₋2T n n X i=1 µ(Xi∆_n,T,θdrif t)    Pn₋1 j=1K( Xj∆_n,T−Xi∆ n,T hn,T )µ(Xj∆n,T +oa.s.(1)) Pn j=1K( Xj∆_n,T−Xi∆ n,T hn,T )    = ₋2 Z T 0 µ(Xs,θdrif t)   1 hn,T RT 0 K( Xu−Xs hn,T )µ(Xu+oa.s.(1))du 1 hn,T RT 0 K( Xa−Xs hn,T )da  ds+oa.s.(1) = ₋2 Z +∞ −∞ µ(s,θdrif t) Ã 1 hn,T R_∞ −∞K( u−s hn,T)µ(u+oa.s.(1))LX(T, u)du 1 hn,T R_∞ −∞K( a−s hn,T)LX(T, a)da ! LX(T , s)ds+oa.s.(1) = ₋2 Z +∞ −∞ µ(s,θdrif t) ÃR_∞ −∞K_R(c)µ(s+hn,Tc)LX(T, s+hn,Tc)dc ∞ −∞K(e)LX(T, s+hn,Te)de ! LX(T , s)ds+oa.s.(1) = ₋2 Z +∞ −∞

µ(s,θdrif t)µ(s,θ₀drif t)LX(T , s)ds+oa.s.(1).

Also, it is easy to prove [see the proof of Theorem 4.2] that

(c2) = 2T n n X i=1 µ(Xi∆n,T,θ drif t₎   1 ∆n,T Pn₋1 j=1 K( Xj∆_n,T−Xi∆_n,T hn,T ) R(j+1)∆n,T j∆n,T σ(Xs)dBs Pn j=1K( Xj∆_n,T−Xi∆_n,T hn,T )  a.s. →0, and Φ−1/2(T)_∗(c2) =Op(1), with Φ(T) = Z _∞ −∞ Ã 4σ2(a,θdrif t₀ )µ2(a,θdrif t) ¡ LX(T , a) ¢2 LX(T, a) ! da.

We now examine the quadratic term (a). Write,

T n n X i=1   1 ∆n,T Pn₋1 j=1 K( Xj∆_n,T−Xi∆_n,T hn,T )[X(j+1)∆n,T −Xj∆n,T] Pn j=1K( Xj∆_n,T−Xi∆_n,T hn,T )   2

(17)

= T n n X i=1   1 ∆n,T Pn₋1 j=1 K( Xj∆_n,T−Xi∆_n,T hn,T ) R(j+1)∆n,T j∆n,T µ(Xs)ds Pn j=1K( Xj∆_n,T−Xi∆_n,T hn,T )   2 | {z } (a1) + T n n X i=1   1 ∆n,T Pn₋1 j=1 K( Xj∆_n,T−Xi∆_n,T hn,T ) R(j+1)∆n,T j∆n,T σ(Xs)dBs Pn j=1K( Xj∆_n,T−Xi∆_n,T hn,T )   2 | {z } (a2) + 2T n n X i=1    ³ 1 ∆n,T ´2P n−1 j=1 Pn−1 k=1K( Xj∆_n,T−Xi∆_n,T hn,T )K( Xk∆_n,T−Xi∆_n,T hn,T ) R(k+1)∆n,T k∆n,T µ(Xs)ds R(j+1)∆n,T j∆n,T σ(Xs)dBs Pn j=1K( Xj∆_n,T−Xi∆_n,T hn,T ) Pn k=1K( Xk∆_n,T−Xi∆_n,T hn,T )    | {z } (a3)

We start with (a1).Following the same steps as for(c1) above we deduce that

T n n X i=1   _∆1 n,T Pn₋1 j=1K( Xj∆_n,T−Xi∆ n,T hn,T ) R(j+1)∆n,T j∆n,T µ(Xs)ds Pn j=1K( Xj∆_n,T−Xi∆ n,T hn,T )    2 a.s. → Z +∞ −∞ µ2(s,θdrif t₀ )LX(T , s)ds.

We now examine (a2). Write

T n n X i=1   _∆1 n,T Pn₋1 j=1K( Xj∆_n,T−Xi∆ n,T hn,T ) R(j+1)∆n,T j∆n,T σ(Xs)dBs Pn j=1K( Xj∆_n,T−Xi∆ n,T hn,T )    2 = T n n X i=1 1 µ ∆n,T hn,T Pn j=1K( Xj∆_n,T−Xi∆ n,T hn,T ) ¶2   1 hn,T n−1 X j=1 K(Xj∆n,T −Xi∆n,T hn,T ) Z (j+1)∆n,T j∆n,T σ(Xs)dBs   2 = T n n X i=1 1 µ ∆n,T hn,T Pn j=1K( Xj∆_n,T−Xi∆ n,T hn,T ) ¶2 ¡ M_ni(1)¢2, where M_ni(r) = 1 hn,T J_X−1 j=1 K(Xj∆n,T −Xi∆n,T hn,T ) Z (j+1)∆n,T j∆n,T σ(Xs)dBs + 1 hn,T K(XJ∆n,T −Xi∆n,T hn,T ) Z rn∆n,T J∆n,T σ(Xs)dBs

for J = [nr]. M_ni(r) is anL2-martingale_∀i. Note that

dM_ni(r) = 1 hn,T K(XJ∆n,T −Xi∆n,T hn,T )σ(Xrn∆n,T)dBrn∆n,T 17

(18)

and dhM_ni, M_nki r = dM i n(r)dMnk(r) = " n∆n,T h2_n,T K( XJ∆n,T −Xi∆n,T hn,T )K(XJ∆n,T −Xk∆n,T hn,T )σ2(Xrn∆n,T) # dr. Then, h M_ni, M_nki r = Z r 0 dhM_ni, M_nki s = Z r 0 " n∆n,T h2 n,T K(X[ns]∆n,T −Xi∆n,T hn,T )K(X[ns]∆n,T −Xk∆n,T hn,T )σ2(Xsn∆n,T) # ds ∼ _h21 n,T Z rT 0 " K(Xu−Xi∆n,T hn,T )K(Xu−Xk∆n,T hn,T )σ2(Xu) # du. Furthermore, M_ni(r) = Z r 0 dM_ni(s) = Z r 0 1 hn,T K(X[sn]∆n,T −Xi∆n,T hn,T )σ(Xsn∆n,T)dBsn∆n,T and M_ni(r)M_nk(r) = Z r 0 " n∆n,T h2 n,T K(X[sn]∆n,T −Xi∆n,T hn,T )K(X[sn]∆n,T −Xk∆n,T hn,T )σ2(Xsn∆n,T) # ds ∼ hM_ni, M_nki r. (14)

The approximation (14) will be used in what follows. Given the continuous martingale Mi n(r),

there is a unique decomposition of the continuous submartingale ¡M_ni(r)¢2 as the sum of a contin-uous martingale and a contincontin-uous integrable increasing process [see Chung and Williams (1990), for example] such that

¡

M_ni(r)¢2₋¡M_ni(0)¢2= 2

Z r

0

MidMi+ [Mi]r ∀r

with M_ni(0) = 0in our case. Hence,

a2 = T n n X i=1 1 ³_∆ n,T hn,T Pn j=1K( Xj∆_n,T−Xi∆_n,T hn,T ) ´2 ¡ M_ni(1)¢2 = T n n X i=1 1 µ ∆n,T hn,T Pn j=1K( Xj∆_n,T−Xi∆ n,T hn,T ) ¶2[M i n]1 +2T n n X i=1 1 µ ∆n,T hn,T Pn j=1K( Xj∆_n,T−Xi∆ n,T hn,T ) ¶2 µZ 1 0 M_nidM_ni ¶ .

(19)

The first term can be represented as follows, T n n X i=1 1 µ ∆n,T hn,T Pn j=1K( Xj∆_n,T−Xi∆ n,T hn,T ) ¶2[M i n]1 = T n n X i=1 1 h2 n,T RT 0 K2( Xu−Xi∆ n,T hn,T )σ 2₍_X u)du µ ∆n,T hn,T Pn j=1K( Xj∆_n,T−Xi∆ n,T hn,T ) ¶2 +oa.s.(1) = 1 hn,T Z T 0    1 hn,T RT 0 K 2₍Xu−Xs hn,T )σ 2₍_X u)du ³ 1 hn,T RT 0 K( Xa−Xs hn,T )da ´2   ds+oa.s.(1) = 1 hn,T Z T 0    1 hn,T RT 0 K 2₍u−s hn,T)σ 2₍_u₎_L X(T, u)du ³ 1 hn,T RT 0 K( a−s hn,T)LX(T, a)da ´2   LX(T , s)ds+oa.s.(1) = 1 hn,T K2 ½Z _∞ −∞ σ2(s)LX(T , s) LX(T, s) ds ¾ +oa.s.(1), where K2= R_∞ −∞K 2₍_φ₎_d_φ_._{In consequence,} T n n X i=1 1 ³Pn j=1K( Xj∆_n,T−Xi∆_n,T hn,T ) ´2[M i_] 1 a.s. →0 iﬀ 1 hn,T ½Z _∞ −∞ σ2(s)LX(T , s) LX(T, s) ds ¾ a.s. → 0asn, T _{→ ∞}, which is implied by hn,TLX(T, x) a.s.

→ ∞ ∀x. We now analyze the second component of (a2). By Brownian motion embedding, the quantity

2T n n X i=1 1 µ ∆n,T hn,T Pn j=1K( Xj∆_n,T−Xi∆ n,T hn,T ) ¶2 µZ 1 0 M_nidM_ni ¶

converges to a normal variate with mean zero and whose variance is given by the limiting covariance variation process at time1. Write

4 µ T n ¶2_Xn i=1 n X k=1 R1 0 MniMnkd[Mni, Mnk] µ ∆n,T hn,T Pn j=1K( Xj∆_n,T−Xi∆ n,T hn,T ) ¶2µ ∆n,T hn,T Pn j=1K( Xj∆_n,T−Xk∆ n,T hn,T ) ¶2.

Using (14)and noting that R₀1£Mi n, Mnk ¤ d[Mi n, Mnk] = [Mi n,Mnk]21

2 ,the previous expression becomes

(20)

4 µ T n ¶2_Xn i=1 n X k=1 R1 0 MniMnkd[Mni, Mnk] µ ∆n,T hn,T Pn j=1K( Xj∆_n,T−Xi∆ n,T hn,T ) ¶2µ ∆n,T hn,T Pn j=1K( Xj∆_n,T−Xk∆ n,T hn,T ) ¶2 ∼ 2 µ T n ¶2_Xn i=1 n X k=1 [Mi n, Mnk]21 µ ∆n,T hn,T Pn j=1K( Xj∆_n,T−Xi∆ n,T hn,T ) ¶2µ ∆n,T hn,T Pn j=1K( Xj∆_n,T−Xk∆ n,T hn,T ) ¶2 = 2 µ T n ¶2_Xn i=1 n X k=1 µ 1 h2 n,T RT 0 K( Xu−Xi∆ n,T hn,T )K( Xu−Xk∆ n,T hn,T )σ 2₍_X u)du ¶2 µ ∆n,T hn,T Pn j=1K( Xj∆_n,T−Xi∆ n,T hn,T ) ¶2µ ∆n,T hn,T Pn j=1K( Xj∆_n,T−Xk∆ n,T hn,T ) ¶2 = 2 Z T 0 Z T 0 µ_R T 0 1 h2 n,T K(Xu−Xa hn,T )K( Xu−Xb hn,T )σ 2₍_X u)du ¶2 ³ 1 hn,T RT 0 K( Xf−Xa hn,T )df ´2³ 1 hn,T RT 0 K( Xe−Xb hn,T )de ´2dadb+oa.s.(1) = 2 Z _∞ −∞ Z _∞ −∞ µ_R ∞ −∞ 1 h2 n,T K(_hu−a n,T)K( u−b hn,T)σ 2₍_u₎_L X(T, u)du ¶2 ³ 1 hn,T RT 0 K( f−a hn,T)LX(T, f)df ´2³ 1 hn,T RT 0 K( e−b hn,T)LX(T, e)de ´2LX(T , a)LX(T , b)dadb = 2 1 hn,T Z _∞ −∞ Z _∞ −∞ 1 hn,T ³R_∞ −∞ 1 hn,TK( u−a hn,T)K( u−b hn,T)σ 2₍_u₎_L X(T, u)du ´2 LX(T , a)LX(T , b)dadb ³ 1 hn,T RT 0 K( f−a hn,T)LX(T, f)df ´2³ 1 hn,T RT 0 K( e−b hn,T)LX(T, e)de ´2 = 2 1 hn,T Z _∞ −∞ Z _∞ −∞ 1 hn,T ³R_∞ −∞K(c)K( a−b hn,T +c)σ 2₍_a₊_ch n,T)LX(T, a+chn,T)dc ´2 LX(T , a)LX(T , b)dadb ¡ LX(T, a) ¢2¡ LX(T, b) ¢2 = 2 1 hn,T Z _∞ −∞ Z _∞ −∞ ³R_∞ −∞K(c)K(k+c)σ 2₍_a₎_L X(T, a)dc ´2 LX(T , a)LX(T , a−hn,Tk)dadk ¡ LX(T, a) ¢2¡ LX(T, a−hn,Tk) ¢2 = 2 1 hn,T ÃZ ∞ −∞ σ4(a)L2_X(T , a) ¡ LX(T, a) ¢2 da ! ÃZ ∞ −∞ µZ ∞ −∞ K(c)K(k+c)dc ¶2 dk ! . Then, Ψ(T)−1/2   2T n n X i=1 1 ³_∆ n,T hn,T Pn j=1K( Xj∆_n,T−Xi∆_n,T hn,T ) ´2 µZ 1 0 M_nidM_ni ¶  =Op(1), where Ψ(T) = 2 hn,T ÃZ ∞ −∞ σ4(a)L2_X(T , a) ¡ LX(T, a) ¢2 da ! ÃZ ∞ −∞ µZ ∞ −∞ K(c)K(k+c)dc ¶2 dk ! a.s. →0 asn, T _{→ ∞}.

(21)

2T n n X i=1 1 ³_∆ n,T hn,T Pn j=1K( Xj∆_n,T−Xi∆_n,T hn,T ) ´2 µZ 1 0 M_nidM_ni ¶ a.s. → 0.

Similar steps allow us to show that (a3)a.s._→0.This proves the first part of the theorem. Note that the limit quantity Qdrif t₍_θdrif t₎ _{is continuous in} _θdrif t _{by virtue of Assumption 2.}₁_{. Then, by (9)}

and for every ε>0,_∃ξ >0

P³bθdrif t_n,T _∈/B(θ₀drif t,ε)´

≤ P³Qdrif t(bθ_n,Tdrif t,θ₀drif t)_≥ξ´

≤ P ³

Qdrif t(bθdrif t_n,T ,θ₀drif t)₋Qdrif t_n,T (bθdrif t_n,T ) +Qdrif t_n,T (bθdrif t_n,T )₋Qdrif t(θdrif t₀ ,θdrif t₀ )_≥ξ ´

≤ P³Qdrif t(bθdrif t_n,T ,θ₀drif t)₋Qdrif t_n,T (bθdrif t_n,T ) +Qdrif t_n,T (θ₀drif t) +oa.s.(1)−Qdrif t(θ₀drif t,θ₀drif t)≥ξ ´

≤ P Ã

2 sup

θdrif t_∈_Θdrif t |

Qdrif t(θ,θdrif t₀ )₋Q_n,Tdrif t(θ)_|+oa.s.(1)≥ξ !

a.s. →0.

This proves the second part of the statement.

Proof of theorem 4.2 Write

b θdrif t_n,T ₋θdrif t₀ =₋[ .. Qdrif t_n,T (θ∗)]−1 . Qdrif t_n,T (θdrif t₀ ), where θ∗ _∈ ³ b θdrif t_n,T ,θdrif t₀ ´ . Qdrif t_n,T (θ₀drif t) = ₋T n n X i=1 ³ b µ_n,T(Xi∆_n,T)−µ(Xi∆_n,T,θdrif t0 ) ´∂µ(Xi∆_n,T,θdrif t0 ) ∂θ .. Qdrif t_n,T (θ∗) = T n n X i=1 ∂µ(Xi∆_n,T,θ∗) ∂θ ∂µ(Xi∆_n,T,θ∗) ∂θ0 | {z } .. Qdrif tn,T (A)(θ∗) − T_n n X i=1 ³ b µ_n,T(Xi∆_n,T)−µ(Xi∆_n,T,θ∗) ´∂µ(Xi∆_n,T,θ∗) ∂θ∂θ0 | {z } .. Qdrif tn,T (B)(θ∗) . First, we examine .. Qdrif t_n,T (θ∗). Consider ..

Qdrif t_n,T (A)(θdrif t). Uniformly in Θdrif t we obtain

(22)

.. Qdrif t_n,T (A)(θdrif t) = T n n X i=1 ∂µ(Xi∆_n,T,θdrif t) ∂θ ∂µ(Xi∆_n,T,θdrif t) ∂θ0 = Z T 0 ∂µ(Xs,θdrif t) ∂θ ∂µ(Xs,θdrif t) ∂θ0 ds+oa.s.(1) = Z ∞ −∞ ∂µ(a,θdrif t) ∂θ ∂µ(a,θdrif t) ∂θ0 1 σ2₍_a₎LX(T , a)da+oa.s.(1) = Z _∞ −∞ ∂µ(a,θdrif t) ∂θ ∂µ(a,θdrif t) ∂θ0 LX(T , a)da+oa.s.(1)

= Q..drif t(A)(θdrif t) +oa.s.(1), (15)

by the occupation time formula [c.f. BP (1998)]. Hence,

|Q..drif t_n,T (A)(bθdrif t_n,T )₋Q..drif t(A) (θdrif t₀ )_|

≤ | .. Qdrif t_n,T (A)(bθdrif t_n,T )₋ .. Qdrif t(A) (bθdrif t_n,T )+ .. Qdrif t(A)(bθdrif t_n,T )₋ .. Qdrif t(A)(θdrif t₀ )_| ≤ | .. Qdrif t_n,T (A)(bθdrif t_n,T )₋ .. Qdrif t(A) (bθdrif t_n,T )_|+_| .. Qdrif t(A)(bθdrif t_n,T )₋ .. Qdrif t(A)(θdrif t₀ )_| ≤ oa.s.(1) +| .. Qdrif t(A) ³ θdrif t₀ +oa.s.(1) ´ −Q..drif t(A) (θdrif t₀ )_| ≤ oa.s.(1) +oa.s.(1) a.s. → 0,

by the continuity of Q..drif t(A) (.) and (15). Since θ∗ lies on the line segment between bθdrif t_n,T and

θ₀drif t, then

..

Qdrif t_n,T (A)(θ∗) =

..

Qdrif t(θ₀drif t) +oa.s.(1).

Following the same steps, it is simple to prove that

.. Qdrif t_n,T (B)(θ∗)a.s._→0. We now analyze . Q_n,T (θ₀drif t),writing −Q._n,T (θ₀drif t) = T n n X i=1 ³ b µ_n,T(Xi∆_n,T)−µ(Xi∆_n,T,θdrif t0 ) ´∂µ(Xi∆_n,T,θdrif t0 ) ∂θ = T n n X i=1    1 hn,T Pn j=1K( Xj∆_n,T−Xi∆ n,T hn,T ) h X₍_j₊₁₎_∆_n,T ₋Xj∆n,T i ∆n,T hn,T Pn j=1K( Xj∆_n,T−Xi∆ n,T hn,T ) −µ(Xi∆_n,T,θdrif t0 )    ∂µ(Xi∆_n,T,θ0drif t) ∂θ .

(23)

−Q._n,T (θ₀drif t) = T n n X i=1    1 h_n,T Pn j=1K( Xj∆_n,T−Xi∆ n,T hn,T ) R(j+1)∆_n,T j∆n,T ³ µ0(Xs)−µ0(Xi∆_n,T) ´ ds ∆n,T hn,T Pn j=1K( Xj∆_n,T−Xi∆ n,T hn,T )    ∂µ0(Xi∆_n,T) ∂θ | {z } An,T + T n n X i=1    1 hn,T Pn j=1K( Xj∆_n,T−Xi∆ n,T hn,T ) R(j+1)∆_n,T j∆n,T σ0(Xs)dBs ∆n,T hn,T Pn j=1K( Xj∆_n,T−Xi∆ n,T hn,T )    ∂µ0(Xi∆_n,T) ∂θ | {z } Bn,T(1) .

First, we examine the second term, Bn,T(1). Consider the martingale

Bn,T(r) = T n [nr] X i=1    1 hn,T Pn j=1K( Xj∆_n,T−Xi∆ n,T hn,T ) R(j+1)∆_n,T j∆n,T σ0(Xs)dBs ∆n,T hn,T Pn j=1K( Xj∆_n,T−Xi∆ n,T hn,T )   ∂µ0(X_∂θi∆n,T).

As n _{→ ∞}, T _{→ ∞} and hn,T → 0 such that ∆_h_n,Tn,T → 0 and LX_h_n,T(T,x)(∆n,T)α =Oa.s.(1) for some α_∈(0,1₂) [c.f. BP (1998)], its quadratic variation process can be written as

[Bn,T]r = µ T n ¶2 [_Xnr] i=1 [nr] X k=1     ³ 1 hn,T ´2_Pn j=1K( Xj∆_n,T−Xi∆ n,T hn,T )K( Xj∆_n,T−Xk∆ n,T hn,T ) R(j+1)∆_n,T j∆n,T σ 2 0(Xs)ds µ ∆n,T hn,T Pn j=1K( Xj∆_n,T−Xk∆ n,T hn,T ) ¶ µ ∆n,T hn,T Pn j=1K( Xj∆_n,T−Xi∆ n,T hn,T ) ¶    × × Ã ∂µ0(Xi∆_n,T) ∂θ ∂µ0(Xk∆_n,T) ∂θ0 ! = µ T n ¶2 [nr] X i=1 [nr] X k=1     n X j=1 ³ 1 hn,T ´2 K(Xj∆n,T−Xi∆n,T hn,T )K( Xj∆n,T−Xk∆ n,T hn,T )σ 2 0(Xj∆n,T +oa.s.(1))∆n,T µ ∆n,T hn,T Pn j=1K( Xj∆_n,T−Xk∆ n,T hn,T ) ¶ µ ∆n,T hn,T Pn j=1K( Xj∆_n,T−Xi∆ n,T hn,T ) ¶    +oa.s.(1) = Z rT 0 Z rT 0    Z T 0 ³ 1 hn,T ´2 K(Xu−Xa hn,T )K( Xu−Xb hn,T )σ 2 0(Xu)du ³ 1 hn,T RT 0 K( Xu−Xa hn,T )du ´ ³ 1 hn,T RT 0 K( Xu−Xb hn,T )du ´    µ ∂µ0(Xa) ∂θ ∂µ0(Xb) ∂θ0 ¶ dadb+oa.s.(1) = Z _∞ −∞ Z _∞ −∞    Z _∞ −∞ ³ 1 hn,T ´2 K(_hu−a n,T)K( u−b hn,T)σ 2 0(u)LX(T, u)du ³ 1 hn,T R_∞ −∞K( u−a hn,T)LX(T, u)du ´ ³ 1 hn,T R_∞ −∞K( u−b hn,T)LX(T, u)du ´   × × µ ∂µ0(a) ∂θ ∂µ0(b) ∂θ0 ¶ LX(rT , a)LX(rT , b)dadb+oa.s.(1). Thus, setting 23

(24)

u₋a hn,T =z, we can write Z _∞ −∞ Z _∞ −∞  Z ∞ −∞ 1 hn,TK(z)K( a+hn,Tz−b hn,T )σ 2 0(a+hn,Tz)LX(T, a+hn,Tz)dz ³R_∞ −∞K(z)LX(T, a+hn,Tz)dz ´ ³ 1 hn,T R_∞ −∞K( a+hn,Tz−b hn,T )LX(T, a+hn,Tz)dz ´  _× × µ ∂µ0(a) ∂θ ∂µ0(b) ∂θ0 ¶ LX(rT , a)LX(rT , b)dadb+oa.s.(1). Further, setting a₋b hn,T =k, we obtain Z _∞ −∞ Z _∞ −∞  Z ∞ −∞ K(z)K(z+k)σ2₀(a+hn,Tz)LX(T, a+hn,Tz)dz ³R_∞ −∞K(z)LX(T, a+hn,Tz)dz ´ ³R_∞ −∞K(z+k)LX(T, a+hn,Tz)dz ´  _× × µ ∂µ0(a) ∂θ ∂µ0(a−khn,T) ∂θ0 ¶ LX(rT , a)LX(rT , a−khn,T)dadk+oa.s.(1) a.s. → Z _∞ −∞ σ2₀(a) µ ∂µ0(a) ∂θ ∂µ0(a) ∂θ0 ¶ ¡_L X(rT , a) ¢2 LX(T, a) da.

Then, by Brownian motion embedding [c.f. Revuz and Yor (1994)]

Bn,T(1)→dN Ã 0, ÃZ _∞ −∞ σ2(a) µ ∂µ0(a) ∂θ ∂µ0(a) ∂θ0 ¶ ¡_L X(T , a) ¢2 LX(T, a) da !! . Next, we examine An,T. An,T = T n n X i=1    1 hn,T Pn j=1K( Xj∆_n,T−Xi∆ n,T hn,T ) R(j+1)∆_n,T j∆n,T ³ µ0(Xs)−µ0(Xi∆_n,T) ´ ds ∆n,T hn,T Pn j=1K( Xj∆_n,T−Xi∆ n,T hn,T )   ∂µ0(X_∂θi∆n,T) = T n n X i=1    1 hn,T Pn j=1K( Xj∆_n,T−Xi∆ n,T hn,T ) ³ µ0(X∗)−µ0(Xj∆_n,T) ´ ∆n,T ∆n,T hn,T Pn j=1K( Xj∆_n,T−Xi∆ n,T hn,T )   ∂µ0(X_∂θi∆n,T) | {z } A1 n,T + T n n X i=1    1 hn,T Pn j=1K( Xj∆_n,T−Xi∆ n,T hn,T ) ³ µ0(Xj∆n,T)−µ0(Xi∆n,T) ´ ∆n,T ∆n,T hn,T Pn j=1K( Xj∆_n,T−Xi∆ n,T hn,T )   ∂µ0(X_∂θi∆n,T) | {z } A2 n,T ,

(25)

where X∗_∈₍_X

(j+1)∆n,T, Xj∆n,T) by the mean value theorem. A

1

n,T can be bounded as follows

A1_n,T _≤const.κn,T Ã T n n X i=1 ∂µ0(Xi∆_n,T) ∂θ !

where κn,T = maxi≤nsupj∆n,T≤s≤(j+1)∆n,T |Xs−Xj∆n,T|. We know that κn,T = Oa.s. ¡

∆n,T1/2¢. Then the bound becomes,

const.Oa.s. ³ ∆n,T1/2 ´ µµZ ∞ −∞ ∂µ0(a) ∂θ LX(T , a)da ¶ +oa.s.(1) ¶ . Now consider term A2_n,T.

A2_n,T = T n n X i=1    1 hn,T Pn j=1K( Xj∆_n,T−Xi∆ n,T hn,T ) ³ µ0(Xj∆n,T)−µ0(Xi∆n,T) ´ ∆n,T ∆n,T hn,T Pn j=1K( Xj∆_n,T−Xi∆ n,T hn,T )   ∂µ0(X_∂θi∆n,T) = T n n X i=1    1 hn,T Pn j=1K( Xj∆_n,T−Xi∆ n,T hn,T ) ³ µ0₀(x∗ ij) ³ Xj∆n,T −Xi∆n,T ´´ ∆n,T ∆n,T hn,T Pn j=1K( Xj∆_n,T−Xi∆ n,T hn,T )    ∂µ0(Xi∆_n,T) ∂θ , where x∗ =f(Xj∆n,T, Xi∆n,T)∈[Xj∆n,T, Xi∆n,T].Hence, A2_n,T = Z T 0 1 hn,T RT 0 K( Xu−Xs hn,T )µ 0 0(f(Xu, Xs)) (Xu−Xs) 1 hn,T RT 0 K( Xu−Xs hn,T )du ∂µ0(Xs) ∂θ duds+oa.s(1) = Z _∞ −∞ 1 hn,T R_∞ −∞K( u−s hn,T)µ 0 0(f(s, u)) (u−s) 1 hn,T R_∞ −∞K( u−s hn,T)LX(T, u)du ∂µ0(s) ∂θ LX(T, u)LX(T , s)duds+oa.s(1). In consequence, 1 hn,T A2_n,T = Z _∞ −∞ 1 hn,T R_∞ −∞K( u−s hn,T)µ 0 0(f(s, u)) ³ u−s hn,T ´ 1 hn,T R_∞ −∞K( u−s hn,T)L(T, u)du ∂µ0(s) ∂θ LX(T, u)LX(T , s)duds+oa.s(1) = Z _∞ −∞ R_∞ −∞cK(c)µ 0 0(f(s, s+hn,Tc)) R_∞ −∞K(c)LX(T, s+hn,Tc)dc ∂µ0(s) ∂θ LX(T, s+hn,Tc)LX(T , s)dsdc+oa.s(1) = Z _∞ −∞ Z _∞ −∞ cK(c)µ 0 0(s) σ2 0(s) ∂µ0(s) ∂θ LX(T, s+hn,Tc) LX(T , s) LX(T, s) dsdc − Z _∞ −∞ Z _∞ −∞ cK(c)µ 0 0(s) σ2 0(s) ∂µ0(s) ∂θ LX(T, s) LX(T , s) LX(T, s) dsdc +oa.s(1). 25

(26)

Then, 1 hn,T A2_n,T = Z _∞ −∞ Z _∞ −∞ cK(c)µ 0 0(s) σ₀2(s) ∂µ0(s) ∂θ LX(T , s) LX(T, s) (LX(T, s+hn,Tc)−LX(T, s))dsdc. In consequence, 1 h3_n,T/2 A2_n,T = 2 Z _∞ −∞ Z _∞ −∞ cK(c)µ 0 0(s) σ₀2(s) ∂µ0(s) ∂θ LX(T , s) LX(T, s) 1 2phn,T (LX(T, s+hn,Tc)−LX(T, s))dsdc.

By a simple application of the results in BP (1998) we can write,

1 h3_n,T/2 A2_n,T → d2 Z _∞ −∞ Z _∞ −∞ Ã cK(c) Ã µ0₀(s) σ2 0(s) ! LX(T , s) LX(T, s) ∂µ0(s) ∂θ ! B(LX(T, s), c)dsdc,

where B(., .)is a standard Brownian sheet. Hence,

³ b θdrif t_n,T ₋θ₀drif t´ = µµZ _∞ −∞ ∂µ0(a) ∂θ ∂µ0(a) ∂θ0 LX(T , a)da ¶ +oa.s.(1) ¶₋1 [A1_n,T +A2_n,T +Bn,T(1)] → d µZ _∞ −∞ ∂µ0(a) ∂θ ∂µ0(a) ∂θ0 LX(T , a)da ¶−1 M N Ã 0, ÃZ _∞ −∞ σ2(a) µ ∂µ0(a) ∂θ ∂µ0(a) ∂θ0 ¶ ¡_L X(T , a) ¢2 LX(T, a) da !! d =N(0,Ξmu(T)).

This, in turn, implies

Ξ−_mu1/2(T)³bθ_n,Tdrif t₋θ₀drif t´_→dN(0,I), where Ξmu(T) =B(T)−mu1Ω(T)muB(T)−mu1, Bmu= µZ _∞ −∞ ∂µ0(a) ∂θ ∂µ0(a) ∂θ0 LX(T , a)da ¶ and Ωmu= ÃZ _∞ −∞ σ2(a) µ ∂µ0(a) ∂θ ∂µ0(a) ∂θ0 ¶ ¡_L X(T , a) ¢2 LX(T, a) da ! . This proves the stated result.

(27)

Proof of theorem 4.6 We can follow the same steps as in the proof of Theorem 4.1.

Proof of theorem 4.7 As in the proof of Theorem 4.2, write

bθdif f_n,T ₋θdif f₀ =₋[ .. Qdif f_n,T (θ∗)]−1 . Qdif f_n,T (θdif f₀ ), where θ∗ _∈ ³bθdif f_n,T ,θ₀dif f´ . Qdif f_n,T (θ₀dif f) = ₋T n n X i=1 ³ b σ2_n,T(Xi∆_n,T)−σ2(Xi∆_n,T,θdif f0 ) ´∂σ2(Xi∆_n,T,θdif f0 ) ∂θ .. Qdif f_n,T (θ∗) = T n n X i=1 ∂σ2(Xi∆_n,T,θ∗) ∂θ ∂σ2(Xi∆_n,T,θ∗) ∂θ0 | {z } .. Qdif fn,T(A)(θ∗) − T_n n X i=1 ³ b σ2_n,T(Xi∆_n,T)−σ2(Xi∆_n,T,θ∗) ´∂σ2(Xi∆_n,T,θ∗) ∂θ∂θ0 | {z } .. Qdif fn,T (B)(θ∗) .

First, we examine Q..dif f_n,T (θ∗). ConsiderQ.._n,Tdif f(A)(θdif f). Uniformly inΘdif f we obtain

.. Qdif f_n,T (A)(θdif f) = T n n X i=1 ∂σ2(Xi∆_n,T,θdif f) ∂θ ∂σ2(Xi∆_n,T,θdif f) ∂θ0 = Z T 0 ∂σ2(Xs,θdif f) ∂θ ∂σ2(Xs,θdif f) ∂θ0 ds+oa.s.(1) = Z ∞ −∞ ∂σ2(a,θdif f) ∂θ ∂σ2(a,θdif f) ∂θ0 1 σ2₍_a₎LX(T , a)da+oa.s.(1) = Z _∞ −∞ ∂µ(a,θdif f) ∂θ ∂µ(a,θdif f) ∂θ0 LX(T , a)da+oa.s.(1)

= Q..dif f(A)(θdif f) +oa.s.(1), (16)

by the occupation time formula [c.f. BP (1998)]. Then, using the continuity of Q..dif f (.), the a.s. consistency of bθdif f_n,T and result (16) as in the proof of Theorem 4.2, we can write

..

Qdif f_n,T (A)(θ∗) = Q..dif f(A)(θ₀dif f) +oa.s.(1),

since θ∗ lies on the line segment connectingbθdif f_n,T and θdif f₀ . Further,

..

Qdif f_n,T (B) (θ∗) = Q..dif f(B)(θ₀dif f) +oa.s.(1) a.s. → 0. Now, consider . Qdif f_n,T (θdif f₀ ). 27

(28)

−Q.dif f_n,T (θ₀dif f) = T n n X i=1 ³ b σ2(Xi∆_n,T)−σ2(Xi∆_n,T,θ₀dif f) ´∂σ2(Xi∆_n,T,θdif f0 ) ∂θ = T n n X i=1    1 hn,T Pn j=1K( Xj∆_n,T−Xi∆ n,T hn,T ) ³ X₍_j₊₁₎_∆_n,T ₋Xj∆n,T ´2 ∆n,T hn,T Pn j=1K( Xj∆_n,T−Xi∆ n,T hn,T ) −σ2(Xi∆_n,T,θ0dif f)    ∂σ2₍_X i∆_n,T,θdif f0 ) ∂θ .

Then, writing σ2(Xi∆_n,T,θdif f₀ ) =σ02(Xi∆_n,T),we obtain

−Q.dif f_n,T (θdif f₀ ) = T n n X i=1    1 hn,T Pn j=1K( Xj∆_n,T−Xi∆ n,T hn,T ) R(j+1)∆_n,T j∆n,T 2 ¡ Xs−Xj∆n,T ¢ µ0(Xs)ds ∆n,T hn,T Pn j=1K( Xj∆_n,T−Xi∆ n,T hn,T )   ∂σ 2 0(Xi∆_n,T) ∂θ | {z } An,T + T n n X i=1    1 hn,T Pn j=1K( Xj∆_n,T−Xi∆ n,T hn,T ) R(j+1)∆_n,T j∆n,T 2 ¡ Xs−Xj∆n,T ¢ σ0(Xs)dBs ∆n,T hn,T Pn j=1K( Xj∆n,T−Xi∆ n,T hn,T )   ∂σ 2 0(Xi∆_n,T) ∂θ | {z } Bn,T(1) + T n n X i=1    1 hn,T Pn j=1K( Xj∆n,T−Xi∆ n,T hn,T ) R(j+1)∆_n,T j∆n,T ³ σ₀2(Xs)−σ02(Xi∆_n,T) ´ ds ∆n,T hn,T Pn j=1K( Xj∆_n,T−Xi∆ n,T hn,T )    ∂σ2 0(Xi∆_n,T) ∂θ | {z } Cn,T .

Fisrt, we examine the second term Bn,T(1). Consider the martingale s 1 ∆n,T Bn,T(r) = T n [nr] X i=1    q 1 ∆n,T 1 hn,T Pn j=1K( Xj∆_n,T−Xi∆ n,T hn,T ) R(j+1)∆_n,T j∆n,T 2 ¡ Xs−Xj∆n,T ¢ σ0(Xs)dBs ∆n,T hn,T Pn j=1K( Xj∆_n,T−Xi∆ n,T hn,T )   ∂σ 2 0(Xi∆_n,T) ∂θ .

Then, as n_{→ ∞}, T _{→ ∞} and hn,T → 0 such that ∆_h_n,Tn,T →0 and LX_h_n,T(T,x)(∆n,T)α = Oa.s.(1) for

some α_∈(0,1₂) [c.f. BP (1998)], we can write the quadratic variation process as

[Bn,T]r = µ T n ¶2 [_Xnr] i=1 [nr] X k=1     ³ 1 hn,T ´2Pn j=1K( Xj∆_n,T−Xi∆ n,T hn,T )K( Xj∆_n,T−Xk∆ n,T hn,T ) R(j+1)∆n,T j∆n,T 2 ¡ Xs−Xj∆n,T ¢ σ2 0(Xs)ds ∆n,T µ ∆n,T hn,T Pn j=1K( Xj∆_n,T−Xk∆ n,T hn,T ) ¶ µ ∆n,T hn,T Pn j=1K( Xj∆_n,T−Xi∆ n,T hn,T ) ¶    

(29)

× Ã ∂σ₀2(Xi∆_n,T) ∂θ ∂σ2 0(Xk∆_n,T) ∂θ0 ! = µ T n ¶2 [nr] X i=1 [nr] X k=1     n X j=1 4³_h1 n,T ´2 K(Xj∆n,T−Xi∆n,T hn,T )K( Xj∆_n,T−Xk∆ n,T hn,T )σ 4 0 ¡ Xj∆n,T +oa.s.(1) ¢ (∆n,T) µ ∆n,T hn,T Pn j=1K( Xj∆_n,T−Xk∆ n,T hn,T ) ¶ µ ∆n,T hn,T Pn j=1K( Xj∆_n,T−Xi∆ n,T hn,T ) ¶     × Ã ∂σ2 0(Xi∆_n,T) ∂θ ∂σ2 0(Xk∆_n,T) ∂θ0 ! = Z rT 0 Z rT 0    Z T 0 4 ³ 1 hn,T ´2 K(Xu−Xa hn,T )K( Xu−Xb hn,T )σ 4 0(Xu)du ³ 1 hn,T RT 0 K( Xu−Xa hn,T )du ´ ³ 1 hn,T RT 0 K( Xu−Xb hn,T )du ´    µ ∂σ2₀(Xa) ∂θ ∂σ₀2(Xb) ∂θ0 ¶ dadb +oa.s.(1) = Z _∞ −∞ Z _∞ −∞    Z _∞ −∞ 4³_h1 n,T ´2 K(_hu−a n,T)K( u−b hn,T)σ 4 0(u)LX(T, u)du ³ 1 hn,T R_∞ −∞K( u−a hn,T)LX(T, u)du ´ ³ 1 hn,T R_∞ −∞K( u−b hn,T)LX(T, u)du ´   × × µ ∂σ₀2(a) ∂θ ∂σ₀2(b) ∂θ0 ¶ LX(rT , a)LX(rT , b)dadb+oa.s.(1). Setting u₋a hn,T =z, we can write Z _∞ −∞ Z _∞ −∞  Z ∞ −∞ 4_h1 n,TK(z)K( a+hn,Tz−b hn,T )σ 4 0(a+hn,Tz)LX(T, a+hn,Tz)dz ³R_∞ −∞K(z)LX(T, a+hn,Tz)dz ´ ³ 1 hn,T R_∞ −∞K( a+hn,Tz−b hn,T )LX(T, a+hn,Tz)dz ´  _× × µ ∂σ2₀(a) ∂θ ∂σ₀2(b) ∂θ0 ¶ LX(rT , a)LX(rT , b)dadb+oa.s.(1). Further, setting a₋b hn,T =k, we obtain Z _∞ −∞ Z _∞ −∞  Z ∞ −∞ 4K(z)K(z+k)σ4 0(a+hn,Tz)LX(T, a+hn,Tz)dz ³R_∞ −∞K(z)LX(T, a+hn,Tz)dz ´ ³R_∞ −∞K(z+k)LX(T, a+hn,Tz)dz ´  _× × µ ∂σ₀2(a) ∂θ ∂σ2₀(a₋khn,T) ∂θ0 ¶ LX(rT , a)LX(rT , a−khn,T)dadk+oa.s.(1) a.s. → Z _∞ −∞ 4σ4₀(a) µ ∂σ2 0(a) ∂θ ∂σ2 0(a) ∂θ0 ¶ ¡_L X(rT , a) ¢2 LX(T, a) da. 29

(30)

Then, as in the proof of Theorem 4.2, 1 p ∆n,T Bn,T(1)→dM N Ã 0, ÃZ _∞ −∞ 4σ₀4(a) µ ∂σ₀2(a) ∂θ ∂σ2₀(a) ∂θ0 ¶ Ã¡_L X(T , a) ¢2 LX(T, a) ! da !! . Now examine Cn,T. Cn,T = T n n X i=1    1 hn,T Pn j=1K( Xj∆_n,T−Xi∆ n,T hn,T ) R(j+1)∆_n,T j∆n,T ³ σ2₀(Xs)−σ20(Xi∆_n,T) ´ ds ∆n,T hn,T Pn j=1K( Xj∆_n,T−Xi∆ n,T hn,T )   ∂σ 2 0(Xi∆_n,T) ∂θ = T n n X i=1    1 hn,T Pn j=1K( Xj∆_n,T−Xi∆ n,T hn,T ) ¡ σ2₀(X∗)₋σ₀2(Xj∆n,T) ¢ ∆n,T ∆n,T hn,T Pn j=1K( Xj∆_n,T−Xi∆ n,T hn,T )   ∂σ 2 0(Xi∆_n,T) ∂θ | {z } C1 n,T + T n n X i=1    1 hn,T Pn j=1K( Xj∆_n,T−Xi∆ n,T hn,T ) ³ σ₀2(Xj∆n,T)−σ 2 0(Xi∆_n,T) ´ ∆n,T ∆n,T hn,T Pn j=1K( Xj∆_n,T−Xi∆ n,T hn,T )   ∂σ 2 0(Xi∆_n,T) ∂θ | {z } C2 n,T ,

where X∗_∈(X₍_j₊₁₎_∆_n,T, Xj∆n,T) by the mean value theorem. First, we analyse termC

2 n,T. C2_n,T = T n n X i=1    1 hn,T Pn j=1K( Xj∆_n,T−Xi∆ n,T hn,T ) ³ σ2 0(Xj∆n,T)−σ 2 0(Xi∆_n,T) ´ ∆n,T ∆n,T hn,T Pn j=1K( Xj∆_n,T−Xi∆ n,T hn,T )   ∂σ 2 0(Xi∆_n,T) ∂θ .

Following the same steps as for A2_n,T in the proof of Theorem 4.2 we can prove that

1 h3_n,T/2 C2_n,T _→d4 Z _∞ −∞ Z _∞ −∞ cK(c) Ã σ0₀(s) σ0(s) ! LX(T , s) LX(T, s) ∂σ2₀(s) ∂θ B(LX(T, s), c)dsdc,

where B(., .) is a standard Brownian sheet. As for An,T and C1n,T, it is easy to see that An,T =

oa.s.(Bn,T)and Cn,T1 =oa.s.(C2n,T). Then,

b θdif f_n,T ₋θdif f₀ = ₋[ .. Qdif f_n,T (θ∗)]−1 . Qdif f_n,T (θdif f₀ ) = µµZ _∞ −∞ ∂σ₀2(a) ∂θ ∂σ₀2(a) ∂θ0 LX(T , a)da ¶ +oa.s.(1) ¶−1 [Bn,T +An,T +C1n,T +C2n,T] = µµZ ∞ −∞ ∂σ₀2(a) ∂θ ∂σ₀2(a) ∂θ0 LX(T , a)da ¶ +oa.s.(1) ¶−1 [Bn,T +oa.s.(Bn,T) +oa.s.(C2n,T) +C2n,T]. If h 3 n,T ∆n,T →0,then

(31)

s 1 ∆n,T ³ b θdif f_n,T ₋θ₀dif f ´ → d µZ _∞ −∞ ∂σ2 0(a) ∂θ ∂σ2 0(a) ∂θ0 LX(T , a)da ¶−1 M N Ã 0, ÃZ _∞ −∞ 4σ4(a) µ ∂σ2 0(a) ∂θ ∂σ2 0(a) ∂θ0 ¶ ¡_L X(T, a) ¢2 LX(T, a) da !! d =N(0,Ξsigma(T)). In consequence, 1 p ∆n,T

Ξ−_sigma1/2 (T)³bθ_n,Tdif f₋θ₀dif f´_→dN(0,I),

where

Ξsigma(T) =B(T)−sigma1 Ω(T)sigmaB(T)−sigma1 ,

Bsigma= µZ _∞ −∞ ∂σ2 0(a) ∂θ ∂σ2 0(a) ∂θ0 LX(T , a)da ¶ , and Ωsigma= ÃZ _∞ −∞ 4σ4(a) µ ∂σ₀2(a) ∂θ ∂σ₀2(a) ∂θ0 ¶ ¡_L X(T, a) ¢2 LX(T, a) da ! .

This concludes the proof for the diﬀusion estimator.

References

[1] Aït-Sahalia, Y., 1996b. Testing continuous-time models of the spot interest rate. The Review of Financial Studies 2, 385-426.

[2] Andrews, D.W.K., 1989. Asymptotics for Semiparametric Econometric Models: III. Testing and Examples. C.F.D.P. No. 910, Yale University.

[3] Bandi, F.,1999. Stanton’s (1997) approximations to thefirst and second moment of a diﬀusion: statistical properties and implications for pricing. Unpublished working paper, Yale University. [4] Bandi, F., Nguyen T., 1999. Fully nonparametric estimators for diﬀusions: a small sample

analysis. Unpublished working paper, Yale University.

[5] Bandi, F., Phillips P.C.B., 1998. Econometric estimation of diﬀusion models. Unpublished working paper, Yale University.

[6] Baxter, M., Rennie, A.,1996. Financial Calculus. An Introduction to Derivative Pricing. Cam-bridge University Press, CamCam-bridge.

[7] Davidson, R., MacKinnon, J., 1993. Estimation and Inference in Econometrics. Oxford Uni-versity Press, New York.

(32)

[8] Duﬃe, D., 1992. Dynamic Asset Pricing Theory. Princeton University Press.

[9] Florens-Zmirou, D., 1993. On estimating the diﬀusion coeﬃcient from discrete observations. Journal of Applied Probability 30, 790-804.

[10] Härdle, W.,1990. Applied Nonparametric Regression. Cambridge University Press, Cambridge. [11] Jacod, J., 1997. Nonparametric kernel estimation of the diffusion coefficient of a diffusion.

Prépublication No. 405 du Laboratoire de l’Université Paris VI.

[12] Jiang, G.J., Knight, J., 1997. A Nonparametric approach to the estimation of diﬀusion processes, with an application to a short-term interest rate model. Econometric Theory 13, 615-645.

[13] Karatzas, I., Shreve, S.E., 1988. Brownian Motion and Stochastic Calculus. Springer-Verlag, New York.

[14] Karlin, S., Taylor,1981. A Second Course in Stochastic Processes. Academic Press, New York. [15] Park, J., Phillips P.C.B., 1999. Asymptotics for nonlinear transformations of integrated time

series. Econometric Theory,15, 269-298.

[16] Park, J., Phillips P.C.B., 2000. Nonlinear regressions with integrated time series.Econometrica

(forthcoming).

[17] Phillips, P.C.B., Park, J., 1997. Nonstationary density estimation and kernel autoregression. Unpublished working paper, Yale University.

[18] Pritsker, M., 1998. Nonparametric density estimation and tests of continuous-time interest rate models. The Review of Financial Studies 3.

[19] Protter, P., 1990. Stochastic Integration and Diﬀerential Equations. Springer-Verlag, New York.

[20] Revuz, D., Yor, M., 1994. Continuous Martingales and Brownian Motion. Second Edition, Springer-Verlag, New York.

[21] Stanton, R., 1998. A nonparametric model of term strucure dynamics and the market price of interest rate risk. Journal of Finance 5,1973-2002.