To establish the asymptotic properties of the smooth backfitting estimators, the one-dimensional marginal estimators have to be investigated, since the smooth backfitting estimators inherit their behavior. In the proofs the dimension index of the estimators is suppressed, i.e. $\hat\mu^{NW}_h(x_j) = \hat\mu^{NW}_{1,j,h}(x_j)$, $\tilde\mu^{NW}_h(x_j) = \tilde\mu^{NW}_{1,j,h}(x_j)$, and analogously for the local linear estimators. Using the integral form of the stochastic differential equation for $X^1$, the local constant and the local linear estimators are decomposed into a bias and a variance part. For this, define for $l = 0, 1$
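$$\hat t^B_{h,l}(x_j) = \frac{1}{T}\sum_{k=1}^{nT} K_h(x_j, X^j_{(k-1)\Delta})\Big(\frac{X^j_{(k-1)\Delta} - x_j}{h}\Big)^l \int_{(k-1)\Delta}^{k\Delta}\mu_1(X_s)\,ds,$$
$$\hat t^V_{h,l}(x_j) = \frac{1}{T}\sum_{k=1}^{nT} K_h(x_j, X^j_{(k-1)\Delta})\Big(\frac{X^j_{(k-1)\Delta} - x_j}{h}\Big)^l \sum_{i=1}^{d}\int_{(k-1)\Delta}^{k\Delta}\sigma_{1i}(X_s)\,dW^i_s.$$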
For abbreviation, define
$$\tilde K_{h,l}(x_j, X^j_{(k-1)\Delta}) = K_h(x_j, X^j_{(k-1)\Delta})\Big(\frac{X^j_{(k-1)\Delta} - x_j}{h}\Big)^l,$$
and, for a more compact notation, set $\hat f_{h,0}(x_j) = \hat f_h(x_j)$ and $\hat f_{h,(0,0)}(x_i, x_j) = \hat f_h(x_i, x_j)$. This allows the marginal Nadaraya-Watson estimator to be written as
$$\hat\mu^{NW}_h(x_j) = \hat f_{h,0}(x_j)^{-1}\hat t^B_{h,0}(x_j) + \hat f_{h,0}(x_j)^{-1}\hat t^V_{h,0}(x_j) = \hat\mu^{NW,B}_h(x_j) + \hat\mu^{NW,V}_h(x_j) \tag{4.21}$$
Appendix I: Preliminary Lemmata 109
and the marginal local linear estimator $\hat\mu^{LL}_h(x_j)$
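The decomposition (4.21) can be illustrated numerically. The following is a minimal sketch, not the implementation used in this work. It assumes a two-dimensional Ornstein-Uhlenbeck process $dX_t = -\theta X_t\,dt + \sigma\,dW_t$ simulated with an Euler scheme (so $\int\mu_1(X_s)\,ds$ and $\sum_i\int\sigma_{1i}(X_s)\,dW^i_s$ are replaced by their one-step approximations), the Epanechnikov kernel with $K_h(x_j, y) = h^{-1}K((y - x_j)/h)$, and the hypothetical helper names `simulate_ou_2d` and `nw_decomposition`.

```python
import numpy as np


def epanechnikov(u):
    """Epanechnikov kernel K(u) = 0.75 (1 - u^2) on [-1, 1] (assumed kernel)."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)


def simulate_ou_2d(n, T, theta=1.0, sigma=0.5, seed=0):
    """Hypothetical helper: Euler scheme for dX = -theta X dt + sigma dW in d = 2.

    Uses step Delta = 1/n and nT steps. Returns the path X (shape (nT+1, 2)),
    the drift increments mu(X_{(k-1)Delta}) * Delta and the noise increments
    sigma * (W_{k Delta} - W_{(k-1)Delta}), both of shape (nT, 2).
    """
    rng = np.random.default_rng(seed)
    delta, N = 1.0 / n, n * T
    noise = sigma * rng.normal(scale=np.sqrt(delta), size=(N, 2))
    drift = np.zeros((N, 2))
    X = np.zeros((N + 1, 2))
    for k in range(N):
        drift[k] = -theta * X[k] * delta
        X[k + 1] = X[k] + drift[k] + noise[k]
    return X, drift, noise


def nw_decomposition(x, X, drift, noise, h, T, j=0):
    """Hypothetical helper: split the NW estimate of drift component 1 at x_j = x.

    j is the 0-based conditioning coordinate. Returns
    (mu_NW, f^{-1} t^B_{h,0}, f^{-1} t^V_{h,0}) as in (4.21).
    """
    Xj = X[:-1, j]                        # X^j_{(k-1)Delta}, k = 1, ..., nT
    Kh = epanechnikov((Xj - x) / h) / h   # K_h(x_j, X^j_{(k-1)Delta})
    f0 = Kh.mean()                        # \hat f_{h,0}(x_j) = (nT)^{-1} sum K_h
    tB = np.sum(Kh * drift[:, 0]) / T     # bias part \hat t^B_{h,0}(x_j)
    tV = np.sum(Kh * noise[:, 0]) / T     # variance part \hat t^V_{h,0}(x_j)
    mu_nw = np.sum(Kh * np.diff(X[:, 0])) / T / f0
    return mu_nw, tB / f0, tV / f0


X, drift, noise = simulate_ou_2d(n=20, T=200)
mu_nw, mu_bias, mu_var = nw_decomposition(0.0, X, drift, noise, h=0.3, T=200)
print(mu_nw, mu_bias + mu_var)  # equal up to floating-point rounding
```

With $nT$ observations the factor $1/T$ in $\hat t_{h,l}$ combined with $\hat f_{h,0} = (nT)^{-1}\sum_k K_h$ reproduces the familiar form $\sum_k K_h\,(X^1_{k\Delta}-X^1_{(k-1)\Delta})/(\Delta\sum_k K_h)$.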
The building blocks for the asymptotic distribution of the smooth backfitting estimator are the uniform convergence rates of both parts and the asymptotic distribution of the variance part. To establish these, the one- and two-dimensional marginal density estimates are investigated first, because they also arise in the smooth backfitting algorithms.
Lemma 4.1. Under Assumptions 4.1 and 4.2 it holds that sup
The result is stated here for arbitrary $n$. In some cases it is possible to obtain the superoptimal rate $T^{-1/2}$ for $n \to \infty$ (see e.g. Bosq, 1998). For multivariate diffusion processes, however, the conditions required for this are not satisfied (in contrast to the scalar case), and no more than the standard nonparametric rate can be achieved.
Proof. Consider the one-dimensional case first and decompose the estimator into bias and variance:
$$\hat f_{h,k}(x_j) - \kappa_k(x_j)f(x_j) = \hat f_{h,k}(x_j) - E\hat f_{h,k}(x_j) + E\hat f_{h,k}(x_j) - \kappa_k(x_j)f(x_j).$$
Standard kernel calculations show that the bias is of order $h^2$ uniformly over the interior of $G_j$.
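The $h^2$ order of the bias can be checked with a short computation. The sketch below assumes a standard normal marginal density, the Epanechnikov kernel and an interior point, where $\kappa_0 = 1$ and $\kappa_2 = \int u^2 K(u)\,du = 1/5$ (the boundary-corrected moments $\kappa_k(x_j)$ are not modelled). Since $E\hat f_{h,0}(x) = \int K(u) f(x + hu)\,du$ holds for any stationary process with marginal density $f$, the bias can be evaluated by quadrature without simulation; the ratio bias$/h^2$ should approach $\kappa_2 f''(x)/2$.

```python
import numpy as np


def epanechnikov(u):
    """Epanechnikov kernel on [-1, 1]; its second moment is 1/5."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)


def normal_pdf(x):
    """Standard normal density, used here as the (assumed) marginal density f."""
    return np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)


def density_bias(x, h, num=20001):
    """E f_hat_{h,0}(x) - f(x) = int K(u) (f(x + h u) - f(x)) du, trapezoidal rule."""
    u = np.linspace(-1.0, 1.0, num)
    g = epanechnikov(u) * (normal_pdf(x + h * u) - normal_pdf(x))
    du = u[1] - u[0]
    return du * (g.sum() - 0.5 * (g[0] + g[-1]))


# Second-order limit kappa_2 f''(0) / 2, with f''(0) = -f(0) for N(0, 1).
limit = 0.2 * (-normal_pdf(0.0)) / 2.0
ratios = {h: density_bias(0.0, h) / h**2 for h in (0.4, 0.2, 0.1)}
print(ratios, limit)
```

The ratios stay nearly constant as $h$ shrinks, which is what "of order $h^2$" means here; the remaining drift is the $O(h^4)$ term of the expansion.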
110 4. ESTIMATING ADDITIVE DIFFUSIONS
By the Lipschitz continuity of the kernel it holds for the first maximum that
l=1,...,Nmax sup
Cramér's conditions are easily verified with constant $(Th)^{-1}$.
With these results a Hoeffding-type inequality (see Theorem 1.3 in Bosq, 1998) can be applied to obtain
P
The two-dimensional case is handled in the same way: the estimator is decomposed into bias and variance as above and the argument proceeds as before. The only difference is that the variance of the kernel is then of order $h^{-2}$.
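The change from order $h^{-1}$ to order $h^{-2}$ can be made concrete in the same quadrature setting. This sketch additionally assumes independent standard normal coordinates and a product kernel $K_h(x_i, y_i)K_h(x_j, y_j)$ for the two-dimensional estimate (an illustrative choice). Then $h\operatorname{Var}K_h(x, X^j) \to f(x)\int K^2$ and $h^2\operatorname{Var}\{K_h(x_i, X^i)K_h(x_j, X^j)\} \to f(x_i, x_j)(\int K^2)^2$, with $\int K^2 = 3/5$ for the Epanechnikov kernel.

```python
import numpy as np


def epanechnikov(u):
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)


def normal_pdf(x):
    return np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)


def trapezoid(g, du):
    """Trapezoidal rule on an equally spaced grid."""
    return du * (g.sum() - 0.5 * (g[0] + g[-1]))


def kernel_moments(x, h, num=4001):
    """E K_h(x, X) and E K_h(x, X)^2 for X ~ N(0, 1), K_h(x, y) = K((y - x)/h)/h."""
    u = np.linspace(-1.0, 1.0, num)
    du = u[1] - u[0]
    m1 = trapezoid(epanechnikov(u) * normal_pdf(x + h * u), du)
    m2 = trapezoid(epanechnikov(u) ** 2 * normal_pdf(x + h * u), du) / h
    return m1, m2


h, x = 0.01, 0.0
m1, m2 = kernel_moments(x, h)
var_1d = m2 - m1**2                # Var K_h(x, X^j): order h^{-1}
# Independent coordinates: the moments of the product kernel factorise.
var_2d = m2 * m2 - (m1 * m1) ** 2  # Var of the product kernel: order h^{-2}
print(h * var_1d, 0.6 * normal_pdf(0.0))
print(h**2 * var_2d, 0.36 * normal_pdf(0.0) ** 2)
```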
Next, the uniform convergence of the estimators is investigated, beginning with the bias parts.
Lemma 4.2. Under Assumptions 4.1 and 4.2 it holds that sup
For the proof, uniform bounds have to be established for $\hat t^B_{h,l}(x_j)$, or more precisely for a centered version
where $\hat\mu^B_h(x_j)$ is the estimator under investigation. To apply a Hoeffding-type inequality (as in the proof of Lemma 4.1), treat the two terms separately and regard them as sums of $T$ α-mixing random variables $S^A_{t,n}(x_j)$ and $S^B_{t,n}(x_j)$, $t = 0, \dots, T-1$. Next it is shown that $E(S^A_{t,n}(x_j))^2 = O(T^{-2}n^{-1}h^{-1})$ and $E(S^B_{t,n}(x_j))^2 = O(T^{-2}n^{-1}h^{-1})$.
Applying the mean value theorem (and setting $t = 0$ without loss of generality) gives $E(S^A_{0,n}(x_j))^2$
The last line follows by iterated expectations and an application of the Burkholder-Davis-Gundy inequality. The covariance terms can be bounded by the Cauchy-Schwarz inequality, and the verification of Cramér's conditions proceeds along the same lines. This yields the stated rate for $E(S^A_{0,n}(x_j))^2$.
Because $\mu_1$ and the expected value of the estimator are bounded, the differences can be taken out of the expectations, and the rates then follow from standard kernel calculations. This completes the proof of
sup
To show the lemma, consider the following decomposition:
$$\hat\mu^{NW,B}_h(x_j) - m_1(x_j) = \hat f_{h,0}(x_j)^{-1}\hat t^{B*}_{h,0}(x_j) + E\big(\hat\mu^{1,B}_h(x_j)\big) - m_1(x_j).$$
To investigate the first term it suffices to consider the numerator, because the density is bounded from below on $G_j$; the statement then follows from the uniform bound for the centered term established above.
In the local linear case the analogous expansion is used and the same arguments apply.
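A minimal sketch of the local linear case, under the same illustrative assumptions as the Nadaraya-Watson sketch (Euler-simulated two-dimensional Ornstein-Uhlenbeck process, Epanechnikov kernel, hypothetical helper names). It assumes the standard local linear form $\hat\mu^{LL}_h = (\hat f_{h,2}\hat t_{h,0} - \hat f_{h,1}\hat t_{h,1})/(\hat f_{h,0}\hat f_{h,2} - \hat f_{h,1}^2)$, where $\hat f_{h,k}$ averages $\tilde K_{h,k}$ over the $nT$ observations and $\hat t_{h,l} = \hat t^B_{h,l} + \hat t^V_{h,l}$ is built from the observed increments; this form is an assumption here, since the section does not display it. Because the estimator is linear in $\hat t_{h,l}$, it splits into a bias and a variance part exactly as in the Nadaraya-Watson case.

```python
import numpy as np


def epanechnikov(u):
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)


def simulate_ou_2d(n, T, theta=1.0, sigma=0.5, seed=1):
    """Hypothetical helper: Euler scheme for dX = -theta X dt + sigma dW, Delta = 1/n."""
    rng = np.random.default_rng(seed)
    delta, N = 1.0 / n, n * T
    noise = sigma * rng.normal(scale=np.sqrt(delta), size=(N, 2))
    drift = np.zeros((N, 2))
    X = np.zeros((N + 1, 2))
    for k in range(N):
        drift[k] = -theta * X[k] * delta
        X[k + 1] = X[k] + drift[k] + noise[k]
    return X, drift, noise


def local_linear(x, Xj, incr, h, T):
    """Hypothetical helper: local linear estimate from kernel moments.

    f[m] = (nT)^{-1} sum Ktilde_{h,m} and t[m] = T^{-1} sum Ktilde_{h,m} * incr.
    """
    u = (Xj - x) / h
    Kh = epanechnikov(u) / h
    f = [np.mean(Kh * u**m) for m in range(3)]            # \hat f_{h,m}, m = 0, 1, 2
    t = [np.sum(Kh * u**m * incr) / T for m in range(2)]  # \hat t_{h,m}, m = 0, 1
    return (f[2] * t[0] - f[1] * t[1]) / (f[0] * f[2] - f[1] ** 2)


n, T, h, x = 20, 200, 0.3, 0.1
X, drift, noise = simulate_ou_2d(n, T)
Xj = X[:-1, 0]
mu_ll = local_linear(x, Xj, np.diff(X[:, 0]), h, T)  # full estimator
mu_ll_bias = local_linear(x, Xj, drift[:, 0], h, T)  # bias part
mu_ll_var = local_linear(x, Xj, noise[:, 0], h, T)   # variance part
print(mu_ll, mu_ll_bias + mu_ll_var)  # equal up to floating-point rounding
```

The closed form is the intercept of a kernel-weighted least-squares fit of $n(X^1_{k\Delta} - X^1_{(k-1)\Delta})$ on $(X^j_{(k-1)\Delta} - x_j)/h$, which gives a simple cross-check.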
Lemma 4.3. Under Assumptions 4.1 and 4.2 it holds that sup
Proof. Because the density is bounded from below, it suffices to consider the numerators of the estimators. Define the random variables
Ri,l(xj) = 1
Using Itô's lemma, Cramér's conditions can be verified with constant $(Th)^{-1}$. Then, as in the proofs of Lemmata 4.1 and 4.2 (using an exponential inequality and covering arguments), it follows that
$$\sup_{x_j \in G_j}\big|\hat t^V_{h,l}(x_j)\big| = O_P\Big(\frac{\log T}{(hT)^{1/2}}\Big)$$
and both parts of the lemma follow.
Besides the uniform convergence results, the asymptotic distribution of the variance parts of the two estimators has to be derived. This is given in the following lemma.

Lemma 4.4. Under Assumptions 4.1 and 4.2, and if $Th \to \infty$ and $nh^3 \to \infty$ for
The joint distribution of the vector of the $\hat\mu^{i,NW,V}_h(x_j)$ is a multivariate ($d^2$-dimensional) normal distribution with covariances given by
$$\operatorname{cov}\big(\sqrt{Th}\,\hat\mu^{i,V}_h(x_j), \sqrt{Th}\,\hat\mu^{i',V}_h(x_j)\big) = \frac{\kappa_2(x_j)}{\kappa_0(x_j)f(x_j)}E\big(a_{ii'}(X) \mid X^j = x_j\big)$$
and zero otherwise. The same holds in the local linear case.
Proof. To derive asymptotic normality, the distribution of $\hat t^V_{h,l}$ has to be considered. To this end, decompose it into a discretization error and a stochastic error:
btVh,l(xj) =1
To bound the discretization error, calculate E(JD,l(xj))2 = nT−1E
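where it was used that all covariances vanish. The last bound follows from the Burkholder-Davis-Gundy inequality. In total this yields
$$J_{D,l}(x_j) = O_P(T^{-1/2}n^{-1/2}h^{-2}) = o_P(T^{-1/2}h^{-1/2}),$$
since the ratio of the two rates is $n^{-1/2}h^{-3/2} = (nh^3)^{-1/2} \to 0$ by the assumption $nh^3 \to \infty$.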
Next, the asymptotic distribution of $\sqrt{hT}\,J_{T,l}(x_j)$ is derived. Note that for every $T$
with probability one. Moreover, for $T \to \infty$ it holds that $hT\langle J_{T,l}(x_j), J_{T,l}(x_j)\rangle = hT^{-1}$
Applying Proposition 1.21 in Kutoyants (2004), the following asymptotic distribution is obtained
Combining this with the convergence results of Lemma 4.1 and recalling the definitions of the estimators, the statement of the lemma follows for both estimators.
Next, some preliminary results for estimating the diffusion matrix are presented.
Again, the dimension index of the estimators is omitted. To decompose the marginal kernel estimators, recall that by Itô's lemma
$$(X^1_{k\Delta} - X^1_{(k-1)\Delta})(X^2_{k\Delta} - X^2_{(k-1)\Delta}) =$$
Next, convergence results for these estimators will be derived.
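For reference, the product of increments recalled above can be expanded with Itô's product rule. The display below is the standard identity, stated under the assumption (not spelled out in this section) that $a = \sigma\sigma^\top$, i.e. $a_{12}(x) = \sum_{i=1}^{d}\sigma_{1i}(x)\sigma_{2i}(x)$, so that $d\langle X^1, X^2\rangle_s = a_{12}(X_s)\,ds$. It is a sketch and need not coincide term by term with the display used in the original derivation.

$$(X^1_{k\Delta} - X^1_{(k-1)\Delta})(X^2_{k\Delta} - X^2_{(k-1)\Delta}) = \int_{(k-1)\Delta}^{k\Delta}(X^1_s - X^1_{(k-1)\Delta})\,dX^2_s + \int_{(k-1)\Delta}^{k\Delta}(X^2_s - X^2_{(k-1)\Delta})\,dX^1_s + \int_{(k-1)\Delta}^{k\Delta}a_{12}(X_s)\,ds.$$

Subtracting the last integral leaves two stochastic integrals whose martingale parts have mean zero, so the product of increments is, up to drift contributions, a locally unbiased proxy for $\int_{(k-1)\Delta}^{k\Delta}a_{12}(X_s)\,ds$.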
Lemma 4.5. Under Assumptions 4.1 and 4.2 it holds that sup
same. The proof of this lemma is therefore analogous to the proof of Lemma 4.2 and is omitted.
Lemma 4.6. Under Assumptions 4.1 and 4.2 it holds that sup
Proof. Write the numerators of the estimators as sums of α-mixing random variables
Cramér's conditions with constant $(Th)^{-1}$ can then be verified by noting that $\sup_k|Z_{t+k\Delta}| = O_P(n^{-1})$. This allows an exponential inequality to be applied as above.
Thus, it remains to show (4.24). First it holds that
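$$E(S_{0,n}(x_j))^2 = \sum_{k=0}^{n-1}E\big[\tilde K_{h,l}(x_j, X^j_{k\Delta})^2 Z_{0,k}^2\big] + 2\sum_{k=0}^{n-2}\sum_{k'=k+1}^{n-1}E\big[\tilde K_{h,l}(x_j, X^j_{k\Delta})Z_{0,k}\,\tilde K_{h,l}(x_j, X^j_{k'\Delta})Z_{0,k'}\big]. \tag{4.25}$$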
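Starting with the first sum and expanding the square gives
$$\begin{aligned}
E\big[\tilde K_{h,l}(x_j, X^j_{k\Delta})^2 Z_{0,k}^2\big] ={}& E\big[\tilde K_{h,l}(x_j, X^j_{k\Delta})^2 (X^1_{(k+1)\Delta} - X^1_{k\Delta})^2 (X^2_{(k+1)\Delta} - X^2_{k\Delta})^2\big]\\
&- 2E\Big[\tilde K_{h,l}(x_j, X^j_{k\Delta})^2 (X^1_{(k+1)\Delta} - X^1_{k\Delta})(X^2_{(k+1)\Delta} - X^2_{k\Delta})\int_{k\Delta}^{(k+1)\Delta}a_{12}(X_s)\,ds\Big]\\
&+ E\Big[\tilde K_{h,l}(x_j, X^j_{k\Delta})^2\Big(\int_{k\Delta}^{(k+1)\Delta}a_{12}(X_s)\,ds\Big)^2\Big]\\
={}& S_1 + S_2 + S_3.
\end{aligned}$$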
These three quantities are investigated separately. First recall that
$$(X^1_{(k+1)\Delta} - X^1_{k\Delta})^2 = \int_{k\Delta}^{(k+1)\Delta}a_{11}(X_s)\,ds + O_P\Big(\frac{(\log n)^{1/2}}{n^{3/2}}\Big)$$
and then an application of the mean value theorem yields
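$$\begin{aligned}
S_1 &= E\Big[\tilde K_{h,l}(x_j, X^j_{k\Delta})^2\int_{k\Delta}^{(k+1)\Delta}a_{11}(X_s)\,ds\int_{k\Delta}^{(k+1)\Delta}a_{22}(X_s)\,ds\Big] + O\Big(\frac{(\log n)^{1/2}}{n^{5/2}h}\Big)\\
&= \frac{\kappa_2(x_j)}{n^2h}E\big(a_{11}(X_s)a_{22}(X_s) \mid X^j_s = x_j\big)(1 + o(1)).
\end{aligned}$$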
Because the drift is bounded, it holds that $S_2 = O(n^{-3}h^{-1})$.
Finally, the last term satisfies $S_3 = \dfrac{\kappa_2(x_j)}{n^2h}E\big((a_{12}(X_s))^2 \mid X^j_s = x_j\big)(1 + o(1))$.
In total, the first term in equation (4.25) satisfies the desired rate. Because of stationarity, the second term is, up to a constant, bounded by
$$n\sum_{k'=1}^{n-1}\Big|E\big[\tilde K_{h,l}(x_j, X^j_0)Z_{0,0}\,\tilde K_{h,l}(x_j, X^j_{k'\Delta})Z_{0,k'}\big]\Big|.$$
This expectation is decomposed into three parts. For the first part,
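$$\begin{aligned}
&n\sum_{k'=1}^{n-1}E\big[\tilde K_{h,l}(x_j, X^j_0)(X^1_\Delta - X^1_0)(X^2_\Delta - X^2_0)\,\tilde K_{h,l}(x_j, X^j_{k'\Delta})(X^1_{(k'+1)\Delta} - X^1_{k'\Delta})(X^2_{(k'+1)\Delta} - X^2_{k'\Delta})\big]\\
&\quad= \frac{1}{n^3}\sum_{k'=1}^{n-1}E\big[\tilde K_{h,l}(x_j, X^j_0)\tilde K_{h,l}(x_j, X^j_{k'\Delta})\,\mu_1(X_0)\mu_2(X_0)\mu_1(X_{k'\Delta})\mu_2(X_{k'\Delta})\big](1 + o(1))\\
&\quad= O(n^{-2}),
\end{aligned}$$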
because the density and the drift are bounded. For the second part, one obtains
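$$\begin{aligned}
&n\sum_{k'=1}^{n-1}E\Big[\tilde K_{h,l}(x_j, X^j_0)(X^1_\Delta - X^1_0)(X^2_\Delta - X^2_0)\,\tilde K_{h,l}(x_j, X^j_{k'\Delta})\int_{k'\Delta}^{(k'+1)\Delta}a_{12}(X_s)\,ds\Big]\\
&\quad= \frac{1}{n^2}\sum_{k'=1}^{n-1}E\big[\tilde K_{h,l}(x_j, X^j_0)\tilde K_{h,l}(x_j, X^j_{k'\Delta})\,\mu_1(X_0)\mu_2(X_0)a_{12}(X_{k'\Delta})\big](1 + o(1))\\
&\quad= O(n^{-1}),
\end{aligned}$$
and finally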
$$n\sum_{k'=1}^{n-1}E\Big[\tilde K_{h,l}(x_j, X^j_0)\int_0^\Delta a_{12}(X_s)\,ds\;\tilde K_{h,l}(x_j, X^j_{k'\Delta})\int_{k'\Delta}^{(k'+1)\Delta}a_{12}(X_s)\,ds\Big] = O(n^{-1}).$$
Hence the covariances are of smaller order, and in total the rate for (4.25) is established, which proves (4.24).
Finally, the asymptotic distribution of the variance parts is derived.
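Lemma 4.7. Under Assumptions 4.1 and 4.2, and if $nTh \to \infty$ and $nh^3 \to \infty$ for $T \to \infty$ and $n \to \infty$, it holds that
$$\sqrt{nTh}\,\hat a^{NW,V}_h(x_j) \xrightarrow{D} N\Big(0, \frac{\kappa_{20}(x_j)}{\kappa_0(x_j)^2}v_{12}(x_j)\Big),$$
$$\sqrt{nTh}\,\hat a^{LL,V}_h(x_j) \xrightarrow{D} N\Big(0, \frac{\kappa_{20}(x_j)}{\kappa_0(x_j)^2}v_{12}(x_j)\Big),$$
where $v_{12}(x_j) = (f(x_j))^{-1}E\big(a_{11}(X)a_{22}(X) + (a_{12}(X))^2 \mid X^j = x_j\big)$.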
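The combination $a_{11}a_{22} + a_{12}^2$ arising from $S_1$ and $S_3$ can be checked in the simplest setting. Assuming zero drift and a constant diffusion matrix $a$ (an illustrative special case), the increment over a step of length $\Delta$ is $N(0, a\Delta)$, and Isserlis' theorem gives $EZ^2 = \Delta^2(a_{11}a_{22} + a_{12}^2)$ for $Z = (X^1_\Delta - X^1_0)(X^2_\Delta - X^2_0) - a_{12}\Delta$. The Monte Carlo sketch below, with hypothetical values for $a$ and $\Delta$, reproduces this Gaussian moment identity; it does not verify the lemma itself.

```python
import numpy as np


def product_increment_moments(a, delta, size=1_000_000, seed=0):
    """Monte Carlo mean and second moment of Z = dX1 * dX2 - a12 * delta.

    Assumes zero drift and a constant diffusion matrix a, so that the
    increment (dX1, dX2) over a step of length delta is N(0, a * delta).
    """
    rng = np.random.default_rng(seed)
    incr = rng.multivariate_normal(np.zeros(2), a * delta, size=size)
    z = incr[:, 0] * incr[:, 1] - a[0, 1] * delta
    return z.mean(), np.mean(z**2)


a = np.array([[1.0, 0.5], [0.5, 2.0]])  # illustrative diffusion matrix
delta = 0.01                             # illustrative step length 1/n
mean_z, second_z = product_increment_moments(a, delta)
gaussian_value = delta**2 * (a[0, 0] * a[1, 1] + a[0, 1] ** 2)
print(mean_z, second_z / gaussian_value)  # approximately 0 and 1
```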