Preliminary Lemmata - Model Choice in Structured Nonparametric Regression and Diffusion Models

To establish the asymptotic properties of the smooth backfitting estimators, the one-dimensional marginal estimators have to be investigated, since the smooth backfitting estimators inherit their behavior. In the proofs the dimension index of the estimators will be suppressed, i.e. $\hat\mu^{NW}_h(x^j) = \hat\mu^{1,j,NW}_h(x^j)$, $\tilde\mu^{NW}_h(x^j) = \tilde\mu^{1,j,NW}_h(x^j)$, and analogously for the local linear estimators. Using the integral form of the stochastic differential equation for $X^1$, the local constant and the local linear estimators will be decomposed into a bias and a variance part. For this, define for $l = 0, 1$

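$$\hat t^B_{h,l}(x^j) = \frac{1}{T}\sum_{k=1}^{nT} K_h\big(x^j, X^j_{(k-1)\Delta}\big)\Big(\frac{X^j_{(k-1)\Delta} - x^j}{h}\Big)^l \int_{(k-1)\Delta}^{k\Delta} \mu^1(X_s)\,ds,$$

$$\hat t^V_{h,l}(x^j) = \frac{1}{T}\sum_{k=1}^{nT} K_h\big(x^j, X^j_{(k-1)\Delta}\big)\Big(\frac{X^j_{(k-1)\Delta} - x^j}{h}\Big)^l \sum_{i=1}^{d}\int_{(k-1)\Delta}^{k\Delta} \sigma^{1i}(X_s)\,dW^i_s.$$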
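To make the split concrete, the following sketch simulates a path with an Euler scheme, in which each increment is exactly a drift contribution plus a Gaussian martingale contribution, and evaluates sample analogues of $\hat t^B_{h,l}$ and $\hat t^V_{h,l}$. The Gaussian kernel, the Euler approximation of the two integrals, the drift $\mu(x) = -x$, the constant diffusion matrix, and the helper names (`K_tilde`, `simulate_euler`, `t_parts`) are illustrative assumptions, not the construction used in the text; the point is only that the two parts add up to the kernel-weighted sum of observed increments.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard normal density, used here as an interior kernel K."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def K_tilde(x, X, h, l):
    """Hypothetical helper for K~_{h,l}(x, X) = K_h(x, X) * ((X - x) / h)**l."""
    u = (X - x) / h
    return gaussian_kernel(u) / h * u**l

def simulate_euler(mu, sigma, x0, n, T, rng):
    """Euler scheme for dX = mu(X) dt + sigma(X) dW with step Delta = 1/n on [0, T].
    Returns the path and, per step, the drift part and the martingale part."""
    delta = 1.0 / n
    steps = n * T
    d = len(x0)
    X = np.empty((steps + 1, d))
    X[0] = x0
    drift = np.empty((steps, d))
    noise = np.empty((steps, d))
    for k in range(steps):
        dW = rng.normal(scale=np.sqrt(delta), size=d)
        drift[k] = mu(X[k]) * delta   # Euler approximation of int mu(X_s) ds
        noise[k] = sigma(X[k]) @ dW   # Euler approximation of sum_i int sigma^{.i}(X_s) dW^i_s
        X[k + 1] = X[k] + drift[k] + noise[k]
    return X, drift, noise

def t_parts(x, X, drift, noise, h, l, T, j):
    """Sample analogues of t^B_{h,l}(x) and t^V_{h,l}(x) for the first drift component."""
    w = K_tilde(x, X[:-1, j], h, l)   # weights at the left endpoints X^j_{(k-1)Delta}
    return np.sum(w * drift[:, 0]) / T, np.sum(w * noise[:, 0]) / T

# Usage: the bias and variance parts add up to the kernel-weighted increment sum.
rng = np.random.default_rng(0)
A = np.array([[1.0, 0.0], [0.5, 1.0]])
X, drift, noise = simulate_euler(lambda x: -x, lambda x: A, np.zeros(2), n=50, T=20, rng=rng)
tB, tV = t_parts(0.0, X, drift, noise, h=0.3, l=0, T=20, j=1)
```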

For abbreviation,

$$\tilde K_{h,l}\big(x^j, X^j_{(k-1)\Delta}\big) = K_h\big(x^j, X^j_{(k-1)\Delta}\big)\Big(\frac{X^j_{(k-1)\Delta} - x^j}{h}\Big)^l$$

is defined, and for a more compact notation $\hat f_{h,0}(x^j) = \hat f_h(x^j)$ and $\hat f_{h,(0,0)}(x^i, x^j) = \hat f_h(x^i, x^j)$ are introduced. This allows the marginal Nadaraya-Watson estimator to be written as

$$\hat\mu^{NW}_h(x^j) = \hat f_{h,0}(x^j)^{-1}\hat t^B_{h,0}(x^j) + \hat f_{h,0}(x^j)^{-1}\hat t^V_{h,0}(x^j) = \hat\mu^{NW,B}_h(x^j) + \hat\mu^{NW,V}_h(x^j) \tag{4.21}$$

Appendix I: Preliminary Lemmata 109

and the marginal local linear estimator $\hat\mu^{LL}_h(x^j)$ is decomposed analogously into $\hat\mu^{LL,B}_h(x^j) + \hat\mu^{LL,V}_h(x^j)$.
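A minimal sketch of the marginal Nadaraya-Watson estimator and of a local linear counterpart, computed from discretely observed data. It assumes, for illustration only, that $\hat f_{h,k}(x^j) = (nT)^{-1}\sum_k \tilde K_{h,k}(x^j, X^j_{(k-1)\Delta})$ and $\hat t_{h,l}(x^j) = T^{-1}\sum_k \tilde K_{h,l}(x^j, X^j_{(k-1)\Delta})(X^1_{k\Delta} - X^1_{(k-1)\Delta})$, uses a Gaussian kernel, and writes the local linear estimator in the standard weighted-least-squares form $(\hat f_{h,2}\hat t_{h,0} - \hat f_{h,1}\hat t_{h,1})/(\hat f_{h,0}\hat f_{h,2} - \hat f_{h,1}^2)$. The function name `marginal_drift_estimates` is hypothetical.

```python
import numpy as np

def gaussian_kernel(u):
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def marginal_drift_estimates(x, Xj, dX1, h, n, T):
    """Hypothetical sketch of marginal NW and local linear drift estimates at x.

    Xj  : observations X^j_{(k-1)Delta}, k = 1, ..., nT (left endpoints)
    dX1 : increments X^1_{k Delta} - X^1_{(k-1)Delta}
    """
    u = (Xj - x) / h
    Kh = gaussian_kernel(u) / h
    f = [np.sum(Kh * u**k) / (n * T) for k in range(3)]   # assumed f_hat_{h,0}, f_hat_{h,1}, f_hat_{h,2}
    t = [np.sum(Kh * u**l * dX1) / T for l in range(2)]    # assumed t_hat_{h,0}, t_hat_{h,1}
    nw = t[0] / f[0]
    ll = (f[2] * t[0] - f[1] * t[1]) / (f[0] * f[2] - f[1] ** 2)
    return nw, ll

# Usage: a noiseless drift that is linear in X^j, observed with step Delta = 1/n.
rng = np.random.default_rng(1)
n, T = 10, 100
Xj = rng.normal(size=n * T)
dX1 = (1.0 + 2.0 * Xj) / n
nw, ll = marginal_drift_estimates(0.3, Xj, dX1, h=0.4, n=n, T=T)
```

With a drift that is exactly linear in $X^j$ and no noise, the local linear estimate reproduces the drift value $1 + 2 \cdot 0.3 = 1.6$ at $x^j = 0.3$, whereas the Nadaraya-Watson estimate is exact for a constant drift.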

The uniform convergence rates of both parts and the asymptotic distribution of the variance part serve as building blocks for the asymptotic distribution of the smooth backfitting estimator. Before establishing these, the one- and two-dimensional marginal density estimates will be investigated, because they also arise in the algorithms of the smooth backfitting estimators.

Lemma 4.1. Under Assumptions 4.1 and 4.2 it holds that sup

The result is stated here for arbitrary $n$. In general, there are cases in which the superoptimal rate $T^{-1/2}$ can be obtained for $n \to \infty$ (see, e.g., Bosq, 1998). For multivariate diffusion processes, however (in contrast to the scalar case), these conditions are not satisfied, and no rate faster than the standard nonparametric rate can be achieved.

Proof. Consider the one-dimensional case first and decompose the estimator into a variance and a bias part:

$$\hat f_{h,k}(x^j) - \kappa_k(x^j) f(x^j) = \hat f_{h,k}(x^j) - E\hat f_{h,k}(x^j) + E\hat f_{h,k}(x^j) - \kappa_k(x^j) f(x^j).$$

Standard kernel calculations show that the bias is of order h² uniformly over the interior of G^j.
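The $h^2$ order can be checked in a case where everything is explicit. Assuming a Gaussian kernel and a standard normal marginal density $f$ (illustrative choices, not the setting of the text), stationarity and the convolution of two Gaussians give $E\hat f_h(x) = $ the $N(0, 1 + h^2)$ density at $x$, so the bias divided by $h^2$ should approach $f''(x)/2$ at an interior point.

```python
import numpy as np

def normal_pdf(x, var=1.0):
    return np.exp(-0.5 * x**2 / var) / np.sqrt(2.0 * np.pi * var)

def kde_bias(x, h):
    """E f_hat_h(x) - f(x): Gaussian kernel convolved with N(0, 1) gives N(0, 1 + h^2)."""
    return normal_pdf(x, 1.0 + h**2) - normal_pdf(x)

def leading_bias_term(x):
    """f''(x) / 2 for f = N(0, 1), using f''(x) = (x^2 - 1) f(x) and unit kernel variance."""
    return 0.5 * (x**2 - 1.0) * normal_pdf(x)

# Usage: bias / h^2 approaches f''(x)/2 as h decreases.
ratios = [kde_bias(0.0, h) / h**2 for h in (0.2, 0.1, 0.01)]
```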

110 4. ESTIMATING ADDITIVE DIFFUSIONS

By the Lipschitz continuity of the kernel it holds for the first maximum that

l=1,...,Nmax sup
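The role of the Lipschitz continuity in the covering argument can be illustrated numerically. In this hypothetical sketch, a kernel average with a Gaussian kernel is Lipschitz with constant $\max_u |u\varphi(u)|/h^2 = \varphi(1)/h^2$, so its supremum over an interval is bounded by its maximum over $N$ equally spaced grid points plus the Lipschitz constant times half the grid spacing; an exponential inequality then only has to be applied at finitely many points.

```python
import numpy as np

def kernel_average(x, data, h):
    """g(x) = (1/N) sum_i K_h(x - X_i) with a Gaussian kernel, evaluated at each x."""
    u = (x[:, None] - data[None, :]) / h
    return np.exp(-0.5 * u**2).mean(axis=1) / (np.sqrt(2.0 * np.pi) * h)

def covering_bound(data, h, a, b, N):
    """Upper bound for sup_{x in [a, b]} |g(x)| from N grid points plus a Lipschitz term."""
    grid = np.linspace(a, b, N)
    spacing = (b - a) / (N - 1)
    lipschitz = np.exp(-0.5) / (np.sqrt(2.0 * np.pi) * h**2)  # phi(1) / h^2 = max_u |K_h'(u)|
    return np.max(np.abs(kernel_average(grid, data, h))) + lipschitz * spacing / 2

# Usage: the supremum on a much finer grid stays below the covering bound.
rng = np.random.default_rng(2)
data = rng.normal(size=500)
fine = np.linspace(-1.0, 1.0, 4001)
sup_fine = np.max(np.abs(kernel_average(fine, data, 0.2)))
```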

Cramér's conditions are easily verified with a constant $(Th)^{-1}$.

With these results a Hoeffding-type inequality (see Theorem 1.3 in Bosq, 1998) can be applied to obtain

The two-dimensional case can be decomposed into bias and variance as above and treated in the same way. The only difference is that the variance of the kernel is then of order $h^{-2}$.
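The difference between the orders $h^{-1}$ and $h^{-2}$ comes from the squared-kernel integral that enters the variance of the kernel terms. For a Gaussian kernel (an assumption made for this sketch), $\int K_h(u)^2\,du = 1/(2\sqrt{\pi}h)$ in one dimension, and for a product kernel in two dimensions the double integral factorizes into $1/(4\pi h^2)$; the sketch checks both numerically.

```python
import numpy as np

def squared_kernel_integral_1d(h, half_width=10.0, m=200_001):
    """Riemann-sum approximation of int K_h(u)^2 du for K_h(u) = phi(u / h) / h."""
    u = np.linspace(-half_width * h, half_width * h, m)
    Kh = np.exp(-0.5 * (u / h) ** 2) / (np.sqrt(2.0 * np.pi) * h)
    return np.sum(Kh**2) * (u[1] - u[0])

def squared_kernel_integral_2d(h):
    """Product kernel K_h(u) K_h(v): the double integral factorizes."""
    return squared_kernel_integral_1d(h) ** 2

# Usage: halving h doubles the 1-d constant and quadruples the 2-d constant.
r1 = squared_kernel_integral_1d(0.05) / squared_kernel_integral_1d(0.1)
r2 = squared_kernel_integral_2d(0.05) / squared_kernel_integral_2d(0.1)
```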

Next, the uniform convergence of the estimators is investigated, beginning with the bias parts.


Lemma 4.2. Under Assumptions 4.1 and 4.2 it holds that sup

For the proof, uniform bounds for $\hat t^B_{h,l}(x^j)$, or more precisely for a centered version of it, have to be established

where $\hat\mu^B_h(x^j)$ is the estimator under investigation. To apply a Hoeffding-type inequality (as in the proof of Lemma 4.1), treat the two terms separately and regard them as sums of $T$ $\alpha$-mixing random variables $S^A_{t,n}(x^j)$ and $S^B_{t,n}(x^j)$, $t = 0, \dots, T-1$. It will be shown next that $E(S^A_{t,n}(x^j))^2 = O(T^{-2}n^{-1}h^{-1})$ and $E(S^B_{t,n}(x^j))^2 = O(T^{-2}n^{-1}h^{-1})$.
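The blocking step can be sketched as follows, assuming (consistently with the sum over $k = 0, \dots, n-1$ for $S_{0,n}$ in (4.25)) that $S_{t,n}$ collects the $n$ observations of the unit time interval $[t, t+1)$. The helper `block_sums` is hypothetical; it only regroups the $nT$ summands into $T$ block sums, which leaves the total unchanged, and the blocks inherit the $\alpha$-mixing property from the process.

```python
import numpy as np

def block_sums(terms, n, T):
    """Hypothetical helper: regroup nT summands (one per observation, in time order)
    into T block sums; block t collects the n observations of [t, t + 1)."""
    terms = np.asarray(terms, dtype=float)
    if terms.shape != (n * T,):
        raise ValueError("expected exactly n * T summands")
    return terms.reshape(T, n).sum(axis=1)

# Usage: 12 summands, n = 3 observations per unit interval, T = 4 intervals.
terms = np.arange(12.0)
S = block_sums(terms, n=3, T=4)
```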

Apply the mean value theorem (setting $t = 0$ without loss of generality) to obtain $E(S^A_{0,n}(x^j))^2$


The last line follows by iterated expectations and an application of the Burkholder-Davis-Gundy inequality. The covariance terms can be bounded by the Cauchy-Schwarz inequality, and the verification of Cramér's conditions goes along the same lines.

This yields the stated rate for $E(S^A_{0,n}(x^j))^2$.
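For $p = 2$, the Burkholder-Davis-Gundy bound for a stochastic integral at a fixed time reduces to the Itô isometry $E\big(\int_0^t \sigma_s\,dW_s\big)^2 = E\int_0^t \sigma_s^2\,ds$, which holds with equality. The Monte Carlo sketch below checks this for the illustrative integrand $\sigma_s = W_s$ on $[0, 1]$, using left-point Riemann sums on a grid of $m$ steps; for this discretization the exact value of both sides is $(1 - 1/m)/2$. The integrand, grid, and sample size are assumptions of the sketch.

```python
import numpy as np

def ito_isometry_check(m=50, paths=50_000, seed=3):
    """Monte Carlo estimates of E(int_0^1 W dW)^2 and E int_0^1 W^2 ds (left-point sums)."""
    rng = np.random.default_rng(seed)
    dt = 1.0 / m
    dW = rng.normal(scale=np.sqrt(dt), size=(paths, m))
    # W at the left endpoints t_0 = 0, t_1, ..., t_{m-1}
    W_left = np.concatenate([np.zeros((paths, 1)), np.cumsum(dW, axis=1)[:, :-1]], axis=1)
    stoch_int = np.sum(W_left * dW, axis=1)          # sum_k W_{t_k} (W_{t_{k+1}} - W_{t_k})
    lhs = np.mean(stoch_int**2)                       # E (int W dW)^2
    rhs = np.mean(np.sum(W_left**2, axis=1) * dt)     # E int W^2 ds
    return lhs, rhs

lhs, rhs = ito_isometry_check()
exact = (1.0 - 1.0 / 50) / 2.0   # 0.49 for m = 50
```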

Because $\mu^1$ and the expected value of the estimator are bounded, the differences can be taken out of the expectations. The rates then follow from standard kernel calculations. This completes the proof of the uniform bound for the centered version $\hat t^{B*}_{h,l}(x^j)$.

To show the lemma, consider the following decomposition:

$$\hat\mu^{NW,B}_h(x^j) - m^1(x^j) = \hat f_{h,0}(x^j)^{-1}\hat t^{B*}_{h,0}(x^j) + E\big(\hat\mu^{1,B}_h(x^j)\big) - m^1(x^j).$$

For the investigation of the first term it suffices to concentrate on the numerator, because the density is bounded from below on G^j. Then the statement follows.

In the local linear case, the analogous expansion is used and the same arguments apply.


Lemma 4.3. Under Assumptions 4.1 and 4.2 it holds that sup

Proof. Because the density is bounded from below, it suffices to consider the numerator parts of the estimators. Defining random variables

Ri,l(x^j) = 1

Using Itô's lemma, Cramér's conditions can be verified with a constant $(Th)^{-1}$. Then, as in the proofs of Lemmata 4.1 and 4.2 (using an exponential inequality and covering arguments), it follows that

$$\sup_{x^j \in G^j} \big|\hat t^V_{h,l}(x^j)\big| = O_P\Big(\frac{\log T}{(hT)^{1/2}}\Big)$$

and both parts of the lemma follow.

Besides the uniform convergence results, the asymptotic distribution of the variance parts of the two estimators has to be derived. This is given in the following lemma.

Lemma 4.4. Under Assumptions 4.1 and 4.2, and if $Th \to \infty$ and $nh^3 \to \infty$ for


The joint distribution of the vector of the $\hat\mu^{V,i,NW}_h(x^j)$ is a multivariate ($d^2$-dimensional) normal distribution with covariances given by

$$\operatorname{cov}\Big(\sqrt{Th}\,\hat\mu^{i,V}_h(x^j), \sqrt{Th}\,\hat\mu^{i',V}_h(x^j)\Big) = \frac{\kappa^2(x^j)}{\kappa_0(x^j) f(x^j)}\,E\big(a^{ii'}(X) \mid X^j = x^j\big)$$

and zero otherwise. The same holds in the local linear case.

Proof. To derive asymptotic normality, the distribution of $\hat t^V_{h,l}$ has to be considered. To do so, decompose it into a discretization error and a stochastic error:

bt^V_h,l(x^j) =1

To bound the discretization error, calculate E(J_D,l(x^j))² = nT⁻¹E

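where it has been used that all covariances vanish. The last bound follows from the Burkholder-Davis-Gundy inequality. In total, this yields

$$J_{D,l}(x^j) = O_P\big(T^{-1/2}n^{-1/2}h^{-2}\big) = o_P\big(T^{-1/2}h^{-1/2}\big),$$

since the ratio of the two rates is $n^{-1/2}h^{-3/2} = (nh^3)^{-1/2} \to 0$ by the assumption $nh^3 \to \infty$.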

Next, the asymptotic distribution of $\sqrt{hT}\,J_{T,l}(x^j)$ is derived. Note that for every $T$


with probability one. Moreover, for $T \to \infty$ it holds that $hT\,\langle J_{T,l}(x^j), J_{T,l}(x^j)\rangle = hT^{-1}$

Applying Proposition 1.21 in Kutoyants (2004), the following asymptotic distribution is obtained

Recalling the definitions of the two estimators and using the convergence results of Lemma 4.1, the statement of the lemma follows for both estimators.

Next, some preliminary results for estimating the diffusion matrix are presented.

Again, the dimension index of the estimators will be omitted. To decompose the marginal kernel estimators, recall that by Itô's lemma

$$\big(X^1_{k\Delta} - X^1_{(k-1)\Delta}\big)\big(X^2_{k\Delta} - X^2_{(k-1)\Delta}\big) =$$
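By Itô's product rule, the product of the two increments equals $\int a^{12}(X_s)\,ds$ over the step plus stochastic integrals with respect to $dX^1$ and $dX^2$. In the simplest case of constant drift $\mu$ and constant diffusion matrix $a = \sigma\sigma^\top$ (illustrative values, not taken from the text), the increments are exactly Gaussian with mean $\mu\Delta$ and covariance $a\Delta$, so the expected product is $a^{12}\Delta + \mu^1\mu^2\Delta^2$. A minimal Monte Carlo sketch:

```python
import numpy as np

def product_of_increments(mu, sigma, delta, size, seed=4):
    """Products of one-step increments for constant coefficients (exact Gaussian increments)."""
    rng = np.random.default_rng(seed)
    a = sigma @ sigma.T   # diffusion matrix a = sigma sigma^T
    incr = rng.multivariate_normal(mean=mu * delta, cov=a * delta, size=size)
    return incr[:, 0] * incr[:, 1], a

mu = np.array([1.0, -0.5])
sigma = np.array([[1.0, 0.0], [0.6, 0.8]])
delta = 0.01
prod, a = product_of_increments(mu, sigma, delta, 400_000)
expected = a[0, 1] * delta + mu[0] * mu[1] * delta**2   # a^12 Delta + mu^1 mu^2 Delta^2
```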

Next, convergence results for these estimators will be derived.


Lemma 4.5. Under Assumptions 4.1 and 4.2 it holds that sup same. The proof of this lemma is thus analogous to the proof of Lemma 4.2 and is omitted.

Lemma 4.6. Under Assumptions 4.1 and 4.2 it holds that sup

Proof. Write the numerator parts of the estimators as sums of $\alpha$-mixing random variables


Cramér's conditions with a constant $(Th)^{-1}$ can then be verified by noting that $\sup_k |Z_{t+k\Delta}| = O_P(n^{-1})$. This allows an exponential inequality to be applied as above.

Thus, it remains to show (4.24). First it holds that

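$$E\big(S_{0,n}(x^j)\big)^2 = \sum_{k=0}^{n-1} E\,\tilde K_{h,l}(x^j, X^j_{k\Delta})^2 Z_{0,k}^2 + 2\sum_{k=0}^{n-2}\sum_{k'=k+1}^{n-1} E\,\tilde K_{h,l}(x^j, X^j_{k\Delta}) Z_{0,k}\,\tilde K_{h,l}(x^j, X^j_{k'\Delta}) Z_{0,k'}. \tag{4.25}$$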

Start with the first sum and expand the square to obtain

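$$\begin{aligned}
E\,\tilde K_{h,l}(x^j, X^j_{k\Delta})^2 Z_{0,k}^2 &= E\,\tilde K_{h,l}(x^j, X^j_{k\Delta})^2 \big(X^1_{(k+1)\Delta} - X^1_{k\Delta}\big)^2 \big(X^2_{(k+1)\Delta} - X^2_{k\Delta}\big)^2 \\
&\quad - 2\,E\,\tilde K_{h,l}(x^j, X^j_{k\Delta})^2 \big(X^1_{(k+1)\Delta} - X^1_{k\Delta}\big)\big(X^2_{(k+1)\Delta} - X^2_{k\Delta}\big) \int_{k\Delta}^{(k+1)\Delta} a^{12}(X_s)\,ds \\
&\quad + E\,\tilde K_{h,l}(x^j, X^j_{k\Delta})^2 \Big(\int_{k\Delta}^{(k+1)\Delta} a^{12}(X_s)\,ds\Big)^2 \\
&= S_1 + S_2 + S_3.
\end{aligned}$$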

These three quantities are investigated separately. First recall that

$$\big(X^1_{(k+1)\Delta} - X^1_{k\Delta}\big)^2 = \int_{k\Delta}^{(k+1)\Delta} a^{11}(X_s)\,ds + O_P\Big(\frac{(\log n)^{1/2}}{n^{3/2}}\Big)$$

and then an application of the mean value theorem yields

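$$\begin{aligned}
S_1 &= E\,\tilde K_{h,l}(x^j, X^j_{k\Delta})^2 \int_{k\Delta}^{(k+1)\Delta} a^{11}(X_s)\,ds \int_{k\Delta}^{(k+1)\Delta} a^{22}(X_s)\,ds + O\Big(\frac{(\log n)^{1/2}}{n^{5/2}h}\Big) \\
&= \frac{\kappa^2(x^j)}{n^2 h}\,E\big(a^{11}(X_s)a^{22}(X_s) \mid X^j_s = x^j\big)(1 + o(1)).
\end{aligned}$$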

Because the drift is bounded, it holds that S₂ = O(n⁻³h⁻¹).

Finally, the last term satisfies

$$S_3 = \frac{\kappa^2(x^j)}{n^2 h}\,E\big((a^{12}(X_s))^2 \mid X^j_s = x^j\big)(1 + o(1)).$$

In total, the first term in equation (4.25) satisfies the desired rate. Because of stationarity, the second term is bounded by

$$n\sum_{k'=1}^{n-1} E\,\tilde K_{h,l}(x^j, X^j_0) Z_{0,0}\,\tilde K_{h,l}(x^j, X^j_{k'\Delta}) Z_{0,k'}.$$

This will be decomposed into three parts:

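$$\begin{aligned}
& n\sum_{k'=1}^{n-1} E\,\tilde K_{h,l}(x^j, X^j_0)\big(X^1_\Delta - X^1_0\big)\big(X^2_\Delta - X^2_0\big) \\
&\qquad \times \tilde K_{h,l}(x^j, X^j_{k'\Delta})\big(X^1_{(k'+1)\Delta} - X^1_{k'\Delta}\big)\big(X^2_{(k'+1)\Delta} - X^2_{k'\Delta}\big) \\
&= \frac{1}{n^3}\sum_{k'=1}^{n-1} E\,\tilde K_{h,l}(x^j, X^j_0)\,\tilde K_{h,l}(x^j, X^j_{k'\Delta})\,\mu^1(X_0)\mu^2(X_0)\mu^1(X_{k'\Delta})\mu^2(X_{k'\Delta})(1 + o(1)) \\
&= O(n^{-2}),
\end{aligned}$$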

because the density and the drift are bounded. For the second part, we get

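$$\begin{aligned}
& n\sum_{k'=1}^{n-1} E\,\tilde K_{h,l}(x^j, X^j_0)\big(X^1_\Delta - X^1_0\big)\big(X^2_\Delta - X^2_0\big)\,\tilde K_{h,l}(x^j, X^j_{k'\Delta}) \int_{k'\Delta}^{(k'+1)\Delta} a^{12}(X_s)\,ds \\
&= \frac{1}{n^2}\sum_{k'=1}^{n-1} E\,\tilde K_{h,l}(x^j, X^j_0)\,\tilde K_{h,l}(x^j, X^j_{k'\Delta})\,\mu^1(X_0)\mu^2(X_0)\,a^{12}(X_{k'\Delta})(1 + o(1)) \\
&= O(n^{-1}),
\end{aligned}$$

and finally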

$$n\sum_{k'=1}^{n-1} E\,\tilde K_{h,l}(x^j, X^j_0)\int_0^\Delta a^{12}(X_s)\,ds\;\tilde K_{h,l}(x^j, X^j_{k'\Delta})\int_{k'\Delta}^{(k'+1)\Delta} a^{12}(X_s)\,ds = O(n^{-1}).$$

Thus the covariance terms are of smaller order, and in total equation (4.25) is established.

Finally, the asymptotic distribution of the variance parts is derived.

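Lemma 4.7. Under Assumptions 4.1 and 4.2, and if $nTh \to \infty$ and $nh^3 \to \infty$ for $T \to \infty$ and $n \to \infty$, it holds that

$$\sqrt{nTh}\,\hat a^{NW,V}_h(x^j) \xrightarrow{D} N\Big(0, \frac{\kappa^2_0(x^j)}{\kappa_0(x^j)^2}\,v^{12}(x^j)\Big),$$

$$\sqrt{nTh}\,\hat a^{LL,V}_h(x^j) \xrightarrow{D} N\Big(0, \frac{\kappa^2_0(x^j)}{\kappa_0(x^j)^2}\,v^{12}(x^j)\Big),$$

where $v^{12}(x^j) = (f(x^j))^{-1}E\big(a^{11}(X)a^{22}(X) + (a^{12}(X))^2 \mid X^j = x^j\big)$.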
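The variance in Lemma 4.7 is driven by the second moment of $Z = \big(X^1_{(k+1)\Delta} - X^1_{k\Delta}\big)\big(X^2_{(k+1)\Delta} - X^2_{k\Delta}\big) - \int_{k\Delta}^{(k+1)\Delta} a^{12}(X_s)\,ds$ from the proof of Lemma 4.6. For zero drift and a constant diffusion matrix (assumptions of this sketch), the increments are Gaussian, and Isserlis' theorem gives exactly $EZ^2 = \big(a^{11}a^{22} + (a^{12})^2\big)\Delta^2$, i.e. the $a^{11}a^{22}$ and $(a^{12})^2$ terms that appear in $S_1$ and $S_3$. A Monte Carlo check:

```python
import numpy as np

def second_moment_of_Z(a, delta, size, seed=5):
    """Monte Carlo estimate of E Z^2 / Delta^2 for zero drift and constant diffusion matrix a."""
    rng = np.random.default_rng(seed)
    incr = rng.multivariate_normal(mean=np.zeros(2), cov=a * delta, size=size)
    Z = incr[:, 0] * incr[:, 1] - a[0, 1] * delta   # product of increments minus int a^12 ds
    return np.mean(Z**2) / delta**2                  # normalized by Delta^2

a = np.array([[1.0, 0.6], [0.6, 1.0]])
estimate = second_moment_of_Z(a, delta=0.01, size=400_000)
theory = a[0, 0] * a[1, 1] + a[0, 1] ** 2   # 1.36
```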

In document Model Choice in Structured Nonparametric Regression and Diffusion Models (Page 114-125)