Bandwidth selection - Kernel based nonparametric coefficient estimation in diffusion models

A key practical question is how to choose a suitable bandwidth h in our model. A vast literature deals exclusively with this topic for nonparametric procedures such as density or regression estimation. We restrict ourselves to three methods, which are introduced in this section.

In general, the practitioner has n observations sampled at a given frequency ∆. Both parameters are thus determined by the available data, and a third parameter, h, has to be chosen by a procedure based on this sample. Which procedure is optimal is hard to answer. Some selection methods are computationally demanding, whereas others depend on unknown quantities, which in turn have to be estimated. An overview of this problem in the context of nonparametric density estimation can be found in Jones et al. (1996). For kernel based regression estimation, we refer to Vieu (1993).

First of all, we recall Assumption A3, ii), where we assumed that

$$n\Delta h^5 = o(1), \quad \text{as } n \to \infty.$$

This assumption guarantees that the numerator of the bias term fulfills

$$\frac{1}{Th}\sum_{i=0}^{n-1} K\!\left(\frac{X_{i\Delta}-x}{h}\right)\int_{i\Delta}^{(i+1)\Delta}\bigl(b(X_s)-b(x)\bigr)\,ds = o_P\bigl((n\Delta h)^{-1/2}\bigr), \quad \text{as } n \to \infty.$$

By choosing $h = T^{-1/5} = (n\Delta)^{-1/5}$, this term is no longer negligible and constitutes the bias. Hence, according to Theorem 2.7, the asymptotic mean squared error (AMSE) of $\hat b(x)$ is of the following form:

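$$\mathrm{AMSE}(\hat b(x)) = \mathrm{ABIAS}(\hat b(x))^2 + \mathrm{AVAR}(\hat b(x)) = h^4\mu_2^2(K)\Lambda^2(x) + \frac{\int_{\mathbb{R}} K^2(z)\,dz\;\tilde\sigma^2(x)}{n\Delta h\,\pi(x)},$$

where

$$\Lambda(x) := \frac{b'(x)\pi'(x)}{\pi(x)} + \frac{b''(x)}{2}$$

denotes one part of the bias term and

$$\mu_2(K) = \int_{\mathbb{R}} z^2 K(z)\,dz$$

denotes the second moment of K.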
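The two kernel constants entering the AMSE, $\mu_2(K)$ and $\int_{\mathbb{R}} K^2(z)\,dz$, are easy to evaluate numerically. The following is a minimal sketch, assuming a Gaussian and an Epanechnikov kernel as examples (the text does not fix a particular K); the function names and the truncation of the Gaussian to [-10, 10] are illustrative choices.

```python
import numpy as np

def kernel_constants(kernel, lo, hi, num=200_001):
    """Approximate mu_2(K) = int z^2 K(z) dz and R(K) = int K(z)^2 dz on [lo, hi]."""
    z = np.linspace(lo, hi, num)
    k = kernel(z)
    dz = z[1] - z[0]
    mu2 = float(np.sum(z**2 * k) * dz)  # Riemann sum; integrand vanishes at the ends
    rk = float(np.sum(k**2) * dz)
    return mu2, rk

def gaussian(z):
    return np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)

def epanechnikov(z):
    return 0.75 * np.clip(1.0 - z**2, 0.0, None)

# Gaussian kernel: integration range truncated to [-10, 10] (assumption)
mu2_gauss, rk_gauss = kernel_constants(gaussian, -10.0, 10.0)
# Epanechnikov kernel: compact support [-1, 1]
mu2_epa, rk_epa = kernel_constants(epanechnikov, -1.0, 1.0)
```

The numerical values can be compared with the closed forms $\mu_2 = 1$, $\int K^2 = 1/(2\sqrt{\pi})$ for the Gaussian kernel and $\mu_2 = 1/5$, $\int K^2 = 3/5$ for the Epanechnikov kernel.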

We recognize the well-known trade-off between these two parts. In order to minimize the AMSE, we differentiate the sum with respect to h and set the derivative to zero. The resulting bandwidth is called the “oracle bandwidth”; in our case this optimal bandwidth $h_{\mathrm{opt,oracle}}(x)$ has the form

$$h_{\mathrm{opt,oracle}}(x) = (n\Delta)^{-1/5}\left(\frac{\tilde\sigma^2(x)\int_{\mathbb{R}} K^2(z)\,dz}{4\mu_2^2(K)\Lambda^2(x)\pi(x)}\right)^{1/5}.$$

To ensure that this bandwidth fulfills A3, ii), ∆ and T have to fulfill

$$T^{8/5}\Delta \to 0.$$

Using this bandwidth, the optimal AMSE is of order

$$\mathrm{AMSE}(\hat b(x)) = O\bigl((n\Delta)^{-4/5}\bigr).$$
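The rate can be checked from the first-order condition. As a brief sketch, write the AMSE as $C_1 h^4 + C_2/(n\Delta h)$, where $C_1 > 0$ and $C_2 > 0$ are shorthand introduced here (not notation from the text) for the h-free factors of the squared bias and of the variance:

```latex
\begin{align*}
\frac{\partial}{\partial h}\Bigl(C_1 h^4 + \frac{C_2}{n\Delta h}\Bigr)
  &= 4C_1 h^3 - \frac{C_2}{n\Delta h^2} = 0
  \quad\Longleftrightarrow\quad
  h^5 = \frac{C_2}{4C_1\, n\Delta}, \\
C_1 h^4 + \frac{C_2}{n\Delta h}
  &= h^4\Bigl(C_1 + \frac{C_2}{n\Delta h^5}\Bigr)
  = 5C_1 h^4
  = 5\cdot 4^{-4/5}\, C_1^{1/5} C_2^{4/5}\,(n\Delta)^{-4/5}.
\end{align*}
```

The second derivative $12C_1h^2 + 2C_2/(n\Delta h^3)$ is positive for h > 0, so the stationary point is indeed a minimum.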

Assuming higher order smoothness properties of b and π, this rate can be improved. Obviously, $h_{\mathrm{opt,oracle}}(x)$ depends on the unknown quantities $\Lambda(x)$, $\tilde\sigma^2(x)$ and $\pi(x)$, all of which have to be estimated. This task is quite challenging in practice, because it is often unclear how to build an appropriate pilot (first-stage) estimator. One possibility is to use kernel based estimators again, with bandwidths chosen by a rule of thumb, for example $h_{\mathrm{ROT}} \equiv (n\Delta)^{-1/5}$. Moreover, recall that $h_{\mathrm{opt,oracle}}(x)$ is a local plug-in choice for h; hence, for every x at which b is to be estimated, a new bandwidth has to be computed. Thus, this method is computationally demanding, although it provides a natural choice of the bandwidth in terms of minimizing the (asymptotic) mean squared error.
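To illustrate the local plug-in idea, the sketch below evaluates the AMSE-minimizing bandwidth at a single point x and compares it with a brute-force grid minimization of the AMSE. All numerical inputs (`lam`, `sigma2`, `pi_x`, `n`, `delta`) are hypothetical stand-ins for pilot estimates, and a Gaussian kernel is assumed for the kernel constants.

```python
import numpy as np

def amse(h, n, delta, lam, sigma2, pi_x, mu2, rk):
    """Squared asymptotic bias plus asymptotic variance at a fixed point x."""
    return h**4 * mu2**2 * lam**2 + rk * sigma2 / (n * delta * h * pi_x)

def local_plugin_bandwidth(n, delta, lam, sigma2, pi_x, mu2, rk):
    """Closed-form minimizer of amse(h) over h > 0."""
    ratio = (sigma2 * rk) / (4.0 * mu2**2 * lam**2 * pi_x)
    return (n * delta) ** (-0.2) * ratio ** 0.2

# Hypothetical pilot values at one point x (illustration only)
n, delta = 10_000, 0.01
lam, sigma2, pi_x = 0.8, 1.5, 0.3
mu2, rk = 1.0, 1.0 / (2.0 * np.sqrt(np.pi))  # Gaussian kernel constants

h_star = local_plugin_bandwidth(n, delta, lam, sigma2, pi_x, mu2, rk)

# Sanity check: minimize the AMSE on a fine grid around h_star
grid = np.linspace(0.5 * h_star, 1.5 * h_star, 2001)
h_grid = grid[np.argmin(amse(grid, n, delta, lam, sigma2, pi_x, mu2, rk))]
```

In a real application the three pilot quantities would have to be re-estimated at every evaluation point x, which is where the computational cost of the local approach comes from.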

An alternative approach is provided by the selection of h invoking a global performance criterion, namely by selecting h such that the integrated mean squared error (“IMSE”)

$$\mathrm{IMSE}(\hat b) = \int \mathrm{MSE}(\hat b(x))\,dx = E\int \bigl(\hat b(x) - b(x)\bigr)^2\,dx = \mathrm{MISE}(\hat b)$$

is minimized, where the integration takes place over the support of the stationary density π of X. Moreover, we mention that the order of integration and expectation can be interchanged because the integrand is non-negative. In our case, the asymptotic IMSE of $\hat b$, denoted $\mathrm{AIMSE}(\hat b)$, has the form

$$\mathrm{AIMSE}(\hat b) = h^4\mu_2^2(K)\int \Lambda^2(x)\pi(x)\,dx + \frac{\int_{\mathbb{R}} K^2(z)\,dz \int \tilde\sigma^2(x)\,dx}{n\Delta h}.$$

Analogously, the optimal bandwidth parameter $\tilde h_{\mathrm{opt,oracle}}$ is given by

$$\tilde h_{\mathrm{opt,oracle}} = (n\Delta)^{-1/5}\left(\frac{\int_{\mathbb{R}} K^2(z)\,dz \int \tilde\sigma^2(x)\,dx}{4\mu_2^2(K)\int \Lambda^2(x)\pi(x)\,dx}\right)^{1/5}.$$

This choice of bandwidth is now independent of x, but the integrals have to be discretized and the integrands then replaced by estimators. These estimators again depend on a bandwidth, which, in turn, can also be chosen by an appropriate rule of thumb.
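The discretization step can be sketched as follows. This is an illustration under stated assumptions, not the procedure used in the text: the integrals are restricted to a compact grid (here [-2, 2], an arbitrary choice), the trapezoidal rule is used, and the arrays `lam_grid`, `sigma2_grid` and `pi_grid` are hypothetical stand-ins for pilot estimates. The stand-in values mimic an Ornstein-Uhlenbeck process dX = -X dt + dW, whose stationary law is N(0, 1/2).

```python
import numpy as np

def trapezoid(y, x):
    """Composite trapezoidal rule on a (possibly non-uniform) grid."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def global_plugin_bandwidth(x_grid, lam, sigma2, pi, n, delta, mu2, rk):
    """AIMSE-minimizing bandwidth with the integrals discretized on x_grid."""
    int_sigma2 = trapezoid(sigma2, x_grid)       # int sigma~^2(x) dx
    int_bias = trapezoid(lam**2 * pi, x_grid)    # int Lambda(x)^2 pi(x) dx
    ratio = (rk * int_sigma2) / (4.0 * mu2**2 * int_bias)
    return (n * delta) ** (-0.2) * ratio ** 0.2

# Stand-in pilot values on a compact grid (assumptions, illustration only)
x_grid = np.linspace(-2.0, 2.0, 401)
pi_grid = np.exp(-x_grid**2) / np.sqrt(np.pi)    # N(0, 1/2) density
lam_grid = 2.0 * x_grid                          # b'(x) pi'(x) / pi(x) with b(x) = -x
sigma2_grid = np.ones_like(x_grid)               # constant diffusion coefficient
mu2, rk = 1.0, 1.0 / (2.0 * np.sqrt(np.pi))      # Gaussian kernel constants

h_global = global_plugin_bandwidth(x_grid, lam_grid, sigma2_grid, pi_grid,
                                   n=5000, delta=0.01, mu2=mu2, rk=rk)
```

A single bandwidth is computed for all evaluation points, so the cost is one pass over the grid rather than one pilot estimation per x.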

The third method presented is cross-validation. For independent and identically distributed data, this procedure is standard in the literature and has been studied extensively; see, for example, Härdle and Marron (1985) for pioneering work in the context of nonparametric regression. In our case, the leave-one-out cross-validation method (see Härdle and Marron (1985)) is not appropriate because the observations are dependent. Nevertheless, due to our assumptions, the jump-diffusion X is exponentially β-mixing (and, hence, also strong mixing or α-mixing). Thus, the dependence, in terms of correlation, decreases as the lag between two observations increases. In the context of mixing data, Chu and Marron (1991) as well as Burman et al. (1994) introduced a generalization of the leave-one-out cross-validation method for dependent (strong mixing) data. Burman et al. (1994) called this method H-block cross-validation; it provides a way of choosing an optimal global bandwidth parameter $h_{H\text{-}CV}$.

The intuition adapted to our model is as follows: Fix an $l \in \{1, \dots, n\}$ and estimate $b(X_{l\Delta})$ from a subsample of the available data set $\{X_{i\Delta}\}_{i=1,\dots,n}$ in which H observations on both sides of $X_{l\Delta}$ are removed, so that $b(X_{l\Delta})$ is estimated from the remaining $n-(2H+1)$ observations. To ensure asymptotic optimality, H has to be an increasing sequence of positive integers. Now define $\hat b_{-(l-H)\Delta:(l+H)\Delta}(X_{l\Delta})$ as the estimate of $b(X_{l\Delta})$ based on the sample

$$\{X_{\Delta}, X_{2\Delta}, \dots, X_{(l-H-1)\Delta}, X_{(l+H+1)\Delta}, \dots, X_{n\Delta}\}.$$

Then, the smoothing parameter $h_{H\text{-}CV}$ is selected by

$$h_{H\text{-}CV} = \operatorname*{argmin}_{h} \sum_{i=H+1}^{n-H}\left(\frac{X_{(i+1)\Delta}-X_{i\Delta}}{\Delta} - \hat b_{-(i-H)\Delta:(i+H)\Delta}(X_{i\Delta})\right)^2;$$

see Burman et al. (1994), where an ad hoc choice of the sequence H is also given as

$$H = \lfloor n^{1/4} \rfloor.$$
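The H-block criterion can be sketched in a few lines. The code below is a minimal illustration under several assumptions that are not taken from the text: the drift estimator is a Nadaraya-Watson type estimator with a Gaussian kernel, the bandwidth is searched over a small hypothetical candidate grid, indices are zero-based, and the test data are an Euler-simulated Ornstein-Uhlenbeck path without jumps.

```python
import numpy as np

def gaussian_kernel(z):
    return np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)

def h_block_cv_score(x, delta, h, H):
    """Sum of squared prediction errors, leaving out 2H+1 neighbours each time."""
    states = x[:-1]                    # X_{i*delta}
    incr = np.diff(x) / delta          # (X_{(i+1)delta} - X_{i delta}) / delta
    m = len(states)
    idx = np.arange(m)
    score = 0.0
    for i in range(H, m - H):
        keep = np.abs(idx - i) > H     # drop the block around observation i
        w = gaussian_kernel((states[keep] - states[i]) / h)
        denom = w.sum()
        if denom <= 0.0:               # no usable neighbours for this h
            return np.inf
        b_hat = np.dot(w, incr[keep]) / denom   # leave-block-out drift estimate
        score += (incr[i] - b_hat) ** 2
    return score

def select_bandwidth(x, delta, candidates, H=None):
    """Return the candidate minimizing the H-block score, together with H."""
    if H is None:
        H = int(np.floor(len(x) ** 0.25))       # ad hoc choice floor(n^(1/4))
    scores = [h_block_cv_score(x, delta, h, H) for h in candidates]
    return candidates[int(np.argmin(scores))], H

# Simulated test path (assumption): dX = -X dt + dW, Euler scheme
rng = np.random.default_rng(0)
n, delta = 1000, 0.05
x = np.empty(n + 1)
x[0] = 0.0
for i in range(n):
    x[i + 1] = x[i] - x[i] * delta + np.sqrt(delta) * rng.standard_normal()

candidates = np.array([0.1, 0.2, 0.4, 0.8, 1.6])  # hypothetical search grid
h_cv, H = select_bandwidth(x, delta, candidates)
```

Each evaluation of the score costs on the order of n² kernel evaluations, so a coarse candidate grid is a sensible starting point before refining around the minimizer.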

In practice, H can, for instance, be selected by analyzing the empirical autocorrelation function of the data.
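One possible way to turn this into a rule is sketched below. The heuristic is an assumption made for illustration and is not taken from the text or from Burman et al. (1994): H is set to the first lag at which the absolute empirical autocorrelation falls below the approximate white-noise band 1.96/√n. The function names and the AR(1) test series are likewise hypothetical.

```python
import numpy as np

def empirical_acf(x, max_lag):
    """Empirical autocorrelation at lags 0, ..., max_lag."""
    x = np.asarray(x, dtype=float)
    xc = x - x.mean()
    var = np.dot(xc, xc)
    n = len(xc)
    return np.array([np.dot(xc[:n - k], xc[k:]) / var for k in range(max_lag + 1)])

def block_size_from_acf(x, max_lag, threshold=None):
    """First lag with |ACF| below the threshold (heuristic, assumption)."""
    if threshold is None:
        threshold = 1.96 / np.sqrt(len(x))
    acf = empirical_acf(x, max_lag)
    below = np.nonzero(np.abs(acf[1:]) < threshold)[0]
    return int(below[0]) + 1 if below.size else max_lag

# Hypothetical test series: AR(1) with coefficient 0.9
rng = np.random.default_rng(1)
n = 5000
y = np.empty(n)
y[0] = 0.0
for t in range(1, n):
    y[t] = 0.9 * y[t - 1] + rng.standard_normal()

acf = empirical_acf(y, max_lag=100)
H_acf = block_size_from_acf(y, max_lag=100)
```

For strongly autocorrelated data this rule yields a larger block than for nearly uncorrelated data, which matches the purpose of removing neighbours that are still correlated with the left-out observation.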

2.8 Comparison to an alternative nonparametric estimation ap-
