## Nonparametric Estimation of Scalar Di¤usion

## Models of Interest Rates Using Asymmetric Kernels

Nikolay Gospodinovy _{Masayuki Hirukawa}z
Concordia University and CIREQ Setsunan University

This Version: February 2012

Abstract

This paper proposes an asymmetric kernel-based method for nonparametric estimation of scalar di¤usion models of spot interest rates. We derive the asymptotic theory for the asymmetric kernel estimators of the drift and di¤usion functions for general and positive recurrent processes and illustrate the advantages of the Gamma kernel for bias correction and e¢ ciency gains. The …nite-sample properties and the practical relevance of the pro-posed nonparametric estimators for bond and option pricing are evaluated using actual and simulated data for U.S. interest rates.

Keywords: Nonparametric regression; Gamma kernel; di¤usion estimation; spot interest rate; derivative pricing.

JEL classi…cation: C13; C14; C22; E43; G13.

We would like to thank the Co-Editor (Franz Palm), three anonymous referees, Evan Anderson, Zongwu Cai, Bruce Hansen, Arthur Lewbel, Peter Phillips, Jeroen Rombouts, Ximing Wu, and participants at the 2007 MEG, 2008 CESG and 2008 SEA meetings and seminars at DePaul, Loyola, Purdue, and Queen’s University for helpful comments. The …rst author gratefully acknowledges …nancial support from FQRSC, IFM2 and SSHRC. The second author gratefully acknowledges …nancial support from Japan Society of the Promotion of Science (grant number 23530259).

y_{Department of Economics, Concordia University, 1455 de Maisonneuve Blvd.} _{West, Montreal,}

Quebec H3G 1M8, Canada; phone: (+1)514-848-2424 (ext. 3935); fax: (+1)514-848-4536; e-mail: [email protected]; web: http://alcor.concordia.ca/~gospodin/.

z_{Faculty of Economics, Setsunan University, 17-8 Ikeda Nakamachi, Neyagawa, Osaka, 572-8508, }

Ja-pan; phone: (+81)72-839-8095; fax: (+81)72-839-8138; e-mail: [email protected]; web: http://www.setsunan.ac.jp/~hirukawa/.

### 1

### Introduction

Nonparametric methods have attracted growing attention in econometrics due to their

‡ex-ibility for handling possible nonlinearities in conditional moment function estimation. This

paper focuses on nonparametric estimation of time-homogeneous drift and di¤usion functions

in continuous-time models that are used to describe the underlying dynamics of spot interest

rates. More speci…cally, we improve the nonparametric estimators of drift and di¤usion

func-tions of Stanton (1997) by means of a nonstandard smoothing technique. The main source

for this improvement comes from asymmetric kernel functions which are employed in place of

standard symmetric kernels.

Kernel smoothing has been widely applied for estimating continuous-time di¤usion

pro-cesses; examples of recent contributions include Florens-Zmirou (1993), Aït-Sahalia (1996a,b),

Jiang and Knight (1997), Stanton (1997), Chapman and Pearson (2000), Bandi (2002), Bandi

and Phillips (2003), Fan and Zhang (2003), Nicolau (2003), and Arapis and Gao (2006). An

interesting situation that often arises in economics and …nance is when the conditioning

vari-ables are nonnegative (i.e., they have a natural boundary at the origin) such as nominal interest

rates, volatility, etc. In this case, the Nadaraya-Watson (“NW”) regression estimator based on

a standard symmetric kernel is not appropriate without a boundary correction for the region

near the origin. This has generated a large literature on correction methods for boundary

e¤ects in nonparametric regression estimation which includes Gasser and Müller (1979), Rice

(1984), Müller (1991), Fan and Gijbels (1992), Fan (1993), among others.

As a viable alternative to these boundary correction methods, Chen (2000) and Scaillet

(2004) advocate the use of asymmetric kernel functions1 that have the same support as the

1

Strictly speaking, asymmetric kernel functions should be referred to as kernel-type weighting functions. In a slightly di¤erent context, Gouriéroux and Monfort (2006) and Jones and Henderson (2007) argue that unlike the case of symmetric kernels, the roles of the data pointX and the design pointxin asymmetric kernels are not exchangeable, which leads to a lack of normalization in density estimation using these kernels. Nevertheless, we follow the adopted convention in the literature for these kernel-type functions. Also, note that the lack of normalization is not an issue in regression estimation.

marginal density of the conditioning variables. In addition to eliminating boundary e¤ects,

asymmetric kernels have a number of important advantages in the analysis of economic and

…nancial data.

First, the shape of the asymmetric kernels varies according to the position at which

smoothing is made; in other words, the amount of smoothing changes in an adaptive

man-ner. Figure 1 plots the shapes of a Gamma kernel function for …ve di¤erent design points

(x= 0:00;0:02;0:04;0:08 and 0:16) at which the smoothing is performed.2 It is worth noting that for all plotted functions, the smoothing parameter value of the Gamma kernel is …xed. In

contrast, the amount of smoothing by symmetric kernels with a single bandwidth parameter

is …xed everywhere. Since the distribution of interest rates is empirically characterized by a

mode near the boundary and the sparse tail region, a single bandwidth does not su¢ ce in this

situation; while it is appropriate to employ a short bandwidth in the region near the boundary,

a longer bandwidth is required to capture the shape of the tail. As a result, when the data

in-dicate that the distribution contains both dense and sparse regions, it may prove useful to turn

to variable bandwidth methods (e.g. Abramson, 1982; Fan and Gijbels, 1992). The adaptive

smoothing property of asymmetric kernels bears some similarities to the variable bandwidth

methods, but unlike these methods, the adaptive smoothing for asymmetric kernels is achieved

by a single smoothing parameter which makes them much more appealing in empirical work.

Second, asymmetric kernels achieve the optimal rate of convergence (in mean integrated

squared error sense) within the class of nonnegative kernel estimators. Third, unlike the case

with symmetric kernels, the variances of asymmetric kernel estimators tend to decrease as the

design point moves away from the boundary. This property is particularly advantageous for

the analysis of interest rate data since the support of the density has a sparse region for high

interest rate values. In sum, we can view asymmetric kernels as a combination of a boundary

2

correction device and a “variable bandwidth” method.

Although asymmetric kernels are relatively new in the literature, several papers report

favorable evidence from applying them to empirical models in economics and …nance. A

non-exhaustive list includes: (i) estimation of recovery rate distributions on defaulted bonds

(Renault and Scaillet, 2004), (ii) income distribution estimation (Bouezmarni and Scaillet,

2005; Hagmann and Scaillet, 2007); (iii) actuarial loss distribution estimation (Hagmann and

Scaillet, 2007; Gustafsson et al., 2009); (iv) hazard estimation (Bouezmarni and Rombouts,

2008); (v) regression discontinuity design (Fé, 2010); and (vi) realized integrated volatility

estimation (Kristensen, 2010).

Incorporating asymmetric kernel smoothing into the nonparametric drift and di¤usion

es-timators of Stanton (1997) is expected to shed additional light on the nonparametric estimation

of spot rate di¤usion models. The Monte Carlo simulations in Chapman and Pearson (2000)

indicate that when the true drift is linear in the level of the spot interest rate, there are two

biases in Stanton’s (1997) symmetric kernel estimate of the drift function; namely, a bias near

the origin and a pronounced downward bias in the region of high interest rates where the

data are sparse. While the existing boundary correction methods may provide a remedy for

improving the performance of the drift estimate near the boundary, they are of little help for

removing the bias that occurs for high values of the spot rate. In contrast, given the appealing

properties of asymmetric kernels listed above, asymmetric kernel smoothing is expected to

reduce substantially both of these biases in the estimate of the drift function.

The remainder of the paper is organized as follows. Section 2 develops the asymmetric

ker-nel estimators of the drift and di¤usion functions and establishes their asymptotic properties.

It also discusses the smoothing parameter selection and the implementation of a bootstrap

procedure for inference. In Section 3, the proposed estimators are implemented to study the

simula-tion experiment that examines the …nite-sample performance of the symmetric and asymmetric

kernel estimators of di¤usion models and their practical relevance for bond and option pricing.

Section 4 summarizes the main results of the paper. Assumptions and proofs are provided in

the Appendix.

### 2

### Estimation of Scalar Di¤usion Models via Asymmetric

### Ker-nel Smoothing

2.1 NW Estimators Using the Gamma Kernel

The estimation of scalar di¤usion models plays an important role in the analysis of term

struc-ture of interest rates and derivative pricing. The general form of the underlying continuous-time

process for the spot rate Xt is represented by the stochastic di¤erential equation (“SDE”)

dXt= (Xt)dt+ (Xt)dWt; (1)

whereWtis a standard Brownian motion, (Xt)is the drift function and (Xt)is the di¤usion

(or instantaneous volatility) function. We assume that the initial condition X0 is …xed, i.e.

X0 =x0 2(0;1).3

Stanton (1997) develops a nonparametric approach to estimating the dynamics of spot

interest rate driven by the scalar di¤usion process (1). More speci…cally, Stanton (1997) uses

the in…nitesimal generator and a Taylor series expansion to give the …rst-order approximations

to (X)and 2(X)as

E(Xt+ XtjXt) = (Xt) +o( ); (2)

E (Xt+ Xt)2 jXt = 2(Xt) +o( ); (3)

where is a discrete, arbitrarily small, time step and o( ) denotes a remainder term which
goes to 0 as _{!} 0.4 _{Each of these conditional expectations at the design point} _{x} _{can}

3

This particular initialization is used in Florens-Zmirou (1993), among others. Alternative initializations are also possible.

4

be estimated by running a nonparametric regression. The approach by Stanton (1997) is

convenient because it enables us to estimate ( )and 2( ) separately, whereas the approaches by Aït-Sahalia (1996a) and Jiang and Knight (1997) require their sequential estimation; see

Arapis and Gao (2006) for a comparison of these three methods.

Since the conditioning variableXtis nonnegative, it is not appropriate to employ symmetric

kernels for estimating (X)and 2_{(}_{X}_{)}_{without a boundary correction. The key building block}

for nonparametric drift and di¤usion estimation that we propose in this paper is the NW kernel

regression estimator based on the Gamma kernel function (Chen, 2000)5

K_{G}_{(}_{x=b}_{+1}_{;b}_{)}(u) = u

x=b_{exp (} _{u=b}_{)}

bx=b+1 _{(}_{x=b}_{+ 1)}1fu 0g; (4)

where ( ) = R_{0}1y 1exp ( y)dy; > 0 is the Gamma function and b is the smoothing
parameter. Since the density of a Gamma distribution has support[0;_{1}), the Gamma kernel
function does not generate a boundary bias.

Suppose that we observe a discrete sample_{f}Xi gni=1 atnequally spaced time points from

the short-rate di¤usion process _{f}Xt: 0 t Tg satisfying the SDE (1). Here is the step

size between observations and T = n is the time span of the sample. Given a design point

x >0, we de…ne the Gamma-NW estimator of the drift (X) and di¤usion function 2(X) as

bb(x) =
1 Pni=11 X(i+1) Xi KG(x=b+1;b)(Xi )
Pn 1
i=1 KG(x=b+1;b)(Xi )
(5)
and
b2b(x) =
1 Pn_{i}_{=1}1 X(i+1) Xi 2KG(x=b+1;b)(Xi )
Pn 1
i=1 KG(x=b+1;b)(Xi )
: (6)

more accurate. However, Fan and Zhang (2003) show that higher-order approximations tend to reduce biases at the expense of in‡ating variances nearly exponentially, and thus we focus only on …rst-order approximations.

5_{Two other asymmetric kernels, namely, the Inverse Gaussian and Reciprocal Inverse Gaussian kernels }

(Scail-let, 2004), can be also applied in this context, but we focus exclusively on the Gamma kernel due to its its better …nite-sample performance (see Gospodinov and Hirukawa, 2007).

2.2 Asymptotic Properties of Gamma-NW Estimators

In this section, we present some large sample results for the Gamma-NW estimator in the

di¤usion model (1). Following Bandi and Phillips (2003; abbreviated as “BP” hereafter), we

explore the in-…ll and long span asymptotics such that n _{! 1}, T _{! 1} (long span), and

= T =n _{!} 0 (in-…ll). It is worth mentioning that the asymptotic results in BP are based
on the assumption of symmetric, nonnegative kernel functions; see Assumption 2 in BP. None

of the asymmetric kernels proposed by Chen (2000) and Scaillet (2004) can be expressed in

the form of K(X x;b) for a data point X, design point x, and smoothing parameter b. Therefore, establishing the limiting theory for the Gamma estimator does not directly follow

from BP and is new to the literature.

The …rst results are based on the assumption that the short-rate process Xt is recurrent.

LetS(x) denote the natural scale function of Xt which is de…ned, for some generic constant

c_{2}(0;_{1}), as
S(x) =
Z x
c
exp
Z y
c
2 (u)
2_{(}_{u}_{)} du dy; (7)

ands(x)be the speed function of Xt given by

s(x) = _{2} 2

(x)S0_{(}_{x}_{)}: (8)

Because the range ofXtis(0;1),Xtbecomes recurrent if and only iflimx!0S(x) = 1and

limx!1S(x) =1 (Assumption 1(iii) in Appendix A). The full set of regularity conditions is

provided in Appendix A.

In the subsequent analysis, the concept of local time plays a key role. The chronological

local time of the di¤usion process (1) is de…ned as

LX(T; x) =
1
2_{(}_{x}_{)}lim_{!}_{0}
1Z T
0
1[x;x+ )(Xs) 2(Xs)ds (9)

for every x > 0, where 1A( ) denotes the indicator function on the set A. LX(T; x) is a

whenXtis strictly stationary,LX(T; x)=T a:s:

! f(x), wheref(x)is the time-invariant marginal density ofXt; see Bosq (1998, Theorem 6.3), for instance. In contrast, for a general recurrent

di¤usion, LX(T; x) diverges to in…nity at a rate no faster thanT. Using the Gamma kernel,

LX(T; x)for a …xed T can be (strongly) consistently estimated by

b LX(T; x; b) = n X i=1 KG(x=b+1;b)(Xi ): (10)

To describe the di¤erent asymptotic properties of the estimator depending on the position

of the design point x, we denote by “interior x” and “boundary x” a design point x that

satis…es x=b _{! 1} and x=b _{!} for some > 0 as n; T _{! 1}, respectively. Also, let “_{)}”
denote weak convergence to a random sequence. Theorem 1 states the asymptotic properties

of the Gamma-NW estimator for general recurrent di¤usions.

Theorem 1 (general recurrent case).

(i) (Drift estimator) Suppose that Assumptions 1 and 3 in Appendix A hold. In addition,
if b5=2LX(T; x) =Oa:s:(1), then
q
b1=2_{L}b
X(T; x; b) bb(x) (x) BR(x)b )N 0;
2_{(}_{x}_{)}
2p x1=2 (11)

for interior x, and q

bLbX(T; x; b)fbb(x) (x)g )N 0;

(2 + 1) 2(x)

22 +1 2_{( + 1)} (12)

for boundary x, where BR(x) = 0(x)_{f}1 +xs0(x)=s(x)_{g}+ (x=2) 00(x).

(ii) (Di¤usion estimator) Suppose that Assumptions 1 and 3 in Appendix A hold. In addition, if b5=2LX(T; x)= =Oa:s:(1), then

s
b1=2_{L}b
X(T; x; b)
b2b(x) 2(x) BR2(x)b )N 0;
4_{(}_{x}_{)}
p
x1=2 (13)

for interior x, and
s
bLbX(T; x; b)
b2b(x) 2(x) )N 0;
(2 + 1) 4(x)
22 2_{( + 1)} (14)

for boundary x, where BR2(x) = 2(x)

0

f1 +xs0(x)=s(x)_{g}+ (x=2) 2(x) 00.

Proof. See Appendix B.

Theorem 1 establishes the limiting behavior of the drift and di¤usion function estimators

when a single smoothing parameter b is used for both interior and boundary regions. For

each of the results, the rate on b balances orders of magnitude in squared bias and variance

for interior x. This rate undersmooths the curve for boundary x, and as a consequence, the

O(b) leading bias term becomes asymptotically negligible. The theorem also suggests that the di¤usion estimator has a faster convergence rate than the drift estimator, regardless of

the position of the design point x. This result is consistent with Remark 8 in BP. However,

the chronological local time is random, and thus convergence rates of the two estimators are

path-dependent (BP, Remark 4).

The existing literature often assumes stationarity or ergodicity of the processXt. To derive

the distributional theory of the Gamma-NW estimator in this case, we impose the additional

condition thatR_{0}1s(x)dx <_{1}(Assumption 2 in Appendix A). Using this assumption, it can
be demonstrated that LbX(T; x; b)=T

a:s:

! f(x) and that s0_{(}_{x}_{)}_{=s}_{(}_{x}_{) =} _{f}0_{(}_{x}_{)}_{=f}_{(}_{x}_{)}_{. }

Accord-ingly, we have the following corollary.

Corollary 1 (positive recurrent case).

(i) (Drift estimator) Suppose that Assumptions 1, 2 and 3 in Appendix A hold. In addition,
if b=O T 2=5 , then
p
T b1=2 _{b}
b(x) (x) BS(x)b )N 0;
2_{(}_{x}_{)}
2p x1=2_{f}_{(}_{x}_{)} (15)

for interior x, and

p

T b_{f}_{b}_{b}(x) (x)_{g )}N 0; (2 + 1)

2_{(}_{x}_{)}

22 +1 2_{( + 1)}_{f}_{(}_{x}_{)} (16)

for boundary x, where BS_{(}_{x}_{) =} 0_{(}_{x}_{)}_{f}_{1 +}_{xf}0_{(}_{x}_{)}_{=f}_{(}_{x}_{)}_{g}_{+ (}_{x=}_{2)} 00_{(}_{x}_{)}_{.}

(ii) (Di¤usion estimator) Suppose that Assumptions 1, 2 and 3 in Appendix A hold. In
addition, if b=O n 2=5 , then
p
nb1=2 _{b}2
b(x) 2(x) BS2(x)b )N 0;
4_{(}_{x}_{)}
p
x1=2_{f}_{(}_{x}_{)} (17)

for interior x, and

p

nb _{b}2_{b}(x) 2(x) _{)}N 0; (2 + 1)

4_{(}_{x}_{)}

22 2_{( + 1)}_{f}_{(}_{x}_{)} (18)

for boundary x, where BS2(x) = 2(x) 0f1 +xf0(x)=f(x)g+ (x=2) 2(x) 00.

In Corollary 1, the leading bias terms in_{b}_{b}(x)and_{b}2_{b}(x)for interiorx, namely,BS(x)band

BS

2(x)b, are from nonparametric estimation of the conditional expectation using the Gamma

kernel; on the other hand, the bias terms for boundaryxbecome asymptotically negligible due

to undersmoothing as in Theorem 1. While the bias terms are of order O(b), it follows from

=T =n _{!}0 that_{b}_{b}(x) has a slower convergence rate of orderO T 1=2b 1=4 for interiorx

andO T 1=2b 1=2 for boundary x. Therefore, if the sample size is not su¢ ciently large, it is

much harder to estimate the drift accurately, especially for design points in the region of high

values where the data are sparse. However, the property of asymmetric kernel smoothing that

the variance of the estimator decreases withx, is expected to provide an e¤ective remedy.

Furthermore, Corollary 1 and the trimming argument in Chen (2000) yield the mean

integ-rated squared error (“MISE”) of each estimator given byM ISE_{f}_{b}_{b}(x)_{g}=O b2+T 1b 1=2

and M ISE _{b}2_{b}(x) = O b2+n 1b 1=2 . Then, the MISE-optimal smoothing parameters
are b = O T 2=5 and b 2 = O n 2=5 for bb(x) and b2b(x), respectively. This result

in Chapman and Pearson (2000, p.367). As a consequence, the optimal MISEs are of order

M ISE _{f}_{b}_{b}(x)_{g}=O T 4=5 and M ISE _{b}2_{b}(x) =O n 4=5 .

2.3 Selection of Smoothing Parameter

The practical implementation of the proposed nonparametric estimator requires a choice of a

smoothing parameter.6 Indeed, as argued in Chapman and Pearson (2000), the performance

of the NW estimator depends crucially on the choice of a bandwidth parameter, as well as

the persistence of the data. In this respect, asymmetric kernels are not an exception, even if

they possess the property of adaptive smoothing with a single smoothing parameter. While

the result in the previous section provides some guidance in this direction, the expression for

the optimal smoothing parameter in mean squared error (“MSE”) or MISE sense involves

unknown quantities and a “plug-in rule”is di¢ cult to obtain. Note also that the MSE-optimal

smoothing parameter depends explicitly on the design point and, in principle, they should

take di¤erent values at each x. Hagmann and Scaillet (2007), however, argue for a single,

global smoothing parameter since the dependence on the design point x may deteriorate the

adaptability of asymmetric kernels.

In this paper, we adopt a cross-validation (“CV”) approach to choosing a global smoothing

parameter for nonparametric curve estimation based on asymmetric kernels. Since the data

are dependent, the leave-one-out CV is not appropriate. Instead, we work with theh-block CV

version of Györ…et al. (1989) and Burmanet al. (1994), whereh data points on both sides of

thes-th observation are removed from T observations _{f}XtgTt=1 and the unknown function of

interestm(Xs)is estimated from the remainingT (2h+ 1)observations. The idea behind this

method is that for ergodic processes, the blocks of length h are asymptotically independent

although the block size may need to shrink (at certain rate) relative to the total sample size in

6_{Bandwidth selection for kernel estimators of continuous-time di¤usion models is an on-going research }

pro-gram. Some recent contributions tailored to continous-time Markov processes include Bandiet al. (2010) and Kanaya and Kristensen (2011).

order to ensure the consistency of the procedure. Hallet al. (1995) establish the asymptotic

properties of theh-block CV method.

In our context with equally spaced observations_{f}Xi gni=1, letm^ (i h) :(i+h) (Xi )denote

the estimate of the drift (Xi ) or the di¤usion 2(Xi ) from n (2h + 1) observations

X ; X2 ; :::; X(i h 1) ; X(i+h+1) ; :::; Xn (=XT) . Then, the smoothing parameter can

be selected by minimizing the least squares cross-validation function

CV (b) = arg min
b2B
n h_{X}
i=h+1
Yi m^ (i h) :(i+h) (Xi ) 2 (Xi ); (19)
where Yi = X(i+1) Xi = and X(i+1) Xi
2

= when estimating the drift and

di¤usion functions, respectively, _{B} is a predetermined range of b, and ( ) is a weighting
function that has compact support and is bounded by 1. In our numerical work, we set

( ) 1. This procedure could be further extended to the modi…ed (asymptotically optimal)

h-block CV method proposed by Racine (2000).

Since formal results on the optimal selection of the block size h are not available in the

literature, we propose an automatic, data-dependent choice of h that takes into account the

persistence of the data. While Burmanet al. (1994) and Racine (2000) adopt an ad hoc …xed

fraction rule (h =n1=4), this choice ofh does not account for di¤erent degrees of persistence in the underlying process. Given some similarities between the choice of h and the selection

of a bandwidth parameter in the heteroskedasticity and autocorrelation consistent (“HAC”)

variance estimation, we follow Andrews (1991) and modify the plug-in-rule forhtoh= ( n)1=4,
where = _{(1} _{)}42_{(1+ )}2 2 and <1is the AR coe¢ cient from an AR(1) regression forfXi g

n i=1.7

When = 0 (or equivalently = 0), this procedure naturally reduces to the leave-one-out CV for serially uncorrelated data.

2.4 Bootstrap Procedure

Inference on the conditional expectations (Xt)and 2(Xt)can be conducted using the

asymp-totic distributions derived in Theorem 1 or Corollary 1 above. This involves explicit estimation

of unknown functions of the data that enter the asymptotic distributions. A more practical

approach, that typically provides better approximations to the …nite-sample distributions of

the statistics of interest, employs simulation-based or bootstrap methods.8 In particular, we

propose a parametric bootstrap procedure based on the discretized version of model (1)

X_{(}_{i}_{+1)} Xi = (Xi ) + (Xi )

p

"i ; (20)

where"i iid

N(0;1). Our bootstrap method also deals explicitly with the discretization bias induced by the discrete-time version of the underlying model. This is achieved by directly

sampling from the continuous-time model with a step size set to_{4}=M for someM 1(in our
empirical application,M = 100) while keeping the design points the same across the bootstrap
samples.9 More speci…cally, using the estimated drift and di¤usion functions_{b}_{b}(x)and_{b}b(x),10

we setx_{0}=Xi and compute recursively

x_{k}=x_{k} _{1}+_{b}_{b}(x_{k} _{1})

M +bb(xk 1)

r

M"k; (21)

fork= 1; :::M, where "_{k}iidN(0;1). We then set X_{(}_{i}_{+1)} =x_{M} and repeat this procedure for
each design point to construct the bootstrap sample nX_{(}_{i}_{+1)} on

i=1: Finally, each bootstrap

seriesX_{(}_{i}_{+1)} X_{i} is used to re-estimate (X) and (X) at the same design points.

Let _{b}(x)and _{b}(x)denote the corresponding bootstrap estimates. Repeating this
proced-ure a large number of times provides an approximation to the distributions of _{b}(x) _{b}_{b}(x)

8_{Bootstrap methods for inference in nonparametric regression models have been developed by Härdle and}

Marron (1991) and Frankeet al. (2002), among others.

9_{We would like to thank an anonymous referee for suggesting this to us.}
1 0

The di¤usion estimates are obtained by imposing the restriction (0) = 0 as in Stanton (1997) which
substantially reduces the bias documented in Fan and Zhang (2003) and ensures the nonnegativity of the
simulated interest rates as_{4 !}0.

and _{b}(x) _{b}b(x) at each design point. Then, pointwise con…dence bands can be

construc-ted as _{b}_{b}(x) q 1 _{2} ; _{b}_{b}(x) q _{2} and _{b}b(x) q 1 _{2} ; bb(x) q _{2} , where

q ( ) and q ( ) denote the -quantiles of the bootstrap distributions of _{b}(x) _{b}_{b}(x) and

b(x) bb(x). This method can also be used for bias correction of the original estimates

bb(x) and bb(x) which could further improve the accuracy of the approximation in a double

bootstrap procedure.

### 3

### Empirical and Simulation Analysis

3.1 Dynamics of U.S. Risk-Free RateOne outstanding question in the analysis of the term structure of U.S. interest rates is

con-cerned with the presence of possible nonlinearities in the underlying dynamics of the spot rate.

Despite the large literature on this issue (Aït-Sahalia, 1996a,b; Stanton, 1997; Chapman and

Pearson, 2000; Bandi, 2002; Durham, 2003; Fan and Zhang, 2003; Jones, 2003; Arapis and

Gao, 2006; among others), there is still no consensus on the presence of statistically signi…cant

nonlinearities in short-term interest rates and their economic importance for pricing bonds

and interest rate derivative products. For instance, Chapman and Pearson (2000) argue that

the nonlinearity in the drift of the spot rate at high values of interest rates documented by

Stanton (1997) could be spurious due to the poor …nite-sample properties of Stanton’s (1997)

estimator. Fan and Zhang (2003) conclude that there is little evidence against linearity in the

short rate drift function. Similarly, Bandi (2002), Durham (2003) and Jones (2003) …nd no

empirical support for nonlinear mean reversion in short-term rates. In contrast, Arapis and

Gao (2006) report that their speci…cation testing strongly rejects the linearity of the short rate

drift at both daily and monthly frequency.

Here, we revisit this issue by estimating the drift function using our proposed Gamma kernel

estimator. The data used for estimation are 564 monthly observations for the annualized

January 1952 – December 1998. As it is typically the case for interest rate data, the U.S.

risk-free rate exhibits high persistence and strong conditional heteroskedasticity. Given the

sensitivity of the nonparametric estimates of the drift function to the bandwidth (Chapman

and Pearson, 2000), this high persistence should be properly incorporated into the selection

procedure for a smoothing parameter.

Since our interest lies in identifying possible nonlinearities in the drift function of the spot

rate, it proves instructive to estimate the CIR (Coxet al., 1985) model

dXt= ( Xt)dt+ Xt1=2dWt; (22)

which speci…es the drift as a linear function in X of the form (Xt) = ( Xt), where

denotes the speed of mean reversion and is the long-run mean of X. Given the nature

of the observed data for the spot rate, the unknown parameters ( ; ; ) are estimated from the discretized version of the model with = 1=12 which induces a discretization bias. To correct for the discretization bias, we follow Brozeet al. (1995) and estimate the parameters

of the CIR model by indirect inference using the same auxiliary model and discretization step

as suggested by Broze et al. (1995). The resulting estimates for the U.S. risk-free rate are

b= 0:2804,b= 0:0541and _{b}= 0:0876.

Figure 2 presents the nonparametric estimates of the drift function for the risk-free rate

from the Gamma NW and Gaussian NW regressions along with the estimated parametric drift

function from the CIR model. The smoothing parameter for the Gamma kernel estimator is

chosen byh-block CV described in Section 2.3 while the smoothing parameter for the Gaussian

kernel is set equal to four times the bandwidth foriiddata as in Stanton (1997). The di¤erences

in the nonparametric estimates reveal that part of the accelerated mean reversion at higher

interest rates, reported in the previous literature, may be due to larger biases of the symmetric

kernel-based estimators although the Gamma kernel estimator also suggests that the mean

signi…cance of these …ndings, we also report 95% con…dence bands for the Gamma kernel

estimates obtained by the bootstrap procedure outlined in Section 3.4 with 999 bootstrap

replications. Despite the mild nonlinearity of the nonparametric estimates, the parametric

estimate of the drift function from the CIR model falls within the 95% con…dence bands (with

the exception of very low interest rates) and it appears that the null hypothesis of a linear drift

cannot be rejected for this particular sample. Also, the nonparametric estimate based on the

Gaussian kernel lies inside the 95% Gamma kernel-based con…dence bands except for very large

interest rates where the bias due to sparseness of the data becomes more pronounced. The

Gamma kernel estimate of the di¤usion function and the associated 95% bootstrap con…dence

bands are plotted in Figure 3. To gain further insights into the …nite-sample properties of the

nonparametric estimators and the economic importance of these seemingly small statistical

di¤erences for bond and option pricing, we conduct a simulation experiment whose design

closely mimics our empirical application.

3.2 Monte Carlo Experiment

The data for the simulation experiment is generated from the CIR model (22) with true

parameter values that are set equal to the estimates from U.S. data in the previous section,

( ; ; ) = (0:2804;0:0541;0:0876);and a time step between two consecutive observation equal to = 1=12 corresponding to monthly data. The CIR model is convenient because the transition and marginal densities are known and the bond and call option prices are available in

closed form (Coxet al., 1985). 5;000sample paths for the spot interest rate of600observations (n= 600; T = 50) are constructed recursively by drawing random numbers from the transition non-central chi-square density and using the values for , ; and (see Chapman and

Pearson, 2000, for more details).

The expressions for the price of a coupon discount bond and a call option on a

Jiang (1998) and Phillips and Yu (2005) and compute the prices of a three-year zero-coupon

discount bond and a one-year European call option on a three-year discount bond with a face

value of $100 and an exercise price of $87 with an initial interest rate of 7% by simulating spot

rate data from the estimated di¤usion process. The simulated bond and derivative prices are

then compared to the analytical prices based on the true values of the parameters.

More speci…cally, the price of a zero-coupon bond with face value P0 and maturity( t)

is computed as
P_{t} =P0Et exp
Z
t
~
Xudu ; (23)

where the risk-neutral process X~t evolves as dX~t = b( ~Xt)dt+b( ~Xt)dWt, and b( ) and b( )

denote the nonparametric estimates of the drift and di¤usion functions, respectively.11 The

expectation is evaluated by Monte Carlo simulation using a discretized version of the dynamics

of the spot rate.

The price of a call option with maturity (m t) on a zero-coupon bond with maturity

( t);face value P0 and exercise priceK is computed as

C_{t}m = Et exp
Z m
t
~
Xudu max (Pm K;0)
= Et exp
Z m
t
~
Xudu max P0Em exp
Z
m
~
Xudv K;0 ; (24)

where m < and sample paths for X~t are simulated from the nonparametrically estimated

discretized model of spot rate.

We consider the NW estimators based on Gaussian and Gamma kernels with smoothing

parameters that are selected by h-block CV. The average Monte Carlo estimates and 95%

Monte Carlo con…dence bands for the drift function are plotted in Figures 4 and 5. While

the Gamma NW kernel estimator is practically unbiased, the Gaussian NW kernel exhibits

a downward bias for interest rates higher than 9-10%. Furthermore, the asymmetric kernel

1 1

For simplicity, the market price of risk is assumed to be equal to zero since its computation requires another interest rate process of di¤erent maturity.

estimator exhibits smaller variability both near the boundary and in the tail region where the

data are sparse.12

The economic signi…cance of the improved estimation of di¤usion models of spot rate is

evaluated by comparing bond and option pricing errors based on symmetric and asymmetric

kernel estimators13 for the CIR model. The results are presented in Table 1. The bond and

derivative prices based on the asymmetric kernel estimator are less biased than its Gaussian

counterpart which could translate in potentially substantial …nancial bene…ts and investment

pro…ts for large portfolios. Another important feature of the results is that the Gamma-based

bond and option prices enjoy much smaller variability and tighter con…dence intervals than the

symmetric kernel-based option prices. Finally, while the con…dence intervals for the Gamma

kernel-based call prices appear symmetric around the median estimate, the con…dence intervals

for the Gaussian kernel-based call prices tend to be highly asymmetric (with long right tail)

which illustrates the economic impact of the boundary problems of symmetric kernels.

### 4

### Conclusion

This paper proposes an asymmetric kernel estimator for estimating the underlying scalar

di¤u-sion model of spot interest rate. The asymptotic properties of the Gamma kernel estimator of

the drift and di¤usion functions are established for both interior and boundary design points.

We show that the Gamma kernel estimator possesses some appealing properties such as lack

of boundary bias and adaptability in the amount of smoothing. The paper adopts a block

cross-validation method for dependent data in choosing the smoothing parameter. The

…nite-sample performance of the asymmetric kernel estimator and its practical relevance for bond

1 2

The estimates of the di¤usion function ( ) are not reported to conserve space. Overall, the di¤usion function estimates based on the Gamma and Gaussian kernels are similar due to the zero restriction of the di¤usion function at the origin (see footnote 10). When this restriction is removed, the Gamma kernel estimator tends to perform better as a result of its intrinsic boundary correction property.

1 3

Kristensen (2008) demonstrates that when the processXtis stationary, derivative prices obtained by plug-in

nonparametric estimates of the drift and di¤usion functions have the same rate of convergence as parametric estimators.

and option pricing are evaluated using simulated data of spot interest rate. The empirical

ana-lysis of the U.S. risk-free rate reveals the existence of some mild (and statistically insigni…cant)

nonlinearity in the drift function although the estimated increased mean reversion at higher

interest rate levels is substantially smaller than originally reported by Stanton (1997).

### A

### Appendix: Assumptions

This appendix provides a set of assumptions for Theorem 1 and Corollary 1 that are similar

to those in Bandi (2002) and BP. Our …rst assumption is basically the same as Assumption

1 in BP, which ensures that the SDE (1) has a unique strong solution Xt and that Xt is

recurrent. Assumption 2, together with Assumption 1, implies that the processXt is positive

recurrent (or ergodic) and ensures the existence of a time-invariant distributionP0with density

f(x) =s(x)=R_{0}1s(x)dx. Moreover, ifX0 has distributionP0,Xtbecomes strictly stationary.

Assumption 3 controls the rates of convergence or divergence of the sequences used in the

asymptotic results for general and positive recurrent cases.

Assumption 1.

(i) ( ) and ( ) are time-homogeneous,B-measurable functions on (0;_{1}), where B is the
-…eld generated by Borel sets on (0;_{1}). Both functions are at least twice continuously
di¤erentiable. Hence, they satisfy local Lipschitz and growth conditions. Thus, for every

compact subset J of the range (0;_{1}), there exist constants C_{1}J and C_{2}J such that, for all

x; y_{2}J,_{j} (x) (y)_{j}+_{j} (x) (y)_{j} CJ

1 jx yjand j (x)j+j (x)j C2Jf1 +jxjg.

(ii) 2( )>0 on(0;_{1}).

(iii) The natural scale functionS(x) satis…eslimx!0S(x) = 1and limx!1S(x) =1.

Assumption 3. Asn; T _{! 1}, (=T =n)_{!}0,b(=bn;T)!0such that

LX(T; x)

b

s

log 1 =oa:s:(1): (A1)

### B

### Appendix: Preliminary Lemma and Proof of Theorem 1

The proof of Theorem 1 closely follows the proof of Theorem 3 in BP by adapting it to the

case of the Gamma kernel. We provide only the proof for the drift estimator _{b}_{b}(x), because
the one for the di¤usion estimator_{b}2_{b}(x)follows similar arguments. To establish the results in
Theorem 1, we need the following lemma.

Lemma 1. Under Assumptions 1 and 3 in Appendix A,

n 1 X i=1 KG(x=b+1;b)(Xi ) (Xi ) = Z T 0 KG(x=b+1;b)(Xs) (Xs)ds+oa:s:(1): (B1)

Proof of Lemma 1. Consider that

n 1
X
i=1
K_{G}_{(}_{x=b}_{+1}_{;b}_{)}(Xi ) (Xi )
Z T
0
K_{G}_{(}_{x=b}_{+1}_{;b}_{)}(Xs) (Xs)ds
n 1
X
i=0
Z (i+1)
i
K_{G}_{(}_{x=b}_{+1}_{;b}_{)}(Xi ) KG(x=b+1;b)(Xs) (Xi )ds
+
n 1
X
i=0
Z (i+1)
i
K_{G}_{(}_{x=b}_{+1}_{;b}_{)}(Xs)f (Xs) (Xi )gds + KG(x=b+1;b)(X0) (X0)
A1(x) +A2(x) +Oa:s:( ); (B2)

where the bound of the third term is obtained fromX0 =x0, Assumption 1 and boundedness

of the kernel. Observe that

A1(x)
n 1
X
i=0
Z (i+1)
i
K_{G}0 _{(}_{x=b}_{+1}_{;b}_{)} X~is jXs Xi j j (Xi )jds (B3)

forX~is on the line segment connectingXs and Xi . Since

kn;T max
i n _{i} _{s}sup_{(}_{i}_{+1)} jXs Xi j=Oa:s:
s
log 1
!
(B4)

(see p.267 in BP),X~is=Xs+oa:s:(1)uniformly over i n, and thus
A1 kn;T
n 1
X
i=0
Z (i+1)
i
K_{G}0 _{(}_{x=b}_{+1}_{;b}_{)}(Xs+oa:s:(1)) j (Xs+oa:s:(1))jds
= kn;T
Z T
0
K_{G}0_{(}_{x=b}_{+1}_{;b}_{)}(Xs+oa:s:(1)) j (Xs+oa:s:(1))jds
= kn;T
b
Z _{1}
0
b K_{G}0_{(}_{x=b}_{+1}_{;b}_{)}(u+oa:s:(1)) j (u)jLX(T; u)du: (B5)
Note that
K_{G}0_{(}_{x=b}_{+1}_{;b}_{)}(u) = x
b
ux=b 1exp ( u=b)
bx=b+1 _{(}_{x=b}_{+ 1)}
1
b
ux=bexp ( u=b)
bx=b+1 _{(}_{x=b}_{+ 1)}: (B6)

Then, by the properties of the Gamma function,

Z _{1}
0
b K_{G}0_{(}_{x=b}_{+1}_{;b}_{)}(u) du = x
Z _{1}
0
ux=b 1exp ( u=b)
bx=b+1 _{(}_{x=b}_{+ 1)}du+
Z _{1}
0
ux=bexp ( u=b)
bx=b+1 _{(}_{x=b}_{+ 1)}du
= x b
x=b _{(}_{x=b}_{)}
bx=b+1 _{(}_{x=b}_{+ 1)} + 1 = 1 + 1 = 2: (B7)

Combining this result with the continuity of ( ) and LX(T; ) and Assumption 3, we can

conclude that A1 is bounded by Oa:s: LX(_{b}T;x)

q

log 1 = oa:s:(1). Similarly, A2(x) is

bounded byOa:s: LX(_{b}T;x)

q

log 1 =oa:s:(1).

Proof of Theorem 1. Observe that

bb(x) (x) =
Pn 1
i=1 KG(x=b+1;b)(Xi )f (Xi ) (x)g
Pn 1
i=1 KG(x=b+1;b)(Xi )
+
Pn 1
i=1 KG(x=b+1;b)(Xi )
n_{X}
(i+1) Xi
(Xi )
o
Pn 1
i=1 KG(x=b+1;b)(Xi )
B (x) +V (x): (B8)

Using Lemma 1 above and Lemma 6 in BP, we have

B (x) =
RT
0 KG(x=b+1;b)(Xs)f (Xs) (x)gds+oa:s:(1)
RT
0 KG(x=b+1;b)(Xs)ds+oa:s:(1)
=
R_{1}
0 KG(x=b+1;b)(u)f (u) (x)gs(u)ds+oa:s:(1)
R_{1}
0 KG(x=b+1;b)(u)s(u)ds+oa:s:(1)
+ (s:o:); (B9)

where(s:o:)denotes the term that is of smaller order than the …rst term. Ignoring smaller order terms, taking a second-order Taylor expansion for (u)aroundu=x, and usingE( x x) =b

and E( x x)2 =b(x+ 2b) for x Gamma (x=b+ 1; b), we obtain the following

approxim-ation of the …rst term:

R_{1}
0 KG(x=b+1;b)(u)f (u) (x)gs(u)ds
R_{1}
0 KG(x=b+1;b)(u)s(u)ds
= 0(x) 1 +xs
0_{(}_{x}_{)}
s(x) +
x
2
00_{(}_{x}_{)} _{b}_{+}_{o}_{(}_{b}_{)}_{:} _{(B10)}

On the other hand, V (x) can be rewritten as

V (x) = Pn 1 i=1 KG(x=b+1;b)(Xi ) 1 R(i+1) i f (Xs) (Xi )gds Pn 1 i=1 KG(x=b+1;b)(Xi ) + Pn 1 i=1 KG(x=b+1;b)(Xi ) 1 R(i+1) i (Xs)dW s Pn 1 i=1 KG(x=b+1;b)(Xi ) V1(x) +V2(x); (B11)

whereV1(x) is bounded by Oa:s:

q

log 1 =oa:s:(1) following the same argument as in

the proof of Lemma 1 above. The leading variance term of the numerator of V2(x) can be

determined by
n 1
X
i=0
K_{G}2_{(}_{x=b}_{+1}_{;b}_{)}(Xi )
1 Z (i+1)
i
2_{(}_{X}
s)ds+oa:s:(1)
=
Z T
0

K_{G}2_{(}_{x=b}_{+1}_{;b}_{)}(Xs+oa:s:(1)) 2(Xs) +oa:s:(1) ds+oa:s:(1): (B12)

Ignoring smaller order terms, and de…ning Ab(x) = b 1 (2x=b+ 1)= 22x=b+1 2(x=b+ 1) ,

we have
Z T
0
K_{G}2_{(}_{x=b}_{+1}_{;b}_{)}(Xs) 2(Xs)ds
= Ab(x)
Z T
0
KG(2x=b+1;b=2)(Xs) 2(Xs)ds
= Ab(x)
Z T
0
K_{G}_{(2}_{x=b}_{+1}_{;b=}_{2)}(u) 2(u)LX(T; u)du; (B13)

where (see Chen, 2000)

Ab(x) =
( _{b} 1=2
2p x1=2 +o b
1=2 _{for interior} _{x}
b 1 _{(2 +1)}
22 +1 2_{( +1)} +o b 1 for boundaryx:
(B14)

Hence, for interiorx,
b1=2
Z T
0
K_{G}2_{(}_{x=b}_{+1}_{;b}_{)}(Xs) 2(Xs)ds
a:s:
!
2_{(}_{x}_{)}
2p x1=2LX(T; x): (B15)

It follows from the arguments in the proof of Theorem 3 in BP that

b1=4V2(x))M N 0;
2_{(}_{x}_{)}
2p x1=2_{L}
X(T; x)
: (B16)
Therefore,
q
b1=2_{L}
X(T; x)V2(x))N 0;
2_{(}_{x}_{)}
2p x1=2 : (B17)

The result for interior x can be established by combining (B8), (B10) and (B17) under the

assumption that b5=2LX(T; x) = Oa:s:(1) and then replacing LX(T; x) by LbX(T; x; b). The

### References

[1] Abramson, I. S. (1982): “On Bandwidth Variation in Kernel Estimates - A Square Root Law,”Annals of Statistics, 10, 1217 - 1223.

[2] Aït-Sahalia, Y. (1996a): “Nonparametric Pricing of Interest Rate Derivative Securities,”

Econometrica, 64, 527 - 560.

[3] Aït-Sahalia, Y. (1996b): “Testing Continuous Time Models of the Spot Interest Rate,”

Review of Financial Studies, 9, 385 - 426.

[4] Andrews, D. W. K. (1991): “Heteroskedasticity and Autocorrelation Consistent Covari-ance Matrix Estimation,”Econometrica, 59, 817 - 858.

[5] Arapis, M., and J. Gao (2006): “Empirical Comparisons in Short-Term Interest Rate Models Using Nonparametric Methods,”Journal of Financial Econometrics, 4, 310 - 345.

[6] Bandi, F. M. (2002): “Short-Term Interest Rate Dynamics: A Spatial Approach,”Journal of Financial Economics, 65, 73 - 110.

[7] Bandi, F. M., V. Corradi, and G. Moloche (2010): “Bandwidth Selection for Continuous-Time Markov Processes,” Unpublished manuscript.

[8] Bandi, F. M., and P. C. B. Phillips (2003): “Fully Nonparametric Estimation of Scalar Dif-fusion Models,”Econometrica, 71, 241 - 283.

[9] Bosq, D. (1998): Nonparametric Statistics for Stochastic Processes: Estimation and Pre-diction, 2nd edition. New York: Springer-Verlag.

[10] Bouezmarni, T., and J. V. K. Rombouts (2008): “Density and Hazard Rate Estimation for Censored and -Mixing Data Using Gamma Kernels,”Journal of Nonparametric Statist-ics, 20, 627 - 643.

[11] Bouezmarni, T., and O. Scaillet (2005): “Consistency of Asymmetric Kernel Density Es-timators and Smoothed Histograms with Application to Income Data,”Econometric The-ory, 21, 390 - 412.

[12] Broze, L., O. Scaillet, and J.-M. Zakoïan (1995): “Testing for Continuous-Time Models of the Short-Term Interest Rate,”Journal of Empirical Finance, 2, 199 - 223.

[13] Burman, P., E. Chow, and D. Nolan (1994): “A Cross-Validatory Method for Dependent Data,”Biometrika, 81, 351 - 358.

[14] Chapman, D. A., and N. D. Pearson (2000): “Is the Short Rate Drift Actually Nonlinear?,”

Journal of Finance, 55, 355 - 388.

[15] Chen, S. X. (2000): “Probability Density Function Estimation Using Gamma Kernels,”

Annals of the Institute of Statistical Mathematics, 52, 471 - 480.

[16] Cox, J. C., J. E. Ingersoll, Jr., and S. A. Ross (1985): “A Theory of the Term Structure of Interest Rates,”Econometrica, 53, 385 - 408.

[17] Durham, G. B. (2003): “Likelihood-Based Speci…cation Analysis of Continuous-Time Models of the Short-Term Interest Rate,”Journal of Financial Economics, 70, 463 - 487.

[18] Fan, J. (1993): “Local Linear Regression Smoothers and Their Minimax E¢ ciencies,”

Annals of Statistics, 21, 196 - 216.

[19] Fan, J., and I. Gijbels (1992): “Variable Bandwidth and Local Linear Regression Smooth-ers,”Annals of Statistics, 20, 2008 - 2036.

[20] Fan, J., and C. Zhang (2003): “A Reexamination of Di¤usion Estimators with Applications to Financial Model Validation,”Journal of the American Statistical Association, 98, 118 -134.

[21] Fé, D. (2010): “An Application of Local Linear Regression with Asymmetric Kernels to Regression Discontinuity Designs,” Economics Discussion Paper Series EDP-1016, Uni-versity of Manchester.

[22] Florens-Zmirou, D. (1993): “On Estimating the Di¤usion Coe¢ cient from Discrete Ob-servations,”Journal of Applied Probability, 30, 790 - 804.

[23] Franke, J., J.-P. Kreiss, and E. Mammen (2002): “Bootstrap of Kernel Smoothing in Nonlinear Time Series,”Bernoulli, 8, 1 - 37.

[24] Gasser, T., and H.-G. Müller (1979): “Kernel Estimation of Regression Functions,” in T. Gasser and M. Rosenblatt (eds.), Smoothing Techniques for Curve Estimation: Pro-ceedings of a Workshop Held in Heidelberg, April 2 - 4, 1979. Berlin: Springer-Verlag, 23 - 68.

[25] Gospodinov, N., and M. Hirukawa (2007): “Time Series Nonparametric Regression Using Asymmetric Kernels,” CIRJE Discussion Paper No. CIRJE-F-573, CIRJE, Faculty of Economics, University of Tokyo.

[26] Gouriéroux, C., and A. Monfort (2006): “(Non)Consistency of the Beta Kernel Estimator for Recovery Rate Distribution,” CREST Discussion Paper n 2006-31.

[27] Gustafsson, J., M. Hagmann, J. P. Nielsen, and O. Scaillet (2009): “Local Transformation Kernel Density Estimation of Loss Distributions,”Journal of Business & Economic Stat-istics, 27, 161 - 175.

[28] Györ…, L., W. Härdle, P. Sarda, and P. Vieu (1989): Nonparametric Curve Estimation from Time Series, Lecture Notes in Statistics, Springer-Verlag, Berlin.

[29] Hagmann, M., and O. Scaillet (2007): “Local Multiplicative Bias Correction for Asymmet-ric Kernel Density Estimators,”Journal of Econometrics, 141, 213 - 249.

[30] Hall, P., S. N. Lahiri, and J. Polzehl (1995): “On Bandwidth Choice in Nonparametric Regression with Both Short- and Long-Range Dependent Errors, ”Annals of Statistics, 23, 1921 - 1936.

[31] Härdle, W., and J. S. Marron (1991): “Bootstrap Simultaneous Error Bars for Nonpara-metric Regression,”Annals of Statistics, 19, 778 - 796.

[32] Jiang, G. J. (1998): “Nonparametric Modeling of U.S. Interest Rate Term Structure Dynamics and Implications on the Prices of Derivative Securities,”Journal of Financial and Quantitative Analysis, 33, 465 - 497.

[33] Jiang, G. J., and J. L. Knight (1997): “A Nonparametric Approach to the Estimation of Di¤usion Processes with an Application to a Short-Term Interest Rate Model,” Econo-metric Theory, 13, 615 - 645.

[34] Jones, C. S. (2003): “Nonlinear Mean Reversion in the Short-Term Interest Rate,”Review of Financial Studies,16, 793 - 843.

[35] Jones, M. C., and D. A. Henderson (2007): “Kernel-Type Density Estimation on the Unit Interval,”Biometrika, 24, 977 - 984.

[36] Kanaya, S., and D. Kristensen (2011): “Optimal Sampling and Bandwidth Selection for Nonparametric Kernel Estimators of Di¤usion Processes,” Unpublished manuscript.

[37] Kristensen, D. (2008): “Estimation of Partial Di¤erential Equations with Applications in Finance,”Journal of Econometrics, 144, 392 - 408.

[38] Kristensen, D. (2010): “Nonparametric Filtering of the Realized Spot Volatility: A Kernel-Based Approach,”Econometric Theory, 26, 60 - 93.

[39] Müller, H.-G. (1991): “Smooth Optimum Kernel Estimators near Endpoints,”Biometrika, 78, 521 - 530.

[40] Nicolau, J. (2003): “Bias Reduction in Nonparametric Di¤usion Coe¢ cient Estimation,”

Econometric Theory, 19, 754 - 777.

[41] Phillips, P. C. B., and J. Yu (2005): “Jackkni…ng Bond Option Prices,”Review of Financial Studies, 18, 707 - 742.

[42] Racine, J. S. (2000): “A Consistent Cross-Validatory Method for Dependent Data: hv -Block Cross-Validation,”Journal of Econometrics, 99, 39 - 61.

[43] Renault, O., and O. Scaillet (2004): “On the Way to Recovery: A Nonparametric Bias Free Estimation of Recovery Rate Densities,”Journal of Banking & Finance, 28, 2915 - 2931.

[44] Rice, J. (1984): “Boundary Modi…cation for Kernel Regression,”Communications in Stat-istics: Theory and Methods, 13, 893 - 900.

[45] Scaillet, O. (2004): “Density Estimation Using Inverse and Reciprocal Inverse Gaussian Kernels,”Journal of Nonparametric Statistics, 16, 217 - 226.

[46] Stanton, R. (1997): “A Nonparametric Model of Term Structure Dynamics and the Market Price of Interest Rate Risk,”Journal of Finance, 52, 1973 - 2002.

Table 1. Monte Carlo statistics of bond and option prices based on nonparametric (Gaussian NW and Gamma NW) estimators in the CIR model with = 0:2804; = 0:0541 and = 0:0876.

bond price call option price true price 82.425 1.889 Gaussian NW estimator median estimate 82.359 1.656 standard deviation 1.322 0.515 95% con…dence interval [80.420, 85.573] [1.014, 3.026] Gamma NW estimator median estimate 82.447 1.704 standard deviation 1.115 0.347 95% con…dence interval [80.665, 85.058] [1.133, 2.463]

Notes: The statistics in the table are computed from 5,000 samples generated from the CIR
model with_{4}= 1=12 andn= 600. The prices of a three-year zero-coupon discount bond and
a one-year European call option on a three-year bond with face value of $100, strike price of
$87 and initial interest rate of 7% are computed by Monte Carlo simulation.

Figure 1. Shapes of the Gamma kernel function with a …xed smoothing parameter (0.02) at various design points.

Figure 2. Estimates of the drift function ( )of the risk-free rate with the NW estimator based on the Gamma kernel (along with 95% bootstrap con…dence bands), NW estimator based on the Gaussian kernel and indirect inference estimator of the CIR model. The smoothing parameter for the Gamma kernel estimator is chosen byh-block cross-validation withh= ( n)1=4, where is de…ned in Section 2.3. The smoothing parameter for the Gaussian kernel estimator is set equal to four times the bandwidth parameter foriiddata (Stanton, 1997).

Figure 3. Estimate of the di¤usion function ( )of the risk-free rate with the NW estimator based on the Gamma kernel (along with 95% bootstrap con…dence bands). The smoothing parameter for the Gamma kernel estimator is chosen by h-block cross-validation with h = ( n)1=4, where is de…ned in Section 2.3.

Figure 4. Average Monte Carlo drift (Gamma and Gaussian NW) estimates from CIR model
with ( ; ; ) = (0:2804; 0:0541; 0:0876). The smoothing parameters are selected by h-block
cross validation withh= ( n)1=4_{, where} _{is de…ned in Section 2.3.}

Figure 5. 95% Monte Carlo con…dence intervals of the drift (Gamma and Gaussian NW)
estimates from CIR model with( ; ; ) = (0:2804;0:0541;0:0876)and smoothing parameters
selected byh-block cross validation withh= ( n)1=4_{, where} _{is de…ned in Section 2.3.}