$f_v$ is usually unknown. In this section, we discuss the case in which $f_v$ is estimated nonparametrically.
We consider here the modified estimator (1.2.8) with estimated $\hat{f}_v$;
bn=
$$\hat{f}_v(v_i) = \frac{1}{n-1}\sum_{j=1}^{n}\frac{1}{h}K\!\left(\frac{v_j - v_i}{h}\right).$$
$K(\cdot)$ is the standard kernel function defined in Assumption 37.
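The kernel density estimate evaluated at the sample points can be sketched as follows. This is only an illustration under stated assumptions: the Gaussian kernel stands in for the kernel of Assumption 37 (which is not reproduced here and may be of higher order), the term j = i is excluded to match the 1/(n-1) normalization, and the names `gaussian_kernel` and `loo_density_at_sample` are hypothetical.

```python
import numpy as np

def gaussian_kernel(u):
    # Second-order Gaussian kernel; an assumption, not the kernel of Assumption 37.
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def loo_density_at_sample(v, h, kernel=gaussian_kernel):
    """Kernel density estimate at each sample point v_i, leaving out j = i."""
    v = np.asarray(v, dtype=float)
    n = v.size
    # Entry (i, j) holds (v_j - v_i) / h.
    diffs = (v[None, :] - v[:, None]) / h
    weights = kernel(diffs) / h
    # Drop the own-observation term so that each point uses the other n - 1 draws.
    np.fill_diagonal(weights, 0.0)
    return weights.sum(axis=1) / (n - 1)

rng = np.random.default_rng(0)
v = rng.standard_normal(500)
f_hat = loo_density_at_sample(v, h=0.3)
```

The pairwise matrix costs O(n^2) memory, which is acceptable for a sketch but would need binning or a tree-based method for large samples.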
The estimation of $f_v$ introduces some new problems: $f_v$ must be estimated on the expanding sets $[\tau_0, \tau_n]$, and the estimator now needs a linear representation. As shown in Wooldridge (2007), Hirano, Imbens, and Ridder (2003), Magnac and Maurin (2007), and many others, the estimator with estimated $f_v$ can have a smaller variance than the one using the true $f_v$. This is also the case here; however, the rate of convergence remains the same. For convenience of inference, we prove the consistency of the bootstrap when we are in the nice world. Note that the convergence rate in this model is slower than root-n.
1.3.1 The Consistency of $\hat{f}_v(v)$
To obtain a pointwise consistent estimate $\hat{f}_v(v)$, we need the number of observations around the point $v$ to tend to infinity. We know that $f_v(\tau_n) \asymp \inf_{v \in [\tau_0, \tau_n]} f_v(v)$ for $n$ large enough. So if $\hat{f}_v(\tau_n)$ is consistent for $f_v(\tau_n)$, the pointwise consistency of $\hat{f}_v(v)$ on the whole interval $[\tau_0, \tau_n]$ should hold.
The standard nonparametric analysis (e.g., Li and Racine 2007) gives that
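$$E\left[\hat{f}_v(v)\right] = f_v(v) + \frac{\kappa_q}{q!} f_v^{(q)}(v)\, h^q, \tag{1.3.2}$$
$$\operatorname{var}\left[\hat{f}_v(v)\right] = \frac{\kappa f_v(v)}{nh}, \tag{1.3.3}$$
where $\kappa_q \equiv \int v^q K(v)\,dv$, $\kappa \equiv \int K(v)^2\,dv$, and $q$ is the order of the kernel function $K$. From equations (1.3.2) and (1.3.3),
$$\frac{\hat{f}_v(v)}{f_v(v)} = 1 + \frac{\kappa_q}{q!}\,\frac{f_v^{(q)}(v)\, h^q}{f_v(v)} + O_p\!\left(\sqrt{\frac{\kappa}{nh f_v(v)}}\right). \tag{1.3.4}$$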
To control the variance term, we need $nhf_v(\tau_n)$, the effective number of observations used to estimate $f_v(\tau_n)$, to tend to infinity. The bias term can be controlled by using a higher-order kernel function with a bandwidth $h \asymp n^{-c}$ for some $c > 0$.
The optimal convergence rate condition (1.2.15) and the tail condition (1.2.14) imply that $n f_v(\tau_n) \to \infty$. For the consistency of $\hat{f}_v(v)$ on $[\tau_0, \tau_n]$, we need a slightly stronger condition:
$$n^{1-c_h} f_v(\tau_n) \to \infty, \tag{1.3.5}$$
for some $0 < c_h < 1$. The optimal convergence rate condition remains the same:
$$\frac{1}{n}\int_{\tau_0}^{\tau_n} \frac{E\left(Y^2 D \mid v\right)}{f_v(v)}\,dv = \operatorname{cov}(Y, U)^2. \tag{1.3.6}$$
However, conditions (1.3.5) and (1.3.6) together place a more restrictive condition on $f_v(v)$:
$$\left(\int_{\tau_0}^{\tau_n} \frac{1}{f_v(v)}\,dv\right)^{1-c_h} f_v(\tau_n) \to \infty, \tag{1.3.7}$$
for some $0 < c_h < 1$. This is the new and stronger tail condition needed to be in the nice world with the estimated instead of the true density. Condition (1.3.7) rules out $f_v(v) \asymp \exp(-v^c)$ for $c < 1$ in Example 2. This is because the tail of such an $f_v(v)$ is too thin to ensure the consistency of $\hat{f}_v$ on the entire expanding sets if we choose $h \asymp n^{-c}$ for some $c > 0$.
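To see how condition (1.3.7) separates thin tails from thick ones, the sketch below evaluates its left-hand side in closed form for two illustrative tails. Everything specific here is an assumption made for illustration: the lower trimming point is set to 0, the trimming point is called `tau`, the thin tail is the unnormalized $\exp(-\sqrt{v})$, the thick tail is the density $(1+v)^{-2}$ on $[0, \infty)$, and $c_h = 0.2$.

```python
import math

def lhs_thin_tail(tau, c_h):
    """Left-hand side of (1.3.7) for the unnormalized tail f(v) = exp(-sqrt(v))."""
    s = math.sqrt(tau)
    # Closed form of the integral of exp(sqrt(v)) over [0, tau].
    integral = 2.0 * ((s - 1.0) * math.exp(s) + 1.0)
    return integral ** (1.0 - c_h) * math.exp(-s)

def lhs_thick_tail(tau, c_h):
    """Left-hand side of (1.3.7) for the density f(v) = (1 + v)^(-2) on [0, inf)."""
    # Closed form of the integral of (1 + v)^2 over [0, tau].
    integral = ((1.0 + tau) ** 3 - 1.0) / 3.0
    return integral ** (1.0 - c_h) * (1.0 + tau) ** (-2.0)

taus = [100.0, 400.0, 1600.0, 6400.0]
thin = [lhs_thin_tail(t, 0.2) for t in taus]    # shrinks toward zero
thick = [lhs_thick_tail(t, 0.2) for t in taus]  # keeps growing
```

For the thin tail the expression behaves like $\exp(-c_h\sqrt{\tau})$ up to a polynomial factor, so it vanishes for every $c_h > 0$; for the polynomial tail it grows like $\tau^{3(1-c_h)-2}$, which diverges whenever $c_h < 1/3$.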
Assumption 11 (Restriction on $f_v$) For $\tau_n$ chosen from condition (1.3.6), $\frac{f_v(v+h)}{f_v(v)} = 1 + o(1)$ for $v \in [\tau_0, \tau_n]$, where $h$ is the bandwidth used in the kernel function, $h \asymp n^{-c}$ for some $c > 0$.
Assumption 11 is needed for the consistency of $\hat{f}_v(v)$. Intuitively, it says that the density at the observations used in estimation should be close to the density being estimated. It is not hard to verify that the $f_v$ in Lemma 1.2.10 satisfy Assumption 11, so it is reasonable to impose this assumption.
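Lemma 1.3.1 For $\tau_n$ chosen from condition (1.3.6), under Assumption 11, if $h \asymp n^{-c_h}$ for some $0 < c_h < 1$, using the kernel defined in Assumption 37 with $q > \frac{1-c_h}{c_h}$,
$$\sup_{v \in [\tau_0, \tau_n]} \left|\hat{f}_v(v) - f_v(v)\right| = O_p\!\left(\left(\frac{\ln n}{nh}\right)^{1/2}\right).$$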
Note that condition (1.3.6) can possibly give $\tau_n$ growing as fast as $n^{1/2}$ if the tail of $f_v(v)$ is thick enough. Hansen (2008) also obtains the uniform convergence rate of $\sup_{v \in [\tau_0, \tau_n]} \left|\hat{f}_v(v) - f_v(v)\right|$ on expanding sets. However, his result does not cover ours, because our $\tau_n$ may go to infinity faster.
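1.3.2 The First-Order Asymptotics
We consider the first-order asymptotics of $\frac{1}{n}\sum_{i=1}^{n}\hat{\phi}_{ni}$. To simplify notation, let mni DiTniYi n E(Un); then $\phi_{ni} \equiv \frac{m_{ni}}{f_v(v_i)}$ and $\hat{\phi}_{ni} \equiv \frac{m_{ni}}{\hat{f}_v(v_i)}$. Observe that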
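$$\hat{\phi}_{ni} = \frac{m_{ni}}{\hat{f}_v(v_i)} = \frac{m_{ni}}{f_v(v_i)} + \frac{m_{ni}\left[f_v(v_i) - \hat{f}_v(v_i)\right]}{f_v^2(v_i)} + \frac{m_{ni}\left[f_v(v_i) - \hat{f}_v(v_i)\right]^2}{f_v^2(v_i)\,\hat{f}_v(v_i)}, \tag{1.3.8}$$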
where the first two terms on the right-hand side are the influence terms and can be analyzed using standard U-statistic theory, and the last term is the residual term, which is asymptotically negligible.
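Decomposition (1.3.8) is an exact algebraic identity, not an approximation, which a quick numerical check makes concrete. The sketch below uses simulated stand-ins: `m`, `f`, and `f_hat` are hypothetical arrays playing the roles of the numerator, the true density, and its estimate at the sample points, with the estimate perturbed by up to 20 percent.

```python
import numpy as np

rng = np.random.default_rng(1)
m = rng.normal(size=1000)                              # stand-in for the numerator
f = rng.uniform(0.1, 2.0, size=1000)                   # stand-in for the true density
f_hat = f * (1 + rng.uniform(-0.2, 0.2, size=1000))    # perturbed "estimate"

lhs = m / f_hat
# First two terms: linear in the estimation error f - f_hat.
influence = m / f + m * (f - f_hat) / f**2
# Last term: quadratic in the estimation error.
residual = m * (f - f_hat) ** 2 / (f**2 * f_hat)

max_gap = np.max(np.abs(lhs - (influence + residual)))
```

Because the residual is quadratic in the estimation error, it is smaller than the linear correction term whenever the relative error of the density estimate is below one, which is what uniform consistency delivers.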
With the uniform convergence of $\hat{f}_v(v)$ over the expanding sets, the following theorem gives the linear representation by applying the standard U-statistic technique (see Powell et al. 1989) to the influence term and showing that the residual term is asymptotically negligible.
Theorem 1.3.2 Suppose $f_v(v)$ satisfies condition (1.3.7). Let Assumption 3, 35 v 11, 37 hold. For $\tau_n$ chosen from condition (1.3.6), we set h = n ch; 0 < ch ch; and q > 1 cc h
where the influence term is asymptotically normal and achieves the fastest rate of convergence, and
q 2 n
nBn 1:
By Theorem 1.3.2 and for the same reason as in Corollary 1.2.12, we have the following Corollary.
Corollary 1.3.3 Suppose all assumptions in Theorem 1.3.2 hold. Then
bn E (Y ) Bn= 1 where the influence term is asymptotically normal and achieves the fastest rate of convergence, and
q 2
n
nBn 1:
Proof. It is not hard to see that the Lindeberg condition (1.2.10) also holds for $\frac{1}{n}\sum_{i=1}^{n}\hat{\phi}_{ni}$. The rest of the proof follows from Theorem 1.3.2 and the delta method.
The variance here is smaller than that in Corollary 1.2.12 with the same degree of trimming, confirming previous results. However, the convergence rate remains the same.
1.3.3 Bootstrapping the Estimator
Suppose we have data $\{z_i\}_{i=1}^{n}$ and a statistic $\varrho$ formed from $\{z_i\}_{i=1}^{n}$. The bootstrap randomly generates a series $\{z_i^*\}_{i=1}^{n}$ many times according to the empirical distribution of the original series $\{z_i\}_{i=1}^{n}$, and then computes a new statistic $\varrho^*$ based on $\{z_i^*\}_{i=1}^{n}$. The distribution of $\varrho^*$ is used to approximate the distribution of $\varrho$. The consistency of the bootstrap has been discussed intensively in the literature. For a comprehensive review, see Horowitz (2001) and references therein.
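The resampling scheme just described can be sketched in a few lines. This is a generic nonparametric bootstrap, illustrated on the sample mean (a regular root-n statistic, not the estimator of this section); the function name `bootstrap_statistic`, the number of replications, and the simulated data are all assumptions made for illustration.

```python
import numpy as np

def bootstrap_statistic(z, statistic, n_boot=999, seed=0):
    """Recompute `statistic` on n_boot resamples drawn from the empirical distribution of z."""
    rng = np.random.default_rng(seed)
    z = np.asarray(z)
    n = z.shape[0]
    draws = np.empty(n_boot)
    for b in range(n_boot):
        # Each original observation is picked with probability 1/n, with replacement.
        idx = rng.integers(0, n, size=n)
        draws[b] = statistic(z[idx])
    return draws

rng = np.random.default_rng(42)
z = rng.exponential(size=200)
draws = bootstrap_statistic(z, np.mean)
se_boot = draws.std(ddof=1)   # bootstrap standard error of the sample mean
```

The spread of `draws` around the original statistic serves as the approximation to its sampling distribution; whether that approximation is valid for a given estimator is exactly what a bootstrap consistency result has to establish.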
Estimator (1.3.1) with a nonparametrically estimated component is essentially a U-statistic.
After some transformation, equation (1.3.8) becomes
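$$\frac{1}{n}\sum_{i=1}^{n}\hat{\phi}_{ni} = \frac{1}{n}\sum_{i=1}^{n}\frac{2m_{ni}}{f_v(v_i)} - \frac{1}{n(n-1)}\sum_{i=1}^{n}\sum_{j=1, j\neq i}^{n} Q_n(z_i, z_j) + \frac{1}{n}\sum_{i=1}^{n}\frac{m_{ni}\left[f_v(v_i) - \hat{f}_v(v_i)\right]^2}{f_v^2(v_i)\,\hat{f}_v(v_i)}, \tag{1.3.11}$$
where $Q_n(z_i, z_j) \equiv \frac{1}{2}\left[\frac{m_{ni}}{f_v^2(v_i)} + \frac{m_{nj}}{f_v^2(v_j)}\right]\frac{1}{h}K\!\left(\frac{v_j - v_i}{h}\right)$, and $z$ denotes all the variables involved.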
The bootstrap for U-statistics was first discussed by Bickel and Freedman (1981), who give conditions for the bootstrap to work. One condition is that the second moment of $Q_n(z_i, z_j)$ is uniformly bounded, which is not the case here. Chen, Linton, and Van Keilegom (2003) show that the bootstrap works for semiparametric estimators when the criterion function is not smooth, but their results are for the regular (root-n) case. We therefore need to show that the bootstrap works for estimator (1.3.1).
For notation, we let variables with superscript $*$ denote the newly generated variables drawn from the empirical distribution of $\{z_i\}_{i=1}^{n}$ with mass $\frac{1}{n}$ on each $z_i$; i.e., $\{z_i^*\}_{i=1}^{n}$ and $\hat{\phi}_{ni}^*$ are the newly generated counterparts of $\{z_i\}_{i=1}^{n}$ and $\hat{\phi}_{ni}$, respectively.
The theorem below says that the bootstrap indeed works for our estimator when we are in the nice world. The proof is tedious, but its idea is simple: we follow the standard proof of the consistency of the bootstrap for U-statistics while showing that the residual terms are asymptotically negligible, as in Section 1.3.2.
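Theorem 1.3.4 Under the same conditions as in Theorem 1.3.2, if the bootstrap series $\{z_i^*\}_{i=1}^{n}$ is drawn from the empirical distribution of $\{z_i\}_{i=1}^{n}$ with mass $\frac{1}{n}$ on each $z_i$, then
$$\sqrt{\frac{n}{E\left\{\left[\phi_{ni} - E\left(\phi_{ni} \mid v_i\right)\right]^2\right\}}}\left(\frac{1}{n}\sum_{i=1}^{n}\hat{\phi}_{ni}^* - \frac{1}{n}\sum_{i=1}^{n}\hat{\phi}_{ni}\right) \xrightarrow{d} N(0, 1).$$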