• No results found

Reducing bias in nonparametric density estimation via bandwidth dependent kernels: L1 view

N/A
N/A
Protected

Academic year: 2020

Share "Reducing bias in nonparametric density estimation via bandwidth dependent kernels: L1 view"

Copied!
10
0
0

Loading.... (view fulltext now)

Full text

(1)

Munich Personal RePEc Archive

Reducing bias in nonparametric density

estimation via bandwidth dependent

kernels: L1 view

Mynbaev, Kairat and Martins-Filho, Carlos

KazakhBritish Technical University, University of Colorado

-Boulder

2016

Online at

https://mpra.ub.uni-muenchen.de/75902/

(2)

Reducing bias in nonparametric density estimation via bandwidth

dependent kernels:

L1

view

1

Kairat Mynbaev

International School of Economics Kazakh-British Technical University Tolebi 59

Almaty 050000, Kazakhstan

email: kairat [email protected]

and

Carlos Martins-Filho

Department of Economics IFPRI

University of Colorado 2033 K Street NW

Boulder, CO 80309-0256, USA & Washington, DC 20006-1002, USA email: [email protected] email: [email protected]

November, 2016

Abstract. We define a new bandwidth-dependent kernel density estimator that improves existing conver-gence rates for the bias, and preserves that of the variation, when the error is measured inL1. No additional

assumptions are imposed to the extant literature.

Keywords and phrases. Kernel density estimation, higher order kernels, bias reduction.

AMS-MS Classification. 62G07, 62G10, 62G20.

(3)

1

Introduction

Given a sequence ofn∈Nindependent realizations{Xj}n

j=1 of the random variableX, having densityf on

R, the Rosenblatt-Parzen kernel estimator (Rosenblatt (1956), Parzen (1962)) off is given by

fn(x) =

1

n

n

X

j=1

(ShnK)(x−Xj), (1.1)

whereShn is an operator defined by

(ShnK)(x) = 1

hn

K

x hn

, (1.2)

K is a kernel, i.e., a function onRsuch thatR

K(x)dx= 1 andhn>0 is a non-stochastic bandwidth such

thathn→0 asn→ ∞.1

One of the most natural and mathematically sound (Devroye and Gy¨orfi (1985), Devroye (1987)) criteria

to measure the performance offn as an estimator off is theL1distanceR|fn−f|. In particular, given that

this distance is a random variable (measurable function of{Xj}nj=1) it is convenient to focus onE

R

|fn−f|

,

whereE denotes the expectation taken using f. For this criterion, there is a simple bound (Devroye, 1987,

p. 31)

E

Z

|fn−f|

Z

|(f∗ShnK)−f|+E

Z

|fn−f∗ShnK|

,

where for arbitrary f, g ∈ L1, (f ∗g)(x) = R g(y)f(x−y)dy is the convolution of f and g. The term

R

|f∗ShnK−f|is called bias overRandE R|fn−f∗ShnK|is called the variation overR. There exists

a large literature devoted to establishing conditions on f and K that assure suitable rates of convergence

of the bias to zero as n→ ∞(see, inter alia, Silverman (1986), Devroye (1987) and Tsybakov (2009)). In

particular, ifK is of orders, i.e.,αj(K) = 0 forj = 1, ..., s−1 andαs(K)6= 0, where αj(K) =RtjK(t)dt

is thejth moment of K, and f has an integrable derivativef(s), then R

|f ∗ShnK−f| is of order O(hsn)

and this order cannot be improved, see, e.g., (Devroye, 1987, Theorem 7.2). In this note, we show that if in

(1.2) the kernel is allowed to depend onn, then the orderO(hs

n) can be replaced by the ordero(hsn), without

increasing the order of the kernel or the smoothness of the density. In addition, another result from Devroye

1Throughout this note, integrals are overR, unless otherwise specified.

(4)

(1987) states that ifKis a kernel of order greater thansand the derivativef(s)isa-Lipschitz then the bias

is of orderO(hs+a

n ). We achieve the same rate of convergence with kernels of orders.

2

Main results

LetL1andC denote the spaces of integrable and (bounded) continuous functions onRwith normskfk1=

R

|f|andkfkC= sup|f|, andβs(K) =R |t|s|K(t)|dt. Let{Kn}be a sequence of kernels and define

ˆ

fn(x) =

1

n

n

X

j=1

(ShnKn)(x−Xj).

In the following Theorem 1, the densityf has the same degree of smoothness and the kernelsKn are of the

same order as in (Devroye, 1987, Theorem 7.2), but the bias is of ordero(hs

n) instead ofO(hsn). This results

because the kernels depend onnand have “disappearing” moments of orders.

Theorem 1. Let {Kn} be a sequence of kernels of order s such that: 1. αs(Kn) → 0; 2. {usKn(u)} is

uniformly integrable. For allf with absolutely continuousf(s−1)andf(s)L

1, we havekf∗ShnKn−fk1=

o(hs n).

Proof. Note that sinceKn is a kernel

f ∗ShnKn(x)−f(x) =

Z

Kn(t)[f(x−hnt)−f(x)]dt. (2.1)

Sincef iss-times differentiable, by Taylor’s Theorem,

f(x−hnt)−f(x) = s−1

X

j=1

f(j)(x)

j! (−hnt)

j+Z x−hnt

x

(x−hnt−u)s−1

(s−1)! f

(s)(u)du.

Furthermore, given thatKn is of orders,

f∗ShnKn(x)−f(x) =

1 (s−1)!

Z Z x−hnt

x

(x−hnt−u)s−1f(s)(u)duKn(t)dt. (2.2)

Lettingλ=−u−x

hnt we have

Z x−hnt

x

(x−hnt−u)s−1f(s)(u)du= (−hnt)s

Z 1

0

(5)

Substituting (2.3) into (2.2) we obtain

f∗ShnKn(x)−f(x) =

(−hn)s

s!

Z Z 1

0

f(s)(x−hnλt)s(1−λ)s−1dλtsKn(t)dt. (2.4)

SinceR1

0(1−λ)s−1dλ= 1

s, we have that

(−hn)s

(s−1)!

Z Z 1

0

f(s)(x)(1−λ)s−1dλtsKn(t)dt=

(−hn)s

s! f

(s)(x)Z tsK

n(t)dt. (2.5)

Then, adding and subtracting (2.5) to the right-hand side of (2.4) gives

f∗ShnKn(x)−f(x) =

(−hn)s

s!

f(s)(x)αs(Kn) +

Z Z 1

0

[f(s)(x−hnλt)−f(s)(x)]s(1−λ)s−1dλtsKn(t)dt

.

Since f(s) L

1 we write its continuity modulus as ω(δ) = sup|t|≤δ

R

f(s)(x−t)−f(s)(x)

dx. It is

well-known (see properties M.2, M.6 and M.7 in Zhuk and Natanson (2003)) that

ω(δ)≤2 f

(s)

1, ωis nondecreasing and limδ→0ω(δ) = 0. (2.6)

Then,

kf ∗ShnKn−fk1≤ h s n s! f (s)

1|αs(Kn)|+ Z Z 1

0

Z f

(s)(x

−hnλt)−f(s)(x)

dx s(1−λ)

s−1

|tsKn(t)|dt

≤ h s n s! f (s)

1|αs(Kn)|+

Z 1

0

Z

ω(λhn|t|)|tsKn(t)|dt s(1−λ)s−1dλ

= h s n s!   f (s)

1|αs(Kn)|+

Z 1

0

  Z

|t|≤1

hn

+

Z

|t|>1

hn

ω(λhn|t|)|tsKn(t)|dt s(1−λ)s−1dλ

  (2.7) ≤ h s n s!   f (s)

1|αs(Kn)|+ω

p

hn

βs(Kn) + 2

f (s) 1 Z

|t|>1

hn

|tsKn(t)|dt

. (2.8)

Given thatαs(Kn)→0 as n→ ∞,{tsKn(t)} is uniformly integrable, which implies sup

n βs(Kn)<∞, and

using (2.6) and (2.8) we have

kf∗ShnKn−fk1=o(hsn). (2.9)

Remark 1. Kernel sequences {Kn} that satisfy the restrictions imposed by Theorem 1 can be easily

constructed. To this end, denote byBsthe space of functions with bounded normkKkBs=β0(K) +βs(K).

(6)

Take functionsK(0), K(s)∈ Bs such that

α0(K(0)) = 1, αj(K(0)) = 0 forj = 1, ..., s;αj(K(s)) = 0 forj= 0,1, ..., s−1, αs(K(s)) = 1. (2.10)

We define then-dependent kernel Kn=K(0)+hnK(s) with 0< hn ≤1. Note thatKn is a kernel of order

swithαs(Kn) =hn which tends to zero asn→ ∞. It is clear that any kernelK of order scan be written

as K = K(0)+αs(K)K(s), so that the conventional s-order kernels obtain from ours with αs(K) = hn.

Furthermore, it follows from (2.10) that{ts(K

(0)(t) +hnK(s)(t))}is uniformly integrable.

Now, to obtain K(0) andK(s), assume that for a nonnegative kernel Kwe haveβ2s(K)<∞. Then, we

can associate withK a symmetric matrix

As=

   

α0(K) α1(K) ... αs(K)

α1(K) α2(K) ... αs+1(K)

... ... ... ... αs(K) αs+1(K) ... α2s(K)

   

,

such that detAs 6= 0 (see Mynbaev et al. (2014)). For an arbitrary vector b ∈ Rs+1 let a = A−s1b and

define a polynomial transformation of K by (TaK)(t) = Psi=0aitiK(t). Then, we put K(0)=TaK with

b = (1,0, ...,0)′ and K

(s) = TaK with b = (0, ...,0,1)′, which satisfy (2.10). Thus, we have the following

corollary to Theorem 1.

Corollary 1. Let Kn =K(0)+hnK(s) where K(0) and K(s) are as defined in Remark 1. Then, for all f

with absolutely continuousf(s−1)andf(s)L

1, we have kf∗ShnKn−fk1=o(hsn).

Remark 2. IfKnare supported on [−M, M] for someM >0 and for alln, then in (2.7), instead of splitting

R=

|t| ≤1/√hn ∪|t|>1/√hn we can useR={|t| ≤M} ∪ {|t|> M}and then instead of (2.8) we get

kf∗ShnKn−fk1≤

hs n

s!

h f

(m)

1|αs(Kn)|+ω(hnM)βs(Kn)

i

.

Hence, selecting {Kn} in such a way that αs(Kn) = O(ω(hn)), sup

n βs(Kn) < ∞ and using the fact that

ω(hnM)≤(M + 1)ω(hn)2 we get a result that is more precise than (2.9), i.e.,

kf∗ShnKn−fk1=O(hsnω(hn)).

(7)

Remark 3. By Young’s inequality, the variation of ˆfn using Kn =K(0)+hnK(s)is such that

E

Z

|fˆn−f∗ShnKn| ≤ E

Z

|fˆn−f∗ShnK(0)|+hn

Z

|f ∗ShnK(s)|

≤ E

Z

|fˆn−f∗ShnK(0)|+hn

Z

|f|

Z

|K(s)|.

Letting fn(0) be the estimator in (1.1) with K = K(0), we have ER|fˆn−f ∗ShnK(0)| ≤ ER|fn(0)−f ∗

ShnK(0)|+hnR|f|R|K(s)|. Hence,

E

Z

|fˆn−f ∗ShnKn| ≤E

Z

|f(0)

n −f∗ShnK(0)|+ 2hn

Z

|f|

Z

|K(s)|.

Since,hn→0 asn→ ∞, the variation of ˆfn is asymptotically bounded by the variation of the conventional

estimatorfn usingK(0), i.e.,ER|fn(0)−f ∗ShnK(0)|.

Under the assumptions that f has a variance,R

(1 +t2)(K

(0)(t))2dt <∞, (Devroye, 1987, Theorem 7.4)

showed thatER

|fn(0)−f∗ShnK(0)|=O((nhn)−1/2). Thus,

p

nhnE

Z

|fˆn−f ∗ShnKn|= (1 + (nh3n)1/2)O(1) =O(1),

where the last equality follows ifnh3

n ≤c <∞.

We now provide an analog for Theorem 7.1 in Devroye (1987). There, the bias orderO(hs+a) is achieved

for kernels with orders greater thans, while in the following theorems we obtain the same order of bias for

kernels of orders.

Theorem 2. Let{Kn}be a sequence of kernels of orderssuch that: 1. αs(Kn) =O(han); 2. sup

n βs+a(Kn)<

, for some a ∈ (0,1]. For f with absolutely continuous f(s−1) and f(s) L

1 assume that for some

0< c <∞

ω(δ)≤c|δ|a. (2.11)

Then, kf∗ShnKn−fk1=O(hs+an ).

Proof. As in the proof of Theorem 1, we have

kf∗ShnKn−fk1≤

hs n

s!

f

(s)

1|αs(Kn)|+ Z 1

0

Z

ω(λhn|t|)s(1−λ)s−1|tsKn(t)|dtdλ

(2.12)

(8)

By (2.11)

kf∗ShnKn−fk1≤

hs n

s!

f

(s)

1|αs(Kn)|+ch

a n

Z Z 1

0

s(1−λ)s−1

ts+aKn(t)

dt

.

Hence, under conditions 1. and 2. in the statement of the theorem we havekf∗ShnKn−fk1=O(h s+a n ).

Remark 4. Practitioners may find condition (2.11) too general, preferring more primitive conditions on

f(s). To this end, we say that a functiongdefined onRsatisfies aglobal Lipschitz conditionof ordera(0,1]

if there exist positive functionsl(x), r(x) such that

|g(x−h)−g(x)| ≤l(x)|h|a for|h| ≤r(x), x∈R. (2.13)

The function l is called a Lipschitz constant and the function r is called a Lipschitz radius. The class

Lip(a, δ), forδ >1, is defined as the set of functionsg which satisfy (2.13) withl andrsuch that

Z

(l(x) +r(x)−δ)dx <∞. (2.14)

In the next lemma we give two sufficient sets of conditions forg∈Lip(a, δ). In the first casegis compactly

supported, and in the second it is not.

Lemma 1. a) Supposeg has compact support,suppg⊆[−N, N]for someN >0, andg satisfies the usual

Lipschitz condition |g(x−h)−g(x)| ≤ c|h|a for any x, h and some a (0,1]. Set l(x) = c, r(x) = 1for

|x|< N andl(x) = 0, r(x) =|x| −N for|x| ≥N. Then,g∈Lip(a, δ)with anyδ >1.

b) Suppose that|g(1)(t)| ≤ce−|t|, tR. Let l(x) =cexp(−|x|/2 + 1/2), r(x) = (1 +|x|)/2, xR. Then,

g∈Lip(1, δ)with anyδ >1.

Proof. a) If|x|< N, then|g(x−h)−g(x)| ≤ |h|al(x) for allh(and not only for|h| ≤r(x)). If|x| ≥N, then

|h| ≤r(x) =|x| −N implies|x−h| ≥ |x| − |h| ≥N, so that|g(x−h)−g(x)|= 0 =|h|al(x) for|h| ≤r(x).

b) Let|x| ≥1. We have

|g(x−h)−g(x)| ≤ |h| sup

|t−x|≤|h||

g(1)(t)| ≤ |h|c sup

|t−x|≤|h|

e−|t|. (2.15)

|t−x| ≤ |h| ≤r(x) = (1 +|x|)/2 implies|t|=|x+t−x| ≥ |x| − |t−x| ≥ |x|/2−1/2≥0 and (2.15) gives

(9)

that by (2.15),|g(x−h)−g(x)| ≤ |h|c≤ |h|cexp(1/2− |x|/2) =|h|l(x) for allhand not only for|h| ≤r(x).

Condition (2.14) is obviously satisfied in both cases.

By part a) of Lemma 1, compactly supported densities with derivative f(s) that satisfies the usual a

-Lipschitz condition are such that f(s) Lip(a, δ) for any δ >1. This corresponds to the case treated in

Theorem 7.1 of Devroye (1987). Part b) shows that for densities with unbounded domains, not covered by

Theorem 7.1, iff(s)(x) has derivative that decays exponentially as |x| → ∞, then f(s) Lip(1, δ) for any

δ >1. Next we provide a version of Theorem 2 for densities with derivativef(s)Lip(a, δ).

Theorem 3. Suppose that the density f is such that its derivative f(s) belongs to L

1, C (the respective

norms are finite) andLip(a, δ). Let{Kn}be a sequence of kernels of orderssuch that: 1. αs(Kn) =O(han);

2. sup

n max{βs+a(Kn), βs+δ(Kn)}<∞. Then,kf∗ShnKn−fk1=O(h s+a n ).

Proof. As in the proof of Theorem 1, we have

kf∗ShnKn−fk1≤

hs n s! f (s)

1|αs(Kn)|+ Z 1

0

Z Z f

(s)(x

−hnλt)−f(s)(x)

|t

sK

n(t)|dt dx s(1−λ)s−1dλ

.

LetI(x) =R

f(s)(x−hnλt)−f(s)(x)

|tsKn(t)|dtand note that sincef(s)∈Lip(a, δ)

I(x) =

  

Z

λhn|t|≤r(x)

+

Z

λhn|t|>r(x)

   f

(s)(x

−hnλt)−f(s)(x)

|t

sK n(t)|dt

Z

λhn|t|≤r(x)

l(x)λahan|t|a+s|Kn(t)|dt+

Z

λhn|t|>r(x)

|f(s)(x−hnλt)−f(s)(x)||tsKn(t)|dt

≤hanβs+a(Kn)l(x) +

Z

λhn|t|>r(x)

|f(s)(x−hnλt)−f(s)(x)||tsKn(t)|dt.

LettingI1(x) =

R

λhn|t|>r(x)

|f(s)(xh

nλt)−f(s)(x)||tsKn(t)|dtwe have

I1(x)≤

Z

λhn|t|>r(x)

1

|t|δ

|f(s)(x−hnλt)|+|f(s)(x)|

|t|s+δ|Kn(t)|dt.

Noting that|t|−δ < λδhδ

nr(x)−δ and given thatkf(s)kC <∞, we obtain I1(x)≤ 2kf(s)kCλ δ

hδ n

r(x)δβs+δ(Kn).

Consequently,

I(x)≤hanmax{βs+a(Kn), βs+δ(Kn)}

l(x) + 2hδn−akf(s)kC

1

r(x)δ

. (2.16)

(10)

SinceR1

0 s(1−λ)

s−1ds=1

s and given (2.14)

kf∗ShnKn−fk1≤h s n

s!

f

(s)

1|αs(Kn)|+

1

sh

a

nmax{βs+a(Kn), βs+δ(Kn)}

Z

l(x) + 2kf(s)kC 1

r(x)δ

dx

.

(2.17)

Thus, using conditions 1. and 2. in the statement of the theorem, we havekf∗ShnKn−fk1=O(hs+an ).

Remark 5. As in the case of Theorem 1, Theorems 2 and 3 do not address the construction of the kernel

sequence{Kn}. The following corollary to Theorem 3 shows thatKn =K(0)+hanK(s) is a suitable kernel

sequence, whereK(0) andK(s)are as defined above.

Corollary 2. Suppose the density f is such that its derivative f(s) belongs to L

1, C and to Lip(a, δ),

where a ∈ (0,1], δ > 1. Let K(0), K(s) satisfy (2.10) and belong to the intersection Bs+a∩ Bs+δ. Put

Kn=K(0)+hanK(s),0≤hn≤1.ThenKnis a kernel of ordersforhn>0andkf∗ShnKn−fk1=O(hs+an ).

The condition K(0) ∈ Bs+a and the definition Kn = K(0)+hanK(s) can be replaced by K(0) ∈ Bs+1 and

Kn=K(0)+hnK(s), respectively, without affecting the conclusion.

References

Devroye, L., 1987. A Course in density estimation. Birkh¨auser, Boston, MA.

Devroye, L., Gy¨orfi, L., 1985. Nonparametric density estimation: theL1 view. John Wiley and Sons, New

York, NY.

Mynbaev, K., Nadarajah, S., Withers, C., Aipenova, A., 2014. Improving bias in kernel density estimation. Statistics and Probability Letters 94, 106–112.

Parzen, E., 1962. On estimation of a probability density and mode. Annals of Mathematical Statistics 33, 1065–1076.

Rosenblatt, M., 1956. Remarks on some nonparametric estimates of a density function. Annals of Mathe-matical Statistics 27, 832–837.

Silverman, B. W., 1986. Density estimation for statistics and data analysis. Chapman and Hall, London.

Tsybakov, A. B., 2009. Introduction to nonparametric estimation. Springer-Verlag, New York, NY.

References

Related documents

Figure 6.8: XPS spectra for the cit:AGZO material, including a) the Zn 2p region (as- synthesised), b) the Zn 2p region (post-heat treatment), c) the O 1s region (as-synthesised),

Reierson, “Impact of types on essentially typeless problems in GP.” In: Genetic Programming 1998: Proceedings of the Third Annual Conference, July 1998..

To combat customer churn within the mobile telecommunications industry, the Alacer Group used big data combined with an understanding of profitability to help a Tier 1

In that case the worsening of the state of the fossil-fuel based technology may decrease the supply of energy enough so that fossil fuel will be reallocated from both the present

However, the DWS technique has an important limitation: it is specifically thought to typical data warehouses organized in an ideal star schema consisting of a large fact

In the current study, which was set in a community mental health center serving a largely Latino area of Los Angeles, we engaged family members to facilitate the acquisition

While Mizrahi and Ness-Weisman (2007) give their own definition which is in line with the ability of the internal auditor intervention in prevention and correction of deficiencies

Macroeconomics and monetary economics, Public economics, Labor economics, International economics, Economic systems and economic structures, Methodology and history of