Convergence Rates of Density Estimation in Besov Spaces

(1)

doi:10.4236/am.2011.210175 Published Online October 2011 (http://www.SciRP.org/journal/am)

Convergence Rates of Density Estimation in Besov Spaces

Huiying Wang

Department of Applied Mathematics, Beijing University of Technology, Beijing, China E-mail: b200806005@emails.bjut.edu.cn

Received July 29, 2011; revised August 23, 2011; accepted August 30, 2011

Abstract

The optimality of a density estimation on Besov spaces ,

 

s r q

B R for the risk was established by Donoho, Johnstone, Kerkyacharian and Picard (“Density estimation by wavelet thresholding,” The Annals of Statistics, Vol. 24, No. 2, 1996, pp. 508-539). To show the lower bound of optimal rates of conver-gence

p L



, ,

s

n r q



R B p , they use Korostelev and Assouad lemmas. However, the conditions of those two lemmas are difficult to be verified. This paper aims to give another proof for that bound by using Fano’s Lemma, which looks a little simpler. In addition, our method can be used in many other statistical models for lower bounds of estimations.

Keywords:Optimal Rate of Convergence, Density Estimation, Besov Spaces, Wavelets

1. Introduction

Wavelet analysis has many applications, one of which is to estimate an unknown density function based on inde-pendent and identically distributed (i.i.d.) random

sam-ples. Let be a probability measurable space

and 1

( , ,  P) , , _n

X  X be i.i.d. random variables with an

un-known density function f. We use to denote the

expectation of a random variable X. The sequence

 

E X



,



: inf sup



n

n n _p



f _{f V}

R V p E f f



  is called optimal rate

of convergence on the functional class V for the Lp risk.

Here, fn is an arbitrary estimator of f with n i.i.d.

ran-dom samples. Kerkyacharian and Picard [1] study

n when V is a Besov space with matched case.

Donoho, Johnstone, Kerkyacharian and Picard [2] con-sider unmatched cases. In fact, they show the optimal

convergence rates for Besov class



,p



R V

r

L B_{r q}s_, and Lp

risk









 

1/ 1/ 2 1/ 1 ,

2 1

ln , ,

2 1 ,

, .

2 1 s r p

s r s

n r q s

s

p

n n r

s R B p

p

n r

s

   

 

 _

 _

 

 _

 _



 _ _(1.1)

To show the lower bound of (1.1), authors of [2,3] use Korostelev and Assouad lemmas. However, the condi-tions of those two lemmas are difficult to be verified. In this small paper, we give another proof for the lower

bound of (1.1) by using Fano’s lemma [4]. It should be pointed out that Fano’s lemma can be used to a variety of statistical models, see [5-7].

As usual, L_p

  

R p1



denotes the classical Lebes-gue space on the real line R. In particular, L2

 

R



R

stands for the Hilbert space, which consists of all square

inte-grable functions. As a subspace of p , the Sobolev

space with an integer exponent k means



L

 

_:



_,  m

_{ }

_, _0,1, _,



_

₁

_

k

p p

W R  f f L R m  k p .

The corresponding norm

 

: .

k p

k

W p _p

f  f  f

Moreover, the Besov space ,





[3] ,

s p q

B R (1 p q ,

s n  and (0,1]) can be defined by

 



 



2



 





, , 2 , 2

n

s n j j

p q p p

j Z

B R f W R  f  l



  _q

with the associated norm









_{ }

,

2 ( )

: 2 , 2

s n

p q p _q

j n j

p

B W

l Z

f  f   f  ,

where

 









 

2

, : sup 2 2 .

p f t h t f x h f x h f x p

  _    

In general, it can be shown that compactly supported and

n times differentiable functions belong to ,

 

s p q

(2)

H. Y. WANG ₁₂₅₉

Where p and q are density functions of P Q respec-tively.

Lemma 1.2. (Fano’s Lemma, [4]) Let be

pr

,

0 s n and 1 p, q .

The Besov space can be discretized by the sequence norm of wavelet coefficients. Many useful wavelets are

generated by scaling functions. More precisely, if  is

a scaling function with

 

_k 2



2



,

k

x h x k

 



 

then

 

:

 

1 1k 2



2



k k

x



h  xk

   _ defines a

wave-let [3]. Clearly, when  is compactly supported and

continuous, the corresponding wavelet  has the same

properties. An orthonormal wavelet basis of L2

 

R is

generated from dilation and translation of a scaling func-tion and its corresponding wavelet, i.e.

 





 





0 0

0

2

,

: 2 2 ,

: 2 2 .

j j

j j 0

j k

jk

j j k Z

x x k

 

 

 

 _ _

 



 _{ }



Although wavelet basis are constructed for L2

 

R ,

most of them constitute unconditional bases for Lp

 

R .

A scaling function  is called t regular, if  has

con-tinuous derivatives of order t and its corresponding

wavelet  has vanishing moments of order t, i.e.

 

k

x x dx0,k0,1, ,t1



 .

The following lemma [3] plays important roles in this paper.

Lemma 1.1. Let  be a compactly supported, t regu-lar orthonormal scaling function with the corresponding wavelet  and0 s t. If f Lp

 

R , s0k : f,0k ,

: ,

jk jk and 1 ,

d  f   p q , then the following

two conditions are equivalent:

1) ,

 

s p q

fB R ;

2)

1 1 2 0

0

2 j s

p j

p p

j _q

s d

 _{ }     

 



 

 

 _ _  

 

 

.

Furthermore,

,

1 1 2 0

0

2

s p q

j s p

j

B p p

j _q

f s d

 _{ }     

 



 

 

  

 

 

 .

Before introducing Fano’s Lemma, we need the nota-tion of Kullback-Leilber distance [4]. Let P and Q with P

being absolutely continuous with respect to Q (denoted

by ). Then the Kullback-Leilber distance is

de-fined by

PQ



,



: ₀

 

ln

_{ }

 

d p q

p x

( , ,  P_k)

obability measurable spaces and Ak, k0,1,,m.

If Ak

,

K P Q p x x

q x

 



_

v

A 

 for kv, then with Ac standing for

the complement of A and



_k, _v



v

0

1 : inf m

v m k

K P P





,

m



 



 

1



1



, exp 3 .

2 m e m





0

sup _k _kc min k m

P A

 

  

 _ _



By Lemma 1.1 and 1.2, we can show the following re-sult:

Theorem 1.1. Let







, ,

s r q

f B R L with1r, q ,

1  p and sr1. If fn is an estimator of f with

n_{i.i.d. random samples, then}

 





 

,

1/ 1/ 2 1/ 1

2 1 ,

ln

sup max , ,

s r q

s r p s s r

s n p

f B R L

n

E f f n

     _ 

 

  

 _ _







n

 

  _

 

where







 



,

, , : , , s

r q

s s

r q r q B

B R L  f B R f L and f

x y means

has compact support}; The notation with a constant C.

rk 1.1. Note that

xCy Rema

1/ 1/  

2 1/ 1

ln

1/ 1/ 2 1/ 1 2 1 ln

max ,

s r p p

s s r n

n

  s r

s r

s n

n

  

n

 

 

   

  _{ }_ _

  

 

for

    

2 1

p r

s



 and for 2 1

p r

s





1/1/1/ 1

2 1 2 1

max ,

s r p

2

ln s r s s

s s

n n

n

 

 

 



 

  

 

 

.

Then theorem 1.1 is a reformulate of the lowe und in

(1.1). By using the idea of reference [5], we show this theorem in the next two sections.

irstly, we prove

n _

  

  

r bo

2. Proof of Theorem 1.1

F

 





1/ 1/

2 1/ 1

ln

. s

, , sup

s r q

r p s r n p

n f f

n

   

   _ _  



One need construct

f B R L

E



such that s



,



k r

k

g g B_,_q R L

and





1//1/ 1

.

2 1

ln sup

s r p r n k p

k n

  

 

Let

s

n E f g _ _ 



ono

be a compactly supported, regular and

orth rmal scaling function,





t ts

 be the corresponding

(3)

d gers. Th

N enotes the set of positive inte en there

ex-ists a compactly supported density function g (i.e.

 

0

g x  and



g

 

) satisfying

 

,

 

s r q

d 1

x x

g x B R and g x

 

_{ }0,l c00.



l l l, 2jl



. Then

Let elem





, 2 ,, 2j1 : 0,

j

 

ents in

the number of

j

 enote

ed by [

is 2j

5], one defines

1

 , d d by # _j 2j1.

 2 1/

Motivat aj: 2 j s 1/  r and

 

 

_

₂j

_

k j jk _k _l

g x :g x a x I _ ,k j

: 1 if 2

with

2 I

_

_k_ j_l

_



j

k l , else

_

_

2j : 0

k l

I _  .

Obvi-ously,

2jl

g g,



g x

 

x  

 d 1 and

 

s1/r

k

g x  for large implies

that k

0 2 0

c  _

  j 

j, which

g

h

is a density function e assumptions of

for each k.

By t , the wavelet  is

com-pported and pactly su

s r q

B

t times differentiable. Therefore,

  

R t s



   and ,

 

.

s

k r q

, g B R Because

 1/ 2 1/

2j s r 1

j

a    ,

,

s r q

j _{jk B}

a C and so is

, s r q k B g Hence,

due to Lemma 1.1.

 

,

s

k r q



,



g x B R L . Clearly,

 

2 1/ 1/

2 :

j

k k p k l p j jk p

j s p r j p

g g g g a

    

   

 



(2.1)

For k  k j

 

2jl due to



: 2 j s1/ 2 1/ 

j

a     r . Fur-

thermore, :

2

j k n k p

A  f g  

  satisfies AkAk 

. Recall that 1. By Lemm

for kk # j 2j a 1.2,

 

2

1 3

sup _k min , 2 exp j

c j

g k

  

  _ _ . Here and

,

2 j

k  e 

after

n

P A   

n f

P

th

stands for the probability measure

correspond-o e density function

ing t

 

:

   

1 2

 

n

f x  f x f

n n

x f x . It is easy to see that Pg_k Pg₀ from the constructions

of gk. Since fn is an estimator of density with n i.i.d.

random samples,





 

2 i k

j n j j n c

n k p g n k p g k

E f g  P  f g   P A .

2 2  _ _ Then,





 

2 sup sup 2 1 3

min , 2 exp .

2 2

k j

k j

j

j n c

n k p g k

k

j j

E f g P A

e           _ _  __     (2.2)

Next, one shows ₂j c na01 2j 







 

_{ }

1 2

1

1 2 ₀ 1

2

, : n n ln d

n

n n n

n

f f

f x

K P P f x x

f x

 



_

, 1

 

1

n n

j

f x   

 

1 j

f x and 2

 

1 2

 

n n

j j

f x    f

 

x . Then





 

1

_{ }



1 1



1 2 1 1 2

1 2

, ln i d ,

n n

i i

f x

: Recall that

.

n

K P P f x x nK P P

f x







Note that





 

1 1 1

1 2 1

2

, : ln f x d

P P f x x

f x



_

lnu u 1

K and  

for en

0 Th

u .





 

_{ }

 

_{ }

 

1 1 2 1

2 1 1

2

1 2

2 1 2

, ln d

1 d

d .

n n f x

K P P n f x x

f x

n f x x

f x

n f x  f x f x x

            



Hence,









2

2j : inf 2 k, v 2 k, j_l .

j

j n n j n n

g g g g

v

k v k

K P P K P P

  

 _ _









Moreover,

 

1

 

2

2j 2 d

j

k k

n g x g x g x

  





 

 x. (2.3)

ng to the definition of

Accordi gk, supp



gkg





 

0,l

and g x

 

c₀ on

 

0,l. Thus,



g x

 

1 g_k

 

x g

 

x 2

 

2

 

2

1 1 2 1 2

0 ₂ 0

d

0 j jk x x c aj jk x c aj

 _  _ 

by the

orthonormality of

dxc



a 

. Then (2.3) reduces to

jk



1 2 0 2j c naj.

 _ 

(2.4)

ke

Ta  

1 2 1

2 ln

/ 1 2 1 1

2 2

2 l

j s r j

na n n

 

 _   _

  .

s r

n  

. Then j n        n

Now, one can choose C0 such that na2j Clnn

and C4



s1r



2c0. Therefore,   4 1/ 2

1

s r

c

n

 _{ }1

1 1

0 0

2 2

ln

j c C

j j

j

n

e e na

n      _   _  _ _    

and (2.2) reduces to supk_j E



fngk p



Cj. Then

the desired follows from  j  _p2j s1/p1/r by (2.1)

and  

1 2 1/ 1

2 ln s r j n n          .  





, 2 1 , sup s r q s s n p R L

E f f n

Now, we prove 

f B





 . Our

(4)

H. Y. WANG ₁₂₆₁

ov-Gilbert) Let

Lemma 2.1. (Varsham







1



:  , ,m

    , _i

 

0,1. Then there exists a

sub-set



0,,M



of

/8

m

 with 0 



0,, 0



such that 2

M and





1

0 .

8

m

i j m k k k

i j M

 



    



construct

It is sufficient to g_i



i0,1,,M



such

that i ,



,



s r q

g_ B R L and





2 1_.

i s

n

p n



 

 (2.5)

sup

i

E f g 

As pro ove, let

s

ved ab  be a compactly supported,





t ts regular and orthonormal scaling function, be upp

the corresponding wavelet with s  



0,l



, lN.

s s





Assume_ g_ Br q, R, 1/ 2

 

,  j:

 

:

L

 

  _and _ne

:

j

a 



0, , 2 ,l l

[0, ] 0

| l 0

g c 



1



l and

 

. Defi



2 j s , 2j

j

j k jk k

g_ x g x a   x





with

 

 

0,12j

j

k k

   _  (note that g0g). Since

 

0,1 k

  , one knows that 2

j

r _j

k

k  



nd a

 

1/ 1/ 2 1/

2 1

j j

r r r

k k

a 

 



 



 







By Lemma 1.1,

.

j s

,

s j

r q

k jk

B 

j k

a



  C, and so is

,

s r q

B

g_ . Hence gBr qs,



R L,



. at the supports of

Note th jk for k j are

mutu-ally disjoint. Then

 

0 0 2 0

js j jk

g_ x c a  _ c    _ 

for big j. This with



g_

 

x dx



g x

 

dx1 implies that g_ is a density function for each

 

0,12

j

 

1, there exists

.

Ac-cording to Lemma 2.



 0_, 1_,__,M



such that M22j3 and

3

2 . l i j

k

 _ _ 



(2.6)

j

k k



Because suppjksuppjk for k  k j, one

knows that

 1

2

l i

j

p _p _l _i p

j k k

k p

j

p

jk _p

p p

sp j l i

k k

k p



g_ g_ a  





_  

This with (2.6) and 

 

  





, l k

 i

 

0,1

k

  leads to

3

2

l i

pj p

g

     2 and

p p _s p

g  

1/

8 2 :

l i

p sj j p

p

g_ g_      . (2.7)

Clearly, the sets



0,1, ,



2

i i

j n

A_  f g_   i M

  

satisfy A_l A_i  for . Then Fano’s Lemma

yi

il

elds

 



1



0

1

sup min , Mexp 3 .

2 i

i

n c

g M

i M

P A e

  

  

 

 _   _

  (2.8)

On the other hand, it follows 01 22

j

M c

 _ 

f of (2.4). Take

j

na from

the similar arguments to the proo

1 2 1

2j__n s _{. Then}

choose a c 0 su hat

2 (2 1)

2 s j 1

j

na a    . Hence, one can

onstant C ch t

1 2 1

4 4

0 2 0 2

2 2

2 j e j j 2 j e j 1

M c na c C

M e

 

 _  _

 _ _

.



Therefore, (2.8) reduces to

 

0

sup _i i

n c g i M

P A C

 

 

 0 and





0

sup n i

p i M

E f g_

 

0

sup .

2 2 j

i M

C

i i

j n j

g n p

P f g

  

  

 

 

  _

  



This with (2.7) and

1 2 1

2j __n s _{yield (2.5).}

3.

The author Huiying Wang is grateful to the referees for

their valuable comments and thanks her adviso

Profes-sor Youming Liu, for his helpful guidance. Th work is supported by the National Natural Science Foundation of China (No. 10871012) and Natural Science Foundation of Beijing (No. 1082003).

] G. Kerkyacharian and D. Picard, “Density Estimation in

Acknowledgements

r, is

4. References

[1

Besov Spaces,” Statistics & Probability Letters, Vol. 13, No. 1, 1992, pp. 15-24.

doi:10.1016/0167-7152(92)90231-S

[2] D. L. Donoho, I. M. Johnstone, G. Kerkyacharian and D. Picard, “Density Estimation by Wavelet Thresholding,” The Annals of Statistics, Vol. 24, No. 2, 1996, pp. 508- 539. doi:10.1214/aos/1032894451

Kerkyacharian, D. Picard and A. B. Tsy-lets, Approximation and Statistical Appli-cations,” Springer-Verlag, New York, 1997.

mir Zaiats, Springer rk, 2009.

[3] W. Härdle, G. bakov, “Wave

[4] A. B. Tsybakov, “Introduction to Nonparametric Estima-tion,” (English) Revised and Extended from the 2004 French Original, Translated by Vladi

Series in Statistics, Springer, New Yo

(5)

9-AOS682

2009, pp. 3362-3395. doi:10.1214/0

05.010

[6] C. Christophe, “Regression with Random Design: A Minimax Study,” Statistics & Probability Letters, Vol. 77, No. 1, 2007, pp. 40-53. doi:10.1016/j.spl.2006.