doi:10.4236/am.2011.210175 Published Online October 2011 (http://www.SciRP.org/journal/am)
Convergence Rates of Density Estimation in Besov Spaces
Huiying Wang
Department of Applied Mathematics, Beijing University of Technology, Beijing, China E-mail: b200806005@emails.bjut.edu.cn
Received July 29, 2011; revised August 23, 2011; accepted August 30, 2011
Abstract
The optimality of a density estimation on Besov spaces $B^s_{r,q}(R)$ for the $L^p$ risk was established by Donoho, Johnstone, Kerkyacharian and Picard (“Density estimation by wavelet thresholding,” The Annals of Statistics, Vol. 24, No. 2, 1996, pp. 508-539). To show the lower bound of the optimal rates of convergence $R_n(B^s_{r,q}, p)$, they use Korostelev and Assouad lemmas. However, the conditions of those two lemmas are difficult to verify. This paper aims to give another proof for that bound by using Fano’s Lemma, which looks a little simpler. In addition, our method can be used in many other statistical models for lower bounds of estimations.
Keywords: Optimal Rate of Convergence, Density Estimation, Besov Spaces, Wavelets
1. Introduction
Wavelet analysis has many applications, one of which is to estimate an unknown density function based on independent and identically distributed (i.i.d.) random samples. Let $(\Omega, \mathcal{F}, P)$ be a probability measurable space and $X_1, \dots, X_n$ be i.i.d. random variables with an unknown density function $f$. We use $EX$ to denote the expectation of a random variable $X$. The sequence
$$R_n(V, p) := \inf_{f_n} \sup_{f \in V} E\|f_n - f\|_p$$
is called the optimal rate of convergence on the functional class $V$ for the $L^p$ risk. Here, $f_n$ is an arbitrary estimator of $f$ with $n$ i.i.d. random samples. Kerkyacharian and Picard [1] study $R_n(V, p)$ when $V$ is a Besov space with matched case. Donoho, Johnstone, Kerkyacharian and Picard [2] consider unmatched cases. In fact, they show the optimal convergence rates for the Besov class $B^s_{r,q}(R)$ and $L^p$ risk:
$$R_n(B^s_{r,q}, p) \sim \begin{cases} \left(\dfrac{\ln n}{n}\right)^{\frac{s - 1/r + 1/p}{2(s - 1/r) + 1}}, & r \le \dfrac{p}{2s+1}, \\[2ex] n^{-\frac{s}{2s+1}}, & r > \dfrac{p}{2s+1}. \end{cases} \tag{1.1}$$
To show the lower bound of (1.1), the authors of [2,3] use the Korostelev and Assouad lemmas. However, the conditions of those two lemmas are difficult to verify. In this short paper, we give another proof for the lower bound of (1.1) by using Fano’s lemma [4]. It should be pointed out that Fano’s lemma can be applied to a variety of statistical models, see [5-7].
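As usual, $L^p(R)$ ($p \ge 1$) denotes the classical Lebesgue space on the real line $R$. In particular, $L^2(R)$ stands for the Hilbert space, which consists of all square integrable functions. As a subspace of $L^p(R)$, the Sobolev space with an integer exponent $k$ means
$$W^k_p(R) := \{ f : f^{(m)} \in L^p(R),\ m = 0, 1, \dots, k \}, \quad p \ge 1.$$
The corresponding norm is
$$\|f\|_{W^k_p} := \|f\|_p + \|f^{(k)}\|_p .$$
Moreover, the Besov space $B^s_{p,q}(R)$ [3] ($1 \le p, q \le \infty$, $s = n + \alpha$ and $\alpha \in (0,1]$) can be defined by
$$B^s_{p,q}(R) := \left\{ f \in W^n_p(R) : \{2^{j\alpha}\omega_p^2(f^{(n)}, 2^{-j})\}_{j \in Z} \in l^q \right\}$$
with the associated norm
$$\|f\|_{B^s_{p,q}} := \|f\|_{W^n_p} + \left\| \{2^{j\alpha}\omega_p^2(f^{(n)}, 2^{-j})\}_{j \in Z} \right\|_{l^q(Z)},$$
where
$$\omega_p^2(f, t) := \sup_{|h| \le t} \|f(x + 2h) - 2f(x + h) + f(x)\|_p .$$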
In general, it can be shown that compactly supported and $n$ times differentiable functions belong to $B^s_{p,q}(R)$ for $0 < s < n$ and $1 \le p, q \le \infty$.
The Besov space can be discretized by the sequence norm of wavelet coefficients. Many useful wavelets are generated by scaling functions. More precisely, if $\varphi$ is a scaling function with
$$\varphi(x) = \sqrt{2} \sum_k h_k \varphi(2x - k),$$
then
$$\psi(x) := \sqrt{2} \sum_k (-1)^k h_{1-k} \varphi(2x - k)$$
defines a wavelet [3]. Clearly, when $\varphi$ is compactly supported and continuous, the corresponding wavelet $\psi$ has the same properties. An orthonormal wavelet basis of $L^2(R)$ is generated from dilation and translation of a scaling function and its corresponding wavelet, i.e.
$$\{\varphi_{j_0 k}, \psi_{jk}\}_{j \ge j_0,\, k \in Z}, \qquad \varphi_{j_0 k}(x) := 2^{j_0/2}\varphi(2^{j_0}x - k), \qquad \psi_{jk}(x) := 2^{j/2}\psi(2^{j}x - k).$$
Although wavelet bases are constructed for $L^2(R)$, most of them constitute unconditional bases for $L^p(R)$. A scaling function $\varphi$ is called $t$ regular, if $\varphi$ has continuous derivatives of order $t$ and its corresponding wavelet $\psi$ has vanishing moments of order $t$, i.e.
$$\int x^k \psi(x)\,dx = 0, \quad k = 0, 1, \dots, t - 1 .$$
The following lemma [3] plays an important role in this paper.
Lemma 1.1. Let $\varphi$ be a compactly supported, $t$ regular orthonormal scaling function with the corresponding wavelet $\psi$ and $0 < s < t$. If $f \in L^p(R)$, $s_{0k} := \langle f, \varphi_{0k} \rangle$, $d_{jk} := \langle f, \psi_{jk} \rangle$ and $1 \le p, q \le \infty$, then the following two conditions are equivalent:
1) $f \in B^s_{p,q}(R)$;
2) $\|s_{0\cdot}\|_p + \left\| \{2^{j(s + 1/2 - 1/p)}\|d_{j\cdot}\|_p\}_{j \ge 0} \right\|_q < \infty$.
Furthermore,
$$\|f\|_{B^s_{p,q}} \sim \|s_{0\cdot}\|_p + \left\| \{2^{j(s + 1/2 - 1/p)}\|d_{j\cdot}\|_p\}_{j \ge 0} \right\|_q .$$
Before introducing Fano’s Lemma, we need the notion of Kullback-Leibler distance [4]. Let $P$ and $Q$ be two probability measures with $P$ being absolutely continuous with respect to $Q$ (denoted by $P \ll Q$). Then the Kullback-Leibler distance is defined by
$$K(P, Q) := \int_{pq > 0} p(x) \ln \frac{p(x)}{q(x)}\,dx,$$
where $p$ and $q$ are density functions of $P$ and $Q$ respectively.
Lemma 1.2. (Fano’s Lemma, [4]) Let $(\Omega, \mathcal{F}, P_k)$ be probability measurable spaces and $A_k \in \mathcal{F}$, $k = 0, 1, \dots, m$. If $A_k \cap A_v = \emptyset$ for $k \ne v$, then with $A^c$ standing for the complement of $A$ and
$$K_m := \inf_{0 \le v \le m} \frac{1}{m} \sum_{k \ne v} K(P_k, P_v),$$
$$\sup_{0 \le k \le m} P_k(A_k^c) \ge \min\left\{ \frac{1}{2},\ \sqrt{m}\,\exp(-3e^{-1} - K_m) \right\}.$$
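The right-hand side of Lemma 1.2 can be evaluated numerically. The following is a minimal sketch for discrete distributions, not part of the original argument; the three distributions in `dists` are hypothetical, and the function names `kl_divergence`, `fano_k_m` and `fano_lower_bound` are chosen here only for illustration.

```python
import math


def kl_divergence(p, q):
    """K(P, Q) = sum_i p_i ln(p_i / q_i) for discrete P << Q."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0.0:
            if qi <= 0.0:
                raise ValueError("P is not absolutely continuous w.r.t. Q")
            total += pi * math.log(pi / qi)
    return total


def fano_k_m(dists):
    """K_m = inf_v (1/m) sum_{k != v} K(P_k, P_v) for dists = [P_0, ..., P_m]."""
    m = len(dists) - 1
    return min(
        sum(kl_divergence(dists[k], dists[v]) for k in range(m + 1) if k != v) / m
        for v in range(m + 1)
    )


def fano_lower_bound(m, k_m):
    """min{1/2, sqrt(m) * exp(-3/e - K_m)}, the bound on sup_k P_k(A_k^c)."""
    return min(0.5, math.sqrt(m) * math.exp(-3.0 / math.e - k_m))


# Hypothetical example: three distributions on a three-point set (m = 2).
dists = [[0.5, 0.3, 0.2], [0.4, 0.4, 0.2], [0.4, 0.3, 0.3]]
k_m = fano_k_m(dists)
print(k_m, fano_lower_bound(len(dists) - 1, k_m))
```

The bound only becomes useful when $\sqrt{m}$ grows faster than $\exp(K_m)$, which is what the choice of $j$ in the proof is designed to achieve.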
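By Lemma 1.1 and 1.2, we can show the following result:
Theorem 1.1. Let $f \in B^s_{r,q}(R, L)$ with $1 \le r, q \le \infty$, $1 \le p < \infty$ and $s > 1/r$. If $f_n$ is an estimator of $f$ with $n$ i.i.d. random samples, then
$$\sup_{f \in B^s_{r,q}(R, L)} E\|f_n - f\|_p \gtrsim \max\left\{ \left(\frac{\ln n}{n}\right)^{\frac{s - 1/r + 1/p}{2(s - 1/r) + 1}},\ n^{-\frac{s}{2s+1}} \right\},$$
where $B^s_{r,q}(R, L) := \{ f \in B^s_{r,q}(R) : \|f\|_{B^s_{r,q}} \le L \text{ and } f \text{ has compact support} \}$; the notation $x \gtrsim y$ means $x \ge Cy$ with a constant $C > 0$.
Remark 1.1. Note that
$$\max\left\{ \left(\frac{\ln n}{n}\right)^{\frac{s - 1/r + 1/p}{2(s - 1/r) + 1}},\ n^{-\frac{s}{2s+1}} \right\} = \left(\frac{\ln n}{n}\right)^{\frac{s - 1/r + 1/p}{2(s - 1/r) + 1}}$$
for $r \le \dfrac{p}{2s+1}$, and for $r > \dfrac{p}{2s+1}$ and large $n$,
$$\max\left\{ \left(\frac{\ln n}{n}\right)^{\frac{s - 1/r + 1/p}{2(s - 1/r) + 1}},\ n^{-\frac{s}{2s+1}} \right\} = n^{-\frac{s}{2s+1}} .$$
Then Theorem 1.1 is a reformulation of the lower bound in (1.1). By using the idea of reference [5], we show this theorem in the next section.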
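The regime split in Remark 1.1 can be checked with exact rational arithmetic. This is an illustrative sketch only: the parameter triples $(s, r, p)$ below are hypothetical, the helper names are not from the paper, and all constants hidden in $\gtrsim$ are ignored.

```python
import math
from fractions import Fraction


def sparse_exponent(s, r, p):
    """Exponent (s - 1/r + 1/p) / (2(s - 1/r) + 1) of the (ln n / n) term."""
    return (s - 1 / r + 1 / p) / (2 * (s - 1 / r) + 1)


def dense_exponent(s):
    """Exponent s / (2s + 1) of the n^{-s/(2s+1)} term."""
    return s / (2 * s + 1)


def lower_bound(n, s, r, p):
    """max{(ln n / n)^alpha, n^{-s/(2s+1)}} from Theorem 1.1, constants ignored."""
    log_term = (math.log(n) / n) ** float(sparse_exponent(s, r, p))
    dense_term = n ** (-float(dense_exponent(s)))
    return max(log_term, dense_term)


# Hypothetical (s, r, p) triples, one for each case of Remark 1.1.
examples = {
    "r < p/(2s+1)": (Fraction(2), Fraction(1), Fraction(10)),
    "r = p/(2s+1)": (Fraction(2), Fraction(1), Fraction(5)),
    "r > p/(2s+1)": (Fraction(2), Fraction(2), Fraction(2)),
}
for label, (s, r, p) in examples.items():
    print(label, sparse_exponent(s, r, p), dense_exponent(s))
```

In these examples the logarithmic exponent is smaller than, equal to, and larger than $s/(2s+1)$ respectively, which matches the comparison of $r$ with $p/(2s+1)$.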
2. Proof of Theorem 1.1
Firstly, we prove
$$\sup_{f \in B^s_{r,q}(R, L)} E\|f_n - f\|_p \gtrsim \left(\frac{\ln n}{n}\right)^{\frac{s - 1/r + 1/p}{2(s - 1/r) + 1}} .$$
One needs to construct $g_k$ such that $g_k \in B^s_{r,q}(R, L)$ and
$$\sup_k E\|f_n - g_k\|_p \gtrsim \left(\frac{\ln n}{n}\right)^{\frac{s - 1/r + 1/p}{2(s - 1/r) + 1}} .$$
Let $\varphi$ be a compactly supported, $t$ regular ($t > s$) and orthonormal scaling function, $\psi$ be the corresponding wavelet with $\mathrm{supp}\,\psi \subseteq [0, l]$, $l \in N$, where $N$ denotes the set of positive integers. Then there exists a compactly supported density function $g$ (i.e. $g(x) \ge 0$ and $\int g(x)\,dx = 1$) satisfying $g \in B^s_{r,q}(R)$ and $g(x)|_{[0,l]} = c_0 > 0$.
Let $\Delta_j := \{0, l, 2l, \dots, 2^j l\}$. Then the number of elements in $\Delta_j$ is $2^j + 1$, denoted by $\#\Delta_j = 2^j + 1$. Motivated by [5], one defines $a_j := 2^{-j(s + 1/2 - 1/r)}$ and
$$g_k(x) := g(x) + a_j \psi_{jk}(x) I\{k \ne 2^j l\}, \quad k \in \Delta_j,$$
with $I\{k \ne 2^j l\} := 1$ if $k \ne 2^j l$, else $I\{k \ne 2^j l\} := 0$.
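Obviously, $g_{2^j l} = g$, $\int g_k(x)\,dx = 1$ and, on $[0, l]$,
$$g_k(x) \ge c_0 - a_j\|\psi_{jk}\|_\infty = c_0 - 2^{-j(s - 1/r)}\|\psi\|_\infty \ge 0$$
for large $j$ due to $s > 1/r$, which implies that $g_k$ is a density function for each $k$.
By the assumptions on $\varphi$, the wavelet $\psi$ is compactly supported and $t$ times differentiable. Therefore, $\psi \in B^s_{r,q}(R)$ ($t > s$) and $g_k \in B^s_{r,q}(R)$. Because $a_j = 2^{-j(s + 1/2 - 1/r)}$, $\|a_j \psi_{jk}\|_{B^s_{r,q}} \le C$ due to Lemma 1.1, and so is $\|g_k\|_{B^s_{r,q}}$. Hence, $g_k \in B^s_{r,q}(R, L)$. Clearly,
$$\|g_k - g_{k'}\|_p \ge \|g_k - g_{2^j l}\|_p = a_j\|\psi_{jk}\|_p = 2^{-j(s + 1/p - 1/r)}\|\psi\|_p =: \delta_j \tag{2.1}$$
for $k \ne k' \in \Delta_j$ due to $a_j := 2^{-j(s + 1/2 - 1/r)}$. Furthermore,
$$A_k := \left\{ \|f_n - g_k\|_p < \frac{\delta_j}{2} \right\}$$
satisfies $A_k \cap A_{k'} = \emptyset$ for $k \ne k'$. Recall that $\#\Delta_j = 2^j + 1$. By Lemma 1.2,
$$\sup_{k \in \Delta_j} P^n_{g_k}(A_k^c) \ge \min\left\{ \frac{1}{2},\ 2^{j/2}\exp(-3e^{-1} - K_{2^j}) \right\}.$$
Here and after, $P^n_f$ stands for the probability measure corresponding to the density function $f^n(x) := f(x_1) f(x_2) \cdots f(x_n)$. It is easy to see that $P_{g_k} \ll P_{g_0}$ from the constructions of $g_k$. Since $f_n$ is an estimator of density with $n$ i.i.d. random samples,
$$E\|f_n - g_k\|_p \ge \frac{\delta_j}{2} P^n_{g_k}\left( \|f_n - g_k\|_p \ge \frac{\delta_j}{2} \right) = \frac{\delta_j}{2} P^n_{g_k}(A_k^c).$$
Then,
$$\sup_{k \in \Delta_j} E\|f_n - g_k\|_p \ge \frac{\delta_j}{2} \sup_{k \in \Delta_j} P^n_{g_k}(A_k^c) \ge \frac{\delta_j}{2} \min\left\{ \frac{1}{2},\ 2^{j/2}\exp(-3e^{-1} - K_{2^j}) \right\}. \tag{2.2}$$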
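The scaling identity behind (2.1), $a_j\|\psi_{jk}\|_p = 2^{-j(s + 1/p - 1/r)}\|\psi\|_p$, can be checked numerically. The sketch below uses the Haar wavelet purely for illustration (it satisfies $\|\psi\|_p = 1$ and $\mathrm{supp}\,\psi \subseteq [0, 1]$, but it is not $t$ regular in the sense required by the proof); the parameter values are hypothetical.

```python
def haar(x):
    """Haar wavelet: 1 on [0, 1/2), -1 on [1/2, 1), 0 elsewhere."""
    if 0.0 <= x < 0.5:
        return 1.0
    if 0.5 <= x < 1.0:
        return -1.0
    return 0.0


def psi_jk(x, j, k):
    """Dilated and translated wavelet 2^{j/2} psi(2^j x - k)."""
    return 2.0 ** (j / 2) * haar(2.0 ** j * x - k)


def lp_norm(f, a, b, p, steps):
    """Midpoint-rule approximation of the L^p norm of f on [a, b]."""
    h = (b - a) / steps
    total = sum(abs(f(a + (i + 0.5) * h)) ** p for i in range(steps))
    return (total * h) ** (1.0 / p)


def delta_j(j, s, r, p):
    """Separation 2^{-j(s + 1/p - 1/r)} from (2.1), with ||psi||_p = 1."""
    return 2.0 ** (-j * (s + 1.0 / p - 1.0 / r))


# Hypothetical parameters with s > 1/r.
j, k, s, r, p = 3, 2, 1.5, 2.0, 3.0
a_j = 2.0 ** (-j * (s + 0.5 - 1.0 / r))
norm = lp_norm(lambda x: psi_jk(x, j, k), 0.0, 1.0, p, 2 ** (j + 4))
print(a_j * norm, delta_j(j, s, r, p))
```

Because the Haar wavelet is piecewise constant on dyadic intervals, the midpoint rule on a dyadic grid is exact here up to floating-point rounding.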
Next, one shows $K_{2^j} \le c_0^{-1} n a_j^2$. Recall that
$$K(P_1^n, P_2^n) := \int f_1^n(x) \ln \frac{f_1^n(x)}{f_2^n(x)}\,dx,$$
$f_1^n(x) = \prod_{i=1}^n f_1(x_i)$ and $f_2^n(x) = \prod_{i=1}^n f_2(x_i)$. Then
$$K(P_1^n, P_2^n) = \sum_{i=1}^n \int f_1(x_i) \ln \frac{f_1(x_i)}{f_2(x_i)}\,dx_i = nK(P_1^1, P_2^1).$$
Note that
$$K(P_1^1, P_2^1) := \int f_1(x) \ln \frac{f_1(x)}{f_2(x)}\,dx$$
and $\ln u \le u - 1$ for $u > 0$. Then
$$K(P_1^n, P_2^n) = n\int f_1(x) \ln \frac{f_1(x)}{f_2(x)}\,dx \le n\int f_1(x)\left( \frac{f_1(x)}{f_2(x)} - 1 \right)dx = n\int f_2^{-1}(x)\,|f_1(x) - f_2(x)|^2\,dx .$$
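Both facts used above, the additivity $K(P_1^n, P_2^n) = nK(P_1^1, P_2^1)$ and the bound of the Kullback-Leibler distance by $\int f_2^{-1}|f_1 - f_2|^2$, have direct discrete analogues. The following sketch checks them for two hypothetical distributions `p` and `q` on a three-point set; it illustrates the inequalities and is not the paper's continuous setting.

```python
import math
from itertools import product


def kl(p, q):
    """Discrete Kullback-Leibler distance sum_i p_i ln(p_i / q_i)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def chi_square(p, q):
    """Discrete analogue of the integral of |f1 - f2|^2 / f2."""
    return sum((pi - qi) ** 2 / qi for pi, qi in zip(p, q))


def product_measure(p, n):
    """Probabilities of the n-fold product measure, one entry per n-tuple."""
    return [math.prod(combo) for combo in product(p, repeat=n)]


# Hypothetical distributions; q must be strictly positive.
p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]
n = 3
print(kl(p, q), chi_square(p, q))
print(kl(product_measure(p, n), product_measure(q, n)), n * kl(p, q))
```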
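Hence,
$$K_{2^j} := \inf_{v \in \Delta_j} 2^{-j} \sum_{k \ne v} K(P^n_{g_k}, P^n_{g_v}) \le 2^{-j} \sum_{k \ne 2^j l} K(P^n_{g_k}, P^n_{g_{2^j l}}).$$
Moreover,
$$K_{2^j} \le 2^{-j} n \sum_{k \ne 2^j l} \int g^{-1}(x)\,|g_k(x) - g(x)|^2\,dx . \tag{2.3}$$
According to the definition of $g_k$, $\mathrm{supp}(g_k - g) \subseteq [0, l]$ and $g(x) = c_0$ on $[0, l]$. Thus,
$$\int g^{-1}(x)\,|g_k(x) - g(x)|^2\,dx = c_0^{-1} a_j^2 \int |\psi_{jk}(x)|^2\,dx = c_0^{-1} a_j^2$$
by the orthonormality of $\psi_{jk}$. Then (2.3) reduces to
$$K_{2^j} \le c_0^{-1} n a_j^2 . \tag{2.4}$$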
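Take $2^j \sim (n / \ln n)^{\frac{1}{2(s - 1/r) + 1}}$. Then $n a_j^2 = n 2^{-j(2(s - 1/r) + 1)} \sim \ln n$. Now, one can choose $C > 0$ such that $n a_j^2 \le C \ln n$ and $C[4(s - 1/r) + 2] < c_0$. Therefore,
$$2^{j/2}\exp(-3e^{-1} - K_{2^j}) \ge 2^{j/2} e^{-3e^{-1}} e^{-c_0^{-1} n a_j^2} \ge e^{-3e^{-1}} 2^{j/2} n^{-c_0^{-1} C} \gtrsim \left(\frac{n}{\ln n}\right)^{\frac{1}{4(s - 1/r) + 2}} n^{-c_0^{-1} C} \ge 1$$
for large $n$, and (2.2) reduces to $\sup_{k \in \Delta_j} E\|f_n - g_k\|_p \gtrsim \delta_j$. Then the desired conclusion follows from $\delta_j = \|\psi\|_p 2^{-j(s + 1/p - 1/r)}$ by (2.1) and $2^j \sim (n / \ln n)^{\frac{1}{2(s - 1/r) + 1}}$.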
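The effect of the choice $2^j \sim (n/\ln n)^{1/(2(s - 1/r) + 1)}$ can be illustrated numerically. The sketch below idealizes the argument by allowing a real-valued level $j$ with exact equality (in the proof $j$ is an integer and only $\sim$ holds), drops the factor $\|\psi\|_p$, and uses hypothetical values of $n$, $s$, $r$, $p$.

```python
import math


def level(n, s, r):
    """Real-valued j with 2^j = (n / ln n)^{1 / (2(s - 1/r) + 1)}."""
    return math.log2(n / math.log(n)) / (2 * (s - 1 / r) + 1)


def n_a_j_squared(n, j, s, r):
    """n * a_j^2 with a_j = 2^{-j(s + 1/2 - 1/r)}."""
    return n * 2.0 ** (-2 * j * (s + 0.5 - 1 / r))


def delta(j, s, r, p):
    """delta_j = 2^{-j(s + 1/p - 1/r)} with the factor ||psi||_p omitted."""
    return 2.0 ** (-j * (s + 1 / p - 1 / r))


# Hypothetical parameters with s > 1/r.
n, s, r, p = 10 ** 6, 1.5, 2.0, 4.0
j = level(n, s, r)
alpha = (s - 1 / r + 1 / p) / (2 * (s - 1 / r) + 1)
print(n_a_j_squared(n, j, s, r), math.log(n))
print(delta(j, s, r, p), (math.log(n) / n) ** alpha)
```

With this level, $n a_j^2$ equals $\ln n$ and $\delta_j$ equals $(\ln n / n)^{\alpha}$ with $\alpha = \frac{s - 1/r + 1/p}{2(s - 1/r) + 1}$, which is the rate claimed in the first part of the proof.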
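Now, we prove
$$\sup_{f \in B^s_{r,q}(R, L)} E\|f_n - f\|_p \gtrsim n^{-\frac{s}{2s+1}} .$$
Our proof uses the following lemma.
Lemma 2.1. (Varshamov-Gilbert) Let $\Omega := \{\varepsilon = (\varepsilon_1, \dots, \varepsilon_m),\ \varepsilon_i \in \{0, 1\}\}$. Then there exists a subset $\{\varepsilon^0, \dots, \varepsilon^M\}$ of $\Omega$ with $\varepsilon^0 = (0, \dots, 0)$ such that $M \ge 2^{m/8}$ and
$$\sum_{k=1}^m |\varepsilon^i_k - \varepsilon^j_k| \ge \frac{m}{8}, \quad 0 \le i \ne j \le M .$$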
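For small $m$, a subset with the properties stated in Lemma 2.1 can be found by a simple greedy search. The sketch below is only an illustration for $m$ a multiple of 8; it is not the proof of the lemma, and the function names and the encoding of $\varepsilon$ as the bits of an integer are choices made here.

```python
def hamming(a, b):
    """Hamming distance between two 0/1 vectors encoded as integers."""
    return bin(a ^ b).count("1")


def greedy_packing(m, min_dist, target_size):
    """Greedily collect vectors in {0,1}^m with pairwise distance >= min_dist.

    Starts from the zero vector and stops once target_size vectors are found.
    """
    chosen = [0]
    for cand in range(1, 2 ** m):
        if len(chosen) >= target_size:
            break
        if all(hamming(cand, c) >= min_dist for c in chosen):
            chosen.append(cand)
    return chosen


def varshamov_gilbert_subset(m):
    """Subset with M + 1 elements, M = 2^{m/8}, pairwise distance >= m/8."""
    return greedy_packing(m, m // 8, 2 ** (m // 8) + 1)


subset = varshamov_gilbert_subset(16)
print([format(v, "016b") for v in subset])
```

The search stops as soon as $M + 1$ vectors are collected, so it stays fast even though $\{0,1\}^m$ has $2^m$ elements.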
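It is sufficient to construct $g_{\varepsilon^i}$ ($i = 0, 1, \dots, M$) such that $g_{\varepsilon^i} \in B^s_{r,q}(R, L)$ and
$$\sup_{0 \le i \le M} E\|f_n - g_{\varepsilon^i}\|_p \gtrsim n^{-\frac{s}{2s+1}} . \tag{2.5}$$
As above, let $\varphi$ be a compactly supported, $t$ regular ($t > s$) and orthonormal scaling function, $\psi$ be the corresponding wavelet with $\mathrm{supp}\,\psi \subseteq [0, l]$, $l \in N$. Assume $g \in B^s_{r,q}(R, L)$, $g|_{[0,l]} = c_0 > 0$ and $\Delta_j := \{0, l, 2l, \dots, (2^j - 1)l\}$. Define $a_j := 2^{-j(s + 1/2)}$ and
$$g_\varepsilon(x) := g(x) + a_j \sum_{k \in \Delta_j} \varepsilon_k \psi_{jk}(x)$$
with $\varepsilon = (\varepsilon_k)_{k \in \Delta_j} \in \{0, 1\}^{2^j}$ (note that $g_{\varepsilon^0} = g$). Since $\varepsilon_k \in \{0, 1\}$, one knows that $\sum_{k \in \Delta_j} |\varepsilon_k|^r \le 2^j$ and
$$a_j \Big( \sum_{k \in \Delta_j} |\varepsilon_k|^r \Big)^{1/r} \le 2^{-j(s + 1/2)} 2^{j/r} = 2^{-j(s + 1/2 - 1/r)} .$$
By Lemma 1.1, $\big\| a_j \sum_{k \in \Delta_j} \varepsilon_k \psi_{jk} \big\|_{B^s_{r,q}} \le C$, and so is $\|g_\varepsilon\|_{B^s_{r,q}}$. Hence $g_\varepsilon \in B^s_{r,q}(R, L)$. Note that the supports of $\psi_{jk}$ for $k \in \Delta_j$ are mutually disjoint. Then
$$g_\varepsilon(x) \ge c_0 - a_j\|\psi_{jk}\|_\infty = c_0 - 2^{-js}\|\psi\|_\infty \ge 0$$
for big $j$. This with $\int g_\varepsilon(x)\,dx = \int g(x)\,dx = 1$ implies that $g_\varepsilon$ is a density function for each $\varepsilon \in \{0, 1\}^{2^j}$.
According to Lemma 2.1, there exists $\{\varepsilon^0, \varepsilon^1, \dots, \varepsilon^M\} \subseteq \{0, 1\}^{2^j}$ such that $M \ge 2^{2^{j-3}}$ and
$$\sum_{k \in \Delta_j} |\varepsilon^l_k - \varepsilon^i_k| \ge 2^{j-3}, \quad l \ne i . \tag{2.6}$$
Because $\mathrm{supp}\,\psi_{jk} \cap \mathrm{supp}\,\psi_{jk'} = \emptyset$ for $k \ne k' \in \Delta_j$, one knows that
$$\|g_{\varepsilon^l} - g_{\varepsilon^i}\|_p^p = a_j^p \sum_{k \in \Delta_j} |\varepsilon^l_k - \varepsilon^i_k|^p \|\psi_{jk}\|_p^p = 2^{-(sp + 1)j}\|\psi\|_p^p \sum_{k \in \Delta_j} |\varepsilon^l_k - \varepsilon^i_k|^p .$$
This with (2.6) and $|\varepsilon^l_k - \varepsilon^i_k| \in \{0, 1\}$ leads to
$$\|g_{\varepsilon^l} - g_{\varepsilon^i}\|_p^p \ge 2^{-3} 2^{-spj}\|\psi\|_p^p$$
and
$$\|g_{\varepsilon^l} - g_{\varepsilon^i}\|_p \ge 8^{-1/p} 2^{-sj}\|\psi\|_p =: \delta_j . \tag{2.7}$$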
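Clearly, the sets
$$A_i := \left\{ \|f_n - g_{\varepsilon^i}\|_p < \frac{\delta_j}{2} \right\} \quad (i = 0, 1, \dots, M)$$
satisfy $A_l \cap A_i = \emptyset$ for $i \ne l$. Then Fano’s Lemma yields
$$\sup_{0 \le i \le M} P^n_{g_{\varepsilon^i}}(A_i^c) \ge \min\left\{ \frac{1}{2},\ \sqrt{M}\exp(-3e^{-1} - K_M) \right\}. \tag{2.8}$$
On the other hand, it follows $K_M \le c_0^{-1} 2^j n a_j^2$ from the similar arguments to the proof of (2.4). Take $2^j \sim n^{\frac{1}{2s+1}}$. Then $n a_j^2 = n 2^{-(2s+1)j} \sim 1$. Hence, one can choose a constant $C > 0$ such that $n a_j^2 \le C$ and
$$\sqrt{M} e^{-3e^{-1} - K_M} \ge 2^{2^{j-4}} e^{-3e^{-1}} e^{-c_0^{-1} 2^j n a_j^2} \ge 2^{2^{j-4}} e^{-3e^{-1}} e^{-c_0^{-1} C 2^j} \ge 1 .$$
Therefore, (2.8) reduces to $\sup_{0 \le i \le M} P^n_{g_{\varepsilon^i}}(A_i^c) \ge C > 0$ and
$$\sup_{0 \le i \le M} E\|f_n - g_{\varepsilon^i}\|_p \ge \frac{\delta_j}{2} \sup_{0 \le i \le M} P^n_{g_{\varepsilon^i}}\left( \|f_n - g_{\varepsilon^i}\|_p \ge \frac{\delta_j}{2} \right) \ge \frac{C}{2}\delta_j .$$
This with (2.7) and $2^j \sim n^{\frac{1}{2s+1}}$ yields (2.5).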
3. Acknowledgements
The author Huiying Wang is grateful to the referees for their valuable comments and thanks her advisor, Professor Youming Liu, for his helpful guidance. This work is supported by the National Natural Science Foundation of China (No. 10871012) and Natural Science Foundation of Beijing (No. 1082003).
4. References
[1] G. Kerkyacharian and D. Picard, “Density Estimation in Besov Spaces,” Statistics & Probability Letters, Vol. 13, No. 1, 1992, pp. 15-24. doi:10.1016/0167-7152(92)90231-S
[2] D. L. Donoho, I. M. Johnstone, G. Kerkyacharian and D. Picard, “Density Estimation by Wavelet Thresholding,” The Annals of Statistics, Vol. 24, No. 2, 1996, pp. 508-539. doi:10.1214/aos/1032894451
[3] W. Härdle, G. Kerkyacharian, D. Picard and A. B. Tsybakov, “Wavelets, Approximation and Statistical Applications,” Springer-Verlag, New York, 1997.
[4] A. B. Tsybakov, “Introduction to Nonparametric Estimation,” (English) Revised and Extended from the 2004 French Original, Translated by Vladimir Zaiats, Springer Series in Statistics, Springer, New York, 2009.
[5] [...] 2009, pp. 3362-3395. doi:10.1214/09-AOS682
[6] C. Chesneau, “Regression with Random Design: A Minimax Study,” Statistics & Probability Letters, Vol. 77, No. 1, 2007, pp. 40-53. doi:10.1016/j.spl.2006.05.010