RESEARCH Open Access
Wavelet optimal estimations for a two-dimensional continuous-discrete density function over $L^p$ risk
Lin Hu$^1$, Xiaochen Zeng$^2$ and Jinru Wang$^{2*}$
*Correspondence: [email protected]
$^2$Department of Applied Mathematics, Beijing University of Technology, Beijing, P.R. China
Full list of author information is available at the end of the article
Abstract
The mixed continuous-discrete density model plays an important role in reliability, finance, biostatistics, and economics. Using wavelet methods, Chesneau, Dewan, and Doosti provided upper bounds for wavelet estimations under $L^2$ risk for a two-dimensional continuous-discrete density function over Besov spaces $B^s_{r,q}$. This paper deals with $L^p$ ($1\le p<\infty$) risk estimations over Besov spaces, which generalizes Chesneau–Dewan–Doosti's theorems. In addition, we provide, for the first time, a lower bound for the $L^p$ risk. It turns out that the linear wavelet estimator attains the optimal convergence rate for $r\ge p$, and the nonlinear one offers optimal estimation up to a logarithmic factor.
Keywords: Wavelets; Density estimation; Continuous-discrete density; Optimality
1 Introduction
1.1 Introduction
Density estimation plays an important role in both statistics and econometrics. This paper considers a two-dimensional density estimation model defined over mixed continuous and discrete variables [2]. More precisely, let $(X_1,Y_1),(X_2,Y_2),\ldots,(X_n,Y_n)$ be independent and identically distributed (i.i.d.) observations of a bivariate random variable $(X,Y)$, where $X$ is a continuous random variable, and $Y$ is a discrete one. The joint density function of $(X,Y)$ is given by
$$f(x,v)=\frac{\partial}{\partial x}F(x,v)$$
with $F(x,v)=P(X\le x,Y=v)$ being the distribution function of $(X,Y)$. We are interested in estimating $f(x,v)$ from $(X_1,Y_1),(X_2,Y_2),\ldots,(X_n,Y_n)$. This continuous-discrete density model also arises in survival analysis, economics, and social sciences. For example, consider a series system with $m$ components, which fails as soon as one of the components fails. Let $X$ be the failure time of the system, and let $Y$ be the component whose failure resulted in the failure of the system. Then $(X,Y)$ is a bivariate continuous-discrete random variable. For more examples, see [1] and [4].
The conventional kernel method gives a nice estimation for the continuous-discrete density function [1,10,14]. However, it is hard to provide the optimal estimation for the
densities in Besov spaces. In addition, the complexity of bandwidth selection increases the difficulty of the kernel method.
Recently, wavelet methods have made remarkable achievements in density estimation [7,8,11,12,15] due to their time and frequency localization, multiscale decomposition, and fast algorithms in numerical computations. In fact, wavelet estimation attains optimality for densities in Besov spaces, which avoids the disadvantage of kernel methods. Using the wavelet method, Chesneau et al. [2] constructed linear and nonlinear wavelet estimators for a two-dimensional continuous-discrete density function and derived their mean integrated squared error performance over Besov balls.
This paper addresses $L^p$ ($1\le p<\infty$) risk estimations on Besov balls by using wavelet bases, which generalizes Chesneau–Dewan–Doosti's theorems. It should be pointed out that a lower bound for the $L^p$ risk of all estimators is derived here for the first time. It turns out that the linear wavelet estimator is optimal for $r\ge p$ and the nonlinear one attains optimal estimation up to a logarithmic factor.
1.2 Notations and definitions
In this paper, we use the tensor product method to construct an orthonormal wavelet basis for $L^2(\mathbb{R}^2)$, which will be used in later discussions. With a one-dimensional Daubechies scaling function $D_{2N}$ and a wavelet function $\psi_{2N}$ ($\psi_{2N}$ can be constructed from the scaling function $D_{2N}$), we construct two-dimensional tensor product wavelets $\varphi,\psi^1,\psi^2$, and $\psi^3$ as follows:
$$\varphi(x,y):=D_{2N}(x)D_{2N}(y),\qquad \psi^1(x,y):=D_{2N}(x)\psi_{2N}(y),$$
$$\psi^2(x,y):=\psi_{2N}(x)D_{2N}(y),\qquad \psi^3(x,y):=\psi_{2N}(x)\psi_{2N}(y).$$
Then $\varphi$ and $\psi^i$ ($i=1,2,3$) are compactly supported in the time domain, because the Daubechies functions $D_{2N}$ and $\psi_{2N}$ are [5,8].
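Denote
$$\varphi_{j,k}(x,y):=2^{j}\varphi\bigl(2^{j}x-k_1,2^{j}y-k_2\bigr),\qquad \psi^{i}_{j,k}(x,y):=2^{j}\psi^{i}\bigl(2^{j}x-k_1,2^{j}y-k_2\bigr)$$
for $k=(k_1,k_2)\in\mathbb{Z}^2$ and $i=1,2,3$. Then for each $f\in L^2(\mathbb{R}^2)$,
$$f=\sum_{k\in\mathbb{Z}^2}\alpha_{j_0,k}\varphi_{j_0,k}+\sum_{j=j_0}^{\infty}\sum_{i=1}^{3}\sum_{k\in\mathbb{Z}^2}\beta^{i}_{j,k}\psi^{i}_{j,k}$$
holds in the $L^2$ sense, where $\alpha_{j,k}:=\langle f,\varphi_{j,k}\rangle$ and $\beta^{i}_{j,k}:=\langle f,\psi^{i}_{j,k}\rangle$. As usual, let $P_j$ be the orthogonal projection operator defined by
$$P_jf:=\sum_{k\in\mathbb{Z}^2}\langle f,\varphi_{j,k}\rangle\varphi_{j,k}.$$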
One of the advantages of wavelet bases is that they can characterize Besov spaces, which contain Hölder spaces and $L^2$-Sobolev spaces as particular examples. Throughout the paper, we work within a Besov space on a compact subset of $\mathbb{R}^2$. The following lemma gives equivalent definitions of those spaces, which are fundamental in our discussions.
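Lemma 1.1 ([13]) Let $\varphi$ be an $m$-regular orthonormal scaling function with the corresponding wavelets $\psi^i$ ($i=1,2,3$). If $f\in L^r(\mathbb{R}^2)$, $\alpha_{j,k}=\langle f,\varphi_{j,k}\rangle$, $\beta^i_{j,k}=\langle f,\psi^i_{j,k}\rangle$, and $1\le r,q\le\infty$, $0<s<m$, then the following assertions are equivalent:
(i) $f\in B^s_{r,q}(\mathbb{R}^2)$; (ii) $\{2^{js}\|P_{j+1}f-P_jf\|_r\}_{j\ge0}\in l^q$; (iii) $\bigl\|\{2^{j(s+1-\frac{2}{r})}\|\beta_{j,\cdot}\|_r\}_{j\ge0}\bigr\|_q<\infty$.
The Besov norm of $f$ can be defined by
$$\|f\|_{B^s_{r,q}}:=\|\alpha_{j_0,\cdot}\|_r+\bigl\|\{2^{j(s+1-\frac{2}{r})}\|\beta_{j,\cdot}\|_r\}_{j\ge j_0}\bigr\|_q,$$
where $\|\alpha_{j_0,\cdot}\|_r^r:=\sum_{k\in\mathbb{Z}^2}|\alpha_{j_0,k}|^r$ and $\|\beta_{j,\cdot}\|_r^r:=\sum_{i=1}^{3}\sum_{k\in\mathbb{Z}^2}|\beta^i_{j,k}|^r$.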
Here and further, $A\lesssim B$ means that $A\le CB$ for some constant $C>0$ independent of $A$ and $B$, $A\gtrsim B$ means $B\lesssim A$, and $A\sim B$ stands for both $A\lesssim B$ and $B\lesssim A$.
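Remark 1.1 By (i) and (ii) of Lemma 1.1 we observe that
$$\|P_jf-f\|_r=\Bigl\|\sum_{l=j}^{\infty}(P_{l+1}f-P_lf)\Bigr\|_r\le\sum_{l=j}^{\infty}\|P_{l+1}f-P_lf\|_r\lesssim\sum_{l=j}^{\infty}2^{-ls}\lesssim2^{-js}$$
for $f\in B^s_{r,q}(\mathbb{R}^2)$. Hence
$$\|P_jf-f\|_r\lesssim2^{-js}.\tag{1.1}$$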
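Remark 1.2 When $r\le p$, Lemma 1.1 (i) and (iii) imply that, for $s'-\frac{2}{p}=s-\frac{2}{r}>0$,
$$B^s_{r,q}(\mathbb{R}^2)\hookrightarrow B^{s'}_{p,q}(\mathbb{R}^2),$$
where $A\hookrightarrow B$ stands for a Banach space $A$ continuously embedded in another Banach space $B$. More precisely, $\|u\|_B\le C\|u\|_A$ ($u\in A$) for some constant $C>0$.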
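Lemma 1.2 ([13]) Let $\varphi\in L^2(\mathbb{R}^2)$ be a scaling function or a wavelet with $\sup_{x\in\mathbb{R}^2}\sum_{k\in\mathbb{Z}^2}|\varphi(x-k)|<\infty$. Then, for $\lambda=\{\lambda_k\}\in l^p(\mathbb{Z}^2)$ and $1\le p\le\infty$,
$$\Bigl\|\sum_{k\in\mathbb{Z}^2}\lambda_k\varphi_{j,k}\Bigr\|_p\sim2^{j(1-\frac{2}{p})}\|\lambda\|_p.$$
Here $\|\lambda\|_p$ is the $l^p(\mathbb{Z}^2)$ norm of $\lambda\in l^p(\mathbb{Z}^2)$:
$$\|\lambda\|_p:=\begin{cases}\bigl(\sum_{k\in\mathbb{Z}^2}|\lambda_k|^p\bigr)^{1/p}&\text{if }p<\infty,\\ \sup_{k\in\mathbb{Z}^2}|\lambda_k|&\text{if }p=\infty.\end{cases}$$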
1.3 Main results
In this subsection, we state our main results and discuss relations to some other work. To do that, we propose a new bivariate function $f^*(x,y)$, which improves on the one in [2]. Define
$$f^*(x,y):=\sum_{v=1}^{m}u(y,v)P(Y=v)f(x\mid Y=v)$$
with
$$u(y,v)=\begin{cases}\dfrac{1}{1+e^{\frac{1}{y-v}+\frac{1}{y-v+1}}}\,1_{(v-1,v)}(y)+\dfrac{e^{\frac{1}{y-v}+\frac{1}{y-v-1}}}{1+e^{\frac{1}{y-v}+\frac{1}{y-v-1}}}\,1_{(v,v+1)}(y),&y\ne v,\\[2mm]1,&y=v,\end{cases}$$
where $1_D$ is the indicator function of a set $D$.
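The weight $u(y,v)$ and the surrogate density $f^*$ can be evaluated numerically. The following is a minimal sketch, not taken from the paper: the function names `u` and `f_star`, the list `probs` (holding $P(Y=v)$ for $v=1,\ldots,m$), and the callable `cond_density` (standing for $f(x\mid Y=v)$) are illustrative assumptions. The logistic form is used only to avoid overflow of the exponentials near the ends of the intervals.

```python
import math


def _logistic(t):
    """Numerically stable value of 1 / (1 + exp(-t))."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def u(y, v):
    """Smooth weight u(y, v): equals 1 at y = v, vanishes outside (v - 1, v + 1)."""
    d = y - v
    if d == 0:
        return 1.0
    if -1 < d < 0:
        t = 1.0 / d + 1.0 / (d + 1.0)
        return _logistic(-t)      # 1 / (1 + e^t) on (v - 1, v)
    if 0 < d < 1:
        t = 1.0 / d + 1.0 / (d - 1.0)
        return _logistic(t)       # e^t / (1 + e^t) on (v, v + 1)
    return 0.0


def f_star(x, y, probs, cond_density):
    """f*(x, y) = sum over v = 1..m of u(y, v) P(Y = v) f(x | Y = v).

    probs[v - 1] is P(Y = v); cond_density(x, v) is f(x | Y = v) (both hypothetical inputs).
    """
    return sum(u(y, v) * pv * cond_density(x, v)
               for v, pv in enumerate(probs, start=1))
```

At an integer point $y=v$ only the $v$-th term survives, so `f_star(x, v, ...)` returns $P(Y=v)f(x\mid Y=v)$, which is the identity $f^*(x,v)=f(x,v)$ used in the text.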
The construction of $f^*$ follows the idea proposed by Chesneau et al. [2] but differs from [2] in the weight: in [2], $u(y,v)$ is the characteristic function $1_{\{v-\frac{1}{2}\le y<v+\frac{1}{2}\}}$. A careful verification shows that our weight $u(y,v)$ is differentiable with respect to $y$ for each $v\in\{1,2,\ldots,m\}$. Replacing the characteristic function by a smooth weight makes $f^*$ continuous in $y$. It is easy to see that, for any $y=v\in\{1,2,\ldots,m\}$,
$$f^*(x,y)=f(x,v).$$
Hence the problem reduces to constructing an estimator of $f^*$. As in [2], we assume that $f^*$ belongs to the space $B^s_{r,q}(H,Q)$ or, equivalently, that $f^*$ belongs to the Besov ball
$$B^s_{r,q}(H):=\bigl\{f: f\in B^s_{r,q}(\mathbb{R}^2)\text{ and }\|f\|_{B^s_{r,q}}\le H\bigr\}$$
and that the support of $f^*(x,\cdot)$ is contained in $[-Q,Q]$ for fixed $v$ ($Q>0$, $v=1,2,\ldots,m$). To introduce the wavelet estimator, we need the wavelet coefficient estimators of $\alpha_{j,k}$ and $\beta^i_{j,k}$:
$$\hat\alpha_{j,k}=\frac{1}{n}\sum_{l=1}^{n}\int_{\mathbb{R}}\varphi_{j,k}(X_l,y)u(y,Y_l)\,dy,\qquad \hat\beta^i_{j,k}=\frac{1}{n}\sum_{l=1}^{n}\int_{\mathbb{R}}\psi^i_{j,k}(X_l,y)u(y,Y_l)\,dy.\tag{1.2}$$
Define $\wedge_{j_0}:=\{k\in\mathbb{Z}^2:\operatorname{supp}f^*\cap\operatorname{supp}\varphi_{j_0,k}\ne\emptyset\}$. When $f^*$ and $\varphi$ have compact supports, the cardinality of $\wedge_j$ satisfies $|\wedge_j|\lesssim2^{2j}$. Then the linear wavelet estimator of $f^*$ is given as follows:
$$\hat f^{lin}_n(x,y):=\sum_{k\in\wedge_{j_0}}\hat\alpha_{j_0,k}\varphi_{j_0,k}(x,y),\tag{1.3}$$
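To make (1.2) and (1.3) concrete, here is a small sketch under simplifying assumptions that are not the paper's setup: the Haar scaling function (the Daubechies case $N=1$) replaces a general $D_{2N}$, the integral in $y$ is approximated by a midpoint rule, and the index set `ks` is supplied by the caller. All function names are hypothetical.

```python
import math


def u(y, v):
    """Smooth weight from the definition of f*; supported on (v - 1, v + 1)."""
    d = y - v
    if d == 0:
        return 1.0
    if not (-1 < d < 1):
        return 0.0
    if d < 0:
        z = -(1.0 / d + 1.0 / (d + 1.0))   # u = 1 / (1 + e^{-z})
    else:
        z = 1.0 / d + 1.0 / (d - 1.0)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def haar(t):
    """Haar scaling function, used here as a stand-in for D_{2N}."""
    return 1.0 if 0.0 <= t < 1.0 else 0.0


def c(j, k2, v, nodes=2000):
    """Midpoint-rule value of the integral of 2^{j/2} haar(2^j y - k2) u(y, v) dy."""
    a, h = k2 / 2 ** j, 2.0 ** (-j) / nodes
    return 2 ** (j / 2) * h * sum(u(a + (i + 0.5) * h, v) for i in range(nodes))


def alpha_hat(sample, j, k):
    """Empirical scaling coefficient as in (1.2); sample is a list of (X_l, Y_l)."""
    k1, k2 = k
    total = sum(2 ** (j / 2) * haar(2 ** j * x - k1) * c(j, k2, y) for x, y in sample)
    return total / len(sample)


def f_lin(sample, j0, ks, x, y):
    """Linear estimator as in (1.3) at the point (x, y), summing over the index set ks."""
    return sum(alpha_hat(sample, j0, k)
               * 2 ** j0 * haar(2 ** j0 * x - k[0]) * haar(2 ** j0 * y - k[1])
               for k in ks)
```

With the Haar choice the tensor-product function factorizes, so each coefficient is a sample mean of a function of $X_l$ times a one-dimensional integral depending on $Y_l$; the same factorization is used in the proofs of Lemmas 2.1 and 2.2.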
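where $j_0$ is chosen such that $2^{j_0}\sim n^{\frac{1}{2s'+1}}$, $s':=s-(\frac{2}{r}-\frac{2}{p})_+$, and $x_+:=\max\{x,0\}$. To obtain a nonlinear estimator, we take $j_0$ and $j_1$ such that $2^{j_1}\sim\frac{n}{\ln n}$ and $2^{j_0}\sim n^{\frac{1}{2m+1}}$ with $m>s$. Define $\wedge_j:=\{k\in\mathbb{Z}^2:\operatorname{supp}f^*\cap\operatorname{supp}\psi^i_{j,k}\ne\emptyset\}$ and $\lambda_j:=\frac{T}{2}2^{-\frac{j}{2}}\sqrt{\frac{\ln n}{n}}$ (the constant $T$ is described in Lemma 2.3). Then the nonlinear estimator is given by
$$\hat f^{non}_n(x,y):=\sum_{k\in\wedge_{j_0}}\hat\alpha_{j_0,k}\varphi_{j_0,k}(x,y)+\sum_{j=j_0}^{j_1}\sum_{i=1}^{3}\sum_{k\in\wedge_j}\hat\beta^i_{j,k}1_{\{|\hat\beta^i_{j,k}|>\lambda_j\}}\psi^i_{j,k}(x,y).\tag{1.4}$$
From the definition of $\hat f^{non}_n$ we see that the nonlinear estimator is adaptive, since its construction does not depend on the indices $s$, $r$, $q$, and $H$.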
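The thresholding rule in (1.4) can be sketched as follows. This is an illustrative snippet only: the function names are hypothetical, the constant `T` must be supplied by the user (the text only asserts its existence in Lemma 2.3), and rounding the levels down to integers is an assumption, since $\sim$ fixes $j_0$ and $j_1$ only up to constants.

```python
import math


def threshold(j, n, T):
    """lambda_j = (T / 2) * 2^{-j/2} * sqrt(ln(n) / n)."""
    return 0.5 * T * 2.0 ** (-j / 2) * math.sqrt(math.log(n) / n)


def hard_threshold(beta_hat, j, n, T):
    """Keep an estimated wavelet coefficient only if it exceeds lambda_j in modulus."""
    return beta_hat if abs(beta_hat) > threshold(j, n, T) else 0.0


def level_range(n, m):
    """Integer levels with 2^{j0} ~ n^{1/(2m+1)} and 2^{j1} ~ n / ln(n) (rounded down)."""
    j0 = int(math.log2(n) / (2 * m + 1))
    j1 = int(math.log2(n / math.log(n)))
    return j0, j1
```

Neither `threshold` nor `level_range` uses $s$, $r$, $q$, or $H$, which is the adaptivity property noted in the text.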
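The following theorem gives a lower bound for the $L^p$ risk.
Theorem 1.1 Let $\hat f_n$ be an estimator of $f^*\in B^s_{r,q}(H)$ with $s>\frac{2}{r}$ and $r,q\ge1$. Then there exists $C>0$ such that, for $1\le p<\infty$,
$$\sup_{f^*\in B^s_{r,q}(H)}E\|\hat f_n-f^*\|_p^p\ge C\max\Bigl\{n^{-\frac{sp}{2s+1}},\Bigl(\frac{\ln n}{n}\Bigr)^{\frac{(s-\frac{2}{r}+\frac{2}{p})p}{2(s-\frac{2}{r})+1}}\Bigr\}.$$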
.
The upper bounds of the linear and nonlinear wavelet estimators are provided by Theorems 1.2 and 1.3, respectively.
Theorem 1.2 Let $\hat f^{lin}_n$ be the estimator of $f^*\in B^s_{r,q}(H,Q)$ defined by (1.3) with $1\le r,q<\infty$, $s>0$. If the density of $X$ is bounded, then for $r\ge p\ge1$ or $r\le p<\infty$ and $s>\frac{2}{r}$,
$$\sup_{f^*\in B^s_{r,q}(H,Q)}E\|\hat f^{lin}_n-f^*\|_p^p\lesssim n^{-\frac{ps'}{2s'+1}}$$
with $s'=s-(\frac{2}{r}-\frac{2}{p})_+$ and $x_+:=\max(x,0)$.
Remark 1.3 If $r\ge2$, $p=2$, and $s>0$, then $s'=s$ and Theorem 1.2 reduces to Theorem 4.1 in [2]. In addition, Theorem 1.2 does not make any restriction on $Q$, and so the assumptions are weaker than in [2]. Theorem 1.2 extends the corresponding theorem of [2] from $p=2$ to $p\in[1,\infty)$.
When $r\ge p$, $s'=s$ and the linear wavelet estimator $\hat f^{lin}_n$ attains optimality thanks to Theorems 1.1 and 1.2. However, the linear estimator does not offer optimal estimation for $r<p$, because $s'<s$ and $\frac{s'}{2s'+1}<\frac{s}{2s+1}$ in this case.
To give a suboptimal estimation for $r<p$, we need the nonlinear wavelet estimator defined by (1.4).
Theorem 1.3 Let $\hat f^{non}_n$ be the estimator of $f^*\in B^s_{r,q}(H,Q)$ defined by (1.4) with $1\le r,q<\infty$, $s>0$. If the density of $X$ is bounded, then for $r\ge p\ge1$ or $r\le p<\infty$ and $s>\frac{2}{r}$,
$$\sup_{f^*\in B^s_{r,q}(H,Q)}E\|\hat f^{non}_n-f^*\|_p^p\lesssim(\ln n)^p\Bigl(\frac{\ln n}{n}\Bigr)^{\alpha p}$$
with $\alpha:=\min\Bigl\{\frac{s}{2s+1},\frac{s-\frac{2}{r}+\frac{2}{p}}{2(s-\frac{2}{r})+1}\Bigr\}$.
Remark 1.4 Theorems 1.1 and 1.3 tell us that the nonlinear estimator is suboptimal up to a logarithmic factor. Moreover, if $p=2$ and $\{r\ge2,s>0\}$ or $\{1\le r<2,s>\frac{2}{r}\}$, then $\alpha=\frac{s}{2s+1}$, and Theorem 1.3 is the same as Theorem 4.2 in [2] up to a logarithmic factor. Hence Theorem 1.3 can be considered as an extension of Theorem 4.2 in [2] from $p=2$ to $p\in[1,\infty)$.
In particular, we can extend the theorems to the multidimensional case as in [3] by using the technique developed by [9]. It is a challenging problem to study the estimation of a multivariate continuous-discrete conditional density. We refer to [3] for further details.
2 Some lemmas
We shall show several lemmas in this section, which are needed for proofs of our main theorems.
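Lemma 2.1 Let $\hat\alpha_{j,k}$ and $\hat\beta^i_{j,k}$ be defined by (1.2). Then
$$E(\hat\alpha_{j,k})=\alpha_{j,k}\quad\text{and}\quad E(\hat\beta^i_{j,k})=\beta^i_{j,k}$$
for $j\ge j_0$, $k\in\mathbb{Z}^2$, and $i=1,2,3$.
Proof Write $\varphi_{j,k}(x,y)=\phi_{j,k_1}(x)\phi_{j,k_2}(y)$, where $\phi$ is the one-dimensional scaling function, and denote $c_{j,k}(v)=\int\phi_{j,k_2}(y)u(y,v)\,dy$. Then
$$\hat\alpha_{j,k}=\frac{1}{n}\sum_{i=1}^{n}\int\varphi_{j,k}(X_i,y)u(y,Y_i)\,dy=\frac{1}{n}\sum_{i=1}^{n}\phi_{j,k_1}(X_i)c_{j,k}(Y_i).$$
Since $(X_1,Y_1),(X_2,Y_2),\ldots,(X_n,Y_n)$ are independent and identically distributed, we have
$$\begin{aligned}E(\hat\alpha_{j,k})&=E\bigl[\phi_{j,k_1}(X_1)c_{j,k}(Y_1)\bigr]=E\bigl\{E\bigl[\phi_{j,k_1}(X_1)c_{j,k}(Y_1)\mid Y_1\bigr]\bigr\}\\&=E\bigl\{c_{j,k}(Y_1)E\bigl[\phi_{j,k_1}(X_1)\mid Y_1\bigr]\bigr\}=E\Bigl\{c_{j,k}(Y_1)\int\phi_{j,k_1}(x)f(x\mid Y_1)\,dx\Bigr\}\\&=\sum_{v=1}^{m}P(Y_1=v)c_{j,k}(v)\int\phi_{j,k_1}(x)f(x\mid Y_1=v)\,dx\\&=\iint\sum_{v=1}^{m}P(Y_1=v)u(y,v)f(x\mid Y_1=v)\phi_{j,k_1}(x)\phi_{j,k_2}(y)\,dx\,dy\\&=\iint f^*(x,y)\varphi_{j,k}(x,y)\,dx\,dy=\alpha_{j,k}.\end{aligned}$$
Similarly to the previous arguments, $E(\hat\beta^i_{j,k})=\beta^i_{j,k}$. The proof of Lemma 2.1 is done.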
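To show Lemma 2.2, we introduce Rosenthal's inequality.
Rosenthal's inequality ([8]) Let $X_1,X_2,\ldots,X_n$ be independent random variables such that $EX_l=0$ and $E|X_l|^p<\infty$ ($l=1,2,\ldots,n$). Then, with $C_p>0$,
$$E\Bigl|\sum_{l=1}^{n}X_l\Bigr|^p\le\begin{cases}C_p\bigl[\sum_{l=1}^{n}E|X_l|^p+\bigl(\sum_{l=1}^{n}E|X_l|^2\bigr)^{p/2}\bigr],&p\ge2,\\ C_p\bigl(\sum_{l=1}^{n}E|X_l|^2\bigr)^{p/2},&1\le p\le2.\end{cases}$$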
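Lemma 2.2 Let $\hat\alpha_{j,k}$ and $\hat\beta^i_{j,k}$ be defined by (1.2). If the density of $X$ is bounded, then there exists a constant $C>0$ such that
$$E|\hat\alpha_{j,k}-\alpha_{j,k}|^p\le C2^{-\frac{p}{2}j}n^{-\frac{p}{2}}\quad\text{and}\quad E|\hat\beta^i_{j,k}-\beta^i_{j,k}|^p\le C2^{-\frac{p}{2}j}n^{-\frac{p}{2}}$$
for $1\le p<\infty$ and $2^j\le n$.
Proof We only prove the first inequality, since the second one is similar. By the definition of $\hat\alpha_{j,k}$,
$$\hat\alpha_{j,k}=\frac{1}{n}\sum_{l=1}^{n}\int_{\mathbb{R}}\varphi_{j,k}(X_l,y)u(y,Y_l)\,dy=\frac{1}{n}\sum_{l=1}^{n}\phi_{j,k_1}(X_l)c_{j,k_2}(Y_l),$$
where $c_{j,k_2}(Y_l):=\int_{\mathbb{R}}\phi_{j,k_2}(y)u(y,Y_l)\,dy$, and $\phi$ is the one-dimensional Daubechies scaling function $D_{2N}$. Since $|u(y,v)|\le2$, we obtain that
$$|c_{j,k_2}(Y_l)|\le\int_{\mathbb{R}}|\phi_{j,k_2}(y)u(y,Y_l)|\,dy\lesssim2^{-\frac{j}{2}}\|\phi\|_1\tag{2.1}$$
and
$$E|\phi_{j,k_1}(X_l)c_{j,k_2}(Y_l)|^p\lesssim2^{-\frac{p}{2}j}E|\phi_{j,k_1}(X_l)|^p\lesssim2^{-\frac{p}{2}j}\int_{\mathbb{R}}|\phi_{j,k_1}(x)|^pf_X(x)\,dx\lesssim2^{-j}\tag{2.2}$$
due to the boundedness of $f_X$. Define $\xi_l:=\phi_{j,k_1}(X_l)c_{j,k_2}(Y_l)-\alpha_{j,k}$. Then
$$E|\xi_l|^p=E|\phi_{j,k_1}(X_l)c_{j,k_2}(Y_l)-\alpha_{j,k}|^p\lesssim E|\phi_{j,k_1}(X_l)c_{j,k_2}(Y_l)|^p+E|\alpha_{j,k}|^p.\tag{2.3}$$
It follows from Lemma 2.1 and Jensen's inequality that
$$E|\alpha_{j,k}|^p=\bigl|E\bigl[\phi_{j,k_1}(X_l)c_{j,k_2}(Y_l)\bigr]\bigr|^p\le E|\phi_{j,k_1}(X_l)c_{j,k_2}(Y_l)|^p.$$
Hence (2.3) reduces to
$$E|\xi_l|^p\lesssim E|\phi_{j,k_1}(X_l)c_{j,k_2}(Y_l)|^p\lesssim2^{-j}\tag{2.4}$$
thanks to (2.2). By the definition of $\hat\alpha_{j,k}$ and $\xi_l$, $\hat\alpha_{j,k}-\alpha_{j,k}=\frac{1}{n}\sum_{l=1}^{n}\xi_l$, where $\xi_1,\xi_2,\ldots,\xi_n$ are independent because $(X_1,Y_1),(X_2,Y_2),\ldots,(X_n,Y_n)$ are. On the other hand, Lemma 2.1 implies $E(\xi_l)=0$. Then Rosenthal's inequality leads to
$$E|\hat\alpha_{j,k}-\alpha_{j,k}|^p=E\Bigl|\frac{1}{n}\sum_{l=1}^{n}\xi_l\Bigr|^p\lesssim\begin{cases}n^{-p}\bigl[\sum_{l=1}^{n}E|\xi_l|^p+\bigl(\sum_{l=1}^{n}E|\xi_l|^2\bigr)^{\frac{p}{2}}\bigr],&p\ge2,\\ n^{-p}\bigl(\sum_{l=1}^{n}E|\xi_l|^2\bigr)^{\frac{p}{2}},&1\le p\le2.\end{cases}\tag{2.5}$$
By (2.4) we know that
$$n^{-p}\Bigl(\sum_{l=1}^{n}E|\xi_l|^2\Bigr)^{\frac{p}{2}}\lesssim n^{-p}\bigl(n2^{-j}\bigr)^{\frac{p}{2}}=n^{-\frac{p}{2}}2^{-\frac{p}{2}j}$$
for $1\le p<2$ and
$$n^{-p}\Bigl[\sum_{l=1}^{n}E|\xi_l|^p+\Bigl(\sum_{l=1}^{n}E|\xi_l|^2\Bigr)^{\frac{p}{2}}\Bigr]\lesssim n^{-p}\bigl[n2^{-j}+n^{\frac{p}{2}}2^{-\frac{p}{2}j}\bigr]\lesssim n^{-\frac{p}{2}}2^{-\frac{p}{2}j}$$
for $p\ge2$ thanks to the assumption $2^j\le n$. Combining these with (2.5), we obtain the desired conclusion
$$E|\hat\alpha_{j,k}-\alpha_{j,k}|^p\lesssim2^{-\frac{p}{2}j}n^{-\frac{p}{2}}.$$
This completes the proof.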
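The last step for $p\ge2$ rests on an elementary inequality: with $a=n2^{-j}\ge1$ one has $a\le a^{p/2}$, hence $n^{-p}[a+a^{p/2}]\le2n^{-p/2}2^{-pj/2}$. The following sketch, with hypothetical function names, checks this numerically over a small grid; it is a sanity check of the arithmetic only, not a proof and not part of the paper.

```python
def rosenthal_rhs(n, j, p):
    """n^{-p} [ n 2^{-j} + (n 2^{-j})^{p/2} ]: the p >= 2 bound, constants dropped."""
    a = n * 2.0 ** (-j)
    return n ** (-p) * (a + a ** (p / 2))


def target(n, j, p):
    """2^{-pj/2} n^{-p/2}: the rate claimed in Lemma 2.2."""
    return 2.0 ** (-p * j / 2) * n ** (-p / 2)


def worst_ratio(ns, ps):
    """Largest value of rosenthal_rhs / target over all levels j with 2^j <= n."""
    worst = 0.0
    for n in ns:
        j = 0
        while 2 ** j <= n:
            for p in ps:
                worst = max(worst, rosenthal_rhs(n, j, p) / target(n, j, p))
            j += 1
    return worst
```

The ratio equals $1+a^{1-p/2}$, so it is at most $2$ exactly when $a\ge1$, which is where the assumption $2^j\le n$ enters.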
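To prove Lemma 2.3, we need the well-known Bernstein inequality.
Bernstein's inequality ([8]) Let $X_1,X_2,\ldots,X_n$ be i.i.d. random variables with $E(X_i)=0$ and $\|X_i\|_\infty\le M$. Then, for each $\gamma>0$,
$$P\Bigl(\Bigl|\frac{1}{n}\sum_{i=1}^{n}X_i\Bigr|>\gamma\Bigr)\le2\exp\Bigl(-\frac{n\gamma^2}{2[E(X_i^2)+\|X\|_\infty\gamma/3]}\Bigr).$$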
The next lemma is an extension of Proposition 4.2 in [2].
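Lemma 2.3 Let $2^j\le\frac{n}{\ln n}$, and let $\hat\beta^i_{j,k}$ ($i=1,2,3$) be defined in (1.2). If the density of $X$ is bounded, then for each $\varepsilon>0$, there exists $T>0$ such that, for $j\ge0$ and $k\in\mathbb{Z}^2$,
$$P\Bigl(|\hat\beta^i_{j,k}-\beta^i_{j,k}|>\frac{T}{2}2^{-\frac{1}{2}j}\sqrt{\frac{\ln n}{n}}\Bigr)\lesssim2^{-\varepsilon j}.\tag{2.6}$$
Proof We only show (2.6) for $i=1$. By the definition of $\hat\beta^1_{j,k}$, $\hat\beta^1_{j,k}=\frac{1}{n}\sum_{l=1}^{n}\int_{\mathbb{R}}\psi^1_{j,k}(X_l,y)u(y,Y_l)\,dy$, and
$$\hat\beta^1_{j,k}-\beta^1_{j,k}=\frac{1}{n}\sum_{l=1}^{n}\bigl[\phi_{j,k_1}(X_l)d_{j,k_2}(Y_l)-\beta^1_{j,k}\bigr],$$
where $d_{j,k_2}(Y_l):=\int_{\mathbb{R}}\psi_{j,k_2}(y)u(y,Y_l)\,dy$ ($\phi$, $\psi$ stand for the one-dimensional Daubechies scaling function and wavelet function, respectively). Define $\eta_l:=\phi_{j,k_1}(X_l)d_{j,k_2}(Y_l)-\beta^1_{j,k}$. Then $\hat\beta^1_{j,k}-\beta^1_{j,k}=\frac{1}{n}\sum_{l=1}^{n}\eta_l$ and $E(\eta_l)=0$ because $\beta^1_{j,k}=E(\hat\beta^1_{j,k})=E[\phi_{j,k_1}(X_l)d_{j,k_2}(Y_l)]$.
Using (2.1) with $\psi$ instead of $\phi$, we get $|d_{j,k_2}(Y_l)|\lesssim2^{-\frac{j}{2}}$. Note that $|\phi_{j,k_1}(X_l)|:=2^{\frac{j}{2}}|\phi(2^jX_l-k_1)|\le2^{\frac{j}{2}}\|\phi\|_\infty$. Then $|\phi_{j,k_1}(X_l)d_{j,k_2}(Y_l)|\lesssim1$ and $|\beta^1_{j,k}|=|E[\phi_{j,k_1}(X_l)d_{j,k_2}(Y_l)]|\lesssim1$. Hence
$$|\eta_l|\le|\phi_{j,k_1}(X_l)d_{j,k_2}(Y_l)|+|\beta^1_{j,k}|\lesssim1.\tag{2.7}$$
By replacing $c_{j,k_2}$ and $\alpha_{j,k}$ with $d_{j,k_2}$ and $\beta^1_{j,k}$, respectively, arguments similar to (2.1)–(2.4) show that
$$E|\eta_l|^2\lesssim2^{-j}.\tag{2.8}$$
Because $\eta_1,\eta_2,\ldots,\eta_n$ are i.i.d. and $E(\eta_l)=0$ ($l=1,2,\ldots,n$), Bernstein's inequality tells us that
$$P\Bigl(|\hat\beta^1_{j,k}-\beta^1_{j,k}|=\Bigl|\frac{1}{n}\sum_{l=1}^{n}\eta_l\Bigr|>\frac{T}{2}2^{-\frac{1}{2}j}\sqrt{\frac{\ln n}{n}}\Bigr)\le2\exp\Bigl(-\frac{n\lambda_j^2}{2[E(\eta_l^2)+\frac{\lambda_j}{3}\|\eta\|_\infty]}\Bigr)\tag{2.9}$$
with $\lambda_j=\frac{T}{2}2^{-\frac{1}{2}j}\sqrt{\frac{\ln n}{n}}$. This with (2.7)–(2.8), written as $\|\eta\|_\infty\le C_2$ and $E(\eta_l^2)\le C_12^{-j}$, implies
$$\frac{n\lambda_j^2}{2[E(\eta_l^2)+\frac{\lambda_j}{3}\|\eta\|_\infty]}\ge\frac{T^2\ln n}{8\bigl(C_1+\frac{C_2}{6}T2^{\frac{j}{2}}\sqrt{\frac{\ln n}{n}}\bigr)}\ge\frac{T^2\ln n}{8(C_1+\frac{C_2}{6}T)}$$
because $2^{\frac{j}{2}}\sqrt{\frac{\ln n}{n}}\le1$ by the assumption $2^j\le\frac{n}{\ln n}$. Note that $\ln n>j\ln2$ due to $n\ge2^j\ln n>2^j$. Hence
$$\frac{n\lambda_j^2}{2[E(\eta_l^2)+\frac{\lambda_j}{3}\|\eta\|_\infty]}\ge\frac{T^2\ln2}{8(C_1+\frac{C_2}{6}T)}\,j>\varepsilon j$$
by choosing $T>0$ such that $\frac{T^2\ln2}{8(C_1+\frac{C_2}{6}T)}>\varepsilon$. Then (2.9) reduces to
$$P\Bigl(|\hat\beta^1_{j,k}-\beta^1_{j,k}|>\frac{T}{2}2^{-\frac{1}{2}j}\sqrt{\frac{\ln n}{n}}\Bigr)\lesssim2^{-\varepsilon j},$$
which concludes (2.6) with $i=1$. Similarly, the conclusions with $i=2,3$ hold. This completes the proof.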
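The condition on $T$ at the end of the proof is a quadratic inequality: $\frac{T^2\ln2}{8(C_1+C_2T/6)}>\varepsilon$ is equivalent to $\ln2\cdot T^2-\frac{4\varepsilon C_2}{3}T-8\varepsilon C_1>0$. A minimal sketch of how an admissible $T$ could be computed is given below, assuming numerical values for the unspecified constants $C_1$ and $C_2$; the function names are hypothetical.

```python
import math


def exponent_rate(T, c1, c2):
    """T^2 ln 2 / (8 (C1 + C2 T / 6)): the factor multiplying j in the Bernstein exponent."""
    return T * T * math.log(2) / (8.0 * (c1 + c2 * T / 6.0))


def critical_T(eps, c1, c2):
    """Positive root of ln(2) T^2 - (4 eps C2 / 3) T - 8 eps C1 = 0.

    Every T strictly larger than this root satisfies exponent_rate(T, c1, c2) > eps.
    """
    a, b, c = math.log(2), 4.0 * eps * c2 / 3.0, 8.0 * eps * c1
    return (b + math.sqrt(b * b + 4.0 * a * c)) / (2.0 * a)
```

Since `exponent_rate` is increasing in $T>0$, a larger $\varepsilon$ forces a larger threshold constant; the proof of Theorem 1.3 later takes $\varepsilon>p$.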
At the end of this section, we introduce two classical lemmas, which are needed for the proof of lower bound.
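Lemma 2.4 (Varshamov–Gilbert lemma, [11]) Let $\Theta:=\{\varepsilon=(\varepsilon_1,\varepsilon_2,\ldots,\varepsilon_m),\varepsilon_i\in\{0,1\}\}$. Then there exists a subset $\{\varepsilon^0,\varepsilon^1,\ldots,\varepsilon^T\}$ of $\Theta$ with $\varepsilon^0=(0,0,\ldots,0)$ such that $T\ge2^{\frac{m}{8}}$ and
$$\sum_{k=1}^{m}|\varepsilon^i_k-\varepsilon^j_k|\ge\frac{m}{8}\quad(0\le i\ne j\le T).$$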
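For small $m$ a family as in Lemma 2.4 can be found by a simple greedy search. The sketch below is only an illustration of what the lemma asserts; it is not the argument behind the lemma, and the function names and the integer encoding of binary words are assumptions.

```python
def hamming(a, b):
    """Hamming distance between two binary words encoded as non-negative integers."""
    return bin(a ^ b).count("1")


def greedy_packing(m, size):
    """Greedily collect up to `size` words of length m, starting with the zero word,
    whose pairwise Hamming distances are all at least m / 8."""
    words = [0]
    for cand in range(1, 2 ** m):
        if len(words) == size:
            break
        if all(8 * hamming(cand, w) >= m for w in words):
            words.append(cand)
    return words
```

For $m=24$ the lemma asks for $T\ge2^{3}=8$ words besides $\varepsilon^0$ with pairwise distance at least $3$, that is, `size = 9`; the greedy scan finds them after inspecting only a few candidates.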
To state Fano's lemma, we introduce a concept: when $P$ is absolutely continuous with respect to $Q$ (denoted by $P\ll Q$), the Kullback divergence between two measures $P$ and $Q$ is defined by
$$K(P,Q):=\int p(x)\ln\frac{p(x)}{q(x)}\,dx,$$
where $p(x)$ and $q(x)$ are the density functions of $P$ and $Q$, respectively.
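A discrete analogue makes the definition concrete and previews the bound used later in the lower-bound proof: since $\ln u\le u-1$, one has $K(P,Q)\le\sum_i(p_i-q_i)^2/q_i$. The sketch below uses probability vectors instead of densities, which is an assumption made purely for illustration; the function names are hypothetical.

```python
import math


def kl(p, q):
    """Kullback divergence sum_i p_i ln(p_i / q_i) for probability vectors with q_i > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def chi2(p, q):
    """sum_i (p_i - q_i)^2 / q_i, the upper bound that follows from ln u <= u - 1."""
    return sum((pi - qi) ** 2 / qi for pi, qi in zip(p, q))
```

For instance, `kl([0.5, 0.3, 0.2], [0.25, 0.25, 0.5])` is nonnegative and does not exceed the corresponding `chi2` value.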
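Lemma 2.5 (Fano's lemma, [6]) Let $(\Omega,\mathcal{F},P_k)$ be probability spaces, and let $A_k\in\mathcal{F}$, $k=0,1,\ldots,m$. If $A_k\cap A_v=\emptyset$ for $0\le k\ne v\le m$, then with $A_k^c$ standing for the complement of $A_k$ and $K_m:=\inf_{0\le v\le m}\frac{1}{m}\sum_{k\ne v}K(P_k,P_v)$,
$$\sup_{0\le k\le m}P_k\bigl(A_k^c\bigr)\ge\min\Bigl\{\frac{1}{2},\sqrt{m}\exp\bigl(-3e^{-1}-K_m\bigr)\Bigr\},$$
where $K(P_k,P_v)$ is the Kullback distance of $P_k$ and $P_v$ ($k,v=0,1,\ldots,m$).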
3 Proofs of lower bounds
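We rewrite Theorem 1.1 as follows before giving its proof.
Theorem 3.1 Let $\hat f_n$ be an estimator of $f^*\in B^s_{r,q}(H)$ with $s>\frac{2}{r}$ and $1\le r,q\le\infty$. Then, for $1\le p<\infty$,
$$\sup_{f^*\in B^s_{r,q}(H)}E\|\hat f_n-f^*\|_p^p\gtrsim\max\Bigl\{n^{-\frac{sp}{2s+1}},\Bigl(\frac{\ln n}{n}\Bigr)^{\frac{(s-\frac{2}{r}+\frac{2}{p})p}{2(s-\frac{2}{r})+1}}\Bigr\}.$$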
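Proof As in Sect. 1, we take the two-dimensional tensor product wavelet
$$\psi^1(x,y):=D_{2N}(x)\psi_{2N}(y),$$
where $D_{2N}(\cdot)$ and $\psi_{2N}(\cdot)$ are the one-dimensional Daubechies scaling function and wavelet function, respectively. Then $\psi^1$ is $m$-regular ($m>s$) for large $N$, and
$$\operatorname{supp}\psi^1\subseteq[0,2N-1]\times[-N+1,N]$$
due to $\operatorname{supp}D_{2N}\subseteq[0,2N-1]$ and $\operatorname{supp}\psi_{2N}\subseteq[-N+1,N]$. Then there exists a compactly supported density function $g_0$ such that $\int_{\mathbb{R}^2}g_0(x,y)\,dx\,dy=1$, $g_0|_{[0,2N-1]\times[-N+1,N]}=c_0$, and $g_0\in B^s_{r,q}(H)$. Define $\Delta_j:=\Delta^1_j\times\Delta^2_j$ with
$$\Delta^1_j:=\{0,2N,4N,\ldots,2(2^j-1)N\},\qquad\Delta^2_j:=\{0,\pm2N,\pm4N,\ldots,\pm2(2^{j-1}-1)N\}.$$
Then $|\Delta_j|=2^j(2^j-1)\sim2^{2j}$ ($|\Delta_j|$ denotes the cardinality of $\Delta_j$). Denote $a_j:=2^{-(2s+1)j}$ and
$$\Lambda:=\Bigl\{g_\varepsilon(x,y)=g_0(x,y)+a_j\sum_{k\in\Delta_j}\varepsilon_k\psi^1_{j,k}(x,y),\ \varepsilon_k\in\{0,1\}\Bigr\}.$$
Obviously, the supports of $\psi^1_{j,k}$ and $\psi^1_{j,k'}$ are disjoint for $k\ne k'\in\Delta_j$ and $\operatorname{supp}\psi^1_{j,k}\subseteq\operatorname{supp}g_0$. When $(x,y)\in[0,2N-1]\times[-N+1,N]$,
$$g_\varepsilon\ge c_0-a_j\|\psi^1_{j,k}\|_\infty\ge c_0-2^{-2sj}\|\psi^1\|_\infty>0$$
for large $j$. On the other hand,
$$\int_{\mathbb{R}^2}g_\varepsilon(x,y)\,dx\,dy=\int_{\mathbb{R}^2}g_0(x,y)\,dx\,dy=1.$$
Moreover, $g_\varepsilon\in B^s_{r,q}(H)$. In fact, for $\varepsilon_k\in\{0,1\}$, $\sum_{k\in\Delta_j}|\varepsilon_k|^r\le2^{2j}$ and
$$2^{j(s+1-\frac{2}{r})}a_j\Bigl(\sum_{k\in\Delta_j}|\varepsilon_k|^r\Bigr)^{\frac{1}{r}}\le1.$$
By Lemma 1.1, $\|a_j\sum_{k\in\Delta_j}\varepsilon_k\psi^1_{j,k}\|_{B^s_{r,q}}\le H$. This with $g_0\in B^s_{r,q}(H)$ implies $g_\varepsilon\in B^s_{r,q}(H)$. According to Lemma 2.4 (Varshamov–Gilbert lemma), for $\Theta=\{\varepsilon=(\varepsilon_k)_{k\in\Delta_j},\varepsilon_k\in\{0,1\}\}$, there exists a subset $\{\varepsilon^{(0)},\varepsilon^{(1)},\ldots,\varepsilon^{(M)}\}$ of $\Theta$ such that $M\ge2^{\frac{2^{2j}}{8}}$, $\varepsilon^{(0)}=(0,0,\ldots,0)$, and for $m,n=0,1,\ldots,M$, $m\ne n$,
$$\sum_{k\in\Delta_j}\bigl|\varepsilon^{(m)}_k-\varepsilon^{(n)}_k\bigr|\ge\frac{2^{2j}}{8}.\tag{3.1}$$
Denote $\Lambda':=\{g_{\varepsilon^{(0)}},g_{\varepsilon^{(1)}},\ldots,g_{\varepsilon^{(M)}}\}$. Then $\Lambda'\subseteq\Lambda$, and for $g_{\varepsilon^{(m)}},g_{\varepsilon^{(n)}}\in\Lambda'$,
$$\|g_{\varepsilon^{(m)}}-g_{\varepsilon^{(n)}}\|_p^p=a_j^p\sum_{k\in\Delta_j}\bigl|\varepsilon^{(m)}_k-\varepsilon^{(n)}_k\bigr|^p\|\psi^1_{j,k}\|_p^p=2^{-2(sp+1)j}\|\psi^1\|_p^p\sum_{k\in\Delta_j}\bigl|\varepsilon^{(m)}_k-\varepsilon^{(n)}_k\bigr|^p,$$
since the supports of $\psi^1_{j,k}$ ($k\in\Delta_j$) are mutually disjoint. This with (3.1) leads to
$$\|g_{\varepsilon^{(m)}}-g_{\varepsilon^{(n)}}\|_p^p\ge C_12^{-2psj}:=\delta_j^p.$$
Define
$$A_{\varepsilon^{(i)}}:=\Bigl\{\|\hat f_n-g_{\varepsilon^{(i)}}\|_p<\frac{\delta_j}{2}\Bigr\},$$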
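$i=0,1,2,\ldots,M$. Then $A_{\varepsilon^{(m)}}\cap A_{\varepsilon^{(n)}}=\emptyset$ for $m\ne n$. Denote by $P^n_f$ the probability measure with the density $f^n(x,y):=\prod_{i=1}^{n}f(x_i,y_i)$. By the construction of $g_{\varepsilon^{(i)}}$, $P^n_{g_{\varepsilon^{(i)}}}\ll P^n_{g_0}$. Then it follows from Lemma 2.5 (Fano's lemma) that
$$\sup_{0\le i\le M}P^n_{g_{\varepsilon^{(i)}}}\Bigl(\|\hat f_n-g_{\varepsilon^{(i)}}\|_p\ge\frac{\delta_j}{2}\Bigr)\ge\sup_{0\le i\le M}P^n_{g_{\varepsilon^{(i)}}}\bigl(A^c_{\varepsilon^{(i)}}\bigr)\ge\min\Bigl\{\frac{1}{2},\sqrt{M}e^{-3e^{-1}}e^{-K_M}\Bigr\}.$$
Furthermore,
$$E\|\hat f_n-g_{\varepsilon^{(i)}}\|_p^p\ge\frac{\delta_j^p}{2^p}P^n_{g_{\varepsilon^{(i)}}}\Bigl(\|\hat f_n-g_{\varepsilon^{(i)}}\|_p\ge\frac{\delta_j}{2}\Bigr)\gtrsim2^{-2psj}P^n_{g_{\varepsilon^{(i)}}}\bigl(A^c_{\varepsilon^{(i)}}\bigr).$$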
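Taking $2^j\sim n^{\frac{1}{2(2s+1)}}$, we obtain that
$$\sup_{0\le i\le M}E\|\hat f_n-g_{\varepsilon^{(i)}}\|_p^p\gtrsim2^{-2psj}\sup_{0\le i\le M}P^n_{g_{\varepsilon^{(i)}}}\bigl(A^c_{\varepsilon^{(i)}}\bigr)\gtrsim n^{-\frac{ps}{2s+1}}\min\Bigl\{\frac{1}{2},\sqrt{M}e^{-3e^{-1}}e^{-K_M}\Bigr\}\tag{3.2}$$
with $K_M:=\inf_{0\le v\le M}\frac{1}{M}\sum_{i\ne v}K(P^n_{g_{\varepsilon^{(i)}}},P^n_{g_{\varepsilon^{(v)}}})$. By the definition of Kullback divergence,
$$\begin{aligned}K\bigl(P^n_{g_{\varepsilon^{(i)}}},P^n_{g_0}\bigr)&=\int_{\mathbb{R}^{2n}}\ln\frac{\prod_{l=1}^{n}g_{\varepsilon^{(i)}}(x_l,y_l)}{\prod_{l=1}^{n}g_0(x_l,y_l)}\prod_{l=1}^{n}g_{\varepsilon^{(i)}}(x_l,y_l)\,dx_1\,dy_1\,dx_2\,dy_2\cdots dx_n\,dy_n\\&=n\int_{\mathbb{R}^2}g_{\varepsilon^{(i)}}(x_1,y_1)\ln\frac{g_{\varepsilon^{(i)}}(x_1,y_1)}{g_0(x_1,y_1)}\,dx_1\,dy_1\\&\le n\int_{\mathbb{R}^2}g_{\varepsilon^{(i)}}(x_1,y_1)\Bigl[\frac{g_{\varepsilon^{(i)}}(x_1,y_1)}{g_0(x_1,y_1)}-1\Bigr]dx_1\,dy_1,\end{aligned}\tag{3.3}$$
where we applied the inequality $\ln u\le u-1$ for $u>0$ in the last step. Note that
$$\int_{\mathbb{R}^2}g_{\varepsilon^{(i)}}(x_1,y_1)\Bigl[\frac{g_{\varepsilon^{(i)}}(x_1,y_1)}{g_0(x_1,y_1)}-1\Bigr]dx_1\,dy_1=\int_{\mathbb{R}^2}\bigl[g_0(x_1,y_1)\bigr]^{-1}\bigl|g_{\varepsilon^{(i)}}(x_1,y_1)-g_0(x_1,y_1)\bigr|^2\,dx_1\,dy_1$$
and $g_0(x_1,y_1)=c_0$ for $(x_1,y_1)\in[0,2N-1]\times[-N+1,N]$. Combining this with the Parseval identity, we reduce (3.3) to
$$K\bigl(P^n_{g_{\varepsilon^{(i)}}},P^n_{g_0}\bigr)\le nc_0^{-1}a_j^2\Bigl\|\sum_{k\in\Delta_j}\varepsilon^{(i)}_k\psi^1_{j,k}(x,y)\Bigr\|_2^2=nc_0^{-1}a_j^2\sum_{k\in\Delta_j}\bigl|\varepsilon^{(i)}_k\bigr|^2\le nc_0^{-1}a_j^22^{2j}.\tag{3.4}$$
Hence
$$K_M\le\frac{1}{M}\sum_{i=1}^{M}K\bigl(P^n_{g_{\varepsilon^{(i)}}},P^n_{g_0}\bigr)\le c_0^{-1}na_j^22^{2j}.$$
On the other hand, $2^j\sim n^{\frac{1}{2(2s+1)}}$ implies $na_j^2\le C$. Then it follows from $M\ge2^{\frac{2^{2j}}{8}}=e^{2^{2j}\frac{\ln2}{8}}$ that
$$\sqrt{M}e^{-K_M}\ge e^{2^{2j}\frac{\ln2}{16}-c_0^{-1}C2^{2j}}\ge1$$
by choosing $C>0$ such that $C<\frac{\ln2}{16}c_0$. This with (3.2) leads to
$$\sup_{0\le i\le M}E\|\hat f_n-g_{\varepsilon^{(i)}}\|_p^p\gtrsim n^{-\frac{ps}{2s+1}}\min\Bigl\{\frac{1}{2},\sqrt{M}e^{-3e^{-1}}e^{-K_M}\Bigr\}\gtrsim n^{-\frac{ps}{2s+1}}.\tag{3.5}$$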
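Now, it remains to show that
$$\sup_{f^*\in B^s_{r,q}(H)}E\|\hat f_n-f^*\|_p^p\ge C\Bigl(\frac{\ln n}{n}\Bigr)^{\frac{(s-\frac{2}{r}+\frac{2}{p})p}{2(s-\frac{2}{r})+1}}.\tag{3.6}$$
Similarly to the proof of (3.5), we construct the family of density functions $\{g_k,k\in\Delta_j\}$ as follows:
$$g_k(x,y):=g_0(x,y)+a_j\psi^1_{j,k}(x,y),$$
where $a_j:=2^{-j(s+1-\frac{2}{r})}$. Obviously, $\int_{\mathbb{R}^2}g_k(x,y)\,dx\,dy=\int_{\mathbb{R}^2}g_0(x,y)\,dx\,dy=1$, and
$$g_k(x,y)|_{[0,2N-1]\times[-N+1,N]}\ge c_0-2^{-j(s-\frac{2}{r})}\|\psi^1\|_\infty>0$$
for large $j$ since $s>\frac{2}{r}$. Then $g_k$ is a bivariate density function for fixed $k\in\Delta_j$. From the proof of (3.5) we know that $g_0\in B^s_{r,q}(H)$. This with
$$\|a_j\psi^1_{j,k}\|_{B^s_{r,q}}\sim a_j2^{j(s+1-\frac{2}{r})}\le1$$
implies $g_k\in B^s_{r,q}(H)$ for $k\in\Delta_j$. To prove (3.6), we need to show that
$$\sup_{k\in\Delta_j}E\|\hat f_n-g_k\|_p^p\ge C\Bigl(\frac{\ln n}{n}\Bigr)^{\frac{(s-\frac{2}{r}+\frac{2}{p})p}{2(s-\frac{2}{r})+1}}.\tag{3.7}$$
When $k\ne k'\in\Delta_j$, $\operatorname{supp}\psi^1_{j,k}\cap\operatorname{supp}\psi^1_{j,k'}=\emptyset$ and
$$\|g_k-g_{k'}\|_p^p=a_j^p\|\psi^1_{j,k}-\psi^1_{j,k'}\|_p^p=2a_j^p2^{j(p-2)}\|\psi^1\|_p^p=2\cdot2^{-j(s-\frac{2}{r}+\frac{2}{p})p}\|\psi^1\|_p^p.$$
Moreover,
$$\|g_k-g_{k'}\|_p=2^{\frac{1}{p}}\|\psi^1\|_p2^{-j(s-\frac{2}{r}+\frac{2}{p})}:=\delta_j.$$
Define $B_k:=\{\|\hat f_n-g_k\|_p<\frac{\delta_j}{2}\}$. Then $B_k\cap B_{k'}=\emptyset$ ($k\ne k'$). According to Lemma 2.5 (Fano's lemma), we find that
$$\sup_{k\in\Delta_j}P^n_{g_k}\Bigl(\|\hat f_n-g_k\|_p\ge\frac{\delta_j}{2}\Bigr)\ge\min\Bigl\{\frac{1}{2},\sqrt{M}e^{-3e^{-1}}e^{-K_M}\Bigr\},\tag{3.8}$$
where $M=|\Delta_j|$ and $K_M:=\inf_{0\le v\le M}\frac{1}{M}\sum_{k\ne v}K(P^n_{g_k},P^n_{g_v})\le\frac{1}{M}\sum_{k\ne0}K(P^n_{g_k},P^n_{g_0})$. Similarly to (3.3)–(3.4), we conclude that
$$K\bigl(P^n_{g_k},P^n_{g_0}\bigr)\le n\int_{\mathbb{R}^2}\bigl[g_0(x,y)\bigr]^{-1}\bigl|g_k(x,y)-g_0(x,y)\bigr|^2\,dx\,dy\le c_0^{-1}C_1na_j^2.$$
Hence $K_M\le c_0^{-1}C_1na_j^2$. By taking $2^j\sim(\frac{n}{\ln n})^{\frac{1}{2(s-\frac{2}{r})+1}}$ we obtain that $\ln2^j\ge C'\ln n$ and $e^{-K_M}\ge e^{-c_0^{-1}C_1na_j^2}\ge e^{-c_0^{-1}C\ln n}$, thanks to $na_j^2\le C_2\ln n$ ($C=C_1C_2$). Moreover, choosing $C_1$ and $C$ such that $C'>c_0^{-1}C$, we have
$$\sqrt{M}e^{-3e^{-1}}e^{-K_M}\gtrsim e^{\ln2^j}e^{-3e^{-1}}e^{-K_M}\ge e^{C'\ln n-c_0^{-1}C\ln n-3e^{-1}}\gtrsim1$$
due to $M\sim2^{2j}$. This with (3.8) implies $\sup_{k\in\Delta_j}P^n_{g_k}(\|\hat f_n-g_k\|_p\ge\frac{\delta_j}{2})\gtrsim1$. Furthermore,
$$\sup_{k\in\Delta_j}E\|\hat f_n-g_k\|_p^p\ge\frac{\delta_j^p}{2^p}\sup_{k\in\Delta_j}P^n_{g_k}\Bigl(\|\hat f_n-g_k\|_p\ge\frac{\delta_j}{2}\Bigr)\gtrsim\delta_j^p.$$
Then the desired conclusion (3.7) follows from $\delta_j:=2^{\frac{1}{p}}\|\psi^1\|_p2^{-j(s-\frac{2}{r}+\frac{2}{p})}$ and the choice of $2^j\sim(\frac{n}{\ln n})^{\frac{1}{2(s-\frac{2}{r})+1}}$. This completes the proof.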
4 Proofs of upper bounds
In this section, we prove the upper bounds for the wavelet estimators. The result for the linear one is derived first. We restate and prove Theorem 1.2 as Theorem 4.1.
Theorem 4.1 Let $\hat f^{lin}_n$ be the linear estimator of $f^*\in B^s_{r,q}(H,Q)$ defined in (1.3) with $1\le r,q<\infty$, $s>0$. If the density of $X$ is bounded, then for $\{r\ge p\ge1\}$ or $\{r\le p<\infty$ and $s>\frac{2}{r}\}$,
$$\sup_{f^*\in B^s_{r,q}(H,Q)}E\|\hat f^{lin}_n-f^*\|_p^p\lesssim n^{-\frac{ps'}{2s'+1}}$$
with $s'=s-(\frac{2}{r}-\frac{2}{p})_+$ and $x_+:=\max\{x,0\}$.
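Proof When $r\le p$, $s':=s-(\frac{2}{r}-\frac{2}{p})_+=s-\frac{2}{r}+\frac{2}{p}$ and $B^s_{r,q}(\mathbb{R}^2)\hookrightarrow B^{s'}_{p,q}(\mathbb{R}^2)$ thanks to Remark 1.2. Then
$$\sup_{f^*\in B^s_{r,q}(H,Q)}E\|\hat f^{lin}_n-f^*\|_p^p\lesssim\sup_{f^*\in B^{s'}_{p,q}(H',Q)}E\|\hat f^{lin}_n-f^*\|_p^p.$$
When $r>p$ and $f^*$ has a compact support, so does $\hat f^{lin}_n$, because $\varphi$ has the same property. By the Hölder inequality,
$$\sup_{f^*\in B^s_{r,q}(H,Q)}E\|\hat f^{lin}_n-f^*\|_p^p\lesssim\sup_{f^*\in B^s_{r,q}(H,Q)}\bigl(E\|\hat f^{lin}_n-f^*\|_r^r\bigr)^{\frac{p}{r}}.$$
Because $s'=s$ in that case, it is sufficient to prove that
$$\sup_{f^*\in B^s_{r,q}(H,Q)}E\|\hat f^{lin}_n-f^*\|_p^p\lesssim n^{-\frac{ps'}{2s'+1}}\tag{4.1}$$
for the conclusion of Theorem 4.1. Recall that $\hat f^{lin}_n:=\sum_{k\in\wedge_{j_0}}\hat\alpha_{j_0,k}\varphi_{j_0,k}$. Then by Lemma 2.1 we conclude that
$$E\|\hat f^{lin}_n-E\hat f^{lin}_n\|_p^p=E\Bigl\|\sum_{k\in\wedge_{j_0}}(\hat\alpha_{j_0,k}-\alpha_{j_0,k})\varphi_{j_0,k}\Bigr\|_p^p\lesssim2^{j_0(p-2)}\sum_{k\in\wedge_{j_0}}E|\hat\alpha_{j_0,k}-\alpha_{j_0,k}|^p$$
due to Lemma 1.2. It follows from Lemma 2.2 and $|\wedge_{j_0}|\lesssim2^{2j_0}$ that
$$E\|\hat f^{lin}_n-E\hat f^{lin}_n\|_p^p\lesssim2^{j_0(p-2)}2^{2j_0}2^{-\frac{pj_0}{2}}n^{-\frac{p}{2}}\lesssim2^{\frac{pj_0}{2}}n^{-\frac{p}{2}}\lesssim n^{-\frac{ps'}{2s'+1}}\tag{4.2}$$
thanks to the choice of $2^{j_0}\sim n^{\frac{1}{2s'+1}}$.
On the other hand, by Lemma 2.1, $E(\hat f^{lin}_n)=\sum_{k\in\wedge_{j_0}}\alpha_{j_0,k}\varphi_{j_0,k}=P_{j_0}f^*$. Combining this with $f^*\in B^{s'}_{p,q}(\mathbb{R}^2)$ and Remark 1.1, we get that
$$\|E\hat f^{lin}_n-f^*\|_p^p=\|P_{j_0}f^*-f^*\|_p^p\lesssim2^{-j_0ps'}.$$
Taking $2^{j_0}\sim n^{\frac{1}{2s'+1}}$, it is easy to show
$$\|E\hat f^{lin}_n-f^*\|_p^p\lesssim n^{-\frac{ps'}{2s'+1}}.\tag{4.3}$$
Hence, by (4.2)–(4.3),
$$\sup_{f^*\in B^s_{r,q}(H,Q)}E\|\hat f^{lin}_n-f^*\|_p^p\lesssim\sup_{f^*\in B^s_{r,q}(H,Q)}E\|\hat f^{lin}_n-E\hat f^{lin}_n\|_p^p+\sup_{f^*\in B^s_{r,q}(H,Q)}\|E\hat f^{lin}_n-f^*\|_p^p\lesssim n^{-\frac{ps'}{2s'+1}},$$
which means that (4.1) holds. The proof is done.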
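The choice $2^{j_0}\sim n^{\frac{1}{2s'+1}}$ is exactly the one that balances the stochastic term $2^{\frac{pj_0}{2}}n^{-\frac{p}{2}}$ against the bias term $2^{-j_0ps'}$. The sketch below tracks the exponents of $n$ with exact fractions; it is a bookkeeping aid with hypothetical function names, not part of the proof.

```python
from fractions import Fraction


def variance_exponent(gamma, p):
    """Exponent of n in 2^{p j0 / 2} n^{-p/2} when 2^{j0} = n^gamma."""
    return Fraction(p) * gamma / 2 - Fraction(p) / 2


def bias_exponent(gamma, p, s1):
    """Exponent of n in 2^{-j0 p s'} when 2^{j0} = n^gamma (s1 stands for s')."""
    return -Fraction(gamma) * p * s1


def balanced_gamma(s1):
    """gamma = 1 / (2 s' + 1), the level choice used for the linear estimator."""
    return 1 / (2 * Fraction(s1) + 1)
```

With `gamma = balanced_gamma(s1)` both exponents equal $-\frac{ps'}{2s'+1}$; a smaller `gamma` makes the bias term dominate, and a larger one makes the stochastic term dominate.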
Next, we are in a position to prove the conclusion of the nonlinear one.
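Theorem 4.2 Let $\hat f^{non}_n$ be the nonlinear estimator of $f^*\in B^s_{r,q}(H,Q)$ defined in (1.4) with $1\le r,q<\infty$, $s>0$. If the density of $X$ is bounded, then for $\{r\ge p\ge1\}$ or $\{r\le p<\infty$ and $s>\frac{2}{r}\}$,
$$\sup_{f^*\in B^s_{r,q}(H,Q)}E\|\hat f^{non}_n-f^*\|_p^p\lesssim(\ln n)^p\Bigl(\frac{\ln n}{n}\Bigr)^{\alpha p}$$
with $\alpha:=\min\Bigl\{\frac{s}{2s+1},\frac{s-\frac{2}{r}+\frac{2}{p}}{2(s-\frac{2}{r})+1}\Bigr\}$.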
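Proof We only need to prove the case $r\le p$. In fact, when $r>p$, $\hat f^{non}_n$ has a compact support because $\varphi$, $\psi$, and $f^*$ have the same property. By the Hölder inequality,
$$\sup_{f^*\in B^s_{r,q}(H,Q)}E\|\hat f^{non}_n-f^*\|_p^p\lesssim\sup_{f^*\in B^s_{r,q}(H,Q)}\bigl(E\|\hat f^{non}_n-f^*\|_r^r\bigr)^{\frac{p}{r}}.$$
Using Theorem 4.2 for the case $r=p$, we find that $\sup_{f^*\in B^s_{r,q}(H,Q)}E\|\hat f^{non}_n-f^*\|_r^r\lesssim(\ln n)^r(\frac{\ln n}{n})^{\alpha r}$, and therefore
$$\sup_{f^*\in B^s_{r,q}(H,Q)}E\|\hat f^{non}_n-f^*\|_p^p\lesssim(\ln n)^p\Bigl(\frac{\ln n}{n}\Bigr)^{\alpha p}.$$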
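It remains to estimate the case $r\le p$. Recall that
$$\hat f^{non}_n-f^*=\bigl(\hat f^{lin}_n-P_{j_0}f^*\bigr)+\bigl(P_{j_1+1}f^*-f^*\bigr)+\sum_{j=j_0}^{j_1}\sum_{i=1}^{3}\sum_{k\in\wedge_j}\bigl(\hat\beta^i_{j,k}1_{\{|\hat\beta^i_{j,k}|>\lambda_j\}}-\beta^i_{j,k}\bigr)\psi^i_{j,k}$$
with $\lambda_j=\frac{T}{2}2^{-\frac{j}{2}}\sqrt{\frac{\ln n}{n}}$. Denote $f_{j_0,j_1}:=\sum_{j=j_0}^{j_1}\sum_{i=1}^{3}\sum_{k\in\wedge_j}(\hat\beta^i_{j,k}1_{\{|\hat\beta^i_{j,k}|>\lambda_j\}}-\beta^i_{j,k})\psi^i_{j,k}$. Then
$$E\|\hat f^{non}_n-f^*\|_p^p\lesssim E\|\hat f^{lin}_n-P_{j_0}f^*\|_p^p+\|P_{j_1+1}f^*-f^*\|_p^p+E\|f_{j_0,j_1}\|_p^p.$$
From the proof of Theorem 4.1 we obtain that
$$E\|\hat f^{lin}_n-P_{j_0}f^*\|_p^p\lesssim2^{\frac{j_0p}{2}}n^{-\frac{p}{2}}\lesssim\Bigl(\frac{\ln n}{n}\Bigr)^{\alpha p}\tag{4.4}$$
and
$$\|P_{j_1+1}f^*-f^*\|_p^p\lesssim2^{-j_1ps'}\lesssim\Bigl(\frac{\ln n}{n}\Bigr)^{\alpha p}\tag{4.5}$$
due to $2^{j_0}\sim n^{\frac{1}{2m+1}}$, $2^{j_1}\sim\frac{n}{\ln n}$, and $\alpha=\min\bigl\{\frac{s}{2s+1},\frac{s-\frac{2}{r}+\frac{2}{p}}{2(s-\frac{2}{r})+1}\bigr\}$.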
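By $f_{j_0,j_1}:=\sum_{j=j_0}^{j_1}\sum_{i=1}^{3}\sum_{k\in\wedge_j}(\hat\beta^i_{j,k}1_{\{|\hat\beta^i_{j,k}|>\lambda_j\}}-\beta^i_{j,k})\psi^i_{j,k}$ and Lemma 1.2,
$$E\|f_{j_0,j_1}\|_p^p\lesssim(j_1-j_0+1)^{p-1}\sum_{j=j_0}^{j_1}\sum_{i=1}^{3}2^{j(p-2)}\sum_{k\in\wedge_j}E\bigl|\hat\beta^i_{j,k}1_{\{|\hat\beta^i_{j,k}|>\lambda_j\}}-\beta^i_{j,k}\bigr|^p.$$
On the other hand, it is easy to see that
$$\hat\beta^i_{j,k}1_{\{|\hat\beta^i_{j,k}|>\lambda_j\}}-\beta^i_{j,k}=\bigl(\hat\beta^i_{j,k}-\beta^i_{j,k}\bigr)\bigl(1_{\{|\hat\beta^i_{j,k}|\ge\lambda_j,|\beta^i_{j,k}|<\lambda_j/2\}}+1_{\{|\hat\beta^i_{j,k}|\ge\lambda_j,|\beta^i_{j,k}|\ge\lambda_j/2\}}\bigr)-\beta^i_{j,k}\bigl(1_{\{|\hat\beta^i_{j,k}|<\lambda_j,|\beta^i_{j,k}|>2\lambda_j\}}+1_{\{|\hat\beta^i_{j,k}|<\lambda_j,|\beta^i_{j,k}|\le2\lambda_j\}}\bigr)$$
and $1_{\{|\hat\beta^i_{j,k}|\ge\lambda_j,|\beta^i_{j,k}|<\lambda_j/2\}}\le1_{\{|\hat\beta^i_{j,k}-\beta^i_{j,k}|>\lambda_j/2\}}$. Then, since $j_1-j_0+1\lesssim\ln n$,
$$E\|f_{j_0,j_1}\|_p^p\lesssim T_1+T_2+T_3+T_4\tag{4.6}$$
with
$$T_1:=(\ln n)^{p-1}\sum_{j=j_0}^{j_1}\sum_{i=1}^{3}2^{j(p-2)}\sum_{k\in\wedge_j}E\bigl[|\hat\beta^i_{j,k}-\beta^i_{j,k}|^p1_{\{|\hat\beta^i_{j,k}-\beta^i_{j,k}|>\lambda_j/2\}}\bigr],$$
$$T_2:=(\ln n)^{p-1}\sum_{j=j_0}^{j_1}\sum_{i=1}^{3}2^{j(p-2)}\sum_{k\in\wedge_j}E\bigl[|\hat\beta^i_{j,k}-\beta^i_{j,k}|^p1_{\{|\hat\beta^i_{j,k}|\ge\lambda_j,|\beta^i_{j,k}|\ge\lambda_j/2\}}\bigr],$$
$$T_3:=(\ln n)^{p-1}\sum_{j=j_0}^{j_1}\sum_{i=1}^{3}2^{j(p-2)}\sum_{k\in\wedge_j}E\bigl[|\beta^i_{j,k}|^p1_{\{|\hat\beta^i_{j,k}|<\lambda_j,|\beta^i_{j,k}|\le2\lambda_j\}}\bigr],$$
$$T_4:=(\ln n)^{p-1}\sum_{j=j_0}^{j_1}\sum_{i=1}^{3}2^{j(p-2)}\sum_{k\in\wedge_j}E\bigl[|\beta^i_{j,k}|^p1_{\{|\hat\beta^i_{j,k}|<\lambda_j,|\beta^i_{j,k}|>2\lambda_j\}}\bigr].$$
When $|\hat\beta^i_{j,k}|<\lambda_j$ and $|\beta^i_{j,k}|>2\lambda_j$, $|\hat\beta^i_{j,k}-\beta^i_{j,k}|\ge|\beta^i_{j,k}|-|\hat\beta^i_{j,k}|>|\beta^i_{j,k}|/2$. Hence
$$|\beta^i_{j,k}|^p1_{\{|\hat\beta^i_{j,k}|<\lambda_j,|\beta^i_{j,k}|>2\lambda_j\}}\lesssim|\hat\beta^i_{j,k}-\beta^i_{j,k}|^p1_{\{|\hat\beta^i_{j,k}-\beta^i_{j,k}|>\lambda_j/2\}}.$$
Then (4.6) reduces to
$$E\|f_{j_0,j_1}\|_p^p\lesssim T_1+T_2+T_3.\tag{4.7}$$
By (4.4)–(4.5) and (4.7) it is sufficient to show
$$T_\ell\lesssim(\ln n)^p\Bigl(\frac{\ln n}{n}\Bigr)^{\alpha p},\quad\ell=1,2,3,\tag{4.8}$$
for the conclusion of Theorem 4.2.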
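To estimate $T_1$, using the Hölder inequality, we find that
$$T_1\lesssim(\ln n)^{p-1}\sum_{j=j_0}^{j_1}\sum_{i=1}^{3}2^{j(p-2)}\sum_{k\in\wedge_j}\bigl[E|\hat\beta^i_{j,k}-\beta^i_{j,k}|^{2p}\bigr]^{\frac{1}{2}}\bigl[E\bigl(1_{\{|\hat\beta^i_{j,k}-\beta^i_{j,k}|\ge\lambda_j/2\}}\bigr)\bigr]^{\frac{1}{2}}.$$
Note that $E(1_{\{|\hat\beta^i_{j,k}-\beta^i_{j,k}|\ge\lambda_j/2\}})=P(|\hat\beta^i_{j,k}-\beta^i_{j,k}|\ge\frac{\lambda_j}{2})\lesssim2^{-\varepsilon j}$ due to Lemma 2.3. Taking $\varepsilon$ such that $\varepsilon>p$, we conclude that
$$T_1\lesssim(\ln n)^{p-1}n^{-\frac{p}{2}}\sum_{j=j_0}^{j_1}\sum_{i=1}^{3}2^{\frac{p-\varepsilon}{2}j}\lesssim(\ln n)^{p-1}n^{-\frac{p}{2}}2^{\frac{p}{2}j_0}\lesssim(\ln n)^{p-1}n^{-\frac{ps}{2s+1}}$$
thanks to Lemma 2.2, $|\wedge_j|\lesssim2^{2j}$, and the choice of $j_0$. Hence (4.8) with $\ell=1$ holds since $\alpha\le\frac{s}{2s+1}$.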
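To estimate $T_2$ and $T_3$, define
$$2^{j_0^*}\sim\Bigl(\frac{n}{\ln n}\Bigr)^{1-2\alpha},\qquad2^{j_1^*}\sim\Bigl(\frac{n}{\ln n}\Bigr)^{\frac{\alpha}{s-\frac{2}{r}+\frac{2}{p}}}.$$
Recall that $2^{j_0}\sim n^{\frac{1}{2m+1}}$, $2^{j_1}\sim\frac{n}{\ln n}$, and $\alpha:=\min\bigl\{\frac{s}{2s+1},\frac{s-\frac{2}{r}+\frac{2}{p}}{2(s-\frac{2}{r})+1}\bigr\}$. Then
$$1-2\alpha\ge\frac{1}{2s+1}>\frac{1}{2m+1}\quad\text{and}\quad\frac{\alpha}{s-\frac{2}{r}+\frac{2}{p}}\le\frac{1}{2(s-\frac{2}{r})+1}\le1.$$
Hence $2^{j_0}\le2^{j_0^*}$ and $2^{j_1^*}\le2^{j_1}$. Moreover, a simple computation shows that $1-2\alpha\le\frac{\alpha}{s-\frac{2}{r}+\frac{2}{p}}$, which implies $2^{j_0^*}\le2^{j_1^*}$.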
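Now, we estimate $T_2$ by dividing $T_2$ into
$$T_2=(\ln n)^{p-1}\Bigl(\sum_{j=j_0}^{j_0^*}+\sum_{j=j_0^*+1}^{j_1}\Bigr)\sum_{i=1}^{3}2^{j(p-2)}\sum_{k\in\wedge_j}E\bigl[|\hat\beta^i_{j,k}-\beta^i_{j,k}|^p1_{\{|\hat\beta^i_{j,k}|\ge\lambda_j,|\beta^i_{j,k}|\ge\lambda_j/2\}}\bigr]:=t_1+t_2.\tag{4.9}$$
Since $1_{\{|\hat\beta^i_{j,k}|\ge\lambda_j,|\beta^i_{j,k}|\ge\lambda_j/2\}}\le1$, by Lemma 2.2 we know that
$$t_1\lesssim(\ln n)^{p-1}\sum_{j=j_0}^{j_0^*}\sum_{i=1}^{3}2^{\frac{pj}{2}}n^{-\frac{p}{2}}\lesssim(\ln n)^{p-1}n^{-\frac{p}{2}}2^{\frac{j_0^*}{2}p}\lesssim(\ln n)^p\Bigl(\frac{\ln n}{n}\Bigr)^{\alpha p}\tag{4.10}$$
thanks to $|\wedge_j|\lesssim2^{2j}$ and the choice of $j_0^*$. To estimate $t_2$, we observe that
$$1_{\{|\hat\beta^i_{j,k}|\ge\lambda_j,|\beta^i_{j,k}|\ge\frac{\lambda_j}{2}\}}\le1_{\{|\beta^i_{j,k}|\ge\frac{\lambda_j}{2}\}}\lesssim\Bigl(\frac{|\beta^i_{j,k}|}{\lambda_j}\Bigr)^r.$$
This with Lemma 2.2 leads to
$$t_2\lesssim(\ln n)^{p-1}\sum_{j=j_0^*+1}^{j_1}\sum_{i=1}^{3}2^{j(\frac{p}{2}-2)}n^{-\frac{p}{2}}\sum_{k\in\wedge_j}\Bigl(\frac{|\beta^i_{j,k}|}{\lambda_j}\Bigr)^r.\tag{4.11}$$
Note that $\|\beta_{j,\cdot}\|_r\lesssim2^{-j(s+1-\frac{2}{r})}$ because of $f^*\in B^s_{r,q}$ and Lemma 1.1. Then (4.11) reduces to
$$t_2\lesssim(\ln n)^{p-\frac{r}{2}-1}n^{\frac{r-p}{2}}\sum_{j=j_0^*+1}^{j_1}2^{-j(sr+\frac{r}{2}-\frac{p}{2})}\tag{4.12}$$
thanks to $\lambda_j=\frac{T}{2}2^{-\frac{j}{2}}\sqrt{\frac{\ln n}{n}}$. Denote $\theta:=sr+\frac{r}{2}-\frac{p}{2}$. When $\theta>0$, $r>\frac{p}{2s+1}$ and
$$t_2\lesssim(\ln n)^{p-\frac{r}{2}-1}n^{\frac{r-p}{2}}2^{-j_0^*(sr+\frac{r}{2}-\frac{p}{2})}\lesssim(\ln n)^p\Bigl(\frac{\ln n}{n}\Bigr)^{\alpha p}\tag{4.13}$$
due to the choice of $j_0^*$. In (4.13), we use the fact that $\alpha=\frac{s}{2s+1}$ in the case $r>\frac{p}{2s+1}$.
To show (4.13) for $\theta\le0$, define $r_1:=(1-2\alpha)p>0$. Then $\alpha=\frac{s-\frac{2}{r}+\frac{2}{p}}{2(s-\frac{2}{r})+1}\le\frac{s}{2s+1}$ and $r\le\frac{p}{2s+1}\le(1-2\alpha)p=r_1$ because $\theta\le0$. The same arguments as for (4.11) show that
$$t_2\lesssim(\ln n)^{p-1}\sum_{j=j_0^*+1}^{j_1}\sum_{i=1}^{3}2^{j(\frac{p}{2}-2)}n^{-\frac{p}{2}}\sum_{k\in\wedge_j}\Bigl(\frac{|\beta^i_{j,k}|}{\lambda_j}\Bigr)^{r_1}.$$
It follows from $f^*\in B^s_{r,q}$ and Lemma 1.1 that
$$\|\beta_{j,\cdot}\|_{r_1}\le\|\beta_{j,\cdot}\|_r\lesssim2^{-j(s+1-\frac{2}{r})}$$
due to $r\le r_1$. Therefore, similarly to (4.12), we get that
$$t_2\lesssim(\ln n)^{p-\frac{r_1}{2}-1}n^{\frac{r_1-p}{2}}\sum_{j=j_0^*+1}^{j_1}2^{j[\frac{p}{2}-2-(s-\frac{2}{r}+\frac{1}{2})r_1]}.$$
Note that $\frac{p}{2}-2-(s-\frac{2}{r}+\frac{1}{2})r_1=0$ because of $r_1=(1-2\alpha)p$ and $\alpha=\frac{s-\frac{2}{r}+\frac{2}{p}}{2(s-\frac{2}{r})+1}$. Then
$$t_2\lesssim(\ln n)^{p-\frac{r_1}{2}-1}n^{\frac{r_1-p}{2}}\lesssim(\ln n)^p\Bigl(\frac{\ln n}{n}\Bigr)^{\alpha p},\tag{4.14}$$
which implies that (4.13) holds for $\theta\le0$. The desired conclusion (4.8) with $\ell=2$ follows from (4.9)–(4.10) and (4.13)–(4.14).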
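Finally, by splitting $T_3$ into
$$T_3=(\ln n)^{p-1}\Bigl(\sum_{j=j_0}^{j_0^*}+\sum_{j=j_0^*+1}^{j_1}\Bigr)\sum_{i=1}^{3}2^{j(p-2)}\sum_{k\in\wedge_j}E\bigl[|\beta^i_{j,k}|^p1_{\{|\hat\beta^i_{j,k}|<\lambda_j,|\beta^i_{j,k}|\le2\lambda_j\}}\bigr]:=e_1+e_2\tag{4.15}$$
we obtain that
$$e_1\lesssim(\ln n)^{p-1}\sum_{j=j_0}^{j_0^*}\sum_{i=1}^{3}2^{jp}|\lambda_j|^p\lesssim(\ln n)^{\frac{3}{2}p-1}n^{-\frac{p}{2}}2^{\frac{j_0^*p}{2}}\lesssim(\ln n)^p\Bigl(\frac{\ln n}{n}\Bigr)^{\alpha p}\tag{4.16}$$
thanks to $|\wedge_j|\lesssim2^{2j}$ and the choice of $\lambda_j$ and $j_0^*$.
To estimate $e_2$, we use the fact that $1_{\{|\hat\beta^i_{j,k}|\le\lambda_j,|\beta^i_{j,k}|\le2\lambda_j\}}\le\bigl(\frac{2\lambda_j}{|\beta^i_{j,k}|}\bigr)^{p-r}$ because of $r\le p$. Similarly to (4.11)–(4.13),
$$e_2\lesssim(\ln n)^p\Bigl(\frac{\ln n}{n}\Bigr)^{\alpha p}\tag{4.17}$$
for $\theta>0$, where $\theta:=sr+\frac{r}{2}-\frac{p}{2}$. When $\theta\le0$, we rewrite $e_2$ as follows:
$$e_2=(\ln n)^{p-1}\Bigl(\sum_{j=j_0^*+1}^{j_1^*}+\sum_{j=j_1^*+1}^{j_1}\Bigr)\sum_{i=1}^{3}2^{j(p-2)}\sum_{k\in\wedge_j}E\bigl[|\beta^i_{j,k}|^p1_{\{|\hat\beta^i_{j,k}|<\lambda_j,|\beta^i_{j,k}|\le2\lambda_j\}}\bigr]:=e_1^*+e_2^*.\tag{4.18}$$
Proceeding as in (4.11) and (4.12), we find that
$$e_1^*\lesssim(\ln n)^{p-1}\Bigl(\frac{\ln n}{n}\Bigr)^{\frac{p-r}{2}}\sum_{j=j_0^*+1}^{j_1^*}2^{-j(sr+\frac{r-p}{2})}\lesssim(\ln n)^{p-1}\Bigl(\frac{\ln n}{n}\Bigr)^{\frac{p-r}{2}}2^{-j_1^*(sr+\frac{r-p}{2})}.$$
This with the choice of $2^{j_1^*}\sim(\frac{n}{\ln n})^{\frac{\alpha}{s-\frac{2}{r}+\frac{2}{p}}}$ leads to
$$e_1^*\lesssim(\ln n)^p\Bigl(\frac{\ln n}{n}\Bigr)^{\alpha p}\tag{4.19}$$
due to $\alpha=\frac{s-\frac{2}{r}+\frac{2}{p}}{2(s-\frac{2}{r})+1}$ for $\theta\le0$. When $r\le p$,
$$\|\beta_{j,\cdot}\|_p\le\|\beta_{j,\cdot}\|_r\lesssim2^{-j(s+1-\frac{2}{r})}$$
thanks to $f^*\in B^s_{r,q}$ and Lemma 1.1. Therefore
$$e_2^*\lesssim(\ln n)^{p-1}\sum_{j=j_1^*+1}^{j_1}\sum_{i=1}^{3}2^{j(p-2)}\sum_{k\in\wedge_j}|\beta^i_{j,k}|^p\lesssim(\ln n)^{p-1}\sum_{j=j_1^*+1}^{j_1}2^{-j(sp-\frac{2p}{r}+2)}.$$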
Combining this with the choice of $2^{j_1^*}\sim(\frac{n}{\ln n})^{\frac{\alpha}{s-\frac{2}{r}+\frac{2}{p}}}$, we observe that
e∗2(lnn)p–12–j∗1(sp– 2p
r+2)(lnn)p
lnn n