Wavelet optimal estimations for a two-dimensional continuous-discrete density function over \(L^{p}\) risk


RESEARCH

Open Access

Lin Hu\(^{1}\), Xiaochen Zeng\(^{2}\) and Jinru Wang\(^{2*}\)

\(^{*}\)Correspondence: [email protected]
\(^{2}\)Department of Applied Mathematics, Beijing University of Technology, Beijing, P.R. China. Full list of author information is available at the end of the article.

Abstract

The mixed continuous-discrete density model plays an important role in reliability, finance, biostatistics, and economics. Using wavelet methods, Chesneau, Dewan, and Doosti provided upper bounds on the \(L^2\) risk of wavelet estimators for a two-dimensional continuous-discrete density function over Besov spaces \(B^s_{r,q}\). This paper deals with \(L^p\) \((1\le p<\infty)\) risk estimation over Besov spaces, which generalizes Chesneau–Dewan–Doosti's theorems. In addition, we are the first to provide a lower bound for the \(L^p\) risk. It turns out that the linear wavelet estimator attains the optimal convergence rate for \(r\ge p\), and the nonlinear one attains the optimal rate up to a logarithmic factor.

Keywords: Wavelets; Density estimation; Continuous-discrete density; Optimality

1 Introduction

1.1 Introduction

Density estimation plays an important role in both statistics and econometrics. This paper considers a two-dimensional density estimation model defined over mixed continuous and discrete variables [2]. More precisely, let \((X_1,Y_1),(X_2,Y_2),\dots,(X_n,Y_n)\) be independent and identically distributed (i.i.d.) observations of a bivariate random variable \((X,Y)\), where \(X\) is a continuous random variable and \(Y\) is a discrete one. The joint density function of \((X,Y)\) is given by

\[
f(x,v)=\frac{\partial}{\partial x}F(x,v)
\]

with \(F(x,v)=P(X\le x,\,Y=v)\) being the distribution function of \((X,Y)\). We are interested in estimating \(f(x,v)\) from \((X_1,Y_1),(X_2,Y_2),\dots,(X_n,Y_n)\). This continuous-discrete density model also arises in survival analysis, economics, and the social sciences. For example, consider a series system with \(m\) components, which fails as soon as one of its components fails. Let \(X\) be the failure time of the system, and let \(Y\) be the component whose failure caused the failure of the system. Then \((X,Y)\) is a bivariate continuous-discrete random variable. For more examples, see [1] and [4].

Kernel methods give good estimators of the continuous-discrete density function [1,10,14]. However, it is hard for them to provide optimal estimation for densities in Besov spaces. In addition, the complexity of bandwidth selection increases the difficulty of the kernel method.

Recently, wavelet methods have achieved remarkable success in density estimation [7,8,11,12,15] owing to their time and frequency localization, multiscale decomposition, and fast numerical algorithms. In fact, wavelet estimators attain optimality for densities in Besov spaces, which avoids this disadvantage of kernel methods. Using the wavelet method, Chesneau et al. [2] constructed linear and nonlinear wavelet estimators for a two-dimensional continuous-discrete density function and derived their mean integrated squared error over Besov balls.

This paper addresses \(L^p\) \((1\le p<\infty)\) risk estimation on Besov balls by using wavelet bases, which generalizes Chesneau–Dewan–Doosti's theorems. It should be pointed out that a lower bound for the \(L^p\) risk of all estimators is derived here for the first time. It turns out that the linear wavelet estimator is optimal for \(r\ge p\) and the nonlinear one attains optimal estimation up to a logarithmic factor.

1.2 Notations and definitions

In this paper, we use the tensor product method to construct an orthonormal wavelet basis for \(L^2(\mathbb{R}^2)\), which will be used in later discussions. With a one-dimensional Daubechies scaling function \(D_{2N}\) and a wavelet function \(\psi_{2N}\) (\(\psi_{2N}\) can be constructed from the scaling function \(D_{2N}\)), we construct two-dimensional tensor product wavelets \(\varphi\), \(\psi^1\), \(\psi^2\), and \(\psi^3\) as follows:

\[
\varphi(x,y):=D_{2N}(x)D_{2N}(y),\qquad\psi^1(x,y):=D_{2N}(x)\psi_{2N}(y),
\]
\[
\psi^2(x,y):=\psi_{2N}(x)D_{2N}(y),\qquad\psi^3(x,y):=\psi_{2N}(x)\psi_{2N}(y).
\]

Then \(\varphi\) and \(\psi^i\) \((i=1,2,3)\) are compactly supported in the time domain, because the Daubechies functions \(D_{2N}\) and \(\psi_{2N}\) are [5,8].

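Denote
\[
\varphi_{j,k}(x,y):=2^{j}\varphi\big(2^{j}x-k_1,\,2^{j}y-k_2\big),\qquad\psi^i_{j,k}(x,y):=2^{j}\psi^i\big(2^{j}x-k_1,\,2^{j}y-k_2\big)
\]
for \(k=(k_1,k_2)\in\mathbb{Z}^2\) and \(i=1,2,3\). Then for each \(f\in L^2(\mathbb{R}^2)\),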

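\[
f=\sum_{k\in\mathbb{Z}^2}\alpha_{j_0,k}\varphi_{j_0,k}+\sum_{j=j_0}^{\infty}\sum_{i=1}^{3}\sum_{k\in\mathbb{Z}^2}\beta^i_{j,k}\psi^i_{j,k}
\]
holds in the \(L^2\) sense, where \(\alpha_{j,k}:=\langle f,\varphi_{j,k}\rangle\) and \(\beta^i_{j,k}:=\langle f,\psi^i_{j,k}\rangle\). As usual, let \(P_j\) be the orthogonal projection operator defined by
\[
P_jf:=\sum_{k\in\mathbb{Z}^2}\langle f,\varphi_{j,k}\rangle\varphi_{j,k}.
\]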


One of the advantages of wavelet bases is that they can characterize Besov spaces, which contain Hölder spaces and \(L^2\)-Sobolev spaces as particular examples. Throughout the paper, we work within a Besov space on a compact subset of \(\mathbb{R}^2\). The following lemma gives equivalent definitions of these spaces, which are fundamental in our discussions.

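Lemma 1.1 ([13]) Let \(\varphi\) be an \(m\)-regular orthonormal scaling function with the corresponding wavelets \(\psi^i\) \((i=1,2,3)\). Let \(f\in L^r(\mathbb{R}^2)\), \(\alpha_{j,k}=\langle f,\varphi_{j,k}\rangle\), \(\beta^i_{j,k}=\langle f,\psi^i_{j,k}\rangle\), \(1\le r,q\le\infty\), and \(0<s<m\). Then the following assertions are equivalent:

(i) \(f\in B^s_{r,q}(\mathbb{R}^2)\);
(ii) \(\{2^{js}\|P_{j+1}f-P_jf\|_r\}_{j\ge0}\in l_q\);
(iii) \(\big\|\{2^{j(s+1-\frac{2}{r})}\|\beta_{j,\cdot}\|_r\}_{j\ge0}\big\|_q<\infty\).

The Besov norm of \(f\) can be defined by
\[
\|f\|_{B^s_{r,q}}:=\|\alpha_{j_0,\cdot}\|_r+\Big\|\big(2^{j(s+1-\frac{2}{r})}\|\beta_{j,\cdot}\|_r\big)_{j\ge j_0}\Big\|_q,
\]
where \(\|\alpha_{j_0,\cdot}\|_r^r:=\sum_{k\in\mathbb{Z}^2}|\alpha_{j_0,k}|^r\) and \(\|\beta_{j,\cdot}\|_r^r:=\sum_{i=1}^3\sum_{k\in\mathbb{Z}^2}|\beta^i_{j,k}|^r\).

Here and further, \(A\lesssim B\) means that \(A\le CB\) for some constant \(C>0\) independent of \(A\) and \(B\); \(A\gtrsim B\) means \(B\lesssim A\); and \(A\sim B\) stands for both \(A\lesssim B\) and \(A\gtrsim B\).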

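Remark 1.1 By (i) and (ii) of Lemma 1.1 we observe that
\[
\|P_jf-f\|_r=\Big\|\sum_{l=j}^{\infty}(P_{l+1}f-P_lf)\Big\|_r\le\sum_{l=j}^{\infty}\|P_{l+1}f-P_lf\|_r\lesssim\sum_{l=j}^{\infty}2^{-ls}\lesssim2^{-js}
\]
for \(f\in B^s_{r,q}(\mathbb{R}^2)\). Hence
\[
\|P_jf-f\|_r\lesssim2^{-js}.\qquad(1.1)
\]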

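Remark 1.2 When \(r\le p\), Lemma 1.1 (i) and (iii) imply that, for \(s'-\frac{2}{p}=s-\frac{2}{r}>0\),
\[
B^s_{r,q}(\mathbb{R}^2)\hookrightarrow B^{s'}_{p,q}(\mathbb{R}^2),
\]
where \(A\hookrightarrow B\) means that the Banach space \(A\) is continuously embedded in the Banach space \(B\); more precisely, \(\|u\|_B\le C\|u\|_A\) \((u\in A)\) for some constant \(C>0\).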

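Lemma 1.2 ([13]) Let \(\varphi\in L^2(\mathbb{R}^2)\) be a scaling function or a wavelet with \(\sup_{x\in\mathbb{R}^2}\sum_{k\in\mathbb{Z}^2}|\varphi(x-k)|<\infty\). Then, for \(\lambda=\{\lambda_k\}\in l_p(\mathbb{Z}^2)\) and \(1\le p\le\infty\),
\[
\Big\|\sum_{k\in\mathbb{Z}^2}\lambda_k\varphi_{j,k}\Big\|_p\sim2^{j(1-\frac{2}{p})}\|\lambda\|_p.
\]
Here \(\|\lambda\|_p\) is the \(l_p(\mathbb{Z}^2)\) norm of \(\lambda\in l_p(\mathbb{Z}^2)\):
\[
\|\lambda\|_p:=\begin{cases}\big(\sum_{k\in\mathbb{Z}^2}|\lambda_k|^p\big)^{1/p}, & p<\infty,\\ \sup_{k\in\mathbb{Z}^2}|\lambda_k|, & p=\infty.\end{cases}
\]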

1.3 Main results

In this subsection, we state our main results and discuss their relation to other work. To do so, we propose a new bivariate function \(f^*(x,y)\), which modifies the one in [2]. Define

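\[
f^*(x,y):=\sum_{v=1}^{m}u(y,v)P(Y=v)f(x\,|\,Y=v)
\]
with
\[
u(y,v)=\begin{cases}\dfrac{1}{1+e^{\frac{1}{y-v}+\frac{1}{y-v+1}}}\,\mathbf{1}_{(v-1,v)}(y)+\dfrac{e^{\frac{1}{y-v}+\frac{1}{y-v-1}}}{1+e^{\frac{1}{y-v}+\frac{1}{y-v-1}}}\,\mathbf{1}_{(v,v+1)}(y), & y\neq v,\\[2mm] 1, & y=v,\end{cases}
\]
where \(\mathbf{1}_D\) is the indicator function of a set \(D\).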

The construction of \(f^*\) follows the idea proposed by Chesneau et al. [2] but differs from it: in [2] the weight \(u(y,v)\) is the characteristic function \(\mathbf{1}_{\{v-\frac12\le y<v+\frac12\}}\). A careful verification shows that our weight \(u(y,v)\) is differentiable with respect to \(y\) for each \(v\in\{1,2,\dots,m\}\). Replacing the characteristic function by this smooth weight makes \(f^*\) continuous in \(y\). It is easy to see that, for any \(y=v\in\{1,2,\dots,m\}\),
\[
f^*(x,y)=f(x,v).
\]
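The weight \(u(y,v)\) and the function \(f^*\) can be sketched in Python as follows. This is only an illustration: the logistic form is evaluated in an overflow-safe way (our implementation choice), and the toy joint density `joint` used in the checks is a hypothetical example, not part of the model.

```python
import math


def _sigmoid(t: float) -> float:
    """Numerically stable logistic function 1 / (1 + e^{-t})."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    z = math.exp(t)
    return z / (1.0 + z)


def u(y: float, v: int) -> float:
    """Smooth weight u(y, v) from the definition of f*."""
    if y == v:
        return 1.0
    if v - 1 < y < v:
        # 1 / (1 + e^E) with E = 1/(y-v) + 1/(y-v+1), i.e. sigmoid(-E)
        return _sigmoid(-(1.0 / (y - v) + 1.0 / (y - v + 1)))
    if v < y < v + 1:
        # e^E / (1 + e^E) with E = 1/(y-v) + 1/(y-v-1), i.e. sigmoid(E)
        return _sigmoid(1.0 / (y - v) + 1.0 / (y - v - 1))
    return 0.0  # u(., v) vanishes outside (v-1, v+1)


def f_star(x: float, y: float, joint, m: int) -> float:
    """f*(x, y) = sum_v u(y, v) * joint(x, v), with joint(x, v) = P(Y=v) f(x | Y=v)."""
    return sum(u(y, v) * joint(x, v) for v in range(1, m + 1))
```

For an integer argument \(y=v\) only the term with index \(v\) survives, so `f_star(x, v, joint, m)` returns `joint(x, v)`, which is the identity \(f^*(x,v)=f(x,v)\); near \(y=v\) the weight tends to 1 continuously.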

Hence, the problem reduces to constructing an estimator of \(f^*\). As in [2], we assume that \(f^*\) belongs to the space \(B^s_{r,q}(H,Q)\); equivalently, \(f^*\) belongs to the Besov ball
\[
B^s_{r,q}(H):=\big\{f:\ f\in B^s_{r,q}(\mathbb{R}^2)\ \text{and}\ \|f\|_{B^s_{r,q}}\le H\big\}
\]
and the support of \(f^*(x,\cdot)\) is contained in \([-Q,Q]\) for fixed \(v\) \((Q>0,\ v=1,2,\dots,m)\). To introduce the wavelet estimators, we need the following estimators of the wavelet coefficients \(\alpha_{j,k}\) and \(\beta^i_{j,k}\):

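\[
\hat\alpha_{j,k}=\frac{1}{n}\sum_{l=1}^{n}\int_{\mathbb{R}}\varphi_{j,k}(X_l,y)u(y,Y_l)\,dy,\qquad\hat\beta^i_{j,k}=\frac{1}{n}\sum_{l=1}^{n}\int_{\mathbb{R}}\psi^i_{j,k}(X_l,y)u(y,Y_l)\,dy.\qquad(1.2)
\]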
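To make (1.2) concrete, here is a minimal sketch of \(\hat\alpha_{j,k}\) under two assumptions that are ours, not the paper's: the one-dimensional scaling function is the Haar function \(\mathbf 1_{[0,1)}\) (the Daubechies case \(N=1\), chosen only because it has a closed form; the theory requires smoother \(D_{2N}\)), and the \(y\)-integral is approximated by a midpoint rule with a hypothetical number of nodes `m`. It uses the factorization \(\varphi_{j,k}(x,y)=\phi_{j,k_1}(x)\phi_{j,k_2}(y)\) with \(\phi_{j,k}(t)=2^{j/2}\phi(2^jt-k)\), so each observation contributes \(\phi_{j,k_1}(X_l)\int\phi_{j,k_2}(y)u(y,Y_l)\,dy\).

```python
import math
from functools import lru_cache


def _sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    z = math.exp(t)
    return z / (1.0 + z)


def u(y, v):
    """Smooth weight u(y, v) (same formula as in the definition of f*)."""
    if y == v:
        return 1.0
    if v - 1 < y < v:
        return _sigmoid(-(1.0 / (y - v) + 1.0 / (y - v + 1)))
    if v < y < v + 1:
        return _sigmoid(1.0 / (y - v) + 1.0 / (y - v - 1))
    return 0.0


def haar_phi(j, k, t):
    """phi_{j,k}(t) = 2^{j/2} phi(2^j t - k) with the Haar phi = 1_[0,1)."""
    z = 2 ** j * t - k
    return 2 ** (j / 2) if 0 <= z < 1 else 0.0


@lru_cache(maxsize=None)
def c_coef(j, k2, v, m=2000):
    """Midpoint-rule approximation of int phi_{j,k2}(y) u(y, v) dy."""
    a = k2 / 2 ** j              # supp phi_{j,k2} = [a, a + 2^{-j})
    h = 2.0 ** (-j) / m
    return 2 ** (j / 2) * h * sum(u(a + (i + 0.5) * h, v) for i in range(m))


def alpha_hat(j, k, xs, ys):
    """Empirical coefficient (1.2): mean of phi_{j,k1}(X_l) * c_{j,k2}(Y_l)."""
    k1, k2 = k
    return sum(haar_phi(j, k1, x) * c_coef(j, k2, y) for x, y in zip(xs, ys)) / len(xs)
```

For example, with \(j=0\), \(k=(0,2)\) and all \(Y_l=3\), the weight \(u(\cdot,3)\) integrates to \(\frac12\) over \([2,3)\), so every observation with \(X_l\in[0,1)\) contributes \(\frac12\).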

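Define \(\Lambda_{j_0}:=\{k\in\mathbb{Z}^2:\ \operatorname{supp}f^*\cap\operatorname{supp}\varphi_{j_0,k}\neq\emptyset\}\). When \(f^*\) and \(\varphi\) have compact supports, the cardinality of \(\Lambda_j\) satisfies \(|\Lambda_j|\lesssim2^{2j}\). Then the linear wavelet estimator of \(f^*\) is given as follows: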

\[
\hat f^{\mathrm{lin}}_n(x,y):=\sum_{k\in\Lambda_{j_0}}\hat\alpha_{j_0,k}\varphi_{j_0,k}(x,y),\qquad(1.3)
\]

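where \(j_0\) is chosen such that \(2^{j_0}\sim n^{\frac{1}{2s'+1}}\), \(s':=s-(\frac{2}{r}-\frac{2}{p})_+\), and \(x_+:=\max\{x,0\}\). To obtain a nonlinear estimator, we take \(j_0\) and \(j_1\) such that \(2^{j_1}\sim\frac{n}{\ln n}\) and \(2^{j_0}\sim n^{\frac{1}{2m+1}}\) with \(m>s\). Define \(\Lambda_j:=\{k\in\mathbb{Z}^2:\ \operatorname{supp}f^*\cap\operatorname{supp}\psi^i_{j,k}\neq\emptyset\}\) and \(\lambda_j:=T2^{-\frac{j}{2}}\sqrt{\frac{\ln n}{n}}\) (\(T\) is the constant described in Lemma 2.3). Then the nonlinear estimator is given by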

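\[
\hat f^{\mathrm{non}}_n(x,y):=\sum_{k\in\Lambda_{j_0}}\hat\alpha_{j_0,k}\varphi_{j_0,k}(x,y)+\sum_{j=j_0}^{j_1}\sum_{i=1}^{3}\sum_{k\in\Lambda_j}\hat\beta^i_{j,k}\mathbf{1}_{\{|\hat\beta^i_{j,k}|>\lambda_j\}}\psi^i_{j,k}(x,y).\qquad(1.4)
\]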
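The tuning choices behind (1.3) and (1.4) can be sketched as follows. The resolution levels are only specified up to \(\sim\), so the rounding rules below (nearest integer for \(j_0\), floor for \(j_1\) so that \(2^{j_1}\le n/\ln n\) as required in Lemma 2.3) are our assumptions, as is storing the empirical detail coefficients in a dictionary keyed by the level \(j\) and then by a hypothetical key \((i,k_1,k_2)\).

```python
import math


def s_prime(s, r, p):
    """s' = s - (2/r - 2/p)_+ used by the linear estimator."""
    return s - max(2 / r - 2 / p, 0.0)


def linear_level(n, s, r, p):
    """j0 with 2^{j0} ~ n^{1/(2s'+1)} (rounded to the nearest integer)."""
    return round(math.log2(n) / (2 * s_prime(s, r, p) + 1))


def nonlinear_levels(n, m):
    """(j0, j1) with 2^{j0} ~ n^{1/(2m+1)} and 2^{j1} <= n / ln n."""
    j0 = round(math.log2(n) / (2 * m + 1))
    j1 = math.floor(math.log2(n / math.log(n)))
    return j0, j1


def threshold(j, n, T):
    """lambda_j = T 2^{-j/2} sqrt(ln n / n)."""
    return T * 2 ** (-j / 2) * math.sqrt(math.log(n) / n)


def hard_threshold(details, n, T):
    """Keep beta_hat^i_{j,k} only if |beta_hat^i_{j,k}| > lambda_j, as in (1.4)."""
    out = {}
    for j, coeffs in details.items():
        lam = threshold(j, n, T)
        out[j] = {key: b if abs(b) > lam else 0.0 for key, b in coeffs.items()}
    return out
```

Note that only `hard_threshold` and `nonlinear_levels` enter the nonlinear estimator, and neither uses \(s\), \(r\), \(q\), or \(H\); the linear level `linear_level` does depend on \(s'\).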

From the definition of \(\hat f^{\mathrm{non}}_n\) we see that the nonlinear estimator has the advantage of being adaptive, since its construction does not depend on the parameters \(s\), \(r\), \(q\), and \(H\).

The following theorem gives a lower bound for the \(L^p\) risk.

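Theorem 1.1 Let \(\hat f_n\) be an estimator of \(f^*\in B^s_{r,q}(H)\) with \(s>\frac{2}{r}\) and \(r,q\ge1\). Then there exists \(C>0\) such that, for \(1\le p<\infty\),
\[
\sup_{f^*\in B^s_{r,q}(H)}E\|\hat f_n-f^*\|_p^p\ge C\max\Big\{n^{-\frac{sp}{2s+1}},\ \Big(\frac{\ln n}{n}\Big)^{\frac{(s-\frac{2}{r}+\frac{2}{p})p}{2(s-\frac{2}{r})+1}}\Big\}.
\]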

The upper bounds for the linear and nonlinear wavelet estimators are provided by Theorems 1.2 and 1.3, respectively.

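Theorem 1.2 Let \(\hat f^{\mathrm{lin}}_n\) be the estimator of \(f^*\in B^s_{r,q}(H,Q)\) defined by (1.3) with \(1\le r,q<\infty\) and \(s>0\). If the density of \(X\) is bounded, then for \(r\ge p\ge1\), or \(r\le p<\infty\) and \(s>\frac{2}{r}\),
\[
\sup_{f^*\in B^s_{r,q}(H,Q)}E\|\hat f^{\mathrm{lin}}_n-f^*\|_p^p\lesssim n^{-\frac{ps'}{2s'+1}}
\]
with \(s'=s-(\frac{2}{r}-\frac{2}{p})_+\) and \(x_+:=\max\{x,0\}\).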

Remark 1.3 If \(r\ge2\), \(p=2\), and \(s>0\), then \(s'=s\) and Theorem 1.2 reduces to Theorem 4.1 in [2]. In addition, Theorem 1.2 does not impose any restriction on \(Q\), so its assumptions are weaker than those in [2]. Thus Theorem 1.2 extends the corresponding theorem of [2] from \(p=2\) to \(p\in[1,\infty)\).

When \(r\ge p\), we have \(s'=s\), and the linear wavelet estimator \(\hat f^{\mathrm{lin}}_n\) attains optimality thanks to Theorems 1.1 and 1.2. However, the linear estimator does not offer optimal estimation for \(r<p\), because \(s'<s\) and \(\frac{s'}{2s'+1}<\frac{s}{2s+1}\) in this case.
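The comparison of exponents can be checked with exact rational arithmetic. The sketch below is our own illustration, with parameter values chosen only as examples satisfying \(s>\frac2r\); it computes the linear exponent \(\frac{s'}{2s'+1}\) from Theorem 1.2 and the exponent \(\alpha\) of Theorem 1.3.

```python
from fractions import Fraction as F


def s_prime(s, r, p):
    """s' = s - (2/r - 2/p)_+ in exact rational arithmetic."""
    s, r, p = F(s), F(r), F(p)
    return s - max(2 / r - 2 / p, F(0))


def linear_exponent(s, r, p):
    """Rate exponent s'/(2s'+1) of the linear estimator (Theorem 1.2)."""
    sp = s_prime(s, r, p)
    return sp / (2 * sp + 1)


def alpha(s, r, p):
    """alpha = min{ s/(2s+1), (s - 2/r + 2/p) / (2(s - 2/r) + 1) } (Theorem 1.3)."""
    s, r, p = F(s), F(r), F(p)
    a = s - 2 / r
    return min(s / (2 * s + 1), (a + 2 / p) / (2 * a + 1))
```

For \(s=3\), \(r=1\), \(p=2\) the linear exponent is \(\frac25\), strictly below \(\frac{s}{2s+1}=\frac37=\alpha\), whereas for \(r=p=2\), \(s=2\) both exponents equal \(\frac25\).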

To obtain an estimator for \(r<p\) that is optimal up to a logarithmic factor, we need the nonlinear wavelet estimator defined by (1.4).

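Theorem 1.3 Let \(\hat f^{\mathrm{non}}_n\) be the estimator of \(f^*\in B^s_{r,q}(H,Q)\) defined by (1.4) with \(1\le r,q<\infty\) and \(s>0\). If the density of \(X\) is bounded, then for \(r\ge p\ge1\), or \(r\le p<\infty\) and \(s>\frac{2}{r}\),
\[
\sup_{f^*\in B^s_{r,q}(H,Q)}E\|\hat f^{\mathrm{non}}_n-f^*\|_p^p\lesssim(\ln n)^p\Big(\frac{\ln n}{n}\Big)^{\alpha p}
\]
with \(\alpha:=\min\Big\{\frac{s}{2s+1},\ \frac{s-\frac{2}{r}+\frac{2}{p}}{2(s-\frac{2}{r})+1}\Big\}\).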

Remark 1.4 Theorems 1.1 and 1.3 tell us that the nonlinear estimator is suboptimal, that is, optimal up to a logarithmic factor. Moreover, if \(p=2\) and either \(\{r\ge2,\ s>0\}\) or \(\{1\le r<2,\ s>\frac{2}{r}\}\), then \(\alpha=\frac{s}{2s+1}\), and Theorem 1.3 coincides with Theorem 4.2 in [2] up to a logarithmic factor. Hence Theorem 1.3 can be considered as an extension of Theorem 4.2 in [2] from \(p=2\) to \(p\in[1,\infty)\).

Moreover, the theorems can be extended to the multidimensional case as in [3] by using the technique developed in [9]. Studying the estimation of a multivariate continuous-discrete conditional density is a challenging problem; we refer to [3] for further details.

2 Some lemmas

In this section, we prove several lemmas that are needed for the proofs of our main theorems.

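Lemma 2.1 Let \(\hat\alpha_{j,k}\) and \(\hat\beta^i_{j,k}\) be defined by (1.2). Then
\[
E(\hat\alpha_{j,k})=\alpha_{j,k}\quad\text{and}\quad E(\hat\beta^i_{j,k})=\beta^i_{j,k}
\]
for \(j\ge j_0\), \(k\in\mathbb{Z}^2\), and \(i=1,2,3\).

Proof Let \(\phi\) denote the one-dimensional Daubechies scaling function \(D_{2N}\) and \(\phi_{j,k}(t):=2^{j/2}\phi(2^jt-k)\), so that \(\varphi_{j,k}(x,y)=\phi_{j,k_1}(x)\phi_{j,k_2}(y)\). Denote \(c_{j,k_2}(v)=\int\phi_{j,k_2}(y)u(y,v)\,dy\). Then
\[
\hat\alpha_{j,k}=\frac1n\sum_{i=1}^n\int\varphi_{j,k}(X_i,y)u(y,Y_i)\,dy=\frac1n\sum_{i=1}^n\phi_{j,k_1}(X_i)c_{j,k_2}(Y_i).
\]
Since \((X_1,Y_1),(X_2,Y_2),\dots,(X_n,Y_n)\) are independent and identically distributed, we have
\[
\begin{aligned}
E(\hat\alpha_{j,k})&=E\big[\phi_{j,k_1}(X_1)c_{j,k_2}(Y_1)\big]=E\big[E\big(\phi_{j,k_1}(X_1)c_{j,k_2}(Y_1)\,|\,Y_1\big)\big]=E\big[c_{j,k_2}(Y_1)E\big(\phi_{j,k_1}(X_1)\,|\,Y_1\big)\big]\\
&=E\Big[c_{j,k_2}(Y_1)\int\phi_{j,k_1}(x)f(x\,|\,Y_1)\,dx\Big]=\sum_{v=1}^mP(Y_1=v)c_{j,k_2}(v)\int\phi_{j,k_1}(x)f(x\,|\,Y_1=v)\,dx\\
&=\iint\sum_{v=1}^mP(Y_1=v)u(y,v)f(x\,|\,Y_1=v)\,\phi_{j,k_1}(x)\phi_{j,k_2}(y)\,dx\,dy\\
&=\iint f^*(x,y)\varphi_{j,k}(x,y)\,dx\,dy=\alpha_{j,k}.
\end{aligned}
\]
Similarly to the previous arguments, \(E(\hat\beta^i_{j,k})=\beta^i_{j,k}\). The proof of Lemma 2.1 is done.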

To prove Lemma 2.2, we recall Rosenthal's inequality.

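Rosenthal's inequality ([8]) Let \(X_1,X_2,\dots,X_n\) be independent random variables such that \(EX_l=0\) and \(E|X_l|^p<\infty\) \((l=1,2,\dots,n)\). Then, with a constant \(C_p>0\),
\[
E\Big|\sum_{l=1}^nX_l\Big|^p\le\begin{cases}C_p\Big[\sum_{l=1}^nE|X_l|^p+\Big(\sum_{l=1}^nE|X_l|^2\Big)^{p/2}\Big], & p\ge2,\\ C_p\Big(\sum_{l=1}^nE|X_l|^2\Big)^{p/2}, & 1\le p\le2.\end{cases}
\]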

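Lemma 2.2 Let \(\hat\alpha_{j,k}\) and \(\hat\beta^i_{j,k}\) be defined by (1.2). If the density of \(X\) is bounded, then there exists a constant \(C>0\) such that
\[
E|\hat\alpha_{j,k}-\alpha_{j,k}|^p\le C2^{-\frac{pj}{2}}n^{-\frac p2}\quad\text{and}\quad E|\hat\beta^i_{j,k}-\beta^i_{j,k}|^p\le C2^{-\frac{pj}{2}}n^{-\frac p2}
\]
for \(1\le p<\infty\) and \(2^j\le n\).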

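Proof We only prove the first inequality, since the second one is similar. By the definition of \(\hat\alpha_{j,k}\),
\[
\hat\alpha_{j,k}=\frac1n\sum_{l=1}^n\int_{\mathbb{R}}\varphi_{j,k}(X_l,y)u(y,Y_l)\,dy=\frac1n\sum_{l=1}^n\phi_{j,k_1}(X_l)c_{j,k_2}(Y_l),
\]
where \(c_{j,k_2}(Y_l):=\int_{\mathbb{R}}\phi_{j,k_2}(y)u(y,Y_l)\,dy\) and \(\phi\) is the one-dimensional Daubechies scaling function \(D_{2N}\). Since \(|u(y,v)|\le2\), we obtain that
\[
|c_{j,k_2}(Y_l)|\le\int_{\mathbb{R}}|\phi_{j,k_2}(y)||u(y,Y_l)|\,dy\le2\cdot2^{-\frac j2}\|\phi\|_1\qquad(2.1)
\]
and
\[
E|\phi_{j,k_1}(X_l)c_{j,k_2}(Y_l)|^p\lesssim2^{-\frac{pj}{2}}E|\phi_{j,k_1}(X_l)|^p\lesssim2^{-\frac{pj}{2}}\int_{\mathbb{R}}|\phi_{j,k_1}(x)|^pf_X(x)\,dx\lesssim2^{-j}\qquad(2.2)
\]
due to the boundedness of \(f_X\). Define \(\xi_l:=\phi_{j,k_1}(X_l)c_{j,k_2}(Y_l)-\alpha_{j,k}\). Then
\[
E|\xi_l|^p=E|\phi_{j,k_1}(X_l)c_{j,k_2}(Y_l)-\alpha_{j,k}|^p\lesssim E|\phi_{j,k_1}(X_l)c_{j,k_2}(Y_l)|^p+|\alpha_{j,k}|^p.\qquad(2.3)
\]
It follows from Lemma 2.1 and Jensen's inequality that
\[
|\alpha_{j,k}|^p=\big|E[\phi_{j,k_1}(X_l)c_{j,k_2}(Y_l)]\big|^p\le E|\phi_{j,k_1}(X_l)c_{j,k_2}(Y_l)|^p.
\]
Hence (2.3) reduces to
\[
E|\xi_l|^p\lesssim E|\phi_{j,k_1}(X_l)c_{j,k_2}(Y_l)|^p\lesssim2^{-j}\qquad(2.4)
\]
thanks to (2.2). By the definitions of \(\hat\alpha_{j,k}\) and \(\xi_l\), \(\hat\alpha_{j,k}-\alpha_{j,k}=\frac1n\sum_{l=1}^n\xi_l\), where \(\xi_1,\xi_2,\dots,\xi_n\) are independent because \((X_1,Y_1),(X_2,Y_2),\dots,(X_n,Y_n)\) are. On the other hand, Lemma 2.1 implies \(E(\xi_l)=0\). Then Rosenthal's inequality leads to
\[
E|\hat\alpha_{j,k}-\alpha_{j,k}|^p=E\Big|\frac1n\sum_{l=1}^n\xi_l\Big|^p\lesssim\begin{cases}n^{-p}\Big[\sum_{l=1}^nE|\xi_l|^p+\Big(\sum_{l=1}^nE|\xi_l|^2\Big)^{\frac p2}\Big], & p\ge2,\\ n^{-p}\Big(\sum_{l=1}^nE|\xi_l|^2\Big)^{\frac p2}, & 1\le p\le2.\end{cases}\qquad(2.5)
\]
By (2.4) we know that
\[
n^{-p}\Big(\sum_{l=1}^nE|\xi_l|^2\Big)^{\frac p2}\lesssim n^{-p}\big(n2^{-j}\big)^{\frac p2}=2^{-\frac{pj}{2}}n^{-\frac p2}
\]
for \(1\le p<2\) and
\[
n^{-p}\Big[\sum_{l=1}^nE|\xi_l|^p+\Big(\sum_{l=1}^nE|\xi_l|^2\Big)^{\frac p2}\Big]\lesssim n^{-p}\big[n2^{-j}+n^{\frac p2}2^{-\frac{pj}{2}}\big]\lesssim2^{-\frac{pj}{2}}n^{-\frac p2}
\]
for \(p\ge2\), thanks to the assumption \(2^j\le n\), which gives \(n^{1-p}2^{-j}=(2^j/n)^{\frac p2-1}2^{-\frac{pj}{2}}n^{-\frac p2}\le2^{-\frac{pj}{2}}n^{-\frac p2}\). Combining these with (2.5), we obtain the desired conclusion
\[
E|\hat\alpha_{j,k}-\alpha_{j,k}|^p\lesssim2^{-\frac{pj}{2}}n^{-\frac p2}.
\]
This completes the proof.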
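The assumption \(2^j\le n\) enters only when the two Rosenthal terms are compared for \(p\ge2\). A quick numerical sanity check of that step (an illustration over a small, arbitrary grid, working with base-2 logarithms to avoid underflow) is:

```python
import math


def log2_terms(n, j, p):
    """Base-2 logs of n^{1-p} 2^{-j}, of n^{-p}(n 2^{-j})^{p/2}, and of the rate 2^{-pj/2} n^{-p/2}."""
    L = math.log2(n)
    first = (1 - p) * L - j                 # n^{-p} * n * 2^{-j}
    second = -p * L + (p / 2) * (L - j)     # n^{-p} * (n 2^{-j})^{p/2}
    target = -(p / 2) * j - (p / 2) * L     # 2^{-pj/2} n^{-p/2}
    return first, second, target


def check_grid():
    """True if both Rosenthal terms are <= the target rate whenever 2^j <= n and p >= 2."""
    for p in (2, 2.5, 3, 4, 6):
        for n in (8, 100, 1000, 10**6):
            for j in range(0, int(math.log2(n)) + 1):   # all j with 2^j <= n
                first, second, target = log2_terms(n, j, p)
                if first > target + 1e-9 or abs(second - target) > 1e-9:
                    return False
    return True
```

Indeed, the difference of the logarithms is \((\frac p2-1)(j-\log_2n)\), which is nonpositive exactly when \(2^j\le n\); for \(2^j>n\) and \(p>2\) the first term exceeds the target rate.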

To prove Lemma 2.3, we need the well-known Bernstein inequality.

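Bernstein's inequality ([8]) Let \(X_1,X_2,\dots,X_n\) be i.i.d. random variables with \(E(X_i)=0\) and \(\|X_i\|_\infty\le M\). Then, for each \(\gamma>0\),
\[
P\Big(\Big|\frac1n\sum_{i=1}^nX_i\Big|>\gamma\Big)\le2\exp\Big(-\frac{n\gamma^2}{2[E(X_i^2)+\|X_i\|_\infty\gamma/3]}\Big).
\]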
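As a sanity check of the inequality (not part of the paper's argument), one can compare the Bernstein bound with the exact tail probability for i.i.d. Rademacher variables \(X_i=\pm1\), for which \(E(X_i^2)=1\) and \(\|X_i\|_\infty=1\); the exact tail is a binomial sum.

```python
import math


def bernstein_bound(n, gamma, var, M):
    """2 exp(-n gamma^2 / (2 [var + M gamma / 3]))."""
    return 2 * math.exp(-n * gamma ** 2 / (2 * (var + M * gamma / 3)))


def rademacher_tail(n, gamma):
    """Exact P(|(1/n) sum X_i| > gamma) for i.i.d. X_i = +-1 with probability 1/2."""
    total = 0
    for k in range(n + 1):            # k = number of +1's, so the sum equals 2k - n
        if abs(2 * k - n) > gamma * n:
            total += math.comb(n, k)
    return total / 2 ** n
```

For small \(\gamma\) the bound exceeds 1 and is uninformative; for larger \(\gamma\) it decays exponentially in \(n\), which is what the proof of Lemma 2.3 exploits.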

The next lemma is an extension of Proposition 4.2 in [2].

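Lemma 2.3 Let \(2^j\le\frac{n}{\ln n}\) and let \(\hat\beta^i_{j,k}\) \((i=1,2,3)\) be defined in (1.2). If the density of \(X\) is bounded, then for each \(\varepsilon>0\) there exists \(T>0\) such that, for \(j\ge0\) and \(k\in\mathbb{Z}^2\),
\[
P\Big(|\hat\beta^i_{j,k}-\beta^i_{j,k}|>\frac T2\,2^{-\frac j2}\sqrt{\frac{\ln n}{n}}\Big)\lesssim2^{-\varepsilon j}.\qquad(2.6)
\]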

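Proof We only show (2.6) for \(i=1\). By the definition of \(\hat\beta^1_{j,k}\), \(\hat\beta^1_{j,k}=\frac1n\sum_{l=1}^n\int_{\mathbb{R}}\psi^1_{j,k}(X_l,y)u(y,Y_l)\,dy\), and
\[
\hat\beta^1_{j,k}-\beta^1_{j,k}=\frac1n\sum_{l=1}^n\big(\phi_{j,k_1}(X_l)d_{j,k_2}(Y_l)-\beta^1_{j,k}\big),
\]
where \(d_{j,k_2}(Y_l):=\int_{\mathbb{R}}\psi_{j,k_2}(y)u(y,Y_l)\,dy\) (\(\phi\) and \(\psi\) stand for the one-dimensional Daubechies scaling function and wavelet function, respectively). Define \(\eta_l:=\phi_{j,k_1}(X_l)d_{j,k_2}(Y_l)-\beta^1_{j,k}\). Then \(\hat\beta^1_{j,k}-\beta^1_{j,k}=\frac1n\sum_{l=1}^n\eta_l\) and \(E(\eta_l)=0\), because \(\beta^1_{j,k}=E(\hat\beta^1_{j,k})=E[\phi_{j,k_1}(X_l)d_{j,k_2}(Y_l)]\).

Using (2.1) with \(\psi\) instead of \(\phi\), we get \(|d_{j,k_2}(Y_l)|\lesssim2^{-\frac j2}\). Note that \(|\phi_{j,k_1}(X_l)|=2^{\frac j2}|\phi(2^jX_l-k_1)|\le2^{\frac j2}\|\phi\|_\infty\). Then \(|\phi_{j,k_1}(X_l)d_{j,k_2}(Y_l)|\lesssim1\) and \(|\beta^1_{j,k}|=|E[\phi_{j,k_1}(X_l)d_{j,k_2}(Y_l)]|\lesssim1\). Hence there exists \(C_2>0\) such that
\[
\|\eta_l\|_\infty\le\|\phi_{j,k_1}(X_l)d_{j,k_2}(Y_l)\|_\infty+|\beta^1_{j,k}|\le C_2.\qquad(2.7)
\]
By replacing \(c_{j,k_2}\) and \(\alpha_{j,k}\) with \(d_{j,k_2}\) and \(\beta^1_{j,k}\), respectively, arguments similar to (2.1)–(2.4) show that, for some \(C_1>0\),
\[
E(\eta_l^2)\le C_12^{-j}.\qquad(2.8)
\]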

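Because \(\eta_1,\eta_2,\dots,\eta_n\) are i.i.d. and \(E(\eta_l)=0\) \((l=1,2,\dots,n)\), Bernstein's inequality tells us that
\[
P\Big(|\hat\beta^1_{j,k}-\beta^1_{j,k}|=\Big|\frac1n\sum_{l=1}^n\eta_l\Big|>\frac T2\,2^{-\frac j2}\sqrt{\frac{\ln n}{n}}\Big)\le2\exp\Big(-\frac{n\lambda_j^2}{2[E(\eta_l^2)+\lambda_j\|\eta_l\|_\infty/3]}\Big)\qquad(2.9)
\]
with \(\lambda_j=\frac T2\,2^{-\frac j2}\sqrt{\frac{\ln n}{n}}\) (in this proof only). This with (2.7)–(2.8) implies
\[
\frac{n\lambda_j^2}{2[E(\eta_l^2)+\lambda_j\|\eta_l\|_\infty/3]}\ge\frac{T^2\ln n}{8\big(C_1+\frac{C_2T}{6}\sqrt{\frac{2^j\ln n}{n}}\big)}\ge\frac{T^2\ln n}{8\big(C_1+\frac{C_2T}{6}\big)}
\]
because \(\frac{2^j\ln n}{n}\le1\) by the assumption \(2^j\le\frac{n}{\ln n}\). Note that \(\ln n>j\ln2\) due to \(n\ge2^j\ln n>2^j\) (since \(\ln n>1\) for \(n\ge3\)). Hence
\[
\frac{n\lambda_j^2}{2[E(\eta_l^2)+\lambda_j\|\eta_l\|_\infty/3]}\ge\frac{T^2\ln2}{8\big(C_1+\frac{C_2T}{6}\big)}\,j\ge\varepsilon j
\]
by choosing \(T>0\) such that \(\frac{T^2\ln2}{8(C_1+C_2T/6)}>\varepsilon\). Then (2.9) reduces to
\[
P\Big(|\hat\beta^1_{j,k}-\beta^1_{j,k}|>\frac T2\,2^{-\frac j2}\sqrt{\frac{\ln n}{n}}\Big)\le2e^{-\varepsilon j}\le2\cdot2^{-\varepsilon j},
\]
which gives (2.6) with \(i=1\). Similarly, the conclusions with \(i=2,3\) hold. This completes the proof.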
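The choice of \(T\) at the end of the proof is explicit: \(\frac{T^2\ln2}{8(C_1+C_2T/6)}>\varepsilon\) is equivalent to the quadratic inequality \(T^2\ln2-\frac43\varepsilon C_2T-8\varepsilon C_1>0\). The sketch below returns its positive root, and every larger \(T\) works; the constants passed in are placeholders, since \(C_1\) and \(C_2\) are not computed explicitly in the paper.

```python
import math


def ratio(T, C1, C2):
    """Left-hand side T^2 ln 2 / (8 (C1 + C2 T / 6))."""
    return T * T * math.log(2) / (8 * (C1 + C2 * T / 6))


def min_T(C1, C2, eps):
    """Positive root of T^2 ln 2 - (4/3) eps C2 T - 8 eps C1 = 0."""
    a = math.log(2)
    b = -4 * eps * C2 / 3
    c = -8 * eps * C1
    return (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)
```

Since the leading coefficient is positive and the constant term is negative, the quadratic has exactly one positive root and is positive beyond it, so any \(T>\) `min_T(C1, C2, eps)` satisfies the requirement.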

We close this section with two classical lemmas, which are needed for the proof of the lower bound.

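Lemma 2.4 (Varshamov–Gilbert lemma, [11]) Let \(\Omega:=\{\varepsilon=(\varepsilon_1,\varepsilon_2,\dots,\varepsilon_m):\ \varepsilon_i\in\{0,1\}\}\). Then there exists a subset \(\{\varepsilon^0,\varepsilon^1,\dots,\varepsilon^T\}\) of \(\Omega\) with \(\varepsilon^0=(0,0,\dots,0)\) such that \(T\ge2^{\frac m8}\) and
\[
\sum_{k=1}^m|\varepsilon^i_k-\varepsilon^j_k|\ge\frac m8\qquad(0\le i\neq j\le T).
\]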
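Lemma 2.4 is an existence statement. For small \(m\) a suitable family can be found by a greedy search, sketched below with binary vectors encoded as integers. This is an illustration only: the lemma's proof does not rely on this construction, and the search space grows exponentially in \(m\).

```python
import math


def hamming(a, b):
    """Number of coordinates in which the 0/1 vectors encoded by a and b differ."""
    return bin(a ^ b).count("1")


def vg_family(m):
    """Greedily pick eps^0 = 0, eps^1, ..., eps^T in {0,1}^m with pairwise
    Hamming distance >= m/8, stopping as soon as T >= 2^{m/8}."""
    d = m / 8
    need = 1 + math.ceil(2 ** (m / 8))
    family = [0]
    for cand in range(1, 2 ** m):
        if all(hamming(cand, e) >= d for e in family):
            family.append(cand)
            if len(family) == need:
                return family
    raise ValueError("greedy search failed")
```

The greedy order (smallest integers first) is an arbitrary choice; any family with the stated separation would serve in the proof of Theorem 3.1.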

To state Fano's lemma, we introduce a concept. When \(P\) is absolutely continuous with respect to \(Q\) (denoted by \(P\ll Q\)), the Kullback divergence between the two measures \(P\) and \(Q\) is defined by
\[
K(P,Q):=\int p(x)\ln\frac{p(x)}{q(x)}\,dx,
\]
where \(p(x)\) and \(q(x)\) are the density functions of \(P\) and \(Q\), respectively.

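Lemma 2.5 (Fano's lemma, [6]) Let \((\Omega,\mathcal F,P_k)\) be probability spaces, and let \(A_k\in\mathcal F\), \(k=0,1,\dots,m\), with \(A_k\cap A_{k'}=\emptyset\) for \(k\neq k'\). Then, with
\[
K_m:=\inf_{0\le k'\le m}\frac1m\sum_{k\neq k'}K(P_k,P_{k'}),
\]
we have
\[
\sup_{0\le k\le m}P_k\big(A_k^C\big)\ge\min\Big\{\frac12,\ \sqrt m\exp\big(-3e^{-1}-K_m\big)\Big\},
\]
where \(K(P_k,P_{k'})\) is the Kullback distance of \(P_k\) and \(P_{k'}\) \((k,k'=0,1,\dots,m)\).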

3 Proofs of lower bounds

We restate Theorem 1.1 before giving its proof.

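Theorem 3.1 Let \(\hat f_n\) be an estimator of \(f^*\in B^s_{r,q}(H)\) with \(s>\frac2r\) and \(1\le r,q\le\infty\). Then, for \(1\le p<\infty\),
\[
\sup_{f^*\in B^s_{r,q}(H)}E\|\hat f_n-f^*\|_p^p\gtrsim\max\Big\{n^{-\frac{sp}{2s+1}},\ \Big(\frac{\ln n}{n}\Big)^{\frac{(s-\frac2r+\frac2p)p}{2(s-\frac2r)+1}}\Big\}.
\]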

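Proof As in Sect. 1, we take the two-dimensional tensor product wavelet
\[
\psi^1(x,y):=D_{2N}(x)\psi_{2N}(y),
\]
where \(D_{2N}(\cdot)\) and \(\psi_{2N}(\cdot)\) are the one-dimensional Daubechies scaling function and wavelet function, respectively. Then \(\psi^1\) is \(m\)-regular \((m>s)\) for large \(N\), and
\[
\operatorname{supp}\psi^1\subseteq[0,2N-1]\times[-N+1,N]
\]
due to \(\operatorname{supp}D_{2N}\subseteq[0,2N-1]\) and \(\operatorname{supp}\psi_{2N}\subseteq[-N+1,N]\). Then there exists a compactly supported density function \(g_0\) such that \(\int_{\mathbb{R}^2}g_0(x,y)\,dx\,dy=1\), \(g_0|_{[0,2N-1]\times[-N+1,N]}=c_0\), and \(g_0\in B^s_{r,q}(H)\). Define \(\Delta_j:=\Delta^1_j\times\Delta^2_j\) with
\[
\Delta^1_j:=\{0,2N,4N,\dots,2(2^j-1)N\},\qquad\Delta^2_j:=\{0,\pm2N,\pm4N,\dots,\pm2(2^{j-1}-1)N\}.
\]
Then \(|\Delta_j|=2^j(2^j-1)\sim2^{2j}\) (\(|\Delta_j|\) denotes the cardinality of \(\Delta_j\)). Denote \(a_j:=2^{-(2s+1)j}\) and
\[
\Lambda:=\Big\{g_\varepsilon(x,y)=g_0(x,y)+a_j\sum_{k\in\Delta_j}\varepsilon_k\psi^1_{j,k}(x,y),\ \varepsilon_k\in\{0,1\}\Big\}.
\]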

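Obviously, the supports of \(\psi^1_{j,k}\) and \(\psi^1_{j,k'}\) are disjoint for \(k\neq k'\in\Delta_j\), and \(\operatorname{supp}\psi^1_{j,k}\subseteq\operatorname{supp}g_0\). When \((x,y)\in[0,2N-1]\times[-N+1,N]\),
\[
g_\varepsilon\ge c_0-a_j\|\psi^1_{j,k}\|_\infty\ge c_0-2^{-2sj}\|\psi^1\|_\infty>0
\]
for large \(j\). On the other hand, since each \(\psi^1_{j,k}\) integrates to zero,
\[
\int_{\mathbb{R}^2}g_\varepsilon(x,y)\,dx\,dy=\int_{\mathbb{R}^2}g_0(x,y)\,dx\,dy=1.
\]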

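Moreover, \(g_\varepsilon\in B^s_{r,q}(H)\). In fact, for \(\varepsilon_k\in\{0,1\}\),
\[
\sum_{k\in\Delta_j}|\varepsilon_k|^r\le2^{2j}\quad\text{and}\quad2^{j(s+1-\frac2r)}a_j\Big(\sum_{k\in\Delta_j}|\varepsilon_k|^r\Big)^{\frac1r}\le1.
\]
By Lemma 1.1, \(\big\|a_j\sum_{k\in\Delta_j}\varepsilon_k\psi^1_{j,k}\big\|_{B^s_{r,q}}\le H\). This with \(g_0\in B^s_{r,q}(H)\) implies \(g_\varepsilon\in B^s_{r,q}(H)\). According to Lemma 2.4 (Varshamov–Gilbert lemma), for \(\Omega=\{\varepsilon=(\varepsilon_k)_{k\in\Delta_j}:\ \varepsilon_k\in\{0,1\}\}\), there exists a subset \(\{\varepsilon^{(0)},\varepsilon^{(1)},\dots,\varepsilon^{(M)}\}\) of \(\Omega\) such that \(M\ge2^{\frac{2^{2j}}{8}}\), \(\varepsilon^{(0)}=(0,0,\dots,0)\), and, for \(m,n=0,1,\dots,M\), \(m\neq n\),
\[
\sum_{k\in\Delta_j}|\varepsilon^{(m)}_k-\varepsilon^{(n)}_k|\ge\frac{2^{2j}}{8}.\qquad(3.1)
\]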

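Denote \(\Lambda':=\{g_{\varepsilon^{(0)}},g_{\varepsilon^{(1)}},\dots,g_{\varepsilon^{(M)}}\}\). Then \(\Lambda'\subseteq\Lambda\), and for \(g_{\varepsilon^{(m)}},g_{\varepsilon^{(n)}}\in\Lambda'\),
\[
\|g_{\varepsilon^{(m)}}-g_{\varepsilon^{(n)}}\|_p^p=a_j^p\sum_{k\in\Delta_j}|\varepsilon^{(m)}_k-\varepsilon^{(n)}_k|^p\|\psi^1_{j,k}\|_p^p=2^{-2(sp+1)j}\|\psi^1\|_p^p\sum_{k\in\Delta_j}|\varepsilon^{(m)}_k-\varepsilon^{(n)}_k|^p,
\]
since the supports of \(\psi^1_{j,k}\) \((k\in\Delta_j)\) are mutually disjoint. This with (3.1) leads to
\[
\|g_{\varepsilon^{(m)}}-g_{\varepsilon^{(n)}}\|_p^p\ge C_12^{-2psj}:=\delta_j^p.
\]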

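Define
\[
A_{\varepsilon^{(i)}}:=\Big\{\|\hat f_n-g_{\varepsilon^{(i)}}\|_p<\frac{\delta_j}{2}\Big\},\qquad i=0,1,2,\dots,M.
\]
Then, by the triangle inequality, \(A_{\varepsilon^{(m)}}\cap A_{\varepsilon^{(n)}}=\emptyset\) for \(m\neq n\). Denote by \(P^n_f\) the probability measure with the density \(f^n(x,y):=\prod_{l=1}^nf(x_l,y_l)\). By the construction of \(g_{\varepsilon^{(i)}}\), \(P^n_{g_{\varepsilon^{(i)}}}\ll P^n_{g_0}\). Then it follows from Lemma 2.5 (Fano's lemma) that
\[
\sup_{0\le i\le M}P^n_{g_{\varepsilon^{(i)}}}\Big(\|\hat f_n-g_{\varepsilon^{(i)}}\|_p\ge\frac{\delta_j}{2}\Big)\ge\sup_{0\le i\le M}P^n_{g_{\varepsilon^{(i)}}}\big(A^c_{\varepsilon^{(i)}}\big)\ge\min\Big\{\frac12,\ \sqrt Me^{-\frac3e}e^{-K_M}\Big\}.
\]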

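Furthermore, by Markov's inequality,
\[
E\|\hat f_n-g_{\varepsilon^{(i)}}\|_p^p\ge\frac{\delta_j^p}{2^p}P^n_{g_{\varepsilon^{(i)}}}\Big(\|\hat f_n-g_{\varepsilon^{(i)}}\|_p\ge\frac{\delta_j}{2}\Big)\gtrsim2^{-2psj}P^n_{g_{\varepsilon^{(i)}}}\big(A^c_{\varepsilon^{(i)}}\big).
\]
Taking \(2^j\sim n^{\frac{1}{2(2s+1)}}\), we obtain that
\[
\sup_{0\le i\le M}E\|\hat f_n-g_{\varepsilon^{(i)}}\|_p^p\gtrsim2^{-2psj}\sup_{0\le i\le M}P^n_{g_{\varepsilon^{(i)}}}\big(A^c_{\varepsilon^{(i)}}\big)\gtrsim n^{-\frac{ps}{2s+1}}\min\Big\{\frac12,\ \sqrt Me^{-\frac3e}e^{-K_M}\Big\}\qquad(3.2)
\]
with \(K_M:=\inf_{0\le v\le M}\frac1M\sum_{i\neq v}K\big(P^n_{g_{\varepsilon^{(i)}}},P^n_{g_{\varepsilon^{(v)}}}\big)\). By the definition of the Kullback divergence,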

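\[
\begin{aligned}
K\big(P^n_{g_{\varepsilon^{(i)}}},P^n_{g_0}\big)&=\int_{\mathbb{R}^{2n}}\ln\frac{\prod_{l=1}^ng_{\varepsilon^{(i)}}(x_l,y_l)}{\prod_{l=1}^ng_0(x_l,y_l)}\prod_{l=1}^ng_{\varepsilon^{(i)}}(x_l,y_l)\,dx_1dy_1\cdots dx_ndy_n\\
&=n\int_{\mathbb{R}^2}g_{\varepsilon^{(i)}}(x_1,y_1)\ln\frac{g_{\varepsilon^{(i)}}(x_1,y_1)}{g_0(x_1,y_1)}\,dx_1dy_1\\
&\le n\int_{\mathbb{R}^2}g_{\varepsilon^{(i)}}(x_1,y_1)\Big(\frac{g_{\varepsilon^{(i)}}(x_1,y_1)}{g_0(x_1,y_1)}-1\Big)\,dx_1dy_1,
\end{aligned}\qquad(3.3)
\]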

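where we applied the inequality \(\ln u\le u-1\) for \(u>0\) in the last inequality. Note that, since \(\int_{\mathbb{R}^2}g_{\varepsilon^{(i)}}=\int_{\mathbb{R}^2}g_0=1\),
\[
\int_{\mathbb{R}^2}g_{\varepsilon^{(i)}}(x_1,y_1)\Big(\frac{g_{\varepsilon^{(i)}}(x_1,y_1)}{g_0(x_1,y_1)}-1\Big)dx_1dy_1=\int_{\mathbb{R}^2}g_0(x_1,y_1)^{-1}\big(g_{\varepsilon^{(i)}}(x_1,y_1)-g_0(x_1,y_1)\big)^2dx_1dy_1
\]
and \(g_0(x_1,y_1)=c_0\) for \((x_1,y_1)\in[0,2N-1]\times[-N+1,N]\). Combining this with the Parseval identity, we reduce (3.3) to
\[
K\big(P^n_{g_{\varepsilon^{(i)}}},P^n_{g_0}\big)\le nc_0^{-1}a_j^2\Big\|\sum_{k\in\Delta_j}\varepsilon^{(i)}_k\psi^1_{j,k}\Big\|_2^2=nc_0^{-1}a_j^2\sum_{k\in\Delta_j}|\varepsilon^{(i)}_k|^2\le nc_0^{-1}a_j^22^{2j}.\qquad(3.4)
\]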
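Two facts used in (3.3)–(3.4) are easy to check numerically on finite sample spaces: the Kullback divergence of \(n\)-fold product measures is \(n\) times the one-sample divergence, and \(\ln u\le u-1\) gives \(K(P,Q)\le\int\frac{(p-q)^2}{q}\). The discrete distributions in the checks are arbitrary examples chosen only for illustration.

```python
import itertools
import math


def kl(p, q):
    """Kullback divergence sum_i p_i ln(p_i / q_i) of discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def chi2(p, q):
    """Upper bound sum_i (p_i - q_i)^2 / q_i obtained from ln u <= u - 1."""
    return sum((pi - qi) ** 2 / qi for pi, qi in zip(p, q))


def product_measure(p, n):
    """Probabilities of the n-fold product of p, listed in lexicographic order."""
    return [math.prod(c) for c in itertools.product(p, repeat=n)]
```

Both `product_measure` calls in the check enumerate outcomes in the same order, so the two lists can be compared coordinatewise.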

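Hence
\[
K_M\le\frac1M\sum_{i=1}^MK\big(P^n_{g_{\varepsilon^{(i)}}},P^n_{g_0}\big)\le c_0^{-1}na_j^22^{2j}.
\]
On the other hand, \(2^j\sim n^{\frac{1}{2(2s+1)}}\) implies \(na_j^2\le C\). Then it follows from \(M\ge2^{\frac{2^{2j}}{8}}=e^{\frac{2^{2j}\ln2}{8}}\) that
\[
\sqrt Me^{-K_M}\ge e^{\frac{2^{2j}\ln2}{16}-c_0^{-1}C2^{2j}}\ge1
\]
by choosing \(C>0\) such that \(C<\frac{c_0\ln2}{16}\). This with (3.2) leads to
\[
\sup_{0\le i\le M}E\|\hat f_n-g_{\varepsilon^{(i)}}\|_p^p\gtrsim n^{-\frac{ps}{2s+1}}\min\Big\{\frac12,\ \sqrt Me^{-\frac3e}e^{-K_M}\Big\}\gtrsim n^{-\frac{ps}{2s+1}}.\qquad(3.5)
\]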

Now, it remains to show that
\[
\sup_{f^*\in B^s_{r,q}(H)}E\|\hat f_n-f^*\|_p^p\ge C\Big(\frac{\ln n}{n}\Big)^{\frac{(s-\frac2r+\frac2p)p}{2(s-\frac2r)+1}}.\qquad(3.6)
\]

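Similarly to the proof of (3.5), we construct the family of density functions \(\{g_k,\ k\in\Delta_j\}\) as follows:
\[
g_k(x,y):=g_0(x,y)+a_j\psi^1_{j,k}(x,y),\qquad k\in\Delta_j,
\]
where \(a_j:=2^{-j(s+1-\frac2r)}\). Obviously, \(\int_{\mathbb{R}^2}g_k(x,y)\,dx\,dy=\int_{\mathbb{R}^2}g_0(x,y)\,dx\,dy=1\), and
\[
g_k(x,y)|_{[0,2N-1]\times[-N+1,N]}\ge c_0-2^{-j(s-\frac2r)}\|\psi^1\|_\infty>0
\]
for large \(j\), since \(s>\frac2r\). Then \(g_k\) is a bivariate density function for fixed \(k\in\Delta_j\). From the proof of (3.5) we know that \(g_0\in B^s_{r,q}(H)\). This with
\[
\|a_j\psi^1_{j,k}\|_{B^s_{r,q}}\sim a_j2^{j(s+1-\frac2r)}\le1
\]
implies \(g_k\in B^s_{r,q}(H)\) for \(k\in\Delta_j\). To prove (3.6), we need to show that
\[
\sup_{k\in\Delta_j}E\|\hat f_n-g_k\|_p^p\ge C\Big(\frac{\ln n}{n}\Big)^{\frac{(s-\frac2r+\frac2p)p}{2(s-\frac2r)+1}}.\qquad(3.7)
\]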

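When \(k\neq k'\in\Delta_j\), \(\operatorname{supp}\psi^1_{j,k}\cap\operatorname{supp}\psi^1_{j,k'}=\emptyset\) and
\[
\|g_k-g_{k'}\|_p^p=a_j^p\|\psi^1_{j,k}-\psi^1_{j,k'}\|_p^p=2a_j^p2^{j(p-2)}\|\psi^1\|_p^p=2\cdot2^{-j(s-\frac2r+\frac2p)p}\|\psi^1\|_p^p.
\]
Moreover,
\[
\|g_k-g_{k'}\|_p=2^{\frac1p}\|\psi^1\|_p2^{-j(s-\frac2r+\frac2p)}:=\delta_j.
\]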

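Define \(B_k:=\{\|\hat f_n-g_k\|_p<\frac{\delta_j}{2}\}\). Then \(B_k\cap B_{k'}=\emptyset\) \((k\neq k')\). According to Lemma 2.5 (Fano's lemma), we find that
\[
\sup_{k\in\Delta_j}P^n_{g_k}\Big(\|\hat f_n-g_k\|_p\ge\frac{\delta_j}{2}\Big)\ge\min\Big\{\frac12,\ \sqrt Me^{-3e^{-1}}e^{-K_M}\Big\},\qquad(3.8)
\]
where \(M=|\Delta_j|\) and \(K_M:=\inf_{0\le v\le M}\frac1M\sum_{k\neq v}K(P^n_{g_k},P^n_{g_v})\le\frac1M\sum_{k=0}^MK(P^n_{g_k},P^n_{g_0})\). Similarly to (3.3)–(3.4), we conclude that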

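\[
K\big(P^n_{g_k},P^n_{g_0}\big)\le n\int_{\mathbb{R}^2}g_0(x,y)^{-1}\big(g_k(x,y)-g_0(x,y)\big)^2\,dx\,dy\le c_0^{-1}C_1na_j^2.
\]
Hence \(K_M\le c_0^{-1}C_1na_j^2\). By taking \(2^j\sim(\frac{n}{\ln n})^{\frac{1}{2(s-\frac2r)+1}}\), we obtain that \(\ln2^j\ge\tilde C\ln n\) and \(e^{-K_M}\ge e^{-c_0^{-1}C_1na_j^2}\ge e^{-c_0^{-1}C\ln n}\), thanks to \(na_j^2\le C_2\ln n\) \((C=C_1C_2)\). Moreover, choosing \(C_1\) and \(\tilde C\) such that \(\tilde C>c_0^{-1}C\), we have
\[
\sqrt Me^{-3e^{-1}}e^{-K_M}\gtrsim e^{\ln2^j}e^{-3e^{-1}}e^{-K_M}\ge e^{\tilde C\ln n-c_0^{-1}C\ln n-3e^{-1}}\gtrsim1
\]
due to \(M\sim2^{2j}\). This with (3.8) implies \(\sup_{k\in\Delta_j}P^n_{g_k}(\|\hat f_n-g_k\|_p\ge\frac{\delta_j}{2})\gtrsim1\). Furthermore,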

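\[
\sup_{k\in\Delta_j}E\|\hat f_n-g_k\|_p^p\ge\frac{\delta_j^p}{2^p}\sup_{k\in\Delta_j}P^n_{g_k}\Big(\|\hat f_n-g_k\|_p\ge\frac{\delta_j}{2}\Big)\gtrsim\delta_j^p.
\]
Then the desired conclusion (3.7) follows from \(\delta_j:=2^{\frac1p}\|\psi^1\|_p2^{-j(s-\frac2r+\frac2p)}\) and the choice of \(2^j\sim(\frac{n}{\ln n})^{\frac{1}{2(s-\frac2r)+1}}\). This completes the proof.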

4 Proofs of upper bounds

In this section, we prove the upper bounds for the wavelet estimators, starting with the linear one. We restate Theorem 1.2 as Theorem 4.1 and prove it.

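Theorem 4.1 Let \(\hat f^{\mathrm{lin}}_n\) be the linear estimator of \(f^*\in B^s_{r,q}(H,Q)\) defined in (1.3) with \(1\le r,q<\infty\) and \(s>0\). If the density of \(X\) is bounded, then for \(\{r\ge p\ge1\}\) or \(\{r\le p<\infty\text{ and }s>\frac2r\}\),
\[
\sup_{f^*\in B^s_{r,q}(H,Q)}E\|\hat f^{\mathrm{lin}}_n-f^*\|_p^p\lesssim n^{-\frac{ps'}{2s'+1}}
\]
with \(s'=s-(\frac2r-\frac2p)_+\) and \(x_+:=\max\{x,0\}\).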

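Proof When \(r\le p\), \(s':=s-(\frac2r-\frac2p)_+=s-\frac2r+\frac2p\) and \(B^s_{r,q}(\mathbb{R}^2)\hookrightarrow B^{s'}_{p,q}(\mathbb{R}^2)\) thanks to Remark 1.2. Then
\[
\sup_{f^*\in B^s_{r,q}(H,Q)}E\|\hat f^{\mathrm{lin}}_n-f^*\|_p^p\lesssim\sup_{f^*\in B^{s'}_{p,q}(CH,Q)}E\|\hat f^{\mathrm{lin}}_n-f^*\|_p^p,
\]
where \(C\) is the embedding constant. When \(r>p\), \(\hat f^{\mathrm{lin}}_n\) has compact support, because \(f^*\) and \(\varphi\) do. By the Hölder inequality,
\[
\sup_{f^*\in B^s_{r,q}(H,Q)}E\|\hat f^{\mathrm{lin}}_n-f^*\|_p^p\lesssim\sup_{f^*\in B^s_{r,q}(H,Q)}\big(E\|\hat f^{\mathrm{lin}}_n-f^*\|_r^r\big)^{\frac pr}.
\]
Because \(s'=s\) in that case, it is sufficient to prove that, for every \(1\le p<\infty\),
\[
\sup_{f^*\in B^s_{p,q}(H,Q)}E\|\hat f^{\mathrm{lin}}_n-f^*\|_p^p\lesssim n^{-\frac{ps}{2s+1}}\qquad(4.1)
\]
for the conclusion of Theorem 4.1. Recall that \(\hat f^{\mathrm{lin}}_n:=\sum_{k\in\Lambda_{j_0}}\hat\alpha_{j_0,k}\varphi_{j_0,k}\). Then, by Lemma 2.1,
\[
E\|\hat f^{\mathrm{lin}}_n-E\hat f^{\mathrm{lin}}_n\|_p^p=E\Big\|\sum_{k\in\Lambda_{j_0}}(\hat\alpha_{j_0,k}-\alpha_{j_0,k})\varphi_{j_0,k}\Big\|_p^p\lesssim2^{j_0(p-2)}\sum_{k\in\Lambda_{j_0}}E|\hat\alpha_{j_0,k}-\alpha_{j_0,k}|^p
\]
due to Lemma 1.2. It follows from Lemma 2.2 and \(|\Lambda_{j_0}|\lesssim2^{2j_0}\) that
\[
E\|\hat f^{\mathrm{lin}}_n-E\hat f^{\mathrm{lin}}_n\|_p^p\lesssim2^{j_0(p-2)}2^{2j_0}2^{-\frac{pj_0}{2}}n^{-\frac p2}\lesssim2^{\frac{pj_0}{2}}n^{-\frac p2}\lesssim n^{-\frac{ps}{2s+1}}\qquad(4.2)
\]
thanks to the choice of \(2^{j_0}\sim n^{\frac{1}{2s+1}}\).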

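On the other hand, by Lemma 2.1, \(E(\hat f^{\mathrm{lin}}_n)=\sum_{k\in\Lambda_{j_0}}\alpha_{j_0,k}\varphi_{j_0,k}=P_{j_0}f^*\). Combining this with \(f^*\in B^s_{p,q}(\mathbb{R}^2)\) and Remark 1.1, we get that
\[
\|E\hat f^{\mathrm{lin}}_n-f^*\|_p^p=\|P_{j_0}f^*-f^*\|_p^p\lesssim2^{-j_0ps}.
\]
Taking \(2^{j_0}\sim n^{\frac{1}{2s+1}}\), it is easy to show that
\[
\|E\hat f^{\mathrm{lin}}_n-f^*\|_p^p\lesssim n^{-\frac{ps}{2s+1}}.\qquad(4.3)
\]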

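Hence, by (4.2)–(4.3),
\[
\sup_{f^*\in B^s_{p,q}(H,Q)}E\|\hat f^{\mathrm{lin}}_n-f^*\|_p^p\lesssim\sup_{f^*\in B^s_{p,q}(H,Q)}E\|\hat f^{\mathrm{lin}}_n-E\hat f^{\mathrm{lin}}_n\|_p^p+\sup_{f^*\in B^s_{p,q}(H,Q)}\|E\hat f^{\mathrm{lin}}_n-f^*\|_p^p\lesssim n^{-\frac{ps}{2s+1}},
\]
which means that (4.1) holds. The proof is done.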
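The choice \(2^{j_0}\sim n^{\frac{1}{2s+1}}\) balances the stochastic term \(2^{\frac{pj_0}{2}}n^{-\frac p2}\) in (4.2) against the bias term \(2^{-j_0ps}\). The sketch below sets the constants in both bounds to 1 (an assumption made only for illustration) and minimizes their sum over integer levels \(0\le j\le\log_2n\).

```python
import math


def risk_proxy(j, n, s, p):
    """Variance bound 2^{pj/2} n^{-p/2} plus bias bound 2^{-jps}, constants set to 1."""
    return 2 ** (p * j / 2) * n ** (-p / 2) + 2 ** (-j * p * s)


def best_level(n, s, p):
    """Integer j in [0, log2 n] minimizing the risk proxy (so that 2^j <= n)."""
    levels = range(0, int(math.log2(n)) + 1)
    return min(levels, key=lambda j: risk_proxy(j, n, s, p))
```

For \(n=4096\), \(s=1\), \(p=2\) the minimizer is \(j=4=\frac{\log_2n}{2s+1}\), where both terms equal \(n^{-\frac{ps}{2s+1}}=2^{-8}\).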

Next, we prove the upper bound for the nonlinear estimator.

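Theorem 4.2 Let \(\hat f^{\mathrm{non}}_n\) be the nonlinear estimator of \(f^*\in B^s_{r,q}(H,Q)\) defined in (1.4) with \(1\le r,q<\infty\) and \(s>0\). If the density of \(X\) is bounded, then for \(\{r\ge p\ge1\}\) or \(\{r\le p<\infty\text{ and }s>\frac2r\}\),
\[
\sup_{f^*\in B^s_{r,q}(H,Q)}E\|\hat f^{\mathrm{non}}_n-f^*\|_p^p\lesssim(\ln n)^p\Big(\frac{\ln n}{n}\Big)^{\alpha p}
\]
with \(\alpha:=\min\Big\{\frac{s}{2s+1},\ \frac{s-\frac2r+\frac2p}{2(s-\frac2r)+1}\Big\}\).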

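Proof We only need to prove the case \(r\le p\). In fact, when \(r>p\), \(\hat f^{\mathrm{non}}_n\) has compact support because \(\varphi\), \(\psi^i\), and \(f^*\) do. By the Hölder inequality,
\[
\sup_{f^*\in B^s_{r,q}(H,Q)}E\|\hat f^{\mathrm{non}}_n-f^*\|_p^p\lesssim\sup_{f^*\in B^s_{r,q}(H,Q)}\big(E\|\hat f^{\mathrm{non}}_n-f^*\|_r^r\big)^{\frac pr}.
\]
Using Theorem 4.2 for the case \(r=p\), we find that \(\sup_{f^*\in B^s_{r,q}(H,Q)}E\|\hat f^{\mathrm{non}}_n-f^*\|_r^r\lesssim(\ln n)^r(\frac{\ln n}{n})^{\alpha r}\), and therefore
\[
\sup_{f^*\in B^s_{r,q}(H,Q)}E\|\hat f^{\mathrm{non}}_n-f^*\|_p^p\lesssim(\ln n)^p\Big(\frac{\ln n}{n}\Big)^{\alpha p}.
\]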

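It remains to estimate the case \(r\le p\). Recall that
\[
\hat f^{\mathrm{non}}_n-f^*=\big(\hat f^{\mathrm{lin}}_n-P_{j_0}f^*\big)+\big(P_{j_1+1}f^*-f^*\big)+\sum_{j=j_0}^{j_1}\sum_{i=1}^3\sum_{k\in\Lambda_j}\big(\hat\beta^i_{j,k}\mathbf 1_{\{|\hat\beta^i_{j,k}|>\lambda_j\}}-\beta^i_{j,k}\big)\psi^i_{j,k}
\]
with \(\lambda_j=T2^{-\frac j2}\sqrt{\frac{\ln n}{n}}\). Denote \(f_{j_0,j_1}:=\sum_{j=j_0}^{j_1}\sum_{i=1}^3\sum_{k\in\Lambda_j}(\hat\beta^i_{j,k}\mathbf 1_{\{|\hat\beta^i_{j,k}|>\lambda_j\}}-\beta^i_{j,k})\psi^i_{j,k}\). Then
\[
E\|\hat f^{\mathrm{non}}_n-f^*\|_p^p\lesssim E\|\hat f^{\mathrm{lin}}_n-P_{j_0}f^*\|_p^p+\|P_{j_1+1}f^*-f^*\|_p^p+E\|f_{j_0,j_1}\|_p^p.\qquad(4.4)
\]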

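From the proof of Theorem 4.1 we obtain that
\[
E\|\hat f^{\mathrm{lin}}_n-P_{j_0}f^*\|_p^p\lesssim2^{\frac{j_0p}{2}}n^{-\frac p2}\lesssim\Big(\frac{\ln n}{n}\Big)^{\alpha p}
\]
and, since \(f^*\in B^{s'}_{p,q}\) with \(s'=s-\frac2r+\frac2p\) by Remark 1.2,
\[
\|P_{j_1+1}f^*-f^*\|_p^p\lesssim2^{-j_1ps'}\lesssim\Big(\frac{\ln n}{n}\Big)^{\alpha p}\qquad(4.5)
\]
due to \(2^{j_0}\sim n^{\frac{1}{2m+1}}\), \(2^{j_1}\sim\frac{n}{\ln n}\), and \(\alpha=\min\{\frac{s}{2s+1},\frac{s-\frac2r+\frac2p}{2(s-\frac2r)+1}\}\).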

By the definition of \(f_{j_0,j_1}\) and Lemma 1.2,
\[
E\|f_{j_0,j_1}\|_p^p\lesssim(j_1-j_0+1)^{p-1}\sum_{j=j_0}^{j_1}\sum_{i=1}^32^{j(p-2)}\sum_{k\in\Lambda_j}E\big|\hat\beta^i_{j,k}\mathbf 1_{\{|\hat\beta^i_{j,k}|>\lambda_j\}}-\beta^i_{j,k}\big|^p.
\]

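On the other hand, it is easy to see that
\[
\begin{aligned}
\hat\beta^i_{j,k}\mathbf 1_{\{|\hat\beta^i_{j,k}|>\lambda_j\}}-\beta^i_{j,k}={}&(\hat\beta^i_{j,k}-\beta^i_{j,k})\big(\mathbf 1_{\{|\hat\beta^i_{j,k}|>\lambda_j,\,|\beta^i_{j,k}|<\lambda_j/2\}}+\mathbf 1_{\{|\hat\beta^i_{j,k}|>\lambda_j,\,|\beta^i_{j,k}|\ge\lambda_j/2\}}\big)\\
&-\beta^i_{j,k}\big(\mathbf 1_{\{|\hat\beta^i_{j,k}|\le\lambda_j,\,|\beta^i_{j,k}|>2\lambda_j\}}+\mathbf 1_{\{|\hat\beta^i_{j,k}|\le\lambda_j,\,|\beta^i_{j,k}|\le2\lambda_j\}}\big)
\end{aligned}
\]
and \(\mathbf 1_{\{|\hat\beta^i_{j,k}|>\lambda_j,\,|\beta^i_{j,k}|<\lambda_j/2\}}\le\mathbf 1_{\{|\hat\beta^i_{j,k}-\beta^i_{j,k}|>\lambda_j/2\}}\). Then, since \(j_1-j_0+1\lesssim\ln n\),
\[
E\|f_{j_0,j_1}\|_p^p\lesssim T_1+T_2+T_3+T_4\qquad(4.6)
\]
with
\[
\begin{aligned}
T_1&:=(\ln n)^{p-1}\sum_{j=j_0}^{j_1}\sum_{i=1}^32^{j(p-2)}\sum_{k\in\Lambda_j}E\Big(|\hat\beta^i_{j,k}-\beta^i_{j,k}|^p\mathbf 1_{\{|\hat\beta^i_{j,k}-\beta^i_{j,k}|>\lambda_j/2\}}\Big),\\
T_2&:=(\ln n)^{p-1}\sum_{j=j_0}^{j_1}\sum_{i=1}^32^{j(p-2)}\sum_{k\in\Lambda_j}E\Big(|\hat\beta^i_{j,k}-\beta^i_{j,k}|^p\mathbf 1_{\{|\hat\beta^i_{j,k}|>\lambda_j,\,|\beta^i_{j,k}|\ge\lambda_j/2\}}\Big),\\
T_3&:=(\ln n)^{p-1}\sum_{j=j_0}^{j_1}\sum_{i=1}^32^{j(p-2)}\sum_{k\in\Lambda_j}E\Big(|\beta^i_{j,k}|^p\mathbf 1_{\{|\hat\beta^i_{j,k}|\le\lambda_j,\,|\beta^i_{j,k}|\le2\lambda_j\}}\Big),\\
T_4&:=(\ln n)^{p-1}\sum_{j=j_0}^{j_1}\sum_{i=1}^32^{j(p-2)}\sum_{k\in\Lambda_j}E\Big(|\beta^i_{j,k}|^p\mathbf 1_{\{|\hat\beta^i_{j,k}|\le\lambda_j,\,|\beta^i_{j,k}|>2\lambda_j\}}\Big).
\end{aligned}
\]
When \(|\hat\beta^i_{j,k}|\le\lambda_j\) and \(|\beta^i_{j,k}|>2\lambda_j\), we have \(|\hat\beta^i_{j,k}-\beta^i_{j,k}|\ge|\beta^i_{j,k}|-|\hat\beta^i_{j,k}|>|\beta^i_{j,k}|/2>\lambda_j\). Hence
\[
|\beta^i_{j,k}|^p\mathbf 1_{\{|\hat\beta^i_{j,k}|\le\lambda_j,\,|\beta^i_{j,k}|>2\lambda_j\}}\lesssim|\hat\beta^i_{j,k}-\beta^i_{j,k}|^p\mathbf 1_{\{|\hat\beta^i_{j,k}-\beta^i_{j,k}|>\lambda_j/2\}}.
\]
Then (4.6) reduces to
\[
E\|f_{j_0,j_1}\|_p^p\lesssim T_1+T_2+T_3.\qquad(4.7)
\]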
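The pointwise identity and the two elementary implications used above (on the events \(\{|\hat\beta|>\lambda_j,|\beta|<\lambda_j/2\}\) and \(\{|\hat\beta|\le\lambda_j,|\beta|>2\lambda_j\}\)) can be spot-checked on random inputs. In the sketch, `bh`, `b`, and `lam` stand for \(\hat\beta^i_{j,k}\), \(\beta^i_{j,k}\), and \(\lambda_j\); the sampling ranges and the seed are arbitrary choices.

```python
import random


def ind(cond):
    """Indicator of a boolean condition as a float."""
    return 1.0 if cond else 0.0


def check(bh, b, lam):
    """Verify the decomposition of bh*1{|bh|>lam} - b and the two implications."""
    lhs = (bh if abs(bh) > lam else 0.0) - b
    rhs = ((bh - b) * (ind(abs(bh) > lam and abs(b) < lam / 2)
                       + ind(abs(bh) > lam and abs(b) >= lam / 2))
           - b * (ind(abs(bh) <= lam and abs(b) > 2 * lam)
                  + ind(abs(bh) <= lam and abs(b) <= 2 * lam)))
    ok = abs(lhs - rhs) < 1e-12
    if abs(bh) > lam and abs(b) < lam / 2:
        ok = ok and abs(bh - b) > lam / 2           # event used for T_1
    if abs(bh) <= lam and abs(b) > 2 * lam:
        ok = ok and abs(bh - b) > abs(b) / 2 > lam  # event absorbed from T_4
    return ok


def spot_check(trials=10000, seed=0):
    """Run check() on random triples drawn from a fixed-seed generator."""
    rng = random.Random(seed)
    return all(check(rng.uniform(-3, 3), rng.uniform(-3, 3), rng.uniform(0.01, 1.0))
               for _ in range(trials))
```

Exactly one of the four indicators equals 1 for each triple, which is why the identity holds exactly rather than only as an inequality.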

By (4.4)–(4.5) and (4.7), it is sufficient to show that
\[
T_\ell\lesssim(\ln n)^p\Big(\frac{\ln n}{n}\Big)^{\alpha p},\qquad\ell=1,2,3,\qquad(4.8)
\]
for the conclusion of Theorem 4.2.

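To estimate \(T_1\), using the Hölder inequality, we find that
\[
T_1\lesssim(\ln n)^{p-1}\sum_{j=j_0}^{j_1}\sum_{i=1}^32^{j(p-2)}\sum_{k\in\Lambda_j}\big(E|\hat\beta^i_{j,k}-\beta^i_{j,k}|^{2p}\big)^{\frac12}\big(E\mathbf 1_{\{|\hat\beta^i_{j,k}-\beta^i_{j,k}|>\lambda_j/2\}}\big)^{\frac12}.
\]
Note that \(E(\mathbf 1_{\{|\hat\beta^i_{j,k}-\beta^i_{j,k}|>\lambda_j/2\}})=P(|\hat\beta^i_{j,k}-\beta^i_{j,k}|>\frac{\lambda_j}{2})\lesssim2^{-\varepsilon j}\) due to Lemma 2.3. Taking \(\varepsilon\) such that \(\varepsilon>p\), we conclude that
\[
T_1\lesssim(\ln n)^{p-1}n^{-\frac p2}\sum_{j=j_0}^{j_1}\sum_{i=1}^32^{\frac{(p-\varepsilon)j}{2}}\lesssim(\ln n)^{p-1}n^{-\frac p2}2^{\frac{pj_0}{2}}\lesssim(\ln n)^{p-1}n^{-\frac{ps}{2s+1}}
\]
thanks to Lemma 2.2, \(|\Lambda_j|\lesssim2^{2j}\), and the choice of \(j_0\). Hence (4.8) with \(\ell=1\) holds since \(\alpha\le\frac{s}{2s+1}\).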

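To estimate \(T_2\) and \(T_3\), define
\[
2^{j_0^*}\sim\Big(\frac{n}{\ln n}\Big)^{1-2\alpha},\qquad2^{j_1^*}\sim\Big(\frac{n}{\ln n}\Big)^{\frac{\alpha}{s-\frac2r+\frac2p}}.
\]
Recall that \(2^{j_0}\sim n^{\frac{1}{2m+1}}\), \(2^{j_1}\sim\frac{n}{\ln n}\), and \(\alpha:=\min\{\frac{s}{2s+1},\frac{s-\frac2r+\frac2p}{2(s-\frac2r)+1}\}\). Then
\[
1-2\alpha\ge\frac{1}{2s+1}>\frac{1}{2m+1}\quad\text{and}\quad\frac{\alpha}{s-\frac2r+\frac2p}\le\frac{1}{2(s-\frac2r)+1}\le1.
\]
Hence \(2^{j_0}\le2^{j_0^*}\) and \(2^{j_1^*}\le2^{j_1}\). Moreover, a simple computation shows that \(1-2\alpha\le\frac{\alpha}{s-\frac2r+\frac2p}\), which implies \(2^{j_0^*}\le2^{j_1^*}\).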
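The exponent inequalities just used can be verified with exact rational arithmetic on a grid of admissible parameters (\(1\le r\le p\), \(s>\frac2r\)). The grid below is an arbitrary illustration, not a proof.

```python
from fractions import Fraction as F


def alpha(s, r, p):
    """alpha = min{ s/(2s+1), (s - 2/r + 2/p) / (2(s - 2/r) + 1) }."""
    a = s - F(2) / r
    return min(s / (2 * s + 1), (a + F(2) / p) / (2 * a + 1))


def check(s, r, p):
    """The three exponent inequalities stated before the splitting of T_2 and T_3."""
    a = s - F(2) / r            # s - 2/r > 0
    b = a + F(2) / p            # s - 2/r + 2/p
    al = alpha(s, r, p)
    return (1 - 2 * al >= 1 / (2 * s + 1)
            and al / b <= 1 / (2 * a + 1) <= 1
            and 1 - 2 * al <= al / b)


def grid_ok():
    """Check all admissible (s, r, p) on the grid {1/2, 1, ..., 10}^3."""
    values = [F(k, 2) for k in range(1, 21)]
    for r in values:
        if r < 1:
            continue
        for p in values:
            if p < r:
                continue
            for s in values:
                if s > 2 / r and not check(s, r, p):
                    return False
    return True
```

The last inequality is where \(r\le p\) matters: it is equivalent to \(\alpha\ge\frac{b}{2b+1}\) with \(b=s-\frac2r+\frac2p\), and \(\frac{s}{2s+1}\ge\frac{b}{2b+1}\) holds exactly when \(b\le s\).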

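Now, we estimate \(T_2\) by dividing it into
\[
T_2=(\ln n)^{p-1}\Big(\sum_{j=j_0}^{j_0^*}+\sum_{j=j_0^*+1}^{j_1}\Big)\sum_{i=1}^32^{j(p-2)}\sum_{k\in\Lambda_j}E\Big(|\hat\beta^i_{j,k}-\beta^i_{j,k}|^p\mathbf 1_{\{|\hat\beta^i_{j,k}|>\lambda_j,\,|\beta^i_{j,k}|\ge\lambda_j/2\}}\Big):=t_1+t_2.\qquad(4.9)
\]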

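Since \(\mathbf 1_{\{|\hat\beta^i_{j,k}|>\lambda_j,\,|\beta^i_{j,k}|\ge\lambda_j/2\}}\le1\), by Lemma 2.2 we know that
\[
t_1\lesssim(\ln n)^{p-1}\sum_{j=j_0}^{j_0^*}\sum_{i=1}^32^{\frac{pj}{2}}n^{-\frac p2}\lesssim(\ln n)^{p-1}n^{-\frac p2}2^{\frac{j_0^*p}{2}}\lesssim(\ln n)^p\Big(\frac{\ln n}{n}\Big)^{\alpha p}\qquad(4.10)
\]
thanks to \(|\Lambda_j|\lesssim2^{2j}\) and the choice of \(j_0^*\). To estimate \(t_2\), we observe that
\[
\mathbf 1_{\{|\hat\beta^i_{j,k}|>\lambda_j,\,|\beta^i_{j,k}|\ge\frac{\lambda_j}{2}\}}\le\mathbf 1_{\{|\beta^i_{j,k}|\ge\frac{\lambda_j}{2}\}}\le\Big(\frac{2|\beta^i_{j,k}|}{\lambda_j}\Big)^r.
\]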

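This with Lemma 2.2 leads to
\[
t_2\lesssim(\ln n)^{p-1}\sum_{j=j_0^*+1}^{j_1}\sum_{i=1}^32^{j(\frac p2-2)}n^{-\frac p2}\sum_{k\in\Lambda_j}\Big(\frac{|\beta^i_{j,k}|}{\lambda_j}\Big)^r.\qquad(4.11)
\]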

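Note that \(\|\beta_{j,\cdot}\|_r\lesssim2^{-j(s+1-\frac2r)}\) because of \(f^*\in B^s_{r,q}\) and Lemma 1.1. Then (4.11) reduces to
\[
t_2\lesssim(\ln n)^{p-\frac r2-1}n^{\frac{r-p}{2}}\sum_{j=j_0^*+1}^{j_1}2^{-j(sr+\frac r2-\frac p2)}\qquad(4.12)
\]
thanks to \(\lambda_j=T2^{-\frac j2}\sqrt{\frac{\ln n}{n}}\). Denote \(\theta:=sr+\frac r2-\frac p2\). When \(\theta>0\), we have \(r>\frac{p}{2s+1}\) and
\[
t_2\lesssim(\ln n)^{p-\frac r2-1}n^{\frac{r-p}{2}}2^{-j_0^*(sr+\frac r2-\frac p2)}\lesssim(\ln n)^p\Big(\frac{\ln n}{n}\Big)^{\alpha p}\qquad(4.13)
\]
due to the choice of \(j_0^*\). In (4.13), we use the fact that \(\alpha=\frac{s}{2s+1}\) in the case \(r>\frac{p}{2s+1}\).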

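To show (4.13) for \(\theta\le0\), define \(r_1:=(1-2\alpha)p>0\). Then \(\alpha=\frac{s-\frac2r+\frac2p}{2(s-\frac2r)+1}\le\frac{s}{2s+1}\) and \(r\le\frac{p}{2s+1}\le(1-2\alpha)p=r_1\) because \(\theta\le0\). The same arguments as for (4.11) show that
\[
t_2\lesssim(\ln n)^{p-1}\sum_{j=j_0^*+1}^{j_1}\sum_{i=1}^32^{j(\frac p2-2)}n^{-\frac p2}\sum_{k\in\Lambda_j}\Big(\frac{|\beta^i_{j,k}|}{\lambda_j}\Big)^{r_1}.
\]
It follows from \(f^*\in B^s_{r,q}\) and Lemma 1.1 that
\[
\|\beta_{j,\cdot}\|_{r_1}\le\|\beta_{j,\cdot}\|_r\lesssim2^{-j(s+1-\frac2r)}
\]
due to \(r\le r_1\). Therefore, similarly to (4.12), we get that
\[
t_2\lesssim(\ln n)^{p-\frac{r_1}{2}-1}n^{\frac{r_1-p}{2}}\sum_{j=j_0^*+1}^{j_1}2^{j[\frac p2-2-(s-\frac2r+\frac12)r_1]}.
\]
Note that \(\frac p2-2-(s-\frac2r+\frac12)r_1=0\) because \(r_1=(1-2\alpha)p\) and \(\alpha=\frac{s-\frac2r+\frac2p}{2(s-\frac2r)+1}\). Since the last sum has at most \(j_1\lesssim\ln n\) terms, we obtain
\[
t_2\lesssim(\ln n)^{p-\frac{r_1}{2}}n^{\frac{r_1-p}{2}}\lesssim(\ln n)^p\Big(\frac{\ln n}{n}\Big)^{\alpha p},\qquad(4.14)
\]
which shows that the bound in (4.13) also holds for \(\theta\le0\). The desired conclusion (4.8) with \(\ell=2\) follows from (4.9)–(4.10) and (4.13)–(4.14).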
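Two exponent identities drive the regime \(\theta\le0\): the cancellation \(\frac p2-2-(s-\frac2r+\frac12)r_1=0\) behind (4.14), and the identity \(\frac{p-r}{2}+\frac{\alpha\theta}{s-\frac2r+\frac2p}=\alpha p\) with \(\alpha=\frac{s-\frac2r+\frac2p}{2(s-\frac2r)+1}\), which is what turns \(2^{-j_1^*\theta}\) into \((\frac{\ln n}{n})^{\alpha p}\) in the bound (4.19) for \(e_1^*\). Both can be confirmed exactly with rational arithmetic; the parameter triples in the checks are arbitrary examples with \(\theta\le0\), \(s>\frac2r\), and \(r\le p\).

```python
from fractions import Fraction as F


def exponents(s, r, p):
    """Return the two quantities that should vanish when alpha = b / (2a + 1)."""
    s, r, p = F(s), F(r), F(p)
    a = s - 2 / r                   # s - 2/r
    b = a + 2 / p                   # s - 2/r + 2/p
    al = b / (2 * a + 1)            # alpha in the regime theta <= 0
    r1 = (1 - 2 * al) * p
    theta = s * r + r / 2 - p / 2
    first = p / 2 - 2 - (a + F(1, 2)) * r1          # exponent in the bound for t_2
    second = (p - r) / 2 + al * theta / b - al * p  # exponent in the bound for e_1^*
    return first, second
```

Algebraically, \(r_1=\frac{p-4}{2(s-\frac2r)+1}\), so \((s-\frac2r+\frac12)r_1=\frac p2-2\), and the second identity reduces to \(r\big(s-(s-\frac2r)\big)=2\).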

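Finally, by splitting \(T_3\) into
\[
T_3=(\ln n)^{p-1}\Big(\sum_{j=j_0}^{j_0^*}+\sum_{j=j_0^*+1}^{j_1}\Big)\sum_{i=1}^32^{j(p-2)}\sum_{k\in\Lambda_j}E\Big(|\beta^i_{j,k}|^p\mathbf 1_{\{|\hat\beta^i_{j,k}|\le\lambda_j,\,|\beta^i_{j,k}|\le2\lambda_j\}}\Big):=e_1+e_2,\qquad(4.15)
\]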

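we obtain that
\[
e_1\lesssim(\ln n)^{p-1}\sum_{j=j_0}^{j_0^*}\sum_{i=1}^32^{jp}|\lambda_j|^p\lesssim(\ln n)^{\frac32p-1}n^{-\frac p2}2^{\frac{j_0^*p}{2}}\lesssim(\ln n)^p\Big(\frac{\ln n}{n}\Big)^{\alpha p}\qquad(4.16)
\]
thanks to \(|\Lambda_j|\lesssim2^{2j}\) and the choice of \(\lambda_j\) and \(j_0^*\).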

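To estimate \(e_2\), we use the fact that \(\mathbf 1_{\{|\hat\beta^i_{j,k}|\le\lambda_j,\,|\beta^i_{j,k}|\le2\lambda_j\}}\le(\frac{2\lambda_j}{|\beta^i_{j,k}|})^{p-r}\) because \(r\le p\). Similarly to (4.11)–(4.13),
\[
e_2\lesssim(\ln n)^p\Big(\frac{\ln n}{n}\Big)^{\alpha p}\qquad(4.17)
\]
for \(\theta>0\), where \(\theta:=sr+\frac r2-\frac p2\). When \(\theta\le0\), we rewrite \(e_2\) as follows: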

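\[
e_2=(\ln n)^{p-1}\Big(\sum_{j=j_0^*+1}^{j_1^*}+\sum_{j=j_1^*+1}^{j_1}\Big)\sum_{i=1}^32^{j(p-2)}\sum_{k\in\Lambda_j}E\Big(|\beta^i_{j,k}|^p\mathbf 1_{\{|\hat\beta^i_{j,k}|\le\lambda_j,\,|\beta^i_{j,k}|\le2\lambda_j\}}\Big):=e_1^*+e_2^*.\qquad(4.18)
\]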

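Proceeding as in (4.11) and (4.12), we find that
\[
e_1^*\lesssim(\ln n)^{p-1}\Big(\frac{\ln n}{n}\Big)^{\frac{p-r}{2}}\sum_{j=j_0^*+1}^{j_1^*}2^{-j(sr+\frac{r-p}{2})}\lesssim(\ln n)^{p-1}\Big(\frac{\ln n}{n}\Big)^{\frac{p-r}{2}}2^{-j_1^*(sr+\frac{r-p}{2})}.
\]
This with the choice of \(2^{j_1^*}\sim(\frac{n}{\ln n})^{\frac{\alpha}{s-\frac2r+\frac2p}}\) leads to
\[
e_1^*\lesssim(\ln n)^p\Big(\frac{\ln n}{n}\Big)^{\alpha p}\qquad(4.19)
\]
due to \(\alpha=\frac{s-\frac2r+\frac2p}{2(s-\frac2r)+1}\) for \(\theta\le0\). When \(r\le p\),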

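\[
\|\beta_{j,\cdot}\|_p\le\|\beta_{j,\cdot}\|_r\lesssim2^{-j(s+1-\frac2r)}
\]
thanks to \(f^*\in B^s_{r,q}\) and Lemma 1.1. Therefore
\[
e_2^*\lesssim(\ln n)^{p-1}\sum_{j=j_1^*+1}^{j_1}\sum_{i=1}^32^{j(p-2)}\sum_{k\in\Lambda_j}|\beta^i_{j,k}|^p\lesssim(\ln n)^{p-1}\sum_{j=j_1^*+1}^{j_1}2^{-j(sp-\frac{2p}{r}+2)}.
\]
Combining this with the choice of \(2^{j_1^*}\sim(\frac{n}{\ln n})^{\frac{\alpha}{s-\frac2r+\frac2p}}\), we observe that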

\(e_2^*\lesssim(\ln n)^{p-1}2^{-j_1^*(sp-\frac{2p}{r}+2)}\lesssim(\ln n)^p\big(\frac{\ln n}{n}\big)\)