Wavelet optimal estimations for a two-dimensional continuous-discrete density function over \(L^{p}\) risk


Academic year: 2020


RESEARCH

Open Access


Lin Hu¹, Xiaochen Zeng² and Jinru Wang²*

*Correspondence: [email protected]
²Department of Applied Mathematics, Beijing University of Technology, Beijing, P.R. China. Full list of author information is available at the end of the article.

Abstract

The mixed continuous-discrete density model plays an important role in reliability, finance, biostatistics, and economics. Using wavelet methods, Chesneau, Dewan, and Doosti provide upper bounds of wavelet estimations on \(L^2\) risk for a two-dimensional continuous-discrete density function over Besov spaces \(B^s_{r,q}\). This paper deals with \(L^p\) \((1\le p<\infty)\) risk estimations over Besov spaces, which generalizes Chesneau–Dewan–Doosti's theorems. In addition, we provide a lower bound of \(L^p\) risk for the first time. It turns out that the linear wavelet estimator attains the optimal convergence rate for \(r\ge p\), and the nonlinear one offers optimal estimation up to a logarithmic factor.

Keywords: Wavelets; Density estimation; Continuous-discrete density; Optimality

1 Introduction

1.1 Introduction

Density estimation plays an important role in both statistics and econometrics. This paper considers a two-dimensional density estimation model defined over mixed continuous and discrete variables [2]. More precisely, let \((X_1,Y_1),(X_2,Y_2),\ldots,(X_n,Y_n)\) be independent and identically distributed (i.i.d.) observations of a bivariate random variable \((X,Y)\), where \(X\) is a continuous random variable and \(Y\) is a discrete one. The joint density function of \((X,Y)\) is given by

\[
f(x,v)=\frac{\partial}{\partial x}F(x,v)
\]

with \(F(x,v)=P(X\le x,\,Y=v)\) being the distribution function of \((X,Y)\). We are interested in estimating \(f(x,v)\) from \((X_1,Y_1),(X_2,Y_2),\ldots,(X_n,Y_n)\). This continuous-discrete density model also arises in survival analysis, economics, and social sciences. For example, consider a series system with \(m\) components, which fails as soon as one of its components fails. Let \(X\) be the failure time of the system, and let \(Y\) be the component whose failure caused the failure of the system. Then \((X,Y)\) is a bivariate continuous-discrete random variable. For more examples, see [1] and [4].
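As a concrete illustration of the series-system example, the following Python sketch simulates \((X,Y)\) under an assumption that is not made in the paper: component \(v\) fails after an independent exponential time with rate \(\mu_v\). Under that assumption, \(P(X\le x,Y=v)=\frac{\mu_v}{\mu}(1-e^{-\mu x})\) with \(\mu=\sum_v\mu_v\), so the joint density is \(f(x,v)=\mu_ve^{-\mu x}\). The function names are illustrative only.

```python
import numpy as np

def simulate_series_system(rates, n, seed=0):
    """Draw n i.i.d. copies of (X, Y) for a series system.

    Assumption (illustration only): component v fails after an independent
    Exp(rates[v-1]) time. X is the system failure time (the minimum of the
    component times) and Y in {1, ..., m} is the component that failed first.
    """
    rng = np.random.default_rng(seed)
    rates = np.asarray(rates, dtype=float)
    # numpy parameterizes the exponential law by its scale 1/rate; shape (n, m)
    times = rng.exponential(scale=1.0 / rates, size=(n, rates.size))
    x = times.min(axis=1)
    y = times.argmin(axis=1) + 1  # label components 1..m
    return x, y

def joint_density(x, v, rates):
    """f(x, v) = mu_v * exp(-mu * x) with mu = sum of rates, under the same assumption."""
    rates = np.asarray(rates, dtype=float)
    return rates[v - 1] * np.exp(-rates.sum() * x)

# Usage: empirical P(X <= 0.2, Y = 2) versus (mu_2 / mu) * (1 - exp(-mu * 0.2)).
rates = [1.0, 2.0, 3.0]
x, y = simulate_series_system(rates, 200_000, seed=1)
print(np.mean((x <= 0.2) & (y == 2)), (2.0 / 6.0) * (1.0 - np.exp(-6.0 * 0.2)))
```

The discrete coordinate \(Y\) only takes finitely many values, which is exactly the structure that the weight \(u(y,v)\) introduced in Sect. 1.3 exploits.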

The conventional kernel method gives a good estimation of the continuous-discrete density function [1, 10, 14]. However, it is hard for kernel methods to provide optimal estimation for densities in Besov spaces. In addition, the complexity of bandwidth selection increases the difficulty of the kernel method.

Recently, wavelet methods have achieved remarkable results in density estimation [7, 8, 11, 12, 15] due to their time and frequency localization, multiscale decomposition, and fast algorithms in numerical computations. In fact, wavelet estimation attains optimality for densities in Besov spaces, which avoids the disadvantage of kernel methods. Using the wavelet method, Chesneau et al. [2] constructed linear and nonlinear wavelet estimators for a two-dimensional continuous-discrete density function and derived their mean integrated squared error performance over Besov balls.

This paper addresses \(L^p\) \((1\le p<\infty)\) risk estimations on Besov balls by using wavelet bases, which generalizes Chesneau–Dewan–Doosti's theorems. It should be pointed out that a lower bound for the \(L^p\) risk of all estimators is derived for the first time. It turns out that the linear wavelet estimator is optimal for \(r\ge p\), and the nonlinear one attains optimal estimation up to a logarithmic factor.

1.2 Notations and definitions

In this paper, we use the tensor product method to construct an orthonormal wavelet basis for \(L^2(\mathbb{R}^2)\), which will be used in later discussions. With a one-dimensional Daubechies scaling function \(D_{2N}\) and a wavelet function \(\psi_{2N}\) (\(\psi_{2N}\) can be constructed from the scaling function \(D_{2N}\)), we construct the two-dimensional tensor product wavelets \(\phi\), \(\psi^1\), \(\psi^2\), and \(\psi^3\) as follows:

\[
\phi(x,y):=D_{2N}(x)D_{2N}(y),\qquad \psi^1(x,y):=D_{2N}(x)\psi_{2N}(y),
\]
\[
\psi^2(x,y):=\psi_{2N}(x)D_{2N}(y),\qquad \psi^3(x,y):=\psi_{2N}(x)\psi_{2N}(y).
\]

Then \(\phi\) and \(\psi^i\) \((i=1,2,3)\) are compactly supported in the time domain, because the Daubechies functions \(D_{2N}\) and \(\psi_{2N}\) are [5, 8].
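The tensor product construction can be checked numerically. The sketch below assumes \(N=1\), for which \(D_2\) is the Haar scaling function and \(\psi_2\) the Haar wavelet; this keeps the code closed-form, but Haar is far less regular than the Daubechies functions with large \(N\) used later. It builds \(\phi,\psi^1,\psi^2,\psi^3\) and verifies that they are orthonormal.

```python
import numpy as np

# Assumption: N = 1, i.e. D_2 is the Haar scaling function and psi_2 the Haar wavelet.
def haar_phi(t):
    return ((t >= 0.0) & (t < 1.0)).astype(float)

def haar_psi(t):
    # +1 on [0, 1/2), -1 on [1/2, 1)
    return haar_phi(2.0 * t) - haar_phi(2.0 * t - 1.0)

# Two-dimensional tensor products phi, psi^1, psi^2, psi^3.
TENSOR_WAVELETS = {
    "phi": lambda x, y: haar_phi(x) * haar_phi(y),
    "psi1": lambda x, y: haar_phi(x) * haar_psi(y),
    "psi2": lambda x, y: haar_psi(x) * haar_phi(y),
    "psi3": lambda x, y: haar_psi(x) * haar_psi(y),
}

def gram_matrix(funcs, grid=256):
    """L^2([0,1]^2) inner products by the midpoint rule.

    All four functions are supported in [0,1]^2 and constant on cells of
    side 1/2, so with an even grid the midpoint rule is exact up to rounding.
    """
    t = (np.arange(grid) + 0.5) / grid
    xx, yy = np.meshgrid(t, t, indexing="ij")
    vals = [f(xx, yy).ravel() for f in funcs]
    return np.array([[np.dot(a, b) / grid**2 for b in vals] for a in vals])

print(np.round(gram_matrix(list(TENSOR_WAVELETS.values())), 12))
```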

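Denote

\[
\phi_{j,k}(x,y):=2^{j}\phi\bigl(2^{j}x-k_1,\,2^{j}y-k_2\bigr),\qquad \psi^i_{j,k}(x,y):=2^{j}\psi^i\bigl(2^{j}x-k_1,\,2^{j}y-k_2\bigr)
\]

for \(k=(k_1,k_2)\in\mathbb{Z}^2\) and \(i=1,2,3\). Then, for each \(f\in L^2(\mathbb{R}^2)\),

\[
f=\sum_{k\in\mathbb{Z}^2}\alpha_{j_0,k}\phi_{j_0,k}+\sum_{j=j_0}^{\infty}\sum_{i=1}^{3}\sum_{k\in\mathbb{Z}^2}\beta^i_{j,k}\psi^i_{j,k}
\]

holds in the \(L^2\) sense, where \(\alpha_{j,k}:=\langle f,\phi_{j,k}\rangle\) and \(\beta^i_{j,k}:=\langle f,\psi^i_{j,k}\rangle\). As usual, let \(P_j\) be the orthogonal projection operator defined by

\[
P_jf:=\sum_{k\in\mathbb{Z}^2}\langle f,\phi_{j,k}\rangle\phi_{j,k}.
\]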

One of the advantages of wavelet bases is that they can characterize Besov spaces, which contain Hölder spaces and \(L^2\)-Sobolev spaces as particular examples. Throughout the paper, we work within a Besov space on a compact subset of \(\mathbb{R}^2\). The following lemma gives equivalent definitions of those spaces, which are fundamental in our discussions.

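Lemma 1.1 ([13]) Let \(\phi\) be an \(m\)-regular orthonormal scaling function with the corresponding wavelets \(\psi^i\) \((i=1,2,3)\). If \(f\in L^r(\mathbb{R}^2)\), \(\alpha_{j,k}=\langle f,\phi_{j,k}\rangle\), \(\beta^i_{j,k}=\langle f,\psi^i_{j,k}\rangle\), \(1\le r,q\le\infty\), and \(0<s<m\), then the following assertions are equivalent: (i) \(f\in B^s_{r,q}(\mathbb{R}^2)\); (ii) \(\{2^{js}\|P_{j+1}f-P_jf\|_r\}_{j\ge0}\in l_q\); (iii) \(\|\{2^{j(s+1-\frac{2}{r})}\|\beta_j\|_r\}_{j\ge0}\|_{l_q}<\infty\). The Besov norm of \(f\) can be defined by

\[
\|f\|_{B^s_{r,q}}:=\|\alpha_{j_0,\cdot}\|_r+\bigl\|\bigl(2^{j(s+1-\frac{2}{r})}\|\beta_j\|_r\bigr)_{j\ge j_0}\bigr\|_{l_q},
\]

where \(\|\alpha_{j_0,\cdot}\|_r^r:=\sum_{k\in\mathbb{Z}^2}|\alpha_{j_0,k}|^r\) and \(\|\beta_j\|_r^r:=\sum_{i=1}^{3}\sum_{k\in\mathbb{Z}^2}|\beta^i_{j,k}|^r\).

Here and further, \(A\lesssim B\) means that \(A\le CB\) for some constant \(C>0\) independent of \(A\) and \(B\), \(A\gtrsim B\) means \(B\lesssim A\), and \(A\sim B\) stands for both \(A\lesssim B\) and \(A\gtrsim B\).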

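Remark 1.1 By (i) and (ii) of Lemma 1.1, we observe that

\[
\|P_jf-f\|_r=\Bigl\|\sum_{l=j}^{\infty}(P_{l+1}f-P_lf)\Bigr\|_r\le\sum_{l=j}^{\infty}\|P_{l+1}f-P_lf\|_r\lesssim\sum_{l=j}^{\infty}2^{-ls}\lesssim2^{-js}
\]

for \(f\in B^s_{r,q}(\mathbb{R}^2)\). Hence

\[
\|P_jf-f\|_r\lesssim2^{-js}. \tag{1.1}
\]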
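For the Haar case (\(N=1\), an illustrative assumption), \(P_jf\) is simply the average of \(f\) over each dyadic square of side \(2^{-j}\), so the decay in (1.1) can be observed numerically. For a smooth \(f\) on \([0,1]^2\) the \(L^2\) error below roughly halves when \(j\) increases by one, that is, it decays like \(2^{-j}\), the first-order accuracy Haar can deliver. The grid discretization and the test function are choices made for the sketch.

```python
import numpy as np

def haar_projection(values, j):
    """P_j f for the tensor Haar basis on [0,1]^2.

    `values` holds f at the midpoints of a size x size grid, size = 2^J with
    J >= j. For Haar, P_j f is the mean of f over each dyadic square of side
    2^{-j}; the reshape groups grid cells by dyadic square.
    """
    size = values.shape[0]
    block = size // 2**j
    coarse = values.reshape(2**j, block, 2**j, block).mean(axis=(1, 3))
    return np.repeat(np.repeat(coarse, block, axis=0), block, axis=1)

def l2_error(values, j):
    """Discrete approximation of ||P_j f - f||_2 on [0,1]^2."""
    diff = haar_projection(values, j) - values
    return float(np.sqrt(np.mean(diff**2)))

# Usage with a smooth test function (an arbitrary choice for the sketch).
grid = 512
t = (np.arange(grid) + 0.5) / grid
xx, yy = np.meshgrid(t, t, indexing="ij")
f = np.sin(2 * np.pi * xx) * np.cos(np.pi * yy)
errors = [l2_error(f, j) for j in range(2, 7)]
ratios = [errors[i + 1] / errors[i] for i in range(len(errors) - 1)]
print(ratios)  # ratios approach 1/2, i.e. the error behaves like 2^{-j}
```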

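Remark 1.2 When \(r\le p\), Lemma 1.1 (i) and (iii) imply that, for \(s'-\frac{2}{p}=s-\frac{2}{r}>0\),

\[
B^s_{r,q}(\mathbb{R}^2)\hookrightarrow B^{s'}_{p,q}(\mathbb{R}^2),
\]

where \(A\hookrightarrow B\) stands for a Banach space \(A\) continuously embedded in another Banach space \(B\). More precisely, \(\|u\|_B\le C\|u\|_A\) \((u\in A)\) for some constant \(C>0\).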

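Lemma 1.2 ([13]) Let \(\phi\in L^2(\mathbb{R}^2)\) be a scaling function or a wavelet with \(\sup_{x\in\mathbb{R}^2}\sum_{k\in\mathbb{Z}^2}|\phi(x-k)|<\infty\). Then, for \(\lambda=\{\lambda_k\}\in l_p(\mathbb{Z}^2)\) and \(1\le p\le\infty\),

\[
\Bigl\|\sum_{k\in\mathbb{Z}^2}\lambda_k\phi_{j,k}\Bigr\|_p\sim2^{j(1-\frac{2}{p})}\|\lambda\|_p.
\]

Here \(\|\lambda\|_p\) is the \(l_p(\mathbb{Z}^2)\) norm of \(\lambda\in l_p(\mathbb{Z}^2)\):

\[
\|\lambda\|_p:=\begin{cases}\bigl(\sum_{k\in\mathbb{Z}^2}|\lambda_k|^p\bigr)^{1/p}, & p<\infty,\\ \sup_{k\in\mathbb{Z}^2}|\lambda_k|, & p=\infty.\end{cases}
\]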
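Lemma 1.2 becomes an exact identity for the tensor Haar scaling function (again an illustrative choice, \(N=1\)), because the functions \(\phi_{j,k}\) have disjoint supports and \(|\phi_{j,k}|=2^j\) on a square of area \(2^{-2j}\). The sketch below checks \(\|\sum_k\lambda_k\phi_{j,k}\|_p=2^{j(1-2/p)}\|\lambda\|_p\) on \([0,1]^2\) for random coefficients.

```python
import numpy as np

def expansion_lp_norm(lam, j, p, oversample=4):
    """||sum_k lam_k phi_{j,k}||_p on [0,1]^2 for the tensor Haar scaling function.

    lam has shape (2^j, 2^j); phi_{j,k} = 2^j on the square 2^{-j}(k + [0,1)^2).
    The expansion is piecewise constant, so averaging over a refined grid is exact.
    """
    field = 2.0**j * np.kron(lam, np.ones((oversample, oversample)))
    return float(np.mean(np.abs(field) ** p) ** (1.0 / p))

def lemma_1_2_rhs(lam, j, p):
    """2^{j(1-2/p)} ||lam||_p."""
    return float(2.0 ** (j * (1.0 - 2.0 / p)) * np.sum(np.abs(lam) ** p) ** (1.0 / p))

rng = np.random.default_rng(0)
lam = rng.normal(size=(8, 8))  # coefficients at level j = 3
for p in (1.0, 2.0, 3.5):
    print(p, expansion_lp_norm(lam, 3, p), lemma_1_2_rhs(lam, 3, p))
```

For general Daubechies functions the supports overlap and only the two-sided estimate \(\sim\) of Lemma 1.2 survives.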

1.3 Main results

In this subsection, we state our main results and discuss their relations to other work. To do that, we propose a new bivariate function \(f^*(x,y)\), which modifies the one in [2]. Define

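\[
f^*(x,y):=\sum_{v=1}^{m}u(y,v)P(Y=v)f(x\mid Y=v)
\]

with

\[
u(y,v)=\begin{cases}\dfrac{1}{1+e^{\frac{1}{y-v}+\frac{1}{y-v+1}}}\,1_{(v-1,v)}(y)+\dfrac{e^{\frac{1}{y-v}+\frac{1}{y-v-1}}}{1+e^{\frac{1}{y-v}+\frac{1}{y-v-1}}}\,1_{(v,v+1)}(y), & y\neq v,\\[2mm] 1, & y=v,\end{cases}
\]

where \(1_D\) is the indicator function of a set \(D\).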

The construction of \(f^*\) follows the idea proposed by Chesneau et al. [2] but differs from it: in [2], the weight \(u(y,v)\) is the characteristic function \(1_{\{v-\frac12\le y<v+\frac12\}}\). A careful verification shows that our weight \(u(y,v)\) is differentiable with respect to \(y\) for each \(v\in\{1,2,\ldots,m\}\). Replacing the characteristic function by this smooth weight makes \(f^*\) continuous in \(y\). It is easy to see that, for any \(y=v\in\{1,2,\ldots,m\}\),

\[
f^*(x,y)=f(x,v).
\]
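A direct Python transcription of the weight \(u(y,v)\) and of \(f^*\) makes the properties used later easy to check: \(u(v,v)=1\), \(u\) takes values in \([0,1]\), and on \((v,v+1)\) the weights attached to \(v\) and \(v+1\) add up to one (a fact one can verify from the definition). The logistic form is evaluated through a numerically stable helper; the names `u`, `f_star`, and the conditional densities in the usage are hypothetical choices for illustration.

```python
import math

def _logistic(z):
    """Numerically stable 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def u(y, v):
    """The weight u(y, v) from the definition of f*."""
    if y == v:
        return 1.0
    t = y - v
    if -1.0 < t < 0.0:
        # 1 / (1 + e^{1/(y-v) + 1/(y-v+1)})
        return _logistic(-(1.0 / t + 1.0 / (t + 1.0)))
    if 0.0 < t < 1.0:
        # e^z / (1 + e^z) with z = 1/(y-v) + 1/(y-v-1)
        return _logistic(1.0 / t + 1.0 / (t - 1.0))
    return 0.0

def f_star(x, y, probs, cond_densities):
    """f*(x, y) = sum_v u(y, v) P(Y = v) f(x | Y = v), v = 1..m."""
    return sum(u(y, v) * p * g(x)
               for v, (p, g) in enumerate(zip(probs, cond_densities), start=1))

# Usage with two hypothetical conditional densities f(x | Y = v).
probs = [0.3, 0.7]
conds = [lambda x: math.exp(-x), lambda x: 2.0 * math.exp(-2.0 * x)]
print(f_star(0.5, 1, probs, conds), 0.3 * math.exp(-0.5))  # f*(x, v) = f(x, v)
```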

Hence, the problem is converted into constructing an estimator of \(f^*\). As in [2], we assume that \(f^*\) belongs to the space \(B^s_{r,q}(H,Q)\) or, equivalently, that \(f^*\) belongs to the Besov ball

\[
B^s_{r,q}(H):=\bigl\{f:\ f\in B^s_{r,q}(\mathbb{R}^2)\ \text{and}\ \|f\|_{B^s_{r,q}}\le H\bigr\}
\]

and that the support of \(f^*(\cdot,v)\) is contained in \([-Q,Q]\) for each fixed \(v\) \((Q>0,\ v=1,2,\ldots,m)\). To introduce the wavelet estimators, we need the estimators of the wavelet coefficients \(\alpha_{j,k}\) and \(\beta^i_{j,k}\):

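\[
\hat\alpha_{j,k}=\frac1n\sum_{l=1}^{n}\int_{\mathbb{R}}\phi_{j,k}(X_l,y)u(y,Y_l)\,dy,\qquad \hat\beta^i_{j,k}=\frac1n\sum_{l=1}^{n}\int_{\mathbb{R}}\psi^i_{j,k}(X_l,y)u(y,Y_l)\,dy. \tag{1.2}
\]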
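The separable form used in the proofs of Lemmas 2.1 and 2.2, \(\hat\alpha_{j,k}=\frac1n\sum_l\varphi_{j,k_1}(X_l)c_{j,k_2}(Y_l)\), suggests how (1.2) can be computed: the \(y\)-integral depends on \(Y_l\) only through its value \(v\), so it can be evaluated once per \(v\). The sketch assumes the Haar scaling function and a midpoint quadrature rule; the weight is passed in as a function, so either \(u(y,v)\) or, as in the usage example, the indicator weight of [2] can be plugged in.

```python
import numpy as np

def phi_jk(t, j, k):
    """Haar phi_{j,k}(t) = 2^{j/2} phi(2^j t - k) with phi = 1_[0,1) (Haar is an assumption)."""
    s = 2.0**j * np.asarray(t, dtype=float) - k
    return 2.0 ** (j / 2) * ((s >= 0.0) & (s < 1.0))

def c_jk(v, j, k2, weight, grid=4096):
    """c_{j,k2}(v) = int phi_{j,k2}(y) weight(y, v) dy, midpoint rule on supp phi_{j,k2}."""
    a, b = k2 / 2.0**j, (k2 + 1) / 2.0**j
    h = (b - a) / grid
    y = a + (np.arange(grid) + 0.5) * h
    return float(np.sum(phi_jk(y, j, k2) * weight(y, v)) * h)

def alpha_hat(X, Y, j, k, weight):
    """alpha-hat_{j,k} = (1/n) sum_l phi_{j,k1}(X_l) c_{j,k2}(Y_l), cf. (1.2)."""
    k1, k2 = k
    Y = np.asarray(Y)
    # the y-integral depends on Y_l only through its value v: compute it once per v
    c = {int(v): c_jk(int(v), j, k2, weight) for v in np.unique(Y)}
    c_of_Y = np.array([c[int(v)] for v in Y])
    return float(np.mean(phi_jk(X, j, k1) * c_of_Y))

def indicator_weight(y, v):
    """Weight of [2], 1{v - 1/2 <= y < v + 1/2}, used here as a stand-in for u."""
    return ((y >= v - 0.5) & (y < v + 0.5)).astype(float)

# Usage on a tiny hand-made sample; the result is approximately 0.5.
print(alpha_hat([0.1, 0.7, 0.3, 0.2], [1, 1, 2, 1], j=1, k=(0, 2), weight=indicator_weight))
```

\(\hat\beta^i_{j,k}\) is computed the same way, with the Haar wavelet replacing \(\varphi\) in the appropriate coordinate.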

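Define \(\Lambda_{j_0}:=\{k\in\mathbb{Z}^2:\ \operatorname{supp}f^*\cap\operatorname{supp}\phi_{j_0,k}\neq\emptyset\}\). When \(f^*\) and \(\phi\) have compact supports, the cardinality of \(\Lambda_j\) satisfies \(|\Lambda_j|\lesssim2^{2j}\). Then the linear wavelet estimator of \(f^*\) is given as follows:

\[
\hat f^{\mathrm{lin}}_n(x,y):=\sum_{k\in\Lambda_{j_0}}\hat\alpha_{j_0,k}\phi_{j_0,k}(x,y), \tag{1.3}
\]

where \(j_0\) is chosen such that \(2^{j_0}\sim n^{\frac{1}{2s'+1}}\), \(s':=s-(\frac2r-\frac2p)_+\), and \(x_+:=\max\{x,0\}\). To obtain a nonlinear estimator, we take \(j_0\) and \(j_1\) such that \(2^{j_1}\sim\frac{n}{\ln n}\) and \(2^{j_0}\sim n^{\frac{1}{2m+1}}\) with \(m>s\). Define \(\Lambda_j:=\{k\in\mathbb{Z}^2:\ \operatorname{supp}f^*\cap\operatorname{supp}\psi^i_{j,k}\neq\emptyset\}\) and \(\lambda_j:=T2^{-\frac j2}\sqrt{\frac{\ln n}{n}}\) (\(T>0\) is the constant described in Lemma 2.3). Then the nonlinear estimator is given by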

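\[
\hat f^{\mathrm{non}}_n(x,y):=\sum_{k\in\Lambda_{j_0}}\hat\alpha_{j_0,k}\phi_{j_0,k}(x,y)+\sum_{j=j_0}^{j_1}\sum_{i=1}^{3}\sum_{k\in\Lambda_j}\hat\beta^i_{j,k}1_{\{|\hat\beta^i_{j,k}|>\lambda_j\}}\psi^i_{j,k}(x,y). \tag{1.4}
\]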
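The tuning of (1.4) can be sketched as follows. Rounding the levels down to integers and the numerical value of \(T\) are assumptions of the sketch; the text only fixes the orders \(2^{j_0}\sim n^{1/(2m+1)}\), \(2^{j_1}\sim n/\ln n\), and \(\lambda_j=T2^{-j/2}\sqrt{\ln n/n}\). The hard-thresholding rule keeps an empirical coefficient only when it exceeds \(\lambda_j\) in absolute value.

```python
import math
import numpy as np

def nonlinear_levels(n, m):
    """Integer levels with 2^{j0} ~ n^{1/(2m+1)} and 2^{j1} ~ n / ln n (rounded down: an assumption)."""
    j0 = math.floor(math.log2(n) / (2 * m + 1))
    j1 = math.floor(math.log2(n / math.log(n)))
    return j0, j1

def threshold(j, n, T):
    """lambda_j = T 2^{-j/2} sqrt(ln n / n)."""
    return T * 2.0 ** (-j / 2) * math.sqrt(math.log(n) / n)

def hard_threshold(beta_hat, lam):
    """Keep beta_hat where |beta_hat| > lam and set it to zero elsewhere, as in (1.4)."""
    beta_hat = np.asarray(beta_hat, dtype=float)
    return np.where(np.abs(beta_hat) > lam, beta_hat, 0.0)

n, m, T = 10**6, 3, 2.0  # illustrative values; T is not specified numerically in the text
j0, j1 = nonlinear_levels(n, m)
print(j0, j1, [round(threshold(j, n, T), 6) for j in (j0, j1)])
```

None of these quantities involves \(s\), \(r\), \(q\), or \(H\), which is the adaptivity property noted next (only the bound \(m>s\) enters through \(j_0\)).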

From the definition of \(\hat f^{\mathrm{non}}_n\), we find that the nonlinear estimator has the advantage of being adaptive, since its construction does not depend on the indices \(s\), \(r\), \(q\), and \(H\).

The following theorem gives a lower bound for the \(L^p\) risk.

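Theorem 1.1 Let \(\hat f_n\) be an estimator of \(f^*\in B^s_{r,q}(H)\) with \(s>\frac2r\) and \(r,q\ge1\). Then there exists \(C>0\) such that, for \(1\le p<\infty\),

\[
\sup_{f^*\in B^s_{r,q}(H)}E\|\hat f_n-f^*\|_p^p\ge C\max\Bigl\{n^{-\frac{sp}{2s+1}},\ \Bigl(\frac{\ln n}{n}\Bigr)^{\frac{(s-\frac2r+\frac2p)p}{2(s-\frac2r)+1}}\Bigr\}.
\]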

The upper bounds of the linear and nonlinear wavelet estimators are provided by Theorems 1.2 and 1.3, respectively.

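Theorem 1.2 Let \(\hat f^{\mathrm{lin}}_n\) be the estimator of \(f^*\in B^s_{r,q}(H,Q)\) defined by (1.3) with \(1\le r,q<\infty\) and \(s>0\). If the density of \(X\) is bounded, then for \(r\ge p\ge1\), or for \(r\le p<\infty\) with \(s>\frac2r\),

\[
\sup_{f^*\in B^s_{r,q}(H,Q)}E\|\hat f^{\mathrm{lin}}_n-f^*\|_p^p\lesssim n^{-\frac{ps'}{2s'+1}}
\]

with \(s'=s-(\frac2r-\frac2p)_+\) and \(x_+:=\max\{x,0\}\).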

Remark 1.3 If \(r\ge2\), \(p=2\), and \(s>0\), then \(s'=s\), and Theorem 1.2 reduces to Theorem 4.1 in [2]. In addition, Theorem 1.2 does not impose any restriction on \(Q\), so its assumptions are weaker than those in [2]. Theorem 1.2 extends the corresponding theorem of [2] from \(p=2\) to \(p\in[1,\infty)\).

When \(r\ge p\), we have \(s'=s\), and the linear wavelet estimator \(\hat f^{\mathrm{lin}}_n\) attains optimality thanks to Theorems 1.1 and 1.2. However, the linear estimator does not offer optimal estimation for \(r<p\), because \(s'<s\) and \(\frac{s'}{2s'+1}<\frac{s}{2s+1}\) in this case.

To obtain a suboptimal estimation (optimal up to a logarithmic factor) for \(r<p\), we need the nonlinear wavelet estimator defined by (1.4).

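Theorem 1.3 Let \(\hat f^{\mathrm{non}}_n\) be the estimator of \(f^*\in B^s_{r,q}(H,Q)\) defined by (1.4) with \(1\le r,q<\infty\) and \(s>0\). If the density of \(X\) is bounded, then for \(r\ge p\ge1\), or for \(r\le p<\infty\) with \(s>\frac2r\),

\[
\sup_{f^*\in B^s_{r,q}(H,Q)}E\|\hat f^{\mathrm{non}}_n-f^*\|_p^p\lesssim(\ln n)^p\Bigl(\frac{\ln n}{n}\Bigr)^{\alpha p}
\]

with \(\alpha:=\min\bigl\{\frac{s}{2s+1},\ \frac{s-\frac2r+\frac2p}{2(s-\frac2r)+1}\bigr\}\).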

Remark 1.4 Theorems 1.1 and 1.3 tell us that the nonlinear estimator is optimal up to a logarithmic factor. Moreover, if \(p=2\) and either \(\{r\ge2,\ s>0\}\) or \(\{1\le r<2,\ s>\frac2r\}\), then \(\alpha=\frac{s}{2s+1}\), and Theorem 1.3 coincides with Theorem 4.2 in [2] up to a logarithmic factor. Hence Theorem 1.3 can be considered as an extension of Theorem 4.2 in [2] from \(p=2\) to \(p\in[1,\infty)\).
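The rate exponents in Theorems 1.1–1.3 are easy to compare numerically. The helper functions below (hypothetical names) compute \(s'\), the linear exponent \(\frac{ps'}{2s'+1}\), \(\alpha\), and the two lower-bound exponents; the usage reproduces the comparisons in Remark 1.3 and the discussion after it.

```python
def s_prime(s, r, p):
    """s' = s - (2/r - 2/p)_+."""
    return s - max(2.0 / r - 2.0 / p, 0.0)

def linear_exponent(s, r, p):
    """e such that the bound of Theorem 1.2 is n^{-e}: e = p s' / (2 s' + 1)."""
    sp = s_prime(s, r, p)
    return p * sp / (2.0 * sp + 1.0)

def nonlinear_alpha(s, r, p):
    """alpha = min{ s/(2s+1), (s - 2/r + 2/p) / (2(s - 2/r) + 1) } from Theorem 1.3."""
    return min(s / (2.0 * s + 1.0),
               (s - 2.0 / r + 2.0 / p) / (2.0 * (s - 2.0 / r) + 1.0))

def lower_bound_exponents(s, r, p):
    """Exponents a, b in the lower bound max{n^{-a}, (ln n / n)^{b}} of Theorem 1.1."""
    a = s * p / (2.0 * s + 1.0)
    b = (s - 2.0 / r + 2.0 / p) * p / (2.0 * (s - 2.0 / r) + 1.0)
    return a, b

# r >= p: the linear exponent equals the lower-bound exponent a.
print(linear_exponent(2.0, 4.0, 2.0), lower_bound_exponents(2.0, 4.0, 2.0)[0])
# r < p: s' < s, so the linear exponent is strictly smaller than a.
print(linear_exponent(2.0, 2.0, 4.0), lower_bound_exponents(2.0, 2.0, 4.0)[0])
```

By construction, \(\alpha p\) equals the smaller of the two lower-bound exponents, which is why the nonlinear estimator matches Theorem 1.1 up to the logarithmic factor.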

Moreover, the theorems can be extended to the multidimensional case as in [3] by using the technique developed in [9]. The estimation of a multivariate continuous-discrete conditional density is a challenging problem; we refer to [3] for further details.

2 Some lemmas

In this section, we prove several lemmas that are needed for the proofs of our main theorems.

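Lemma 2.1 Let \(\hat\alpha_{j,k}\) and \(\hat\beta^i_{j,k}\) be defined by (1.2). Then

\[
E(\hat\alpha_{j,k})=\alpha_{j,k}\quad\text{and}\quad E(\hat\beta^i_{j,k})=\beta^i_{j,k}
\]

for \(j\ge j_0\), \(k\in\mathbb{Z}^2\), and \(i=1,2,3\).

Proof Let \(\varphi\) be the one-dimensional scaling function \(D_{2N}\) and \(\varphi_{j,k_1}(x):=2^{j/2}\varphi(2^jx-k_1)\), so that \(\phi_{j,k}(x,y)=\varphi_{j,k_1}(x)\varphi_{j,k_2}(y)\). Denote \(c_{j,k}(v):=\int_{\mathbb{R}}\varphi_{j,k_2}(y)u(y,v)\,dy\). Then

\[
\hat\alpha_{j,k}=\frac1n\sum_{i=1}^{n}\int_{\mathbb{R}}\phi_{j,k}(X_i,y)u(y,Y_i)\,dy=\frac1n\sum_{i=1}^{n}\varphi_{j,k_1}(X_i)c_{j,k}(Y_i).
\]

Since \((X_1,Y_1),(X_2,Y_2),\ldots,(X_n,Y_n)\) are independent and identically distributed, we have

\[
\begin{aligned}
E(\hat\alpha_{j,k})&=E\bigl[\varphi_{j,k_1}(X_1)c_{j,k}(Y_1)\bigr]=E\bigl\{E\bigl[\varphi_{j,k_1}(X_1)c_{j,k}(Y_1)\mid Y_1\bigr]\bigr\}\\
&=E\bigl\{c_{j,k}(Y_1)E\bigl[\varphi_{j,k_1}(X_1)\mid Y_1\bigr]\bigr\}=E\Bigl[c_{j,k}(Y_1)\int_{\mathbb{R}}\varphi_{j,k_1}(x)f(x\mid Y_1)\,dx\Bigr]\\
&=\sum_{v=1}^{m}P(Y_1=v)c_{j,k}(v)\int_{\mathbb{R}}\varphi_{j,k_1}(x)f(x\mid Y_1=v)\,dx\\
&=\int_{\mathbb{R}^2}\sum_{v=1}^{m}P(Y_1=v)u(y,v)f(x\mid Y_1=v)\varphi_{j,k_1}(x)\varphi_{j,k_2}(y)\,dx\,dy\\
&=\int_{\mathbb{R}^2}f^*(x,y)\phi_{j,k}(x,y)\,dx\,dy=\alpha_{j,k}.
\end{aligned}
\]

Similarly, \(E(\hat\beta^i_{j,k})=\beta^i_{j,k}\). This completes the proof of Lemma 2.1.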
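Lemma 2.1 can be illustrated by a small Monte Carlo experiment. The sketch assumes a toy model that is not taken from the paper: \(Y\) takes the values 1 and 2 with probabilities 0.4 and 0.6, \(X\) is uniform on \((0,1)\) and independent of \(Y\), and the one-dimensional scaling function is Haar. The exact coefficient follows the last lines of the proof, \(\alpha_{j,k}=\sum_vP(Y=v)c_{j,k}(v)\int\varphi_{j,k_1}(x)f(x\mid Y=v)\,dx\), and the sample mean of \(\varphi_{j,k_1}(X_l)c_{j,k}(Y_l)\) should be close to it.

```python
import numpy as np

def expit(z):
    """Numerically stable logistic function 1 / (1 + e^{-z})."""
    return 0.5 * (1.0 + np.tanh(0.5 * z))

def u(y, v):
    """Smooth weight u(y, v) of f*, vectorized in y."""
    t = np.asarray(y, dtype=float) - v
    out = np.zeros_like(t)
    out[t == 0.0] = 1.0
    left = (t > -1.0) & (t < 0.0)
    right = (t > 0.0) & (t < 1.0)
    tl, tr = t[left], t[right]
    out[left] = expit(-(1.0 / tl + 1.0 / (tl + 1.0)))
    out[right] = expit(1.0 / tr + 1.0 / (tr - 1.0))
    return out

def haar_phi_jk(t, j, k):
    """Haar varphi_{j,k}(t) = 2^{j/2} 1{0 <= 2^j t - k < 1} (Haar is an assumption)."""
    s = 2.0**j * np.asarray(t, dtype=float) - k
    return 2.0 ** (j / 2) * ((s >= 0.0) & (s < 1.0))

def c_jk(v, j, k2, grid=20000):
    """c_{j,k}(v) = int varphi_{j,k2}(y) u(y, v) dy by the midpoint rule."""
    a, b = k2 / 2.0**j, (k2 + 1) / 2.0**j
    y = a + (np.arange(grid) + 0.5) * (b - a) / grid
    return float(np.sum(haar_phi_jk(y, j, k2) * u(y, v)) * (b - a) / grid)

def monte_carlo_check(n=200_000, j=1, k=(0, 2), probs=(0.4, 0.6), seed=0):
    rng = np.random.default_rng(seed)
    values = (1, 2)
    Y = rng.choice(values, size=n, p=probs)
    X = rng.uniform(0.0, 1.0, size=n)
    c = {v: c_jk(v, j, k[1]) for v in values}
    estimate = float(np.mean(haar_phi_jk(X, j, k[0]) * np.where(Y == 1, c[1], c[2])))
    # For k1 = 0 and X ~ U(0,1): int varphi_{j,0}(x) dx = 2^{j/2} * 2^{-j}.
    int_phi = 2.0 ** (j / 2) * 2.0 ** (-j)
    exact = sum(p * c[v] for p, v in zip(probs, values)) * int_phi
    return estimate, exact, c

estimate, exact, c = monte_carlo_check()
print(estimate, exact)
```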

To prove Lemma 2.2, we recall Rosenthal's inequality.

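Rosenthal's inequality ([8]) Let \(X_1,X_2,\ldots,X_n\) be independent random variables such that \(EX_l=0\) and \(E|X_l|^p<\infty\) \((l=1,2,\ldots,n)\). Then, with a constant \(C_p>0\),

\[
E\Bigl|\sum_{l=1}^{n}X_l\Bigr|^p\le\begin{cases}C_p\Bigl[\sum_{l=1}^{n}E|X_l|^p+\Bigl(\sum_{l=1}^{n}E|X_l|^2\Bigr)^{p/2}\Bigr], & p\ge2,\\ C_p\Bigl(\sum_{l=1}^{n}E|X_l|^2\Bigr)^{p/2}, & 1\le p\le2.\end{cases}
\]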

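Lemma 2.2 Let \(\hat\alpha_{j,k}\) and \(\hat\beta^i_{j,k}\) be defined by (1.2). If the density of \(X\) is bounded, then there exists a constant \(C>0\) such that

\[
E|\hat\alpha_{j,k}-\alpha_{j,k}|^p\le C2^{-\frac{pj}{2}}n^{-\frac p2}\quad\text{and}\quad E|\hat\beta^i_{j,k}-\beta^i_{j,k}|^p\le C2^{-\frac{pj}{2}}n^{-\frac p2}
\]

for \(1\le p<\infty\) and \(2^j\le n\).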

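Proof We only prove the first inequality, since the second one is similar. By the definition of \(\hat\alpha_{j,k}\),

\[
\hat\alpha_{j,k}=\frac1n\sum_{l=1}^{n}\int_{\mathbb{R}}\phi_{j,k}(X_l,y)u(y,Y_l)\,dy=\frac1n\sum_{l=1}^{n}\varphi_{j,k_1}(X_l)c_{j,k_2}(Y_l),
\]

where \(c_{j,k_2}(Y_l):=\int_{\mathbb{R}}\varphi_{j,k_2}(y)u(y,Y_l)\,dy\), and \(\varphi\) is the one-dimensional Daubechies scaling function \(D_{2N}\). Since \(0\le u(y,v)\le1\), we obtain that

\[
|c_{j,k_2}(Y_l)|\le\int_{\mathbb{R}}|\varphi_{j,k_2}(y)u(y,Y_l)|\,dy\le2^{-\frac j2}\|\varphi\|_1 \tag{2.1}
\]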

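and

\[
E\bigl|\varphi_{j,k_1}(X_l)c_{j,k_2}(Y_l)\bigr|^p\lesssim2^{-\frac{pj}{2}}E\bigl|\varphi_{j,k_1}(X_l)\bigr|^p=2^{-\frac{pj}{2}}\int_{\mathbb{R}}\bigl|\varphi_{j,k_1}(x)\bigr|^pf_X(x)\,dx\lesssim2^{-j} \tag{2.2}
\]

due to the boundedness of \(f_X\). Define \(\xi_l:=\varphi_{j,k_1}(X_l)c_{j,k_2}(Y_l)-\alpha_{j,k}\). Then

\[
E|\xi_l|^p=E\bigl|\varphi_{j,k_1}(X_l)c_{j,k_2}(Y_l)-\alpha_{j,k}\bigr|^p\lesssim E\bigl|\varphi_{j,k_1}(X_l)c_{j,k_2}(Y_l)\bigr|^p+|\alpha_{j,k}|^p. \tag{2.3}
\]

It follows from Lemma 2.1 and Jensen's inequality that

\[
|\alpha_{j,k}|^p=\bigl|E\bigl[\varphi_{j,k_1}(X_l)c_{j,k_2}(Y_l)\bigr]\bigr|^p\le E\bigl|\varphi_{j,k_1}(X_l)c_{j,k_2}(Y_l)\bigr|^p.
\]

Hence (2.3) reduces to

\[
E|\xi_l|^p\lesssim E\bigl|\varphi_{j,k_1}(X_l)c_{j,k_2}(Y_l)\bigr|^p\lesssim2^{-j} \tag{2.4}
\]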

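thanks to (2.2). By the definitions of \(\hat\alpha_{j,k}\) and \(\xi_l\), \(\hat\alpha_{j,k}-\alpha_{j,k}=\frac1n\sum_{l=1}^{n}\xi_l\), where \(\xi_1,\xi_2,\ldots,\xi_n\) are independent because \((X_1,Y_1),(X_2,Y_2),\ldots,(X_n,Y_n)\) are. On the other hand, Lemma 2.1 implies \(E(\xi_l)=0\). Then Rosenthal's inequality leads to

\[
E|\hat\alpha_{j,k}-\alpha_{j,k}|^p=E\Bigl|\frac1n\sum_{l=1}^{n}\xi_l\Bigr|^p\lesssim\begin{cases}n^{-p}\Bigl[\sum_{l=1}^{n}E|\xi_l|^p+\Bigl(\sum_{l=1}^{n}E|\xi_l|^2\Bigr)^{\frac p2}\Bigr], & p\ge2,\\ n^{-p}\Bigl(\sum_{l=1}^{n}E|\xi_l|^2\Bigr)^{\frac p2}, & 1\le p\le2.\end{cases} \tag{2.5}
\]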

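By (2.4), we know that

\[
n^{-p}\Bigl(\sum_{l=1}^{n}E|\xi_l|^2\Bigr)^{\frac p2}\lesssim n^{-p}\bigl(n2^{-j}\bigr)^{\frac p2}=2^{-\frac{pj}{2}}n^{-\frac p2}
\]

for \(1\le p<2\), and

\[
n^{-p}\Bigl[\sum_{l=1}^{n}E|\xi_l|^p+\Bigl(\sum_{l=1}^{n}E|\xi_l|^2\Bigr)^{\frac p2}\Bigr]\lesssim n^{-p}n2^{-j}+n^{-\frac p2}2^{-\frac{pj}{2}}\lesssim2^{-\frac{pj}{2}}n^{-\frac p2}
\]

for \(p\ge2\), thanks to the assumption \(2^j\le n\). Combining these with (2.5), we obtain the desired conclusion

\[
E|\hat\alpha_{j,k}-\alpha_{j,k}|^p\lesssim2^{-\frac{pj}{2}}n^{-\frac p2}.
\]

This completes the proof.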

To prove Lemma 2.3, we need the well-known Bernstein inequality.

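Bernstein's inequality ([8]) Let \(X_1,X_2,\ldots,X_n\) be i.i.d. random variables with \(E(X_i)=0\) and \(\|X_i\|_\infty\le M\). Then, for each \(\gamma>0\),

\[
P\Bigl(\Bigl|\frac1n\sum_{i=1}^{n}X_i\Bigr|>\gamma\Bigr)\le2\exp\Bigl(-\frac{n\gamma^2}{2[E(X_i^2)+\|X_i\|_\infty\gamma/3]}\Bigr).
\]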

The next lemma is an extension of Proposition 4.2 in [2].

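Lemma 2.3 Let \(2^j\le\frac{n}{\ln n}\), and let \(\hat\beta^i_{j,k}\) \((i=1,2,3)\) be defined in (1.2). If the density of \(X\) is bounded, then for each \(\varepsilon>0\) there exists \(T>0\) such that, for \(j\ge0\) and \(k\in\mathbb{Z}^2\),

\[
P\Bigl(\bigl|\hat\beta^i_{j,k}-\beta^i_{j,k}\bigr|>\frac T2\,2^{-\frac j2}\sqrt{\frac{\ln n}{n}}\Bigr)\lesssim2^{-\varepsilon j}. \tag{2.6}
\]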

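Proof We only show (2.6) for \(i=1\). By the definition of \(\hat\beta^1_{j,k}\), \(\hat\beta^1_{j,k}=\frac1n\sum_{l=1}^{n}\int_{\mathbb{R}}\psi^1_{j,k}(X_l,y)u(y,Y_l)\,dy\), and

\[
\hat\beta^1_{j,k}-\beta^1_{j,k}=\frac1n\sum_{l=1}^{n}\bigl(\varphi_{j,k_1}(X_l)d_{j,k_2}(Y_l)-\beta^1_{j,k}\bigr),
\]

where \(d_{j,k_2}(Y_l):=\int_{\mathbb{R}}\psi_{j,k_2}(y)u(y,Y_l)\,dy\) (\(\varphi\) and \(\psi\) stand for the one-dimensional Daubechies scaling function and wavelet function, respectively). Define \(\eta_l:=\varphi_{j,k_1}(X_l)d_{j,k_2}(Y_l)-\beta^1_{j,k}\). Then \(\hat\beta^1_{j,k}-\beta^1_{j,k}=\frac1n\sum_{l=1}^{n}\eta_l\), and \(E(\eta_l)=0\) because \(\beta^1_{j,k}=E(\hat\beta^1_{j,k})=E[\varphi_{j,k_1}(X_l)d_{j,k_2}(Y_l)]\).

Using (2.1) with \(\psi\) instead of \(\varphi\), we get \(|d_{j,k_2}(Y_l)|\lesssim2^{-\frac j2}\). Note that \(|\varphi_{j,k_1}(X_l)|=2^{\frac j2}|\varphi(2^jX_l-k_1)|\le2^{\frac j2}\|\varphi\|_\infty\). Then \(|\varphi_{j,k_1}(X_l)d_{j,k_2}(Y_l)|\lesssim1\) and \(|\beta^1_{j,k}|=|E[\varphi_{j,k_1}(X_l)d_{j,k_2}(Y_l)]|\lesssim1\). Hence there exists \(C_2>0\) such that

\[
|\eta_l|\le\bigl|\varphi_{j,k_1}(X_l)d_{j,k_2}(Y_l)\bigr|+|\beta^1_{j,k}|\le C_2. \tag{2.7}
\]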

By replacing \(c_{j,k_2}\) and \(\alpha_{j,k}\) with \(d_{j,k_2}\) and \(\beta^1_{j,k}\), respectively, arguments similar to (2.1)–(2.4) show that

\[
E(\eta_l^2)\le C_12^{-j}. \tag{2.8}
\]

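Because \(\eta_1,\eta_2,\ldots,\eta_n\) are i.i.d. and \(E(\eta_l)=0\) \((l=1,2,\ldots,n)\), Bernstein's inequality tells us that

\[
P\Bigl(\bigl|\hat\beta^1_{j,k}-\beta^1_{j,k}\bigr|=\Bigl|\frac1n\sum_{l=1}^{n}\eta_l\Bigr|>\frac T2\,2^{-\frac j2}\sqrt{\frac{\ln n}{n}}\Bigr)\le2\exp\Bigl(-\frac{n\gamma_j^2}{2[E(\eta_l^2)+\gamma_j\|\eta_l\|_\infty/3]}\Bigr) \tag{2.9}
\]

with \(\gamma_j:=\frac T2\,2^{-\frac j2}\sqrt{\frac{\ln n}{n}}\). This with (2.7)–(2.8) implies

\[
\frac{n\gamma_j^2}{2[E(\eta_l^2)+\gamma_j\|\eta_l\|_\infty/3]}\ge\frac{T^2\ln n}{8\bigl(C_1+\frac{C_2}{6}T2^{\frac j2}\sqrt{\frac{\ln n}{n}}\bigr)}\ge\frac{T^2\ln n}{8(C_1+\frac{C_2}{6}T)}
\]

because \(2^{\frac j2}\sqrt{\frac{\ln n}{n}}\le1\) by the assumption \(2^j\le\frac{n}{\ln n}\). Note that \(\ln n>j\ln2\) due to \(n\ge2^j\ln n>2^j\). Hence

\[
\frac{n\gamma_j^2}{2[E(\eta_l^2)+\gamma_j\|\eta_l\|_\infty/3]}\ge\frac{T^2\ln2}{8(C_1+\frac{C_2}{6}T)}\,j>\varepsilon j
\]

by choosing \(T>0\) such that \(\frac{T^2\ln2}{8(C_1+\frac{C_2}{6}T)}>\varepsilon\). Then (2.9) reduces to

\[
P\Bigl(\bigl|\hat\beta^1_{j,k}-\beta^1_{j,k}\bigr|>\frac T2\,2^{-\frac j2}\sqrt{\frac{\ln n}{n}}\Bigr)\le2e^{-\varepsilon j}\le2\cdot2^{-\varepsilon j},
\]

which concludes (2.6) with \(i=1\). Similarly, the conclusions with \(i=2,3\) hold. This completes the proof.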
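The choice of \(T\) at the end of the proof only requires \(\frac{T^2\ln2}{8(C_1+C_2T/6)}>\varepsilon\). Because the left-hand side increases in \(T\) without bound, such a \(T\) always exists; the sketch below finds the smallest one by doubling and bisection. The constants \(C_1,C_2\) are only shown to exist in the proof, so the values in the usage are placeholders.

```python
import math

def exponent_factor(T, C1, C2):
    """T^2 ln 2 / (8 (C1 + C2 T / 6)): the factor multiplying j in the proof of Lemma 2.3."""
    return T**2 * math.log(2) / (8.0 * (C1 + C2 * T / 6.0))

def choose_T(eps, C1, C2, tol=1e-10):
    """Smallest T (up to tol) with exponent_factor(T) > eps.

    exponent_factor is increasing in T and unbounded, so doubling finds an
    upper bracket and bisection then shrinks it.
    """
    hi = 1.0
    while exponent_factor(hi, C1, C2) <= eps:
        hi *= 2.0
    lo = 0.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if exponent_factor(mid, C1, C2) > eps:
            hi = mid
        else:
            lo = mid
    return hi

# C1, C2 are placeholder values, not constants computed in the paper.
T = choose_T(eps=3.0, C1=1.0, C2=1.0)
print(T, exponent_factor(T, 1.0, 1.0))
```

In the proof of Theorem 4.2, \(\varepsilon\) is taken larger than \(p\), so a larger \(p\) forces a larger threshold constant \(T\).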

We end this section with two classical lemmas, which are needed for the proof of the lower bound.

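Lemma 2.4 (Varshamov–Gilbert lemma, [11]) Let \(\Omega:=\{\varepsilon=(\varepsilon_1,\varepsilon_2,\ldots,\varepsilon_m):\ \varepsilon_i\in\{0,1\}\}\). Then there exists a subset \(\{\varepsilon^0,\varepsilon^1,\ldots,\varepsilon^T\}\) of \(\Omega\) with \(\varepsilon^0=(0,0,\ldots,0)\) such that \(T\ge2^{\frac m8}\) and

\[
\sum_{k=1}^{m}\bigl|\varepsilon^i_k-\varepsilon^j_k\bigr|\ge\frac m8\quad(0\le i\neq j\le T).
\]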
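Lemma 2.4 is an existence statement. As an illustration only (this is not the argument behind the lemma), a random greedy search typically finds such a family quickly for moderate \(m\): the sketch keeps \(\varepsilon^0=0\) and accepts random binary words whose Hamming distance to every accepted word is at least \(m/8\), stopping once \(T\ge2^{m/8}\).

```python
import math
import numpy as np

def vg_packing(m, seed=0, max_draws=100_000):
    """Random greedy family {eps^0 = 0, eps^1, ..., eps^T} in {0,1}^m with T >= 2^{m/8}.

    A random word is accepted only if its Hamming distance to every accepted word
    is at least m/8. This is a constructive illustration for moderate m, not a proof.
    """
    rng = np.random.default_rng(seed)
    needed = math.ceil(2 ** (m / 8)) + 1  # words eps^0, ..., eps^T
    words = [np.zeros(m, dtype=int)]
    for _ in range(max_draws):
        if len(words) >= needed:
            break
        cand = rng.integers(0, 2, size=m)
        if all(np.count_nonzero(cand != w) >= m / 8 for w in words):
            words.append(cand)
    return np.array(words)

W = vg_packing(32)
print(len(W) - 1)  # T
```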

To state Fano's lemma, we introduce a concept: when \(P\) is absolutely continuous with respect to \(Q\) (denoted by \(P\ll Q\)), the Kullback divergence between the two measures \(P\) and \(Q\) is defined by

\[
K(P,Q):=\int p(x)\ln\frac{p(x)}{q(x)}\,dx,
\]

where \(p(x)\) and \(q(x)\) are the density functions of \(P\) and \(Q\), respectively.
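The inequality \(K(P,Q)\le\int\frac{(p-q)^2}{q}\), which follows from \(\ln u\le u-1\) and is the key step in (3.3) below, can be checked on discrete distributions. The sketch below uses sums in place of integrals, a discrete stand-in for the densities in the text.

```python
import numpy as np

def kl_divergence(p, q):
    """K(P, Q) = sum_x p(x) ln(p(x)/q(x)) for discrete P << Q; terms with p(x) = 0 vanish."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def chi_square(p, q):
    """sum_x (p - q)^2 / q, which equals sum_x p (p/q - 1) when both sum to one."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum((p - q) ** 2 / q))

rng = np.random.default_rng(0)
p, q = rng.dirichlet(np.ones(10)), rng.dirichlet(np.ones(10))
print(kl_divergence(p, q), chi_square(p, q))  # the first never exceeds the second
```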

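Lemma 2.5 (Fano's lemma, [6]) Let \((\Omega,\mathcal F,P_k)\) be probability spaces, and let \(A_k\in\mathcal F\), \(k=0,1,\ldots,m\), be such that \(A_k\cap A_{k'}=\emptyset\) for \(k\neq k'\). Then, with

\[
K_m:=\inf_{0\le k'\le m}\frac1m\sum_{k\neq k'}K(P_k,P_{k'}),
\]

we have

\[
\sup_{0\le k\le m}P_k\bigl(A_k^c\bigr)\ge\min\Bigl\{\frac12,\ \sqrt m\exp\bigl(-3e^{-1}-K_m\bigr)\Bigr\},
\]

where \(K(P_k,P_{k'})\) is the Kullback distance of \(P_k\) and \(P_{k'}\) \((k,k'=0,1,\ldots,m)\).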

3 Proofs of lower bounds

We restate Theorem 1.1 as follows before giving its proof.

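Theorem 3.1 Let \(\hat f_n\) be an estimator of \(f^*\in B^s_{r,q}(H)\) with \(s>\frac2r\) and \(1\le r,q\le\infty\). Then, for \(1\le p<\infty\),

\[
\sup_{f^*\in B^s_{r,q}(H)}E\|\hat f_n-f^*\|_p^p\gtrsim\max\Bigl\{n^{-\frac{sp}{2s+1}},\ \Bigl(\frac{\ln n}{n}\Bigr)^{\frac{(s-\frac2r+\frac2p)p}{2(s-\frac2r)+1}}\Bigr\}.
\]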

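Proof As in Sect. 1, we take the two-dimensional tensor product wavelet

\[
\psi^1(x,y):=D_{2N}(x)\psi_{2N}(y),
\]

where \(D_{2N}(\cdot)\) and \(\psi_{2N}(\cdot)\) are the one-dimensional Daubechies scaling function and wavelet function, respectively. Then \(\psi^1\) is \(m\)-regular \((m>s)\) for large \(N\), and

\[
\operatorname{supp}\psi^1\subseteq[0,2N-1]\times[-N+1,N]
\]

due to \(\operatorname{supp}D_{2N}\subseteq[0,2N-1]\) and \(\operatorname{supp}\psi_{2N}\subseteq[-N+1,N]\). Then there exists a compactly supported density function \(g_0\) such that \(\int_{\mathbb{R}^2}g_0(x,y)\,dx\,dy=1\), \(g_0|_{[0,2N-1]\times[-N+1,N]}=c_0\), and \(g_0\in B^s_{r,q}(H)\). Define \(\Lambda_j:=\Lambda^1_j\times\Lambda^2_j\) with

\[
\Lambda^1_j:=\bigl\{0,2N,4N,\ldots,2(2^j-1)N\bigr\},\qquad \Lambda^2_j:=\bigl\{0,\pm2N,\pm4N,\ldots,\pm2(2^{j-1}-1)N\bigr\}.
\]

Then \(|\Lambda_j|=2^j(2^j-1)\sim2^{2j}\) (\(|\Lambda_j|\) denotes the cardinality of \(\Lambda_j\)). Denote \(a_j:=2^{-(2s+1)j}\) and

\[
\mathcal G:=\Bigl\{g_\varepsilon(x,y)=g_0(x,y)+a_j\sum_{k\in\Lambda_j}\varepsilon_k\psi^1_{j,k}(x,y),\ \varepsilon_k\in\{0,1\}\Bigr\}.
\]

Obviously, the supports of \(\psi^1_{j,k}\) and \(\psi^1_{j,k'}\) are disjoint for \(k\neq k'\in\Lambda_j\), and \(\operatorname{supp}\psi^1_{j,k}\subseteq\operatorname{supp}g_0\). When \((x,y)\in[0,2N-1]\times[-N+1,N]\),

\[
g_\varepsilon(x,y)\ge c_0-a_j\|\psi^1_{j,k}\|_\infty\ge c_0-2^{-2sj}\|\psi^1\|_\infty>0
\]

for large \(j\). On the other hand,

\[
\int_{\mathbb{R}^2}g_\varepsilon(x,y)\,dx\,dy=\int_{\mathbb{R}^2}g_0(x,y)\,dx\,dy=1.
\]

Moreover, \(g_\varepsilon\in B^s_{r,q}(H)\). In fact, for \(\varepsilon_k\in\{0,1\}\), \(\sum_{k\in\Lambda_j}|\varepsilon_k|^r\lesssim2^{2j}\) and

\[
2^{j(s+1-\frac2r)}a_j\Bigl(\sum_{k\in\Lambda_j}|\varepsilon_k|^r\Bigr)^{\frac1r}\lesssim1.
\]

By Lemma 1.1, \(\|a_j\sum_{k\in\Lambda_j}\varepsilon_k\psi^1_{j,k}\|_{B^s_{r,q}}\lesssim H\). This with \(g_0\in B^s_{r,q}(H)\) implies \(g_\varepsilon\in B^s_{r,q}(H)\). According to Lemma 2.4 (the Varshamov–Gilbert lemma), for \(\Omega=\{\varepsilon=(\varepsilon_k)_{k\in\Lambda_j}:\ \varepsilon_k\in\{0,1\}\}\), there exists a subset \(\{\varepsilon^{(0)},\varepsilon^{(1)},\ldots,\varepsilon^{(M)}\}\) of \(\Omega\) such that \(M\ge2^{\frac{2^{2j}}{8}}\), \(\varepsilon^{(0)}=(0,0,\ldots,0)\), and, for \(l,l'=0,1,\ldots,M\) with \(l\neq l'\),

\[
\sum_{k\in\Lambda_j}\bigl|\varepsilon^{(l)}_k-\varepsilon^{(l')}_k\bigr|\ge\frac{2^{2j}}{8}. \tag{3.1}
\]

Denote \(\mathcal G_M:=\{g_{\varepsilon^{(0)}},g_{\varepsilon^{(1)}},\ldots,g_{\varepsilon^{(M)}}\}\). Then \(\mathcal G_M\subseteq\mathcal G\), and for \(g_{\varepsilon^{(l)}},g_{\varepsilon^{(l')}}\in\mathcal G_M\),

\[
\bigl\|g_{\varepsilon^{(l)}}-g_{\varepsilon^{(l')}}\bigr\|_p^p=a_j^p\sum_{k\in\Lambda_j}\bigl|\varepsilon^{(l)}_k-\varepsilon^{(l')}_k\bigr|^p\|\psi^1_{j,k}\|_p^p=2^{-2(sp+1)j}\|\psi^1\|_p^p\sum_{k\in\Lambda_j}\bigl|\varepsilon^{(l)}_k-\varepsilon^{(l')}_k\bigr|^p,
\]

since the supports of \(\psi^1_{j,k}\) \((k\in\Lambda_j)\) are mutually disjoint. This with (3.1) leads to

\[
\bigl\|g_{\varepsilon^{(l)}}-g_{\varepsilon^{(l')}}\bigr\|_p^p\ge C_12^{-2psj}:=\delta_j^p.
\]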

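Define

\[
A_{\varepsilon^{(i)}}:=\Bigl\{\|\hat f_n-g_{\varepsilon^{(i)}}\|_p<\frac{\delta_j}{2}\Bigr\},\quad i=0,1,2,\ldots,M.
\]

Then \(A_{\varepsilon^{(l)}}\cap A_{\varepsilon^{(l')}}=\emptyset\) for \(l\neq l'\). Denote by \(P^n_f\) the probability measure with the density \(f^n(x,y):=\prod_{i=1}^{n}f(x_i,y_i)\). By the construction of \(g_{\varepsilon^{(i)}}\), \(P^n_{g_{\varepsilon^{(i)}}}\ll P^n_{g_0}\). Then it follows from Lemma 2.5 (Fano's lemma) that

\[
\sup_{0\le i\le M}P^n_{g_{\varepsilon^{(i)}}}\Bigl(\|\hat f_n-g_{\varepsilon^{(i)}}\|_p\ge\frac{\delta_j}{2}\Bigr)\ge\sup_{0\le i\le M}P^n_{g_{\varepsilon^{(i)}}}\bigl(A^c_{\varepsilon^{(i)}}\bigr)\ge\min\Bigl\{\frac12,\ \sqrt Me^{-3e^{-1}}e^{-K_M}\Bigr\}.
\]

Furthermore,

\[
E\|\hat f_n-g_{\varepsilon^{(i)}}\|_p^p\ge\frac{\delta_j^p}{2^p}P^n_{g_{\varepsilon^{(i)}}}\Bigl(\|\hat f_n-g_{\varepsilon^{(i)}}\|_p\ge\frac{\delta_j}{2}\Bigr)\gtrsim2^{-2psj}P^n_{g_{\varepsilon^{(i)}}}\bigl(A^c_{\varepsilon^{(i)}}\bigr).
\]

Taking \(2^j\sim n^{\frac{1}{2(2s+1)}}\), we obtain that

\[
\sup_{0\le i\le M}E\|\hat f_n-g_{\varepsilon^{(i)}}\|_p^p\gtrsim2^{-2psj}\sup_{0\le i\le M}P^n_{g_{\varepsilon^{(i)}}}\bigl(A^c_{\varepsilon^{(i)}}\bigr)\gtrsim n^{-\frac{ps}{2s+1}}\min\Bigl\{\frac12,\ \sqrt Me^{-3e^{-1}}e^{-K_M}\Bigr\} \tag{3.2}
\]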

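with \(K_M:=\inf_{0\le v\le M}\frac1M\sum_{i\neq v}K(P^n_{g_{\varepsilon^{(i)}}},P^n_{g_{\varepsilon^{(v)}}})\). By the definition of the Kullback divergence,

\[
\begin{aligned}
K\bigl(P^n_{g_{\varepsilon^{(i)}}},P^n_{g_0}\bigr)&=\int_{\mathbb{R}^{2n}}\ln\frac{\prod_{l=1}^{n}g_{\varepsilon^{(i)}}(x_l,y_l)}{\prod_{l=1}^{n}g_0(x_l,y_l)}\prod_{l=1}^{n}g_{\varepsilon^{(i)}}(x_l,y_l)\,dx_1dy_1\,dx_2dy_2\cdots dx_ndy_n\\
&=n\int_{\mathbb{R}^2}g_{\varepsilon^{(i)}}(x_1,y_1)\ln\frac{g_{\varepsilon^{(i)}}(x_1,y_1)}{g_0(x_1,y_1)}\,dx_1dy_1\\
&\le n\int_{\mathbb{R}^2}g_{\varepsilon^{(i)}}(x_1,y_1)\Bigl(\frac{g_{\varepsilon^{(i)}}(x_1,y_1)}{g_0(x_1,y_1)}-1\Bigr)\,dx_1dy_1,
\end{aligned}
\tag{3.3}
\]

where we applied the inequality \(\ln u\le u-1\) for \(u>0\) in the last step. Note that, since \(\int g_{\varepsilon^{(i)}}=\int g_0=1\),

\[
\int_{\mathbb{R}^2}g_{\varepsilon^{(i)}}(x_1,y_1)\Bigl(\frac{g_{\varepsilon^{(i)}}(x_1,y_1)}{g_0(x_1,y_1)}-1\Bigr)\,dx_1dy_1=\int_{\mathbb{R}^2}g_0^{-1}(x_1,y_1)\bigl(g_{\varepsilon^{(i)}}(x_1,y_1)-g_0(x_1,y_1)\bigr)^2\,dx_1dy_1
\]

and \(g_0(x_1,y_1)=c_0\) for \((x_1,y_1)\in[0,2N-1]\times[-N+1,N]\). Combining this with the Parseval identity, we reduce (3.3) to

\[
K\bigl(P^n_{g_{\varepsilon^{(i)}}},P^n_{g_0}\bigr)\le nc_0^{-1}a_j^2\Bigl\|\sum_{k\in\Lambda_j}\varepsilon^{(i)}_k\psi^1_{j,k}\Bigr\|_2^2=nc_0^{-1}a_j^2\sum_{k\in\Lambda_j}\bigl|\varepsilon^{(i)}_k\bigr|^2\le nc_0^{-1}a_j^22^{2j}. \tag{3.4}
\]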

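Hence

\[
K_M\le\frac1M\sum_{i=1}^{M}K\bigl(P^n_{g_{\varepsilon^{(i)}}},P^n_{g_0}\bigr)\le c_0^{-1}na_j^22^{2j}.
\]

On the other hand, \(2^j\sim n^{\frac{1}{2(2s+1)}}\) implies \(na_j^2\le C\). Then it follows from \(M\ge2^{\frac{2^{2j}}{8}}=e^{\frac{2^{2j}\ln2}{8}}\) that

\[
\sqrt Me^{-K_M}\ge e^{\frac{2^{2j}\ln2}{16}-c_0^{-1}C2^{2j}}\ge1
\]

by choosing \(C>0\) such that \(C<\frac{\ln2}{16}c_0\). This with (3.2) leads to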

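\[
\sup_{0\le i\le M}E\|\hat f_n-g_{\varepsilon^{(i)}}\|_p^p\gtrsim n^{-\frac{ps}{2s+1}}\min\Bigl\{\frac12,\ \sqrt Me^{-3e^{-1}}e^{-K_M}\Bigr\}\gtrsim n^{-\frac{ps}{2s+1}}. \tag{3.5}
\]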
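For the reader's convenience, the exponent bookkeeping behind (3.5) can be written out. This is only a sketch that treats the constants hidden in \(\sim\) as fixed:

\[
\begin{aligned}
2^{j}\sim n^{\frac{1}{2(2s+1)}},\ a_j=2^{-(2s+1)j}\ &\Longrightarrow\ na_j^2=n\,2^{-2(2s+1)j}\sim1,\\
\delta_j^p=C_12^{-2psj}&\sim n^{-\frac{2ps}{2(2s+1)}}=n^{-\frac{ps}{2s+1}},\\
\sqrt M\,e^{-K_M}\ge e^{\frac{2^{2j}\ln2}{16}}e^{-c_0^{-1}C2^{2j}}&=e^{2^{2j}(\frac{\ln2}{16}-c_0^{-1}C)}\ge1\quad\text{if }C\le\frac{c_0\ln2}{16}.
\end{aligned}
\]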

Now, it remains to show that

\[
\sup_{f^*\in B^s_{r,q}(H)}E\|\hat f_n-f^*\|_p^p\ge C\Bigl(\frac{\ln n}{n}\Bigr)^{\frac{(s-\frac2r+\frac2p)p}{2(s-\frac2r)+1}}. \tag{3.6}
\]

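Similarly to the proof of (3.5), we construct the family of density functions \(\{g_k,\ k\in\Lambda_j\}\) as follows:

\[
g_k(x,y):=g_0(x,y)+a_j\psi^1_{j,k}(x,y),\quad k\in\Lambda_j,
\]

where \(a_j:=2^{-j(s+1-\frac2r)}\). Obviously, \(\int_{\mathbb{R}^2}g_k(x,y)\,dx\,dy=\int_{\mathbb{R}^2}g_0(x,y)\,dx\,dy=1\), and

\[
g_k(x,y)\big|_{[0,2N-1]\times[-N+1,N]}\ge c_0-2^{-j(s-\frac2r)}\|\psi^1\|_\infty>0
\]

for large \(j\), since \(s>\frac2r\). Then \(g_k\) is a bivariate density function for each fixed \(k\in\Lambda_j\). From the proof of (3.5), we know that \(g_0\in B^s_{r,q}(H)\). This with

\[
\|a_j\psi^1_{j,k}\|_{B^s_{r,q}}\lesssim a_j2^{j(s+1-\frac2r)}\lesssim1
\]

implies \(g_k\in B^s_{r,q}(H)\) for \(k\in\Lambda_j\). To prove (3.6), we need to show that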

\[
\sup_{k\in\Lambda_j}E\|\hat f_n-g_k\|_p^p\ge C\Bigl(\frac{\ln n}{n}\Bigr)^{\frac{(s-\frac2r+\frac2p)p}{2(s-\frac2r)+1}}. \tag{3.7}
\]

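When \(k\neq k'\in\Lambda_j\), \(\operatorname{supp}\psi^1_{j,k}\cap\operatorname{supp}\psi^1_{j,k'}=\emptyset\) and

\[
\|g_k-g_{k'}\|_p^p=a_j^p\|\psi^1_{j,k}-\psi^1_{j,k'}\|_p^p=2a_j^p2^{j(p-2)}\|\psi^1\|_p^p=2\cdot2^{-j(s-\frac2r+\frac2p)p}\|\psi^1\|_p^p.
\]

Moreover,

\[
\|g_k-g_{k'}\|_p=2^{\frac1p}\|\psi^1\|_p2^{-j(s-\frac2r+\frac2p)}:=\delta_j.
\]

Define \(B_k:=\{\|\hat f_n-g_k\|_p<\frac{\delta_j}{2}\}\). Then \(B_k\cap B_{k'}=\emptyset\) \((k\neq k')\). According to Lemma 2.5 (Fano's lemma), we find that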

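\[
\sup_{k\in\Lambda_j}P^n_{g_k}\Bigl(\|\hat f_n-g_k\|_p\ge\frac{\delta_j}{2}\Bigr)\ge\min\Bigl\{\frac12,\ \sqrt Me^{-3e^{-1}}e^{-K_M}\Bigr\}, \tag{3.8}
\]

where \(M=|\Lambda_j|\) and \(K_M:=\inf_{0\le v\le M}\frac1M\sum_{k\neq v}K(P^n_{g_k},P^n_{g_v})\le\frac1M\sum_{k}K(P^n_{g_k},P^n_{g_0})\). Similarly to (3.3)–(3.4), we conclude that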

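\[
K\bigl(P^n_{g_k},P^n_{g_0}\bigr)\le n\int_{\mathbb{R}^2}g_0^{-1}(x,y)\bigl(g_k(x,y)-g_0(x,y)\bigr)^2\,dx\,dy\le c_0^{-1}C_1na_j^2.
\]

Hence \(K_M\le c_0^{-1}C_1na_j^2\). By taking \(2^j\sim(\frac{n}{\ln n})^{\frac{1}{2(s-\frac2r)+1}}\), we obtain that \(\ln2^j\ge C'\ln n\) and \(e^{-K_M}\ge e^{-c_0^{-1}C_1na_j^2}\ge e^{-c_0^{-1}C\ln n}\), thanks to \(na_j^2\le C_2\ln n\) \((C=C_1C_2)\). Moreover, choosing \(C_1\) and \(C'\) such that \(C'>c_0^{-1}C\), we have

\[
\sqrt Me^{-3e^{-1}}e^{-K_M}\gtrsim e^{\ln2^j}e^{-3e^{-1}}e^{-K_M}\ge e^{C'\ln n-c_0^{-1}C\ln n-3e^{-1}}\gtrsim1
\]

due to \(M\sim2^{2j}\). This with (3.8) implies \(\sup_{k\in\Lambda_j}P^n_{g_k}(\|\hat f_n-g_k\|_p\ge\frac{\delta_j}{2})\gtrsim1\). Furthermore,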

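\[
\sup_{k\in\Lambda_j}E\|\hat f_n-g_k\|_p^p\ge\frac{\delta_j^p}{2^p}\sup_{k\in\Lambda_j}P^n_{g_k}\Bigl(\|\hat f_n-g_k\|_p\ge\frac{\delta_j}{2}\Bigr)\gtrsim\delta_j^p.
\]

Then the desired conclusion (3.7) follows from \(\delta_j=2^{\frac1p}\|\psi^1\|_p2^{-j(s-\frac2r+\frac2p)}\) and the choice of \(2^j\sim(\frac{n}{\ln n})^{\frac{1}{2(s-\frac2r)+1}}\). This completes the proof.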
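Similarly, with \(t:=s-\frac2r\), the choice \(2^j\sim(\frac{n}{\ln n})^{\frac{1}{2t+1}}\) gives the two facts used in this part of the proof; again this is only bookkeeping with the constants suppressed:

\[
\begin{aligned}
na_j^2&=n\,2^{-2j(t+1)}=n\,2^{-j(2t+1)}\,2^{-j}\sim\ln n\cdot2^{-j}\lesssim\ln n,\\
\delta_j^p&\sim2^{-jp(t+\frac2p)}\sim\Bigl(\frac{\ln n}{n}\Bigr)^{\frac{(s-\frac2r+\frac2p)p}{2(s-\frac2r)+1}}.
\end{aligned}
\]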

4 Proofs of upper bounds

In this section, we prove the upper bounds of the wavelet estimators. The result for the linear estimator is derived first. We restate Theorem 1.2 as Theorem 4.1 and prove it.

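Theorem 4.1 Let \(\hat f^{\mathrm{lin}}_n\) be the linear estimator of \(f^*\in B^s_{r,q}(H,Q)\) defined in (1.3) with \(1\le r,q<\infty\) and \(s>0\). If the density of \(X\) is bounded, then for \(\{r\ge p\ge1\}\) or \(\{r\le p<\infty\ \text{and}\ s>\frac2r\}\),

\[
\sup_{f^*\in B^s_{r,q}(H,Q)}E\|\hat f^{\mathrm{lin}}_n-f^*\|_p^p\lesssim n^{-\frac{ps'}{2s'+1}}
\]

with \(s'=s-(\frac2r-\frac2p)_+\) and \(x_+:=\max\{x,0\}\).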

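Proof When \(r\le p\), \(s':=s-(\frac2r-\frac2p)_+=s-\frac2r+\frac2p\) and \(B^s_{r,q}(\mathbb{R}^2)\hookrightarrow B^{s'}_{p,q}(\mathbb{R}^2)\) thanks to Remark 1.2. Then

\[
\sup_{f^*\in B^s_{r,q}(H,Q)}E\|\hat f^{\mathrm{lin}}_n-f^*\|_p^p\le\sup_{f^*\in B^{s'}_{p,q}(CH,Q)}E\|\hat f^{\mathrm{lin}}_n-f^*\|_p^p.
\]

When \(r>p\) and \(f^*\) has a compact support, so does \(\hat f^{\mathrm{lin}}_n\), since \(\phi\) has the same property. By the Hölder inequality,

\[
\sup_{f^*\in B^s_{r,q}(H,Q)}E\|\hat f^{\mathrm{lin}}_n-f^*\|_p^p\lesssim\sup_{f^*\in B^s_{r,q}(H,Q)}\bigl(E\|\hat f^{\mathrm{lin}}_n-f^*\|_r^r\bigr)^{\frac pr}.
\]

Because \(s'=s\) in that case, it is sufficient to prove that (applied with \(p\) replaced by \(r\) when \(r>p\))

\[
\sup_{f^*\in B^{s'}_{p,q}(H,Q)}E\|\hat f^{\mathrm{lin}}_n-f^*\|_p^p\lesssim n^{-\frac{ps'}{2s'+1}} \tag{4.1}
\]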

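for the conclusion of Theorem 4.1. Recall that \(\hat f^{\mathrm{lin}}_n:=\sum_{k\in\Lambda_{j_0}}\hat\alpha_{j_0,k}\phi_{j_0,k}\). Then by Lemma 2.1 we conclude that

\[
E\bigl\|\hat f^{\mathrm{lin}}_n-E\hat f^{\mathrm{lin}}_n\bigr\|_p^p=E\Bigl\|\sum_{k\in\Lambda_{j_0}}(\hat\alpha_{j_0,k}-\alpha_{j_0,k})\phi_{j_0,k}\Bigr\|_p^p\lesssim2^{j_0(p-2)}\sum_{k\in\Lambda_{j_0}}E|\hat\alpha_{j_0,k}-\alpha_{j_0,k}|^p
\]

due to Lemma 1.2. It follows from Lemma 2.2 and \(|\Lambda_{j_0}|\lesssim2^{2j_0}\) that

\[
E\bigl\|\hat f^{\mathrm{lin}}_n-E\hat f^{\mathrm{lin}}_n\bigr\|_p^p\lesssim2^{j_0(p-2)}2^{2j_0}2^{-\frac{pj_0}{2}}n^{-\frac p2}=2^{\frac{pj_0}{2}}n^{-\frac p2}\lesssim n^{-\frac{ps'}{2s'+1}} \tag{4.2}
\]

thanks to the choice of \(2^{j_0}\sim n^{\frac{1}{2s'+1}}\).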

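On the other hand, by Lemma 2.1, \(E(\hat f^{\mathrm{lin}}_n)=\sum_{k\in\Lambda_{j_0}}\alpha_{j_0,k}\phi_{j_0,k}=P_{j_0}f^*\). Combining this with \(f^*\in B^{s'}_{p,q}(\mathbb{R}^2)\) and Remark 1.1, we get that

\[
\bigl\|E\hat f^{\mathrm{lin}}_n-f^*\bigr\|_p^p=\|P_{j_0}f^*-f^*\|_p^p\lesssim2^{-j_0ps'}.
\]

Taking \(2^{j_0}\sim n^{\frac{1}{2s'+1}}\), it is easy to show that

\[
\bigl\|E\hat f^{\mathrm{lin}}_n-f^*\bigr\|_p^p\lesssim n^{-\frac{ps'}{2s'+1}}. \tag{4.3}
\]

Hence, by (4.2)–(4.3),

\[
\sup_{f^*\in B^{s'}_{p,q}(H,Q)}E\|\hat f^{\mathrm{lin}}_n-f^*\|_p^p\lesssim\sup_{f^*\in B^{s'}_{p,q}(H,Q)}E\bigl\|\hat f^{\mathrm{lin}}_n-E\hat f^{\mathrm{lin}}_n\bigr\|_p^p+\sup_{f^*\in B^{s'}_{p,q}(H,Q)}\bigl\|E\hat f^{\mathrm{lin}}_n-f^*\bigr\|_p^p\lesssim n^{-\frac{ps'}{2s'+1}},
\]

which means that (4.1) holds. The proof is done.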
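The balance behind (4.2)–(4.3) can be checked directly. With all constants set to one (an assumption of the sketch), the variance bound \(2^{pj_0/2}n^{-p/2}\) and the bias bound \(2^{-j_0ps'}\) coincide at \(2^{j_0}=n^{1/(2s'+1)}\), and both equal \(n^{-ps'/(2s'+1)}\).

```python
def linear_bounds(n, s_prime, p):
    """Variance and bias bounds of (4.2)-(4.3) at 2^{j0} = n^{1/(2s'+1)}.

    All hidden constants are set to 1 (an assumption of this sketch).
    """
    two_j0 = n ** (1.0 / (2.0 * s_prime + 1.0))
    variance = (two_j0 / n) ** (p / 2.0)   # 2^{p j0 / 2} n^{-p/2}
    bias = two_j0 ** (-p * s_prime)        # 2^{-j0 p s'}
    return variance, bias

n, sp, p = 1e6, 1.5, 3.0
var, bias = linear_bounds(n, sp, p)
target = n ** (-p * sp / (2 * sp + 1))
print(var, bias, target)  # all three agree up to rounding
```

A smaller \(j_0\) inflates the bias term and a larger one inflates the variance term, which is why the choice \(2^{j_0}\sim n^{1/(2s'+1)}\) depends on \(s'\) and makes the linear estimator non-adaptive.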

Next, we prove the upper bound for the nonlinear estimator.

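Theorem 4.2 Let \(\hat f^{\mathrm{non}}_n\) be the nonlinear estimator of \(f^*\in B^s_{r,q}(H,Q)\) defined in (1.4) with \(1\le r,q<\infty\) and \(s>0\). If the density of \(X\) is bounded, then for \(\{r\ge p\ge1\}\) or \(\{r\le p<\infty\ \text{and}\ s>\frac2r\}\),

\[
\sup_{f^*\in B^s_{r,q}(H,Q)}E\|\hat f^{\mathrm{non}}_n-f^*\|_p^p\lesssim(\ln n)^p\Bigl(\frac{\ln n}{n}\Bigr)^{\alpha p}
\]

with \(\alpha:=\min\bigl\{\frac{s}{2s+1},\ \frac{s-\frac2r+\frac2p}{2(s-\frac2r)+1}\bigr\}\).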

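Proof We only need to prove the case \(r\le p\). In fact, when \(r>p\), \(\hat f^{\mathrm{non}}_n\) has a compact support because \(\phi\), \(\psi^i\), and \(f^*\) have the same property. By the Hölder inequality,

\[
\sup_{f^*\in B^s_{r,q}(H,Q)}E\|\hat f^{\mathrm{non}}_n-f^*\|_p^p\lesssim\sup_{f^*\in B^s_{r,q}(H,Q)}\bigl(E\|\hat f^{\mathrm{non}}_n-f^*\|_r^r\bigr)^{\frac pr}.
\]

Using the case \(r=p\) of Theorem 4.2 (proved below, with \(p\) replaced by \(r\)), we find that \(\sup_{f^*\in B^s_{r,q}(H,Q)}E\|\hat f^{\mathrm{non}}_n-f^*\|_r^r\lesssim(\ln n)^r(\frac{\ln n}{n})^{\alpha r}\), and therefore

\[
\sup_{f^*\in B^s_{r,q}(H,Q)}E\|\hat f^{\mathrm{non}}_n-f^*\|_p^p\lesssim(\ln n)^p\Bigl(\frac{\ln n}{n}\Bigr)^{\alpha p}.
\]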

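It remains to estimate the case \(r\le p\). Recall that

\[
\hat f^{\mathrm{non}}_n-f^*=\bigl(\hat f^{\mathrm{lin}}_n-P_{j_0}f^*\bigr)+\bigl(P_{j_1+1}f^*-f^*\bigr)+\sum_{j=j_0}^{j_1}\sum_{i=1}^{3}\sum_{k\in\Lambda_j}\bigl(\hat\beta^i_{j,k}1_{\{|\hat\beta^i_{j,k}|>\lambda_j\}}-\beta^i_{j,k}\bigr)\psi^i_{j,k}
\]

with \(\lambda_j=T2^{-\frac j2}\sqrt{\frac{\ln n}{n}}\). Denote \(f_{j_0,j_1}:=\sum_{j=j_0}^{j_1}\sum_{i=1}^{3}\sum_{k\in\Lambda_j}(\hat\beta^i_{j,k}1_{\{|\hat\beta^i_{j,k}|>\lambda_j\}}-\beta^i_{j,k})\psi^i_{j,k}\). Then

\[
E\|\hat f^{\mathrm{non}}_n-f^*\|_p^p\lesssim E\bigl\|\hat f^{\mathrm{lin}}_n-P_{j_0}f^*\bigr\|_p^p+\|P_{j_1+1}f^*-f^*\|_p^p+E\|f_{j_0,j_1}\|_p^p. \tag{4.4}
\]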

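From the proof of Theorem 4.1, we obtain that

\[
E\bigl\|\hat f^{\mathrm{lin}}_n-P_{j_0}f^*\bigr\|_p^p\lesssim2^{\frac{j_0p}{2}}n^{-\frac p2}\lesssim\Bigl(\frac{\ln n}{n}\Bigr)^{\alpha p}\quad\text{and}\quad\|P_{j_1+1}f^*-f^*\|_p^p\lesssim2^{-j_1ps'}\lesssim\Bigl(\frac{\ln n}{n}\Bigr)^{\alpha p} \tag{4.5}
\]

due to \(2^{j_0}\sim n^{\frac{1}{2m+1}}\), \(2^{j_1}\sim\frac{n}{\ln n}\), and \(\alpha=\min\{\frac{s}{2s+1},\frac{s-\frac2r+\frac2p}{2(s-\frac2r)+1}\}\).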

By the definition of \(f_{j_0,j_1}\) and Lemma 1.2,

\[
E\|f_{j_0,j_1}\|_p^p\lesssim(j_1-j_0+1)^{p-1}\sum_{j=j_0}^{j_1}\sum_{i=1}^{3}2^{j(p-2)}\sum_{k\in\Lambda_j}E\bigl|\hat\beta^i_{j,k}1_{\{|\hat\beta^i_{j,k}|>\lambda_j\}}-\beta^i_{j,k}\bigr|^p.
\]

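On the other hand, it is easy to see that

\[
\hat\beta^i_{j,k}1_{\{|\hat\beta^i_{j,k}|>\lambda_j\}}-\beta^i_{j,k}=\bigl(\hat\beta^i_{j,k}-\beta^i_{j,k}\bigr)\bigl(1_{\{|\hat\beta^i_{j,k}|\ge\lambda_j,|\beta^i_{j,k}|<\lambda_j/2\}}+1_{\{|\hat\beta^i_{j,k}|\ge\lambda_j,|\beta^i_{j,k}|\ge\lambda_j/2\}}\bigr)-\beta^i_{j,k}\bigl(1_{\{|\hat\beta^i_{j,k}|<\lambda_j,|\beta^i_{j,k}|>2\lambda_j\}}+1_{\{|\hat\beta^i_{j,k}|<\lambda_j,|\beta^i_{j,k}|\le2\lambda_j\}}\bigr)
\]

and \(1_{\{|\hat\beta^i_{j,k}|\ge\lambda_j,|\beta^i_{j,k}|<\lambda_j/2\}}\le1_{\{|\hat\beta^i_{j,k}-\beta^i_{j,k}|>\lambda_j/2\}}\). Then, since \(j_1-j_0+1\lesssim\ln n\),

\[
E\|f_{j_0,j_1}\|_p^p\lesssim T_1+T_2+T_3+T_4 \tag{4.6}
\]

with

\[
\begin{aligned}
T_1&:=(\ln n)^{p-1}\sum_{j=j_0}^{j_1}\sum_{i=1}^{3}2^{j(p-2)}\sum_{k\in\Lambda_j}E\bigl[|\hat\beta^i_{j,k}-\beta^i_{j,k}|^p1_{\{|\hat\beta^i_{j,k}-\beta^i_{j,k}|>\lambda_j/2\}}\bigr],\\
T_2&:=(\ln n)^{p-1}\sum_{j=j_0}^{j_1}\sum_{i=1}^{3}2^{j(p-2)}\sum_{k\in\Lambda_j}E\bigl[|\hat\beta^i_{j,k}-\beta^i_{j,k}|^p1_{\{|\hat\beta^i_{j,k}|\ge\lambda_j,|\beta^i_{j,k}|\ge\lambda_j/2\}}\bigr],\\
T_3&:=(\ln n)^{p-1}\sum_{j=j_0}^{j_1}\sum_{i=1}^{3}2^{j(p-2)}\sum_{k\in\Lambda_j}E\bigl[|\beta^i_{j,k}|^p1_{\{|\hat\beta^i_{j,k}|<\lambda_j,|\beta^i_{j,k}|\le2\lambda_j\}}\bigr],\\
T_4&:=(\ln n)^{p-1}\sum_{j=j_0}^{j_1}\sum_{i=1}^{3}2^{j(p-2)}\sum_{k\in\Lambda_j}E\bigl[|\beta^i_{j,k}|^p1_{\{|\hat\beta^i_{j,k}|<\lambda_j,|\beta^i_{j,k}|>2\lambda_j\}}\bigr].
\end{aligned}
\]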

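When \(|\hat\beta^i_{j,k}|<\lambda_j\) and \(|\beta^i_{j,k}|>2\lambda_j\), we have \(|\hat\beta^i_{j,k}-\beta^i_{j,k}|\ge|\beta^i_{j,k}|-|\hat\beta^i_{j,k}|>|\beta^i_{j,k}|/2>\lambda_j\). Hence

\[
|\beta^i_{j,k}|^p1_{\{|\hat\beta^i_{j,k}|<\lambda_j,|\beta^i_{j,k}|>2\lambda_j\}}\le2^p\bigl|\hat\beta^i_{j,k}-\beta^i_{j,k}\bigr|^p1_{\{|\hat\beta^i_{j,k}-\beta^i_{j,k}|>\lambda_j/2\}},
\]

that is, \(T_4\lesssim T_1\). Then (4.6) reduces to

\[
E\|f_{j_0,j_1}\|_p^p\lesssim T_1+T_2+T_3. \tag{4.7}
\]

By (4.4)–(4.5) and (4.7), it is sufficient to show

\[
T_\ell\lesssim(\ln n)^p\Bigl(\frac{\ln n}{n}\Bigr)^{\alpha p},\quad\ell=1,2,3, \tag{4.8}
\]

for the conclusion of Theorem 4.2.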

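To estimate \(T_1\), using the Hölder inequality, we find that

\[
T_1\le(\ln n)^{p-1}\sum_{j=j_0}^{j_1}\sum_{i=1}^{3}2^{j(p-2)}\sum_{k\in\Lambda_j}\bigl(E|\hat\beta^i_{j,k}-\beta^i_{j,k}|^{2p}\bigr)^{\frac12}\bigl(E1_{\{|\hat\beta^i_{j,k}-\beta^i_{j,k}|\ge\lambda_j/2\}}\bigr)^{\frac12}.
\]

Note that \(E(1_{\{|\hat\beta^i_{j,k}-\beta^i_{j,k}|\ge\lambda_j/2\}})=P(|\hat\beta^i_{j,k}-\beta^i_{j,k}|\ge\frac{\lambda_j}{2})\lesssim2^{-\varepsilon j}\) due to Lemma 2.3. Taking \(\varepsilon>p\), we conclude that

\[
T_1\lesssim(\ln n)^{p-1}n^{-\frac p2}\sum_{j=j_0}^{j_1}\sum_{i=1}^{3}2^{\frac{p-\varepsilon}{2}j}\lesssim(\ln n)^{p-1}n^{-\frac p2}2^{\frac{p-\varepsilon}{2}j_0}\lesssim(\ln n)^{p-1}n^{-\frac{ps}{2s+1}}
\]

thanks to Lemma 2.2, \(|\Lambda_j|\lesssim2^{2j}\), and the choice of \(j_0\). Hence (4.8) with \(\ell=1\) holds since \(\alpha\le\frac{s}{2s+1}\).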

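To estimate \(T_2\) and \(T_3\), define

\[
2^{j_0^*}\sim\Bigl(\frac{n}{\ln n}\Bigr)^{1-2\alpha},\qquad 2^{j_1^*}\sim\Bigl(\frac{n}{\ln n}\Bigr)^{\frac{\alpha}{s-\frac2r+\frac2p}}.
\]

Recall that \(2^{j_0}\sim n^{\frac{1}{2m+1}}\), \(2^{j_1}\sim\frac{n}{\ln n}\), and \(\alpha:=\min\{\frac{s}{2s+1},\frac{s-\frac2r+\frac2p}{2(s-\frac2r)+1}\}\). Then

\[
1-2\alpha\ge\frac{1}{2s+1}>\frac{1}{2m+1}\quad\text{and}\quad\frac{\alpha}{s-\frac2r+\frac2p}\le\frac{1}{2(s-\frac2r)+1}\le1.
\]

Hence \(2^{j_0}\le2^{j_0^*}\) and \(2^{j_1^*}\le2^{j_1}\). Moreover, a simple computation shows that \(1-2\alpha\le\frac{\alpha}{s-\frac2r+\frac2p}\), which implies \(2^{j_0^*}\le2^{j_1^*}\).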
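The ordering \(j_0\le j_0^*\le j_1^*\le j_1\) is a statement about exponents, so it can be checked for sample parameters satisfying \(r\le p\), \(s>\frac2r\), and \(m>s\). The helper below (a hypothetical name) returns the four exponents; note that \(2^{j_0}\) is a power of \(n\) while the others are powers of \(n/\ln n\), so the comparison holds asymptotically.

```python
def level_exponents(s, r, p, m):
    """Exponents of 2^{j0} (a power of n) and of 2^{j0*}, 2^{j1*}, 2^{j1} (powers of n / ln n)."""
    a = s - 2.0 / r + 2.0 / p
    alpha = min(s / (2.0 * s + 1.0), a / (2.0 * (s - 2.0 / r) + 1.0))
    return {"j0": 1.0 / (2.0 * m + 1.0), "j0*": 1.0 - 2.0 * alpha,
            "j1*": alpha / a, "j1": 1.0}

# Sample parameters with r <= p, s > 2/r and m > s.
PARAMS = [(2.0, 2.0, 4.0, 3.0), (1.2, 2.0, 8.0, 2.0), (3.0, 1.0, 10.0, 4.0)]
for s, r, p, m in PARAMS:
    e = level_exponents(s, r, p, m)
    print(e, e["j0"] < e["j0*"] <= e["j1*"] <= e["j1"])
```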

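Now, we estimate \(T_2\) by dividing it into

\[
T_2=(\ln n)^{p-1}\Bigl(\sum_{j=j_0}^{j_0^*}+\sum_{j=j_0^*+1}^{j_1}\Bigr)\sum_{i=1}^{3}2^{j(p-2)}\sum_{k\in\Lambda_j}E\bigl[|\hat\beta^i_{j,k}-\beta^i_{j,k}|^p1_{\{|\hat\beta^i_{j,k}|\ge\lambda_j,|\beta^i_{j,k}|\ge\lambda_j/2\}}\bigr]:=t_1+t_2. \tag{4.9}
\]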

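Since \(1_{\{|\hat\beta^i_{j,k}|\ge\lambda_j,|\beta^i_{j,k}|\ge\lambda_j/2\}}\le1\), by Lemma 2.2 we know that

\[
t_1\lesssim(\ln n)^{p-1}\sum_{j=j_0}^{j_0^*}\sum_{i=1}^{3}2^{\frac{pj}{2}}n^{-\frac p2}\lesssim(\ln n)^{p-1}n^{-\frac p2}2^{\frac{j_0^*p}{2}}\lesssim(\ln n)^p\Bigl(\frac{\ln n}{n}\Bigr)^{\alpha p} \tag{4.10}
\]

thanks to \(|\Lambda_j|\lesssim2^{2j}\) and the choice of \(j_0^*\). To estimate \(t_2\), we observe that

\[
1_{\{|\hat\beta^i_{j,k}|\ge\lambda_j,|\beta^i_{j,k}|\ge\frac{\lambda_j}{2}\}}\le1_{\{|\beta^i_{j,k}|\ge\frac{\lambda_j}{2}\}}\le\Bigl(\frac{2|\beta^i_{j,k}|}{\lambda_j}\Bigr)^r.
\]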

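This with Lemma 2.2 leads to

\[
t_2\lesssim(\ln n)^{p-1}\sum_{j=j_0^*+1}^{j_1}\sum_{i=1}^{3}2^{j(\frac p2-2)}n^{-\frac p2}\sum_{k\in\Lambda_j}\Bigl(\frac{|\beta^i_{j,k}|}{\lambda_j}\Bigr)^r. \tag{4.11}
\]

Note that \(\|\beta_j\|_r\lesssim2^{-j(s+1-\frac2r)}\) because of \(f^*\in B^s_{r,q}\) and Lemma 1.1. Then (4.11) reduces to

\[
t_2\lesssim(\ln n)^{p-\frac r2-1}n^{\frac{r-p}{2}}\sum_{j=j_0^*+1}^{j_1}2^{-j(sr+\frac r2-\frac p2)} \tag{4.12}
\]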

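thanks to \(\lambda_j=T2^{-\frac j2}\sqrt{\frac{\ln n}{n}}\). Denote \(\theta:=sr+\frac r2-\frac p2\). When \(\theta>0\), \(r>\frac{p}{2s+1}\) and

\[
t_2\lesssim(\ln n)^{p-\frac r2-1}n^{\frac{r-p}{2}}2^{-j_0^*(sr+\frac r2-\frac p2)}\lesssim(\ln n)^p\Bigl(\frac{\ln n}{n}\Bigr)^{\alpha p} \tag{4.13}
\]

due to the choice of \(j_0^*\). In (4.13), we use the fact that \(\alpha=\frac{s}{2s+1}\) in the case \(r>\frac{p}{2s+1}\).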

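To show (4.13) for \(\theta\le0\), define \(r_1:=(1-2\alpha)p>0\). Then \(\alpha=\frac{s-\frac2r+\frac2p}{2(s-\frac2r)+1}\le\frac{s}{2s+1}\) and \(r\le\frac{p}{2s+1}\le(1-2\alpha)p=r_1\) because \(\theta\le0\). The same arguments as for (4.11) show that

\[
t_2\lesssim(\ln n)^{p-1}\sum_{j=j_0^*+1}^{j_1}\sum_{i=1}^{3}2^{j(\frac p2-2)}n^{-\frac p2}\sum_{k\in\Lambda_j}\Bigl(\frac{|\beta^i_{j,k}|}{\lambda_j}\Bigr)^{r_1}.
\]

It follows from \(f^*\in B^s_{r,q}\) and Lemma 1.1 that

\[
\|\beta_j\|_{r_1}\le\|\beta_j\|_r\lesssim2^{-j(s+1-\frac2r)}
\]

due to \(r\le r_1\). Therefore, similarly to (4.12), we get that

\[
t_2\lesssim(\ln n)^{p-\frac{r_1}{2}-1}n^{\frac{r_1-p}{2}}\sum_{j=j_0^*+1}^{j_1}2^{j[\frac p2-2-(s-\frac2r+\frac12)r_1]}.
\]

Note that \(\frac p2-2-(s-\frac2r+\frac12)r_1=0\) because of \(r_1=(1-2\alpha)p\) and \(\alpha=\frac{s-\frac2r+\frac2p}{2(s-\frac2r)+1}\). Then

\[
t_2\lesssim(\ln n)^{p-\frac{r_1}{2}-1}n^{\frac{r_1-p}{2}}\ln n\lesssim(\ln n)^p\Bigl(\frac{\ln n}{n}\Bigr)^{\alpha p}, \tag{4.14}
\]

which implies that the bound in (4.13) also holds for \(\theta\le0\). The desired conclusion (4.8) with \(\ell=2\) follows from (4.9)–(4.10) and (4.13)–(4.14).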
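The cancellation \(\frac p2-2-(s-\frac2r+\frac12)r_1=0\) used in the case \(\theta\le0\) can be verified with exact rational arithmetic. The sketch uses Python's `fractions` module; the sample parameters are illustrative and chosen so that \(r\le p\), \(s>\frac2r\), and \(\theta\le0\).

```python
from fractions import Fraction

def t2_exponent(s, r, p):
    """p/2 - 2 - (s - 2/r + 1/2) r1 with r1 = (1 - 2 alpha) p and
    alpha = (s - 2/r + 2/p) / (2(s - 2/r) + 1), in exact rational arithmetic."""
    s, r, p = Fraction(s), Fraction(r), Fraction(p)
    alpha = (s - 2 / r + 2 / p) / (2 * (s - 2 / r) + 1)
    r1 = (1 - 2 * alpha) * p
    return p / 2 - 2 - (s - 2 / r + Fraction(1, 2)) * r1

def theta(s, r, p):
    """theta = s r + r/2 - p/2."""
    s, r, p = Fraction(s), Fraction(r), Fraction(p)
    return s * r + r / 2 - p / 2

SAMPLES = [(Fraction(6, 5), 2, 8), (Fraction(5, 2), 1, 12)]
for s, r, p in SAMPLES:
    print(theta(s, r, p), t2_exponent(s, r, p))  # theta <= 0 and the exponent is 0
```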

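Finally, by splitting \(T_3\) into

\[
T_3=(\ln n)^{p-1}\Bigl(\sum_{j=j_0}^{j_0^*}+\sum_{j=j_0^*+1}^{j_1}\Bigr)\sum_{i=1}^{3}2^{j(p-2)}\sum_{k\in\Lambda_j}E\bigl[|\beta^i_{j,k}|^p1_{\{|\hat\beta^i_{j,k}|<\lambda_j,|\beta^i_{j,k}|\le2\lambda_j\}}\bigr]:=e_1+e_2, \tag{4.15}
\]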

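we obtain that

\[
e_1\lesssim(\ln n)^{p-1}\sum_{j=j_0}^{j_0^*}\sum_{i=1}^{3}2^{jp}|\lambda_j|^p\lesssim(\ln n)^{\frac32p-1}n^{-\frac p2}2^{\frac{j_0^*p}{2}}\lesssim(\ln n)^p\Bigl(\frac{\ln n}{n}\Bigr)^{\alpha p} \tag{4.16}
\]

thanks to \(|\Lambda_j|\lesssim2^{2j}\) and the choices of \(\lambda_j\) and \(j_0^*\).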

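To estimate \(e_2\), we use the fact that \(1_{\{|\hat\beta^i_{j,k}|<\lambda_j,|\beta^i_{j,k}|\le2\lambda_j\}}\le(\frac{2\lambda_j}{|\beta^i_{j,k}|})^{p-r}\) because of \(r\le p\). Similarly to (4.11)–(4.13),

\[
e_2\lesssim(\ln n)^p\Bigl(\frac{\ln n}{n}\Bigr)^{\alpha p} \tag{4.17}
\]

for \(\theta>0\), where \(\theta:=sr+\frac r2-\frac p2\). When \(\theta\le0\), we rewrite \(e_2\) as follows:

\[
e_2=(\ln n)^{p-1}\Bigl(\sum_{j=j_0^*+1}^{j_1^*}+\sum_{j=j_1^*+1}^{j_1}\Bigr)\sum_{i=1}^{3}2^{j(p-2)}\sum_{k\in\Lambda_j}E\bigl[|\beta^i_{j,k}|^p1_{\{|\hat\beta^i_{j,k}|<\lambda_j,|\beta^i_{j,k}|\le2\lambda_j\}}\bigr]:=e_2'+e_2''. \tag{4.18}
\]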

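Proceeding as in (4.11) and (4.12), we find that

\[
e_2'\lesssim(\ln n)^{p-1}\Bigl(\frac{\ln n}{n}\Bigr)^{\frac{p-r}{2}}\sum_{j=j_0^*+1}^{j_1^*}2^{-j(sr+\frac r2-\frac p2)}\lesssim(\ln n)^{p-1}\Bigl(\frac{\ln n}{n}\Bigr)^{\frac{p-r}{2}}2^{-j_1^*(sr+\frac r2-\frac p2)}.
\]

This with the choice of \(2^{j_1^*}\sim(\frac{n}{\ln n})^{\frac{\alpha}{s-\frac2r+\frac2p}}\) leads to

\[
e_2'\lesssim(\ln n)^p\Bigl(\frac{\ln n}{n}\Bigr)^{\alpha p} \tag{4.19}
\]

due to \(\alpha=\frac{s-\frac2r+\frac2p}{2(s-\frac2r)+1}\) for \(\theta\le0\). When \(r\le p\),

\[
\|\beta_j\|_p\le\|\beta_j\|_r\lesssim2^{-j(s+1-\frac2r)}
\]

thanks to \(f^*\in B^s_{r,q}\) and Lemma 1.1. Therefore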

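\[
e_2''\lesssim(\ln n)^{p-1}\sum_{j=j_1^*+1}^{j_1}\sum_{i=1}^{3}2^{j(p-2)}\sum_{k\in\Lambda_j}|\beta^i_{j,k}|^p\lesssim(\ln n)^{p-1}\sum_{j=j_1^*+1}^{j_1}2^{-j(sp-\frac{2p}{r}+2)}.
\]

Combining this with the choice of \(2^{j_1^*}\sim(\frac{n}{\ln n})^{\frac{\alpha}{s-\frac2r+\frac2p}}\), we observe that

\[
e_2''\lesssim(\ln n)^{p-1}2^{-j_1^*(sp-\frac{2p}{r}+2)}\lesssim(\ln n)^p\Bigl(\frac{\ln n}{n}\Bigr)^{\alpha p}.
\]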
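Likewise, the two exponent identities behind (4.19) and the bound for \(e_2''\) can be verified exactly when \(\theta\le0\): \(\frac{p-r}{2}+\frac{\alpha\theta}{s-\frac2r+\frac2p}=\alpha p\) and \(sp-\frac{2p}{r}+2=p(s-\frac2r+\frac2p)\). The sample parameters in the sketch are illustrative.

```python
from fractions import Fraction

def e2_exponent_checks(s, r, p):
    """Exponents of ln n / n produced by the bounds for e2' and e2'' when theta <= 0,
    together with alpha * p (all exact rationals)."""
    s, r, p = Fraction(s), Fraction(r), Fraction(p)
    a = s - 2 / r + 2 / p                      # s - 2/r + 2/p
    alpha = a / (2 * (s - 2 / r) + 1)          # alpha in the case theta <= 0
    theta = s * r + r / 2 - p / 2
    # (ln n/n)^{(p-r)/2} 2^{-j1* theta} with 2^{j1*} = (n/ln n)^{alpha/a}
    from_e2_prime = (p - r) / 2 + alpha * theta / a
    # 2^{-j1* (sp - 2p/r + 2)} with the same j1*
    from_e2_second = (s * p - 2 * p / r + 2) * alpha / a
    return from_e2_prime, from_e2_second, alpha * p

print(e2_exponent_checks(Fraction(6, 5), 2, 8))  # three equal rationals
```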

References
