• No results found

In this section, we derive an adaptive estimation procedure for the value of the linear function `h(ϕ). This procedure is based on the estimator`b

b

m given in (1.8) with dimen-

sion parametermb selected as a minimizer of the data driven penalized contrast criterion

(1.2a–1.2b). The selection criterion (1.2a–1.2b) involves the random upper bound Mcn

and the random penalty sequencepend which we introduce below. We show that the esti-

mator`b b

m attains the minimax rate of convergence within a logarithmic term. Moreover,

we illustrate the cost due to adaption by considering classical smoothness assumptions. In an intermediate step we do not consider the estimation of unknown quantities in the penalty function. Let us therefore consider a deterministic upper bound Mn and

a deterministic penalty sequence pen := (penm)m>1, which is nondecreasing. These quantities are constructed such that they can be easily estimated in a second step. As an adaptive choice me of the dimension parameter m we propose the minimizer of a

penalized contrast criterion, that is,

e

m := arg min 16m6Mn

m+ penm} (1.14a)

where the random sequence of contrast Ψ := (Ψm)m>1is defined by

Ψm := max m6m06M n n |`bm0−`bm|2− penm0 o . (1.14b)

The fundamental idea to establish an appropriate upper bound for the risk of`b e

mis given

1.4 Adaptive estimation 31

definition of Ψ andme we deduce for all 1 6 m 6 Mn

|`b e m− `h(ϕ)| 26 3n| b ` e m`bm∧me | 2+ | b ` e m∧m`bm| 2+ | b `m− `h(ϕ)|2 o 6 3nΨm+ pen e mme + penm+|`bm− `h(ϕ)|2 o 6 6{Ψm+ penm} + 3|`bm− `h(ϕ)|2

where the right hand side does not depend on the adaptive choiceme. Since the penalty

sequence pen is nondecreasing we obtain Ψm 6 6 max m6m06M  |`bm0− `hm0)|2− 1 6penm0  + + 3 max m6m06M n |`h(ϕm− ϕm0)|2

where (·)+denotes the positive part of a function. Combining the last estimate with the

previous reduction scheme yields for all 1 6 m 6 Mn

|`b e

m− `h(ϕ)|2 6 7 penm+78 biasm+42 max m6m06M  |`bm0 − `hm0)|2− 1 6penm0  + (1.15) where biasm := supm0>m|`hm0 − ϕ)|2. We will prove below that penm+ biasm is of the order Rh

n(1+log n)−1. Moreover, we will bound the right hand side term appropriately with the help of Bernstein’s inequality.

Let us now introduce the upper bound Mn and sequence of penalty penm used in

the penalized contrast criterion (1.14a–1.14b). In the following, assume without loss of generality that [h]16= 0.

Definition 1.4.1. For all n > 1 let an := n1−1/ log(2+log n)(1 + log n)−1 and Mnh :=

max{16 m 6 bn1/4c : max 16j6m[h] 2 j 6 n[h]21} then we define Mn:= min n 26 m 6 Mnh : m3k[T ]−1m k2 max 16j6m[h] 2 j > an o − 1

where we set Mn:= Mnhif the min runs over an empty set. Thus, Mntakes values between

1and Mh

n. Let ςm2 = 74 E[Y2] + max16m06mk[T ]−1m [g]mk2, then we define penm := 24 ςm2 (1 + log n) n−1 max

16m06mk[h]

t

m0[T ]−1m0k2. (1.16)

To apply Bernstein’s inequality we need another assumption regarding the error term

U. This is captured by the set Uσfor some σ > 0, which contains all conditional distributions PU |W such that E[U|W ] = 0, E[U2|W ] 6 σ2, and Cramer’s condition hold,

i.e.,

E[|U |k|W ] 6 σkk!, k = 3, 4, . . . .

Moreover, besides Assumption 1.3 we need the following Cramer condition which is in particular satisfied if the basis {fl}l>1are uniformly bounded.

Assumption 1.4. There exists η > 1 such that the distribution of W satisfies

supj,l∈NE |fj(W )fl(W ) − E[fj(W )fl(W )]|k6 ηkk!, k = 3, 4, . . . .

We now present an upper bound for`b e

m. As Goldenshluger and Pereverzev [2000] we

face a logarithmic loss due to the adaptation.

Theorem 1.4.1. Assume an iid. n-sample of (Y, Z, W ) from the model (1.1a–1.1b) with

E[Y2] > 0. Let Assumptions 1.1–1.4 be satisfied. Suppose that (mn)3max16j6m◦

n[h]

2

j =

o(anυm

n)as n → ∞. Then we have for all n > 1

sup T ∈Tυ d,D sup PU |W∈U∞ σ sup ϕ∈Fγρ E|`b e m− `h(ϕ)| 2 6 C Rhn(1+log n)−1

for a constant C > 0 only depending on the classes Fρ

γ, Td,Dυ , the constants σ, η and the

representer h.

Remark 1.4.1. In all examples studied below the condition (m

n)3max16j6m◦

n[h]

2

j = o(anυm

n)

as n tends to infinity is satisfied if the structural function ϕ is sufficiently smooth. More precisely, in case of (pp) it suffices to assume 3 < 2p + 2 min(0, s). On the other hand, in case of (pe) or (ep) this condition is automatically fulfilled.  In the following definition we introduce empirical versions of the integer Mn and

the penalty sequence pen. Thereby, we complete the data driven penalized contrast criterion (1.2a–1.2b). This allows for a completely data driven selection method. For this purpose, we construct an estimator for ς2

m by replacing the unknown quantities by

their empirical analog, that is,

b ςm2 := 74n−1 n X i=1 Yi2+ max 16m06mk[T ]b −1 m [bg]mk 2 .

With the nondecreasing sequence (ςb

2

m)m>1 at hand we only need to replace the matrix

1.4 Adaptive estimation 33

Definition 1.4.2. Let anand Mnh be as in Definition 1.4.1 then for all n > 1 define

c Mn:= min n 26 m 6 Mnh : m3k[T ]b−1m k2 max 16j6m[h] 2 j > an o − 1

where we setMcn:= Mnhif the min runs over an empty set. Thus,Mcntakes values between

1and Mnh. Then we introduce for all m > 1 an empirical analog of penm by

d

penm := 204ςbm2(1 + log n)n−1 max

16m06mk[h]

t

m0[T ]b−1m0k2. (1.17)

Before we establish the next upper bound we introduce

Mn+:= min  26 m 6 Mnh : υm−1m3 max 16j6m[h] 2 j > 4Dan  − 1 (1.18) where M+

n := Mnh if the min runs over an empty set. Thus, Mn+ takes values between 1

and Mh

n. As in the partial adaptive case we attain the minimax rate of convergence Rh

within a logarithmic term.

Theorem 1.4.2. Let the assumptions of Theorem 1.4.1 be satisfied. Additionally suppose

that (M+

n + 1)2log n = o nυMn++1



as n → ∞ and supj>1E |ej(Z)|20 6 η20. Then for all

n > 1 we have sup T ∈Tυ d,D sup PU |W∈U∞ σ sup ϕ∈Fγρ E |`b b m− `h(ϕ)| 2 6 C Rhn(1+log n)−1

for a constant C > 0 only depending on the classes Fρ

γ, Td,Dυ , the constants σ, η and the

representer h.

Remark 1.4.2. Note that below in all examples illustrating Theorem 1.4.2 the condition

(Mn++ 1)2log n = o(nυM+

n+1)as n tends to infinity is automatically satisfied. 

As in the case of minimax optimal estimation we now present an upper bound uni- formly over the class Fτ

ω of representer. For this purpose define Mnω := max{1 6 m 6

bn1/4c : max

16j6m(ωj−1)6 n}. In the definition of the boundsMcn, Mn+, and Mn−(cf. Ap-

pendix 1.4) we replace Mh

n and max16j6m[h]2j by Mnω and max16j6mωj−1, respectively.

Consequently, by employing khk2

1/γ 6 τ and Rhn 6 τ Rωn for all h ∈ Fωτ the next result

follows line by line the proof of Theorem 1.4.2 and hence its proof is omitted.

Corollary 1.4.3. Under the conditions of Theorem 1.4.2 we have for all n > 1

sup T ∈Tυ d,D sup PU |W∈Uσ∞ sup ϕ∈Fγρ, h∈Fωτ E |`b b m− `h(ϕ)|2 6 C Rωn(1+log n)−1

where the constant C > 0 depends on the parameter spaces Fγρ, Fωτ, Td,Dυ , and the constants σ, η.

Illustration by classical smoothness assumptions. Let us illustrate the cost due to adaption by considering classical smoothness assumptions as discussed in Subsection 1.3.4. In Theorem 1.4.2 and Corollary 1.4.3, respectively, we have seen that the adaptive estimator `b

b

m attains within a constant the rates Rhadaptand Rωadapt. Let us now present

the orders of these rates by considering the example configurations of Remark 1.2.1. The proof of the following result is omitted because of the analogy with the proof of Proposition 1.3.6.

Proposition 1.4.4. Assume an iid. n-sample of (Y, Z, W ) from the model (1.1a–1.1b)

with conditional expectation operator T ∈ Tυ

d,D, error term U such that PU |W ∈ Uσ, and

E[Y2] > 0. Then for the example configurations of Remark 1.2.1 we obtain

(pp) if in addition 3 < 2p + 2 min(s, 0) that mn∼ n(1 + log n)−11/(2p+2a)

and (i) Rh n(1+log n)−1 ∼            (n−1(1 + log n))(2p+2s−1)/(2p+2a), if s − a < 1/2 n−1(1 + log n)2, if s − a = 1/2 n−1(1 + log n), if s − a > 1/2, (ii) Rωn(1+log n)−1 ∼ max (n−1(1 + log n))(p+s)/(p+a), n−1(1 + log n)



. (pe) mn∼ log n(1 + log n)−(a+p)/a1/2a

and (i) Rhn(1+log n)−1 ∼ (1 + log n)−(2p+2s−1)/(2a),

(ii) Rωn(1+log n)−1 ∼ (1 + log n)−(p+s)/a.

(ep) mn∼ log n(1 + log n)−(a+p)/p1/2p

and (i) Rhn(1+log n)−1 ∼            n−1(1 + log n)(2a+2p−2s+1)/(2p), if s − a < 1/2 n−1(1 + log n)(log log n), if s − a = 1/2 n−1(1 + log n), if s − a > 1/2, (ii) Rω

n(1+log n)−1 ∼ max n−1(log n)(a+p−s)/p, n−1(1 + log n)



.

Let us revisit Examples 1.3.1 and 1.3.2. In the following, we apply the general theory to adaptive pointwise estimation and adaptive estimation of averages of the structural function ϕ.

Related documents