Adaptive estimation - Estimation and Testing of Instrumental Mean and Quantile Regression Model

In this section, we derive an adaptive estimation procedure for the value of the linear function `h(ϕ). This procedure is based on the estimator`b

m given in (1.8) with dimen-

sion parametermb selected as a minimizer of the data driven penalized contrast criterion

(1.2a–1.2b). The selection criterion (1.2a–1.2b) involves the random upper bound Mc_n

and the random penalty sequencepend which we introduce below. We show that the esti-

mator`b b

m attains the minimax rate of convergence within a logarithmic term. Moreover,

we illustrate the cost due to adaption by considering classical smoothness assumptions. In an intermediate step we do not consider the estimation of unknown quantities in the penalty function. Let us therefore consider a deterministic upper bound Mn and

a deterministic penalty sequence pen := (pen_m)_m>1, which is nondecreasing. These quantities are constructed such that they can be easily estimated in a second step. As an adaptive choice me of the dimension parameter m we propose the minimizer of a

penalized contrast criterion, that is,

m := arg min 16m6Mn

{Ψ_m+ pen_m} (1.14a)

where the random sequence of contrast Ψ := (Ψm)m>1is defined by

Ψm := max m6m0_6M n n |`b_m0−`b_m|2− pen_m0 o . (1.14b)

The fundamental idea to establish an appropriate upper bound for the risk of`b e

mis given

1.4 Adaptive estimation 31

definition of Ψ andme we deduce for all 1 6 m 6 Mn

|`b e m− `h(ϕ)| 2_{6 3}n_| b ` e m−`bm∧m_e | 2_{+ |} b ` e m∧m−`bm| 2_{+ |} b `m− `h(ϕ)|2 o 6 3nΨ_m+ pen e m+Ψm_e + penm+|`b_m− `_h(ϕ)|2 o 6 6{Ψm+ penm} + 3|`b_m− `_h(ϕ)|2

where the right hand side does not depend on the adaptive choiceme. Since the penalty

sequence pen is nondecreasing we obtain Ψm 6 6 max m6m0_6M |`b_m0− `_h(ϕ_m0)|2− 1 6penm0 + + 3 max m6m0_6M n |`_h(ϕm− ϕm0)|2

where (·)+denotes the positive part of a function. Combining the last estimate with the

previous reduction scheme yields for all 1 6 m 6 Mn

|`b e

n(1+log n)−1. Moreover, we will bound the right hand side term appropriately with the help of Bernstein’s inequality.

Let us now introduce the upper bound Mn and sequence of penalty penm used in

the penalized contrast criterion (1.14a–1.14b). In the following, assume without loss of generality that [h]16= 0.

Definition 1.4.1. For all n > 1 let an := n1−1/ log(2+log n)(1 + log n)−1 and Mnh :=

max{1_{6 m 6 bn}1/4c : max 16j6m[h] 2 j 6 n[h]21} then we define Mn:= min n 2_{6 m 6 M}_nh : m3k[T ]−1_m k2 max 16j6m[h] 2 j > an o − 1

where we set Mn:= Mnhif the min runs over an empty set. Thus, Mntakes values between

1and Mh

n. Let ςm2 = 74 E[Y2] + max16m0_6mk[T ]−1_m [g]_mk2, then we define pen_m := 24 ς_m2 (1 + log n) n−1 max

16m0_6mk[h]

m0[T ]−1_m0k2. (1.16)

To apply Bernstein’s inequality we need another assumption regarding the error term

U. This is captured by the set U_σ∞ for some σ > 0, which contains all conditional distributions PU |W such that E[U|W ] = 0, E[U2|W ] 6 σ2, and Cramer’s condition hold,

i.e.,

E[|U |k|W ] 6 σkk!, k = 3, 4, . . . .

Moreover, besides Assumption 1.3 we need the following Cramer condition which is in particular satisfied if the basis {fl}l>1are uniformly bounded.

Assumption 1.4. There exists η > 1 such that the distribution of W satisfies

sup_j,l∈NE |fj(W )fl(W ) − E[fj(W )fl(W )]|k6 ηkk!, k = 3, 4, . . . .

We now present an upper bound for`b e

m. As Goldenshluger and Pereverzev [2000] we

face a logarithmic loss due to the adaptation.

Theorem 1.4.1. Assume an iid. n-sample of (Y, Z, W ) from the model (1.1a–1.1b) with

E[Y2] > 0. Let Assumptions 1.1–1.4 be satisfied. Suppose that (m◦n)3max16j6m◦

n[h]

j =

o(anυm◦

n)as n → ∞. Then we have for all n > 1

sup T ∈Tυ d,D sup PU |W∈U∞ σ sup ϕ∈Fγρ E|`b e m− `h(ϕ)| 2 6 C Rhn(1+log n)−1

for a constant C > 0 only depending on the classes Fρ

γ, Td,Dυ , the constants σ, η and the

representer h.

Remark 1.4.1. In all examples studied below the condition (m◦

n)3max16j6m◦

n[h]

j = o(anυm◦

as n tends to infinity is satisfied if the structural function ϕ is sufficiently smooth. More precisely, in case of (pp) it suffices to assume 3 < 2p + 2 min(0, s). On the other hand, in case of (pe) or (ep) this condition is automatically fulfilled. In the following definition we introduce empirical versions of the integer Mn and

the penalty sequence pen. Thereby, we complete the data driven penalized contrast criterion (1.2a–1.2b). This allows for a completely data driven selection method. For this purpose, we construct an estimator for ς2

m by replacing the unknown quantities by

their empirical analog, that is,

b ς_m2 := 74n−1 n X i=1 Y_i2+ max 16m0_6mk[T ]b −1 m [bg]mk 2 .

With the nondecreasing sequence (ςb

m)m>1 at hand we only need to replace the matrix

1.4 Adaptive estimation 33

Definition 1.4.2. Let anand Mnh be as in Definition 1.4.1 then for all n > 1 define

c Mn:= min n 2_{6 m 6 M}_nh : m3k[T ]b−1_m k2 max 16j6m[h] 2 j > an o − 1

where we setMc_n:= M_nhif the min runs over an empty set. Thus,Mc_ntakes values between

1and M_nh_{. Then we introduce for all m > 1 an empirical analog of pen}_m by

pen_m := 204ς_b_m2(1 + log n)n−1 max

16m0_6mk[h]

m0[T ]b−1_m0k2. (1.17)

Before we establish the next upper bound we introduce

M_n+:= min 2_{6 m 6 M}_nh : υ_m−1m3 max 16j6m[h] 2 j > 4Dan − 1 (1.18) where M+

n := Mnh if the min runs over an empty set. Thus, Mn+ takes values between 1

and Mh

n. As in the partial adaptive case we attain the minimax rate of convergence Rh

within a logarithmic term.

Theorem 1.4.2. Let the assumptions of Theorem 1.4.1 be satisfied. Additionally suppose

that (M+

n + 1)2log n = o nυMn++1

as n → ∞ and sup_j>1E |ej(Z)|20 6 η20. Then for all

n > 1 we have sup T ∈Tυ d,D sup PU |W∈U∞ σ sup ϕ∈Fγρ E |`b b m− `h(ϕ)| 2 6 C Rhn(1+log n)−1

for a constant C > 0 only depending on the classes Fρ

γ, Td,Dυ , the constants σ, η and the

representer h.

Remark 1.4.2. Note that below in all examples illustrating Theorem 1.4.2 the condition

(M_n++ 1)2log n = o(nυ_M+

n+1)as n tends to infinity is automatically satisfied.

As in the case of minimax optimal estimation we now present an upper bound uniformly over the class Fτ

ω of representer. For this purpose define Mnω := max{1 6 m 6

bn1/4_{c : max}

16j6m(ωj−1)6 n}. In the definition of the boundsMc_n, M_n+, and M_n−(cf. Ap-

pendix 1.4) we replace Mh

n and max16j6m[h]2j by Mnω and max16j6mωj−1, respectively.

Consequently, by employing khk2

1/γ 6 τ and Rhn 6 τ Rωn for all h ∈ Fωτ the next result

follows line by line the proof of Theorem 1.4.2 and hence its proof is omitted.

Corollary 1.4.3. Under the conditions of Theorem 1.4.2 we have for all n > 1

sup T ∈Tυ d,D sup PU |W∈Uσ∞ sup ϕ∈Fγρ, h∈Fωτ E |`b b m− `h(ϕ)|2 6 C Rωn(1+log n)−1

where the constant C > 0 depends on the parameter spaces F_γρ, F_ωτ, T_d,Dυ , and the constants σ, η.

Illustration by classical smoothness assumptions. Let us illustrate the cost due to adaption by considering classical smoothness assumptions as discussed in Subsection 1.3.4. In Theorem 1.4.2 and Corollary 1.4.3, respectively, we have seen that the adaptive estimator `b

m attains within a constant the rates Rhadaptand Rωadapt. Let us now present

the orders of these rates by considering the example configurations of Remark 1.2.1. The proof of the following result is omitted because of the analogy with the proof of Proposition 1.3.6.

Proposition 1.4.4. Assume an iid. n-sample of (Y, Z, W ) from the model (1.1a–1.1b)

with conditional expectation operator T ∈ Tυ

d,D, error term U such that PU |W ∈ Uσ∞, and

E[Y2] > 0. Then for the example configurations of Remark 1.2.1 we obtain

(pp) if in addition 3 < 2p + 2 min(s, 0) that m◦_n∼ n(1 + log n)−11/(2p+2a)

and (i) Rh n(1+log n)−1 ∼            (n−1(1 + log n))(2p+2s−1)/(2p+2a), if s − a < 1/2 n−1(1 + log n)2, if s − a = 1/2 n−1(1 + log n), if s − a > 1/2, (ii) Rω_{n(1+log n)}−1 ∼ max (n−1(1 + log n))(p+s)/(p+a), n−1(1 + log n)

. (pe) m◦_n∼ log n(1 + log n)−(a+p)/a1/2a

and (i) Rh_{n(1+log n)}−1 ∼ (1 + log n)−(2p+2s−1)/(2a),

(ii) Rω_{n(1+log n)}−1 ∼ (1 + log n)−(p+s)/a.

(ep) m◦_n∼ log n(1 + log n)−(a+p)/p1/2p

and (i) Rh_{n(1+log n)}−1 ∼            n−1(1 + log n)(2a+2p−2s+1)/(2p)_, _{if s − a < 1/2} n−1(1 + log n)(log log n), if s − a = 1/2 n−1(1 + log n), if s − a > 1/2, (ii) Rω

n(1+log n)−1 ∼ max n−1(log n)(a+p−s)/p, n−1(1 + log n)

Let us revisit Examples 1.3.1 and 1.3.2. In the following, we apply the general theory to adaptive pointwise estimation and adaptive estimation of averages of the structural function ϕ.

In document Estimation and Testing of Instrumental Mean and Quantile Regression Models (Page 30-35)