Sequential Model Selection Method for Nonparametric Autoregression


HAL Id: hal-02334926

https://hal.archives-ouvertes.fr/hal-02334926

Submitted on 27 Oct 2019



Ouerdia Arkoun, Jean-Yves Brua, Serguei Pergamenchtchikov

To cite this version:

Ouerdia Arkoun, Jean-Yves Brua, Serguei Pergamenchtchikov. Sequential Model Selection Method for Nonparametric Autoregression. Sequential Analysis, Taylor & Francis, 2019. ⟨hal-02334926⟩


Ouerdia Arkoun

Sup’Biotech, Laboratoire BIRL, 66 Rue Guy Moquet, 94800 Villejuif, and Normandie Université, Université de Rouen, Laboratoire de Mathématiques Raphaël Salem, CNRS, UMR 6085, Avenue de l’université, BP 12, 76801 Saint-Etienne du Rouvray, France

Jean-Yves Brua

Normandie Université, Université de Rouen, Laboratoire de Mathématiques Raphaël Salem, CNRS, UMR 6085, Avenue de l’université, BP 12, 76801 Saint-Etienne du Rouvray, France

Serguei Pergamenchtchikov

Normandie Université, Université de Rouen, Laboratoire de Mathématiques Raphaël Salem, CNRS, UMR 6085, Avenue de l’université, BP 12, 76801 Saint-Etienne du Rouvray, France, and International Laboratory of Statistics of Stochastic Processes and Quantitative Finance, National Research Tomsk State University, Russian Federation

Abstract: In this paper the nonparametric autoregression estimation problem for the quadratic risks is considered for the first time. To this end we develop a new adaptive sequential model selection method based on the efficient sequential kernel estimators proposed by Arkoun and Pergamenshchikov (2016). Moreover, we develop a new analytical tool for general regression models to obtain the non-asymptotic sharp oracle inequalities for both usual quadratic and robust quadratic risks. Then, we show that the constructed sequential model selection procedure is optimal in the sense of oracle inequalities.

This work was supported by the RSF grant 17-11-01049 (National Research Tomsk State University).

Address correspondence to O. Arkoun, Laboratoire de Mathématiques Raphaël Salem, CNRS, UMR 6085, Avenue de l’université, BP 12, 76801 Saint-Etienne du Rouvray Cedex, France.

Keywords: Non-parametric estimation; Non-parametric autoregression; Non-asymptotic estimation; Robust risk; Model selection; Sharp oracle inequalities.

Subject Classifications: primary 62G08, secondary 62G05

1 Introduction

One of the standard linear models in the general theory of time series is the autoregressive model (see, for example, Anderson (1994) and the references therein). Natural extensions of such models are nonparametric autoregressive models, which are defined by

$$ y_k = S(x_k)\,y_{k-1} + \xi_k \quad\text{and}\quad x_k = a + \frac{k(b-a)}{n}, \tag{1.1} $$

where $S(\cdot)\in L_2[a,b]$ is an unknown function, $a<b$ are fixed known constants, $1\le k\le n$, the initial value $y_0$ is a constant and the noise $(\xi_k)_{k\ge 1}$ is an i.i.d. sequence of unobservable random variables with $\mathbf{E}\xi_1 = 0$ and $\mathbf{E}\xi_1^2 = 1$.

The problem is to estimate the function $S(\cdot)$ on the basis of the observations $(y_k)_{1\le k\le n}$ under the condition that the noise distribution is unknown.
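To make the observation scheme (1.1) concrete, the following minimal sketch simulates one trajectory. It is an illustration under stated assumptions only: the noise is taken to be standard Gaussian (one admissible choice with mean 0 and variance 1), and the function name `simulate_ar` and the example coefficient $S(x) = 0.5\cos(2\pi x)$ are hypothetical, not taken from the paper.

```python
import numpy as np

def simulate_ar(S, n, a=0.0, b=1.0, y0=0.0, rng=None):
    """Simulate y_k = S(x_k) y_{k-1} + xi_k with x_k = a + k (b - a) / n, k = 1..n."""
    rng = np.random.default_rng() if rng is None else rng
    k = np.arange(1, n + 1)
    x = a + k * (b - a) / n               # design points x_1, ..., x_n
    xi = rng.standard_normal(n)           # assumption: Gaussian noise, E xi = 0, E xi^2 = 1
    y = np.empty(n + 1)
    y[0] = y0                             # constant initial value y_0
    for i in range(1, n + 1):
        y[i] = S(x[i - 1]) * y[i - 1] + xi[i - 1]
    return x, y

# Hypothetical example: a smooth coefficient with |S| <= 0.5 < 1.
x, y = simulate_ar(lambda t: 0.5 * np.cos(2 * np.pi * t), n=1000,
                   rng=np.random.default_rng(0))
```

The returned array `y` has length $n+1$ because it stores $y_0$ together with the observations $y_1,\dots,y_n$.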

It should be noted that the varying coefficient principle is well known in regression analysis. It permits the use of more complex forms for regression coefficients and, therefore, the models constructed via this method are more adequate for applications (see, for example, Fan and Zhang (2008), Luo et al. (2009)). In this paper we consider the varying coefficient autoregressive models (1.1). A number of papers consider these models, such as Dahlhaus (1996a), Dahlhaus (1996b) and Belitser (2000). In all these papers, the authors propose some asymptotic (as $n\to\infty$) methods for different identification studies without considering optimal estimation issues. To our knowledge, the minimax estimation problem for the model (1.1) was treated for the first time in Arkoun and Pergamenchtchikov (2008) and Moulines et al. (2005) in the nonadaptive case, i.e. for known regularity of the function $S$. Then, in Arkoun (2011) it is proposed to use the sequential analysis method for the adaptive pointwise estimation problem in the case where the unknown Hölder regularity is less than one, i.e. when the function $S$ is not differentiable. It should also be noted (see Arkoun (2011)) that for the model (1.1), adaptive pointwise estimation is possible only in the sequential analysis framework. That is why we study sequential estimation methods for the smooth function $S$. In this paper we consider the quadratic risk defined as

$$ \mathcal{R}_p(\widehat S_n, S) = \mathbf{E}_{p,S}\,\|\widehat S_n - S\|^2, \qquad \|S\|^2 = \int_a^b S^2(x)\,dx, \tag{1.2} $$

where $\widehat S_n$ is an estimator of $S$ based on the observations $(y_k)_{1\le k\le n}$ and $\mathbf{E}_{p,S}$ is the expectation with respect to the distribution law $\mathbf{P}_{p,S}$ of the process $(y_k)_{1\le k\le n}$ given the distribution density $p$ and the coefficient $S$. Moreover, taking into account that the distribution $p$ is unknown, we use the robust nonparametric estimation approach proposed in Galtchouk and Pergamenshchikov (2006a). To this end we set the robust risk as

$$ \mathcal{R}^*(\widehat S_n, S) = \sup_{p\in\mathcal{P}} \mathcal{R}_p(\widehat S_n, S), \tag{1.3} $$

where $\mathcal{P}$ is a family of distributions defined in Section 2.

In order to estimate the function $S$ in model (1.6) we make use of the estimator family $(\widehat S_\lambda,\ \lambda\in\Lambda)$, where $\widehat S_\lambda$ is a weighted least squares estimator with the Pinsker weights. For this family, similarly to Galtchouk and Pergamenshchikov (2009a), we construct a special selection rule, i.e. a random variable $\widehat\lambda$ with values in $\Lambda$, for which we define the selection estimator as $\widehat S_* = \widehat S_{\widehat\lambda}$. Our goal in this paper is to show the non-asymptotic sharp oracle inequality for the robust risks (1.3), i.e. to show that for any $\check\varrho>0$ and $n\ge 1$

$$ \mathcal{R}^*(\widehat S_*, S) \le (1+\check\varrho)\,\min_{\lambda\in\Lambda}\mathcal{R}^*(\widehat S_\lambda, S) + \frac{B_n}{\check\varrho\, n}, \tag{1.4} $$

where $B_n$ is a remainder term such that for any $\check\delta>0$,

$$ \lim_{n\to\infty}\frac{B_n}{n^{\check\delta}} = 0. \tag{1.5} $$

In this case the estimator $\widehat S_*$ is called optimal in the oracle inequality sense. In this paper, in order to obtain this inequality for model (1.1), we develop a new model selection method based on the truncated sequential procedures developed in Arkoun and Pergamenchtchikov (2016) for pointwise efficient estimation. Then we use the non-asymptotic analysis tool proposed in Galtchouk and Pergamenshchikov (2009a), based on the non-asymptotic studies from Baron et al. (1999) for a family of least-squares estimators and extended in Fourdrinier and Pergamenshchikov (2007) to some other estimator families. To this end we use the approach proposed in Galtchouk and Pergamenshchikov (2011), i.e. we pass to a discrete time regression model by making use of the truncated sequential procedure introduced in Arkoun and Pergamenchtchikov (2016). More precisely, at any point $(z_l)_{1\le l\le d}$ of a partition of the interval $[a,b]$, we define a sequential procedure $(\tau_l, S^*_l)$ with a stopping rule $\tau_l$ and an estimator $S^*_l$. For $Y_l = S^*_l$ with $1\le l\le d$, we come to the regression equation on some set $\Gamma\subseteq\Omega$:

$$ Y_l = S(z_l) + \zeta_l, \qquad 1\le l\le d. \tag{1.6} $$

Here, in contrast with the classical regression model, the noise sequence $(\zeta_l)_{1\le l\le d}$ has a complex structure, namely,

$$ \zeta_l = \xi^*_l + \varpi_l, \tag{1.7} $$

where $(\xi^*_l)_{1\le l\le d}$ is a "main noise" sequence of uncorrelated random variables and $(\varpi_l)_{1\le l\le d}$ is a sequence of bounded random variables.

We will use the oracle inequality (1.4) to prove the asymptotic efficiency of the proposed procedure, using the same method as in Galtchouk and Pergamenshchikov (2009b). Asymptotic efficiency means that the procedure provides the optimal convergence rate and the asymptotically minimal rate-normalized risk, which coincides with the Pinsker constant. It should be emphasized that only sharp oracle inequalities of type (1.4) make it possible to establish the efficiency property in the adaptive setting.

The paper is organized as follows. In Section 2 we state the main conditions for the model (1.1). In Section 3 we describe the passage to the regression scheme. In Section 4 we describe the sequential model selection procedure. In Section 5 we announce the main results. In Section 6 we study the properties of the obtained regression model (1.6). In Section 7 we prove all basic results. In Appendix A we give all the auxiliary and technical tools.

2 Main Conditions

As in Arkoun and Pergamenchtchikov (2016) we assume that in the model (1.1) the i.i.d. random variables $(\xi_k)_{k\ge 1}$ have a density $p$ (with respect to Lebesgue measure) from the functional class $\mathcal{P}$ defined as

$$ \mathcal{P} := \Big\{ p\ge 0 : \int_{-\infty}^{+\infty} p(x)\,dx = 1,\ \int_{-\infty}^{+\infty} x\,p(x)\,dx = 0,\ \int_{-\infty}^{+\infty} x^2 p(x)\,dx = 1 \ \text{and}\ \sup_{k\ge 1}\frac{\int_{-\infty}^{+\infty}|x|^{2k}p(x)\,dx}{\varsigma^k(2k-1)!!}\le 1 \Big\}, \tag{2.1} $$

where $\varsigma\ge 1$ is some parameter, which may be a function of the number of observations $n$, i.e. $\varsigma=\varsigma(n)$, such that for any $\check\delta>0$

$$ \lim_{n\to\infty}\frac{\varsigma(n)}{n^{\check\delta}} = 0. \tag{2.2} $$

Note that the (0,1)-Gaussian density belongs to $\mathcal{P}$. In the sequel we denote this density by $p_0$. It is clear that for any $q>0$

$$ m^*_q = \sup_{p\in\mathcal{P}} \mathbf{E}_p|\xi_1|^q < \infty, \tag{2.3} $$

where $\mathbf{E}_p$ is the expectation with respect to the density $p$ from $\mathcal{P}$. To obtain a stable (uniformly with respect to the function $S$) model (1.1), we assume that for some fixed $0<\epsilon<1$ and $L>0$ the unknown function $S$ belongs to the $\epsilon$-stability set introduced in Arkoun and Pergamenchtchikov (2016) as

$$ \Theta_{\epsilon,L} = \Big\{ S\in C^1([a,b],\mathbb{R}) : |S|_*\le 1-\epsilon \ \text{and}\ |\dot S|_*\le L \Big\}, \tag{2.4} $$

where $C^1([a,b],\mathbb{R})$ is the Banach space of continuously differentiable $[a,b]\to\mathbb{R}$ functions and $|S|_* = \sup_{a\le x\le b}|S(x)|$.
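As a quick check of the moment condition in (2.1), the following worked computation verifies that the standard Gaussian density $p_0$ satisfies it. It relies only on the classical formula for even Gaussian moments and on the assumption $\varsigma\ge 1$ stated above.

```latex
% Even moments of the standard Gaussian density p_0:
\int_{-\infty}^{+\infty} x^{2k}\,p_0(x)\,dx = (2k-1)!! , \qquad k\ge 1 .
% Hence, for any \varsigma \ge 1,
\sup_{k\ge 1}\frac{\int_{-\infty}^{+\infty}|x|^{2k}p_0(x)\,dx}{\varsigma^{k}(2k-1)!!}
  = \sup_{k\ge 1}\varsigma^{-k} \le 1 .
% The cases k = 1 and k = 2 give the variance 1 and the fourth moment 3 = 3!!.
```

For a general $p\in\mathcal{P}$ the same condition with $k=2$ gives $\mathbf{E}_p\xi_1^4\le 3\varsigma^2$, which is the form in which the fourth moment enters the proofs.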

3 Passage to a discrete time regression model

We will use as a basic procedure the pointwise procedure from Arkoun and Pergamenchtchikov (2016) at the points $(z_l)_{1\le l\le d}$ defined as

$$ z_l = a + \frac{l}{d}(b-a) \quad\text{and}\quad d = [\sqrt{n}], \tag{3.1} $$

where $[a]$ is the integer part of a number $a$. So we propose to use the first $\iota_l$ observations for the auxiliary estimation of $S(z_l)$. We set

$$ \widehat S_{\iota_l} = \frac{1}{A_{\iota_l}}\sum_{j=1}^{\iota_l} Q_{l,j}\,y_{j-1}y_j, \qquad A_{\iota_l} = \sum_{j=1}^{\iota_l} Q_{l,j}\,y^2_{j-1}, \tag{3.2} $$

where $Q_{l,j} = Q(u_{l,j})$ and the kernel $Q(\cdot)$ is the indicator function of the interval $[-1;1]$, i.e. $Q(u) = \mathbf{1}_{[-1,1]}(u)$. The points $(u_{l,j})$ are defined as

$$ u_{l,j} = \frac{x_j - z_l}{h}. \tag{3.3} $$

Note that to estimate $S(z_l)$ on the basis of the kernel estimator with the kernel $Q$ we use only the observations $(y_j)_{k_{1,l}\le j\le k_{2,l}}$ from the $h$-neighborhood of the point $z_l$, i.e.

$$ k_{1,l} = [n\tilde z_l - n\tilde h] + 1 \quad\text{and}\quad k_{2,l} = [n\tilde z_l + n\tilde h]\wedge n, \tag{3.4} $$

where $\tilde z_l = (z_l-a)/(b-a)$ and $\tilde h = h/(b-a)$. Note that, only for the last point $z_d = b$, we take $k_{2,d} = n$. We choose $\iota_l$ in (3.2) as

$$ \iota_l = k_{1,l} + q \quad\text{and}\quad q = q_n = [(n\tilde h)^{\mu_0}] \tag{3.5} $$

for some $0<\mu_0<1$. In the sequel for any $0\le k<m\le n$ we set

$$ A_{k,m} = \sum_{j=k+1}^{m} Q_{l,j}\,y^2_{j-1}. \tag{3.6} $$

Next, similarly to Arkoun (2011), we use a kernel sequential procedure based on the observations $(y_j)_{\iota_l\le j\le n}$. To transform the kernel estimator into a linear function of the observations, we replace the number of observations $n$ by the following stopping time

$$ \tau_l = \inf\{\iota_l+1\le k\le k_{2,l} : A_{\iota_l,k}\ge H_l\}, \tag{3.7} $$

where $\inf\{\emptyset\} = k_{2,l}$ and the positive threshold $H_l$ will be chosen as a positive random variable which is measurable with respect to the $\sigma$-field $\sigma\{y_1,\dots,y_{\iota_l}\}$.

Now we define the sequential estimator as

$$ S^*_l = \frac{1}{H_l}\Big(\sum_{j=\iota_l+1}^{\tau_l-1} Q_{l,j}\,y_{j-1}y_j + \kappa_l\,Q_{l,\tau_l}\,y_{\tau_l-1}y_{\tau_l}\Big)\mathbf{1}_{\Gamma_l}, \tag{3.8} $$

where $\Gamma_l = \{A_{\iota_l,k_{2,l}-1}\ge H_l\}$ and the correcting coefficient $0<\kappa_l\le 1$ on this set is defined by

$$ A_{\iota_l,\tau_l-1} + \kappa_l^2\,Q_{l,\tau_l}\,y^2_{\tau_l-1} = H_l. \tag{3.9} $$

Note that, to obtain the efficient kernel estimator of $S(z_l)$ we need to use all $k_{2,l}-\iota_l-1$ observations. Similarly to Konev and Pergamenshchikov (1984), one can show that $\tau_l\approx\gamma_l H_l$ as $H_l\to\infty$, where

$$ \gamma_l = 1 - S^2(z_l). \tag{3.10} $$

Therefore, one needs to choose $H_l$ as $(k_{2,l}-\iota_l-1)/\gamma_l$. Taking into account that the coefficients $\gamma_l$ are unknown, we define the threshold $H_l$ as

$$ H_l = \frac{1-\tilde\epsilon}{\tilde\gamma_l}\,(k_{2,l}-\iota_l-1) \quad\text{and}\quad \tilde\epsilon = \frac{1}{2+\ln n}, \tag{3.11} $$

where $\tilde\gamma_l = 1 - \widetilde S^2_{\iota_l}$ and $\widetilde S_{\iota_l}$ is the projection of the estimator $\widehat S_{\iota_l}$ onto the interval $]-1+\tilde\epsilon,\ 1-\tilde\epsilon[$, i.e.

$$ \widetilde S_{\iota_l} = \min\big(\max(\widehat S_{\iota_l},\,-1+\tilde\epsilon),\ 1-\tilde\epsilon\big). \tag{3.12} $$

To obtain uncorrelated stochastic terms in the kernel estimator for $S(z_l)$ we choose the bandwidth $h$ as

$$ h = \frac{b-a}{2d}. \tag{3.13} $$
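The steps (3.2)-(3.13) can be sketched in code as follows. This is a minimal illustration, not the authors' implementation: the function name `sequential_estimate`, the default $\mu_0 = 0.5$ and the early exit for very short windows are assumptions, and the indicator kernel is taken equal to 1 on the whole index window $k_{1,l},\dots,k_{2,l}$, which is what (3.4) describes up to rounding.

```python
import math
import numpy as np

def sequential_estimate(y, l, d, mu0=0.5):
    """Sketch of (3.2)-(3.12) at the grid point z_l; y[0..n] holds y_0, ..., y_n."""
    n = len(y) - 1
    nh = n / (2.0 * d)                                  # n * h~, with h = (b - a) / (2 d)
    k1 = max(int(math.floor(n * l / d - nh)) + 1, 1)    # (3.4)
    k2 = n if l == d else min(int(math.floor(n * l / d + nh)), n)
    iota = k1 + int(math.floor(nh ** mu0))              # (3.5)
    if iota >= k2 - 1:                                  # assumption: skip too-short windows
        return 0.0, False

    # Pilot estimator (3.2) on j = k1, ..., iota, then projection (3.12).
    j = np.arange(k1, iota + 1)
    A_pilot = np.sum(y[j - 1] ** 2)
    S_pilot = np.sum(y[j - 1] * y[j]) / A_pilot if A_pilot > 0 else 0.0
    eps = 1.0 / (2.0 + math.log(n))
    S_proj = min(max(S_pilot, -1.0 + eps), 1.0 - eps)
    H = (1.0 - eps) / (1.0 - S_proj ** 2) * (k2 - iota - 1)   # threshold (3.11)

    # Sequential part on j = iota + 1, ..., k2.
    j = np.arange(iota + 1, k2 + 1)
    sq = y[j - 1] ** 2
    A = np.cumsum(sq)                                   # A[i] = A_{iota, iota + 1 + i}
    if A[-2] < H:                                       # complement of Gamma_l
        return 0.0, False
    i_tau = int(np.argmax(A >= H))                      # position of tau_l, see (3.7)
    A_before = A[i_tau - 1] if i_tau > 0 else 0.0
    kappa = math.sqrt((H - A_before) / sq[i_tau])       # solves (3.9), 0 < kappa <= 1
    head = np.sum(y[j[:i_tau] - 1] * y[j[:i_tau]])
    last = kappa * y[j[i_tau] - 1] * y[j[i_tau]]
    return (head + last) / H, True                      # (3.8)

# Deterministic toy input (hypothetical): products y_{j-1} y_j alternate in sign.
y_demo = np.tile([1.0, 1.0, -1.0, -1.0], 2501)[:10001]  # n = 10000
est, on_gamma = sequential_estimate(y_demo, l=50, d=100)
```

The second return value indicates whether the event $\Gamma_l$ occurred; outside $\Gamma_l$ the estimator is set to zero, as in (3.8).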

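As to the estimator $\widehat S_{\iota_l}$, we can show the following property.

Theorem 3.1. The convergence rate in probability of the estimator (3.12) is more rapid than any power function, i.e. for any $\mathbf{b}>0$

$$ \lim_{n\to\infty} n^{\mathbf{b}}\max_{1\le l\le d}\ \sup_{S\in\Theta_{\epsilon,L}}\ \sup_{p\in\mathcal{P}} \mathbf{P}_{p,S}\big(|\widetilde S_{\iota_l} - S(z_l)|>\epsilon_0\big) = 0, \tag{3.14} $$

where $\epsilon_0 = \epsilon_0(n)\to 0$ as $n\to\infty$ such that $\lim_{n\to\infty} n^{\check\delta}$ …

Now we set

$$ Y_l = S^*_l(z_l)\,\mathbf{1}_\Gamma \quad\text{and}\quad \Gamma = \cap_{l=1}^{d}\Gamma_l. \tag{3.15} $$

Using the convergence (3.14), we study the probability properties of the set $\Gamma$ in the following theorem.

Theorem 3.2. For any $\mathbf{b}>0$, the probability of the set $\Gamma$ satisfies the following asymptotic equality

$$ \lim_{n\to\infty} n^{\mathbf{b}}\sup_{S\in\Theta_{\epsilon,L}} \mathbf{P}_{p,S}(\Gamma^c) = 0. \tag{3.16} $$

In view of this theorem we can neglect the set $\Gamma^c$. So, using the estimators (3.15) on the set $\Gamma$ we obtain the discrete time regression model (1.6) in which

$$ \xi^*_l = \frac{\sum_{j=\iota_l+1}^{\tau_l-1} Q_{l,j}\,y_{j-1}\xi_j + \kappa_l\,Q(u_{l,\tau_l})\,y_{\tau_l-1}\xi_{\tau_l}}{H_l} \tag{3.17} $$

and $\varpi_l = \varpi_{1,l} + \varpi_{2,l}$, where

$$ \varpi_{1,l} = \frac{\sum_{j=\iota_l+1}^{\tau_l-1} Q_{l,j}\,y^2_{j-1}\check s_{l,j} + \kappa_l^2\,Q(u_{l,\tau_l})\,y^2_{\tau_l-1}\check s_{l,\tau_l}}{H_l}, \qquad \check s_{l,j} = S(x_j) - S(z_l) $$

and

$$ \varpi_{2,l} = \frac{(\kappa_l - \kappa_l^2)\,Q(u_{l,\tau_l})\,y^2_{\tau_l-1}\,S(x_{\tau_l})}{H_l}. $$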
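The decomposition just stated can be checked directly. The following worked computation is a reconstruction from (1.1), (3.8), (3.9) and (3.17), with $\check s_{l,j} = S(x_j) - S(z_l)$; it shows why the sequential estimator equals $S(z_l)$ plus the two noise terms on the set $\Gamma_l$.

```latex
% Substitute y_j = S(x_j) y_{j-1} + \xi_j into (3.8) on the set \Gamma_l:
H_l S^*_l
 = \sum_{j=\iota_l+1}^{\tau_l-1} Q_{l,j}\,y_{j-1}^2 S(x_j)
   + \kappa_l Q_{l,\tau_l}\,y_{\tau_l-1}^2 S(x_{\tau_l})
   + H_l\,\xi^*_l .
% Write S(x_j) = S(z_l) + \check s_{l,j} and split \kappa_l = \kappa_l^2 + (\kappa_l - \kappa_l^2):
H_l S^*_l
 = S(z_l)\Big(A_{\iota_l,\tau_l-1} + \kappa_l^2 Q_{l,\tau_l}\,y_{\tau_l-1}^2\Big)
   + H_l\,\varpi_{1,l} + H_l\,\varpi_{2,l} + H_l\,\xi^*_l .
% By (3.9) the bracket equals H_l, hence
S^*_l = S(z_l) + \xi^*_l + \varpi_{1,l} + \varpi_{2,l} .
```

The role of the correcting coefficient $\kappa_l$ is visible in the second line: it makes the random normalization exactly equal to $H_l$, so that the coefficient of $S(z_l)$ is one.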

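Note that in the model (1.6) the random variables $(\xi^*_j)_{1\le j\le d}$ are defined only on the set $\Gamma$. For technical reasons we need to define these variables on the set $\Gamma^c$ as well. To this end, for any $j\ge 1$ we set

$$ \check Q_{l,j} = Q_{l,j}\,y_{j-1}\mathbf{1}_{\{j<k_{2,l}\}} + \sqrt{H_l}\,Q_{l,j}\,\mathbf{1}_{\{j=k_{2,l}\}} \tag{3.18} $$

and $\check A_{\iota_l,m} = \sum_{j=\iota_l+1}^{m}\check Q^2_{l,j}$. Note that for any $j\ge 1$ and $l\ne m$

$$ \check Q_{l,j}\,\check Q_{m,j} = 0 \tag{3.19} $$

and $\check A_{\iota_l,k_{2,l}}\ge H_l$. So now we can modify the stopping time (3.7) as

$$ \check\tau_l = \inf\{k\ge\iota_l+1 : \check A_{\iota_l,k}\ge H_l\}. \tag{3.20} $$

Obviously, $\check\tau_l\le k_{2,l}$ and $\check\tau_l = \tau_l$ on the set $\Gamma$ for any $1\le l\le d$. Now similarly to (3.9) we define the correction coefficient as

$$ \check A_{\iota_l,\check\tau_l-1} + \check\kappa_l^2\,\check Q^2_{l,\check\tau_l} = H_l. \tag{3.21} $$

It is clear that $0<\check\kappa_l\le 1$ and $\check\kappa_l = \kappa_l$ on the set $\Gamma$ for $1\le l\le d$. Using this coefficient we set

$$ \eta_l = \frac{\sum_{j=\iota_l+1}^{\check\tau_l-1}\check Q_{l,j}\,\xi_j + \check\kappa_l\,\check Q_{l,\check\tau_l}\,\xi_{\check\tau_l}}{H_l}. \tag{3.22} $$

Note that on the set $\Gamma$, for any $1\le l\le d$, the random variables $\eta_l = \xi^*_l$. Moreover (see Lemma A.2), for any $1\le l\le d$ and $p\in\mathcal{P}$

$$ \mathbf{E}_{p,S}(\eta_l\,|\,\mathcal{G}_l) = 0, \quad \mathbf{E}_{p,S}(\eta_l^2\,|\,\mathcal{G}_l) = \sigma_l^2 \quad\text{and}\quad \mathbf{E}_{p,S}(\eta_l^4\,|\,\mathcal{G}_l)\le\check m\,\sigma_l^4, \tag{3.23} $$

where $\sigma_l = H_l^{-1/2}$, $\mathcal{G}_l = \sigma\{\eta_1,\dots,\eta_{l-1},\sigma_l\}$ and $\check m = 4(144/\sqrt{3})^4 m^*_4$. It is clear that

$$ \sigma_{0,*}\le\min_{1\le l\le d}\sigma_l^2\le\max_{1\le l\le d}\sigma_l^2\le\sigma_{1,*}, \tag{3.24} $$

where

$$ \sigma_{0,*} = \frac{1-\epsilon^2}{2(1-\tilde\epsilon)\,nh} \quad\text{and}\quad \sigma_{1,*} = \frac{1}{(1-\tilde\epsilon)(2nh-q-3)}. $$

Now, taking into account that $|\varpi_{1,l}|\le Lh$, for any $S\in\Theta_{\epsilon,L}$ we obtain that

$$ \sup_{S\in\Theta_{\epsilon,L}}\mathbf{E}_{p,S}\,\mathbf{1}_\Gamma\,\varpi_l^2\le L^2h^2 + \frac{\check\upsilon_n}{(nh)^2}, \tag{3.25} $$

where $\check\upsilon_n = \sup_{p\in\mathcal{P}}\sup_{S\in\Theta_{\epsilon,L}}\mathbf{E}_{p,S}\max_{1\le j\le n} y_j^4$. The behavior of this coefficient is studied in the following theorem.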

Theorem 3.3. For any $\mathbf{b}>0$ the sequence $(\check\upsilon_n)_{n\ge 1}$ satisfies the following limiting equality

$$ \lim_{n\to\infty} n^{-\mathbf{b}}\,\check\upsilon_n = 0. \tag{3.26} $$

Remark 3.1. It should be noted that the property (3.26) means that the asymptotic behavior of the upper bound (3.25) is approximately as $h^{-2}$ when $n\to\infty$. We will use this in the oracle inequalities below.

Remark 3.2. It should be emphasized that to estimate the function $S$ in (1.1) we use the approach developed in Galtchouk and Pergamenshchikov (2011) for the sequential drift estimation problem in a stochastic differential equation. On the basis of the efficient sequential kernel procedure developed in Galtchouk and Pergamenshchikov (2005), Galtchouk and Pergamenshchikov (2006b) and Galtchouk and Pergamenshchikov (2015) with the kernel-indicator, the stochastic differential equation is replaced by a regression model. It should be noted that to obtain the efficient estimator one needs to take the kernel-indicator estimator. For this reason, in this paper, we use the kernel-indicator in the sequential estimator (3.8). It should also be noted that the sequential estimator (3.8) has the same form as in Arkoun and Pergamenchtchikov (2016), except for the last term, in which the correction coefficient is replaced by the square root of the coefficient used in Konev (2016). We modify this procedure in order to calculate the variance of the stochastic term (3.17).

4 Model selection

In this section we consider the nonparametric estimation problem in the non-asymptotic setting for the regression model (1.6) for some set $\Gamma\subseteq\Omega$. The design points $(z_l)_{1\le l\le d}$ are defined in (3.1). The function $S(\cdot)$ is unknown and has to be estimated from the observations $Y_1,\dots,Y_d$. Moreover, we assume that the unobserved random variables $(\eta_l)_{1\le l\le d}$ satisfy the properties (3.23) with some nonrandom constant $\check m>1$ and the known random positive coefficients $(\sigma_l)_{1\le l\le d}$ satisfy the inequality (3.24) for some nonrandom positive constants $\sigma_{0,*}$ and $\sigma_{1,*}$. Concerning the random sequence $\varpi = (\varpi_l)_{1\le l\le d}$ we suppose that

$$ u^*_d = \mathbf{E}_{p,S}\,\mathbf{1}_\Gamma\,\|\varpi\|^2_d < \infty. \tag{4.1} $$

The performance of any estimator $\widehat S$ will be measured by the empirical squared error

$$ \|\widehat S - S\|^2_d = (\widehat S - S,\ \widehat S - S)_d = \frac{b-a}{d}\sum_{l=1}^{d}\big(\widehat S(z_l) - S(z_l)\big)^2. $$

Now we fix a basis $(\phi_j)_{1\le j\le d}$ which is orthonormal for the empirical inner product:

$$ (\phi_i,\phi_j)_d = \frac{b-a}{d}\sum_{l=1}^{d}\phi_i(z_l)\phi_j(z_l) = \mathbf{1}_{\{i=j\}}. \tag{4.2} $$

For example, we can take the trigonometric basis $(\phi_j)_{j\ge 1}$ in $L_2[a,b]$ defined as

$$ \phi_1 = 1, \qquad \phi_j(x) = \sqrt{\frac{2}{b-a}}\ \mathrm{Tr}_j\big(2\pi[j/2]\,l_0(x)\big), \quad j\ge 2, \tag{4.3} $$

where the function $\mathrm{Tr}_j(x) = \cos(x)$ for even $j$ and $\mathrm{Tr}_j(x) = \sin(x)$ for odd $j$, $[x]$ denotes the integer part of $x$, and $l_0(x) = (x-a)/(b-a)$. Note that, using the orthonormality property (4.2), we can represent for any $1\le l\le d$ the function $S$ as

$$ S(z_l) = \sum_{j=1}^{d}\theta_{j,d}\,\phi_j(z_l) \quad\text{and}\quad \theta_{j,d} = (S,\phi_j)_d. \tag{4.4} $$
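The empirical orthonormality (4.2) of the trigonometric basis (4.3) on the grid (3.1) can be checked numerically. The sketch below is illustrative only and rests on two assumptions that are not stated in the text: $d$ is taken odd (for even $d$ the last cosine has empirical norm different from one), and the constant function is normalized as $1/\sqrt{b-a}$, which coincides with $\phi_1 = 1$ when $b-a = 1$. The name `trig_basis` is hypothetical.

```python
import numpy as np

def trig_basis(d, a=0.0, b=1.0):
    """Return the grid z_1..z_d and the d x d matrix Phi with Phi[j-1, l-1] = phi_j(z_l)."""
    l = np.arange(1, d + 1)
    z = a + l * (b - a) / d                  # grid (3.1)
    t = (z - a) / (b - a)                    # l_0(z_l)
    Phi = np.empty((d, d))
    Phi[0] = 1.0 / np.sqrt(b - a)            # equals phi_1 = 1 when b - a = 1
    for j in range(2, d + 1):
        arg = 2.0 * np.pi * (j // 2) * t
        tr = np.cos(arg) if j % 2 == 0 else np.sin(arg)   # Tr_j: cos for even j, sin for odd j
        Phi[j - 1] = np.sqrt(2.0 / (b - a)) * tr
    return z, Phi

d = 31                                       # assumption: odd number of grid points
z, Phi = trig_basis(d)
gram = (1.0 / d) * Phi @ Phi.T               # empirical inner products, (b - a) / d with b - a = 1
```

Under these assumptions `gram` is numerically the identity matrix, which is exactly the property (4.2) used in the representation (4.4).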

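So, to estimate the function $S$ we have to estimate the Fourier coefficients $(\theta_{j,d})_{1\le j\le d}$. To this end, we replace the function $S$ by the observations, i.e.

$$ \widehat\theta_{j,d} = \frac{b-a}{d}\sum_{l=1}^{d} Y_l\,\phi_j(z_l). \tag{4.5} $$

From (1.6) we obtain immediately the following regression scheme

$$ \widehat\theta_{j,d} = \theta_{j,d} + \zeta_{j,d} \quad\text{with}\quad \zeta_{j,d} = \sqrt{\frac{b-a}{d}}\,\eta_{j,d} + \varpi_{j,d}, \tag{4.6} $$

where

$$ \eta_{j,d} = \sqrt{\frac{b-a}{d}}\sum_{l=1}^{d}\eta_l\,\phi_j(z_l) \quad\text{and}\quad \varpi_{j,d} = \frac{b-a}{d}\sum_{l=1}^{d}\varpi_l\,\phi_j(z_l). $$

Note that the upper bound (3.24) and the Bunyakovsky-Cauchy-Schwarz inequality imply that

$$ |\varpi_{j,d}|\le\|\varpi\|_d\,\|\phi_j\|_d = \|\varpi\|_d. \tag{4.7} $$

We estimate the function $S$ on the grid (3.1) by the weighted least-squares estimator

$$ \widehat S_\lambda(z_l) = \sum_{j=1}^{d}\lambda(j)\,\widehat\theta_{j,d}\,\phi_j(z_l)\,\mathbf{1}_\Gamma, \qquad 1\le l\le d, \tag{4.8} $$

where the weight vector $\lambda = (\lambda(1),\dots,\lambda(d))'$ belongs to some finite set $\Lambda\subset[0,1]^d$ and the prime denotes transposition. We set for any $a\le t\le b$

$$ \widehat S_\lambda(t) = \sum_{l=1}^{d}\widehat S_\lambda(z_l)\,\mathbf{1}_{\{z_{l-1}<t\le z_l\}}. \tag{4.9} $$

Moreover, denoting $\lambda^2 = (\lambda^2(1),\dots,\lambda^2(d))'$ we define the following sets

$$ \Lambda_1 = \{\lambda^2,\ \lambda\in\Lambda\} \quad\text{and}\quad \Lambda_2 = \Lambda\cup\Lambda_1. \tag{4.10} $$

Denote by $\nu$ the cardinal number of the set $\Lambda$ and

$$ \nu_* = \max_{\lambda\in\Lambda}\sum_{j=1}^{d}\mathbf{1}_{\{\lambda(j)>0\}}. $$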
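The estimators (4.5), (4.8) and (4.9) amount to a few matrix operations. The sketch below is an illustration under assumptions: the event $\Gamma$ is taken to hold, the helper names `weighted_lse` and `step_function` are hypothetical, and a scaled identity matrix is used as a stand-in orthonormal basis purely to keep the example deterministic.

```python
import numpy as np

def weighted_lse(Y, Phi, lam, length=1.0):
    """Return (theta_hat, S_hat on the grid); Phi[j-1, l-1] = phi_j(z_l), length = b - a."""
    d = len(Y)
    theta_hat = (length / d) * (Phi @ Y)         # empirical Fourier coefficients (4.5)
    S_grid = (lam * theta_hat) @ Phi             # weighted estimator (4.8) on the set Gamma
    return theta_hat, S_grid

def step_function(values, a=0.0, b=1.0):
    """Piecewise-constant extension (4.9): value at z_l on the interval (z_{l-1}, z_l]."""
    d = len(values)
    def S(t):
        l = int(np.ceil((t - a) / (b - a) * d))  # index l with z_{l-1} < t <= z_l
        return values[min(max(l, 1), d) - 1]     # t = a is assigned to the first cell
    return S

# Stand-in basis (assumption): Phi = sqrt(d / (b - a)) * I satisfies (4.2).
Y = np.array([1.0, 2.0, 3.0, 4.0])
Phi = 2.0 * np.eye(4)
theta_hat, S_full = weighted_lse(Y, Phi, np.ones(4))               # all weights 1
_, S_trunc = weighted_lse(Y, Phi, np.array([1.0, 1.0, 0.0, 0.0]))  # keep two coefficients
S_fun = step_function(S_trunc)
```

With all weights equal to one the estimator reproduces the observations on the grid, so any smoothing comes entirely from the choice of $\lambda$.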

In order to obtain a good estimator, we have to write a rule to choose a weight vector $\lambda\in\Lambda$ in (4.8). We define the empirical squared risk as

$$ \mathrm{Err}_d(\lambda) = \|\widehat S_\lambda - S\|^2_d. $$

Using (4.4) and (4.8) we can rewrite this risk as

$$ \mathrm{Err}_d(\lambda) = \sum_{j=1}^{d}\lambda^2(j)\,\widehat\theta^2_{j,d} - 2\sum_{j=1}^{d}\lambda(j)\,\widehat\theta_{j,d}\,\theta_{j,d} + \sum_{j=1}^{d}\theta^2_{j,d}. \tag{4.11} $$

Since the coefficient $\theta_{j,d}$ is unknown, we need to replace the term $\widehat\theta_{j,d}\,\theta_{j,d}$ by an estimator, which we choose as

$$ \widetilde\theta_{j,d} = \widehat\theta^2_{j,d} - \frac{b-a}{d}\,s_{j,d} \quad\text{with}\quad s_{j,d} = \frac{b-a}{d}\sum_{l=1}^{d}\sigma_l^2\,\phi_j^2(z_l). \tag{4.12} $$

Note that from (3.24) and (4.2) it follows that

$$ s_{j,d}\le\sigma_{1,*}. \tag{4.13} $$

Similarly to Galtchouk and Pergamenshchikov (2011) one needs to introduce a penalty term in the cost function to compensate for this modification of the empirical risk. We choose it as

$$ P_d(\lambda) = \frac{b-a}{d}\sum_{j=1}^{d}\lambda^2(j)\,s_{j,d}. \tag{4.14} $$

Finally, we define the cost function in the following form

$$ J_d(\lambda) = \sum_{j=1}^{d}\lambda^2(j)\,\widehat\theta^2_{j,d} - 2\sum_{j=1}^{d}\lambda(j)\,\widetilde\theta_{j,d} + \delta\,P_d(\lambda), \tag{4.15} $$

where $0<\delta<1$ is some positive constant which will be chosen later. We set

$$ \widehat\lambda = \mathrm{argmin}_{\lambda\in\Lambda}\,J_d(\lambda) \tag{4.16} $$

and define an estimator of $S(t)$ of the form (4.9):

$$ \widehat S_*(t) = \widehat S_{\widehat\lambda}(t) \quad\text{for}\quad a\le t\le b. \tag{4.17} $$

Remark 4.1. We use the procedure (4.17) to estimate the function $S$ in the autoregressive model (1.1) through the regression scheme (1.6) generated by the sequential procedures (3.15).
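The selection rule (4.12)-(4.16) is a finite minimization and can be sketched as follows. The function name `select_weights` and the toy inputs are hypothetical; the code only evaluates the cost function $J_d$ as written in (4.15) over a finite list of weight vectors and returns the index of a minimizer.

```python
import numpy as np

def select_weights(theta_hat, s, Lambda, delta, length=1.0):
    """Return (index of argmin of J_d over Lambda, array of costs); length = b - a."""
    d = len(theta_hat)
    c = length / d
    theta_tilde = theta_hat ** 2 - c * s                 # estimator (4.12) of theta_hat * theta
    costs = []
    for lam in Lambda:
        penalty = c * np.sum(lam ** 2 * s)               # P_d(lambda), (4.14)
        cost = (np.sum(lam ** 2 * theta_hat ** 2)
                - 2.0 * np.sum(lam * theta_tilde)
                + delta * penalty)                       # J_d(lambda), (4.15)
        costs.append(cost)
    costs = np.array(costs)
    return int(np.argmin(costs)), costs                  # (4.16)

# Toy example (hypothetical numbers): one large and one vanishing coefficient.
theta_hat = np.array([2.0, 0.0])
s = np.array([1.0, 1.0])
Lambda = [np.array([1.0, 0.0]), np.array([1.0, 1.0]), np.array([0.0, 0.0])]
best, costs = select_weights(theta_hat, s, Lambda, delta=0.1)
```

In this toy case the rule keeps the first coefficient and discards the second, since including a coefficient whose estimate is below the noise level increases the cost.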

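5 Main results

In this section we formulate all main results. First we obtain the sharp oracle inequality for the model selection procedure (4.17) for the general regression model (1.6).

Theorem 5.1. There exists some constant $l^*>0$ such that for any set of weight vectors $\Lambda$, any $p\in\mathcal{P}$, any $n\ge 1$ and $0<\delta\le 1/12$, the procedure (4.17) satisfies the following oracle inequality

$$ \mathbf{E}_{p,S}\|\widehat S_* - S\|^2_d \le \frac{1+4\delta}{1-6\delta}\min_{\lambda\in\Lambda}\mathbf{E}_{p,S}\|\widehat S_\lambda - S\|^2_d + l^*\frac{\nu\varsigma^2}{\delta}\Big(\frac{\sigma^2_{1,*}}{\sigma_{0,*}\,d} + u^*_d + \delta^2\sqrt{\mathbf{P}_{p,S}(\Gamma^c)}\Big). \tag{5.1} $$

Theorem 5.2. There exists some constant $l^*>0$ such that for any set of weight vectors $\Lambda$, any continuously differentiable function $S$, any $p\in\mathcal{P}$, any $n\ge 1$ and $0<\delta\le 1/12$, the procedure (4.17) satisfies the following oracle inequality

$$ \mathcal{R}_p(\widehat S_*, S) \le \frac{(1+4\delta)(1+\delta)^2}{1-6\delta}\min_{\lambda\in\Lambda}\mathcal{R}_p(\widehat S_\lambda, S) + l^*\frac{\varsigma^2\nu}{\delta}\Big(\frac{\|\dot S\|^2}{d^2} + \frac{\sigma^2_{1,*}}{\sigma_{0,*}\,d} + u^*_d + \delta^2\sqrt{\mathbf{P}_{p,S}(\Gamma^c)}\Big). \tag{5.2} $$

Now we assume that the cardinal $\nu$ of $\Lambda$ and the parameter $\varsigma$ in the density family (2.1) are functions of the number of observations $n$, i.e. $\nu = \nu(n)$ and $\varsigma = \varsigma(n)$, such that for any $\check\delta>0$

$$ \lim_{n\to\infty}\frac{\nu(n)}{n^{\check\delta}} = 0. \tag{5.3} $$

Using Theorems 3.2-3.3 and the bounds (3.24)-(3.25) we obtain the oracle inequality for the estimation problem for the model (1.1).

Theorem 5.3. Assume that the conditions (2.2) and (5.3) hold. Then for any $p\in\mathcal{P}$, $S\in\Theta_{\epsilon,L}$, $n\ge 3$ and $0<\delta\le 1/12$, the procedure (4.17) satisfies the following oracle inequality

$$ \mathcal{R}_p(\widehat S_*, S) \le \frac{(1+4\delta)(1+\delta)^2}{1-6\delta}\min_{\lambda\in\Lambda}\mathcal{R}_p(\widehat S_\lambda, S) + \frac{\check B_n(p)}{\delta n}, \tag{5.4} $$

where the term $\check B_n(p)$ is such that for any $\check\delta>0$

$$ \lim_{n\to\infty}\frac{\check B_n(p)}{n^{\check\delta}} = 0. $$

We obtain the same inequality for the robust risks.

Theorem 5.4. Assume that the conditions (2.2) and (5.3) hold. Then for any $n\ge 3$, any $S\in\Theta_{\epsilon,L}$ and any $0<\delta\le 1/12$, the procedure (4.17) satisfies the following oracle inequality

$$ \mathcal{R}^*(\widehat S_*, S) \le \frac{(1+4\delta)(1+\delta)^2}{1-6\delta}\min_{\lambda\in\Lambda}\mathcal{R}^*(\widehat S_\lambda, S) + \frac{B^*_n}{\delta n}, \tag{5.5} $$

where the term $B^*_n$ is such that for any $\check\delta>0$

$$ \lim_{n\to\infty}\frac{B^*_n}{n^{\check\delta}} = 0. $$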

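It is well known that to obtain the efficiency property we need to specify the weight coefficients $(\lambda(j))_{1\le j\le n}$ (see, for example, Galtchouk and Pergamenshchikov (2009b)). Consider for some fixed $0<\varepsilon<1$ a numerical grid of the form

$$ \mathcal{A} = \{1,\dots,k^*\}\times\{\varepsilon,\dots,m\varepsilon\}, \tag{5.6} $$

where $m = [1/\varepsilon^2]$. We assume that both parameters $k^*\ge 1$ and $\varepsilon$ are functions of $n$, i.e. $k^* = k^*(n)$ and $\varepsilon = \varepsilon(n)$, such that

$$ \lim_{n\to\infty}k^*(n) = +\infty, \quad \lim_{n\to\infty}\frac{k^*(n)}{\ln n} = 0, \quad \lim_{n\to\infty}\varepsilon(n) = 0 \quad\text{and}\quad \lim_{n\to\infty}n^{\check\delta}\varepsilon(n) = +\infty \tag{5.7} $$

for any $\check\delta>0$. One can take, for example, for $n\ge 2$

$$ \varepsilon(n) = \frac{1}{\ln n} \quad\text{and}\quad k^*(n) = k^*_0 + \sqrt{\ln n}, \tag{5.8} $$

where $k^*_0\ge 0$ is some fixed constant. For each $\alpha = (\beta, l)\in\mathcal{A}$, we introduce the weight sequence $\lambda_\alpha = (\lambda_\alpha(j))_{1\le j\le n}$ with the elements

$$ \lambda_\alpha(j) = \mathbf{1}_{\{1\le j<j_*\}} + \big(1 - (j/\omega_\alpha)^\beta\big)\mathbf{1}_{\{j_*\le j\le\omega_\alpha\}}, \tag{5.9} $$

where $j_* = 1 + [\ln n]$, $\omega_\alpha = (d_\beta\,l\,n)^{1/(2\beta+1)}$ and

$$ d_\beta = \frac{(\beta+1)(2\beta+1)}{\pi^{2\beta}\beta}. $$

Now we define the set $\Lambda$ as

$$ \Lambda = \{\lambda_\alpha,\ \alpha\in\mathcal{A}\}. \tag{5.10} $$

Note that these weight coefficients are used in Konev and Pergamenshchikov (2012, 2015) for continuous time regression models to show the asymptotic efficiency. Note also that in this case the cardinal of the set $\Lambda$ is $\nu = k^* m$. It is clear that properties (5.7) imply condition (5.3).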
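A single weight vector of the form (5.9) can be computed as in the sketch below. It is an illustration under assumptions: the function name `pinsker_type_weights` is hypothetical, the pair $(\beta, l)$ is passed in directly rather than enumerated over a grid, and the vector is truncated to the first $d$ coordinates, since only these enter the estimator (4.8).

```python
import math
import numpy as np

def pinsker_type_weights(n, d, beta, l):
    """Weights lambda_alpha(j), j = 1..d, from (5.9) for alpha = (beta, l)."""
    j_star = 1 + int(math.floor(math.log(n)))                      # j_* = 1 + [ln n]
    d_beta = (beta + 1) * (2 * beta + 1) / (math.pi ** (2 * beta) * beta)
    omega = (d_beta * l * n) ** (1.0 / (2 * beta + 1))             # omega_alpha
    j = np.arange(1, d + 1)
    head = np.where(j < j_star, 1.0, 0.0)                          # 1 for 1 <= j < j_*
    mid = (j >= j_star) & (j <= omega)
    tail = np.where(mid, 1.0 - (j / omega) ** beta, 0.0)           # decaying part up to omega_alpha
    return head + tail

lam = pinsker_type_weights(n=10000, d=100, beta=1, l=1.0)
```

The weights equal one for the lowest frequencies, decay polynomially up to the cutoff $\omega_\alpha$ and vanish beyond it, so each $\lambda_\alpha$ lies in $[0,1]^d$ as required in Section 4.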

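6 Properties of the regression model (1.6)

In order to prove the oracle inequalities we need to study the conditions introduced in Konev and Pergamenshchikov (2012) for the general semi-martingale model. To this end we set for any $\lambda\in\mathbb{R}^d$ the functions

$$ B(\lambda) = \frac{b-a}{\sqrt{d}}\sum_{j=1}^{d}\lambda(j)\,\widetilde\eta_{j,d}, \qquad \widetilde\eta_{j,d} = \eta^2_{j,d} - \mathbf{E}_{p,S}\,\eta^2_{j,d}. \tag{6.1} $$

Property 6.1. For any $d\ge 1$ and any $\lambda = (\lambda_1,\dots,\lambda_d)\in\mathbb{R}^d$

$$ \mathbf{E}_{p,S}\,\mathbf{1}_\Gamma\,B^2(\lambda)\le 10(b-a)\,\sigma_{1,*}\,\check m\,\mathbf{E}_{p,S}P_d(\lambda). \tag{6.2} $$

Proof. First note that the random variable $\widetilde\eta_{j,d}$ can be represented as

$$ \widetilde\eta_{j,d} = \frac{b-a}{d}\sum_{l=1}^{d}\Big(\phi_j^2(z_l)\,\check\eta_l + 2\,\mathbf{1}_{\{l\ge 2\}}\,\check\upsilon_{j,l}\,\eta_l\Big), $$

where $\check\eta_l = \eta_l^2 - \sigma_l^2$ and $\check\upsilon_{j,l} = \phi_j(z_l)\sum_{r=1}^{l-1}\phi_j(z_r)\eta_r$. Therefore, we can rewrite the term $B(\lambda)$ as

$$ B(\lambda) = B_1(\lambda) + 2B_2(\lambda). $$

The terms $B_1(\lambda)$ and $B_2(\lambda)$ are defined as

$$ B_1(\lambda) = \frac{(b-a)^2}{d\sqrt{d}}\sum_{l=1}^{d}\psi_{1,l}(\lambda)\,\check\eta_l \quad\text{and}\quad B_2(\lambda) = \frac{(b-a)^2}{d\sqrt{d}}\sum_{l=2}^{d}\psi_{2,l}(\lambda)\,\eta_l, $$

where

$$ \psi_{1,l}(\lambda) = \sum_{j=1}^{d}\lambda(j)\,\phi_j^2(z_l) \quad\text{and}\quad \psi_{2,l}(\lambda) = \sum_{j=1}^{d}\lambda(j)\,\check\upsilon_{j,l}. $$

So,

$$ \mathbf{E}_{p,S}B^2(\lambda)\le 2\,\mathbf{E}_{p,S}B_1^2(\lambda) + 8\,\mathbf{E}_{p,S}B_2^2(\lambda). \tag{6.3} $$

Taking into account property (4.2) and the Bunyakovsky-Cauchy-Schwarz inequality we get

$$ \psi^2_{1,l}(\lambda)\le\sum_{j=1}^{d}\lambda^2(j)\,\phi_j^2(z_l)\sum_{j=1}^{d}\phi_j^2(z_l) = \frac{d}{b-a}\sum_{j=1}^{d}\lambda^2(j)\,\phi_j^2(z_l). $$

In view of properties (3.23) we obtain that

$$ \mathbf{E}_{p,S}B_1^2(\lambda) = \frac{(b-a)^4}{d^3}\sum_{l=1}^{d}\psi^2_{1,l}(\lambda)\,\mathbf{E}_{p,S}\check\eta_l^2 \le \frac{(b-a)^4}{d^3}\sum_{l=1}^{d}\psi^2_{1,l}(\lambda)\,\mathbf{E}_{p,S}\eta_l^4 \le \sigma_{1,*}\check m\,\frac{(b-a)^3}{d^2}\,\mathbf{E}_{p,S}\sum_{j=1}^{d}\lambda^2(j)\sum_{l=1}^{d}\sigma_l^2\,\phi_j^2(z_l) = \sigma_{1,*}(b-a)\,\check m\,\mathbf{E}_{p,S}P_d(\lambda). $$

To estimate the last term on the right-hand side of the inequality (6.3), noting that the term $\psi_{2,l}$ can be represented as

$$ \psi_{2,l}(\lambda) = \sum_{r=1}^{l-1} g_{l,r}\,\eta_r, $$

where $g_{l,r} = \sum_{j=1}^{d}\lambda(j)\,\phi_j(z_l)\,\phi_j(z_r)$, we use properties (3.23) to obtain

$$ \mathbf{E}_{p,S}B_2^2(\lambda) = \frac{(b-a)^4}{d^3}\sum_{l=2}^{d}\mathbf{E}_{p,S}\,\psi^2_{2,l}(\lambda)\,\eta_l^2 \le \sigma_{1,*}\frac{(b-a)^4}{d^3}\sum_{l=2}^{d}\mathbf{E}_{p,S}\,\psi^2_{2,l}(\lambda) = \sigma_{1,*}\frac{(b-a)^4}{d^3}\sum_{l=2}^{d}\sum_{r=1}^{l-1}g^2_{l,r}\,\mathbf{E}_{p,S}\sigma_r^2 \le \sigma_{1,*}\frac{(b-a)^4}{d^3}\sum_{r=1}^{d}\mathbf{E}_{p,S}\sigma_r^2\sum_{l=1}^{d}g^2_{l,r} = \sigma_{1,*}(b-a)\,\mathbf{E}_{p,S}P_d(\lambda). $$

Hence Property 6.1. □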

Now we need the following moment bound.

Property 6.2. For any nonrandom $v_1,\dots,v_d$

$$ \mathbf{E}\Big(\sum_{j=1}^{d}v_j\,\eta_{j,d}\Big)^2\le\sigma_{1,*}\sum_{j=1}^{d}v_j^2. \tag{6.4} $$

Proof. Note that

$$ \mathbf{E}\Big(\sum_{j=1}^{d}v_j\,\eta_{j,d}\Big)^2 = \frac{b-a}{d}\,\mathbf{E}\sum_{l=1}^{d}\sigma_l^2\Big(\sum_{j=1}^{d}v_j\,\phi_j(z_l)\Big)^2 \le \frac{\sigma_{1,*}(b-a)}{d}\sum_{l=1}^{d}\Big(\sum_{j=1}^{d}v_j\,\phi_j(z_l)\Big)^2. $$

By applying the orthonormality property (4.2) we obtain the desired inequality. Hence Property 6.2. □

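7 Proofs

7.1 Proof of Theorem 3.1

First recall that

$$ \widehat S_{\iota_l} = \frac{1}{A_{\iota_l}}\sum_{j=1}^{\iota_l}Q_{l,j}\,y_{j-1}y_j \quad\text{and}\quad \widetilde S_{\iota_l} = \min\big(\max(\widehat S_{\iota_l},\,-1+\tilde\epsilon),\ 1-\tilde\epsilon\big), $$

where $\tilde\epsilon = 1/(2+\ln n)$. Note that for sufficiently large $n$ we have $\tilde\epsilon<\epsilon$ and then $S(z_l)\in[-1+\tilde\epsilon;\ 1-\tilde\epsilon]$. We can write … $y^2_{j-1}$. Taking into account that $|x_j - z_l|\le h$ for $k_{1,l}\le j\le k_{2,l}$, we obtain that for any $S\in\Theta_{\epsilon,L}$,

$$ |\widehat S_{\iota_l} - S(z_l)|\le Lh + |I_n|. $$

So, for sufficiently large $n$

$$ \mathbf{P}_{p,S}\big(|\widetilde S_{\iota_l} - S(z_l)|>\epsilon_0\big) \le \mathbf{P}_{p,S}\Big(|I_n|>\frac{\epsilon_0}{2}\Big) \le \mathbf{P}_{p,S}\Big(|I_n|>\frac{\epsilon_0}{2},\ \Xi\Big) + \mathbf{P}_{p,S}(\Xi^c), \tag{7.1} $$

where $\Xi = \{|\Upsilon_{m_0,m_1}(z_l)|\le 1/2\}$, $m_0 = k_{1,l}-2$, $m_1 = \iota_l-1$ and $\Upsilon_{m_0,m_1}(z_l)$ is defined in (A.7). Hence we obtain the following inequality on the set $\Xi$:

$$ \sum_{j=k_{1,l}}^{\iota_l}y^2_{j-1} = (\iota_l - k_{1,l} + 1)\Big(\frac{1}{1-S^2(z_l)} + \Upsilon_{m_0,m_1}(z_l)\Big)\ge\frac{q}{2}. $$

Therefore, for any $\check p>2$,

$$ \mathbf{P}_{p,S}\Big(|I_n|>\frac{\epsilon_0}{2},\ \Xi\Big) \le \mathbf{P}_{p,S}\Big(\Big|\sum_{j=k_{1,l}}^{\iota_l}y_{j-1}\xi_j\Big|>\frac{q}{2}\Big) \le \frac{2^{\check p}}{q^{\check p}}\,\mathbf{E}_{p,S}\Big|\sum_{j=k_{1,l}}^{\iota_l}y_{j-1}\xi_j\Big|^{\check p}. \tag{7.2} $$

Using here the correlation inequality (A.2) and the bound (A.6), we obtain that

$$ \max_{1\le l\le d}\ \sup_{S\in\Theta_{\epsilon,L}}\ \sup_{p\in\mathcal{P}}\mathbf{E}_{p,S}\Big|\sum_{j=k_{1,l}}^{\iota_l}y_{j-1}\xi_j\Big|^{\check p}\le c_{\check p}\,q^{\check p/2}. $$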

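7.2 Proof of Theorem 3.2

First note that

$$ \mathbf{P}_{p,S}(\Gamma^c)\le\sum_{l=1}^{d}\mathbf{P}_{p,S}\big(A_{\iota_l,k_{2,l}-1}<H_l\big). $$

Moreover, note that in view of definition (A.7) the term $A_{\iota_l,k_{2,l}-1}$ can be represented as

$$ A_{\iota_l,k_{2,l}-1} = (m_{1,l}-m_{0,l})\Big(\frac{1}{\gamma_l} + \Upsilon_{m_{0,l},m_{1,l}}(z_l)\Big), $$

where $m_{0,l} = \iota_l-1$ and $m_{1,l} = k_{2,l}-2$. Taking into account the definition of $H_l$ in (3.11) and the fact that $0<\tilde\gamma_l,\gamma_l\le 1$ and that $|\tilde\gamma_l-\gamma_l|\le 2|\widetilde S_{\iota_l}-S(z_l)|$, we obtain

$$ \mathbf{P}_{p,S}\big(A_{\iota_l,k_{2,l}-1}<H_l\big) = \mathbf{P}_{p,S}\Big(\frac{1}{\gamma_l} + \Upsilon_{m_{0,l},m_{1,l}}(z_l)<\frac{1-\tilde\epsilon}{\tilde\gamma_l}\Big) \le \mathbf{P}_{p,S}\Big(\Big|\frac{1}{\gamma_l}-\frac{1}{\tilde\gamma_l}\Big|>\frac{\tilde\epsilon}{2}\Big) + \mathbf{P}_{p,S}\Big(|\Upsilon_{m_{0,l},m_{1,l}}(z_l)|>\frac{\tilde\epsilon}{2}\Big) \le \mathbf{P}_{p,S}\Big(|\widetilde S_{\iota_l}-S(z_l)|>\frac{\tilde\epsilon^3}{4}\Big) + \mathbf{P}_{p,S}\Big(|\Upsilon_{m_{0,l},m_{1,l}}(z_l)|>\frac{\tilde\epsilon}{2}\Big). $$

Applying here Theorem 3.1 and Lemma A.6 we obtain Theorem 3.2.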

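7.3 Proof of Theorem 3.3

Note that, for any $m>1$

$$ \mathbf{E}_{p,S}\max_{1\le j\le n}y_j^4 \le n^{\mathbf{b}/2} + \sum_{j=1}^{n}\int_{n^{\mathbf{b}/2}}^{+\infty}\mathbf{P}_{p,S}\big(y_j^4\ge z\big)\,dz \le n^{\mathbf{b}/2} + n\max_{1\le j\le n}\mathbf{E}_{p,S}|y_j|^{4m}\int_{n^{\mathbf{b}/2}}^{+\infty}z^{-m}\,dz = n^{\mathbf{b}/2} + \max_{1\le j\le n}\mathbf{E}_{p,S}|y_j|^{4m}\,\frac{n^{1-\mathbf{b}(m-1)/2}}{m-1}. $$

Choosing here $m>1+2/\mathbf{b}$ and using the bound (A.6) we obtain the property (3.26). Hence Theorem 3.3.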

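7.4 Proof of Theorem 5.1

First of all, note that on the set $\Gamma$ we can represent the empirical squared error $\mathrm{Err}_d(\lambda)$ in the form

$$ \mathrm{Err}_d(\lambda) = J_d(\lambda) + 2\sum_{j=1}^{d}\lambda(j)\,\theta'_{j,d} + \|S\|^2_d - \delta P_d(\lambda) \tag{7.3} $$

with $\theta'_{j,d} = \widetilde\theta_{j,d} - \theta_{j,d}\,\widehat\theta_{j,d}$. From (4.6) we find that

$$ \theta'_{j,d} = \theta_{j,d}\,\zeta_{j,d} + \frac{b-a}{d}\,\widetilde\eta_{j,d} + 2\sqrt{\frac{b-a}{d}}\,\eta_{j,d}\,\varpi_{j,d} + \varpi^2_{j,d}, $$

where $\widetilde\eta_{j,d} = \eta^2_{j,d} - s_{j,d}$. Now putting

$$ M(\lambda) = \sum_{j=1}^{d}\lambda(j)\,\theta_{j,d}\,\zeta_{j,d}, \tag{7.4} $$

we rewrite (7.3) as follows

$$ \mathrm{Err}_d(\lambda) = J_d(\lambda) + 2M(\lambda) + \frac{2}{\sqrt{d}}B(\lambda) + 2\Delta(\lambda) + \|S\|^2_d - \delta P_d(\lambda), \tag{7.5} $$

where $B(\lambda)$ is given in (6.1), $\Delta(\lambda) = \Delta_1(\lambda) + \Delta_2(\lambda)$,

$$ \Delta_1(\lambda) = \sum_{j=1}^{d}\lambda(j)\,\varpi^2_{j,d} \quad\text{and}\quad \Delta_2(\lambda) = 2\sqrt{\frac{b-a}{d}}\sum_{j=1}^{d}\lambda(j)\,\eta_{j,d}\,\varpi_{j,d}. $$

In view of Property 6.1, for any $\lambda\in\mathbb{R}^d$,

$$ \mathbf{E}_{p,S}\,\mathbf{1}_\Gamma\,B^2(\lambda)\le 10\,\sigma_{1,*}(b-a)\,\check m\,\mathbf{E}_{p,S}P_d(\lambda). \tag{7.6} $$

Note that the inequalities (3.24) imply that

$$ P_{0,d}(\lambda)\le P_d(\lambda)\le P_{1,d}(\lambda), \tag{7.7} $$

where

$$ P_{0,d}(\lambda) = \frac{\sigma_{0,*}(b-a)|\lambda|^2}{d} \quad\text{and}\quad P_{1,d}(\lambda) = \frac{\sigma_{1,*}(b-a)|\lambda|^2}{d}. $$

For $\Delta_1(\lambda)$, taking into account the properties of Fourier coefficients we obtain that

$$ \sup_{\lambda\in[0,1]^d}|\Delta_1(\lambda)|\le\sum_{j=1}^{d}\varpi^2_{j,d} = \|\varpi\|^2_d. \tag{7.8} $$

To estimate the term $\Delta_2(\lambda)$ we recall that, for any $\varepsilon>0$ and any $x,y\in\mathbb{R}$

$$ 2xy\le\varepsilon x^2 + \varepsilon^{-1}y^2. \tag{7.9} $$

Therefore, for some $0<\varepsilon<1$,

$$ |\Delta_2(\lambda)|\le\varepsilon\,\frac{b-a}{d}\sum_{j=1}^{d}\lambda^2(j)\,\eta^2_{j,d} + \frac{\|\varpi\|^2_d}{\varepsilon} \le \varepsilon P_d(\lambda) + \varepsilon\frac{|B(\lambda^2)|}{\sqrt{d}} + \frac{\|\varpi\|^2_d}{\varepsilon}, $$

where the vector $\lambda^2\in\Lambda_1$ is as in (4.10). Thus, for any $\lambda\in[0,1]^d$,

$$ |\Delta(\lambda)|\le\varepsilon P_d(\lambda) + \varepsilon\frac{|B(\lambda^2)|}{\sqrt{d}} + 2\varepsilon^{-1}\|\varpi\|^2_d. \tag{7.10} $$

Putting

$$ M_1(\lambda) = \frac{2B(\lambda)}{\sqrt{d}} + 2\Delta(\lambda), $$

we can rewrite the empirical risk (7.5) as

$$ \mathrm{Err}_d(\lambda) = J_d(\lambda) + 2M(\lambda) + M_1(\lambda) + \|S\|^2_d - \delta P_d(\lambda). \tag{7.11} $$

From (7.10) we obtain

$$ |M_1(\lambda)|\le\frac{2|B(\lambda)|}{\sqrt{d}} + \frac{2|B(\lambda^2)|}{\sqrt{d}} + 2\varepsilon P_d(\lambda) + 4\varepsilon^{-1}\|\varpi\|^2_d. $$

Moreover, setting

$$ B^* = \sup_{\lambda\in\Lambda}\Big(\frac{B^2(\lambda)}{P_d(\lambda)} + \frac{B^2(\lambda^2)}{P_d(\lambda^2)}\Big) $$

and taking into account that $P_d(\lambda^2)\le P_d(\lambda)$ for any $\lambda\in\Lambda$, we get

$$ \frac{2|B(\lambda)|}{\sqrt{d}} + \frac{2|B(\lambda^2)|}{\sqrt{d}}\le 2\varepsilon P_d(\lambda) + \varepsilon^{-1}\frac{B^*}{d}. $$

By choosing $\varepsilon = \delta/4$ we find

$$ |M_1(\lambda)|\le\delta P_d(\lambda) + \frac{16}{\delta}\Upsilon_d, \qquad \Upsilon_d = \frac{B^*}{4d} + \|\varpi\|^2_d. \tag{7.12} $$

Now from (7.11) we obtain that, for some fixed $\lambda_0$ from $\Lambda$,

$$ \mathrm{Err}_d(\widehat\lambda) - \mathrm{Err}_d(\lambda_0) = J_d(\widehat\lambda) - J_d(\lambda_0) + 2M(\widehat\mu) + M_1(\widehat\lambda) - \delta P_d(\widehat\lambda) - M_1(\lambda_0) + \delta P_d(\lambda_0), $$

where $\widehat\mu = \widehat\lambda - \lambda_0$. By the definition of $\widehat\lambda$ in (4.16) we obtain on the set $\Gamma$

$$ \mathrm{Err}_d(\widehat\lambda)\le\mathrm{Err}_d(\lambda_0) + 2M(\widehat\mu) + \frac{32\,\Upsilon_d}{\delta} + 2\delta P_d(\lambda_0). \tag{7.13} $$

From (7.6) and (7.7) it follows that

$$ \mathbf{E}_{p,S}\,\mathbf{1}_\Gamma B^* \le \sum_{\lambda\in\Lambda}\mathbf{E}_{p,S}\,\mathbf{1}_\Gamma\Big(\frac{B^2(\lambda)}{P_d(\lambda)} + \frac{B^2(\lambda^2)}{P_d(\lambda^2)}\Big) \le 10\,\sigma_{1,*}(b-a)\,\check m\sum_{\lambda\in\Lambda}\Big(\frac{P_{1,d}(\lambda)}{P_{0,d}(\lambda)} + \frac{P_{1,d}(\lambda^2)}{P_{0,d}(\lambda^2)}\Big) = 20\,\check m\,(b-a)\,\nu\,\sigma_* $$

and $\sigma_* = \sigma^2_{1,*}/\sigma_{0,*}$. Therefore, for $0<\delta<1$ this inequality allows us to bound $\Upsilon_d$ as

$$ \mathbf{E}_{p,S}\,\mathbf{1}_\Gamma\Upsilon_d\le\frac{5\,\check m\,(b-a)\,\sigma_*\nu}{d} + u^*_d, \tag{7.14} $$

where $u^*_d$ is given by (4.1).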

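Now we study the second term on the right-hand side of inequality (7.13). For any weight vector $\lambda\in\Lambda$ we set $\mu = \lambda - \lambda_0$. Then we decompose this term as

$$ M(\mu) = Z(\mu) + V(\mu) \tag{7.15} $$

with

$$ Z(\mu) = \sqrt{\frac{b-a}{d}}\sum_{j=1}^{d}\mu(j)\,\theta_{j,d}\,\eta_{j,d} \quad\text{and}\quad V(\mu) = \sum_{j=1}^{d}\mu(j)\,\theta_{j,d}\,\varpi_{j,d}. $$

We define now the weighted discrete Fourier transformation of $S$, i.e. we set

$$ \check S_\mu = \sum_{j=1}^{d}\mu(j)\,\theta_{j,d}\,\phi_j. \tag{7.16} $$

Now by using Property 6.2 we can estimate the term $Z(\mu)$ as

$$ \mathbf{E}_{p,S}\,\mathbf{1}_\Gamma Z^2(\mu)\le\frac{\sigma_{1,*}(b-a)}{d}\|\check S_\mu\|^2_d := \sigma_{1,*}(b-a)\,D(\mu). \tag{7.17} $$

Moreover, by the inequalities (7.9) with $\varepsilon = \delta$ and (7.8) we can estimate $V(\mu)$ as follows

$$ 2V(\mu) = 2\sum_{j=1}^{d}\mu(j)\,\theta_{j,d}\,\varpi_{j,d}\le\delta\|\check S_\mu\|^2_d + \frac{\|\varpi\|^2_d}{\delta}. \tag{7.18} $$

Setting

$$ Z^* = \sup_{\mu\in\Lambda-\lambda_0}\frac{Z^2(\mu)}{D(\mu)}, $$

we obtain on the set $\Gamma$

$$ 2M(\mu)\le 2\delta\|\check S_\mu\|^2_d + \frac{Z^*}{d\delta} + \frac{\|\varpi\|^2_d}{\delta}. \tag{7.19} $$

Note now that from (7.17) it follows that

$$ \mathbf{E}_{p,S}\,\mathbf{1}_\Gamma Z^*\le\sum_{\mu\in\Lambda-\lambda_0}\frac{\mathbf{E}_{p,S}\,\mathbf{1}_\Gamma Z^2(\mu)}{D(\mu)}\le\nu\,\sigma_{1,*}(b-a). \tag{7.20} $$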

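Now we estimate the first term on the right-hand side of the inequality (7.19). On the set $\Gamma$ we have

$$ \|\check S_\mu\|^2_d - \|\widehat S_\mu\|^2_d = \sum_{j=1}^{d}\mu^2(j)\big(\theta^2_{j,d} - \widehat\theta^2_{j,d}\big)\le -2\sum_{j=1}^{d}\mu^2(j)\,\theta_{j,d}\,\zeta_{j,d} = -2Z_1(\mu) - 2V_1(\mu), \tag{7.21} $$

where

$$ Z_1(\mu) = \sqrt{\frac{b-a}{d}}\sum_{j=1}^{d}\mu^2(j)\,\theta_{j,d}\,\eta_{j,d} \quad\text{and}\quad V_1(\mu) = \sum_{j=1}^{d}\mu^2(j)\,\theta_{j,d}\,\varpi_{j,d}. $$

Taking into account that $|\mu(j)|\le 1$, similarly to inequality (7.17), we find

$$ \mathbf{E}_{p,S}\,\mathbf{1}_\Gamma Z_1^2(\mu)\le\sigma_{1,*}(b-a)\,D(\mu). $$

Moreover, for the random variable

$$ Z_1^* = \sup_{\mu\in\Lambda-\lambda_0}\frac{Z_1^2(\mu)}{D(\mu)}, $$

we obtain the same upper bound as in (7.20), i.e.

$$ \mathbf{E}_{p,S}\,Z_1^*\,\mathbf{1}_\Gamma\le\nu\,\sigma_{1,*}(b-a). \tag{7.22} $$

Furthermore, similarly to (7.18) we estimate the second term in (7.21) as

$$ 2|V_1(\mu)|\le\delta\|\check S_\mu\|^2_d + \frac{\|\varpi\|^2_d}{\delta}. $$

Therefore, on the set $\Gamma$

$$ \|\check S_\mu\|^2_d\le\|\widehat S_\mu\|^2_d + 2\delta\|\check S_\mu\|^2_d + \frac{Z_1^*}{d\delta} + \frac{\|\varpi\|^2_d}{\delta}, $$

i.e.

$$ \|\check S_\mu\|^2_d\le\frac{1}{1-2\delta}\|\widehat S_\mu\|^2_d + \frac{1}{(1-2\delta)\delta}\Big(\frac{Z_1^*}{d} + \|\varpi\|^2_d\Big). \tag{7.23} $$

Using this inequality in (7.19) and putting $Z_2^* = Z^* + Z_1^*$ yield on the set $\Gamma$

$$ 2M(\widehat\mu)\le\frac{2\delta}{1-2\delta}\|\widehat S_{\widehat\mu}\|^2_d + \frac{1}{\delta(1-2\delta)}\Big(\frac{Z_2^*}{d} + \|\varpi\|^2_d\Big) \le \frac{4\delta\big(\mathrm{Err}_d(\widehat\lambda) + \mathrm{Err}_d(\lambda_0)\big)}{1-2\delta} + \frac{1}{\delta(1-2\delta)}\Big(\frac{Z_2^*}{d} + \|\varpi\|^2_d\Big). $$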

Therefore from the preceding inequality and (7.13) we obtain Errd(bλ)1Γ≤ 1 + 2δ 1−6δErrd(λ0)1Γ + 32(1−2δ) δ(1−6δ) Υd1Γ + 1 δ(1−6δ) _Z∗ 2 d +k$k 2 d 1_Γ+2δ(1−2δ) 1−6δ Pd(λ0)1Γ

and through the inequalities (7.14), (7.20) and (7.22) we estimate the empirical risk as E_p,SErr_d(bλ)1_Γ ≤ 1 + 2δ 1−6δEp,SErrd(λ0)1Γ + 32(1−2δ) δ(1−6δ) _{5 ˇ}_m_σ ∗ν(b−a) d +u ∗ d + 1 δ(1−6δ) ₂_νσ 1,∗(b−a) d +u ∗ d +2δ(1−2δ) 1−6δ Ep,S1ΓPd(λ0).

Taking into account thatσ_∗_,₁≤σ_∗and that 1−6δ >1/2 for 0< δ <1/12, we get

E_p,SErr_d(bλ)1_Γ ≤ 1 + 2δ 1−6δEp,SErrd(λ0)1Γ+ 320 δ _{( ˇ}_m_{+ 1)}_σ ∗ν(b−a) d +u ∗ d +2δ(1−2δ) 1−6δ Ep,S1ΓPd(λ0).

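By applying Lemma A.4 with $\varepsilon=2\delta$ we get that

$$\mathbf{E}_{p,S}\mathrm{Err}_d(\widehat\lambda)\mathbf{1}_\Gamma \le \frac{1+4\delta}{1-6\delta}\,\mathbf{E}_{p,S}\mathrm{Err}_d(\lambda_0)\mathbf{1}_\Gamma + \frac{320}{\delta}\left(\frac{(\check m+1)\,\sigma_*\nu(b-a)}{d}+3u_d^*\right) + 10\delta\sqrt{\sigma_{1,*}\,\check m\,\mathbf{P}_{p,S}(\Gamma^c)}.$$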

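Taking into account the definition of $\check m$ in (3.23) and that $m_4^*\le 3\varsigma^2$, and replacing $\mathbf{E}_{p,S}\mathrm{Err}_d(\widehat\lambda)\mathbf{1}_\Gamma$ and $\mathbf{E}_{p,S}\mathrm{Err}_d(\lambda_0)\mathbf{1}_\Gamma$ by

$$\mathbf{E}_{p,S}\|\widehat S_*-S\|_d^2-\|S\|_d^2\,\mathbf{P}_{p,S}(\Gamma^c) \quad\text{and}\quad \mathbf{E}_{p,S}\|\widehat S_{\lambda_0}-S\|_d^2-\|S\|_d^2\,\mathbf{P}_{p,S}(\Gamma^c)$$

respectively, we come to the inequality (5.1). Hence Theorem 5.1.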

Acknowledgements. The last author was partially supported by the Russian Federal Professor program (project no. 1.472.2016/1.4, the Ministry of Education and Science of the Russian Federation), the research project no. 2.3208.2017/4.6 (the Ministry of Education and Science of the Russian Federation), by RFBR Grant 16-01-00121 A and by "The Tomsk State University competitiveness improvement program" grant 8.1.18.2018.

A Appendix

A.1 Burkholder inequality

We need the following inequality from Shiryaev (2004).

Lemma A.1. Let $(M_k)_{1\le k\le n}$ be a martingale. Then for any $q>1$

$$\mathbf{E}|M_n|^q \le b_q^*\,\mathbf{E}\left(\sum_{j=1}^{n}(M_j-M_{j-1})^2\right)^{q/2}, \tag{A.1}$$

where the coefficient $b_q^* = 18\,q^{3/2}/(q-1)^{1/2}$.
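As a numerical illustration of (A.1), the sketch below estimates both sides by Monte Carlo for $q=4$. The particular martingale (Gaussian increments scaled by a weight that depends only on the past, started at $M_0=0$), the function names and the sample sizes are illustrative assumptions, not part of the paper; the check is only a plausibility test of the inequality with the constant as printed above.

```python
import math
import random


def burkholder_constant(q: float) -> float:
    """Coefficient b*_q = 18 q^{3/2} / (q - 1)^{1/2} as printed in Lemma A.1."""
    return 18.0 * q ** 1.5 / math.sqrt(q - 1.0)


def simulate_moments(n: int, paths: int, q: float, seed: int = 0):
    """Monte Carlo estimates of E|M_n|^q and E(sum_j (M_j - M_{j-1})^2)^{q/2}.

    Hypothetical example martingale: M_j = M_{j-1} + t_j * xi_j with M_0 = 0,
    xi_j i.i.d. standard Gaussian and t_j a function of M_{j-1} only.
    """
    rng = random.Random(seed)
    lhs_sum = 0.0
    rhs_sum = 0.0
    for _ in range(paths):
        m = 0.0   # current value of the martingale
        qv = 0.0  # running sum of squared increments
        for _ in range(n):
            t = 1.0 / (1.0 + abs(m))      # predictable weight
            inc = t * rng.gauss(0.0, 1.0)
            m += inc
            qv += inc * inc
        lhs_sum += abs(m) ** q
        rhs_sum += qv ** (q / 2.0)
    return lhs_sum / paths, rhs_sum / paths


q = 4.0
lhs, rhs = simulate_moments(n=50, paths=5000, q=q)
bound = burkholder_constant(q) * rhs
print(f"E|M_n|^q ~ {lhs:.4f}   b*_q * E(QV^(q/2)) ~ {bound:.4f}")
```

For $q=4$ the constant equals $144/\sqrt{3}\approx 83.1$, so the simulated left-hand side sits far below the bound; the inequality is not sharp for this example.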

A.2 Properties of the sequential procedures

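Lemma A.2. The properties (3.23) hold for the random variables $(\eta_l)_{1\le l\le d}$ defined in (3.22).

Proof. First, we set $\mathcal{F}_j=\sigma\{\xi_1,\ldots,\xi_j\}$ for $1\le j\le n$ and, as usual, $\mathcal{F}_0=\{\Omega,\emptyset\}$. Moreover, note that

$$\eta_l=\sum_{j=1}^{n}\check t_{l,j}\,\xi_j \quad\text{and}\quad \check t_{l,j}=\sigma_l^2\,\mathbf{1}_{\{\iota_l\le j<\check\tau_l\}}\,\check Q_{l,j}+\mathbf{1}_{\{j=\check\tau_l\}}\,\check\kappa_l\,\check Q_{l,\check\tau_l}.$$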

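Note that $\check t_{l,j}$ is $\mathcal{F}_{j-1}$-measurable for any $1\le j\le n$ and that

$$\sum_{j=1}^{n}\check t_{l,j}^{\,2}=\sigma_l^2.$$

Note also that $\mathcal{G}_l=\sigma\{\eta_1,\ldots,\eta_{l-1},\sigma_l\}\subset\mathcal{F}_{\iota_l}$. Noting that

$$\mathbf{E}\left(\eta_l\,|\,\mathcal{F}_{\iota_l}\right)=0 \quad\text{and}\quad \mathbf{E}\left(\eta_l^2\,|\,\mathcal{F}_{\iota_l}\right)=1,$$

we obtain the first two equalities in (3.23). As to the last inequality, note that through (A.1) we can write

$$\mathbf{E}_{p,S}\left(\Big(\sum_{j=1}^{n}\check t_{l,j}\,\xi_j\Big)^{4}\,\Big|\,\mathcal{F}_{\iota_l}\right) \le b_4^*\,\mathbf{E}_{p,S}\left(\Big(\sum_{j=1}^{n}\check t_{l,j}^{\,2}\,\xi_j^2\Big)^{2}\,\Big|\,\mathcal{F}_{\iota_l}\right).$$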

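Now, note that

$$\Big(\sum_{j=1}^{n}\check t_{l,j}^{\,2}\,\xi_j^2\Big)^{2} \le 2\sigma_l^4 + 2\Big(\sum_{j=1}^{n}\check t_{l,j}^{\,2}\,\widetilde\xi_j\Big)^{2},$$

where $\widetilde\xi_j=\xi_j^2-1$. Taking into account that

$$\mathbf{E}_{p,S}\left(\Big(\sum_{j=1}^{n}\check t_{l,j}^{\,2}\,\widetilde\xi_j\Big)^{2}\,\Big|\,\mathcal{F}_{\iota_l}\right) = \mathbf{E}_p\widetilde\xi_1^{\,2}\sum_{j=1}^{n}\check t_{l,j}^{\,4} \le \sigma_l^4\,\mathbf{E}_p\widetilde\xi_1^{\,2},$$

we obtain the last bound in (3.23). Hence Lemma A.2. $\Box$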

A.3 Correlation inequality

Now we give the correlation inequality from Galtchouk and Pergamenshchikov (2013).

Lemma A.3. Let $(\Omega,\mathcal{F},(\mathcal{F}_j)_{1\le j\le n},\mathbf{P})$ be a filtered probability space and $(X_j,\mathcal{F}_j)_{1\le j\le n}$

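A.4 Upper bound for the penalty term

Lemma A.4. For sufficiently large $n$ and $0<\varepsilon<1$,

$$\mathbf{E}_{p,S}P_d(\lambda) \le \frac{1}{1-\varepsilon}\,\mathbf{E}_{p,S}\mathrm{Err}_d(\lambda)\mathbf{1}_\Gamma + \frac{u_d^*}{(1-\varepsilon)\varepsilon} + \frac{10}{1-\varepsilon}\sqrt{\sigma_{1,*}\,\check m\,\mathbf{P}_{p,S}(\Gamma^c)}.$$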

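Proof. Indeed, by the definition of $\mathrm{Err}_d(\lambda)$ on the set $\Gamma$ we have

$$\mathrm{Err}_d(\lambda) = \sum_{j=1}^{d}\left((\lambda(j)-1)\theta_{j,d}+\lambda(j)\zeta_{j,d}\right)^2 = \sum_{j=1}^{d}\left((\lambda(j)-1)\theta_{j,d}+\lambda(j)\varpi_{j,d}+\lambda(j)\sqrt{\frac{b-a}{d}}\,\eta_{j,d}\right)^{2}.$$

Therefore, putting

$$I_1=\sum_{j=1}^{d}\lambda(j)(\lambda(j)-1)\,\theta_{j,d}\,\eta_{j,d} \quad\text{and}\quad I_2=\sum_{j=1}^{d}\lambda^2(j)\,\varpi_{j,d}\,\eta_{j,d},$$

we get on the set $\Gamma$ the following lower bound for the empirical risk:

$$\mathrm{Err}_d(\lambda) \ge \frac{b-a}{d}\sum_{j=1}^{d}\lambda^2(j)\,\eta_{j,d}^2 + 2\sqrt{\frac{b-a}{d}}\,I_1 + 2\sqrt{\frac{b-a}{d}}\,I_2.$$

Taking into account that for $0<\varepsilon<1$,

$$2\sqrt{\frac{b-a}{d}}\,|I_2| \le \varepsilon\,\frac{b-a}{d}\sum_{j=1}^{d}\lambda^2(j)\,\eta_{j,d}^2 + \frac{\|\varpi\|_d^2}{\varepsilon},$$

we get

$$\mathrm{Err}_d(\lambda) \ge (1-\varepsilon)\,\frac{b-a}{d}\sum_{j=1}^{d}\lambda^2(j)\,\eta_{j,d}^2 + 2\sqrt{\frac{b-a}{d}}\,I_1 - \frac{\|\varpi\|_d^2}{\varepsilon}. \tag{A.3}$$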
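The bound on the cross term $I_2$ is the elementary inequality $2|xy|\le\varepsilon x^2+y^2/\varepsilon$ applied term by term. A sketch of that step, assuming $|\lambda(j)|\le 1$ and that $\|\varpi\|_d^2$ dominates $\sum_j\varpi_{j,d}^2$ (the exact normalisation of $\|\cdot\|_d$ is fixed earlier in the paper and is not restated here):

```latex
\begin{align*}
0 \le \Big(\sqrt{\varepsilon}\,|x| - \tfrac{|y|}{\sqrt{\varepsilon}}\Big)^{2}
  &\;\Longrightarrow\; 2|xy| \le \varepsilon x^{2} + \frac{y^{2}}{\varepsilon},\\[4pt]
x_j = \sqrt{\tfrac{b-a}{d}}\;\lambda(j)\,\eta_{j,d},\qquad
y_j = \lambda(j)\,\varpi_{j,d}
  &\;\Longrightarrow\;
  2\sqrt{\tfrac{b-a}{d}}\;|I_2|
  \le \sum_{j=1}^{d} 2|x_j y_j|
  \le \varepsilon\,\frac{b-a}{d}\sum_{j=1}^{d}\lambda^{2}(j)\,\eta_{j,d}^{2}
   + \frac{1}{\varepsilon}\sum_{j=1}^{d}\lambda^{2}(j)\,\varpi_{j,d}^{2}.
\end{align*}
```

Since $\lambda^2(j)\le 1$, the last sum is at most $\sum_j\varpi_{j,d}^2$, which gives the stated bound under the assumption on $\|\varpi\|_d^2$.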

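Let us consider the first term in (A.3). We have

$$\mathbf{E}_{p,S}\mathbf{1}_\Gamma\sum_{j=1}^{d}\lambda^2(j)\,\eta_{j,d}^2 = \mathbf{E}_{p,S}\sum_{j=1}^{d}\lambda^2(j)\,s_{j,d} - \mathbf{E}_{p,S}\mathbf{1}_{\Gamma^c}\sum_{j=1}^{d}\lambda^2(j)\,\eta_{j,d}^2.$$

Using the correlation inequality (A.2) and the upper bound for the fourth moment in (3.23) we obtain

$$\mathbf{E}_{p,S}\,\eta_{j,d}^4 \le 64\,\check m\,\sigma_{1,*}^2. \tag{A.4}$$

This implies

$$\mathbf{E}_{p,S}\mathbf{1}_\Gamma\sum_{j=1}^{d}\lambda^2(j)\,\eta_{j,d}^2 \ge \mathbf{E}_{p,S}\sum_{j=1}^{d}\lambda^2(j)\,s_{j,d} - 8\,\sigma_{1,*}\,d\,\sqrt{\check m\,\mathbf{P}_{p,S}(\Gamma^c)}. \tag{A.5}$$

Therefore, we obtain

$$\frac{b-a}{d}\,\mathbf{E}_{p,S}\mathbf{1}_\Gamma\sum_{j=1}^{d}\lambda^2(j)\,\eta_{j,d}^2 \ge \mathbf{E}_{p,S}P_d(\lambda) - 8(b-a)\,\sigma_{1,*}\sqrt{\check m\,\mathbf{P}_{p,S}(\Gamma^c)}.$$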

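Moreover, note that $\mathbf{E}_{p,S}I_1=0$ and that, in view of Theorem (6.2),

$$\mathbf{E}_{p,S}I_1^2 \le \sigma_{1,*}\|S\|_d^2.$$

So, recalling that $\|S\|_d\le b-a$, we estimate $\mathbf{E}_{p,S}I_1\mathbf{1}_\Gamma$ as

$$\left|\mathbf{E}_{p,S}I_1\mathbf{1}_\Gamma\right| = \left|\mathbf{E}_{p,S}I_1\mathbf{1}_{\Gamma^c}\right| \le \sqrt{\sigma_{1,*}}\,\sqrt{\mathbf{P}_{p,S}(\Gamma^c)}.$$

Hence Lemma A.4.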

A.5 Properties of the model (1.1)

Lemma A.5. For all $t\in\mathbb{N}^*$ and $0<\epsilon<1$, the random variables $y_k$ in (1.1) satisfy the following:

$$\sup_{n\ge 1}\ \sup_{0\le k\le n}\ \sup_{S\in\Theta_{\epsilon,L}}\mathbf{E}_{p,S}\,y_k^{2t}<\infty. \tag{A.6}$$

Proof. This lemma is shown in Arkoun (2011) (Lemma A.1).

We set

$$\Upsilon_{m_0,m_1}(z_l) = \frac{1}{m_1-m_0}\sum_{j=m_0+1}^{m_1}y_j^2 - \frac{1}{\gamma_l}, \tag{A.7}$$

where $(k_{1,l}-2)_+\le m_0<m_1\le k_{2,l}$, $(a)_+$ is the positive part of $a$ and $\gamma_l$ is defined in (3.10).
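To illustrate the uniform moment bound (A.6) in the simplest case $t=1$, the sketch below computes the second moments exactly through the variance recursion. It assumes, as a labeled assumption rather than a restatement of the paper, that the model (1.1) has the form $y_k=S(x_k)\,y_{k-1}+\xi_k$ with $x_k=a+(b-a)k/n$, that the noise is centered with unit variance and independent of the past, that $y_0=0$, and that $|S|\le 1-\epsilon$ on $[a,b]$. The function `second_moments` and the test function `S` are hypothetical.

```python
import math


def second_moments(S, n, a=0.0, b=1.0, y0_sq=0.0):
    """Exact values of E y_k^2, k = 0..n, under the assumed AR(1) form.

    With centered unit-variance noise independent of the past,
    E y_k^2 = S(x_k)^2 * E y_{k-1}^2 + 1.
    """
    v = y0_sq
    moments = [v]
    for k in range(1, n + 1):
        x_k = a + (b - a) * k / n
        v = S(x_k) ** 2 * v + 1.0
        moments.append(v)
    return moments


eps = 0.2


def S(x):
    """Hypothetical coefficient function with sup |S| = 1 - eps."""
    return (1.0 - eps) * math.cos(2.0 * math.pi * x)


# Geometric-series bound: sum_{i >= 0} (1 - eps)^(2 i).
bound = 1.0 / (1.0 - (1.0 - eps) ** 2)
worst = max(max(second_moments(S, n)) for n in (10, 100, 1000))
print(f"largest E y_k^2 over the tested n: {worst:.4f}, bound: {bound:.4f}")
```

The bound does not depend on $n$ or $k$, which is the uniformity expressed by the three suprema in (A.6); higher moments ($t>1$) need the argument of Arkoun (2011) and are not covered by this sketch.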

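Lemma A.6. Assume that the bounds $m_0$ and $m_1$ in (A.7) are such that for some $0<\epsilon_1<1/2$

$$\liminf_{n\to\infty}\ n^{-\epsilon_1}(m_1-m_0)>0.$$

Then, for any $b>0$,

$$\lim_{n\to\infty} n^{b}\max_{1\le l\le d}\ \sup_{S\in\Theta_{\epsilon,L}}\ \sup_{p\in\mathcal{P}}\mathbf{P}_{p,S}\left(\left|\Upsilon_{m_0,m_1}(z_l)\right|>\epsilon_0\right)=0, \tag{A.8}$$

where $\epsilon_0=\epsilon_0(n)\to 0$ as $n\to\infty$ is such that $\lim_{n\to\infty}n^{\check\delta}\epsilon_0=\infty$ for any $\check\delta>0$.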

Proof. This lemma is shown in Arkoun (2011) (Lemma A.2).

A.6 Properties of the norms

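Lemma A.7. Let $f$ be an absolutely continuous $[a,b]\to\mathbb{R}$ function with $\|\dot f\|<\infty$ and $g$ be a simple $[a,b]\to\mathbb{R}$ function of the form

$$g(t)=\sum_{j=1}^{p}c_j\,\chi_{(t_{j-1},t_j]}(t),$$

where $c_j$ are some constants. Then for all $\widetilde\varepsilon>0$, the function $\Delta=f-g$ satisfies the following inequalities:

$$\|\Delta\|^2 \le (1+\widetilde\varepsilon)\|\Delta\|_d^2 + \left(1+\frac{1}{\widetilde\varepsilon}\right)\frac{\|\dot f\|^2}{d^2}(b-a)^2$$

and

$$\|\Delta\|_d^2 \le (1+\widetilde\varepsilon)\|\Delta\|^2 + \left(1+\frac{1}{\widetilde\varepsilon}\right)\frac{\|\dot f\|^2}{d^2}(b-a)^2.$$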

Proof. Lemma A.7 is proven in Konev and Pergamenshchikov (2015) (Lemma A.2).
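The two inequalities of Lemma A.7 can be checked numerically on a concrete pair $(f,g)$. The sketch below makes several assumptions that are not restated in this section: the partition points are $t_j=a+j(b-a)/d$ with $p=d$, the empirical norm is $\|\Delta\|_d^2=\frac{b-a}{d}\sum_{j=1}^{d}\Delta^2(t_j)$, and $\|\cdot\|$ is the usual $L_2[a,b]$ norm. The helper `squared_norms`, the test function $f$ and the step heights $c_j$ are hypothetical choices for illustration.

```python
import math


def squared_norms(f, c, a, b, sub=400):
    """Return (L2 squared norm, grid squared norm) of Delta = f - g.

    g is the step function equal to c[j-1] on the cell (t_{j-1}, t_j],
    where t_j = a + j * (b - a) / d and d = len(c).
    """
    d = len(c)
    h = (b - a) / d
    # Grid norm (assumed definition): h * sum_j Delta(t_j)^2.
    grid = h * sum((f(a + j * h) - c[j - 1]) ** 2 for j in range(1, d + 1))
    # L2 norm: midpoint rule inside each cell, where g is constant.
    l2 = 0.0
    step = h / sub
    for j in range(1, d + 1):
        left = a + (j - 1) * h
        for i in range(sub):
            t = left + (i + 0.5) * step
            l2 += (f(t) - c[j - 1]) ** 2 * step
    return l2, grid


a, b, d = 0.0, 1.0, 20


def f(t):
    return math.sin(2.0 * math.pi * t)


# Integral of (2*pi*cos(2*pi*t))^2 over [0, 1] equals 2*pi^2.
fdot_sq = 2.0 * math.pi ** 2
c = [math.cos(j) for j in range(1, d + 1)]  # arbitrary step heights
eps = 0.5                                   # plays the role of tilde-epsilon

l2, grid = squared_norms(f, c, a, b)
remainder = (1.0 + 1.0 / eps) * fdot_sq * (b - a) ** 2 / d ** 2
first_ok = l2 <= (1.0 + eps) * grid + remainder
second_ok = grid <= (1.0 + eps) * l2 + remainder
print(first_ok, second_ok)
```

Because $g$ is constant on each cell, $\Delta(t)-\Delta(t_j)=f(t)-f(t_j)$ there, which is why only $\|\dot f\|$ enters the remainder term.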

References

Anderson, T. W. (1994): The Statistical Analysis of Time Series, New York: Wiley.

Arkoun, O. and Pergamenchtchikov, S. (2008): Nonparametric Estimation for an Autoregressive Model, Journal of Mathematics and Mechanics of Tomsk State University 2: 20-30.

Arkoun, O. (2011): Sequential Adaptive Estimators in Nonparametric Autoregressive Models, Sequential Analysis 30: 228-246.

Arkoun, O. and Pergamenchtchikov, S. (2016): Sequential Robust Estimation for Nonparametric Autoregressive Models, Sequential Analysis 35: 489-515.

Barron, A., Birgé, L. and Massart, P. (1999): Risk bounds for model selection via penalization, Probab. Theory Related Fields 113: 301-413.

Belitser, E. (2000): Recursive Estimation of a Drifted Autoregressive Parameter, The Annals of Statistics 28: 860-870.

Dahlhaus, R. (1996a): Maximum Likelihood Estimation and Model Selection for Locally Stationary Processes, Journal of Nonparametric Statistics 6: 171-191.

Dahlhaus, R. (1996b): On the Kullback-Leibler Information Divergence of Locally Stationary Processes, Stochastic Processes and Their Applications 62: 139-168.

Fan, J. and Zhang, W. (2008): Statistical Methods with Varying Coefficient Models, Statistics and Its Interface 1: 179-195.

Fourdrinier, D. and Pergamenshchikov, S. (2007): Improved model selection method for a regression function with dependent noise, Annals of the Institute of Statistical Mathematics 59: 435-464.

Galtchouk, L. and Pergamenshchikov, S. (2005): Nonparametric Sequential Minimax Estimation of the Drift Coefficient in Diffusion Processes, Sequential Analysis 24: 303-330.

Galtchouk, L. and Pergamenshchikov, S. (2006a): Asymptotically Efficient Estimates for Nonparametric Regression Models, Statistics and Probability Letters 76: 852-860.

Galtchouk, L. and Pergamenshchikov, S. (2006b): Asymptotically Efficient Sequential Kernel Estimates of the Drift Coefficient in Ergodic Diffusion Processes, Statistical Inference for Stochastic Processes 9: 1-16.

Galtchouk, L. and Pergamenshchikov, S. (2009): Sharp non-asymptotic oracle inequalities for nonparametric heteroscedastic regression models, Journal of Nonparametric Statistics 21: 1-16.

Galtchouk, L. and Pergamenshchikov, S. (2009): Adaptive Asymptotically Efficient Estimation in Heteroscedastic Nonparametric Regression, Journal of Korean Statistical Society 38: 305-322.

Galtchouk, L. and Pergamenshchikov, S. (2011): Adaptive Sequential Estimation for Ergodic Diffusion Processes in Quadratic Metric, Journal of Nonparametric Statistics 23: 255-285.

Galtchouk, L. and Pergamenshchikov, S. (2013): Uniform concentration inequality for ergodic diffusion processes observed at discrete times, Stochastic Processes and their Applications 123: 91-109.

Galtchouk, L. and Pergamenshchikov, S. (2015): Efficient pointwise estimation based on discrete data in ergodic nonparametric diffusions, Bernoulli 21: 2569-2594.

Konev, V. and Pergamenshchikov, S. (1984): Estimate of the Number of Observations in Sequential Identification of the Parameters of Dynamical Systems, Avtomat. i Telemekh. 12: 56-63.

Konev, V. and Pergamenshchikov, S. (2012): Efficient Robust Nonparametric Estimation in a Semimartingale Regression Model, Annales de l'Institut Henri Poincaré, Probabilités et Statistiques 48: 1217-1244.

Konev, V. and Pergamenshchikov, S. (2015): Robust model selection for a semimartingale continuous time regression from discrete data, Stochastic Processes and their Applications 125: 294-326.

Konev, V. (2016): On One Property of Martingales with Conditionally Gaussian Increments and Its Application in the Theory of Nonasymptotic Inference, Doklady Mathematics 94: 1-5.

Luo, X. H., Yang, Z. H. and Zhou, Y. (2009): Nonparametric Estimation of the Production Function with Time-Varying Elasticity Coefficients, Systems Engineering-Theory and Practice 29: 144-149.

Moulines, E., Priouret, P. and Roueff, F. (2005): On Recursive Estimation for Time Varying Autoregressive Processes, Annals of Statistics 33: 2610-2654.