Robust adaptive efficient estimation for semi-Markov nonparametric regression models

(1)

HAL Id: hal-02334912

https://hal.archives-ouvertes.fr/hal-02334912

Submitted on 27 Oct 2019

HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci-entific research documents, whether they are pub-lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.

Robust adaptive efficient estimation for semi-Markov

nonparametric regression models

Vlad Stefan Barbu, Slim Beltaief, Serguei Pergamenshchikov

To cite this version:

Vlad Stefan Barbu, Slim Beltaief, Serguei Pergamenshchikov. Robust adaptive efficient estimation for semi-Markov nonparametric regression models. Statistical Inference for Stochastic Processes, Springer Verlag, 2019, 22 (2), pp.187 - 231. �hal-02334912�

(2)

Robust adaptive efficient estimation for

semi-Markov nonparametric regression models

∗

Vlad Stefan Barbu†, Slim Beltaief‡and Serguei Pergamenshchikov§

Abstract

We consider the nonparametric robust estimation problem for regression models in continuous time with semi-Markov noises. An adaptive model se-lection procedure is proposed. Under general moment conditions on the noise distribution a sharp non-asymptotic oracle inequality for the robust risks is obtained and the robust efficiency is shown. It turns out that for semi-Markov models the robust minimax convergence rate may be faster or slower than the classical one.

MSC:primary 62G08, secondary 62G05

Keywords: Non-asymptotic estimation; Robust risk; Model selection; Sharp oracle inequality; Asymptotic efficiency.

∗

This work was done under partial financial support of the grant of RSF number 14-49-00079 (National Research University “MPEI” 14 Krasnokazarmennaya, 111250 Moscow, Russia) and of the RFBR Grant 16-01-00121.

†

Laboratoire de Mathématiques Raphaël Salem, UMR 6085 CNRS-Université de Rouen, France, e-mail: [email protected]

‡

Laboratoire de Mathématiques Raphaël Salem, UMR 6085 CNRS-Université de Rouen, France, e-mail: [email protected]

§

Laboratoire de Mathématiques Raphaël Salem, UMR 6085 CNRS-Université de Rouen, France and International Laboratory of Statistics of Stochastic Processes and Quantitative Finance of Na-tional Research Tomsk State University, e-mail: [email protected]

(3)

1 Introduction

Let us consider a regression model in continuous time

dy_t=S(t)dt+ dξ_t, 0≤t≤n , (1.1) where S(·) is an unknown 1-periodic function from L₂[0,1] defined on R with

values inR, the noise process(ξt)t≥0is defined as

ξ_t=%₁L_t+%₂z_t, (1.2)

where%₁and%₂are unknown coefficients,(L_t)_t_≥₀is a Levy process and the pure jump process(z_t)_t≥1,defined in (2.3), is assumed to be a semi-Markov process

(see, for example, [2]).

The problem is to estimate the unknown functionSin the model (1.1) on the basis of observations(y_t)₀≤t≤n. Firstly, this problem was considered in the

frame-work of the “signal+white noise” models (see, for example, [6] or [24]). Later, in order to study dependent observations in continuous time, were introduced “sig-nal+color noise” regressions based on Ornstein-Uhlenbeck processes (cf. [8], [9], [10], [13]).

Moreover, to include jumps in such models, the papers [14] and [15] used non Gaussian Ornstein-Uhlenbeck processes introduced in [1] for modeling of the risky assets in the stochastic volatility financial markets. Unfortunately, the dependence of the stable Ornstein-Uhlenbeck type decreases with a geometric rate. So, asymp-totically when the duration of observations goes to infinity, we obtain very quickly the same “signal+white noise” model.

The main goal of this paper is to consider continuous time regression models with dependent observations for which the dependence does not disappear for a sufficient large duration of observations. To this end we define the noise in the model (1.1) through a semi-Markov process which keeps the dependence for any durationn. This type of models allows, for example, to estimate the signals ob-served under long impulse noise impact with a memory or “against signals”.

In this paper we use the robust estimation approach introduced in [14] for such problems. To this end, we denote byQthe distribution of(ξ_t)₀≤t≤n in the

Sko-rokhod spaceD[0, n]. We assume thatQis unknown and belongs to some distri-bution familyQ_nspecified in Section4. In this paper we use the quadratic risk

R_Q(Se_n, S) =E_Q,SkSe_n−Sk2, (1.3)

wherekfk2 ₌ R1

0 f

2₍_s_)d_s_and_E

Q,S is the expectation with respect to the

(4)

the noise distributionQis unknown, it seems reasonable to introduce the robust risk of the form

R∗_n(Se_n, S) = sup

Q∈Q_n

R_Q(Se_n, S), (1.4)

which enables us to take into account the information thatQ∈ Q_nand ensures the quality of an estimateSe_nfor all distributions in the familyQ_n.

To summarize, the goal of this paper is to develop robust efficient model selec-tion methods for the model (1.1) with the semi-Markov noise having unknown dis-tribution, based on the approach proposed by Konev and Pergamenshchikov in [14] and [15] for continuous time regression models with semimartingale noises. Unfor-tunately, we cannot use directly this method for semi-Markov regression models, since their tool essentially uses the fact that the Ornstein-Uhlenbeck dependence decreases with geometrical rate and the “white noise” case is obtained sufficiently quickly.

Thus in the present paper we propose new analytical tools based on renewal methods to obtain the sharp non-asymptotic oracle inequalities. As a consequence, we obtain the robust efficiency for the proposed model selection procedures in the adaptive setting.

The rest of the paper is organized as follows. We start by introducing the main conditions in the next section. Then, in Section3we construct the model selection procedure on the basis of the weighted least squares estimates. The main results are stated in Section4; here we also specify the set of admissible weight sequences in the model selection procedure. In Section5we derive some renewal results useful for obtaining other results of the paper. In Section6we develop stochastic calculus for semi Markov processes. In Section7 we study some properties of the model (1.1). A numerical example is presented in Section8. Most of the results of the paper are proved in Section9. In Appendix some auxiliary propositions are given.

2 Main conditions

In the model (1.2) we assume that the Levy processL_tis defined as

L_t= ˇ% w_t+p1−%ˇ2_Lˇ

t, Lˇt=x∗(µ−µe)t, (2.1)

where,0≤%ˇ≤1is an unknown constant,(w_t)_t_≥₀is a standard Brownian motion,

µ(ds,dx) is the jump measure with the deterministic compensator µe(dsdx) =

dsΠ(dx), whereΠ(·) is some positive measure onR(see, for example [7,3] for

details) for which we assume that

(5)

where we use the usual notationΠ(|x|m_{) =} R

R |z|

m_Π(d_z₎_{for any}_{m >}₀_{. Note}

thatΠ(R)may be equal to+∞. Moreover, we assume that the pure jump process (z_t)_t_≥₀in (1.2) is a semi-Markov process with the following form

z_t=

Nt

X

i=1

Y_i, (2.3)

where(Y_i)_i_≥₁ is an i.i.d. sequence of random variables with EY_i = 0, EY_i2= 1 and EY_i4 <∞.

HereN_tis a general counting process (see, for example, [19]) defined as

N_t= ∞ X k=1 1{Tk≤t} and Tk= k X l=1 τ_l, (2.4)

where(τ_l)_l_≥₁ is an i.i.d. sequence of positive integrated random variables with distributionηand meanτˇ=Eτ₁>0. We assume that the processes(N_t)_t_≥₀and

(Y_i)i≥1 are independent between them and are also independent of(Lt)t≥0.

Note that the process(z_t)_t_≥₀is a special case of a semi-Markov process (see, e.g., [2] and [17]).

Remark 2.1. It should be noted that ifτ_j are exponential random variables, then

(N_t)_t_≥₀ is a Poisson process and, in this case,(ξ_t)_t_≥₀is a Levy process for which this model has been studied in [11], [12] and [14]. But, in the general case when

the process(2.3) is not a Levy process, this process has a memory and cannot be

treated in the framework of semi-martingales with independent increments. In this case, we need to develop new tools based on renewal theory arguments, what we

do in Section5. This tools will be intensively used in the proofs of the main results

of this paper.

Note that for any function f from L₂[0, n], f : [0, n] → _R, for the noise process(ξ_t)_t_≥₀defined in (1.2), with(z_t)_t_≥₀given in (2.3), the integral

I_n(f) =

Z n

0

f(s)dξ_s (2.5)

is well defined withE_QIn(f) = 0. Moreover, as it is shown in Lemma6.2,

E_QI_n2(f)≤_κ_Q

Z n

0

(6)

whereκQ =%21+% 2

2|ρ|∗and|ρ|∗ = supt≥0|ρ(t)|<∞. Hereρis the density of

the renewal measureηˇdefined as

ˇ η= ∞ X l=1 η(l), (2.7)

whereη(l)is thelth convolution power forη. To study the series (2.7) we assume that the measureηhas a densitygwhich satisfies the following conditions.

H₁)Assume that, for anyx∈R,there exist the finite limits

g(x−) = lim

z→x−g(z) and g(x+) = limz→x+g(z)

and, for anyK >0,there existsδ =δ(K)>0for which

sup |x|≤K Z δ 0 |g(x+t) +g(x−t)−g(x+)−g(x−)| t dt < ∞. H₂)For anyγ >0, sup z≥0 zγ|2g(z)−g(z−)−g(z+)| < ∞.

H₃)There existsβ >0such thatR

Re

βx_g₍_x_{) d}_{x <}_∞_.

Remark 2.2. It should be noted that the conditionH₃)means that there exists an

exponential moment for the random variable(τ_j)_j_≥₁, i.e. these random variables

are not too large. This is a natural constraint since these random variables define the intervals between jumps, i.e., the frequency of the jumps. So, to study the

influence of the jumps in the model(1.1)one needs to consider the noise process

(1.2)with “small” interval between jumps or large jump frequency.

For the next condition we need to introduce the Fourier transform of any func-tionf fromL₁(R), f :R→R,defined as

b f(θ) = 1 2π Z R eiθxf(x) dx. (2.8)

H₄)There existst∗ >0such that the functiong_b(θ−it)belongs toL₁(R)for

(7)

It is clear that Conditions H₁)–H₄) hold true for any continuously differen-tiable functiong, for example for the exponential density.

Now we define the family of the noise distributions for the model (1.1) which is used in the robust risk (1.4). Note that any distributionQfromQ_nis defined by the unknown parameters in (1.2) and (2.1). We assume that

ς∗ ≤%2₁ ≤ς∗, 0≤%ˇ≤1 and ς∗≤σQ≤ς

∗

, (2.9)

whereσ_Q =%2

1+%22/τˇ, the unknown bounds0< ς∗ ≤ς∗ are functions ofn, i.e.

ς∗=ς∗(n)andς∗ =ς∗(n), such that for anyˇ >0,

lim n→∞n ˇ _ς ∗(n) = +∞ and lim n→∞ ς∗(n) nˇ = 0. (2.10)

Remark 2.3. As we will see later, the parameterσ_Q is the limit for the Fourier

transform of the noise process(1.2). Such limit is called variance proxy (see [14]).

Remark 2.4. Note that, generally (but it is not necessary) the parameters%₁and%₂

can be dependent onn. The conditions(2.10)means that we consider all possible

cases, i.e. these parameters may go to the infinity or be constant or to the zero as well. See, for example, the conditions (3.32) in [15].

3 Model selection

Let(φ_j)_j≥1be an orthonormal uniformly bounded basis inL2[0,1], i.e., for some

constantφ_∗ ≥1, which may be depend onn,

sup

0≤j≤n

sup

0≤t≤1

|φ_j(t)| ≤ φ∗<∞. (3.1)

We extend the functionsφ_j(t)by periodicity, i.e., we setφ_j(t) :=φ_j({t}), where {t}is the fractional part oft≥0. For example, we can take the trigonometric basis defined as Tr₁≡1and, forj≥2,

Tr_j(x) =√2

 



cos(2π[j/2]x) for even j; sin(2π[j/2]x) for odd j,

(3.2)

where[x]denotes the integer part ofx.

To estimate the functionSwe use here the model selection procedure for con-tinuous time regression models from [14] based on the Fourrier expansion. We recall that for any functionSfromL₂[0,1]we can write

S(t) = ∞ X j=1 θ_jφ_j(t) and θ_j = (S, φ_j) = Z 1 0 S(t)φ_j(t)dt . (3.3)

(8)

So, to estimate the function S it suffices to estimate the coefficients θ_j and to replace them in this representation by their estimators. Using the fact that the functionSandφ_j are1- periodic we can write that

θ_j = 1

n

Z n

0

φ_j(t)S(t)dt .

If we replace here the differentialS(t)dtby the stochastic observed differentialdy_t

we obtain the natural estimate forθ_j on the time interval[0, n]

b θ_j,n= 1 n Z n 0 φ_j(t)dy_t, (3.4)

which can be represented, in view of the model (1.1), as

b θ_j,n=θ_j+ √1 nξj,n, ξj,n= 1 √ nIn(φj). (3.5)

Now (see, for example, [6]) we can estimate the functionSby the projection esti-mators, i.e. b S_m(t) = m X j=1 b θ_j,nφ_j(t), 0≤t≤1, (3.6)

for some numberm→ ∞asn→ ∞. It should be noted that Pinsker in [24] shows that the projection estimators of the form (3.6) are not efficient. For obtaining efficient estimation one needs to use weighted least square estimators defined as

b Sλ(t) = n X j=1 λ(j)θb_j,nφ_j(t), (3.7)

where the coefficientsλ = (λ(j))₁≤j≤nbelong to some finite setΛ from[0,1]n.

As it is shown in [24], in order to obtain efficient estimators, the coefficientsλ(j)

in (3.7) need to be chosen depending on the regularity of the unknown functionS. In this paper we consider the adaptive case, i.e. we assume that the regularity of the functionSis unknown. In this case we chose the weight coefficients on the basis of the model selection procedure proposed in [14] for the general semi-martingale regression model in continuous time. These coefficients will be obtained later in (3.20). To the end, first we set

ˇι= #(Λ) and |Λ|_∗ = 1 + max

λ∈Λ

ˇ

(9)

where#(Λ)is the cardinal number ofΛandLˇ(λ) =Pn

j=1λ(j). Now, to choose

a weight sequenceλin the setΛwe use the empirical quadratic risk, defined as Errn(λ) =kSb_λ−Sk2,

which in our case is equal to

Errn(λ) = n X j=1 λ2(j)bθ2 j,n−2 n X j=1 λ(j)bθ_j,nθ_j+ ∞ X j=1 θ2_j. (3.9)

Since the Fourier coefficients(θ_j)_j≥1 are unknown, we replace the termsθb_j,nθ_j,n

by e θ_j,n =θb2 j,n− b σ_n n , (3.10)

where_bσ_nis an estimate for the variance proxyσ_Qdefined in (2.9). If it is known, we takebσn=σQ; otherwise, we can choose it, for example, as in [14], i.e.

b σ_n= n X j=[√n]+1 b t2_j,n, (3.11)

wherebt_j,nare the estimators for the Fourier coefficients with respect to the

trigono-metric basis (3.2), i.e.

b t_j,n = 1 n Z n 0 T r_j(t)dy_t. (3.12)

Finally, in order to choose the weights, we will minimize the following cost func-tion Jn(λ) = n X j=1 λ2(j)bθ2 j,n−2 n X j=1 λ(j)θe_j,n+δ P_n(λ), (3.13)

whereδ >0is some threshold which will be specified later and the penalty term is

P_n(λ) = σbn|λ|

2

n . (3.14)

We define the model selection procedure as

b

S∗=Sb_λ_ˆ, (3.15)

where

b

(10)

We recall that the setΛis finite soλˆexists. In the case whenλˆis not unique, we take one of them.

Let us now specify the weight coefficients(λ(j))₁_≤_j_≤_n. Consider, for some fixed0< ε <1,a numerical grid of the form

A={1, . . . , k∗} × {ε, . . . , mε}, (3.17) wherem= [1/ε2]. We assume that both parametersk∗ ≥1andεare functions of

n, i.e.k∗=k∗(n)andε=ε(n), such that

       lim_n_→∞ k∗(n) = +∞, lim_n_→∞ k ∗₍_n₎ lnn = 0,

lim_n_→∞ ε(n) = 0 and lim_n_→∞ nδˇε(n) = +∞

(3.18)

for anyδ >ˇ 0. One can take, for example, forn≥2

ε(n) = 1 lnn and k ∗ (n) =k∗₀+ √ lnn , (3.19)

wherek∗₀ ≥0is some fixed constant and the thresholdς∗(n)is introduced in (2.9). For eachα= (β,l)∈ A, we introduce the weight sequence

λ_α = (λ_α(j))₁_≤_j_≤_n

with the elements

λ_α(j) =1{1≤j<j∗}+ 1−(j/ωα)β 1{j∗≤j≤ωα}, (3.20) wherej∗= 1 + [lnυn],ωα= (dβlυn)1/(2β+1), d_β = (β+ 1)(2β+ 1) π2β_β and υn=n/ς ∗_.

Now we define the setΛas

Λ = {λ_α, α∈ A}. (3.21)

It will be noted that in this case the cardinal of the setΛis

ˇ

ι=k∗m . (3.22)

Moreover, taking into account thatd_β <1forβ ≥1we obtain for the set (3.21) |Λ|_∗ ≤ 1 + sup

α∈A

(11)

Remark 3.1. Note that the form (3.20) for the weight coefficients in (3.7) was proposed by Pinsker in [24] for the efficient estimation in the nonadaptive case,

i.e. when the regularity parameters of the functionS are known. In the adaptive

case these weight coefficients are used in [14,15] to show the asymptotic efficiency

for model selection procedures.

4 Main results

In this section we obtain in Theorem4.3the non-asymptotic oracle inequality for the quadratic risk (1.3) for the model selection procedure (3.15) and in Theorem

4.4 the non- asymptotic oracle inequality for the robust risk (1.4) for the same model selection procedure (3.15), considered with the coefficients (3.20). We give the lower and upper bound for the robust risk in Theorems4.5and4.7, and also the optimal convergence rate in Corollary4.8.

Before stating the non-asymptotic oracle inequality, let us first introduce the following parameters which will be used for describing the rest term in the oracle inequalities. For the renewal densityρdefined in (2.7) we set

Υ(x) =ρ(x)−1 ˇ τ and kΥk1 = Z +∞ 0 |Υ(x)|dx , (4.1)

whereτˇ = Eτ₁. In Proposition5.2we show that|ρ|_∗ = sup_t_≥₀|ρ(t)|< ∞and kΥk₁<∞. So, using this, we can introduce the following parameters

Ψ_Q= 4κQˇι+ 5 + 4ˇι σ_Q ! σ_Qˇτ φ2_maxkΥk₁+φ4_max(1 +σ_Q2)3ˇl (4.2) and c∗_Q=σ_Q+ 2κQ+σQτ φˇ 2maxkΥk1+φ4max(1 +σ 2 Q) 2_ˇ_l_, _(4.3) whereˇl = (4ˇτ2 + 8)kΥk₁ + 5 + 13(1 + ˇτ)2(1 +|ρ|2 ∗)(EY14) + 4Π(x 4₎_{. First,}

let us state the non-asymptotic oracle inequality for the quadratic risk (1.3) for the model selection procedure (3.15).

Theorem 4.1. Assume that ConditionsH₁)–H₄)hold. Then, for anyn≥ 1and

0 < δ < 1/6, the estimator of S given in (3.15) satisfies the following oracle inequality R_Q(Sb∗, S)≤ 1 + 3δ 1−3δminλ∈ΛRQ( b Sλ, S) + ΨQ+ 10|Λ|∗ES|bσn−σQ| nδ . (4.4)

(12)

Now we study the estimate (3.11).

Proposition 4.2. Assume that ConditionsH₁)–H₄)hold and that the functionS(·)

is continuously differentiable. Then, for anyn≥2,

E_Q,S|_bσ_n−σ_Q| ≤ 6k ˙ Sk2₊_c∗ Q √ n . (4.5)

Theorem4.1and Proposition4.2implies the following result.

Theorem 4.3. Assume that ConditionsH₁)–H₄) hold and that the functionS is

continuously differentiable. Then, for anyn≥ 1and0< δ ≤1/6, the procedure

(3.15),(3.11)satisfies the following oracle inequality

R_Q(Sb∗, S)≤ 1 + 3δ 1−3δ minλ∈Λ R_Q(Sb_λ, S) + 60Λe_nkS˙k2+Ψe_Q,n nδ , (4.6) whereΨe_Q,n= 10Λe_nc∗ Q+ ΨQandΛe_n=|Λ|_∗/ √ n.

Remark 4.1. Note that the coefficientκQcan be estimated asκQ≤(1+ˇτ|ρ|∗)σQ.

Therefore,taking into account thatφ4_max ≥1, the remainder term in(4.6) can be

estimated as e Ψ_Q,n ≤C∗ 1 +σ_Q6 + 1 σ_Q ! (1 +Λe_n)ˇιφ4 max, (4.7)

whereC_∗>0is some constant which is independent of the distributionQ.

Furthermore, let us study the robust risk (1.4) for the procedure (3.15). In this case, the distribution familyQ_nconsists in all distributions on the Skorokhod space D[0, n]of the process (1.2) with the parameters satisfying the conditions (2.9) and (2.10).

Moreover, we assume also that the number of the weight vectors and the upper bound for the basis functions in (3.1) may depend onn ≥ 1, i.e. ˇι = ˇι(n) and

φ∗ =φ∗(n), such that for anyˇ >0

lim n→∞ ˇ ι(n) nˇ = 0 and _nlim_→∞ φ∗(n) nˇ = 0. (4.8)

The next result presents the non-asymptotic oracle inequality for the robust risk (1.4) for the model selection procedure (3.15), considered with the coefficients (3.20).

(13)

Theorem 4.4. Assume that Conditions H₁) – H₄) hold and that the unknown

functionSis continuously differentiable. Then, for the robust risk defined in(1.4)

through the distribution family(2.9)–(2.10), the procedure(3.15) with the

coef-ficients (3.20) for any n ≥ 1 and0 < δ < 1/6, satisfies the following oracle

inequality R∗(Sb∗, S)≤ 1 + 3δ 1−3δminλ∈Λ R∗(Sb_λ, S) + U∗_n(S) nδ , (4.9)

where the sequenceU∗_n(S) > 0 is such that, under the conditions(2.10),(3.18)

and(4.8), for anyr >0andδ >ˇ 0,

lim

n→∞ _ksup_S_˙_k≤_r

U∗_n(S)

nˇδ = 0. (4.10)

Now we study the asymptotic efficiency for the procedure (3.15) with the co-efficients (3.20), with respect to the robust risk (1.4) defined by the distribution family (2.9)–(2.10). To this end, we assume that the unknown functionS in the model (1.1) belongs to the Sobolev ball

W_rk={f ∈ C_perk [0,1] :

k

X

j=0

kf(j)k2 ≤r}, (4.11)

wherer > 0andk ≥ 1are some unknown parameters, Ck

per[0,1]is the set of k

times continuously differentiable functions f : [0,1] → _Rsuch that f(i)(0) =

f(i)₍₁₎_{for all}₀_≤_i_≤_k_{. The function class}_Wk

r can be written as an ellipsoid in

L₂[0,1], i.e., W_rk={f ∈ C_perk [0,1] : ∞ X j=1 a_jθ_j2 ≤r}, (4.12) where a_j = Pk i=0(2π[j/2]) 2i _and _θ j = R1

0 f(v)Trj(v)dv. We recall that the

trigonometric basis(Tr_j)_j_≥₁is defined in (3.2).

Similarly to [14,15] we will show here that the asymptotic sharp lower bound for the robust risk (1.4) is given by

r∗_k = ((2k+ 1)r)1/(2k+1) k (k+ 1)π 2k/(2k+1) . (4.13)

Note that this is the well-known Pinsker constant obtained for the nonadaptive filtration problem in “signal + small white noise” model (see, for example, [24]). Let Π_n be the set of all estimators Sb_n measurable with respect to the σ - field

σ{y_t,0≤t≤n}generated by the process (1.1).

The following two results give the lower and upper bound for the robust risk in our case.

(14)

Theorem 4.5. Under Conditions(2.9)and(2.10), lim inf n→∞ υ 2k/(2k+1) n inf b S_n∈Π_n sup S∈Wk r R∗_n(Sb_n, S)≥r∗_k, (4.14) whereυ_n=n/ς∗.

Note that if the parametersrandkare known, i.e. for the non adaptive estima-tion case, then to obtain the efficient estimaestima-tion for the "signal+white noise" model Pinsker in [24] proposed to use the estimateSb_λ

0 defined in (3.7) with the weights

(3.20) in which

λ₀ =λ_α

0 and α0 = (k,l0), (4.15)

wherel₀ = [r/ε]ε. For the model (1.1) – (1.2) we show the same result.

Proposition 4.6. The estimatorSb_λ

0satisfies the following asymptotic upper bound

lim n→∞υ 2k/(2k+1) n sup S∈Wk r R∗_n(Sb_λ 0, S)≤r ∗ k.

For the adaptive estimation we user the model selection procedure (3.15) with the parameterδdefined as a function ofnsatisfying

lim

n δn= 0 and limn n

ˇ

δ_δ

n= 0 (4.16)

for anyδ >ˇ 0. For example, we can takeδ_n= (6 + lnn)−1_.

Theorem 4.7. Assume that ConditionsH₁)–H₄)hold true. Then the robust risk

defined in(1.4)through the distribution family(2.9)–(2.10)for the procedure(3.15)

based on the trigonometric basis(3.2)with the coefficients(3.20)and the

parame-terδ=δ_nsatisfying(4.16)has the following asymptotic upper bound

lim sup

n→∞

υ_n2k/(2k+1) sup

S∈W_rk

R∗_n(Sb_∗, S)≤r∗_k. (4.17)

Theorem 4.5and Theorem 4.7 allow us to compute the optimal convergence rate.

Corollary 4.8. Under the assumptions of Theorem4.7, we have

lim n→∞ υ 2k/(2k+1) n inf b Sn∈Πn sup S∈Wk r R∗_n(Sb_n, S) =r∗_k. (4.18)

Remark 4.2. It is well known that the optimal (minimax) risk convergence rate for

the Sobolev ballW_rkisn2k/(2k+1)(see, for example, [24], [22]). We see here that

the efficient robust rate isυ2k/(2k+1)

n , i.e., if the distribution upper boundς

∗ _→ ₀

asn→ ∞,we obtain a faster rate with respect ton2k/(2k+1), and, ifς∗ → ∞as

n→ ∞,we obtain a slower rate. In the case whenς∗ is constant, than the robust

(15)

5 Renewal density

This section is concerned with results related to the renewal measure (2.7). We start with the following lemma.

Lemma 5.1. Let τ be a positive random variable with a density g, such that

Eeβτ < ∞ for someβ > 0. Then there exists a constantβ1,0 < β1 ≤ β for

which,

Ee(β1+iω)τ ₆_{= 1} _∀_ω _∈

R.

Proof.We will show this lemma by the contradiction, i.e. assume there exists some

sequence of positive numbers going to zero(γ_k)_k_≥₁and a sequence(w_k)_k_≥₁such that

Ee(γk+iωk)τ _{= 1} _(5.1)

for anyk≥1. Firstly assume thatlim sup_k_→∞ w_k= +∞. Note that in this case, for anyN ≥1, Z N 0 eγkt_cos(_w kt)g(t)dt ≤ Z N 0 cos(w_kt)g(t)dt + Z N 0 (eγkt−_{1) cos(}_w kt)g(t)dt ,

i.e., in view of LemmaA.4, for any fixedN ≥1

lim sup k→N Z N 0 eγkt_cos(_w kt)g(t)dt= 0.

Since for someβ >0the integralR+∞

0 e βt_g₍_t_)d_{t <}_∞_{, we get} lim k→∞ Z +∞ 0 eγkt_cos(_w kt)g(t)dt= 0.

Let nowlim sup_k_→∞w_k=ω_∞ = 06 and0<|ω_∞|<∞. In this case there exists a sequence(lk)k≥1such thatlimk→∞wlk =ω∞, i.e.

1 = lim sup

k→∞

Eeγlkτ_cos(_{τ w}

lk) =Ecos(τ w∞).

It is clear that, for random variables having density, the last equality is possible if and only ifw∞ = 0. In this case, i.e. when lim supk→∞wlk = 0, the equation (5.1) implies lim sup k→∞ Eeγlkτsin(τ wlk) w_l k =Eτ = 0.

(16)

But, under our conditions,Eτ >0. These contradictions imply the desired result.

2

Proposition 5.2. Letτbe a positive random variable with the distributionηhaving

a densitygwhich satisfies ConditionsH₁)–H₄). Then the renewal measure(2.7)

is absolutely continuous with densityρ, for which

ρ(x) = 1 ˇ

τ + Υ(x), (5.2)

whereτˇ = Eτ₁ andΥ(·)is some function defined on R+ with values in Rsuch

that

sup

x≥0

xγ|Υ(x)|<∞ for all γ >0.

Proof. First note, that we can represent the renewal measureηˇasηˇ=η∗η₀ and

η₀=P∞

j=0η

(j)_{. It is clear that in this case the density}_ρ_of_η_ˇ_{can be written as}

ρ(x) = Z x 0 g(x−y) X n≥0 g(n)(y)dy . (5.3)

Now we use the arguments proposed in the proof of Lemma 9.5 from [5]. For any

0< <1we set ρ(x) = Z x 0 g(x−y)   X n≥0 (1−)ng(n)(y)−(1−) ˇ τ g0(y)  dy−g(x), (5.4)

whereg0(y) =e−y/τˇ1{y>0}.It is easy to deduce that for anyx∈R

lim →0 ρ(x) = ρ(x) − 1 ˇ τ Z x 0 g(z) dz−g(x). (5.5)

Moreover, in view of the conditionH₁)we obtain that the functionρ(x)satisfies the conditionD)from SectionA.2. So, through PropositionA.5we get

ρ(x+) +ρ(x−) = 1 π Z R e−ixθρ_b(θ) dθ , whereρb(θ) = R Re iθx_ρ (x)dx. Note that |_bg(θ)|= Z R eiθxg(x)dx ≤ Z R g(x)dx= 1,

(17)

i.e. for any0< <1we have|1−(1−)bg(θ)|<1and therefore ∞ X n=0 (1−)n(_bg(θ))n= 1 1−(1−)_bg(θ).

From this and, taking into account that

b g0(θ) = Z R eiθxg0(x)dx= ˇ τ −iτ θˇ , we obtain b ρ(θ) =_bg(θ) ∞ X n=0 (1−)n(_bg(θ))n− 1− ˇ τ b g(θ)_bg0(θ)−bg(θ) =_bg(θ)G(θ) and G(θ) = 1 1−(1−)_bg(θ) − 1−iˇτ θ −iτ θˇ , i.e. ρ(x−) +ρ(x+) = 1 π Z R e−ixθ_bg(θ)G(θ) dθ . (5.6) One can check directly that

sup

0<<1,θ∈_R

|G(θ)| < ∞.

Therefore, using the condition H₃) and the Lebesgue’s dominated convergence theorem, we can pass to limit as→0in (5.6), i.e., we obtain that

ρ(x+) +ρ(x−)−2 ˇ τ Z x 0 g(z) dz−g(x+)−g(x−) = 1 π Z R e−ixθ_bg(θ)G₀(θ) dθ , where G₀(θ) = 1 1−_bg(θ) + 1−iτ θˇ iτ θˇ .

Using here again PropositionA.5we deduce that

ρ(x+) +ρ(x−) = 2 ˇ τ Z x 0 g(z) dz+ 1 π Z R e−ixθ_bg(θ) ˇG(θ) dθ (5.7) and ˇ G(θ) = 1 1−_bg(θ) + 1 iˇτ θ.

(18)

Note now that we can represent the density (5.3) as ρ(x) =g∗X n≥0 g(n)=X n≥1 g(n)(x) =g(x) +X n≥2 g(n)(x) =:g(x) +ρ_c(x)

and the functionρ_c(x)is continuous for allx∈R. This means that

e

ρ(x) = ρ(x+) +ρ(x−)

2 −ρ(x) =

g(x+) +g(x−)

2 −g(x)

and, therefore, the conditionH₂)implies that, for anyγ >0,

sup

x≥0

xγ|ρ_e(x)| <∞.

Now we can rewrite (5.7) as

ρ(x) = 1 ˇ τ Z x 0 g(z) dz+ 1 2π Z R e−ixθg_b(θ) ˇG(θ) dθ−ρ_e(x). (5.8)

Taking into account thatEeβτ <∞for someβ >0we can obtain that

sup x≥0 xγ Z +∞ x g(z) dz <∞.

To study the second term in (5.8) we will use PropositionA.3. Indeed, the condition H₃)implies the first limit equality in (A.1). The second one follows directly from LemmaA.4. Therefore, in view of PropositionA.3, there exists someβ∗>0such that, for any0≤β₀ ≤β∗,

Z

R

e−ixθ_bg(θ) ˇG(θ) dθ=e−β0x

Z

R

e−ixθ_bg(θ−iβ₀) ˇG(θ−iβ₀) dθ .

Note that, due to Lemma 5.1, the function 1− _bg(θ) has no zeros on the line {z∈_C : Im(z) =−β₁}. Moreover, one can check directly thatθ = 0is an iso-lated zero. So, this means that for anyN >1there can be only finitely many zeros in{z∈_C : −β₁ <Im(z)<0,|Re(z)|< N}of the function1−_bg(θ).Moreover, note that in view of lemmaA.4for anyr >0

lim

Re(θ)→∞,|Im(θ)|≤rb

g(θ) = 0.

This means that there exists N > 0 such that the function 1 −_bg(θ) 6= 0 for

θ ∈ {z∈_C : −β₁ <Im(z)<0,|Re(z)| ≥N}. So, there can be only finitely many zeros of the function 1−_bg(θ) in {z∈_C : −β₁ < Im(z)<0} for some

(19)

fixed0 < β₁ < β. Therefore, there exists someβ₀ > 0for which the function

1−_bg(θ)has no zeros in{z∈_C : −β₀< Im(z)<0}, i.e. the functionGˇ(θ)will be bounded in this set and we obtain that

sup x≥0 eβ0x Z R e−ixθbg(θ) ˇG(θ) dθ <∞.

This the conclusion follows. 2

Using this proposition we can study the renewal process(N_t)_t≥0introduced in

(2.4).

Corollary 5.3. Assume that ConditionsH₁)–H₄)hold true. Then, for anyt >0,

EN_t≤ |ρ|_∗t and EN_t2 ≤ |ρ|_∗t+ |ρ|2_∗t2. (5.9)

Proof.First, by means of Proposition5.2, note that we get

EN_t=E X k≥1 1_{_T k≤t}= Z t 0 ρ(v) dv≤ |ρ|_∗t .

Regarding the last bound in (5.9), we use the same reasoning as in the previous inequality, i.e., we obtain

EN_t2 =E X k≥1 1_{_T k≤t}+ 2E X k≥1 1_{_T k≤t} X j=k+1 1_{_T j≤t} =EN_t+ 2E X k≥1 1_{_T k≤t}Θ(Tk) =ENt+ Z t 0 Θ(v)ρ(v) dv ,

where, for0≤v≤t,we defined the functionΘ(v) =EN_t₋_v≤ |ρ|_∗(t−v). 2

6 Stochastic calculus for semi-Markov processes

In this section we give some results of stochastic calculus for the process(ξ_t)_t_≥₀

given in (1.2), needed all along this paper. As the processξ_tis the combination of a Levy process and a semi-Markov process, these results are not standard and need to be provided.

Lemma 6.1. Letfandgbe any non-random functions fromL₂[0, n]and(I_t(f))_t_≥₀

be the process defined in(2.5). Then, for any0≤t≤n,

EI_t(f)I_t(g) =%2₁(f, g)_t +%2₂(f, gρ)_t, (6.1)

where(f, g)_t=Rt

(20)

Proof.First, note that we can represent the stochastic integralI_t(f)as I_t(f) =%₁I_tL(f) +%₂I_tz(f), (6.2) where I_tL(f) = Z t 0 f(s)dL_s and I_tz(f) = Z t 0 f(s)dz_s.

Note that the mutual covariation for the martingales I_tL(f) and I_tL(g) (see, for example, [18]) may be calculated as

[IL(f), IL(g)]_t= ˇ%2 Z t 0 f(s)g(s)ds+ (1−%ˇ2) X 0≤s≤t f(s)g(s) ∆ ˇL_s2 , (6.3)

where∆ ˇL_s= ˇL_s−Lˇ_s−. Taking into account thatEItL(f)ItL(g) =E[IL(f), IL(g)]t

and that in view of the first condition in (2.2)Π(x2) = 1, we obtain that EI_tL(f)I_tL(g) = ˇ%2 Z t 0 f(s)g(s)ds+ (1−%ˇ2) Π(x2) Z t 0 f(s)g(s)ds = Z t 0 f(s)g(s)ds . (6.4)

Moreover, note that EI_tz(f)I_tz(g) =E ∞ X l=1 f(T_l)g(T_l)Y_l21_{_T l≤t} ! =E ∞ X l=1 f(T_l)g(T_l)1{Tl≤t} ! = Z t 0 f(s)g(s)ρ(s)ds .

Hence the conclusion follows. 2

Lemma 6.2. Assume that Conditions H₁)–H₄)hold true. Then, for anyn ≥ 1

and for any non random functionffromL₂[0, n],the stochastic integral(2.5)exists

and satisfies the properties(2.6)with the coefficientκQgiven in(2.6).

Proof. This lemma follows directly from Lemma6.1withf =gand Proposition

5.2. 2

Lemma 6.3. Letf andgbe bounded functions defined on[0,∞)×_R.Then, for

anyk≥1, E I_T k−(f)ITk−(g)| G =%2₁(f , g)_T k+% 2 2 k−1 X l=1 f(T_l)g(T_l),

(21)

Proof.Using (6.2), (6.4) and, taking into account that the process(L_t)_t≥0is

inde-pendent of theG, we obtain EI_T k−(f)ITk−(g)| G =%2₁(f , g)_T k+E I_Tz k−(f)I z Tk−(g)| G . Moreover, EI_Tz k−(f)I z Tk−(g)| G =E _k₋₁ X l=1 f(T_l)Y_l ! _k₋₁ X l=1 g(T_l)Y_l ! | G ! = k−1 X l=1 f(T_l)g(T_l).

This we obtain the desired result. 2

Lemma 6.4. Assume that ConditionsH₁)–H₄)hold true. Then, for any

measur-able bounded non-random functionsf andg,we have

E Z n 0 I_t2₋(f)g(t) dm_t ≤2%2₂|g|_∗|f|2 ∗kΥk1n.

Proof. Using the definition of the process(m_t)_t_≥₀ we can represent this integral

as Z n 0 I_t2₋(f)g(t) dm_t=X k≥1 I_T2 k−(f)g(Tk)Y 2 k 1{T_k≤n} − Z n 0 I_t2(f)g(t)ρ(t) dt=:V_n−U_n. (6.5)

Note now that

EV_n =EX k≥1 g(T_k)EI_T2 k−(f) | G1_{_T k≤n}.

Now, using Lemma6.3we can represent the last expectation as

EV_n =%2₁EV_n0+%2₂EV_n00, (6.6) where V_n0 =X k≥1 g(T_k)kfk2 T_k1{Tk≤n} and V 00 n = X k≥2 g(T_k)1_{_T k≤n} k−1 X l=1 f2(T_l).

(22)

The first term in (6.6) can be represented as

EV_n0 =

Z n

0

g(t)kfk2_tρ(t)dt .

To estimate the last expectation in (6.6), note that

EV_n00 =E X l≥1 f2(T_l) ¯g(T_l)1{T_l≤n} = Z n 0 f2(v) ¯g(v)ρ(v)dv , where ¯ g(v) =E X k≥1 g(v+T_k)1_{_T k≤n−v} = Z n v g(t)ρ(t−v)dt .

Moreover, using now the representation (6.1), we calculate the expectation of the last term in (6.5) EU_n=%2₁ Z n 0 kfk2_tg(t)ρ(t) dt+%2₂ Z n 0 ˇ f(t)g(t)ρ(t) dt ,

wherefˇ(t) =R₀t f2(s)ρ(s) ds. This implies that

E Z n 0 I_t2₋(f)g(t) dm_t=%2₂ Z n 0 g(t)δ(t)dt , whereδ(t) =Rt 0 f

2₍_v_{) (}_ρ₍_t₋_v₎₋_ρ₍_t₎₎_ρ₍_v_{) d}_v_{. Note that, in view of}

Proposi-tion5.2, the functionδcan be estimated as

|δ(t)| ≤ |f|2_∗|ρ|_∗ Z t 0 |Υ(t−v)−Υ(t)|dv≤ |f|2_∗|ρ|_∗ (kΥk₁+t|Υ(t)|). Therefore, E Z n 0 I_t2₋(f)g(t) dm_t ≤2%2₂|g|_∗|f|2_∗kΥk₁n

and this finishes the proof. 2

Lemma 6.5. Assume that ConditionsH₁)–H₄)hold true. Then, for any

measur-able bounded non-random functionsf andg, one has

E

Z n

0

(23)

Proof.First, note that Z n 0 I_t2₋(f)I_t−(g)g(t)dξt=%1 Z n 0 I_t2(f)I_t(g)g(t)dL_t+%₂ Z n 0 I_t2₋(f)I_t−(g)g(t)dzt.

Second, we will show that E

Z n

0

I_t2₋(f)I_t₋(g)g(t)dL_t= 0. (6.7)

Using the notations (6.2), we set

J₁ = Z n 0 I_t2(f)I_tL(g)g(t)dL_t and J₂= Z n 0 I_t2(f)I_tz(g)g(t)dL_t, we obtain that Z n 0 I_t2(f)I_t(g)g(t)dL_t=%₁J₁+%₂J₂. (6.8)

Now let us recall the Novikov inequalities, [23], also referred to as the Bichteler– Jacod inequalities (see [4,21]) providing bound moments of supremum of purely discontinuous local martingales for any predictable functionhand anyp≥2

E sup 0≤t≤n Z [0,t]×R hd(µ−ν) p ≤C_p∗EJˇ_p,n(h), (6.9)

whereC_p∗is some positive constant and

ˇ J_p,n(h) = Z [0,n]×R h2 dν !p/2 + Z [0,n]×R hpdν .

By applying this inequality for the non-random functionh = (s, x) =g(s)x, and, recalling thatΠ(x8₎_<_∞_{, we obtain,}

sup 0≤t≤n E I ˇ L t (g) 8 <∞.

Taking into account that, for any non random square integrated functionf,the in-tegral

Rt

0 f(s)dws

is Gaussian with the parameters

0,Rt 0 f 2₍_s_)d_s_{, we obtain} sup 0≤t≤n EI_tL(g) 8 <∞.

(24)

Finally, by using the Cauchy inequality, we can estimate for any 0 < t ≤ the following expectation as E(I_tL(f))4(I_tL(g))2 < q E(IL t (f))8 q E(IL t(f))4 i.e., sup 0≤t≤n E(I_tL(f))4(I_tL(g))2<∞.

Moreover, taking into account that the processes(L_t)_t_≥₀ and(z_t)_t_≥₀are indepen-dent, we obtain that

E(I_tz(f))4(I_tL(g))2=E(I_tz(f))4E(I_tL(g))2 = Π(x2)

Z t

0

g2(s)dsE(I_tz(f))4.

One can check directly here that, fort >0,

E|I_tz(f)|4 ≤ |f|4_∗EY₁4EN_t2.

Note that the last bound in Corollary5.3yields sup₀_≤_t_≤_n E(I_tz(f))4 < ∞ and, therefore,

sup

0≤t≤n

E(I_t(f))4(I_tL(g))2 <∞.

It follows directly thatEJ₁ = 0. Now we study the last term in (6.8). To this end, first note that similarly to the previous reasoning we obtain that

E Z n 0 (I_tL(f))2I_tz(g)g(t)dL_t= 0 and E Z n 0 I_tL(f)I_tz(f)I_tz(g)g(t)dL_t= 0.

Therefore, to show (6.7) one needs to show that E

Z n

0

(I_tz(f))2I_tz(g)g(t) dL_t= 0. (6.10)

To check this note that, for any0< t≤nand for any bounded functionf,

I_tz(f) = ∞ X k=1 f(T_k)Y_k1{Tk≤t} = N_n X k=1 f(T_k)Y_k1{Tk≤t}, i.e., Z n 0 (I_tz(f))2I_tz(g)g(t) dL_t= Nn X k=1 Nn X l=1 Nn X j=1 f(T_k)f(T_l)g(T_j)Y_jY_lY_kI_klj,

(25)

where I_klj = Z n 0 1_{_T k≤t}1{Tl≤t}1{Tj≤t}dLt.

Taking into account that the(L_t)_t_≥₀is independent of the fieldG_z =σ{z_t, t≥0},

we obtain thatE I_klj|G_z = 0. Therefore, E Z n 0 (I_tz(f))2I_tz(g)g(t) dL_t = E N_n X k=1 N_n X l=1 N_n X j=1 f(T_k)f(T_l)g(T_j)Y_jY_lY_kE I_klj|G_z = 0.

So, we obtain (6.10) and hence the proof is achieved. 2

7 Properties of the regression model

(

1.1 )

In order to prove the oracle inequalities we need to study the conditions introduced in [14] for the general semi-martingale model (1.1). To this end we set for any

x∈_Rn_{the functions} B₁_,Q,n(x) = n X j=1 x_j E_Qξ_j,n2 −σ_Q and B₂_,Q,n(x) = n X j=1 x_jξe_j,n, (7.1)

whereσ_Qis defined in (2.9) andξe_j,n =ξ_j,n2 −E_Qξ_j,n2 .

Proposition 7.1. Assume that ConditionsH₁)–H₄)hold. Then

sup

x∈[−1,1]n

|B₁_,Q,n(x)| ≤C₁_,Q,n, (7.2)

whereC₁_,Q,n=σ_Qτ φˇ 2_maxkΥk₁.

Proof.First, note that from (6.2) we have

ξ_j,n = √%1 nI L n(φj) + %₂ √ nI z n(φj).

So, using (6.4) we can write that

Eξ_j,n2 = % 2 1 n Z n 0 φ2_j(t)dt+% 2 2 nE ∞ X l=1 φ2_j(T_l)1_{_T l≤n}. (7.3)

(26)

Proposition5.2implies E ∞ X l=1 φ2_j(T_l)1{Tl≤n} = Z n 0 φ2_j(x)ρ(x)dx = 1 ˇ τ Z n 0 φ2_j(x)dx + Z n 0 φ2_j(x)Υ(x)dx . Note thatRn 0 φ 2

j(t)dt=n. So, in view of the condition (3.1), we obtain

Eξ 2 j,n−σQ = %2₂ n Z n 0 φ2_j(x)Υ(x)dx ≤ % 2 2 n φ 2 maxkΥk1. (7.4)

Estimating here%2₂byσ_Qτˇwe obtain the inequality (7.2) and hence the conclusion

follows. 2

Proposition 7.2. Assume that ConditionsH₁)–H₄)hold. Then

sup

|x|≤1

E_QB2₂_,Q,n(x) ≤ C₂_,Q,n, (7.5)

whereC₂_,Q,n=φ4_max(1 +σ_Q2)3ˇlandˇlis given in(4.3).

Proof.By Ito’s formula one gets

dI_t2(f) = 2I_t−(f)dIt(f) +%21%ˇ

2_f2₍_t_)d_t₊ X

0≤s≤t

f2(s)(∆ξ_sd)2, (7.6)

where ξ_td = %₃Lˇ_t +%₂z_t and %₃ = %₁p1−%ˇ2_{. Taking into account that the}

processes( ˇL_t)_t_≥₀and(z_t)_t_≥₀are independent and the time of jumpsT_kdefined in (2.4) has a density, we have∆z_s∆ ˇL_s = 0a.s. for anys ≥0. Therefore, we can rewrite the differential (7.6) as

dI_t2(f) =2I_t−(f)dIt(f) +%21%ˇ 2_f2₍_t_)d_t +%2₃d X 0≤s≤t f2(s)(∆ ˇL_s)2+%2₂d X 0≤s≤t f2(s)(∆z_s)2. (7.7)

From Lemma6.2it follows that EI_t2(f) =%2₁ Z t 0 f2(s)ds+%2₂ Z t 0 f2(s)ρ(s)ds . Therefore, putting e I_t(f) =I_t2(f)−EI_t2(f), (7.8)

(27)

we obtain dIe_t(f) = 2I_t₋(f)f(t)dξ_t+f2(t)dm_e_t, m_e_t=%₃2mˇ_t+%2₂m_t, wheremˇ_t=P 0≤s≤t(∆ ˇLs) 2₋_t_and_m t= P 0≤s≤t(∆zs) 2₋Rt 0 ρ(s)ds. For any

non-random vectorx= (x_j)₁_≤_j_≤_nwithPn

j=1x 2 j ≤1, we set ¯ I_t(x) = n X j=1 x_jIe_t(φ_j). (7.9) Denoting A_t(x) = n X j=1 x_jI_t(φ_j)φ_j(t) and B_t(x) = n X j=1 x_jφ2_j(t), (7.10)

we get the following stochastic differential equation for (7.9)

d ¯I_t(x) = 2A_t₋(x)dξ_t+B_t(x)dm_e_t, I¯₀(x) = 0.

Applying the Ito’s formula one obtains EI¯_n2(x) =2E Z n 0 ¯ I_t₋(x)d ¯I_t(x) + 4%2₁%ˇ2E Z n 0 A2_t(x)dt +%2₃EDˇ_n(x) +%2₂ED_n(x), (7.11) whereDˇ_n(x) =P 0≤t≤n 2At−(x)∆ ˇLt+% 2 3Bt(x)(∆ ˇLt)2 2 and D_n(x) = P+∞ k=1 2A_T k−(x)Yk+%2BTk−(x)Y 2 k 2 1_{_T

k≤n}. Let us now show

that E Z n 0 ¯ I_t₋(x)dI¯_t(x) ≤2%4₂φ3_maxkΥk₁n2. (7.12)

To this end, note that

Z n 0 ¯ I_t₋(x)d ¯I_t(x) =2 X 1≤j,l≤n x_jx_l Z n 0 e I_t₋(φ_j)I_t₋(φ_l)φ_l(t)dξ_t + n X j=1 x_j Z n 0 e I_t−(φj)Bt(x)dmet.

(28)

Using here Lemma6.5, we getE R₀n Ie_t₋(φ_j)I_t₋(φ_i)φ_i(t)dξ_t= 0. Moreover, the

process( ˇm_t)_t≥0is a martingale, i.e.E

Rn 0 Iet−(φj)Bt(x)dmet= 0. Therefore, E Z n 0 ¯ I_t−(x)dI¯t(x) =%22 n X j=1 x_jE Z n 0 e I_t−(φj)Bt(x)dmt.

Taking into account here that for any non-random bounded functionf

E Z n 0 f(t)dm_t= 0, we obtainERn 0 Iet−(φj)Bt(x) dmt = E Rn 0 I 2 t−(φj)Bt(x) dmt. So, Lemma6.4 yields E Z n 0 e I_t−(φj)Bt(x)dmt = n X l=1 x_lE Z n 0 I_t2₋(φ_j)φ2_l(t)dm_t ≤ 2%2₂φ3_maxkΥk₁ n X l=1 |x_l|n . Therefore, E Z n 0 ¯ I_t₋(x)dI¯_t(x) ≤2%4₂φ3_maxkΥk₁n X 1≤l,j≤n |x_l| |x_j| = 2%4₂φ3_maxkΥk₁n n X l=1 |x_l| !2 .

Taking into account here that Pn

l=1 |xl| 2 ≤ nP l≥1 x 2 l ≤n, we obtain (7.12).

Reminding thatΠ(x2) = 1we can calculate directly that

EDˇ_n(x) = 4E Z n 0 A2_t(x)dt+%4₃Π(x4) Z n 0 B_t2(x)dt . (7.13)

(29)

Note that, thanks to Lemma6.1, we obtain that E Z n 0 A2_t(x)dt=X i,j x_ix_j Z n 0 φ_i(t)φ_j(t)EI_tφ_i(t)I_tφ_j(t)dt =X i,j x_ix_j Z n 0 φ_i(t)φ_j(t) Z t 0 φ_i(v)φ_j(v)(%2₁+%2₂ρ(v))dv = 1 2% 2 1 X i,j x_ix_j Z n 0 φ_i(t)φ_j(t)dt 2 +%2₂A₁_,n(x) = n 2 2 % 2 1+% 2 2A1,n(x), whereA₁_,n(x) =P i,jxixj Rn 0 φi(t)φj(t) Rt 0 φi(v)φj(v)ρ(v)dv dt. This term can be estimated through Proposition5.2as

A₁_,n(x) = n2 2ˇτ + X i,j x_ix_j Z n 0 φ_i(t)φ_j(t) Z t 0 φ_i(v)φ_j(v) Υ(v)dv dt ≤ n 2 2ˇτ +n φ 4 maxkΥk1 X i,j |x_i||x_j| ≤ 1 2ˇτ + φ 4 maxkΥk1 n2.

So, reminding thatσ_Q=%2₁+%₂2/τˇand thatφ_max≥1, we obtain that

E Z n 0 A2_t(x)dt≤σQ 2 + φ 4 maxkΥk1 n2 ≤ 1 4+kΥk1 φ4_max(1 +σ_Q2)n2. (7.14)

Taking into account that

sup t≥0 B_t2(x) ≤ φ4_max   n X j=1 |x_j|   2 ≤ φ4_maxn , (7.15)

thatφ_max ≥1,and that%4₁ ≤σ2_Qwe estimate the expectation in (7.13) as

(30)

Moreover, taking into account that the random variable Y_k is independent of

A_T

k−(x)and of the field

G =σ{T_j, j ≥ 1}and thatEA_T k−(x) |G = 0, we get E +∞ X k=1 B_T k−(x)ATk−(x)Y 3 k1{Tk≤n}= +∞ X k=1 EEB_T k−(x)ATk−(x)Y 3 k1{Tk≤n}|G =EY₁3E +∞ X k=1 B_T k−(x)1{Tk≤n}E(ATk−(x) |G) = 0. Therefore, EDn(x) =%2₂EY14D1,n(x) + 4D2,n(x), (7.17) where D₁_,n(x) = +∞ X k=1 EB_T2 k−(x)1{Tk≤n} and D2,n(x) = +∞ X k=1 EA2_T k− (x)1_{_T k≤n}.

Using the bound (7.15) we can estimate the termD₁_,nasD₁_,n(x)≤φ4_maxnEN_n. Using here Corollary5.3, we obtain

D₁_,n(x)≤ |ρ|_∗φ4_maxn2. (7.18) Now, to estimate the last term in (7.17), note that the processA_t(x)can be rewritten as A_t(x) = Z t 0 Qx(t, s)dξs, with Qx(t, s) = n X j=1 x_jφ_j(s)φ_j(t). (7.19)

Applying Lemma6.3again, we obtain for anyk≥1

E A2_T k− (x)|G=%2₁ Z T_k 0 Q2_x(T_k, s)ds+%2₂ k−1 X j=1 Q2_x(T_k, T_j).

So, we can represent the last term in (7.17) as

D₂_,n=%2₁D(1)₂_,n+%2₂D(2)₂_,n, (7.20) where D(1)₂_,n = +∞ X k=1 E 1_{_T k≤n} Z T_k 0 Q2_x(T_k, s)ds

(31)

and D₂(2)_,n= +∞ X k=1 E 1{T_k≤n} k−1 X j=1 Q2_x(T_k, T_j).

Thanks to Proposition5.2we obtain

D(1)₂_,n = Z n 0 Z t 0 Q2_x(t, s)ds ρ(t) dt ≤ |ρ|_∗ Z n 0 Z n 0 Q2_x(t, s)dsdt .

In view of the definition ofQ_xin (7.19), we can rewrite the last integral as

Z n 0 Q2_x(t, s)ds= X 1≤i,j≤n x_ix_jφ_i(t)φ_j(t) Z n 0 φ_i(s)φ_j(s) ds = n X i=1 x2_i φ2_i(t) Z n 0 φ2_i(s) ds= n n X i=1 x2_i φ2_i(t). SincePn j=1 x 2 j ≤1, we obtain that, Z n 0

Q_x2(t, s)ds ≤ φ2_maxn and D₂(1)_,n ≤φ2_max|ρ|_∗n2. (7.21)

Let us estimate now the last term in (7.20). First, note that we can represent this term as D₂(2)_,n= +∞ X k=1 E 1_{_T k≤n} k−1 X j=1 Q2_x(T_k, T_j) = ∞ X j=1 1_{_T j≤n}G(Tj) = Z n 0 G(t)ρ(t)dt , where G(t) = +∞ X k=1 E 1_{_T k≤n}Q 2 x((t+Tk), t) = Z n 0 Q2_x(t+v, t)ρ(v)dv = Z n+t t Q2_x(u, t)ρ(u−t)du .

It is clear that, for any0≤t≤n,

Z n+t t Q2_x(u, u−t)ρ(u) du ≤ |ρ|_∗ Z 2n 0 Q2_x(v, t) dv .

(32)

In view of the inequality (7.21) we obtain Z 2n 0 Q2_x(u, t) du= Z 2n 0 Q2_x(t, u) du ≤2φ2_maxn . Therefore, max 0≤t≤nG(t)

≤ 2|ρ|_∗φ2_maxn and D₂(2)_,n≤ 2|ρ|2_∗φ2_maxn2.

So, estimating%2₂ by τ σˇ _Q and taking into account thatEY₁4 ≥ 1,we obtain that we obtain that

EDn(x)≤13 (1 + ˇτ)φ4_max EY₁4(1 +|ρ|2_∗)n2σ_Q.

Using all these bound in (7.11) we obtain (7.5) and thus the conclusion follows.2

Remark 7.1. The properties(7.2)and(7.5)are used to obtain the oracle

inequal-ities given in Section4(see, for example, [14]).

8 Simulation

In this section we report the results of a Monte Carlo experiment in order to assess the performance of the proposed model selection procedure (3.15). In (1.1) we chose a1-periodic function which is defined, for0≤t≤1,as

S(t) =tsin(2πt) +t2(1−t) cos(4πt). (8.1) We simulate the model

dy_t=S(t)dt+ dξ_t,

whereξt= 0.5dwt+ 0.5dzt.

Herez_t is the semi-Markov process defined in (2.3) with a GaussianN(0,1)

sequence(Y_j)_j≥1and(τk)k≥1used in (2.4) taken asτk ∼χ2₃.

We use the model selection procedure (3.15) with the weights (3.20) in which

k∗ = 100 +pln(n), t_i = i/ln(n),m = [ln2(n)]andδ = (3 + ln(n))−2_{. We}

define the empirical risk as R= 1 p p X j=1 ˆ ESb_n(t_j)−S(t_j) 2 , (8.2)

where the observation frequencyp = 100001and the expectation was taken as an average overN = 10000replications, i.e.,

ˆ E b Sn(.)−S(.) 2 = 1 N N X l=1 b S_nl(.)−S(.) 2 .

(33)

n R R∗

20 0.04430 0.235

100 0.01290 0.068

200 0.00812 0.043

1000 0.00196 0.010

Table 1: Empirical risks

We set the relative quadratic risk as

R∗=R/||S||2_p, with ||S||2_p = 1 p p X j=0 S2(tj). (8.3) In our case||S||2 p= 0.1883601.

Table1gives the values for the sample risks (8.2) and (8.3) for different num-bers of observationsn.

Figures 1–4 show the behaviour of the regression function and its estimates by the model selection procedure (3.15) depending on the values of observation periodsn. The black full line is the regression function (8.1) and the red dotted line is the associated estimator.

(34)

0.0 0.2 0.4 0.6 0.8 1.0 − 1. 0 − 0. 50 .00 .5

Figure 1: Estimator ofSforn= 20

0.0 0.2 0.4 0.6 0.8 1.0 − 1. 0 − 0. 50 .00 .5

(35)

0.0 0.2 0.4 0.6 0.8 1.0 − 1. 0 − 0. 50 .00 .5

Figure 3: Estimator ofSforn= 200

0.0 0.2 0.4 0.6 0.8 1.0 − 1. 0 − 0. 50 .00 .5

(36)

Remark 8.1. From numerical simulations of the procedure (3.15) with various

observation numbersnwe may conclude that the quality of the proposed

proce-dure: (i) is good for practical needs, i.e. for reasonable (non large) number of observations; (ii) is improving as the number of observations increases.

9 Proofs

We will prove here most of the results of this paper.

9.1 Proof of Theorem4.1

First, note that we can rewrite the empirical squared error in (3.9) as follows

Errn(λ) =Jn(λ) + 2

∞

X

j=1

λ(j)ˇθ_j,n+||S||2−δPn(λ), (9.1)

whereθˇ_j,n =θe_j,n−θ_jθb_j,n. Using the definition ofθe_j,nin (3.10) we obtain that

ˇ θ_j,n = √1 nθjξj,n+ 1 nξej,n+ 1 nςj,n+ σ_Q−σ_b_n n , whereς_j,n=E_Qξ_j,n2 −σ_Qandξe_j,n =ξ2 j,n−EQξj,n2 . Putting M(λ) = √1 n n X j=1 λ(j)θ_jξ_j,n and P_n0= σQ|λ| 2 n , (9.2) we can rewrite (9.1) as Errn(λ) =Jn(λ) + 2 σ_Q−σ_b_n n Lˇ(λ) + 2M(λ) + 2 nB1,Q,n(λ) + 2pP0 n(λ) B₂_,Q,n(e(λ) √ σ_Qn +kSk 2₋_ρP n(λ), (9.3)

wheree(λ) =λ/|λ|, the functionLˇ(·)is defined in (3.8) and the functionsB₁_,Q,n(·)

(37)

Letλ0 = (λ0(j))1≤j≤nbe a fixed sequence inΛandbλbe as in (3.16).

Substi-tutingλ0andλbin Equation (9.3), we obtain

Errn(λb)−Err_n(λ₀) =J(λb)−J(λ₀) + 2 σ_Q−σ_b_Q n Lˇ($) + 2 nB1,Q,n($) + 2M($) + 2 q P0 n(bλ) B₂_,Q,n(eb) √ σ_Qn −2 q P0 n(λ0) B₂_,Q,n(e0) √ σ_Qn −δPn(bλ) +δPn(λ₀), (9.4)

where$=bλ−λ₀,_be=e(λb)ande₀=e(λ₀). Note that, by (3.8),

|Lˇ(ˆx)| ≤ Lˇ(ˆλ) + ˇL(λ)≤2|Λ|_∗.

Applying the inequality

2|ab| ≤δa2+δ−1b2 (9.5)

implies that, for anyλ∈Λ,

2pP0 n(λ) |B₂_,Q,n(e(λ))| √ σ_Qn ≤ δP 0 n(λ) + B₂2_,Q,n(e(λ)) δσ_Qn .

Taking into account the bound (7.2), we get

Errn(ˆλ)≤Errn(λ0) + 2M($) + 2C₁_,Q,n n + 2B₂∗_,Q,n δσ_Qn + 1 n|σb−σQ|(|bλ| 2₊_|_λ 0|2) + 2δPn(λ0),

where B₂∗_,Q,n = sup_λ_∈_ΛB2₂_,Q,n((e(λ)). Moreover, noting that in view of (3.8)

sup_λ_∈_Λ|λ|2 _{≤ |}_Λ_|

∗, we can rewrite the previous bound as

Errn(bλ)≤Err_n(λ₀) + 2M($) + 2C₁_,Q,n n + 2B₂∗_,Q,n δσ_Qn +4|Λ|∗ n |bσ−σQ|+ 2δPn(λ0). (9.6)

To estimate the second term in the right side of this inequality we set

S_x=

n

X

j=1

(38)

Thanks to (2.6) we estimate the termM(x)for anyx∈_Rn_as E_QM2(x)≤_κ_Q1 n n X j=1 x2(j)θ2_j =κQ 1 n||Sx|| 2_. _(9.7)

To estimate this function for a random vectorx∈_Rn_{we set}

Z∗ = sup

xεΛ1

nM2(x)

||Sx||2

, Λ1= Λ−λ0.

So, through Inequality (9.5), we get

2|M(x)| ≤δ||Sx||2+ Z∗

nδ. (9.8)

It is clear that the last term here can be estimated as E_QZ∗ ≤ X x∈Λ1 nE_QM2(x) ||Sx||2 ≤ X x∈Λ1 κQ=κQˇι , (9.9)

whereˇι=card(Λ). Moreover, note that, for anyx∈Λ1,

||Sx||2− ||Sxb ||2= n X j=1 x2(j)(θ2_j −θb2 j)≤ −2M1(x), (9.10) whereM₁(x) = n−1/2 Pn j=1 x 2₍_j₎_θ

jξj,n. Taking into account that, for anyx ∈

Λ1the components|x(j)| ≤1, we can estimate this term as in (9.7), i.e.,

E_QM₁2(x)≤κQ

||Sx||2

n .

Similarly to the previous reasoning we set

Z₁∗ = sup xεΛ1 nM₁2(x) ||Sx||2 and we get E_QZ₁∗≤κQˇι . (9.11)

Using the same type of arguments as in (9.8), we can derive

2|M1(x)| ≤δ||Sx||2+

Z₁∗

(39)

From here and (9.10), we get ||Sx||2≤ ||Sxb || 2 1−δ + Z₁∗ nδ(1−δ) (9.13)

for any0< δ <1. Using this bound in (9.8) yields

2M(x)≤ δ||Sxb || 2

1−δ +

Z∗+Z₁∗ nδ(1−δ).

Taking into account thatkSb_$k2 ≤2 (Err_n(bλ) +Err_n(λ₀)), we obtain

2M($)≤ 2δ(Errn(bλ) +Errn(λ0))

1−δ +

Z∗+Z₁∗ nδ(1−δ).

Using this bound in (9.6) we obtain

Errn(bλ)≤ 1 +δ 1−3δErrn(λ0) + Z∗+Z₁∗ nδ(1−3δ) + 2C₁_,Q,n n(1−3δ) + 2B₂∗_,Q,n δ(1−3δ)σ_Qn +(4|Λ|∗+ 2) n(1−3δ) |σb−σQ|+ 2δ (1−3δ)P 0 n(λ0).

Moreover, for0< δ <1/6,we can rewrite this inequality as

Errn(λb)≤ 1 +δ 1−3δErrn(λ0) + 2(Z∗+Z₁∗) nδ + 4C₁_,Q,n n + 4B₂∗_,Q,n δσ_Qn +(8|Λ|∗+ 2) n |σbn−σQ|+ 2δ (1−3δ)P 0 n(λ0).

In view of Proposition7.2we estimate the expectation of the termB₂∗_,Q,nin (9.6) as

E_QB₂∗_,Q,n ≤X

λ∈Λ

E_QB₂2_,Q,n(e(λ))≤ˇιC₂_,Q,n.

Taking into account that|Λ|_∗ ≥1, we get

R(Sb∗, S)≤ 1 +δ 1−3δR(Sbλ0, S) + 4κQˇι nδ + 4C₁_,Q,n n + 4ˇιC₂_,Q,n δσ_Qn +10|Λ|∗ n EQ|bσ−σQ|+ 2δ (1−3δ)P 0 n(λ0).

Using the upper bound forPn(λ0)in LemmaA.1, one obtains (4.1), that finishes

(40)

9.2 Proof of Proposition4.2

We use here the same method as in [11]. First of all note that the definition (3.12) implies that b t_j,n=t_j+√1 nηj,n, (9.14) where t_j = Z 1 0 S(t)T r_j(t)dt and η_j,n= √1 n Z n 0 Tr_j(t) dξ_t. So, we have b σ_n= n X j=[√n]+1 t2_j + 2M_n+ 1 n n X j=[√n]+1 η_j,n2 , (9.15) where M_n= √1 n n X j=[√n]+1 t_jη_j,n.

Note that, for continuously differentiable functions (see, for example, Lemma A.6 in [11]), the Fourier coefficients(t_j)satisfy the following inequality, for anyn≥1,

∞ X j=[√n]+1 t2_j ≤ 4R1 0 |S˙(t)|dt 2 √ n ≤ 4kS˙k2 √ n . (9.16)

In the same way as in (9.7) we estimate the termM_n, i.e.,

E_QM_n2≤ κQ n n X j=[√n]+1 t2_j ≤ 4κQk ˙ Sk2 n√n ,

while the absolute value of this term forn≥1can be estimated as

|E_QM_n| ≤ κQ+k

˙

Sk2 √

n .

Moreover, using Propositions7.1and7.2 we can represent the last term in (9.15) as 1 n n X j=[√n]+1 η2_j,n= σQ(n− √ n) n + B₁_,Q,n(x0) n + B₂_,Q,n(x00) √ n ,

(41)

withx0_j =1{√n<j≤n}andx00_j =1{√n<j≤n}/ √ n. Therefore, E_Q 1 n n X j=[√n]+1 η2_j,n−σ_Q ≤ √σQ n + C₁_,Q,n n + p C₂_,Q,n √ n .

Taking into account thatC₂_,Q,n ≥ 1,we obtain the bound (4.5) and hence the

desired result. 2

First note, that in view of (3.22) and (3.18)

lim n→∞ ˇ ι nˇ = lim_n_→∞ k∗m nˇ = 0 for any ˇ >0.

Furthermore, the bound (3.23) and the conditions (2.10) and (3.18) yield

lim

n→∞ |Λ|_∗

n1/3+ˇ = 0 for any >ˇ 0.

So, from here we obtain the convergence (4.10). 2

First, we denote byQ₀the distribution of the noise (1.2) and (2.1) with the param-eter%₁ =ς∗,%ˇ= 1and%₂ = 0, i.e. the distribution for the “signal + white noise” model. So, we can estimate as below the robust risk

R∗_n(Se_n, S)≥ R_Q

0(Sen, S).

Now Theorem 6.1 from [12] yields the lower bound (4.14). Hence this finishes the

proof. 2

9.5 Proof of Proposition4.6

Puttingλ₀(j) = 0forj ≥nwe can represent the quadratic risk for the estimator (3.7) as kSb_λ 0−Sk 2₌ ∞ X j=1 (1−λ0(j))2θ_j2−2Hn+ 1 n n X j=1 λ2₀(j)ξ2_j,n,

(42)

whereHn = n−1/2 Pn

j=1(1−λ0(j))λ0(j)θjξj,n. Note thatEQHn = 0for any Q∈Qn, therefore, E_QkSb_λ 0 −S k 2₌ ∞ X j=1 (1−λ0(j))2θ_j2+ 1 nEQ n X j=1 λ2₀(j)ξ_j,n2 .

Proposition7.1and the last inequality in (2.9) imply that for anyQ∈ Q_n

EQ n X j=1 λ2₀(j)ξ_j,n2 ≤ς∗ n X j=1 λ2₀(j) +φ 2 maxς ∗_k_Υ_k 1 ˇ τ :=ς ∗ n X j=1 λ2₀(j) +C∗₁_,n. Therefore, R∗_n(Sb_λ 0, S)≤ ∞ X j=j∗ (1−λ0(j))2θj2+ 1 υ_n n X j=1 λ2₀(j) + C ∗ 1,n n ,

wherej∗andυnare defined in (3.20). Setting

Υ₁_,n(S) =υ2_nk/(2k+1) ∞ X j=j∗ (1−λ0(j))2θ2j and Υ2,n= 1 υn1/(2k+1) n X j=1 λ2₀(j),

we rewrite the last inequality as

υ_n2k/(2k+1)R_n∗(Sλb

0, S)≤Υ1,n(S) + Υ2,n+ ˇCn, (9.17)

whereCˇ_n =υ_n2k/(2k+1)C∗₁_,n/n. Note, that the conditions (2.10) and (4.8) imply thatC∗₁_,n = o(nδˇ) asn → ∞ for anyδ >ˇ 0; therefore, Cˇ_n → 0as n → ∞. Putting

u_n=υ2_nk/(2k+1)sup

j≥j∗

(1−λ0(j))2/aj,

withajdefined in (4.12), we estimate the first term in (9.17) as

sup S∈Wk r Υ₁_,n(S)≤ sup S∈Wk r u_n X j≥1 ajθj ≤u_nr.

Taking into account thata_j/(π2kj2k) → 1 asj → ∞andl₀ → rasε → 0and using the definition ofω_α

0 in (3.20), we obtain that lim sup n→∞ u_n≤ lim n→∞ υ 2k/(2k+1) n sup j≥j∗ (1−λ0(j))2 (π j)2k = lim n→∞ υ_n2k/(2k+1) π2k_ω2k α₀ = 1 π2k_(d kr)2k/(2k+1) .

(43)

Therefore, lim sup n→∞ sup S∈Wk r Υ1,n(S)≤ r1/(2k+1) π2k_(d k)2k/(2k+1) =: Υ∗₁. (9.18)

As to the second term in (9.17), note that

lim n→∞ 1 ω_α 0 n X j=1 λ2₀(j) = Z 1 0 (1−tk)2dt= 2k 2 (k+ 1)(2k+ 1).

So, taking into account thatω_α

0/υ 1/(2k+1) n → (dkr)1/(2k+1)asn→ ∞, the limit ofΥ₂_,n can be calculated as lim n→∞Υ2,n= 2(d_kr)1/(2k+1)k2 (k+ 1)(2k+ 1) =: Υ ∗ 2.

Moreover, sinceΥ∗₁+ Υ∗₂=:r∗_k, we obtain

lim n→∞υ 2k/(2k+1) n sup S∈Wk r R∗_n(Sb_λ 0, S)≤r ∗ k

and get the desired result. 2

Combining Proposition4.6and Theorem4.4yields Theorem4.6. 2

Acknowledgments. This research was partially supported by the Ministry of

Education and Science of the Russian Federation, project (No 2.3208.2017/PCH), by the Russian Federal Professor program of the Ministry of Education and Science of the Russian Federation, (project No 1.472.2016/FPM) and by the Academic D.I. Mendeleev Fund Program of the Tomsk State University (research project NU 8.1.55.2015 L).

10 Appendix

A.1 Property of the penalty term

Lemma A.1. For anyn≥ 1andλ∈Λ,

P_n0(λ)≤E_QErr_n(λ) +C1,Q,n

n ,

where the coefficientP0