Robust adaptive efficient estimation for semi-Markov nonparametric regression models

HAL Id: hal-02334912

https://hal.archives-ouvertes.fr/hal-02334912

Submitted on 27 Oct 2019



Vlad Stefan Barbu, Slim Beltaief, Serguei Pergamenshchikov

To cite this version:

Vlad Stefan Barbu, Slim Beltaief, Serguei Pergamenshchikov. Robust adaptive efficient estimation for semi-Markov nonparametric regression models. Statistical Inference for Stochastic Processes, Springer Verlag, 2019, 22 (2), pp. 187-231. ⟨hal-02334912⟩


Vlad Stefan Barbu†, Slim Beltaief‡ and Serguei Pergamenshchikov§

Abstract

We consider the nonparametric robust estimation problem for regression models in continuous time with semi-Markov noises. An adaptive model selection procedure is proposed. Under general moment conditions on the noise distribution a sharp non-asymptotic oracle inequality for the robust risks is obtained and the robust efficiency is shown. It turns out that for semi-Markov models the robust minimax convergence rate may be faster or slower than the classical one.

MSC: primary 62G08, secondary 62G05

Keywords: Non-asymptotic estimation; Robust risk; Model selection; Sharp oracle inequality; Asymptotic efficiency.

This work was done under partial financial support of the grant of RSF number 14-49-00079 (National Research University “MPEI” 14 Krasnokazarmennaya, 111250 Moscow, Russia) and of the RFBR Grant 16-01-00121.

Laboratoire de Mathématiques Raphaël Salem, UMR 6085 CNRS-Université de Rouen, France, e-mail: [email protected]

Laboratoire de Mathématiques Raphaël Salem, UMR 6085 CNRS-Université de Rouen, France, e-mail: [email protected]

§ Laboratoire de Mathématiques Raphaël Salem, UMR 6085 CNRS-Université de Rouen, France and International Laboratory of Statistics of Stochastic Processes and Quantitative Finance of National Research Tomsk State University, e-mail: [email protected]

1 Introduction

Let us consider a regression model in continuous time

$$dy_t = S(t)\,dt + d\xi_t, \quad 0 \le t \le n, \tag{1.1}$$

where $S(\cdot)$ is an unknown 1-periodic function from $L_2[0,1]$ defined on $\mathbb{R}$ with values in $\mathbb{R}$, the noise process $(\xi_t)_{t\ge 0}$ is defined as

$$\xi_t = \varrho_1 L_t + \varrho_2 z_t, \tag{1.2}$$

where $\varrho_1$ and $\varrho_2$ are unknown coefficients, $(L_t)_{t\ge 0}$ is a Lévy process and the pure jump process $(z_t)_{t\ge 0}$, defined in (2.3), is assumed to be a semi-Markov process (see, for example, [2]).

The problem is to estimate the unknown function $S$ in the model (1.1) on the basis of the observations $(y_t)_{0\le t\le n}$. This problem was first considered in the framework of the "signal+white noise" models (see, for example, [6] or [24]). Later, in order to study dependent observations in continuous time, "signal+color noise" regressions based on Ornstein-Uhlenbeck processes were introduced (cf. [8], [9], [10], [13]).

Moreover, to include jumps in such models, the papers [14] and [15] used non-Gaussian Ornstein-Uhlenbeck processes introduced in [1] for modeling risky assets in stochastic volatility financial markets. Unfortunately, the dependence of stable Ornstein-Uhlenbeck type processes decreases at a geometric rate. So, asymptotically, when the duration of observations goes to infinity, we obtain very quickly the same "signal+white noise" model.

The main goal of this paper is to consider continuous time regression models with dependent observations for which the dependence does not disappear for a sufficiently large duration of observations. To this end we define the noise in the model (1.1) through a semi-Markov process which keeps the dependence for any duration $n$. This type of model makes it possible, for example, to estimate signals observed under long impulse noise impact with a memory or "against signals".

In this paper we use the robust estimation approach introduced in [14] for such problems. To this end, we denote by $Q$ the distribution of $(\xi_t)_{0\le t\le n}$ in the Skorokhod space $D[0,n]$. We assume that $Q$ is unknown and belongs to some distribution family $\mathcal{Q}_n$ specified in Section 4. In this paper we use the quadratic risk

$$\mathcal{R}_Q(\widetilde S_n, S) = \mathbf{E}_{Q,S}\,\|\widetilde S_n - S\|^2, \tag{1.3}$$

where $\|f\|^2 = \int_0^1 f^2(s)\,ds$ and $\mathbf{E}_{Q,S}$ is the expectation with respect to the


the noise distribution $Q$ is unknown, it seems reasonable to introduce the robust risk of the form

$$\mathcal{R}^*_n(\widetilde S_n, S) = \sup_{Q\in\mathcal{Q}_n} \mathcal{R}_Q(\widetilde S_n, S), \tag{1.4}$$

which enables us to take into account the information that $Q\in\mathcal{Q}_n$ and ensures the quality of an estimate $\widetilde S_n$ for all distributions in the family $\mathcal{Q}_n$.

To summarize, the goal of this paper is to develop robust efficient model selection methods for the model (1.1) with the semi-Markov noise having unknown distribution, based on the approach proposed by Konev and Pergamenshchikov in [14] and [15] for continuous time regression models with semimartingale noises. Unfortunately, we cannot apply this method directly to semi-Markov regression models, since their tool essentially uses the fact that the Ornstein-Uhlenbeck dependence decreases at a geometric rate and the "white noise" case is reached sufficiently quickly.

Thus in the present paper we propose new analytical tools based on renewal methods to obtain the sharp non-asymptotic oracle inequalities. As a consequence, we obtain the robust efficiency for the proposed model selection procedures in the adaptive setting.

The rest of the paper is organized as follows. We start by introducing the main conditions in the next section. Then, in Section 3 we construct the model selection procedure on the basis of the weighted least squares estimates. The main results are stated in Section 4; here we also specify the set of admissible weight sequences in the model selection procedure. In Section 5 we derive some renewal results useful for obtaining other results of the paper. In Section 6 we develop stochastic calculus for semi-Markov processes. In Section 7 we study some properties of the model (1.1). A numerical example is presented in Section 8. Most of the results of the paper are proved in Section 9. In the Appendix some auxiliary propositions are given.

2 Main conditions

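In the model (1.2) we assume that the Lévy process $L_t$ is defined as

$$L_t = \check\varrho\, w_t + \sqrt{1-\check\varrho^2}\,\check L_t, \qquad \check L_t = x * (\mu - \widetilde\mu)_t, \tag{2.1}$$

where $0\le\check\varrho\le 1$ is an unknown constant, $(w_t)_{t\ge 0}$ is a standard Brownian motion, $\mu(ds,dx)$ is the jump measure with the deterministic compensator $\widetilde\mu(ds\,dx) = ds\,\Pi(dx)$, where $\Pi(\cdot)$ is some positive measure on $\mathbb{R}$ (see, for example, [7, 3] for details) for which we assume that

$$\Pi(x^2) = 1 \quad\text{and}\quad \Pi(x^8) < \infty, \tag{2.2}$$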

where we use the usual notation $\Pi(|x|^m) = \int_{\mathbb{R}} |z|^m\,\Pi(dz)$ for any $m>0$. Note that $\Pi(\mathbb{R})$ may be equal to $+\infty$. Moreover, we assume that the pure jump process $(z_t)_{t\ge 0}$ in (1.2) is a semi-Markov process with the following form

$$z_t = \sum_{i=1}^{N_t} Y_i, \tag{2.3}$$

where $(Y_i)_{i\ge 1}$ is an i.i.d. sequence of random variables with $\mathbf{E}Y_i = 0$, $\mathbf{E}Y_i^2 = 1$ and $\mathbf{E}Y_i^4 < \infty$.

Here $N_t$ is a general counting process (see, for example, [19]) defined as

$$N_t = \sum_{k=1}^{\infty} \mathbf{1}_{\{T_k\le t\}} \quad\text{and}\quad T_k = \sum_{l=1}^{k}\tau_l, \tag{2.4}$$

where $(\tau_l)_{l\ge 1}$ is an i.i.d. sequence of positive integrable random variables with distribution $\eta$ and mean $\check\tau = \mathbf{E}\tau_1 > 0$. We assume that the processes $(N_t)_{t\ge 0}$ and $(Y_i)_{i\ge 1}$ are independent of each other and are also independent of $(L_t)_{t\ge 0}$.

Note that the process $(z_t)_{t\ge 0}$ is a special case of a semi-Markov process (see, e.g., [2] and [17]).
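The construction (2.3)-(2.4) can be illustrated by a short simulation. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, and the Gamma inter-arrival law (mean 1) and the symmetric $\pm 1$ jumps are choices made here for the example, not taken from the paper.

```python
import random

def simulate_semi_markov_jumps(n, sample_tau, sample_y, rng):
    """Return the jump times T_k <= n and the jump sizes Y_k of z_t on [0, n]."""
    times, sizes = [], []
    t = sample_tau(rng)              # T_1 = tau_1
    while t <= n:
        times.append(t)
        sizes.append(sample_y(rng))
        t += sample_tau(rng)         # T_{k+1} = T_k + tau_{k+1}
    return times, sizes

def z_at(t, times, sizes):
    """Value z_t = sum of Y_i over all jumps with T_i <= t."""
    return sum(y for T, y in zip(times, sizes) if T <= t)

rng = random.Random(0)
# Assumption for the example: Gamma(shape 2, scale 1/2) inter-arrival times,
# positive with mean 1 and not exponential, so N_t is not a Poisson process.
sample_tau = lambda r: r.gammavariate(2.0, 0.5)
# Assumption for the example: jumps +1 or -1 with equal probability,
# so that E Y = 0, E Y^2 = 1 and E Y^4 < infinity.
sample_y = lambda r: r.choice((-1.0, 1.0))

times, sizes = simulate_semi_markov_jumps(50.0, sample_tau, sample_y, rng)
```

Replacing `sample_tau` by an exponential sampler gives the Poisson case discussed in Remark 2.1.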

Remark 2.1. It should be noted that if $\tau_j$ are exponential random variables, then $(N_t)_{t\ge 0}$ is a Poisson process and, in this case, $(\xi_t)_{t\ge 0}$ is a Lévy process for which this model has been studied in [11], [12] and [14]. But, in the general case when the process (2.3) is not a Lévy process, this process has a memory and cannot be treated in the framework of semi-martingales with independent increments. In this case, we need to develop new tools based on renewal theory arguments, which we do in Section 5. These tools will be used intensively in the proofs of the main results of this paper.

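Note that for any function $f$ from $L_2[0,n]$, $f:[0,n]\to\mathbb{R}$, for the noise process $(\xi_t)_{t\ge 0}$ defined in (1.2), with $(z_t)_{t\ge 0}$ given in (2.3), the integral

$$I_n(f) = \int_0^n f(s)\,d\xi_s \tag{2.5}$$

is well defined with $\mathbf{E}_Q I_n(f) = 0$. Moreover, as it is shown in Lemma 6.2,

$$\mathbf{E}_Q I_n^2(f) \le \kappa_Q \int_0^n f^2(s)\,ds, \tag{2.6}$$

where $\kappa_Q = \varrho_1^2 + \varrho_2^2\,|\rho|_*$ and $|\rho|_* = \sup_{t\ge 0}|\rho(t)| < \infty$. Here $\rho$ is the density of the renewal measure $\check\eta$ defined as

$$\check\eta = \sum_{l=1}^{\infty}\eta^{(l)}, \tag{2.7}$$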

where $\eta^{(l)}$ is the $l$th convolution power of $\eta$. To study the series (2.7) we assume that the measure $\eta$ has a density $g$ which satisfies the following conditions.

H1) Assume that, for any $x\in\mathbb{R}$, there exist the finite limits

$$g(x-) = \lim_{z\to x-} g(z) \quad\text{and}\quad g(x+) = \lim_{z\to x+} g(z)$$

and, for any $K>0$, there exists $\delta=\delta(K)>0$ for which

$$\sup_{|x|\le K}\int_0^{\delta} \frac{|g(x+t)+g(x-t)-g(x+)-g(x-)|}{t}\,dt < \infty.$$

H2) For any $\gamma>0$,

$$\sup_{z\ge 0} z^{\gamma}\,|2g(z)-g(z-)-g(z+)| < \infty.$$

H3) There exists $\beta>0$ such that $\int_{\mathbb{R}} e^{\beta x} g(x)\,dx < \infty$.

Remark 2.2. It should be noted that the condition H3) means that the random variables $(\tau_j)_{j\ge 1}$ have an exponential moment, i.e. these random variables are not too large. This is a natural constraint since these random variables define the intervals between jumps, i.e., the frequency of the jumps. So, to study the influence of the jumps in the model (1.1) one needs to consider the noise process (1.2) with "small" intervals between jumps, that is, with a large jump frequency.

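For the next condition we need to introduce the Fourier transform of any function $f$ from $L_1(\mathbb{R})$, $f:\mathbb{R}\to\mathbb{R}$, defined as

$$\widehat f(\theta) = \frac{1}{2\pi}\int_{\mathbb{R}} e^{i\theta x} f(x)\,dx. \tag{2.8}$$

H4) There exists $t^*>0$ such that the function $\widehat g(\theta - it)$ belongs to $L_1(\mathbb{R})$ for any $0\le t\le t^*$.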

It is clear that Conditions H1)–H4) hold true for any continuously differentiable function $g$, for example for the exponential density.

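Now we define the family of the noise distributions for the model (1.1) which is used in the robust risk (1.4). Note that any distribution $Q$ from $\mathcal{Q}_n$ is defined by the unknown parameters in (1.2) and (2.1). We assume that

$$\varsigma_* \le \varrho_1^2 \le \varsigma^*, \quad 0\le\check\varrho\le 1 \quad\text{and}\quad \varsigma_* \le \sigma_Q \le \varsigma^*, \tag{2.9}$$

where $\sigma_Q = \varrho_1^2 + \varrho_2^2/\check\tau$, the unknown bounds $0<\varsigma_*\le\varsigma^*$ are functions of $n$, i.e. $\varsigma_*=\varsigma_*(n)$ and $\varsigma^*=\varsigma^*(n)$, such that for any $\check\epsilon>0$,

$$\lim_{n\to\infty} n^{\check\epsilon}\,\varsigma_*(n) = +\infty \quad\text{and}\quad \lim_{n\to\infty}\frac{\varsigma^*(n)}{n^{\check\epsilon}} = 0. \tag{2.10}$$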

Remark 2.3. As we will see later, the parameter $\sigma_Q$ is the limit for the Fourier transform of the noise process (1.2). Such a limit is called the variance proxy (see [14]).

Remark 2.4. Note that, generally (but not necessarily), the parameters $\varrho_1$ and $\varrho_2$ can depend on $n$. The conditions (2.10) mean that we consider all possible cases, i.e. these parameters may go to infinity, remain constant, or go to zero. See, for example, the conditions (3.32) in [15].

3 Model selection

Let $(\phi_j)_{j\ge 1}$ be an orthonormal uniformly bounded basis in $L_2[0,1]$, i.e., for some constant $\phi_*\ge 1$, which may depend on $n$,

$$\sup_{0\le j\le n}\ \sup_{0\le t\le 1} |\phi_j(t)| \le \phi_* < \infty. \tag{3.1}$$

We extend the functions $\phi_j(t)$ by periodicity, i.e., we set $\phi_j(t) := \phi_j(\{t\})$, where $\{t\}$ is the fractional part of $t\ge 0$. For example, we can take the trigonometric basis defined as $\mathrm{Tr}_1\equiv 1$ and, for $j\ge 2$,

$$\mathrm{Tr}_j(x) = \sqrt{2}\,\begin{cases}\cos(2\pi[j/2]x) & \text{for even } j;\\ \sin(2\pi[j/2]x) & \text{for odd } j,\end{cases} \tag{3.2}$$

where $[x]$ denotes the integer part of $x$.
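As a quick sanity check of (3.2), the following sketch evaluates the trigonometric basis and verifies orthonormality numerically. The function names and the midpoint quadrature with 2000 nodes are assumptions made for this illustration.

```python
import math

def trig_basis(j, x):
    """Tr_j(x) from (3.2): Tr_1 = 1, cosines for even j, sines for odd j >= 3."""
    if j < 1:
        raise ValueError("the index j starts at 1")
    if j == 1:
        return 1.0
    angle = 2.0 * math.pi * (j // 2) * x      # [j/2] is the integer part of j/2
    return math.sqrt(2.0) * (math.cos(angle) if j % 2 == 0 else math.sin(angle))

def inner_product(i, j, grid=2000):
    """Midpoint-rule approximation of the L2[0,1] inner product (Tr_i, Tr_j)."""
    h = 1.0 / grid
    return h * sum(trig_basis(i, (m + 0.5) * h) * trig_basis(j, (m + 0.5) * h)
                   for m in range(grid))
```

For low indices `inner_product(i, j)` is numerically close to 1 when `i == j` and close to 0 otherwise.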

To estimate the function $S$ we use here the model selection procedure for continuous time regression models from [14] based on the Fourier expansion. We recall that for any function $S$ from $L_2[0,1]$ we can write

$$S(t) = \sum_{j=1}^{\infty}\theta_j\phi_j(t) \quad\text{and}\quad \theta_j = (S,\phi_j) = \int_0^1 S(t)\phi_j(t)\,dt. \tag{3.3}$$

So, to estimate the function $S$ it suffices to estimate the coefficients $\theta_j$ and to replace them in this representation by their estimators. Using the fact that the functions $S$ and $\phi_j$ are 1-periodic we can write that

$$\theta_j = \frac{1}{n}\int_0^n \phi_j(t)S(t)\,dt.$$

If we replace here the differential $S(t)\,dt$ by the observed stochastic differential $dy_t$ we obtain the natural estimate for $\theta_j$ on the time interval $[0,n]$

$$\widehat\theta_{j,n} = \frac{1}{n}\int_0^n \phi_j(t)\,dy_t, \tag{3.4}$$

which can be represented, in view of the model (1.1), as

$$\widehat\theta_{j,n} = \theta_j + \frac{1}{\sqrt{n}}\,\xi_{j,n}, \qquad \xi_{j,n} = \frac{1}{\sqrt{n}}\,I_n(\phi_j). \tag{3.5}$$
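In practice the process $y$ is observed on a grid, so the integral in (3.4) has to be discretized. A minimal sketch, assuming increments of $y$ on an equispaced grid and a left-point Riemann sum (both assumptions of this illustration, as are the function names), is given below; the toy check uses the noise-free signal $S = 3\,\mathrm{Tr}_2$.

```python
import math

def trig_basis(j, x):
    """Trigonometric basis (3.2)."""
    if j == 1:
        return 1.0
    angle = 2.0 * math.pi * (j // 2) * x
    return math.sqrt(2.0) * (math.cos(angle) if j % 2 == 0 else math.sin(angle))

def fourier_coefficient_estimate(j, increments, n):
    """Riemann-sum version of (3.4): (1/n) * sum_i phi_j({t_i}) * (y_{t_{i+1}} - y_{t_i}).

    `increments` holds the increments of y on an equispaced grid of [0, n].
    """
    h = n / len(increments)
    # phi_j is extended by periodicity, hence the fractional part of t_i.
    return sum(trig_basis(j, (i * h) % 1.0) * dy
               for i, dy in enumerate(increments)) / n

# Toy check without noise: dy_t = S(t) dt with S = 3 * Tr_2.
n, m = 4, 4000
h = n / m
increments = [3.0 * trig_basis(2, (i * h) % 1.0) * h for i in range(m)]
theta2 = fourier_coefficient_estimate(2, increments, n)
theta3 = fourier_coefficient_estimate(3, increments, n)
```

Here `theta2` recovers the coefficient 3 and `theta3` is numerically zero, up to floating-point error.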

Now (see, for example, [6]) we can estimate the function $S$ by the projection estimators, i.e.

$$\widehat S_m(t) = \sum_{j=1}^{m}\widehat\theta_{j,n}\phi_j(t), \quad 0\le t\le 1, \tag{3.6}$$

for some number $m\to\infty$ as $n\to\infty$. It should be noted that Pinsker in [24] shows that the projection estimators of the form (3.6) are not efficient. To obtain efficient estimation one needs to use weighted least squares estimators defined as

$$\widehat S_\lambda(t) = \sum_{j=1}^{n}\lambda(j)\,\widehat\theta_{j,n}\phi_j(t), \tag{3.7}$$

where the coefficients $\lambda = (\lambda(j))_{1\le j\le n}$ belong to some finite set $\Lambda$ from $[0,1]^n$. As it is shown in [24], in order to obtain efficient estimators, the coefficients $\lambda(j)$ in (3.7) need to be chosen depending on the regularity of the unknown function $S$. In this paper we consider the adaptive case, i.e. we assume that the regularity of the function $S$ is unknown. In this case we choose the weight coefficients on the basis of the model selection procedure proposed in [14] for the general semi-martingale regression model in continuous time. These coefficients will be obtained later in (3.20). To this end, first we set

$$\check\iota = \#(\Lambda) \quad\text{and}\quad |\Lambda|_* = 1 + \max_{\lambda\in\Lambda}\check L(\lambda), \tag{3.8}$$

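where $\#(\Lambda)$ is the cardinal number of $\Lambda$ and $\check L(\lambda) = \sum_{j=1}^{n}\lambda(j)$. Now, to choose a weight sequence $\lambda$ in the set $\Lambda$ we use the empirical quadratic risk, defined as

$$\mathrm{Err}_n(\lambda) = \|\widehat S_\lambda - S\|^2,$$

which in our case is equal to

$$\mathrm{Err}_n(\lambda) = \sum_{j=1}^{n}\lambda^2(j)\,\widehat\theta_{j,n}^{\,2} - 2\sum_{j=1}^{n}\lambda(j)\,\widehat\theta_{j,n}\theta_j + \sum_{j=1}^{\infty}\theta_j^2. \tag{3.9}$$

Since the Fourier coefficients $(\theta_j)_{j\ge 1}$ are unknown, we replace the terms $\widehat\theta_{j,n}\theta_j$ by

$$\widetilde\theta_{j,n} = \widehat\theta_{j,n}^{\,2} - \frac{\widehat\sigma_n}{n}, \tag{3.10}$$

where $\widehat\sigma_n$ is an estimate for the variance proxy $\sigma_Q$ defined in (2.9). If it is known, we take $\widehat\sigma_n=\sigma_Q$; otherwise, we can choose it, for example, as in [14], i.e.

$$\widehat\sigma_n = \sum_{j=[\sqrt{n}]+1}^{n}\widehat t_{j,n}^{\,2}, \tag{3.11}$$

where $\widehat t_{j,n}$ are the estimators for the Fourier coefficients with respect to the trigonometric basis (3.2), i.e.

$$\widehat t_{j,n} = \frac{1}{n}\int_0^n \mathrm{Tr}_j(t)\,dy_t. \tag{3.12}$$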
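The estimate (3.11) is a plain sum of squared high-frequency coefficients. The sketch below shows only the index bookkeeping; the function name and the convention that `t_hat[j - 1]` stores $\widehat t_{j,n}$ are assumptions of this illustration.

```python
import math

def variance_proxy_estimate(t_hat):
    """sigma_hat_n from (3.11): sum of t_hat_{j,n}^2 over j = [sqrt(n)] + 1, ..., n.

    `t_hat` is a list of length n with t_hat[j - 1] = t_hat_{j,n}.
    """
    n = len(t_hat)
    first_j = math.isqrt(n) + 1        # [sqrt(n)] + 1, the first index kept
    return sum(t * t for t in t_hat[first_j - 1:])
```

For example, with nine coefficients all equal to 1 the sum runs over $j=4,\dots,9$ and returns 6.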

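Finally, in order to choose the weights, we will minimize the following cost function

$$J_n(\lambda) = \sum_{j=1}^{n}\lambda^2(j)\,\widehat\theta_{j,n}^{\,2} - 2\sum_{j=1}^{n}\lambda(j)\,\widetilde\theta_{j,n} + \delta P_n(\lambda), \tag{3.13}$$

where $\delta>0$ is some threshold which will be specified later and the penalty term is

$$P_n(\lambda) = \frac{\widehat\sigma_n|\lambda|^2}{n}. \tag{3.14}$$

We define the model selection procedure as

$$\widehat S_* = \widehat S_{\hat\lambda}, \tag{3.15}$$

where

$$\hat\lambda = \operatorname*{argmin}_{\lambda\in\Lambda} J_n(\lambda). \tag{3.16}$$

We recall that the set $\Lambda$ is finite so $\hat\lambda$ exists. In the case when $\hat\lambda$ is not unique, we take one of them.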
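The selection step (3.13)-(3.16) amounts to a minimization over a finite list of weight vectors. The sketch below is a toy illustration, not the authors' implementation: the function names are hypothetical, $|\lambda|^2$ is assumed to mean $\sum_j \lambda^2(j)$, and the vectors are truncated to two coordinates for readability.

```python
def cost(lam, theta_hat, sigma_hat, n, delta):
    """Cost J_n(lambda) from (3.13), assuming |lambda|^2 = sum_j lambda(j)^2."""
    theta_tilde = [th * th - sigma_hat / n for th in theta_hat]      # (3.10)
    quad = sum(l * l * th * th for l, th in zip(lam, theta_hat))
    cross = sum(l * tt for l, tt in zip(lam, theta_tilde))
    penalty = sigma_hat * sum(l * l for l in lam) / n                # (3.14)
    return quad - 2.0 * cross + delta * penalty

def select_weights(candidates, theta_hat, sigma_hat, n, delta):
    """Return a minimizer of J_n over the finite candidate set, as in (3.16)."""
    return min(candidates, key=lambda lam: cost(lam, theta_hat, sigma_hat, n, delta))

# Toy numbers, chosen only for illustration.
theta_hat = [1.0, 0.5]
candidates = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
best = select_weights(candidates, theta_hat, sigma_hat=1.0, n=10, delta=0.1)
```

With these numbers the three costs are 0, -0.79 and -0.83, so `best` is `(1.0, 1.0)`. When several minimizers exist, `min` returns the first one in the list, which is one admissible choice.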

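Let us now specify the weight coefficients $(\lambda(j))_{1\le j\le n}$. Consider, for some fixed $0<\varepsilon<1$, a numerical grid of the form

$$\mathcal{A} = \{1,\dots,k^*\}\times\{\varepsilon,\dots,m\varepsilon\}, \tag{3.17}$$

where $m=[1/\varepsilon^2]$. We assume that both parameters $k^*\ge 1$ and $\varepsilon$ are functions of $n$, i.e. $k^*=k^*(n)$ and $\varepsilon=\varepsilon(n)$, such that

$$\begin{cases}\lim_{n\to\infty}k^*(n) = +\infty, \quad \lim_{n\to\infty}\dfrac{k^*(n)}{\ln n} = 0,\\[2mm] \lim_{n\to\infty}\varepsilon(n) = 0 \quad\text{and}\quad \lim_{n\to\infty} n^{\check\delta}\varepsilon(n) = +\infty\end{cases} \tag{3.18}$$

for any $\check\delta>0$. One can take, for example, for $n\ge 2$

$$\varepsilon(n) = \frac{1}{\ln n} \quad\text{and}\quad k^*(n) = k^*_0 + \sqrt{\ln n}, \tag{3.19}$$

where $k^*_0\ge 0$ is some fixed constant and the threshold $\varsigma^*(n)$ is introduced in (2.9). For each $\alpha=(\beta,l)\in\mathcal{A}$, we introduce the weight sequence

$$\lambda_\alpha = (\lambda_\alpha(j))_{1\le j\le n}$$

with the elements

$$\lambda_\alpha(j) = \mathbf{1}_{\{1\le j<j_*\}} + \left(1-(j/\omega_\alpha)^{\beta}\right)\mathbf{1}_{\{j_*\le j\le\omega_\alpha\}}, \tag{3.20}$$

where $j_* = 1+[\ln\upsilon_n]$, $\omega_\alpha = (d_\beta\, l\,\upsilon_n)^{1/(2\beta+1)}$,

$$d_\beta = \frac{(\beta+1)(2\beta+1)}{\pi^{2\beta}\beta} \quad\text{and}\quad \upsilon_n = n/\varsigma^*.$$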

Now we define the set $\Lambda$ as

$$\Lambda = \{\lambda_\alpha,\ \alpha\in\mathcal{A}\}. \tag{3.21}$$

Note that in this case the cardinality of the set $\Lambda$ is

$$\check\iota = k^* m. \tag{3.22}$$

Moreover, taking into account that $d_\beta<1$ for $\beta\ge 1$ we obtain for the set (3.21)

$$|\Lambda|_* \le 1 + \sup_{\alpha\in\mathcal{A}}\omega_\alpha$$

Remark 3.1. Note that the form (3.20) for the weight coefficients in (3.7) was proposed by Pinsker in [24] for the efficient estimation in the nonadaptive case, i.e. when the regularity parameters of the function $S$ are known. In the adaptive case these weight coefficients are used in [14, 15] to show the asymptotic efficiency for model selection procedures.
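To make the grid (3.17) and the weights (3.20) concrete, here is a small sketch. The function names and the numerical values used in the example ($n=1000$, $\varsigma^*=1$, $\varepsilon=0.5$, $k^*=3$) are assumptions for illustration only; the formulas for $j_*$, $d_\beta$, $\omega_\alpha$ and $\upsilon_n$ follow the text.

```python
import math

def pinsker_weights(beta, l, n, varsigma_upper):
    """Weight sequence lambda_alpha(j), j = 1..n, from (3.20) for alpha = (beta, l)."""
    upsilon = n / varsigma_upper                       # upsilon_n = n / varsigma^*
    j_star = 1 + math.floor(math.log(upsilon))         # j_* = 1 + [ln upsilon_n]
    d_beta = (beta + 1) * (2 * beta + 1) / (math.pi ** (2 * beta) * beta)
    omega = (d_beta * l * upsilon) ** (1.0 / (2 * beta + 1))
    weights = []
    for j in range(1, n + 1):
        if j < j_star:
            weights.append(1.0)                        # low frequencies are kept as they are
        elif j <= omega:
            weights.append(1.0 - (j / omega) ** beta)  # smooth shrinkage up to omega_alpha
        else:
            weights.append(0.0)                        # high frequencies are dropped
    return weights

def weight_grid(k_star, eps):
    """Grid A = {1, ..., k*} x {eps, ..., m * eps} with m = [1 / eps^2], see (3.17)."""
    m = math.floor(1.0 / eps ** 2)
    return [(beta, i * eps) for beta in range(1, k_star + 1) for i in range(1, m + 1)]

w = pinsker_weights(beta=1, l=1.0, n=1000, varsigma_upper=1.0)
grid = weight_grid(k_star=3, eps=0.5)
```

In this example $j_*=7$ and $\omega_\alpha\approx 8.47$, so the first six weights equal 1, the weights for $j=7,8$ lie strictly between 0 and 1, and all remaining weights vanish. The grid has $k^* m = 12$ points, in agreement with (3.22).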

4 Main results

In this section we obtain in Theorem 4.3 the non-asymptotic oracle inequality for the quadratic risk (1.3) for the model selection procedure (3.15) and in Theorem 4.4 the non-asymptotic oracle inequality for the robust risk (1.4) for the same model selection procedure (3.15), considered with the coefficients (3.20). We give the lower and upper bounds for the robust risk in Theorems 4.5 and 4.7, and also the optimal convergence rate in Corollary 4.8.

Before stating the non-asymptotic oracle inequality, let us first introduce the following parameters which will be used for describing the remainder term in the oracle inequalities. For the renewal density $\rho$ defined in (2.7) we set

$$\Upsilon(x) = \rho(x) - \frac{1}{\check\tau} \quad\text{and}\quad \|\Upsilon\|_1 = \int_0^{+\infty}|\Upsilon(x)|\,dx, \tag{4.1}$$

where $\check\tau = \mathbf{E}\tau_1$. In Proposition 5.2 we show that $|\rho|_* = \sup_{t\ge 0}|\rho(t)| < \infty$ and $\|\Upsilon\|_1<\infty$. So, using this, we can introduce the following parameters

ΨQ= 4κQˇι+ 5 + 4ˇι σQ ! σQˇτ φ2maxkΥk1+φ4max(1 +σQ2)3ˇl (4.2) and c∗QQ+ 2κQ+σQτ φˇ 2maxkΥk1+φ4max(1 +σ 2 Q) 2ˇl, (4.3) whereˇl = (4ˇτ2 + 8)kΥk1 + 5 + 13(1 + ˇτ)2(1 +|ρ|2 ∗)(EY14) + 4Π(x 4). First,

let us state the non-asymptotic oracle inequality for the quadratic risk (1.3) for the model selection procedure (3.15).

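Theorem 4.1. Assume that Conditions H1)–H4) hold. Then, for any $n\ge 1$ and $0<\delta<1/6$, the estimator of $S$ given in (3.15) satisfies the following oracle inequality

$$\mathcal{R}_Q(\widehat S_*, S) \le \frac{1+3\delta}{1-3\delta}\,\min_{\lambda\in\Lambda}\mathcal{R}_Q(\widehat S_\lambda, S) + \frac{\Psi_Q + 10\,|\Lambda|_*\,\mathbf{E}_S|\widehat\sigma_n-\sigma_Q|}{n\delta}. \tag{4.4}$$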


Now we study the estimate (3.11).

Proposition 4.2. Assume that Conditions H1)–H4) hold and that the function $S(\cdot)$ is continuously differentiable. Then, for any $n\ge 2$,

$$\mathbf{E}_{Q,S}|\widehat\sigma_n-\sigma_Q| \le \frac{6\|\dot S\|^2 + c^*_Q}{\sqrt{n}}. \tag{4.5}$$

Theorem 4.1 and Proposition 4.2 imply the following result.

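Theorem 4.3. Assume that Conditions H1)–H4) hold and that the function $S$ is continuously differentiable. Then, for any $n\ge 1$ and $0<\delta\le 1/6$, the procedure (3.15), (3.11) satisfies the following oracle inequality

$$\mathcal{R}_Q(\widehat S_*, S) \le \frac{1+3\delta}{1-3\delta}\,\min_{\lambda\in\Lambda}\mathcal{R}_Q(\widehat S_\lambda, S) + \frac{60\,\widetilde\Lambda_n\|\dot S\|^2 + \widetilde\Psi_{Q,n}}{n\delta}, \tag{4.6}$$

where $\widetilde\Psi_{Q,n} = 10\,\widetilde\Lambda_n c^*_Q + \Psi_Q$ and $\widetilde\Lambda_n = |\Lambda|_*/\sqrt{n}$.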

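Remark 4.1. Note that the coefficient $\kappa_Q$ can be estimated as $\kappa_Q\le(1+\check\tau|\rho|_*)\sigma_Q$. Therefore, taking into account that $\phi^4_{\max}\ge 1$, the remainder term in (4.6) can be estimated as

$$\widetilde\Psi_{Q,n} \le C_*\left(1+\sigma_Q^6+\frac{1}{\sigma_Q}\right)(1+\widetilde\Lambda_n)\,\check\iota\,\phi^4_{\max}, \tag{4.7}$$

where $C_*>0$ is some constant which is independent of the distribution $Q$.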

Furthermore, let us study the robust risk (1.4) for the procedure (3.15). In this case, the distribution family $\mathcal{Q}_n$ consists of all distributions on the Skorokhod space $D[0,n]$ of the process (1.2) with the parameters satisfying the conditions (2.9) and (2.10).

Moreover, we assume also that the number of the weight vectors and the upper bound for the basis functions in (3.1) may depend on $n\ge 1$, i.e. $\check\iota=\check\iota(n)$ and $\phi_*=\phi_*(n)$, such that for any $\check\epsilon>0$

$$\lim_{n\to\infty}\frac{\check\iota(n)}{n^{\check\epsilon}} = 0 \quad\text{and}\quad \lim_{n\to\infty}\frac{\phi_*(n)}{n^{\check\epsilon}} = 0. \tag{4.8}$$

The next result presents the non-asymptotic oracle inequality for the robust risk (1.4) for the model selection procedure (3.15), considered with the coefficients (3.20).


Theorem 4.4. Assume that Conditions H1)–H4) hold and that the unknown function $S$ is continuously differentiable. Then, for the robust risk defined in (1.4) through the distribution family (2.9)–(2.10), the procedure (3.15) with the coefficients (3.20), for any $n\ge 1$ and $0<\delta<1/6$, satisfies the following oracle inequality

$$\mathcal{R}^*_n(\widehat S_*, S) \le \frac{1+3\delta}{1-3\delta}\,\min_{\lambda\in\Lambda}\mathcal{R}^*_n(\widehat S_\lambda, S) + \frac{\mathbf{U}^*_n(S)}{n\delta}, \tag{4.9}$$

where the sequence $\mathbf{U}^*_n(S)>0$ is such that, under the conditions (2.10), (3.18) and (4.8), for any $r>0$ and $\check\delta>0$,

$$\lim_{n\to\infty}\ \sup_{\|\dot S\|\le r}\frac{\mathbf{U}^*_n(S)}{n^{\check\delta}} = 0. \tag{4.10}$$

Now we study the asymptotic efficiency for the procedure (3.15) with the coefficients (3.20), with respect to the robust risk (1.4) defined by the distribution family (2.9)–(2.10). To this end, we assume that the unknown function $S$ in the model (1.1) belongs to the Sobolev ball

$$W^k_r = \Big\{f\in\mathcal{C}^k_{per}[0,1] : \sum_{j=0}^{k}\|f^{(j)}\|^2\le r\Big\}, \tag{4.11}$$

where $r>0$ and $k\ge 1$ are some unknown parameters, $\mathcal{C}^k_{per}[0,1]$ is the set of $k$ times continuously differentiable functions $f:[0,1]\to\mathbb{R}$ such that $f^{(i)}(0) = f^{(i)}(1)$ for all $0\le i\le k$. The function class $W^k_r$ can be written as an ellipsoid in $L_2[0,1]$, i.e.,

$$W^k_r = \Big\{f\in\mathcal{C}^k_{per}[0,1] : \sum_{j=1}^{\infty}a_j\theta_j^2\le r\Big\}, \tag{4.12}$$

where $a_j = \sum_{i=0}^{k}(2\pi[j/2])^{2i}$ and $\theta_j = \int_0^1 f(v)\mathrm{Tr}_j(v)\,dv$. We recall that the trigonometric basis $(\mathrm{Tr}_j)_{j\ge 1}$ is defined in (3.2).

Similarly to [14, 15] we will show here that the asymptotic sharp lower bound for the robust risk (1.4) is given by

$$r^*_k = \big((2k+1)r\big)^{1/(2k+1)}\left(\frac{k}{(k+1)\pi}\right)^{2k/(2k+1)}. \tag{4.13}$$

Note that this is the well-known Pinsker constant obtained for the nonadaptive filtration problem in the "signal + small white noise" model (see, for example, [24]). Let $\Pi_n$ be the set of all estimators $\widehat S_n$ measurable with respect to the $\sigma$-field $\sigma\{y_t, 0\le t\le n\}$ generated by the process (1.1).
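For orientation, the Pinsker constant (4.13) is easy to evaluate numerically. The helper below is a direct transcription of the formula; the function name is an assumption of this sketch.

```python
import math

def pinsker_constant(k, r):
    """Pinsker constant r*_k from (4.13) for smoothness k >= 1 and radius r > 0."""
    exponent = 1.0 / (2 * k + 1)
    return ((2 * k + 1) * r) ** exponent * (k / ((k + 1) * math.pi)) ** (2 * k * exponent)
```

For instance, `pinsker_constant(1, 1.0)` equals $3^{1/3}(2\pi)^{-2/3}$, and the constant grows with the radius $r$ of the Sobolev ball.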

The following two results give the lower and upper bounds for the robust risk in our case.

Theorem 4.5. Under Conditions (2.9) and (2.10),

$$\liminf_{n\to\infty}\ \upsilon_n^{2k/(2k+1)}\inf_{\widehat S_n\in\Pi_n}\ \sup_{S\in W^k_r}\mathcal{R}^*_n(\widehat S_n, S) \ge r^*_k, \tag{4.14}$$

where $\upsilon_n = n/\varsigma^*$.

Note that if the parameters $r$ and $k$ are known, i.e. for the non-adaptive estimation case, then to obtain efficient estimation for the "signal+white noise" model Pinsker in [24] proposed to use the estimate $\widehat S_{\lambda_0}$ defined in (3.7) with the weights (3.20) in which

$$\lambda_0 = \lambda_{\alpha_0} \quad\text{and}\quad \alpha_0 = (k, l_0), \tag{4.15}$$

where $l_0 = [r/\varepsilon]\varepsilon$. For the model (1.1)–(1.2) we show the same result.

Proposition 4.6. The estimator $\widehat S_{\lambda_0}$ satisfies the following asymptotic upper bound

$$\lim_{n\to\infty}\ \upsilon_n^{2k/(2k+1)}\sup_{S\in W^k_r}\mathcal{R}^*_n(\widehat S_{\lambda_0}, S) \le r^*_k.$$

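For the adaptive estimation we use the model selection procedure (3.15) with the parameter $\delta$ defined as a function of $n$ satisfying

$$\lim_{n\to\infty}\delta_n = 0 \quad\text{and}\quad \lim_{n\to\infty} n^{\check\delta}\,\delta_n = +\infty \tag{4.16}$$

for any $\check\delta>0$. For example, we can take $\delta_n = (6+\ln n)^{-1}$.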

Theorem 4.7. Assume that Conditions H1)–H4) hold true. Then the robust risk defined in (1.4) through the distribution family (2.9)–(2.10) for the procedure (3.15) based on the trigonometric basis (3.2) with the coefficients (3.20) and the parameter $\delta=\delta_n$ satisfying (4.16) has the following asymptotic upper bound

$$\limsup_{n\to\infty}\ \upsilon_n^{2k/(2k+1)}\sup_{S\in W^k_r}\mathcal{R}^*_n(\widehat S_*, S) \le r^*_k. \tag{4.17}$$

Theorem 4.5 and Theorem 4.7 allow us to compute the optimal convergence rate.

Corollary 4.8. Under the assumptions of Theorem 4.7, we have

$$\lim_{n\to\infty}\ \upsilon_n^{2k/(2k+1)}\inf_{\widehat S_n\in\Pi_n}\ \sup_{S\in W^k_r}\mathcal{R}^*_n(\widehat S_n, S) = r^*_k. \tag{4.18}$$

Remark 4.2. It is well known that the optimal (minimax) risk convergence rate for the Sobolev ball $W^k_r$ is $n^{2k/(2k+1)}$ (see, for example, [24], [22]). We see here that the efficient robust rate is $\upsilon_n^{2k/(2k+1)}$, i.e., if the distribution upper bound $\varsigma^*\to 0$ as $n\to\infty$, we obtain a faster rate with respect to $n^{2k/(2k+1)}$, and, if $\varsigma^*\to\infty$ as $n\to\infty$, we obtain a slower rate. In the case when $\varsigma^*$ is constant, the robust rate coincides with the classical one.

5 Renewal density

This section is concerned with results related to the renewal measure (2.7). We start with the following lemma.

Lemma 5.1. Let $\tau$ be a positive random variable with a density $g$, such that $\mathbf{E}e^{\beta\tau}<\infty$ for some $\beta>0$. Then there exists a constant $\beta_1$, $0<\beta_1\le\beta$, for which

$$\mathbf{E}e^{(\beta_1+i\omega)\tau} \ne 1 \quad \forall\,\omega\in\mathbb{R}.$$

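Proof. We will show this lemma by contradiction, i.e. assume there exist some sequence of positive numbers going to zero $(\gamma_k)_{k\ge 1}$ and a sequence $(w_k)_{k\ge 1}$ such that

$$\mathbf{E}e^{(\gamma_k+iw_k)\tau} = 1 \tag{5.1}$$

for any $k\ge 1$. Firstly assume that $\limsup_{k\to\infty}w_k = +\infty$. Note that in this case, for any $N\ge 1$,

$$\left|\int_0^N e^{\gamma_k t}\cos(w_k t)\,g(t)\,dt\right| \le \left|\int_0^N \cos(w_k t)\,g(t)\,dt\right| + \left|\int_0^N (e^{\gamma_k t}-1)\cos(w_k t)\,g(t)\,dt\right|,$$

i.e., in view of Lemma A.4, for any fixed $N\ge 1$

$$\limsup_{k\to\infty}\int_0^N e^{\gamma_k t}\cos(w_k t)\,g(t)\,dt = 0.$$

Since for some $\beta>0$ the integral $\int_0^{+\infty}e^{\beta t}g(t)\,dt<\infty$, we get

$$\lim_{k\to\infty}\int_0^{+\infty}e^{\gamma_k t}\cos(w_k t)\,g(t)\,dt = 0.$$

Let now $\limsup_{k\to\infty}w_k = \omega_\infty$ with $0<|\omega_\infty|<\infty$. In this case there exists a sequence $(l_k)_{k\ge 1}$ such that $\lim_{k\to\infty}w_{l_k} = \omega_\infty$, i.e.

$$1 = \limsup_{k\to\infty}\mathbf{E}e^{\gamma_{l_k}\tau}\cos(\tau w_{l_k}) = \mathbf{E}\cos(\tau\omega_\infty).$$

It is clear that, for random variables having a density, the last equality is possible if and only if $\omega_\infty = 0$. In this case, i.e. when $\limsup_{k\to\infty}w_{l_k} = 0$, the equation (5.1) implies

$$\limsup_{k\to\infty}\mathbf{E}e^{\gamma_{l_k}\tau}\,\frac{\sin(\tau w_{l_k})}{w_{l_k}} = \mathbf{E}\tau = 0.$$

But, under our conditions, $\mathbf{E}\tau>0$. These contradictions imply the desired result. □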

Proposition 5.2. Let $\tau$ be a positive random variable with the distribution $\eta$ having a density $g$ which satisfies Conditions H1)–H4). Then the renewal measure (2.7) is absolutely continuous with density $\rho$, for which

$$\rho(x) = \frac{1}{\check\tau} + \Upsilon(x), \tag{5.2}$$

where $\check\tau = \mathbf{E}\tau_1$ and $\Upsilon(\cdot)$ is some function defined on $\mathbb{R}_+$ with values in $\mathbb{R}$ such that

$$\sup_{x\ge 0}x^{\gamma}|\Upsilon(x)|<\infty \quad\text{for all }\gamma>0.$$

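Proof. First note that we can represent the renewal measure $\check\eta$ as $\check\eta = \eta*\eta_0$ and $\eta_0 = \sum_{j=0}^{\infty}\eta^{(j)}$. It is clear that in this case the density $\rho$ of $\check\eta$ can be written as

$$\rho(x) = \int_0^x g(x-y)\sum_{n\ge 0}g^{(n)}(y)\,dy. \tag{5.3}$$

Now we use the arguments proposed in the proof of Lemma 9.5 from [5]. For any $0<\epsilon<1$ we set

$$\rho_\epsilon(x) = \int_0^x g(x-y)\left(\sum_{n\ge 0}(1-\epsilon)^n g^{(n)}(y) - \frac{1-\epsilon}{\check\tau}\,g_0(y)\right)dy - g(x), \tag{5.4}$$

where $g_0(y) = e^{-\epsilon y/\check\tau}\mathbf{1}_{\{y>0\}}$. It is easy to deduce that for any $x\in\mathbb{R}$

$$\lim_{\epsilon\to 0}\rho_\epsilon(x) = \rho(x) - \frac{1}{\check\tau}\int_0^x g(z)\,dz - g(x). \tag{5.5}$$

Moreover, in view of the condition H1) we obtain that the function $\rho_\epsilon(x)$ satisfies the condition D) from Section A.2. So, through Proposition A.5 we get

$$\rho_\epsilon(x+)+\rho_\epsilon(x-) = \frac{1}{\pi}\int_{\mathbb{R}}e^{-ix\theta}\,\widehat\rho_\epsilon(\theta)\,d\theta,$$

where $\widehat\rho_\epsilon(\theta) = \int_{\mathbb{R}}e^{i\theta x}\rho_\epsilon(x)\,dx$. Note that

$$|\widehat g(\theta)| = \left|\int_{\mathbb{R}}e^{i\theta x}g(x)\,dx\right| \le \int_{\mathbb{R}}g(x)\,dx = 1,$$

i.e. for any $0<\epsilon<1$ we have $|(1-\epsilon)\widehat g(\theta)|<1$ and therefore

$$\sum_{n=0}^{\infty}(1-\epsilon)^n(\widehat g(\theta))^n = \frac{1}{1-(1-\epsilon)\widehat g(\theta)}.$$

From this and, taking into account that

$$\widehat g_0(\theta) = \int_{\mathbb{R}}e^{i\theta x}g_0(x)\,dx = \frac{\check\tau}{\epsilon-i\check\tau\theta},$$

we obtain

$$\widehat\rho_\epsilon(\theta) = \widehat g(\theta)\sum_{n=0}^{\infty}(1-\epsilon)^n(\widehat g(\theta))^n - \frac{1-\epsilon}{\check\tau}\,\widehat g(\theta)\widehat g_0(\theta) - \widehat g(\theta) = \widehat g(\theta)\,G_\epsilon(\theta)$$

and

$$G_\epsilon(\theta) = \frac{1}{1-(1-\epsilon)\widehat g(\theta)} - \frac{1-i\check\tau\theta}{\epsilon-i\check\tau\theta},$$

i.e.

$$\rho_\epsilon(x-)+\rho_\epsilon(x+) = \frac{1}{\pi}\int_{\mathbb{R}}e^{-ix\theta}\,\widehat g(\theta)\,G_\epsilon(\theta)\,d\theta. \tag{5.6}$$

One can check directly that

$$\sup_{0<\epsilon<1,\ \theta\in\mathbb{R}}|G_\epsilon(\theta)| < \infty.$$

Therefore, using the condition H3) and Lebesgue's dominated convergence theorem, we can pass to the limit as $\epsilon\to 0$ in (5.6), i.e., we obtain that

$$\rho(x+)+\rho(x-) - \frac{2}{\check\tau}\int_0^x g(z)\,dz - g(x+) - g(x-) = \frac{1}{\pi}\int_{\mathbb{R}}e^{-ix\theta}\,\widehat g(\theta)\,G_0(\theta)\,d\theta,$$

where

$$G_0(\theta) = \frac{1}{1-\widehat g(\theta)} + \frac{1-i\check\tau\theta}{i\check\tau\theta}.$$

Using here again Proposition A.5 we deduce that

$$\rho(x+)+\rho(x-) = \frac{2}{\check\tau}\int_0^x g(z)\,dz + \frac{1}{\pi}\int_{\mathbb{R}}e^{-ix\theta}\,\widehat g(\theta)\,\check G(\theta)\,d\theta \tag{5.7}$$

and

$$\check G(\theta) = \frac{1}{1-\widehat g(\theta)} + \frac{1}{i\check\tau\theta}.$$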

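Note now that we can represent the density (5.3) as

$$\rho(x) = g*\sum_{n\ge 0}g^{(n)} = \sum_{n\ge 1}g^{(n)}(x) = g(x) + \sum_{n\ge 2}g^{(n)}(x) =: g(x)+\rho_c(x)$$

and the function $\rho_c(x)$ is continuous for all $x\in\mathbb{R}$. This means that

$$\widetilde\rho(x) = \frac{\rho(x+)+\rho(x-)}{2} - \rho(x) = \frac{g(x+)+g(x-)}{2} - g(x)$$

and, therefore, the condition H2) implies that, for any $\gamma>0$,

$$\sup_{x\ge 0}x^{\gamma}|\widetilde\rho(x)|<\infty.$$

Now we can rewrite (5.7) as

$$\rho(x) = \frac{1}{\check\tau}\int_0^x g(z)\,dz + \frac{1}{2\pi}\int_{\mathbb{R}}e^{-ix\theta}\,\widehat g(\theta)\,\check G(\theta)\,d\theta - \widetilde\rho(x). \tag{5.8}$$

Taking into account that $\mathbf{E}e^{\beta\tau}<\infty$ for some $\beta>0$ we can obtain that

$$\sup_{x\ge 0}x^{\gamma}\int_x^{+\infty}g(z)\,dz<\infty.$$

To study the second term in (5.8) we will use Proposition A.3. Indeed, the condition H3) implies the first limit equality in (A.1). The second one follows directly from Lemma A.4. Therefore, in view of Proposition A.3, there exists some $\beta_*>0$ such that, for any $0\le\beta_0\le\beta_*$,

$$\int_{\mathbb{R}}e^{-ix\theta}\,\widehat g(\theta)\,\check G(\theta)\,d\theta = e^{-\beta_0 x}\int_{\mathbb{R}}e^{-ix\theta}\,\widehat g(\theta-i\beta_0)\,\check G(\theta-i\beta_0)\,d\theta.$$

Note that, due to Lemma 5.1, the function $1-\widehat g(\theta)$ has no zeros on the line $\{z\in\mathbb{C} : \mathrm{Im}(z) = -\beta_1\}$. Moreover, one can check directly that $\theta=0$ is an isolated zero. So, this means that for any $N>1$ there can be only finitely many zeros of the function $1-\widehat g(\theta)$ in $\{z\in\mathbb{C} : -\beta_1<\mathrm{Im}(z)<0,\ |\mathrm{Re}(z)|<N\}$. Moreover, note that in view of Lemma A.4 for any $r>0$

$$\lim_{\mathrm{Re}(\theta)\to\infty,\ |\mathrm{Im}(\theta)|\le r}\widehat g(\theta) = 0.$$

This means that there exists $N>0$ such that $1-\widehat g(\theta)\ne 0$ for $\theta\in\{z\in\mathbb{C} : -\beta_1<\mathrm{Im}(z)<0,\ |\mathrm{Re}(z)|\ge N\}$. So, there can be only finitely many zeros of the function $1-\widehat g(\theta)$ in $\{z\in\mathbb{C} : -\beta_1<\mathrm{Im}(z)<0\}$ for some fixed $0<\beta_1<\beta$. Therefore, there exists some $\beta_0>0$ for which the function $1-\widehat g(\theta)$ has no zeros in $\{z\in\mathbb{C} : -\beta_0<\mathrm{Im}(z)<0\}$, i.e. the function $\check G(\theta)$ will be bounded in this set and we obtain that

$$\sup_{x\ge 0}e^{\beta_0 x}\left|\int_{\mathbb{R}}e^{-ix\theta}\,\widehat g(\theta)\,\check G(\theta)\,d\theta\right|<\infty.$$

Thus the conclusion follows. □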

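Using this proposition we can study the renewal process $(N_t)_{t\ge 0}$ introduced in (2.4).

Corollary 5.3. Assume that Conditions H1)–H4) hold true. Then, for any $t>0$,

$$\mathbf{E}N_t\le|\rho|_*\,t \quad\text{and}\quad \mathbf{E}N_t^2\le|\rho|_*\,t+|\rho|_*^2\,t^2. \tag{5.9}$$

Proof. First, by means of Proposition 5.2, note that we get

$$\mathbf{E}N_t = \mathbf{E}\sum_{k\ge 1}\mathbf{1}_{\{T_k\le t\}} = \int_0^t\rho(v)\,dv\le|\rho|_*\,t.$$

Regarding the last bound in (5.9), we use the same reasoning as in the previous inequality, i.e., we obtain

$$\mathbf{E}N_t^2 = \mathbf{E}\sum_{k\ge 1}\mathbf{1}_{\{T_k\le t\}} + 2\,\mathbf{E}\sum_{k\ge 1}\mathbf{1}_{\{T_k\le t\}}\sum_{j\ge k+1}\mathbf{1}_{\{T_j\le t\}} = \mathbf{E}N_t + 2\,\mathbf{E}\sum_{k\ge 1}\mathbf{1}_{\{T_k\le t\}}\Theta(T_k) = \mathbf{E}N_t + 2\int_0^t\Theta(v)\rho(v)\,dv,$$

where, for $0\le v\le t$, we defined the function $\Theta(v) = \mathbf{E}N_{t-v}\le|\rho|_*(t-v)$. □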
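The bounds (5.9) can be checked by simulation in the simplest case. The sketch below assumes exponential inter-arrival times with rate 1 (the Poisson case of Remark 2.1), for which the renewal density is constant, $\rho\equiv 1$, so that $|\rho|_*=1$ and both bounds in (5.9) hold with equality: $\mathbf{E}N_t=t$ and $\mathbf{E}N_t^2=t+t^2$. The function name, the seed and the sample size are choices made for this illustration.

```python
import random

def count_renewals(t, sample_tau, rng):
    """Number of renewal epochs T_k = tau_1 + ... + tau_k with T_k <= t."""
    count, total = 0, sample_tau(rng)
    while total <= t:
        count += 1
        total += sample_tau(rng)
    return count

rng = random.Random(1)
t, reps = 5.0, 20000
# Assumption for the example: exponential inter-arrival times with mean 1.
counts = [count_renewals(t, lambda r: r.expovariate(1.0), rng) for _ in range(reps)]
mean_count = sum(counts) / reps                   # Monte Carlo estimate of E N_t
mean_square = sum(c * c for c in counts) / reps   # Monte Carlo estimate of E N_t^2
```

With $t=5$ the two estimates should be close to 5 and 30 respectively, up to Monte Carlo error. For a non-exponential inter-arrival law one would replace the sampler and compare with the bounds using the corresponding value of $|\rho|_*$.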

6 Stochastic calculus for semi-Markov processes

In this section we give some results of stochastic calculus for the process $(\xi_t)_{t\ge 0}$ given in (1.2), which are needed throughout this paper. As the process $\xi_t$ is the combination of a Lévy process and a semi-Markov process, these results are not standard and need to be provided.

Lemma 6.1. Let $f$ and $g$ be any non-random functions from $L_2[0,n]$ and $(I_t(f))_{t\ge 0}$ be the process defined in (2.5). Then, for any $0\le t\le n$,

$$\mathbf{E}I_t(f)I_t(g) = \varrho_1^2\,(f,g)_t + \varrho_2^2\,(f,g\rho)_t, \tag{6.1}$$

where $(f,g)_t = \int_0^t f(s)g(s)\,ds$.

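Proof. First, note that we can represent the stochastic integral $I_t(f)$ as

$$I_t(f) = \varrho_1 I^L_t(f) + \varrho_2 I^z_t(f), \tag{6.2}$$

where

$$I^L_t(f) = \int_0^t f(s)\,dL_s \quad\text{and}\quad I^z_t(f) = \int_0^t f(s)\,dz_s.$$

Note that the mutual covariation for the martingales $I^L_t(f)$ and $I^L_t(g)$ (see, for example, [18]) may be calculated as

$$[I^L(f), I^L(g)]_t = \check\varrho^2\int_0^t f(s)g(s)\,ds + (1-\check\varrho^2)\sum_{0\le s\le t}f(s)g(s)\,(\Delta\check L_s)^2, \tag{6.3}$$

where $\Delta\check L_s = \check L_s-\check L_{s-}$. Taking into account that $\mathbf{E}I^L_t(f)I^L_t(g) = \mathbf{E}[I^L(f), I^L(g)]_t$ and that in view of the first condition in (2.2) $\Pi(x^2)=1$, we obtain that

$$\mathbf{E}I^L_t(f)I^L_t(g) = \check\varrho^2\int_0^t f(s)g(s)\,ds + (1-\check\varrho^2)\,\Pi(x^2)\int_0^t f(s)g(s)\,ds = \int_0^t f(s)g(s)\,ds. \tag{6.4}$$

Moreover, note that

$$\mathbf{E}I^z_t(f)I^z_t(g) = \mathbf{E}\left(\sum_{l=1}^{\infty}f(T_l)g(T_l)\,Y_l^2\,\mathbf{1}_{\{T_l\le t\}}\right) = \mathbf{E}\left(\sum_{l=1}^{\infty}f(T_l)g(T_l)\,\mathbf{1}_{\{T_l\le t\}}\right) = \int_0^t f(s)g(s)\rho(s)\,ds.$$

Hence the conclusion follows. □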

Lemma 6.2. Assume that Conditions H1)–H4) hold true. Then, for any $n\ge 1$ and for any non-random function $f$ from $L_2[0,n]$, the stochastic integral (2.5) exists and satisfies the properties (2.6) with the coefficient $\kappa_Q$ given in (2.6).

Proof. This lemma follows directly from Lemma 6.1 with $f=g$ and Proposition 5.2. □

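Lemma 6.3. Let $f$ and $g$ be bounded functions defined on $[0,\infty)\times\mathbb{R}$. Then, for any $k\ge 1$,

$$\mathbf{E}\big(I_{T_k-}(f)\,I_{T_k-}(g)\,\big|\,\mathcal{G}\big) = \varrho_1^2\,(f,g)_{T_k} + \varrho_2^2\sum_{l=1}^{k-1}f(T_l)g(T_l),$$

Proof. Using (6.2), (6.4) and, taking into account that the process $(L_t)_{t\ge 0}$ is independent of $\mathcal{G}$, we obtain

$$\mathbf{E}\big(I_{T_k-}(f)\,I_{T_k-}(g)\,\big|\,\mathcal{G}\big) = \varrho_1^2\,(f,g)_{T_k} + \varrho_2^2\,\mathbf{E}\big(I^z_{T_k-}(f)\,I^z_{T_k-}(g)\,\big|\,\mathcal{G}\big).$$

Moreover,

$$\mathbf{E}\big(I^z_{T_k-}(f)\,I^z_{T_k-}(g)\,\big|\,\mathcal{G}\big) = \mathbf{E}\left(\Big(\sum_{l=1}^{k-1}f(T_l)Y_l\Big)\Big(\sum_{l=1}^{k-1}g(T_l)Y_l\Big)\,\Big|\,\mathcal{G}\right) = \sum_{l=1}^{k-1}f(T_l)g(T_l).$$

Thus we obtain the desired result. □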

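Lemma 6.4. Assume that Conditions H1)–H4) hold true. Then, for any measurable bounded non-random functions $f$ and $g$, we have

$$\left|\mathbf{E}\int_0^n I_t^2(f)\,g(t)\,dm_t\right| \le 2\varrho_2^2\,|g|_*\,|f|_*^2\,\|\Upsilon\|_1\,n.$$

Proof. Using the definition of the process $(m_t)_{t\ge 0}$ we can represent this integral as

$$\int_0^n I_t^2(f)\,g(t)\,dm_t = \sum_{k\ge 1}I^2_{T_k-}(f)\,g(T_k)\,Y_k^2\,\mathbf{1}_{\{T_k\le n\}} - \int_0^n I_t^2(f)\,g(t)\rho(t)\,dt =: V_n - U_n. \tag{6.5}$$

Note now that

$$\mathbf{E}V_n = \mathbf{E}\sum_{k\ge 1}g(T_k)\,\mathbf{E}\big(I^2_{T_k-}(f)\,\big|\,\mathcal{G}\big)\,\mathbf{1}_{\{T_k\le n\}}.$$

Now, using Lemma 6.3 we can represent the last expectation as

$$\mathbf{E}V_n = \varrho_1^2\,\mathbf{E}V'_n + \varrho_2^2\,\mathbf{E}V''_n, \tag{6.6}$$

where

$$V'_n = \sum_{k\ge 1}g(T_k)\,\|f\|^2_{T_k}\,\mathbf{1}_{\{T_k\le n\}} \quad\text{and}\quad V''_n = \sum_{k\ge 2}g(T_k)\,\mathbf{1}_{\{T_k\le n\}}\sum_{l=1}^{k-1}f^2(T_l).$$

The first term in (6.6) can be represented as

$$\mathbf{E}V'_n = \int_0^n g(t)\,\|f\|^2_t\,\rho(t)\,dt.$$

To estimate the last expectation in (6.6), note that

$$\mathbf{E}V''_n = \mathbf{E}\sum_{l\ge 1}f^2(T_l)\,\bar g(T_l)\,\mathbf{1}_{\{T_l\le n\}} = \int_0^n f^2(v)\,\bar g(v)\,\rho(v)\,dv,$$

where

$$\bar g(v) = \mathbf{E}\sum_{k\ge 1}g(v+T_k)\,\mathbf{1}_{\{T_k\le n-v\}} = \int_v^n g(t)\,\rho(t-v)\,dt.$$

Moreover, using now the representation (6.1), we calculate the expectation of the last term in (6.5)

$$\mathbf{E}U_n = \varrho_1^2\int_0^n\|f\|^2_t\,g(t)\rho(t)\,dt + \varrho_2^2\int_0^n\check f(t)\,g(t)\rho(t)\,dt,$$

where $\check f(t) = \int_0^t f^2(s)\rho(s)\,ds$. This implies that

$$\mathbf{E}\int_0^n I_t^2(f)\,g(t)\,dm_t = \varrho_2^2\int_0^n g(t)\,\delta(t)\,dt,$$

where $\delta(t) = \int_0^t f^2(v)\,\big(\rho(t-v)-\rho(t)\big)\,\rho(v)\,dv$. Note that, in view of Proposition 5.2, the function $\delta$ can be estimated as

$$|\delta(t)| \le |f|_*^2\,|\rho|_*\int_0^t|\Upsilon(t-v)-\Upsilon(t)|\,dv \le |f|_*^2\,|\rho|_*\,\big(\|\Upsilon\|_1 + t\,|\Upsilon(t)|\big).$$

Therefore,

$$\left|\mathbf{E}\int_0^n I_t^2(f)\,g(t)\,dm_t\right| \le 2\varrho_2^2\,|g|_*\,|f|_*^2\,\|\Upsilon\|_1\,n$$

and this finishes the proof. □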

Lemma 6.5. Assume that Conditions H1)–H4) hold true. Then, for any measurable bounded non-random functions $f$ and $g$, one has

$$\mathbf{E}\int_0^n I_t^2(f)\,I_{t-}(g)\,g(t)\,d\xi_t = 0.$$

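Proof. First, note that

$$\int_0^n I^2_{t-}(f)\,I_{t-}(g)\,g(t)\,d\xi_t = \varrho_1\int_0^n I^2_{t-}(f)\,I_{t-}(g)\,g(t)\,dL_t + \varrho_2\int_0^n I^2_{t-}(f)\,I_{t-}(g)\,g(t)\,dz_t.$$

Second, we will show that

$$\mathbf{E}\int_0^n I^2_{t-}(f)\,I_{t-}(g)\,g(t)\,dL_t = 0. \tag{6.7}$$

Using the notations (6.2), we set

$$J_1 = \int_0^n I^2_{t-}(f)\,I^L_{t-}(g)\,g(t)\,dL_t \quad\text{and}\quad J_2 = \int_0^n I^2_{t-}(f)\,I^z_{t-}(g)\,g(t)\,dL_t,$$

so that

$$\int_0^n I^2_{t-}(f)\,I_{t-}(g)\,g(t)\,dL_t = \varrho_1 J_1 + \varrho_2 J_2. \tag{6.8}$$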

Now let us recall the Novikov inequalities [23], also referred to as the Bichteler–Jacod inequalities (see [4,21]), which bound the moments of the supremum of purely discontinuous local martingales: for any predictable function $h$ and any $p\ge 2$,

$$\mathbf{E}\sup_{0\le t\le n}\left|\int_{[0,t]\times\mathbb{R}} h\,d(\mu-\nu)\right|^p \le C_p^*\,\mathbf{E}\check J_{p,n}(h), \tag{6.9}$$

where $C_p^*$ is some positive constant and

$$\check J_{p,n}(h) = \left(\int_{[0,n]\times\mathbb{R}} h^2\,d\nu\right)^{p/2} + \int_{[0,n]\times\mathbb{R}} |h|^p\,d\nu.$$

Applying this inequality to the non-random function $h(s,x) = g(s)\,x$ and recalling that $\Pi(x^8)<\infty$, we obtain

$$\sup_{0\le t\le n}\mathbf{E}\left|I^{\check L}_t(g)\right|^8 < \infty.$$

Taking into account that, for any non-random square integrable function $f$, the integral $\int_0^t f(s)\,dw_s$ is Gaussian with the parameters $\left(0, \int_0^t f^2(s)\,ds\right)$, we obtain

$$\sup_{0\le t\le n}\mathbf{E}\left|I^L_t(g)\right|^8 < \infty.$$

Finally, by using the Cauchy inequality, we can estimate for any $0<t\le n$ the following expectation as

$$\mathbf{E}\,(I^L_t(f))^4\,(I^L_t(g))^2 \le \sqrt{\mathbf{E}\,(I^L_t(f))^8}\,\sqrt{\mathbf{E}\,(I^L_t(g))^4},$$

i.e.,

$$\sup_{0\le t\le n}\mathbf{E}\,(I^L_t(f))^4\,(I^L_t(g))^2 < \infty.$$

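Moreover, taking into account that the processes $(L_t)_{t\ge 0}$ and $(z_t)_{t\ge 0}$ are independent, we obtain that

$$\mathbf{E}\,(I^z_t(f))^4\,(I^L_t(g))^2 = \mathbf{E}\,(I^z_t(f))^4\;\mathbf{E}\,(I^L_t(g))^2 = \Pi(x^2)\int_0^t g^2(s)\,ds\;\mathbf{E}\,(I^z_t(f))^4.$$

One can check directly here that, for $t>0$,

$$\mathbf{E}\,|I^z_t(f)|^4 \le |f|_*^4\,\mathbf{E}Y_1^4\,\mathbf{E}N_t^2.$$

Note that the last bound in Corollary 5.3 yields $\sup_{0\le t\le n}\mathbf{E}\,(I^z_t(f))^4<\infty$ and, therefore,

$$\sup_{0\le t\le n}\mathbf{E}\,(I_t(f))^4\,(I^L_t(g))^2 < \infty.$$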

It follows directly that $\mathbf{E}J_1 = 0$. Now we study the last term in (6.8). To this end, first note that similarly to the previous reasoning we obtain that

$$\mathbf{E}\int_0^n (I^L_{t-}(f))^2\,I^z_{t-}(g)\,g(t)\,dL_t = 0 \quad\text{and}\quad \mathbf{E}\int_0^n I^L_{t-}(f)\,I^z_{t-}(f)\,I^z_{t-}(g)\,g(t)\,dL_t = 0.$$

Therefore, to show (6.7) one needs to show that

$$\mathbf{E}\int_0^n (I^z_{t-}(f))^2\,I^z_{t-}(g)\,g(t)\,dL_t = 0. \tag{6.10}$$

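To check this note that, for any $0<t\le n$ and for any bounded function $f$,

$$I^z_t(f) = \sum_{k=1}^{\infty} f(T_k)\,Y_k\,\mathbf{1}_{\{T_k\le t\}} = \sum_{k=1}^{N_n} f(T_k)\,Y_k\,\mathbf{1}_{\{T_k\le t\}},$$

i.e.,

$$\int_0^n (I^z_{t-}(f))^2\,I^z_{t-}(g)\,g(t)\,dL_t = \sum_{k=1}^{N_n}\sum_{l=1}^{N_n}\sum_{j=1}^{N_n} f(T_k)\,f(T_l)\,g(T_j)\,Y_j\,Y_l\,Y_k\,I_{klj},$$

where

$$I_{klj} = \int_0^n \mathbf{1}_{\{T_k\le t\}}\,\mathbf{1}_{\{T_l\le t\}}\,\mathbf{1}_{\{T_j\le t\}}\,dL_t.$$

Taking into account that $(L_t)_{t\ge 0}$ is independent of the field $\mathcal{G}_z = \sigma\{z_t,\ t\ge 0\}$, we obtain that $\mathbf{E}\left(I_{klj}\,|\,\mathcal{G}_z\right) = 0$. Therefore,

$$\mathbf{E}\int_0^n (I^z_{t-}(f))^2\,I^z_{t-}(g)\,g(t)\,dL_t = \mathbf{E}\sum_{k=1}^{N_n}\sum_{l=1}^{N_n}\sum_{j=1}^{N_n} f(T_k)\,f(T_l)\,g(T_j)\,Y_j\,Y_l\,Y_k\,\mathbf{E}\left(I_{klj}\,|\,\mathcal{G}_z\right) = 0.$$

So, we obtain (6.10) and hence the proof is complete. □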

7 Properties of the regression model (1.1)

In order to prove the oracle inequalities we need to study the conditions introduced in [14] for the general semi-martingale model (1.1). To this end we set, for any $x\in\mathbb{R}^n$, the functions

$$B_{1,Q,n}(x) = \sum_{j=1}^n x_j\left(\mathbf{E}_Q\,\xi_{j,n}^2 - \sigma_Q\right) \quad\text{and}\quad B_{2,Q,n}(x) = \sum_{j=1}^n x_j\,\tilde\xi_{j,n}, \tag{7.1}$$

where $\sigma_Q$ is defined in (2.9) and $\tilde\xi_{j,n} = \xi_{j,n}^2 - \mathbf{E}_Q\,\xi_{j,n}^2$.

Proposition 7.1. Assume that Conditions H1)–H4) hold. Then

$$\sup_{x\in[-1,1]^n} |B_{1,Q,n}(x)| \le C_{1,Q,n}, \tag{7.2}$$

where $C_{1,Q,n} = \sigma_Q\,\check\tau\,\phi_{\max}^2\,\|\Upsilon\|_1$.

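Proof. First, note that from (6.2) we have

$$\xi_{j,n} = \frac{\varrho_1}{\sqrt n}\,I^L_n(\phi_j) + \frac{\varrho_2}{\sqrt n}\,I^z_n(\phi_j).$$

So, using (6.4) we can write that

$$\mathbf{E}\,\xi_{j,n}^2 = \frac{\varrho_1^2}{n}\int_0^n \phi_j^2(t)\,dt + \frac{\varrho_2^2}{n}\,\mathbf{E}\sum_{l=1}^{\infty}\phi_j^2(T_l)\,\mathbf{1}_{\{T_l\le n\}}. \tag{7.3}$$

Proposition 5.2 implies

$$\mathbf{E}\sum_{l=1}^{\infty}\phi_j^2(T_l)\,\mathbf{1}_{\{T_l\le n\}} = \int_0^n \phi_j^2(x)\,\rho(x)\,dx = \frac{1}{\check\tau}\int_0^n \phi_j^2(x)\,dx + \int_0^n \phi_j^2(x)\,\Upsilon(x)\,dx.$$

Note that $\int_0^n \phi_j^2(t)\,dt = n$. So, in view of the condition (3.1), we obtain

$$\left|\mathbf{E}\,\xi_{j,n}^2 - \sigma_Q\right| = \frac{\varrho_2^2}{n}\left|\int_0^n \phi_j^2(x)\,\Upsilon(x)\,dx\right| \le \frac{\varrho_2^2}{n}\,\phi_{\max}^2\,\|\Upsilon\|_1. \tag{7.4}$$

Estimating here $\varrho_2^2$ by $\sigma_Q\,\check\tau$ we obtain the inequality (7.2) and hence the conclusion follows. □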

Proposition 7.2. Assume that Conditions H1)–H4) hold. Then

$$\sup_{|x|\le 1} \mathbf{E}_Q\,B^2_{2,Q,n}(x) \le C_{2,Q,n}, \tag{7.5}$$

whereC2,Q,n=φ4max(1 +σQ2)3ˇlandˇlis given in(4.3).

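Proof. By the Ito formula one gets

$$dI_t^2(f) = 2I_{t-}(f)\,dI_t(f) + \varrho_1^2\check\varrho^2 f^2(t)\,dt + d\sum_{0\le s\le t} f^2(s)\,(\Delta\xi^d_s)^2, \tag{7.6}$$

where $\xi^d_t = \varrho_3\check L_t + \varrho_2 z_t$ and $\varrho_3 = \varrho_1\sqrt{1-\check\varrho^2}$. Taking into account that the processes $(\check L_t)_{t\ge 0}$ and $(z_t)_{t\ge 0}$ are independent and that the jump times $T_k$ defined in (2.4) have a density, we have $\Delta z_s\,\Delta\check L_s = 0$ a.s. for any $s\ge 0$. Therefore, we can rewrite the differential (7.6) as

$$dI_t^2(f) = 2I_{t-}(f)\,dI_t(f) + \varrho_1^2\check\varrho^2 f^2(t)\,dt + \varrho_3^2\,d\sum_{0\le s\le t} f^2(s)\,(\Delta\check L_s)^2 + \varrho_2^2\,d\sum_{0\le s\le t} f^2(s)\,(\Delta z_s)^2. \tag{7.7}$$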

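From Lemma 6.2 it follows that

$$\mathbf{E}\,I_t^2(f) = \varrho_1^2\int_0^t f^2(s)\,ds + \varrho_2^2\int_0^t f^2(s)\,\rho(s)\,ds.$$

Therefore, putting

$$\tilde I_t(f) = I_t^2(f) - \mathbf{E}\,I_t^2(f), \tag{7.8}$$

we obtain

$$d\tilde I_t(f) = 2I_{t-}(f)\,f(t)\,d\xi_t + f^2(t)\,d\tilde m_t, \qquad \tilde m_t = \varrho_3^2\,\check m_t + \varrho_2^2\,m_t,$$

where $\check m_t = \sum_{0\le s\le t}(\Delta\check L_s)^2 - t$ and $m_t = \sum_{0\le s\le t}(\Delta z_s)^2 - \int_0^t \rho(s)\,ds$. For any non-random vector $x = (x_j)_{1\le j\le n}$ with $\sum_{j=1}^n x_j^2\le 1$, we set

$$\bar I_t(x) = \sum_{j=1}^n x_j\,\tilde I_t(\phi_j). \tag{7.9}$$

Denoting

$$A_t(x) = \sum_{j=1}^n x_j\,I_t(\phi_j)\,\phi_j(t) \quad\text{and}\quad B_t(x) = \sum_{j=1}^n x_j\,\phi_j^2(t), \tag{7.10}$$

we get the following stochastic differential equation for (7.9):

$$d\bar I_t(x) = 2A_{t-}(x)\,d\xi_t + B_t(x)\,d\tilde m_t, \qquad \bar I_0(x) = 0.$$

Applying the Ito formula one obtains

$$\mathbf{E}\,\bar I_n^2(x) = 2\,\mathbf{E}\int_0^n \bar I_{t-}(x)\,d\bar I_t(x) + 4\varrho_1^2\check\varrho^2\,\mathbf{E}\int_0^n A_t^2(x)\,dt + \varrho_3^2\,\mathbf{E}\check D_n(x) + \varrho_2^2\,\mathbf{E}D_n(x), \tag{7.11}$$

where

$$\check D_n(x) = \sum_{0\le t\le n}\left(2A_{t-}(x)\,\Delta\check L_t + \varrho_3^2\,B_t(x)\,(\Delta\check L_t)^2\right)^2 \quad\text{and}\quad D_n(x) = \sum_{k=1}^{+\infty}\left(2A_{T_k-}(x)\,Y_k + \varrho_2\,B_{T_k-}(x)\,Y_k^2\right)^2\mathbf{1}_{\{T_k\le n\}}.$$

Let us now show that

$$\left|\mathbf{E}\int_0^n \bar I_{t-}(x)\,d\bar I_t(x)\right| \le 2\varrho_2^4\,\phi_{\max}^3\,\|\Upsilon\|_1\,n^2. \tag{7.12}$$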

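To this end, note that

$$\int_0^n \bar I_{t-}(x)\,d\bar I_t(x) = 2\sum_{1\le j,l\le n} x_j x_l\int_0^n \tilde I_{t-}(\phi_j)\,I_{t-}(\phi_l)\,\phi_l(t)\,d\xi_t + \sum_{j=1}^n x_j\int_0^n \tilde I_{t-}(\phi_j)\,B_t(x)\,d\tilde m_t.$$

Using here Lemma 6.5, we get $\mathbf{E}\int_0^n \tilde I_{t-}(\phi_j)\,I_{t-}(\phi_i)\,\phi_i(t)\,d\xi_t = 0$. Moreover, the process $(\check m_t)_{t\ge 0}$ is a martingale, i.e. $\mathbf{E}\int_0^n \tilde I_{t-}(\phi_j)\,B_t(x)\,d\check m_t = 0$. Therefore,

$$\mathbf{E}\int_0^n \bar I_{t-}(x)\,d\bar I_t(x) = \varrho_2^2\sum_{j=1}^n x_j\,\mathbf{E}\int_0^n \tilde I_{t-}(\phi_j)\,B_t(x)\,dm_t.$$

Taking into account here that for any non-random bounded function $f$

$$\mathbf{E}\int_0^n f(t)\,dm_t = 0,$$

we obtain $\mathbf{E}\int_0^n \tilde I_{t-}(\phi_j)\,B_t(x)\,dm_t = \mathbf{E}\int_0^n I^2_{t-}(\phi_j)\,B_t(x)\,dm_t$. So, Lemma 6.4 yields

$$\left|\mathbf{E}\int_0^n \tilde I_{t-}(\phi_j)\,B_t(x)\,dm_t\right| = \left|\sum_{l=1}^n x_l\,\mathbf{E}\int_0^n I^2_{t-}(\phi_j)\,\phi_l^2(t)\,dm_t\right| \le 2\varrho_2^2\,\phi_{\max}^3\,\|\Upsilon\|_1\sum_{l=1}^n |x_l|\,n.$$

Therefore,

$$\left|\mathbf{E}\int_0^n \bar I_{t-}(x)\,d\bar I_t(x)\right| \le 2\varrho_2^4\,\phi_{\max}^3\,\|\Upsilon\|_1\,n\sum_{1\le l,j\le n}|x_l|\,|x_j| = 2\varrho_2^4\,\phi_{\max}^3\,\|\Upsilon\|_1\,n\left(\sum_{l=1}^n |x_l|\right)^2.$$

Taking into account here that $\left(\sum_{l=1}^n |x_l|\right)^2 \le n\sum_{l\ge 1} x_l^2 \le n$, we obtain (7.12).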

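Recalling that $\Pi(x^2) = 1$ we can calculate directly that

$$\mathbf{E}\check D_n(x) = 4\,\mathbf{E}\int_0^n A_t^2(x)\,dt + \varrho_3^4\,\Pi(x^4)\int_0^n B_t^2(x)\,dt. \tag{7.13}$$

Note that, thanks to Lemma 6.1, we obtain that

$$\mathbf{E}\int_0^n A_t^2(x)\,dt = \sum_{i,j} x_i x_j\int_0^n \phi_i(t)\phi_j(t)\,\mathbf{E}\,I_t(\phi_i)\,I_t(\phi_j)\,dt$$
$$= \sum_{i,j} x_i x_j\int_0^n \phi_i(t)\phi_j(t)\left(\int_0^t \phi_i(v)\phi_j(v)\,(\varrho_1^2 + \varrho_2^2\rho(v))\,dv\right)dt$$
$$= \frac{1}{2}\varrho_1^2\sum_{i,j} x_i x_j\left(\int_0^n \phi_i(t)\phi_j(t)\,dt\right)^2 + \varrho_2^2\,A_{1,n}(x) = \frac{n^2}{2}\varrho_1^2 + \varrho_2^2\,A_{1,n}(x),$$

where $A_{1,n}(x) = \sum_{i,j} x_i x_j\int_0^n \phi_i(t)\phi_j(t)\left(\int_0^t \phi_i(v)\phi_j(v)\,\rho(v)\,dv\right)dt$. This term can be estimated through Proposition 5.2 as

$$A_{1,n}(x) = \frac{n^2}{2\check\tau} + \sum_{i,j} x_i x_j\int_0^n \phi_i(t)\phi_j(t)\left(\int_0^t \phi_i(v)\phi_j(v)\,\Upsilon(v)\,dv\right)dt \le \frac{n^2}{2\check\tau} + n\,\phi_{\max}^4\,\|\Upsilon\|_1\sum_{i,j}|x_i|\,|x_j| \le \left(\frac{1}{2\check\tau} + \phi_{\max}^4\,\|\Upsilon\|_1\right)n^2.$$

So, recalling that $\sigma_Q = \varrho_1^2 + \varrho_2^2/\check\tau$ and that $\phi_{\max}\ge 1$, we obtain that

$$\mathbf{E}\int_0^n A_t^2(x)\,dt \le \left(\frac{\sigma_Q}{2} + \phi_{\max}^4\,\|\Upsilon\|_1\right)n^2 \le \left(\frac{1}{4} + \|\Upsilon\|_1\right)\phi_{\max}^4\,(1+\sigma_Q^2)\,n^2. \tag{7.14}$$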

Taking into account that

$$\sup_{t\ge 0} B_t^2(x) \le \phi_{\max}^4\left(\sum_{j=1}^n |x_j|\right)^2 \le \phi_{\max}^4\,n, \tag{7.15}$$

that $\phi_{\max}\ge 1$, and that $\varrho_1^4\le\sigma_Q^2$, we estimate the expectation in (7.13) as

(30)

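Moreover, taking into account that the random variable $Y_k$ is independent of $A_{T_k-}(x)$ and of the field $\mathcal{G} = \sigma\{T_j,\ j\ge 1\}$ and that $\mathbf{E}\left(A_{T_k-}(x)\,|\,\mathcal{G}\right) = 0$, we get

$$\mathbf{E}\sum_{k=1}^{+\infty} B_{T_k-}(x)\,A_{T_k-}(x)\,Y_k^3\,\mathbf{1}_{\{T_k\le n\}} = \sum_{k=1}^{+\infty}\mathbf{E}\,\mathbf{E}\left(B_{T_k-}(x)\,A_{T_k-}(x)\,Y_k^3\,\mathbf{1}_{\{T_k\le n\}}\,|\,\mathcal{G}\right) = \mathbf{E}Y_1^3\;\mathbf{E}\sum_{k=1}^{+\infty} B_{T_k-}(x)\,\mathbf{1}_{\{T_k\le n\}}\,\mathbf{E}\left(A_{T_k-}(x)\,|\,\mathcal{G}\right) = 0.$$

Therefore,

$$\mathbf{E}D_n(x) = \varrho_2^2\,\mathbf{E}Y_1^4\,D_{1,n}(x) + 4D_{2,n}(x), \tag{7.17}$$

where

$$D_{1,n}(x) = \sum_{k=1}^{+\infty}\mathbf{E}\,B^2_{T_k-}(x)\,\mathbf{1}_{\{T_k\le n\}} \quad\text{and}\quad D_{2,n}(x) = \sum_{k=1}^{+\infty}\mathbf{E}\,A^2_{T_k-}(x)\,\mathbf{1}_{\{T_k\le n\}}.$$

Using the bound (7.15) we can estimate the term $D_{1,n}$ as $D_{1,n}(x)\le\phi_{\max}^4\,n\,\mathbf{E}N_n$. Using here Corollary 5.3, we obtain

$$D_{1,n}(x) \le |\rho|_*\,\phi_{\max}^4\,n^2. \tag{7.18}$$

Now, to estimate the last term in (7.17), note that the process $A_t(x)$ can be rewritten as

$$A_t(x) = \int_0^t Q_x(t,s)\,d\xi_s, \quad\text{with}\quad Q_x(t,s) = \sum_{j=1}^n x_j\,\phi_j(s)\,\phi_j(t). \tag{7.19}$$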

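Applying Lemma 6.3 again, we obtain for any $k\ge 1$

$$\mathbf{E}\left(A^2_{T_k-}(x)\,|\,\mathcal{G}\right) = \varrho_1^2\int_0^{T_k} Q_x^2(T_k,s)\,ds + \varrho_2^2\sum_{j=1}^{k-1} Q_x^2(T_k,T_j).$$

So, we can represent the last term in (7.17) as

$$D_{2,n} = \varrho_1^2\,D^{(1)}_{2,n} + \varrho_2^2\,D^{(2)}_{2,n}, \tag{7.20}$$

where

$$D^{(1)}_{2,n} = \sum_{k=1}^{+\infty}\mathbf{E}\,\mathbf{1}_{\{T_k\le n\}}\int_0^{T_k} Q_x^2(T_k,s)\,ds \quad\text{and}\quad D^{(2)}_{2,n} = \sum_{k=1}^{+\infty}\mathbf{E}\,\mathbf{1}_{\{T_k\le n\}}\sum_{j=1}^{k-1} Q_x^2(T_k,T_j).$$

Thanks to Proposition 5.2 we obtain

$$D^{(1)}_{2,n} = \int_0^n\left(\int_0^t Q_x^2(t,s)\,ds\right)\rho(t)\,dt \le |\rho|_*\int_0^n\int_0^n Q_x^2(t,s)\,ds\,dt.$$

In view of the definition of $Q_x$ in (7.19), we can rewrite the last integral as

$$\int_0^n Q_x^2(t,s)\,ds = \sum_{1\le i,j\le n} x_i x_j\,\phi_i(t)\phi_j(t)\int_0^n \phi_i(s)\phi_j(s)\,ds = \sum_{i=1}^n x_i^2\,\phi_i^2(t)\int_0^n \phi_i^2(s)\,ds = n\sum_{i=1}^n x_i^2\,\phi_i^2(t).$$

Since $\sum_{j=1}^n x_j^2\le 1$, we obtain that

$$\int_0^n Q_x^2(t,s)\,ds \le \phi_{\max}^2\,n \quad\text{and}\quad D^{(1)}_{2,n} \le \phi_{\max}^2\,|\rho|_*\,n^2. \tag{7.21}$$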

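Let us estimate now the last term in (7.20). First, note that we can represent this term as

$$D^{(2)}_{2,n} = \sum_{k=1}^{+\infty}\mathbf{E}\,\mathbf{1}_{\{T_k\le n\}}\sum_{j=1}^{k-1} Q_x^2(T_k,T_j) = \mathbf{E}\sum_{j=1}^{\infty}\mathbf{1}_{\{T_j\le n\}}\,G(T_j) = \int_0^n G(t)\,\rho(t)\,dt,$$

where

$$G(t) = \sum_{k=1}^{+\infty}\mathbf{E}\,\mathbf{1}_{\{T_k\le n\}}\,Q_x^2(t+T_k,t) = \int_0^n Q_x^2(t+v,t)\,\rho(v)\,dv = \int_t^{n+t} Q_x^2(u,t)\,\rho(u-t)\,du.$$

It is clear that, for any $0\le t\le n$,

$$\int_t^{n+t} Q_x^2(u,t)\,\rho(u-t)\,du \le |\rho|_*\int_0^{2n} Q_x^2(v,t)\,dv.$$

In view of the inequality (7.21) we obtain

$$\int_0^{2n} Q_x^2(u,t)\,du = \int_0^{2n} Q_x^2(t,u)\,du \le 2\phi_{\max}^2\,n.$$

Therefore,

$$\max_{0\le t\le n} G(t) \le 2|\rho|_*\,\phi_{\max}^2\,n \quad\text{and}\quad D^{(2)}_{2,n} \le 2|\rho|_*^2\,\phi_{\max}^2\,n^2.$$

So, estimating $\varrho_2^2$ by $\check\tau\,\sigma_Q$ and taking into account that $\mathbf{E}Y_1^4\ge 1$, we obtain that

$$\mathbf{E}D_n(x) \le 13\,(1+\check\tau)\,\phi_{\max}^4\,\mathbf{E}Y_1^4\,(1+|\rho|_*^2)\,n^2\,\sigma_Q.$$

Using all these bounds in (7.11) we obtain (7.5) and thus the conclusion follows. □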

Remark 7.1. The properties (7.2) and (7.5) are used to obtain the oracle inequalities given in Section 4 (see, for example, [14]).

8 Simulation

In this section we report the results of a Monte Carlo experiment in order to assess the performance of the proposed model selection procedure (3.15). In (1.1) we chose a 1-periodic function which is defined, for $0\le t\le 1$, as

$$S(t) = t\sin(2\pi t) + t^2(1-t)\cos(4\pi t). \tag{8.1}$$

We simulate the model

$$dy_t = S(t)\,dt + d\xi_t,$$

where $\xi_t = 0.5\,w_t + 0.5\,z_t$. Here $z_t$ is the semi-Markov process defined in (2.3) with a Gaussian $\mathcal{N}(0,1)$ sequence $(Y_j)_{j\ge 1}$, and the sequence $(\tau_k)_{k\ge 1}$ used in (2.4) is taken as $\tau_k\sim\chi^2_3$.
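The simulated model above can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used for the reported experiment: the function names, the Euler grid with `steps_per_unit` points per unit of time, and the assumption that the renewal process starts at time 0 are all choices made here for illustration. It relies on the fact that a $\chi^2_3$ variable is a Gamma variable with shape $3/2$ and scale $2$.

```python
import math
import random


def S(t):
    """Regression function (8.1), extended 1-periodically."""
    t = t % 1.0
    return t * math.sin(2 * math.pi * t) + t**2 * (1 - t) * math.cos(4 * math.pi * t)


def jump_times(n, rng):
    """Jump times T_k = tau_1 + ... + tau_k <= n with tau_k ~ chi^2_3 (assumed to start at 0)."""
    times, t = [], 0.0
    while True:
        # chi-square with 3 degrees of freedom = Gamma(shape=3/2, scale=2)
        t += rng.gammavariate(1.5, 2.0)
        if t > n:
            return times
        times.append(t)


def simulate_path(n, steps_per_unit=100, seed=0):
    """Euler-grid path of y_t = int_0^t S(s) ds + 0.5 w_t + 0.5 z_t on [0, n]."""
    rng = random.Random(seed)
    h = 1.0 / steps_per_unit
    jumps = jump_times(n, rng)
    y, path, k = 0.0, [0.0], 0
    for i in range(1, n * steps_per_unit + 1):
        t = i * h
        # drift plus Brownian increment with variance h
        y += S(t - h) * h + 0.5 * rng.gauss(0.0, math.sqrt(h))
        # add the N(0,1) jumps Y_k of z_t that fall in (t - h, t]
        while k < len(jumps) and jumps[k] <= t:
            y += 0.5 * rng.gauss(0.0, 1.0)
            k += 1
        path.append(y)
    return path, jumps
```

For example, `simulate_path(20)` returns the observation path on a grid of 2001 points together with the jump times of $z_t$ on $[0, 20]$.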

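We use the model selection procedure (3.15) with the weights (3.20) in which $k^* = 100 + \sqrt{\ln(n)}$, $t_i = i/\ln(n)$, $m = [\ln^2(n)]$ and $\delta = (3+\ln(n))^{-2}$. We define the empirical risk as

$$R = \frac{1}{p}\sum_{j=1}^{p}\widehat{\mathbf{E}}\left(\widehat S_n(t_j) - S(t_j)\right)^2, \tag{8.2}$$

where the observation frequency is $p = 100001$ and the expectation is taken as an average over $N = 10000$ replications, i.e.,

$$\widehat{\mathbf{E}}\left(\widehat S_n(\cdot) - S(\cdot)\right)^2 = \frac{1}{N}\sum_{l=1}^{N}\left(\widehat S^l_n(\cdot) - S(\cdot)\right)^2.$$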
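The double average in (8.2) can be written as a short helper. The sketch below assumes that the $N$ replicated estimates have already been evaluated on the grid $(t_j)$; the function name and the list-of-lists layout are hypothetical and chosen only for illustration.

```python
def empirical_risk(estimates, truth):
    """Empirical risk (8.2).

    estimates: list of N replications, each a list of p values S_hat^l(t_j)
    truth:     list of p values S(t_j) on the same grid
    """
    n_rep, p = len(estimates), len(truth)
    total = 0.0
    for est in estimates:
        # squared error of one replication, summed over the grid
        total += sum((e - s) ** 2 for e, s in zip(est, truth))
    # average over replications and over grid points
    return total / (n_rep * p)
```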

   n      R         R*
  20    0.04430   0.235
 100    0.01290   0.068
 200    0.00812   0.043
1000    0.00196   0.010

Table 1: Empirical risks

We set the relative quadratic risk as

$$R^* = R/\|S\|^2_p, \quad\text{with}\quad \|S\|^2_p = \frac{1}{p}\sum_{j=0}^{p} S^2(t_j). \tag{8.3}$$

In our case $\|S\|^2_p = 0.1883601$.

Table 1 gives the values for the sample risks (8.2) and (8.3) for different numbers of observations $n$.
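The normalisation in (8.3) and the last column of Table 1 can be checked numerically. The sketch below assumes the uniform grid $t_j = j/p$ on $[0,1]$, which is not stated explicitly in this section; the variable names are illustrative.

```python
import math


def S(t):
    """Regression function (8.1) on [0, 1]."""
    return t * math.sin(2 * math.pi * t) + t**2 * (1 - t) * math.cos(4 * math.pi * t)


def grid_norm_sq(p):
    """||S||_p^2 from (8.3) on the assumed grid t_j = j / p, j = 0, ..., p."""
    return sum(S(j / p) ** 2 for j in range(p + 1)) / p


norm_sq = grid_norm_sq(100001)

# Empirical risks R from Table 1, indexed by the number of observations n
risks = {20: 0.04430, 100: 0.01290, 200: 0.00812, 1000: 0.00196}

# Relative quadratic risks R* = R / ||S||_p^2
relative = {n: R / norm_sq for n, R in risks.items()}
```

Under this grid assumption, `norm_sq` agrees with the value of $\|S\|^2_p$ quoted above to the displayed precision, and rounding the entries of `relative` to three decimals reproduces the $R^*$ column of Table 1.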

Figures 1–4 show the behaviour of the regression function and its estimates by the model selection procedure (3.15) depending on the values of observation periodsn. The black full line is the regression function (8.1) and the red dotted line is the associated estimator.

Figure 1: Estimator of $S$ for $n = 20$

Figure 3: Estimator of $S$ for $n = 200$

Remark 8.1. From numerical simulations of the procedure (3.15) with various numbers of observations $n$ we may conclude that the quality of the proposed procedure: (i) is good for practical needs, i.e. for a reasonable (not large) number of observations; (ii) improves as the number of observations increases.

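9 Proofs

We will prove here most of the results of this paper.

9.1 Proof of Theorem 4.1

First, note that we can rewrite the empirical squared error in (3.9) as follows:

$$\mathrm{Err}_n(\lambda) = J_n(\lambda) + 2\sum_{j=1}^{n}\lambda(j)\,\check\theta_{j,n} + \|S\|^2 - \delta P_n(\lambda), \tag{9.1}$$

where $\check\theta_{j,n} = \tilde\theta_{j,n} - \theta_j\,\widehat\theta_{j,n}$. Using the definition of $\tilde\theta_{j,n}$ in (3.10) we obtain that

$$\check\theta_{j,n} = \frac{1}{\sqrt n}\,\theta_j\,\xi_{j,n} + \frac{1}{n}\,\tilde\xi_{j,n} + \frac{1}{n}\,\varsigma_{j,n} + \frac{\sigma_Q - \widehat\sigma_n}{n},$$

where $\varsigma_{j,n} = \mathbf{E}_Q\,\xi_{j,n}^2 - \sigma_Q$ and $\tilde\xi_{j,n} = \xi_{j,n}^2 - \mathbf{E}_Q\,\xi_{j,n}^2$. Putting

$$M(\lambda) = \frac{1}{\sqrt n}\sum_{j=1}^n \lambda(j)\,\theta_j\,\xi_{j,n} \quad\text{and}\quad P^0_n(\lambda) = \frac{\sigma_Q\,|\lambda|^2}{n}, \tag{9.2}$$

we can rewrite (9.1) as

$$\mathrm{Err}_n(\lambda) = J_n(\lambda) + 2\,\frac{\sigma_Q - \widehat\sigma_n}{n}\,\check L(\lambda) + 2M(\lambda) + \frac{2}{n}\,B_{1,Q,n}(\lambda) + 2\sqrt{P^0_n(\lambda)}\,\frac{B_{2,Q,n}(e(\lambda))}{\sqrt{\sigma_Q\,n}} + \|S\|^2 - \delta P_n(\lambda), \tag{9.3}$$

where $e(\lambda) = \lambda/|\lambda|$, the function $\check L(\cdot)$ is defined in (3.8) and the functions $B_{1,Q,n}(\cdot)$ and $B_{2,Q,n}(\cdot)$ are given in (7.1).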

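Let $\lambda_0 = (\lambda_0(j))_{1\le j\le n}$ be a fixed sequence in $\Lambda$ and $\widehat\lambda$ be as in (3.16). Substituting $\lambda_0$ and $\widehat\lambda$ in Equation (9.3), we obtain

$$\mathrm{Err}_n(\widehat\lambda) - \mathrm{Err}_n(\lambda_0) = J_n(\widehat\lambda) - J_n(\lambda_0) + 2\,\frac{\sigma_Q - \widehat\sigma_n}{n}\,\check L(\varpi) + \frac{2}{n}\,B_{1,Q,n}(\varpi) + 2M(\varpi)$$
$$+\ 2\sqrt{P^0_n(\widehat\lambda)}\,\frac{B_{2,Q,n}(\widehat e)}{\sqrt{\sigma_Q\,n}} - 2\sqrt{P^0_n(\lambda_0)}\,\frac{B_{2,Q,n}(e_0)}{\sqrt{\sigma_Q\,n}} - \delta P_n(\widehat\lambda) + \delta P_n(\lambda_0), \tag{9.4}$$

where $\varpi = \widehat\lambda - \lambda_0$, $\widehat e = e(\widehat\lambda)$ and $e_0 = e(\lambda_0)$. Note that, by (3.8),

$$|\check L(\varpi)| \le \check L(\widehat\lambda) + \check L(\lambda_0) \le 2|\Lambda|_*.$$

Applying the inequality

$$2|ab| \le \delta a^2 + \delta^{-1} b^2 \tag{9.5}$$

implies that, for any $\lambda\in\Lambda$,

$$2\sqrt{P^0_n(\lambda)}\,\frac{|B_{2,Q,n}(e(\lambda))|}{\sqrt{\sigma_Q\,n}} \le \delta P^0_n(\lambda) + \frac{B^2_{2,Q,n}(e(\lambda))}{\delta\,\sigma_Q\,n}.$$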

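Taking into account the bound (7.2), we get

$$\mathrm{Err}_n(\widehat\lambda) \le \mathrm{Err}_n(\lambda_0) + 2M(\varpi) + \frac{2C_{1,Q,n}}{n} + \frac{2B_{2,Q,n}}{\delta\,\sigma_Q\,n} + \frac{1}{n}\,|\widehat\sigma_n - \sigma_Q|\,\left(|\widehat\lambda|^2 + |\lambda_0|^2\right) + 2\delta P_n(\lambda_0),$$

where $B_{2,Q,n} = \sup_{\lambda\in\Lambda} B^2_{2,Q,n}(e(\lambda))$. Moreover, noting that in view of (3.8) $\sup_{\lambda\in\Lambda}|\lambda|^2 \le |\Lambda|_*$, we can rewrite the previous bound as

$$\mathrm{Err}_n(\widehat\lambda) \le \mathrm{Err}_n(\lambda_0) + 2M(\varpi) + \frac{2C_{1,Q,n}}{n} + \frac{2B_{2,Q,n}}{\delta\,\sigma_Q\,n} + \frac{4|\Lambda|_*}{n}\,|\widehat\sigma_n - \sigma_Q| + 2\delta P_n(\lambda_0). \tag{9.6}$$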

To estimate the second term in the right side of this inequality we set

Sx=

n

X

j=1

(38)

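Thanks to (2.6) we estimate the term $M(x)$ for any $x\in\mathbb{R}^n$ as

$$\mathbf{E}_Q\,M^2(x) \le \kappa_Q\,\frac{1}{n}\sum_{j=1}^n x^2(j)\,\theta_j^2 = \kappa_Q\,\frac{1}{n}\,\|S_x\|^2. \tag{9.7}$$

To estimate this function for a random vector $x\in\mathbb{R}^n$ we set

$$Z^* = \sup_{x\in\Lambda_1}\frac{n\,M^2(x)}{\|S_x\|^2}, \qquad \Lambda_1 = \Lambda - \lambda_0.$$

So, through Inequality (9.5), we get

$$2|M(x)| \le \delta\,\|S_x\|^2 + \frac{Z^*}{n\delta}. \tag{9.8}$$

It is clear that the last term here can be estimated as

$$\mathbf{E}_Q\,Z^* \le \sum_{x\in\Lambda_1}\frac{n\,\mathbf{E}_Q\,M^2(x)}{\|S_x\|^2} \le \sum_{x\in\Lambda_1}\kappa_Q = \kappa_Q\,\check\iota, \tag{9.9}$$

where $\check\iota = \mathrm{card}(\Lambda)$. Moreover, note that, for any $x\in\Lambda_1$,

$$\|S_x\|^2 - \|\widehat S_x\|^2 = \sum_{j=1}^n x^2(j)\,(\theta_j^2 - \widehat\theta_j^2) \le -2M_1(x), \tag{9.10}$$

where $M_1(x) = n^{-1/2}\sum_{j=1}^n x^2(j)\,\theta_j\,\xi_{j,n}$. Taking into account that, for any $x\in\Lambda_1$, the components satisfy $|x(j)|\le 1$, we can estimate this term as in (9.7), i.e.,

$$\mathbf{E}_Q\,M_1^2(x) \le \kappa_Q\,\frac{\|S_x\|^2}{n}.$$

Similarly to the previous reasoning we set

$$Z_1^* = \sup_{x\in\Lambda_1}\frac{n\,M_1^2(x)}{\|S_x\|^2}$$

and we get

$$\mathbf{E}_Q\,Z_1^* \le \kappa_Q\,\check\iota. \tag{9.11}$$

Using the same type of arguments as in (9.8), we can derive

$$2|M_1(x)| \le \delta\,\|S_x\|^2 + \frac{Z_1^*}{n\delta}. \tag{9.12}$$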

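From here and (9.10), we get

$$\|S_x\|^2 \le \frac{\|\widehat S_x\|^2}{1-\delta} + \frac{Z_1^*}{n\delta(1-\delta)} \tag{9.13}$$

for any $0<\delta<1$. Using this bound in (9.8) yields

$$2M(x) \le \frac{\delta\,\|\widehat S_x\|^2}{1-\delta} + \frac{Z^* + Z_1^*}{n\delta(1-\delta)}.$$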
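The two steps above can be spelled out as follows. This is a routine verification, written under the assumption that $2|M_1(x)|$ satisfies a bound of the same form as (9.8) with $Z_1^*$ in place of $Z^*$. From (9.10),

$$\|S_x\|^2 \le \|\widehat S_x\|^2 + 2|M_1(x)| \le \|\widehat S_x\|^2 + \delta\,\|S_x\|^2 + \frac{Z_1^*}{n\delta}, \quad\text{so that}\quad (1-\delta)\,\|S_x\|^2 \le \|\widehat S_x\|^2 + \frac{Z_1^*}{n\delta},$$

which is (9.13) after division by $1-\delta$. Substituting (9.13) into (9.8) and using $0<\delta<1$ together with $Z^*\ge 0$, $Z_1^*\ge 0$,

$$2|M(x)| \le \frac{\delta\,\|\widehat S_x\|^2}{1-\delta} + \frac{Z_1^*}{n(1-\delta)} + \frac{Z^*}{n\delta} \le \frac{\delta\,\|\widehat S_x\|^2}{1-\delta} + \frac{Z^* + Z_1^*}{n\delta(1-\delta)}.$$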

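Taking into account that $\|\widehat S_\varpi\|^2 \le 2\left(\mathrm{Err}_n(\widehat\lambda) + \mathrm{Err}_n(\lambda_0)\right)$, we obtain

$$2M(\varpi) \le \frac{2\delta\left(\mathrm{Err}_n(\widehat\lambda) + \mathrm{Err}_n(\lambda_0)\right)}{1-\delta} + \frac{Z^* + Z_1^*}{n\delta(1-\delta)}.$$

Using this bound in (9.6) we obtain

$$\mathrm{Err}_n(\widehat\lambda) \le \frac{1+\delta}{1-3\delta}\,\mathrm{Err}_n(\lambda_0) + \frac{Z^* + Z_1^*}{n\delta(1-3\delta)} + \frac{2C_{1,Q,n}}{n(1-3\delta)} + \frac{2B_{2,Q,n}}{\delta(1-3\delta)\,\sigma_Q\,n} + \frac{4|\Lambda|_* + 2}{n(1-3\delta)}\,|\widehat\sigma_n - \sigma_Q| + \frac{2\delta}{1-3\delta}\,P^0_n(\lambda_0).$$

Moreover, for $0<\delta<1/6$, we can rewrite this inequality as

$$\mathrm{Err}_n(\widehat\lambda) \le \frac{1+\delta}{1-3\delta}\,\mathrm{Err}_n(\lambda_0) + \frac{2(Z^* + Z_1^*)}{n\delta} + \frac{4C_{1,Q,n}}{n} + \frac{4B_{2,Q,n}}{\delta\,\sigma_Q\,n} + \frac{8|\Lambda|_* + 2}{n}\,|\widehat\sigma_n - \sigma_Q| + \frac{2\delta}{1-3\delta}\,P^0_n(\lambda_0).$$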

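In view of Proposition 7.2 we estimate the expectation of the term $B_{2,Q,n}$ in (9.6) as

$$\mathbf{E}_Q\,B_{2,Q,n} \le \sum_{\lambda\in\Lambda}\mathbf{E}_Q\,B^2_{2,Q,n}(e(\lambda)) \le \check\iota\,C_{2,Q,n}.$$

Taking into account that $|\Lambda|_*\ge 1$, we get

$$\mathcal{R}(\widehat S_*, S) \le \frac{1+\delta}{1-3\delta}\,\mathcal{R}(\widehat S_{\lambda_0}, S) + \frac{4\kappa_Q\,\check\iota}{n\delta} + \frac{4C_{1,Q,n}}{n} + \frac{4\check\iota\,C_{2,Q,n}}{\delta\,\sigma_Q\,n} + \frac{10|\Lambda|_*}{n}\,\mathbf{E}_Q\,|\widehat\sigma_n - \sigma_Q| + \frac{2\delta}{1-3\delta}\,P^0_n(\lambda_0).$$

Using the upper bound for $P^0_n(\lambda_0)$ in Lemma A.1, one obtains (4.1), which finishes the proof. □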

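9.2 Proof of Proposition 4.2

We use here the same method as in [11]. First of all note that the definition (3.12) implies that

$$\widehat t_{j,n} = t_j + \frac{1}{\sqrt n}\,\eta_{j,n}, \tag{9.14}$$

where

$$t_j = \int_0^1 S(t)\,\mathrm{Tr}_j(t)\,dt \quad\text{and}\quad \eta_{j,n} = \frac{1}{\sqrt n}\int_0^n \mathrm{Tr}_j(t)\,d\xi_t.$$

So, we have

$$\widehat\sigma_n = \sum_{j=[\sqrt n]+1}^{n} t_j^2 + 2M_n + \frac{1}{n}\sum_{j=[\sqrt n]+1}^{n}\eta_{j,n}^2, \tag{9.15}$$

where

$$M_n = \frac{1}{\sqrt n}\sum_{j=[\sqrt n]+1}^{n} t_j\,\eta_{j,n}.$$

Note that, for continuously differentiable functions (see, for example, Lemma A.6 in [11]), the Fourier coefficients $(t_j)$ satisfy the following inequality, for any $n\ge 1$,

$$\sum_{j=[\sqrt n]+1}^{\infty} t_j^2 \le \frac{4\left(\int_0^1 |\dot S(t)|\,dt\right)^2}{\sqrt n} \le \frac{4\|\dot S\|^2}{\sqrt n}. \tag{9.16}$$

In the same way as in (9.7) we estimate the term $M_n$, i.e.,

$$\mathbf{E}_Q\,M_n^2 \le \frac{\kappa_Q}{n}\sum_{j=[\sqrt n]+1}^{n} t_j^2 \le \frac{4\kappa_Q\,\|\dot S\|^2}{n\sqrt n},$$

while the absolute value of this term for $n\ge 1$ can be estimated as

$$\mathbf{E}_Q\,|M_n| \le \frac{\kappa_Q + \|\dot S\|^2}{\sqrt n}.$$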

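Moreover, using Propositions 7.1 and 7.2 we can represent the last term in (9.15) as

$$\frac{1}{n}\sum_{j=[\sqrt n]+1}^{n}\eta_{j,n}^2 = \frac{\sigma_Q\,(n-\sqrt n)}{n} + \frac{B_{1,Q,n}(x')}{n} + \frac{B_{2,Q,n}(x'')}{\sqrt n},$$

with $x'_j = \mathbf{1}_{\{\sqrt n<j\le n\}}$ and $x''_j = \mathbf{1}_{\{\sqrt n<j\le n\}}/\sqrt n$. Therefore,

$$\mathbf{E}_Q\left|\frac{1}{n}\sum_{j=[\sqrt n]+1}^{n}\eta_{j,n}^2 - \sigma_Q\right| \le \frac{\sigma_Q}{\sqrt n} + \frac{C_{1,Q,n}}{n} + \frac{\sqrt{C_{2,Q,n}}}{\sqrt n}.$$

Taking into account that $C_{2,Q,n}\ge 1$, we obtain the bound (4.5) and hence the desired result. □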

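9.3 Proof of Theorem 4.4

First note that, in view of (3.22) and (3.18),

$$\lim_{n\to\infty}\frac{\check\iota}{n^{\check\epsilon}} = \lim_{n\to\infty}\frac{k^* m}{n^{\check\epsilon}} = 0 \quad\text{for any } \check\epsilon>0.$$

Furthermore, the bound (3.23) and the conditions (2.10) and (3.18) yield

$$\lim_{n\to\infty}\frac{|\Lambda|_*}{n^{1/3+\check\epsilon}} = 0 \quad\text{for any } \check\epsilon>0.$$

So, from here we obtain the convergence (4.10). □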

9.4 Proof of Theorem 4.5

First, we denote by $Q_0$ the distribution of the noise (1.2) and (2.1) with the parameters $\varrho_1 = \varsigma^*$, $\check\varrho = 1$ and $\varrho_2 = 0$, i.e. the distribution for the “signal + white noise” model. So, we can estimate the robust risk from below as

$$\mathcal{R}^*_n(\tilde S_n, S) \ge \mathcal{R}_{Q_0}(\tilde S_n, S).$$

Now Theorem 6.1 from [12] yields the lower bound (4.14), which finishes the proof. □

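9.5 Proof of Proposition 4.6

Putting $\lambda_0(j) = 0$ for $j\ge n$ we can represent the quadratic risk for the estimator (3.7) as

$$\|\widehat S_{\lambda_0} - S\|^2 = \sum_{j=1}^{\infty}(1-\lambda_0(j))^2\,\theta_j^2 - 2H_n + \frac{1}{n}\sum_{j=1}^{n}\lambda_0^2(j)\,\xi_{j,n}^2,$$

where $H_n = n^{-1/2}\sum_{j=1}^{n}(1-\lambda_0(j))\,\lambda_0(j)\,\theta_j\,\xi_{j,n}$. Note that $\mathbf{E}_Q\,H_n = 0$ for any $Q\in\mathcal{Q}_n$; therefore,

$$\mathbf{E}_Q\,\|\widehat S_{\lambda_0} - S\|^2 = \sum_{j=1}^{\infty}(1-\lambda_0(j))^2\,\theta_j^2 + \frac{1}{n}\,\mathbf{E}_Q\sum_{j=1}^{n}\lambda_0^2(j)\,\xi_{j,n}^2.$$

Proposition 7.1 and the last inequality in (2.9) imply that for any $Q\in\mathcal{Q}_n$

$$\mathbf{E}_Q\sum_{j=1}^{n}\lambda_0^2(j)\,\xi_{j,n}^2 \le \varsigma^*\sum_{j=1}^{n}\lambda_0^2(j) + \phi_{\max}^2\,\varsigma^*\,\|\Upsilon\|_1\,\check\tau := \varsigma^*\sum_{j=1}^{n}\lambda_0^2(j) + C^*_{1,n}.$$

Therefore,

$$\mathcal{R}^*_n(\widehat S_{\lambda_0}, S) \le \sum_{j=j_*}^{\infty}(1-\lambda_0(j))^2\,\theta_j^2 + \frac{1}{\upsilon_n}\sum_{j=1}^{n}\lambda_0^2(j) + \frac{C^*_{1,n}}{n},$$

where $j_*$ and $\upsilon_n$ are defined in (3.20). Setting

$$\Upsilon_{1,n}(S) = \upsilon_n^{2k/(2k+1)}\sum_{j=j_*}^{\infty}(1-\lambda_0(j))^2\,\theta_j^2 \quad\text{and}\quad \Upsilon_{2,n} = \frac{1}{\upsilon_n^{1/(2k+1)}}\sum_{j=1}^{n}\lambda_0^2(j),$$

we rewrite the last inequality as

$$\upsilon_n^{2k/(2k+1)}\,\mathcal{R}^*_n(\widehat S_{\lambda_0}, S) \le \Upsilon_{1,n}(S) + \Upsilon_{2,n} + \check C_n, \tag{9.17}$$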

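where $\check C_n = \upsilon_n^{2k/(2k+1)}\,C^*_{1,n}/n$. Note that the conditions (2.10) and (4.8) imply that $C^*_{1,n} = o(n^{\check\delta})$ as $n\to\infty$ for any $\check\delta>0$; therefore, $\check C_n\to 0$ as $n\to\infty$. Putting

$$u_n = \upsilon_n^{2k/(2k+1)}\sup_{j\ge j_*}(1-\lambda_0(j))^2/a_j,$$

with $a_j$ defined in (4.12), we estimate the first term in (9.17) as

$$\sup_{S\in W^k_r}\Upsilon_{1,n}(S) \le \sup_{S\in W^k_r} u_n\sum_{j\ge 1} a_j\,\theta_j^2 \le u_n\,r.$$

Taking into account that $a_j/(\pi^{2k} j^{2k})\to 1$ as $j\to\infty$ and $l_0\to r$ as $\varepsilon\to 0$, and using the definition of $\omega_{\alpha_0}$ in (3.20), we obtain that

$$\limsup_{n\to\infty} u_n \le \lim_{n\to\infty}\upsilon_n^{2k/(2k+1)}\sup_{j\ge j_*}\frac{(1-\lambda_0(j))^2}{(\pi j)^{2k}} = \lim_{n\to\infty}\frac{\upsilon_n^{2k/(2k+1)}}{\pi^{2k}\,\omega_{\alpha_0}^{2k}} = \frac{1}{\pi^{2k}\,(d_k r)^{2k/(2k+1)}}.$$

Therefore,

$$\limsup_{n\to\infty}\sup_{S\in W^k_r}\Upsilon_{1,n}(S) \le \frac{r^{1/(2k+1)}}{\pi^{2k}\,(d_k)^{2k/(2k+1)}} =: \Upsilon^*_1. \tag{9.18}$$

As to the second term in (9.17), note that

$$\lim_{n\to\infty}\frac{1}{\omega_{\alpha_0}}\sum_{j=1}^{n}\lambda_0^2(j) = \int_0^1 (1-t^k)^2\,dt = \frac{2k^2}{(k+1)(2k+1)}.$$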
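For completeness, the value of the integral above can be checked by expanding the square; this is a routine verification rather than part of the original argument:

$$\int_0^1 (1-t^k)^2\,dt = 1 - \frac{2}{k+1} + \frac{1}{2k+1} = \frac{(k+1)(2k+1) - 2(2k+1) + (k+1)}{(k+1)(2k+1)} = \frac{2k^2}{(k+1)(2k+1)}.$$

For instance, $k = 1$ gives $\int_0^1 (1-t)^2\,dt = 1/3$, in agreement with the formula.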

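So, taking into account that $\omega_{\alpha_0}/\upsilon_n^{1/(2k+1)}\to(d_k r)^{1/(2k+1)}$ as $n\to\infty$, the limit of $\Upsilon_{2,n}$ can be calculated as

$$\lim_{n\to\infty}\Upsilon_{2,n} = \frac{2\,(d_k r)^{1/(2k+1)}\,k^2}{(k+1)(2k+1)} =: \Upsilon^*_2.$$

Moreover, since $\Upsilon^*_1 + \Upsilon^*_2 =: r^*_k$, we obtain

$$\lim_{n\to\infty}\upsilon_n^{2k/(2k+1)}\sup_{S\in W^k_r}\mathcal{R}^*_n(\widehat S_{\lambda_0}, S) \le r^*_k$$

and get the desired result. □

9.6 Proof of Theorem 4.7

Combining Proposition 4.6 and Theorem 4.4 yields Theorem 4.7. □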

Acknowledgments. This research was partially supported by the Ministry of

Education and Science of the Russian Federation, project (No 2.3208.2017/PCH), by the Russian Federal Professor program of the Ministry of Education and Science of the Russian Federation, (project No 1.472.2016/FPM) and by the Academic D.I. Mendeleev Fund Program of the Tomsk State University (research project NU 8.1.55.2015 L).

10 Appendix

A.1 Property of the penalty term

Lemma A.1. For any $n\ge 1$ and $\lambda\in\Lambda$,

$$P^0_n(\lambda) \le \mathbf{E}_Q\,\mathrm{Err}_n(\lambda) + \frac{C_{1,Q,n}}{n},$$

where the coefficientP0
