Log-density Deconvolution by Wavelet Thresholding

(1)

This is an author-deposited version published in: http://oatao.univ-toulouse.fr/ Eprints ID: 8328

To link to this article: DOI: 10.1214/10-EJS576 URL: http://dx.doi.org/10.1214/10-EJS576

To cite this version:

Bigot, Jérémie and Gadat, Sébastien and Marteau,

Clément

Sharp template estimation in a shifted curves model. (2010)

Electronic Journal of Statistics, vol. 4. pp. 994-1021. ISSN 1935-7524

O

pen

A

rchive

T

oulouse

A

rchive

O

uverte (

OATAO

)

OATAO is an open access repository that collects the work of Toulouse researchers and makes it freely available over the web where possible.

Any correspondence concerning this service should be sent to the repository administrator: [email protected]

(2)

Sharp template estimation in a shifted curves model

Jérémie Bigot, Sébastien Gadat & Clément Marteau Institut de Mathématiques de Toulouse Université de Toulouse et CNRS (UMR 5219)

31062 Toulouse, Cedex 9, France

{Jeremie.Bigot,Sebastien.Gadat,Clement.Marteau}@math.univ-toulouse.fr, August 3, 2010

Abstract

This paper considers the problem of adaptive estimation of a template in a randomly shifted curve model. Using the Fourier transform of the data, we show that this problem can be transformed into a linear inverse problem with a random operator. Our aim is to approach the estimator that has the smallest risk on the true template over a finite set of linear estimators defined in the Fourier domain. Based on the principle of unbiased empirical risk minimization, we derive a nonasymptotic oracle inequality in the case where the law of the random shifts is known. This inequality can then be used to obtain adaptive results on Sobolev spaces as the number of observed curves tend to infinity. Some numerical experiments are given to illustrate the performances of our approach.

Keywords: Template estimation, Curve alignment, Inverse problem, Random operator, Oracle inequality, Adaptive estimation.

1 Introduction

1.1 Model and objectives

The goal of this paper is to study a special class of linear inverse problems with a random operator. We consider the problem of estimating a curve f, called template or shape function, from the observations ofnnoisy and randomly shifted curvesY1, . . . Yncoming from the following

Gaussian white noise model:

dYj(x) =f(x−τj)dx+ǫdWj(x), x∈[0,1], j= 1, . . . , n (1.1)

where Wj are independent standard Brownian motions on [0,1], ǫ represents a level of noise

common to all curves, the τj’s are unknown random shifts independent of the Wj’s, f is the

unknown template to recover, and nis the number of observed curves that may be let going to infinity to study asymptotic properties. This model is realistic in many situations where it is reasonable to assume that the observed curves represent replications of almost the same process and when a large source of variation in the experiments is due to transformations of the time axis. Such a model is commonly used in many applied areas dealing with functional data such as neuroscience (see e.g. Isserles, Ritov and Trigano (2008)) or biology (see e.g. Ronn (1998)). A well known problem in functional data analysis is the alignment of similar curves that differ by a time transformation to extract their common features, and (1.1) is a simple model where

f represents such common features (see Ramsay and Silverman (2002), Ramsay and Silverman (2005) for a detailed introduction to curve alignment problems in statistics).

The function f :R_→R_{is assumed to be of period}₁ _{so that the model (1.1) is well defined,} and the shifts τj are supposed to be independent and identically distributed (i.i.d.) random

(3)

the paper, it is supposed that the density g is known. Estimating f can be seen as an inverse problem with a random operator. Indeed, the template f is not observed directly, but through

nindependent realizations of the random operator Aτ :L2per([0,1])→L2per([0,1]) defined by

Aτ(f)(x) =f(x−τ), x∈[0,1],

whereL2

per([0,1])denotes the space of squared integrable functions on[0,1]with period 1, andτ

is random variable with densityg. The additive Gaussian noise makes this problem ill-posed, and Bigot and Gadat (2010) have shown that estimating f in such models is in fact a deconvolution problem where the densityg of the random shifts plays the role of the convolution operator. For theL2 risk on [0,1], Bigot and Gadat (2010) have derived the minimax rate of convergence for the estimation of f over Besov balls as n tends to infinity. This minimax rate depends both on the smoothness of the template and on the decay of the Fourier coefficients of the densityg. This is a well known fact for standard deconvolution problem in statistics, see e.g. Fan (1991), Donoho (1995), but the results in Bigot and Gadat (2010) represent a novel contribution and a new point of view on template estimation in inverse problems with a random operator such as (1.1). This appears also to be a new setting in the field of inverse problem with partially known operators as considered in Cavalier and Hengartner (2005), Efromovich and Koltchinskii (2001), Hoffmann and Reiß (2008), Marteau (2006) and Cavalier and Raimondo (2007).

However, the approach followed in Bigot and Gadat (2010) is only asymptotic, and the main goal of this paper is to derive non-asymptotic results to study the estimation of f by keeping fixed the number nof observed curves.

1.2 Fourier Analysis and an inverse problem formulation

Supposing that f ∈L2_per([0,1]), we denote byθk its kth Fourier coefficient, namely:

θk=

Z 1 0

e−2ikπxf(x)dx.

In the Fourier domain, the model (1.1) can be rewritten as

cj,k :=

Z 1 0

e−2ikπxdYj(x) =θke−i2πkτj+ǫzk,j (1.2)

where zk,j are i.i.d. NC(0,1) variables, i.e. complex Gaussian variables with zero mean and

such that E_|_z_k,j_|2 _{= 1}_{. This means that the real and imaginary parts of the} _z_k,j _{’s are Gaussian} variables with zero mean and variance 1/2. Thus, we can compute the sample mean of the kth

Fourier coefficient over thencurves as ˜ ck := 1 n n X j=1 ck,j =θkγ˜k+ ǫ √ nξk, (1.3) where ˜ γk:= 1 n n X j=1 e−i2πkτj_, _(1.4) and the ξk’s are i.i.d. complex Gaussian variables with zero mean and variance 1. The Fourier

coefficients ˜ck in equation (1.3) can be viewed as observations coming from a statistical inverse

problem. Indeed, the standard sequence space model of an ill-posed statistical inverse problem is (see Cavalier, Golubev, Picard and Tsybakov (2002) and the references therein)

ck =θkγk+σzk, (1.5)

where theγk’s are eigenvalues of a known linear operator,zkare random noise variables and σis

(4)

is to recover the coefficients θk from the observations ck under various conditions on the decay

to zero of theγk’s as|k| →+∞. A large class of estimators for the problem (1.5) can be written

as ˆ θk=λk ck γk ,

where λ= (λk)k∈Z is a sequence of reals called filter. Various estimators of this form have been

studied in a number of papers, and we refer to Cavalier et al. (2002) for more details. In a sense, we can view equation (1.3) as a linear inverse problem (with σ = √ǫ

n) with a

stochastic operator whose eigenvalues ˜γk = _n1Pjn=1e−i2πkτj are random variables that are not observed. Nevertheless, it is supposed that the density g of the shifts is known. Therefore, one can compute the expectation γk of the random eigenvaluesγ˜k given by

γk:=Eγ˜k=E e−i2πkτ= Z +∞ −∞ e−i2πkxg(x)dx.

Hence, if we assume that the density gof the random shifts is known, estimation of the Fourier coefficients of f can be obtained by a deconvolution step of the form

ˆ θk=λk ˜ ck γk , (1.6)

where ˜ck is defined in (1.3) and λ= (λk)k∈Z is a filter whose choice will be discussed later on.

Theoretical properties and optimal choices for the filter λ are presented in the case where the coefficientsγk are known. Such a framework is commonly used in inverse problems such as (1.5)

to obtain consistency results and to study asymptotic rates of convergence, where it is generally supposed that the law of the additive error is Gaussian with zero mean and knownvariance σ2_, see e.g Cavalier et al. (2002). In model (1.1), the random shifts may be viewed as a second source of noise and for the theoretical analysis of this problem the law of this other random noise is also supposed to be known.

Recently, some papers have addressed the problem of regularization with partially known operator. For instance, Cavalier and Hengartner (2005) consider the case where the eigenvalues are unknown but independently observed. They deal with the model:

ck=γkθk+ǫξk, ˜γk=γk+σηk, ∀k∈N, (1.7)

where (ξk)k∈Nand(η_k)_k_∈N denote i.i.d standard gaussian variables. In this case, each coefficient θk can be estimated by γ˜−_k1ck. Similar models have been considered in Cavalier and Raimondo

(2007), Marteau (2006) or Marteau (2009). In a more general setting, we may refer to Efromovich and Koltchinskii (2001) and Hoffmann and Reiß (2008).

In this paper, our framework is slightly different in the sense that the operator is stochastic, but the regularization is operated using deterministic eigenvalues. Hence the approach followed in the previous papers is no directly applicable to model (1.1). We believe that estimatingf and deriving convergence rates in model (1.1) without the knowledge of g remains a difficult task, and this paper is a first step to address this issue.

1.3 Previous work in template estimation and shift recovery

The problem of estimating the common shape of a set of curves that differ by a time trans-formation is usually referred to as the curve registration problem, and it has received a lot of attention in the literature over the last two decades. Among the various methods that have been proposed, one can distinguish between landmark-based approaches which aim at aligning common structural points of the curves (typically locations of extrema) see e.g Gasser and Kneip (1992), Gasser and Kneip (1995), Bigot (2006), and nonparametric modeling of the warping functions to align a set of curves see e.g Ramsay and Li (2001), Wang and Gasser (1997), Liu

(5)

and Müller (2004). However, in these papers, studying consistent estimates of the common shape

f as the number of curves ntends to infinity is generally not considered.

In the simplest case of shifted curves, various approaches have been developed. Self-modelling regression methods proposed by Kneip and Gasser (1988) are semiparametric models where each observed curve is a parametric transformation of a common regression function. Such models are usually referred to as shape invariant models and estimation in this setting is usually done by iterating the following two steps: estimation of the parameters of the transformations (here the shifts) given a reference curve, and nonparametric estimation of a template by aligning the observed curves given a set of known transformation parameters. Kneip and Gasser (1988) studied the consistency of such a two steps procedure in an asymptotic framework where both the number of functionsn and the number of observed points per curves grows to infinity. Due to the asymptotic equivalence between the white noise model and nonparametric regression with an equi-spaced design (see Brown and Low (1996)), such an asymptotic framework in our setting would correspond to the case where both n tends to infinity and ǫ is let going to zero. In this paper we prefer to focus only on the case wherenmay be let going to infinity, and to leave fixed the level of additive noise in each observed curve.

Based on a model with curves observed at discrete time points, semiparametric estimation of the shifts and the shape function is proposed in Gamboa, Loubes and Maza (2007) and Vimond (2010) as the number of observations per curve grows, but with a fixed number n of curves. A generalization of this approach for the estimation of scaling, rotation and translation parameters for two-dimensional images is also proposed in Bigot, Gamboa and Vimond (2009), but also with a fixed number of observed images. Semiparametric and adaptive estimation of a shift parameter in the case of a single observed curve in a white noise model is also considered by Dalalyan, Golubev and Tsybakov (2006) and Dalalyan (2007). Estimation of a common shape for randomly shifted curves and asymptotic innis considered in Ronn (1998) from the point of view of semiparametric estimation when the parameter of interest is infinite dimensional.

However, in all the above cited papers rates of convergence or oracle inequalities for the estimation of the template are generally not studied. Moreover, our procedure differs from the approaches classically used in curve registration as our estimator is obtained in only one very simple step, and it is not based on an alternative scheme between estimation of the shifts and averaging of back-transformed curves given estimated values of the shifts parameters.

Finally, note that Castillo and Loubes (2009) and Isserles et al. (2008) consider a model similar to (1.1), but they rather focus on the the estimation of the density g of the shifts as n

tends to infinity. Using such an approach could be a good start for studying the estimation of the templatef without the knowledge ofg. However, we believe that this is far beyond the scope of this paper, and we prefer to leave this problem open for future work.

1.4 Organization of the paper

In Section 2, we consider an estimator of the shape function f using monotone filters when the eigenvalues γk are known. Based on the principle of unbiased risk minimization developed

by Cavalier et al. (2002), we propose a data-based choice for the filter λ in (1.6). Then, we derive an oracle inequality showing that the resulting estimator has a risk close to an ideal one when choosing λ over a class of monotone filters. In Section 3, as an example, we study the case of projection filters. This gives an estimator based on the Fourier transform of the curves with a data-based choice of the frequency cut-off. We study its asymptotic properties in terms of minimax rates of converge over Sobolev balls. Finally in Section 4, a detailed simulation study is proposed to illustrate the numerical properties of such estimators. All proofs are deferred to a technical section at the end of the paper.

(6)

2 Estimation of the common shape

In the following, we assume that the Fourier coefficients γk are known. In this situation it is

possible to choose a data-dependent filter λ⋆ _{that mimics the performances of an optimal filter}

λ0 called oracle that would be obtained if we knew the true template f. The performances of this filter are related to the performances of the filterλ0 via an oracle inequality. In this section, most of our results are non-asymptotic and are thus related to the approach proposed in Cavalier et al. (2002) to study standard statistical inverse problems via oracle inequalities.

2.1 Smoothness assumptions for the density g

In a deconvolution problem, it is well known that the difficulty of estimating f is quantified by the decay to zero of theγk’s as|k| →+∞. Depending how fast these Fourier coefficients tend

to zero as _|k_{| →}+_∞, the reconstruction of f will be more or less accurate. This phenomenon was systematically studied by Fan (1991) in the context of density deconvolution. In this paper, the following type of assumption on gis considered:

Assumption 2.1 The Fourier coefficients ofghave a polynomial decay i.e. for some realβ _≥0, there exists two constants Cmax ≥Cmin>0 such that for all k∈Z

Cmin|k|−β ≤ |γk| ≤Cmax|k|−β. (2.1)

2.2 Risk decomposition

Recall that an estimator of the θk’s is given by θˆk = λkγ−k1˜ck, k ∈ Z, see equation (1.6),

where λ = (λk)k∈Z is a real sequence. Examples of commonly used filters include projection

weights λk = 11_|k|≤j for some integer j, and the Tikhonov weights λk = 1/(1 + (|k|/ν2)ν1) for some parameters ν1 >0 and ν2 >0. Based on the θˆk’s, one can estimate the signal f using the

Fourier reconstruction formula

ˆ fλ(x) = X k∈Z ˆ θke−2ikπx.

The problem is then to choose the sequence (λk)k∈Z in an optimal way with respect to an

appropriate risk. For a given filter λ we use the classical ℓ2-norm to define the risk of the estimatorθˆ(λ) = (ˆθk)k∈Z R(θ, λ) =E_θ_k_θˆ₍_λ₎₋_θ_k2 2 =Eθ X k∈Z |θˆk−θk|2 (2.2)

Note that analyzing the above risk (2.2) is equivalent to analyze the mean integrated square risk R( ˆfλ, f) = Ekfˆλ−fk2 = ER₀1( ˆfλ(x)−f(x))2dx

. The following lemma gives the bias-variance decomposition ofR(λ, θ). A detailed proof can be found in Bigot and Gadat (2010).

Lemma 2.1 For any given nonrandom filter λ, the risk of the estimator θˆ(λ)can be decomposed as R(θ, λ) =X k∈Z (λk−1)2|θk|2 | {z } Bias + 1 n X k∈Z λ2_k ǫ 2 |γk|2 | {z } V1 +1 n X k∈Z λ2_k|θk|2 1 |γk|2 − 1 | {z } V2 (2.3)

For a fixed number of curvesnand a given shape functionf, the problem of choosing an optimal filter in a set of possible candidates is to find the best trade-off between low bias and low variance in the above expression. However, this decomposition does not correspond exactly to the classical bias-variance decomposition for linear inverse problems. Indeed, the variance term in (2.3) is the sum of two terms and differs from the classical expression of the variance for linear estimator in

(7)

statistical inverse problems. Using our notations, the classical variance term isV1 = ǫ 2 n X k∈Z λ2_k |γk|2

and appears in most of linear inverse problems. However, contrary to standard inverse problems, the variance term of the risk also depends on the Fourier coefficientsθkof the unknown function

f to recover. Indeed, our dataγ_k−1˜ck are noisy observations of θk:

γ_k−1˜ck=θk+ ˜ γk γk − 1 θk+ ǫ √ nγ −1 k ξk. (2.4)

Hence, using the sequence (γk)k∈N instead of (˜γ_k)_k_∈N introduces an additional error. This

ex-plains the presence of the second termV2.

A similar phenomenon occurs with the model (1.7), although it is more difficult to quantify it. Indeed, in this setting:

˜ γ_k−1ck=θk+ γk ˜ γk − 1 θk+ǫ˜γk−1ξk, ∀k∈N. (2.5)

Hence, we also observe an additional term depending onθ. This term is controlled using a Taylor expansion but the quadratic risk cannot be expressed in a simple form. Let us stress that the difficulty of studying problem (2.5), when compared to our estimator (2.4), comes from the fact that in (2.5) there is a random term in the denominator. We refer to Marteau (2009) for a discussion with some numerical simulation and to Cavalier and Hengartner (2005), Efromovich and Koltchinskii (2001), Hoffmann and Reiß (2008), Marteau (2006) and Cavalier and Raimondo (2007).

2.3 An oracle estimator and unbiased estimation of the risk (URE)

Suppose that one is given a set Λ of cardinalityN _≥1 of possible candidate filters, that is Λ = (λj₎

j∈{1,...,N}, with λj = (λjk)k∈Z, j = 1, . . . , N which satisfy some general conditions to

be discussed later on. In the case of projection filters, Λ can be for example the set of filters

λj_k =11_|_k_|≤_j, k_∈Z_for _j_{= 1}_{, . . . , N}_.

Given a set of filters Λ, the best estimator is defined as the filterλ0 _{(called oracle) which has} the smallest riskR(θ, λ) overΛ, that is

λ0 := arg min

λ∈ΛR(θ, λ). (2.6) This filter is an ideal one because it cannot be computed in practice as the sequence of coefficients

θ is unknown. However, the oracle λ0 _{can be used as a benchmark to evaluate the quality of} a data-dependent filter λ⋆ chosen in the set Λ. This is the main interpretation of the oracle inequality that we will develop in the next section.

2.4 Oracle inequalities for monotone filters

2.4.1 Definitions

First, let us introduce the following class of monotone filters: Λmon := ( λ= (λk)k∈Z:λ_k=λ₋_k, X k∈Z λ2_k <+∞, 1≥λ0 ≥λ1≥. . .≥λm ≥. . .≥0 ) ,

In practice, the filters λin the set Λ are such that λk = 0 (or vanishingly small) for all k large

enough. Hence, for such choices of filters, numerical minimization of criterions such as (2.6) is feasible, since it only involves the computation of finite sums. Let us thus define the following threshold m0 beyond which all values of the filters λinΛ vanish

m0 = inf k:|γk|2 ≤ log2n n −1. (2.7)

(8)

Then,Λis supposed to be a finite set of cardinalityN of monotone filtersλwhich satisfiesλk= 0

as soon as|k| ≥m0, that is

Assumption 2.2 For N ≥ 1, Λ = (λj)j∈{1,...,N} ⊂ ΛNmon with λjk = 0 for |k| ≥ m0 and j= 1, . . . , N.

The choice of the filter λ⋆ will be obtained by minimization of a data-based criterion whose derivation is guided by the unbiased risk estimate (URE) minimization principle developed by Cavalier et al. (2002). Typically, one cannot minimize such a criterion over filters (λk)k∈Z of

infinite length. Indeed, each coefficient θk is estimated by γk−1˜ck where γk = Eγ˜k. Hence, the

ratio γ_k−1γ˜k should be as close as possible to 1. Since γk→0ask→+∞and the variance ofγ˜k

is equal to _n1 + (1₋_n1)_|γk|2, it is clear that large values of kshould be discarded. Bounds similar

to (2.7) on the maximum number of non-vanishing values for the filters are used in papers related to partially known operator, see for instance Cavalier and Hengartner (2005) or Efromovich and Koltchinskii (2001). This bounds have to be carefully chosen but are not of first importance. In general, estimating the operator is easier than estimating the function f.

In this paper, we have chosen to present an adaptive estimator based on the URE principle. Given the finite familyΛ, our aim is to select the best possible filter among this family. We are aware that different adaptive schemes are available in the literature. For instance, the penalized blockwise Stein’s rule (see Marteau (2006) and references therein) provides a filter for the model (1.5) leading to an oracle inequality among all monotone filters. In some sense, the generalization of such kind of result to our model would be more powerful. Nevertheless, we think that our approach is also interesting in this setting since it does not impose a particular regularization scheme. Moreover, the differences between model (1.5) and model (1.3) are easier to underline with our method.

2.4.2 Adaptive regularization scheme

Let us now explain how to compute an estimator U(Y, λ) of the risk R(θ, λ). First, recall that Lemma 2.1 yields the following expression of the quadratic riskR(θ, λ)

R(θ, λ) = X k∈Z (1−λk)2|θk|2 | {z } :=Bias +ǫ 2 n X k∈Z λ2_k|γk|−2 | {z } :=V1 +1 n X k∈Z λ2_k|θk|2 1 |γk|2 − 1 | {z } :=V2 ,

and suppose that it is possible to construct an estimator Θˆ2_k of |θk|2 from the observations of

the shifted curves Y = (Yi)i=1...n. For any non-random filter λ satisfying Assumption 2.2, by

replacing _|θk|2 in (2.3) by Θˆ2_k, the above decomposition of the risk R(θ, λ) suggests to compute

a data-based criterion U(Y, λ) (depending only on (Y, λ)) of the form

U(Y, λ) := X |k|≤m0 (λ2_k−2λk) ˆΘ2k+ ǫ2 n X |k|≤m0 λ2_k|γk|−2+ 1 n X |k|≤m0 λ2_k|γk|−2Θˆ2k. (2.8)

The criterion U(Y, λ) is thus an approximation of R(θ, λ)_{− k}θ_k2

2. Then, for choosing a data-dependent filter λ⋆, the principle of URE, see Cavalier et al. (2002) for further details, simply suggests to minimize the criterion U(Y, λ) over λ_∈Λ. Following the principle of URE, a data-dependent choice of λwould thus be given by

λ⋆:= arg min

λ∈ΛU(Y, λ). (2.9) In the following, we useΘˆ2

k=γk−2 h |˜ck|2−ǫ 2 n i

as an estimator of_|θk|2. Remark thatEθΘˆ2_k6=|θk|2.

(9)

rather use the principle of minimization of a risk estimate. Nevertheless, we will prove that this bias can be controlled, and that it is in some sense negligible compared toR(θ, λ).

Note that in for the computation ofU(Y, λ), we have taken into account all the terms (Bias,V1 andV2) in the decomposition of the riskR(θ, λ). Unfortunately, when usingΘˆ2_k =γ_k−2

h

|c˜k|2−ǫ

2

n

i

as an estimator of _|θk|2, minimization of such a criterion does not lead to satisfactory results.

This is due the term _n1P_k_∈Zλ2k|γk|−4|

n

|c˜k|2− ǫ

2

n

o

in (2.8) which is an estimation of the variance term V2 in the decomposition (2.3) of the risk R(θ, λ). The main issue is that the study of this term requires a control of |γk|−4, and not only |γk|−2 as for the study of the classical variance

term V1 = ǫ

2

n

P

k∈Zλ2_k|γk|−2 in standard inverse problem. Nevertheless, by definition (2.7), one

has that log_n2(n)γ_k−2 _≤ 1 for all _|k_{| ≤} m0. Therefore, this suggests to rather consider filters minimizing a criterion of the form

U1(Y, λ) := X |k|≤m0 (λ2_k₋2λk)|γk|−2 |c˜k|2− ǫ2 n +ǫ 2 n X |k|≤m0 λ2_k_|γk|−2 +log 2₍_n₎ n X |k|≤m0 λ2_k|γk|−4 |˜ck|2− ǫ2 n . (2.10)

Alternatively, following Cavalier and Hengartner (2005), it is sometimes possible to neglect the error generated by the use of an approximation of the unknown random eigenvalues ˜γk by

γk which yet corresponds to the termV2. Indeed, remark that one may findρ >0such that V2≤ 1 n X k∈Z λ2_k_|θk|2 |γk|2 ≤ 1 nkθk 2_sup k∈Z λ2_k_|γk|−2≤ρkθk2 1 n X k∈Z λ2_k_|γk|−2=ρkθk2 V1 ǫ2.

Hence, depending on the values of ρ, ǫ2 and _kθ_k2, the variance term V2 may be negligible compared to V1. In this case, one could rather consider the following criterion U0(Y, λ) derived from the decomposition on the classic quadratic risk (i.e. Bias +V1), and defined as

U0(Y, λ) := X |k|≤m0 (λ2_k₋2λk)|γk|−2 |˜ck|2− ǫ2 n +ǫ 2 n X |k|≤m0 λ2_k_|γk|−2. (2.11)

In the sequel, we summarize these two approaches by considering the more general criterion

Uα(Y, λ)given by Uα(Y, λ) := X k∈Z (λ2_k₋2λk)|γk|−2 |˜ck|2− ǫ2 n +ǫ 2 n X k∈Z λ2_k_|γk|−2+α log2n n X k∈Z λ2_k_|γk|−4 |˜ck|2− ǫ2 n . (2.12) where 0 ≤ α ≤ 1 is a parameter to be discussed. All the following results of the paper are given for any value of the parameter α in[0,1]. Following the URE principle, we will study the theoretical properties of the filters λ⋆

α ∈Λ defined as

λ⋆_α = arg min

λ∈ΛUα(Y, λ). (2.13) for 0 ≤ α ≤ 1. Note that Uα(Y, λ) can be written as a penalized version of the empirical risk

U(Y, λ) defined in (2.8). Choosing the regularization parameter α is a data-driven way is a delicate problem in nonparametric statistics. Nevertheless, the following heuristic arguments can be given for a suitable choice of α. The presence of the additional penalized term V2 is due to the variability along the time axis (random translation) of the template f. When ǫis small compared to kθk2, the white noise deconvolution may be considered as negligible comparing to the alignment issue of the observed curves. the mean error will be larger when the signal to reconstruct possesses a large number of modes. Thus, in a framework with a smallǫand a large

(10)

kθk2, it may be reasonable to chooseα 6= 0. To the contrary, if the level of noiseǫis large, the model (1.1) can certainly be considered as being close to the standard white noise deconvolution problem. In this setting, setting α = 0 may be recommended. Moreover, an optimal choice of

α is certainly related to the number of observed curves n. The problem of choosing α is thus discussed in detail in Section 4 on numerical experiments.

2.4.3 Sharp estimator of the oracle risk

We are now able to propose an adaptive estimator of θ. In the following, α will belong to [0,1]and we denote by θ_α⋆ the estimator related to the filters λ⋆_α defined in (2.13) that is

θ⋆_k,α= ˜ck

γk

λ⋆_k,α for θ⋆_α= (θ_k⋆)k∈Z and λ⋆_α = (λ⋆_k,α)_k_∈Z. (2.14)

To simplify the notations, we omit the dependency of θ⋆_α and λ⋆_α on α, and write θ⋆ =θ_α⋆ and

λ⋆ =λ⋆_α. Through a simple oracle inequality, the next theorem relates the performances ofθ⋆ to the ideal filterλ0 _{minimizing the risk}_R₍_{θ, λ}₎ _over_λ_∈_Λ_{. We denote by} _L

Λthe term introduced in Cavalieret al. (2002) which in some sense measure the complexity of the familyΛ. The proof of the theorem and a complete definition of LΛ are given in the Appendix.

Theorem 2.1 Suppose that Assumption 2.2 holds and that the density g satisfies Assumption 2.1. Let θ⋆ _{defined by (2.14). Then, there exists} ₀_{< γ}

1 <1 such that, for all 0< γ < γ1,

E_θ_k_θ⋆₋_θ_k2 _≤ _{(1 +}_h₁₍_{γ, n}_{)) inf} λ∈Λ " R(θ, λ) +αlog 2_n n X k∈Z λ2_k|γk|−2|θk|2 # + Γ(θ) (2.15) +C1 ǫ2 nLΛω kθk 2_L Λγ−1+C2 log2n n ω (1−α)kθk 2_log2₍_n₎_γ−1₊_Ce−γ2_log1+τ_n ,

whereh1(γ, n)→0 asγ →0 andn→+∞,C1, C2 andτ >0 are suitable constants independent

of n, Γ(θ) = X |k|>m0 ǫ2_{λ0_k_}2_|γk|−2+ (1−(1−λ0k)2)θk2 , and ω(x) = max λ∈Λ sup_k λ 2 k|γk|−2118 > < > : X i λ2_i|γi|−2≤xsup i λ2_i|γi|−2 9 > = > ; ∀x >0.

Theorem 2.1 proves that the quadratic risk is comparable to the risk of the oracle up to some residual terms. Before explaining these terms, just a few words on the quantities in the infimum. First, if α = 0, then E_θ_k_θ⋆₋_θ_k2 _{is comparable to} _R₍_{θ, λ}0₎ _{but the price to pay is a} residual term of order log2(n)/n. In the case where α = 1, we reach the quadratic risk up to a log term. This lack of precision can be explained by the processes involved inU1(Y, λ), which are hardly controllable due to the dependency between the ˜γk. Previously, we have only given some

heuristic arguments on the wayα could be chosen. Theorem 2.1 presents results for all possible values ofαbetween 0and1. Therefore, the above theorem can give some hints on how choosing

α. However, let us recall that the choice of α is strongly related to the choice of a good penalty in our criteria. This is a classical issue in many statistical problems, but finding a data-based value for a regularization parameter is a delicate problem.

The function ω was initially introduced in Cavalier et al. (2002). Under Assumption 2.1, it is of order x2β _{for many kind of filters (spectral cut-off, Tikhonov, Landweber, etc...). Hence,}

the two terms of (2.15) depending on ω are respectively of orderǫ2/nand log2(n)/n. They can be reasonably considered as negligible compared to R(θ, λ0)in many situations (see for instance Section 3 bellow). The same remark hold for the term Ce−γ2log1+τn, which tends to 0 faster than n−1_.

(11)

We conclude this discussion with the term Γ(θ). This term measures the error associated to the truncation of the estimation at the orderm0. Consider for instance the particular case of a projection (or spectral cut-off) family: λj_k =1_{|k|≤j} for all j = 1, . . . , N. Denote by λj0 = λ

0 the oracle filter. Then,Γ(θ) = 0as soon as the oracle bandwidth j0 is smaller thanm0. In some sense, the control of(˜γk)k∈Z is easier than the estimation of (θ_k)_k_∈Z (no inversion to perform).

Hence, in many cases, Γ(θ) = 0. A similar discussion holds for other kind of filters.

3 Minimax rates of convergence for Sobolev balls

Let us now study the special case of projection filters. In this section, we prove that such estimators attain the minimax rate of convergence on many functional spaces. In particular, the termlog2(n)added in (2.12) to control the estimation of the variance termV2, and the maximal bandwidth m0 (2.7) have no influence on the performances of our estimator from a minimax point of view.

Let1_≤p, q_{≤ ∞}andA >0, and suppose that f belongs to a Besov ball_Bs

p,q(A)of radius A

(see e.g. Donoho, Johnstone, Kerkyacharian and Picard (1995) for a precise definition of Besov spaces). Bigot and Gadat (2010) have derived the following asymptotic minimax lower bound for the quadratic risk over a large class of Besov balls.

Theorem 3.1 Let 1 _≤p, q _{≤ ∞} and A >0, let p′ ₌_p_∧₂ _{and assume that:} _f _{∈ B}s

p,q(A) and

s≥p′ (Regularity condition on f), g satisfies the polynomial decreasing condition (2.1) at rate

β on its Fourier coefficients (Regularity condition ong),s_≥(2β+ 1)(1/p₋1/2) ands_≥2β+ 1 (Dense case). Then, there exists a universal constant M1 depending onA, s, p, q such that

inf ˆ fn sup f∈Bs p,q(A) E_k_fˆ_n₋_f_k2_≥_M₁_n2s+2β+1−2s _, _as _n→ ∞_,

where fˆn∈L2per([0,1]) denotes any estimator of the common shape f, i.e a measurable function

of the random processes Yj, j= 1, . . . , n

Therefore, Theorem 3.1 extends the lower bound n2s+2β+1−2s _{usually obtained in a classical} deconvolution model to the more complicated model of deconvolution with a random operator derived from equation (1.1). Then, let us introduce the following smoothness class of functions which can be identified with a periodic Sobolev ball:

Hs(A) = ( f _∈L2_per([0,1]) ;X k∈Z (1 +_|k_|2s)_|θk|2 ≤A ) ,

for some constant A >0 and some smoothness parameter s >0, where θk =

R1

0 e−2ikπxf(x)dx. It is known (see e.g. Donohoet al. (1995)) that ifsis not an integer thenHs(A)can be identified

with a Besov ball_Bs

2,2(A′).

Let Λ = (λj)j∈{1,...,N}, with λjk = 11|k|≤j, k ∈ Z for j = 1, . . . , N and N ≤ m0 be a set of projection filters. In this case, the decomposition of the quadratic risk for the filter λj _∈Λis

R(θ, λj) = X |k|≥j |θk|2+ ǫ2 n X |k|≤j |γk|−2+ 1 n X |k|≤j |θk|2 1 |γk|2 − 1 ,

Assuming that s _≥ 2β + 1 and f _∈ Hs(A), then the classical choice λ⋆_k = 1k≤j⋆ where j⋆ ∼

n2s+2β+11 _{yields that}

R(θ, λ⋆)_∼ inf

λ∈ΛR(θ, λ)∼n

−2s 2s+2β+1,

provided thatj⋆ ≤m0. It can be checked that the choice (2.7) implies that m0 ∼n

1

2β _{and thus} for a sufficiently large n,the condition j⋆ _≤ _m

0 is satisfied since n

1

2s+2β+1 << n 1

(12)

lower bound obtained in Theorem 3.1 we conclude that the quadratic risk infλ∈ΛR(θ, λ)decays asymptotically at the optimal (in the minimax sense) rate of convergence:

sup f∈Hs(A) inf λ∈ΛR(θ, λ)∼_f_∈sup_H_s₍_A₎λinf∈ΛR(θ, λ)∼n −2s 2s+2β+1_.

Now, remark that for the estimator θ_α⋆ defined by (2.14), Theorems 2.1 yields that E_θ_k_θ_α⋆ ₋_θ_k2 ₌_O inf λ∈ΛR(θ, λ) asn_→+_∞, for any 0_≤α_≤1,

since it can be checked that in the case of projection filters, the additional terms in the upper bound (2.15) are of the order_O(_n11−ζ)for a sufficiently small positiveζ. Thus, for any0≤α≤1, the performances of the estimator θ_α⋆ is asymptotically optimal from the minimax convergence point of view.

4 Numerical experiments

The goal of this section on numerical experiments is to study the influence of the regularization parameter α used in the definition (2.12) of the criterion Uα(Y, λ). For sake of simplicity, we

study the case of projection filters Λ = (λj₎

j∈{1,...,N}, with λjk =11|k|≤j, k ∈ Z for j = 1, . . . , N

and N = m0 even if our experiments could be extended to more complex filters. In this case the choice of a filter amounts to choose a frequency cut-off level 1 ≤ j ≤ m0. For λj ∈ Λ and

0_≤α_≤1, the criterion to minimize over 1_≤j_≤m0 is Uα(Y, j) :=− X |k|≤j |γk|−2 |c˜k|2− ǫ2 n +ǫ 2 n X |k|≤j |γk|−2+α log2n n X |k|≤j |γk|−4 |˜ck|2− ǫ2 n .

For the mean pattern f to recover, we consider the three test functions shown in Figure 1. Then, for each test function, we simulate n= 20 randomly shifted curves with shifts following a Laplace distributiong(x) = √1

2σexp

−√2|x_σ|withσ = 0.1. Gaussian noise is then added to each curve. The level of the additive Gaussian noise is measured as the root of the signal-to-noise ratio (rsnr) defined as rsnr= ! R1 0(f(x)−f¯)2dx ǫ2 %1/2 where f¯= Z 1 0 f(x)dx.

A sub-sample of 10 curves forrsnr= 7is shown in Figure 1 for each test function. The Fourier coefficients of the density g are given by γk = ₁₊₂_σ12_π2_k2 which corresponds to a degree of ill-posednessβ = 2. The condition (2.7) leads to the choicem0 = 32. An example of estimation by spectral cut-off by minimizing the criterion Uα(Y, j) with α = 0 is displayed in Figure 1. One

can see that the obtained estimators are rather oscillatory suggesting that the selected frequency cut-off is somewhat too large when takingα= 0.

These results illustrate the problem of choosing the value of α. To better understand the influence of this parameter, we present a short simulation study. The factors are the number of curves n = 20,50,100 and the signal-to-noise ratio rsnr = 3,7. For each combination of these two factors, we generate m = 1, . . . , M (with M = 100) independent replications of the above described simulations. For each replication m we compute the estimator θ∗_α,m for α ranging on a fine grid of [0,1]. Then, since the templatef and its Fourier coefficients θare known, one can compute for each value ofα the following empirical mean squared error (MSE)

M SE(α) = 1 M M X m=1 kθ∗_α,m₋θ_k2₂.

(13)

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 (a) 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 (b) 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0.2 0.25 0.3 0.35 0.4 0.45 0.5 0.55 0.6 0.65 0.7 (c) 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 −0.2 0 0.2 0.4 0.6 0.8 1 1.2 (d) 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 (e) 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 (f)

Figure 1: Test functions and an example of randomly shifted curves. First line: (a) Bumps, (b) Sine, (c) Blocks. Second line: sample of 10 curves out ofn= 20for each test function.

For each test function and each combination of the factors, we display in Figure 3 the curve

α _→ M SE(α). The value α∗ _minimizing _α _→ _{M SE}₍_α₎ _{depends on the template to recover.}

These simulations show thatα∗ tends to be smaller as the numbernof curves grows. The value of α∗ is also closer to zero when the signal-to-noise ratio decreases (which corresponds to high values of ǫ). This confirms the heuristic arguments developed in Section 2. If the level of noise

ǫis large compared tokθk2

2 (case of a low signal-to-noise ratio), then the model (1.1) is close to the standard white noise deconvolution problem. In this case, settingα= 0leads to satisfactory results which corresponds to taking the classical decomposition of the risk in standard inverse problems to do the estimation.

To conclude this section, let us consider the estimation by spectral cut-off by minimizing the criterion Uα(Y, j) with α = α∗ in the case n = 20 and rsnr = 7. This example is displayed

in Figure 1. One can see that the obtained estimators are much smoother than those obtained with the choice α = 0. This confirms the importance of the choice of α. However, finding a data-based value forα is clearly challenging and is an interesting topic for future work.

Appendix

This Appendix is divided in two parts. In the first part, we detail the scheme used for the proof of Theorem 2.1. The second part contains some technical lemmas. Throughout the proof,

C denote a generic positive constant whose value may change from line to line. We provide first some short definitions which will be used in the sequel. In some sense, these terms measure the complexity associated to the set of filtersΛ using the notations in Cavalieret al. (2002).

Definition 4.1 For each λ∈Λ, define

ρ(λ) = sup |k|≤m0 |γk|−2λk qP |i|≤m0|γi| −4_λ4 i , ρΛ= max λ∈Λρ(λ),

(14)

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 −0.4 −0.2 0 0.2 0.4 0.6 0.8 1 1.2 (a)α= 0 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 (b)α= 0 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 −0.2 0 0.2 0.4 0.6 0.8 1 1.2 (c) α= 0 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 −0.2 0 0.2 0.4 0.6 0.8 1 1.2 (d)α_∗= 0.1212 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 (e)α_∗= 0.1919 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 (f)α_∗= 0.0808

Figure 2: An example of template estimation (n= 20andrsnr= 7) with α= 0and α=α_∗ for each test function.

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 2 4 6 8 10 12 14 x 10−3 n=20 n=50 n=100 (a) Bumps,rsnr= 7 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 2 3 4 5 6 7 8 9 10 x 10−3 n=20 n=50 n=100 (b) Sine,rsnr= 7 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0.004 0.006 0.008 0.01 0.012 0.014 0.016 0.018 0.02 0.022 n=20 n=50 n=100 (c) Blocks,rsnr= 7 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 2 4 6 8 10 12 14 x 10−3 n=20 n=50 n=100 (d) Bumps,rsnr= 3 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 2 3 4 5 6 7 8 9 10 11 x 10−3 n=20 n=50 n=100 (e) Sine,rsnr= 3 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0 0.005 0.01 0.015 0.02 0.025 n=20 n=50 n=100 (f) Blocks,rsnr= 3

Figure 3: Empirical Mean Square Error (MSE) over M = 100 simulations as a function of

α ∈[0,1] for n= 20,50,100 and rsnr= 7,3 for each test function: Bumps (first column), Sine (second Column), Blocks (third column).

S= max λ∈Λ _|_isup_|≤_m₀|γi| −2_λ2 i min λ∈Λ_|_isup_|≤_m₀|γi| −2_λ2 i , M =X λ∈Λ

(15)

For a brief discussion on these quantities, we refer to Cavalier et al. (2002). For all λ∈ Λ, we also introduce Rα(θ, λ)as Rα(θ, λ) = X |k|≤m0 (1−λk)2θ2k+ X |k|≤m0 λ2_k|γk|−2+α log2(n) n X |k|≤m0 λ2_k|θk|2|γk|−2+ X |k|>m0 |θk|2, (4.2)

which corresponds to an approximation of the quadratic risk. 4.1 Proof of Theorem 2.1

The proof uses the following scheme. The first step consists in computing the quadratic risk of θ⋆ _{and proving that it is close to} _R

α(θ, λ⋆). The aim of the second part is to show that

Uα(Y, λ⋆) is close to Rα(θ, λ⋆), even for a random filterλ⋆. Then, we use the fact thatλ⋆

min-imizes the criterion Uα(Y, λ⋆) over the filters in Λ and we compute the expectation of Uα(Y, λ)

for all deterministic λin order to obtain an oracle inequality. Step 1: remark that

E_θ_k_θ⋆₋_θ_k2 ₌ E_θX k∈Z |θ⋆_k−θk|2, = E_θ X |k|≤m0 |λ⋆_kγ_k−1c˜k−θk|2+ X |k|>m0 |θk|2, = E_θ X |k|≤m0 λ⋆_k˜γk γk − 1 θk+λ⋆kγ−k1 ǫ √ nξk 2 + X |k|>m0 |θk|2, = E_θ X |k|≤m0 λ ⋆ k ˜ γk γk − 1 2 |θk|2+ ǫ2 nEθ X |k|≤m0 {λ⋆_k_}2_|ξk|2|γk|−2+ X |k|>m0 |θk|2 +2E_θ X |k|≤m0 ǫ √ nRe (λ⋆_kγ_k−1γ˜k−1)θk×λ⋆k¯γk−1ξ¯k ,

where for a given z _∈ C_, _Re₍_z₎ _{denotes the real part of} _z _and _z_¯ _{the conjugate of} _z_{. In the} following, we denote byR˜(θ, λ) the commonly used risk in inverse problems, i.e.

˜ R(θ, λ) := X |k|≤m0 1−λk ˜ γk γk |θk| 2₊ǫ2 n X |k|≤m0 λ2_k|γk|−2+ X |k|>m0 |θk|2, ∀λ∈Λ.

ThenE_θ_k_θ⋆₋_θ_k2 _{can be rewritten as} E_θ_k_θ⋆₋_θ_k2 ₌ E_θ_R˜₍_{θ, λ}⋆_{) +}ǫ 2 nEθ X |k|≤m0 |γk|−2{λ⋆k}2(|ξk|2−1) +2E_θ X |k|≤m0 ǫ √ nRe (λ ⋆ kγk−1γ˜k−1)θk×λ⋆kγ¯−k1ξ¯k , = E_θ_R˜₍_{θ, λ}⋆_{) +}_A₁₊_A₂_. _(4.3) In order to bound A1, we follow the notations of Cavalieret al. (2002). Let us define

∆(λ) =LΛ ǫ2 n _|_ksup_|≤_m₀λ 2 k|γk|−2 and ¯∆(λ) = log2(n) n _|_ksup_|≤_m₀λ 2 k|γk|−2 for all λ∈Λ, (4.4)

where LΛ has been introduced in (4.1). Then, we apply the inequality (32) of Cavalier et al. (2002): there exists a universal constant C such that for anyγ >0

A1 := ǫ2 nEθ X |k|≤m0 |γk|−2{λ⋆k}2(|ξk|2−1)≤γ ǫ2 nEθ X |k|≤m0 {λ⋆_k_}2_|γk|−2+Cγ−1Eθ∆(λ⋆). (4.5)

(16)

Now, consider a bound for A2 defined as A2:= 2Eθ X |k|≤m0 ǫ √ nRe (λ ⋆ kγ−k1˜γk−1)θk×λ⋆k¯γk−1ξ¯k .

We apply inequality(31) of Cavalier et al. (2002) to obtain for anyγ >0

A2 ≤γEθ

X

|k|≤m0

1−λ⋆_kγ˜_kγ_k−12|θ_k|2+Cγ−1E_θ_∆(_λ⋆₎_. _(4.6) Now, for allγ >0, inequalities (4.3), (4.5) and (4.6) yield

E_θ_k_θ⋆₋_θ_k2 _≤_{(1 +}_γ₎E_θ_R˜₍_{θ, λ}⋆_{) +}_Cγ−1E_θ_∆(_λ⋆₎_. _(4.7) for some positive constant C. At last, we show that R˜(θ, λ⋆) is close to Rα(θ, λ⋆) defined in

(4.2). Remark that E_θ_R˜₍_{θ, λ}⋆₎ = E_θ X |k|≤m0 (1−λ⋆_k)2|θk|2+ ǫ2 nEθ X |k|≤m0 {λ⋆_k}2|γk|−2+Eθ X |k|>m0 |θk|2 +E_θ X |k|≤m0 |1₋λ⋆_kγ˜kγ_k−1|2−(1−λ⋆k)2 |θk|2, = E_θ_R₀₍_{θ, λ}⋆_{) +}E_θ X |k|≤m0 |θk|2{λ⋆k}2 ˜ γk γk − 1 2 | {z } :=B1 +2E_θ X |k|≤m0 λ⋆_k(1₋λ⋆_k)Re 1₋ ˜γk γk |θk|2. | {z } :=B2

First, we apply the Lemma 4.1 withK =γ in order to boundB1. We obtain B1 ≤γ log2n n Eθ X |k|≤m0 {λ⋆_k}2|γk|−2|θk|2+Ce−γlog 1+τ₍_n₎ ,

for someτ >0. ConcerningB2, we use the inequality2ab≤γa+γ−1bfor allγ >0and Lemma 4.1 withK =γ2 in order to obtain

B2 = 2 Eθ X |k|≤m0 λ_k⋆(1₋λ⋆_k)Re 1₋γ_k−1γ˜k|θk|2, ≤ γE_θ X |k|≤m0 (1−λ⋆_k)2|θk|2+γ−1Eθ X |k|≤m0 {λ⋆_k}2|θk|2|γk|−2|γk−γ˜k|2, ≤ γE_θ X |k|≤m0 (1₋λ⋆_k)2_|θk|2+γ log2(n) n Eθ X |k|≤m0 {λ⋆_k_}2_|θk|2|γk|−2+Ce−γ 2_log1+τ₍_n₎ .

Therefore, it follows that

E_θ_R˜₍_{θ, λ}⋆₎ _≤ _{(1 +}_γ₎E_θ_R₀₍_{θ, λ}⋆_{) + 2}_γlog 2₍_n₎ n Eθ X |k|≤m0 {λ⋆_k}2|θk|2|γk|−2+Ce−γ 2_log1+τ₍_n₎ , ≤ (1 + 2γ)E_θ_R_α₍_{θ, λ}⋆_{) + (1}₋_α₎_γ_k_θ_k2E_θ_∆(¯ _λ⋆_{) +}_Ce−γ2log1+τ(n)_. Using (4.7), we get E_θ_k_θ⋆₋_θ_k2 _≤_{(1 + 2}_γ₎2E_θ_R_α₍_{θ, λ}⋆_{) +}_Cγ−1E_θ_∆(_λ⋆_{) +}_C₍₁₋_α₎_γ_k_θ_k2E_θ_∆(¯ _λ⋆_{) +}_Ce−γ2log1+τ(n)_. (4.8)

(17)

This concludes the Step 1.

Step 2: First, we write Uα(Y, λ⋆)in terms of Rα(θ, λ⋆). Remark that

Uα(Y, λ⋆) = X |k|≤m0 ({λ⋆_k}2−2λ⋆_k)|γk|−2 |˜ck|2− ǫ2 n + ǫ 2 n X |k|≤m0 {λ⋆_k}2|γk|−2 +αlog 2₍_n₎ n X |k|≤m0 {λ⋆_k_}2_|γk|−4 |˜ck| − ǫ2 n , = Rα(θ, λ⋆) + X |k|≤m0 (_{λ⋆_k_}2₋2λ⋆_k)_|γk|−2 |c˜k|2− ǫ2 n −(1₋λ⋆_k)2θ_k2 − X |k|≥m0 |θk|2+α log2(n) n X |k|≤m0 {λ⋆_k_}2 |γk|−4 |c˜k| − ǫ2 n − |γk|−2|θk|2 . (4.9)

Recall that for all k∈N

|˜ck|2 =|θk˜γk|2+ ǫ2 n|ξk| 2_{+ 2}_√ǫ nRe(θkγ˜kξ¯k), and |γk|−2|c˜k|2=|θk|2 ˜ γk γk 2 + ǫ 2 n|γk| −2_|_ξ k|2+ 2 ǫ √ n|γk| −2_Re₍_θ k˜γkξ¯k).

Hence, equality (4.9) can be rewritten as

Rα(θ, λ⋆) = Uα(Y, λ⋆) +kθk2+ X |k|≤m0 (2λ⋆_k_{− {}λ⋆_k_}2)θ_k2 ! ˜ γk γk 2 −1 % +ǫ 2 n X |k|≤m0 (2λ⋆_k_{− {}λ⋆_k_}2)_|γk|−2(|ξk|2−1) +2√ǫ n X |k|≤m0 (2λ⋆_k− {λ⋆_k}2)|γk|−2Re(˜γkθkξ¯k) +αlog 2₍_n₎ n X |k|≤m0 {λ⋆_k}2 |γk|−2|θk|2− |γk|−4 |c˜k|2− ǫ2 n , = Uα(Y, λ⋆) +kθk2+C1+C2+C3+C4. (4.10) First consider the bound of C1. Thanks to Lemma 4.2

C1 := X |k|≤m0 (2λ⋆_k_{− {}λ⋆_k_}2)θ_k2 ! ˜ γk γk 2 −1 % , ≤ γE_θ_R_α₍_{θ, λ}⋆_{) +} γ+ γ− 1 log2(n) Rα(θ, λ0) + (1−α)γkθk2Eθ∆(λ⋆) +(1−α)γ−1kθk2∆(¯ λ0) +Ce−γ2log1+τ(n).

Concerning C2, we use the inequality (36) of Cavalieret al. (2002). We get C2 := ǫ2 n X |k|≤m0 (2λ⋆_k_{− {}λ⋆_k_}2)_|γk|−2(|ξk|2−1), ≤ 2γǫ 2 n X |k|≤m0 {λ⋆_k_}2_|γk|−2+Cγ−1Eθ∆(λ⋆). (4.11)

(18)

Then, using Lemma 4.3 C3 := ǫ √ n X |k|≤m0 (2λ⋆_k_{− {}λ⋆_k_}2)_|γk|−2Re(˜γkθkξ¯k) ≤ 3γE_θ_R₍_{θ, λ}⋆_{) + 2}_γR₍_λ0_{, θ}_{) +}_Cγ−1E_θ_∆(_λ⋆_{) +}_Cγ−1E_θ_∆(_λ0₎_. _(4.12) We are now interested in C4, the last residual term of (4.10). Thanks to the definition of˜ck:

C4 := log2n n Eθ X |k|≤m0 {λ⋆_k_}2_|γk|−2 −|γk|−2|˜ck|2+ ǫ2 n|γk|− 2₊_|_θ k|2 = log 2_n n Eθ X |k|≤m0 {λ⋆_k_}2_|γk|−2|θk|2 ! 1₋ ˜ γk γk 2% +ǫ 2 n log2n n Eθ X |k|≤m0 {λ⋆_k_}2_|γk|−4(1− |ξk|2) −2log 2_n n ǫ √ nEθ X |k|≤m0 {λ⋆_k}2|γk|−4Re(θkγ˜kξ¯k), ≤ DγE_θ_R_α₍_{θ, λ}⋆_{) +}_DγR_α₍_{θ, λ}₀₎ +(1−α)γkθk2E_θ_∆(_λ⋆_{) + (1}₋_α₎_γ−1_k_θ_k2_∆(¯ _λ0_{) +}_Ce−log1+τ(n)_, _(4.13) for some D, C >0 independent ofǫ and n. Indeed, we can use essentially the same algebra as for the bound of the terms C1, C2 and C3 and the inequality

|γk|−2≤

n

log2n, ∀k≤m0.

Now, we are interested in the terms∆(.) and∆(¯ .)introduced in (4.4). For all λ_∈Λ andx >0, we have sup |k|≤m0 λ2_k|γk|−2 ≤ 1 x X |k|≤m0 λ2_k|γk|−2+ sup |k|≤m0 λ2_k|γk|−21n xsup_|k|≤m0λ 2 k|γk|−2≥P|k|≤m0λ 2 k|γk|−2 o ≤ 1 x X |k|≤m0 λ2_k|γk|−2+ω(x), (4.14)

where the function ω is introduced in Theorem 2.1. Hence, using (4.10)-(4.14) with a suitable choice forx, (1₋Dγ)E_θ_R_α₍_{θ, λ}⋆₎ _≤ E_θ_U_α₍_{Y, λ}⋆_{) +}_k_θ_k2₊_D γ+ γ− 1 log2(n) Rα(θ, λ0) +Ce−log 1+τ₍_n₎ +C1 ǫ2 nLΛω (1−α)kθk 2_L Λγ−1+C2 log2n n ω (1−α)kθk 2_log2₍_n₎_γ−1

Step 3: From the definition of λ⋆, we immediately get (1₋Dγ)E_θ_R_α₍_{θ, λ}⋆₎ _≤ E_θ_U_α₍_{Y, λ}0_{) +}_k_θ_k2₊_D γ+ γ− 1 log2(n) Rα(θ, λ0) +Ce−log 1+τ₍_n₎ +C1 ǫ2 nLΛω (1−α)kθk 2_L Λγ−1+C2 log2n n ω (1−α)kθk 2_log2₍_n₎_γ−1_,

where λ0 denotes the oracle bandwidth. Step 4: Using that for all |k| ≤m0,Eθ

ˆ Θ2_k= |θk|2 1− 1 n+nγ12 k ≤ |θk|2 1− 1 n +log21₍_n₎ it follows that E_θ_U_α₍_{Y, λ}0₎_≤ 1 + 1 log2(n) Rα(θ, λ0)− kθk2 .

(19)

4.2 Technical Lemmas

Lemma 4.1 For all K >0, we have E_θ X |k|≤m0 {λ⋆_k_}2 ˜ γk γk − 1 2 |θk|2 ≤K log2(n) n Eθ X |k|≤m0 {λ⋆_k_}2_|γk|−2|θk|2+Ce−Klog 1+τ₍_n₎ ,

where C, τ denote positive constants independent of ǫand n.

PROOF. Let Q >0 a deterministic term which will be chosen later. E_θ X |k|≤m0 {λ⋆_k_}2 ˜ γk γk − 1 2 |θk|2 =Eθ X |k|≤m0 {λ⋆_k_}2_|θk|2|γk|−2|˜γk−γk|2, ≤ QE_θ X |k|≤m0 {λ⋆_k}2|θk|2|γk|−2+Eθ X |k|≤m0 {λ⋆_k}2|θk|2|γk|−2 |˜γk−γk|2−Q 11_{|˜γk−γk|2>Q}. Thanks to (2.7) and the monotonicity of λ, we have

E_θ X |k|≤m0 {λ⋆_k}2|θk|2|γk|−2 |˜γk−γk|2−Q 11{|˜γk−γk|2>Q} ≤ C n log2(n) X |k|≤m0 |θk|2Eθ |˜γk−γk|2−Q 11_{|˜γk−γk|2>Q}. For all _|k_{| ≤}m0, using an integration by part

E_θ_|_˜_γ_k₋_γ_k_|2₋_Q₁₁

{|γ˜k−γk|2>Q}=

Z +∞

Q

P(_|γ˜k−γk|2 ≥x)dx.

Letx≥Q. A Bernstein type inequality provides

P(|˜γk−γk|2 ≥x) = P ! 1 n n X l=1 n e−2iπkτl₋_E_[_e−2iπkτl_] o ≥ √ x % , ≤ 2 exp − (n √ x)2 2Pn_l₌₁Var(e−2iπkτl) +n√x/3 , ≤ 2 exp − n 2_x 2n+n√x/3 .

Hence, for all|k| ≤m0,

E_θ_|_˜_γ_k₋_γ_k_|2₋_Q₁₁ {|γ˜k−γk|2>Q} ≤ Z +∞ Q exp − nx 2 +√x/3 dx, ≤ Z 36 Q expn−nx 4 o dx+ Z +∞ 36 exp−Cn√x dx≤ C ne −Qn/4_,

where C denotes a positive constant independent of Q. Let K > 0. Choosing for instance

Q=n−1Klog2(n), we obtain E_θ X |k|≤m0 {λ⋆_k_}2 ˜ γk γk − 1 2 |θk|2 ≤ K log2(n) n Eθ X |k|≤m0 {λ⋆_k_}2_|γk|−2|θk|2+C nm0 log2(n)e −Klog2 (n)/4_, ≤ Klog 2₍_n₎ n Eθ X |k|≤|m0| {λ⋆_k_}2_|γk|−2|θk|2+Ce−Klog 1+τ₍_n₎ ,

(20)

Lemma 4.2 Let λ⋆ _{defined in (2.13). For all deterministic filter} _λ_and ₀_{< γ <}₁_{, we have}

E_θ X |k|≤m0 |θk|2{λ⋆k}2 ! ˜ γk γk 2 −1 % ≤ γE_θ_R_α₍_{θ, λ}⋆_{) +} γ+ γ− 1 log2(n) Rα(θ, λ0) +(1₋α)γ−1_kθ_k2∆(¯ λ0) + (1₋α)γ_kθ_k2E_θ_∆(¯ _λ⋆_{) +}_Ce−γ2log1+τ(n)_. where C, τ denote positive constants independent of ǫand n.

PROOF. First, remark that E_θ X |k|≤m0 |θk|2{λ⋆k}2 ! ˜ γk γk 2 −1 % = E_θ X |k|≤m0 |θk|2{λ⋆k}2|γk|−2(|γ˜k−γk+γk|2− |γk|2), = E_θ X |k|≤m0 |θk|2{λ⋆k}2|γk|−2|˜γk−γk|2 +2E_θ X |k|≤m0 |θk|2{λ⋆k}2|γk|−2Re((˜γk−γk)¯γk) (4.15)

Letλ_∈Λ be a deterministic filter, sinceE_θ_γ_˜_k₌_γ_k _{for all}_k_∈N_{, we can write that} E_θ X |k|≤m0 |θk|2{λ⋆k}2|γk|−2Re((˜γk−γk)¯γk) =Eθ X |k|≤m0 |θk|2({λ⋆k}2−λ2k)|γk|−2Re((˜γk−γk)¯γk),

and simple algebra yields

|{λ⋆_k_}2₋λ2_k_{| ≤}λ⋆_k_|1₋λ⋆_k_|+λk|1−λk|+λ⋆k|1−λk|+λk|1−λ⋆k|,∀k∈N.

Next, the Young inequality implies that for allγ _∈]0; 1]: E_θ X |k|≤m0 |θk|2 {λ⋆_k}2−λ2_k|γ_k|−2Re((˜γ_k−γ_k)¯γ_k) (4.16) ≤ γ−1 X |k|≤m0 [{λ⋆_k}2+{λk}2 |θk|2|γk|−2|γ˜k−γk|2+γ X |k|≤m0 |1−λk|2+|1−λ⋆k|2 |θk|2.

Hence, from equations (4.15) and (4.16), we obtain E_θ X |k|≤m0 |θk|2{λ⋆k}2 ! ˜ γk γk 2 −1 % ≤ (1 +γ−1)E_θ X |k|≤m0 |θk|2{λ⋆k}2|γk|−2|˜γk−γk|2+γ−1Eθ X |k|≤m0 |θk|2λ2k|γk|−2|γ˜k−γk|2 +γE_θ X |k|≤m0 |1−λk|2+|1−λ⋆k|2 |θk|2.

A direct application of Lemma 4.1 provides, for all K >0 E_θ X |k|≤m0 |θk|2 ! ˜ γk γk 2 −1 % ≤ (1 +γ−1)Klog 2₍_n₎ n Eθ X |k|≤m0 |θk|2{λ⋆k}2|γk|−2+ γ−1 n X |k|≤m0 λ2_k|θk|2|γk|−2(1− |γk|2) +2γE_θ X |k|≤m0 [_|1₋λk|2+|1−λk⋆|2]|θk|2+Ce−Klog 1+τ₍_n₎ .

(21)

FixK =γ2 and remark that X |k|≤m0 |γk|−2|θk|2λ2k≤ kθk2 sup |k|≤m0 λ2_k|γk|−2,∀λ∈Λ,

in order to conclude the proof of Lemma 4.2.

Lemma 4.3 Let λ⋆ the filter defined in (2.13). For all deterministic filter λand 0< γ <1, we have 2ǫ √ nEθ X |k|≤m0 2λ⋆_k− {λ⋆_k}2|γk|−2Re(θk˜γkξ¯k)≤2γR(θ, λ) +3γE_θ_R₍_{θ, λ}⋆_{) +}_Cγ−1E_θ_∆(_λ⋆_{) +}_Cγ−1E_θ_∆(_λ_{) +}_Ce−log1+τ(n)_, for someτ >0.

PROOF. In the following, we first state the inequality

P !_m₀ \ k=1 ˜ γk γk ≤2 % ≥1₋Cm0exp(−log2n), for some τ >0. Indeed

Then, for allγ >0, using the above result and inequality (35) of Cavalieret al. (2002), we obtain 2ǫ √ nEθ X |k|≤m0 2λ⋆_k_{− {}λ⋆_k_}2_|γk|−2Re(θk˜γkξ¯k) ≤ γ    X |k|≤m0 (1₋λk)2|θk|2+ ǫ2 nEθ X |k|≤m0 λ2_k_|γk|−4|γ˜k|2    +γE_θ    X |k|≤m0 (1₋λ⋆_k)2_|θk|2+ ǫ2 n X |k|≤m0 {λ⋆_k_}2_|γk|−4|˜γk|2    +Cγ−1E_θ_∆(_λ⋆_{) +}_Cγ−1E_θ_∆(_λ₎_, ≤ 4γ    X |k|≤m0 (1₋λk)2|θk|2+ ǫ2 nEθ X |k|≤m0 λ2_k_|γk|−2    +Ce−log1+τ(n) +4γE_θ    X |k|≤m0 (1−λ⋆_k)2|θk|2+ ǫ2 n X |k|≤m0 {λ⋆_k}2|γk|−2    +Cγ−1E_θ_∆(_λ⋆_{) +}_Cγ−1E_θ_∆(_λ₎_. This concludes the proof.

(22)

References

Bigot, J. (2006). Landmark-based registration of curves via the continuous wavelet transform. Journal of Computational and Graphical Statistics,15(3), 542-564.

Bigot, J. and Gadat, S. (2010). A deconvolution approach to estimation of a common shape in a shifted curves model. Annals of statistics,to be published.

Bigot, J., Gamboa, F. and Vimond, M. (2009). Estimation of translation, rotation and scal-ing between noisy images usscal-ing the fourier mellin transform. SIAM Journal on Imaging Sciences, 2, 614-645.

Brown, L. D. and Low, M. G. (1996). Asymptotic equivalence of nonparametric regression and white noise. Ann. Statist.,24(6), 2384–2398.

Castillo, I. and Loubes, J.-M. (2009). Estimation of the distribution of random shifts deformation. Math. Methods Statist.,18(1), 21–42.

Cavalier, L., Golubev, G. K., Picard, D. and Tsybakov, A. B. (2002). Oracle inequalities for inverse problems. Ann. Statist., 30(3), 843–874. (Dedicated to the memory of Lucien Le Cam)

Cavalier, L. and Hengartner, N. W. (2005). Adaptive estimation for inverse problems with noisy operators. Inverse Problems,21(4), 1345–1361.

Cavalier, L. and Raimondo, M. (2007). Wavelet deconvolution with noisy eigenvalues. IEEE Trans. on Signal Processing, 55, 2414–2424.

Dalalyan, A. (2007). Penalized maximum likelihood and semiparametric second-order efficiency. Math. Methods of Statist.,16(1), 43-63.

Dalalyan, A., Golubev, G. and Tsybakov, A. B. (2006). Penalized maximum likelihood and semiparametric second-order efficiency. Ann. Statist.,34(1), 169-201.

Donoho, D. L. (1995). Nonlinear solution of linear inverse problems by wavelet-vaguelette decomposition. Appl. Comput. Harmon. Anal.,2(2), 101–126.

Donoho, D. L., Johnstone, I. M., Kerkyacharian, G. and Picard, D. (1995). Wavelet shrinkage: Asymptopia? J. Roy. Statist. Soc. Ser. B,57, 301–369.

Efromovich, S. and Koltchinskii, V. (2001). On inverse problems with unknown operators. IEEE Transactions on Information Theory,47(7), 2876-2894.

Fan, J. (1991). On the optimal rates of convergence for nonparametric deconvolution problems. Ann. Statist.,19, 1257–1272.

Gamboa, F., Loubes, J.-M. and Maza, E. (2007). Semi-parametric estimation of shifts. Electron. J. Stat.,1, 616–640.

Gasser, T. and Kneip, A. (1992). Statistical tools to analyze data representing a sample of curves. Ann. Statist.,20(3), 1266-1305.

Gasser, T. and Kneip, A. (1995). Searching for structure in curve samples. JASA, 90(432), 1179-1188.

Hoffmann, M. and Reiß, M. (2008). Nonlinear estimation for linear inverse problems with error in the operator. Annals of Statistics,38, 310-336.

Isserles, U., Ritov, Y. and Trigano, T. (2008). Semiparametric density estimation of shifts between curves. preprint.

Kneip, A. and Gasser, T. (1988). Convergence and consistency results for self-modelling regres-sion. Ann. Statist.,16, 82-112.

Liu, X. and Müller, H. (2004). Functional convex averaging and synchronization for time-warped random curves. JASA,99(467), 687-699.

Marteau, C. (2006). Regularization of inverse problems with unknown operator. Mathematical Methods of Statistics,15, 415-443.

Marteau, C. (2009). On the stability of the risk hull method for projection estimator. Journal of Statistical Planning and Inference,139, 1821-1835.

Ramsay, J. and Li, X. (2001). Curve registration. Journal of the Royal Statistical Society, Series B,63, 243-259.

(23)

Ramsay, J. and Silverman, B. (2002). Functional data analysis. New York: Spriner-Verlag: Lecture Notes in Statistics.

Ramsay, J. and Silverman, B. (2005). Applied functional data analysis. New York: Spriner-Verlag: Lecture Notes in Statistics.

Ronn, B. B. (1998). Nonparametric maximum likelihood estimation for shifted curves. JRSS B, 60, 351-363.

Vimond, M. (2010). Efficient estimation for a subclass of shape invariant models. Annals of statistics,38(3), 1885-1912.

Wang, K. and Gasser, T. (1997). Alignment of curves by dynamic time warping. Ann. Statist., 25(3), 1251-1276.