Contents lists available at ScienceDirect

Applied and Computational Harmonic Analysis

www.elsevier.com/locate/acha

Minimax optimal procedures for testing the structure of multidimensional functions
John Aston^a, Florent Autin^b, Gerda Claeskens^c, Jean-Marc Freyermuth^{a,∗}, Christophe Pouet^b

a Statistical Laboratory, University of Cambridge, Wilberforce Road, Cambridge CB3 0WB, England, United Kingdom
b Aix Marseille Univ, CNRS, Centrale Marseille, I2M, Marseille, France
c ORSTAT and Leuven Statistics Research Center, KU Leuven, Naamsestraat 69, 3000 Leuven, Belgium
Article info

Article history:
Received 1 June 2016
Received in revised form 22 March 2017
Accepted 1 May 2017
Available online xxxx
Communicated by Christopher Heil

Keywords: Adaptation; Anisotropy; Atomic dimension; Besov spaces; Gaussian noise model; Hyperbolic wavelets; Hypothesis testing; Minimax rate; Sobol decomposition; Structural modeling

Abstract

We present a novel method for detecting some structural characteristics of multidimensional functions. We consider the multidimensional Gaussian white noise model with an anisotropic estimand. Using the relation between the Sobol decomposition and the geometry of multidimensional wavelet bases, we can build test statistics for any of the Sobol functional components. We assess the asymptotic minimax optimality of these test statistics and show that they are optimal in the presence of anisotropy with respect to the newly determined minimax rates of separation. An appropriate combination of these test statistics makes it possible to test some general structural characteristics such as the atomic dimension or the presence of some variables. Numerical experiments show the potential of our method for studying spatio-temporal processes.

© 2017 The Author(s). Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
1. Introduction
Multidimensional data often exhibit a simpler underlying structure, meaning that their effective dimension is smaller than the observed dimension. Detecting the presence of such structure can enhance the understanding of the data and allows for more effective modeling and inferential strategies. There is a flourishing literature dealing with nonparametric methods for structure detection. These contributions are concerned with different types of structures (such as additivity, small atomic dimension and variable selection) and with different modeling approaches according to, for instance, the nature of the noise and the smoothness assumptions. A brief overview of the most directly related contributions is given hereafter. The main characteristic of this paper is to provide a rigorous theoretical study of structure detection in the presence of an anisotropically smooth estimand. We consider a multidimensional Gaussian white noise model and derive test statistics for the atomic dimension of multidimensional objects (i.e., the maximal degree of interaction between the variables) and for variable selection. Then, following the sieve estimation of Birgé and Massart [5], we build a data-driven procedure. It is first tested in 'idealistic' numerical experiments before being applied to the more sophisticated context of time series data analysis.

* Corresponding author.
E-mail address: [email protected] (J.-M. Freyermuth).
http://dx.doi.org/10.1016/j.acha.2017.05.003
1063-5203/© 2017 The Author(s). Published by Elsevier Inc.
The main ingredient of our methodology is to project the data onto a hyperbolic (wavelet) basis and to build test statistics based on the projection coefficients. Its motivation relies on the relation that emerges between the geometry of that basis and the 'functional components' of the Sobol decomposition of the estimand (see Sobol [34]). The use of an appropriate basis makes it possible to benefit from a sparse representation and an optimal adaptation to the anisotropic smoothness of the estimand. In this paper we construct optimal testing procedures for the functional components according to the minimax hypothesis testing framework of Ingster [11–13]. Appropriately combined, the tests on these functional components can be rephrased in terms of more general structures such as the atomic dimension.
This project is motivated by the recent results of Autin et al. [3,4], who studied the ability of a tensor-product wavelet basis, the so-called hyperbolic wavelet basis, to estimate multidimensional functions having anisotropic smoothness. Tensor-product bases (Fourier or wavelets) have already been widely used in signal detection, notably to test for additivity (i.e. atomic dimension equal to 1) as in De Canditiis and Sapatinas [8]. Nevertheless, the latter do not provide deep theoretical results on the performance of their method. Abramovich et al. [1] describe another procedure for testing additivity. They propose an adaptive procedure based on thresholding the wavelet coefficients and derive interesting theoretical results. Nevertheless, they exploit a standard (isotropic) wavelet basis that cannot optimally deal with the more realistic situation of anisotropic estimands. Moreover, their method is limited to testing for additivity and does not exploit the full directional representation of a multidimensional wavelet basis. The functional framework (where the dimension $d \to \infty$) has also been investigated by Gayraud and Ingster [10], who studied the problem of signal detection in the case of sparse additive functions. Other developments include, for example, multichannel signal detection as in Ingster and Suslina [18], and detection in inverse models as in Ingster et al. [14].
Comminges and Dalalyan [6] made profound theoretical contributions on hypothesis testing procedures based on quadratic functionals. They study various testing problems involving multidimensional anisotropic functions. They exploit a tensor-product Fourier basis to build nonadaptive procedures, i.e. procedures that are built on the knowledge of the regularity parameters of the anisotropic function spaces in the alternative. In this paper, we focus on the consequences of dealing with anisotropic estimands. Therefore, we define two classes of testing procedures, referred to as minimax optimal and adaptive minimax optimal methods for the Besov balls, see Section 4. The former methodology is built using the full knowledge of the regularity parameters while the latter uses only a part of it.
These methods differ by the nature of the truncation applied to the sequence of hyperbolic wavelet coefficients. An intuitive way to think about this is to consider the better known context of multidimensional function estimation. The truncation rule characterizing the minimax optimal method can be viewed as a linear but directionally dependent smoothing, while the truncation rule of the adaptive minimax optimal method finds its roots in the approximation of the hyperbolic cross that is naturally associated with tensor-product spaces (see for instance Schmeisser [31], Schmeisser and Sickel [32] and Sickel and Ullrich [33]). Tensor-product spaces are also often considered in various statistical contexts, such as in estimation with, for example, the tensor-product space ANOVA model as in Lin [24], or in signal detection as in Ingster and Stepanova [15]. They are of great interest since one can hope to reach performances close to the unidimensional case. This is sometimes promoted as a way to 'tackle' the curse of dimensionality; nevertheless, it is a restrictive assumption. In the context of functional ANOVA, for example, assuming a tensor-product model means that the complexity of an interaction term has to be inversely proportional to its degree. In this paper we do not restrict ourselves to the structure detection of functions belonging to tensor products of Besov spaces but consider larger classes such as the anisotropic Nikolskii class. We show that our methods attain the optimal separation rates between the null and the alternative hypotheses and we discuss how to combine sets of testing procedures in a way that is adapted to the anisotropic case.
Such testing procedures find applications in many fields. We illustrate their usage in time series data analysis. Our aim is to test the structure of the spatio-spectrum associated with a spatio-temporal process that is time stationary. This also allows us to illustrate how to adapt such testing procedures in the context of multidimensional data where the number of observations is very large, making p-values useless.
This paper is structured as follows. First, in Section 2, we give the fundamental knowledge on hyperbolic wavelet bases and their relation to the Sobol decomposition of a multidimensional function. In Section 3 we introduce the multidimensional Gaussian white noise model and the minimax hypothesis testing framework. Then we describe our test statistics for Sobol functional components in Section 4 before introducing tests for global structural characteristics in Section 5. Section 6 describes our data-driven adaptation and validation in a practical setting of the adaptive minimax optimal procedure and a final discussion is provided in Section 7. The proofs of the theoretical results are postponed to the appendix.
2. Hyperbolic wavelet bases
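We start from a one-dimensional periodized wavelet basis $B_1$ of $L^2([0,1))$ which is built from the dilations and translations of a scaling function $\phi$ and a wavelet $\psi$ with $V$ (for some $V \ge 1$) vanishing moments,

\[ B_1 = \left\{ \phi(\cdot),\ \psi_{j,k}(\cdot) = 2^{j/2}\psi(2^{j}\cdot - k) : j \in \mathbb{N},\ k \in \{0,\dots,2^{j}-1\} \right\}. \]

A $d$-dimensional hyperbolic wavelet basis results from forming $d$-variate functions by taking appropriate products of the univariate functions $\phi$ and $\psi$ as follows,

\[ B_d = \left\{ \phi_{0,0}(\cdot),\ \psi^{i}_{j,k}(\cdot) : i \in \{0,1\}^d \setminus \mathbf{0},\ j \in \mathbf{J}_i,\ k \in \mathbf{K}_j \right\} \]

where $\mathbf{0} = (0,\dots,0)$, $\phi_{0,0}(\cdot) = \phi(\cdot) \times \cdots \times \phi(\cdot)$, $\psi^{i}_{j,k}(\cdot) = \psi^{i_1}_{j_1,k_1}(\cdot) \times \cdots \times \psi^{i_d}_{j_d,k_d}(\cdot)$, $i = (i_1,\dots,i_d)$, with the following notations:

\[ \psi^{i_u}_{j_u,k_u}(\cdot) = \begin{cases} 2^{j_u/2}\,\phi(2^{j_u}\cdot - k_u) & \text{if } i_u = 0, \\ 2^{j_u/2}\,\psi(2^{j_u}\cdot - k_u) & \text{if } i_u = 1, \end{cases} \]

\[ \mathbf{J}_i = \left\{ j = (j_1,\dots,j_d) : \forall u \in \{1,\dots,d\},\ j_u = j_u i_u,\ j_u \in \mathbb{N} \right\}, \]

\[ \mathbf{K}_j = \left\{ k = (k_1,\dots,k_d) : \forall u \in \{1,\dots,d\},\ k_u \in \{0,\dots,2^{j_u}-1\} \right\}. \]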
The basis $B_d$ is generated by a scaling function $\phi_{0,0}$ and translated and dilated wavelet functions $\psi^{i}_{j,k}$. If all the components of the vector of directional dilations $j$ are equal, this results in the standard (isotropic) wavelet basis. Hereafter we allow the components of $j$ to take different values. The wavelet functions are then supported on hyper-rectangles and the resulting wavelet basis, the so-called hyperbolic or tensor-product wavelet basis, is proven to deal optimally with anisotropic functions [3,4]. We say that each of the $2^d - 1$ possible values of $i$ defines an orientation. In such a basis, any $f \in L^2([0,1)^d)$ can be written as

\[ f = \langle f, \phi_{0,0}\rangle_2\, \phi_{0,0} + \sum_{i \neq \mathbf{0}} \sum_{j \in \mathbf{J}_i} \sum_{k \in \mathbf{K}_j} \langle f, \psi^{i}_{j,k}\rangle_2\, \psi^{i}_{j,k} \equiv f_0 + \sum_{i \neq \mathbf{0}} f_i \tag{1} \]

where $\langle \cdot,\cdot\rangle_2$ is the $L^2$ scalar product. This representation facilitates the characterization of a specific structure, such as additivity. Indeed, for an additive function $f_{add}$, that is, a function with Sobol decomposition $f_{add}(x_1,\dots,x_d) = f_0 + \sum_{u=1}^{d} f_u(x_u)$, the coefficients $\theta^{i}_{j,k} = \langle f, \psi^{i}_{j,k}\rangle_2$ in each orientation $i$ with $|i| = i_1 + i_2 + \cdots + i_d > 1$ are exactly equal to 0. Information about the component functions $f_i$ in equation (1) can be retrieved via the wavelet coefficient sequence through the geometry of the hyperbolic wavelet basis. This enables the design of specific testing procedures for structural information on multivariate functions, more general than merely testing for additivity.
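The vanishing of the interaction coefficients for an additive function can be checked numerically. The following Python sketch applies a full Haar transform along each axis of a two-dimensional grid, which gives a discrete analogue of the hyperbolic basis. The function names, the midpoint sampling of the grid and the choice of the Haar wavelet are assumptions made for illustration; they are not taken from the paper.

```python
import math


def haar_1d(values):
    """Full orthonormal Haar transform of a list whose length is a power of two.

    Output layout: index 0 holds the scaling coefficient, the remaining
    indices hold the wavelet coefficients from the coarsest to the finest scale.
    """
    approx = list(values)
    details = []
    while len(approx) > 1:
        half = len(approx) // 2
        new_approx = [(approx[2 * k] + approx[2 * k + 1]) / math.sqrt(2) for k in range(half)]
        detail = [(approx[2 * k] - approx[2 * k + 1]) / math.sqrt(2) for k in range(half)]
        details = detail + details  # coarser scales are stored in front
        approx = new_approx
    return approx + details


def hyperbolic_haar_2d(grid):
    """Transform along the second axis, then along the first axis."""
    rows = [haar_1d(row) for row in grid]
    cols = [haar_1d(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def interaction_energy(coeffs):
    """Sum of squared coefficients in orientation i = (1, 1),
    i.e. wavelet (index >= 1) along both axes."""
    n = len(coeffs)
    return sum(coeffs[a][b] ** 2 for a in range(1, n) for b in range(1, n))


# Illustrative grid (assumption): midpoints of N cells per axis.
N = 8
xs = [(l + 0.5) / N for l in range(N)]
additive = [[math.sin(x1) + x2 ** 2 for x2 in xs] for x1 in xs]
with_interaction = [[math.sin(x1) + x2 ** 2 + x1 * x2 for x2 in xs] for x1 in xs]

energy_additive = interaction_energy(hyperbolic_haar_2d(additive))
energy_interaction = interaction_energy(hyperbolic_haar_2d(with_interaction))
```

Under these assumptions, `energy_additive` vanishes up to rounding error, whereas `energy_interaction` is clearly positive because of the term $x_1 x_2$.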
Dalalyan et al. [7] introduced the atomic dimension in the physical domain, which we denote by $\delta(f)$. It reflects the maximal degree of interaction between the $d$ variables in the Sobol decomposition. For instance, the additive model $f_{add}$ has $\delta(f_{add}) = 1$, while a function $f(x_1,\dots,x_d) = \sum_{u=1}^{d} f_u(x_u) + \sum_{u=1}^{d} \sum_{v=1, v \neq u}^{d} x_u x_v$ has $\delta(f) = 2$. In the sequel, we use the definition from Autin et al. [3] of the atomic dimension in the wavelet coefficient domain, relating the orientations of the nonzero wavelet coefficients to the corresponding Sobol functional components.
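Definition 2.1. Let $f \in L^2([0,1)^d)$ be decomposed in the hyperbolic wavelet basis as

\[ f = f_0 + \sum_{i \neq \mathbf{0}} f_i = \theta^{\mathbf{0}}_{\mathbf{0},\mathbf{0}}\, \phi_{0,0} + \sum_{i \neq \mathbf{0}} \Big( \sum_{j \in \mathbf{J}_i} \sum_{k \in \mathbf{K}_j} \theta^{i}_{j,k}\, \psi^{i}_{j,k} \Big). \]

Define $A_f = \{ i \in \{0,1\}^d \setminus \mathbf{0} ;\ \theta^{i}_{j,k} \neq 0 \text{ for some } (j,k) \}$. The atomic dimension of $f$ with respect to $B_d$ is the integer $\delta_{B_d} = \delta_{B_d}(f) \in \{0,\dots,d\}$ such that:

\[ \delta_{B_d} = \begin{cases} \max\{ |i| ;\ i \in A_f \} & \text{if } A_f \neq \emptyset, \\ 0 & \text{if } A_f = \emptyset. \end{cases} \]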
Definition 2.1 gives us more flexibility than the definition in the physical domain. Indeed, the atomic dimension $\delta_{B_d}$ depends on the number of vanishing moments (even directional ones) of the considered basis $B_d$. More precisely, the atomic dimension $\delta(f)$ of a $d$-dimensional function $f$ in the physical domain and the atomic dimension $\delta_{B_d}(f)$ of the same function $f$ in the coefficient domain are tied by the following inequality: $\delta_{B_d}(f) \le \delta(f)$. We mention here two situations to emphasize the practical importance of being able to characterize the atomic dimension in the coefficient domain. The first example concerns multidimensional function estimation. Autin et al. [3] introduced a novel estimation procedure based on the thresholding of the hyperbolic wavelet coefficients. They show that it outperforms standard hard thresholding whenever some structural conditions on the estimand are satisfied. These conditions also include some constraint on the value of the atomic dimension $\delta_{B_d}$. Here we provide a test statistic to test the structure of the estimand before applying the aforementioned estimation method. A second example concerns the exploitation of sparsity in statistical modeling, which can be useful for building a statistical model using the hyperbolic wavelet coefficients or in generalized linear modeling. For example, Zhou et al. [38] describe such a generalized linear model with multidimensional covariates; their first step consists of dimension reduction by tensor decomposition. One could instead exploit the sparsity of the hyperbolic wavelet sequence. In such a case, one may seek the hyperbolic wavelet basis that provides the sparsest representation.
3. Model and minimax approach for hypothesis testing
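We consider the multidimensional Gaussian white noise model

\[ dY_\varepsilon(x) = f(x)\,dx + \varepsilon\, dW(x) \tag{2} \]

where $x = (x_1,\dots,x_d) \in [0,1)^d$, $f \in L^2([0,1)^d)$, $W(x)$ is the Brownian sheet and $\varepsilon$ is the noise level, considered here to be close to zero. For technical reasons (see the proofs of Proposition A.3 and Proposition A.4), and without loss of generality, we assume that it is smaller than $\varepsilon_o = e^{-e}$, where $e$ denotes Euler's number.

We consider the sequence version of the Gaussian white noise model, which consists of the observations of the following random variables

\[ \hat\theta^{\mathbf{0}}_{\mathbf{0},\mathbf{0}} = \theta^{\mathbf{0}}_{\mathbf{0},\mathbf{0}} + \varepsilon \xi = \langle f, \phi_{0,0}\rangle_{L^2} + \varepsilon \xi \quad \text{and} \quad \hat\theta^{i}_{j,k} = \theta^{i}_{j,k} + \varepsilon \xi^{i}_{j,k} = \langle f, \psi^{i}_{j,k}\rangle_{L^2} + \varepsilon \xi^{i}_{j,k} \]

where $\xi$, $\xi^{i}_{j,k}$ are i.i.d. $N(0,1)$ and $(j,k) \in \mathbb{N}^d \times \mathbf{K}_j$.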
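For any chosen orientation $i \neq \mathbf{0}$, we may test the following null hypothesis $H_{i,0}$ versus one of the two stated alternative hypotheses $H_{i,a}$ and $H'_{i,a}$:

\[ H_{i,0}: f \in N_i(R) = \Big\{ f : \|f\|_2 \le R,\ f_i(x) = \sum_{j \in \mathbf{J}_i} \sum_{k \in \mathbf{K}_j} \theta^{i}_{j,k}\, \psi^{i}_{j,k}(x) = 0,\ \forall x \in [0,1)^d \Big\}, \tag{3} \]

\[ H_{i,a}: f \in A_i(R, C, s, r_{\varepsilon,i}) = \Big\{ f : f \in F^{s}(R),\ \|f_i\|_2 = \Big( \sum_{j \in \mathbf{J}_i} \sum_{k \in \mathbf{K}_j} (\theta^{i}_{j,k})^2 \Big)^{1/2} \ge C\, r_{\varepsilon,i} \Big\}, \tag{4} \]

\[ H'_{i,a}: f \in \bigcup_{s:\ \sum_u i_u s_u^{-1} = \gamma_i^{-1}} A_i(R, C, s, r'_{\varepsilon,i}). \tag{5} \]

In the above expressions, $C$ is a positive constant which does not depend on $\varepsilon$, and $r_{\varepsilon,i}$, $r'_{\varepsilon,i}$ are continuous sequences of positive real numbers tending to 0 as $\varepsilon$ goes to 0. $F^{s}(R)$ denotes a ball of radius $R$ in a function space characterized by a vector of directional regularities $s$ whose harmonic sum in the orientation $i$ is denoted by $\gamma_i^{-1}$.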
Given a simultaneous control on the probabilities of type I and type II errors, we consider the asymptotic minimax setup: we aim at providing, for any orientation $i$, the order of the optimal separation rates in the sense of Section 2.6 in Ingster and Suslina [17] between the null hypothesis $H_{i,0}$ and each one of the alternative hypotheses $H_{i,a}$ and $H'_{i,a}$ associated with a particular choice of $F^{s}$ when:

• $s$ is known;
• only the harmonic sum $\gamma_i^{-1} = \sum_u i_u s_u^{-1}$ of $s$ with respect to orientation $i$ is known.

Related to this, we search for both a sequence of $F^{s}$-minimax optimal testing procedures and a sequence of $F^{s}$-adaptive minimax optimal testing procedures.
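Definition 3.1. Let $\alpha \in (0, \tfrac12)$, $s \in (0,+\infty)^d$ and $i \in \{0,1\}^d \setminus \mathbf{0}$. We say that $r_{\varepsilon,i}$ corresponds to the $F^{s}$-minimax rate separating the hypotheses $H_{i,0}$ and $H_{i,a}$, up to a constant, if the two following statements are satisfied:

1. (Upper bound) for any $C > C_{i,\alpha}$, with $C_{i,\alpha}$ an absolute positive constant, there exists a sequence of testing procedures $(\Delta^{*}_{\varepsilon,\alpha})_{\varepsilon}$ such that
\[ \sup_{0<\varepsilon<\varepsilon_o} \Big[ \sup_{f \in N_i(R)} P_f(\Delta^{*}_{\varepsilon,\alpha} = 1) + \sup_{f \in A_i(R,C,s,r_{\varepsilon,i})} P_f(\Delta^{*}_{\varepsilon,\alpha} = 0) \Big] \le \alpha, \]

2. (Lower bound) for any $C < c_{i,\alpha}$, with $c_{i,\alpha}$ an absolute positive constant, the following inequality holds
\[ \inf_{0<\varepsilon<\varepsilon_o} \inf_{\Delta} \Big[ \sup_{f \in N_i(R)} P_f(\Delta = 1) + \sup_{f \in A_i(R,C,s,r_{\varepsilon,i})} P_f(\Delta = 0) \Big] > \alpha. \]

The sequence of testing procedures $(\Delta^{*}_{\varepsilon,\alpha})_{\varepsilon}$ is said to be $F^{s}$-minimax optimal.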
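Definition 3.2. Let $\alpha \in (0, \tfrac12)$. Consider $i \in \{0,1\}^d$ such that $|i| > 1$ and let $\gamma_i > 0$. We say that $r'_{\varepsilon,i}$ corresponds to the $F^{s}$-adaptive minimax rate separating the hypotheses $H_{i,0}$ and $H'_{i,a}$, up to a constant, if the two following statements are satisfied:

1. (Upper bound) for any $C > C'_{i,\alpha}$, with $C'_{i,\alpha}$ an absolute positive constant, there exists a sequence of testing procedures $(\Delta'_{\varepsilon,\alpha})_{\varepsilon}$ such that
\[ \sup_{0<\varepsilon<\varepsilon_o} \Big( \sup_{f \in N_i(R)} P_f(\Delta'_{\varepsilon,\alpha} = 1) + \sup_{s:\ \sum_u i_u s_u^{-1} = \gamma_i^{-1}}\ \sup_{f \in A_i(R,C,s,r'_{\varepsilon,i})} P_f(\Delta'_{\varepsilon,\alpha} = 0) \Big) \le \alpha, \]

2. (Lower bound) for any $C < c'_{i,\alpha}$, with $c'_{i,\alpha}$ an absolute positive constant, the following inequality holds
\[ \inf_{0<\varepsilon<\varepsilon_o} \inf_{\Delta} \Big( \sup_{f \in N_i(R)} P_f(\Delta = 1) + \sup_{s:\ \sum_u i_u s_u^{-1} = \gamma_i^{-1}}\ \sup_{f \in A_i(R,C,s,r'_{\varepsilon,i})} P_f(\Delta = 0) \Big) > \alpha. \]

The sequence of testing procedures $(\Delta'_{\varepsilon,\alpha})_{\varepsilon}$ is said to be $F^{s}$-adaptive minimax optimal.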
In the latter situation, we only have information about the harmonic sum of the directional smoothness parameters of the function class in the alternative. To determine the minimax and adaptive minimax separation rates, we naturally consider the combination of directional smoothness parameters, with a given harmonic sum, that gives the worst type II error rate. The fully adaptive case corresponds to an unknown value of $\gamma_i$; it is a direct extension of the setting studied here, which could be called partially adaptive since $\gamma_i$ is known. In this work, for the sake of simplicity, we use the term adaptive instead of partially adaptive.
Remark 3.1. The definitions of minimax and adaptive minimax rates are equivalent to the ones given in Section 1 of Lepski and Tsybakov [22] and slightly more precise than the ones given in Section 2.6 of Ingster and Suslina [17]. We will show that both the minimax separation rate and the adaptive minimax separation rate of testing depend on the smoothness of the underlying functional space and on the noise level $\varepsilon$. At the end of the appendix, we highlight the relationships between the absolute constants $C_{i,\alpha}$, $c_{i,\alpha}$, $C'_{i,\alpha}$, $c'_{i,\alpha}$ and the parameters appearing in the problem, such as $R$ and the smoothness.
4. Nonparametric tests for a given orientation
In this section we present the test statistics and testing procedures that are used in the sequel. These test statistics provide sequences of testing procedures that are $F^{s}$-minimax and $F^{s}$-adaptive minimax optimal in the particular case where $F^{s}$ corresponds to a Besov space with $s$ as smoothness parameter.
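Definition 4.1 (Test statistic). Let $i \neq \mathbf{0}$ and $j = (j_1,\dots,j_d) \in \mathbf{J}_i$. The nonnegative random variable

\[ T_{i,j} = \sum_{j' \in \mathbf{J}_i:\ j'_u < \max(j_u,1),\ \forall u}\ \ \sum_{k \in \mathbf{K}_{j'}} (\hat\theta^{i}_{j',k})^2 \]

is called the $i$-oriented test statistic with level parameter $j$.

Definition 4.2 (Testing procedure). Let $i \neq \mathbf{0}$, $j = (j_1,\dots,j_d) \in \mathbf{J}_i$ and $t > 0$. The random variable

\[ \Delta_{i,j}(t) = \mathbb{1}_{\{T_{i,j} > t\}} \]

is called the $(i,j,t)$-testing procedure.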
4.1. $B^{s}_{2,\infty}$-minimax optimal sequence of testing procedures
We define appropriate test statistics based on the empirical energy, that is, the sum of the squared values of the empirical hyperbolic wavelet coefficients within the specified orientation over certain scales. We show that the sequence of tests constructed in this way from hyperbolic wavelet bases enjoys minimax optimality properties when $f$ belongs to a Besov ball with smoothness parameter $s$.
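Definition 4.3 (Besov ball). Let $R > 0$ and $s = (s_1,\dots,s_d) \in (0,+\infty)^d$. We say that $f \in L^2([0,1)^d)$ belongs to the Besov ball $B^{s}_{2,\infty}(R)$ if and only if

\[ \sup_{i \neq \mathbf{0}}\ \sup_{j \in \mathbf{J}_i}\ \max_{1 \le u \le d} \{ 2^{2 j_u s_u} \} \sum_{k \in \mathbf{K}_j} |\theta^{i}_{j,k}|^2 \le R^2. \]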
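In this section we assume that we have full knowledge of the smoothness parameter $s$ of the functions in the alternative. In other words, it corresponds to the nonadaptive setup. We define a vector $j^{*} = (j_1^{*},\dots,j_d^{*}) \in \mathbf{J}_i$ of truncation scales such that

\[ 2^{-j_u^{*}} \le \varepsilon^{\frac{4 i_u \gamma_i}{(1+4\gamma_i) s_u}} < 2^{1-j_u^{*}} \tag{6} \]

with $\gamma_i = \big( \sum_{u=1}^{d} i_u s_u^{-1} \big)^{-1}$.

Theorem 4.1. Let $\alpha \in (0,\tfrac12)$, $R > 0$, $s \in (0,+\infty)^d$ and $i \in \{0,1\}^d \setminus \mathbf{0}$. Consider, as in (3) and (4), testing the hypotheses $H_{i,0}$ versus $H_{i,a}$ where $F^{s}(R) = B^{s}_{2,\infty}(R)$. Then the sequence of the $(i,j^{*},t_{\varepsilon,i,\alpha})$-testing procedures $(\Delta_{i,j^{*}}(t_{\varepsilon,i,\alpha}))_{\varepsilon}$ is $B^{s}_{2,\infty}$-minimax optimal with $j^{*}$ as in (6), and with $\varepsilon^{-2} t_{\varepsilon,i,\alpha} = \chi_{1-\alpha/2}(2^{|j^{*}|})$, the $1-\alpha/2$ quantile of the chi-squared distribution with $2^{|j^{*}|} = 2^{j_1^{*}+\cdots+j_d^{*}}$ degrees of freedom. Moreover, the $B^{s}_{2,\infty}$-minimax rate separating $H_{i,0}$ and $H_{i,a}$ is of order

\[ r_{\varepsilon,i} = \varepsilon^{\frac{4\gamma_i}{1+4\gamma_i}}. \]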
The proof of Theorem 4.1 is given in the appendix. The absolute constants $C_{i,\alpha}$ and $c_{i,\alpha}$ from Definition 3.1 are given there.
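To make the calibration in Theorem 4.1 concrete, the following Python sketch computes the truncation scales of (6), the separation rate and a threshold. Two points are assumptions of this sketch rather than part of the paper: the function names are hypothetical, and the chi-squared quantile is replaced by the Wilson-Hilferty approximation so that only the standard library is needed (an exact quantile routine could be substituted).

```python
import math
from statistics import NormalDist


def harmonic_gamma(orientation, smoothness):
    """gamma_i = (sum_u i_u / s_u)^(-1) for a nonzero orientation."""
    return 1.0 / sum(i_u / s_u for i_u, s_u in zip(orientation, smoothness))


def truncation_scales(eps, orientation, smoothness):
    """Integers j_u* with 2^(-j_u*) <= eps^(4 i_u gamma / ((1+4 gamma) s_u)) < 2^(1-j_u*)."""
    gamma = harmonic_gamma(orientation, smoothness)
    scales = []
    for i_u, s_u in zip(orientation, smoothness):
        exponent = 4.0 * i_u * gamma / ((1.0 + 4.0 * gamma) * s_u)
        # Smallest integer j such that 2^(-j) <= eps^exponent (0 when i_u = 0).
        scales.append(max(0, math.ceil(-exponent * math.log2(eps))))
    return scales


def chi2_quantile_wh(prob, dof):
    """Wilson-Hilferty approximation of the chi-squared quantile (approximation, not exact)."""
    z = NormalDist().inv_cdf(prob)
    a = 2.0 / (9.0 * dof)
    return dof * (1.0 - a + z * math.sqrt(a)) ** 3


def minimax_threshold(eps, alpha, scales):
    """t = eps^2 * chi2_{1 - alpha/2}(2^{|j*|}), with the approximate quantile."""
    return eps ** 2 * chi2_quantile_wh(1.0 - alpha / 2.0, 2 ** sum(scales))


def minimax_rate(eps, gamma):
    """Separation rate eps^(4 gamma / (1 + 4 gamma))."""
    return eps ** (4.0 * gamma / (1.0 + 4.0 * gamma))


# Illustrative values (assumption): d = 3, orientation (1, 1, 0), eps below e^(-e).
eps, orientation, smoothness = 1e-3, (1, 1, 0), (1.0, 2.0, 3.0)
gamma = harmonic_gamma(orientation, smoothness)
j_star = truncation_scales(eps, orientation, smoothness)
t = minimax_threshold(eps, 0.05, j_star)
```

With these illustrative values, $\gamma_i = 2/3$ and the truncation is finer in the first direction, where the function is assumed to be rougher, than in the second one; the inactive third direction keeps scale 0.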
This theorem can be compared to the results given in the book of Ingster and Suslina [17] for Besov bodies in the one-dimensional case. In this book, Theorem 3.9 states distinguishability conditions and sharp asymptotics are given in Theorem 4.4. The result stated above in Theorem 4.1 is similar but holds in any dimension; the smoothness parameter is therefore replaced by the average smoothness value obtained from the harmonic sum.
The reader may note that the worst rate is associated with $\gamma_{\mathbf{1}}$, where $\mathbf{1} = (1,\dots,1)$, i.e. with the interaction term of highest degree. This case is similar to the one considered in Ingster and Stepanova [16] (Remark 3.3). We obtain the same minimax rate of testing, involving the harmonic sum of the smoothness parameters, as the one they found for a signal function $f$ in a Sobolev space. Nevertheless, we point out that their null hypothesis $H_0 : f = 0$ is different from the one considered here, briefly reformulated as $H_{\mathbf{1},0} : f_{\mathbf{1}} = 0$, which consists of functions with an atomic dimension strictly smaller than the dimension $d$. The separation rate can be different for every orientation. When it comes to testing for the structure of a $d$-dimensional function, a sequential testing procedure based on these different test statistics seems ideal because one can benefit from better separation rates.
Remark 4.1. In the context of multidimensional function estimation, the maxiset approach introduced by Kerkyacharian and Picard [20] has proved useful to study the performance of wavelet-based estimators [3]. It consists of determining the largest function space such that the risk of an estimator is controlled at a chosen rate. Although the choice of the rate is often an issue, it is common to use the minimax or near-minimax rates over some 'standard' functional space $F$. The maxiset approach then reveals a set of functions that contains $F$, strictly or not. This is an optimistic approach in the sense that, in the end, we can estimate more functions at the minimax or near-minimax rate than previously thought. In minimax hypothesis testing problems, we can draw inspiration from the maxiset approach: under the alternative, we naturally model the 'estimand' as a function in a $d$-variate smoothness space, here $B^{s}_{2,\infty}(R)$. Nevertheless, since we are only concerned with the presence of information along given directions, the behavior along the directions that are not of interest can be left 'uncontrolled'. Hence, the sequence space in the alternative can be enlarged to the set of all Besov spaces $B^{s'}_{2,\infty}(R)$ with $s'_u \le s_u$ for $i_u = 0$ and $s'_u = s_u$ otherwise.
4.2. $B^{s}_{2,\infty}$-adaptive minimax procedure
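We now suppose that the regularity parameter $s$ appearing in the alternative hypothesis is unknown but that its harmonic sum $\gamma_i^{-1} = \sum_{u=1}^{d} i_u s_u^{-1}$ is known for a chosen $i$ such that $|i| > 1$. In some sense, we now consider the adaptive setup.

To define the test statistic using the knowledge about $\gamma_i$, we denote by $J_i$ the integer such that

\[ 2^{-J_i} \le \left( \varepsilon^{4} \log\log \varepsilon^{-1} \right)^{\frac{1}{1+4\gamma_i}} < 2^{1-J_i}. \tag{7} \]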
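Theorem 4.2. Let $\alpha \in (0,\tfrac12)$, $R > 0$, $i \in \{0,1\}^d$ satisfying $|i| > 1$ and let $\gamma_i > 0$. Consider, as in (3) and (5), testing the hypotheses $H_{i,0}$ versus $H'_{i,a}$ where $F^{s}(R) = B^{s}_{2,\infty}(R)$ and $\gamma_i^{-1} = \sum_{u=1}^{d} i_u s_u^{-1}$. Then the sequence of testing procedures $(\Delta_{i,\max}(t_{\varepsilon,i,\gamma_i,\alpha}))_{\varepsilon}$, built from the maximum of the $(i,j,t_{\varepsilon,i,\gamma_i,\alpha})$-testing procedures $\Delta_{i,j}(t_{\varepsilon,i,\gamma_i,\alpha})$ for which $j$ satisfies $|j| = J_i$, is $B^{s}_{2,\infty}$-adaptive minimax optimal provided that, for any $\varepsilon \in (0,\varepsilon_o)$, $\varepsilon^{-2} t_{\varepsilon,i,\gamma_i,\alpha} = \chi_{1-\frac{\alpha}{2K_{\gamma_i}}}(2^{J_i})$, with $K_{\gamma_i} = \#\{ j \in \mathbf{J}_i : |j| = J_i \}$. Moreover, the $B^{s}_{2,\infty}$-adaptive minimax rate separating $H_{i,0}$ and $H'_{i,a}$ is of order

\[ r'_{\varepsilon,i} = \left( \varepsilon^{4} \log\log \varepsilon^{-1} \right)^{\frac{\gamma_i}{1+4\gamma_i}}. \]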
The proof of Theorem 4.2 is given in the appendix. The absolute constants $C'_{i,\alpha}$ and $c'_{i,\alpha}$ from Definition 3.2 are given there.
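The quantities entering Theorem 4.2 can be computed in the same spirit. The Python sketch below derives the level $J_i$ of (7), enumerates the level vectors with $|j| = J_i$, and returns the quantile order $1 - \alpha/(2K_{\gamma_i})$ together with the adaptive rate. The function names are hypothetical, and the enumeration assumes that $\mathbb{N}$ includes 0, so that active directions may carry scale 0.

```python
import itertools
import math


def adaptive_level(eps, gamma):
    """Integer J with 2^(-J) <= (eps^4 log log(1/eps))^(1/(1+4 gamma)) < 2^(1-J).

    Requires eps < e^(-e), so that log log(1/eps) > 1.
    """
    x = (eps ** 4 * math.log(math.log(1.0 / eps))) ** (1.0 / (1.0 + 4.0 * gamma))
    return math.ceil(-math.log2(x))


def level_set(orientation, total):
    """All j in J_i with |j| = total; inactive directions (i_u = 0) stay at 0.

    Assumption: scales of active directions range over 0, 1, 2, ...
    """
    active = [u for u, i_u in enumerate(orientation) if i_u == 1]
    levels = []
    for combo in itertools.product(range(total + 1), repeat=len(active)):
        if sum(combo) == total:
            j = [0] * len(orientation)
            for u, j_u in zip(active, combo):
                j[u] = j_u
            levels.append(tuple(j))
    return levels


def adaptive_quantile_order(alpha, n_levels):
    """Order 1 - alpha / (2 K) of the chi-squared quantile in Theorem 4.2."""
    return 1.0 - alpha / (2.0 * n_levels)


def adaptive_rate(eps, gamma):
    """Rate (eps^4 log log(1/eps))^(gamma / (1 + 4 gamma))."""
    return (eps ** 4 * math.log(math.log(1.0 / eps))) ** (gamma / (1.0 + 4.0 * gamma))


def max_procedure(statistics_by_level, threshold):
    """Maximum of the indicators 1{T_{i,j} > t} over the supplied level vectors."""
    return max(1 if stat > threshold else 0 for stat in statistics_by_level.values())


# Illustrative values (assumption): orientation (1, 1, 0) with gamma = 2/3.
eps, gamma, orientation = 1e-3, 2.0 / 3.0, (1, 1, 0)
J = adaptive_level(eps, gamma)
levels = level_set(orientation, J)
order = adaptive_quantile_order(0.05, len(levels))
```

Compared with the nonadaptive case, the same threshold is applied to every level vector on the 'hyperbolic' slice $|j| = J_i$, and the quantile order is tightened by the number of such vectors.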
The results given for adaptation in Theorem 4.2 can be compared to the usual results obtained for adaptation in hypothesis testing problems. The unavoidable loss due to adaptation is the usual one, which can also be found, for example, in Theorem 7.2 in Ingster and Suslina [17]. The result stated above in Theorem 4.2 can also be compared to the loss obtained by Spokoiny [35] in the one-dimensional Gaussian white noise model for the signal detection problem.
Remark 4.2. Note that, compared to the $B^{s}_{2,\infty}$-minimax case, the optimal separation rate is slower because $\log\log \varepsilon^{-1} \to \infty$ as $\varepsilon \to 0$. This reflects the deterioration of the information about the function space in the alternative. Nevertheless, in the alternative hypothesis, the union of sequence spaces $B^{s}_{2,\infty}(R)$ can be replaced by another, larger sequence space, denoted by $A_{\gamma_i,2}(R)$ and defined hereafter.
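Definition 4.4 (Truncation ball). Let $R > 0$, $i \in \{0,1\}^d$ satisfying $|i| > 1$ and $\gamma_i > 0$. We say that $f \in L^2([0,1)^d)$ belongs to the truncation ball $A_{\gamma_i,2}(R)$ if and only if

\[ \sup_{j \in \mathbf{J}_i} 2^{2|j|\gamma_i} \sum_{k \in \mathbf{K}_j} (\theta^{i}_{j,k})^2 \le R^2. \]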
The truncation ball $A_{\gamma_i,2}(R)$ is similar to a Besov ball in the sense that it controls the decay of the energy of the hyperbolic wavelet coefficients over the scales, and hence it is used to control the approximation error. Following Lemma 2.2 of Neumann [26], the following inclusion property holds.
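Proposition 4.1. Fix $i \in \{0,1\}^d \setminus \mathbf{0}$. Let $\gamma_i > 0$. For any $R > 0$, there exists $R' > 0$ such that

\[ \bigcup_{s:\ i_1 s_1^{-1} + \cdots + i_d s_d^{-1} = \gamma_i^{-1}} B^{s}_{2,\infty}(R) \subset A_{\gamma_i,2}(R'). \]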
Considering the functional component $f_i$ of $f$ not only enables us to explore fine structures but is also a way to improve the performance of the testing procedures for more general structures, see Section 6.
5. Tests that combine different orientations
Many interesting hypotheses on the structure of a function can be tested by 'aggregating' the previously described optimal testing procedures for the Sobol functional components. In this section we describe such results in the context of the $B^{s}_{2,\infty}$-minimax framework, but they can easily be given for the $B^{s}_{2,\infty}$-adaptive minimax setting as well. We first state the following proposition:
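Proposition 5.1 (Max-testing). Fix $\alpha \in (0,\tfrac12)$, $s \in (0,+\infty)^d$ and consider $\beta = 1 - \left(1 - \tfrac{\alpha}{2}\right)^{|I|}$, where $|I|$ denotes the cardinality of a nonempty set of orientations $I$ that characterizes the alternative hypothesis. Consider the following testing hypotheses

\[ \begin{aligned} H_0 &: f \in S(R, I) = \{ f : \|f\|_2 \le R,\ f_i(x) = 0,\ \forall i \in I,\ \forall x \in [0,1)^d \}, \\ H_a &: f \in T(R, I) = \{ f : f \in B^{s}_{2,\infty}(R),\ \exists\, i \in I,\ \|f_i\|_2 \ge C r_{\varepsilon,i} \}. \end{aligned} \tag{8} \]

Then, for any $\varepsilon \in (0,\varepsilon_o)$, the testing procedure $\Lambda_{I,\varepsilon,\alpha} = \max_{i \in I} \Delta_{i,j^{*}}(t_{\varepsilon,i,\alpha})$ is $\beta$-level, i.e. its probability of type I error is equal to $\beta$.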
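The proof of Proposition 5.1 is an immediate consequence of Theorem 4.1. Since the testing procedures involved are independent for different orientations, the limiting distributions can be computed exactly and we do not need the probably conservative FWER control. For any $\varepsilon \in (0,\varepsilon_o)$,

\[ \sup_{f \in S(R,I)} P_f(\Lambda_{I,\varepsilon,\alpha} = 1) = \sup_{f \in S(R,I)} \Big( 1 - P_f\big( \max_{i \in I} \Delta_{i,j^{*}}(t_{\varepsilon,i,\alpha}) = 0 \big) \Big) = 1 - \left( 1 - \frac{\alpha}{2} \right)^{|I|} = \beta. \]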
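The level of the max-test is a direct computation from the formula above. The short Python sketch below evaluates $\beta$ for a given $\alpha$ and $|I|$, and inverts the relation to find the $\alpha$ that yields a target level; the function names are hypothetical and the inversion is a simple consequence of the displayed identity, not a statement from the paper.

```python
def max_test_level(alpha, n_orientations):
    """beta = 1 - (1 - alpha/2)^{|I|}, the type I error of the max-test."""
    return 1.0 - (1.0 - alpha / 2.0) ** n_orientations


def alpha_for_level(beta, n_orientations):
    """Per-orientation alpha giving overall level beta (inverse of the map above)."""
    return 2.0 * (1.0 - (1.0 - beta) ** (1.0 / n_orientations))


beta = max_test_level(0.05, 4)      # four orientations, alpha = 0.05
alpha = alpha_for_level(0.05, 4)    # alpha needed for an overall level of 0.05
```

For instance, with $\alpha = 0.05$ and $|I| = 4$, one gets $\beta = 1 - 0.975^{4} \approx 0.0963$.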
Remark 5.1. Under the alternative, we naturally model the estimand as a function in a $d$-variate smoothness space, here once again a Besov ball $B^{s}_{2,\infty}(R)$. Nevertheless, inspired by Remark 4.1, we can state another analogy with the maxiset approach. Let us consider the testing problem with the same hypotheses as in (8) but with a rate that is chosen to be the worst optimal one with respect to $s$, that is, $r_{\varepsilon,\mathbf{1}} = \varepsilon^{4\gamma/(1+4\gamma)}$. Since we are only concerned with the presence of information along at least one given direction that belongs to $I$, no assumption on the smoothness is needed in the other orientations. Hence, the sequence space in the alternative associated with the chosen rate $r_{\varepsilon,\mathbf{1}}$ can be taken larger than the initial one, $B^{s}_{2,\infty}(R)$. More precisely, it could be replaced by the sequence space defined by the union of all Besov balls $B^{s}_{2,\infty}(R)$ with $s \in (0,+\infty)^d$ such that $\sum_u i_u s_u^{-1} = \gamma^{-1}$ for at least one $i \in I$.
In what follows, we present some examples where the optimal testing procedures that combine different orientations could be useful.
5.1. Testing the structure of the goal function
Take the set $I(\delta_0) = \{ i \in \{0,1\}^d : |i| > \delta_0 \}$ that contains all orientations for which the number of contributing components of the $d$-vector is strictly larger than a specified value $\delta_0$. The corresponding hypotheses for testing the atomic dimension are:

\[ \begin{aligned} H_0 &: f \in S(R, I(\delta_0)) = \{ f : \|f\|_2 \le R,\ \delta_{B_d}(f) \le \delta_0 \}, \\ H_a &: f \in T(R, I(\delta_0)) = \{ f : f \in B^{s}_{2,\infty}(R),\ \exists\, i \in I(\delta_0),\ \|f_i\|_2 \ge C r_{\varepsilon,i} \}. \end{aligned} \tag{9} \]

Tests for the atomic dimension are more versatile than merely testing for additivity, which is the special case $\delta_0 = 1$. Concluding that a structure is simpler than a full $d$-dimensional object leads to an efficiency gain in estimation by using appropriate methods that escape (at least partly) the curse of dimensionality when $\delta(f)$ is smaller than $d$.

Combining tests over multiple orientations $i \in I$ can be done by computing the maximum of the testing procedures in the separate orientations.
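From our testing procedures $\Delta_{i,j^{*}}(t_{\varepsilon,i,\alpha})$, $i \neq \mathbf{0}$, there is a quite natural way to estimate the structure of the goal function $f$ by considering

\[ \hat{A}_{f,t_{\varepsilon,i,\alpha}} = \left\{ i \in \{0,1\}^d \setminus \mathbf{0} : \Delta_{i,j^{*}}(t_{\varepsilon,i,\alpha}) = 1 \right\} \]

as an estimator of $A_f$. Nevertheless, this estimator could lead to many false discoveries, that is, orientations $i \in \hat{A}_{f,t_{\varepsilon,i,\alpha}}$ such that $f_i$ is the null function. Obviously, the larger $\alpha$, the larger the expected number of false discoveries.

There is also a natural way to get an estimator $\hat\delta_{B_d}$ of the atomic dimension $\delta(f)$ of a $d$-dimensional signal $f$ by putting, for a chosen $\alpha \in (0,\tfrac12)$,

\[ \hat\delta_{B_d} = \begin{cases} \max\{ |i| ;\ i \in \hat{A}_{f,t_{\varepsilon,i,\alpha}} \} & \text{if } \hat{A}_{f,t_{\varepsilon,i,\alpha}} \neq \emptyset, \\ 0 & \text{if } \hat{A}_{f,t_{\varepsilon,i,\alpha}} = \emptyset. \end{cases} \]

Since $I(\delta_0)$ collects the orientations with $|i| > \delta_0$, rejecting the combined-orientations null hypothesis in (9) when $\hat\delta_{B_d} > \delta_0$ is equivalent to rejecting $H_0$ when $\Lambda_{I(\delta_0),\varepsilon,\alpha} = 1$.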
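The estimators of this subsection only require the individual decisions $\Delta_{i,j^{*}}(t_{\varepsilon,i,\alpha})$. The following Python sketch assumes that these decisions are already available in a dictionary mapping each orientation (a tuple of 0s and 1s) to 0 or 1; this layout and the function names are hypothetical.

```python
import itertools


def orientations(d):
    """All nonzero orientations i in {0,1}^d."""
    return [i for i in itertools.product((0, 1), repeat=d) if any(i)]


def atomic_dimension_index_set(d, delta0):
    """I(delta0): orientations with more than delta0 active components."""
    return [i for i in orientations(d) if sum(i) > delta0]


def estimated_active_set(decisions):
    """Orientations whose individual test rejects (decision equal to 1)."""
    return {i for i, decision in decisions.items() if decision == 1}


def estimated_atomic_dimension(decisions):
    """Largest |i| among rejected orientations, or 0 if none is rejected."""
    return max((sum(i) for i in estimated_active_set(decisions)), default=0)


def max_test(decisions, index_set):
    """Lambda_I: maximum of the individual decisions over the index set."""
    return max(decisions[i] for i in index_set)


# Toy decisions (assumption) for d = 3: main effects and one pairwise
# interaction are detected.
decisions = {i: 0 for i in orientations(3)}
for rejected in [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 0)]:
    decisions[rejected] = 1

delta_hat = estimated_atomic_dimension(decisions)
reject_additivity = max_test(decisions, atomic_dimension_index_set(3, 1))
```

In this toy example the estimated atomic dimension is 2 and the additivity hypothesis ($\delta_0 = 1$) is rejected, in line with the equivalence between $\hat\delta_{B_d} > \delta_0$ and $\Lambda_{I(\delta_0),\varepsilon,\alpha} = 1$.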
5.2. Reduction of model: selection of variables
Another application is to test whether some variables among $x_1,\dots,x_d$ may be omitted from the model or not. Consider the case of leaving out variable $x_m$ for some $m \in \{1,\dots,d\}$. In this case $I_m = \{ i = (i_1,\dots,i_m,\dots,i_d) \in \{0,1\}^d : i_m = 1 \}$ and the relevant hypotheses are

\[ \begin{aligned} H_0 &: f \in U(R, m) = \{ f : \|f\|_2 \le R,\ \forall i \in I_m,\ f_i(x) = 0,\ \forall x \in [0,1)^d \}, \\ H_a &: f \in V(R, m) = \{ f : f \in B^{s}_{2,\infty}(R),\ \exists\, i \in I_m,\ \|f_i\|_2 \ge C r_{\varepsilon,i} \}. \end{aligned} \]

In addition to possibly reducing the model, such a testing problem could find applications in modeling time series (see Section 6 for details).
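The index set $I_m$ is easy to enumerate. The Python sketch below lists the orientations involving variable $x_m$ and combines hypothetical individual decisions with a maximum, as in Proposition 5.1; the function names and the example decisions are assumptions for illustration.

```python
import itertools


def variable_index_set(d, m):
    """I_m: orientations i in {0,1}^d with i_m = 1 (m is 1-based)."""
    return [i for i in itertools.product((0, 1), repeat=d) if i[m - 1] == 1]


def variable_is_relevant(decisions, d, m):
    """1 if at least one individual test in I_m rejects, 0 otherwise."""
    return max(decisions.get(i, 0) for i in variable_index_set(d, m))


index_set = variable_index_set(3, 2)       # orientations involving x_2
decisions = {(1, 0, 0): 1, (1, 0, 1): 1}   # hypothetical rejections
keep_x2 = variable_is_relevant(decisions, 3, 2)
keep_x3 = variable_is_relevant(decisions, 3, 3)
```

Since $|I_m| = 2^{d-1}$, the level of the corresponding max-test follows from Proposition 5.1 with $|I| = 2^{d-1}$.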
Following the two previous paragraphs, an interesting point must be underlined here. Estimating the atomic dimension of a $d$-dimensional object or reducing the number of its relevant variables can be considered as a first step to guarantee that the object is a function of a number of variables smaller than $d$ (usually called a 'sparse variable function'). This assumption was often made in recent works on estimation problems, as in Autin et al. [3], and on testing problems, as in Ingster and Suslina [19]. Our methodology also makes it possible to test from the data whether this assumption is reasonable or not.
6. Numerical experiments
In this section we first describe a practical implementation of the adaptive method. We present its performance in terms of empirical power curves with respect to the signal-to-noise ratio for testing the atomic dimension of some multidimensional functions corrupted by additive Gaussian white noise. Then, we propose an application in time series analysis that consists of exploring the structure of the spatio-spectrum of spatio-temporal processes.
6.1. Data-driven structure detection
To link our theory to the practical setting, we refer the reader to Reiss [30] for the equivalence in Le Cam's sense between the multidimensional nonparametric regression problem (11) and the Gaussian white noise model (2). This equivalence is assumed to hold and typically relies on some minimal regularity conditions and an appropriate calibration of the noise level $\varepsilon = \sigma N^{-d/2}$. Let $\zeta_{l_1,\dots,l_d}$ be i.i.d. $N(0,1)$, and

\[ Y_{l_1,\dots,l_d} = f\!\left( \frac{l_1}{N}, \dots, \frac{l_d}{N} \right) + \sigma \zeta_{l_1,\dots,l_d}, \qquad 1 \le l_u \le N,\ 1 \le u \le d. \tag{11} \]
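The regression model (11) and the noise calibration can be simulated directly. The Python sketch below is a minimal illustration using only the standard library; the function names, the seed handling and the example function are assumptions.

```python
import itertools
import random


def noise_level(sigma, n_per_axis, d):
    """Calibration eps = sigma * N^(-d/2) linking (11) to the white noise model (2)."""
    return sigma * n_per_axis ** (-d / 2.0)


def simulate_regression(f, n_per_axis, d, sigma, seed=0):
    """Observations Y_l = f(l_1/N, ..., l_d/N) + sigma * zeta_l on the full grid.

    Returns a dict mapping the index tuple l = (l_1, ..., l_d), with
    1 <= l_u <= N, to the noisy observation.
    """
    rng = random.Random(seed)
    data = {}
    for l in itertools.product(range(1, n_per_axis + 1), repeat=d):
        x = tuple(l_u / n_per_axis for l_u in l)
        data[l] = f(*x) + sigma * rng.gauss(0.0, 1.0)
    return data


def example_function(x1, x2, x3):
    """Hypothetical additive test function (not one of the paper's examples)."""
    return x1 + x2 ** 2 + 0.5 * x3


eps = noise_level(1.0, 32, 3)
sample = simulate_regression(example_function, n_per_axis=4, d=3, sigma=0.1)
```

With $N = 32$, $d = 3$ and $\sigma = 1$, the equivalent noise level is $\varepsilon = 32^{-3/2} \approx 0.0055$, which lies below the bound $\varepsilon_o = e^{-e}$ assumed in Section 3.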
Both the $B^{s}_{2,\infty}$-minimax and the $B^{s}_{2,\infty}$-adaptive minimax optimal procedures studied previously truncate the wavelet coefficient sequence at scales that are calibrated using knowledge of the smoothness of the estimand (either all the directional regularities or only their harmonic sum), which is unknown in practice. Following the idea of sieve estimation described in Birgé and Massart [5], we build a data-driven version of the adaptive testing procedure. In this numerical part, we consider only the $B^{s}_{2,\infty}$-adaptive minimax optimal method since its algorithmic complexity is of about $O(\log_2(N))$ compared to $O(\log_2(N)^{|i|})$ for the $B^{s}_{2,\infty}$-minimax optimal method.
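Let us consider the set of all possible adaptive models $F_{M^{A}_{N}}$ based on $\varepsilon^{-2}$ observations,

\[ F_{M^{A}_{N}} = \Big\{ \hat f_{m^{A}} = \hat\theta^{\mathbf{0}}_{\mathbf{0},\mathbf{0}}\, \phi_{0,0} + \sum_{i \neq \mathbf{0}} \sum_{(j,k) \in m^{A}_{i}} \hat\theta^{i}_{j,k}\, \psi^{i}_{j,k} ;\ m^{A} := \{ m^{A}_{i} \in M^{A}_{N,i} \} \Big\}, \]

with

\[ M^{A}_{N,i} := \Big\{ m_{i,J} := \big\{ (j,k) \in \mathbf{J}_i \times \mathbf{K}_j ;\ |j| \le J ;\ j_u \le \log_2(N),\ \forall u \big\} ;\ J \le |i| \log_2(N) \Big\}. \]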
Theoracleapproachleadstoconsider theestimatorfˆmA
o ofFMAN suchthatE ˆfmAo − f
2
2 isminimal.This
correspondsto findmA, denotedas mA
o, suchthatthefollowingquantityisminimal:
− (i,j,k)∈mA (θij,k)2− σ 2 Nd .
Theoptimizerisfoundbysolvingthefollowing problemforeveryi∈ {0,1}d\{0},
mAi,o= arg min
mA i∈MAN,i ⎧ ⎨ ⎩− (j,k)∈mA i (θj,ki )2− σ 2 Nd ⎫⎬ ⎭.
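In practice, we plug in empirical quantities and adjust for the variability in the data, proposing the following optimization, with $\hat\lambda = \sqrt{\hat\sigma^{2}\, d\, N^{-d} \log N}$ the universal threshold,
$$\hat m^A_{i,o} = \arg\min_{m^A_i \in \mathcal{M}^A_{N,i}} \Big\{ - \sum_{(j,k)\in m^A_i} \Big( (\hat\theta^{i}_{j,k})^2 - \hat\lambda^2 \Big) \Big\}. \tag{12}$$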
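Because the candidate models $m_{i,J}$ are nested in $J$, the minimization in (12) reduces to a cumulative sum over levels. The sketch below illustrates this for one orientation; the function name `select_truncation_level` and the input layout (one list of empirical coefficients per level $|j|$, in increasing order) are assumptions made for illustration, and the threshold `lam` is passed in rather than computed.

```python
def select_truncation_level(coeffs_by_level, lam):
    """Pick the truncation level J minimising the penalised criterion.

    coeffs_by_level[J] holds the empirical coefficients added to the model
    when the truncation level increases to J (hypothetical layout).
    The criterion at level J is -sum over levels <= J of (theta_hat^2 - lam^2).
    """
    criterion = []
    running = 0.0
    for level_coeffs in coeffs_by_level:
        running -= sum(c * c - lam * lam for c in level_coeffs)
        criterion.append(running)
    best_level = min(range(len(criterion)), key=criterion.__getitem__)
    return best_level, criterion


# Toy usage: two informative levels followed by a level of small coefficients.
coeffs = [[3.0], [2.0, 2.0], [0.1, 0.1, 0.1, 0.1]]
best_level, criterion = select_truncation_level(coeffs, lam=1.0)
```

In this toy input the third level only contains coefficients below the threshold, so including it increases the criterion and the second level is selected.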
6.2. Empirical power functions
In this section we consider 3-dimensional functions $f$ and we test for their atomic dimension, $H_0: \delta(f) = 1$ versus $H_1: \delta(f) > 1$. Therefore we consider the hyperbolic Haar wavelet basis so that $\delta_{B^d}(f) = \delta(f)$. For the sake of simplicity, we generate them using the Sobol decomposition of a $d$-variate function $f \in L_2([0,1)^d)$ into orthogonal summands of growing dimensions,
$$f(x_1, \dots, x_d) = \sum_{u=1}^{d}\ \sum_{i_1 < \dots < i_u} f_{i_1 \dots i_u}(x_{i_1}, \dots, x_{i_u}). \tag{13}$$
The marginal functional components are chosen to be univariate functions that are frequently considered in the wavelet literature [2]. The interaction terms are obtained as weighted products of the marginal functions. In this experiment we choose the test functions described below.
A: $f_1(x_1)$: 'blip', $f_2(x_2)$: 'blip', $f_3(x_3)$: 'blip',
B: $f_1(x_1)$: 'blip', $f_2(x_2)$: 'wave', $f_3(x_3)$: 'bumps'.
We consider the null hypothesis with $\delta(f) = 1$ and two functions in the alternative, with atomic dimension $\delta(f) = 2$ or $\delta(f) = 3$.
• under $H_0$: $\delta(f) = 1$, $f(x_1, x_2, x_3) = f_1(x_1) + f_2(x_2) + f_3(x_3)$
• under $H_1$: $\delta(f) = 2$, $f(x_1, x_2, x_3) = f_1(x_1) + f_2(x_2) + f_3(x_3) + D f_1(x_1) f_2(x_2)$
• under $H_1$: $\delta(f) = 3$, $f(x_1, x_2, x_3) = f_1(x_1) + f_2(x_2) + f_3(x_3) + D f_1(x_1) f_2(x_2) + D f_1(x_1) f_2(x_2) f_3(x_3)$.
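The three constructions above can be sketched as a small function factory. The marginals `g1`, `g2`, `g3` below are hypothetical smooth placeholders, not the 'blip', 'wave' and 'bumps' functions of the wavelet literature, and the name `make_test_function` is an assumption for illustration.

```python
import math


def make_test_function(f1, f2, f3, D, delta):
    """Build f(x1, x2, x3) following the three bullet points above.

    delta = 1: additive function; delta = 2: adds D*f1*f2;
    delta = 3: additionally adds D*f1*f2*f3.
    """
    def f(x1, x2, x3):
        value = f1(x1) + f2(x2) + f3(x3)
        if delta >= 2:
            value += D * f1(x1) * f2(x2)
        if delta >= 3:
            value += D * f1(x1) * f2(x2) * f3(x3)
        return value
    return f


# Hypothetical placeholder marginals (assumptions, for illustration only).
g1 = lambda x: math.sin(2.0 * math.pi * x)
g2 = lambda x: math.cos(2.0 * math.pi * x)
g3 = lambda x: x * (1.0 - x)

f_null = make_test_function(g1, g2, g3, D=2.0, delta=1)
f_alt2 = make_test_function(g1, g2, g3, D=2.0, delta=2)
f_alt3 = make_test_function(g1, g2, g3, D=2.0, delta=3)
```

A quick sanity check of additivity is that the mixed second difference in $(x_1, x_2)$ vanishes for `f_null` but not for `f_alt2`.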
Here, Fig. 1 gives an illustration of some of these test functions.
Fig. 1. Three-dimensional test functions that are used in the numerical experiments.

Fig. 2. Simulated rejection probabilities under $H_0$ and $H_1$ for both sets of test functions A and B, as a function of the signal to noise ratio.

The total energy of these functions $f(x_1, x_2, x_3)$ is normalized. Then the observed data are generated by adding Gaussian white noise to the test functions. The empirical power is computed as a function of the SNR as follows:
$$P(j) = \frac{1}{M} \sum_{m=1}^{M} \mathbf{1}\big\{ \Lambda(\delta_0)_{m,\mathrm{SNR}_j} = 1 \big\},$$
where the signal to noise ratio (SNR) is defined as the ratio of the standard deviation of the function values to the standard deviation of the noise. $\Lambda(\delta_0)_{m,\mathrm{SNR}_j}$ is the result of the procedure $\Lambda_{I(\delta_0),\varepsilon,\alpha}$ (with $N \in \{32, 64\}$), for $1 \le j \le 50$ and with $M = 100$ Monte Carlo replications at every value of $\mathrm{SNR}_j$. The results are summarized in Fig. 2.
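The Monte Carlo computation of $P(j)$ can be sketched as follows. The testing procedure itself is not reproduced: `test` is a user-supplied callable returning a truthy value on rejection, a hypothetical stand-in for the procedure used in the experiments. The name `empirical_power` and the flattened-list representation of the function values are also assumptions.

```python
import math
import random


def empirical_power(signal, test, snr_grid, M=100, seed=0):
    """Estimate rejection frequencies over a grid of (positive) SNR values.

    signal   : flat list of noiseless function values on the grid
    test     : callable taking the noisy values, truthy when H0 is rejected
    snr_grid : SNR_j = sd(signal) / sd(noise), one entry per j
    """
    rng = random.Random(seed)
    mean = sum(signal) / len(signal)
    sd_signal = math.sqrt(sum((v - mean) ** 2 for v in signal) / len(signal))
    power = []
    for snr in snr_grid:
        sd_noise = sd_signal / snr  # noise level implied by the SNR definition
        rejections = 0
        for _ in range(M):
            noisy = [v + sd_noise * rng.gauss(0.0, 1.0) for v in signal]
            rejections += 1 if test(noisy) else 0
        power.append(rejections / M)
    return power
```

Plotting the returned list against `snr_grid`, once for a function under the null and once for each alternative, gives curves of the kind summarized in Fig. 2.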
Fig. 2 shows that for both test functions, the nominal level of the test at $\alpha = 5\%$ under the null hypothesis $H_0$ is maintained over the different values of the SNR. Under the alternative, we observe firstly that the detection appears at very low SNR values, secondly that an increase in the sample size improves the detection power of the method, and thirdly that, under the two different alternatives, the detection power decreases when the energy of the function is more spread over orientations ($\delta(f) = 3$). These tests confirm the proper calibration of the data-driven adaptive method. We can now apply it to a more realistic data example.
6.3. Application in time series analysis
Neumann and von Sachs [28] consider testing the stationarity of a time series by studying the structure of its time-varying spectrum. They form local contrast test statistics in the time direction, also based on hyperbolic wavelet coefficients of an estimator of the time-varying spectral density, namely the preperiodogram. A crucial problem for the practical application of this method in testing stationarity is due to the inherent nature of the preperiodogram data, which suffer from interferences, very low SNR and a non-Gaussian noise distribution. This may strongly affect the performance of the data-driven choice of the truncation scales. Adapting our testing methodology to this context is an interesting research problem in itself. In the sequel, we test the structure of the spatio-spectrum of spatio-temporal processes that are time stationary. A specific concern for this application is the high-dimensional setting. We face the problem of having small p-values due to a large number of observations [23]. We illustrate how to properly use our results to assess the structure of such data. We first define a class of spatio-temporal processes and its spatio-spectrum, explain how to estimate it, and finally apply our data-driven method to simulated data.
6.3.1. Spatio-temporal model
Following Ombao et al. [29], we define the Cramér representation of a spatio-temporal process $\{X_t(u);\ u = (u_1, u_2) \in [0,1)^2,\ t \in \mathbb{Z}\}$, where $u$ is the spatial index, as follows:
$$X_t(u) = \int_{-\pi}^{\pi} A(u, \omega) \exp(i\omega t)\, dZ(\omega) \tag{14}$$
where $Z(\omega)$ is a complex-valued stochastic process with zero mean and orthogonal increments, which satisfies
$$\mathbb{E}[dZ(\omega)] = 0, \qquad \mathrm{Cov}\big(dZ(\omega_1), dZ(\omega_2)\big) = \mathrm{Dirac}(\omega_1 - \omega_2)\, d\omega_1$$
where $\mathrm{Dirac}(\cdot)$ means the Dirac function at mass point 0 and $A(u, \omega)$ is the spatial complex-valued transfer function. It is Hermitian, $A(\cdot, -\omega) = A^*(\cdot, \omega)$, where $A^*$ is the adjoint operator. The location-dependent spectrum is given by
$$f(u, \omega) := |A(u, \omega)|^2.$$

6.3.2. Estimation
In a practical setting we observe the spatio-temporal process over a discrete grid of dimensions $(N_1 \times N_2 \times T) \in \mathbb{N}^3$. Since we assume it to be time stationary, we can compute the bias-corrected log-periodogram at every spatial location
$$\log\Bigg( \frac{1}{2\pi T} \Big| \sum_{t=1}^{T} X_t(u) \exp(i\omega t) \Big|^2 \Bigg) + \kappa$$
where $\kappa = 0.57721$ is the Euler–Mascheroni constant, as in Wahba [37].
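For a single spatial location, the bias-corrected log-periodogram can be sketched directly from the display above. The function name `log_periodogram` and the plain-sum evaluation of the Fourier sum are assumptions made for illustration; an FFT would be used in practice to evaluate all Fourier frequencies at once.

```python
import cmath
import math

EULER_MASCHERONI = 0.57721  # value quoted in the text


def log_periodogram(x, omega):
    """Bias-corrected log-periodogram of the series x_1, ..., x_T at frequency omega.

    Raises ValueError if the periodogram ordinate is exactly zero.
    """
    T = len(x)
    fourier_sum = sum(x_t * cmath.exp(1j * omega * t)
                      for t, x_t in enumerate(x, start=1))
    ordinate = abs(fourier_sum) ** 2 / (2.0 * math.pi * T)
    return math.log(ordinate) + EULER_MASCHERONI


# Toy usage on a short constant series at frequency zero.
value = log_periodogram([1.0, 1.0, 1.0, 1.0], omega=0.0)
```

Applying this function at every spatial location and every Fourier frequency yields the three-dimensional array on which the structure tests operate.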
Fig. 3. Spatial AR(1) parameter for the time series application. The value of the AR(1) parameter at each spatial location ranges from 0.2 (black) to 0.99 (white color).
From these spatial log-periodograms we test the structure of the spatio-spectrum $f(u, \omega)$. Hereafter we directly apply our methodology without adaptation for the non-Gaussian nature of the log-periodogram ordinates. The central limit theorem in the coefficient domain permits us to consider that the wavelet coefficients are approximately normally distributed up to comparatively fine scales. The actual procedure could be improved in that specific context, for example by adapting the methodology of Gao [9] in the penalization of the data-driven selection of the truncation scales. Nevertheless, omitting this adaptation helps us to illustrate the practical application of our method with a slightly suboptimal procedure.
In the previous section we checked that our method is well calibrated. In this time series example, we are away from this idealized framework. We are now dealing with periodogram data that are non-Gaussian, and the spatio-spectrum may have a complicated structure (in terms of energy along the different orientations). In this high-dimensional and more realistic context, the p-values no longer have any practical importance. For (very) large sample sizes, p-values are almost always extremely small, suggesting the rejection of the null hypothesis since, with real-life data, the situation of total absence of information in some orientation is rarely exactly true. A solution in this context is to use our test statistic values to describe the proportions of the total energy of the function that lie in the different orientations, similarly to principal components analysis. This is the approach adopted hereafter for the time series data examples.
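The idea of reporting energy proportions per orientation can be illustrated with a simplified plug-in sketch. It computes the Sobol (ANOVA) components of gridded values from marginal averages and reports the share of energy carried by each orientation $i \in \{0,1\}^d \setminus \{0\}$. This is an assumption-laden simplification: it uses plain grid averages instead of the wavelet-based test statistics, applies no truncation and no noise debiasing, and the name `energy_repartition` and the dictionary input are hypothetical.

```python
import itertools


def energy_repartition(values, shape):
    """Percentage of energy per orientation for values on a full grid.

    values : dict mapping index tuples (one index per axis) to function values
    shape  : grid size along each axis
    Assumes the values are not constant (the total energy must be positive).
    """
    d = len(shape)
    axes = range(d)
    subsets = [S for r in range(d + 1) for S in itertools.combinations(axes, r)]
    # Marginal means m_S(x_S): average over the coordinates outside S.
    sums = {S: {} for S in subsets}
    for idx, v in values.items():
        for S in subsets:
            key = tuple(idx[a] for a in S)
            sums[S][key] = sums[S].get(key, 0.0) + v
    means = {}
    for S in subsets:
        n_outside = 1
        for a in axes:
            if a not in S:
                n_outside *= shape[a]
        means[S] = {k: s / n_outside for k, s in sums[S].items()}
    # Inclusion-exclusion: f_S = sum over T subset of S of (-1)^(|S|-|T|) m_T.
    energy = {}
    for S in subsets:
        if not S:
            continue  # the constant term carries no orientation
        total = 0.0
        for key in means[S]:
            comp = 0.0
            for r in range(len(S) + 1):
                for pos in itertools.combinations(range(len(S)), r):
                    T = tuple(S[p] for p in pos)
                    sub_key = tuple(key[p] for p in pos)
                    comp += (-1) ** (len(S) - r) * means[T][sub_key]
            total += comp * comp
        energy[S] = total / len(means[S])
    grand = sum(energy.values())
    return {tuple(1 if a in S else 0 for a in axes): 100.0 * e / grand
            for S, e in energy.items()}


# Toy usage: a function of the last coordinate only, on a 2 x 2 x 4 grid.
toy_shape = (2, 2, 4)
toy_values = {idx: float(idx[2])
              for idx in itertools.product(*(range(n) for n in toy_shape))}
toy_shares = energy_repartition(toy_values, toy_shape)
```

With noisy data the plug-in energies are biased upwards by the noise variance in every orientation, which is one reason to prefer debiased statistics when the shares are small.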
6.3.3. Results
We generate the following time series models using a discretized version of their Cramér representation given by equation (14). We consider two spatial AR(1) processes, i.e., their spatio-spectrum is given by
$$f(u, \omega) = \frac{\sigma^2}{2\pi} \big( 1 - 2\phi(u) \cos\omega + \phi(u)^2 \big)^{-1}.$$
Let 'Model 1' have a constant parameter over space, i.e., $\phi(u) = \phi = 0.9$, $\forall u \in [0,1]^2$, and let 'Model 2' have a complex spatial AR(1) structure over space, as illustrated in Fig. 3.
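The AR(1) spectrum is simple to evaluate at a given location. The sketch below uses the constant parameter $\phi = 0.9$ of Model 1; the function name `ar1_spectrum` is an assumption, and the spatially varying map $\phi(u)$ of Model 2 is not reproduced. As a sanity check, the spectrum is integrated numerically over $[-\pi, \pi]$ and compared with the stationary AR(1) variance $\sigma^2 / (1 - \phi^2)$.

```python
import math


def ar1_spectrum(omega, phi, sigma2=1.0):
    """Spectral density of an AR(1) process with parameter phi, |phi| < 1."""
    return sigma2 / (2.0 * math.pi) / (1.0 - 2.0 * phi * math.cos(omega) + phi ** 2)


def integrate_spectrum(phi, sigma2=1.0, n=20000):
    """Midpoint rule for the integral of the spectrum over [-pi, pi]."""
    h = 2.0 * math.pi / n
    return h * sum(ar1_spectrum(-math.pi + (k + 0.5) * h, phi, sigma2)
                   for k in range(n))


phi_model1 = 0.9  # constant parameter of Model 1
variance_numeric = integrate_spectrum(phi_model1)
variance_exact = 1.0 / (1.0 - phi_model1 ** 2)
```

For values of $\phi$ close to 1 the spectrum is sharply peaked at $\omega = 0$, so most of the energy of its logarithm is expected to lie along the frequency direction.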
Table 1 gives the result of the Monte Carlo experiments on generated time series data for $N_1 = N_2 = 128$ and $T = 128$. Looking at the energy repartition for the logarithm of the true spatio-spectrum under Model 1, it is clear that all the information is contained in the marginal function of frequency. Under Model 2, one can see that the difference in energy among the orientations involving the space variable is quite small. Regarding the estimation of the repartition over the different orientations, there is no problem in identifying the leading orientation; nevertheless, under Model 2, we do not get a clear message about the second dominant orientation.
Table 1
True vs estimated repartition of the total energy (in %).

Orientation i   Model 1                          Model 2
                True repartition   Estimated     True repartition   Estimated
(1, 0, 0)       0                  0 (0.013)     0.1                0.1 (0.09)
(0, 1, 0)       0                  0 (0.01)      0                  0.1 (0.078)
(1, 1, 0)       0                  0 (0.01)      0                  0.1 (0.069)
(0, 0, 1)       100                99.9 (0.014)  91.8               90.5 (0.150)
(1, 0, 1)       0                  0 (0.001)     4.9                3.3 (0.051)
(0, 1, 1)       0                  0 (0.001)     3.1                5.9 (0.074)
(1, 1, 1)       0                  0 (0.001)     0                  0 (0.008)

7. Discussion
In this paper we describe how the Sobol decomposition, in relation to the geometry of the multidimensional hyperbolic wavelet basis, allows us to build simple test statistics with strong theoretical performance in the presence of an anisotropic estimand. Appropriately combined, these test statistics allow us to test for general structural properties such as the atomic dimension. We determine the minimax rates for separating the null and alternative hypotheses for detecting the Sobol functional component in cases of full or partial knowledge about the smoothness parameter. We propose two sequences of testing procedures that are based on different levels of knowledge of the smoothness parameter (fully or partially known) and we show that they are asymptotically optimal in the minimax sense. Interestingly, we observe on the one hand that a loss of information about the smoothness parameter is accompanied by a deterioration of the minimax rate of separation; on the other hand, the set of functions in the alternative hypothesis can be larger. Finally, we describe a methodology for a data-driven version of the testing procedure and its application to time series data.
In the context of time series analysis, we consider extending our application example to deal with spatial and time non-stationary processes. We are therefore interested in the spatial time-varying spectrum of these processes. We can compute at every spatial location an estimator of the time-varying spectrum such as the preperiodogram (see Neumann and von Sachs [28]). Neumann and von Sachs [27] provide arguments about the asymptotic equivalence of the preperiodogram estimation problem, the Gaussian white noise model and the multidimensional nonparametric regression model, so that our theory can readily be extended to this context. Nevertheless, in a practical setting, the nature of the preperiodogram data requires the development of an adapted methodology for choosing the truncation scales. In combination with the fully adaptive testing procedure, such a method should prove successful in practical applications. Already in the context of estimation of the time-varying spectral density, the adaptive smoothing of the preperiodogram is still an active research topic (see for instance van Delft and Eichler [36]).
Acknowledgments
The authors would like to thank the two reviewers for useful comments and suggestions on a preliminary version of the manuscript. G. Claeskens and J.-M. Freyermuth acknowledge the support of the Fund for Scientific Research Flanders, KU Leuven grant GOA/12/14 and of the IAP Research Network P7/06 of the Belgian Science Policy. Jean-Marc Freyermuth's and John Aston's research was supported by the Engineering and Physical Sciences Research Council [EP/K021672/2].
Appendix A
A.1. Proof of Theorem 4.1
The proof of Theorem 4.1 is a direct consequence of Propositions A.1 and A.2, which respectively deal with the upper bound and the lower bound of the minimax result.
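Proposition A.1. Under the same assumptions as Theorem 4.1 and for any $C > C_{i,\alpha}$, the following inequalities hold
$$\sup_{0<\varepsilon<\varepsilon_o}\ \sup_{f\in\mathcal{N}_i(R)} \mathbb{P}_f\big(\Delta_{i,j^*}(t_{\varepsilon,i,\alpha}) = 1\big) = \frac{\alpha}{2} \quad\text{and}\quad \sup_{0<\varepsilon<\varepsilon_o}\ \sup_{f\in\mathcal{A}_i(R,C,s,r_{\varepsilon,i})} \mathbb{P}_f\big(\Delta_{i,j^*}(t_{\varepsilon,i,\alpha}) = 0\big) \le \frac{\alpha}{2}.$$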
Remark A.1. The constant $C_{i,\alpha}$ corresponds to the absolute positive constant $C_{i,\alpha}$ appearing in Definition 3.1 and will be given at the end of the proof below.
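Proof. Consider $\varepsilon \in (0, \varepsilon_o)$. Suppose that $f \in \mathcal{N}_i(R)$. Then $\varepsilon^{-2} T_{i,j^*}$ has a Chi-Square distribution with $2^{|j^*|}$ degrees of freedom. Hence
$$\mathbb{P}_f\big(\Delta_{i,j^*}(t_{\varepsilon,i,\alpha}) = 1\big) = \mathbb{P}_f\big(T_{i,j^*} > t_{\varepsilon,i,\alpha}\big) = \mathbb{P}_f\Big(\varepsilon^{-2} T_{i,j^*} > \chi^{1-\alpha/2}_{2^{|j^*|}}\Big) = \frac{\alpha}{2}.$$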
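Suppose now that $f \in \mathcal{A}_i(R,C,s,r_{\varepsilon,i})$. Then $\varepsilon^{-2} T_{i,j^*}$ has a noncentral Chi-Square distribution with $2^{|j^*|}$ degrees of freedom. Moreover, if $V^i_{j^*}$ denotes the linear span of
$$\big\{ \psi^i_{j,k} :\ j \in J_i,\ j_u < \max(j^*_u, 1)\ \forall u,\ \text{and } k \in K_j \big\},$$
then,
$$\mathbb{E}_f(T_{i,j^*}) = \varepsilon^2 2^{|j^*|} + \sum_{j\in J_i:\ j_u<\max(j^*_u,1),\,\forall u}\ \sum_{k\in K_j} \big(\theta^i_{j,k}\big)^2 = \varepsilon^2 2^{|j^*|} + \big\|\mathrm{Proj}_{V^i_{j^*}}(f_i)\big\|_2^2,$$
$$\mathrm{Var}_f(T_{i,j^*}) = 2\varepsilon^4 2^{|j^*|} + 4\varepsilon^2 \sum_{j\in J_i:\ j_u<\max(j^*_u,1),\,\forall u}\ \sum_{k\in K_j} \big(\theta^i_{j,k}\big)^2 = 2\varepsilon^4 2^{|j^*|} + 4\varepsilon^2 \big\|\mathrm{Proj}_{V^i_{j^*}}(f_i)\big\|_2^2.$$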
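The two moment formulas can be recovered from a one-coordinate computation. The following is a sketch under the assumption, suggested by the Chi-Square statements above, that the statistic is a sum of squared empirical coefficients; the shorthand $y = \theta + \varepsilon\xi$ with $\xi \sim N(0,1)$ for one empirical coefficient and $n$ for the number of coefficients (here $n = 2^{|j^*|}$) is introduced only for this illustration.

```latex
\begin{align*}
\mathbb{E}(y^2) &= \theta^2 + 2\varepsilon\theta\,\mathbb{E}(\xi) + \varepsilon^2\,\mathbb{E}(\xi^2)
                 = \theta^2 + \varepsilon^2, \\
\operatorname{Var}(y^2) &= \operatorname{Var}\big(2\varepsilon\theta\,\xi + \varepsilon^2\xi^2\big)
                 = 4\varepsilon^2\theta^2\operatorname{Var}(\xi) + \varepsilon^4\operatorname{Var}(\xi^2)
                 = 4\varepsilon^2\theta^2 + 2\varepsilon^4,
\end{align*}
% the cross term vanishes because Cov(xi, xi^2) = E(xi^3) = 0.
% Summing over n independent coordinates:
\begin{align*}
\mathbb{E}\Big(\sum y^2\Big) = n\varepsilon^2 + \sum\theta^2, \qquad
\operatorname{Var}\Big(\sum y^2\Big) = 2n\varepsilon^4 + 4\varepsilon^2\sum\theta^2 .
\end{align*}
```

With $\sum\theta^2$ equal to the squared norm of the projection of $f_i$ onto the span, this matches the two displays term by term.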
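By considering the Cramér–Chernoff method (see Chapter 2 of Massart [25]), we easily obtain the following concentration inequality for the noncentral Chi-Square distribution with $2^{|j^*|}$ degrees of freedom,
$$\mathbb{P}_f\big(\mathbb{E}_f(T_{i,j^*}) - T_{i,j^*} \ge u\big) \le \exp\Big(-\frac{u^2}{2\,\mathrm{Var}_f(T_{i,j^*})}\Big), \qquad \forall u > 0.$$
So,
$$\begin{aligned}
\mathbb{P}_f\big(\Delta_{i,j^*}(t_{\varepsilon,i,\alpha}) = 0\big)
&= \mathbb{P}_f\big(T_{i,j^*} \le t_{\varepsilon,i,\alpha}\big)
 = \mathbb{P}_f\big(\mathbb{E}_f(T_{i,j^*}) - T_{i,j^*} \ge \mathbb{E}_f(T_{i,j^*}) - t_{\varepsilon,i,\alpha}\big) \\
&\le \exp\Bigg(-\frac{\big(\mathbb{E}_f(T_{i,j^*}) - t_{\varepsilon,i,\alpha}\big)^2}{2\,\mathrm{Var}_f(T_{i,j^*})}\Bigg) \\
&= \exp\Bigg(-\frac{\Big(\varepsilon^2 2^{|j^*|} + \big\|\mathrm{Proj}_{V^i_{j^*}}(f_i)\big\|_2^2 - t_{\varepsilon,i,\alpha}\Big)^2}{4\varepsilon^4 2^{|j^*|} + 8\varepsilon^2 \big\|\mathrm{Proj}_{V^i_{j^*}}(f_i)\big\|_2^2}\Bigg) \\
&\le \exp\Bigg(-\frac{\Big(\varepsilon^2 2^{|j^*|} + \|f_i\|_2^2 - \big\|f_i - \mathrm{Proj}_{V^i_{j^*}}(f_i)\big\|_2^2 - t_{\varepsilon,i,\alpha}\Big)^2}{4\varepsilon^4 2^{|j^*|} + 8\varepsilon^2 \|f_i\|_2^2}\Bigg),
\end{aligned}$$
where the last step rewrites the numerator with the Pythagorean identity and uses $\big\|\mathrm{Proj}_{V^i_{j^*}}(f_i)\big\|_2 \le \|f_i\|_2$ in the denominator.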
Remark that $f_i$ belongs to $B^{s}_{2,\infty}(R)$ because $f$ does.
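Since $\|f_i\|_2^2 \ge C^2 r_{\varepsilon,i}^2$ and $\chi^{1-\alpha/2}_{2^{|j^*|}} \le 2^{|j^*|} + 2\big(2^{|j^*|}\log(2\alpha^{-1})\big)^{1/2}$ (see the exponential inequality for the Chi-Square distribution given in Lemma 1 of Laurent and Massart [21]), one gets,
$$\begin{aligned}
\Big(-\log \mathbb{P}_f\big(\Delta_{i,j^*}(t_{\varepsilon,i,\alpha}) = 0\big)\Big)^{-1}
&\le \frac{4\varepsilon^4 2^{|j^*|} + 8\varepsilon^2 \|f_i\|_2^2}{\Big(\varepsilon^2 2^{|j^*|} + \|f_i\|_2^2 - \big\|f_i - \mathrm{Proj}_{V^i_{j^*}}(f_i)\big\|_2^2 - t_{\varepsilon,i,\alpha}\Big)^2} \\
&\le \frac{4\varepsilon^4 2^{|j^*|} + 8\varepsilon^2 \|f_i\|_2^2}{\Big(\varepsilon^2 2^{|j^*|} + \|f_i\|_2^2 - R^2 2^{-2|j^*|\gamma_i} - t_{\varepsilon,i,\alpha}\Big)^2} \\
&\le \frac{4\varepsilon^4 2^{|j^*|}}{\Big((C^2 - R^2)\, 2^{-2|j^*|\gamma_i} - 2\varepsilon^2\big(2^{|j^*|}\log(2\alpha^{-1})\big)^{1/2}\Big)^2}
   + \frac{8\varepsilon^2 \|f_i\|_2^2}{\Big(\|f_i\|_2^2 - R^2 2^{-2|j^*|\gamma_i} - 2\varepsilon^2\big(2^{|j^*|}\log(2\alpha^{-1})\big)^{1/2}\Big)^2} \\
&= \frac{4\varepsilon^4 2^{(1+4\gamma_i)|j^*|}}{\Big(C^2 - R^2 - 2^{1+\frac{|i|}{2}}\big(\log(2\alpha^{-1})\big)^{1/2}\Big)^2}
   + \frac{8 C^4 \varepsilon^2 \|f_i\|_2^{-2}}{\Big(C^2 - R^2 - 2^{1+\frac{|i|}{2}}\big(\log(2\alpha^{-1})\big)^{1/2}\Big)^2} \\
&\le \frac{2^{2+(1+4\gamma_i)|i|} + 8 C^2 e^{-\frac{2e}{1+4\gamma_i}}}{\Big(C^2 - R^2 - 2^{1+\frac{|i|}{2}}\big(\log(2\alpha^{-1})\big)^{1/2}\Big)^2} \qquad (15) \\
&\le \big(\log(2\alpha^{-1})\big)^{-1}
\end{aligned}$$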
provided that $C \ge C_{i,\alpha}$. The constant $C_{i,\alpha}$ is the largest value of $C$ such that the last inequality can be replaced by an equality. That leads to $\mathbb{P}_f\big(\Delta_{i,j^*}(t_{\varepsilon,i,\alpha}) = 0\big) \le \frac{\alpha}{2}$ for any $C$ large enough and uniformly in $f$.
Note that the smaller $\alpha$, the larger $C_{i,\alpha}$. This is not surprising, because when $\alpha$ is chosen to be small the detection problem becomes harder. Nevertheless the order of the minimax rate does not depend on $\alpha$. We also note that (15) is obtained because we assumed that $\varepsilon < e^{-e}$. □
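Proposition A.2. Under the same assumptions as Theorem 4.1 and for any $C < c_{i,\alpha}$, the following inequality holds
$$\inf_{0<\varepsilon<\varepsilon_o}\ \inf_{\Delta} \Big\{ \sup_{f\in\mathcal{N}_i(R)} \mathbb{P}_f(\Delta = 1) + \sup_{f\in\mathcal{A}_i(R,C,s,r_{\varepsilon,i})} \mathbb{P}_f(\Delta = 0) \Big\} > \alpha.$$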
Remark A.2. The constant $c_{i,\alpha}$ corresponds to the absolute positive constant $c_{i,\alpha}$ appearing in Definition 3.1 and will be given at the end of the proof below.
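Proof. For any $\varepsilon \in (0, \varepsilon_o)$ let us consider
$$f_\zeta = \sum_{k\in K_{j^*}} \theta^i_{j^*,k}\, \psi^i_{j^*,k} = \sum_{k\in K_{j^*}} L \prod_{u=1}^{d} 2^{-j^*_u(\gamma_i + \frac12)}\, \zeta_k\, \psi^i_{j^*,k}$$
where $L > 0$ and $\zeta_k \in \{-1, 1\}$ for any $k$. The values $j^*_u$ are chosen as in the upper bound. We recall that
$$2^{-j^*_u} \le \varepsilon^{\frac{4 i_u \gamma_i}{s_u(1+4\gamma_i)}}.$$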
First we check that $f_\zeta$ is in the alternative, that is to say it belongs to the Besov ball of radius $R$ with the expected parameter of regularity and it is sufficiently separated from 0. According to the definition of the Besov ball with radius $R$, we have to check that
$$\max_{1\le u\le d}\big\{2^{2 j^*_u s_u}\big\} \sum_{k\in K_{j^*}} \big(\theta^i_{j^*,k}\big)^2 < R^2.$$
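Because of our choice of $\theta^i_{j^*,k}$, we have
$$\max_{1\le u\le d}\big\{2^{2 j^*_u s_u}\big\}\, 2^{\sum_{u=1}^{d} j^*_u}\, L^2 \prod_{u=1}^{d} 2^{-2 j^*_u(\gamma_i + \frac12)} = L^2 \max_{1\le u\le d}\big\{2^{2 j^*_u s_u}\big\} \prod_{u=1}^{d} 2^{-2 j^*_u \gamma_i} \le 2^{\min_{1\le u\le d} s_u} L^2.$$
A first condition on the constant $L$ is that it has to be chosen smaller than $R / \sqrt{2^{\min_u s_u}}$. It ensures that $f_\zeta$ belongs to the Besov ball under consideration.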
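Next we compute the square of the $L_2$-distance between $f_\zeta$ and 0. We have
$$\sum_{k\in K_{j^*}} \big(\theta^i_{j^*,k}\big)^2 = L^2 \prod_{u=1}^{d} 2^{-2 j^*_u \gamma_i} \ge L^2\, 2^{-2|i|\gamma_i}\, \varepsilon^{\frac{8\gamma_i}{4\gamma_i+1}}.$$
Therefore $L$ has to be larger than $2^{|i|\gamma_i} C$. This choice guarantees that $\|f_\zeta\|_2 \ge C r_{\varepsilon,i}$.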
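Following Propositions 2.11 and 2.12 of Ingster and Suslina [17], we have that:
$$\inf_{\Delta} \Big\{ \sup_{f\in\mathcal{N}_i(R)} \mathbb{P}_f(\Delta = 1) + \sup_{f\in\mathcal{A}_i(R,C,s,r_{\varepsilon,i})} \mathbb{P}_f(\Delta = 0) \Big\} \ge 1 - \frac12 \sqrt{\mathbb{E}\Big(\big(\mathbb{E}_\pi(L_{f_\zeta})\big)^2\Big) - 1} \tag{16}$$
where $\pi$ is any prior probability measure concentrated on the set of alternatives $\mathcal{A}_i(R, C, s, r_{\varepsilon,i})$ and
$$L_{f_\zeta} = \exp\Big( \varepsilon^{-2} \int f_\zeta(x)\, dY(x) - \frac12 \varepsilon^{-2} \int f_\zeta(x)^2\, dx \Big).$$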
Therefore, according to (16), it suffices to prove that for judicious choices of $\pi$ we can get
$$\mathbb{E}\Big(\big(\mathbb{E}_\pi(L_{f_\zeta})\big)^2\Big) < 4(1-\alpha)^2 + 1. \tag{17}$$
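We consider now the probability measure $\pi$ such that the $\zeta_k$'s are independent Rademacher random variables with parameter $\frac12$. Note that
$$\begin{aligned}
L_{f_\zeta} &= \exp\Big( \varepsilon^{-2} \int f_\zeta(x)\, dY(x) - \frac12 \varepsilon^{-2} \int f_\zeta(x)^2\, dx \Big) \\
&= \exp\Big( -\frac12 \varepsilon^{-2} L^2\, 2^{|j^*|} \prod_{u=1}^{d} 2^{-2 j^*_u(\gamma_i+\frac12)} \Big) \prod_{k\in K_{j^*}} \exp\Big( \varepsilon^{-1} L\, \zeta_k\, \xi^i_{j^*,k} \prod_{u=1}^{d} 2^{-j^*_u(\gamma_i+\frac12)} \Big).
\end{aligned}$$
Hence
$$\mathbb{E}_\pi(L_{f_\zeta}) = \exp\Big( -\frac12 \varepsilon^{-2} L^2\, 2^{|j^*|} \prod_{u=1}^{d} 2^{-2 j^*_u(\gamma_i+\frac12)} \Big) \times \prod_{k\in K_{j^*}} \frac12 \Big[ \exp\Big( \varepsilon^{-1} L\, \xi^i_{j^*,k} \prod_{u=1}^{d} 2^{-j^*_u(\gamma_i+\frac12)} \Big) + \exp\Big( -\varepsilon^{-1} L\, \xi^i_{j^*,k} \prod_{u=1}^{d} 2^{-j^*_u(\gamma_i+\frac12)} \Big) \Big].$$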
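As a second step we take the expectation with respect to the independent standard Gaussian random variables $\xi^i_{j^*,k}$:
$$\begin{aligned}
\mathbb{E}\Big(\big(\mathbb{E}_\pi(L_{f_\zeta})\big)^2\Big) &= \prod_{k\in K_{j^*}} \cosh\Big( \varepsilon^{-2} L^2 \prod_{u=1}^{d} 2^{-j^*_u(2\gamma_i+1)} \Big) \qquad (18) \\
&\le \exp\Big( \frac12\, 2^{|j^*|}\, \varepsilon^{-4} L^4 \prod_{u=1}^{d} 2^{-2 j^*_u(2\gamma_i+1)} \Big) \\
&\le \exp\Big( \frac{L^4}{2} \Big) < 4(1-\alpha)^2 + 1.
\end{aligned}$$
The last inequality is obtained by choosing $L < \min\Big\{ \big(2\log(1 + 4(1-\alpha)^2)\big)^{\frac14},\ R / \sqrt{2^{\min_u s_u}} \Big\}$. So (17) is proved when considering $C$ small enough, that is $C \le c_{i,\alpha} = 2^{-|i|\gamma_i} L$.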
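The identity (18) and the bound that follows it rest on two elementary facts about a standard Gaussian variable. As a sketch, write $a$ for the common factor $\varepsilon^{-1} L \prod_{u} 2^{-j^*_u(\gamma_i + 1/2)}$ (a shorthand introduced only for this illustration), so that each factor of $\mathbb{E}_\pi(L_{f_\zeta})$ is $e^{-a^2/2}\cosh(a\xi)$ with $\xi \sim N(0,1)$:

```latex
\begin{align*}
\mathbb{E}\Big[\big(e^{-a^2/2}\cosh(a\xi)\big)^2\Big]
  &= e^{-a^2}\,\mathbb{E}\Big[\tfrac12\big(1 + \cosh(2a\xi)\big)\Big]
   = e^{-a^2}\,\tfrac12\big(1 + e^{2a^2}\big)
   = \cosh(a^2), \\
\cosh(x) &= \sum_{m\ge 0}\frac{x^{2m}}{(2m)!}
   \le \sum_{m\ge 0}\frac{x^{2m}}{2^m\, m!}
   = \exp\Big(\frac{x^2}{2}\Big),
\end{align*}
% first line uses E[cosh(b*xi)] = exp(b^2/2) with b = 2a;
% second line uses (2m)! >= 2^m m!.
```

Taking the product over the $2^{|j^*|}$ indices $k \in K_{j^*}$ with $x = a^2$ gives the exponent $\frac12\, 2^{|j^*|} a^4$ appearing in the second line of the display.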
A.2. Proof of Theorem 4.2
The proof of Theorem 4.2 follows the same path as the proof of Theorem 4.1. It is a direct consequence of Proposition A.3 and Proposition A.4, which respectively deal with the upper bound and the lower bound of the minimax result.
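Proposition A.3. Under the same assumptions as Theorem 4.2 and for any $C > C_{i,\alpha}$, the following inequalities hold
$$\sup_{0<\varepsilon<\varepsilon_o}\ \sup_{f\in\mathcal{N}_i(R)} \mathbb{P}_f\big(\Delta_{i,\max}(t_{\varepsilon,i,\gamma_i,\alpha}) = 1\big) \le \frac{\alpha}{2}$$
and
$$\sup_{0<\varepsilon<\varepsilon_o}\ \sup_{s:\ \sum_u i_u s_u^{-1} = \gamma_i^{-1}}\ \sup_{f\in\mathcal{A}_i(R,C,s,r_{\varepsilon,i})} \mathbb{P}_f\big(\Delta_{i,\max}(t_{\varepsilon,i,\gamma_i,\alpha}) = 0\big) \le \frac{\alpha}{2}.$$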
Remark A.3. The constant $C_{i,\alpha}$ corresponds to the absolute positive constant $C_{i,\alpha}$ appearing in Definition 3.2 and will be given at the end of the proof below.
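Proof. Consider $\varepsilon \in (0, \varepsilon_o)$. Suppose that $f \in \mathcal{N}_i(R)$. Then, for any $j \in J_i$, $\varepsilon^{-2} T_{i,j}$ has a Chi-Square distribution with $2^{|j|}$ degrees of freedom. Hence
$$\begin{aligned}
\mathbb{P}_f\big(\Delta_{i,\max}(t_{\varepsilon,i,\gamma_i,\alpha}) = 1\big)
&= \mathbb{P}_f\Big(\max_{j\in J_i:\,|j|=J_i} \Delta_{i,j}(t_{\varepsilon,i,\gamma_i,\alpha}) = 1\Big)
 \le \sum_{j\in J_i:\,|j|=J_i} \mathbb{P}_f\big(\Delta_{i,j}(t_{\varepsilon,i,\gamma_i,\alpha}) = 1\big) \\
&= \sum_{j\in J_i:\,|j|=J_i} \mathbb{P}_f\big(T_{i,j} > t_{\varepsilon,i,\gamma_i,\alpha}\big)
 = \sum_{j\in J_i:\,|j|=J_i} \frac{\alpha}{2 K_{\gamma_i}} = \frac{\alpha}{2}.
\end{aligned}$$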
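Suppose now that $f \in \mathcal{A}_i(R,C,s,r_{\varepsilon,i})$ where $s$ satisfies $\sum_u i_u s_u^{-1} = \gamma_i^{-1}$. Then, for any fixed $j \in J_i$ such that $|j| = J_i$, $\varepsilon^{-2} T_{i,j}$ has a noncentral Chi-Square distribution with $2^{J_i}$ degrees of freedom. Hence, if $V^i_j$ denotes the linear span of
$$\big\{ \psi^i_{j',k} :\ j' \in J_i,\ j'_u < \max(j_u, 1)\ \forall u,\ \text{and } k \in K_{j'} \big\},$$
then,
$$\begin{aligned}
\mathbb{P}_f\big(\Delta_{i,\max}(t_{\varepsilon,i,\gamma_i,\alpha}) = 0\big)
&\le \mathbb{P}_f\big(T_{i,j} \le t_{\varepsilon,i,\gamma_i,\alpha}\big)
 = \mathbb{P}_f\big(\mathbb{E}_f(T_{i,j}) - T_{i,j} \ge \mathbb{E}_f(T_{i,j}) - t_{\varepsilon,i,\gamma_i,\alpha}\big) \\
&\le \exp\Bigg(-\frac{\big(\mathbb{E}_f(T_{i,j}) - t_{\varepsilon,i,\gamma_i,\alpha}\big)^2}{2\,\mathrm{Var}_f(T_{i,j})}\Bigg)
 = \exp\Bigg(-\frac{\Big(\varepsilon^2 2^{J_i} + \big\|\mathrm{Proj}_{V^i_j}(f_i)\big\|_2^2 - t_{\varepsilon,i,\gamma_i,\alpha}\Big)^2}{4\varepsilon^4 2^{J_i} + 8\varepsilon^2 \big\|\mathrm{Proj}_{V^i_j}(f_i)\big\|_2^2}\Bigg).
\end{aligned}$$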
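Since $\|f_i\|_2^2 \ge C^2 (r_{\varepsilon,i})^2$ and $\chi^{1-\alpha/(2K_{\gamma_i})}_{2^{J_i}} \le 2^{J_i} + 2\big(2^{J_i}\log(2\alpha^{-1}K_{\gamma_i})\big)^{1/2}$ (see [21]) with $(K_0 \log \varepsilon^{-1})^{|i|-1} \le K_{\gamma_i} \le (K_1 \log \varepsilon^{-1})^{|i|-1}$ for some $K_0, K_1 > 0$, carrying out calculations similar to those of the non-adaptive framework, we get
$$\begin{aligned}
\Big(-\log \mathbb{P}_f\big(\Delta_{i,\max}(t_{\varepsilon,i,\gamma_i,\alpha}) = 0\big)\Big)^{-1}
&\le \frac{4\varepsilon^4 2^{J_i} + 8\varepsilon^2 \|f_i\|_2^2}{\Big(\varepsilon^2 2^{J_i} + \|f_i\|_2^2 - \big\|f_i - \mathrm{Proj}_{V^i_j}(f_i)\big\|_2^2 - t_{\varepsilon,i,\gamma_i,\alpha}\Big)^2} \\
&\le \frac{4\varepsilon^4 2^{J_i} + 8\varepsilon^2 \|f_i\|_2^2}{\Big(\varepsilon^2 2^{J_i} + \|f_i\|_2^2 - R^2 2^{-2 J_i \gamma_i} - t_{\varepsilon,i,\gamma_i,\alpha}\Big)^2} \\
&\le \frac{4\varepsilon^4 2^{J_i} + 8\varepsilon^2 \|f_i\|_2^2}{\Big(\|f_i\|_2^2 - R^2 2^{-2 J_i \gamma_i} - 2\varepsilon^2\big(2^{J_i}\log(2\alpha^{-1}K_{\gamma_i})\big)^{1/2}\Big)^2} \\
&= \frac{4\varepsilon^4 2^{(1+4\gamma_i)J_i}}{\Big(C^2 - R^2 - 2^{\frac32}\big(\log(2\alpha^{-1}) + d(\log K_1 + 1)\big)^{1/2}\Big)^2}
   + \frac{8 C^4 \varepsilon^2 \|f_i\|_2^{-2}}{\Big(C^2 - R^2 - 2^{\frac32}\big(\log(2\alpha^{-1}) + d(\log K_1 + 1)\big)^{1/2}\Big)^2} \\
&\le \frac{2^{2+(1+4\gamma_i)} (\log\log \varepsilon^{-1})^{-1}}{\Big(C^2 - R^2 - 2^{\frac32}\big(\log(2\alpha^{-1}) + d(\log K_1 + 1)\big)^{1/2}\Big)^2}
   + \frac{8 C^4 \varepsilon^2 \|f_i\|_2^{-2}}{\Big(C^2 - R^2 - 2^{\frac32}\big(\log(2\alpha^{-1}) + d(\log K_1 + 1)\big)^{1/2}\Big)^2} \\
&\le \frac{2^{2+(1+4\gamma_i)} + 8 C^2 e^{-\frac{2e}{1+4\gamma_i}}}{\Big(C^2 - R^2 - 2^{\frac32}\big(\log(2\alpha^{-1}) + d(\log K_1 + 1)\big)^{1/2}\Big)^2} \qquad (19) \\
&\le \big(\log(2\alpha^{-1})\big)^{-1}
\end{aligned}$$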
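provided that $C \ge C_{i,\alpha}$. The constant $C_{i,\alpha}$ is the value of $C$ for which the last inequality can be replaced by an equality. That leads to $\mathbb{P}_f\big(\Delta_{i,\max}(t_{\varepsilon,i,\gamma_i,\alpha}) = 0\big) \le \frac{\alpha}{2}$ for any $C$ large enough and uniformly in $f$.
We also note that (19) is obtained because we assumed that $\varepsilon < e^{-e}$. Note, once again, that the smaller $\alpha$, the larger $C_{i,\alpha}$.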
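Proposition A.4. Under the same assumptions as Theorem 4.2 and for any $C < c_{i,\alpha}$, the following inequality holds
$$\inf_{0<\varepsilon<\varepsilon_o}\ \inf_{\Delta} \Bigg( \sup_{f\in\mathcal{N}_i(R)} \mathbb{P}_f(\Delta = 1) + \sup_{s:\ \sum_u i_u s_u^{-1} = \gamma_i^{-1}}\ \sup_{f\in\mathcal{A}_i(R,C,s,r_{\varepsilon,i})} \mathbb{P}_f(\Delta = 0) \Bigg) > \alpha.$$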
Remark A.4. The constant $c_{i,\alpha}$ corresponds to the absolute positive constant $c_{i,\alpha}$ appearing in Definition 3.2 and will be given at the end of the proof below.
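Proof. For any $\varepsilon \in (0, \varepsilon_o)$ let us consider the family of $d$-tuples $\mathcal{J} = \{ j \in J_i :\ |j| = J_i \}$. Remember that:
• $J_i$ is such that $2^{-J_i} \le \big(\varepsilon^4 \log\log \varepsilon^{-1}\big)^{\frac{1}{1+4\gamma_i}} < 2^{1-J_i}$,
• $\#\mathcal{J} = K_{\gamma_i} \ge (K_0 \log \varepsilon^{-1})^{|i|-1}$ for some $K_0$.
For $j \in \mathcal{J}$, define
$$f_{\zeta,j} = \sum_{k\in K_j} \theta^i_{j,k}\, \psi^i_{j,k} = \sum_{k\in K_j} L \prod_{u=1}^{d} 2^{-j_u(\gamma_i+\frac12)}\, \zeta_k\, \psi^i_{j,k}$$
where $L > 0$ and $\zeta_k \in \{-1, 1\}$ for any $k$.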
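As in the proof of Proposition A.2, we can easily check that $f_{\zeta,j}$ is in the alternative, provided that $2^{\gamma_i} C \le L \le R / \sqrt{2^{\gamma_i}}$. We have
$$\inf_{\Delta} \Bigg( \sup_{f\in\mathcal{N}_i(R)} \mathbb{P}_f(\Delta = 1) + \sup_{s:\ \sum_u i_u s_u^{-1} = \gamma_i^{-1}}\ \sup_{f\in\mathcal{A}_i(R,C,s,r_{\varepsilon,i})} \mathbb{P}_f(\Delta = 0) \Bigg) \ge 1 - \frac12 \sqrt{ \mathbb{E}\Bigg( \Big( \frac{1}{K_{\gamma_i}} \sum_{j\in\mathcal{J}} \mathbb{E}_\pi\big(L_{f_{\zeta,j}}\big) \Big)^2 \Bigg) - 1 } \tag{20}$$
where $\pi$ is a prior probability measure concentrated on the set of alternatives $\mathcal{A}_i(R, C, s, r_{\varepsilon,i})$ and
$$L_{f_{\zeta,j}} = \exp\Big( \varepsilon^{-2} \int f_{\zeta,j}(x)\, dY(x) - \frac12 \varepsilon^{-2} \int f_{\zeta,j}(x)^2\, dx \Big).$$
Therefore, according to (20), it suffices to prove once again that for judicious choices of $\pi$ we can get
$$\mathbb{E}\Bigg( \Big( \frac{1}{K_{\gamma_i}} \sum_{j\in\mathcal{J}} \mathbb{E}_\pi\big(L_{f_{\zeta,j}}\big) \Big)^2 \Bigg) = \frac{1}{K_{\gamma_i}^2}\, \mathbb{E}\Bigg( \Big( \sum_{j\in\mathcal{J}} \mathbb{E}_\pi\big(L_{f_{\zeta,j}}\big) \Big)^2 \Bigg) < 4(1-\alpha)^2 + 1. \tag{21}$$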