OATAO is an open access repository that collects the work of Toulouse
researchers and makes it freely available over the web where possible
Any correspondence concerning this service should be sent
to the repository administrator:
[email protected]
This is an author’s version published in:
http://oatao.univ-toulouse.fr/25360
To cite this version:
pilastre, Barbara and Boussouf, Loic and Escrivan, Stéphane d'
and
Tourneret, Jean-Yves
Anomaly detection in mixed telemetry data using a
sparse representation and dictionary learning.
(2020) Signal Processing, 168.
1-10. ISSN 0165-1684
Anomaly detection in mixed telemetry data using a sparse
representation and dictionary learning
Barbara Pilastrea·•. Loïc Boussoufh, Stéphane D'Escrivand, Jean-Yves Tournerecc,a •Té.SA laborarory, 7 boulevard de la Gare, Toulouse, 31500, France
b Airbus Defense and Space, 31 rue des Cosmonaures. Toulouse 31400, France <JNP-ENSEEJHT/IRIT. 2 rue Otaries Carnichel Toulouse, 31071, France
d Cenrre Narional d'Erudes Spatiales (CNES), 18 av Edouard Belin, Toulouse CEDEX 9 31401, France
ARTICLE INFO ABSTRACT
Keywords:
Spacecraft health monitoring Anomaly detection Sparse representation Dictionary learning Shirt-Invariance
Spacecraft he.alth monitoring and failure prevention are major issues in space operations. ln recent years,
machine le.aming techniques have received an increasing interest in many fields and have been applied
to housekeeping telemetiy data via semi-supervised leaming. The idea is to use past telemetry describing
normal spacecraft behaviour in order to learn a reference mode( to which can be compared most recent data in order to detect potential anomalies. This paper introduces a new machine learning method for anomaly detection in telemetry time series based on a sparse representation and dictionary Jearning. The
main advantage of the proposed method is the possibility to handle multivariate telemetry time series described by mixed continuous and discrete parameters, taking into account the potential correlations be
tween these parameters. The proposed method is evaluated on a representative anomaly dataset obtained
from real satellite telemetry with an available ground-truth and compared to state-of-the-art algorithrns.
1. Introduction
Spacecraft health monitoring and failure prevention are major
issues in space operations. By monitoring housekeeping telemetry
data, an anomaly affecting an equipment, a system or a sub-system
can be detected from the abnormal behaviour of one or several
telemetry parameters. A simple method for detecting anomalies in
telemetry is the well-known out-of-limits (OOL) checking, which
consists of defining an upper and a lower bound for each param
eter and checking whether the values of this parameter exceed
these bounds. This method is very simple and useful but has also
some limits. lndeed, the determination of bounds for each param
eter can be difficult and costly given the number of spacecraft sen
sors. Moreover, ail anomalies are not detected by the OOL checking,
e.g, when the parameter affected by an anomaly does not exceed
the predefined bounds. An example of anomaly not detected by
OOL checking is displayed in
Fig. 1
(box #2).
Anomaly detection (AD) is a huge area of research given its di
verse applications. Recent years have witnessed a growing inter
est for data-driven or machine Ieaming
(ML)
techniques that have
been used as effective tool for AD
(1-5).Motivated by this success,
some
ML
methods have been applied to housekeeping telemetry
• Corresponding author.
E-mail address: [email protected] (B. Pilastre). hnps :/ /doi.org/ 10.1016/j.sigpro2019.107320
after an appropriate preprocessing step
(6-10).These methods usu
ally consider a semi-supervised learning that can be outlined in
two steps: 1) learning from past telemetry describing only nomi
nal spacecraft events and 2) detecting abnormal behaviour in the
different parameters by an appropriate comparison to the mode(
Iearned in step 1 ).
ML-based algorithms for AD in telemetry can be divided in two
categories depending on their application to univariate or multi
variate data. Univariate AD strategies process the different teleme
try parameters independently, which is the most widely used ap
proach. Popular ML methods that have been investigated in this
framework include the one-class support vector machine
(7)
, near
est neighbour techniques
(8-10)or neural networks
[11.12)
. These
solutions showed competitive results and improved significantly
spacecraft heath monitoring. However, in order to improve AD in
telemetry, it is important to formulate the problem in a multivari
ate framework and take into account possible correlations between
the different parameters, allowing contextual anomalies to be de
tected. An example of contextual anomaly is shown in
Fig. 1
(box
#7). The detection of this kind of abnormal behaviour requires a
multivariate detection rule. Sorne recent multivariate AD are based
on feature extraction and dimensionality reduction
(13)
or on a
probabilistic model for mixed discrete and continuous telemetry
parameters
(14)
.
This paper studies a new AD method based on a sparse data
representation for spacecraft housekeeping telemetry. This method
Fig.1. Examplesofunivariateandmultivariateanomalies(highlightedinredboxes)thatareconsideredinthiswork.(Forinterpretationofthereferencestocolourinthis figurelegend,thereaderisreferredtothewebversionofthisarticle.)
is inspired by the works conducted in [15]. However, it has the advantage of handling mixed continuous and discrete data, and taking into account possible correlations between the different telemetryparametersthankstoanappropriatemultivariate frame-work. The proposed algorithm requires to build a dictionary of normalpatterns.New telemetrysignalscan then be decomposed intothisdictionaryusinga sparserepresentationallowing poten-tialanomaliesto be detectedby analyzingthe residuals resulting fromthissparsedecomposition.
The paper is organized as follow. Section 2 introduces the context of AD in mixed telemetry data considered in this work.
Section3briefly summarizesthetheoryofsparserepresentations and dictionary learning. Section 4 introduces the proposed AD methodadapted to mixed continuousanddiscrete telemetry pa-rameters. Section 5 evaluates the performance of the proposed methodusingaheterogeneousanomaly datasetwithacontrolled ground-truth.Acomparisontootherstate-of-arttechniquesshows thepotential of usinga sparse representationon a dictionary of normal patterns for detecting abnormal behaviour in telemetry. ConclusionsandfutureworkarereportedinSection6.
2. Anomalydetectionfortelemetry 2.1. Characteristics of spacecraft telemetry
Spacecraft telemetry consists of hundred to thousand house-keepingparameters. Alltheseparameters are quantized andtake theirvaluesin adiscrete set.However, itmakes sense tomakea distinctionbetweenparameterstakingfewdistinctvalues(suchas equipment operating modesor status), which can be considered asobservationsofdiscrete randomvariables,andparametersthat canbeconsidered asobservationsofcontinuousrandomvariables (suchastemperature, voltage,pressure etc.). Detectinganomalies in telemetry requires to consider these discrete and continuous randomvariables jointlyleadingto whatwe will callmixed vec-tors,in the sense that they contain discrete andcontinuous ran-domvariables.
In order to take into account relationships between the dif-ferentparameters,it isnecessary tolearn their behaviour jointly, which requires to consider vectors belonging to a possibly high dimensional subspace. Considering high-dimensional data leads
to major issues such as the well known curse of dimensional-ity[16,17]. Notealso that thishigh-dimensionality hasbeen con-sideredin manyrecent workssuch asthose devoted tobig data
[18,19].
Anotherimportantcharacteristicsoftelemetrydataisthatthey are generallysubjected toseveral preprocessings.As an example, it is quite classical to remove trivialoutliers caused by errors in the dataconversion andtransmission usingsimple outlier detec-tionmethods [14].Some telemetryparameterscan havebeen re-sampled to account for the fact the data have been acquired at different sampling frequencies. In addition, some reconstruction methods mayhave beenapplied to compensate for missingdata
[14,20].Finally,itisinterestingtonotethatadditional preprocess-ing isnecessary forthelearningphase to selecttelemetrywhich describesonlyusualnormalbehaviorofthespacecraft.Indeed, be-haviorsrepresentingrareoperations,e.g.,destockingorequipment calibration operations(abnormal inan other context) arenot se-lectedforlearning.
2.2. Anomalies in telemetry
Anomaliesthatoccurinhousekeepingtelemetrydatacanbe di-videdin twocategories thatcan be referred toasunivariate and multivariateanomalies.Univariateanomaliescorrespondtoan un-usual individual behaviour (never seen before) affectingone spe-cific parameter. Univariate anomalies can be classified in three maincategories[1]summarizedbelow
• Collectiveanomalies:acollectionofconsecutivedatainstances ortimeseriesconsideredasanomalouswithrespecttothe en-tiresignal.Twoexamplesofcollectiveanomaliesaredisplayed inFig.1(boxes#1and#4).
• Point anomalies: an individual data instance considered as anomalouswithrespecttotherestofthedata.Apointanomaly istheeasiest to detectbecauseit corresponds toan excessive valueofindividualsamples.Itisnotnecessarytoobservea col-lectionof time samplesto detect thiskind of anomaly.Point anomalies can be generally detected by simple thresholding, e.g., usingthe OOL AD method.Two examples ofconsecutive pointanomaliesaredisplayedinFig.1(boxes#3and#5).
• Univariatecontextualanomalies:anindividualdatainstanceor atimeseriesconsideredasanomalousinaspecificcontext,but nototherwise.Fig.1displaysexamplesofcontextualanomalies forconsecutivedatainstances(box#2)ortimeseries(box#6). Note thatcollective anomalies andsome individual contextual anomaliesmaynotbedetectedifdatainstancesareprocessed in-dependently.Thedetectionoftheseanomaliesrequirestoconsider collectionsofdatainstancesortimeseries.
A multivariate or contextual anomaly results from a parame-terwhosebehaviourhasneverbeenobservedjointlywiththe be-haviour ofone orseveralother parameters recorded atthesame time. Fig. 1 (boxes #4 and #7) shows examples of contextual anomalies. Notethat the anomaly ofFig.1 inbox #7is a multi-variatecontextualanomalythataffectsasetoftworelateddiscrete and continuousparameters. Notealso that the top signal is sup-posed toevolve differently dependingon thestatus ofan equip-ment (thatcanbe ONorOFF) associatedwiththebinarybottom parameter.Inthisexample,theexpectedbehaviour ofthe contin-uousparameterisnotobservedintheredbox,whichcorresponds toamultivariatecontextualanomaly.Thedetectionofthiskindof anomaly requirestoworkinamultivariateframework inorderto learn the behaviour ofmultiple parameters. The objectiveof this workistopropose aflexibleAD methodabletodetectunivariate aswellasmultivariateanomaliesaffectingtelemetry.
3. Sparserepresentationsanddictionarylearning
Sparserepresentationshavereceivedan increasingattentionin manysignalandimageprocessingapplications.Theseapplications includedenoising[21–23],classification [24–26]orpattern recog-nition [27–29].The use ofsparse representations forAD ismore original andhasbeen consideredin lessapplicationssuch as hy-perspectralimaging[30],detectionofabnormalmotionsinvideos
[31], irregularheartbeat detectionin electrocardiograms (ECG)or specular reflectance and shadowremoval in natural images [15]. The next part of this section recalls some basic elements about sparserepresentationsanddictionarylearning.
3.1. Sparse representations
Buildingasparserepresentation(alsoreferredtoas sparse cod- ing ) consists in approximating a signal y∈RN as y≈
x, where
∈R N×L is an overcomplete dictionary composed of L columns
called atoms , and x∈RL is a sparse coefficient vector. In others
words,thesignaly isexpressedasasparselinearcombinationof few atomsof the dictionary
. Once the dictionary
hasbeen determined,thesparserepresentationproblemreducestoestimate thesparsecoefficientvectorxbysolvingthefollowingproblem
x =argmin
x
y −x 2
2 s.t.
x 0≤T (1)where
.0 is the 0 pseudo-norm which counts the number ofnon-zeroentriesofx,
.2isthe2norm, T istheallowednumberofnon-zerosentriesofxand“s.t.” means“subjectto”.
Problem(1)isNP-hardandcanbesolvedbygreedyalgorithms such asmatching pursuit(MP)[32],orthogonal matchingpursuit (OMP)[33],orbyconvexrelaxation.Convexrelaxationreplacesthe
0pseudo-normbythe1 normdefinedby
x1=Ll=1|
x l|
,lead-ing to a convex problemwhose solution can be computedusing algorithms such astheleastabsoluteshrinkageandselection op-erator(LASSO)[34].
3.2. Dictionary learning
The qualityoftheapproximation y≈
x stronglyreliesonthe choice of the dictionary
. Dictionaries can be divided in two
mainclasses corresponding to parametric anddata-driven dictio-naries. Parametricdictionaries arecomposed of fixed atomssuch as wavelets, curvelets, contourlets or short-time Fourier trans-forms.Data-driven dictionarieslearnthedictionaryfromthedata, whichhasshowntobe interesting inmanypractical applications
[35].Thispaperfocusesonasemi-supervisedframeworkinwhich thedictionaryislearnedfromcleandatawhichdonotcontainany anomaly.Aclassicwaytolearnadictionaryfromthedataistouse data analysismethods such asthe well known principal compo-nentanalysis(PCA). However, moreefficientdata-drivenmethods fordictionarylearning(DL),oftenreferredtoasDLmethods,have attractedmanyattentioninrecentyears.Thesemethodslearn dic-tionariestailoredforsparserepresentationsbysolvingthe follow-ingproblem
x ,
=argmin
x,
y −x
2
F s.t.
x 0≤T. (2)ClassicalDL algorithms alternate between the estimation ofx
(inafirststepofsparsecoding)andtheestimationof
(ina sec-ondstepofdictionaryupdate).ManyefficientDLalgorithms have beenproposed intheliterature includingK-SVD[36]oronlineDL (ODL)[37].InK-SVD,thesparsecodingstepisdoneusingagreedy algorithm.Inthesecondstep,thedictionaryandthesparsevector areestimated usinga singularvalue decomposition(SVD), allow-ingthecolumnsof
aswellastheassociatedcoefficientsofxto beupdated.TheODLalgorithmhasbeendesignedtolearn dictio-nariesfromlargeanddynamicdatasets,usingasparsecodingstep performedbythe LARS-LASSOalgorithm[38,39] andadictionary updateusingblock-coordinatedescentwithwarmrestarts.
Unfortunately K-SVD and ODL algorithms have not been de-signedformixeddiscreteandcontinuousdataandthuscannotbe used for telemetrydata. Learning a dictionary with discrete and continuousatomsisaninterestingandchallengingproblem. How-ever,thedictionary canalsobe builtfromrepresentativetraining signalsthat are not affected by anomalies. Inthis work, the dic-tionaryhasbeenbuiltfrom“normal” vectors(thatarenotaffected byanomalies)belongingtoatrainingdatabase.Extendingstandard DLmethods (suchasK-SVDandODL)tomixeddatawill be con-sideredinfuturework.
4. Proposedanomalydetectionmethodformixedtelemetry
This section describes the proposed Anomaly Detection using DICTionary(ADDICT)algorithmwhichisanAD methodformixed data.Wefocushereonthedetectionstepandassumethatthe dic-tionaryhasbeenlearnedinapreviousstepusingtelemetrysignals associatedwithanormalspacecraftbehaviour.
4.1. Preprocessing
Telemetry timesseries acquiredatthe same time instant and considered as part of the same context are first segmented into overlappingwindowsoffixed size w withashift
δ
(withan over-lappingarea equal to w −δ
) asillustrated inFig.2.The resulting matricesarethenconcatenatedintovectorsyieldingmixedvectors whose components are discrete or continuous dependingon the considered parameter.One concatenatedvector thus representsa specificcontext containinginformationfrombothcontinuousand discrete signals on a duration w . Given this preprocessing,input data for AD are mixed signals composed of telemetry time se-riesformedbythedifferentparameters,i.e.,y=[yT1,...,yTK]T with
yk∈ R w,k =1,...,K,where K isthe numberoftelemetry
param-etersand w isthesizeofthetimewindow.Tosimplifynotations, the N D firstcomponentsofthemixedsignaly∈RN arecomposed
ofthediscreteparameterswhereasthelast N C componentsare
Fig.2. Segmentationoftelemetryintooverlappingwindows.
otherwords,themixedsignalispartitionedintodiscreteand con-tinuouscounterpartsdenoted asyD=[y
(
1)
,. . .,y(
N D)
]T andyC=[y
(
N D+1)
,. . .,y(
N D+N C)
]T such thaty=(
yTD,yTC)
T. 4.2. Anomaly detection using a sparse representationAmixeddictionary
∈R N×2Lcomposedofdiscreteand
contin-uousatomsisdefinedas
=
D 0
0
C
where
D and
C containthe discreteandcontinuousdictionary
atoms, respectively. The two dictionaries
D∈RND×L and
C∈ R NC×Lhavebeenextractedfromadictionarycomposedof L mixed
atomstakingintoaccountpossiblecorrelationbetweenthe differ-entparameters, especiallybetweendiscrete andcontinuousones. Inotherwords,the l thdiscreteatomofthediscretedictionary
D
andthe l thcontinuousatomofthe continuousdictionary
C are
composedof discrete andcontinuousbehavioursobserved inthe samemixedatom, whichare potentially correlatedThe proposed ADstrategydecomposesthemixedsignalasfollows
y =
x +e +b (3)
where
x is the nominal part of y, e=[eT
D,eTC]T is a possible
anomaly signal (e=0 in absence of anomaly), x=[xT
D,xTC]T is a
sparsevectorandb∈RN isan additivenoise. Theproposed
algo-rithmappliestwo distinctstrategiestoestimate thenominalpart ofy processingxD andxC differently.Theanomaly signaleis
esti-matedbyanalyzingresidualsresultingfromthissparse decompo-sition.Theproposeddetectorassumesthattheanomaliesaffecting telemetrydataareadditive,whichisgenerallythecase.Notethat theproposedmodel(3)providesaspecificstructureoftheresidue
e,whichallowsitsnonzerovaluestobeidentified.Thesenon-zero valuescorrespondtotheparametersaffectedbytheanomalies.
Thenominalcomponentofy isapproximatedbylinear combi-nationsofatomsdescribingonlynominalbehavioursofthe differ-entparameters,whichcanbewrittenas
x =
Dx D
Cx C . withxD∈R N D
andxC∈ R NC The discrete and continuous
counter-parts of the test signals will be approximated by two distinct strategies. However, it is important to preserve existing relation-shipsbetweenthesignal parameterstoallow forthedetectionof contextualanomalies.Tothisend,weproposetoestimatethe dis-creteapproximation
DxDandtheanomalysignaleDinafirststep
(leadingto estimatorsdenoted aseD andxD) andthecontinuous
approximation
CxC andthe anomaly signal eC in a second step
basedoneD andxD.Giventheproposedpreprocessing,thesignals eD andeC are dividedinto K D and K C discreteandcontinuous
pa-rameters,i.e.,eD=[eTD,1,...,eTD,KD]T andeC=[eTC,1,...,eTC,KC]T. 4.2.1. Sparse coding for discrete atoms
Inordertosolvethesparsecodingfordiscreteatoms,we pro-posetosolvethefollowingproblem
arg min xD∈B,eD∈RND
y D−Dx D−e D 22+bD KD k=1 eD,k2 (4)
where
eD,k2,k =1,...,K D is the Euclidean norm, eD,kcorre-spondstothe k thtime-seriesofeD associatedwiththe k th
param-eterand b D isaregularizationparameterthatcontrolsthelevelof
sparsityof eD. The sparsityconstraintfor the anomaly signal
re-flects the fact that anomalies are rare andaffect few parameters atthe sametime. Notethat thediscrete vector xD isconstrained
to belong to B, where B is the canonical or natural basis ofRL,
i.e.,B=
{
l,l =1,· · ·,L
}
,wherelisavectorwhose l thcomponent
equals 1 and whose other components equal 0. In other words, only one atom of the discrete dictionary
D ischosen to
repre-sent the discrete signal, thisamounts to looking forthe nearest neighbourofyD in thedictionary. Thisstrategy hasproved to be an effective method to reconstruct discrete signals (compared to a representationusinga linearcombinationof atoms),which ex-plainsthischoice.SincexD belongstoafiniteset,itsestimationis
combinatorialandcanbesolvedforeachatom
φ
D,l (whereφ
D,l isthe l thcolumnof
D)asfollows eD,l=argmine D,l y D−
φD
,l−e D,l22+bD KD k=1 eD,k2. (5)The solutionofthe optimizationproblem(5)is classically ob-tainedusingashrinkageoperatoreD,l=T bD
(
h)
,withh=yD−φ
D,l TbD(
h)
k= hk2−bD hk2 hk if hk2>bD 0 otherwise (6)wherehkisthe k thpartofhassociatedwiththe k thparameterfor
k=1,...,K D.Alltheatoms
φ
D,l yielding an anomalysignal equaltozeroareselectedandthecorrespondingvaluesof l arestoredin asubsetMdefinedas
M=
{
l∈{
1,· · ·,L}|
eD,l2=0}
. (7)Notethat Mcontainsthevaluesof l associatedwiththediscrete atomsof
D that aretheclosesttoyD.The regularization
param-eter b Dplaysanimportantroleintheatomselectionstepbecause
it fixesthe levelof authorizeddeviationfroma discrete parame-ter andan atom ofthedictionary. Thelower thevalueof b D,the
thecontinuousnominalsignalyC.Conversely,thehigherthevalue
of b D,thebetterthenominalestimationofyC,withahigherriskto
breakthelinksbetweendiscreteandcontinuousparametersby se-lectingnon-representative atomsandmissmultivariatecontextual anomalies.
4.2.2. Sparse coding for continuous atoms
The nominalcontinuoussignal isapproximatedusinga sparse linearcombinationofatomscontainedinadictionarydenoted as
M, composed of the continuous atoms
φ
C,l,l =1,· · ·,L whosediscrete parts
φ
D,l have beenselected inthediscrete atom selec-tion(i.e., l ∈M).Moreprecisely,whenM=∅,adiscreteanomaly is detected and no sparse coding is performed for continuous atoms. When M=∅, thecontinuousatoms corresponding tothe elementsof M=∅ areselected anda continuous sparse decom-position is performedusing the resulting representative continu-ous atoms(inview ofthediscrete testsignal yD).Thesecontinu-ous atomsareusedto detectanomaliesinmultivariatecorrelated mixeddata by preservingthe relationshipsbetween discrete and continuous parameters.As a consequence, the sparse representa-tionmodelusedforthecontinuousparametersisdefinedas
min xC,eC 1 2
y C−Mx C−eC 2 2+aCx C1+bC KC k=1 eC,k2 (8)
where
x1=n|
x n|
is the 1 norm of x, eC,k corresponds tothe k thtime seriesofeC associated withthe k thparameter with k =1,...,K C, a C and b C areregularizationparametersthatcontrol
the levelofsparsityofthe coefficient vectorxC andthe anomaly
signaleC,respectively.Notethat(8)considerstwodistinctsparsity
constraintsforthecoefficientvectorxC andtheanomalysignaleC.
Thisformulationreflectsthefactthatanominalcontinuoussignal can be well approximated by a linearcombination of few atoms ofthe dictionary(sparsityofxC) andthat anomalies are rare and
affectfewparametersatthesametime(sparsityofeC).
Problem (8) can be solved with the alternating direction method of multipliers (ADMM)[40] by adding an auxiliary vari-ablez min xC,eC,z 1 2
y C−Mx C−eC 2 2+aCz1+bC KC k=1 eC,k2 (9)
andthe constraintz=xC.Note that,contraryto Problem(8),the
firstandsecondtermsof(9)aredecoupled,whichallowsaneasier estimationofthevectorxC.TheADMMalgorithmassociatedwith (9)minimizesthefollowingaugmentedLagrangian
LA
(
x C,z,eC,m,μ
)
= 1 2y C−Mx C−eC 2 2+aCz1 +bC KC k=1 eC,k2+mTC
(
z−x C)
+μ
2Cz−x C2F (10)wheremC isaLagrange multiplier vectorand
μ
C is aregulariza-tionparametercontrollingthelevelofdeviationbetweenzandxC.
TheADMMalgorithmisiterativeandalternativelyestimatesxC,z, eC andmC.Moredetails abouttheupdateequationsofthe
differ-entvariablesatthe k thiterationareprovidedbelow.
UpdatingxC
xCisclassicallyupdatedasfollows x Ck+1=argmin xC 1 2
y C−Mx C−e k C22+mkC
(
zk−x C)
+μ
k C 2 z k−x C22. (11)Simplealgebraleadsto
x Ck+1=
(
M
T M+
μ
kCI)
−1(
T MrkC+mkC+
μ
kCzk)
(12) whererk C=yC−ekC. UpdatingzTheupdateofzisdefinedas
zk+1=argmin z aC
z1+(
m k C)
T(
z−x kC+1)
+μ
k C 2 z−x k+1 C 2 2. (13)Thesolutionof(13)isgivenbytheelement-wisesoftthresholding operator zk+1=Sγk
x kC+1− 1μ
k C mkC withγ
k= aC μk C,where the thresholding operator S γ(u) is defined by
Sγ
(
u)
=u
(
n)
−γ
ifu(
n)
>γ
0 if
|
u(
n)
|
≤γ
u
(
n)
+γ
ifu(
n)
<−γ
(14)whereu(n )isthe n thcomponentofu.
UpdatingeC
The errorvector e isalso updated usingthe shrinkage opera-toralreadydefinedin(6)forthesparsecodingofdiscreteatoms, i.e.,aseC=T bC[yC−
MxC]TheADMMresolutionof(9)isdetailed
in [15]and summarized in Algorithm 1 (theoretical convergence propertiesaredetailedin[41]).
Algorithm1 x,e,z,m=ADMM(y,
,
μ
,ρ
,a,b ). Initialisation:k=1,z0,e0,m0,μ
0,ρ
,,a,b repeat xk+1=
(
T
+
μ
kI)
−1[T
(
y−ek)
+mk+μ
kzk] zk+1=S γ(
xk+1− 1 μkm k)
,γ
= a μk ek+1= T b(
y−x
)
mk+1=mk+μ
k(
zk+1−xk+1)
μ
k+1=ρμ
k k=k+1untilstopcriteria
4.2.3. Proposed anomaly detection strategy
The estimated anomaly signale associated withthe test sig-naly is builtby theconcatenationof itsdiscrete andcontinuous counterpartse=
(
eTD,eTC
)
T.Theproposedanomalydetectionruleisdescribedbelow:
Adiscrete anomalyisdetectedif
eD2>0(i.eM=∅).More-over, a continuous or contextual anomaly is detected when
eC2> S PFA.where S PFA isathresholddependingontheprobabilityoffalse
alarmofthedetector.Thisthreshold canbe adjustedby theuser ordeterminedusingreceiveroperatingcharacteristic(ROC)curves ifa ground-truthis available (which willbe the case inthis pa-per).Notethat thesetM isused todetect anomaliesindiscrete datawhen itreduces tothe empty set,i.e., whenM=∅, andto extractthecontinuousatomscorrespondingtotheelementsofM
whenM=∅.Thesecontinuousatomsareusedtodetectanomalies inmultivariate correlated mixeddatausing thedecision rule(8), which preservesthe relationshipsbetween thediscrete and con-tinuousparameters.Moreprecisely,thefirststepofthealgorithm (detailedinSection4.2.1) considersthediscretepartofy,namely
yD,anddetects potentialanomaliesaffectingthediscrete
parame-ters.Inthesecondpartofthealgorithm(detailedinSection4.2.2), (8) isusedtodetectunivariatecontinuousanomaliesand contex-tual discrete/continuous anomalies using the continuous part of
Algorithm2 Anomalydetectionruleinmixedtelemetryusinga sparserepresentation(y,
C
M,
τ
max).Discrete Model and Atom Selection for l =1to L do eD,l=T bD
yD−φ
D,l endfor M={
l ∈{
1,...,L}|
eD,l2=0}
DiscreteAnomaly:ifM=∅,adiscreteanomalyisdeclared IfM=∅,thealgorithmconsidersacontinousmodel
Continuous Model
M=
{
C,l
|
l ∈M}
Anomaly Detection e=(
eTD,eTC
)
TJoint Anomaly: if M=∅ and
eC2>S PFA, a joint discrete/continuousanomalyisdetected
theatomsselectedinthefirststep(denotedas
M).Thetwosteps ofthealgorithmsaresummarizedbelowandinAlgorithm2.
• First step: the discrete anomaly detection looks for the anomaly vectors eD,l,l =1,...,L resulting from the discrete
sparse decomposition that are equal to 0and buildsa subset
Mdefinedas
M=
{
l∈{
1,· · ·,L}|
eD,l2=0}
. (15)WhenthesetMisempty,adiscreteanomalyisdetected.
• Second step: Theset M isused tobuild a dictionary of con-tinuousatoms(denotedas
M=
{
φ
C,l,l =1,· · ·,L}
)associated withthediscreteatomsselectedinthefirststep,i.e.,{
φ
D,l,l =1,· · ·,L
}
.Thisatom selectionallowsthecontinuoussparse de-compositiontobe performedusingonlyrepresentative contin-uousatoms(inviewofthediscretetestsignalyD).Asaconse-quence,contextualanomaliesbetweendiscreteandcontinuous parameterscanbedetected.
4.3. Shift-Invariant option
The proposed method hasa shift-invariance (SI)optionalstep that can be activated for the discrete model and for continuous atomselection.ThisSIoptionallows apossibleshiftbetweenthe data of interest and the atoms of the discrete dictionary to be mitigated.Notethatthisoptioncouldalsobe appliedto continu-ousdata.However,sinceitincreasesthecomputationalcomplexity significantly,ithasonly beenconsidered fordiscrete data inthis work.The SIoptionconsistsofbuildingan overcompletediscrete dictionaryby applyingshifts toall the discreteatomsof the dic-tionary.Inotherwords,eachdiscreteatom
φ
D,l isshiftedofτ
lagstocreate a new discrete atom
φ
D,l−τ, withτ
∈{
−τ
max,−(
τ
max−1
)
,. . .,−1,0,1,...,τ
max−1,τ
max}
. Note that the maximum shiftτ
max hastobe fixed bythe user.Byactivating theSI option,thesizeofthediscretedictionaryincreasesfrom L to2L
τ
max+L atoms.Thisoptionispotentiallyinterestingsinceitallowsmore represen-tative atomsto be considered for the estimation of the nominal signalandforatomselection.
5. Experimentalresults 5.1. Overview
The first experiment considers a simple dataset composed of K D=3 discrete and K C=7 continuous parameters with an
availableground-truth. Thedictionary wasconstructed usingtwo months of nominal telemetry (without anomalies), which repre-sentsapproximately 30000 mixed trainingsignals obtained after
applyingthepreprocessingdescribedinSection4withthe param-eters
δ
=5and w =50(i.e.,the signal lengthis N =500). As ex-plained before, the existing dictionary learningmethods such as K-SVDorODLhavenotbeendesignedformixeddiscreteand con-tinuous data.In thiswork, we built thedictionary of mixed dis-crete and continuous parameters as follows: 1) the dictionary is initializedwith L =2000trainingsignalsselectedrandomlyinthe trainingdatabase(the choice of L willbe discussed later), 2) the proposedsparsecodingalgorithmisappliedtothetrainingdatato determinethe sparse representationx andselectthe L training signalshaving thehighestresiduals y−
x.This process is re-peated100timesandthe L signalsmostoftenselectedamongthe iterationsareselectedasthecolumnsofthemixeddictionary.
TheperformanceofthedifferentADmethodsisevaluatedusing atestdatabaseassociatedwith18daysoftelemetry,i.e.,composed of1000signalsincluding90affectedby anomalies.Notethat the 90 anomaly signalsof the dataset are divided in 7 anomaly pe-riodswith various durationsdisplayedin Fig.1. Notealso that a specific attentionwasdevoted to the construction ofa heteroge-neoustestdatabasecontainingallkindsofanomalies,i.e., univari-atediscrete andcontinuousanomalies andtwo multivariate con-textual anomalies. Finally,it isimportantto note that the major-ityoftheseanomalies are actualanomalies observed inoperated satellites. This work investigates four AD methods whose princi-plesaresummarizedbelow
• Theone-class supportvector machine (OC-SVM)method[42]: the OC-SVM algorithm was investigated in a multivariate framework byusing input vectorscomposed ofmixed contin-uousanddiscreteparameters.Theinputvectorswereobtained usingthepreprocessingstepdescribedinSection 4.A.Denote asy∈RN oneoftheseinputvectorsobtainedbyconcatenating
timeseriesofthedifferenttelemetryparameters.The strategy adopted by OC-SVM is to map the training data in a higher-dimensionalsubspaceHusinga transformation
ϕ
,andtofind alinearseparatorinthissubspace,separatingthetrainingdata (consideredasmostlynominal) fromtheoriginwiththe max-imummargin.The separatorisfound bysolving thefollowing problem min w,ρ,Ei 1 2w2+ 1ν
N N i=1 Ei−ρ
s.t. w,ϕ
(
y)
H≥ρ
−Ei,Ei≥0 (16)wherewisthenormalvectorto thelinearseparator,<.,.>H istheHilbertinnerproductforHwhereHisequipedwith re-producingkernel
ρ
istheso-calledbias,Ei,i =1,...,Nareslackvariables(whichare equalto0whenysatisfiestheconstraint and are strictly positive when the constraint is not satisfied) and
ν
isarelaxationfactorthatcanbeinterpretedasthe frac-tionoftrainingdataallowedtobeoutsideofthenominalclass. Notethattheparameterν
hastobefixedbytheuser.The ker-nelusedinthisworkistheGaussiankerneldefinedask
(
y ,y)
=exp−γ
y −y 2 (17)where
γ
isa parameter(also adjustedbytheuser)controlling the regularity of the separator. Once the separator has been found, determining whethera newdata vector is nominalor abnormalisan easytaskasitonlyconsistsoftestingwhether thisvectorfallsinsideoroutsidetheseparatingcurve.Inother terms,thedecisionrulecanbeformulatedasfollowsf
(
y)
=sign[k(
w,y)
−ρ
] (18)wheresignisthefunctiondefinedby
sign
(
y)
= −1 if y <0 0 if y =0 1 if y >0
Notethatananomalyscorecanbedefinedasthedistance be-tweenthetestvectoryandtheseparatorwithapositivescore if f (y )<0 and a score equal to zero if f (y )>0. This anomaly scoreinthefirstcaseisdefinedby
a
(
y)
=ρ
−k(
w,y)
w . (20)Finally,wewouldliketomentionthattheparameters
ν
andγ
havebeentunedbycrossvalidationinthisstudy.• Mixtureofprobabilisticprincipalcomponentanalyzersand cat-egorical distributions(MPPCAD)[14]: thisis amultivariate AD methodbasedonprobabilisticclusteringanddimensionality re-duction.The input data vector y isdivided intotwo parts as-sociated withcontinuousanddiscrete vectorsdenotedasyC∈ RKC (containing one data instanceof each continuous
param-eter) andyD∈RKD (containing one data instanceof each
dis-crete parameter) acquiredat thesame time instant. Each dis-creteparameteryD
(
j)(
j =1,. . .,K D)
takesits valuesintheset{
1,. . .,M j}
containing M j different values. MPPCAD assumesthatthevectorofcontinuousvariablesisdistributedaccording to a mixture of Gaussian distributions and that the vector of discrete variablesis distributedaccordingto amixture of cat-egorical distributions. This assumption leads to the following probabilitydensitydistributionforthecontinuousdatayC
p
(
y C|
C
)
=G
g=1
π
gN(
y C|
μ
g,Cg)
(21)where the g thGaussian distribution hasmean vector
μ
g andcovariancematrixCg=WgWTg+
σ
g2IKC withWgthefactorload-ingmatrix,
σ
2g thenoisevariance,IKC istheidentitymatrixof
size K C×K Cand
π
gthepriorprobabilityofthe g thcluster.Thedistributionofthediscretedatabasedonamixtureof categor-icaldistributionsisdefinedas
p
(
y D|
D
)
= G g=1π
g KD j=1 Cat(
y D(
j)
|
θ
g,j)
(22)whereCat(.)isthecategoricaldistribution,yD(j )isthe j th
com-ponentofyD and
θ
g,j=[θ
g,j,1,...,θ
g,j,Mj]denotes theparame-tervectorsofthecategoricaldistributions,i.e., P
(
yD(
j)
=l|
g)
=θ
g,j,l.FinallythejointdistributionofthemixeddataisobtainedassumingindependencebetweenyCandyD
p
(
y C,y D|
)
=p(
y C|
C
)
p(
y D|
D
)
(23)where
=
{
T
C,
TD
}
T,C=
{
π
g,μ
g,Wg,σ
g2,g =1,...,G}
andD=
{
θ
g,j,g =1,...,G,j =1,...,M j}
.Theunknown parametervector
can be classically estimated using the Expectation-Minimization (EM) algorithm [43] yielding an estimator de-noted as
. The EM algorithm was initialized using k -means clustering following Ding’s method [44]. The authors of Yairi etal.[14]proposedtoestimatethenumberofcluster K andthe dimensionalityofthecontinuouslatentspace L usingheuristic rules. More precisely, the value of L wastuned usingthe so-called“elbow-law” afterapplyingaprincipalcomponent analy-sistothecontinuousdata.Thenumberofclusters K was man-uallyestimatedbasedonthescatterplotoftheprincipal com-ponentscores.Finally,itisinterestingtonotethatan anomaly score a
(
yC,yD|
)
canbedefinedastheminusloglikelihoodofthemixeddata
a
(
y C,y D|
)
=−ln p(
y C,y D|
)
. (24)• NewOperationalSofTwaReforAutomaticDetectionof Anoma-lies based on Machine-learning and Unsupervised feature Se-lection(NOSTRADAMUS):thisisaunivariatemethoddeveloped bythefrenchspaceagencyCNESbasedontheOC-SVMmethod appliedtoeachtelemetryparameterindividually[7].Theinput
data are vectors of features (mean, median, minimum, maxi-mum,standarddeviation...)computedontimewindows result-ing froma segmentation forthese parameters on a fixed pe-riodoftime.Differentfeaturesarecomputeddependingonthe discrete or continuous nature of the parameter. The OC-SVM method requires to define an appropriate kernel, which was chosen as the Gaussian kernel in [7]. An anomaly score was also defined in orderto quantify the “degree of abnormality” of any test vector. This degree of abnormality corresponds to thedistancebetweenthisvectorandtheseparatornormalized to[0,1]inordertoprovideaprobabilityofanomaly.Giventhe univariate frameworkofNOSTRADAMUS, ascoreisassignedto each parameter and is denoted as a
(
yk)
for the kthparame-ter.Inordertocomparewithmultivariate ADmethodsstudied inthiswork,wedefineamultivariatescoreforNOSTRADAMUS correspondingtothesumoftheunivariatescores
a
(
y)
= K k=1a
(
y k)
(25)whereKisthenumberofparameters.
• ADDICT: the proposed strategy is a multivariate AD method based on a sparse decomposition of any test vector y on a DICTionary (ADDICT) of normal patterns. The dictionary is learned from mixed training signalsassociated witha period oftimewherenoanomalywasdetected.Theinputdataofthis method are mixedvectors composed of telemetryparameters acquired during the same period of time. The preprocessing applied to the vector y and the AD algorithm were detailed in Section 4. An anomaly score can also be defined for this method
a
(
y)
=−1 if
e D2>0(i.e.M=∅)
e C 2 otherwise . (26)Alltheregularizationparameters(a,b C,b D)weredeterminedby
crossvalidationinthisstudy.Atthispoint,itisworth mention-ing that it might be interesting to consider other approaches suchasBayesianinference[45]toestimatetheseregularization parameters.
5.2. Performance evaluation
Thissection compares detection results obtainedwiththe AD methods summarized inthe previous section when they are ap-plied on an anomaly dataset with available ground-truth. Fig. 3
showsthe differentanomaly scoreswithground-truthmarkedby red backgroundsfor OCSVM(a), MPPCAD (b),NOSTRADAMUS (c) andthe proposed method ADDICT (d).The higherthe score, the highertheprobability ofanomaly.Foreach method,thedetection rule compares the anomaly score to a threshold anddetects an anomalyifthisscoreexceeds anappropriate threshold.Inan op-erational context,the threshold can be set in orderto obtain an acceptableprobability ofdetectionby constrainingtheprobability offalsealarmtobeupper-bounded,sincedetectingtoomanyfalse alarmsisaproblemforoperationalmissions.Inthispaper,we de-terminedthethresholdassociatedwiththevalueofthepair (prob-abilityoffalse alarm P FA,probability ofdetection P D) located the
closestfromtheidealpoint (0,1).Fig.3 showsthat point anoma-lieslocated inboxes #3and#5ofFig.1)arewell detectedby all themethods.Indeed,thescoresreturnedby allthemethods dur-ing thisanomaly periodare significantly higherthan the average score.The second anomaly (box #2 in Fig. 1 and indicated as 2 in Fig.3) is a univariate anomaly that is also relatively well de-tectedbyall themethods.Thefirstcollectiveanomaly(box #1in
Fig.3. Anomaly scores for the datasetwith ground-truth markedbyred back-ground.OCSVM(a),MPPCAD(b),NOSTRADAMUS(c)and ADDICT(d).(For inter-pretationofthereferencestocolourinthisfigurelegend,thereaderisreferredto thewebversionofthisarticle.)
Fig. 4. ROC curves of OC-SVM, NOSTRADAMUS, MPPCAD and ADDICT for the anomalydataset.
Table1
Values of P D and P FA for OCSVM, MPPCAD,
NOS-TRADAMUSandADDICT.
Method Threshold P D P FA OC-SVM 0.0016 89% 12.3% MPPCAD 12 67% 25% NOSTRADAMUS 29 77.26% 6% ADDICT(τmax=0) 3.8 84.6% 9.8% ADDICT(τmax=5) 3.7 89% 10.2%
parameter, is only detected by NOSTRADAMUS. This non detec-tionby MPPCADcanbe partlyexplainedbythefact thatthis ap-proachprocessestimewindowsoflength w =1whereasthiskind ofanomalyclearlyrequirestoconsiderlongertime windows.The fourthanomaly(box #4inFig.1)can beclassifiedasacollective anomalyifweconsideritsabnormalduration,orasamultivariate contextual anomaly ifwe consider the abnormal jointbehaviour ofthe twodiscrete parameters. Thisanomaly isonly detected by NOSTRADAMUS. This result isdue to the fact that anomalies af-fecting discrete data are poorly managed by the multivariate AD methods. Note that in thisfirst experiment, the SI optionof the proposedmethodwhichaimsatsolving thisproblemwasnot ac-tive. Onthe other hand,the sixth anomaly (box #6 inFig 1 and referred to as 6 in Fig.3), corresponding to a univariate contex-tual anomaly that occurs on continuous data,is detected by the proposedmethodbutnotbytheothers.Thenondetectionofthis anomaly by MPPCAD can be explained by the same arguments usedforthecollectiveanomaly.Moreover,thisanomalyisnot de-tectedbyNOSTRADAMUSsinceitdoesnotaffectsignificantly fea-turesthatformtheinputvectorofthisalgorithm.Finally,thelast anomalycorrespondingtoamultivariatecontextualanomalyfora continuousparameter(labelled7inFig.3andlocatedinbox#7of
Fig.1),isperfectlydetectedbyOCSVMandADDICT.However,itis lesssignificantforMPPCADandisnotdetectedbyNOSTRADAMUS, whichisnotabletohandleanomaliesduetocorrelationsbetween thedifferentparameters.
Quantitative results in terms of probability of detection and probability of false alarm are givenin Fig.4 which displaysROC curves ofthe four methods for theanomaly dataset.ROC curves werebuiltusingground-truth.Theperformancescorrespondingto the pair (probability of false alarm P FA, probability of detection P D) located theclosest from theideal point (0,1)are reportedin Table1.Ourcommentsaresummarizedbelow
• Fewfalse alarms are generatedby theMPPCAD algorithmbut animportantproportionofanomaliesfromthedatabaseisnot detected.
Fig.5. Anomaly scoresobtainedwithADDICTfor thedatasetwithground-truth markedbyredbackgroundandtheshift-invarianceoptionenabledwithτmax=5.
(Forinterpretationofthereferencestocolourinthisfigurelegend,thereaderis referredtothewebversionofthisarticle.)
• TheOC-SVMmethoddetectsthemostseriousanomalies affect-ingcontinuousparameters butis notableto detectanomalies associatedwithdiscreteparameters.
• NOSTRADAMUSisabletodetectthemajorityoftheunivariate anomaliesbutfailsforthemultivariateones.Itwouldbe inter-esting to adapt NOSTRADAMUS to detect multivariate anoma-liesinmixeddata.
• TheresultsobtainedwithADDICTareveryencouragingwitha highprobability ofdetection P D=0.846and asmall
probabil-ityoffalse alarm P FA=0.098.Theseresultsareimprovedwith
theSI optionleadingto P D=0.89and P FA=0.102, which
cor-respondstothebestperformanceforthisdataset.
5.3. Shift invariant option
ThissectioninvestigatestheusefulnessoftheSIoptionforthe proposed method. Fig.5 displaysthe anomaly scores of the pro-posed methodwith a maximumallowed shift
τ
max=5.Bycom-paringthese resultswiththose in Fig.3(d) obtainedwithout us-ing theSI option(i.e.,with
τ
max=0),we observethat theSIop-tionallowsanomaliesaffectingdiscreteparameterstobedetected (anomalies#1and#4inFig.1).Inaddition,theactivationofthe SIoptiondecreasesscoresofnominalsignals,whichisan interest-ingproperty. Moreprecisely,70%ofnominalsignalsyieldalower anomaly scorewhentheSIoptionisenabled.Reducingthisscore allowsthenumberoffalsealarmstobedecreased,improvingthe performanceoftheproposedmethod.
The maximumallowed shift
τ
max wasdetermined usingROCsthat expressthe probability ofdetection P D asa function ofthe
probability offalsealarm P FA.Fig.6showsROCsfordifferent
val-uesof
τ
max showingthatτ
max=5leadstoagoodcompromiseinterms of performance and computational complexity (the higher
τ
maxthehighertheexecutiontime).Theseresultsconfirmtheim-portanceoftheSIoptionfortheproposedmethod.
5.4. Selecting the number of atoms in the dictionary
ThissectionexplainshowtheproposedmethodADDICTselects thenumberofatomsinthedictionary.Intuitively,themoreatoms inthe dictionary, thebetterthe sparserepresentationofnominal signalsandthe lower theprobability offalse alarms. Our experi-mentshaveshownthattheanomaliesarealsobetterapproximated whenthenumberofatomsinthedictionaryincreases.
Fig.7showsthevaluesof P D and P FAreturnedbytheproposed
methodADDICTversusthenumberofatomsinthedictionary.The performance startsby improving when the numberof atoms in-creases. Forinstance,moving from100 to 2000atoms allows P D
Fig.6. ROCcurvesforADDICTwithdifferentvaluesofτmax∈{0,1,3,5,8}.
Fig.7. ValuesofP D(top)andP FA(bottom)versusthenumberofdictionaryatomsL .
to increase from77.78% to88.89% and P FA to decrease from26%
to5.72%,whichisasignificantimprovement.Beyond2000atoms, the detectionperformance does not improve, which explains the choice L =2000inourexperiments.Thisanalysisemphasizesthat choosingthe numberof atomsin thedictionary is importantfor ADusingADDICT.
6. Conclusion
Thispaperinvestigatedanewdata-drivenmethodforanomaly detectioninmixedhousekeepingtelemetrydatabasedonasparse representationanddictionary learning. Theproposed methodcan handle mixed discrete and continuous parameters that are pro-cessedjointlyallowingpossiblecorrelationsbetweenthese param-eterstobe captured.The approach wasevaluated on a heteroge-neousanomaly dataset with available ground-truth. The first re-sultsdemonstratedthecompetitivenessofthisapproach with re-specttothestate-of-the-art. Ourexperimentsshowedthe useful-nessofa shiftinvariantoption forthe detectionofanomalies af-fecting discrete parameters, leading to a significant reduction in theprobabilityoffalsealarm.
For future work, different issues might be investigated. The mostchallengingtaskisthedictionarylearningstep,whichshould be adapted to mixed discrete and continuous data. Another re-search prospect is the potential use of sparse codes or anomaly signalstoidentifythecausesoftheanomaly.Finally,wethinkthat integratingthe feedbackofusers inthealgorithm mightimprove thefuturedetectionofanomalies,e.g.,byreducingthenumberof falsealarms.Thisopensthewayformanyworksrelatedtoonline orsequentialanomalydetection, whichcould beusefulfor space-crafthealthmonitoring.
DeclarationofCompetingInterest
None.
Acknowledgements
The authors wouldlike tothank Pierre-BaptisteLambert from CNESandClémentineBarreyrefromAirbusDefenceandSpacefor fruitfuldiscussionsaboutanomalydetectioninspacecraft teleme-try. This work was supported by CNES and Airbus Defence and Space.
References
[1] V.Chandola,A.Banerjee,V.Kumar,Anomalydetection:asurvey,ACMComput. Surv.43(3)(2009).
[2] V.Chandola,A.Banerjee,V.Kumar,Anomalydetectionfordiscretesequences: asurvey,ACMComput.Surv.24(5)(2012).
[3] M.Pimentel,D.Clifton,L.Clifton,L.Tarassenko,Areviewofnoveltydetection, SignalProcess.99(2014)215–249.
[4] M.Markou, S.Singh,Novelty detection:areview-part2:neural network basedapproaches,SignalProcess.83(12)(2003)2499–2521.
[5] M. Markou, S. Singh, Novelty detection: a review- part 1: statistical ap-proaches,SignalProcess.83(12)(2003)2481–2497.
[6] C.Barreyre,B.Laurent,J.-M.Loubes,B.Cabon,L.Boussouf,Statisticalmethods foroutlierdetectioninspacetelemetries,in:Proc.Int.Conf.SpaceOperations (SpaceOps’2018),Marseille,France,2018.
[7] S.Fuertes,G.Picard,J.-Y.Tourneret,L.Chaari,A.Ferrari,C.Richard,Improving spacecrafthealthmonitoringwithautomaticanomalydetectiontechniques,in: Proc.Int.Conf.SpaceOperations(SpaceOps’2016),Daejeon,SouthKorea,2016. [8] J.-A.Martínez-Heras,A.Donati,M.Kirksch,F.Schmidt,Newtelemetry moni-toringparadigmwithnoveltydetection,in:Proc.Int.Conf.SpaceOperations (SpaceOps’2012),Stockholm,Sweden,2012.
[9] I.Verzola,A.Donati,J.-A.M.Heras,M.Schubert,L.Somodi,Projectsybil:a noveltydetectionsystemforhumanspaceflightoperations,in:Proc.Int.Conf. SpaceOperations(SpaceOps’2016),Daejeon,SouthKorea,2016.
[10] C.O’Meara,L.Schlag,L.Faltenbacher,M.Wickler,ATHMoS:automated teleme-tryhealthmonitoringsystematGSOCusingoutlierdetectionandsupervised machinelearning,in:Proc.Int.Conf.SpaceOperations(SpaceOps’2016), Dae-jeon,SouthKorea,2016.
[11]K.Hundman,V.Constantinou,C.Laporte,I.Colwell,T.Soderstrom,Detecting spacecraftanomaliesusingLSTMsand nonparametricdynamicthresholding, in:Proc.Int.Conf.KnowledgeDataMining(KDD’2018),London,United King-dom,2018,pp.387–395.
[12] C.O’Meara,L.Schlag,M.Wickler, Applicationsofdeeplearningneural net-workstosatellitetelemetrymonitoring,in:Proc.Int.Conf.SpaceOperations (SpaceOps’2018),Marseille,France,2018.
[13]N. Takeishi,T.Yairi, Anomalydetection frommultivariatetimes-series with sparserepresentation,in:Proc.IEEEInt.Conf.Syst.ManandCybernetics,San Siego,CA,USA,2014.
[14]T.Yairi,N.Takeishi,T.Oda,Y.Nakajima,N.Nishimura,N.Takata,Adata-driven healthmonitoringmethodforsatellitehousekeepingdatabasedon pobabilis-ticclusteringanddimensionalityreduction,IEEETrans.Aerosp.Electron.Syst. 53(3)(2017)1384–1401.
[15]A.Adler,M.Elad,Y.Hel-Or,E.Rivlin,Sparsecodingwithanomalydetection,J. SignalProcess.Syst.79(2)(2015)179–188.
[16]R.O.Duda,PatternClassificationandSceneAnalysis,Wiley,New-York,1973. [17]D.Dohono,High-dimensionaldataanalysis:thecursesandblessingsof
dimen-sionality,AMSMathChallengesLecture,2000.
[18]A.Tajer,V.Veeravalli,H.V.Poor,Outlyingsequencedetectioninlargedatasets: adata-drivenapproach,IEEESignalProcess.Mag.31(5)(2014)44–56. [19]A.Gilbert,P.Indyk,M.Iwen,L.Schmidt,Recentdevelopmentsinthesparse
fourier transform:acompressedfouriertransformfor bigdata, IEEESignal Process.Mag.31(5)(2014)91–100.
[20]N.Li,Y.Yang,Robustfaultdetectionwithmissingdataviasparse decomposi-tion,in:Proc.Int.Fed.AutomaticControl(IFAC’2013),Shanghai,China,2013. [21]M. Elad, A.Aharon,Image denoisingviasparse and redundant
representa-tionoverlearneddictionaries,IEEETrans.ImageProcess.14(12)(2006)3736– 3745.
[22]W.Dong,B. Li, Sparsity-basedimagedenoisingvia dictionarylearning and structural clustering, in: Proc. IEEE conf. Comput. Vis. Pattern Recogni. (CVPR’2010),SanSiego,CA,USA,2010.
[23]S.Xu,X.Yang,S.Jiang,Afastnonlocallycentralizedsparserepresentation al-gorithmforimagedenoising,SignalProcess.131(2017)99–112.
[24]K.Huang,S.Aviyente,Sparserepresentationforsignalclassification,in:Proc. Neur.Inf.Process.Syst.(NIPS’2006),Whistler,BC,Canada,2006.
[25]X.Mei,H.Ling,Robustvisualtrackingandvehicleclassificationviasparse rep-resentation,IEEETrans.Patt.Anal.Mach.intell.33(11)(2011)2259–2272–18. [26]Y.Oktar,M.Turkan,Areviewofsparsity-basedclusteringmethods,Signal
Pro-cess.148(2018)20–30.
[27]J.Wright,A.Y.Yang,A.Ganesh,S.S.Sastry,Y.Ma,Robustfacerecognitionvia sparserepresentation,IEEETrans.Patt.Anal.Mach.intell.31(3)(2009)1–18. [28]Q.Zhang,B.Li,DiscriminativeK-SVDfordictionarylearninginface
recogni-tion,in:Proc.IEEEconf.Comput.Vis.PatternRecogni.,SanFrancisco,CA,USA, 2010.
[29]H. Cheng,Z.Liu,L.Yang,X.Chen,Sparserepresentationandlearningin vi-sualrecognition:theoryandapplications,SignalProcess.93(6)(2013)1408– 1425.
[30]Y.Xu,Z.Wu,J.Li,A.Plaza,Z.Wei,Anomalydetectioninhyperspectralimages basedonlow-rankandsparserepresentation,IEEETrans.Geosci.RemoteSens. 54(4)(2016)1990–2000.
[31]S.Biswas,R.Venkatesh,Sparserepresentationbasedanomalydetectionwith enhanced local dictionaries, in: Proc. IEEE Int.Conf. ImageProcess., Paris, France,2014.
[32]S.G. Mallat,Z. Zhang, Matchingpursuits with times-frequency dictionaries, IEEETrans.SignalProcess.41(12)(1993)3397–3415.
[33]P.K.Y.Pati,R.Rezaiifar,Orthogonalmatchingpursuits:Recursivefunction ap-proximation with application towavelet decomposition,in: Proc. Asilomar Conf.Signals,Syst.Comput.(ACSSC’1993),PacificGrove,CA,USA,1993. [34]R.Tibshirani,RegressionshrinkageandselectionviatheLASSO,J.Roy.Statist.
Soc.Ser.B(Methodological)58(1)(1996)267–288.
[35]I.Tosic,P.Frossard,Dictionarylearning,IEEESignalProcess.Mag.28(2011) 27–38.
[36]M.Aharon,M.Elad,A.Bruckstein,K-SVD:analgorithmfordesigning overcom-pletedictionariesforsparsedecomposition,IEEETrans.Signal.Process.54(11) (2006)4311–4322.
[37]J.Mairal,F.Bach,J.Ponce,G.Sapiro,Onlinedictionarylearningforsparse cod-ing,in:Proc.Int.Conf.Mach.Learn.(ICML’2009),Montreal,QC,Canada,2009, pp.689–696.
[38]B. Efron, T.Hastie, I. Johnstone, R. Tibshirani, Leastangle regression,Ann. Statist.32(2)(2004)407–499.
[39]M.Osborne,B.Presnell,B.Turlach,Anewapproachtovariableselectionin leastsquaresproblems,IMAJ.Num.Anal.20(3)(2000)389–403.
[40]S. Boyd, N. Parikh, E. Chu, B. Peleato, J. Eckstein, Distributed optimization andstatisticallearningviaalternatingdirectionmethodofmultipliers,Found. TrendsMach.Learn.3(1)(2010)1–222.
[41]J. Eckstein,D.Bertsekas,OntheDouglas-Rachford splittingmethodand the proximalpoint algorithmfor maximalmonotoneoperators, Math.Program. (Ser.AB)55(3)(1992)293–318.
[42]B.Schölkopf,J.C.Platt,J.Shawe-Taylor,A.J.Smola,R.C.Williamson, Estimating thesupportofahigh-dimensionaldistribution,NeuralComput.3(7)(2001) 1443–1471.
[43]A.P.Dempster,N.M.Laird,D.B.Rubin,Maximumlikelihoodfromincomplete dataviatheEMalgorithm,J.Roy.Statist.Soc.Ser.B(Methodological)39(1) (1997)1–38.
[44]C.Ding,X.He,K-meansclusteringviaprincipalcomponentanalysis,in:Int. Conf.MachineLearning(ICML’2004),Banff,Alberta,Canada,2004.
[45]M.E.Tipping,BayesianInference:anIntroductiontoPrinciplesandPracticein MachineLearning,Springer,2004.