Loda: Lightweight on-line detector of anomalies
Algorithm 4: Algorithm returning an approximation of the probability density in point x projected on the vector w
3. Proposed method
The idea behind the proposed local adaptive multivariate smoothing (LAMS) method is to replace the output of the anomaly detector by the average anomaly score of similar events observed in the past, where the similarity between two events is defined as $K_h : \mathcal{X} \times \mathcal{X} \to [0,1]$ (further also called context). This effectively smooths the output of the anomaly detector and therefore reduces unstructured false positives. This smoothing can be mathematically formulated as
$$\hat{g}_{nw}(x) = \frac{\sum_{i=1}^{n} K_h(x, x_i)\, y_i}{\sum_{i=1}^{n} K_h(x, x_i)}, \tag{3}$$
where $\{x_i\}_{i=1}^{n}$ is the set of observed network events, $\hat{g}_{nw}(x)$ is the expected anomaly of event $x$, and $\{y_i\}_{i=1}^{n}$ is the set of corresponding AD outputs. The space $\mathcal{X}$ on which $K_h$ is defined can be arbitrary (e.g., the space of all strings, graphs, etc.), but the Gaussian kernel $K_h(x, x') = \exp\left(-\frac{\|x - x'\|^2}{h}\right)$ on Euclidean space is the most common combination ($h$ parameterizes the width of the kernel). Estimator (3) is known as the Nadaraya–Watson estimator [9,10], which is a non-parametric estimator of the conditional expectation of a random variable.
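Estimator (3) with the Gaussian kernel can be sketched in a few lines of Python. This is a minimal illustration only: the function names `gaussian_kernel` and `nw_estimate`, the representation of events as tuples of floats, and the example data are assumptions made here, not part of the described system.

```python
import math

def gaussian_kernel(x, x_prime, h):
    """K_h(x, x') = exp(-||x - x'||^2 / h) for points given as sequences of floats."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, x_prime))
    return math.exp(-sq_dist / h)

def nw_estimate(x, events, scores, h):
    """Nadaraya-Watson estimate (3): kernel-weighted average of past AD scores."""
    weights = [gaussian_kernel(x, x_i, h) for x_i in events]
    total = sum(weights)
    if total == 0.0:
        # Assumption of this sketch: refuse to answer when every weight underflows.
        raise ValueError("no past event is similar to x")
    return sum(w * y for w, y in zip(weights, scores)) / total

# Hypothetical data: two events near the query and one far away.
events = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
scores = [0.2, 0.8, 1.0]
smoothed = nw_estimate((0.5, 0.0), events, scores, h=1.0)
```

The query point lies midway between the first two events, so their scores receive equal weight, while the distant third event contributes almost nothing to the average.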
How does the above smoothing remove false positives according to the division introduced in Section 2?
• Unstructured false positives are reduced by the Nadaraya–Watson estimator, which performs local averaging of the events. Since the unstructured false positives are of the form $y_i = g(x_i) + \eta_i$, it is proven by Devroye et al. [11] that the estimator converges to the true value $g(x_i)$ under fairly general assumptions. More formally, Devroye has shown
$$E\,\bigl|\hat{g}_{nw}(x) - g(x)\bigr| \le \frac{c}{\sqrt{nh}}, \tag{4}$$
assuming $h$ is chosen in terms of $n$, with $h \to 0$, $nh \to \infty$ as $n \to \infty$; $c > 0$ is a constant; the kernel $K$ is absolutely continuous and differentiable, with $K \in L_1$; $g$ is differentiable and $\mathrm{Var}(y_i)$ is bounded. Since we are using a Gaussian kernel and the anomaly scores $y_i$ are bounded to $[0,1]$, the estimate converges to the underlying true anomaly score $g(x)$ with a rate of $(nh)^{-1/2}$ when only unstructured false positives are present. However, letting $h \to 0$ requires infinite computation and memory resources, therefore we restrict the estimator to a fixed $h$.
The smoothing effect is shown in Fig. 1, where the left figure shows the input to LAMS and the right figure the output. It is apparent that the left figure is noisier, as points with different colors are close to each other.
• Structured false positives are long-term events confined to a subset of network hosts without direct relationship to the background, in the sense that the AD's output on them does not change with time. LAMS can remove them in the following situations.
If the LAMS similarity measure of alerts $K_h$ is different from that used in the anomaly detector, a large number of events with normal AD score can be similar to structured false positives, which decreases the scores output by LAMS on false positives. An example of this type of behavior can be seen in Fig. 2, where the amplitude of the LAMS input is higher than that of the output.
Fig. 2. Anomaly score of the HTTP request to the Google web API in time. There are sudden spikes in the AD's anomaly score (blue dashed line) that are reduced by the LAMS model (red solid line).
If LAMS aggregates outputs of the same anomaly detector deployed on many networks, structured false positives can be normal in other networks. Then, LAMS aggregation eliminates the false positives by aggregating them with these normal activities.
In other situations, structured false positives are confined and there are no similar events receiving a lower anomaly score. In this case LAMS fails to remove them, but does not increase their number.
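The reduction of unstructured false positives by local averaging can be illustrated on synthetic data. The sketch below assumes a hypothetical one-dimensional true score `g`, zero-mean uniform noise, and a fixed kernel width `h`; all of these choices are made here for illustration and are not taken from the evaluated detector.

```python
import math
import random

def nw_estimate_1d(x, xs, ys, h):
    """Estimator (3) with a Gaussian kernel on the real line."""
    weights = [math.exp(-((x - x_i) ** 2) / h) for x_i in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)

def g(x):
    """Hypothetical true anomaly score with values in [0.2, 0.8]."""
    return 0.5 + 0.3 * math.sin(x)

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 400
xs = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n)]
# Noisy AD outputs y_i = g(x_i) + eta_i; the noise keeps them inside [0, 1].
ys = [g(x) + rng.uniform(-0.2, 0.2) for x in xs]

h = 0.05  # fixed kernel width, as in the restriction to fixed h
raw_error = sum(abs(y - g(x)) for x, y in zip(xs, ys)) / n
smoothed_error = sum(abs(nw_estimate_1d(x, xs, ys, h) - g(x)) for x in xs) / n
```

Comparing `raw_error` with `smoothed_error` shows how far the mean absolute deviation from `g` drops after smoothing. The remaining error comes from the finite sample and from the bias introduced by keeping `h` fixed.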
3.1. Complexity considerations
The complexity of the smoothing of the AD's output described by Equation (3) is linear with respect to the number of observed events. This, together with the fact that all observed events need to be stored, makes LAMS useless for practical deployment, since the number of network events can be as high as millions per second. We therefore resort to a common approach and approximate (3) from values maintained in a set of pivots $\Phi = \{\varphi_j\}_{j=1}^{J}$, $\varphi_j \in \mathcal{X}$, as
$$\hat{g}_{lams}(x) = \frac{\sum_{j=1}^{J} K_h(x, \varphi_j)\, \hat{y}_j}{\sum_{j=1}^{J} K_h(x, \varphi_j)}, \tag{5}$$
where $\{\hat{y}_j\}_{j=1}^{J}$ are estimates of $\hat{g}_{nw}(\varphi_j)$ calculated according to (3). This reduces the computational complexity of the estimate in an arbitrary $x$ to $O(J)$, which is linear in the number of pivots and independent of the number of observed alerts.² The same holds for space complexity, because only the positions $\varphi_j$ of a finite number of pivots and the relevant estimates $\hat{y}_j$ need to be kept. The set of pivots $\Phi$ is updated using the modified Leader–Follower algorithm [13], where a new pivot is added to the set when an event not similar to any pivot in $\Phi$ is received. The update process is described in more detail in Section 3.2. Contrary to the standard definition of Leader–Follower [13,14], pivots are not allowed to move, because moving pivots towards areas of higher density causes forgetting of rare events, which we want to remember.
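The pivot-based approximation (5) can be sketched as follows. The names `kernel` and `lams_estimate`, the tuple representation of pivots, and the choice to return `None` when all weights vanish are assumptions of this illustration.

```python
import math

def kernel(x, phi, h):
    """Gaussian similarity between an event x and a pivot position phi."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, phi)) / h)

def lams_estimate(x, pivots, y_hats, h):
    """Equation (5): weighted average over the J pivots instead of all n events."""
    num = 0.0
    den = 0.0
    for phi_j, y_hat_j in zip(pivots, y_hats):  # one kernel evaluation per pivot
        k = kernel(x, phi_j, h)
        num += k * y_hat_j
        den += k
    # Assumption: report "no estimate" if no pivot has a non-zero weight.
    return num / den if den > 0.0 else None

# Hypothetical model with two pivots and their stored estimates.
pivots = [(0.0,), (2.0,)]
y_hats = [0.1, 0.9]
midpoint_score = lams_estimate((1.0,), pivots, y_hats, h=1.0)
near_first = lams_estimate((0.0,), pivots, y_hats, h=1.0)
```

The cost of one query depends only on the number of stored pivots, which is the property used in the complexity argument above.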
3.2. Incremental update of the LAMS model
Keeping the LAMS model up to date on data streams requires maintaining the estimates $\{\hat{y}_j\}_{j=1}^{J}$ in $\{\varphi_j\}_{j=1}^{J}$ and, alternatively, adding a new pivot if needed. Upon arrival of a new observation of a network event $(y_t, x_t)$, the estimates $\hat{y}_j$ stored in pivots $\varphi_j$ should be updated using Equation (3) as
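$$\hat{y}_j^{new} = \frac{\sum_{i=1}^{n} K_h(\varphi_j, x_i)\, y_i + K_h(\varphi_j, x_t)\, y_t}{\sum_{i=1}^{n} K_h(\varphi_j, x_i) + K_h(\varphi_j, x_t)}. \tag{6}$$
If we denote
$$w_j^{old} = \sum_{i=1}^{n} K_h(\varphi_j, x_i), \tag{7}$$
$$\hat{y}_j^{old} = \frac{\sum_{i=1}^{n} K_h(\varphi_j, x_i)\, y_i}{\sum_{i=1}^{n} K_h(\varphi_j, x_i)}, \tag{8}$$
Equation (6) can be rewritten as
$$\hat{y}_j^{new} = \frac{w_j^{old}\, \hat{y}_j^{old} + K_h(x_t, \varphi_j)\, y_t}{w_j^{old} + K_h(x_t, \varphi_j)}. \tag{9}$$
Therefore, for an efficient incremental update of the model, only $\hat{y}_j$ and $w_j$ need to be stored in each $\varphi_j$. The update can then be done using the following:
$$w_j^{new} = w_j^{old} + K_h(x_t, \varphi_j), \tag{10}$$
$$\hat{y}_j^{new} = \frac{1}{w_j^{new}} \left( w_j^{old}\, \hat{y}_j^{old} + K_h(x_t, \varphi_j)\, y_t \right). \tag{11}$$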
The above can be further simplified to update only pivots close to $x_t$, which we define as pivots with similarity $K_h(x_t, \varphi_j)$ greater than $K_h^{\min}$. This reduces the complexity of the update process, because only a limited number of nearest pivots needs to be updated with each new event.
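The incremental update of a single pivot can be checked numerically against the batch form (8). The sketch below assumes that the weight $w_j$ accumulates kernel values exactly as in definition (7); the function name `update_pivot` and the example stream are hypothetical.

```python
import math

def kernel(x, phi, h):
    """Gaussian similarity between an event x and a pivot position phi."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, phi)) / h)

def update_pivot(w_old, y_hat_old, phi, x_t, y_t, h):
    """One incremental step: fold the event (y_t, x_t) into the pair (w_j, y_hat_j)."""
    k = kernel(x_t, phi, h)
    w_new = w_old + k                                # running sum of kernel values, cf. (7)
    y_hat_new = (w_old * y_hat_old + k * y_t) / w_new
    return w_new, y_hat_new

phi = (0.0,)
h = 1.0
# Hypothetical stream of (x_t, y_t) pairs.
stream = [((0.0,), 0.4), ((0.5,), 0.9), ((1.0,), 0.1), ((-0.3,), 0.6)]

# Incremental pass: only two numbers are kept for the pivot.
w, y_hat = 0.0, 0.0
for x_t, y_t in stream:
    w, y_hat = update_pivot(w, y_hat, phi, x_t, y_t, h)

# Batch reference computed from all stored events, as in (7) and (8).
weights = [kernel(x, phi, h) for x, _ in stream]
batch = sum(k * y for k, (_, y) in zip(weights, stream)) / sum(weights)
```

Both passes give the same estimate up to floating-point rounding, while the incremental one does not need to store past events.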
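The update is outlined in Algorithm 1. First, the set $\varepsilon(x_t)$ of all pivots in the vicinity of a new event $(y_t, x_t)$, i.e. pivots with $K_h(x_t, \varphi_j) > K_h^{\min}$, is found and their estimates $\hat{y}$ are updated as defined in Equation (11). The $\gamma$ parameter is a threshold on the similarity: if even the most similar pivot has similarity to $x_t$ below $\gamma$, a new pivot is created with $\varphi_{J+1} = x_t$, $\hat{y}_{J+1} = y_t$ and $w_{J+1} = 1$.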
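Data: Stream of events $(y_i, x_i)$, $i = 1, \ldots$
Result: Set of pivots $\Phi = \{\varphi_j = (x_j, \hat{y}_j)\}_{j=1}^{J}$
Start with an empty set $\Phi = \emptyset$;
while there is a new event $(y_i, x_i)$ do
    $\varepsilon(x_i) = \{\varphi_k \in \Phi \mid K_h(x_i, \varphi_k) > K_h^{\min}\}$;
    for $\varphi_k \in \varepsilon(x_i)$ do
        Update $\hat{y}_k$ relevant to pivot $\varphi_k$ using Equations (10) and (11);
    end
    Find the most similar pivot $\varphi_N$ to $x_i$:
    $\varphi_N := \arg\max_{\varphi \in \Phi} K_h(x_i, \varphi)$;
    if $K_h(x_i, \varphi_N) < \gamma$ then
        Create new pivot $\varphi_{J+1} := x_i$;
        $w_{J+1} := 1$; $\hat{y}_{J+1} := y_i$;
        $\Phi = \Phi \cup \{\varphi_{J+1}\}$;
    end
end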
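The update loop of Algorithm 1 can be sketched in Python as below. This is one possible reading under stated assumptions, not a reference implementation: the class name `LamsModel`, the parameter names `k_min` and `gamma`, the Gaussian kernel, the linear scan over all pivots, the kernel-sum weight update following definition (7), and the rule that the first event always creates a pivot (the arg max over an empty set is undefined) are choices made for this illustration.

```python
import math

class LamsModel:
    """Minimal pivot model: positions phi_j, weights w_j and estimates y_hat_j."""

    def __init__(self, h, k_min, gamma):
        self.h = h            # kernel width
        self.k_min = k_min    # similarity above which a pivot is updated
        self.gamma = gamma    # similarity below which a new pivot is created
        self.pivots = []
        self.weights = []
        self.y_hats = []

    def _kernel(self, x, phi):
        return math.exp(-sum((a - b) ** 2 for a, b in zip(x, phi)) / self.h)

    def update(self, x, y):
        """Process one event (y, x): update nearby pivots, then maybe add a pivot."""
        sims = [self._kernel(x, phi) for phi in self.pivots]
        for j, k in enumerate(sims):
            if k > self.k_min:                       # pivot lies in the vicinity of x
                w_new = self.weights[j] + k
                self.y_hats[j] = (self.weights[j] * self.y_hats[j] + k * y) / w_new
                self.weights[j] = w_new
        if not sims or max(sims) < self.gamma:       # no sufficiently similar pivot
            self.pivots.append(tuple(x))
            self.weights.append(1.0)
            self.y_hats.append(y)

    def predict(self, x):
        """Smoothed score from the pivots; None if the model has no usable pivot."""
        sims = [self._kernel(x, phi) for phi in self.pivots]
        den = sum(sims)
        if den == 0.0:
            return None
        return sum(k * y for k, y in zip(sims, self.y_hats)) / den

# Hypothetical stream: two nearby events and one distant event.
model = LamsModel(h=1.0, k_min=0.1, gamma=0.5)
for x, y in [((0.0, 0.0), 0.2), ((0.1, 0.0), 0.4), ((5.0, 5.0), 0.9)]:
    model.update(x, y)
far_score = model.predict((5.0, 5.0))
```

In this run the second event is absorbed by the first pivot and the third event opens a new one, so the model ends with two pivots. A deployment handling millions of events per second would likely need a spatial index instead of the linear scan used here.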