Getting to the root of the problem: A detailed comparison of kernel and user level data for dynamic malware analysis


Contents lists available at ScienceDirect

Journal of Information Security and Applications

journal homepage: www.elsevier.com/locate/jisa


Matthew Nunes∗, Pete Burnap, Omer Rana, Philipp Reinecke, Kaelon Lloyd

School of Computer Science & Informatics, Cardiff University, Queen’s Buildings, 5 The Parade, Cardiff, CF24 3AA, UK

Article info

Keywords: Dynamic malware analysis; Behavioural malware analysis; API-calls; Machine learning

Abstract

Dynamic malware analysis is fast gaining popularity over static analysis since it is not easily defeated by evasion tactics such as obfuscation and polymorphism. During dynamic analysis it is common practice to capture the system calls that are made in order to better understand the behaviour of malware. There are several techniques to capture system calls, the most popular of which is a user-level hook. To study the effects of collecting system calls at different privilege levels and viewpoints, we collected data at a process-specific user level using a virtualised sandbox environment and at a system-wide kernel level using a custom-built kernel driver. We then tested the performance of several state-of-the-art machine learning classifiers on the data. Random Forest was the best performing classifier, with an accuracy of 95.2% for the kernel driver and 94.0% at user level. The combination of user and kernel level data gave the best classification results, with an accuracy of 96.0% for Random Forest. This may seem intuitive but has not hitherto been demonstrated empirically. Additionally, we observed that machine learning algorithms trained on data from the user level tended to use the anti-debug/anti-VM features in malware to distinguish it from benignware. In contrast, when trained on data from our kernel driver, machine learning algorithms seemed to use the differences in the general behaviour of the system to make their prediction, which explains why the two complement each other so well. Our results show that capturing data at different privilege levels affects a classifier's ability to detect malware, with the kernel level providing more utility than the user level for malware classification. Despite this, there exist more established user-level tools than kernel-level tools, suggesting that more research effort should be directed at the kernel level.
In short, this paper provides the first objective, evidence-based comparison of user and kernel level data for the purposes of malware classification.

1. Introduction

Malware, short for Malicious Software, is the all-encompassing term for unwanted software such as Viruses, Worms, and Trojans. The threat of malware is highlighted by the fact that 350,000 new samples of malware are identified every day [1], far too many for human analysts to analyse manually, thus motivating research into the automated detection of malware. Malware can be analysed in one of two ways: through static code analysis or dynamic behavioural analysis. Static code analysis involves studying the binary file and looking for patterns in its structure that might be indicative of malicious behaviour, without ever actually running the binary. Dynamic behavioural analysis involves running the binary in a controlled environment, such as an emulated environment or Virtual Machine (VM), and searching for patterns of Operating System (OS) calls or general system behaviour that are indicative of malicious behaviour. Static analysis has become less effective in recent years due to the fact that malware writers can circumvent detection methods using techniques such as code obfuscation and polymorphism [2,3]. As a result, behavioural analysis has gained popularity since it actually runs malware in its preferred environment, making it harder to evade detection completely.

∗ Corresponding author.

https://doi.org/10.1016/j.jisa.2019.102365

M. Nunes, P. Burnap and O. Rana et al. / Journal of Information Security and Applications 48 (2019) 102365

In order to conduct behavioural analysis, the sample being analysed must be executed in such a way that data relating to the sample's behaviour can be captured while it is running. That data can subsequently be used to train an automated machine learning classifier to distinguish malicious from benign software. One popular mechanism in the literature for understanding malware's behaviour during execution is capturing the calls made to the OS, i.e., system calls. In order to capture this information, a tool must create a hook into the OS or the monitored process. A hook modifies the standard execution pathway by inserting an additional piece of code into the pathway [4]. This is done in order to interrupt the normal flow of execution that occurs when a process makes a system call and subsequently document the event. There are a number of methods to hook system calls in Windows, and these fall into two general categories: those that run in user mode and those that run in kernel mode [4]. Kernel mode is one of the highest privilege levels that can be reached in the computer, whereas user mode is the privilege level that most applications and users operate at. The argument for hooking in user mode is that the code analysing the sample is "closer" to the application being analysed. By contrast, the argument for hooking in kernel mode is that the analysis program resides at a more elevated privilege level, making it harder for malware to hide from an analysis tool at this level.

The terms "user" and "kernel" mode are labels assigned to specific Intel x86 privilege rings built into Intel's microchips. Privilege rings relate to hardware-enforced access control. There are four privilege rings, ranging from ring 0 to ring 3 [5]. Windows only uses two of these rings, ring 0 and ring 3. Ring 0 has the highest privileges and is referred to as kernel mode by the Windows OS (this is the privilege level most drivers run at). Ring 3 has the least privileges and is referred to as user mode (the privilege level that most applications run at) [6]. We focus on Windows here because it is still the OS most targeted by malware, as reported in [1,7,8].

User-mode hooks tend to record only the system/API calls made by a single process, since they usually hook one process at a time, whilst kernel-mode hooks are capable of recording calls made by all the running processes at a global, system level. This is an important difference, as malware may choose to inject its code into a legitimate process and carry out its activities from there (where it is less likely to be blocked by the firewall). Alternatively, malware could divide its code into a number of independent processes, as proposed by Ramilli et al. [9], so that no single process is malicious in itself but, collectively, they succeed in achieving a malicious outcome. Therefore, the choice of hooking methodology could affect the quality of the data gained.

Another difference between kernel and user level hooks is that each one hooks into a different API. For example, one type of kernel level hook is to hook the System Service Descriptor Table (SSDT), whose calls are similar to those found in the native API, which is mostly undocumented, whilst user mode hooks typically hook the Win32 API, which is documented [10]. Although methods in the Win32 API essentially call methods in the native API, there may be some methods in the native API that are unique to it (since it is only supposed to be used by Windows developers) [11]. Likewise, there are some user level methods that do not make calls into the kernel. Therefore, it is of paramount importance that the difference in utility between data collected at each level is objectively studied, so that analysts can make an informed choice on which type of data collection method to use. Another factor that could affect the data collected is that, due to the differences between the various types of hooking methodologies, malware has to use different techniques to evade each hooking methodology, as mentioned by Shaid and Maarof [12]. Consequently, if a piece of malware is focused on avoiding a particular type of hooking methodology, it is likely that any analysts using that methodology to monitor malware will see a very different picture from those using another methodology.
Evasive methods are not uncommon; in fact, one study found evasive behaviour in over 40% of samples [13]. It should also be noted that the majority of the existing literature currently captures user level calls, as shown in Table A1 in the appendix. This suggests that the literature either believes that user level data has more utility than kernel level data or does not believe there to be a significant difference between user and kernel level data for the purposes of detecting malware (although there are kernel level tools available, they are not as popular as user level tools).
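The difference between a process-specific and a system-wide view can be illustrated with a small Python sketch. It counts a hypothetical stream of (process, call) events in two ways; the process names, call names and injection scenario are illustrative assumptions, not data from our experiments. It also models only a monitor that follows the sample's own process, whereas some user-level sandboxes can follow injected or child processes.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# A hypothetical event: (process name, system call name).
Event = Tuple[str, str]


def process_view(events: Iterable[Event], target: str) -> Counter:
    """Call frequencies seen by a monitor that only follows one process."""
    return Counter(call for proc, call in events if proc == target)


def global_view(events: Iterable[Event]) -> Counter:
    """Call frequencies seen by a system-wide monitor (every process)."""
    return Counter(call for _, call in events)


# Illustrative scenario: the sample injects into explorer.exe and
# performs its file and registry activity from inside that process.
events: List[Event] = [
    ("sample.exe", "NtOpenProcess"),
    ("sample.exe", "NtWriteVirtualMemory"),
    ("explorer.exe", "NtCreateFile"),
    ("explorer.exe", "NtSetValueKey"),
]

local = process_view(events, "sample.exe")
system = global_view(events)
print(sorted(local))   # ['NtOpenProcess', 'NtWriteVirtualMemory']
print(len(system))     # 4 distinct calls, including those made from explorer.exe
```

Under these assumptions, the file and registry activity only appears in the system-wide view, at the cost of also recording whatever unrelated processes do.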

Thus, given the aforementioned evasion concerns and fundamental differences in each class of hooking methodology, the motivation of this paper is to study the differences in data collection at kernel and user level, and to consider whether they affect a machine learning method's ability to classify the data. In addition, we provide insights into the utility of the different forms of data collected from a machine when observing potentially malicious behaviour. This is particularly important in the cyber-security domain, where the focus tends to be on the data analysis method over the data capturing method. We hypothesise that the features of malware that are used to differentiate it from benignware differ based on the data capturing method used. In order to test our hypothesis, we have created our own kernel driver that hooks the entire SSDT with the exception of one call. We chose to create our own kernel driver as many of the existing tools that hook the SSDT only monitor calls in a specific category (such as calls relating to the file system or registry) and provide no objective justification as to why they chose the calls they did (if they even make that information available). Therefore, we hook all the calls in the SSDT to ensure we do not miss any subtle details regarding malware behaviour, and in order to make an objective recommendation on the most important calls to hook when detecting malware. Our driver is also unique in that it collects the SSDT data at a global, system-wide level as opposed to a local, process-specific level. In doing this, we expect to determine whether collecting data at a global level assists in detecting malware or simply adds noise. In order to gather user level data to compare with our driver, we use Cuckoo Sandbox, since it is the most popular malware analysis tool operating at user level (as shown in Table A1 in the appendix). The data gathered from our driver and Cuckoo is then used to experiment with state-of-the-art machine learning techniques to better understand the implications of monitoring machine activity from different perspectives.
Alongside the general insights gained from classifying the data, we use feature ranking methods to provide insights into the behaviour of malware that the classifiers use in order to distinguish it. In the interests of transparency and reproducibility, we have also made the source code of our kernel driver available at [14] and the data from our experiments available at [15]. The driver can be installed on any system running 32-bit Windows XP and can easily be extended to run on Windows 7. In summary, the novel contributions of this paper are the following:

1. We perform the first objective comparison of the effectiveness of kernel and user level calls for the purposes of detecting malware;

2. We compare the usefulness of collecting data for malware detection at a global, system-wide level as opposed to a local, individual process level, providing novel insights into data science methods used within malware analysis;

3. We assess the benefits or otherwise of combining kernel and user level data for the purposes of detecting malware;

4. We identify the features contributing to the detection of malware at kernel and user level, and the number of features necessary to get similar classification results, providing valuable knowledge on the forms of system behaviour that are indicative of malicious activity;

5. We conduct an extensive survey of dynamic malware analysis tools used or proposed in the literature;

6. We create a driver that hooks all but one call in the SSDT and gathers calls at a global level, which can be used to extend and enhance our work.

The remainder of this paper is structured as follows: Section 2 further describes the various hooking methodologies and the motivation for this paper. Section 3 describes the various methodologies already employed in the literature to gather kernel calls. Section 4 discusses the experiments that were performed and the environment they were performed in. Section 5 presents and interprets the output from these experiments, and in Section 6, we summarise our work and outline the next steps.

Fig. 1. System call visualisation.

2. Problem definition

2.1. System call structure

In order to understand how system calls are hooked, it is important to first understand how system calls are structured. Fig. 1 provides an example of the structure of a call tree for a Windows system call. From user mode, a process may call CreateFileA, CreateFileW, NtCreateFile, or ZwCreateFile; ultimately, however, they all lead to the NtCreateFile method in the SSDT. In response to a system call being made, the processor must move from Ring 3 (user level) to Ring 0 (kernel level). It does this by issuing the sysenter instruction. Although CreateFileA has been shown to call NtCreateFile/ZwCreateFile in Fig. 1, strictly speaking, it calls CreateFileW. However, as they are provided by the same library, they are shown at the same level. From Fig. 1 it can be seen that, to get the same information within user mode that is available in kernel mode, more methods need to be hooked. The benefit of hooking in user mode, however, is that the analysis tool can observe finer details in the system calls made. Our aim in this research is to understand whether these details are helpful or irrelevant.
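The call tree in Fig. 1 can be written down as a mapping from each user-mode function to the function it calls next. The Python sketch below encodes a simplified version of the CreateFile example (the mapping is our simplification, not an exhaustive description of the Windows libraries). Following every entry point to the bottom shows that a single kernel-level hook on NtCreateFile sees all four paths, whereas a user-mode monitor must hook each entry point, or the lowest common one in every process, to achieve the same coverage.

```python
from typing import Dict, List, Set

# Simplified call tree for the CreateFile example in Fig. 1:
# each function maps to the function it calls next.
CALLS_NEXT: Dict[str, str] = {
    "CreateFileA": "CreateFileW",         # ANSI wrapper around the wide-char version
    "CreateFileW": "NtCreateFile",        # kernel32 calls into ntdll
    "ZwCreateFile": "NtCreateFile",       # in user mode, Zw* and Nt* name the same stub
    "NtCreateFile": "SSDT:NtCreateFile",  # the ntdll stub issues sysenter
}


def path_to_kernel(func: str) -> List[str]:
    """Follow the call chain from a user-mode function down to its SSDT entry."""
    path = [func]
    while not path[-1].startswith("SSDT:"):
        path.append(CALLS_NEXT[path[-1]])
    return path


entry_points = ["CreateFileA", "CreateFileW", "NtCreateFile", "ZwCreateFile"]
kernel_targets: Set[str] = {path_to_kernel(f)[-1] for f in entry_points}

print(path_to_kernel("CreateFileA"))
# ['CreateFileA', 'CreateFileW', 'NtCreateFile', 'SSDT:NtCreateFile']
print(kernel_targets)  # {'SSDT:NtCreateFile'}: one kernel hook covers every entry point
```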

2.2. System call hooking

Fig. 2 shows the hooking methods that can be used to intercept system calls, organised according to the privilege level at which they hook.

Fig. 2 shows that there are a number of ways to intercept API-calls using hooks, both at user level and at kernel level. Each works in a slightly different way. An Import Address Table (IAT) hook modifies a particular structure in a Portable Executable (PE) file. The PE file format refers to the structure of executables and DLLs in Windows [16]. IAT hooks exploit a feature of the PE file format: the imports that are listed in a PE file after compilation. An IAT hook modifies the imports so that an import points to an alternative piece of code instead of the legitimate function [11,17]. An inline hook is one in which the prologue of a function is replaced in memory with a jump to another piece of code [18]. In Windows, the first five bytes of most functions are the same; therefore, they can be replaced with a jump to an alternative piece of code where the system call can be logged, after which control can be returned to the original function (once the functionality of the overwritten first five bytes has been executed).
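The five-byte replacement used by inline hooks is typically an x86 relative near jump: the opcode 0xE9 followed by a signed 32-bit displacement measured from the end of the jump instruction. The Python sketch below only computes those patch bytes for hypothetical addresses; it does not write to any process's memory.

```python
import struct

JMP_REL32 = 0xE9  # opcode of the 5-byte x86 near jump with a 32-bit displacement


def jmp_patch(hooked_func: int, detour: int) -> bytes:
    """Bytes that overwrite the first five bytes at hooked_func with `JMP detour`.

    The displacement is measured from the end of the jump instruction
    (hooked_func + 5) and must fit in a signed 32-bit integer.
    """
    rel = detour - (hooked_func + 5)
    if not -2**31 <= rel < 2**31:
        raise ValueError("detour is outside the rel32 range")
    return struct.pack("<Bi", JMP_REL32, rel)


# Hypothetical addresses: hooked function at 0x00401000, logging stub at 0x00402000.
patch = jmp_patch(0x00401000, 0x00402000)
print(patch.hex())  # e9fb0f0000
```

A working hook would also copy the five overwritten bytes into a trampoline, so that they can be executed before control jumps back into the original function; that step is not shown here.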

Instrumentation refers to the insertion of additional code into a binary or system for the purpose of monitoring behaviour. Dynamic instrumentation implies that this occurs at runtime [19].

SSDT hooks modify a structure in kernel memory known as the System Service Descriptor Table (SSDT). The SSDT is a table of system call addresses that the OS consults to locate a call when it is invoked by a process. An SSDT hook replaces the system call addresses with addresses of alternative code [4,11]. In a Model Specific Register (MSR) hook, the value of a specific register is overwritten so that it holds the address of the code performing the hooking. This register is significant because, after a system call is made, its value is loaded into the EIP register (the register that points to the next instruction to be executed). MSR hooks are frequently employed by Virtual Machine Introspection (VMI) solutions. VMI refers to solutions where the analysis engine resides at the same privilege level as the hypervisor or Virtual Machine Monitor (VMM) [20]. The last method is IRP hooking (a similar goal can be achieved with filter drivers). I/O request packets, or IRPs, are used to communicate I/O requests to drivers. In IRP hooking, a driver intercepts another driver's IRPs [4,11]. Filter drivers are drivers that essentially sit on top of the driver for a device, meaning that they receive all the IRPs intended for that driver [21].
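Conceptually, an SSDT hook swaps entries in a table of routine addresses indexed by service number, with each replacement documenting the call before forwarding it to the original routine. The Python sketch below models only that idea, using ordinary functions as stand-ins; the service numbers, routine names and the `hook_table` helper are hypothetical. A real kernel hook must additionally deal with write-protected kernel memory, synchronisation and calling conventions, none of which is modelled. The `skip` parameter mirrors the possibility of leaving a call unhooked.

```python
from typing import Any, Callable, Dict, List, Tuple

Handler = Callable[..., Any]


def nt_create_file(path: str) -> str:
    """Stand-in for a kernel service routine (hypothetical)."""
    return f"handle:{path}"


def nt_set_value_key(key: str, value: str) -> bool:
    """Another stand-in service routine (hypothetical)."""
    return True


# Hypothetical service table: service number -> routine (standing in for an address).
ssdt: Dict[int, Handler] = {0: nt_create_file, 1: nt_set_value_key}
call_log: List[Tuple[int, tuple]] = []


def hook_table(table: Dict[int, Handler], skip: Tuple[int, ...] = ()) -> Dict[int, Handler]:
    """Replace each entry not in `skip` with a wrapper that logs, then forwards.

    Returns a copy of the original entries so the hooks can be removed later.
    """
    originals = dict(table)
    for number, original in originals.items():
        if number in skip:
            continue

        # Default arguments bind the current loop values to this wrapper.
        def wrapper(*args: Any, _n: int = number, _orig: Handler = original) -> Any:
            call_log.append((_n, args))  # document the event
            return _orig(*args)          # then let the real routine run
        table[number] = wrapper
    return originals


originals = hook_table(ssdt)
result = ssdt[0]("report.txt")  # a "system call" dispatched through the table
print(result)    # handle:report.txt
print(call_log)  # [(0, ('report.txt',))]
```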

There are a number of resources that describe each of the hooking methodologies in much more detail, such as [4,11,22]. As can be seen, the way each mechanism intercepts API-calls differs significantly, and each is therefore detected and evaded in a different manner. Furthermore, each mechanism hooks into different APIs (as mentioned previously), depending on whether it is a user-mode or kernel-mode hook. Given all these differences, there is a very real possibility that a tool hooking in user mode and monitoring a specific process will get different data from a tool monitoring the same process in kernel mode. This therefore raises the question of which privilege level gathers more beneficial data for the purposes of detecting malware. This is the question this paper attempts to answer.

3. Literature survey

In order to gain a better understanding of the tools used in the literature and the methods that the tools use to gather API-calls, we conducted an extensive review of the literature and noted which tool was used. The results of this are shown in Table A1 in the appendix. Table A1 contains five columns: "Name", which is the name of the tool; "Description", which describes the tool and the hooking methodology it uses; "Kernel Hook", which is marked if the tool employs a hook at kernel level; "User Hook", which is marked if the tool employs a hook at user level; and "Used By", which lists the papers that used that tool. For each tool mentioned in Table A1, if the tool was available online, we tested it in order to understand how it was intercepting API-calls. Where the tool was not available, we used documentation to determine the type of hook being used. To limit the length of the table, Table A1 only contains tools that had been used at least once in the literature (i.e., at least one entry in their "Used By" column).

Fig. 2. Hooking methodologies.

As can be seen in Table A1, the majority of tools used to gather API calls for the purposes of malware analysis use user level hooks (72%). Currently, the literature suggests that Cuckoo Sandbox is by far the most used tool. However, that does not mean that all papers using Cuckoo collected the same data, as it should be noted that Cuckoo can be enhanced to log additional API calls. Ultimately, all user level tools suffer from the same problem, in that they run at the same privilege level as the file they are monitoring and are therefore much easier to evade than kernel level tools. In terms of kernel data, there are a number of methods used in the literature to gather data at this level. These can roughly be grouped by the specific hooking method they employ to intercept calls. The four main categories of kernel-mode methods in the literature are: filter drivers, MSR hooks & Virtual Machine Introspection, Dynamic Binary Instrumentation (DBI), and System Service Descriptor Table (SSDT) hooks.

3.1. Filter drivers

Filter drivers do not communicate directly with the hardware but sit on top of lower-level drivers and intercept any data that comes their way. The most well-known tools using filter drivers are Procmon [23] and CaptureBAT [24]. Hăjmăşan et al. [25] take a similar approach to that taken by Procmon and develop a filter driver that registers with Windows callback functions [26] so that it is notified when any changes are made to the registry, file system, or processes. Zhang and Ma [27] take a novel approach by intercepting IRPs in their solution, MBMAS. They then use machine learning to classify sequences of IRPs as malicious or benign. However, the limitation of filter drivers is that they cannot intercept the same breadth of API-calls that other hooking methodologies can. They focus on the major operations in particular categories (such as the file system and registry).

3.2. Model specific register hook

A Model Specific Register (MSR) hook essentially hooks the sysenter instruction. More specifically, it involves changing the value of a processor-specific register referred to as the SYSENTER_EIP_MSR register. This register normally holds the address of the next instruction to execute when sysenter is called (which happens every time a system call is made). Therefore, if this value is altered, when the sysenter instruction is called, the processor will jump to the address pointed to by the new value in the register (which in this case can point to the analysis engine). Since an MSR hook modifies a processor-specific register, developers need to ensure that they modify the register on each processor (since most systems nowadays contain multiple processors) [6]. There are few examples of an MSR hook being used as a standalone method in the literature. Usually, it is employed in the context of VMI solutions.
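The per-processor requirement is easy to overlook. The Python sketch below models each logical processor as holding its own copy of the register (architecturally named IA32_SYSENTER_EIP, MSR 0x176) and shows that writing the hook address on only one processor leaves system calls issued on the others unmonitored. The `Cpu` class, processor count and addresses are hypothetical; a real driver would execute the WRMSR instruction on each processor in turn.

```python
from dataclasses import dataclass, field
from typing import Dict, List

IA32_SYSENTER_EIP = 0x176  # MSR read by sysenter to find the kernel entry point

ORIGINAL_ENTRY = 0x8053D600  # hypothetical address of the OS's system call handler
HOOK_ENTRY = 0xF8A01000      # hypothetical address of the analysis code


@dataclass
class Cpu:
    """A toy logical processor holding its own copy of each MSR."""
    msrs: Dict[int, int] = field(default_factory=dict)

    def sysenter_target(self) -> int:
        return self.msrs[IA32_SYSENTER_EIP]


def install_msr_hook(cpus: List[Cpu], hook: int) -> List[int]:
    """Point SYSENTER_EIP at `hook` on every given processor; return saved values."""
    saved = []
    for cpu in cpus:
        saved.append(cpu.msrs[IA32_SYSENTER_EIP])
        cpu.msrs[IA32_SYSENTER_EIP] = hook
    return saved


def fresh_cpus(n: int) -> List[Cpu]:
    return [Cpu({IA32_SYSENTER_EIP: ORIGINAL_ENTRY}) for _ in range(n)]


# Hooking every processor: all system calls reach the analysis code.
cpus = fresh_cpus(4)
saved = install_msr_hook(cpus, HOOK_ENTRY)

# Hooking only the first processor: calls issued on the other three bypass it.
partial = fresh_cpus(4)
install_msr_hook(partial[:1], HOOK_ENTRY)
print(sum(cpu.sysenter_target() == HOOK_ENTRY for cpu in partial))  # 1
```

Keeping the saved values allows the original handler address to be restored on every processor when the hook is removed.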

VMI refers to tools that operate at the same level as the hypervisor. This provides benefits such as the ability to monitor a VM without having a large presence on the VM (thereby making it harder for malware to detect the presence of the analysis engine). The difficulty with monitoring at this level is that a "semantic gap" must be bridged in some way. The semantic gap refers to the fact that, when monitoring at the VMM layer, much of the data available is very low level (such as register values). This data is not at a level of granularity that is easy to interpret. Therefore, in order to bridge that gap, solutions use a number of techniques to convert these values into more abstract values. For example, as mentioned previously, VMI solutions use a variation of the MSR hook whereby, instead of placing the address of the analysis solution into the SYSENTER_EIP_MSR register, an invalid value is placed into that register. As a result, every time a system call is made and sysenter is called, a page fault will occur. This will in turn lead to a VMEXIT, which passes control to the VMI tool (since it operates at the same level as the hypervisor). The VMI tool must then examine the value of the EAX register in order to find out which system call was made. Since monitoring system calls in this manner can have a significant impact on performance, VMI tools usually limit their monitoring to a particular process. To achieve this, the tool must monitor for any changes in the CR3 register. The CR3 register contains the base address of the page directory of the currently running process; therefore, if the page directory address of the process of interest is known, then system calls can be filtered to only those emanating from the process of interest.
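The CR3-based filtering step, together with translating EAX values into call names, can be sketched as follows. Each trapped system call is represented by the CR3 value (page-directory base) and EAX value (service number) captured at the exit, and calls are kept only if CR3 matches the process of interest. The record format, CR3 values and the service-number table are hypothetical assumptions; real service numbers differ between Windows builds, which is one reason the semantic gap is hard to bridge.

```python
from typing import Dict, Iterable, List, NamedTuple


class TrappedCall(NamedTuple):
    """What a VMI tool might record at each trapped system call (hypothetical format)."""
    cr3: int  # page-directory base of the process that was running
    eax: int  # service number the guest passed in EAX


# Hypothetical service-number table for one specific Windows build.
SERVICE_NAMES: Dict[int, str] = {0: "NtCreateFile", 1: "NtSetValueKey", 2: "NtOpenProcess"}


def calls_for_process(trapped: Iterable[TrappedCall], target_cr3: int) -> List[str]:
    """Keep calls made while the process of interest was scheduled and translate
    raw EAX values into service names (a small step across the semantic gap)."""
    return [SERVICE_NAMES.get(t.eax, f"unknown_{t.eax:#x}")
            for t in trapped if t.cr3 == target_cr3]


trace = [TrappedCall(0x0A5C0000, 0), TrappedCall(0x00039000, 2), TrappedCall(0x0A5C0000, 1)]
print(calls_for_process(trace, 0x0A5C0000))  # ['NtCreateFile', 'NtSetValueKey']
```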

There are a number of VMI solutions in the literature. TTAnalyze [28] is one of the best known tools employing VMI. TTAnalyze executes malware in an emulated environment (QEMU [29]) as opposed to a virtual one. Unlike virtual environments (where most instructions are executed on the processor), in emulated environments all instructions are emulated in software. This, the authors explain, makes it harder for malware to detect that it is not in a real environment, since a real system can be mimicked perfectly. However, this comes at the expense of performance, as samples are executed significantly more slowly. Another well-known tool in this domain is Panorama [30]. Panorama is built on top of TEMU [31] (the dynamic analysis component of BitBlaze [31], which can perform whole-system instruction-level monitoring) and performs fine-grained taint analysis by monitoring any data touched by the executable being analysed. Its contribution lies in the fine-grained taint tracking it performs, even recording keystrokes, among many other things. Ether [32] is a VMI tool that differs by exploiting Intel VT [33], which enables hardware virtualisation and provides a significant performance boost when running a VM. Ether is also particularly focused on not being detectable by malware and, as such, has very little presence on the guest machine. Osiris [34] is similar to Ether; however, it manages to perform an even more complete analysis by also monitoring any processes into which the original process injects its code. Lengyel et al. [35] propose DRAKVUF, which focuses more on reducing the presence of an analysis engine on the guest machine, as normally there is some code present on the guest to run the process being monitored or to help the VMI solution with the analysis. DRAKVUF instead employs a novel method to execute malware using process injection and therefore does not require any additional software to be present on the guest. In addition, it monitors calls at both user and kernel level. Pék and Buttyán [36] take a different approach by using invalid opcode exceptions instead of breakpoints to intercept system calls. Invalid opcode exceptions are raised if a system call is made while system calls are disabled. This, they argue, gives better performance. In addition, their monitoring solution is not paired with a hypervisor but exploits a vulnerability [37] to virtualise a live system, forgoing the need for a reboot to install the monitoring solution.

While it is clear that significant progress has been made with VMM solutions, there is still a delay overhead incurred by the mechanism (breakpoints/page faults) that is typically used to monitor API-calls. Ether, a well-known tool in this genre, was shown to have a slowdown of approximately 3000 times [38]. This, among other things, makes it easier for malware to detect the presence of a monitoring tool. Furthermore, while some solutions have managed to remove much of the presence of the analysis component from the machine being monitored, this has the unfortunate effect of making it even more challenging to bridge the semantic gap.

3.3. Dynamic binary instrumentation (DBI)

Dynamic Binary Instrumentation refers to the analysis of an executable through the injection of additional code into the source or compiled code at runtime. This is usually implemented using a Just-in-Time (JIT) compiler. In DBI, code is executed in basic blocks, and the code at the end of each block is modified so that control is passed to the analysis engine, where it can perform a number of checks, such as whether a system call is being executed [39,40]. Two of the most popular frameworks for achieving dynamic instrumentation in Windows are DynamoRIO [39] and Intel Pin [41].

The main limitation of solutions using JIT compilation is Self-Modifying and Self-Checking (SM-SC) code, since DBI solutions can be detected by the modifications they make to the code. Therefore, SPiKE [42] was proposed as an improvement on such tools since, uniquely, it did not use a JIT compiler but breakpoints in memory. Specifically, it employs "stealth breakpoints" [43] that retain many of the properties of hardware breakpoints but do not suffer from the limitation of pure hardware breakpoints, which only allow the user to set between two and four. By using such breakpoints, the presence of the monitoring tool is harder to detect and the tool is more immune to SM-SC code. Reportedly, this even brought a performance gain. Polino et al. [40] built their solution, Arancino, on top of Intel Pin; it is focused on countering all known anti-instrumentation techniques that malware employs to evade detection. They achieve this through the use of a number of heuristics.

The problems that solutions in this space suffer from are performance and remaining undetectable by malware. Though the authors of [40] make a considerable effort towards improving this, they admit that their solution is unlikely to be undetectable.

3.4. SSDT hooks

This is the method chosen in this paper to monitor API calls at kernel level. We chose to use an SSDT hook over a filter driver, MSR hook, or DBI tool for a number of reasons. A filter driver tends to obtain the results of a system call as opposed to the exact system calls made. VMM-layer monitors and DBI tools, meanwhile, can suffer from a significant delay due to the manner in which they intercept system calls, allowing malware to detect a monitor by measuring the delay when performing specific actions. In addition, it can be difficult to deal with SM-SC code with such tools. Furthermore, bridging the semantic gap whilst keeping transparency can be extremely challenging. Ultimately, no method is without its limitations (including the SSDT hook), but we chose to use an SSDT hook since it has the most similarities in implementation to a user-level hook (except that it hooks into the undocumented kernel) and the data returned from it is analogous to that returned from a user level hook. It therefore seems the most suitable for the purposes of a comparison. An SSDT hook also has the benefit of not modifying anything on disk (since the SSDT is modified in memory) and therefore leaves a smaller footprint on the analysis machine.

While SSDT hooks have been used previously, they have not had as comprehensive a coverage of calls as ours has. Li et al. [44] employed an SSDT hook to automatically build infection graphs and construct signatures for their system, AGIS (Automatic Generation of Infection Signatures). AGIS then monitors a program to see if it contravenes a security policy and matches a signature. Therefore, it only focuses on calls from a specific process and ignores all other calls. Kirat et al. [45] propose BareBox to counter the problems associated with malware capable of detecting that it is being run in a virtual environment. BareBox runs malware on a real system and is capable of restoring the state of a machine to a previous snapshot within four seconds. BareBox monitors what the authors perceive to be important system calls using an SSDT hook. However, as the number of devices attached to the machine increases, the time it takes BareBox to restore the system to a clean state increases considerably. Grégio et al. [46] propose BehEMOT (Behaviour Evaluation from Malware Observation Tool), which analyses malware in an emulated environment first, then in a real environment if it does not run within the emulated environment. They use an SSDT hook to monitor API calls relating to certain operations. However, by performing analysis in a real environment, BehEMOT suffers from a similar problem to BareBox in relation to restoration time. Furthermore, the focus of BehEMOT seems to be producing human-readable and concise reports after each analysis, and therefore only small-scale tests were conducted on a handful of samples.

As mentioned previously, where our solution differs is that previous solutions using SSDT hooks only log certain API calls made by certain processes. Our tool logs all calls (except one) made by all processes in order to determine their utility in classification. TEMU is the only tool to offer similar functionality; however, it differs in that it runs in an emulated environment (which is easier for malware to detect [47]) and is focused on providing instruction-level details as opposed to high-level system calls.

Fig. 3. How the AUC responds as sample size is increased.

4. Method & implementation

In order to conduct the experiments required for our study, 2500 malicious samples were obtained from VirusShare [48] and 2500 clean samples were obtained from SourceForge [49] and FileHippo [50]. In order to select an appropriate sample size, we conducted a series of classification experiments (described later) on different sample sizes and monitored the trend in the Receiver Operating Characteristic (ROC) Area Under the Curve (AUC) results (ROC AUC is described later). In these experiments, we varied the sample size from 100 samples up to over 2000 (in increments of 100) and, for each sample size, we trained the leading classifiers (using 10-fold cross-validation) and noted the ROC AUC returned by each classifier. We then plotted the ROC AUC against the sample size and observed when the curve plateaued for each classifier. The results are shown in Fig. 3.
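The sample-size procedure can be sketched with scikit-learn. The sketch below is illustrative only: it uses synthetic data from `make_classification` as a stand-in for our call-frequency features, shows a single classifier (Random Forest) rather than all of them, and the helper name `auc_by_sample_size` and generator settings are our own assumptions. It therefore does not reproduce Fig. 3. It grows the data in steps of 100 and records the mean 10-fold cross-validated ROC AUC at each size.

```python
from typing import Dict

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score


def auc_by_sample_size(max_size: int = 2000, step: int = 100, seed: int = 0) -> Dict[int, float]:
    """Mean 10-fold ROC AUC of a Random Forest for increasing sample sizes.

    Synthetic, roughly balanced data stand in for the real feature vectors.
    """
    X, y = make_classification(n_samples=max_size, n_features=50,
                               n_informative=10, random_state=seed)
    results: Dict[int, float] = {}
    for n in range(step, max_size + 1, step):
        clf = RandomForestClassifier(n_estimators=100, random_state=seed)
        # For a classifier, cv=10 uses stratified folds, keeping class ratios per fold.
        scores = cross_val_score(clf, X[:n], y[:n], cv=10, scoring="roc_auc")
        results[n] = float(scores.mean())
    return results
```

For example, `auc_by_sample_size(max_size=2000, step=100)` returns one mean AUC for each size from 100 to 2000. Plotting these values against the sample size and looking for where the curve flattens corresponds to the visual check described above.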

Fig. 3 shows that after 1000 samples, the AUC values almost completely plateau. This suggests that, beyond this point, adding more samples will have an insignificant effect on the classification results. Therefore, we concluded that 2500 samples would be more than enough. In addition, this sample size is consistent with the dataset sizes used in the literature [51–54]. The categories of malware in our dataset are shown in Table 1. This information was obtained from VirusTotal [55]. With regard to the clean samples, each was run through VirusTotal to ensure that it was not malicious.

To gather calls made to the SSDT, we wrote a Windows kernel driver to hook all but one kernel call in the SSDT, since none of the tools currently available provide this. The only call we did not hook, NtContinue, was left unhooked because hooking it produced critical system errors. Our kernel driver gathers global data from a system perspective as opposed to simply monitoring calls from a single process introduced into the system. Therefore, the data from the tool can be used to predict whether the machine's state is malicious or not. To gather user level data, we chose to use a readily available tool, since there are already well-established solutions providing this. Specifically, we chose to use the tool most frequently mentioned in the existing literature: Cuckoo (specifically, Cuckoo 2.0.3). Cuckoo is a sandbox capable of performing automated malware analysis.

Table 1
Quantity of each category of malware in our dataset.

Category      Quantity
Trojan        1846
Virus         458
Worm          86
Rootkit       34
Ransomware    23
Adware        22
Keylogger     2
Spyware       2

The experiments were carried out on a virtual machine with Windows XP SP3 installed. We chose to use Windows XP as writing a kernel driver, particularly one delving into undocumented parts of Windows, is frustratingly challenging. This, however, is made slightly easier in Windows XP due to the fact that it has slowly become more documented through reverse engineering. In addition, all 64-bit systems are backwards compatible with 32-bit binaries [56], and the most commonly prevailing malware samples in the wild are also 32-bit [57] (with not a single 64-bit sample appearing in the top ten most common samples). As of 2016, AV-TEST found that 99.69% of malware for Windows was 32-bit [58]. The reason for the popularity of 32-bit malware samples over 64-bit ones is that their scope is not limited to one architecture. Therefore, given the current prevalence of 32-bit malware, we did not consider that using Windows XP would make our results any less relevant, especially since our method could be repeated on other versions of Windows and using Windows XP simplifies the already challenging engineering task. The host OS was Ubuntu 16.04 and the hypervisor used was VirtualBox [59]. Both the host and guest machines had a connection to the Internet. In order to ensure fairness and to provide automation, sandbox features identical to those in Cuckoo (such as simulated human interaction) were implemented for our kernel driver.

Fig. 4. Workflow diagram of our proposed system's pipeline.

Fig. 4 shows our system diagram describing the entire experimental process used to obtain the results.

Our kernel driver creates one CSV file for each system call. A new line is written to each file every time the system call associated with the file is called. After the analysis, a shared folder is used to transfer the CSV files to the analysis machine. Cuckoo operates in a similar manner; however, it uses network connections to transfer analysis files from the VM to the host machine, after which we transfer the JSON file to the analysis machine. We encode the output produced by each of the monitoring tools using a frequency histogram of calls within a two-minute period. This feature representation is used to fit a classification model for malware detection.
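The frequency-histogram encoding can be illustrated with a short sketch. It assumes, hypothetically, that each tool's output has already been parsed into `(timestamp_seconds, call_name)` pairs (for the kernel driver, one pair per CSV line; for Cuckoo, one per logged API call) and that a fixed vocabulary of call names defines the feature order. The helper name `call_histogram` and this input format are illustrative, not taken from the original pipeline.

```python
from collections import Counter


def call_histogram(calls, vocabulary, window_start=0.0, window_seconds=120.0):
    """Encode a call trace as a frequency histogram over a fixed vocabulary.

    calls: iterable of (timestamp_seconds, call_name) pairs (assumed format).
    vocabulary: ordered list of hooked call names; fixes the feature order.
    Only calls inside [window_start, window_start + window_seconds) are counted.
    """
    window_end = window_start + window_seconds
    counts = Counter(
        name for timestamp, name in calls if window_start <= timestamp < window_end
    )
    # Calls absent from the trace get 0; calls outside the vocabulary are dropped.
    return [counts[name] for name in vocabulary]


if __name__ == "__main__":
    vocab = ["NtOpenFile", "NtReadFile", "NtWriteFile"]
    trace = [(0.5, "NtOpenFile"), (1.0, "NtReadFile"), (2.0, "NtReadFile"),
             (130.0, "NtWriteFile")]
    print(call_histogram(trace, vocab))  # [1, 2, 0]: the write falls outside 120 s
```

Each sample then becomes one fixed-length row of counts, which is the shape a scikit-learn classifier expects.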

4.1. Initial experiments' parameters

The transformed data from Cuckoo and the kernel driver was then classified using a selection of machine learning algorithms provided by scikit-learn [60]. The machine learning algorithms chosen were drawn from the existing literature, as the focus of this research is on the utility of the different views of machine-level actions (user vs kernel) rather than on new classification algorithms. The classification algorithms we used were AdaBoost, Decision Tree, Linear SVM, Nearest Neighbours, and Random Forest. We chose these algorithms because both Decision Trees and SVMs are used widely in the literature [61–66]. Random Forest, while not used as frequently, achieved impressive results when it was used [61,65,67,68], as did AdaBoost [61]. In addition, though AdaBoost is an ensemble method like Random Forest, it belongs to a different class of ensemble algorithms that use boosting as opposed to bagging (like Random Forest), and therefore may also be capable of strong results. Finally, Nearest Neighbours was chosen due to its simplicity, in order to set a baseline. Each of these methods is very well documented; briefly, however, AdaBoost [69] is a collection of weak classifiers (frequently Decision Trees) on which the data is repeatedly fitted with adjusted weights (usually weighting misclassified samples more heavily) until, together, the classifiers produce a suitable classification score or a certain number of iterations is complete. Decision Trees [70] create if-then rules from the training data, which they then use to make decisions on unseen data. The K-Nearest Neighbours method computes the distance from a new observation to the training points and assigns it the majority label among the k closest of them. SVMs [71] separate the data by finding the hyperplane that maximises its distance to the nearest training points in each class. Random Forest [72], like AdaBoost, is a collection of classifiers, and, like AdaBoost, the classifiers are all decision trees. However, AdaBoost tends to employ shallow decision trees while Random Forest tends to use deep decision trees. Random Forest trains each decision tree on a bootstrap sample of the dataset (considering a random subset of features at each split) and then averages the trees' predictions.
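The reweighting step that distinguishes boosting from bagging can be made concrete. The sketch below is one round of textbook discrete AdaBoost for binary labels (+1 malicious, -1 benign); it is an illustration, not the configuration used in the experiments. scikit-learn's `AdaBoostClassifier` implements the closely related SAMME family, which scales the learner weight `alpha` differently, and the paper does not report its AdaBoost settings.

```python
import math


def adaboost_reweight(weights, y_true, y_pred):
    """One round of discrete AdaBoost sample reweighting for binary labels.

    Labels are +1 (malicious) or -1 (benign). Returns (new_weights, alpha),
    where alpha is the weak learner's vote in the final weighted ensemble.
    """
    total = sum(weights)
    # Weighted error rate of the current weak learner.
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p) / total
    err = min(max(err, 1e-10), 1 - 1e-10)  # keep the logarithm finite
    alpha = 0.5 * math.log((1 - err) / err)
    # t * p is -1 for a mistake, so misclassified samples are scaled by e^alpha
    # and correctly classified ones by e^-alpha.
    updated = [w * math.exp(-alpha * t * p) for w, t, p in zip(weights, y_true, y_pred)]
    norm = sum(updated)
    return [w / norm for w in updated], alpha


if __name__ == "__main__":
    w, a = adaboost_reweight([0.25] * 4, [1, 1, -1, -1], [1, 1, -1, 1])
    print([round(x, 3) for x in w], round(a, 3))  # [0.167, 0.167, 0.167, 0.5] 0.549
```

After one round, the single misclassified sample carries half of the total weight, so the next weak learner is pushed to get it right.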

For each classifier, the data was split using 10-fold cross-validation, as this is the standard in this field [54,61,63,73]. It is possible to obtain a number of metrics relating to the performance of the classifiers, of which we have chosen to use the Area Under the Receiver Operating Characteristic (ROC) Curve (AUC), Accuracy, Precision, and F-Measure, since these are the metrics commonly reported in the literature [51,63,64,68,74] and they provide a complete view of the performance of the algorithm without missing out on subtle details (such as the number of false positives). To understand these measures in this context, it is important to define a few basic terms. We interpret True Positives (TP) as malicious samples that are correctly labelled by the classifier as malicious. False Positives (FP) are benign samples that are incorrectly predicted to be malicious. True Negatives (TN) are benign samples that are correctly classified as benign. False Negatives (FN) are malicious samples that are incorrectly classified as benign. With regards to the actual measures used, AUC relates to ROC curves. ROC curves plot the True Positive Rate (TPR) against the False Positive Rate (FPR). FPR is the fraction of benign samples misclassified as malicious, while TPR represents the proportion of malicious samples correctly classified. A ROC curve shows how these values vary as the classifier's threshold is altered, and therefore the AUC is a good measure of a classifier's performance. Accuracy can be described as all the correct predictions (malicious and benign) divided by the total number of predictions. Precision is the number of correctly labelled malware samples divided by the sum of the correctly labelled malicious samples and the incorrectly labelled clean samples ($TP/(TP+FP)$). This gives us the proportion of correctly labelled malware in comparison to all samples labelled as malware. Recall is the number of correctly labelled malicious samples divided by the sum of the correctly labelled malicious samples and the incorrectly labelled malicious samples ($TP/(TP+FN)$). This tells us the proportion of malicious samples that are correctly identified. We chose to include precision since false positives are a common issue in malware detection. Recall was not included for brevity, and since it can be quickly calculated from the F-Measure (which is included), the harmonic mean of precision and recall, together with precision.
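The metric definitions above translate directly into code. This is a minimal, dependency-free sketch for illustration (the experiments themselves used scikit-learn); `recall_from` shows how recall is recovered by inverting the F-Measure, and `auc` uses the standard equivalence between the ROC AUC and the probability that a randomly chosen malicious sample scores higher than a randomly chosen benign one. The O(n·m) pairwise loop is chosen for clarity, not speed, and the helper names are hypothetical.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall (TPR), FPR and F-measure from confusion counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)      # true positive rate
    fpr = fp / (fp + tn)         # false positive rate
    f_measure = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "fpr": fpr, "f_measure": f_measure}


def recall_from(f_measure, precision):
    """Invert F = 2PR / (P + R) to recover recall: R = F * P / (2P - F)."""
    return f_measure * precision / (2 * precision - f_measure)


def auc(malicious_scores, benign_scores):
    """AUC as the probability that a random malicious sample outscores a random
    benign one (ties count one half); this equals the area under the ROC curve."""
    wins = 0.0
    for m in malicious_scores:
        for b in benign_scores:
            wins += 1.0 if m > b else 0.5 if m == b else 0.0
    return wins / (len(malicious_scores) * len(benign_scores))


if __name__ == "__main__":
    m = confusion_metrics(tp=90, fp=10, tn=85, fn=15)
    print(round(m["accuracy"], 3), round(m["precision"], 3), round(m["f_measure"], 3))
    print(round(recall_from(m["f_measure"], m["precision"]), 3))  # 0.857, i.e. 90/105
    print(round(auc([0.9, 0.8, 0.4], [0.5, 0.3]), 3))             # 0.833
```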

In order to confirm whether the differences in classification results were statistically significant or due to randomness, we conducted 10-fold cross-validation 100 times for each classifier. This gave us 1000 AUC values for each classifier. We then checked whether the 1000 values were normally distributed using Q-Q plots of the AUC values against a normal distribution. Provided the data was normal, we then performed Welch's t-test [75] in order to determine whether the differences between the classification results were statistically significant or not (with our significance level, $\alpha$, set to 5%, as is commonly used). We used Welch's t-test due to its robustness and widespread recommendation in the literature [76,77].
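For reference, Welch's t-test can be computed with the standard library alone. In this sketch the two-sided p-value uses a normal approximation to the t distribution, which is reasonable when the Welch-Satterthwaite degrees of freedom are large (as with 1000 AUC values per group) but slightly understates p for small samples; `scipy.stats.ttest_ind(a, b, equal_var=False)` gives the exact t-distribution p-value. The function name `welch_t_test` is illustrative.

```python
import math
from statistics import mean, variance


def welch_t_test(a, b):
    """Welch's unequal-variance t-test for two independent samples.

    Returns (t, df, p): df is the Welch-Satterthwaite approximation and p is a
    two-sided p-value from a normal approximation to the t distribution.
    """
    se2_a = variance(a) / len(a)  # squared standard error of each mean
    se2_b = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1)
    )
    # P(|Z| > |t|) = erfc(|t| / sqrt(2)); erfc keeps precision in the far tail,
    # which matters for p-values as small as those reported in Table 3.
    p = math.erfc(abs(t) / math.sqrt(2))
    return t, df, p


if __name__ == "__main__":
    t, df, p = welch_t_test([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
    print(t, df, round(p, 4))  # -1.0 8.0 0.3173
```

The null hypothesis of equal means is rejected when `p` falls below the chosen significance level.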

In addition, in order to gain insight into whether collecting data at a global level is more beneficial for classifying malware, the API calls logged by the kernel driver were reduced to just those coming from the process that was being monitored (and any child processes that it created). Finally, the same data from Cuckoo and our kernel driver was combined. This was done to see if the combination of user and kernel level data can improve classification results.

Fig. 5. Example graph of feature ranking mechanism.

4.2. Individual feature ranking

To further understand the data recorded from the kernel and user level, and to confirm whether the features being used differ depending on the data collection method, we ranked features by importance using two metrics for the classifier that had the best results. For the first metric, we put the data from one feature (or API-call) at a time through each classifier and noted the classifier's AUC score in differentiating malicious from clean using only the data from that feature. We refer to this as the independent feature ranking method. This method can give an indication of the strength of individual features. Where it falls short, however, is in its ability to account for the relationship between features. For example, a feature on its own may not be that strong, but when paired with another, may be very strong. Therefore, to account for that, we also ranked features using each classifier's in-built feature ranking mechanism (which we refer to as the in-built feature ranking method). This ranking mechanism works in different ways depending on the classifier used. For Decision Trees, scikit-learn uses the Gini importance as described in [70]. The same is true for Random Forests and AdaBoost, since they are composed of a multitude of Decision Trees, the only difference being that the importance is averaged over the trees. Finally, with Linear SVMs, the coefficients assigned to each feature are used to rank them. K-Nearest Neighbours has no in-built feature ranking mechanism; therefore, we do not include it in this measure.
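The independent feature ranking method can be sketched as follows. The paper fits each classifier to one feature at a time; to keep the sketch runnable without a machine-learning library, the default scorer here substitutes a simpler model (the raw feature value used as a score, i.e. a threshold on the call count), so this is an approximation of the method rather than a reproduction of it. The `score` argument can be replaced by a function that fits a classifier and returns its cross-validated AUC. For the in-built method, scikit-learn exposes `feature_importances_` on tree ensembles and `coef_` on linear SVMs.

```python
def single_feature_auc(values, labels):
    """AUC obtained by scoring samples with one feature's raw value.

    labels: 1 for malicious, 0 for clean. Ties count one half.
    """
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def independent_ranking(rows, labels, feature_names, score=single_feature_auc):
    """Rank features by how well each one separates the classes on its own."""
    ranked = []
    for j, name in enumerate(feature_names):
        column = [row[j] for row in rows]
        auc = score(column, labels)
        # A model fitted to one feature can learn either direction, so an AUC
        # below 0.5 is as informative as its mirror image above 0.5.
        ranked.append((name, max(auc, 1.0 - auc)))
    return sorted(ranked, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    rows = [[5, 0], [4, 1], [1, 1], [0, 0]]  # hypothetical call counts per sample
    labels = [1, 1, 0, 0]
    print(independent_ranking(rows, labels, ["NtOpenEvent", "NtReadFile"]))
    # [('NtOpenEvent', 1.0), ('NtReadFile', 0.5)]
```

As the text notes, a ranking like this cannot see interactions: a feature scoring 0.5 here may still be valuable in combination with others.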

In order to verify that both of the feature ranking methods were selecting features that are optimal, and that the results they produced could be relied on, we created a plot by calculating the AUC using only the top 'x' features, where 'x' was gradually increased from 10 in increments of 10 up to the total number of features. In addition, this would show the minimum number of features necessary to obtain similar classification results.
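The top-x procedure amounts to a simple sweep over prefix lengths of the ranked feature list. In this sketch, `evaluate` is a hypothetical caller-supplied function (for example, one that retrains a classifier on only those columns and returns its cross-validated AUC); always finishing with the complete feature set is an assumption about how "up to the total number of features" is handled when the total is not a multiple of 10.

```python
def top_x_sweep(ranked_features, evaluate, step=10):
    """AUC for the top-x features, x = step, 2*step, ..., up to all features.

    ranked_features: feature names ordered best first.
    evaluate: caller-supplied function mapping a feature subset to an AUC.
    """
    n = len(ranked_features)
    sizes = list(range(step, n + 1, step))
    if not sizes or sizes[-1] != n:
        sizes.append(n)  # always finish with the complete feature set
    return [(x, evaluate(ranked_features[:x])) for x in sizes]


if __name__ == "__main__":
    names = [f"call_{i}" for i in range(25)]
    # Stand-in evaluator: pretend AUC saturates once 20 features are available.
    print(top_x_sweep(names, lambda subset: min(len(subset), 20) / 20))
    # [(10, 0.5), (20, 1.0), (25, 1.0)]
```

Plotting the returned pairs gives curves of the kind shown in Figs. 7 and 8, where the point at which the curve flattens indicates how many hooked calls suffice.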

4.3. Complete feature ranking

In order to gain a more consistent but concise view of which features seemed to be assigned a high importance, we created an aggregate measure to rank features across all the classifiers. We applied it to both the in-built and independent feature ranking methods. This shows which features are robust, since the previous measure only shows the top ten for the best classifier, which could arguably be skewed in its favour. The aggregate measure was calculated as follows. For each classifier, the features were ranked according to the score they were given by the independent or in-built feature ranking method. Then, the rank was plotted on the x-axis from 0 (the best rank) to the total number of API-calls (the worst rank). On the y-axis was a score from 0 to 1, and at each rank $1/n_{\mathrm{classifiers}}$ was added to the score. Once this was done, we found the area under the curve, which represented the total strength of the feature across all classifiers. This global feature ranking method can be used with any local feature ranking method. Fig. 5 shows an example of this global feature ranking method. In Fig. 5, the feature in question has been given the ranks 0, 20, 50, and 200 in the four classifiers it was used with. At each rank, the value has gone up by 1/4 (since there are four classifiers). If a feature was ranked as the most useful feature across all classifiers, its ranks would be 0, 0, 0, and 0, and therefore the area under the curve for it would be 1.
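For the area of a feature ranked 0 by every classifier to equal 1, as stated above, the x-axis must be normalised by the total number of API calls $N$. Under that assumption, with $k$ classifiers and ranks $r_i$, each classifier contributes a rectangle of height $1/k$ and width $(N - r_i)/N$, giving the closed form $\frac{1}{kN}\sum_{i=1}^{k}(N - r_i)$. The sketch below implements that reading; `N = 400` in the example is hypothetical, since the total number of calls is not restated in this section.

```python
def global_rank_score(ranks, n_features):
    """Area under the cumulative step curve built from one feature's ranks.

    ranks: the feature's rank under each classifier (0 = best).
    n_features: total number of API calls N; the x-axis is normalised to [0, 1].
    At each rank the curve steps up by 1/k, so each classifier contributes a
    rectangle of height 1/k and width (N - r) / N.
    """
    k = len(ranks)
    return sum(n_features - r for r in ranks) / (k * n_features)


def global_ranking(rank_table, n_features):
    """rank_table: {feature: [rank per classifier]} -> features sorted by score."""
    scores = {f: global_rank_score(r, n_features) for f, r in rank_table.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    print(global_rank_score([0, 0, 0, 0], 400))      # 1.0, the best possible score
    print(global_rank_score([0, 20, 50, 200], 400))  # 0.83125 (ranks as in Fig. 5)
```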

5. Results

In this section, we show the results from classifying data collected at a kernel and user level. In addition, in order to further understand the contributing factors to the results for the kernel data, we conduct additional experiments with modified forms of the data. Finally, in order to gain a better understanding of the results, we look at the ten most significant features to understand what the machine learning algorithms are using to identify malware.

5.1. Initial experiments

The results from classifying data collected using the kernel driver at a global level and data collected from Cuckoo are shown in Table 2.

Table 2
Comparison of classification results of data from Cuckoo and Kernel driver.
Machine learning algorithm | Kernel driver AUC | Kernel driver Accuracy (%) | Kernel driver Precision | Kernel driver F-measure | Cuckoo AUC | Cuckoo Accuracy (%) | Cuckoo Precision | Cuckoo F-measure
AdaBoost | 0.983 | 94.1 | 0.934 | 0.941 | 0.973 | 91.8 | 0.911 | 0.920
Decision Tree | 0.944 | 92.3 | 0.906 | 0.925 | 0.943 | 87.8 | 0.918 | 0.913
Linear SVM | 0.945 | 90.3 | 0.873 | 0.906 | 0.932 | 86.9 | 0.835 | 0.870
Nearest Neighbour | 0.964 | 90.3 | 0.896 | 0.903 | 0.942 | 86.2 | 0.877 | 0.863
Random Forest | 0.986 | 95.2 | 0.960 | 0.944 | 0.984 | 94.0 | 0.958 | 0.942

Table 3
p-values returned from Welch's t-test using AUC values.
Machine learning algorithm | p-value
AdaBoost | $1.80 \times 10^{-208}$
Decision Tree | $1.41 \times 10^{-6}$
Linear SVM | $8.41 \times 10^{-78}$
Nearest Neighbour | $9.29 \times 10^{-290}$
Random Forest | $2.29 \times 10^{-10}$

On the whole, the results show that the data from the kernel driver is marginally better for the purposes of differentiating between clean and malicious states, regardless of the machine learning algorithm used. The algorithm with the best performance for both the kernel driver and Cuckoo was Random Forest, obtaining an AUC of 0.986 and 0.984, and an accuracy of 95.2% and 94.0%, respectively. We also found that, on average (over 1000 runs), 93% of the samples were given the same label by Random Forest regardless of whether kernel or Cuckoo data was used. This shows that while there is agreement on a large number of samples, there are still some samples where data from one source was better than the other for classifying malware.
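The agreement figure is the fraction of samples that receive the same label from the two models, averaged over runs. A minimal sketch, assuming (hypothetically) that each run yields aligned prediction lists for the same samples from a model trained on kernel data and one trained on Cuckoo data:

```python
def label_agreement(preds_a, preds_b):
    """Fraction of samples that receive the same label from two models."""
    if len(preds_a) != len(preds_b):
        raise ValueError("prediction lists must cover the same samples")
    return sum(a == b for a, b in zip(preds_a, preds_b)) / len(preds_a)


def mean_agreement(runs):
    """Average agreement over repeated runs; runs holds (preds_a, preds_b) pairs."""
    return sum(label_agreement(a, b) for a, b in runs) / len(runs)


if __name__ == "__main__":
    kernel_preds = [1, 0, 1, 1]
    cuckoo_preds = [1, 1, 1, 0]
    print(label_agreement(kernel_preds, cuckoo_preds))  # 0.5
```

The samples on which the two models disagree are exactly those where combining both views could help, which motivates the combined experiment later in this section.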

In order to verify whether the difference between the kernel and Cuckoo classification results is statistically significant and not just occurring by chance, we used Welch's t-test on the AUC values, as described earlier. A prerequisite for using Welch's t-test is that the data must be normally distributed. We verified this using Q-Q plots, as shown in Fig. 6.

The Q-Q plots show the distribution of the AUC values and how closely (or otherwise) they relate to the normal distribution (shown as a red line). The plots show that the AUC values barely deviate from the normal distribution. Therefore, Welch's t-test is an appropriate test to determine whether the difference between the kernel and Cuckoo values is statistically significant. Given that the Q-Q plots for the Cuckoo data were very similar, we chose not to show them here for brevity.
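A Q-Q plot pairs each sorted observation with the corresponding quantile of a standard normal distribution; normally distributed data falls on a straight line. The sketch below builds those points with the standard library's `statistics.NormalDist` and, as an added convenience rather than the authors' procedure (the paper inspected the plots visually), summarises straightness by the Pearson correlation of the points. The sample parameters in the example (mean 0.98, standard deviation 0.002) are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean


def qq_points(sample):
    """Pair each sorted observation with the matching standard-normal quantile."""
    data = sorted(sample)
    n = len(data)
    # Plotting positions (i - 0.5) / n keep every probability strictly inside (0, 1).
    quantiles = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(quantiles, data))


def qq_correlation(sample):
    """Pearson correlation of the Q-Q points: close to 1 when the points lie on a
    straight line, i.e. when the sample looks normally distributed."""
    points = qq_points(sample)
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in points)
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


if __name__ == "__main__":
    rng = random.Random(0)
    normal_like = [rng.gauss(0.98, 0.002) for _ in range(1000)]  # AUC-like values
    skewed = [rng.expovariate(1.0) for _ in range(1000)]
    print(round(qq_correlation(normal_like), 3), round(qq_correlation(skewed), 3))
```

The normal-like sample should score very close to 1, while the skewed exponential sample scores noticeably lower.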

In Welch's t-test, the null hypothesis is that the means are equal (i.e., $H_0: \mu_1 = \mu_2$), and therefore the alternative hypothesis is that the means are not equal (i.e., $H_a: \mu_1 \neq \mu_2$). We set the threshold $\alpha$ value to 0.05, as it is an appropriate level for our experimentation. Therefore, if the p-value returned from performing Welch's t-test was less than $\alpha$, we would reject the null hypothesis. Table 3 shows the results of performing Welch's t-test on the AUC values from each classifier.

As Table 3 shows, the p-values returned are considerably lower than the threshold, 0.05. Therefore, we reject the null hypothesis that the means of the kernel and Cuckoo AUC values for each classifier are the same. This shows that, at a significance level of 0.05, the differences between the kernel and Cuckoo results are statistically significant and not just due to chance.

Table 4
Classification results of data from the Kernel driver focusing on the process under investigation (localised kernel driver).
Machine learning algorithm | AUC | Accuracy (%) | Precision | F-measure
AdaBoost | 0.962 | 89.6 | 0.902 | 0.891
Decision Tree | 0.901 | 83.8 | 0.855 | 0.825
Linear SVM | 0.884 | 82.0 | 0.893 | 0.788
Nearest Neighbour | 0.934 | 86.6 | 0.875 | 0.858
Random Forest | 0.978 | 92.3 | 0.944 | 0.921

Table 5
Classification results from combining Cuckoo and kernel data (Cuckoo and kernel driver).
Machine learning algorithm | AUC | Accuracy (%) | Precision | F-measure
AdaBoost | 0.990 | 94.9 | 0.956 | 0.960
Decision Tree | 0.954 | 92.4 | 0.924 | 0.936
Linear SVM | 0.952 | 91.5 | 0.916 | 0.915
Nearest Neighbour | 0.960 | 90.3 | 0.873 | 0.888
Random Forest | 0.990 | 96.0 | 0.962 | 0.942

Therefore, from the results in Table 2, we can conclude that data collected at the kernel level produces better classification results than data collected at a user level. However, it is unclear whether this is because the data collected at a kernel level was at a higher privilege and hooked a different API, or because the data was collected on a global scale across all running processes, allowing us to see everything happening on the machine. In order to clarify whether collecting the data at a global level assisted or harmed the classification process, we limited the kernel data collected to the data produced by the process being analysed and any processes it created. The results from this are shown in Table 4.
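Restricting global kernel data to the analysed process and its descendants can be sketched as a single pass over a time-ordered log. The record schema here is hypothetical (the original does not describe how child processes were identified): each event carries a `pid` and a `call`, and process-creation events also carry the new process's `child_pid`. For simplicity the sketch ignores PID reuse, and code injected into other, unrelated processes is not followed by this filter.

```python
def localise_events(events, root_pid):
    """Keep only events from the monitored process and its descendants.

    events: time-ordered dicts with 'pid' and 'call' keys; process-creation
    events also carry 'child_pid' (hypothetical log schema). PID reuse after a
    process exits is ignored for simplicity.
    """
    tracked = {root_pid}
    kept = []
    for event in events:
        if event["pid"] in tracked:
            kept.append(event)
            child = event.get("child_pid")
            if child is not None:
                tracked.add(child)  # children of tracked processes are tracked too
    return kept


if __name__ == "__main__":
    log = [
        {"pid": 4, "call": "NtOpenKey"},                              # system process
        {"pid": 100, "call": "NtCreateProcessEx", "child_pid": 200},  # sample spawns a child
        {"pid": 300, "call": "NtReadFile"},                           # unrelated process
        {"pid": 200, "call": "NtWriteFile"},                          # child activity
    ]
    print([e["call"] for e in localise_events(log, root_pid=100)])
    # ['NtCreateProcessEx', 'NtWriteFile']
```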

From Table 4, it can be seen that the classification results decreased when collecting data from the kernel driver at a local, process-specific level. For example, with Random Forest the AUC decreased from 0.986 to 0.978 and the accuracy from 95.2% to 92.3%. In addition, the differences between global and local kernel data were also found to be statistically significant. Therefore, it is evident that collecting data at a kernel level is not the only contributing factor to the improved classification results over user level; data must also be collected at a global level in order to obtain better classification results. It is also interesting to note that, at a significance level of 0.05, the classification results from localised kernel data are statistically significantly lower than the Cuckoo results as well. This shows that if data is going to be collected at a process-specific level, user-level hooks provide more value, since they will also observe many of the process' interactions that did not reach the kernel. In addition, this shows that simply collecting at a kernel privilege is not enough; the scope of the collection (local vs global) is also important. It may be possible to improve the localised kernel results slightly by attempting to detect when malware injects its payload into benign software and runs it from there. However, that data would be captured by a global kernel capture, and therefore we would not expect the results to improve beyond the global kernel results.

Since limiting the data from the kernel driver did not improve results, and given that Cuckoo and the kernel driver seemed to fail on different samples, we combined the data from Cuckoo and the kernel driver in order to see whether classification results are improved by a combination of data from both levels. The results of this are shown in Table 5.
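The combination step is not specified in detail; a natural reading, assumed here, is simple concatenation of each sample's two feature vectors. The sketch prefixes each feature with its source so that calls observed by both tools (for example NtReadFile) stay separate features, then lays the samples out over a shared, sorted feature order. The prefixes and helper names are hypothetical.

```python
def combine_features(kernel_hist, cuckoo_hist):
    """Merge one sample's kernel and Cuckoo call counts into a single feature dict.

    Source prefixes keep calls seen by both tools (e.g. NtReadFile) as
    separate features instead of silently overwriting one another.
    """
    combined = {f"kernel:{name}": count for name, count in kernel_hist.items()}
    combined.update({f"cuckoo:{name}": count for name, count in cuckoo_hist.items()})
    return combined


def to_matrix(samples, feature_order=None):
    """Turn a list of feature dicts into rows over a shared, sorted feature order.

    Features a sample never produced are filled with a count of 0.
    """
    if feature_order is None:
        feature_order = sorted({key for sample in samples for key in sample})
    rows = [[sample.get(key, 0) for key in feature_order] for sample in samples]
    return feature_order, rows


if __name__ == "__main__":
    sample = combine_features({"NtReadFile": 3}, {"NtReadFile": 1, "GetSystemMetrics": 2})
    order, rows = to_matrix([sample, {"kernel:NtReadFile": 5}])
    print(order)  # ['cuckoo:GetSystemMetrics', 'cuckoo:NtReadFile', 'kernel:NtReadFile']
    print(rows)   # [[2, 1, 3], [0, 0, 5]]
```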


Table 6
Top ten features using independent feature ranking with Random Forest.
Cuckoo | Kernel driver
GetSystemMetrics | NtQueryDebugFilterState
LoadResource | NtEnumerateKey
FindResourceExW | NtQueryFullAttributesFile
NtQueryInformationFile | NtReleaseSemaphore
SetFileTime | NtEnumerateValueKey
NtUnmapViewOfSection | NtReadVirtualMemory
NtOpenSection | NtSetInformationProcess
NtWriteFile | NtSetValueKey
FindResourceA | NtOpenEvent
CreateDirectoryW | NtNotifyChangeKey

Table 7
Top ten features using in-built feature ranking with Random Forest.
Cuckoo | Kernel driver
GetSystemMetrics | NtWriteFile
FindResourceA | NtFlushVirtualMemory
LdrGetProcedureAddress | NtReadFile
LoadResource | NtUnlockFile
NtReadFile | NtOpenMutant
NtQueryInformationFile | NtLockFile
SetFileTime | NtNotifyChangeDirectoryFile
GetFileAttributesW | NtOpenEvent
NtOpenSection | NtDeleteAtom
NtUnmapViewOfSection | NtQueryValueKey

Table 5 shows that combining data from both tools produces classification results that are slightly stronger for the purposes of malware classification, with an AUC of 0.990 for both AdaBoost and Random Forest. The only classifier with reduced results was K-Nearest Neighbours, suggesting that it struggles to classify data beyond a certain number of dimensions. Again, as with all the data, the differences shown in this table (improvements or otherwise) are statistically significant. This further validates the claim that there is a difference between the data from Cuckoo and the kernel driver and that they fail on different samples, since the results would not have improved had this not been the case.

5.2. Individual feature ranking

In order to further understand and confirm the differences between the data gathered by Cuckoo and the kernel driver, we compare the top ten features using both feature ranking methods (described in Section 4.2, Individual feature ranking) for Random Forest, since it is the best performing algorithm. Table 6 compares the top ten features (in order of score) using the independent feature ranking method for Cuckoo and the kernel driver. Table 7 shows the same, but using the in-built feature ranking method. While it would have been ideal to show a comparison of all the calls rather than simply the top ten, due to limitations of space we have chosen to restrict it to ten. If the data being used by the machine learning algorithms were the same, and the difference in results were therefore due to some other factor, we would expect the top ten features to be identical or nearly identical.

From Table 6, we can see that the data collected from Cuckoo and the kernel driver do not have any features in common in the top ten for the independent feature ranking method. This suggests that the two views used very different indicators to distinguish malware. In terms of the actual methods in the top ten for each tool, the kernel driver contains relatively generic calls relating to the registry, threading, memory, events, and processes, whereas Cuckoo contains some highly specific calls such as SetFileTime (to set the MAC (modify, access, and create) times on a file) and GetSystemMetrics (to get information about the system). The presence of SetFileTime is not surprising, as it is often used by malware to conceal its accesses of a file (and thereby conceal its malicious activity) [78]. GetSystemMetrics is used by malware to evaluate whether it is running in a virtual environment or a real one (since virtual machines tend to have low memory and storage). NtUnmapViewOfSection (and NtOpenSection) is also used to evade detection, as malware can use it to replace the code of a legitimate process in memory with its own code, so that the legitimate process runs the malware's code. This could be the reason why the kernel driver monitoring at a global level performed better than Cuckoo monitoring at a local level, as it was able to capture this behaviour better. The top ten also includes some methods relating to resources (LoadResource and FindResourceExW); malware tends to hide its payload inside the resource section of a PE file, and therefore these methods would be used to extract it into memory. Also noticeable in Cuckoo's top ten is a mix of calls from the native API (usually starting with Nt) and the Win32 API. An example of the former is NtQueryInformationFile, used to obtain information about a file. The reason for malware using this method over an equivalent Win32 call is that it provides more information. It is clear that the vast majority of features favoured by classifiers to distinguish malware in the Cuckoo data are the evasive features of malware, whereas with the kernel driver data, classifiers use differences in the general behaviour of malware to distinguish it from benignware.

Table 8
Top ten features using in-built feature ranking with Random Forest.
Cuckoo | Kernel driver
GetSystemMetrics | NtReleaseSemaphore
NtQueryInformationFile | NtLockFile
LoadResource | NtUnlockFile
RegQueryValueExW | NtEnumerateKey
NtUnmapViewOfSection | NtWriteFile
NtDuplicateObject | NtOpenMutant
RegOpenKeyExW | NtReadFile
RegCloseKey | NtOpenThreadToken
NtOpenSection | NtReplyWaitReceivePortEx
NtWriteFile | NtQueryVirtualMemory

Much of our discussion about the top ten features in Cuckoo for Table 6 also applies to the features of Cuckoo in Table 7. However, unlike Table 6, there is one method in common between the kernel and Cuckoo features, NtReadFile. This suggests that this feature is important regardless of the perspective from which data is being gathered. Another interesting observation is that there are seven methods in common between Cuckoo's independent (Table 6) and in-built feature ranking (Table 7). This suggests that many of the contributing features in Cuckoo's case can be used alone to detect malware (which is worth considering when selecting feature representation methods). Due to this, many of the observations made about Cuckoo's top ten in Table 6 apply here (such as Cuckoo focusing more on malware's evasive behaviour than on its general behaviour). Aside from this, Cuckoo's top ten in Table 7 also contains LdrGetProcedureAddress. This is important, as it can be used by malware to evade static analysis and dynamic heuristic analysis by loading all the routines it needs at runtime; malware can therefore achieve all that it intends to with only that method linked at compile time.
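The overlap counts discussed here can be checked mechanically with set intersections. The lists below are transcribed from Tables 6 and 7; the variable and helper names are illustrative.

```python
# Top-ten lists transcribed from Tables 6 and 7 (Random Forest).
TABLE6_CUCKOO = {"GetSystemMetrics", "LoadResource", "FindResourceExW",
                 "NtQueryInformationFile", "SetFileTime", "NtUnmapViewOfSection",
                 "NtOpenSection", "NtWriteFile", "FindResourceA", "CreateDirectoryW"}
TABLE6_KERNEL = {"NtQueryDebugFilterState", "NtEnumerateKey",
                 "NtQueryFullAttributesFile", "NtReleaseSemaphore",
                 "NtEnumerateValueKey", "NtReadVirtualMemory",
                 "NtSetInformationProcess", "NtSetValueKey", "NtOpenEvent",
                 "NtNotifyChangeKey"}
TABLE7_CUCKOO = {"GetSystemMetrics", "FindResourceA", "LdrGetProcedureAddress",
                 "LoadResource", "NtReadFile", "NtQueryInformationFile",
                 "SetFileTime", "GetFileAttributesW", "NtOpenSection",
                 "NtUnmapViewOfSection"}
TABLE7_KERNEL = {"NtWriteFile", "NtFlushVirtualMemory", "NtReadFile",
                 "NtUnlockFile", "NtOpenMutant", "NtLockFile",
                 "NtNotifyChangeDirectoryFile", "NtOpenEvent", "NtDeleteAtom",
                 "NtQueryValueKey"}


def overlap(a, b):
    """Calls appearing in both top-ten lists, sorted for stable output."""
    return sorted(a & b)


if __name__ == "__main__":
    print(len(overlap(TABLE6_CUCKOO, TABLE7_CUCKOO)))  # 7
    print(overlap(TABLE6_CUCKOO, TABLE6_KERNEL))       # []
    print(overlap(TABLE7_CUCKOO, TABLE7_KERNEL))       # ['NtReadFile']
    print(overlap(TABLE6_KERNEL, TABLE7_KERNEL))       # ['NtOpenEvent']
```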

Onthe Kernelside, thereis one methodin commonbetween theinbuiltandindependentfeaturerankingmethod,NtOpenEvent. This is no surprise asthis method can be used to interactwith WindowsEventswhichmalwarecould usetoensureitisrun ev-eryday,for example.Ingeneral, the toptens forthe kerneldata

(12)

forbothtablesaremorefocusedonthedifferencesingeneral pro-cessbehaviourbetweenmalwareandbenignware.Therearefewer methods directly related to speciﬁc behaviour exhibited by mal-ware,however,thereareafewexceptions.Intheindependent fea-turerankingforKerneldatashowninTable6,thereisthemethod NtSetInformationProcess, which has been known to be used by malwaretodisableDataExecutionPrevention(DEP).DEPisa pro-tectioninmemorywhichpreventsmalwarefromrunningcodein non-executable sectionsof memory [79]. Another method inthe toptenlikelytoberelatedtomalwareisNtNotifyChangeKey.This is used by a process to ask Windows to notify it whenever any changesaremadeto theregistry.Thiscouldbe usedbymalware tomonitorwhatisbeingdoneonthesystemorevenpreventany changestothekeysthatitcreated.

The top ten for the kernel data using the in-built feature ranking method (shown in Table 7) also reflects this. As with the previous table, there are some unusual methods in the top ten features for the kernel data; for example, NtNotifyChangeDirectoryFile, a completely undocumented method. This method is used by a process to ask Windows to notify it when any changes occur in a directory; therefore, malware may be using it simply to monitor system activity and protect itself, or to attach itself to any file moves. However, another likely reason is that this method is responsible for a publicised vulnerability [80] that could be used to expose parts of kernel memory and defeat Address Space Layout Randomisation (ASLR). NtNotifyChangeDirectoryFile is not the only undocumented method in the top ten; NtDeleteAtom and NtOpenMutant are also completely undocumented by Windows. This could explain why the kernel data was able to better distinguish malware from benignware, as it is able to capture behaviour that cannot be captured at user level. Aside from that, the differences in general process behaviour are being used to detect malware.

Tables 6 and 7 demonstrate that Random Forest, when trained on data from Cuckoo and the kernel driver, utilises different behavioural aspects when identifying whether a file is malicious or not. While Cuckoo and our kernel driver generally monitor equivalent calls, the fact that the observed rankings are different suggests that the scope (local or global) of the calls is an important factor. Another contributing factor could be that malware evades or detects the inline API-hooking technique used by Cuckoo but not the kernel hooking method employed by our driver (since the latter requires a more sophisticated approach to evade).

To confirm the correctness of both of the feature ranking methods, we performed some simple feature reduction (described in Section 4, Method & Implementation) using our feature ranking methods. The results of this are shown in Figs. 7 and 8. We created these graphs for both the data from the kernel driver and the data from Cuckoo. However, since the graphs were very similar in shape, for brevity's sake we have only shown the graphs for the data from the kernel driver.

For most of the plots in Figs. 7 and 8, the AUC is at its lowest with just ten features; however, as the number of features that the machine learning algorithms use increases, the AUC increases until it reaches its peak at around 50 features, after which the introduction of new features simply adds noise, thereby reducing the AUC or not contributing to it. This highlights that the feature ranking methods seem to be able to determine which features are important. In addition, it shows that, in most cases, no more than 50 API-calls need to be hooked for similar results.

5.3. Complete feature ranking

Finally, we applied the global feature ranking metric we created (described in Section 4.3, Complete feature ranking) to get a concise yet comprehensive view of the features of malware that were consistently considered important by all classifiers. The results from applying the global feature ranking for both the in-built and independent feature selection methods are shown in Tables 8 and 9.

Table 9
Top ten features using in-built feature selection considering all classifiers.
Cuckoo | Kernel driver
NtOpenSection | NtFlushVirtualMemory
InternetCloseHandle | NtOpenMutant
LoadResource | NtFilterToken
SetUnhandledExceptionFilter | NtUnlockFile
SetFileTime | NtAccessCheckByTypeAndAuditAlarm
LdrLoadDll | NtQueryVirtualMemory
CreateActCtxW | NtDeleteAtom
getaddrinfo | NtWriteFile
LdrGetDllHandle | NtReadFile
LdrGetProcedureAddress | NtCompleteConnectPort

From these tables we can ascertain which features perform best across all the classifiers that we used. This gives us a clearer picture of which features are extremely strong when it comes to differentiating malware from cleanware. With regards to the Cuckoo data, we see in Table 8 some of the features used to evade detection that we have seen before (GetSystemMetrics, NtUnmapViewOfSection, and NtOpenSection). There are also resource-related methods (LoadResource) and the native API method (NtQueryInformationFile) we encountered previously. Of the new methods, NtDuplicateObject is interesting because it is used by malware to evade anti-virus heuristics: anti-viruses would expect malware to call the more commonly used DuplicateHandle to duplicate a process handle in order to kill or inject into that process, and would therefore be less likely to flag a call to NtDuplicateObject as suspicious [81]. From this we can conclude that Cuckoo's top ten in Table 8 contains a mix of evasive, potentially malicious, and general methods.

In contrast, Cuckoo's top ten in Table 9 places more emphasis on the evasive behaviour of malware. For example, LdrLoadDll, LdrGetDllHandle and LdrGetProcedureAddress are in the top ten and are known to be used by malware to load DLLs dynamically in order to import methods from them. This can be used to avoid being detected by IAT hooks. In addition, the method SetUnhandledExceptionFilter in the Cuckoo top ten is also used as an anti-debugging trick by malware, as this method is used to specify a function to be called in the event of an exception occurring that is not handled by any exception handler. However, the function specified will only be called if the process that raised the exception is not being debugged. Therefore, malware can register a function to deliver its payload and then throw an exception; if the process is being debugged, that function will not be called, and hence the malware will not display its malicious behaviour. SetFileTime, which has been described previously, is also used to curb suspicions. Finally, NtOpenSection, as mentioned previously, can be used to embed malicious code in a benign process. Therefore, as can be seen, much of the top ten for Cuckoo in Table 9 utilises the evasive behaviour of malware to detect it.

On the kernel side, each table contains methods from a wide range of categories (such as file system, threading, networking, etc.), making it more general than the top ten calls in the Cuckoo data. While many of the methods in these tables are likely to be used by malware, they are not used solely by malware (as would be expected from a tool monitoring at a global level). On the whole, it can be seen that with the Cuckoo data, malware is detected through the techniques it uses to detect a monitoring or virtual environment, whereas with the data from the kernel driver, malware is differentiated from cleanware through how its general behaviour differs from the norm.


Fig. 7. Feature selection using in-built feature selection method.

Fig. 8. Feature selection using independent feature selection method.

6. Conclusion

Motivated by a hypothesis that kernel level API calls and user level API calls do not produce the same classification results, we conducted experiments to understand the differences by collecting data at different privilege levels within the same operating system. We collected data at a user level using Cuckoo, and at the kernel level using a custom-made kernel driver, since there are no existing tools that hook all the calls in the SSDT on a global scale. The data collected was classified using several state-of-the-art machine learning algorithms to determine whether collecting data at different levels altered classification results. The results showed kernel data to be statistically significantly better for all classification algorithms, despite the fact that user level methods are far more popular in the literature. Random Forest performed the best, with an accuracy of 94.0% for Cuckoo and 95.2% for the kernel driver. In addition, by limiting the kernel data to that produced by the process under observation (and its subprocesses), we found that the classification results were reduced, suggesting that the collection of data at a global, system-wide level aided the classification process. Our strongest classification results were observed by combining the data from Cuckoo (user level) with that from our kernel driver, achieving an AUC of 0.990 and an accuracy of 96.0% for Random Forest.

In order to understand why the differences in data collection methods had contributed to the different classification results, we performed feature ranking for Random Forest and collectively for all classifiers used, and found that the features focused on by classifiers differed significantly depending on the data used. The main observation from this was that monitoring at a process-specific level, as Cuckoo does, caused the machine learning algorithms to detect malware using its evasive properties, whereas when trained on data obtained from monitoring at a global, kernel level, the machine learning algorithms used the more general behaviour of the malware (and of processes in general) to distinguish it from cleanware. The differences resulting from collecting data at different privilege levels highlighted the benefit gained from collecting data at a kernel level (or both levels) in order to detect malware, and the importance of the literature carefully detailing the data collection method that has been used, since the results are affected by it. To assist with this, we have documented many of the dynamic malware analysis tools in Table A1 in the appendices of this paper. Table A1 shows that while there exists a plethora of well established tools for collecting data at a user level, there are only a handful of established tools to collect data at a kernel level, and fewer still that are freely available. While the driver we have written is specific to Windows XP, the main contributions of this paper (a comparison of user and kernel level calls) will apply to future releases of Windows. In conclusion, this paper provides the first objective, evidence-based comparison of kernel level and user level data for the purposes of malware classification. In future, we hope to carry out an in-depth analysis of the implications of the differences in the representative features of malware with kernel and user data.

Funding

This work has been supported by the Engineering and Physical Sciences Research Council [project no. 1657416].

Declaration of Competing Interest

We would like to reiterate that we have no conflicts of interest to disclose.

Acknowledgments

We would also like to thank VirusShare and VirusTotal for providing us with samples and information regarding malware.

Appendix A. Tools used in the literature to gather API calls

TableA1

| Name | Description | Kernel hook | User hook | Used by |
|---|---|---|---|---|
| API Monitor [82] | Capable of hooking every method in the Windows API | | x | [63,83–85] |
| APIMon [86] | Uses EasyHook [87] to perform inline hooking on all user-level APIs | | x | [88] |
| Buster Sandbox Analyser [89] | Not documented how it gathers API calls. Monitors specific categories of calls | | x | [53,90] |
| Capture BAT [24] | Uses filter drivers | x | | [91,92] |
| Cuckoo Sandbox [93] | Leading open-source dynamic malware analysis system [93]. Uses inline hooks to hook certain categories of Windows API calls [94] | | x | [51,52,54,67,68,73,95–127] |
| CWSandbox [128] | Uses in-line code hooks to record calls in specific categories [128] | | x | [129–134] |
| Deviare [135] | Hooking engine that hooks the entire Win32 API and can be integrated with many programming languages | | x | [65] |
| Ether [32] | VMI solution focused on being undetectable by malware (known for achieving good transparency). Utilises the Xen hypervisor and Intel VT [33] to provide hardware virtualization | x | | [53,64,136,137] |
| HookMe | Uses Microsoft's Detours [18] to perform inline hooking | | x | [61,138] |
| Malpimp [139] | Based on pydbg (pure Python debugger) | | x | [140] |
| Microanalysis System (MicS) [141] | Executes in a real (not virtual) environment and uses IAT hooking | | x | [142] |
| NtTrace [143] | Tool that uses inline hooking to hook ntdll.dll | | x | [144] |
| Osiris [34] | VMI solution using a modified version of QEMU [29]. Also provides a simulated network environment. Monitors a specific set of user and kernel level calls | x | x | [64] |
| StraceNT [145] | Inspired by strace on Linux. Uses IAT hooking to hook all user-level APIs | | x | [146–148] |
| Sysinternals Process Monitor [23] | Gathers data using a kernel driver (file system filter driver) [6] | x | | [51,91,149–153] |
| TEMU [154] | Extensible complete-system, fine-grained analysis platform capable of monitoring any call | x | x | [30,155,156] |
| TTAnalyze (used in Anubis (Analysis of unknown binaries) Sandbox [157][158]) | Uses QEMU [29] to perform software emulation. Monitors specific categories of API calls through JIT compilation [28] | x | x | [62,159–163] |
| WinAPIOverride [164] | Free tool to monitor all user-level Windows API calls made by processes | | | |


References

[1] AV-TEST. The AV-TEST Security Report 2016/17. Tech. Rep.; 2017. https://www.av-test.org/fileadmin/pdf/security_report/AV-TEST_Security_Report_2015-2016.pdf.

[2] Liu J, Wang Y, Wang Y. The similarity analysis of malicious software. In: 2016 IEEE first international conference on data science in cyberspace (DSC); 2016. p. 161–8. doi:10.1109/DSC.2016.12.

[3] Moser A, Kruegel C, Kirda E. Limits of static analysis for malware detection. In: Twenty-third annual computer security applications conference (ACSAC 2007); 2007. p. 421–30. doi:10.1109/ACSAC.2007.21.

[4] Rudd EM, Rozsa A, Günther M, Boult TE. A survey of stealth malware attacks, mitigation measures, and steps toward autonomous open world solutions. IEEE Commun Surv Tutor 2017;19(2):1145–72. doi:10.1109/COMST.2016.2636078.

[5] Schroeder MD, Saltzer JH. A hardware architecture for implementing protection rings. Commun ACM 1972;15(3):157–70. doi:10.1145/361268.361275.

[6] Russinovich ME, Solomon DA, Ionescu A. Windows internals part 1. 6th ed; 2012. ISBN 978-0-7356-4873-9.

[7] Garnaeva M, Sinitsyn F, Namestnikov Y, Makrushin D, Liskin A. Overall statistics for 2016; https://kasperskycontenthub.com/securelist/files/2016/12/Kaspersky_Security_Bulletin_2016_Statistics_ENG.pdf.

[8] Symantec. Internet security threat report 21. https://www.symantec.com/content/dam/symantec/docs/reports/istr-21-2016-en.pdf.

[9] Ramilli M, Bishop M, Sun S. Multiprocess malware. In: Proceedings of the 2011 6th international conference on malicious and unwanted software. MALWARE'11. Washington, DC, USA: IEEE Computer Society; 2011. p. 8–13. ISBN 978-1-4673-0031-5. doi:10.1109/MALWARE.2011.6112320.

[10] Nebbett G. Windows NT/2000 native API reference. Thousand Oaks, CA, USA: New Riders Publishing; 2000. ISBN 1578701996.

[11] Blunden B. The rootkit arsenal: escape and evasion in the dark corners of the system. 2nd ed. USA: Jones and Bartlett Publishers, Inc.; 2012. ISBN 144962636X, 9781449626365.

[12] Shaid SZM, Maarof MA. In memory detection of windows api call hooking technique. In: 2015 International conference on computer, communications, and control technology (I4CT); 2015. p. 294–8. doi:10.1109/I4CT.2015.7219584.

[13] Chen X, Andersen J, Mao ZM, Bailey M, Nazario J. Towards an understanding of anti-virtualization and anti-debugging behavior in modern malware. In: 2008 IEEE international conference on dependable systems and networks with FTCS and DCC (DSN); 2008. p. 177–86. doi:10.1109/DSN.2008.4630086.

[14] Nunes M. Matthewnunes/kernelssdtdriver: kernel driver (with localisation); 2018. doi:10.5281/zenodo.1169136.

[15] Nunes M. Dynamic malware analysis kernel and user-level calls; 2018. doi:10.17035/d.2019.0082395337.

[16] Pietrek M. Inside windows - an in-depth look into the win32 portable executable file format. MSDN Mag 2002;17(2).

[17] Leitch J. IAT hooking revisited; 2011.

[18] Hunt G, Brubacher D. Detours: binary interception of win32 functions. In: 3rd USENIX Windows NT symposium; 1999.

[19] skape. Dynamic binary instrumentation. Uninformed org 2007;7.

[20] Garfinkel T, Rosenblum M. A virtual machine introspection based architecture for intrusion detection. In: Proc. network and distributed systems security symposium, 3; 2003. p. 191–206.

[21] Viscarola P, Mason WA. Windows NT device driver development. 1st ed. Thousand Oaks, CA, USA: New Riders Publishing; 1998. ISBN 1578700582.

[22] Hoglund G, Butler J. Rootkits: subverting the windows kernel. Addison-Wesley Professional; 2005. ISBN 0321294319.

[23] Russinovich ME. Process monitor - windows sysinternals | microsoft docs. https://docs.microsoft.com/en-gb/sysinternals/downloads/procmon; Visited on 2017-07-27.

[24] The Honeynet Project. http://old.honeynet.org/index.html; Visited on 2017-07-26.

[25] Hăjmășan G, Mondoc A, Creț O. Dynamic behavior evaluation for malware detection. In: 2017 5th International symposium on digital forensic and security (ISDFS); 2017. p. 1–6. doi:10.1109/ISDFS.2017.7916495.

[26] Callback objects | Microsoft Docs. https://docs.microsoft.com/en-us/windows-hardware/drivers/kernel/callback-objects; Visited on 2017-07-26.

[27] Zhang F, Ma Y. Using IRP with a novel artificial immune algorithm for windows malicious executables detection. In: 2016 International conference on progress in informatics and computing (PIC); 2016. p. 610–16. doi:10.1109/PIC.2016.7949573.

[28] Bayer U. TTAnalyze: a tool for analyzing malware; 2005. http://old.iseclab.org/people/ulli/TTAnalyze_A_Tool_for_Analyzing_Malware.pdf.

[29] Bellard F. Qemu, a fast and portable dynamic translator. In: Proceedings of the annual conference on USENIX annual technical conference. ATEC'05. Berkeley, CA, USA: USENIX Association; 2005. p. 41–41. http://dl.acm.org/citation.cfm?id=1247360.1247401.

[30] Yin H, Song D, Egele M, Kruegel C, Kirda E. Panorama: capturing system-wide information flow for malware detection and analysis. In: Proceedings of the 14th ACM conference on computer and communications security. CCS '07. New York, NY, USA: ACM; 2007. p. 116–27. ISBN 978-1-59593-703-2. doi:10.1145/1315245.1315261.

[31] Song D, Brumley D, Yin H, Caballero J, Jager I, Kang MG, et al. Bitblaze: a new approach to computer security via binary analysis. In: Proceedings of the 4th international conference on information systems security. ICISS '08. Berlin, Heidelberg: Springer-Verlag; 2008. p. 1–25. ISBN 978-3-540-89861-0. doi:10.1007/978-3-540-89862-7_1.

[32] Dinaburg A, Royal P, Sharif M, Lee W. Ether: malware analysis via hardware virtualization extensions. In: Proceedings of the 15th ACM conference on computer and communications security. CCS '08. New York, NY, USA: ACM; 2008. p. 51–62. ISBN 978-1-59593-810-7. doi:10.1145/1455770.1455779.

[33] Uhlig R, Neiger G, Rodgers D, Santoni AL, Martins FCM, Anderson AV, et al. Intel virtualization technology. Computer 2005;38(5):48–56. doi:10.1109/MC.2005.163.

[34] Cao Y, Liu J, Miao Q, Li W. Osiris: a malware behavior capturing system implemented at virtual machine monitor layer. In: 2012 Eighth international conference on computational intelligence and security; 2012. p. 534–8. doi:10.1109/CIS.2012.126.

[35] Lengyel TK, Maresca S, Payne BD, Webster GD, Vogl S, Kiayias A. Scalability, fidelity and stealth in the drakvuf dynamic malware analysis system. In: Proceedings of the 30th annual computer security applications conference. ACSAC '14. New York, NY, USA: ACM; 2014. p. 386–95. ISBN 978-1-4503-3005-3. doi:10.1145/2664243.2664252.

[36] Pék G, Buttyán L. Towards the automated detection of unknown malware on live systems. In: 2014 IEEE international conference on communications (ICC); 2014. p. 847–52. doi:10.1109/ICC.2014.6883425.

[37] Rutkowska J, Tereshkin A. Is game over anyone? USA: Black Hat; 2007.

[38] Yan L-K, Jayachandra M, Zhang M, Yin H. V2e: combining hardware virtualization and software emulation for transparent and extensible malware analysis. SIGPLAN Not 2012;47(7):227–38. doi:10.1145/2365864.2151053.

[39] Bruening D, Duesterwald E, Amarasinghe S. Design and implementation of a dynamic optimization framework for windows. 4th ACM workshop on feedback-directed and dynamic optimization (FDDO-4); 2001.

[40] Polino M, Continella A, Mar