Rochester Institute of Technology
RIT Scholar Works
Theses
Thesis/Dissertation Collections
2004
Using system call analysis to stop evasion attacks in
anomaly based Intrusion Detection System
Ashish Liladhar Samant
Follow this and additional works at:
http://scholarworks.rit.edu/theses
This Thesis is brought to you for free and open access by the Thesis/Dissertation Collections at RIT Scholar Works. It has been accepted for inclusion in Theses by an authorized administrator of RIT Scholar Works. For more information, please [email protected].
Recommended Citation
Using System Call Analysis To Stop Evasion Attacks in Anomaly Based
Intrusion Detection System
Thesis report submitted in partial fulfillment of the requirements of the
degree Master of Science in Computer Science
July 2004
Ashish Liladhar Samant
Advisor: Prof. Leon Reznik
Leon Reznik
Reader: Prof. Warren Carithers
Warren Re Carithers
Thesis/Dissertation Author Permission Statement
Title of thesis or dissertation: U ~, N
4
s ....
S
l' ~ f\.\ C A L.L P, N A L't
S , $
-,0
SlOP €VA.s'Of\J ATTPtC.\C.$ IN
A~oMfH",'tSA-SeD
I I-.l'RUSION D€"('('tTION S'l.$T€.N
S
.
Name of author:
f\
SH I S. Hc....
SAM
AtoJ
r
Degree:
M A
50T
c42..
0 FS (
I
6 I\JC-€
Program: CO M
i'L)'€4Z..
S,
I€: N (6College:
y
C ( IS
I understand that I must submit a print copy of my thesis or dissertation to the RIT Archives, per current RIT guidelines for the completion of my degree. I hereby grant to the Rochester Institute of Technology and its agents the non-exclusive license to archive and make accessible my thesis or dissertation in whole or in part in all forms of media in perpetuity. I retain all other ownership rights to the copyright of the thesis or dissertation. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation.
Print Reproduction Permission Granted:
I, f) S \-\
ISH
LS
Prm
f\-
rV 'T
,
hereby grant permission to the Rochester Institute Technology to reproduce my print thesis or dissertation in whole or in part. Any reproduction will not be for commercial use or profit.Ashish Samant
Signature of Author: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ Date: 0
'1 /
'30/1-00
Lt
Print Reproduction Permission Denied:
I, , hereby deny permission to the RIT Library of the Rochester Institute of Technology to reproduce my print thesis or dissertation in whole or in part.
Signature of Author: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ Date: _ _ _ _ _
Inclusion in the RIT Digital Media Library Electronic Thesis
&Dissertation (ETD) Archive
I, , additionally grant to the Rochester Institute of Technology Digital Media Library (RIT DML) the non-exclusive license to archive and provide electronic access to my thesis or dissertation in whole or in part in all forms of media in perpetuity.
I understand that my work, in addition to its bibliographic record and abstract, will be available to the world-wide community of scholars and researchers through the RIT DML. I retain all other ownership rights to the copyright of the thesis or dissertation. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation. I am aware that the Rochester Institute of Technology does not require registration of copyright for ETDs.
I hereby certify that, if appropriate, I have obtained and attached written permission statements from the owners of each third party copyrighted matter to be included in my thesis or dissertation. I certify that the version I submitted is the same as that approved by my committee.
Acknowledgements
I wouldlike to take thisopportunity tothankall thepeoplewho haveguidedme, notonly
during
the completion of my thesis but alsoduring
the course of my education at Rochester Institute ofTechnology. I wouldlike to thank Prof. Leon Reznik, my advisor,for supportingme throughoutthe durationofthe thesisand for his invaluable input. Right
fromthestart, whilepreparingtheproposalforthis work, to theveryend while preparing
the final report, he was always very patient and helpful. I would specially like to thank Prof. Warren Carithers for his insightful comments and fruitful discussions over the
subject matter and for guidingme through the technical writing skills. I must also thank
Prof. Fereydoun Kazemian for reviewing my work and for his support. Aspecial thanks to Prof. Reek, who guided me through the initial phase and whose professionalism and
passioncontinuetobe a sourceofinspiration. I wouldalsolike to expressmy gratitudeto the very resourceful and generous users of the Linux Users
Group
Of Rochester(LUGOR)
forbailing
me out on numerous occasions when I had trouble with Linux issues. Lastly, I would like to thank my parents for their continuous support andAbstract
Intrusion Detection Systems (IDSs) that operate on the principle of system call
monitoringare knowntobe susceptible tomimicryorevasion attacks. Ithas been shown
that an intelligent adversary armed with comprehensive knowledge ofthe target system
or network, can penetratethese targets, hide his presence from the IDS, and continue to
carry out damage. IDSs, which use system calls to define normal behavior, often leave out complimentary information aboutthem, and intruders use preciselythis drawback, to
deceivethe IDS.
This thesis investigates the vulnerabilities of a system call based IDS and carries out a
theoretical and experimental study of methods allowing to improve the IDS performance
and reliability. It analyzes the designprinciples and architecture ofanomaly based IDSs
and studiestheimplementationofatypical system callbased anomaly IDS. This category
of anomaly detection systems is currently attracting considerable attention within the
research community and various prototypes have been developed in recent years. The thesis investigates the hypothesis that
by
monitoring the number of system calls that failand return error values on aper processbasis, itwouldbepossibleto
identify
abnormallybehaving
processes. It also suggests thatby
using only a certain set ofcritical systemcalls instead of all the defined calls, it could be possible to detect and stop mimicry
attacks. pH IDS is used for the purpose ofthe experiments as its source code is
freely
available. It works as a patch to the Linux kernel and alters the way system calls are
handled. The tests were carried out on a stand-alone Linux box running RedHat 9 with
kernel version 2.4.20. Local exploits, which were readily available on the Internet, were
usedintheexperiments.
Some oftheresults obtainedcontradictedour originalhypothesisand are indicative ofthe
scope for future work inthis area.Thetestsrevealed that itwasnotpossibleto simplyuse
system call return values to
identify
erroneouslybehaving
processes. However afterclassifying the system calls into critical and non-critical sets, a form ofmimicry attacks
could be successfully detected. The results confirm the potential of this technique to
TableofContents
Acknowledgements
Abstract
List offigures
1. Introduction 1
1.1. Intrusion Detection 1
1.2. pHIDS 2
1.3. Problem definition and pitfalls ofpHIDS 2
1.4. Thesis Outline 3
2. Literature Review 5
2. 1 Computer
Security
52.1.1 Whatare we
dealing
with ? 52.1.2 Whoarewe
dealing
with? 62.1.3 Genericapproachestocomputersecurity 7
2.2 Nature of attacks 8
2.3 Classificationofintrusion detectionprinciples 9
2.4 Currentstateof research 11
Features andVulnerabilitiesof pH IDS 1 5
3.1 Design of pHIDS 15
3.1.1 Process Homeostasisanddefinitionof'self 15
3.1.2 Systemcalltraces and pHIDS 18
3.2
Working
ofpH IDS 213.2. 1 Implementation 21
3.2.2
Learning
phase 233.2.3 Monitoringphase 25
3.2.4 ManipulatingpH 26
3.3
Mimicry
Attacks 273.4
Susceptibility
of pHIDS 28ModificationsandExperimentswith pH 3 1
4. 1
Testing
environment 314.2 Classificationof system calls 33
4.3
Tracking
of erroneoussystem calls 375. ResultsandConclusion 39
5.1 Outcome from classificationof system calls 39
5.3 Conclusion 48
5.4 Scopefor futurework 49
References 51
Appendix A 55
Appendix B 59
List ofFigures
Figure 2.1 PopularResearch Intrusion Detection Systems. Figures in brackets denote theirpublications in References List.
Figure 3.1 Sequences of system callswith window size offive.
Figure 3.2 Contents ofbyte array storingsystem calllook- aheadpairs.
Figure 3.3 Sequenceof eventsinsystem call execution.
Figure 3.4 Architecturaloverviewof pHIDS. Adapted from
[22]
Figure 4.1 Architectural Overview afterthemodificationstopH
Figure 5.1 Partial systemcall traceofXgalagaexploit.
Figure 5.2 Growthof profilebefore and aftermodifications.
Figure 5.3 Strace outputforemacs.
Figure 5.4 Graphicalrepresentation of rate ofgeneration oferrorsfor emacs. Referto
Appendix C for datavalues.
Figure 5.5 Graphicalrepresentationofdistributionof errors foremacs. Refer to
Appendix C for datavalues.
Figure 5.6 Graphicalrepresentation ofdistributionof errors foremacs, whenafile
wasedited. Referto Appendix Cfor datavalues.
Figure 5.7 Graphical representation ofdistributionof errors forpHmon. Referto
Chapter 1
Introduction
1.1 Intrusion Detection
With the advent ofthe Internet andhigh-speed access links,networks have beenthrown
open to the outside world and can be accessed with considerable ease as compared to
those of10 years ago.With this increase in availability ofservices, there also exists the
increased threat ofillegal access and misuse of systems. The source ofthis threat to the
integrity
of resources and services can be either internal (e.g. somebody within theorganization) or external (e.g. a hacker paid
by
a business competitor to siphon outsensitive data). Irrespective of the sources, it is necessary to safeguard systems and
maintain their appropriate and legal use. This is where Intrusion Detection Systems
(IDSs)come intothepicture.
Intrusiondetectionistheprocess ofmonitoringthestate and use of asystemtodetectthe
presence of unauthorized activity.
Many
features and characteristics of system can beused to construct views of normal activity; the system can then be monitored
continuously to spot any divergence from these normal views. Alternately, known cases ofintrusions are used as a reference andthe behavior ofthesystem is observedto detect
any similarity to behavior of a system that is compromised
by
these known attacks. Response to intrusions has traditionally not been considered as a necessary feature ofintrusion detection systems, but oflate, considerable research effort has
being
directed towardsintegrating
an automaticresponse mechanism inthesesystems.Intrusion Detection Systems deal witha little piece ofthe problem of computer security.
In a typical scenario, an IDS would accompany other security mechanisms like
firewalls,
anti virus software and
integrity
checkers, all of which deal with the same problem at different levels. The task ofpreventing systems fromfalling
in to the handsofintrudersand realizing thatattacks are inevitable and
directing
effortsto early detection. Intrusiondetection systems are directed toward the latter stage, where the goal is to minimize
damage caused
by
a security violation. The typical response of an intrusion detectionsystem is to 'raise an
alarm'
and notify the system administrator, who in turn checks for falsepositives andthen takesremedialactionifnecessary.
1.2 pH IDS
The pH IDS is a host-based anomaly intrusion detection system and was designed and
developed at the University ofNew Mexico
by
Anil Somayaji et al [23]. It can beconsidered as representative ofthe entire class of intrusion detection systems that use
system call monitoring. The underlying detection algorithm ofpH IDS is based on the
generic approach ofusing system call sequence information for constructing definitions
of normal program activity, first suggested
by
Stephanie Forrest et al [7]. One ofthe unique characteristics of pH is that it can also respond to intrusions and take preventiveaction on its own, without the intervention ofthe system administrator. For this reason, its authors have described it as 'autonomous response system', rather than an intrusion detectionsystem.
1.3 Problem definition andpitfalls of pH IDS
Results ofpreliminary analysis and experiments carried out
by
Forrest[2,3]
show that,system call monitoring is a very effective approach to do intrusion detection. Forrest
extensively used the 'sendmail program for testing purposes and '//?/-' in certain cases.
'SendmaiV was a good choice because it is
fiirly
complex, generates many variations oflegal behavior and there are many known attacks to which it is susceptible. In their
results, increased instances of abnormal activity could be detected in instances of
successful intrusions
by
some known attack scripts. Though these tests were carried outUNIX*
, with slight or no modifications. The pH IDS, which was based on the above principle and was developed for a Linux kernel displayed similar results when further
tests were carried exploitingthe vulnerabilitiesin SSH daemon [23].
From the above documented results andreports oftheperformance of pH IDS itcan be
concludedthat this approach (and the phIDS) is
fairly
successful against a broad varietyand classes of attacks and is not just limited to certain typical programs. Once a
vulnerability in a program has been exploited and the intruder injected his malicious code, it is
highly
probably that ph IDS will recognize the abnormal traces andraise an alarm before wide scale damage occurs. However, Wagner and Sato have shown thefailure of pH IDS in stopping mimicryor evasionattacks [27]. Inthese attacks,intruders
try to hide their presence, try not to generate abnormal traces of system calls, and then
attempt to 'slip under the radar'. Since the IDS does not see anything which it deems
'suspicious', it fails to trigger an alarm. When using system call information to learn about normalbehaviorofprocesses, manyaspects of systemcalls canbeconsidered(e.g.
parameter values passedto system calls,the return values obtained aftertheir execution,
the relative frequencies of system calls, the timing ofsystem calls with respect to the othernotable events in theprogram flow, and so on). pH onlyuses sequence information
or relative ordering information ofsystem calls. Thisdesign has very little overhead and
is relatively less complex. But the simplicity of pH IDS creates potential for anintruder
to break in. Theauthors of[27] have suggested some ways inwhichthese attacks can be carried out and this raises two issues: how to modify pH IDS to mitigate these kind of
attacks, and the practical risk involvedinthese kind of attacks. I focus on both ofthese
issues in mythesis.
1.4 Thesis Outline
The remainder ofthis documentis organized as follows. Chapter 2 throws somelight on
the broad areaofComputer
Security
andthe role ofIntrusion Detection Systems. It talksabout the different categories of intrusion detection systems and their features and
touchesonthecurrent state of research inthis area. Someofthesignificant research work
and papers are referenced and a background is created for the reader who may not be
familiarwiththe concepts and currentpoliciesinthis domain.
Chapter 3 discusses the working ofthe pH IDS. I look at the underlying architecture of
the system and explainin brieftheworkingofthis system. The environmentinwhichpH is implemented is described. Some ofthe salient results obtained
by
the authors ofthesystem are noted. I explain thesusceptibility of pHIDStoevasion and mimicryattacksas
outlined
by
WagnerandSato [27].Chapter 4 establishes ourhypothesis. I outline a few approachesto tacklingthe problem
and describe the tests and experiments that were carried out and the rationale behind them. The goal ofthese experiments was to evaluate some ofthe suggested solutions,
eliminate those, which do not yield satisfactory results and at the least, establish a few pointersfor future work,which might generate foolproofsolutiontothis problem.
Chapter 5 documents the results obtained from the tests. These are used to draw conclusions and test our assumptions and hypothesis. I try to correlate the actual results obtained and ourexpected results and explainthediscrepancies,when theyoccur. I also
Chapter 2
Literature Review
2.1 Computer
Security
2.1.1 What are we
dealing
with?Because of advancement in microprocessor technology, closer coupling between
operating system architectures and programming languages and multiplication of resourcesduetowidedeploymentofdistributed systems andnetworks,we are todayable togetmore outputfrom our systemsthanwas evenimaginablea fewyears ago. Increased
computing capabilitymeansthattoday'ssystems and applications nowhave significantly greater functionality, as well as
being
both, faster and larger. However, the downsidetothis has been that they have also become more complicated and thus, more unstable,
proneto crashes, securityviolations,abuse andlossof productivity.
The nature of our networks and pervasiveness of today's Internet only add to the
problems faced
by
system administrators and personnel in charge ofscripting security policies. The Internet is an unbounded network; remote hosts are added and deletedrandomlyand it is very difficult (ifnot impossible), to predict theirbehavior while they are connected to the network. The network protocols on which these networks operate
were designed decades ago, when the number ofhosts was far less.
Security
was not aprimary goal ofthe designers and so theseprotocols were
inherently
insecure (e.g. IP addresses, which are essentially a sequence of bits, is the only mechanism used to authenticatethe twocommunicating hosts).Tomake matters worse fromthe securitypoint ofview, computer systems and networks
are not static entities and they
display
constant changes in their characteristics. Thenumber ofhosts connected to a network can vary unpredictably; system and application
users accessing these systems is variable; and, most importantly, definition of what
constituteslegitimate behavioris alsodynamic. Inthisenvironment, it becomes harderto
distinguish between abnormal activityor stateof a system caused
by
a security violationandhithertounseenactivitythat is simplyaresultofevolvinguserbehavior.
Consequently
it becomes critical that we focus our research efforts on producing morereliable, fault tolerant and secure systems, networks and programs. Computer security is
notjust limited to detection ofintrusions, but also consists ofguaranteeing
integrity
ofdata, ensuring availabilityofservices andpreventingviolationsofsecuritypolicies.
2.1.2. Who arewe
dealing
with?Originally,
hacking
was the domain ofthe elite and exclusive group of computer savvy geeks.Hacking
into systems required access to resources like fast processors, largeamounts of memory and above all expert level technical know-how. The
hacking
community prided itselfon its exploits; the motivation forbreaking
systems and code was usually a desire to impress others with one's technical ingenuity. There were instances when hackers were paid to break into organizationsby
a rival business competitor,buttherewere intheminority.But
hacking
is no longer restricted to the geek community. Ironically, it is the wideavailability of information on this topic on the internet, that enables anybody with Internet access and a reasonably well-equipped computer to be in a position to
cause
damage
by
spreading internet worms, spamming email,injecting
trojan code andbreaking
into systemsby
use of malicious code. Thereexist entire websites dedicated tothe education ofthe average userintheart of computerhacking.
They
contain snippets of code, tutorials and listings ofknown exploits, and automated tools that sniffnetworks,identify
vulnerable machines,extract host information.The threat to the securityof systemsisnotlimitedtohackers or crackers who are external
to the organization.
Security
violations also occur fromwithin the organization itselfandprivileges may use an exploit against vulnerability in a program to gain elevated
privileges and access sensitive data, modifyrecords, installback doors or delete system files.
2.1.3 Generic approachesto computersecurity
Thomas Longstaffet al
[18]
classified acts ofsecurity violationsinthefollowing
manner.Whenever information or resources of a system are copied or read
by
an unauthorizeduser, itconstitutes lossofconfidentiality. Wheneverresourcesare corrupted
intentionally
or unintentionally, it constitutes loss of integrity. Whenever resources are deleted orbecome unreachable, it constitutes loss ofavailability. In all instances,
'resources' refers
toeitherthedatathatoursystemsholdortheservicesthattheyprovide.
The first step towardsensuringthe security ofa system isto restrict accessto thesystem.
A user wanting to connect to the computers must prove his identity. This is known as
authentication and generally involves verification through passwords and usernames.
After the authenticity ofthe user has beenproved, the user is allowed to access certain
parts of the system, while certain features are hidden from him. This is known as
authorization, whichdefineswhatregionsof asystem a particularuser can reach. This is typically achieved
by
setting upprivilege levels and read, writeand execute permissions onfilesanddirectories.Another aspect of computer security is guarding data as it travels across the network.
During
itstrip from onehostto another, it islikely
that thedatawillpassthrough routers,gateways and hosts thatmay notbea part ofthe organization. Itthus becomes important
that these external entities are not able to comprehend the sensitive data or garble it intentionally. For these purposes, mechanisms like encryption are usedto make the data
unintelligible toanyone otherthan thecommunicatingparties.
A third category of computer security tools and mechanisms deals with monitoring the
system,
detecting
violations, and responding to them. Tools such asfirewalls,
fileintegrity
checkers, system scanners and, anti-virus programsevaluate the state of a system, scrutinizethe data
flowing
inand out oftheboundaries ofanetwork, and generally look forany kind of suspicious activity, user or state of system
variables. IntrusionDetection Systems fall intothis category.
They
assumethatinspite ofbest efforts, systems will be compromised; their goal is to detect these violations at the
earliest and minimizedamage
by
takingappropriate remedial action.2.2 Nature ofAttacks
Any
activity that contradicts or diverges from the intended usage of a system and itsapplications and violates the security policies ofthe organizationis consideredharmful.
Thus even stealthy probes that are carried to detect vulnerable hosts on a network are
dangerous as these probes are often followed
by
payload ofthe attack itself. Networkscanning, probing, website-defacing, account compromise, to the more prominent,
passwordcracking, denial of service attacks,malicious code and the complete disruption
of normal servicesareall formsofhostileactivity.
Someofthe welldocumentedways ofattackinga system [18,
20]
aredescribed below:ProbesandScans: These are used to learn about the network, discovervulnerable hosts
and
identify
softwarebeing
used,including
operating system types, versions of variousprograms, updates (if any) on installed programs.
Stealthy
probes do not usuallythemselves damagethe system, but shouldbe consideredjust as dangerous as an actual
attack. These information gathering activities can be spread over a number ofdays and
evenweeks,as ports, applications andhostsare scanned andtested, one after another.
Privilege Escalation: Known vulnerabilities in systems thathave not been patched, may
beexploitedto gainsuper-user status. Thegoal ofthiskind
Denial ofService: These attacks are primarily carried out to make the systems or
applications unavailable and
deny
its authorized users access to services.They
do notnecessarily access or modify the data held within the system. Hackers may generate
repeated requests of a particular application or flood a network with their packets thus
making it harder for legal usersto accesstheseresources.
Packet Sniffers: These are anotherformofstealthyattacks. Theseprogramstrackpackets
that flow over the network, and siphon out information from the encrypted data.
Typically
these programs look for usernames and passwords and other authentication orpersonal information,whichis laterused for
forging
identity.Masquerading: In these kind of attacks, through clever manipulation of contents of
packets, intruders assumetheappearanceoftrustedhostsand users. Thus ifa networkhas
been set up to accept remote connections only from a few specific hosts, intruders
'masquerade'
asthese trustedhostsandmay fool thenetwork into acceptingthem.
Trojan Code, Viruses, Worms: These are all examples ofprograms which can cause
damage on alarge scale, affectingentire networks. Theirpresence isnot easily identified
until afterthe damage hasdone.
Typically
requireauserto execute some maliciouscode,which thenactivates ortriggers theseviruses and worms.Oftentheseare self-duplicating
andself-propagating,affectingall connectedhosts.
2.3 Classification ofIntrusion Detection Principles
Based ontheir detectionprinciple Intrusion Detection Systems
(IDS)
can be classified asanomaly based or signature based. An anomaly based IDS
initially,
learns about thenormal behaviorof a program orthesystemonthe whole,
by
observing some predefinedcharacteristics. From this information, the anomaly based IDS tries to generate rules or sets of conditions, which govern normal usage. After sufficient time has been spent
creating profiles of normal activity, it is then deployed and it monitors the network or
not match the reference profiles, is flagged as anomalous. When sufficient suspect
behavior isobserved,usually system administrators are notified. Notethat it is necessary
to ensure that
during
thislearning
phase, there are no instances of an attack or an intrusion,or else theIDSmightlearnaboutthe attackitself.In signature based IDS, previously known attack scripts and intrusions are used.
Using
the
history
of such already detected attack vectors, databases ofintrusive activities are created.During
the monitoring phase, these databases are used to examine all seenactivity. Ifthere is a match between patterns ofsystem calls or contents ofa series of
packets or some other system observable and the contents ofthe database, it is certain
thatthesystem is underattack.
There is also a categoryof intrusion detection systemsthat usesa combination ofabove
principles and is known as Compound IDS. The most popular example of such a system is the IDES or Intrusion Detection Expert System [19], which was followed
by
the NIDESorNext-generation IDES [1].Anomaly-based IDSs suffer from the problem offalse positives. It can be difficult for
them to distinguish between legitimate, changing user-behavior or less
frequently
observed legal user behavior that was not seen
during
thelearning
phase, and real novelties that occur because of an intrusion. Signature based IDS do not generate false positives; however, the success of these depends on how efficiently and quickly the database ofknown intrusions can be updated. This task is harder than it seems, becauseeven though, a newly discovered exploit can be
fairly
quickly added to the monitoringdatabase, it is often possible to generate variations ofthe same attack. Signature based
IDS arealsounableto
identify
completelynew attack vectors.IntrusionDetection Systems can also
loosely
be classified onthe basisof where they are deployed. WhentheIDS is deployedontheperipheries of a network and monitorstraffic in and out ofthe network, i.e. captures and analysespackets, it is known as aNetworkrestricted to that host, it is known as Host Based IDS. Host Based IDS can also use
networktraffic in and outofthehost tomonitorbehavior.
Amore detailed sub-classification ofthese systems, based on their
learning
mechanism,can be found in [2]. To summarize, in anomaly-based systems, the reference databases
contain valid behavior while in signature-based systems thereference databases contain
samplesofillegal behavior. Theabove discussionleadsustothe question of whatarethe
contents ofthesedatabases. Various attributesof a systemor anetworkhave beenusedto
classifynormal orabnormal behaviorof processesand programs. I discussthesebelow.
2.4 Currentstateof research
IDS TYPE FEATURES
IDES/NIDES
[6,7]
Anomaly and SignatureBased
Use profiles of user behavior
TRIPWIRE
[13]
AnomalyBased Checks attributes ofsystem files
NADIR[12] Network Based Monitorsnetwork traffic
DIDS
[11] [10]
Network Based Monitors network trafficTIM
[25]
Anomaly Based Creates profiles fromuserbehavior
NSM
[9]
Anomaly Based NetworkIDS
Monitorsnetworktraffic
Fig
2.1 Popularresearch Intrusions DetectionSystems.Figures in brackets denotetheirpublications in ReferencesList.
Initial interest inthe areaofintrusiondetectionandspecifically anomaly detection was
onfocused usinguserbehaviortodefine normal profiles. [19, 1, 12, 25, 5]. The criteria
usedin characterizingnormal useractivity includedtypical requests mademyusers,
[image:19.561.66.507.301.575.2]programs and applicationsrun, thetimingoftheir actions, amount oftimespentusing
system resources and amount ofsystemresources used. However thedifficulties inthis
approach were soon evident asthecomplexity ofapplications,the sizeofnetworks and
number of usersbeganto grow. 'Normaluserbehavior' includedawide range of actions
and was never stable. The dynamic natureofevolving legitimateuserbehaviormadethe
learning
processharder.The system call approach to anomaly intrusion detection was first proposed
by
Forrest[2,3]
andhasreceivedconsiderable interest from theresearch community. A combinationof some attributes of aprocess or aprogram andthesystem calls thattheygenerate, can
beused to characterize normalbehavior. System call information of a process can reveal
a wide range offurther information about the statusoftheprocess inthe kernel (e.g. its
interactions with other system resources like the network, the file system, the daemons
running inthebackground, thedevicesthataremounted, andits memoryrequests).
Forrest [2, 3] prototyped an intrusion detection system that used traces sequences of
system calls to
identify
normal program behavior. In their approach, the intrusion detectionsystem was madetosynthetically learnabout normal programbehavior.During
monitoring ifadifferentsequence ofsystem calls wasobserved, ananomalywasflagged.
Wepsi
[15]
implementedan intrusion detectionsystem based onthe system callapproachbut they used variable lengthpatterns in creating profiles, unlike Forrest's original work
[7].
They
also used audit trail patterns(certain specific events collected fromlog
files),which representthebehaviorof a process on a more superficial level than actual system
calls. Liao and Vemuri
[17]
take a slightly different approach to using system calls. Insteadofkeeping
trackof sequences ofcalls, theymonitorthefrequency
of system callsthat are issued
by
aprogram.By
doing
this, they avoidhaving
to treat every system callindividually
and reducetheoverhead involvedin analyzingandstoring every system call.Asystemcall isconsideredas aninstanceof a
'text'
andthewhole set of calls issuedas a
'document'. Proven
text-processing techniques can be applied to classifying these
'documents'. Priortothis,Lee
[16]
tried to improveontheresults obtainedby
Forrest[6]
sequences of system calltraces.
Frequency
basedmethods were also exploredby
Helmanand Bhangoo [11]. Instead of
keeping
track of frequencies ofevery system call, they recordedfrequenciesof sequencesofsystem calls.Kosoresow and Hofmeyer
[15]
used system call trace execution in a different manner.They
proposed to use finite state automata to model the sequences ofsystem calls. Nospecific method or algorithm was suggested
by
them for the creation of such automata.Typically
the program source code was used toderive a formal specification ofprogrambehavior, which in turn was used as input to construct the automata. Sekar et al
[21]
improvedon this methodology and cameup with afinite state automatabased automaticlearning
processthatusedsystemcallandprogram counterinformationto modelthe stateofthe automata.
Other metadata about program behavior, other than system call characteristics has also been used to define normal or 'self. Ko et al
[14]
used formal program specificationlanguageto create a rulebase. Theylookedat thesource code of a programto
identify
all its legal actionsand usedthis to createdefinitions ofnormal.Clearlyalot of efforthas been directed to the use ofsystem call analysis inthe area of anomaly detection. It has been established conclusively that system call analysis can reveal the details about program behavior. The current trend seems to be towards taking the information gathered from this analysis and using it to easily and efficiently create thresholdsorboundariesthatcan distinguish betweenanomalous andlegitimateevolving
user behavior (i.e. to reduce false positives). There also is some interest in
borrowing
concepts from neural networks and artificial intelligence, such as unsupervisedlearning
techniques, to enable the detection system to learn about 'self on its own, unlike theprevalentrulebased synthetic
learning
mechanisms. Another area ofintrusion detectionbeing
explored is the problem of response.Typically
the response ofintrusiondetectionsystems has been to kill processes, kill remote logins, suspend user accounts and in extreme instances isolate hosts altogether from the network. The drawbacks of these
responsesarethe overhead involvedand expensesincurred incase offalsepositives. The
pH IDS handlesthis problem
by delaying
suspicious processes and Somayaji [23] wereChapter 3
Features andVulnerabilities of pHIDS
3.1 DesignofpHIDS
3.1.1 Process Homeostasisanddefinition of'selP
Homeostasis refers to the ability ofbiological entities and especially humans to
identify
'self. Humans use this property to maintain a stable internal environment.
They
candetect when conditions like
body
temperature, amount and distribution ofbody
fluids,state ofthe immune system have altered, respond to such changes and reconstruct the
original balancedstate.The immunesystemcan detectthepresence offoreign bodies like viruses, bacteria and other external organisms and take corrective action to fight these
pathogens. This is possible because the human immune system can correctly
identify
itself, i.e. it is 'self-aware'. It is almost as if the immune system maintains a mammoth database
describing
the features and properties of varioustypes of cells andbody
fluidsand continuouslymonitors forentities which show variations fromthisdatabase.
The motivation forpH (Process Homeostasis) IDS comes from thisability ofhumans to define self and
identify
variations from self. This concept corresponds nicely to theprinciple behind anomaly intmsion detection.
Maintaining
computer or system security can be considered analogous to the task ofmaintaining normal internal environment in humans, although human physiological systems are far more complex than computersystems. Just asthe presenceof a foreign
body
or a viral infection inhumans, may result in abnormalbody
temperatures or some other deviation from normal characteristics,computer systems that are compromised do
display
signs of attacks and intrusions. Thechallenge for anomaly intrusion detection systems is to
intelligently
generate definitionsof 'self and spot signs of deviation from these 'normal' definitions at the earliest
momentpossible,withoutconsuming significant resources.
Forrest
[7]
were the first to propose the idea ofusing system call information to createdefinition of selffor processes.
They
proposed that inthe caseof computer systems andapplications, 'self can be expressed on a per-process basis,
by
using the system callhistory
ofthe program which the process is executing. Various attributes ofthe systemcalls a process executes can be used to collect information about whetherthe process is
behaving
abnormally or not. The timing of the system calls, the arguments passed tosystem calls, the program counter or instruction pointer values, separation between
certain specific system calls that ispossibly measured as the
hamming
distance betweenthese calls, the
frequency
of certain calls, the instructions executed between system callscan all be possibly usedto learn about a normally
behaving
process. It is also feasible touse a combination ofthese indicators. The only constraint in using these parameters is
that the detection system shouldnotinterfere with thenormal execution of programs and
shouldcarryoutits analysis andmonitoringtransparently.
Forrest et al [7] looked at short sequences of system calls to generate profiles of normal
program behavior. This simple technique surprisingly, worksjust as well as any ofthe
other above mentionedpossibilities, as proved
by
Warrenderet al [28]. Inthistechnique,signatures of programs are created by, running them in a secure environment and
recording the traces of system callsthatare executed. Thusthe entire set of system calls
executed
by
aprocess, are broken down into sequences oflengthsix, whichare stored inthe normal database. (The length six was chosen
by
the authors after experimentalanalysis and they suggest in general, lengths between six and fifteen forthe sequences.)
One of the drawbacks ofthis approach was that there was no, theoretical reasons or
rationalegiven forthechoice ofthisvalue.
Considerthe
following
snapshotof system calls:openQ, mmapQ, read(), socket, mmapQ, execveO, open(), readQ, closeQ.brkO
Withthistrace, openQ
being
thefirstsystemcall andbrk()
themost recent system callSeql
Seq2
Seq3
Seq4
Seq5
Seq6
openQ, mmapO, readQ, socketO, mmapO
mmapQ, readQ, socketQ, mmapO,execveQ
read(), socketQ, mmapQ, execveQ, openQ
socketQ, mmapQ, execveQ, openQ, readQ
mmapQ, execveQ, openQ, readQ, closeQ
execveQ, openQ, readQ, closeQ, brkO
Fig
3.1 Sequencesof system callswith window size offive.These sequences would be recorded in the database as part ofthe normal profile for this
particular program. Profiles usually consist of thousands of such short sequences. Interestingly, the size of a profileis notproportional to the size ofthe program. So, there
might be a program, which is smaller in size but generates more sequences than some
otherlarger programbecause the sequences generated are more relatedto the complexity
of the different execution paths within the program. If
during
the monitoring phase thefollowing
patternwasobserved,socketQ, mmapQ, execveQ, openQ, writeQ
Thiswould generate mismatches withthepatterns storedinthenormal database. Thusthe
anomalously
behaving
process, whichmay have loadedanother program(by
theexecveQcall)and opened afileand writtentoit (insteadofreading from
it)
willbe detected.Theuseof systemcall sequencesfor generatingnormal signatures can alsobejustified
by
the fact that security violations are
likely
to produce abnormal system call invocations. System calls are the only meansby
which a program operating in user space can enterkernel space and make use ofthe services provided
by
thekernel. It is unlikely that anattack script orpiece of malicious code can operate withoutentering kernel space. Once
operationofaprocessshifts into kernel mode, it haspotentialto cause damageto critical
system files and resources. System calls act as the
boundary
between user space andkernel space and are thus a good choice to watch interactions of a process with the
underlyingkernel. Foranintrusiontosucceedwithoutattempting anysystemcall, itmust
operate entirely in user space. It is very difficult, ifnot impossible to achievethis, andit
is
likely
that such an attack is based on some glaring flaw or programming error in theoperating system itself. One such instance ofthis found in literature [27], involved older
versions ofSolaris,where itwas possible to assume root privileges simply
by
callingthedivide-by-zero interrupthandler. Suchexamples are
fortunately
notverycommon.3.1.2 System calltraces andpHIDS
Somayaji and Forrest
[23]
developed a working prototype of the system call based anomaly detectionsystem, thepH(ProcessHomeostasis) IDS. Thoughtheprototype wasderived from the work of Forrest et al
[7]
as explained above, a slightly differentapproach was taken
by
Somayaji[22]
in storing system call sequences in the actualimplementation ofpH. In
fact,
pH does not store call sequences. It usesTook-ahead'
pans ofsystem calls. These pairs are obtained
by
processing traces of system calls in a'look-ahead'
window atanygiven point oftime. The current system call is recorded and
the pairs it forms with 'n-1
'
previously seen system calls are recorded, where V is the
size ofthe look-ahead window. For example, if a window size of5 is chosen, with 1
being
the oldest systemcall seen and5being
the latest or current systemcall, thelook-ahead pairsgenerated
by
thissequencewillbe (5,1), (5,2),(5,3)
and(5,4).Essentially
thesame information is stored and used to create profiles, however it is represented in a
differentmanner.
Going
back to the above example, consider the point in time when only the first threesystem calls ofthat sequencehave been observed and size ofthe look-aheadwindowis
five.
open mmap read socket mmap execve open read close brk
t
Current System Call
Contentsof look-aheadwindowof size 5
This state generatesthe
following
look-aheadpairs : (read, mmap)
, (read, *, open)
As the look-ahead window moves forward and new system calls are encountered the
number of look- ahead pairs increases. The above sequence would generate the
following
look- ahead pairs.
(n, n-1) (n, *, n-2) (n,*,*, n-3) (n,*,V,n4)
(mmap,open) (read, *,open) (socket,*, *, open) (mmap,*,*,*,open)
(read, mmap) (socket, *,mmap) (mmap, *, *,mmap) (execve,*,
*,*,mmap)
(socket, read) (mmap,
*,read)
(execve, *, *, read) (open,*,*,*,read)(mmap, socket) (execve, *,socket) (open, *, *, socket) (read,*,*,*,socket)
(execve, mmap) (open, *, mmap) (read, *, *, mmap) (close,
*,*,*,mmap)
(open, execve) (read, *, execve) (close, *, *,execve) (brk,*,*,*,execve)
(read, open) (close, *,open) (brk,*, *,open)
(close, read) (brk, *,read)
(brk, close)
In the actual implementation ofpH, alook-ahead window of size nine was used, which
generateseight look-aheadpairs pertrace. To compare the look-aheadpair approach and
the original sequence method, let us considerthe optimal or worst case scenario,where a
process ora program generates all thepossible permutations and combinations of system
call sequences. Let us denote the total number of system calls as
''
and the size of our
look-ahead windowandthus the lengthof our sequencesas '/w'. Look-aheadpairs can be
stored in n x nbit arrays, thus we would require m-1 bit arrays ofsize n x n. In Linux
kernel version 2.4.20 , 255 system calls have been defined in Thus the
look-aheadpairs methodwouldrequire 8 * 256 * 256bits forstoragethatisequivalentto
64KBperprogram. Onthe otherhand, sequencemethod, would requirem
"
(inthis case, 9256)bitsthatis ahugenumber.
*
Completepathislinux-2.4.20/include/asm/unistd.honRed Hat Linuxdistribution,with sourcetreerooted atlinux-2.4.20
The choice of nine as the look-ahead window size also serves another purpose. Since a
look-ahead window oflength nine, generates a maximum ofeight look-aheadpairs, byte
arrays with 256 *
256 locations or elements can be used to store the profiles ofevery
program. The presence or absence of a pair can then be represented as a bit map.
Considerthe
following
example,Sys Call 1 SysCall 2 Sys Call 3 SysCall 4
Sys Call 1
Sys Call 2
SysCall3
Fig
3.2 Contentsofbyte array storing system calllook-ahead pairs.The cell orbyte corresponding to Sys Call 2 and Sys Call 3, consists of8 bits which are
eitherset or reset,
depending
on thepresence or absence of a pair containingthese twosystem calls. Thus ifthe fourth bit ofthat location is set, it denotes that there is some
sequencein which Sys Call 2 and Sys Call 3 occur with a separation of4 system calls.
Similarly
ofthe second bit is reset, it indicatesthat Sys Call 2 and Sys Call 3 have never [image:28.561.67.428.232.464.2]3.2
Working
of pH IDS3.2.1 Implementation
The pH IDS is implemented as a patch to the Linux kernel. The authors ofpH had implementediton aLinuxbox running Debian andwith2.4.20 kernel which was current
atthat point oftime. It monitors andintercepts the invocationof each system call
by
all running or activeprocesses, performs somebookkeeping
operations, checks whether the invocation ofthecall ispermissible andif it is found tobe satisfactory itpasses control totheinterrupt handler.
System calls are treatedas software interrupts since they pass control from the executing
user program to the system call handler routines in the kernel. System calls are the interface between the
library
functions that are available to the user and their kernelcounterparts, which carry outthe actual work. The arguments ofthe
library
function are put in predefined registers and/or are located on the stack, the system call number is placed in the EAXregister and an interrupt is triggered. INT 0x80 is reserved in Linux forsystemcalls. The interrupthandlerthenuses the system callnumbertoindex into anaddress table defined in entry.S that holds the address ofthe function that actually does work forthe system call. The return value from the system call is then placed as a rule in the EAXregister and made available to original
library
function. This mechanism is described inadiagram below:Thismechanismisspecificto theIntelarchitecture andLinux kernel. Other operatingsystemsbasedon a differentplatform,may haveadifferent implementation.
**
Completelocation islinux-2.4.20/arch/i386/kernel/entry.S,on aRed Hat Linuxdistribution,with source treerooted atlinux-2.4.20
Program runningin
userspace
#include<stdio.h> Mnclude<stdlib.h>
int mainQ
{
intx:
x=
getchar(x);
while(x!=
EOF)
{
printf(%c, x);
getchar(x);
}
printf("\n"J; return0;
J
Libraryfunctions
Convertedto system calls
read(0,..,..)
write(l,..,..)
write(l,..,..)
Kernelspace,not accessible
touser spacefunctions
macro
do_syscall()
ret_from_syscall macro
[image:30.561.74.455.153.451.2]Passreturn valuebacktolibraryfunction.
Fig. 3.3 Sequenceofevents insystem call execution.
pH alters this sequence of events. In entry. S just before the call to the sys_call
handler is made, a pH function is called. This function checks if the look-ahead pairs
generated
by
the current system call andpreviouslyseen eight system calls are present inthe normal database. Ifthepairs have previously been seen, the execution ofthe system
call continues normally. However, if one or more anomalous pairs are
detected,
theexecution of this system call is suspended and the process delayed. This
delay
isCalltopH
-training
pH
-testing
-delay
SystemCallInvocation
System Call
Dispatcher
function
Process I
UserCodeandData
Anomalous /
Call ?
/
LegalCall
1
SyscallImplementation
retjromjyscall
Process2
Process 'n'
Scheduler
Pool of User Pro
for CPU.
cesseswaitirg Kernel Space
Fig
3.4 Architecturaloverviewof pH IDS.Adaptedfrom [22]3.2.2
Learning
phaseIn the
learning
phase pH observes the traces of system calls that are generatedby
aprogram runin a secure environment andbuildsnormalprofiles as describedabove. This
database ofnormal profiles is known asthe 'training data set'
As new look-aheadpairs
arediscovered
during
thelearning
phase, theyare includedinthe training data set. Onceenough time has passed without the addition ofany new pairs, the profile is set to be
ready for monitoring. pH uses a set of conditions to determine if it has seen enough
normal program behavior. After these conditions have been satisfied, it can begin the
monitoringphase andlookforanomalousbehavior.
The totalnumber of systemcalls executed
by
allprocesses running a common programisknownas the ctrain_counf Thenumber of system calls seen sincethe lastadditiontothe
[image:31.561.65.518.86.381.2]profile is known as
'
last_mod_counf. pH checks the ratio
train_count/ (train_count
-last_mod_count) . If this ratio is greater
than
four,
the profile is 'frozen',indicating
that enough system calls have been seenwithout any addition of newlook-ahead pairs to the profile. Ifnewlook-ahead pairs are
found while the profile is frozen, the profile is
'thawed'
and last_mod_count is set to
zero. For the profile to be frozen again, pH waits for the above ratio tobe greater than
four. The value offourwas arbitrarily chosen and canbe altered
by
the user. Notethat ifthere are two ormore processes ofthe sameprogram active, then all ofthese processes
contributeto theprofile pairs.
Theother conditionthat pH checks beforeitcanbeginmonitoring, istosee iftheprofile
stays frozen for a certain length oftime. Ifthe profile stays frozen for a period oftwo
weeks, then itmarks theprofile as
'normal'
and pHbegins monitoring forthe program.
Both ofthesechecks are necessaryto prevent a profile from
being
prematurely classifiedasnormal. Considerthe exampleof aprogramthatisnot
frequently
used, say onceeveryweek. This program will satisfy the above ratio, but since it has not been invoked too
often, its profile is still incomplete. There are certain execution paths that will generate
new look-ahead pairs and these have not been included in the profile. Ifthis profile is
marked as normal, without waiting for some periodoftime after it has been marked as
frozen, many falsepositivesmay be generated
during
the monitoringphase.On the other hand consider some complex program that too is not run frequently. A
program like Mozilla Web Browser displays significant variation in execution patterns
and generates a large number of look-aheadpairs. Ifwe only wait for a period oftwo
weeks, and do not check for the ratio condition, the profile may get classified as
3.2.3
Monitoring
phaseOnce the profile has beenmarked as normal, pH can shift its operationto themonitoring
phase. It does this
by
copying the training data set to the testing data set and uses the testing data set formonitoring purposes. Ifa system call generates alook-ahead pairthat is not present in the testing data set an anomaly is recorded. For every process, pHrecords the status oflast 128 system calls seen. Eachprocess is associatedwith a 128 bit
array, the contents ofwhich denote which ofthe last 128 system calls were anomalous. This value is called the
'locality
frame count'and is used to
delay
the processby
2iocaiity_frame_count * delay_f
actor . Delayjactor is a pH variable, which can be manipulated
by
the user(e.g
setting delayjactorto zero, effectively turns down pH's responses). Thus, even ifthecurrent system call is not anomalous,its executioncouldbe delayedby
the value computed above, if any ofthe last 128 system calls have beenmarked as anomalous.
The anomalies generated
by
each profile are also stored in the variablei
anomaly
Consider the example where two instances of vi editor are running, first of which has
generated atotal of 125 anomalies and the other 55 anomalies.
Assuming
these were theonly instancesof vi that were executed, the anomaly value for the entire profile of
vi is set to 180. Also, suppose the firstvi process has generated 20 anomalous calls in its
last 128 calls, while 40 anomalies have been associated with the last 128 calls ofthe
secondvi process. Thus the two processes will be delayed
by
different amounts and are independent ofthe anomaly value ofthe whole profile, (the first process with alocality
of 20, will be delayedby
@20 * delay_factor) seconds,while the second process with a
locality
of 40 will be delayedby
(240
* delay_f
actor) seconds.)
Once monitoring starts, the testing data set is not updated.
Theoretically,
no more new look-aheadpairs should bespotted but practically, there always exist a few pairs of callsthat were not encountered
during
thelearning
phase. Suchnew look-aheadpairs observedduring
monitoring are added to the training data set. These new anomalous pairs couldeither be attributed to intrusive activity or simply, legal user actions which were not
spotted
during
thelearning
phase. pH assumes thatsuch first few occasional anomalouscalls are more
likely
to be the result of unseen useractivity and includesthem intrainingdata set. These are then included in the normal profile ofthe program, when the
trainingdatasetis periodicallycopiedinto the testingdata set.
3.2.4
Manipulating
pHIt is possible to control and dictate the response and actions of pH through three
commands that are apart of a utilityprovided withthe source code ofpH. It is possible
to 'reset'
pH on aper profilebasis. Thisclearsthetraining andtesting datasets.
In spite of spending enough time in the
learning
phase, it could be possible that ourtesting data set is incomplete and does not represent the entire range ofbehavior of a
program. Ifthisis thecase, thena processreferringthisprofile could continueto generate
a large number ofanomalies, which in reality aremerely false positives. To
identify
thisevent, every profile is associated with an 'anomalyJimiV which specifiesthe maximum
number ofanomalies that it can generate. Ifthe anomaly ofthe profile exceeds its
anomalyJimitvalue,it is
likely
thatsuch aprofileis incompleteand we mustdiscontinueto use it. This is done
by
the 'tolerize' command, which tellspH to stop monitoring andbeginthe
learning
processagainfortheprofile inquestion.Since the new or anomalous pairs that are discovered
during
the monitoring phase, areadded to the trainingdata set and eventually intothe normal profile forthe program, we
could accidentally learn about an attack. pH tackles thisproblem
by limiting
the numberofanomalies that are added to the training data set in the monitoring set. This limit is called
'
tolerize and is associated with every process. If a particular process
generates more number of anomalies than this value, the 'sensitize' command can be used to tell pH that it should forget whatever new behavior it has learnt about this process.This is doneby,clearing thetrainingdataset.
*
Thisvalue,knownastolerize_limit,issetto12.Ifmorethan 12new pairs areseen,pHconcludesthatit
3.3
Mimicry
AttacksAn attack or intrusion can be separated intotwo phases, an initial penetration phase and
an exploitation phase.
During
the penetration phase generally some vulnerability ofthesystem is used to gain access to the system. This first step often consists ofgainingroot
privileges. The actual damage is done in the exploitation phase, when malicious code is
executed to change the state ofthe system. This often also involves
installing
backdoorstoensure the intruders can returnwheneverthey wantin future.
In mimicry attacks, intruders try to hide their presence
during
the exploitation phase.They
tryto escape detectionby
removingalltraces or theiractivityorcamouflagingtheirpresence.
They
'mimic' the behavior of a normal user. Mimicry attacks are possiblebecause intruders can adapttheir attackstoeludethe detectionalgorithm oftheIDS. It is
assumed that the intruders are aware ofthe internal operation ofthe IDS, are familiar
with its detectionprinciple and canduplicate its behavioron otherhosts.
Mimicry
or evasion attacks on network intrusion detection systems have received someattention in the past, although host based systems have notbeen looked at in great detail.
Chung
et al [4] showed thatby
'parallelizing' a sequential attack script, it is possible todistract the IDS. They showed that
by
using program parallelization techniques likeanalysis ofdata dependency, control
dependency
and data flow, it is possible to break aserial attack script into a concurrent or distributed script, which can be run
by
usingparallel threads of execution. Network Intrusion Detection Systems often employ a
'network
monitor'
observes the traffic of packets into and out of the network. These
network monitors have the requirement that they do not interfere or slow down
communications
by
a significant extent. Hence they are designed tobe light weight andbecause ofthisrequirementthey maynotbeaware ofdetailsoftheprotocol semantics on
thetarget systems.
Handley
etal[8]
suggeststhat such monitors maynotbe ableto detectthe ambiguities in network traffic and may be susceptible to the activity of a clever
intruder, (e.g. some monitors maynotbeprogrammed to reassemblefragmented packets,
or may not be aware ofthe underlying topology ofthe network). Ptacek and Newsham
[24]
have shown that the basic principle behind network intrusion detection, namelypassive protocol analysis, is not sufficient to stop attacks and it is possible to dupe the IDS.
3.4
Susceptibility
of pHIDSWagner et al
[27]
have shown the failure ofpH IDS in stopping mimicry or evasionattacks. In this category of attacks, the intruders try to hide their presence
by
not generating abnormal tracesofsystemcalls. Sincethe IDS doesnot see anythingwhich ithas not seenbefore, it fails to trigger an alarm. The authors of
[27]
have suggested someways in which these attacks canbe carried out andthis throws some light and also calls
formore discussion ontwo issues,
firstly
howto patchthepH IDS tomitigate thesekind of attacks andsecondlythepractical riskinvolved inthesekindofattacks.The above worklists a few ways in which evasion attacks can be carried out. The threat
and thepotential ofdamage from thiskind ofattacks is explained and is equal to, ifnot
greater than that from any other category of intrusions and attacks. What must be ascertained is, how much resources are needed
by
the intruder in terms oftime, effort, system capabilities and most importantly, expertise to successfully carry out thiskind of attacks. Obviously it is difficult to quantify these requirements, but a scientificapproximation canbe obtained, which should be able to indicate, how critical is theneed
to
develop
acountermeasureagainst, thisform ofintrusion.Some approaches to carrying out mimicry attacks, which are put forth
by
Wagner et al[27], seem practically impossible (though theoretically
feasible)
as this has beenacknowledged
by
the authors themselves. The authors claim that it might be possible forintruders to acquire root privileges
by
exploiting vulnerabilities in the OS and theninflict damage without
issuing
any system calls. However the extent of this damagewouldberestricted, ifnottrivial, asany significantdamageto the system orenvironment
requires circumventing the normal permission-level controls on its resources and this
approach suggests that the intruder could wait till a certain point of time, when his
malicious code sequence would pass as normal. The pH IDS keeps
learning
more aboutthe programs and processes that it monitors and ofien it observes new or abnormal
activitythatin factcouldvery wellbe simply evolving legitimateuserbehavior. It thenis
upto the system administrationtodetermine whetherthisnewactivity is indeed intrusive
or legal activity that needs to be added to its definition ofnormal. Thus, though it is
possiblethat the definition of normal might change overaperiodoftime andthischange
is visible to the intruder, it seems very unlikely that it will ever accommodate serious
harmful intrusive activity (e.g., it is unlikely that sendmaiV program would repeatedly
access or modify system files, an
ftp
server or client program would escalate thepermissions onfiles).
Thus it seems
fairly
implausible that these two approaches would be successful indeceiving
the IDS and avoiding detection. In theunlikely event that somehow, either ofthe two above approaches does manage to produce a successful intrusion and evade
detection, pH IDS would be able to do very little to thwart it , unless its design or
detectionalgorithmisalteredradically.
The other two suggested ways of carrying out evasion attacks are relatively more
plausibleand
interesting
with respecttodeveloping
a solution againstthis. The authors of[27]
havepointedoutthat intruderscould substitutethe arguments passedtosystem calls,by
some illegalparameters andachieve thedesiredresults. Thus iftheintruderwantedtomodifythe
/etc/syslog
. conf file, after gaining root access, instead ofwaiting forthe system call
open('
/etc/syslog.conf, write) to appear inthe normal execution
stream, he could use any open call and replace its parameter