Using system call analysis to stop evasion attacks in anomaly based Intrusion Detection System

(1)

Rochester Institute of Technology

RIT Scholar Works

Theses

Thesis/Dissertation Collections

2004

Using system call analysis to stop evasion attacks in

anomaly based Intrusion Detection System

Ashish Liladhar Samant

Follow this and additional works at:

http://scholarworks.rit.edu/theses

This Thesis is brought to you for free and open access by the Thesis/Dissertation Collections at RIT Scholar Works. It has been accepted for inclusion in Theses by an authorized administrator of RIT Scholar Works. For more information, please [email protected].

Recommended Citation

(2)

Using System Call Analysis To Stop Evasion Attacks in Anomaly Based

Intrusion Detection System

Thesis report submitted in partial fulfillment of the requirements of the

degree Master of Science in Computer Science

July 2004

Ashish Liladhar Samant

Advisor: Prof. Leon Reznik

Leon Reznik

Reader: Prof. Warren Carithers

Warren Re Carithers

(3)

Thesis/Dissertation Author Permission Statement

Title of thesis or dissertation: U ~, N

4

s ....

S

l' ~ f\.\ C A L.L P, N A L

't

S , $

-,0

SlOP €VA.s'Of\J ATTPtC.\C.$ IN

A~oMfH",'t

SA-SeD

I I-.l'RUSION D€"('('tTION S'l.$T€.N

S

.

Name of author:

f\

SH I S. H

c....

SAM

AtoJ

r

Degree:

M A

50

T

c42..

0 F

S (

I

6 I\J

C-€

Program: CO M

i'L)'€4Z..

S,

I€: N (6

College:

y

C ( I

S

I understand that I must submit a print copy of my thesis or dissertation to the RIT Archives, per current RIT guidelines for the completion of my degree. I hereby grant to the Rochester Institute of Technology and its agents the non-exclusive license to archive and make accessible my thesis or dissertation in whole or in part in all forms of media in perpetuity. I retain all other ownership rights to the copyright of the thesis or dissertation. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation.

Print Reproduction Permission Granted:

I, f) S \-\

ISH

L

S

Prm

f\-

rV 'T

,

hereby grant permission to the Rochester Institute Technology to reproduce my print thesis or dissertation in whole or in part. Any reproduction will not be for commercial use or profit.

Ashish Samant

Signature of Author: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ Date: 0

'1 /

'30/1-00

Lt

Print Reproduction Permission Denied:

I, , hereby deny permission to the RIT Library of the Rochester Institute of Technology to reproduce my print thesis or dissertation in whole or in part.

Signature of Author: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ Date: _ _ _ _ _

Inclusion in the RIT Digital Media Library Electronic Thesis

&

Dissertation (ETD) Archive

I, , additionally grant to the Rochester Institute of Technology Digital Media Library (RIT DML) the non-exclusive license to archive and provide electronic access to my thesis or dissertation in whole or in part in all forms of media in perpetuity.

I understand that my work, in addition to its bibliographic record and abstract, will be available to the world-wide community of scholars and researchers through the RIT DML. I retain all other ownership rights to the copyright of the thesis or dissertation. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation. I am aware that the Rochester Institute of Technology does not require registration of copyright for ETDs.

I hereby certify that, if appropriate, I have obtained and attached written permission statements from the owners of each third party copyrighted matter to be included in my thesis or dissertation. I certify that the version I submitted is the same as that approved by my committee.

(4)

Acknowledgements

I wouldlike to take this_opportunity tothankall thepeoplewho haveguidedme, not_only

during

the completion of _my thesis but also

during

the course of _my education at Rochester Institute ofTechnology. I wouldlike to thank Prof. Leon _Reznik, _my advisor,

for supportingme throughoutthe durationofthe thesisand for his invaluable input. Right

fromthestart, while_preparingtheproposalfor_{this work, to the}_veryend while _preparing

the final _report, he was always _very patient and helpful. I would _{specially like} to thank Prof. Warren Carithers for his insightful comments and fruitful discussions over the

subject matter and for guidingme through the technical _writing skills. I must also thank

Prof. Fereydoun Kazemian for reviewing my work and for his support. Aspecial thanks to Prof. _Reek, who guided me through the initial phase and whose professionalism and

passioncontinuetobe a sourceofinspiration. I wouldalsolike to express_my gratitudeto the _very resourceful and generous users of the Linux Users

Group

Of Rochester

(LUGOR)

for

bailing

me out on numerous occasions when I had trouble with Linux issues. _Lastly, I would like to thank _my parents for their continuous support and

(5)

Abstract

Intrusion Detection Systems _(IDSs) that operate on the principle of system call

monitoringare knowntobe susceptible to_mimicryorevasion attacks. Ithas been shown

that an intelligent _adversary armed with comprehensive knowledge ofthe target system

or _network, can penetratethese _targets, hide his presence from the _IDS, and continue to

carry out damage. _IDSs, which use system calls to define normal _behavior, often leave out _{complimentary information} about_them, and intruders use _preciselythis _drawback, to

deceivethe IDS.

This thesis investigates the vulnerabilities of a system call based IDS and carries out a

theoretical and experimental _study of methods _allowing to improve the IDS performance

and reliability. It analyzes the designprinciples and architecture of_{anomaly based IDSs}

and studiestheimplementationofatypical system call_{based anomaly IDS. This category}

of _{anomaly detection} systems is currently attracting considerable attention within the

research _community and various prototypes have been developed in recent years. The thesis investigates the hypothesis that

by

_monitoring the number of system calls that fail

and return error values on aper process_basis, itwouldbepossibleto

identify

_abnormally

behaving

processes. It also suggests that

by

_{using only} a certain set ofcritical system

calls instead of all the defined _calls, it could be possible to detect and _{stop mimicry}

attacks. pH IDS is used for the purpose ofthe experiments as its source code is

freely

available. It works as a patch to the Linux kernel and alters the _way system calls are

handled. The tests were carried out on a stand-alone Linux box running RedHat 9 with

kernel version 2.4.20. Local _exploits, which were _readily available on the _Internet, were

usedintheexperiments.

Some oftheresults obtainedcontradictedour originalhypothesisand are indicative ofthe

scope for future work inthis area.Thetestsrevealed that itwasnotpossibleto _simplyuse

system call return values to

identify

_erroneously

behaving

processes. However after

classifying the system calls into critical and non-critical _sets, a form of_mimicry attacks

could be _successfully detected. The results confirm the potential of this technique to

(6)

TableofContents

Acknowledgements

Abstract

List offigures

1. Introduction 1

1.1. Intrusion Detection 1

1.2. pHIDS 2

1.3. Problem definition and pitfalls ofpHIDS 2

1.4. Thesis Outline 3

2. Literature Review 5

2. 1 Computer

Security

5

2.1.1 Whatare we

dealing

with ? 5

2.1.2 Whoarewe

dealing

with? 6

2.1.3 Genericapproachestocomputer_security 7

2.2 Nature of attacks 8

2.3 Classificationofintrusion detectionprinciples 9

2.4 Currentstateof research 11

Features andVulnerabilitiesof pH IDS 1 5

3.1 Design of pHIDS 15

3.1.1 Process Homeostasisanddefinitionof'self 15

3.1.2 Systemcalltraces and pHIDS 18

3.2

Working

ofpH IDS 21

3.2. 1 Implementation 21

3.2.2

Learning

phase 23

3.2.3 _Monitoringphase 25

3.2.4 _ManipulatingpH 26

3.3

Mimicry

Attacks 27

3.4

Susceptibility

of pHIDS 28

ModificationsandExperimentswith pH 3 1

4. 1

Testing

environment 31

4.2 Classificationof system calls 33

4.3

Tracking

of erroneoussystem calls 37

5. ResultsandConclusion ₃₉

5.1 Outcome from classificationof system calls ₃₉

(7)

5.3 Conclusion 48

5.4 Scopefor futurework 49

References 51

Appendix A 55

Appendix B 59

(8)

List ofFigures

Figure 2.1 PopularResearch Intrusion Detection Systems. Figures in brackets denote theirpublications in References List.

Figure 3.1 Sequences of system callswith window size offive.

Figure 3.2 Contents ofbyte array storingsystem calllook- aheadpairs.

Figure 3.3 Sequenceof eventsinsystem call execution.

Figure 3.4 Architecturaloverviewof pHIDS. Adapted from

[22]

Figure 4.1 Architectural Overview afterthemodificationstopH

Figure 5.1 Partial systemcall traceofXgalagaexploit.

Figure 5.2 Growthof profilebefore and aftermodifications.

Figure 5.3 Strace outputforemacs.

Figure 5.4 Graphicalrepresentation of rate ofgeneration oferrorsfor emacs. Referto

Appendix C for datavalues.

Figure 5.5 Graphicalrepresentationofdistributionof errors foremacs. Refer to

Appendix C for datavalues.

Figure 5.6 Graphicalrepresentation ofdistributionof errors for_emacs, whenafile

wasedited. Referto Appendix Cfor datavalues.

Figure 5.7 Graphical representation ofdistributionof errors forpHmon. Referto

(9)

Chapter 1

Introduction

1.1 Intrusion Detection

With the advent ofthe Internet andhigh-speed access _links,networks have beenthrown

open to the outside world and can be accessed with considerable ease as compared to

those of10 years ago.With this _{increase in availability} of_{services, there} also exists the

increased threat ofillegal access and misuse of systems. The source ofthis threat to the

integrity

of resources and services can be either internal (e.g. somebody within the

organization) or external (e.g. a hacker paid

by

a business competitor to siphon out

sensitive data). Irrespective of _{the sources,} it is necessary to safeguard systems and

maintain their appropriate and legal use. This is where Intrusion Detection Systems

(IDSs)come intothepicture.

Intrusiondetectionistheprocess of_monitoringthestate and use of asystemtodetectthe

presence of unauthorized activity.

Many

features and characteristics of system can be

used to construct views of normal _activity; the system can then be monitored

continuously to spot _{any divergence from} these normal views. _Alternately, known cases ofintrusions are used as a reference andthe behavior ofthesystem is observedto detect

any similarity to behavior of a system that is compromised

by

these known attacks. Response to intrusions has _{traditionally} not been considered as a _necessary feature of

intrusion detection systems, but of_late, considerable research effort has

being

directed towards

integrating

an automaticresponse mechanism inthesesystems.

Intrusion Detection Systems deal witha little piece ofthe problem of computer security.

In a _{typical scenario,} an IDS would _accompany other _security mechanisms like

firewalls,

anti virus software and

integrity

_checkers, all of which deal with the same problem at different levels. The task of_preventing systems from

falling

in to the handsofintruders

(10)

and _realizing thatattacks are inevitable and

directing

effortsto _{early detection. Intrusion}

detection systems are directed toward the latter stage, where the goal is to minimize

damage caused

by

a _security violation. The typical response of an intrusion detection

system is to 'raise an

alarm'

and _notify the system administrator, who in turn checks for falsepositives andthen takesremedialactionifnecessary.

1.2 pH IDS

The pH IDS is a host-based anomaly intrusion detection system and was designed and

developed at the _University ofNew Mexico

by

Anil Somayaji et al [23]. It can be

considered as representative ofthe entire class of intrusion detection systems that use

system call monitoring. The underlying detection algorithm ofpH IDS is based on the

generic approach of_using system call sequence information for constructing definitions

of normal program _activity, first suggested

by

Stephanie Forrest et al [7]. One ofthe unique characteristics of pH is that it can also respond to intrusions and take preventive

action on its _own, without the intervention ofthe system administrator. For _{this reason,} its authors have described it as 'autonomous response _system', rather than an intrusion detectionsystem.

1.3 Problem definition andpitfalls of pH IDS

Results of_preliminary analysis and experiments carried out

by

Forrest

[2,3]

show _that,

system call _{monitoring is} a _very effective approach to do intrusion detection. Forrest

extensively used the 'sendmail program for _testing purposes and '//?/-' in certain cases.

'SendmaiV was a good choice because it is

fiirly

_complex, generates _many variations of

legal behavior and there are _{many known} attacks to which it is susceptible. In their

results, increased instances of abnormal _activity could be detected in instances of

successful intrusions

by

some known attack scripts. Though these tests were carried out

(11)

UNIX*

, with slight or no modifications. The pH IDS, which was based on the above principle and was developed for a Linux kernel displayed similar results when further

tests were carried _exploitingthe vulnerabilitiesin SSH daemon [23].

From the above documented results andreports oftheperformance of pH IDS itcan be

concludedthat this approach (and the ph_IDS) is

fairly

successful against a broad variety

and classes of attacks and is not just limited to certain typical programs. Once a

vulnerability in a program has been exploited and the intruder injected his malicious code, it is

highly

probably that ph IDS will recognize the abnormal traces andraise an alarm before wide scale damage occurs. _However, Wagner and Sato have shown the

failure of pH IDS in stopping mimicryor evasionattacks [27]. In_{these attacks,}intruders

try to hide their presence, try not to generate abnormal traces of system _calls, and then

attempt to _'slip under the radar'. Since the IDS does not see _anything which it deems

'suspicious', it fails to trigger an alarm. _{When using} system call information to learn about normalbehaviorof_processes, _manyaspects of systemcalls canbeconsidered(e.g.

parameter values passedto system calls,the return values obtained after_{their execution,}

the relative frequencies of system _{calls, the} _timing ofsystem calls with respect to the othernotable events in theprogram _flow, and so on). pH _onlyuses sequence information

or relative _{ordering information} ofsystem calls. This_{design has very little} overhead and

is relatively less complex. But the _simplicity of pH IDS creates potential for anintruder

to break in. Theauthors of_[27] have suggested some ways inwhichthese attacks can be carried out and this raises two issues: how to _modify pH IDS to mitigate these kind of

attacks, and the practical risk involvedinthese kind of attacks. I focus on both ofthese

issues _{in my}thesis.

1.4 Thesis Outline

The remainder ofthis documentis organized as follows. Chapter 2 throws somelight on

the broad areaofComputer

Security

andthe role ofIntrusion _{Detection Systems.} It talks

about the different categories of intrusion detection systems and their features and

(12)

touchesonthecurrent state of research inthis area. Someofthesignificant research work

and papers are referenced and a background is created for the reader who _may not be

familiarwiththe concepts and currentpoliciesinthis domain.

Chapter 3 discusses the _working ofthe pH IDS. I look at _{the underlying} architecture of

the system and explainin briefthe_workingofthis system. The environmentinwhichpH is implemented is described. Some ofthe salient results obtained

by

the authors ofthe

system are noted. I explain the_{susceptibility} of pHIDStoevasion and _mimicryattacksas

outlined

by

WagnerandSato [27].

Chapter 4 establishes ourhypothesis. I outline a few approachesto _tacklingthe problem

and describe the tests and experiments that were carried out and the rationale behind them. The goal ofthese experiments was to evaluate some ofthe suggested solutions,

eliminate _those, which do not yield _satisfactory results and at the _least, establish a few pointersfor future _work,which might generate foolproofsolutiontothis problem.

Chapter 5 documents the results obtained from the tests. These are used to draw conclusions and test our assumptions and hypothesis. I _try to correlate the actual results obtained and ourexpected results and explainthe_{discrepancies,}when _theyoccur. I also

(13)

Chapter 2

Literature Review

2.1 Computer

Security

2.1.1 What are we

dealing

with?

Because of advancement in microprocessor _technology, closer _coupling between

operating system architectures and _programming languages and multiplication of resourcesduetowidedeploymentofdistributed systems and_networks,we are _todayable togetmore outputfrom our systemsthanwas evenimaginablea fewyears ago. Increased

computing capabilitymeansthattoday'ssystems and applications nowhave significantly greater _{functionality,} as well as

being

_both, faster and larger. _However, the downsideto

this has been that _they have also become more complicated and _thus, more _unstable,

prone_{to crashes,} _security_violations,abuse andlossof productivity.

The nature of our networks and pervasiveness of today's _{Internet only} add to the

problems faced

by

system administrators and personnel in charge of_{scripting security} policies. The Internet is an unbounded _network; remote hosts are added and deleted

randomlyand it is very difficult (ifnot _impossible), to predict theirbehavior while _they are connected to the network. The network protocols on which these networks operate

were designed decades ago, when the number ofhosts was far less.

Security

was not a

primary goal ofthe designers and so theseprotocols were

inherently

insecure (e.g. IP addresses, which are _essentially a sequence of _bits, is the _only mechanism used to authenticatethe two_{communicating hosts).}

Tomake matters worse fromthe _securitypoint of_view, _{computer systems and networks}

are not static entities and _they

display

constant changes in their characteristics. The

number ofhosts connected to a network can _{vary unpredictably;} system and application

(14)

users _accessing these systems is variable; and, most _importantly, definition of what

constituteslegitimate behavioris alsodynamic. Inthisenvironment, it becomes harderto

distinguish between abnormal _activityor stateof a system caused

by

a _security violation

andhithertounseen_activitythat _{is simply}aresultof_evolvinguserbehavior.

Consequently

it becomes critical that we focus our research efforts on _producing more

reliable, fault tolerant and secure _systems, networks and programs. Computer security is

notjust limited to detection of_intrusions, but also consists of_guaranteeing

integrity

of

data, ensuring availabilityofservices and_preventingviolationsof_securitypolicies.

2.1.2. Who arewe

dealing

with?

Originally,

hacking

was the domain ofthe elite and exclusive _group of computer _savvy geeks.

Hacking

into systems required access to resources like fast _processors, large

amounts of _memory and above all expert level technical know-how. The

hacking

community prided itselfon its exploits; the motivation for

breaking

systems and code was _usually a desire to impress others with one's technical ingenuity. There were instances when hackers were paid to break into organizations

by

a rival business competitor,buttherewere intheminority.

But

hacking

is no longer restricted to the geek community. _Ironically, it is the wide

availability of information on this topic on the _internet, that enables _anybody with Internet access and a _reasonably well-_{equipped computer} _to _{be in} _{a position} _to

cause

damage

by

spreading internet worms, spamming email,

injecting

trojan code and

breaking

into systems

by

use of malicious code. Thereexist entire websites dedicated to

the education ofthe average userintheart of computerhacking.

They

contain snippets of code, tutorials and listings ofknown _exploits, and automated tools that sniff_networks,

identify

vulnerable machines,extract host information.

The threat to the _securityof systemsisnotlimitedtohackers or crackers who are external

to the organization.

Security

violations also occur fromwithin the organization itselfand

(15)

privileges _may use an exploit against _{vulnerability in} a program to gain elevated

privileges and access sensitive _data, _modify_records, installback doors or delete system files.

2.1.3 Generic approachesto computer_security

Thomas Longstaffet al

[18]

classified acts of_security violationsinthe

following

manner.

Whenever information or resources of a system are copied or read

by

an unauthorized

user, itconstitutes loss_ofconfidentiality. Wheneverresourcesare corrupted

intentionally

or _{unintentionally,} it constitutes loss of integrity. Whenever resources are deleted or

become unreachable, it constitutes loss ofavailability. In all _instances,

'resources' refers

toeitherthedatathatoursystemsholdortheservicesthat_theyprovide.

The _{first step} towards_ensuringthe _security ofa system isto restrict accessto thesystem.

A user _wanting to connect to the computers must prove his identity. This is known as

authentication and _{generally involves} verification through passwords and usernames.

After the _authenticity ofthe user has been_{proved, the} user is allowed to access certain

parts of the _system, while certain features are hidden from him. This is known as

authorization, whichdefineswhatregionsof asystem a particularuser can reach. This is typically achieved

by

_{setting up}privilege levels and _read, writeand execute permissions onfilesanddirectories.

Another aspect of computer _{security is guarding data} as it travels across the network.

During

its_trip from onehostto another, it is

likely

that thedatawillpass_{through routers,}

gateways and hosts that_may notbea part ofthe organization. Itthus becomes important

that these external entities are not able to comprehend the sensitive data or garble it intentionally. For _{these purposes,} mechanisms like encryption are usedto make the data

unintelligible toanyone otherthan the_{communicating}parties.

A third _category of computer _security tools and mechanisms deals with _monitoring the

system,

detecting

violations, and _responding to them. Tools such as

firewalls,

file

integrity

checkers, system scanners _and, anti-_virus _programs

(16)

evaluate the state of a _system, scrutinizethe data

flowing

inand out oftheboundaries of

a_network, and _{generally look for}_{any kind} of suspicious activity, user or state of system

variables. IntrusionDetection Systems fall intothis category.

They

assumethatinspite of

best efforts, systems will be compromised; their goal is to detect these violations at the

earliest and minimizedamage

by

_takingappropriate remedial action.

2.2 Nature ofAttacks

Any

activity that contradicts or diverges from the intended usage of a system and its

applications and violates the _security policies ofthe organizationis consideredharmful.

Thus even _stealthy probes that are carried to detect vulnerable hosts on a network are

dangerous as these probes are often followed

by

payload ofthe attack itself. Network

scanning, probing, website-defacing, account _compromise, to the more _prominent,

password_cracking, denial of service _attacks,malicious code and the complete disruption

of normal servicesareall formsofhostileactivity.

Someofthe welldocumentedways of_attackinga system _[18,

20]

aredescribed below:

ProbesandScans: These are used to learn about _{the network,} discovervulnerable hosts

and

identify

software

being

_used,

including

_operating system _types, versions of various

programs, updates _{(if any)} on installed programs.

Stealthy

probes do not _usually

themselves damagethe system, but shouldbe consideredjust as dangerous as an actual

attack. These information gathering activities can be spread over a number ofdays and

even_weeks,as _ports, applications andhostsare scanned and_tested, one after another.

Privilege Escalation: Known vulnerabilities in systems thathave not been _patched, _may

beexploitedto gainsuper-_{user status.} _The_{goal of}_this_kind

(17)

Denial ofService: These attacks are _primarily carried out to make the systems or

applications unavailable and

deny

its authorized users access to services.

They

do not

necessarily access or _modify the data held within the system. Hackers may generate

repeated requests of a particular application or flood a network with their packets thus

making it harder for legal usersto accesstheseresources.

Packet Sniffers: These are anotherformof_stealthyattacks. Theseprogramstrackpackets

that flow over the network, and siphon out information from the encrypted data.

Typically

these programs look for usernames and passwords and other authentication or

personal _information,whichis laterused for

forging

identity.

Masquerading: In these kind of _attacks, through clever manipulation of contents of

packets, intruders assumetheappearanceoftrustedhostsand users. Thus ifa networkhas

been set _up to accept remote connections _{only from} a few specific _hosts, intruders

'masquerade'

asthese trustedhostsand_{may fool} thenetwork _{into accepting}them.

Trojan Code, Viruses, Worms: These are all examples ofprograms which can cause

damage on alarge scale, affectingentire networks. Theirpresence isnot _{easily identified}

until afterthe damage hasdone.

Typically

requireauserto execute some malicious_code,

which thenactivates ortriggers theseviruses and worms.Oftentheseare _{self-duplicating}

and_{self-propagating,}_affectingall connectedhosts.

2.3 Classification ofIntrusion Detection Principles

Based ontheir detectionprinciple Intrusion Detection Systems

(IDS)

can be classified as

anomaly based or signature based. An anomaly based IDS

initially,

learns about the

normal behaviorof a program orthesystemon_{the whole,}

by

_observing some predefined

characteristics. From this _information, the _anomaly based IDS tries to generate rules or sets of conditions, which govern normal usage. After sufficient time has been spent

creating profiles of normal _activity, it is then deployed and it monitors the network or

(18)

not match the reference _profiles, is flagged as anomalous. When sufficient suspect

behavior is_observed,_usually system administrators are notified. Notethat it is necessary

to ensure that

during

this

learning

_phase, there are no instances of an attack or an intrusion,or else theIDSmightlearnaboutthe attackitself.

In signature based _IDS, _{previously known} attack scripts and intrusions are used.

Using

the

history

of such _{already detected} attack _vectors, databases ofintrusive activities are created.

During

the _monitoring _{phase, these} databases are used to examine all seen

activity. Ifthere is a match between patterns ofsystem calls or contents ofa series of

packets or some other system observable and the contents ofthe _database, it is certain

thatthesystem is underattack.

There is also a _categoryof intrusion detection systemsthat usesa combination ofabove

principles and is known as Compound IDS. The most popular example of such a system is the IDES or Intrusion Detection Expert System _[19], which was followed

by

the NIDESorNext-generation IDES [1].

Anomaly-based IDSs suffer from the problem offalse positives. It can be difficult for

them to distinguish between _legitimate, _changing user-behavior or less

frequently

observed legal user behavior that was not seen

during

the

learning

phase, and real novelties that occur because of an intrusion. Signature based IDS do not generate false positives; however, the success of these depends on how efficiently and _quickly the database ofknown intrusions can be updated. This task is harder than it seems, because

even _though, a _{newly discovered} exploit can be

fairly

_quickly added to the _monitoring

database, it is often possible to generate variations ofthe same attack. Signature based

IDS arealsounableto

identify

_completelynew attack vectors.

IntrusionDetection Systems can also

loosely

be classified onthe basisof where _they are deployed. WhentheIDS is deployedontheperipheries of a network and monitorstraffic in and out of_{the network,} i.e. captures and analyses_packets, it is known as aNetwork

(19)

restricted to that _host, it is known as Host Based IDS. Host Based IDS can also use

networktraffic in and outofthehost tomonitorbehavior.

Amore detailed sub-classification of_{these systems,} based on their

learning

mechanism,

can be found in [2]. To _summarize, in anomaly-based _{systems, the} reference databases

contain valid behavior while in signature-based systems thereference databases contain

samplesofillegal behavior. Theabove discussionleadsustothe question of whatarethe

contents ofthesedatabases. Various attributesof a systemor anetworkhave beenusedto

classifynormal orabnormal behaviorof processesand programs. I discussthesebelow.

2.4 Currentstateof research

IDS TYPE FEATURES

IDES/NIDES

[6,7]

_Anomaly and Signature

Based

Use profiles of user behavior

TRIPWIRE

[13]

_AnomalyBased Checks attributes of

system files

NADIR_[12] Network Based Monitorsnetwork traffic

DIDS

[11] [10]

Network Based Monitors network traffic

TIM

[25]

_Anomaly Based Creates profiles from

userbehavior

NSM

[9]

_Anomaly Based Network

IDS

Monitorsnetworktraffic

Fig

2.1 Popularresearch Intrusions DetectionSystems.Figures in brackets denote

theirpublications in ReferencesList.

Initial interest inthe areaofintrusiondetectionand_{specifically anomaly detection} was

onfocused usinguserbehaviortodefine normal profiles. _{[19, 1, 12, 25,} 5]. The criteria

usedin _{characterizing}normal user_activity includedtypical requests made_my_users,

[image:19.561.66.507.301.575.2]

(20)

programs and applications_{run, the}_timingof_{their actions,} amount oftimespent_using

system resources and amount ofsystemresources used. However thedifficulties inthis

approach were soon evident asthe_complexity of_{applications,}the sizeofnetworks and

number of usersbeganto grow. 'Normaluserbehavior' includedawide range of actions

and was never stable. The dynamic natureof_{evolving legitimate}userbehaviormadethe

learning

processharder.

The system call approach to _{anomaly intrusion detection} was first proposed

by

Forrest

[2,3]

andhasreceivedconsiderable interest from theresearch community. A combination

of some attributes of aprocess or aprogram andthesystem calls that_theygenerate, can

beused to characterize normalbehavior. System call information of a process can reveal

a wide range offurther information about the statusoftheprocess inthe kernel (e.g. its

interactions with other system resources like _{the network, the} file _{system, the} daemons

running inthebackground, thedevicesthatare_mounted, andits memoryrequests).

Forrest _{[2, 3]} prototyped an intrusion detection system that used traces sequences of

system calls to

identify

normal program behavior. In their approach, the intrusion detectionsystem was madeto_{synthetically learn}about normal programbehavior.

During

monitoring ifadifferentsequence ofsystem calls wasobserved, an_anomalywasflagged.

Wepsi

[15]

implementedan intrusion detectionsystem based onthe system callapproach

but _they used variable lengthpatterns in creating _profiles, unlike Forrest's original work

[7].

They

also used audit trail patterns(certain specific events collected from

log

_files),

which representthebehaviorof a process on a more superficial level than actual system

calls. Liao and Vemuri

[17]

take a _{slightly different} approach to _using system calls. Insteadof

keeping

trackof sequences of_calls, _theymonitorthe

frequency

of system calls

that are issued

by

aprogram.

By

doing

_{this, they} avoid

having

to treat _every system call

individually

and reducetheoverhead involvedin analyzingand_{storing every} system call.

Asystemcall isconsideredas aninstanceof a

'text'

andthewhole set of calls issuedas a

'document'. Proven

text-processing techniques can be applied to _classifying these

'documents'. Priorto_this,Lee

[16]

tried to improveontheresults obtained

by

Forrest

[6]

(21)

sequences of system calltraces.

Frequency

basedmethods were also explored

by

Helman

and Bhangoo [11]. Instead of

keeping

track of frequencies of_every system call, they recordedfrequenciesof sequencesofsystem calls.

Kosoresow and Hofmeyer

[15]

used system call trace execution in a different manner.

They

proposed to use finite state automata to model the sequences ofsystem calls. No

specific method or algorithm was suggested

by

them for the creation of such automata.

Typically

the program source code was used toderive a formal specification ofprogram

behavior, which in turn was used as input to construct the automata. Sekar et al

[21]

improvedon this _methodology and came_up with afinite state automatabased automatic

learning

processthatusedsystemcallandprogram counterinformationto modelthe state

ofthe automata.

Other metadata about program _behavior, other than system call characteristics has also been used to define normal or 'self. Ko et al

[14]

used formal program specification

languageto create a rulebase. _Theylookedat thesource code of a programto

identify

all its legal actionsand usedthis to createdefinitions ofnormal.

Clearlyalot of efforthas been directed to the use ofsystem call analysis inthe area of anomaly detection. It has been established _conclusively that system call analysis can reveal the details about program behavior. The current trend seems to be towards _taking the information gathered from this analysis and _{using it} to _easily and _efficiently create thresholdsorboundariesthatcan distinguish betweenanomalous andlegitimate_evolving

user behavior (i.e. to reduce false positives). There also is some interest in

borrowing

concepts from neural networks and artificial _{intelligence,} such as unsupervised

learning

techniques, to enable the detection system to learn about 'self on its _own, unlike the

prevalentrulebased synthetic

learning

mechanisms. Another area ofintrusion detection

being

explored is the problem of response.

Typically

the response ofintrusiondetection

systems has been to kill processes, kill remote _logins, suspend user accounts and in extreme instances isolate hosts altogether from the network. The drawbacks of these

responsesarethe overhead involvedand expensesincurred incase offalsepositives. The

(22)

pH IDS handlesthis problem

by delaying

suspicious processes and Somayaji [23] were

(23)

Chapter 3

Features andVulnerabilities of pHIDS

3.1 DesignofpHIDS

3.1.1 Process Homeostasisanddefinition of'selP

Homeostasis refers to the _ability ofbiological entities and _{especially humans} to

identify

'self. Humans use this _property to maintain a stable internal environment.

They

can

detect when conditions like

body

_temperature, amount and distribution of

body

fluids,

state ofthe immune system have _altered, respond to such changes and reconstruct the

original balancedstate.The immunesystemcan detectthepresence offoreign bodies like viruses, bacteria and other external organisms and take corrective action to fight these

pathogens. This is possible because the human immune system can _correctly

identify

itself, i.e. it is 'self-aware'. It is almost as if the immune system maintains a mammoth database

describing

the features and properties of varioustypes of cells and

body

fluids

and _continuouslymonitors forentities which show variations fromthisdatabase.

The motivation forpH (Process _Homeostasis) IDS comes from this_ability ofhumans to define self and

identify

variations from self. This concept corresponds _nicely to the

principle behind anomaly intmsion detection.

Maintaining

computer or system _security can be considered analogous to the task of_maintaining normal internal environment in humans, although human physiological systems are far more complex than computer

systems. Just asthe presenceof a foreign

body

or a viral infection in_humans, _may result in abnormal

body

temperatures or some other deviation from normal _{characteristics,}

computer systems that are compromised do

display

signs of attacks and intrusions. The

challenge for anomaly intrusion detection systems is to

intelligently

generate definitions

of 'self and spot signs of deviation from these 'normal' definitions at the earliest

momentpossible,without_consuming significant resources.

(24)

Forrest

[7]

were the first to propose the idea of_using system call information to create

definition of selffor processes.

They

proposed that inthe caseof computer systems and

applications, 'self can be expressed on a per-process _basis,

by

_using the system call

history

ofthe program which the process is executing. Various attributes ofthe system

calls a process executes can be used to collect information about whetherthe process is

behaving

abnormally or not. The _timing of the system _{calls, the} arguments passed to

system _{calls, the} program counter or instruction pointer _values, separation between

certain specific system calls that is_possibly measured as the

hamming

distance between

these calls, the

frequency

of certain _{calls, the} instructions executed between system calls

can all be possibly usedto learn about a _normally

behaving

process. It is also feasible to

use a combination ofthese _{indicators. The only} constraint in using these parameters is

that the detection system shouldnotinterfere with thenormal execution of programs and

should_carryoutits analysis and_monitoringtransparently.

Forrest et al _[7] looked at short sequences of system calls to generate profiles of normal

program behavior. This simple _{technique surprisingly,} worksjust as well as _any ofthe

other above mentioned_{possibilities,} as proved

by

Warrenderet al [28]. Inthis_technique,

signatures of programs are created _by, _running them in a secure environment and

recording the traces of system callsthatare executed. Thusthe entire set of system calls

executed

by

a_process, are broken down into sequences oflength_six, whichare stored in

the normal database. (The length six was chosen

by

the authors after experimental

analysis and _they suggest in general, lengths between six and fifteen forthe _sequences.)

One of the drawbacks ofthis approach was that there was _no, theoretical reasons or

rationalegiven forthechoice ofthisvalue.

Considerthe

following

snapshotof system calls:

openQ, mmapQ, read(), socket, mmapQ, execveO, open(), readQ, closeQ.brkO

Withthistrace, openQ

being

thefirstsystemcall and

brk()

themost recent system call

(25)

Seql

Seq2

Seq3

Seq4

Seq5

Seq6

openQ, mmapO, readQ, socketO, mmapO

mmapQ, readQ, socketQ, mmapO,execveQ

read(), socketQ, mmapQ, execveQ, openQ

socketQ, mmapQ, execveQ, openQ, readQ

mmapQ, execveQ, openQ, readQ, closeQ

execveQ, openQ, readQ, closeQ, brkO

Fig

3.1 Sequencesof system callswith window size offive.

These sequences would be recorded in the database as part ofthe normal profile for this

particular program. Profiles usually consist of thousands of such short sequences. Interestingly, the size of a profileis notproportional to the size ofthe program. _So, there

might be a _program, which is smaller in size but generates more sequences than some

otherlarger programbecause the sequences generated are more relatedto the _complexity

of the different execution paths within the program. If

during

the _monitoring phase the

following

patternwas_observed,

socketQ, mmapQ, execveQ, openQ, writeQ

Thiswould generate mismatches withthepatterns storedinthenormal database. Thusthe

anomalously

behaving

process, which_{may have loaded}another program

(by

the_execveQ

call)and opened afileand writtentoit (insteadof_{reading from}

it)

willbe detected.

Theuseof systemcall sequencesfor generatingnormal signatures can alsobejustified

by

the fact that _security violations are

likely

to produce abnormal system call invocations. System calls are the _only means

by

which a program _{operating in} user space can enter

kernel space and make use ofthe services provided

by

thekernel. It is _unlikely that an

attack script orpiece of malicious code can operate without_entering kernel space. Once

operationofaprocessshifts into kernel _mode, it haspotentialto cause damageto critical

system files and resources. System calls act as the

boundary

between user space and

kernel space and are thus a good choice to watch interactions of a process with the

(26)

underlyingkernel. Foranintrusiontosucceedwithout_attempting _anysystem_call, itmust

operate _entirely in user space. It is _very _difficult, ifnot impossible to achieve_this, andit

is

likely

that such an attack is based on some _glaring flaw or _programming error in the

operating system itself. One such instance ofthis found in literature [27], involved older

versions of_Solaris,where itwas possible to assume root privileges _simply

by

_callingthe

divide-by-zero interrupthandler. Suchexamples are

fortunately

not_verycommon.

3.1.2 System calltraces andpHIDS

Somayaji and Forrest

[23]

developed a _working prototype of the system call based anomaly detectionsystem, thepH(Process_Homeostasis) IDS. Thoughtheprototype was

derived from the work of Forrest et al

[7]

as explained _above, a _{slightly different}

approach was taken

by

Somayaji

[22]

in _storing system call sequences in the actual

implementation ofpH. In

fact,

pH does not store call sequences. It uses

Took-ahead'

pans ofsystem calls. These pairs are obtained

by

_processing traces of system calls in a

'look-ahead'

window at_anygiven point oftime. The current system call is recorded and

the pairs it forms with 'n-1

'

previously seen system calls are _recorded, where V is the

size ofthe look-ahead window. For _example, if a window size of5 is _chosen, with 1

being

the oldest systemcall seen and5

being

the latest or current system_{call, the}

look-ahead pairsgenerated

by

thissequencewillbe (5,1), (5,2),

(5,3)

and(5,4).

Essentially

the

same information is stored and used to create _profiles, however it is represented in a

differentmanner.

Going

back to the above _example, consider the point in time when _only the first three

system calls ofthat sequencehave been observed and size ofthe look-aheadwindowis

five.

open _mmap read socket _mmap execve open read close brk

t

Current System Call

Contentsof look-aheadwindowof size 5

(27)

This state generatesthe

following

look-aheadpairs : (read, mmap

)

, (read, *

, open)

As the look-ahead window moves forward and new system calls are encountered the

number of look- ahead pairs increases. The above sequence would generate the

following

look- ahead pairs.

(n, n-1) (n, *, n-2) (n,*,*, n-3) (n,*,V,n4)

(mmap,open) (read, *,open) (socket,*, *, open) (mmap,*,*,*,open)

(read, mmap) (socket, *,mmap) (mmap, *, *,mmap) (execve,*,

*,*,mmap)

(socket, read) (mmap,

*,read)

(execve, *, *, read) (open,*,*,*,read)

(mmap, socket) (execve, *,socket) (open, *, *, socket) (read,*,*,*,socket)

(execve, mmap) (open, *, mmap) (read, *, *, mmap) (close,

*,*,*,mmap)

(open, execve) (read, *, execve) (close, *, *,execve) (brk,*,*,*,execve)

(read, open) (close, *,open) (brk,*, *,open)

(close, read) (brk, *,read)

(brk, close)

In the actual implementation of_pH, alook-ahead window of size nine was _used, which

generateseight look-aheadpairs pertrace. To compare the look-aheadpair approach and

the original sequence method, let us considerthe optimal or worst case _scenario,where a

process ora program generates all thepossible permutations and combinations of system

call sequences. Let us denote the total number of system calls as

''

and the size of our

look-ahead windowandthus the lengthof our sequencesas '/w'. Look-aheadpairs can be

stored in n x nbit arrays, thus we would require m-1 bit arrays ofsize n x n. In Linux

kernel version 2.4.20 , 255 system calls have been defined in Thus the

look-aheadpairs methodwouldrequire 8 * 256 * 256bits forstoragethatisequivalentto

64KBperprogram. Onthe otherhand, sequencemethod, would requirem

"

(in_{this case,} ₉₂₅₆₎bitsthatis ahugenumber.

*

Completepathislinux-2.4.20/include/asm/unistd.honRed Hat Linuxdistribution,with sourcetreerooted atlinux-2.4.20

(28)

The choice of nine as the look-ahead window size also serves another purpose. Since a

look-ahead window oflength _nine, generates a maximum ofeight look-aheadpairs, byte

arrays with 256 *

256 locations or elements can be used to store the profiles of_every

program. The presence or absence of a pair can then be represented as a bit map.

Considerthe

following

_example,

Sys Call 1 SysCall 2 Sys Call 3 SysCall 4

Sys Call 1

Sys Call 2

SysCall3

Fig

3.2 Contentsofbyte array storing system calllook-ahead pairs.

The cell orbyte corresponding to Sys Call 2 and Sys Call _3, consists of8 bits which are

eitherset or _reset,

depending

on thepresence or absence of a pair _containing_{these two}

system calls. Thus ifthe fourth bit ofthat location is set, it denotes that there is some

sequencein which Sys Call 2 and Sys Call 3 occur with a separation of4 system calls.

Similarly

ofthe second bit is reset, it indicatesthat Sys Call 2 and Sys Call 3 have never [image:28.561.67.428.232.464.2]

(29)

3.2

Working

of pH IDS

3.2.1 Implementation

The pH IDS is implemented as a patch to the Linux kernel. The authors ofpH had implementediton aLinux_{box running Debian} andwith2.4.20 kernel which was current

atthat point oftime. It monitors andintercepts the invocationof each system call

by

all running or active_processes, performs some

bookkeeping

operations, checks whether the invocation ofthecall ispermissible andif it is found to_{be satisfactory it}passes control to

theinterrupt handler.

System calls are treatedas software interrupts since _they pass control from the _executing

user program to the system call handler routines in the kernel. System calls are the interface between the

library

functions that are available to the user and their kernel

counterparts, which _carry outthe actual work. The arguments ofthe

library

function are put in predefined registers and/or are located on _{the stack, the} system call number is placed in the EAXregister and an interrupt is triggered. INT 0x80 is reserved in Linux forsystemcalls. The interrupthandlerthenuses the system callnumbertoindex into an

address table defined in entry.S that holds the address ofthe function that _actually does work forthe system call. The return value from the system call is then placed as a rule in the EAXregister and made available to original

library

function. This mechanism is described inadiagram below:

Thismechanismisspecificto theIntelarchitecture andLinux kernel. Other operatingsystemsbasedon a differentplatform,may haveadifferent implementation.

**

Completelocation islinux-2.4.20/arch/i386/kernel/entry.S,on aRed Hat Linux_{distribution,}with source treerooted atlinux-2.4.20

(30)

Program runningin

userspace

#include<stdio.h> Mnclude<stdlib.h>

int mainQ

{

intx:

x=

getchar(x);

while(x!=

EOF)

{

printf(%c, x);

getchar(x);

}

printf("\n_"J; return0;

J

Libraryfunctions

Convertedto system calls

read(0,..,..)

write(l,..,..)

Kernelspace,not accessible

touser spacefunctions

macro

do_syscall₍₎

ret_from_syscall macro

[image:30.561.74.455.153.451.2]

Passreturn valuebackto_libraryfunction.

Fig. 3.3 Sequenceofevents insystem call execution.

pH alters this sequence of events. In entry. S just before the call to the sys_call

handler is made, a pH function is called. This function checks if the look-ahead pairs

generated

by

the current system call and_previouslyseen eight system calls are present in

the normal database. Ifthepairs have _previously been _{seen, the} execution ofthe system

call continues normally. _However, if one or more anomalous pairs are

detected,

the

execution of this system call is suspended and the process delayed. This

delay

is

(31)

CalltopH

-training

pH

-testing

-delay

SystemCallInvocation

System Call

Dispatcher

function

Process I

UserCodeandData

Anomalous /

Call ?

/

LegalCall

1

SyscallImplementation

retjromjyscall

Process2

Process 'n'

Scheduler

Pool of User Pro

for CPU.

cesseswaitir_g Kernel Space

Fig

3.4 Architecturaloverviewof pH IDS.Adaptedfrom _[22]

3.2.2

Learning

phase

In the

learning

phase pH observes the traces of system calls that are generated

by

a

program runin a secure environment andbuildsnormalprofiles as describedabove. This

database ofnormal profiles is known asthe _'training data set'

As new look-aheadpairs

arediscovered

during

the

learning

phase, theyare includedinthe _training data set. Once

enough time has passed without the addition of_any new _{pairs, the} profile is set to be

ready for monitoring. pH uses a set of conditions to determine if it has seen enough

normal program behavior. After these conditions have been _satisfied, it can begin the

monitoringphase andlookforanomalousbehavior.

The totalnumber of systemcalls executed

by

allprocesses _running a common programis

knownas the ctrain_counf Thenumber of system calls seen sincethe lastadditiontothe

[image:31.561.65.518.86.381.2]

(32)

profile is known as

'

last_mod_counf. pH checks the ratio

train_count/ (train_count

-last_mod_count) . If this ratio is greater

than

four,

the profile is _'frozen',

indicating

that enough system calls have been seen

without _any addition of newlook-ahead pairs to the profile. Ifnewlook-ahead pairs are

found while the profile is _frozen, the profile is

'thawed'

and last_mod_count is set to

zero. For the profile to be frozen again, pH waits for the above ratio tobe greater than

four. The value offourwas _arbitrarily chosen and canbe altered

by

the user. Notethat if

there are two ormore processes ofthe sameprogram _active, then all ofthese processes

contributeto theprofile pairs.

Theother conditionthat pH checks beforeitcanbeginmonitoring, istosee iftheprofile

stays frozen for a certain length oftime. Ifthe profile stays frozen for a period oftwo

weeks, then itmarks theprofile as

'normal'

and pHbegins monitoring forthe program.

Both ofthesechecks are _necessaryto prevent a profile from

being

_prematurely classified

asnormal. Considerthe exampleof aprogramthatisnot

frequently

_used, _say once_every

week. This program will _satisfy the above _ratio, but since it has not been invoked too

often, its profile is still incomplete. There are certain execution paths that will generate

new look-ahead pairs and these have not been included in the profile. Ifthis profile is

marked as _normal, without _{waiting for} some periodoftime after it has been marked as

frozen, many falsepositives_{may be} generated

during

the _monitoringphase.

On the other hand consider some complex program that too is not run frequently. A

program like Mozilla Web Browser displays significant variation in execution patterns

and generates a large number of look-aheadpairs. Ifwe _only wait for a period oftwo

weeks, and do not check for the ratio _{condition, the} profile _may get classified as

(33)

3.2.3

Monitoring

phase

Once the profile has beenmarked as _normal, pH can shift its operationto the_monitoring

phase. It does this

by

_copying the _training data set to the _testing data set and uses the testing data set for_monitoring purposes. Ifa system call generates alook-ahead pairthat is not present in the _testing data set an _{anomaly is} recorded. For every process, pH

records the status oflast 128 system calls seen. Eachprocess is associatedwith a 128 bit

array, the contents ofwhich denote which ofthe last 128 system calls were anomalous. This value is called the

'locality

frame count'

and is used to

delay

the process

by

2iocaiity_frame_count * _{delay_f}

actor . Delayjactor is a pH variable, which can be manipulated

by

the user

(e.g

_{setting delayjactor}_{to zero,} _effectively turns down pH's responses). _Thus, even ifthecurrent system call is not _anomalous,its executioncouldbe delayed

by

the value computed _above, if any ofthe last 128 system calls have been

marked as anomalous.

The anomalies generated

by

each profile are also stored in the variable

i

anomaly

Consider the example where two instances of vi editor are running, first of which has

generated atotal of 125 anomalies and the other 55 anomalies.

Assuming

these were the

only instancesof vi that were _{executed, the anomaly} value for the entire profile of

vi is set to 180. _Also, suppose the firstvi process has generated 20 anomalous calls in its

last 128 calls, while 40 anomalies have been associated with the last 128 calls ofthe

secondvi process. Thus the two processes will be delayed

by

different amounts and are independent ofthe _anomaly value ofthe whole profile, (the first process with a

locality

of _20, will be delayed

by

@20 * delay_factor) _seconds,

while the second process with a

locality

of 40 will be delayed

by

(240

* _{delay_f}

actor) seconds.)

Once monitoring starts, the testing data set is not updated.

Theoretically,

no more new look-aheadpairs should bespotted but practically, there always exist a few pairs of calls

that were not encountered

during

the

learning

phase. Suchnew look-aheadpairs observed

during

monitoring are added to the _training data set. These new anomalous pairs could

either be attributed to _{intrusive activity} or _simply, legal user actions which were not

(34)

spotted

during

the

learning

phase. pH assumes thatsuch first few occasional anomalous

calls are more

likely

to be the result of unseen user_activity and includesthem in_training

data set. These are then included in the normal profile ofthe program, when the

trainingdatasetis periodicallycopiedinto the _testingdata set.

3.2.4

Manipulating

pH

It is possible to control and dictate the response and actions of pH through three

commands that are apart of a _utilityprovided withthe source code ofpH. It is possible

to 'reset'

pH on aper profilebasis. Thisclearsthe_training and_testing datasets.

In spite of _spending enough time in the

learning

_phase, it could be possible that our

testing data set is incomplete and does not represent the entire range ofbehavior of a

program. Ifthisis thecase, thena process_referringthisprofile could continueto generate

a large number of_anomalies, which _{in reality} are_{merely false} positives. To

identify

this

event, every profile is associated with an '_anomalyJimiV which specifiesthe maximum

number ofanomalies that it can generate. Ifthe _anomaly ofthe profile exceeds its

anomalyJimitvalue,it is

likely

thatsuch aprofileis incompleteand we mustdiscontinue

to use it. This is done

by

the 'tolerize' command, which tellspH to _{stop monitoring} and

beginthe

learning

processagainfortheprofile inquestion.

Since the new or anomalous pairs that are discovered

during

the _monitoring _phase, are

added to the _trainingdata set and _{eventually into}the normal profile for_{the program,} we

could _{accidentally learn} about an attack. pH tackles thisproblem

by limiting

the number

ofanomalies that are added to the _training data set in the _monitoring set. This limit is called

'

tolerize and is associated with _every process. If a particular process

generates more number of anomalies than _{this value, the} 'sensitize' command can be used to tell pH that it should forget whatever new behavior it has learnt about this process.This is doneby,_clearing the_trainingdataset.

*

Thisvalue,knownastolerize_limit,issetto12.Ifmorethan 12new pairs areseen,pHconcludesthatit

(35)

3.3

Mimicry

Attacks

An attack or intrusion can be separated intotwo phases, an initial penetration phase and

an exploitation phase.

During

the penetration phase _generally some _{vulnerability} ofthe

system is used to gain access to the system. This first step often consists of_gainingroot

privileges. The actual damage is done in the exploitation _phase, when malicious code is

executed to change the state ofthe system. This often also involves

installing

backdoors

toensure the intruders can returnwhenever_they wantin future.

In mimicry attacks, intruders try to hide their presence

during

the exploitation phase.

They

tryto escape detection

by

_removingalltraces or their_activityor_camouflagingtheir

presence.

They

'mimic' the behavior of a normal user. _Mimicry attacks are possible

because intruders can adapttheir attackstoeludethe detectionalgorithm oftheIDS. It is

assumed that the intruders are aware ofthe internal operation ofthe _IDS, are familiar

with its detectionprinciple and canduplicate its behavioron otherhosts.

Mimicry

or evasion attacks on network intrusion detection systems have received some

attention in _{the past,} although host based systems have notbeen looked at in great detail.

Chung

et al _[4] showed that

by

'parallelizing' a sequential attack _script, it is possible to

distract the IDS. _They showed that

by

_using program parallelization techniques like

analysis ofdata _dependency, control

dependency

and data _flow, it is possible to break a

serial attack script into a concurrent or distributed _script, which can be run

by

_using

parallel threads of execution. Network Intrusion Detection Systems often _employ a

'network

monitor'

observes the traffic of packets into and out of the network. These

network monitors have the requirement that _they do not interfere or slow down

communications

by

a significant extent. Hence _they are designed tobe light weight and

because ofthisrequirement_they _maynotbeaware ofdetailsoftheprotocol semantics on

thetarget systems.

Handley

etal

[8]

suggeststhat such monitors _maynotbe ableto detect

the ambiguities in network traffic and _{may be} susceptible to the _activity of a clever

intruder, (e.g. some monitors _maynotbeprogrammed to reassemble_fragmented _packets,

or _may not be aware ofthe _underlying _topology ofthe network). Ptacek and Newsham

(36)

[24]

have shown that the basic principle behind network intrusion detection, _namely

passive protocol _{analysis, is} not sufficient to _stop attacks and it is possible to dupe the IDS.

3.4

Susceptibility

of pHIDS

Wagner et al

[27]

have shown the failure ofpH IDS in stopping mimicry or evasion

attacks. In this _category of _{attacks, the} intruders _try to hide their presence

by

not generating abnormal tracesofsystemcalls. Sincethe IDS doesnot see _anythingwhich it

has not seen_before, it fails to trigger an alarm. The authors of

[27]

have suggested some

ways in which these attacks canbe carried out andthis throws some light and also calls

formore discussion ontwo _issues,

firstly

howto patchthepH IDS tomitigate thesekind of attacks and_secondlythepractical riskinvolved inthesekindofattacks.

The above worklists a few ways in which evasion attacks can be carried out. The threat

and thepotential ofdamage from thiskind ofattacks is explained and is equal _to, ifnot

greater than that _{from any} other _category of intrusions and attacks. What must be ascertained _is, how much resources are needed

by

the intruder in terms of_time, _effort, system capabilities and most _importantly, expertise to _{successfully carry} out thiskind of attacks. _Obviously it is difficult to _quantify these requirements, but a scientific

approximation canbe obtained, which should be able to _indicate, how critical is theneed

to

develop

acountermeasure_{against, this}form ofintrusion.

Some approaches to _carrying out _mimicry attacks, which are put forth

by

Wagner et al

[27], seem _practically impossible (though _{theoretically}

feasible)

as this has been

acknowledged

by

the authors themselves. The authors claim that it might be possible for

intruders to acquire root privileges

by

_exploiting vulnerabilities in the OS and then

inflict damage without

issuing

_any system calls. However the extent of this damage

wouldberestricted, ifnot_trivial, as_any significantdamageto the system orenvironment

requires _{circumventing the} normal permission-level controls on its resources and this

(37)

approach suggests that the intruder could wait till a certain point of _time, when his

malicious code sequence would pass as normal. The pH IDS keeps

learning

more about

the programs and processes that it monitors and ofien it observes new or abnormal

activitythatin factcould_very wellbe simply evolving legitimateuserbehavior. It thenis

upto the system administrationtodetermine whetherthisnew_{activity is indeed intrusive}

or legal _activity that needs to be added to its definition ofnormal. _Thus, though it is

possiblethat the definition of normal might change overaperiodoftime andthischange

is visible to the _intruder, it seems _{very unlikely} that it will ever accommodate serious

harmful intrusive _activity _(e.g., _{it is unlikely} that sendmaiV program would _repeatedly

access or _modify system _files, an

ftp

server or client program would escalate the

permissions onfiles).

Thus it seems

fairly

implausible that these two approaches would be successful in

deceiving

the IDS and _{avoiding detection. In} the_unlikely event that somehow, either of

the two above approaches does manage to produce a successful intrusion and evade

detection, pH IDS would be able to _{do very little} to thwart it , unless its design or

detectionalgorithmisalteredradically.

The other two suggested ways of _carrying out evasion attacks are _relatively more

plausibleand

interesting

with respectto

developing

a solution againstthis. The authors of

[27]

havepointedoutthat intruderscould substitutethe arguments passedtosystem _calls,

by

some illegalparameters andachieve thedesiredresults. Thus iftheintruderwantedto

modifythe

/etc/syslog

. conf file, after gaining root access, instead of_{waiting for}

the system call

open('

/etc/syslog.conf, _write) to appear inthe normal execution

stream, he could use _any open call and replace its parameter