Anomaly based Network Intrusion Detection System


Thesis Submitted in Partial fulfillment of the requirements for the Degree Of

Master of Technology

In

Computer Science and Engineering

By

Dinakara K (06CS6026)

Under the supervision of

Prof. Jayanta Mukhopadhyay

Prof. S.K. Ghosh

Computer Science and Engineering

Indian Institute of Technology

Kharagpur -721302, India


Computer Science and Engineering

INDIAN INSTITUTE OF TECHNOLOGY

KHARAGPUR

Certificate

This is to certify that the thesis entitled “Anomaly based Network Intrusion Detection System”, which is being submitted to the Indian Institute of Technology, Kharagpur, for the award of the degree of Master of Technology in Computer Science and Engineering by Dinakara K., Roll No. 06CS6026, has been carried out by him under our guidance. This thesis, in our opinion, is worthy of consideration for the award of the degree of Master of Technology in accordance with the regulations of this institute.

(Dr. Jayanta Mukhopadhyay)

Professor,

Dept. Computer Science and Engineering

Indian Institute of Technology

(Dr. S. K. Ghosh)

Asst. Professor,

School of Information Technology

Indian Institute of Technology


ACKNOWLEDGEMENTS

Many people deserve to be acknowledged for their contribution to this work, and even more need to be mentioned for their enthusiasm and support over the past year. This page is for them all.

I want to start by thanking my project guides, Dr. Jayanta Mukhopadhyay and Dr. S. K. Ghosh, for their invaluable guidance, incessant inspiration, prolific encouragement and for being there whenever I needed them the most. Their untiring help and constructive suggestions during the course of the project helped me learn a great deal; without them it would have been difficult to complete the thesis work.

I express my sincere thanks to Dr. D. K Nanda, Chief Systems Manager, Computer and Informatics Centre, IIT Kharagpur, for providing the facility for sniffing the IIT network.

I am deeply indebted to Dr. G Athithan, Head, Intelligence Systems Division, Centre for Artificial Intelligence and Robotics, Bangalore, for his precious guidance and support given for my thesis work.

Sincere thanks to my friends, Biswajit Paul, Girish Gokuldasan and Dinesh Singh Kutiyal for their support and constructive suggestions throughout this project as well as the whole course.

I would love to dedicate this thesis to my parents whose cooperation, support, affection and well wishes enabled me to complete this endeavour successfully.

Above all I humbly acknowledge the grace and blessings of thy supreme power that capacitates me to fulfill this well nurtured dream.

Dinakara K

ACRONYMS AND ABBREVIATIONS

ACL : Access Control List

ARP : Address Resolution Protocol

BASE : Basic Analysis and Security Engine

DDOS : Distributed Denial of Service

DMZ : Demilitarized Zone

DNS : Domain Name System

DOS : Denial of Service

HTTP : Hyper Text Transfer Protocol

ICMP : Internet Control Message Protocol

IP : Internet Protocol

NIC : Network Interface Card

NIDS : Network Intrusion Detection System

PCRE : Perl Compatible Regular Expressions

RPC : Remote Procedure Call

SPAN : Switched Port Analyzer

TAP : Test Access Point

TCP : Transmission Control Protocol

TTL : Time to Live

LIST OF FIGURES

FIGURE 1: NETWORK IDS PLACED BEFORE THE GATEWAY FIREWALL

FIGURE 2: NETWORK IDS IN THE DMZ

FIGURE 3: NETWORK IDS WITHIN THE PRIVATE NETWORK

FIGURE 4: NETWORK IDS SNIFFING THE NETWORK IN A HUB ENVIRONMENT

FIGURE 5: NETWORK IDS SNIFFING THE NETWORK USING TAP DEVICE

FIGURE 6: DEPLOYMENT SCENARIO OF NIDS WITH SENSORS IN STRATEGIC POINTS

FIGURE 7: SNIFFED PACKET (SNORT -V)

FIGURE 8: SNIFFED PACKET (SNORT -DEV)

FIGURE 9: ALERTS GENERATED IN INTRUSION DETECTION MODE

FIGURE 10: OVERALL SYSTEM ARCHITECTURE

FIGURE 11: NETWORK IDS SENSOR

FIGURE 12: NETWORK IDS PRE-PROCESSOR

FIGURE 13: ANOMALY DETECTION PRE-PROCESSOR

FIGURE 14: SCREENSHOT OF BASE CONSOLE SHOWING THE GENERATED ALERTS

FIGURE 15: BASE CONSOLE SHOWING THE ALERT STATISTICS

FIGURE 16: BASE CONSOLE SHOWING THE DETAILS OF SNIFFED PACKET

FIGURE 17: TIME SLOTS USED IN GENERATING THE NETWORK PROFILE

FIGURE 18: ALGORITHM FOR GENERATING THE PROFILE

FIGURE 19: ALGORITHM FOR DETECTION

FIGURE 20: FLOW CHART DEPICTING THE OVERALL WORKING OF ANOMALY DETECTION TECHNIQUE

FIGURE 21: NORMAL DISTRIBUTION CURVE WITH DIFFERENT CONFIDENCE INTERVALS

FIGURE 22: MULTIVARIATE GAUSSIAN DISTRIBUTION CURVE

FIGURE 23: TRAFFIC PATTERN IN THE COURSE OF A DAY (MONDAY)

FIGURE 24: TCP PACKET COUNT IN THE COURSE OF A DAY (MONDAY)

FIGURE 25: TCP STATISTICS IN THE COURSE OF A DAY (MONDAY)

FIGURE 27: UDP STATISTICS IN THE COURSE OF A DAY (MONDAY)

FIGURE 28: ICMP PACKET COUNT IN THE COURSE OF A DAY (MONDAY)

FIGURE 29: ICMP PACKET COUNT IN THE COURSE OF A DAY (MONDAY)

FIGURE 30: NUMBER OF CONNECTIONS IN THE COURSE OF A DAY (MONDAY)

FIGURE 31: CONNECTION STATISTICS IN THE COURSE OF A DAY (MONDAY)

FIGURE 32: TRAFFIC STATISTICS IN THE COURSE OF A DAY (SATURDAY)

FIGURE 33: TRAFFIC STATISTICS IN THE COURSE OF A DAY (SUNDAY)

FIGURE 34: TRAFFIC STATISTICS IN THE COURSE OF A WEEK

FIGURE 36: INTRUSIVE TRAFFIC STATISTICS IN THE COURSE OF A DAY (MONDAY)

FIGURE 37: INTRUSIVE TRAFFIC STATISTICS IN THE COURSE OF A WEEK

FIGURE 38: AVERAGE TRAFFIC STATISTICS IN THE COURSE OF A DAY (MONDAY)

FIGURE 39: BASE CONSOLE DISPLAYING THE TRAFFIC STATISTICS BY PROTOCOL

FIGURE 40: BASE CONSOLE DISPLAYING THE ALERTS STATISTICS

LIST OF TABLES

TABLE 1: TYPICAL VALUES OBTAINED FOR THE NORMAL AND INTRUSIVE NETWORK TRAFFIC WITH HOTELLING’S AND BAYESIAN DISCRIMINATOR FUNCTIONS

TABLE 2: CHART SHOWING THE COMPARATIVE RESULTS OF THE EXPERIMENTS

CHAPTER 1

1.1. INTRODUCTION

The Internet is pushing organizations into an era of open and trusted communications. This openness brings its share of vulnerabilities and problems, such as financial losses, damage to reputation, loss of service availability and exposure of personal and customer data, pushing both enterprises and service providers to take steps to guard their valuable data from intruders, hackers and insiders. An Intrusion Detection System (IDS) has become a fundamental requirement for successful content networking.

An IDS provides two primary benefits: Visibility and Control [1]. It is the combination of these two benefits that makes it possible to create and enforce an enterprise security policy to make the private computer network secure. Visibility is the ability to see and understand the nature of the traffic on the network, while Control is the ability to affect network traffic, including access to the network or parts thereof. Visibility is paramount to decision making and makes it possible to create a security policy based on quantifiable, real world data. Control is key to enforcement and makes it possible to enforce compliance with the security policy.

1.2. BRIEF HISTORY OF IDS

The idea of detecting intrusions or system misuse by looking for some kind of malicious pattern in the network or user activity was initially conceived by James Anderson in his report titled “Computer Security Threat Monitoring and Surveillance” [2] to the US Air Force in the year 1980.

In the year 1984, the first prototype of Intrusion Detection System which monitors the user activities, named “Intrusion Detection Expert System” (IDES) was developed. In the year 1988, “Haystack” became the first IDS to use patterns and statistical analysis for detecting malicious activities, but it lacked the capabilities of real time analysis.


Meanwhile, there were other significant advances occurring at University of California Davis' Lawrence Livermore Laboratories. In the year 1989, they built an IDS called “Network System Monitor” (NSM) for analyzing the network traffic. This project was subsequently developed into an IDS named “Distributed Intrusion Detection System” (DIDS). “Stalker”, based on DIDS, became the first commercially available IDS and influenced the growth and trends of future IDS. In the mid 1990s, SAIC developed the “Computer Misuse Detection System” (CMDS), a host based IDS. The US Air Force’s Cryptographic Support Centre developed “Automated Security Incident Measurement” (ASIM), which addressed issues like scalability and portability.

The intrusion detection market began to gain in popularity and truly generate revenues around 1997. In that year, the security market leader, ISS, developed a network intrusion detection system called “Real Secure”. A year later, Cisco recognized the importance of network intrusion detection and purchased the Wheel Group, attaining a security solution they could provide to their customers. Similarly, the first visible host-based intrusion detection company, Centrax Corporation, emerged as a result of a merger of the development staff from Haystack Labs and the departure of the CMDS team from SAIC. From there, the commercial IDS world expanded its market-base and a roller coaster ride of start-up companies, mergers, and acquisitions ensued.

In the year 1998, Martin Roesch launched a lightweight open source Network IDS named “SNORT” [3], which has since then gained much popularity. In the year 1999, Okena Systems worked out the first Intrusion Prevention System (IPS) under the name “Storm Watch”. IPS are systems which not only detect intrusions but are also able to react to an alarming situation. These systems can co-operate with a firewall without any intermediary applications.

1.3. TYPES OF IDS

Network based IDS (NIDS):

Monitors and analyzes the individual packets passing around a network for detecting attacks or malicious activities happening in a network that are designed to be overlooked by a firewall’s simplistic filtering rules.

Host based IDS (HIDS):

Examines the activity on the individual computer or host on which the IDS is installed. The activities include login attempts, process schedules, system file integrity checking, system call tracing etc. Sometimes the two kinds of IDS are combined to form a Hybrid IDS.

Generally IDS has two components –

Central Administration (Management) Module:

Provides a centralized facility for managing and monitoring all the installations of the Intrusion Detection System, and hence a centralized way of analyzing and detecting intrusions. It has a complete view of the various activities and events occurring in different segments of the organizational network. Moreover, policy settings, actions to be triggered, patch/signature updates and fine tuning of sensors can be handled with this module.

IDS Sensors (Agents):

Analyses the network traffic and identifies attacks and security breaches, which take place by exploiting the technology of network implementation, reports the alerts to the Management module and performs the preset actions. IDS Agents are more autonomous in their functions as compared to the Sensors.

1.4. DETECTION TECHNIQUES

Various techniques are in place for intrusion detection which can be broadly classified as follows.


Signature/pattern based Detection:

In this technique, the sensors which are placed in different LAN segments filter and analyse network packets in real time and compare them against a database of known attack signatures. Attack signatures are known methods that intruders have employed in the past to penetrate a network. If the packet contents match an attack signature, the IDS can take appropriate countermeasures as enabled by the network security administrator. These countermeasures can take the form of a wide range of responses. They can include notifications through Simple Network Management Protocol (SNMP) traps, issuance of alerts to an administrator’s email or phone, shutting down the connection, shutting down the system under threat etc.

An advantage of misuse detection IDS is that it is not only useful to detect intrusions, but it will also detect intrusion attempts; a partial signature may indicate an intrusion attempt. Furthermore, the misuse detection IDS could detect port scans and other events that possibly precede an intrusion.
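As a minimal sketch of the matching step described above, the following Python fragment checks a packet payload against a small in-memory signature table. The table contents and the function name `match_signatures` are hypothetical and chosen only for illustration (the first pattern reuses the byte sequence "00 01 86 a5" from the sample rule discussed in Section 1.8); a production IDS relies on far more efficient multi-pattern matching.

```python
# Hypothetical signature table: alert name -> byte pattern (illustrative only).
SIGNATURES = {
    "mountd access": bytes.fromhex("000186a5"),
    "directory traversal": b"../..",
}


def match_signatures(payload, signatures=SIGNATURES):
    """Return the names of all signatures whose pattern occurs in the payload."""
    return [name for name, pattern in signatures.items() if pattern in payload]


# A payload carrying the first pattern triggers exactly one alert.
print(match_signatures(b"\x00\x00\x01\x86\xa5\x00"))
```

A payload that carries only part of a pattern yields no match in this sketch, which is why recognising partial signatures, as mentioned above, needs additional logic.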

Unauthorised Access Detection:

In unauthorised access detection, the IDS detects attempts at any access violation. It maintains an access control list (ACL) where access control policies for different users, based on IP addresses, are stored. User requests are verified against the ACL to check for any violations.
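The ACL check can be sketched as follows. This is only an illustration under stated assumptions: the ACL entries, the choice of (source network, destination port) as the policy key and the function name `is_permitted` are all hypothetical. It uses Python's standard `ipaddress` module.

```python
import ipaddress

# Hypothetical ACL: each entry permits a source network to reach one port.
ACL = [
    (ipaddress.ip_network("192.168.1.0/24"), 22),
    (ipaddress.ip_network("10.0.0.0/8"), 80),
]


def is_permitted(src_ip, dst_port, acl=ACL):
    """Return True if some ACL entry allows src_ip to reach dst_port."""
    addr = ipaddress.ip_address(src_ip)
    return any(addr in network and dst_port == port for network, port in acl)


# A request from outside the permitted network is reported as a violation.
for src, port in [("192.168.1.17", 22), ("192.168.2.17", 22)]:
    print(src, port, "ok" if is_permitted(src, port) else "VIOLATION")
```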

Behavioural Anomaly (Heuristic based) Detection:

In behavioural anomaly detection method, the IDS is trained to learn the normal behavioural pattern of traffic flow in the network over an appropriate period of time. Then it sets a baseline or normal state of the network’s traffic, protocols used and typical packet sizes and other relevant parameters of network traffic. The anomaly detector monitors different network segments to compare their state to the normal baselines and look for significant deviations.


Protocol Anomaly Detection:

With this technique, the anomaly detector alerts the administrator to traffic that does not conform to known protocol standards. As protocol anomaly detection analyzes network traffic for deviation from standards rather than searching for known exploits, it has the potential to serve as an early detector for undocumented exploits.
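One simple instance of such a conformance check is sketched below for the TCP flags field. The flag bit values are those of the TCP specification; the particular set of checks and the function name `tcp_flag_anomalies` are assumptions made for illustration and are far from a complete protocol validator.

```python
# TCP flag bits as laid out in the TCP header.
FIN, SYN, RST, PSH, ACK, URG = 0x01, 0x02, 0x04, 0x08, 0x10, 0x20


def tcp_flag_anomalies(flags):
    """Return descriptions of flag combinations that normal TCP never produces."""
    problems = []
    if (flags & SYN) and (flags & FIN):
        problems.append("SYN and FIN set together")
    if (flags & SYN) and (flags & RST):
        problems.append("SYN and RST set together")
    if (flags & 0x3F) == 0:
        problems.append("no flags set (null packet)")
    return problems


print(tcp_flag_anomalies(SYN | ACK))  # ordinary handshake reply: no findings
print(tcp_flag_anomalies(SYN | FIN))  # contradictory open/close request
```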

1.5. DEPLOYMENT SCENARIOS OF IDS

There exist three strategic locations where NIDS can be installed in the network for effective monitoring of the network, as depicted in the diagrams below.

Before the Gateway firewall:

At this point, the NIDS can keep track of all network events of interest, even those attacks which may subsequently fail. As it has to handle a large volume of traffic, the NIDS ought to be installed on a faster machine so that analysis is done in real time. It also has to be configured correctly so that the number of false alarms is reduced. Figure 1 shows such a configuration.

Figure 1: Network IDS placed before the Gateway Firewall

In the DMZ (De-Militarized Zone):

Placing the IDS within the DMZ enables it to monitor the traffic which has already been partly filtered by the gateway firewall, as depicted in figure 2. This reduces the burden on the IDS but also limits its visibility.

Figure 2: Network IDS in the DMZ

Inside the private corporative network:

The last location where the NIDS can be stationed is within the corporate network, as shown in figure 3. Such a location aims at monitoring the attacks emerging from the local networks and also those which pass through the firewall. As the number of attacks possible at this location is lower than in the preceding cases, the demands on the application are smaller, and the IDS generates fewer false alarms. The scope of visibility is limited to the corporate network, so the IDS will not be able to detect failed attacks as in the previous cases.

Figure 3: Network IDS within the private network

It is always advisable to install the NIDS on a system other than the firewall. Otherwise, an attacker who knows that the firewall and the IDS share a single computer can pump in malicious traffic to generate a flood of false alerts, consuming system resources and affecting the operation of the firewall.

1.6. SNIFFING THE NETWORK TRAFFIC WITH IDS

In order to monitor the network, the traffic in that segment of the network has to be made available to the Network IDS. There exist several ways to eavesdrop on the network packets without obstructing their normal flow across the network, as described below.

Sniffing the network packets in a Hub environment

Figure 4: Network IDS sniffing the network in a Hub environment

A network Hub is a physical layer device; hence, whenever data frames arrive, it simply broadcasts them to all other ports. Only the destination system processes the data while the other machines discard it. In such an environment, the IDS can be connected to one of the Hub ports with its NIC in promiscuous or general mode, which enables it to get all the network packets moving around the network. Such a configuration is depicted in figure 4.

Eavesdropping via port mirroring or SPAN (Switched Port ANalyser) port in a switched environment:

In a switched network, the packets from a source machine are forwarded only to the respective destination machine, as identified by its hardware (MAC) address, unlike in the case of a network connected via a Hub, where packets are broadcast to every other machine in the network. In such an environment, sniffing is made possible by a technique called Port Mirroring or Switched Port Analyzer, where the mirrored port gets a copy of the packets from all other ports. The machine with the IDS is connected to the mirrored port or SPAN port in promiscuous mode so that it can process all the packets irrespective of their destination. Because traffic from many ports is aggregated on a single SPAN port, packets may be dropped.

Sniffing the traffic using Network TAP (Test Access port):

Figure 5: Network IDS sniffing the network using TAP device

Network TAPs [4] are hardware devices having three interfaces: entry, exit and test port. The IDS is connected to the test port, where it can see the entire network traffic, as shown in figure 5. A TAP does not introduce any delay or affect the data movement in the network, and operates transparently as it does not possess an IP or hardware address.

Stealth mode operation

The Network IDS has to operate transparently to prevent intruders from targeting the IDS itself. So generally the IDS is configured to work in a special mode called “Stealth mode”. In this arrangement, the IDS sniffing interface is put in promiscuous mode without being assigned an IP address, so that it only listens to the packets flowing across the network and keeps its presence hidden from network users.

Usually the IDS has two network interfaces, one to monitor the network and the second one for administrative purposes, such as configuring the IDS, updating signatures, communicating with the IDS sensors/Manager, dispatching alerts etc. An attacker can easily detect the configuration and location of the IDS by analyzing these messages in the network. It is therefore possible to guard the IDS by encoding its messages or by creating a separate network for management, as shown in figure 6. The advantage of having a separate network between the IDS Manager and the IDS Sensors is not only to provide security but also to ensure “out of band” communication, meaning that no bandwidth of the existing network is utilized for its communication.

Figure 6: Deployment scenario of NIDS with sensors in strategic points

It is generally recommended to use IDS sensors inside and outside the firewall, or between each firewall in a multi-layered environment, and host based IDS on all critical or key hosts. The IDS Management Module and its sensors communicate via a zero bandwidth LAN segment in a transparent or stealth operation mode. This kind of arrangement gives the IDS a complete view of the organizational network and lets it detect even failed attack attempts, while reducing the chances of being compromised. Figure 6 depicts a complete deployment scenario of a Network IDS.

1.7. IDS RESPONSES AGAINST ATTACK

Whenever the IDS detects any intrusions or attacks, it reacts as per the preconfigured settings. The responses can range from mere alert notifications to blocking of the attacks, based on the severity. The appropriate reactions to the threats are a key issue for safety and efficacy. Generally the responses can be of three types [2]:

Active response:

An IDS by itself cannot block attacks; however, it can take actions that lead to the stopping of attacks. Examples of such actions are sending TCP reset packets to the machine(s) being targeted by the attack, or reconfiguring the router/firewall so as to block the malicious connection. In extreme cases, the IDS can even block all the network traffic to avoid potential damage to the firm.

Passive response:

Passive solutions deliver information on the current situation to the IDS administrator and leave the decision on appropriate steps to the administrator's discretion. Many commercial systems rely on this kind of reaction. Examples of such actions are simple alarm messages and notifications. Notifications can be sent by email, to a cellular phone or via SNMP messages.

Mixed response:

Mixed responses combine both active as well as the passive responses appropriately as per the needs of situation.

1.8. SNORT, AN OPEN SOURCE SIGNATURE BASED IDS

SNORT is a libpcap based lightweight network intrusion detection system, capable of performing real-time traffic analysis and packet logging on IP networks [5]. It can perform protocol analysis, content searching/matching and can be used to detect a variety of attacks and probes, such as buffer overflows, stealth port scans, CGI attacks, OS fingerprinting attempts, and much more. Snort uses a flexible rules language to describe traffic that it should collect or pass, as well as a detection engine that utilizes a modular plug-in architecture. Snort has a real-time alerting capability as well, with alerts being sent to syslog, a separate “alert” file or even a Windows computer via Samba.


The first version of SNORT was released in 1998 by Martin Roesch under the GPL license. The current version is 2.8. Snort has three primary modes of operation [3]:

Sniffer:

In this mode, SNORT simply eavesdrops on the packets and displays them like the tcpdump program. The flags used with SNORT determine how much detail is displayed. Figure 7 shows the minimal details of a packet captured by SNORT.

Figure 7: Sniffed Packet (snort -v)

Packet logger:

Whenever the SNORT user wants to record the packets captured by the IDS, SNORT has to be run in the Packet logger mode, specifying the directory name where the packets are to be logged. It logs packets either in tcpdump format (binary) or in decoded ASCII format. Figure 8 shows descriptions of packets sniffed by the SNORT program.


Intrusion Detection mode:

In this mode, SNORT will not record every packet that it sniffs but logs only those events which triggered its rules as shown in figure 9.

Figure 9: Alerts generated in intrusion detection mode

SNORT Rule structure:

SNORT rules are written in a simple rule language (rule options may also carry PCRE patterns) which is straightforward and quite powerful. These rules are editable as per the need. Generally the rule structure has two logical parts.

The rule header contains:

- The type of action SNORT has to take on matching of a rule (e.g. alert, log)

- Protocol (IP, ICMP, TCP, UDP)

- Sender IP address and port number

- Flow direction (incoming, outgoing or both)

- Receiver IP address and port number

The rule options contain:

Alert messages and information on which parts of the packet should be inspected to determine if the rule action should be taken.


The sample SNORT rule says that if the payload of a TCP packet contains the bytes “00 01 86 a5”, and the packet travels from any source address and any port number to the destination network 192.168.1.0/24 on port number 111, the alert message “mountd access” is generated.
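Written out in Snort's rule syntax, a rule matching this description would look like the `RULE` string below; it is reconstructed from the description, so the exact order of the options is an assumption. The helper `parse_rule` is a hypothetical illustration of the header/options split and is not part of Snort; it handles only this simple form of rule.

```python
RULE = ('alert tcp any any -> 192.168.1.0/24 111 '
        '(content:"|00 01 86 a5|"; msg:"mountd access";)')


def parse_rule(rule):
    """Split a simple Snort-style rule into header fields and an options dict."""
    header, _, options = rule.partition("(")
    action, protocol, src, src_port, direction, dst, dst_port = header.split()
    parsed_options = {}
    for item in options.rstrip(")").split(";"):
        item = item.strip()
        if item:
            key, _, value = item.partition(":")
            parsed_options[key.strip()] = value.strip().strip('"')
    return {
        "action": action, "protocol": protocol,
        "src": src, "src_port": src_port, "direction": direction,
        "dst": dst, "dst_port": dst_port,
        "options": parsed_options,
    }


parsed = parse_rule(RULE)
print(parsed["action"], parsed["protocol"], parsed["dst"], parsed["dst_port"])
print(parsed["options"])
```

The seven whitespace-separated header fields correspond to the action, protocol, sender address and port, direction, and receiver address and port listed above; everything inside the parentheses belongs to the rule options.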

1.9. RELATED WORK

Network intrusion detection systems like Snort [3] or Bro [11] typically use signature based detection, matching patterns in network traffic to the patterns of known attacks. This works well, but has the obvious disadvantage of being vulnerable to novel attacks. An alternative approach is anomaly detection, which models normal traffic and signals any deviation from this model as suspicious. The idea is based on work by Forrest et al. (1996), who found that most UNIX processes make highly predictable sequences of system calls in normal use.

Network anomaly detectors look for unusual traffic rather than unusual system calls. ADAM (Audit Data Analysis and Mining) [12] is an anomaly detector trained on both attack-free traffic and traffic with labelled attacks. It monitors port numbers, IP addresses and subnets, and TCP state. ADAM uses a naive Bayes classifier, which means that the probability that a packet belongs to some class (normal, known attack, or unknown) depends on the a-priori probability of the class and the combined probabilities of a large collection of rules, under the assumption that they are independent.

In the IDES/NIDES systems [9], [10], a statistical anomaly detection technique is used to represent the expected normal behaviour of a subject and the variance due to noise. The statistical anomaly detection technique overcomes the problems that rule-based anomaly detection has in handling noise and variance. However, the statistical technique in IDES/NIDES is a univariate technique that is applied to only one behaviour measure, whereas many intrusions involve multiple subjects and multiple actions that have an impact on multiple behaviour measures. Hence, a multivariate anomaly detection technique is needed for intrusion detection.

Matthew V. Mahoney and Philip K. Chan developed “Packet Header Anomaly Detection for Identifying Hostile Network Traffic” (PHAD) [16], [17], which learns the normal ranges of values for each packet header field at the data link (Ethernet), network (IP), and transport/control layers (TCP, UDP, ICMP). PHAD detects some of the attacks in the DARPA data set that involve exploits at the transport layer and below.

The paper “Detecting Novel Network Intrusions Using Bayes Estimators” [18], authored by Daniel Barbara et al., suggests a method called pseudo-Bayes estimators as a means to estimate the prior and posterior probabilities of new attacks. A naive Bayes classifier is then used to classify the instances into normal instances, known attacks and new attacks.

1.10. MOTIVATION AND OBJECTIVE

Despite the fact that intrusion detection systems have been commercially developed and used for more than a decade, there still exist many issues around IDS. Some of the shortcomings of current IDS which handicap their effectiveness are discussed below.

a) Only the known attacks are detected in signature based techniques which simply means no protection is offered against novel attacks or new variants of existing intrusions. A small variation in the attack pattern can invalidate a signature. By the time the new signatures/patches come up the intrusions might have done the intended damages.

b) How well a signature captures an attack in its string is again a matter of concern, and quite a few signatures are poorly written. Moreover, the actual attack pattern may stretch across multiple packets, easily evading the detection system.

c) An exhaustive signature based search has very high processing and memory needs, and in a real time scenario there is a considerable likelihood of missing genuine attacks. Also, there is the problem of ever increasing attack signature databases.

d) Attackers can also frame malicious packets that are likely to match many attack signatures in order to keep the detection engine busy. In the course of this, some packets with real attack patterns will find their way into the internal network, thus evading the detection system.

e) There is another class of attacks which targets the detection algorithms as elucidated below. String matching algorithms are the core component of any signature detection mechanism and there is not a single string matching algorithm which can be efficient in any given situation. So the sly intruders can fabricate and send the packets which cause the algorithms to run in the worst case complexities.

f) The attacker may also send signatures spread across multiple packets, or use techniques like stealth scanning.

g) In the anomaly approach, though new kinds of intrusions are detected, this benefit is undermined by a high number of false alarms. Moreover, improper or insufficient training of the anomaly module causes genuine changes in the network traffic pattern to be flagged as suspicious activities, raising the number of false positives and false negatives.

1.11. OBJECTIVE

The aim of the present work was to design and develop an anomaly (behaviour) based Network Intrusion Detection System which can detect intrusions based on behavioural patterns (i.e. without the use of signatures) and can also detect novel attacks which are anomalous in nature.

The work also aimed at reducing number of false alarms by characterizing the target network with appropriate network parameters and analyzing them with mathematical models.

A literature survey reveals that Bayesian analysis is successfully used in SPAM filters, but in the area of IDS it is still not explored to a great extent. So in this work, the Bayesian classification technique is used for discriminating anomalous attacks from normal activities. Hotelling’s multivariate statistical hypothesis technique and a statistical mean-variance model are also used.

The project is integrated with an open source signature based IDS called SNORT so that it forms a complete package having both signature and anomaly techniques for effective defence against network attacks.

1.12. ORGANIZATION OF THESIS

This report is organized as follows. Chapter 1 gives a brief introduction to the project topic, the types of IDS and their detection techniques, deployment scenarios of IDS etc. It then covers related work in the field of IDS, the motivation for taking up the project and the objectives set for it. Chapter 2 deals with the system architecture and explains the individual components of the IDS. Chapter 3 explains the techniques used in the research. Chapter 4 deals with the results and discussions. Finally, Chapter 5 covers the conclusion and the future directions for enhancing the capabilities of the present IDS.

CHAPTER 2

2.1. SYSTEM ARCHITECTURE

The proposed architecture of the Network IDS has various components, as depicted in figure 10. This architecture is based on SNORT, which is an open source Network IDS [19]. The components execute different functionalities, which are discussed below.

Figure 10: Overall System architecture

2.2. SENSOR/DECODER

The NIC is put in promiscuous mode to sniff all the packets in the network irrespective of their target. The decoder receives the packets from the libpcap packet capturing library and processes them. Formal checker evaluates the packet structure for truncated packet headers and proper checksum, depending on whether it is an Ethernet, ARP, IP, TCP, UDP or ICMP packets. When Formal checker detects an error in the packet structure, it informs the decoder and the packet is discarded from further processing. Figure 11 shows the block diagram of the sensor/decoder. This module executes following functionalities.

- Sniffs all the network packets visible to it in real time.

- Extracts the header and payload information from the Ethernet frame.

- Updates the Ethernet, ARP, RARP, IP, TCP, UDP and ICMP counters as and when the respective packets are received.

- Performs the necessary checks on header and payload information.

- Sends the sniffed packets to the Pre-processor.
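The checksum part of the formal check can be illustrated for an IPv4 header with the standard Internet checksum (one's-complement sum of 16-bit words). This is a sketch only: the function names are hypothetical, the length test covers only the minimum 20-byte header, and SNORT's own decoder is written in C and performs many more checks.

```python
def internet_checksum(data):
    """One's-complement sum of 16-bit big-endian words, complemented."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = sum(int.from_bytes(data[i:i + 2], "big")
                for i in range(0, len(data), 2))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def header_is_valid(header):
    """Reject truncated headers and headers whose checksum does not verify."""
    return len(header) >= 20 and internet_checksum(header) == 0


# Example IPv4 header (UDP, 192.168.0.1 -> 192.168.0.199) with a valid checksum.
good = bytes.fromhex("45000073000040004011b861c0a80001c0a800c7")
bad = good[:8] + b"\x3f" + good[9:]  # corrupt the TTL byte
print(header_is_valid(good), header_is_valid(bad), header_is_valid(good[:12]))
```

Summing a header that already contains a correct checksum gives all ones, so the complemented result is zero; any other value means the header was damaged and the packet would be discarded.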

2.3. PREPROCESSOR

This module takes the packets from the decoder and performs functions such as IP de-fragmentation and building sessions for the reassembly of packets. Several pre-processors are available with SNORT to execute the necessary tasks, as depicted in Figure 12. This module also hosts the anomaly learning and detection pre-processor, which detects intrusions that lead to anomalies.

The pre-processor has the following responsibilities:

- De-fragments the fragmented IP packets

- Reassembles the TCP packets into streams

- Normalizes application layer protocols like Telnet/HTTP

- Detects port scans/evasion attacks

- Sends the pre-processed packets to the detection engine

- Detects intrusive activities in the network through the anomaly detection pre-processor

2.4. ANOMALY DETECTION PRE-PROCESSOR

This module helps to detect network based intrusions which manifest as abnormal network behaviour. It runs in two phases: learning (training) mode and detection mode. In the learning mode, the module learns the traffic pattern of the entire network and records the corresponding network parameters. Once the learning is over, the network profile is generated using the profiler program. This profile is used to detect anomalies when the module runs in the detection mode. Figure 13 shows the structure of the anomaly detection pre-processor.

It performs the following functionalities:

In the learning mode

- Measures the network parameters at regular intervals, as configured by the user

- Stores these values in a log file at regular intervals

In the detection mode

- Measures the network parameters at regular intervals

- Reads the baselined values from the file

- Finds statistical deviations (mean and variance)

- Computes values for Hotelling's expression and the Bayesian discriminant function

- Triggers alerts on detecting any abnormalities in the traffic pattern

2.5. DETECTION ENGINE

It is the main part of the entire system and is responsible for detecting attack signatures in the pre-processed packets. The overall system performance directly depends on this module. Some of the main functions handled by this module are listed below.

- Parses the rules and builds an internal data structure that holds the rules in a customized tree structure. Once the tree is built, it is loaded into memory.

- Passes traffic through this rule tree to compare the packet header and data against the rules (using string matching algorithms).

- Reports to the alert module the packets that are found to be carrying malicious data.

- If new rules are added, or existing rules are modified or deleted, applies the same changes to the detection engine tree structure.

- When the application exits, cleans up all memory.
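To make the idea of a rule tree concrete, the following is a deliberately simplified sketch. It does not reproduce SNORT's internal data structures: the `RuleTree` class, the two-level protocol/port index, the `ANY_PORT` wildcard and the example rules are all assumptions made for illustration, and plain substring search stands in for the string matching algorithms.

```python
from collections import defaultdict

ANY_PORT = None  # hypothetical wildcard meaning "any destination port"


class RuleTree:
    """Two-level index: protocol -> destination port -> content rules."""

    def __init__(self):
        self._tree = defaultdict(lambda: defaultdict(list))

    def add_rule(self, protocol, dst_port, content, message):
        self._tree[protocol][dst_port].append((content, message))

    def remove_rule(self, protocol, dst_port, content):
        rules = self._tree[protocol][dst_port]
        rules[:] = [rule for rule in rules if rule[0] != content]

    def match(self, protocol, dst_port, payload):
        """Return the messages of all rules whose content occurs in payload."""
        alerts = []
        ports = self._tree.get(protocol, {})
        for port in (dst_port, ANY_PORT):
            for content, message in ports.get(port, []):
                if content in payload:  # plain substring search
                    alerts.append(message)
        return alerts


tree = RuleTree()
tree.add_rule("TCP", 80, b"/etc/passwd", "possible path traversal")
tree.add_rule("TCP", ANY_PORT, b"\x90\x90\x90\x90", "possible NOP sled")
alerts = tree.match("TCP", 80, b"GET /../../etc/passwd HTTP/1.0")
```

Indexing by header fields first means that only the rules relevant to a packet's protocol and port are compared against its payload, which is the motivation for holding the rules in a tree.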

2.6. ALERT MODULE

- Sends the alerts triggered by the detection engine to the alert console in real time.

- Stores the alerts in an alert file (/var/log/snort) and/or in a database such as MySQL, as per the configuration.

An open source PHP based console called "Basic Analysis and Security Engine" (BASE) is integrated with the alert module to make it more user friendly. Figure 14 shows a screenshot of the BASE console.

2.7. BASIC ANALYSIS AND SECURITY ENGINE (BASE)

BASE is open source software written in the PHP programming language which displays information from a database in a user friendly web front end [6],[7]. It is based on the code from the "Analysis Console for Intrusion Databases" (ACID) project. An Apache web server has to be set up for running BASE. Figures 15 and 16 show screenshots of the BASE console.

Figure 15: BASE console showing the alert statistics

When used with Snort, BASE reads both tcpdump binary log formats and Snort alert formats [7]. Snort must be configured to log alerts to the database used by BASE (for example, MySQL). The alerts from the anomaly detection pre-processor can also be viewed on the BASE console. Once data is logged and processed, BASE can graphically display both layer-3 and layer-4 packet information. It also generates graphs and statistics based on time, sensor, signature, protocol, IP address, TCP/UDP port, or classification. The BASE search interface can query based on alert meta information such as sensor, alert group, signature, classification, and detection time, as well as packet data such as source/destination addresses, ports, packet payload, or packet flags.

Thus BASE allows for the easy management of alert data. The administrator can categorize data into alert groups, delete false positives or previously handled alerts, and archive and export alert data to an email address for administrative notification or further processing. BASE also supports user logins and roles, allowing an administrator to control what is seen through the web interface.

2.8. OPERATING ENVIRONMENT

The development work is carried out in the C language on the Linux platform to comply with the SNORT program. The following software/tools are used for the development and execution of the project:

ANJUTA - Open source IDE

BASE - Basic Analysis and Security Engine

GCC - GNU C Compiler to compile the components

Libpcap - Linux packet capturing library

MYSQL - Centralized database storage

RHEL4 - Red Hat Enterprise Linux 4

SNORT - Open source Network Intrusion Detection System

The IDS works efficiently on a system with the following configuration:

Pentium IV 2.0 GHz

512 MB RAM

40 GB hard disk or higher

3. CHAPTER 3

3.1. RESEARCH APPROACH

The primary task was to characterize the target network in terms of suitable network parameters. The parameters are chosen such that their values differ perceivably between normal and intrusive conditions. The features considered are the commonly seen protocols in the network traffic, the traffic data rate and the flow direction.

In essence, the anomaly model tries to capture the network behaviour in terms of two quantities: intensity and heterogeneity. Intensity refers to the number of occurrences of a given network parameter over a period of time (for example, the number of TCP connections or the number of outgoing HTTP packets). Heterogeneity refers to the observed pattern of network activities over time (for example, the data rate of HTTP packets in different time segments of the day, or observations such as web traffic being high at the beginning of office hours, dropping afterwards, and rising again towards the closing hours). These two quantities closely relate to the activities occurring in any given network and can thus represent the behaviour of the network, under the assumption that network behaviour has a certain degree of repeatability.

Once the network behaviour is quantified with these parameters, the next step is to observe how they vary with time. The observation has to be made on different days of the week because the network behaviour differs between working days, non-working days and general holidays. The anomaly based IDS has two operational modes.

Learning (or training) mode:

In this mode, the IDS learns the normal traffic behaviour in terms of a representative feature set characterizing the target network. It collects the statistics of the selected network parameters for different types of days (week days from Monday to Friday, Saturdays and Sundays) and then stores them in a specified file for subsequent processing. The frequency of statistics collection is set as per requirement; by default it is set to 10 minutes. The IDS is kept in this mode for a period sufficient to learn the normal network behaviour. A sufficient training period is the key factor in reducing false alarms. While the IDS is learning the normal behaviour, the target network is assumed to be free from attacks and intrusions. The following attributes are considered for characterizing the network:

- TCP packet count (incoming, outgoing and within LAN)

- UDP packet count (as above)

- ICMP traffic (as above)

- The number of TCP connections

- Web traffic (incoming, outgoing)

- DNS traffic (as above)

- Data rate of TCP traffic in kb/s (as above)

- Data rate of UDP traffic in kb/s (as above)

- Data rate of HTTP traffic in kb/s (as above)

- Data rate of DNS traffic in kb/s (as above)

Once the learning is over, the profile for the target network is generated from the gathered data using a profiler. If statistics collection is done every 10 minutes and the learning period is, say, 1 month, a total of 24 sample values are available for each network parameter for each hour of a given day of the week (6 samples per hour over roughly 4 weeks). Hence the profile is generated for each hour of the day over the entire week. This implies that a total of 168 baseline vectors (7 days x 24 hours) are established for the entire week, each vector containing 25 network parameters. The profile also contains 168 inverse covariance matrices, each of order 25 x 25, corresponding to the number of parameters under consideration. This profile is used by the anomaly detection module during the detection phase. The IDS is also trained to learn the network behaviour in the presence of network intrusions. Intrusions are simulated using the MIT-DARPA training data set, and a network profile is generated for this condition as well. Figure 17 shows the time slots used for generating the profile.

Figure 17: Time slots used in generating the network profile
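The slot arithmetic described above can be checked with a few lines of code. This is only an illustrative sketch: the function names are hypothetical, and numbering the slots from Monday 00:00 is an assumption rather than something the thesis specifies.

```python
from datetime import datetime

HOURS_PER_DAY = 24
DAYS_PER_WEEK = 7
SAMPLE_INTERVAL_MINUTES = 10  # default collection interval stated in the text


def slot_index(timestamp):
    """Map a timestamp to an hour-of-week slot, with Monday 00:00 as slot 0."""
    return timestamp.weekday() * HOURS_PER_DAY + timestamp.hour


def samples_per_slot(training_weeks):
    """Number of samples that fall into one slot over the training period."""
    samples_per_hour = 60 // SAMPLE_INTERVAL_MINUTES
    return samples_per_hour * training_weeks


total_slots = DAYS_PER_WEEK * HOURS_PER_DAY
```

With these definitions, `total_slots` is 168 and `samples_per_slot(4)` is 24, matching the figures quoted for a one-month learning period.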

When the network environment changes for genuine reasons, it may result in a number of false positives. In such situations the anomaly model can be updated by rerunning the training phase on the changed traffic and rebuilding the profile using the profiler program.

The logic for profile generation is given in Figure 18.

Input : The file containing the feature values logged during the learning phase
Output : Files containing the means, standard deviations and inverse matrices of the feature set

begin
    for i = 1 to number of week days do
        for j = 1 to number of hours in a day do
            Read the feature values logged during the learning phase;
            for k = 1 to number of network features do
                Find the sum of the values corresponding to the same hour and day of the week;
                Compute the average value and standard deviation for each feature;
            Compute the covariance matrix $\sum_{l,m=1}^{n} (x_l - \mu_l)(x_m - \mu_m)^{T}$, where n is the total number of features;
    Compute the determinant of each of the above covariance matrices;
    if determinant ≤ 0
        Consider the neighbouring covariance matrix having a positive determinant;
    Compute the inverse matrix corresponding to each covariance matrix;
end
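The profile generation logic can be sketched in runnable form as follows. This is a minimal pure-Python illustration rather than the profiler program itself: the function names, the dictionary layout of the profile, the singularity tolerance, and the reading of "neighbouring" as the nearest slot index with a positive determinant are all assumptions.

```python
import math
from collections import defaultdict


def mean_vector(samples):
    n = len(samples)
    return [sum(column) / n for column in zip(*samples)]


def covariance_matrix(samples, mean):
    """Sample covariance matrix with the usual n - 1 denominator."""
    n, p = len(samples), len(mean)
    cov = [[0.0] * p for _ in range(p)]
    for x in samples:
        d = [xi - mi for xi, mi in zip(x, mean)]
        for l in range(p):
            for m in range(p):
                cov[l][m] += d[l] * d[m]
    return [[value / (n - 1) for value in row] for row in cov]


def invert(matrix):
    """Gauss-Jordan elimination; returns (determinant, inverse or None)."""
    p = len(matrix)
    a = [list(row) + [float(i == j) for j in range(p)]
         for i, row in enumerate(matrix)]
    det = 1.0
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:  # assumed tolerance for "singular"
            return 0.0, None
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        pivot_value = a[col][col]
        det *= pivot_value
        a[col] = [value / pivot_value for value in a[col]]
        for r in range(p):
            if r != col:
                factor = a[r][col]
                a[r] = [vr - factor * vc for vr, vc in zip(a[r], a[col])]
    return det, [row[p:] for row in a]


def build_profile(log):
    """log: iterable of (slot, feature_vector) pairs from the learning phase."""
    by_slot = defaultdict(list)
    for slot, vector in log:
        by_slot[slot].append(vector)
    profile = {}
    for slot, samples in sorted(by_slot.items()):
        mean = mean_vector(samples)
        cov = covariance_matrix(samples, mean)
        std = [math.sqrt(cov[k][k]) for k in range(len(mean))]
        det, inverse = invert(cov)
        profile[slot] = {"mean": mean, "std": std, "det": det, "inverse": inverse}
    # Slots without a positive determinant borrow from the nearest usable slot.
    usable = [slot for slot, entry in profile.items() if entry["det"] > 0]
    for slot, entry in profile.items():
        if entry["det"] <= 0 and usable:
            donor = min(usable, key=lambda s: abs(s - slot))
            entry["det"] = profile[donor]["det"]
            entry["inverse"] = profile[donor]["inverse"]
    return profile
```

Each slot needs at least two samples, since the covariance estimate divides by n - 1.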

Detection mode:

In this mode, the IDS detects, in real time, the network based attacks that lead to abnormal traffic patterns. The abnormality is decided on the basis of the network profile constructed earlier. The profile contains 168 vectors, one for each hour of the day over the entire week, each vector containing a set of 25 features which describe the network. The anomaly detection module samples the selected network parameters at regular intervals, as in the learning mode, and checks whether they comply with the already established network profile for that particular hour and day of the week. If it detects significant deviations, it triggers alerts. The logic for detection is given in Figure 19.

Input : The file containing the network profile
Output : Sends an alert in case an event is detected as an intrusion

begin
    for i = 1 to number of week days do
        for j = 1 to number of hours in a day do
            for k = 1 to number of network features do
                Read the average value and standard deviation for each feature;
            Read the inverse matrices;
            Read the determinant corresponding to each inverse matrix;
    Compute (μ ± σ) for each parameter;
    if x < μ − σ or x > μ + σ
        x is intrusive;
    Compute $T^2 = (X - \mu)^{T} S^{-1} (X - \mu)$;
    if $T^2$ exceeds the threshold, flag alerts;
    Compute $g_i(X) = -\frac{1}{2}\ln|S| - \frac{1}{2}(X - \mu)^{T} S^{-1} (X - \mu) + \ln p(C_i)$;
    if $g_i(X)$ exceeds the threshold, flag alerts;
end

The flow chart in Figure 20 shows the overall working of the anomaly detection technique.

Figure 20: Flow chart depicting the overall working of Anomaly Detection

3.2. STATISTICAL MOMENTS OR "MEAN AND STANDARD DEVIATION MODEL"

Statistical based anomaly detection techniques use statistical properties (mean and variance) of normal activities to build a statistical normal profile and employ statistical tests to determine whether observed activities deviate significantly from the normal profile [20].

Figure 21: normal distribution curve with different confidence intervals

The arithmetic average, or the mean, is a statistic that measures the central tendency of a set of data. It is given by

$$\mu = \frac{1}{n}\sum_{i=1}^{n} x_i$$

where
μ = mean
$x_i$ = value of the i-th observation of a given parameter, i = 1 … n
n = total number of observations in a sample

The standard deviation is a measure of the amount of data dispersion around the mean. It is given by

$$\sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n - 1}}$$

where
σ = standard deviation
$x_i$ = value of the i-th observation of a given parameter, i = 1 … n
μ = mean
n = total number of observations in a sample

If the value of $x_i$ falls outside the interval (μ ± n*σ), where the multiplier n sets the width of the confidence interval, it indicates an anomalous situation and can be flagged as an alert.

It is difficult to determine the thresholds above which an anomaly should be considered intrusive. Setting the threshold too low results in false positives and setting it too high results in false negatives. So the confidence interval is chosen suitably on the basis of experimentation [21]. Figure 21 shows different confidence intervals for a Gaussian distribution.
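As a small worked example of the mean and standard deviation model, the sketch below flags a value that falls outside mean ± k times the standard deviation. The function name, the symbol `k` for the multiplier, and the baseline numbers are made up for illustration; the default of 2 follows the "Mean ± 2*SD" setting reported in the comparative results.

```python
import statistics


def is_anomalous(value, baseline, k=2.0):
    """Return True if value lies outside mean ± k * sample standard deviation."""
    mu = statistics.mean(baseline)
    sigma = statistics.stdev(baseline)  # sample form, divides by n - 1
    return value < mu - k * sigma or value > mu + k * sigma


baseline = [100, 110, 90, 105, 95]  # made-up packet counts for one time slot
```

For this baseline the mean is 100 and the sample standard deviation is about 7.9, so with k = 2 a reading of 112 is accepted while readings of 130 or 80 are flagged. A larger k widens the interval and trades false positives for false negatives.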

3.3. HOTELLING'S T² HYPOTHESIS, A MULTIVARIATE STATISTICAL TECHNIQUE

When enough computational resources are available and the required security level is high, "multivariate models" are a good choice, since they produce better results with a lower false alarm rate than the mean and standard deviation model. Hence these are recommended for the IDS.

Hotelling's $T^2$ test is a multivariate statistical process control technique that detects anomalies in the activities of a network. It can be regarded as the multivariate extension of the mean/standard deviation model, employing a p-dimensional mean vector and the corresponding covariance matrix.

Hotelling's $T^2$ statistic for an observation $X_i$ is determined by [13],[14]

$$T^2 = (X_i - \mu)^{T} S^{-1} (X_i - \mu)$$

where $X_i = (X_{i1}, X_{i2}, X_{i3}, \ldots, X_{ip})$ denotes an observation of p variables at time t, $\mu = (\mu_1, \mu_2, \mu_3, \ldots, \mu_p)$ denotes the vector of mean values of the p variables at time t, and S is the covariance matrix given by

$$S = \frac{1}{n - 1}\sum_{i=1}^{n} (X_i - \mu)(X_i - \mu)^{T}$$

where n is the data sample size.

The computed $T^2$ value is small if the data point conforms to the norm profile. If the value of the $T^2$ statistic is greater than a threshold value, then the data point is considered to deviate from the normal behaviour. The threshold value is set based on the values of $T^2$ observed for normal and intrusive traffic during the learning phase. Hotelling's $T^2$ test provides a complete data model of multivariate data. Since it uses the covariance matrix S of the p variables, it detects both mean shifts and changes in the interrelationships among the variables in a multivariate manner, which is important in finding network anomalies. The test distinguishes three kinds of events: normal, suspicious and intrusive. Normal corresponds to events which comply with the previous normal traffic pattern. Suspicious means events which deviate to some extent from the normal behaviour, and intrusive indicates a large variation between the observed and the expected traffic pattern.
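The computation of the statistic can be sketched as follows. This is an illustration under stated assumptions, not the thesis implementation: the function names and the two threshold parameters are hypothetical, and instead of reading a precomputed inverse matrix the sketch solves the linear system S y = (X − μ), which yields the same value of the quadratic form.

```python
def solve(matrix, rhs):
    """Solve matrix * y = rhs by Gaussian elimination with partial pivoting."""
    p = len(rhs)
    a = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, p):
            factor = a[r][col] / a[col][col]
            a[r] = [vr - factor * vc for vr, vc in zip(a[r], a[col])]
    y = [0.0] * p
    for i in reversed(range(p)):
        partial = sum(a[i][j] * y[j] for j in range(i + 1, p))
        y[i] = (a[i][p] - partial) / a[i][i]
    return y


def hotelling_t2(x, mean, cov):
    """T^2 = (X - mu)^T S^-1 (X - mu) for one observation."""
    d = [xi - mi for xi, mi in zip(x, mean)]
    y = solve(cov, d)  # y = S^-1 (X - mu)
    return sum(di * yi for di, yi in zip(d, y))


def classify(t2, suspicious_threshold, intrusive_threshold):
    """Map a T^2 value to one of the three kinds of events."""
    if t2 > intrusive_threshold:
        return "intrusive"
    if t2 > suspicious_threshold:
        return "suspicious"
    return "normal"
```

With an identity covariance matrix the statistic reduces to the squared Euclidean distance from the mean; a non-trivial covariance rescales each direction by its variance and accounts for correlation between features.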

3.4. BAYESIAN CLASSIFICATION, A PROBABILISTIC TECHNIQUE

In the probabilistic classification method, a pattern is assigned to the class that is most probable given the observed features, i.e., a point x of the feature space is assigned to the class $C_j$ that maximizes $p(C_j/x)$.

The classification problem is formulated in terms of estimating the posterior probability that pattern x belongs to one of the M data classes.

The posterior probability depends on

- The prior probability $p(C_i)$, i.e. the likelihood that a randomly selected pattern belongs to class $C_i$

- The class conditional probability density function $p(x/C_i)$, i.e. the distribution of patterns of class $C_i$ in the selected feature space.

Bayes' Theorem:

Bayesian statistics, in its most general form, provides a framework for combining observed data with prior assumptions in order to model stochastic systems [23], [24].

$$p(C_i/x) = \frac{p(x/C_i)\,p(C_i)}{p(x)} = \frac{p(x/C_i)\,p(C_i)}{\sum_{j=1}^{M} p(x/C_j)\,p(C_j)} \qquad \text{--- (1)}$$

Any function that computes the conditional probabilities $p(C_i/x)$ is referred to as a discriminant function. Given an observation x, Bayes' theorem provides a method to compute $p(C_i/x)$.

$p(x)$ can be ignored, since it is the same for all the classes and thus does not help in discriminating between the classes.

The likelihood function $p(x/C_i)$ denotes the probability density function of the vector samples x given a particular estimate $C_i$ of the underlying probability distribution generating that data. A multivariate normal distribution is assumed for $p(x/C_i)$. Figure 22 shows the multivariate Gaussian distribution curve.

Figure 22: Multivariate Gaussian distribution curve

A Gaussian or multivariate normal distribution is characterized by its mean value vector μ and its covariance matrix S and has the probability density function

$$f(X; \mu, S) = \frac{1}{(2\pi)^{p/2}\,|S|^{1/2}} \exp\left\{-\frac{1}{2}(X - \mu)^{T} S^{-1} (X - \mu)\right\} \qquad \text{--- (2)}$$

Here X is a p-dimensional pattern vector of real valued attributes.

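The discriminant function $g_i(X)$ can be derived by using equations (1) and (2). Taking the natural logarithm of $p(X/C_i)\,p(C_i)$ and dropping the term $-\frac{p}{2}\ln 2\pi$, which is the same for all classes, gives

$$g_i(X) = -\frac{1}{2}\ln|S| - \frac{1}{2}(X - \mu)^{T} S^{-1} (X - \mu) + \ln p(C_i)$$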
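The discriminant function can be evaluated as in the following sketch. To keep it short it handles only two features, so that the inverse and determinant of S have closed forms; the function names, class parameters and priors are made up for illustration. The sketch assigns the class with the largest discriminant value, which is the standard Bayes decision rule, whereas the detection logic described earlier compares the value against a threshold.

```python
import math


def discriminant_2d(x, mean, cov, prior):
    """g_i(X) = -1/2 ln|S| - 1/2 (X - mu)^T S^-1 (X - mu) + ln p(C_i)."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0:
        raise ValueError("covariance matrix must have a positive determinant")
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Quadratic form using the closed-form inverse of a 2 x 2 matrix.
    quad = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return -0.5 * math.log(det) - 0.5 * quad + math.log(prior)


def classify(x, classes):
    """Return the name of the class with the largest discriminant value."""
    return max(classes, key=lambda name: discriminant_2d(x, *classes[name]))


# Hypothetical class models: (mean vector, covariance matrix, prior probability).
classes = {
    "normal": ([100.0, 10.0], [[25.0, 0.0], [0.0, 4.0]], 0.9),
    "intrusive": ([400.0, 60.0], [[900.0, 0.0], [0.0, 100.0]], 0.1),
}
```

With these made-up models, `classify([105.0, 11.0], classes)` selects the normal class and `classify([390.0, 55.0], classes)` selects the intrusive class. For the 25 features used in the thesis, the determinant and inverse would come from the stored profile instead of the closed form.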

4. CHAPTER 4

4.1. EXPERIMENTAL RESULTS AND DISCUSSION

To evaluate the system, two major performance indicators are chosen.

- Detection rate

- False positive rate

The detection rate is defined as the number of intrusion instances detected by the system divided by the total number of intrusion instances present in the test set. The false positive rate is defined as the total number of instances that were wrongly detected as intrusions divided by the total number of normal instances. These are good measures of performance since they show what percentage of intrusions the system is able to detect and how many incorrect classifications it makes in the process. The following subsections give the details of the evaluation scheme and the results obtained.
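The two definitions translate directly into code. In the sketch below the function names are hypothetical; the counts 62 and 74 are the Bayesian totals reported in the comparative results, while the false positive example uses made-up numbers.

```python
def detection_rate(detected_intrusions, total_intrusions):
    """Fraction of the intrusion instances in the test set that were detected."""
    return detected_intrusions / total_intrusions


def false_positive_rate(false_alarms, total_normal):
    """Fraction of normal instances wrongly reported as intrusions."""
    return false_alarms / total_normal


bayesian_detection = round(100 * detection_rate(62, 74), 2)
example_false_positive = round(100 * false_positive_rate(3, 100), 2)  # made-up counts
```

Here `bayesian_detection` evaluates to 83.78, the detection accuracy listed for the Bayesian classifier.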

4.2. EVALUATION SCHEME

The anomaly IDS is trained for five weeks to learn the normal network traffic of IIT Kharagpur. The model considers a vector of 25 network attributes to describe the target network. The IDS is also trained for more than three weeks to learn the network behaviour under intrusions. The intrusions are simulated in the network using the MIT-DARPA 1999 data set. The training data contains a total of 4396 vector data points for normal traffic and 2120 vector data points for intrusive traffic. The training period covers different types of days (working days, Saturdays and non-working days). The network profile, which contains a total of 168 vector data points corresponding to each hour of the day over the entire week, is generated from the training data. The same training data and test data are used with all three techniques discussed earlier.

About MIT-DARPA IDS Evaluation

In 1998, the Information Systems Technology Group of Lincoln Laboratory at MIT, in conjunction with the Air Force Research Laboratory (AFRL) and the Defense Advanced Research Projects Agency (DARPA), began work to develop a standard for the evaluation of Network IDS. Developing this evaluation meant creating consistent and repeatable network traffic. The traffic was created through the study of 4 months of data from Hanscom Air Force Base and approximately 50 other bases. Using that data, they were able to generate and simulate network traffic while introducing attacks, probes and intrusions into the data. Both training and testing data were simulated, and two types of traffic were published. Training data is traffic in which the attacks were known from the start. A second set of data contains traffic in which the attacks were not described explicitly. The data sets of Week 1 and Week 3 contain attack free traffic, while Week 2 contains training data with attacks. Week 4 and Week 5 are the testing data, containing network attacks in the midst of normal background data. The test data sets contain four categories of simulated attacks:

- DoS: denial of service (e.g. SYN flood)

- R2L: unauthorized access from a remote machine (password guessing)

- U2R: unauthorized access to super user or root functions (buffer overflow attacks)

- Probing: surveillance and other probing for vulnerabilities (port scanning)

A more complete discussion of this is available at the Lincoln Laboratory/MIT site [22].

Table 1 gives the values obtained for Hotelling's multivariate expression and the Bayesian classifier for normal and intrusive network traffic.

       Values for Hotelling's Statistic    Values for Bayesian Classifier
No.    Normal        Intrusive             Normal        Intrusive
1      7.74E+09      1.32E+17              3.07E+08      6.59E+16
2      7.60E+08      9.07E+16              1.48E+07      4.54E+16
3      5.60E+08      6.26E+16              1.32E+07      3.13E+16
4      4.49E+08      6.05E+16              1.07E+07      3.02E+16
5      1.59E+08      4.35E+16              1.04E+07      2.18E+16
6      8.84E+07      2.97E+16              1.03E+07      1.48E+16
7      5.10E+07      2.60E+16              6.70E+06      1.30E+16
8      4.50E+07      2.37E+16              6.52E+06      1.19E+16
9      2.95E+07      1.95E+16              2.88E+06      9.77E+15
10     2.46E+07      1.57E+16              2.74E+06      7.85E+15
11     2.09E+07      1.09E+16              1.71E+06      5.44E+15
12     1.93E+07      9.58E+15              2.16E+05      4.79E+15
13     1.36E+07      9.34E+15              2.60E+05      4.67E+15
14     1.34E+07      6.34E+15              7.19E+05      3.17E+15
15     1.17E+07      5.19E+15              1.29E+06      2.59E+15
16     8.36E+06      5.12E+15              1.40E+06      2.56E+15
17     7.88E+06      3.79E+15              1.41E+06      1.89E+15
18     6.27E+06      2.64E+15              1.59E+06      1.32E+15
19     5.67E+06      2.29E+15              1.63E+06      1.15E+15
20     4.85E+06      2.28E+15              2.42E+06      1.14E+15
21     3.26E+06      3.32E+14              2.84E+06      1.66E+14
22     3.18E+06      2.67E+14              3.13E+06      1.34E+14
23     2.82E+06      2.67E+14              3.94E+06      1.33E+14
24     2.80E+06      2.12E+14              4.18E+06      1.06E+14
25     2.59E+06      1.65E+14              5.85E+06      8.25E+13
26     1.44E+06      1.08E+14              6.70E+06      5.39E+13
27     5.20E+05      7.73E+13              6.82E+06      3.87E+13

Table 1: Typical values obtained for the normal and intrusive network traffic with Hotelling’s and Bayesian discriminator functions

By manually analysing a large set of values obtained for the Hotelling's and Bayesian discriminators, it is found that the following values most closely discriminate the normal activities from the intrusive ones.

Hotelling's technique: On average, the values for normal activities lie between 1.00E+06 and 5.00E+07, while for intrusive activities the values are above .90E+08.

Bayesian Technique: On an average, the values for normal activities lie between

4.3. COMPARATIVE RESULTS

Detection using different techniques: Probabilistic (Bayesian Classifier), Statistical (Hotelling's Hypothesis), Statistical (Mean ± 2*SD)

Attack name              Tools/Data set used         Count   Bayesian   Hotelling's   Mean ± 2*SD
ping flood               ping tool                   15      15         15            15
DoS attack               ddos open source tool       5       5          5             5
TCP RST attack           neti open source code       5       5          5             5
TCP Syn flood attack     neti open source code       7       7          7             6
UDP attack               neti open source code       10      10         10            10
X mas scan               nmap tool                   5       5          4             4
NTinfoscan               MIT-DARPA 1999 data set     1       0          0             0
pod                      MIT-DARPA 1999 data set     2       2          2             2
back                     MIT-DARPA 1999 data set     2       0          0             0
httptunnel               MIT-DARPA 1999 data set     2       0          0             0
land                     MIT-DARPA 1999 data set     2       2          2             2
secret                   MIT-DARPA 1999 data set     3       0          0             0
portsweep                MIT-DARPA 1999 data set     3       3          3             2
eject                    MIT-DARPA 1999 data set     3       0          0             0
mailbomb                 MIT-DARPA 1999 data set     2       2          2             2
ipsweep                  MIT-DARPA 1999 data set     3       3          2             2
satan                    MIT-DARPA 1999 data set     2       1          1             1
neptune                  MIT-DARPA 1999 data set     2       2          2             2
Total                                                74      62         60            58
Detection Accuracy (%)                                       83.78      81.08         78.38
Total Alerts generated                                       65         64            67
No. of Attacks missed                                        12         16            20
rate (%)
False Negative rate (%)                                      16.22      21.62         27.03
Positive Prediction rate (%)                                 95.40      90.63         78.30

Table 2: Chart showing the comparative results of the experiments

Table 3 below shows the results obtained by Daniel Barbara et al. using pseudo-Bayes estimators [6].

Table 3. Experimental results on MIT_LL DARPA 1999 Data set.

Source: http://www.cs.ubc.ca/local/reading/proceedings/siam_datamining2001/pdf/sdm01_29.pdf

4.4. DISCUSSION

The experiment clearly revealed that, among the three techniques discussed in the project, the Bayesian classification method gives a better detection rate and fewer false positives in detecting intrusions. A detection accuracy of ≈84% is achieved using the Bayesian method, with a false positive rate of 4.6%. Hotelling's statistical method gave a hit rate of ≈81% at a 6.2% false positive rate. The statistical moments (mean and standard deviation) model yielded a hit rate of ≈78%, while the false positive rate was 13%. The comparative analysis with previous work also reveals that the Bayesian approach is a superior technique.

In summary, the results show that the approach followed in this thesis is quite effective and efficient for detecting network based attacks. It is also observed that the multivariate statistical techniques are more effective than the univariate technique; in particular, the Bayesian technique has promising potential for future IDS research.

5. CHAPTER 5

5.1. CONCLUSION

A Network Intrusion Detection System has a major role to play in safeguarding network resources against various kinds of attacks. With the advent of new vulnerabilities and increasingly sophisticated attacks, new techniques for intrusion detection have evolved. The main objective of the research is to increase the detection accuracy while keeping the false positive rate low.

As stated earlier, the signature based techniques are good but have obvious shortcomings, such as the failure to detect novel attacks and an ever-growing signature database. A viable alternative is to analyse the behaviour of the network as a whole and to build a model based on the observations. Anomaly based detection has therefore been an area of wide interest for researchers, since it provides the baseline for developing promising techniques.

Anomaly based detection complements the signature based technique and helps in identifying novel attacks which lead to anomalies in the network traffic. The major concerns in this method are identifying the appropriate network features to characterize the network and building a behavioural model. In addition, the rate of false positives may increase sharply if the IDS is not trained sufficiently in the target network.

This thesis discussed the design and development of an "Anomaly based Intrusion Detection System", which is built on top of an existing open source signature based network IDS called SNORT, so as to have both analysis techniques in a single package.

The anomaly based component of the IDS is trained in the Computer and Informatics Centre of the Indian Institute of Technology (IIT), Kharagpur, where the IIT network traffic is sniffed using a port mirrored switch at the gateway. The IDS is trained there for more than a month to learn the normal traffic pattern. It is also exposed to intrusive traffic for more than 3 weeks in a simulated environment, by replaying the MIT DARPA Intrusion Detection System training data sets (1999).

The thesis presented three techniques for detecting anomaly based intrusions at the network level. Statistical anomaly detection techniques use statistical properties and statistical tests to determine whether the "observed behaviour" deviates significantly from the "expected behaviour". The first technique is based on a univariate statistical model with mean and variance. The second uses the multivariate Hotelling's method, while the last uses the Bayesian classification technique for discriminating attacks from normal activities.

All three techniques are evaluated with the DARPA IDS evaluation data sets (1999) and the results are compared. The Bayesian approach proved to be a better solution than Hotelling's multivariate technique and the method of statistical moments.

At present, the work only identifies and classifies events into normal and attack classes. It can be extended to detect and classify attacks into multiple attack classes. Dynamic updating of the anomaly model using a Bayesian network can also be considered as a future enhancement. Different analysis techniques such as HMM and fuzzy logic can also be tried as alternative techniques for anomaly detection.
