A Fast Host-Based Intrusion Detection System Using Rough Set Theory


Sanjay Rawat^{1,2}, V P Gulati^{2}, and Arun K Pujari^{1}

1 AI Lab, Dept. of Computer and Information Sciences

University of Hyderabad, Hyderabad-500046, INDIA

[email protected], [email protected]

2 IDRBT

Castle Hills, Road No.1

Masab Tank, Hyderabad-500057, INDIA

[email protected]

Abstract. Intrusion detection systems have become a main research focus in the area of information security. The last few years have witnessed a large variety of techniques and models that provide increasingly efficient intrusion detection solutions. We advocate here that the intrusive behavior of a process is a highly localized characteristic of the process. There are certain smaller episodes in a process that make the process intrusive in an otherwise normal stream. As a result, it is unnecessary and most often misleading to consider the whole process in totality and to attempt to characterize its abnormal features. In the present work we establish that subsequences of reasonably small length of the sequence of system calls suffice to identify abnormality in a process. We make use of rough set theory to demonstrate this concept. Rough set theory also facilitates identifying rules for intrusion detection. The main contributions of the paper are the following: (a) it is established that a very small subsequence of system calls is sufficient to identify intrusive behavior with high accuracy, and we demonstrate our result using DARPA’98 BSM data; (b) a rough set based system is developed that can extract rules for intrusion detection; (c) an algorithm is presented that can determine the status of a process as either normal or abnormal on-line.

Keywords: Data mining, Decision Table, Rough Set, Intrusion Detection, Anomaly, Misuse.

1 Introduction

Intrusion detection systems (IDSs) have become a major area of research and product development. They work on the premise that intrusions can be detected through examination of various parameters such as network traffic, CPU utilization, I/O utilization, user location, and various file activities. Based on the various approaches, different types of IDS are proposed in the literature. On the basis of audit data, there are two types of IDS. The network-based systems collect data directly from the network that is being monitored, in the form of packets [29], and the host-based systems collect data from the host being protected [2]. Based on the processing of data to detect attacks, IDS can also be classified into two types: misuse-based systems and anomaly-based systems. While the former keeps the signatures of known attacks in a database and compares new instances with the stored signatures to find attacks, the latter learns the normal behavior of the monitored system and then looks for any deviation from it as a sign of intrusion. It is clear that a misuse-based IDS cannot detect new attacks, and any new attack signature has to be added manually to the list of known patterns. An IDS based on anomaly detection, on the other hand, is capable of detecting new attacks, as any attack is assumed to be different from normal activity. However, an anomaly-based IDS sometimes raises false alarms because it cannot differentiate properly between deviations due to an authentic user’s activity and those due to an intruder.

Among various IDS approaches, signature-analysis stores patterns of attacks as semantic descriptions [21]. The main drawback of the signature analysis technique, like all misuse-based approaches, is the need for frequent updates to keep up with the stream of new vulnerabilities/attacks discovered. Rule-based intrusion detection [34][20][13] assumes that intrusion attempts can be characterized by sequences of events that lead to a compromised system state. Such systems are characterized by their expert system properties that fire rules when audit records or system status information begin to indicate suspicious activity. The main limitations of this approach are the difficulty of extracting knowledge about attacks and the processing speed. The state transition analysis technique describes an attack with a set of goals and transitions, and represents them as state transition diagrams [18][19][32]. The most widely used approach to anomaly-based intrusion detection is statistical [16][27]. User or system behavior is measured by a number of variables sampled over time and stored in a profile. The current behavior of each user is maintained in a profile. At regular intervals the current profile is merged with the stored profile. Anomalous behavior is determined by comparing the current profile with the stored profile.

Forrest et al. [11][12] suggest that the system call trace of a process under normal execution can be taken as its normal behavior, as the variation in sequences of system calls is very small. On the other hand, the variation is relatively higher when a normal trace is compared with a sequence of system calls under abnormal execution. This variation can be attributed to the presence of one or more alien (thus malicious) subsequences in the abnormal process. It should be noted that not all the subsequences of an abnormal process are malicious. Thus the intrusive part should be detectable as a subsequence of the whole abnormal sequence of the process.
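The idea of alien subsequences can be illustrated with a minimal Python sketch. It is not the procedure of Forrest et al. or of this paper; the function names, the window length k = 3, and the toy traces are assumptions made purely for illustration. The sketch collects all length-k windows seen in normal traces and then reports the windows of a new trace that never occurred in them.

```python
from typing import Iterable, List, Set, Tuple

Window = Tuple[str, ...]


def sliding_windows(trace: List[str], k: int) -> List[Window]:
    """Return all contiguous subsequences of length k from a system call trace."""
    if k <= 0:
        raise ValueError("k must be positive")
    return [tuple(trace[i:i + k]) for i in range(len(trace) - k + 1)]


def build_normal_profile(traces: Iterable[List[str]], k: int) -> Set[Window]:
    """Collect every length-k window observed in the normal traces."""
    profile: Set[Window] = set()
    for trace in traces:
        profile.update(sliding_windows(trace, k))
    return profile


def alien_windows(trace: List[str], profile: Set[Window], k: int) -> List[Window]:
    """Return the windows of `trace` that never occur in the normal profile."""
    return [w for w in sliding_windows(trace, k) if w not in profile]


# Toy data (hypothetical system call names and traces).
normal_traces = [
    ["open", "read", "mmap", "close", "exit"],
    ["open", "mmap", "read", "close", "exit"],
]
profile = build_normal_profile(normal_traces, k=3)

suspect = ["open", "read", "mmap", "execve", "chmod", "close", "exit"]
aliens = alien_windows(suspect, profile, k=3)
for w in aliens:
    print("alien window:", w)
```

In this toy trace the first window, (open, read, mmap), also occurs in normal behavior, while the windows touching the inserted calls do not, which mirrors the remark that not all subsequences of an abnormal process are malicious.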

In this paper we present a technique for discovering rules for intrusion detection. We make use of rough set theory for this purpose. To the best of our knowledge, Lin was the first to propose the idea of applying rough sets to the problem of anomaly detection [25]. Though that paper lacks experimental results, it provides some solid theoretical background. The following two theorems are important:

1. Every sequence of records in computer has a repeating sequence

2. If the audit trail is long enough, then there are repeating records
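The second statement can be read as a pigeonhole argument: if records are drawn from a finite set of n distinct values, any audit trail with more than n records must contain a repeat. The Python sketch below illustrates this reading on toy data. The finite-alphabet assumption, the function name `first_repeat`, and the record names are ours, not taken from [25].

```python
from itertools import product
from typing import Hashable, Optional, Sequence, Tuple


def first_repeat(trail: Sequence[Hashable]) -> Optional[Tuple[int, int]]:
    """Return (i, j) with i < j for the earliest record that repeats, else None."""
    seen = {}
    for j, record in enumerate(trail):
        if record in seen:
            return seen[record], j
        seen[record] = j
    return None


# Hypothetical finite set of possible audit records.
alphabet = ["open", "read", "write", "close"]

# A trail one record longer than the alphabet must contain a repeat.
trail = ["open", "read", "write", "close", "read"]
print(first_repeat(trail))

# Exhaustive check of the pigeonhole bound on this small alphabet:
# every trail of length len(alphabet) + 1 has a repeated record.
all_repeat = all(
    first_repeat(t) is not None
    for t in product(alphabet, repeat=len(alphabet) + 1)
)
print(all_repeat)
```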

Following the argument of Forrest et al. and in view of the above theorems, our approach is based on subsequences of system calls. We formulate the problem as a classification problem by writing the set of subsequences as a decision table. The proposed method is a combination of signature-based and anomaly-based approaches. A program's behavior is monitored as a sequence of system calls. These sequences are further converted into subsequences of shorter length. These subsequences are considered as the signatures for malicious as well as normal activities. By doing so, one of the disadvantages of the signature-based approach, that of frequently updating the signature database, can be avoided. Empirical results show that the proposed system is able to detect new abnormal activities without updating the signatures. Further, these signatures are represented in the form of IF-THEN decision rules. The advantage of representing signatures in this form is that such signatures are easy to interpret for further analysis. Rough set theory is used to induce decision rules. Rules induced by using rough set theory are very compact because, before inducing rules, all the redundant features of the audit data are removed. This makes the matching of rules faster, thus making the system suitable for on-line detection. The proposed system is also fast in the sense that a process is compared, in parts, as it starts making system calls, so we do not have to wait until it exits.
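The decision-table formulation and the on-line, part-by-part comparison described above can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the names (`build_decision_table`, `certain_rules`, `OnlineMonitor`), the window length, and the toy traces are hypothetical, and a plain lookup of consistent rows stands in for rough set rule induction (no reduct computation or attribute removal is performed).

```python
from collections import deque
from typing import Deque, Dict, List, Set, Tuple

Window = Tuple[str, ...]


def windows(trace: List[str], k: int) -> List[Window]:
    """All contiguous length-k subsequences of a system call trace."""
    return [tuple(trace[i:i + k]) for i in range(len(trace) - k + 1)]


def build_decision_table(normal: List[List[str]],
                         abnormal: List[List[str]],
                         k: int) -> Dict[Window, Set[str]]:
    """Map each window (condition attributes) to the set of decisions seen for it."""
    table: Dict[Window, Set[str]] = {}
    for label, traces in (("normal", normal), ("abnormal", abnormal)):
        for trace in traces:
            for w in windows(trace, k):
                table.setdefault(w, set()).add(label)
    return table


def certain_rules(table: Dict[Window, Set[str]]) -> Dict[Window, str]:
    """Keep only consistent rows, i.e. windows with exactly one decision.

    Assumption: inconsistent rows (seen in both classes) are simply dropped here.
    """
    return {w: next(iter(d)) for w, d in table.items() if len(d) == 1}


class OnlineMonitor:
    """Classify a process window by window while it is still running."""

    def __init__(self, rules: Dict[Window, str], k: int) -> None:
        self.rules = rules
        self.k = k
        self.buffer: Deque[str] = deque(maxlen=k)

    def observe(self, call: str) -> str:
        """Feed one system call; return the verdict for the current window."""
        self.buffer.append(call)
        if len(self.buffer) < self.k:
            return "undecided"  # not enough calls yet to form a window
        return self.rules.get(tuple(self.buffer), "unknown")


# Toy training data (hypothetical).
normal = [["open", "read", "close", "exit"]]
abnormal = [["open", "read", "close", "execve", "exit"]]
rules = certain_rules(build_decision_table(normal, abnormal, k=3))

monitor = OnlineMonitor(rules, k=3)
verdicts = [monitor.observe(c) for c in ["open", "read", "close", "execve"]]
print(verdicts)  # ['undecided', 'undecided', 'unknown', 'abnormal']
```

Each rule here reads as IF (call1, call2, call3) THEN decision. The monitor reports "abnormal" as soon as a matching window appears, without waiting for the process to exit, and the window shared by both classes is left without a certain rule.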

The major contributions of the paper are:

– It is established empirically that short sequences of system calls are sufficient to detect intrusive behavior with high accuracy;

– A rough set based approach is developed that can extract decision rules for intrusion detection;

– An algorithm is presented that can classify a process as normal or abnormal on-line.

The rest of the paper is organized as follows: Section 2 gives an overview of research work on process profiling using sliding window approaches and learning rules for intrusion detection. Section 3 presents some preliminary background to understand the approach. A detailed description of the proposed scheme is given in Section 4. Section 5 covers the experimental setup and analysis of the results. Section 6 concludes the paper.

2 Related Work

Recently, process monitoring for signs of intrusion has attracted the attention of many researchers, and active research is being done in this area. In the approach called time-delay embedding (tide), initiated by Forrest et al. [11][12], the normal behavior of processes is captured, because programs show stable behavior over time under normal execution. In this approach, short


attribute values. Knowledge representation is very simple and the learning rate is very fast as compared to other techniques. Our study shows that it is possible to detect an attack by merely looking at some portion of the abnormal process. This reduces the dimension of the data to be processed and thus makes the subsequent computations much faster. The decision rules induced by rough set theory are easy to interpret and thus can be useful in further analyzing the events. We have tested our scheme by conducting experiments on DARPA’98 data. Empirical results, reported in the paper, justify our approach of making use of rough sets for intrusion detection. As future work, we intend to use the concept of incremental learning so that new rules can be learnt without retraining on the whole data. We are also analyzing the IF-THEN rules to better understand the relationship among system calls and gain more insight into attacks. Our future work also includes combining the rough set method with other learning techniques, e.g. neural networks, to propose an IDS that is more robust in terms of accuracy.

Acknowledgement

The authors are thankful to the anonymous reviewers for their useful comments, which improved the presentation and quality of the paper. The first author is associated with IDRBT as a research fellow and is thankful to IDRBT for providing financial assistance and infrastructure to carry out this work. The third author is thankful to MIT, India for its funding.

References

1. An A., Huang Y., Huang X., Cercone N.: Feature Selection with Rough Sets for Web Page Classification. In: Dubois D., Grzymala-Busse J.W., Inuiguchi M., and Polkowski L. (eds.), Rough Sets and Fuzzy Sets, Springer-Verlag (2004)

2. Bace R., Mell P.: NIST special publication on intrusion detection system. SP800-31, NIST, Gaithersburg, MD (2001)

3. Bazan J.: A Comparison of Dynamic and non-Dynamic Rough Set Methods for Extracting Laws from Decision Tables. In: Skowron A., Polkowski L. (eds.), Rough Sets in Knowledge Discovery 1, Physica-Verlag, Heidelberg, (1998) 321-365

4. Bazan J., Nguyen H. S., Nguyen S. H., Synak P., and Wróblewski J.: Rough set algorithms in classification problem. In: Polkowski L., Tsumoto S., Lin T.Y. (eds.), Rough Set Methods and Applications, Physica-Verlag, Heidelberg, (2000) 49-88

5. Bazan J. G., Szczuka M. S., Wróblewski J.: A New Version of Rough Set Exploration System. In: Proceedings of the Third International Conference on Rough Sets and Current Trends in Computing RSCTC, Malvern, PA, Lecture Notes in Artificial Intelligence vol. 2475, Springer-Verlag (2002) 397-404. Available at: http://logic.mimuw.edu.pl/~rses/

6. Cabrera J. B. D., Ravichandran B., Mehra R. K.: Detection and classification of intrusions and faults using sequences of system calls. In: ACM SIGMOD Record, Special Issue: Special Section on Data Mining for Intrusion Detection and Threat Analysis, Vol. 30(4) (2001) 25-34


7. Cai Z., Guan X., Shao P., Peng Q., Sun G.: A Rough Set Theory Based Method for Anomaly intrusion Detection in Computer Network Systems. J Expert System 20(5) (2003) 251-259

8. Cios K., Pedrycz W., Swiniarski Roman W.: Data mining methods for Knowledge discovery. Kluwer Academic Publisher USA, (2000)

9. DARPA 1998 Data Set, MIT Lincoln Laboratory, available at: http://www.ll.mit.edu/IST/ideval/data/data_index.html

10. Delic D., Lenz Hans-J, Neiling M.: Improving the Quality of Association Rule Mining by Means of Rough Sets. In: Proceedings of the First International Workshop on Soft Methods in Probability and Statistics (SMPS’02), Warsaw (Poland) (2002)

11. Forrest S., Hofmeyr S. A., Somayaji A.: Computer Immunology. Communications of the ACM, 40(10) (1997) 88-96

12. Forrest S., Hofmeyr S. A., Somayaji A., Longstaff T. A.: A Sense of Self for Unix Processes. In: Proceedings of the 1996 IEEE Symposium on Research in Security and Privacy. Los Alamitos, CA. IEEE Computer Society Press, (1996) 120-128

13. Garvey T., Lunt T. F.: Model-based Intrusion Detection. In: Proceedings of the 14th National Computer Security Conference. (1991) 372-385

14. Grzymala-Busse J. W.: A New Version of the Rule Induction System LERS. Fundamenta Informaticae, 31(1) (1997) 27-39

15. Guan J. W., Bell D. A., Liu D. Y.: The Rough Set Approach to Association Rule Mining. In: Proceedings of the Third IEEE International Conference on Data Mining (ICDM’03), (2003)

16. Helman P., Liepins G.: Statistical Foundations of Audit Trail Analysis for the Detection of Computer Misuse. IEEE Transactions on Software Engineering, 19(9) (1993) 886-901

17. Hofmeyr S. A., Forrest S., Somayaji A.: Intrusion Detection Using Sequences of System Calls. Journal of Computer Security, 6 (1998) 151-180

18. Ilgun K.: USTAT: A Real-Time Intrusion Detection System for UNIX. In: Proceedings of the 1993 IEEE Symposium on Research in Security and Privacy. (1993) 16-28

19. Ilgun K., Kemmerer R. A., Porras P. A.: State Transition Analysis: A Rule-Based Intrusion Detection Approach. IEEE Transactions on Software Engineering 21(3) (1995) 181-199

20. Kemmerer R. A.: NSTAT: A Model-based Real-time Network Intrusion Detection System. Technical Report, Number TRCS97-18, Computer Science, University of California, Santa Barbara. (1998)

21. Kumar S., Spafford E.: A Pattern-Matching Model for Intrusion Detection. In: Proceedings National Computer Security Conference, (1994) 11-21

22. Lee W., Stolfo S., Chan P.: Learning Patterns from Unix Process Execution Traces for Intrusion Detection. In: Proceedings of the AAAI97 workshop on AI methods in Fraud and risk management. AAAI Press. (1997) 50-56

23. Lee W., Stolfo Salvatore J.: Data Mining Approaches for Intrusion Detection. In: Proceedings of the 7th USENIX Security Symposium (SECURITY-98), Usenix Association, January 26-29. (1998) 79-94

24. Lian-hua Z., Guan-hua Z., Lang YU., Jie Z., Ying-cai B.: Intrusion Detection Using Rough Set Classification. Journal of Zhejiang University SCIENCE Vol. 5(9) (2004) 1076-1086

25. Lin T. Y.: Anomaly Detection: A Soft Computing Approach. In: Proceedings of the 1994 Workshop on New Security Paradigms, Little Compton, Rhode Island, United States, IEEE Computer Society Press (1994) 44-53


26. Lingras P.: Rough Set Clustering for Web Mining. In: Proceedings of the IEEE International Conference on Fuzzy Systems 2002, Honolulu, Hawaii (2002)

27. Lunt T. F.: Using Statistics to Track Intruders. In: Proceedings of the Joint Statistical Meetings of the American Statistical Association (1990)

28. Lunt T. F., Tamaru A., Gilham F., Jagannathan R., Neumann P. G., Javitz H. S., Valdes A., Garvey T. D.: A Real-Time Intrusion Detection Expert System (IDES) Technical Report, SRI Computer Science Laboratory (1992)

29. Mukherjee B., Heberlein L. T., Levitt K. N.: Network Intrusion Detection. IEEE Network. 8(3) (1994) 26-41

30. Mukkamala R., Gagnon J., Jajodia S.: Integrating Data Mining Techniques with Intrusion detection Methods. In: Research Advances in database and Information System Security: IFIPTCII, 13th working conference on Database security, July, USA, Kluwer Academic Publishers (2000)

31. Pawlak Z.: Rough sets: Theoretical aspects of reasoning about data. Kluwer Academic Publishers, Dordrecht (1991)

32. Porras P. A.: STAT – A State Transition Analysis Tool For Intrusion Detection. Technical Report, Number TRCS93-25, Computer Science. University of California, Santa Barbara (1993)

33. Rawat S., Gulati V. P., Pujari A. K.: Frequency And Ordering Based Similarity Measure For Host Based Intrusion Detection. J Information Management and Computer Security. 12(5), Emerald Press (2004) 411-421

34. Sebring M. M., Shellhouse E., Hanna M. E., Whitehurst R. A.: Expert System in Intrusion Detection: A Case Study. In: Proceedings of the 11th National Computer Security Conference, (1988) 74-81

35. Stefanowski J.: On Rough Set Based Approaches to Induction of Decision Rules. In: Polkowski L, Skowron A (eds) Rough Sets in Data Mining and Knowledge Discovery, vol 1. Physica Verlag, Heidelberg. (1998) 500-529

36. Tandon G., Chan P.: Learning Rules from System Calls Arguments and Sequences for Anomaly Detection. In: ICDM Workshop on Data Mining for Computer Security (DMSEC), Melbourne, FL. (2003) 20-29

37. Warrender C., Forrest S., Pearlmutter B.: Detecting Intrusions Using System Calls: Alternative Data Models. In: IEEE Symposium on Security and Privacy (1999)

38. Wespi A., Dacier M., Debar H.: Intrusion Detection Using Variable-Length Audit Trail Patterns. In: LNCS # 1907, RAID 2000. Toulouse, France. (2000) 110-129

39. Zhu D., Premkumar G., Zhang X., Chu Chao-Hsien: Data mining for Network Intrusion Detection: A comparison of alternative methods. J. Decision Sciences 32(4) (2001) 635-660