CHAPTER VII CONCLUSIONS

To do successful research, you don’t need to know everything, you just need to know of one thing that isn’t known. - Arthur Schawlow

In this chapter, we provide a summary of the thesis and discuss possible extensions as future work.

7.1 Summary

This thesis addresses the problem stated earlier as “How to make the IDS fast enough to process data on-line and detect attacks early and in case of anomaly-based IDS, how to reduce the false positives to an accepted level, with a high detection rate.” To this end, we propose several techniques, each of which answers a part of the problem.

We study existing techniques and algorithms to understand their strengths and weaknesses and to motivate our own work. As a result, we provide a detailed survey of IDS techniques and methods. Our focus in this thesis is on anomaly-based IDS, and we propose techniques for both host-based and network-based systems. For host-based systems, we analyze the system calls invoked by processes; for network-based systems, we analyze network packets.

We study the behavior of Unix processes in terms of system calls and observe that not all of the system calls invoked by a process are necessary to characterize its behavior as normal or abnormal. Treating such system calls as redundant, or as noise, we use a linear algebraic technique, singular value decomposition (SVD), to reduce the noise. The idea is inspired by an information retrieval technique, latent semantic indexing. Although SVD reduces the dimension of the data by projecting it onto a lower-dimensional space, the new dimensions are very difficult to interpret. To show that SVD is nevertheless appropriate, we produce empirical results. We show that SVD removes only the not-so-important system calls, like mmap, from the data. We also compare our results with an already established scheme to show that the reduction in data does not degrade accuracy. We report results in terms of ROC curves and AUC scores. Such methods help make an IDS fast by reducing the amount of data to be analysed.
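The low-rank idea summarized above can be illustrated with a small sketch that truncates a system call by process count matrix to its k largest singular components. It is a minimal sketch under stated assumptions, not the implementation used in the thesis: the matrix values and function names are hypothetical, and plain power iteration with deflation stands in for a library SVD routine only to keep the example dependency-free.

```python
import math

def top_singular_triplets(A, k, iters=200):
    """Return up to k (sigma, u, v) triplets of A, largest sigma first.

    Assumption for this sketch: power iteration on the residual matrix,
    followed by deflation, replaces a library SVD routine.
    """
    m, n = len(A), len(A[0])
    R = [[float(x) for x in row] for row in A]  # residual, deflated in place
    triplets = []
    for _ in range(k):
        # Fixed non-uniform start vector (an arbitrary choice for this sketch).
        v = [1.0 / (j + 1) for j in range(n)]
        for _step in range(iters):
            u = [sum(R[i][j] * v[j] for j in range(n)) for i in range(m)]  # u = R v
            w = [sum(R[i][j] * u[i] for i in range(m)) for j in range(n)]  # w = R^T u
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        u = [sum(R[i][j] * v[j] for j in range(n)) for i in range(m)]
        sigma = math.sqrt(sum(x * x for x in u))
        if sigma < 1e-12:
            break  # nothing left in the residual
        u = [x / sigma for x in u]
        triplets.append((sigma, u, v))
        # Deflate: remove the component just found from the residual.
        for i in range(m):
            for j in range(n):
                R[i][j] -= sigma * u[i] * v[j]
    return triplets

def low_rank_approx(A, k):
    """Sum of the k leading sigma * u * v^T terms, i.e. a rank-k approximation of A."""
    m, n = len(A), len(A[0])
    approx = [[0.0] * n for _ in range(m)]
    for sigma, u, v in top_singular_triplets(A, k):
        for i in range(m):
            for j in range(n):
                approx[i][j] += sigma * u[i] * v[j]
    return approx

# Hypothetical counts: rows are system calls (open, read, mmap),
# columns are three processes.
counts = [[5, 4, 6],
          [3, 3, 2],
          [1, 0, 1]]
denoised = low_rank_approx(counts, k=2)
```

Power iteration converges slowly when two singular values are close, so a library SVD routine would normally be preferred for real data.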

Motivated by the work described in [120], we study the kNN-based classifier with the cosine metric and identify cases where the kNN-based scheme may produce erroneous results. We therefore propose a new similarity measure, termed the binary weighted cosine (BWC) metric. BWC is based on the frequency and the number of system calls common to two processes. The BWC metric is used to calculate similarity, and kNN is used to classify a process as normal or abnormal. This scheme is an example of a distance-weighted kNN classifier. We also considered applying the SVD approach to reduce the data further, but as our scheme involves two matrices, it is not feasible to do so. We extend the above scheme by including partial information about the order of occurrence of individual system calls in the process. For this purpose, we make use of the Kendall Tau distance, which is used in rank aggregation [51]. We provide experimental results for each of the schemes on DARPA’98 data, as ROC curves and AUC scores, and compare them with a relevant scheme proposed in the literature. However, calculating the Kendall Tau distance is computationally very expensive, and we observe that the gain in accuracy is not proportional to the additional computation time. This observation sets the ground and motivation for our next work.
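To make the cost of the ordering information concrete, the following sketch computes a Kendall Tau distance between two system call traces. It is an illustration under assumptions, not the exact formulation used in the thesis: each system call is ranked by its first occurrence in the trace, only calls common to both traces are compared, and the function names are hypothetical.

```python
from itertools import combinations

def first_occurrence_rank(trace):
    """Map each system call to the position of its first occurrence in the trace."""
    rank = {}
    for pos, call in enumerate(trace):
        rank.setdefault(call, pos)
    return rank

def kendall_tau_distance(trace_a, trace_b):
    """Count pairs of shared system calls whose relative order differs."""
    rank_a = first_occurrence_rank(trace_a)
    rank_b = first_occurrence_rank(trace_b)
    common = sorted(set(rank_a) & set(rank_b))
    discordant = 0
    # Every pair of shared calls is inspected, so the cost is quadratic
    # in the number of distinct shared system calls.
    for x, y in combinations(common, 2):
        if (rank_a[x] - rank_a[y]) * (rank_b[x] - rank_b[y]) < 0:
            discordant += 1
    return discordant

def normalized_kendall_tau(trace_a, trace_b):
    """Distance scaled to [0, 1]; 0.0 when fewer than two calls are shared."""
    n = len(set(trace_a) & set(trace_b))
    pairs = n * (n - 1) // 2
    return kendall_tau_distance(trace_a, trace_b) / pairs if pairs else 0.0

# Hypothetical traces that differ only in the order of read and write.
a = ["open", "read", "write", "close"]
b = ["open", "write", "read", "close"]
d = kendall_tau_distance(a, b)
```

Because the distance is computed for every pair of processes being compared, the quadratic pair loop is repeated many times, which is consistent with the computational expense noted above.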

To capture the ordering information of system calls, we should use short sequences of system calls as they appear in the process. We also notice that an abnormal process is not abnormal in its entirety: only a small part (or parts) of it is abnormal, while most of it is similar to a normal process.

We therefore model an IDS as a decision table and apply rough set [151] based techniques to extract features of normal and abnormal processes. The processes corresponding to some attacks and normal ones are represented in a decision table. The lower approxima-

rough set based rule learning algorithm - LEM2 [72] to generate IF-THEN type rules. These rules are used to classify processes as normal or abnormal. In this way, we are able to analyze a process while it is running, which makes the scheme suitable for an on-line intrusion detection system. We experiment on DARPA’98 data using the RSES tool. We are unable to provide an AUC score for this scheme because its classification of processes does not produce any ranking. We also faced difficulties while working with RSES: only a GUI is available, so it cannot be customized to the requirements of the experiments.
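In rough set terms, the lower approximation of a decision class contains the objects that can be assigned to that class with certainty from the condition attributes alone, and the upper approximation contains those that possibly belong to it. The sketch below computes both for a toy decision table. It is only an illustration of the standard definitions: the attribute names and values are hypothetical, and it neither implements LEM2 nor reproduces the behaviour of RSES.

```python
from collections import defaultdict

def indiscernibility_classes(table, attributes):
    """Group object ids whose values agree on every listed condition attribute."""
    classes = defaultdict(set)
    for obj_id, row in table.items():
        key = tuple(row[a] for a in attributes)
        classes[key].add(obj_id)
    return list(classes.values())

def approximations(table, attributes, decision, target):
    """Return (lower, upper) approximations of the objects labelled `target`."""
    concept = {o for o, row in table.items() if row[decision] == target}
    lower, upper = set(), set()
    for block in indiscernibility_classes(table, attributes):
        if block <= concept:   # block lies entirely inside the concept
            lower |= block
        if block & concept:    # block overlaps the concept
            upper |= block
    return lower, upper

# Hypothetical decision table: each object is a process, the condition
# attributes record whether a system call occurs, "label" is the decision.
table = {
    "p1": {"execve": 1, "chmod": 0, "label": "normal"},
    "p2": {"execve": 1, "chmod": 0, "label": "normal"},
    "p3": {"execve": 1, "chmod": 1, "label": "abnormal"},
    "p4": {"execve": 0, "chmod": 1, "label": "abnormal"},
    "p5": {"execve": 0, "chmod": 1, "label": "normal"},
}
lower, upper = approximations(table, ["execve", "chmod"], "label", "abnormal")
```

Here p4 and p5 are indiscernible but carry different labels, so neither can be placed in a lower approximation; only objects in a lower approximation support certain rules.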

Although we concentrated on host-based IDS throughout our work, the multiscaling property of wavelets in analyzing network traffic motivated us to explore their usability in IDS. The idea is inspired by the work described in [66][13]. The self-similarity exhibited by network traffic is taken as the characteristic of normal traffic, and it is quantified by estimating the Hurst parameter H. A loss of self-similarity in the network data can be attributed to the presence of some anomaly. We extend the energy-scale plot based scheme for estimating H so that it can detect the location of the anomaly and the scale at which the anomaly is exhibited. For this purpose, we use wavelet theory and the definition of self-similarity. The proposed scheme performs well on KDDcup’99 data.
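The basic energy-scale estimate of H referred to above can be sketched as follows. This is a minimal illustration under stated assumptions, not the scheme proposed in the thesis: it uses the Haar wavelet, an unweighted least-squares fit, and the relation slope = 2H - 1, which holds for a stationary long-range dependent series such as per-interval packet counts. It computes only a single global estimate and omits the localization extension. The function names and the white-noise check are hypothetical.

```python
import math
import random

def haar_detail_energies(signal, min_coeffs=16):
    """Return [(scale j, mean squared Haar detail coefficient)] for j = 1, 2, ..."""
    approx = list(signal)
    energies = []
    j = 1
    # Stop once a scale would have fewer than min_coeffs detail coefficients.
    while len(approx) // 2 >= min_coeffs:
        pairs = len(approx) // 2
        detail = [(approx[2 * k] - approx[2 * k + 1]) / math.sqrt(2) for k in range(pairs)]
        approx = [(approx[2 * k] + approx[2 * k + 1]) / math.sqrt(2) for k in range(pairs)]
        energies.append((j, sum(d * d for d in detail) / pairs))
        j += 1
    return energies

def estimate_hurst(signal):
    """Fit log2(energy) against scale j and return H = (slope + 1) / 2."""
    points = [(j, math.log2(e)) for j, e in haar_detail_energies(signal) if e > 0]
    n = len(points)
    mean_j = sum(j for j, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    slope = (sum((j - mean_j) * (y - mean_y) for j, y in points)
             / sum((j - mean_j) ** 2 for j, _ in points))
    return (slope + 1) / 2

# White Gaussian noise has no long-range dependence, so the energy-scale
# plot should be roughly flat and the estimate should lie near H = 0.5.
random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in range(2 ** 13)]
h_noise = estimate_hurst(noise)
```

A departure of the energy-scale points from a straight line over some range of scales is the kind of loss of self-similarity that the text attributes to an anomaly.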

We further extend the above-mentioned scheme. The proposed algorithm integrates the wavelet transform with singular value decomposition (SVD) for the analysis of self-similar network traffic data. It uses the properties of the SVD of a matrix of local energies of wavelet coefficients to determine the scales over which the data possibly behave normally and the locations at which the data are possibly anomalous. We concentrate more on the theoretical aspects of our work. To show the applicability of our method, we use a very small, known self-similar data set. However, to justify our approach empirically, we also apply it to real network data captured from an operational financial network, INFINET [84], and to the KDD data set [89].

We next discuss some of the possible future extensions of the work summarized above.

7.2 Future Work

In this thesis, we present various techniques that, in one way or another, help make IDS more efficient. There are several interesting future directions, a few of which are mentioned below.

• We used SVD to reduce the dimension of the data, but the result is difficult to interpret. Other techniques reduce the dimension by discarding features that are not very discriminatory, that is, they identify which system calls are really important for understanding the normal behavior of a process. In this context, entropy-based approaches such as information gain and rough set based techniques such as reducts can also be used. An advantage of such approaches is that they explicitly calculate the importance of each system call, so their results are easy to interpret.

• Almost all process behavior-based anomaly detection approaches have been proposed for Unix-based systems. It would be interesting to apply such approaches to Windows-based operating systems.

• In the BWC metric, we consider the frequency of individual system calls. Instead of a single system call, a combination of two or more system calls can be taken. In this way, we can also capture the co-occurrence of system calls, which may produce better results.

• There is a lack of formal analysis methods for IDS. Addressing this requires a mathematical model and reasoning based on it. In this direction, one possibility is to consider a process as a POMSET P (partially ordered multiset), by defining a relation ‘<’ as s_i < s_j, i ≠ j ⇒ system call s_i is followed by s_j, where s_i, s_j ∈ P. This is just an idea and requires further investigation.

• We used wavelets to analyze network data. Wavelets can also be used on system call data. Once we construct the incidence matrix A, defined in section 3.3, we

with other system calls, in a process. This may be useful in monitoring and profiling a process.
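The first direction listed above, scoring each system call explicitly, can be illustrated with information gain. The sketch below is a minimal example under assumptions: each process is reduced to the presence or absence of a system call, the traces and labels are hypothetical, and the function names are ours rather than from any existing tool.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(values, labels):
    """Reduction in label entropy after splitting on a discrete feature."""
    total = len(labels)
    by_value = {}
    for v, y in zip(values, labels):
        by_value.setdefault(v, []).append(y)
    remainder = sum(len(ys) / total * entropy(ys) for ys in by_value.values())
    return entropy(labels) - remainder

def rank_system_calls(traces, labels):
    """Score each system call by the information gain of its presence or absence."""
    calls = sorted({c for t in traces for c in t})
    scores = {c: information_gain([c in t for t in traces], labels) for c in calls}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical labelled traces.
traces = [["open", "read", "mmap"],
          ["open", "write", "mmap"],
          ["open", "chmod", "mmap"],
          ["open", "chmod", "read"]]
labels = ["normal", "normal", "abnormal", "abnormal"]
ranking = rank_system_calls(traces, labels)
```

In this toy data chmod separates the two classes perfectly while open occurs in every trace and carries no information, so a per-call score of this kind is directly interpretable in a way that SVD dimensions are not.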

7.3 Concluding Remarks

During the years spent on the work reported in this thesis, I experienced moments of joy and sorrow, excitement and resentment. Each failure forced me to think more and work harder and, of course, to knock on my supervisors’ doors more frequently. Although this is the end of the thesis, I see it as the beginning of my journey as a researcher, to contribute to and serve society more.

“The woods are lovely, dark and deep,
But I have promises to keep,
And miles to go before I sleep,
And miles to go before I sleep.”

From “Stopping by Woods on a Snowy Evening” - Robert Frost (1874 - 1963)

* * * * *