Layered Conditional Random Fields for Network Intrusion Detection
4.3 Data Description
We perform our experiments with the benchmark KDD 1999 intrusion data set [12]. The data set is a version of the 1998 DARPA intrusion detection evaluation program, prepared and managed by the MIT Lincoln Labs. The data set contains about five million connection records as the training data and about two million connection records as the test data. In our experiments, we use the ten percent subsets of the training data and of the test data (with corrected labels), which are provided separately. This leads to 494,020 training and 311,029 test instances. Each record in the data set represents a connection between two IP addresses, starting and ending at well defined times with a well defined protocol. Further, every record is described by 41 different features and represents a separate connection; hence, in our experiments, we consider every record to be independent of every other record.
Table 4.1 gives the number of instances for every class in the data set. Every record in the training data is labeled either as normal or as one of 24 different kinds of attack. Each of the 24 attacks can be grouped into one of four classes: Probe, Denial of Service (DoS), unauthorized access from a remote machine or Remote to Local (R2L), and unauthorized access to root or User to Root (U2R).
Similarly, the test data is labeled either as normal or as one of the attacks belonging to the four attack classes. It is important to note that the test data includes specific attacks which are not present in the training data. This makes the intrusion detection task more realistic [12].
Table 4.1: KDD 1999 Data Set
Class     Training Set    Test Set
Normal          97,277      60,593
Probe            4,107       4,166
DoS            391,458     229,853
R2L              1,126      16,349
U2R                 52          68
Total          494,020     311,029
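The class imbalance in Table 4.1 is easier to judge as proportions. The following Python sketch only reuses the counts from the table; the function and variable names are illustrative and not part of the original work.

```python
# Instance counts copied from Table 4.1 as (training set, test set).
COUNTS = {
    "Normal": (97_277, 60_593),
    "Probe": (4_107, 4_166),
    "DoS": (391_458, 229_853),
    "R2L": (1_126, 16_349),
    "U2R": (52, 68),
}


def class_shares(counts, column):
    """Return each class's share (in percent) of one column of the table."""
    total = sum(row[column] for row in counts.values())
    return {name: 100.0 * row[column] / total for name, row in counts.items()}


train_total = sum(train for train, _ in COUNTS.values())
test_total = sum(test for _, test in COUNTS.values())
train_shares = class_shares(COUNTS, 0)
test_shares = class_shares(COUNTS, 1)

print(f"{'Class':<8}{'Train %':>10}{'Test %':>10}")
for name in COUNTS:
    print(f"{name:<8}{train_shares[name]:>10.3f}{test_shares[name]:>10.3f}")
```

The DoS class accounts for roughly four fifths of the training records, while U2R contributes only 52 records, so per-class results are more informative than overall accuracy.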
4.4 Methodology
Given the network audit patterns where every connection between two hosts is presented in a summarized form with 41 features, our objective is to detect most of the anomalous connections while generating very few false alarms. In our experiments, we used the KDD 1999 data set described in Section 4.3. Conventional methods, such as decision trees and naive Bayes, are known to perform well in such an environment; however, they assume observation features to be independent. We propose to use conditional random fields which can capture the correlations among different features in the data and hence perform better when compared with other methods.
The KDD 1999 data set represents multiple features, a total of 41, for every session in relational form with only one label for the entire record. In this case, using a conditional model would result in a maximum entropy classifier [108], [110]. However, we represent the audit data in the form of a sequence and assign a label to every feature in the sequence using the first order Markov assumption, instead of assigning a single label to the entire observation. Though this increases complexity, it also improves the attack detection accuracy. To manage complexity and improve the system’s performance, we integrate the layered framework, described in the previous chapter, with the conditional random fields to build a single system which is more efficient and more effective.
Figure 4.1 represents how conditional random fields can be used for detecting network intrusions.
Figure 4.1: Conditional Random Fields for Network Intrusion Detection
In the figure, the observation features ‘duration’, ‘protocol’, ‘service’, ‘flag’ and ‘source bytes’ are used to discriminate between attack and normal events. For every connection, each feature takes one of its possible values, and these values are used to determine the most likely sequence of labels: <attack, attack, attack, attack, attack> or <normal, normal, normal, normal, normal>. Custom feature functions can be defined which describe the relationships among different features in the observation. During training, feature weights are learnt; during testing, the features are evaluated for the given observation, which is then labeled accordingly. It is evident from the figure that every input feature is connected to every label, which indicates that all the features in an observation determine the final labeling of the entire sequence. Thus, a conditional random field can model dependencies among different features in an observation. Present intrusion detection systems do not consider such relationships. They either consider only one feature, as in the case of system call modeling, or assume independence among different features in an observation, as in the case of a naive Bayes classifier. Our experimental results, described in Section 4.5, suggest that conditional random fields can effectively model such relationships among different features of an observation, resulting in higher attack detection accuracy.
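To make the sequence view concrete, the following Python sketch treats one connection as a sequence of five observations and scores every possible label sequence with weighted feature functions. It is a simplified illustration, not the implementation used in the experiments: the feature values and weights are made up, the names `state_w`, `trans_w`, `score` and `conditional_probs` are hypothetical, and each state feature here looks only at the observation in its own position, whereas the figure connects every observation to every label.

```python
import math
from itertools import product

LABELS = ("normal", "attack")

# One connection as an ordered sequence of (feature name, value) pairs.
# The five features are those named for Figure 4.1; the values are made up.
record = [("duration", 0), ("protocol", "icmp"), ("service", "eco_i"),
          ("flag", "SF"), ("src_bytes", 8)]

# Hypothetical weights; a real model learns these during training.
# State weights: (feature name, value, label) -> weight.
state_w = {
    ("protocol", "icmp", "attack"): 1.2,
    ("service", "eco_i", "attack"): 0.8,
    ("flag", "SF", "normal"): 0.5,
}
# Transition weights (first order Markov): (previous label, label) -> weight.
trans_w = {("normal", "normal"): 1.0, ("attack", "attack"): 1.0}


def score(labels, seq):
    """Unnormalised log-score of one label sequence for one record."""
    total = 0.0
    for t, ((name, value), y) in enumerate(zip(seq, labels)):
        total += state_w.get((name, value, y), 0.0)
        if t > 0:
            total += trans_w.get((labels[t - 1], y), 0.0)
    return total


def conditional_probs(seq):
    """p(labels | record) for every label sequence, by brute force."""
    scores = {ys: score(ys, seq) for ys in product(LABELS, repeat=len(seq))}
    log_z = math.log(sum(math.exp(s) for s in scores.values()))
    return {ys: math.exp(s - log_z) for ys, s in scores.items()}


probs = conditional_probs(record)
best = max(probs, key=probs.get)
print(best, round(probs[best], 3))
```

Brute-force normalisation over all 2^5 label sequences is only feasible because the example is tiny; practical implementations use dynamic programming instead.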
We also note that in the KDD 1999 data set, attacks can be grouped into four classes: Probe, DoS, R2L and U2R. To treat this as a two class classification problem, the attacks belonging to all four attack classes can be re-labeled as attack and mixed with the audit patterns belonging to the normal class to build a single model which can be trained to detect any kind of attack. Another way to treat the same problem as a two class problem is to mix only the attacks belonging to a single attack class with the audit patterns belonging to the normal class, and to train a separate sub system for each of the four attack classes. The problem can also be considered as a five class classification problem, where a single system is trained with five classes (normal, Probe, DoS, R2L and U2R) instead of two. Such a system can easily identify an attack once it is detected but is very slow in operation, making its deployment impractical in high speed networks.
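The two binary formulations described above differ only in which records are kept and how they are re-labeled. The following Python sketch shows one way to build both kinds of training set. It assumes each record has already been mapped to one of the five classes; the function name `binary_training_set` and the toy records are hypothetical.

```python
ATTACK_CLASSES = ("Probe", "DoS", "R2L", "U2R")


def binary_training_set(records, attack_class=None):
    """Relabel (features, class) pairs as 'normal' or 'attack'.

    attack_class=None    -> single model: all four attack classes merged.
    attack_class="Probe" -> sub system: only normal and Probe records kept.
    """
    out = []
    for features, cls in records:
        if cls == "Normal":
            out.append((features, "normal"))
        elif attack_class is None or cls == attack_class:
            out.append((features, "attack"))
        # Records of other attack classes are dropped for a sub system.
    return out


# Toy records with made-up feature values, for illustration only.
toy = [
    ({"duration": 12}, "Normal"),
    ({"duration": 0}, "Probe"),
    ({"duration": 0}, "DoS"),
    ({"duration": 45}, "R2L"),
    ({"duration": 90}, "U2R"),
    ({"duration": 3}, "Normal"),
]

single_model_data = binary_training_set(toy)
per_class_data = {c: binary_training_set(toy, c) for c in ATTACK_CLASSES}

print(len(single_model_data), {c: len(d) for c, d in per_class_data.items()})
```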
As we will see from our experimental results, particularly from Table 4.14 in Section 4.6, considering every attack class separately not only improves the attack detection accuracy but also helps to improve the overall system performance when integrated with the layered framework.
Furthermore, it also helps to identify the class of an attack once it is detected at a particular layer in the layered framework. However, a drawback of this implementation is that it requires domain knowledge to perform feature selection for every layer. Nonetheless, this is a one-time process and, given the critical nature of the problem of intrusion detection, using domain knowledge is recommended whenever it helps to improve the attack detection accuracy.
Using conditional random fields improves the attack detection accuracy, particularly for the U2R attacks. They are also effective in detecting the Probe, R2L and the DoS attacks. However, when we consider all the 41 features in the data set for each of the four attack classes separately, conditional random fields can be expensive during training and testing. For a simple linear chain structure, the time complexity for training a conditional random field is O(TL^2NI), where T is the length of the sequence, L is the number of labels, N is the number of training instances and I is the number of iterations. During inference, the Viterbi algorithm [118], [119] is employed, which has a complexity of O(TL^2). The quadratic complexity is significant when the number of labels is large, as in language tasks. However, for intrusion detection there are only two labels, normal and attack, and thus our system is very efficient. We further improve the overall system performance by implementing the layered framework and performing feature selection, which decreases T, i.e., the length of the sequence. We now describe feature selection for all the four attack classes.
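The O(TL^2) cost of inference follows directly from the structure of the Viterbi recursion: for each of the T positions, every one of the L labels is compared against every one of the L possible previous labels. The following pure-Python sketch is a generic textbook version for a linear chain, not the code used in the experiments; the score tables in the example are made up.

```python
def viterbi(state_scores, trans_scores):
    """Most likely label index sequence for a linear chain.

    state_scores[t][y]: score of label y at position t (T x L).
    trans_scores[p][y]: score of moving from label p to label y (L x L).
    The nested loops over t, y and the previous label give O(T * L^2) time.
    """
    T, L = len(state_scores), len(trans_scores)
    best = list(state_scores[0])  # best score of any path ending in each label
    back = []                     # back-pointers for positions 1..T-1
    for t in range(1, T):
        new_best, pointers = [], []
        for y in range(L):
            p = max(range(L), key=lambda q: best[q] + trans_scores[q][y])
            pointers.append(p)
            new_best.append(best[p] + trans_scores[p][y] + state_scores[t][y])
        best = new_best
        back.append(pointers)
    # Follow the back-pointers from the best final label.
    y = max(range(L), key=lambda q: best[q])
    path = [y]
    for pointers in reversed(back):
        y = pointers[y]
        path.append(y)
    path.reverse()
    return path


LABELS = ("normal", "attack")
# Made-up scores for a five-observation record (rows: positions, columns: labels).
state = [[0.0, 0.0], [0.0, 1.2], [0.0, 0.8], [0.5, 0.0], [0.0, 0.0]]
trans = [[1.0, 0.0], [0.0, 1.0]]  # staying in the same label is rewarded

decoded = [LABELS[i] for i in viterbi(state, trans)]
print(decoded)
```

With L = 2 the inner comparison touches only four label pairs per position, which is why the sequence length T dominates the cost in this setting.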
4.4.1 Feature Selection
Attacks belonging to different classes are different and, hence, for better attack detection, it becomes necessary to consider them separately. As a result, in our layered system, we train every layer separately to optimally detect a single class of attack. We therefore select different features for different layers based upon the type of attack the layer is trained to detect. In Figure 4.2, we represent a detailed view of a single layer (Probe layer) which can be used to detect Probe attacks in our integrated system.
Figure 4.2: Representation of Probe Layer with Feature Selection
(Diagram labels: Audit Data (Normal + Probe); All Features; Feature Selection; Probe Layer; Normal? Yes: Allow, No: Block.)
The Probe layer is optimally trained to detect only the Probe attacks. Hence, we use only the Probe attacks and the normal instances from the audit data to train this layer. Other layers can be trained similarly. Note that we select different features to train different layers in our framework. Experimental results suggest that feature selection significantly improves the attack detection capability of our system. Ideally, we would like to perform feature selection automatically. However, experimental results in Section 4.6.2 suggest that present methods for automatic feature selection are not effective. Hence, we use domain knowledge to select features for all the four attack classes. We now describe our approach for selecting features for every layer and why some features were chosen over others.
1. Probe Layer – Probe attacks are aimed at acquiring information about the target network from a source which is often external to the network. Hence, basic connection level features such as the ‘duration of connection’ and ‘source bytes’ are significant, while features like ‘number of file creations’ and ‘number of files accessed’ are not expected to provide information for detecting Probe attacks.
2. DoS Layer – DoS attacks are meant to prevent the target from providing service(s) to its users by flooding the network with illegitimate requests. Hence, to detect attacks at the DoS layer, network traffic features such as the ‘percentage of connections having same destination host and same service’ and packet level features such as the ‘source bytes’ and ‘percentage of packets with errors’ are significant. To detect DoS attacks, it may not be important to know whether a user is ‘logged in or not’ and hence, such features are not considered in the DoS layer.
3. R2L Layer – R2L attacks are among the most difficult attacks to detect as they involve both the network level and the host level features. Hence, to detect R2L attacks, we selected both the network level features, such as the ‘duration of connection’ and ‘service requested’, and the host level features, such as the ‘number of failed login attempts’, among others.
4. U2R Layer – U2R attacks involve semantic details which are very difficult to capture at an early stage at the network level. Such attacks are often content based and target an application. Hence, for detecting U2R attacks, we selected features such as ‘number of file creations’ and ‘number of shell prompts invoked’, while we ignored features such as ‘protocol’ and ‘source bytes’.
From all the 41 features in the KDD 1999 data set, we select only five features for the Probe layer, nine features for the DoS layer, 14 features for the R2L layer and eight features for the U2R layer. Since every layer in our framework is trained independently, the feature sets of the four layers are not disjoint; a feature may be used in more than one layer. We list the features used for all the four layers in Appendix B.
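Because the inference cost grows linearly with the sequence length T, the feature counts above translate directly into a per-instance reduction in work. The following Python sketch computes the ratio between using all 41 features and using the selected features for each layer. It counts only the T * L^2 term from the complexity given earlier; actual running times also depend on the implementation and on the number of feature functions, which are not modelled here.

```python
FULL_T = 41       # all features in the KDD 1999 data set
NUM_LABELS = 2    # normal and attack

# Number of selected features per layer, as stated in the text.
SELECTED_T = {"Probe": 5, "DoS": 9, "R2L": 14, "U2R": 8}


def per_instance_cost(seq_len, num_labels=NUM_LABELS):
    """Viterbi work per instance, up to a constant factor: T * L^2."""
    return seq_len * num_labels ** 2


reduction = {
    layer: per_instance_cost(FULL_T) / per_instance_cost(t)
    for layer, t in SELECTED_T.items()
}

for layer, factor in reduction.items():
    print(f"{layer:<6} T = {SELECTED_T[layer]:>2}  about {factor:.1f}x less work per instance")
```

Under this simple model the Probe layer does about 8.2 times less work per instance than a model over all 41 features, and even the R2L layer, which has the largest feature set, does about 2.9 times less.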
4.4.2 Integrating the Layered Framework
The layered framework, introduced in Chapter 3, is general and can be tailored to build specific intrusion detection systems. In this section, we describe how we can integrate the layered framework with the conditional random fields to build an effective and efficient hybrid network intrusion detection system.
Given the four different attack classes in the KDD 1999 data, we implement a four layer system where every layer corresponds to a single attack class. The four layers are arranged in a sequence as represented in Figure 4.3.
Figure 4.3: Integrating Layered Framework with Conditional Random Fields
In the system, every layer is trained separately with the normal instances and with the attack instances belonging to a single attack class. The layers are then arranged one after the other in a sequence as shown in Figure 4.3. During testing, however, all the audit patterns (irrespective of their attack class, which is unknown) are passed into the system starting from the first layer. If the first layer, the Probe layer, detects the instance as an attack, the system labels the instance as a Probe attack and initiates the response mechanism; otherwise it passes the instance to the next layer. The same process is repeated at every layer until either the instance is detected as an attack or it passes the last layer without being detected, in which case it is labeled as normal. We now give the algorithm to integrate the layered framework with conditional random fields.
Algorithm: Integrating Layered Framework & Conditional Random Fields
Algorithm 1 Training
1: Select the number of layers, n, for the complete system.
2: Separately perform feature selection for each layer.
3: Train a separate model with conditional random fields for each layer using the features selected in Step 2.
4: Plug in the trained models sequentially such that only the connections labeled as normal are passed to the next layer.
Algorithm 2 Testing
1: For each (next) test instance perform Steps 2 through 5.
2: Test the instance and label it either as attack or normal.
3: If the instance is labeled as attack, block it, identify it as an attack of the class represented by the layer at which it is detected, and go to Step 1. Else pass the instance to the next layer.
4: If the current layer is not the last layer in the system, test the instance and go to Step 3. Else go to Step 5.
5: Test the instance and label it either as normal or as an attack. If the instance is labeled as an attack, block it and identify it as an attack corresponding to the layer name.
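The testing procedure in Algorithm 2 amounts to a cascade in which each layer sees only its own selected features and the first layer that raises an alarm decides the label. The following Python sketch illustrates the control flow. It is not the original implementation: the `Layer` class, the feature names, the stub models with their thresholds, and the layer order after the Probe layer are all assumptions made for illustration, and the stubs merely stand in for trained conditional random fields.

```python
from typing import Callable, Dict, List, Sequence

Record = Dict[str, object]
Model = Callable[[Record], str]  # hypothetical: returns "attack" or "normal"


class Layer:
    """One layer: a name, its selected features and a trained binary model."""

    def __init__(self, name: str, features: Sequence[str], model: Model):
        self.name = name
        self.features = list(features)
        self.model = model

    def is_attack(self, record: Record) -> bool:
        # The layer only sees the features selected for it.
        view = {f: record[f] for f in self.features}
        return self.model(view) == "attack"


def classify(record: Record, layers: List[Layer]) -> str:
    """Return the name of the first layer that flags the record,
    or "Normal" if every layer lets it through."""
    for layer in layers:
        if layer.is_attack(record):
            return layer.name  # blocked here; later layers are skipped
    return "Normal"


# Stub models with made-up rules, standing in for trained models.
def probe_stub(view: Record) -> str:
    return "attack" if view["duration"] == 0 and view["src_bytes"] < 10 else "normal"


def u2r_stub(view: Record) -> str:
    return "attack" if view["num_shells"] > 0 else "normal"


def always_normal(view: Record) -> str:
    return "normal"


layers = [
    Layer("Probe", ["duration", "src_bytes"], probe_stub),
    Layer("DoS", ["src_bytes"], always_normal),
    Layer("R2L", ["duration"], always_normal),
    Layer("U2R", ["num_shells"], u2r_stub),
]

probe_like = {"duration": 0, "src_bytes": 8, "num_shells": 0}
u2r_like = {"duration": 30, "src_bytes": 500, "num_shells": 1}
benign = {"duration": 30, "src_bytes": 500, "num_shells": 0}

for rec in (probe_like, u2r_like, benign):
    print(classify(rec, layers))
```

Returning as soon as a layer raises an alarm mirrors Step 3 of the testing algorithm: a blocked instance never reaches the later layers, so only connections that every earlier layer considers normal incur the cost of the full cascade.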