CHAPTER 2. Correlation based Node Behavior Profiling for Enterprise
2.2 Related Work
Profiling has been widely used in many areas of computer networking [7,13,16,17,19,25]. In this section, we focus mainly on node-level profiling. Before node-level profiling, many studies focused on flow-level profiling. In [17, 18, 104], the authors characterize Internet traffic for traffic classification by profiling it at the flow level. The flow-level features considered include flow duration, port number, inter-arrival time, bandwidth, and payload size. Flows are then clustered into groups so that the traffic type of flows on an unknown port can be identified when their characteristics match those of a known traffic-type group. However, because Internet traffic varies widely across networks, these approaches either suffer performance problems or produce unstable results on different traces [8]. In addition, the statistical metrics used in these studies, such as mean and variance, require a sufficient number of samples from a flow before its traffic type can be identified, which makes them unsuitable for short-lived malware traffic.
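To make the flow-level idea concrete, the following is a minimal sketch, not the method of [17, 18, 104]: it computes three of the features listed above (duration, mean inter-arrival time, total payload) from a flow's packets and labels the flow with the nearest known traffic-type centroid. The `Packet` record, the centroid values, and the use of unscaled Euclidean distance are illustrative assumptions; in practice the features would normally be normalized first. The sketch also shows why short flows are a problem: inter-arrival statistics are undefined until at least two packets have been seen.

```python
from dataclasses import dataclass
from math import dist


@dataclass
class Packet:
    """Hypothetical packet record: arrival time (s) and payload size (bytes)."""
    ts: float
    size: int


def flow_features(packets):
    """Return (duration, mean inter-arrival time, total payload) for one flow."""
    if len(packets) < 2:
        # Inter-arrival statistics need at least two packets, which is one
        # reason short-lived flows are hard to classify this way.
        raise ValueError("need at least two packets")
    times = sorted(p.ts for p in packets)
    duration = times[-1] - times[0]
    gaps = [b - a for a, b in zip(times, times[1:])]
    mean_iat = sum(gaps) / len(gaps)
    total_bytes = sum(p.size for p in packets)
    return (duration, mean_iat, total_bytes)


def classify(features, centroids):
    """Label a flow with the traffic type whose centroid is nearest (Euclidean)."""
    return min(centroids, key=lambda label: dist(features, centroids[label]))


# Hypothetical centroids of two known traffic-type groups.
centroids = {"web": (2.0, 0.1, 50_000), "dns": (0.05, 0.05, 300)}
pkts = [Packet(0.0, 100), Packet(0.02, 120), Packet(0.04, 90)]
print(classify(flow_features(pkts), centroids))  # dns
```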
One way to address these issues is node-level behavior profiling. In the literature, node behavior profiling approaches can generally be categorized as interest-based [23], entropy-based [4,21], and attribute-based [5,8,20,105,106]. These classes are distinguished by the features they select and the values they use for profiling. The interest-based approach [23] aims to prevent fast worm propagation by building host profiles and a set of if-then rules. The host profiles consist of different combinations of the four-tuple (protocol, srcIP, dstIP, dstport) drawn from historical data. Having found that historical data is a poor predictor of ephemeral port usage, the authors of [23] place no constraint on node pairs that have communicated via ephemeral ports in the profile. To identify ephemeral ports, they associate a two-tuple (number of connections, number of servers) with each destination port and then apply k-means clustering. In their evaluation, they show that the scheme effectively prevents simulated worm propagation. However, the paper does not establish whether the scheme can detect worm propagation.
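The ephemeral-port step of [23] can be illustrated with a small k-means sketch in which each destination port is summarized by the two-tuple (number of connections, number of servers). The port statistics below are invented, and the choice of k = 2 and the naive initialization (the first k points as starting centroids, where k-means++ seeding would be more robust) are assumptions for illustration; [23] may configure the clustering differently.

```python
from math import dist


def kmeans(points, k, iters=100):
    """Plain Lloyd's k-means; the first k points are the initial centroids."""
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: dist(p, centroids[c]))
                      for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
    return labels, centroids


# Hypothetical per-port statistics: port -> (number of connections, number of servers).
port_stats = {80: (5000, 40), 51234: (2, 2), 443: (4200, 35),
              50877: (1, 1), 22: (3000, 10), 49152: (3, 2)}
ports = list(port_stats)
labels, _ = kmeans([port_stats[p] for p in ports], k=2)
for port, lab in zip(ports, labels):
    print(port, lab)
```

In this toy data, the cluster containing the rarely used high ports (51234, 50877, 49152) is the one that would be treated as ephemeral; which cluster plays that role is a judgment made after clustering.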
Entropy-based approaches [4, 21] use entropy to summarize feature distributions and thereby reduce the dimensionality of the features. Specifically, srcIP, srcport, dstIP, and dstport are treated as four features for each host, and the entropy of each feature is updated in every time interval. By clustering known attacks, the authors of [21] identify various synthetically injected attacks, such as DoS attacks and worm scans. In [4], the authors fix one feature and study how the entropy of the other three changes, which allows them to extract significant behaviors associated with the fixed feature. Their evaluation shows that normal behaviors can be differentiated from abnormal ones.
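The per-interval entropy summary described above can be sketched as follows: for one host and one time interval, compute the Shannon entropy (in bits) of the empirical distribution of each of the four features. The flow-record format (a dict keyed by feature name) and the example addresses are hypothetical, and the subsequent clustering step of [21] is not modeled.

```python
from collections import Counter
from math import log2

FEATURES = ("srcIP", "srcport", "dstIP", "dstport")


def entropy(values):
    """Shannon entropy (bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    n = sum(counts.values())
    # Written as p * log2(1/p) so a single repeated value gives exactly 0.0.
    return sum((c / n) * log2(n / c) for c in counts.values()) if n else 0.0


def interval_entropies(flows):
    """Entropy of each feature over one interval's flows for a single host.

    `flows` is a list of dicts keyed by FEATURES (a hypothetical record format).
    """
    return {f: entropy(flow[f] for flow in flows) for f in FEATURES}


# A scan-like interval: one source, many destinations, a single destination port.
flows = [{"srcIP": "10.0.0.5", "srcport": 40000 + i,
          "dstIP": f"192.168.1.{i}", "dstport": 445} for i in range(4)]
print(interval_entropies(flows))
```

For this interval, dstIP and srcport each take 4 equally likely values (2 bits), while srcIP and dstport are constant (0 bits); a sharp rise in dstIP entropy combined with low dstport entropy is the kind of change such approaches look for.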
Among attribute-based approaches, the authors of [20] propose BLINC for studying node behavior profiles. BLINC is built on graphlets over the five-tuple (srcIP, protocol, dstIP, srcport, dstport), which can model the transport-layer flow patterns of specific applications at each host.
Given known flow patterns for different applications, BLINC is further used for traffic classification at the aggregate level. As shown in [20], it can classify approximately 80%-90% of the flows in each trace with 95% accuracy. However, as pointed out in [5], this approach is suitable only in a supervised setting, in which the flow patterns of the different applications are already known.
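As a rough illustration of the graphlet representation, the sketch below arranges the five-tuple fields as ordered columns and lets each flow contribute a path of edges between consecutive columns, so a host's graphlet is the union of these edges. The `Flow` record and the example addresses are hypothetical, and BLINC's matching of host graphlets against application templates is not shown.

```python
from collections import namedtuple

# Column order of the graphlet, following the five-tuple in the text.
Flow = namedtuple("Flow", "srcIP protocol dstIP srcport dstport")
COLUMNS = Flow._fields


def build_graphlet(flows):
    """Return the set of edges linking consecutive graphlet columns.

    Nodes are (column, value) pairs so that equal values appearing in
    different columns (e.g. the same number as a srcport and a dstport)
    remain distinct nodes.
    """
    edges = set()
    for flow in flows:
        nodes = [(col, getattr(flow, col)) for col in COLUMNS]
        edges.update(zip(nodes, nodes[1:]))  # one path per flow
    return edges


# A host acting as a web client toward two servers (hypothetical flows).
flows = [Flow("10.0.0.1", "TCP", "1.2.3.4", 40000, 80),
         Flow("10.0.0.1", "TCP", "5.6.7.8", 40001, 80)]
for edge in sorted(build_graphlet(flows)):
    print(edge)
```

The shared srcIP-to-protocol edge appears once, while the two destinations fan out into separate paths that converge again on dstport 80; this fan-out and convergence shape is what makes graphlets useful for recognizing application behavior.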
An extended version of the graphlet is further proposed in [5] by the same authors. By adding a dstIP field to the original five-tuple of the graphlet, together with a set of rules (e.g., delay-accept and aging) for constructing activity and profile graphlets, they achieve a good balance between a concise representation of node behavior and a meaningful characterization in their initial study on real user data. Similarly, in [8], the authors consider a different set of five features (dailydestnum, dailybytenum, tcpport, udpport, communication similarity) to capture node behavior. Rather than using graph representations, they store profiles in an XML-like format and use agglomerative clustering to group nodes with similar values of these five features. Evaluations on real user data show that their scheme has the potential to detect worm outbreaks. In addition, the authors of [105, 106] build a relatively simple host behavior profile that uses only the number of destinations contacted and a list of destination port numbers.
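The agglomerative grouping step used in [8] can be sketched generically as bottom-up clustering: every node starts in its own cluster, and the two closest clusters are merged repeatedly until no pair is closer than a threshold. This sketch assumes each node's profile has already been reduced to a numeric vector (for instance, normalized dailydestnum and dailybytenum); the single-linkage rule, the threshold, and the example vectors are assumptions, and how [8] handles the port and communication-similarity attributes is not modeled.

```python
from math import dist


def agglomerative(points, threshold):
    """Single-linkage agglomerative clustering with a stopping distance.

    Starts with one cluster per point and repeatedly merges the two closest
    clusters until the closest pair is farther apart than `threshold`.
    Returns clusters as sorted lists of point indices.
    """
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Single linkage: distance between the closest members of two clusters.
        return min(dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > 1:
        pairs = [(linkage(clusters[x], clusters[y]), x, y)
                 for x in range(len(clusters))
                 for y in range(x + 1, len(clusters))]
        d, x, y = min(pairs)
        if d > threshold:
            break
        clusters[x].extend(clusters[y])
        del clusters[y]  # y > x, so index x is unaffected
    return [sorted(c) for c in clusters]


# Hypothetical normalized profile vectors for four nodes A, B, C, D.
names = ["A", "B", "C", "D"]
vectors = [(0.10, 0.20), (0.12, 0.22), (0.11, 0.19), (0.90, 0.95)]
for cluster in agglomerative(vectors, threshold=0.1):
    print([names[i] for i in cluster])
```

Here A, B, and C merge into one group of similar nodes, while D, whose profile differs sharply, remains alone; in a detection setting, such an isolated or newly formed cluster is the kind of outlier that would warrant a closer look.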
The authors of [102] propose a network security architecture in which past behavioral patterns of network hosts are shared and reported. However, these patterns cover only attack behaviors observed from network entities; normal behavior patterns are not considered. In addition, as pointed out in [8], such a design introduces trust problems and cannot detect anomalies in online traffic in real time.
In summary, although the work above is effective at detecting abnormal traffic, most of these studies rely on examples to illustrate the feasibility of their schemes.
Moreover, few of them evaluate their schemes thoroughly in terms of false positive rates. The work in [8] is the most similar to ours; however, our work differs from it mainly in the following ways:
1. Most importantly, we profile node behavior in a way that can be coupled with further statistical analysis, and we consider a more complex case in which multiple new worms are used, which could further be applied to botnet construction; neither aspect was studied in [8];
2. We evaluate the proposed detection metric by its false positive rate, an important metric that is hard to measure and hence was not considered in [8];
3. We profile all nodes in the network, whereas in [8] only about 17% of the nodes have profiles (only the most active normal nodes can be profiled, which may be insufficient for early-stage malware detection);
4. We use shorter time intervals and jointly consider temporal and node correlations to build profiles, whereas [8] builds profiles for individual nodes cumulatively over longer time intervals (1 day).
In addition, it is unclear whether the behavior profiles in [8] can be interpreted meaningfully, whereas this chapter shows that our approach makes it easy to interpret network traffic at the node level.