Figure 2.3: The core architectural components of an NIDS and the key data items the components transfer (diagram labels: Intrusion Detection, Response; Network Traffic, Traffic Features, Alarms, Actions).
SNIDSs and ANIDSs have different advantages and disadvantages. SNIDSs do not suffer from many of the difficulties discussed in the previous section (Section 2.2.1). In theory, these systems have low false positive rates and are highly effective at recognizing known attacks. To derive signatures, they need examples of intrusions rather than datasets that are difficult to obtain. Finally, countermeasures can be tailored to specific attacks. For these and other reasons, SNIDSs are the most widespread NIDSs (e.g., Bro [Paxson, 1999], Snort [Roesch, 1999], and Suricata, https://www.openinfosecfoundation.org). Creating signatures is a major challenge of SNIDSs, and many researchers have created automated mechanisms, often based on ML, to derive signatures (e.g., [Kim et al., 2004; Kreibich et al., 2004]). We have also worked in this direction [Vasilomanolakis, Srinivasa, Cordero, et al., 2016]; however, this thesis concentrates on ANIDSs because, as we argue, the effectiveness of SNIDSs is in decline. SNIDSs need accurate signature databases to be effective. However, creating and maintaining these databases is becoming daunting due to
b a c k g r o u n d a n d r e l at e d w o r k
the constant appearance of novel threats, the popularization of encrypted communication channels, and the growth of attack surfaces. Under these conditions, even large and constantly updated signature databases are not sufficient to protect networks.
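To make the dependence on signature databases concrete, the following is a minimal, hypothetical Python sketch of signature matching: each signature is a pattern tied to a known attack, and a payload raises an alarm only if it contains one of the stored patterns. The signature names and patterns are invented for illustration; real SNIDSs such as Snort or Suricata use far richer rule languages.

```python
import re

# Hypothetical signature database: attack name -> regular expression over raw payload bytes.
SIGNATURES = {
    "path-traversal": re.compile(rb"\.\./\.\./"),
    "sql-injection": re.compile(rb"(?i)union\s+select"),
}

def match_signatures(payload: bytes) -> list[str]:
    """Return the names of all signatures found in the payload (an empty list means no alarm)."""
    return [name for name, pattern in SIGNATURES.items() if pattern.search(payload)]

# A known attack is recognized...
print(match_signatures(b"GET /../../etc/passwd HTTP/1.1"))
# ...but a slightly obfuscated variant (URL-encoded dots) slips through.
print(match_signatures(b"GET /%2e%2e/%2e%2e/etc/passwd HTTP/1.1"))
```

The same blindness applies to encrypted payloads, in which the pattern never appears in cleartext; every such variant requires a new or updated signature.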
Despite some of their problematic aspects, ANIDSs have a quality that makes them more appropriate than SNIDSs in many circumstances: ANIDSs can detect attacks that have not been seen before. For this reason, anomaly detection methodologies are still actively being developed in the field of NIDSs [Pimentel et al., 2014]. A popular and effective technique to identify anomalies in network traffic is the subspace method [Lakhina, Crovella, and Christophe Diot, 2004]. This method consists of splitting network traffic into disjoint normal and abnormal subspaces using techniques such as PCA [Ringberg et al., 2007]. Many researchers, however, have criticized PCA, stating that the mechanism is not robust enough (e.g., [X. Li et al., 2006]). On the other hand, recently proposed modifications to PCA have made it more robust within networks (e.g., [Chen et al., 2016]). In Chapter 4, we propose a robust isomorphic alternative to PCA that detects network traffic anomalies.
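A minimal sketch of the subspace method, written in plain Python to stay dependency-free, is shown below. The top-k principal components of normal traffic features span the normal subspace, and the squared norm of a measurement's residual (its projection onto the remaining, abnormal subspace) serves as the anomaly score. The function names, the use of power iteration to find the components, and the toy data are our own assumptions; Lakhina et al. additionally derive a statistical threshold (the Q-statistic) for this score, which is omitted here.

```python
import math
import random

def fit_normal_subspace(rows, k, iters=200):
    """Return the mean and the top-k principal directions (the normal subspace) of rows."""
    n, d = len(rows), len(rows[0])
    mu = [sum(col) / n for col in zip(*rows)]
    # Sample covariance matrix of the centered measurements.
    cov = [[0.0] * d for _ in range(d)]
    for r in rows:
        c = [x - m for x, m in zip(r, mu)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += c[i] * c[j] / (n - 1)
    rng = random.Random(0)
    basis = []
    for _ in range(k):
        # Power iteration converges to the dominant eigenvector of cov.
        v = [rng.random() + 0.1 for _ in range(d)]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w)) or 1.0
            v = [x / norm for x in w]
        lam = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        # Deflation: remove the found component so the next pass finds the next one.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
        basis.append(v)
    return mu, basis

def residual_energy(x, mu, basis):
    """Squared norm of x's projection onto the abnormal (residual) subspace."""
    c = [xi - mi for xi, mi in zip(x, mu)]
    for v in basis:
        p = sum(ci * vi for ci, vi in zip(c, v))
        c = [ci - p * vi for ci, vi in zip(c, v)]
    return sum(ci * ci for ci in c)

# Toy training data: three correlated features per time bin (hypothetical values).
noise = random.Random(1)
rows = [[t + noise.gauss(0, 0.1), 2 * t + noise.gauss(0, 0.1), 3 * t + noise.gauss(0, 0.1)]
        for t in range(1, 21)]
mu, basis = fit_normal_subspace(rows, k=1)
print(residual_energy([10, 20, 30], mu, basis))  # follows the learned correlation: small
print(residual_energy([10, 20, 5], mu, basis))   # breaks the correlation: large
```

The score is large only when a measurement leaves the normal subspace, which is why the method can flag traffic whose individual feature values are each unremarkable.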
2.2.3 Anomaly-based Network Intrusion Detection
ANIDSs are based on the suspicious hypothesis, which states that anomalous events are suspicious from a security point of view [Estevez-Tapiador et al., 2004]. In the context of network traffic, an anomalous event refers to traffic that does not conform to the expected behavior of a network. Anomalous events are found using an anomaly detection system (M, D), as defined in Section 2.1.4. In the context of an ANIDS, the model of normality M represents normal network traffic, while the similarity measure D computes the distance between arbitrary network traffic and M. If, according to D, the distance between network traffic and M exceeds a predetermined threshold, the traffic is considered anomalous. The model of normality M can be represented in many ways. Matthew V. Mahoney and P. Chan [2003] use conditional rules to model normal behavior. Autoencoders, a type of neural network, have been used successfully in the literature to model normality (e.g., [Dau et al., 2014]). Even techniques borrowed from signal processing, such as wavelets and the Fourier transform, are used to create models of normality [Jiang et al., 2014].
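As a minimal illustration of the pair (M, D), the sketch below (Python, with hypothetical names) learns M as the per-feature mean and standard deviation of normal feature vectors and uses as D the Euclidean distance after standardization. Traffic whose distance exceeds a threshold (3.0 here, an arbitrary choice) is flagged. This is only one of many possible choices for M and D, not the model used in this thesis.

```python
import math
from dataclasses import dataclass

@dataclass
class NormalityModel:
    """M: per-feature mean and standard deviation learned from normal traffic."""
    mean: list[float]
    std: list[float]

def learn_model(normal_features: list[list[float]]) -> NormalityModel:
    n = len(normal_features)
    cols = list(zip(*normal_features))
    mean = [sum(c) / n for c in cols]
    # A zero standard deviation is replaced by 1.0 to avoid division by zero.
    std = [math.sqrt(sum((x - m) ** 2 for x in c) / n) or 1.0 for c, m in zip(cols, mean)]
    return NormalityModel(mean, std)

def distance(model: NormalityModel, features: list[float]) -> float:
    """D: Euclidean distance between the standardized features and the model's mean."""
    return math.sqrt(sum(((x - m) / s) ** 2
                         for x, m, s in zip(features, model.mean, model.std)))

def is_anomalous(model: NormalityModel, features: list[float], threshold: float = 3.0) -> bool:
    return distance(model, features) > threshold

# Hypothetical normal observations, e.g., (packets per second, active hosts).
model = learn_model([[100, 10], [110, 12], [90, 8], [105, 11], [95, 9]])
print(is_anomalous(model, [102, 10]), is_anomalous(model, [100, 30]))  # → False True
```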
Anomaly detection uses two independent components to construct normality models and detect intrusions. The diagram in Figure 2.4 shows a simplified example of how information flows when constructing normality models and detecting intrusions. Networks generate traffic, and one or more sensors collect the traffic to extract features. Features are aggregated and then distributed to the modeling component. This component is responsible for learning a model M
2.2 n e t w o r k i n t r u s i o n d e t e c t i o n s y s t e m s
to represent the normality of the features it receives. The learned model is shared with the detection component, which, in turn, uses it to detect intrusions. To do so, the detection component compares the features it receives against the normality model M and, if their distance according to the similarity measure D exceeds a threshold, labels the traffic as anomalous. Consequently, ANIDSs need to be trained before they can detect intrusions. Note that the training process (i.e., creating a normality model) does not involve labeled network traffic; it assumes that only normal network traffic is used.
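The separation between the two components can be sketched as follows. This is a hypothetical, minimal Python illustration: features reported by several sensors are pooled by a modeling component, which learns a deliberately simple model M (the observed range of each feature) without any labels, and the detection component receives M and applies a distance D with a threshold. The class names, the range-based model, and the threshold value are assumptions made for illustration only.

```python
from dataclasses import dataclass, field

@dataclass
class ModelingComponent:
    """Collects unlabeled features (assumed normal) and learns a normality model M."""
    collected: list = field(default_factory=list)

    def receive(self, features):
        self.collected.append(list(features))

    def learn(self):
        # Hypothetical model M: the observed (min, max) range of every feature.
        return [(min(col), max(col)) for col in zip(*self.collected)]

@dataclass
class DetectionComponent:
    """Receives the shared model M and labels new feature vectors."""
    model: list
    threshold: float = 0.1  # hypothetical tolerance, relative to each range width

    def distance(self, features):
        # D: largest relative amount by which any feature leaves its normal range.
        worst = 0.0
        for x, (lo, hi) in zip(features, self.model):
            width = (hi - lo) or 1.0
            worst = max(worst, max(lo - x, x - hi, 0.0) / width)
        return worst

    def is_anomalous(self, features):
        return self.distance(features) > self.threshold

# Two sensors report feature vectors, e.g., (packets per second, distinct destinations).
sensor_reports = {"sensor-a": [[10, 1], [12, 2]], "sensor-b": [[11, 1], [14, 3]]}
modeler = ModelingComponent()
for reports in sensor_reports.values():
    for features in reports:
        modeler.receive(features)
detector = DetectionComponent(model=modeler.learn())  # the learned model is shared
print(detector.is_anomalous([13, 2]), detector.is_anomalous([40, 2]))  # → False True
```

Keeping the two components separate means the model can be retrained on fresh traffic and redistributed without changing the detection logic.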
Figure 2.4: Information flow of an anomaly detection system. Network traffic is monitored by sensors that extract features. Sensors send features to a modeling component that is responsible for creating a normality model. The detection component uses the normality model to determine whether features are abnormal. (Diagram labels: Network, Sensors, Modeling Component, Detection Component, Model, Traffic, Features.)
In principle, normality models can be constructed using arbitrary selections of features. In large networks, however, modern anomaly detection techniques commonly use the entropy of IP header fields as features. Entropy is a metric that efficiently captures the dispersion and concentration of a distribution [Ringberg et al., 2007]. Most network-wide intrusions affect the dispersion and concentration of IP header fields; for example, a port scan disperses the distribution of destination ports, whereas a flooding attack against one host concentrates the distribution of destination addresses. Therefore, entropy is a suitable metric for learning normality models that represent large networks. Lakhina, Crovella, and Christophe Diot [2004] provided most of the analysis that made the (Shannon) entropy of IP header fields a default feature in most subsequent work. Many other researchers have also experimented with other types of entropy as features and successfully demonstrated their usefulness (e.g., the nonextensive or Tsallis entropy [Tellenbach et al., 2011; Ziviani et al., 2007]). Beyond the entropies of IP header fields, researchers have created improved anomaly detectors by also using the entropy of other feature types. A notable example is proposed by Nychis et al. [2008], who improve anomaly detection by using the entropy of the behavior of flows (i.e., the in- and out-degree distributions of hosts).
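The following Python sketch computes the Shannon and Tsallis entropies of a header field's empirical distribution within one time bin. The port values and bins are invented for illustration; the point is that a port scan disperses the destination-port distribution and thus raises its entropy, whereas traffic concentrated on a few services yields lower entropy. As q approaches 1, the Tsallis entropy recovers the Shannon entropy (measured in nats).

```python
import math
from collections import Counter

def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of values."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def tsallis_entropy(values, q):
    """Tsallis (nonextensive) entropy S_q = (1 - sum_i p_i^q) / (q - 1), for q != 1."""
    counts = Counter(values)
    n = len(values)
    return (1.0 - sum((c / n) ** q for c in counts.values())) / (q - 1.0)

# Destination ports observed in two hypothetical time bins of 16 packets each.
normal_bin = [80] * 6 + [443] * 6 + [53] * 4  # traffic concentrated on a few services
scan_bin = list(range(1000, 1016))            # a port scan: 16 distinct ports, once each
print(shannon_entropy(normal_bin))
print(shannon_entropy(scan_bin))  # → 4.0, the maximum (log2 16) for 16 packets
print(tsallis_entropy(normal_bin, q=2), tsallis_entropy(scan_bin, q=2))
```

In an ANIDS, one such value per header field (source/destination address and port) and per time bin would form the feature vector passed to the modeling component.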
The diagram in Figure 2.4 assumes that one entity is responsible for building one single model of normality. Likewise, the diagram
implies that only one entity is responsible for using the normality model in the detection component. These two assumptions limit the scalability of the system. To cope with this limitation, researchers propose groups of collaborative NIDSs, or Collaborative Intrusion Detection Systems (CIDSs). These systems are the topic of the next section.
2.3 Collaborative Intrusion Detection Systems
The need to detect collaborative attacks has brought forth collaborative defenses. Collaborative Intrusion Detection Systems (CIDSs) are collections of NIDSs that collaborate to detect widespread intrusions. Computer networks can reach enormous sizes, creating an environment where attackers can easily conceal their activities. The goal of a CIDS is to detect those undesired activities that would otherwise be overlooked by individual NIDSs. A CIDS is composed of multiple sensors, communication channels, and one or more analysis units. As in an NIDS, sensors are responsible for monitoring local network traffic. Analysis units, in contrast to those of an NIDS, can be numerous and have different roles depending on whether the CIDS collaborates at the detection level or at the alarm level (see Section 2.3.2). Together, the analysis units share the responsibilities of the modeling and detection components of NIDSs (see Figure 2.4).
CIDSs are complex systems that can be organized according to different criteria. In the next two sections, we expand upon two different organizational paradigms of CIDSs. The first paradigm considers the communication overlay of a CIDS. The second paradigm takes into account the collaboration level at which a CIDS operates. Regardless of how they are organized, CIDSs are made up of the same components. The architectural components of CIDSs are presented in Section 2.3.3. This architecture plays an important role in this thesis, as it serves as the foundation on which the thesis' contributions are organized.
2.3.1 CIDS Communication Overlays
According to the communication overlay they use, CIDSs can be organized into three different classes. Figure 2.5 shows the three communication overlays by which related work can be organized [Vasilomanolakis, Karuppayah, et al., 2015]. Centralized CIDSs tend toward the best detection accuracy, given that the data of all sensors is analyzed by one single analyzer. Their obvious deficiency is that they do not scale well to large networks. Hierarchical CIDSs alleviate the scalability issue by creating hierarchies of analyzers. At each level of the hierarchy, an analyzer processes the data of a limited number of sensors. Analyzers may collect (and aggregate) the results of other analyzers to create
meta-models. The lower an analyzer is in the hierarchy, the narrower its view and, theoretically, the less accurate its network-wide detection capabilities. Both centralized and hierarchical overlays have