Machine Learning based SCADA Intrusion detection

3 Prior Research

3.3 Machine Learning based SCADA Intrusion detection

Researchers at the University of Surrey have published a paper presenting how machine learning methods can be used in intrusion detection systems that protect SCADA systems [35]. This subchapter summarizes the methods discussed in that paper.

Intrusion detection is the process of observing and analyzing the events taking place in an information system to discover signs of security-related issues. The output of intrusion detection systems (IDS) is traditionally analyzed by human security analysts. As the amount of data increases, this process becomes time-consuming and expensive. Machine learning can take in new data and make predictions based on previous data. Machine learning methods in intrusion detection systems could detect more attacks, reduce the number of false positives, and analyze data more efficiently than humans [35].

Rule-based Approach

This approach uses rules that describe the correlation between attribute conditions and class labels. When applied to intrusion detection, the rules become descriptive profiles of the normal behavior of users, programs and other resources. The intrusion detection mechanism identifies a potential attack if users or programs act inconsistently with the established rules. The rules can be written in if-then form. If there are too many rules, the system can become difficult to maintain and can suffer from poor performance [35], [36].
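The if-then form can be illustrated with a minimal sketch. The event fields, host names and thresholds below are hypothetical and chosen only for illustration; they are not taken from [35] or [36].

```python
# Hypothetical if-then rules describing violations of a normal profile.
# Field names, host names and thresholds are illustrative assumptions.
AUTHORISED_WRITERS = {"hmi-1", "eng-ws"}

RULES = [
    ("write command from unauthorised host",
     lambda e: e["function"] == "write" and e["source"] not in AUTHORISED_WRITERS),
    ("setpoint outside normal range",
     lambda e: e["function"] == "write" and not (0 <= e["value"] <= 100)),
    ("polling rate too high",
     lambda e: e["requests_per_second"] > 50),
]


def violated_rules(event):
    """Return the names of all rules whose 'if' condition holds for the event."""
    return [name for name, condition in RULES if condition(event)]


normal_event = {"source": "hmi-1", "function": "read",
                "value": 0, "requests_per_second": 2}
suspicious_event = {"source": "unknown-pc", "function": "write",
                    "value": 250, "requests_per_second": 2}

print(violated_rules(normal_event))
print(violated_rules(suspicious_event))
```

Every event is checked against every rule, which hints at why a very large rule set can become slow and hard to maintain.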

Artificial Neural Networks

This approach uses an artificial neural network (ANN): a network of simple processing neurons, organized in layers that include “hidden” units, which can predict complex behavior. That behavior is determined by the connections between the processing elements and by the element parameters. When applied to intrusion detection systems, an ANN can analyze data even if it is incomplete. Due to this capability, an ANN can learn abnormal behaviors and identify potential attacks, even if the attacks are similar to prior attacks but do not match the previous malicious behaviors exactly. ANNs are fast and support nonlinear data analysis. The main difficulty with an ANN is that it needs a large amount of training data to ensure accurate predictions [35].
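As a rough illustration of the nonlinear analysis mentioned above, the following sketch runs the forward pass of a network with one hidden layer. The weights are set by hand for the example (an assumption; a real ANN would learn them from training data), and the two binary input features are hypothetical. The network scores an input as suspicious only when exactly one of the two features is active, a pattern that no single linear threshold can represent.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, hidden_weights, hidden_biases, output_weights, output_bias):
    """Forward pass: input -> one hidden layer -> single output score in (0, 1)."""
    hidden = [
        sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    return sigmoid(sum(w * h for w, h in zip(output_weights, hidden)) + output_bias)


# Hand-set (not trained) parameters, assumed for illustration only.
HIDDEN_WEIGHTS = [(20.0, 20.0), (20.0, 20.0)]
HIDDEN_BIASES = [-10.0, -30.0]   # first unit acts like OR, second like AND
OUTPUT_WEIGHTS = (20.0, -20.0)   # output: OR and not AND
OUTPUT_BIAS = -10.0


def score(x):
    return forward(x, HIDDEN_WEIGHTS, HIDDEN_BIASES, OUTPUT_WEIGHTS, OUTPUT_BIAS)


for features in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    print(features, round(score(features), 3))
```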

Hidden Markov Model (HMM)

This approach uses the Hidden Markov Model (HMM), where the observed examples $y_t$, $t \in \{1, \dots, T\}$, have an unobserved state $x_t$ at time $t$. Each node in an HMM represents a random variable with hidden state $x_t$ and observed value $y_t$ at time $t$. In an HMM it is assumed that the state $x_t$ has a probability distribution over the observed samples $y_t$ and that the sequence of observed samples embeds information about the sequence of states. Statistically, the HMM is based on the Markov property that the current true state $x_t$ is conditioned only on the value of the hidden variable $x_{t-1}$ and is otherwise independent of earlier states. Similarly, the observation $y_t$ depends only on the hidden state $x_t$. The best-known method for fitting an HMM is the Baum-Welch algorithm, which derives the maximum likelihood estimate of the model parameters given a data set of output sequences. When applied to intrusion detection systems, HMMs can effectively model variations in system behavior. To apply an HMM to anomaly intrusion detection, we need a set of normal activity states $S = \{s_1, \dots, s_N\}$ and a set of normal observations $O = \{o_1, \dots, o_M\}$ [35], [37].

Given an observation sequence $Y = \{y_1, \dots, y_k\}$, the HMM searches for a normal state sequence $X = \{x_1, \dots, x_k\}$ whose predicted observation sequence is most similar to $Y$, together with a probability for examination. If this probability is less than a predefined threshold, we declare that the observation indicates an anomalous state [35].
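The search for the most likely normal state sequence and the threshold test can be sketched with the Viterbi algorithm. This is one common way to perform such a search and is an assumption here, since the source does not name the search procedure. The states, observation symbols, probabilities and threshold are all hypothetical; in practice the parameters would be estimated from normal traffic, for example with Baum-Welch.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (most likely state sequence, its joint probability with the observations)."""
    # best[s] = (probability, path) of the most likely path that ends in state s
    best = {s: (start_p[s] * emit_p[s][observations[0]], [s]) for s in states}
    for obs in observations[1:]:
        new_best = {}
        for s in states:
            prob, path = max(
                ((best[r][0] * trans_p[r][s], best[r][1]) for r in states),
                key=lambda item: item[0],
            )
            new_best[s] = (prob * emit_p[s][obs], path + [s])
        best = new_best
    prob, path = max(best.values(), key=lambda item: item[0])
    return path, prob


# Hypothetical model of normal behaviour (all values are assumptions).
STATES = ("idle", "polling")
START_P = {"idle": 0.6, "polling": 0.4}
TRANS_P = {"idle": {"idle": 0.7, "polling": 0.3},
           "polling": {"idle": 0.5, "polling": 0.5}}
EMIT_P = {"idle": {"read": 0.9, "write": 0.09, "diag": 0.01},
          "polling": {"read": 0.6, "write": 0.39, "diag": 0.01}}
THRESHOLD = 1e-3  # only meaningful for sequences of the same length


def is_anomalous(observations):
    _, prob = viterbi(observations, STATES, START_P, TRANS_P, EMIT_P)
    return prob < THRESHOLD


normal_sequence = ["read", "read", "write", "read"]
odd_sequence = ["diag", "diag", "diag", "diag"]
print(viterbi(normal_sequence, STATES, START_P, TRANS_P, EMIT_P))
print(is_anomalous(normal_sequence), is_anomalous(odd_sequence))
```

Because the probability shrinks with every additional observation, a fixed threshold like the one above only makes sense for fixed-length windows; a length-normalized log-probability would be a reasonable alternative.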

Support Vector Machines (SVM)

This approach uses Support Vector Machines (SVM), one of the leading machine learning tools, mostly used as a classifier. SVM is a family of learning algorithms for classifying data into two classes. It uses a function to map data into a space where it is linearly separable. The space into which the data is mapped may be of higher dimension than the initial space. The SVM finds a hyperplane that optimally separates the classes of data: the hyperplane is chosen so that its distance to the nearest training data points is maximal [35].
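The maximal-distance criterion can be made concrete with a small sketch that computes, for a given hyperplane, the distance to the nearest training point and compares a few candidate hyperplanes. The data points and candidates are hypothetical, and the code does not train an SVM; it only evaluates the quantity that SVM training maximizes.

```python
import math


def geometric_margin(w, b, points, labels):
    """Smallest signed distance from the hyperplane w.x + b = 0 to any point.

    The value is negative if at least one point lies on the wrong side.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(
        y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
        for x, y in zip(points, labels)
    )


# Hypothetical, linearly separable training data with labels -1 and +1.
points = [(1.0, 1.0), (2.0, 1.0), (4.0, 4.0), (5.0, 4.0)]
labels = [-1, -1, +1, +1]

# Three candidate hyperplanes (w, b), chosen by hand for comparison.
candidates = {
    "A": ((1.0, 0.0), -3.0),
    "B": ((1.0, 1.0), -5.5),
    "C": ((2.0, 3.0), -13.5),
}

margins = {name: geometric_margin(w, b, points, labels)
           for name, (w, b) in candidates.items()}
best = max(margins, key=margins.get)
print(margins)
print("largest margin:", best)
```

All three candidates separate the two classes, but they differ in how far they stay from the nearest points; an SVM searches over all hyperplanes for the one with the largest such distance.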

The SVM has shown superior performance on classification problems and has been used successfully in many real-world applications. However, its weakness is that it needs labelled data in advance and is very sensitive to noise [35], [38].

When applied to intrusion detection systems, patterns in the data that are normal or abnormal may not be obvious to operators, yet all of the above techniques rely on this prior information. Although these techniques have proved to be powerful classification tools, they are difficult to tune without labelled data [35].

One-Class SVM (OCSVM): CockpitCI Approach

This approach aims to overcome the issue described above. The OCSVM separates attack data from normal data and can be considered a regular two-class SVM in which all the data lies in the first class and the origin is the only member of the second class. The basic idea of the OCSVM is to map the input data into a high-dimensional space and construct an optimal separating hyperplane, defined as the one with the maximum separation (margin) between the two classes. This optimal hyperplane can be found easily using a dual formulation. The solution is sparse: only the support vectors are used to specify the separating hyperplane. The number of support vectors can be very small compared to the size of the training set, and only the support vectors matter for the prediction of future points. A kernel function can be used to compute the separating hyperplane without explicitly carrying out the mapping into the feature space, so all necessary computations are performed directly in the input space [35], [39].
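The idea of computing in the input space without carrying out the mapping can be checked numerically. The sketch below uses a degree-2 polynomial kernel as an example; the choice of kernel is an assumption made for illustration, and the source does not state which kernel is used. The kernel value computed from two input vectors equals the inner product of their explicitly mapped feature vectors.

```python
import math


def poly_kernel(x, z):
    """Homogeneous degree-2 polynomial kernel, evaluated in the 2-D input space."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2


def explicit_map(x):
    """Explicit 3-D feature map whose inner product reproduces poly_kernel."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


x = (1.0, 2.0)
z = (3.0, 4.0)

kernel_value = poly_kernel(x, z)
mapped_value = dot(explicit_map(x), explicit_map(z))
print(kernel_value, mapped_value)
```

Both values agree up to floating-point rounding, so a learner that only needs inner products never has to build the higher-dimensional vectors.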

Figure 26: OCSVM classification [35]

When applied to intrusion detection systems, the OCSVM is trained offline on collected data to generate a detection model. This model is then used for intrusion detection. If the decision function of the model returns a negative value, it implies an abnormal event. Unlike other classification methods, the OCSVM does not need any labelled data (no signatures required) for training, nor any information about the kind of intrusion [35].
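The sign test on the decision value can be sketched as follows, using the standard form of the one-class SVM decision function, f(x) = sum_i alpha_i k(x_i, x) - rho, with an RBF kernel. The support vectors, coefficients, offset and kernel width below are hypothetical stand-ins for the result of offline training, and the two features are assumed to be normalized traffic measurements; none of these values come from [35].

```python
import math


def rbf_kernel(x, z, gamma):
    """Gaussian (RBF) kernel between two feature vectors."""
    squared_distance = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-gamma * squared_distance)


# Hypothetical trained model (assumed values, not learned here).
SUPPORT_VECTORS = [(0.2, 0.3), (0.4, 0.1), (0.3, 0.5)]
ALPHAS = [0.4, 0.3, 0.3]
RHO = 0.5
GAMMA = 2.0


def decision_value(x):
    """f(x) = sum_i alpha_i * k(x_i, x) - rho; only support vectors contribute."""
    return sum(
        alpha * rbf_kernel(sv, x, GAMMA)
        for alpha, sv in zip(ALPHAS, SUPPORT_VECTORS)
    ) - RHO


def is_abnormal(x):
    return decision_value(x) < 0


typical_sample = (0.3, 0.3)   # close to the training data
unusual_sample = (3.0, 3.0)   # far from anything seen in training
print(decision_value(typical_sample), is_abnormal(typical_sample))
print(decision_value(unusual_sample), is_abnormal(unusual_sample))
```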

The researchers also provide a performance comparison of the described machine learning techniques, shown in Table 2.

Table 2: Performance comparison of machine learning techniques [35]
