This section presents the significant contributions the study has made to the knowledge domain.
6.2.1 Deep Packet Inspection (DPI)
Deep Packet Inspection (DPI) is the closer examination of packet-level data to gain further insight into network traffic communications. This deeper examination brings challenges when DPI is applied for intrusion detection purposes. Pimenta Rodrigues et al. (2017) identified two such challenges: processing the increased volume of data needed to conduct DPI, and storing that increased volume of data.
The findings from this study suggest that a reduced dataset that evenly and coherently represents the respective full dataset could address the challenges of applying DPI for intrusion detection purposes. The reduced datasets lifted the processing burden of DPI: this study has shown that selected machine learning algorithms are more efficient at processing the reduced datasets than their associated full datasets, adding knowledge to the domain. In addition, a reduced dataset requires less storage space, addressing the challenge of storing the significant volumes of data traversing contemporary networks.
Studies have been conducted on adversarial-based intrusion detection using DPI; however, these studies predominantly focused on the use of a single action or command to determine an intrusion (Koch et al., 2014; Kudłacik et al., 2016; Moon et al., 2016; Tsai et al., 2017). In addition to providing evidence that the patterns extracted from the reduced datasets by the selected machine learning algorithms can be more precise and more efficiently processed, this study has shown that patterns extracted from sequential adversarial SSH commands can also be used for intrusion detection purposes. Thus, the study has contributed to the domain of utilising DPI data for pattern-based intrusion detection.
6.2.2 Enhancing Machine Learning for Pattern-Based Intrusion Detection
There are two main approaches to enhancing machine learning algorithms for intrusion detection purposes: improving the features selected or developing a hybrid algorithm (Gauthama Raman et al., 2017). This study was not concerned with implementing enhanced versions of the chosen machine learning algorithms; instead, the focus was on improving the feature selection process through appropriately pre-processing the datasets.
The pre-processing phase developed for the study had two main aims: standardising the format of datasets and generating a reduced dataset that is an even and coherent representation of the respective full dataset. The results from this study showed that the pre-processing procedure developed can be used to enhance the performance of the selected machine learning algorithms to extract more precise patterns efficiently from a sequence of adversarial commands. This outcome contributes to the intrusion detection knowledge domain by suggesting that an appropriately reduced dataset can be used to detect adversarial activities precisely and efficiently, mitigating the exposure of assets on a network.
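The two aims of the pre-processing phase can be illustrated with a minimal sketch. It is not the procedure developed for the study: it assumes that each record is one session of SSH commands, that standardising means normalising whitespace, and that reduction means keeping one copy of each distinct session. The function names and the sample sessions are hypothetical.

```python
def standardise(session):
    """Collapse runs of whitespace in each command and drop empty commands."""
    cleaned = (" ".join(command.split()) for command in session)
    return tuple(command for command in cleaned if command)


def reduce_dataset(sessions):
    """Keep the first occurrence of each distinct standardised session.

    Assumption: duplicates are exact matches after standardisation, and the
    original order of first occurrences is preserved.
    """
    seen = set()
    reduced = []
    for session in sessions:
        key = standardise(session)
        if key and key not in seen:
            seen.add(key)
            reduced.append(key)
    return reduced


# Hypothetical full dataset: four sessions, two of which are duplicates.
full = [
    ["wget http://example.invalid/x", "chmod +x x", "./x"],
    ["wget   http://example.invalid/x ", "chmod +x  x", "./x"],
    ["uname -a", "cat /proc/cpuinfo"],
    ["uname -a", "cat /proc/cpuinfo"],
]
reduced = reduce_dataset(full)
print(len(full), "sessions reduced to", len(reduced))
```

Under these assumptions the reduced dataset keeps every distinct command sequence while shrinking the volume of data the algorithms must process and store.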
Four machine learning algorithms were selected to validate the pre-processing procedure developed for this study. This study has shown that the developed pre-processing procedure could be utilised to enhance the precision and efficiency with which these machine learning algorithms extract patterns. The contributions the study has made to the use of the selected machine learning algorithms for intrusion detection purposes are presented below.
6.2.2.1 Impact of the Reduced Datasets on the Naïve Bayes algorithm
The findings from this study show that an appropriately pre-processed reduced dataset can enhance the performance of the Naïve Bayes algorithm using DPI data. The contribution to the knowledge domain is that the Naïve Bayes algorithm can be negatively affected by datasets that consist predominantly of duplicated or discrete data.
Few studies have been conducted on improving the efficiency of the Naïve Bayes algorithm in classifying data (Kevric et al., 2017). The results from this study provide evidence that an appropriately reduced dataset can be utilised to enable the Naïve Bayes algorithm to process data efficiently, thereby suggesting an approach to enhancing the precision and efficiency of the Naïve Bayes algorithm in the cyber security knowledge domain. This finding in particular could be applied to other knowledge domains, such as malicious software detection.
6.2.2.2 Impact of the Reduced Datasets on the Markov Chain algorithm
The results show that the Markov Chain algorithm was the best performing algorithm at efficiently extracting more precise patterns from a reduced dataset compared to the respective full dataset. Studies within the literature have identified efficiently converging a transition matrix as a challenge of the Markov Chain algorithm (Ali et al., 2018). The findings from this study contribute to the knowledge domain by suggesting that an appropriately pre-processed dataset can improve the convergence time of a transition matrix. Further, the results provide evidence that a possible method for enhancing the Markov Chain algorithm is through appropriately pre-processing a dataset, allowing for more precise patterns to be extracted efficiently. Lastly, the results presented here suggest that the Markov Chain algorithm could be used to determine a sequence of commands and could be applied to other cyber security knowledge domains such as Industrial Control Systems (ICSs). The protocols utilised by ICS devices to communicate are also command based, so the findings from this study could be applied to identifying malicious sequences of commands sent by compromised devices.
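As an illustration of how a Markov Chain can model a sequence of commands, the sketch below estimates first-order transition probabilities from command sessions and scores a candidate sequence. It is a simplified illustration, not the implementation used in the study; the function names, the sample commands and the choice of a first-order chain are assumptions.

```python
from collections import Counter, defaultdict


def transition_probabilities(sessions):
    """Estimate P(next command | current command) from observed sessions."""
    counts = defaultdict(Counter)
    for session in sessions:
        for current, following in zip(session, session[1:]):
            counts[current][following] += 1
    return {
        state: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for state, followers in counts.items()
    }


def sequence_probability(sequence, probs):
    """Multiply the transition probabilities along a sequence.

    Transitions never observed in training are scored as 0.0 (no smoothing).
    """
    p = 1.0
    for current, following in zip(sequence, sequence[1:]):
        p *= probs.get(current, {}).get(following, 0.0)
    return p


# Hypothetical adversarial sessions, already standardised.
sessions = [
    ["wget", "chmod", "exec"],
    ["wget", "chmod", "rm"],
    ["uname", "wget", "chmod"],
]
probs = transition_probabilities(sessions)
print(sequence_probability(["wget", "chmod", "exec"], probs))
```

A sequence whose transitions were frequently observed in adversarial sessions scores highly, which is the sense in which the same idea could carry over to command-based ICS protocols.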
6.2.2.3 Impact of the Reduced Datasets on the Apriori algorithm
While the Apriori algorithm is commonly applied to packet-level data, the findings from this study show that the algorithm can be applied to DPI data as well. The results have shown that the Apriori algorithm is negatively affected by datasets that consist predominantly of duplicated data, as the same number of patterns were extracted from the full and reduced datasets. In terms of efficiency, the Apriori algorithm is affected by datasets that consist predominantly of discrete data, as the full dataset was processed more efficiently by the algorithm than the reduced dataset. The findings from this study contribute to the knowledge domain in applying the Apriori algorithm to DPI data for pattern-based intrusion detection purposes. The findings could also be applied to other cyber security strategies such as Intrusion Prevention Systems (IPSs): if a sequence of known malicious commands is seen by the IPS, an adversary can be prevented from completing the launched attack.
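For readers unfamiliar with the technique, the level-wise search performed by Apriori can be sketched as follows. This is a generic textbook-style sketch, not the implementation evaluated in the study; it treats each session as an unordered set of commands, uses an absolute support count, and the sample sessions are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for itemsets seen in at least
    min_support transactions, built level by level."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1 candidates: every distinct item on its own.
    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while current:
        # Count how many transactions contain each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: combine frequent itemsets into candidates of size k.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


# Hypothetical sessions, each reduced to the set of commands it contains.
sessions = [
    {"wget", "chmod", "exec"},
    {"wget", "chmod"},
    {"uname", "wget"},
]
patterns = apriori(sessions, min_support=2)
for itemset, support in sorted(patterns.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), support)
```

Each level requires a full pass over the transactions to count candidates, which is why the composition of the dataset has a direct bearing on the algorithm's processing cost.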
6.2.2.4 Impact of the Reduced Datasets on the Eclat algorithm
The findings from this study show that the Eclat algorithm can be utilised to extract patterns from sequential adversarial commands. However, datasets that predominantly consist of duplicated data hinder the Eclat algorithm from extracting more patterns. The findings show that while the Eclat algorithm can be used to extract more precise patterns from appropriately reduced datasets, it is less efficient at processing those reduced datasets. This study contributes to the potential use of the Eclat algorithm in the cyber security domain, since there is a lack of studies conducted, as argued in the literature review in section 2.3.1.
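The vertical, intersection-based search that characterises Eclat can be sketched as below. This is a generic illustration and not the implementation evaluated in the study; the function name, the absolute support count and the sample sessions are assumptions.

```python
def eclat(transactions, min_support):
    """Return {itemset: support count} using transaction-id set intersections."""
    # Vertical layout: map each item to the ids of the transactions containing it.
    tidsets = {}
    for tid, transaction in enumerate(transactions):
        for item in set(transaction):
            tidsets.setdefault(item, set()).add(tid)

    frequent = {}

    def extend(prefix, candidates):
        # candidates: (item, tidset) pairs that are frequent together with prefix.
        for i, (item, tids) in enumerate(candidates):
            itemset = prefix | {item}
            frequent[frozenset(itemset)] = len(tids)
            narrower = []
            for other, other_tids in candidates[i + 1:]:
                shared = tids & other_tids  # support of itemset plus other
                if len(shared) >= min_support:
                    narrower.append((other, shared))
            if narrower:
                extend(itemset, narrower)

    start = sorted(
        ((item, tids) for item, tids in tidsets.items() if len(tids) >= min_support),
        key=lambda pair: pair[0],
    )
    extend(frozenset(), start)
    return frequent


# Hypothetical sessions, each reduced to the set of commands it contains.
sessions = [
    {"wget", "chmod", "exec"},
    {"wget", "chmod"},
    {"uname", "wget"},
]
patterns = eclat(sessions, min_support=2)
for itemset, support in sorted(patterns.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), support)
```

Support is obtained from the size of each intersected id set rather than by rescanning the transactions, so the cost depends on how many distinct items and id sets the dataset produces.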