A Contextual Deep Clustering Based Intrusion Detection Method for Cloud


B. Sudhakar¹, Dr. V. B. Narsimha², Dr. G. Narsimha³

1Research Scholar-JNTUH & Associate Professor, Department of CSE-GNIT

2Assistant Professor, Department of CSE- University College of Engineering, OU

3Professor, Department of CSE- JNTUH College of Engineering, JNTUH.


Abstract: The growth of internet-based services, and of the information they generate, has attracted many attackers who attempt to intrude into the services, the underlying infrastructure and the generated information. A service often has to make itself visible in order to expose its features to end users, and intruders take advantage of this exposure to attack it.

Many parallel research efforts have aimed to detect intrusions by automating detection based on the characteristics of the attacks. Although intrusion detection systems (IDS) monitor networks for potentially malicious activity, they are also prone to false alarms. Consequently, organisations need to fine-tune their IDS products when they first install them, which means configuring the IDS to recognise what normal traffic on the network looks like compared with malicious activity.

Nevertheless, the characteristics-based approaches rely primarily on classification to detect attacks or intrusions.

In many instances the detection mechanism ignores some events, because parametric or characteristics-based detection cannot interpret the context of the parameter values at those events. This leaves the deployed security methods vulnerable. The problem can be addressed by clustering the complete set of attack situations and identifying the overlaps between attack events. However, clustering is delicate: too few clusters bring back the original problem of contextual insensitivity, while too many clusters cause regular accesses to be flagged as attacks or intrusions.

Hence, this work proposes a contextual deep clustering method that uses a deep analysis of Euclidean distance measures to find an accurate number of clusters for better intrusion detection. The work demonstrates nearly 90% detection accuracy for the overall system and nearly 100% accuracy for selected clusters.

Keywords: Intruder, IDS, Clustering, Euclidean Distance, Deep Clustering.

I. INTRODUCTION

Intrusion attacks are situations in which an attacking module triggers functional activities that make the services or functionalities unstable. Such attacks can be highly damaging and can be initiated from any service component of the working system or from an outside module attached to it. The fundamental strategy for preventing intrusions is to identify and resist them. This process is called intrusion detection, and a system that implements it is called an intrusion detection system (IDS). A recent, fully working model for multilevel security against attacks is reported by C. Guo et al. [1]. Such systems are generally software components that can be added to the functional system as an extra layer. The overall characteristics expected of intrusion detection systems, and the security aspects expected for each attack type, are analysed by G. V. Nadiammai et al. [2]. A detection system must be capable of detecting the intrusion and of raising an alert in the system. Many parallel research works have demonstrated a further step: terminating the intrusion, restoring the working system to a prior safe state, and then rolling the system forward to the timestamp of the attack. The work by P. Mishra et al. [3] has demonstrated that traditional approaches such as classification can be superseded by newer methods to obtain better results. Motivated by this, the present work furnishes a newer method called deep clustering for intrusion detection. The details of this method are furnished in the upcoming sections of this work.

II. GENERIC CLUSTERING METHODS

In this section, the generic clustering methods are analysed in order to identify the issues that persist in them and to formulate the problem. This section also helps in assessing the applicability of deep clustering methods to intrusion detection.

A. K - Means Clustering Method

The K-Means clustering method is one of the traditional methods and offers little flexibility in cluster design. The method first selects a number of classes (clusters) to use and randomly initialises their centre points. To decide how many classes to use, it is helpful to look at the data and try to identify any distinct groupings. The centre points are vectors of the same length as each data point vector.

The mathematical foundation for this method is presented here.

Firstly, assuming that the complete dataset is denoted as D[] and each attribute in the dataset is denoted as $A_x$, for a total of n attributes, the following relation can be formed.

$$D[\,] = \langle A_1, A_2, A_3, \ldots, A_n \rangle \qquad \text{(Eq.1)}$$

Here, each attribute is considered to have its own domain with m records, and the data elements are denoted as $D_i$, which can be represented as,

$$A_x[\,] = \bigcup_{i=1}^{m} D_i \qquad \text{(Eq.2)}$$

The K-Means clustering method initiates the clustering process with a pre-determined number of clusters. Assuming that the clusters are denoted as C[] and the centroid of each cluster is denoted as $\mu_i$, for a total of k clusters, this relationship can be denoted as,

$$C[\,] = \bigcup_{i=1}^{k} \mu_i \qquad \text{(Eq.3)}$$

Further, each domain of the dataset must be evaluated for its centroid, $\mu_X$, which can be represented as,

$$\mu_X = \frac{\sum_{i=1}^{m} D_i}{|D|} \qquad \text{(Eq.4)}$$

where $|D|$ is the number of records in the domain.

Hence, for all the attributes in the dataset, the overall centroids must be analysed as,

$$\mu[\,] = \langle \mu_1, \mu_2, \ldots, \mu_n \rangle \qquad \text{(Eq.5)}$$

Finally, the elements of the dataset must be assigned to the clusters. Based on these classified points, the algorithm recomputes the group centre by taking the mean of all the vectors in the group. This can be represented as,

$$\mu[1 \ldots k] \leftarrow D_i[1 \ldots n] \qquad \text{(Eq.6)}$$

The existing K-Means algorithm steps are elaborated here:

Algorithm: Traditional K-Means Algorithm
Step - 1. Select the number of classes to use and initialise their respective centre points
Step - 2. Classify each data point by computing the distance between that point and each group centre
Step - 3. Assign the point to the group whose centre is closest to it
Step - 4. Recompute each group centre by taking the mean of all the vectors in the group
Step - 5. Repeat Step - 2 to Step - 4 until all data points have been placed in a cluster and the group centres stop changing
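The steps of the traditional K-Means algorithm can be sketched in a few lines of Python. This is only an illustrative sketch using the standard library: the function name `kmeans`, the `sample` points, the random initialisation from the data points and the stopping rule (centres no longer change) are assumptions made for the example and are not taken from the works surveyed here.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Plain K-Means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    # Step 1: pick k distinct data points as the initial centres (assumption).
    centres = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Steps 2-3: assign every point to its nearest centre.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centres[c]))
            for p in points
        ]
        # Step 4: recompute each centre as the mean of its members.
        new_centres = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centres.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centres.append(centres[c])  # keep the centre of an empty cluster
        # Step 5: stop once the centres no longer move.
        if new_centres == centres:
            break
        centres = new_centres
    return centres, labels


# Hypothetical two-dimensional sample with two obvious groups.
sample = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
centres, labels = kmeans(sample, k=2)
```

The number of clusters `k` has to be supplied by the caller, which is exactly the rigidity discussed in this section.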

B. Mean Shift Clustering Method

The Mean Shift clustering method is another traditional method, based on sliding-mean variations over the total dataset and each data point. The fundamental principle of this method is to incrementally calculate the means of the centroids and then construct the clusters. The advantage over the previous method is that the number of clusters and the centroids are not pre-defined.

Firstly, assuming that the complete dataset is denoted as D[],

$$D[\,] = \langle A_1, A_2, A_3, \ldots, A_n \rangle \qquad \text{(Eq.7)}$$

Here, each attribute is considered to have its own domain with m records, and the data elements are denoted as $D_i$,

$$A_x[\,] = \bigcup_{i=1}^{m} D_i \qquad \text{(Eq.8)}$$

Further, assuming that the first data point is $D_x$, the distance from this data point to the other data points must be calculated for a total of n data points.

$$\lambda[\,] = \bigcup_{i=1}^{n} \lVert D_x - D_i \rVert \qquad \text{(Eq.9)}$$

Where, $\lambda[\,]$ is the total distance set.

Henceforth, the elements of each distance set with the least distance must be grouped into one cluster.

$$C[\,] = \bigcup_{j=1}^{n} \left\{ \lambda_j : \lambda_j \le \lambda_{j+1} \right\} \qquad \text{(Eq.10)}$$

The existing Mean Shift algorithm steps are elaborated here:

Algorithm: Traditional Mean Shift Algorithm
Step - 1. Start with a circular sliding window centred at some point C
Step - 2. Shift the sliding window towards regions of higher density by moving the centre point to the mean of the points inside the window
Step - 3. Continue shifting the sliding window according to the mean until there is no direction in which a shift can accommodate more points
Step - 4. Repeat Step - 1 to Step - 3 until all data points have been placed in a cluster
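A minimal Python sketch of the sliding-window procedure is given below. It assumes a flat (uniform) kernel, starts one window at every data point, and merges windows that converge to nearly the same position; the names `mean_shift`, `bandwidth` and `sample`, and the merge radius of half the bandwidth, are illustrative assumptions rather than details from the surveyed works.

```python
import math


def mean_shift(points, bandwidth, max_iter=100, tol=1e-6):
    """Flat-kernel mean shift; returns (modes, labels)."""
    shifted = []
    for p in points:
        centre = p  # Step 1: start a window centred at this point.
        for _ in range(max_iter):
            # Step 2: move the centre to the mean of the points inside the window.
            window = [q for q in points if math.dist(centre, q) <= bandwidth]
            new_centre = tuple(sum(col) / len(window) for col in zip(*window))
            # Step 3: stop when the window no longer moves.
            if math.dist(centre, new_centre) < tol:
                break
            centre = new_centre
        shifted.append(centre)
    # Windows that converged to (almost) the same place form one cluster.
    modes, labels = [], []
    for centre in shifted:
        for idx, mode in enumerate(modes):
            if math.dist(centre, mode) < bandwidth / 2:
                labels.append(idx)
                break
        else:
            modes.append(centre)
            labels.append(len(modes) - 1)
    return modes, labels


# Hypothetical sample with two dense regions.
sample = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
modes, labels = mean_shift(sample, bandwidth=1.0)
```

The number of clusters is not passed in; it follows from the number of distinct modes, although the result still depends on the chosen `bandwidth`.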

C. Hierarchical Clustering Method

The hierarchical clustering method is another traditional method. Its advantage over the previous two methods is that the mean can be adjusted based on the linear difference.

$$D[\,] = \langle A_1, A_2, A_3, \ldots, A_n \rangle \qquad \text{(Eq.11)}$$

$$A_x[\,] = \bigcup_{i=1}^{m} D_i \qquad \text{(Eq.12)}$$

Further, assuming that the first data point is $D_x$, this data point must be made part of the initial cluster, $C_X$. As,

$$D_x \in C_X \qquad \text{(Eq.13)}$$

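Further, by calculating the distance from the other data points, the remaining clusters are designed. As,

$$\forall\, i, j \in [1, n]: \quad \text{Iff } \lVert D_x - D_i \rVert < \lVert D_x - D_j \rVert \ \text{ Then } D_i \in C_X \ \text{ Else } D_j \in C_X \text{ and } D_i \notin C_X \qquad \text{(Eq.14)}$$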

The existing Hierarchical algorithm steps are elaborated here:

Algorithm: Traditional Hierarchical Algorithm
Step - 1. Start by treating each data point as a single cluster
Step - 2. Combine two clusters into one, where the two clusters to be merged are selected as those with the smallest linkage
Step - 3. Repeat Step - 2 until all data points have been placed in a cluster
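The bottom-up merging described in these steps can be sketched as follows. The sketch assumes single linkage (the smallest pairwise distance between two clusters) and stops when a requested number of clusters remains; the names `single_linkage`, `agglomerative`, `n_clusters` and `sample` are hypothetical and chosen only for this illustration.

```python
import math


def single_linkage(a, b):
    """Smallest pairwise distance between two clusters (single linkage)."""
    return min(math.dist(p, q) for p in a for q in b)


def agglomerative(points, n_clusters):
    # Step 1: every data point starts as its own cluster.
    clusters = [[p] for p in points]
    # Steps 2-3: keep merging the closest pair until n_clusters remain.
    while len(clusters) > n_clusters:
        pairs = [
            (i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        ]
        i, j = min(
            pairs,
            key=lambda ij: single_linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i].extend(clusters[j])
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


# Hypothetical sample with two obvious groups.
sample = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
groups = agglomerative(sample, n_clusters=2)
```

Other linkage rules (complete, average) only change `single_linkage`; the merge loop stays the same.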

Hence, the existing methods are bounded either by fixed cluster properties or by a less flexible Euclidean distance measure. A newer clustering mechanism is therefore needed.

Further, this work illustrates the applicability of deep clustering, or simple clustering mechanisms, to intrusion detection systems:

• Firstly, intrusion detection systems depend on the parametric detection of classes to determine the attack characteristics. Hence, reducing overlaps can significantly increase the chances of detecting the intruder classes. This is possible with deep clustering.

• Secondly, intrusion attacks can be similar in their approach or in the nature of the damage they cause. Hence, clustering methods can also help in identifying small similarities between intrusions.

With this understanding of the existing and popular clustering methods, the next section discusses the existing intrusion detection systems.

III. PARALLEL RESEARCH OUTCOMES: SURVEY

In this section of the work the parallel research outcomes are analysed and understood for identification of the research problems.

A generic intrusion detection system consists of three components: a packet decoder, an anomaly detector and an alerting system. These functions can be performed using various methods, and each system proposes its own way of determining the presence of intrusions or the nature of the attacks. The research of N. V. Patel et al. [4] has demonstrated that analysis by the application signature method can be helpful for detecting the nature of the operations.

Nonetheless, this method has also been demonstrated by Apple Inc and the results are not satisfactory.

On the other hand, many methods are also applied on hardware systems to protect the working systems at the physical network level. The conclusive research work by V. Bontupaui et al. [5] demonstrates many methods based on bit verification.

Motivated by the various methods for intrusion detection, the work by Z. Dewa et al. [6] provided significant evidence that data mining methods such as classification or clustering can be highly accurate for detection. This research has increased the motivation towards data mining methods.

As an outcome, the research work by J. Shen et al. [7] has demonstrated secondary feature extraction for detection. However, the accuracy is limited and the method cannot be applied for fewer types of classification errors. Another outcome is the work by W. L. Al-Yaseen et al. [8], which constructs a modified K-Means algorithm for the detection process. This method is also criticised for being less accurate due to the absence of interim classifiers.

In contrast, soft computing approaches are also widely popular for these purposes. The work by P. Ravi Kiran Varma et al. [9] proposed optimising the classification method using ant colony optimisation (ACO) for faster detection of intrusions. However, ACO demands a huge volume of historical data in order to be accurate.

The final and most recent outcome, by M. Zhu et al. [10], demonstrated a novel method that combines multiple data mining methods to create a highly accurate system. Nonetheless, this method is also criticised for its higher time complexity and its sometimes inaccurate voting methods.

Henceforth, with the understanding of the parallel research outcomes, in the next section of the work, the research problem is formulated using mathematical modelling methods.

IV. PROBLEM FORMULATION

After the detailed understanding of the existing and parallel research outcomes, in this section of the work, the identified problem is formulated.

$$D[\,] = \langle A_1, A_2, A_3, \ldots, A_n \rangle \qquad \text{(Eq.15)}$$

$$A_x[\,] = \bigcup_{i=1}^{m} D_i \qquad \text{(Eq.16)}$$

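Assuming that the first data point is $D_x$ and there are two other data points, $D_y$ and $D_z$, available in the dataset, and the distances between these data points are $\lambda_1$, $\lambda_2$ and $\lambda_3$ respectively.

This can be formulated as,

$$D_x - D_y = \lambda_1 \qquad \text{(Eq.17)}$$

And,

$$D_y - D_z = \lambda_2 \qquad \text{(Eq.18)}$$

And,

$$D_x - D_z = \lambda_3 \qquad \text{(Eq.19)}$$

In the situations, where

$$\lambda_1 = \lambda_2 \qquad \text{(Eq.20)}$$

And,

$$\lambda_3 > \lambda_1 \qquad \text{(Eq.21)}$$

And,

$$\lambda_3 > \lambda_2 \qquad \text{(Eq.22)}$$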

Here, the existing clustering methods cannot decide which points are members of the clusters and which are outliers. Thus, this problem must be addressed.
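A small numeric illustration may help here. It assumes that the problematic situation is the one in which the middle point is equally far from the other two (the first two distances are equal and the third is larger than both); this reading, the coordinates and the variable names below are assumptions made for the example only.

```python
import math

# Three hypothetical data points on a line: D_y sits exactly between D_x and D_z.
d_x, d_y, d_z = (0.0, 0.0), (1.0, 0.0), (2.0, 0.0)

lam1 = math.dist(d_x, d_y)  # distance between D_x and D_y
lam2 = math.dist(d_y, d_z)  # distance between D_y and D_z
lam3 = math.dist(d_x, d_z)  # distance between D_x and D_z

# D_y is equally close to both neighbours, while D_x and D_z are far apart,
# so a purely distance-based rule has no basis for choosing D_y's cluster.
ambiguous = math.isclose(lam1, lam2) and lam3 > lam1 and lam3 > lam2
```

With these values a nearest-neighbour rule ties for $D_y$, which is the kind of undecidable membership the formulation describes.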

Henceforth, in the next section of this work, the proposed mathematical formulation for the proposed system is elaborated.

V. PROPOSED METHOD: MATHEMATICAL FOUNDATION

With the detailed understanding of the fundamental principles of the clustering mechanisms, the parallel research outcomes and the drawback identified through mathematical modelling, the proposed solution is presented mathematically in this section.

Assuming that the complete dataset is denoted as D[] and each attribute in the dataset is denoted as $A_x$, for a total of n attributes, the following relation can be formed.

$$D[\,] = \langle A_1, A_2, A_3, \ldots, A_n \rangle \qquad \text{(Eq.23)}$$

Here, each attribute is considered to have its own domain with m records, and the data elements are denoted as $D_i$,

$$A_x[\,] = \bigcup_{i=1}^{m} D_i \qquad \text{(Eq.24)}$$

Further, the Euclidean distance between the data points can be considered as the similarity measure, and the total distance set is represented as $\lambda[\,]$, then,

$$\lambda[\,] = \bigcup_{i=1}^{n} \lVert D_i - D_{i+1} \rVert \qquad \text{(Eq.25)}$$

Further, the Euclidean distances between the elements of $\lambda[\,]$ are calculated,

$$\lambda^{1}[\,] = \bigcup_{i=1}^{n} \lVert \lambda_i - \lambda_{i+1} \rVert \qquad \text{(Eq.26)}$$

The new $\lambda^{1}[\,]$ set defines the relation between the elements based on their similarities.

Furthermore, the repeated iteration of Eq. 26 can measure the similarities in a deeper and more contextual way, which can be represented as,

$$\lambda^{k}[\,] = \bigcup_{i=1}^{n} \lVert \lambda^{k-1}_{i} - \lambda^{k-1}_{i+1} \rVert \qquad \text{(Eq.27)}$$

Thus, based on the Euclidean distances between the elements, the Euclidean distances between those distances, and the repeated Euclidean distances of the resulting distance sets, the final cluster centroids can be calculated as,

$$C[\,] = \frac{1}{n} \sum_{i=0}^{k} \lambda^{i}[\,] \qquad \text{(Eq.28)}$$

where $\lambda^{0}[\,]$ denotes the distance set $\lambda[\,]$ of Eq. 25.

Henceforth, in the next section of the work, the proposed algorithm is elaborated.

VI. PROPOSED ALGORITHM: CONTEXTUAL DEEP CLUSTERING

In statistics and related fields, a similarity measure or similarity function is a real-valued function that quantifies the similarity between two objects. Although no single definition of a similarity measure exists, such measures are usually in some sense the inverse of distance metrics: they take on large values for similar objects and either zero or a negative value for very dissimilar objects.

Cosine similarity is a commonly used similarity measure for real-valued vectors, used in information retrieval to score the similarity of documents in the vector space model. In machine learning, common kernel functions such as the RBF kernel can be viewed as similarity functions.
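The two similarity functions mentioned above can be written down directly. The following standard-library sketch is included only to make the definitions concrete; the function names and the `gamma` parameter of the RBF kernel are illustrative choices and are not part of the proposed method.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rbf_kernel(u, v, gamma=1.0):
    """exp(-gamma * ||u - v||^2): 1.0 for identical vectors, near 0 for distant ones."""
    return math.exp(-gamma * math.dist(u, v) ** 2)


same_direction = cosine_similarity((1.0, 2.0), (2.0, 4.0))  # parallel vectors
orthogonal = cosine_similarity((1.0, 0.0), (0.0, 1.0))      # perpendicular vectors
identical = rbf_kernel((1.0, 2.0), (1.0, 2.0))              # zero distance
far_apart = rbf_kernel((0.0, 0.0), (3.0, 4.0))              # distance 5
```

Both functions grow as the inputs become more alike, which is the inverse relationship to a distance metric described in the text.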

The proposed algorithm is furnished here in simple steps.

Algorithm: Deep Similarity Extraction for Cluster Formation using Contextual Information Algorithm (DSE-CF-CI)
Step - 1. Accept the initial dataset
Step - 2. For each attribute set, A[i]
    a. For each data point under the domain, D[i]
        i. Calculate the distances as L[i] using Eq. 25
        ii. For each distance L[i]
            1. Calculate the similarity measures as L1[i] using Eq. 26
            2. For each L[i] and L1[i]
                a. Calculate the deep similarity measures as L2[i] using Eq. 27
Step - 3. Calculate the cluster centroids using Eq. 28
Step - 4. Form the clusters using Step - 3 until all elements are placed in the clusters
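The nested distance computation of Step - 2 can be illustrated for a single numeric attribute. The sketch below reads Eq. 25 to Eq. 27 as repeated differences between consecutive values, which is an interpretation rather than a confirmed detail. The final grouping rule (`split_on_gaps`, with the mean of L as threshold) is a purely hypothetical stand-in for Step - 3 and Step - 4, not the centroid formula of Eq. 28; all names and the sample column are assumptions.

```python
def distance_set(values):
    """Absolute differences between consecutive values (1-D Euclidean distance)."""
    return [abs(a - b) for a, b in zip(values, values[1:])]


def deep_distance_sets(values, depth=2):
    """Return [L, L1, L2, ...]: the distance set and its repeated re-application."""
    levels = [distance_set(values)]
    for _ in range(depth):
        levels.append(distance_set(levels[-1]))
    return levels


def split_on_gaps(values, threshold):
    """Hypothetical grouping rule: open a new cluster where the gap exceeds threshold."""
    ordered = sorted(values)
    clusters = [[ordered[0]]]
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev > threshold:
            clusters.append([cur])
        else:
            clusters[-1].append(cur)
    return clusters


# Hypothetical numeric attribute column, sorted for the example.
column = sorted([0, 1, 2, 50, 51, 53])
L, L1, L2 = deep_distance_sets(column, depth=2)

# Assumed threshold: the mean of the first-level distances.
threshold = sum(L) / len(L)
clusters = split_on_gaps(column, threshold)
centroids = [sum(c) / len(c) for c in clusters]
```

In this toy column the large entry in L (48) marks the boundary between the two groups, and L1 and L2 show how that boundary stands out again at each deeper level.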

The above algorithm is applied to every attribute, and each attribute defines a unique cluster centroid characteristic based on the context of the data. Hence, this algorithm considers the contextual sensitivity of the parameters and results in a higher-accuracy detection model.

The results obtained from the proposed algorithm are significant and highly satisfactory, and are elaborated in the next section of the work.

VII. CLOUD DEPLOYMENT MODEL FOR PROPOSED IDS

The proposed algorithm is deployed on a cloud architecture with the following architectural components [Fig – 1].

Fig. 1 Proposed Deployment Model

A host-based IDS (HIDS) is capable of monitoring all or parts of the dynamic behaviour and the state of a computer system, depending on how it is configured. Besides activities such as dynamically inspecting network packets targeted at this specific host (an optional component in most commercially available software solutions), a HIDS might detect which program accesses which resources and discover that, for example, a word processor has suddenly and inexplicably started modifying the system password database.

The deployment model is realized on Amazon Web Services and the details are showcased here [Fig – 2].

Fig. 2 IDS Instance in Action

Many computer users have encountered tools that monitor dynamic system behaviour in the form of anti-virus (AV) packages.

While AV programs often also monitor system state, they spend a lot of their time looking at who is doing what inside a computer, and whether a given program should or should not have access to particular system resources. The lines become blurred here, as many of the tools overlap in functionality.

Further, live monitoring of the proposed framework is also realized in real time [Fig – 3].

Fig. 3 IDS Instance Live Monitoring

Henceforth, in the next section of the work, the proposed framework is again tested on cloud-based and local datasets and the results are realized.

VIII. RESULTS AND DISCUSSIONS

In this section, the results are analysed for each cluster and for a few of the parameters. The actual KDD [11] dataset includes class-type (categorical) variables, and the proposed algorithm converts these class variables into numeric values. As a result, the total number of variables cannot be presented in this work, and only a few sample parameters are furnished here.
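The conversion of class variables into numeric values can be done in several ways, and the exact scheme used in this work is not stated. One simple possibility, shown here only as an assumed illustration, is to replace each distinct label by an integer code in order of first appearance; the function name and the sample column are hypothetical.

```python
def encode_categories(column):
    """Map each distinct label to an integer code in order of first appearance."""
    codes = {}
    encoded = []
    for value in column:
        if value not in codes:
            codes[value] = len(codes)
        encoded.append(codes[value])
    return encoded, codes


# Hypothetical protocol_type column with three distinct labels.
protocol_type = ["tcp", "udp", "tcp", "icmp"]
encoded, codes = encode_categories(protocol_type)
```

Integer codes impose an artificial ordering on the labels, so distances computed on them should be interpreted with care; one-hot encoding is a common alternative.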

Firstly, the service type attribute for the label LINK is elaborated here [Table – 1].

TABLE I
CLUSTER ANALYSIS - SERVICE=LINK

| Cluster | Actual Euclidean Distance | Proposed Method Euclidean Distance |
|---|---|---|
| Main | 0.0018 | 0.0015 |
| Cluster_Group_Name: 0 | 0.0015 | 0.0126 |
| Cluster_Group_Name: 1 | 0.0126 | 0.0000 |
| Cluster_Group_Name: 2 | 0.0000 | 0.0000 |
| Cluster_Group_Name: 3 | 0.0000 | 0.0014 |
| Cluster_Group_Name: 4 | 0.0014 | 0.0000 |
| Cluster_Group_Name: 5 | 0.0000 | 0.0000 |
| Cluster_Group_Name: 6 | 0.0000 | 0.0000 |
| Cluster_Group_Name: 7 | 0.0000 | 0.0000 |
| Cluster_Group_Name: 8 | 0.0000 | 0.0000 |
| Cluster_Group_Name: 9 | 0.0000 | 0.0018 |

Although both relate to network security, an IDS differs from a firewall in that a firewall looks outwardly for intrusions in order to stop them from happening. Firewalls limit access between networks to prevent intrusion and do not signal an attack from inside the network.

The results are visualized graphically here [Fig – 4].

Fig. 4 Cluster Analysis – Service = LINK


Secondly, the service type attribute for the label X11 is elaborated here [Table – 2].

TABLE II
CLUSTER ANALYSIS - SERVICE=X11

| Cluster | Actual Euclidean Distance | Proposed Method Euclidean Distance |
|---|---|---|
| Main | 0.0007 | 0.0000 |
| Cluster_Group_Name: 0 | 0.0000 | 0.0036 |
| Cluster_Group_Name: 1 | 0.0036 | 0.0000 |
| Cluster_Group_Name: 2 | 0.0000 | 0.0000 |
| Cluster_Group_Name: 3 | 0.0000 | 0.0056 |
| Cluster_Group_Name: 4 | 0.0056 | 0.0000 |
| Cluster_Group_Name: 5 | 0.0000 | 0.0000 |
| Cluster_Group_Name: 6 | 0.0000 | 0.0000 |
| Cluster_Group_Name: 7 | 0.0000 | 0.0000 |
| Cluster_Group_Name: 8 | 0.0000 | 0.0000 |
| Cluster_Group_Name: 9 | 0.0000 | 0.0007 |

Fig. 5 Cluster Analysis – Service = X11

Third, the service type attribute for the label urp_i is elaborated here [Table – 3].

TABLE III
CLUSTER ANALYSIS - SERVICE=URP_I

| Cluster | Actual Euclidean Distance | Proposed Method Euclidean Distance |
|---|---|---|
| Main | 0.0010 | 0.0000 |
| Cluster_Group_Name: 0 | 0.0000 | 0.0000 |
| Cluster_Group_Name: 1 | 0.0000 | 0.0000 |
| Cluster_Group_Name: 2 | 0.0000 | 0.0000 |
| Cluster_Group_Name: 3 | 0.0000 | 0.0000 |
| Cluster_Group_Name: 4 | 0.0000 | 0.0221 |
| Cluster_Group_Name: 5 | 0.0221 | 0.0000 |
| Cluster_Group_Name: 6 | 0.0000 | 0.0000 |
| Cluster_Group_Name: 7 | 0.0000 | 0.0000 |
| Cluster_Group_Name: 8 | 0.0000 | 0.0000 |
| Cluster_Group_Name: 9 | 0.0000 | 0.0010 |

A network-based IDS performs an analysis of the traffic passing through the entire subnet and matches that traffic against a library of known attacks. Once an attack has been identified, or abnormal behaviour is sensed, an alert can be sent to the administrator. For example, such a system can be installed on the subnet where the firewalls are located in order to determine whether someone is trying to break into the firewall.

Fig. 6 Cluster Analysis – Service = URP_I

Fourth, the guest login attribute for the label "1" is elaborated here [Table – 4].

TABLE IV
CLUSTER ANALYSIS - GUEST_LOGIN=1

| Cluster | Actual Euclidean Distance | Proposed Method Euclidean Distance |
|---|---|---|
| Main | 0.0284 | 0.0000 |
| Cluster_Group_Name: 0 | 0.0000 | 0.2067 |
| Cluster_Group_Name: 1 | 0.2067 | 0.0000 |
| Cluster_Group_Name: 2 | 0.0000 | 0.0000 |
| Cluster_Group_Name: 3 | 0.0000 | 0.0000 |
| Cluster_Group_Name: 4 | 0.0000 | 0.0000 |
| Cluster_Group_Name: 5 | 0.0000 | 0.0000 |
| Cluster_Group_Name: 6 | 0.0000 | 0.0012 |
| Cluster_Group_Name: 7 | 0.0012 | 0.0000 |
| Cluster_Group_Name: 8 | 0.0000 | 0.0000 |
| Cluster_Group_Name: 9 | 0.0000 | 0.0284 |

Anomaly-based intrusion detection systems were introduced primarily to detect unknown attacks, in part because of the rapid development of malware. The basic approach is to use machine learning to create a model of trustworthy activity and then compare new behaviour against this model. Since these models can be trained according to the hardware and application configurations, a machine learning based system generalises better than a conventional signature-based IDS.

Fig. 7 Cluster Analysis – GUEST_LOGIN =1

Fifth, the protocol type attribute for the label TCP is elaborated here [Table – 5].

TABLE V
CLUSTER ANALYSIS - PROTOCOL_TYPE=TCP

| Cluster | Actual Euclidean Distance | Proposed Method Euclidean Distance |
|---|---|---|
| Main | 0.8375 | 0.8025 |
| Cluster_Group_Name: 0 | 0.8025 | 1.0000 |
| Cluster_Group_Name: 1 | 1.0000 | 0.0000 |
| Cluster_Group_Name: 2 | 0.0000 | 1.0000 |
| Cluster_Group_Name: 3 | 1.0000 | 1.0000 |
| Cluster_Group_Name: 4 | 1.0000 | 0.0000 |
| Cluster_Group_Name: 5 | 0.0000 | 1.0000 |
| Cluster_Group_Name: 6 | 1.0000 | 1.0000 |
| Cluster_Group_Name: 7 | 1.0000 | 1.0000 |
| Cluster_Group_Name: 8 | 1.0000 | 1.0000 |
| Cluster_Group_Name: 9 | 1.0000 | 0.1625 |

Fig. 8 Cluster Analysis – Protocol_Type=TCP

Sixth, the dst_host_diff_srv_rate attribute is elaborated here [Table – 6].

TABLE VI
CLUSTER ANALYSIS - DST_HOST_DIFF_SRV_RATE

| Cluster | Actual Euclidean Distance | Proposed Method Euclidean Distance |
|---|---|---|
| Main | 0.0905 | 0.9198 |
| Cluster_Group_Name: 0 | 0.9198 | 0.1017 |
| Cluster_Group_Name: 1 | 0.1017 | 0.0994 |
| Cluster_Group_Name: 2 | 0.0994 | 0.1463 |
| Cluster_Group_Name: 3 | 0.1463 | 0.0866 |
| Cluster_Group_Name: 4 | 0.0866 | 0.0305 |
| Cluster_Group_Name: 5 | 0.0305 | 0.0783 |
| Cluster_Group_Name: 6 | 0.0783 | 0.1068 |
| Cluster_Group_Name: 7 | 0.1068 | 0.0353 |
| Cluster_Group_Name: 8 | 0.0353 | 0.0051 |
| Cluster_Group_Name: 9 | 0.0051 | 0.0854 |

Fig. 9 Cluster Analysis – DST_HOST_DIFF_SRV_RATE

Seventh, the service attribute for the label other is elaborated here [Table – 7].

TABLE VII
CLUSTER ANALYSIS - SERVICE=OTHER

| Cluster | Actual Euclidean Distance | Proposed Method Euclidean Distance |
|---|---|---|
| Main | 0.0372 | 0.9861 |
| Cluster_Group_Name: 0 | 0.9861 | 0.0456 |
| Cluster_Group_Name: 1 | 0.0456 | 0.0181 |
| Cluster_Group_Name: 2 | 0.0181 | 0.0000 |
| Cluster_Group_Name: 3 | 0.0000 | 0.0181 |
| Cluster_Group_Name: 4 | 0.0181 | 0.0000 |
| Cluster_Group_Name: 5 | 0.0000 | 0.0000 |
| Cluster_Group_Name: 6 | 0.0000 | 0.0000 |
| Cluster_Group_Name: 7 | 0.0000 | 0.0000 |
| Cluster_Group_Name: 8 | 0.0000 | 0.0000 |
| Cluster_Group_Name: 9 | 0.0000 | 0.0372 |

Fig. 10 Cluster Analysis – SERVICE=OTHER

Eighth, the detection accuracy is furnished here [Table – 8].

TABLE VIII
CLUSTER ANALYSIS - SERVICE=OTHER

| Cluster | Actual Label | Detected Label | Detection Accuracy (Matched Label is 100%) (%) |
|---|---|---|---|
| Main | anomaly | anomaly | 99.79 |
| Cluster_Group_Name: 0 | anomaly | anomaly | 99.91 |
| Cluster_Group_Name: 1 | anomaly | normal | 60.00 |
| Cluster_Group_Name: 3 | anomaly | normal | 60.00 |

Cluster_Group_Name:

Thus, the overall accuracy of the detection process is nearly 91.2% for the complete framework and for the selected clusters, the accuracy is nearly 99%.

Finally, the testing details from the cloud-based environment are furnished here [Fig – 11].

Fig. 11 Cluster Analysis – Defined Groups

Also, the detection results from the above findings are showcased here [Fig – 12].

Fig. 12 Cluster Analysis – Detailed Analysis

Henceforth, with the detailed analysis of the obtained results, in the next section of the work, the comparative analysis is furnished.

IX. COMPARATIVE ANALYSIS

In this section of the work, the comparative analysis of the parallel research works with the proposed model is discussed [Table – 9] for the unsupervised learning methods.

TABLE IX
COMPARATIVE ANALYSIS

| Proposed Method, Author, Year | Parameter Extraction | Data Analysis Method | Base Method | Detection Accuracy |
|---|---|---|---|---|
| Clustering, N. P. Shetty et al., 2016 [12] | No | Clustering | K-Means | 78% |
| Clustering, M. Zhu et al., 2017 [13] | No | Clustering | Hierarchical | 83% |
| Clustering, A. Sultana et al., 2016 [14] | No | Clustering | Mean Shift | 83% |
| SVM, W. L. Al-Yaseen, 2017 [8] | Yes | Classification & clustering | SVM | 83% |
| ACO, P. Ravi Kiran Varma, 2018 [9] | No | Optimization | Multiple Clustering and ACO | 90% |
| Host Log Analysis, M. Zhu, 2017 [10] | No | Text Mining | K-Means | 89% |
| DSE-CF-CI, 2019 | Yes | Deep Clustering | Deep Clustering | 92% |

Both an IDS and a firewall relate to network security, but an IDS differs from a firewall in that a firewall looks outwardly for intrusions in order to stop them from happening. Firewalls limit access between networks to prevent intrusion and do not signal an attack that comes from inside the network. An IDS describes a suspected intrusion once it has taken place and then signals an alarm. This section has presented the detailed comparison with the related works.

Finally, after the problem formulation, proposal modelling and discussions on the results, in the next section of the work, the research conclusion is presented.

X. CONCLUSION

The increase in internet-based services has also acted as a catalyst for intrusions. Surveys have demonstrated that the substantial increase in intrusion events is decreasing the motivation of many application providers to open their services for automation. Hence, this work aims to demonstrate a novel method for intrusion detection using deep clustering.

The proposed model depends on deep clustering based on the extracted characteristics of the intrusion attack events. The work relies on the Euclidean distances between the elements, the Euclidean distances between those distances, and the Euclidean distances of the resulting distance vectors, and generates an accurate number of clusters for deep detection of the intrusions. The work demonstrates nearly 92% accuracy for the overall design and nearly 99% accuracy for selected true positive clusters. With this higher accuracy, the work is shown to be a state-of-the-art system for making internet-based systems more secure.

REFERENCES

[1] C. Guo, Y. Ping, N. Liu, and S. Luo, “A two-level hybrid approach for intrusion detection,” Neurocomputing, vol. 214, pp. 391–400, 2016.

[2] G. V. Nadiammai and M. Hemalatha, “Effective approach toward Intrusion Detection System using data mining techniques,” Egyptian Informatics Journal, vol. 15, no. 1, pp. 37–50, 2014.

[3] P. Mishra, V. Varadharajan, U. Tupakula, and E. S. Pilli, “A Detailed Investigation and Analysis of using Machine Learning Techniques for Intrusion Detection,” IEEE Communications Surveys & Tutorials, pp. 1–1, 2018.

[4] N. V. Patel, N. M. Patel, and C. Kleopa, “OpenAppID - application identification framework next generation of firewalls,” International Conference on Green Engineering and Technologies (IC-GET), pp. 1–5, 2016.

[5] V. Bontupaui and T. M. Taha, “Comprehensive survey on intrusion detection on various hardware and software,” National Aerospace and Electronics Conference (NAECON), pp. 267–272, 2015.

[6] Z. Dewa and L. A. Maglaras, “Data mining and intrusion detection systems,” International Journal of Advanced Computer Science and Applications (IJACSA), vol. 7, no. 1, pp. 62–71, 2016.

[7] J. Shen, J. Xia, Y. Shan, and Z. Wei, “Classification model for imbalanced traffic data based on secondary feature extraction,” IET Communications: IET Journals, vol. 11, no. 11, pp. 1725–1731, 2017.

[8] W. L. Al-Yaseen, Z. A. Othman, and M. Z. A. Nazri, “Multi-level hybrid support vector machine and extreme learning machine based on modified K-means for intrusion detection system,” Expert Systems with Applications, vol. 67, pp. 296–303, 2017.

[9] P. Ravi Kiran Varma, V. ValliKumari, and S. Srinivas Kumar, “A Survey of Feature Selection Techniques in Intrusion Detection System: A Soft Computing Perspective,” in Advances in Intelligent Systems and Computing, Singapore: Springer Singapore, vol. 710, pp. 785–793, 2018.

[10] M. Zhu and Z. Huang, “Intrusion detection system based on data mining for host log,” IEEE 2nd Advanced Information Technology, Electronic and Automation Control Conference (IAEAC), pp. 1742–1746, 2017.

[11] M. Tavallaee, E. Bagheri, W. Lu, and A. Ghorbani, “A Detailed Analysis of the KDD CUP 99 Data Set,” Submitted to Second IEEE Symposium on Computational Intelligence for Security and Defense Applications (CISDA), 2009.

[12] N. P. Shetty, “Using clustering to capture attackers,” International Conference on Inventive Computation Technologies (ICICT): IEEE, vol. 3, pp. 1–5, 2016.

[13] M. Zhu and Z. Huang, “Intrusion detection system based on data mining for host log,” IEEE 2nd Advanced Information Technology, Electronic and Automation Control Conference (IAEAC), pp. 1742–1746, 2017.

[14] A. Sultana and M. A. Jabbar, “Intelligent network intrusion detection system using data mining techniques,” 2nd International Conference on Applied and Theoretical Computing and Communication Technology, pp. 329–333, 2016.

ABOUT THE AUTHORS

Dr. V. B. Narsimha, Assistant Professor, Department of Computer Science and Engineering, University College of Engineering, Osmania University, Hyderabad.

Dr. G. Narasimha, Professor & NSS Co-ordinator, Department of Computer Science and Engineering, JNTUH College of Engineering, JNTUH-Sultanpur.

B. Sudhakar, Research Scholar, Department of Computer Science and Engineering, JNTUH, Hyderabad and working as an Associate Professor in the Dept of CSE GNIT, Hyderabad.