CLASSIFICATION BASED NOVEL FRAMEWORK FOR NETWORK
TRAFFIC ANALYSIS IN CLOUD COMPUTING
Sourav Debnath1, Vijay Kumar Jha2
1 Student, M. Tech, Department of Information Technology, Birla Institute of Technology, Mesra
2 Associate Professor, Department of Information Technology, Birla Institute of Technology, Mesra
ABSTRACT:
The need for network traffic classification schemes grows day by day, and the field has attracted many researchers. In recent years, many new machine learning algorithms have been developed to analyze network traffic and to build network traffic classifiers. This paper proposes a novel framework for network traffic analysis in cloud computing. In the cloud, network traffic data are collected and stored in a cloud database, and a machine learning system is built on top of them. Network traffic data are sent to a classification machine when they are labelled and to a clustering machine when they are unlabelled. The design of the system is scalable, and the system can also be controlled remotely.
Keywords: Network Traffic Analysis, Classification, Data Mining, Cloud Computing
[1] INTRODUCTION
Recently, cloud network traffic classification has attracted many researchers. Traffic classification schemes play a crucial role in cloud network traffic management. An effective decision policy for cloud network traffic is required to secure the cloud computing network. Supervised, semi-supervised and unsupervised machine learning algorithms are employed in cloud network traffic classification, and machine learning based classification schemes can achieve high accuracy.
Single-classifier architectures are the most popular in network traffic classification. To achieve effective cloud network traffic classification, a new architecture is required in which each classifier operates independently. A model built for cloud network traffic classification is not easy to maintain; for that reason, we need a model that can incorporate all the rules of the classification scheme. On cloud computing platforms, network traffic classification is more scalable and flexible, and parallel classification improves the accuracy of the model.
In cloud computing, a database is required to collect all the information, and a machine learning based training system is adopted to classify the cloud network traffic. Cloud based network analysis systems can be very scalable. The learner and classifier machines gather cloud network traffic information in databases, and in this way we get the opportunity to identify the cloud network application traffic.
The main contribution of this study is the following:
We present a basic framework that supports the analysis and classification of cloud network traffic flows.
[2] RELATED WORK
Cloud network traffic flows require a proper classification model by which we can classify the applications. Nowadays, cloud network traffic flows are encrypted for security reasons. Still, the threat level is not decreasing. We need a modern machine learning based system that can identify any kind of unauthorized access or security breach. Machine learning algorithms have been applied to the attributes captured from cloud network traffic flows. Naïve Bayesian classification and Fast Correlation-Based Feature Selection [1] help to select the required features of the network traffic flow. The Support Vector Machine [2] performs well in classifying internet traffic flows.
Cloud network traffic flows can be identified according to the different behavior of different categories of applications at the social, functional and application levels [3]. The classification is not based only on port numbers; it also captures the full payload of the packets to classify the cloud application services.
In recent years, real-time network traffic monitoring and classification has drawn the attention of many researchers, who have created traffic identification methods using supervised, unsupervised and semi-supervised machine learning algorithms. A supervised machine learning algorithm works on labelled training data: it analyses the training data to produce a function for mapping new data. An unsupervised machine learning algorithm tries to find the hidden structure of unlabelled data. A semi-supervised machine learning algorithm also uses unlabelled data for training, typically a small amount of labelled data together with a large amount of unlabelled data. Both labelled and unlabelled data instances are included in the training set, and the labelled instances are used to assign classes to the model built from all of them. The authors of [4] applied semi-supervised machine learning and layer modelling with 11 flow size attributes, which helped to achieve more than 95% accuracy at category level.
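The semi-supervised idea described above can be illustrated with a minimal cluster-then-label sketch. It is not the method of [4]; the two flow features, the sample values, the class names and the use of plain k-means with fixed initial centroids are all assumptions chosen to keep the example small and deterministic.

```python
import math
from collections import Counter

def nearest(point, centroids):
    """Return the index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

def kmeans(points, centroids, iterations=10):
    """Plain k-means; initial centroids are passed in so the run is deterministic."""
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Recompute each centroid as the mean of its members; keep it if the cluster is empty.
        centroids = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else c
            for g, c in zip(groups, centroids)
        ]
    return centroids

def label_clusters(centroids, labelled):
    """Give each cluster the majority label of the labelled flows that fall into it."""
    votes = [Counter() for _ in centroids]
    for features, label in labelled:
        votes[nearest(features, centroids)][label] += 1
    return [v.most_common(1)[0][0] if v else "unknown" for v in votes]

# Hypothetical flows: (mean packet size in bytes, mean inter-arrival time in ms).
labelled = [((1400.0, 2.0), "bulk"), ((90.0, 40.0), "interactive")]
unlabelled = [(1350.0, 3.0), (1420.0, 1.5), (100.0, 35.0), (80.0, 45.0)]

points = [f for f, _ in labelled] + unlabelled
centroids = kmeans(points, centroids=[points[0], points[1]])
cluster_labels = label_clusters(centroids, labelled)

def classify(flow):
    """Assign a new flow the label of its nearest cluster."""
    return cluster_labels[nearest(flow, centroids)]

print(classify((1300.0, 2.5)))
```

Only two labelled flows are needed here because the unlabelled flows shape the clusters; the labelled flows merely name them.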
Each flow has a certain set of flow attributes or characteristics from the perspective of the application layer [6]. The discriminating ability of the flow attributes was tested with different machine learning algorithms. In protocol-level identification, the method works better than existing methods for real-time traffic classification, and it can also identify encrypted protocols. The authors of [5] proposed a method that can characterize both TCP and UDP flows from an application-level perspective.
Classification must be dynamic today because, in cloud computing, applications do not use a fixed port. Many applications use dynamic port numbers, which means that they change the port numbers they use over time, and decoding the protocols used by the applications requires a large amount of computing resources [7]. Protocol decoding also has the disadvantage that it is not feasible for unknown or encrypted protocols. Feature selection is used to obtain an optimal feature set and to determine the influence of different features.
Online traffic classification is attracting more attention. An automatic online traffic classification architecture helps to classify network flows using unsupervised techniques, and its evaluation demonstrates the capability and efficiency of the automated classification architecture [8].
The semi-supervised learning algorithm for the classification of cloud computing network traffic flows uses flow statistics to classify traffic. It allows a classifier to be designed from training data consisting of labelled flows and many unlabelled flows. Clustering is applied first, and the labelled flows are then used to label the clusters [9]. The applications of the cloud network can be identified by traffic classification on the cloud network, and machine learning based application identification helps to increase the classification accuracy [10]. Real-time traffic classification can help to identify the traffic generated by internet applications and also to identify any illegitimate pattern in the network traffic [11]. The use of machine learning algorithms is gaining popularity due to their widespread availability and their relatively straightforward application to internet traffic. The authors of [12] introduce AdaBoost Dynamic with Logistic Function, an extension of AdaBoost that combines various classifiers to improve the final hypothesis. The hierarchical classifier model helps to accelerate network training without loss of accuracy [13]. In a cloud computing network environment, a classification tool is essential to analyse networks and maintain system security. Flows generated by unknown applications are classified so that unknown flows can be detected [14]. Associative classifiers can solve traffic statistics based classification problems. These associative classification algorithms are Classification Based on Association, Classification based on Multiple Association Rules, and Classification based on Predictive Association Rules, which have been applied to a real dataset to analyse internet traffic [15].
The data rates of Internet links are increasing with the help of optical technology, so the packets of high-speed networks need to be classified very quickly. Different packet classification schemes have been developed, but they require a certain number of memory accesses, which becomes a bottleneck when classification is complex and the memory is slow. The authors of [16] use fast memory, such as the cache in computer systems, to support packet classification schemes. This scheme allows faster and smaller memories to be used and also reduces the number of memory accesses needed to classify the packets. Deep Packet Inspection, which analyses the contents of the captured packets, is gaining popularity within the traffic classification field. It was devised to address several issues associated with port based and statistical classification approaches and to achieve accurate and timely traffic classification [17]. To improve the performance of traffic classification systems, an analysis of existing traffic classification techniques is presented in [18].
The authors of [19] proposed a novel approach to the accurate classification of P2P traffic at a fine-grained level, which depends solely on the number of special flows during small time intervals. These special flows, called clustering flows, are defined as the most frequent and steady flows generated by P2P applications, and an application is identified by detecting the appearance of the corresponding clustering flows. The experimental results showed that this approach can classify P2P applications correctly with an average true positive rate above 98% and a negligible false positive rate of 0.01%. Accurate identification and classification of network traffic according to application type is an important element of many network management tasks. P2P network traffic can be classified according to application type using the statistical characteristics of the network traffic. The work in [20] addresses the traffic classification problem for four P2P application types, namely BitTorrent, PPLive, Skype and MSN, with a classification framework based on the Support Vector Machine. The authors of [21] implemented a network traffic classification model based on the Gaussian Mixture Model-Hidden Markov Model using packet-level properties of network traffic flows.
Attention has also turned to effective early detection schemes that distinguish Distributed Denial of Service (DDoS) attack traffic from normal flash crowd traffic. The authors of [22] proposed a reliable identification model for flash crowds and DDoS attacks: a Probabilistic Neural Network based traffic pattern classification method that separates attack traffic from legitimate traffic. The technique uses the normal traffic profile, which consists of single and joint distributions of various packet attributes, for the classification process. The method is reported to achieve the highest classification accuracy for DDoS flooding attacks with a false positive rate of less than 1%. Detecting network intrusions with data mining and machine learning techniques often faces the problem of a very large training dataset. The authors of [23] demonstrate that a simple active learning procedure can dramatically reduce the size of the training data without significantly sacrificing the classification accuracy of the intrusion detection model. A comparison of the actively trained neural network model with a C4.5 decision tree indicated that the actively learned model had better generalization accuracy.
All packets belonging to the same flow obey pre-defined rules and are processed in a similar manner by the router. The authors of [24] survey various packet classification algorithms for packet filtering, policy routing, accounting and billing, traffic rate limiting, traffic shaping, etc. Multiclass classification is a very important technique, and genetic algorithms have proven effective for selecting features for multiclass classification by support vector machines. The authors of [25] introduced a hybrid method for network traffic classification that combines port-based, signature string matching, regular expression matching and machine learning methods, and achieves high-speed and accurate traffic classification. A new semi-supervised method has been presented to improve traffic classification performance when very little supervised data is available. At the training stage, it uses flow correlation to extend the supervised data set by automatically labelling unlabelled flows according to their correlation with the pre-labelled flows. Consequently, the traffic classifier performs better due to the extended size and quality of the supervised data set. At the testing stage, correlated flows are identified and classified jointly by combining their individual predictions, which boosts the classification accuracy [26].
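The joint classification of correlated flows mentioned at the end of this survey can be sketched as a majority vote. This is only an illustration under stated assumptions, not the algorithm of [26]: flows are treated as correlated when they share destination IP, destination port and protocol, and the `predict` function, field names and sample flows are hypothetical.

```python
from collections import Counter, defaultdict

def compound_classify(flows, predict):
    """Classify correlated flows jointly by majority vote over individual predictions.

    Assumption: flows sharing (dst_ip, dst_port, proto) are correlated.
    """
    groups = defaultdict(list)
    for flow in flows:
        groups[(flow["dst_ip"], flow["dst_port"], flow["proto"])].append(flow)

    results = {}
    for members in groups.values():
        votes = Counter(predict(f) for f in members)
        winner = votes.most_common(1)[0][0]
        for f in members:
            results[f["id"]] = winner  # every flow in the group gets the group decision
    return results

# A deliberately weak per-flow predictor based on one hypothetical feature.
def predict(flow):
    return "video" if flow["mean_size"] > 500 else "web"

flows = [
    {"id": 1, "dst_ip": "203.0.113.5", "dst_port": 443, "proto": "tcp", "mean_size": 1200},
    {"id": 2, "dst_ip": "203.0.113.5", "dst_port": 443, "proto": "tcp", "mean_size": 1100},
    {"id": 3, "dst_ip": "203.0.113.5", "dst_port": 443, "proto": "tcp", "mean_size": 300},
    {"id": 4, "dst_ip": "203.0.113.9", "dst_port": 53, "proto": "udp", "mean_size": 80},
]
joint = compound_classify(flows, predict)
print(joint)
```

Flow 3 would be labelled "web" on its own, but the vote of its two correlated flows overrides that individual prediction.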
Cloud computing has been gaining popularity in recent years. Cloud computing software services are very dynamic and can scale up or down easily as required. Based on the above study of cloud network traffic flow classification, a proper distributed framework for cloud network traffic classification is required.
[3] TRAFFIC CLASSIFICATION SERVICES ON CLOUD PLATFORM
A. Architecture of Traffic Classification Service for Cloud Network Classification Service:
Analysing cloud computing network traffic flows for application identification on a cloud platform has several advantages. First, the cloud's network management devices are responsible for the classification and clustering platform that is required for cloud based network traffic flow analysis. The classification schemes need feature selection algorithms to reduce the number of statistical attributes used to classify each flow. The classification schemes decrease the additional overhead on the cloud network management devices. The clustering schemes are adaptable and help to single out useful features that distinguish different groups. The network administrators are able to improve the algorithms for the classification and clustering schemes to enhance the classification or clustering accuracy from time to time. The cloud computing network platform is a distributed platform, which scales better with the size of the training data sets and concurrently supports more traffic classification requests.
In the early days of cloud computing network services, classification was based on the port number of the transport layer. Classification based only on the port number supplies very limited information. A more reliable model needs more information to classify the cloud network traffic flow. For that reason, payload based cloud network traffic classification is required.
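The limitation of port based classification can be made concrete with a small sketch. The port table lists a few well-known port assignments; the sample flows and their `true_app` labels are hypothetical and only serve to show where a pure port lookup fails.

```python
# A few well-known destination ports and the applications usually associated with them.
WELL_KNOWN_PORTS = {22: "ssh", 25: "smtp", 53: "dns", 80: "http", 443: "https"}

def classify_by_port(dst_port):
    """Port-only classification: anything outside the table is unknown."""
    return WELL_KNOWN_PORTS.get(dst_port, "unknown")

# Hypothetical flows with their real application for comparison.
flows = [
    {"dst_port": 443, "true_app": "https"},
    {"dst_port": 51413, "true_app": "p2p"},  # dynamic port: the lookup finds nothing
    {"dst_port": 80, "true_app": "p2p"},     # application hiding behind port 80
]

results = [(classify_by_port(f["dst_port"]), f["true_app"]) for f in flows]
correct = sum(predicted == actual for predicted, actual in results)
print(f"{correct} of {len(flows)} flows classified correctly by port alone")
```

The second flow is missed and the third is misclassified, which is why the framework looks beyond the port number.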
The cloud based network traffic classification service consists of several parts:
1. Learner Machine
2. Classifier Machine
3. Learner Machine Management Server
4. Classifier Machine Management Server
5. Cloud Computing User
6. Network Management Devices
7. Network Administrator
Learner Machine: The learner machine is associated with training and testing on the data sets. The cloud computing network administrators select supervised, unsupervised or semi-supervised machine learning algorithms to build the model.
Classifier Machine: The classifier machine receives classification requests from various network devices and classifies the cloud network traffic flows.
Learner Machine Management Server: The learner machine management servers are responsible for supervising the learner machines according to the computing resources required. A request for the training service is sent to the Learner Machine Management Server, which then assigns the required number of Learner Machines to carry out the training tasks.
Classifier Machine Management Server: The classifier machine management servers are responsible for monitoring the classifier machines needed to classify the data set. The network devices send the request for a classifier machine to the classifier machine management server. The classifier machine management server selects one or more classifier machines to handle the classification request and sends a response to the network device along with the IP address of the classifier machine. After that, the network devices can send the classification request to the respective classifier machine.
Cloud Computing User: The cloud computing users are those who use the applications of the cloud computing service. To secure the cloud computing network, we require a classification framework that can classify the network traffic flow. The system gathers information about the network traffic flow of the cloud computing network. A client can be identified by IP address and cloud computing user id; the user id is also bound to the application id, which can be fetched from the IP packet header.
Network Management Devices: Network Management Devices are responsible for taking instructions from the classifier management server and then sending the classification requests to the classifier machines; the updated classifier data set is stored in the database.
Network Administrator: Network administrators are responsible for monitoring the whole network, which they can do by checking each network component.
The overall structure of the classification based novel framework for network analysis in cloud computing is shown in Figure 1.
Figure: 1. Basic architecture of traffic classification service on cloud computing platform
B. The Training Service:
Cloud computing network traffic analysis needs a two-step service: training and classification. First, the system has to be trained. To do that, the cloud network management devices send labelled data to the training system.
First, the network management devices send requests for the training service to the learner management server. The learner management server checks the status of the learner machines and selects one or more learner machines according to the load of the training request. It then sends the IP address, port number and learner machine id to the network management device. Next, the network management device sends a request to the learner machine and checks the learner machine id against the learner machine in order to authenticate it. After the learner machine has been authenticated, the network management device sends the training data to the learner machine or machines for the training phase. When the learner machine finishes its learning phase, the classification rules are generated and stored in databases. Figure 2 shows the training service system.
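The training handshake described above can be sketched in a few lines. Everything here is an illustrative assumption rather than the paper's implementation: the class and function names, the least-loaded selection policy, the `connect` callback that stands in for a network connection, and the use of one centroid per application as a "classification rule".

```python
from collections import defaultdict

class LearnerMachine:
    def __init__(self, learner_id, ip, port, load=0):
        self.learner_id, self.ip, self.port, self.load = learner_id, ip, port, load

    def train(self, samples, rule_db):
        """Derive one centroid per label and store it in the rule database."""
        grouped = defaultdict(list)
        for features, label in samples:
            grouped[label].append(features)
        for label, rows in grouped.items():
            rule_db[label] = tuple(sum(col) / len(rows) for col in zip(*rows))

class LearnerManagementServer:
    def __init__(self, learners):
        self.learners = learners

    def request_training(self):
        """Select the least-loaded learner and return its IP, port and id."""
        learner = min(self.learners, key=lambda m: m.load)
        learner.load += 1
        return learner.ip, learner.port, learner.learner_id

def run_training(server, connect, samples, rule_db):
    """Network management device side of the training service."""
    ip, port, expected_id = server.request_training()
    learner = connect(ip, port)                # stands in for opening a connection
    if learner.learner_id != expected_id:      # authenticate the learner by its id
        raise PermissionError("learner id mismatch")
    learner.train(samples, rule_db)
    return expected_id

# Hypothetical deployment; addresses come from a documentation-only range.
learners = [LearnerMachine("L1", "192.0.2.10", 9000, load=3),
            LearnerMachine("L2", "192.0.2.11", 9000, load=1)]
directory = {(m.ip, m.port): m for m in learners}
rule_db = {}
samples = [((1400.0, 2.0), "bulk"), ((1300.0, 4.0), "bulk"),
           ((100.0, 40.0), "interactive"), ((80.0, 50.0), "interactive")]

used = run_training(LearnerManagementServer(learners),
                    lambda ip, port: directory[(ip, port)], samples, rule_db)
print(used, rule_db)
```

Keeping the rule database as a separate object mirrors the text: the learner writes the rules once, and other machines can read them later.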
Figure: 2. Training Service
C. The Classification Service:
After the training service is completed, the classification service starts. For the classification service, the network management devices send unlabelled data to the classification system. The basic operation of the classification service is described below.
First, the network management devices capture a new data flow from a user of the cloud computing network. The network management devices then send a classification request to the classification management server. After the classification management server receives the request, it checks the status of the classification machines and assigns one or more of them to the classification service. The classification management server then sends the IP address and port number of the assigned classification machines to the network management devices. Next, the network management devices send the new network traffic data to the classification machines. Finally, the classification machines load the classification rules from the database, complete the classification of the new cloud computing network traffic data, and send the classification result to the administrator database and to their local database. Figure 3 shows the classification service system.
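The last step of the classification service, in which a classification machine loads the rules, classifies a flow and records the result twice, can be sketched as follows. The nearest-centroid rule format, the in-memory stand-ins for the databases and all names are assumptions made for illustration.

```python
import math

class ClassifierMachine:
    def __init__(self, rule_db, admin_db):
        self.rule_db = rule_db    # rule database filled by the training service
        self.admin_db = admin_db  # stands in for the administrator database
        self.local_db = []        # the classifier's own local database

    def classify(self, flow_id, features):
        """Load the rules, pick the nearest centroid and record the result."""
        rules = dict(self.rule_db)  # load the current classification rules
        label = min(rules, key=lambda name: math.dist(features, rules[name]))
        record = (flow_id, label)
        self.admin_db.append(record)
        self.local_db.append(record)
        return label

# Hypothetical rules: one centroid of (mean packet size, mean inter-arrival time) per class.
rule_db = {"bulk": (1350.0, 3.0), "interactive": (90.0, 45.0)}
admin_db = []
machine = ClassifierMachine(rule_db, admin_db)

first = machine.classify("flow-1", (1200.0, 5.0))
second = machine.classify("flow-2", (120.0, 30.0))
print(first, second, admin_db)
```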
Figure: 3. Classification Service
D. Feature Selection for Classification:
Features are required for cloud computing network traffic flow classification. To obtain higher classification accuracy, we select attributes from the IP packet header. An optimal set of flow features decreases the cost of the classification service and also minimizes the amount of memory required for storing information about each flow. Feature selection plays a great role in the classification of cloud computing network traffic flows, which has high computation and memory consumption. Flow feature selection can increase classification accuracy and also reduce the learning and classification times.
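The text does not name a specific feature selection algorithm, so the sketch below uses information gain ranking as one common choice. The discrete features, their values and the class labels are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(values, labels):
    """Entropy reduction obtained by splitting the labels on a discrete feature."""
    total = len(labels)
    remainder = 0.0
    for v in set(values):
        subset = [label for x, label in zip(values, labels) if x == v]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder

def select_features(rows, labels, k):
    """Return the k feature names with the highest information gain."""
    gains = {name: information_gain([r[name] for r in rows], labels) for name in rows[0]}
    return sorted(gains, key=gains.get, reverse=True)[:k]

# Hypothetical flows described by discretised header-derived attributes.
rows = [
    {"proto": "tcp", "size": "large", "ttl": "high"},
    {"proto": "tcp", "size": "large", "ttl": "low"},
    {"proto": "udp", "size": "small", "ttl": "high"},
    {"proto": "udp", "size": "large", "ttl": "low"},
]
labels = ["web", "web", "dns", "dns"]

selected = select_features(rows, labels, k=2)
print(selected)
```

In this toy data the `ttl` attribute carries no information about the class, so dropping it reduces per-flow storage without affecting accuracy.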
E. Packet Decode Module, Management Server Control and the Process of the Training and Classification Modules:
The cloud computing network management devices have a packet decode module to decode the IP packet header. The packet decode module decodes the IP packet header and passes the result to the flow process module, which is responsible for computing the required flow attributes and managing a table that stores them. The flow process module then sends the necessary field information to the learner machine or the classifier machine. The management servers send the IP address and port number of the learner or classifier machine that provides the selected service.
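A minimal sketch of the packet decode module and the flow process module follows. It decodes only the fixed 20-byte part of an IPv4 header and keys the flow table on source address, destination address and protocol; the choice of attributes (packet count, byte count, mean packet size) and the helper `build_header` are assumptions for illustration.

```python
import socket
import struct

def decode_ipv4_header(packet):
    """Decode the fixed 20-byte part of an IPv4 header (options are ignored)."""
    (ver_ihl, _tos, total_len, _ident, _flags_frag,
     ttl, proto, _checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", packet[:20])
    return {
        "version": ver_ihl >> 4,
        "header_len": (ver_ihl & 0x0F) * 4,
        "total_len": total_len,
        "ttl": ttl,
        "proto": proto,
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
    }

class FlowTable:
    """Flow process module: keeps running attributes per flow."""
    def __init__(self):
        self.flows = {}

    def update(self, header):
        key = (header["src"], header["dst"], header["proto"])
        entry = self.flows.setdefault(key, {"packets": 0, "bytes": 0})
        entry["packets"] += 1
        entry["bytes"] += header["total_len"]
        return key

    def attributes(self, key):
        entry = self.flows[key]
        return {**entry, "mean_size": entry["bytes"] / entry["packets"]}

def build_header(total_len, src, dst, proto=6, ttl=64):
    """Build a test header; the checksum is left at 0 because it is not validated here."""
    return struct.pack("!BBHHHBBH4s4s", 0x45, 0, total_len, 0, 0, ttl, proto, 0,
                       socket.inet_aton(src), socket.inet_aton(dst))

table = FlowTable()
for size in (60, 100):
    header = decode_ipv4_header(build_header(size, "192.0.2.1", "198.51.100.7"))
    key = table.update(header)
print(key, table.attributes(key))
```

Port numbers live in the TCP or UDP header, so a fuller version would decode the transport header as well before building the flow key.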
The main job of the learner/classifier management server is to receive training or classification requests from the cloud network management devices and to send control messages to them; the network management devices in turn receive the response message from the management server.
The training and classification algorithms are chosen according to the requirements of the training service and the classification service. The classifier machine stores the classification rules in its database. Whenever the classifier machine has to classify a new cloud network traffic flow, it fetches the rules from the database.
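Storing and fetching classification rules can be sketched with SQLite from the Python standard library. The paper does not specify a database or a rule format, so the table layout, the JSON encoding of each rule and the function names are assumptions.

```python
import json
import sqlite3

def open_rule_store(path=":memory:"):
    """Open (or create) the rule database; an in-memory database is used by default."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS rules (label TEXT PRIMARY KEY, centroid TEXT NOT NULL)"
    )
    return conn

def store_rules(conn, rules):
    """Insert or overwrite one row per application label."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO rules (label, centroid) VALUES (?, ?)",
            [(label, json.dumps(centroid)) for label, centroid in rules.items()],
        )

def fetch_rules(conn):
    """Load all rules back into a dictionary keyed by label."""
    rows = conn.execute("SELECT label, centroid FROM rules ORDER BY label")
    return {label: tuple(json.loads(centroid)) for label, centroid in rows}

conn = open_rule_store()
store_rules(conn, {"bulk": (1350.0, 3.0), "interactive": (90.0, 45.0)})
store_rules(conn, {"bulk": (1400.0, 2.5)})  # a later training run updates one rule
loaded = fetch_rules(conn)
print(loaded)
```

Using the label as primary key means a retraining run replaces stale rules instead of accumulating duplicates.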
[4] CONCLUSION
We proposed a new framework for a cloud computing network traffic classification system. The framework includes a training service for network traffic flows and a classification service for new traffic flows. The system is scalable: components can be added or removed as required. The framework is designed so that it gathers information about the users of the cloud computing application service from the packet header and sends selected attributes of the packet header to the classification machine for classification. Here, we focused on a payload based classification system. For load sharing, we used management servers and multiple classification and training machines. The framework gathers information from the IP packets of the user and sends it for training. After training, the information of each new network flow is sent for classification. Our future work will be to implement a more scalable and secure classifier architecture for the cloud computing network and a traffic classification algorithm that fits this architecture.
REFERENCES
[1] A.W. Moore, D. Zuev, Internet traffic classification using Bayesian analysis techniques, in: Proc. ACM SIGMETRICS international conference on Measurement and modeling of computer systems, Banff, Alberta, Canada, 2005, pp. 50-60.
[2] Z. Li, R. Yuan, X. Guan, Accurate Classification of the Internet Traffic Based on the SVM Method, in: Proc. IEEE Int. Conference on Communications (ICC '07), Glasgow, Scotland, 2007, pp. 1373-1378
[3] T. Karagiannis, K. Papagiannaki, M. Faloutsos, BLINC: multilevel traffic classification in the dark, in: Proc. 2005 conference on Applications, technologies, architectures, and protocols for computer communications, pp. 229-240.
[4] J. Erman, A. Mahanti, M. Arlitt, I. Cohen, C. Williamson, Offline/realtime traffic classification using semi-supervised learning, Performance Evaluation, 64(9-12) (2007), pp. 1194-1213.
[5] N.-F. Huang, G.-Y. Jai, H.-C. Chao, Early Identifying Application Traffic with Application Characteristics, in: Proc. IEEE Int. Conference on Communications (IEEE ICC '08), Beijing, China, 2008, pp. 5788-5792.
[6] S. Zander, T. Nguyen, G. Armitage, Automated traffic classification and application identification using machine learning, in: Local Computer Networks, 2005. 30th Anniversary, Sydney, NSW, pp. 250-257.
[8] Jian Zhang, Zongjue Qian, Guochu Shou, Yihong Hu, Online automatic traffic classification architecture in access network, in: Electronic Measurement & Instruments, 2009. ICEMI '09. 9th International Conference, pp. 3-24 - 3-29.
[9] A. Shrivastav, A. Tiwari, Network Traffic Classification Using Semi-Supervised Approach, in: Machine Learning and Computing (ICMLC), 2010 Second International Conference, pp. 345-349.
[10] Nen-Fu Huang, Gin-Yuan Jai, Chih-Hao Chen, Han-Chieh Chao, On the cloud-based network traffic classification and applications identification service, in: Mobile and Wireless Networking (iCOST), 2012 International Conference, pp. 36-41.
[11] E. Rocha, P. Salvador, A. Nogueira, A real-time traffic classification approach, in: Internet Technology and Secured Transactions (ICITST), 2011 International Conference, pp. 620-626.
[12] E.N. de Souza, S. Matwin, S. Fernandes, Network traffic classification using AdaBoost Dynamic, in: Communications Workshops (ICC), 2013 IEEE International Conference, pp. 1319-1324.
[13] D.A. Stacey, R. Farshad, A probabilistic self-organizing classification neural network architecture, in: Neural Networks, 1999. IJCNN '99. International Joint Conference, vol. 6, pp. 4059-4063.
[14] Jun Zhang, Chao Chen, Yang Xiang, Wanlei Zhou, An Effective Network Traffic Classification Method with Unknown Flow Detection, in: Network and Service Management, IEEE Transactions, pp. 133-147.
[15] Long Li, K. Kianmehr, Internet traffic classification based on associative classifiers, in: Cyber Technology in Automation, Control, and Intelligent Systems (CYBER), 2012 IEEE International Conference, pp. 263-268.
[16] N.B. Guinde, R. Rojas-Cessa, S.G. Ziavras, Packet classification using rule caching, in: Information, Intelligence, Systems and Applications (IISA), 2013 Fourth International Conference, pp. 1-6.
[17] Michael Finsterbusch, Chris Richter, Eduardo Rocha, Jean-Alexander Muller, Klaus Hanssgen, A Survey of Payload-Based Traffic Classification Approaches, in: Communications Surveys & Tutorials, IEEE, vol. 16, no. 2, pp. 1135-1156.
[18] Yibo Xue, Dawei Wang, Luoshi Zhang, Traffic classification: Issues and challenges, in: Computing, Networking and Communications (ICNC), 2013 International Conference, pp. 545-549.
[19] He Jie, Yang Yuexiang, Qiao Yong, Tang Chuan, Accurate classification of P2P traffic by clustering flows, in: Communications, China, vol. 10, no. 11, pp. 42-51.
[20] Ai-Min Yang, Sheng-Yi Jiang, He Deng, A P2P Network Traffic Classification Method Using SVM, in: Young Computer Scientists, 2008. ICYCS 2008, pp. 398-403.
[21] Xuefeng Mu, Wenjun Wu, A Parallelized Network Traffic Classification Based on Hidden Markov Model, in: Cyber-Enabled Distributed Computing and Knowledge Discovery (CyberC), 2011 International, pp. 107-112.
[22] V. Akilandeswari, S.M. Shalinie, Probabilistic Neural Network based attack traffic classification, in: Advanced Computing (ICoAC), 2012 Fourth International Conference, pp. 1-8.
[23] N. Seliya, T.M. Khoshgoftaar, Active learning with neural networks for intrusion detection, in: Information Reuse and Integration (IRI), 2010 IEEE International Conference, pp. 49-54.
[24] M. Dixit, B.V. Barbadekar, A.B. Barbadekar, Packet classification algorithms, in: Industrial Electronics, 2009. ISIE 2009. IEEE International Symposium, pp. 1407-1412.
[25] Hui Dong, Guang-Lu Sun, Dan-Dan Li, A hybrid method for network traffic classification, in: Measurement, Information and Control (ICMIC), 2013 International Conference, pp. 653-656.
[26] Jun Zhang, Chao Chen, Yang Xiang, Wanlei Zhou, Semi-supervised and Compound Classification of Network Traffic, in: Distributed Computing Systems Workshops (ICDCSW), 2012 32nd International Conference, pp. 617-621.