Classifying Network Traffic Using DPI And DFI
Argha Ghosh, Dr. A. Senthilrajan
Abstract: Most people now use the Internet, so it has become crowded with network traffic. Hackers and phishers exploit this congestion to carry out their activities unnoticed. Managing this volume of traffic is a demanding task, so techniques are needed to check whether incoming traffic is malicious. There are three main types of network traffic identification methods: Port Matching, Deep Packet Inspection, and Machine Learning. Port matching is the simplest of the three and was mainly used in the past. Deep Packet Inspection (DPI) is mainly used on high-speed networks to identify network traffic, and the governments of some countries, such as Egypt and China, use DPI for better network traffic identification. Machine Learning is mainly used to identify modern-day network traffic, and it offers several classification algorithms such as Bayesian identification, Support Vector Machine (SVM), and C4.5. This paper proposes a network traffic identification approach using Deep Packet Inspection and Deep Flow Inspection. Besides the identification methods mentioned above, this paper also focuses on P2P traffic identification, because nowadays almost 60%-80% of traffic is P2P traffic.
Index Terms: Deep Packet Inspection, Deep Flow Inspection, Machine Learning, Network Traffic Identification, Port Matching, P2P Traffic Identification.
————————————————————
1 INTRODUCTION
The Internet grows more crowded day by day because most people use it, and daily life now feels incomplete without it; as a result, network traffic keeps increasing. Most people want network traffic to be identified quickly so that ordinary users can continue surfing, browsing, and using Internet services without delay. Network traffic identification is useful for determining the sender’s protocol (WWW, FTP, P2P, etc.), the sender’s address and port, the receiver’s address and port, and the size of the payload or of the packets in the queue, as well as for identifying the payload content and the application. It also makes it possible to check whether anything was changed between server and client, or between sender and receiver, and whether any anonymous activity occurred in between; if malicious activity is found, transmission of the payload or packet queue can be stopped before it reaches the client or receiver. Three methods are currently common for network traffic identification: Port Matching, Deep Packet Inspection, and Machine Learning. We describe all three briefly in Section IV. Moreover, P2P traffic now occupies 60%-80% of Internet bandwidth. Peer-to-Peer (P2P) traffic is mainly generated by distributed applications such as Skype, BitTorrent, Gnutella, eDonkey2000, QQLive, and FastTrack. This paper proposes an architecture to identify P2P traffic. The rest of the paper is organized as follows. Section II surveys the literature on previously used techniques for detecting P2P traffic. Section III presents DPI and DFI along with their differences, characteristics, abilities, and relative advantages. Section IV describes the three methods of network traffic identification: Port Matching, Deep Packet Inspection, and Machine Learning. Section V presents the proposed architecture for identifying P2P traffic using DPI and DFI. Section VI provides an evaluation of the proposed architecture, and Section VII concludes the paper with future work.
2 RELATED WORK
Bowen Yang et al. [1] proposed an architecture for identifying network traffic that combines Deep Packet Inspection and Machine Learning in a single framework. Liu Zhenxiang et al. [2] proposed a model for identifying P2P traffic in which a recognizer is built using the Naive Bayes machine learning algorithm. Chunzhi Wang et al. [3] proposed a logical view of DPI and DFI that includes four modules for DPI and DFI traffic identification, together with a concurrent view of DPI and DFI. Hongwei Chen et al. [4] proposed a P2P traffic identification model based on DPI and DFI. They compare the Library of DPI Feature with the Library of DPI Method, and the Library of DFI Feature with the Library of DFI Method, and propose a coordinate module between the DPI Module and the DFI Module for identifying P2P traffic. Hongwei Chen et al. [5] presented an algorithm comparison for P2P traffic identification based on Deep Packet Inspection, comparing matching algorithms such as the Aho-Corasick (AC) algorithm, the Wu-Manber algorithm, and the Set Backward Oracle Matching (SBOM) algorithm. Zeba Atique Shaikh et al. [6] provide an overview of network traffic classification methods such as Payload-Based Traffic Classification, Deep Packet Inspection, and Cisco Classification Technologies, and then present an approach to traffic classification based on Naïve Bayesian and Bayesian Neural Network methods. Jingyu Wang et al. [7] analyze the characteristics of P2P traffic, present a traffic identification algorithm, and evaluate its performance on P2P applications such as eMule, pplive, and kugoo. Song Yang et al. [8] proposed a traffic flow model for optical network traffic based on content identification and provide an analysis of that model. Fereshte Dehghani et al. [9] proposed a traffic classification model that uses a Bayesian algorithm for real-time traffic classification based on statistical and payload content features.
————————————————
Argha Ghosh is currently pursuing Doctor of Philosophy (Ph. D) in computer science in Alagappa University, India, PH-+918145232677. E-mail: [email protected]
3 DEEP PACKET INSPECTION AND DEEP FLOW INSPECTION
Deep Packet Inspection (DPI) is a real-time network filtering and Internet traffic analysis technology that mainly operates on high-speed network connections. DPI can be implemented at the application layer of the Open Systems Interconnection (OSI) model. It is called “deep” inspection because the inspection not only includes the packet headers but also covers the packet payloads [10]. DPI technologies are intended to allow network operators precisely to identify the origin and content of each packet of data that passes through the networking hubs [11]. DPI can identify the packet content and packet ID. A classical algorithm for decades, string matching has recently proven useful for deep packet inspection (DPI) to detect intrusions, scan for viruses, and filter Internet content [12]. DPI filters network traffic by examining the signature of the packet payload, either with string matching algorithms such as Wu-Manber, Aho-Corasick, and SBOM, or with regular expression matching algorithms, which are used in NIDS such as Snort and Bro and in L7-filter on Linux [13]. DPI uses two approaches to collect data packets: Port Mirroring and Optical Splitter. Port mirroring, also known as Switched Port Analyzer (SPAN), is mainly used to monitor network traffic; it can monitor every incoming packet on one port of a network. An optical splitter collects packet information and sends it to the network manager. To improve programmability and re-configurability, the hardware intrusive detection system is using network processor (NP) to perform pattern search using deep packet inspection [14]. DPI is able to detect protocols and applications using three methods: Port Detection, Signature Detection, and Heuristics Detection.
Other characteristics of a high-performance DPI system include flow-based detection (for TCP, UDP and WAP), support for IPv4 and IPv6, TCP/IP normalization and reassembly, and rules-based metadata extraction [15]. In most applications, DPI uses the Signature Detection approach, matching signatures through automaton-based pattern matching. Traditional packet forwarding systems such as Shallow Packet Inspection (SPI) and Medium Packet Inspection (MPI) cannot inspect the Data section of a packet, whereas DPI can. Moreover, SPI analyzes the Physical and Data Link layers of the OSI reference model, and MPI performs detection on the Transport, Network, Data Link, and Physical layers, whereas DPI is able to monitor all layers of the OSI reference model. Deep Flow Inspection (DFI) is also used for identifying network traffic. DFI cannot match the accuracy of DPI, but it is still effective, and both DPI and DFI are promising for identifying network traffic. Traffic identification means classifying through a series of features of flow [5]. DFI mainly uses network flow features such as TBF (Total Bytes of Flow), PCF (Packet Count of Flow), DF (Duration of each Flow), and APBF (Average Packet Bytes of Flow).
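The flow features named above can be computed directly from captured packet records. The following is a minimal sketch, assuming a hypothetical `Packet` record and a direction-agnostic 5-tuple as the flow key; neither the record layout nor the sample values come from the paper.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    """Hypothetical packet record; field names are illustrative only."""
    timestamp: float  # capture time in seconds
    src: str
    sport: int
    dst: str
    dport: int
    proto: str
    length: int       # packet size in bytes


def flow_key(p):
    # Sort the two endpoints so both directions map to the same flow.
    a, b = sorted([(p.src, p.sport), (p.dst, p.dport)])
    return (a, b, p.proto)


def flow_features(packets):
    """Group packets into flows and compute TBF, PCF, DF and APBF per flow."""
    flows = defaultdict(list)
    for p in packets:
        flows[flow_key(p)].append(p)

    features = {}
    for key, pkts in flows.items():
        tbf = sum(p.length for p in pkts)       # Total Bytes of Flow
        pcf = len(pkts)                         # Packet Count of Flow
        times = [p.timestamp for p in pkts]
        df = max(times) - min(times)            # Duration of each Flow
        features[key] = {"TBF": tbf, "PCF": pcf, "DF": df,
                         "APBF": tbf / pcf}     # Average Packet Bytes of Flow
    return features


# Made-up sample: three packets of one TCP flow, one packet of another.
sample = [
    Packet(0.0, "10.0.0.1", 50000, "10.0.0.2", 6881, "TCP", 100),
    Packet(0.5, "10.0.0.2", 6881, "10.0.0.1", 50000, "TCP", 200),
    Packet(2.0, "10.0.0.1", 50000, "10.0.0.2", 6881, "TCP", 300),
    Packet(1.0, "10.0.0.3", 40000, "10.0.0.4", 80, "TCP", 60),
]
feats = flow_features(sample)
```

Feature vectors of this kind are what a DFI classifier such as an SVM or a Bayes classifier would consume; the payload is never inspected.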
Just as DPI uses string matching and regular expression matching methods, DFI uses methods such as Support Vector Machine (SVM), Neural Network, Bayes Classifier, and Decision Tree. Intrusion detection, virus scanning, content filtering, instant-messenger management, and peer-to-peer identification can all use string matching for inspection [14]. Malicious behavior detection is generally classified into two levels, packet level and flow level, for which DPI (Deep Packet Inspection) and DFI (Deep Flow Inspection) are representatives [18]. DPI and DFI support each other in identifying network traffic. DPI technology thoroughly reads the contents of the IP packet payload [19].
4 NETWORK TRAFFIC IDENTIFICATION METHODS
The term network traffic identification refers to identifying the incoming network traffic that is generated by network applications (such as WWW, FTP, and P2P) and carried by protocols such as TCP/IP, SMTP, HTTPS, SNMP, FTP, DNS, POP3, Telnet, and IMAP. Three methods are commonly used to identify network traffic: Port Matching, Deep Packet Inspection, and Machine Learning. These methods are discussed in the following subsections.
4.1 Identification Method Based on Port Matching
Port matching is the most basic and straightforward method of network traffic identification. Before DPI and Machine Learning appeared, port matching was the main way to identify network traffic, but it is rarely used for this purpose today. Port matching follows a simple concept: most P2P applications have default ports; BitTorrent, for example, uses ports 6881-6889 (TCP/UDP). When a BitTorrent application runs, it uses these ports to communicate over the Internet, so a network administrator who sees communication on one of these default ports can assume that it is P2P traffic generated by that particular P2P application. A network administrator who does not have a costly Intrusion Detection System (IDS) can therefore still determine the type of network traffic as well as the type of application. However, although port matching is easy compared to DPI and machine learning, it has a major drawback: a user can manually change the port that an application uses, in which case this method will not identify the network traffic correctly. Moreover, newer P2P applications use dynamic ports, which makes port matching ineffective for identifying both the network traffic and the P2P application. In the past, people mostly used HTTP and FTP, which have port numbers 80 and 21 respectively, and for that reason port matching was effective; nowadays it mostly produces inaccurate identification results.
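As an illustration, port matching amounts to a table lookup. The sketch below uses only the ports mentioned in this section (80 for HTTP, 21 for FTP, 6881-6889 for BitTorrent); the function name and the example port numbers passed to it are assumptions made for this example.

```python
# Default ports taken from the text; a real deployment would use a larger table.
DEFAULT_PORTS = {80: "HTTP", 21: "FTP"}
DEFAULT_PORTS.update({port: "BitTorrent" for port in range(6881, 6890)})


def classify_by_port(src_port, dst_port):
    """Return the application whose default port matches, else 'unknown'."""
    for port in (dst_port, src_port):
        if port in DEFAULT_PORTS:
            return DEFAULT_PORTS[port]
    return "unknown"


print(classify_by_port(51000, 6881))   # client using a default BitTorrent port
print(classify_by_port(51000, 80))     # ordinary web request
print(classify_by_port(51000, 51413))  # same client after the user changed the port
```

The last call shows the drawback described above: once the user or the application picks a non-default port, the lookup has nothing to match and the flow is reported as unknown.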
4.2 Identification Method Based on Deep Packet Inspection
In this approach the data packets are inspected for specific protocol signatures in an effort to identify the originating network applications and, as a result, the traffic is scanned at all the OSI levels (2-7) including the headers and the payloads, extracting a packet signature according to a predefined set of rules [20]. Deep Packet Inspection identifies the content of data packets during network interaction or data transmission by means of pattern matching, and identifies the type of application from that content. Deep packet filtering (DPF) plays a crucial role in sophisticated access control for large networks [21]. DPI is not affected by port changes or dynamic ports, which is the main difference between DPI-based and port-based network traffic detection. One of the most frequently performed operations in IDS (DPI) is searching for predefined patterns in the packet payload [22]. DPI can identify network traffic quickly. It works at the packet level and cannot identify encrypted traffic. The basic concept of DPI contains content analysis of the captured packets as well as accurate and timely discrimination of the traffic flows generated by different application programs [23]. DPI mainly performs classification on strings or bit sequences. DPI has two working modes. In online mode, it copies packets from the buffer, identifies them with the DPI Method assigned by the user, and then sends the packets and identification results to the Coordinate Module. In offline mode, it maintains and updates the DPI Feature Database [4]. Furthermore, DPI and its conforming pattern matching algorithms are also important building blocks for other applications in the network such as load balancing and monitoring of network traffic [24].
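Searching a payload for many predefined patterns at once is the task that multi-pattern string matching algorithms such as Aho-Corasick address. The following is a compact sketch of an Aho-Corasick automaton over bytes. The two signatures (the BitTorrent handshake prefix and the Gnutella connect string) are illustrative examples chosen for this sketch, not the feature database of the proposed architecture.

```python
from collections import deque


class AhoCorasick:
    """Multi-pattern matcher: finds every signature in one pass over a payload."""

    def __init__(self, patterns):
        # patterns: dict mapping an application label to a bytes signature
        self.goto = [{}]   # goto[state][byte] -> next state (trie edges)
        self.fail = [0]    # failure link of each state
        self.out = [[]]    # labels recognised on reaching each state
        for label, pattern in patterns.items():
            self._add(label, pattern)
        self._build_failure_links()

    def _add(self, label, pattern):
        state = 0
        for byte in pattern:
            nxt = self.goto[state].get(byte)
            if nxt is None:
                nxt = len(self.goto)
                self.goto.append({})
                self.fail.append(0)
                self.out.append([])
                self.goto[state][byte] = nxt
            state = nxt
        self.out[state].append(label)

    def _build_failure_links(self):
        # Breadth-first traversal; depth-1 states keep the root as failure link.
        queue = deque(self.goto[0].values())
        while queue:
            state = queue.popleft()
            for byte, nxt in self.goto[state].items():
                queue.append(nxt)
                f = self.fail[state]
                while f and byte not in self.goto[f]:
                    f = self.fail[f]
                self.fail[nxt] = self.goto[f].get(byte, 0)
                # Inherit matches that end at the failure state (proper suffixes).
                self.out[nxt] = self.out[nxt] + self.out[self.fail[nxt]]

    def search(self, payload):
        """Return the set of labels whose signature occurs in payload."""
        state, found = 0, set()
        for byte in payload:
            while state and byte not in self.goto[state]:
                state = self.fail[state]
            state = self.goto[state].get(byte, 0)
            found.update(self.out[state])
        return found


# Illustrative signatures only (assumed for this sketch).
SIGNATURES = {
    "BitTorrent": b"\x13BitTorrent protocol",
    "Gnutella": b"GNUTELLA CONNECT",
}
matcher = AhoCorasick(SIGNATURES)
handshake = b"\x13BitTorrent protocol" + b"\x00" * 8
print(matcher.search(handshake))
print(matcher.search(b"GET / HTTP/1.1\r\n"))
```

Because the automaton reads each payload byte once regardless of how many signatures are loaded, the port on which the packet arrived plays no role in the result.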
4.3 Identification Method Based on Machine Learning
Due to different application protocols, network data flow has different characteristics in terms of data flow duration, packet length, packet transmission frequency and packet rate [1]. Internet users run many different kinds of applications, so their network traffic is not alike: packet rate, flow transmission, and packet length all differ from one user to another. Machine Learning provides a collection of techniques to fundamentally adapt to the dynamic behavior [25]. Several machine learning algorithms are available, such as Naive Bayesian classification, Support Vector Machine (SVM), and C4.5; they are applied to packet and traffic characteristics in order to identify network traffic. It is therefore clear that machine learning techniques are used to detect modern-day applications and their network traffic.
Machine Learning algorithms make decisions using the knowledge obtained through feature selection. Automated networks can be pushed further with the help of artificial intelligence and machine learning in addition to monitoring [27]. Automated networks are mainly designed and implemented around use cases.
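To make the idea concrete, the sketch below implements a small Gaussian Naive Bayes classifier in plain Python and trains it on flow-level features. The feature choice (average packet bytes, flow duration in seconds, packet count), the class labels, and all training values are made up for illustration; they are not measurements from this work.

```python
import math
from collections import defaultdict


class GaussianNaiveBayes:
    """Naive Bayes with one Gaussian per feature and class."""

    def fit(self, X, y):
        by_class = defaultdict(list)
        for row, label in zip(X, y):
            by_class[label].append(row)
        self.priors, self.stats = {}, {}
        for label, rows in by_class.items():
            self.priors[label] = len(rows) / len(X)
            self.stats[label] = []
            for column in zip(*rows):
                mean = sum(column) / len(column)
                var = sum((v - mean) ** 2 for v in column) / len(column)
                # Small constant avoids division by zero for constant features.
                self.stats[label].append((mean, var + 1e-9))
        return self

    def _log_posterior(self, row, label):
        score = math.log(self.priors[label])
        for value, (mean, var) in zip(row, self.stats[label]):
            score += (-0.5 * math.log(2 * math.pi * var)
                      - (value - mean) ** 2 / (2 * var))
        return score

    def predict(self, row):
        return max(self.priors, key=lambda label: self._log_posterior(row, label))


# Synthetic training flows: [average packet bytes, duration (s), packet count].
X_train = [
    [1200, 300, 5000], [1100, 250, 4200], [1300, 400, 6100],  # labelled P2P
    [400, 2, 30], [350, 1.5, 22], [500, 3, 45],               # labelled WWW
]
y_train = ["P2P", "P2P", "P2P", "WWW", "WWW", "WWW"]

model = GaussianNaiveBayes().fit(X_train, y_train)
print(model.predict([1150, 280, 4800]))
print(model.predict([420, 2.5, 35]))
```

The classifier never looks at port numbers or payload bytes, which is why this family of methods remains applicable when ports are dynamic.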
4.4 Comparison between Network Traffic Identification Methods
Having discussed all three network traffic identification approaches, it is clear that Port Matching worked mainly in the past, when there were no dynamic ports and users could not set ports manually, whereas DPI is used today on high-speed networks to identify network traffic easily. In fact, deep packet inspection (DPI) is able to accurately classify and control traffic in terms of applications and content [28]. Machine Learning, in turn, is useful for identifying network traffic when users run various kinds of applications at the same time. High-speed network packet forwarding system plays a vital role in many fields, such as packet analysis, virus detection and traffic management [26]. Port matching performs detection at the port level, based on the port number; DPI performs detection at the packet level; and machine learning works on packet characteristics such as packet size, number of packets in the queue, and packet transmission frequency.
5 P2P TRAFFIC CLASSIFICATION BASED ON DPI AND DFI
results with high throughput and minimum latency [27].
The DPI module contains two libraries: DPI Feature and DPI Method. DPI Feature contains the P2P applications and their respective characteristic codes. By checking an incoming data packet against these characteristic codes, the module can detect the P2P application. Consequently, as already discussed, if traffic from a new P2P application passes through the proposed architecture, DPI cannot identify the application’s name, so the proposed architecture is unable to detect new P2P applications. DPI Method contains three string matching algorithms: the Aho-Corasick (AC) algorithm, the Wu-Manber algorithm, and the Set Backward Oracle Matching algorithm. Any of these three string matching algorithms can be applied to match strings for classifying P2P traffic. The modern DPI tools use many different strategies in order to determine the application, protocol, and content like: pattern matching, port numbers, packet sizes [28]. The DFI module also contains two libraries: DFI Feature and DFI Method. DFI Feature contains network flow features. Using network flow features such as DF (Duration of each Flow), APBF (Average Packet Bytes of Flow), TBF (Total Bytes of Flow), and PCF (Packet Count of Flow), DFI classifies network traffic. DFI Method contains four artificial intelligence algorithms: Artificial Neural Network, Support Vector Machines, Decision Tree, and Naive Bayes Classifier. Any of these algorithms can be applied for classifying P2P traffic. As already discussed, DPI cannot identify encrypted traffic, whereas DFI can. For that reason, the proposed architecture couples DFI with DPI to overcome the problem of detecting encrypted traffic. The Identification Result segment coordinates between the DPI and DFI modules and identifies the network traffic based on the results of both. It matches the incoming traffic against the DPI and DFI modules, then makes the decision and produces the result: whether the traffic is P2P traffic and, if not, what kind of traffic it is.
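One possible way to combine the two modules is sketched below. The decision rule (trust a DPI signature match first and fall back to DFI when DPI finds nothing, for example on encrypted payloads) is an assumption made for this illustration, as are the function names, the toy signature, and the thresholds; the text does not prescribe a specific rule.

```python
from typing import Callable, Optional


def identify(payload: bytes, flow_features: dict,
             dpi: Callable[[bytes], Optional[str]],
             dfi: Callable[[dict], Optional[str]]) -> dict:
    """Coordinate DPI and DFI results (assumed rule: DPI first, then DFI)."""
    app = dpi(payload)
    if app is not None:
        return {"label": app, "source": "DPI"}
    label = dfi(flow_features)
    if label is not None:
        return {"label": label, "source": "DFI"}
    return {"label": "unknown", "source": None}


def toy_dpi(payload):
    # Hypothetical stand-in for the DPI Feature library and a DPI Method.
    signatures = {"BitTorrent": b"BitTorrent protocol"}
    for app, signature in signatures.items():
        if signature in payload:
            return app
    return None


def toy_dfi(features):
    # Hypothetical threshold rule standing in for a trained DFI classifier.
    if features["APBF"] > 1000 and features["DF"] > 60:
        return "P2P"
    return None


plain = identify(b"\x13BitTorrent protocol", {"APBF": 900.0, "DF": 5.0},
                 toy_dpi, toy_dfi)
opaque = identify(b"\x8f\x02\xaa", {"APBF": 1300.0, "DF": 240.0},
                  toy_dpi, toy_dfi)
other = identify(b"\x8f\x02\xaa", {"APBF": 200.0, "DF": 1.0},
                 toy_dpi, toy_dfi)
print(plain, opaque, other)
```

Passing the two modules in as callables keeps the coordinating step independent of which string matching algorithm or classifier each module is configured with.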
6 EXPERIMENTAL EVALUATION
We implemented the proposed architecture for classifying network traffic and tried to assess the performance of our proposed model. The proposed design runs successfully for classifying network traffic; we tested it using the Wireshark data-packet analyzer as our reference for classifying the traffic. The proposed design cannot identify traffic from new P2P applications, and it also cannot detect encrypted traffic. In future, we plan to implement the Naive Bayesian Classification method for recognizing encrypted and new P2P traffic, based on the deep packet load method including DPI, DFI and DCI (Deep Content Inspection), because Deep Packet Inspection (DPI) cannot identify encrypted and new P2P traffic.
Fig. 3. Real-Time Traffic Analyzing
7 CONCLUSION AND FUTURE WORK
Network traffic classification is a demanding task because almost everyone uses the Internet nowadays, which makes Internet traffic heavy and makes it difficult to identify network traffic within such a large volume. Here, we proposed a network traffic identification architecture using DPI and DFI; each method has drawbacks relative to the other. DPI and DFI identify network traffic by means of their Library Features and Library Methods. In future, we plan to implement the Naive Bayesian Classification method for recognizing encrypted and new P2P traffic based on the deep packet load method, because Deep Packet Inspection (DPI) cannot identify encrypted and new P2P traffic.
ACKNOWLEDGMENT
We would like to thank Dr. K. Kuppusamy for improving the content of this paper, and we acknowledge the effort of Dr. E. Ramaraj for his guidance. This research work has been written with the financial support of the Rashtriya Uchchatar Shiksha Abhiyan (RUSA - Phase 2.0) grant sanctioned vide Letter No. F.24-51/2014-U, Policy (TNMulti-Gen), Dept. of Edn., Govt. of India, Dt. 09.10.2018. We express our appreciation to all the authors whose references we used in this research work. We acknowledge Mrs. Anju Ghosh, Mrs. Moumita Ghosh Bairagi, Mr. Bidhan Ghosh and the rest of my family members for their support and love. Special thanks to Mr. N. Alagu Ganesan and Mr. G. Veerapandi for their helping hand and support in this research work.
REFERENCES
[1] Bowen Yang and Dong Liu, “Research on Network Traffic Identification based on Machine Learning and Deep Packet Inspection” Available: https://ieeexplore.ieee.org/document/8729153
[2] Liu Zhenxiang, He Mingbo, Liu Song and Wang Xin, “Research of P2P Traffic Comprehensive Identification Method” Available: https://ieeexplore.ieee.org/document/5948739
[3] Chunzhi Wang, Xin Zhou, Fangping You and Hongwei Chen, “Design of P2P Traffic Identification Based on DPI and DFI” Available: https://ieeexplore.ieee.org/document/5374577
[4] Hongwei Chen, Zhengbing Hu, Zhewei Ye and Wei Liu, “A New Model for P2P Traffic Identification Based on DPI and DFI” Available: https://ieeexplore.ieee.org/document/5366295
[5] Hongwei Chen, Fangping You, Xin Zhou and Chunzhi Wang, “Algorithm Comparison of P2P Traffic Identification Based on Deep Packet Inspection” Available: https://ieeexplore.ieee.org/document/5374593
[6] Zeba Atique Shaikh and Prof. Dr. D.G. Harkut, “An Overview of Network Traffic Classification Methods” Available: https://pdfs.semanticscholar.org/8efd/03df47062a376fbd1e8710a10940296643a6.pdf
[7] Jingyu Wang, Jiyuan Zhang and Yuesheng Tan, “Research of P2P Traffic Identification Based on Traffic Characteristics” Available: https://ieeexplore.ieee.org/document/6001790
[8] Song Yang, Xiaoguang Zhang, Lixia Xi and Congpeng Lu, “Research Optical Network Traffic Based on the Content Identification” Available: https://ieeexplore.ieee.org/document/6155889
[9] Fereshte Dehghani, Nasser Movahhedinia, Mohammad Reza Khayyambashi and Sahar Kianian, “Real-time Traffic Classification Based on Statistical and Payload Content Features” Available: https://ieeexplore.ieee.org/document/5473467
[10] Safa Alkateb, “White Paper: 5 Things You Need to Know About Deep Packet Inspection (DPI)” Available: https://docplayer.net/7150123-5-things-you-need-to-know-about-deep-packet-inspection-dpi.html
[11] “White paper on Deep Packet Inspection” Available: http://tec.gov.in/pdf/Studypaper/White%20paper%20on%20DPI.pdf
[12] Po-Ching Lin, Ying-Dar Lin, Tsern-Huei Lee and Yuan-Cheng Lai, “Using String Matching for Deep Packet Inspection” Available: https://ieeexplore.ieee.org/document/4488244
[13] Reham Taher El-Maghraby, Nada Mostafa Abd Elazim and Ayman M. Bahaa-Eldin, “A Survey on Deep Packet Inspection” Available: https://ieeexplore.ieee.org/document/8275301
[14] N. Weng, L. Vespa and B. Soewito, “Deep packet pre-filtering and finite state encoding for adaptive intrusion detection system” Available: https://www.sciencedirect.com/science/article/abs/pii/S1389128610003749
[15] Chengcheng Xu, Shuhui Chen, Jinshu Su, S.M. Yiu and Lucas C.K. Hui, “A Survey on Regular Expression Matching for Deep Packet Inspection: Applications, Algorithms and Hardware platforms” Available: https://ieeexplore.ieee.org/document/7468531
[16] Yu-tong Guo, Yang Gao, Yan Wang, Meng-yuan Qin, Yu-jie Pu, Zeng Wang, Dan-dan Liu, Xiang-jun Chen, Tian-feng Gao, Ting-ting Lv and Zhong-chuan Fu, “DPI & DFI: a Malicious Behavior Detection Method Combining Deep Packet Inspection and Deep Flow Inspection” Available: https://www.sciencedirect.com/science/article/pii/S187770581730276X
[17] Li Wei, Liu Hongyu and Zhang Xiaoliang, “A Network Data Security Analysis Method Based on DPI Technology” Available: https://ieeexplore.ieee.org/document/7883228
[18] S. Zamfir, T. Balan, F. Sandu and C. Costache, “Solutions for Deep Packet Inspection in Industrial Communications” Available: https://ieeexplore.ieee.org/document/7528337
[19] Yi-Hui Lin, Shan-Hsiang Shen, Ming-Hong Yang, De-Nian Yang and Wen-Tsuen Chen, “Privacy-Preserving Deep Packet Filtering over Encrypted Traffic in Software-Defined Networks” Available: https://ieeexplore.ieee.org/document/7510993
[20] Zouheir Trabelsi, Safaa Zeidan and Mohammad M. Masud, “Network Packet Filtering and Deep Packet Inspection Hybrid Mechanism for IDS Early Packet Matching” Available: https://ieeexplore.ieee.org/document/7474172
[21] Gaolei Li, Mianxiong Dong, Kaoru Ota, Jun Wu, Jianhua Li and Tianpeng Ye, “Deep Packet Inspection based Application-Aware Traffic Control for Software Defined Networks” Available: https://ieeexplore.ieee.org/document/7841721
[22] Roaa Shubbar and Mahmood Ahmadi, “Fast 2D filter with low false positive for network packet inspection” Available: https://ieeexplore.ieee.org/document/8245943
[23] Danish Rafique and Luis Velasco, “Machine Learning for Network Automation: Overview, Architecture, and Applications” Available: https://ieeexplore.ieee.org/document/8501533
[24] Fabien Boitier and Patricia Layec, “Automated Optical Networks with Monitoring and Machine Learning” Available: https://ieeexplore.ieee.org/document/8473802
[25] Mohammad Al-hisnawi and Mahmood Ahmadi, “Deep Packet Inspection Using Quotient Filter” Available: https://ieeexplore.ieee.org/document/7548376
[26] Hao BI and Zhao-Hun WANG, “DPDK-based Improvement of Packet Forwarding” Available: DOI: 10.1051/itmconf/20160701009
[27] Mohammad Al-hisnawi and Mahmood Ahmadi, “QCF for deep packet inspection” Available: https://ieeexplore.ieee.org/document/8444520
[28] B. Renukadevi and, Dr. S. Daniel Madan Raja “Deep