
Toward line rate Traffic Classification


Academic year: 2021


Niccolo' Cascarano

Politecnico di Torino


Background

In recent years, many new traffic classification algorithms based on statistical approaches have been proposed

One of the claims of these new algorithms is that their computational requirements are lower than those of “Deep Packet Inspection” (DPI) [3-8]

DPI is commonly considered too expensive

Is that true?

Can DPI be further improved?

Is there anything better than DPI?


The path toward the answers

Create a model of some classifiers (currently DPI, Naïve Bayes and SVM) and compare their complexity

Joint work with Università di Brescia

Improve the DPI engine itself

Question 1: is DPI so computationally complex?

What is DPI?

DPI = pattern matching through regular expressions

Two main flavors:

Packet-Based per-Flow State (PBFS): network data are analyzed on a packet-by-packet basis, as soon as packets are received by the classifier

Message-Based per-Flow State (MBFS): network data are analyzed as a single stream of data after TCP/IP normalization

PBFS seems roughly equivalent to MBFS with respect to traffic classification [1-2]

We use a PBFS DPI classifier, plus the capability to analyze correlated sessions (e.g., FTP and SIP)
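
The difference between the two flavors can be illustrated with a minimal sketch. The signature below is hypothetical (it is not taken from the slides), and joining the payloads is only a stand-in for real TCP/IP normalization. The sketch shows the one case where the flavors diverge: a signature that straddles a packet boundary.

```python
import re

# Hypothetical signature, for illustration only.
HTTP_SIGNATURE = re.compile(rb"^GET [^ ]+ HTTP/1\.[01]")


def classify_pbfs(packets):
    """Packet-based: test each payload on its own, as soon as it arrives."""
    for payload in packets:
        if HTTP_SIGNATURE.search(payload):
            return "http"
    return "unknown"


def classify_mbfs(packets):
    """Message-based: test the reassembled byte stream of the session."""
    stream = b"".join(packets)  # stand-in for TCP/IP normalization
    return "http" if HTTP_SIGNATURE.search(stream) else "unknown"


# The same request, sent in one packet or split across two.
whole = [b"GET /index.html HTTP/1.1\r\nHost: example.org\r\n\r\n"]
split = [b"GET /index.ht", b"ml HTTP/1.1\r\nHost: example.org\r\n\r\n"]

print(classify_pbfs(whole), classify_mbfs(whole))
print(classify_pbfs(split), classify_mbfs(split))
```

When the request fits in one packet both functions agree; when it is split, only the message-based version sees the whole signature, at the price of buffering and reassembly.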


Methodology

Cost modeling

Average cost per packet (instead of worst-case)

Modeled each classifier

Derived the cost of each block

Determined the transition probability from one block to the other by analyzing real traces (with ground truth [26])

Derived the min/max/average cost per packet
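
A minimal sketch of this kind of cost model follows. The block names are borrowed from the models described in this talk, but the tick values and the slow-path probability are placeholders, not measurements, and the transition probabilities are collapsed into a single "slow path" probability for brevity.

```python
# Placeholder per-block costs in CPU ticks (NOT measured values).
COSTS = {
    "session_id_extraction": 100.0,
    "session_lookup": 200.0,
    "pattern_matching": 4000.0,
    "session_update": 300.0,
}

# Blocks executed by a packet of an already classified session (fast path)
# and by a packet that still needs to be classified (slow path).
FAST_PATH = ("session_id_extraction", "session_lookup")
SLOW_PATH = FAST_PATH + ("pattern_matching", "session_update")


def path_cost(costs, path):
    """Total cost of executing every block on a path."""
    return sum(costs[block] for block in path)


def cost_per_packet(costs, p_slow):
    """Return (min, average, max) cost per packet.

    p_slow is the fraction of packets that take the slow path; in the talk
    this kind of probability is derived from real traces.
    """
    best = path_cost(costs, FAST_PATH)
    worst = path_cost(costs, SLOW_PATH)
    average = (1.0 - p_slow) * best + p_slow * worst
    return best, average, worst


best, average, worst = cost_per_packet(COSTS, p_slow=0.25)
print(best, average, worst)
```

The average always lies between the best and worst case, and it moves toward the fast-path cost as more packets belong to sessions that are already classified.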


Models

[Figure: block diagrams of the DPI and SVM classifier models]

Session ID Extraction: extracts the L3 and L4 information from network packets

Session lookup: checks within the “session table” whether a packet belongs to an already classified session

Pattern matching: implements the pattern matching algorithm (DPI only)

SVM decision: implements the SVM classification algorithm (SVM only)

Session update: updates the “session table” with the outcome of the classification

Correlated session: analyzes the application data to obtain information on correlated sessions (DPI only)
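
The DPI blocks above can be chained as in the following sketch. Everything here is illustrative: the signatures are hypothetical, packets are plain dictionaries, the session key is not direction-independent, and the signatures are tried one by one for brevity (a DFA-based matcher would merge them into a single automaton). The correlated-session block is omitted.

```python
import re

# Hypothetical signatures, for illustration only.
SIGNATURES = {
    "http": re.compile(rb"^(GET|POST|HEAD) "),
    "ssh": re.compile(rb"^SSH-\d\.\d"),
}

session_table = {}  # session ID -> protocol name


def session_id(packet):
    """Session ID extraction: the L3/L4 fields that identify the session."""
    return (packet["src"], packet["sport"],
            packet["dst"], packet["dport"], packet["proto"])


def pattern_matching(payload):
    """Pattern matching: return the first protocol whose signature matches."""
    for name, regex in SIGNATURES.items():
        if regex.search(payload):
            return name
    return None


def classify(packet):
    """Return (label, path) where path is 'fast' or 'slow'."""
    key = session_id(packet)
    known = session_table.get(key)           # session lookup
    if known is not None:
        return known, "fast"
    label = pattern_matching(packet["payload"])
    if label is not None:
        session_table[key] = label           # session update
    return label or "unknown", "slow"


first = {"src": "10.0.0.1", "sport": 40000, "dst": "10.0.0.2", "dport": 80,
         "proto": "tcp", "payload": b"GET / HTTP/1.1\r\n"}
second = dict(first, payload=b"\x00\x01\x02 opaque body bytes")

print(classify(first))
print(classify(second))
```

The second packet carries no recognizable payload, yet it is labeled through the session table without running the matcher again; this is the fast path referred to in the comparison.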


Basic blocks implementation

Session ID extraction: native assembly code for IA32 generated by the NetVM framework [19]

Session Lookup and Session Update: C++ code using the hash_map container of the extended STL C++ library [18]

Pattern matching: C++ code implementing a DFA-based algorithm generated by Flex [20]. About 30 application protocols are recognized (NOTE: the cost of this block does NOT depend on the number of protocols recognized)

SVM Decision: C++ code written exploiting the multivariate Gaussian joint density function. We generated the models for recognizing about 10 application protocols (NOTE: the cost of this block DEPENDS linearly on the number of protocols recognized)

Correlated Session: C++ code written on purpose, deriving the correlated session rules for the FTP and SIP protocols from the NetPDL database [17]


Experimental evaluation

Costs of each block measured with the RDTSC instruction

Costs dependent on the input traffic (e.g. DFA) are further characterized in order to push the relevant parameters into the final formula

Traffic traces

UNIBS trace: contains a large percentage of P2P traffic, known to be challenging for DPI classifiers

POLITO trace: contains the traffic of a medium-size campus network (~6000 hosts within the network)


Absolute costs of each basic block

Pattern matching depends on the packet size


Comparison

Legend

Best case: all the packets belong to already classified sessions (fast path)

Worst case: all the packets need to take the slow path

Average case: the costs are weighted by the execution probabilities of each basic block

Results

The cost of the DPI classifier is of the same order of magnitude as that of the other classifiers, even for the challenging UNIBS trace

May be better on some traces


Conclusion 1

Packet-based DPI may not be as complex as we thought


Question 2: can we reduce DPI cost?


Yes, We Can

… if we focus on traffic classification and not on network security


(1) Use fast algorithms

Algorithm               Min (ticks)   Avg (ticks)   Max (ticks)
Flex (canonical DFA)    76            3980          19147
PCRE (NFA-based)        35.7K         2.08M         9.16M

DFA is simple and O(payload_length)

Key question: is the DFA usable?
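
To make the O(payload_length) claim concrete, here is a minimal sketch of a table-driven DFA. It is not the automaton generated by Flex: it is a hand-built DFA for the toy language "payloads that start with a given prefix", and the helper names are made up for this example. The point is that matching costs exactly one table lookup per payload byte, however the table was built.

```python
def build_prefix_dfa(prefix: bytes):
    """Build a DFA accepting every input that starts with `prefix`.

    States 0..len(prefix)-1 track progress, state len(prefix) accepts,
    and the last state is a dead state that can never accept.
    """
    accept = len(prefix)
    dead = len(prefix) + 1
    # table[state][byte] -> next state; everything defaults to the dead state.
    table = [[dead] * 256 for _ in range(len(prefix) + 2)]
    for state, byte in enumerate(prefix):
        table[state][byte] = state + 1
    table[accept] = [accept] * 256  # once accepted, stay accepted
    return table, accept


def run_dfa(table, accept, payload: bytes):
    """Walk the DFA over the payload: one table lookup per byte."""
    state = 0
    lookups = 0
    for byte in payload:
        state = table[state][byte]
        lookups += 1
    return state == accept, lookups


table, accept = build_prefix_dfa(b"GET ")
print(run_dfa(table, accept, b"GET /index.html"))
print(run_dfa(table, accept, b"POST /form"))
```

The number of lookups equals the payload length in both cases. A production matcher would typically stop as soon as it reaches the accepting or the dead state; the loop is left unoptimized here to keep the linear bound visible.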


(2) Use “friendly” regular expressions

(2) … and convert some into “friendly” ones

Baseline: not anchored + Kleene

Variant (trace)                     http → unknown   unknown → http
Anchored (on UNIBS-GT)              0%               0%
Anchored + Kleene (on UNIBS-GT)     0%               0%
Anchored (on POLITO)                0.004%           0.38%

Average cost on HTTP                Match (ticks)    No match (ticks)
Anchored                            1663             1415
Anchored + Kleene                   5622             1367
Not anchored + Kleene               5503             3300
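
The three variants in the tables can be illustrated with a small sketch. The expressions below are hypothetical stand-ins, not the signatures used in the measurements; they only show what "anchored" (the match must start at the first payload byte) and "Kleene" (the expression contains an unbounded `.*`) mean in practice.

```python
import re

# Hypothetical HTTP-response signatures, for illustration only.
VARIANTS = {
    "anchored": re.compile(rb"^HTTP/1\.[01] [0-9]{3}"),
    "anchored + Kleene": re.compile(rb"^HTTP/1\.[01] [0-9]{3}.*\r\n", re.DOTALL),
    "not anchored + Kleene": re.compile(rb"HTTP/1\.[01] [0-9]{3}.*\r\n", re.DOTALL),
}


def matches(payload: bytes):
    """Return, for each variant, whether it matches the payload."""
    return {name: bool(rx.search(payload)) for name, rx in VARIANTS.items()}


# A payload that starts with an HTTP status line.
at_start = b"HTTP/1.1 200 OK\r\nContent-Length: 0\r\n\r\n"
# The same status line buried in the middle of a payload.
in_middle = b"\x00\x17tunnel-header HTTP/1.1 200 OK\r\n"

print(matches(at_start))
print(matches(in_middle))
```

All variants agree on the first payload. Only the non-anchored one matches the second, because it may start a match at any offset; that extra freedom is also why a non-match is more expensive for it, and why anchoring can change the classification of a small fraction of traffic.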


(3) Use a packet-based approach

Trace     Unknown TCP traffic   Additional classified TCP traffic
POLITO    23.5 GB               2.6 MB

(4) Snapshot-based classification

Fair speedup with TCP traffic


(5) Limiting classification attempts

Trace               Avg # pkts   Std dev
UNIBS-GT (TCP)      654          4619
POLITO-GT (TCP)     563          3659
POLITO (TCP)        68           1879
UNIBS-GT (UDP)      2.62         0.71
POLITO-GT (UDP)     6.05         26.4
POLITO (UDP)        9.17         476

Protocol            Avg # pkts   Std dev
Bittorrent (TCP)    1            0
Samba (TCP)         1.01         0.29
HTTP (TCP)          1.05         15.6
Skype (UDP)         1.7          437
SSL (UDP)           1.92         267

Possible high speedup with TCP


(4)+(5) Snapshot + Attempts limit
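
The combination named in this title can be sketched as follows. This is an illustration under stated assumptions, not the implementation evaluated in the talk: the class name is made up, the snapshot length is taken to be in bytes, a single hypothetical signature (the BitTorrent handshake prefix) stands in for the full signature set, and a session that exhausts its attempts is simply labeled "unknown".

```python
import re

# Hypothetical signature: the BitTorrent handshake starts with byte 0x13
# followed by the string "BitTorrent protocol".
SIGNATURE = re.compile(rb"^\x13BitTorrent protocol")


class LimitedClassifier:
    """Inspect at most `max_attempts` packets per session and at most
    `snapshot_len` bytes of each payload."""

    def __init__(self, max_attempts: int, snapshot_len: int):
        self.max_attempts = max_attempts
        self.snapshot_len = snapshot_len
        self.attempts = {}  # session key -> packets inspected so far
        self.labels = {}    # session key -> final label

    def classify(self, key, payload: bytes) -> str:
        if key in self.labels:
            return self.labels[key]          # final verdict already known
        seen = self.attempts.get(key, 0) + 1
        self.attempts[key] = seen
        snapshot = payload[: self.snapshot_len]
        if SIGNATURE.search(snapshot):
            self.labels[key] = "bittorrent"
        elif seen >= self.max_attempts:
            self.labels[key] = "unknown"     # give up on this session
        return self.labels.get(key, "pending")


classifier = LimitedClassifier(max_attempts=2, snapshot_len=256)
print(classifier.classify("s1", b"\x13BitTorrent protocol" + b"\x00" * 8))
print(classifier.classify("s2", b"opaque"))
print(classifier.classify("s2", b"still opaque"))
```

Once a session has a final label, every later packet skips pattern matching entirely, so a stricter limit trades a few missed classifications for fewer slow-path packets.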


Conclusions 2

DFA is OK for traffic classification

Fast algorithms: up to 3 orders of magnitude

“Friendly” regex: may achieve up to a 5 times speedup

No message-based processing

Snapshot = 256 for UDP and a fair attempts limit (e.g. 10): fairly small packets; signatures that operate on packet sequences

Strict attempts limit for TCP (N=2): able to catch response packets

A speedup of 15 on the results in Conclusion 1 gives 20 Mpps on a 3 GHz CPU
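
The arithmetic behind the last figure can be checked in a few lines. The three input values come from the slide; the per-packet budget and the implied baseline are derived here, assuming a single core that spends every cycle on classification, and are not numbers stated in the talk.

```python
CPU_HZ = 3_000_000_000    # 3 GHz CPU (from the slide)
TARGET_PPS = 20_000_000   # 20 Mpps (from the slide)
SPEEDUP = 15              # overall speedup (from the slide)

# Cycle budget available for each packet at the target rate.
ticks_per_packet = CPU_HZ / TARGET_PPS

# Average cost per packet that the speedup would have to start from
# (derived value, an assumption of this sketch).
implied_baseline_ticks = ticks_per_packet * SPEEDUP

print(ticks_per_packet, implied_baseline_ticks)
```

In other words, 20 Mpps on a 3 GHz CPU leaves 150 ticks per packet, which a speedup of 15 reaches from an average cost of about 2250 ticks per packet.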


Addendum

What are regex?

We usually assume regex = regular expressions (e.g. Perl)

We believe this model is not powerful enough to cope with modern traffic classification

We have to think about a more extended model

E.g. Skype and RTP are currently detected with some imperative code in addition to regex


Is there anything better than DPI?


Better perhaps no, but…

Service-Based Traffic Classification is surely an answer

Not exactly a replacement for DPI

Instead, something orthogonal to (I would like to say most) traffic classification approaches

Service-Based Classification:

Once you have associated (IP, port) with Service S, all established sessions that terminate at that endpoint are associated with S
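
The rule above amounts to a lookup table keyed by endpoint. The following sketch is only an illustration: the class and method names are made up, and how the first association is obtained (for example from a DPI verdict) is outside its scope.

```python
class ServiceTable:
    """Map (IP, port) endpoints to service labels."""

    def __init__(self):
        self.services = {}  # (ip, port) -> service label

    def learn(self, ip: str, port: int, label: str) -> None:
        """Record that the endpoint (ip, port) hosts the given service."""
        self.services[(ip, port)] = label

    def classify(self, src: str, sport: int, dst: str, dport: int) -> str:
        """Label a session by whichever of its endpoints is a known service."""
        for endpoint in ((dst, dport), (src, sport)):
            label = self.services.get(endpoint)
            if label is not None:
                return label
        return "unknown"


table = ServiceTable()
# Assume an earlier classification identified this endpoint.
table.learn("192.0.2.10", 6881, "bittorrent")

print(table.classify("198.51.100.7", 50123, "192.0.2.10", 6881))
print(table.classify("198.51.100.7", 50124, "192.0.2.99", 443))
```

Because the lookup never touches the payload, a session toward a known endpoint is labeled even when its content is encrypted.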


Service-Based Traffic Classification

No further details are provided in this presentation

However, a lot of analysis has been done that confirms that it really works

By-product: if the first classification is correct, a lot more traffic gets classified

E.g. a service with a few sessions in the clear and most of its traffic encrypted

SBC: Services vs. sessions

[Figure: number of services and number of sessions plotted against time (hours)]


Conclusions

The well-known limit of DPI is encrypted sessions

No way to cope with that with DPI alone

DPI (for traffic classification) may not be so costly compared to its competitors, and it has many advantages

E.g. no training (regex are “simple” to derive)

Simple implementation

Most of the time, it walks over small portions of the DFA (in cache)

Service-Based Classification may be a good complement to the previous solutions

My 2c: statistical traffic classifiers may be a better fit for a limited number of protocols (e.g. if you want to identify just P2P), but are not applicable to hundreds of protocols


Questions?


References

[1] A. Moore, K. Papagiannaki, Toward the Accurate Identification of Network Applications, 6th International Workshop on Passive and Active Network Measurement, Boston MA, USA, May 2005, pp. 41-54.

[2] F. Risso, A. Baldini, M. Baldi, P. Monclus, O. Morandi, Lightweight, Payload-Based Traffic Classification: An Experimental Evaluation, IEEE International Conference on Communications (ICC 2008), Beijing (China), pp. 5869-5875, May 2008.

[3] J. Erman, A. Mahanti, M. Arlitt, C. Williamson, Identifying and discriminating between web and peer-to-peer traffic in the network core, Proceedings of the 16th International Conference on World Wide Web, Banff, Alberta, Canada, pp. 883-892, 2007.

[4] J. Erman, M. Arlitt, A. Mahanti, Traffic classification using clustering algorithms, Proceedings of the 2006 SIGCOMM, Pisa, Italy, pp. 281 - 286, 2006.

[5] L. Bernaille, R. Teixeira, I. Akodkenou, Traffic classification on the fly, 4th ACM/IEEE Symposium on Architectures for Networking and Communications Systems, San Jose, CA, pp. 40-49, 2008.

[6] S. Zander, T. Nguyen, G. Armitage, Self-learning IP traffic classification based on statistical flow characteristics, International Workshop on Passive and Active Network Measurement, Boston MA, pp. 325-328, 2005.

[7] M. Crotti, M. Dusi, F. Gringoli, L. Salgarelli, Traffic Classification through Simple Statistical Fingerprinting, ACM SIGCOMM Computer Communication Review, Vol. 37, No. 1, pp. 5-16, Jan. 2007.

[8] L. Bernaille, R. Teixeira, K. Salamatian, Early Application Identification, 2nd CoNEXT Conference, Lisboa, Portugal, Dec. 2006.

[9] A. Este, F. Gringoli, L. Salgarelli, Support Vector Machines for TCP Traffic Classification, Università degli Studi di Brescia, Technical Report 08-07, Jul. 2008.

[10] N. Williams, S. Zander, G. Armitage, A Preliminary Performance Comparison of Five Machine Learning Algorithms for Practical IP Traffic Flow Classification, SIGCOMM Computer Communication Review, Vol. 36, No. 5, pp. 7-15, Oct. 2006.

[11] H. Kim, Kc Claffy, M. Fomenkova, D. Barman and M. Faloutsos, Internet Traffic Classification Demystified: The Myths, Caveats and Best Practices, ACM CoNEXT, Madrid, Spain, Dec. 2008.


[13] T. Karagiannis, K. Papagiannaki, M. Faloutsos, BLINC: Multilevel traffic classification in the Dark, ACM SIGCOMM, Aug. 2005.

[14] A. Este, F. Gargiulo, F. Gringoli, L. Salgarelli, C. Sansone, Pattern Recognition Approaches for Classifying IP Flows, 7th International Workshop on Statistical Pattern Recognition, Orlando, FL, Dec. 2008.

[15] V.N. Vapnik, Statistical Learning Theory. John Wiley and Sons, New York, 1998.

[16] B. Schölkopf, J.C. Platt, J. Shawe-Taylor, A.J. Smola, R.C. Williamson, Estimating the Support of a High-Dimensional Distribution, Neural Computation, 13, pp. 1443-1471, 2001.

[17] Computer Networks Group (NetGroup) at Politecnico di Torino. The NetBee Library. August 2004. [online] Available at http://www.nbee.org/.

[18] Hash map container reference, http://www.sgi.com/tech/stl/hash_map.html

[19] O. Morandi, F. Risso, M. Baldi, A. Baldini, Enabling flexible protocol processing through dynamic code generation, International Conference on Communications, Beijing (China), pp. 5849 - 5856, May 2008.

[20] flex: The Fast Lexical Analyzer, http://flex.sourceforge.net/

[21] R. Smith, C. Estan, S. Jha, S. Kong, Deflating the big bang: fast and scalable deep packet inspection with extended finite automata, ACM SIGCOMM Computer Communication Review, Volume 38 , Issue 4 (October 2008), Pages 207-218.

[22] M. Becchi, P. Crowley, Efficient regular expression evaluation: Theory to practice, Proceedings of the 4th ACM/IEEE Symposium on Architectures for Networking and Communications Systems, San Jose, California, pp. 50-59, 2008.

[23] S. Kumar, S. Dharmapurikar, F. Yu, P. Crowley, J. Turner, Algorithms to accelerate multiple regular expressions matching for deep packet inspection, ACM SIGCOMM Computer Communication Review, Volume 36, Issue 4, pp. 339 - 350, October 2006

[24] File Transfer Protocol (FTP), RFC 959, http://www.ietf.org/rfc/rfc959.txt

[25] N. Brownlee, Traffic flow measurement: Meter MIB, Request for Comments RFC 2064, Internet Engineering Task Force, January 1997.

[26] F. Gringoli, L. Salgarelli, M. Dusi, N. Cascarano, F. Risso, K.C. Claffy, GT: picking up the truth from the ground for Internet traffic,
