
5. HOST-BASED CLASSIFICATION

To evaluate whether the patterns described above are noticeable in real traffic, we implemented a simple host-based classifier that relies on the entropy computed over sliding windows of 100 packets. The following heuristics were defined and implemented, as illustrated in Figure 8:

1. if the entropy value, excluding the TCP packets with no payload, is greater than 3, the host is running P2P applications (E_noTCPPayload > 3);

The Computer Journal, Vol. X, No. X, 2011

Exploring Behavioral Patterns Through Entropy in Multimedia Peer-to-Peer Traffic

[Figure 8: flowchart whose decision boxes use the entropy excluding TCP packets without payload (E_noTCPPayload), the entropy for outgoing traffic (E_outgoing), and the entropy for outgoing traffic with 200-byte slots (E_outgoingSlots200); the first test is E_noTCPPayload > 3.]

FIGURE 8. Flowchart of the proposed classification scheme.

2. if the entropy value, excluding the TCP packets with no payload, is smaller than 1, the host is not running any P2P applications (E_noTCPPayload < 1);

3. if the entropy value for outgoing traffic is smaller than 0.5, the host is not running any P2P applications (E_outgoing < 0.5);

4. if the entropy value for outgoing traffic, using slots of 200 bytes, is smaller than 0.1, the host is not running any P2P applications (E_outgoingSlots200 < 0.1);

5. in any other case, the host is running P2P applications.

The classifier evaluates these heuristics sequentially: each heuristic is applied only if none of the previous ones was satisfied.
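The decision cascade above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Packet` record and the function names are hypothetical, the entropy is assumed to use base-2 logarithms, lengths are assumed to be grouped into 200-byte slots by integer division, and the window is assumed to slide one packet at a time. Only the thresholds, the 100-packet window, and the order of the tests come from the text; how per-window decisions are aggregated into a single host label is not specified here.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List

WINDOW = 100  # sliding-window size in packets, as stated in the text


@dataclass
class Packet:
    # Hypothetical packet record holding only what the heuristics need.
    length: int       # IP packet length in bytes
    is_tcp: bool
    payload_len: int  # transport-layer payload length in bytes
    outgoing: bool    # True if sent by the monitored host


def entropy(values: Iterable[int]) -> float:
    """Shannon entropy of the empirical distribution (base 2 is an assumption)."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def classify_window(window: List[Packet]) -> bool:
    """Return True if this window suggests the host is running P2P applications."""
    # E_noTCPPayload: entropy of the lengths, ignoring TCP packets without payload.
    e_no_tcp_payload = entropy(
        p.length for p in window if not (p.is_tcp and p.payload_len == 0))
    if e_no_tcp_payload > 3:
        return True   # heuristic 1
    if e_no_tcp_payload < 1:
        return False  # heuristic 2
    outgoing = [p.length for p in window if p.outgoing]
    if entropy(outgoing) < 0.5:
        return False  # heuristic 3: E_outgoing
    # E_outgoingSlots200: lengths grouped into 200-byte slots (assumed binning).
    if entropy(length // 200 for length in outgoing) < 0.1:
        return False  # heuristic 4
    return True       # heuristic 5: any other case


def classify_windows(packets: List[Packet]) -> List[bool]:
    """Apply the cascade to every full window, sliding one packet at a time."""
    return [classify_window(packets[i:i + WINDOW])
            for i in range(len(packets) - WINDOW + 1)]
```

For example, 100 packets that all share one length give E_noTCPPayload = 0 and fall under heuristic 2, whereas 100 packets with 100 distinct lengths give log2(100), roughly 6.64, and fall under heuristic 1. Keeping the four tests in a fixed if-chain mirrors the sequential evaluation described above: the more expensive outgoing and slotted entropies are only computed for windows that the first test leaves ambiguous.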

Publicly available traces with payload are scarce. Moreover, even if we could access such traces, the obfuscation techniques used by many applications would make it very difficult to determine which application generated each flow. Therefore, we set up a testbed with several host computers running different operating systems, and we captured the traffic generated by all the machines at an aggregation point.

Each computer was monitored so that the applications running on each host were known at every moment.

Different applications and protocols were run on the hosts, sometimes simultaneously. Four datasets, of 1.8, 1.6, 3.1, and 15.5 GB, were captured and used to evaluate the effectiveness of the proposed classification scheme. All of them contain both P2P and non-P2P traffic.

However, the fourth dataset contains a larger share of traffic from P2P applications. The results obtained for the classification are listed in Table 3.

The classifier based on the patterns identified in this study performed very well, with an overall false positive rate of about 3% (3.11%) and a false negative rate below 10% (9.73%).

Even though the analysis used in this work is performed completely in the dark, relying only on the lengths of the packets, it was possible to accurately identify the hosts running P2P applications. The method is very lightweight: it does not require the calculation of probability distributions, nor does it need to correlate the packet lengths with other properties such as inter-arrival times.

Moreover, since it relies only on the characteristics of the packet lengths generated by generic P2P applications, it can identify encrypted traffic and traffic from previously unknown P2P protocols. In fact, during the experimental tests, the classifier identified traffic from a Flash-based streaming service of a TV channel (CNN) as P2P traffic. After manual verification, we found that what first seemed to be a false positive was, in fact, a true positive. The CNN streaming service, which we had assumed to be a conventional client-server service, was using a plugin that implements a P2P system to reduce video distribution costs.

6. CONCLUSION

The search for new methods that could provide deeper knowledge about the behavior of network traffic has led researchers to look at traffic from different perspectives. Many approaches rely on statistical tools to describe traffic properties mathematically and to derive conclusions that, owing to their computational efficiency, can be used in practice.

In this article, we used source traffic from individual users and based our study on a host-level perspective.

We analyzed the characteristics of the IP packet lengths of several popular applications, giving special attention to the dissimilarities between P2P and non-P2P traffic. The analysis of the datasets showed different patterns in the heterogeneity of the lengths, which was measured using entropy. Non-P2P traffic presented a much lower entropy than the P2P datasets. To distinguish ambiguous cases, we also analyzed the entropy of the outgoing traffic and, to improve the results, grouped the lengths into slots. Our approach relies on a constant-size sliding window, which enables real-time entropy analysis and makes it sensitive to variations in the characteristics of the traffic during the lifetime of the flows.
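A constant-size window lends itself to an incremental update, which is one way to keep the entropy cheap enough for real-time use. The sketch below illustrates that idea under assumptions of our own (the class name `SlidingEntropy` and the base-2 logarithm are not from the article): it keeps per-length counts and a running sum of c*log2(c), so each arriving packet updates the entropy in constant time using H = log2(N) - (1/N) * sum_i c_i*log2(c_i), which follows from expanding -sum_i (c_i/N)*log2(c_i/N) with sum_i c_i = N.

```python
import math
from collections import Counter, deque


def _clog(c: int) -> float:
    """c * log2(c), with the convention 0 * log2(0) = 0."""
    return c * math.log2(c) if c > 0 else 0.0


class SlidingEntropy:
    """Entropy of the last `size` values, updated in O(1) per new value."""

    def __init__(self, size: int = 100):
        self.size = size
        self.window = deque()
        self.counts = Counter()
        self.sum_clog = 0.0  # running sum of c * log2(c) over all counts

    def _change(self, value: int, delta: int) -> None:
        # Adjust one count and patch the running sum accordingly.
        old = self.counts[value]
        new = old + delta
        self.sum_clog += _clog(new) - _clog(old)
        if new == 0:
            del self.counts[value]
        else:
            self.counts[value] = new

    def push(self, value: int) -> float:
        """Add a value (e.g. a packet length) and return the window entropy."""
        self.window.append(value)
        self._change(value, +1)
        if len(self.window) > self.size:
            self._change(self.window.popleft(), -1)  # evict the oldest value
        n = len(self.window)
        # Clamp tiny negative results caused by floating-point rounding.
        return max(0.0, math.log2(n) - self.sum_clog / n)
```

Because `sum_clog` is updated by differences, rounding errors can accumulate over very long captures; recomputing it from `counts` from time to time would bound that drift.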

Classification of Peer-to-Peer Traffic by Exploring the Heterogeneity of Traffic Features Through Entropy

J. V. Gomes, P. R. M. Inácio, M. Pereira, M. M. Freire, and P. P. Monteiro

TABLE 3. Results of the host-based classification.

Dataset     Traffic volume (GB)   False positive rate (%)   False negative rate (%)
Dataset 1    1.8                   4.17                     12.50
Dataset 2    1.6                   0.00                      9.09
Dataset 3    3.1                  10.42                      7.69
Dataset 4   15.5                   0.00                      9.88
Total       22.0                   3.11                      9.73

The heterogeneity of the packet lengths, applied at the host level, can be used to characterize the behavior of a user or node. The information it provides may help to understand the traffic generated by a single host and its interactions with the remaining nodes. Based on the observed packet-length heterogeneity, we defined a set of heuristics and implemented a simple classifier to identify users running P2P applications, which achieved accurate results. Since the classifier relies on traffic characteristics identified for generic P2P traffic rather than for a specific protocol, it can be used to identify traffic from previously unknown P2P protocols. Moreover, it can be applied to encrypted traffic, as it resorts only to the lengths of the packets and does not need any of the encrypted data carried in the packet payload.

The analysis we describe may be extended to other perspectives. We plan to study the feasibility and benefits of applying the same analysis at other levels (e.g., host-port pairs, flows, or incoming and outgoing traffic separately) and of combining those results with the ones obtained at the host level. Moreover, we intend to further develop the proposed approach and apply it directly to the classification of individual traffic flows.

FUNDING

This work was partially supported by Instituto de Telecomunicações, by University of Beira Interior, and by Fundação para a Ciência e a Tecnologia, through the grant contract SFRH/BD/60654/2009 and the project TRAMANET: Traffic and Trust Management in Peer-to-Peer Networks with contracts PTDC/EIA/73072/2006 and FCOMP-01-0124-FEDER-007253.

ACKNOWLEDGEMENTS

The authors would like to thank David A. Carvalho for his assistance in the setup of the network testbed.

