Internet and Tor Traffic Classification Using Machine Learning

(1)

RIT Scholar Works

Theses

5-3-2019

Internet and Tor Traffic Classification Using Machine Learning

Siddharth Palsambkar

[email protected]

Follow this and additional works at: https://scholarworks.rit.edu/theses

Recommended Citation Recommended Citation

Palsambkar, Siddharth, "Internet and Tor Traffic Classification Using Machine Learning" (2019). Thesis. Rochester Institute of Technology. Accessed from

This Thesis is brought to you for free and open access by RIT Scholar Works. It has been accepted for inclusion in Theses by an authorized administrator of RIT Scholar Works. For more information, please contact

(2)

Machine Learning

Under Supervision of:

Prof. Joseph Nygate, PhD Submitted By: Siddharth Palsambkar

Thesis submitted in partial fulfillment of requirements of Degree of Master of Science in Telecommunication Engineering Technology

Department of Electrical, Computer & Telecom Engineering Technology College of Engineering Technology

Rochester Institute of Technology Rochester, NY 14623

May 3, 2019

(3)

Committee Approval

Joseph Nygate, Ph.D. Date

Associate Professor, Thesis advisor

William P. Johnson, J.D. Date

Graduate Program Director, MS Telecommunications Engineering Technology Program

Mark Indelicato Date

(4)

Firstly, I would like to express my sincere gratitude to my advisor Prof. Joseph Nygate,

Associate Professor at the Rochester Institute of Technology for the continuous support during

my graduate thesis and related research. The door to his office was always open whenever I had

questions about my research and writing. He encouraged me to work independently on my

research while regularly providing me the much-needed advice and insights.

I would also like to acknowledge Mr. Chris Brown, Systems Administrator, Laboratory

Manager and Adjunct Faculty at the Rochester Institute of Technology for providing his support

in setting up and maintaining the lab during the research.

Finally, I must express my very profound gratitude to my parents, teacher and friends for

providing me with unfailing support and encouragement throughout my graduate study. This

(5)

Privacy has always been a concern over the internet. A new wave of privacy networks

struck the world in 2002 when the TOR Project was released to the public. The core principle of

TOR, popularly known as the onion routing protocol, was developed by the ‘United States Naval

Research Laboratory’ in the mid-1990s. It was further developed by ‘Defense Advanced

Research Projects Agency’. The project that started as an attempt to create a secured

communication network for the U.S. Intelligence was soon released as a general anonymous

network. These anonymous networks are run with the help of volunteers that serve the physical

need of the network, while the software fills up the gaps using encryption algorithms.

Fundamentally, the volunteers along with the encryption algorithms are the network. Once a part

of such a network, the identity, and activity of a user is invisible. The users remain completely

anonymous over the network if they follow a few steps and rules. As of December 2017, there

are more than 3 million TOR users as per the TOR Project’s website. Today, the anonymous web

is used by people of all kinds. While, some just want to use it to make sure nobody could

possibly spy on them, others are also using it to buy and sell things. Thus, functioning as a

censorship-resistant peer-to-peer network.

Through this thesis, we propose a novel approach to identifying traffic and without

sacrificing the privacy of the Tor nodes or clients. We recorded traffic over our own Tor Exit and

Middle nodes to train Decision Tree classifiers to identify and differentiate between different

types of traffic. Our classifiers can accurately differentiate between regular internet and Tor

traffic while can also be combined together for detailed classification. These classifiers can be

used to selectively drop traffic on a Tor node, giving more control to the users while providing

(6)

Internet and Tor Traffic Classification Using Machine Learning

RIT Scholar Works

RIT Scholar Works

Internet and Tor Traffic Classification Using Machine Learning

Internet and Tor Traffic Classification Using Machine Learning

Machine Learning

Committee Approval

Table of Contents