Fingerprinting Websites Using Remote Traffic Analysis


Xun Gong†, Negar Kiyavash‡, Nikita Borisov†

† ECE Department.

‡ IESE Department.

University of Illinois at Urbana-Champaign {xungong1,kiyavash,nikita}@illinois.edu

Abstract—Recent work has shown that traffic analysis of data carried on encrypted tunnels can be used to recover important semantic information. As one example, attackers can find out which website, or which page on a website, a user is accessing simply by monitoring the traffic patterns.

We show that traffic analysis is a much greater threat to privacy than previously thought, as such attacks can be carried out remotely. In particular, we show that, to perform traffic analysis, adversaries do not need to directly observe the traffic patterns. Instead, they can send probes from a far-off vantage point that exploit a queuing side channel in routers.

We demonstrate the threat of such remote traffic analysis by developing a remote website fingerprinting attack that works against home broadband users. Because the observations obtained by probes are noisier than direct observations, we had to take a new approach to detection that uses the full time series data contained in the observation, rather than the summary statistics used in previous work. We perform k-nearest neighbor classification using the dynamic time warping (DTW) distance metric.

We find that, in our experiments, we are able to fingerprint a website with 80% accuracy in both the testbed and the target system. This shows that remote traffic analysis represents a real threat to privacy on the Internet.

I. INTRODUCTION

Protecting the secrecy of online activities from prying eyes is a long-standing problem in Internet security. A number of encryption technologies (e.g., Transport Layer Security (TLS) [7], IPSec [15], and SSH [31]) protect the contents of much of today’s communication. Anonymizing communication systems, such as Tor [8], offer complementary protection, hiding the identity of the user from their communicating parties, and the relationships between parties from outside observers. However, both these technologies are vulnerable to traffic analysis, where patterns of communication (such as packet sizes, timings, and counts) are used to infer sensitive information.

One important class of traffic analysis attack targets application-layer privacy. The attacker aims to recover content information of a user’s applications, such as keystrokes typed [26], [32], words spoken over VoIP [29], [30], and websites visited [2], [17], [4], [10]. These attacks can be quite effective, but defenses can, in turn, be quite expensive; e.g., cover traffic that hides the real underlying activities introduces significant performance overhead [20].

Most users are reluctant to deploy such defenses as they perceive the threat to be relatively limited: to perform traffic analysis, it is necessary to observe the patterns of packets. For a home user, this reduces the threat to those who are in physical proximity and can monitor their home network (perhaps wirelessly [24]) or those who have privileged access to the routers used by ISPs to route the traffic.

We show that the threat is, in fact, much greater than previously considered. The design of Internet protocols gives attackers a mechanism to observe traffic patterns at routers remotely. In particular, by sending a low-bandwidth series of probe packets to a router, an adversary can create a side channel that leaks information about the size of the router’s queue. This side channel conveys a surprising amount of information, even if the attacker’s probes are sent from a vantage point that is geographically distant from the monitored host; e.g., in another state or another country.

To demonstrate the power of this side channel, we develop a remote website fingerprinting attack. It allows an adversary to identify what websites a home user is accessing knowing only the user’s IP address. Our high-level goal is similar to previous work on website identification [2], [17], [4], [10]; however, because the side channel provides noisy information, we have to take a significantly different approach. Specifically, whereas previous work used summary statistics only for classification, we make use of the entire time series information obtained through traffic analysis. We address the problem of comparing time series, in the face of packet insertions and deletions, by using a dynamic time warping (DTW) distance, a signal processing technique previously developed for speech recognition. We then use the k-nearest neighbor (k-NN) algorithm to match the user’s website to a library of previously collected time series.

We evaluated our attack by recovering websites visited by a home user in Illinois. We were able to identify the website with 80% accuracy. The attacker probes were generated from vantage points in New Jersey, Seattle, and Quebec, Canada. We made use of commercial hosting services that cost as little as US$8 per month, showing that this attack is within easy reach of millions of people.

The remainder of this paper is structured as follows. We explain our remote traffic analysis approach in Section II. In Section III, we present the website fingerprinting scheme. The implementation results of our attack are given in Section IV. A discussion of the attack follows in Section V. Section VI is an overview of related work. Finally, we conclude in Section VII.

II. REMOTE TRAFFIC ANALYSIS

Traffic analysis attacks have been known to be effective for quite some time. And yet, for most Internet users, they represent a minor concern at best. Although a dedicated attacker could always intercept traffic by, say, bribing a rogue ISP employee, or tapping a switch box, he would run the risk of being caught and potentially incur criminal charges. In any case, this level of effort seems justified only for highly sensitive material, rather than casual snooping; therefore, as long as sensitive data are protected by encryption or other techniques, a user may feel relatively safe.

We show, however, that traffic analysis can be carried out at a significantly lower cost, and by attackers who never come into physical proximity with the user. In fact, the attackers can launch their attacks from another state or country, as long as they have access to a well-provisioned Internet connection. This, in turn, is very easy to obtain due to the highly competitive Internet hosting business sector: a virtual private server in a data center can cost as little as $8/month.¹ The attack traffic is very low rate, thus attackers do not need to incur high bandwidth costs, and, on the flip side, users who are being spied upon are unlikely to notice the small amount of performance overhead. Thus, anyone with a credit card can carry out the attack and leave nearly no trace.

In this section, we describe our approach to remote traffic analysis; in the next section, we demonstrate how it can be used for remote website fingerprinting, effecting a real compromise to user privacy.

A. Queuing Side Channel

We will consider a home user, Alice, browsing a website via her DSL Internet connection. Unbeknownst to her, Bob, who is located in another state, or another country, uses his computer to send a series of ICMP echo requests (pings) to the router in Alice’s house,² and monitors the responses in order to compute the round-trip times (RTTs). These RTTs will include in them queuing delays incurred on the incoming and outgoing DSL link to Alice’s house, thus leaking information about the queue sizes on those links, which can in turn reveal traffic patterns for Alice.

The question is, then, how much information is leaked by this channel? The probe packets traverse many Internet links, and the queuing delays on Alice’s DSL link are but one component of the RTT. To investigate this question, we carried out a simple test with a home user in Illinois downloading the www.yahoo.com home page, while a computer in New Jersey sent ping requests to the public IP address of the home user at a rate of 100 pings per second. The results are shown in Figure 2.

Figure 2(a) plots the volume of the home user’s real traffic binned into 10 ms intervals. Figure 2(b) plots the RTTs of the ping requests. We see that the RTTs are highly correlated with the HTTP traffic; whenever there is a large peak in the user’s traffic, the attacker observes correspondingly large RTTs.

¹ See, for example, www.vpslink.com (retrieved in April 2010).

² This is usually a wireless or wired router, implementing network address translation, but in some cases it might be Alice’s PC itself.

[Figure 1 diagram labels: Alice (Illinois), DSL router with queue, Alice's ISP, Internet, Bob's ISP, Bob (New Jersey); flows: Alice's traffic, Bob's probe.]

Fig. 1. Queueing side channel

It is interesting to observe the dramatic impact Alice’s traffic has on the RTTs, compared to other variation. This can be explained by the dynamics of Internet traffic. Alice’s DSL link is, by far, the slowest link that both her traffic and Bob’s probe are likely to traverse. Furthermore, the intermediate routers are not likely to be very congested, as previous work shows that congestion is most likely to occur at edge links, rather than in the network core [16], [1]. We can observe this directly: when Alice’s DSL link is idle, the RTT variation (jitter) is only one or two milliseconds. On the other hand, the queues at Alice’s router can grow to be quite long (in relative terms), due to TCP behavior such as slow start, which causes the www.yahoo.com server to send a batch of TCP packets at a fast rate. We can see that the additional delay caused by traffic on Alice’s DSL line can be as high as 30 ms. Thus, Alice’s traffic patterns are clearly visible. We next discuss how we recover the traffic patterns, but first we discuss some of the requirements for the techniques to work well. These requirements are commonly met in home broadband installations, thus our analysis has broad applicability.

B. Requirements

[Figure 2 panels: (a) HTTP trace of Yahoo.com, traffic (bytes) vs. time (s); (b) Observed RTTs, RTTs (s) vs. time (s); (c) Processed RTTs, processed RTTs (ms) vs. time (s).]

Fig. 2. Real traffic on a DSL vs. probe RTTs.

• No firewall. The probes use the ICMP protocol, so they could be blocked by a firewall on the home router. In a brief survey of consumer-grade routers, we found that most of them do not perform ICMP filtering, at least not in the default configuration. Note that other forms of probes may be used as well; for example, if the home router exposes TCP ports for file sharing or other applications, SYN packets can be used with the same effectiveness.

• A single user. The probes cannot distinguish between the traffic of multiple users on the same link, so shared broadband connections present an obstacle to our attack. However, even in multi-user installations, it is still common for only one of them to be using the Internet at some point during the day. Additionally, previous work on traffic analysis has used blind source separation to separate traffic from multiple users [33]; similar techniques may be applicable here.

• Known IP address. Bob needs to know Alice’s IP address to know where to send the probes. Although this mapping is typically only explicitly known to ISPs, many protocols, such as file sharing, instant messaging, VoIP, and email, will reveal the IP address of a user. Other forms of IP address reconnaissance may be possible but are outside the scope of this work.

• Limited bandwidth. The broadband link bandwidth must be low enough to introduce noticeable queuing delays. In our experiments, we have used speeds typical of current home broadband connections (several Mbps). The deployment of faster links, such as Fiber-to-the-Home (FTTH), will reduce the effectiveness of the queuing side channel.

• FIFO queuing. Most routers today do not use QoS extensions and thus schedule packets on a given link in FIFO order. A fair queuing implementation [25] would reduce the impact that cross-traffic would have on the probe sequence and hence reduce the effectiveness of the side channel, but not entirely eliminate it [14].

[Figure 3: timeline of packets 1, 2, and 3 entering and leaving a FIFO queue.]

Fig. 3. Input/output relationship of a FIFO queue.

C. Traffic Pattern Recovery

We model the incoming DSL link as a FIFO queue. For any FIFO queue the input/output relationship can be expressed as

$$s_i = D_i - \max(D_{i-1}, A_i), \tag{1}$$

where $A_i$ denotes the arrival time, $D_i$ the departure time, and $s_i$ the service time (total amount of time the packet was being served) of packet $i$.

Figure 3 clarifies the input/output relationship of Equation (1); because packet 2 arrives when packet 1 is still in the system, its total service time is $s_2 = D_2 - D_1$. On the other hand, for both packets 1 and 3, the service time is $s_i = D_i - A_i$, as no other packet was being served when they arrived.
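As a minimal sketch of Equation (1), the Python snippet below runs a FIFO queue forward (computing departures from arrivals and service times) and then inverts it with Equation (1). The function names and the three-packet timings are hypothetical, chosen only to mirror the situation of Figure 3, where packet 2 arrives while packet 1 is still being served.

```python
def fifo_departures(arrivals, service_times):
    """Forward model of a FIFO queue: D_i = max(D_{i-1}, A_i) + s_i."""
    departures = []
    prev_departure = float("-inf")  # no packet precedes the first one
    for arrival, service in zip(arrivals, service_times):
        start = max(prev_departure, arrival)  # wait until the server is free
        prev_departure = start + service
        departures.append(prev_departure)
    return departures


def recover_service_times(arrivals, departures):
    """Inverse model, Equation (1): s_i = D_i - max(D_{i-1}, A_i)."""
    recovered = []
    prev_departure = float("-inf")
    for arrival, departure in zip(arrivals, departures):
        recovered.append(departure - max(prev_departure, arrival))
        prev_departure = departure
    return recovered


# Hypothetical timings: packet 2 arrives at t=1 while packet 1 is in service,
# packet 3 arrives at t=5 to an empty queue.
arrivals = [0.0, 1.0, 5.0]
services = [2.0, 2.0, 1.0]
departures = fifo_departures(arrivals, services)
recovered = recover_service_times(arrivals, departures)
```

For packet 2 the recovered service time equals the gap between consecutive departures, while for packets 1 and 3 it equals departure minus arrival, matching the two cases described for Figure 3.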

In the case of our attack, the arrival process to Alice’s DSL queue consists of two flows: the HTTP trace and the probe ping requests. Let $A_i$ be the time when the $i$th ping request packet $P_i$ arrives in the queue ($i = 1, 2, \ldots$). After the router finishes serving all the packets in the queue before $P_i$, $P_i$ will be served and arrive at the home user’s router. The router will then generate a reply packet that will be delivered back to the probe origin. (As most traffic volume in an HTTP session occurs on the download side, we will ignore the queuing behavior on the outgoing link, though it could be modeled in a similar fashion.)

The round trip time of the ith ping request measured by the attacker is given by

$$rtt_i \approx D_i - A_i + rtt^{*}, \tag{2}$$

where again $D_i$ denotes the departure time of $P_i$ from the queue. The first component, $D_i - A_i$, represents the queuing delay experienced by the ping at the incoming DSL link. The term $rtt^{*}$ models the propagation and processing delays experienced by the probe packets when there is no congestion; it can be estimated by the minimum RTT observed by the probes. As explained in §II-A, the jitter from intermediate routers is quite small, thus we can ignore it in our model.

As the ping packets have small sizes, most of the delay (i.e., $D_i - A_i$) is spent waiting for the router to serve HTTP packets that have arrived before $P_i$. Figure 4 shows one example of the above queuing system. In the arrival process, the ping packets (thin blocks) arrive at fixed intervals, and the HTTP packets (thick blocks) fall into these intervals. The router implements a FIFO queuing scheme, where the service times are proportional to packet sizes. Compared to the HTTP packets, most of which have maximal size (typically 1500 bytes), the transmission time of one ping packet is negligible. Combining Equations (1) and (2), we get the following recursive algorithm for recovery of traffic patterns from the RTT observations:

$$A_i = T_{ping} \cdot i; \tag{3}$$

$$D_i = rtt_i - rtt_{min} + A_i; \tag{4}$$

$$rttd_i = D_i - \max(D_{i-1}, A_i) \tag{5}$$

where $i$ is the ping sequence number, $T_{ping}$ is the time interval between two consecutive pings, and $rtt_{min}$ approximates $rtt^{*}$. The attacker first reconstructs the arrival and departure times of all ping packets, $A_i$ and $D_i$, and then computes the delay incurred by unfinished HTTP packets arriving in the last period, $rttd_i$, which is approximately proportional to the total packet sizes between pings $P_{i-1}$ and $P_i$.

[Figure 4 labels: arrival process with arrival times A1 to A5 spaced T apart; departure process with departure times D1 to D4.]

Fig. 4. Queue system in the DSL router
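The recursion of Equations (3) to (5) can be sketched in a few lines of Python. This is an illustrative sketch rather than the authors' implementation: the function name and the example RTT values (in milliseconds) are hypothetical, and the minimum observed RTT stands in for the uncongested round-trip time, as the text suggests.

```python
def recover_traffic_pattern(rtts, t_ping):
    """Estimate per-interval queuing delay added by cross traffic.

    rtts   -- RTT of each ping, in sequence order (any consistent time unit)
    t_ping -- interval between consecutive pings (same unit)
    """
    rtt_min = min(rtts)  # approximates the uncongested round-trip time
    pattern = []
    prev_departure = float("-inf")  # no ping precedes the first one
    for i, rtt in enumerate(rtts, start=1):
        arrival = t_ping * i                  # Equation (3)
        departure = rtt - rtt_min + arrival   # Equation (4)
        pattern.append(departure - max(prev_departure, arrival))  # Equation (5)
        prev_departure = departure
    return pattern


# Hypothetical RTTs in ms, one ping every 10 ms: a 30 ms burst lands before
# ping 3, a further 2 ms of traffic before ping 4, then the queue drains.
example = recover_traffic_pattern([50, 50, 80, 72, 62, 52, 50], 10)
# example == [0, 0, 30, 2, 0, 0, 0]
```

Note that pings 5 and 6 still see elevated RTTs, yet the recovered value for them is zero: the recursion attributes delay only to traffic that arrived since the previous ping, not to the backlog that is still draining.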

The accuracy of the estimation procedure above depends on the following two factors.

• Router Bandwidth. The bandwidth of the router determines the queue length and how fast the buffer is emptied. For a fixed ping frequency, if the router has a high bandwidth, more user packets will leave the buffer before the next ping request arrives. Hence, the attacker loses more information about the user’s traffic, resulting in a poorer estimate of the original traffic. Conversely, the attacker is able to capture more information about the user’s packets when the router has a lower bandwidth.

• Probe Frequency. The attacker’s probes take a periodic snapshot of the length of the queue. The ping frequency, therefore, affects the amount of information obtained. The more frequently the attacker sends the ping requests, the fewer user packets he will miss. To improve the estimation accuracy, the attacker should send the probes as frequently as possible. Theoretically, the attacker can capture every single HTTP packet if the ping period is chosen to be less than $\frac{\text{MTU size}}{\text{bandwidth}}$, but the increase in the bit rate of probe signals may expose the attacker.

Therefore, to improve the estimation accuracy, the attacker should choose a probe frequency compatible with the router’s processing speed. We will see how these two factors affect the traffic analysis performance in Section IV.
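To make the MTU-size-over-bandwidth bound concrete, the following sketch computes the transmission time of one maximum-size packet, which is the ping period below which (in theory) no full-size packet goes unsampled. The function name is hypothetical; the 1500-byte packet size and 3 Mbps link speed are the values used elsewhere in the paper.

```python
def max_ping_period(mtu_bytes, bandwidth_bps):
    """Transmission time (seconds) of one MTU-sized packet on the link.

    Pinging with a period below this value means at most one full-size
    packet can be served between two consecutive probes.
    """
    return mtu_bytes * 8 / bandwidth_bps


# 1500-byte packets on a 3 Mbps link
period_3mbps = max_ping_period(1500, 3_000_000)    # 0.004 s
period_10mbps = max_ping_period(1500, 10_000_000)  # 0.0012 s
```

The bound shrinks in proportion to link speed, which is one way to see why faster links force the attacker toward higher and more conspicuous probe rates.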

To sum up, the attacker processes the RTTs of ping requests and computes a time series estimate of the user’s arrival process. Next, we show how to extract a classification feature from this time series to perform the website fingerprinting attack.

III. WEBSITE FINGERPRINTING

Previous work on traffic analysis has shown that it is often possible to identify the website that someone is visiting based on traffic timings and sizes [17], [11], even if the website connection is carried over an encrypted tunnel to a proxy that hides the true destination (such as Tor [8]). We show that we can use remote traffic analysis to perform the same attack without observing the user’s traffic directly.

As compared with the previous work, using remote traffic analysis for website fingerprinting introduces two additional challenges. First, previous work created a training set for classification purposes from the same vantage point that was then used for fingerprinting. An attacker performing remote traffic analysis must, of course, use a different environment for collecting the training set, potentially affecting the measured features. Second, previous work used exact packet size distributions to create features, whereas this information is not readily available to the attacker, since smaller packets are unlikely to produce noticeable queuing delays.

We describe our approach to solving these two problems next.

A. Training Environment

To obtain an accurate fingerprint for traffic of a particular user, the attacker must be able to replicate the network conditions on that user’s home network. The approach we used was to set up a virtual machine running a browser that is connected to the Internet via a virtual Dummynet link [22]. The virtual machine is then scripted to fetch a set of web pages of interest; at the same time, a probe is sent across the Dummynet link, simulating the attack conditions. The processed RTTs from the probe are then added to a database for classification (see below).

The link has a number of parameters that affect the fingerprint. We found that the most important parameter to replicate was the link bandwidth. As discussed in §II-C, the probe frequency should be adjusted based on the available bandwidth. Bandwidth also affects the magnitude of the queuing delays; additionally, it can significantly alter the traffic pattern, as TCP congestion control mechanisms are affected by the available bandwidth. Fortunately, estimating available bandwidth on a link is a well-studied problem [27], [19], [21]. In our tests, we used a packet-train technique, sending a burst of probe packets and measuring the rate at which responses were returned, and found that the results were reasonably accurate.
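The paper does not spell out its packet-train estimator, so the following is only one plausible sketch of the idea: if a burst of equal-size probes is sent back to back, the bottleneck link spaces the replies by its per-packet transmission time, and the reply rate gives a bandwidth estimate. The function name, the use of first-to-last reply spacing, and the example timestamps are all assumptions.

```python
def estimate_bandwidth_bps(reply_times, packet_bytes):
    """Estimate bottleneck bandwidth from the arrival times of a probe burst.

    reply_times  -- timestamps (seconds) at which replies were received
    packet_bytes -- size of each probe packet in bytes
    """
    if len(reply_times) < 2:
        raise ValueError("need at least two replies to measure a rate")
    span = max(reply_times) - min(reply_times)
    if span <= 0:
        raise ValueError("replies must be spread over a positive time span")
    # (n - 1) packets were serialized during the span between first and last.
    return (len(reply_times) - 1) * packet_bytes * 8 / span


# Hypothetical burst: four 1500-byte probes whose replies arrive 4 ms apart
bw = estimate_bandwidth_bps([0.0, 0.004, 0.008, 0.012], 1500)  # about 3 Mbps
```

In practice such an estimate would be averaged over several bursts, since cross traffic can stretch or compress the spacing of any single train.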

The round-trip time between the home router and the website also affects the fingerprint; however, we found that this did not have a large impact on the classification accuracy and thus did not explicitly model this parameter. We note, however, that the round-trip time is relatively easy to estimate from a trace: in the earlier section of Figure 2(b), it is easy to see the TCP slow-start behavior, as exponentially larger bursts of packets are sent. These bursts will be spaced one RTT apart and can therefore be used to tune the training data.

Many other factors affect the fingerprint, such as the web browser used, operating system, CPU speed, available memory, etc. We found, however, that we were able to obtain good success rates without modeling this behavior more explicitly.

B. Dynamic Time Warping

To deal with the fact that the queuing side channel is noisier than direct observation, we developed a classification strategy that uses all of the information obtained from the training set, rather than summary statistics. We can model the RTT observations as a marked point process, where each ping time is annotated with the corresponding RTT. To simplify analysis, we pre-process this data and keep only those points where the RTT has a significant increase from the previous observation. This corresponds to new traffic arriving at the queue between two pings. As the queue drains at a constant rate, pings from periods when no new traffic arrived provide no new information.
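The pre-processing step can be sketched as a simple filter over the ping series. What counts as a "significant" increase is not specified in the text, so the `threshold` parameter below is a hypothetical tuning knob, as are the function name and the example values (times and RTTs in milliseconds).

```python
def keep_significant_increases(ping_times, rtts, threshold):
    """Keep (time, rtt) points whose RTT rose by more than `threshold`
    relative to the previous ping; all other pings are dropped."""
    points = []
    for k in range(1, len(rtts)):
        if rtts[k] - rtts[k - 1] > threshold:
            points.append((ping_times[k], rtts[k]))
    return points


# Hypothetical series: RTT jumps at t=10 and t=20, drains in between
marked_points = keep_significant_increases(
    [0, 5, 10, 15, 20], [50, 50, 80, 75, 90], threshold=5
)
# marked_points == [(10, 80), (20, 90)]
```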

After processing both the training set and the observed RTTs in this way, our goal is to find the best match between the observation and the training set. The challenge is to define a meaningful distance between marked point processes. Note that pointwise comparisons will produce poor results, since some traces will have some observations missing, and the point processes will quickly become out of sync. Aligning two point processes by time values is equally error prone, as the delays between packets do not follow a strict pattern and have a large amount of variation.

To solve this problem, we turn to the Dynamic Time Warping (DTW) distance [23]. DTW was developed for use in speech processing to account for the fact that when people speak, they pronounce various features of the phonemes at different speeds, and do not always enunciate all of the features. DTW attempts to find the best alignment of two point processes by creating a non-linear time warp between the sequences.

Consider two marked point processes:

$$A = \{a_1, a_2, \ldots, a_I\} \quad \text{and} \quad B = \{b_1, b_2, \ldots, b_J\}.$$

To visualize the difference between these series, consider plotting one against the other as depicted in Figure 5. Let the function $F(c) = \{c(1), \ldots, c(K)\}$ be a mapping from point process $A$ to point process $B$, where $c(k) = (i(k), j(k))$.

For any such function, we can define the distance $d(c) = d(i, j) = |a_i - b_j|$. Furthermore, a weighted time-normalized distance of $A$ and $B$ can be defined by

$$D(A, B) = \min_{F} \left( \frac{\sum_{k=1}^{K} d(c(k))\, w(k)}{\sum_{k=1}^{K} w(k)} \right), \tag{6}$$

where the weights $w(k)$ are nonnegative coefficients intended to make the definition of distance $D(A, B)$ more flexible. To make the optimization problem of (6) tractable, the weights are chosen independent of the warping function $F(c)$ and assumed to sum up to a constant, or $\sum_{k=1}^{K} w(k) = N$. Thus, the calculation of the distance in (6) is reduced to

$$D(A, B) = \min_{F} \left( \sum_{k=1}^{K} d(c(k))\, w(k) \right) \tag{7}$$

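[Figure 5: the warping path plotted on a grid with sequence A (a_1 to a_I) along one axis and sequence B (b_1 to b_J) along the other; example points c(3) = (2, 3) and c(K) = (I, J).]

Fig. 5. Warping function example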

Depending on the application of interest, some proper restrictions are imposed on warping function F . One example of the restrictions imposed on the matching of the sequences is on the monotonicity of the mapping in the time dimension. Dynamic programming can be used to find the distance D(A, B).
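A minimal dynamic-programming sketch of the DTW distance is given below. It makes several simplifying assumptions that the paper does not state: unit weights $w(k) = 1$ (so the result is the unnormalized sum in Equation (7)), the local distance $|a_i - b_j|$ on scalar marks, a monotonic warp that starts at $(1, 1)$ and ends at $(I, J)$, and the three basic step directions. The function name is hypothetical.

```python
def dtw_distance(a, b):
    """DTW distance between two numeric sequences via dynamic programming."""
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")
    inf = float("inf")
    # cost[i][j] = cheapest warp aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])  # d(i, j) = |a_i - b_j|
            cost[i][j] = local + min(
                cost[i - 1][j],      # advance in A only
                cost[i][j - 1],      # advance in B only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]


# A repeated observation in one trace costs nothing after warping,
# whereas a pointwise comparison would drift out of sync.
same_shape = dtw_distance([1, 2, 3], [1, 2, 2, 3])  # 0
missing_point = dtw_distance([1, 3], [1, 2, 3])     # 1
```

The table has $I \times J$ entries, so the cost of one comparison grows with the product of the two trace lengths; discarding uninformative pings beforehand keeps this manageable.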

C. k-Nearest Neighbor Algorithm

We find the matching site for the test sample using the k-Nearest Neighbor (k-NN) algorithm. k-NN is a simple type of instance-based learning commonly used in pattern recognition and machine learning [6]. Test samples are classified based on the most similar training samples in the feature space, namely their neighbors. The decision rule for one test sample is a majority vote of its neighbors; the test sample is matched to the class most common amongst the k nearest neighbors.

The neighbor number k is a positive integer (usually a small odd integer). When k = 1, it becomes the nearest neighbor algorithm, i.e., minimum distance decoding. Increasing k can reduce the effect of noise in the training samples, but the boundaries between different classes become less distinct. The best choice of k depends on the specific problem and test data. We try different k values and choose the best one through experiments.
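The majority-vote rule can be sketched as follows. The function name and the toy one-dimensional data are hypothetical; in the attack the `distance` argument would be a DTW distance between processed RTT traces, and the labels would be website names. Tie handling is also an assumption: this sketch resolves ties in favor of the label that appears first among the nearest neighbors.

```python
from collections import Counter


def knn_classify(sample, training_set, k, distance):
    """Classify `sample` by majority vote among its k nearest neighbors.

    training_set -- list of (feature, label) pairs
    distance     -- callable returning a non-negative dissimilarity
    """
    neighbors = sorted(training_set, key=lambda item: distance(sample, item[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    # most_common orders equal counts by first appearance, i.e. nearest first
    return votes.most_common(1)[0][0]


# Toy example with scalar features and absolute difference as the distance
training = [(1.0, "a"), (1.2, "a"), (5.0, "b"), (5.1, "b"), (0.9, "a")]
absolute = lambda x, y: abs(x - y)
label_low = knn_classify(1.1, training, 3, absolute)   # "a"
label_high = knn_classify(4.8, training, 3, absolute)  # "b"
```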

IV. EVALUATION

A. Experimental Set Up

Our experiments involve three systems: a target system, the attack system, and the training testbed.

The target system is a PowerBook G4, located in Illinois, connected to a DSL line with 3 Mbps download and 512 Kbps upload speeds. We used a shell script to automatically browse websites using Firefox 3.5³. To focus on user traffic generated by browsing a single website, we disabled the browser cache, automatic update checks, and unnecessary online plugins. Also, we made the browser open only one website at a time.

We used several commercial hosting sites for the attack server, located in New Jersey, Seattle, and in the Canadian province of Quebec, with the results presented in the graphs. We used hping⁴ to schedule pings at precise time intervals, based on the measured router bandwidth. We then analyzed the RTTs from a packet trace recorded via tcpdump⁵.

The testbed is a Linux machine located in our lab running several VMware instances: a virtual target that is scripted to browse websites, similar to the real target, a virtual router providing NAT service, and a Dummynet link configured to act as a bandwidth bottleneck. We used hping to send probes from the host O/S to the virtual NAT router. This provided very clean data for the training set, as there is no additional noise added by intermediate routers.

Note that, in practice, the same machine can be used for both the testbed and the attack server; however, we wanted to use rented machines for attacks to provide distant vantage points.

The following are the main parameters related to our performance evaluation. We vary their values in the tests and discuss the resulting classification performance.

• s is the size of our training sample set. We collect at most 50 samples for each website in the training stage. The default value for s is 50.

• N is the number of websites considered in classification. We consider the 30 most popular websites.

• k is the number of closest neighbors in k-NN algorithm. We choose a series of odd integers to find the best k for the attacker.

• bw is the bandwidth of the DSL router. The bandwidth in our test is 3 Mbps. We will study the attack performance under different bandwidths in a virtual machine setup.

• T_ping is the time interval between two pings. With the router bandwidth at 3 Mbps, we set the default value of T_ping to 0.005 s.

³ http://www.mozilla.com/firefox/

⁴ http://www.hping.org/

⁵ http://www.tcpdump.org/

[Figure 6: detection rate (0 to 1) vs. training set size (0 to 50), with curves for k = 1 and k = 3.]

Fig. 6. Accuracy of varying training set size

B. Testbed Results

We first tune the parameters for our attack by using the testbed to provide both the training and the target traces. The attacker can achieve about 80% detection accuracy when classifying 30 websites by sending ping requests every 0.005 s. We examine the performance under different values of N, s, and k.

We first fix N = 30, T_ping = 0.005 s, bw = 3 Mbps, and change the training set size s. Figure 6 depicts the results for k = 1 and 3. We see that in both cases more training samples result in better classification accuracy. This is because the effect of noisy samples in the training set is reduced as we increase the total training set size. Notice that the accuracy goes from 40% to 70% for k = 1, and from 60% to 80% for k = 3. The difference indicates that increasing k can also reduce the effect of noisy training samples. However, as we show later, larger values of k do not uniformly perform better.

We fix N = 30, T_ping = 0.005 s, bw = 3 Mbps, and change the value of k. Figure 7 depicts the results when s = 10, 20, and 50. As we choose larger k, the classification accuracies all first increase and then decrease. As discussed above, at the beginning a larger k will eliminate noise in the training samples. However, when k continues to increase, the boundary between classes becomes blurred, which explains the decreasing accuracy in Figure 7. As the accuracy peaks at k = 3, we set this value for k in our tests.

[Figure 7: detection rate (0 to 1) vs. the nearest neighbor number k (0 to 16), with curves for training set sizes 50, 20, and 10.]

Fig. 7. Accuracy of varying k

[Figure 8: detection rate (0 to 1) vs. number of websites (5 to 30), with curves for k = 3 and k = 1.]

Fig. 8. Accuracy of varying N

Next, we fix T_ping = 0.005 s, bw = 3 Mbps, s = 50, and vary N from 5 to 30. The results are depicted in Figure 8. The classification performance gets poorer when more websites are considered. The accuracy is about 90% when only classifying 5 websites, and decreases to below 80% when classifying 30 websites. This is expected because in hypothesis testing, for a fixed sample size (s = 50), the error probability increases as the number of hypotheses increases. We notice that the accuracy of k = 1 is higher at N = 5.

C. Router Bandwidth and Ping Frequency

The accuracy of the traffic pattern estimation depends on the router bandwidth and ping sending frequency. In this section, we show the classification performance under different values of these two parameters.

We fix N = 10, bw = 3 Mbps, s = 50, and increase the ping interval from 0.003 s. The result is depicted in Figure 9. The detection rate drops from about 90% at T_ping = 0.003 s to 20% at T_ping = 0.08 s. The poorer classification performance is expected because the attacker fails to observe patterns due to coarse sampling. For the router in our test, the attacker needs to send the probes every $\frac{1500\ \text{bytes}}{3\ \text{Mbps}} = 0.004$ s to capture every user packet. This also explains the high detection accuracy at low ping intervals depicted in Figure 9.

[Figure 9: detection rate (0 to 1) vs. the ping interval (0.01 to 0.08 s).]

Fig. 9. Effect of ping intervals

Next, we show how the router’s bandwidth affects the results. We use virtual machines to construct a virtual router with adjustable bandwidths. The results are given in Table I. We see that with the same ping interval, the accuracy decreases as the bandwidth increases.

D. Target System Results

We evaluate the classification accuracy of our website fingerprinting attack by obtaining test data from the target system connected by a DSL line and using the virtual machine testbed to collect a training set tuned with the same bandwidth parameters. We also consider matching a trace from the DSL scenario against other traces collected at the DSL computer; likewise for the virtual machines.

This allows us to differentiate between the impact of having an adequately tuned training environment and of the noise introduced by the queuing side channel. The results are shown in Table II. We see that, when the same computer is used for collecting training and test data, the classification results are very good. The results are comparable with the success rates of previous work, showing that the

TABLE I

ACCURACY OF VARYING BANDWIDTH

Bandwidth ping interval Accuracy 3 Mbps

1 ms 98.64%

3 ms 88.64%

5 ms 81.09%

5 Mbps

1 ms 98.18%

3 ms 73.64%

5 ms 70.45%

10 Mbps

1 ms 85.91%

3 ms 66.82%

5 ms 30.00%

TABLE II

CLASSIFICATION ACCURACY

N Training Test Accuracy 12

VM VM 84.21%

DSL DSL 81.25%

VM DSL 36.81%

24

VM VM 80.26%

DSL DSL 82.64%

VM DSL 21.53%

queuing side channel is an effective way to perform traffic analysis. When testing the DSL traffic against the VM training set, we get significantly worse results, although our classification rates are much higher than would be expected from random classification (8% for 12 websites and 4% for 24).

We expect that, with further tuning of the testbed architecture, the accuracy can be improved. Some degradation, on the other hand, may be inherent to using multiple vantage points, as is discussed in §V-B.

V. DISCUSSION

A. Privacy preserving router policies

The basis of our remote traffic analysis is the presence of the queuing side channel; therefore, countermeasures to our attack must seek to mitigate this side channel.

We have shown that, under an FCFS policy, the router may leak most of the user's traffic information. This is because a large correlation exists between the user's arrival process and the attacker's RTTs. To counter our attack, the router could employ a scheduling policy that produces low correlation between the two incoming traffic flows. However, the design of such a policy is not trivial. We evaluate two existing policies, namely round-robin and TDMA.


TABLE III
DETECTION RATE WITH DIFFERENT POLICIES

Router policy   Detection accuracy   Average delay
FCFS            81.09%               0.445 s
Round-Robin     78.00%               0.457 s
TDMA            17.00%               2.61 s

The simulation results are presented in Table III. In the simulation, N = 10, t = 0.003 s, s = 50, and k = 3. We see that the classification results of both policies are poorer than those of FCFS. In the round-robin case, the packets from the user and the attacker are served alternately, so the ping needs to wait for at most one user packet in the buffer. Hence, the queuing evolution resulting from the user's activity conveys less information than in the FCFS case. From the results in the table, we see that the classification accuracy is lower than with FCFS but still quite high, at 78%, much higher than a uniformly random pick (10%).

For TDMA, the router's service time is equally allocated to the user and the attacker; hence, the correlation between their traffic is extremely low, which means the attacker learns nothing. This is verified by the simulation results in Table III, where TDMA gives a detection accuracy below 20%. However, TDMA causes large delays, and an exact slot allocation policy can be challenging to implement in a dynamic packet network. Hence, the tradeoff between packet delay and information leakage is an important issue in designing practical router policies to prevent our attack.
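The qualitative difference between the three policies can be illustrated with a toy slotted-time model. This is only a sketch under strong assumptions and is not the simulator used for Table III: the router serves exactly one equal-size packet per slot, round-robin alternates packet by packet between two per-flow queues, TDMA gives even slots to the user and odd slots to the attacker, and the ping period is even. The function name `ping_delays` and the burst pattern are hypothetical.

```python
from collections import deque


def ping_delays(policy, user_arrivals, ping_period):
    """Return each ping's delay in slots under 'fcfs', 'rr', or 'tdma'."""
    shared_q = deque()                 # FCFS: one queue for both flows
    user_q, ping_q = deque(), deque()  # round-robin / TDMA: one queue per flow
    delays = []
    turn = "user"                      # round-robin: flow to be served next
    horizon = len(user_arrivals)
    slot = 0
    while slot < horizon or shared_q or ping_q:
        # Arrivals: user packets first, then the periodic ping.
        if slot < horizon:
            for _ in range(user_arrivals[slot]):
                (shared_q if policy == "fcfs" else user_q).append(("user", slot))
            if slot % ping_period == 0:
                (shared_q if policy == "fcfs" else ping_q).append(("ping", slot))
        # Service: at most one packet leaves per slot.
        served = None
        if policy == "fcfs":
            if shared_q:
                served = shared_q.popleft()
        elif policy == "rr":
            first, second = (user_q, ping_q) if turn == "user" else (ping_q, user_q)
            queue = first if first else second
            if queue:
                served = queue.popleft()
                turn = "ping" if served[0] == "user" else "user"
        elif policy == "tdma":
            queue = user_q if slot % 2 == 0 else ping_q
            if queue:
                served = queue.popleft()
        else:
            raise ValueError(f"unknown policy: {policy}")
        if served is not None and served[0] == "ping":
            delays.append(slot - served[1] + 1)
        slot += 1
    return delays


# Hypothetical user traffic: idle, a 4-slot burst of 3 packets per slot, idle.
burst = [0] * 4 + [3] * 4 + [0] * 16
for policy in ("fcfs", "rr", "tdma"):
    print(policy, ping_delays(policy, burst, ping_period=4))
```

In this toy model the FCFS ping delays rise and fall with the user's burst, the round-robin delays take only two values, and the TDMA delays are constant, which mirrors the ordering of information leakage discussed above.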

B. Dynamic Websites

Our attack relies on websites having a relatively stable fingerprint. We note that, even for dynamic websites, the overall pattern captured by our RTT probes remains static enough to obtain good classification results. Website content also changes over time, and with it the traffic patterns. We note that our technique is less sensitive to small changes in content, as we do not rely on exact packet sizes; however, large changes (e.g., site redesigns) will result in new patterns being discovered. Thus, for best results, the training set should be collected close in time to the probe. Note that, since the collection occurs in a testbed controlled by the attacker, collecting new data at any time is not a problem.

Websites that use content distribution networks (CDNs) will use different servers to deliver content based on the user’s location. They may also present localized versions of the site to users in different countries or regions. As shown in our experimental results, this can cause fingerprints to differ significantly. If identifying these sites is a high priority for the attacker, additional work would be needed to obtain fingerprints of the right version by, for example, using proxies and other techniques to fool IP-based localization.

VI. RELATED WORK

The idea of remote traffic analysis using probes has previously been explored in the context of exposing the identity of Tor relays participating in a given circuit. Murdoch and Danezis implemented one such attack against Tor and MorphMix [18]. Their approach was to send an on-off pattern of high-volume traffic through the anonymous tunnel and a low-volume probe to a router under test. If the waiting times of the probe show a corresponding increase during the “on” periods, the router is assumed to be routing the flow. However, Murdoch and Danezis's attack was performed only on a lightly loaded 13-relay Tor network, and it is not practical on today's heavily loaded 1500-relay Tor network. Even ignoring the growing false positive rates resulting from possible increases in traffic load due to legitimate use during the attacker's “on” period, the attacker needs an extremely large amount of bandwidth to measure enough relays during the attack window. Evans et al. [9] strengthened Murdoch and Danezis's attack with a bandwidth amplification attack, which makes it feasible in modern-day deployments of Tor. By combining JavaScript injection with a selective and asymmetric denial-of-service (DoS) attack, Evans et al. were able to infer specific information about the path selected by the victim and thus circumvent Murdoch and Danezis's attack. Hopper et al. [13] use a combination of Murdoch and Danezis's approach and pairwise round trip times (RTTs) between Internet nodes to correlate Tor nodes to likely clients. Chakravarty et al. [3] propose an attack for exposing Tor relays participating in a circuit of interest by modulating the bandwidth of an anonymous connection and then observing the fluctuations as they propagate through the Tor network.
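The on-off probing test attributed to Murdoch and Danezis above amounts to checking whether probe latencies rise when the injected traffic is "on". The sketch below illustrates that idea with a plain Pearson correlation; the decision threshold of 0.5, the function names, and the sample latencies are all hypothetical, and the cited work may use a different statistic.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no usable signal
    return cov / sqrt(var_x * var_y)


def appears_to_route_flow(on_off, probe_latencies, threshold=0.5):
    """Guess that the probed router carries the flow if latency tracks the pattern."""
    return pearson(on_off, probe_latencies) >= threshold


# Hypothetical data: 1 = attacker traffic "on", latencies in milliseconds.
pattern = [1, 1, 0, 0, 1, 1, 0, 0]
latency_on_path = [9, 10, 3, 2, 11, 9, 2, 3]
latency_off_path = [3, 2, 3, 2, 3, 2, 3, 2]
print(appears_to_route_flow(pattern, latency_on_path))
print(appears_to_route_flow(pattern, latency_off_path))
```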

We survey previous work on probing HTTP traffic in the literature. The objective of these attacks is to identify the website a user is browsing, that is, to fingerprint websites. A website fingerprinting procedure often entails a training stage, where the attacker extracts certain features from training HTTP traces and makes a profile for every website containing its feature information. In this way, the attacker can build a database containing profiles of all websites of interest. In the test stage, when a user is browsing a website, the attacker analyzes features in the download flow and assigns it to the most similar profile in the database. Within this framework, approaches in the literature differ mainly in the chosen features or the classification algorithms.

Cheng et al. [5] present one of the earliest website fingerprinting approaches. The classification features used in their scheme are the object sizes and the HTML file sizes. This attack is impractical for two reasons. First, the attacker needs to access the target website and record the sizes of all objects and references when building the page profile database. Second, it is infeasible to detect the sizes of individual objects in HTTP traces if connection pipelining (supported in HTTP 1.1) or tunnel-based encryption tools (e.g., WEP/WPA links and SSH tunnels) are used. Moreover, this attack is constrained to a single web server and hence is less powerful than later schemes, which consider identification of a larger number of candidate websites.

Hintz [12] and Sun et al. [28] both consider website fingerprinting attacks on SSL-encrypted HTTP connections. Their classification features are object sizes and counts. While Hintz did not present implementation details or experimental results, Sun et al. use a classifier based on Jaccard's coefficient and show that their attack can achieve a correct identification rate of 75%. Like Cheng, they assume that individual web objects in one transmission can be separated by examining the timing of TCP connections. Hence, the attack becomes infeasible if the user applies connection pipelining or tunnel-based encryption.

Instead of looking at web objects, Bissias et al. [2], Liberatore et al. [17], and Herrmann et al. [11] study the statistical characteristics of individual packets in the traffic flows. Bissias et al. use packet sizes and inter-arrival timings as classification features. Their method is sensitive to changes in the network environment, as the inter-arrival timing is highly dependent on the specific routing path and varies from time to time. To address this problem, Liberatore et al. use only packet sizes and counts in classification. They implement both a Jaccard coefficient classifier and a Naive Bayes classifier, and show the efficacy of the attack in practice. Using a similar scheme, Herrmann et al. further improve the classification accuracy using a Multinomial Naive Bayes classifier. Using these packet-based fingerprinting methods, the attacker can identify the target HTTP connection with over 80% detection accuracy.
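To make the Jaccard-coefficient classifiers mentioned above concrete, the following minimal sketch scores a trace against per-site profiles by the overlap of their sets of observed sizes. It assumes the simplest set-based form of the coefficient; the cited works' exact feature construction is not reproduced here, and the site names and sizes are invented for illustration.

```python
def jaccard(a, b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two collections, treated as sets."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0  # convention chosen here: two empty sets are identical
    return len(set_a & set_b) / len(set_a | set_b)


def classify(trace_sizes, profiles):
    """Return the profile name whose size set is most similar to the trace."""
    return max(profiles, key=lambda site: jaccard(trace_sizes, profiles[site]))


# Hypothetical profiles built in a training stage (sizes in bytes).
profiles = {
    "site-a": {52, 640, 1500},
    "site-b": {52, 320, 980, 1500},
}
print(classify([52, 1500, 640, 640], profiles))
```

Because the features are sizes, a classifier of this kind needs direct access to the user's packets, which is exactly what a remote attacker lacks.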

Summing up, if the attacker can capture the user's HTTP traffic traces, he can infer the websites the user is browsing by analyzing statistical information, such as packet sizes and counts. In many cases, however, the attacker has no direct access to the target traffic flows. For instance, the attacker may be in a different state or country from the user. In this case, all the schemes above, which belong to the category of local traffic analysis, become infeasible.

In contrast, we present a scheme that allows the attacker to infer the user's HTTP traffic pattern using a low-bit sequence of probes. Our scheme uses the standard website fingerprinting framework. However, the attacker has neither size nor timing information of the user's packets. Instead, we use a recovered time series pattern as our classification feature. We compute the similarity between samples with a DTW metric and make classification decisions in a k-nearest neighbor manner.
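The classification step described above can be sketched as follows. This is a textbook DTW with an absolute-difference cost and no warping-window constraint, combined with a majority-vote k-nearest-neighbor rule; these specific choices, the function names, and the tiny example series are assumptions for illustration rather than the exact configuration used in our experiments.

```python
from collections import Counter


def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences."""
    inf = float("inf")
    prev = [0.0] + [inf] * len(b)      # row 0 of the cumulative-cost table
    for x in a:
        curr = [inf] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            # Extend the cheapest of the three admissible predecessor cells.
            curr[j] = abs(x - y) + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[-1]


def knn_classify(sample, training, k=3):
    """training is a list of (label, series); return the majority label of the k nearest."""
    nearest = sorted(training, key=lambda item: dtw_distance(sample, item[1]))[:k]
    votes = Counter(label for label, _ in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical recovered queue-size patterns for two websites.
training = [
    ("A", [0, 0, 5, 5, 0, 0]),
    ("A", [0, 5, 5, 0, 0, 0]),
    ("B", [3, 3, 3, 3, 3, 3]),
    ("B", [3, 3, 2, 3, 3, 3]),
]
print(knn_classify([0, 0, 0, 5, 5, 0], training, k=3))
```

The example sample is a time-shifted copy of the "A" pattern; DTW absorbs the shift, which is the property that makes it suitable for noisy, misaligned probe observations.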

VII. CONCLUSION

We show that traffic analysis attacks can be carried out remotely, without access to the analyzed traffic, thus greatly increasing the attack surface. We identify a queuing side channel that can be used to infer the queue size of a given link with good accuracy and thus monitor traffic patterns. We show how this channel can be used to carry out remote website fingerprinting and identify a remote user's browsing patterns. This highlights the importance of traffic analysis attacks in today's connected Internet.


REFERENCES

[1] A. Akella, S. Seshan, and A. Shaikh. An empirical evaluation of wide-area Internet bottlenecks. In Internet Measurement Conference, pages 101–114, 2003.

[2] G. Bissias, M. Liberatore, D. Jensen, and B. Levine. Privacy vulnerabilities in encrypted HTTP streams. In Privacy Enhancing Technologies, pages 1–11, 2006.

[3] S. Chakravarty, A. Stavrou, and A. Keromytis. Identifying proxy nodes in a Tor anonymization circuit. In IEEE International Conference on Signal Image Technology and Internet Based Systems (SITIS ’08), pages 633–639, 2008.

[4] S. Chen, R. Wang, X. Wang, and K. Zhang. Side-Channel Leaks in Web Applications: a Reality Today, a Challenge Tomorrow. In Proceedings of IEEE Symposium on Security and Privacy (Oakland), 2010.

[5] H. Cheng and R. Avnur. Traffic analysis of SSL encrypted web browsing, 1998.

[6] T. M. Cover and P. E. Hart. Nearest neighbor pattern classification. IEEE Transactions on Information Theory, 13:21–27, 1967.

[7] T. Dierks and C. Allen. The TLS protocol version 1.0. RFC2246, Jan. 1999.

[8] R. Dingledine, N. Mathewson, and P. Syverson. Tor: The second-generation onion router. In Proceedings of the 13th USENIX Security Symposium, pages 303–320, 2004.

[9] N. Evans, R. Dingledine, and C. Grothoff. A practical congestion attack on Tor using long paths. In 18th USENIX Security Symposium, pages 33–50, 2009.

[10] D. Herrmann, R. Wendolsky, and H. Federrath. Website fingerprinting: attacking popular privacy enhancing technologies with the multinomial naïve-Bayes classifier. In Proceedings of the 2009 ACM workshop on Cloud computing security, pages 31–42. ACM, 2009.

[11] D. Herrmann, R. Wendolsky, and H. Federrath. Website fingerprinting: Attacking popular privacy enhancing technologies with the multinomial naive-bayes classifier. In ACM Cloud Computing Security Workshop, 2009.

[12] A. Hintz. Fingerprinting websites using traffic analysis. In Workshop on Privacy Enhancing Technologies, 2002.

[13] N. Hopper, E. Y. Vasserman, and E. Chan-Tin. How much anonymity does network latency leak? In CCS ’07: Proceedings of the 14th ACM conference on Computer and communications security. ACM, 2007.

[14] S. Kadloor, X. Gong, N. Kiyavash, T. Tezcan, and N. Borisov. Low-cost side channel remote traffic analysis attack in packet networks. In IEEE International Conference on Communications, 2010.

[15] S. Kent and R. Atkinson. RFC2401: security architecture for the Internet protocol. RFC Editor United States, 1998.

[16] K. Lakshminarayanan and V. N. Padmanabhan. Some findings on the network performance of broadband hosts. In Internet Measurement Conference, pages 45–50, 2003.

[17] M. Liberatore and B. N. Levine. Inferring the source of encrypted HTTP connections. In CCS ’06: Proceedings of the 13th ACM conference on Computer and communications security, pages 255–263, New York, NY, USA, 2006. ACM Press.

[18] S. J. Murdoch and G. Danezis. Low-cost traffic analysis of Tor. In SP ’05: Proceedings of the 2005 IEEE Symposium on Security and Privacy, pages 183–195, Washington, DC, USA, 2005. IEEE Computer Society.

[19] R. Prasad, C. Dovrolis, M. Murray, and K. Claffy. Bandwidth estimation: metrics, measurement techniques, and tools. IEEE Network, 17(6):27–35, 2003.

[20] J. Raymond. Traffic analysis: Protocols, attacks, design issues, and open problems. In Designing Privacy Enhancing Technologies, pages 10–29. Springer, 2000.

[21] V. Ribeiro, R. Riedi, R. Baraniuk, J. Navratil, and L. Cottrell. pathChirp: Efficient available bandwidth estimation for network paths. In Passive and Active Measurement Workshop, volume 4. Citeseer, 2003.

[22] L. Rizzo. Dummynet: a simple approach to the evaluation of network protocols. ACM SIGCOMM Computer Communication Review, 27(1):31–41, 1997.

[23] H. Sakoe and S. Chiba. Dynamic programming algorithm optimization for spoken word recognition. IEEE Transactions on Acoustics, Speech, and Signal Processing, 26:43–49, 1978.

[24] T. Saponas, J. Lester, C. Hartung, S. Agarwal, T. Kohno, et al. Devices that tell on you: Privacy trends in consumer ubiquitous computing. In USENIX Security, volume 3, page 3, 2007.

[25] M. Shreedhar and G. Varghese. Efficient fair queueing using deficit round-robin. IEEE/ACM Transactions on Networking (TON), 4(3):385, 1996.

[26] D. X. Song, D. Wagner, and X. Tian. Timing analysis of keystrokes and SSH timing attacks. In USENIX Security Symposium, 2001.

[27] J. Strauss, D. Katabi, and F. Kaashoek. A measurement study of available bandwidth estimation tools. In Proceedings of the 3rd ACM SIGCOMM conference on Internet measurement, pages 39–44. ACM, 2003.

[28] Q. Sun, D. R. Simon, Y.-M. Wang, W. Russell, V. N. Padmanabhan, and L. Qiu. Statistical identification of encrypted web browsing traffic. In IEEE Symposium on Security and Privacy. Society Press, 2002.

[29] C. V. Wright, L. Ballard, S. E. Coull, F. Monrose, and G. M. Masson. Spot me if you can: Uncovering spoken phrases in encrypted VoIP conversations. In IEEE Symposium on Security and Privacy, pages 35–49, 2008.

[30] C. V. Wright, L. Ballard, S. E. Coull, F. Monrose, and G. M. Masson. Spot me if you can: Uncovering spoken phrases in encrypted VoIP conversations. In SP ’08: Proceedings of the 2008 IEEE Symposium on Security and Privacy, pages 35–49, Washington, DC, USA, 2008. IEEE Computer Society.

[31] T. Ylonen and C. Lonvick. Internet draft - SSH transport layer protocol. March 2005.

[32] K. Zhang and X. Wang. Peeping Tom in the Neighborhood: Keystroke Eavesdropping on Multi-User Systems. In USENIX Security, 2009.

[33] Y. Zhu and R. Bettati. Unmixing mix traffic. In Privacy Enhancing Technologies, pages 110–127. Springer.