
5.6 Experimental Results

5.6.6 Leaking No Information

Figure 5.5 shows, for two websites, how the guessing probabilities vary across five locations. These are two typical examples of leaking no information. The plotted data were generated in the first testing round.

Case 1

In Figure 5.5(1), for the website imgur.com, the variation trends are inconsistent across locations. The trends are likewise inconsistent in the second testing round, which is not plotted here. Moreover, the web traffic cannot be used to identify user locations.

Therefore, the web traffic whose guessing probabilities are shown in Figure 5.5(1) leaks neither user identities nor user locations.

However, it remains unclear what kind of information causes the inconsistent guessing probabilities. Is it noise, or a combination of multiple user secrets transmitted over the network? And could such a combination produce seemingly random observations that in fact leak, e.g., both user identities and locations? These questions can be examined in future work, to explore the link between irregular variation trends of guessing probabilities and user privacy.

Figure 5.5: Websites without leakage

Case 2

Another typical case of leaking no information is when the maximum gap between the guessing probabilities is very small, regardless of whether the variation trends are consistent.
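The maximum gap is straightforward to compute once the guessing probabilities are tabulated. The Python sketch below assumes that the gap is taken over all (scenario, location) pairs; the function name `max_gap` and the scenario labels are hypothetical and are not taken from the thesis. The example input has every value equal to 0.02, as in the extreme case of Figure 5.5(2).

```python
from typing import Mapping, Sequence


def max_gap(probs: Mapping[str, Sequence[float]]) -> float:
    """Largest difference between any two guessing probabilities.

    `probs` maps a (hypothetical) scenario label to the guessing
    probabilities measured at each location in that scenario. The gap is
    taken over all scenario/location pairs, which is an assumption.
    """
    values = [p for per_location in probs.values() for p in per_location]
    if not values:
        raise ValueError("no guessing probabilities given")
    return max(values) - min(values)


# Every scenario gives 0.02 at all five locations.
flat = {"scenario_a": [0.02] * 5, "scenario_b": [0.02] * 5}
print(max_gap(flat))  # 0.0
```

A gap of zero means no scenario or location makes the secret any easier to guess than another. Deciding what counts as "very small" would need a threshold, which remains a judgement call.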

Figure 5.5(2), for the website twitter.com, illustrates an extreme example of this case: the guessing probability in every scenario is 0.02 at every location in all testing rounds. Moreover, we find that the generated web traffic is identical in all situations and is entirely unrelated to user accounts or user locations.

Therefore, twitter.com provides a perfect example of leaking no user information through the web traffic of indirect transitions. It shows that web traffic can, in practice, be completely independent of user information, in which case traffic analysis cannot compromise user privacy.
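The reason identical traffic leaks nothing can be made precise if the guessing probability is read as a Bayes vulnerability. This reading is an assumption for this sketch: the vulnerability is the probability of guessing the secret in one try, either before observing the traffic (prior) or after (posterior). When every secret produces the same distribution over observations, the posterior vulnerability equals the prior one. The sketch uses exact fractions. The number of accounts `n = 50` and the two-column channel rows are hypothetical. They were chosen only because a uniform prior over 50 secrets gives exactly 0.02, and they do not describe the actual experimental setup.

```python
from fractions import Fraction


def prior_vulnerability(prior):
    """Probability of guessing the secret in one try, before any observation."""
    return max(prior)


def posterior_vulnerability(prior, channel):
    """Expected one-try guessing probability after observing the traffic.

    channel[x][y] is P(observation y | secret x); each row sums to 1.
    """
    n_obs = len(channel[0])
    return sum(max(prior[x] * channel[x][y] for x in range(len(prior)))
               for y in range(n_obs))


n = 50  # hypothetical number of user accounts (assumption, not measured)
prior = [Fraction(1, n)] * n
# Identical traffic for every account: all rows of the channel are equal.
row = [Fraction(1, 3), Fraction(2, 3)]
channel = [row] * n
print(prior_vulnerability(prior), posterior_vulnerability(prior, channel))  # 1/50 1/50
```

Because every row is the same, the maximum over secrets for each observation is just `row[y] / n`. These values sum to `1 / n`, which is the prior value, so observing the traffic does not help the adversary.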

The simplest explanation is that twitter.com did not request any user-related information from the Google website. However, information related to user identities may still have been transmitted during communications between Google and Twitter, while a state-of-the-art mitigation against traffic analysis was applied to either the web traffic or the user information, making the observations identical.

Hence, questions such as “why does Twitter leak no user information?” and “has it found a solution that mitigates side-channel leakages?” remain interesting, and they can be investigated in future work.

On the other hand, user privacy may still leak through communications within the Twitter website itself. For example, posted messages are transmitted within a social networking website and make the resulting web traffic depend on the user, so distinguishing characteristics can be constructed from them. This kind of leakage has been studied in the context of user privacy in social networking websites, e.g. [121]. However, it is not the topic this thesis examines.

5.7 Discussion

This chapter begins an in-depth analysis of the side-channel leakage of user identities. It demonstrates a potential real-world threat in which user identities associated with Google accounts are revealed to external websites. Moreover, in our experiments user location is also identifiable by traffic analysis.

This chapter also analyses the effects of cookies and logged-in Google accounts on how web traffic varies with user identities. It is not surprising that cookies leak user privacy, as they are a well-known means of web tracking. However, to the best of our knowledge, this research is the first study of cookies in side-channel leakages of user identities via traffic analysis.

On the other hand, one may ask whether an increase in guessing probability of around 0.3 is large enough to indicate that the user identity is leaked. Moreover, questions such as “is the sample space too small compared with the millions of Google accounts?” and “does the guessing probability drop rapidly as the sample space grows?” remain open.
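The sample-space question can be illustrated with a simple model built on three assumptions, none of which comes from the experiments. First, the channel is deterministic, so each account always produces the same observation. Second, the prior over n accounts is uniform. Third, the traffic distinguishes only a fixed number k of account classes. Under these assumptions the adversary guesses one consistent account per observed class, and the posterior vulnerability is k/n. It therefore falls as the sample space grows unless the traffic distinguishes more classes. The function name and k = 10 are hypothetical.

```python
from fractions import Fraction


def posterior_vulnerability_deterministic(observation_of):
    """Bayes vulnerability under a uniform prior for a deterministic channel.

    observation_of[x] is the single observation produced by secret x.
    For each distinct observation the adversary can be right for exactly
    one secret, so V = (#distinct observations) / (#secrets).
    """
    return Fraction(len(set(observation_of)), len(observation_of))


k = 10  # hypothetical number of distinguishable account classes
for n in (50, 500, 5000):
    secrets_to_obs = [x % k for x in range(n)]  # every class is non-empty
    print(n, posterior_vulnerability_deterministic(secrets_to_obs))
# prints "50 1/5", then "500 1/50", then "5000 1/500"
```

Whether k stays fixed for real traffic as more accounts are added is precisely the open question. If the traffic distinguishes more accounts as the population grows, the vulnerability need not fall this quickly.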

Unlike the previous work, which quantifies leakages by simulating real attacks, the experimental results in this chapter are obtained solely from the training data. We lack experimental validation of the leakages obtained by the analyses. In future work, real-world attacks need to be mounted to measure the accuracy of guesses and to compare it with the vulnerability produced by the analyses.
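One way to compare attack accuracy with an analytic vulnerability is by simulation. The sketch below is a minimal illustration and not the procedure of this thesis. It draws a secret from a prior and an observation from a channel matrix. The adversary then makes the maximum a posteriori (MAP) guess, and the sketch reports the fraction of correct guesses. For a Bayes-optimal adversary, this fraction converges to the posterior vulnerability. The three-secret, two-observation channel is made up for illustration.

```python
import random


def posterior_vulnerability(prior, channel):
    """Analytic one-try guessing probability after one observation."""
    return sum(max(prior[x] * channel[x][y] for x in range(len(prior)))
               for y in range(len(channel[0])))


def bayes_guess(prior, channel, y):
    """MAP guess: the secret x maximising P(x) * P(y | x)."""
    return max(range(len(prior)), key=lambda x: prior[x] * channel[x][y])


def simulate_attack(prior, channel, trials, rng):
    """Fraction of simulated attacks in which the MAP guess is correct."""
    secrets = range(len(prior))
    observations = range(len(channel[0]))
    hits = 0
    for _ in range(trials):
        x = rng.choices(secrets, weights=prior)[0]
        y = rng.choices(observations, weights=channel[x])[0]
        hits += bayes_guess(prior, channel, y) == x
    return hits / trials


# Hypothetical channel, not measured data.
prior = [1 / 3, 1 / 3, 1 / 3]
channel = [[0.9, 0.1],
           [0.2, 0.8],
           [0.5, 0.5]]
print(posterior_vulnerability(prior, channel))  # analytic value, about 0.567
print(simulate_attack(prior, channel, 100_000, random.Random(0)))
```

A large gap between the two printed numbers would suggest that the model behind the analysis, or the adversary's guessing strategy, does not match reality.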

Furthermore, when assessing whether a maximum gap between the guessing probabilities is large enough, we do not establish a threshold for the existence of leakage. Instead, we judge subjectively whether the gap is large enough to indicate the leakage of user identities.

Despite these limitations, the purpose of this research is achieved. Our purpose is not to quantify precise leakages in real-world web applications. Instead, the primary aim of this chapter is to illustrate a new traffic-analysis threat concerning the leakage of user identities. The threat is based on user accounts on a website like Google, where a user identity can be leaked through communications with external websites.

Moreover, this chapter introduces a null pattern for observations that do not follow any of the non-null patterns. Although the null pattern captures traffic features only weakly, it is a first step towards handling multiple mappings between transitions and observations, which are more common in real-world communications. In this sense, it is an advance towards a more precise evaluation of traffic-analysis vulnerabilities in real-world web applications.

On the whole, this chapter demonstrates a new security threat of side-channel attacks in web applications. Rather than closing the case, we believe this research opens a new one: the in-depth investigation of side-channel vulnerabilities in communications with external websites.

In this chapter, we review previous work in the following areas:

6.1 Side-Channel Vulnerabilities

6.1.1 Overview

There is a long-standing and large literature on side-channel vulnerabilities. It dates back to WWII, when Bell Labs discovered by accident that an encryption machine emitted a spike each time it stepped, and that these spikes could, remarkably, reveal the plaintext of the message being enciphered. The early history of this discovery is described in [59].

Since then, side-channel vulnerabilities have been examined in various domains, including encrypted Voice over IP (VoIP) conversations [123, 122], multimedia data streaming [105], keyboard acoustic emanations [23, 126] and cryptographic systems [73, 61].

For example, encrypted VoIP conversations use techniques such as variable bit rate (VBR) encoding to compress audio and save bandwidth. Wright et al. [123] demonstrate that the lengths of encrypted, compressed VoIP packets can be exploited to identify the language spoken in an encrypted conversation. Moreover, phrases spoken within an encrypted call can also be revealed from these packet lengths [122].

Keyboard acoustic emanations make it possible to reveal characters typed on keyboard-like input devices by distinguishing the sounds emanated by different keys [23, 126]. Apart from acoustic emanations, keystrokes can also be revealed by timing attacks. Song et al. [110] infer key sequences from the time differences between consecutive key presses, because an individual IP packet is sent to the remote machine immediately after every key press.
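The observation behind this timing attack is that, if each key press triggers exactly one packet, the gaps between packet timestamps reproduce the gaps between key presses. The sketch below shows only this feature-extraction step. It does not show how key sequences are then inferred from the intervals. The timestamps are hypothetical, and network jitter is ignored.

```python
def inter_keystroke_intervals(packet_times_ms):
    """Latencies between consecutive key presses, in milliseconds.

    Assumes one packet is sent immediately per key press, so the gap
    between consecutive packets equals the gap between key presses.
    """
    return [b - a for a, b in zip(packet_times_ms, packet_times_ms[1:])]


# Hypothetical capture timestamps (ms) for five keystrokes.
times = [0, 120, 310, 395, 600]
print(inter_keystroke_intervals(times))  # [120, 190, 85, 205]
```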

Timing attacks have also been exploited against cryptographic implementations. For example, [73] demonstrates how secret keys can be obtained by measuring the time required to perform cryptographic computations in systems such as Diffie-Hellman [51], RSA [101] and DSS [57].

In particular, cache-based side-channel attacks have been mounted on cryptographic implementations, e.g. [31, 21]. They break cryptosystems by exploiting the states of the cache access mechanism. For example, in time-driven cache attacks, the total time needed to perform certain computations is measured to infer the number of cache hits and misses during encryption [31].
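The inference step of a time-driven cache attack can be sketched with a deliberately simplified cost model. This model is an assumption and not the one used in [31]. Each of N memory accesses costs either t_hit or t_miss cycles, so the total time satisfies T = h * t_hit + m * t_miss with h + m = N. Solving gives m = (T - N * t_hit) / (t_miss - t_hit). All numbers below are hypothetical. A real attack has to average many noisy measurements.

```python
def estimate_cache_misses(total_time, accesses, t_hit, t_miss):
    """Estimate the number of cache misses from the total running time.

    Simplified model (assumption): each access costs exactly t_hit on a
    hit or t_miss on a miss, and no other cost varies, so
        total_time = hits * t_hit + misses * t_miss,  hits + misses = accesses.
    """
    if t_miss <= t_hit:
        raise ValueError("a miss must be slower than a hit")
    return (total_time - accesses * t_hit) / (t_miss - t_hit)


# Hypothetical: 160 table lookups, 3-cycle hits, 100-cycle misses.
print(estimate_cache_misses(total_time=1256, accesses=160, t_hit=3, t_miss=100))  # 8.0
```

Here 8 misses at 100 cycles and 152 hits at 3 cycles add up to the 1256 cycles measured. Because the miss count depends on which table entries the key-dependent lookups touch, it carries information about the key.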

Another important category of side-channel vulnerabilities is traffic analysis in web applications, which has raised increasing public concern over the past two decades. The adversary's goal is to infer web users' online activities, such as the websites they have visited.

In document Quantitative Information Flow of Side-Channel Leakages in Web Applications (Page 122-126)