7.1.1 Overview
This research presents an extensive study of side-channel vulnerabilities in web applications through traffic analysis. Test case generation is introduced as a step towards fully automated side-channel analysis of web applications. The research aims to help developers detect side-channel vulnerabilities in their web applications.
We examine side-channel vulnerabilities that affect user privacy, including user web activities and user identities and, more coarsely, user locations. A wider range of communications is analysed, including those that implicitly transmit sensitive information and those that interact with external websites.
Moreover, we analyse the leaking factors that cause side-channel vulnerabilities exposed through traffic analysis. We discover that cookies and logged-in user accounts can be sources of web traffic that varies with user identity.
More specifically, we summarise the thesis from the practical and the theoretical sides as follows.
7.1.2 From the Practical Side
On the practical side, this thesis presents implementations for analysing Struts-based Java applications and real-world web applications.
Our earliest work proposes a framework for automating the analysis of side-channel vulnerabilities in Struts-based web applications. The main advance of this work is the automated generation of test cases for web applications, which we achieve with a novel approach that combines static analysis with symbolic execution. Test cases are generated with respect to the secrets under examination, and the resulting leakages are then evaluated.
The techniques are implemented in a tool, SideAuto. SideAuto is then evaluated on six real-world or simulated web applications. Our study shows that the system works effectively on these web applications with acceptable overheads. The details are presented in Chapter 3.
Next, we turn our focus to real-world web applications, which are not limited to the Struts framework.
A black-box analysis is proposed to detect which transitions are liable to leak user privacy. An algorithm is developed to generate test cases automatically. Individual transitions are examined, including those that explicitly and those that implicitly involve sensitive information. The analysis is applied to four real-world web applications.
The experimental results show that transitions which appear to have no relation to sensitive user information can in fact reveal more user secrets than those explicitly related to it. Moreover, the experiments on the Google website demonstrate that user identities can be leaked to a large extent from Google accounts, through transitions that implicitly interact with sensitive information. The details are described in Chapter 4.
Inspired by the surprising result of large leaks of user identities from Google accounts, we then conduct an in-depth study of the leakage of user identities on Google user accounts.
We examine the leaks of user identities from fifty Google user accounts through communications with the Alexa Top 150 websites. Experimental results show that user identities can be revealed through communications between the Google website and external websites. Moreover, it is shown that user locations may also be leaked through traffic analysis.
Furthermore, four testing scenarios are designed to explore the leaking sources. We discover that cookies and logged-in user accounts can be factors that cause web traffic to leak user identities. More details are presented in Chapter 5.
7.1.3 From the Theoretical Side
On the theoretical side, we evaluate leakages using quantitative information flow. In Chapter 3, Shannon entropy, min-entropy and conditional entropy are used to quantify leaks in terms of the uncertainty about user privacy in Struts-based web applications. In Chapters 4 and 5, the guessing probability is used to evaluate the probability of correctly guessing a secret in one try.
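To make these measures concrete, the following is a minimal Python sketch of their standard textbook definitions. It is an illustration only, not the implementation used in Chapters 3 to 5: the function names, the dictionary representation of distributions, and the toy example of four equally likely secrets are all assumptions made here.

```python
import math
from collections import defaultdict


def shannon_entropy(dist):
    """H(S) = -sum_s p(s) log2 p(s), for a dict {secret: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def guessing_probability(dist):
    """Probability of guessing the secret correctly in one try: max_s p(s)."""
    return max(dist.values())


def min_entropy(dist):
    """H_inf(S) = -log2 max_s p(s)."""
    return -math.log2(guessing_probability(dist))


def conditional_entropy(joint):
    """H(S|O) for a joint distribution given as {(secret, observation): p}."""
    marginal = defaultdict(float)
    for (_, obs), p in joint.items():
        marginal[obs] += p
    return -sum(
        p * math.log2(p / marginal[obs])
        for (_, obs), p in joint.items()
        if p > 0
    )


# Hypothetical example: four equally likely secrets; the observed traffic
# only reveals which of two groups the secret belongs to.
prior = {"s1": 0.25, "s2": 0.25, "s3": 0.25, "s4": 0.25}
joint = {
    ("s1", "small"): 0.25,
    ("s2", "small"): 0.25,
    ("s3", "large"): 0.25,
    ("s4", "large"): 0.25,
}

# Shannon leakage: the reduction in uncertainty caused by the observation.
leakage_bits = shannon_entropy(prior) - conditional_entropy(joint)
```

In this toy case the prior uncertainty is 2 bits and the remaining uncertainty after observing the traffic is 1 bit, so the observation leaks 1 bit about the secret.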
When constructing a most likely sequence, we develop an approach motivated by hidden Markov models to find the solution which best explains the web traffic collected. Then the distance between an observation and the traffic pattern is measured based on an optimised Damerau-Levenshtein distance with super transpositions and shifts. We then calculate the probabilities of observations from transitions to build a probability distribution in terms of the most likely sequences.
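As a point of reference for the distance step, the sketch below implements the common restricted form of the Damerau-Levenshtein distance (optimal string alignment), which counts insertions, deletions, substitutions and swaps of adjacent elements. It is not the optimised variant with super transpositions and shifts developed in this thesis; the function name and the representation of a trace as a list of packet sizes are assumptions made for illustration.

```python
def osa_distance(a, b):
    """Restricted Damerau-Levenshtein (optimal string alignment) distance.

    `a` and `b` are sequences (strings, or lists of packet sizes). Each
    insertion, deletion, substitution or adjacent transposition costs 1.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # d[i][j] holds the distance between a[:i] and b[:j].
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i elements of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all j elements of b[:j]

    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
            # Swap of two adjacent elements.
            if (
                i > 1
                and j > 1
                and a[i - 1] == b[j - 2]
                and a[i - 2] == b[j - 1]
            ):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


# Hypothetical traces: the first two packets arrive in swapped order.
pattern = [310, 1500, 640]
observation = [1500, 310, 640]
distance = osa_distance(observation, pattern)
```

Counting a swap as a single edit is useful for traffic traces because packets from concurrent requests can arrive in a different order without the underlying transition having changed.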
Moreover, we propose an advanced fingerprinting model based on a “one → many” mapping, where a single transition is associated with one or more traffic patterns. Compared with a “one → one” mapping, this mapping is closer to real-world cases, as a transition may generate varying web traffic. Hence the “one → many” mapping covers a wider range of the web traffic analysed. It opens a new research direction in analysing web traffic.
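The difference between the two mappings can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the model from the thesis: the transition names and packet sizes are hypothetical, and observations are matched by exact equality, whereas the analysis described above compares traffic using a distance measure.

```python
from collections import defaultdict


def build_one_to_many(samples):
    """Map each transition to the set of all traffic patterns seen for it."""
    fingerprints = defaultdict(set)
    for transition, trace in samples:
        fingerprints[transition].add(tuple(trace))
    return dict(fingerprints)


def build_one_to_one(samples):
    """Map each transition to only the first traffic pattern seen for it."""
    fingerprints = {}
    for transition, trace in samples:
        fingerprints.setdefault(transition, {tuple(trace)})
    return fingerprints


def matching_transitions(fingerprints, observation):
    """Return the transitions with a pattern equal to the observation."""
    obs = tuple(observation)
    return sorted(t for t, patterns in fingerprints.items() if obs in patterns)


# Hypothetical training samples: (transition, packet sizes).
# The transition "open_inbox" produces two different traffic patterns.
samples = [
    ("open_inbox", [520, 1460, 310]),
    ("open_inbox", [520, 1460, 1460, 310]),
    ("search", [480, 900]),
]

observed = [520, 1460, 1460, 310]
many = matching_transitions(build_one_to_many(samples), observed)
one = matching_transitions(build_one_to_one(samples), observed)
```

Here the “one → many” fingerprints recognise the observation as `open_inbox`, while the “one → one” fingerprints, which keep a single pattern per transition, find no match.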
sequences using a novel method. However, we do not validate that these most likely sequences are the same as those generated using a standard approach. This means that we have no guarantee that the most likely sequences generated are correct.
On the other hand, for the leakages evaluated in the experiments, we also lack validation of the results produced by the analyses presented. More precisely, we do not validate the accuracy of the leakages obtained. We do not perform real attacks on the web applications to confirm that the leakages of user privacy are really as large as the figures produced by the analyses.
Without validation, one may ask questions such as “how can you say that the results of the analyses are accurate”, “how can you say that the web applications really leak user privacy”, etc. Although this research aims to discover possible side-channel vulnerabilities rather than to estimate leakages precisely, errors in the results may mislead developers, so that some important vulnerabilities are missed while some insignificant ones are investigated. Therefore, future work needs to overcome these limitations.