5.6 Experimental Results
5.6.3 Leaking User Identities
We assume that the web traffic for an indirect transition examined in a testing scenario can fingerprint the user when it is consistent regardless of the testing location, testing machine, testing time, and so on. In other words, similar guessing probabilities are obtained across different testing rounds and different locations.
This chapter therefore considers that the indirect transitions for a website leak user identities if
(1) the guessing probabilities generated at different locations are consistent, i.e. the five variation trends of the guessing probabilities over the four scenarios, one trend per location, generally follow a consistent variation pattern; and
(2) the largest gap between the guessing probabilities of any two scenarios is large enough.
Condition (1) selects the websites with consistent leakage, regardless of external factors such as location.
For condition (2), we currently examine the leakage only when the largest gap between guessing probabilities originates from scenarios 1 and 4, i.e. the highest probability is from scenario 1 and the lowest from scenario 4. Leakage in other cases is left for future investigation.
Analysing Variation Trends of Guessing Probabilities
To ensure that the variation trends of guessing probabilities from different locations are consistent, we executed the test cases for each website for at least two rounds at each location, except at the testing sites in Berlin and Neuchatel, which became unavailable after one round of testing for all the websites.
Next we describe the process of determining the websites for which the indirect transitions generate consistent variation trends of guessing probabilities among locations.
1. First two testing rounds
In the beginning, the test cases for each website were executed for two rounds at each location, excluding the testing sites in Berlin and Neuchatel, where only one testing round was performed.
After two rounds of testing, eight variation trends of guessing probabilities over the four scenarios had been generated for each website: five from the five locations in the first round and the other three from the second round.
If more than half of the variation trends, i.e. at least five trends, are generally consistent with each other, the average of the guessing probabilities across the consistent variation trends is regarded as the variation pattern of guessing probabilities for the website.
If both of the variation trends generated at a location are consistent with the variation pattern, the location is considered a consistent location. For Berlin and Neuchatel, where only one variation trend was generated, the location is also regarded as a consistent location if that trend follows the variation pattern.
On the other hand, for a location at which at least one variation trend is inconsistent with the variation pattern, the test cases for the website are executed again at that location in the next testing round.
If all five locations are regarded as consistent locations, i.e. all eight variation trends are consistent, the website is considered to generate consistent variation trends across locations, and the testing for this website is stopped.
In contrast, when at most half of the variation trends are consistent, i.e. four trends or fewer, the variation trends of guessing probabilities are deemed unstable, and the leakage of the user identity with regard to this website is not analysed.
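The decision taken after the first two rounds can be sketched as follows. This is only an illustrative sketch, not the tooling used in the experiments: the function name `after_two_rounds`, the dictionary layout, and the status labels are assumptions, and the judgement of whether a trend is "generally consistent" is taken as an input flag because the text does not define it numerically. The sketch also uses one flag per trend for both the agreement among trends and the agreement with the variation pattern, which is a simplification.

```python
from statistics import mean


def after_two_rounds(trends):
    """Classify a website after the first two testing rounds.

    trends maps a location name to a list of (probabilities, is_consistent)
    pairs. probabilities holds one guessing probability per scenario;
    is_consistent is the analyst's judgement, supplied as input (assumption).
    """
    all_trends = [t for per_loc in trends.values() for t in per_loc]
    kept = [probs for probs, ok in all_trends if ok]

    # At most half of the trends consistent (four of eight): unstable.
    if 2 * len(kept) <= len(all_trends):
        return {"status": "unstable", "pattern": None, "retest": []}

    # Variation pattern: scenario-wise mean over the consistent trends.
    pattern = [mean(column) for column in zip(*kept)]

    # A location is retested if any of its trends is inconsistent.
    retest = [loc for loc, per_loc in trends.items()
              if not all(ok for _, ok in per_loc)]
    status = "consistent" if not retest else "continue"
    return {"status": status, "pattern": pattern, "retest": retest}
```

With eight trends, the function returns "unstable" when four or fewer are flagged consistent, "consistent" when all eight are, and otherwise "continue" together with the locations to be tested again in the next round.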
2. More testing rounds
From the third testing round onwards, as mentioned, test cases are executed only at the locations that are not yet considered consistent locations. At the end of a testing round, for each location, if all the variation trends generated at the location, including those from previous testing rounds, are inconsistent with the variation pattern, the location is regarded as an inconsistent location.
Conversely, if more than half of the variation trends generated at a testing site are consistent with the variation pattern, the location is now regarded as a consistent location. Otherwise, the test cases for the website are executed again at this location in the next round.
The testing at a location is repeated until (1) the location is deemed a consistent location, i.e. more than half of the variation trends generated at this location are consistent with the variation pattern, or (2) the testing for the website is stopped.
3. Stopping testing
At the end of each testing round, except the first two rounds, if at least three locations are accepted as consistent locations, the website is regarded as generating consistent guessing probabilities, and the testing on the website is stopped.
In contrast, if at least three locations are considered inconsistent locations, the guessing probabilities for this website are inconsistent. The testing for the website is then stopped, and the leakage of the user identity from the indirect transitions with this website is not analysed.
Moreover, the testing for the website is stopped when the maximum number of testing rounds is reached. In this research, the test cases for a website were executed for at most five rounds. This bound was chosen experimentally, as the variation trends tend to become consistent by the fifth round.
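The per-location rule of step 2 and the stopping rule of step 3 can be sketched in the same style. Again this is a minimal sketch under stated assumptions: the names `location_status`, `stop_decision`, and `MAX_ROUNDS`, as well as the string labels, are hypothetical, and the consistency of each trend with the variation pattern is supplied as a boolean input. The text does not state which verdict applies when the round limit is reached, so the sketch only reports that the limit was hit.

```python
MAX_ROUNDS = 5  # bound on testing rounds stated in the text


def location_status(flags):
    """Status of one location from round three onwards.

    flags lists, for every trend generated at the location so far,
    whether it is consistent with the variation pattern (input judgement).
    """
    if not any(flags):
        return "inconsistent"      # every trend disagrees with the pattern
    if 2 * sum(flags) > len(flags):
        return "consistent"        # more than half agree with the pattern
    return "retest"                # undecided: test again next round


def stop_decision(statuses, round_no):
    """Stopping rule applied at the end of each round after the first two.

    statuses maps a location name to its current status label.
    """
    values = list(statuses.values())
    if values.count("consistent") >= 3:
        return "stop: consistent"
    if values.count("inconsistent") >= 3:
        return "stop: inconsistent"
    if round_no >= MAX_ROUNDS:
        return "stop: max rounds"
    return "continue"
```

For example, a location with trends flagged `[True, False, True]` becomes consistent after the third round, whereas `[True, False]` remains undecided and is tested again.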
After examining the 27 websites, we obtained the top six websites for which the guessing probabilities are consistent among locations and the maximum gaps between guessing probabilities are large enough. The next section analyses the variation trends for these six websites.