6.5 Evaluation
6.5.3 Expected privacy and overhead
We consider three different forms for the adversary’s prior probability distribution: (i) no knowledge (uniform-global): equal probability throughout the 320 × 320 cells, (ii) locality knowledge (uniform-local): equal probability in a circle of 25 cells radius, zero everywhere else, and (iii) precise knowledge (gaussian): normally distributed probabilities, with the mean cell at the center and a variance of 50 cells. Figure 6.4 depicts the uniform-local and the gaussian knowledge distributions. We assume that the attacker’s background knowledge is always correct, i.e. the user can never be at a cell where the attacker’s prior probability is zero.
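The three priors can be sketched as normalized probability grids. The following is a minimal illustration, not the implementation used in the evaluation. It assumes that the grid center lies at coordinate (n - 1) / 2 on each axis, that the circle radius is measured between cell centers, and that "variance of 50 cells" means an isotropic per-axis variance of 50; the function names are hypothetical.

```python
import math


def uniform_global(n):
    """Equal probability over all n x n cells."""
    p = 1.0 / (n * n)
    return [[p] * n for _ in range(n)]


def uniform_local(n, radius):
    """Equal probability inside a circle around the grid center, zero elsewhere."""
    c = (n - 1) / 2.0  # assumed center coordinate
    inside = [[(r - c) ** 2 + (col - c) ** 2 <= radius ** 2 for col in range(n)]
              for r in range(n)]
    count = sum(sum(row) for row in inside)  # number of cells in the circle
    return [[1.0 / count if flag else 0.0 for flag in row] for row in inside]


def gaussian(n, variance):
    """Isotropic normal weights around the grid center, normalized to sum to 1."""
    c = (n - 1) / 2.0  # assumed center coordinate
    weights = [[math.exp(-((r - c) ** 2 + (col - c) ** 2) / (2.0 * variance))
                for col in range(n)]
               for r in range(n)]
    total = sum(sum(row) for row in weights)
    return [[w / total for w in row] for row in weights]


# Priors for the 320 x 320 grid described in the text.
prior_global = uniform_global(320)
prior_local = uniform_local(320, 25)
prior_gauss = gaussian(320, 50.0)
```

Each grid sums to one, and the uniform-local prior is exactly zero outside the circle, which matches the assumption that the user is never at a cell with zero prior probability.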
Figure 6.4: Uniform-local (right) and gaussian (left) background knowledge. Darker cells imply higher probability. Map labels: 34.0522° N 118.2428° W; 5000 meters.
Figure 6.5 shows the expected privacy achieved for different coarse grid sizes. The data points in each line correspond to Ẑ = 80, 40, 20, 10, 8 and 5, from left to right. We choose three different keywords, starbucks coffee (92 POIs), gas station (347 POIs) and bakery (834 POIs), to evaluate the quantities in the case of low-, medium- and high-density POIs.
The empirical results show that the expected exact privacy for all three POI densities is well above levels of concern, greater than 90% in this case. This implies that it will be difficult for the adversary to accurately “guess” the user’s location at random using the posterior distribution, even if the adversary has precise information (Gaussian knowledge adversary) on the whereabouts of the user.
The expected privacy under inexact localization depends primarily on the extent of background knowledge. As expected, the uncertainty about the user’s location is significantly less when the adversary has more precise knowledge. Larger values of Ẑ help improve the expected privacy to some extent. Note that, for lower values of Ẑ (larger sub-grids), the privacy level we observe (high or low) is primarily due to the prior knowledge of the adversary. Larger sub-grids will encompass most of the locality where the adversary’s prior knowledge is concentrated. Since the convergence requirement is enforced, the expected privacy value from the posterior distribution will be the same as that from the prior distribution.
The expected interest set size shows some variation across different knowledge forms and POI densities. In general, smaller values of Ẑ result in more cells in a region; therefore, a larger number of top-K sets are merged to create the interest set. The denser the POI, the higher the number of unique top-K sets. The set sizes are larger for the uniform-local and gaussian knowledge adversaries: since the user is highly likely to be located in central downtown, the variation in the top-K POIs is also expected to be the greatest (due to the higher concentration of POIs).
[Figure 6.5 plots: expected privacy (inexact) against expected size of interest set for the uniform-global, uniform-local and gaussian adversaries, with one line each for starbucks coffee, gas station and bakery; a fourth plot shows expected privacy (exact), on a scale from 0.94 to 1.00, for the same three adversaries.]
Figure 6.5: Expected privacy and expected interest set size trade-off. Data points in each line correspond to Ẑ = 80, 40, 20, 10, 8 and 5, from left to right.
The solid data points in the first three plots signify the case of Ẑ = 10 (output regions of the R function are 32 × 32 cells). Irrespective of the general trends, use of this value results in expected privacy levels of at least 2 km² (200 cells) and interest sets of around 30. An area of 2 km² is equivalent to around 1000 homes with 22,000 ft² lots, which we consider to be a significantly large area for a privacy-conscious user. The expected interest set size is larger than what is necessary, but may prove to be useful if the user does not find an acceptable choice in the top-10 results. Retrieval of detailed feature data for 30 POIs is also not expensive considering that most current applications already retrieve more than that (Google Places allows retrieval of data on up to 60 POIs in a query).
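The arithmetic behind these figures can be checked with a short sketch. Two values are assumptions inferred from the text rather than stated in it: a cell area of 0.01 km² (from "2 km² (200 cells)") and a sub-grid side of 320 / Ẑ cells (from Ẑ = 10 giving 32 × 32 cells). The helper names are hypothetical.

```python
GRID_SIDE = 320               # cells per side of the grid (from the text)
CELL_AREA_KM2 = 0.01          # assumption: inferred from "2 km^2 (200 cells)"
SQFT_PER_KM2 = 10_763_910.4   # square feet in one square kilometre


def subgrid_side(z_hat, grid_side=GRID_SIDE):
    """Side length, in cells, of one sub-grid when each axis is split z_hat ways."""
    if grid_side % z_hat != 0:
        raise ValueError("z_hat must divide the grid side evenly")
    return grid_side // z_hat


def cells_to_km2(cells):
    """Area covered by a number of cells, under the assumed cell area."""
    return cells * CELL_AREA_KM2


def homes_in_area(km2, lot_sqft):
    """Number of lots of the given size that fit in the area, ignoring roads."""
    return km2 * SQFT_PER_KM2 / lot_sqft


sides = [subgrid_side(z) for z in (80, 40, 20, 10, 8, 5)]
area = cells_to_km2(200)
homes = homes_in_area(area, 22_000)
```

Under these assumptions, 200 cells cover 2 km², which holds roughly 980 lots of 22,000 ft², consistent with the "around 1000 homes" estimate.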
6.6 Summary
In this chapter, we proposed our first algorithm that fits into the two-roundtrip architecture. We used a realistic ranking method that considers the prominence of the POIs in addition to the distance from the query point. We brought together several techniques to ensure that the LPPM is efficient enough for a client-side implementation. We implemented these algorithms on realistic simulators and actual mobile devices to ensure that the drop in quality of service is minor.
Chapter 7
Multiple Query Scenario
Pursuing our model from the previous chapter, the user is located in box B_u, which consists of b × b cells, and the interest set I_u is the union of the top-K POIs of each cell in B_u. As long as multiple queries by the user happen while the user is in the same box, the attacker’s knowledge of the user’s location will not be enhanced: the coarseness of the attacker’s estimate of the user’s location remains b × b cells. Consider the case where the user moves from one box to another between two consecutive queries, and the time interval between these two queries is larger than the time needed by the user to reach the farthest cell in AR from the current cell. In this case, the attacker is still not able to enhance his knowledge about the user’s location, because there is enough time for the user to move to any cell in the grid G. The work in this chapter is based on our outcome in [58,60].
Now we consider the converse scenario, where the time interval between the two queries is less than the time needed by the user to go from the current cell to the farthest possible cell in grid G. In this case, the attacker may be able to narrow down the user’s location to an area smaller than b × b cells. Figure 7.1 illustrates this with two adjacent boxes A and B, where b = 4. Let T be the time period needed by the user to move by one cell, assumed to be constant. Assume that the time interval between two queries, t, is less than or equal to the time required to move by one cell (0 < t ≤ T). Let us further assume that the user was in one of the cells in box A that border box B. The user moves by one cell into box B and then issues her second query. This allows the attacker to narrow down the user’s location to the boundary cells (shaded) of box B: he sees that the first query’s interest set matches box A and the second query’s interest set matches box B, and the user had only enough time to move by one cell. The attacker is thus able to narrow down the user’s location to an area much smaller than b × b, clearly breaching the user’s privacy requirement.
This shows that obfuscation is not guaranteed when multiple queries happen across boxes, which is the main problem this chapter addresses. The techniques proposed for the single query scenario need to be enhanced to preserve the privacy of the user at the expected b × b level in the case of multiple queries.
Figure 7.1: Location inference during multiple queries.
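The inference described above can be reproduced with a small sketch. It is an illustration under stated assumptions, not the attacker model defined in this work: cells are (row, column) pairs, the user moves to a 4-connected neighbour (or stays) in each period T, and the attacker knows only the two matched boxes and the number of moves that fit in the interval. All function names are hypothetical.

```python
def box_cells(top, left, b):
    """Set of (row, col) cells in a b x b box whose top-left cell is (top, left)."""
    return {(r, c) for r in range(top, top + b) for c in range(left, left + b)}


def reachable(cells, steps):
    """Cells reachable from any start cell in at most `steps` 4-neighbour moves."""
    seen = set(cells)
    frontier = set(cells)
    for _ in range(steps):
        frontier = {(r + dr, c + dc)
                    for (r, c) in frontier
                    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))} - seen
        seen |= frontier
    return seen


def feasible_cells(first_box, second_box, steps):
    """Cells of the second box the user can occupy, given the first box and time."""
    return reachable(first_box, steps) & second_box


# Two horizontally adjacent boxes with b = 4, as in the example of Figure 7.1.
box_a = box_cells(0, 0, 4)
box_b = box_cells(0, 4, 4)
one_move = feasible_cells(box_a, box_b, 1)   # interval allows a single move
four_moves = feasible_cells(box_a, box_b, 4)  # interval allows four moves
```

With a single move, only the 4 cells of box B that border box A remain feasible, instead of all 16; with four moves, every cell of box B is reachable again and the attacker learns nothing beyond the box.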