The privacy-preserving algorithm in this context can be abstracted as the composition of two functions, $A(R(\cdot))$, where $R$ maps a cell to a region (a set of cells, i.e., an element of the power set of $C$) and $A$ maps that region to a (sub)set of the matching POIs, i.e., $R: C \to 2^C$ and $A: 2^C \to 2^P$. The domain of $A$ is determined by the range of $R$, denoted $C_1, C_2, \ldots, C_m \in 2^C$. Correspondingly, let $I_1, I_2, \ldots, I_m$ be the POI (sub)sets to which $A$ maps these regions. Without loss of generality, let $I_u \in \{I_1, I_2, \ldots, I_m\}$ be the interest set generated for a user located in region $C_u$.
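As a minimal sketch of the composition $A(R(\cdot))$, the Python code below assumes a hypothetical grid of cells indexed by (row, col), a deterministic $R$ that partitions the grid into fixed square blocks, and a hypothetical $A$ that returns the POIs located inside the region. None of these choices (nor the names `make_R`, `make_A`, or the toy POIs) come from the protocol itself; they only illustrate the types $R: C \to 2^C$ and $A: 2^C \to 2^P$.

```python
from typing import Callable, Dict, FrozenSet, Tuple

Cell = Tuple[int, int]      # hypothetical (row, col) index of a grid cell
Region = FrozenSet[Cell]    # an element of 2^C
POISet = FrozenSet[str]     # an element of 2^P


def make_R(block: int) -> Callable[[Cell], Region]:
    """Hypothetical R: map a cell to the block x block square of cells containing it."""
    def R(cell: Cell) -> Region:
        r0 = (cell[0] // block) * block
        c0 = (cell[1] // block) * block
        return frozenset((r, c) for r in range(r0, r0 + block)
                         for c in range(c0, c0 + block))
    return R


def make_A(poi_locations: Dict[str, Cell]) -> Callable[[Region], POISet]:
    """Hypothetical A: return the POIs whose cell lies inside the region."""
    def A(region: Region) -> POISet:
        return frozenset(p for p, cell in poi_locations.items() if cell in region)
    return A


# Toy 4x4 grid split into 2x2 blocks, with three POIs.
pois = {"cafe": (0, 1), "museum": (3, 3), "park": (2, 0)}
R = make_R(block=2)
A = make_A(pois)

user_cell = (1, 0)
C_u = R(user_cell)   # region reported for the user
I_u = A(C_u)         # interest set returned for that region
print(sorted(C_u), sorted(I_u))
```

In this toy setup the four regions yield four distinct interest sets, so observing $I_u$ identifies $C_u$, which corresponds to the case considered next.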
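Consider the case where $I_u \neq I_t$ for all $t = 1, \ldots, u-1, u+1, \ldots, m$. Given $I_u$, the attacker can determine $C_u$. Let $\pi_0$ denote the attacker's prior distribution over cells and $\pi_1$ the posterior. For all cells $c_j \notin C_u$ we have $\pi_1(c_j) = 0$, and for all cells $c_i \in C_u$ the posterior probability is

$$\pi_1(c_i) = \frac{\Pr(I_u \mid c_i)\,\pi_0(c_i)}{\Pr(I_u)} = \frac{\Pr(I_u \mid c_i)\,\pi_0(c_i)}{\sum_{c_j \in C_u} \Pr(I_u \mid c_j)\,\pi_0(c_j)}. \qquad (4.3)$$

4.8.1 Assessments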
We use a set of metrics to assess both the privacy and the QoS of the LPPM. Each metric is explained in detail below.
Obfuscation
This technique has been applied to protect location privacy in LBS in various other works (Section 2.3). Rather than revealing the user's exact location when requesting the service, it hides that location within a geographical area that contains it. The user can thus still obtain information related to her location while preserving her privacy. To achieve obfuscation, the region $C_u$ should contain a "large" number of cells (including the user's own cell) with positive probability. This number may be specified as part of the user's privacy policy. However, the user's requirement cannot be met if the adversary has prior knowledge that allows him to narrow the user's location down to a smaller area. Hence, the adversary's knowledge must be stated precisely, for example as the prior distribution $\pi_0$.
The adversary can sample one cell at a time (without replacement) according to the distribution $\pi$. The expected number of cells the adversary samples before reaching the user's actual location defines an obfuscation area for the user. We call this the expected inexact privacy metric. It can be computed with the following closed-form expression:
$$\text{expected\_inexact}(\pi, c_u) = \sum_{c \neq c_u} \frac{\pi(c)}{\pi(c) + \pi(c_u)}. \qquad (4.4)$$
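The hypothetical Python sketch below computes Equation 4.4 and checks it against a direct simulation of the sampling process described above. It assumes $\pi$ is given as a dictionary from cell identifiers to probabilities and that $\pi(c_u) > 0$; the function names are illustrative. The closed form holds because, when cells are drawn without replacement with probability proportional to $\pi$, a cell $c$ is drawn before $c_u$ with probability $\pi(c)/(\pi(c) + \pi(c_u))$, and linearity of expectation sums these indicator probabilities.

```python
import random
from typing import Dict, Hashable


def expected_inexact(pi: Dict[Hashable, float], c_u: Hashable) -> float:
    """Closed form of Equation 4.4 (requires pi[c_u] > 0)."""
    return sum(p / (p + pi[c_u]) for c, p in pi.items() if c != c_u)


def simulate_inexact(pi: Dict[Hashable, float], c_u: Hashable,
                     trials: int = 20000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same quantity: draw cells without replacement,
    each draw proportional to pi over the cells not yet drawn, and count how many
    cells are drawn before c_u."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        # Zero-probability cells can never be drawn, so drop them up front.
        remaining = {c: p for c, p in pi.items() if p > 0}
        drawn = 0
        while True:
            cells = list(remaining)
            c = rng.choices(cells, weights=[remaining[x] for x in cells])[0]
            if c == c_u:
                break
            drawn += 1
            del remaining[c]
        total += drawn
    return total / trials


pi = {"a": 0.5, "b": 0.3, "c": 0.2}
print(expected_inexact(pi, "b"), simulate_inexact(pi, "b"))
```

For this toy distribution the closed form gives $0.5/0.8 + 0.2/0.5 = 1.025$, and the simulated estimate lands close to that value.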
Expected inexact privacy can be viewed as the smallest obfuscation area one can expect if the attacker succeeds in learning an approximate presence area through this sampling method. Multiplying the metric's value by the area of one cell gives that area; we refer to this measure as the areal privacy metric. The expected exact privacy, on the other hand, reflects the probability of not reaching the user's location in a single attempt. The expected estimation error of the adversary measures the average distance between the user's true location and the location estimated by the adversary [143]. When $\pi$ is a uniform distribution over a subset $C$ of cells, the expected inexact privacy equals $(|C| - 1)/2$, since each of the $|C| - 1$ nonzero terms in Equation 4.4 equals $1/2$. For example, with $|C| = 1024$ cells the expected inexact privacy is 511.5 cells. If the cell area is $0.01\,\text{km}^2$, the user's areal privacy is $5.115\,\text{km}^2$.
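A minimal sketch of the remaining metrics follows. Areal privacy is computed as the text defines it, expected inexact privacy times the cell area. The other two formulas are our assumed readings, not formulas stated in this section: expected exact privacy as the probability that one draw from $\pi$ misses $c_u$, i.e. $1 - \pi(c_u)$, and expected estimation error as $\sum_c \pi(c)\, d(c, c_u)$ with Euclidean $d$ over hypothetical (row, col) cell coordinates.

```python
import math
from typing import Dict, Tuple

Cell = Tuple[int, int]  # hypothetical (row, col) cell coordinates


def expected_inexact(pi: Dict[Cell, float], c_u: Cell) -> float:
    # Equation 4.4 (requires pi[c_u] > 0).
    return sum(p / (p + pi[c_u]) for c, p in pi.items() if c != c_u)


def areal_privacy(pi: Dict[Cell, float], c_u: Cell, cell_area_km2: float) -> float:
    # Expected inexact privacy (in cells) times the area of one cell.
    return expected_inexact(pi, c_u) * cell_area_km2


def expected_exact(pi: Dict[Cell, float], c_u: Cell) -> float:
    # Assumed reading: probability that a single draw from pi misses c_u.
    return 1.0 - pi.get(c_u, 0.0)


def expected_estimation_error(pi: Dict[Cell, float], c_u: Cell,
                              cell_size_km: float = 1.0) -> float:
    # Assumed reading: mean Euclidean distance between an estimate drawn from pi and c_u.
    return sum(p * math.dist(c, c_u) * cell_size_km for c, p in pi.items())


# Worked example from the text: uniform pi over 1024 cells, 0.01 km^2 per cell.
cells = [(r, c) for r in range(32) for c in range(32)]
pi = {cell: 1 / len(cells) for cell in cells}
c_u = (10, 10)
print(expected_inexact(pi, c_u))        # 511.5
print(areal_privacy(pi, c_u, 0.01))     # about 5.115 (km^2)
```

The uniform case reproduces the worked example: 1023 terms of $1/2$ give 511.5 cells, or about $5.115\,\text{km}^2$.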
Convergence
Given the attacker's prior probability distribution, the user may be more likely to be in some cells of the region $C_u$ than in others. Moreover, disclosing the interest set $I_u$, which is part of executing the protocol proposed in Section 4.3, gives the attacker additional information about the user's current position: he can now build a posterior probability distribution from $I_u$ and his prior (Equation 4.3). However, if the attacker's posterior distribution over $C_u$ is proportional to his prior, then $I_u$ contributes nothing to the attacker's knowledge of the user's new location within the boundaries of $C_u$. We refer to this as the convergence property of the algorithm. In Equation 4.3, convergence is achieved if $\Pr(I_u \mid c_i)$ is the same constant for every $c_i \in C_u$; that is, the probability of producing the output $I_u$ must be equal for all cells in $C_u$.
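To make the convergence condition concrete, the hypothetical Python sketch below computes the posterior of Equation 4.3 from a prior $\pi_0$, a region $C_u$, and a likelihood function standing in for $\Pr(I_u \mid c)$, then checks whether the posterior on $C_u$ equals the prior renormalized over $C_u$. The dictionary representation and the names `posterior` and `converges` are illustrative assumptions; the code assumes $C_u$ is a subset of the prior's cells and that $\Pr(I_u) > 0$.

```python
from typing import Callable, Dict, Hashable, Iterable


def posterior(prior: Dict[Hashable, float], region: Iterable[Hashable],
              likelihood: Callable[[Hashable], float]) -> Dict[Hashable, float]:
    """Equation 4.3: pi_1(c) is proportional to Pr(I_u | c) * pi_0(c) on C_u, 0 elsewhere."""
    region = set(region)
    weights = {c: likelihood(c) * prior[c] for c in region}
    z = sum(weights.values())  # Pr(I_u), assumed > 0
    return {c: (weights[c] / z if c in region else 0.0) for c in prior}


def converges(prior: Dict[Hashable, float], region: Iterable[Hashable],
              likelihood: Callable[[Hashable], float], tol: float = 1e-12) -> bool:
    """True if the posterior over C_u equals the prior renormalized over C_u."""
    region = set(region)
    post = posterior(prior, region, likelihood)
    mass = sum(prior[c] for c in region)
    return all(abs(post[c] - prior[c] / mass) <= tol for c in region)


prior = {"a": 0.2, "b": 0.3, "c": 0.1, "d": 0.4}
C_u = {"a", "b", "c"}
print(converges(prior, C_u, lambda c: 0.7))                        # True
print(converges(prior, C_u, {"a": 0.9, "b": 0.1, "c": 0.5}.get))   # False
```

A constant likelihood on $C_u$ cancels in the normalization, so the posterior is just the renormalized prior; any cell-dependent likelihood breaks that proportionality.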
The convergence condition is hard to satisfy when the user issues multiple queries. To measure how much an LPPM improves the attacker's knowledge of the user's new location, we therefore use a metric called nearness privacy. This metric is based on the adversary's best guess under the distribution $\pi$. Given a probability distribution $\pi$ over the cells in the grid and the user's current location $c_u$, nearness privacy is computed as
$$\text{nearness}(\pi, c_u) = \text{distance}\left(\arg\max_{c} \pi(c),\; c_u\right). \qquad (4.5)$$
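We use Euclidean distance in our evaluation. If multiple cells share the highest probability, we pick the one closest to $c_u$ as the adversary's guess. Note that when $\pi$ is a uniform distribution over a set of cells containing $c_u$, this rule selects $c_u$ itself, so the nearness privacy is 0.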
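A minimal sketch of Equation 4.5 with the tie-breaking rule above, assuming $\pi$ is a dictionary over hypothetical (row, col) cell coordinates and that distances are measured in cell units scaled by an assumed `cell_size`. The small tolerance `tol`, also our choice, treats floating-point near-ties as ties.

```python
import math
from typing import Dict, Tuple

Cell = Tuple[int, int]  # hypothetical (row, col) cell coordinates


def nearness(pi: Dict[Cell, float], c_u: Cell, cell_size: float = 1.0,
             tol: float = 1e-12) -> float:
    """Equation 4.5: Euclidean distance from the adversary's best guess to c_u.

    Cells tied for the highest probability are resolved by choosing the one
    closest to c_u, as described in the text.
    """
    best = max(pi.values())
    candidates = [c for c, p in pi.items() if p >= best - tol]
    guess = min(candidates, key=lambda c: math.dist(c, c_u))
    return math.dist(guess, c_u) * cell_size


print(nearness({(0, 0): 0.4, (0, 3): 0.4, (2, 2): 0.2}, (1, 1)))
print(nearness({(r, c): 1 / 9 for r in range(3) for c in range(3)}, (1, 1)))
```

In the first call (0, 0) and (0, 3) tie, and the closer one, (0, 0), is chosen, giving $\sqrt{2}$; in the uniform case the guess is $c_u$ itself and the result is 0.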
The quality of service
To achieve a high level of obfuscation, the region $C_u$ must contain as many cells as possible; the trivial solution is to always use the entire grid $G$ in the query of the second round trip. However, this may degrade the QoS by increasing the expected communication overhead (Steps 3 and 4 in Figure 4.3). Therefore, the privacy algorithm must choose the smallest possible set of POIs that covers the region $C_u$. Ideally, each of the sets $I_1, I_2, \ldots, I_m$ should contain only $K$ elements.
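One simple way to quantify the overhead discussed above, offered only as a hypothetical sketch, is to compare each interest set size $|I_t|$ with $K$. The function below, whose name `excess_pois` and averaging rule are our assumptions rather than a metric defined in this section, reports the mean number of POIs returned beyond $K$, optionally weighting each region by how likely it is to be used.

```python
from typing import Optional, Sequence, Set


def excess_pois(interest_sets: Sequence[Set[str]], K: int,
                weights: Optional[Sequence[float]] = None) -> float:
    """Hypothetical QoS overhead: weighted mean of max(|I_t| - K, 0) over I_1..I_m.

    With weights=None every region counts equally. A value of 0 means every set
    holds at most K POIs, the ideal case described in the text.
    """
    if weights is None:
        weights = [1.0] * len(interest_sets)
    total = sum(weights)
    return sum(w * max(len(I) - K, 0) for w, I in zip(weights, interest_sets)) / total


sets = [{"a", "b", "c"}, {"a", "b", "c", "d", "e"}, {"x", "y", "z"}]
print(excess_pois(sets, K=3))
print(excess_pois(sets, K=3, weights=[0.5, 0.25, 0.25]))
```

Here only the second set exceeds $K = 3$, by two POIs, so the unweighted overhead is $2/3$ and the weighted one is $0.5$.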