In this section, we present our solution for authenticating kNN queries in the native space. With the VR-tree, the verification process involves two steps: a) we need to verify that none of the valid partitions have been missed; b) for partitions that should be checked, none of the valid points have been missed.
Verifying that the query answer covers all the candidate partitions is straightforward for a known hyper-sphere. It comprises the following two phases:
• In the first phase, we need to identify the list of candidate partitions. A partition is
a candidate if the range of the ordering dimension of its MBR overlaps the range of the ordering dimension of the tightest hyper-cube that bounds the hyper-sphere. In our example, only partitions R1, R2 and R3 are candidate partitions.
• In the second phase, some of these candidate partitions can be further pruned. The
ones to be pruned are those whose minimum distance to Pcis larger than the hyper-
sphere radius r (since no points inside these partitions will ever be nearer than the
kth NN answer point. This can be easily verified since the bounding points of each
partition are known. In our example, R1 and R3 are further pruned.
For each remaining candidate partition P , there are 3 possible relationships between
P and the hyper-sphere (denoted H(Pc, r)) centered at Pcwith radius r.
1. H(Pc, r) contains P . In this case, we return all the points in P , i.e., the pub-
lisher returns p0 to pn+1 and n, together with the respective signatures sig(P0) to
sig(Pn+1) and sig(P ). The user would compute the digests for both the points and
the partition to verify the result.
2. P contains H(Pc, r). Let Pi = (xi1, xi2, ..., xid), and Pc = (o1, o2, ..., od). Let
44
also centered at point Pc, and the length of each edge l = 2r. Let H′(Pc, r)’s
bounding points be (hl1, hl2, ..., hld) and (hu1, hu2, ..., hud). Thus, hui = oi+ r and
hli = oi− r for all i ∈ [1, d]. The data points in P can be separated into
(a) pα, pα+1, ..., pβ−1, pβ, such that xi1 ∈ [hl1, hu1] for α≤ i ≤ β.
These points can be further categorized into answer points (A) and false positives (F). For each answer point pi ∈ A, dist(Pc, Pi) ≤ r, and for
each false positive pi ∈ F, dist(Pc, Pi) > r. Furthermore, there are two
types of false positive points. In the first type, denoted Fa, for each pi ∈
Fa, ∃z, xiz ∈ [h/ lz, huz]. In the second type, denotedFb, for each pi ∈ Fb,
∀z, xiz ∈ [hlz, huz]. Note that Fa corresponds to points outside the hyper-
cube, whileFb are points inside the hyper-cube but outside the hyper-sphere.
Let us use the data space in Figure 4.1 as an example of a partition containing the hyper-sphere. Here, we haveA = {r5, r8, r9}, Fa = {r6, r7} and Fb =
{r4}.
(b) p1, ...pα−1, pβ+1, ...pk, which are clearly not answer points. Referring to Fig-
ure 4.1, these points are r1to r3and r10to r20.
For data points from different categories, the publisher returns different sets of verification objects.
(a) For each point pi ∈ A, the publisher returns pi and sig(pi).
(b) The publisher also returns p0, pn+1, sig(p0) and sig(pn+1), and sig(P ).
(c) For each point pi ∈ Fa∪ Fb∪ {pα−1, pβ+1}, the publisher finds a reference
point S = (S1, S2, ..., Sd) on the surface of the hyper-sphere1, such that, if
xiz < oz, Sz ∈ (xiz, oz), else if xiz > oz, Sz ∈ (oz, xiz).
1We do not require the point to be on the surface. All that is needed is to find a point that is outside
of the hypersphere that is closer to the query point than the point to be hidden. However, for ease of presentation, we shall refer to the reference point as a point on the surface.
We note that the same S point could be used as a reference point for multiple
pis as long as the above conditions hold. For simplicity, we pick the point
closest to the sphere’s surface on the line joining Pc and pi. Among these
points, we then eliminate “redundant” reference points.
After an S point is chosen for each pi ∈ Fb, we could simply verify that
dist(Pc, pi) > dist(Pc, S)≥ r.
The publisher then returns several pieces of information together with the detailed information of point S:
i. if xiz < Sz, hSz−xiz−1(xiz) and hxiz−Lz−1(xiz) are returned.
ii. if xiz > Sz, hUz−xiz−1(xiz) and hxiz−Sz−1(xiz) are returned.
With the above information, the user can compute g(pi) without knowing the ac-
tual value of pi.
• if xiz < Sz, the user applies h on hSz−xiz−1(xiz) an additional (Uz − Sz)
times to get hUz−xiz−1(x
iz).
• if xiz > Sz, the user applies h on hxiz−Sz−1(xiz) an additional (Sz − Lz)
times to get hxiz−Lz−1(x
iz).
• The user computes g(pi) using Equation 3.2.
Consider Figure 4.1 again as our example where P contains H(Pc, r). We could
see that the point r7is outside the hyper-cube, which means that r7is not an answer.
Instead of just returning the value of r7, the publisher picks a reference point W
near the circle, where W.x > r7.x and W.y < r7.y. Then (part of the information)
the server returns: for query answers{r8, r9}, it returns r8, r9, sig(r8), and sig(r9);
46
and hr7.y−W.y−1(r
7.y). Here, L and U denote the two bounding points of the par-
tition. With these, the user can determine hU.x−r7.x−1(r
7.x) and hr7.y−L.y−1(r7.y),
and compute the digest of r7. (S)he can then further verify that r8 is an answer
point.
3. P overlaps H(Pc, r). This case can be handled by splitting P into two parts: one
overlaps H′(Pc, r) (the hyper-cube of H(Pc, r)), and the other does not overlap
H′(Pc, r) (which means it does not overlap H(Pc, r)). For the first part, we handle
it in the same manner as case (2) above. For the second part, it can be dropped (except to verify that its points are outside H′(Pc, r)). As such, we shall not go
into the details of this case.
In the above discussion, we have assumed only one layer of partitioning. We can easily extend the scheme to work with the VR-tree. All that is needed is to verify that no internal nodes are tampered with and dropped unnecessarily. This can be done as described above since the internal nodes are also signature chained.