kNN Authentication in Native Space - Query Authentication and processing on outsourced database

In this section, we present our solution for authenticating kNN queries in the native space. With the VR-tree, the verification process involves two steps: a) we need to verify that none of the valid partitions have been missed; b) for partitions that should be checked, none of the valid points have been missed.

Verifying that the query answer covers all the candidate partitions is straightforward for a known hyper-sphere. It comprises the following two phases:

• In the first phase, we need to identify the list of candidate partitions. A partition is

a candidate if the range of the ordering dimension of its MBR overlaps the range of the ordering dimension of the tightest hyper-cube that bounds the hyper-sphere. In our example, only partitions R1, R2 and R3 are candidate partitions.

• In the second phase, some of these candidate partitions can be further pruned. The

ones to be pruned are those whose minimum distance to Pcis larger than the hyper-

sphere radius r (since no points inside these partitions will ever be nearer than the

kth NN answer point. This can be easily verified since the bounding points of each

partition are known. In our example, R1 and R3 are further pruned.

For each remaining candidate partition P , there are 3 possible relationships between

P and the hyper-sphere (denoted H(Pc, r)) centered at Pcwith radius r.

1. H(Pc, r) contains P . In this case, we return all the points in P , i.e., the pub-

lisher returns p0 to pn+1 and n, together with the respective signatures sig(P0) to

sig(Pn+1) and sig(P ). The user would compute the digests for both the points and

the partition to verify the result.

2. P contains H(Pc, r). Let Pi = (xi1, xi2, ..., xid), and Pc = (o1, o2, ..., od). Let

also centered at point Pc, and the length of each edge l = 2r. Let H′(Pc, r)’s

bounding points be (hl1, hl2, ..., hld) and (hu1, hu2, ..., hud). Thus, hui = oi+ r and

hli = oi− r for all i ∈ [1, d]. The data points in P can be separated into

(a) pα, pα+1, ..., pβ−1, pβ, such that xi1 ∈ [hl1, hu1] for α≤ i ≤ β.

These points can be further categorized into answer points (A) and false positives (F). For each answer point pi ∈ A, dist(Pc, Pi) ≤ r, and for

each false positive pi ∈ F, dist(Pc, Pi) > r. Furthermore, there are two

types of false positive points. In the first type, denoted Fa, for each pi ∈

Fa, ∃z, xiz ∈ [h/ lz, huz]. In the second type, denotedFb, for each pi ∈ Fb,

∀z, xiz ∈ [hlz, huz]. Note that Fa corresponds to points outside the hyper-

cube, whileFb are points inside the hyper-cube but outside the hyper-sphere.

Let us use the data space in Figure 4.1 as an example of a partition containing the hyper-sphere. Here, we haveA = {r5, r8, r9}, Fa = {r6, r7} and Fb =

{r4}.

(b) p1, ...pα−1, pβ+1, ...pk, which are clearly not answer points. Referring to Fig-

ure 4.1, these points are r1to r3and r10to r20.

For data points from different categories, the publisher returns different sets of verification objects.

(a) For each point pi ∈ A, the publisher returns pi and sig(pi).

(b) The publisher also returns p0, pn+1, sig(p0) and sig(pn+1), and sig(P ).

(c) For each point pi ∈ Fa∪ Fb∪ {pα−1, pβ+1}, the publisher finds a reference

point S = (S1, S2, ..., Sd) on the surface of the hyper-sphere1, such that, if

xiz < oz, Sz ∈ (xiz, oz), else if xiz > oz, Sz ∈ (oz, xiz).

1_{We do not require the point to be on the surface. All that is needed is to find a point that is outside}

of the hypersphere that is closer to the query point than the point to be hidden. However, for ease of presentation, we shall refer to the reference point as a point on the surface.

We note that the same S point could be used as a reference point for multiple

pis as long as the above conditions hold. For simplicity, we pick the point

closest to the sphere’s surface on the line joining Pc and pi. Among these

points, we then eliminate “redundant” reference points.

After an S point is chosen for each pi ∈ Fb, we could simply verify that

dist(Pc, pi) > dist(Pc, S)≥ r.

The publisher then returns several pieces of information together with the detailed information of point S:

i. if xiz < Sz, hSz−xiz−1(xiz) and hxiz−Lz−1(xiz) are returned.

ii. if xiz > Sz, hUz−xiz−1(xiz) and hxiz−Sz−1(xiz) are returned.

With the above information, the user can compute g(pi) without knowing the ac-

tual value of pi.

• if xiz < Sz, the user applies h on hSz−xiz−1(xiz) an additional (Uz − Sz)

times to get hUz−xiz−1_(x

iz).

• if xiz > Sz, the user applies h on hxiz−Sz−1(xiz) an additional (Sz − Lz)

times to get hxiz−Lz−1_(x

iz).

• The user computes g(pi) using Equation 3.2.

Consider Figure 4.1 again as our example where P contains H(Pc, r). We could

see that the point r7is outside the hyper-cube, which means that r7is not an answer.

Instead of just returning the value of r7, the publisher picks a reference point W

near the circle, where W.x > r7.x and W.y < r7.y. Then (part of the information)

the server returns: for query answers{r8, r9}, it returns r8, r9, sig(r8), and sig(r9);

and hr7.y−W.y−1_(r

7.y). Here, L and U denote the two bounding points of the par-

tition. With these, the user can determine hU.x−r7.x−1_(r

7.x) and hr7.y−L.y−1(r7.y),

and compute the digest of r7. (S)he can then further verify that r8 is an answer

point.

3. P overlaps H(Pc, r). This case can be handled by splitting P into two parts: one

overlaps H′(Pc, r) (the hyper-cube of H(Pc, r)), and the other does not overlap

H′(Pc, r) (which means it does not overlap H(Pc, r)). For the first part, we handle

it in the same manner as case (2) above. For the second part, it can be dropped (except to verify that its points are outside H′(Pc, r)). As such, we shall not go

into the details of this case.

In the above discussion, we have assumed only one layer of partitioning. We can easily extend the scheme to work with the VR-tree. All that is needed is to verify that no internal nodes are tampered with and dropped unnecessarily. This can be done as described above since the internal nodes are also signature chained.

In document Query Authentication and processing on outsourced databases (Page 51-54)