

6.2. Proof of Correctness for Hypergraph Case

The method given in (Bar-Yehuda, 2001) applies only to ordinary graphs, in which every edge connects exactly two nodes. We next prove that the method also applies to hypergraphs. For a hyperedge e = {n1, n2, …, nk}, let u be the first node and v be the last node according to an ordering π.

To prove that the value computed by formula (2) equals the hypergraph version of la(G), and hence DBW, it suffices to prove that for any hyperedge e the cost computed by formula (2) equals w(e)*|π(u) − π(v)|.

Let t0 be the least common ancestor of all nodes of e in the decomposition tree. Then all the nodes involved in computing the costs regarding e are within t0, so we do not need to examine the nodes outside t0 in the proof. In Fig. 6-4, let x be the right-most node in t0's left child (t̂1) and y be the left-most node in t0's right child (t̂2). Clearly π(y) − π(x) = 1. Suppose the nodes on the path from t1 to u are L1, …, Lk−2, Lk−1, Lk, and the sizes of the right sub-trees of the trees rooted at L1, …, Lk−2, Lk−1, Lk are p1, …, pk−2, pk−1, pk, respectively. Then p1 + p2 + … + pk = π(x) − π(u), since together they count the nodes between node u and node x. Note that L1 is the root node of t1.


Fig. 6-4. The BDT Structure in an Ordering Sequence for a Hyperedge
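The counting step p1 + … + pk = π(x) − π(u) can be checked on a small tree. The Python sketch below writes a BDT as nested 2-tuples with integer leaves; that representation and the helper names are illustrative assumptions. It also assumes a reading of p_i that the sum requires: p_i is the size of the right sub-tree at a level where the path to u descends into the left child, and 0 at a level where it descends to the right (nothing at that level lies to the right of u).

```python
from typing import List, Union

# A BDT is either a leaf (an int data-item ID) or a pair (left, right).
Tree = Union[int, tuple]


def leaves(t: Tree) -> List[int]:
    """Leaf IDs of t from left to right, i.e. the ordering the tree induces."""
    if isinstance(t, int):
        return [t]
    return leaves(t[0]) + leaves(t[1])


def right_sizes_on_path(t: Tree, u: int) -> List[int]:
    """Walk from the root of t down to leaf u and collect p_i for every
    internal node on the path: the size of its right sub-tree when u lies in
    the left sub-tree, otherwise 0."""
    ps = []
    while not isinstance(t, int):
        left, right = t
        if u in leaves(left):
            ps.append(len(leaves(right)))
            t = left
        else:
            ps.append(0)
            t = right
    return ps


# Example t1 whose leaves appear in the order 3, 7, 1, 4, 9; u = 7.
t1 = ((3, 7), ((1, 4), 9))
order = leaves(t1)
u = 7
x = order[-1]  # right-most node of t1
ps = right_sizes_on_path(t1, u)
# Positions are indices inside t1; only their differences matter.
assert sum(ps) == order.index(x) - order.index(u)
print(ps, sum(ps))  # [3, 0] 3
```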

Now let us expand formula (2) completely and examine which terms involve w(e). According to the method used for deriving formula (2), w(e) appears in the following cases: once as the tree cost when the sub-tree is a leaf node labeled v, in the right outer cuts from t̂1 (L) to t̂2 (R), and in the left outer cuts from t̂2 (R) to t̂1 (L). We examine the left outer cuts first.

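$$\mathrm{cost}(\hat{t}) = \mathrm{cost}(\hat{t}_1) + \mathrm{cost}(\hat{t}_2) + |V(t_2)| \cdot \mathrm{cost}(\mathrm{right\_cut}(\hat{t}_1)) + |V(t_1)| \cdot \mathrm{cost}(\mathrm{left\_cut}(\hat{t}_2))$$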

In this formula, only cost(t̂1) and cost(left_cut(t̂2)) can contribute to the left outer cuts of node v. Cost(t̂1) can contribute because, when it is computed recursively one level further down, the left outer cuts of node v with regard to e appear again. Since we are only concerned with costs with regard to e, left_cut(t̂2) with regard to e is w(e). Since |V(t1)| = |L1| at the root level of t1, |V(t1)|·cost(left_cut(t̂2)) = |L1|*w(e).

Continuing the recursion until leaf node u is reached, the total left outer cut cost with regard to e is:

$$|L_k| \cdot w(e) + |L_{k-1}| \cdot w(e) + \cdots + |L_1| \cdot w(e) = p_k \cdot w(e) + p_{k-1} \cdot w(e) + \cdots + p_1 \cdot w(e) = (p_1 + p_2 + \cdots + p_k) \cdot w(e) = [\pi(x) - \pi(u)] \cdot w(e)$$

Similarly, we can prove that the total right outer cut cost with regard to e is [π(v) − π(y)]*w(e). Recall that w(e) appears once as the tree (leaf node) cost; thus the total cost with respect to w(e) is:

$$[\pi(x) - \pi(u)] \cdot w(e) + [\pi(v) - \pi(y)] \cdot w(e) + w(e) = [\pi(v) - \pi(u)] \cdot w(e) + [\pi(x) + 1 - \pi(y)] \cdot w(e) = [\pi(v) - \pi(u)] \cdot w(e)$$

where the last step uses π(y) − π(x) = 1.
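As a quick numerical check of the final identity, take hypothetical positions π(u) = 2, π(x) = 5, π(y) = 6 and π(v) = 9; any values with π(y) = π(x) + 1 behave the same way:

```latex
\begin{aligned}
[\pi(x)-\pi(u)] + [\pi(v)-\pi(y)] + 1 &= (5-2) + (9-6) + 1 = 7,\\
\pi(v)-\pi(u) &= 9-2 = 7.
\end{aligned}
```

The left part counts the positions from u up to x, the right part those from y up to v, and the single leaf term closes the one-position gap between x and y.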

Since for any hyperedge e the cost computed using formula (2) equals |π(v) − π(u)|*w(e), the total cost computed for a hypergraph having the set of hyperedges E is exactly the definition of la(G) for a hypergraph, i.e.,

$$la(HG) = \sum_{e \in E} w(e) \cdot \left( \max\{\pi(n_1), \pi(n_2), \ldots, \pi(n_k)\} - \min\{\pi(n_1), \pi(n_2), \ldots, \pi(n_k)\} \right)$$

where e = {n1, n2, …, nk}, which is the same as our cost model. Thus we can apply the algorithm proposed in (Bar-Yehuda, 2001) for optimizing broadcast sequencing.
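The quantity la(HG) can be evaluated directly for any given ordering. The Python sketch below does so; the function name and the (weight, members) input format are illustrative assumptions, and positions are 0-based indices because only differences of π matter. For hyperedges with exactly two nodes it reduces to the ordinary weighted linear arrangement cost.

```python
def hypergraph_la(order, hyperedges):
    """la(HG) for a broadcast ordering (hypothetical helper).

    order: data-item IDs in broadcast order; the index of an item is pi(item).
    hyperedges: (weight, members) pairs, members an iterable of item IDs.
    Each hyperedge costs w(e) * (max pi - min pi) over its members.
    """
    pi = {item: pos for pos, item in enumerate(order)}
    total = 0
    for w, members in hyperedges:
        positions = [pi[n] for n in members]
        total += w * (max(positions) - min(positions))
    return total


order = [0, 1, 2, 3]
hyperedges = [(2, {0, 2}), (1, {1, 2, 3}), (3, {0, 3})]
print(hypergraph_la(order, hyperedges))  # 2*2 + 1*2 + 3*3 = 15
```

Reversing the ordering leaves every max − min span unchanged, so an ordering and its reverse always have the same cost.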

6.3. Generating BDT

A BDT can be generated from an arbitrary ordering sequence. However, a BDT reduces the number of possible orderings from n! to 2^(n-1), which means that some orderings are not reachable under a given decomposition, so the globally optimal ordering might be missed. A good decomposition leads to a good ordering, as the following simple example shows.
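The reduction from n! to 2^(n-1) can be confirmed by enumeration. In the Python sketch below a BDT is again written as nested pairs (an assumed representation), and `balanced_bdt` is a hypothetical helper that splits the items in half recursively. Each of the n − 1 internal nodes either keeps or swaps its two children, so the tree yields 2^(n-1) distinct orderings.

```python
from math import factorial


def orderings(t):
    """All orderings a BDT can produce by swapping the two children of any
    internal node. Leaves are ints; internal nodes are (left, right) pairs."""
    if isinstance(t, int):
        return [(t,)]
    result = []
    for a in orderings(t[0]):
        for b in orderings(t[1]):
            result.append(a + b)  # children in their original order
            result.append(b + a)  # children swapped
    return result


def balanced_bdt(items):
    """Hypothetical helper: build a BDT by halving the item list recursively."""
    if len(items) == 1:
        return items[0]
    mid = (len(items) + 1) // 2
    return (balanced_bdt(items[:mid]), balanced_bdt(items[mid:]))


n = 5
reachable = set(orderings(balanced_bdt(list(range(n)))))
print(len(reachable), 2 ** (n - 1), factorial(n))  # 16 16 120
```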

In Fig. 6-5, for the example access graph, where the numbers inside rectangles are the weights of the corresponding edges, the four possible orderings of decomposition #1 are {{0,1,2}, {0,2,1}, {1,2,0}, {2,1,0}} and the corresponding total access time costs are {13,11,11,13}. The four possible orderings of decomposition #2 are {{1,2,0}, {1,0,2}, {2,0,1}, {0,2,1}} and the corresponding costs are {11,12,12,11}. The four possible orderings of decomposition #3 are {{2,1,0}, {2,0,1}, {1,0,2}, {0,1,2}} and the corresponding costs are {13,12,12,13}. Although decompositions #1 and #2 can both reach the globally optimal solution, decomposition #3 cannot, due to its poor binary decomposition.


Fig. 6-5. (a) The Access Graph (b) Decomposition #1 (c) Decomposition #2 (d) Decomposition #3
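The costs quoted for Fig. 6-5 can be reproduced with the Python sketch below. The edge weights are not legible in the extracted figure: w(0,1) = 2, w(1,2) = 3 and w(0,2) = 4 are an assumption, obtained by solving for the costs quoted in the text. The cost of an ordering is taken as the sum of w(a,b)·|π(a) − π(b)| with unit spacing, and each decomposition is written as the nested-pair tree whose four orderings match the lists given above.

```python
def orderings(t):
    """Orderings reachable from a BDT by swapping children (leaves are ints)."""
    if isinstance(t, int):
        return [(t,)]
    out = []
    for a in orderings(t[0]):
        for b in orderings(t[1]):
            out += [a + b, b + a]
    return out


def cost(order, weights):
    """Weighted linear arrangement cost with unit spacing between items."""
    pos = {item: p for p, item in enumerate(order)}
    return sum(w * abs(pos[a] - pos[b]) for (a, b), w in weights.items())


# Weights inferred from the costs quoted in the text (assumption).
weights = {(0, 1): 2, (1, 2): 3, (0, 2): 4}
# Trees whose reachable orderings match decompositions #1, #2 and #3.
decompositions = {1: (0, (1, 2)), 2: (1, (0, 2)), 3: (2, (0, 1))}

for k, tree in decompositions.items():
    costs = {o: cost(o, weights) for o in orderings(tree)}
    print(f"#{k}", costs, "min =", min(costs.values()))
```

The minima come out as 11, 11 and 12, matching the text: decomposition #3 cannot reach the optimal cost of 11.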

In the example, the minimum cost (achieved by one of the orderings of decompositions #1 and #2) is obtained when node 0 and node 2 are placed next to each other. Thus, to reduce the total access time, an intuitive idea is to cluster data items connected by larger edge weights into the same sub-tree. In fact, we can prove the following general case.

For three data items i, j and k, assume without loss of generality that their edge weights satisfy wi,j > wj,k > wi,k. Let L1 be the interval between the first and the second item and L2 the interval between the second and the third item. Then the order (i,j,k) or (k,j,i) has the smallest cost among all six possible orderings.

Proof: In every ordering, the edge between the first and second items is weighted by L1, the edge between the second and third items by L2, and the edge between the first and third items by L1+L2.

Cost(i,j,k) = wi,j*L1 + wj,k*L2 + wi,k*(L1+L2)

Cost(i,k,j) = wi,k*L1 + wk,j*L2 + wi,j*(L1+L2)

Cost(j,i,k) = wj,i*L1 + wi,k*L2 + wj,k*(L1+L2)

Then

Cost(i,j,k) - Cost(i,k,j)=- wi,j *L2+wi,k*L2=(wi,k- wi,j)*L2<0

Cost(i,j,k) - Cost(j,i,k)=- wj,k*L1+wi,k*L1=(wi,k-wj,k)*L1<0

Thus the order of (i,j,k) has the smallest cost among (i,j,k), (i,k,j) and (j,i,k). Similarly,

Cost(k,j,i)=wk,j*L1 +wj,i*L2+wk,i*(L1+L2)

Cost(j,k,i)=wj,k*L1+wk,i* L2+wj,i*(L1+L2)

Cost(k,i,j)=wk,i* L1+wi,j* L2+wk,j*(L1+L2)

Then

Cost(k,j,i) - Cost(j,k,i)=- wj,i*L1+wk,i*L1=(wk,i-wj,i)*L1<0

Cost(k,j,i) - Cost(k,i,j) = -wk,j*L2 + wk,i*L2 = (wk,i - wk,j)*L2 < 0

Thus the order of (k,j,i) has the smallest cost among (k,j,i), (j,k,i) and (k,i,j). On the other hand,

Cost(i,j,k)- Cost(k,j,i)=wi,j*(L1-L2)-wj,k* (L1-L2)=(wi,j-wj,k)*(L1-L2)

Since wi,j > wj,k, which of Cost(i,j,k) and Cost(k,j,i) is smaller depends on the relationship between L1 and L2: (i,j,k) is cheaper when L1 < L2, (k,j,i) is cheaper when L1 > L2, and the two tie when L1 = L2. Nevertheless, we can conclude that the order (i,j,k) or (k,j,i) has the smallest cost among all six possible orderings. In either case, data items i and j are placed next to each other.
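The lemma can also be sanity-checked numerically. The Python sketch below is a randomized check, not a substitute for the proof: it draws weights with wi,j > wj,k > wi,k and positive intervals L1, L2, evaluates all six orderings with the cost model used in the proof, and asserts that the minimum is always attained by (i,j,k) or (k,j,i). The function name `cost3` and the sampling ranges are illustrative assumptions.

```python
import random
from itertools import permutations


def cost3(order, w, L1, L2):
    """Cost of three items with interval L1 between the first and second and
    interval L2 between the second and third, as in the proof above."""
    a, b, c = order
    return (w[frozenset((a, b))] * L1
            + w[frozenset((b, c))] * L2
            + w[frozenset((a, c))] * (L1 + L2))


random.seed(0)
for _ in range(10_000):
    # Draw distinct weights and sort them so that w_ij > w_jk > w_ik.
    w_ik, w_jk, w_ij = sorted(random.sample(range(1, 100), 3))
    w = {frozenset("ij"): w_ij, frozenset("jk"): w_jk, frozenset("ik"): w_ik}
    L1, L2 = random.uniform(0.1, 10), random.uniform(0.1, 10)
    costs = {p: cost3(p, w, L1, L2) for p in permutations("ijk")}
    # The minimum must be attained by (i, j, k) or by (k, j, i).
    assert min(costs.values()) in (costs[tuple("ijk")], costs[tuple("kji")])
print("lemma holds on all samples")
```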

Spatial index trees, such as the R-Tree (Guttman, 1984), R+-Tree (Sellis, 1987) and R*-Tree (Beckmann, 1990), are designed to put data items that are spatially close to each other into the same branch and data items that are spatially far apart into different branches. Based on our hypergraph representation, the extended regions of the points represented by the nodes of a hyperedge generally overlap more when these nodes come from the same branch than when they come from different branches. In other words, in spatial range queries the weight of a hyperedge whose nodes are from the same branch is generally larger than the weight of a hyperedge whose nodes are from different branches. Thus tree-based spatial index methods are good candidates for generating a BDT for point data. We use the R-Tree to generate a BDT in this study because of its popularity in spatial databases for geographical data. We replace each R-Tree node with m branches by a small binary tree and connect all such small binary trees to build the BDT. An illustration is given in Fig. 6-6.

Algorithm RTreeToBDT
Input:
    r_root: the root of an R-Tree
    m: the number of children of r_root
Output:
    b_root: the root of the built BDT

    Put the children of r_root in array seq
    b_root = GenBDT(seq, 0, m-1)

Algorithm GenBDT
Input:
    seq: an array of the child nodes of an R-Tree node
    first: the beginning position in seq from which to build a BDT
    last: the ending position (inclusive) in seq up to which to build a BDT
Output:
    root: the pointer to the root of the BDT being built

    Allocate memory for root
    If first equals last
        If seq[first] is a non-leaf R-Tree node
            Mark root as an intermediate node
            Let new_root be a pointer to a BDT node
            Let num be the number of children of seq[first]
            new_root = RTreeToBDT(seq[first], num)
            Set the first child of root to the first child of new_root
            Set the second child of root to the second child of new_root
        Else
            Set the ID of root to the ID of seq[first]
            Set the two children of root to NULL
    Else
        Mark root as an intermediate node
        middle = (first + last) / 2
        Set the first child of root to the result of GenBDT(seq, first, middle)
        Set the second child of root to the result of GenBDT(seq, middle+1, last)
    Return root

Fig. 6-7. The Process of Generating a BDT From an R-Tree

The process of generating a BDT from an R-Tree is presented in Fig. 6-7. We begin with the root of the R-Tree and recursively divide its immediate children into two groups to build a small binary tree; the root of this small binary tree is the root of the BDT. This process is performed recursively until the leaf nodes of the R-Tree are reached. Since building the small binary tree for a node with m children takes time proportional to m, and each R-Tree node is processed exactly once, we claim that the complexity of the algorithm is linear with respect to the number of nodes in the R-Tree. The proof is similar to the proof of linearity of tree traversal given in (Cormen, 2001).
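A Python sketch of the procedure in Fig. 6-7 follows. The RNode and BNode classes are hypothetical stand-ins (a real R-Tree node would also carry bounding rectangles), and one simplification is assumed: where the pseudocode copies the two children of new_root into a freshly allocated root, the sketch returns new_root directly, which yields the same tree shape. A node with m children produces m − 1 intermediate BDT nodes plus one slot per child, consistent with the linear-time claim above.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RNode:
    """Hypothetical minimal R-Tree entry: a data item (item_id set, no
    children) or a directory node (children non-empty). MBRs are omitted."""
    item_id: Optional[int] = None
    children: List[RNode] = field(default_factory=list)


@dataclass
class BNode:
    """BDT node: leaves carry a data-item ID; intermediate nodes have two children."""
    item_id: Optional[int] = None
    left: Optional[BNode] = None
    right: Optional[BNode] = None


def rtree_to_bdt(r_root: RNode) -> BNode:
    """Counterpart of RTreeToBDT: GenBDT over all children, last index m - 1."""
    if not r_root.children:
        raise ValueError("R-Tree node has no children")
    return gen_bdt(r_root.children, 0, len(r_root.children) - 1)


def gen_bdt(seq: List[RNode], first: int, last: int) -> BNode:
    """Counterpart of GenBDT over seq[first..last], both ends inclusive."""
    if first == last:
        entry = seq[first]
        if entry.children:
            # Non-leaf R-Tree node: its own small binary tree fills this slot.
            return rtree_to_bdt(entry)
        return BNode(item_id=entry.item_id)  # data item becomes a BDT leaf
    middle = (first + last) // 2
    return BNode(left=gen_bdt(seq, first, middle),
                 right=gen_bdt(seq, middle + 1, last))


def leaf_order(b: BNode) -> List[int]:
    """Data-item IDs in the left-to-right order the BDT induces."""
    if b.left is None:
        return [b.item_id]
    return leaf_order(b.left) + leaf_order(b.right)


# Root with two directory children: A holds items 1, 2, 3 and B holds 4, 5.
A = RNode(children=[RNode(item_id=i) for i in (1, 2, 3)])
B = RNode(children=[RNode(item_id=i) for i in (4, 5)])
bdt = rtree_to_bdt(RNode(children=[A, B]))
print(leaf_order(bdt))  # [1, 2, 3, 4, 5]
```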

In document Sequencing geographical data for efficient query processing on air in mobile computing. (Page 113-121)