Optimizing DBW - Optimization Methods - Sequencing geographical data for efficient query proces

Optimization Methods

6.4 Optimizing DBW

Due to the similarity between DBW and la(G) as we discussed in Section 4.3 of Chapter 4, we can use the MinLA method proposed in (Bar-Yehuda, 2001). Although the paper only handles the graph MinLA problem, its implementation can handle the hypergraph MinLA problem, however, with two restrictions. The first restriction is that it requires that there are at least two nodes in a hyperedge while hyperedges (query result sets) in our representation might only include one node (either a point in a point data set or a vertex in a graph data set). The second restriction is that the implementation assumes that all hyperedges have unit weight. Since the DBW for accessing a single data item is always 0, the first restriction is not a problem. We also modify the implementation to allow hyperedges to have different weights.

In the implementation, for each of the node in the BDT, there is a pointer to its parent and two pointers to its two children. An orientation flag is also associated with each of the node in the BDT. The parent pointer of the root of BDT is empty and the two children pointers of a leaf node of BDT are also empty. We next briefly introduce the Least Common Ancestor Tree (LCA-Tree) data structure used in the

implementation to efficiently determine whether a node is the beginning/ending node of a hyperedge.

The LCA-Tree is constructed during preprocessing. An auxiliary array is needed in the construction. The pointers in the array map hypergraph nodes to the corresponding leaf nodes in the BDT of the hyper graph. For each of the nodes in a hyperedge, the corresponding BDT node is first retrieved and the path from the node all the way to the root of the BDT is travelled. The ID of the hyperedge is assigned as the flag of all the intermediate nodes on the paths. Next, starting from the root of the BDT, the implementation first tries to find a node of the BDT whose both children’s flags have the value of ID of the hyperedge. In case only one child whose flag has the value of ID of the hyperedge, the implementation follows the child until a node whose both children’s flags have the value of ID of the hyperedge is reached. This process is performed recursively until the leaf nodes of the BDT is reached. The implementation adds a node in the LCA tree in both cases, i.e., either a BDT node whose both children’s flags are the assigned ID or it is a leaf BDT node. Fig. 6-8 shows the process, where the ID represents the edge number of the hyperedge of {1,4,5,6} and the dashed lines shows the correspondences between the nodes in the BDT and the LCA tree.

For a hyperedge with k nodes, it takes at most log(n) for each of them to reach the root of the BDT where n is the number of the nodes in the hypergraph and the BDT. From the root of the BDT, it takes at most log(n) to reach each of the nodes at

O(k*log(n)). Since we assume k is bounded by a constant, thus the total time complexity for constructing LCA trees for the m hyperedges is O(m*log(n)). The space complexity of a LCA tree is in the order of O(k) , thus the space complexity for constructing the LCA trees is O(m), provided that k is bounded by a constant.

9 10 11 6 7 8 3 4 5 0 1 2 ID ID ID ID ID ID ID ID ID ID 5 4 1 6

Fig. 6-8. Determining the Beginning/Ending Node of a Hyperedge By using the LCA tree, whether a leaf node of the BDT is the beginning/ending node of a hyperedge can be determined efficiently. First start with the root of the LCA tree of the node and then follow the left/right child (depending on the orientation of the sub-tree rooted at the node) until we reach a leaf node of the LCA tree. Since the nodes of a LCA tree is a subset of the nodes of the BDT tree, the ID of the LCA leaf node and the ID of the BDT leaf node can be compared and decision can be made. The cost of the traversal from the root to a leaf node of a LCA tree is in the order of log(k) where k is the number of nodes in the corresponding hyperedge. Since we assume the maximum number of the nodes in a hyperedge of the

hypergraph is bounded by a constant, thus the determination can be made in small constant time.

We use the example data set shown in Fig. 4-1 to illustrate the optimization method for broadcast sequencing. The data set has 4 nodes and 11 hyperedges. The hyperedges and their weights are listed in Table 6-1. This simple data set will also be used to illustrate the other two proposed optimization methods subsequently.

Table 6-1. The Hyperedges and Their Weights for the Data in Fig. 4-1

Hyperedge Nodes Weight

1 62 2 19 3 34 4 85 1,3 2 2,3 38 1,2 22 3,4 8 2,4 3 1,2,3 14 2,3,4 4

We first remove the 4 hyperedges each of which has only one single node as discussed above. We then build an R-Tree with a branch factor of 3. The resulting R- tree has two leaf nodes in each of its two sub-trees as shown in Fig. 6-9 and we use it as the BDT. Traversal of the R-Tree gives an ordering of [1,2,4,3] and we use it as our initial ordering. Among the 7 hyperedges, edge {1,2} rooted at T11 with in_cut of

22, edge {3,4} rooted at T12 with in_cut of 8 and the rest rooted at T0 with their total

{2,3}, 3 for edge {2,4}, 14 for edge {1,2,3}and 4 for edge {2,3,4}, Thus the total inner_cut is 61. T12 T11 3 4 2 1 T0

Fig. 6-9. The BDT of the Example for Illustrating DBW Optimization

(b) 0-Orientation (a) 1-Orientation T12 _T₁₁ T0 2 1 3 4 - - + T12 _T₁₁ T0 1 2 3 4 + - +

Fig. 6-10. The Orientation Trees of Two Possible Orientations of T11

For the orientation tree in Fig. 6-10(a), the ordering is [4, 3, 2, 1]. Node 2 is the ending node of edges {2,3}, {2,4} and {2,3,4}, thus the left_cut of node 2 is 38+3+4=45. Node 2 is also the beginning node of edge {1,2} and thus its right_cut is 22. Similarly, the left_cut of node 1 is 2+14+22=38 and the right_cut of node 1 is 0. Since nodes 1 and 2 are leaf nodes, their costs are the same as their left_cuts which are 45 and 38, respectively. Thus the left_cut and the right_cut of their parents, T11,

are 45+38-22=61 and 22+0-22=0, respectively. The total cost of T11 under the current

1-orientation can be computed as 45+38+(22-22)*1+(38-22)*1=99. If the orientation of T11 is switched to 0-orientation, we can get the left_cut of node 1 as 2, the right_cut

of node 1 as 22, the left_cut of node 2 as 81 and the right_cut of node 2 as 0, thus the left_cut and the right_cut of T11 under the current 0-orientation are 2+81-22=61 and

22+0-22=0, respectively. The total cost of T11 is 2+81+(22-22)*1+(81-22)*1=142.

Since the 1-orientation of T11 has the smaller cost (99) than the 0-orientation (142) of

T11, we set the 0-orientation to T11. Similarly, we set the 0-orientation to T12 since its

0-orientation cost (15) is smaller than its 1-orientation cost (66) as shown in Fig. 6-10 (b). The left_cut and the right_cut under the 0-orientation of T12 are 0 and 61,

respectively. Thus the total cost of T0 is 99+15+(61-61)*2+(61-61)*2=114. This is

the global optimal value of all possible 4!=24 orderings. The cost of the optimized ordering [1,2,3,4] is 44.1% better than the initial ordering {1,2,4,3} whose cost is 165.

To examine how the BDT affects the best optimization we can achieve, we put nodes 2 and 4 in a branch and nodes 1 and 3 in another branch of the BDT to use. Although the cost of the initial ordering [2,4,1,3] has the worst cost (233) among the possible 24 orderings, the cost of the optimized ordering [4,2,3,1], which is 139, is 67.6% better. However, it is still 21.9% worse than the optimized ordering using R- tree traversal as the initial ordering. This simple example demonstrates the effectiveness of both the proposed optimization method and the proposed heuristics of generating BDT from R-Tree and generating initial ordering using R-Tree traversal.

In document Sequencing geographical data for efficient query processing on air in mobile computing. (Page 121-127)