3.5.3. Generating the Adjacency Information

All graph-aware partitioning algorithms, such as MINCUTLAZY [5] (Appendix A.1), MINCUTAGAT (Section 2.3), MINCUTCONSERVATIVE (Section 2.4), or MINCUTBRANCH (Section 2.5), use the neighborhood information to extend connected sets. Thus, our generic partitioning framework has to provide this information to the particular partitioning algorithm PARTITION_graph-aware it is instantiated with. We make reference to Figures 3.23 and 3.24. Essentially, PARTITION_graph-aware has to rely on the precomputed neighborhood array N_m of struct StackEntry (Figure 3.23). The other variables mainly exist for performance and maintenance reasons and are explained later on. The initial setup is done by COMPUTEADJACENCYINFO, given in Figure 3.29. Before going into the details of the pseudocode, we need to explain how the adjacency information can be stored.

[Figure 3.22: Call graph for the top-level case of PARTITION_X. Nodes shown: TDPlanGenHyp, TDPGHypSub, partition_X, initializeInfoStack, computeAdjacencyInfo, composeCompoundVertices, connectionTestRequired, partition_graph_aware, decode, isConnectedHyp, computeLookUpIdx, storeAdjInfo, getBCCInfo, maximizeCompoundVertices, manageAdjacencyInfo, findInitialCompounds, maximizeCompoundVerticesSub, maintainLabels, findInitialCompoundsSub.]

Precomputed Neighborhoods

Capturing the adjacency information of simple edges is relatively straightforward. This is done by an associative array N_s that stores, for a given vertex x, the vertices adjacent to x, as defined in Figure 3.25. In practice, N_s is implemented as an array of bitvectors, as Section 3.5.7 will point out. Here, the lookup for a given x works slightly differently. First, we determine the index i of the given vertex x (Definition 3.2.2). Since x itself is represented as a bitvector in which only one bit is set, we simply have to determine the index of that particular bit. This can be done with the bit scan forward assembler instruction. In a second step, we use i as an index into the array N_s. With access to the corresponding bitvector, we can determine the bits that are set. Each bit then corresponds to an index that represents a vertex in V. To compute the simple neighborhood N_simple(C) (Definition 3.2.14), we simply iterate over the bits that are set in the bitvector representing C. We use the index of each set bit to access the corresponding bitvector in N_s. The result of combining these bitvectors yields the simple neighborhood of C. Thus, the adjacency information of simple hyperedges can easily be stored and accessed in such an array of bitvectors.
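The lookup just described can be sketched in a few lines. This is only an illustration under stated assumptions: Python integers stand in for bitvectors, the names `bit_scan_forward` and `simple_neighborhood` are ours, and we assume that C itself is excluded from its neighborhood (Definition 3.2.14 is not reproduced here).

```python
def bit_scan_forward(x: int) -> int:
    """Index of the lowest set bit of x (x must be non-zero)."""
    return (x & -x).bit_length() - 1


def simple_neighborhood(ns: list[int], c: int) -> int:
    """Union of the neighbor bitvectors of all vertices in c, without c itself."""
    result = 0
    rest = c
    while rest:
        i = bit_scan_forward(rest)  # index of the next vertex of C
        result |= ns[i]             # add the vertices adjacent to that vertex
        rest &= rest - 1            # clear the bit just processed
    return result & ~c              # assumption: C is not part of its own neighborhood


# Chain graph v0 - v1 - v2 - v3, one neighbor bitvector per vertex index.
ns = [0b0010, 0b0101, 0b1010, 0b0100]
print(bin(simple_neighborhood(ns, 0b0110)))  # C = {v1, v2} -> 0b1001
```

Each vertex of C costs one array access and one bitwise OR, which is what makes the precomputed array attractive for simple edges.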

For storing the adjacency information of complex hyperedges, we have to choose a different approach. There are three reasons for this. (1) First of all, we cannot determine a unique index for a given complex hypernode v in the same way as before, because more than one bit is set in the bitvector of v. One might argue for using the bitvector's integer value instead. But then the array of bitvectors would have a size of 2^|V|, which is not practical for large V. As an alternative implementation, we could use a map. But then the lookup is slightly more costly: O(log |V|) (e.g., in the case of a red-black tree) instead of O(1).

Let us assume we have solved this problem. That brings us to our second reason. (2) For a given complex hypernode, there might exist more than one adjacent hypernode. Thus, an array or map of bitvectors is no longer sufficient. We need a set of bitvectors per entry instead. Hence, we can cope with the first two reasons by paying some performance penalties.

Nevertheless, there is a third reason. (3) This approach is not suitable for computing the complex neighborhood N(S, C, ∅) (Definition 3.2.15). For the simple neighborhood, we only had to iterate over every vertex of C. But now we would have to consider every possible subset of C and check our array or map for a set of bitvectors stored for that particular subset. Thus, we can conclude that the lookup of a precomputed complex neighborhood is infeasible.
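To make the third reason concrete, the following sketch counts how many lookups a precomputed complex neighborhood would require for a given C. It is an illustration only; the function names are hypothetical, and the subset enumeration uses the standard `(sub - 1) & c` bit trick rather than anything prescribed by the text.

```python
def nonempty_subsets(c: int):
    """Yield every non-empty subset of the bitvector c, largest first."""
    sub = c
    while sub:
        yield sub
        sub = (sub - 1) & c  # next smaller subset of c


def lookups_needed(c: int) -> int:
    """Number of map probes if every subset of c had to be looked up."""
    return sum(1 for _ in nonempty_subsets(c))


print(lookups_needed(0b1011))       # |C| = 3  -> 7 probes
print(lookups_needed((1 << 20) - 1))  # |C| = 20 -> 1048575 probes
```

The count is 2^|C| − 1, so the number of probes doubles with every vertex added to C, whereas the simple neighborhood needs only |C| array accesses.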

Hence, we store the complex hyperedges themselves. We can compute a hyperedge filter EF that points only to those hyperedges that reference only vertices contained in the current vertex set S. To compute N(S, C, ∅), we have to consider all hyperedges that qualify under the filter. Thus, we have a complexity of O(|E_complex|), where E_complex = {(v, w) | (v, w) ∈ E ∧ (|v| > 1 ∨ |w| > 1)} is the set of complex hyperedges. Since the neighborhood is computed many times, this will have a negative impact on the performance of the partitioning algorithm. The good news is that we will not need to compute N(S, C, ∅). Moreover, since we map the complex hyperedges to simple hyperedges, we can rely on the precomputed simple neighborhood. As we will see, the adjacency information of the simple hyperedges generated by the complex-to-simple hyperedge mapping is stored in the array N_h instead of in N_s.
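The cost of the filter-based approach can be seen in a small sketch. The filter condition follows the description above; the adjacency test inside `adjacent_hypernodes` is purely an assumption made for illustration (one hypernode contained in C, the other disjoint from C), since Definition 3.2.15 is not reproduced here. All names are hypothetical.

```python
from typing import NamedTuple


class HyperEdge(NamedTuple):
    v: int  # first hypernode as a bitvector
    w: int  # second hypernode as a bitvector


def hyperedge_filter(edges: list[HyperEdge], s: int) -> list[int]:
    """Indices of the hyperedges that reference only vertices inside s."""
    return [k for k, e in enumerate(edges) if (e.v | e.w) & ~s == 0]


def adjacent_hypernodes(edges: list[HyperEdge], ef: list[int], c: int) -> list[int]:
    """Linear scan over the filtered hyperedges, O(|E_complex|) per call."""
    found = []
    for k in ef:
        e = edges[k]
        for a, b in ((e.v, e.w), (e.w, e.v)):
            # assumed test: a lies inside C and b does not touch C
            if a & ~c == 0 and b & c == 0:
                found.append(b)
    return found


edges = [HyperEdge(0b00011, 0b01100), HyperEdge(0b00100, 0b11000)]
ef = hyperedge_filter(edges, 0b01111)           # second edge leaves S
print(ef)                                       # [0]
print(adjacent_hypernodes(edges, ef, 0b00011))  # [12], i.e. {v2, v3}
```

Every neighborhood computation walks the whole filtered list, which is the repeated O(|E_complex|) cost the text sets out to avoid.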

The Pseudocode in Detail

At the beginning of COMPUTEADJACENCYINFO (Figure 3.29), we precompute the simple neighborhood N_s (Figure 3.25) by iterating over all simple edges (Lines 3 to 5). The complex hyperedges of a given hypergraph are stored in an array called HyperEdges (Figure 3.25) of type HyperEdge (Figure 3.24). The size of HyperEdges is |E_complex|. Therefore, we loop over E_complex in Line 8 of COMPUTEADJACENCYINFO and invoke STORECOMPLEXHYPEREDGE (Figure 3.30) in Line 9. STORECOMPLEXHYPEREDGE stores the two hypernodes v and w of the current complex hyperedge (v, w). The values for v_rep and w_rep are assigned later by a call to STOREADJACENCYINFO (Figure 3.32). Both values are used to store the result of mapping the complex hyperedge (v, w) to the simple edge (v_rep, w_rep). Thus, v_rep ⊆ v and w_rep ⊆ w hold.

To understand Lines 8 to 41, recall the third observation of Section 3.5.1. We solve the problem illustrated there by first pretending to substitute every complex hyperedge (v, w) with all possible combinations (|v| * |w| in total) of simple edges (loop of Lines 11 to 20). For every overlapping simple edge created, we increase card (Line 14). For every combination of indices i of x_i ∈ v and j of y_j ∈ w (Definition 3.2.2), we compute the position of the corresponding entry within the array Ovlp (Line 1) by a call to COMPUTELOOKUPIDX (Line 13). The formula used in Line 1 of COMPUTELOOKUPIDX (Figure 3.27) guarantees space efficiency (SIZEOF(Ovlp) = |V| * (|V| − 1) / 2). In Lines 19 and 20 of COMPUTEADJACENCYINFO, we keep track of the simple edge that is generated most frequently, together with its array entry. After the generation of simple edges (they are not materialized yet), we check whether we have found overlapping hyperedges (Line 22). If so, we materialize the simple edge that was generated most frequently (Line 30). At this point, we remove all other combinations of simple edges (Lines 26 to 29) for the set of overlapping edges stored in Ovlp[idx].Eref (Line 23). Lines 33 to 36 find the next largest set of overlapping hyperedges, and the process starts again. Those complex hyperedges that do not overlap are substituted with one simple edge in Lines 37 and 41 through a call to STOREADJACENCYINFO (Figure 3.32). We decided to store the substituted complex hyperedges not within the simple neighborhood N_s but within N_h, where h stands for hyperneighborhood, although it is not an exact translation (Definition 3.2.15).
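The greedy substitution described in this paragraph can be sketched as follows. This is a simplified illustration, not the pseudocode of Figure 3.29: a dictionary keyed by the unordered pair (i, j) replaces the array Ovlp with its COMPUTELOOKUPIDX addressing, ties are broken by the smallest pair (an assumption of ours), and the function names are hypothetical.

```python
from itertools import product


def bits(x: int) -> list[int]:
    """Indices of the set bits of x, lowest first."""
    out = []
    while x:
        out.append((x & -x).bit_length() - 1)
        x &= x - 1
    return out


def map_complex_to_simple(hyperedges: list[tuple[int, int]]) -> dict[int, tuple[int, int]]:
    """Assign one simple edge (i, j) to every complex hyperedge, shared edges first."""
    # ovlp[(i, j)] = hyperedges that would generate the simple edge (i, j);
    # the size of that set plays the role of card.
    ovlp: dict[tuple[int, int], set[int]] = {}
    for k, (v, w) in enumerate(hyperedges):
        for i, j in product(bits(v), bits(w)):
            ovlp.setdefault((min(i, j), max(i, j)), set()).add(k)

    chosen: dict[int, tuple[int, int]] = {}
    while ovlp:
        # most frequently generated simple edge, ties broken by smallest pair
        pair = min(ovlp, key=lambda p: (-len(ovlp[p]), p))
        covered = ovlp.pop(pair)
        for k in covered:
            chosen[k] = pair  # materialize the edge for all overlapping hyperedges
        # remove every other combination generated by the covered hyperedges
        for p in list(ovlp):
            ovlp[p] -= covered
            if not ovlp[p]:
                del ovlp[p]
    return chosen


# ({v0,v1},{v2}), ({v0},{v2,v3}) and ({v3},{v1,v2})
hyperedges = [(0b0011, 0b0100), (0b0001, 0b1100), (0b1000, 0b0110)]
print(map_complex_to_simple(hyperedges))  # {0: (0, 2), 1: (0, 2), 2: (1, 3)}
```

In the example, the first two hyperedges overlap on the simple edge (0, 2) and share it, while the third one does not overlap with any remaining candidate and receives a simple edge of its own.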

Storing the Adjacency Information

The purpose of STOREADJACENCYINFO is to update N_h, but also HEdgeLkp and HyperEdges. Figure 3.25 declares HEdgeLkp as an array of hyperedge references and HyperEdges as an array of all complex hyperedges. Our generic partitioning framework needs both variables in order to reconstruct the complex hyperedge (v, w) for a given simple edge ({x}, {y}). From the arguments x_i and y_j that are passed in, we know which simple hyperedge ({x_i}, {y_j}) is represented. Here, ({x_i}, {y_j}) is the result of the complex hyperedge to simple edge mapping, where (v, w) is the corresponding complex hyperedge, so that x_i ∈ v ∧ y_j ∈ w holds. By calling COMPUTELOOKUPIDX(x_i, y_j), we can compute an index into HEdgeLkp, which returns a hyperedge reference in the form of an index into HyperEdges.

In Line 3 of STOREADJACENCYINFO, we compute the index lkp. We use lkp to update HEdgeLkp in Line 4. The generated simple hyperedge ({x_i}, {y_j}) is materialized in the particular HyperEdges entry of the corresponding complex hyperedges Eref in Lines 6 and 7.
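A minimal sketch of these bookkeeping steps is given below. It rests on several assumptions: the triangular indexing in `compute_lookup_idx` is one common formula that needs exactly |V| * (|V| − 1) / 2 entries and is not necessarily the formula of Figure 3.27; the function handles a single hyperedge index `e` per call; and the field and parameter names are ours, not those of Figures 3.24 and 3.32.

```python
from dataclasses import dataclass


@dataclass
class HyperEdge:
    v: int          # first hypernode as a bitvector
    w: int          # second hypernode as a bitvector
    v_rep: int = 0  # representative of v, a subset of v
    w_rep: int = 0  # representative of w, a subset of w


def compute_lookup_idx(i: int, j: int) -> int:
    """Position of the unordered pair {i, j}, i != j, in a triangular array."""
    lo, hi = (i, j) if i < j else (j, i)
    return hi * (hi - 1) // 2 + lo


def store_adjacency_info(nh, hedge_lkp, hyper_edges, e, i, j):
    """Record the simple edge ({x_i}, {y_j}) that replaces hyperedge e."""
    nh[i] |= 1 << j                           # y_j becomes a neighbor of x_i
    nh[j] |= 1 << i                           # and vice versa
    hedge_lkp[compute_lookup_idx(i, j)] = e   # simple edge -> hyperedge reference
    hyper_edges[e].v_rep = 1 << i
    hyper_edges[e].w_rep = 1 << j


n = 4
nh = [0] * n
hedge_lkp = [None] * (n * (n - 1) // 2)
hyper_edges = [HyperEdge(0b0011, 0b1100)]
store_adjacency_info(nh, hedge_lkp, hyper_edges, 0, 1, 2)

# Reconstructing the complex hyperedge from the simple edge ({v2}, {v1}):
print(hyper_edges[hedge_lkp[compute_lookup_idx(2, 1)]])
```

Because the index depends only on the unordered pair, the complex hyperedge can be recovered regardless of the direction in which the simple edge is traversed.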

In document Algorithms for Efficient Top-Down Join Enumeration (Page 95-98)