
3.6.2. Workload

There are two situations that give rise to complex hyperedges: (1) the TES produced by CD-A [20], which indicates that non-inner joins are not freely reorderable, and (2) complex predicates referencing more than two relations.

In Sections 3.6.4 and 3.6.5, we distinguish between these two cases. Accordingly, we implemented two kinds of query graph generators. The first graph generator is based on a random operator tree generator that attaches inner and non-inner join operators of all kinds. We make only one assumption: more than half of all attached join operators should be inner join operators. From these generated operator trees, we then compute acyclic complex hypergraphs with the conflict detector CD-A, as proposed in [20]. To obtain cyclic graphs, we determine all subgraphs of a given hypergraph that are connected by inner join edges only. Within those subgraphs, we randomly generate additional inner join edges. We call the resulting graphs non-inner join query graphs, although the majority of their edges are still inner join edges, and refer to them as non-inner/simple.
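The first generator's two key steps can be illustrated with a small Python sketch. It is not the generator used for the experiments: the operator set in `NON_INNER`, the helper names, and the simplification that every edge is a simple (u, v, op) triple rather than a hyperedge computed by CD-A are assumptions for illustration. It draws operators so that strictly more than half are inner joins, then finds the subgraphs connected by inner join edges only (via union-find) and adds random inner join edges inside them.

```python
import random

# Hypothetical selection of non-inner join operators (illustration only).
NON_INNER = ["leftouter", "fullouter", "semi", "anti"]


def draw_operators(k, rng):
    """Draw k >= 1 join operators, strictly more than half of them inner joins."""
    n_inner = rng.randint(k // 2 + 1, k)  # fix the inner-join count first
    ops = ["inner"] * n_inner + [rng.choice(NON_INNER) for _ in range(k - n_inner)]
    rng.shuffle(ops)
    return ops


def find(parent, x):
    """Union-find lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def inner_join_components(n, edges):
    """Group vertices 0..n-1 into subgraphs connected by inner join edges only.

    edges: list of (u, v, op) triples with simple endpoints (a simplification).
    """
    parent = list(range(n))
    for u, v, op in edges:
        if op == "inner":
            ru, rv = find(parent, u), find(parent, v)
            if ru != rv:
                parent[ru] = rv
    groups = {}
    for x in range(n):
        groups.setdefault(find(parent, x), []).append(x)
    return list(groups.values())


def add_cycles(n, edges, extra, rng):
    """Add up to `extra` new inner join edges, each inside one such subgraph."""
    present = {frozenset((u, v)) for u, v, _ in edges}
    candidates = []
    for comp in inner_join_components(n, edges):
        for i, u in enumerate(comp):
            for v in comp[i + 1:]:
                if frozenset((u, v)) not in present:
                    candidates.append((u, v))
    rng.shuffle(candidates)
    return edges + [(u, v, "inner") for u, v in candidates[:extra]]
```

Fixing the number of inner joins before drawing the rest avoids rejection sampling, which would rarely succeed for long operator sequences if all operators were drawn uniformly.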

The second graph generator generates random acyclic and cyclic graphs containing simple edges only. Edges are added by selecting two relation indices with uniformly distributed random numbers. After generating a simple graph, the generator randomly transforms simple edges into complex hyperedges. To this end, it randomly chooses three parameters: (1) and (2) the sizes of the two hypernodes connected by the new hyperedge and (3) whether the simple edge to be transformed is part of a cycle, i.e., whether or not it is the only connection between two connected subgraphs. An edge is transformed only if the resulting complex hyperedge is not subsumed by any other edge. In essence, this generator produces hypergraphs whose complex hyperedges model complex join predicates involving more than two relations. We therefore call the generated graphs complex predicate query graphs and refer to them as inner/complex.
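The edge sampling and the two tests that govern the transformation step might look as follows. This Python sketch is hypothetical throughout: the spanning-tree initialization that keeps the graph connected, the function names, and the subsumption rule (an edge (u, v) counts as subsumed by (u2, v2) if u2 ⊆ u and v2 ⊆ v, in either orientation) are assumptions, not details given above.

```python
import random
from collections import deque


def random_simple_graph(n, m, rng):
    """Connected simple graph on vertices 0..n-1 with m distinct edges.

    Assumption: a random spanning tree is drawn first to guarantee
    connectivity; every further edge joins two uniformly drawn indices.
    """
    if not n - 1 <= m <= n * (n - 1) // 2:
        raise ValueError("edge count out of range")
    edges = {frozenset((i, rng.randrange(i))) for i in range(1, n)}
    while len(edges) < m:
        u, v = rng.randrange(n), rng.randrange(n)
        if u != v:
            edges.add(frozenset((u, v)))
    return sorted(tuple(sorted(e)) for e in edges)


def on_cycle(edges, u, v):
    """True if simple edge (u, v) lies on a cycle, i.e. u and v stay
    connected when the edge is removed (it is not the only connection)."""
    adj = {}
    for a, b in edges:
        if {a, b} == {u, v}:
            continue  # drop the edge under test
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    seen, queue = {u}, deque([u])
    while queue:
        x = queue.popleft()
        if x == v:
            return True
        for y in adj.get(x, []):
            if y not in seen:
                seen.add(y)
                queue.append(y)
    return False


def subsumed(new, hyperedges):
    """Assumed rule: (u, v) is subsumed by (u2, v2) if u2 is a subset of u and
    v2 a subset of v (in either orientation). Hypernodes are frozensets."""
    u, v = new
    return any((u2 <= u and v2 <= v) or (u2 <= v and v2 <= u)
               for u2, v2 in hyperedges)
```

When checking a candidate, the simple edge being replaced should be left out of `hyperedges`, since it would otherwise subsume its own transformation.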

To generate cardinalities and selectivities, we follow the approach of [12] as described in Section 4.4.2. Note that since we do not apply branch-and-bound pruning techniques, the assigned cardinalities and selectivities were not important for our studies.

In Section 3.6.7, we compare the runtime performance of our plan generators on different benchmarks. To this end, we computed the query graphs for all queries of the TPC-H [34] and TPC-DS [33] benchmarks. As the basis for the query graph computation, we used the explain output of the IBM DB2 10.1 LUW database management system [18]. For every query, we took the operator tree from the explain output and reduced it to a join tree. Every base relation of the join tree introduced a new vertex into the query graph. We assigned each vertex the cardinality of its base relation if no local predicate could be applied; otherwise, we took the optimizer's cardinality estimate after all corresponding local predicates had been applied. The hyperedges were extracted from the predicates attached to the join operators. If a join predicate referenced a materialized intermediate result in the form of a temporary relation TMP, we located the TMP operator in the join tree and introduced a complex hyperedge instead of a simple one: for the part referencing the TMP operator, we introduced a complex hypernode containing all base relations underneath that TMP operator. Hence, if a join predicate referenced two TMP operators, the corresponding hyperedge had complex hypernodes at both ends. For several TPC-DS queries, the DB2 optimizer applied subplan sharing. Since the root node of every shared subplan is a TMP operator, we did not convert every join predicate referencing that TMP into a complex hyperedge. Instead, we introduced a complex hypernode only for the first predicate we encountered that referenced such a shared subplan via a TMP. For all other references to the same TMP, we introduced a new vertex in the query graph and assigned it the cardinality of that TMP. If (local) predicates could be applied to that TMP, we took the cardinality after predicate application. To simplify matters, we ignored groupings and UNION operations. Although this is not valid, we transformed every UNION into an inner join with a selectivity of one and modeled the corresponding edge as a complex hyperedge, where all base relations on each side of the UNION together formed a complex hypernode.
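The TMP and UNION rules above can be sketched on a toy join tree. The `Node` type, its `kind` strings, and the naming of stand-in vertices are hypothetical; the sketch only shows how hypernodes are derived from base relations, not how DB2 explain output is parsed or how cardinalities are assigned. In this sketch, each later reference to a shared TMP gets its own stand-in vertex.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical join-tree node: kind is "SCAN" (base relation), "JOIN",
    "TMP" (materialized intermediate result), or "UNION"."""
    kind: str
    name: str = ""
    children: list = field(default_factory=list)


def base_relations(node):
    """All base relations in the subtree rooted at node."""
    if node.kind == "SCAN":
        return frozenset({node.name})
    rels = frozenset()
    for child in node.children:
        rels |= base_relations(child)
    return rels


def predicate_edge(left, right):
    """Hyperedge for a join predicate: a base relation contributes a single
    vertex, a referenced TMP the complex hypernode of all relations below it."""
    return base_relations(left), base_relations(right)


def union_edge(union):
    """UNION treated (not semantically valid) as an inner join with
    selectivity 1; each side's base relations form one complex hypernode."""
    left, right = union.children
    return base_relations(left), base_relations(right), 1.0


class SharedTmpResolver:
    """Shared subplans: the first reference to a TMP yields its complex
    hypernode; every later reference yields a fresh stand-in vertex."""

    def __init__(self):
        self.seen = set()
        self.fresh = 0

    def hypernode(self, ref):
        if ref.kind == "TMP":
            if id(ref) in self.seen:
                self.fresh += 1
                return frozenset({f"{ref.name}#{self.fresh}"})
            self.seen.add(id(ref))
        return base_relations(ref)
```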

As a third benchmark, we took the SQLite test suite [29]. The query graphs for the SQLite test suite were provided courtesy of Thomas Neumann and obtained from the HyPer optimizer [32].

3.6.3. Organizational Overview

In our empirical analysis, we compare the performance of six top-down join enumerators:

• TDBASICHYP as the TDPLANGENHYP (Section 3.3.1) variant instantiated with naive partitioning PARTITIONnaiveHyp (Section 3.3.2).

• TDMCLHYPnaive as the TDPLANGENHYP variant instantiated with MINCUTLAZY [5] (Appendix A.2). Here, we apply a graph mapping of complex hyperedges to simple hyperedges in which every complex hypernode is represented by its vertex with the smallest index. We reuse Lines 37 to 40 of COMPUTEADJACENCYINFO and filter out false ccps with ISCONNECTEDHYP (Section 3.3.3).

• TDMCBHYPnaive as the TDPLANGENHYP variant instantiated with MINCUTBRANCH (Appendix A.2). We apply COMPUTEADJACENCYINFO (Section 3.5.3) and filter out false ccps with ISCONNECTEDHYP (Section 3.3.3).

• TDMCCHYP as the TDPLANGENHYP (Section 3.3.1) variant instantiated with MINCUTCONSERVATIVEHYP (Section 3.4).

• TDMCBHYP as TDPLANGENHYP instantiated with PARTITIONX (Section 3.5) and MINCUTBRANCH as partitioning algorithm (Section 2.5).

• TDMCCFWHYP as TDPLANGENHYP instantiated with PARTITIONX (Section 3.5) and MINCUTCONSERVATIVE as partitioning algorithm (Section 2.4).
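The smallest-index mapping used by TDMCLHYPnaive, and the kind of check that exposes false ccps, can be illustrated with vertex sets encoded as integer bitmasks. This Python sketch is a stand-in under stated assumptions, not ISCONNECTEDHYP from Section 3.3.3: `HyperConnectivity` is a hypothetical name, and its connectivity test is a naive, exponential reference implementation of the recursive definition (a set is connected if it is a single vertex, or splits into two connected halves joined by a hyperedge).

```python
def lowest_vertex(s: int) -> int:
    """Bitmask holding only the lowest-index vertex of the set s."""
    return s & -s


def to_simple(hyperedges):
    """Represent every hypernode by its vertex with the smallest index."""
    return [(lowest_vertex(u), lowest_vertex(v)) for u, v in hyperedges]


class HyperConnectivity:
    """Reference checks on the original hypergraph (sets as int bitmasks).

    Exponential in |S|; meant for illustration, not for plan generation.
    """

    def __init__(self, hyperedges):
        self.edges = list(hyperedges)
        self.memo = {}

    def connects(self, s1, s2):
        """Some hyperedge has one hypernode inside s1 and the other inside s2."""
        return any(
            ((u & ~s1) == 0 and (v & ~s2) == 0) or ((u & ~s2) == 0 and (v & ~s1) == 0)
            for u, v in self.edges
        )

    def is_connected(self, s):
        if (s & (s - 1)) == 0:  # empty set or a single vertex
            return s != 0
        if s not in self.memo:
            self.memo[s] = self._split_exists(s)
        return self.memo[s]

    def _split_exists(self, s):
        low = lowest_vertex(s)
        rest = s ^ low
        sub = (rest - 1) & rest  # proper subsets of rest, so s2 is never empty
        while True:
            s1 = low | sub
            s2 = s ^ s1
            if self.connects(s1, s2) and self.is_connected(s1) and self.is_connected(s2):
                return True
            if sub == 0:
                return False
            sub = (sub - 1) & rest

    def is_true_ccp(self, s1, s2):
        """Filter for pairs that were enumerated on the simplified graph."""
        return self.is_connected(s1) and self.is_connected(s2) and self.connects(s1, s2)
```

For the single hyperedge ({A}, {B, C}), the mapping yields the simple edge (A, B), so {A, B} appears connected on the simplified graph although it is not connected in the hypergraph; pairs built from such sets are the false ccps that must be filtered out.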

Table 3.3 gives a summarized overview of the six algorithms. To put all top-down plan generators into perspective, we include the results of Moerkotte and Neumann's DPHYP [21] as the state of the art in bottom-up join enumeration via dynamic programming.

We present our results in terms of the quotient of an algorithm's execution time and the execution time of DPHYP, which we refer to as the normed time. Table 3.4 shows the average, minimum, and maximum normed times over the whole workload for non-inner/simple and inner/complex queries.
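For concreteness, the normed time and the summary statistics reported in Table 3.4 can be computed as below. The sketch assumes that "average" means the arithmetic mean of the per-query quotients; the function names are hypothetical.

```python
from statistics import mean


def normed_times(alg_seconds, dphyp_seconds):
    """Per-query normed time t_alg / t_DPhyp; values below 1 mean faster than DPhyp."""
    if len(alg_seconds) != len(dphyp_seconds):
        raise ValueError("need one DPhyp timing per query")
    return [a / d for a, d in zip(alg_seconds, dphyp_seconds)]


def summarize(alg_seconds, dphyp_seconds):
    """(average, minimum, maximum) normed time over a workload."""
    q = normed_times(alg_seconds, dphyp_seconds)
    return mean(q), min(q), max(q)
```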

Since the normed time of DPHYP is always 1, we instead give its elapsed time in seconds. Figure 3.53 displays the runtime results for acyclic/inner/complex queries and Figure 3.54 for acyclic/non-inner/simple queries. We give the number of vertices on the abscissa and the execution time on a log scale on the ordinate. Lines connect the averaged execution times.

Name            Partitioning Strategy                  Remarks
TDBASICHYP      PARTITIONnaiveHyp                      Sec. 3.3.2
TDMCLHYPnaive   COMPUTEADJACENCYINFO + MINCUTLAZY      Sec. 3.5.3, A.2
TDMCBHYPnaive   COMPUTEADJACENCYINFO + MINCUTBRANCH    Sec. 3.5.3, A.2
TDMCCHYP        MINCUTCONSERVATIVEHYP                  Sec. 3.4
TDMCBHYP        PARTITIONX + MINCUTBRANCH              Sec. 3.5.2, 2.5
TDMCCFWHYP      PARTITIONX + MINCUTCONSERVATIVE        Sec. 3.5.2, 2.4

Table 3.3: Names of the different plan generation algorithms and the corresponding partitioning strategies.

For randomly generated cyclic queries, the algorithms' performance varies significantly even for the same number of vertices. Hence, we cannot meaningfully present the results for different numbers of vertices in a single figure. Figures 3.55 and 3.56 present the results for cyclic/inner/complex queries with 10 and 15 vertices, respectively. The results for cyclic/non-inner/simple queries with 10 and 15 vertices are shown in Figures 3.57 and 3.58.

For the experiments with randomly generated queries (Sections 3.6.4 and 3.6.5), we include in our evaluation only those query graphs that all plan generators could process in less than 100 seconds. In the third part of our experiments, where we evaluated the runtime performance on the benchmark queries, we measured the compile time only for query graphs with at most 32 vertices. Here, we applied no time constraint.

The workload of randomly generated queries consists of more than 50,000 query graphs. We generated graphs with up to 20 vertices for non-inner/simple queries and up to 22 vertices for inner/complex queries. Among the cyclic queries, the number of edges is evenly distributed for each number of vertices. In fact, when generating the cyclic queries, we ensured that the minimal number of edges equaled the number of vertices and that the maximal number of edges was at least twice the number of vertices. Every graph had to contain at least one complex hyperedge.
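The edge-count constraints for the cyclic workload can be expressed as a small generation plan. This Python sketch assumes an upper bound of exactly twice the number of vertices (the text only requires at least twice), an equal number of graphs per edge count, and a cap at the simple-graph maximum; the names are hypothetical.

```python
from collections import Counter


def edge_counts(n, graphs_per_count, max_factor=2):
    """Planned edge counts for cyclic graphs with n vertices.

    Every count from n (the minimum) up to max_factor * n receives the same
    number of graphs, capped at n * (n - 1) // 2 for simple graphs.
    """
    upper = min(max_factor * n, n * (n - 1) // 2)
    return [m for m in range(n, upper + 1) for _ in range(graphs_per_count)]


def has_complex_hyperedge(hyperedges):
    """Acceptance test: some hypernode contains more than one relation."""
    return any(len(u) > 1 or len(v) > 1 for u, v in hyperedges)
```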

Our experiments in Sections 3.6.4 to 3.6.6 were conducted on an Intel Pentium D with 3.4 GHz, 2 MB second-level cache, and 3 GB of RAM running openSUSE 12.1. The performance evaluation of Section 3.6.7 was conducted on an Intel Core i7 quad-core with 3.4 (1.6) GHz, 8 MB second-level cache, and 4 GB of RAM running openSUSE 12.1. On both machines, we used the Intel C++ compiler with the optimization option -O3.

In document Algorithms for Efficient Top-Down Join Enumeration (Page 125-128)