A query pattern may have multiple valid execution plans, and different execution plans may perform differently, so the challenge is to find the most efficient one among them. In this section, we present two techniques to tackle this challenge.
4.4.1 Minimising Number of Rounds
Given a query pattern P and an execution plan PL, recall from Algorithm 11 that we have |PL| rounds for each region group, and that the workload can be shared within each round. Specifically, a single undetermined edge e may be shared by multiple embedding candidates. If those candidates are generated in the same round, the cost of network communication and of verifying e can be shared among them. The same principle applies to foreign vertices, whose fetching cost and memory space can be shared among multiple embedding candidates if they occur in the same round. Therefore, to optimize performance, our first heuristic is to minimize the total number of rounds. Here we present a technique to compute query execution plans that guarantees a minimum number of rounds.
Let us review the concept of connected dominating set [20].
Definition 15. Given a query pattern P, a connected dominating set DS is a subset of the vertex set of P such that any two vertices in DS are reachable from each other through vertices in DS, and every vertex of P is either in DS or adjacent to a vertex in DS.
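Definition 15 can be checked mechanically, and trying vertex subsets in increasing size yields the connected dominating sets of smallest cardinality. The following is a minimal brute-force sketch, assuming the pattern is given as an adjacency dictionary; the function names and the four-vertex path used as input are hypothetical and are not taken from the text.

```python
from itertools import combinations


def is_connected_dominating_set(adj, ds):
    """Check Definition 15 for a vertex subset ds of a graph {vertex: set of neighbours}."""
    ds = set(ds)
    if not ds:
        return False
    # Connectivity: traverse the subgraph induced by ds from an arbitrary member.
    start = next(iter(ds))
    seen, stack = {start}, [start]
    while stack:
        v = stack.pop()
        for w in adj[v] & ds:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    if seen != ds:
        return False
    # Domination: every vertex is in ds or adjacent to a member of ds.
    return all(v in ds or adj[v] & ds for v in adj)


def minimum_cds(adj):
    """Return all connected dominating sets of smallest cardinality (exponential brute force)."""
    vertices = sorted(adj)
    for k in range(1, len(vertices) + 1):
        found = [set(c) for c in combinations(vertices, k)
                 if is_connected_dominating_set(adj, c)]
        if found:
            return found
    return []


# Hypothetical pattern: the path 0 - 1 - 2 - 3.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(minimum_cds(path))  # [{1, 2}]
```

Brute force is only practical because query patterns are small; it is used here for clarity, not as the method the text relies on.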
A minimum connected dominating set MDS is one with the smallest cardinality among all connected dominating sets. A query pattern may have multiple MDSs, all of which have the same cardinality. We have the following theorem.
Theorem 3. Given a DS of query pattern P, there exists a valid execution plan PL where
Proof. To prove Theorem 3, we construct a PL in which the pivot $u_{pivot}$ of each unit is a vertex in DS. Following the increasing order of vertex IDs, for each vertex $u \in DS$ we create a unit $dp_i$ and set $dp_i.u_{pivot} = u$. For any $u' \in (Adj(u) \cap DS)$, if $u'$ has neither been used as a pivot nor been added to the leaf vertices of any other unit yet, we add $u'$ to $dp_i.V_{leaf}$. After all units are created, for any $u' \in (V_P - DS)$, we add $u'$ to the $dp_i.V_{leaf}$ whose $dp_i.u_{pivot}$ has the largest degree among $Adj(u') \cap DS$. Finally, arranging those units into PL in ascending order of the IDs of their $u_{pivot}$, it is easy to see that PL is a valid execution plan.
Theorem 4. Given a PL of a query pattern P, $\bigcup_{dp_i \in PL} \{dp_i.u_{pivot}\}$ is a connected dominating set of P.
Proof. Theorem 4 follows directly: every leaf vertex is connected to a pivot in PL, and the pivot vertices of PL form a connected subset of P by the definition of an execution plan.
Based on Theorem 3 and Theorem 4, we have the following corollary.
Corollary 2. Given an MDS of a query pattern P, |MDS| is the minimum number of rounds of any valid execution plan that P may have.
By Theorem 3, Theorem 4 and Corollary 2, we can generate at least one execution plan PL from an MDS of a query pattern, and this PL is guaranteed to have the minimum number of rounds.
Example 20. Consider the query pattern in Figure 4.2(a). We can obtain a minimum connected dominating set $MDS = \{u_0, u_1, u_2\}$, based on which we have a minimum-round execution plan $PL = \{dp_0, dp_1, dp_2\}$, where $dp_0.u_{pivot} = u_0$, $dp_0.V_{leaf} = \{u_1, u_2, u_7, u_8, u_9\}$; $dp_1.u_{pivot} = u_1$, $dp_1.V_{leaf} = \{u_3, u_4\}$; and $dp_2.u_{pivot} = u_2$, $dp_2.V_{leaf} = \{u_5, u_6\}$.
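The construction used in the proof of Theorem 3 can be sketched in a few lines. This is an illustration under stated assumptions rather than the implementation used in this work: units are represented as dictionaries, ties on pivot degree are broken by the smaller vertex ID, and the input is a hypothetical tree whose edges were chosen only to be consistent with the plan of Example 20 (the actual pattern in Figure 4.2(a) may contain further edges).

```python
def plan_from_cds(adj, ds):
    """Build a list of units {pivot, leaves} from a connected dominating set ds."""
    ds = set(ds)
    units = []
    used = set()  # vertices already serving as a pivot or a leaf
    for u in sorted(ds):  # increasing order of vertex ID
        used.add(u)
        # DS neighbours that are not yet a pivot or a leaf become leaves of this unit.
        leaves = {w for w in adj[u] & ds if w not in used}
        used |= leaves
        units.append({"pivot": u, "leaves": leaves})

    by_pivot = {unit["pivot"]: unit for unit in units}
    for v in sorted(set(adj) - ds):
        # Attach v to the adjacent pivot of largest degree.
        # Assumption: ties go to the smaller ID (max returns the first maximum).
        best = max(sorted(adj[v] & ds), key=lambda p: len(adj[p]))
        by_pivot[best]["leaves"].add(v)
    return units  # already in ascending order of pivot ID


# Hypothetical edge list, consistent with the plan in Example 20.
edges = [(0, 1), (0, 2), (0, 7), (0, 8), (0, 9), (1, 3), (1, 4), (2, 5), (2, 6)]
adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

plan = plan_from_cds(adj, {0, 1, 2})
for unit in plan:
    print(unit["pivot"], sorted(unit["leaves"]))
```

On this input the three units have pivots 0, 1 and 2 with leaf sets {1, 2, 7, 8, 9}, {3, 4} and {5, 6}, matching the plan of Example 20. The function raises `ValueError` if `ds` does not dominate the graph, since some vertex then has no adjacent pivot.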
4.4.2 Moving Forward Verification Edges
Given a query pattern P, we may have multiple execution plans with the minimum number of rounds. To further optimize the performance of our approach, we need to find the best-performing plan among them. The question is then: given two execution plans with the same number of rounds, how can we determine which one is more efficient? To answer this question, we propose the following scoring function $SC(PL)$ for a given plan $PL = \{dp_0, \ldots, dp_l\}$:
$$SC(PL) = \sum_{dp_i \in PL} \frac{\rho}{(i + 1)} \times \left( |E^{sib}_{dp_i}| + |E^{anc}_{dp_i}| \right) \tag{4.2}$$
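To make the effect of the position weight concrete, the sketch below evaluates the score for two hypothetical plans. It assumes that Equation 4.2 weights unit $dp_i$ by $\rho / (i + 1)$ and that the sizes of the two edge sets of each unit are already known; the function name `plan_score`, the default `rho = 1.0` and the edge counts are illustrative assumptions, not values from the text.

```python
def plan_score(edge_counts, rho=1.0):
    """Score a plan from per-unit edge counts.

    edge_counts[i] is a pair (|E_sib|, |E_anc|) for unit dp_i, in plan order.
    Assumes the weight of unit dp_i is rho / (i + 1).
    """
    return sum(rho / (i + 1) * (sib + anc)
               for i, (sib, anc) in enumerate(edge_counts))


# Two hypothetical three-round plans with the same edges in total,
# placed in early units (plan_a) or in late units (plan_b).
plan_a = [(2, 1), (0, 0), (0, 1)]
plan_b = [(0, 0), (0, 1), (2, 1)]

print(plan_score(plan_a) > plan_score(plan_b))  # True
```

Under this reading, a plan scores higher when the same edges are attached to earlier units, because the weight shrinks as the unit index grows.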
Algorithm 12: COMPUTEEXECUTIONPLAN
Input: Query pattern P
Output: The execution plan PL = {dp_0, ..., dp_l}
1 MDSs ← minimumCDS(P)
2 PL_max ← ∅, PL ← ∅
3 for each MDS ∈ MDSs do
4     for each u ∈ MDS do
5         create a unit dp
6         dp.u_pivot ← u
7         NR′ ← Adj(u) ∩ MDS
8         dp.V_leaf ← NR′
9         add dp to PL
10        nextRoundUnit(NR′)
11        remove dp from PL
Subroutine nextRoundUnit(NR)
1 if NR = ∅ then
2     for each u ∈ (V_P − MDS) do
3         PL_sub ← {dp ∈ PL : dp.u_pivot ∈ (Adj(u) ∩ MDS)}
4         dp ← earliestUnit(PL_sub)
5         add u to dp.V_leaf
6     if SC(PL) > SC(PL_max) or PL_max = ∅ then
7         PL_max ← PL
8 for each u ∈ NR do
9     NR′ ← {NR − u}
10    create a unit dp
11    dp.u_pivot ← u
12    for each u′ ∈ Adj(u) such that u′ is neither a pivot nor a leaf in PL do
13        if u′ ∈ MDS then
14            add u′ to dp.V_leaf and to NR′
15    add dp to PL
16    nextRoundUnit(NR′)
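The enumeration in Algorithm 12 can be sketched as runnable code under several assumptions that the listing does not state. The graph is an adjacency dictionary, the candidate sets are supplied by the caller, and the score is passed in as a callable because the edge sets used by the scoring function are defined outside this section. Each unit is removed again after its recursive call so that sibling branches start from the same partial plan, and the partial plan is copied before non-MDS leaves are attached. The names `compute_execution_plan`, `finish`, `next_round` and `leaf_weighted` are hypothetical, and `leaf_weighted` is only a stand-in score, not the scoring function of this section.

```python
def compute_execution_plan(adj, mdss, score):
    """Return the highest-scoring plan found; adj maps vertex -> set of neighbours."""
    best = {"plan": None, "score": None}

    def finish(plan, mds):
        # Copy the partial plan so that backtracking is not affected.
        full = [{"pivot": d["pivot"], "leaves": set(d["leaves"])} for d in plan]
        # Attach each non-MDS vertex to the earliest unit with an adjacent pivot.
        for v in sorted(set(adj) - mds):
            unit = next(d for d in full if d["pivot"] in adj[v])
            unit["leaves"].add(v)
        s = score(full)
        if best["plan"] is None or s > best["score"]:
            best["plan"], best["score"] = full, s

    def next_round(nr, plan, mds):
        if not nr:  # every MDS vertex is a pivot: complete and score the plan
            finish(plan, mds)
            return
        placed = {d["pivot"] for d in plan} | {w for d in plan for w in d["leaves"]}
        for u in sorted(nr):
            # MDS neighbours of u that are not yet a pivot or a leaf.
            new_leaves = (adj[u] & mds) - placed
            plan.append({"pivot": u, "leaves": new_leaves})
            next_round((nr - {u}) | new_leaves, plan, mds)
            plan.pop()  # assumed backtracking step

    for mds in mdss:
        mds = set(mds)
        for u in sorted(mds):  # try every MDS vertex as the first pivot
            first = {"pivot": u, "leaves": adj[u] & mds}
            next_round(set(first["leaves"]), [first], mds)
    return best["plan"]


def leaf_weighted(plan):
    """Stand-in score (illustrative only): rewards leaves placed in early units."""
    return sum(len(d["leaves"]) / (i + 1) for i, d in enumerate(plan))


# Hypothetical tree pattern and a single candidate set {0, 1, 2}.
edges = [(0, 1), (0, 2), (0, 7), (0, 8), (0, 9), (1, 3), (1, 4), (2, 5), (2, 6)]
adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

plan = compute_execution_plan(adj, [{0, 1, 2}], leaf_weighted)
print([d["pivot"] for d in plan])  # [0, 1, 2]
```

With the stand-in score, the plan that starts from vertex 0 wins because it places five leaves in the first unit. Ties keep the plan found first, mirroring the strict comparison in the subroutine.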