Enumerating the query result with constant delay

4. Answering q-hierarchical conjunctive queries under updates

4.6. Enumerating the query result with constant delay

The aim of this chapter is to prove Theorem 3.3(b), i.e. we now discuss how the data structure from the previous chapter can be used to enumerate the query result with constant delay. See Table 4.1 for the 22 result tuples of Example 4.4.

In the next example we show how to enumerate the tuples in Table 4.1 algorithmically, using the data structure for D0 and Q from Example 4.4.

Example 4.13. Let us consider the query Q and the database D0 from Example 4.4.

Algorithm 2 enumerates the tuples in the query result. It traverses the data structure and iterates over fit items only. Since every visited item is fit and every output tuple is built from the assignments of fit items, each output tuple is part of the query result.


y   x1  x2  x3
1   1   4   1
1   1   5   2
1   1   6   3
1   1   6   4
1   2   4   1
1   2   5   2
1   2   6   3
1   2   6   4
1   3   4   1
1   3   5   2
1   3   6   3
1   3   6   4
2   4   2   1
2   4   2   8
2   4   2   4
2   8   2   1
2   8   2   8
2   8   2   4
2   9   2   1
2   9   2   8
2   9   2   4
3   2   1   1

Table 4.1.: Enumeration of Q(D0) from Example 4.4.
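As a quick sanity check on Table 4.1 (a count read off the table itself, not a statement from the text): for each value of y, every x1 value is combined with every (x2, x3) pair, so the number of rows can be written as a sum of products.

```latex
\[
  |Q(D_0)| \;=\; \underbrace{3 \cdot 4}_{y = 1} \;+\; \underbrace{3 \cdot 3}_{y = 2} \;+\; \underbrace{1 \cdot 1}_{y = 3}
  \;=\; 12 + 9 + 1 \;=\; 22 .
\]
```

This product structure is what the nested loops of the enumeration algorithm exploit.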

Algorithm 2 Enumeration algorithm.

1: if $[\emptyset]$ is fit then
2:     for $[{}^{a_0}_{y}]$ in the $y$-list of $[\emptyset]$ do
3:         for $[{}^{a_0\,a_1}_{y\,x_1}]$ in the $x_1$-list of $[{}^{a_0}_{y}]$ do
4:             for $[{}^{a_0\,a_2}_{y\,x_2}]$ in the $x_2$-list of $[{}^{a_0}_{y}]$ do
5:                 for $[{}^{a_0\,a_2\,a_3}_{y\,x_2\,x_3}]$ in the $x_3$-list of $[{}^{a_0\,a_2}_{y\,x_2}]$ do
6:                     Output $(a_0, a_1, a_2, a_3)$
7: Output EOE

For the second output tuple $(1, 1, 5, 2)$, the item in line 4 is set to $[{}^{1\,5}_{y\,x_2}]$ and the item in line 5 is set to $[{}^{1\,5\,2}_{y\,x_2\,x_3}]$. The algorithm continues in this way until all result tuples have been enumerated.
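The nested loops of Algorithm 2 can be sketched in Python as follows. This is a minimal sketch under assumptions: the dictionaries `Y_LIST`, `X1_LIST`, `X2_LIST`, `X3_LIST` and the flag `START_IS_FIT` are hypothetical stand-ins for the lists of the data structure, filled by hand so that they reproduce Table 4.1. They are not the data structure from Example 4.4 itself.

```python
# Hypothetical stand-in for the data structure: plain containers holding the
# values of the fit items, in the list order that Table 4.1 suggests.
Y_LIST = [1, 2, 3]                                   # y-list of the start item
X1_LIST = {1: [1, 2, 3], 2: [4, 8, 9], 3: [2]}       # x1-list of each item [a0 / y]
X2_LIST = {1: [4, 5, 6], 2: [2], 3: [1]}             # x2-list of each item [a0 / y]
X3_LIST = {                                          # x3-list of each item [a0 a2 / y x2]
    (1, 4): [1], (1, 5): [2], (1, 6): [3, 4],
    (2, 2): [1, 8, 4],
    (3, 1): [1],
}
START_IS_FIT = True                                  # assumed: the start item is fit


def enumerate_result():
    """Yield the tuples (a0, a1, a2, a3) in the loop order of Algorithm 2, then 'EOE'."""
    if START_IS_FIT:
        for a0 in Y_LIST:                            # line 2
            for a1 in X1_LIST[a0]:                   # line 3
                for a2 in X2_LIST[a0]:               # line 4
                    for a3 in X3_LIST[(a0, a2)]:     # line 5
                        yield (a0, a1, a2, a3)       # line 6
    yield "EOE"                                      # line 7


rows = list(enumerate_result())
for row in rows:
    print(row)
```

Because the x1-loop encloses the x2-loop, the output is grouped by y and x1, which matches the row order of Table 4.1.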

For the remainder of this chapter we assume that $Q(x_1, \ldots, x_k, b_1, \ldots, b_\ell)$ is a q-hierarchical conjunctive query, $\mathrm{vars}(Q) = \{x_1, \ldots, x_m\}$ with $0 \leq k \leq m$, and $Q$ is of the form

\[
  Q = \{\, (x_1, \ldots, x_k, b_1, \ldots, b_\ell) \;:\; \exists x_{k+1} \cdots \exists x_m\, (\psi_1 \wedge \cdots \wedge \psi_d) \,\}, \tag{4.4}
\]

where $b_1, \ldots, b_\ell \in \mathrm{dom}$ and $\psi_1, \ldots, \psi_d$ are atomic queries of schema $\sigma$.

To enumerate the result of a non-Boolean q-hierarchical conjunctive query $Q(x_1, \ldots, x_k, b_1, \ldots, b_\ell)$, let $T_{\mathrm{free}}$ be the subtree of $T_Q$ induced on $V(T_{\mathrm{free}}) := \mathrm{free}(Q) \cup \{v_{\mathrm{root}}\} = \{x_1, \ldots, x_k, v_{\mathrm{root}}\}$. For each node $v$ of $T_{\mathrm{free}}$, let us fix an (arbitrary) linear order $\leq^v$ on the children of $v$. In our example query we have $T_{\mathrm{free}} = T$ and we let $x_1 <^y x_2$. For the enumeration procedure we will use the notion of $E^i$ (see Definition 4.11) and the decomposition of these sets given in Lemma 4.12.

The main idea of the enumeration routine is the following. If the start item $[\emptyset]$ is not fit, the routine stops immediately with output EOE. Otherwise, we proceed inductively: for every item $i$ we enumerate the set $\tilde{E}^i$ as follows. For all $u \in \mathrm{child}(v^i) \cap \mathrm{free}(Q)$, we iterate over all items $\hat{\iota}_u \in L^i_u$ and over all assignments $\alpha_u \in \tilde{E}^{\hat{\iota}_u}$, and construct the assignment $\alpha^i \cup \bigcup_{u \in \mathrm{child}(v^i) \cap \mathrm{free}(Q)} \alpha_u$. By Lemma 4.12, this enumerates the assignments in $\tilde{E}^i$. Once we can enumerate $\tilde{E}^{[\emptyset]}$, we can easily construct the result tuples in $Q(D)$ by outputting $(\alpha(x_1), \ldots, \alpha(x_k), b_1, \ldots, b_\ell)$ for every $\alpha \in \tilde{E}^{[\emptyset]}$.

The pseudo-code for the described recursive enumeration routine is given in Algorithm 3. The Enum function in Algorithm 3 also takes as input a set that contains one order per node. These orders determine how the result set is grouped in the output, since they fix the nesting order of the for-loops. For example, in Table 4.1 we used the order $x_1 <^y x_2$ for $y$. By construction of the algorithm, this implies that the result set is grouped by the variables $y$ and $x_1$.
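The recursive routine described above can be sketched as a Python generator. This is only an illustration under assumptions: the class `Item`, the helper `build_example`, and the dictionary `order` (mapping each node to its free children in loop order) are hypothetical names, and the example items are filled by hand to match Table 4.1, with a q-tree in which y has the children x1 and x2, and x2 has the child x3.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    var: str                                    # the node v^i of the item
    alpha: dict                                 # the assignment alpha^i
    lists: dict = field(default_factory=dict)   # child u -> list L^i_u of fit items


def enum(item, order):
    """Yield the assignments below item; order[v] lists the free children of v."""
    kids = order.get(item.var, [])
    if not kids:                                # v^i is a leaf of T_free
        yield dict(item.alpha)
        return

    def loops(j, acc):
        # Simulates the nested for-loops for the children kids[j], ..., kids[-1].
        if j == len(kids):
            yield acc
            return
        for child in item.lists[kids[j]]:
            for alpha_j in enum(child, order):
                yield from loops(j + 1, {**acc, **alpha_j})

    yield from loops(0, dict(item.alpha))


def build_example():
    """Hand-built items reproducing Table 4.1 (hypothetical, for illustration)."""
    x1 = {1: [1, 2, 3], 2: [4, 8, 9], 3: [2]}
    x2 = {1: {4: [1], 5: [2], 6: [3, 4]}, 2: {2: [1, 8, 4]}, 3: {1: [1]}}
    root = Item("root", {}, {"y": []})
    for a0 in (1, 2, 3):
        y = Item("y", {"y": a0}, {"x1": [], "x2": []})
        for a1 in x1[a0]:
            y.lists["x1"].append(Item("x1", {"y": a0, "x1": a1}))
        for a2, a3s in x2[a0].items():
            i2 = Item("x2", {"y": a0, "x2": a2}, {"x3": []})
            for a3 in a3s:
                i2.lists["x3"].append(Item("x3", {"y": a0, "x2": a2, "x3": a3}))
            y.lists["x2"].append(i2)
        root.lists["y"].append(y)
    return root


def tuples(order):
    root = build_example()
    return [(a["y"], a["x1"], a["x2"], a["x3"]) for a in enum(root, order)]


by_x1 = tuples({"root": ["y"], "y": ["x1", "x2"], "x2": ["x3"]})   # x1 <^y x2
by_x2 = tuples({"root": ["y"], "y": ["x2", "x1"], "x2": ["x3"]})   # x2 <^y x1
print(by_x1[:3])
print(by_x2[:3])
```

Both calls produce the same set of tuples. With x1 before x2 the output is grouped by y and x1, as in Table 4.1, whereas swapping the order groups it by y, x2 and x3.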

The next lemma establishes the correctness of Algorithm 3 and shows that the tuples are enumerated with delay $O(|\mathrm{vars}(Q)|)$ and without repetition.

Lemma 4.14. Let $Q$ be a q-hierarchical CQ and let $D$ be a $\sigma$-db. For all present and fit items $i$ in the data structure for $Q$ on $D$ the following holds:

(a) The assignments yielded by $\mathrm{Enum}(i, \{<^u\}_{u \in \mathrm{succ}(v^i)})$ in Algorithm 3 are exactly the assignments in $\tilde{E}^i$.

(b) The procedure $\mathrm{Enum}(i, \{<^u\}_{u \in \mathrm{succ}(v^i)})$ in Algorithm 3 takes time $O(|\mathrm{succ}(v^i)|)$

• until the first assignment is yielded,

• between two consecutive yielded assignments, and

• between the last yielded assignment and the end of the procedure.

(c) The procedure $\mathrm{Enum}(i, \{<^u\}_{u \in \mathrm{succ}(v^i)})$ in Algorithm 3 yields no assignment more than once.

Proof. Let TQ be the q-tree of Q that is used to construct the data structure for Q on D and let Tfree be the subtree of TQ induced on free(Q).

Algorithm 3 Enumeration algorithm.

function Enum($i$, $\{<^v\}_{v \in \mathrm{succ}(v^i)}$)
    Input: a present and fit item $i$ in the data structure $D$ and a set that contains, for every $w \in \mathrm{succ}(v^i)$ in $T_{\mathrm{free}}$, an order over the children of $w$.
    if $v^i$ is a leaf in $T_{\mathrm{free}}$ then
        yield $\alpha^i$
    else
        Let $u_1 <^{v^i} \cdots <^{v^i} u_s$ be the children of $v^i$ that belong to $\mathrm{free}(Q)$.
        for $\hat{\iota}_1 \in L^i_{u_1}$ do
            for $\alpha_1 \in$ Enum($\hat{\iota}_1$, $\{<^v\}_{v \in \mathrm{succ}(u_1)}$) do
                ...
                    for $\hat{\iota}_s \in L^i_{u_s}$ do
                        for $\alpha_s \in$ Enum($\hat{\iota}_s$, $\{<^v\}_{v \in \mathrm{succ}(u_s)}$) do
                            yield $\alpha^i \cup \alpha_1 \cup \cdots \cup \alpha_s$

if $[\emptyset]$ is fit then
    for $\alpha \in$ Enum($[\emptyset]$, $\{<^v\}_{v \in V'}$) do
        print $(\alpha(x_1), \ldots, \alpha(x_k), b_1, \ldots, b_\ell)$
print EOE

We show Lemma 4.14 by induction on the height of $v^i$ in $T_{\mathrm{free}}$, which we call the height of the item $i$.

For the induction base let us consider an item $i$ of height 0, i.e., $v^i$ is a leaf in $T_{\mathrm{free}}$. Then we simply yield $\alpha^i$. This is the only assignment in $\tilde{E}^i$, and it takes time $O(1)$ until the assignment is yielded and until the procedure finishes. Clearly, we do not yield duplicates.

For the inductive step let us consider an item $i$ of height $h$. Let $u_1, \ldots, u_s \in \mathrm{free}(Q)$ be the children of $v^i$ in $T_{\mathrm{free}}$. By construction of the algorithm, an assignment $\alpha^i \cup \alpha_1 \cup \cdots \cup \alpha_s$ is yielded if and only if for every $j \in [s]$ there is an item $\hat{\iota}_j \in L^i_{u_j}$ such that $\alpha_j$ is yielded by $\mathrm{Enum}(\hat{\iota}_j)$. Note that the items $\hat{\iota}_j$ are fit for all $j \in [s]$. Therefore, by induction hypothesis, the assignments yielded by $\mathrm{Enum}(\hat{\iota}_j)$ are exactly the assignments in $\tilde{E}^{\hat{\iota}_j}$. Thus, with Lemma 4.12 it follows that the assignments yielded by $\mathrm{Enum}(i, \{<^u\}_{u \in \mathrm{succ}(v^i)})$ are exactly the assignments in $\tilde{E}^i$.

Part (b) follows from the fact that for every for-loop that iterates over the assignments in $\mathrm{Enum}(\hat{\iota}_j, \{<^w\}_{w \in \mathrm{succ}(u_j)})$, it takes, by induction hypothesis, time $O(|\mathrm{succ}(u_j)|)$ to receive the first assignment, to get from one assignment to the next, and to get from the last assignment to the end of the for-loop. Furthermore, we need time $O(1)$ to go to the first or the next element of $L^i_{u_j}$. Therefore, we need time $O\bigl(\sum_{j=1}^{s} |\mathrm{succ}(u_j)|\bigr) = O(|\mathrm{succ}(v^i)|)$ until the first assignment is yielded, between two consecutive yielded assignments, and between the last yielded assignment and the end of the procedure.

To prove part (c), let us assume for a contradiction that an assignment $\alpha$ was yielded twice. We consider the outermost loop that continued with its iteration between the two times $\alpha$ was yielded. If the loop is of the form $\hat{\iota} \in L^i_{u_j}$, then the item $\hat{\iota} \in L^i_{u_j}$ with $\alpha^{\hat{\iota}} = \alpha^i \cup \{u_j \mapsto \alpha(u_j)\}$ was considered twice. This is a contradiction to the construction of the data structure. If the loop is of the form $\alpha_j \in \mathrm{Enum}(\hat{\iota}_j)$, then $\alpha_j$ was considered twice, which contradicts the induction hypothesis.

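Algorithm 3 is correct, since

\[
\begin{aligned}
  Q(D) &= \{\, (\alpha(x_1), \ldots, \alpha(x_k), b_1, \ldots, b_\ell) \;:\; \alpha \in \tilde{E}^{[\emptyset]} \,\} \\
       &\overset{\text{4.14(a)}}{=} \{\, (\alpha(x_1), \ldots, \alpha(x_k), b_1, \ldots, b_\ell) \;:\; \alpha \text{ in } \mathrm{Enum}([\emptyset], \{<^v\}_{v \in V'}) \,\}.
\end{aligned}
\]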

Since we can check in time $O(1)$ (using the Boolean variable start-is-fit in the data structure) whether $[\emptyset]$ is fit, Lemma 4.14(b) implies that we enumerate the tuples with delay $t_d = O(|\mathrm{vars}(Q)|)$. The enumeration is without repetition, which is guaranteed by Lemma 4.14(c).

This concludes the proof of Theorem 3.3(b).

Remark 4.15. We now show that if the lists of the data structure are lexicographically ordered, then the enumeration algorithm $\mathrm{Enum}([\emptyset], \{<^v_s\}_{v \in V'})$, where $x_j <^v_s x_{j'}$ if and only if $j < j'$, for all $v \in V$ and all $x_j, x_{j'} \in \mathrm{child}(v)$, enumerates the tuples in lexicographical order.

Let $(a_1, b) \in Q(D)$ and $(a_2, b) \in Q(D)$, where $b = (b_1, \ldots, b_\ell)$, be two tuples such that $a_1$ is enumerated before $a_2$. Let $j \in [k]$ be the smallest index with $(a_1)_j \neq (a_2)_j$. By definition of $\{<^v_s\}_{v \in V'}$, the outermost loop that continued with its iteration between the enumeration of $a_1$ and the enumeration of $a_2$ is a for-loop of the form $\hat{\iota} \in L^i_{x_j}$. Since these lists are lexicographically ordered, it follows that $(a_1)_j < (a_2)_j$ and thus $a_1 < a_2$.
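The effect described in the remark can be observed on the running example. The sketch below is an assumption-laden illustration: the lists are simulated by hypothetical lookup functions in `lists`, the q-tree is given by the hypothetical dictionary `tree` (children listed in index order), and the variable y is treated as the first output component. The x3-list for y = 2 is stored in ascending order here, unlike the order 1, 8, 4 visible in Table 4.1.

```python
def enum(node, tree, lists, alpha):
    """Yield all extensions of alpha over the subtree of node, children in tree order."""
    def loops(kids, acc):
        if not kids:
            yield acc
            return
        u, rest = kids[0], kids[1:]
        for value in lists[u](acc):                      # iterate the (sorted) u-list
            for sub in enum(u, tree, lists, {**acc, u: value}):
                yield from loops(rest, sub)
    yield from loops(tree.get(node, []), alpha)


# Hypothetical sorted lists reproducing the tuples of Table 4.1.
X1 = {1: [1, 2, 3], 2: [4, 8, 9], 3: [2]}
X3 = {(1, 4): [1], (1, 5): [2], (1, 6): [3, 4], (2, 2): [1, 4, 8], (3, 1): [1]}

lists = {
    "y": lambda a: [1, 2, 3],
    "x1": lambda a: X1[a["y"]],
    "x2": lambda a: sorted(x2 for (y, x2) in X3 if y == a["y"]),
    "x3": lambda a: X3[(a["y"], a["x2"])],
}
tree = {"root": ["y"], "y": ["x1", "x2"], "x2": ["x3"]}  # children in index order

out = [(a["y"], a["x1"], a["x2"], a["x3"]) for a in enum("root", tree, lists, {})]
print(out == sorted(out))
```

With the unsorted x3-list from Table 4.1, the tuple (2, 4, 2, 8) would appear before (2, 4, 2, 4), so the output would no longer be in lexicographical order.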

In document Answering Conjunctive Queries and FO+MOD Queries under Updates (Page 52-56)