

6.6. The data structure for queries with aggregates

First, we give an example of how to construct a data structure that maintains the query $Q_{\text{count}}$ on the database given in Example 4.4. Recall the query $Q_{\text{count}}$:

$$Q_{\text{count}} := \{(y, \mathrm{count}(x_1), \max(\mathrm{prod}(x_2, \mathrm{sum}(x_3)))) : E y x_1 \wedge F y x_2 x_3 \wedge G y x_2 x_3\}.$$

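For every present item $i$ in the data structure with $v_i = x_2$, we store two values, $w^{\text{list}}_{\mathrm{sum}(x_3)}(i)$ and $w^{\text{item}}_{\mathrm{prod}(x_2,\mathrm{sum}(x_3))}(i)$, where $w^{\text{item}}_{\mathrm{prod}(x_2,\mathrm{sum}(x_3))}(i)$ stores the product of the sum and the constant $a_i$ of the item. Furthermore, for every item $i$ with $v_i = y$ we store the values $w^{\text{list}}_{\mathrm{count}(x_1)}(i)$ and $w^{\text{list}}_{\max(\mathrm{prod}(x_2,\mathrm{sum}(x_3)))}(i)$.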

See Figure 6.1 for an illustration of the data structure together with the values.

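Every red (blue) number on an item $i$ with $v_i = x_2$ shows the value $w^{\text{list}}_{\mathrm{sum}(x_3)}(i)$ ($w^{\text{item}}_{\mathrm{prod}(x_2,\mathrm{sum}(x_3))}(i)$); the red (blue) number above an item $i$ with $v_i = y$ shows the value $w^{\text{list}}_{\mathrm{count}(x_1)}(i)$ ($w^{\text{list}}_{\max(\mathrm{prod}(x_2,\mathrm{sum}(x_3)))}(i)$).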
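To make the stored values concrete, here is a minimal Python sketch for $Q_{\text{count}}$. It assumes that each $u$-list contains exactly the fit $u$-children of an item, and it uses hypothetical toy constants, not the database of Example 4.4. The names `Item`, `annotate_x2` and `annotate_y` are illustrative only, and taking $-\infty$ as the maximum of an empty list is our assumption.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Item:
    """Hypothetical item: variable v_i, constant a_i, and its u-lists (fit children only)."""
    var: str
    const: int
    lists: dict[str, list[Item]] = field(default_factory=dict)
    w: dict[str, float] = field(default_factory=dict)


def annotate_x2(item: Item) -> None:
    # w^list_{sum(x3)}(i): sum of the constants in the x3-list of i
    s = sum(c.const for c in item.lists.get("x3", []))
    item.w["sum(x3)"] = s
    # w^item_{prod(x2,sum(x3))}(i): product of a_i and the stored sum
    item.w["prod(x2,sum(x3))"] = item.const * s


def annotate_y(item: Item) -> None:
    for child in item.lists.get("x2", []):
        annotate_x2(child)
    # w^list_{count(x1)}(i): number of items in the x1-list
    item.w["count(x1)"] = len(item.lists.get("x1", []))
    # w^list_{max(prod(x2,sum(x3)))}(i): maximum of the item values in the x2-list
    values = [c.w["prod(x2,sum(x3))"] for c in item.lists.get("x2", [])]
    item.w["max(prod(x2,sum(x3)))"] = max(values, default=float("-inf"))


# Hypothetical toy data (not Example 4.4)
x2a = Item("x2", 2, {"x3": [Item("x3", 4), Item("x3", 3)]})
x2b = Item("x2", 5, {"x3": [Item("x3", 1)]})
y = Item("y", 3, {"x1": [Item("x1", 2), Item("x1", 5)], "x2": [x2a, x2b]})
annotate_y(y)
print(y.w)  # {'count(x1)': 2, 'max(prod(x2,sum(x3)))': 14}
```

The values are filled bottom-up: the $x_2$-items first, then the $y$-item that aggregates over them.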


[Figure 6.1: Illustration of Example 4.4 with aggregates.]

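Now suppose that $(4, 1)$ is inserted into $E$. As we know from our example, the item $\left[{}^{y}_{4}\right]$ becomes fit, and we insert $\left[{}^{y,\,x_1}_{4,\,1}\right]$ into the $x_1$-list of $\left[{}^{y}_{4}\right]$. Therefore, we only have to update the value $w^{\text{list}}_{\mathrm{count}(x_1)}(i)$ of $i = \left[{}^{y}_{4}\right]$ to 1. See Figure 6.2 for an illustration.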
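Invertible aggregates such as count and sum can be updated in constant time when a single item enters or leaves a list, which is why the insertion above needs only one cheap update. A minimal sketch under that assumption; the class name `CountSum` is illustrative and not part of the construction.

```python
class CountSum:
    """Count and sum of a multiset of numbers; each insert/delete costs O(1)."""

    def __init__(self) -> None:
        self.count = 0  # e.g. w^list_{count(x1)}
        self.total = 0  # e.g. w^list_{sum(x3)}

    def add(self, value: int) -> None:
        self.count += 1
        self.total += value

    def remove(self, value: int) -> None:
        # assumes `value` is currently in the multiset
        self.count -= 1
        self.total -= value


x1_list = CountSum()  # x1-list of the item with v = y and a = 4: empty before the insertion
x1_list.add(1)        # the new fit item with constant 1 enters the x1-list
print(x1_list.count)  # 1
```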

In contrast to the running example in Section 4.1, we now suppose that we delete $(1, 6, 4)$ from $F$. As a consequence, the item $\left[{}^{y,\,x_2,\,x_3}_{1,\,6,\,4}\right]$ loses its fit status. Therefore, we have to update the values stored at its ancestor items; this can be done in $O(\log n)$, since the aggregation time for max is at most $O(\log n)$, where $n := |\mathrm{adom}(D)|$.
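Unlike count and sum, max cannot be updated by simply "subtracting" a removed element. As an illustration of a logarithmic aggregation time for max, the sketch below uses a binary heap with lazy deletion (Python's `heapq` plus a `Counter` of pending deletions). This is our own choice of structure, not one prescribed here; it gives amortised logarithmic time per operation, whereas a balanced search tree would give worst-case $O(\log n)$.

```python
import heapq
from collections import Counter


class MaxMultiset:
    """Multiset with insert, delete and max in amortised O(log m), m = elements pushed."""

    def __init__(self) -> None:
        self._heap: list = []               # max-heap via negated values
        self._pending: Counter = Counter()  # values deleted but still in the heap

    def add(self, value) -> None:
        heapq.heappush(self._heap, -value)

    def remove(self, value) -> None:
        # lazy deletion; assumes `value` is currently in the multiset
        self._pending[value] += 1

    def max(self, default=float("-inf")):
        # drop heap tops that were deleted lazily
        while self._heap and self._pending[-self._heap[0]] > 0:
            self._pending[-self._heap[0]] -= 1
            heapq.heappop(self._heap)
        return -self._heap[0] if self._heap else default


m = MaxMultiset()
for v in (14, 5, 9):
    m.add(v)
m.remove(14)
print(m.max())  # 9
```

Each pushed value is popped at most once, which is where the amortised bound comes from.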

See Figure 6.3 for an illustration after the update step.

Let us now go into the technical details for arbitrary q-hierarchical queries with aggregates and arbitrary databases. Let $Q = \{(x_1, \dots, x_s, \mathrm{expr}_1, \dots, \mathrm{expr}_r) : \varphi\}$ be the input query.

Let $T_Q$ be a q-tree of $Q$ and let $T_{\text{free}}$ be the subtree of $T_Q$ induced on $\mathrm{free}(Q)$. Let $D$ be a σ-db. For every $v \in V(T_Q)$, let $\mathrm{AE}(v)$ be the set of subexpressions $\mathrm{expr}$ that appear in one of the expressions $\mathrm{expr}_t$ for $t \in [r]$ such that $\mathrm{expr}$ is an expression with the variable $v$.

To maintain queries with aggregates, we have to ensure that the following condition holds.

[Figure 6.2: Illustration of Example 4.4 with aggregates after (4, 1) was inserted into E.]

[Figure 6.3: Illustration of Example 4.4 with aggregates after (4, 1) was inserted into E and (1, 6, 4) was deleted from F.]

Condition 6.14. For all present items $i$ in the data structure for $Q$ on $D$ such that $v_i$ is not a leaf in $T_{\text{free}}$, we store, for every $u \in \mathrm{child}(v_i)$ and every $\mathrm{expr} \in \mathrm{AE}(u)$, a value $w^{\text{list}}_{\mathrm{expr}}(i)$ and, if $\mathrm{expr}$ is of the form $G(\mathrm{expr}')$, values $w^{\text{item}}_{\mathrm{expr}'}(\iota)$ for the items $\iota \in L^i_u$, such that the following is satisfied.

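• If $\mathrm{expr}$ is of the form $F(u)$, then $w^{\text{list}}_{F(u)}(i) = f_n\,\{\!\{\, a_{\hat\iota} : \hat\iota \in L^i_u \,\}\!\}$,

• and if $\mathrm{expr}$ is of the form $G(\mathrm{expr}')$, then $w^{\text{list}}_{G(\mathrm{expr}')}(i) = g_n\,\{\!\{\, w^{\text{item}}_{\mathrm{expr}'}(\iota) : \iota \in L^i_u \,\}\!\}$,

whereas if $\mathrm{expr}'$ is of the form $F(\mathrm{expr}_1, \dots, \mathrm{expr}_\ell)$, then $w^{\text{item}}_{\mathrm{expr}'}(\iota) = f_\ell\,\{\!\{\, w^{\text{list}}_{\mathrm{expr}_j}(\iota) : j \in [\ell] \,\}\!\}$; if $\mathrm{expr}'$ is of the form $F(u, \mathrm{expr}_1, \dots, \mathrm{expr}_\ell)$, then
$$w^{\text{item}}_{\mathrm{expr}'}(\iota) = f_{\ell+1}\big(\{\!\{ a_\iota \}\!\} \cup \{\!\{\, w^{\text{list}}_{\mathrm{expr}_j}(\iota) : j \in [\ell] \,\}\!\}\big);$$
and if $\mathrm{expr}'$ is of the form $\langle u, \mathrm{expr}_1, \dots, \mathrm{expr}_\ell \rangle$, then
$$w^{\text{item}}_{\mathrm{expr}'}(\iota) = \langle a_\iota, w^{\text{list}}_{\mathrm{expr}_1}(\iota), \dots, w^{\text{list}}_{\mathrm{expr}_\ell}(\iota) \rangle.$$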
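The equations of Condition 6.14 can be read as a recursive evaluation. The Python sketch below is a minimal, hypothetical encoding: aggregates are Python functions on lists (standing in for multisets), each $u$-list holds exactly the fit $u$-children, and `Nested` stores the variable $u$ explicitly because the form $G(F(\mathrm{expr}_1, \dots, \mathrm{expr}_\ell))$ does not mention it. All class and function names are illustrative. On toy data it evaluates $\max(\mathrm{prod}(x_2, \mathrm{sum}(x_3)))$ from $Q_{\text{count}}$ together with a hypothetical tuple expression $\max(\langle x_2, \mathrm{sum}(x_3)\rangle)$, where we assume tuples are compared lexicographically.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field
from typing import Callable

Agg = Callable[[list], object]  # an aggregate applied to a multiset, given as a list


@dataclass(frozen=True)
class Leaf:
    """F(u): aggregate f over the constants of the items in the u-list."""
    f: Agg
    u: str


@dataclass(frozen=True)
class Apply:
    """F(expr_1, ..., expr_l); with with_const=True it is F(u, expr_1, ..., expr_l)."""
    f: Agg
    args: tuple
    with_const: bool = False


@dataclass(frozen=True)
class Tup:
    """<u, expr_1, ..., expr_l>."""
    args: tuple


@dataclass(frozen=True)
class Nested:
    """G(inner), where G ranges over the u-list (u is stored explicitly)."""
    g: Agg
    u: str
    inner: Apply | Tup


@dataclass(eq=False)
class Item:
    const: object
    lists: dict = field(default_factory=dict)   # u -> fit items in the u-list
    w_list: dict = field(default_factory=dict)  # expr -> w^list_expr(i)
    w_item: dict = field(default_factory=dict)  # inner -> w^item_inner(i)


def item_value(inner: Apply | Tup, it: Item) -> object:
    """w^item_inner(iota), built from a_iota and the w^list values stored at iota."""
    vals = [it.w_list[e] for e in inner.args]
    if isinstance(inner, Tup):
        return (it.const, *vals)
    return inner.f([it.const, *vals] if inner.with_const else vals)


def list_value(expr: Leaf | Nested, it: Item) -> object:
    """w^list_expr(i) according to the equations above."""
    children = it.lists.get(expr.u, [])
    if isinstance(expr, Leaf):
        return expr.f([c.const for c in children])
    for c in children:
        c.w_item[expr.inner] = item_value(expr.inner, c)
    return expr.g([c.w_item[expr.inner] for c in children])


def annotate(it: Item, exprs_by_var: dict) -> None:
    """Bottom-up: first the children, then the values stored at `it`."""
    for children in it.lists.values():
        for c in children:
            annotate(c, exprs_by_var)
    for u in it.lists:
        for expr in exprs_by_var.get(u, []):
            it.w_list[expr] = list_value(expr, it)


def max_or_neg_inf(xs: list) -> object:
    return max(xs, default=-math.inf)  # assumption for the empty multiset


sum_x3 = Leaf(sum, "x3")
prod_e = Apply(math.prod, (sum_x3,), with_const=True)     # prod(x2, sum(x3))
max_prod = Nested(max_or_neg_inf, "x2", prod_e)           # max(prod(x2, sum(x3)))
max_pair = Nested(max_or_neg_inf, "x2", Tup((sum_x3,)))   # hypothetical max(<x2, sum(x3)>)
count_x1 = Leaf(len, "x1")
exprs_by_var = {"x1": [count_x1], "x2": [max_prod, max_pair], "x3": [sum_x3]}

x2a = Item(2, {"x3": [Item(4), Item(3)]})
x2b = Item(5, {"x3": [Item(1)]})
y = Item(3, {"x1": [Item(2), Item(5)], "x2": [x2a, x2b]})
annotate(y, exprs_by_var)
print(y.w_list[count_x1], y.w_list[max_prod], y.w_list[max_pair])  # 2 14 (5, 1)
```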

The main idea is that these values are partial solutions of the aggregate values in the sense that $w^{\text{list}}_{\mathrm{expr}}(i) = [\![\mathrm{expr}]\!]^{(D,\alpha_i)}$. These values can be computed by a bottom-up algorithm. Such a value for $i$ has to be changed only if the value of an item in one of the $u$-lists of $i$ was changed, or if the fit status of $i$ changes. Therefore, whenever the fit status of an item changes, we update all of its partial solutions $w^{\text{list}}_{\mathrm{expr}}(i)$ and afterwards recompute the partial solutions of its parent item. Starting from the item whose values were changed last, we repeat this for the corresponding parent item until we reach a parent item that has no aggregate or the start item.
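The repair strategy can be sketched as follows. This is a hypothetical illustration, not a reconstruction of Algorithm 9: each item stores a single partial solution `w`, computed by an item-specific `combine` function from $a_i$ and the values of its fit children. For clarity, it recomputes each ancestor from scratch instead of updating multisets incrementally, and it always walks up to the start item instead of stopping at the first item without aggregates.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass(eq=False)  # items are compared by identity
class Node:
    """Hypothetical item: constant a_i, parent item, fit children, one partial solution w."""
    const: float
    combine: Callable[[float, list], float]  # computes w(i) from a_i and the children's w
    parent: Optional[Node] = None
    fit_children: list = field(default_factory=list)
    w: float = 0

    def recompute(self) -> None:
        self.w = self.combine(self.const, [c.w for c in self.fit_children])


def set_fit(item: Node, fit: bool) -> int:
    """Change the fit status of `item` and repair the partial solutions on the path
    to the start item; returns the number of ancestors that were recomputed."""
    item.recompute()
    node = item.parent
    if node is None:
        return 0
    if fit:
        node.fit_children.append(item)
    else:
        node.fit_children.remove(item)
    touched = 0
    while node is not None:  # walk up to the start item
        node.recompute()
        touched += 1
        node = node.parent
    return touched


def leaf(a, ws):  # x3-items: w = a
    return a


def times_sum(a, ws):  # x2-items: a * (sum over the fit x3-children)
    return a * sum(ws)


def best(a, ws):  # y-item: max over the fit x2-children
    return max(ws, default=float("-inf"))


root = Node(3, best)
x2a, x2b = Node(2, times_sum, root), Node(5, times_sum, root)
x3a, x3b, x3c = Node(4, leaf, x2a), Node(3, leaf, x2a), Node(1, leaf, x2b)
for n in (x3a, x3b, x3c, x2a, x2b):
    set_fit(n, True)
print(root.w)                       # 14
print(set_fit(x3a, False), root.w)  # 2 6
```

Only the items on the path from the changed item to the root are touched, which is what keeps the update cost proportional to the depth of the q-tree.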

In the following lemma, we show that the values stored in $w^{\text{list}}_{\mathrm{expr}}(i)$ are the partial solutions.

Lemma 6.15. Let $Q$ be a q-hierarchical CQ with aggregates, let $D$ be a σ-db, and let $T_{\text{free}}$ be the subgraph of the q-tree of $Q$ induced on $\mathrm{free}(Q)$. For every fit item $i$ of height ⩾ 1 in $T_{\text{free}}$ in the data structure for $Q$ on $D$ (that corresponds to $T_Q$), every $u \in \mathrm{succ}(v_i)$ and every $\mathrm{expr} \in \mathrm{AE}(u)$, the following holds: $w^{\text{list}}_{\mathrm{expr}}(i) = [\![\mathrm{expr}]\!]^{(D,\alpha_i)}$.

Proof. We prove the lemma by induction on the height of an item in $T_{\text{free}}$. For the induction base, let $i$ be a fit item of height 1 in $T_{\text{free}}$. Then, for every $u \in \mathrm{child}(v_i)$, every expression $\mathrm{expr} \in \mathrm{AE}(u)$ is of the form $F(u)$. Then,

$$w^{\text{list}}_{F(u)}(i) = f_n\,\{\!\{\, a_{\hat\iota} : \hat\iota \in L^i_u \,\}\!\} = [\![\mathrm{expr}]\!]^{(D,\alpha_i)}.$$

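For the inductive step, consider an item $i$ of height $h > 1$, let $u \in \mathrm{child}(v_i)$ be arbitrary, and let $\mathrm{expr}$ be an arbitrary expression in $\mathrm{AE}(u)$. If $\mathrm{expr}$ is of the form $F(u)$, the claim follows by the same argument as in the induction base. If $\mathrm{expr}$ is of the form $G(F(\mathrm{expr}_1, \dots, \mathrm{expr}_s))$, where $s := |\mathrm{child}(u)|$, then
$$\begin{aligned} w^{\text{list}}_{\mathrm{expr}}(i) &= g_n\,\{\!\{\, w^{\text{item}}_{F(\mathrm{expr}_1,\dots,\mathrm{expr}_s)}(\iota) : \iota \in L^i_u \,\}\!\} \\ &= g_n\,\{\!\{\, f_s\,\{\!\{\, w^{\text{list}}_{\mathrm{expr}_j}(\iota) : j \in [s] \,\}\!\} : \iota \in L^i_u \,\}\!\} \\ &\overset{\text{(IH)}}{=} g_n\,\{\!\{\, f_s\,\{\!\{\, [\![\mathrm{expr}_j]\!]^{(D,\alpha_\iota)} : j \in [s] \,\}\!\} : \iota \in L^i_u \,\}\!\} \\ &= [\![\mathrm{expr}]\!]^{(D,\alpha_i)}. \end{aligned}$$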

Note that the equation marked with (IH) follows from the induction hypothesis. The cases where $\mathrm{expr}$ is of the form $G(F(u, \mathrm{expr}_1, \dots, \mathrm{expr}_s))$ or $G(\langle u, \mathrm{expr}_1, \dots, \mathrm{expr}_s \rangle)$ can be shown analogously.

To update the data structure, we proceed as follows. Every time we create a new item $i$ during the update procedure, we set, for all $u \in \mathrm{succ}(v_i)$ and all $\mathrm{expr} \in \mathrm{AE}(u)$, the value $w^{\text{list}}_{\mathrm{expr}}(i)$ to $f_0$ if $\mathrm{expr}$ has the form $F(u)$, and to $g_0$ if $\mathrm{expr}$ has the form $G(\mathrm{expr}')$.
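For a freshly created item, $f_0$ and $g_0$ are the values of the aggregates on the empty multiset. A small sketch with hypothetical aggregate names; the value of max on the empty multiset is not fixed in this section, so using $-\infty$ below is an assumption.

```python
import math

# Hypothetical aggregate functions on multisets (given as lists).
AGGREGATES = {
    "count": len,
    "sum": sum,
    "prod": math.prod,
    # assumption: max of the empty multiset is -infinity
    "max": lambda xs: max(xs, default=-math.inf),
}


def initial_values(exprs: dict[str, str]) -> dict[str, float]:
    """w^list values of a new item: f_0 (or g_0), i.e. each aggregate applied to []."""
    return {expr: AGGREGATES[agg]([]) for expr, agg in exprs.items()}


init = initial_values({"count(x1)": "count", "max(prod(x2,sum(x3)))": "max"})
print(init)  # {'count(x1)': 0, 'max(prod(x2,sum(x3)))': -inf}
```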

Whenever the fit status of an item $i$ changes, we execute Algorithm 9.

Algorithm 9: The fit status of $\left[{}^{v_1,\dots,v_p}_{b_1,\dots,b_p}\right]$

Lemma 6.16. Let $Q$ be a q-hierarchical CQ with aggregates and let $D$ be a σ-db. For all present items $i$ in the data structure for $Q$ on $D$, the following holds.

1. After the data structure is initialised for the empty database, Condition 6.14 holds for $i$, and

2. if Condition 6.14 holds for $i$, the condition still holds after the data structure was updated and Algorithm 9 was applied.

Proof. When initialising the data structure for the empty database, we initialise a start item with empty lists and nothing else. In particular, Condition 6.14 holds.

Suppose that we have a data structure for a database $D$ such that Condition 6.14 holds for all present items. Now the fit status of an item $\iota_p$ changes (or $\iota_p$ is created or deleted). Then it is straightforward to verify that the algorithm ensures that, for all $j \in \{0, \dots, p\}$, all values $w^{\text{list}}_{\mathrm{expr}}(\iota_j)$ and $w^{\text{item}}_{\mathrm{expr}}(\iota_j)$ are correct, where $\mathrm{expr}$ ranges over the appropriate aggregate expressions.

Assume, for a contradiction, that there is an item $\hat\iota \notin \{\iota_1, \dots, \iota_p\}$ whose value $w^{\text{list}}_{\mathrm{expr}}(\hat\iota)$ or $w^{\text{item}}_{\mathrm{expr}}(\hat\iota)$ is not correct, while for all $u \in \mathrm{child}(v_{\hat\iota}) \cap \mathrm{free}(Q)$ and all $\tilde\iota \in L^{\hat\iota}_u$ the values $w^{\text{list}}_{\mathrm{expr}}(\tilde\iota)$ and $w^{\text{item}}_{\mathrm{expr}}(\tilde\iota)$ are correct, where $\mathrm{expr}$ ranges over the appropriate aggregate expressions. Then the multisets underlying $w^{\text{list}}_{\mathrm{expr}}(\hat\iota)$ and $w^{\text{item}}_{\mathrm{expr}}(\hat\iota)$ must have changed. But this can only happen if an item becomes fit or unfit. Since $\iota_p$ is the only item whose fit status changed, this contradicts the assumption that there is an item $\hat\iota \notin \{\iota_1, \dots, \iota_p\}$ whose $w^{\text{list}}_{\mathrm{expr}}(\hat\iota)$ or $w^{\text{item}}_{\mathrm{expr}}(\hat\iota)$ value is not correct. In particular, the values are correct for all present items. This concludes the proof.

We now analyse the running time of Algorithm 9:

Lemma 6.17. Let $Q$ be a q-hierarchical CQ with aggregates and let $t_a$ be the aggregation time of $Q$. Algorithm 9 takes time $\mathrm{poly}(Q) \cdot t_a$ on a data structure for an arbitrary σ-db $D$.

Proof. Let us first consider line 4. The item $\iota_p$ changes its fit status; therefore, it is added to or removed from the $v_p$-list of $\iota_{p-1}$. Hence, the value stored in $w^{\text{list}}_{\mathrm{expr}}(\iota_{p-1})$ (before the fit status of $\iota_p$ changed) is equal to
$$f_{n-1}\,\{\!\{\, a_{\hat\iota} : \hat\iota \in L^{\iota_{p-1}}_{v_p} \setminus \{\iota_p\} \,\}\!\} \quad\text{or}\quad f_{n+1}\,\{\!\{\, a_{\hat\iota} : \hat\iota \in L^{\iota_{p-1}}_{v_p} \cup \{\iota_p\} \,\}\!\}.$$
To obtain the correct value of $w^{\text{list}}_{\mathrm{expr}}(\iota_{p-1})$, we update it in time $O(t_a(F))$ by adding $a_{\iota_p}$ to or removing it from the multiset. With the same trick, we can update the value in line 6. Let us now consider lines 12, 14 and 16. Since the values of the arguments $a_{\iota_{j+1}}$ and $w^{\text{list}}_{\mathrm{expr}_s}(\iota_{j+1})$ for all $s \in [\ell]$ are already stored and the number of these arguments is bounded by $\ell + 1$, it takes at most $(|\mathrm{child}(v_{j+1})| + 1)\,t_a$ to compute $w^{\text{item}}_{\mathrm{expr}}(\iota_{j+1})$ from scratch. Updating $w^{\text{list}}_{\mathrm{expr}}(\iota_j)$ in line 17 takes at most $2t_a$, since we have to remove the "old" value $w^{\text{item}}_{\mathrm{expr}}(\iota_{j+1})$ and insert the "new" one. Since the number of aggregate expressions is bounded by $|Q|$, the running time of Algorithm 9 is bounded as follows:

$$|Q| \cdot 2t_a + \sum_{j=0}^{p-2} O(|Q|)\,\big[(|\mathrm{child}(v_{j+1})| + 1)\,t_a + 2t_a\big] \leqslant \mathrm{poly}(Q)\,t_a.$$
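The polynomial can be made explicit. Assuming (as is standard, though not stated in this section) that the path length $p$ and every $|\mathrm{child}(v)|$ are bounded by the number of variables of $Q$, and hence by $|Q|$, the sum has at most $p - 1$ terms, each of size $O(|Q|)\cdot(|Q|+3)\,t_a$:

```latex
|Q|\cdot 2t_a + \sum_{j=0}^{p-2} O(|Q|)\big[(|\mathrm{child}(v_{j+1})|+1)\,t_a + 2t_a\big]
  \leqslant 2|Q|\,t_a + (p-1)\cdot O(|Q|)\cdot(|Q|+3)\,t_a
  \leqslant 2|Q|\,t_a + |Q|\cdot O(|Q|^2)\,t_a
  = O(|Q|^3)\,t_a .
```

Under this assumption, $\mathrm{poly}(Q)$ can be taken to be cubic in $|Q|$.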

All in all, we have shown that there is a data structure that maintains queries with