We have already explained the basic idea of step-wise evaluation over time in Section 14.1 for a single CERA expression Q. The idea extends straightforwardly to query plans QP. In the incremental evaluation, we are not only concerned with computing the output in each step, but also with maintaining the event histories (i.e., the contents of the materialization points). We will see, however, that solving one issue also solves the other.
14.3.1 Input and Output in Incremental Evaluation
Each step of the incremental evaluation of a query plan $QP = \langle Q_1 := E_1, \ldots, Q_n := E_n \rangle$ is initiated by one or more base events happening at the current time, which we denote $\mathit{now}$. The available input of the evaluation step is:
• $\Delta B_1, \ldots, \Delta B_m$ are relations that contain the new base events that happen at the current time $\mathit{now}$. We have $\Delta B_i = \sigma[M_{B_i} = \mathit{now}](B_i)$, where $B_i$ is the conceptual “omniscient” base relation that contains all events that ever happen (past, present, and future).
• ${\circ}Q_1, \ldots, {\circ}Q_n$ are relations for the event histories of the materialization points that store results and intermediate results from previous evaluation steps. We have ${\circ}Q_i = \sigma[M_{Q_i} < \mathit{now}](Q_i)$, where $Q_i$ is the “omniscient” result for $Q_i$ (as described in Section 14.2.2).
• ${\circ}B_1, \ldots, {\circ}B_m$ are relations for the event histories of the base events. We have ${\circ}B_i = \sigma[M_{B_i} < \mathit{now}](B_i)$. In practice, ${\circ}B_1, \ldots, {\circ}B_m$ are often not needed because we have materialization points that capture their information.⁴
⁴ Examples of this are the two query plans from Section 14.2.4. For their incremental evaluation the history ${\circ}E$ of the stream of incoming events $E$ is not needed because the histories ${\circ}A$, ${\circ}B$, ${\circ}C$ of the materialization points
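$$
\begin{array}{ll}
\Delta\sigma_C(E) = \sigma_C(\Delta E) & {\circ}\sigma_C(E) = \sigma_C({\circ}E) \\
\Delta\rho_A(E) = \rho_A(\Delta E) & {\circ}\rho_A(E) = \rho_A({\circ}E) \\
\Delta\pi_P(E) = \pi_P(\Delta E) & {\circ}\pi_P(E) = \pi_P({\circ}E) \\
\Delta\mu_M(E) = \mu_M(\Delta E) & {\circ}\mu_M(E) = \mu_M({\circ}E) \\
\Delta\gamma_G(E) = \gamma_G(\Delta E) & {\circ}\gamma_G(E) = \gamma_G({\circ}E) \\
\Delta Q_{X_i:q}(E) = Q_{X_i:q}(\Delta E) & {\circ}Q_{X_i:q}(E) = Q_{X_i:q}({\circ}E) \\
\Delta(E_1 \cup E_2) = \Delta E_1 \cup \Delta E_2 & {\circ}(E_1 \cup E_2) = {\circ}E_1 \cup {\circ}E_2 \\
\Delta(E_1 \bowtie E_2) = \Delta E_1 \bowtie {\circ}E_2 \;\cup\; \Delta E_1 \bowtie \Delta E_2 \;\cup\; {\circ}E_1 \bowtie \Delta E_2 & {\circ}(E_1 \bowtie E_2) = {\circ}E_1 \bowtie {\circ}E_2 \\
\Delta(E_1 \bowtie_{i \sqsupseteq j} E_2) = \Delta E_1 \bowtie_{i \sqsupseteq j} {\circ}E_2 \;\cup\; \Delta E_1 \bowtie_{i \sqsupseteq j} \Delta E_2 & {\circ}(E_1 \bowtie_{i \sqsupseteq j} E_2) = {\circ}E_1 \bowtie_{i \sqsupseteq j} {\circ}E_2 \\
\Delta(E_1 \,\overline{\ltimes}_{i \sqsupseteq j}\, E_2) = \Delta E_1 \,\overline{\ltimes}_{i \sqsupseteq j}\, {\circ}E_2 \;\cup\; \Delta E_1 \,\overline{\ltimes}_{i \sqsupseteq j}\, \Delta E_2 & {\circ}(E_1 \,\overline{\ltimes}_{i \sqsupseteq j}\, E_2) = {\circ}E_1 \,\overline{\ltimes}_{i \sqsupseteq j}\, {\circ}E_2
\end{array}
$$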
Figure 14.2: Equations for finite differencing
The output of the evaluation step must be:
• $\Delta Q_1, \ldots, \Delta Q_n$, the results of the current step for all materialization points. As explained earlier, they must satisfy $\Delta Q_i = \sigma[M_{Q_i} = \mathit{now}](Q_i)$.
In addition to producing the output, the evaluation step must perform an important side-effect as preparation for future evaluation steps:
• After the evaluation step, the event histories ${\circ}B_1, \ldots, {\circ}B_m, {\circ}Q_1, \ldots, {\circ}Q_n$ must be updated (to ${\circ}B_1', \ldots, {\circ}B_m', {\circ}Q_1', \ldots, {\circ}Q_n'$) so that they can become the input of the next evaluation step. This means that we must have ${\circ}B_i' = \sigma[M_{B_i} \leq \mathit{now}](B_i)$ and ${\circ}Q_i' = \sigma[M_{Q_i} \leq \mathit{now}](Q_i)$.
Since ${\circ}Q_i' = {\circ}Q_i \cup \Delta Q_i$ (analogously for ${\circ}B_i'$), computing the output also solves the main issue of the side-effect. Therefore, the main concern of the incremental evaluation is to compute the $\Delta Q_i$ efficiently.
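The relationship between an omniscient relation, its delta and its history can be illustrated with a small sketch. The `Event` type and its integer `end` field are assumptions made purely for illustration; `end` stands in for the time stamp that the selections compare against the current time.

```python
from typing import NamedTuple, Set

class Event(NamedTuple):
    payload: str
    end: int  # hypothetical stand-in for the time stamp compared with `now`

def delta(omniscient: Set[Event], now: int) -> Set[Event]:
    """Events that happen exactly at `now` (selection with M = now)."""
    return {e for e in omniscient if e.end == now}

def history(omniscient: Set[Event], now: int) -> Set[Event]:
    """Events that happened strictly before `now` (selection with M < now)."""
    return {e for e in omniscient if e.end < now}

B = {Event("a", 1), Event("b", 2), Event("c", 2), Event("d", 5)}
now = 2

# Side-effect of a step: the new history is the old history plus the delta.
updated_history = history(B, now) | delta(B, now)

# This equals the selection with M <= now on the omniscient relation.
assert updated_history == {e for e in B if e.end <= now}
```

In a real engine the omniscient relation is of course never available; only the delta and the stored history are, which is why the union on the last lines is the operation that actually gets executed.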
14.3.2 Finite Differencing
We can compute each $\Delta Q_i$ for a materialization point $Q_i := E_i$ in $QP$ efficiently using the changes $\Delta R_j$ to its input relations $R_j$, together with ${\circ}R_j = \sigma[M_{R_j} < \mathit{now}](R_j)$, their materialized histories from the previous evaluation step. Note that the input relations $R_j$ can be base relations ($B_1, \ldots, B_m$) or other materialization points that are defined (and computed) earlier in the query plan ($Q_1, \ldots, Q_{i-1}$).
Using a technique called finite differencing, we can derive a relational algebra expression $\Delta E_i$ so that $\Delta E_i$ involves only $\Delta R_j$ and ${\circ}R_j$ and $\Delta E_i = \Delta Q_i$ (for each step). Finite differencing works by pushing the differencing operator $\Delta$ inwards according to the equations in Figure 14.2. The equations might yield expressions where the “history operator” $\circ$ is applied to an expression that is not a base relation or materialization point. In those cases, we also need to push the “history operator” $\circ$ inwards; the appropriate equations are also given in Figure 14.2.
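Pushing the two operators inwards is a purely syntactic rewrite, which the following sketch tries to make concrete. The expression classes, the `kind` strings and the `delta_`/`hist_` name prefixes are hypothetical choices for this illustration and not part of CERA; the rewrite rules themselves follow the equations of Figure 14.2.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rel:
    """A base relation or materialization point, e.g. Rel("R")."""
    name: str

@dataclass(frozen=True)
class Unary:
    """Selection, renaming, projection, merging, grouping or matching."""
    op: str       # opaque operator label, e.g. "select_C"
    arg: object

@dataclass(frozen=True)
class Binary:
    """kind is "union", "join", "tjoin" (temporal join) or "antisemi"."""
    kind: str
    left: object
    right: object

def union(*terms):
    """Left-nested union of one or more expressions."""
    result = terms[0]
    for term in terms[1:]:
        result = Binary("union", result, term)
    return result

def hist(e):
    """Push the history operator inwards; it distributes over every operator."""
    if isinstance(e, Rel):
        return Rel("hist_" + e.name)
    if isinstance(e, Unary):
        return Unary(e.op, hist(e.arg))
    return Binary(e.kind, hist(e.left), hist(e.right))

def delta(e):
    """Push the differencing operator inwards."""
    if isinstance(e, Rel):
        return Rel("delta_" + e.name)
    if isinstance(e, Unary):
        return Unary(e.op, delta(e.arg))
    if e.kind == "union":
        return union(delta(e.left), delta(e.right))
    terms = [Binary(e.kind, delta(e.left), hist(e.right)),
             Binary(e.kind, delta(e.left), delta(e.right))]
    if e.kind == "join":  # only the plain join has the third term
        terms.append(Binary(e.kind, hist(e.left), delta(e.right)))
    return union(*terms)

def show(e):
    """Render an expression as a fully parenthesized string."""
    if isinstance(e, Rel):
        return e.name
    if isinstance(e, Unary):
        return f"{e.op}({show(e.arg)})"
    return f"({show(e.left)} {e.kind} {show(e.right)})"

expr = Binary("join", Rel("R"), Unary("select_C", Rel("S")))
print(show(delta(expr)))
```

The rewritten expression mentions only the `delta_` and `hist_` versions of the input relations, which is exactly the property required of the differenced expression.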
Finite differencing is a method originating in the incremental maintenance of materialized views in databases, which is a problem very similar to incremental event query evaluation [GL95]: the materialization points in our query plans can be understood as definitions of (materialized) views that must be updated (“maintained”) whenever new events are added to their input relations. In contrast to the general view maintenance problem, however, we only have to consider adding events (not also removing or changing them). This is due to the temporal preservation of CERA. An extension where we would also consider removing events (by introducing $\nabla Q_i$ and $\nabla R_i$) would
Note that finite differencing has similarities with obtaining the derivative of a function through symbolic differentiation (e.g., with equations such as $\frac{d}{dx}(fg) = f\,\frac{d}{dx}g + \left(\frac{d}{dx}f\right)g$) in mathematical calculus. However, finite differencing is not concerned with a differential quotient $\frac{f(x+\Delta x)-f(x)}{\Delta x}$ but with the finite difference of the contents of an event history between two different steps. If we see an event history ${\circ}Q$ as a time-varying function $f(t)$, then this difference is the set difference $f(t+\Delta t) - f(t)$, where $\Delta t$ is the time that elapses between two steps.
14.3.3 Correctness
To show that the equations for finite differencing of Figure 14.2 are correct, we have to show that for all time points $\mathit{now}$ and all materialization points $Q := E$ ($E$ a CERA expression or a union of relations) it holds that
$$\Delta E = \sigma[M_Q = \mathit{now}](Q) \quad\text{and}\quad {\circ}E = \sigma[M_Q < \mathit{now}](Q)$$
provided that
$$\Delta R_i = \sigma[M_{R_i} = \mathit{now}](R_i) \quad\text{and}\quad {\circ}R_i = \sigma[M_{R_i} < \mathit{now}](R_i)$$
for all the input relations $R_i$ of $E$.
The proof is a simple structural induction on E and makes similar arguments about time stamps (and related restrictions of CERA compared to traditional relational algebra) as the proof of the temporal preservation property of CERA (cf. Chapter 13.3 and Appendix B.1).
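As a complement to the proof (and in no way a substitute for it), the join case can be sanity-checked by brute force on a simplified model. The sketch assumes that a tuple is a key followed by integer end time stamps, that the join matches on the key and keeps both time stamps, and that the relevant time stamp of a tuple is the greatest one it carries; these modelling choices are assumptions of the example.

```python
import random

def join(r, s):
    """Join on the key (first component); the result keeps both end time stamps."""
    return {(k, e1, e2) for (k, e1) in r for (k2, e2) in s if k == k2}

def m(t):
    """Greatest end time stamp of a tuple (everything after the key)."""
    return max(t[1:])

def at(rel, now):
    """Selection with M = now."""
    return {t for t in rel if m(t) == now}

def before(rel, now):
    """Selection with M < now."""
    return {t for t in rel if m(t) < now}

random.seed(0)
R = {(random.randrange(3), random.randrange(6)) for _ in range(12)}
S = {(random.randrange(3), random.randrange(6)) for _ in range(12)}

for now in range(6):
    dR, hR = at(R, now), before(R, now)
    dS, hS = at(S, now), before(S, now)
    # Delta of the join: three-way union over delta and history inputs.
    assert at(join(R, S), now) == join(dR, hS) | join(dR, dS) | join(hR, dS)
    # History of the join: join of the histories.
    assert before(join(R, S), now) == join(hR, hS)
```

The check succeeds because a joined tuple has its greatest time stamp equal to the current time exactly when at least one of its inputs is new and the other is new or historical, which mirrors the case distinction made in the induction step.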
Note that finite differencing of arbitrary relational algebra expressions is not always as simple as it is here for CERA. In traditional relational algebra, care must be taken, for example with projections, that $\Delta\pi_P(E)$ does not produce any “duplicate” tuples that are already in ${\circ}\pi_P(E)$ (and therefore the equation $\Delta\pi_P(E) = \pi_P(\Delta E)$ does not hold in general for relational algebra). Furthermore, new tuples on the right-hand side of an anti-semi-join or a difference might actually remove tuples from the result, so that not only tuples that are added to the result ($\Delta E$) play a role but also tuples that are removed from the result ($\nabla E$). The time stamps that are part of every relation and the related restrictions in CERA make finite differencing much easier because they ensure that there are no difficulties with respect to duplicates and that no tuples must ever be removed from the result.⁵
14.3.4 Finite Differencing of Multiple Joins
When applying finite differencing to expressions with multiple joins such as $E = R \bowtie S \bowtie T$ (or more generally $E = R_1 \bowtie \ldots \bowtie R_n$), the equations of Figure 14.2 have a disadvantage: the resulting expression $\Delta E$ is exponential in size compared to the original expression $E$ and some subexpressions occur multiple times. For example, $E = R \bowtie S \bowtie T$ gives
$$
\begin{array}{rcl}
\Delta E &=& \Delta R \bowtie ({\circ}S \bowtie {\circ}T) \\
&\cup& \Delta R \bowtie (\Delta S \bowtie {\circ}T \;\cup\; \Delta S \bowtie \Delta T \;\cup\; {\circ}S \bowtie \Delta T) \\
&\cup& {\circ}R \bowtie (\Delta S \bowtie {\circ}T \;\cup\; \Delta S \bowtie \Delta T \;\cup\; {\circ}S \bowtie \Delta T).
\end{array}
$$
Notice that the subexpression $(\Delta S \bowtie {\circ}T \;\cup\; \Delta S \bowtie \Delta T \;\cup\; {\circ}S \bowtie \Delta T)$ occurs twice. For a join of $n$ relations $E = R_1 \bowtie \ldots \bowtie R_n$, the resulting expression $\Delta E$ will have a size of $O(2^n)$ compared to the original expression $E$. (Each of the $n-1$ joins doubles the size of the original expression.)
⁵ Note that tuples being removed from the result would be a big difficulty for the step-wise evaluation of event queries over time: in essence it would mean that the event query evaluation gives an answer at one point in time and “retracts” it at a later point in time.
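The growth can be made tangible by counting how many relation occurrences (leaves) the differenced expression contains. The sketch below assumes the join is expanded right-nested, as in the three-relation example above, and counts leaves only; the function names are hypothetical.

```python
def hist_leaves(n: int) -> int:
    """The history of an n-way join is the n-way join of the histories: n leaves."""
    return n

def delta_leaves(n: int) -> int:
    """Leaves in the delta of R1 join (R2 join ... join Rn), expanded naively."""
    if n == 1:
        return 1
    rest_hist = hist_leaves(n - 1)
    rest_delta = delta_leaves(n - 1)
    # delta(R1) join hist(rest), delta(R1) join delta(rest), hist(R1) join delta(rest)
    return (1 + rest_hist) + (1 + rest_delta) + (1 + rest_delta)

sizes = [delta_leaves(n) for n in range(1, 7)]
print(sizes)
```

For three relations the count is 17, matching the expression for $R \bowtie S \bowtie T$ above (3 + 7 + 7 leaves), and every additional join more than doubles the count because the delta of the remaining joins is copied twice.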
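An alternative equation for the finite differencing of a join of $n$ relations that would avoid that the same subexpression occurs several times would be:
$$
\begin{array}{rcl}
\Delta(R_1 \bowtie R_2 \bowtie \ldots \bowtie R_n) &=& \Delta R_1 \bowtie \Delta R_2 \bowtie \ldots \bowtie \Delta R_n \\
&\cup& {\circ}R_1 \bowtie \Delta R_2 \bowtie \ldots \bowtie \Delta R_n \\
&\cup& \Delta R_1 \bowtie {\circ}R_2 \bowtie \ldots \bowtie \Delta R_n \\
&\cup& {\circ}R_1 \bowtie {\circ}R_2 \bowtie \ldots \bowtie \Delta R_n \\
&\cup& \ldots \\
&\cup& \Delta R_1 \bowtie \Delta R_2 \bowtie \ldots \bowtie {\circ}R_n \\
&\cup& {\circ}R_1 \bowtie \Delta R_2 \bowtie \ldots \bowtie {\circ}R_n \\
&\cup& \Delta R_1 \bowtie {\circ}R_2 \bowtie \ldots \bowtie {\circ}R_n
\end{array}
$$
This expression basically makes a union of all combinations of $\Delta R_i$ and ${\circ}R_i$, except for ${\circ}R_1 \bowtie {\circ}R_2 \bowtie \ldots \bowtie {\circ}R_n$. In total there are $2^n - 1 = O(2^n)$ such combinations, so the length of the resulting expression is still exponential.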
However, this equation is very systematic, so there is no need to represent it explicitly in an implementation of the query evaluation engine. (It can internally just use the original expression $E$ instead of $\Delta E$.) This is particularly interesting because in practice most $\Delta R_i$ will be empty in each step anyway and only one or two $\Delta R_i$ contain new event tuples. Every join in the union that contains at least one $\Delta R_i = \emptyset$ delivers an empty result, so it suffices to consider only those few joins in which every $\Delta R_i$ that occurs is non-empty.
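One way an engine might exploit this regularity is to enumerate the combinations on the fly and to skip every combination that would use an empty delta. The following is a minimal sketch under assumptions: relations are sets of tuples whose first component is a join key, the join appends the right tuple's time stamp, and `delta_join` is a hypothetical helper name.

```python
from functools import reduce
from itertools import combinations

def join(left, right):
    """Join on the key (first component); append the right tuple's time stamps."""
    return {l + r[1:] for l in left for r in right if l[0] == r[0]}

def delta_join(deltas, hists):
    """Union over all delta/history combinations that use only non-empty deltas."""
    nonempty = [i for i, d in enumerate(deltas) if d]
    result = set()
    for size in range(1, len(nonempty) + 1):
        for chosen in combinations(nonempty, size):
            # Position i takes its delta if chosen, its history otherwise.
            operands = [deltas[i] if i in chosen else hists[i]
                        for i in range(len(deltas))]
            result |= reduce(join, operands)
    return result

# Step at time 3: only the third relation receives a new event.
deltas = [set(), set(), {("k", 3)}]
hists = [{("k", 1)}, {("k", 2)}, {("k", 2)}]
new_tuples = delta_join(deltas, hists)
```

With three relations there are seven combinations in the equation, but only one of them is evaluated here because a single delta is non-empty; with k non-empty deltas the loop visits 2^k - 1 combinations regardless of n.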
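A further alternative would be the following equation for the finite differencing of a join of $n$ relations.⁶
$$
\begin{array}{rcl}
\Delta(R_1 \bowtie R_2 \bowtie \ldots \bowtie R_n) &=& \Delta R_1 \bowtie {\circ}R_2 \bowtie \ldots \bowtie {\circ}R_n \;\cup\; \Delta(R_2 \bowtie \ldots \bowtie R_n) \bowtie (\Delta R_1 \cup {\circ}R_1) \\
&=& \Delta R_1 \bowtie {\circ}R_2 \bowtie \ldots \bowtie {\circ}R_n \\
&\cup& \Delta R_2 \bowtie {\circ}R_3 \bowtie \ldots \bowtie {\circ}R_n \bowtie (\Delta R_1 \cup {\circ}R_1) \\
&\cup& \ldots \\
&\cup& \Delta R_i \bowtie {\circ}R_{i+1} \bowtie \ldots \bowtie {\circ}R_n \bowtie (\Delta R_1 \cup {\circ}R_1) \bowtie \ldots \bowtie (\Delta R_{i-1} \cup {\circ}R_{i-1}) \\
&\cup& \ldots \\
&\cup& \Delta R_n \bowtie (\Delta R_1 \cup {\circ}R_1) \bowtie \ldots \bowtie (\Delta R_{n-1} \cup {\circ}R_{n-1})
\end{array}
$$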
The length of the resulting expression is quadratic in the size of the original expression. Particularly interesting about expressions that take this form is that they contain subexpressions of the form $\Delta R_i \cup {\circ}R_i$, which in turn is just ${\circ}R_i'$ (cf. Section 14.3.1), the value that ${\circ}R_i$ should have after each evaluation step. When reading (and evaluating) the subexpressions of the union from top to bottom, all subexpressions before the $i$th (i.e., before the line starting with $\Delta R_i \bowtie \ldots$) access only ${\circ}R_i$ and all subexpressions after it only $\Delta R_i \cup {\circ}R_i$. As long as ${\circ}R_i$ is not accessed in any other materialization points of a query plan, the side-effect ${\circ}R_i' = \Delta R_i \cup {\circ}R_i$ can be performed immediately when evaluating the $i$th subexpression.⁷
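The interleaving of evaluation and history update can be sketched as follows. The sketch assumes the named perspective, so each subexpression is joined in index order rather than in the order written in the equation; the set-of-tuples model and the name `delta_join_quadratic` are assumptions of the example.

```python
from functools import reduce

def join(left, right):
    """Join on the key (first component); append the right tuple's time stamps."""
    return {l + r[1:] for l in left for r in right if l[0] == r[0]}

def delta_join_quadratic(deltas, hists):
    """Evaluate one subexpression per relation; `hists` is updated in place."""
    result = set()
    for i, d in enumerate(deltas):
        if d:
            # Positions before i already hold the updated histories,
            # positions after i still hold the old ones.
            operands = hists[:i] + [d] + hists[i + 1:]
            result |= reduce(join, operands)
        # Side-effect, performed right after the i-th subexpression.
        hists[i] = hists[i] | d
    return result

# Step at time 3: both relations receive a new event with key "k".
deltas = [{("k", 3)}, {("k", 3)}]
hists = [{("k", 1)}, {("k", 2)}]
new_tuples = delta_join_quadratic(deltas, hists)
```

After the call, `hists` already contains the histories needed by the next step, so no separate update pass over the inputs of this join is required. This is only safe under the condition stated above, namely that no other materialization point still needs the old history of the same relation in the current step.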
14.3.5 Overall Query Evaluation Algorithm
To get back to the overall picture of incremental event query evaluation, let us again consider how a given query plan $QP = \langle Q_1 := E_1, \ldots, Q_n := E_n \rangle$ is conceptually evaluated. Let $QP$ use the base relations $B_1, \ldots, B_m$.
⁶ Note that this equation has different join orders in the different subexpressions of the union. Some care is therefore necessary when trying to transfer this equation from the named perspective on relational algebra to the unnamed, positional perspective (which is relevant for an implementation). Under the named perspective $R \bowtie S = S \bowtie R$. Under the unnamed perspective, however, $S \bowtie R$ has a different order of attributes than $R \bowtie S$. This different order must be “rectified” with a projection (which might be implemented as part of the join operation).
⁷ This is particularly advantageous for hash joins: we only have to compute the hash value for a tuple $r \in \Delta R_i$
As part of query compilation, we apply finite differencing to $QP$ by simply applying it to every materialization point definition $Q_i := E_i$. The result will be written $\Delta QP = \langle \Delta Q_1 := \Delta E_1, \ldots, \Delta Q_n := \Delta E_n \rangle$. Note that the base relations of $\Delta QP$ are $\Delta B_1, \ldots, \Delta B_m, {\circ}B_1, \ldots, {\circ}B_m, {\circ}Q_1, \ldots, {\circ}Q_n$.
The evaluation of $QP$ then conceptually follows the schema below, where each iteration of the (infinite) loop corresponds to an evaluation step.
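${\circ}B_1 := \emptyset; \ldots; {\circ}B_m := \emptyset;$
${\circ}Q_1 := \emptyset; \ldots; {\circ}Q_n := \emptyset;$
while (true) {
    advance $\mathit{now}$ to the occurrence time of the next incoming event(s);
    $\Delta Q_1 := \emptyset; \ldots; \Delta Q_n := \emptyset;$
    initialize $\Delta B_1, \ldots, \Delta B_m$ with the current events;
    compute $\Delta Q_1, \ldots, \Delta Q_n$ according to $\Delta QP$;
    for $i := 1 \ldots n$ { ${\circ}Q_i := {\circ}Q_i \cup \Delta Q_i$; }
    output $\Delta Q_1, \ldots, \Delta Q_n$;
}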
The evaluation of $\Delta QP$ in each step is as described in Section 14.2.2. Keep in mind, however, that the base relations of $\Delta QP$ are $\Delta B_1, \ldots, \Delta B_m, {\circ}B_1, \ldots, {\circ}B_m, {\circ}Q_1, \ldots, {\circ}Q_n$ (not $B_1, \ldots, B_m$ as in the original, “omniscient” $QP$). Also, expressions of materialization point definitions in $\Delta QP$ can contain unions at arbitrary places.
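The loop can be sketched for a concrete, deliberately tiny query plan. The plan used here is hypothetical: a single base stream of `(type, key, time)` events, two materialization points Q1 and Q2 that select the events of type "a" and "b", and a third one Q3 that joins them on the key. The differenced plan is written out by hand according to the equations of Figure 14.2, and the base history is omitted because Q1 and Q2 capture its information.

```python
from itertools import groupby

def join(left, right):
    """Join on the key (first component); append the right tuple's time stamps."""
    return {l + r[1:] for l in left for r in right if l[0] == r[0]}

def evaluate(stream):
    """Yield (now, deltas) per evaluation step; `stream` must be sorted by time."""
    hist = {"Q1": set(), "Q2": set(), "Q3": set()}   # event histories
    for now, group in groupby(stream, key=lambda ev: ev[2]):
        d_base = set(group)                          # new base events at `now`
        d = {}
        # Differenced plan: selections apply to the delta of the base stream.
        d["Q1"] = {(k, t) for (typ, k, t) in d_base if typ == "a"}
        d["Q2"] = {(k, t) for (typ, k, t) in d_base if typ == "b"}
        # Delta of the join: three-way union over deltas and histories.
        d["Q3"] = (join(d["Q1"], hist["Q2"])
                   | join(d["Q1"], d["Q2"])
                   | join(hist["Q1"], d["Q2"]))
        # Side-effect: histories absorb the deltas after all deltas are computed.
        for name in hist:
            hist[name] |= d[name]
        yield now, d

stream = [("a", "x", 1), ("b", "x", 2), ("a", "x", 2), ("b", "y", 3)]
steps = list(evaluate(stream))
```

For the sample stream, the step at time 2 reports two new Q3 tuples (the old and the new "a" event each paired with the new "b" event), while the steps at times 1 and 3 report none. A production engine would additionally have to handle out-of-order arrival and garbage collection of histories, neither of which is addressed by this sketch.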