An important motivation for introducing a formal representation of query plans, as we have done in this chapter, is that it enables us to use query plan rewriting as an optimization technique. Query rewriting is a central technique in the optimization of database queries, and it can be expected to be of similar importance for event queries.
Query rewriting is usually based on rewriting rules that express a transformation of a given query plan into another, equivalent query plan. For our purposes, two query plans are equivalent if the result for $Z_n$ is the same for all possible values of the incoming event stream $E$.
14.5.1 Traditional Relational Algebra Equivalences
Since the expressions on the right-hand side of materialization point definitions are based on CERA, which in turn is a variant of relational algebra, many well-known rewriting rules that use equivalence laws of relational algebra are applicable. These include, for example, the following laws, which give rise to corresponding rewriting rules:
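• Changing join order: $R \bowtie S = S \bowtie R$, $(R \bowtie S) \bowtie T = R \bowtie (S \bowtie T)$
• Pushing selections: $\sigma_C(R \bowtie S) = \sigma_C(R) \bowtie S$, $\sigma_C(R \ltimes_{i \sqsupseteq j} S) = \sigma_C(R) \ltimes_{i \sqsupseteq j} S$ (provided that $C$ contains only attributes from $\mathit{sch}(R)$)
• Changing selection order: $\sigma_{C_1}(\sigma_{C_2}(R)) = \sigma_{C_1 \wedge C_2}(R) = \sigma_{C_2}(\sigma_{C_1}(R))$
• Pushing projection: $\pi_P(R \bowtie S) = \pi_P(\pi_{P_1}(R) \bowtie \pi_{P_2}(S))$, where $P_1 = P \cap \mathit{sch}(R)$, $P_2 = P \cap \mathit{sch}(S)$ (this holds when $P$ contains all attributes shared by $R$ and $S$)
• Projection before grouping: $\gamma_{G,\, a \leftarrow F(A)}(R) = \gamma_{G,\, a \leftarrow F(A)}(\pi_{G \cup A}(R))$ (the purpose of this law is usually to then push the projection $\pi_{G \cup A}$ further down in $R$ with the equivalence above)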
Comprehensive lists of these laws can be found in most books on databases (see, e.g., [AHV95] or [GUW01]); because they are not specific to event queries and event query plans we do not go into further detail on these rewritings.
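The law for pushing selections can be checked on a small example. The following Python sketch is illustrative only: it models relations as lists of dictionaries and uses a plain natural join, which is an assumption of this example and not the CERA implementation. The helper names `natural_join`, `select`, and `as_set` and the sample relations are hypothetical.

```python
# Toy model (assumption): a relation is a list of dicts, attribute name -> value.

def natural_join(r, s):
    """Natural join: combine tuples that agree on all shared attributes."""
    out = []
    for t in r:
        for u in s:
            shared = t.keys() & u.keys()
            if all(t[a] == u[a] for a in shared):
                out.append({**t, **u})
    return out

def select(cond, r):
    """Selection: keep the tuples that satisfy the predicate cond."""
    return [t for t in r if cond(t)]

def as_set(r):
    """Order-insensitive view of a relation, used to compare results."""
    return {frozenset(t.items()) for t in r}

# Hypothetical sample data.
R = [{"id": 1, "x": 5}, {"id": 2, "x": 9}, {"id": 3, "x": 12}]
S = [{"id": 1, "y": "a"}, {"id": 3, "y": "b"}, {"id": 4, "y": "c"}]

# The condition mentions only attribute x, which belongs to sch(R).
C = lambda t: t["x"] > 6

unpushed = select(C, natural_join(R, S))   # sigma_C(R join S)
pushed = natural_join(select(C, R), S)     # sigma_C(R) join S

print(as_set(unpushed) == as_set(pushed))
```

Both plans yield the same relation here, but the pushed variant joins fewer tuples. A check on one data set is of course no proof of the law; it only illustrates what the equivalence claims.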
14.5.2 Equivalences Based on Temporal Reasoning
More interesting in our context are rewritings that are specific to event queries, e.g., because they leverage temporal reasoning to simplify temporal conditions. Examples are the following equivalences:
• Simplifying maxima on time stamps: $\sigma[\max\{i_1, i_2, \ldots, i_n\} - \min\{j_1, \ldots, j_m\} \le d \wedge i_1 \le i_k](R) = \sigma[\max\{i_2, \ldots, i_n\} - \min\{j_1, \ldots, j_m\} \le d \wedge i_1 \le i_k](R)$, where $k \neq 1$; similar equivalences can be given for minima.
• Elimination of implied conditions: $\sigma[i \le j,\ j \le k,\ i \le k](R) = \sigma[i \le j,\ j \le k](R)$; note that such an elimination may also use implicit assumptions about time stamps, e.g., $\sigma[i.e \le j.s,\ i.s \le j.s](R) = \sigma[i.e \le j.s](R)$ because we always assume $i.s \le i.e$.
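The elimination of implied conditions can be sketched as a reachability test over the "less than or equal" conditions. The following Python sketch is a minimal illustration under stated assumptions: a condition $a \le b$ is represented as the pair `(a, b)`, only transitivity is used for reasoning, and the function names `implied` and `eliminate_implied` are hypothetical.

```python
def implied(cond, others):
    """True if cond = (a, b), read as a <= b, follows from others by transitivity."""
    a, b = cond
    reachable, frontier = {a}, [a]
    while frontier:
        x = frontier.pop()
        for u, v in others:
            if u == x and v not in reachable:
                reachable.add(v)
                frontier.append(v)
    return b in reachable

def eliminate_implied(conds, assumptions=()):
    """Drop conditions implied by the remaining ones plus implicit assumptions."""
    kept = list(conds)
    for c in list(conds):
        rest = [k for k in kept if k != c]
        if c in kept and implied(c, rest + list(assumptions)):
            kept = rest
    return kept

# sigma[i <= j, j <= k, i <= k] simplifies to sigma[i <= j, j <= k].
simplified = eliminate_implied([("i", "j"), ("j", "k"), ("i", "k")])

# With the implicit assumption i.s <= i.e, the condition i.s <= j.s is redundant.
with_assumption = eliminate_implied(
    [("i.e", "j.s"), ("i.s", "j.s")],
    assumptions=[("i.s", "i.e")],
)
print(simplified, with_assumption)
```

Conditions are removed one at a time and each removal is checked against the conditions that are still kept, so the simplified selection remains equivalent to the original one.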
14.5.3 Introduction of New Materialization Points
Even more interesting are rewritings that affect not just the right-hand side of a single materialization point definition but the query plan as a whole. Because they may affect incremental evaluation by changing which intermediate results are materialized, these rewritings are a very important part of query optimization for event query plans.
The most important rewriting is to create a new materialization point $V$ for some subexpression $E'$ in a materialization point definition $Q := E$. Written as a rule, the rewriting is:
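\[
\frac{\langle\, \ldots,\; Q := E,\; \ldots \,\rangle \qquad (E \text{ contains subexpression } E')}
     {\langle\, \ldots,\; V := E',\; Q := E[E'/V],\; \ldots \,\rangle \qquad (V \text{ new name})}
\]
Here $E[E'/V]$ denotes replacing $E'$ with $V$ in $E$.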
Typically, this rewriting rule should only be considered when $E$ contains at least one join inside $E'$ and one outside $E'$, because only then does the additional materialization point have a significant effect on incremental evaluation. For example, when $\langle Q := R \bowtie S \bowtie T \rangle$ is changed to $\langle V := R \bowtie S,\; Q := V \bowtie T \rangle$, the incremental evaluation will utilize the stored intermediate results for $R \bowtie S$ in $\circ V$ and avoid recomputing them.
On the other hand, changing, for example, $\langle Q := \pi(\sigma(R)) \rangle$ to $\langle V := \sigma(R),\; Q := \pi(V) \rangle$ has little effect. The incremental evaluation of $Q := \pi(V)$ uses only $\triangle V$ and not $\circ V$, so nothing is gained in terms of avoiding the recomputation of intermediate results across evaluation steps.
Note that the rewriting rule given here could also be applied in the other direction to remove an existing materialization point (provided that $V$ is not used in the definition of any materialization point other than $Q$). This direction is less relevant here, because we have translated rules in a way that they do not create any “unwanted” materialization points (see Section 14.4). However, another strategy might be to work from the opposite direction: create, e.g., a materialization point for every binary join in the translation phase and then remove “unwanted” ones in the rewriting phase. (“Unwanted” here means that heuristics or cost measures in a query planner indicate that the query plan without that specific materialization point is more efficient.)
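The rewriting rule of this section and its applicability heuristic can be sketched in a few lines of Python. The representation is an assumption made for illustration: a query plan is a list of `(name, expression)` pairs, and an expression is either a name (a string) or a tuple `(operator, argument, ...)`. The function names `substitute`, `introduce_materialization_point`, `count_joins`, and `worthwhile` are hypothetical.

```python
def substitute(expr, target, name):
    """Return expr with every occurrence of the subexpression target replaced by name."""
    if expr == target:
        return name
    if isinstance(expr, tuple):
        return (expr[0], *(substitute(a, target, name) for a in expr[1:]))
    return expr

def introduce_materialization_point(plan, q, sub, v):
    """Rewrite <..., q := E, ...> into <..., v := sub, q := E[sub/v], ...>."""
    if any(name == v for name, _ in plan):
        raise ValueError(f"{v!r} is not a new name")
    out = []
    for name, expr in plan:
        if name == q:
            rewritten = substitute(expr, sub, v)
            if rewritten == expr:
                raise ValueError(f"{sub!r} does not occur in the definition of {q!r}")
            out.append((v, sub))
        else:
            rewritten = expr
        out.append((name, rewritten))
    return out

def count_joins(expr):
    """Number of join operators in an expression."""
    if not isinstance(expr, tuple):
        return 0
    return int(expr[0] == "join") + sum(count_joins(a) for a in expr[1:])

def worthwhile(expr, sub):
    """Heuristic from the text: at least one join inside sub and one outside it.
    Assumes that sub occurs exactly once in expr."""
    inside = count_joins(sub)
    return inside >= 1 and count_joins(expr) - inside >= 1

# <Q := (R join S) join T>  becomes  <V := R join S, Q := V join T>
plan = [("Q", ("join", ("join", "R", "S"), "T"))]
new_plan = introduce_materialization_point(plan, "Q", ("join", "R", "S"), "V")
print(new_plan)
```

For the second example of the text, `worthwhile(("pi", ("sigma", "R")), ("sigma", "R"))` is false, which matches the observation that a materialization point for $\sigma(R)$ brings little benefit.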
14.5.4 Multi-Query Optimization
A salient feature of our query plans is that they can describe multi-query optimizations well. For example, the following rule utilizes the result of another materialization point $V$ if it is equivalent to (i.e., provides the same results as) the subexpression $E''$ in $Q := E'$.
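\[
\frac{\langle\, \ldots,\; V := E,\; \ldots,\; Q := E',\; \ldots \,\rangle \qquad (E' \text{ contains subexpression } E'' \text{ with } E'' \equiv E)}
     {\langle\, \ldots,\; V := E,\; \ldots,\; Q := E'[E''/V],\; \ldots \,\rangle}
\]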
Note that the hard problem in multi-query optimization is recognizing equivalent subexpressions, i.e., that $E'' \equiv E$. This recognition usually has a high computational complexity or might even be undecidable.⁹ Our notion of query plans helps us to describe multi-query optimization in the operational semantics, but it does not help much in recognizing opportunities for multi-query optimization.
Because event query evaluation usually entails evaluating several, often very many, event queries at the same time, multi-query optimization is of high importance there. In particular, it is more important than in databases, where the traditional model is to evaluate a single query at a time. (Accordingly, multi-query optimization in databases is often limited to equivalent subexpressions within the same query. Such subexpressions can be expected to occur far less often than equivalent subexpressions across many different queries.)
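A sound but incomplete recognition of equivalent subexpressions can be sketched as follows. This is a minimal Python illustration, not the method of this chapter: it only recognizes expressions that are syntactically equal up to the commutativity of joins, and it reuses the assumed representation of plans as lists of `(name, expression)` pairs with tuple-shaped expressions. The names `canon`, `reuse`, and `share` are hypothetical.

```python
def canon(expr):
    """Canonical form: join arguments are sorted, so R join S and S join R coincide."""
    if not isinstance(expr, tuple):
        return expr
    op, *args = expr
    args = [canon(a) for a in args]
    if op == "join":
        args.sort(key=repr)
    return (op, *args)

def reuse(expr, v, e):
    """Replace subexpressions of expr that are recognized as equivalent to e by v."""
    if canon(expr) == canon(e):
        return v
    if isinstance(expr, tuple):
        return (expr[0], *(reuse(a, v, e) for a in expr[1:]))
    return expr

def share(plan):
    """For each V := E, rewrite the definitions that follow it to use V."""
    out = list(plan)
    for i, (v, _) in enumerate(plan):
        e = out[i][1]  # current (possibly already rewritten) definition of v
        out = out[:i + 1] + [(q, reuse(ex, v, e)) for q, ex in out[i + 1:]]
    return out

plan = [
    ("V", ("join", "R", "S")),
    ("Q", ("join", ("join", "S", "R"), "T")),
]
shared = share(plan)
print(shared)
```

The sketch never declares two inequivalent expressions equivalent, but it misses many equivalences, for example those that rely on the associativity of joins or on temporal reasoning.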
14.5.5 Outlook: Query Planning in Complex Event Processing
Rewriting rules as shown in this section are only one part of a query optimizer. The second part is to have good cost measures for comparing the alternative query plans that have been generated using the rewriting rules. Such cost measures, and how to efficiently explore the search space of alternative query plans in a branch-and-bound manner, are issues that have been investigated deeply for traditional database systems (see, e.g., [GM93, Gra95]).
The general approach of database systems transfers to event queries. However, there are two important difficulties:
• Event queries require different cost measures because they are evaluated differently. Their evaluation is usually main-memory-based, whereas traditional cost measures estimate the number of page accesses on disk. The goal of optimization is also different: databases aim at reducing the overall cost, whereas event queries often also aim at distributing the cost well over the different evaluation steps of the incremental evaluation.
• Cost measures require statistics and estimates about data distribution etc. in the input data; in the case of event queries, this means information about the incoming event stream. Such statistics and estimates might not be available at the time of query compilation, simply because the event stream that will be received in the future is not known. In a database, all data, and thus the necessary statistics and estimates, are readily available during query compilation.
One possible solution to these issues might be to generate several query plans solely based on simple heuristics that do not use cost measures, start the evaluation of all plans in parallel, and drop plans that turn out (by using appropriate measurements) to be inefficient at runtime.
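The idea of starting several plans and dropping the inefficient ones at runtime could look roughly as follows. This Python sketch is only one possible reading of the idea and rests on several assumptions: plans are modeled as callables that perform one evaluation step per event, the plans are interleaved sequentially instead of running truly in parallel, and the parameters `measure`, `warmup`, and `slack` as well as the function name `race` are hypothetical.

```python
def race(plans, events, measure, warmup=3, slack=2.0):
    """Evaluate all plans side by side and drop those that prove inefficient.

    plans   : dict mapping a plan name to a callable(event) for one evaluation step
    measure : callable(step, event) returning the cost of that step (e.g., elapsed time)
    warmup  : number of events to observe before any plan is dropped
    slack   : a plan is dropped once its total cost exceeds slack * best total cost
    """
    alive = dict(plans)
    total = {name: 0.0 for name in plans}
    for n, event in enumerate(events, start=1):
        for name, step in alive.items():
            total[name] += measure(step, event)
        if n >= warmup and len(alive) > 1:
            best = min(total[name] for name in alive)
            alive = {name: step for name, step in alive.items()
                     if total[name] <= slack * best}
    return sorted(alive)

# Hypothetical plans whose "evaluation step" simply reports a fixed cost.
candidates = {"cheap": lambda event: 1.0, "costly": lambda event: 5.0}
survivors = race(candidates, range(10), measure=lambda step, event: step(event))
print(survivors)
```

In a real system, `measure` would rely on runtime measurements such as elapsed time or memory consumption, and the thresholds would have to be tuned; the sketch fixes neither.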
⁹ Note that, since this is an optimization technique, it is usually not necessary to recognize all equivalences. We would be content with recognizing many common cases. In this respect, sound but incomplete recognition of equivalent subexpressions can be interesting.