
We now describe the general ideas behind the operational semantics of XChangeEQ, which will be detailed in full in the next chapters. The focus of our operational semantics is on logical query plans, which describe the operations needed to evaluate a query in a still fairly abstract way. For a given operation on the logical level (e.g., a join), there are typically many possible realizations on the physical level (e.g., nested loop join, hash join, merge join).

The logical query plans of XChangeEQ are based on an extended and tailored variant of relational algebra called Complex Event Relational Algebra (CERA). For incremental evaluation and maintenance of event histories, algebra expressions are “differentiated” into expressions that compute only changes to event histories (this includes the output as a special case). Intermediate results are captured through so-called materialization points. For garbage collection, so-called temporal relevance conditions are derived statically at compile time from a query plan. During runtime, these conditions make it possible to identify events and intermediate results that have become irrelevant due to the progression of time and can be garbage collected.

12.3.1

Relational Algebra as Foundation

A core observation in this work is that evaluation of complex event queries can be based on relational algebra and, more importantly, that we can separate the algebraic query plan and its incremental evaluation. Relational algebra is an attractive candidate for formalizing operational semantics of an event query language:

• It is an established and successful formalism in the database field. Expressiveness and complexity of relational algebra are well-understood. We can expect that proving soundness and correctness w.r.t. our declarative semantics is manageable.

• Event query evaluation can build upon a myriad of work on query optimization from databases and reconsider it in the new light of incremental evaluation on event streams. In particular, this includes query rewriting and cost-based heuristics [GM93, Gra95], physical implementation of operators and index structures [Gra93], and adaptive query evaluation [DIR07].

• Relational algebra lends itself to incremental evaluation relatively easily. Issues related to that have been considered for the incremental maintenance of materialized views [GL95] and also in production rule systems [For82, Mir87].

• Optimizations that utilize temporal conditions in queries together with assumptions about the timing and arrival order of events are possible, as we will see in this work.

In our operational semantics, single XChangeEQ or RelEQ rules are translated into relational algebra expressions. Simple event queries give rise to the base relations in these expressions. Complex event queries use a set of operators that together form a special variant of relational algebra called Complex Event Relational Algebra (CERA). This variant is particularly suited for complex event queries and is restricted somewhat to make incremental evaluation easy and efficient.

For the translation of single rules into CERA, there are two important ideas. First, we “pretend” that the base relations contain all events that ever happen and (at first) ignore that expressions must eventually be evaluated in an incremental way. Of course, this is by now an “old trick” that we have used before in our declarative semantics (cf. Chapter 9). The design of CERA will ensure that the expression still delivers the correct result up to a time point $now$ when the base relations contain only events up to that particular time point $now$. Second, time stamps of events are, for the most part, treated like regular data attributes. Accordingly, temporal conditions will be expressed, for example, simply as selections. Being able to treat time like data gives important flexibility and extensibility for the operational semantics. However, we will make use of the special meaning of these time stamp attributes in the incremental evaluation, in optimizations, and in the garbage collection.

12.3.2

Incremental Evaluation

Simply evaluating CERA expressions from scratch in every evaluation step would be inefficient as discussed in Section 12.2.1. For a more efficient, incremental evaluation we employ a technique called finite differencing. From a given CERA expression, we will obtain a new expression that will compute only the changes to the result from changes to the base relations. The design of CERA ensures that these changes are exactly the output we require in each evaluation step.

An important issue in the incremental evaluation is indicating which intermediate results should be materialized across steps to avoid their recomputation. To this end, we will extend the algebra to query plans with materialization points. A materialization point is an equation giving the result of a CERA expression a name (similar to view expressions in databases) so that it can be used like a base relation in other CERA expressions. Materialization points correspond to event histories and thus treat storing materialized intermediate results and incoming events in a uniform way. (Accordingly, we will then often refer to intermediate results also as “events.”) Further, materialization points address chaining of rules in a program when one rule accesses the results of another.
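As a rough illustration of finite differencing, consider a natural join whose inputs only grow. The sketch below is not the differentiation procedure developed in the next chapter. It assumes insert-only relations represented as Python lists of dicts, and the helper names `natural_join` and `delta_join` are hypothetical. It relies on the standard identity Δ(R ⋈ S) = (ΔR ⋈ S) ∪ (R ⋈ ΔS) ∪ (ΔR ⋈ ΔS), where R and S are the stored (materialized) relations before the step.

```python
def natural_join(r, s):
    """Natural join: combine tuples that agree on all shared attributes."""
    return [{**x, **y} for x in r for y in s
            if all(x[a] == y[a] for a in x.keys() & y.keys())]


def delta_join(r_old, s_old, r_new, s_new):
    """Changes to (r JOIN s) caused by inserting r_new into r and s_new into s."""
    return (natural_join(r_new, s_old)
            + natural_join(r_old, s_new)
            + natural_join(r_new, s_new))


# Materialized event histories before the evaluation step (hypothetical data).
orders = [{"o.s": 1, "o.e": 1, "id": 7}]
shipped = []

# One shipped event arrives in this evaluation step.
new_shipped = [{"s.s": 5, "s.e": 5, "id": 7, "t": 99}]
delta = delta_join(orders, shipped, [], new_shipped)
```

In this sketch only `delta` has to be computed in the step; the full join result is never rebuilt from scratch.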

12.3.3

Relevance of Events for Garbage Collection

The incremental evaluation with materialization points addresses how new events and intermediate results are added to the event histories. It does not address how irrelevant events and intermediate results are removed from the event histories. Garbage collection is based on the idea of formalizing, as so-called relevance conditions, the notion of whether an event in an event history is still relevant to the query plan at a given time point. These relevance conditions are evaluated at query runtime, either as part of evaluation steps or asynchronously in a separate execution step, and irrelevant events are removed.

The core issue of garbage collection then is determining these relevance conditions. In this thesis we focus on a particular form of relevance, temporal relevance, which is determined from time-related conditions in queries. We develop a method for statically (at query compile time) determining temporal relevance conditions for a given query plan.

This formalization of garbage collection through relevance conditions is important in order to prove correctness of the garbage collection. We will see that this can be done in a fairly elegant way: because the relevance conditions are determined statically (at compile time) we can switch back in the proof to an “omniscient” perspective on query plans that ignores the step-wise incremental evaluation over time to some degree.

Chapter 13

Complex Event Relational Algebra (CERA)

The first building block of our operational semantics is a special variant of relational algebra called Complex Event Relational Algebra (CERA). The core idea in the design of CERA is to obtain an algebra that is expressive enough to translate XChangeEQ rules but still restricted enough to be suitable for the incremental, step-wise evaluation that is required for complex event queries.

We explain the basic idea of using relational algebra for evaluating single event query rules focusing on the simplified event query language RelEQ (Section 13.1). Some basic familiarity with relational algebra is assumed (see, e.g., [AHV95, GUW01]). We then define CERA formally (Section 13.2) and show a property of CERA called temporal preservation (Section 13.3), which is important for the incremental evaluation in the next chapter. Finally, we provide full details for the translation of XChangeEQ rules into CERA expressions (Section 13.4).

13.1

Expressing Event Queries in Relational Algebra

To explain the basic idea of using relational algebra for event query rules, we focus first on the simplified event query language RelEQ: this hides the complexity of XChangeEQ, in particular with regard to constructing and querying simple events, and allows us to concentrate on the core topic of detecting complex events. For illustration, we use the following three RelEQ rules, which cover all relevant aspects of querying events:

(1) comp(id, p) ← o: order(id, p, q), s: shipped(id, t), d: delivered(t),
                  o before s, s before d, {o, s, d} within 48

(2) overdue(id) ← o: order(id, p, q), w: extend(o, 6h),
                  while w: not shipped(id, t), q < 10

(3) load(count(id)) ← o: overdue(oid), w: from-end-backward(o, 24h),
                      while w: collect shipped(id, t)

Conceptually similar rules have been used in the use cases of XChangeEQ in Chapter 7.1. The first rule detects completed order events as a composition of order, shipped, and delivery events. The events must happen in said temporal order within 48 hours. Variable id is the order number, p the product name, q the quantity, and t the tracking number. The second rule detects overdue orders as the absence of a shipped event in the time span of 6 hours after an order event. It applies only to orders with a quantity of less than 10 items. The third rule reports the number of shipped events that have taken place in the last 24 hours prior to an overdue event.

13.1.1

Relations for Events and Event Data

We associate a relation $R_i$ with each simple event query $i : R(x_1, \ldots, x_n)$ in the rule body. Each event of type $R$ that happens corresponds to one tuple in $R_i$. Its occurrence time interval is part of that tuple and is expressed with its starting time $i.s$ and ending time $i.e$ (where $i$ is the name of the event identifier bound in the atomic event query). Accordingly, $R_i$ has the schema

\[ sch(R_i) = \{i.s, i.e, x_1, \ldots, x_n\}. \]

Note that we use the named perspective on relations and relational algebra here, where tuples are viewed as functions that map attribute names to values. Because variables give rise to attributes, this is more intuitive here than the unnamed perspective where attribute names are identified by their position in an ordered tuple.

Example rule (1) thus gives rise to three such relations: $R_o$ with $sch(R_o) = \{o.s, o.e, id, p, q\}$ for o: order(id, p, q), $S_s$ with $sch(S_s) = \{s.s, s.e, id, t\}$ for s: shipped(id, t), and $T_d$ with $sch(T_d) = \{d.s, d.e, t\}$ for d: delivered(t). These relations will be the input of the relational algebra expression into which we will translate the rule.
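The named perspective maps naturally onto dictionaries keyed by attribute name. The following sketch shows one possible encoding of the three relations of example rule (1); the `SCHEMAS` table, the helper `make_tuple`, and all event data are hypothetical and serve only as an illustration.

```python
# Schemas of the relations for example rule (1), as sets of attribute names.
SCHEMAS = {
    "R_o": {"o.s", "o.e", "id", "p", "q"},
    "S_s": {"s.s", "s.e", "id", "t"},
    "T_d": {"d.s", "d.e", "t"},
}


def make_tuple(relation_name, values):
    """Return a tuple (attribute name -> value) after checking its schema."""
    if set(values) != SCHEMAS[relation_name]:
        raise ValueError(f"attributes do not match sch({relation_name})")
    return dict(values)


# An order event and a shipped event; attribute order is irrelevant.
order = make_tuple("R_o", {"o.s": 1, "o.e": 1, "id": 7, "p": "book", "q": 2})
shipped = make_tuple("S_s", {"id": 7, "t": 99, "s.e": 5, "s.s": 5})

# Attributes shared by two relations are the ones a natural join compares.
shared = SCHEMAS["R_o"] & SCHEMAS["S_s"]
```

Because the time stamps carry the event identifier as a prefix, they never clash between relations, whereas query variables such as `id` deliberately do.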

13.1.2

Event Composition and Temporal Conditions

By virtue of representing occurrence times as part of tuples, translating the complex event query in the body of example rule (1) into a relational algebra expression becomes quite straightforward. The combination of the three simple event queries with conjunction is expressed with natural joins. Maybe a bit surprisingly, temporal conditions (such as o before s) are expressed as selections; this works because we made temporal information (i.e., occurrence times of events) part of the data of our base relations.

In our example, we will have to join $R_o$, $S_s$, and $T_d$. The temporal condition o before s gives a selection with condition $o.e < s.s$, the temporal condition s before d a selection with condition $s.e < d.s$. The metric condition {o, s, d} within 48 gives a selection with condition $\max\{o.e, s.e, d.e\} - \min\{o.s, s.s, d.s\} \le 48$.

With this, the rule body could be translated into the following relational algebra expression:

\[
\begin{array}{l}
\sigma[\max\{o.e, s.e, d.e\} - \min\{o.s, s.s, d.s\} \le 48]( \\
\quad \sigma[s.e < d.s]( \\
\quad\quad \sigma[o.e < s.s]( \\
\quad\quad\quad (R_o \bowtie S_s) \bowtie T_d))).
\end{array}
\]

For readability, we write parameters of operators in square brackets, e.g., $\sigma[o.e < s.s]$, rather than in the more conventional way of subscripts, e.g., $\sigma_{o.e < s.s}$.
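The translated rule body can be tried out directly on small relations. The sketch below evaluates it from scratch, in the "omniscient" style where the base relations are assumed to contain all events; the helpers `natural_join` and `select` and the event data are hypothetical, and relations are plain lists of dicts.

```python
def natural_join(r, s):
    """Natural join: combine tuples that agree on all shared attributes."""
    return [{**x, **y} for x in r for y in s
            if all(x[a] == y[a] for a in x.keys() & y.keys())]


def select(relation, condition):
    """Selection: keep the tuples that satisfy the condition."""
    return [x for x in relation if condition(x)]


# Hypothetical base relations R_o, S_s, T_d (time stamps in hours).
R_o = [{"o.s": 1, "o.e": 1, "id": 7, "p": "book", "q": 2},
       {"o.s": 2, "o.e": 2, "id": 8, "p": "lamp", "q": 1}]
S_s = [{"s.s": 5, "s.e": 5, "id": 7, "t": 99},
       {"s.s": 10, "s.e": 10, "id": 8, "t": 100}]
T_d = [{"d.s": 20, "d.e": 20, "t": 99},
       {"d.s": 70, "d.e": 70, "t": 100}]

# (R_o JOIN S_s) JOIN T_d, then the three selections from inside out.
joined = natural_join(natural_join(R_o, S_s), T_d)
body = select(joined, lambda x: x["o.e"] < x["s.s"])
body = select(body, lambda x: x["s.e"] < x["d.s"])
body = select(body, lambda x: max(x["o.e"], x["s.e"], x["d.e"])
                              - min(x["o.s"], x["s.s"], x["d.s"]) <= 48)
```

With this data, both orders survive the joins and the two ordering selections, but order 8 is delivered 68 hours after it was placed and is therefore removed by the metric selection.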

There are of course a number of alternative relational algebra expressions that compute the same result. For example, the expression

\[
\begin{array}{l}
\sigma[o.e < s.s]( \\
\quad \sigma[\max\{o.e, s.e, d.e\} - \min\{o.s, s.s, d.s\} \le 48]( \\
\quad\quad R_o \bowtie \sigma[s.e < d.s](S_s \bowtie T_d)))
\end{array}
\]

would do the same as the one above. Rewriting rules could be used to transform one expression into the other. This is a well-explored topic for relational algebra and gives rise to query optimizations on the logical level such as pushing selections or reordering joins. There is also potential to simplify the relational algebra expression by reasoning about the temporal selections. For example, $\sigma[\max\{o.e, s.e, d.e\} - \min\{o.s, s.s, d.s\} \le 48]$ could be simplified to just $\sigma[d.e - o.s \le 48]$ because of the other temporal conditions $o.e < s.s$ and $s.e < d.s$ and the implicit knowledge that $i.s \le i.e$ for any $i$. This implicit knowledge comes from the fact that the ending time of an event can never be before its starting time.
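The reasoning behind this simplification can be spelled out as a short chain of inequalities. The following is only a sketch of the argument, combining the selections $\sigma[o.e < s.s]$ and $\sigma[s.e < d.s]$ from the expression above with $i.s \le i.e$ for each event:

```latex
\begin{align*}
& o.s \le o.e < s.s \le s.e < d.s \le d.e \\
\Longrightarrow\quad
& \min\{o.s, s.s, d.s\} = o.s
  \quad\text{and}\quad
  \max\{o.e, s.e, d.e\} = d.e \\
\Longrightarrow\quad
& \bigl(\max\{o.e, s.e, d.e\} - \min\{o.s, s.s, d.s\} \le 48\bigr)
  \iff
  \bigl(d.e - o.s \le 48\bigr)
\end{align*}
```

The equivalence holds only for tuples that already satisfy the two ordering selections, so the simplified selection has to be applied together with them.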

13.1.3

Rule Head

The expression just seen translates only the rule body. To translate the full rule, we still have to drop attributes that are not in the head (here $q$ and $t$) and to generate the occurrence time of the result. Dropping attributes is simply a projection.

The occurrence time of the result will be expressed with time stamps $r.s$ and $r.e$. By definition, $r.s$ must be the smallest value of the time stamps of the input events and $r.e$ the largest (cf. Chapter 6.4 and the last line in the model theory of Figure 9.1). Here, therefore, $r.s = \min\{o.s, s.s, d.s\}$ and $r.e = \max\{o.e, s.e, d.e\}$.

To generate the occurrence time of the result, we introduce an operator $\mu$ that is not part of standard relational algebra. The merging operator $\mu[j \leftarrow i_1 \sqcup \cdots \sqcup i_n](E)$ computes a new occurrence time interval (with start and end time stamps $j.s$ and $j.e$) from existing occurrence times so that it covers all these intervals, i.e., $j.s = \min\{i_1.s, \ldots, i_n.s\}$ and $j.e = \max\{i_1.e, \ldots, i_n.e\}$.

The result contains only the new occurrence time; the input occurrence times are dropped. Merging of time intervals is not really a new operation for relational algebra. It is equivalent to the following extended projection [GUW01], a common practical extension of relational algebra used to compute new attributes from existing ones:

\[
\begin{array}{l}
\pi[\, j.s \leftarrow \min\{i_1.s, \ldots, i_n.s\},\ j.e \leftarrow \max\{i_1.e, \ldots, i_n.e\}, \\
\quad sch(E) \setminus \{i_1.s, \ldots, i_n.s, i_1.e, \ldots, i_n.e\} \,](E).
\end{array}
\]

However, we do not want to allow arbitrary extended projections on time stamp attributes, because they could violate the temporal preservation of CERA (cf. Section 13.3). Therefore we only allow their restricted use through the new $\mu$ operator.
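A minimal sketch of the merging operator, assuming relations are lists of dicts whose time stamp attributes are named `"<identifier>.s"` and `"<identifier>.e"`, could look as follows. The function name `merge` and its parameters are hypothetical.

```python
def merge(relation, j, inputs):
    """mu[j <- i1 U ... U in]: replace the input intervals by their cover.

    j      -- name of the new event identifier, e.g. "r"
    inputs -- names of the input event identifiers, e.g. ["o", "s", "d"]
    """
    dropped = {f"{i}.s" for i in inputs} | {f"{i}.e" for i in inputs}
    result = []
    for x in relation:
        # Keep all attributes except the input time stamps.
        y = {a: v for a, v in x.items() if a not in dropped}
        # The new interval starts at the earliest start and ends at the latest end.
        y[f"{j}.s"] = min(x[f"{i}.s"] for i in inputs)
        y[f"{j}.e"] = max(x[f"{i}.e"] for i in inputs)
        result.append(y)
    return result


# One hypothetical tuple of the translated body of example rule (1).
body = [{"o.s": 1, "o.e": 1, "s.s": 5, "s.e": 5, "d.s": 20, "d.e": 20,
         "id": 7, "p": "book", "q": 2, "t": 99}]
merged = merge(body, "r", ["o", "s", "d"])
```

The function can only produce an interval that covers its inputs, which is the restriction that distinguishes it from an arbitrary extended projection on time stamps.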

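The full rule of our example (1) then becomes the following relational algebra expression. Note that only the $\pi$ and $\mu$ operators on the top have been added compared to the expression from earlier.

\[
\begin{array}{l}
\pi[r.s, r.e, id, p]( \\
\quad \mu[r \leftarrow o \sqcup s \sqcup d]( \\
\quad\quad \sigma[\max\{o.e, s.e, d.e\} - \min\{o.s, s.s, d.s\} \le 48]( \\
\quad\quad\quad \sigma[s.e < d.s]( \\
\quad\quad\quad\quad \sigma[o.e < s.s]( \\
\quad\quad\quad\quad\quad (R_o \bowtie S_s) \bowtie T_d))))).
\end{array}
\]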

13.1.4

Relative Timer Events and Negation

To translate example rule (2), we have to accommodate two further issues: the generation of relative timer events and the negation of an event. Relative timer events are expressed as auxiliary relations that will be joined with the relations of the other events. Negation can be expressed through an anti-semi-join, or more precisely a θ-anti-semi-join that uses the θ condition for expressing the event accumulation window.

The timer event w: extend(o, 6h) in our example is defined relative to the event o: order(id, p, q), which has the corresponding relation $R_o$. The relation for the timer event will be denoted $X_w$ and defined as

\[ X_w := \{x \mid (x(o.s), x(o.e)) \in \pi[o.s, o.e](R_o),\ x(w.s) = x(o.s),\ x(w.e) = x(o.e) + 6\}. \]

In this definition, $x(y)$ denotes the value for attribute $y$ of a tuple $x$ as usual. Relation $X_w$ contains four time stamps: the timer event's $w.s$ and $w.e$, which are computed based on the time stamps $o.s$ and $o.e$ of $R_o$, and also $o.s$ and $o.e$ themselves, which are needed for the join $R_o \bowtie X_w$. Naturally, the definition of $X_w$ is dependent on $R_o$.

Recall that negation of events must still be sensitive to data. In our example rule (2), only shipped events with the same value for id as the order event are of relevance. Accordingly, an anti-semi-join is appropriate (rather than, say, a difference). In addition, negation is restricted by a time window, which is specified by another event through the while keyword. This time window can be expressed as a condition, here $w.s \le s.s \wedge s.e \le w.e$, where $w.s$ and $w.e$ are the time stamps of the event giving the time window and $s.s$ and $s.e$ are the time stamps of the negated event. This condition is added to the anti-semi-join so that it becomes a θ-anti-semi-join. Recall that a θ-anti-semi-join of a relation $R$ with a relation $S$ can be defined in terms of other relational algebra operators as $R \,\overline{\ltimes}_{\theta}\, S = R \setminus \pi_{sch(R)}(\sigma_{\theta}(R \bowtie S))$. Here $\setminus$ is the usual difference operator of relational algebra.
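The definition of the θ-anti-semi-join can be followed almost literally in code. The sketch below assumes relations as lists of dicts; the function names and the sample data are hypothetical. It uses the fact that projecting a joined tuple onto sch(R) gives back the tuple of R that it was built from.

```python
def natural_join(r, s):
    """Natural join: combine tuples that agree on all shared attributes."""
    return [{**x, **y} for x in r for y in s
            if all(x[a] == y[a] for a in x.keys() & y.keys())]


def theta_anti_semi_join(r, s, theta):
    """R minus the projection onto sch(R) of the theta-selected join of R and S."""
    matched = []
    for x in r:
        # Every joined tuple built from x projects back onto x itself.
        if any(theta(t) for t in natural_join([x], s)):
            matched.append(x)
    return [x for x in r if x not in matched]


def window_contains(t):
    """Condition w.s <= s.s and s.e <= w.e on a joined tuple."""
    return t["w.s"] <= t["s.s"] and t["s.e"] <= t["w.e"]


# Orders already joined with their 6-hour timer windows (hypothetical data).
orders = [{"id": 7, "w.s": 1, "w.e": 7},
          {"id": 8, "w.s": 2, "w.e": 8}]
shipped = [{"id": 7, "t": 99, "s.s": 5, "s.e": 5},
           {"id": 8, "t": 100, "s.s": 10, "s.e": 10}]

unshipped = theta_anti_semi_join(orders, shipped, window_contains)
```

Order 7 is shipped inside its window and is removed; order 8 is shipped too late and stays in the result, which a plain anti-semi-join without the θ condition would not detect.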

Because the expression $w.s \le s.s \wedge s.e \le w.e$ is somewhat longwinded and we will use it fairly often, we abbreviate it also with $w \sqsupseteq s$ and write accordingly just $\overline{\ltimes}_{w \sqsupseteq s}$.¹ The intuition is that the time interval $s = [s.s, s.e]$ is a subset of $w = [w.s, w.e]$, i.e., $[s.s, s.e] \sqsubseteq [w.s, w.e]$, if and only if $w.s \le s.s \wedge s.e \le w.e$. To further emphasize that $w$ is on the left hand side and $s$ on the right hand side of the anti-semi-join, we use $\sqsupseteq$ instead of $\sqsubseteq$.

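With this, example rule (2) can be expressed as the following relational algebra expression. Note that the condition on data ($q < 10$) simply becomes an ordinary (data) selection.

\[
\begin{array}{l}
\pi[r.s, r.e, id]( \\
\quad \mu[r \leftarrow o \sqcup w]( \\
\quad\quad \sigma[q < 10]( \\
\quad\quad\quad (R_o \bowtie X_w) \,\overline{\ltimes}_{w \sqsupseteq s}\, S_s))),
\end{array}
\]

where $X_w := \{x \mid (x(o.s), x(o.e)) \in \pi[o.s, o.e](R_o),\ x(w.s) = x(o.s),\ x(w.e) = x(o.e) + 6\}$.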
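The expression for example rule (2) can be read as a small pipeline. The sketch below evaluates it from scratch on hypothetical data; the function names are illustrative, the anti-semi-join is written as an explicit existence check, and the merge and projection steps are folded into the final list comprehension.

```python
def natural_join(r, s):
    """Natural join: combine tuples that agree on all shared attributes."""
    return [{**x, **y} for x in r for y in s
            if all(x[a] == y[a] for a in x.keys() & y.keys())]


def timer_relation(r_o, hours=6):
    """Auxiliary relation X_w: w.s = o.s and w.e = o.e + hours."""
    intervals = sorted({(x["o.s"], x["o.e"]) for x in r_o})
    return [{"o.s": s, "o.e": e, "w.s": s, "w.e": e + hours}
            for s, e in intervals]


def overdue(r_o, s_s):
    """Evaluate the translated rule (2) over complete base relations."""
    candidates = natural_join(r_o, timer_relation(r_o))

    def shipped_in_window(x):
        # A join partner agrees on id and lies within the window [w.s, w.e].
        return any(y["id"] == x["id"]
                   and x["w.s"] <= y["s.s"] and y["s.e"] <= x["w.e"]
                   for y in s_s)

    unshipped = [x for x in candidates if not shipped_in_window(x)]
    small = [x for x in unshipped if x["q"] < 10]
    # Merge o and w into r, then project onto r.s, r.e, id.
    return [{"r.s": min(x["o.s"], x["w.s"]),
             "r.e": max(x["o.e"], x["w.e"]),
             "id": x["id"]} for x in small]


R_o = [{"o.s": 1, "o.e": 1, "id": 7, "p": "book", "q": 2},
       {"o.s": 2, "o.e": 2, "id": 8, "p": "lamp", "q": 3},
       {"o.s": 3, "o.e": 3, "id": 9, "p": "nail", "q": 50}]
S_s = [{"s.s": 5, "s.e": 5, "id": 7, "t": 99},
       {"s.s": 10, "s.e": 10, "id": 8, "t": 100}]

result = overdue(R_o, S_s)
```

Order 7 is shipped in time, order 9 is never shipped but has a quantity of 50, so only order 8 yields an overdue event, with an occurrence time that spans the order and its timer window.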

There are two things to remark on this expression. First, it might seem that instead of joining with the auxiliary relation ($R_o \bowtie X_w$) to generate the relative timer event, one might also use an extended projection of the form $\pi[w.s \leftarrow o.s,\ w.e \leftarrow o.e + 6,\ sch(R_o)](R_o)$. However, this extended projection would cause difficulties in the incremental evaluation because it does not satisfy the temporal preservation of CERA. Using the auxiliary relation also gives a considerable gain in flexibility and expressivity. Novel relative timer events that cannot be expressed as simple addition or subtraction (e.g., “the next Thursday after event o”) can be modeled simply as such auxiliary relations and thus integrated in the operational semantics easily. Second, one might argue that the θ-anti-semi-join is an unnecessary operation because it can be expressed using the “low-level” operations difference, projection, selection, and join. This is true, but as we will see, CERA does not allow that particular expression.² The reason for this is again that the operators needed for that expression could also be used to build expressions that do not satisfy the temporal preservation. Also, a θ-anti-semi-join is very valuable for an efficient incremental query evaluation.

13.1.5

Aggregation

Dealing with the event accumulation (while/collect) used to aggregate data (e.g., count) as in the example rule (3) can be broken down into two tasks. First, we have to “supply” all the data that will be aggregated. Second, we then have to actually aggregate the data. Aggregation is an operation not supported by standard relational algebra. However, the grouping operator $\gamma$ is a common extension to relational algebra to support aggregation [GUW01].

Supplying the necessary data for aggregation will be done with a θ-join. As with negation and its θ-anti-semi-join, the θ condition is used to express the temporal window over which events are collected. For our example rule, the θ-condition is $w.s \le s.s \wedge s.e \le w.e$, or abbreviated $w \sqsupseteq s$, as before. However, $w.s$ and $w.e$ are now from a different auxiliary relation $Y_w$ because we have a different relative timer event for $w$. Here $Y_w := \{y \mid (y(o.s), y(o.e)) \in \pi[o.s, o.e](U_o),\ y(w.s) = y(o.e) - 24,\ y(w.e) = y(o.e)\}$, where $U_o$ is the relation for the overdue events.
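The two tasks for example rule (3), supplying the data and then aggregating it, can be sketched as follows. The helper names and the event data are hypothetical, and grouping by the overdue event (its time stamps and `oid`) is an assumption made for this illustration rather than something fixed by the text above.

```python
def natural_join(r, s):
    """Natural join: combine tuples that agree on all shared attributes."""
    return [{**x, **y} for x in r for y in s
            if all(x[a] == y[a] for a in x.keys() & y.keys())]


def window_relation(u_o, hours=24):
    """Auxiliary relation Y_w: w.s = o.e - hours and w.e = o.e."""
    intervals = sorted({(x["o.s"], x["o.e"]) for x in u_o})
    return [{"o.s": s, "o.e": e, "w.s": e - hours, "w.e": e}
            for s, e in intervals]


def load_counts(u_o, s_s):
    """Count the shipped events inside the window of each overdue event."""
    def theta(x):
        return x["w.s"] <= x["s.s"] and x["s.e"] <= x["w.e"]

    # Task 1: supply the data with a theta-join. U_o and S_s share no
    # attribute here, so the natural join degenerates to a cross product.
    with_window = natural_join(u_o, window_relation(u_o))
    supplied = [x for x in natural_join(with_window, s_s) if theta(x)]

    # Task 2: aggregate (assumed grouping: one group per overdue event).
    counts = {}
    for x in supplied:
        key = (x["o.s"], x["o.e"], x["oid"])
        counts[key] = counts.get(key, 0) + 1
    return counts


U_o = [{"o.s": 2, "o.e": 30, "oid": 8}]
S_s = [{"s.s": 5, "s.e": 5, "id": 1, "t": 91},
       {"s.s": 10, "s.e": 10, "id": 2, "t": 92},
       {"s.s": 29, "s.e": 29, "id": 3, "t": 93},
       {"s.s": 31, "s.e": 31, "id": 4, "t": 94}]

counts = load_counts(U_o, S_s)
```

In this sketch an overdue event with no shipped event in its window has no join partner and therefore produces no group at all.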

Unlike the θ-anti-semi-join $\overline{\ltimes}_{\theta}$, the θ-join $\bowtie_{\theta}$ is not really necessary. It could be just expressed