Formal Definition of CERA - Eckert, Michael (2008): Complex Event Processing with XChang

We now formally define the operations of our Complex Event Relational Algebra (CERA). This formal definition makes clear that we use only a restricted set of relational algebra operations (e.g., no difference and union operators), have some restrictions on operators (e.g., projection must not drop time stamp attributes), and have some additional operations that are not part of the standard relational algebra (e.g., grouping or merging). After the definition, we briefly summarize these differences between CERA and traditional relational algebra. A reader so inclined may want to jump directly to this summary in Section 13.2.13 and only go back to the formal definition of CERA in case some notation or definition later in this and the next chapters cannot be grasped fully from the context.

13.2.1 Relations and Schemas

Let AttrNames denote a set of possible attribute names. It is partitioned into four disjoint sets, one for names for start time stamps (form i.s), one for end time stamps (form i.e), one for event references (form i.ref), and one for regular data attributes (form x, not containing a dot). Let dom denote the domain, i.e., the set of all possible attribute values. Since we need to represent time stamps and data terms amongst others, T ⊆ dom and DataTerms ⊆ dom.

The basic data objects our algebra operates with are tuples under the named perspective. A named tuple t is a partial function t : AttrNames → dom from attribute names to attribute values. We write r(X) = ⊥ when r is undefined for the attribute name X. Named tuples will usually be denoted with lower case letters r, s, t.

A relation R is a set of tuples. Each relation R is associated with a schema sch(R). For all tuples r ∈ R it must hold that r(X) ≠ ⊥ if X ∈ sch(R) and r(X) = ⊥ if X ∉ sch(R).

An important constraint is that every schema must contain at least one pair of time stamps, i.e., ∃i. {i.s, i.e} ⊆ sch(R). Time stamps occur only pairwise, i.e., ∀i. i.s ∈ sch(R) ⟺ i.e ∈ sch(R). For a given schema sch(R), we also introduce the following “subschemas”:

• schstart(R) = {i.s | i.s ∈ sch(R)} is the set of all attribute names for start time stamps in the schema,
• schend(R) = {i.e | i.e ∈ sch(R)} is the set of all attribute names for end time stamps in the schema,
• schtime(R) = schstart(R) ∪ schend(R) is the set of all attribute names for start or end time stamps in the schema,
• schref(R) = {i.ref | i.ref ∈ sch(R)} is the set of all attribute names for event references, and
• schdata(R) = sch(R) \ (schtime(R) ∪ schref(R)) is the set of all attribute names for regular data attributes.

The notion of a schema for a relation extends straightforwardly to the notion of a schema for a CERA expression formed using operators. We will define it along with the operators.

We make two “sanity” assumptions about contents of relations. First, no starting time stamp in a tuple r of a relation R may be later than its corresponding end time stamp. Formally,

∀r∈R. r(i.s)≤r(i.e).

Second, all tuples with the same event reference must have the same values for the time stamps that correspond to the event reference. Formally,

∀ i.ref ∈ schref(R) ∀ r ∈ R ∀ r′ ∈ R. r(i.ref) = r′(i.ref) ⟹ (r(i.s) = r′(i.s) ∧ r(i.e) = r′(i.e)).

If these sanity assumptions hold for the input relations of a CERA expression, they also hold for the result of the expression. (We don’t give a formal proof of this; it is just a trivial structural induction with one case for every CERA operator.)
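The definitions of this subsection can be made concrete with a small Python sketch. It is an illustration only and not part of the formal definition of CERA; it assumes that a named tuple is represented as a dict from attribute names to values, a relation as a list of such dicts, and a schema as a set of attribute names. The function names are hypothetical.

```python
def sch_time(schema):
    """Attribute names for start or end time stamps (forms i.s and i.e)."""
    return {a for a in schema if a.endswith(".s") or a.endswith(".e")}


def is_valid_schema(schema):
    """At least one time stamp pair, and time stamps occur only pairwise."""
    starts = {a[:-2] for a in schema if a.endswith(".s")}
    ends = {a[:-2] for a in schema if a.endswith(".e")}
    return bool(starts) and starts == ends


def satisfies_sanity(relation, schema):
    """Check the two sanity assumptions for a relation over the given schema."""
    ids = {a[:-2] for a in schema if a.endswith(".s")}
    # First assumption: no start time stamp is later than its end time stamp.
    if any(r[i + ".s"] > r[i + ".e"] for r in relation for i in ids):
        return False
    # Second assumption: equal event references imply equal time stamps.
    for i in ids:
        if i + ".ref" not in schema:
            continue
        seen = {}
        for r in relation:
            stamps = (r[i + ".s"], r[i + ".e"])
            if seen.setdefault(r[i + ".ref"], stamps) != stamps:
                return False
    return True


schema = {"i.s", "i.e", "i.ref", "x"}
R = [
    {"i.s": 1, "i.e": 3, "i.ref": "e1", "x": "a"},
    {"i.s": 1, "i.e": 3, "i.ref": "e1", "x": "b"},
]
```

Here R satisfies both assumptions: its two tuples share the event reference "e1" and agree on i.s and i.e.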

13.2.2 Equality and Simulation Equivalence

For values from the domain of data terms (DataTerms), a small but important remark is necessary. When we compare the equality of two data terms, t1 = t2, we must do this with simulation equivalence as defined in [Sch04]. This ensures that terms that are syntactically different, such as a{b,c} and a{c,b}, but have the same semantics in the data model are recognized as equal, i.e., a{b,c} = a{c,b} is true.

13.2.3 Selection

Selection in CERA is the same as in traditional relational algebra. For a given condition C, the selection operator takes as input a relation R and delivers as output all those tuples from R that satisfy C.

σ[C](R) ={t∈R|C(t) is true}, sch(σ[C](R)) =sch(R).

There are no restrictions on the condition C of a selection. Typically, a condition involves a comparison operator (=, <, >, ≤, ≥), attribute names from sch(R), and possibly constants. Importantly, the condition may operate on time stamp attributes, e.g., C ≡ i.e < j.s. The condition may also perform further computations, e.g., computing maxima, minima, and differences that are then used in a comparison. A typical example is a condition such as C ≡ max{i.e, j.e} − min{i.s, j.s} ≤ 1, which is used for translating the within metric temporal constraint.
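As an illustration only, selection with a condition of the within kind can be sketched in Python as follows. The sketch assumes tuples are dicts and relations are lists of dicts; the names select and within are hypothetical, and the limit of 10 time units is chosen arbitrarily for the example.

```python
def select(condition, relation):
    """sigma[C](R): keep exactly those tuples for which the condition holds."""
    return [t for t in relation if condition(t)]


def within(i, j, limit):
    """Build the condition max{i.e, j.e} - min{i.s, j.s} <= limit."""
    def condition(t):
        span = max(t[i + ".e"], t[j + ".e"]) - min(t[i + ".s"], t[j + ".s"])
        return span <= limit
    return condition


R = [
    {"i.s": 0, "i.e": 2, "j.s": 5, "j.e": 9},    # events span 9 time units
    {"i.s": 0, "i.e": 2, "j.s": 15, "j.e": 18},  # events span 18 time units
]
result = select(within("i", "j", 10), R)
```

Only the first tuple of R remains in result, and the schema is unchanged, as in the definition.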

We note that the selection operator σ[C] should not be confused with a substitution named σ. Fortunately, this danger is slim since in the operational semantics, the substitutions have been “replaced” by tuples in relations and will not appear anymore.

13.2.4 Renaming

The named perspective of relational algebra sometimes requires for technical reasons a renaming operator, which changes the names of attributes without affecting the result. For example, the CERA expression for rule (2) in Section 13.1 produces a relation with an attribute id. This relation is accessed in the CERA expression for rule (3), but the attribute is called oid there. (Note that simply changing oid to id is not possible, because id is already used elsewhere in rule (3) and its corresponding CERA expression.)

Renaming is denoted ρ[a′₁ ← a₁, …, a′ₙ ← aₙ](R) and renames attributes a₁, …, aₙ respectively to a′₁, …, a′ₙ. Recall from Section 13.2.1 that time stamps must always occur pairwise; accordingly, they can only be renamed pairwise.

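$$
\rho[a'_1 \leftarrow a_1, \ldots, a'_n \leftarrow a_n](R) = \{\, t \mid \exists r \in R.\ t(a'_i) = r(a_i) \text{ and } \forall X \notin \{a_1, \ldots, a_n\}.\ t(X) = r(X) \,\},
$$

$$
\mathit{sch}(\rho[a'_1 \leftarrow a_1, \ldots, a'_n \leftarrow a_n](R)) = (\mathit{sch}(R) \setminus \{a_1, \ldots, a_n\}) \cup \{a'_1, \ldots, a'_n\},
$$

where {a₁, …, aₙ} ⊆ sch(R), {a′₁, …, a′ₙ} ∩ sch(R) = ∅, and j.s ← i.s ∈ {a′₁ ← a₁, …, a′ₙ ← aₙ} iff j.e ← i.e ∈ {a′₁ ← a₁, …, a′ₙ ← aₙ}.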
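The renaming operator and its side conditions can be sketched in Python as follows. This is an illustration under the assumption that tuples are dicts, relations are lists of dicts, and the renaming is given as a dict from old to new names; the function name rename is hypothetical.

```python
def rename(mapping, relation, schema):
    """rho: rename attributes; `mapping` maps each old name to its new name."""
    old, new = set(mapping), set(mapping.values())
    if not old <= schema:
        raise ValueError("renamed attributes must occur in sch(R)")
    if new & schema:
        raise ValueError("new names must not occur in sch(R)")
    # Time stamps may only be renamed pairwise: i.s -> j.s iff i.e -> j.e.
    for a, b in mapping.items():
        for here, there in ((".s", ".e"), (".e", ".s")):
            if a.endswith(here):
                partner_ok = mapping.get(a[:-2] + there) == b[:-2] + there
                if not (b.endswith(here) and partner_ok):
                    raise ValueError("time stamps must be renamed pairwise")
    renamed = [{mapping.get(X, X): v for X, v in r.items()} for r in relation]
    return renamed, (schema - old) | new


schema = {"i.s", "i.e", "id"}
R = [{"i.s": 1, "i.e": 2, "id": 7}]
renamed, new_schema = rename({"id": "oid"}, R, schema)
```

The call renames id to oid as in the example of rules (2) and (3); attempting to rename i.s without i.e would be rejected.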

13.2.5 Natural Join

Natural join in CERA is the same as in traditional relational algebra. It combines those tuples from its input relations R and S that agree on the values of shared attributes into output tuples. Note that if sch(R) ∩ sch(S) = ∅, the natural join “degenerates” to a Cartesian product.

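$$
\begin{aligned}
R \bowtie S = \{\, t \mid \exists r \in R\ \exists s \in S\ \forall X.\ & \text{if } X \in \mathit{sch}(R) \setminus \mathit{sch}(S) \text{ then } t(X) = r(X), \\
& \text{if } X \in \mathit{sch}(S) \setminus \mathit{sch}(R) \text{ then } t(X) = s(X), \\
& \text{if } X \in \mathit{sch}(R) \cap \mathit{sch}(S) \text{ then } t(X) = r(X) = s(X), \\
& t(X) = \bot \text{ otherwise} \,\},
\end{aligned}
$$

$$
\mathit{sch}(R \bowtie S) = \mathit{sch}(R) \cup \mathit{sch}(S).
$$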

Recall from Section 13.2.2 that if r(X) and s(X) are data terms, then their equality in the definition means simulation equivalence. Either of the two may be chosen as the value assigned to t(X).
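A minimal Python sketch of the natural join follows, again with tuples as dicts and relations as lists of dicts. It is an illustration only: it compares values with Python's == and therefore does not implement simulation equivalence for data terms. The function name natural_join is hypothetical.

```python
def natural_join(R, sch_R, S, sch_S):
    """R join S: combine tuples that agree on all shared attributes."""
    shared = sch_R & sch_S
    out = []
    for r in R:
        for s in S:
            if all(r[X] == s[X] for X in shared):
                t = {**s, **r}  # on shared attributes the value of r is used
                if t not in out:  # a relation is a set: no duplicates
                    out.append(t)
    return out, sch_R | sch_S


sch_R = {"i.s", "i.e", "x"}
sch_S = {"j.s", "j.e", "x"}
R = [{"i.s": 1, "i.e": 2, "x": "a"}, {"i.s": 3, "i.e": 4, "x": "b"}]
S = [{"j.s": 5, "j.e": 6, "x": "a"}]
joined, sch_joined = natural_join(R, sch_R, S, sch_S)
```

With no shared attributes the test on shared is vacuously true, so the sketch degenerates to a Cartesian product, as noted above.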

13.2.6 Temporal θ-Join

Because they can be expressed as a combination of selection and natural join, θ-joins are not strictly necessary in CERA. However, the translation of while/collect in XChangeEQ and RelEQ is conveniently expressed with a particular θ-join. In this temporal θ-join $R \bowtie_{\theta} S$, the condition θ has the form i.s ≤ j.s ∧ j.e ≤ i.e, where {i.s, i.e} ⊆ sch(R) and {j.s, j.e} ⊆ sch(S). We also abbreviate this with i ⊒ j (to emphasize that i.s and i.e are on the left side, we write “⊒” rather than “⊑” with swapped arguments).

Because the temporal θ-join is expressible through other CERA operators, formal proofs about properties of CERA may ignore it.

$$
R \bowtie_{i \sqsupseteq j} S = \sigma[i.s \le j.s \wedge j.e \le i.e](R \bowtie S).
$$

In addition to providing some notational convenience, temporal θ-joins are interesting because they allow certain optimizations based on the temporal condition [BE07b].
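Since the temporal θ-join is defined as a selection over a natural join, a sketch of it is short. The following Python illustration assumes tuples are dicts that are defined exactly on their schema, so that the shared attributes can be read off the dict keys; the function names are hypothetical and no optimization based on the temporal condition is attempted.

```python
def natural_join(R, S):
    """Natural join of two relations given as lists of dicts."""
    out = []
    for r in R:
        for s in S:
            if all(r[X] == s[X] for X in r.keys() & s.keys()):
                out.append({**r, **s})
    return out


def temporal_theta_join(R, S, i, j):
    """Select from the natural join the tuples with i.s <= j.s and j.e <= i.e."""
    return [
        t for t in natural_join(R, S)
        if t[i + ".s"] <= t[j + ".s"] and t[j + ".e"] <= t[i + ".e"]
    ]


R = [{"i.s": 0, "i.e": 10, "x": 1}]
S = [{"j.s": 2, "j.e": 5, "x": 1}, {"j.s": 8, "j.e": 12, "x": 1}]
result = temporal_theta_join(R, S, "i", "j")
```

Only the first tuple of S lies within the interval [0, 10] of the tuple in R, so result contains a single combined tuple.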

Up until this point, the definitions of CERA operators were identical to those of traditional relational algebra. The operators that we will meet in the remainder of this section diverge from it or are restricted in some way. The restrictions all serve the purpose of achieving the temporal preservation property of CERA (see Section 13.3) that enables a step-wise and incremental evaluation as necessary for complex event queries.

13.2.7 Projection with Preservation of Time Stamp Attributes

Projection in CERA is subject to an important constraint: it must preserve all time stamp attributes and thus may only drop data attributes and event reference attributes. For a given set A of attribute names and an input relation R, where schtime(R) ⊆ A ⊆ sch(R), it delivers as output a relation that is reduced to the schema A but otherwise equal to R.

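$$
\pi[A](R) = \{\, t \mid \exists r \in R\ \forall X.\ \text{if } X \in A \text{ then } t(X) = r(X), \text{ otherwise } t(X) = \bot \,\}, \qquad \mathit{sch}(\pi[A](R)) = A.
$$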

The restriction that time stamps must be preserved by projection is important for the temporal preservation of CERA (see Section 13.3).
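The restricted projection can be sketched in Python as follows. The sketch is an illustration only and rests on the assumption that A names the attributes that are kept, so that the restriction amounts to schtime(R) ⊆ A ⊆ sch(R); tuples are dicts, relations are lists of dicts, and the function name project is hypothetical.

```python
def project(A, relation, schema):
    """pi[A](R): reduce tuples to the attributes in A, keeping all time stamps."""
    time = {X for X in schema if X.endswith((".s", ".e"))}
    if not time <= A <= schema:
        raise ValueError("A must contain all time stamps and be within sch(R)")
    out = []
    for r in relation:
        t = {X: r[X] for X in A}
        if t not in out:  # a relation is a set: no duplicates
            out.append(t)
    return out


schema = {"i.s", "i.e", "i.ref", "x", "y"}
R = [
    {"i.s": 1, "i.e": 2, "i.ref": "e1", "x": "a", "y": "b"},
    {"i.s": 1, "i.e": 2, "i.ref": "e1", "x": "a", "y": "c"},
]
result = project({"i.s", "i.e", "x"}, R, schema)
```

Dropping y and i.ref makes the two tuples of R coincide, so result contains one tuple. A call such as project({"x"}, R, schema) is rejected because it would drop time stamps.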

13.2.8 Merging of Time Intervals

Merging is a new operator in CERA that is not found in traditional relational algebra. The operator builds new time stamps i.s, i.e in the output from time stamps j₁.s, j₁.e, …, jₙ.s, jₙ.e. The new time stamps are constructed so that the interval i = [i.s, i.e] covers exactly all the intervals j₁ = [j₁.s, j₁.e], …, jₙ = [jₙ.s, jₙ.e]. Using the notation “⊔” from Chapter 9, this is written i = j₁ ⊔ ··· ⊔ jₙ and thus explains the notation used for the µ operator. The old time stamps j₁.s, j₁.e, …, jₙ.s, jₙ.e are simply dropped.

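$$
\begin{aligned}
\mu[i \leftarrow j_1 \sqcup \cdots \sqcup j_n](R) = \{\, t \mid \exists r \in R.\ & t(i.s) = \min\{r(j_1.s), \ldots, r(j_n.s)\}, \\
& t(i.e) = \max\{r(j_1.e), \ldots, r(j_n.e)\}, \\
& t(X) = r(X) \text{ if } X \in \mathit{sch}(R) \setminus \{i.s, i.e, j_1.s, j_1.e, \ldots, j_n.s, j_n.e\}, \\
& t(X) = \bot \text{ otherwise} \,\},
\end{aligned}
$$

$$
\mathit{sch}(\mu[i \leftarrow j_1 \sqcup \cdots \sqcup j_n](R)) = (\mathit{sch}(R) \setminus \{j_1.s, j_1.e, \ldots, j_n.s, j_n.e\}) \cup \{i.s, i.e\},
$$

where {j₁.s, j₁.e, …, jₙ.s, jₙ.e} ⊆ sch(R).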
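As an illustration, merging can be sketched in Python with tuples as dicts and relations as lists of dicts. The function name merge and the representation of the merged intervals as a list of identifiers are assumptions made for this sketch.

```python
def merge(i, js, relation):
    """mu[i <- j1 U ... U jn](R): replace the intervals js by their cover i."""
    dropped = {j + suffix for j in js for suffix in (".s", ".e")}
    out = []
    for r in relation:
        # Keep all attributes except the old time stamps.
        t = {X: v for X, v in r.items() if X not in dropped}
        # The new interval starts at the earliest start and ends at the latest end.
        t[i + ".s"] = min(r[j + ".s"] for j in js)
        t[i + ".e"] = max(r[j + ".e"] for j in js)
        if t not in out:
            out.append(t)
    return out


R = [{"a.s": 1, "a.e": 4, "b.s": 3, "b.e": 9, "x": "v"}]
result = merge("i", ["a", "b"], R)
```

The intervals [1, 4] and [3, 9] are covered by the new interval [1, 9], and the attributes a.s, a.e, b.s, b.e no longer occur in the output.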

As noted earlier, merging can be understood just as a restricted version of an extended projection. However, full extended projection is not allowed in CERA. It would be possible to add extended projection to CERA provided that it does not modify or drop time stamps (i.e., with the same restriction as standard projection). However, extended projection is not really necessary in CERA because the grouping operator performs the same duty (together with the construction function).

13.2.9 Temporal Anti-Semi-Join

CERA does not support arbitrary difference or anti-semi-join operations. However, it does allow a special form of θ-anti-semi-joins, where the θ-condition gives a restriction so that temporal preservation (cf. Section 13.3) is assured and thus step-wise, incremental evaluation is possible. The θ-condition must have the form i ⊒ j (short for i.s ≤ j.s ∧ j.e ≤ i.e), where i.s, i.e are some time stamp attributes of the left input relation and j.s, j.e are the only time stamp attributes of the right input relation.

The temporal anti-semi-join $R \mathbin{\overline{\ltimes}}_{i \sqsupseteq j} S$ takes as input two relations R and S, where {i.s, i.e} ⊆ schtime(R) and {j.s, j.e} = schtime(S). (Note that it is “⊆” for the time stamps of the left side of the anti-semi-join and “=” for time stamps on the right side!) Its output is R with those tuples r removed that have a “partner” in S, i.e., a tuple s ∈ S that agrees on all shared attributes with r and whose time stamps s(j.s), s(j.e) are within the time bounds r(i.s), r(i.e) given by r.

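$$
\begin{aligned}
R \mathbin{\overline{\ltimes}}_{i \sqsupseteq j} S
&= \{\, r \in R \mid \forall s \in S.\ \text{if } \forall X \in \mathit{sch}(R) \cap \mathit{sch}(S).\ r(X) = s(X) \text{ then } [r(i.s), r(i.e)] \not\sqsupseteq [s(j.s), s(j.e)] \,\} \\
&= \{\, r \in R \mid \forall s \in S.\ \exists X \in \mathit{sch}(R) \cap \mathit{sch}(S).\ r(X) \neq s(X) \text{ or } r(i.s) > s(j.s) \text{ or } s(j.e) > r(i.e) \,\},
\end{aligned}
$$

$$
\mathit{sch}(R \mathbin{\overline{\ltimes}}_{i \sqsupseteq j} S) = \mathit{sch}(R),
$$

where {i.s, i.e} ⊆ schtime(R) and {j.s, j.e} = schtime(S).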

This definition is somewhat lengthy; therefore it might be easier to think of the temporal anti-semi-join as being defined as a combination of other operators:

$$
R \mathbin{\overline{\ltimes}}_{i \sqsupseteq j} S = R \setminus \pi[\mathit{sch}(R)](\sigma[i \sqsupseteq j](R \bowtie S)).
$$

Keep in mind, however, that the expression on the right hand side of this definition is not allowed in CERA because there is no difference operator in CERA and because its projection does not preserve all time stamps.
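The temporal anti-semi-join can be sketched directly from its description, without going through a difference operator. The following Python illustration assumes tuples are dicts defined exactly on their schema and relations are lists of dicts; the function name is hypothetical.

```python
def temporal_anti_semi_join(R, S, i, j):
    """Keep the tuples of R that have no partner in S within their interval i."""
    def has_partner(r):
        return any(
            # s agrees with r on all shared attributes ...
            all(r[X] == s[X] for X in r.keys() & s.keys())
            # ... and the interval j of s lies within the interval i of r.
            and r[i + ".s"] <= s[j + ".s"]
            and s[j + ".e"] <= r[i + ".e"]
            for s in S
        )
    return [r for r in R if not has_partner(r)]


R = [
    {"i.s": 0, "i.e": 10, "x": 1},   # has a partner in S, is removed
    {"i.s": 0, "i.e": 10, "x": 2},   # differs from s on x, is kept
    {"i.s": 20, "i.e": 30, "x": 1},  # s lies outside [20, 30], is kept
]
S = [{"j.s": 2, "j.e": 5, "x": 1}]
result = temporal_anti_semi_join(R, S, "i", "j")
```

The output has the schema of R, and only the first tuple of R is removed.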

13.2.10 Temporal Grouping

Grouping is an operator that is not part of the traditional relational algebra, but a common practical extension to it for dealing with aggregation (e.g., COUNT, MAX, MIN, SUM). Grouping in CERA is subject to an important restriction: all time stamps of the input relation must be used as grouping attributes. We therefore also call it temporal grouping. Again, this restriction serves to ensure temporal preservation in CERA (cf. Section 13.3).

The temporal grouping operator γ[G, a ← F(A)](R) takes as input a relation R. Its parameters are a set of attributes G, the so-called grouping attributes, and an aggregation expression. All time stamps of the input relation must be grouping attributes, i.e., schtime(R) ⊆ G. The aggregation expression consists of an attribute name a and an aggregation function F(A) with parameters A (attribute names). The grouping operator partitions R into groups Pi, one group for each combination of values of the grouping attributes G (that is, all tuples in Pi have the same values for the grouping attributes). Each group Pi gives rise to one output tuple. The output tuple contains the grouping attributes G with the corresponding values and additionally the attribute a. The value of a is obtained by applying the aggregation function F(A) to Pi.

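$$
\begin{aligned}
\gamma[G, a \leftarrow F(A)](R) = \{\, t \mid \exists\, \emptyset \neq P \subseteq R.\ & \forall p \in P\ \forall X \in G.\ t(X) = p(X), \\
& \forall p' \in (R \setminus P)\ \exists X \in G.\ t(X) \neq p'(X), \\
& t(a) = F(A)(P) \,\},
\end{aligned}
$$

$$
\mathit{sch}(\gamma[G, a \leftarrow F(A)](R)) = G \cup \{a\},
$$

where schtime(R) ⊆ G ⊆ sch(R), a ∉ G is the name of a data attribute, and A ⊆ sch(R).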

The generalization of the grouping operator γ[G, a ← F(A)](R) with a single aggregation expression a ← F(A) to a grouping operator γ[G, a₁ ← F(A₁), …, aₙ ← F(Aₙ)](R) with multiple aggregation expressions is straightforward and will therefore not be detailed here further. The aggregation function F(A) is any function that takes as input a single relation and produces as output a single value. Common aggregation functions are COUNT(A), MAX(A), MIN(A), and SUM(A). They have the following definitions:

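$$
\begin{aligned}
\mathit{COUNT}(A)(P) &= |\pi[A](P)|, \\
\mathit{MAX}(A)(P) &= \max\{\, v \mid \exists p \in P.\ v = p(A) \,\}, \\
\mathit{MIN}(A)(P) &= \min\{\, v \mid \exists p \in P.\ v = p(A) \,\}, \\
\mathit{SUM}(A)(P) &= \sum_{p \in P} p(A).
\end{aligned}
$$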

In the first case, A can be a set containing one or more attribute names. In the other three cases, A must contain only a single attribute name, and all values of that attribute must be numbers.

Another aggregation function is C^X_c, which is used for the construction of new data terms (as needed for translating XChangeEQ, not just RelEQ). This aggregation function is detailed in the next section. Because construction may actually produce not just one new data term but several, we generalize our temporal grouping operator. In the generalized version, it produces for each group not just a single tuple but one tuple per value in the result (which is a set) of the aggregation function. Note that all that changes in the definition that follows is that we have t(a) ∈ F(A)(P) instead of t(a) = F(A)(P).

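$$
\begin{aligned}
\gamma[G, a \leftarrow F(A)](R) = \{\, t \mid \exists\, \emptyset \neq P \subseteq R.\ & \forall p \in P\ \forall X \in G.\ t(X) = p(X), \\
& \forall p' \in (R \setminus P)\ \exists X \in G.\ t(X) \neq p'(X), \\
& t(a) \in F(A)(P) \,\},
\end{aligned}
$$

$$
\mathit{sch}(\gamma[G, a \leftarrow F(A)](R)) = G \cup \{a\},
$$

where schtime(R) ⊆ G ⊆ sch(R), a ∉ G is the name of a data attribute, and A ⊆ sch(R).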

When the aggregation functions COUNT(A), MAX(A), MIN(A), and SUM(A) are adapted so that they deliver a singleton set instead of directly delivering a value, this generalized version can be used for them as well.
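The generalized temporal grouping operator, together with aggregation functions adapted to deliver singleton sets, can be sketched in Python as follows. This is an illustration only: tuples are dicts, relations are lists of dicts, attribute values are assumed to be hashable, and the names temporal_group, COUNT, MAX, MIN, and SUM are hypothetical Python counterparts of the notation above.

```python
def temporal_group(G, a, F, relation):
    """gamma[G, a <- F](R): one output tuple per group and per value in F(P)."""
    time = {X for r in relation for X in r if X.endswith((".s", ".e"))}
    if not time <= set(G):
        raise ValueError("all time stamps must be grouping attributes")
    G = sorted(G)
    groups = {}
    for r in relation:
        # Tuples with equal values on all grouping attributes form one group P.
        groups.setdefault(tuple(r[X] for X in G), []).append(r)
    out = []
    for key, P in groups.items():
        for value in F(P):
            out.append({**dict(zip(G, key)), a: value})
    return out


def COUNT(A):
    """Number of distinct value combinations for the attribute names in A."""
    return lambda P: {len({tuple(p[X] for X in sorted(A)) for p in P})}


def MAX(A):
    return lambda P: {max(p[A] for p in P)}


def MIN(A):
    return lambda P: {min(p[A] for p in P)}


def SUM(A):
    return lambda P: {sum(p[A] for p in P)}


R = [
    {"i.s": 1, "i.e": 2, "k": "a", "v": 3},
    {"i.s": 1, "i.e": 2, "k": "a", "v": 5},
    {"i.s": 4, "i.e": 6, "k": "a", "v": 7},
]
totals = temporal_group({"i.s", "i.e", "k"}, "total", SUM("v"), R)
```

Because the time stamps are grouping attributes, the third tuple forms its own group even though it agrees with the others on k.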

13.2.11 Construction

The construction of data terms is realized as an aggregation function C^X_c(A) (also written C^X[c](A)), where c is a construct term and A a set containing one or more attribute names. Like other aggregation functions, C^X_c takes as input a relation provided by the grouping operator. As output it produces a set of data terms. These data terms are constructed from the construct term c by interpreting the input relation as a substitution set (and accordingly the tuples of the input relation as the individual substitutions of the substitution set).

C^X_c(A) is more or less a black box operation defined by Xcerpt, the Web query language underlying XChangeEQ. The result of C^X_c(A)(P) is the application Σ(c) of the substitution set Σ that corresponds to π[A](P) to the construct term c as defined for Xcerpt in Chapter 7.3.3 of [Sch04]. Recall that we have met this application Σ(c) already in Chapter 9.2.3.

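$$
C^X_c(A)(P) = \Sigma_{A,P}(c), \quad \text{where } \Sigma_{A,P} := \{\, \sigma \mid \exists p \in P\ \forall X.\ \sigma(X) = p(X) \text{ if } X \in A,\ \sigma(X) = \bot \text{ otherwise} \,\}.
$$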

Keep in mind that C^X_c(A) is an aggregation function (like COUNT or MAX), not an algebra operator (like σ or ⋈). It can only be used inside a temporal grouping expression.

Note that, as already mentioned in Section 13.1.6, other construction operations, for example from query languages other than Xcerpt, could easily be integrated into CERA. All we need for this is an appropriate construction function C^L for that language. This shows that CERA is a very general formalism for the evaluation of complex event queries that is applicable beyond RelEQ and XChangeEQ.

13.2.12 Matching

Obtaining variable bindings by matching the query term q of a simple event query event i:q against data terms of events is realized by the matching operator Q^X_{i:q} (also written Q^X[i:q]). The input of Q^X_{i:q} is a relation R with schema sch(R) = {e.s, e.e, term}. The relation contains one tuple for each simple event; its start time is the value of e.s, its end time the value of e.e, and its data term the value of term. The result Q^X_{i:q}(R) is a relation that contains a tuple for each substitution obtained from matching the query term q against all the term values in the input relation. These tuples have an attribute for each free variable in q, and additionally the three administrative attributes i.s, i.e, and i.ref, which will be detailed shortly.

Note that matching of q against a single data term yields a set of substitutions (not a single substitution). Now, Q^X_{i:q} matches against all data terms in R, and each of these terms yields an individual set of substitutions. The result relation Q^X_{i:q}(R) is however just a “flat” set of tuples for substitutions, not a set of sets. We therefore need a way to reconstruct from the flat set Q^X_{i:q}(R) the tuples belonging together because they were obtained from the same simple event.

To this end, tuples in Q^X_{i:q}(R) contain the additional attribute i.ref. This attribute, called event reference, is an identifier that tells us which tuples belong together. For tuples that were obtained from the same simple event, the values of i.ref are the same; for tuples obtained from different simple events they are different. There are no restrictions on the domain of the event reference attribute or how it is generated as long as it fulfills this purpose. It could be implemented for example by simply assigning consecutive numbers to simple events or as a memory address of the simple event. One could also concatenate the string representations of start time stamp, end time stamp and data term of the simple event, because the string obtained this way would be unique. (Note however that this is only interesting for theoretical investigations; the strings would consume an unnecessarily large amount of memory compared to consecutive numbers.) The generation of event references is handled by a designated function ref below.

The tuples also contain the start and end time stamp of the simple event they were obtained from. For convenience, the matching operator renames them from e.s, e.e to i.s, i.e.
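The bookkeeping performed by the matching operator (flattening the substitution sets, attaching event references, and renaming the time stamps) can be sketched in Python as follows. The sketch is an illustration only. The actual matching is an Xcerpt operation and is therefore passed in as a parameter; the function toy_match is a purely hypothetical stand-in that has nothing to do with Xcerpt's matching, and event references are generated as consecutive numbers, one of the options mentioned above.

```python
from itertools import count


def matching(i, match, q, R):
    """Flatten the substitutions obtained for each simple event into tuples."""
    next_ref = count(1)
    out = []
    for event in R:
        ref = next(next_ref)  # one event reference per simple event
        for sigma in match(q, event["term"]):
            out.append({
                **sigma,                # one attribute per variable binding
                i + ".s": event["e.s"],  # time stamps renamed from e.s, e.e
                i + ".e": event["e.e"],
                i + ".ref": ref,
            })
    return out


def toy_match(q, d):
    """Hypothetical stand-in: bind the variable of q to each child of d."""
    label, variable = q
    return [{variable: child} for child in d[1]] if d[0] == label else []


R = [
    {"e.s": 1, "e.e": 1, "term": ("order", ["a", "b"])},
    {"e.s": 2, "e.e": 2, "term": ("order", ["a"])},
    {"e.s": 3, "e.e": 3, "term": ("ship", ["z"])},
]
result = matching("i", toy_match, ("order", "X"), R)
```

The first event yields two tuples that share the event reference 1, the second event yields one tuple with reference 2, and the third event yields no tuple because the toy matching returns no substitution for it.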

The matching itself is (like construction) a black box operation realized by Xcerpt. Below we write it as a function match(q, d), where d is a data term and q is a query term that does not contain negated variables. Negated variables in query terms are those that occur

In document Eckert, Michael (2008): Complex Event Processing with XChangeEQ: Language Design, Formal Semantics, and Incremental Evaluation for Querying Events. Dissertation, LMU München: Fakultät für Mathematik, Informatik und Statistik (Page 167-174)