Desiderata and Design Decisions - Eckert, Michael (2008): Complex Event Processing with

The previous section explained only the basic task of event query evaluation; it still provides little guidance for actually developing an evaluation method. We now describe the desiderata for operational semantics that form the foundation of an efficient event query evaluation method. We also describe some of the design decisions that have been made in the operational semantics of XChangeEQ.

12.2.1 Incremental Evaluation with Intermediate Results

For efficiency reasons, it is desirable to use an incremental, data-driven evaluation method for complex event queries and rules. An incremental evaluation ensures that an evaluation step produces only the required output, i.e., events with occurrence time now. Events with an earlier occurrence time < now, which have been in the output of earlier steps, need not and will not be produced again.

Different evaluation steps often require computation of the same intermediate results. It is often desirable to store and update such intermediate results across steps to avoid recomputing them in every step. Note however that storing intermediate results consumes additional memory. Therefore, there is always a trade-off between memory usage and computation time involved.

To illustrate the importance of incremental evaluation and intermediate results, consider evaluating the following event query rule:

d(x, y) ← i: a(x), j: b(x, y), k: c(y).

A naive way to evaluate it might be to maintain sets of a, b, and c events as event histories. Whenever some event happens at time now, we perform an evaluation step. In the step we first add the new event to its corresponding history. Then we use the event histories to evaluate the event query from scratch with some traditional, non-event method. Because of the shared variables in the simple event queries, this essentially means performing a three-way join (in the sense of relational algebra) of the sets of events stored in the event histories. The result of this join, however, contains not only our desired output of complex events with occurrence time now. It also contains complex events with occurrence time < now. We therefore need to filter the result further to obtain the desired output.
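The naive method can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions that are not part of the text above: occurrence times are distinct single time points rather than intervals, a complex event occurs at the latest time of its constituent events, and the names `histories` and `naive_step` are hypothetical.

```python
from itertools import product

# Event histories: one list per event type. Each entry is (time, payload).
# Assumptions for this sketch: occurrence times are distinct time points,
# and a complex event occurs at the latest time of its constituents.
histories = {"a": [], "b": [], "c": []}

def naive_step(kind, time, payload):
    """Add one event, re-evaluate the whole rule, then filter by time."""
    histories[kind].append((time, payload))
    results = []
    # Three-way join over the complete histories, computed from scratch.
    for (ta, (x,)), (tb, (bx, by)), (tc, (y,)) in product(
            histories["a"], histories["b"], histories["c"]):
        if x == bx and by == y:
            results.append((max(ta, tb, tc), ("d", x, y)))
    # Only now discard the results that belong to earlier steps.
    return [d for (t, d) in results if t == time]
```

Calling `naive_step("a", 1, (1,))`, `naive_step("b", 2, (1, 2))`, and `naive_step("c", 3, (2,))` in this order yields the complex event d(1, 2) in the third step only. Every later step nevertheless recomputes it inside the join before the filter throws it away.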

This naive method has two major issues. First, in each step we compute far more than the necessary output. We compute not only results with occurrence time now, but all results with an occurrence time ≤ now. Only afterwards do we select the desired output from these results. In general, the number of complex events with an occurrence time < now can be expected to be much larger than the number with occurrence time now. Therefore a considerable amount of computation is wasted on producing unneeded results.

Second, each step recomputes intermediate results that have already been computed in previous steps. For example, if a step is initiated by a c event, then the binary join of the event histories for a and b events is computed as an intermediate result in the three-way join. However, previous steps have already done the same computation, and its result has not changed because the contents of these two event histories have not changed.

An incremental evaluation with intermediate results, which computes only the necessary output and stores not only incoming events in the event history but also some intermediate results, can be expected to perform much better. As we will see in later chapters, the term “incremental” derives from the idea that in each step we compute only the changes relative to the previous results given by the current incoming events. These changes, it will turn out, correspond exactly to our desired output of complex events with occurrence time now.
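To make the idea concrete, the following Python sketch evaluates the rule d(x, y) ← i: a(x), j: b(x, y), k: c(y) incrementally. It is not the operational semantics developed in later chapters, only a minimal illustration. The class name `IncrementalRule`, the hash-indexed histories, and the choice to store the join of a and b events as the intermediate result are assumptions made for this example. Events are assumed to arrive in order of occurrence time, so every newly derived result has occurrence time now.

```python
from collections import defaultdict

class IncrementalRule:
    """Incremental evaluation of d(x, y) <- a(x), b(x, y), c(y) (sketch)."""

    def __init__(self):
        self.a_by_x = defaultdict(list)   # history of a events, indexed by x
        self.b_by_x = defaultdict(list)   # history of b events, indexed by x
        self.c_by_y = defaultdict(list)   # history of c events, indexed by y
        self.ab_by_y = defaultdict(list)  # stored join of a and b, indexed by y

    def step(self, kind, payload):
        """Process one new event; return only the newly derived d events."""
        if kind == "a":
            (x,) = payload
            self.a_by_x[x].append(payload)
            delta_ab = [(x, y) for (_, y) in self.b_by_x[x]]
        elif kind == "b":
            x, y = payload
            self.b_by_x[x].append(payload)
            delta_ab = [(x, y) for _ in self.a_by_x[x]]
        elif kind == "c":
            (y,) = payload
            self.c_by_y[y].append(payload)
            # The stored join of a and b is unchanged: just look it up.
            return [("d", x, y) for (x, _) in self.ab_by_y[y]]
        else:
            raise ValueError(f"unknown event type: {kind}")
        # Propagate only the change in the stored join to the c history.
        output = []
        for (x, y) in delta_ab:
            self.ab_by_y[y].append((x, y))
            output.extend(("d", x, y) for _ in self.c_by_y[y])
        return output
```

A step initiated by a c event now performs a single lookup in the stored intermediate result instead of recomputing the join of all a and b events. The price is the additional memory for `ab_by_y`, which is the trade-off between memory usage and computation time mentioned above.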

12.2.2 Timing and Order of Events

For the presentation of our operational semantics we assume that all incoming events are given occurrence times according to a single time axis. We further assume that they are received and processed by the event query evaluation engine in the order of their occurrence times. Occurrence times can be time intervals t = [b, e], so more precisely we mean by this that events arrive ordered by the end time point e of that time interval. Recall that our domain of time points is linearly ordered (Chapter 9.1.2).

It is important that the assumptions of a single time axis and of an ordered arrival can be given up in the operational semantics. As discussed in Chapter 2.4.4, they might not be suited for distributed systems. Clocks in a distributed system cannot be perfectly synchronized and thus give rise to several time axes when events are time-stamped at different nodes. Varying network latencies in event transmission give rise to an unordered arrival of events.

While we will start with these assumptions, they are not an integral part of the operational semantics that we develop in the following chapters. The operational semantics are designed so that they can be easily modified to work without these assumptions. Typically they would be replaced by an assumption of a so-called scrambling bound [MWA+03] that limits the disorder and divergence of time axes.
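One way such a bound could be exploited is a reordering buffer placed in front of the evaluation engine. The Python sketch below is an illustration under our own assumptions, not the definition from [MWA+03]: the bound is taken to be a maximum delay `bound` in time units, meaning that no event arriving later has an end time smaller than the largest end time seen so far minus `bound`. The class name `ReorderBuffer` is hypothetical.

```python
import heapq
from itertools import count

class ReorderBuffer:
    """Restores arrival order by end time, given a bound on disorder (sketch)."""

    def __init__(self, bound):
        self.bound = bound            # assumed maximum disorder, in time units
        self.heap = []                # buffered events, smallest end time first
        self.max_seen = float("-inf")
        self.seq = count()            # tie-breaker for equal end times

    def push(self, end_time, event):
        """Buffer one event; return the events that are now safe to process."""
        heapq.heappush(self.heap, (end_time, next(self.seq), event))
        self.max_seen = max(self.max_seen, end_time)
        released = []
        # Under the assumed bound, nothing earlier than this threshold
        # can still arrive, so buffered events up to it can be released.
        while self.heap and self.heap[0][0] <= self.max_seen - self.bound:
            t, _, e = heapq.heappop(self.heap)
            released.append((t, e))
        return released
```

The events released by `push` are ordered by end time, so an engine that assumes ordered arrival can consume them unchanged, at the cost of a delay of up to `bound` time units.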

Still, starting with these assumptions simplifies the presentation of the operational semantics tremendously and makes it much easier to get the general ideas across. Also, in some applications, a simple solution where events are assigned time stamps upon reception, according to a local clock of the event query evaluation engine, is sufficient. In this case, a single time axis and ordered arrival are given by definition. When the assumptions hold, they give rise to some interesting optimizations in the query evaluation. In turn, these optimizations can often be modified to work also in cases where the assumptions do not hold.

12.2.3 (Framework for) Query Optimization

For a given event query there is not just one single way of evaluating it; there are many different ways that all achieve the same result. Such a “way” to evaluate a query is also called a query plan. Different query plans for the same query differ, for example, in the order in which they perform certain operations or in the concrete data structures used for the event histories. Consider again the rule

d(x, y) ← i: a(x), j: b(x, y), k: c(y).

For example, one query plan might choose to first combine a events with b (performing something like an equi-join on the x variable) and then combine this intermediate result with c events (equi-join on the y variable). Another might first combine c with b events, and then with a. One plan might use arrays for event histories, another hashes.
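The two plans just described can be sketched in Python as follows. The sketch ignores occurrence times and incremental evaluation and only shows that different join orders and data structures yield the same answers. The function names `plan_ab_then_c` and `plan_cb_then_a` and the sample data are made up for this illustration.

```python
from collections import defaultdict

def plan_ab_then_c(a_events, b_events, c_events):
    """Plan 1: join a with b on x by scanning lists, then join with c on y."""
    ab = [(x, y) for (x,) in a_events for (bx, y) in b_events if x == bx]
    return [("d", x, y) for (x, y) in ab for (cy,) in c_events if y == cy]

def plan_cb_then_a(a_events, b_events, c_events):
    """Plan 2: join c with b on y via a hash index, then join with a on x."""
    b_by_y = defaultdict(list)        # hash index on the y attribute of b
    for (x, y) in b_events:
        b_by_y[y].append((x, y))
    cb = [(x, y) for (cy,) in c_events for (x, y) in b_by_y.get(cy, [])]
    a_count = defaultdict(int)        # hash: number of a events per x value
    for (x,) in a_events:
        a_count[x] += 1
    return [("d", x, y) for (x, y) in cb for _ in range(a_count.get(x, 0))]

a = [(1,), (2,)]
b = [(1, 10), (2, 20), (3, 10)]
c = [(10,), (20,), (10,)]
# Both plans derive the same complex events, possibly in a different order.
same = sorted(plan_ab_then_c(a, b, c)) == sorted(plan_cb_then_a(a, b, c))
```

On this sample data plan 1 builds a smaller intermediate result (two tuples) than plan 2 (five tuples). With other data the relation can be reversed, which is why the choice of plan depends on the characteristics of the event stream.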

The performance of these different query plans will differ, and might in fact differ vastly. Performance depends highly on characteristics of the event stream and the data contained in events. Accordingly, a query plan that outperforms others on one event stream might be much worse than its alternatives on another event stream. Query optimization, that is, considering different query plans for evaluating a given query and choosing one that is expected to perform well, is a vital and deeply explored issue in database systems. We can expect query optimization to be equally important for event queries.

Operational semantics for event queries should therefore not just describe one single way to evaluate a query or rule program. They should be able to capture a whole space of different query plans and provide a framework for query optimization (e.g., through rewriting query plans).

12.2.4 Soundness, Completeness, Termination

Declarative semantics for XChangeEQ have been given in the form of a model theory and associated fixpoint theory (Chapters 9 and 10). It is obvious that operational semantics should be sound and complete w.r.t. the declarative semantics. Soundness here means that any answer produced by the operational semantics is also an answer according to the declarative semantics. Completeness means the converse: anything that is an answer according to the declarative semantics will be produced by the operational semantics. Another term for soundness and completeness taken together, more common in the field of algorithms, would be (partial) correctness. Since XChangeEQ is a rule-based language, the terminology from logic programming is deemed more suitable, however. Soundness and completeness are of course more an obvious requirement than a desideratum. However, they lead to an important desideratum and are therefore listed in this section: our operational semantics should make proving soundness and completeness reasonably simple. Complicated proofs would not only be prone to contain oversights or errors (i.e., not be proofs at all); they might also be an indication that optimization (e.g., in the form of rewriting query plans) is difficult.
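Stated as set inclusions, the two properties read as follows. The notation is introduced here only for illustration and is not taken from the chapters cited above: P denotes a program, E an event stream, and Ans_op and Ans_decl the sets of answers according to the operational and the declarative semantics, respectively.

```latex
\begin{align*}
\text{Soundness:}    &\quad \mathit{Ans}_{\mathrm{op}}(P, E) \subseteq \mathit{Ans}_{\mathrm{decl}}(P, E) \\
\text{Completeness:} &\quad \mathit{Ans}_{\mathrm{decl}}(P, E) \subseteq \mathit{Ans}_{\mathrm{op}}(P, E)
\end{align*}
```

Taken together, the two inclusions state that both semantics yield exactly the same answers for every program and event stream.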

Event query evaluation is not a simple algorithm that just runs once. It is a step-wise procedure that involves updating and garbage collecting information in the event histories across steps. Operational semantics that can be proven sound and complete are therefore a considerable challenge. We will see that the operational semantics of XChangeEQ use several intermediate representations and transformations between them to get from a given XChangeEQ program to the final query plan. Arguably, this chain of transformations makes the operational semantics a bit long-winded to understand at first; however, it helps greatly in proving correctness: the individual transformations are all quite intuitive and their correctness is easy to see.

Along with soundness and completeness comes the question of termination of the event query evaluation. Since we work on unbounded streams, termination here means that every evaluation step terminates, not that the whole evaluation terminates (which it should not on infinite streams). Termination is of course desirable. However, there is a trade-off between a language’s expressiveness and termination. If the language is such that it allows an operational semantics that guarantees termination, then the language’s expressiveness is limited (in particular it is not Turing complete). A very typical approach to this dilemma is to give an operational semantics that is, in the general case, not guaranteed to terminate, but for which subsets of the language can be identified that guarantee termination.

The operational semantics for XChangeEQ focus on hierarchical programs (cf. Chapter 10.1.3). For these programs termination of each evaluation step is guaranteed. However, the operational semantics can easily be extended to stratified programs (cf. Chapter 10.1.2). In this case, some programs might lead to non-terminating evaluation steps. Usually this is simply because the evaluation step would have to produce an infinite number of events as output. In Chapter 18, we resume this discussion and look at alternatives and extensions to the current operational semantics that could avoid producing such an infinite number of events.

12.2.5 Extensibility and Applicability to Other Settings

Because querying events is a young and dynamic research area, it is desirable to develop operational semantics that are extensible and applicable to other settings than just XChangeEQ running on a single machine.

The design of XChangeEQ already anticipates certain points where the language might be extended. These include: new calendric systems for generating absolute and relative timers and for expressing temporal conditions, new relationships between events such as causal or spatial relationships, enriching events with (non-event) database data, or using a different data model and query language for simple events. Our operational semantics should be able to accommodate such extensions with relative ease. In the same direction, if possible the operational semantics should be suitable not just for evaluating XChangeEQ programs. Ideally they should provide a common basis that could also be used for implementing other, different event query languages (e.g., composition-operator-based languages).

Further, our operational semantics should not just be usable in a setting where an event query program is evaluated with a fixed query plan on a single machine. They should, for example, also provide a suitable basis for investigating adaptive query evaluation techniques that modify the query plan during its execution, distributed and peer-to-peer evaluation of event queries, or event query evaluation in mobile systems with limited connectivity, bandwidth, and computation resources.

In document Eckert, Michael (2008): Complex Event Processing with XChangeEQ: Language Design, Formal Semantics, and Incremental Evaluation for Querying Events. Dissertation, LMU München: Fakultät für Mathematik, Informatik und Statistik (Page 155-158)