• No results found

Streaming workflow transformation

N/A
N/A
Protected

Academic year: 2020

Share "Streaming workflow transformation"

Copied!
94
0
0

Loading.... (view fulltext now)

Full text

(1)

. . .

Plot

. .

V

plots .

Model

. .

V

predictions . .

V

sum .

Σ

.

Avg

. .

V

avg .

w

3 .

w

4 .

w

4 .

w

3 .

w

4 . .

V

all .

.

w

1

.

w

2 . .

T

1 .

T

2 .

.

T

3 .

.

w

e .

w

e .

w

e

.

Streaming Work ow

(2)

(3)

Streaming Work ow

Transformation

Master’s esis in Computer Science

Tjalling van der Wal

February

(4)

(5)

Abstract

is thesis presents . a formal model for streaming work ows adapted for transformation and . transformation rules for streaming work ows de ned according to that formal model. e validity of the transforma-tion rules is demonstrated by formally proo ng equivalence. e validity of the formal model is demonstrated by the fact that valid transforma-tion rules can be de ned.

(6)

(7)

Contents

Abstract v

Contents vii

List of Figures ix

Introduction

. Monitoring a River . . . .

. Sensor Data Problems and Challenges . . . .

. Streaming Work ow System . . . .

. Optimization . . . .

. Objectives . . . .

. Research Questions . . . .

. esis Structure . . . .

Formal Model

. Monitoring a River with a Streaming Work ow . . .

. Behavior of Activities . . . .

. Assumptions on the System Runtime . . . .

. Design Choices . . . .

. Position in the System. . . .

Equivalence

. Monitoring a River the Same Way, but Different . . .

. Equivalence of Activities . . . .

. Equivalence of Views . . . .

. Equivalence of Streaming Work ows . . . .

(8)

Transformation Rules

. Zip Rule . . . .

. Copycat Rule . . . .

. Shared Loop Rule . . . .

. Copy Elimination/Bypass Rule . . . .

. A Multi-Step Example . . . .

. Monitoring a River Slightly Differently . . . .

Tuple-based Transformation Rules

. Union Associativity Rule . . . .

. Selection Pushdown Rule . . . .

. Masquerade Rule . . . .

. Other Rules . . . .

Advanced Transformation Rules

. Aggregate Split Rule . . . .

. Shared Join Rule. . . .

Related Work

. Complications . . . .

. Query Graph Transformations . . . .

. Adaptive Processing . . . .

. Optimization Criteria . . . .

. Provenance . . . .

. Approximation . . . .

Conclusions

. Monitoring a River: Overview . . . .

. Answers to the Research Questions . . . .

. Results and Contribution . . . .

. Suggested Approach to Building an Optimizer . . . .

. Unanswered Questions . . . .

. Evaluation . . . .

(9)

List of Figures

. Optimization . . . .

. Monitoring a River to Detect Groundwater In uxes . . .

. Types of Window Sequences . . . .

. Regular Window De nition. . . .

. Fresh Tuples . . . .

. Alignment of Window Sequences. . . .

. Subalignment of Window Sequences . . . .

. Position of the Formal Model . . . .

. Zip Rule. . . .

. Copycat Rule . . . .

. Copycat Rule LHS and RHS Combined . . . .

. Partitioned Timeline . . . .

. Shared Loop Rule . . . .

. Copy Elimination Rule . . . .

. Copy Bypass Rule (Righthand Side) . . . .

. Example of a Multi-step Work ow Transformation . . .

. Union Associativity Rule . . . .

. Selection Pushdown Rule . . . .

. Masquerade Rule . . . .

. Single Input Union Rule . . . .

. Idempotent Activities Rule . . . .

. Collapse/Split Select Rule. . . .

. Intermediate Project Rule . . . .

. Aggregate Split Rule . . . .

. Computational Costs in the Aggregate Split Rule . . . .

. Shared Join Rule. . . .

(10)

(11)

Chapter One

Introduction

is thesis presents a formal model and transformation rules for stream-ing work ows. In this chapter the motivation, objectives and research questions are introduced. First, this chapter introduces an example of a streaming work ow that will be used throughout this thesis.

.

Monitoring a River

An environmental scientist wants to devise a system to detect ground water in uxes that indicate weaknesses in dikes along a river. Part of his research involves deploying sensors to measure water temperature, ow and saline levels. e equipment used is advanced and capable of transmitting the measurements wirelessly to the research centre, some-times as frequent as every second when required.

To detect potential natural hazards as soon as possible, a computer system at the research centre continuously processes the fresh data and compares it to con gured thresholds and predicted values. To reduce false positives, the system utilizes sliding averages and historic data.

e processing done by the system has been de ned and con gured by means of a work ow speci cation. is work ow is called a stream-ing work ow, because it is used to process streams. In a stream of data, the individual data items are related and the order of the data has seman-tic signi cance. (As opposed to business process work ows, in which the work items are similar but independent.)

(12)

Both work ows independently calculate averages over the same mea-surements. Luckily the system is smart enough not to calculate the av-erages twice.

.

Sensor Data Problems and Challenges

e system from the example does not exist (yet). Section . proposes such a system. is section provides the motivation for building it.

Problems e utilization of measurements is often limited to the per-son or team that has collected them. at is a shame because collect-ing measurements is expensive. e low utilization of measurements is caused by the following problems:

Data is not stored centrally. Measurements are stored on personal desktops: not available to other researchers or teams; they may not even know about it. After a researcher leaves the project, the measurements are not accessible any more.

Data integration is hard. Conventions differ and detail is missing. e quantityT could be either ground, soil or air temperature. is makes it hard to combine and compare measurements from different projects.

Data governance is difficult. After a researcher leaves or after a project is nished, nobody knows how measurements were col-lected and how they were processed. How frequent did the sensor ‘sense’? Is the data raw or has it been processed?

e processing is manual and ad-hoc. Everyone uses their own favorite tool, like Matlab, Excel, SPSS or custom scripts. is exacerbates the other problems.

Each of these problems makes it harder to collect and process sensor data in such a way that measurements can be easily shared and reused.

(13)

Although sensor deployments are expensive, individual sensors have become cheaper. is means more sensors can be deployed at the same cost and more data can be collected. But this also means that the amount of data to manage becomes larger.

Another challenge is included in the example in the previous sec-tion: streaming data. Instead of arriving periodically in batches, the data arrives as a continuous stream. is means real-time monitoring is possible, provided a more robust way of managing data is developed.

.

Streaming Work ow System

e solution to these problems and challenges is to build a system such as already described in Section . . e system will process incoming sensor data according to processing instructions speci ed by the user by means of streaming work ows: it is a Streaming Work ow System.

In ‘Composable data processing in environmental science — a pro-cess view.’ [ ] the same problems have been described and a similar system is suggested.

Scope of the System Although the example in Section . and some of the problems in Section . are speci c to environmental sciences, the ambition is to build a generic system that can be used in many more sit-uations. Examples include administrative systems, monitoring systems and control systems, in accounting and security domains. Considering that a cash register can be seen as a sensor that observes sales, it is logical not to limit the Streaming Work ow System to environmental science applications.

Functions and Requirements for the System A Streaming Work ow System must perform the following functions:

(14)

.

.. Original.. ..Alternatives. . . Best. Alternative .

... .. Alternatives .

search

. search

. choose

.

.

this thesis

Figure . : Optimization

. e system shall manage data storage and distribution. If the data is centrally available, it can be shared and reused more easily.

. e system shall optimize the work ow speci cations submitted by users. Optimization shall be done both within and between work ows. Criteria for optimization include computational and storage cost. Optimization is needed to let the system scale in terms of users, data volume and data rate.

ese three requirements are sufficient to tackle the problems and chal-lenges from Section . and to make the example from Section . re-ality. is thesis contributes to the third requirement: optimization of streaming work ows.

.

Optimization

Optimization is a requirement for the Streaming Work ow System. But what is optimization? And what is needed to build an optimizer?

[image:14.446.95.351.99.212.2]
(15)

on alternatives that have been found earlier. In the end, from all the alternatives found, the best one is chosen.

As example, consider the algebraic formulax= (a+b)(a+c). By applying algebraic rules the following alternatives can be found:x= (a+c)(a+b)andx=a∗(b+c). e second alternative is better because it requires less computation to ndx.

Searching for Alternatives e search for alternatives has two prereq-uisites: . a formal way to describe both the original and the alternatives and . a formal de nition to tell whether an alternative is valid. In the algebraic example the formula itself is a formal description and any for-mula that evaluates to the same value is a valid alternative.

In the work ow example from Section . , work ows are described by a speci cation. Searching for alternatives is done by applying trans-formation rules to a streaming work ow speci cation to create alterna-tives, just like algebraic rules are used to create alternative formulae. e contribution of this thesis is de ning transformation rules for streaming work ows.

Choosing an Alternative Which alternative is best depends on the context. In case of the algebraic example, the original formula is the best when trying to solve forx= 0, but the alternativex=a∗(b+c)is the best when computing the value. e selection of the ‘best’ alternative streaming work ow speci cation is outside the scope of this thesis.

.

Objectives

is thesis tries to contribute to the idea of a Streaming Work ow Sys-tem by showing the feasibility of building such a sysSys-tem, in particular the feasibility of creating an optimizer for streaming work ows. e objectives of this thesis are:

(16)

It is assumed that optimization is possible if transformation is possible. erefore this thesis is limited to streaming work ow transformation.

. De nea formal model for streaming work ows with transforma-tion in mind. An existing formal model for streaming work ows is used as a starting point.

. De netransformation rules for both generic and speci c activi-ties, and for a reasonable set of common activities (including re-lational).

.

Research Questions

To ful ll the objectives, this thesis will answer the following research questions:

. How must the (existing) formal model be adapted to allow for transformation?

. What is a suitable de nition of equivalence between work ows?

. Can transformation rules be de ned for common types of activi-ties?

a) Can rules be de ned for generic situations?

b) Can rules be de ned for common (relational) activities?

c) Can rules be de ned for aggregates and join activities?

By answering these research questions, the second and third objective from Section . are ful lled. e rst objective is indirectly ful lled by ful llment of the second and third objective.

.

esis Structure

(17)

In Chapters , and transformation rules valid within the formal model are de ned. ese chapters can be read selectively, but in partic-ular the Copycat Rule (Section . ) should be read because for this rule a full proof is provided that uses and illustrates most concepts from the formal model.

Section . presents an example that shows how two independent but similar streaming work ows can share computation through the application of multiple transformation rules. Section . shows how transformation rules can be used to optimize the ‘Monitoring a River’ example as the work ow evolves during a research project.

(18)

(19)

Chapter Two

Formal Model

Before a streaming work ow can be transformed, a formal description of the work ow is required. e description must be formal because . the description must be manipulatable for a computer system and . it must be possible to proof the equivalence between work ows. is chapter introduces the formal model used to describe streaming work ows.

In the rst section the ‘Monitoring a River’ case from the introduc-tion is used as an example to introduce the core concepts of the formal model. In Section . the behavior of a streaming work ow is formal-ized. In Section . assumptions on the system’s runtime are listed. ese are needed to make the model work and to complete the proofs of transformation rules. Section . discusses design choices that have in uenced the formal model.

Origins e formal model presented in this chapter is based on the formal model described by Wombacher in ‘Data Work ow — A Uni-ed Time ControllUni-ed E-Science Processing Model’ [ ]. Compared to the original model, this new model trades expressive power for the predictability needed to do transformations.

It is important to note that although the formal model is less expres-sive, it does not mean that a Streaming Work ow System would be less powerful. More powerful constructs can be offered by the system, with the limitation that not all of those can be transformed and optimized using the formal model presented here.

.

Monitoring a River with a Streaming Work ow

(20)

. . . . .T emper atur e S ensor 1 . .

. T.V

. . . . . . . . . . . . . . .

. T.V

. . . Avg . .

. T.V

. . .T emper atur e S ensor n . .

. T.V

. . . . . . . . .Inux Detector . . . . V influxes . . .F lo w S ensor 1 . . . . . . . . Avg . . . . . . Sum . . . . . . . . Subtract . . . . V alarms . . .F lo w S ensor n . . . . . . . . . . . .Data Entr y . . . . V m 3 / minute . V m 3 / minute . V kno wn inux lo cations . V m 3 / hour . V m 3 / hour

.

.sensors

.

.lter [image:20.446.184.350.84.573.2]
(21)

Activity Output (content of view)

Purpose

.

.

.Activity ... .

Temperature Sensor T inC at location

xat timet

Observe river

Flow Sensor Flow inm3/sat

lo-cationxin time in-tervalt

Observe river

T inC at location

xat timet

Eliminate values

outside

per-centile to address sensor errors

Avg(T) AverageT inC at timetover last hour

Calculate average temperature in river

Avg(Flow) Average ow in

m3/s in time intervalt

Use average as ‘weighted majority vote’ to reduce impact of sensor error

Sum(Flow) Total ow in

m3/hourat timet Calculate total owin last hour

In ux Detector Detected in uxes at location x during time intervalt

Does what this work ow was cre-ated for; all the other activities exist to feed data to this activity.

Data Entry List of known in ux locations

Manually record lo-cations with known issues or special cir-cumstances. Subtract List of alarms

with-out false positives

Eliminate known in ux locations from list of in uxes.

[image:21.446.96.351.94.518.2]
(22)

using this example.

To detect in uxes, the scientist has written an algorithm that looks for anomalies in the combination of water temperature and water volume. Before sensor data can be fed to this algorithm, the data has to be l-tered to address sensor errors and has to be aggregated. After the in ux detection, known locations have to be ltered out to produce a list of locations on the dike that need inspection. So the work ow has the following steps: . record the data, . lter the data, . aggregate the data . execute detection algorithm, . record known in ux locations and . eliminate false alarms.

Figure . shows the streaming work ow created by the scientist. Table . details the purpose of each individual component in the work ow. Together these components perform the six required steps.

e work ow starts with the sensors deployed in the river; period-ically they send data to the research centre. In the formal model ev-erything that produces or processes data is anactivity. e activities representing the sensors are shown on the left of the work ow as blue boxes. Associated with each activity is aviewthat stores all data pro-duced by an activity. In the work ow, views are depicted as small green circles.

e recording of sensor data is represented by the sensor activities and their corresponding views. e second step to perform is to lter the data to reduce the impact of sensor error. For temperature data, a activity periodically reads the data from all the sensor views and produces a new stream from which outliers have been removed. For the ow data, the average over all sensors is calculated by anAvgactivity.

e output from both activities is again stored in views.

e third step, aggregation, is similar to the ltering: an activity reads the data from a view, applies the required computation and pro-duces a stream which is stored in a new view. is results in two views: one with a sliding average temperature and one with a sliding total ow volume.

(23)

As a sixth and nal step the list of locations is ltered by means of a manually maintained list that is produced by step ve.

Two phrases in the description above have been deliberately left vague: ‘periodically’ and ‘reads the data’. is raises two questions:when does an activity execute? andwhich data is used for an execution?. e next section will answer those questions.

.

Behavior of Activities

is section describes the behavior of activities. Activities are the active components of a streaming work ow, as opposed to views that just store data and thus have no behavior. e behavior of activities is de ned by two components: .whento execute and .what input datato use. Both are speci ed using window sequences.

De nition : Window Sequence A window sequence is a regular and predictable sequence of windows on a view. Each window represents a single activation of an activity and the selected data from the view to use for that activation. A window sequence is either de ned in terms of time or in terms of arriving tuples.

When an activity must be activated (or executed) is determined by an arithmetic sequence of timestamps or tuple-ids. Elements of such a sequence are called reference points.

De nition : Reference Points e reference points of an activity are an arithmetic sequence generated by[epoch+n∗delta|n∈N0]. For

time-based activitiesepochanddeltamust both be valid time units. For tuple-based activities both must be tuple counts.

An activity can have a different window sequence for each input re-lation, but they all share the same reference points.

(24)

.

.xed .view

.

.

.

. .

landmark .view

. n . . . . . . . .

partitioning .view

. n . . . . . . . . . .

sliding .view

. n . . . . . . . . . .

hopping .view

. n . . . . . .

[image:24.446.127.320.163.420.2]
(25)

What data must be selected from input views is de ned relative to the reference point of the current activation. ree basic types of window sequences are supported.

De nition : Regular Window Sequence For regular windows, the lower and upper bound of the data interval selected from a view is de-ned relative to the reference point. e interval de nition is:⟨reference point−(ws+offset). . .referencepoint−offset].wsis thewindow sizeof the window andoffsetallows to specify the distance of the window to the reference point. Figure . visualizes this.

ere are exactly three subtypes of regular window sequences based on the relation between the window size (ws) and the distance that the reference point shifts between each window (delta). ese three sub-types are partitioning (ws= delta), sliding (ws>delta) and hopping (ws<delta). Figure . shows them.

A partitioning window sequence has two speci c properties: all win-dows are disjoint: ∀i, j : wi∩wj = (mutually exclusive) and all windows together cover the entire view: ∩wi = V (collectively ex-haustive). A hopping window sequence ignores part of the data in the input view.

.

.

ws .offset

. lower bound

. upper bound

. reference

point

Figure . : Regular Window De nition

De nition : Landmark Window Sequence For landmark windows the lower bound is xed and the upper bound is de ned in the same way as for regular windows. e interval de nition is:⟨landmark. . .reference point−offset].

[image:25.446.130.317.347.404.2]
(26)

De nition : Window (Instance) A single window from a window sequence; obtained by substitution of the reference point of the current activation into the window sequence de nition. It serves as the data se-lection interval for the current activation. e lower bound of a window is exclusive and the upper bound is inclusive.

De nition : Standard Tuple-based Window Sequence For tuple-based activities a special window sequence is de ned. e standard tu-ple based window sequencewe for event based activities is de ned as

delta= 1,ws= 1,offset= 0.

When activated, an activity is not just called with a bag of input tuples. e work ow system also reports the interval used and all the reference point sequence and window sequence information.Unionand Joinactivities need this information to distinguish between fresh tuples and tuples they ‘have seen before’.

Knowing the speci cation of the window of the current activation, an activity can calculate the bounds of the previous window. e latter can be used to determine which tuples ‘have been seen before’ and which tuples are fresh.

De nition : Fresh Tuples A fresh tuple is a tuple that was not in-cluded in an earlier window instance. Figure . shows windows from a sliding window sequence with fresh tuples marked orange. In parti-tioning and hopping windows, all tuples are fresh.

. .

sliding .view

. n

.

.

.

.

.

.

.

.

.

.

.

.

.

. .

n= 0

[image:26.446.125.318.457.511.2]
(27)

. .

w1

.

w2

Figure . : Alignment of Window Sequences

. .

w1

.

w2

Figure . : Subalignment of Window Sequences

An important property of reference points and window sequences is alignment. is property is used in proofs in Chapter .

De nition : Window Sequence Alignment Two window sequences

w1 andw2 aligniff delta(w1) = delta(w2) and∃n : epoch(w2) =

epoch(w1) +n∗delta(w1).

Becausenis allowed to be negative, alignment is an equivalence rela-tion: it is re exive, symmetrical and transitive. Reference to alignment of activities is shorthand for alignment of their respective window se-quences.

De nition : Window Sequence Subalignment Two window se-quencesw1andw2subaligniff ∃k∈N1 :k∗delta(w1) =delta(w2)

and∃n∈N:epoch(w2) =epoch(w1) +n∗delta(w1).

[image:27.446.133.318.93.118.2] [image:27.446.133.317.161.186.2]
(28)

.

Assumptions on the System Runtime

To make the formal model function, several assumptions must be made on the Streaming Work ow System runtime.

Assumption I: e System is In nitely Fast. e model has a for-mal and mathematical nature. Execution time is not something that concerns the model. erefore the model assumes that execution of ac-tivities is fast; so fast that execution takes zero time. is assumption has a very nice effect on the notion of equivalence: since execution is in nitely fast, intermediate states of the system will not be observable and will thus not affect equivalence.

Assumption II: e System Ensures Correct Execution Order. When more than one activity is enabled for the same reference point, the run-time of the Streaming Work ow System will ensure that an execution order is chosen that respects the input-output dependencies between the activities. When an activity is executed it is guaranteed that all in-put views are complete for all reference points less or equal the current reference point.

e system may not be able to ful ll this assumption when the work-ow speci cation contains cycles. is thesis ignores cyclic subgraphs. Cyclic subgraphs must be treated as a single non-transformable activity.

Assumption III: e Initial System is Empty. e work ow model does not allow orphaned views. As a result any data in the system must have been added to the system through an activity. is includes historic data and (con guration) data that is added manually. is assumption is needed to proof the correctness of transformations.

.

Design Choices

(29)

Design Choice I: Streaming and Querying are the Same. Wombacher [ ] makes a distinction between a streaming mode of operation and a special query mode. is model chooses to treat them the same. To illustrate this in a more technical manner: Suppose the system is like a functional programming language and the sensor data is stored in lists. To process the data, you need mapping and folding functions. But: these functions can be applied to both nite and in nite lists and also to lists that are still being generated. For the functions there is no dif-ference between stream processing fresh data and batch processing his-toric data. Likewise the streaming work ow model does not distinguish streaming and querying modes.

e reason for this choice is that it allows the formal model to be ag-nostic of the current time;epochand other time variables are not limited toNowand the future but can also be in the past.

It is personal belief that this makes the model cleaner and less com-plex but does not needlessly complicate the implementation of the sys-tem runtime.

Design Choice II: Triggers are Ugly. In [ ], triggers are used to de-termine when activities should be executed. Triggers are not declarative and their semantics are hard to capture, especially the fact ‘ week’ does not mean the same as ‘ days’. Also a trigger alone can not be ma-nipulated by transformations because it does not describe what data an activity needs for an activation. erefore in this model triggers have been replaced by window sequences.

Design Choice III: Work ows are Immutable. Once submitted to the system by the user, a work ow is immutable. When a work ow needs to be updated, the user must submit a new work ow to the system. When-ever is said that a work ow is updated, it means that a new work ow derived from the original is added to the system.

e rationale for immutable work ows is data governance: because work ows cannot change, it is always possible to tell how a certain tuple was generated.

(30)

Design Choice IV: All Activities are Stream Deterministic. Wom-bacher [ ] distinguishes deterministic and non-deterministic activities. In this thesis, this distinction is ignored and all activities are considered to be stream deterministic: given the same data stream, a just initial-ized instance of an activity always produces the same output stream. Whether an activity carries internal state across subsequent activations is irrelevant for this type of determinism.

e kind of determinism described in [ ] is much stronger as it is de ned in terms of individual activations, theoretically allowing out-of-order execution. is sort of optimizations is outside the scope of this thesis.

Stream determinism does require that the output of activities solely depends on the current and previous inputs. e results should not de-pend on (in particular) the actual execution time. is gives the system the exibility to delay execution, but also means that transformations can cause earlier execution of activities without breaking equivalence.

Sensors and data entry activities in a work ow are not stream deter-ministic. is is not a problem because these activities are the leafs of the work ow graph and can not be eliminated anyway for that reason.

.

Position in the System

is section describes the position of the formal model within a Stream-ing Work ow System. DescribStream-ing what the modelisand, in particular, what itis not, has helped me considerably to think about transformation rules in isolation.

e Model is Not a User Interface. e formal model is not intended as a user interface. A real system will provide a higher level of abstrac-tion. Because the model is not a user interface, seemingly arbitrary and counter-intuitive limitations and assumptions can be made about

work-ows.

(31)

. .

Streaming Work ow System

.

.

.

.

.

. .User Interface

.

Formal Model

.Implementation

.

Relational Database

.

.

.

.

.

. .SQL

.Relational Algebra

.Physical Operators

Figure . : Position of the Formal Model in a Streaming Work ow System.

a single algorithm. Because the model is not an execution plan, it is no problem when a transformation rule produces a work ow that seems less efficient.

So What is the Model? e formal model is an abstraction layer that enables the system to transform streaming work ow speci cations into anyshape. e desirability of a shape is no concern in the model.

[image:31.446.129.323.97.170.2]
(32)

(33)

Chapter ree

Equivalence

In Section . , optimization was summarized as ‘searching for an alter-native that is better than the original’. One of the questions that arises from that description is: what de nes a valid alternative? When is a streaming work ow a valid alternative for another streaming work ow? A work ow is a valid alternativeiff it can be interchanged with the original without changing the output of the system. e formal concept for ‘valid alternative’ or ‘interchangeable’ is equivalence. Two work ows are valid alternatives for each other when they are equivalent.

Equivalence is denoted by theoperator. In this thesis the nota-tions=tand≡tare used to state that an equality or equivalence is valid at a certain point in time.

.

Monitoring a River the Same Way, but Different

Previous chapters have used the example of a scientist who wants to observe dikes along a river to detect groundwater in uxes and his col-league who studies the effects of climate change on the river. What does equivalence mean to them?

To the user of a Streaming Work ow System equivalence means that he gets the results he expected; that the dataapparentlyhas been pro-cessed in the way he has speci ed the data to be propro-cessed. e system must deliver predictable, reproducible and traceable results to guarantee the scienti c validity of his research.

(34)

Equivalence simply means that two different work ows deliver the same results. is chapter formalizes this intuitive de nition to make it possible to proof that transformation rules preserve equivalence in Chapter .

.

Equivalence of Activities

is chapter describes equivalence bottom-up, so this section starts with describing equivalence of activities. Intuitively, activities are equivalent when they produce identical output from the same input.

De nition : Equivalent Activities Two activities are de ned to be equivalentiff . they run the same code, . this code is stream determin-istic and . they have the same con guration. According to Assumption IV, the second condition is always true.

Note that sensors and other activities at the leafs of the work ow are never equivalent. One could either argue that they do not run code in the sense intended in the de nition of equivalent activities or that the location of their deployment is part of their code. Anyhow, the rst condition is false.

In transformation rules, activities are considered equivalentiff they have the same label. If the labels are different, then the activities are not equivalent. is does not prevent transformation rules from stating that two activities are interchangeable because work ow equivalence will be de ned in terms of view equivalence.

.

Equivalence of Views

With equivalence de ned for activities, the equivalence of views can be de ned.

(35)

e content of a view for the purpose of equivalence consists of all records (including empty records), with all the attributes in the user de ned schema. In addition theidealized transaction time(part of the system’s metadata) attribute is included; this is the timestamp used by interval predicates to select data from views.

Addingtuple idto the content would make equivalence much harder, but not adding it restricts the kinds of tuple based triggers that can be transformed. is choice requires further consideration.

Equivalence of views can be very simple. e following lemma for-malizes a trivial case of equivalence between views. It is a special case of the de nition above.

Lemma : Trivial Equivalence If two views are . produced by equiv-alent activities, . from equivequiv-alent input views and . with the same window sequence,thenthey are equivalent according to De nition .

e rst and second conditions must be (indirectly) true for shar-ing to be possible at all: when doshar-ing somethshar-ing different with the data, nothing can be shared and likewise when the data is not from a com-mon source. A lot of transformation rules target situations where the third condition is not true, but where there is some relation between the window sequences that causes a subsumption relation between views.

A simple proof of Lemma can be given by induction over time. Given two views for which all three conditions hold: VAwhich holds the output of activityAandVBwhich holds the output of activityB. Att = 0both views are empty according to AssumptionIII, so their content is the same. At timetthe content of both views is still the same. At timet+ 1bothAandBare activated and . read the same data interval from equivalent views, . perform the same operation on the same data and . produce the same output tuples which are appended toVAandVB. So at timet+ 1the contents ofVAandVBare still the same. usVAandVBare equivalent by De nition .

(36)

Strict Equivalence Weak Equivalence

. Zip

. Copycat

. Shared Loop

. Copy Bypass . Copy Elimination

. Union Associativity . Selection Pushdown . Masquerade . Single Input Union

. Idempotent Activities

. Collapse and Split Select . Intermediate Project

. Aggregate Split . Shared Join

[image:36.446.107.338.208.409.2]
(37)

.

Equivalence of Streaming Work ows

Based on the de nition of equivalent views, the de nition of equivalent work ows can be given.

De nition : Equivalent Work ows (Strict) Two streaming work-ows are equivalentiff for each view in either work ow, an equivalent view exists in the other.

Many transformation rules introduce views to the work ow to store intermediate results that can be shared and reused. As a result these rules are not valid under the strict de nition of work ow equivalence. From a practical (user) viewpoint however additional views are no issue.

erefore a weaker de nition of work ow equivalence is added.

De nition : Equivalent Work ows (Weak) Given a subset of the views, two streaming work ows are equivalentiff for each view in the set, equivalent views exist in both work ows.

is de nition weakens the equivalence by restricting the number of views that has to have an equivalent. In practice it will be no problem that the set of equivalent views is restricted because there is an asym-metry as described next.

De nitions and are symmetrical because an equivalence rela-tion has to be symmetrical. In reality there is asymmetry: one work ow will be the speci cation written by the user and the second work ow is the internal work ow used by the system runtime. e subset used for equivalence is the set of views in the user speci cation. Addition-ally, many users are only interested in the end results produced by their work ow. Intermediate views can therefore also be removed from the subset used for equivalence.

(38)

views when the subset of views used for equivalence (i.e. user interests) changes.

Comparing Strong and Weak Equivalence Table . lists the trans-formation rules presented in the next chapters classi ed by the type of equivalence. Rules with the same type of equivalence also tend to share other characteristics.

An interesting observation is the fact that rules that adhere to strict equivalence identify duplicate views in a work ow or identify a sub-sumption relation between views. Application of these rules is always bene cial from a computational or storage perspective. Rules that only preserve weak equivalence try to make duplicate work explicit, allowing the results to be shared and the duplication to be eliminated. Addition-ally these rules can be used to trade computational cost for storage cost and vice versa.

Independence of Equivalence e de nition of equivalence given is independent of both the work ow structure and the execution model.

is has the following advantages:

Transformation rules can structurally change the work ows, as long as the required views are still present.

e formal model can be changed and implementations can di-verge from it. A realistic example would be a ltering activity or a join activity that produces two output views based on two independent predicates.

Part of a work ow de nition can use other semantics and a dif-ferent execution model. e de nition of equivalence allows the formal model to be treated as a subset of the original model

de-ned by [ ].

.

Local and Global Equivalence

(39)

Lemma : Local Equivalence is Global Equivalence When a sub-graph of a work ow is transformed, it is sufficient to show that equiva-lence is preserved in that subgraph to proof that equivaequiva-lence is preserved for the full work ow graph. As a consequence, each transformation rule can be discussed in isolation.

To proof Lemma , consider transformationT, the setWof all views in a work ow and a subsetS⊆Wof views affected byT. Equivalence is de ned under the set of viewsE ⊆W (for strict equivalenceE =

W).

For all viewsE∩Sit is guaranteed that equivalence exists on both sides of the transformation, becauseT is a valid transformation rule. e viewsE∩(W\S)are not affected by the transformation rule, so these do not change equivalence. SinceE= (E∩S)(E∩(W\S)), equivalence underEof the complete work ow is preserved.

(40)

(41)

Chapter Four

Transformation Rules

In this chapter a series of generic transformation rules is presented. Each of these rules is valid within the formal model from Chapter . e rules demonstrate that the formal model can be used to de ne trans-formation rules for generic situations (Research Question a).

Each rule in this chapter is informally described using graphs and text. e Copycat Rule (Section . ) contains a full formal proof; for the other rules an indication is given of how a similar proof could be constructed.

Section . starts with a simple rule. In Section . a work ow is transformed using all the rules presented in this chapter and in Section . the river example is used to illustrate how the Streaming Work ow System can optimize an evolving work ow.

Describing Transformation Rules Transformation rules are described visually by two graphs that represent the before (left hand side, or lhs) and after (right hand side, or rhs) state. In addition preconditions and introduced window sequences are stated. Quanti cation of the graph structure and preconditions is handled either intuitively or by choos-ing a speci c case; further research or an implementation will have to formalize this to make the rules generic.

Applicability of Rules A transformation rule is applicable when . a subgraph match is found, . all preconditions are true, . all variables referenced by the rule are de ned and . in each precondition and vari-able de nition the lhs and rhs are of the same type.

.

Zip Rule

(42)

Before transformation (lhs):

.

.

. .Act.. ... .V

out

. . . .

Vin

. .Act.. ... .V

out′

.

w

.

w

After transformation (rhs):

. .. ...

Vin .Act.. ... . Vout

.

w

(43)

fundamental rule of all. Many other rules increase the complexity of the work ow graph, trying to create possibilities for the Zip Rule to be applied.

e Zip Rule is the embodiment of the notion of trivially equiva-lent views from Lemma . To proof the validity of the rule, start with proo ng thatVoutandVout′ are trivially equivalent. Views must meet

three conditions to be trivially equivalent:

. Produced by equivalent activities.

Check; expressed by the fact that all activities in Figure . have the same label.

. From equivalent input views.

Check; the input views of all activities in Figure . areidentical, which is stronger than equivalent.

. e same window sequence.

Check; all input relations in Figure . have the same label.

For each view on the left hand side (lhs), an equivalent view exists on the right hand side (rhs), becauseVoutandVout′are trivially equivalent.

erefore the work ow before application of the rule is strict equivalent with the work ow after application of the rule [De nition ].

.

Copycat Rule

e Copycat Rule is very similar to the Zip Rule. Only one condition from the trivial equivalence of views is not ful lled in the following sit-uation. is section shows a non-trivial proof for a rule and illustrates scenarios that a Streaming Work ow System is expected to optimize.

Scenario Although copying someone else’s work maybe a crime in an academic context, being ‘inspired’ by the way a colleague deals with cer-tain tasks is absolutely acceptable for day-to-day work. Reinventing ex-isting solutions is often not the best way to spend time.

(44)

Before transformation (lhs):

.

.

. .Act..

1 ... .V

2

. . . .

V1

. .Act..

2 ... .V

3

.

w1

.

w2

After transformation (rhs):

. .. ...

V1 .Act..1 ... .

V2

.

.

.

Copy ... . V3

.

w1 .w3

Preconditions:

Act1and Act2are equivalent

w1andw2are both time-based

epoch(w1) ̸=epoch(w2)

delta(w1) =delta(w2)

offset(w1) =offset(w2)

ws(w1) =ws(w2)

∃n >0 :epoch(w2) =epoch(w1) +n∗delta(w1)

De nition ofw3:

epoch(w3) =epoch(w2)

delta(w3) =delta(w2)

offset(w3) = 0

ws(w3) =delta(w2)

[image:44.446.132.320.46.527.2]
(45)

A user sees a work ow from a colleague and thinks that it is really a clever idea. So the user creates an (almost) identical work ow for himself. is is a literal copycat situation!

Some task is reassigned. Not familiar with the task, the new as-signee starts off by doing things the way the former asas-signee did. When he understand things better, he will make his own

modi-cations.

Some business practice is repeated regularly (e.g. every scal pe-riod). Each time the procedure is slightly tweaked, resulting in a new work ow being added to the system.

All three scenarios share one observation: the main difference be-tween two work ow speci cations is when they were speci ed. In terms of window sequences all variables are the same except for theepoch vari-able.

e Rule eBeforeandAftersituation in Figure . show the struc-tural changes to the work ow. e preconditions formalize the obser-vations mentioned before: everything is the same, except theepoch.

e preconditions contain one special requirement: theepochs must align. As a consequence of this alignment and the other preconditions, the windows de ned byw2are a proper subset of the windows de ned

byw1. is means thatV3is a proper subset ofV2. Since it is known

when Act2was de ned, all that needs to be done to createV3fromV2is

copy everything added after Act2was de ned.Copyandw3are de ned

in such a way that everything inV2that also should be inV3is copied.

Sincew3speci es the same reference points asw1, new data added to

V2is copied toV3in the same system state transition.

Formal Proof For a transformation to be correct, the work ow before application must be equivalent to the work ow after application. Be-cause equivalent work ows have equivalent views [De nition ], it is sufficient to proof thatV2 ≡Vcopyin a work ow that combines the lhs

and rhs of the transformation as shown in Figure . .

(46)

.

.

. .Act..

1 ...

.

V1

.

.

.

Copy ... .V

copy

. . . .

V1

. .Act..

2 ...

.

V2

.

w1

.

w2

.

wcopy

[image:46.446.136.312.278.339.2]
(47)

validity of the proof because theepochs cause the activities to behave as if they have yet not been de ned. e work ow system runtime treats an activity with a futureepochas non-existent. (Of course optimization can already be applied to these activities.)

Equivalence of views is de ned as ‘having the same content at all points in time’ [De nition ]. erefore the equivalenceV2≡Vcopyis

shown with a proof by induction, using the timetas induction variable.

Σt represents the state of the system at timetbefore anyactivity enabled attis executed.Σt+1represents the state of the systemafter all

activities enabled atthave been executed. For intermediate states the notationΣt′is used.

Given

. e combined work ow of both sides of the transformation. See Figure . .

. e preconditions for the transformation:

a) Act1Act2

b) w1andw2are both time-based

c) ∃n >0 :epoch(w2) =epoch(w1) +n∗delta(w1)

d) (implied by above:epoch(w1)<epoch(w2))

e) delta(w1) =delta(w2)

f ) offset(w1) =offset(w2)

g) ws(w1) =ws(w2)

. e windowwcopyas de ned by the transformation:

a) epoch(wcopy) =epoch(w2)

b) delta(wcopy) =delta(w2)

c) offset(wcopy) = 0

d) ws(wcopy) =delta(w2)

(select all since prev execution of Act1)

(48)

Basic Step Att = 0, all views in the systemΣ0 are empty. Since

empty views have identical content, the conclusion is thatV2=0 Vcopy,

by de nition of an empty system [AssumptionIII].

Induction Step According to the induction hypothesis, in system state

Σt, it is true thatV2=tVcopy. In the next system stateΣt+1, it must be

true thatV2=t+1Vcopy.

ere are cases to consider. teither aligns with all window se-quences or does not align with any. Iftaligns, then it falls into one of three sections of a timeline partitioned byepoch(w1)andepoch(w2)as

illustrated in Figure . . e rst execution of an activity is no different from any other execution of that activity. ereforet=epoch(w1)and

t=epoch(w2)are not separate cases.

.

.

epoch(w1)

.

epoch(w2)

Figure . : Partitioned Timeline

e rst thing to proof is that either all or none activities align with

t. As a result, no cases have to be considered in which some activities do align and some activities do not align.

Becauseepoch(wcopy) = epoch(w2)[ a] anddelta(wcopy) = delta(w2)

[ b],Copyaligns trivially with Act2 by de nition of alignment.

Be-causedelta(w1) =delta(w2)[ e] and [ c], Act1and Act2also align (in

fact [ e]+[ c]isthe de nition of alignment, see De nition ) Alignment is an equivalence relation (thus transitive) therefore all ac-tivities align and therefore all acac-tivities do align withtor all activities do not align witht.

Lets consider the rst case:

No alignment.

[image:48.446.127.318.259.313.2]
(49)

In the following cases, the windows of all activities align with each other and witht. Becausetis a valid reference point of at least one of the activities in the work ow system, it is calledNow. is switch of terminology is made to relate to the notation used to specify work ows more closely.

Before the rst time Act1is enabled.

Now<epoch(w1) Now<epoch(w2)∧Now<epoch(wcopy)

none of the activities is executed and thusΣt+1= Σt⇒V2=t+1

Vcopy[ d, a]

Starting the rst time Act1was enabled but before the rst time

Act2is enabled.

Now≥epoch(w1)∧Now<epoch(w2)

– Act1is executed and addsn≥0tuples toV1

– Act2is not executed soV2does not change

Copyis not executed because

epoch(wcopy) =epoch(w2)[ a] soVcopydoes not change

V2=tVcopyand nothing changes, thusV2 =t+1Vcopy

Starting the rst time Act2is enabled.

Now≥epoch(w2)

– Because of the input dependency relation between Act1and

Copy, the work ow system will ensure thatCopyis always executed after Act1for the same value ofNow.

[Assump-tionII] ere are possible execution orders: (Act1,Act2,

Copy),(Act1,Copy,Act2)and(Act2,Act1,Copy). In this

proof the rst order is assumed. e other two orders can be proven similarly to result inV2=t+1Vcopy.

– Act1is executed becauseepoch(w1)<epoch(w2)[ d] and

adds zero or more tuples to V1, producingΣt′ = Σt

output(Act1)

– Act2is executed and addsntuples toV2, identical to the

tu-ples Act1has added toV1because Act2Act1∧w1≡Now

w2∧V1=V1(same operation, same selection, same source

thus same output), producingΣt′′ = Σt′ ∪output(Act2)

(50)

Before transformation (lhs):

. . . .

V1

.

. Min .

w

. . .

V2

After transformation (rhs):

. . . .

V1

.

. CA .

w

. . .

V3

.

.

πMin

.

w2

. . .

V4

De nition ofw2:

epoch(w2) =epoch(w)

delta(w2) =delta(w)

offset(w2) = 0

ws(w2) =delta(w)

(51)

– NowV2has more tuples thanVcopy; but execution takes zero

time [AssumptionI], so this is not a point in time for the purpose of view equivalence.

Copyis executed becauseepoch(wcopy) = epoch(w2) [ a].

Execution takes place after Act1has addedntuples toV1.

e windowwcopyselects exactly thentuples just added to

V1. Copycopiesntuples toVcopy, producingΣt+1since all

activities have been executed.

– InΣt+1 ntuples have been added to bothV2 andVcopy.

ese tuples are equivalent because (indirectly) they have been produced by equivalent activities from the same input data. usCopymust be implemented so that it literally copies thevalid onvalue from the input tuples. [ ])

– usV2 =t+1Vcopy

Conclusion of Proof For allthas been proofed thatV2=tVcopy, thus by de nition of view equivalenceV2≡Vcopyin the combined work ow

in Figure . .

Since the other views have not been rede ned, every view on the lhs of the rule has an equivalent on the rhs of the rule and vice versa. e rule preserves strict equivalence and thus is the Copycat Rule a correct

transformation of a streaming work ow.

.

Shared Loop Rule

is rule uses a very simple observation: if you need both theMinand Maxof a set of data, you can nd them both by iterating over the data only once. A Common Aggregate (CA) activity does exactly that: it computes various common aggregates likeMin,MaxandCountover a set of data. To present the user with the view he expected, the rule adds an extraProject(π) activity to select only the requested aggregate.

e validity of this rule is proofed by treating it as the Copycat Rule limited to a subset of schema attributes. A Common Aggregate activity is the aggregate it replaces, except that it outputs more attributes and ProjectisCopyexcept that it outputs less attributes.

(52)

Before transformation (lhs):

. ..V ...

2 .Copy.. ...

.

V3

.

.

.Consumer .

w3 .w4

After transformation (rhs):

. ..V ...

2 .Consumer..

.

w4

Preconditions:

w4⊆w3

Consumer can be any activity.

Figure . : Copy Elimination Rule

After transformation (rhs):

. ..V ...

2 .Copy.. ...

.

V3

.

.

.Consumer .

w3

.

w4

[image:52.446.144.306.65.526.2] [image:52.446.149.302.449.490.2]
(53)

.

Copy Elimination/Bypass Rule

is section shows how a rule that only preserves weak equivalence can be adapted to preserve strict equivalence. First the weak variant is ex-plained.

e Copycat Rule adds aCopyactivity to a work ow. Often the ltering performed by theCopyactivity is redundant because the consumer of the Copyactivity has a window sequence that already does that. is will be the case when the user has de ned a long work ow with multiple processing steps.

Figure . shows the Copy Elimination Rule that takes advantage of that situation. e statementw4⊆w3in the preconditions means that

the reference points ofw4are a subsequence of the reference points of

w3(thus fewer windows), or that each window ofw4selects a subset of

the corresponding window ofw3 (smaller windows); it can also mean

both. Because user will often de ne work ows with multiple processing steps on some data (like a pipeline), situations wherew3 = w4 are

expected not to be uncommon.

e Copy Elimination Rule not only eliminates an activity, it also eliminates the corresponding view. erefore the rule does only preserve weak equivalence between the lhs and rhs but no strict equivalence. is is no problem, assuming that the user is not interested in the interme-diate results.

Now, the users interests have changed. To debug/inspect his work ow the user wants to see the intermediate views. e Streaming Work ow System can no longer apply the Copy Elimination Rule because it breaks equivalence whenV3is included in the subset over which equivalence is

de ned.

(54)

a.Work ows as speci ed and seen by the user(s):

. . . . . .

Vin .σ..p ... .Min.. ... . Vout

. . . .

Vin .σ..p ... .Max.. ... .V

out′

.

we .w1

.

we .w2

b.In the system runtime:

.

.

. .σ..p ... .Min.. ... . Vout . . . . Vin

. .σ..p ... .Max.. ... .V

out′ . we . w1 .

we .w2

c.Apply Zip Rule:

.

.

. . . .Min.. ... .V

out

. . . .

Vin .σ..p ...

. . . .Max.. ... .V

out′ . we . w1 . w2

[image:54.446.141.303.136.503.2]
(55)

e preconditionw4 w3 implies alignment between the window

sequences; therefore the validity of the Elimination and the Bypass Rule can be proofed similarly to the Copycat Rule with induction over time.

.

A Multi-Step Example

To illustrate how work ow transformation works, this section presents a bigger examples that includes all rules described in this chapter. e work ows are show in Figures . and . .

In the example two work ows have been speci ed. One of them cal-culates the seven day sliding minimum value over a subset of measure-ments that satisfy a certain preposition. e other calculates the seven day sliding maximum over the same subset, although this work ow has been de ned to start processing two weeks later than the rst.

a. To start with, the work ows such as speci ed and seen by the users of the Streaming Work ow System are shown. e two work-ows are completely separate from the user’s point of view even though they share the same view as data source. Because the work ows haven been de ned two weeks apart, theMinandMaxactivities have different window sequences.

b. If the two work ows were speci ed by the same user, he could have created a single work ow with multiple output views instead. Re-gardless of how the user(s) has/have speci ed the work ows, within the work ow system’s runtime they are one work ow because they share a view.

c. e user did not care about when the selection is actually per-formed and has therefore used the standard event-based window se-quencewe, so the rst transformation that can be applied is the Zip Rule. Both branches of the work ow perform the same selection on the same set of data using the same window sequence. If the user had speci ed different window sequences, the system would have been able to use the Copycat Rule on the selection activities instead.

(56)

d.Apply Shared Loop Rule to Both Branches:

.

.

. . . .CA.. ... .πMin.. ... .V

out

. . . .

Vin .σ..p ...

. . . .CA.. ... .πMax.. ... .V

out′ . we . w1 . w2 . w1 . w2

e.Apply Copycat Rule:

.

.

. . . .πMin.. ...

. . . .

Vin .σ..p ... .CA.. ...

. . . .Copy.. ... .πMax.. ... .V

out′

.

Vout

.

we .w1

.

w1

.

w2 .w2

f.Apply Copy Elimination Rule:

.

.

. . . .πMin.. ... .V

out

. . . .

Vin .σ..p ... .CA.. ...

. . . .πMax.. ... .V

out′

.

we .w1

.

w1

.

w2

[image:56.446.103.370.113.517.2]
(57)

consequence the window sequences of the new projection activities are the same as for the aggregation activities.

e. Because the only difference betweenw1andw2is theepoch, the

two Common Aggregate activities are an easy match for the Copycat Rule. One aggregate is removed and replaced by aCopyactivity.

f.Given thatw2⊆w2, the preconditions for the Copy Elimination

Rule are ful lled and theCopyactivity can be removed.

e work ows ina. andf. are weak equivalent. e view produced by the Common Aggregate has no equivalent in the original work ow and must therefore be excluded from the equivalence. Weak equiva-lence is acceptable here because this is exactly the asymmetric situation as explained in Section . .

.

Monitoring a River Slightly Differently

Reconsider the streaming work ow from Figure . . A scientist has devised a method to detect groundwater in uxes in dikes along a river. He uses a streaming work ow to lter and aggregate the sensor data before feeding it into the In ux Detector activity

e scientist has concluded that he needs to lter out more mea-surements because the temperature sensors generate more noise than expected. He changes the activity into a activity and submits the new work ow speci cation to the system. Using the Zip Rule, the system can share the ‘ ow’ branch of the work ow between the original and the revised work ow.

In another update the scientist addsAvg,MinandMaxactivities to the ‘ ow’ branch because they are needed by his new In ux Detection algorithm. Again the system uses the Zip Rule to reuse the unchanged parts of the work ow. Additionally the system applies the Shared Loop Rule, so the extra aggregates have virtually no extra processing costs.

(58)

(59)

Chapter Five

Tuple-based Transformation Rules

Many activities manipulate views in terms of individual tuples. Com-mon characteristics of these activities are: . they operate at the tuple level, . time-based selection is meaningless because of this and . they are used to manage data and have no intrinsic purpose in the work ow. In the ‘Monitoring a River’ example in Figure . tuple-based ac-tivities have been left out for clarity, although the work ow implicitly contains aUnionto collect all sensor data from a speci c type of sen-sor before ltering the data. If the scientist would want to limit his research to a smaller section of the river, he would addSelectactivities to the work ow. e usage of both an implicitUnionactivity and op-tionalSelectactivities illustrates that tuple-based activities are for data management and have no intrinsic purpose.

Many operators in relational algebra can be classi ed as tuple-based activities. e most notable exception is the Join; a Join does not quite match the characteristics above, setting it apart from the other the ac-tivities discussed here. e Join activity is not only discussed in Section

. but also in the next chapter.

is chapter starts with two rules that do come directly from rela-tional algebra.

.

Union Associativity Rule

A common activity isUnion. It copies all tuples from any input view to the output view, preserving all tuples and all attributes. e output view is sorted on time. Contrary to the union operator in relational algebra, aUnionactivity does not have to perform duplicate elimination as all tuples are unique by source and timestamp.

(60)

Before transformation (lhs):

. . . . . .

V1 . .

. . . .

V2 .Union.. ... .V

4

. . . .

V3 . .

. we . we . we

After transformation (rhs):

. . . . . .

V1 . . . .

. . . .

V2 .Union.. ... . .

. . ... .Union.. ... .V

4 . we . we . we . we . V3

[image:60.446.122.321.124.418.2]
(61)

Before transformation (lhs):

. . . . . .

Vin1

. .Join..

pred ... .

Vjoin

.

.

.

σp ... .V

out

. . . .

Vin2

. wjoin . wjoin . we

After transformation (rhs):

. . . . . .

Vin1 .σ..p ... .

Vin1

.

. . . .Join..

pred ... . Vout . . ... . we . wjoin . wjoin .

Vin2

Preconditions:

atoms(p)⊆schema(Vin1)

atoms(p)∩schema(Vin2) =

[image:61.446.131.319.112.553.2]

Figure

Figure �.�: Optimization
Figure �.�: Monitoring a River to Detect Groundwater In�uxes
Table �.�: Description of Work�ow Components in Figure �.�
Figure �.�: Types of Window Sequences. �e horizontal axis repre-sents the data in the view and the bars are window instances, indicatingthe selected data
+7

References

Related documents