
In document Efficient Incremental Data Analysis (Page 30-33)


2.1.2 Idea: Recursive Incremental View Maintenance – The Viewlet Transform

Delta queries can be expensive despite their simpler form. For instance, the delta of an n-way join still references (n − 1) base tables. Instead of computing such a delta query from scratch, we could re-apply the idea of incremental processing to speed up delta evaluation: store previously computed delta results, just like any other query result, and compute the delta of a delta query (a second-order delta) to maintain the materialized delta result. That way, the second-order delta query maintains the first-order delta view, which in turn maintains the top-level view. Assuming that deltas become simpler with each derivation, we can recursively apply the same procedure until we obtain deltas with no references to base tables.
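For intuition, take the three-way join $Q = R \bowtie S \bowtie T$ and treat $\bowtie$ as a product that distributes over the $+$ (union/update) of Section 2.2; $\Delta_X$ denotes the delta for an update to relation $X$. This is a sketch of the degree argument only: each derivation replaces one base relation with its update, so the number of referenced base tables drops from two to one to zero.

$$
\begin{aligned}
\Delta_R Q &= (R + \Delta R) \bowtie S \bowtie T - R \bowtie S \bowtie T = \Delta R \bowtie S \bowtie T \\
\Delta_S(\Delta_R Q) &= \Delta R \bowtie \Delta S \bowtie T \\
\Delta_T(\Delta_S(\Delta_R Q)) &= \Delta R \bowtie \Delta S \bowtie \Delta T
\end{aligned}
$$

The third-order delta references only updates, so it can be evaluated without touching the database.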


The described technique for constructing higher-order deltas is close in spirit to discrete wavelet and numerical differentiation methods; a superficial analogy to the Haar wavelet transform motivates the name viewlet transform. Here, we present only the intuition behind this technique and give a more formal description later in this chapter.

The viewlet transform materializes the top-level view along with a set of auxiliary views that support each other's incremental maintenance. The materialization procedure starts from the top-level view and derives its delta queries for updates to base relations. For each delta query, the procedure materializes its update-independent parts such that delta evaluation requires as little work as possible. In other words, it transforms $\Delta Q(D, \Delta D)$ into an equivalent query $\Delta Q'$ that evaluates over a set of materialized views $M_1, \ldots, M_k$ and the update $\Delta D$:

$$\Delta Q(D, \Delta D) = \Delta Q'\big(M_1(D), M_2(D), \ldots, M_k(D), \Delta D\big)$$

Note, however, that $M_1, \ldots, M_k$ also require maintenance, which in turn relies on simpler materialized views. At first, it may seem counterintuitive that storing more data can reduce maintenance costs. However, the recursive incremental maintenance scheme makes the work required to keep all views fresh very simple. For flat queries, each individual aggregate value can be incrementally maintained using a constant amount of work [93, 94], which is impossible to achieve with classical incremental maintenance or re-evaluation.
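A minimal sketch of the constant-work idea for the simplest flat query, assuming $Q$ counts the tuples of $R(A,B) \bowtie S(B,C)$. The class `JoinCount` and its fields `m_r`, `m_s` are hypothetical names; each auxiliary view is a hash map keyed on the join variable B, so every single-tuple update costs one lookup and two additions (expected constant time), whereas re-evaluation rescans both relations.

```python
from collections import defaultdict


class JoinCount:
    """Hypothetical sketch: Q = number of tuples in R(A,B) ⋈ S(B,C)."""

    def __init__(self):
        self.q = 0                   # materialized top-level view
        self.m_r = defaultdict(int)  # M_R(B): multiplicity of R tuples per B value
        self.m_s = defaultdict(int)  # M_S(B): multiplicity of S tuples per B value

    def update_r(self, a, b, m=1):
        """Apply m copies of R(a, b); m = -1 encodes a deletion."""
        self.q += m * self.m_s[b]   # first-order delta: one lookup replaces a scan of S
        self.m_r[b] += m            # maintain the auxiliary view itself

    def update_s(self, b, c, m=1):
        """Apply m copies of S(b, c); symmetric to update_r."""
        self.q += m * self.m_r[b]
        self.m_s[b] += m


jc = JoinCount()
R, S = [], []
for a, b in [(1, "x"), (2, "x"), (3, "y")]:
    jc.update_r(a, b)
    R.append((a, b))
for b, c in [("x", 10), ("y", 20), ("x", 30)]:
    jc.update_s(b, c)
    S.append((b, c))
jc.update_r(2, "x", -1)  # delete R(2, x)
R.remove((2, "x"))

recomputed = sum(1 for (_, b1) in R for (b2, _) in S if b1 == b2)
print(jc.q, recomputed)  # 3 3
```

Re-evaluation (`recomputed`) touches every pair of tuples, while each incremental step touches one entry of one auxiliary view.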

Example 2.1.2 Let us apply recursive incremental view maintenance to the query of Example 2.1.1 and updates to R. Considering $\Delta_R Q$, we materialize its update-independent part $S(B,C) \bowtie T(C,D)$ as an auxiliary view $M_{ST}(B)$. We project away C and D as they are irrelevant for the computation of $\Delta_R Q$. Repeating the same procedure for updates to T, we materialize $R(A,B) \bowtie S(B,C)$ as $M_{RS}(B,C)$ to facilitate computing $\Delta_T Q$. For updates to S, we materialize $R(A,B) \bowtie T(C,D)$ separately as $M_R(B)$ and $M_T(C)$.¹
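Assuming, as the trigger in Listing 2.2 suggests, that the query of Example 2.1.1 is $Q(B) = \mathit{Sum}_{[B]}\big(R(A,B) \bowtie S(B,C) \bowtie T(C,D)\big)$, the first-order deltas rewritten over the views just described would read as follows. Pushing the aggregation into each view is valid because the variables a view sums out do not occur in the corresponding update.

$$
\begin{aligned}
\Delta_R Q(B) &= \mathit{Sum}_{[B]}\big(\Delta R(A,B) \bowtie M_{ST}(B)\big), & M_{ST}(B) &= \mathit{Sum}_{[B]}\big(S(B,C) \bowtie T(C,D)\big) \\
\Delta_T Q(B) &= \mathit{Sum}_{[B]}\big(M_{RS}(B,C) \bowtie \Delta T(C,D)\big), & M_{RS}(B,C) &= \mathit{Sum}_{[B,C]}\big(R(A,B) \bowtie S(B,C)\big) \\
\Delta_S Q(B) &= \mathit{Sum}_{[B]}\big(M_R(B) \bowtie \Delta S(B,C) \bowtie M_T(C)\big), & M_R(B) &= \mathit{Sum}_{[B]}\big(R(A,B)\big),\; M_T(C) = \mathit{Sum}_{[C]}\big(T(C,D)\big)
\end{aligned}
$$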

Next, we derive second-order deltas for $M_{ST}$ and $M_{RS}$. Repeating the same delta derivation for updates to all three base relations, we materialize one additional view $M_S(B,C)$ representing the base relation S. Further derivation produces delta expressions with no base relations. Overall, recursive view maintenance materializes queries at three different levels: the top-level query $M_Q$, two auxiliary views $M_{RS}$ and $M_{ST}$, and the base tables $M_R$, $M_S$, and $M_T$. The maintenance trigger for updates to R looks as follows.

Listing 2.2 Recursive incremental view maintenance of Q for updates to R

    ON UPDATE R BY ∆R:
      MQ(B)    += Sum[B](∆R(A,B) ⋈ MST(B))
      MRS(B,C) += Sum[B,C](∆R(A,B) ⋈ MS(B,C))
      MR(B)    += Sum[B](∆R(A,B))

We build the triggers for updates to S and T similarly. □
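To check that these views and triggers fit together, the following sketch simulates all three triggers in Python and compares $M_Q$ against re-evaluation after every update. It assumes $Q(B)$ counts the tuples of $R(A,B) \bowtie S(B,C) \bowtie T(C,D)$ per B; the triggers for S and T are derived here from the views listed above for illustration, not taken from the text, and the names `ViewletQ`, `add_into`, and `recompute` are hypothetical. Views are plain dicts, so some statements scan $M_S$ or $M_{RS}$; a real implementation would index them on the join variable.

```python
from itertools import product


def add_into(view, key, m):
    """Add multiplicity m to view[key]; drop entries whose multiplicity becomes 0."""
    view[key] = view.get(key, 0) + m
    if view[key] == 0:
        del view[key]


class ViewletQ:
    """Hypothetical sketch of Q(B) = Sum[B](R(A,B) ⋈ S(B,C) ⋈ T(C,D)) with all auxiliary views."""

    def __init__(self):
        self.MQ, self.MST, self.MRS = {}, {}, {}  # keys: B, B, (B, C)
        self.MR, self.MS, self.MT = {}, {}, {}    # keys: B, (B, C), C

    def on_update_R(self, dR):
        # Statements read only MST and MS, which do not depend on R, so order is irrelevant.
        for (a, b), m in dR.items():
            add_into(self.MQ, b, m * self.MST.get(b, 0))
            for (b2, c), s in self.MS.items():
                if b2 == b:
                    add_into(self.MRS, (b, c), m * s)
            add_into(self.MR, b, m)

    def on_update_S(self, dS):
        # Statements read only MR and MT, which do not depend on S.
        for (b, c), m in dS.items():
            r, t = self.MR.get(b, 0), self.MT.get(c, 0)
            add_into(self.MQ, b, r * m * t)
            add_into(self.MST, b, m * t)
            add_into(self.MRS, (b, c), r * m)
            add_into(self.MS, (b, c), m)

    def on_update_T(self, dT):
        # Statements read only MRS and MS, which do not depend on T.
        for (c, d), m in dT.items():
            for (b, c2), rs in self.MRS.items():
                if c2 == c:
                    add_into(self.MQ, b, rs * m)
            for (b, c2), s in self.MS.items():
                if c2 == c:
                    add_into(self.MST, b, s * m)
            add_into(self.MT, c, m)


def recompute(R, S, T):
    """Re-evaluate Q from scratch: count join results per B."""
    Q = {}
    for ((a, b), r), ((b2, c), s), ((c2, d), t) in product(R.items(), S.items(), T.items()):
        if b == b2 and c == c2:
            add_into(Q, b, r * s * t)
    return Q


v = ViewletQ()
base = {"R": {}, "S": {}, "T": {}}
updates = [
    ("R", {(1, 10): 1}),
    ("S", {(10, 5): 2}),
    ("T", {(5, 7): 1, (5, 8): 1}),
    ("R", {(2, 10): 1}),
    ("S", {(10, 5): -1}),  # negative multiplicity = deletion
    ("T", {(5, 7): -1}),
]
for rel, delta in updates:
    getattr(v, "on_update_" + rel)(delta)
    for key, m in delta.items():
        add_into(base[rel], key, m)
    assert v.MQ == recompute(base["R"], base["S"], base["T"])
print(v.MQ)  # {10: 2}
```

Within each trigger, every statement reads only views that the updated relation does not affect, which is why the statements can run in any order.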

¹ An efficient implementation of the viewlet transform avoids materializing query results with disconnected join

The viewlet transform can produce triggers with lower complexity than classical maintenance triggers. In Listing 2.2, each statement joins the delta relation with at most one materialized view, which is clearly less expensive than the classical delta query $\mathit{Sum}_{[B]}\big(\Delta R(A,B) \bowtie S(B,C) \bowtie T(C,D)\big)$, which joins the update with two base relations. In general, if classical IVM is a good idea, then repeating it recursively is an even better idea: the same efficiency argument that favors IVM of the base query also holds for IVM of the delta query. Since joins are expensive and this approach simplifies or eliminates them, the viewlet transform has the potential for excellent query performance.

2.2 Data and Query Model

In this section, we introduce the formalisms used for studying the problem of incremental view maintenance for relational queries. We present the internal data model, generalized multiset relations (GMRs), which enables a uniform treatment of different forms of updates (insertions and deletions) during incremental view maintenance. We define the query language, AGgregate CAlculus (AGCA), which consists of a few operators capable of expressing most of SQL and is amenable to powerful optimizations due to its simplicity.

Our data model generalizes multiset relations to collections of tuples where each tuple is annotated with a rational multiplicity (i.e., from $\mathbb{Q}$). As such multiplicities can be positive or negative, we can treat databases and updates, as well as insertions and deletions, uniformly: for instance, a deletion is a relation with negative multiplicities, and applying an update to a database means unioning (adding) it to the database. In our model, such rational multiplicities can also hold (potentially non-integer) aggregate values of group-by queries, in contrast to SQL, which stores these values in an additional column (thus changing the query result schema). Maintaining aggregates in the multiplicities allows for simpler and cleaner bookkeeping in delta processing: for instance, growing an aggregate means changing the multiplicity of a tuple rather than deleting the tuple and inserting a tuple with the new aggregate value.
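A minimal Python sketch of this bookkeeping, assuming a GMR is modelled as a dict from tuples to `fractions.Fraction` multiplicities (the helper `gmr_add` and the sample data are hypothetical): a deletion is just a GMR with a negative multiplicity, applying an update is addition, and a SUM aggregate grows by changing a multiplicity in place.

```python
from fractions import Fraction


def gmr_add(x, y):
    """Add two GMRs tuple by tuple; entries whose multiplicity becomes 0 are dropped."""
    out = dict(x)
    for t, m in y.items():
        out[t] = out.get(t, Fraction(0)) + m
        if out[t] == 0:
            del out[t]
    return out


db = {("alice",): Fraction(1), ("bob",): Fraction(2)}
# An update mixing a deletion (one copy of bob) and an insertion (carol) is itself a GMR.
update = {("bob",): Fraction(-1), ("carol",): Fraction(1)}
db = gmr_add(db, update)
print(db)

# SUM(price) per item stored as the multiplicity: a new sale of 1/2 only changes the multiplicity.
totals = {("apple",): Fraction(7, 2)}
totals = gmr_add(totals, {("apple",): Fraction(1, 2)})
print(totals[("apple",)])  # 4
```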

Furthermore, we can associate multiple “multiplicities” (from $\mathbb{Q}^k$) with a tuple to maintain multiple aggregates inside a single GMR.
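Extending the same idea, here is a sketch in which every multiplicity is a pair from $\mathbb{Q}^2$, holding (SUM(price), COUNT(*)) per item so that an average can be read off a single GMR. The helper `vec_add` and the sample data are hypothetical; addition is simply component-wise.

```python
from fractions import Fraction


def vec_add(x, y):
    """Add GMRs whose multiplicities are k-tuples of rationals, component-wise."""
    out = dict(x)
    for t, vec in y.items():
        old = out.get(t, (Fraction(0),) * len(vec))
        new = tuple(a + b for a, b in zip(old, vec))
        if all(v == 0 for v in new):
            out.pop(t, None)  # all-zero multiplicity vector: tuple disappears
        else:
            out[t] = new
    return out


# Each group carries (SUM(price), COUNT(*)) as two multiplicities.
sales = {}
for item, price in [("apple", Fraction(3)), ("apple", Fraction(5)), ("pear", Fraction(2))]:
    sales = vec_add(sales, {(item,): (price, Fraction(1))})
# Deleting the sale of 3 subtracts from both components at once.
sales = vec_add(sales, {("apple",): (Fraction(-3), Fraction(-1))})

avg = {t: s / c for t, (s, c) in sales.items()}
print(avg)
```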

Our query language (AGCA) consists of just four operations (addition, its inverse, multiplication, and sum-aggregation) constructed over GMRs and infinite interpreted relations (which capture conditions such as a < b and x = 5). AGCA is based on the ring-theoretic framework [93, 94], which defines the query language as a polynomial ring over GMRs with an addition operation that at once generalizes multiset union (as known from SQL) and updating, and a multiplication operation that generalizes the natural join operation. This syntactic simplicity of AGCA enables rich optimizations, as described in Chapter 3.
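The four operations can be sketched over GMRs in a few lines of Python. This is an illustrative encoding, not AGCA's actual implementation: a tuple is a sorted tuple of (variable, value) pairs (built by the hypothetical helper `rec`), `mul` is a natural join that multiplies multiplicities, and `sum_agg` projects onto the group-by variables while adding multiplicities. The usage lines check that multiplication distributes over addition, which is what turns the delta of a join into a simple product with the update.

```python
from collections import defaultdict


def rec(**kv):
    """Hypothetical tuple encoding: sorted (variable, value) pairs."""
    return tuple(sorted(kv.items()))


def _clean(g):
    return {t: m for t, m in g.items() if m != 0}


def add(x, y):
    """Addition: generalizes multiset union and applying an update."""
    out = defaultdict(int, x)
    for t, m in y.items():
        out[t] += m
    return _clean(out)


def neg(x):
    """Additive inverse: negate every multiplicity."""
    return {t: -m for t, m in x.items()}


def mul(x, y):
    """Multiplication: natural join on shared variables, multiplicities multiply."""
    out = defaultdict(int)
    for t1, m1 in x.items():
        d1 = dict(t1)
        for t2, m2 in y.items():
            if all(d1.get(k, v) == v for k, v in t2):  # agree on shared variables
                out[tuple(sorted({**d1, **dict(t2)}.items()))] += m1 * m2
    return _clean(out)


def sum_agg(group_vars, x):
    """Sum[group_vars]: keep only group-by variables and add up multiplicities."""
    out = defaultdict(int)
    for t, m in x.items():
        out[tuple((k, v) for k, v in t if k in group_vars)] += m
    return _clean(out)


R = {rec(A=1, B=10): 1, rec(A=2, B=20): 1}
S = {rec(B=10, C=5): 2}
dR = {rec(A=3, B=10): 1, rec(A=1, B=10): -1}  # one insertion, one deletion

# Distributivity: (R + dR) * S - R * S == dR * S
assert add(mul(add(R, dR), S), neg(mul(R, S))) == mul(dR, S)
print(sum_agg({"B"}, mul(R, S)))  # {(('B', 10),): 2}
```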

The query language implements sideways information passing and enforces range restriction (variable bindings) as known from relational calculus. Supporting such bindings eliminates the need for an explicit selection operation, which AGCA encodes as the multiplication of a query with a condition (an interpreted relation), just like in relational calculus. Multiplication is defined such that query results are always guaranteed to be finite.
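A sketch of selection as multiplication with an interpreted relation, under a hypothetical encoding where a tuple is a sorted tuple of (variable, value) pairs and the condition is a Python predicate. The infinite relation the condition stands for is never enumerated; it is only evaluated on tuples that already bind all of its variables. Rejecting unbound variables with an error is a crude, assumed stand-in for range restriction and does not model AGCA's actual binding rules.

```python
def mul_condition(x, cond, cond_vars):
    """Multiply GMR x by an interpreted relation such as A < B.

    The condition acts like a relation with multiplicity 1 where it holds and 0
    elsewhere. It is infinite, so it is evaluated only on tuples of x, and x
    must bind every variable it mentions (range restriction).
    """
    out = {}
    for t, m in x.items():
        env = dict(t)
        unbound = set(cond_vars).difference(env)
        if unbound:
            raise ValueError(f"unbound variables {sorted(unbound)}: result would be infinite")
        if cond(env):
            out[t] = m  # m * 1; tuples failing the condition get m * 0 and vanish
    return out


R = {(("A", 1), ("B", 4)): 2, (("A", 7), ("B", 3)): 1}
print(mul_condition(R, lambda e: e["A"] < e["B"], {"A", "B"}))  # {(('A', 1), ('B', 4)): 2}

try:
    mul_condition(R, lambda e: e["C"] == 5, {"C"})  # C is not bound by R
except ValueError as err:
    print(err)
```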
