Related Work - Efficient Incremental View Maintenance for Data Warehousing

Incremental view maintenance has received considerable attention from the database community for the last decade [GMS93, GL95, MQM97, CGL⁺96, SBCL00]. In [GMS93], the authors propose algorithms for incremental view maintenance under bag semantics. They also support recursive views in Datalog. In [GL95], the authors establish an algebraic framework for prop-agating deltas through each operator, which is more robust and extensible to new language constructs. In this work, we propose to extend this frame-work to also support PIVOT and UNPIVOT operators. Note that since PIVOT requires a key to exist in the source table, we assume set

seman-tics for non-aggregate views and bag semanseman-tics for aggregate views (since group-by columns form a key).

PIVOT is similar to GROUPBY in many ways [CGGL04]. In [Qua96], the authors propose the insert/delete and update propagation rules for the GROUPBY operator. They also show that it is more preferable to use the latter rules. However, unlike the PIVOT operator, the GROUPBY opera-tor loses the detailed data. Hence the combination and pullup rules for GROUPBY are fairly restrictive. As a result, most commercial database sys-tems only support SPJ+GROUPBY views. Or in other words, they do not support general views with any number of GROUPBY operators that can be anywhere in the query tree. Fortunately, we illustrate in this work that the PIVOT operator has a lot of interesting properties since it keeps the de-tailed data. As shown in this chapter, they can be combined in many ways, resulting in a generalized pivot operator. They can also be pulled up in the query algebra tree, which is more flexible than the situation observed for the GROUPBY operator [CS94]. As a result, we illustrate that it is possible to derive an efficient maintenance plan. In fact, the same approach can be applied for efficient maintenance of general aggregate views with arbitrary number of GROUPBY operators.

In [CGGL04], the authors propose the optimization and execution strate-gies for pivot and unpivot in Microsoft SQL Server. In fact, similar tech-niques can also be applied to include the GPIVOT and GUNPIVOT into the query engine as also briefly mentioned by the authors. In this chapter, we address another important aspect of the PIVOT and UNPIVOT opera-tors, namely, incremental view maintenance. We also show the necessity

of the GPIVOT definition for efficient view maintenance as well as for the optimization of queries with even just simple PIVOTs.

The PIVOT operator defined in [CGGL04] has slight semantic difference than our definition here. In that work, a pivoted output row may contain all ⊥ columns. In this case, when we have a maintained pivoted tuple (K, ⊥ , ..., ⊥), we cannot simply delete it. One solution is to extend the definition of such a PIVOT operator to also include a column COUNT(*), which com-putes the number of rows for each K. We delete the view tuple (K, ⊥, ..., ⊥ , CN T ) only when CNT becomes 0. When GPIVOT is above a GROUPBY operator, we can extend the GPIVOT definition to include a SUM(count(*)) column. We delete the view tuple (K, ⊥, ..., ⊥, SU M (count(∗))) only when SUM(count(*)) becomes 0.

The PIVOT and UNPIVOT operators studied in this chapter are first-order, since the output columns are pre-determined in the query by speci-fying the interested values. In [LSS99], the authors propose the SchemaSQL language with FOLD and UNFOLD operators which are very similar to PIVOT and UNPIVOT operators. However, these two operators are high-order since the output columns are dynamically determined by all distinct values. The incremental maintenance SchemaSQL views was first studied in [KR02]. However, the technique is primarily tuple-based and not effi-cient for batch updates. In this work, we study the first-order version of such operators and thus we are able to derive efficient maintenance plans.

It is an interesting future work to extend our proposed algorithms to sup-port the maintenance of such higher-order pivot and unpivot operators, such as FOLD and UNFOLD [LSS99].

Chapter 3 Views with Complex Aggregate Functions

3.1 Our Contributions

In this chapter, we will propose a generic and comprehensive solution frame-work for management of views with complex aggregate functions as moti-vated in Section 1.3.2. In summary, the main contributions of this work are as follows:

• We propose a workarea function model for management of views containing complex aggregate functions. This framework greatly in-creases the system’s extensibility to add the support of new func-tions, which is an especially useful feature for user-defined functions [WZ00].

• Based on this model, we propose a generic strategy for maintaining

aggregate materialized views with complex aggregate functions.

• We introduce a view matching algorithm to answer queries using such views also in a generic fashion using our workarea function model.

• Our workarea function model can also be extended to support multi-dimensional view maintenance and view matching. We can now also efficiently stack the computation of a multidimensional query.

• We have implemented our techniques in a prototype system of IBM DB2 UDB [Cha98]. The extensive experimental study demonstrates orders of magnitude performance improvement for incremental view maintenance and stacking computation for multi-dimensional queries.

The rest of this chapter is organized as follows. Section 3.2 presents the workarea function model. In Sections 3.3, we describe the techniques for incremental view maintenance with algebraic functions. Section 3.4 intro-duces a view matching algorithm for answering queries using such views.

Section 3.5 describes how to support algebraic functions with multidimen-sional operators. We discuss the experimental results in Section 3.6. Sec-tion 3.7 reviews the related work.

In document Efficient Incremental View Maintenance for Data Warehousing (Page 121-126)