
Ring-Annotated Relations and Differences

In this chapter we study the reformulation (rewriting) of relational queries that contain the difference operator. Our goal for query reformulation is to optimize by reusing existing information, such as materialized views. Since the objective is optimization, we focus on exact reformulation, which finds only equivalent rewritings of the query.¹

¹ In data integration, one is also interested in maximally contained rewritings; see, e.g., [71].

Query reformulation using views is well understood for positive fragments of relational languages, such as conjunctive queries (CQs) or unions of CQs (UCQs), under both set and bag semantics (see, e.g., [23, 101]). As we discuss in more detail in the preamble to Section 5.3, in both cases (bag and set semantics) complete procedures exist for finding UCQ rewritings using UCQ views, and they use finite search spaces. Also, in both cases UCQ equivalence is decidable. In fact, in the same discussion we argue that whenever a (reasonable) finite-search-space procedure exists, query equivalence must also be decidable.

It follows that the initial outlook for reformulations involving the difference operator is glum: even without views, equivalence of relational algebra (RA) queries is undecidable under both set and bag semantics.² Hence, we cannot hope to extend the approaches to UCQ reformulation under bag or set semantics to the entire RA.

² This follows, e.g., from the undecidability of bag containment of unions of conjunctive queries (UCQs) [81], since for UCQs Q and Q′, Q is contained in Q′ iff Q − Q′ is equivalent to the empty-answer query.

However, there are at least three reasons not to give up easily. With reformulation of queries that include difference:

• Optimization using materialized views could be done over a broader space of plans (even if the original query and views are just CQs/UCQs!). Rewritings could subtract one view from a larger view in order to compute a query answer.

• View adaptation [65], the act of updating a materialized view instance when the view definition has changed, could be seen as a reformulation using views. Here, the updated view can be recomputed from the old contents of the view, by adding and/or subtracting queries over the base data and possibly other views.

• Incremental view maintenance [66] could be seen as a reformulation using views, since insertions and deletions could be treated as unions and differences.

• Mapping evolution and update exchange in CDSS could be attacked using reformulation-based techniques, as these two problems are closely related to view adaptation and incremental view maintenance.

These are all highly relevant problems in databases.
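To make the first bullet concrete, here is a minimal sketch under assumed data: bags are modeled as Python `Counter`s (tuple to multiplicity), the relations `R`, `S` and the views `V_all`, `V_db` are hypothetical, and bag difference is taken to be proper subtraction (monus), as in SQL `EXCEPT ALL`. A query that selects the complement of a view's predicate is answered by subtracting one materialized view from a larger one.

```python
from collections import Counter

# Hypothetical base relations as bags: tuple -> multiplicity.
R = Counter({("alice", "db"): 2, ("bob", "os"): 1, ("carol", "db"): 1})
S = Counter({("db", 2020): 1, ("os", 2021): 1})


def join(r, s):
    """Join r's 2nd column with s's 1st column; multiplicities multiply."""
    out = Counter()
    for (name, topic), m in r.items():
        for (topic2, year), n in s.items():
            if topic == topic2:
                out[(name, topic, year)] += m * n
    return out


def select(rel, pred):
    """Keep tuples satisfying pred, with their multiplicities."""
    return Counter({t: m for t, m in rel.items() if pred(t)})


# Hypothetical materialized views.
V_all = join(R, S)                              # V1 = R join S
V_db = select(V_all, lambda t: t[1] == "db")    # V2 = sigma_{topic='db'}(R join S)

# Query: the tuples of R join S whose topic is not 'db'.
direct = select(join(R, S), lambda t: t[1] != "db")
rewritten = V_all - V_db  # Counter subtraction is monus: counts <= 0 are dropped
assert direct == rewritten
```

The rewriting is valid under bag semantics here because `V_db` is contained in `V_all` tuple by tuple, so monus never has to clip a negative count.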

Thus we ask the following natural question: is there a slightly less expressive class of queries than RA, still including difference and hence still providing the benefits cited above, for which reformulation can be handled effectively? In this chapter we approach this question via an excursion through a non-standard semantics that is of interest in its own right: what we term Z-relations. These are relations whose tuples are annotated with integers (positive or negative), and the positive RA operators are defined on them according to the semiring-annotated semantics introduced in Chapter 2. In addition, difference has an obvious, natural definition on Z-relations: the annotation of a tuple in R − S is its annotation in R minus its annotation in S.
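A minimal sketch of the Z-relation operators, assuming a Z-relation is encoded as a Python dict from tuples to nonzero integers (all function names are hypothetical). Union, join and projection follow the semiring-annotated semantics with integer + and ×; difference simply subtracts annotations, so results may carry negative annotations.

```python
from collections import defaultdict


def normalize(rel):
    """Drop tuples whose annotation is 0 (they are not in the Z-relation)."""
    return {t: k for t, k in rel.items() if k != 0}


def union(r, s):
    """Union adds annotations (the semiring +, here integer addition)."""
    out = defaultdict(int)
    for rel in (r, s):
        for t, k in rel.items():
            out[t] += k
    return normalize(out)


def difference(r, s):
    """Difference subtracts annotations; results may be negative."""
    return union(r, {t: -k for t, k in s.items()})


def project(r, cols):
    """Projection sums the annotations of tuples that collapse together."""
    out = defaultdict(int)
    for t, k in r.items():
        out[tuple(t[i] for i in cols)] += k
    return normalize(out)


def select(r, pred):
    """Selection keeps annotations of tuples satisfying pred."""
    return {t: k for t, k in r.items() if pred(t)}


def join(r, s, i, j):
    """Equi-join on r.i = s.j; annotations multiply (the semiring product)."""
    out = defaultdict(int)
    for t, k in r.items():
        for u, m in s.items():
            if t[i] == u[j]:
                out[t + u] += k * m
    return normalize(out)


R = {("a", 1): 2, ("b", 2): 1}
S = {("a", 1): 3}
diff_RS = difference(R, S)  # {("a", 1): -1, ("b", 2): 1}
```

Unlike bag difference, Z-difference is an exact inverse of union: `union(difference(R, S), S)` gives back `R`.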

Z-relations are a natural representation for the updates to source relations (collections of tuple insertions and deletions, a.k.a. deltas) that must be propagated in incremental view maintenance applications. Indeed, both data and updates can be uniformly represented using Z-relations, and "application" of a delta to a relation corresponds to simply computing a union. We discuss this further in Section 5.1.
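The following sketch shows a delta as a Z-relation in which insertions carry +1 and deletions carry −1; applying it is a Z-union, and a join view can then be maintained incrementally because join distributes over Z-union. The relations, the join-on-adjacent-columns convention, and the helper names `add` and `join` are assumptions for illustration.

```python
from collections import defaultdict


def add(r, s):
    """Z-union: add annotations, dropping tuples that cancel to 0."""
    out = defaultdict(int)
    for rel in (r, s):
        for t, k in rel.items():
            out[t] += k
    return {t: k for t, k in out.items() if k != 0}


def join(r, s):
    """Join r's last column with s's first column; annotations multiply."""
    out = defaultdict(int)
    for t, k in r.items():
        for u, m in s.items():
            if t[-1] == u[0]:
                out[t + u[1:]] += k * m
    return {t: k for t, k in out.items() if k != 0}


R = {("x", 1): 1, ("y", 2): 1}
S = {(1, "p"): 1, (2, "q"): 1}

# A hypothetical delta mixing an insertion (+1) and a deletion (-1).
delta_R = {("z", 1): 1, ("y", 2): -1}

R_new = add(R, delta_R)  # applying the delta is just a union

# Incremental maintenance of V = R join S: since join distributes over
# Z-union, V(R + delta_R) = V(R) + (delta_R join S).
V_old = join(R, S)
V_new_incremental = add(V_old, join(delta_R, S))
assert V_new_incremental == join(R_new, S)
```

Deleting a tuple that is not present would leave an annotation of −1 rather than failing; Z-relations permit this, whereas bags do not.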

It turns out that reformulation of RA queries using RA views can be solved effectively with respect to Z-semantics, since here equivalence of RA queries with respect to a set of RA views is decidable. Moreover, we obtain practically useful results about the class of RA queries for which the reformulation with respect to Z-semantics remains valid with respect to bag semantics. Although membership in this class of queries is necessarily undecidable, there are many useful cases with simple sufficient conditions for membership, in the three classes of applications outlined above.
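Z-equivalence does not by itself imply bag equivalence, which is why the class of queries above matters. A small check, assuming bag difference is proper subtraction (monus, as in SQL `EXCEPT ALL`) and using hypothetical single-column relations: under Z-semantics R − (R − S) always equals S, but under bag semantics it yields, for each tuple, the minimum of its multiplicities in R and S.

```python
from collections import Counter


def z_diff(a, b):
    """Z-semantics difference: subtract annotations, keep negatives, drop zeros."""
    keys = set(a) | set(b)
    return {t: a.get(t, 0) - b.get(t, 0) for t in keys if a.get(t, 0) != b.get(t, 0)}


def bag_diff(a, b):
    """Bag difference as proper subtraction (monus), like SQL EXCEPT ALL."""
    return dict(Counter(a) - Counter(b))


R = {("t1",): 1}
S = {("t1",): 2, ("t2",): 1}

z_result = z_diff(R, z_diff(R, S))        # equals S
bag_result = bag_diff(R, bag_diff(R, S))  # {("t1",): 1}, i.e. min(R, S)
```

So R − (R − S) and S are Z-equivalent queries, yet a rewriting that replaces one by the other would be wrong under bag semantics.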

For applications such as CDSS, which require rich provenance information, we propose the use of another kind of ring-annotated relation, Z[X]-relations, for uniformly representing data and updates. These are like the provenance polynomials N[X] of Chapter 2, but with integer coefficients.
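A sketch of Z[X] annotations, assuming polynomials are encoded as dicts from monomials (sorted tuples of variable names, with repeats for powers) to integer coefficients; the encoding and helper names are hypothetical. It shows how a deletion annotated with a negated token cancels one derivation of a tuple while the tuple's remaining provenance survives.

```python
from collections import defaultdict


def poly_add(p, q):
    """Add two Z[X] polynomials, dropping zero coefficients."""
    out = defaultdict(int)
    for poly in (p, q):
        for mono, c in poly.items():
            out[mono] += c
    return {m: c for m, c in out.items() if c != 0}


def poly_mul(p, q):
    """Multiply two Z[X] polynomials; monomials concatenate and re-sort."""
    out = defaultdict(int)
    for m1, c1 in p.items():
        for m2, c2 in q.items():
            out[tuple(sorted(m1 + m2))] += c1 * c2
    return {m: c for m, c in out.items() if c != 0}


def var(name):
    """The polynomial consisting of a single provenance token."""
    return {(name,): 1}


def neg(p):
    """Negate every coefficient (used to annotate a deletion)."""
    return {m: -c for m, c in p.items()}


# Tuple t is derivable from source tuple x OR from joining tuples y and z.
ann_t = poly_add(var("x"), poly_mul(var("y"), var("z")))  # x + y*z

# A deletion of source tuple x is itself a Z[X]-annotated update: -x.
ann_after = poly_add(ann_t, neg(var("x")))  # y*z remains
```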

• We show that under Z-semantics every RA query is equivalent to the difference of two queries in RA+. The latter are selection/projection/join/union queries, forming the positive relational algebra, and equivalent in expressiveness to UCQs. Then the decidability of equivalence of RA queries under Z-semantics is a corollary of the decidability of equivalence of UCQs.

• It follows that in reformulation using views under Z-semantics we can work with differences of unions of conjunctive queries (DUCQs). We give a terminating, confluent, sound and complete rewrite system such that if two DUCQs are equivalent under a set of views, then they can be rewritten to the same query (modulo isomorphism). This leads to our procedure for exploring the space of reformulations (using the rewrite rules in reverse).

• In contrast to CQs/UCQs under set semantics, there is no inherent or natural, instance-independent notion of "minimality" for DUCQs under Z-semantics that would yield a finite reformulation search space. We therefore bound the search under a simple cost model, which is an abstraction of the one used in a query optimizer.

• Next we examine when we can use the Z-semantics reformulation strategy to obtain results that work for bag semantics. We identify a class of queries and views for which this is the case, show that the reformulation procedure is closed for queries/views in this class, and give simple sufficient conditions for membership.

• We also show how to extend our results to queries with built-in predicates, i.e., inequalities and non-equalities.

• Finally, we discuss the use of Z[X]-relations for updates, and we show that Z-equivalence and Z[X]-equivalence of relational algebra queries coincide. A consequence is that our Z-semantics reformulation strategy is also sound and complete for Z[X]-semantics.
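The first contribution listed above (every RA query is Z-equivalent to a difference of two RA+ queries) can be sketched as a recursive translation that pushes differences to the top using ring identities such as (P1 − N1) × (P2 − N2) = (P1 × P2 ∪ N1 × N2) − (P1 × N2 ∪ N1 × P2). The toy syntax (nested tuples tagged `rel`, `empty`, `union`, `diff`, `product`, `select`, `project`), the instance, and all function names are assumptions; this is not necessarily the construction used in Section 5.2.

```python
from collections import defaultdict

EMPTY = ("empty",)  # the query with the empty answer


def clean(rel):
    """Drop tuples whose Z-annotation is 0."""
    return {t: k for t, k in rel.items() if k != 0}


def z_eval(e, db):
    """Evaluate a toy RA expression under Z-semantics."""
    op = e[0]
    if op == "empty":
        return {}
    if op == "rel":
        return dict(db[e[1]])
    if op in ("union", "diff"):
        sign = 1 if op == "union" else -1
        out = defaultdict(int, z_eval(e[1], db))
        for t, k in z_eval(e[2], db).items():
            out[t] += sign * k
        return clean(out)
    if op == "product":
        left, right = z_eval(e[1], db), z_eval(e[2], db)
        out = defaultdict(int)
        for t, k in left.items():
            for u, m in right.items():
                out[t + u] += k * m
        return clean(out)
    if op == "select":
        return {t: k for t, k in z_eval(e[2], db).items() if e[1](t)}
    if op == "project":
        out = defaultdict(int)
        for t, k in z_eval(e[2], db).items():
            out[tuple(t[i] for i in e[1])] += k
        return clean(out)
    raise ValueError(f"unknown operator {op!r}")


def to_diff(e):
    """Return (P, N), both free of 'diff', with e = P - N under Z-semantics."""
    op = e[0]
    if op in ("rel", "empty"):
        return e, EMPTY
    if op in ("select", "project"):  # both are linear, so they distribute over -
        p, n = to_diff(e[2])
        return (op, e[1], p), (op, e[1], n)
    p1, n1 = to_diff(e[1])
    p2, n2 = to_diff(e[2])
    if op == "union":
        return ("union", p1, p2), ("union", n1, n2)
    if op == "diff":  # (p1 - n1) - (p2 - n2) = (p1 + n2) - (n1 + p2)
        return ("union", p1, n2), ("union", n1, p2)
    if op == "product":  # expand (p1 - n1) x (p2 - n2)
        return (("union", ("product", p1, p2), ("product", n1, n2)),
                ("union", ("product", p1, n2), ("product", n1, p2)))
    raise ValueError(f"unknown operator {op!r}")


def children(e):
    if e[0] in ("rel", "empty"):
        return []
    if e[0] in ("select", "project"):
        return [e[2]]
    return [e[1], e[2]]


def is_positive(e):
    """True if the expression uses no difference operator (it is in RA+)."""
    return e[0] != "diff" and all(is_positive(c) for c in children(e))


# Hypothetical instance and query: pi_0((R - S) x (T - R)) - S
R, S, T = ("rel", "R"), ("rel", "S"), ("rel", "T")
db = {"R": {(1,): 2, (2,): 1}, "S": {(1,): 1, (3,): 1}, "T": {(2,): 1}}
Q = ("diff", ("project", (0,), ("product", ("diff", R, S), ("diff", T, R))), S)

P, N = to_diff(Q)
assert is_positive(P) and is_positive(N)
assert z_eval(("diff", P, N), db) == z_eval(Q, db)
```

The same identities give the reduction behind the decidability corollary: since Z-annotations can be added and subtracted freely, P1 − N1 ≡ P2 − N2 holds under Z-semantics exactly when P1 ∪ N2 ≡ P2 ∪ N1, which is an equivalence between positive queries.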

The chapter is structured as follows. We discuss motivating applications in Section 5.1. We define the semantics of RA on Z-relations, establish the decidability of Z-equivalence of RA queries, and introduce DUCQs in Section 5.2. We introduce the rewrite system for queries using views in Section 5.3. We present reformulation algorithms and strategies in Section 5.4. We discuss reformulation for bag semantics/set semantics via Z-semantics in Section 5.5. We extend our Z-equivalence results to RA with built-in predicates in Section 5.6. We discuss Z[X]-relations