This dissertation is organized into five parts. Part I consists of this introduction (Chapter 1). It reviews the related issues and solutions under the umbrella of semantic caching, as well as the challenges posed by the new data model and query languages of the web environment, which together provide the motivation for the work in this dissertation.
Part II describes the techniques for XQuery containment and rewriting. Chapter 2 first introduces this topic and then reviews the state-of-the-art research in this field. Chapter 3 defines the containment problem targeting a restricted XQuery fragment beyond XPath fragments, and then presents an approach for it. Chapter 4 describes the rewriting technique for XQuery based on the established containment mapping.
Part III addresses the issue of semantic cache replacement. Chapter 5 reviews the background on a variety of existing replacement strategies and discusses their advantages and disadvantages. Chapter 6 presents a fine-grained replacement strategy particularly tailored for the XQuery-based semantic caching system.
Part IV describes the implementation and evaluation of an XQuery-based semantic caching system called ACE-XQ. Chapter 7 delineates the
overall architecture of the ACE-XQ system. Chapter 8 summarizes the experimental studies, focusing on the validation of the approach and the evaluation of the replacement strategy.
Finally, Part V concludes the dissertation. In Chapter 9, we summarize our results and outline directions for future work.
Part II
Containment and Rewriting for XQuery
Chapter 2
Background on Query Containment and Rewriting
2.1 Preliminaries
In this section, we begin by introducing the terminology used in the context of query containment and rewriting.
Definition 2.1 (Query Containment) The set of answers of a query $Q$ on a database $D$ is denoted $Q(D)$. A query $Q_2$ is said to be contained in a query $Q_1$, denoted $Q_2 \sqsubseteq Q_1$, if $Q_2$ produces a subset of the answers of $Q_1$ for any given database $D$, i.e., $(\forall D)\colon Q_2(D) \subseteq Q_1(D)$.
The complexity of query containment was first studied for the fundamental class of conjunctive queries, which form a subset of datalog. A datalog program is a set of datalog rules. A datalog rule has the form:
$$q(\bar{X}) \;\text{:-}\; r_1(\bar{X}_1), \ldots, r_n(\bar{X}_n),$$
where $q$ and $r_1, \ldots, r_n$ are predicate (also called relation) names. The atom $q(\bar{X})$ is called the head of the rule, and the atoms $r_i(\bar{X}_i)$, $i = 1, \ldots, n$, are the subgoals in the body of the rule. $\bar{X}_1, \ldots, \bar{X}_n$ denote tuples of variables and constants. $\bar{X}$ denotes the set of head variables that are projected out from the source relations in the rule body to compose the view relation. Multiple occurrences of the same variable in different subgoals imply a join predicate of the query. It is required that every rule is safe, i.e., every variable that appears in the head must also appear in the body.
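The vocabulary above can be made concrete with a small sketch. The Python below is an illustration only, not part of any system described in this dissertation: the names `Atom`, `Rule`, `is_safe`, and `join_variables` are hypothetical, and the convention that variables are strings beginning with an uppercase letter is an assumption made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    pred: str    # predicate (relation) name
    args: tuple  # terms: variables or constants


@dataclass(frozen=True)
class Rule:
    head: Atom
    body: tuple  # tuple of Atom, the subgoals


def is_var(term):
    # Assumed convention: variables are strings starting with an uppercase letter.
    return isinstance(term, str) and term[:1].isupper()


def is_safe(rule):
    """A rule is safe if every head variable also appears in the body."""
    body_vars = {t for atom in rule.body for t in atom.args if is_var(t)}
    return all(t in body_vars for t in rule.head.args if is_var(t))


def join_variables(rule):
    """Variables occurring in more than one subgoal (implied join predicates)."""
    occurrences = {}
    for i, atom in enumerate(rule.body):
        for t in atom.args:
            if is_var(t):
                occurrences.setdefault(t, set()).add(i)
    return {v for v, subgoals in occurrences.items() if len(subgoals) > 1}


# p(X,Y) :- r(X,W), b(W,Z), r(Z,Y)
q1 = Rule(Atom("p", ("X", "Y")),
          (Atom("r", ("X", "W")), Atom("b", ("W", "Z")), Atom("r", ("Z", "Y"))))
# p(X,V) :- r(X,W) is unsafe because V does not occur in the body.
unsafe = Rule(Atom("p", ("X", "V")), (Atom("r", ("X", "W")),))
```

Here `is_safe(q1)` holds while `is_safe(unsafe)` does not, and `join_variables(q1)` returns the two variables `W` and `Z` that link the three subgoals.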
A conjunctive query is a datalog program consisting of a single rule. A non-recursive datalog program is a set of datalog rules such that there exists an ordering $R_1, \ldots, R_m$ of the rules so that the predicate name in the head of $R_i$ does not occur in the body of rule $R_j$ whenever $j \le i$. Such datalog programs can always be unfolded into a finite union of conjunctive queries. When the comparison predicates $\neq$, $<$, $>$, $\le$, and $\ge$ are considered in queries and views, each datalog rule must satisfy an additional requirement: if a variable $X$ appears in a subgoal with a comparison predicate, then $X$ must also appear in a relational subgoal in the body of the rule.
An important theorem on the containment of conjunctive queries, given in [CM77], is stated next.
Theorem 2.1 (Containment Mapping) A conjunctive query $Q_2$ is contained in another query $Q_1$, denoted by $Q_2 \sqsubseteq Q_1$, if and only if there is a containment mapping from $Q_1$ to $Q_2$. The containment mapping maps variables of $Q_1$ to those of $Q_2$ such that every subgoal in $Q_1$ is mapped to a subgoal in $Q_2$.
An important implication of the containment mapping is that a join variable appearing in different subgoals of $Q_1$ must be mapped consistently to a single variable in $Q_2$. The mapping need not be one-to-one: distinct variables of $Q_1$ may share the same image in $Q_2$. Intuitively, a homomorphism from the atomic predicates of $Q_1$ onto a subset of the atomic predicates of $Q_2$ guarantees that the answer set of $Q_2$ is subsumed by that of $Q_1$.
Containment Mapping Example. Given the two conjunctive queries $Q_1$ and $Q_2$ below, the containment mapping from $Q_1$ to $Q_2$ is $\{X \to X',\; Y \to Y',\; W \to W',\; Z \to W'\}$.
Q1: p(X,Y) :- r(X,W), b(W,Z), r(Z,Y)
Q2: p(X',Y') :- r(X',W'), b(W',W'), r(W',Y')
However, there is no containment mapping from $Q_2$ to $Q_1$, because for $b(W', W')$ in $Q_2$ the only possible target in $Q_1$ is $b(W, Z)$. But we cannot have both $W' \to W$ and $W' \to Z$, since one variable cannot be mapped to two different variables.
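The example can be checked mechanically. The following is a minimal backtracking sketch, not the procedure of [CM77] nor the one used later in this dissertation. The function name `find_containment_mapping` and the tuple encoding of queries are hypothetical choices for illustration, and variables are assumed to be strings starting with an uppercase letter. The search is exponential in the worst case, which is consistent with the NP-completeness of the problem.

```python
def is_var(term):
    # Assumed convention: variables start with an uppercase letter (e.g. "X", "W'").
    return term[:1].isupper()


def unify(args1, args2, mapping):
    """Extend `mapping` so that args1 maps onto args2; return None on conflict."""
    mapping = dict(mapping)
    for s, t in zip(args1, args2):
        if is_var(s):
            if mapping.setdefault(s, t) != t:
                return None      # s is already mapped to a different term
        elif s != t:
            return None          # constants must map to themselves
    return mapping


def find_containment_mapping(q1, q2):
    """Search for a containment mapping from q1 to q2.

    A query is (head, body): head is (pred, args), body is a list of (pred, args).
    Returns a dict from variables of q1 to terms of q2, or None if none exists.
    """
    (head1, body1), (head2, body2) = q1, q2
    if head1[0] != head2[0] or len(head1[1]) != len(head2[1]):
        return None
    start = unify(head1[1], head2[1], {})   # head must map onto head
    if start is None:
        return None

    def extend(i, mapping):
        if i == len(body1):
            return mapping                   # every subgoal of q1 has a target
        pred, args = body1[i]
        for pred2, args2 in body2:
            if pred2 == pred and len(args2) == len(args):
                candidate = unify(args, args2, mapping)
                if candidate is not None:
                    result = extend(i + 1, candidate)
                    if result is not None:
                        return result
        return None                          # backtrack

    return extend(0, start)


Q1 = (("p", ("X", "Y")),
      [("r", ("X", "W")), ("b", ("W", "Z")), ("r", ("Z", "Y"))])
Q2 = (("p", ("X'", "Y'")),
      [("r", ("X'", "W'")), ("b", ("W'", "W'")), ("r", ("W'", "Y'"))])
```

On these inputs, `find_containment_mapping(Q1, Q2)` returns the mapping given in the example, while `find_containment_mapping(Q2, Q1)` returns `None`, because `W'` would have to map to both `W` and `Z`.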
Containment of conjunctive queries is NP-complete [CM77]. Containment of a conjunctive query in a non-recursive query is also NP-complete, while containment of two non-recursive queries is $\Pi_2^p$-complete [CV92]. Containment is decidable when at least one of the two queries is non-recursive [CV92], while containment of arbitrary datalog programs is undecidable [O. 93]. Furthermore, containment of conjunctive queries with arithmetic comparisons of the form $A_1 \,\theta\, A_2$, where $\theta$ is an arithmetic comparison operator, is $\Pi_2^p$-complete [F. 02].
Graph Homomorphism and Simulation. Besides the logic-based formalism, graph theory is also applied as a tool to study query containment.
Definition 2.2 (Graph Homomorphism) Suppose $G_1$ and $G_2$ are two directed labelled graphs. A simulation is a relation $R$ between the nodes of $G_1$ and $G_2$ such that, whenever $R(x_1, x_2)$ holds and $(x_1, a, y_1) \in G_1$, i.e., $y_1$ is a child of $x_1$ reached by an edge labelled $a$, there exists $(x_2, a, y_2) \in G_2$ such that $R(y_1, y_2)$. Such a relation $R$ is also referred to as a graph homomorphism function.
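The condition in Definition 2.2 can be computed by a simple fixpoint refinement: start from all node pairs and repeatedly discard pairs that violate it. The sketch below is a naive illustration of that idea, not the algorithm of [LS97a]. The function name `greatest_simulation` and the encoding of a graph as a set of `(source, label, target)` triples are assumptions made for the example.

```python
def greatest_simulation(edges1, edges2):
    """Return the largest relation R over nodes(G1) x nodes(G2) such that
    R(x1, x2) and (x1, a, y1) in G1 imply some (x2, a, y2) in G2 with R(y1, y2).

    Each graph is a set of (source, label, target) triples.
    """
    nodes1 = {n for (x, _, y) in edges1 for n in (x, y)}
    nodes2 = {n for (x, _, y) in edges2 for n in (x, y)}
    rel = {(u, v) for u in nodes1 for v in nodes2}   # start with every pair

    changed = True
    while changed:
        changed = False
        for (x1, x2) in list(rel):
            for (src, label, y1) in edges1:
                if src != x1:
                    continue
                # x2 must have an equally labelled edge to a node related to y1.
                matched = any(
                    s2 == x2 and l2 == label and (y1, y2) in rel
                    for (s2, l2, y2) in edges2
                )
                if not matched:
                    rel.discard((x1, x2))
                    changed = True
                    break
    return rel


# G1: q --a--> q1        G2: d --a--> d1, d --b--> d2
G1 = {("q", "a", "q1")}
G2 = {("d", "a", "d1"), ("d", "b", "d2")}
```

With these graphs, the pair `("q", "d")` belongs to `greatest_simulation(G1, G2)`, since the single `a` edge of `q` is matched by `d`. In the opposite direction, `("d", "q")` is absent from `greatest_simulation(G2, G1)`, because `q` has no `b` edge.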
It is shown in [CR97] that conjunctive query containment, although an NP-complete problem in general, can be solved in polynomial time if the query is acyclic. Their idea generalizes the notion of acyclicity using a measure called query width, which is derived from graph theory. [LS97a] gives a containment algorithm for complex objects by relating the problem to graph simulation, and thereby shows that query containment for complex objects is decidable. It is also shown that checking the equivalence of conjunctive queries for complex objects with grouping and aggregates is NP-complete.
The connection between query containment and rewriting is also studied in [CR97]. It is shown that a guess of a rewriting can be extended to a guess of the containment mappings that establish the equivalence of the rewriting and the query, and that checking this equivalence is NP-complete. However, there are many polynomial-time cases of the rewriting problem in practice, analogous to those for query containment.
Definition 2.3 (Maximal Rewriting) Given a view query $Q_1$ and a new query $Q_2$, a query $Q_2'$ is a contained rewriting of query $Q_2$ using $Q_1$ if $Q_2'$