Problem Definition - Semantic Caching for XML Queries

and structured results. Moreover, it can improve the maintenance efficiency over the re-computation approach by utilizing a local auxiliary structure for accommodating the objects relevant to the web view computation.

1.4 Problem Definition

1.4.1 Containment and Rewriting for XQuery

We first investigate the challenges imposed by the problem of XQuery containment and present an approach for tackling it. Containment for a complete XML query language such as XQuery remains unexplored in the literature. Compared to XPath, XQuery is more powerful in the sense that it can specify sophisticated queries by utilizing variable bindings, element constructions, and result restructuring. XPath expressions serve as a basic query construct of XQuery for selecting objects to be associated with variables or to be returned in the result. Clearly, any result on XPath containment provides a foundation for solving XQuery containment. In particular, it can resolve the containment relationship between two selected node sets if they are derived from the same starting node via different XPath expressions. However, bridging the gap between containment for XPath and containment for XQuery is not trivial. We face the following difficulties.
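To make the notion of two node sets "derived from the same starting node via different XPath expressions" concrete, the following minimal sketch evaluates two paths on one sample document and compares the selected node sets. The document and both paths are illustrative assumptions, not taken from this work. Note that containment is a property over all documents, so a check on a single instance can refute containment but cannot establish it.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for illustration.
DOC = """
<bib>
  <book><title>A</title><author>X</author></book>
  <book><title>B</title></book>
  <article><title>C</title></article>
</bib>
"""

def select(root, path):
    """Return the identities of the nodes an ElementTree path selects from root."""
    return {id(node) for node in root.findall(path)}

root = ET.fromstring(DOC)
p = select(root, "./book[author]/title")   # titles of books that have an author
q = select(root, ".//title")               # every title in the document

print(p <= q)   # on this instance, every node selected by p is also selected by q
print(q <= p)   # the converse fails here, so q is not contained in p
```

The subset test on node identities mirrors the set-based definition of containment; a containment algorithm has to reach the same verdict by reasoning over the expressions alone.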

First, the research on XPath containment has primarily focused on the complexity of the containment problem for various fragments of XPath [G. 02b, Woo03, FT03] rather than on the full XPath language [W3C03b]. Among other results, the containment of the XPath fragment XP{/, //, [ ], *} is coNP-complete [G. 02b]. If disjunction “|” is added to this fragment and XML documents are restricted to a finite alphabet, then the containment complexity jumps to PSPACE-complete [FT03]. Therefore, restricting the XPath features considered for the containment problem to a certain fragment is a valid research methodology. In our problem domain, many other features such as variable bindings, nested FLWR expressions, and element constructors coexist with XPath to compose XQuery. The question that follows is which subset of XQuery features should be given higher priority when studying the containment problem.

Second, the existing XPath containment work often exploits pattern trees [G. 02b] for representing XPath expressions, based on which the XPath containment problem can be reduced to finding a tree homomorphism. Hence the question is whether XQuery can be represented in a pattern tree form similar to that used for XPath fragments, or, if not, what an appropriate representation of XQuery is, such that it precisely captures the semantics of the considered features and can serve as a mechanism facilitating containment checking and rewriting.
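The pattern-tree view of XPath containment can be sketched as follows. This is a simplified illustration under stated assumptions, not the representation developed in this work: patterns are Boolean (no distinguished output node), edges are either child ("/") or descendant ("//"), and the names `PNode`, `maps_to`, and `contained_by_homomorphism` are hypothetical. A homomorphism from q to p is a sufficient condition for p being contained in q; when branching, wildcards, and descendant edges all occur together, the test is known to be incomplete, which is consistent with the coNP-completeness result cited above.

```python
from dataclasses import dataclass, field

@dataclass
class PNode:
    label: str                                     # element name, or "*" for a wildcard
    children: list = field(default_factory=list)   # list of (axis, PNode), axis "/" or "//"

def descendants(node):
    """Yield all proper descendants of a pattern node, through edges of either type."""
    for _, child in node.children:
        yield child
        yield from descendants(child)

def maps_to(q, p):
    """True if the subpattern rooted at q can be mapped homomorphically onto p."""
    if q.label != "*" and q.label != p.label:
        return False
    for axis, qc in q.children:
        if axis == "/":
            # a child edge of q must land on a child edge of p
            candidates = [pc for ax, pc in p.children if ax == "/"]
        else:
            # a descendant edge of q may land on any proper descendant of p
            candidates = list(descendants(p))
        if not any(maps_to(qc, pc) for pc in candidates):
            return False
    return True

def contained_by_homomorphism(p, q):
    """Sufficient test for 'p is contained in q': map q onto p, root to root."""
    return maps_to(q, p)

# p corresponds to a[b/c]/d and q to a[.//c], both read as Boolean patterns.
p = PNode("a", [("/", PNode("b", [("/", PNode("c"))])), ("/", PNode("d"))])
q = PNode("a", [("//", PNode("c"))])

print(contained_by_homomorphism(p, q))
print(contained_by_homomorphism(q, p))
```

The recursion is written for clarity and re-examines subpatterns; a practical version would memoize the `(q, p)` pairs.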

Third, since XQuery has more sophisticated features than XPath, the procedure for determining the containment of XQuery is likely to be more complicated than that for XPath. The questions hence are what specific conditions are required for containment checking, and how the containment algorithm utilizes them to determine containment.

Fourth, the result of an XPath expression query is a node set of a single element type, while that of an XQuery is a tree composed of data bindings derived from the original XML document. This result restructuring capability of XQuery complicates the rewriting problem. If a containment relationship Q2 ⊆ Q1 is determined, we need to answer the question of how to match the data pieces in Q1’s result tree to their origins in the input XML data tree, so that Q2 can be rewritten to locate the desired data bindings in Q1’s result tree instead of in the original XML data tree.
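The rewriting difficulty can be illustrated with a toy example. Everything here is an assumption made for illustration: the source document, the cached query Q1 (emulated by `run_q1`), the `ORIGIN` table that records where each source path lands in Q1's restructured result, and the `rewrite` helper. It is not the mapping mechanism proposed in this work, only a sketch of what such a mapping has to provide.

```python
import xml.etree.ElementTree as ET

# Hypothetical source document.
SOURCE = ET.fromstring(
    "<bib>"
    "<book><title>A</title><price>10</price></book>"
    "<book><title>B</title><price>20</price></book>"
    "</bib>"
)

# Assumed mapping from source paths bound by Q1 to their place in Q1's result tree.
ORIGIN = {
    "book": "entry",
    "book/title": "entry/name",
    "book/price": "entry/cost",
}

def run_q1(source):
    """Emulate a cached query Q1 that restructures each book into an <entry>."""
    result = ET.Element("result")
    for book in source.findall("book"):
        entry = ET.SubElement(result, "entry")
        ET.SubElement(entry, "name").text = book.findtext("title")
        ET.SubElement(entry, "cost").text = book.findtext("price")
    return result

def rewrite(path):
    """Redirect a source path to Q1's result tree, or None if Q1 does not retain it."""
    return ORIGIN.get(path)

cached = run_q1(SOURCE)
q2 = "book/title"   # new query Q2, assumed to be contained in Q1
direct = [n.text for n in SOURCE.findall(q2)]
via_cache = [n.text for n in cached.findall(rewrite(q2))]

print(direct == via_cache)   # both routes yield the same title values
```

The element names differ between the two trees (`title` versus `name`), which is exactly why the rewritten query needs the origin mapping rather than the original path.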

Objectives. With the overall goal of solving the XQuery containment and rewriting problem, we set the following tasks.

1). Define an XQuery fragment containing an appropriate subset of XQuery features to be considered for the containment problem¹;

2). Devise a precise representation for XQuery which can serve as a mechanism facilitating the containment checking and rewriting;

3). Propose the containment checking conditions and the algorithm that utilizes them for determining containment;

4). Prove the soundness of our containment checking approach;

5). Find a mechanism for establishing the mapping between data in the restructured result tree and their origins in the input tree, and design the rewriting technique exploiting this mapping.

¹ By “appropriate”, we mean that the fragment includes the core features, such as nested FLWR expressions, variable bindings, and result constructions, which are distinct from XPath features but do not necessarily induce a jump in containment complexity. Features that are either too trivial to be considered or would greatly complicate the containment problem are left out of this work.

1.4.2 Replacement for XQuery-based Semantic Cache

In traditional semantic caching systems, a query region is the minimal granularity managed in the cache. When a new query arrives that overlaps with a cached one, the cached query region is either split into two or preserved as a whole. The former cache region management scheme causes the cache space to become severely fragmented over time, while the latter does not allow the popularity of XML fragments to be recorded precisely, because whole query regions are too coarse a granularity.

Replacement strategies based on either of these two schemes have drawbacks. If the cache space is over-fragmented, caching a new query may require purging many tiny query regions. If only imprecise user access statistics are recorded on query regions, replacement becomes close to random. Also, the replacement unit may be too coarse-grained to utilize the cache space efficiently.

The question is whether we can record user access statistics at a fine-grained level rather than on whole query regions while still avoiding physical region splitting. This way, the replacement function can calculate the utility value based on precise statistics without suffering from the over-fragmentation problem. Our objectives are hence to find such a way of recording fine-grained statistics, to propose a replacement function utilizing this information, and to maintain the cache regions accordingly.
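One way to picture fine-grained statistics without physical splitting is sketched below. The design is an assumption made for illustration, not the replacement function proposed in this work: each region stays one physical unit, access counters are kept per fragment inside it, and the utility is taken to be hits per byte. The names `Region`, `record_access`, `utility`, and `choose_victims` are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Region:
    """A cached query region kept as one physical unit (never split)."""
    name: str
    size: dict                                  # fragment id -> size in bytes
    hits: dict = field(default_factory=dict)    # fragment id -> access count

    def record_access(self, fragment_ids):
        """Count an access on exactly the fragments a query touched."""
        for fid in fragment_ids:
            self.hits[fid] = self.hits.get(fid, 0) + 1

    def total_size(self):
        return sum(self.size.values())

    def utility(self):
        """Hits per byte, aggregated from fragment-level counters (assumed formula)."""
        return sum(self.hits.values()) / self.total_size()

def choose_victims(regions, bytes_needed):
    """Pick whole regions to evict, lowest utility first, until enough space is freed."""
    victims, freed = [], 0
    for region in sorted(regions, key=Region.utility):
        if freed >= bytes_needed:
            break
        victims.append(region.name)
        freed += region.total_size()
    return victims

r1 = Region("r1", {"a": 100, "b": 100})
r2 = Region("r2", {"c": 50})
for _ in range(3):
    r1.record_access(["a"])     # only fragment "a" of r1 is popular
r2.record_access(["c"])

print(choose_victims([r1, r2], 150))
```

Because the counters live at the fragment level, the same data could also drive a finer replacement unit (for example, logically dropping cold fragments of a region); the sketch evicts whole regions only to stay short.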
