Retrieving Cache Contents - Incremental Query Evaluation with the RCADG Cache

10.5 Incremental Query Evaluation with the RCADG Cache

10.5.2 Retrieving Cache Contents

Every new query to be evaluated incrementally first undergoes the same schema-level rewriting and matching procedures that were described for the evaluation from scratch (see Sections 8.4.1 and 8.4.2). In the

CHAPTER 10. THE RCADG CACHE FOR XML QUERIES AND RESULTS

| schema hit | evaluation step | match edge ⟨new edge, cache edge⟩ |
|---|---|---|
| $\chi_1^{Q^n}$ | $s_1^{Q}$ | $\langle \mathit{Parent}^{*}_{*}(q_3^n, q_1^n),\ \langle \mathit{Parent}^{*}_{*}(q_4, q_1),\ s_1^{Q},\ \{\chi^{Q}\}\rangle\rangle$ |
| $\chi_1^{Q^n}$ | $s_2^{Q}$ | $\langle \mathit{Parent}^{*}_{*}(q_2^n, q_1^n),\ \langle \mathit{Parent}^{*}_{*}(q_2, q_1),\ s_2^{Q},\ \{\chi^{Q}\}\rangle\rangle$ |
| $\chi_1^{Q^n}$ | $s_1^{Q'}$ | $\langle \mathit{Parent}^{*}_{*}(q_3^n, q_1^n),\ \langle \mathit{Parent}^{*}_{*}(q'_3, q'_1),\ s_1^{Q'},\ \{\chi^{Q'}\}\rangle\rangle$ |
| $\chi_1^{Q^n}$ | $s_2^{Q'}$ | $\langle \mathit{Parent}^{*}_{*}(q_2^n, q_1^n),\ \langle \mathit{Parent}^{*}_{*}(q'_2, q'_1),\ s_2^{Q'},\ \{\chi^{Q'}\}\rangle\rangle$ |
| $\chi_2^{Q^n}$ | $s_2^{Q}$ | $\langle \mathit{Parent}^{*}_{*}(q_2^n, q_1^n),\ \langle \mathit{Parent}^{*}_{*}(q_2, q_1),\ s_2^{Q},\ \{\chi^{Q}\}\rangle\rangle$ |
| $\chi_2^{Q^n}$ | $s_2^{Q'}$ | $\langle \mathit{Parent}^{*}_{*}(q_2^n, q_1^n),\ \langle \mathit{Parent}^{*}_{*}(q'_2, q'_1),\ s_2^{Q'},\ \{\chi^{Q'}\}\rangle\rangle$ |

Figure 10.5: The result $L_{Q^n}$ of looking up cache edges in $C_{\{Q',Q\}}$ (see Figure 10.4 on the preceding page) for the schematized binary constraints in $Q^n{\downarrow}\chi_1^{Q^n}$ and $Q^n{\downarrow}\chi_2^{Q^n}$ (see Figures 10.1 f., g. on page 143). The look-up result is a nested mapping with the following structure. Each of the two schema hits for $Q^n$ (left column) is mapped to a nested mapping that groups the retrieved cache edges by evaluation steps. Every distinct evaluation step from any of the retrieved cache edges (middle column) is mapped to a set of match edges, each representing the matching of a relevant binary constraint which happened in that step. A match edge simply binds a cache edge to the corresponding query edge in $Q^n$, and thus indicates which D-constraints in a cached query and in $Q^n$ must be compared.

case of our sample query $Q^n$, this yields the two schema hits $\chi_1^{Q^n}$ and $\chi_2^{Q^n}$. Next, the query is normalized and schematized with each of these schema hits, as shown in Figures 10.1 f., g. on page 143. The resulting schema edges are then looked up in the main-memory part of the RCADG Cache. In our example, three distinct schema edges are looked up, namely, $\mathit{Parent}^{*}_{*}(\#5,\#1)$, $\mathit{Parent}^{*}_{*}(\#2,\#1)$ and $\mathit{Parent}^{*}_{*}(\#6,\#1)$. In the cache $C_{\{Q',Q\}}$, four cache edges are retrieved for the first two schema edges (top four rows in Figure 10.4 on the facing page) whereas there is no hit for the third one.
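The look-up step just described can be pictured as a keyed retrieval. The following is only a minimal sketch, assuming the main-memory part of the cache behaves like a hash map from schema edges to lists of cache edges. The names `SchemaEdge`, `CacheEdge`, `CACHE` and `look_up`, as well as the plain-text spellings such as `"s1^Q"`, are hypothetical and chosen for illustration. The four entries are the cache edges that appear in Figure 10.5; the two further entries of Figure 10.4 are left out because their contents are not given in this section.

```python
from typing import Dict, FrozenSet, List, NamedTuple, Tuple


class SchemaEdge(NamedTuple):
    axis: str    # plain-text spelling of the axis label (assumption)
    source: int  # schema node number, e.g. 5 for #5
    target: int  # schema node number, e.g. 1 for #1


class CacheEdge(NamedTuple):
    query_edge: Tuple[str, str, str]  # (axis, source node, target node) in a cached query
    step: str                         # evaluation step in which the edge was matched
    schema_hits: FrozenSet[str]       # schema hits that produced the schema edge


# Hypothetical in-memory cache holding the four cache edges shown in Figure 10.5.
CACHE: Dict[SchemaEdge, List[CacheEdge]] = {
    SchemaEdge("Parent**", 5, 1): [
        CacheEdge(("Parent**", "q4", "q1"), "s1^Q", frozenset({"chi^Q"})),
        CacheEdge(("Parent**", "q3'", "q1'"), "s1^Q'", frozenset({"chi^Q'"})),
    ],
    SchemaEdge("Parent**", 2, 1): [
        CacheEdge(("Parent**", "q2", "q1"), "s2^Q", frozenset({"chi^Q"})),
        CacheEdge(("Parent**", "q2'", "q1'"), "s2^Q'", frozenset({"chi^Q'"})),
    ],
}


def look_up(cache, schema_edges):
    """Return the cache edges stored under each schema edge (empty list on a miss)."""
    return {edge: cache.get(edge, []) for edge in schema_edges}


# The three distinct schema edges of the sample query.
looked_up = look_up(CACHE, [
    SchemaEdge("Parent**", 5, 1),
    SchemaEdge("Parent**", 2, 1),
    SchemaEdge("Parent**", 6, 1),
])
hits = sum(len(edges) for edges in looked_up.values())
print(hits, "cache edges retrieved")
```

Because each schema edge is used directly as a key, entries stored under other schema edges are never inspected in this sketch.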

The look-up result for $Q^n$ is rearranged in a nested map $L_{Q^n}$ (see Figure 10.5), as follows. Each cache edge $c_c$ retrieved for a schema edge $c_s$ is bound to the binary constraint $c$ in $Q^n$ that created $c_s$.

For instance, looking up the schema edge $c_s = \mathit{Parent}^{*}_{*}(\#5,\#1)$ that was created from the binary constraint $c = \mathit{Parent}^{*}_{*}(q_3^n, q_1^n)$ in $Q^n$, we retrieve the cache edge $c_c = \langle \mathit{Parent}^{*}_{*}(q_4, q_1),\ s_1^{Q},\ \{\chi^{Q}\}\rangle$ in $C_{\{Q',Q\}}$. Therefore $c$ and $c_c$ are associated in the first entry of $L_{Q^n}$ in Figure 10.5. Henceforth we refer to such a pair $\langle c, c_c\rangle$ of a query edge $c$ in $Q^n$ and a cache edge $c_c$ retrieved for $c$ as a match edge. Match edges specify which D-constraints in a cached and a new query must be reconciled for schema-hit containment to hold true. In this case, the first match edge in Figure 10.5 specifies that the second condition in Definition 10.1 on page 144 must be checked for the D-constraints attached to two pairs of query nodes, namely, $q_3^n, q_4$ and $q_1^n, q_1$. The differences and relations between schema edges, cache edges and match edges are summarized in Table 10.1 on the following page.
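To make the notion of a match edge concrete, here is a small sketch, under the assumption that edges can be modelled as plain tuples of node names. It shows how the first match edge of Figure 10.5 determines the two node pairs whose D-constraints have to be compared. The type names and the helper `node_pairs` are hypothetical and not taken from the text.

```python
from typing import FrozenSet, List, NamedTuple, Tuple


class QueryEdge(NamedTuple):
    axis: str
    source: str
    target: str


class CacheEdge(NamedTuple):
    query_edge: QueryEdge        # edge in the cached query
    step: str                    # evaluation step of the cached query
    schema_hits: FrozenSet[str]  # schema hits of the cached query


class MatchEdge(NamedTuple):
    new_edge: QueryEdge    # query edge c in the new query
    cache_edge: CacheEdge  # cache edge c_c retrieved for c


def node_pairs(match: MatchEdge) -> List[Tuple[str, str]]:
    """Pair up source with source and target with target across the two edges."""
    cached = match.cache_edge.query_edge
    return [(match.new_edge.source, cached.source),
            (match.new_edge.target, cached.target)]


# First match edge of Figure 10.5, written with plain-text node names (assumption).
c = QueryEdge("Parent**", "q3^n", "q1^n")
cc = CacheEdge(QueryEdge("Parent**", "q4", "q1"), "s1^Q", frozenset({"chi^Q"}))
first_match = MatchEdge(c, cc)
print(node_pairs(first_match))
```

The sketch only identifies which nodes belong together; the actual comparison of their D-constraints according to Definition 10.1 is not modelled here.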

As can be seen in Figure 10.5, the look-up for $Q^n$ in $C_{\{Q',Q\}}$ produces six match edges (right-hand side, one match edge in each row). The nested structure of $L_{Q^n}$ emerges when grouping these match edges (1) by the schema hit for $Q^n$ for which the cache edges were retrieved (left column) and (2) by the evaluation steps in the cache edges (middle column), in that order. For instance, the first four match edges in $L_{Q^n}$ were retrieved for $Q^n{\downarrow}\chi_1^{Q^n}$ and the last two for $Q^n{\downarrow}\chi_2^{Q^n}$. Note that since $Q^n{\downarrow}\chi_1^{Q^n}$ and $Q^n{\downarrow}\chi_2^{Q^n}$ share the same schema edge $\mathit{Parent}^{*}_{*}(\#2,\#1)$ (see Figures 10.1 f., g. on page 143), the match edges for $\chi_2^{Q^n}$ in the last two rows of Figure 10.5 are duplicates of the match edges for $\chi_1^{Q^n}$ in rows two and four. This redundancy will allow us to obtain matches to distinct schema hits for $Q^n$ independently, which is a characteristic of the notion of RCADG Cache overlap introduced before (see Definition 10.2 on page 146). In fact, $L_{Q^n}$ is usually not materialized in its entirety at any given point in time. Instead we successively and separately create, then process and finally discard each of the distinct top-level entries for all schema hits of $Q^n$ (see below).
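The grouping and the one-entry-at-a-time processing can be sketched with a generator. This is an illustration under stated assumptions, not the thesis implementation: edges are plain tuples, the names `CACHE`, `SCHEMATIZATIONS` and `entries` are hypothetical, and the query edge assumed to have produced $\mathit{Parent}^{*}_{*}(\#6,\#1)$ for the second schema hit is a guess, since the section does not state it.

```python
from collections import defaultdict

# Cache edges from Figure 10.5 as (query edge, evaluation step, schema hits).
CACHE = {
    ("Parent**", 5, 1): [
        (("Parent**", "q4", "q1"), "s1^Q", ("chi^Q",)),
        (("Parent**", "q3'", "q1'"), "s1^Q'", ("chi^Q'",)),
    ],
    ("Parent**", 2, 1): [
        (("Parent**", "q2", "q1"), "s2^Q", ("chi^Q",)),
        (("Parent**", "q2'", "q1'"), "s2^Q'", ("chi^Q'",)),
    ],
}

# For each schema hit of the new query: schema edge -> query edge that created it.
SCHEMATIZATIONS = {
    "chi1^Qn": {
        ("Parent**", 5, 1): ("Parent**", "q3^n", "q1^n"),
        ("Parent**", 2, 1): ("Parent**", "q2^n", "q1^n"),
    },
    "chi2^Qn": {
        ("Parent**", 6, 1): ("Parent**", "q3^n", "q1^n"),  # assumed origin of this schema edge
        ("Parent**", 2, 1): ("Parent**", "q2^n", "q1^n"),
    },
}


def entries(schematizations, cache):
    """Yield one top-level entry (schema hit, step -> match edges) at a time."""
    for hit, schema_to_query in schematizations.items():
        by_step = defaultdict(list)
        for schema_edge, query_edge in schema_to_query.items():
            for cache_edge in cache.get(schema_edge, []):
                step = cache_edge[1]
                # A match edge pairs the new query edge with the retrieved cache edge.
                by_step[step].append((query_edge, cache_edge))
        yield hit, dict(by_step)


for hit, by_step in entries(SCHEMATIZATIONS, CACHE):
    # Each entry can be processed and then discarded before the next one is built.
    print(hit, sum(len(edges) for edges in by_step.values()), "match edges")
```

Since the generator builds each top-level entry only when it is requested, the full nested mapping never has to exist in memory at once, which mirrors the create, process, discard cycle described above.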

As indicated by the curly braces in Figure 10.5, each nesting level in $L_{Q^n}$ is a one-to-many mapping. On the lower level (right-hand side), there may be multiple cache edges representing binary constraints in


| edge type | description |
|---|---|
| query edge $c$ (Fig. 10.1a.–c.) | Specifies a binary query constraint on the intensional level. These are the edges in the query graph. There are query edges for expressing all XPath axes. |
| schema edge $c_s$ (Fig. 10.1d.–g.) | Represents a schema-level match to a query edge for a specific schema hit. Schema edges are created by schematizing queries to be cached or to be looked up in the cache. They serve as keys in the main-memory part of the cache, making it possible to retrieve cached candidate queries for a new query to be evaluated incrementally. |
| cache edge $c_c$ (Fig. 10.4) | Indicates which query edge in a cached query corresponds to a particular schema edge, and which schema hits produced that schema edge during the schematization of that query. Cache edges serve to collect all schema hits to a cached query that are relevant to a specific schema edge being looked up in the cache. Each cache edge also specifies in which evaluation step the query edge in question was matched on document level. |
| match edge $c_m$ (Fig. 10.5) | Binds a cache edge to a query edge that belongs to a new query being looked up in the cache. Match edges specify which D-constraints in a cached query correspond to which D-constraints in the new query. This is essential for deciding schema-hit containment and creating remainder queries that return the RCADG Cache overlap. |

Table 10.1: Different representations of binary query constraints (“edges”) during the incremental evaluation process. Only query edges (first row) are part of the query model (see Section 2.2). All other types of edge are needed for retrieving and comparing queries that are stored in the RCADG Cache.

a specific cached query that were matched in the same evaluation step (although this is not the case for our sample queries $Q'$ and $Q$ in the cache). The upper level of $L_{Q^n}$ (left-hand side of Figure 10.5) is a one-to-many mapping, too, since for the same schema hit of a new query, cache edges for different queries and evaluation steps may be retrieved from the cache, as shown in the figure.

Finally, note that the look-up result $L_{Q^n}$ for $Q^n$ only covers the first two steps in the evaluation of the cached queries $Q'$ and $Q$. In particular, the cache entries in $C_{\{Q',Q\}}$ for the schema edges $\mathit{Parent}^{*}_{*}(\#5,\#3)$ and $\mathit{Parent}^{*}_{*}(\#4,\#1)$ (last two rows in Figure 10.4 on page 148) are not retrieved because these are not part of any schematization of $Q^n$ (see Figure 10.1 on page 143). Provided that the mapping underlying $C_{\{Q',Q\}}$ is implemented so as to avoid a sequential scan of the memory-resident cache part (e.g., using suitable hash functions), such irrelevant cache contents are typically never touched during the look-up. This means that even as the cache grows, the promising candidate queries are retrieved very efficiently. In Section 10.6 we experimentally confirm the scalability of the RCADG Cache.

In document Weigel, Felix (2006): Structural Summaries as a Core Technology for Efficient XML Retrieval. Dissertation, LMU München: Fakultät für Mathematik, Informatik und Statistik (Page 160-162)