(Subsumption for simple breadth complete Xcerpt query terms) Subsumption for Xcerpt([]) and Xcerpt({}) can be decided in linear time.

After this short intermezzo on the complexity of subsumption for less expressive fragments of Xcerpt query terms, let us return to the discussion of Xcerpt({{}}) subsumption.

Corollary 6 gives us a reduction from Xcerpt({{}}) subsumption to Xcerpt query term simulation. Since the mapping from Xcerpt query terms in Xcerpt({{}}) to their single canonical models is bijective, we can also reduce simulation between an Xcerpt query term q ∈ Xcerpt({{}}) and an Xcerpt data term d ∈ Xcerpt({}) to Xcerpt({{}}) subsumption: Let µ be the mapping from query terms in Xcerpt({{}}) to their canonical models. Then q simulates into d if and only if q contains µ⁻¹(d). Thus we obtain Corollary 8.

Corollary 8 (Complexity of Subsumption for Xcerpt({{}})). Subsumption for Xcerpt({{}}) is in the same complexity class as Simulation between Xcerpt({{}}) query terms and Xcerpt({}) data terms.

Unfortunately, the complexity of Simulation between Xcerpt({{}}) query terms and Xcerpt({}) data terms has not yet been determined. Let Xcerpt({{}}, |Σ| = 1) be the set of Xcerpt query terms with breadth incomplete subterm specification constructed over a single label only. Subsumption for Xcerpt({{}}, |Σ| = 1), and also Simulation between Xcerpt({{}}, |Σ| = 1) query terms and Xcerpt({}) data terms, is equivalent to the subtree isomorphism problem, which has long been known to be in P and which has been shown to be solvable in O(k^1.376 · n), where k is the number of nodes in the embedded tree and n the number of nodes in the embedding tree [ST97].

Corollary 9 (Complexity of Xcerpt({{}}, |Σ| = 1)). Simulation and subsumption of Xcerpt({{}}, |Σ| = 1) query terms can be decided in O(k^1.376 · n), where k is the size of the query term and n is the size of the data.
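To make the single-label problem concrete, the following is a minimal sketch of root-to-root simulation for unordered, breadth incomplete terms over one label. It is a naive illustration, not the algorithm of [ST97]: it assumes a term is represented as a tuple of its child terms (the label is omitted because there is only one), and it finds the injective assignment of query children to data children with a simple augmenting-path bipartite matching. The names `simulates` and `occurs_in` are hypothetical.

```python
from functools import lru_cache

# Assumption: with a single label, a term is just a tuple of its child terms.

@lru_cache(maxsize=None)
def simulates(q, d):
    """True if the children of q can be mapped injectively to distinct
    children of d such that each query child simulates into its image."""
    if len(q) > len(d):
        return False
    # match[j] = index of the query child currently assigned to data child j
    match = [None] * len(d)

    def try_assign(i, seen):
        # Augmenting-path step: find a data child for query child i,
        # re-assigning earlier query children if necessary.
        for j, dc in enumerate(d):
            if j in seen or not simulates(q[i], dc):
                continue
            seen.add(j)
            if match[j] is None or try_assign(match[j], seen):
                match[j] = i
                return True
        return False

    return all(try_assign(i, set()) for i in range(len(q)))

def occurs_in(q, d):
    """True if q simulates into d or into some subterm of d."""
    return simulates(q, d) or any(occurs_in(q, c) for c in d)

query = ((), ((),))            # root with a leaf child and a child with one leaf
data = (((), ()), (), ())      # root with three children, the first has two leaves
print(simulates(query, data))  # True
```

The matching step is what distinguishes this problem from the ordered case: a query child may have to be moved to a different data child so that a later query child can take its place.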

Subtree isomorphism has also been examined for ordered trees. [Mä89] obtains an O(n + m) bound for finding an ordered bottom-up subtree of size m in an ordered tree of size n by rewriting trees of arity k to binary trees and comparing the Zaks sequence representation of pattern and data tree. A bottom-up subtree t_b of a tree t is a subtree such that for all nodes x and y in t, if x is in t_b and y is a child of x, then y must also be in t_b. Ordered bottom-up subtree matching is equivalent to the Xcerpt query term fragment over a singular alphabet with complete, ordered term specification only, and with the root term qualified as a descendant. We denote this fragment as Xcerpt([], |Σ| = 1, ↑). As a further illustration, the term desc a[ a[ a[], a[] ], a[] ] is in Xcerpt([], |Σ| = 1, ↑), but the terms a[], desc a[[ ]] and desc a[ b[] ] are not.
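Ordered bottom-up subtree matching over a single label can be illustrated with a small sketch. It does not use the Zaks sequences over binary trees of the cited work; as a stand-in, it assumes a balanced-parenthesis encoding of the ordered tree. The encoding of a complete subtree is a contiguous substring of the encoding of the whole tree, and every occurrence of a single-rooted balanced string is exactly such a subtree, so a substring search decides the problem. The tuple representation and the names `encode` and `has_bottom_up_subtree` are hypothetical.

```python
# Assumption: an ordered single-label tree is a tuple of its child trees.

def encode(tree):
    """Encode an ordered tree as a balanced parenthesis string,
    children in left-to-right order."""
    return "(" + "".join(encode(child) for child in tree) + ")"

def has_bottom_up_subtree(pattern, data):
    """True if some node of data roots a subtree identical to pattern,
    i.e. with all of its children, in the same order."""
    return encode(pattern) in encode(data)

# The term desc a[ a[ a[], a[] ], a[] ] from the text, without labels:
pattern = (((), ()), ())
data = ((), (((), ()), ()))
print(encode(pattern))                       # ((()())())
print(has_bottom_up_subtree(pattern, data))  # True
```

A data tree whose candidate node has an extra child, such as `(((), (), ()), ())`, is rejected, because a bottom-up subtree must contain all children of every node it contains.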


Figure 11: Embeddings from [Kil92] versus Xcerpt simulation. [Figure not reproduced: tree G1 with nodes a, b, c and tree G2 with nodes a, b, c, d, related by G1 ≤_i G2.]

[Val02] also considers ordered subtree isomorphism over a singular alphabet Σ, but again with a different notion of subtrees than required for Xcerpt([[]]) simulation: If a node n_d in the data tree d is matched by some node n_p in the pattern tree p, then all left siblings of n_d must also be matched by left siblings of n_p in the same embedding. These kinds of queries are expressible in Xcerpt only in the presence of subterm negation (i.e. the without keyword). Therefore we do not give an upper bound for the complexity of Xcerpt([[]], |Σ| = 1) simulation, but refer to the bound for the larger fragment Xcerpt([[]]) identified below.

Having identified the complexity of simulation and subsumption for Xcerpt({{}}, |Σ| = 1), we now turn to the equivalent problem over an arbitrary alphabet. [Kil92] considers ten different tree inclusion problems, four of which are of interest for Xcerpt simulation and, by way of canonical models, also for Xcerpt containment. These are ordered path inclusion, unordered path inclusion, ordered tree inclusion and unordered tree inclusion. Central to all these problems is the notion of embeddings. An embedding as defined in [Kil92] from a pattern tree p to a data tree d is an injective function from the nodes of p to the nodes of d that preserves labels and the ancestor relationship. If there is an embedding from p into d, we write p ≤_i d. Embeddings differ from XPath tree matching in that they are required to be injective, and from Xcerpt term simulation in that injectivity is treated differently in combination with the Xcerpt descendant modifier, as Figure 11 illustrates. While the Xcerpt query term a{{ desc b, desc c }} does not simulate into the data term a{ d{ b, c } }, there is an injective embedding from G1 to G2. Embeddings as defined in [Kil92] are also equivalent to minor embeddings (see below). This difference between Xcerpt term simulation and embeddings makes Xcerpt simulation computationally cheaper, without sacrificing much expressivity.
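The contrast between the two notions can be reproduced on the trees of Figure 11 with a brute-force sketch. Several things here are assumptions made for illustration: trees are `(label, children)` tuples, query terms are `(label, children, is_desc)` tuples whose braces are always `{{ }}`, "preserves the ancestor relationship" is read as an if-and-only-if condition, and `desc t` is taken to match a data child if `t` simulates into that child or into one of its descendants. The function names are hypothetical, and the exhaustive search is only suitable for tiny inputs.

```python
from itertools import permutations

G1 = ("a", (("b", ()), ("c", ())))
G2 = ("a", (("d", (("b", ()), ("c", ()))),))

def nodes(tree, path=()):
    """Yield (path, label) for every node; a path is a tuple of child indices."""
    label, children = tree
    yield path, label
    for i, child in enumerate(children):
        yield from nodes(child, path + (i,))

def is_ancestor(x, y):
    """x is a proper ancestor of y iff x's path is a proper prefix of y's."""
    return len(x) < len(y) and y[:len(x)] == x

def has_embedding(p, d):
    """Search for an injective, label- and ancestor-preserving node mapping."""
    pn, dn = list(nodes(p)), list(nodes(d))
    for image in permutations(dn, len(pn)):   # injective by construction
        if any(pl != dl for (_, pl), (_, dl) in zip(pn, image)):
            continue
        if all(is_ancestor(x, y) == is_ancestor(fx, fy)
               for (x, _), (fx, _) in zip(pn, image)
               for (y, _), (fy, _) in zip(pn, image)):
            return True
    return False

def subterms(d):
    yield d
    for child in d[1]:
        yield from subterms(child)

def simulates(q, d):
    """Simplified simulation of a {{ }} query term into a data term."""
    label, qchildren, is_desc = q
    if is_desc:
        plain = (label, qchildren, False)
        return any(simulates(plain, s) for s in subterms(d))
    if label != d[0]:
        return False
    # Query children must go to pairwise distinct children of the data term.
    return any(all(simulates(qc, dc) for qc, dc in zip(qchildren, image))
               for image in permutations(d[1], len(qchildren)))

Q = ("a", (("b", (), True), ("c", (), True)), False)  # a{{ desc b, desc c }}
print(has_embedding(G1, G2))  # True
print(simulates(Q, G2))       # False
```

Under these assumptions the simulation fails because both `desc b` and `desc c` can only be matched through the single child `d`, whereas the embedding maps `b` and `c` to two distinct nodes below `d`.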

An ordered path inclusion problem is the problem of finding an embedding of an ordered pattern tree p into an ordered data tree d that respects the order of subtrees in the pattern and the child relationship. Such an embedding is called an ordered path embedding of p in d, and if there is such an embedding, then p is ordered path included in d. In Figure 11, G1 is not ordered path included in G2, because there is no way of retaining the child relationship in the embedding. Ordered path inclusion is equivalent to Xcerpt([[]]) simulation over an arbitrary alphabet and can be solved in O(m · n).
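One way to see where a bound of this shape can come from is the following sketch of a greedy check. It is an illustration under simplifying assumptions, not the procedure of [Kil92]: trees are `(label, children)` tuples, the pattern root is mapped to the data root, and there are no variables or descendant modifiers. The children of a pattern node are matched left to right, each against the leftmost remaining data child it fits. Every pair of a pattern node and a data node is compared at most once, which gives at most m · n comparisons. The name `ordered_simulates` is hypothetical.

```python
def ordered_simulates(q, d):
    """True if q's root matches d's root and q's children match a
    subsequence of d's children, in order and recursively."""
    if q[0] != d[0]:
        return False
    j = 0
    for qc in q[1]:
        # Advance to the leftmost remaining data child that qc fits.
        while j < len(d[1]) and not ordered_simulates(qc, d[1][j]):
            j += 1
        if j == len(d[1]):
            return False
        j += 1  # this data child is used up; later query children go right of it
    return True

pattern = ("a", (("b", ()), ("c", ())))               # a[[ b, c ]]
data = ("a", (("b", ()), ("x", ()), ("c", ())))       # a[ b, x, c ]
nested = ("a", (("d", (("b", ()), ("c", ()))),))      # a[ d[ b, c ] ]
print(ordered_simulates(pattern, data))    # True
print(ordered_simulates(pattern, nested))  # False
```

Choosing the leftmost fitting child is safe because it leaves the largest possible set of data children for the query children that follow.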


Corollary 10. Simulation between an Xcerpt query term q ∈ Xcerpt([[]])

In document Linse, Benedikt (2010): Data Integration on the (Semantic) Web with Rules and Rich Unification. Dissertation, LMU München: Fakultät für Mathematik, Informatik und Statistik (Page 194-196)