4.3 Query Terms: Patterns for Selecting Data
4.3.1 Incompleteness
As discussed in Section 1.3.4, query patterns need to support incomplete query specifications, because data represented on the Web has a much more flexible schema than data represented e.g. in relational databases. Query terms may contain constructs for expressing incompleteness in breadth, in depth, with respect to order, and with respect to optional subterms. The terms “breadth” and “depth” refer to the graph induced by a data term or semistructured expression (cf. Sections 2.5 and 4.4.1). Note that the constructs described here together realise requirement 4 (“no schema required”) of David Maier’s database desiderata (cf. Section 3.2).
Incompleteness in Breadth: Partial Term Specifications
Incompleteness in breadth (i.e. within the subterms of the same parent term) is expressed by using so-called partial and total term specifications:
• double square or curly braces (i.e. [[ ]] or {{ }}) denote partial term specifications, i.e. a data term matched by the query term may contain additional subterms not matched by subterms of the query term.
• single square or curly braces (i.e. [ ] or { } as in data terms) denote total term specifications, i.e. a data term matched by the query term must not contain additional subterms that are not matched by subterms of the query term.
Consequently, a data term that is used as a query term matches only itself (and all such terms that are equivalent with respect to subterm ordering in case of unordered term specifications), whereas a query term containing partial term specifications matches possibly infinitely many data terms. As with ordered/unordered term specifications, subterms with different term specifications may be nested, but nesting within the same list of subterms is disallowed.
Example 4.5 (Total/Partial Term Specifications)
Consider the bib.xml document of the bookstore example from Section 2.4.2. The following two are query terms for this database:
  bib {
    book {
      title { "Boken Om Vikingarna" }
    }
  }
This query term does not match with the data term, as its total term specification requires that there is exactly one book with exactly one title element.
  bib {{
    book {{
      title {{ "Boken Om Vikingarna" }}
    }}
  }}
This query term will match with the data term, as it allows for additional books and additional elements inside the book element.
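The difference between total and partial term specifications can be illustrated with a small Python sketch. It is not the Xcerpt implementation: the `Term` class, the `matches` function, and the two-book data term are hypothetical stand-ins (the actual content of bib.xml is not reproduced here), and only unordered term specifications are modelled.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Term:
    label: str
    children: tuple = ()    # subterms: Term objects or plain strings
    partial: bool = False   # True models {{ }}, False models { }


def matches(query, data):
    """Return True if the query term matches the data term (unordered only)."""
    if isinstance(query, str) or isinstance(data, str):
        return query == data
    if query.label != data.label:
        return False
    n, m = len(query.children), len(data.children)
    # A total specification needs exactly as many subterms as the data term,
    # a partial one may leave data subterms unmatched.
    if n > m or (not query.partial and n != m):
        return False
    # Try every injective assignment of query subterms to data subterms.
    return any(
        all(matches(q, d) for q, d in zip(query.children, chosen))
        for chosen in permutations(data.children, n)
    )


# Assumed stand-in for the bookstore data: two books, the first with a price.
title = Term("title", ("Boken Om Vikingarna",))
data = Term("bib", (
    Term("book", (title, Term("price", ("150",)))),
    Term("book", (Term("title", ("Another Book",)),)),
))

total_query = Term("bib", (Term("book", (title,)),))
partial_query = Term("bib", (
    Term("book", (
        Term("title", ("Boken Om Vikingarna",), partial=True),
    ), partial=True),
), partial=True)

print(matches(total_query, data))    # False: extra book and extra price subterm
print(matches(partial_query, data))  # True: additional subterms are allowed
print(matches(data, data))           # True: a data term matches itself
```

The brute-force search over permutations is chosen for clarity only; it is exponential in the number of subterms and would not be suitable for real data.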
Incompleteness wrt. Order: Unordered Term Specifications
Like data terms, query terms may contain both ordered term specifications (square brackets [ ] and [[ ]]), and unordered term specifications (curly braces { } and {{ }}). Let t1 be a query term and let t2 be a data term:
• if t1 has an ordered term specification, then it matches with t2 only if t2 also has an ordered term specification. Furthermore, all subterms of t1 must match subterms in t2 in the same order of appearance.
• if t1 has an unordered term specification, then it matches with t2 if t2 has either an ordered term specification or an unordered term specification. All subterms of t1 must match subterms in t2 in arbitrary order.
If a query term uses an ordered and partial term specification, the matched data term has to contain corresponding subterms in the same order as the subterms of the query term, but there may be additional subterms in between.
Example 4.6 (Ordered/Unordered Term Specifications)
Consider the bib.xml example of Section 2.4.2. Recall that in this example the list of authors for each book uses an ordered term specification. The following two query terms show the difference between ordered and unordered term specifications in query terms:
  bib {{
    book {{
      authors [[
        author { first [ "Björn" ], last [ "Ambrosiani" ] },
        author { first [ "Sven" ], last [ "Nordqvist" ] }
      ]]
    }}
  }}
Match with all books where the author “Björn Ambrosiani” appears before the author “Sven Nordqvist”.
This query term does not match with the data term, as the authors in the database do not have the same order as in the query term.
  bib {{
    book {{
      authors {{
        author { first [ "Björn" ], last [ "Ambrosiani" ] },
        author { first [ "Sven" ], last [ "Nordqvist" ] }
      }}
    }}
  }}
Match with all books that have (at least) the two authors “Björn Ambrosiani” and “Sven Nordqvist” in any order.
This query term will match with the database, as the query term does not enforce a particular order on authors.
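The rules for ordered and unordered term specifications can be sketched in Python as follows. This is an illustrative model under stated assumptions, not Xcerpt's actual matching algorithm: `Term`, `matches`, and `author` are hypothetical names, authors are reduced to a single string, and the order of authors in the data term is assumed for the sake of the example.

```python
from dataclasses import dataclass
from itertools import combinations, permutations


@dataclass(frozen=True)
class Term:
    label: str
    children: tuple = ()
    ordered: bool = False   # True models [ ] / [[ ]], False models { } / {{ }}
    partial: bool = False   # True models [[ ]] / {{ }}


def matches(query, data):
    """Return True if query term t1 matches data term t2."""
    if isinstance(query, str) or isinstance(data, str):
        return query == data
    if query.label != data.label:
        return False
    # An ordered query term only matches an ordered data term.
    if query.ordered and not data.ordered:
        return False
    n, m = len(query.children), len(data.children)
    if n > m or (not query.partial and n != m):
        return False
    if query.ordered:
        # combinations() keeps the original order of the data subterms,
        # so gaps are allowed (if partial) but reordering is not.
        candidates = combinations(data.children, n)
    else:
        candidates = permutations(data.children, n)
    return any(
        all(matches(q, d) for q, d in zip(query.children, chosen))
        for chosen in candidates
    )


def author(name):
    return Term("author", (name,))


# Assumed data term: an ordered author list with Nordqvist listed first.
data = Term("authors", (author("Sven Nordqvist"), author("Björn Ambrosiani")),
            ordered=True)

wanted = (author("Björn Ambrosiani"), author("Sven Nordqvist"))
ordered_query = Term("authors", wanted, ordered=True, partial=True)
unordered_query = Term("authors", wanted, ordered=False, partial=True)

print(matches(ordered_query, data))    # False: order differs from the data
print(matches(unordered_query, data))  # True: any order is accepted
```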
Incompleteness in Depth: Descendant
Incompleteness in depth is expressed using the descendant construct. A query term of the form desc t (read: “descendant t”) matches with all data terms that contain a subterm that is matched by t at an arbitrary depth (including zero). It is the counterpart to the Kleene star operator of regular path expressions and to XPath’s descendant (in short notation: //) construct (cf. Section 3.3.1).
Example 4.7 (Descendant)
The following query term matches with a text document (like the one introduced in Section 2.4.3), if at arbitrary depth below the root term, the data term representing the text document contains a section term with a title subterm containing the string “Data Terms”, i.e. either a section, a subsection, a subsubsection, etc.
  report {{
    desc section {{
      title {{ "Data Terms" }}
    }}
  }}
Currently, the descendant construct is unrestricted, i.e. it “matches” with any path. Extensions are being considered that allow restrictions to these paths, e.g. using regular expressions over labels, or sets of admissible term labels.
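The unrestricted descendant construct can be modelled by trying the inner query term against the data term itself and against each of its nested subterms. The following Python sketch makes this concrete; `Term`, `Desc`, `subterms`, and `matches` are hypothetical names, the report data is an assumed stand-in for the text document, and only unordered term specifications are covered.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Term:
    label: str
    children: tuple = ()
    partial: bool = False   # True models {{ }}, False models { }


@dataclass(frozen=True)
class Desc:
    inner: object           # models "desc t"


def subterms(data):
    """Yield the data term itself (depth zero) and all nested subterms."""
    yield data
    if isinstance(data, Term):
        for child in data.children:
            yield from subterms(child)


def matches(query, data):
    if isinstance(query, Desc):
        # desc t matches if t matches at any depth, including zero.
        return any(matches(query.inner, s) for s in subterms(data))
    if isinstance(query, str) or isinstance(data, str):
        return query == data
    if query.label != data.label:
        return False
    n, m = len(query.children), len(data.children)
    if n > m or (not query.partial and n != m):
        return False
    return any(
        all(matches(q, d) for q, d in zip(query.children, chosen))
        for chosen in permutations(data.children, n)
    )


# Assumed report: the wanted title sits in a nested (sub)section.
data = Term("report", (
    Term("section", (
        Term("title", ("Introduction",)),
        Term("section", (Term("title", ("Data Terms",)),)),
    )),
))


def query_for(text):
    section = Term("section", (Term("title", (text,), partial=True),),
                   partial=True)
    return Term("report", (Desc(section),), partial=True)


print(matches(query_for("Data Terms"), data))   # True: found in the subsection
print(matches(query_for("Query Terms"), data))  # False: no such title
```

A restricted variant, as mentioned above, would additionally check the labels along the path from the parent term to the matched subterm; the sketch does not attempt this.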
Incompleteness wrt. Optional Subterms: Optional
Terms containing a subterm of the form optional t specify to match the subterm t with a subterm of the data term if possible (and yield variable bindings for the variables in t accordingly); otherwise, the evaluation of the query does not fail, but does not yield any bindings for the variables in t.
Example 4.8
Consider in the following the student database example introduced in Section 2.4.1. The following query term retrieves student names (variable Name) and student ids (variable MatrNr). If both exist, both are returned. If only the name exists, the evaluation does not fail (i.e. the query term still matches), but binds only the variable Name. If there is no name in the data term, the query term fails to match it.
  students {{
    student {{
      name { var Name },
      optional matrnr { var MatrNr }
    }}
  }}
The construct optional is not strictly necessary as the same queries can be expressed by using several query terms instead of only one. However, it is a convenient construct in many practical examples of semistructured databases and XML documents, as the schema languages of such formats often allow optional elements.
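The behaviour of optional with respect to variable bindings can be sketched in Python as follows. The sketch rests on several simplifying assumptions: `Term`, `Var`, `Opt`, `match`, and `match_children` are hypothetical names, every term specification is treated as unordered and partial, only the first set of bindings found is returned, and the student records are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    label: str
    children: tuple = ()


@dataclass(frozen=True)
class Var:
    name: str               # models "var Name"


@dataclass(frozen=True)
class Opt:
    inner: object           # models "optional t" (only used as a subterm)


def match(query, data, env):
    """Return bindings extending env, or None if query does not match data."""
    if isinstance(query, Var):
        if query.name in env and env[query.name] != data:
            return None
        return {**env, query.name: data}
    if isinstance(query, str) or isinstance(data, str):
        return env if query == data else None
    if query.label != data.label:
        return None
    return match_children(query.children, data.children, env)


def match_children(queries, datas, env):
    if not queries:
        return env
    first, rest = queries[0], queries[1:]
    inner = first.inner if isinstance(first, Opt) else first
    for i, d in enumerate(datas):
        bound = match(inner, d, env)
        if bound is not None:
            result = match_children(rest, datas[:i] + datas[i + 1:], bound)
            if result is not None:
                return result
    if isinstance(first, Opt):
        # No data subterm fits: skip the optional subterm without failing.
        return match_children(rest, datas, env)
    return None


query = Term("student", (
    Term("name", (Var("Name"),)),
    Opt(Term("matrnr", (Var("MatrNr"),))),
))

# Invented student records for illustration.
both = Term("student", (Term("name", ("Alice",)), Term("matrnr", ("12345",))))
name_only = Term("student", (Term("name", ("Bob",)),))
no_name = Term("student", (Term("matrnr", ("99999",)),))

print(match(query, both, {}))       # {'Name': 'Alice', 'MatrNr': '12345'}
print(match(query, name_only, {}))  # {'Name': 'Bob'}
print(match(query, no_name, {}))    # None
```

The optional subterm is attempted before it is skipped, which mirrors the “if possible” reading above: bindings for MatrNr are produced whenever a matching matrnr subterm exists.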