
4.3 Query Terms: Patterns for Selecting Data

4.3.2 Term Variables, Label/Namespace Variables and the → Construct

Variables act as “handles” for those subterms of the data term that match with the subterm the variable is “attached to”. If a query term matches with a data term, the variables are bound to the corresponding subterms. They can thus be used to retrieve data from a data term and assemble it in a new structure (with the help of construct terms, Section 4.6 below). As in logic programming, a single variable can occur at several positions in a term. Of course, bindings to such variables have to be consistent for all occurrences, i.e. all occurrences of the same variable must have the same binding.

Matching a query term with a data term yields a set of alternative substitutions, each of which represents a possible binding for the variables in the query term such that the resulting ground instance matches with the data term (see Section 4.4 below). Obviously, the use of unordered and partial term specifications allows several alternative bindings for the variables that all fulfill this requirement.
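The following Python sketch illustrates how matching can yield several alternative substitutions and how repeated occurrences of a variable force consistent bindings. It is not the Xcerpt implementation: the tuple representation of terms, the class Var, the functions match, match_all and substitutions, and the pairs/left/right data are all assumptions made for illustration. Only partial, unordered term specifications (double curly braces) are modelled.

```python
from itertools import permutations

class Var:
    """A term variable, written `var Name` in a query term (illustrative)."""
    def __init__(self, name):
        self.name = name

def match(query, data, subst):
    """Yield each extension of `subst` under which `query` matches `data`."""
    if isinstance(query, Var):
        if query.name not in subst:
            yield {**subst, query.name: data}   # first occurrence: bind
        elif subst[query.name] == data:
            yield subst                         # later occurrence: must agree
        return
    if isinstance(query, str) or isinstance(data, str):
        if query == data:                       # text leaves must be equal
            yield subst
        return
    (q_label, q_children), (d_label, d_children) = query, data
    if q_label != d_label:
        return
    # Partial and unordered: try every injective assignment of query
    # children to data children; unassigned data children are ignored.
    for chosen in permutations(d_children, len(q_children)):
        yield from match_all(q_children, chosen, subst)

def match_all(queries, datas, subst):
    """Match the i-th query against the i-th data term, threading bindings."""
    if not queries:
        yield subst
        return
    for s in match(queries[0], datas[0], subst):
        yield from match_all(queries[1:], datas[1:], s)

def substitutions(query, data):
    """Collect the distinct substitutions, in order of discovery."""
    found = []
    for s in match(query, data, {}):
        if s not in found:
            found.append(s)
    return found

# Hypothetical data term: pairs { pair { left {"a"}, right {"a"} }, ... }
data = ("pairs", [
    ("pair", [("left", ["a"]), ("right", ["a"])]),
    ("pair", [("left", ["a"]), ("right", ["b"])]),
    ("pair", [("left", ["c"]), ("right", ["c"])]),
])

def pair_query(left_var, right_var):
    # pairs {{ pair {{ left {{ var .. }}, right {{ var .. }} }} }}
    return ("pairs", [("pair", [("left", [Var(left_var)]),
                                ("right", [Var(right_var)])])])

free = substitutions(pair_query("X", "Y"), data)   # two distinct variables
same = substitutions(pair_query("X", "X"), data)   # one variable, used twice
print(free)
print(same)
```

With two distinct variables every pair contributes a substitution; when the same variable is used at both positions, the pair whose left and right values differ contributes none, since no consistent binding exists for it.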

In Xcerpt query terms, the following variable notions are used:

Variables without restriction are expressed using the keyword var followed by an identifier (variable name). They can be bound to any subterm in the data term and are thus very similar to the variables in logic programming, i.e. they act as place holders.

Variables with restriction are expressed like a variable without restriction followed by the symbol -> or → (read “as”) and a query term. They can only be bound to subterms of the data term that match with the pattern they are restricted to. Note that variable restrictions are also used in the language XMAS (cf. Section 3.3.4).

Label Variables are, like variables without restrictions, expressed by using the keyword var followed by an identifier, but they occur at the position of a label in a query term. They can be bound to any label of a subterm of the data term that matches with the remaining term specification.

Namespace Variables are similar to label variables. They occur at the position of a namespace prefix in a query term. Namespace variables are always bound to the namespace URI/IRI, not to the namespace prefix.
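A reason for binding namespace variables to the URI rather than to the prefix is that prefixes are merely document-local abbreviations. The following sketch does not involve Xcerpt at all; it uses Python's standard xml.etree.ElementTree module, which exposes element names in the form {URI}local-name, to show two documents that use different prefixes for the same namespace. The documents, the URI http://example.org/books and the helper namespace_uri are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Two hypothetical documents: different prefixes, same namespace URI.
doc_a = '<a:book xmlns:a="http://example.org/books"/>'
doc_b = '<b:book xmlns:b="http://example.org/books"/>'

def namespace_uri(element):
    """Return the namespace URI of an element, or None if it has none."""
    tag = element.tag                     # e.g. "{http://example.org/books}book"
    if tag.startswith("{"):
        return tag[1:].split("}", 1)[0]
    return None

uri_a = namespace_uri(ET.fromstring(doc_a))
uri_b = namespace_uri(ET.fromstring(doc_b))
print(uri_a, uri_b, uri_a == uri_b)
```

A binding to the prefix would distinguish the two elements (a versus b) although they belong to the same namespace; a binding to the URI treats them alike.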

Note that in logic programming, variable restrictions are represented using external constraints. The advantage of constraining a variable to certain subterms within a query term instead of outside the query is to better convey the overall structure of the considered query. Arguably, restricting variables inside the query term more appropriately realises the concept of query patterns.

Example 4.9 (Substitutions)

In the student database (Section 2.4.1), the query term given below binds the variable Name to the student name and the variable Email to the email address. The substitutions listed after it yield ground instances of the query term that match with the data term given in Figure 2.3 on page 34.

students {{
  student {{
    name {{ var Name }},
    email {{ var Email }}
  }}
}}

Substitution σ1: Name = Donald Duck, Email = [email protected]
Substitution σ2: Name = Mickey Mouse, Email = [email protected]
Substitution σ3: Name = Goofy, Email = [email protected]
Substitution σ4: Name = Goofy, Email = [email protected]

Note in particular that Goofy is listed twice, as the data term contains two possible email addresses that can be bound to the variable Email.

Example 4.10 (Pattern Restrictions)

The following query terms for the bib.xml database of Section 2.4.2 illustrate the difference between variables without and with pattern restrictions.


bib {{
  book {{
    var X,
    authors {{ var AUTHOR }}
  }}
}}

In this query term, the occurrence of the variable X is unrestricted. Thus, the variable X might be bound to any subterm of the book element (besides authors), e.g. to price or title.

bib {{
  book {{
    var X → title {{ }},
    authors {{ var AUTHOR }}
  }}
}}

In this query term, the occurrence of the variable X is restricted to such subterms that are matched by the query term title {{ }}. Thus, the variable X can only be bound to the title element.
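The difference between the two query terms can be sketched in Python as follows. This is an illustration only and not the Xcerpt implementation: the tuple representation of terms, the class Var with its restriction argument, the functions match and match_all, and the contents of the book element (the actual bib.xml data of Section 2.4.2 is not reproduced here) are assumptions. A restricted variable is handled by first matching the restriction pattern against the candidate subterm and binding the variable only if that succeeds.

```python
from itertools import permutations

class Var:
    """`var Name`, or `var Name -> pattern` when a restriction is given."""
    def __init__(self, name, restriction=None):
        self.name = name
        self.restriction = restriction

def match(query, data, subst):
    """Yield each extension of `subst` under which `query` matches `data`."""
    if isinstance(query, Var):
        if query.restriction is None:
            candidates = [subst]
        else:  # the subterm must first match the restriction pattern
            candidates = match(query.restriction, data, subst)
        for s in candidates:
            if query.name not in s:
                yield {**s, query.name: data}
            elif s[query.name] == data:
                yield s
        return
    if isinstance(query, str) or isinstance(data, str):
        if query == data:
            yield subst
        return
    (q_label, q_children), (d_label, d_children) = query, data
    if q_label != d_label:
        return
    # Partial and unordered: injective assignment of query to data children.
    for chosen in permutations(d_children, len(q_children)):
        yield from match_all(q_children, chosen, subst)

def match_all(queries, datas, subst):
    if not queries:
        yield subst
        return
    for s in match(queries[0], datas[0], subst):
        yield from match_all(queries[1:], datas[1:], s)

# Hypothetical stand-in for one book of the bib.xml database.
bib = ("bib", [("book", [
    ("title", ["A Hypothetical Title"]),
    ("authors", [("author", ["A. Author"])]),
    ("price", ["10.00"]),
])])

def book_query(x):
    # bib {{ book {{ <x>, authors {{ var AUTHOR }} }} }}
    return ("bib", [("book", [x, ("authors", [Var("AUTHOR")])])])

unrestricted = list(match(book_query(Var("X")), bib, {}))
restricted = list(match(book_query(Var("X", ("title", []))), bib, {}))

print([s["X"][0] for s in unrestricted])  # labels of the subterms bound to X
print([s["X"][0] for s in restricted])
```

In this sketch the unrestricted variable is bound to the title and to the price element, but not to authors, because the only authors element is already needed for the second subterm of the query. The restricted variable is bound to the title element only.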

The use of the keyword var to introduce a variable is not strictly necessary. It is often possible to determine from the context whether a term is a variable or not. In particular, extensions of the Xcerpt syntax that allow variables to be declared in a context block are under investigation. However, using the keyword var simplifies the syntax, in particular for the programmer, as variables can be identified easily without having to look at the context.

Label variables are useful to retrieve structural information that is unknown in advance, e.g. when transforming an XML document into an HTML representation displaying the structure of the XML document (as e.g. in the implementation of the visual language visXcerpt [14, 16, 15]).

Example 4.11 (Label Variables)

Consider the student database of Figure 2.3. The following query term retrieves the label of the element containing the string “Goofy” in the variable X:

students {{
  student {{
    var X {{ "Goofy" }},
  }}
}}
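How a label variable is bound can be sketched in Python as follows. Again, this is an illustration under assumptions and not the Xcerpt implementation: the tuple representation of terms, the class Var, the functions match and match_all, and the student data (in particular the example.org email addresses, which stand in for the data of Figure 2.3) are hypothetical. For brevity, the sketch supports variables at label positions only.

```python
from itertools import permutations

class Var:
    """A variable; here it is only used at the position of a label."""
    def __init__(self, name):
        self.name = name

def match(query, data, subst):
    """Yield each extension of `subst` under which `query` matches `data`."""
    if isinstance(query, str) or isinstance(data, str):
        if query == data:                  # text leaves must be equal
            yield subst
        return
    (q_label, q_children), (d_label, d_children) = query, data
    if isinstance(q_label, Var):
        # Label variable: bind it to the label of the data term, or check
        # that an existing binding agrees with that label.
        if subst.get(q_label.name, d_label) != d_label:
            return
        subst = {**subst, q_label.name: d_label}
    elif q_label != d_label:
        return
    # Partial and unordered: injective assignment of query to data children.
    for chosen in permutations(d_children, len(q_children)):
        yield from match_all(q_children, chosen, subst)

def match_all(queries, datas, subst):
    if not queries:
        yield subst
        return
    for s in match(queries[0], datas[0], subst):
        yield from match_all(queries[1:], datas[1:], s)

# Hypothetical stand-in for the student database.
students = ("students", [
    ("student", [("name", ["Donald Duck"]),
                 ("email", ["donald@example.org"])]),
    ("student", [("name", ["Goofy"]),
                 ("email", ["goofy@example.org"])]),
])

# students {{ student {{ var X {{ "Goofy" }} }} }}
query = ("students", [("student", [(Var("X"), ["Goofy"])])])

results = list(match(query, students, {}))
print(results)
```

The binding of X is only kept if the remaining term specification, here the content "Goofy", also matches; for the assumed data this leaves the single binding of X to the label name.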