
4 A Model for Documents and Constraints


e : Expr ::= e1 + e2 | e1 − e2 | e1 and e2 | e1 or e2 | not e | e1 = e2 | s | n | p | function(e)

We will now give a meaning to this abstract syntax using a denotational semantics and the definition of DOM trees given in the preceding section.

4.4 XPath Semantics

This section introduces a denotational semantics for XPath that defines how paths and expressions are evaluated over a DOM tree. We will take as our input an expression, referred to as Expr in the abstract syntax, a “context node” that defines where the evaluation starts, and a variable context that contains bindings of variable names to previously retrieved nodes. We will evaluate the expression to a member of the set Result, which is the set of possible results of evaluating an XPath expression:

Definition 4.11. Result = String ∪ Number ∪ Boolean ∪ NodeSet.

It will be necessary to convert results of path evaluations to specific data types. For example, when evaluating a predicate on a particular step we need to convert the result of evaluating an expression to a boolean in order to determine whether to include the step’s node in the result. These conversions will be performed by the functions toString, toBoolean and toNumber; we have no use for a function toNodeSet.

Definition 4.12. Conversion Functions

toString : Result → String
toNumber : Result → Number
toBoolean : Result → Boolean

We will define toString informally as follows: strings are converted to strings using the identity function; to convert a number the function concatenates the string representation of each digit in the number and inserts a decimal point as appropriate if the fractional part is non-zero, e.g. 5 becomes "5" and 3.2 becomes "3.2"; booleans are converted as follows: ⊤ becomes "true" and ⊥ becomes "false"; and finally, node sets are converted by concatenating the result of applying value (the string value, Definition 4.9) to each node in the set.

toNumber is defined as follows: strings that are representations of numbers are converted to the numbers they represent, and all other strings are converted to 0. Numbers are converted using the identity function; booleans are converted as follows: ⊤ becomes 1 and ⊥ becomes 0; finally, for a node set n, toNumber(n) = |n|, that is, we return the cardinality of the set.

Finally, we will define toBoolean: for any string s, toBoolean(s) = (s = “true”); for any number n, toBoolean(n) = (n ≠ 0), therefore toBoolean(5) = ⊤ but toBoolean(0) = ⊥; booleans are converted into booleans using the identity function; and for any node set n, toBoolean(n) = (n ≠ ∅), that is, the result is true if and only if the set is non-empty. This concludes the definition of results and conversion functions. We now need to introduce variables.
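Before moving on, the three conversion functions of Definition 4.12 can be sketched in Python. This is only an illustration of the informal definitions above, not part of the formal model: the Node class, the snake_case names and the choice to model a node set as a tuple in document order are assumptions made for the example, and Python's float parser stands in for "strings that are representations of numbers".

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for a DOM node; only its string value is modelled."""
    value: str


# Assumption: a node set is a tuple of nodes kept in document order.
NodeSet = Tuple[Node, ...]
Result = Union[str, int, float, bool, NodeSet]


def to_string(r: Result) -> str:
    if isinstance(r, bool):  # test bool first: bool is a subclass of int in Python
        return "true" if r else "false"
    if isinstance(r, str):
        return r
    if isinstance(r, (int, float)):
        # Print the decimal point only when the fractional part is non-zero.
        return str(int(r)) if float(r).is_integer() else str(r)
    # Node set: concatenate the string value of each node.
    return "".join(node.value for node in r)


def to_number(r: Result) -> Union[int, float]:
    if isinstance(r, bool):
        return 1 if r else 0
    if isinstance(r, (int, float)):
        return r
    if isinstance(r, str):
        try:
            return float(r)
        except ValueError:
            return 0  # strings that do not represent a number become 0
    return len(r)  # node set: its cardinality


def to_boolean(r: Result) -> bool:
    if isinstance(r, bool):
        return r
    if isinstance(r, str):
        return r == "true"
    if isinstance(r, (int, float)):
        return r != 0
    return len(r) > 0  # node set: true iff non-empty
```

Note that under these definitions a node set converts to a number by its size, not by its content, so a set holding one node with string value "42" converts to 1.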

Paths can be relative to variables, where a variable is an identifier for a node set. For example $x/Product would select the union of all Product elements that are children of the nodes in the set of nodes identified by x. In order to handle variable assignment and lookup, we need to define a binding context and the two functions bind and lookup.

Definition 4.13. A binding context is a function ρ : BindingContext = Variable → NodeSet that maps a variable name to a set of nodes. For a given variable name v, ρ(v) returns the set of nodes to which v has been bound or ∅ if it has not been bound. Since ρ is a function, duplicate bindings for the same variable name are not possible.

Definition 4.14. bind is a function that introduces a new variable binding into a binding context. It is a higher-order function that takes an existing binding context as a parameter:

bind : Variable × NodeSet × BindingContext → BindingContext
bind(v, n, ρ) = ρ1, where ρ1(v1) = if (v1 = v) then n else ρ(v1)

Definition 4.15. lookup is used to look up variables in a binding context. We introduce this simply to make the semantics more readable:

lookup : Variable × BindingContext → NodeSet
lookup(v, ρ) = ρ(v)
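Definitions 4.13 to 4.15 translate almost literally into a language with first-class functions. The following Python sketch is illustrative only: nodes are stood in for by string labels, and empty_context is a hypothetical name for the context in which no variable has been bound.

```python
from typing import Callable, FrozenSet

Variable = str
NodeSet = FrozenSet[str]  # assumption: nodes are represented by string labels
BindingContext = Callable[[Variable], NodeSet]


def empty_context(v: Variable) -> NodeSet:
    """The context with no bindings: every variable maps to the empty set."""
    return frozenset()


def bind(v: Variable, n: NodeSet, rho: BindingContext) -> BindingContext:
    """Return a new context that maps v to n and defers to rho otherwise."""
    def rho1(v1: Variable) -> NodeSet:
        return n if v1 == v else rho(v1)
    return rho1


def lookup(v: Variable, rho: BindingContext) -> NodeSet:
    """Look up v in rho; unbound variables yield the empty set."""
    return rho(v)
```

Because bind builds a new function and never modifies rho, rebinding a name shadows the earlier binding in the new context while the old context remains usable unchanged.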

We can now define the main evaluation function S, which takes a path, a context node and a binding context and returns a node set that contains the result of evaluating the path. We will also define the function E, which evaluates an expression given a context node and binding context and returns a Result. The signatures of these functions are:

S : Path → Node → BindingContext → NodeSet
E : Expr → Node → BindingContext → Result

When defining the functions we will use the semantic notation S[[p]]n,ρ where p is the syntax construct being defined, and the subscript parameters represent the environment necessary for the definition.

The following examples illustrate the definition that follows below. S[[//p]]n,ρ selects all nodes that match the path p anywhere in the tree by making all descendants of the root element the context for evaluating p. The path p1 | p2 is evaluated by evaluating p1 and p2 separately and then taking the union of the resulting node sets. p1/p2 is a straightforward child axis step: p1 is evaluated over the context node, each node in the result then becomes the new context node in turn, and p2 is evaluated from it. S[[p[e]]]n,ρ selects a set of nodes using the path p and then uses the function E[[e]], which we will define next, to filter the resulting node set: only nodes for which the expression e is true remain in the set. S[[s]]n,ρ is a terminal point for the recursion of S and selects all children of the context node whose name matches the string s. S[[@∗]]n,ρ selects all attribute nodes of the context node. The next few definitions provide navigation over the sibling and ancestor axes, and finally S[[.]]n,ρ selects the context node and S[[..]]n,ρ selects the parent node of the context node.
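Several of the cases described above can be sketched as a small recursive interpreter. The Python below is a hedged illustration, not the formal definition: the Node class, the tuple encoding of paths ("name", "step", "union", "filter", "anywhere", "self", "parent") and the helper names are assumptions made for the example. Predicates are passed as Python callables standing in for toBoolean(E[[e]]), the //p case includes the root itself among the context nodes (an assumption), and the attribute and sibling axes are omitted.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # identity-based equality, so distinct nodes never compare equal
class Node:
    name: str
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def add(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        return child


def unique(nodes):
    """Drop duplicate nodes while keeping first-seen order (set semantics)."""
    seen, out = set(), []
    for x in nodes:
        if id(x) not in seen:
            seen.add(id(x))
            out.append(x)
    return out


def descendants_or_self(n):
    out = [n]
    for c in n.children:
        out.extend(descendants_or_self(c))
    return out


def S(p, n, rho):
    """Evaluate path p from context node n under binding context rho."""
    kind = p[0]
    if kind == "name":      # s: children of n whose name matches
        return [c for c in n.children if c.name == p[1]]
    if kind == "self":      # .
        return [n]
    if kind == "parent":    # ..
        return [n.parent] if n.parent is not None else []
    if kind == "union":     # p1 | p2
        return unique(S(p[1], n, rho) + S(p[2], n, rho))
    if kind == "step":      # p1/p2: each result of p1 becomes the context for p2
        return unique([m for c in S(p[1], n, rho) for m in S(p[2], c, rho)])
    if kind == "filter":    # p[e]: keep nodes for which the predicate holds
        return [m for m in S(p[1], n, rho) if p[2](m, rho)]
    if kind == "anywhere":  # //p: evaluate p from every node below the root
        root = n
        while root.parent is not None:
            root = root.parent
        return unique([m for d in descendants_or_self(root) for m in S(p[1], d, rho)])
    raise ValueError(f"unsupported path form: {kind!r}")


# A small hypothetical document: Catalogue(Product(Price), Product, Section(Product))
root = Node("Catalogue")
a = root.add(Node("Product"))
price = a.add(Node("Price"))
b = root.add(Node("Product"))
section = root.add(Node("Section"))
c = section.add(Node("Product"))

has_price = lambda m, rho: len(S(("name", "Price"), m, rho)) > 0
priced = S(("filter", ("name", "Product"), has_price), root, {})  # only the first Product
```

Representing node sets as duplicate-free lists keeps document order visible in the results; a faithful implementation of the set-based definition could equally use unordered sets.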


Next we will define the semantics for E, which evaluates expressions and returns a Result. This function is used by S to evaluate predicates and we will also make use of it directly in later sections.

E defines an interpretation for a number of different constructs: the arithmetic operators + and − are handled by evaluating the parameters, turning them into numbers, and then applying the usual arithmetic functions; the boolean operators take on the usual boolean logic interpretations; equality comparison is handled by converting both parameters into strings and then comparing the strings for equality; and expressions that are simple paths are evaluated by using S to select a node set.

Function invocation using function(e) is not refined any further at this point. We expect that function is a function with the signature function : Expr → Result, but for reasons of readability, and because it is not necessary for our discussion, we do not fully introduce the concept of a function context here. We will simply assume that all functions are side-effect free (that is, they do not change the document) and that they terminate. Many of the functions in XPath take more than one parameter, but they all return a Result. The XPath function library includes:

• String manipulation, e.g. substring : (Expr, Number) → String, which returns the substring of a string (or another result that has been converted to a string) starting from the given index.

• Boolean functions, e.g. true() : Boolean, which always returns ⊤.

• Node set functions, e.g. count : NodeSet → Number, which returns the cardinality of a node set.

• Number functions, e.g. round : Expr → Number, which converts the result of evaluating the expression to a number and rounds it to the nearest integer.

Definition 4.17. Expression Evaluation

E : Expr → Node → BindingContext → Result

E[[e1 + e2]]n,ρ = toNumber(E[[e1]]n,ρ) + toNumber(E[[e2]]n,ρ)
E[[e1 − e2]]n,ρ = toNumber(E[[e1]]n,ρ) − toNumber(E[[e2]]n,ρ)
E[[e1 and e2]]n,ρ = toBoolean(E[[e1]]n,ρ) ∧ toBoolean(E[[e2]]n,ρ)
E[[e1 or e2]]n,ρ = toBoolean(E[[e1]]n,ρ) ∨ toBoolean(E[[e2]]n,ρ)
E[[not e]]n,ρ = ¬toBoolean(E[[e]]n,ρ)
E[[e1 = e2]]n,ρ = toString(E[[e1]]n,ρ) = toString(E[[e2]]n,ρ)
E[[s]]n,ρ = s
E[[n]]n,ρ = n
E[[p]]n,ρ = S[[p]]n,ρ
E[[function(e)]]n,ρ = function(e)
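The equations of Definition 4.17 map directly onto a recursive evaluator. The Python sketch below is illustrative and rests on several assumptions: expressions are encoded as tagged tuples, a node set is a list of node string values, path evaluation is delegated to a caller-supplied select function standing in for S, and a function call applies a Python callable to the unevaluated argument expression, mirroring function : Expr → Result.

```python
def to_string(r):
    if isinstance(r, bool):  # test bool first: bool is a subclass of int in Python
        return "true" if r else "false"
    if isinstance(r, str):
        return r
    if isinstance(r, (int, float)):
        return str(int(r)) if float(r).is_integer() else str(r)
    return "".join(r)  # node set: concatenated string values


def to_number(r):
    if isinstance(r, bool):
        return 1 if r else 0
    if isinstance(r, (int, float)):
        return r
    if isinstance(r, str):
        try:
            return float(r)
        except ValueError:
            return 0
    return len(r)  # node set: cardinality


def to_boolean(r):
    if isinstance(r, bool):
        return r
    if isinstance(r, str):
        return r == "true"
    if isinstance(r, (int, float)):
        return r != 0
    return len(r) > 0  # node set: non-empty


def E(e, n, rho, select):
    """Evaluate expression e at context node n; select(p, n, rho) stands in for S."""
    op = e[0]
    ev = lambda sub: E(sub, n, rho, select)
    if op == "+":
        return to_number(ev(e[1])) + to_number(ev(e[2]))
    if op == "-":
        return to_number(ev(e[1])) - to_number(ev(e[2]))
    if op == "and":
        return to_boolean(ev(e[1])) and to_boolean(ev(e[2]))
    if op == "or":
        return to_boolean(ev(e[1])) or to_boolean(ev(e[2]))
    if op == "not":
        return not to_boolean(ev(e[1]))
    if op == "=":  # equality compares the string forms of both operands
        return to_string(ev(e[1])) == to_string(ev(e[2]))
    if op in ("str", "num"):  # literals evaluate to themselves
        return e[1]
    if op == "path":
        return select(e[1], n, rho)
    if op == "call":  # the function receives the unevaluated expression
        return e[1](e[2])
    raise ValueError(f"unsupported expression form: {op!r}")


# Hypothetical stand-in for S: the path "Price" yields two nodes, anything else none.
def select(p, n, rho):
    return ["9", "10"] if p == "Price" else []


total = E(("+", ("str", "3"), ("path", "Price")), None, {}, select)  # 3 + |{9, 10}|
```

One consequence of comparing string forms is that the string "1.0" and the number 1 are not equal under these definitions, since the number prints as "1".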

This concludes our definition of XPath and its associated evaluation semantics. We have defined the two types Path, which represents a path in a document, and Expr, which combines paths, literals, numbers, boolean operators, arithmetic operators and functions into a simple expression language. The two semantic functions S, which selects a set of nodes given a Path, and E, which evaluates an expression, define the meaning for these constructs.

In the following section we will extend the definitions of S and E to handle path evaluation over a forest of DOM trees. We will then use these extended definitions in our constraint language semantics.