
4.6 Construct Terms: Patterns for Constructing Data

4.6.2 Grouping and Sorting: all and some

It is often desirable to collect all bindings for a variable in a single answer term. The grouping constructs all and some serve this purpose:

• all groups all possible instances of the enclosed subterms, resulting from different variable bindings, as children of the enclosing term. At least one instance has to exist, and the number of instances must always be finite (otherwise the program does not terminate).

• some groups, non-deterministically, some of the possible instances of the enclosed subterms resulting from variable bindings as children of the enclosing term. The construct some is quantified by a number that gives the maximum number of instances to use. At least one instance has to exist.

The requirement that at least one instance must exist in both grouping constructs may seem unintuitive. However, a construct term can only be evaluated if the rule it is part of “fires”, i.e. if the query part succeeds and thus yields at least one substitution for the variables occurring in the query. If this behaviour is not desired, the grouping constructs can be combined with optional (see below).
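The Python sketch below is one way to model the behaviour of all and some described above. It is not taken from the Xcerpt prototype, which is written in Haskell. It assumes that answer substitutions are dictionaries mapping variable names to bindings and that an instance of the enclosed subterm is identified by the bindings of its variables. It also assumes that some n simply takes the first n distinct instances, since any choice of at most n instances would be a valid non-deterministic outcome. The names group_all, group_some and distinct_instances are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding: one dict per answer substitution (variable -> binding).
Substitution = Dict[str, str]

# Title/author combinations as shown in the result of Example 4.26
# (authors abbreviated to "Last, First" strings for brevity).
SUBSTITUTIONS: List[Substitution] = [
    {"TITLE": "Vikinga Blot", "AUTHOR": "Ingelman-Sundberg, Catharina"},
    {"TITLE": "Boken Om Vikingarna", "AUTHOR": "Ingelman-Sundberg, Catharina"},
    {"TITLE": "Folket i Birka på Vikingarnas Tid", "AUTHOR": "Wahl, Mats"},
    {"TITLE": "Folket i Birka på Vikingarnas Tid", "AUTHOR": "Nordqvist, Sven"},
    {"TITLE": "Folket i Birka på Vikingarnas Tid", "AUTHOR": "Ambrosiani, Björn"},
]


def distinct_instances(subs: List[Substitution], variables: List[str]) -> List[Tuple[str, ...]]:
    """One instance per distinct combination of bindings, in first-occurrence order."""
    instances: List[Tuple[str, ...]] = []
    for sub in subs:
        instance = tuple(sub[v] for v in variables)
        if instance not in instances:
            instances.append(instance)
    return instances


def group_all(subs: List[Substitution], variables: List[str]) -> List[Tuple[str, ...]]:
    """Model of 'all': every distinct instance; at least one must exist."""
    instances = distinct_instances(subs, variables)
    if not instances:
        # A rule only fires if its query yields at least one substitution.
        raise ValueError("grouping construct requires at least one instance")
    return instances


def group_some(n: int, subs: List[Substitution], variables: List[str]) -> List[Tuple[str, ...]]:
    """Model of 'some n': at most n instances; this sketch just takes the first n."""
    return group_all(subs, variables)[:n]


print(len(group_all(SUBSTITUTIONS, ["TITLE", "AUTHOR"])))  # 5
print(len(group_some(2, SUBSTITUTIONS, ["TITLE"])))         # 2
```

Taking the first n instances is only one admissible choice for some. A real implementation could return any subset of at most n instances.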

Example 4.26 (Grouping Constructs)

Consider again the substitutions of Example 4.25. The following construct term creates a list of result subterms (one for each title/author combination from the substitutions) below a results term, using the all construct to collect all instances:

results {
   all result { var TITLE, var AUTHOR }
}

The result of applying the substitutions to this construct term might be the following data term (compare with the set of data terms from Example 4.25):


results {
   result {
      title { "Vikinga Blot" },
      author { last { "Ingelman-Sundberg" }, first { "Catharina" } } },
   result {
      title { "Boken Om Vikingarna" },
      author { last { "Ingelman-Sundberg" }, first { "Catharina" } } },
   result {
      title { "Folket i Birka på Vikingarnas Tid" },
      author { last { "Wahl" }, first { "Mats" } } },
   result {
      title { "Folket i Birka på Vikingarnas Tid" },
      author { last { "Nordqvist" }, first { "Sven" } } },
   result {
      title { "Folket i Birka på Vikingarnas Tid" },
      author { last { "Ambrosiani" }, first { "Björn" } } }
}

Formally, all t or some n t denote the grouping of all or some instances of t obtained from all possible bindings of the variables that are free in the term t. Subterms of t that again have the form all t′ or some n′ t′ are recursively evaluated in the same manner (see below). A variable is free in a (sub)term t if it (1) occurs in t, and (2) is not in the scope of another, nested grouping construct. E.g. in the term

results {
   all result { var TITLE, var AUTHOR }
}

neither of the variables TITLE and AUTHOR is free, since both are in the scope of an all construct. In the term

results {
   result { all var TITLE, var AUTHOR }
}

the variable AUTHOR is free, whereas the variable TITLE is not. A variable is said to be free for a grouping construct if it is free in the term enclosed by the grouping construct. E.g. in the term all t, all variables that are free in t are free for the outermost all. All free variables in a construct term need to have the same binding in each of the substitutions that are grouped together within one instance of that term.
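To make the notion of free variables concrete, here is a small Python sketch. It assumes a hypothetical encoding of construct terms as nested tuples: ("var", name) for a variable, ("elem", label, children) for a labelled term, and ("all", body) or ("some", n, body) for grouping constructs. It only determines which variables are free. It does not evaluate anything, and it ignores explicit group by clauses.

```python
from typing import Any, Set, Tuple

# Hypothetical tuple encoding of construct terms (see lead-in).
Term = Tuple[Any, ...]


def free_vars(term: Term) -> Set[str]:
    """Variables occurring in term that are not in the scope of a nested grouping construct."""
    kind = term[0]
    if kind == "var":
        return {term[1]}
    if kind in ("all", "some"):
        # Everything below a grouping construct is bound by that construct.
        return set()
    if kind == "elem":
        result: Set[str] = set()
        for child in term[2]:
            result |= free_vars(child)
        return result
    raise ValueError(f"unknown term kind: {kind!r}")


def free_for(grouping: Term) -> Set[str]:
    """Variables free for a grouping construct, i.e. free in the term it encloses."""
    body = grouping[1] if grouping[0] == "all" else grouping[2]
    return free_vars(body)


# results { all result { var TITLE, var AUTHOR } }
T1 = ("elem", "results", [("all", ("elem", "result", [("var", "TITLE"), ("var", "AUTHOR")]))])
# results { result { all var TITLE, var AUTHOR } }
T2 = ("elem", "results", [("elem", "result", [("all", ("var", "TITLE")), ("var", "AUTHOR")])])

print(free_vars(T1))       # set()
print(free_for(T1[2][0]))  # the set containing TITLE and AUTHOR
print(free_vars(T2))       # {'AUTHOR'}
```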

Example 4.27

Consider a slightly modified variant of the previous construct term. Note that only the variable AUTHOR is in the scope of the all construct, while the variable TITLE is free.

result { var TITLE, all var AUTHOR }

The result of applying the set of answer substitutions of Example 4.25 to this construct term is the following set of data terms:

CHAPTER 4. XCERPT

result {
   title { "Vikinga Blot" },
   author { last { "Ingelman-Sundberg" }, first { "Catharina" } } }

result {
   title { "Boken Om Vikingarna" },
   author { last { "Ingelman-Sundberg" }, first { "Catharina" } } }

result {
   title { "Folket i Birka på Vikingarnas Tid" },
   author { last { "Wahl" }, first { "Mats" } },
   author { last { "Nordqvist" }, first { "Sven" } },
   author { last { "Ambrosiani" }, first { "Björn" } } }

Note that each of the three resulting data terms uses only one binding for the variable TITLE of the construct term, but possibly groups several bindings of the variable AUTHOR. In each instance (i.e. data term), the grouping construct groups together substitutions that have the same binding for TITLE. As there exists only one substitution for each of the titles “Vikinga Blot” and “Boken Om Vikingarna”, the grouping construct only groups a single substitution in the first two data terms. In the third data term, three substitutions are grouped (each having the same binding for TITLE, but a different binding for AUTHOR).
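The evaluation of result { var TITLE, all var AUTHOR } can be sketched in Python as follows. The sketch assumes that substitutions are dictionaries and abbreviates authors to “Last, First” strings. The function name construct_per_title is hypothetical. It partitions the substitutions by the binding of the free variable TITLE and, inside each partition, collects the distinct bindings of AUTHOR in first-occurrence order.

```python
from typing import Dict, List, Tuple

Substitution = Dict[str, str]

# Title/author combinations as shown in the result of Example 4.26.
SUBSTITUTIONS: List[Substitution] = [
    {"TITLE": "Vikinga Blot", "AUTHOR": "Ingelman-Sundberg, Catharina"},
    {"TITLE": "Boken Om Vikingarna", "AUTHOR": "Ingelman-Sundberg, Catharina"},
    {"TITLE": "Folket i Birka på Vikingarnas Tid", "AUTHOR": "Wahl, Mats"},
    {"TITLE": "Folket i Birka på Vikingarnas Tid", "AUTHOR": "Nordqvist, Sven"},
    {"TITLE": "Folket i Birka på Vikingarnas Tid", "AUTHOR": "Ambrosiani, Björn"},
]


def construct_per_title(subs: List[Substitution]) -> List[Tuple[str, List[str]]]:
    """One result instance per distinct TITLE; 'all var AUTHOR' groups within it."""
    groups: Dict[str, List[str]] = {}  # dicts keep insertion order (Python 3.7+)
    for sub in subs:
        authors = groups.setdefault(sub["TITLE"], [])
        if sub["AUTHOR"] not in authors:
            authors.append(sub["AUTHOR"])
    return list(groups.items())


for title, authors in construct_per_title(SUBSTITUTIONS):
    print(title, "->", authors)
```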

The grouping constructs all and some are similar to the so-called collection constructs {.} and [.] in XMAS [72] and to the grouping construct {.} in XML-RL [70].

Nesting of Grouping Constructs

Grouping constructs may be nested to perform more complex restructuring tasks. Recall that a term of the form all t collects all instances of t with different bindings for the free variables in t. If t contains nested grouping constructs, each instance of t is further grouped according to the nested grouping constructs. For example, the construct term

results {
   all result {
      all var TITLE,
      var AUTHOR
   }
}

creates an instance of the subterm result for each binding of the variable AUTHOR (i.e. the variable that is free for the outer all). In each instance, the inner all collects all bindings of the variable TITLE that are part of substitutions with the same binding for AUTHOR. Thus, the construct term creates a list of book titles for each author and groups the result subterms below a results term. Likewise, the construct term

results {
   all result {
      var TITLE,
      all var AUTHOR
   }
}

lists all authors for each book title. Intuitively, nested grouping constructs are similar to nested iteration constructs in imperative languages (like for or while loops), where the inner loop performs a complete run for each iteration of the outer loop. Note, however, that nested grouping constructs do not compute the “cross-product”, but instead respect the different answer substitutions: in the example above, every result element contains a book title and only the authors of that book, whereas the cross-product would also list, for each result, the authors of other books. If the cross-product is desired, the query (or query term) has to be modified so that it selects titles and authors independently.
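The difference between nested grouping and the cross-product can be illustrated with a short Python sketch. It uses the title/author combinations listed in Example 4.26 as dictionary-shaped substitutions, with authors abbreviated to “Last, First”. The function titles_per_author mirrors the first nested construct term above (outer all over AUTHOR, inner all over TITLE). The function cross_product shows what would result if titles and authors were combined independently. Both function names are hypothetical.

```python
from typing import Dict, List

Substitution = Dict[str, str]

SUBSTITUTIONS: List[Substitution] = [
    {"TITLE": "Vikinga Blot", "AUTHOR": "Ingelman-Sundberg, Catharina"},
    {"TITLE": "Boken Om Vikingarna", "AUTHOR": "Ingelman-Sundberg, Catharina"},
    {"TITLE": "Folket i Birka på Vikingarnas Tid", "AUTHOR": "Wahl, Mats"},
    {"TITLE": "Folket i Birka på Vikingarnas Tid", "AUTHOR": "Nordqvist, Sven"},
    {"TITLE": "Folket i Birka på Vikingarnas Tid", "AUTHOR": "Ambrosiani, Björn"},
]


def titles_per_author(subs: List[Substitution]) -> Dict[str, List[str]]:
    """Nested grouping: an author only gets titles from substitutions binding that author."""
    result: Dict[str, List[str]] = {}
    for sub in subs:
        titles = result.setdefault(sub["AUTHOR"], [])
        if sub["TITLE"] not in titles:
            titles.append(sub["TITLE"])
    return result


def cross_product(subs: List[Substitution]) -> Dict[str, List[str]]:
    """Cross-product: every author is paired with every title, ignoring substitutions."""
    authors = list(dict.fromkeys(sub["AUTHOR"] for sub in subs))
    titles = list(dict.fromkeys(sub["TITLE"] for sub in subs))
    return {author: list(titles) for author in authors}


print(titles_per_author(SUBSTITUTIONS)["Wahl, Mats"])   # one title
print(len(cross_product(SUBSTITUTIONS)["Wahl, Mats"]))  # 3
```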

Explicit Grouping: group by

In many cases, it is desirable to group by variables whose values should not appear in the result and which are therefore not part of the subterm enclosed by a grouping construct. For example, a construct term might group resulting instances by the position of a row in an HTML table without including this position (i.e. the integer number) in the result. While this could be achieved with several rules (one for creating the result and one for filtering out superfluous parts), such a solution is very cumbersome. For this reason, the grouping constructs all and some may be accompanied by a group by clause containing the (additional) variables by which the instances are grouped. Such clauses have the form

all <subterm> group by { <variables> }

or

some <n> <subterm> group by { <variables> }

where <n> is the maximum number of instances for some, <subterm> is the subterm of which instances are created, and <variables> is a comma-separated list of variables. All these variables are considered to be among the free variables of the subterm enclosed by the grouping construct and are thus used for grouping, regardless of whether they appear in <subterm> or not.

Example 4.28 (Explicit Grouping)

Consider an HTML table whose cells contain arbitrary values. The following query term retrieves all cell values, together with their row and column numbers:

desc table {{
   position var Row tr {{
      position var Col td { var Value }
   }}
}}

Now assume that the table should be “transposed”, i.e. rows and columns are exchanged. The following construct term creates such a transposed table. Since the positions are necessary for grouping but should not be included in the resulting data term, it uses group by for this purpose:

table [
   all tr [
      all td [ var Value ] group by { var Row }
   ] group by { var Col }
]

The construct term is evaluated as follows: for each different binding of Col (Col, added by the group by clause, is the only variable free for the outer all), an instance of tr [ ... ] is created. Within each instance, the inner all creates an instance of td [ ... ] for each different binding of Row (within the set of substitutions having the same binding for Col).
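The two-level grouping of Example 4.28 can be mimicked in Python as sketched below. The sketch assumes the table is given as a list of rows, that positions are numbered from 1, and that groups are emitted in first-occurrence order (Xcerpt itself does not prescribe an order here). The names query_cells, group_by and transpose are hypothetical. Row and Col act only as grouping keys and never appear in the output.

```python
from typing import Any, Dict, List

Substitution = Dict[str, Any]


def query_cells(table: List[List[Any]]) -> List[Substitution]:
    """Roughly what the query term of Example 4.28 yields: one substitution per cell."""
    return [
        {"Row": r, "Col": c, "Value": value}
        for r, row in enumerate(table, start=1)
        for c, value in enumerate(row, start=1)
    ]


def group_by(subs: List[Substitution], var: str) -> List[List[Substitution]]:
    """Partition substitutions by the binding of var, keeping first-occurrence order."""
    groups: Dict[Any, List[Substitution]] = {}
    for sub in subs:
        groups.setdefault(sub[var], []).append(sub)
    return list(groups.values())


def transpose(subs: List[Substitution]) -> List[List[Any]]:
    # Outer 'all tr [...] group by { var Col }': one new row per original column.
    # Inner 'all td [ var Value ] group by { var Row }': one cell per original row.
    return [
        [cell_group[0]["Value"] for cell_group in group_by(column, "Row")]
        for column in group_by(subs, "Col")
    ]


print(transpose(query_cells([[1, 2, 3], [4, 5, 6]])))  # [[1, 4], [2, 5], [3, 6]]
```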

Sorting: order by

The grouping constructs all and some create sequences of subterms in arbitrary order (although they should try to return results in the same order in which the corresponding subterms appear in the original sources, if possible). In order to sort the resulting sequence according to the bindings of certain variables, the grouping constructs all and some may be augmented by a sorting specification. Sorting specifications are very similar to explicit grouping and have the form

all <subterm> order by (<comparison>) [ <variables> ]

or

some <n> <subterm> order by (<comparison>) [ <variables> ]

where <n> is the maximum number of instances for some, <subterm> is the subterm of which instances are created, and <variables> is a comma-separated list of variables. <comparison> is the name of the comparison function to be used in sorting. Comparison functions take as arguments two lists of terms (representing two different substitutions for the variables in <variables>) and return a value indicating whether the first list is less than, equal to, or greater than the second list. The current prototype runtime system (cf. Appendix A) supports the two exemplary comparison functions lexical and numeric (both in ascending order); further comparison functions may be programmed natively in the implementation language of the prototype (i.e. Haskell).

The list of variables influences the grouping in two ways: (1) instances are grouped as if the variables occurred in a group by clause (i.e. they are considered part of the variables free for the grouping construct), and (2) the instances are sorted on the bindings of the variables in the list using the specified comparison function. In the two exemplary functions, sorting is performed primarily on the first variable in the list, and each following variable orders the instances that agree on all preceding variables. For instance, the variable list [var Last, var First] specifies sorting primarily by last name and, among instances with the same last name, by first name.
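The comparison-function interface described above can be sketched in Python. The prototype's own functions are written in Haskell and are not reproduced here. In this sketch, lexical compares two lists of bindings element by element, so the first variable decides and later variables only break ties. numeric does the same after converting the bindings to numbers. The functions lexical, numeric and order_by, and the sample bindings (including the name “Daisy”), are hypothetical illustrations. Treating “lexical” as Python's code-point string ordering is also an assumption, not a statement about the prototype.

```python
from functools import cmp_to_key
from typing import Any, Callable, Dict, List

Comparison = Callable[[List[str], List[str]], int]


def lexical(a: List[Any], b: List[Any]) -> int:
    """Return -1, 0 or 1; earlier positions take precedence, later ones break ties."""
    for x, y in zip(a, b):
        if x < y:
            return -1
        if x > y:
            return 1
    return 0


def numeric(a: List[str], b: List[str]) -> int:
    """Like lexical, but compares the bindings as numbers."""
    return lexical([float(x) for x in a], [float(y) for y in b])


def order_by(subs: List[Dict[str, str]], variables: List[str], comparison: Comparison) -> List[List[str]]:
    """Group by the listed variables (distinct combinations), then sort with comparison."""
    keys: List[List[str]] = []
    for sub in subs:
        key = [sub[v] for v in variables]
        if key not in keys:
            keys.append(key)
    return sorted(keys, key=cmp_to_key(comparison))


AUTHORS = [
    {"Last": "Duck", "First": "Donald"},
    {"Last": "Mouse", "First": "Mickey"},
    {"Last": "Duck", "First": "Daisy"},  # hypothetical entry to show tie-breaking
]
print(order_by(AUTHORS, ["Last", "First"], lexical))
# [['Duck', 'Daisy'], ['Duck', 'Donald'], ['Mouse', 'Mickey']]
print(order_by([{"N": "10"}, {"N": "9"}], ["N"], lexical))  # [['10'], ['9']]
print(order_by([{"N": "10"}, {"N": "9"}], ["N"], numeric))  # [['9'], ['10']]
```

The last two calls show why separate lexical and numeric functions are useful. Compared as strings, "10" sorts before "9". Compared as numbers, the order is reversed.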

Example 4.29

Sort the list of books by the book titles in ascending lexical order:

results {
   all result { all var Author, var Title } order by (lexical) [ var Title ]
}

Example 4.30

Consider the following query term (evaluated against the XML document representing the data of bookstore A in Section 2.4.2):

bib {{
   book {{
      var Title → title {{ }},
      var Author → author {{ var First → first {{ }}, var Last → last {{ }} }}
   }}
}}

The following construct term creates a list of authors for each book title. Authors are sorted by last name and then by first name. Note that grouping is performed on the variable Author as well as on the variables Last and First.

results {
   all result {
      all var Author order by (lexical) [var Last, var First],
      var Title
   }
}


Comparison with GROUP BY and Aggregations in SQL

Xcerpt’s grouping constructs are very similar to GROUP BY clauses in SQL [6], which combine results that have the same values for the specified attributes into a single combined row. In SQL, GROUP BY is usually used in conjunction with an aggregation function over some of the attributes not used for grouping. However, grouping in Xcerpt differs from grouping in SQL in several respects:

• grouping is part of the construction instead of the query.

• grouping without aggregation functions is necessary, as Xcerpt, unlike SQL, allows complex tree structures instead of flat tuples.

• grouping constructs have a scope; therefore, it is in most cases not necessary to explicitly specify the variables used for grouping. Instead, all free variables in the scope (i.e. the enclosed subterm) are used implicitly.

• grouping constructs can be nested; a nested grouping construct is very similar to an aggregation function that creates a term sequence.

In relational databases, nesting of grouping constructs would create results that are in non-first normal form, i.e. tuples that are not flat, which is usually not permitted. In Xcerpt, nesting is possible (and desirable) because the data is tree-structured in the first place.

Example 4.31

Consider a relation Scores(Student, ExerciseNr, Score) used for storing exercise results of students. To keep the example simple, it is assumed that the first attribute in a tuple (Student) holds the student name. The following table represents the data from Section 2.4.1:

Scores

Student        ExerciseNr   Score
Donald Duck    1            15
Donald Duck    2            7
Mickey Mouse   1            3
Mickey Mouse   3            14
Goofy          2            13

To compute the total score of each student in SQL, one usually groups on the attribute Student and aggregates (for each student) over the attribute Score; the attribute ExerciseNr is ignored:

SELECT Student, sum (Score) FROM Scores GROUP BY Student
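For comparison, the SQL query can be run against the example data with Python's built-in sqlite3 module. A hand-written grouping sits next to it and does the same thing: it partitions the tuples by Student, ignores ExerciseNr, and sums the scores of each partition. This is only a sketch for checking the totals. The column types and the function names totals_sql and totals_grouped are assumptions.

```python
import sqlite3
from typing import Dict, List, Tuple

# The Scores relation from Example 4.31: (Student, ExerciseNr, Score).
SCORES: List[Tuple[str, int, int]] = [
    ("Donald Duck", 1, 15),
    ("Donald Duck", 2, 7),
    ("Mickey Mouse", 1, 3),
    ("Mickey Mouse", 3, 14),
    ("Goofy", 2, 13),
]


def totals_sql(rows: List[Tuple[str, int, int]]) -> Dict[str, int]:
    """Run the GROUP BY query on an in-memory SQLite database."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE Scores (Student TEXT, ExerciseNr INTEGER, Score INTEGER)")
        con.executemany("INSERT INTO Scores VALUES (?, ?, ?)", rows)
        query = "SELECT Student, sum(Score) FROM Scores GROUP BY Student"
        return dict(con.execute(query).fetchall())
    finally:
        con.close()


def totals_grouped(rows: List[Tuple[str, int, int]]) -> Dict[str, int]:
    """Same result by explicit grouping: one group per student, then aggregate."""
    groups: Dict[str, List[int]] = {}
    for student, _exercise_nr, score in rows:
        groups.setdefault(student, []).append(score)
    return {student: sum(scores) for student, scores in groups.items()}


# Donald Duck 22, Mickey Mouse 17, Goofy 13 (SQL row order is not guaranteed).
print(totals_sql(SCORES))
print(totals_grouped(SCORES))
```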

In Xcerpt, the same result would be created with the following construct term, which uses nested grouping constructs and an aggregation function (aggregations in Xcerpt are introduced in Section 4.6.3 below). Note that although group by is used in this construct term, it could be omitted because the variable Student already appears inside the outer all and is thus used implicitly for grouping:

totals {
   all score {
      name { var Student },
      total-score { sum (all var Score) }
   } group by { var Student }
