
more, the proposed architecture calls for a separate transformation of all RDF triples that can be extracted from the XML document. Though a common evaluation framework for languages such as XQuery, XSLT, and SPARQL, as presented in Parts II to IV, is a first step towards alleviating this problem, current implementations of XSLT or SPARQL do not allow for cross-language optimization.

In the Xcerpt case the arguments are almost inverted. We can employ a single language for the whole use case and can transform triples on demand. Also, Xcerpt is far more expressive than SPARQL, which allows us to express more interesting queries on the transformed tuples. However, Xcerpt is a non-standard query language that is not widely adopted and has only prototypical implementations.

Summarizing, the presented system constitutes the first complete implementation of the GRDDL use case and allows us to draw the following conclusions: (1) Extracting RDF information from microformats is a non-trivial task and calls for expressive and user-friendly query languages specifically aimed at querying heterogeneous XML data. (2) For usability as well as efficiency, it is desirable to have a language that is capable both of extracting the relevant information and of further semantic processing. (3) Although SPARQL is very well specified and expected to become the most widely used RDF query language, it lacks some features, most notably grouping, which limits its use in our examples. In [63] it is discussed how SPARQL can be extended with more expressive grouping constructs without an increase in query complexity. Currently, these limitations mean that SPARQL must be embedded in a more powerful general-purpose programming language to solve all the GRDDL use cases. (4) While Xcerpt is still a research prototype, it already shows that versatile, pattern-oriented, and rule-based querying has the potential to considerably ease the authoring of data-intensive Web applications.

GRDDL is an example of a use case that was developed independently of the vision of versatile query languages and Xcerpt discussed in the previous chapters, yet illustrates the need for these approaches. It also underlines that an evaluation framework capable of integrating different Web query languages, as discussed in the remaining parts of this work, is called for.

This use case concludes our glimpse at the refinement of Xcerpt's versatile aspect towards Xcerpt 2.0. In the following parts, we first discuss a novel formal foundation, called CIQLog, for Web query languages that allows us, as discussed in Part III, to capture the semantics of many diverse Web query languages. Then, we discuss how queries expressed as such can be efficiently evaluated using the CIQCAG algebra and how that algebra is implemented. Though this discussion, at times, leads us quite far away from Xcerpt, we repeatedly come back to Xcerpt: when discussing the relation of Xcerpt's data model to that of CIQLog in Section 5.5, when discussing CIQLog queries and their relation to Xcerpt or XQuery expressions in Section 6.5.3, and finally when considering the translation of (core) Xcerpt queries in Chapter 7.

Part II

THEORY. A FORMAL PERSPECTIVE ON WEB

5

DATA MODEL: RELATIONS OVER TREES AND GRAPHS

5.1 Introduction
5.2 Data Graphs
5.3 XML: Essentials and Formal Representation
    5.3.1 XML in 500 Words
    5.3.2 Mapping XML to Data Graphs
    5.3.3 Transparent Links
5.4 RDF: Essentials and Formal Representation
    5.4.1 RDF in 500 Words
    5.4.2 Mapping RDF to Data Graphs
5.5 Xcerpt Data Terms
    5.5.1 Xcerpt Data Terms in 500 Words
5.6 Relations on Data Graphs
    5.6.1 Binary Relational Structures
    5.6.2 A Relational Schema for Data Graphs
    5.6.3 Properties of Nodes and Edges: Labels and Positions
    5.6.4 Structural Relations
    5.6.5 Order Relations
    5.6.6 Equivalence Relations
    5.6.7 Inverse and Complement
    5.6.8 Example Relations
5.7 Conclusion

5.1 INTRODUCTION

Versatile languages such as Xcerpt are one avenue for addressing the fragmentation of Web data formats. In this chapter, we start with the second avenue: a uniform, purely logical semantics for many Web query languages, including versatile ones such as Xcerpt. The semantics is provided by CIQLog, a variant of Datalog with negation and value invention specifically adapted to the Web setting and able to handle data of different shapes.

[Figure 22 (diagram): XPath queries (no built-ins, no set operations), SPARQL queries (no built-ins), Xcerpt programs (non-recursive, single-rule), and XQuery programs (non-compositional, Core) are translated to CIQLog programs (Part III, Chapters 7 to 9); CIQLog programs are compiled to CIQCAG expressions (Part IV, Chapter 13).]

Figure 22. Overview of Parts III and IV: translation from Web query languages to CIQLog and then to CIQCAG.

We introduce CIQLog to serve as the uniform semantics for many Web query languages, but also to provide a means for evaluating these languages with the CIQCAG algebra introduced in Part IV. Figure 22 illustrates this approach and relates it to the parts and chapters of this thesis.

Before we turn to the translation of the individual languages, the next two chapters introduce first a uniform data model for Web data and second the language CIQLog as a generalization of common Web languages.

The data model of CIQLog and CIQCAG (as described in Chapter 6 and Part IV) consists of arbitrary binary relational structures. To bridge the gap to XML and RDF query languages such as XQuery or SPARQL, we first introduce in this chapter a common view of Web data as node- and edge-labeled graphs (Section 5.2) together with mappings from XML (Section 5.3) and RDF (Section 5.4). These mappings are, for the most part, simple and intuitive. On these data graphs we define a set of unary or binary relations for querying the structure of the graph (Section 5.6.4), the relative position of edges and labels in that graph (Section 5.6.5), the labels of edges and nodes (Section 5.6.3), and edges and nodes that are equivalent with respect to label, position, or structure (Section 5.6.6). These relations are closely related to XPath's axes and other formalizations of relations on XML data, but exhibit a number of distinct features to properly address arbitrary shapes of the underlying data graphs and the effect of edge labels (see, e.g., the definition of child- and descendant-like traversal relations in Section 5.6.4). Such relations can then be part of binary relational structures as used by CIQLog and CIQCAG.
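
To make the idea of relations over a node- and edge-labeled graph more concrete, the following Python sketch builds a small data graph and derives a unary label relation and two binary structural relations from it. It is only an illustration under simplifying assumptions: the graph, its labels, and the function names (`labeled`, `child_relation`, `descendant_relation`) are hypothetical, and the descendant relation here is plain reachability. The actual definitions in Section 5.6.4 additionally account for the effect of edge labels, which this sketch ignores.

```python
from collections import deque

# Hypothetical toy data graph; none of these identifiers come from the thesis.
# Nodes and edges both carry labels. The edge e4 closes a cycle, since data
# graphs may have arbitrary shape (not only trees).
node_label = {"n1": "bib", "n2": "book", "n3": "title", "n4": "author"}
edges = {
    "e1": ("n1", "n2"),
    "e2": ("n2", "n3"),
    "e3": ("n2", "n4"),
    "e4": ("n4", "n2"),
}
edge_label = {"e1": "contains", "e2": "contains", "e3": "contains", "e4": "wrote"}


def labeled(labels, wanted):
    """Unary relation: all nodes (or edges) carrying the given label."""
    return {item for item, label in labels.items() if label == wanted}


def child_relation(edges):
    """Binary relation: pairs (source, target) connected by a single edge."""
    return {(src, dst) for (src, dst) in edges.values()}


def descendant_relation(child):
    """Transitive closure of the child relation; terminates on cyclic graphs."""
    successors = {}
    for src, dst in child:
        successors.setdefault(src, set()).add(dst)
    closure = set()
    for start in successors:
        seen = set()
        queue = deque(successors[start])
        while queue:
            node = queue.popleft()
            if node in seen:
                continue  # already reached from this start node
            seen.add(node)
            closure.add((start, node))
            queue.extend(successors.get(node, ()))
    return closure


child = child_relation(edges)
descendant = descendant_relation(child)
```

Because of the cycle between `n2` and `n4`, both nodes are descendants of themselves in this sketch, which cannot happen for the descendant axis on an XML tree. Storing each relation as a plain set of tuples mirrors the view of the data as a binary relational structure.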
