Challenges from XML and its Query Languages

$Q_1$ and $Q_2' \subseteq Q_2$. $Q_2'$ is a maximally-contained rewriting if (1) $Q_2'$ is a contained rewriting of $Q_2$, and (2) there is no contained rewriting $Q_2''$ of $Q_2$ such that $Q_2' \subset Q_2''$. If a contained rewriting $Q_2'$ of $Q_2$ satisfies $Q_2' \equiv Q_2$, then it is an equivalent rewriting of $Q_2$.
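Stated compactly, the three notions above can be written as follows. This is only a sketch under the usual set-based semantics: the notation $Q(D)$ for the answer of query $Q$ over a database $D$ is an assumption introduced here for illustration, $\subset$ denotes proper containment, and $Q_2''$ ranges over rewritings of $Q_2$.

```latex
\begin{align*}
Q_2' \subseteq Q_2
  &\iff \forall D:\; Q_2'(D) \subseteq Q_2(D)
  && \text{(contained rewriting)} \\
Q_2' \equiv Q_2
  &\iff Q_2' \subseteq Q_2 \;\wedge\; Q_2 \subseteq Q_2'
  && \text{(equivalent rewriting)} \\
Q_2' \text{ maximally contained}
  &\iff Q_2' \subseteq Q_2 \;\wedge\;
        \nexists\, Q_2'' :\; Q_2' \subset Q_2'' \subseteq Q_2
\end{align*}
```

Under these definitions every equivalent rewriting is also maximally contained, since no contained rewriting can properly contain a query equivalent to $Q_2$.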

Researchers have studied algorithms for answering queries using views in the context of several query languages. There has been extensive work on finding equivalent rewritings for SQL queries (e.g., [YL87, SJGP90, TSI94, CR94, CKPS95, LMSS95]). Yang and Larson [LY85, YL87] considered the problem of finding rewritings for SPJ queries using SPJ views. Other work considers queries and views with binding patterns [RSU95], with grouping and aggregation [GHQ95, SDJL96, GBLP96], and with multi-block queries [ZCL+00].

2.2 Challenges from XML and its Query Languages

With the advent of semi-structured data on the Web, interest in the containment and rewriting problem was revived for queries that target such schema-less data. We first introduce the new Web data format, XML, and its query languages. We then survey the complexity results for the containment of XPath queries, a simple but common sub-language of many existing XML query languages.

2.2.1 Background on XML

Extensible Markup Language (XML) [W3C98] is a hierarchical data format for information representation and exchange on the Web. An XML document presents a nested element structure in which each element may carry a set of attributes and an ordered list of sub-elements. The example XML document in Figure 2.1 contains information about books. In this example, the first book element has three sub-elements: title, author and price. This book element also has a year attribute with value 1990. The author element further contains a last element and a first element, which encapsulate the values of the last and first names respectively.

<bib>
  <book year="1990">
    <title>Data Warehousing Technologies</title>
    <author>
      <last>Dayal</last>
      <first>M.</first>
    </author>
    <price>59.99</price>
  </book>
  <book year="1992">
    <title>TCP/IP Illustrated</title>
    <author>
      <last>Stevens</last>
      <first>W.</first>
    </author>
    <publisher>Addison Wesley</publisher>
    <price>39.99</price>
  </book>
</bib>

Figure 2.1: The Example XML Document: bib.xml
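The nesting described above can be inspected programmatically. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module and a transcription of Figure 2.1 inlined as a string; the helper name summarize is hypothetical and not part of the original text.

```python
import xml.etree.ElementTree as ET

# Transcription of Figure 2.1, inlined so the sketch is self-contained.
BIB_XML = """<bib>
  <book year="1990">
    <title>Data Warehousing Technologies</title>
    <author><last>Dayal</last><first>M.</first></author>
    <price>59.99</price>
  </book>
  <book year="1992">
    <title>TCP/IP Illustrated</title>
    <author><last>Stevens</last><first>W.</first></author>
    <publisher>Addison Wesley</publisher>
    <price>39.99</price>
  </book>
</bib>"""


def summarize(xml_text):
    """Return one (year, title, child tags in document order) tuple per book."""
    root = ET.fromstring(xml_text)
    summary = []
    for book in root.findall("book"):
        # Iterating over an element yields its sub-elements in document order.
        child_tags = [child.tag for child in book]
        summary.append((book.get("year"), book.findtext("title"), child_tags))
    return summary


books = summarize(BIB_XML)
for year, title, tags in books:
    print(year, title, tags)
```

The year value comes from an attribute, whereas title, author and price are sub-elements whose order is preserved by the parser.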

DTDs (Document Type Definitions) [W3Ca] for XML, which are inherited from the schema mechanism for SGML (Standard Generalized Markup Language) [W3Cb], can be used to define content models (the valid order and nesting of elements) and to a limited extent the data types of attributes. Figure 2.2 shows the DTD for the example XML document in Figure 2.1. The example DTD describes the possible arrangement of tags in a valid XML document.

<?xml version="1.0"?>
<!DOCTYPE bib [
  <!ELEMENT bib (book*)>
  <!ELEMENT book (title, (author+ | editor+), publisher?, price)>
  <!ATTLIST book year CDATA #REQUIRED>
  <!ELEMENT author (last, first)>
  <!ELEMENT editor (last, first, affiliation)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT last (#PCDATA)>
  <!ELEMENT first (#PCDATA)>
  <!ELEMENT affiliation (#PCDATA)>
  <!ELEMENT publisher (#PCDATA)>
  <!ELEMENT price (#PCDATA)>
]>

Figure 2.2: The Example DTD: bib.dtd
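A content model is essentially a regular expression over child element names. The sketch below illustrates this by hand-translating the content models of Figure 2.2 into Python regular expressions; it is not a DTD parser or a real validator. The names CONTENT_MODELS and conforms are hypothetical, and the bib model is assumed to allow any number of book elements, since the document in Figure 2.1 contains two.

```python
import re
import xml.etree.ElementTree as ET

# Content models of Figure 2.2 written as regular expressions over
# space-terminated child tag names (hand-translated for illustration).
CONTENT_MODELS = {
    "bib": r"(book )*",  # assumption: zero or more books
    "book": r"title ((author )+|(editor )+)(publisher )?price ",
    "author": r"last first ",
    "editor": r"last first affiliation ",
}


def conforms(element):
    """Check an element and all its descendants against CONTENT_MODELS."""
    model = CONTENT_MODELS.get(element.tag)
    if model is None:
        # Elements not listed above are #PCDATA: no sub-elements allowed.
        return len(element) == 0
    child_sequence = "".join(child.tag + " " for child in element)
    if re.fullmatch(model, child_sequence) is None:
        return False
    if element.tag == "book" and "year" not in element.attrib:
        return False  # the year attribute is declared #REQUIRED
    return all(conforms(child) for child in element)


valid = ET.fromstring(
    '<bib><book year="1990"><title>T</title>'
    "<author><last>L</last><first>F</first></author>"
    "<price>1</price></book></bib>"
)
no_price = ET.fromstring(
    '<bib><book year="1990"><title>T</title>'
    "<author><last>L</last><first>F</first></author></book></bib>"
)
no_year = ET.fromstring(
    "<bib><book><title>T</title>"
    "<author><last>L</last><first>F</first></author>"
    "<price>1</price></book></bib>"
)
print(conforms(valid), conforms(no_price), conforms(no_year))
```

A production system would rely on a validating parser instead; the point here is only that the order and nesting constraints of a DTD reduce to regular-expression matching on each element's child sequence.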

2.2.2 Background on XML Query Languages

XML query languages are a crucial enabling technology for many XML applications. In the last few years, there has been a great deal of research [AQM+97, CJS99, P. 00, FFK+98] into semi-structured query languages that enable the execution of database-style queries over XML files. However, some of these languages were not originally designed to query XML documents; rather, each is based on its own system-specific data model.

For example, Lore bears the flavor of OQL and extends it with path expressions to query data modelled in OEM [PMW95]. StruQL [FFK+98] models the data as a graph and applies a pattern-matching query paradigm to achieve graph transformations. YATL (YAT Language) [CJS99] is also based on a pattern-matching paradigm, adopting operators from object-oriented algebras as well as Bind and Tree operators. Even though all these language proposals are claimed to be extensible to query XML documents, there are still mismatches between them and the XML query data model [W3C00]. Hence these languages are not well suited for querying an Internet populated with XML data.

More recently, several XML-oriented query languages, including XML-QL [DFF+99b] and XQL [RLS] (from Microsoft), have been proposed. Similar to Lore [QWG+96] and UnQL [P. 00], they all have a notion of path expressions for navigating the nested structure of XML. For example, XML-QL uses a nested XML-like structure to specify the parts of a document to be selected and specifies the structure of the result XML document using a result template. First introduced at the W3C's conference on XML query languages in 1998, XQL has since been implemented by several large IT vendors. It can now be seen as the precursor and the core part of XPath [W3C03b]. However, neither XQL nor XPath is a full-blown query language. They lack features such as variable binding, joins across documents and restructuring.
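To make the notion of a path expression concrete, the sketch below evaluates a few XPath-style paths over a transcription of the bib.xml document of Figure 2.1. It assumes Python's standard xml.etree.ElementTree module, which supports only a limited subset of XPath; the variable names are chosen here for illustration.

```python
import xml.etree.ElementTree as ET

# Transcription of Figure 2.1, inlined so the sketch is self-contained.
BIB_XML = """<bib>
  <book year="1990">
    <title>Data Warehousing Technologies</title>
    <author><last>Dayal</last><first>M.</first></author>
    <price>59.99</price>
  </book>
  <book year="1992">
    <title>TCP/IP Illustrated</title>
    <author><last>Stevens</last><first>W.</first></author>
    <publisher>Addison Wesley</publisher>
    <price>39.99</price>
  </book>
</bib>"""

root = ET.fromstring(BIB_XML)  # root is the bib element

# Child steps, corresponding to /bib/book/title.
all_titles = [t.text for t in root.findall("./book/title")]

# Attribute predicate, corresponding to /bib/book[@year='1992']/title.
titles_1992 = [t.text for t in root.findall("./book[@year='1992']/title")]

# Descendant step, corresponding to //last.
last_names = [e.text for e in root.findall(".//last")]

# Child-existence predicate, corresponding to /bib/book[publisher]/price.
prices_with_publisher = [p.text for p in root.findall("./book[publisher]/price")]

print(all_titles, titles_1992, last_names, prices_with_publisher)
```

Each path only selects existing nodes of a single document. Binding variables, joining two documents, or constructing a new result structure has to be done in the host language, which reflects the limitations noted above.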

Introduced in 2000, Quilt [CRF00] quickly attracted considerable attention as an expressive XML query language featuring nested FLWR expressions, a modernized SQL-like construct, on top of XPath. Shortly after, W3C pro-
