
1.3 XML: eXtensible Markup Language

1.3.1 User-Defined Tags

XML is a standard syntax for defining custom tag collections. Unlike HTML, which consists of a fixed set of tags, XML is a meta-language (that is, “a language for defining languages”), which standardizes the syntactic rules whereby users can define their own sets of tags, suited to the needs of a specific application domain.

A well-formed XML document is a piece of marked-up content that obeys a few syntactic rules:

The document should start with a standard line, the XML declaration, stating the language version, such as: <?xml version="1.0"?> (XML 1.0 treats this declaration as optional but recommended.)

All tags, called elements in XML terminology, can enclose some content, which can be text or other tags. XML elements have an opening tag and a closing tag. The latter is obtained by prefixing the element name with the symbol “/”, as in </child>. The exceptions to this rule are elements with no content, which may have no closing tag; in that case the “/” symbol must appear at the end of the tag, as in <emptytag/>.

The document must have exactly one root element, and the nesting of elements must be well-formed, which means that any element containing an inner element must not be closed before the inner element is closed.

Elements may have attributes with given values, and attribute values must be delimited by quotes (double " " or single ' ').

The following example presents a short, but well-formed, XML document:

<?xml version="1.0"?>
<root>
  <child>
    <subchild>..some content...</subchild>
  </child>
</root>
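The well-formedness rules can also be checked mechanically. The following is a minimal sketch in Python, assuming the standard library parser xml.etree.ElementTree is an acceptable stand-in for any XML parser; the helper name is_well_formed and the sample strings are ours, introduced only for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# The short document shown above, written on one line.
GOOD = (
    '<?xml version="1.0"?>'
    "<root><child><subchild>..some content...</subchild></child></root>"
)
# <child> is closed before its inner <subchild>: nesting rule violated.
BAD_NESTING = "<root><child><subchild>text</child></subchild></root>"
# Two top-level elements: the single-root rule is violated.
TWO_ROOTS = "<root/><root/>"
# Attribute value not delimited by quotes.
UNQUOTED_ATTRIBUTE = "<root><child name=value/></root>"

for sample in (GOOD, BAD_NESTING, TWO_ROOTS, UNQUOTED_ATTRIBUTE):
    print(is_well_formed(sample))
```

A parser of this kind only checks well-formedness; it says nothing about whether the tags used are the ones an application expects.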

The document starts with a standard line that declares the XML version, and then contains some custom tags. As another example, a fragment of the outline of this book could be represented in XML as shown in Figure 1.10. As illustrated in the example, XML elements may have different kinds of content:

Element content: Contains other elements, like the <book> element.

Text content: Contains character data, like the <chapter> element.

22 Chapter One: Technologies for Web Applications

Designing Data-Intensive Web Applications

Stefano Ceri, Piero Fraternali, Aldo Bongio, Marco Brambilla, Sara Comai, Maristella Matera

Part I: INTRODUCTION

Chapter 1 Technologies for Web Applications

PART II: CONCEPTUAL MODELING

Chapter 2 Data Model

Chapter 3 Hypertext Model

Chapter 4 Content Management Model

Chapter 5 Advanced Hypertext Model

...

Figure 1.10 An example of XML tags for representing the outline of a book.

<book>
  <publishing schedule="10-31-2002"/>
  <title> Designing Data-Intensive Web Applications </title>
  <author> Stefano Ceri </author>
  <author> Piero Fraternali </author>
  <author> Aldo Bongio </author>
  <author> Marco Brambilla </author>
  <author> Sara Comai </author>
  <author> Maristella Matera </author>
  <part> Technology Overview
    <chapter> 1.Technologies for Web Applications </chapter>
  </part>
  <part> Models for Designing Web Applications
    <chapter> 2.Data Model </chapter>
    <chapter> 3.Hypertext Model </chapter>
    <chapter> 4.Content Management Model </chapter>
    <chapter> 5.Advanced Hypertext Model </chapter>
  </part>
  ..
</book>

Mixed content: Contains other elements and/or character data, like the <part> element.

Empty content: No content, like the <publishing> element.

Besides content, XML elements may have attributes, like the schedule attribute in the <publishing> element.
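The four kinds of content can be told apart programmatically. The sketch below, in Python with the standard xml.etree.ElementTree module, classifies a few elements taken from Figure 1.10. The function content_kind is a hypothetical helper of ours, and it assumes that whitespace-only text between tags is layout rather than content.

```python
import xml.etree.ElementTree as ET


def content_kind(element: ET.Element) -> str:
    """Classify an element as 'empty', 'text', 'element', or 'mixed' content."""
    has_children = len(element) > 0
    # Character data directly inside the element: its leading text plus the
    # "tail" text that follows each child element.
    pieces = [element.text or ""] + [child.tail or "" for child in element]
    # Assumption: whitespace-only text is indentation, not content.
    has_text = any(piece.strip() for piece in pieces)
    if has_children and has_text:
        return "mixed"
    if has_children:
        return "element"
    if has_text:
        return "text"
    return "empty"


# A shortened version of the document of Figure 1.10.
FRAGMENT = """
<book>
  <publishing schedule="10-31-2002"/>
  <title> Designing Data-Intensive Web Applications </title>
  <part> Technology Overview
    <chapter> 1.Technologies for Web Applications </chapter>
  </part>
</book>
"""

book = ET.fromstring(FRAGMENT)
kinds = {child.tag: content_kind(child) for child in book}
schedule = book.find("publishing").get("schedule")  # attribute access
print(content_kind(book), kinds, schedule)
```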

An XML document may be associated with a Document Type Definition (DTD), prescribing the common format of a class of XML documents. A DTD includes the description of the elements that can be used in the document, and for each element specifies the admissible content and attributes.

A DTD contains three categories of declarations: element, attribute, and entity declarations. An element declaration introduces an element and specifies its admissible content; an attribute declaration specifies which attributes can be put inside an element and expresses a few properties of such attributes; an entity declaration introduces a sort of “constant,” which is a reference to some fixed piece of content. We do not further discuss entity declarations, although we next illustrate a few examples of element and attribute declarations.

A DTD for structuring documents about books may include element declarations like the ones in the following example:

<!ELEMENT book (publishing, title, editor?, author+, (chapter*|part*))>

<!ELEMENT publishing EMPTY>

<!ELEMENT title (#PCDATA)>

<!ELEMENT editor (#PCDATA)>

<!ELEMENT author (#PCDATA)>

<!ELEMENT chapter (#PCDATA)>

<!ELEMENT part (#PCDATA|chapter)*>

The above rules declare seven elements: book, publishing, title, editor, author, chapter, and part. Element book has a complex content model: it contains a sequence of subelements, denoted by the comma-separated list of element names. Specifically, the book element must contain one subelement of type publishing, one subelement of type title, zero or one (denoted by the “?” symbol) subelement of type editor, one or more (denoted by the “+” symbol) subelements of type author, and zero or more (denoted by the “*” symbol) chapters or parts. Chapters and parts are alternatives (denoted by the “|” symbol): the book is organized either in parts or in chapters. The publishing element has no content (EMPTY); the title, editor, author, and chapter elements have text data (#PCDATA) as content. Finally, the part element contains zero or more chapters mixed with text data.
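The symbols “?”, “+”, “*”, and “|” have the same meaning as in regular expressions, so the content model of book can be read as a pattern over the sequence of child element names. The sketch below, in Python, makes this reading concrete. The regular expression is our own hand translation of the declaration, not the output of a DTD processor, and the function name is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hand translation of the content model
#   (publishing, title, editor?, author+, (chapter*|part*))
# into a regular expression over space-terminated child names.
BOOK_MODEL = re.compile(
    r"publishing title (editor )?(author )+((chapter )*|(part )*)"
)


def children_match_book_model(book: ET.Element) -> bool:
    """Check the order and number of the direct children of a <book> element."""
    names = "".join(child.tag + " " for child in book)
    return BOOK_MODEL.fullmatch(names) is not None


with_parts = ET.fromstring(
    "<book><publishing/><title>T</title><author>A</author>"
    "<author>B</author><part>P</part><part>Q</part></book>"
)
# Chapters and parts at the same level: excluded by the "|" alternative.
chapters_and_parts = ET.fromstring(
    "<book><publishing/><title>T</title><author>A</author>"
    "<chapter>C</chapter><part>P</part></book>"
)
# No author: excluded by the "+" indicator.
no_author = ET.fromstring("<book><publishing/><title>T</title></book>")

for sample in (with_parts, chapters_and_parts, no_author):
    print(children_match_book_model(sample))
```

The sketch checks only the direct children of book; a validating parser applies the same kind of test to every element of the document.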

An attribute declaration lists all the attributes that an element may include, and poses some constraints on their values. For example, the attributes of the publishing element may be declared as follows:

<!ATTLIST publishing

schedule CDATA #REQUIRED

editor CDATA #IMPLIED

format (paperback|hardback) "paperback"

>

The ATTLIST clause introduces three attributes for the publishing element: schedule, editor, and format. The schedule attribute consists of character data (CDATA) and is mandatory (#REQUIRED). The editor attribute also has character data as a value, but is optional (#IMPLIED). Finally, the format attribute is optional and may have a value chosen from a fixed set of options (paperback, hardback), with paperback as the default value assigned to the attribute when the user does not include the attribute in the publishing element.
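The effect of these declarations on a publishing element can be sketched in a few lines of Python. The table PUBLISHING_ATTRIBUTES is a hypothetical in-memory transcription of the ATTLIST clause, and effective_attributes is our own helper; a validating parser would perform an equivalent check and default assignment internally.

```python
import xml.etree.ElementTree as ET

# Hypothetical transcription of the ATTLIST clause for <publishing>:
# name -> (allowed values, or None for CDATA; required?; default, or None)
PUBLISHING_ATTRIBUTES = {
    "schedule": (None, True, None),   # CDATA #REQUIRED
    "editor": (None, False, None),    # CDATA #IMPLIED
    "format": (("paperback", "hardback"), False, "paperback"),
}


def effective_attributes(element: ET.Element) -> dict:
    """Return the element's attributes after checking them and adding defaults."""
    result = dict(element.attrib)
    for name in result:
        if name not in PUBLISHING_ATTRIBUTES:
            raise ValueError(f"undeclared attribute: {name}")
    for name, (allowed, required, default) in PUBLISHING_ATTRIBUTES.items():
        if name not in result:
            if required:
                raise ValueError(f"missing required attribute: {name}")
            if default is not None:
                result[name] = default  # default applies when attribute is omitted
        elif allowed is not None and result[name] not in allowed:
            raise ValueError(f"{name} must be one of {allowed}")
    return result


element = ET.fromstring('<publishing schedule="10-31-2002"/>')
attributes = effective_attributes(element)
print(attributes)
```

Here the format attribute is absent from the element, so the declared default is supplied.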

A document that conforms to a given DTD is said to be valid with respect to that DTD. For example, the document of Figure 1.10 is valid with respect to the DTD expressed by the above clauses defining elements and attributes for describing books.

The DTD can be either placed inside the XML document, or stored in a separate file, as shown by the following example:

<?xml version="1.0"?>
<!DOCTYPE book SYSTEM "book.dtd">
<book>
  ..
</book>

The line <!DOCTYPE book SYSTEM "book.dtd"> defines the type of the document by referring to the file book.dtd, where the DTD declarations are stored.
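A program can read this declaration without processing the DTD itself. The following sketch uses xml.dom.minidom from the Python standard library, which records the document type declaration but, to our knowledge, neither fetches book.dtd nor validates against it. The sample document body is ours, introduced only to make the example complete.

```python
from xml.dom import minidom

DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE book SYSTEM "book.dtd">
<book>
  <title>Designing Data-Intensive Web Applications</title>
</book>
"""

# Parsing builds the document tree and keeps the DOCTYPE information.
document = minidom.parseString(DOCUMENT)
doctype = document.doctype
root_name = doctype.name         # name of the declared root element
dtd_location = doctype.systemId  # system identifier: where the DTD is stored
print(root_name, dtd_location)
```

Checking validity against the declarations in book.dtd would require a validating parser, which is outside the scope of this sketch.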

DTDs present several limitations in expressing the structure of documents: they do not allow you to specify data types for the content of elements and attributes other than character data, and are unable to express several useful constraints on the nesting of elements. To improve the document structure specification, DTDs can be replaced by XML schema definitions (XSDs). An XML schema definition is an XML document, which dictates the structure of a family of XML documents, using a standard set of tags for element declaration, defined by the XML Schema specification. XML Schema became a recommendation of the World Wide Web Consortium in May 2001 and is gradually replacing DTDs in those applications that require a more precise description of XML document structure.

Figure 1.11 shows an example of an XSD, corresponding to the structure of the XML document of Figure 1.10. Being an XML document, the XSD starts with the XML version declaration (line 1), followed by the <schema> element, which encloses all the element definitions. The xmlns:xs attribute of the <schema> element also imports the definition of the XML Schema tags used to describe the document structure. These tags are organized in a so-called XML namespace, identified by the URI http://www.w3.org/2001/XMLSchema. All the tags belonging to the same namespace have a name starting with a common prefix; in this example, the prefix for XML Schema tags is xs.

1  <?xml version="1.0"?>
2  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
3
4    <!-- definition of book element -->
5    <xs:element name="book">
6      <xs:complexType>
7        <xs:sequence>
8          <xs:element ref="publishing"/>
9          <xs:element ref="title"/>
10         <xs:element ref="editor" minOccurs="0"/>
11         <xs:element ref="author" maxOccurs="unbounded"/>
12         <xs:choice>
13           <xs:element ref="chapter" minOccurs="0" maxOccurs="unbounded"/>
14           <xs:element ref="part" minOccurs="0" maxOccurs="unbounded"/>
15         </xs:choice>
16       </xs:sequence>
17     </xs:complexType>
18   </xs:element>
19
20   <!-- definition of chapter, title, editor and author elements -->
21   <xs:element name="chapter" type="xs:string"/>
22   <xs:element name="title" type="xs:string"/>
23   <xs:element name="editor" type="xs:string"/>
24   <xs:element name="author" type="xs:string"/>
25

(continued)

26   <!-- definition of part element -->
27   <xs:element name="part">
28     <xs:complexType mixed="true">
29       <xs:sequence>
30         <xs:element ref="chapter" minOccurs="0" maxOccurs="unbounded"/>
31       </xs:sequence>
32     </xs:complexType>
33   </xs:element>
34
35   <!-- definition of publishing element -->
36   <xs:element name="publishing">
37     <xs:complexType>
38       <xs:attribute name="schedule" type="xs:date" use="required"/>
39       <xs:attribute name="editor" type="xs:string"/>
40       <xs:attribute name="format" default="paperback">
41         <xs:simpleType>
42           <xs:restriction base="xs:string">
43             <xs:enumeration value="paperback"/>
44             <xs:enumeration value="hardback"/>
45           </xs:restriction>
46         </xs:simpleType>
47       </xs:attribute>
48     </xs:complexType>
49   </xs:element>
50 </xs:schema>

Figure 1.11 (continued)

The <schema> element encloses the definition of the element types for describing books. The root element <book> (lines 4–18) is declared as a complex type, because it contains several subelements (publishing, title, editor, and so on). The <sequence> element inside the declaration of element <book> specifies the required order of the nested subelements (publishing, followed by title, and so on). The <choice> tag is used for specifying that an element can contain one of a set of subelements; in the example, element <book> may contain as subelement either chapter or part. For each element, occurrence indicators define how often an element can appear. In particular, the maxOccurs attribute specifies the maximum number of times an element can occur, whereas the minOccurs indicator specifies the minimum number of times an element can occur. The default values of the occurrence indicators are 1.

Elements chapter, title, editor, and author (declared at lines 21–24) are of type string, one of the basic types provided by XML Schema, which also includes the decimal, integer, Boolean, date, and time types. Element part (lines 27–33) may contain both plain text and chapter elements: this feature is specified by setting the mixed attribute of its complex type to true (line 28), and by defining the nontextual subelement of part (lines 29–31).

Finally, element publishing (lines 35–49) has empty content and three attributes. The three attributes are declared in the same way as elements; they are grouped into the definition of a complex type, which does not include the <sequence> element, because publishing has no subelements and attributes can occur in any order. Attribute schedule is of type date and is required; editor is string-typed and optional (which need not be explicitly specified); the type of attribute format is a string, whose content is restricted to a set of predefined values, enumerated inside the <restriction> tag. The acceptable values are “paperback” and “hardback”, with “paperback” as the default.
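Typing the schedule attribute as a date is a concrete case of what XML Schema adds over DTDs. Under the DTD, schedule is plain character data, so any string is accepted. Under the XSD, the value must follow the lexical form of the date type, which is year-month-day. The sketch below, in Python, is a rough approximation of that check: the helper is_xs_date is ours, and it deliberately ignores the optional time zone suffix and other refinements of the XML Schema specification.

```python
import re
from datetime import date

# Simplified lexical form of the XML Schema date type: YYYY-MM-DD.
# Assumption: time zone suffixes and years beyond four digits are ignored.
XS_DATE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def is_xs_date(value: str) -> bool:
    """Return True if `value` looks like a valid YYYY-MM-DD calendar date."""
    match = XS_DATE.fullmatch(value)
    if match is None:
        return False
    year, month, day = (int(group) for group in match.groups())
    try:
        date(year, month, day)  # rejects impossible dates such as 2002-02-30
        return True
    except ValueError:
        return False


for candidate in ("2002-10-31", "10-31-2002", "2002-02-30"):
    print(candidate, is_xs_date(candidate))
```

Under this check, the value 10-31-2002 used in Figure 1.10 is acceptable as DTD character data but would have to be written as 2002-10-31 to satisfy the date type.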

Like DTDs, XML schemas can be placed inside the XML document or in a separate file, referenced inside the document.
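For the separate-file case, a common convention is to point to the schema from the root element with the noNamespaceSchemaLocation attribute of the XML Schema instance namespace. The sketch below, in Python, reads that reference with xml.etree.ElementTree. The file name book.xsd is a hypothetical example of ours, and the code only retrieves the reference; it does not load the schema or validate the document.

```python
import xml.etree.ElementTree as ET

# Namespace of the attributes that link a document to its schema.
XSI = "http://www.w3.org/2001/XMLSchema-instance"

# book.xsd is a hypothetical file name used only for illustration.
DOCUMENT = """<?xml version="1.0"?>
<book xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:noNamespaceSchemaLocation="book.xsd">
  <title>Designing Data-Intensive Web Applications</title>
</book>
"""

root = ET.fromstring(DOCUMENT)
# ElementTree expands the xsi prefix into the form {namespace-URI}local-name.
schema_location = root.get(f"{{{XSI}}}noNamespaceSchemaLocation")
print(schema_location)
```

The prefix xsi is only a local abbreviation: what identifies the attribute is the namespace URI, which is why the lookup uses the expanded name.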