Davide Eynard
04 – XML Schemas
XML: recap and evaluation
During last lesson we saw the basics of XML...
Tree structure
Elements and attributes
Content vs presentation
... And the basics of XML evaluation
Well-formedness (just syntax)
Validity with respect to a schema
Why do we need a schema?
XML can be used to describe any kind of data, but it is totally
unaware of what the data means
You can check if the syntax is right...
... but you cannot constrain its usage in any way!
Example:
<person>
  <firstName>John</firstName>
  <lastName>Doe</lastName>
  <SSN>123-45-6789</SSN>
  <SSN>987-65-4321</SSN>
</person>
SSN should be unique, but a simple check on syntax would not find errors in this code
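A minimal sketch of this point, using Python's standard xml.etree.ElementTree module, which checks well-formedness only. The helper name count_ssn is hypothetical, and the "one SSN" rule has to be coded by hand because nothing in the XML itself expresses it.

```python
import xml.etree.ElementTree as ET

# The example document from the slide: well-formed, but with two SSN elements.
PERSON_XML = """
<person>
  <firstName>John</firstName>
  <lastName>Doe</lastName>
  <SSN>123-45-6789</SSN>
  <SSN>987-65-4321</SSN>
</person>
"""

def count_ssn(xml_text):
    """Parse the text (raises ParseError if not well-formed) and count SSN children."""
    root = ET.fromstring(xml_text)
    return len(root.findall("SSN"))

# Parsing succeeds: the syntax check alone reports no error.
ssn_count = count_ssn(PERSON_XML)
print("well-formed, SSN elements found:", ssn_count)
```

The parser accepts the document; only the hand-written count reveals the duplicate. A schema lets us state such a rule once instead of coding it in every application.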
What does a schema do?
A schema allows you to define all the elements and attributes
that can be used inside an XML document
Moreover, you can add constraints specifying:
Which are the children of a particular element
In which order they appear
How many children an element can have
If the element is empty or contains text
Datatypes for elements and attributes
Default values for elements and attributes
Given this information, an XML document conforming to a
given schema can be validated
The document is valid if it is well-formed and it follows the structure given inside the schema
Validation can be done automatically by any tool which “understands” the schema language.
DTD vs XML Schema
There are many ways of defining the structure of an XML
document
e.g. DTD, XML Schema, RELAX NG, Schematron, ...
DTD and XML Schema are the most used ones, but XML
Schema is having more success (W3C recommendation, 2001) because:
it is written with XML syntax
it supports datatypes
it supports namespaces
it supports inheritance and data type extension
Using a DTD
Writing a DTD is just as easy as writing another text file, but
how can we use a DTD?
How can we say a file should follow a schema?
How can we use this information to validate the file?
To match a document with a DTD, we should add the
following to the xml prolog:
<!DOCTYPE rootelement SYSTEM "dtdlocation">
To test it, we can use online validators or validating editors
Example:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE messages SYSTEM "./messages.dtd">
<messages>
  <message msgid="1">
    <from>
...
DTD Elements – 1
An element can be declared in the following way:
<!ELEMENT element-name category> or
<!ELEMENT element-name (element-content)>
Category = EMPTY
<!ELEMENT br EMPTY>
<br/>
Elements containing only a sequence of characters
<!ELEMENT element-name (#PCDATA)>
Elements containing any mixture of text and other elements
<!ELEMENT element-name ANY>
Elements containing one or more child elements
<!ELEMENT element-name (child1, child2, ...)>
The children must appear in the specified order!
DTD Elements – 2
Example:
<text-message>
  <from>+393357654321</from>
  <to>+393471234567</to>
  <text>Hi there!</text>
</text-message>
<!ELEMENT text-message (from, to, text)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT text (#PCDATA)>
DTD Elements – 3
Disjunction:
<!ELEMENT email_header (cc|bcc)>
<!ELEMENT cc (#PCDATA)>
<!ELEMENT bcc (#PCDATA)>
We can use disjunction to allow subelements in any
order:
<!ELEMENT email_header ((from,to)|(to,from))>
... What if we have 10 subelements?
Cardinality:
<!ELEMENT email_header (from,to+,cc*,subject)>
? zero times or once
* zero or more times
+ one or more times
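A DTD content model behaves much like a regular expression over the names of the child elements. The sketch below makes that analogy concrete for (from,to+,cc*,subject), using only Python's standard library. It is an illustration of the idea, not how a validating parser is implemented, and the function name matches_header_model is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# DTD content model (from,to+,cc*,subject) rewritten as a regular expression
# over the space-terminated names of the child elements.
HEADER_MODEL = re.compile(r"(from )(to )+(cc )*(subject )")

def matches_header_model(xml_text):
    """Return True if the children of the root follow (from,to+,cc*,subject)."""
    root = ET.fromstring(xml_text)
    child_names = "".join(child.tag + " " for child in root)
    return HEADER_MODEL.fullmatch(child_names) is not None

valid = "<email_header><from/><to/><to/><subject/></email_header>"
invalid = "<email_header><from/><cc/><subject/></email_header>"  # no <to> element
print(matches_header_model(valid), matches_header_model(invalid))
```

The ?, * and + operators keep their usual regular-expression meaning, which is why the translation is almost literal.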
DTD Attributes – 1
Attributes are defined in a DTD with an attribute list:
<!ATTLIST element-name attr-name attr-type value-type>
For each attribute you have to define:
The name of the element it is related to
Its name
Its type
Its value type
DTD Attributes – 2
Example:
<text-message from="+393357654321">
  <to>+393471234567</to>
  <text>Hi there!</text>
</text-message>
<!ELEMENT text-message (to, text)>
<!ATTLIST text-message
  from CDATA #REQUIRED>
<!ELEMENT to (#PCDATA)>
<!ELEMENT text (#PCDATA)>
DTD Attributes – 3
Attribute types:
CDATA, a string
ID, a name that is unique across the XML document
IDREF, a reference to the ID
attribute of another element
IDREFS, a sequence of IDREF
(v1|...|vn), an enumeration of all possible values
e.g. weekday (monday|tuesday|...|sunday)
Limitations
No dates
No numbers
No booleans
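What a validator checks for ID and IDREF attributes can be sketched by hand with Python's standard library. The attribute names msgid and replyto and the function id_errors are assumptions made for this illustration; in a real DTD the ATTLIST declarations would say which attributes have type ID or IDREF.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: msgid plays the role of an ID, replyto of an IDREF.
DOC = """
<messages>
  <message msgid="m1"/>
  <message msgid="m2" replyto="m1"/>
  <message msgid="m2" replyto="m9"/>
</messages>
"""

def id_errors(xml_text, id_attr="msgid", ref_attr="replyto"):
    """Report duplicate ID values and IDREF values that match no ID."""
    root = ET.fromstring(xml_text)
    seen, errors = set(), []
    # First pass: every ID value must be unique across the whole document.
    for elem in root.iter():
        value = elem.get(id_attr)
        if value is not None:
            if value in seen:
                errors.append(f"duplicate ID {value}")
            seen.add(value)
    # Second pass: every IDREF must point to an ID that exists somewhere.
    for elem in root.iter():
        ref = elem.get(ref_attr)
        if ref is not None and ref not in seen:
            errors.append(f"dangling IDREF {ref}")
    return errors

errors = id_errors(DOC)
print(errors)
```

Two passes are used because an IDREF may legally point to an element that appears later in the document.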
DTD Attributes – 4
Attribute value types:
#REQUIRED (attribute must appear in every occurrence of the element type in the XML document)
#IMPLIED (the appearance of the attribute is optional)
#FIXED "value" (the attribute always has this value; if it is written explicitly, it must match)
<!ATTLIST html xmlns CDATA #FIXED 'http://www.w3.org/1999/xhtml'>
"value" (specifies the default value for the attribute)
<!ATTLIST car color (red|white|blue) "red">
From DTD to XML Schema
Main differences between XML DTD and XML Schema:
XML Schema's syntax is based on XML itself (you can use the same tools for XML documents and schemas!)
It allows the reuse of existing schemas (inheritance) and their refinement (extension)
It supports more specific datatypes
It supports namespaces
Note:
XML Schema is also called XML Schema Definition (XSD)
Namespaces
Element names in XML files are chosen by the developers
What if two developers use the same name for different kinds of elements?
Example:
<table>
  <tr>
    <td>Apples</td>
    <td>Bananas</td>
  </tr>
</table>
<table>
  <name>African Coffee Table</name>
  <width>80</width>
  <length>120</length>
</table>
Namespaces definition
We need a way to specify that element names come from two
different contexts
we put a prefix before element names
we specify what namespace that prefix represents
<h:table xmlns:h = "http://www.w3.org/TR/html4/">
  <h:tr>
    <h:td>Apples</h:td>
    <h:td>Bananas</h:td>
  </h:tr>
</h:table>
<f:table xmlns:f = "http://my.name.space/furniture/">
  <f:name>African Coffee Table</f:name>
  <f:width>80</f:width>
  <f:length>120</f:length>
</f:table>
Root and default namespaces
You can also define all the namespaces you are going to use
in the root element of your XML document:
<root xmlns:h = "http://www.w3.org/TR/html4/"
      xmlns:f = "http://my.name.space/furniture/">
  <h:table>...</h:table>
  <f:table>...</f:table>
</root>
If the xmlns attribute is not followed by a prefix, then the
specified namespace is considered as the default one
<html xmlns="http://www.w3.org/1999/xhtml">
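To see how a namespace-aware parser tells the two table elements apart, here is a small sketch with Python's standard xml.etree.ElementTree. The document is a shortened variant of the example above, and the query prefixes html and furn are arbitrary choices made for this illustration.

```python
import xml.etree.ElementTree as ET

DOC = """
<root xmlns:h="http://www.w3.org/TR/html4/"
      xmlns:f="http://my.name.space/furniture/">
  <h:table><h:tr><h:td>Apples</h:td></h:tr></h:table>
  <f:table><f:name>African Coffee Table</f:name></f:table>
</root>
"""

root = ET.fromstring(DOC)

# ElementTree expands each prefix into "{namespace-uri}local-name".
tags = [child.tag for child in root]
for tag in tags:
    print(tag)

# Prefixes used in queries are local to this mapping; only the URIs matter.
ns = {"html": "http://www.w3.org/TR/html4/",
      "furn": "http://my.name.space/furniture/"}
html_tables = root.findall("html:table", ns)
furniture_name = root.find("furn:table/furn:name", ns).text
print(len(html_tables), furniture_name)
```

The prefixes h and f never reach the application: two documents that use different prefixes for the same URI describe the same elements.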
Documents using XML Schema
What does the start of an XML document that uses an XML Schema look like?
<?xml version="1.0" encoding="UTF-8"?>
<messages
  xmlns = "http://my.name.space"
  xmlns:xsi = "http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation = "http://my.name.space ./messages.xsd">
  <message msgid="1">
    <from> ...
  ...
</messages>
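The xsi:schemaLocation value is a whitespace-separated list of pairs: a namespace URI followed by a hint on where to find its schema. A minimal sketch of how a tool might read it, using Python's standard library (the function name schema_locations is hypothetical, and the body of the document is left empty for brevity):

```python
import xml.etree.ElementTree as ET

XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"

DOC = """
<messages
    xmlns="http://my.name.space"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://my.name.space ./messages.xsd">
</messages>
"""

def schema_locations(xml_text):
    """Map each namespace URI to the schema location hinted for it."""
    root = ET.fromstring(xml_text)
    value = root.get("{%s}schemaLocation" % XSI_NS, "")
    tokens = value.split()
    # Tokens alternate: namespace URI, schema location, namespace URI, ...
    return dict(zip(tokens[0::2], tokens[1::2]))

locations = schema_locations(DOC)
print(locations)
```

This only extracts the hint; loading messages.xsd and validating against it requires a schema-aware tool.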
XML Schema opening tag
An XML schema is an XML document whose root element is
called schema and is defined as follows:
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema
  xmlns:xs = "http://www.w3.org/2001/XMLSchema"
  targetNamespace = "http://my.name.space"
  xmlns = "http://my.name.space"
  elementFormDefault = "qualified">
The first three attributes declare, in order, the source, target, and default namespaces
Notes:
the xs:schema element is the root of every XML schema
qualified = "Associated with a namespace, either by the
use of a declared prefix or via a default namespace declaration"
The four constructs of XML Schema
XML Schema is built on four constructs:
A simple type definition defines a family of text strings (Unicode)
A complex type definition defines a collection of
requirements for attributes, sub-elements, and char data
An element declaration associates an element name with
either a simple type or a complex type
An attribute declaration associates an attribute name with a simple type (attributes always contain unstructured text)
XML Schema Elements and Types
To declare an element (equivalent to <!ELEMENT> in a DTD)
you have to use the “element” tag:
<element name="..." />
The most important (optional) attribute is type, as it defines
the element's content type:
<element name="..." type="..."/>
Cardinality and default values
To change cardinality, you can use the (optional) attributes
minOccurs and maxOccurs:
<element name="from" minOccurs="1" maxOccurs="1" />
<element name="to" minOccurs="1" maxOccurs="unbounded" />
<element name="cc" minOccurs="0" maxOccurs="unbounded" />
Note:
minOccurs="x", where x is an integer >=0
maxOccurs="x", where x is an integer >=0 (and not less than minOccurs) or "unbounded"
The default is "1" in both cases
Also, default or fixed values can be specified:
<element name="color" type="xs:string" default="red"/>
<element name="color" type="xs:string" fixed="green"/>
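The occurrence constraints on from, to and cc can be mimicked by a hand-written check, sketched here with Python's standard library. The table OCCURS, the function occurrence_errors and the root element name header are assumptions made for this illustration; the check looks at counts only, not at the order of the children.

```python
import xml.etree.ElementTree as ET

# (minOccurs, maxOccurs) per child name; None stands for "unbounded".
OCCURS = {"from": (1, 1), "to": (1, None), "cc": (0, None)}

def occurrence_errors(xml_text):
    """Return one message per child whose count violates OCCURS."""
    root = ET.fromstring(xml_text)
    errors = []
    for name, (min_occurs, max_occurs) in OCCURS.items():
        count = len(root.findall(name))
        if count < min_occurs:
            errors.append(f"{name}: {count} < minOccurs={min_occurs}")
        elif max_occurs is not None and count > max_occurs:
            errors.append(f"{name}: {count} > maxOccurs={max_occurs}")
    return errors

ok = "<header><from/><to/><to/></header>"
bad = "<header><from/><from/><cc/></header>"  # two <from>, no <to>
print(occurrence_errors(ok))
print(occurrence_errors(bad))
```

A schema-aware validator performs this kind of counting automatically from the minOccurs and maxOccurs attributes.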
XML Schema Attributes and Types
To declare an attribute use the “attribute” tag (very similar to
the “element” one):
<attribute name="..." />
Similarly to elements, attributes can have types, default, and
fixed values:
<attribute name="..." type="..."/>
<attribute name="color" type="xs:string" default="red"/>
Attributes are optional by default. You can use the use
attribute to make them required:
<attribute name="color" type="xs:string" use="required"/>
Note:
Attributes can be attached only to elements of complex type (see later)
XML Schema built-in data types
Simple derived types
Derived datatypes (such as integer) are built from the
original ones using
Restrictions
Lists
Unions
Complex types
Complex types are used to define elements which contain
attributes, text, other elements, or any combination of these
They are built using the following operators
Element references, such as <element ref="name">
Concatenation, using the sequence element
Union, using a choice element
The all element (like sequence but unordered)
The any construct
The group element (to allow references to item groups)
minOccurs and maxOccurs attributes to define cardinalities
The mixed (boolean) attribute to allow mixed content
An example
<xsd:complexType name="TeacherType">
  <xsd:sequence>
    <xsd:element name="firstname" type="xsd:string" minOccurs="0" maxOccurs="unbounded"/>
    <xsd:element name="lastname" type="xsd:string"/>
  </xsd:sequence>
  <xsd:attribute name="title" type="xsd:string" use="optional" />
</xsd:complexType>
<xsd:element name="teacher" type="TeacherType"/>
<teacher title="Ph.D.">
  <firstname>Davide</firstname>
  <lastname>Eynard</lastname>
</teacher>
Schema extension
Modularization is allowed by the following three constructs:
<include schemaLocation="URI"/>
<import namespace="NS" schemaLocation="URI"/>
<redefine schemaLocation="URI"> ... </redefine>
Inheritance and extensions
Restrictions
Limitations of XML Schema
XML Schema is much more powerful and expressive than
DTDs, however it still has some limitations:
Too difficult for non-experts (problem: non-experts need to read the schema to write valid XML documents!)
Element and attribute declarations are context insensitive
Although XML Schema is built with XML, there is still no
complete XML Schema description of XML Schema itself
Technical limitations
When describing mixed content, the character data cannot be constrained in any way
A schema cannot enforce a particular root element
Element defaults cannot contain markup, but only
character data
... and many others
References
Some Web references:
G. Antoniou and F. van Harmelen “A Semantic Web Primer”, The MIT Press 2004.
Chapter 2 slides: http://www.ics.forth.gr/isl/swprimer
M.C. Daconta, L.J. Obrst, and K.T. Smith. “The Semantic Web”, Wiley, 2003.
Chapter 3 online:
http://www.wiley.com/legacy/compbooks/daconta/sw/sample.html
W3 School website, in particular http://www.w3schools.com/dtd and
http://www.w3schools.com/Schema
Anders Møller and Michael I. Schwartzbach. “An Introduction to XML and Web
Technologies”, Addison-Wesley, 2006. Chapter 4 (Schema Languages) online: http://www.brics.dk/ixwt
Examples from Elizabeth Castro, “XML for the World Wide Web: Visual Quickstart
Guide”, Peachpit Press, 2000. http://www.cookwood.com/xml
Tools:
XML Validation Services: http://www.stg.brown.edu/service/xmlvalid,
http://validator.w3.org
XML Copy Editor, a free (as in freedom), multiplatform editor which supports
validation: http://xml-copy-editor.sourceforge.net
Validator, a free (as in freedom), multiplatform, drag and drop XML validator which
works on Windows, Linux, and Mac OS X:
http://homepage.mac.com/rcrews/software/validator/