04 XML Schemas. Software Technology 2. MSc in Communication Sciences Program in Technologies for Human Communication Davide Eynard

Academic year: 2021


(1)

Davide Eynard

04 – XML Schemas

(2)

XML: recap and evaluation

• During the last lesson we saw the basics of XML...
  - Tree structure
  - Elements and attributes
  - Content vs presentation
• ... and the basics of XML evaluation
  - Well-formedness (just syntax)
  - Validity with respect to a schema

(3)

Why do we need a schema?

• XML can be used to describe many different kinds of data and is totally unaware of what you are talking about
• You can check if the syntax is right...
• ... but you cannot constrain its usage in any way!
• Example:

    <person>
      <firstName>John</firstName>
      <lastName>Doe</lastName>
      <SSN>123-45-6789</SSN>
      <SSN>987-65-4321</SSN>
    </person>

• A person should have a single SSN, but a simple syntax check would not find any error in this code
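As a rough illustration of what a schema adds, the sketch below attaches a small DTD (in an internal subset) to the person example. The content model is an assumption made for this illustration: it allows exactly one SSN, so a validating parser would reject the document with two SSN elements even though it is well-formed.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical DTD for the person example: exactly one SSN is allowed -->
<!DOCTYPE person [
  <!ELEMENT person (firstName, lastName, SSN)>
  <!ELEMENT firstName (#PCDATA)>
  <!ELEMENT lastName (#PCDATA)>
  <!ELEMENT SSN (#PCDATA)>
]>
<person>
  <firstName>John</firstName>
  <lastName>Doe</lastName>
  <SSN>123-45-6789</SSN>
  <!-- A second <SSN> element here would still be well-formed,
       but the document would no longer be valid -->
</person>
```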

(4)

What does a schema do?

• A schema allows you to define all the elements and attributes that can be used inside an XML document
• Moreover, you can add constraints specifying:
  - Which children a particular element has
  - In which order they appear
  - How many children an element can have
  - Whether the element is empty or contains text
  - Datatypes for elements and attributes
  - Default values for elements and attributes
• Given this information, an XML document can be validated against a given schema
  - The document is valid if it is well-formed and it follows the structure given inside the schema
  - Validation can be done automatically by any tool which "understands" the schema language.

(5)

DTD vs XML Schema

• There are many ways of defining the structure of an XML document
  - e.g. DTD, XML Schema, RELAX NG, Schematron, ...
• DTD and XML Schema are the most used ones, but XML Schema (W3C recommendation, 2001) has been more successful because:
  - it is written with XML syntax
  - it supports datatypes
  - it supports namespaces
  - it supports inheritance and data type extension

(6)

Using a DTD

• Writing a DTD is just as easy as writing any other text file, but how can we use a DTD?
  - How can we say that a file should follow a schema?
  - How can we use this information to validate the file?
• To match a document with a DTD, we should add the following to the XML prolog:

    <!DOCTYPE rootelement SYSTEM "dtdlocation">

• To test it, we can use online validators or validating editors
• Example:

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE messages SYSTEM "./messages.dtd">
    <messages>
      <message msgid="1">
        <from>
        ...

(7)

DTD Elements – 1

• An element can be declared in the following way:

    <!ELEMENT element-name category>
  or
    <!ELEMENT element-name (element-content)>

• Category = EMPTY

    <!ELEMENT br EMPTY>
    <br/>

• Elements containing only a sequence of characters

    <!ELEMENT element-name (#PCDATA)>

• Elements containing any mixture of text and other elements

    <!ELEMENT element-name ANY>

• Elements containing one or more child elements

    <!ELEMENT element-name (child1, child2, ...)>

  The children must appear in the specified order!

(8)

DTD Elements – 2

• Example:

    <text-message>
      <from>+393357654321</from>
      <to>+393471234567</to>
      <text>Hi there!</text>
    </text-message>

    <!ELEMENT text-message (from, to, text)>
    <!ELEMENT from (#PCDATA)>
    <!ELEMENT to (#PCDATA)>
    <!ELEMENT text (#PCDATA)>

(9)

DTD Elements – 3

• Disjunction:

    <!ELEMENT email_header (cc|bcc)>
    <!ELEMENT cc (#PCDATA)>
    <!ELEMENT bcc (#PCDATA)>

• We can use disjunction to allow subelements in any order:

    <!ELEMENT email_header ((from,to)|(to,from))>

  - ... What if we have 10 subelements? (Every possible ordering would have to be listed.)
• Cardinality:

    <!ELEMENT email_header (from,to+,cc*,subject)>

  - ? zero times or once
  - * zero or more times
  - + one or more times
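The cardinality operators can be tried out in a small self-contained document. The sketch below embeds the DTD in an internal subset; the optional bcc child (added to show the ? operator) and the example.org addresses are assumptions made for illustration, not part of the original example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE email_header [
  <!-- one from, one or more to, zero or more cc, at most one bcc, one subject -->
  <!ELEMENT email_header (from, to+, cc*, bcc?, subject)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT cc (#PCDATA)>
  <!ELEMENT bcc (#PCDATA)>
  <!ELEMENT subject (#PCDATA)>
]>
<email_header>
  <from>alice@example.org</from>
  <to>bob@example.org</to>
  <to>carol@example.org</to>
  <subject>Meeting</subject>
</email_header>
```

This instance should validate: to appears twice (allowed by +), while cc and bcc are simply left out (allowed by * and ?). Removing subject, or moving it before from, would make the document invalid.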

(10)

DTD Attributes – 1

• Attributes are defined in a DTD with an attribute list:

    <!ATTLIST element-name attr-name attr-type value-type>

• For each attribute you have to define:
  - The name of the element it is related to
  - Its name
  - Its type
  - Its value type

(11)

DTD Attributes – 2

• Example:

    <text-message from="+393357654321">
      <to>+393471234567</to>
      <text>Hi there!</text>
    </text-message>

    <!ELEMENT text-message (to, text)>
    <!ATTLIST text-message from CDATA #REQUIRED>
    <!ELEMENT to (#PCDATA)>
    <!ELEMENT text (#PCDATA)>

(12)

DTD Attributes – 3

• Attribute types:
  - CDATA, a string
  - ID, a name that is unique across the XML document
  - IDREF, a reference to the ID of another element
  - IDREFS, a sequence of IDREFs
  - (v1|...|vn), an enumeration of all possible values
    e.g. weekday (monday|tuesday|...|sunday)
• Limitations
  - No dates
  - No numbers
  - No booleans

(13)

DTD Attributes – 4

• Attribute value types:
  - #REQUIRED (the attribute must appear in every occurrence of the element type in the XML document)
  - #IMPLIED (the appearance of the attribute is optional)
  - #FIXED "value" (the attribute always has this value; if it is written in the document, it must have exactly this value)

    <!ATTLIST html xmlns CDATA #FIXED 'http://www.w3.org/1999/xhtml'>

  - "value" (specifies the default value for the attribute)

    <!ATTLIST car color (red|white|blue) "red">
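The attribute types and value types can be combined in a single attribute list. The following is only a sketch: the attribute names replyTo, priority, and protocol are hypothetical, chosen to show one attribute per kind of declaration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE messages [
  <!ELEMENT messages (message+)>
  <!ELEMENT message (#PCDATA)>
  <!ATTLIST message
    msgid    ID                 #REQUIRED
    replyTo  IDREF              #IMPLIED
    priority (low|normal|high)  "normal"
    protocol CDATA              #FIXED "sms">
]>
<messages>
  <!-- ID values must be XML names, so "m1" is used rather than "1" -->
  <message msgid="m1">Hi there!</message>
  <!-- replyTo must match the ID of some element in the same document -->
  <message msgid="m2" replyTo="m1" priority="high">Hello!</message>
</messages>
```

For the first message a validating parser would supply priority="normal" and protocol="sms" automatically, since neither attribute is written explicitly.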

(14)

From DTD to XML Schema

• Main differences between XML DTD and XML Schema:
  - XML Schema's syntax is based on XML itself (you can use the same tools for XML documents and schemas!)
  - It allows the reuse of existing schemas (inheritance) and their refinement (extension)
  - It supports more specific datatypes
  - It supports namespaces
• Note:
  - XML Schema is also called XML Schema Definition (XSD)

(15)

Namespaces

• Elements in XML files can be defined by the developers
• What if two developers use the same name for different kinds of elements?
• Example:

    <table>
      <tr>
        <td>Apples</td>
        <td>Bananas</td>
      </tr>
    </table>

    <table>
      <name>African Coffee Table</name>
      <width>80</width>
      <length>120</length>
    </table>

(16)

Namespaces definition

• We need a way to specify that element names come from two different contexts
  - we put a prefix before element names
  - we specify what namespace that prefix represents

    <h:table xmlns:h = "http://www.w3.org/TR/html4/">
      <tr>
        <td>Apples</td>
        <td>Bananas</td>
      </tr>
    </h:table>

    <f:table xmlns:f = "http://my.name.space/furniture/">
      <name>African Coffee Table</name>
      <width>80</width>
      <length>120</length>
    </f:table>

(17)

Root and default namespaces

• You can also define all the namespaces you are going to use in the root element of your XML document:

    <root xmlns:h = "http://www.w3.org/TR/html4/"
          xmlns:f = "http://my.name.space/furniture/">
      <h:table>...</h:table>
      <f:table>...</f:table>
    </root>

• If the xmlns attribute is not followed by a prefix, then the specified namespace is considered as the default one

    <html xmlns="http://www.w3.org/1999/xhtml">
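A default namespace and a prefixed one can be mixed in the same document. The sketch below reuses the two table examples; the document is only meant to illustrate namespace scoping and is not a complete XHTML page.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:f="http://my.name.space/furniture/">
  <body>
    <!-- Unprefixed elements belong to the default (XHTML) namespace -->
    <table>
      <tr><td>Apples</td><td>Bananas</td></tr>
    </table>
    <!-- Prefixed elements belong to the furniture namespace -->
    <f:table>
      <f:name>African Coffee Table</f:name>
      <f:width>80</f:width>
      <f:length>120</f:length>
    </f:table>
  </body>
</html>
```

Note that a prefix applies only to the element that carries it: an unprefixed child of a prefixed element falls into the default namespace (or into no namespace, if none is declared), which is why the children of f:table are prefixed here.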

(18)

Documents using XML Schema

• What does the beginning of an XML document that uses an XML Schema look like?

    <?xml version="1.0" encoding="UTF-8"?>
    <messages
        xmlns = "http://my.name.space"
        xmlns:xsi = "http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation = "http://my.name.space ./messages.xsd">
      <message msgid="1">
        <from> ...
      ...
    </messages>

(19)

XML Schema opening tag

• An XML schema is an XML document whose root element is called schema and is defined as follows:

    <?xml version="1.0" encoding="UTF-8"?>
    <xs:schema
        xmlns:xs = "http://www.w3.org/2001/XMLSchema"
        targetNamespace = "http://my.name.space"
        xmlns = "http://my.name.space"
        elementFormDefault = "qualified">

  • Source, target, and default namespaces
• Notes:
  - the xs:schema element is the root of every XML schema
  - qualified = "Associated with a namespace, either by the use of a declared prefix or via a default namespace declaration".
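To see how the three namespace declarations work together, here is a minimal but complete schema. The note element and NoteType are hypothetical names chosen for this sketch; they are not the (elided) messages schema used in these slides.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://my.name.space"
           xmlns="http://my.name.space"
           elementFormDefault="qualified">
  <!-- xmlns:xs        : namespace of the schema vocabulary itself (source) -->
  <!-- targetNamespace : namespace that the declared names belong to -->
  <!-- xmlns           : default namespace, so that "NoteType" below refers
                         to the type declared in this very schema -->

  <xs:element name="note" type="NoteType"/>

  <xs:complexType name="NoteType">
    <xs:sequence>
      <xs:element name="from" type="xs:string"/>
      <xs:element name="text" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```

Because elementFormDefault is "qualified", the from and text children of an instance document must be in the http://my.name.space namespace too, for example through a default namespace declaration on the root note element.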

(20)

The four constructs of XML Schema

• XML Schema is built on four constructs:
  - A simple type definition defines a family of text strings (Unicode)
  - A complex type definition defines a collection of requirements for attributes, sub-elements, and character data
  - An element declaration associates an element name with either a simple type or a complex type
  - An attribute declaration associates an attribute name with a simple type (attributes always contain unstructured text)

(21)

XML Schema Elements and Types

• To declare an element (equivalent to <!ELEMENT> in a DTD) you have to use the "element" tag:

    <element name="..." />

• The most important (optional) attribute is type, as it defines the element's content type:

    <element name="..." type="..."/>

(22)

Cardinality and default values

• To change cardinality, you can use the (optional) attributes minOccurs and maxOccurs (attribute names are case-sensitive):

    <element name="from" minOccurs="1" maxOccurs="1" />
    <element name="to" minOccurs="1" maxOccurs="unbounded" />
    <element name="cc" minOccurs="0" maxOccurs="unbounded" />

• Note:
  - minOccurs="x", where x is an integer >= 0
  - maxOccurs="x", where x is an integer >= 0 (and not smaller than minOccurs) or "unbounded"
  - The default is "1" in both cases
• Also, default or fixed values can be specified:

    <element name="color" type="xs:string" default="red"/>
    <element name="color" type="xs:string" fixed="green"/>
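As a point of comparison with the DTD content model (from,to+,cc*,subject) seen earlier, the same cardinalities could be written in XML Schema roughly as follows. This is a sketch: the xs:string types are an assumption, since the DTD version only says #PCDATA.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="email_header">
    <xs:complexType>
      <xs:sequence>
        <!-- exactly once: minOccurs and maxOccurs both default to 1 -->
        <xs:element name="from" type="xs:string"/>
        <!-- one or more times, like to+ -->
        <xs:element name="to" type="xs:string" maxOccurs="unbounded"/>
        <!-- zero or more times, like cc* -->
        <xs:element name="cc" type="xs:string"
                    minOccurs="0" maxOccurs="unbounded"/>
        <!-- exactly once -->
        <xs:element name="subject" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

minOccurs and maxOccurs are only allowed on element declarations nested inside a content model such as the sequence above; a top-level element declaration (a direct child of xs:schema) cannot carry them.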

(23)

XML Schema Attributes and Types

• To declare an attribute use the "attribute" tag (very similar to the "element" one):

    <attribute name="..." />

• Similarly to elements, attributes can have types, default, and fixed values:

    <attribute name="..." type="..."/>
    <attribute name="color" type="xs:string" default="red"/>

• Attributes are optional by default. You can use the use attribute to make them required:

    <attribute name="color" type="xs:string" use="required"/>

• Note:
  - Attributes can be attached only to elements of complex type (see later)

(24)

XML Schema built-in data types

(25)

Simple derived types

• Derived datatypes (such as integer) are built from the original ones using
  - Restrictions
  - Lists
  - Unions
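The three derivation mechanisms can be sketched in one small schema. All type names below (AgeType, AgeListType, AgeOrUnknownType) are hypothetical and only serve as illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Restriction: integers limited to the range 0..120 -->
  <xs:simpleType name="AgeType">
    <xs:restriction base="xs:integer">
      <xs:minInclusive value="0"/>
      <xs:maxInclusive value="120"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- List: whitespace-separated ages, e.g. "12 35 67" -->
  <xs:simpleType name="AgeListType">
    <xs:list itemType="AgeType"/>
  </xs:simpleType>

  <!-- Union: either an age or the literal string "unknown" -->
  <xs:simpleType name="AgeOrUnknownType">
    <xs:union memberTypes="AgeType">
      <xs:simpleType>
        <xs:restriction base="xs:string">
          <xs:enumeration value="unknown"/>
        </xs:restriction>
      </xs:simpleType>
    </xs:union>
  </xs:simpleType>

  <xs:element name="age" type="AgeOrUnknownType"/>
</xs:schema>
```

With this schema, both <age>42</age> and <age>unknown</age> would be accepted, while <age>150</age> would be rejected.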

(26)

Complex types

• Complex types are used to define elements which contain attributes, text, other elements, or any combination of these
• They are built using the following operators
  - Element references, such as <element ref="name"/>
  - Concatenation, using the sequence element
  - Union, using a choice element
  - The all element (like sequence but unordered)
  - The any construct
  - The group element (to allow references to item groups)
  - minOccurs and maxOccurs attributes to define cardinalities
  - The mixed (boolean) attribute to allow mixed content

(27)

An example

    <xsd:complexType name="TeacherType">
      <xsd:sequence>
        <xsd:element name="firstname" type="xsd:string"
                     minOccurs="0" maxOccurs="unbounded"/>
        <xsd:element name="lastname" type="xsd:string"/>
      </xsd:sequence>
      <xsd:attribute name="title" type="xsd:string" use="optional" />
    </xsd:complexType>

    <xsd:element name="teacher" type="TeacherType"/>

    <teacher title="Ph.D.">
      <firstname>Davide</firstname>
      <lastname>Eynard</lastname>
    </teacher>
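The example above only uses sequence. As a complement, the following sketch shows element references with choice, the all element, and mixed content. The element names are borrowed from the earlier DTD examples where possible; address and b are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Global declarations that can be referenced with ref -->
  <xs:element name="cc" type="xs:string"/>
  <xs:element name="bcc" type="xs:string"/>

  <!-- choice: exactly one of cc or bcc, like (cc|bcc) in a DTD -->
  <xs:element name="email_header">
    <xs:complexType>
      <xs:choice>
        <xs:element ref="cc"/>
        <xs:element ref="bcc"/>
      </xs:choice>
    </xs:complexType>
  </xs:element>

  <!-- all: both children are required, in any order -->
  <xs:element name="address">
    <xs:complexType>
      <xs:all>
        <xs:element name="from" type="xs:string"/>
        <xs:element name="to" type="xs:string"/>
      </xs:all>
    </xs:complexType>
  </xs:element>

  <!-- mixed: character data may appear around the b children -->
  <xs:element name="text">
    <xs:complexType mixed="true">
      <xs:sequence>
        <xs:element name="b" type="xs:string"
                    minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

The all element avoids the problem met with DTDs, where accepting children in any order requires listing every permutation. The mixed declaration would accept content such as <text>Hi <b>there</b>!</text>.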

(28)

Schema extension

• Modularization is supported by the following three constructs:
  - <include schemaLocation="URI"/>
  - <import namespace="NS" schemaLocation="URI"/>
  - <redefine schemaLocation="URI"> ... </redefine>
• Inheritance and extensions
• Restrictions
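Extension and restriction of complex types can be sketched starting from the TeacherType shown earlier. The derived type names, the email element, and the rank attribute are hypothetical additions made for this illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- Base type, as in the earlier example -->
  <xsd:complexType name="TeacherType">
    <xsd:sequence>
      <xsd:element name="firstname" type="xsd:string"
                   minOccurs="0" maxOccurs="unbounded"/>
      <xsd:element name="lastname" type="xsd:string"/>
    </xsd:sequence>
    <xsd:attribute name="title" type="xsd:string" use="optional"/>
  </xsd:complexType>

  <!-- Extension: inherits everything and appends email and rank -->
  <xsd:complexType name="ExtendedTeacherType">
    <xsd:complexContent>
      <xsd:extension base="TeacherType">
        <xsd:sequence>
          <xsd:element name="email" type="xsd:string"/>
        </xsd:sequence>
        <xsd:attribute name="rank" type="xsd:string"/>
      </xsd:extension>
    </xsd:complexContent>
  </xsd:complexType>

  <!-- Restriction: the content model is repeated with tighter bounds,
       so that exactly one firstname is required -->
  <xsd:complexType name="StrictTeacherType">
    <xsd:complexContent>
      <xsd:restriction base="TeacherType">
        <xsd:sequence>
          <xsd:element name="firstname" type="xsd:string"
                       minOccurs="1" maxOccurs="1"/>
          <xsd:element name="lastname" type="xsd:string"/>
        </xsd:sequence>
        <xsd:attribute name="title" type="xsd:string" use="optional"/>
      </xsd:restriction>
    </xsd:complexContent>
  </xsd:complexType>
</xsd:schema>
```

In an extension only the additions are written, and they come after the inherited content. In a restriction the whole content model is restated, and every document valid for the restricted type must also be valid for the base type.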

(29)

Limitations of XML Schema

• XML Schema is much more powerful and expressive than DTDs, however it still has some limitations:
  - Too difficult for non-experts (problem: non-experts need to read the schema to write valid XML documents!)
  - Element and attribute declarations are context insensitive
  - Although XML Schema is built with XML, the language itself still cannot be completely described by an XML Schema
• Technical limitations
  - When describing mixed content, the character data cannot be constrained in any way
  - A schema cannot enforce a particular root element
  - Element defaults cannot contain markup, but only character data
  - ... and many others

(30)

References

• Some Web references:
  - G. Antoniou and F. van Harmelen, "A Semantic Web Primer", The MIT Press, 2004. Chapter 2 slides: http://www.ics.forth.gr/isl/swprimer
  - M.C. Daconta, L.J. Obrst, and K.T. Smith, "The Semantic Web", Wiley, 2003. Chapter 3 online: http://www.wiley.com/legacy/compbooks/daconta/sw/sample.html
  - W3Schools website, in particular http://www.w3schools.com/dtd and http://www.w3schools.com/Schema
  - Anders Møller and Michael I. Schwartzbach, "An Introduction to XML and Web Technologies", Addison-Wesley, 2006. Chapter 4 (Schema Languages) online: http://www.brics.dk/ixwt
  - Examples from Elizabeth Castro, "XML for the World Wide Web: Visual Quickstart Guide", Peachpit Press, 2000. http://www.cookwood.com/xml
• Tools:
  - XML Validation Services: http://www.stg.brown.edu/service/xmlvalid, http://validator.w3.org
  - XML Copy Editor, a free (as in freedom), multiplatform editor which supports validation: http://xml-copy-editor.sourceforge.net
  - Validator, a free (as in freedom), multiplatform, drag and drop XML validator which works on Windows, Linux, and Mac OS X: http://homepage.mac.com/rcrews/software/validator/
