Module 2:
XML Schema Definition
a.Univ.-Prof. Dr. Werner Retschitzegger
Lecture: IFS in Bioinformatics, SS 2011 (summer semester)
Johannes Kepler University Linz
www.jku.ac.at
Institute of Bioinformatics www.bioinf.jku.at
IFS
Information Systems Group www.ifs.uni-linz.ac.at
© 2011 JKU Linz, Institut für Bioinformatik, Arbeitsgruppe Informationssysteme (IFS)
Outline
Introduction
• Motivation for XML
• Document Markup Languages
• Application Areas for XML
XML 1.0
Namespaces
XML Schema
The following slides are based (among others) on:
Elliotte Rusty Harold, W. Scott Means, XML in a Nutshell: A Desktop Quick Reference, 3rd Edition, O'Reilly & Associates, 2005
Motivation for XML
1/5
From HTML to XML
"If I invent another programming language, its name will contain the letter X."
(N. Wirth, Software Pioniere Konferenz, Bonn 2001)
Google Indicator (hit counts as of Sep/16/08):
• SQL: 223 million
• ABC: 252 million
• "Werner Retschitzegger": 20.6 thousand
• Soccer: 237 million
• XML: 603 million
• Love: 2.2 billion
Motivation for XML
2/5
From HTML to XML
Brian Kernighan: "The problem with HTML-WYSIWYG is that what you see is all you've got"
HTML (HyperText Markup Language) is the "lingua franca" for representing hypertext documents on the Web
Proposed in 1989 by Tim Berners-Lee (CERN); later standardized by the W3C (World Wide Web Consortium)
Basic concept: "Markup" in terms of "Tags"
Drawbacks
• Restricted number of pre-defined tags
  ◦ Continual extensions with proprietary tags
• Tags primarily describe layout aspects
Motivation for XML
3/5
From HTML to XML
HTML describes layout of content:
<h1>PDACatalog</h1>
<h2>Nokia 8210</h2>
<table border="1">
  <tr> <td>Battery</td><td>900mAh</td> </tr>
  <tr> <td>Weight</td><td>141g</td> </tr>
  …
</table>
XML describes structure and semantics of content:
<PDACatalog>
  <Producer name="Nokia">
    <PDA name="8210">
      <Battery>900mAh</Battery>
      <Weight>141g</Weight>
      …
    </PDA>
  </Producer>
</PDACatalog>
Tim Bray, Co-Editor of XML 1.0:
"XML will become the ASCII of the 21st century - basic, essential, unexciting"
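Because the XML version names its parts, a program can address a value by its place in the structure rather than by its layout. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; the helper name `battery_of` is an assumption made for illustration, and the catalog is the slide's example without the elided parts.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# The PDACatalog example from the slide, without the elided ("…") parts.
CATALOG = """
<PDACatalog>
  <Producer name="Nokia">
    <PDA name="8210">
      <Battery>900mAh</Battery>
      <Weight>141g</Weight>
    </PDA>
  </Producer>
</PDACatalog>
"""

def battery_of(catalog_xml: str, producer: str, pda: str) -> Optional[str]:
    """Hypothetical helper: look up a Battery value by structure, not by layout."""
    root = ET.fromstring(catalog_xml)
    path = f"Producer[@name='{producer}']/PDA[@name='{pda}']/Battery"
    return root.findtext(path)  # None if no such element exists

print(battery_of(CATALOG, "Nokia", "8210"))  # 900mAh
```

The same lookup against the HTML table would have to rely on the position of table cells, which changes whenever the layout changes.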
Motivation for XML
4/5
Features of XML
Layout Independence
• Separation of structure and semantics of the content from its layout
Platform and Vendor Independence
• Endorsed by the W3C
Internationality
• Based on the Unicode standard
Extensibility
• Tags can be defined and named arbitrarily (meta language)
Structurability
• Tags can be nested arbitrarily
Semi-structured
• Content can contain fully structured parts and fully unstructured parts
Self-describing
• Tags describing structure and semantics of the content are
  ◦ ... for humans: relatively easy to read and edit
  ◦ ... for machines: easy to generate and parse
X-Technology Infrastructure
• W3C provides a set of XML-based standards ("XML Standards Family")
Correctness Proof
Motivation for XML
5/5
Properties of XML Documents and XML Processors
Well-formedness
• Syntactical properties, e.g.:
  ◦ At least 1 tag per document
  ◦ Exactly 1 root tag
  ◦ Tags must not overlap
  ◦ Each tag has to have an end tag
  ◦ ....
Validity
• XML document is well-formed and corresponds to a schema
• Schema defines vocabulary and grammar
• Alternatives: DTD or the XML Schema standard
XML processors parse XML documents and check
• either solely well-formedness (non-validating processors)
• or also validity (validating processors)
Can be called from within an application (e.g., a browser)
Decompose an XML document into its parts, forming a tree through which an application can access those parts
[Figure: an XML processor (parser and entity manager) reads the XML document PDACatalog1.XML and Catalog.DTD and passes document parts and errors to the application]
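The well-formedness rules above can be tried out with a non-validating processor. A minimal sketch, assuming Python's standard `xml.etree.ElementTree` (which checks well-formedness only, not validity); the function name `is_well_formed` is an assumption for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if a non-validating parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

samples = {
    "nested tags": "<a><b/></a>",
    "overlapping tags": "<a><b></a></b>",
    "two root tags": "<a/><b/>",
    "missing end tag": "<a>",
}
for label, text in samples.items():
    print(f"{label}: {is_well_formed(text)}")
```

Only the first sample is accepted; the other three each violate one of the rules listed on the slide.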
Document Markup Languages
1/4
History
Who                                            Year          What
Vannevar Bush                                  1945          Memex
Douglas Engelbart                              1962          Augment
Ted Nelson                                     1965          Xanadu
William Tunnicliffe (GCA)                      1967          GenCode
Goldfarb, Mosher, Lorie (IBM)                  1969          GML (Generalized Markup Language)
ANSI, Charles Goldfarb                         1978          Standardization (GenCode & GML)
ISO                                            1986          SGML (Standard Generalized Markup Language, ISO 8879)
Tim Berners-Lee (CERN)                         1989          HTML (Hypertext Markup Language)
Marc Andreessen (NCSA)                         1993          HTML forms (XMosaic)
Netscape, Microsoft                            1994          HTML derivations
Jon Bosak, Tim Bray, James Clark et al. (W3C)  1996          XML Working Group
                                               10 Feb. 1998  XML 1.0
Document Markup Languages
2/4
Memex
http://www.ps.uni-sb.de/~duchier/pub/vbush/vbush-all.shtml
Document Markup Languages
3/4
[Figure: metamodel levels, after www.omg.org]
• M2, Meta Level: SGML, XML
• M1, Language Level (e.g. DTDs): HTML, XHTML, MathML, WML
• M0, Instance Level (documents): e.g. e^{iπ} + 1 = 0 and f(n) = Σ_{k=1}^{n} k
Document Markup Languages
4/4
XML versus ...
... SGML
• XML vs. SGML (60 pages vs. 600 pages)
• XML has 20% of SGML's complexity, but 80% of its functionality
• XML documents conform to an ISO revision of SGML: WebSGML (Annex to the SGML standard ISO 8879)
... HTML
• XML is complementary to HTML (semantics and structure vs. layout)
• XML is not backward compatible with HTML
• Simple conversion from HTML documents to XML
... XHTML
• = Extensible HTML
• W3C Recommendation Aug. 2002 (2nd edition)
• HTML 4.01 as an "XML application", i.e. HTML was described by means of an XML DTD
Application Areas of XML
1/4
Three Main Application Areas
Data Exchange ("Portable Data")
• Using XML solely as an exchange format, or
• Using also a common schema
Multi-Delivery
• One and the same content can be delivered to different end user devices
Intelligent Retrieval
• Instead of a simple keyword search on the basis of HTML documents, structure-based search on the basis of XML documents
• Example: "Mozart", composer or chocolate ball?
Application Areas of XML
2/4
Industrial Sectors: "Verticalisation of XML"
XML-DTDs for ... [http://www.oasis-open.org/cover/xml.html#applications]
• Literature: "Gutenberg"
• Travel: "openTravel"
• News: "NewsML"
• Marketing: "adXML"
• Weather: "OMF"
• Human Resources: "XML-HR"
• Voice Applications: "VoxML"
• Vector Graphics: "SVG"
• Mobile Applications: "WML"
• Geo Applications: "ANZMETA"
• Health Care: "HL7"
• Mathematics: "MathML"
• Banking: "MBA"
• eGovernment: "eGovML"
• Electronic Commerce
  ◦ CBL: Common Business Library (Commerce One)
  ◦ BizTalk: Microsoft
  ◦ cXML: Commerce XML
  ◦ RosettaNet: Format for Online-Orders
  ◦ ebXML: OASIS + XML/EDI
  ◦ FnXML: Financial Products Markup Language
  ◦ ...
Application Areas of XML
3/4
Sources of XML Data
Inter-application and mobile device communication data
• e.g., Web Services
Logs and Blogs
• e.g., RSS
Metadata
• e.g., Schema, WSDL, XMP
Presentation data
• e.g., XHTML
Documents
• e.g., Word
Views of other sources of data
• e.g., Relational, LDAP, CSV, Excel, etc.
Application Areas of XML
4/4
XML Standardization Family (excerpt)
XML
• XML language concepts incl. DTD
XML Namespaces
• Support of a global identification scheme for element names and attribute names
XPath (XML Path Language)
• Path expressions for navigation in XML documents
XML Schema
• XML-based language for the definition of XML schemata
XLink, XPointer
• XML-based languages for the linking of (parts of) XML documents
XSL (Extensible Stylesheet Language)
• XSLT: Transformation of XML documents (declarative)
• XSL-FO: Rendering of XML documents (declarative)
DOM (Document Object Model)
• API for accessing XML documents in a procedural manner
W3C Standardization Levels: (1) Note (2) Working Draft (WD) (3) Candidate Recommendation (CR) (4) Proposed Recommendation (PR) (5) Recommendation (REC)
"It takes ten minutes to understand (base) XML, but then ten months to understand the new technologies hung around it." (Peter Chen)
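Two members of this family, path expressions and the DOM API, can be contrasted in a few lines. A minimal sketch using only Python's standard library: `xml.etree.ElementTree` supports a limited subset of XPath, and `xml.dom.minidom` offers a small DOM implementation. The weight given for the 8210 is a made-up value for illustration.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

# Small catalog; the weight of the 8210 is invented for this example.
CATALOG = """<PDACatalog>
  <Producer name="NOKIA">
    <PDA name="7110"><Weight>141g</Weight></PDA>
    <PDA name="8210"><Weight>79g</Weight></PDA>
  </Producer>
</PDACatalog>"""

# Declarative: a path expression selects the node of interest.
root = ET.fromstring(CATALOG)
weight_8210 = root.findtext(".//PDA[@name='8210']/Weight")

# Procedural: DOM calls walk the tree step by step.
dom = minidom.parseString(CATALOG)
pda_names = [pda.getAttribute("name") for pda in dom.getElementsByTagName("PDA")]

print(weight_8210)  # 79g
print(pda_names)    # ['7110', '8210']
```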
© 2010 JKU Linz, Institut für Bioinformatik, Arbeitsgruppe Informationssysteme (IFS)
Outline
Introduction
XML 1.0
• XML Document
• DTD
• Entities
Namespaces
XML Schema
XML Document
1/3
Running Example: PDACatalog (PDACatalog1.XML)
<?xml version="1.0" encoding="UTF-8"?>
<PDACatalog>
  <!-- NOKIA -->
  <Producer name="NOKIA">
    <ProducerNo no="h1234"/>
    <PDA name="7110">
      <Weight>141g</Weight>
      <Price contract="yes">999</Price>
      <Price contract="no">4999</Price>
    </PDA>
    <PDA name="8210">
      ...
    </PDA>
  </Producer>
</PDACatalog>
Labels in the figure:
• Prologue (optional), "xml declaration"
• "Root Element" or "Document Element"
• Comment
• Start Tag, End Tag
• Element name, Attribute, Attribute Value
• "Empty Element"
• Subelement
• Text, "Character Data"
• "Element Content" of <Producer>
• "Mixed Content"
XML Document
2/3
Elements and Attributes
Element and attribute names have to be valid "XML Names"
• [ letter | _ | : ] [ letter | '0..9' | '.' | '-' | '_' | ':' ]*
• "letter": A-Z, a-z, and others like ä, ê, ς
• ':' reserved for namespaces
• No length restriction
• Case-sensitive
Empty elements can be represented in long form or short form
• <ProducerNo no="h1234"></ProducerNo> or
• <ProducerNo no="h1234"/>
Attribute values must be enclosed in quotation marks
• <PDA name='8210'> or
• <PDA name="8210">
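The name rule and the two forms of an empty element can be checked mechanically. The sketch below is an approximation, not the exact XML 1.0 character classes: it assumes that Python's Unicode-aware `\w` is close enough to "letter or digit" for illustration. The name `is_xml_name` is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# First character: letter, '_' or ':'; further characters may also be
# digits, '.' and '-'. [^\W\d] matches a Unicode letter or '_'.
XML_NAME = re.compile(r"(?:[^\W\d]|:)[\w.\-:]*")

def is_xml_name(candidate: str) -> bool:
    """Approximate check of the slide's XML name rule."""
    return XML_NAME.fullmatch(candidate) is not None

for name in ["PDA", "edi:price", "_x", "währung", "8210", "-x"]:
    print(name, is_xml_name(name))

# Long and short form of an empty element parse to the same element.
long_form = ET.fromstring('<ProducerNo no="h1234"></ProducerNo>')
short_form = ET.fromstring('<ProducerNo no="h1234"/>')
same_element = ET.tostring(long_form) == ET.tostring(short_form)
print(same_element)
```

Names that start with a digit or a hyphen, such as `8210` and `-x`, are rejected; the other four are accepted.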
XML Document
3/3
Comments
Can stretch across multiple lines
• Between start tag and end tag of an element
• Before or after the root element
Restrictions
• Comment within a tag not allowed
• Nesting of comments not allowed
• "--" within a comment not allowed
<!-- A comment may also contain <tagNames> or &entities;
--> ...
DTD
1/8
Purpose and Characteristics
A DTD defines vocabulary and grammar for a set of XML documents
An XML document is allowed to reference a single DTD only ("document type declaration", DOCTYPE)
A DTD has to be referenced
• AFTER the XML declaration (prologue)
• but BEFORE the root element
A DTD does NOT DEFINE the root element of an XML document
• The root element is rather defined within the XML document itself using the DOCTYPE declaration
• Can be an arbitrary element of the DTD
PDACatalog1.XML (usage; the DOCTYPE names the root element, Catalog.DTD holds the definitions):
<?xml version="1.0"?>
<!DOCTYPE PDACatalog ...
<PDACatalog>
...
DTD
2/8
Incorporating DTDs into XML Documents: 3 Alternatives
1. External DTD, i.e., a dedicated file (*.dtd) identified by means of a URI ("external subset")
<!DOCTYPE PDACatalog SYSTEM "Catalog.dtd">
2. Internal DTD, i.e., defined within the XML document ("internal subset")
<!DOCTYPE PDACatalog […]>
3. External & internal DTD, i.e., internal complements external
Excursus: URL vs. URI
• A URL (Uniform Resource Locator) identifies Internet resources on the basis of their location using the Domain Name Service (DNS)
• A URI (Uniform Resource Identifier) identifies arbitrary resources on the basis of their names (e.g. ISBN) or other properties of the resource
• Each URL is a valid URI
DTD
3/8
Example: Catalog.dtd
<!-- Catalog DTD Version 1.0 -->
<!ELEMENT PDACatalog (Producer*)>
<!ELEMENT Producer (ProducerNo, PDA+)>
<!ATTLIST Producer name CDATA #REQUIRED>
<!ELEMENT ProducerNo EMPTY>
<!ATTLIST ProducerNo no ID #REQUIRED>
<!ELEMENT PDA (Weight, Price+)>
<!ATTLIST PDA name CDATA #REQUIRED>
<!ELEMENT Weight (#PCDATA)>
<!ELEMENT Price (#PCDATA)>
<!ATTLIST Price contract (yes|no) "no">
[Figure: the same structure as a UML class diagram. PDACatalog contains Producer (*); Producer (attribute name) contains ProducerNo (1, attribute no) and PDA (1..*); PDA (attribute name) contains Weight (1) and Price (1..*, attribute contract)]
Legend (UML class diagram vs. XML DTD; symbols mark XML elements, XML attributes and part-of relationships):
• 1: exactly once
• 1..*: once or several times
• *: zero or several times
DTD
4/8
Element Declaration
<!ELEMENT elementName (contentModel)>
Sequence
<!ELEMENT Producer (ProducerNo, PDA+)>
Alternative
<!ELEMENT Battery (LiIo | NiMh | NiCd)>
Cardinality
• Optional (zero or once)
<!ELEMENT PDA (Comment?)>
• Zero or several times
<!ELEMENT PDACatalog (Producer*)>
• Once or several times
<!ELEMENT Producer (PDA+)>
• Content models can be nested by means of parentheses
<!ELEMENT div1 (head, (p | list | note)*, div2*)>
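Content models read like regular expressions over child element names, and that analogy can be made literal. The following sketch is not a DTD validator; it assumes element-only content models (no `#PCDATA`, `EMPTY` or `ANY`) and translates them into Python regular expressions. The function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

def content_model_to_regex(model: str):
    """Translate an element-only content model into a compiled regex.

    Each element name becomes a group matching "name;". The sequence
    operator ',' becomes plain concatenation, while '|', '?', '*', '+'
    and parentheses keep their regular-expression meaning.
    """
    compact = re.sub(r"\s+", "", model)
    with_names = re.sub(
        r"[A-Za-z_][\w.\-]*",
        lambda m: "(?:" + re.escape(m.group(0)) + ";)",
        compact,
    )
    return re.compile(with_names.replace(",", ""))

def children_match(element, model: str) -> bool:
    """Check the sequence of child element names against a content model."""
    child_names = "".join(child.tag + ";" for child in element)
    return content_model_to_regex(model).fullmatch(child_names) is not None

producer = ET.fromstring("<Producer><ProducerNo/><PDA/><PDA/></Producer>")
print(children_match(producer, "(ProducerNo, PDA+)"))  # True

wrong_order = ET.fromstring("<Producer><PDA/><ProducerNo/></Producer>")
print(children_match(wrong_order, "(ProducerNo, PDA+)"))  # False
```

The `;` terminator keeps names that share a prefix (for example `p` and `pp`) from being confused with each other.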
DTD
5/8
Element Declaration
Empty Element
z Element may contain attributes, but neither text nor subelements
<!ELEMENT ProducerNo EMPTY>
Element Content
z Element contains subelements and optional attributes but no text
<!ELEMENT PDACatalog (Producer*)>
Mixed Content
z Element contains text and optional subelements or attributes
<!ELEMENT Price (#PCDATA)>
<!ELEMENT Price (#PCDATA | Category | Discount)*>
Element with arbitrary content (keyword ANY)
• Content not exactly specified in DTD
• Used elements have to be declared anyway
DTD
6/8
Attribute Declaration
<!ATTLIST elementName
    attributeName1 type default
    attributeName2 type default
    ...
>
Attribute names must be unique within an element
Default specifications
• Mandatory value ("NOT NULL"): #REQUIRED
• Optional value: #IMPLIED
• Default value: [#FIXED] "value"
DTD
7/8
Attribute Declaration: 10 Types
CDATA
• String
• <!ATTLIST Producer name CDATA #REQUIRED>
ID, IDREF(S)
• ID ensures uniqueness of attribute values within a document
• Only 1 attribute of type ID is allowed per element
• IDREF is a reference to an attribute of type ID
• "Referential integrity" (untyped!) is checked by the XML processor
• Values of ID and IDREF(S) attributes must be valid XML names, i.e., they must not start with a digit
<!ATTLIST Example
    identity ID #IMPLIED
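A validating processor performs the ID/IDREF checks itself. Python's standard parser is non-validating, so the sketch below re-implements the two checks by hand to show what "uniqueness" and "untyped referential integrity" mean. The attribute names `id` and `ref` and the function `check_ids` are assumptions for illustration; in a real document the DTD decides which attributes have type ID or IDREF.

```python
import xml.etree.ElementTree as ET

def check_ids(xml_text: str, id_attr: str = "id", idref_attr: str = "ref") -> list:
    """Report duplicate ID values and IDREF values without a matching ID."""
    root = ET.fromstring(xml_text)
    seen, problems = set(), []
    for element in root.iter():
        value = element.get(id_attr)
        if value is not None:
            if value in seen:
                problems.append(f"duplicate ID {value!r}")
            seen.add(value)
    for element in root.iter():
        target = element.get(idref_attr)
        # Untyped: any ID in the document satisfies the reference,
        # regardless of which element carries it.
        if target is not None and target not in seen:
            problems.append(f"dangling IDREF {target!r}")
    return problems

ok = '<c><Producer id="h1"/><PDA ref="h1"/></c>'
duplicate = '<c><Producer id="h1"/><Producer id="h1"/></c>'
dangling = '<c><Producer id="h1"/><PDA ref="h9"/></c>'
for doc in (ok, duplicate, dangling):
    print(check_ids(doc))
```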
DTD
8/8
Attribute Declaration – 10 Types
Enumeration Type
• A pre-defined set of values consisting of XML name tokens
• <!ATTLIST Price contract (yes|no) "no">
ENTITY, ENTITIES
• Attribute value is the name of a declared non-parsed entity
• <!ATTLIST Image filename ENTITY #REQUIRED>
NMTOKEN(S)
• "XML name tokens" are an extended form of XML names
• In addition, they can start with "0..9", "." and "-"
• <!ATTLIST journal year NMTOKEN #REQUIRED>
NOTATION
• Attribute value is the name of a declared notation (seldom used)
<!ATTLIST image type NOTATION (gif | tiff) #REQUIRED>
Entities
1/9
Overview
Referenceable, named parts of
• XML documents (plain text, markup or other arbitrary formats)
• or a DTD
Purpose: character replacement, macros, modularisation
Classification:
• General Entities: usage in XML documents
  ◦ Pre-defined: replacement of XML-specific characters
  ◦ Unicode: replacement of non-ASCII characters
  ◦ User-defined: replacement of document parts
    - Internal (embedded)
    - External (file): parsed or non-parsed
• Parameter Entities: usage in DTDs
  ◦ Internal
  ◦ External
Entities
2/9
Pre-defined Entities
Purpose: Representation of XML-specific characters
• e.g. < and > ("escaping")
5 pre-defined entities
• &amp; & (ampersand)
• &lt; < (less than)
• &gt; > (greater than)
• &apos; ' (apostrophe)
• &quot; " (quotation mark)
Example
• <formular>x &lt; y</formular>
Usage
• As element value or attribute value
Alternative: CDATA section
• Example: <formular>x <![CDATA[<]]> y</formular>
• Interpreted as plain text, NOT as markup
• Within a CDATA section, only its end (']]>') is recognized
• CDATA sections cannot be nested
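Both escaping mechanisms yield the same character data once parsed. A minimal sketch with Python's standard library: `xml.sax.saxutils.escape` produces the pre-defined entities for `&`, `<` and `>`, and `xml.etree.ElementTree` resolves entities and CDATA sections alike.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Writing: replace XML-specific characters by pre-defined entities.
escaped = escape("x < y & y > z")
print(escaped)  # x &lt; y &amp; y &gt; z

# Reading: the entity form and the CDATA form carry the same text.
via_entity = ET.fromstring("<formular>x &lt; y</formular>").text
via_cdata = ET.fromstring("<formular>x <![CDATA[<]]> y</formular>").text
print(via_entity == via_cdata)  # True
```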
Entities
3/9
Unicode ("Character Encoding") Entities
Purpose
• Representation of characters not available on the keyboard
Unicode (http://www.unicode.org/)
• Classifies characters into letters, numbers, punctuation, symbols (general, technical, mathematical), etc.
• Unique assignment of characters to numbers
• Supports 25 living languages (Cyrillic, Hebrew, Hiragana, ...)
• All in all approx. 50,000 different characters
Usage
• As element value or attribute value
• Arbitrary Unicode characters are referenced via their numbers (decimal or hexadecimal)
• &#251;, &#xFB; and û all represent the same character
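That a decimal reference, a hexadecimal reference and the literal character are interchangeable can be confirmed with a parser. A small sketch using Python's standard `xml.etree.ElementTree`; the element name `c` is arbitrary.

```python
import xml.etree.ElementTree as ET

# Decimal reference, hexadecimal reference and the literal character.
element = ET.fromstring("<c>&#251;&#xFB;\u00fb</c>")
text = element.text
print(text == "\u00fb" * 3)   # True: three times the same character
print(ord(text[0]), hex(ord(text[0])))  # 251 0xfb

# When serialising to ASCII, non-ASCII characters are written back
# as numeric character references.
ascii_bytes = ET.tostring(element, encoding="us-ascii")
print(ascii_bytes)
```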
Entities
4/9
User-Defined Internal Entities
Text or well-formed markup is associated with a name
Declaration within the DTD:
<!ENTITY entityName "replacementText or Markup">
Usage
&entityName;
• As element value or attribute value of the XML document
• In entities themselves, but cyclic references are forbidden
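Internal entities behave like macros that the processor expands while parsing. The sketch below assumes Python's standard `xml.etree.ElementTree`, whose underlying expat parser expands entities declared in the internal subset; the entity names `vendor` and `model` are made up for this example.

```python
import xml.etree.ElementTree as ET

DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE PDA [
  <!ENTITY vendor "Nokia">
  <!ENTITY model "&vendor; 8210">
]>
<PDA label="&vendor;">&model;</PDA>"""

# Entities that reference each other in a cycle are not well-formed.
CYCLIC = """<?xml version="1.0"?>
<!DOCTYPE PDA [
  <!ENTITY a "&b;">
  <!ENTITY b "&a;">
]>
<PDA>&a;</PDA>"""

def parses(text: str) -> bool:
    """Return True if the document can be parsed and expanded."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

root = ET.fromstring(DOCUMENT)
print(root.text)          # Nokia 8210 (entity used inside another entity)
print(root.get("label"))  # Nokia (entity used as attribute value)
print(parses(CYCLIC))     # False
```

Entity expansion of untrusted documents can be abused to exhaust memory, which is one reason some applications disable DTD processing altogether.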
Entities
5/9
User-Defined External Parsed Entities
Purpose
• Decomposition of the XML document (similar to the SSI, Server Side Include, mechanism)
• Because of the document's size or for reuse
Declaration within the DTD
<!ENTITY entityName SYSTEM "URI">
Characteristics
• In principle well-formed, but may contain multiple root elements
• Reference to a DTD not allowed
Usage
• Syntax analogous to internal entities
• As element values of the XML document and within entities themselves
• Cyclic references forbidden
• NOT within attribute values
Entities
6/9
User-Defined External Non-Parsed Entities
<!ENTITY entityName SYSTEM "URI" NDATA formatName>
<!NOTATION formatName SYSTEM "URI">
Purpose
• References to files with arbitrary formats, e.g. ASCII, not well-formed XML, GIF, JPEG, QuickTime movies
• NDATA defines a "non-parsed" entity and specifies an arbitrary file format
• A NOTATION declaration is necessary to identify a corresponding application (via a URI) that is able to process files of this format
Usage
• Only as attribute value of type ENTITY
• Syntax: entity name within quotation marks (Note: NO &...;)
• The processor only informs the application that a non-parsed entity exists at a certain location; no expansion!
(More expressive) Alternative: W3C's XLink standard
Entities
7/9
User-Defined Entities: Example
Declaration
<?xml version="1.0"?>
<!DOCTYPE PDACatalog SYSTEM "Catalog.dtd" [
<!ENTITY linkNokia "http://www.nokia.de/8210">
<!ENTITY address "<town>Linz</town>">
<!ENTITY features SYSTEM "feat8210.XML">
<!ENTITY bildNokia SYSTEM "/pictures/8210.jpg" NDATA jpeg>
<!NOTATION jpeg SYSTEM "image/jpeg">
…
<!ATTLIST Image filename ENTITY #REQUIRED>
]>
Usage
…
<PDA name="8210">
<Picture><Image filename="bildNokia"/></Picture>
<ProducerInfo>&linkNokia;</ProducerInfo>
…
&features;
&address;
</PDA>
…
Legend:
• linkNokia, address: internal entities
• features: external, parsed entity
• bildNokia: external, non-parsed entity
• &linkNokia;, &features;, &address;: usage as element value
• filename="bildNokia": usage as attribute value
Entities
8/9
Parameter Entities
Internal
<!ENTITY % Battery "(type, capacity)" >
<!ELEMENT PDABatt %Battery;>
<!ELEMENT camcorderBatt %Battery;>
External
<!ENTITY % linkNokia SYSTEM "http://nokia.de" >
%linkNokia;
Purpose
• Modularization of DTDs
Syntactical difference to General Entities
• Declaration: "%" followed by a blank
• Usage: "%" without a blank
Definition of ...
• Name and content model of elements
• Attribute declarations
Entities
9/9
Parameter Entities: Overriding
External DTD
<!ENTITY % residental_content "address,rooms">
Internal DTD of an XML document
<!ENTITY % residental_content "address,rooms,baths">
A Parameter Entity defined within an external DTD can be overridden arbitrarily within the internal DTD of an XML document
This makes it possible to adapt the external DTD to the requirements of individual XML documents without having to change the external DTD
Thus, the Parameter Entity is used as a kind of
Outline
Introduction
XML 1.0
Namespaces
XML Schema
Namespaces
1/5
An XML namespace (NS) allows a unique global identification of elements and attributes
• W3C REC "Namespaces in XML", 14th Jan. 1999 (13 pages)
For this, elements and attributes of a domain (e.g. MathML) are assigned to one or more NS
• XSL uses, e.g., different namespaces for XSLT and XSL-FO
An NS is represented by a URI
• Need not directly refer to the corresponding vocabulary
• Thus provides a level of indirection that decouples the location of the vocabulary from the unique identifier, the URI
The associated elements and attributes have to be qualified by means of this URI when used, thus being made globally unique
• This allows the reuse and especially the combination
Namespaces
2/5
NS with Prefix vs. Default NS
BUT: URIs cannot be used for direct qualification
• This is because URIs normally contain characters that are not allowed as part of valid XML names (e.g., "/", "&")
• Instead, user-defined prefixes have to be used
One or more NS are declared by means of the pre-defined attribute xmlns
• This attribute can be defined in the context of any element of the DTD
• The name of the element where the NS has been declared, as well as direct and indirect subelements and attributes, can be qualified with the NS ("NS inheritance")
Default NS
• Also declared via the pre-defined attribute xmlns, BUT only 1 per element, and without declaring any prefix
• Non-qualified subelements are automatically associated with the default NS, attributes are NOT
• Can be overridden within subelements
Namespaces
3/5
Declaration and Usage
...
<edi:HC
    xmlns:edi='http://ecommerce.org/schema'
    xmlns='http://www.mobildev.com/schema'>
  <model name="8210">
    <edi:price edi:units='Euro'>32.18</edi:price>
    <price währung='USD'>25.16</price>
    ...
  </model>
  ...
</edi:HC>
Annotations:
• xmlns: pre-defined attribute for NS declaration
• edi: NS prefix (optional)
• 'http://ecommerce.org/schema': URI of the NS
• xmlns='http://www.mobildev.com/schema': default NS (no prefix)
The NS of the element edi:price is http://ecommerce.org/schema
The NS of the elements model and price is the default NS http://www.mobildev.com/schema
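How a namespace-aware processor resolves prefixes and the default NS can be observed directly. A minimal sketch with Python's standard `xml.etree.ElementTree`, which reports each qualified name as `{namespace-URI}localName`. The document is the slide's example without the elided parts; the attribute name `currency` replaces `währung` purely for readability.

```python
import xml.etree.ElementTree as ET

DOCUMENT = """<edi:HC xmlns:edi='http://ecommerce.org/schema'
        xmlns='http://www.mobildev.com/schema'>
  <model name="8210">
    <edi:price edi:units='Euro'>32.18</edi:price>
    <price currency='USD'>25.16</price>
  </model>
</edi:HC>"""

# Prefixes used in the path expressions below; they are local to this
# code and need not match the prefixes used in the document.
NS = {"e": "http://ecommerce.org/schema", "m": "http://www.mobildev.com/schema"}

root = ET.fromstring(DOCUMENT)
model = root.find("m:model", NS)
edi_price = model.find("e:price", NS)
default_price = model.find("m:price", NS)

print(root.tag)              # {http://ecommerce.org/schema}HC
print(model.tag)             # {http://www.mobildev.com/schema}model
print(edi_price.attrib)      # the prefixed attribute is in the edi NS
print(default_price.attrib)  # the unprefixed attribute has no NS
```

The last line illustrates the rule from the previous slide: the default NS applies to unprefixed elements, but not to unprefixed attributes.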
Namespaces
4/5
... and DTDs
NS are in principle independent of DTDs
• Can be used in documents with or without DTDs
BUT:
• All elements and attributes that are qualified in the XML document must also be declared appropriately within the DTD
• Huge overhead, since DTDs are not aware of NS
  ◦ <edi:HC> ... <!ELEMENT edi:HC (....)>
  ◦ <edi:price> ... <!ELEMENT edi:price (#PCDATA)>
What can be done is to specify a default NS within the DTD
• <!ATTLIST edi:HC xmlns CDATA #FIXED 'http://www.mobildev.com/schema'>
Namespaces
5/5
Exemplary NS-URIs
• RDF: http://www.w3.org/1999/02/22-rdf-syntax-ns# and http://www.w3.org/2000/01/rdf-schema#
• MathML: http://www.w3.org/1998/Math/MathML
• XHTML: http://www.w3.org/1999/xhtml
• SMIL: http://www.w3.org/TR/REC-smil
• XSL: http://www.w3.org/1999/XSL/Transform and http://www.w3.org/1999/XSL/Format
Outline
Introduction
XML 1.0
Namespaces
XML Schema
• Introduction
• Elements and Attributes
• Pre-defined Datatypes
• User-defined Datatypes
• Keys
• Schema Composition
• Schema Modeling Styles
• Comparison DTD vs. XML Schema
Introduction
DTD versus XML Schema 1/2
Drawbacks of DTDs
• Proprietary syntax
• Few datatypes, in fact only one: String
• Global definition of elements
• Parameter Entities for modularization & overriding
• ID, IDREF(S): severe restrictions
Advantages of XML Schema
• XML as syntax
• Numerous pre-defined datatypes
• User-defined simple and complex datatypes
• Inheritance
• Keys, references: flexible concept
XML Schema
• Definition of the structure of XML documents
• W3C REC May 2001, approx. 420 pages
• W3C REC 2nd edition October 2004
Introduction
DTD versus XML Schema 2/2
Catalog.xsd
<?xml version="1.0"?>
<schema ...>
  <simpleType name="producerNoType"> ...
  <element name="PDACatalog">
    <complexType>
      <sequence>
        <element name="Producer" minOccurs="0" maxOccurs="unbounded">
          <complexType>
            <sequence>
              <element name="ProducerNo" type="hc:producerNoType" minOccurs="1" maxOccurs="1"/>
              <element name="PDA" minOccurs="1" maxOccurs="unbounded">
                <complexType>
                  <sequence>
                    <element name="Weight" type="string" minOccurs="1" maxOccurs="1"/>
                    <element name="Battery" type="string" minOccurs="1" maxOccurs="1"/>
                  </sequence>
                  ...
</schema>
Catalog.dtd
...
<!ELEMENT PDACatalog (Producer*) >
<!ELEMENT Producer (ProducerNo, PDA+)>
<!ELEMENT PDA (Weight, Battery)>
<!ELEMENT Weight (#PCDATA)>
<!ELEMENT Battery (#PCDATA)>
...
Introduction
Namespace for own Vocabulary
• The namespace (NS) of the vocabulary to be defined can be declared by means of the attribute targetNamespace (optional!)
NS of the XML Schema Standard Vocabulary
• Declaration is obligatory!
• Additional NS (i.e., vocabularies) can be incorporated
A single NS can be defined as default NS
• Either own NS, XML Schema NS or other NS
• For all other NS used, a prefix is obligatory
<?xml version="1.0"?>
<schema targetNamespace="http://www.ifs.uni-linz.ac.at/hc"
    xmlns:hc="http://www.ifs.uni-linz.ac.at/hc"
    xmlns="http://www.w3.org/2001/XMLSchema"
    attributeFormDefault="qualified"
    elementFormDefault="qualified">
...
Introduction
Usage of NS in the XML Document
The schema of an XML document is referenced within the root element via the attribute schemaLocation
• 1st part: targetNamespace of the schema
• 2nd part: location of the schema document
Catalog.xsd
<?xml version="1.0"?>
<schema targetNamespace="http://www.ifs.uni-linz.ac.at/hc"
    xmlns:hc="http://www.ifs.uni-linz.ac.at/hc"
    xmlns="http://www.w3.org/2001/XMLSchema"
    attributeFormDefault="qualified"
    elementFormDefault="qualified">
...
Catalog1.xml
<?xml version="1.0"?>
<PDACatalog xsi:schemaLocation="http://www.ifs.uni-linz.ac.at/hc Catalog.xsd"
    xmlns="http://www.ifs.uni-linz.ac.at/hc"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
...
For a schema without targetNamespace:
xsi:noNamespaceSchemaLocation="directPathToXSD_File"
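The value of schemaLocation is a whitespace-separated list of pairs, which is why the blank between namespace and file name matters. A minimal sketch that reads the pairs with Python's standard `xml.etree.ElementTree`; the function name `schema_locations` is an assumption for illustration, and no schema is actually loaded or validated.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

DOCUMENT = """<?xml version="1.0"?>
<PDACatalog xsi:schemaLocation="http://www.ifs.uni-linz.ac.at/hc Catalog.xsd"
    xmlns="http://www.ifs.uni-linz.ac.at/hc"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"/>"""

def schema_locations(xml_text: str) -> dict:
    """Map each target namespace to the schema location given as a hint."""
    root = ET.fromstring(xml_text)
    tokens = root.get(f"{{{XSI}}}schemaLocation", "").split()
    # Tokens alternate: namespace, location, namespace, location, ...
    return dict(zip(tokens[0::2], tokens[1::2]))

print(schema_locations(DOCUMENT))
```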
Elements and Attributes 1/3
Global / Local Definition
Element
<element name="name" type="type" minOccurs="int" maxOccurs="int|unbounded" ... />
• type: simple or complex type
• minOccurs / maxOccurs: cardinality, lower/upper bounds (only in "local" elements)
Attribute
<attribute name="name" type="type" use="how-its-used" default/fixed="value" ... />
• type: simple type
• use: values required, optional, prohibited (only in "local" attributes)
• default/fixed: only relevant if "use" is not defined
Global Definition
• Direct subelement of schema
• NOTE: the root element of the XML document is required to be defined as a global element!
Local Definition
• Definition on an arbitrary nesting level
Analogously for datatypes!
Elements and Attributes 2/3
Global / Local Datatypes and References
Global or Local Datatypes
<element name="name" minOccurs="int" maxOccurs="int|unbounded" ...>
  <complexType>…</complexType>
</element>
<attribute name="name" use="how-its-used" default/fixed="value" ...>
  <simpleType>...</simpleType>
</attribute>
Reference to an existing Element or Attribute
<element ref="name" minOccurs="int" maxOccurs="int|unbounded" .../>
<attribute ref="name" use="how-its-used" default/fixed="value" .../>
Elements and Attributes 3/3
Summarizing Example: Global/Local
<schema ...>
  <element name="Producer">
    <complexType>
      <sequence>
        <element name="ProducerNo" type="hc:producerNoType" minOccurs="1" maxOccurs="1"/>
        <element ref="hc:PDA" maxOccurs="unbounded"/>
      </sequence>
      <attribute name="name" type="string" use="required"/>
    </complexType>
  </element>
  <element name="PDA">
    <complexType>
      <sequence>
        <element name="Weight" type="string"/>
        <element name="Battery" type="string"/>
      </sequence>
    </complexType>
  </element>
  <simpleType name="producerNoType"> …
Annotations:
• Producer: global element, local datatype
• ProducerNo: local element, global datatype
• ref="hc:PDA": reference to a global element
• name (attribute): local attribute, pre-defined datatype
• PDA: global element, local datatype
• Weight, Battery: local elements, pre-defined datatype
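Since a schema is itself an XML document, the global/local distinction can be read off its tree: global declarations are direct children of `schema`, everything deeper is local or a reference. The sketch below uses Python's standard `xml.etree.ElementTree` on a completed variant of the slide's schema; replacing the type `hc:producerNoType` with `string` and the function name `classify_elements` are assumptions made to keep the example self-contained.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Completed variant of the slide's schema (elided parts left out).
SCHEMA = """<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:hc="http://www.ifs.uni-linz.ac.at/hc"
        targetNamespace="http://www.ifs.uni-linz.ac.at/hc">
  <element name="Producer">
    <complexType>
      <sequence>
        <element name="ProducerNo" type="string"/>
        <element ref="hc:PDA" maxOccurs="unbounded"/>
      </sequence>
      <attribute name="name" type="string" use="required"/>
    </complexType>
  </element>
  <element name="PDA">
    <complexType>
      <sequence>
        <element name="Weight" type="string"/>
        <element name="Battery" type="string"/>
      </sequence>
    </complexType>
  </element>
</schema>"""

def classify_elements(xsd_text: str) -> dict:
    """Split element declarations into global, local and references."""
    root = ET.fromstring(xsd_text)
    global_decls = root.findall(XS + "element")  # direct children only
    global_ids = {id(decl) for decl in global_decls}
    result = {"global": [d.get("name") for d in global_decls], "local": [], "ref": []}
    for decl in root.iter(XS + "element"):       # all nesting levels
        if id(decl) in global_ids:
            continue
        if decl.get("ref") is not None:
            result["ref"].append(decl.get("ref"))
        else:
            result["local"].append(decl.get("name"))
    return result

print(classify_elements(SCHEMA))
```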
[Figure: hierarchy of the pre-defined XML Schema datatypes (W3C REC, 28th Oct. 2004)]
anyType: root of the hierarchy (all complex types and anySimpleType)
• Primitive (atomic): string, boolean, float, double, decimal, duration, dateTime, time, date, gYearMonth, gYear, gMonthDay, gDay, gMonth, hexBinary, base64Binary, anyURI, QName, NOTATION
• Derived from string: normalizedString, token, language, NMTOKEN, NMTOKENS, Name, NCName, ID, IDREF, IDREFS, ENTITY, ENTITIES
• Derived from decimal: integer, nonPositiveInteger, negativeInteger, long, int, short, byte, nonNegativeInteger, unsignedLong, unsignedInt, unsignedShort, unsignedByte, positiveInteger
Pre-Defined Datatypes 1/4
Pre-Defined Datatypes 2/4
String Datatypes
• Pre-defined primitive types (derived from anySimpleType): string, hexBinary, base64Binary, anyURI, NOTATION, QName
• Pre-defined derived types: normalizedString, token, language, NMTOKEN, NMTOKENS, Name, NCName, ID, IDREF, IDREFS, ENTITY, ENTITIES
string: string datatype without whitespace replacement
hexBinary, base64Binary: binary string-encoded datatypes
QName: qualified name, supports the usage of names with NS prefix
normalizedString: normalized string with whitespace replacement. Each tab, linefeed and CR is replaced by a blank.
token: "tokenized" string. All whitespace characters are replaced by blanks, all leading and trailing blanks are deleted, and multiple consecutive blanks are replaced by a single one.
language: standardized language codes (e.g. en, en-US, de, de-DE)
NMTOKEN: name token, a string without blanks (e.g. "CMS", "234234")
Name: XML name, must start with a letter, ":" or "_" (e.g. "CMS", "_1")
NCName: name without prefix
NMTOKEN, NMTOKENS, ID, IDREF, IDREFS, ENTITY, ENTITIES: backward compatibility to DTDs; for that reason usable only as types for attributes
Pre-defined Datatypes 3/4
Numerical Datatypes
• Pre-defined primitive types (derived from anySimpleType): float, double, decimal, boolean
• Pre-defined derived types: integer, nonPositiveInteger, negativeInteger, nonNegativeInteger, positiveInteger, long, int, short, byte, unsignedLong, unsignedInt, unsignedShort, unsignedByte
decimal: decimal numbers, decimal separator ".", leading "+" or "-" possible
long, int, short, byte (and the unsigned variants): 64, 32, 16 or 8 bit
float, double: floating point numbers with single (32 bit) and double (64 bit) precision
Pre-defined Datatypes 4/4
Date and Time Datatypes
All types are derived from anySimpleType:
• dateTime: "CCYY-MM-DDThh:mm:ss"
• date: "CCYY-MM-DD"
• gYearMonth: "CCYY-MM"
• gYear: "CCYY"
• gMonthDay: "--MM-DD" (day of the year)
• gDay: "---DD" (day of the month)
• gMonth: "--MM"
• time: "hh:mm:ss"
• duration: "PnYnMnDTnHnMnS"
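To make the lexical formats concrete, here is a small instance sketch. All element names are hypothetical, and each element is assumed to be declared with the datatype named in its comment.

```xml
<Event>
  <start>2011-03-15T09:30:00</start>     <!-- dateTime -->
  <day>2011-03-15</day>                  <!-- date -->
  <month>2011-03</month>                 <!-- gYearMonth -->
  <year>2011</year>                      <!-- gYear -->
  <anniversary>--03-15</anniversary>     <!-- gMonthDay: 15 March of every year -->
  <payday>---15</payday>                 <!-- gDay: the 15th of every month -->
  <season>--03</season>                  <!-- gMonth: March of every year -->
  <opens>09:30:00</opens>                <!-- time -->
  <length>P1Y2M3DT4H5M6S</length>        <!-- duration: 1 year, 2 months, 3 days, 4 hours, 5 minutes, 6 seconds -->
</Event>
```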
User-defined Datatypes
Alternatives
Should the type contain elements or attributes?
• no: unstructured content, <simpleType>
  ◦ Derivation: <restriction>, <union> or <list>
• yes: structured content, <complexType>
  ◦ Derivation: <restriction>, <extension>
  ◦ Nesting: <sequence>, <all>, <choice>
  ◦ Empty / mixed
  ◦ Should the type contain elements?
    yes: attributes & elements, <complexContent>
    no: attributes only, <simpleContent>
Both kinds of types can be named or anonymous
Note: <complexContent> is only necessary in case of derivation from a user-defined type
User-defined Datatypes
Alternatives: Examples
Complex, user-defined, named, with derivation:
<xsd:complexType name="BookTypeWithID">
  <xsd:complexContent>
    <xsd:extension base="BookType">
      <xsd:attribute name="ID" type="xsd:token"/>
    </xsd:extension>
  </xsd:complexContent>
</xsd:complexType>
Complex, user-defined, anonymous, no derivation:
<xsd:complexType>
  <xsd:sequence> .... </xsd:sequence>
</xsd:complexType>
Simple, user-defined, named, with derivation:
<xsd:simpleType name="longitudeType">
  <xsd:restriction base="xsd:integer">
    <xsd:minInclusive value="-180"/>
    <xsd:maxInclusive value="180"/>
  </xsd:restriction>
</xsd:simpleType>
Simple, pre-defined, no derivation:
xsd:integer
Restriction of a pre-defined datatype
• <restriction>
Union of pre-defined datatypes (extension)
• <union>
• Values must correspond to at least one of the combined datatypes
List of values of one pre-defined datatype (the item type must be atomic or a union of atomic types; a list of lists is not allowed)
• <list>
User-defined Datatypes
Derived Simple Datatypes: <simpleType>
Alternative definition possibilities
• Referencing an existing datatype via the attribute base
• Local definition from scratch by using simpleType as subelement of the restriction element
12 possible restrictions, depending on the base datatype
• length, minLength, maxLength, pattern, enumeration, minInclusive, maxInclusive, minExclusive, maxExclusive, whiteSpace, totalDigits, fractionDigits
<simpleType name="batteryType">
  <restriction base="string">
    <enumeration value="NiMh"/>
    <enumeration value="NiCd"/>
    <enumeration value="LiIo"/>
  </restriction>
</simpleType>
<element name="Battery" type="hc:batteryType"/>
XML document:
<Battery>NiCd</Battery>
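The two definition possibilities can be contrasted in a short sketch. The type names weightInGramsType and shortCodeType and the chosen facet values are hypothetical; the first type references its base via the base attribute, the second defines its base locally as a simpleType subelement of restriction.

```xml
<schema xmlns="http://www.w3.org/2001/XMLSchema">
  <!-- variant 1: base datatype referenced via the base attribute -->
  <simpleType name="weightInGramsType">
    <restriction base="positiveInteger">
      <maxInclusive value="5000"/>
    </restriction>
  </simpleType>
  <!-- variant 2: base datatype defined locally inside the restriction element -->
  <simpleType name="shortCodeType">
    <restriction>
      <simpleType>
        <restriction base="string">
          <pattern value="[A-Z]+"/>
        </restriction>
      </simpleType>
      <maxLength value="4"/>
    </restriction>
  </simpleType>
</schema>
```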
User-defined Datatypes
Derived Simple Datatypes: <simpleType>, restriction
Restrictions using a "pattern" element
Restrictions of the lexical values
Simple regular expressions
• Normal characters: "C&A"
• Categories of characters: "\p{IsBasicLatin}"
• Sets of characters: "[\p{IsBasicLatin}-[\d]]"
• Quantifiers: "[a-zA-Z]{1,8}"
• Parentheses: "(XML(\s+|-))?Schema"
• Combinations of these expressions
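The following sketch shows how such expressions might be placed inside a schema. The type names productCodeType and companyNameType are hypothetical. Note that inside a schema file the ampersand of the first expression has to be written as an entity reference.

```xml
<schema xmlns="http://www.w3.org/2001/XMLSchema">
  <simpleType name="productCodeType">
    <restriction base="string">
      <!-- two upper-case letters, a hyphen, then exactly four digits, e.g. AT-0042 -->
      <pattern value="[A-Z]{2}-\d{4}"/>
    </restriction>
  </simpleType>
  <simpleType name="companyNameType">
    <restriction base="string">
      <!-- normal characters only; the ampersand is escaped as an entity reference -->
      <pattern value="C&amp;A"/>
    </restriction>
  </simpleType>
</schema>
```

A pattern always has to match the complete lexical value, so "AT-0042" is accepted by productCodeType while "AT-00421" is not.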
Alternative definition possibilities
• Referencing an existing datatype via attributes (memberTypes or itemType)
• Local definition from scratch by using simpleType as subelement of the union or list elements
<simpleType name="PDAFeatureType">
  <union memberTypes="hc:PDAColor hc:PDARobustness"/>
</simpleType>
<simpleType name="PDAFeatureListType">
  <list itemType="hc:PDAFeatureType"/>
</simpleType>
<element name="PDAFeatureList" type="hc:PDAFeatureListType"/>
XML document:
<PDAFeatureList>blue waterproof shockproof</PDAFeatureList>
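The member types PDAColor and PDARobustness are not shown on the slide. A self-contained sketch could look as follows, assuming both are simple enumerations; their values and the namespace URI http://www.example.org/hc are hypothetical. WeightListType additionally illustrates the local definition of an item type.

```xml
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:hc="http://www.example.org/hc"
        targetNamespace="http://www.example.org/hc">
  <!-- assumed member types of the union -->
  <simpleType name="PDAColor">
    <restriction base="string">
      <enumeration value="blue"/>
      <enumeration value="black"/>
    </restriction>
  </simpleType>
  <simpleType name="PDARobustness">
    <restriction base="string">
      <enumeration value="waterproof"/>
      <enumeration value="shockproof"/>
    </restriction>
  </simpleType>
  <!-- a value is valid if it belongs to at least one member type -->
  <simpleType name="PDAFeatureType">
    <union memberTypes="hc:PDAColor hc:PDARobustness"/>
  </simpleType>
  <!-- whitespace-separated list of union values -->
  <simpleType name="PDAFeatureListType">
    <list itemType="hc:PDAFeatureType"/>
  </simpleType>
  <!-- list whose item type is defined locally instead of via itemType -->
  <simpleType name="WeightListType">
    <list>
      <simpleType>
        <restriction base="positiveInteger"/>
      </simpleType>
    </list>
  </simpleType>
  <element name="PDAFeatureList" type="hc:PDAFeatureListType"/>
</schema>
```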
User-defined Datatypes
Derived Simple Datatypes: <simpleType>, union/list
Nested Elements
• Possible within a complex datatype only
Attributes
• Possible within a complex datatype only
• Independent of the existence of nested elements
Empty Content
• Possible within a complex datatype only
• Does not have nested elements
Mixed Content
• Datatype may contain nested elements and text
• In contrast to DTDs, the ordering and cardinality of the nested elements can be specified arbitrarily
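Empty content is the one case above without an example in the slides. A minimal sketch, assuming a hypothetical Display element that carries only attributes (the namespace URI is also an assumption):

```xml
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:hc="http://www.example.org/hc"
        targetNamespace="http://www.example.org/hc">
  <!-- complex type without nested elements and without text: empty content -->
  <complexType name="DisplayType">
    <attribute name="width" type="positiveInteger" use="required"/>
    <attribute name="height" type="positiveInteger" use="required"/>
  </complexType>
  <element name="Display" type="hc:DisplayType"/>
</schema>
```

A matching instance would consist of the start tag with both attributes and no content.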
User-defined Datatypes
Sequence: <sequence>
Choice: <choice>
Arbitrary Ordering: <all>
• Nested elements can be used in arbitrary order
Cardinality
• Expressed by means of minOccurs and maxOccurs
<complexType name="PDAType">
  <sequence minOccurs="1" maxOccurs="1">
    <element name="Weight" type="string" minOccurs="1" maxOccurs="1"/>
    <element name="Battery" type="string" minOccurs="1" maxOccurs="1"/>
  </sequence>
  <attribute name="no" type="nonNegativeInteger" use="required"/>
</complexType>
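The slide only shows <sequence>. As a sketch of the other two compositors, the following hypothetical types (PowerSupplyType and DimensionsType are not part of the original example) use <choice> and <all>:

```xml
<schema xmlns="http://www.w3.org/2001/XMLSchema">
  <!-- choice: exactly one of the alternatives must appear -->
  <complexType name="PowerSupplyType">
    <choice>
      <element name="Battery" type="string"/>
      <element name="MainsAdapter" type="string"/>
    </choice>
  </complexType>
  <!-- all: both children must appear, in either order, each at most once -->
  <complexType name="DimensionsType">
    <all>
      <element name="Width" type="string"/>
      <element name="Height" type="string"/>
    </all>
  </complexType>
</schema>
```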
User-defined Datatypes
<complexType>: Nested Elements / Attributes
<complexType name="PDAType" mixed="true">
  <sequence>
    <element name="Weight" type="string" minOccurs="1" maxOccurs="1"/>
    <element name="Battery" type="string" minOccurs="1" maxOccurs="1"/>
  </sequence>
</complexType>
<element name="PDA" type="hc:PDAType"/>
XML document:
<PDA>Type Nokia 7110 has <Weight>141g</Weight> and <Battery>900mAh</Battery></PDA>
User-defined Datatypes
Extension
• <extension>
• Additional nested elements and/or attributes
Restriction
• <restriction>
• Domain
• Cardinality
Abstract Datatypes
• <complexType> with attribute abstract="true"
Prohibition of Derivation
• <complexType> with attribute final
• Potential values: #all, restriction, extension
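The attributes abstract and final might be combined as in the following sketch. DeviceType, PhoneType and the namespace URI are hypothetical names chosen for illustration.

```xml
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:hc="http://www.example.org/hc"
        targetNamespace="http://www.example.org/hc">
  <!-- abstract: cannot be used directly in an instance, only via a derived type -->
  <complexType name="DeviceType" abstract="true">
    <sequence>
      <element name="Weight" type="string"/>
    </sequence>
  </complexType>
  <!-- final="#all": no further type may be derived from PhoneType -->
  <complexType name="PhoneType" final="#all">
    <complexContent>
      <extension base="hc:DeviceType">
        <sequence>
          <element name="Band" type="string"/>
        </sequence>
      </extension>
    </complexContent>
  </complexType>
  <!-- an instance of Device has to name a concrete derived type -->
  <element name="Device" type="hc:DeviceType"/>
</schema>
```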
User-defined Datatypes
<complexType>: Derivation of Complex Types
Elements are attached at the end
Extension must be specified within a <complexContent> tag
<complexType name="extendedPDAType">
  <complexContent>
    <extension base="hc:PDAType">
      <sequence>
        <element name="Band" type="string" minOccurs="1" maxOccurs="1"/>
        <element name="Feature" type="string" minOccurs="1" maxOccurs="10"/>
      </sequence>
    </extension>
  </complexContent>
</complexType>
[Figure: extendedPDAType is derived from PDAType]
User-defined Datatypes
The declarations of the base datatype that are to be retained must be repeated
Restriction must be specified within a <complexContent> tag
<complexType name="restrictedPDAType">
  <complexContent>
    <restriction base="hc:extendedPDAType">
      <sequence>
        <element name="Weight" type="string" minOccurs="1" maxOccurs="1"/>
        <element name="Battery" type="string" minOccurs="1" maxOccurs="1"/>
        <element name="Band" type="string" minOccurs="1" maxOccurs="1"/>
        <element name="Feature" type="string" minOccurs="1" maxOccurs="5"/>
      </sequence>
    </restriction>
  </complexContent>
</complexType>
User-defined Datatypes
<complexType>: Derivation via Restriction
[Figure: restrictedPDAType is derived from extendedPDAType, which is derived from PDAType]
Static
• Element PDA has datatype PDAType
<PDA>
  <Weight>141g</Weight>
  <Battery>900mAh</Battery>
</PDA>
Dynamic
• Definition of the derived datatype within the XML document via the attribute type of the XML Schema Instance (xsi) NS
• Datatype extendedPDAType is derived from PDAType: extension with Band & Feature
<PDA xsi:type="extendedPDAType">
  <Weight>115g</Weight>
  <Battery>550mAh</Battery>
  <Band>Dualband</Band>
  <Feature>Waterproof</Feature>
</PDA>
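The slide omits the namespace declarations. A complete instance might look like the sketch below. It assumes a hypothetical target namespace http://www.example.org/hc bound to the prefix hc and a schema with elementFormDefault="qualified"; since the value of xsi:type is a qualified name, it carries the prefix as well.

```xml
<hc:PDA xmlns:hc="http://www.example.org/hc"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:type="hc:extendedPDAType">
  <hc:Weight>115g</hc:Weight>
  <hc:Battery>550mAh</hc:Battery>
  <!-- the following elements are only allowed because of xsi:type -->
  <hc:Band>Dualband</hc:Band>
  <hc:Feature>Waterproof</hc:Feature>
</hc:PDA>
```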
User-defined Datatype
Characteristics of a key (key)
• Value (combination) must be unique
• Value must exist
• Key must be defined as subelement of another element, following the type definition
Candidates for keys (field)
• Elements with simple datatypes only!
• Attributes
• Combinations of elements and attributes
Scope can be defined (selector)
Reference to key can be defined (keyref)
Elements, attributes and combinations thereof can be defined to be unique (unique)
• Value (combination) must be unique
• Value need NOT exist
Keys 1/2
Keys 2/2
<element name="PDACatalog">
  <complexType> ...</complexType>
  <key name="typeKey">
    <selector xpath="hc:Producer/hc:PDA"/>
    <field xpath="@name"/>
    <field xpath="@version"/>
  </key>
  <keyref name="refToTypeKey" refer="hc:typeKey">
    <selector xpath="hc:Stock/hc:PDAQuantity"/>
    <field xpath="@name"/>
    <field xpath="@version"/>
  </keyref>
</element>
[Figure: PDAQuantity (Name, Version, Quantity) references PDA (Name, Version, Weight, ...)]
<element name="PDACatalog">
  <complexType> ...</complexType>
  <unique name="uniqueProducerNo">
    <selector xpath="hc:Producer"/>
    <field xpath="@producerNo"/>
  </unique>
</element>
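An instance that would satisfy these constraints could look like the following sketch. The content model of PDACatalog is elided on the slide, so the structure, the quantity attribute, all values and the namespace URI are assumptions derived only from the XPath expressions above.

```xml
<hc:PDACatalog xmlns:hc="http://www.example.org/hc">
  <hc:Producer producerNo="1">
    <!-- key typeKey: each (name, version) pair must exist and be unique -->
    <hc:PDA name="Nokia" version="7110"/>
    <hc:PDA name="Nokia" version="9210"/>
  </hc:Producer>
  <!-- unique uniqueProducerNo: a second producer with producerNo="1" would be invalid -->
  <hc:Producer producerNo="2">
    <hc:PDA name="Palm" version="V"/>
  </hc:Producer>
  <hc:Stock>
    <!-- keyref refToTypeKey: (name, version) must match an existing key value -->
    <hc:PDAQuantity name="Nokia" version="7110" quantity="12"/>
  </hc:Stock>
</hc:PDACatalog>
```

A PDAQuantity with name="Nokia" and version="3310" would violate the keyref, because no PDA with that key value exists in the catalog.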
Group of Elements
<group name="mainData">
  <sequence>
    <element name="Weight" type="string" minOccurs="1" maxOccurs="1"/>
    <element name="Battery" type="string" minOccurs="1" maxOccurs="1"/>
  </sequence>
</group>
<complexType name="PDAType">
  <sequence>
    <group ref="hc:mainData"/>
    <element name="Feature" type="string" minOccurs="1" maxOccurs="10"/>
  </sequence>
</complexType>
Schema Composition
Within a Schema 1/2
Group of Attributes
<attributeGroup name="BatteryAttributeGroup">
  <attribute name="type" type="string" default="NiMh"/>
  <attribute name="capacity" type="string" use="required"/>
</attributeGroup>
<complexType name="BatteryType">
  <sequence>...</sequence>
  <attributeGroup ref="hc:BatteryAttributeGroup"/>
</complexType>
Schema Composition
Within a Schema 2/2
Incorporation of other schemata via include, redefine and import
• include, redefine and import elements must be subelements of schema, prior to any other declaration
Include of a Schema: include
• NS of the included schema must be equal to the NS of the including schema, or the included schema must have no NS at all
• The included schema can be used as if it were declared directly within the including schema
<schema ...>
  <include schemaLocation="PDA.xsd"/>
  ...
Schema Composition
Different Schemata 1/2
Including and Redefining a Schema: redefine
• Same functionality as include
• In addition, included components (simpleType, complexType, group, attributeGroup) can be newly defined
• New definitions replace the original ones
<redefine schemaLocation="PDA.xsd">
  <complexType name="PDAType">....</complexType>
  ...
</redefine>
...
Import of a Schema: import
• The imported schema can have an arbitrary NS (which may differ from the current one) or none
<import namespace="http://www.somewhere.else.com" schemaLocation="Producer.xsd"/>
...
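Imported components are addressed through a prefix bound to the imported namespace. The sketch below assumes that Producer.xsd declares a global element Producer in the namespace http://www.somewhere.else.com; the prefix prod and the target namespace of the importing schema are hypothetical.

```xml
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:hc="http://www.example.org/hc"
        xmlns:prod="http://www.somewhere.else.com"
        targetNamespace="http://www.example.org/hc">
  <!-- import precedes all other declarations -->
  <import namespace="http://www.somewhere.else.com" schemaLocation="Producer.xsd"/>
  <element name="PDACatalog">
    <complexType>
      <sequence>
        <!-- reference to a global element assumed to be declared in Producer.xsd -->
        <element ref="prod:Producer" maxOccurs="unbounded"/>
      </sequence>
    </complexType>
  </element>
</schema>
```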
Schema Composition
Different Schemata 2/2
Schema Modeling Styles
Non-Normative Datamodel of XML Schema Concepts
[Figure: data model of the XML Schema concepts with legend; source: http://www.w3.org/TR/xmlschema-1/]
Schema Modeling Styles
XML Schema Concepts in Practice
Analysis of 1400 schemata of diverse standard vocabularies
• Open Travel Alliance (OTA)
• Human Resource XML (HR-XML)
• W3C
• Global Justice XML
• etc.
P. Kiel, Profiling XML Schema,
Schema Modeling Styles
Relationships / Global vs. Local / Element vs. Type
Relationships
• Realisation by means of nesting or via references
Global Element/Attribute Declarations
• Prerequisite for reuse in the same or another schema
• Root element must be global
Local Element/Attribute Declarations
• Used when a declaration makes sense only in combination with the declared type
Local Element Declarations
• Can occur with different structure but the same name in different types
Local Attribute Declarations
• Make sense since attributes are most often tightly coupled to elements
Three Stereotypical Design Forms
• Russian Doll Design
• Salami Slice Design
• Venetian Blinds Design
Literature
• XML Schema Best Practices (Roger Costello): www.xfront.com
• P. Kiel, Profiling XML Schema, http://www.xml.com/pub/a/2006/09/20/profiling-xml-schema.html
Nested Element Declarations
• Local declarations only
• Prevents global types
Advantages
• Structure obvious (corresponds to the XML document's structure)
• Prevents side-effects
Disadvantages
• Danger of deep nesting levels
• No reuse of declarations (redundancies)
• No extensibility in terms of derivation
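Applied to the Producer/PDA example used earlier, a schema in this nested style might look like the following sketch. The namespace URI is hypothetical and the content is reduced to a few declarations.

```xml
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        targetNamespace="http://www.example.org/hc"
        elementFormDefault="qualified">
  <!-- single global element; everything else is declared locally and anonymously -->
  <element name="Producer">
    <complexType>
      <sequence>
        <element name="PDA" maxOccurs="unbounded">
          <complexType>
            <sequence>
              <element name="Weight" type="string"/>
              <element name="Battery" type="string"/>
            </sequence>
          </complexType>
        </element>
      </sequence>
      <attribute name="name" type="string" use="required"/>
    </complexType>
  </element>
</schema>
```

The nesting of the schema mirrors the nesting of the instance document, but neither PDA nor its type can be reused elsewhere.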
Schema Modeling Styles
Russian Doll Design
Global Element Declarations
• Usage of global elements per reference (ref attribute)
• Each global element can be a root element
Advantages
• Reuse of elements
Disadvantages
• Large number of global elements
  ◦ Confusing
  ◦ Danger of side-effects in case of changes to global elements
• No extensibility in terms of derivation
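The same Producer/PDA content could be written with global element declarations only, as in the following sketch. The namespace URI is hypothetical.

```xml
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:hc="http://www.example.org/hc"
        targetNamespace="http://www.example.org/hc"
        elementFormDefault="qualified">
  <!-- every element is declared globally and could serve as a root element -->
  <element name="Weight" type="string"/>
  <element name="Battery" type="string"/>
  <element name="PDA">
    <complexType>
      <sequence>
        <!-- global elements are used per reference -->
        <element ref="hc:Weight"/>
        <element ref="hc:Battery"/>
      </sequence>
    </complexType>
  </element>
  <element name="Producer">
    <complexType>
      <sequence>
        <element ref="hc:PDA" maxOccurs="unbounded"/>
      </sequence>
      <attribute name="name" type="string" use="required"/>
    </complexType>
  </element>
</schema>
```

A change to the global Weight declaration affects every element that references it, which is the side-effect named above.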
Schema Modeling Styles
Salami Slice Design
Global Type Declarations
• Elements, except the root element, are declared locally
Advantages
• Reuse of types
  ◦ A named type is available for each element and attribute
  ◦ Types can be imported from other schemata
• Extensibility by derivation, <redefine>
Disadvantages
• Large number of global types
  ◦ Confusing