Service-Oriented Computing
What is Markup (encoding)?
Markup is text that is added to the data of a document in order to convey information about it.
The markup instructions are often called "tags."
Example:
<centre on> This is a <italics on> very serious <italics off> matter.<centre off>
Result (centred, with "very serious" in italics): This is a very serious matter.
Historically, the word markup has been used to describe annotation
within a text intended to instruct a typist how a particular passage should be printed or laid out.
What is XML?
XML stands for eXtensible Markup Language
XML is a markup language for documents containing structured information
XML is used for describing other languages (e.g. WSDL, RDF, WML)
XML is used for data interchange
XML DTD and XML Schema define rules to describe data
An open W3C standard
A subset of SGML
What is XML?
XML is a “use everywhere” data specification
[Figure: XML as a common format connecting Documents, Configuration, Database, Repository, and Application X]
Documents vs. Data
XML is used to represent two main types of things:
– Documents
Lots of text with tags to identify and annotate portions
of the document
– Data
XML and Structured Data
Pre-XML representation of data:
"PO-1234","CUST001","X9876","5","14.98"
XML representation of the same data:
<PURCHASE_ORDER>
<PO_NUM> PO-1234 </PO_NUM>
<CUST_ID> CUST001 </CUST_ID>
<ITEM_NUM> X9876 </ITEM_NUM>
<QUANTITY> 5 </QUANTITY>
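The conversion from the comma-separated record to the tagged form can be sketched in Python with the standard library. This is only an illustration: the element name PRICE for the fifth field is an assumption (the slide does not show it), and record_to_xml is a hypothetical helper name.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Field names taken from the slide; PRICE is an assumed name for the fifth field.
FIELDS = ["PO_NUM", "CUST_ID", "ITEM_NUM", "QUANTITY", "PRICE"]


def record_to_xml(line):
    """Turn one quoted, comma-separated record into a PURCHASE_ORDER element."""
    values = next(csv.reader(io.StringIO(line)))
    order = ET.Element("PURCHASE_ORDER")
    for name, value in zip(FIELDS, values):
        # Each value gets a tag that says what the value means.
        ET.SubElement(order, name).text = value
    return order


order = record_to_xml('"PO-1234","CUST001","X9876","5","14.98"')
xml_text = ET.tostring(order, encoding="unicode")
print(xml_text)
```

Unlike the positional record, the XML form names every field, so a reader (or program) does not need to know the column order in advance.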
Benefits of XML
Open W3C standard
Representation of data across heterogeneous
environments
– Cross platform
– Allows for high degree of interoperability
Strict rules
– Syntax
– Structure
HTML and XML
XML is not a replacement for HTML.
HTML compared with XML:
HTML was designed to display data, with focus on how data looks
XML was designed to transport and store data, with focus on what data is
HTML is used to mark up text so it can be displayed to users
XML is used to mark up data so it can be processed by computers
HTML describes both structure (e.g. <p>, <h2>, <em>) and appearance (e.g. <br>, <font>, <i>)
XML describes only content, or “meaning”
HTML uses a fixed, unchangeable set of tags
HTML and XML
HTML and XML look similar, because they are both SGML languages (SGML = Standard Generalized Markup Language)
– Both HTML and XML use elements enclosed in tags (e.g. <body>This is an element</body>)
– Both use tag attributes (e.g., <font face="Verdana" size="+1" color="red">)
HTML and XML
HTML is for humans
– HTML describes web pages
– You don’t want to see error messages about the web pages you visit
– Browsers ignore and/or correct as many HTML errors as they can, so HTML is often sloppy
XML is for computers
– XML describes data
– The rules are strict and errors are not allowed. In this way, XML is like a programming language
– Current versions of most browsers can display XML. However, browser support of XML is spotty at best
HTML vs. XML
<h1> Bibliography </h1>
<p> <i> Foundations of DBs</i>, Abiteboul, Hull, Vianu <br> Addison-Wesley, 1995
<p> <i> Logics for DBs and ISs </i>, Chomicki, Saake, eds.
<br> Kluwer, 1998
<bibliography>
<book> <title> Foundations of DBs </title>
<author> Abiteboul </author> <author> Hull </author> <author> Vianu </author>
<publisher> Addison-Wesley </publisher> ...
</book>
<book> ... <editor> Chomicki </editor> ... </book>
...
</bibliography>
HTML tags: presentation, generic document structure
XML tags:
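Because the XML version names its parts (book, title, author), a program can pull out the authors without guessing from layout. A minimal sketch with Python's xml.etree.ElementTree, using only the first book from the slide (the elided parts are left out rather than invented):

```python
import xml.etree.ElementTree as ET

BIBLIOGRAPHY = """
<bibliography>
  <book>
    <title>Foundations of DBs</title>
    <author>Abiteboul</author>
    <author>Hull</author>
    <author>Vianu</author>
    <publisher>Addison-Wesley</publisher>
  </book>
</bibliography>
"""

root = ET.fromstring(BIBLIOGRAPHY)

# Map each book title to the list of its authors, using tag names only.
authors_by_title = {
    book.findtext("title"): [author.text for author in book.findall("author")]
    for book in root.findall("book")
}
print(authors_by_title)
```

Doing the same with the HTML version would mean interpreting <i> and <br> tags, which describe appearance rather than meaning.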
XML-related technologies
DTD (Document Type Definition) and XML Schema are used to define legal XML tags and their attributes for particular purposes
CSS (Cascading Style Sheets) describe how to display HTML or XML in a browser
XSLT (eXtensible Stylesheet Language Transformations) and XPath are used to translate from one form of XML to another
Layout of a typical XML document
Components of an XML Document
Processing instructions (prolog)
– Encoding specification (Unicode by default)
– Namespace declaration
– Schema declaration
Elements
– Each element has a beginning and ending tag
<TAG_NAME>...</TAG_NAME>
– Elements can be empty (<TAG_NAME />)
Attributes
The Prolog (processing instructions)/XML declaration
It tells the browser or parser that this document is marked up in XML. This prolog is actually a part of HTML as well, but most HTML authors leave it out. In HTML the prolog might look like this:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
This tells the browser that this document will be using HTML 4.0 Transitional.
But the prolog for an XML document can also contain:
– the DTD or schema being used
– declarations of special pieces of text
– text encoding
The Prolog (processing instructions)/XML declaration
The XML declaration statement is used to indicate that the
specified document is an XML document.
The XML declaration looks like this:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
The XML declaration is optional in XML 1.0, but the specification recommends it (so include it!)
The XML declaration starts with <?xml (no space after <?) and ends with ?>
version is required whenever a declaration is present; "1.0" is the usual value (XML 1.1 exists but is rarely used)
encoding can be "UTF-8" or "UTF-16" (both are Unicode encodings; UTF-8 is ASCII-compatible), or something else, or it can be omitted
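As a small illustration of the declaration and the encoding attribute, the following Python sketch serializes an element with a declaration and reads it back. It assumes Python 3.8 or later (for the xml_declaration argument); the element name note and its text are made up for the example.

```python
import xml.etree.ElementTree as ET

note = ET.Element("note")
note.text = "caf\u00e9"  # one non-ASCII character, to make the encoding visible

# Serialize to bytes with an XML declaration naming the encoding.
data = ET.tostring(note, encoding="UTF-8", xml_declaration=True)
first_line = data.split(b"\n", 1)[0]
print(first_line)

# A parser uses the declared encoding to decode the bytes again.
round_trip = ET.fromstring(data)
print(round_trip.text)
```

The ASCII letters take one byte each in UTF-8, while the accented letter takes two, which is why the encoding has to be known before the bytes can be read correctly.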
Elements and attributes
An XML element is the basic syntactic construct of an XML
document
Attributes and elements are somewhat interchangeable
Example using just elements:
<name>
<first>David</first>
<last>Matuszek</last>
</name>
Example using attributes:
<name first="David" last="Matuszek"></name>
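The two styles carry the same information but are read differently by a program. A short Python sketch using xml.etree.ElementTree (the variable names are illustrative only):

```python
import xml.etree.ElementTree as ET

as_elements = ET.fromstring(
    "<name><first>David</first><last>Matuszek</last></name>"
)
as_attributes = ET.fromstring('<name first="David" last="Matuszek"></name>')

# Child elements are reached by searching for their tags ...
full_from_elements = (
    as_elements.findtext("first") + " " + as_elements.findtext("last")
)
# ... while attributes are looked up by name on the element itself.
full_from_attributes = (
    as_attributes.get("first") + " " + as_attributes.get("last")
)

print(full_from_elements, "|", full_from_attributes)
```

One practical difference: an element can repeat and can contain further structure, whereas an attribute name may appear only once per element and holds plain text.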
Entities
Five special characters must be written as entities:
&amp; for & (almost always necessary)
&lt; for < (almost always necessary)
&gt; for > (not usually necessary)
&quot; for " (necessary inside double quotes)
&apos; for ' (necessary inside single quotes)
These entities can be used even in places where they are not absolutely required
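In practice the escaping is usually done by a library rather than by hand. A minimal Python sketch using xml.sax.saxutils.escape from the standard library (the sample strings are made up):

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "AT&T says 3 < 5"

# escape() replaces &, < and > with their entity references.
escaped = escape(raw)
print(escaped)

# The parser turns the entities back into the original characters.
parsed = ET.fromstring("<note>" + escaped + "</note>")
print(parsed.text)

# Extra replacements can be requested, e.g. for text placed inside
# a double-quoted attribute value.
attr_value = escape('say "hi" & go', {'"': "&quot;"})
print(attr_value)
```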
CDATA
By default, all text inside an XML document is parsed.
The term CDATA is used about text data that should not be
parsed by the XML parser.
You can force text to be treated as unparsed character data by enclosing it in <![CDATA[ ... ]]>
Any characters, even & and <, can occur inside a CDATA section
Whitespace inside a CDATA is (usually) preserved
CDATA is useful when your text has a lot of illegal characters
(for example, if your XML document contains some HTML text)
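The effect of a CDATA section can be checked with a parser. In this Python sketch (the element name snippet and its content are invented for illustration), the same text parses inside CDATA but is rejected without it:

```python
import xml.etree.ElementTree as ET

with_cdata = "<snippet><![CDATA[<b>if (a < b && b < c)</b>]]></snippet>"
snippet = ET.fromstring(with_cdata)

# The parser hands the CDATA content over as plain text; the <b> inside
# is not treated as markup, so the element has no children.
cdata_text = snippet.text
child_count = len(snippet)
print(cdata_text, child_count)

# Without CDATA (and without entities) the same characters are illegal.
try:
    ET.fromstring("<snippet>a < b && b < c</snippet>")
    parsed_without_cdata = True
except ET.ParseError:
    parsed_without_cdata = False
print(parsed_without_cdata)
```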
CDATA Example
Some text, like JavaScript code, contains a lot of "<" or "&" characters. To avoid errors script code can be defined as CDATA.
A CDATA section starts with "<![CDATA[" and ends with "]]>":
<script> <![CDATA[
function matchwo(a,b) {
Comments
<!-- This is a comment in both HTML and XML -->
Comments are useful for:
– Explaining the structure of an XML document
– Commenting out parts of the XML during development and testing
Comments are not elements and do not have an end tag
The blanks after <!-- and before --> are optional
The character sequence -- cannot occur in the comment
The closing bracket must be -->
Well-formed vs. Valid
XML must be well-formed (an XML document that obeys the syntax rules is said to be well-formed)
– correct syntax
– tags match, tags nest, all characters legal
– parser must reject if not well-formed
XML may be valid with respect to a Schema (a well-formed XML document that conforms to its schema is said to be valid)
Rules for Well-formed XML (XML Syntax)
Every XML document must have one and only one root element
Every element must have both a start tag and an end tag, e.g. <name> ... </name>
– But empty elements can be abbreviated: <break />
– XML tags may not begin with the letters xml, in any combination of cases
Elements must be properly nested, e.g. not <b><i>bold and italic</b></i>
Attribute values must be enclosed in "" or ''
Processing instructions are optional
XML is case-sensitive
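Several of these rules can be observed by feeding small fragments to a parser. The sketch below uses Python's xml.etree.ElementTree, which is a non-validating parser: it checks well-formedness only. is_well_formed is a hypothetical helper name, and the fragments are illustrative.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the text, False if it rejects it."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


results = {
    "nested correctly": is_well_formed("<b><i>bold and italic</i></b>"),
    "nested wrongly": is_well_formed("<b><i>bold and italic</b></i>"),
    "case mismatch": is_well_formed("<name>Tove</Name>"),
    "two root elements": is_well_formed("<a/><b/>"),
    "unquoted attribute": is_well_formed("<p class=x>text</p>"),
    "empty element": is_well_formed("<break />"),
}
for rule, accepted in results.items():
    print(rule, "->", accepted)
```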
Namespaces: Overview
An XML namespace is a collection of names, identified by a URI reference, which are used in XML documents as element types and attribute names.
Allow authors to differentiate between tags of the same name (using a prefix)
– Frees author to focus on the data and decide how to best describe it
– Allows multiple XML documents from multiple authors to be merged
Identified by a URI (Uniform Resource Identifier)
– When a URL is used, it does NOT have to represent a live server
– To guarantee uniqueness, typically a URI (Uniform Resource
Namespaces: Declaration
Namespace declaration examples:
xmlns:bk="urn:mybookstuff.org:bookinfo"
xmlns:bk="http://www.example.com/bookinfo/"
In each declaration, xmlns marks the namespace declaration, bk is the prefix, and the quoted value is the URI (here a URN and a URL)
There are two ways to use namespaces:
– Declare a default namespace
Namespaces: Examples
<BOOK xmlns:bk="http://www.bookstuff.org/bookinfo">
<bk:TITLE>All About XML</bk:TITLE>
<bk:AUTHOR>Joe Developer</bk:AUTHOR>
<bk:PRICE currency='US Dollar'>19.99</bk:PRICE>
<bk:BOOK xmlns:bk="http://www.bookstuff.org/bookinfo" xmlns:money="urn:finance:money">
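How a program sees prefixed names can be sketched with Python's xml.etree.ElementTree. The document below is a completed variant of the second example: the closing tags and the money:currency attribute are assumptions added so that it parses, since the slide shows only the opening tag.

```python
import xml.etree.ElementTree as ET

doc = """
<bk:BOOK xmlns:bk="http://www.bookstuff.org/bookinfo"
         xmlns:money="urn:finance:money">
  <bk:TITLE>All About XML</bk:TITLE>
  <bk:PRICE money:currency="US Dollar">19.99</bk:PRICE>
</bk:BOOK>
"""
root = ET.fromstring(doc)

# The prefixes used for searching are local to this code; only the URIs
# have to match the ones declared in the document.
ns = {"b": "http://www.bookstuff.org/bookinfo", "m": "urn:finance:money"}

title = root.findtext("b:TITLE", namespaces=ns)
price = root.find("b:PRICE", ns)
# Internally, names are stored as {URI}localname.
currency = price.get("{urn:finance:money}currency")
print(root.tag, title, price.text, currency)
```

The prefix is only shorthand: the parser identifies each name by its namespace URI plus local name.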
Namespaces: Default Namespace
An XML namespace declared without a prefix becomes the default namespace for all sub-elements
All elements without a prefix will belong to the default namespace:
<BOOK xmlns="http://www.bookstuff.org/bookinfo">
<TITLE>All About XML</TITLE>
Namespaces: Scope
Unqualified elements belong to the inner-most default
namespace.
– BOOK, TITLE, and AUTHOR belong to the default book namespace
– PUBLISHER and NAME belong to the default publisher namespace
<BOOK xmlns="www.bookstuff.org/bookinfo">
<TITLE>All About XML</TITLE>
<AUTHOR>Joe Developer</AUTHOR>
<PUBLISHER xmlns="urn:publishers:publinfo">
<NAME>Microsoft Press</NAME>
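The scoping rule can be checked by listing the fully qualified names a parser assigns. In this Python sketch the closing tags are added as an assumption so the fragment parses, and the http:// form of the book URI from the earlier slide is used:

```python
import xml.etree.ElementTree as ET

doc = """
<BOOK xmlns="http://www.bookstuff.org/bookinfo">
  <TITLE>All About XML</TITLE>
  <AUTHOR>Joe Developer</AUTHOR>
  <PUBLISHER xmlns="urn:publishers:publinfo">
    <NAME>Microsoft Press</NAME>
  </PUBLISHER>
</BOOK>
"""
root = ET.fromstring(doc)

# iter() walks the tree in document order; each tag is {URI}localname.
tags = [element.tag for element in root.iter()]
for tag in tags:
    print(tag)
```

PUBLISHER and NAME pick up the inner default namespace, while BOOK, TITLE, and AUTHOR keep the outer one.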
Namespaces: Attributes
Unqualified attributes do NOT belong to any namespace
– Even if there is a default namespace
Valid XML
XML is valid if it declares a DTD/XSD Schema and conforms to that schema
Schemas: Overview
– DTD (Document Type Definitions)
Not written in XML
No support for data types or namespaces
– XSD (XML Schema Definition)
Written in XML
Supports data types
Schemas: Purpose
Define the “rules” (grammar) of the document
– Data types
– Value bounds
An XML document that conforms to a schema is said to be valid
– More restrictive than well-formed XML
Define which elements are present and in what order
Define the structural relationships of elements
What is a DTD?
A DTD (Document Type Definition) defines the structure of a "valid" XML document
An XML document may have an optional DTD.
Only the elements defined in a DTD can be used in an XML document
A DTD can be
– internal: the DTD is part of the document file
– external: the DTD and the document are in separate files
An external DTD may reside
Connecting a Document with its DTD
An internal DTD
<?xml version="1.0"?>
<!DOCTYPE db [<!ELEMENT ...> … ]>
<db> ... </db>
A DTD from the local file system:
<!DOCTYPE db SYSTEM "schema.dtd">
A DTD from a remote file system:
DTD
An internal DTD
<!DOCTYPE invoice [
<!ELEMENT invoice (sku, qty, desc, price) >
<!ELEMENT sku (#PCDATA) >
<!ELEMENT qty (#PCDATA) >
<!ELEMENT desc (#PCDATA) >
<!ELEMENT price (#PCDATA) >
]>
<invoice>
<sku>12345</sku>
<qty>55</qty>
<desc>Left handed monkey wrench</desc>
<price>14.95</price>
</invoice>
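Python's standard library parsers check well-formedness but do not validate against a DTD, so the sketch below hand-codes the one rule that the invoice DTD expresses: invoice must contain exactly sku, qty, desc, price, in that order, each holding text only. It is an illustration of what a validating parser checks, not a DTD validator; conforms_to_invoice_model is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Mirrors <!ELEMENT invoice (sku, qty, desc, price)> from the slide.
INVOICE_CHILDREN = ["sku", "qty", "desc", "price"]


def conforms_to_invoice_model(text):
    """Check the invoice content model by hand (sequence and text-only leaves)."""
    root = ET.fromstring(text)
    if root.tag != "invoice":
        return False
    children = list(root)
    # A comma-separated content model fixes both the names and their order.
    if [child.tag for child in children] != INVOICE_CHILDREN:
        return False
    # (#PCDATA) allows character data but no nested elements.
    return all(len(child) == 0 for child in children)


good = ("<invoice><sku>12345</sku><qty>55</qty>"
        "<desc>Left handed monkey wrench</desc><price>14.95</price></invoice>")
wrong_order = ("<invoice><qty>55</qty><sku>12345</sku>"
               "<desc>Left handed monkey wrench</desc><price>14.95</price></invoice>")

print(conforms_to_invoice_model(good))
print(conforms_to_invoice_model(wrong_order))
```

Both documents are well-formed; only the first would be valid against the DTD.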
DTD
A referenced external DTD
<?xml version="1.0"?>
<!DOCTYPE invoice SYSTEM "invoice.dtd">
<invoice>
<sku>12345</sku>
<qty>55</qty>
<desc>Left handed monkey wrench</desc>
DTD
An external DTD (invoice.dtd)
<?xml version="1.0"?>
<!ELEMENT invoice (sku, qty, desc, price) >
<!ELEMENT sku (#PCDATA) >
<!ELEMENT qty (#PCDATA) >
<!ELEMENT desc (#PCDATA) >
<!ELEMENT price (#PCDATA) >
DTD
Data Types
Parsed Character Data
– #PCDATA
<!ELEMENT firstname (#PCDATA)>
<!ELEMENT lastname (#PCDATA)>
Unparsed Character Data
– CDATA
<firstname><![CDATA[<b>Jim</b>]]></firstname>
DTD
XML Document
<db>
<person>
<name>Alan</name>
<age>42</age>
<email>[email protected]</email>
</person>
<person>………</person>
……….
</db>
DTD
<!DOCTYPE db [
<!ELEMENT db (person*)>
<!ELEMENT person (name, age, email)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT age (#PCDATA)>
<!ELEMENT email (#PCDATA)>
]>
DTD
Occurrence Indicator:
Indicator        Occurrence             Meaning
(no indicator)   Required               One and only one
?                Optional               None or one
*                Optional, repeatable   None, one, or more
+                Required, repeatable   One or more
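The occurrence indicators behave like the regular-expression quantifiers of the same spelling. The Python sketch below makes that analogy concrete for flat, comma-separated content models only (no nesting, no | choices); content_model_to_regex and children_match are hypothetical helpers, and the person elements are made-up test data.

```python
import re
import xml.etree.ElementTree as ET


def content_model_to_regex(model):
    """Translate a flat model such as "(name, age?, email*)" into a regex
    over the space-terminated sequence of child tag names."""
    parts = []
    for item in model.strip("() ").split(","):
        item = item.strip()
        indicator = item[-1] if item[-1] in "?*+" else ""
        name = item.rstrip("?*+")
        # ?, * and + mean the same thing in DTDs and in regular expressions.
        parts.append("(?:" + re.escape(name) + " )" + indicator)
    return re.compile("".join(parts))


def children_match(element, model):
    tags = "".join(child.tag + " " for child in element)
    return content_model_to_regex(model).fullmatch(tags) is not None


person = ET.fromstring(
    "<person><name>Alan</name><email>a</email><email>b</email></person>"
)
empty_db = ET.fromstring("<db></db>")

print(children_match(person, "(name, age?, email*)"))  # age omitted, email repeated
print(children_match(person, "(name, age, email)"))    # age is required here
print(children_match(empty_db, "(person*)"))           # zero occurrences allowed
print(children_match(empty_db, "(person+)"))           # at least one required
```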
Why You Should Use XSD
Newest W3C Standard
Broad support for data types
Reusable “components”
– Simple data types
– Complex data types
Extensible
Inheritance support
Namespace support
XML Schema – Better than DTDs
The purpose of a Schema is to define the legal building
blocks of an XML document, just like a DTD.
XML Schemas
are easier to learn than DTD
are extensible to future additions
are richer and more useful than DTDs
are written in XML
A Simple XML Document
Look at this simple XML document called "note.xml":
<?xml version="1.0"?>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
A DTD File
The following example is a DTD file called "note.dtd" that defines the elements of the XML document ("note.xml"):
<!ELEMENT note (to, from, heading, body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
An XSD
The following example is an XML Schema file called "note.xsd" that defines the elements of the XML document ("note.xml"):
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" targetNamespace="http://www.w3schools.com">
  <xs:element name="note">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="to" type="xs:string"/>
        <xs:element name="from" type="xs:string"/>
        <xs:element name="heading" type="xs:string"/>
        <xs:element name="body" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
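Because an XSD is itself an XML document, any XML parser can read it. The Python sketch below lists the element declarations in the note schema using only the standard library. It does not validate anything, and the closing </xs:schema> tag is added as an assumption so that the text parses.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

NOTE_XSD = """<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://www.w3schools.com">
  <xs:element name="note">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="to" type="xs:string"/>
        <xs:element name="from" type="xs:string"/>
        <xs:element name="heading" type="xs:string"/>
        <xs:element name="body" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""

schema = ET.fromstring(NOTE_XSD)

# Collect (name, type) for every xs:element declaration, in document order.
declared = [
    (element.get("name"), element.get("type"))
    for element in schema.iter(XS + "element")
]
for name, type_name in declared:
    print(name, type_name)
```

The note element has no type attribute because its type is given inline by the nested xs:complexType. A DTD could not be inspected this way, since it is not written in XML.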
A Reference to a DTD
This XML document has a reference to a DTD:
<?xml version="1.0"?>
<!DOCTYPE note SYSTEM "http://www.w3schools.com/dtd/note.dtd">
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
A Reference to an XSD
This XML document has a reference to an XML Schema:
<?xml version="1.0"?>
<note
xmlns="http://www.w3schools.com"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3schools.com note.xsd">
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
Another well-structured example
<novel>
<foreword>
<paragraph> This is the great American novel. </paragraph>
</foreword>
<chapter number="1">
<paragraph>It was a dark and stormy night. </paragraph>
<paragraph>Suddenly, a shot rang out! </paragraph>
XML as a tree
An XML document represents a hierarchy; a hierarchy is a tree
novel
  foreword
    paragraph: This is the great American novel.
  chapter (number="1")
    paragraph: It was a dark and stormy night.
    paragraph: Suddenly, a shot rang out!
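The tree view is what a parser actually builds. A minimal Python sketch that walks the novel document recursively and prints one indented line per element (the closing tags are added as an assumption so the fragment parses; outline is a hypothetical helper name):

```python
import xml.etree.ElementTree as ET

NOVEL = """
<novel>
  <foreword>
    <paragraph>This is the great American novel.</paragraph>
  </foreword>
  <chapter number="1">
    <paragraph>It was a dark and stormy night.</paragraph>
    <paragraph>Suddenly, a shot rang out!</paragraph>
  </chapter>
</novel>
"""


def outline(element, depth=0):
    """Return one indented line per element, parents before children."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


tree_lines = outline(ET.fromstring(NOVEL))
print("\n".join(tree_lines))
```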
Displaying XML
XML documents do not carry information about how to display the data
We can add display information to XML with
– CSS (Cascading Style Sheets)
– XSL (eXtensible Stylesheet Language) (preferred)
[Figure: XSLT transforms XML into XML, HTML, …]
XML Applications
Computer-computer communications.
Enterprise Application Integration.
Content Management Systems.
Wireless Communication Systems.
PDAs and Handheld Devices.
eLearning and Educational Services.
Web Services.