Tim's XML Braindump I
An Overview of XML
Stolen from lots of online sources!
compiled by Tim
Outline
XML in a Nutshell
What does XML mean to you?
What’s Real and What’s Hype?
XML Technologies (XSLT/XPath)
XML and Object “databases”
Something for the coders
XML in a Nutshell
Structured data in a text file via markup
Looks like HTML but isn't
Human-legible but not designed for that
New, but not that new
License-free, platform-independent and well-supported
A family of technologies
Related to Unicode, URNs, Web Services
Uses of XML
Configuration files
Gnome desktop, Pedro
Data interchange
OpenOffice documents, MAGE-ML, maxdML
Web services and B2B transactions
RSS, SOAP, IFX (financial exchange)
Why should you know about XML?
XML solves real world problems
Standards are multiplying like rabbits (both good and bad)
Everyone is using it - companies and academia
It may even be interesting!
The rant - what XML is not
Not a drop-in solution
Not an alternative to RDBMS
Not for storing 'bulk' data
Not self-describing (though it can be self-validating)
Not pretty
Not for end-users
The rant - the witty quote
“XML is not the answer to all the world’s problems—it creates new problems, that are awfully damn
interesting to solve.”
Simon St. Laurent,
author of XML: A Primer,
on the xml-dev mailing list
Anatomy of an XML Document
<?xml version="1.0"?>
<beerlist>
<beer name="Old Speckled Hen">
<abv>5.2</abv>
<brewery>Moorland</brewery>
</beer>
<beer name="Chocolate Stout">
<abv>4.5</abv>
<brewery>Youngs</brewery>
</beer>
<!-- Beer is your friend -->
</beerlist>
Anatomy of an XML Document
Prolog: <?xml version="1.0"?>
Document element tag: <beerlist>
Tag with attribute/value pair: <beer name="Old Speckled Hen">
Text: e.g. 5.2 inside <abv>
Closing tag: e.g. </beer>
Comment: <!-- Beer is your friend -->
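To see those parts from the coder's side, here is a minimal sketch that reads the beer list with Python's standard xml.etree.ElementTree module. The function name read_beers and the tuple layout are made up for illustration; the XML is the slide's document with plain ASCII quotes.

```python
import xml.etree.ElementTree as ET

# The beer list from the slide, with plain ASCII quotes around attribute values.
BEERLIST_XML = """<?xml version="1.0"?>
<beerlist>
  <beer name="Old Speckled Hen">
    <abv>5.2</abv>
    <brewery>Moorland</brewery>
  </beer>
  <beer name="Chocolate Stout">
    <abv>4.5</abv>
    <brewery>Youngs</brewery>
  </beer>
  <!-- Beer is your friend -->
</beerlist>
"""


def read_beers(xml_text):
    """Return a list of (name, abv, brewery) tuples, one per <beer> element."""
    root = ET.fromstring(xml_text)         # root is the document element <beerlist>
    beers = []
    for beer in root.findall("beer"):      # direct <beer> children only
        name = beer.get("name")            # attribute value
        abv = float(beer.findtext("abv"))  # element text; the conversion is ours
        brewery = beer.findtext("brewery")
        beers.append((name, abv, brewery))
    return beers


beers = read_beers(BEERLIST_XML)
for name, abv, brewery in beers:
    print(f"{name}: {abv}% ({brewery})")
```

Note that the parser hands back text only: turning "5.2" into a number is the program's job, not XML's.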
Well formed and valid XML
To be XML, a document must be “well formed”.
e.g.:
If there is an <?xml ...?> declaration, it comes first (lower case; optional in XML 1.0)
Exactly one document element
All tags nested properly and closed
All tag names are valid
Special characters in text (< and &) are escaped, e.g. as &lt; and &amp;
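A quick way to test well-formedness is to hand the text to any conforming parser and see whether it complains. The sketch below does that with Python's standard xml.etree.ElementTree; the helper name is_well_formed and the sample strings (including the "Smith & Sons" brewery) are invented for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return (True, None) if the text parses, else (False, parser message)."""
    try:
        ET.fromstring(xml_text)
        return True, None
    except ET.ParseError as err:
        return False, str(err)


# Hypothetical samples, one per rule in the list above.
samples = {
    "good": "<beerlist><beer name='Old Speckled Hen'/></beerlist>",
    "unclosed tag": "<beerlist><beer></beerlist>",
    "two document elements": "<beerlist/><beerlist/>",
    "bare ampersand": "<brewery>Smith & Sons</brewery>",
    "escaped ampersand": "<brewery>Smith &amp; Sons</brewery>",
}

for label, text in samples.items():
    ok, error = is_well_formed(text)
    verdict = "well formed" if ok else "NOT well formed"
    print(f"{label}: {verdict}" + (f" ({error})" if error else ""))
```

This only checks well-formedness. Validating against a DTD or schema needs a validating parser, which the standard-library module used here is not.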
Additionally XML may be validated
Validate against a DTD specific to a document type.
DTD may be in the document or in a separate file.
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE beerlist [
<!ELEMENT beerlist (beer)*>
<!ATTLIST beer name CDATA #IMPLIED>
<!ELEMENT beer (abv, brewery)>
<!ELEMENT abv (#PCDATA)>
<!ELEMENT brewery (#PCDATA)>
]>
XML Schemas - why?
Various limitations of DTDs
DTD itself is not in XML format - more work for parsers
Does not express data types (weak data typing)
No namespace support
Document can override external DTD definitions
No DOM support
XML Schema is intended to resolve these issues
but ... DTDs are going to be around for a while
XML Schema basics
Schema is a separate XML document
A document maps namespaces to schemas.
Pedro or XMLSpy can 'animate' a schema.
A schema document can be validated against the schema-schema!
<!-- XML Schema schema for XML Schemas : Part 1: Structures -->
<xs:schema targetNamespace="http://www.w3.org/2001/XMLSchema" blockDefault="#all"
elementFormDefault="qualified" version="1.0" xml:lang="EN">
<xs:annotation>
<xs:documentation>
Part 1 version: Id: structures.xsd,v 1.2 2004/01/15 11:34:25 ht Exp Part 2 version: Id: datatypes.xsd,v 1.3 2004/01/23 18:11:13 ht Exp </xs:documentation>
</xs:annotation>
<xs:annotation>
<xs:documentation source="http://www.w3.org/TR/2004/PER-xmlschema-1-20040318/structures.html">
The schema corresponding to this document is normative, with respect to the syntactic constraints it expresses in the XML Schema language. The documentation (within
&lt;documentation&gt; elements) below, is not normative, but rather highlights important aspects of the W3C Recommendation of which this is a part
</xs:documentation>
</xs:annotation>
<xs:import namespace="http://www.w3.org/XML/1998/namespace"
schemaLocation="http://www.w3.org/2001/xml.xsd">
<xs:annotation>
<xs:documentation> Get access to the xml: attribute groups for xml:lang as declared on 'schema' and 'documentation' below </xs:documentation>
</xs:annotation>
</xs:import>
<xs:complexType name="openAttrs">
...
Searching and transforming
XPath
Navigate XML document with UNIX-like path expressions
XSLT
Transform one XML document into another (uses XPath)
XSLT is written in XML
Turing-complete, i.e. you can theoretically specify any computation
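To give a feel for the path expressions, here is a small sketch using the limited XPath subset built into Python's xml.etree.ElementTree. It covers XPath only; the standard library has no XSLT processor, so transformation is not shown. The variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# A compact copy of the beer list used earlier in the talk.
doc = ET.fromstring("""
<beerlist>
  <beer name="Old Speckled Hen"><abv>5.2</abv><brewery>Moorland</brewery></beer>
  <beer name="Chocolate Stout"><abv>4.5</abv><brewery>Youngs</brewery></beer>
</beerlist>
""")

# Path steps read like a file path: every <brewery> inside a <beer>.
breweries = [el.text for el in doc.findall("./beer/brewery")]

# A predicate in square brackets filters on an attribute value.
stout = doc.find("./beer[@name='Chocolate Stout']")
stout_abv = stout.findtext("abv")

# ".//" searches at any depth below the current element.
all_abv = [float(el.text) for el in doc.findall(".//abv")]

print(breweries, stout_abv, all_abv)
```

Full XPath 1.0 (functions, axes, arithmetic) needs a third-party library; the subset above is enough for simple navigation.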
Can XML be used for databasing?
No formal design methodology
No normal forms
Not compatible with relational models, or at least not in the general case.
Inefficient for large files
XML Databases
Xindice (formerly dbXML)
Manages a collection of documents
Efficient searching
Built-in XSLT
Transactional
But a large collection of documents is not the same as storing a large XML document.
Can we edit documents in the DB?
More XML file types
XHTML
RSS
SMIL
SVG
MAGE-ML
SOAP
see http://www.oasis-open.org
For the Programmers - SAX
SAX and DOM are the two technologies for manipulating XML.
SAX is stream-based.
You basically need to make a state machine to parse your document
Read-only
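The "state machine" point is easiest to see in code. Below is a minimal sketch using Python's standard xml.sax module; the class name BeerHandler and its fields are invented for illustration. The handler has to remember which element it is inside, because SAX only delivers a stream of start, text and end events.

```python
import xml.sax

XML = """<beerlist>
  <beer name="Old Speckled Hen"><abv>5.2</abv><brewery>Moorland</brewery></beer>
  <beer name="Chocolate Stout"><abv>4.5</abv><brewery>Youngs</brewery></beer>
</beerlist>"""


class BeerHandler(xml.sax.ContentHandler):
    """Collects one dict per <beer>, tracking parser state by hand."""

    def __init__(self):
        super().__init__()
        self.beers = []      # finished records
        self.current = None  # the <beer> being built, or None outside one
        self.field = None    # name of the child element being read, if any
        self.chunks = []     # text pieces; characters() may fire more than once

    def startElement(self, name, attrs):
        if name == "beer":
            self.current = {"name": attrs.get("name")}
        elif name in ("abv", "brewery") and self.current is not None:
            self.field = name
            self.chunks = []

    def characters(self, content):
        if self.field is not None:
            self.chunks.append(content)

    def endElement(self, name):
        if name == self.field:
            self.current[name] = "".join(self.chunks).strip()
            self.field = None
        elif name == "beer":
            self.beers.append(self.current)
            self.current = None


handler = BeerHandler()
xml.sax.parseString(XML.encode("utf-8"), handler)
print(handler.beers)
```

Nothing is kept in memory beyond what the handler chooses to store, which is why SAX suits large files and why it cannot modify the document.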
For the Programmers - DOM
Navigate a document in memory using various methods.
Can modify and re-write the document
May depend on SAX to load files
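May utilise XPath expressions
E.g. Perl:
use XML::DOM;
# $xml_as_string is assumed to already hold the XML text
my $parser = XML::DOM::Parser->new;
my $doc = $parser->parse($xml_as_string) or die "failed to parse XML!\n";
# Collect every <beer> element; the second argument (1) recurses into descendants
my @beers = $doc->getElementsByTagName('beer', 1);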
Loads of other XML modules, including XML::Twig, which combines aspects of SAX and DOM.
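For comparison with the SAX style, here is a rough DOM sketch in Python's standard xml.dom.minidom (not the Perl module above). It loads the document into memory, navigates it, modifies it and writes it back out. The added beer "Example Ale" and its ABV are made-up data for illustration.

```python
from xml.dom import minidom

XML = '<beerlist><beer name="Old Speckled Hen"><abv>5.2</abv></beer></beerlist>'

# Load the whole document into an in-memory tree.
doc = minidom.parseString(XML)

# Navigate: find every <beer> element and read its attribute.
names = [beer.getAttribute("name") for beer in doc.getElementsByTagName("beer")]

# Modify: build a new <beer> subtree and attach it to the document element.
new_beer = doc.createElement("beer")
new_beer.setAttribute("name", "Example Ale")  # hypothetical entry
abv = doc.createElement("abv")
abv.appendChild(doc.createTextNode("4.0"))
new_beer.appendChild(abv)
doc.documentElement.appendChild(new_beer)

# Re-write: serialise the modified tree back to text.
output = doc.documentElement.toxml()
print(names)
print(output)
```

The cost of this convenience is memory: the whole tree lives in RAM, which ties back to the earlier point about XML being inefficient for large files.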
Conclusion