XML (eXtensible Markup Language)
Nan Niu ([email protected]) CSC309 -- Fall 2008
Last Week
• DHTML
– Modifying DOM
– Event bubbling
• Applets
HTML Deficiencies
• Fixed set of tags
– No standard way to create new formatting tags
• Unsuitable for some Web users
– Interpreting the meaning of the content
• Better search engines
– Standard way to represent content/data
• B2B transactions
• AJAX
– Standard way to invoke services
• Encode function names and arguments
XML (www.w3.org/TR/REC-xml)
• Language for creating markup languages (meta-language)
• Provides a simple and universal way of storing any textual data
– Considered a universal data interchange language
• Examples
– XHTML
– VoiceXML (for speech)
– SOAP
– WSDL
XML Advantages
• Can use a single parser for all XML documents
• Simplifies application writing
– Ability to detect syntactic and structural errors via DTD or Schema
• Independent programs can check document structure
• Allows standardized data exchange
Syntax of XML
• Two distinct levels
– Low-level syntax rules
• Imposed on all XML documents
– Structural syntactic rules
• By DTDs (document type definitions) or XML schemas
• Specify the allowed tags and attributes, their order of appearance, and their arrangement
• Text-based
Low-level Syntax Rules
• Begin with an XML declaration
• XML names are used to name elements and attributes (case sensitive)
• Define a single root element
– All other elements must be nested inside
– The root element of XHTML is html
• Elements with content must have a closing tag
– Use <element_name /> for those without
• XML elements (tags) can have attributes
– Specified with name/value assignments
– Values must be enclosed by (single or double) quotes
• Well formed if document adheres to the above rules
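The low-level rules above can be tried out with any non-validating parser. The following is a minimal sketch using Python's standard-library xml.dom.minidom; the helper name is_well_formed and the sample documents are assumptions made for illustration, not part of the course material.

```python
import xml.dom.minidom
from xml.parsers.expat import ExpatError


def is_well_formed(text):
    """Return True if text parses as well-formed XML (no DTD validation)."""
    try:
        xml.dom.minidom.parseString(text)
        return True
    except ExpatError:
        return False


# Illustrative documents, one per rule from the slide.
good = '<?xml version="1.0"?><note id="n1"><to>Tove</to><br /></note>'
unclosed = '<note><to>Tove</note>'     # <to> has no closing tag
two_roots = '<a></a><b></b>'           # more than one root element
unquoted = '<note id=n1></note>'       # attribute value is not quoted
wrong_case = '<Note></note>'           # names are case sensitive

for doc in (good, unclosed, two_roots, unquoted, wrong_case):
    print(is_well_formed(doc), doc)
```

Only the first document should be accepted; each of the others breaks exactly one of the rules listed above.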
Typical XML Document
• Instruction for XML processors
<?xml version="1.0" encoding="ISO-8859-1"?>
<?xml-stylesheet type="text/css" href="resume.css"?>
• Elements
<tag_name attr1="value1"…>…</tag_name>
• Entity references
– Similar to macros, e.g., &nbsp; (space)
• Comments
– <!-- This is a comment -->
• DOCTYPE declaration
– Identifies the specific document class
<!DOCTYPE Catalog SYSTEM "cd.dtd">
DTD (Document Type Definition)
• Define a set of structural rules called declarations
– A set of elements allowed in the document
• Names of elements, attributes, and entities
– How and where the elements are to appear
• Where elements and attributes may occur
• How elements fit together
• Determine the XML document structure
• W3C online validation
– http://validator.w3.org/
• Valid if document is both well-formed and conforms to its DTD
Declaration Keywords
• !ELEMENT
• !ATTLIST
• !ENTITY
• !NOTATION
– Used for formal declaration of Processing Instructions (PI) targets
– (NOT discussed here)
<!ELEMENT tag-name content>
• Declares a new element tag
• tag-name
– Must start with a letter, “_”, or “:” and contain no white space
– SGML DTDs may use “|” (OR) to group multiple tag-names; XML 1.0 requires one declaration per tag-name
• content
– EMPTY: no content is allowed
– ANY: any content is allowed (defeats the purpose of DTD)
– #PCDATA: parsable character data; tags inside the text will be treated as markup and entities will be expanded
– mixed: mixing children with #PCDATA
– children (see next slide)
Content Definition: children
• (…)
– Delimits a group
• A
– A must occur, one time only
• A+
– A must occur, one or more times
• A?
– A may occur zero or one time
• A*
– A may occur zero or more times
• A | B
– Either A or B must occur
• A , B
– Both A and B must occur, in that order
DTD Example (I)
• <!ELEMENT a EMPTY>
– Illegal
• <a>With content</a>
– Legal
• <a></a> or <a />
• <!ELEMENT name (#PCDATA)>
– <name>John</name>
• <!ELEMENT a (#PCDATA|a)*>
– Mixed
• <a> text <a>more</a> </a>
DTD Example (II)
• <!ELEMENT (b|c|d) EMPTY>
– All of b, c, d are empty tags (SGML-style shorthand; XML 1.0 needs a separate <!ELEMENT> declaration for each)
• <!ELEMENT a (b,(c|d)+)+>
– Illegal
• <a><b /><b /></a>
• <a><c /><d /></a>
– Legal
• <a><b /><c /><d /><b /><d /></a>
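The operators in a children content model behave like regular-expression operators, so the legal and illegal cases above can be checked mechanically. The sketch below hand-translates (b,(c|d)+)+ into a Python regular expression over the sequence of child tag names; this is an illustration under that assumption, not how a validating parser is implemented, and the helper name children_match is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hand translation of the content model (b,(c|d)+)+ . Every tag name here is
# a single letter, so the children of <a> can be written as a plain string.
CONTENT_MODEL = re.compile(r"(b(c|d)+)+")


def children_match(xml_text):
    """Check the child sequence of the root element against the model."""
    root = ET.fromstring(xml_text)
    sequence = "".join(child.tag for child in root)
    return CONTENT_MODEL.fullmatch(sequence) is not None


print(children_match("<a><b /><b /></a>"))                    # b with no c or d after it
print(children_match("<a><c /><d /></a>"))                    # group does not start with b
print(children_match("<a><b /><c /><d /><b /><d /></a>"))     # b,c,d then b,d
```

The first two calls reproduce the illegal cases from the slide and the third reproduces the legal one.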
<!ATTLIST tag-name att-name att-type [default-value] ...>
• Associate name-value pairs with elements
• att-type
– CDATA
• Character data (not parsable)
– NMTOKEN (or NMTOKENS)
• Valid XML name (or names)
– ID
• Unique identifier
– IDREF
• Valid ID of an element
– NUMBER (SGML only; not an attribute type in XML 1.0)
– Enumerated, i.e. explicit set of possible values
• default-value
– A value
– #REQUIRED
– #IMPLIED
• Optional
– #FIXED value
• Value cannot be changed
More DTD Example
• DTD
<!ELEMENT student EMPTY>
<!ATTLIST student
id ID #REQUIRED
name CDATA #REQUIRED
gender (male | female) #IMPLIED
dept CDATA #FIXED "cs">
• XML Document
<student id="s01" name="Chris" />
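To see what the default-value column does in practice, the student example can be run through a parser. The sketch below assumes the DTD is placed in the internal subset of the document (the slide does not say where it lives) and uses Python's xml.etree.ElementTree, whose underlying expat parser is non-validating but does supply declared default and #FIXED attribute values.

```python
import xml.etree.ElementTree as ET

# The slide's DTD, embedded as an internal subset (an assumption of this
# sketch), followed by the slide's one-element document.
doc = """<!DOCTYPE student [
<!ELEMENT student EMPTY>
<!ATTLIST student
  id     ID              #REQUIRED
  name   CDATA           #REQUIRED
  gender (male | female) #IMPLIED
  dept   CDATA           #FIXED "cs">
]>
<student id="s01" name="Chris" />"""

student = ET.fromstring(doc)
print(student.get("id"), student.get("name"))   # written in the document
print(student.get("dept"))                      # supplied from #FIXED "cs"
print(student.get("gender"))                    # #IMPLIED and absent: None
```

Because the parser does not validate, it would not complain if id or name were missing; checking #REQUIRED needs a validating parser such as the Xerces validator used in this course.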
Element vs. Attribute
• Element
– Data can be considered an independent object
– Data is related via a parent/child relationship to another piece of information
– Item needs to occur multiple times
– Ordering is important
• Attribute
– Information that describes other information, such as a status or id
– Limit values to a predefined list
– Minimize the file size of target documents
• More detail at
http://xml.coverpages.org/draft-stuhecelemvsattrib-03-20020316.pdf
Elements vs. Attributes
• Attributes
– Properties of an Element
– Data of secondary importance; often metadata
– Can only appear once in an element
– Used for "atomic" data items
– Can contain only strings, or lists of strings
• Elements
– (Sub-)Elements represent parts of an Element
– Natural, core content, which would generally appear in every printout/display
– Must appear in the order specified in the schema, but may appear several times
– Structured and simple data
– Can have child Elements nested within them
Example: Simple XML Email App.
• XML Document (email.xml)
<?xml version="1.0" encoding="ISO-8859-1" ?>
<!DOCTYPE email SYSTEM "email.dtd" >
<email date="September 20, 2004">
<from>Jani</from>
<to>Tove</to>
<subject>Reminder</subject>
<body>Don't forget me this weekend!</body>
</email>
• DTD (email.dtd)
<!ELEMENT email (from,to,subject,body)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT subject (#PCDATA)>
<!ELEMENT body (#PCDATA)>
<!ATTLIST email date CDATA #REQUIRED>
Example: an XML library application
• Write an XML application for storing information about books in a library.
• Include the following information:
– title
– author
– publisher
– description
– type (one of fiction, non-fiction, cookbook, technical)
– ISBN
– book id for internal purposes.
– set of related books
– flag the best of the related books
library.dtd
<!ELEMENT library (book+)>
<!ELEMENT book (title, author+,publisher,description,related*)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author ((firstname, lastname)|(lastname, firstname))>
<!ELEMENT publisher (#PCDATA)>
<!ELEMENT firstname (#PCDATA)>
<!ELEMENT lastname (#PCDATA)>
<!ELEMENT description (#PCDATA)>
<!ELEMENT related EMPTY>
<!ATTLIST book bookid ID #REQUIRED
type (fiction|non-fiction|cookbook|technical) #REQUIRED
isbn CDATA #REQUIRED>
<!ATTLIST related bookref IDREF #REQUIRED>
<!ATTLIST related class CDATA #IMPLIED>
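The ID and IDREF types in library.dtd carry two constraints a validator enforces: every bookid must be unique, and every bookref must name an existing bookid. The sketch below checks both by hand in Python. The sample data, the placeholder ISBNs, the value "best" for the class attribute, and the helper name check_ids are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; element and attribute names follow library.dtd,
# all values (titles, ISBNs, class="best") are placeholders.
LIBRARY = """<library>
  <book bookid="b1" type="technical" isbn="placeholder-1">
    <title>First Title</title>
    <author><firstname>Ann</firstname><lastname>Lee</lastname></author>
    <publisher>Example Press</publisher>
    <description>Placeholder.</description>
    <related bookref="b2" class="best" />
  </book>
  <book bookid="b2" type="cookbook" isbn="placeholder-2">
    <title>Second Title</title>
    <author><lastname>Roe</lastname><firstname>Sam</firstname></author>
    <publisher>Example Press</publisher>
    <description>Placeholder.</description>
  </book>
</library>"""


def check_ids(xml_text):
    """Return (duplicate bookids, bookrefs that match no bookid)."""
    root = ET.fromstring(xml_text)
    seen, duplicates = set(), []
    for book in root.iter("book"):
        bookid = book.get("bookid")
        if bookid in seen:
            duplicates.append(bookid)   # ID values must be unique
        seen.add(bookid)
    dangling = [rel.get("bookref") for rel in root.iter("related")
                if rel.get("bookref") not in seen]   # IDREF must resolve
    return duplicates, dangling


print(check_ids(LIBRARY))
```

Two empty lists mean both constraints hold for the sample; a validating parser performs the same checks as part of DTD validation.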
<!ENTITY [%] name entity-value>
• % is optional
– General entities: without % – Parameter entities: with %
• In DTD
– <!ENTITY threequarters "¾">
– <!ENTITY % versionnumber "4.3.2.1">
– <!ENTITY version "Version %versionnumber;">
• In XML Document
– <a> &version; and &threequarters; </a>
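General entity expansion can be observed with a short Python sketch. It assumes the declarations sit in the internal subset of the document; since XML 1.0 only allows parameter-entity references inside a declaration in the external subset, the %versionnumber; reference is written out as its value here. The ¾ character is given as the character reference &#190; to keep the source ASCII-only.

```python
import xml.etree.ElementTree as ET

# Internal-subset version of the slide's general entities. The parameter
# entity is inlined (assumption of this sketch, see the note above).
doc = """<!DOCTYPE a [
<!ENTITY threequarters "&#190;">
<!ENTITY version "Version 4.3.2.1">
]>
<a> &version; and &threequarters; </a>"""

root = ET.fromstring(doc)
# The parser replaces each &name; reference with the declared value.
print(root.text.strip())
```

The application never sees the entity references, only the expanded text.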
<!DOCTYPE>
• <!DOCTYPE rootelement PUBLIC|SYSTEM [name] URL>
– Associate the XML document to a specific document class
– Root element of the XML document must match rootelement
– Examples
• <!DOCTYPE Catalog SYSTEM "cd.dtd">
rootelement = Catalog
url = cd.dtd
• <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
rootelement = html
name = -//W3C//DTD HTML 4.01//EN
url = http://www.w3.org/TR/html4/strict.dtd
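The three parts of a DOCTYPE declaration are exposed by DOM parsers. As a rough sketch, Python's xml.dom.minidom reports them as name, publicId, and systemId; the one-element documents below are assumptions added so that the declarations have a root element to match, and the DTD files themselves are not fetched.

```python
import xml.dom.minidom

# SYSTEM form: root element name plus a URL (system identifier).
catalog = xml.dom.minidom.parseString(
    '<!DOCTYPE Catalog SYSTEM "cd.dtd"><Catalog />'
)
# PUBLIC form: root element name, public name, then the URL.
html = xml.dom.minidom.parseString(
    '<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" '
    '"http://www.w3.org/TR/html4/strict.dtd"><html />'
)

for doc in (catalog, html):
    dt = doc.doctype
    print(dt.name, "|", dt.publicId, "|", dt.systemId)
    # The root element of the document must match the DOCTYPE name.
    print(doc.documentElement.tagName == dt.name)
```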
XML Validation
Xerces validator
java -cp /u/csc309h/lib/xerces.jar:/u/csc309h/lib Validator -v email.xml
– The -v flag is very important; without it the validator does not print any errors
XML Reference Architecture
XML File → XML Processor → Application
DOM Sample
• DOM (Document Object Model)
• Generates tree-like structure of XML document in memory
• Programs traverse tree
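A minimal sketch of the DOM approach, using Python's xml.dom.minidom on the email example from earlier in these slides. The walk helper is a hypothetical name; it does a depth-first traversal of the in-memory tree and records element names with their depth.

```python
import xml.dom.minidom

EMAIL = """<email date="September 20, 2004">
<from>Jani</from>
<to>Tove</to>
<subject>Reminder</subject>
<body>Don't forget me this weekend!</body>
</email>"""


def walk(node, depth=0, out=None):
    """Depth-first traversal collecting (depth, tag name) for elements."""
    if out is None:
        out = []
    if node.nodeType == node.ELEMENT_NODE:
        out.append((depth, node.tagName))
    for child in node.childNodes:      # text nodes are visited but skipped
        walk(child, depth + 1, out)
    return out


doc = xml.dom.minidom.parseString(EMAIL)   # whole tree is built in memory
for depth, name in walk(doc.documentElement):
    print("  " * depth + name)
print(doc.documentElement.getAttribute("date"))
```

The whole document is parsed before the program sees any of it, which makes random access easy but costs memory for large files.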
SAX Sample
• SAX (Simple API for XML)
• www.saxproject.org
• Event based
• Triggers events as XML is parsed
• Programs register for events
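A minimal sketch of the event-based style, using Python's xml.sax module. The handler class EmailHandler and the shortened email document are assumptions for illustration; the program registers callbacks, and the parser calls them as it reads the input.

```python
import xml.sax


class EmailHandler(xml.sax.ContentHandler):
    """Records the events the parser triggers, in the order they arrive."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def characters(self, content):
        # Text may arrive in several chunks; whitespace-only chunks are skipped.
        if content.strip():
            self.events.append(("text", content.strip()))

    def endElement(self, name):
        self.events.append(("end", name))


handler = EmailHandler()
xml.sax.parseString(b"<email><from>Jani</from><to>Tove</to></email>", handler)
for event in handler.events:
    print(event)
```

No tree is built: the handler sees each start tag, text chunk, and end tag once, in document order, and must keep any state it needs itself.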
Building Internet Applications
(http://philip.greenspun.com/seia/basics)
1. Develop a data model
• What information are you going to store and how will you represent it?
2. Develop a collection of legal transactions on that model, e.g., inserts and updates
3. Design the page flow
• User interactions; leading to legal transactions
• An Internet service lives or dies by Steps 1-3
4. Implement the individual pages
– Query the data model, wrap information in XHTML, and return the combined result to the user
– Intellectually uninteresting (also from an engineering point of view)
– However, this is where you have a huge range of technology choices