eXtensible Markup Language (XML)
(2) eXtensible Markup Language (XML)
In HTML:
<html>
  <h1>Bibliography</h1>
  <ol>
    <li><i>Foundation of Databases</i>, <b>Abiteboul, Hull</b>, 1995</li>
    <li><i>Database Systems</i>, <b>Elmasri, Navathe</b>, 1994</li>
  </ol>
</html>
In XML:
<bibliography>
  <book>
    <title>Foundation of Databases</title>
    <author>Abiteboul</author>
    <author>Hull</author>
    <year>1995</year>
  </book>
  <book> <!-- continues --> </book>
</bibliography>
XML is a simple, very flexible and extensible text data format. It is "extensible" because the markup is not fixed as it is in HTML: it lets you design your own customised markup. XML is a language that describes data, and it separates presentation issues from the actual data.
(3) XML: Tags, tags, tags
Consider the following snippet of information from a staff list:
  LName    Title  FName  School               Campus  Room
  Edgar    Miss   Pam    Optometry            KG      B501
  Edmond   Dr     David  Information Systems  GP      S842
  Edmonds  Dr     Ian    Physical Sciences    GP      M206
In XML:
<Phonebook>
  <Entry>
    <LastName>Edmond</LastName>
    <Title>Dr</Title>
    <FirstName>David</FirstName>
    <School>Information Systems</School>
    <Campus>GP</Campus>
    <Room>S842</Room>
  </Entry>
  <Entry>
    <LastName>Edgar</LastName>
    <Title>Miss</Title>
    <FirstName>Pam</FirstName>
    <School>Optometry</School>
    <Campus>KG</Campus>
    <Room>B501</Room>
  </Entry>
  <!-- Entry continues ... -->
</Phonebook>
(4) Why XML? – Background Early Web Used to publish documents to be read by humans HTML was designed for the purpose. Today’s Web Many business activities are performed on the Web Dynamic interactions: Web app ⇔ people / Web app ⇔ Web app Web becomes a platform for data exchange XML provides a simple, cross-platform data format. Web contains vast amount of data published in HTML format Many programs process or analyse such data HTML changes ... (when data inside does not) → the program that reads the HTML page must change too XML provides a long-term, reliable data format for publishing.
(5) Why XML? Benefits of using XML in document (data) exchange Self-describing, modular and portable data A common, widely accepted data representation language Standard supports available for creating/parsing XML docs Standard supports for checking validity of data Efficient search of business information standard support for querying XML docs quick and simple search (XPath) more comprehensive keyword + structure based search possible as well (XQuery). Extensible document descriptions XML is flexible (cf. relational tables)! reuse, adaptation of existing documents.
(6) Separating the Content from Presentation
XML:
<?xml version="1.0" ?>
<?xml-stylesheet type="text/css" href="staffcard.css" ?>
<staff>
  <name>Helen Paik</name>
  <title>Lecturer, UNSW</title>
  <email>hpaik@cse</email>
  <extension>54095</extension>
  <photo src="me.gif" />
</staff>
CSS:
staff{background-color: #cccccc; ...}
name{display: block; font-size: 20pt; ... }
title{display: block; margin-left: 20pt;}
email{display: block; font-family: monospace;}
extension{display: block; margin-left: 20pt;}
Helen Paik (CSE, UNSW). COMP9321, 09s2. Week 2. 6 / 86.
(7) Separating the Content from Presentation
Linking XML with its style (presentation) instruction:
  CSS:  <?xml-stylesheet type="text/css" href="staffcard.css" ?>
  XSLT: <?xml-stylesheet type="text/xml" href="staffcard.xsl"?>
Most browsers now have built-in support for both.
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.1">
  <xsl:template match="/">
    <HTML><BODY><TABLE border="1">
      <xsl:apply-templates select="staff"/>
    </TABLE></BODY></HTML>
  </xsl:template>
  <xsl:template match="staff">
    <TR><TD>Name</TD><TD><xsl:value-of select="name"/></TD></TR>
    <TR><TD>Email</TD><TD><xsl:value-of select="email"/></TD></TR>
    <TR><TD>Ext.</TD><TD><xsl:value-of select="extension"/></TD></TR>
  </xsl:template>
</xsl:stylesheet>
(8) XML Applications
Like any other good invention, XML is now used for things far beyond its creators' original imagination. An XML application is a set of 'tags' developed for a specific type of document, e.g., Chemical Markup Language (CML):
<atom id="caffeine_karne_a_1">
  <float builtin="x3" units="A">-2.8709</float>
  <float builtin="y3" units="A">-1.0499</float>
  <float builtin="z3" units="A">0.1718</float>
  <string builtin="elementType">C</string>
</atom>
(9) XML Applications
Math Markup Language (MathML):
<mrow>
  <apply><eq/>
    <ci>A</ci>
    <matrix>
      <matrixrow><ci>x</ci><ci>y</ci></matrixrow>
      <matrixrow><ci>z</ci><ci>w</ci></matrixrow>
    </matrix>
  </apply>
</mrow>
[Rendered result: A = the 2x2 matrix with rows (x, y) and (z, w).]
Really Simple Syndication (RSS):
<rss version="0.91">
  <channel>
    <title>CNN.com</title>
    <item>
      <title>July ends with 76 ... killed</title>
      <link>http://www.cnn.com/.../story.html</link>
      <description>Three U.S. soldiers were ...</description>
    </item>
  </channel>
</rss>
(10) XML Applications Scalable Vector Graphics (SVG) delivers two-dimensional graphics in XML to the Web. The following svg displays four black rectangles on a rectangular canvas with a blue border. <?xml version="1.0" standalone="no"?> <!DOCTYPE svg PUBLIC "-//W3C//DTD SVG20010904//EN" "http://www.w3.org/TR/2001/REC-SVG-20010904/DTD/svg10.dtd"> <svg width="5cm" height="4cm" xmlns="http://www.w3.org/2000/svg"> <desc>Four separate rectangles</desc> <!-- rect 1 --> <rect x="0.5cm" y="0.5cm" width="2cm" height="1cm" /> <!-- rect 2 --> <rect x="0.5cm" y="2cm" width="1cm" height="1.5cm" /> <!-- rect 3 --> <rect x="3cm" y="0.5cm" width="1.5cm" height="2cm" /> <!-- rect 4 --> <rect x="3.5cm" y="3cm" width="1cm" height="0.5cm" /> <!-- Show outline of canvas using ’rect’ element --> <!-- rect 5 --> <rect x=".01cm" y=".01cm" width="4.98cm" height="3.98cm" fill="none" stroke="blue" stroke-width=".02cm" /> </svg>. http://www.w3.org/TR/SVG/ http://www.adobe.com/svg/viewer/install/main.html.
(11) XML Applications More applications ...? Web application server configurations Web services - SOAP and WSDL (the whole standard is based on XML) B2B integration protocols (e.g., RosettaNet).
(12) XML is ... Is a Language there is a grammar, and it can be parsed by machines. Is a Markup Language XML looks a bit like HTML (tags). But it describes what things are, not what they are supposed to do. Is eXtensible you can define more words and add to the language. XML is for structuring data. XML is for describing data. XML is text, but isn’t meant to be read. XML is verbose by design. XML is a family of technologies. XML is license-free, platform-independent and well-supported. XML is NOT a programming language. it is not something you can ’compile’.
(13) The XML Family XML: a language used to describe information. DOM: a programming interface for accessing and updating documents. DTD/XML schema: a language for specifying the structure and content of documents. XSLT: a language for transforming documents. XPath: a query language for navigating XML documents. XPointer: for identifying fragments of a document. XLink: generalises the concept of a hypertext link. XInclude: for merging documents. XQuery: a language for making queries across documents. RDF: a language for describing resources..
(14) Quick XML Syntax
An XML document is a tree:
<office>
  <phone>1235</phone>
  <person>
    <name>Alan</name>
    <age>29</age>
    <phone>2044</phone>
  </person>
  <person>
    <name>Sue</name>
    <age>45</age>
    <phone>2043</phone>
  </person>
</office>
The corresponding tree:
office
  phone: 1235
  person
    name: Alan
    age: 29
    phone: 2044
  person
    name: Sue
    age: 45
    phone: 2043
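The tree view can be reproduced with a few lines of code. The sketch below is an illustration only (the slides themselves do not use Python): it assumes Python's standard xml.etree.ElementTree module, and the helper name outline is made up for this example.

```python
import xml.etree.ElementTree as ET

# The office document from the slide.
OFFICE_XML = """
<office>
  <phone>1235</phone>
  <person><name>Alan</name><age>29</age><phone>2044</phone></person>
  <person><name>Sue</name><age>45</age><phone>2043</phone></person>
</office>
"""

def outline(element, depth=0):
    """Return one indented line per element, with leaf text where present.

    `outline` is a hypothetical helper written for this example.
    """
    text = (element.text or "").strip()
    lines = ["  " * depth + element.tag + (": " + text if text else "")]
    for child in element:  # iterating an element yields its direct children
        lines.extend(outline(child, depth + 1))
    return lines

root = ET.fromstring(OFFICE_XML)
tree_lines = outline(root)
print("\n".join(tree_lines))
```

Each recursive call handles one node, so the indentation depth mirrors the nesting depth of the element in the document.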
(15) Quick XML syntax
All XML documents must have a single root element.
All XML elements must have a closing tag. Empty-element tags end with />.
XML tags are case sensitive (NAME vs. Name).
All XML elements must be properly nested (<p><q></p></q> is not allowed).
Element naming:
  names may contain letters, numbers, and other characters
  names must not start with a number, '.' (period) or '-' (hyphen)
  names must not start with 'xml' (or XML or Xml ..)
  names cannot contain spaces
Attribute values must always be quoted (single or double quotes).
Comments in XML: <!-- This is a comment -->
(16) Attributes in XML tags LName Edgar .... Title Miss .... FName Pam .... School Optometry .... Campus KG .... Room B501 .... Phonebook with Attributes <Phonebook> <Entry entryNumber="001"> <Name Title="Miss"> <Last>Edgar</Last> <First>Pam</First> </Name> <School Campus="KG">Optometry</School> <Room Building="B" Level="5">01</Room> </Entry> </Phonebook>. Attribute order is not significant Sometimes using attributes can make an XML document concise.
(17) Attributes in XML tags LName Edgar .... Title Miss .... FName Pam .... School Optometry .... Campus KG .... Room B501 .... Phonebook with many attributes ... <Phonebook> <Entry entryNumber="001"> <Name Title="Miss" LName="Edgar" FName="Pam"/> <Location Campus="KG" School="Optometry" Building="B" Room="501"/> </Entry> </Phonebook>. Avoid using too many (loses structure, more parsing effort ...).
(18) Entity References
The character data inside an element must not contain certain characters with special meanings (e.g., < means the start of a tag). You must escape these characters using entity references. XML predefines exactly five entity references:
  &lt;   - the less-than sign (<)
  &amp;  - the ampersand (&)
  &gt;   - the greater-than sign (>)
  &quot; - the straight double quotation mark (")
  &apos; - the apostrophe or single quote (')
<image source='koala.gif' width='122' height='66' alt='Powered by O&apos;Reilly Books' />
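As an illustration of escaping in practice, here is a minimal sketch assuming Python's standard library (xml.sax.saxutils.escape and xml.etree.ElementTree); the variable names are chosen for this example. By default escape() handles only &, < and >, so the two quote characters are passed in explicitly.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# escape() replaces &, < and > by default; the quote characters are
# added through the optional mapping argument.
QUOTES = {"'": "&apos;", '"': "&quot;"}

raw_alt = "Powered by O'Reilly Books"
escaped_alt = escape(raw_alt, QUOTES)
escaped_text = escape("a < b & c", QUOTES)

# Build the element from the slide with the escaped attribute value,
# then parse it back: the parser resolves the entity reference again.
doc = "<image source='koala.gif' alt='" + escaped_alt + "' />"
image = ET.fromstring(doc)

print(escaped_alt)
print(escaped_text)
print(image.get("alt"))
```

The round trip shows that entity references exist only in the serialised text; an application reading the parsed document sees the original characters.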
(19) CDATA section
Sometimes the character data of an element contains too many characters that need to be escaped (e.g., a chunk of other XML or HTML code). A CDATA section lets you enclose the character data as literal text.
Example:
<p>You can use the default <code>xmlns</code> attribute to avoid having to add the svg prefix to all your elements:</p>
<![CDATA[
<svg xmlns="http://www.w3.org/2000/svg" width="12cm" height="12cm">
  <ellipse rx="110" ry="130"/>
  <rect x="4cm" y="1cm" width="3cm" height="6cm"/>
</svg>
]]>
Everything between <![CDATA[ and ]]> is treated as raw characters, not markup.
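A quick way to see the effect of a CDATA section is to parse one and inspect the result. The sketch below assumes Python's standard xml.etree.ElementTree; the wrapper element name example is invented for this illustration.

```python
import xml.etree.ElementTree as ET

# A hypothetical wrapper element whose content is a CDATA section
# holding SVG markup as literal text.
doc = (
    "<example><![CDATA["
    '<svg xmlns="http://www.w3.org/2000/svg"><rect x="4cm" y="1cm"/></svg>'
    "]]></example>"
)

example = ET.fromstring(doc)
child_count = len(example)   # number of child elements of <example>
content = example.text       # character data, CDATA markers removed

print(child_count)
print(content)
```

The parser reports no child elements: the svg and rect tags arrive as ordinary characters in the text of the wrapper element.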
(20) Defining the document structure
Phonebook.xml
<Phonebook>
  <Entry>
    <LastName Title="Miss">Edgar</LastName>
    <FirstName>Pam</FirstName>
    <School>Optometry</School>
    <Campus>KG</Campus>
    <Room>B501</Room>
    <Extension>5695</Extension>
  </Entry>
  <!-- and so on -->
</Phonebook>
How would we communicate the nature of this document? If we were to describe the document to someone over a phone line, we might say:
  1. It's a kind of internal (staff) phone book.
  2. It's made up of a number of individual entries.
  3. Each entry contains the staff member's last name, title, first name ...
  4. A person's title must be Miss or Mrs or Ms or Mr or Dr or Prof ...
(21) DTD (Document Type Definitions)
DTDs are used to ensure that XML docs adhere to an "agreed" structure. A DTD can be declared:
Within the XML document (internal DTD):
<?xml version="1.0"?>
<!DOCTYPE Login [
  <!ELEMENT Login (Username,Password)>
  <!ELEMENT Username (#PCDATA)>
  <!ELEMENT Password (#PCDATA)>
]>
<Login>
  <Username>hpaik</Username>
  <Password>IwillNeverTell</Password>
</Login>
Outside the XML document (external DTD):
<?xml version="1.0"?>
<!DOCTYPE Login SYSTEM "login.dtd">
<Login>
  <Username>hpaik</Username>
  <Password>IwillNeverTell</Password>
</Login>
(22) Phonebook.xml with Internal DTD <?xml version="1.0"?> <!DOCTYPE Phonebook [ <!ELEMENT Phonebook (Entry)+ > <!ELEMENT Entry (LastName, FirstName, School, Campus, Room, Extension)> <!ELEMENT LastName (#PCDATA)> <!ELEMENT FirstName (#PCDATA)> <!ELEMENT School (#PCDATA)> <!ELEMENT Campus (#PCDATA)> <!ELEMENT Room (#PCDATA)> <!ELEMENT Extension (#PCDATA)> <!ATTLIST LastName Title (Miss | Ms | Mrs | Mr | Dr | Prof) #REQUIRED> ]> <Phonebook> <Entry> <LastName Title="Miss">Edgar</LastName> <FirstName>Pam</FirstName> <School>Optometry</School> <Campus>GP</Campus> <Room>B501</Room> <Extension>5695</Extension> </Entry> <!-- more entries not shown ... --> </Phonebook>.
(23) Phonebook.xml with External DTD
Phonebook.xml
<?xml version="1.0"?>
<!DOCTYPE Phonebook SYSTEM "Phonebook.dtd">
<Phonebook>
  <Entry>
    <LastName Title="Miss">Edgar</LastName>
    <FirstName>Pam</FirstName>
    <!-- remaining child elements -->
  </Entry>
  <!-- rest of the entries -->
</Phonebook>
Phonebook.dtd
<!ELEMENT Phonebook (Entry)+ >
<!ELEMENT Entry (LastName, FirstName, School, Campus, Room, Extension)>
<!ELEMENT LastName (#PCDATA)>
<!ELEMENT FirstName (#PCDATA)>
<!ELEMENT School (#PCDATA)>
<!ELEMENT Campus (#PCDATA)>
<!ELEMENT Room (#PCDATA)>
<!ELEMENT Extension (#PCDATA)>
<!ATTLIST LastName Title (Miss | Ms | Mrs | Mr | Dr | Prof) #REQUIRED>
(24) Defining XML Content: Elements A Book <book> <author> <name>J.K. Rowling</name> </author> <detail> <series>Seventh</series> <title>Harry Potter and the Deathly Hallows</title> </detail> </book> Creating Elements: <!ELEMENT book (author, detail)> <!ELEMENT author (name)> <!ELEMENT name (#PCDATA)>. <!ELEMENT detail (series, title)> <!ELEMENT series (#PCDATA)> <!ELEMENT title (#PCDATA)>.
(25) Defining XML Content: Modifiers
A Book
<book>
  <author>
    <!-- more than one author? -->
    <name>E. Harold</name>
    <name>S. Means</name>
  </author>
  <detail>
    <!-- not every book is in a series -->
    <title>XML in a Nutshell</title>
  </detail>
</book>
Modifiers:
  1. ? : optional element (at most once)
  2. + : mandatory element (1 or more)
  3. * : optional element (0 or more)
<!ELEMENT book (author, detail*)>
<!ELEMENT author (name+)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT detail (series?, title)>
<!ELEMENT series (#PCDATA)>
<!ELEMENT title (#PCDATA)>
(26) Defining XML Content: Choices, Empty Element
Choices:
<!ELEMENT newbooks (book+)>
<!ELEMENT book (author+, detail*)>
<!ELEMENT author (name | penname)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT penname (#PCDATA)>
<!ELEMENT detail ((series?, title) | (publisher, release*))>
<!ELEMENT series (#PCDATA)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT publisher (#PCDATA)>
<!ELEMENT release (#PCDATA)>
Empty element content:
<!ELEMENT BR EMPTY>
<BR/> is called an empty element.
(27) Defining XML Content: Mixed content, Any
Mixed content: a mixture of elements and text.
<!ELEMENT message (#PCDATA | bold | italic)*>
<message>You <italic>really</italic> <bold>must</bold> try this delicious <bold>new</bold> recipe for <italic>pudding</italic></message>
ANY: any element declared in the DTD may be included.
<!ELEMENT book (author+, description, detail*)>
<!ELEMENT author (name+)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT description ANY>
<!ELEMENT detail (series?, title)>
<!ELEMENT series (#PCDATA)>
<!ELEMENT title (#PCDATA)>
(28) Defining XML Content: Creating Attributes
<book>
  <author period="classical" category="children">
    <name type="normal">J.K. Rowling</name>
  </author>
  <title>Harry Potter and the Half-Blood Prince</title>
</book>
Creating attributes:
<!ELEMENT book (author, title)>
<!ELEMENT author (name+)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT title (#PCDATA)>
<!ATTLIST name type (normal | penname) "normal">
<!ATTLIST author period CDATA #REQUIRED
                 category CDATA #IMPLIED>
Note: an attribute declaration takes either a default value or a keyword such as #REQUIRED, not both.
(29) Defining XML Content: Creating Attributes
Default values for attributes: the default postcode in an address is to be 4001, and the state must be QLD.
<!ATTLIST Address Postcode CDATA "4001"
                  State CDATA #FIXED "QLD">
The above definition has the following effects on the source doc:
  <Address /> → <Address Postcode="4001" State="QLD"/>
  <Address Postcode="4010" State="QLD"/> → (no error)
  <Address Postcode="4001" State="NSW"/> → (error)
(30) Well-formedness and Validity of XML Well-formedness Rules: Open and close all tags Empty-element tags end with /> There is a unique root element Elements may not overlap Attribute values are quoted < and & are only used to start tags and entity references, respectively Only the five predefined entity references are used Validity Rules: Well-formed Must have a Document Type Definition (DTD) Must comply with the constraints specified in the DTD.
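Well-formedness can be checked mechanically by any XML parser. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree (a non-validating parser, so it checks well-formedness only, never validity against a DTD); the function name is_well_formed and the sample strings are made up for this example.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if `text` parses as a well-formed XML document.

    Hypothetical helper: this does NOT check validity against a DTD.
    """
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

samples = {
    "<a><b/></a>": "properly nested, single root",
    "<p><q></p></q>": "overlapping elements",
    "<a x=1/>": "unquoted attribute value",
    "<a/><b/>": "more than one root element",
    "<Name></name>": "tag names differ in case",
}

results = {text: is_well_formed(text) for text in samples}
for text, ok in results.items():
    print(ok, text, "(" + samples[text] + ")")
```

Only the first sample passes; each of the others breaks one of the well-formedness rules listed above.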
(31) XML Namespaces
XML elements can have any names. What if a name could mean two different things (i.e., a name clash)? Consider the following two XML documents that describe student information.
From University X:
<student>
  <id>12345</id>
  <name>Jeff Smith</name>
  <language>C#</language>
  <rating>9.5</rating>
</student>
From University Y:
<student>
  <id>534-22-5252</id>
  <name>Bob Citizen</name>
  <language>Spanish</language>
  <rating>3.2</rating>
</student>
How could a program distinguish the different elements?
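(32) Another example ...
[Figure: two element trees, Author (<Name>, <First>, <Last>, <Email>) and Book (<Name>, <ISBN>, <Ed>), and a combined <Books> document containing <Book>, <Name>, <ISBN>, <Ed>, <Author>, <Name>, <First>, <Last>, <Email>, in which <Name> appears with two different meanings.]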
(33) XML Namespaces
A namespace is a set of names in which all names are unique. The name 'title' can now be identified as: Book.title, Project.title, Employee.title ...
  namespace:Book: ID, price, title, publisher, author
  namespace:Project: title, due-date, ID, manager, auditor, budget
  namespace:Employee: ID, firstname, salary, lastname, title
These names are called "qualified names". XML namespaces give elements and attributes a unique name across the Internet. XML namespaces enable programmers to process the tags and attributes they care about and ignore those that don't matter to them.
(34) Previous examples can now be ...
The previous examples can now have qualified names:
  namespace:UniversityX/Student: ID, name, rating, language
  namespace:UniversityY/Student: ID, rating, name, language
  Book namespace: <Books>, <Book>, <Name>, <ISBN>, <Ed>
  Author namespace: <Author>, <Name>, <First>, <Last>, <Email>
(35) XML Namespace Syntax
xmlns:<prefix>='namespace identifier'
e.g., <books xmlns:xdc="http://www.xml.com/books">
  not a normal XML attribute (treated differently)
  the URI must be unique, but need not point to a 'useful' resource
  the prefix is chosen by convention or by the author
Consider the following XML document: painting.xml
<catalog>
  <rdf:RDF xmlns:rdf="http://www.w3.org/TR/REC-rdf-syntax#">
    <rdf:Description xmlns:dc="http://purl.org/dc/" about="painting.xml">
      <dc:title>Impressionist Paintings</dc:title>
      <dc:creator>Elliotte Rusty Harold</dc:creator>
      <dc:description>impressionist paintings</dc:description>
      <dc:date>2000-08-22</dc:date>
    </rdf:Description>
  </rdf:RDF>
  <painting>
    <title>Memory of the Garden at Etten</title>
    <artist>Vincent Van Gogh</artist>
    <date>1888</date>
    <description>Two women look to the left.</description>
  </painting>
</catalog>
(36) Default namespace: a namespace without a prefix ... <html xmlns="http://www.w3.org/HTML/1998/html4" xmlns:xdc="http://www.xml.com/books"> <head> <title>Book Review</title> </head> <body> <xdc:bookreview> <xdc:title>XML: A Primer</xdc:title> <table> <tr align="center"> <td>Author</td> <td>Price</td> <td>Pages</td> <td>Date</td> </tr> <tr align="left"> <td><xdc:author>Simon St. Laurent</xdc:author></td> <td><xdc:price>31.98</xdc:price></td> <td><xdc:pages>352</xdc:pages></td> <td><xdc:date>1998/01</xdc:date></td> </tr> </table> </xdc:bookreview> </body> </html>.
(37) Scope of XML Namespaces An XML namespace declaration remains in scope for the element on which it is declared and all of its descendants (unless it is overridden) The scope includes the element the namespace is declared on. <foo:A xmlns:foo="http://www.foo.org/"> <foo:B> <foo:C xmlns:foo="http://www.bar.org/"> <foo:D>abcd</foo:D> </foo:C> </foo:B> </foo:A> http://www.rpbourret.com/xml/NamespacesFAQ.htm#s6.
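To see the scoping rule at work, the example can be parsed and each element's expanded name printed. This is a sketch assuming Python's standard xml.etree.ElementTree, which reports every tag as {namespace-URI}local-name; the variable names are chosen for this illustration.

```python
import xml.etree.ElementTree as ET

# The document from the slide: the prefix foo is re-declared on <foo:C>.
doc = """<foo:A xmlns:foo="http://www.foo.org/">
  <foo:B>
    <foo:C xmlns:foo="http://www.bar.org/">
      <foo:D>abcd</foo:D>
    </foo:C>
  </foo:B>
</foo:A>"""

root = ET.fromstring(doc)
# iter() walks the element and all its descendants in document order.
qualified = [element.tag for element in root.iter()]
for name in qualified:
    print(name)
```

A and B resolve to http://www.foo.org/, while C and D resolve to http://www.bar.org/: the inner declaration overrides the outer one for the element it appears on and for that element's descendants.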
(38) Scope of XML Namespaces
Multiple namespaces can be in scope as long as they use different prefixes.
<A xmlns:foo="http://www.foo.org/" xmlns:bar="http://www.bar.org/">
  <foo:B>abcd</foo:B>
  <bar:C>efgh</bar:C>
</A>
Unqualified attributes do not belong to any namespace, even if a default namespace is in scope.
<foo:A xmlns:foo="http://www.foo.org/">
  <bar:B D="whoami" xmlns:bar="http://www.bar.org/">
    <C>abcd</C>
  </bar:B>
</foo:A>
(39) DTD vs XML Schema
Consider the following XML document.
<Person>
  <Name>Frank</Name>
  <Age>21</Age>
  <Sex>M</Sex>
</Person>
We could specify documents of this form using a DTD:
<!ELEMENT Person (Name,Age,Sex) >
<!ELEMENT Name (#PCDATA) >
<!ELEMENT Age (#PCDATA) >
<!ELEMENT Sex (#PCDATA) >
How about constraints like: Age must be a number, Sex is either M or F?
(40) XML Schema vs. DTD Alternatively, we could use XML schema: <xsd:element name="Name" type="xsd:string"/> <xsd:element name="Age" type="xsd:integer"/> <xsd:element name="Sex"> <xsd:simpleType> <xsd:restriction base="xsd:string"> <xsd:enumeration value="M"/> <xsd:enumeration value="F"/> </xsd:restriction> </xsd:simpleType> </xsd:element>. DTDs have their own special syntax. Schemas are written in XML. DTDs are compact. XML schemas are verbose. DTDs have limited data types. Schema has rich data types. DTDs deal, primarily, with structure. Schemas deal with structure and data types..
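To make concrete what the schema adds over the DTD, here is a hand-written check of the same two constraints. It is not a schema validator, only a sketch of what one would enforce for this document; it assumes Python's standard xml.etree.ElementTree, the function name check_person is invented, and Python's int() is only an approximation of xsd:integer (it is slightly more lenient).

```python
import xml.etree.ElementTree as ET

def check_person(xml_text):
    """Return a list of constraint violations (empty if none are found).

    Hypothetical helper mirroring two schema rules by hand:
    Age must be an integer, Sex must be M or F.
    """
    person = ET.fromstring(xml_text)
    problems = []
    age = (person.findtext("Age") or "").strip()
    try:
        int(age)  # approximation of xsd:integer
    except ValueError:
        problems.append("Age is not an integer: %r" % age)
    sex = (person.findtext("Sex") or "").strip()
    if sex not in ("M", "F"):  # the two enumeration values
        problems.append("Sex is not M or F: %r" % sex)
    return problems

good = "<Person><Name>Frank</Name><Age>21</Age><Sex>M</Sex></Person>"
bad = "<Person><Name>Frank</Name><Age>twenty</Age><Sex>X</Sex></Person>"

problems_good = check_person(good)
problems_bad = check_person(bad)
print(problems_good)
print(problems_bad)
```

Both documents satisfy a DTD that declares Age and Sex as #PCDATA; only a typed description such as the schema can tell them apart without custom code like this.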
(41) Simple vs. Complex Types
Complex types: elements that contain sub-elements or carry attributes.
Simple types: elements that contain only numbers (or strings, dates, etc.) and no sub-elements.
Note: attributes are always simple types.
In the following XML document, based on the rules just given, decide which of the names listed below are simple types and which are complex:
<Phonebook>
  <Entry>
    <LastName Title="Miss">Edgar</LastName>
    <FirstName>Pam</FirstName>
    <School>Optometry</School>
    <Campus>GP</Campus>
    <Room>B501</Room>
    <Extension>5695</Extension>
  </Entry>
</Phonebook>
Campus, Entry, Extension, FirstName, LastName, Phonebook, Room, School, Title.
(42) Declaring simple type elements
<Title>Harry Potter and the Half-Blood Prince</Title>
<Author>J.K. Rowling</Author>
<Date>2005</Date>
<Series>6</Series>
<xsd:element name="Title" type="xsd:string"/>
<xsd:element name="Author" type="xsd:string"/>
<xsd:element name="Date" type="xsd:gYear"/>
<xsd:element name="Series" type="xsd:positiveInteger"/>
Note: XML Schema has many built-in data types.
(43) Declaring complex type elements
<Book>
  <Title>Harry Potter and the Half-Blood Prince</Title>
  <Author>J.K. Rowling</Author>
  <Date>2005</Date>
  <Series>6</Series>
</Book>
<xsd:element name="Book">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element name="Title" type="xsd:string"/>
      <xsd:element name="Author" type="xsd:string"/>
      <xsd:element name="Date" type="xsd:gYear"/>
      <xsd:element name="Series" type="xsd:positiveInteger"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>
(44) all, choice, minOccurs and maxOccurs
When child elements:
  must appear in the given order: <xsd:sequence>
  can appear in any order: <xsd:all>
  can be chosen from a list: <xsd:choice>
<xsd:element name="Book">
  <xsd:complexType>
    <xsd:all>
      <xsd:element name="Title" type="xsd:string"/>
      <xsd:element name="Author" type="xsd:string"/>
      <xsd:element name="Date" type="xsd:gYear"/>
      <xsd:element name="Series" type="xsd:positiveInteger"/>
      <xsd:element name="Publisher" type="xsd:string"/>
    </xsd:all>
  </xsd:complexType>
</xsd:element>
(45) all, choice, minOccurs and maxOccurs <xsd:element name="Book"> <xsd:complexType> <xsd:choice> <xsd:element name="Title" type="xsd:string"/> <xsd:element name="Author" type="xsd:string"/> <xsd:element name="Date" type="xsd:gYear"/> <xsd:element name="Series" type="xsd:positiveInteger"/> <xsd:element name="Publisher" type="xsd:string"/> </xsd:choice> </xsd:complexType> </xsd:element> <xsd:element name="Book"> <xsd:complexType> <xsd:choice minOccurs="1" maxOccurs="4"> ... </xsd:element>.
(46) all, choice, minOccurs and maxOccurs minOccurs, maxOccurs can be combined with xsd:element too. <xsd:element name="Book"> <xsd:complexType> <xsd:sequence> <xsd:element name="Title" type="xsd:string" minOccurs="0"/> <xsd:element name="Author" type="xsd:string"/> <xsd:element name="Date" type="xsd:gYear"/> <xsd:element name="Series" type="xsd:positiveInteger"/> <xsd:element name="Publisher" type="xsd:string" minOccurs="1" maxOccurs="3"/> </xsd:sequence> </xsd:complexType> </xsd:element>.
(47) Adding attributes
<CatalogItem itemID="9876" demand="high" suppID="BRISa" OnSale="true">
  <ItemName>Nokia Mobile Phone</ItemName>
  <Price>449.99</Price>
</CatalogItem>
<xsd:element name="CatalogItem">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element name="ItemName" type="xsd:string"/>
      <xsd:element name="Price" type="xsd:float"/>
    </xsd:sequence>
    <xsd:attribute name="itemID" type="xsd:integer"/>
    <xsd:attribute name="demand" type="xsd:string"/>
    <xsd:attribute name="suppID" type="xsd:string"/>
    <xsd:attribute name="OnSale" type="xsd:boolean"/>
  </xsd:complexType>
</xsd:element>
(48) Adding attributes
Attributes may only be assigned simple types. They can express fixed, default and optional values.
<xsd:element name="CatalogItem">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element name="ItemName" type="xsd:string"/>
      <xsd:element name="Price" type="xsd:float"/>
    </xsd:sequence>
    <xsd:attribute name="itemID" type="xsd:integer" use="required"/>
    <xsd:attribute name="demand" type="xsd:string" default="medium"/>
    <xsd:attribute name="supplier" type="xsd:string" fixed="Syd001"/>
    <xsd:attribute name="OnSale" type="xsd:boolean" use="optional" default="false"/>
  </xsd:complexType>
</xsd:element>
(49) Named Types vs. Anonymous Types An anonymous type: <xsd:element name="Book"> <xsd:complexType> <xsd:sequence> <xsd:element name="Title" type="xsd:string"/> <xsd:element name="Author" type="xsd:string"/> <xsd:element name="Date" type="xsd:gYear"/> <xsd:element name="Publisher" type="xsd:string"/> </xsd:sequence> </xsd:complexType> </xsd:element>.
(50) Named Types vs. Anonymous Types A named type: <xsd:complexType name="myBookType"> <xsd:sequence> <xsd:element name="Title" type="xsd:string"/> <xsd:element name="Author" type="xsd:string"/> <xsd:element name="Date" type="xsd:gYear"/> <xsd:element name="Publisher" type="xsd:string"/> </xsd:sequence> </xsd:complexType>. <xsd:element name="Book" type="myBookType" />.
(51) Named Types vs. Anonymous Types A named type can be reused: <xsd:element name="Book" type="genericBookType" /> <xsd:element name="Publication" type="genericBookType" /> <xsd:complexType name="genericBookType"> <xsd:sequence> <xsd:element name="Title" type="xsd:string"/> <xsd:element name="Author" type="xsd:string"/> <xsd:element name="Date" type="xsd:gYear"/> </xsd:sequence> </xsd:complexType>.
(52) Deriving Complex Types
XML Schema lets you extend or restrict an existing type to define a new type. These are called derived types.
  Extension: extend the parent complexType with more elements
  Restriction: create a type which is a subset of the base type
Consider the following complex type, 'Publication', from which we will create 'BookPublication'.
<xsd:complexType name="Publication">
  <xsd:sequence>
    <xsd:element name="Title" type="xsd:string"/>
    <xsd:element name="Author" type="xsd:string"/>
    <xsd:element name="Date" type="xsd:gYear"/>
  </xsd:sequence>
</xsd:complexType>
(53) Deriving Complex Types: Extension <xsd:complexType name="BookPublication"> <xsd:complexContent> <xsd:extension base="Publication"> <xsd:sequence> <xsd:element name="ISBN" type="xsd:string"/> <xsd:element name="Publisher" type="xsd:string"/> </xsd:sequence> </xsd:extension> </xsd:complexContent> </xsd:complexType> <BookPublication> <Title>XML Bible</Title> <Author>Elliotte Rusty Harold</Author> <Date>1999</Date> <ISBN> 0764532367 </ISBN> <Publisher>Hungry Minds Inc</Publisher> </BookPublication>.
(54) Deriving Complex Types: Restriction Redefine a base type element to have a restricted range of values <xsd:complexType name="Publication"> <xsd:sequence> <xsd:element name="Title" type="xsd:string" maxOccurs="unbounded"/> <xsd:element name="Author" type="xsd:string" maxOccurs="unbounded"/> <xsd:element name="Date" type="xsd:gYear"/> </xsd:sequence> </xsd:complexType> <xsd:complexType name= "SingleAuthorPublication"> <xsd:complexContent> <xsd:restriction base="Publication"> <xsd:sequence> <xsd:element name="Title" type="xsd:string" maxOccurs="unbounded"/> <xsd:element name="Author" type="xsd:string"/> <xsd:element name="Date" type="xsd:gYear"/> </xsd:sequence> </xsd:restriction> </xsd:complexContent> </xsd:complexType>.
(55) Deriving Complex Types: Restriction
Redefine a base type element to have a more restricted number of occurrences.
<xsd:complexType name="Publication">
  <xsd:sequence>
    <xsd:element name="Title" type="xsd:string" maxOccurs="unbounded"/>
    <xsd:element name="Author" type="xsd:string" minOccurs="0"/>
    <xsd:element name="Date" type="xsd:gYear"/>
  </xsd:sequence>
</xsd:complexType>
<xsd:complexType name="ZeroAuthorPublication">
  <xsd:complexContent>
    <xsd:restriction base="Publication">
      <xsd:sequence>
        <xsd:element name="Title" type="xsd:string" maxOccurs="unbounded"/>
        <xsd:element name="Date" type="xsd:gYear"/>
      </xsd:sequence>
    </xsd:restriction>
  </xsd:complexContent>
</xsd:complexType>
(56) Deriving Simple Types
You can create new types by restricting or extending simple types too.
Restriction: create a new type by adding conditions to one of the built-in types, e.g., an integer in the range 1950 to 1959:
<xsd:simpleType name="Fifties">
  <xsd:restriction base="xsd:integer">
    <xsd:minInclusive value="1950" />
    <xsd:maxInclusive value="1959" />
  </xsd:restriction>
</xsd:simpleType>
(57) Deriving Simple Types Facets: In general, customised data types may be designed using the following facets. enumeration, minInclusive, maxInclusive, minExclusive, maxExclusive, pattern, length, minLength, and maxLength ..
(58) Deriving Simple Types Restriction of a simple type using enumeration facet <xsd:simpleType name="shape"> <xsd:restriction base="xsd:string"> <xsd:enumeration value="circle"/> <xsd:enumeration value="triangle"/> <xsd:enumeration value="square"/> </xsd:restriction> </xsd:simpleType>. Restriction of a simple type using length facet <xsd:simpleType name="TelephoneNumber"> <xsd:restriction base="xsd:string"> <xsd:length value="8"/> </xsd:restriction> </xsd:simpleType>.
(59) Defining your own type: Custom types Here is an anonymous simple, custom type using restriction. <xsd:attribute name="Title" use="required"> <xsd:simpleType> <xsd:restriction base="xsd:string"> <xsd:enumeration value="Miss"/> <xsd:enumeration value="Ms"/> <xsd:enumeration value="Mrs"/> <xsd:enumeration value="Mr"/> </xsd:restriction> </xsd:simpleType> </xsd:attribute>.
(60) Defining your own type: Custom types Define a simple type, named ”CampusType”, for the Campus element. Restrict the campus to KG, GP or CA. <xsd:simpleType name="CampusType"> <xsd:restriction base="xsd:string"> <xsd:enumeration value="KG"/> <xsd:enumeration value="GP"/> <xsd:enumeration value="CA"/> </xsd:restriction> </xsd:simpleType> <xsd:element name="Campus" type="CampusType"/>.
(61) Restriction using pattern facet
We can require that the Phone Extension element consist of exactly four digits in the following way:
<xsd:element name="Extension">
  <xsd:simpleType>
    <xsd:restriction base="xsd:string">
      <xsd:pattern value="\d{4}"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:element>
(62) Restriction using pattern facet
A regular expression is a sequence of characters that denotes a set of strings. Here are a number of examples:
  [A-DQUT]       one of the letters A, B, C, D, Q, U and T
  [0-9]          one of the digits 0 to 9 inclusive (same as \d)
  [p-r0-1]       one of the characters p, q, r, 0 and 1
  Pie            the string Pie
  Cutie|Pie      the string Cutie or the string Pie
  A+             one or more A's, e.g., A, AA, AAA
  Aug-[0-9]{2}   the strings Aug-00 to Aug-99
  Bug{2,4}       the strings Bugg, Buggg and Bugggg
See http://www.w3.org/TR/xmlschema-2/#regexs
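The examples can be tried out with any regular-expression engine. The sketch below uses Python's standard re module as a stand-in: its dialect is not identical to the XML Schema one, but these simple patterns behave the same in both. A pattern facet must match the whole value, which re.fullmatch mirrors. The test strings and the four-digit pattern \d{4} are chosen for this illustration.

```python
import re

# (pattern, candidate string, expected to match?)
examples = [
    (r"[A-DQUT]", "Q", True),
    (r"[A-DQUT]", "E", False),
    (r"[p-r0-1]", "q", True),
    (r"Cutie|Pie", "Pie", True),
    (r"A+", "AAA", True),
    (r"Aug-[0-9]{2}", "Aug-07", True),
    (r"Aug-[0-9]{2}", "Aug-7", False),
    (r"Bug{2,4}", "Bugggg", True),
    (r"Bug{2,4}", "Bug", False),
    (r"\d{4}", "5695", True),    # four digits, e.g. a phone extension
    (r"\d{4}", "56950", False),  # five digits: the whole value must match
]

# fullmatch succeeds only if the entire string matches the pattern.
results = [bool(re.fullmatch(pattern, text)) for pattern, text, _ in examples]
expected = [want for _, _, want in examples]

for (pattern, text, _), matched in zip(examples, results):
    print(pattern, repr(text), matched)
```

Note that {2,4} in Bug{2,4} applies to the preceding character g only, which is why Bug (a single g) is rejected.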
(63) Restriction using pattern facet
Rewrite the Room element to use an anonymous simple type that restricts the element to being a single alphabetic character followed by three digits.
<xsd:element name="Room">
  <xsd:simpleType>
    <xsd:restriction base="xsd:string">
      <xsd:pattern value="          "/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:element>
(64) Restriction based on another custom type <xsd:simpleType name= "EarthSurfaceElevation"> <xsd:restriction base="xsd:integer"> <xsd:minInclusive value="-1290"/> <xsd:maxInclusive value="29035"/> </xsd:restriction> </xsd:simpleType> <xsd:simpleType name= "BostonAreaSurfaceElevation"> <xsd:restriction base="EarthSurfaceElevation"> <xsd:minInclusive value="0"/> <xsd:maxInclusive value="120"/> </xsd:restriction> </xsd:simpleType>.
(65) Extension of a simple type Adding attributes to a simple typed element <LastName Title="Ms">Rowling</LastName> Now consider this: LastName has a simple type (i.e., string). An element of simple type cannot have an attribute, so LastName has to be given a complex type. A complex type normally describes an element that has child elements. However, what we want is a complex type with an attribute but no child elements.
(66) Adding attributes to a simple typed element XML Schema lets you extend a simple type and add attributes to it. <xsd:element name="LastName"> <xsd:complexType> <xsd:simpleContent> <xsd:extension base="xsd:string"> <xsd:attribute name="Title" type="xsd:string" use="required" /> </xsd:extension> </xsd:simpleContent> </xsd:complexType> </xsd:element>.
(67) Adding attributes to a simple typed element Another example of element with simple content and attribute <elevation units="feet">5440</elevation> <xsd:element name="elevation"> <xsd:complexType> <xsd:simpleContent> <xsd:extension base="xsd:integer"> <xsd:attribute name="units" type="xsd:string" use="required"/> </xsd:extension> </xsd:simpleContent> </xsd:complexType> </xsd:element>.
(68) Here is a schema for the phonebook <?xml version="1.0"?> <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"> <xsd:complexType name="LastNameType"> <xsd:simpleContent> <xsd:extension base="xsd:string"> <xsd:attribute name="Title" type="TitleType" use="required" /> </xsd:extension> </xsd:simpleContent> </xsd:complexType> <xsd:simpleType name="TitleType"> <xsd:restriction base="xsd:string"> <xsd:enumeration value="Miss"/> <xsd:enumeration value="Ms"/> <xsd:enumeration value="Mr"/> <xsd:enumeration value="Dr"/> <xsd:enumeration value="Prof"/> </xsd:restriction> </xsd:simpleType> <xsd:complexType name="EntryType"> <xsd:sequence> <xsd:element name="LastName" type="LastNameType"/> <xsd:element name="FirstName" type="xsd:string"/> <xsd:element name="School" type="xsd:string"/> <xsd:element name="Campus" type="xsd:string"/> <xsd:element name="Room" type="xsd:string"/> <xsd:element name="Extension" type="xsd:string"/> </xsd:sequence> </xsd:complexType> <xsd:complexType name="PhonebookType"> <xsd:sequence> <xsd:element name="Entry" type="EntryType" minOccurs="1" maxOccurs="unbounded"/> </xsd:sequence> </xsd:complexType> <xsd:element name="Phonebook" type="PhonebookType"/> </xsd:schema>.
(69) Parsing XML documents with Java What do we mean by 'parsing (or processing) XML docs'? Parsing makes an interface available to the application that needs to use the document. Through that interface, the application can retrieve and modify the document contents. XML Data → XML Parser → Some Interface → Your Application. What if the interface provided by the parser is parser-specific? → Your application will have to be parser-specific too. Obviously ... we want a "STANDARD"! XML Data → XML Parser → Standard Interface → Your Application.
(70) SAX and DOM as the Standard Interfaces SAX - the Simple API for XML. DOM - the Document Object Model. Why two standards? → A trade-off between control and performance. DOM gives you a tree structure: you have complete control over the structure (i.e., traverse the tree, modify the structure, etc.), but the whole tree is stored in memory at once. SAX lays out the document in time, as a sequence of 'events': events are associated with each tag (open/close), each tag body, etc.; you write event handlers (i.e., you can ignore certain events); it requires much less memory; it becomes difficult to use if processing an element depends on earlier/later elements.
(71) SAX Overview. Parsing is a steady progression through the text document (e.g., like playing a tape). During the process, notifications of events are sent to a ContentHandler object: "document has started", "an element has started", "the character content of an element has been found", etc. SAX defines standard names for the callback functions that are triggered by events.
(72) An example of the events ... XML Document: <?xml version="1.0"?> <Name> <LastName>Paik</LastName> <FirstName>Helen</FirstName> </Name>. SAX Events: start document; start element: Name; start element: LastName; characters: Paik; end element: LastName; start element: FirstName; characters: Helen; end element: FirstName; end element: Name; end document.
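The event sequence above can be reproduced with a small handler. This is a minimal sketch using the SAX parser bundled with the JDK (obtained through JAXP's SAXParserFactory); the class name SaxEventLog is made up for illustration. The document uses LastName and FirstName as element names, since XML names cannot contain spaces, and whitespace-only character data is skipped so that the log matches the list above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class SaxEventLog extends DefaultHandler {
    private final List<String> events = new ArrayList<>();

    @Override public void startDocument() { events.add("start document"); }
    @Override public void endDocument() { events.add("end document"); }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        events.add("start element: " + qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("end element: " + qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // A parser may deliver long text in several chunks; short text
        // such as "Paik" normally arrives in a single call.
        String text = new String(ch, start, length).trim();
        if (!text.isEmpty()) events.add("characters: " + text);
    }

    // Parses the given XML text and returns the recorded events in order.
    static List<String> eventsFor(String xml) {
        try {
            SaxEventLog handler = new SaxEventLog();
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
            return handler.events;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\"?><Name><LastName>Paik</LastName>"
                   + "<FirstName>Helen</FirstName></Name>";
        eventsFor(xml).forEach(System.out::println);
    }
}
```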
(73) SAX Overview Some of the main SAX interfaces we need to know: XMLReader interface: allows an application to set and query features and properties in the parser, to register event handlers, and to initiate a document parse; its main methods are setContentHandler() and parse(). ContentHandler interface: the main interface that most SAX applications implement. Attributes interface: allows access to a list of attributes. The XMLReader and Attributes interfaces are implemented by the parser. The ContentHandler interface should be implemented by the application.
(74) More on ContentHandler Interface
startDocument(): receive notification of the beginning of a document.
endDocument(): receive notification of the end of a document.
startElement(uri, localName, qName, atts): receive notification of the beginning of an element. uri is a string containing the URI of any associated namespace. localName is the name of the element, without any namespace prefix. qName is the qualified name of the element, i.e., including any namespace prefix. atts is an object (of type Attributes) that collectively represents any attributes associated with the start tag.
characters(ch, start, length): receive notification of character data.
endElement(uri, localName, qName): receive notification of the end of an element.
(75) SAX Attributes Interface
getLength(): returns the number of attributes attached to the associated element tag.
getQName(index): returns the qualified (i.e., full) name of an attribute. index is an integer indicating the position of the attribute.
getValue(qName): returns the value of an attribute. qName is the qualified name of the attribute.
getValue(index): returns the value of an attribute. index is the position of the attribute.
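The Attributes methods listed above are typically used inside startElement(). The following is a minimal sketch, assuming the JDK's built-in SAX parser; the class name SaxAttributesDemo and the "element@attribute" key format are made up for illustration.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class SaxAttributesDemo extends DefaultHandler {
    private final Map<String, String> found = new LinkedHashMap<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Walk the attribute list by position: getLength(), getQName(i), getValue(i).
        for (int i = 0; i < atts.getLength(); i++) {
            String name = atts.getQName(i);
            // atts.getValue(name) would return the same string as atts.getValue(i).
            found.put(qName + "@" + name, atts.getValue(i));
        }
    }

    // Returns every attribute in the document, keyed as "element@attribute".
    static Map<String, String> attributesIn(String xml) {
        try {
            SaxAttributesDemo handler = new SaxAttributesDemo();
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
            return handler.found;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // prints {elevation@units=feet}
        System.out.println(attributesIn("<elevation units=\"feet\">5440</elevation>"));
    }
}
```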
(76) XMLReader Interface
setContentHandler(handler): lets the programmer tell the parser which object will handle parsing events. handler is the object involved.
setFeature(name, status): allows the parser to be configured. name is the URI of the feature involved. status is a boolean value indicating whether or not the feature should be enabled.
    // Set parser to validating
    reader.setFeature("http://xml.org/sax/features/validation", true);
    // Set parser to process namespaces
    reader.setFeature("http://xml.org/sax/features/namespaces", true);
parse(source): tells the parser to commence parsing. source is the stream of characters (XML) involved.
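Put together, the three XMLReader methods might be used as follows. This is a sketch under assumptions: the reader is obtained from the JDK's SAXParserFactory (the slides do not say how the reader is created), and the class name XmlReaderDemo, the method rootName and the urn:example:books namespace are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

final class XmlReaderDemo {
    // Returns the root element as "{namespace-uri}localName".
    static String rootName(String xml) {
        StringBuilder seen = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName,
                                     Attributes atts) {
                if (seen.length() == 0) {
                    seen.append('{').append(uri).append('}').append(localName);
                }
            }
        };
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            // Configure the parser, register the handler, then start parsing.
            reader.setFeature("http://xml.org/sax/features/namespaces", true);
            reader.setContentHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return seen.toString();
    }

    public static void main(String[] args) {
        System.out.println(rootName("<p:book xmlns:p='urn:example:books'/>"));
    }
}
```

With the namespaces feature enabled, the handler receives the namespace URI and the local name separately instead of only the prefixed name p:book.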
(77) Document Object Model (DOM) DOM is an API for HTML and XML documents; its specification is developed by the W3C (http://www.w3.org/DOM/DOMTR). It defines the logical structure of documents and the way a document is accessed and manipulated. <TABLE> <TBODY> <TR> <TD>Assignment One</TD> <TD>Submission Instructions</TD> </TR> <TR> <TD>Assignment Two</TD> <TD>Submission Instructions</TD> </TR> </TBODY> </TABLE>. [Figure: the DOM tree for this table. TABLE has one TBODY child, TBODY has two TR children, and each TR has two TD children holding the text nodes "Assignment One", "Submission Instructions" and "Assignment Two", "Submission Instructions".] http://java.sun.com/j2se/1.5.0/docs/api/org/w3c/dom/package-summary.html.
(78) Using a DOM Parser (e.g., Apache Xerces)
    import org.w3c.dom.*;
    import org.apache.xerces.parsers.DOMParser;

    public class DOMCountNames {
        public static void main(String[] args) {
            try {
                DOMParser parser = new DOMParser();
                parser.parse(args[0]);
                Document doc = parser.getDocument();
                // do something ..
            } catch (Exception e) {
                e.printStackTrace(System.err);
            }
        }
    }
The DOMParser class: DOMParser is derived from the XMLParser class; the parse() method parses the input source given by a system identifier; the Document getDocument() method returns the document itself.
(79) Document Interface Methods Once you have the Document object, you can use:
Attr createAttribute(String name): creates an attribute.
Element createElement(String tagName): creates an element.
Text createTextNode(String data): creates a Text node.
Element getDocumentElement(): gets the root element of the document.
Element getElementById(String elementId): gets the element by ID.
NodeList getElementsByTagName(String tagname): returns a NodeList of all the elements with a given tag name.
NodeList Interface Methods:
int getLength(): gets the number of nodes in this list.
Node item(int index): gets the item at the specified index value in the collection.
Helen Paik (CSE, UNSW). COMP9321, 09s2. Week 2. 79 / 86.
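The creation methods listed above can be combined to build a document from scratch. The following is a minimal sketch, assuming the DOM implementation bundled with the JDK (obtained through JAXP's DocumentBuilderFactory rather than Xerces directly); the class name DomBuildDemo is made up, and the phonebook content is borrowed from the earlier slides.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class DomBuildDemo {
    // Builds <Phonebook><Entry><LastName Title="Dr">Edmond</LastName></Entry></Phonebook>.
    static Document buildPhonebook() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element root = doc.createElement("Phonebook");
        doc.appendChild(root);                      // the root element of the document

        Element entry = doc.createElement("Entry");
        root.appendChild(entry);

        Element lastName = doc.createElement("LastName");
        Attr title = doc.createAttribute("Title");
        title.setValue("Dr");
        lastName.setAttributeNode(title);
        lastName.appendChild(doc.createTextNode("Edmond"));
        entry.appendChild(lastName);
        return doc;
    }

    public static void main(String[] args) {
        Document doc = buildPhonebook();
        System.out.println(doc.getDocumentElement().getTagName());             // Phonebook
        System.out.println(doc.getElementsByTagName("LastName").getLength());  // 1
    }
}
```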
(80) Examples of Node Properties. (XML), p.9.25.
[Figure: a tree with root persons and three person children; each person has a first and a last child, holding the text Alan Wiles, Jun Li and Sue White respectively.]
document.getElementsByTagName("person")[1] → person
document.getElementsByTagName("person")[1].parentNode → persons
document.getElementsByTagName("person")[1].childNodes → first, last
document.getElementsByTagName("person")[1].firstChild → first
document.getElementsByTagName("person")[1].lastChild → last
document.getElementsByTagName("person")[1].previousSibling → person
document.getElementsByTagName("person")[1].lastChild.firstChild → Li
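The same navigation can be written with the Java DOM API, where the JavaScript properties become getter methods (parentNode becomes getParentNode(), and so on). This is a sketch under assumptions: the XML text below is a reconstruction of the persons tree with no whitespace between tags, so no whitespace-only text nodes appear among the children, and the class name NodePropertiesDemo is made up.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class NodePropertiesDemo {
    static final String XML =
        "<persons>"
      + "<person><first>Alan</first><last>Wiles</last></person>"
      + "<person><first>Jun</first><last>Li</last></person>"
      + "<person><first>Sue</first><last>White</last></person>"
      + "</persons>";

    // Equivalent of document.getElementsByTagName("person")[1].
    static Node secondPerson() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            return doc.getElementsByTagName("person").item(1);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Node p = secondPerson();
        System.out.println(p.getParentNode().getNodeName());                 // persons
        System.out.println(p.getChildNodes().getLength());                   // 2
        System.out.println(p.getFirstChild().getNodeName());                 // first
        System.out.println(p.getLastChild().getNodeName());                  // last
        System.out.println(p.getPreviousSibling().getNodeName());            // person
        System.out.println(p.getLastChild().getFirstChild().getNodeValue()); // Li
    }
}
```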
(81) Count/Print the number of 'book' elements
    import org.w3c.dom.*;
    import org.apache.xerces.parsers.DOMParser;

    public class DOMCountNames {
        public static void main(String[] args) {
            try {
                DOMParser parser = new DOMParser();
                parser.parse(args[0]);
                Document doc = parser.getDocument();
                NodeList nodelist = doc.getElementsByTagName("book");
                System.out.println(args[0] + " has " + nodelist.getLength() + " <book> elements.");
            } catch (Exception e) {
                e.printStackTrace(System.err);
            }
        }
    }
Compiling and Running
    % javac -classpath ":xerces.jar" DOMCountNames.java
    % java -classpath ":xerces.jar" DOMCountNames books.xml
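If xerces.jar is not at hand, the same count can be obtained with the parser that ships with the JDK. The sketch below swaps the Xerces DOMParser class for JAXP's DocumentBuilderFactory, which is a substitution and not what the slide uses; the class name JaxpCountBooks is made up, and the XML is passed as a string instead of a file name to keep the example self-contained.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class JaxpCountBooks {
    // Parses the XML text and counts the <book> elements anywhere in the tree.
    static int countBooks(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagName("book").getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<bibliography>"
                   + "<book><title>Foundation of Databases</title></book>"
                   + "<book><title>Database Systems</title></book>"
                   + "</bibliography>";
        System.out.println("document has " + countBooks(xml) + " <book> elements.");
    }
}
```

To read from a file as in the slide, DocumentBuilder also offers parse(File) and parse(String uri) overloads.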
(82) Dealing with Nodes in DOM In DOM, XML documents are treated as a tree of nodes. Types of nodes: twelve different kinds of node are defined by the W3C DOM standard. These are identified by the following constants:
ELEMENT_NODE = 1;
ATTRIBUTE_NODE = 2;
TEXT_NODE = 3;
CDATA_SECTION_NODE = 4;
ENTITY_REFERENCE_NODE = 5;
ENTITY_NODE = 6;
PROCESSING_INSTRUCTION_NODE = 7;
COMMENT_NODE = 8;
DOCUMENT_NODE = 9;
DOCUMENT_TYPE_NODE = 10;
DOCUMENT_FRAGMENT_NODE = 11;
NOTATION_NODE = 12;
(83) Dealing with Nodes in DOM There is a large range of methods that can be applied to the nodes. Node interface methods: getNodeName(), getNodeType(), getFirstChild(), getNextSibling(), insertBefore(...), appendChild(...), normalize(), hasAttributes(), getNodeValue(), getParentNode(), getLastChild(), getAttributes(), replaceChild(...), hasChildNodes(), isSupported(...), getLocalName(), setNodeValue(...), getChildNodes(), getPreviousSibling(), getOwnerDocument(), removeChild(...), cloneNode(...), getNamespaceURI().
(84) Dealing with Nodes Consider the following program:
    Document doc = parser.getDocument();
    Element docRoot = doc.getDocumentElement();
    String docRootName = docRoot.getTagName();
    System.out.println("Doc root: " + docRootName);
    int i = 0;
    for (Node node = docRoot.getFirstChild(); node != null; node = node.getNextSibling()) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            System.out.println(i + ": " + node.getNodeType() + node.getNodeName());
        } else {
            System.out.println(i + ": " + node.getNodeType());
        }
        i++;
    }
The method getNodeType() returns a number in the range 1 to 12. Thus we can tell which kind of node we are dealing with. What will the output of this program be?
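The answer depends on the input document, which the slide does not show. As an illustration only, the sketch below runs the same loop over a hypothetical menu document with two food elements (the food element and its type attribute appear on the next slide; everything else here, including the class name ChildWalkDemo, is an assumption). It uses the JDK's DocumentBuilderFactory instead of the Xerces DOMParser.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class ChildWalkDemo {
    // Same loop as on the slide, collecting the lines instead of printing them.
    static List<String> describeChildren(String xml) {
        List<String> lines = new ArrayList<>();
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            Element docRoot = doc.getDocumentElement();
            int i = 0;
            for (Node node = docRoot.getFirstChild(); node != null;
                 node = node.getNextSibling()) {
                if (node.getNodeType() == Node.ELEMENT_NODE) {
                    lines.add(i + ": " + node.getNodeType() + node.getNodeName());
                } else {
                    lines.add(i + ": " + node.getNodeType());
                }
                i++;
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        String xml = "<menu>\n  <food type=\"fruit\">apple</food>\n"
                   + "  <food type=\"veg\">leek</food>\n</menu>";
        // prints [0: 3, 1: 1food, 2: 3, 3: 1food, 4: 3]
        System.out.println(describeChildren(xml));
    }
}
```

The entries with type 3 are TEXT_NODE children holding the line breaks and indentation between the elements, which is why the root has five children rather than two.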
(85) The Element interface This interface outlines operations that are specific to elements:
getTagName(): returns the name of the tag associated with the element.
getAttribute(name): returns a string containing the value of an attribute. name is the name of the attribute.
Amend the code of the previous example to print out the name and value of the type attribute attached to the food element.
    Element tmp = (Element) node;
    if (tmp.getTagName().equals("food")) {
        String typeVal = tmp.getAttribute("type");
        System.out.println("type: " + typeVal);
    }
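A complete version of this amendment might look as follows. It is a sketch rather than the official solution: the menu document and the class name FoodTypeDemo are hypothetical, and the instanceof check is added because casting a whitespace Text node to Element would throw a ClassCastException.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class FoodTypeDemo {
    // Collects the value of the type attribute of every food child of the root.
    static List<String> foodTypes(String xml) {
        List<String> types = new ArrayList<>();
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            Element docRoot = doc.getDocumentElement();
            for (Node node = docRoot.getFirstChild(); node != null;
                 node = node.getNextSibling()) {
                if (node instanceof Element) {          // skip text and comment nodes
                    Element tmp = (Element) node;
                    if (tmp.getTagName().equals("food")) {
                        types.add(tmp.getAttribute("type"));
                    }
                }
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return types;
    }

    public static void main(String[] args) {
        String xml = "<menu>\n  <food type=\"fruit\">apple</food>\n"
                   + "  <food type=\"veg\">leek</food>\n</menu>";
        for (String t : foodTypes(xml)) {
            System.out.println("type: " + t);
        }
    }
}
```

Note that getAttribute() returns an empty string, not null, when the attribute is absent.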
(86) More with DOM ... Heaps of hands-on tutorials on the web ... DOM and JavaScript: e.g., http://www.sitepoint.com/print/xml-javascript-mozilla https://developer.mozilla.org/en/The_DOM_and_JavaScript DOM, XML, JavaScript and Ajax: e.g., http://www.w3schools.com/Ajax/ajax_intro.asp.