CSE 3241: XML
Extensible Markup Language (Ch. 12)
Topics
⚫ Structured, Semistructured, and Unstructured Data
⚫ XML Hierarchical (Tree) Data Model
⚫ XML Documents
⚫ DTD (Document Type Definition)
⚫ XML Schema
⚫ Storing and Extracting XML Documents from Databases
⚫ XML Languages
Structured, Semistructured, and Unstructured Data
⚫ Structured data
◦ Represented in a strict format
◦ Example: information stored in databases
⚫ Semistructured data
◦ Has a certain structure
◦ Not all information collected will have
identical structure
Structured, Semistructured,
and Unstructured Data (cont’d.)
◦ Self-describing data
● Schema information mixed in with data values
● May be displayed as a directed graph
● Labels or tags on directed edges represent:
● Schema names
● Names of attributes
● Object types (or entity types or classes)
● Relationships
Unstructured Data
⚫ Limited indication of the type of data
◦ Example: a document that contains information embedded within it
⚫ HTML documents
◦ Do not include schema information about type of data
⚫ Static HTML page
◦ All information to be displayed explicitly
spelled out as fixed text in HTML file
Unstructured Data
⚫ HTML uses a large number of predefined tags
◦ Tag
● Text that appears between angled brackets: <...>
◦ End tag
● Tag with a slash: </...>
[Figure: Semistructured Data. A directed graph with nodes Projects, Proj X, Proj Y, and two Workers nodes.]
SemiStructured Data: XML
⚫ Data sources
◦ Database storing data for Internet applications
⚫ Hypertext documents
◦ Common method of specifying contents and
formatting of Web pages
What is XML?
⚫ XML – The eXtensible Markup Language
⚫ What’s a Markup Language?
◦ Language used to annotate a document for some purpose
◦ Uses tags that are distinguished from the content of the document to provide that annotation
◦ HTML (HyperText Markup Language) and LaTeX
● Both examples of document publishing languages
● Tags used to indicate formatting
● Tags follow a defined structure to keep them separate from
the content of the document
What is XML?
⚫ XML provides a framework to define a structure for data
◦ An XML document is a collection of related data items
◦ Document is “marked up” with tags known as elements
● Elements are used to provide structure to the data
XML Hierarchical (Tree) Data Model
⚫ Elements and attributes
◦ Main structuring concepts used to construct an XML document
⚫ Complex elements
◦ Constructed from other elements hierarchically
⚫ Simple elements
◦ Contain data values
⚫ XML tag names
◦ Describe the meaning of the data elements in the
document
XML Hierarchical (Tree) Data Model (cont’d.)
⚫ XML attributes
◦ Describe properties and characteristics of the elements (tags) within which they appear
⚫ May reference another element in another part of the XML document
◦ Common to use attribute values in one
element as the references
The XML Data Model
⚫ Attributes vs. Elements
◦ Data can be stored as the
contents of an element OR as an attribute of an element
<?xml version="1.0" standalone="yes"?>
<Projects>
  <Project number="1">
    <Name>Product X</Name>
    <Location>Bellaire</Location>
    <Dept_no>5</Dept_no>
    …
</Projects>
Why pick one over the other?
Best practice:
Attributes describe/modify the element
Elements hold the actual data values
Much like in HTML:
Element (tag) contents are the data to be displayed
Attributes (generally) modify/describe how it is to be
displayed
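A minimal sketch of the difference, using Python's standard xml.etree.ElementTree module (the choice of Python is ours, not the slides'). The sample is the one above, with the closing </Project> tag filled in as an assumption so that it parses.

```python
import xml.etree.ElementTree as ET

# The sample from the slide; the closing </Project> is an assumption.
doc = """<?xml version="1.0" standalone="yes"?>
<Projects>
  <Project number="1">
    <Name>Product X</Name>
    <Location>Bellaire</Location>
    <Dept_no>5</Dept_no>
  </Project>
</Projects>"""

root = ET.fromstring(doc)
project = root.find("Project")

number = project.get("number")      # data stored as an attribute
name = project.find("Name").text    # data stored as element content
print(number, name)
```

Either way the value comes back as a string; the difference is only in where the document stores it.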
What does XML have to do with databases?
⚫ Recall: What is a database?
◦ A logically coherent collection of data with some
specific meaning that has been designed for a specific purpose.
● Structured and semi-structured data files vs. database?
◦ More practically, XML is used as a data exchange framework
● Moving data from one application to another, from one database to another
● Taking data from a database and turning it into a website, a report, or other human readable document
◦ Even some implementations of “XML native” DBs
● XML as the “back end” storage instead of relations
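As a rough illustration of the data exchange use, the sketch below turns database-style rows into XML with Python's standard library. The rows list and the rows_to_xml helper are hypothetical stand-ins for a real query result and export routine.

```python
import xml.etree.ElementTree as ET

# Hypothetical rows, shaped like a query result:
# (project number, name, location, department number)
rows = [
    ("1", "Product X", "Bellaire", "5"),
]

def rows_to_xml(rows):
    """Build a <Projects> document with one <Project> element per row."""
    root = ET.Element("Projects")
    for number, name, location, dept_no in rows:
        project = ET.SubElement(root, "Project", number=number)
        ET.SubElement(project, "Name").text = name
        ET.SubElement(project, "Location").text = location
        ET.SubElement(project, "Dept_no").text = dept_no
    return ET.tostring(root, encoding="unicode")

xml_text = rows_to_xml(rows)
print(xml_text)
```

The receiving application only needs an XML parser, not access to the source database.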
The XML Data Model
⚫ XML uses a hierarchical model
◦ Also known as a tree model
◦ Documents can be represented as trees
⚫ Each simple element contains one data value
◦ Leaves of the tree
⚫ Complex elements can contain multiple child elements
◦ Internal nodes of the tree
⚫ Each complex element can belong to one complex parent element
◦ Parent node of the tree
⚫ One root element contains everything else
◦ Root of the tree
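The tree view can be sketched in a few lines of Python (standard library only). The classify function is a hypothetical helper; the document is the course's Projects sample.

```python
import xml.etree.ElementTree as ET

doc = """<Projects>
  <Project number="1">
    <Name>Product X</Name>
    <Location>Bellaire</Location>
    <Dept_no>5</Dept_no>
    <Workers>
      <Worker>
        <Ssn>123456789</Ssn>
        <Last_name>Smith</Last_name>
        <Hours>32.5</Hours>
      </Worker>
      <Worker>
        <Ssn>453453453</Ssn>
        <Hours>15.5</Hours>
      </Worker>
    </Workers>
  </Project>
</Projects>"""

def classify(element, depth=0, out=None):
    """Walk the tree depth-first, labelling each element root, internal, or leaf."""
    if out is None:
        out = []
    if depth == 0:
        kind = "root"
    elif len(element) > 0:      # has child elements: a complex element
        kind = "internal"
    else:                       # no child elements: a simple element
        kind = "leaf"
    out.append((depth, element.tag, kind))
    for child in element:
        classify(child, depth + 1, out)
    return out

nodes = classify(ET.fromstring(doc))
for depth, tag, kind in nodes:
    print("  " * depth + f"{tag} ({kind})")
```

The indentation of the printed output mirrors the nesting of the tags in the document.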
A sample XML tree
• Internal nodes are complex elements
• Leaf nodes are simple elements
• The root node is the root element
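• Root element contains all other elements within it
[Figure: tree for the sample document. Root: Projects, with Project children. Project (number=“1”) has children Name (“Product X”), Location (“Bellaire”), Dept_no (“5”), and Workers. Workers has two Worker children: one with Ssn “123456789”, Last_name “Smith”, Hours “32.5”; one with Ssn “453453453”, Hours “15.5”.]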
A sample XML tree
<?xml version="1.0" standalone="yes"?>
<Projects>
  <Project number="1">
    <Name>Product X</Name>
    <Location>Bellaire</Location>
    <Dept_no>5</Dept_no>
    <Workers>
      <Worker>
        <Ssn>123456789</Ssn>
        <Last_name>Smith</Last_name>
        <Hours>32.5</Hours>
      </Worker>
      <Worker>
        <Ssn>453453453</Ssn>
        <Hours>15.5</Hours>
      </Worker>
    </Workers>
  </Project>
  …
</Projects>
[Figure: tree representation of this document, rooted at Projects]
A sample of XML
<?xml version="1.0" standalone="yes"?>
<Projects>
  <Project number="1">
    <Name>Product X</Name>
    <Location>Bellaire</Location>
    <Dept_no>5</Dept_no>
    <Workers>
      <Worker>
        <Ssn>123456789</Ssn>
        <Last_name>Smith</Last_name>
        <Hours>32.5</Hours>
      </Worker>
      <Worker>
        <Ssn>453453453</Ssn>
        <Hours>15.5</Hours>
      </Worker>
    </Workers>
  </Project>
  …
</Projects>
A sample of XML
XML declaration (the first line): <?xml version="1.0" standalone="yes"?>
A sample of XML
Root element: <Projects> … </Projects>
A sample of XML
Beginning of root element: <Projects>
End of root element: </Projects>
A sample of XML
First child element of root: <Project number="1"> … </Project>
(Other child elements are possible in here; they do not even need to be “Project” elements)
A sample of XML
The first Project element has an attribute named number with a value of “1”: <Project number="1">
A sample of XML
First child element of the Project element where number=“1”: <Name>Product X</Name>
Simple element with a name of “Name” and a value of “Product X”
A sample of XML
Second child element of the Project element where number=“1”: <Location>Bellaire</Location>
Simple element with a name of “Location” and a value of “Bellaire”
A sample of XML
Third child element of the Project element where number=“1”: <Dept_no>5</Dept_no>
Simple element with a name of “Dept_no” and a value of “5”
A sample of XML
Fourth child element of the Project element where number=“1”: <Workers> … </Workers>
Complex element with a name of “Workers”
A sample of XML
First child element of Projects/Project[number=“1”]/Workers: <Worker> … </Worker>
Complex element with a name of “Worker”
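A path like the one above can be followed programmatically. This is a small sketch with Python's xml.etree.ElementTree, which supports a limited subset of XPath; in that subset the attribute test is written [@number='1'] rather than the slide's informal [number=“1”].

```python
import xml.etree.ElementTree as ET

doc = """<Projects>
  <Project number="1">
    <Name>Product X</Name>
    <Location>Bellaire</Location>
    <Dept_no>5</Dept_no>
    <Workers>
      <Worker>
        <Ssn>123456789</Ssn>
        <Last_name>Smith</Last_name>
        <Hours>32.5</Hours>
      </Worker>
      <Worker>
        <Ssn>453453453</Ssn>
        <Hours>15.5</Hours>
      </Worker>
    </Workers>
  </Project>
</Projects>"""

root = ET.fromstring(doc)   # root is the <Projects> element

# Paths are relative to the root, so "Projects/" is not repeated here.
workers = root.find("Project[@number='1']/Workers")
first_worker = workers.find("Worker")   # first child named Worker
ssns = [w.findtext("Ssn") for w in workers.findall("Worker")]

print(first_worker.findtext("Last_name"), ssns)
```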
XML Hierarchical (Tree) Data Model (cont’d.)
⚫ Tree model or hierarchical model
⚫ Main types of XML documents
◦ Data-centric XML documents
◦ Document-centric XML documents
◦ Hybrid XML documents
⚫ Schemaless XML documents
◦ Do not follow a predefined schema of
element names and corresponding tree
structure
XML Document Types – Data Centric XML
⚫ Data-centric XML
◦ Highly structured
◦ Many small data items
◦ Often used for data exchange purposes
● Transfer data from one system to another
◦ Also used to create web pages dynamically from databases
◦ Generally follow a schema document that
determines their structure
XML Document Types – Document-Centric XML
⚫ Few structural elements
⚫ Large amounts of text
◦ Articles, blog entries, books
⚫ May have a schema document, but not required
◦ Schema may be very limited in semantics
● What’s a title?
● What’s a chapter?
● What’s a paragraph?
More XML Document Types
⚫ Hybrid XML
◦ Some parts are highly structured
◦ Some parts mostly blocks of text and/or unstructured
◦ May or may not have a predefined schema
⚫ Schemaless XML documents
◦ Semi-structured documents without a predefined schema
◦ Denoted by the attribute ‘standalone=“yes”’
in the XML declaration on the top line
Valid XML
⚫ An XML document is considered valid if:
◦ It is well-formed
◦ And… (to be continued after the definition of well-formed XML)
Well-formed XML
⚫ An XML document is well-formed when it follows certain conditions:
◦ It should start with an XML declaration line (the XML 1.0 specification recommends this but does not strictly require it):
<?xml version="1.0" standalone="yes"?>
◦ It must form a tree:
● Must start with a single root element
● Every child element must have start and end tags that are contained completely within a parent element:
Good:                 Bad:
<parent>              <parent>
  <child>               <child>
  </child>            </parent>
</parent>             </child>
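One way to test well-formedness in practice is to hand the text to any XML parser and see whether it raises an error. A minimal sketch with Python's standard library follows; is_well_formed is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts the text, False on a syntax error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

declaration = '<?xml version="1.0" standalone="yes"?>\n'

good = declaration + "<parent><child></child></parent>"
bad = declaration + "<parent><child></parent></child>"   # tags overlap
two_roots = declaration + "<a></a><b></b>"               # no single root

print(is_well_formed(good), is_well_formed(bad), is_well_formed(two_roots))
```

Note that this only checks syntax. It says nothing about validity, which requires a schema.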
Valid XML
⚫ An XML document is considered valid if:
◦ It is well-formed, and …
◦ It follows a particular schema in a standard definition language
● A DTD document (Document Type Definition)
● An XML schema document
◦ DTDs are the original, older technology
◦ XML schema documents are the “new” hotness
● First published in 2001
DTD – Document Type Definition
⚫ Original method of specifying a schema definition
◦ Still in widespread use
⚫ A very simple schema definition language
◦ Each possible element in the document is defined
● What children must it have?
● What children can it (optionally) have?
● What kinds of attributes can/must it have?
● If it is a leaf element, what kinds of values can it
have?
XML Documents, DTD, and XML Schema (cont’d.)
⚫ Notation for specifying elements
⚫ XML DTD
◦ Data types in DTD are not very general
◦ Special syntax
● Requires specialized processors
◦ All DTD elements always forced to follow the specified ordering of the document
● Unordered elements not permitted
A sample XML document and DTD
<?xml version="1.0" standalone="no"?>
<!DOCTYPE Projects SYSTEM "proj.dtd">
<Projects>
  <Project number="1">
    <Name>Product X</Name>
    <Location>Bellaire</Location>
    <Dept_no>5</Dept_no>
    <Workers>
      <Worker>
        <Ssn>123456789</Ssn>
        <Last_name>Smith</Last_name>
        <Hours>32.5</Hours>
      </Worker>
      <Worker>
        <Ssn>453453453</Ssn>
        <Hours>15.5</Hours>
      </Worker>
    </Workers>
  </Project>
  …
</Projects>
We declare that we want to use a DTD by putting the DOCTYPE declaration at the top of our XML file:
◦ !DOCTYPE: keyword
◦ Projects: the name of our DTD’s root element
◦ SYSTEM: indicates that this is an external DTD
◦ “proj.dtd”: the filename (or URL)
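The parts of the DOCTYPE declaration can be read back with Python's xml.dom.minidom. This is only a sketch: minidom records the declaration but does not load proj.dtd or validate against it, and the shortened document below is an assumption made for brevity.

```python
from xml.dom import minidom

doc = """<?xml version="1.0" standalone="no"?>
<!DOCTYPE Projects SYSTEM "proj.dtd">
<Projects>
  <Project number="1">
    <Name>Product X</Name>
    <Location>Bellaire</Location>
    <Workers/>
  </Project>
</Projects>"""

dom = minidom.parseString(doc)
doctype = dom.doctype

print(doctype.name)       # root element name given after !DOCTYPE
print(doctype.systemId)   # the SYSTEM identifier (file name or URL)
print(dom.documentElement.tagName)
```

A validating parser would additionally check that the document's root element matches the name in the DOCTYPE.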
A sample XML document and DTD
<!ELEMENT Projects (Project+)>
<!ELEMENT Project (Name, Location, Dept_no?, Workers)>
<!ATTLIST Project number ID #REQUIRED>
<!ELEMENT Name (#PCDATA)>
<!ELEMENT Location (#PCDATA)>
<!ELEMENT Dept_no (#PCDATA)>
<!ELEMENT Workers (Worker*)>
<!ELEMENT Worker (Ssn, Last_name?, First_name?, Hours)>
<!ELEMENT Ssn (#PCDATA)>
<!ELEMENT Last_name (#PCDATA)>
<!ELEMENT First_name (#PCDATA)>
<!ELEMENT Hours (#PCDATA)>
A sample DTD
The root element is declared first: <!ELEMENT Projects (Project+)>
A sample DTD
Name of element: Projects, in <!ELEMENT Projects (Project+)>
A sample DTD
List of children: (Project+)
Regular expression-like syntax:
+ indicates 1 or more of this child
* indicates 0 or more of this child
? indicates 0 or 1 of this child
No symbol indicates exactly one child
So <!ELEMENT Projects (Project+)> indicates that 1 or more Project children are required
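Because the notation is regular expression-like, a flat content model can be translated into an actual regular expression over the sequence of child names. The sketch below is a toy, not a DTD validator: it assumes a plain comma-separated list with optional +, *, or ? suffixes, and the function names are hypothetical.

```python
import re

def content_model_to_regex(model):
    """Translate e.g. '(Name, Location, Dept_no?, Workers)' into a regex
    that matches a string of child names, each followed by one space."""
    parts = []
    for item in model.strip().strip("()").split(","):
        item = item.strip()
        suffix = item[-1] if item[-1] in "+*?" else ""   # no symbol: exactly one
        name = item.rstrip("+*?")
        parts.append(f"(?:{re.escape(name)} ){suffix}")
    return re.compile("".join(parts))

def children_match(model, child_names):
    """Check an ordered list of child element names against a content model."""
    text = "".join(name + " " for name in child_names)
    return content_model_to_regex(model).fullmatch(text) is not None

project_model = "(Name, Location, Dept_no?, Workers)"
print(children_match(project_model, ["Name", "Location", "Workers"]))
print(children_match(project_model, ["Location", "Name", "Workers"]))
print(children_match("(Project+)", []))
```

The second call fails because the children are out of order, and the third fails because + requires at least one Project.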
A sample DTD
List of children: (Name, Location, Dept_no?, Workers)
Dept_no? indicates that Dept_no is an optional child, but there can be at most one of them
A sample DTD
Project has an attribute named “number”:
<!ATTLIST Project number ID #REQUIRED>
A sample DTD
Project has an attribute named “number”
Its “type” is a unique ID
This can be used by other elements to refer to this element, like a primary key
A sample DTD
Project has an attribute named “number”
#REQUIRED means this attribute must exist on all Project children
Note: ID values must be valid XML names, which cannot begin with a digit, so a validating parser would reject number=“1”; a value such as “P1” would be accepted
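The checks implied by ID #REQUIRED can be sketched by hand in Python. This is a simplification under stated assumptions: the name test covers ASCII names only, uniqueness is checked for one attribute on one element type (a real validator checks all ID attributes in the document together), and check_ids and the test document are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Simplified XML name test: a letter or underscore, then name characters.
NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")

def check_ids(root, tag="Project", attr="number"):
    """Report missing, malformed, or duplicate ID values, in document order."""
    problems = []
    seen = set()
    for element in root.iter(tag):
        value = element.get(attr)
        if value is None:
            problems.append("missing required attribute")
        elif not NAME.fullmatch(value):
            problems.append(f"{value!r} is not a valid name")
        elif value in seen:
            problems.append(f"{value!r} is not unique")
        else:
            seen.add(value)
    return problems

doc = """<Projects>
  <Project number="P1"/>
  <Project number="P1"/>
  <Project number="1"/>
  <Project/>
</Projects>"""

problems = check_ids(ET.fromstring(doc))
print(problems)
```

The uniqueness rule is what makes the attribute usable like a primary key.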
A sample DTD
Name is a “leaf node”
#PCDATA means that it holds “parsed character data”
It will contain a value of some kind between its start and end tags (even an empty value counts as a value for the DTD)
DTD Limitations
⚫ Data types in DTD are not general
◦ Child nodes hold PCDATA values – strings
◦ DTD has its own syntax
● Need to write a special parser for it
● Can’t leverage existing XML parsers to do DTD parsing
◦ All elements must follow the ordering laid out
● Unordered elements not allowed
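The first limitation shows up as soon as an application reads the data: every #PCDATA value arrives as a string, so numeric conversion and any type checking are left to the application. A short sketch with Python's standard library, using the Hours values from the sample document:

```python
import xml.etree.ElementTree as ET

doc = """<Workers>
  <Worker>
    <Ssn>123456789</Ssn>
    <Last_name>Smith</Last_name>
    <Hours>32.5</Hours>
  </Worker>
  <Worker>
    <Ssn>453453453</Ssn>
    <Hours>15.5</Hours>
  </Worker>
</Workers>"""

root = ET.fromstring(doc)

# The DTD can only say that Hours holds character data, so we get strings.
hours_text = [w.findtext("Hours") for w in root.iter("Worker")]

# Converting to numbers is the application's job; a DTD would not
# reject <Hours>lots</Hours>, and float() would raise ValueError here.
total_hours = sum(float(h) for h in hours_text)
print(hours_text, total_hours)
```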