CSE 3241: XML
Extensible Markup Language
(Ch. 12)
Topics
Structured, Semistructured, and Unstructured Data
XML Hierarchical (Tree) Data Model
XML Documents
DTD (Document Type Definition)
XML Schema
Storing and Extracting XML Documents from Databases
XML Languages
Structured, Semistructured, and Unstructured Data
Structured data
◦ Represented in a strict format
◦ Example: information stored in databases
Semistructured data
◦ Has a certain structure
◦ Not all information collected will have identical structure
Structured, Semistructured, and Unstructured Data (cont’d.)
◦ Self-describing data
Schema information mixed in with data values
May be displayed as a directed graph
Labels or tags on directed edges represent:
Schema names
Names of attributes
Object types (or entity types or classes)
Relationships
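As a rough illustration of self-describing data, the sketch below stores two worker records whose field names travel with the values. The records and field names are illustrative assumptions, not part of the slides; the point is only that the two records do not share an identical structure.

```python
# Hypothetical records: each one carries its own field names,
# so the "schema" is mixed in with the data values.
workers = [
    {"ssn": "123456789", "last_name": "Smith", "hours": 32.5},
    {"ssn": "453453453", "hours": 15.5},  # no last_name recorded
]

# Collect the field names each record actually uses.
field_sets = [set(record) for record in workers]

# Fields shared by every record versus fields only some records have.
common_fields = set.intersection(*field_sets)
optional_fields = set.union(*field_sets) - common_fields

print(sorted(common_fields))
print(sorted(optional_fields))
```

A strictly structured store would force both records into the same set of columns; here the second record simply omits the field it does not have.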
Unstructured Data
Limited indication of the type of data
◦ Document that contains information embedded within it
HTML documents
◦ Do not include schema information about type of data
Static HTML page
◦ All information to be displayed explicitly spelled out as fixed text in HTML file
Unstructured Data
HTML uses a large number of predefined tags
◦ Tag
Text that appears between angled brackets:
<...>
◦ End tag
Tag with a slash: </...>
Semistructured Data
[Figure: semistructured data shown as a directed graph. Labels: Projects, Proj X, Proj Y, Workers, Workers.]
Semistructured Data: XML
Data sources
◦ Database storing data for Internet applications
Hypertext documents
◦ Common method of specifying contents and formatting of Web pages
What is XML?
XML – The eXtensible Markup Language
What’s a Markup Language?
◦ Language used to annotate a document for some purpose
◦ Uses tags that are distinguished from the content of the document to provide that annotation
◦ HTML (HyperText Markup Language) and LaTeX
Both examples of document publishing languages
Tags used to indicate formatting
Tags follow a defined structure to keep them separate from the content of the document
What is XML?
XML provides a framework to define a structure for data
◦ An XML document is a collection of related data items
◦ Document is “marked up” with tags known as elements
Elements are used to provide structure to the data
XML Hierarchical (Tree) Data Model
Elements and attributes
◦ Main structuring concepts used to construct an XML document
Complex elements
◦ Constructed from other elements hierarchically
Simple elements
◦ Contain data values
XML tag names
◦ Describe the meaning of the data elements in the document
XML Hierarchical (Tree) Data Model (cont’d.)
XML attributes
◦ Describe properties and characteristics of the elements (tags) within which they appear
May reference another element in another part of the XML document
◦ Common to use attribute values in one element as the references
The XML Data Model
Attributes vs. Elements
◦ Data can be stored as the contents of an element OR as an attribute of an element
<?xml version="1.0" standalone="yes"?>
<Projects>
  <Project number="1">
    <Name>Product X</Name>
    <Location>Bellaire</Location>
    <Dept_no>5</Dept_no>
    …
</Projects>
Why pick one over the other?
Best practice:
◦ Attributes to describe/modify the element
◦ Elements to hold the actual data values
Much like in HTML:
◦ Element (tag) contents are the data to be displayed
◦ Attributes (generally) modify/describe how it is to be displayed
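To make the attribute/element distinction concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree module. The fragment is modeled on the Projects sample; the closing Project tag is added here as an assumption so the fragment parses on its own.

```python
import xml.etree.ElementTree as ET

# Fragment modeled on the slide; </Project> is added so it parses alone.
doc = """<Projects>
  <Project number="1">
    <Name>Product X</Name>
    <Location>Bellaire</Location>
    <Dept_no>5</Dept_no>
  </Project>
</Projects>"""

root = ET.fromstring(doc)
project = root.find("Project")

# Attribute value: read from the element's attributes.
project_number = project.get("number")

# Element content: read from the text of a child element.
project_name = project.find("Name").text

print(project_number, project_name)
```

Both values come back as strings; the difference is only in where the document stores them and how a program reaches them.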
What does XML have to do with databases?
Recall: What is a database?
◦ A logically coherent collection of data with some specific meaning that has been
designed for a specific purpose.
Structured and semi-structured data files vs.
database?
◦ More practically, XML is used as a data exchange framework
Moving data from one application to another, from one database to another
Taking data from a database and turning it into a website, a report, or other human-readable document
◦ Even some implementations of “XML native” DBs
XML as the “back end” storage instead of relations
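The data exchange idea can be sketched as follows: rows that might come back from a database query are turned into an XML document for another application to consume. The rows (including the second project) and the element names are illustrative assumptions that follow the Projects sample; no real database is queried.

```python
import xml.etree.ElementTree as ET

# Hypothetical query result: (number, name, location, department number).
rows = [
    (1, "Product X", "Bellaire", 5),
    (2, "Product Y", "Sugarland", 5),
]

root = ET.Element("Projects")
for number, name, location, dept_no in rows:
    # The identifying number becomes an attribute; the data values
    # become the contents of child elements.
    project = ET.SubElement(root, "Project", number=str(number))
    ET.SubElement(project, "Name").text = name
    ET.SubElement(project, "Location").text = location
    ET.SubElement(project, "Dept_no").text = str(dept_no)

xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

The receiving application only needs an XML parser and knowledge of the element names, not access to the original database.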
The XML Data Model
XML uses a hierarchical model
◦ Also known as a tree model
Documents can be represented as trees
Each simple element contains one data value
Leaves of the tree
Complex elements can contain multiple child elements
Internal nodes of the tree
Each complex element can belong to one complex parent element
Parent node of the tree
One root element contains everything else
Root of the tree
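The tree vocabulary above can be checked with a short sketch that walks a parsed document and labels each element as root, internal (complex), or leaf (simple). The document string is a shortened version of the Projects sample, and the helper name describe is hypothetical.

```python
import xml.etree.ElementTree as ET

doc = """<Projects>
  <Project number="1">
    <Name>Product X</Name>
    <Workers>
      <Worker>
        <Ssn>123456789</Ssn>
        <Hours>32.5</Hours>
      </Worker>
    </Workers>
  </Project>
</Projects>"""

def describe(element, depth=0):
    """Return one indented line per element, labeled by its role in the tree."""
    if depth == 0:
        kind = "root"
    elif len(element) > 0:        # has child elements
        kind = "internal (complex)"
    else:                         # holds only a data value
        kind = "leaf (simple)"
    lines = ["  " * depth + f"{element.tag}: {kind}"]
    for child in element:
        lines.extend(describe(child, depth + 1))
    return lines

tree_lines = describe(ET.fromstring(doc))
print("\n".join(tree_lines))
```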
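A sample XML tree
• Internal nodes are complex elements
• Leaf nodes are simple elements
• The root node is the root element
• Root element contains all other elements within it
[Figure: tree for the sample document. The root Projects has Project children. Project number="1" has children Name ("Product X"), Location ("Bellaire"), Dept_no ("5"), and Workers. Workers has two Worker children: the first has Ssn ("123456789"), Last_name ("Smith"), and Hours ("32.5"); the second has Ssn ("453453453") and Hours ("15.5"). The other Project nodes are shown without detail.]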
A sample XML tree (cont’d.)
The same tree written as an XML document:
<?xml version="1.0" standalone="yes"?>
<Projects>
  <Project number="1">
    <Name>Product X</Name>
    <Location>Bellaire</Location>
    <Dept_no>5</Dept_no>
    <Workers>
      <Worker>
        <Ssn>123456789</Ssn>
        <Last_name>Smith</Last_name>
        <Hours>32.5</Hours>
      </Worker>
      <Worker>
        <Ssn>453453453</Ssn>
        <Hours>15.5</Hours>
      </Worker>
    </Workers>
  </Project>
  …
</Projects>
A sample of XML
Walking through the sample document above, piece by piece:
XML declaration
◦ <?xml version="1.0" standalone="yes"?>
Root element
◦ <Projects> is the beginning of the root element
◦ </Projects> is the end of the root element
First child element of root
◦ <Project number="1">
◦ Other child elements are possible in here; they do not even need to be “Project” elements necessarily
◦ The first Project element has an attribute named number with a value of “1”
Children of the Project element where number=“1”
◦ First child: simple element with a name of “Name” and a value of “Product X”
◦ Second child: simple element with a name of “Location” and a value of “Bellaire”
◦ Third child: simple element with a name of “Dept_no” and a value of “5”
◦ Fourth child: complex element with a name of “Workers”
First child element of Projects/Project[@number="1"]/Workers
◦ Complex element with a name of “Worker”
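The path Projects/Project[@number="1"]/Workers can be followed programmatically. The sketch below uses the limited XPath subset supported by Python's xml.etree.ElementTree; the document string repeats the sample with the elided part left out and the Last_name end tag matched to its start tag.

```python
import xml.etree.ElementTree as ET

doc = """<?xml version="1.0" standalone="yes"?>
<Projects>
  <Project number="1">
    <Name>Product X</Name>
    <Location>Bellaire</Location>
    <Dept_no>5</Dept_no>
    <Workers>
      <Worker>
        <Ssn>123456789</Ssn>
        <Last_name>Smith</Last_name>
        <Hours>32.5</Hours>
      </Worker>
      <Worker>
        <Ssn>453453453</Ssn>
        <Hours>15.5</Hours>
      </Worker>
    </Workers>
  </Project>
</Projects>"""

root = ET.fromstring(doc)  # root is the <Projects> element

# Path is relative to the root: pick the Project whose number attribute is 1.
workers = root.findall("./Project[@number='1']/Workers/Worker")

first_ssn = workers[0].find("Ssn").text

# The second Worker has no Last_name child, so find() returns None.
second_last_name = workers[1].find("Last_name")

# Element text is always a string; convert before doing arithmetic.
total_hours = sum(float(w.find("Hours").text) for w in workers)

print(len(workers), first_ssn, second_last_name, total_hours)
```

The missing Last_name on the second worker is the semistructured aspect in practice: code that reads the document has to tolerate absent children.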
XML Hierarchical (Tree) Data Model (cont’d.)
Tree model or hierarchical model
Main types of XML documents
◦ Data-centric XML documents
◦ Document-centric XML documents
◦ Hybrid XML documents
Schemaless XML documents
◦ Do not follow a predefined schema of element names and corresponding tree structure
XML Document Types – Data Centric XML
Data-centric XML
◦ Highly structured
◦ Many small data items
◦ Often used for data exchange purposes
Transfer data from one system to another
◦ Also used to create web pages dynamically from databases
◦ Generally follow a schema document that determines their structure
XML Document Types – Document-Centric XML
Few structural elements
Large amounts of text
◦ Articles, blog entries, books
May have a schema document, but not required
◦ Schema may be very limited in semantics
What’s a title?
What’s a chapter?
What’s a paragraph?
More XML Document Types
Hybrid XML
◦ Some parts are highly structured
◦ Some parts mostly blocks of text and/or unstructured
◦ May or may not have a predefined schema
Schemaless XML documents
◦ Semi-structured documents without a predefined schema
◦ Denoted by the attribute standalone="yes" in the XML declaration on the top line
Valid XML
An XML document is considered valid if:
◦ It is well-formed
◦ And… (to be continued after the definition of well-formed XML)
Well-formed XML
An XML document is well-formed when it follows certain conditions:
◦ It should start with an XML declaration line (recommended, though XML 1.0 does not strictly require it):
<?xml version="1.0" standalone="yes"?>
◦ It must form a tree:
Must start with a single root element
Every child element must have start and end tags that are contained completely within a parent element:
Good:
<parent>
  <child>
  </child>
</parent>
Bad:
<parent>
  <child>
</parent>
</child>
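The nesting rule can be tested with any XML parser, since a parser rejects input that is not well-formed. This is a minimal sketch using Python's xml.etree.ElementTree; the helper name is_well_formed is hypothetical. Note that this parser checks the tree structure and does not insist on an XML declaration.

```python
import xml.etree.ElementTree as ET

good = "<parent><child></child></parent>"
bad = "<parent><child></parent></child>"   # child is closed outside its parent

def is_well_formed(text):
    """Return True if the parser accepts the text as a single XML tree."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

print(is_well_formed(good), is_well_formed(bad))
```

A well-formedness check says nothing about validity: it confirms the tags nest properly, not that the right elements are present.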
Valid XML
An XML document is considered valid if:
◦ It is well-formed, and …
◦ It follows a particular schema in a standard definition language
A DTD document (Document Type Definition)
An XML schema document
◦ DTDs are the original, older technology
◦ XML schema documents are the “new” hotness
First published in 2001
DTD – Document Type Definition
Original method of specifying a schema definition
◦ Still in widespread use
A very simple schema definition language
◦ Each possible element in the document is defined
What children must it have?
What children can it (optionally) have?
What kinds of attributes can/must it have?
If it is a leaf element, what kinds of values can it have?
XML Documents, DTD, and XML Schema (cont’d.)
Notation for specifying elements
XML DTD
◦ Data types in DTD are not very general
◦ Special syntax
Requires specialized processors
◦ All DTD elements always forced to follow the specified ordering of the document
Unordered elements not permitted
A sample XML document and DTD
<?xml version="1.0" standalone="no"?>
<!DOCTYPE Projects SYSTEM "proj.dtd">
<Projects>
  <Project number="1">
    <Name>Product X</Name>
    <Location>Bellaire</Location>
    <Dept_no>5</Dept_no>
    <Workers>
      <Worker>
        <Ssn>123456789</Ssn>
        <Last_name>Smith</Last_name>
        <Hours>32.5</Hours>
      </Worker>
      <Worker>
        <Ssn>453453453</Ssn>
        <Hours>15.5</Hours>
      </Worker>
    </Workers>
  </Project>
  …
</Projects>
We declare that we want to use a DTD by putting the DOCTYPE declaration at the top of our XML file
◦ !DOCTYPE: keyword
◦ Projects: the name of our DTD’s root node
◦ SYSTEM: indicates that this is an external DTD
◦ "proj.dtd": the filename (or URL)
A sample XML document and DTD (cont’d.)
The DTD for the sample document:
<!ELEMENT Projects (Project+)>
<!ELEMENT Project (Name, Location, Dept_no?, Workers)>
<!ATTLIST Project number ID #REQUIRED>
<!ELEMENT Name (#PCDATA)>
<!ELEMENT Location (#PCDATA)>
<!ELEMENT Dept_no (#PCDATA)>
<!ELEMENT Workers (Worker*)>
<!ELEMENT Worker (Ssn, Last_name?, First_name?, Hours)>
<!ELEMENT Ssn (#PCDATA)>
<!ELEMENT Last_name (#PCDATA)>
<!ELEMENT First_name (#PCDATA)>
<!ELEMENT Hours (#PCDATA)>
A sample DTD
Walking through the DTD above, declaration by declaration:
<!ELEMENT Projects (Project+)>
◦ Root element comes first (by convention; the order of declarations is not significant)
◦ Projects is the name of the element
◦ (Project+) is the list of children
List of children uses a regular expression-like syntax:
◦ + indicates 1 or more of this child
◦ * indicates 0 or more of this child
◦ ? indicates 0 or 1 of this child
◦ No symbol indicates exactly one child
So (Project+) indicates that 1 or more Project children are required
<!ELEMENT Project (Name, Location, Dept_no?, Workers)>
◦ Dept_no? indicates that Dept_no is an optional field, but there can be only one of them
<!ATTLIST Project number ID #REQUIRED>
◦ Project has an attribute named “number”
◦ Its “type” is a unique ID
This can be used to refer to this child by other elements, like a primary key
Note: an ID value must be a valid XML name, which cannot start with a digit, so a validating parser would reject number="1"; a value such as "P1" would be accepted
◦ #REQUIRED: this attribute must exist on all Project children
<!ELEMENT Name (#PCDATA)>
◦ Name is a “leaf node”
◦ #PCDATA means that it holds “parsed character data”
◦ It will contain a value of some kind between its start and end tag (even an empty value counts as a value for the DTD)
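Because the list of children reads like a regular expression, one way to get a feel for it is to translate a child list into an actual regular expression and test sequences of child names against it. This is a teaching sketch, not how a validating parser is implemented: the function names are hypothetical, and only flat comma-separated lists with +, *, and ? are handled (no nested groups or | choices).

```python
import re

def content_model_to_regex(model):
    """Translate a flat DTD child list such as '(Name, Dept_no?)' to a regex."""
    parts = []
    for item in model.strip("() ").split(","):
        item = item.strip()
        suffix = item[-1] if item[-1] in "+*?" else ""
        name = item.rstrip("+*?")
        # Each child is encoded as its name followed by one space.
        parts.append(f"(?:{re.escape(name)} ){suffix}")
    return re.compile("".join(parts))

def children_match(model, child_names):
    """Check whether a sequence of child element names fits the child list."""
    sequence = "".join(name + " " for name in child_names)
    return content_model_to_regex(model).fullmatch(sequence) is not None

project_model = "(Name, Location, Dept_no?, Workers)"
print(children_match(project_model, ["Name", "Location", "Workers"]))
print(children_match(project_model, ["Location", "Name", "Workers"]))
print(children_match("(Project+)", []))
```

The second call fails because the children appear out of order, and the third fails because + requires at least one Project child.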
DTD Limitations
Data types in DTD are not general
◦ Leaf elements hold PCDATA values, which are just strings
◦ DTD has its own syntax
A DTD is not itself written in XML, so it needs a special parser
Can’t leverage existing XML tools to process the DTD itself
◦ All elements must follow the ordering laid out
Unordered elements not allowed
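The data type limitation shows up as soon as a program reads the values. Since #PCDATA is just text, a DTD cannot require that Hours be numeric, so the application has to convert and check the value itself. A minimal sketch, with a hypothetical helper parse_hours and a deliberately non-numeric second fragment:

```python
import xml.etree.ElementTree as ET

worker = ET.fromstring(
    "<Worker><Ssn>123456789</Ssn><Hours>32.5</Hours></Worker>"
)
# Same structure, so (#PCDATA) would accept it, but the value is not a number.
odd_worker = ET.fromstring(
    "<Worker><Ssn>453453453</Ssn><Hours>lots</Hours></Worker>"
)

raw_hours = worker.find("Hours").text   # always a string

def parse_hours(worker_element):
    """Return Hours as a float, or None if the text is not numeric."""
    text = worker_element.find("Hours").text
    try:
        return float(text)
    except (TypeError, ValueError):
        return None

print(type(raw_hours).__name__, parse_hours(worker), parse_hours(odd_worker))
```

XML Schema addresses this by letting the schema declare data types for element values, which moves this kind of check out of application code.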