CSE 3241: XML Extensible Markup Language (Ch. 12) 1

Academic year: 2021

(1)

CSE 3241: XML

Extensible Markup Language (Ch. 12)


(2)

Topics

⚫ Structured, Semistructured, and Unstructured Data

⚫ XML Hierarchical (Tree) Data Model

⚫ XML Documents

⚫ DTD (Document Type Definition)

⚫ XML Schema

⚫ Storing and Extracting XML Documents from Databases

⚫ XML Languages

(3)

Structured, Semistructured, and Unstructured Data

Structured data

◦ Represented in a strict format

◦ Example: information stored in databases

Semistructured data

◦ Has a certain structure

◦ Not all information collected will have identical structure

(4)

Structured, Semistructured, and Unstructured Data (cont’d.)

Self-describing data

● Schema information mixed in with data values

● May be displayed as a directed graph

Labels or tags on directed edges represent:

● Schema names

● Names of attributes

● Object types (or entity types or classes)

● Relationships

(5)

Unstructured Data

⚫ Limited indication of the type of data

◦ Example: a document that contains information embedded within it

HTML documents

◦ Do not include schema information about type of data

Static HTML page

◦ All information to be displayed explicitly spelled out as fixed text in HTML file

(6)

Unstructured Data

⚫ HTML uses a large number of predefined tags

Tag

● Text that appears between angle brackets: <...>

End tag

● Tag with a slash: </...>

(7)

[Figure: directed graph of semistructured project data; recoverable labels: Projects, Proj X, Proj Y, Workers]

(8)

Semistructured Data

(9)

Semistructured Data: XML

Data sources

◦ Database storing data for Internet applications

Hypertext documents

◦ Common method of specifying contents and formatting of Web pages

(10)

What is XML?

⚫ XML – The eXtensible Markup Language

⚫ What’s a Markup Language?

◦ Language used to annotate a document for some purpose

◦ Uses tags that are distinguished from the content of the document to provide that annotation

◦ HTML (HyperText Markup Language) and LaTeX

● Both examples of document publishing languages

● Tags used to indicate formatting

● Tags follow a defined structure to keep them separate from the content of the document

(11)

What is XML?

⚫ XML provides a framework to define a structure for data

◦ An XML document is a collection of related data items

◦ Document is “marked up” with tags known as elements

● Elements are used to provide structure to the data


(12)

XML Hierarchical (Tree) Data Model

Elements and attributes

◦ Main structuring concepts used to construct an XML document

Complex elements

◦ Constructed from other elements hierarchically

Simple elements

◦ Contain data values

⚫ XML tag names

◦ Describe the meaning of the data elements in the document

(13)

XML Hierarchical (Tree) Data Model (cont’d.)

⚫ XML attributes

◦ Describe properties and characteristics of the elements (tags) within which they appear

◦ May reference another element in another part of the XML document

◦ Common to use attribute values in one element as the references

(14)

The XML Data Model

⚫ Attributes vs. Elements

◦ Data can be stored as the contents of an element OR as an attribute of an element

<?xml version="1.0" standalone="yes"?>
<Projects>
  <Project number="1">
    <Name>Product X</Name>
    <Location>Bellaire</Location>
    <Dept_no>5</Dept_no>
  </Project>
</Projects>

Why pick one over the other?

Best practice:

◦ Attributes to describe/modify the element

◦ Elements to hold the actual data values

Much like in HTML:

◦ Element (tag) contents are the data to be displayed

◦ Attributes (generally) modify/describe how it is to be displayed
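As a rough illustration of the attribute/element distinction, the sketch below reads both kinds of data with Python's standard `xml.etree.ElementTree` module. The document string mirrors the slide's sample (with plain ASCII quotes); the variable names are chosen only for this example.

```python
import xml.etree.ElementTree as ET

# Sample document from the slide, written with plain ASCII quotes.
DOC = """<?xml version="1.0" standalone="yes"?>
<Projects>
  <Project number="1">
    <Name>Product X</Name>
    <Location>Bellaire</Location>
    <Dept_no>5</Dept_no>
  </Project>
</Projects>"""

root = ET.fromstring(DOC)
project = root.find("Project")

# Attribute: a name/value pair attached to the element's start tag.
project_number = project.get("number")

# Element content: the data value between a start tag and its end tag.
project_name = project.findtext("Name")

print(project_number, project_name)
```

Both values come back as strings; the parser does not care whether a value was stored as an attribute or as element content, so the choice is a modelling decision.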

(15)
(16)

What does XML have to do with databases?

⚫ Recall: What is a database?

A logically coherent collection of data with some specific meaning that has been designed for a specific purpose.

● Structured and semi-structured data files vs. database?

◦ More practically, XML is used as a data exchange framework

● Moving data from one application to another, from one database to another

● Taking data from a database and turning it into a website, a report, or other human readable document

◦ Even some implementations of “XML native” DBs

● XML as the “back end” storage instead of relations
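A minimal sketch of the data exchange idea, assuming a hypothetical `project` table in an in-memory SQLite database: rows are read from the database and written out as an XML document that another application could consume. The table layout and the `export_projects` helper are illustrative assumptions, not part of the lecture.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical table standing in for a real project database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE project (number INTEGER, name TEXT, location TEXT)")
conn.execute("INSERT INTO project VALUES (1, 'Product X', 'Bellaire')")

def export_projects(connection):
    """Serialize every row of the project table as a <Project> element."""
    root = ET.Element("Projects")
    rows = connection.execute(
        "SELECT number, name, location FROM project ORDER BY number"
    )
    for number, name, location in rows:
        # The key goes into an attribute, the data values into child elements.
        project = ET.SubElement(root, "Project", number=str(number))
        ET.SubElement(project, "Name").text = name
        ET.SubElement(project, "Location").text = location
    return ET.tostring(root, encoding="unicode")

xml_text = export_projects(conn)
print(xml_text)
```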

(17)

The XML Data Model

⚫ XML uses a hierarchical model

◦ Also known as a tree model

◦ Documents can be represented as trees

● Each simple element contains one data value: leaves of the tree

● Complex elements can contain multiple child elements: internal nodes of the tree

● Each complex element can belong to one complex parent element: parent node of the tree

● One root element contains everything else: root of the tree
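The tree view can be sketched in a few lines of Python. The example below walks a shortened version of the Projects document and labels each element as simple (a leaf) or complex (an internal node); the `classify` and `walk` helpers are names invented for this illustration.

```python
import xml.etree.ElementTree as ET

DOC = """<Projects>
  <Project number="1">
    <Name>Product X</Name>
    <Workers>
      <Worker><Ssn>123456789</Ssn><Hours>32.5</Hours></Worker>
    </Workers>
  </Project>
</Projects>"""

def classify(element):
    """Return 'complex' for elements with child elements, 'simple' otherwise."""
    return "complex" if len(element) > 0 else "simple"

def walk(element, depth=0):
    """Yield (depth, tag, kind) for every element, starting at the root."""
    yield depth, element.tag, classify(element)
    for child in element:
        yield from walk(child, depth + 1)

root = ET.fromstring(DOC)
nodes = list(walk(root))
for depth, tag, kind in nodes:
    print("  " * depth + f"{tag} ({kind})")
```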

(18)

A sample XML tree

• Internal nodes are complex elements

• Leaf nodes are simple elements

• The root node is the root element

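• Root element contains all other elements within it

[Figure: tree for the sample document. The root Projects has Project children. The Project with Id=“1” has children Name (“Product X”), Location (“Bellaire”), Dept_no (“5”), and Workers. Workers has two Worker children: one with Ssn “123456789”, Last_name “Smith”, Hours “32.5”; one with Ssn “453453453”, Hours “15.5”.]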

(19)

A sample XML tree

<?xml version="1.0" standalone="yes"?>
<Projects>
  <Project number="1">
    <Name>Product X</Name>
    <Location>Bellaire</Location>
    <Dept_no>5</Dept_no>
    <Workers>
      <Worker>
        <Ssn>123456789</Ssn>
        <Last_name>Smith</Last_name>
        <Hours>32.5</Hours>
      </Worker>
      <Worker>
        <Ssn>453453453</Ssn>
        <Hours>15.5</Hours>
      </Worker>
    </Workers>
  </Project>
</Projects>

[Figure: the tree from the previous slide, shown next to the document it represents]

(20)

A sample of XML

<?xml version="1.0" standalone="yes"?>
<Projects>
  <Project number="1">
    <Name>Product X</Name>
    <Location>Bellaire</Location>
    <Dept_no>5</Dept_no>
    <Workers>
      <Worker>
        <Ssn>123456789</Ssn>
        <Last_name>Smith</Last_name>
        <Hours>32.5</Hours>
      </Worker>
      <Worker>
        <Ssn>453453453</Ssn>
        <Hours>15.5</Hours>
      </Worker>
    </Workers>
  </Project>
  ….
</Projects>

(21)

A sample of XML

XML declaration (the first line of the document): <?xml version="1.0" standalone="yes"?>

(22)

A sample of XML

Root element: <Projects>, which encloses every other element in the document

(23)

A sample of XML

Beginning of root element: the <Projects> start tag

End of root element: the </Projects> end tag

(24)

A sample of XML

First child element of root: <Project number="1">

(Other child elements are possible in here; they do not even need to be “Project” elements necessarily)

(25)

A sample of XML

The first Project element has an attribute named number with a value of “1”: <Project number="1">

(26)

A sample of XML

First child element of the Project element where number="1": <Name>Product X</Name>

Simple element with a name of “Name” and a value of “Product X”

(27)

A sample of XML

Second child element of the Project element where number="1": <Location>Bellaire</Location>

Simple element with a name of “Location” and a value of “Bellaire”

(28)

A sample of XML

Third child element of the Project element where number="1": <Dept_no>5</Dept_no>

Simple element with a name of “Dept_no” and a value of “5”

(29)

A sample of XML

Fourth child element of the Project element where number="1": <Workers> … </Workers>

Complex element with a name of “Workers”

(30)

A sample of XML

First child element of Projects/Project[number="1"]/Workers: the first <Worker> … </Worker>

Complex element with a name of “Worker”
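The path-style description of an element's position can be tried out with the limited XPath subset built into Python's `xml.etree.ElementTree`. This is only a sketch: note that XPath writes the attribute test as `[@number='1']`, with an `@` that the slide's informal notation omits.

```python
import xml.etree.ElementTree as ET

DOC = """<Projects>
  <Project number="1">
    <Name>Product X</Name>
    <Location>Bellaire</Location>
    <Dept_no>5</Dept_no>
    <Workers>
      <Worker>
        <Ssn>123456789</Ssn>
        <Last_name>Smith</Last_name>
        <Hours>32.5</Hours>
      </Worker>
      <Worker>
        <Ssn>453453453</Ssn>
        <Hours>15.5</Hours>
      </Worker>
    </Workers>
  </Project>
</Projects>"""

root = ET.fromstring(DOC)  # root is the <Projects> element

# find() returns the first match: the first Worker under project number 1.
first_worker = root.find("Project[@number='1']/Workers/Worker")
ssn = first_worker.findtext("Ssn")

# findall() returns every match, in document order.
all_hours = [w.findtext("Hours")
             for w in root.findall("Project[@number='1']/Workers/Worker")]

print(ssn, all_hours)
```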

(31)

XML Hierarchical (Tree) Data Model (cont’d.)

Tree model or hierarchical model

⚫ Main types of XML documents

Data-centric XML documents

Document-centric XML documents

Hybrid XML documents

Schemaless XML documents

◦ Do not follow a predefined schema of element names and corresponding tree structure

(32)

XML Document Types – Data Centric XML

⚫ Data-centric XML

◦ Highly structured

◦ Many small data items

◦ Often used for data exchange purposes

● Transfer data from one system to another

◦ Also used to create web pages dynamically from databases

◦ Generally follow a schema document that determines their structure
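Because data-centric XML is highly structured, it flattens naturally into records. The sketch below, under the assumption that the receiving system wants one record per worker, turns the Projects document into a list of dictionaries; `workers_as_rows` and the dictionary keys are hypothetical names, and values are left as strings.

```python
import xml.etree.ElementTree as ET

DOC = """<Projects>
  <Project number="1">
    <Name>Product X</Name>
    <Workers>
      <Worker>
        <Ssn>123456789</Ssn>
        <Last_name>Smith</Last_name>
        <Hours>32.5</Hours>
      </Worker>
      <Worker>
        <Ssn>453453453</Ssn>
        <Hours>15.5</Hours>
      </Worker>
    </Workers>
  </Project>
</Projects>"""

def workers_as_rows(xml_text):
    """Flatten Project/Workers/Worker elements into one dict per worker."""
    rows = []
    root = ET.fromstring(xml_text)
    for project in root.findall("Project"):
        for worker in project.findall("Workers/Worker"):
            rows.append({
                "project_number": project.get("number"),
                "ssn": worker.findtext("Ssn"),
                # findtext returns None when the optional element is absent.
                "last_name": worker.findtext("Last_name"),
                "hours": worker.findtext("Hours"),
            })
    return rows

rows = workers_as_rows(DOC)
for row in rows:
    print(row)
```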

(33)

XML Document Types – Document-Centric XML

⚫ Few structural elements

⚫ Large amounts of text

◦ Articles, blog entries, books

⚫ May have a schema document, but not required

◦ Schema may be very limited in semantics

● What’s a title?

● What’s a chapter?

● What’s a paragraph?


(34)

More XML Document Types

⚫ Hybrid XML

◦ Some parts are highly structured

◦ Some parts mostly blocks of text and/or unstructured

◦ May or may not have a predefined schema

⚫ Schemaless XML documents

◦ Semi-structured documents without a predefined schema

◦ Denoted by the attribute ‘standalone=“yes”’ in the XML declaration on the top line

(35)

Valid XML

⚫ An XML document is considered valid if:

◦ It is well-formed

◦ And…

To be continued after this definition…

(36)

Well-formed XML

⚫ An XML document is well-formed when it follows certain conditions:

◦ It should start with an XML declaration line (recommended, though optional in XML 1.0):

<?xml version="1.0" standalone="yes"?>

◦ It must form a tree:

● Must start with a single root element

● Every child element must have start and end tags that are contained completely within a parent element:

Good:

<parent>
  <child>
  </child>
</parent>

Bad:

<parent>
  <child>
</parent>
  </child>
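One way to test well-formedness in practice is simply to hand the text to an XML parser and see whether it raises an error. The sketch below does this with Python's standard library; `is_well_formed` is a helper name invented for this example, and it checks well-formedness only, not validity against a schema.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True

GOOD = "<parent><child></child></parent>"   # child closed inside its parent
BAD = "<parent><child></parent></child>"    # tags overlap instead of nesting
TWO_ROOTS = "<a></a><b></b>"                # more than one root element

print(is_well_formed(GOOD), is_well_formed(BAD), is_well_formed(TWO_ROOTS))
```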

(37)

Valid XML

⚫ An XML document is considered valid if:

◦ It is well-formed, and …

◦ It follows a particular schema in a standard definition language

● A DTD document (Document Type Definition)

● An XML schema document

◦ DTDs are the original, older technology

◦ XML schema documents are the “new” hotness

● First published in 2001


(38)

DTD – Document Type Definition

⚫ Original method of specifying a schema definition

◦ Still in widespread use

⚫ A very simple schema definition language

◦ Each possible element in the document is defined

● What children must it have?

● What children can it (optionally) have?

● What kinds of attributes can/must it have?

● If it is a leaf element, what kinds of values can it have?

(39)

XML Documents, DTD, and XML Schema (cont’d.)

⚫ Notation for specifying elements

⚫ XML DTD

◦ Data types in DTD are not very general

◦ Special syntax

● Requires specialized processors

◦ All DTD elements always forced to follow the specified ordering of the document

● Unordered elements not permitted

(40)

A sample XML document and DTD

<?xml version="1.0" standalone="no"?>
<!DOCTYPE Projects SYSTEM "proj.dtd">
<Projects>
  <Project number="1">
    <Name>Product X</Name>
    <Location>Bellaire</Location>
    <Dept_no>5</Dept_no>
    <Workers>
      <Worker>
        <Ssn>123456789</Ssn>
        <Last_name>Smith</Last_name>
        <Hours>32.5</Hours>
      </Worker>
      <Worker>
        <Ssn>453453453</Ssn>
        <Hours>15.5</Hours>
      </Worker>
    </Workers>
  </Project>
</Projects>

We declare that we want to use a DTD by putting the DOCTYPE declaration at the top of our XML file:

● !DOCTYPE: keyword

● Projects: the name of our DTD’s root node

● SYSTEM: indicates that this is an external DTD

● “proj.dtd”: the filename (or URL)

(41)

A sample XML document and DTD

The XML document from the previous slide, shown alongside the DTD it refers to (proj.dtd):

<!ELEMENT Projects (Project+)>
<!ELEMENT Project (Name, Location, Dept_no?, Workers)>
<!ATTLIST Project number ID #REQUIRED>
<!ELEMENT Name (#PCDATA)>
<!ELEMENT Location (#PCDATA)>
<!ELEMENT Dept_no (#PCDATA)>
<!ELEMENT Workers (Worker*)>
<!ELEMENT Worker (Ssn, Last_name?, First_name?, Hours)>
<!ELEMENT Ssn (#PCDATA)>
<!ELEMENT Last_name (#PCDATA)>
<!ELEMENT First_name (#PCDATA)>
<!ELEMENT Hours (#PCDATA)>

(42)

A sample DTD

The root element comes first: <!ELEMENT Projects (Project+)>

(43)

A sample DTD

Name of element: the name follows the !ELEMENT keyword, e.g. Projects in <!ELEMENT Projects (Project+)>

(44)

A sample DTD

<!ELEMENT Projects (Project+)>

List of children, in a regular expression-like syntax:

● + indicates 1 or more of this child

● * indicates 0 or more of this child

● ? indicates 0 or 1 of this child

● No symbol indicates exactly one child

So (Project+) indicates that 1 or more Project children are required
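The resemblance to regular expressions can be made concrete. The sketch below translates a flat, comma-separated content model into a Python regular expression over the sequence of child tag names and checks a list of children against it. It is a teaching aid under stated assumptions, not a DTD validator: it ignores nested groups, choices with `|`, and #PCDATA, and both function names are invented here.

```python
import re

def content_model_to_regex(model):
    """Translate a flat DTD sequence such as
    '(Name, Location, Dept_no?, Workers)' into a compiled regex that
    matches a string of space-terminated child tag names."""
    parts = []
    for item in model.strip("() ").split(","):
        item = item.strip()
        # The occurrence symbol, if any, carries over to the regex unchanged.
        suffix = item[-1] if item[-1] in "+*?" else ""
        name = item.rstrip("+*?")
        parts.append(f"(?:{re.escape(name)} ){suffix}")
    return re.compile("".join(parts))

def children_match(model, child_tags):
    """Return True if the ordered list of child tags satisfies the model."""
    sequence = "".join(tag + " " for tag in child_tags)
    return content_model_to_regex(model).fullmatch(sequence) is not None

PROJECT_MODEL = "(Name, Location, Dept_no?, Workers)"
print(children_match("(Project+)", ["Project", "Project"]))
print(children_match(PROJECT_MODEL, ["Name", "Location", "Workers"]))
print(children_match(PROJECT_MODEL, ["Location", "Name", "Workers"]))
```

The last call fails because the children are out of order, which matches the DTD rule that elements must follow the declared sequence.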

(45)

A sample DTD

<!ELEMENT Project (Name, Location, Dept_no?, Workers)>

List of children: Dept_no? uses the ? symbol (0 or 1 of this child)

This indicates that Dept_no is an optional field, but there can be only one of them



(48)

A sample DTD

<!ATTLIST Project number ID #REQUIRED>

Project has an attribute named “number”

(49)

A sample DTD

<!ATTLIST Project number ID #REQUIRED>

Project has an attribute named “number”

Its “type” is a unique ID

This can be used by other elements to refer to this element, like a primary key

(Strictly, an ID value must be a valid XML name, so it cannot begin with a digit; a validating parser would accept a value such as “P1” but not “1”.)

(50)

A sample DTD

<!ATTLIST Project number ID #REQUIRED>

Project has an attribute named “number”

Its “type” is a unique ID

This can be used by other elements to refer to this element, like a primary key

And, because of #REQUIRED, this ID attribute must exist on all Project elements
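The two constraints expressed by `ID #REQUIRED` (the attribute is present on every Project, and no value repeats) can be checked by hand, as in the sketch below. Python's `xml.etree.ElementTree` does not validate against a DTD, so this is a manual stand-in written for illustration; `check_project_ids` is a hypothetical helper and it does not check the lexical form of the ID values.

```python
import xml.etree.ElementTree as ET

def check_project_ids(xml_text):
    """Return a list of problems with the 'number' attribute of Project elements."""
    problems = []
    seen = set()
    projects = ET.fromstring(xml_text).iter("Project")
    for index, project in enumerate(projects, start=1):
        number = project.get("number")
        if number is None:
            # Violates #REQUIRED.
            problems.append(f"Project #{index} is missing the required 'number' attribute")
        elif number in seen:
            # Violates the uniqueness that the ID type demands.
            problems.append(f"duplicate ID value {number!r}")
        else:
            seen.add(number)
    return problems

OK_DOC = "<Projects><Project number='1'/><Project number='2'/></Projects>"
BAD_DOC = "<Projects><Project number='1'/><Project/><Project number='1'/></Projects>"

print(check_project_ids(OK_DOC))
print(check_project_ids(BAD_DOC))
```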

(51)

A sample DTD

<!ELEMENT Name (#PCDATA)>

Name is a “leaf node”

#PCDATA means that it holds “parsed character data”

It will contain a value of some kind between its start and end tag (even an empty value counts as a value for the DTD)
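"Parsed" means the parser processes the text rather than passing it through untouched, for example by resolving entity references such as `&amp;`. The short sketch below shows this with Python's standard parser; the element contents are made up for the example. It also shows how this particular library reports an empty element.

```python
import xml.etree.ElementTree as ET

# The entity reference &amp; is resolved to a plain ampersand while parsing.
name = ET.fromstring("<Name>Research &amp; Development</Name>")

# An empty leaf element is still allowed by (#PCDATA).
empty = ET.fromstring("<Name></Name>")

print(name.text)
print(empty.text)  # ElementTree represents the missing text as None
```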


(53)

DTD Limitations

⚫ Data types in DTD are not general

◦ Child nodes hold PCDATA values – strings

◦ DTD has its own syntax

● Need to write a special parser for it

● Can’t leverage existing XML parsers to do DTD parsing

◦ All elements must follow the ordering laid out

● Unordered elements not allowed
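The first limitation has a practical consequence: since a DTD only says that Hours holds character data, a non-numeric value passes DTD validation and the application has to do its own type checking. A minimal sketch, with a deliberately bad hypothetical value and an invented `parse_hours` helper:

```python
import xml.etree.ElementTree as ET

# Hypothetical worker whose Hours value satisfies (#PCDATA) but is not a number.
bad_worker = ET.fromstring(
    "<Worker><Ssn>453453453</Ssn><Hours>lots</Hours></Worker>"
)
good_worker = ET.fromstring(
    "<Worker><Ssn>123456789</Ssn><Hours>32.5</Hours></Worker>"
)

def parse_hours(worker):
    """Convert the Hours text to a float, since the DTD cannot enforce the type."""
    text = worker.findtext("Hours")
    try:
        return float(text)
    except (TypeError, ValueError):
        raise ValueError(f"Hours must be numeric, got {text!r}") from None

print(parse_hours(good_worker))
```

Calling `parse_hours(bad_worker)` raises `ValueError`, a check that the DTD itself has no way to express.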


(54)

Summary

⚫ Three main types of data: structured, semi-structured, and unstructured

⚫ XML standard

◦ Tree-structured (hierarchical) data model

◦ XML and DTD notation/language

⚫ Next class…

◦ XML Schema

◦ Storing and Extracting XML Documents

◦ XML Languages
