
XML Databases

2. XML Basics, 03.11.08

Silke Eckstein

Andreas Kupfer

Institut für Informationssysteme

Technische Universität Braunschweig

http://www.ifis.cs.tu-bs.de

2.1 Introduction

2.2 XML Formalization

2.3 Well-Formedness

2.4 XML Text Declarations

2.5 Namespaces

2.6 Overview

2.7 References


2. XML Basics

XML Databases – Silke Eckstein – Institut für Informationssysteme – TU Braunschweig

• Structure of XML documents
– XML prolog
– Document Type Definition (DTD)
– Document Instance
– XML documents have to be well-formed (see later)


2.1 Introduction

<Bücher>

<Buch>

<Autor id="1234567890">Rainer Eckstein</Autor>

<Autor id="1234568723">Silke Eckstein</Autor>

<Titel>XML und Datenmodellierung</Titel>

<Untertitel>XML-Schema ...</Untertitel>

<Verlag id="3-89864">dpunkt.Verlag</Verlag>

</Buch>

</Bücher>

• A document instance is a set of tags that is customized to represent the content, e.g.:

<Autor>Silke Eckstein</Autor>

<Titel>XML und Datenmodellierung</Titel>

• New types of queries may require new tags: No problem for XML!
– Resulting set of tags forms a new markup language (XML dialect).
• All tags need to appear in properly nested pairs (e.g., <t> . . . <s> . . . </s> . . . </t>).
• Tags can be freely nested to reflect the logical structure of the content.
[Scholl07]

2.1 Introduction

• XML comes with a number of additional constructs which allow us to convey even more useful information, e.g.:
– Attributes may be used to qualify tags (avoid the so-called tag soup).

Instead of

• <question> Is it okay ...? </question>

<angry> Now I'm ... </angry>

use

• <bubble tone="question">Is it okay ...?</bubble>

<bubble tone="angry">Now I'm ...</bubble>

2.1 Introduction

• More additional constructs:
– References establish links internal to an XML document:
Establish link target:
• <character id="phb">The Pointy-Haired Boss</character>
Reference the target:
• <bubble speaker="phb">Speed is the key to success.</bubble>

2.1 Introduction

2.2 XML Formalization

• Elements

• Attributes

• Entities

• Miscellaneous

• General structure

2.3 Well-Formedness

2.4 XML Text Declarations

2.5 Namespaces

2.6 Overview

2.7 References


Outline

2.2 XML Formalization
• We will now try to approach XML in a slightly more formal way.
• This discussion will be based on the central XML technical specification:
– Extensible Markup Language (XML) 1.1 (Second Edition), W3C Recommendation Aug 2006 (http://www.w3.org/TR/xml11)
Visit the W3C site
This lecture does not try to be a "guided tour" through the XML-related W3C technical documents (boring!). Instead we will cover the basic principles and most interesting ideas. Visit the W3C site and use the original W3C documents to get a full grasp of their contents.

• Elements
– … are the basic modules of XML documents
– … consist of a start-tag and an end-tag with the element content in between
– … may also be empty (with an empty-element tag then)
– … may be nested, which leads to the hierarchical structure of XML documents

2.2 XML Formalization

Well-formed XML (fragments):
<foo> okay </foo>
<This-is-a-well-formed-XML-tag.> okay </This-is-a-well-formed-XML-tag.>
<foo>okay</foo>
<foo/>
Non-well-formed XML:
<foo> oops </bar>
<foo> oops </Foo>
<foo> oops ... ‹EOT›

• Elements – examples:

2.2 XML Formalization

Nested element:

<address>

<street> Rudower Chaussee </street>

<no> 25 </no>

<zip> 12489 </zip>

<city> Berlin </city>

</address>

Simple element:

<city> Berlin </city>

Empty element:

<fax/>

• Element content may contain document characters as well as properly nested elements (so-called mixed content):

2.2 XML Formalization

Well-formed XML

<foo><bar>

<baz> okay </baz>

</bar>

<ok> okay </ok> still okay

</foo>

Non-well-formed XML

<foo><bar> oops </foo></bar>

<foo><bar> oops </bar><bar> oops </foo></bar>

• Element nesting establishes a parent-child relationship between elements:
– In the XML fragment <p> <c> . . . </c> . . . <c'> . . . </c'> </p>,
  • element p is the parent of elements c, c',
  • elements c, c' are children of element p,
  • elements c, c' are siblings.
• There is exactly one element that encloses the whole XML content: the root element.

2.2 XML Formalization

Non-well-formed XML

<one> one eins un </one>

<two> two zwei deux </two>
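The parent-child relationship and the single-root rule can be observed with an off-the-shelf parser. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the helper name is_well_formed and the element name c2 (standing in for c', which is not a legal XML name) are assumptions made for illustration. The XML declaration is omitted because Python's parser is built on Expat, which implements XML 1.0.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts text as a complete XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# One root element <p> enclosing two children: accepted.
doc = "<p><c>first</c> text <c2>second</c2></p>"
root = ET.fromstring(doc)
children = [child.tag for child in root]   # the siblings c and c2

# Two top-level elements, i.e. no single root element: rejected.
two_roots = "<one> one eins un </one><two> two zwei deux </two>"

print(root.tag, children, is_well_formed(two_roots))
```

Iterating over an element yields its children only, which mirrors the parent/children/siblings terminology used above.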


• Attributes
– … may specify further properties of elements
– … may not be nested
– … are not considered to be children of the containing element (instead they are owned by the containing element)
– Attribute values are restricted to character data.

2.2 XML Formalization

Well-formed XML (fragments)

<price currency="Euro"> 23.45 </price>

<price>

<currency> Euro </currency>

23.45 </price>

• An element can contain each attribute only once:

2.2 XML Formalization

Non-well-formed XML

<Team person='Erna' person='Hugo' person='Agnes'/>

Well-formed XML (fragments)

<Team persons='Erna Hugo Agnes'/>

<Team person1='Erna' person2='Hugo' person3='Agnes'/>

<Team>

<Person>Erna</Person>

<Person>Hugo</Person>

<Person>Agnes</Person>

</Team>
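A short sketch, again assuming Python's standard xml.etree.ElementTree, illustrates the two points made about attributes: they are owned by the element rather than being children, and a repeated attribute name makes the input non-well-formed. The variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# Attributes are owned by the element: they appear in .attrib, not as children.
team = ET.fromstring("<Team persons='Erna Hugo Agnes'/>")
owned = team.attrib            # mapping from attribute name to value
child_count = len(team)        # number of child elements

# Child elements, in contrast, may repeat freely.
nested = ET.fromstring(
    "<Team><Person>Erna</Person><Person>Hugo</Person><Person>Agnes</Person></Team>"
)
names = [person.text for person in nested.findall("Person")]

# Repeating an attribute name violates well-formedness.
try:
    ET.fromstring("<Team person='Erna' person='Hugo' person='Agnes'/>")
    duplicate_accepted = True
except ET.ParseError:
    duplicate_accepted = False

print(owned, child_count, names, duplicate_accepted)
```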

• Entities
– In XML, document content and markup are specified using a single set of characters.
– Characters { <, >, &, ", ' } form pieces of XML markup; they may be denoted by predefined entities to represent content:
– The XML entity facility is actually a versatile recursive macro expansion machinery (more on that later).

2.2 XML Formalization

Character   Entity
<           &lt;
>           &gt;
&           &amp;
"           &quot;
'           &apos;

Well-formed XML:
<operators>
  Valid comparison operators are
  &lt;, =, &amp; &gt;.
</operators>
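Escaping content with the predefined entities can be sketched as follows, assuming Python's standard library: xml.sax.saxutils.escape replaces &, < and > by their entities, and xml.etree.ElementTree resolves them again when parsing. The variable names are illustrative.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

content = "Valid comparison operators are <, =, & >."

# Replace the markup characters &, < and > by predefined entities.
escaped = escape(content)

# The parser maps the entities back to the characters they denote.
element = ET.fromstring(f"<operators>{escaped}</operators>")
round_trip = element.text

print(escaped)
print(round_trip)
```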

• CDATA sections
– … may occur anywhere character data may occur.
– … are used to escape blocks of text containing characters which would otherwise be recognized as markup.
– Within a CDATA section, only the string ']]>' is recognized as markup:
  • left angle brackets and ampersands may occur in their literal form;
  • they need not (and cannot) be escaped using "&lt;" and "&amp;".
– CDATA sections cannot nest.
[XML06]

2.2 XML Formalization

Well-formed XML (fragments)

<![CDATA[<greeting>Hello, world!</greeting>]]>
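That the content of a CDATA section is delivered as plain character data, not as markup, can be checked with a small sketch. It assumes Python's standard xml.etree.ElementTree; the wrapper element name example is hypothetical and only needed because a parser expects a root element.

```python
import xml.etree.ElementTree as ET

doc = "<example><![CDATA[<greeting>Hello, world!</greeting>]]></example>"
example = ET.fromstring(doc)

text = example.text          # character data, including the literal angle brackets
child_count = len(example)   # no <greeting> child element is created

print(repr(text), child_count)
```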

• Comments
– … may appear anywhere in a document outside other markup
– … may not end with '--->'

2.2 XML Formalization

Well-formed XML (fragments)

Non-well-formed XML
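The comment rules can be probed with a parser. The sketch below assumes Python's standard xml.etree.ElementTree; the first two comments are the examples given in the W3C XML Recommendation, while the wrapper element doc and the third sample are assumptions added for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts text as a complete XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Markup characters inside a comment are not interpreted.
ok_comment = "<doc><!-- declarations for <head> & <body> --></doc>"
# A comment must not end with '--->'.
bad_ending = "<doc><!-- B+, B, or B---></doc>"
# A comment must not appear inside other markup, here inside a tag.
inside_tag = "<doc <!-- not allowed here --> />"

for sample in (ok_comment, bad_ending, inside_tag):
    print(is_well_formed(sample), sample)
```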


• The W3C XML recommendation is actually more formal and rigid in defining the syntactical structure of XML:
– "A textual object is well-formed XML if,
  1. Taken as a whole, it matches the production labeled "document".
  2. It meets all the well-formedness constraints given in this [the W3C XML Recommendation] specification. . . . "

2.3 Well-Formedness

• Well-formedness #1: Context-free Properties
– All context-free properties of well-formed XML documents are concisely captured by a grammar (using an EBNF-style notation).
  • Grammar: system of production (rule)s of the form lhs ::= rhs
– Excerpt of the XML grammar (see next pages):

2.3 Well-Formedness

No     lhs              rhs
[1]    document         ::= ( prolog element Misc* ) - ( Char* RestrictedChar Char* )
[2]    Char             ::= [#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
[2a]   RestrictedChar   ::= [#x1-#x8] | [#xB-#xC] | [#xE-#x1F] | [#x7F-#x84] | [#x86-#x9F]
[3]    S                ::= (#x20 | #x9 | #xD | #xA)+
[4]    NameStartChar    ::= ":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6] | [#xD8-#xF6] |
                            [#xF8-#x2FF] | [#x370-#x37D] | [#x37F-#x1FFF] | [#x200C-#x200D] |
                            [#x2070-#x218F] | [#x2C00-#x2FEF] | [#x3001-#xD7FF] |
                            [#xF900-#xFDCF] | [#xFDF0-#xFFFD] | [#x10000-#xEFFFF]
[4a]   NameChar         ::= NameStartChar | "-" | "." | [0-9] | #xB7 | [#x0300-#x036F] |
                            [#x203F-#x2040]
[5]    Name             ::= NameStartChar (NameChar)*
[10]   AttValue         ::= '"' ([^<&"] | Reference)* '"' | "'" ([^<&'] | Reference)* "'"
[14]   CharData         ::= [^<&]* - ([^<&]* ']]>' [^<&]*)

2.3 Well-Formedness

No     lhs              rhs
[22]   prolog           ::= XMLDecl Misc* (doctypedecl Misc*)?
[23]   XMLDecl          ::= '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'
[24]   VersionInfo      ::= S 'version' Eq ("'" VersionNum "'" | '"' VersionNum '"')
[25]   Eq               ::= S? '=' S?
[26]   VersionNum       ::= '1.1'
[27]   Misc             ::= Comment | PI | S
[39]   element          ::= EmptyElemTag | STag content ETag
[40]   STag             ::= '<' Name (S Attribute)* S? '>'
[41]   Attribute        ::= Name Eq AttValue
[42]   ETag             ::= '</' Name S? '>'
[43]   content          ::= CharData? ((element | Reference | CDSect | PI | Comment) CharData?)*
[44]   EmptyElemTag     ::= '<' Name (S Attribute)* S? '/>'
[67]   Reference        ::= EntityRef | CharRef
[68]   EntityRef        ::= '&' Name ';'

• N.B.
– The numbers in [] refer to the corresponding productions in the W3C XML Recommendation.

2.3 Well-Formedness

Expression …   … denotes
r*             ε, r, rr, rrr, …     zero or more repetitions of r
r+             r r*                 one or more repetitions of r
r?             r | ε                optional r
[abc]          a | b | c            character class
[^abc]                              inverted character class

• Remarks
– As usual, the XML grammar may systematically be transformed into a program, an XML parser, to be used to check the syntax of XML input.
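As a small instance of turning grammar rules into a program, productions [4], [4a] and [5] translate almost literally into a regular expression. This is a sketch in Python, not a complete parser; the names NAME_START, NAME_CHAR and is_name are assumptions made for illustration.

```python
import re

# Rule [4] NameStartChar, written as the body of a regex character class.
NAME_START = (
    r":A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D"
    r"\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF"
    r"\uF900-\uFDCF\uFDF0-\uFFFD\U00010000-\U000EFFFF"
)
# Rule [4a] NameChar ::= NameStartChar | "-" | "." | [0-9] | #xB7 | ...
NAME_CHAR = NAME_START + r"\-.0-9\u00B7\u0300-\u036F\u203F-\u2040"
# Rule [5] Name ::= NameStartChar (NameChar)*
NAME = re.compile(f"[{NAME_START}][{NAME_CHAR}]*")


def is_name(text):
    """Return True if text as a whole matches production [5] Name."""
    return NAME.fullmatch(text) is not None


for candidate in ("This-is-a-well-formed-XML-tag.", "Bücher", "xhtml:title", "1st", "-x"):
    print(candidate, is_name(candidate))
```

The EBNF operators map directly onto regex syntax here: a character class stays a character class, and (NameChar)* becomes the trailing star.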

2.3 Well-Formedness

Rule …   … implements this characteristic of XML:
[1]      an XML document contains exactly one root element
[10]     attribute values are enclosed in " or '
[22]     XML documents have to include a declaration prolog
[14]     characters < and & may not appear literally in element content
[43]     element content may contain character data and entity references as well as nested elements
[68]     entity references may contain arbitrary entity names (other than lt, amp, . . . )

• Parsing XML
1. Starting with the symbol document, the parser uses the lhs ::= rhs rules to expand symbols, constructing a parse tree.
2. The leaves of the parse tree are characters which have no further expansion.
3. The XML input is parsed successfully if it perfectly matches the parse tree's front (concatenate the parse tree leaves from left to right).
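The expansion of grammar symbols can be sketched as a tiny recursive-descent parser for the element production (rules [39] to [44]). This is a deliberately reduced illustration in Python, not a conforming parser: names are restricted to ASCII, references, CDATA sections, comments and processing instructions are not supported, and the ']]>' exclusion of rule [14] is ignored. All identifiers (parse_element, is_element, NotWellFormed) are hypothetical.

```python
import re

NAME_PATTERN = r"[A-Za-z_:][A-Za-z0-9_:.\-]*"   # ASCII-only subset of rule [5]
WS = r"[ \t\r\n]"                               # one character of rule [3] S
NAME = re.compile(NAME_PATTERN)
ATTRIBUTE = re.compile(                         # S Attribute, rules [41] and [10]
    WS + "+(" + NAME_PATTERN + ")" + WS + "*=" + WS + "*"
    + r"""("[^<&"]*"|'[^<&']*')"""
)
CHAR_DATA = re.compile(r"[^<&]*")               # rule [14], simplified
OPTIONAL_WS = re.compile(WS + "*")              # S?


class NotWellFormed(Exception):
    pass


def parse_element(text, pos=0):
    """Expand `element` at text[pos]; return ((name, attributes, children), next_pos)."""
    if not text.startswith("<", pos):
        raise NotWellFormed(f"expected '<' at offset {pos}")
    match = NAME.match(text, pos + 1)
    if match is None:
        raise NotWellFormed(f"expected Name at offset {pos + 1}")
    name, pos = match.group(), match.end()

    attributes = {}
    while (match := ATTRIBUTE.match(text, pos)) is not None:
        if match.group(1) in attributes:        # needs memory of earlier input
            raise NotWellFormed(f"duplicate attribute {match.group(1)!r}")
        attributes[match.group(1)] = match.group(2)[1:-1]
        pos = match.end()
    pos = OPTIONAL_WS.match(text, pos).end()

    if text.startswith("/>", pos):              # EmptyElemTag, rule [44]
        return (name, attributes, []), pos + 2
    if not text.startswith(">", pos):
        raise NotWellFormed(f"expected '>' at offset {pos}")
    pos += 1

    children = []                               # content, rule [43]
    while not text.startswith("</", pos):
        match = CHAR_DATA.match(text, pos)
        if match.group():
            children.append(match.group())
            pos = match.end()
        elif pos >= len(text):
            raise NotWellFormed("input ended before the end tag")
        else:
            child, pos = parse_element(text, pos)
            children.append(child)

    match = NAME.match(text, pos + 2)           # ETag, rule [42]
    if match is None or match.group() != name:
        raise NotWellFormed(f"end tag does not match <{name}>")
    pos = OPTIONAL_WS.match(text, match.end()).end()
    if not text.startswith(">", pos):
        raise NotWellFormed(f"expected '>' at offset {pos}")
    return (name, attributes, children), pos + 1


def is_element(text):
    """Return True if text as a whole is one element of the supported subset."""
    try:
        _, end = parse_element(text)
    except NotWellFormed:
        return False
    return end == len(text)


tree, _ = parse_element('<bubble speaker="phb">Um... No.</bubble>')
print(tree)
```

The returned nested tuples play the role of the parse tree; comparing the end-tag name with the start-tag name already goes beyond what the context-free grammar alone can express.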

2.3 Well-Formedness

• Example 1
– Parse tree for XML input
<?xml … ?> <bubble speaker="phb">Um... No.</bubble>
[Figure: parse tree. document expands to prolog, element and Misc*; element expands to STag (<, Name bubble, S, Attribute with Name speaker, Eq and AttValue "phb", S? = ε, >), content (CharData "Um… No.") and ETag (</, Name bubble, S? = ε, >).]

• Example 2
– Parse tree for the "minimal" XML document
<?xml version="1.1"?><foo/>
[Figure: parse tree. document expands to prolog, element and Misc* (= ε); prolog expands to XMLDecl (<?xml, VersionInfo with S, version, Eq and "1.1" as VersionNum, EncodingDecl? = ε, S? = ε, ?>) and Misc* (= ε); element expands to EmptyElemTag (<, Name foo, (S Attribute)* = ε, S? = ε, />).]

• Well-formedness #2: Context-dependent Properties
– The XML grammar cannot enforce all XML well-formedness constraints (WFCs).
– Some XML WFCs depend on
  1. what the XML parser has seen before in its input, or
  2. on a global state, e.g., the definitions of user-declared entities.
– These WFCs cannot be checked by simply comparing the parse tree front against the XML input (context-dependent WFCs).

2.3 Well-Formedness

• Sample WFCs
– All 10 XML WFCs are given in http://www.w3.org/TR/REC-xml

WFC                             Comment
(2) Element Type Match          The Name in an element's end tag must match the element name in the start tag.
(3) Unique Att Spec             No attribute name may appear more than once in the same start tag or empty element tag.
(5) No < in Attribute Values    The replacement text of any entity referred to directly or indirectly in an attribute value (other than &lt;) must not contain a <.
(9) No Recursion                A parsed entity must not contain a recursive reference to itself, either directly or indirectly.
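Three of the sample WFCs can be exercised against a real parser. The sketch assumes Python's standard xml.etree.ElementTree (built on Expat, an XML 1.0 parser, so no XML declaration is used); the helper name violation, the sample documents and the entity names a and b are assumptions made for illustration. The exact error messages depend on the parser and are therefore not shown.

```python
import xml.etree.ElementTree as ET


def violation(text):
    """Return the parser's error message, or None if text is accepted."""
    try:
        ET.fromstring(text)
        return None
    except ET.ParseError as error:
        return str(error)


samples = {
    "well-formed control": "<foo> okay </foo>",
    "(2) Element Type Match": "<foo> oops </bar>",
    "(3) Unique Att Spec": "<Team person='Erna' person='Hugo'/>",
    # Entity a refers to b and b refers back to a.
    "(9) No Recursion": (
        '<!DOCTYPE d [<!ENTITY a "&b;"><!ENTITY b "&a;">]><d>&a;</d>'
    ),
}

results = {label: violation(text) for label, text in samples.items()}
for label, message in results.items():
    print(label, "->", message)
```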


• The XML Text Declaration <?xml . . . ?>
– A well-formed XML 1.1 document has to start with a header, the text declaration (grammar rule [23], which the W3C Recommendation names XMLDecl, the "XML declaration"):
XMLDecl ::= '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'
– VersionInfo:
  • An XML document whose text declaration carries a VersionInfo of version="1.1" is required to conform to W3C's XML Recommendation posted on August 16, 2006 (see http://www.w3.org/TR/xml11).

2.4 XML Text Declarations

• For XML 1.0 this header is optional
– I.e. the following is an XML 1.0 document because it does not have an XML declaration:
  • <greeting>Hello, world!</greeting>
• Encoding declaration:
– Documents that use an encoding other than UTF-8 or UTF-16 (see below) must announce so using the XML text declaration, e.g.
  • <?xml version="1.1" encoding="iso-8859-15"?> or
  • <?xml version="1.1" encoding="utf-32"?>

2.4 XML Text Declarations

• XML Documents and Character Encoding
– For a computer, a character like X is nothing but an 8 (16/32) bit number whose value is interpreted as the character X when needed (e.g., to drive a display).
– Trouble is, a large number of such number-to-character mapping tables, the so-called encodings, are in parallel use today.

2.4 XML Text Declarations

• … XML Documents and Character Encoding
– Due to the huge amount of characters needed by the global computing community today (Latin, Hebrew, Arabic, Greek, Japanese, Chinese . . . languages), conflicting intersections between encodings are common.
– Example:

Bytes                  Encoding       Characters
0xa4 0xcb 0xe4 0xd3    iso-8859-7     € Λ δ Σ
0xa4 0xcb 0xe4 0xd3    iso-8859-15    € Ë ä Ó
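The conflict can be reproduced by decoding the same four bytes under both encodings. This sketch assumes Python's built-in codecs iso8859_7 and iso8859_15; the variable names are illustrative.

```python
# The same byte sequence, interpreted under two different encodings.
raw = bytes([0xA4, 0xCB, 0xE4, 0xD3])

as_greek = raw.decode("iso8859_7")     # Greek mapping table
as_latin9 = raw.decode("iso8859_15")   # Western European mapping table

print(as_greek)
print(as_latin9)
```

Without an encoding declaration, a receiver has no way to tell which of the two readings the author intended.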

• Unicode
– The Unicode Initiative aims to define a new encoding that tries to embrace all character needs:
  • characters of "all" languages of the world,
  • plus scientific, mathematical, technical, box drawing, . . . symbols (see http://www.unicode.org/charts/).
– Range of the Unicode encoding: 0x0000 – 0x10FFFF (17 * 65536 characters).
  • Codes that fit into the first 16 bits (denoted U+0000 - U+FFFF) have been assigned to encode the most widely used languages and their characters (Basic Multilingual Plane, BMP).
  • Codes U+0000 - U+007F have been assigned to match the 7-bit ASCII encoding which is pervasive today.

2.4 XML Text Declarations

• Unicode Transformation Formats
– Current CPUs operate most efficiently on 32-bit words (16-bit words, 8-bit bytes).
– Unicode thus developed Unicode Transformation Formats (UTF) which define how a Unicode character code between U+0000 – U+10FFFF is to be mapped into a 32-bit word (16-bit words, 8-bit bytes).
• UTF-32 (map a Unicode character into a 32-bit word)
1. Map any Unicode character in the range U+0000 – U+10FFFF to the corresponding 32-bit value 0x00000000 – 0x0010FFFF.
2. N.B. For each Unicode character encoded in UTF-32 we waste at least 11 zero bits.

• UTF-16 (map a Unicode character into one or two 16-bit words)
1. Apply the following mapping scheme:

Unicode range          Word sequence
U+000000 – U+00FFFF    xxxxxxxxxxxxxxxx
U+010000 – U+10FFFF    110110xxxxxxxxxx 110111xxxxxxxxxx

2. For the range U+000000 – U+00FFFF, simply fill the x positions with the 16 bits of the character code. (Code ranges U+D800 – U+DBFF and U+DC00 – U+DFFF are unassigned!)
3. For the U+010000 – U+10FFFF range, subtract 0x010000 from the character code and fill the x positions using the resulting 20-bit value.

Example
Unicode character U+012345 (0x012345 – 0x010000 = 0x02345):
UTF-16: 1101100000001000 1101111101000101
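The UTF-16 mapping steps can be written down directly. The following Python sketch uses a hypothetical function name utf16_words and compares its result with Python's built-in big-endian UTF-16 codec.

```python
def utf16_words(code_point):
    """Map a Unicode code point to its one or two 16-bit UTF-16 words."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("outside the Unicode range")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("code range not assigned to characters")
    if code_point <= 0xFFFF:
        return [code_point]                        # the 16 bits as they are
    offset = code_point - 0x10000                  # 20-bit value
    first = (0b110110 << 10) | (offset >> 10)      # 110110 + upper 10 bits
    second = (0b110111 << 10) | (offset & 0x3FF)   # 110111 + lower 10 bits
    return [first, second]


words = utf16_words(0x12345)
as_bits = " ".join(f"{word:016b}" for word in words)
as_bytes = b"".join(word.to_bytes(2, "big") for word in words)

print(as_bits)
print(as_bytes == chr(0x12345).encode("utf-16-be"))
```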

• UTF-8
– UTF-16 is designed to facilitate efficient and robust decoding:
  • If we see a leading 11011 bit pattern in a 16-bit word, we know it is the first or second word in a UTF-16 multi-word sequence.
  • The sixth bit of the word then tells us if we actually look at the first or second word.
– UTF-8 (map a Unicode character into a sequence of 8-bit bytes)
  • UTF-8 is of special importance because
    a) a stream of 8-bit bytes (octets) is what flows over an IP network connection,
    b) text-processing software today is built to deal with 8-bit character encodings (iso-8859-x, ASCII, etc.).

2.4 XML Text Declarations

• UTF-8 encoding
1. Apply the following mapping scheme:

Unicode range          Byte sequence
U+000000 – U+00007F    0xxxxxxx
U+000080 – U+0007FF    110xxxxx 10xxxxxx
U+000800 – U+00FFFF    1110xxxx 10xxxxxx 10xxxxxx
U+010000 – U+10FFFF    11110xxx 10xxxxxx 10xxxxxx 10xxxxxx

2. The spare bits (x) are filled with the bits of the character code to be represented (rightmost x is least significant bit, pad to the left with 0-bits).

Examples:
• Unicode character U+00A9 (© sign):
UTF-8: 11000010 10101001 (0xC2 0xA9)
• Unicode character U+2260 (math relation symbol ≠):
UTF-8: 11100010 10001001 10100000 (0xE2 0x89 0xA0)
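The UTF-8 mapping scheme can be implemented in a few lines. The Python sketch below uses a hypothetical function name utf8_bytes and checks the two examples against Python's built-in UTF-8 codec.

```python
def utf8_bytes(code_point):
    """Map a Unicode code point to its UTF-8 byte sequence."""
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("not an encodable Unicode character code")
    if code_point <= 0x7F:                     # 0xxxxxxx
        return bytes([code_point])
    if code_point <= 0x7FF:                    # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point <= 0xFFFF:                   # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    return bytes([0xF0 | (code_point >> 18),   # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


copyright_sign = utf8_bytes(0x00A9)
not_equal = utf8_bytes(0x2260)

print(copyright_sign.hex(" "), not_equal.hex(" "))
print(not_equal == "\u2260".encode("utf-8"))
```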

• Advantages of UTF-8 encoding
– For a UTF-8 multi-byte sequence, the length of the sequence is equal to the number of leading 1-bits (in the first byte), e.g.:
11100010 10001001 10100000
(Only single-byte UTF-8 encodings have a leading 0-bit.)
– Character boundaries are simple to detect (even when placed at some arbitrary position in a UTF-8 byte stream).
– UTF-8 encoding does not affect (binary) sort order.
– Text processing software which was originally developed to work with the pervasive 7-bit ASCII encoding remains functional. This is especially true for the C programming language and its string (char[]) representation.
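The first two advantages can be made concrete with a short Python sketch. The function names sequence_length and next_boundary are hypothetical; the sample stream encodes the characters a, ≠ and ©.

```python
def sequence_length(first_byte):
    """Length of a UTF-8 sequence, read off the leading bits of its first byte."""
    if (first_byte & 0x80) == 0x00:
        return 1                     # 0xxxxxxx
    if (first_byte & 0xE0) == 0xC0:
        return 2                     # 110xxxxx
    if (first_byte & 0xF0) == 0xE0:
        return 3                     # 1110xxxx
    if (first_byte & 0xF8) == 0xF0:
        return 4                     # 11110xxx
    raise ValueError("not the first byte of a UTF-8 sequence")


def next_boundary(data, pos):
    """Index of the first character start at or after pos."""
    while pos < len(data) and (data[pos] & 0xC0) == 0x80:   # skip 10xxxxxx bytes
        pos += 1
    return pos


stream = "a\u2260\u00a9".encode("utf-8")    # 61 | E2 89 A0 | C2 A9

print(sequence_length(stream[1]))            # first byte of the ≠ sequence
print(next_boundary(stream, 2))              # start in the middle of that sequence
```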

2.4 XML Text Declarations

• XML and Unicode
– A conforming XML parser is required to correctly process UTF-8 and UTF-16 encoded documents. (The W3C XML Recommendation predates the UTF-32 definition).
– Documents that use a different encoding must announce so using the XML text declaration, e.g.
<?xml version="1.1" encoding="iso-8859-15"?> or <?xml version="1.1" encoding="utf-32"?>
– Otherwise, an XML parser is encouraged to guess the encoding while reading the very first bytes of the input XML document:

Head of doc            Encoding guess
0x00 0x3C 0x00 0x3F    UTF-16 (big-endian)
0x3C 0x00 0x3F 0x00    UTF-16 (little-endian)
0x3C 0x3F 0x78 0x6D    UTF-8 (or ASCII, iso-8859-?: erroneous)

(Notice: < = U+003C, ? = U+003F, x = U+0078, m = U+006D)
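The guessing table translates into a simple prefix lookup. The sketch below is a simplification in Python that covers only the three rows of the table; it does not handle byte order marks or any other case a real parser would consider. The function name guess_encoding is hypothetical.

```python
def guess_encoding(head):
    """Guess the encoding from the first four bytes of an XML document."""
    table = [
        (b"\x00\x3c\x00\x3f", "UTF-16 (big-endian)"),
        (b"\x3c\x00\x3f\x00", "UTF-16 (little-endian)"),
        (b"\x3c\x3f\x78\x6d", "UTF-8 (or an ASCII-compatible 8-bit encoding)"),
    ]
    for prefix, guess in table:
        if head.startswith(prefix):
            return guess
    return None                      # no guess possible from this table


declaration = '<?xml version="1.1"?>'
for codec in ("utf-16-be", "utf-16-le", "utf-8"):
    print(codec, "->", guess_encoding(declaration.encode(codec)))
```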


• Insertion: Uniform Resource Identifiers
– URL (Uniform Resource Locator): resolvable identifier on the Web
  • The target of a URL pointer is an HTML file (virtual or materialized)
– URI (Uniform Resource Identifier): general purpose key to resources on the Web
  • Uniquely identifies a resource
  • Target need not be an HTML file, can be anything (schema, table, file, entity, object, tuple, person, physical item, etc.)
  • Lifetime and scope of this "key" is user dependent
– IRI (Internationalized Resource Identifier)
  • Allows non-Latin characters (Chinese, Arabic, Japanese, etc.)
– URL, URI, IRI
  • All strings
  • Very LONG strings
[Fisch05]

2.5 Namespaces

• How the web does work
– Individually created documents linked by ambiguous references
• How the web should work
– Global database of knowledge
• Key to doing that is to permit distributed knowledge creation and lazy integration
• Problems
– Vocabulary collisions
– Joins
• Namespaces
– Build on URI / IRI notion
– Make it possible to uniquely qualify intra-document name collisions
[Lag05]

2.5 Namespaces

<?xml version="1.1" encoding="UTF-8"?>
<Book>
  <ISBN>0743204794</ISBN>
  <author>Kevin Davies</author>
  <title>Cracking the Genome</title>
  <price>20.00</price>
</Book>

<?xml version="1.1" encoding="UTF-8"?>
<html>
  <head>
    <title>My home page</title>
  </head>
  <body>
    <p>My hobby</p>
    <p>My books</p>
  </body>
</html>

2.5 Namespaces

<?xml version="1.1" encoding="UTF-8"?>
<html>
  <head>
    <title>My home page</title>
  </head>
  <body>
    <p>My hobby</p>
    <p>My books
      <Book>
        <ISBN>0743204794</ISBN>
        <author>Kevin Davies</author>
        <title>Cracking the Genome</title>
        <price>20.00</price>
      </Book>
    </p>
  </body>
</html>

2.5 Namespaces

<?xml version="1.1" encoding="UTF-8"?>
<xhtml:html>
  <xhtml:head>
    <xhtml:title>My home page</xhtml:title>
  </xhtml:head>
  <xhtml:body>
    <xhtml:p>My hobby</xhtml:p>
    <xhtml:p>My books
      <bo:Book>
        <bo:ISBN>0743204794</bo:ISBN>
        <bo:author>Kevin Davies</bo:author>
        <bo:title>Cracking the Genome</bo:title>
        <bo:price>20.00</bo:price>
      </bo:Book>
    </xhtml:p>
  </xhtml:body>
</xhtml:html>

2.5 Namespaces

[Figure: the two vocabularies.]
vocabulary bo: bo:Book, bo:title, bo:author, bo:price, bo:ISBN
vocabulary xhtml: xhtml:html, xhtml:head, xhtml:body, xhtml:p, xhtml:title

• Give prefixes only local relevance in an instance document
• Associate local prefix with global namespace name
– a unique name for a namespace
– uniqueness is guaranteed by using a URI in the domain of the party creating the namespace
– doesn't have any meaning, i.e. doesn't have to resolve into anything
• An XML namespace is a collection of names, identified by a URI reference, which are used in XML documents as element types and attribute names.

2.5 Namespaces

<?xml version="1.1" encoding="UTF-8"?>
<xhtml:html
    xmlns:xhtml="http://www.w3c.org/1999/xhtml"
    xmlns:bo="http://www.nogood.com/Book">
  <xhtml:head>
    <xhtml:title>My home page</xhtml:title>
  </xhtml:head>
  <xhtml:body>
    <xhtml:p>My hobby</xhtml:p>
    <xhtml:p>My books
      <bo:Book>
        <bo:ISBN>0743204794</bo:ISBN>
        <bo:author>Kevin Davies</bo:author>
        ………
      </bo:Book>
    </xhtml:p>
  </xhtml:body>
</xhtml:html>

2.5 Namespaces

<?xml version="1.1" encoding="UTF-8"?>
<html
    xmlns="http://www.w3c.org/1999/xhtml"
    xmlns:bo="http://www.nogood.com/Book">
  <head>
    <title>My home page</title>
  </head>
  <body>
    <p>My hobby</p>
    <p>My books
      <bo:Book>
        <bo:ISBN>0743204794</bo:ISBN>
        <bo:author>Kevin Davies</bo:author>
        ………
      </bo:Book>
    </p>
  </body>
</html>

2.5 Namespaces

<?xml version="1.0" encoding="UTF-8"?>
<html
    xmlns="http://www.w3c.org/1999/xhtml">
  <head>
    <title>My home page</title>
  </head>
  <body>
    <p>My hobby</p>
    <p>My books
      <bo:Book
          xmlns:bo="http://www.nogood.com/Book">
        <bo:ISBN>0743204794</bo:ISBN>
        <bo:author>Kevin Davies</bo:author>
        ………
      </bo:Book>
    </p>
  </body>
</html>

2.5 Namespaces

<?xml version="1.0" encoding="UTF-8"?>
<html
    xmlns="http://www.w3c.org/1999/xhtml">
  <head>
    <title>My home page</title>
  </head>
  <body>
    <p>My hobby</p>
    <p>My books
      <Book
          xmlns="http://www.nogood.com/Book">
        <ISBN>0743204794</ISBN>
        <author>Kevin Davies</author>
        ………
      </Book>
    </p>
  </body>
</html>
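That prefixes have only local relevance can be checked with a namespace-aware parser: the prefixed and the default-namespace variants yield the same expanded names. The sketch assumes Python's standard xml.etree.ElementTree, which reports element names as {namespace name}local name; the documents are shortened versions of the examples above, and the function name expanded_names is hypothetical.

```python
import xml.etree.ElementTree as ET

# Variant 1: both vocabularies are bound to explicit prefixes.
PREFIXED = """<xhtml:html xmlns:xhtml="http://www.w3c.org/1999/xhtml"
                          xmlns:bo="http://www.nogood.com/Book">
  <xhtml:body><bo:Book><bo:ISBN>0743204794</bo:ISBN></bo:Book></xhtml:body>
</xhtml:html>"""

# Variant 2: default namespaces, redeclared on the Book element.
DEFAULTED = """<html xmlns="http://www.w3c.org/1999/xhtml">
  <body><Book xmlns="http://www.nogood.com/Book"><ISBN>0743204794</ISBN></Book></body>
</html>"""


def expanded_names(xml_text):
    """List the namespace-qualified names of all elements in document order."""
    return [element.tag for element in ET.fromstring(xml_text).iter()]


for name in expanded_names(PREFIXED):
    print(name)
print(expanded_names(PREFIXED) == expanded_names(DEFAULTED))
```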


1. Introduction

2. XML Basics

3. Schema definition

4. XML query languages I

5. Mapping relational data to XML

6. SQL/XML

7. XML processing

8. XML query languages II

9. XML storage I

10. XML storage - index

11. XML storage - native

12. Updates / Transactions

13. Systems

14. XML Benchmarks

2.6 Overview


2.7 References
• http://www.w3.org/ [W3C]
• Extensible Markup Language (XML) 1.1 (2nd Edition) [XML06]
– W3C Recommendation 16 August 2006, edited in place 29 September 2006
– http://www.w3.org/TR/xml11
• M. Scholl, "XML and Databases", Lecture, Uni Konstanz, WS07/08 [Scholl07]
• Carl Lagoze, "Architecture of Web Information Systems", Cornell University, Spring 05 [Lag05]
– http://www.cs.cornell.edu/Courses/cs431/2005sp/syllabus.htm
• XML in a Nutshell [HM04]
– Harold & Means
– O'Reilly, 2004, ISBN 0596007647
• The Unicode Standard, Version 5.0
– The Unicode Consortium (http://www.unicode.org/)
– Addison-Wesley; 5th edition, 2006, ISBN 0321480910
• Peter Fischer, "XML und Datenbanken", Lecture, ETH Zürich, WS 05/06 [Fisch05]

Questions, Ideas, Comments
• Now, or ...
• Room: IZ 232
• Office hours: Tuesday, 12:30 – 13:30, or by appointment
• Email: [email protected]