
XML Databases

2. XML Basics, 03.11.08

Silke Eckstein

Andreas Kupfer

Institut für Informationssysteme

Technische Universität Braunschweig

http://www.ifis.cs.tu-bs.de

2.1 Introduction

2.2 XML Formalization

2.3 Well-Formedness

2.4 XML Text Declarations

2.5 Namespaces

2.6 Overview

2.7 References


2. XML Basics

XML Databases – Silke Eckstein – Institut für Informationssysteme – TU Braunschweig

2.1 Introduction

Structure of XML documents

XML prolog

Document Type Definition (DTD)

Document instance

XML documents have to be well-formed (see Section 2.3).

<Bücher>
  <Buch>
    <Autor id="1234567890">Rainer Eckstein</Autor>
    <Autor id="1234568723">Silke Eckstein</Autor>
    <Titel>XML und Datenmodellierung</Titel>
    <Untertitel>XML-Schema ...</Untertitel>
    <Verlag id="3-89864">dpunkt.Verlag</Verlag>
  </Buch>
</Bücher>

A document instance is a set of tags that is customized to represent the content, e.g.:

  <Autor>Silke Eckstein</Autor>
  <Titel>XML und Datenmodellierung</Titel>

New types of queries may require new tags: no problem for XML! The resulting set of tags forms a new markup language (an XML dialect).

All tags need to appear in properly nested pairs (e.g., <t> ... <s> ... </s> ... </t>).

Tags can be freely nested to reflect the logical structure of the content. [Scholl07]

XML comes with a number of additional constructs which allow us to convey even more useful information, e.g.:

Attributes may be used to qualify tags (avoid the so-called tag soup).

Instead of

  <question> Is it okay ...? </question>
  <angry> Now I'm ... </angry>

use

  <bubble tone="question">Is it okay ...?</bubble>
  <bubble tone="angry">Now I'm ...</bubble>

More additional constructs:

References establish links internal to an XML document:

Establish link target:

  <character id="phb">The Pointy-Haired Boss</character>

Reference the target:

  <bubble speaker="phb">Speed is the key to success.</bubble>

Outline

2.1 Introduction
2.2 XML Formalization
    Elements
    Attributes
    Entities
    Miscellaneous
    General structure
2.3 Well-Formedness
2.4 XML Text Declarations
2.5 Namespaces
2.6 Overview
2.7 References

2.2 XML Formalization

We will now try to approach XML in a slightly more formal way. This discussion will be based on the central XML technical specification:

Extensible Markup Language (XML) 1.1 (Second Edition), W3C Recommendation, August 2006 (http://www.w3.org/TR/xml11) [Scholl07]

Visit the W3C site

This lecture does not try to be a "guided tour" through the XML-related W3C technical documents (boring!). Instead we will cover the basic principles and most interesting ideas. Visit the W3C site and use the original W3C documents to get a full grasp of their contents.

Elements

… are the basic building blocks of XML documents
… consist of a start-tag and an end-tag with the element content in between
… may also be empty (then written as an empty-element tag)
… may be nested, which leads to the hierarchical structure of XML documents

[Scholl07]

Well-formed XML (fragments):

  <foo> okay </foo>
  <This-is-a-well-formed-XML-tag.> okay </This-is-a-well-formed-XML-tag.>
  <foo>okay</foo>
  <foo/>

Non-well-formed XML:

  <foo> oops </bar>
  <foo> oops </Foo>
  <foo> oops ... ‹EOT› (end of text is reached before the end tag)
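The fragments above can be tried against a real parser. The following is a minimal sketch, assuming Python's standard `xml.etree.ElementTree` module (which wraps an XML 1.0 parser, not an XML 1.1 one); the helper name `is_well_formed` is made up for this illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment: str) -> bool:
    """Return True if the parser accepts the fragment as a complete document."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


samples = [
    "<foo> okay </foo>",
    "<This-is-a-well-formed-XML-tag.> okay </This-is-a-well-formed-XML-tag.>",
    "<foo/>",
    "<foo> oops </bar>",   # end tag names a different element
    "<foo> oops </Foo>",   # names are case-sensitive
    "<foo> oops ...",      # input ends before the end tag
]

for sample in samples:
    print(is_well_formed(sample), sample)
```

The parser only reports that an error occurred and where; it does not try to repair the input.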

Elements – examples:

Nested element:

  <address>
    <street> Rudower Chaussee </street>
    <no> 25 </no>
    <zip> 12489 </zip>
    <city> Berlin </city>
  </address>

Simple element:

  <city> Berlin </city>

Empty element:

  <fax/>

Element content may contain document characters as well as properly nested elements (so-called mixed content): [Scholl07]

Well-formed XML

  <foo><bar>
      <baz> okay </baz>
    </bar>
    <ok> okay </ok> still okay
  </foo>

Non-well-formed XML

  <foo><bar> oops </foo></bar>
  <foo><bar> oops </bar><bar> oops </foo></bar>

Element nesting establishes a parent-child relationship between elements. In the XML fragment <p> <c> ... </c> ... <c'> ... </c'> </p>,

element p is the parent of elements c, c',
elements c, c' are children of element p,
elements c, c' are siblings.

There is exactly one element that encloses the whole XML content: the root element. [Scholl07]

Non-well-formed XML (two top-level elements, hence no single root)

  <one> one eins un </one>
  <two> two zwei deux </two>
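As a small illustration of the parent-child relationship and the single-root rule, here is a sketch using Python's standard `xml.etree.ElementTree`. The element names `p`, `c`, `d` and the helper `parses` are chosen for this example only.

```python
import xml.etree.ElementTree as ET


def parses(text: str) -> bool:
    """Return True if the text is accepted as one complete XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# p is the parent; c and d are its children and siblings of each other.
root = ET.fromstring("<p><c>first</c><d>second</d></p>")
children = [child.tag for child in root]
print(root.tag, children)

# Two top-level elements: there is no single root, so parsing fails.
two_roots = "<one> one eins un </one><two> two zwei deux </two>"
print(parses(two_roots))
```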

Attributes

… may specify further properties of elements
… may not be nested
… are not considered to be children of the containing element (instead they are owned by the containing element)

Attribute values are restricted to character data. [Scholl07]

Well-formed XML (fragments)

  <price currency="Euro"> 23.45 </price>

  <price>
    <currency> Euro </currency>
    23.45
  </price>

An element can contain each attribute only once: [Scholl07]

Non-well-formed XML

  <Team person='Erna' person='Hugo' person='Agnes'/>

Well-formed XML (fragments)

  <Team persons='Erna Hugo Agnes'/>

  <Team person1='Erna' person2='Hugo' person3='Agnes'/>

  <Team>
    <Person>Erna</Person>
    <Person>Hugo</Person>
    <Person>Agnes</Person>
  </Team>
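The three alternatives to the repeated attribute differ in how an application gets at the individual names. A brief sketch, again assuming Python's standard `xml.etree.ElementTree` (the helper `parses` is hypothetical):

```python
import xml.etree.ElementTree as ET


def parses(text: str) -> bool:
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Repeating an attribute name in one tag is rejected by the parser.
duplicate = "<Team person='Erna' person='Hugo' person='Agnes'/>"
print(parses(duplicate))

# One attribute holding a list: the application has to split the value itself.
team = ET.fromstring("<Team persons='Erna Hugo Agnes'/>")
members = team.get("persons").split()

# Child elements: the structure is visible to the parser.
nested = ET.fromstring(
    "<Team><Person>Erna</Person><Person>Hugo</Person><Person>Agnes</Person></Team>"
)
names = [person.text for person in nested.findall("Person")]
print(members, names)
```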

Entities

In XML, document content and markup are specified using a single set of characters.

The characters { <, >, &, ", ' } form pieces of XML markup; to represent them as content, they may be denoted by predefined entities:

  Character   Entity
  <           &lt;
  >           &gt;
  &           &amp;
  "           &quot;
  '           &apos;

The XML entity facility is actually a versatile recursive macro expansion machinery (more on that later). [Scholl07]

Well-formed XML:

  <operators>
    Valid comparison operators are
    &lt;, =, &amp; &gt;.
  </operators>
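In practice the predefined entities are rarely typed by hand. A minimal sketch, assuming Python's standard `xml.sax.saxutils.escape` (which replaces `&`, `<` and `>`, plus any extra characters passed in a mapping):

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

raw = "Valid comparison operators are <, =, & >."

# Replace markup characters by the predefined entities.
escaped = escape(raw)
print(escaped)

# The parser turns the entity references back into the original characters.
parsed = ET.fromstring(f"<operators>{escaped}</operators>").text
print(parsed == raw)

# Quotes are only escaped on request, e.g. for use inside attribute values.
attr = escape('say "hi"', {'"': "&quot;", "'": "&apos;"})
print(attr)
```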

CDATA sections

… may occur anywhere character data may occur.
… are used to escape blocks of text containing characters which would otherwise be recognized as markup.

Within a CDATA section, only the string ']]>' is recognized as markup: left angle brackets and ampersands may occur in their literal form; they need not (and cannot) be escaped using "&lt;" and "&amp;".

CDATA sections cannot nest. [XML06]

Well-formed XML (fragments)

  <![CDATA[<greeting>Hello, world!</greeting>]]>

Comments

… may appear anywhere in a document outside other markup
… may not end with '--->'

Well-formed XML (fragments)

  <!-- declarations for <head> & <body> -->

Non-well-formed XML

  <!-- B+, B, or B--->
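The effect of CDATA sections and the restriction on comments can be observed with a parser. This sketch assumes Python's standard `xml.etree.ElementTree`; the wrapper elements `msg` and `r` are invented so that each fragment becomes a complete document.

```python
import xml.etree.ElementTree as ET


def parses(text: str) -> bool:
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Inside CDATA, "<greeting>" is plain character data, not an element.
msg = ET.fromstring("<msg><![CDATA[<greeting>Hello, world!</greeting>]]></msg>")
print(repr(msg.text), len(msg))  # text content and number of child elements

# "<" and "&" are allowed inside a comment ...
good_comment = "<r><!-- declarations for <head> & <body> --></r>"
# ... but a comment must not end with "--->".
bad_comment = "<r><!-- B+, B, or B---></r>"
print(parses(good_comment), parses(bad_comment))
```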


2.3 Well-Formedness

The W3C XML Recommendation is actually more formal and rigid in defining the syntactical structure of XML:

"A textual object is well-formed XML if,
1. Taken as a whole, it matches the production labeled "document".
2. It meets all the well-formedness constraints given in this [the W3C XML Recommendation] specification. ..."

[Scholl07]

Well-formedness #1: Context-free Properties

All context-free properties of well-formed XML documents are concisely captured by a grammar (using an EBNF-style notation).

Grammar: system of production (rule)s of the form

  lhs ::= rhs

Excerpt of the XML grammar: [Scholl07]

  No    lhs             rhs
  [1]   document        ::= ( prolog element Misc* ) - ( Char* RestrictedChar Char* )
  [2]   Char            ::= [#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
  [2a]  RestrictedChar  ::= [#x1-#x8] | [#xB-#xC] | [#xE-#x1F] | [#x7F-#x84] | [#x86-#x9F]
  [3]   S               ::= (#x20 | #x9 | #xD | #xA)+
  [4]   NameStartChar   ::= ":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6] | [#xD8-#xF6] |
                            [#xF8-#x2FF] | [#x370-#x37D] | [#x37F-#x1FFF] | [#x200C-#x200D] |
                            [#x2070-#x218F] | [#x2C00-#x2FEF] | [#x3001-#xD7FF] |
                            [#xF900-#xFDCF] | [#xFDF0-#xFFFD] | [#x10000-#xEFFFF]
  [4a]  NameChar        ::= NameStartChar | "-" | "." | [0-9] | #xB7 | [#x0300-#x036F] |
                            [#x203F-#x2040]
  [5]   Name            ::= NameStartChar (NameChar)*
  [10]  AttValue        ::= '"' ([^<&"] | Reference)* '"' | "'" ([^<&'] | Reference)* "'"
  [14]  CharData        ::= [^<&]* - ([^<&]* ']]>' [^<&]*)

  No    lhs           rhs
  [22]  prolog        ::= XMLDecl Misc* (doctypedecl Misc*)?
  [23]  XMLDecl       ::= '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'
  [24]  VersionInfo   ::= S 'version' Eq ("'" VersionNum "'" | '"' VersionNum '"')
  [25]  Eq            ::= S? '=' S?
  [26]  VersionNum    ::= '1.1'
  [27]  Misc          ::= Comment | PI | S
  [39]  element       ::= EmptyElemTag | STag content ETag
  [40]  STag          ::= '<' Name (S Attribute)* S? '>'
  [41]  Attribute     ::= Name Eq AttValue
  [42]  ETag          ::= '</' Name S? '>'
  [43]  content       ::= CharData? ((element | Reference | CDSect | PI | Comment) CharData?)*
  [44]  EmptyElemTag  ::= '<' Name (S Attribute)* S? '/>'
  [67]  Reference     ::= EntityRef | CharRef
  [68]  EntityRef     ::= '&' Name ';'

N.B. The numbers in [] refer to the corresponding productions in the W3C XML Recommendation. [Scholl07]
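Productions that only use character classes and repetition, such as [4], [4a] and [5], translate almost literally into a regular expression. The following sketch, using Python's standard `re` module, is one possible transcription; the constant names `NAME_START`, `NAME_CHAR` and `NAME` are chosen for this example.

```python
import re

# Production [4] NameStartChar, written as the inside of a character class.
NAME_START = (
    ":A-Z_a-z"
    "\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF"
    "\u0370-\u037D\u037F-\u1FFF\u200C-\u200D"
    "\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF"
    "\uF900-\uFDCF\uFDF0-\uFFFD\U00010000-\U000EFFFF"
)

# Production [4a] NameChar: NameStartChar plus "-", ".", digits and a few more.
NAME_CHAR = NAME_START + r"\-.0-9" + "\u00B7\u0300-\u036F\u203F-\u2040"

# Production [5] Name ::= NameStartChar (NameChar)*
NAME = re.compile(f"[{NAME_START}][{NAME_CHAR}]*")

for candidate in ["This-is-a-well-formed-XML-tag.", "Bücher", "bo:Book", "1abc", "-foo"]:
    print(candidate, bool(NAME.fullmatch(candidate)))
```

Productions that refer to themselves (e.g. [39] element via [43] content) cannot be handled this way; they need a proper parser.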

  Expression …   … denotes
  r*             ε, r, rr, rrr, …    zero or more repetitions of r
  r+             rr*                 one or more repetitions of r
  r?             r | ε               optional r
  [abc]          a|b|c               character class
  [^abc]                             inverted character class

Remarks

As usual, the XML grammar may systematically be transformed into a program, an XML parser, to be used to check the syntax of XML input. [Scholl07]

  Rule …   … implements this characteristic of XML:
  [1]      an XML document contains exactly one root element
  [10]     attribute values are enclosed in " or '
  [22]     XML documents have to include a declaration prolog
  [14]     characters < and & may not appear literally in element content
  [43]     element content may contain character data and entity references as well as nested elements
  [68]     entity references may contain arbitrary entity names (other than lt, amp, ...)

Parsing XML

1. Starting with the symbol document, the parser uses the lhs ::= rhs rules to expand symbols, constructing a parse tree.
2. The leaves of the parse tree are characters which have no further expansion.
3. The XML input is parsed successfully if it perfectly matches the parse tree's front (concatenate the parse tree leaves from left to right).

[Scholl07]

Example 1

Parse tree for XML input

  <?xml … ?> <bubble speaker="phb">Um... No.</bubble>

  document
    prolog
    element
      STag
        <
        Name: bubble
        S: (space)
        Attribute
          Name: speaker
          Eq
            S?: ε
            =
            S?: ε
          AttValue: "phb"
        S?: ε
        >
      content
        CharData: Um... No.
      ETag
        </
        Name: bubble
        S?: ε
        >
    Misc*: ε

[Scholl07]

Example 2

Parse tree for the "minimal" XML document

  <?xml version="1.1"?><foo/>

  document
    prolog
      XMLDecl
        <?xml
        VersionInfo
          S: (space)
          version
          Eq
            S?: ε
            =
            S?: ε
          "
          VersionNum: 1.1
          "
        EncodingDecl?: ε
        SDDecl?: ε
        S?: ε
        ?>
      Misc*: ε
    element
      EmptyElemTag
        <
        Name: foo
        (S Attribute)*: ε
        S?: ε
        />
    Misc*: ε

[Scholl07]

Well-formedness #2: Context-dependent Properties

The XML grammar cannot enforce all XML well-formedness constraints (WFCs). Some XML WFCs depend on

1. what the XML parser has seen before in its input, or
2. a global state, e.g., the definitions of user-declared entities.

These WFCs cannot be checked by simply comparing the parse tree front against the XML input (context-dependent WFCs). [Scholl07]

Sample WFCs

  WFC                             Comment
  (2) Element Type Match          The Name in an element's end tag must match the element name in the start tag.
  (3) Unique Att Spec             No attribute name may appear more than once in the same start tag or empty element tag.
  (5) No < in Attribute Values    The replacement text of any entity referred to directly or indirectly in an attribute value (other than &lt;) must not contain a <.
  (9) No Recursion                A parsed entity must not contain a recursive reference to itself, either directly or indirectly.

All 10 XML WFCs are given in http://www.w3.org/TR/REC-xml
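WFC (2) shows why a parser needs memory of what it has seen before: to check an end tag it must remember the matching start tag. A common way to do this is a stack of open element names. The sketch below is a deliberately simplified illustration in Python, not a real XML parser: the `TAG` pattern is a hypothetical shortcut that only knows ASCII names, skips attributes unchecked, and would be confused by comments or CDATA sections.

```python
import re

# Simplified tag pattern (an assumption of this sketch): "/" before the name
# marks an end tag, "/" before ">" marks an empty-element tag.
TAG = re.compile(r"<(/?)([A-Za-z_:][A-Za-z0-9_:.\-]*)[^<>]*?(/?)>")


def tags_match(text: str) -> bool:
    """Check WFC 'Element Type Match' using a stack of open element names."""
    open_names = []
    for closing, name, empty in TAG.findall(text):
        if closing:
            # An end tag must name the most recently opened element.
            if not open_names or open_names.pop() != name:
                return False
        elif not empty:
            open_names.append(name)
    # Every start tag must have been closed.
    return not open_names


print(tags_match("<foo><bar> okay </bar></foo>"))
print(tags_match("<foo><bar> oops </foo></bar>"))
```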


2.4 XML Text Declarations

The XML Text Declaration <?xml ... ?>

A well-formed XML 1.1 document has to start with a header, the text declaration (the W3C Recommendation calls it the XML declaration; grammar rule [23]):

  XMLDecl ::= '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'

VersionInfo: An XML document whose text declaration carries a VersionInfo of version="1.1" is required to conform to W3C's XML Recommendation posted on August 16, 2006 (see http://www.w3.org/TR/xml11). [Scholl07]

For XML 1.0 this header is optional, i.e. the following is an XML 1.0 document because it does not have an XML declaration:

  <greeting>Hello, world!</greeting>

Encoding declaration: Documents that use an encoding other than UTF-8 or UTF-16 (see below) must announce so using the XML text declaration, e.g.

  <?xml version="1.1" encoding="iso-8859-15"?> or
  <?xml version="1.1" encoding="utf-32"?>

[Scholl07]

XML Documents and Character Encoding

For a computer, a character like X is nothing but an 8 (16/32) bit number whose value is interpreted as the character X when needed (e.g., to drive a display).

Trouble is, a large number of such number → character mapping tables, the so-called encodings, are in parallel use today. [Scholl07]

… XML Documents and Character Encoding

Due to the huge number of characters needed by the global computing community today (Latin, Hebrew, Arabic, Greek, Japanese, Chinese ... languages), conflicting intersections between encodings are common.

Example:

  Bytes                  Encoding       Characters
  0xa4 0xcb 0xe4 0xd3    iso-8859-7     … Λ δ …
  0xa4 0xcb 0xe4 0xd3    iso-8859-15    € Ë ä Ó

[Scholl07]
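The conflict can be reproduced by decoding the same four bytes with two different mapping tables. A minimal sketch using Python's built-in codecs; how the first byte (0xa4) is mapped under iso-8859-7 depends on which revision of that standard the codec follows, so it is decoded here with `errors="replace"` to be safe.

```python
raw = bytes([0xA4, 0xCB, 0xE4, 0xD3])

# Same bytes, two different number -> character tables.
as_greek = raw.decode("iso-8859-7", errors="replace")
as_latin9 = raw.decode("iso-8859-15")

print(as_greek)
print(as_latin9)
```

Without an encoding declaration, a reader of these bytes has no way to tell which of the two interpretations was intended.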

Unicode

The Unicode Initiative aims to define a new encoding that tries to embrace all character needs: characters of "all" languages of the world, plus scientific, mathematical, technical, box drawing, ... symbols (see http://www.unicode.org/charts/).

Range of the Unicode encoding: 0x0000 – 0x10FFFF (17 * 65536 characters).

Codes that fit into the first 16 bits (denoted U+0000 – U+FFFF) have been assigned to encode the most widely used languages and their characters (Basic Multilingual Plane, BMP).

Codes U+0000 – U+007F have been assigned to match the 7-bit ASCII encoding which is pervasive today. [Scholl07]

Unicode Transformation Formats

Current CPUs operate most efficiently on 32-bit words (16-bit words, 8-bit bytes).

Unicode thus developed Unicode Transformation Formats (UTF) which define how a Unicode character code between U+0000 – U+10FFFF is to be mapped into a 32-bit word (16-bit words, 8-bit bytes).

UTF-32 (map a Unicode character into a 32-bit word)

1. Map any Unicode character in the range U+0000 – U+10FFFF to the corresponding 32-bit value 0x00000000 – 0x0010FFFF.
2. N.B. For each Unicode character encoded in UTF-32 we waste at least 11 zero bits.

[Scholl07]

UTF-16 (map a Unicode character into one or two 16-bit words)

1. Apply the following mapping scheme:

     Unicode range          Word sequence
     U+000000 – U+00FFFF    xxxxxxxxxxxxxxxx
     U+010000 – U+10FFFF    110110xxxxxxxxxx 110111xxxxxxxxxx

2. For the range U+000000 – U+00FFFF, simply fill the x positions with the 16 bits of the character code. (Code ranges U+D800 – U+DBFF and U+DC00 – U+DFFF are unassigned!)
3. For the U+010000 – U+10FFFF range, subtract 0x010000 from the character code and fill the x positions using the resulting 20-bit value.

Example

Unicode character U+012345 (0x012345 – 0x010000 = 0x02345):
UTF-16: 1101100000001000 1101111101000101

[Scholl07]
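The UTF-16 mapping scheme can be written down in a few lines. This is a sketch in Python following the three steps above; the function name `utf16_words` is made up for the illustration, and the result is compared against Python's own `utf-16-be` codec.

```python
def utf16_words(code_point):
    """Map a Unicode character code to one or two 16-bit words."""
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("not an encodable Unicode character code")
    if code_point <= 0xFFFF:
        return [code_point]                  # fits into a single word
    offset = code_point - 0x10000            # 20-bit value
    high = 0xD800 | (offset >> 10)           # 110110 + upper 10 bits
    low = 0xDC00 | (offset & 0x3FF)          # 110111 + lower 10 bits
    return [high, low]


words = utf16_words(0x12345)
bits = " ".join(f"{w:016b}" for w in words)
print(bits)

# Cross-check against the built-in codec (big-endian, no byte order mark).
builtin = chr(0x12345).encode("utf-16-be")
print(b"".join(w.to_bytes(2, "big") for w in words) == builtin)
```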

UTF-16 is designed to facilitate efficient and robust decoding: If we see a leading 11011 bit pattern in a 16-bit word, we know it is the first or second word in a UTF-16 multi-word sequence. The sixth bit of the word then tells us if we actually look at the first or second word.

UTF-8 (map a Unicode character into a sequence of 8-bit bytes)

UTF-8 is of special importance because
a) a stream of 8-bit bytes (octets) is what flows over an IP network connection,
b) text-processing software today is built to deal with 8-bit character encodings (iso-8859-x, ASCII, etc.).

[Scholl07]

UTF-8 encoding

1. Apply the following mapping scheme:

     Unicode range          Byte sequence
     U+000000 – U+00007F    0xxxxxxx
     U+000080 – U+0007FF    110xxxxx 10xxxxxx
     U+000800 – U+00FFFF    1110xxxx 10xxxxxx 10xxxxxx
     U+010000 – U+10FFFF    11110xxx 10xxxxxx 10xxxxxx 10xxxxxx

2. The spare bits (x) are filled with the bits of the character code to be represented (the rightmost x is the least significant bit, pad to the left with 0-bits).

Examples:

Unicode character U+00A9 (© sign):
UTF-8: 11000010 10101001 (0xC2 0xA9)

Unicode character U+2260 (math relation symbol ≠):
UTF-8: 11100010 10001001 10100000 (0xE2 0x89 0xA0)

[Scholl07]

Advantages of UTF-8 encoding

For a UTF-8 multi-byte sequence, the length of the sequence is equal to the number of leading 1-bits (in the first byte), e.g.:

  11100010 10001001 10100000

(Only single-byte UTF-8 encodings have a leading 0-bit.)

Character boundaries are simple to detect (even when placed at some arbitrary position in a UTF-8 byte stream).

UTF-8 encoding does not affect (binary) sort order.

Text processing software which was originally developed to work with the pervasive 7-bit ASCII encoding remains functional. This is especially true for the C programming language and its string (char[]) representation. [Scholl07]
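Both the UTF-8 mapping scheme and the "leading 1-bits give the length" property are easy to express in code. The following Python sketch uses two illustrative helper names, `utf8_bytes` and `sequence_length`, and cross-checks the encoder against Python's built-in `utf-8` codec.

```python
def utf8_bytes(code_point):
    """Map a Unicode character code to its UTF-8 byte sequence."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("outside the Unicode range")
    if code_point <= 0x7F:
        return bytes([code_point])
    if code_point <= 0x7FF:
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point <= 0xFFFF:
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


def sequence_length(first_byte):
    """Length of a UTF-8 sequence, read off the first byte's leading 1-bits."""
    if first_byte & 0x80 == 0:
        return 1                      # single-byte (ASCII) encoding
    length, mask = 0, 0x80
    while first_byte & mask:
        length += 1
        mask >>= 1
    return length


print(utf8_bytes(0x00A9).hex(), utf8_bytes(0x2260).hex())
print(sequence_length(0xE2))
```

`sequence_length` assumes it is given the first byte of a sequence; continuation bytes (pattern 10xxxxxx) are not valid input for it.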

XML and Unicode

A conforming XML parser is required to correctly process UTF-8 and UTF-16 encoded documents. (The W3C XML Recommendation predates the UTF-32 definition.)

Documents that use a different encoding must announce so using the XML text declaration, e.g.

  <?xml version="1.1" encoding="iso-8859-15"?> or <?xml version="1.1" encoding="utf-32"?>

Otherwise, an XML parser is encouraged to guess the encoding while reading the very first bytes of the input XML document:

  Head of doc            Encoding guess
  0x00 0x3C 0x00 0x3F    UTF-16 (big-endian)
  0x3C 0x00 0x3F 0x00    UTF-16 (little-endian)
  0x3C 0x3F 0x78 0x6D    UTF-8 (or ASCII, iso-8859-?: erroneous)

(Notice: < = U+003C, ? = U+003F, x = U+0078, m = U+006D)
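The guessing table amounts to comparing the first four bytes against the encodings of "<?" and "<?xm". A minimal Python sketch covering only the three rows shown above; the function name `guess_encoding` and the returned labels are illustrative.

```python
def guess_encoding(head):
    """Guess the encoding of an XML document from its first four bytes."""
    prefix = head[:4]
    if prefix == b"\x00\x3c\x00\x3f":      # "<?" as two big-endian 16-bit words
        return "UTF-16 (big-endian)"
    if prefix == b"\x3c\x00\x3f\x00":      # "<?" as two little-endian 16-bit words
        return "UTF-16 (little-endian)"
    if prefix == b"\x3c\x3f\x78\x6d":      # "<?xm" in one byte per character
        return "UTF-8 (or another ASCII-compatible encoding)"
    return "unknown"


declaration = '<?xml version="1.1"?><foo/>'
for codec in ("utf-16-be", "utf-16-le", "utf-8"):
    print(codec, "->", guess_encoding(declaration.encode(codec)))
```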


2.5 Namespaces

Excursus: Uniform Resource Identifiers

URL (Uniform Resource Locator): resolvable identifier on the Web
  The target of a URL pointer is an HTML file (virtual or materialized)

URI (Uniform Resource Identifier): general-purpose key to resources on the Web
  Uniquely identifies a resource
  Target need not be an HTML file, can be anything (schema, table, file, entity, object, tuple, person, physical item, etc.)
  Lifetime and scope of this "key" is user dependent

IRI (Internationalized Resource Identifier)
  Allows non-Latin characters (Chinese, Arabic, Japanese, etc.)

URL, URI, IRI
  All strings
  Very LONG strings

[Fisch05]

How the web does work
  Individually created documents linked by ambiguous references

How the web should work
  Global database of knowledge
  Key to doing that is to permit distributed knowledge creation and lazy integration

Problems
  Vocabulary collisions
  Joins

Namespaces
  Build on URI / IRI notion
  Make it possible to uniquely qualify intra-document name collisions

[Lag05]

<?xml version="1.1" encoding="UTF-8"?>
<Book>
  <ISBN>0743204794</ISBN>
  <author>Kevin Davies</author>
  <title>Cracking the Genome</title>
  <price>20.00</price>
</Book>

<?xml version="1.1" encoding="UTF-8"?>
<html>
  <head>
    <title>My home page</title>
  </head>
  <body>
    <p>My hobby</p><p>My books</p>
  </body>
</html>

[Lag05]

<?xml version="1.1" encoding="UTF-8"?>
<html>
  <head>
    <title>My home page</title>
  </head>
  <body>
    <p>My hobby</p>
    <p>My books
      <Book>
        <ISBN>0743204794</ISBN>
        <author>Kevin Davies</author>
        <title>Cracking the Genome</title>
        <price>20.00</price>
      </Book>
    </p>
  </body>
</html>

[Lag05]

<?xml version="1.1" encoding="UTF-8"?>
<xhtml:html>
  <xhtml:head>
    <xhtml:title>My home page</xhtml:title>
  </xhtml:head>
  <xhtml:body>
    <xhtml:p>My hobby</xhtml:p>
    <xhtml:p>My books
      <bo:Book>
        <bo:ISBN>0743204794</bo:ISBN>
        <bo:author>Kevin Davies</bo:author>
        <bo:title>Cracking the Genome</bo:title>
        <bo:price>20.00</bo:price>
      </bo:Book>
    </xhtml:p>
  </xhtml:body>
</xhtml:html>

[Lag05]

vocabulary bo: bo:Book, bo:title, bo:author, bo:price, bo:ISBN

vocabulary xhtml: xhtml:html, xhtml:head, xhtml:body, xhtml:p, xhtml:title

Give prefixes only local relevance in an instance document.

Associate the local prefix with a global namespace name:
  a unique name for a namespace
  uniqueness is guaranteed by using a URI in the domain of the party creating the namespace
  doesn't have any meaning, i.e. doesn't have to resolve into anything

An XML namespace is a collection of names, identified by a URI reference, which are used in XML documents as element types and attribute names. [Lag05]

<?xml version="1.1" encoding="UTF-8"?>
<xhtml:html
    xmlns:xhtml="http://www.w3.org/1999/xhtml"
    xmlns:bo="http://www.nogood.com/Book">
  <xhtml:head>
    <xhtml:title>My home page</xhtml:title>
  </xhtml:head>
  <xhtml:body>
    <xhtml:p>My hobby</xhtml:p>
    <xhtml:p>My books
      <bo:Book>
        <bo:ISBN>0743204794</bo:ISBN>
        <bo:author>Kevin Davies</bo:author>
        ………
      </bo:Book>
    </xhtml:p>
  </xhtml:body>
</xhtml:html>

[Lag05]

<?xml version="1.1" encoding="UTF-8"?>
<html
    xmlns="http://www.w3.org/1999/xhtml"
    xmlns:bo="http://www.nogood.com/Book">
  <head>
    <title>My home page</title>
  </head>
  <body>
    <p>My hobby</p>
    <p>My books
      <bo:Book>
        <bo:ISBN>0743204794</bo:ISBN>
        <bo:author>Kevin Davies</bo:author>
        ………
      </bo:Book>
    </p>
  </body>
</html>

[Lag05]

<?xml version="1.0" encoding="UTF-8"?>
<html
    xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>My home page</title>
  </head>
  <body>
    <p>My hobby</p>
    <p>My books
      <bo:Book
          xmlns:bo="http://www.nogood.com/Book">
        <bo:ISBN>0743204794</bo:ISBN>
        <bo:author>Kevin Davies</bo:author>
        ………
      </bo:Book>
    </p>
  </body>
</html>

<?xml version="1.0" encoding="UTF-8"?>
<html
    xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>My home page</title>
  </head>
  <body>
    <p>My hobby</p>
    <p>My books
      <Book
          xmlns="http://www.nogood.com/Book">
        <ISBN>0743204794</ISBN>
        <author>Kevin Davies</author>
        ………
      </Book>
    </p>
  </body>
</html>
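A namespace-aware parser resolves prefixes and default namespaces to the namespace name, so the two `title` elements no longer collide. The sketch below assumes Python's standard `xml.etree.ElementTree`, which reports each element as `{namespace name}local name`, and assumes `http://www.w3.org/1999/xhtml` as the XHTML namespace name. The prefixes `x` and `b` in the lookup table are local to the query and need not match the prefixes used in the document.

```python
import xml.etree.ElementTree as ET

doc = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>My home page</title></head>
  <body>
    <p>My books
      <bo:Book xmlns:bo="http://www.nogood.com/Book">
        <bo:title>Cracking the Genome</bo:title>
      </bo:Book>
    </p>
  </body>
</html>"""

root = ET.fromstring(doc)
print(root.tag)  # namespace name in braces, followed by the local name

# Query-side prefixes (chosen freely for this example).
ns = {"x": "http://www.w3.org/1999/xhtml", "b": "http://www.nogood.com/Book"}
page_title = root.find("x:head/x:title", ns).text
book_title = root.find(".//b:Book/b:title", ns).text
print(page_title, "|", book_title)
```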


2.6 Overview

1. Introduction
2. XML Basics
3. Schema definition
4. XML query languages I
5. Mapping relational data to XML
6. SQL/XML
7. XML processing
8. XML query languages II
9. XML storage I
10. XML storage - index
11. XML storage - native
12. Updates / Transactions
13. Systems
14. XML Benchmarks

2.7 References

[W3C] http://www.w3.org/

[XML06] Extensible Markup Language (XML) 1.1 (2nd Edition), W3C Recommendation 16 August 2006, edited in place 29 September 2006, http://www.w3.org/TR/xml11

[Scholl07] M. Scholl, "XML and Databases", Lecture, Uni Konstanz, WS07/08

[Lag05] Carl Lagoze, "Architecture of Web Information Systems", Cornell University, Spring 05, http://www.cs.cornell.edu/Courses/cs431/2005sp/syllabus.htm

[HM04] Harold & Means, XML in a Nutshell, O'Reilly, 2004, ISBN 0596007647

The Unicode Consortium, The Unicode Standard, Version 5.0, Addison-Wesley, 5th edition, 2006, ISBN 0321480910 (http://www.unicode.org/)

[Fisch05] Peter Fischer, "XML und Datenbanken", Lecture, ETH Zürich, WS 05/06

Questions, Ideas, Comments

Now, or ...

Room: IZ 232
Office hour: Tuesday, 12:30 – 13:30, or by appointment
Email: [email protected]
