XML Databases
2. XML Basics, 03.11.08
Silke Eckstein
Andreas Kupfer
Institut für Informationssysteme
Technische Universität Braunschweig
http://www.ifis.cs.tu-bs.de
2.1 Introduction
2.2 XML Formalization
2.3 Well-Formedness
2.4 XML Text Declarations
2.5 Namespaces
2.6 Overview
2.7 References
XML Databases – Silke Eckstein – Institut für Informationssysteme – TU Braunschweig
2.1 Introduction
• Structure of XML documents
  – XML prolog
  – Document Type Definition (DTD)
  – Document instance
  – Have to be well-formed (see later)
    <Bücher>
      <Buch>
        <Autor id="1234567890">Rainer Eckstein</Autor>
        <Autor id="1234568723">Silke Eckstein</Autor>
        <Titel>XML und Datenmodellierung</Titel>
        <Untertitel>XML-Schema ...</Untertitel>
        <Verlag id="3-89864">dpunkt.Verlag</Verlag>
      </Buch>
    </Bücher>
• A document instance is a set of tags that is customized to represent the content, e.g.:
    <Autor>Silke Eckstein</Autor>
    <Titel>XML und Datenmodellierung</Titel>
• New types of queries may require new tags: no problem for XML!
  – The resulting set of tags forms a new markup language (XML dialect).
• All tags need to appear in properly nested pairs (e.g., <t> ... <s> ... </s> ... </t>).
• Tags can be freely nested to reflect the logical structure of the content.
[Scholl07]
• XML comes with a number of additional constructs which allow us to convey even more useful information, e.g.:
  – Attributes may be used to qualify tags (avoiding the so-called tag soup). Instead of
      <question> Is it okay ...? </question>
      <angry> Now I'm ... </angry>
    use
      <bubble tone="question">Is it okay ...?</bubble>
      <bubble tone="angry">Now I'm ...</bubble>
• More additional constructs:
  – References establish links internal to an XML document:
    Establish the link target:
      <character id="phb">The Pointy-Haired Boss</character>
    Reference the target:
      <bubble speaker="phb">Speed is the key to success.</bubble>
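The id/speaker link above can be followed programmatically. The following is a minimal sketch in Python, assuming a hypothetical wrapper element <strip> (only <character> and <bubble> come from the slide) and a made-up helper name resolve_speakers.

```python
import xml.etree.ElementTree as ET

# <strip> is a hypothetical root element added so the fragment forms one document.
DOC = """<strip>
  <character id="phb">The Pointy-Haired Boss</character>
  <bubble speaker="phb">Speed is the key to success.</bubble>
</strip>"""


def resolve_speakers(xml_text):
    """Pair each bubble's text with the character its speaker attribute points to."""
    root = ET.fromstring(xml_text)
    # Link targets: map every id value to the text of the element that carries it.
    targets = {c.get("id"): c.text for c in root.iter("character")}
    # References: look up each speaker value in the target map.
    return [(targets.get(b.get("speaker")), b.text) for b in root.iter("bubble")]


print(resolve_speakers(DOC))
```

A speaker value without a matching id simply resolves to None in this sketch; a validating parser working with a DTD could reject such a dangling reference instead.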
Outline
2.1 Introduction
2.2 XML Formalization
  • Elements
  • Attributes
  • Entities
  • Miscellaneous
  • General structure
2.3 Well-Formedness
2.4 XML Text Declarations
2.5 Namespaces
2.6 Overview
2.7 References
2.2 XML Formalization
• We will now try to approach XML in a slightly more formal way.
• This discussion will be based on the central XML technical specification:
  – Extensible Markup Language (XML) 1.1 (Second Edition), W3C Recommendation, August 2006 (http://www.w3.org/TR/xml11)
Visit the W3C site
This lecture does not try to be a "guided tour" through the XML-related W3C technical documents (boring!). Instead we will cover the basic principles and most interesting ideas. Visit the W3C site and use the original W3C documents to get a full grasp of their contents.
[Scholl07]
• Elements
  – … are the basic modules of XML documents
  – … consist of a start-tag and an end-tag with the element content in between
  – … may also be empty (then written as an empty-element tag)
  – … may be nested, which leads to the hierarchical structure of XML documents
  Well-formed XML (fragments):
    <foo> okay </foo>
    <This-is-a-well-formed-XML-tag.> okay </This-is-a-well-formed-XML-tag.>
    <foo>okay</foo>
    <foo/>
  Non-well-formed XML:
    <foo> oops </bar>
    <foo> oops </Foo>
    <foo> oops ... ‹EOT›
[Scholl07]
• Elements – examples:
  Nested element:
    <address>
      <street> Rudower Chaussee </street>
      <no> 25 </no>
      <zip> 12489 </zip>
      <city> Berlin </city>
    </address>
  Simple element:
    <city> Berlin </city>
  Empty element:
    <fax/>
• Element content may contain document characters as well as properly nested elements (so-called mixed content):
  Well-formed XML:
    <foo><bar>
      <baz> okay </baz>
    </bar>
    <ok> okay </ok> still okay
    </foo>
  Non-well-formed XML:
    <foo><bar> oops </foo></bar>
    <foo><bar> oops </bar><bar> oops </foo></bar>
[Scholl07]
• Element nesting establishes a parent-child relationship between elements:
  – In the XML fragment <p> <c> ... </c> ... <c'> ... </c'> </p>,
    • element p is the parent of elements c, c',
    • elements c, c' are children of element p,
    • elements c, c' are siblings.
• There is exactly one element that encloses the whole XML content: the root element.
  Non-well-formed XML:
    <one> one eins un </one>
    <two> two zwei deux </two>
[Scholl07]
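The well-formed and non-well-formed fragments shown so far can be checked mechanically. The sketch below uses Python's standard xml.etree.ElementTree module; the helper name is_well_formed is made up for illustration. Note that this module is built on the expat parser, which implements XML 1.0, so the fragments (which carry no XML declaration) are treated as XML 1.0 input.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts `text` as one complete XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


SAMPLES = [
    "<foo> okay </foo>",
    "<foo/>",
    "<foo><bar> <baz> okay </baz> </bar> <ok> okay </ok> still okay </foo>",
    "<foo> oops </bar>",                                      # end tag does not match
    "<foo> oops </Foo>",                                      # names are case-sensitive
    "<foo> oops ...",                                         # input ends too early
    "<foo><bar> oops </foo></bar>",                           # improper nesting
    "<one> one eins un </one>\n<two> two zwei deux </two>",   # two root elements
]

for sample in SAMPLES:
    print(is_well_formed(sample), repr(sample))
```

The first three samples are accepted and the remaining five are rejected, matching the classification on the slides.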
• Attributes
  – … may specify further properties of elements
  – … may not be nested
  – … are not considered to be children of the containing element (instead they are owned by the containing element)
  – Attribute values are restricted to character data.
  Well-formed XML (fragments):
    <price currency="Euro"> 23.45 </price>
    <price>
      <currency> Euro </currency>
      23.45
    </price>
[Scholl07]
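The difference between an owned attribute and a child element shows up directly in a parsed tree. The following is a small sketch using Python's xml.etree.ElementTree; the helper name describe is hypothetical.

```python
import xml.etree.ElementTree as ET

with_attribute = ET.fromstring('<price currency="Euro"> 23.45 </price>')
with_child = ET.fromstring("<price><currency> Euro </currency> 23.45 </price>")


def describe(element):
    """List what an element owns (attributes) and what it contains (child elements)."""
    return {
        "attributes": dict(element.attrib),
        "children": [child.tag for child in element],
    }


print(describe(with_attribute))
print(describe(with_child))
```

In the first fragment the currency lives in the attribute map and the element has no children; in the second it is a child element and the attribute map is empty.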
• An element can contain each attribute only once:
  Non-well-formed XML:
    <Team person='Erna' person='Hugo' person='Agnes'/>
  Well-formed XML (fragments):
    <Team persons='Erna Hugo Agnes'/>
    <Team person1='Erna' person2='Hugo' person3='Agnes'/>
    <Team>
      <Person>Erna</Person>
      <Person>Hugo</Person>
      <Person>Agnes</Person>
    </Team>
[Scholl07]
• Entities
  – In XML, document content and markup are specified using a single set of characters.
  – The characters { <, >, &, ", ' } form pieces of XML markup; to represent content, they may be denoted by predefined entities:
      Character   Entity
      <           &lt;
      >           &gt;
      &           &amp;
      "           &quot;
      '           &apos;
  – The XML entity facility is actually a versatile recursive macro expansion machinery (more on that later).
  Well-formed XML:
    <operators>
      Valid comparison operators are
      &lt;, =, &amp; &gt;.
    </operators>
[Scholl07]
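Replacing markup characters by the five predefined entities is a purely textual substitution. As a sketch, Python's standard xml.sax.saxutils module offers escape and unescape for this; the wrapper names to_content and from_content are made up here.

```python
from xml.sax.saxutils import escape, unescape

# escape() handles &, < and > by default; the two quote characters are passed explicitly.
QUOTE_ENTITIES = {'"': "&quot;", "'": "&apos;"}


def to_content(text):
    """Replace the five markup characters by their predefined entities."""
    return escape(text, QUOTE_ENTITIES)


def from_content(markup):
    """Expand the five predefined entities back into characters."""
    return unescape(markup, {entity: char for char, entity in QUOTE_ENTITIES.items()})


print(to_content("<, =, & >"))  # prints: &lt;, =, &amp; &gt;
```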
• CDATA sections
  – … may occur anywhere where character data may occur.
  – … are used to escape blocks of text containing characters which would otherwise be recognized as markup.
  – Within a CDATA section, only the string ']]>' is recognized as markup:
    • left angle brackets and ampersands may occur in their literal form;
    • they need not (and cannot) be escaped using "&lt;" and "&amp;".
  – CDATA sections cannot nest.
  Well-formed XML (fragments):
    <![CDATA[<greeting>Hello, world!</greeting>]]>
[XML06]
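A parser hands the content of a CDATA section to the application as plain character data. The sketch below shows this with Python's xml.etree.ElementTree; the wrapper element <note> is a hypothetical addition so that the fragment becomes a complete document.

```python
import xml.etree.ElementTree as ET

# <note> is a hypothetical root element around the CDATA section from the slide.
note = ET.fromstring("<note><![CDATA[<greeting>Hello, world!</greeting>]]></note>")
print(note.text)   # the literal characters, angle brackets included
print(len(note))   # number of child elements: the "tags" were never parsed as markup

# Inside CDATA an entity reference is not expanded either; it stays as typed.
literal = ET.fromstring("<note><![CDATA[&lt;]]></note>")
print(literal.text)
```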
• Comments
  – … may appear anywhere in a document outside other markup
  – … may not end with '--->'
  Well-formed XML (fragments):
    <!-- declarations for <head> & <body> -->
  Non-well-formed XML:
    <!-- B+, B, or B--->
2.3 Well-Formedness
• The W3C XML recommendation is actually more formal and rigid in defining the syntactical structure of XML:
  – "A textual object is well-formed XML if,
    1. Taken as a whole, it matches the production labeled "document".
    2. It meets all the well-formedness constraints given in this [the W3C XML Recommendation] specification. ..."
[Scholl07]
• Well-formedness #1: Context-free Properties
  – All context-free properties of well-formed XML documents are concisely captured by a grammar (using an EBNF-style notation).
    • Grammar: system of productions (rules) of the form lhs ::= rhs
  – Excerpt of the XML grammar (see below):
[Scholl07]
No    lhs             rhs
[1]   document        ::= ( prolog element Misc* ) - ( Char* RestrictedChar Char* )
[2]   Char            ::= [#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
[2a]  RestrictedChar  ::= [#x1-#x8] | [#xB-#xC] | [#xE-#x1F] | [#x7F-#x84] | [#x86-#x9F]
[3]   S               ::= (#x20 | #x9 | #xD | #xA)+
[4]   NameStartChar   ::= ":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6] | [#xD8-#xF6] |
                          [#xF8-#x2FF] | [#x370-#x37D] | [#x37F-#x1FFF] | [#x200C-#x200D] |
                          [#x2070-#x218F] | [#x2C00-#x2FEF] | [#x3001-#xD7FF] |
                          [#xF900-#xFDCF] | [#xFDF0-#xFFFD] | [#x10000-#xEFFFF]
[4a]  NameChar        ::= NameStartChar | "-" | "." | [0-9] | #xB7 | [#x0300-#x036F] |
                          [#x203F-#x2040]
[5]   Name            ::= NameStartChar (NameChar)*
[10]  AttValue        ::= '"' ([^<&"] | Reference)* '"' | "'" ([^<&'] | Reference)* "'"
[14]  CharData        ::= [^<&]* - ([^<&]* ']]>' [^<&]*)
[22]  prolog          ::= XMLDecl Misc* (doctypedecl Misc*)?
[23]  XMLDecl         ::= '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'
[24]  VersionInfo     ::= S 'version' Eq ("'" VersionNum "'" | '"' VersionNum '"')
[25]  Eq              ::= S? '=' S?
[26]  VersionNum      ::= '1.1'
[27]  Misc            ::= Comment | PI | S
[39]  element         ::= EmptyElemTag | STag content ETag
[40]  STag            ::= '<' Name (S Attribute)* S? '>'
[41]  Attribute       ::= Name Eq AttValue
[42]  ETag            ::= '</' Name S? '>'
[43]  content         ::= CharData? ((element | Reference | CDSect | PI | Comment) CharData?)*
[44]  EmptyElemTag    ::= '<' Name (S Attribute)* S? '/>'
[67]  Reference       ::= EntityRef | CharRef
[68]  EntityRef       ::= '&' Name ';'
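Productions [4], [4a] and [5] translate almost mechanically into a regular expression. The following Python sketch is one possible transcription of the character ranges listed above; the names NAME_START, NAME_CHAR and is_name are made up for illustration.

```python
import re

# Production [4] NameStartChar, written as the inside of a character class.
NAME_START = (
    ":A-Z_a-z"
    "\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D\u037F-\u1FFF"
    "\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF"
    "\uF900-\uFDCF\uFDF0-\uFFFD\U00010000-\U000EFFFF"
)
# Production [4a] NameChar: NameStartChar plus "-", ".", digits and a few extra ranges.
NAME_CHAR = NAME_START + r"\-.0-9" + "\u00B7\u0300-\u036F\u203F-\u2040"
# Production [5] Name ::= NameStartChar (NameChar)*
NAME_RE = re.compile("[" + NAME_START + "][" + NAME_CHAR + "]*")


def is_name(text):
    """Return True if the whole string matches production [5]."""
    return NAME_RE.fullmatch(text) is not None


for candidate in ["This-is-a-well-formed-XML-tag.", "Bücher", "xhtml:title", "1foo", "-foo"]:
    print(candidate, is_name(candidate))
```

The first three candidates are accepted; the last two are rejected because a digit and a hyphen are NameChar but not NameStartChar.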
• N.B.
  – The numbers in [] refer to the corresponding productions in the W3C XML Recommendation.
  Expression …   … denotes
  r*             ε, r, rr, rrr, …     zero or more repetitions of r
  r+             rr*                  one or more repetitions of r
  r?             r | ε                optional r
  [abc]          a|b|c                character class
  [^abc]                              inverted character class
[Scholl07]
• Remarks
  – As usual, the XML grammar may systematically be transformed into a program, an XML parser, to be used to check the syntax of XML input.
  Rule …   … implements this characteristic of XML:
  [1]      an XML document contains exactly one root element
  [10]     attribute values are enclosed in " or '
  [22]     XML documents have to include a declaration prolog
  [14]     characters < and & may not appear literally in element content
  [43]     element content may contain character data and entity references as well as nested elements
  [68]     entity references may contain arbitrary entity names (other than lt, amp, ...)
[Scholl07]
• Parsing XML
  1. Starting with the symbol document, the parser uses the lhs ::= rhs rules to expand symbols, constructing a parse tree.
  2. The leaves of the parse tree are characters which have no further expansion.
  3. The XML input is parsed successfully if it perfectly matches the parse tree's front (concatenate the parse tree leaves from left to right).
[Scholl07]
• Example 1
  – Parse tree for XML input
      <?xml … ?> <bubble speaker="phb">Um... No.</bubble>
  [Figure: parse tree rooted at document with children prolog, element and Misc*. The element node expands into a start tag (<, Name bubble, S, Attribute with Name speaker, Eq and AttValue "phb", S?, >), content (CharData "Um... No.") and an end tag (</, Name bubble, S?, >); unused optional parts expand to ε.]
[Scholl07]
• Example 2
  – Parse tree for the "minimal" XML document
      <?xml version="1.1"?><foo/>
  [Figure: parse tree rooted at document with children prolog, element and Misc*. The prolog expands into XMLDecl (<?xml, VersionInfo with S, version, Eq, "1.1" as VersionNum, EncodingDecl?, S?, ?>) and Misc*; the element expands into EmptyElemTag (<, Name foo, (S Attribute)* S?, />); unused optional parts expand to ε.]
[Scholl07]
• Well-formedness #2: Context-dependent Properties
  – The XML grammar cannot enforce all XML well-formedness constraints (WFCs).
  – Some XML WFCs depend on
    1. what the XML parser has seen before in its input, or
    2. a global state, e.g., the definitions of user-declared entities.
  – These WFCs cannot be checked by simply comparing the parse tree front against the XML input (context-dependent WFCs).
[Scholl07]
• Sample WFCs
  – All 10 XML WFCs are given in http://www.w3.org/TR/REC-xml
  WFC                            Comment
  (2) Element Type Match         The Name in an element's end tag must match the element name in the start tag.
  (3) Unique Att Spec            No attribute name may appear more than once in the same start tag or empty element tag.
  (5) No < in Attribute Values   The replacement text of any entity referred to directly or indirectly in an attribute value (other than &lt;) must not contain a <.
  (9) No Recursion               A parsed entity must not contain a recursive reference to itself, either directly or indirectly.
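WFCs (2) and (3) are context-dependent because checking them requires remembering what was seen earlier: the open start tags and the attribute names of the current tag. The following Python sketch illustrates this with a stack and a set. It is a deliberately naive illustration, not a parser: the tokenizer regexes and the function name check_wfcs are assumptions, and comments, CDATA sections, processing instructions and '>' inside attribute values are ignored.

```python
import re

# Naive tag tokenizer: optional "/", a simplified name, the rest of the tag, optional "/".
TAG = re.compile(r"<(/?)([A-Za-z_][\w.\-]*)([^<>]*?)(/?)>")
ATTRIBUTE = re.compile(r"""([A-Za-z_][\w.\-]*)\s*=\s*(?:"[^"]*"|'[^']*')""")


def check_wfcs(text):
    """Check Element Type Match and Unique Att Spec on a small XML fragment."""
    open_elements = []  # context: names of start tags not yet closed
    for match in TAG.finditer(text):
        closing, name, rest, empty = match.groups()
        if closing:
            # WFC (2): the end tag must name the most recently opened element.
            if not open_elements or open_elements.pop() != name:
                return "Element Type Match violated at </%s>" % name
            continue
        # WFC (3): no attribute name may occur twice within one tag.
        names = [attr.group(1) for attr in ATTRIBUTE.finditer(rest)]
        if len(names) != len(set(names)):
            return "Unique Att Spec violated in <%s>" % name
        if not empty:
            open_elements.append(name)
    if open_elements:
        return "unclosed element <%s>" % open_elements[-1]
    return "ok"


print(check_wfcs("<foo> oops </bar>"))
print(check_wfcs("<Team person='Erna' person='Hugo' person='Agnes'/>"))
print(check_wfcs("<foo><bar> okay </bar></foo>"))
```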
2.4 XML Text Declarations
• The XML Text Declaration <?xml ... ?>
  – A well-formed XML 1.1 document has to start with a header, the text declaration (grammar rule [23], called the XML declaration in the W3C Recommendation):
      XMLDecl ::= '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'
  – VersionInfo:
    • An XML document whose text declaration carries a VersionInfo of version="1.1" is required to conform to W3C's XML Recommendation posted on August 16, 2006 (see http://www.w3.org/TR/xml11).
[Scholl07]
• For XML 1.0 this header is optional
  – I.e., the following is an XML 1.0 document because it does not have an XML declaration:
      <greeting>Hello, world!</greeting>
• Encoding declaration:
  – Documents that use an encoding other than UTF-8 or UTF-16 (see below) must announce so using the XML text declaration, e.g.
      <?xml version="1.1" encoding="iso-8859-15"?> or
      <?xml version="1.1" encoding="utf-32"?>
[Scholl07]
• XML Documents and Character Encoding
  – For a computer, a character like X is nothing but an 8 (16/32) bit number whose value is interpreted as the character X when needed (e.g., to drive a display).
  – Trouble is, a large number of such number → character mapping tables, the so-called encodings, are in parallel use today.
[Scholl07]
• … XML Documents and Character Encoding
  – Due to the huge number of characters needed by the global computing community today (Latin, Hebrew, Arabic, Greek, Japanese, Chinese ... languages), conflicting intersections between encodings are common.
  – Example:
    • 0xa4 0xcb 0xe4 0xd3 → € Λ δ Σ (iso-8859-7)
    • 0xa4 0xcb 0xe4 0xd3 → € Ë ä Ó (iso-8859-15)
[Scholl07]
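The conflict between the two encodings can be reproduced by decoding the same four bytes twice. This is a minimal Python sketch using the codecs shipped with the standard library; the helper name decode_with is made up.

```python
RAW = bytes([0xA4, 0xCB, 0xE4, 0xD3])


def decode_with(encodings, data=RAW):
    """Interpret the same byte string under each of the given encodings."""
    return {name: data.decode(name) for name in encodings}


for name, text in decode_with(["iso-8859-7", "iso-8859-15"]).items():
    print(name, " ".join(text))
```

The bytes are identical in both runs; only the number → character table changes, which is why an XML document has to state which table it uses.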
• Unicode
  – The Unicode Initiative aims to define a new encoding that tries to embrace all character needs:
    • characters of "all" languages of the world,
    • plus scientific, mathematical, technical, box drawing, ... symbols (see http://www.unicode.org/charts/).
  – Range of the Unicode encoding: 0x0000 – 0x10FFFF (17 * 65536 characters).
    • Codes that fit into the first 16 bits (denoted U+0000 – U+FFFF) have been assigned to encode the most widely used languages and their characters (Basic Multilingual Plane, BMP).
    • Codes U+0000 – U+007F have been assigned to match the 7-bit ASCII encoding which is pervasive today.
[Scholl07]
• Unicode Transformation Formats
  – Current CPUs operate most efficiently on 32-bit words (16-bit words, 8-bit bytes).
  – Unicode thus developed Unicode Transformation Formats (UTF) which define how a Unicode character code between U+0000 – U+10FFFF is to be mapped into a 32-bit word (16-bit words, 8-bit bytes).
• UTF-32 (map a Unicode character into a 32-bit word)
  1. Map any Unicode character in the range U+0000 – U+10FFFF to the corresponding 32-bit value 0x00000000 – 0x0010FFFF.
  2. N.B. For each Unicode character encoded in UTF-32 we waste at least 11 zero bits.
[Scholl07]
• UTF-16 (map a Unicode character into one or two 16-bit words)
  1. Apply the following mapping scheme:
       Unicode range          Word sequence
       U+000000 – U+00FFFF    xxxxxxxxxxxxxxxx
       U+010000 – U+10FFFF    110110xxxxxxxxxx 110111xxxxxxxxxx
  2. For the range U+000000 – U+00FFFF, simply fill the x positions with the 16 bits of the character code. (Code ranges U+D800 – U+DBFF and U+DC00 – U+DFFF are unassigned!)
  3. For the U+010000 – U+10FFFF range, subtract 0x010000 from the character code and fill the x positions using the resulting 20-bit value.
  Example
    Unicode character U+012345 (0x012345 – 0x010000 = 0x02345):
    UTF-16: 1101100000001000 1101111101000101
[Scholl07]
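The UTF-16 mapping steps can be written down in a few lines. The Python sketch below follows the scheme described above (subtract 0x010000, then split the 20-bit value into two 10-bit halves); the function name utf16_words is made up, and the result is cross-checked against Python's built-in utf-16-be codec.

```python
def utf16_words(code_point):
    """Map one Unicode code point to its one or two 16-bit UTF-16 words."""
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("not an encodable Unicode character code")
    if code_point <= 0xFFFF:
        return [code_point]                    # one word, bits copied as they are
    value = code_point - 0x10000               # 20-bit value
    first = 0xD800 | (value >> 10)             # 110110 + upper 10 bits
    second = 0xDC00 | (value & 0x3FF)          # 110111 + lower 10 bits
    return [first, second]


words = utf16_words(0x12345)
print(" ".join(format(word, "016b") for word in words))
# prints: 1101100000001000 1101111101000101

# Cross-check against the codec from the standard library.
as_bytes = b"".join(word.to_bytes(2, "big") for word in words)
print(as_bytes == chr(0x12345).encode("utf-16-be"))
```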
• UTF-8
  – UTF-16 is designed to facilitate efficient and robust decoding:
    • If we see a leading 11011 bit pattern in a 16-bit word, we know it is the first or second word in a UTF-16 multi-word sequence.
    • The sixth bit of the word then tells us if we actually look at the first or second word.
  – UTF-8 (map a Unicode character into a sequence of 8-bit bytes)
    • UTF-8 is of special importance because
      a) a stream of 8-bit bytes (octets) is what flows over an IP network connection,
      b) text-processing software today is built to deal with 8-bit character encodings (iso-8859-x, ASCII, etc.).
[Scholl07]
• UTF-8 encoding
  1. Apply the following mapping scheme:
       Unicode range          Byte sequence
       U+000000 – U+00007F    0xxxxxxx
       U+000080 – U+0007FF    110xxxxx 10xxxxxx
       U+000800 – U+00FFFF    1110xxxx 10xxxxxx 10xxxxxx
       U+010000 – U+10FFFF    11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
  2. The spare bits (x) are filled with the bits of the character code to be represented (rightmost x is least significant bit, pad to the left with 0-bits).
  Examples:
    • Unicode character U+00A9 (© sign):
      UTF-8: 11000010 10101001 (0xC2 0xA9)
    • Unicode character U+2260 (math relation symbol ≠):
      UTF-8: 11100010 10001001 10100000 (0xE2 0x89 0xA0)
[Scholl07]
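The UTF-8 mapping scheme can be implemented with a handful of shifts and masks. The following Python sketch is a hand-written encoder for illustration only (the name utf8_bytes is made up; in practice one would call str.encode); it reproduces the two examples above.

```python
def utf8_bytes(code_point):
    """Encode one Unicode character code as UTF-8, following the range table."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("outside the Unicode range")
    if code_point <= 0x7F:                       # 0xxxxxxx
        return bytes([code_point])
    if code_point <= 0x7FF:                      # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point <= 0xFFFF:                     # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    return bytes([0xF0 | (code_point >> 18),     # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


for cp in (0x00A9, 0x2260):
    encoded = utf8_bytes(cp)
    print("U+%04X" % cp, " ".join(format(byte, "08b") for byte in encoded), encoded.hex())
```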
• Advantages of UTF-8 encoding
  – For a UTF-8 multi-byte sequence, the length of the sequence is equal to the number of leading 1-bits (in the first byte), e.g.:
      11100010 10001001 10100000
    (Only single-byte UTF-8 encodings have a leading 0-bit.)
  – Character boundaries are simple to detect (even when placed at some arbitrary position in a UTF-8 byte stream).
  – UTF-8 encoding does not affect (binary) sort order.
  – Text processing software which was originally developed to work with the pervasive 7-bit ASCII encoding remains functional. This is especially true for the C programming language and its string (char[]) representation.
[Scholl07]
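The first three advantages can be observed directly on byte strings. The Python sketch below is illustrative only; the helper names sequence_length, is_continuation and next_boundary are made up.

```python
def sequence_length(first_byte):
    """Length of a UTF-8 sequence, read off the leading 1-bits of its first byte."""
    if first_byte & 0x80 == 0:
        return 1                      # leading 0-bit: single-byte sequence
    length, mask = 0, 0x80
    while first_byte & mask:
        length += 1
        mask >>= 1
    return length


def is_continuation(byte):
    """Bytes of the form 10xxxxxx never start a character."""
    return byte & 0xC0 == 0x80


def next_boundary(data, position):
    """From an arbitrary position, skip forward to the start of the next character."""
    while position < len(data) and is_continuation(data[position]):
        position += 1
    return position


data = "a\u2260b".encode("utf-8")          # 0x61, 0xE2 0x89 0xA0, 0x62
print(sequence_length(data[1]))            # first byte of the three-byte sequence
print(next_boundary(data, 2))              # landing inside the sequence, resynchronise

# Sorting the encoded bytes gives the same order as sorting by character code.
words = ["z", "\u00e9", "\u2260", "a", "\U00012345"]
print(sorted(words) == [b.decode("utf-8") for b in sorted(w.encode("utf-8") for w in words)])
```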
• XML and Unicode
  – A conforming XML parser is required to correctly process UTF-8 and UTF-16 encoded documents. (The W3C XML Recommendation predates the UTF-32 definition.)
  – Documents that use a different encoding must announce so using the XML text declaration, e.g.
      <?xml version="1.1" encoding="iso-8859-15"?> or <?xml version="1.1" encoding="utf-32"?>
  – Otherwise, an XML parser is encouraged to guess the encoding while reading the very first bytes of the input XML document:
      Head of doc            Encoding guess
      0x00 0x3C 0x00 0x3F    UTF-16 (big-endian)
      0x3C 0x00 0x3F 0x00    UTF-16 (little-endian)
      0x3C 0x3F 0x78 0x6D    UTF-8 (or ASCII, iso-8859-?: erroneous)
    (Notice: < = U+003C, ? = U+003F, x = U+0078, m = U+006D)
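The guessing table can be expressed as a lookup on the first four bytes. The following Python sketch covers only the three rows shown above; the names GUESSES and guess_encoding are made up, and a real parser would consider further cases such as byte order marks.

```python
# First four bytes of "<?xm" or "<?" under the encodings from the table.
GUESSES = {
    bytes([0x00, 0x3C, 0x00, 0x3F]): "UTF-16 (big-endian)",
    bytes([0x3C, 0x00, 0x3F, 0x00]): "UTF-16 (little-endian)",
    bytes([0x3C, 0x3F, 0x78, 0x6D]): "UTF-8",
}


def guess_encoding(document_bytes):
    """Return the guess for the head of the document, or None if no row matches."""
    return GUESSES.get(bytes(document_bytes[:4]))


declaration = '<?xml version="1.1"?><foo/>'
for codec in ("utf-16-be", "utf-16-le", "utf-8"):
    print(codec, "->", guess_encoding(declaration.encode(codec)))
```

A document without a declaration, such as `<foo/>`, matches none of the rows; in this sketch the function then returns None.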
2.5 Namespaces
• Insertion: Uniform Resource Identifiers
  – URL (Uniform Resource Locator): resolvable identifier on the Web
    • The target of a URL pointer is an HTML file (virtual or materialized)
  – URI (Uniform Resource Identifier): general purpose key to resources on the Web
    • Uniquely identifies a resource
    • Target is not necessarily an HTML file; it can be anything (schema, table, file, entity, object, tuple, person, physical item, etc.)
    • Lifetime and scope of this "key" is user dependent
  – IRI (Internationalized Resource Identifier)
    • Allows non-Latin characters (Chinese, Arabic, Japanese, etc.)
  – URL, URI, IRI
    • All strings
    • Very LONG strings
[Fisch05]
• How the web does work
  – Individually created documents linked by ambiguous references
• How the web should work
  – Global database of knowledge
• Key to doing that is to permit distributed knowledge creation and lazy integration
• Problems
  – Vocabulary collisions
  – Joins
• Namespaces
  – Build on URI / IRI notion
  – Make it possible to uniquely qualify intra-document name collisions
[Lag05]
<?xml version="1.1" encoding="UTF-8"?>
<Book>
  <ISBN>0743204794</ISBN>
  <author>Kevin Davies</author>
  <title>Cracking the Genome</title>
  <price>20.00</price>
</Book>
<?xml version="1.1" encoding="UTF-8"?>
<html>
  <head>
    <title>My home page</title>
  </head>
  <body>
    <p>My hobby</p><p>My books</p>
  </body>
</html>
[Lag05]
<?xml version="1.1" encoding="UTF-8"?>
<html>
  <head>
    <title>My home page</title>
  </head>
  <body>
    <p>My hobby</p>
    <p>My books
      <Book>
        <ISBN>0743204794</ISBN>
        <author>Kevin Davies</author>
        <title>Cracking the Genome</title>
        <price>20.00</price>
      </Book>
    </p>
  </body>
</html>
[Lag05]
<?xml version="1.1" encoding="UTF-8"?>
<xhtml:html>
  <xhtml:head>
    <xhtml:title>My home page</xhtml:title>
  </xhtml:head>
  <xhtml:body>
    <xhtml:p>My hobby</xhtml:p>
    <xhtml:p>My books
      <bo:Book>
        <bo:ISBN>0743204794</bo:ISBN>
        <bo:author>Kevin Davies</bo:author>
        <bo:title>Cracking the Genome</bo:title>
        <bo:price>20.00</bo:price>
      </bo:Book>
    </xhtml:p>
  </xhtml:body>
</xhtml:html>
[Lag05]
[Figure: two vocabularies. Vocabulary bo: bo:Book, bo:title, bo:author, bo:price, bo:ISBN. Vocabulary xhtml: xhtml:html, xhtml:head, xhtml:body, xhtml:p, xhtml:title.]
[Lag05]
• Give prefixes only local relevance in an instance document
• Associate local prefix with global namespace name
  – a unique name for a namespace
  – uniqueness is guaranteed by using a URI in the domain of the party creating the namespace
  – doesn't have any meaning, i.e. doesn't have to resolve into anything
• An XML namespace is a collection of names, identified by a URI reference, which are used in XML documents as element types and attribute names.
[Lag05]
<?xml version="1.1" encoding="UTF-8"?>
<xhtml:html
    xmlns:xhtml="http://www.w3c.org/1999/xhtml"
    xmlns:bo="http://www.nogood.com/Book">
  <xhtml:head>
    <xhtml:title>My home page</xhtml:title>
  </xhtml:head>
  <xhtml:body>
    <xhtml:p>My hobby</xhtml:p>
    <xhtml:p>My books
      <bo:Book>
        <bo:ISBN>0743204794</bo:ISBN>
        <bo:author>Kevin Davies</bo:author>
        ………
      </bo:Book>
    </xhtml:p>
  </xhtml:body>
</xhtml:html>
[Lag05]
<?xml version="1.1" encoding="UTF-8"?>
<html
    xmlns="http://www.w3c.org/1999/xhtml"
    xmlns:bo="http://www.nogood.com/Book">
  <head>
    <title>My home page</title>
  </head>
  <body>
    <p>My hobby</p>
    <p>My books
      <bo:Book>
        <bo:ISBN>0743204794</bo:ISBN>
        <bo:author>Kevin Davies</bo:author>
        ………
      </bo:Book>
    </p>
  </body>
</html>
[Lag05]
<?xml version="1.0" encoding="UTF-8"?>
<html
    xmlns="http://www.w3c.org/1999/xhtml">
  <head>
    <title>My home page</title>
  </head>
  <body>
    <p>My hobby</p>
    <p>My books
      <bo:Book
          xmlns:bo="http://www.nogood.com/Book">
        <bo:ISBN>0743204794</bo:ISBN>
        <bo:author>Kevin Davies</bo:author>
        ………
      </bo:Book>
    </p>
  </body>
</html>
[Lag05]
<?xml version="1.0" encoding="UTF-8"?>
<html
    xmlns="http://www.w3c.org/1999/xhtml">
  <head>
    <title>My home page</title>
  </head>
  <body>
    <p>My hobby</p>
    <p>My books
      <Book
          xmlns="http://www.nogood.com/Book">
        <ISBN>0743204794</ISBN>
        <author>Kevin Davies</author>
        ………
      </Book>
    </p>
  </body>
</html>
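That prefixes have only local relevance can be observed with a namespace-aware parser: it reports each element under its namespace name plus local name, regardless of the prefix used. The Python sketch below uses xml.etree.ElementTree, which writes such names as {namespace}local; the three shortened fragments and the helper name expanded_names are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Three spellings of the same content: prefix bo, another prefix, default namespace.
WITH_BO = ('<p xmlns:bo="http://www.nogood.com/Book">'
           "<bo:Book><bo:ISBN>0743204794</bo:ISBN></bo:Book></p>")
WITH_OTHER_PREFIX = ('<p xmlns:b="http://www.nogood.com/Book">'
                     "<b:Book><b:ISBN>0743204794</b:ISBN></b:Book></p>")
WITH_DEFAULT = ('<p><Book xmlns="http://www.nogood.com/Book">'
                "<ISBN>0743204794</ISBN></Book></p>")


def expanded_names(xml_text):
    """Return the element names as the parser sees them, in document order."""
    return [element.tag for element in ET.fromstring(xml_text).iter()]


for fragment in (WITH_BO, WITH_OTHER_PREFIX, WITH_DEFAULT):
    print(expanded_names(fragment))
```

All three fragments yield the same list of names, so an application comparing names never needs to look at the prefix.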
2.6 Overview
1. Introduction
2. XML Basics
3. Schema definition
4. XML query languages I
5. Mapping relational data to XML
6. SQL/XML
7. XML processing
8. XML query languages II
9. XML storage I
10. XML storage - index
11. XML storage - native
12. Updates / Transactions
13. Systems
14. XML Benchmarks
2.7 References
• http://www.w3.org/ [W3C]
• Extensible Markup Language (XML) 1.1 (2nd Edition) [XML06]
  – W3C Recommendation 16 August 2006, edited in place 29 September 2006
  – http://www.w3.org/TR/xml11
• M. Scholl, "XML and Databases", Lecture, Uni Konstanz, WS07/08 [Scholl07]
• Carl Lagoze, "Architecture of Web Information Systems", Cornell University, Spring 05 [Lag05]
  http://www.cs.cornell.edu/Courses/cs431/2005sp/syllabus.htm
• XML in a Nutshell [HM04]
  – Harold & Means
  – O'Reilly, 2004, ISBN 0596007647
• The Unicode Standard, Version 5.0
  – The Unicode Consortium (http://www.unicode.org/)
  – Addison-Wesley; 5th edition, 2006, ISBN 0321480910
• Peter Fischer, "XML und Datenbanken", Lecture, ETH Zürich, WS 05/06 [Fisch05]
• Now, or ...
• Room: IZ 232
• Office hour: Tuesday, 12:30 – 13:30, or by appointment
• Email: [email protected]