Indexing XML Data in RDBMS using ORDPATH
Microsoft® SQL Server 2005™
Concepts developed by:
Patrick O'Neil, Elizabeth O'Neil
(University of Massachusetts Boston)
Shankar Pal, Istvan Cseri, Oliver Seeliger, Gideon Schaller, Leo Giakoumakis, Vasili Zolotov, Nigel Westbury
(Microsoft Corporation)
5 July 2006, Stephan Müller
<BOOK ISBN="1-55860-438-3">
<SECTION>
<TITLE> Bad Bugs</TITLE>
Nobody loves bad bugs.
<FIGURE CAPTION="Sample bug"/>
</SECTION>
<SECTION>
<TITLE> Tree Frogs </TITLE>
All right-thinking people
<BOLD> love </BOLD> tree frogs.
</SECTION>
</BOOK>
XML Data Model
XML Document / Fragment - Properties:
► Hierarchy
► Document Order: 1 < 2 < 3 < 4 < 5 < … < 11 < 12
[Figure: tree of the sample document (Book, ISBN, Section, Title, Figure, Caption, Bold and text nodes), with the nodes numbered 1 to 12 in document order]
SQL with embedded XQuery and XPath:
SELECT id, xdoc.query('
for $s in
/BOOK[@ISBN="1-55860-438-3"]//SECTION
return <topic> { data($s/TITLE) } </topic> ')
FROM docs;
SQL Command:
CREATE TABLE docs (
id INT PRIMARY KEY,
xdoc XML
);
Created docs Table:
ID   XDOC
1    XML Fragment as BLOB
2    XML Document as BLOB
…    …
7    XML Fragment as BLOB
ORDPATH
What we expect from a labeling scheme:
► Support for structural fidelity (Hierarchy + Document Order)
► Support for efficient structural modifications to the XML tree
-- insert sub-tree
-- delete sub-tree
-- move sub-tree
► Support for high-performance query plans for native XML queries using relational primitives
► Independence of XML schemas typing XML instances
without relabeling!!!
Example of an Initial Load
[Figure: tree of the sample document, with each node labeled by its ORDPATH value from 1 to 1.5.7]

ORDPATH   TAG           NODE_TYPE       VALUE
1         1 (BOOK)      1 (Element)     Null
1.1       2 (ISBN)      2 (Attribute)   '1-55860-438-3'
1.3       3 (SECTION)   1 (Element)     Null
1.3.1     4 (TITLE)     1 (Element)     'Bad Bugs'
1.3.3     -             4 (Value)       'Nobody loves bad bugs'
1.3.5     5 (FIGURE)    1 (Element)     Null
1.3.5.1   6 (CAPTION)   2 (Attribute)   'Sample bug'
1.5       3 (SECTION)   1 (Element)     Null
1.5.1     4 (TITLE)     1 (Element)     'Tree frogs'
1.5.3     -             4 (Value)       'All right-thinking people'
1.5.5     7 (BOLD)      1 (Element)     'love'
1.5.7     -             4 (Value)       'tree frogs'

The ORDPATH column captures both:
► Hierarchy
► Document Order: 1 < 1.1 < 1.3 < 1.3.1 < … < 1.5.7
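The initial load can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the SQL Server implementation: the function name `label_tree` and the row layout are hypothetical, attributes are labeled before other children, and text inside a leaf element is kept as that element's value (as in the table on the slide). Only odd ordinals 1, 3, 5, … are handed out.

```python
import xml.etree.ElementTree as ET

# The sample document from the slides.
SAMPLE = """<BOOK ISBN="1-55860-438-3">
<SECTION>
<TITLE> Bad Bugs</TITLE>
Nobody loves bad bugs.
<FIGURE CAPTION="Sample bug"/>
</SECTION>
<SECTION>
<TITLE> Tree Frogs </TITLE>
All right-thinking people
<BOLD> love </BOLD> tree frogs.
</SECTION>
</BOOK>"""


def label_tree(elem, label, rows):
    """Append (ordpath, tag, node_type, value) rows in document order.

    Hypothetical helper: children of a node get the odd ordinals
    1, 3, 5, ... appended to the parent's label.
    """
    children = list(elem)
    text = (elem.text or "").strip()
    # Leaf element: store its text as the element's own value.
    inline = text if not children else ""
    rows.append((label, elem.tag, "Element", inline or None))

    ordinal = 1

    def next_label():
        nonlocal ordinal
        child = f"{label}.{ordinal}"
        ordinal += 2  # skip even numbers during the initial load
        return child

    for name, value in elem.attrib.items():
        rows.append((next_label(), name, "Attribute", value))
    if children and text:
        rows.append((next_label(), None, "Value", text))
    for child in children:
        label_tree(child, next_label(), rows)
        tail = (child.tail or "").strip()
        if tail:  # text that follows a child element (mixed content)
            rows.append((next_label(), None, "Value", tail))
    return rows


rows = label_tree(ET.fromstring(SAMPLE), "1", [])
for row in rows:
    print(row)
```

Leaving the even ordinals unused at load time is what later allows insertions between two siblings without relabeling.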
ORDPATH Example Value:
1.5.3.-9.11
Li/Oi Pair Design:
L0 O0 L1 O1 … LK OK
ORDPATH bit pattern:
0100101101010110001111111000011
We need a prefix-free Li encoding…
Li/Oi Pair Design
ORDPATH value: 1.5.3.-9.11
Using Li values from Figure 3.2a:

i   Oi    Li   Li bits   Oi bits
0   1     3    01        001
1   5     3    01        101
2   3     3    01        011
3   -9    4    00011     1111
4   11    4    100       0011

ORDPATH bit pattern:
0100101101010110001111111000011
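The Li/Oi encoding of the example value can be reproduced with a small Python sketch. The prefix table below contains only four rows and is an assumption: the prefixes, lengths and range starts were chosen so that they agree with the bit pattern on the slide, and the full table of Figure 3.2a is not reproduced here. The names `PREFIXES`, `encode` and `decode` are hypothetical.

```python
# (Li prefix bits, number of Oi bits, smallest ordinal in the range)
# Assumed subset of the prefix table, consistent with the slide example.
PREFIXES = [
    ("00011", 4, -24),  # ordinals -24 .. -9
    ("001", 3, -8),     # ordinals  -8 .. -1
    ("01", 3, 0),       # ordinals   0 ..  7
    ("100", 4, 8),      # ordinals   8 .. 23
]


def encode_component(ordinal):
    """Return the Li prefix followed by the Oi bits for one component."""
    for prefix, length, low in PREFIXES:
        if low <= ordinal < low + (1 << length):
            return prefix + format(ordinal - low, f"0{length}b")
    raise ValueError(f"ordinal {ordinal} is outside this reduced table")


def encode(ordpath):
    """Concatenate the Li/Oi pairs of all components."""
    return "".join(encode_component(int(c)) for c in ordpath.split("."))


def decode(bits):
    """Because no prefix is a prefix of another, parsing is unambiguous."""
    parts, pos = [], 0
    while pos < len(bits):
        for prefix, length, low in PREFIXES:
            if bits.startswith(prefix, pos):
                start = pos + len(prefix)
                parts.append(low + int(bits[start:start + length], 2))
                pos = start + length
                break
        else:
            raise ValueError("unknown prefix at bit position %d" % pos)
    return ".".join(str(p) for p in parts)


print(encode("1.5.3.-9.11"))
print(decode(encode("1.5.3.-9.11")))
```

With this table, comparing the encoded strings character by character also orders the labels in document order, for example `encode("1.3") < encode("1.3.1") < encode("1.5")`.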
Advantages of comparing ORDPATH Values:
► Determination of ancestor-descendant relationships for any two ORDPATHs is very easy.
► Easy determination of the distance between two ORDPATHs.
► Simple bitstring (or byte-by-byte) comparison yields document order.
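The comparisons listed above can be illustrated with a short Python sketch. It works on the decoded integer components rather than on the compressed bit strings, which is a simplification; the function names are hypothetical. It assumes that every node label ends in an odd component and that even components are carets that do not count as a level.

```python
def parse(label):
    """Turn '1.3.5.1' into the tuple (1, 3, 5, 1)."""
    return tuple(int(c) for c in label.split("."))


def is_ancestor(ancestor, descendant):
    """A node is an ancestor if its label is a proper prefix of the other."""
    a, d = parse(ancestor), parse(descendant)
    return len(a) < len(d) and d[:len(a)] == a


def level(label):
    """Depth in the tree: only odd components count, carets are skipped."""
    return sum(1 for c in parse(label) if c % 2 != 0)


def precedes(first, second):
    """Document order is the lexicographic order of the components."""
    return parse(first) < parse(second)


print(is_ancestor("1.3", "1.3.5.1"))
print(level("3.5.2.2.-1"), level("3.5.1"))
print(precedes("1.3.5.1", "1.5"))
```

Tuple comparison in Python is lexicographic and treats a proper prefix as smaller, so a parent always sorts before its descendants.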
Descendants of a given Context Node
[Figure: the sample document tree labeled with ORDPATH values 1 to 1.5.7; the context node is cn = 1.3]
SQL Query:
SELECT Ordpath
FROM infoset
WHERE 1.3 < Ordpath   -- cn
  AND 1.4 > Ordpath   -- cn + 1

ORDPATH   TAG           NODE_TYPE       VALUE
1         1 (BOOK)      1 (Element)     Null
1.1       2 (ISBN)      2 (Attribute)   '1-55860-438-3'
1.3       3 (SECTION)   1 (Element)     Null
1.3.1     4 (TITLE)     1 (Element)     'Bad Bugs'
1.3.3     -             4 (Value)       'Nobody loves bad bugs'
1.3.5     5 (FIGURE)    1 (Element)     Null
1.3.5.1   6 (CAPTION)   2 (Attribute)   'Sample bug'
1.5       3 (SECTION)   1 (Element)     Null
1.5.1     4 (TITLE)     1 (Element)     'Tree frogs'
1.5.3     -             4 (Value)       'All right-thinking people'
1.5.5     7 (BOLD)      1 (Element)     'love'
1.5.7     -             4 (Value)       'tree frogs'
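The range predicate can be tried out in Python on the twelve labels of the sample document. This is a minimal sketch, assuming the labels are kept in a sorted list that stands in for the primary index; `descendants` and `INDEX` are hypothetical names. The upper bound is obtained by incrementing the last component of the context node (1.3 becomes 1.4).

```python
from bisect import bisect_left, bisect_right


def parse(label):
    return tuple(int(c) for c in label.split("."))


def fmt(components):
    return ".".join(str(c) for c in components)


# Sorted labels of the sample document, standing in for an index on ORDPATH.
INDEX = sorted(parse(s) for s in [
    "1", "1.1", "1.3", "1.3.1", "1.3.3", "1.3.5", "1.3.5.1",
    "1.5", "1.5.1", "1.5.3", "1.5.5", "1.5.7",
])


def descendants(index, context_node):
    """Return all labels with cn < label < cn+1 as one contiguous slice."""
    cn = parse(context_node)
    upper = cn[:-1] + (cn[-1] + 1,)  # e.g. (1, 3) -> (1, 4)
    lo = bisect_right(index, cn)     # first entry strictly after cn
    hi = bisect_left(index, upper)   # first entry at or after the bound
    return [fmt(label) for label in index[lo:hi]]


print(descendants(INDEX, "1.3"))
```

Because all descendants fall into a single contiguous range, the lookup needs two binary searches and one sequential scan, which is what makes subtree access cheap on an ordered index.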
Arbitrary Insertions
Rightmost / Leftmost Insertion:
[Figure: Parent 3.5 with Child1 3.5.1 and Child2 3.5.3; Child3 is added as 3.5.5 and Child4 as 3.5.-1]
Careting in nodes between two existing nodes…
[Figure: under 3.5, between 3.5.1 and 3.5.3, the caret 3.5.2 holds 3.5.2.1 and 3.5.2.3; between those two, the caret 3.5.2.2 holds 3.5.2.2.-1 and 3.5.2.2.1]

Parent   3.5
Child1   3.5.1
Child2   3.5.3
Child3   3.5.2.1
Child4   3.5.2.3
Child5   3.5.2.2.1
Child6   3.5.2.2.-1
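One possible way to compute such labels is sketched below in Python. It is an illustration under assumptions, not the algorithm used by SQL Server: all function names are hypothetical, the inputs must be valid sibling labels (ending in an odd component), and the sketch always picks the nearest free value instead of trying to balance the label space. Even components act as carets and do not add a level.

```python
def parse(label):
    return tuple(int(c) for c in label.split("."))


def fmt(components):
    return ".".join(str(c) for c in components)


def is_odd(n):
    return n % 2 != 0


def append_rightmost(last_child):
    """New sibling after the current last child: last ordinal + 2."""
    c = parse(last_child)
    return fmt(c[:-1] + (c[-1] + 2,))


def append_leftmost(first_child):
    """New sibling before the current first child: first ordinal - 2."""
    c = parse(first_child)
    return fmt(c[:-1] + (c[-1] - 2,))


def insert_between(left, right):
    """Label for a new sibling strictly between two adjacent siblings."""
    l, r = parse(left), parse(right)
    if not l < r:
        raise ValueError("left must precede right")
    i = 0
    while i < min(len(l), len(r)) and l[i] == r[i]:
        i += 1
    if i == len(l) or i == len(r):
        raise ValueError("labels are not siblings")
    prefix, a, b = l[:i], l[i], r[i]
    odd = a + 2 if is_odd(a) else a + 1   # smallest odd value above a
    if odd < b:                           # a free odd ordinal exists
        return fmt(prefix + (odd,))
    if b - a == 2:                        # consecutive odds: open a caret
        return fmt(prefix + (a + 1, 1))
    if not is_odd(a):                     # left already sits under caret a
        nxt = l[i + 1]
        nxt = nxt + 2 if is_odd(nxt) else nxt + 1
        return fmt(prefix + (a, nxt))
    prv = r[i + 1]                        # right already sits under caret b
    prv = prv - 2 if is_odd(prv) else prv - 1
    return fmt(prefix + (b, prv))


print(append_rightmost("3.5.3"), append_leftmost("3.5.1"))
print(insert_between("3.5.1", "3.5.3"))
print(insert_between("3.5.2.1", "3.5.2.2.1"))
```

Applied in the order of the slides, the sketch yields the same labels as shown for Child3 to Child6, and no existing label is changed by any of the insertions.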
Note:
► Multiple levels of carets are extremely rare in practice.
Advantage:
► Insertions require no relabelings of old nodes…
We avoid updates to primary key values, which would involve the primary index and all secondary indexes.
ORDPATH …
► … is a hierarchical prefix-based labeling scheme.
► … provides efficient access to subtrees.
► … supports all kinds of modifications.
► …