• No results found

High Performance XML Data Retrieval

N/A
N/A
Protected

Academic year: 2021

Share "High Performance XML Data Retrieval"

Copied!
26
0
0

Loading.... (view fulltext now)

Full text

(1)

High Performance XML Data Retrieval

Mark V. Scardina

Group Product Manager & XML Evangelist Oracle Corporation

Jinyu Wang

Senior Product Manager Oracle Corporation

(2)

Agenda

y Why XPath for Data Retrieval?

y Current XML Data Retrieval Strategies and Issues

y High Performance XPath Requirements y Design of Extractor for XPath

y Extractor Use Cases

(3)

Why XPath for Data Retrieval?

y W3C Standard for XML Document Navigation since 2001

y Support for XML Schema Data Types in 2.0 y Support for Functions and Operators in 2.0 y Underlies XSLT, XQuery, DOM, XForms,

XPointer

(4)

Current Standards-based Data Retrieval Strategies

y Document Object Model (DOM) Parsing y Simple API for XML Parsing (SAX)

y Java API for XML Parsing (JAXP)

y Streaming API for XML Parsing (StAX)

(5)

Data Retrieval Using DOM Parsing

y Advantages

Dynamic random access to entire document

Supports XPath 1.0

y Disadvantages

DOM In-memory footprint up to 10x doc size

No planned support for XPath 2.0

Redundant node traversals for multiple XPaths

(6)

DOM-based XPath Data Retrieval

A

B

C

D

C F

F

1

2 1

2

2

E B

1 2 1

2 /A/B/C/D /A/B/C

(7)

Data Retrieval using SAX/StAX Parsing

y Advantages

Stream-based processing for managed memory

Broadcast events for multicasting (SAX)

Pull parsing model for ease of programming and control (StAX)

y Disadvantages

No maintenance of hierarchical structure

No XPath Support either 1.0 or 2.0

(8)

High Performance Requirements

y Retrieve XML data with managed memory resources

y Support for documents of all sizes

y Handle multiple XPaths with minimum node traversals

y Support DTD and Schema-based XML documents

(9)

Extractor for XPath

y Stream-based processing utilizing SAX y Support for DTDs and XML Schemas

y Implements Publish/Subscribe model for scalability

y Handles multiple XPaths simultaneously y Supports XPath 1.0; extended to 2.0

(10)

Extractor’s Publish/Subscribe Processing Model

SAX

XPath Expressions

&Content Handlers

Outputs Extractor

(11)

Extractor’s Function Blocks

y Initialization: registration of XPaths/Handlers y XPath Compilation: compiles and builds

index graphs

y XPath Tracking: maintains XPath state and matches doc XPaths with the indexed XPaths y Output: sends matching XPath start/stop

events along with the XML data

(12)

Extractor’s Function Blocks

SAX Track XPath and perform XPath matching when

receiving SAX events XPath

Expressions &

Content Handlers Register XPath expressions and the instances of content

handlers

Registered Handler A...

Registered Handler Z Extractor

XML

Compile XPath expressions and build

Index graph

...

XMLSequenceBuilder

XMLSAXSerializer SAX

XML Sequences

Files

SAX

Runtime

Initialization Output

DTD/XSD

(13)

Initialization

y Registration of absolute XPaths y Support for XML Namespaces for

differentiation

y Registration of execution handlers using XContentHandler()

y Built-in Handlers for ease of use

XMLSequenceBuilder()

XMLSAXSerializer()

(14)

XPath Compilation

y XPath streamability evaluation

y Streamable isAll=true/false option

True: Process only streamable XPaths

False: Buffer data as needed

y Build XPath Predicate Table y Build Index Tree

XPath Dependency Tree (w/o DTD/XSD)

Data Model Tree (w/ DTD/XSD) using existing Validation engine

(15)

Compilation of Each XPath Predicate

Location XPath -1

Predicate XPath -1 Value

Predicate XPath -2 Predicate XPath-3

Predicate XPath-4 Value

Related to

... ... ... ...

TRUE FALSE Pending Predicate XPath Evaluation Attributes Status

Location XPath -n

Predicate XPath -1 Value

Predicate XPath -2 Predicate XPath-3

Predicate XPath-4 Value

Related to TRUE

FALSE Pending Predicate XPath Evaluation Attributes Status

Predicate Table (not-matched, matched, yet-to-be-matched) Predicate Table (not-matched, matched, yet-to-be-matched)

•Also Used for isAll = True condition

(16)

Runtime XPath Matching

y State Machine tokenizes and tracks

In-scope Namespaces

Current Element Name

Current Element Attributes

Node Position Relative to Siblings

Number of Child Elements

y Implemented as a Stack

(17)

XPath Index Tree (w/o DTD/XSD)

3

A

B

C

D

X

P

1 2

XPath Expressions XPath Dependency Tree /A/B/C/D

/A/B//P /A/B/C

//P

3 4

5 //P/*/Q

/

2 1

X

P 4

X

5

Q X Fake Node for /*

X Fake Node for //

(18)

Dependency Tree Traversal

3

A

B

C

D

X'

P

Matched Node

XPath Dependency Tree /

2 1

X''

P 4

X'''

5

Q X Fake Node for /*

X Fake Node for //

XPath Stack A

B C D

A P

C

X'' X'' B

X'' X'

1

2 D X'' X' P

1

2 /A/B/C/D /A/B//P

/A/B/C

//P

3 4

5 //P/*/Q

4

X'' X'

3

(19)

Synchronous Data Model Traversal

A

B

C

D

C F

F

1

2

3

E B

A

B

C

D

C F

F

1

2

E B

XML Document Tree XML Data Model

3

/A/B/C/D /A/B/C

(20)

Extractor Output

y XContentHandler()

Execution of Registered Content Handlers

y XMLSequenceBuilder()

Built-in Handler

Presents Result Set as XMLSequence Object

Contains a Linked List of XMLItems

y XMLSAXSerializer

Built-in Handler

Serializes output to Printwriter or OutputStream

(21)

Extractor Use Cases

y Content Management y Web-Services

y XSLT/XQuery Implementation

(22)

Extractor Content Management Use Case

Pre-Process

SAX Parser

JDBC

DTD/XML Schema:

XPaths Table XPath

Expressions:

/a/b /a/b/@c ...

XML system/public ID or

XML Schema Location URL

Registered Handler A

Registered Handler Y ...

MetaData Tables

Registered Handler Z

JDBC

JDBC

Connection Pool

Doc Table Retrieve XPath Expressions

Process Data using Content Handlers Extractor

Oracle Database XML

Initialize

(23)

Extractor Web Service Use Case

Web Services

Register XPath/

Content Handlers

"invoke" WS SOAP

SAX Parser

XML

...

Client 3

Clients Subscribing and Receiving the Data

Extractor SOAP

Initialize Web Service

Client Web Services

WS response

SOAP

Register Data Subscription using

XPath

Client 1

Client 2 Service Broker

(24)

Extractor XSLT/XQuery Use Case

XSLT Resolving

XPath

SAX Parser

XML Extractor

XML

Register XPath

XQuery XMLSequnceObject

XMLSequenceBuilder XPath Bridge

(25)

Oracle XML Resources

•Oracle Technology Network

http://otn.oracle.com

Downloads, Demos, Samples, Papers

XML Support Forum

•Oracle Database 10g XML & SQL

Design, Build, & Manage XML Applications in Java, C, C++, & PL/SQL

Covers all of Oracle XML technology

BetaBook Forum on OTN

Available in May from Bookstores

(26)

References

Related documents

Similar to kenaf fiber study, noted by Li et al., (2004) that an optimum addition of natural hemp fiber increases the Compressive and flexural properties of natural hemp

adapun tujuan dalam penelitian ini adalah untuk memberikan wawasan kepada kita tentang bahwa dalam Islam persoalan hak-hak perempuan telah secara dibahas secara luas para yuris

Title II disability recipients (SSDI and Adult Disabled Child benefits) who appeal Social Security's determinations that they are no longer disabled also have 10 days in which to

Manninen, Terhikki, Lauri Korhonen, Pekka Voipio, Panu Lahtinen and Pauline Stenberg, 2011, “Airborne estimation of boreal forest LAI in winter conditions: A test using summer

To investigate whether subretinal fluid induces procoagulant activity, we added samples from 10 patients who underwent successful scleral buckling surgery for primary RRD to normal

As shown in Algorithm 1, this process comprises two different interleaved stages, an individual planning process by which agents devise refinements over a centralized base plan and

Within the labor movement, for example, Central Labor Councils have been strong in San Jose, San Francisco, the East Bay, and to a certain extent Sacramento as well, but even

Robotics usually responds with something about 8216 cannot moderate to host, on increasing 15121 8217 Easter for Gnome Phone is a problematic game that was looking to windows your