
Database Technologies


Academic year: 2021


(1)

XML Databases

Database & Information Systems Group

Christian Grün

Bachelor and Master Projects

(2)

Introduction

XML…

• just small files – why databases?

– library of U Konstanz (800 MB)

– genetic data (Swissprot, 3 GB)

– Wikipedia (8 GB)

– Medline (38 GB)…

Challenges

• support new standards

• find relevant query optimizations

• visualizing results

– tree-structured data structure

<XML/>

vs

<root><Entry id="100K_RAT" class="STANDARD" mtype="PRT" seqlen="889"> <AC>Q62671</AC> <Mod date="01-NOV-1997" Rel="35" type="Created"></Mod> <Mod date="01-NOV-1997" Rel="35" type="Last sequence update"></Mod> <Mod date="15-JUL-1999" Rel="38" type="Last annotation update"></Mod> <Descr>100 KDA PROTEIN (EC 6.3.2.-)</Descr> <Species>Rattus norvegicus (Rat)</Species> <Org>Eukaryota</Org> <Org>Metazoa</Org> <Org>Chordata</Org> <Org> Craniata</Org> <Org>Vertebrata</Org> <Org>Euteleostomi</Org> <Org>Mammalia</Org> <Org> Eutheria</Org> <Org>Rodentia</Org> <Org>Sciurognathi</Org> <Org>Muridae</Org> <Org> Murinae</Org> <Org>Rattus</Org> <Ref num="1" pos="SEQUENCE FROM N.A"> <Comment> STRAIN=WISTAR</Comment> <Comment>TISSUE=TESTIS</Comment> <DB>MEDLINE</DB> <MedlineID> 92253337</MedlineID> <Author>Mueller D</Author> <Author>Rehbein M</Author> <Author> Baumeister H</Author> <Author>Richter D</Author> <Cite>Nucleic Acids Res. 20:1471-1475(1992)</Cite> </Ref> <Ref num="2" pos="ERRATUM"> <Author>Mueller D</Author> <Author>Rehbein M</Author> <Author>Baumeister H</Author> <Author>Richter D</Author> <Cite>Nucleic Acids Res. 
20:2624-2624(1992)</Cite> </Ref> <EMBL prim_id="X64411" sec_id= "CAA45756"></EMBL> <INTERPRO prim_id="IPR000569" sec_id="-"></INTERPRO> <INTERPRO prim_id="IPR002004" sec_id="-"></INTERPRO> <PFAM prim_id="PF00632" sec_id="HECT" status= "1"></PFAM> <PFAM prim_id="PF00658" sec_id="PABP" status="1"></PFAM> <Keyword>Ubiquitin conjugation</Keyword> <Keyword>Ligase</Keyword> <Features> <DOMAIN from="77" to="88"> <Descr>ASP/GLU-RICH (ACIDIC)</Descr> </DOMAIN> <DOMAIN from="127" to="150"> <Descr>PRO-RICH</Descr> </DOMAIN> <DOMAIN from="420" to="439"> <Descr>ARG/GLU-RICH (MIXED CHARGE)</Descr> </DOMAIN> <DOMAIN from="448" to="457"> <Descr>ARG/ASP-RICH (MIXED CHARGE)</Descr> </DOMAIN> <DOMAIN from="485" to="514"> <Descr>PABP-LIKE</Descr> </DOMAIN> <DOMAIN from="579" to="590"> <Descr>ASP/GLU-RICH (ACIDIC) </Descr> </DOMAIN> <DOMAIN from="786" to="889"> <Descr>HECT DOMAIN</Descr> </DOMAIN> <DOMAIN from="827" to="847"> <Descr>PRO-RICH</Descr> </DOMAIN> <BINDING from="858" to="858"> <Descr>UBIQUITIN (BY SIMILARITY)</Descr> </BINDING> </Features></Entry> <Entry id="104K_THEPA" class="STANDARD" mtype="PRT" seqlen="924"> <AC>P15711</AC> <Mod date="01-APR-1990" Rel="14" type="Created"></Mod> <Mod date="01-APR-1990" Rel="14" type="Last sequence update"></Mod> <Mod date="01-AUG-1992" Rel="23" type="Last annotation update"></Mod> <Descr>104 KDA MICRONEME-RHOPTRY ANTIGEN</Descr> <Species>Theileria parva </Species> <Org>Eukaryota</Org> <Org>Alveolata</Org> <Org>Apicomplexa</Org> <Org> Piroplasmida</Org> <Org>Theileriidae</Org> <Org>Theileria</Org> <Ref num="1" pos= "SEQUENCE FROM N.A"> <Comment> STRAIN=MUGUGA</Comment> <DB>MEDLINE</DB> <MedlineID> 90158697</MedlineID> <Author>Iams K.P</Author> <Author>Young J.R</Author> <Author>Nene V</Author> <Author>Desai J</Author> <Author>Webster P</Author> <Author>Ole-Moiyoi O.K</Author> <Author>Musoke A.J</Author> <Cite>Mol. Biochem. Parasitol. 
39:47-60(1990)</Cite> </Ref> <EMBL prim_id="M29954" sec_id="AAA18217"></EMBL> <PIR prim_id= "A44945" sec_id="A44945"></PIR> <Keyword>Antigen</Keyword> <Keyword>Sporozoite</Keyword>

(3)

BaseX

• XML database, developed in DBIS workgroup

• open source: www.basex.org

• query languages:

– W3C standards XPath & XQuery

• extensions:

– XQuery Update, Full-Text

• indexes:

– attributes, texts

– full-text

• special focus:

– tight coupling between frontend and backend

(4)

Topics – Backend

Namespace Support

• what are namespaces?

Todos

• design of an elegant solution for namespace access

• extension of the internal BaseX storage

• understanding of the specification

<Address xmlns:name="names">
  <name:First>John</name:First>
  <name:Family>McHilton</name:Family>
  <Street>12 Donovan Road</Street>
  <Town>Chicago, 31072</Town>
</Address>

<Address>
  <FirstName>John</FirstName>
  <FamilyName>McHilton</FamilyName>
  <Street>12 Donovan Road</Street>
  <Town>Chicago, 31072</Town>
</Address>

XPath: //name:*
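The XPath expression `//name:*` selects every element bound to the `name` prefix. The following is a minimal sketch of the same selection using Python's standard library. It is only an illustration of namespace-aware access, not a description of how BaseX stores or resolves namespaces; the variable names are assumptions.

```python
import xml.etree.ElementTree as ET

ADDRESS = """\
<Address xmlns:name="names">
  <name:First>John</name:First>
  <name:Family>McHilton</name:Family>
  <Street>12 Donovan Road</Street>
  <Town>Chicago, 31072</Town>
</Address>"""

NAMES_URI = "names"  # namespace URI bound to the "name" prefix above

root = ET.fromstring(ADDRESS)

# ElementTree expands prefixed names to "{uri}local", so selecting
# "//name:*" amounts to filtering on the "{names}" part of each tag.
in_namespace = [
    el for el in root.iter() if el.tag.startswith("{" + NAMES_URI + "}")
]

for el in in_namespace:
    local_name = el.tag.split("}", 1)[1]
    print(local_name, "=", el.text)
```

Note that the prefix itself is irrelevant to the match: only the namespace URI counts, which is one reason a namespace-aware storage needs to keep URIs and not just prefixed tag names.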

(5)

Topics – Backend

DTD Parsing

• what is a DTD?

– defines the document structure and entities

– allows document validation

Todos

• extension of the XML parser

• integration of validate commands

• understanding of the specification…

<!ELEMENT mondial (country*) >
<!ELEMENT country (name, city*) >
<!ELEMENT name (#PCDATA) >
<!ELEMENT city (#PCDATA) >
<!ATTLIST country id ID #REQUIRED >
<!ENTITY uuml "ü" >

<mondial>
  <country id="f0_136">
    <name>Germany</name>
    <city>M&uuml;nchen</city>
  </country>
</mondial>
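To make the role of the entity declaration concrete, here is a small sketch in Python. It embeds the DTD as an internal subset and lets the standard-library parser expand `&uuml;`. The standard-library parser is non-validating, so the `id` check at the end is a hand-written stand-in for a single DTD rule, not real validation. The entity value is written as the character reference `&#252;` only to keep the source ASCII-safe.

```python
import xml.etree.ElementTree as ET

DOCUMENT = """\
<!DOCTYPE mondial [
  <!ELEMENT mondial (country*)>
  <!ELEMENT country (name, city*)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT city (#PCDATA)>
  <!ATTLIST country id ID #REQUIRED>
  <!ENTITY uuml "&#252;">
]>
<mondial>
  <country id="f0_136">
    <name>Germany</name>
    <city>M&uuml;nchen</city>
  </country>
</mondial>"""

root = ET.fromstring(DOCUMENT)

# The parser replaces &uuml; with the declared entity value.
city = root.find("country/city").text
print(city)

# Hand-written check for one rule of the DTD (id is #REQUIRED);
# a validating parser would derive this from the ATTLIST declaration.
missing_id = [c for c in root.iter("country") if "id" not in c.attrib]
print("countries without id:", len(missing_id))
```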

(6)

Topics – Backend

XQuery Optimizations

• sample (returns all media with the title “Casablanca”):

for $i in doc("library.xml")//Medium
where $i/Title = "Casablanca"
return $i

• possible query plans:

– parse all Medium and Title tags (sequential scan): very slow…

– access the index and check results: …much faster!

Todos

• implementation of existing XPath optimizations for XQuery

• learning much about XQuery and tree-structured optimizations!

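The difference between the two query plans can be sketched in a few lines of Python. This is an illustration only: the sample document, the function names and the dictionary-based index are assumptions and say nothing about the actual BaseX index structures.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical stand-in for library.xml
LIBRARY = """\
<Library>
  <Medium><Title>Casablanca</Title><Type>DVD</Type></Medium>
  <Medium><Title>Matrix</Title><Type>DVD</Type></Medium>
  <Medium><Title>Matrix Reloaded</Title><Type>DVD</Type></Medium>
</Library>"""

root = ET.fromstring(LIBRARY)


def sequential_scan(root, title):
    """Plan 1: visit every Medium and compare all of its Title children."""
    return [
        m for m in root.iter("Medium")
        if any(t.text == title for t in m.findall("Title"))
    ]


def build_title_index(root):
    """One pass over the document: map each Title text to its Medium elements."""
    index = defaultdict(list)
    for medium in root.iter("Medium"):
        for text in {t.text for t in medium.findall("Title")}:
            index[text].append(medium)
    return index


def index_lookup(index, title):
    """Plan 2: a single dictionary access replaces the full scan."""
    return index.get(title, [])


title_index = build_title_index(root)
scan_hits = sequential_scan(root, "Casablanca")
index_hits = index_lookup(title_index, "Casablanca")
print(len(scan_hits), len(index_hits))
```

Both plans return the same elements. The scan touches every `Medium` for each query, while the index is built once and then answers each lookup directly.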
(7)

Topics – Backend

Index Management

• current state: one index for all texts & attribute values

• desirable: special-purpose indexes:

– indexes for single tags/attributes

– indexes on numeric values (range queries)

– index for approximate text search

Todos

• extension of the existing indexes

• adaptation of the query optimizations

• thoughts on new index structures

<Medium>
  <Title>Matrix</Title>
  <Year>1999</Year>
  <Type>DVD</Type>
</Medium>
<Medium>
  <Title>Matrix Reloaded</Title>
  <Year>2003</Year>
  <Type>DVD</Type>
</Medium>
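A numeric index that supports range queries can be sketched as a sorted list searched with binary search. The snippet below uses the two `Medium` entries from this slide. The wrapping `Library` element and the function name `titles_between` are assumptions made for the example, and a real index would live on disk and not in a Python list.

```python
import bisect
import xml.etree.ElementTree as ET

MEDIA = """\
<Library>
  <Medium><Title>Matrix</Title><Year>1999</Year><Type>DVD</Type></Medium>
  <Medium><Title>Matrix Reloaded</Title><Year>2003</Year><Type>DVD</Type></Medium>
</Library>"""

root = ET.fromstring(MEDIA)

# Numeric index on Year: (value, title) pairs kept sorted by value.
year_index = sorted(
    (int(m.findtext("Year")), m.findtext("Title")) for m in root.iter("Medium")
)
years = [year for year, _ in year_index]


def titles_between(low, high):
    """Return titles whose Year lies in [low, high], using binary search."""
    start = bisect.bisect_left(years, low)
    end = bisect.bisect_right(years, high)
    return [title for _, title in year_index[start:end]]


print(titles_between(2000, 2010))
```

Because the values are stored as integers, the comparison is numeric. A plain text index would compare the strings and, for example, sort "999" after "1999".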

(8)

Topics – Frontend

View Schemas

• XML structure and contents can be very diverse:

– attribute-based storage

<item id="0" firstName="Hans" lastName="Gruber" title="B.Sc." />
<item id="1" firstName="Thomas" lastName="Schmid" title="Prof." />

– text-based storage

<item><id>0</id><first>Hans</first><last>Gruber</last><title>...

– flat vs. hierarchic data

• desirable: view definitions to optimize visualization output

Todos

• analysis of existing XML documents

• design of a view schema
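One possible shape for such a view definition is a mapping from display fields to the attribute or element names that may hold the value. The sketch below is purely hypothetical: the `VIEW` mapping, the field labels and `apply_view` are invented for illustration, and the text-based sample leaves out the `title` element because the slide truncates it.

```python
import xml.etree.ElementTree as ET

ATTRIBUTE_BASED = '<item id="0" firstName="Hans" lastName="Gruber"/>'
TEXT_BASED = "<item><id>0</id><first>Hans</first><last>Gruber</last></item>"

# Hypothetical view definition: display field -> candidate source names
VIEW = {
    "id": ["id"],
    "first name": ["firstName", "first"],
    "last name": ["lastName", "last"],
}


def apply_view(item, view):
    """Build one display row, looking in attributes first, then child elements."""
    row = {}
    for label, sources in view.items():
        for source in sources:
            value = item.get(source)           # attribute-based storage
            if value is None:
                value = item.findtext(source)  # text-based storage
            if value is not None:
                row[label] = value
                break
    return row


row_from_attributes = apply_view(ET.fromstring(ATTRIBUTE_BASED), VIEW)
row_from_text = apply_view(ET.fromstring(TEXT_BASED), VIEW)
print(row_from_attributes)
print(row_from_text)
```

With such a mapping, both storage styles yield the same row, so a visualization only has to deal with one shape of data.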

(9)

Topics – Frontend

TreeMap

• space-filling visualization for hierarchic data

• diversity of layout algorithms available

• numerous attributes unexploited: color, intensity, …

• popular example: size-based file system visualization

Todos

• visualization of tree-structured data

• implementation of efficient Java visualizations
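As one example of a TreeMap layout algorithm, the sketch below implements the simple slice-and-dice layout: each node's rectangle is divided among its children in proportion to their sizes, and the split direction alternates per level. It is written in Python for brevity, and the `Node` class and the sample sizes are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    size: float = 0.0                      # only meaningful for leaves
    children: list = field(default_factory=list)

    def total(self):
        """Size of a leaf, or the summed size of all leaves below this node."""
        if not self.children:
            return self.size
        return sum(child.total() for child in self.children)


def slice_and_dice(node, x, y, w, h, horizontal=True, out=None):
    """Give every leaf a rectangle (x, y, w, h) with area proportional to its size."""
    if out is None:
        out = {}
    if not node.children:
        out[node.name] = (x, y, w, h)
        return out
    total = node.total()
    offset = 0.0  # fraction of the parent rectangle already used
    for child in node.children:
        share = child.total() / total if total else 0.0
        if horizontal:   # split the width, keep the full height
            slice_and_dice(child, x + offset * w, y, share * w, h, False, out)
        else:            # split the height, keep the full width
            slice_and_dice(child, x, y + offset * h, w, share * h, True, out)
        offset += share
    return out


# Hypothetical file-system-like hierarchy with sizes in arbitrary units
tree = Node("root", children=[
    Node("docs", children=[Node("a.xml", 30), Node("b.xml", 10)]),
    Node("c.xml", 60),
])

rects = slice_and_dice(tree, 0.0, 0.0, 100.0, 100.0)
for name, rect in rects.items():
    print(name, rect)
```

Slice-and-dice is easy to implement but tends to produce thin, elongated rectangles. Other layouts, such as squarified treemaps, trade simplicity for better aspect ratios.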

(10)

Topics – Frontend

Visualization

• numerous visualizations exist for tree-structured data:

– conventional tree view

– hyperbolic view

– InterRing, …

Todos

• visualization of tree-structured data

• implementation of efficient Java visualizations
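The conventional tree view is the simplest of these visualizations: each node is drawn one indentation level deeper than its parent. Below is a minimal text-only sketch in Python; the function name `tree_view` and the two-space indentation are arbitrary choices for the example.

```python
import xml.etree.ElementTree as ET


def tree_view(element, depth=0):
    """Return the lines of an indented tree view for an XML element."""
    lines = ["  " * depth + element.tag]
    text = (element.text or "").strip()
    if text:
        # Show text content as a quoted child entry of its element.
        lines.append("  " * (depth + 1) + '"' + text + '"')
    for child in element:
        lines.extend(tree_view(child, depth + 1))
    return lines


SAMPLE = "<Medium><Title>Matrix</Title><Year>1999</Year><Type>DVD</Type></Medium>"
view = tree_view(ET.fromstring(SAMPLE))
print("\n".join(view))
```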

(11)

Organization

First…

• take some time for your decision

• feel free to suggest own topics

Events

• project is accompanied by a weekly project seminar

• seminar includes regular updates between all members and one talk on your project

• Room: E217, phone: 88-4449
