• No results found

Semi-structured Data. 1 - Introduction

N/A
N/A
Protected

Academic year: 2021

Share "Semi-structured Data. 1 - Introduction"

Copied!
20
0
0

Loading.... (view fulltext now)

Full text

(1)

Semi-structured Data

1 - Introduction

(2)

Outline

• Structured Data

• Semi-structured Data

• Why Semi-structured Data?

• The Data Model

(3)

• Data is structured in semantic chunks - entities

VIE, Vienna International, Vienna LHR, London Heathrow, London VIE, LHR, BA

VIE, LHR, OS

BA, British Airways OS, Austrian Airlines

• Similar entities are grouped together - classes

VIE, Vienna International, Vienna LHR, London Heathrow, London

VIE, LHR, BA VIE, LHR, OS

BA, British Airways OS, Austrian Airlines

Structured Data

Airports

Flights

(4)

Structured Data

• Entities in the same class have the same descriptions - attributes

Airports

(VIE, Vienna International, Vienna) (LHR, London Heathrow, London)

(Airport_Code, Name, City)

Airlines

(BA, British Airways) (OS, Austrian Airlines) (Airline_Code, Name)

Flights

(VIE, LHR, BA) (VIE, LHR, OS)

(5)

Structured Data

• Entities in the same class have the same descriptions - attributes

• Attributes in similar entities

Airports

(VIE, Vienna International, Vienna) (LHR, London Heathrow, London)

(Airport_Code, Name, City)

Airlines

(BA, British Airways) (OS, Austrian Airlines)

(Airline_Code, Name) Flights

(VIE, LHR, BA) (VIE, LHR, OS)

(Origin, Destination, Airline)

same format (string, integer, date, etc.) predefined length

all present same order

(6)

Structured Data - Relational Model

• Database model for structured data:

• Records grouped in tables

entities → records (or tuples) classes → tables (or relations)

table attribute

(7)

Structured Data: “On the Fly” Example

Flights Origin Destination Airline VIE LHR British Airways VIE LHR Austrian Airlines

LHR EDI British Airways

LGW GLA EasyJet

Airports Code Name City

VIE Vienna International Vienna

LHR London Heathrow London

LGW London Gatwick London

LCA Larnaca International Larnaca

GLA Glasgow Glasgow

EDI Edinburgh Edinburgh

Airlines Code Name BA British Airways OS Austrian Airlines

(8)

“Persons” Example

Gerti Kappel, 18870, 18896, [email protected] Andreas, Pieris, [email protected], 740072, 18493

Wolfgang Fischl, [email protected], 740050 Bill, Robert, 188316, [email protected]

(9)

Semi-structured Data (SSD)

• Data is structured in semantic entities • Similar entities are grouped in classes

• Entities in the same class may not have the same attributes

• Attributes of similar entities

may have different format may have different length not all required

may have different order

there is structure

but not too much

(10)

Semi-structured Data: “Persons” Example

Gerti Kappel, 18870, 18896, [email protected] Andreas, Pieris, [email protected], 740072, 18493

Wolfgang Fischl, [email protected], 740050 Martin, Fleck, 58801, [email protected]

• There is structure

o Each row is a semantic entity - person

o All entities are grouped in a class - persons

• But not too much structure

o Entities have no regular structure

(11)

Why Semi-structured Data?

• There are data sources that we would like to treat as databases, but which

cannot be constraint by a schema

• Flexible format for exchanging data between different places

… the WEB

(12)

Data Model

• We need an effective way to represent semi-structured data

• Like the relational model for structured data

(13)

Trees as Data Model

Gerti Kappel 18870 18896 [email protected]

name

tel fax email

Gerti Kappel, 18870, 18896, [email protected] Andreas, Pieris, [email protected], 740072, 18493

[email protected] 740072 18493

name

email tel fax

Andreas Pieris

first last person

person persons

(14)

Trees as Data Model

• SSD can be represented as a (labelled) tree:

o leaf nodes standing for single data items

o inner nodes have no label

o edges labelled with elements

• Such a model is called self-describing - information that is usually associated with a schema is contained within the data

• Data carries its own description

Gerti Kappel 18870 18896 [email protected]

name

tel fax email person

person

(15)

SSD: Representing Relational Data

R A B C

a1 b1 c1 a2 b2 c2

Structured data is a special case of semi-structured data +

relational data can be represented as a tree (with an overhead)

a1 R-record A C R-record B b1 c1 a2 A B C b2 c2 R-table

(16)

Store Semi-structured Data

• There are various formalisms to store semi-structured data

o Object Exchange Model (OEM)

o JavaScript Object Notation (JSON)

(17)

Store Semi-structured Data

{persons:

{person:

{name: “Gerti Kappel”

tel: 18870 fax: 18896 email: “[email protected]”} } {person: {name: {first: “Andreas”, last: “Pieris”} email: “[email protected]” tel: 740072 fax: 18493} } }

Gerti Kappel 18870 18896 [email protected]

name

tel fax email

person person

persons

(18)

Store Semi-structured Data

<persons> <person>

<name> Gerti Kappel</name> <tel> 18870</tel>

<fax> 18896</fax>

<email> [email protected] </email> </person>

<person> <name>

<first> Andreas</first> <last> Pieris</last> </name>

<email> [email protected]</email> <tel> 740072</tel>

<fax> 18493</fax> </person>

</persons>

XML Representation

Gerti Kappel 18870 18896 [email protected]

name

tel fax email

person person

(19)

Store Semi-structured Data

• There are various formalisms to store semi-structured data

o Object Exchange Model (OEM)

o JavaScript Object Notation (JSON)

o eXtensible Markup Language (XML)

• Different syntax

• Different mechanisms for self-describing • Different description mechanisms

o Which attributes are allowed/required

o Which values are allowed/required

• Different query languages and manipulation mechanisms

but the goal is the same:

(20)

Sum Up

• Structured Data

o Similar entities grouped in classes

o Similar entities have a regular structure

o Relational Model

• Semi-structured Data

o Similar entities grouped in classes

o Similar entities have irregular structure

o Trees as a Model

• Store Semi-structured Data

o Various formalisms

References

Related documents

90.9per cent respondents residing in the slum areas of Karachi have ranked Police Department number 1 in corruption.. 317 respondents out of the total four

Medical Attention or Special Treatment Needed Specific treatment (Wash areas of contact with water

Children……….45 Figure 3-Drink/ Beverages Provided to Child/ Children………45 Figure 9- Snack Foods Provided to Child/ Children……….46 Parent Reported Self Knowledge Level

As with Knotice’s previous Mobile Email Opens Reports, the mobile email open rate is a measurement of email opens occurring on mobile devices (including phones and tablets) compared

Gender matching of students and teachers in middle and high school math and science courses increases the likelihood of women taking at least one STEM course as a college freshman,

Flexible with toothpaste, new direct york flights from to cmh was a refund for cheap flights are correct them and group; which airlines british airways and book a way to.. Cheap

Terminal 1 is the sea terminal connect the majority of airlines such an Air France British Airways Flybe and Ryanair operating from here Flights operating from Terminal 2 at

The bivariate Hu¨sler-Reiss distribution function H introduced in Hu¨sler and Reiss 1989, is max-stable with unit Gumbel marginal distributions.. The random vector ðS1 ; S2 Þ