
IV Distributed Databases: Motivation & Introduction


• Motivation

• Expected Benefits

• Technical issues

• Types of distributed DBS

• 12 Rules of C. Date

• Parallel vs Distributed DBS

References

M.T. Özsu, P. Valduriez: Principles of Distributed Database Systems, 2nd edition, Prentice-Hall, 1999

E. Rahm: Mehrrechner-Datenbanksysteme, Addison-Wesley, 1994

G. Weikum, G. Vossen: Transactional Information Systems: Theory, Algorithms, and the Practice of Concurrency Control and Recovery, Morgan Kaufmann, 2001, ISBN 1558605088

J. Gray, A. Reuter: Transaction Processing - Concepts and Techniques, Morgan Kaufmann Publishers, San Mateo, 1993

P.A. Bernstein, V. Hadzilacos, N. Goodman: Concurrency Control and Recovery in Database Systems, Addison-Wesley, 1987

P.A. Bernstein, E. Newcomer: Principles of Transaction Processing, Morgan Kaufmann, San Mateo, 1997

Material used from B. Kemme (McGill), H. Garcia-Molina (Stanford), A. Zaslavsky et al. (Monash), G. Alonso (ETH)


hs / FUB dbsII-03-10DDBIntro-3

Motivation

• Application: data "naturally" distributed
  - Companies with different branches
  - Airlines
  - Financial business
  - University / faculties
  - Any organization with a decentralized organizational structure
• Technology: network infrastructure, processors, RAM
• Economy: hardware cost
• Software supporting distributed processing, e.g. RPC

=> Huge number of interconnected systems

Recent challenge: Web-based computing => E-commerce


Goals: Improvement of non-functional characteristics

• Performance
  - The more computing power, the better
  - Primary goal for parallel DBS, not necessarily for distributed DBS
• Reliability
  - Substitute faulty components (hardware, software, and network) seamlessly
  - Fault tolerance: the ability to hide failures from users
  - Related to higher availability
  - Is 95.8 % availability too low? Definitely: it means 1 hour of downtime per day!
• Scalability
  - Upscale / downscale your system incrementally
  - Central components and algorithms are counterproductive => distributed algorithms
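The availability figure above can be checked with a few lines of arithmetic. This is a minimal sketch; the function names are made up for illustration and are not part of the lecture material.

```python
HOURS_PER_DAY = 24.0

def availability(downtime_hours_per_day):
    """Fraction of the day the system is up, given its daily downtime in hours."""
    return 1.0 - downtime_hours_per_day / HOURS_PER_DAY

def downtime_hours_per_day(avail):
    """Daily downtime in hours for a given availability (0.0 .. 1.0)."""
    return (1.0 - avail) * HOURS_PER_DAY

if __name__ == "__main__":
    # One hour of downtime per day corresponds to roughly 95.8 % availability.
    print(f"{availability(1.0):.1%}")
    print(f"{downtime_hours_per_day(0.958):.3f} h/day")
```

Read the other way round, each additional "nine" of availability divides the tolerated downtime by ten.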


The dark side of distribution

• Systems often less reliable
  - "You will never make a system of unreliable components more reliable by adding more unreliable components"
  - However: hot standby
  - But: data copies must be kept consistent, complex software, unreliable network
• Scalability
  - Distributed systems (DS) are inherently complex
  - High development cost -> middleware efforts
  - High administration cost
  => lack of flexibility
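The reliability argument above (more components can hurt, a hot standby can help) can be made concrete with the standard availability formulas for components in series and for redundant components. The sketch below assumes independent failures and a perfect, instantaneous failover, which real systems do not offer; the function names are illustrative only.

```python
def series_availability(p, n):
    """All n components are needed: the system is up only if every one is up."""
    return p ** n

def standby_availability(p, n):
    """Any one of n redundant components suffices (idealized hot standby)."""
    return 1.0 - (1.0 - p) ** n

if __name__ == "__main__":
    p = 0.9  # assumed availability of a single component
    print(series_availability(p, 2))   # two required components: worse than one
    print(standby_availability(p, 2))  # one component plus a standby: better than one
```

The gain from the standby is bought with exactly the costs the slide lists: copies that must be kept consistent, more complex software, and a network that can fail as well.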

The dark side …

• Performance
  - Double resources do not guarantee double performance
  - Network performance?
  - Transfer time does not depend on bandwidth alone

Transfer of a 4 KB page

Distance    Latency   Bandwidth   Transfer time
100 m       0.5 µs    10 Mbps     5 ms
100 m       0.5 µs    100 Mbps    0.5 ms
1 km        5 µs      100 Mbps    0.5 ms
100 km      0.5 ms    100 Mbps    1 ms
1000 km     5 ms      100 Mbps    5.5 ms
10000 km    50 ms     1 Gbps      50 ms

  - Distance > 100 km => signal propagation time dominates
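The trend in the table can be reproduced with a simple model: transfer time = propagation delay + page size / bandwidth. The sketch below assumes a signal speed of 2 x 10^8 m/s, which matches the latency column (0.5 µs per 100 m), and a page of 4096 bytes. It ignores protocol overhead, so its serialization times come out somewhat below the rounded figures in the table; all names are illustrative.

```python
PROPAGATION_SPEED_M_PER_S = 2e8   # assumed signal speed, consistent with 0.5 µs per 100 m
PAGE_BITS = 4 * 1024 * 8          # one 4 KB page

def propagation_delay_s(distance_m):
    """Time for the signal to travel the distance."""
    return distance_m / PROPAGATION_SPEED_M_PER_S

def serialization_time_s(bandwidth_bps, bits=PAGE_BITS):
    """Time to push all bits of the page onto the link."""
    return bits / bandwidth_bps

def transfer_time_s(distance_m, bandwidth_bps, bits=PAGE_BITS):
    """Simplified one-way transfer time, without protocol overhead."""
    return propagation_delay_s(distance_m) + serialization_time_s(bandwidth_bps, bits)

if __name__ == "__main__":
    for distance_m, bandwidth_bps in [(100, 10e6), (100, 100e6), (1e5, 100e6),
                                      (1e6, 100e6), (1e7, 1e9)]:
        t_ms = transfer_time_s(distance_m, bandwidth_bps) * 1e3
        print(f"{distance_m / 1000:>8.1f} km  {bandwidth_bps / 1e6:>6.0f} Mbps  {t_ms:8.3f} ms")
```

Beyond roughly 100 km the first term outweighs the second, so buying more bandwidth no longer shortens the transfer noticeably.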


What is a Distributed Database?

• A distributed database (DDB) is a collection of multiple, logically interrelated databases distributed over a computer network.
• A distributed database management system (D-DBMS) is the software that manages the DDB and provides an access mechanism that makes this distribution transparent to the users.
• Distributed database system (DDBS) = DDB + D-DBMS

Def. by P. Valduriez, M.T. Özsu


Example (1)

• Transparency of distribution: one logical DB

    UPDATE emp
    SET sal = sal * 1.1
    WHERE id IN (SELECT ass.eid
                 FROM ass JOIN proj ON proj.id = ass.pid
                 WHERE proj.dur > 12);

Expl. by B. Kemme

[Figure: the sites Berlin, New York, and Munich connected by a network; data shown at the sites: Munich projects, Munich employees, Munich assignments, NY employees, all projects, Berlin employees, all assignments]
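With distribution transparency, the user writes the update as if there were one database. The sketch below runs such a statement against a single in-memory SQLite database to show what the user sees. The table names emp, proj, and ass, their columns, and the sample rows are assumptions for illustration, and SQLite here only stands in for the logical database, not for a distributed system.

```python
import sqlite3

def raise_salaries(conn):
    """Give a 10 % raise to every employee assigned to a project longer than 12 (dur units)."""
    conn.execute("""
        UPDATE emp
        SET sal = sal * 1.1
        WHERE id IN (SELECT ass.eid
                     FROM ass JOIN proj ON proj.id = ass.pid
                     WHERE proj.dur > 12)
    """)

def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE emp  (id INTEGER PRIMARY KEY, name TEXT, sal REAL);
        CREATE TABLE proj (id INTEGER PRIMARY KEY, dur INTEGER);
        CREATE TABLE ass  (eid INTEGER, pid INTEGER);
        INSERT INTO emp  VALUES (1, 'Ann', 1000.0), (2, 'Bob', 2000.0);
        INSERT INTO proj VALUES (10, 24), (20, 6);
        INSERT INTO ass  VALUES (1, 10), (2, 20);
    """)
    raise_salaries(conn)
    salaries = dict(conn.execute("SELECT id, sal FROM emp"))
    conn.close()
    return salaries

if __name__ == "__main__":
    print(demo())  # only employee 1 works on a long project and gets the raise
```

In a DDBS the same statement would have to be decomposed by the system into operations on the fragments stored at the individual sites.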


Example (2)

• Cooperation: autonomous DBs cooperating on particular tasks

    SELECT *
    FROM flights
    WHERE departure = 'Montreal' AND arrival = 'Munich'
      AND date = '12/9/2002' AND price < 800;   -- price in $

[Figure: lufthansa.com, air-canada.com, and Travel-overland.com connected by a network]

Example (3)

• Autonomous, heterogeneous systems, logically identical data types

    UPDATE empl
    SET sal = sal * 0.9
    WHERE jobTitle = 'product manager';

Daimler / Stuttgart: only Stuttgart data, IBM DB2
Daimler / Bremen: only Bremen data, MySQL
Chrysler / Detroit: only Detroit data, Oracle 9i


Example (4)

• Sophisticated client / server computing

[Figure: several clients, Application Server A, and Application Server B; annotation: possible R/W conflict]


Classification criteria

• Distribution
  - Physically independent systems
  - Peer-to-peer: data distribution and sharing
  - Client / server: function distribution, e.g. parsing in the client
• Heterogeneity
  - DBMS software
  - Database schema (types) and languages (SQL variants)
• Autonomy
  - No global control
  - Local DBS operations may not be influenced by global operations (e.g. of a global transaction)
  - Note: subsumes completely independent or


Classification cube

by P. Valduriez, M.T. Özsu

• Distributed DB: looks like one DB
• Federated DB: more autonomy but not independent (Example 3)
• Multi DB: independent, cooperative (Example 2)

Scenarios and common problems

• Not just one distributed database system, but indefinitely many
• Understand common problems, e.g. how to guarantee one state for replicated data from the user's point of view
• Solve them by developing distributed algorithms, e.g. transaction commit

Any unsolvable problems?

Example: Internet marriage

[Figure: priest, bride, and groom communicating over the network]

• Distributed transaction: YES or NO, this is the question
• All participants and the communication are unreliable
• Main issue: partial failure
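The YES-or-NO decision of the marriage example can be sketched as the voting phase of an atomic commit protocol. Two-phase commit is one well-known protocol of this kind; the slide does not name a protocol, so the code below is an illustration with made-up names rather than the lecture's algorithm. A participant that cannot be reached is modelled by a ConnectionError and counted as a missing vote.

```python
from enum import Enum

class Vote(Enum):
    YES = "yes"
    NO = "no"

def collect_votes(participants):
    """Ask every participant for its vote; an unreachable participant yields None."""
    votes = {}
    for name, ask in participants.items():
        try:
            votes[name] = ask()
        except ConnectionError:
            votes[name] = None  # partial failure: no answer from this participant
    return votes

def decide(votes):
    """Commit only if every participant answered YES; anything else aborts."""
    if votes and all(v is Vote.YES for v in votes.values()):
        return "COMMIT"
    return "ABORT"

if __name__ == "__main__":
    def unreachable():
        raise ConnectionError("no reply")

    print(decide(collect_votes({"bride": lambda: Vote.YES, "groom": lambda: Vote.YES})))
    print(decide(collect_votes({"bride": lambda: Vote.YES, "groom": unreachable})))
```

The sketch covers only participant failures during voting. A failure of the coordinator (the priest) after the votes have been collected is the harder case and is not handled here.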


12 + 1 rules for DDBS (C. Date)

Rule 0: A DDB looks like a central DB to users
Rule 1: Sites should be as independent as possible (local autonomy)
Rule 2: There should not be a central master that all sites depend on (no reliance on a central site)
Rule 3: Never a need for a complete shutdown (continuous operation)
Rule 4: Users should not need to know where data are stored (location transparency / independence)
Rule 5: If data are split (e.g. columns of one relation) and distributed over several sites, users should not be aware of it (fragmentation transparency)


12 rules…

Rule 6: Users should not be aware of replicated data (replication independence)
Rule 7: Efficient distributed query processing
Rule 8: Global concurrency control and recovery (distributed transaction management)
Rule 9: Hardware independence
Rule 10: OS independence
Rule 11: Network independence
Rule 12: DBMS independence
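Rule 5 mentions splitting the columns of one relation across sites. A minimal sketch of such a vertical fragmentation, and of the reconstruction that a DDBS would perform behind the user's back, could look as follows. The relation, its column names, and the helper functions are invented for illustration; each fragment keeps the key so that the original rows can be rebuilt by a join.

```python
def vertical_fragments(rows, key, cols_a, cols_b):
    """Split each row (a dict) into two fragments that both carry the key column."""
    frag_a = [{key: r[key], **{c: r[c] for c in cols_a}} for r in rows]
    frag_b = [{key: r[key], **{c: r[c] for c in cols_b}} for r in rows]
    return frag_a, frag_b

def reconstruct(frag_a, frag_b, key):
    """Rebuild the original rows by joining the two fragments on the key."""
    b_by_key = {r[key]: r for r in frag_b}
    return [{**a, **b_by_key[a[key]]} for a in frag_a if a[key] in b_by_key]

if __name__ == "__main__":
    emp = [
        {"id": 1, "name": "Ann", "sal": 1000},
        {"id": 2, "name": "Bob", "sal": 2000},
    ]
    site_1, site_2 = vertical_fragments(emp, "id", ["name"], ["sal"])
    print(site_1)  # names, e.g. stored at one site
    print(site_2)  # salaries, e.g. stored at another site
    print(reconstruct(site_1, site_2, "id") == emp)
```

Fragmentation transparency means the user queries the relation as a whole and never calls anything like reconstruct explicitly.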


Parallel versus Distributed Databases

• More similarities than differences
• Similar to the parallel / distributed processing distinction
• Parallel DBS
  - Not geographically distributed
  - Goal: high performance
  - Homogeneous software
  - Fast interconnect
• Distributed DBS
  - Data geographically distributed
  - Goal: data sharing
  - Disconnected operation possible -> autonomy
  - Transparency

Parallel / distributed DBS

• Query processing in parallel DBS
  - Distribute operators (sort, filter, …) and data over processors to make complex processing fast

    Join (R, S) {    // |R| >> |S|
      1. Split R into n-1 partitions Ri and assign them to Mi / Pi;
         assign S to processor / memory Pn / Mn;
      2. Sort Ri and S;    // n sorts in parallel
      3. Join (n-1) + 1 streams
    }

e.g. join on a shared-disk multiprocessor (MP) system

[Figure: processors P with memories M1 … Mn]
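The three steps of the pseudocode can be sketched in Python as a sort-merge join. This is an illustration under assumptions: rows are tuples, the join key is the first field, and a thread pool stands in for the n processors (in CPython the threads show the structure of the algorithm rather than a real speed-up). All function names are made up.

```python
from concurrent.futures import ThreadPoolExecutor
import heapq

def merge_join(r_stream, s_sorted, key):
    """Join a key-sorted stream of R rows with a key-sorted list of S rows."""
    j = 0
    for r_row in r_stream:
        k = key(r_row)
        while j < len(s_sorted) and key(s_sorted[j]) < k:
            j += 1                      # skip S rows with smaller keys
        i = j
        while i < len(s_sorted) and key(s_sorted[i]) == k:
            yield (r_row, s_sorted[i])  # emit every matching pair
            i += 1

def parallel_sort_merge_join(r, s, n=4, key=lambda row: row[0]):
    if n < 2:
        raise ValueError("need at least one partition for R and one task for S")
    # 1. Split R round-robin into n-1 partitions; S is the n-th unit of work.
    partitions = [r[i::n - 1] for i in range(n - 1)]
    # 2. Sort the n-1 partitions and S as n parallel tasks.
    with ThreadPoolExecutor(max_workers=n) as pool:
        *sorted_parts, sorted_s = pool.map(lambda rel: sorted(rel, key=key),
                                           partitions + [list(s)])
    # 3. Merge the n-1 sorted R streams and join them with the sorted S stream.
    r_stream = heapq.merge(*sorted_parts, key=key)
    return list(merge_join(r_stream, sorted_s, key))

if __name__ == "__main__":
    R = [(3, "c"), (1, "a"), (2, "b"), (1, "z")]
    S = [(1, "x"), (3, "y")]
    for pair in parallel_sort_merge_join(R, S, n=3):
        print(pair)
```

Only the small relation S is handled as a whole, which fits the assumption |R| >> |S| in the pseudocode.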


Parallel / distributed DBS

• Distributed QP
  - Given a data distribution
  - Find a strategy to evaluate the query with minimal cost, in particular communication cost

Compute with minimal cost (time): R ⋈ S ⋈ T

|R| = 10000 records
|S| = 100000 records
|T| = 1000 records

[Figure: sites connected by links of 10000 km and 100 km]
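To see why communication cost drives the choice of strategy, consider the simplest family of strategies: ship all relations to one site and join there. The sketch below assumes that R, S, and T are each stored at a different site and counts only the records shipped. It ignores the link distances from the figure (the placement of the relations is not recoverable from the text) as well as intermediate result sizes, so it is a toy cost model, not a query optimizer.

```python
def shipping_costs(sizes):
    """Records shipped if all relations are moved to the site of one relation.

    sizes maps a relation name to its record count; each relation is assumed
    to live at its own site, named after the relation.
    """
    total = sum(sizes.values())
    return {site: total - n for site, n in sizes.items()}

def cheapest_assembly_site(sizes):
    """Site that minimizes the number of shipped records, with all costs."""
    costs = shipping_costs(sizes)
    best = min(costs, key=costs.get)
    return best, costs

if __name__ == "__main__":
    best, costs = cheapest_assembly_site({"R": 10_000, "S": 100_000, "T": 1_000})
    print(best, costs)
```

Under these assumptions the largest relation stays where it is. Better strategies ship join results or reduce relations before shipping them, and they weigh each transfer by the latency of the link it uses.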

Important terms

• Motivation: technology, application, economy
• Expected benefits: scalability, reliability, performance
• Data / function distribution
• Fault tolerance in case of partial failures
• Autonomy, multi database, federated DB
• Distribution transparency
