ADBMS-lec 5- Distributed DB.pptx

(1)

CSC-468 Advance Database

Lecture 05

Distributed Database

Aniqa Naeem

(2)

TEXT BOOKS FOR DATA WAREHOUSING

1. “Building the Data Warehouse” By Inmon

2. “Database Systems: Models, Languages, Design and Application Programming” By Ramez Elmasri and Shamkant B. Navathe, 6th Edition

3. “Advanced Data Management” By Lena Wiese

REFERENCE BOOKS FOR DATA WAREHOUSING

4. “Data Warehousing (Design, Development and Best Practices)” By Soumendra Mohanty

5. “Mastering Data Warehouse Design” By Claudia Imhoff, Nicolas Galemmo, Jonathan G. Geiger

(3)

Books In Library

• “Fundamentals of Database Systems”, 7th Ed, by Ramez Elmasri and Shamkant B. Navathe. 2017

• “Database System: A Practical Approach to Design, Implementation and Management”, 4th Ed, by Thomas Connolly and Carolyn Begg

(4)

Last lecture

• ROLLUP calculates aggregations such as SUM, COUNT, MAX, MIN, and AVG at increasing levels of aggregation, from the most detailed up to a grand total.

• CUBE is an extension similar to ROLLUP, enabling a single statement to calculate all possible combinations of aggregations.

• The GROUPING functions help you identify the
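
The difference between ROLLUP and CUBE recalled above comes down to which grouping sets each one produces. The following is a minimal Python sketch of that idea only; the function names and the column names in the usage note are illustrative assumptions, not part of any SQL engine.

```python
from itertools import combinations


def rollup_groupings(columns):
    """Grouping sets for ROLLUP: every prefix of the column list,
    from the most detailed level down to the grand total ()."""
    return [tuple(columns[:n]) for n in range(len(columns), -1, -1)]


def cube_groupings(columns):
    """Grouping sets for CUBE: every possible subset of the columns,
    listed from the largest subsets down to the grand total ()."""
    return [
        subset
        for size in range(len(columns), -1, -1)
        for subset in combinations(columns, size)
    ]
```

For the hypothetical columns `["region", "city"]`, ROLLUP yields the groupings (region, city), (region) and the grand total, while CUBE additionally yields (city). With n columns, ROLLUP produces n + 1 grouping sets and CUBE produces 2^n.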

(5)

Outline

• Distributed Database

• Types and Characteristics

• Types of Architectures for Parallel DBMS

• Advantages of DDBMS

(6)

Chapter 22 (22.1); Chapter 12 in the 2016 book. Book: “Database System: A Practical Approach to Design, Implementation and Management”, 4th Ed, by Thomas Connolly and Carolyn Begg

(7)

Distributed Database

• A logically interrelated collection of shared data (and a description of this data) physically distributed over a computer network.

• Note that the physical distribution does not necessarily imply that the computer systems be geographically far apart; they could

(8)

Distributed DBMS

• The software that permits the management of the distributed database and makes the distribution transparent to the users.

• A Distributed Database Management System (DDBMS) consists of a single logical database that is split into a number of fragments.

• Each fragment is stored on one or more computers under the control of a separate DBMS, with the computers connected by a communication network.

• Each site is capable of independently processing user requests that require access to local data and is also capable of processing data stored on other computers on the network.

(9)

Types and Characteristics

1) Local Applications: Applications that do not require data from other sites.

2) Global Applications: Applications that do require data from other sites.

Characteristics of DDBMS: (Features)

• Collection of logically related shared data.

• The data is split into a number of fragments.

• Fragments may be replicated.

• Fragments/replicas are allocated to sites.

• The sites are linked by a communication network.

• The data at each site is under the control of a DBMS.

• The DBMS at each site can handle local applications autonomously.

• Each DBMS participates in at least one global application.
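
The characteristics above (fragments, replication, allocation to sites, local versus global applications) can be sketched in a few lines of Python. This is a minimal illustration, not a DDBMS implementation: the fragment names, the site names, and the `classify_application` helper are all hypothetical.

```python
# Hypothetical allocation of fragments to sites (illustrative names only).
# A fragment listed at more than one site is replicated.
ALLOCATION = {
    "citizens_karachi": {"Karachi"},
    "citizens_lahore": {"Lahore", "Islamabad"},  # replicated fragment
    "citizens_islamabad": {"Islamabad"},
}


def classify_application(origin_site, fragments_needed):
    """Return 'local' if every needed fragment has a copy at origin_site,
    otherwise 'global' (data from another site is required)."""
    for fragment in fragments_needed:
        if origin_site not in ALLOCATION[fragment]:
            return "global"
    return "local"
```

Under this allocation, an application at Karachi that reads only `citizens_karachi` is local, while one that also reads `citizens_lahore` is global. An application at Islamabad reading `citizens_lahore` stays local because a replica is stored there.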

(10)

5 CENTRALIZED DATABASE MANAGEMENT SYSTEM OR DISTRIBUTED PROCESSING

(11)

6

DISTRIBUTED DATABASE MANAGEMENT SYSTEM

(12)

Example

Using DDB technology, NADRA may implement its database system on a number of separate computer systems rather than a single, centralized mainframe. The computer systems may be located at each local branch office: for example, Karachi, Lahore and Islamabad.

A network linking the computers will enable the branches to communicate with each other, and a DDBMS will enable them to access data stored at another branch office.

Thus, a client living in Karachi can go to the nearest office to find out the status of an ID card rather than having to phone or write to the Islamabad branch for details.

Alternatively, if each NADRA branch office already has its own database, a DDBMS can be used to integrate the separate databases into a single, logical database, again making the local data more widely available.
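
The branch scenario can be approximated with a small lookup routine in which the caller never names the site that holds the record. This is only a sketch under stated assumptions: the branch contents, card numbers, statuses, and the `card_status` function are invented for illustration, and a real DDBMS would consult a catalog rather than probe every branch.

```python
# Hypothetical per-branch stores: each maps an ID card number to its status.
# All card numbers and statuses below are made up for the example.
BRANCH_DATABASES = {
    "Karachi": {"42101-1": "printed"},
    "Lahore": {"35202-7": "in process"},
    "Islamabad": {"61101-3": "dispatched"},
}


def card_status(local_branch, card_number):
    """Look up a card at the local branch first, then at the other branches.

    Returns (status, branch_that_answered), or (None, None) if no branch
    holds the card. The caller does not say where the record is stored.
    """
    other_branches = [b for b in BRANCH_DATABASES if b != local_branch]
    for branch in [local_branch] + other_branches:
        status = BRANCH_DATABASES[branch].get(card_number)
        if status is not None:
            return status, branch
    return None, None
```

A request made at the Karachi office for a card stored in Islamabad is issued in exactly the same way as a request for a card stored locally; only the second element of the result reveals which branch answered.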

(13)

• Fundamental Principle of Distributed DBMS (Transparency):

• The fundamental principle of a distributed DBMS is transparency, i.e. the system is expected to make distribution transparent (invisible) to the user.

• Difference between Distributed Processing, Distributed DBMS and Parallel Processing:

• Distributed Processing or Centralized Database:

• A centralized database that can be accessed over a computer network. (Fig-1 on slide 5)

• Distributed Database:

• A logically interrelated collection of shared data (and a description of this data) physically distributed over a computer network. (Fig-2 on slide 6)

• The key point in the definition of a distributed DBMS is that the system consists of data that is physically distributed across a number of sites in the network. If the data is centralized, even though other users may be accessing the data over the network, we do not consider this to be a DDBMS, simply distributed processing.

(14)

Parallel DBMS or Parallel Processing

• A DBMS running across multiple processors and disks that is designed to execute operations in parallel, whenever possible, in order to improve performance.

• A parallel DBMS links multiple smaller machines to achieve the same throughput as a single, larger machine, often with greater scalability and reliability than a single-processor DBMS.

• To provide multiple processors with common access to a single database, a parallel DBMS must provide for shared resource management.
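
The idea of executing one operation in parallel over several parts of the data can be illustrated with a partitioned aggregate. The sketch below is an assumption-laden toy: `parallel_sum` is a hypothetical helper, the partitions are plain Python lists, and threads in CPython do not give true CPU parallelism, so it shows only the structure (partial results per partition, then a final combine).

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_sum(partitions, workers=4):
    """Sum each partition as a separate task, then combine the partial sums."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # One partial SUM per partition, computed by the worker pool.
        partial_sums = list(pool.map(sum, partitions))
    # Final combine step, analogous to merging per-processor results.
    return sum(partial_sums)
```

The same split-then-combine pattern works for COUNT, MIN and MAX; AVG needs a partial sum and a partial count per partition.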

(15)

Types of Architectures for Parallel DBMS

1. Shared Memory Architecture

2. Shared Disk Architecture

3. Shared Nothing Architecture

(16)

• Shared memory architecture is often known as Symmetric Multiprocessing (SMP).

• It is a tightly coupled architecture in which multiple processors within a single system share the same system memory.

• This approach has become popular on platforms ranging from personal workstations that support a few microprocessors in parallel, to large RISC (Reduced Instruction Set Computing) based machines, all the way up to the largest mainframes.

• This architecture provides high-speed data access for a limited number of processors (up to about 64).

(17)

(18)

• Shared disk is a loosely coupled architecture optimized for applications that are inherently centralized and require high availability and performance.

• Each processor can access all disks directly, but each has its own private memory.

• Like the shared nothing architecture, the shared disk architecture eliminates the shared memory performance bottleneck.

• Shared disk systems are sometimes referred to as clusters.

(19)

(20)

• Shared nothing architecture is often known as Massively Parallel Processing (MPP).

• It is a multiple-processor architecture in which each processor is part of a complete system, with its own memory and disk storage.

• The database is partitioned among all the disks on each system associated with the database, and data is transparently available to users on all systems.

• This architecture is more scalable than shared memory architecture (SMP) and can easily support a large number of processors.
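
One common way to partition a database among the nodes of a shared nothing system is hash partitioning on a key column. The slides do not say which partitioning scheme is used, so the following is a sketch under that assumption; the function names are hypothetical and CRC-32 is an arbitrary choice of stable hash.

```python
import zlib


def node_for_key(key, node_count):
    """Pick the node that owns `key` using a stable hash (CRC-32 here)."""
    return zlib.crc32(str(key).encode("utf-8")) % node_count


def partition_rows(rows, key_column, node_count):
    """Split rows into one list per node; each row lands on exactly one node."""
    partitions = [[] for _ in range(node_count)]
    for row in rows:
        partitions[node_for_key(row[key_column], node_count)].append(row)
    return partitions
```

Because the hash is deterministic, any node can compute where a given key lives without asking a central coordinator, which is part of why this layout scales to many processors.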

(21)

(22)

Differences

• While the definition of the shared nothing architecture sometimes includes distributed database management systems (DDBMSs), the distribution of data in a parallel DBMS is based solely on performance considerations.

• Further, the nodes of a DDBMS are typically geographically distributed, separately administered, and have a slower interconnection network, whereas the nodes of a parallel DBMS are typically within the same computer or within the same site.

(23)

• A multiprocessor system design (parallel processing) is rather symmetrical, consisting of a number of identical processors and memory components and controlled by one or more copies of the same operating system. This is not true in a distributed computing system, where heterogeneity of the operating system as well as the hardware is quite common.

• Parallel database technology is typically used for very large databases of the order of terabytes (10^12 bytes) or for systems that have to process thousands of transactions per second.

(24)

Parallel DB System | Distributed DB System

On a multiprocessor and single DB system | DB is geographically separated

Symmetry and homogeneity of sites (architecture/schema should be the same) | There may be

(25)

Advantages of Distributed Database Management System

1) Reflects organizational structure:

Many organizations are naturally distributed over several locations. For example, NADRA has different offices in different cities of Pakistan. It is natural for the databases used in such an application to be distributed over these locations.

2) Improved shareability and local autonomy:

The geographical distribution of an organization can be reflected in the distribution of data; users at one site can access data stored at other sites. Data can be placed at the site close to the users who normally use that data. In this way, users have local control of the data, and they can consequently establish and enforce local policies regarding the use of this data.

(26)

3) Improved availability:

In a centralized DBMS, a computer failure terminates the operation of the DBMS. However, a failure at one site of a DDBMS, or a failure of a communication link making some sites inaccessible, does not make the entire system inoperable. Distributed DBMSs are designed to continue to function despite such failures.

4) Improved reliability:

As data may be replicated so that it exists at more than one site, the failure of a node or a communication link does not necessarily make the data inaccessible.
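
The effect of replication on data accessibility can be estimated with a simple probability model. The sketch below assumes that each site holding a replica is reachable independently with the same probability, which is a simplification not stated in the slides (real site and link failures are often correlated); `replicated_availability` is a hypothetical helper.

```python
def replicated_availability(site_availability, replica_count):
    """Probability that at least one of `replica_count` replicas is reachable,
    assuming each site is up independently with probability `site_availability`."""
    if not 0.0 <= site_availability <= 1.0:
        raise ValueError("site_availability must be between 0 and 1")
    if replica_count < 1:
        raise ValueError("replica_count must be at least 1")
    # The data is inaccessible only if every replica's site is down.
    return 1.0 - (1.0 - site_availability) ** replica_count
```

Under this model, a data item stored at a single site that is reachable 90% of the time is accessible 90% of the time, while the same item replicated at two such sites is accessible about 99% of the time.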

5) Improved Performance:

(27)

6) Modular Growth:

In a distributed environment, it is much easier to handle expansion.

New sites can be added to the network without affecting the operations of the other sites. This flexibility allows an organization to expand relatively easily. Increasing database size can usually be handled by adding processing and storage power to the network.

In a centralized DBMS, growth may entail changes to both hardware and software.

7) Transparency and Fragmentation:

A transparent system hides the implementation details from users, i.e. the user may execute the same query without knowing at which geographical site the requested data resides.

Distribution of data over different geographical SITES is called

(28)