CSC-468 Advance Database
Lecture 05
Distributed Database
Aniqa Naeem
TEXT BOOKS FOR DATA WAREHOUSING
1. “Building the Data Warehouse” By Inmon
2. “Database Systems: Models, Languages, Design and Application Programming” By Ramez Elmasri and Shamkant B. Navathe, 6th Edition
3. “Advanced Data Management” By Lena Wiese
REFERENCE BOOKS FOR DATA WAREHOUSING
4. “Data Warehousing (Design, Development and Best Practices)” By Soumendra Mohanty
5. “Mastering Data Warehouse Design” By Claudia Imhoff, Nicolas Galemmo, Jonathan G. Geiger
Books In Library
• “Fundamentals of Database Systems”, 7th Ed, by Ramez Elmasri and Shamkant B. Navathe. 2017
• “Database System: A Practical Approach to design, Implementation and Management”, 4th Ed, by Thomas
Last lecture
• ROLLUP calculates aggregations such as SUM, COUNT, MAX, MIN, and AVG at increasing levels of aggregation, from the most detailed up to a grand total.
• CUBE is an extension similar to ROLLUP, enabling a single statement to calculate all possible combinations of aggregations.
• The GROUPING functions help you identify the
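As a recap aid, the difference between ROLLUP and CUBE can be sketched by listing the grouping sets each one produces. This is only an illustration in Python, not SQL; the column names `region` and `city` and both helper functions are hypothetical.

```python
from itertools import combinations

def rollup_sets(columns):
    """Grouping sets for ROLLUP: every prefix, from most detailed to grand total."""
    return [tuple(columns[:i]) for i in range(len(columns), -1, -1)]

def cube_sets(columns):
    """Grouping sets for CUBE: every possible subset of the columns."""
    return [combo
            for size in range(len(columns), -1, -1)
            for combo in combinations(columns, size)]

cols = ["region", "city"]   # hypothetical grouping columns
print(rollup_sets(cols))    # the empty tuple stands for the grand total
print(cube_sets(cols))
```

For n grouping columns, ROLLUP yields n + 1 grouping sets while CUBE yields 2^n, which is why CUBE covers "all possible combinations".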
Outline
• Distributed Database
• Types and Characteristics
• Types of Architectures for Parallel DBMS
• Advantages of DDBMS
Chapter 22 (22.1); Chapter 12 in 2016 book. Book: “Database System: A Practical Approach to design, Implementation and Management”, 4th Ed, by Thomas Connolly and Carolyn Begg
Distributed Database
• A logically interrelated collection of shared data (and a description of this data) physically distributed over a computer network.
• Note that the physical distribution does not necessarily imply that the computer systems be geographically far apart; they could
Distributed DBMS
• The software that permits the management of the distributed database and makes the distribution transparent to the users.
• A Distributed Database Management System (DDBMS) consists of a single logical database that is split into a number of fragments.
• Each fragment is stored on one or more computers under the control of a separate DBMS, with the computers connected by a communication network.
• Each SITE is capable of independently processing user requests that require access to local data and is also capable of processing data stored on other computers on the network.
1) Local Applications: Applications that do not require data from other sites.
2) Global Applications: Applications that do require data from other sites.
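The local/global distinction can be sketched as a check against the fragments a site holds. This is a minimal illustration; the `ALLOCATION` table, the fragment names and the `classify` function are all hypothetical.

```python
# Hypothetical allocation: which fragments each site stores locally.
ALLOCATION = {
    "Karachi": {"citizens_karachi"},
    "Lahore": {"citizens_lahore"},
    "Islamabad": {"citizens_islamabad", "citizens_karachi"},
}

def classify(site, fragments_needed):
    """Return 'local' if the site holds every fragment needed, else 'global'."""
    missing = set(fragments_needed) - ALLOCATION[site]
    return "local" if not missing else "global"

print(classify("Karachi", ["citizens_karachi"]))                     # local
print(classify("Karachi", ["citizens_karachi", "citizens_lahore"]))  # global
```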
Characteristics of DDBMS: (Features)
• Collection of logically related shared data.
• The data is split into a number of fragments.
• Fragments may be replicated.
• Fragments/replicas are allocated to SITES.
• The sites are linked by a communication network.
• The data at each site is under the control of a DBMS.
• The DBMS at each site can handle local applications autonomously.
• Each DBMS participates in at least one global application.
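The first four characteristics (fragment, replicate, allocate) can be sketched as follows. The slides do not say how the data is split, so this assumes horizontal fragmentation by city; the sample rows, the choice of Islamabad as a backup site and both functions are hypothetical.

```python
from collections import defaultdict

# Hypothetical global relation: (id, name, city)
CITIZENS = [
    (1, "Ali", "Karachi"),
    (2, "Sara", "Lahore"),
    (3, "Omar", "Islamabad"),
    (4, "Hina", "Karachi"),
]

def fragment_by_city(rows):
    """Assumed horizontal fragmentation: one fragment per city value."""
    fragments = defaultdict(list)
    for row in rows:
        fragments[row[2]].append(row)
    return dict(fragments)

def allocate(fragments, backup_site):
    """Store each fragment at its own city site, plus a replica at backup_site."""
    sites = defaultdict(dict)
    for city, rows in fragments.items():
        sites[city][city] = rows
        sites[backup_site][city] = list(rows)   # replicated copy
    return dict(sites)

fragments = fragment_by_city(CITIZENS)
sites = allocate(fragments, backup_site="Islamabad")
for site, stored in sites.items():
    print(site, "stores fragments:", sorted(stored))
```

Every row belongs to exactly one fragment, so the global relation can be rebuilt as the union of the fragments.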
Types and Characteristics
Slide 5 (Fig-1): Centralized Database Management System or Distributed Processing
Slide 6 (Fig-2): Distributed Database Management System
Example
Using DDB technology, NADRA may implement its database system on a number of separate computer systems rather than a single, centralized mainframe. The computer systems may be located at each local branch office: for example, Karachi, Lahore and Islamabad.
A network linking the computers will enable the branches to communicate with each other, and a DDBMS will enable them to access data stored at another branch office.
Thus, a client living in Karachi can go to the nearest office to find out the status of the ID card rather than having to phone or write to the Islamabad branch for details.
Alternatively, if each NADRA branch office already has its own database, a DDBMS can be used to integrate the separate databases into a single, logical database, again making the local data more widely available.
• Fundamental Principle of Distributed DBMS (Transparency):
• The fundamental principle of Distributed DBMS is Transparency, i.e. the system is expected to make distribution transparent (invisible) to the user.
• Difference between Distributed Processing, Distributed DBMS and Parallel Processing:
• Distributed Processing or Centralized Database:
• A centralized database that can be accessed over the computer network. (Fig-1 on slide 5)
• Distributed Database:
• A logically interrelated collection of shared data (and a description of this data) physically distributed over a computer network. (Fig-2 on slide 6)
• The key point in the definition of a distributed DBMS is that the system consists of data that is physically distributed across a number of sites in the network. If the data is centralized, even though other users may be accessing the data over the network, we do not consider this to be a DDBMS, simply distributed processing.
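Transparency can be sketched as a lookup in which the caller never names a site. This is a toy illustration; the `SITES` contents, the card IDs and the `card_status` function are hypothetical, and a real DDBMS would normally consult a catalog of fragment locations rather than probe every site.

```python
# Hypothetical sites, each holding the ID card records of its own branch.
SITES = {
    "Karachi": {"K-1001": "printed"},
    "Lahore": {"L-2001": "in process"},
    "Islamabad": {"I-3001": "dispatched"},
}

def card_status(card_id):
    """Global lookup: the user asks for a card, not for a site."""
    for records in SITES.values():
        if card_id in records:
            return records[card_id]
    return None   # no site holds this card

print(card_status("L-2001"))   # in process
```

The same call works from any branch, whichever site actually stores the record.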
Parallel DBMS or Parallel Processing
• A DBMS running across multiple processors and disks that is designed to execute operations in parallel, whenever possible, in order to improve performance.
• A parallel DBMS links multiple smaller machines to achieve the same throughput as a single, larger machine, often with greater scalability and reliability than a single-processor DBMS.
• To provide multiple processors with common access to a single database, a parallel DBMS must provide shared resource management.
Types of Architectures for Parallel DBMS
1. Shared Memory Architecture
2. Shared Disk Architecture
3. Shared Nothing Architecture
Shared Memory Architecture
• It is often known as Symmetric Multiprocessing (SMP).
• It is a tightly coupled architecture in which multiple processors within a single system share the same system memory.
• This approach has become popular on platforms ranging from personal workstations that support a few microprocessors in parallel, to large RISC (Reduced Instruction Set Computing) based machines, all the way up to the largest mainframes.
• This architecture provides high-speed data access for a limited number of processors, up to about 64.
Shared Disk Architecture
• It is a loosely coupled architecture optimized for applications that are inherently centralized and require high availability and performance.
• Each processor can access all disks directly, but each has its own private memory.
• Like the shared nothing architecture, the shared disk architecture eliminates the shared memory performance bottleneck.
• Shared disk systems are sometimes referred to as clusters.
Shared Nothing Architecture
• It is often known as Massively Parallel Processing (MPP).
• It is a multiple-processor architecture in which each processor is part of a complete system, with its own memory and disk storage.
• The database is partitioned among all the disks on each system associated with the database, and data is transparently available to users on all systems.
• This architecture is more scalable than the shared memory architecture (SMP) and can easily support a large number of processors.
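The partitioning idea behind shared nothing can be sketched as hash partitioning followed by a per-node computation. This is a simplified illustration: threads stand in for independent nodes, the number of nodes and the rows are made up, and hashing on the key is an assumed partitioning scheme that the slides do not specify.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_NODES = 4  # assumed number of nodes

def partition(rows, num_nodes):
    """Hash-partition rows on their key so each node owns a disjoint slice."""
    nodes = [[] for _ in range(num_nodes)]
    for key, amount in rows:
        nodes[hash(key) % num_nodes].append((key, amount))
    return nodes

def local_sum(node_rows):
    """Work done by one node using only its own partition."""
    return sum(amount for _, amount in node_rows)

rows = [(i, i * 10) for i in range(1, 101)]   # made-up (key, amount) rows
nodes = partition(rows, NUM_NODES)

# Each "node" aggregates its own data; a coordinator combines the partial results.
with ThreadPoolExecutor(max_workers=NUM_NODES) as pool:
    partial_sums = list(pool.map(local_sum, nodes))

total = sum(partial_sums)
print(total)   # 50500
```

Because no node needs another node's memory or disk, adding nodes only changes how the rows are spread, which is the intuition behind the scalability claim.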
• While the definition of the shared nothing architecture sometimes includes distributed DBMSs (DDBMS), the distribution of data in a parallel DBMS is based solely on performance considerations.
• Further, the nodes of a DDBMS are typically geographically distributed, separately administered and have a slower interconnection network, whereas the nodes of a parallel DBMS are typically within the same computer or within the same SITE.
Differences
• A multiprocessor system design (parallel processing) is rather symmetrical, consisting of a number of identical processors and memory components and controlled by one or more copies of the same operating system. This is not true in a distributed computing system, where heterogeneity of the operating system as well as the hardware is quite common.
• Parallel database technology is typically used for very large databases of the order of terabytes (10^12 bytes) or for systems that have to process thousands of transactions per second.
Parallel DB System | Distributed DB System
On a multiprocessor and single DB system | DB is geographically separated
Symmetry and homogeneity of sites (architecture/schema should be same) | There may be
Advantages of Distributed Database Management System
1) Reflects organizational structure:
Many organizations are naturally distributed over several locations. For example, NADRA has different offices in different cities of Pakistan. It is natural for the database used in such an application to be distributed over these locations.
2) Improved shareability and local autonomy:
The geographical distribution of an organization can be reflected in the distribution of data; users at one site can access data stored at other sites. Data can be placed at the site close to the users who normally use that data. In this way, users have local control of the data, and they can consequently establish and enforce local policies regarding the use of this data.
3) Improved availability:
In a centralized DBMS, a computer failure terminates the operation of the DBMS. However, a failure at one site of a DDBMS, or a failure of a communication link making some sites inaccessible, does not make the entire system inoperable. Distributed DBMSs are designed to continue to function despite such failures.
4) Improved reliability:
As data may be replicated so that it exists at more than one site, the failure of a node or a communication link does not necessarily make the data inaccessible.
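The availability and reliability points can be sketched as a read that falls back to another replica when a site is down. This is a minimal illustration; the replica placement, the site status table and the `read_fragment` function are hypothetical.

```python
# Hypothetical replica placement and current site status.
REPLICAS = {"citizens_karachi": ["Karachi", "Islamabad"]}
SITE_UP = {"Karachi": False, "Lahore": True, "Islamabad": True}

def read_fragment(fragment):
    """Return the first reachable site that holds a copy of the fragment."""
    for site in REPLICAS[fragment]:
        if SITE_UP.get(site, False):
            return site
    raise RuntimeError(f"no reachable replica of {fragment}")

print(read_fragment("citizens_karachi"))   # Islamabad
```

With the Karachi site down, the request is still served from the Islamabad copy; only when every replica site is unreachable does the read fail.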
5) Improved Performance:
6) Modular Growth:
In a distributed environment, it is much easier to handle expansion.
New sites can be added to the network without affecting the operations of the other sites. This flexibility allows an organization to expand relatively easily. Increasing database size can usually be handled by adding processing and storage power to the network.
In a centralized DBMS, growth may entail changes to both hardware and software.
7) Transparency and Fragmentation:
A transparent system hides the implementation details from users, i.e. the user may execute the same query without knowing at which geographical site the requested data resides.
Distribution of data over different geographical SITES is called