Introduction to Databases
Marek Rychly
[email protected] Strathmore University, @iLabAfrica &Brno University of Technology, Faculty of Information Technology
Advanced Databases and Enterprise Systems 24 August 2015
Outline
1
Course Content
Lectures and Assessments
Evaluation
Background on the Course
2
Introduction to Databases
From File-Based Storage to Database Management Systems
Database Schema and Data Model
Database Architecture
3
Theoretical Background
Sets and Relations
Functions
Course Content
Introduction to Databases Theoretical Background
Lectures and Assessments
Evaluation
Background on the Course
Lectures and Assessments
1st Introduction to Databases2nd Relational Database Model
(1stAssessment: define relations, keys, and constraints) 3rd Relational Algebra and Calculus
(2ndAssessment: make expressions in relational algebra/calculus) 4th–5th SQL
(3rdAssessment: create database objects and define several queries)
6th Transactions and Transaction Management 7th Database Analysis and Design
8th Entity-Relationship Modelling 9th Normalization and Database Design
(4thAssessment: make an ER model and perform the normalization) 10th Database Performance
11th Data warehousing and Data-mining;
Post-relational Databases
12th Current developments in database technology;
Distributed, Object-oriented, and NoSQL Databases
Course Content
Introduction to Databases Theoretical Background
Lectures and Assessments
Evaluation
Background on the Course
Core Reading Materials
T. Connolly, T.M. Connolly, and C.E. Beg.
Database Systems: A Practical Approach to Design,
Implementation, and Management
.
Pearson Education, 2014.
ISBN 9780132943260.
URL
http://www.pearsonhighered.com/educator/product/
9780132943260.page#dw_resources
.
M. Kifer, A.J. Bernstein, and P.M. Lewis.
Database Systems: An Application-oriented Approach
.
Pearson/Addison-Wesley, 2006.
ISBN 9780321268457.
URL
http://www.pearsonhighered.com/educator/product/
9780321268457.page#dw_resources
.
Course Content
Introduction to Databases Theoretical Background
Lectures and Assessments
Evaluation
Background on the Course
Evaluation
50%
assessments
(each result has to be submitted to Marek Rychly <[email protected]> by, at latest, two weeks after the assessment specification in a lecture)
1
stAssessment by 14 September 2015
2
ndAssessment by 21 September 2015
3
rdAssessment by 28 September 2015
4
thAssessment by 16 October 2015
5
thAssessment by 26 October 2015
50%
final exam
Course Content
Introduction to Databases Theoretical Background
Lectures and Assessments Evaluation
Background on the Course
Prior Knowledge and Skills
The course does not require any prior knowledge or skill,
however, the following may help you to understand the lectures:
1
basic mathematical background
(sets, relations, functions)2
basics of software engineering
(software development process and its phases, software life-cycle) 3
software modelling
(data modelling, Class Diagram in Unified Modelling Language) 4
concepts of object-oriented programming
(encapsulation, composition, inheritance) 5
operation system design principles
(file system organization / data blocks, memory management / paging) 6
concepts of concurrent computation
Course Content
Introduction to Databases Theoretical Background
Lectures and Assessments Evaluation
Background on the Course
Lecturer Introduction
RNDr. Marek Rychly, Ph.D.
assistant professor at BUT FIT
Brno University of Technology
Faculty of Information Technology
Brno, Czech Republic, Central Europe
over 10 years of lecturing experience
(programming, cryptography, databases, SW architectures)
over 25 of scientific conference and journal publications
(dynamic architectures, SOA, Web services, distributed systems)
a researcher in several national and international projects
a happily married man with two small kids :-)
http://www.fit.vutbr.cz/homes/rychly/.en https://www.linkedin.com/in/rychly
Course Content
Introduction to Databases Theoretical Background
Lectures and Assessments Evaluation
Background on the Course
Students Introduction? (optional)
What is your name? Where are you from?
What is your educational background?
(in general, in IT, any database experiences, etc.)
What is your professional background?
(a job title, why do you need this course, etc.)
What do you want to focus on in this course?
(see the list of lectures on page 4)
Is there anything you would like to skip?
(because you already know about that or because you are not interested in) Please send your (very short) introduction [email protected] subject “ADES Student Introduction”. Knowing your background and priorities will help me to
Course Content
Introduction to Databases
Theoretical Background
From File-Based Storage to Database Management Systems
Database Schema and Data Model Database Architecture
How to Store Data? File-based Decentralized Storage
each application site/domain manages its own data
(a sales dept. manages “sales” data, a contract dept. manages “contract” data)
data saved to/loaded from local files by individual applications
(one file per one application instance)
data processing and querying is done by individual applications
(the application interprets the data)
possible data transfers between site/domains are done manually
Course Content
Introduction to Databases
Theoretical Background
From File-Based Storage to Database Management Systems
Database Schema and Data Model Database Architecture
File-based Decentralized Storage II
Course Content
Introduction to Databases
Theoretical Background
From File-Based Storage to Database Management Systems
Database Schema and Data Model Database Architecture
File-based Decentralized Storage III
Advantages and Disadvantages
+
easy to implement
(an application just save data to or load data from a file)
+
full control over data
(local data storage is easy to control, secure, and backup)
separation and isolation of data
(it is more difficult to access data from different locations or to combine them)
duplication of data
(the same data stored in different locations, difficult to control/synchronize them)
program-data dependence
(structure of the data stored is limited by a related application/program, it is difficult to modify the structure without modification of the program)
incompatible file formats
(different applications store data in different formats, e.g., little/big endian)
fixed queries/proliferation of application programs
(data are queried by their applications which implement just limited/currently required set of queries, it is difficult to implement another queries)
Course Content
Introduction to Databases
Theoretical Background
From File-Based Storage to Database Management Systems
Database Schema and Data Model Database Architecture
How to Store Data? Database Approach
Definition (Database, DB)
A shared collection of logically related data, and a description of this
data (i.e., meta-data, a system catalogue), designed to meet the
information needs of an organization (to answer queries).
Definition (Database Management System, DBMS, DB server)
A software system that enables users to define, create, maintain, and
control access to the database (it manages data and meta-data).
Definition (Database Application Program, DB client)
A computer program that interacts with the database by issuing an
appropriate request (typically an SQL statement) to the DBMS.
Course Content
Introduction to Databases
Theoretical Background
From File-Based Storage to Database Management Systems
Database Schema and Data Model Database Architecture
Database (DB)
An Example of Entity-Relationship Diagram (entities, attributes, relationships)
Course Content
Introduction to Databases
Theoretical Background
From File-Based Storage to Database Management Systems
Database Schema and Data Model Database Architecture
Database Management System and App. Program
DB Clients Accessing Data by Means of Centralized DBMS
Course Content
Introduction to Databases
Theoretical Background
From File-Based Storage to Database Management Systems
Database Schema and Data Model Database Architecture
Database Approach
Advantages and Disadvantages
+
centralized approach
(data are stored in one location, without duplicities, no separation/isolation)
+
data abstraction
(structure of the data is not affected by related applications/programs, it respects well-established practices/normal forms)
+
standardized interface
(DBMS provides a standardized interface to access data, e.g, SQL 2003)
+
performance/security tuning
(DBMS is optimized for performance and security, applications need not to care)
single point of failure
(failure on a DBMS server affects all its clients; solved by replication/distribution)
limited scalability
(scalability by upgrading a DBMS server or a cluster, which is not trivial)
issues related to concurrent access to/modification of data
(multiple concurrent modifications of single data need to be serialized, long-running modifications may require locking the data)
Course Content
Introduction to Databases
Theoretical Background
From File-Based Storage to Database Management Systems
Database Schema and Data Model Database Architecture
DBMS Environment
1
hardware
(HW servers where DBMSs are running, communication infrastructure, storage) 2
software
(DBMS software, e.g., Oracle 12c, Microsoft Server, PostgreSQL, MySQL, etc.)
3
data
(stored data and their meta-data) 4
procedures
(procedures manipulating the data, invoked on demand or triggered automatically) 5
people
(a data/DB administrator, logical/physical DB designers app. developer, end-user)
Course Content
Introduction to Databases
Theoretical Background
From File-Based Storage to Database Management Systems
Database Schema and Data Model
Database Architecture
The Three-Level ANSI-SPARC Architecture
Course Content
Introduction to Databases
Theoretical Background
From File-Based Storage to Database Management Systems
Database Schema and Data Model
Database Architecture
Three Types of Database Schema in ANSI-SPARC
1
External level/External Schemas
(different external views of the database for different users/applications)
data displayed by an application to its users (tables, forms)
results of user queries of data in the database (SELECTs)
database views (predefined SELECTs)
2
Conceptual level/Conceptual Schema/Logical model
(structure of data stored in the database, describes what the data is stored)
all entities, their attributes, and their relationships
keys and constraints on the data
security and integrity information
3
Internal level/Internal schema/Physical model
(physical representation of the database, describes how the data is stored)
storage space allocation for data and indexes
exact data-types for all attributes and their storage size
data encoding, compression, and encryption
Changes in lower levels do not affect levels above.
Course Content
Introduction to Databases
Theoretical Background
From File-Based Storage to Database Management Systems
Database Schema and Data Model
Database Architecture
Three Types of Database Schema in ANSI-SPARC
Example of the Database Schema in Three Different Levels
Course Content
Introduction to Databases
Theoretical Background
From File-Based Storage to Database Management Systems
Database Schema and Data Model
Database Architecture
Data Model and Data Modelling
Definition (Data model)
A high-level description of a database schema as an integrated collection of
concepts for describing and manipulating data, relationships between data,
and constraints on the data in an organization.
a data model consists of three components
1
structure of the data (objects/entities, attributes, relationships)
2operations manipulating the data (methods, event handlers)
3integrity constraints (for correct and accurate data)
a data model respect different levels of ANSI-SPARC architecture
(external, conceptual/logical, and internal/physical data models)
the process of construction of a data model is data modelling
there exists different types of data models (i.e., data modelling)
Course Content
Introduction to Databases
Theoretical Background
From File-Based Storage to Database Management Systems
Database Schema and Data Model
Database Architecture
Types of Data Model and Data Modelling
there were proposed several categories of data models:
1
object-based data models
(e.g., entity-relationship, object, functional, and semantic data models) 2
record-based data models
(e.g., relational, network, and hierarchical data models) 3
physical data models
(e.g., unifying model, frame model)
we will focus on
entity-relationship
and
relational
data models
(these are the most popular models in current databases and DBMS)
moreover, we will focus on
conceptual modelling
Course Content
Introduction to Databases
Theoretical Background
From File-Based Storage to Database Management Systems
Database Schema and Data Model
Database Architecture
A Sample Database Schema in Relational Data Model
Course Content
Introduction to Databases
Theoretical Background
From File-Based Storage to Database Management Systems
Database Schema and Data Model
Database Architecture
Classification of DBMS
DBMSs can be classified based on several criteria
based on data model
network, hierarchical
(traditional models not commonly used due to their complexity)
rational, object-relational
(well-known DBMSs like Oracle, MS SQL Server, DB2, PostgreSQL, MySQL/MariaDB, etc.)
object-oriented, XML, document-oriented, key-value, etc.
(suitable for particular applications)
based on user numbers
(single-user, multi-user)
based on database distribution
Course Content
Introduction to Databases
Theoretical Background
From File-Based Storage to Database Management Systems
Database Schema and Data Model
Database Architecture
Eight Functions of Full-scale DBMS (by Codd, 1982)
1
data storage, retrieval, and update (in different external views)
2a user-accessible system catalogue (meta-data)
3
transaction support (atomicity, consistency, integrity, durability)
4concurrency control services (serialization of concurrent requests)
5recovery services (data consistency even on failure)
6
authorization services (security)
7
support for data communication (requests as comm. messages)
8integrity services (with respect to defined constraints)
Course Content
Introduction to Databases
Theoretical Background
From File-Based Storage to Database Management Systems Database Schema and Data Model
Database Architecture
System Architecture with DBMS
DBMS implemented as a database server in multi-tier architecture.
Two-tier client-server architecture:
1
client
(user interface and partially also business and data processing logic) 2
database server
(data validation and access, partially business and data processing logic)
Three-tier client-server architecture:
1
client
(user interface, no business and data processing logic, i.e. it is a “thin client”) 2
application server
(business and data processing logic) 3
database server
(data validation and access)
The app. server may implement a TP Monitor to access DBMS.
(Transaction Processing Monitor provides transaction routing, distribution, load balancing, funnelling, and increased reliability by employing multiple DBMSs)
Course Content
Introduction to Databases
Theoretical Background
From File-Based Storage to Database Management Systems Database Schema and Data Model
Database Architecture
Transaction Processing Monitor
Definition (Transaction Processing Monitor, TP Monitor)
A program that controls data transfer between clients and servers to provide a
consistent environment, particularly for on-line transaction processing (OLTP).
transaction routing
(directs transactions/requests to a particular DBMS, to increase scalability, perform load-balancing, or increase reliability a primary DBMS server is inaccessible)
managing distributed transactions
(performs operations across several heterogeneous DBMSs in single transactions)
funnelling
(establishes a small number of network connections to DBMSs which are utilized/reused for incoming requests of a large number of clients as needed)
Course Content Introduction to Databases
Theoretical Background
Sets and Relations
Functions
Sets and Set Operations
Definition (Set)
A finite or infinite collection of objects in which order has no
significance and multiplicity is (generally) also ignored.
a
2
A
is used to denote that
a
is an element of a set
A
.
A
is a subset of
B
,
A
B
, iff all members of
A
are also members of
B
.
(e.g.,f2;3g f1;2;3;4;5g; an empty set, denoted;, is a subset of all sets)
fx
jp(x
)g
is a set of elements
x
s.t. a property
p
is satisfied by
x
.
(e.g.,fn2Njn<=3gorfn2Zjnis eveng)
A
[
B
= f
x
j
x
2
A
or
x
2
B
g
is an union of sets
A
and
B
.
(e.g.,f1;2;3g [ f3;4;5g = f1;2;3;4;5g)
A
\
B
= f
x
j
x
2
A
and
x
2
B
g
is an intersection of sets
A
and
B
.
(e.g.,f1;2;3g \ f3;4;5g = f3g)
B
n
A
= fxjx
2
B
and
x
=2
Ag
is a relative complement of
A
in
B
.
Course Content Introduction to Databases
Theoretical Background
Sets and Relations
Functions
Cartesian Product and Relations
Definition (Cartesian Product)
The Cartesian product of two sets
A
and
B
, denoted
A
B
(also called
the cross product), is defined to be the set of all points
(
a
;
b
)
where
a
2
A
and
b
2
B
; i.e.,
A
B
= f(
a
;
b
)j
a
2
A
and
b
2
B
g
.
Definition (Relation)
A relation is any subset of a Cartesian product.
For example, let
A
= f
1
;
2
;
3
g;
B
= f
4
;
5
g
C
=
A
B
= f(
1
;
4
); (
1
;
5
); (
2
;
4
); (
2
;
5
); (
3
;
4
); (
3
;
5
)g
D
= f(
a
;
b
)j(
a
;
b
) 2
C
and
a
=
2
g
Then, C is a Cartesian product of
A
and
B
,
Course Content Introduction to Databases
Theoretical Background
Sets and Relations
Functions
Functions
Definition (Function)
A function is a relation that
uniquely
associates members of one set
with members of another set. Formally, function
f
from
A
to
B
, denoted
f
:
A
!
B
,
uniquely
associates every
a
2
A
with object
f
(a) 2
B
.
For example,
f
:
N
!
N;
f
(x
) =
x
2;
f
is a function.
(it is a relationf= f(1;1); (2;4); (3;9); (4;16); : : :g)
g
: f
1
;
2
g ! f
3
g;
g
(
1
) =
3 and
g
(
2
) =
3;
g
is a function.
(it is a relationg= f(1;3); (2;3)g)
h
: f
3
g ! f
1
;
2
g;
h
(
3
) =
1 and
h
(
3
) =
2;
h
is
not
a function.
Summary
Summary
DBMSs are trying to solve issue of file-based storage.
(issues: separation, isolation, duplication, program-data dependence, formats, etc.)
Database schema can be viewed from multiple levels.
(external, conceptual/logical, internal/physical; corresponds to DB schema models)
The most common approaches to data model/modelling are
entity-relationship (ER) model
(based on entities, their attributes and relationships)
relational model
(based on mathematical relations)
conceptual model
(based on the conceptual level of ANSI-SPARC architecture)