Introduction to Database Systems
Winter term 2013/2014
Melanie Herschel [email protected] Université Paris Sud, LRI
1
Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud
Chapter 1
Introduction
• After completing this chapter, you should be able to:
‣ Explain basic notions: database state, schema, query, update, data model, DDL, DML,
‣ Explain the role of the DBMS,
‣ Explain data independence,
declarativity, and the three-schema architecture,
‣ Name different classes of users of a database application system,
‣ Name some DBMS tools.
2
1. Introduction 2. ER-Modeling 3. Relational model(ing) 4. Relational algebra 5. SQL
Chapter 1 Introduction
• Basic Database Notions
• Database Management Systems (DBMS)
• Programmer’s View, Data Independence
• Database Users and Database Tools
3
Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud
Task of a Database
Query Answering
• What is a database? Difficult question. There is no precise and generally accepted definition.
• Naïve approach: The main task of a database system (DBS) is to answer certain questions (queries) about a subset of the real world, e.g.,
Questioning a DBS
Which homeworks has
Ann Smith completed? 1
Database System
24
Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud
Task of a Database
Data Storage
• The DBS acts only as a storage for information. The information must first be entered and kept current.
• In a sense, a DBS is a computerized version of a card-index box / filing cabinet (but more powerful and efficient).
Keeping a DBS current
Ann Smith has done Homework 3 and
received 10 points for it.
Database System
OK5
Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud
Task of a Database
Efficient Data Retrieval and Combination
• Normal database systems do not perform particularly complicated computations on the stored data in order to answer questions.
• However, a DBS can retrieve the requested data quickly from a huge set of data (giga bytes, tera bytes, ≫ main memory).
• A DBS can also aggregate / combine several pieces of stored data to answer more complex questions (“Compute the average points for Homework 3”).
6
Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud
Task of a Database
SQL
• Above, the question “Which homework has Ann Smith completed?” was shown in natural (English) language.
• Making machines understand natural language is a tough task (and bears a large potential for misunderstanding).
• Therefore, questions (or queries) are normally written in a formal language, these days typically SQL.
SQL
•SQL: Structured Query Language, developement started in 1986, current version SQL:2003.
•Pronounced S-Q-L or Sequel.
7
Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud
State, Query, Update
• The set of stored data is called the database state.
• Entering, modifying, or deleting information (i.e., making updates) changes the database state.
State & query
Update
current state
SQL query
Answer SELECT Homework FROM Solved
WHERE Student = ‘Ann Smith’
current state
Update INSERT INTO Solved
VALUES (‘Ann Smith’, 3, 10)
new state
8
Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud
Structured Information
• Each database can store only information of a predeclared structure (a limited domain of disclosure):
• Because the data are structured, not simply text, complex query formulations are possible, e.g., “How many homeworks has each student done?”.
Update
Today’s special in the
cafeteria is pizza.
Homework DBS
Error9
Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud
Structured Information
• Actually, a database system stores only plain data (character, strings, numbers), and not information.
• Data becomes information by interpretation.
• Therefore, real-world concepts like students, homework,
cafeterias, etc., need to be defined/declared before the database can be used.
A pure text database?
Which types of questions could we pose on a DBS storing text (character strings) only with no further structure provided?
10
Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud
State vs. Schema
Database schema
• Formal definition of the structure of the database contents.
• Determines the possible database states.
•Defined only once (when DB is created).
• In a programming language, this corresponds to variable declarations (assigning a type to a variable)
Variable declaration
Example of a variable declaration in C: short int i;
Possible states of variable i: -32768 ≤ i ≤ 32767
11
Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud
State vs. Schema
Database state (instance of the schema)
• Contains the actual data, structured according to the schema.
•Changes often (whenever database information is updated).
• Corresponds to current contents/values of a programming language variable.
Variable state change
In state s, variable i has value 41. Now, perform state change (s to s’) via assignment i = i + 1.
12
Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud
State vs. Schema
•In the relational model, the data is structured in form of tables (relations).
•Each table has a name, a sequence of named columns (attributes), and a set of rows (tuples).
A table
Student Homework Points
Ann Smith 1 10
Ann Smith 2 8
Michael Jones 1 9
Michael Jones 2 9
SOLVED
DB schema
DB state
13
Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud
Data Model
•Defines a formal language (syntax and semantics) for
‣ Declaring database schema
‣ Querying the current database state
‣ Changing the database state
‣ “Data model” is, regrettably, widely used for “database schema”
•Examples
• (Network model, hierarchical model,) relational model, entity relationship model, object-oriented models, UML, XML.
14
Chapter 1 Introduction
• Basic Database Notions
• Database Management Systems (DBMS)
• Programmer’s View, Data Independence
• Database Users and Database Tools
15
Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud
DBMS
•A Database Management System (DBMS) is an application–
independent software system that implements a data model, i.e., allows for
•Definition of a DB schema for some concrete application,
• Storage of an instance of this schema on, e.g., a disk,
•Querying the current instance (database state),
•Changing the database state.
16
Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud
DBMS
• Normal users do not need to use SQL for their daily tasks of data entry or data lookup.
• These users use application programs that have been developed specifically for this task and offer a more accessible user interface.
• Internally, these application programs translate the user requests into SQL statements (queries, updates) in order to communicate with the DBMS.
17
Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud
DBMS
• Often, several different application programs are used to access the same centralized database.
• For example, the Homework DBS might provide:
‣ A read–only web interface for students.
‣ A program used by the teaching assistant to load homework and exam points.
‣ A program that prints a report for the professor used to assign grades.
• The interactive SQL interface (SQL console) that comes with the DBMS is simply yet another way to access the DBMS.
18
Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud
DBMS
Accessing a DBMS
User 1 User 2
Application
Program DBMS Tool
(e.g., SQL console)
Database Management System (DBMS)
DB
schema DB
state
19
Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud
DB Application Systems
• Often, different users access the same database concurrently (i.e., at the same time, touching the same data).
• The DBMS is usually implemented as a background server process (or set of such processes) that is accessed over the network by application programs (clients).
• One can also view the DBMS as an extension of the operating system (a more powerful file system).
20
Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud
DB Application Systems
Client-Server Architecture
Client User 1 (Application)
Client User 2 (SQL console) Network ...
Server (DBMS)
21
Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud
DB Application Systems
Three-Tier Architecture
Server (DBMS)
Application Server (App. Web Server)
Thin client (Browser)
22
Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud
DB Application Systems
A Recap of Database Vocabulary
• A database (DB) consists of a DB schema and a DB state.
• A database management system (DBMS) is a software system that implements a data model (e.g., a relational DBMS (RDBMS) implements the relational model).
• A database system (DBS) consists of a DBMS and a database.
• A database application system consists of a DBS and a set of application programs.
23
Chapter 1 Introduction
• Basic Database Notions
• Database Management Systems (DBMS)
• Programmer’s View, Data Independence
• Database Users and Database Tools
24
Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud
Persistent Storage
When the output is a function of the input only, no persistent storage is necessary.
Computing the factorial (n n!)
Today
Tomorrow
5
factorial
1205
factorial
12025
Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud
Persistent Storage
But the output can also be a function of the input and a persistent state.
Determining how many points Ann obtained for all her homework.
Today
Tomorrow
Ann Homework points 20
30 Homework points
Ann
26
Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud
Persistent Storage
A DBS provides persistent state
Ann Homework points 30
Input Output
Persistent state
Persistent information
•Information that lives longer than a single process (program execution).
•Survives power outage and a reboot of the operating system.
27
Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud
Persistent Storage
Which of the following processes/devices need persistent storage? If so, for which particular task?
1.Web browser
2.Pocket calculator
3.Mobile phone
4.Screen saver
5.DVD recorder
28
Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud
Typed Persistent Data
Classical Way to Implement Persistence
•Information needed in subsequent program invocations is saved into a file.
•The operating system (OS) maintains the file on disk.
•Disks provide persistent memory: the content is not lost if the machine is switched off or the OS is rebooted.
•File systems are predecessors of modern DBMS.
29
Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud
Typed Persistent Data
Implementing Persistence with Files
• OS files are usually nothing but sequences of bytes.
• A record structure must be defined on top of this (much like in Assembler languages):
• The record and file structure is contained only in the programmers’
heads.
• The OS file system cannot prevent misinterpretation, overflows, etc., because it is not aware of the file structure.
Typed Persistent Data (1)
37
• Classical way to implement persistence:
Information needed in subsequent program invocations is saved into a file.
The operating system (OS) maintains the file on disk.
Disks provide persistent memory: the contents is not lost if the machine is switched off or the OS is rebooted.
OS files and persistence
The above statement is basically true but care should be taken nevertheless. Why?
File systems are predecessors of modern DBMS.
Typed Persistent Data (2)
38
• Implementing persistence with files:
OS files are usually nothing but sequences of bytes.
A record structure must be defined on top of this (much like in Assembler languages):
0 40 42 44
A n n S m i t h . . . 0 3 1 0
The record and file structure is contained only in the
programmers’ heads.
The OS file system cannot prevent misinterpretation, overflows, etc., because it is not aware of the file structure
30
Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud
Typed Persistent Data
Implementing Persistence with a DBMS
• The structure of the information to be stored must be defined in a way the DBMS understands:
• The file structure is formally documented.
• The system can detect type errors in application programs.
• Simplified programming (higher abstraction level).
SQL DDL command
CREATE TABLE SOLVED ( STUDENT VARCHAR(40), HOMEWORK NUMERIC(2), POINTS! NUMERIC(2) )
31
Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud
A Subprogram Library
• Most DBMSs use OS files to store the data. (Some use raw disk device access.)
• One can view a DBMS as a subprogram library that can be used for file access.
• Compared with the direct OS system calls for file access, the DBMS offers higher level operations.
• The DBMS offers a wide varietry of algorithms that one would otherwise have to program.
32
Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud
A Subprogram Library
•For instance, a typical Relational DBMS contains routines for
‣ sorting (e.g., external merge sort),
‣ searching (e.g., B-trees),
‣ file space management, buffer management,
‣ aggregation, statistical evaluation.
•The algorithms are optimized for large data sets (that do not fit into main memory).
•The DBMS also offers multi-user support (locking) and safety measures to protect data against system crashes.
33
Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud
Physical Data Independence
•The DBMS is a layer of software above the OS files. The files can be accessed only via the DBMS.
•The DBMS may change the file structure internally (reorder records, splits files, etc.) for performance reasons.
•This goes unnoticed by the application program.
•Compare with the idea of abstract data types: The implementation changes, the interface is kept stable.
34
Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud
Physical Data Independence
Typical Example
•At the beginning, a professor used the homeworks DB only for his courses in the current term.
•Since the DB was small and there were relatively few accesses, it was sufficient to store the data as a heap file.
•Later, the entire university used the DB, and information of previous courses had to be kept for some time.
•DB size grows significantly, DB access much more frequently.
•An index file (e.g., a B-tree) is now needed to provide fast access.
35
Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud
Physical Data Independence
Without DBMS
•Using the new B-tree index to access the file must be explicitly built into the lookup (query) commands.
•Thus, application programs need to be changed if the mode of file access is changed.
•If one forgets to change a seldolmly used application program, and this program does not update the index when the data has been updated, the DB becomes inconsistent.
36
Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud
Physical Data Independence
With Relational DBMS
•Already at the interface, the system completely hides the(non-)existence of indexes on files.
•Queries and updates do not have to and cannot refer to indexes.
•The system automatically
1. Modifies the index in case of data updates,
2. Uses the index to evaluate queries against the indexed data when advantageous.
37
Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud
Physical Data Independence
Conceptual Schema (“interface”):
•
Only logical information content of the database, relevant to the subset of the real world modelled in the DB.•
Simplified view of the DB: physical storage details hidden.Internal/Physical Schema (“implementation”):
•
Indexes,•
Division of tables among disks,•
Storage management if tables grow or shrink,•
Placement of new rows in a table (sort order, clustering).38
Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud
Physical Data Independence
•
The user enters a query (e.g., in SQL) that only refers to the conceptual schema.•
The DBMS translates this into a query/program (execution plan) which refers to the the internal schema. This is done by the query optimizer.•
The DBMS executes the translated query on the persistent instance of the internal schema.•
The DBMS translates the result back to the conceptual level.39
Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud
Physical Data Independence
Changing the internal schema
Conceptual schema
Old internal schema (no B-tree index)
New internal schema (with B-tree index) Same conceptual schema
40
Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud
Declarative Languages
•
Physical data independence requires that the query language (SQL) cannot refer to indexes.•
Declarative query languages go one step further:‣
Queries should only describe what information is sought,‣
but should not prescribe any particular method how to compute/retrieve the desired information.
Kowalski
Algorithm = Logic + Control
Imperative/procedural languages: explicit control, implicit logic Declarative/descriptive languages: implicit control, explicit logic
41
Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud
Declarative Languages
•
SQL is a declarative language. The user describes conditions the requested data is required to fulfill:•
Often, simpler formulations of the same query are possible, but SQL users do not have to think about efficient execution.•
More concise than imperative programming: less expensive program development and maintenance.SQL Query
SELECT S.POINTS FROM SOLVED S
WHERE S.STUDENT = ‘ANN SMITH’
AND S.HOMEWORK = 3
42
Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud
Declarative Languages
•
Declarative query languages‣
allow powerful optimizers (no evaluation method is prescribed)‣
need powerful optimizers (naïve evaluation is almost always too inefficient).•
Independence of current hardware technology and software quality: Today’s queries will use tomorrow’s DBMS setup and algorithms when a new version of the DBMS is released.43
Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud
Logical Data Independence
•
Logical data independence allows for changes to the logical information content of the database.•
Such changes are obviously restricted to additions to the logical information content.•
Example: add column SUBMISSION_DATE to table SOLVED.•
Such additions may be required for new applications.•
It should not be necessary to change old applications only because records now contain additional information.•
Logical data independence is important when there are application programs with distinct, but overlapping information needs.44
Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud
Logical Data Independence
External Schemas/Views
•
Logical data independence requires a third level of database schemas, the external schemas or views.•
Each user (department, . . . ) may have an individual view of the data.•
An external view contains a subset of the information in the database, maybe slightly restructured.Views may also be vital because of security reasons.
•
In contrast, the conceptual schema describes the complete information content of the database.45
Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud
Three-Schema Architecture
Three-schema architecture [ANSI/Sparc 1978]
User 1 User n
External schema 1 External schema n
Conceptual schema
Internal schema ...
Stored data
46
Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud
More DBMS Functions
Transactions
•
Transactions are sequences of DB commands (queries and updates) that are executed as an atomic unit (“all or nothing”).•
DBMS may crash during/after a sequence of commands is/has been executed. The DBMS then performs undo /redo.•
Support for backup and recovery.•
Support of concurrent users.•
Each user is given the illusion to be the only DB user at any time.DBMS performs locking and conflict detection.
47
Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud
More DBMS Functions
Security & Integrity
Security
•
Access rights: Who may perform which operations on which table?•
Auditing: DBMS remembers who did what/when.Integrity
•
The DBMS checks that the entered data is plausible/complete (such checks may also span several tables).•
DBMS rejects updates (insertions and deletions) which would violate defined business rules.48
Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud
More DBMS Functions
Data Dictionary
Metadata (“data about data”, schema, user list, access rights) is availble in system tables, for instance:
System tables
Table_Name Owner SOLVED Melanie SYS_TABLES SYS SYS_COLUMNS SYS
SYS_TABLES Table_NameSOLVED Seq1 Col_NameStudent
SOLVED 2 Homework
SOLVED 3 Points
SYS_TABLES 1 Table_Name SYS_TABLES 2 Owner SYS_COLUMNS 1 Table_Name SYS_COLUMNS 2 Seq SYS_COLUMNS 3 Col_Name
SYS_COLUMNS
49
Chapter 1 Introduction
• Basic Database Notions
• Database Management Systems (DBMS)
• Programmer’s View, Data Independence
• Database Users and Database Tools
50
Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud
Database Users
Database Administrator (DBA)
•
Should know about all schemas, may change the conceptual and the internal schema (creates tables, creates/drops indexes). Can damage everything.•
Gives access rights to users. Ensures security.•
Monitors system performance (Transaction throughput #TX/s,#concurrent users, index sizes, ...)
•
Monitors available disk space and installs new disks.•
Ensures that backup copies of the data are made. Does recovery after disk failures, etc.51
Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud
Database Users
Application Programmer
•
Writes programs for standard, all-day tasks, to be used by the naïve users (see below):•
safe data entry,•
report generation,•
data browsing.•
Knows SQL well, plus programming languages and development tools.•
Usually supervised by DBA.•
Might do conceptual schema design (knows which table the application will need to access/create).52
Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud
Database Users
End Users
Sophisticated User (one kind of “end user”)
•
Knows SQL and/or some query tools, may use SQL console.•
Does non-standard aggregations/evaluations of the data without help from application programmers.•
May generate complex queries.Naïve User (the other kind of “end user”)
•
Uses DB only via application programs, often unaware of existence of DBMS back-end.•
Primarily data entry user, simple browsing-style queries against external views.53
Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud
Database Tools
•
Interactive SQL console•
Graphical/menu-based query tools•
Interface for DB access from standard programing lanugages (C, C++, Java)•
Tools for form-based DB application (4GL)•
Report generators•
Web interface•
Tools for data import/export, backup & recovery, performance monitoring, . . .54
Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud
Summary
•
Functions of database systems‣
Persistence‣
Integration/redundancy avoidance‣
Physical and logical data independence‣
Subprogram library: many algorithms built-in, especially tuned for external memory access (disks)‣
Query and update evaluation55
Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud
Summary
•
Functions of database systems (continued)‣
High data safety and availability (Backup & Recovery)‣
Combinations of operations into atomic transactions‣
Multi-user support: synchronization of concurrent accesses‣
Integrity enforcement‣
View management‣
Security via data access control‣
System catalog management (metadata)56
Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud
Summary
•
The main goal of the DBMS is to give the user a simplified view on the persistent storage, i.e., to hide any complications introduced by the DBMS physical layer.•
The user does not worry about‣
physical storage details‣
different information needs of other users‣
efficient query formulation‣
possibility of system crashes/disk failures‣
presence of concurrent users accessing identical data subsets.57