• No results found

Introduction to Database Systems. Chapter 1 Introduction. Chapter 1 Introduction

N/A
N/A
Protected

Academic year: 2021

Share "Introduction to Database Systems. Chapter 1 Introduction. Chapter 1 Introduction"

Copied!
19
0
0

Loading.... (view fulltext now)

Full text

(1)

Introduction to Database Systems

Winter term 2013/2014

Melanie Herschel [email protected] Université Paris Sud, LRI

1

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

Chapter 1

Introduction

• After completing this chapter, you should be able to:

‣ Explain basic notions: database state, schema, query, update, data model, DDL, DML,

‣ Explain the role of the DBMS,

‣ Explain data independence,

declarativity, and the three-schema architecture,

‣ Name different classes of users of a database application system,

‣ Name some DBMS tools.

2

1. Introduction 2. ER-Modeling 3. Relational model(ing) 4. Relational algebra 5. SQL

Chapter 1 Introduction

• Basic Database Notions

• Database Management Systems (DBMS)

• Programmer’s View, Data Independence

• Database Users and Database Tools

3

(2)

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

Task of a Database

Query Answering

• What is a database? Difficult question. There is no precise and generally accepted definition.

• Naïve approach: The main task of a database system (DBS) is to answer certain questions (queries) about a subset of the real world, e.g.,

Questioning a DBS

Which homeworks has

Ann Smith completed? 1

Database System

2

4

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

Task of a Database

Data Storage

• The DBS acts only as a storage for information. The information must first be entered and kept current.

• In a sense, a DBS is a computerized version of a card-index box / filing cabinet (but more powerful and efficient).

Keeping a DBS current

Ann Smith has done Homework 3 and

received 10 points for it.

Database System

OK

5

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

Task of a Database

Efficient Data Retrieval and Combination

• Normal database systems do not perform particularly complicated computations on the stored data in order to answer questions.

• However, a DBS can retrieve the requested data quickly from a huge set of data (giga bytes, tera bytes, ≫ main memory).

• A DBS can also aggregate / combine several pieces of stored data to answer more complex questions (“Compute the average points for Homework 3”).

6

(3)

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

Task of a Database

SQL

• Above, the question “Which homework has Ann Smith completed?” was shown in natural (English) language.

• Making machines understand natural language is a tough task (and bears a large potential for misunderstanding).

• Therefore, questions (or queries) are normally written in a formal language, these days typically SQL.

SQL

•SQL: Structured Query Language, developement started in 1986, current version SQL:2003.

•Pronounced S-Q-L or Sequel.

7

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

State, Query, Update

• The set of stored data is called the database state.

• Entering, modifying, or deleting information (i.e., making updates) changes the database state.

State & query

Update

current state

SQL query

Answer SELECT Homework FROM Solved

WHERE Student = ‘Ann Smith’

current state

Update INSERT INTO Solved

VALUES (‘Ann Smith’, 3, 10)

new state

8

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

Structured Information

• Each database can store only information of a predeclared structure (a limited domain of disclosure):

• Because the data are structured, not simply text, complex query formulations are possible, e.g., “How many homeworks has each student done?”.

Update

Today’s special in the

cafeteria is pizza.

Homework DBS

Error

9

(4)

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

Structured Information

• Actually, a database system stores only plain data (character, strings, numbers), and not information.

• Data becomes information by interpretation.

• Therefore, real-world concepts like students, homework,

cafeterias, etc., need to be defined/declared before the database can be used.

A pure text database?

Which types of questions could we pose on a DBS storing text (character strings) only with no further structure provided?

10

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

State vs. Schema

Database schema

• Formal definition of the structure of the database contents.

• Determines the possible database states.

Defined only once (when DB is created).

• In a programming language, this corresponds to variable declarations (assigning a type to a variable)

Variable declaration

Example of a variable declaration in C: short int i;

Possible states of variable i: -32768 ≤ i ≤ 32767

11

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

State vs. Schema

Database state (instance of the schema)

• Contains the actual data, structured according to the schema.

Changes often (whenever database information is updated).

• Corresponds to current contents/values of a programming language variable.

Variable state change

In state s, variable i has value 41. Now, perform state change (s to s’) via assignment i = i + 1.

12

(5)

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

State vs. Schema

•In the relational model, the data is structured in form of tables (relations).

•Each table has a name, a sequence of named columns (attributes), and a set of rows (tuples).

A table

Student Homework Points

Ann Smith 1 10

Ann Smith 2 8

Michael Jones 1 9

Michael Jones 2 9

SOLVED

DB schema

DB state

13

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

Data Model

•Defines a formal language (syntax and semantics) for

‣ Declaring database schema

‣ Querying the current database state

‣ Changing the database state

‣ “Data model” is, regrettably, widely used for “database schema”

•Examples

• (Network model, hierarchical model,) relational model, entity relationship model, object-oriented models, UML, XML.

14

Chapter 1 Introduction

• Basic Database Notions

• Database Management Systems (DBMS)

• Programmer’s View, Data Independence

• Database Users and Database Tools

15

(6)

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

DBMS

•A Database Management System (DBMS) is an application–

independent software system that implements a data model, i.e., allows for

Definition of a DB schema for some concrete application,

• Storage of an instance of this schema on, e.g., a disk,

Querying the current instance (database state),

Changing the database state.

16

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

DBMS

• Normal users do not need to use SQL for their daily tasks of data entry or data lookup.

• These users use application programs that have been developed specifically for this task and offer a more accessible user interface.

• Internally, these application programs translate the user requests into SQL statements (queries, updates) in order to communicate with the DBMS.

17

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

DBMS

• Often, several different application programs are used to access the same centralized database.

• For example, the Homework DBS might provide:

‣ A read–only web interface for students.

‣ A program used by the teaching assistant to load homework and exam points.

‣ A program that prints a report for the professor used to assign grades.

• The interactive SQL interface (SQL console) that comes with the DBMS is simply yet another way to access the DBMS.

18

(7)

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

DBMS

Accessing a DBMS

User 1 User 2

Application

Program DBMS Tool

(e.g., SQL console)

Database Management System (DBMS)

DB

schema DB

state

19

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

DB Application Systems

• Often, different users access the same database concurrently (i.e., at the same time, touching the same data).

• The DBMS is usually implemented as a background server process (or set of such processes) that is accessed over the network by application programs (clients).

• One can also view the DBMS as an extension of the operating system (a more powerful file system).

20

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

DB Application Systems

Client-Server Architecture

Client User 1 (Application)

Client User 2 (SQL console) Network ...

Server (DBMS)

21

(8)

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

DB Application Systems

Three-Tier Architecture

Server (DBMS)

Application Server (App. Web Server)

Thin client (Browser)

22

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

DB Application Systems

A Recap of Database Vocabulary

• A database (DB) consists of a DB schema and a DB state.

• A database management system (DBMS) is a software system that implements a data model (e.g., a relational DBMS (RDBMS) implements the relational model).

• A database system (DBS) consists of a DBMS and a database.

• A database application system consists of a DBS and a set of application programs.

23

Chapter 1 Introduction

• Basic Database Notions

• Database Management Systems (DBMS)

• Programmer’s View, Data Independence

• Database Users and Database Tools

24

(9)

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

Persistent Storage

When the output is a function of the input only, no persistent storage is necessary.

Computing the factorial (n n!)

Today

Tomorrow

5

factorial

120

5

factorial

120

25

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

Persistent Storage

But the output can also be a function of the input and a persistent state.

Determining how many points Ann obtained for all her homework.

Today

Tomorrow

Ann Homework points 20

30 Homework points

Ann

26

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

Persistent Storage

A DBS provides persistent state

Ann Homework points 30

Input Output

Persistent state

Persistent information

•Information that lives longer than a single process (program execution).

•Survives power outage and a reboot of the operating system.

27

(10)

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

Persistent Storage

Which of the following processes/devices need persistent storage? If so, for which particular task?

1.Web browser

2.Pocket calculator

3.Mobile phone

4.Screen saver

5.DVD recorder

28

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

Typed Persistent Data

Classical Way to Implement Persistence

•Information needed in subsequent program invocations is saved into a file.

•The operating system (OS) maintains the file on disk.

•Disks provide persistent memory: the content is not lost if the machine is switched off or the OS is rebooted.

•File systems are predecessors of modern DBMS.

29

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

Typed Persistent Data

Implementing Persistence with Files

• OS files are usually nothing but sequences of bytes.

• A record structure must be defined on top of this (much like in Assembler languages):

• The record and file structure is contained only in the programmers’

heads.

• The OS file system cannot prevent misinterpretation, overflows, etc., because it is not aware of the file structure.

Typed Persistent Data (1)

37

• Classical way to implement persistence:

Information needed in subsequent program invocations is saved into a file.

The operating system (OS) maintains the file on disk.

Disks provide persistent memory: the contents is not lost if the machine is switched off or the OS is rebooted.

OS files and persistence

The above statement is basically true but care should be taken nevertheless. Why?

File systems are predecessors of modern DBMS.

Typed Persistent Data (2)

38

• Implementing persistence with files:

OS files are usually nothing but sequences of bytes.

A record structure must be defined on top of this (much like in Assembler languages):

0 40 42 44

A n n S m i t h . . . 0 3 1 0

The record and file structure is contained only in the

programmers’ heads. 

The OS file system cannot prevent misinterpretation, overflows, etc., because it is not aware of the file structure

30

(11)

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

Typed Persistent Data

Implementing Persistence with a DBMS

• The structure of the information to be stored must be defined in a way the DBMS understands:

• The file structure is formally documented.

• The system can detect type errors in application programs.

• Simplified programming (higher abstraction level).

SQL DDL command

CREATE TABLE SOLVED ( STUDENT VARCHAR(40), HOMEWORK NUMERIC(2), POINTS! NUMERIC(2) )

31

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

A Subprogram Library

• Most DBMSs use OS files to store the data. (Some use raw disk device access.)

• One can view a DBMS as a subprogram library that can be used for file access.

• Compared with the direct OS system calls for file access, the DBMS offers higher level operations.

• The DBMS offers a wide varietry of algorithms that one would otherwise have to program.

32

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

A Subprogram Library

•For instance, a typical Relational DBMS contains routines for

‣ sorting (e.g., external merge sort),

‣ searching (e.g., B-trees),

‣ file space management, buffer management,

‣ aggregation, statistical evaluation.

•The algorithms are optimized for large data sets (that do not fit into main memory).

•The DBMS also offers multi-user support (locking) and safety measures to protect data against system crashes.

33

(12)

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

Physical Data Independence

•The DBMS is a layer of software above the OS files. The files can be accessed only via the DBMS.

•The DBMS may change the file structure internally (reorder records, splits files, etc.) for performance reasons.

•This goes unnoticed by the application program.

•Compare with the idea of abstract data types: The implementation changes, the interface is kept stable.

34

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

Physical Data Independence

Typical Example

•At the beginning, a professor used the homeworks DB only for his courses in the current term.

•Since the DB was small and there were relatively few accesses, it was sufficient to store the data as a heap file.

•Later, the entire university used the DB, and information of previous courses had to be kept for some time.

•DB size grows significantly, DB access much more frequently.

•An index file (e.g., a B-tree) is now needed to provide fast access.

35

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

Physical Data Independence

Without DBMS

•Using the new B-tree index to access the file must be explicitly built into the lookup (query) commands.

•Thus, application programs need to be changed if the mode of file access is changed.

•If one forgets to change a seldolmly used application program, and this program does not update the index when the data has been updated, the DB becomes inconsistent.

36

(13)

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

Physical Data Independence

With Relational DBMS

•Already at the interface, the system completely hides the(non-)existence of indexes on files.

•Queries and updates do not have to and cannot refer to indexes.

•The system automatically

1. Modifies the index in case of data updates,

2. Uses the index to evaluate queries against the indexed data when advantageous.

37

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

Physical Data Independence

Conceptual Schema (“interface”):

Only logical information content of the database, relevant to the subset of the real world modelled in the DB.

Simplified view of the DB: physical storage details hidden.

Internal/Physical Schema (“implementation”):

Indexes,

Division of tables among disks,

Storage management if tables grow or shrink,

Placement of new rows in a table (sort order, clustering).

38

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

Physical Data Independence

The user enters a query (e.g., in SQL) that only refers to the conceptual schema.

The DBMS translates this into a query/program (execution plan) which refers to the the internal schema. This is done by the query optimizer.

The DBMS executes the translated query on the persistent instance of the internal schema.

The DBMS translates the result back to the conceptual level.

39

(14)

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

Physical Data Independence

Changing the internal schema

Conceptual schema

Old internal schema (no B-tree index)

New internal schema (with B-tree index) Same conceptual schema

40

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

Declarative Languages

Physical data independence requires that the query language (SQL) cannot refer to indexes.

Declarative query languages go one step further:

Queries should only describe what information is sought,

but should not prescribe any particular method how to compute/

retrieve the desired information.

Kowalski

Algorithm = Logic + Control

Imperative/procedural languages: explicit control, implicit logic Declarative/descriptive languages: implicit control, explicit logic

41

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

Declarative Languages

SQL is a declarative language. The user describes conditions the requested data is required to fulfill:

Often, simpler formulations of the same query are possible, but SQL users do not have to think about efficient execution.

More concise than imperative programming: less expensive program development and maintenance.

SQL Query

SELECT S.POINTS FROM SOLVED S

WHERE S.STUDENT = ‘ANN SMITH’

AND S.HOMEWORK = 3

42

(15)

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

Declarative Languages

Declarative query languages

allow powerful optimizers (no evaluation method is prescribed)

need powerful optimizers (naïve evaluation is almost always too inefficient).

Independence of current hardware technology and software quality: Today’s queries will use tomorrow’s DBMS setup and algorithms when a new version of the DBMS is released.

43

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

Logical Data Independence

Logical data independence allows for changes to the logical information content of the database.

Such changes are obviously restricted to additions to the logical information content.

Example: add column SUBMISSION_DATE to table SOLVED.

Such additions may be required for new applications.

It should not be necessary to change old applications only because records now contain additional information.

Logical data independence is important when there are application programs with distinct, but overlapping information needs.

44

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

Logical Data Independence

External Schemas/Views

Logical data independence requires a third level of database schemas, the external schemas or views.

Each user (department, . . . ) may have an individual view of the data.

An external view contains a subset of the information in the database, maybe slightly restructured.

Views may also be vital because of security reasons.

In contrast, the conceptual schema describes the complete information content of the database.

45

(16)

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

Three-Schema Architecture

Three-schema architecture [ANSI/Sparc 1978]

User 1 User n

External schema 1 External schema n

Conceptual schema

Internal schema ...

Stored data

46

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

More DBMS Functions

Transactions

Transactions are sequences of DB commands (queries and updates) that are executed as an atomic unit (“all or nothing”).

DBMS may crash during/after a sequence of commands is/has been executed. The DBMS then performs undo /redo.

Support for backup and recovery.

Support of concurrent users.

Each user is given the illusion to be the only DB user at any time.

DBMS performs locking and conflict detection.

47

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

More DBMS Functions

Security & Integrity

Security

Access rights: Who may perform which operations on which table?

Auditing: DBMS remembers who did what/when.

Integrity

The DBMS checks that the entered data is plausible/complete (such checks may also span several tables).

DBMS rejects updates (insertions and deletions) which would violate defined business rules.

48

(17)

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

More DBMS Functions

Data Dictionary

Metadata (“data about data”, schema, user list, access rights) is availble in system tables, for instance:

System tables

Table_Name Owner SOLVED Melanie SYS_TABLES SYS SYS_COLUMNS SYS

SYS_TABLES Table_NameSOLVED Seq1 Col_NameStudent

SOLVED 2 Homework

SOLVED 3 Points

SYS_TABLES 1 Table_Name SYS_TABLES 2 Owner SYS_COLUMNS 1 Table_Name SYS_COLUMNS 2 Seq SYS_COLUMNS 3 Col_Name

SYS_COLUMNS

49

Chapter 1 Introduction

• Basic Database Notions

• Database Management Systems (DBMS)

• Programmer’s View, Data Independence

• Database Users and Database Tools

50

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

Database Users

Database Administrator (DBA)

Should know about all schemas, may change the conceptual and the internal schema (creates tables, creates/drops indexes). Can damage everything.

Gives access rights to users. Ensures security.

Monitors system performance (Transaction throughput #TX/s,

#concurrent users, index sizes, ...)

Monitors available disk space and installs new disks.

Ensures that backup copies of the data are made. Does recovery after disk failures, etc.

51

(18)

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

Database Users

Application Programmer

Writes programs for standard, all-day tasks, to be used by the naïve users (see below):

safe data entry,

report generation,

data browsing.

Knows SQL well, plus programming languages and development tools.

Usually supervised by DBA.

Might do conceptual schema design (knows which table the application will need to access/create).

52

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

Database Users

End Users

Sophisticated User (one kind of “end user”)

Knows SQL and/or some query tools, may use SQL console.

Does non-standard aggregations/evaluations of the data without help from application programmers.

May generate complex queries.

Naïve User (the other kind of “end user”)

Uses DB only via application programs, often unaware of existence of DBMS back-end.

Primarily data entry user, simple browsing-style queries against external views.

53

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

Database Tools

Interactive SQL console

Graphical/menu-based query tools

Interface for DB access from standard programing lanugages (C, C++, Java)

Tools for form-based DB application (4GL)

Report generators

Web interface

Tools for data import/export, backup & recovery, performance monitoring, . . .

54

(19)

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

Summary

Functions of database systems

Persistence

Integration/redundancy avoidance

Physical and logical data independence

Subprogram library: many algorithms built-in, especially tuned for external memory access (disks)

Query and update evaluation

55

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

Summary

Functions of database systems (continued)

High data safety and availability (Backup & Recovery)

Combinations of operations into atomic transactions

Multi-user support: synchronization of concurrent accesses

Integrity enforcement

View management

Security via data access control

System catalog management (metadata)

56

Introduction to Database Systems | winter term 2013/14 | Melanie Herschel | Université Paris Sud

Summary

The main goal of the DBMS is to give the user a simplified view on the persistent storage, i.e., to hide any complications introduced by the DBMS physical layer.

The user does not worry about

physical storage details

different information needs of other users

efficient query formulation

possibility of system crashes/disk failures

presence of concurrent users accessing identical data subsets.

57

References

Related documents

Copy the following coding line and paste it where you want your document to be: * {{File | Filename.pdf | Text to be displayed}}. Then

When applied to measures with overlap in item content across correlated constructs, in the absence of a global factor and cross-loadings, CFA is forced to account for the shared

To self-strengthen, the Qing used the social capital of Han leaders to persuade Chosŏn into signing more international treaties under Qing supervision, which would help China

163 These seven stages for Wang’s conversion process are like this: context (Chinese society, Confucianism [mainly], Taoism and Buddhism); crisis (social crisis of Ming Dynasty

parent or legal guardian of such a participant in the AWFE Program, in consideration of my or another’s participation in the AWFE Program, I hereby, for myself and any participant

In South Africa, participants were recruited from the chronic care units in four primary health care clinics. Recruitment into the depression treatment cohort followed the same

UJCLOCCOUOUJJC q2dflpLJfl1J QO/IJJUJGIJ2 !'JPH!A OL flUM1!J!UIJC2 tO L!2C 2I1jjC!CUt }CI UJJTJCC2 JJAC qtoUJJ) pccu t tpG GW4 o, [9Iu uJc4c,2. Iv 'c1t

To allow loads to be lowered using strand jacks, generally some type of strand guide is placed above the jack. This guide is generally a curved steel frame designed to support most