• No results found

data Databases

N/A
N/A
Protected

Academic year: 2022

Share "data Databases"

Copied!
25
0
0

Loading.... (view fulltext now)

Full text

(1)

Databases

ch1. Introduction to Databases

2018. Fall Instructor: Joonho Kwon [email protected] Data Science Lab @ PNU

data

data science laboratory

These slides are based on the slides prepared by Peter Bailis.

(2)

Turing Awards in Data Management

Charles Bachman, 1973 IDS and CODASYL

Ted Codd, 1981 Relational model

You could be next!!

Jim Gray, 1998

Transaction processing

(3)

Lectures: A note about format of notes

3

These are asides / notes (still need to know these in general!)

Main point of slide / key takeaway at bottom

Definitions in blue with concept being defined bold & underlined

Warnings- pay attention here!

Take note!!

(4)

Definition of DBMS

Data models & the relational data model

Schemas & data independence

Outline

(5)

Database

A large, integrated collection of data

Models a real-world enterprise

Entities (e.g., Students, Courses)

Relationships (e.g., Alice is enrolled in 145)

What is DBMS?

5

A Database Management System (DBMS) is a piece of software designed to store and

manage databases

(6)

Give examples of DBMSs

Oracle, IBM DB2, Microsoft SQL Server, Altibase, Cubrid

Open source: MySQL (Sun/Oracle), PostgreSQL, MariaDB

Open source library: SQLite

We will focus on relational DBMSs most quarter

Database Management System

(7)

Consider building a course management system (CMS):

Students

Courses

Professors

Who takes what

Who teaches what

A Motivating, Running Example

7

Entities

Relationships

(8)

A data model is a collection of concepts for describing data

The relational model of data is the most widely used model today

Main Concept: the relation- essentially, a table

A schema is a description of a particular

collection of data, using the given data model

E.g. every relation in a relational data model has a schema describing types, etc.

Data models

(9)

Logical Schema

Students(sid: string, name: string, gpa: float)

Courses(cid: string, cname: string, credits: int)

Enrolled(sid: string, cid: string, grade: string)

Modeling the CMS

9

sid Name Gpa

101 Bob 3.2

123 Mary 3.8

Students

cid cname credits

564 564-2 4

308 417 2

Courses

sid cid Grade

123 564 A

Enrolled

Relations

(10)

Logical Schema

Students(sid: string, name: string, gpa: float)

Courses(cid: string, cname: string, credits: int)

Enrolled(sid: string, cid: string, grade: string)

Modeling the CMS

sid Name Gpa

101 Bob 3.2

123 Mary 3.8

Students

cid cname credits

564 564-2 4

308 417 2

Courses

sid cid Grade

123 564 A

Corresponding keys

(11)

Physical Schema: describes data layout

Relations as unordered files

Some data in sorted order (index)

Logical Schema: Previous slide

External Schema: (Views)

Course_info(cid: string, enrollment: integer)

Derived from other tables

Other Schemata…

11

Applications Administrators

(12)

Concept: Applications do not need to worry about how the data is structured and stored

Data independence

Logical data independence:

protection from changes in the logical structure of the data

Physical data independence:

protection from physical layout changes

I.e. should not need to ask: can we add a new entity or attribute without rewriting the application?

I.e. should not need to ask: which disks are the data stored on? Is the data indexed?

(13)

Transactions

Concurrency & locking

Atomicity & logging

Summary

Overview of DBMS topics

13

(14)

Suppose that our CMS application serves 1000’s of users or more- what are some challenges?

Challenges with Many Users

DBMS allows user to write programs as if they were the only user

Disk/SSD access is slow, DBMS hide the latency by doing more CPU work concurrently

• Security: Different users, different roles

• Performance: Need to provide concurrent access

• Consistency: Concurrency can lead to update problems

We won’t look at too much in this course, but is extremely important

(15)

A key concept is the transaction (TXN): an atomic sequence of db actions (reads/writes)

Transactions

15

Atomicity: An action either completes entirely or not at all

Acct Balance a10 20,000 a20 15,000

Transfer $3k from a10 to a20:

1. Debit $3k from a10 2. Credit $3k to a20

• Crash before 1,

• After 1 but before 2,

• After 2.

Written naively, in which states is atomicity

preserved?

DB Always preserves atomicity!

Acct Balance a10 17,000 a20 18,000

(16)

A key concept is the transaction (TXN): an atomic sequence of db actions (reads/writes)

If a user cancels a TXN, it should be as if nothing happened!

Transactions leave the DB in a consistent state

Users may write integrity constraints, e.g., ‘each course is assigned to exactly one room’

Transactions

Atomicity: An action either completes entirely or not at all

Consistency: An action results in a state which conforms to all integrity constraints

(17)

The DBMS ensures that the execution of {T1,…,Tn} is equivalent to some serial execution

One way to accomplish this: Locking

Before reading or writing, transaction requires a lock from DBMS, holds until the end

Key Idea: If Ti wants to write to an item x and Tj wants to read x, then Ti, Tj conflict. Solution via locking:

only one winner gets the lock

loser is blocked (waits) until winner finishes

Challenge: Scheduling Concurrent Transactions

17

A set of TXNs is isolated if their effect is as if all were executed serially

What if Ti and Tj need X and Y, and Ti asks for X before Tj, and Tj asks for Y before Ti? -> Deadlock! One is

aborted…

All concurrency issues handled by the DBMS…

(18)

DBMS ensures atomicity even if a TXN crashes!

One way to accomplish this: Write-ahead logging (WAL)

Key Idea: Keep a log of all the writes done.

After a crash, the partially executed TXNs are undone using the log

Ensuring Atomicity & Durability

Write-ahead Logging (WAL):

Before any action is finalized, a corresponding log entry is forced to disk

We assume that the log is on

“stable” storage

(19)

Persistent storage of data

Logical data model; declarative queries and updates

→ physical data independence

Multi-user concurrent access

Safety from system failures

Performance, performance, performance

Massive amounts of data (terabytes~petabytes)

High throughput (thousands~millions transactions/hour)

High availability (≥ 99.999% uptime)

Standard DBMS features

19

(20)

Much of the OS may be bypassed for performance and safety

Standard DBMS architecture

Application

DBMS

OS

Disk(s)

Queries/modifications Answers/responses

File system interface

Storage system interface

(21)

DBMS Components

21

Query Optimization and Execution Relational Operators

Access Methods

Buffer Management Disk Space Management

•Makes efficient use of disk space

Think 300,000,000 accounts!

•Makes efficient use of RAM

Think 1,000,000 simultaneous requests!

•Provides generic ways to combine data

Do you want a list of customers and accounts or the total account balance of all customers?

•Figures out the best way to answer a question

There is always more than 1 way to skin a cat…!

•Provides efficient ways to extract data

Do you need 1 record or a bunch?

Database application

•Talks to DBMS to manage data for a specific task -> e.g. app to withdraw/deposit money or provide a history of the account

Disk(s)

(22)

End users and DBMS vendors

Reduces cost and makes money

DB application programmers

Can handle more users, faster, for cheaper, and with better reliability / security guarantees!

Database administrators (DBA)

Easier time of designing logical/physical schema, handling security/authorization, tuning, crash

A Well-Designed DBMS makes many people happy!

Must still understand DB internals

(23)

DB application developer

writes programs that query and modify data

DB designer

establishes schema

DB administrator

loads data, tunes system, keeps whole thing running

Data analyst

data mining, data integration

DBMS implementor

builds the DBMS

Research on new systems

The players

23

(24)

DBMS are used to maintain, query, and manage large datasets.

Provide concurrency, recovery from crashes, quick application development, integrity, and security

Key abstractions give data independence

DBMS R&D is one of the broadest, most exciting fields in CS. Fact!

Summary of DBMS

(25)

Q&A

25

References

Related documents

The Rising Star recognition is awarded to an Adrian College Athletic Training Student who exhibits exemplary academic and clinical potential in the initial phase of their

SERCOS Drive or Module Transmit Optometer Set to measure power in dBm 1 meter in length reference cable... System Field Testing Verify Receiver Power Verify receiver power

Application of the death penalty is consistently and actively extolled as an effective deterrence to drug-related crimes and low rates of drug use in Singapore within

In a seminal contribution, Lewis (1996) suggests that high degrees of interna- tional risk-sharing might be generated by the non-separability of tradable and non- tradable goods in

In sum, the outlined perspectives of mirror neurons, their explanation in terms of action perception circuits, APCs, and the broader strategy to build cognition and communicative

 The  environment  is   constantly  participating  in  their  creation.. On the countermonument see Young’s chapter “The Countermonument: Memory against Itself

The buffer contains a solution of glycolic acid and its sodium salt, sodium glycolate.. Explain what a buffer is and how this buffer

The second part analyses the governance structure of Bitcoin, which can be understood as a two-layered construct: an infrastructure seeking to govern user behaviour via a