• No results found

1/20/11. Outline. Database Management Systems. Prerequisites. Staff and Contact Information. Course Web Site. 645: Database Design and Implementation

N/A
N/A
Protected

Academic year: 2021

Share "1/20/11. Outline. Database Management Systems. Prerequisites. Staff and Contact Information. Course Web Site. 645: Database Design and Implementation"

Copied!
7
0
0

Loading.... (view fulltext now)

Full text

(1)

645: Database Design and

Implementation

Yanlei Diao

University of Massachusetts Amherst

1/20/11 Yanlei Diao, University of Massachusetts Amherst

Outline

•  Course topics and requirements

•  Overview of databases and DBMS’s

1/20/11 Yanlei Diao, University of Massachusetts Amherst

Database Management Systems

•  Fundamentals

–  Data modeling –  Relational design –  Query language SQL –  Database theory

•  Database implementation

–  Storage, indexing

–  Query processing, query optimization –  Transaction management

•  Networked information systems

–  XML, XQuery, XML query processing –  Data stream systems

–  Probabilistic databases

1/20/11 Yanlei Diao, University of Massachusetts Amherst

Prerequisites

•  An undergraduate database or operating

systems course

•  Or consent of the instructor

–  Data structures and algorithms

–  Computer systems

–  Sufficient programming experience

1/20/11 Yanlei Diao, University of Massachusetts Amherst

Course Web Site

http://avid.cs.umass.edu/courses/645/s2011/

Or

Yanlei’s web page →

Teaching →

645 Spring 2011

1/20/11 Yanlei Diao, University of Massachusetts Amherst

Staff and Contact Information

•  Instructor: Yanlei Diao

–  Email: yanlei at cs.umass.edu

–  Office hours: Tu Th 10:15-11:15 am, CS 232

•  TA: Liping Peng

–  Email: lppeng at cs.umass.edu

–  Office hour: TBD

•  Mailing list

–  cs645 at edlab-mail.cs.umass.edu

–  course related information, relevant to the whole class

•  All available on the

home

page

(2)

1/20/11 Yanlei Diao, University of Massachusetts Amherst

Textbook

Database Management Systems

3

rd

Edition

Ramakrishnan and Gehrke

Lecture notes will be posted on the

schedule page before class.

- no laptop in class

1/20/11 Yanlei Diao, University of Massachusetts Amherst

Grading

•  Homework: 35%

•  Course Project: 30%

•  Midterm: 30%

•  Class participation: 5%

1/20/11 Yanlei Diao, University of Massachusetts Amherst

Homework: 35%

•  5 assignments

–  Written problem sets

–  Programming exercises

•  Posted on the

assignments

page

•  Watch the due date!

•  Submission: hardcopy before class on due date

•  Policy on late submissions

1/20/11 Yanlei Diao, University of Massachusetts Amherst

Exam (30%)

•  Midterm exam

–  In-class, closed-book exam

–  April 5, location TBD

•  No final exam!

1/20/11 Yanlei Diao, University of Massachusetts Amherst

Project: 30%

•  Groups of 2 or work individually

•  Research-oriented problem

–  Nobody has solved the problem

–  Scientific value

•  Milestones & deliverables: see the

projects

page

•  Submission: via email

–  Proposal, status report: before class on due date

–  Final report: 5 pm on the due date

•  In-class presentation

1/20/11 Yanlei Diao, University of Massachusetts Amherst

Participation (5%)

•  Attend every class

(3)

1/20/11 Yanlei Diao, University of Massachusetts Amherst

Academic Honesty

•  All submitted work must be your own!

–  Although students are encouraged to study together, each student much produce his or her own solution to each homework. –  Copying or using sections of someone else’s program or

assignment (even if it has been modified by you), or copying a solution from an external source, is not acceptable.

–  The University guidelines for academic misconduct: http://www.umass.edu/dean_students/code_conduct/ acad_honest.htm

–  The staff of CS 645 will be vigorous in enforcing them.

1/20/11 Yanlei Diao, University of Massachusetts Amherst

Outline

•  Course topics and requirements

•  Overview of databases and DBMS’s

1/20/11 Yanlei Diao, University of Massachusetts Amherst

Databases and DBMS’s

•  A database is a large, integrated collection of data

•  A database management system (DBMS) is a

software system designed to store and manage a

large amount of data

–  Declarative interface to define data stored, add data,

update data, and query data

–  Efficient querying

–  Concurrent users

–  Reliable storage and crash recovery

–  Access control…

1/20/11 Yanlei Diao, University of Massachusetts Amherst

Early DBMS’s

•  Early DBMS’s (1960’s) evolved from file systems

•  Many small data items, many queries and updates

–  Banking

–  Airline reservations

•  1960s Navigational DBMS

–  Tree-based or graph-based data model

–  Manual navigation to find what you want

–  No support for “search” (“search” ≠ “program”)

1/20/11 Yanlei Diao, University of Massachusetts Amherst

Relational DBMS

•  Relational model (E.F. Codd, 1970)

–  Data independence: hides details of physical storage

from users

–  Declarative query language: say what you want, not

how to compute it

–  Mathematical foundation: what queries mean,

possible implementations

•  Query optimization (1970’s till now)

–  Earliest: System R at IBM, INGRES at UC Berkeley

–  Queries can be efficiently executed despite data

independence and declarative queries!

1/20/11 Yanlei Diao, University of Massachusetts Amherst

Commercial DBMS’s

System R

INGRES

Material in this slide based on wikipedia

Sybase

Informix

Postgres

MS SQL Server

IBM DB2

Oracle

MySQL

(4)

1/20/11 Yanlei Diao, University of Massachusetts Amherst

Recent Database Applications

•  Data warehouses

–  Large amounts of data over years, complex queries,

designed for analysis and reporting

–  Sales data analysis, e.g., Walmart, Target, …

–  Fraud analysis, e.g., credit card use, insurance

–  Call record analysis, e.g., AT&T

–  Schema design, data cleaning and loading, indexing,

aggregation, materialized views, data mining…

1/20/11 Yanlei Diao, University of Massachusetts Amherst

Recent Database Applications

•  Electronic commerce

–  E.g., amazon.com, ebay.com

–  How do we integrate thousands of catalogs?

–  How do we store data of sparse attributes

–  How do we handle the scale?

•  Social networking

–  E.g., facebook.com, myspace.com, with 500 million

users at a popular site

–  Real-time analysis of user behavior

–  Database Security…

1/20/11 Yanlei Diao, University of Massachusetts Amherst

Recent Database Applications

•  Large-scale sensing applications

–  Temperature, light, humidity…

–  Global Positioning System

–  Radio Frequency Identification (RFID)

–  Radar sensor networks

–  Digital sky surveys

–  How do we store, merge, and query sensing data?

–  How do we support mission-critical applications?

T V

1/20/11 Yanlei Diao, University of Massachusetts Amherst

Recent Database Applications

•  Bioinformatics

–  Amino acid sequences: e.g., SWISS-PROT

–  3D molecular structure: e.g., PDB, CSD

–  Data integration

–  Pattern matching

–  Approximate matching, ranking

–  Automatic inference…

1/20/11 Yanlei Diao, University of Massachusetts Amherst

How does one build a database?

1/20/11 Yanlei Diao, University of Massachusetts Amherst

Example: The Internet Shop*

•  DBDudes Inc.: a well-known database consulting

firm

•  Barns and Nobble (B&N): a large bookstore

specializing in books on horse racing

•  B&N decides to go online but needs help

•  Step 0: DBDudes makes B&N agree to

–  pay steep fees and

–  schedule a lunch meeting for requirements analysis

* The example and all related material was taken from “Database Management Systems” Edition 3.

(5)

1/20/11 Yanlei Diao, University of Massachusetts Amherst

Step 1: Requirements Analysis

•  “I’d like my customers to be able to browse my catalog of

books and place orders online.”

–  Books:

•  For each book, B&N’s catalog contains its ISBN number, title, author, price, year of publication, …

–  Customers:

•  Most customers are regulars with names and addresses registered with B&N.

•  New customers must first call and establish an account. –  On the new website:

•  Customers identify themselves before browsing and ordering. •  Each order contains the ISBN of a book and a quantity. –  Shipping:

•  For each order, B&N ships all copies of a book together once they become available.

1/20/11 Yanlei Diao, University of Massachusetts Amherst

•  A high level description of the data in terms of the

Entity-Relationship (ER)

model.

•  Design review:

–  What if a customer places two orders of the same book in one day? –  What if a customer wants to have different books in the same order?

Step 2: Conceptual Design

cardnum ship_date

Books Orders Customers

isbn title

author qty_in_stock price

Year cid address

cname order_date

qty

ordernum

1/20/11 Yanlei Diao, University of Massachusetts Amherst

Step 3: Logical Design

•  Mapping the ER diagram to the

relational model

•  Access control: use

views

to restrict the access of certain

employees to customer sensitive information

CREATE TABLE Books

(isbn CHAR(10), title CHAR(80), author CHAR(80), qty_in_stock INTEGER, price REAL, year INTEGER, PRIMARY KEY(isbn)) CREATE TABLE Customers (cid INTEGER, cname CHAR(80), address CHAR(200), PRIMARY KEY(cid))

CREATE TABLE Orders (ordernum INTEGER, isbn CHAR(10), cid INTEGER, cardnum CHAR(16), qty INTEGER, order_date DATE, ship_date DATE, PRIMARY KEY(ordernum, isbn), FOREIGN KEY (isbn) REFERENCES Books, FOREIGN KEY (cid) REFERENCES Customers)

CREATE VIEW OrderInfo (isbn, cid, qty, order_date, ship_date) AS SELECT O.isbn, O.cid, O.qty,

O.order_date, O.ship_date FROM Orders O

1/20/11 Yanlei Diao, University of Massachusetts Amherst

Step 4: Schema Refinement

ordernum isbn cid cardnum qty order_date ship_date 120 0-07-11 123 40241160 2 Jan 3, 2006 Jan 6, 2006 120 1-12-23 123 40241160 1 Jan 3, 2006 Jan 11, 2006 120 0-07-24 123 40241160 3 Jan 3, 2006 Jan 26, 2006 120 ordernum 120 120 isbn 0-07-11 1-12-23 0-07-24 qty 2 1 3 Jan 26, 2006 ship_date Jan 6, 2006 Jan 11, 2006 ordernum 120 cid 123 cardnum 40241160 order_date Jan 3, 2006 Orders

Orders

Redundant Storage!

Orderlists

1/20/11 Yanlei Diao, University of Massachusetts Amherst

Step 5: Internet Application Development

Database System

(MySQL, DB2…)

Application Server

(Apache Tomcat…)

Client Program

(Web Browser)

Presentation tier

Application logic tier

Data management tier

B&N Data:•  Books •  Customers (User login) •  Orders •  Orderlists B&N Business logic: •  Home page •  Login page •  Search page •  Cart page •  Confirm page HTML, Javascript, Cookies PHP, JSP, Servlets, XSLT Relational, XML B&N Client: •  User input •  Display of output 1/20/11 Yanlei Diao, University of Massachusetts Amherst

(6)

1/20/11 Yanlei Diao, University of Massachusetts Amherst

Example SQL Queries

SELECT isbn, title, author, price FROM Books

WHERE author = '<SearchString>' ORDER BY title

Search Page

SELECT cid, username, password FROM Customers

WHERE username = '<SpecifiedUsername>' Login Page

1/20/11 Yanlei Diao, University of Massachusetts Amherst

Step 6: Physical Design

•  Auxiliary data structures,

indexes

, to speed up

searches, e.g., hash index, B+tree, R-tree

Books

Hash Indexon Books.isbn

Spectacular Bid title Legacies of the Turf Seattle Slew isbn 0-07-11 1-12-23 0-07-24 qty 10 0 3 Timothy Capps author Edward L. Bowen Dan Mearns price 29.95 24.95 16.95 year 2003 2000 2001 … … … … … … … … … … … … … … … … … … … … … … … … … … … … … … H1 1 2 3 … … … N-1 isbn number 1/20/11 Yanlei Diao, University of Massachusetts Amherst

What is inside DBMS?

1/20/11 Yanlei Diao, University of Massachusetts Amherst

DBMS

DBMS Architecture

Query Parser Query Rewriter Query Optimizer Query Executor Query Processor Transactional Storage Manager Disk Space Manager

DB Access Methods

Buffer Manager

Lock Manager Log Manager

Disk Manager

1/20/11 Yanlei Diao, University of Massachusetts Amherst

Query Processor

Query Parser Query Rewriter Query Optimizer

Query Executor

SELECT C.cname, F.ordernum, F.order_date FROM Customers C, OrderInfo F WHERE C.cname = “John”

C.cid = F.cid SELECT C.cname, F.ordernum,

F.order_date FROM Customers C, OrderInfo F WHERE C.cname = “John”

C.cid = F.cid •  Syntax checking •  Internal representation •  Handling views •  Logical/semantic rewriting •  Flattening subqueries

•  Building a query execution plan •  Efficient, if not optimal •  Pull-based execution of a plan •  Iterator model

SELECT C.cname, O.ordernum, O.order_date FROM Customers C, Orders O WHERE C.cname = “John”

C.cid = O.cid IndexScan Customers cname=“John” IndexScan Orders cid=cid C.cname, O.ordernum, O.order_date (Indexed Join) (On-the-fly) CREATE VIEW OrderInfo

(ordernum, cid, order_date) AS

SELECT O.ordernum, O.cid, O.order_date, FROM Orders O

1/20/11 Yanlei Diao, University of Massachusetts Amherst

Transactional Storage Manager

Access Methods

Buffer Manager

Lock Manager Log Manager

IndexScan Customers cname=“John” IndexScan Orders cid=cid C.cname, O.ordernum, O.order_date (Indexed Join) (On-the-fly) Replacement policy Concurrency: 2PL… Recovery: WAL File, B+tree, R-tree, Hash

(7)

1/20/11 Yanlei Diao, University of Massachusetts Amherst

Disk Manager

Disk Space Manager Buffer Manager

Allocate/Deallocate a page Read/Write a page Contiguous seq. of pages

Data

Indexes Catalog

Log Database

1/20/11 Yanlei Diao, University of Massachusetts Amherst

DBMS: Theory + Systems

Theory! Query Parser Query Rewriter Query Optimizer Query Executor

Disk Space Manager DB Access Methods

Buffer Manager

Lock Manager Log Manager

References

Related documents

In the near future, football (touted as the game suited for Filipinos as playing the game doesn’t require much of height in order to excel in the sport) could be the number one

This course studies the principles, design, administration, and implementation of database management systems. Topics include: entity-relationship model, relational model,

Additional property listings shall be assigned to Listing broker on a daily basis Within 24 hours from receipt of the Initial List.. MLS print-outs confirming entry must be delivered

Analyzing the requirements of the database will proceed to the design of the structure of the relational database, which will then be developed in a Relational Database

This course examines data structures, file organizations, concepts and principles of database management systems (DBMS); as well as, data analysis, database design, data

updates through the PPD-1 process on priority schema to account for and enable priority on all forms of next generation networks communications (e.g., voice, video, data) for

Reorganizarea matricei extracelulare este un pattern structural iminent al restenozei intra-stent, care se manifestă prin creşterea degradării colagenului de tip I şi reducerea

Stephen Matthews, Partner – Moscow, Igor Gorchakov, Partner – Moscow, Mona Vaswani, Partner – London, Richard Smith, Partner – London, Jeffrey Sullivan, Partner – London,