• No results found

Storage in Database Systems. CMPSCI 445 Fall 2010

N/A
N/A
Protected

Academic year: 2021

Share "Storage in Database Systems. CMPSCI 445 Fall 2010"

Copied!
31
0
0

Loading.... (view fulltext now)

Full text

(1)

Storage in Database Systems

CMPSCI 445

Fall 2010

(2)

Storage Topics

Architecture and Overview

Disks

Buffer management

Files of records

(3)

DBMS Architecture

Disk Space Manager

File & Access Methods

Buffer Manager Query Parser Query Rewriter Query Optimizer

Query Executor

Lock Manager Log Manager

(4)

Data on External Storage

Disks: Can retrieve random page at fixed cost

 But reading several consecutive pages is much cheaper than reading them in random order

Tapes: Can only read pages in sequence

 Cheaper than disks; used for archival storage

Page: Unit of information read from or written to disk

 Size of page: DBMS parameter, 4KB, 8KB

Disk space manager:

 Abstraction: a collection of pages.

 Allocate/de-allocate a page.

 Read/write a page.

(5)

Buffer Management

Architecture:

 Data is read into memory for processing

 Data is written to disk for persistent storage

Buffer manager stages pages between external storage and main memory buffer pool.

Access method layer makes calls to the buffer

manager.

(6)

Access Methods

Access methods: routines to manage various disk-based data structures.

 Files of records

 Various kinds of indexes

File of records:

 Important abstraction of external storage in a DBMS!

 Record id (rid) is sufficient to physically locate a record

Indexes:

 Auxiliary data structures

(7)

File organizations & access methods

Many alternatives exist, each ideal for some situations, and not so good in others:

Heap (unordered) files: Suitable when typical access is a file scan retrieving all records.

Sorted Files: Best if records must be retrieved in some order, or only a `range’ of records is needed.

Indexes: Data structures to organize records via trees or hashing.

Like sorted files, they speed up searches for a subset of records, based on values in certain (“search key”) fields

Updates are much faster than in sorted files.

(8)

Disks

(9)

Disks and DBMS Design

DBMS stores information on disks.

This has major implications for DBMS design!

READ: transfer data from disk to main memory (RAM) for data processing.

WRITE: transfer data from RAM to disk for persistent storage.

Both are high-cost operations, relative to in-memory

operations, so must be planned carefully!

(10)

Why Not Store Everything in Main Memory?

Main memory is volatile. We want data to be saved between runs. (Obviously!)

Costs too much. $100 will buy you either 4GB of RAM or 1TB of disk today.

32-bit addressing limitation.

 2

32

bytes can be directly addressed in memory.

 Number of objects cannot exceed this number.

(11)

Basics of Disks

Unit of storage and retrieval: disk block or page.

 A disk block/page is a contiguous sequence of bytes.

 Size of a DBMS parameter, 4KB or 8KB.

Disks support direct access to a page.

Unlike RAM, time to retrieve a page varies!

 It depends upon the location on disk.

Therefore, relative placement of pages on disk has

major impact on DBMS performance!

(12)

Components of a Disk

Platters

Platters spin (say, 7200rpm).

Spindle

Arm assembly is moved in or out to position a head on a desired track.

Disk head

Arm movement

Only one head reads/

writes at any one time.

Tracks under heads make a cylinder (imaginary!).

Sector Track

(13)

Accessing a Disk Page

Time to access (read/write) a disk block:

seek time (

moving arms to position disk head on track

)

rotational delay (

waiting for block to rotate under head

)

transfer time (

actually moving data to/from disk surface

)

Seek time and rotational delay dominate.

Seek time varies from about 2 to 30 msec

Rotational delay varies from 0 to 11 msec

Transfer rate between 25 to 100 MB/s

Key to lower I/O cost: reduce seek/rotation

(14)

Arranging Pages on Disk

`Next’ block concept:

blocks on same track, followed by

blocks on same cylinder, followed by

blocks on adjacent cylinder

Blocks in a file should be arranged

sequentially on disk (by `next’), to minimize seek and rotational delay.

For a sequential scan, pre-fetching several

(15)

Buffer Management

(16)

Buffer Management in a DBMS

DB

MAIN MEMORY DISK

disk page free frame

Page Requests from Higher Levels

BUFFER POOL

choice of frame dictated by replacement policy

(17)

More on Buffer Management

Requestor of page must unpin it, and indicate whether page has been modified:

dirty bit is used for this.

Page in pool may be requested many times,

a pin count is used. A page is a candidate for replacement iff pin count = 0.

CC & recovery may entail additional I/O when a frame is chosen for replacement.

(Write-Ahead Log protocol; more later.)

(18)

When a Page is Requested ...

If requested page is not in pool:

Choose a frame for replacement

If frame is dirty, write it to disk

Read requested page into chosen frame

Pin the page and return its address.

If requests can be predicted (e.g., sequential scans)

(19)

Buffer Replacement Policy

Frame is chosen for replacement by a replacement policy:

Least-recently-used (LRU), Clock, MRU etc.

Policy can have big impact on # of I/O’s;

depends on the access pattern.

Sequential flooding: Nasty situation caused by LRU + repeated sequential scans.

# buffer frames < # pages in file means each page

request causes an I/O. MRU much better in this

(20)

DBMS vs. OS File System

OS does disk space & buffer mgmt: why not let OS manage these tasks?

Differences in OS support: portability issues

Some limitations, e.g., files can’t span disks.

Buffer management in DBMS requires ability to:

pin a page in buffer pool, force a page to disk

(important for implementing CC & recovery),

(21)

Files

(22)

Files

Access method layer offers an abstraction of data on disk: a file of records residing on

multiple pages

 A number of fields are organized in a record

 A collection of records are organized in a page

 A collection of pages are organized in a file

(23)

Files of Records

Page or block is OK when doing I/O, but

higher levels of DBMS operate on records and files of records.

FILE : A collection of pages, each containing a collection of records. Must support:

insert/delete/modify record

read a particular record (specified using record id)

scan all records (possibly with some conditions on

the records to be retrieved)

(24)

Unordered (Heap) Files

Simplest file structure contains records in no particular order.

As file grows and shrinks, disk pages are allocated and de-allocated.

To support record level operations, we must:

keep track of the pages in a file

keep track of free space on pages

(25)

Heap File Implemented as a List

(heap file name, header page id) stored in a known place.

Two doubly linked lists, for full pages & pages with space.

 Each page contains 2 `pointers’ plus data.

Header Page

Data

Page Data

Page Data

Page

Data

Page Data

Page Data

Page Pages with Free Space Full Pages

(26)

Heap File Using a Page Directory

A directory entry per page; it can include the number of free bytes on the page.

Data Page 1 Data Page 2

Data Page N Header

Page

DIRECTORY

(27)

Page Format

How to store a collection of records on a page?

Consider a page as a collection of slots, one for each record.

A record is identified by rid = <page id, slot #>

Record ids (rids) are used in indexes

(Alternatives 2 and 3).

(28)

Page Format: Fixed Length Records

Slot 1 Slot 2

Slot N

PACKED N

Free Space

number of records

. . .

M 1 . . . 0

M ... 3 2 1

UNPACKED, BITMAP Slot 1

Slot 2

Slot N Slot M

1 1

number of slots

. . .

(29)

Page Format: Variable Length Records

Can move records on page without changing rid; so, attractive

Page i Rid = (i,N)

Rid = (i,2)

Rid = (i,1)

20 16 24 N Pointer

to start of free space

SLOT DIRECTORY

N . . . 2 1 # slots

Rid of record is <page id, slot #>

(30)

Record Format: Fixed Length

Information of a record type e.g., the number of fields and field types is stored in the system catalog.

Fixed length record: (1) the number of fields is fixed,

Base address (B)

L1 L2 L3 L4

F1 F2 F3 F4

Address = B+L1+L2

(31)

Record Format: Variable Length

Variable length record: (1) number of fields is fixed, (2) some fields are variable length

Two alternatives:

$ $ $ $

Scan

Fields Delimited by Special Symbols

F1 F2 F3 F4

F1 F2 F3 F4 S1 S2 S3 S4 E4

Array of Field

Offsets

References

Related documents

JDBC SQL Access Storage Database buffer Network Server Application JDBC Application JDBC Application JDBC... Durability

The detailed aggregated results of the Central Bank Survey of Foreign Exchange and Derivatives Market Activity in April and at end-June 2007 and the semiannual OTC derivatives

The same scenario follows as with the EXPORT flow in the terminal, a vessel nomination document is received from the shipping line, and then a vessel visit is created.. 6.2

Sort-first parallel rendering is fundamentally limited by a very strict network bandwidth constraint that can only be alleviated using frame buffer data reduction which can be

The presence at Board meetings of bilingual staff knowledgeable in board policy issues for members of the Board to request minor clarifications in their language of choice.. There

When cutting threads, if the tap or die cuts awat metal the full form or depth of the thread, the result is 100% height of thread on the work. Therefore, the height of thread

Betsi Cadwaladr University Health Board JOB DESCRIPTION (COMBINED) Nov 2016 v1.0 Title: Clinical Fellow in Rural Emergency Medicine and either Pre-hospital

Rub the sore spot as you repeat the setup statement thrice Even though my logical mind doubts abundance I deeply and completely love and accept myself and I choose to trust that