UVA
DEPARTMENT OF COMPUTER SCIENCEPhysical Data Organization
Database design using logical model of the database - appropriate level for users to focus on
- user independence from implementation details Performance
- other major factor in user satisfaction - depends on
- efficient data structures for data representation - efficiency of system operation on those structures Disk access
- one of the most critical factors in performance
- main memory is in general not big enough for entire DB - recovery problem with main memory DB
- disk contains data files and system files including data dictionary and index files
UVA
DEPARTMENT OF COMPUTER SCIENCEStorage Media Hierarchy
Storage medium: primary storage and secondary storage
- database is stored physically on some some storage medium - primary storage: can be operated directly by CPU
--- main memory & cache
- secondary storage: larger capacity, lower cost, slower access;
cannot be operated directly by CPU; must be copied to primary Hierarchy
- access speed, cost per unit of data, reliability - cache: fastest and most costly
- main memory
- flash memory: limited number of writes (also slow) non-volatile: disk-substitute in embedded systems - magnetic disk and optical disk (CD-ROM)
- tape storage: sequential access; for backup and archival
UVA
DEPARTMENT OF COMPUTER SCIENCEDisk Access and Buffer Management
Disk
- direct access storage device (not sequential)
- arm movement involves seek time and latency time - goal is to reduce # of disk access and seek time - a block need not to be transferred every time
- buffer blocks: closely related with concurrency control and recovery strategy of the database system
Buffer management
- goal is to increase hit ratio
- similar to virtual memory management in OS - differences: forced writing for recovery and
MRU (most recently used first) replacement algorithm - priority-based replacement: data dictionary and
index blocks have high priority
UVA
DEPARTMENT OF COMPUTER SCIENCERAID
Redundant arrays of independent disks
- motivation: large # of small disks might be cost effective; higher reliability and higher performance Higher reliability by redundancy
- mirroring/shadowing: a logical disk consists of two physical disks --- write on both
Higher performance by parallelism
- data striping: splitting data across multiple disks - bit-level or block-level striping
- with n disks, block i will go to disk (i mod n) + 1 RAID levels
- to provide redundancy at lower cost using disk striping combined with error-correcting bits, instead of mirroring
UVA
DEPARTMENT OF COMPUTER SCIENCEFile Organization
File
- a sequence of records mapped unto disk blocks
- block: unit of data transfer between disk and memory - block size ranges from 512 bytes to few Kbytes
- fixed-length records vs variable-length records Fixed-length records
- size of each field is declared
- when delete, mark it to be ignored: searching for deleted free space may not be efficient
- use pointer for free space: danger of dangling pointer which no longer points to the desired record
- problem of interblock records: needs 2 accesses ... block i) (block i+1 ...
--- record j
UVA
DEPARTMENT OF COMPUTER SCIENCEVariable-length Records
When such situations occur?
- multiple record types in one file
- record type allows variable length fields - repeating groups (multiple values)
Methods to deal with them
- byte string representation: special end-of-record symbol ( ) at the end of each record
- each record is a string of consecutive bytes - difficulty in reusing the space of deleted record - fixed-length representation:
1) reserved space for expected maximum length - useful only if most are close to max. length 2) a list of fixed-length records chained by pointers 3) anchor block (first record of the chain) and overflow
block (all the others) chained by pointers
UVA
DEPARTMENT OF COMPUTER SCIENCEMapping Data to Files
Relational database - straight-forward
- in most cases, each relation in a separate file File organization
- how to organize a given set of records in files
- heap file: any record can be placed anywhere (no ordering) - sequential file: records are stored in a sequential order
- hashing file: hash function computes the specific block for the record based some attribute value
- clustering file: records of different relations stored on the same file/block for efficient processing
- related records can be read by one block read - may be inefficient for other operations
UVA
DEPARTMENT OF COMPUTER SCIENCEEfficient Searching
Additional structures help searching
- associated with files to make the search for records based on certain field more efficient - for direct data locating w/o sequential search - two approaches: indexing and hashing
Sequential file
- records are chained together by pointers for fast retrieval in search key order
- records are stored physically in search key order to minimize the number of block accesses
- difficult to maintain the physical sequential order as records are inserted and deleted
- binary search for files can be done on the blocks rather than on the records, if block address are available in the file header
UVA
DEPARTMENT OF COMPUTER SCIENCEIndex Structures
Index file
- index is usually defined on a single field of a record (index field)
- index file is for fast random access Dense index
- one index record for every search-key value - faster access but higher overhead
Sparse index
- index records for only some of the records - less faster but less overhead
(Brighton) (record: Brighton, ..) (Brighton) (Downhill) (record: Downhill, ..)
(Marinion) (record: Marinion, ..) (Marinion)
dense index sparse index
UVA
DEPARTMENT OF COMPUTER SCIENCEIndex Structures
Hierarchy of index
- multi-level index for a large index file - index tree (search tree)
Primary and secondary index
- primary index is the one whose search key specifies the sequential order of the file
- secondary index: index other than primary one
- secondary index improves the performance of queries that use keys other than the primary search key
- modifying DB imposes a serious overhead on secondary index (compared to the primary index)
- dense index is desirable than sparse index for secondary index, since the file is not ordered physically according to the secondary index
UVA
DEPARTMENT OF COMPUTER SCIENCEClustering Index
Clustering field
- a non-key field that does not have a distinct value for each record, on which records of a file are physically ordered
Clustering index
- clustering index is to speed up retrieval of records that have the same value for the clustering field
- differs from primary index which requires that ordering field should have a distinct value for each record
UVA
DEPARTMENT OF COMPUTER SCIENCEIndex File
Index file size
- index file for a primary index need substantially fewer blocks than the data file
- why?
- fewer index entries: an entry exists for each block of data file rather than for each record - index entry is smaller in size than a data record:
only two fields (key value and block pointer) Blocking factor (bfr)
- savings in disk block accesses
- bfr = block size (B) / record length (R)
UVA
DEPARTMENT OF COMPUTER SCIENCEIndex File: Example
An ordered file with 30,000 records, B = 1 Kb, R = 100 bytes - bfr = 10; data file needs 3000 blocks
- binary search would require (log2 Blocks) = 12 accesses - with ordering key field of 9 bytes and block pointer
of 6 bytes, size of primary index entry = 15 bytes - bfr = block size (B) / record length (R) = 68 - total # of index entries: 3000
- # of blocks needed for the index = (3000/68) = 45 - binary search on index file would require
(log2 Bi) = (log2 45) = 6 accesses - search for a record using the primary index
6 (for index) + 1 (for data) = 7 accesses
UVA
DEPARTMENT OF COMPUTER SCIENCESearch Tree
Disadvantage of indexed sequential file organization - performance degradation as file grows
- file reorganization can avoid this performance degradation with its own overhead
Search tree
- a special type of tree used to guide the search for a record given the value of one of its fields - in a search tree of order p, each node contains
at most p 1 search values and p pointers in the order
<P1, K1, ..., Pq 1, Kq 1, Pq>, where q p Pi: pointer to a child node or null pointer
Ki: search key value from some ordered set of
values (all search key values are assumed to be unique) for all values X in the subtree pointed by Pi, we have
Ki 1<X<Ki for 1<i<q, X<Ki for i=1, and Ki 1<X for i=q
UVA
DEPARTMENT OF COMPUTER SCIENCEB-tree Index Files
B-tree (balanced tree)
- a search tree with some additional constraints for efficient insertion and deletion
- number of access is fixed Formal definition
A B-tree of order n is a search tree that satisfies 1) the root has at least two children
2) all nodes other than root have at least n/2 children 3) all leaf nodes are at the same level (balanced)
Insertion and deletion
- insertion may need split when a node becomes too large (more than n children)
- deletion may need combining if a node becomes too small (less than n/2 pointers)
- balance property must be maintained
UVA
DEPARTMENT OF COMPUTER SCIENCEB-tree and B+-tree
Node structure of B-tree
<P1, (K1, Pr1), P2, ..., (Kq 1, Prq 1), Pq>
Pi: tree pointer to point another node Ki: search key value
Pri: data pointer to point record whose search key field value is Ki (or the data block containing it) - within each node, K1 < K2 < .. <Kq 1
- for all values X in the subtree pointed by Pi, we have Ki 1<X<Ki for 1<i<q, X<Ki for i=1, and Ki 1<X for i=q - a node with q tree pointers, q p, has q 1 search
key field values, and hence q 1 data pointers B+-tree: a variation of B-tree data structure
- most widely used multi-level index implementation
<P1, K1, ..., Pq 1, Kq 1, Pq>, where q p
- at leaf node, it is <K1, Pr1, ..., Kq 1, Prq 1, Pnext>
UVA
DEPARTMENT OF COMPUTER SCIENCEB+-tree
Requirements for maintaining B+-tree
- every node must contain at least n/2 pointers except for the root (which should have at least 2) - balanced: for ensuring good performance
Searching for key field value K
1) visit the root node, looking for the smallest key value greater than K. Suppose the value is Ki. 2) follow pointer Pi to another node
- if K < K1, then follow P1
- if K > Kmax, then follow Pmax
3) repeat step 2 until reaching a leaf node
UVA
DEPARTMENT OF COMPUTER SCIENCEDifferences of B+-tree from B-tree
1. In B+-tree, data pointers are stored only at the leaf nodes - more entires can be packed into internal (non-leaf) nodes
of a B+-tree than for a similar B-tree
- for the same block (node) size, the order p will be larger for the B+-tree than for the B-tree --- improved search time - B-tree eliminates redundant storage of search key values - faster search in some cases to find desired search key
values before reading a leaf node in B-tree
2. Leaf and non-leaf nodes are of the same size in B+-tree, while in B-tree, non-leaf nodes are larger
- complication in storage management for index structures 3. Deletion in B-tree is more complicated
- in B+-tree, deleted entry always appears in a leaf - in B-tree, it can be a non-leaf node, requiring
replacement by the proper value from the subtree