• No results found

Relational Database Systems 2 2. Physical Data Storage

N/A
N/A
Protected

Academic year: 2021

Share "Relational Database Systems 2 2. Physical Data Storage"

Copied!
63
0
0

Loading.... (view fulltext now)

Full text

(1)

Christoph Lofi

Philipp Wille

Institut für Informationssysteme

Relational Database Systems 2

2. Physical Data Storage

(2)

Relational Database Systems 2 – Wolf-Tilo Balke – Institut für Informationssysteme

2

Data

Storage Manager

Query Processor

Application

Interfaces

Indices

Statistics

DDL

Interpreter

Query

Evaluation

Engine

Applications

Programs

Object Code

Transaction Manager

Buffer

Manager

File Manager

Catalog/

Dictionary

Embedded

DML

Precompiler

DML

Compiler

DB Scheme

Application

Programs

Direct Query

Application

Programmers

DB

Administrators

(3)

2.1 Introduction

2.2 Hard Disks

2.3 RAIDs

2.4 SANs and NAS

2.5 Case Study

(4)

DBMS needs to retrieve, update and process

persistently stored

data

Storage consideration is an important factor in

planning a database system (physical layer)

Remember:

The data has to be securely stored, but

access to the data should be declarative!

Relational Database Systems 2 – Wolf-Tilo Balke – Institut für Informationssysteme

4

2.1 Physical Storage Introduction

(5)

Data is stored on a

storage media

. Media highly

differ in terms of

Random Access

Speed

Random/Sequential Read/Write

speed

Capacity

Cost

per Capacity

(6)

Capacity:

Quantifies the amount of data

which can be stored

Base Units: 1 Bit, 1 Byte = 2

3

Bit = 8 Bit

Capacity units according to IEC, IEEE, NIST, etc:

Usually used for file sizes and primary storage (for higher

degree of confusion, sometimes used with SI abbreviations…)

1

KiB

= 1024

1

Byte; 1

MiB

= 1024

2

Byte ; 1

GiB

= 1024

3

Byte;

Capacity units according to SI:

Usually used for advertising secondary/tertiary storage

1

KB

= 1000

1

Byte

0.976 KiB; 1

MB

= 1000

2

Byte

0.954 MiB;

1

GB

= 1000

3

Byte

0.931 GiB; …

Especially used by the networking community:

1

Kb

= 1000

1

Bit

=

0.125 KB

0.122 KiB ;

1

Mb

= 1000

2

Bit

=

0.125 MB

0.119 MiB

Relational Database Systems 2 – Wolf-Tilo Balke – Institut für Informationssysteme

6

(7)
(8)

Random Access Time:

Average time to access

a random piece of data at a known media position

Usually measured in ms or ns

Within some media, access time can vary depending

on position (e.g. hard disks)

Transfer Rate:

Average amount consecutive of

data which can be transferred per time unit

Usually measured in KB/sec, MB/sec, GB/sec,…

Sometimes also in Kb/sec, Mb/sec, Gb/sec

Relational Database Systems 2 – Wolf-Tilo Balke – Institut für Informationssysteme

8

(9)

Volatile:

Memory needs constant power to keep data

Dynamic:

Dynamic volatile memory needs to be “refreshed”

regularly to keep data

Static:

No refresh necessary

Access Modes

Random Access:

Any piece of data can be accessed in

approximately the same time

Sequential Access:

Data can only be accessed in sequential

order

Write Mode

Mutable

Storage: Can be read and written arbitrarily

Write Once Read Many (

WORM

)

(10)

Online

media

„always on“

Each single piece of data can be accessed fast

e.g. hard drives, main memory

Nearline

media

Compromise between online and offline

Offline media can automatically put “on line”

e.g. juke boxes, robot libraries

Offline

media (disconnected media)

Not under direct control of processing unit

Have to be connected manually

e.g. box of backup tapes in basement

Relational Database Systems 2 – Wolf-Tilo Balke – Institut für Informationssysteme

10

2.1 Online, Nearline, Offline

(11)

Media characteristics result in a

storage hierarchy

DBMS optimize data distribution among the storage

levels

Primary Storage:

Fast, limited capacity, high price,

usually volatile electronic storage

Frequently used data / current work data

Secondary Storage:

Slower, large capacity, lower price

Main stored data

Tertiary Storage:

Even slower, huge capacity, even lower

price, usually offline

(12)

Relational Database Systems 2 – Wolf-Tilo Balke – Institut für Informationssysteme

12

2.1 The Storage Hierarchy

Cos

t

Spe

ed

Optical Disks, Tape

Primary

Secondary

Tertiary

Cache, RAM

Flash, Magnetic Disks

~100 ns

~10 ms

(13)

Type

Media

Size

Random

Acc. Speed

Transfer

Speed

Characteristics

Price

Price/GB

Pri

L1-Processor Cache

32 KB

5 x 10

-10

s

15.4 GB/sec

Vol, Stat,

RA,OL

Pri

DDR3-Ram

(Corsair Dominator

Platinum Series)

8 GB

2.6 x 10

-8

s

12.3 GB/sec

Vol, Dyn, Ra,

OL

€ 160

€ 20

Sec

Harddrive SSD

(Samsung 840 PRO)

256 GB

4 x 10

-6

s

513 MB/sec

Stat, RA, OL

€ 187

€ 0.73

Sec

Harddrive Magnetic

(Seagate ST2000DM001)

2000 GB

5.7 x 10

-4

s

153 MB/sec

Stat, RA, OL

€ 100

€ 0.05

Ter

Blank recordable

DVD-R disk

4.7 GB

9.8 x 10

-2

s

11 MB/sec

Stat, RA, OF,

WORM

€ 0.15

/Disk

€ 0.03

Ter

LTO-5 tape

(TDK - LTO Ultrium 5 Data Cartridge)

1500 GB

58 s

280 MB/sec

Stat, SA, OF

€ 15

/Tape

€ 0.01

2.1 Storage Media – Examples

Pri= Primary, Sec=Secondary, Ter=Tertiary

(14)

Hard drives are currently the standard for

large, cheap and persistent storage

Usually used as the

main storage

media

for most data in a DB

DBMS need to be

optimized

for

efficient disk storage and access

Data access needs to be as fast as possible

Often used data should be accessible with highest speed,

rarely needed data may take longer

Different data items needed for certain reoccurring tasks

should also be stored/accessed together

Relational Database Systems 2 – Wolf-Tilo Balke – Institut für Informationssysteme

14

(15)

Directionally

magnetization of a ferromagnetic

material

Realized on hard disk platters

Base platter made of non-magnetic aluminum or glass substrate

Magnetic grains worked into base platter to form

magnetic regions

Each region represents 1 Bit

Read head

can detect

magnetization direction of

each region

Write head

may change

direction

(16)

G

iant

M

agneto

r

esistance Effect (

GMR

)

Discovered 1988 simultaneously by Peter Grünberg and Albert

Fert

Both honored with the 2007 Nobel Prize in Physics

Allows the construction of efficient

read heads:

The electric resistance of an alternating ferro- and non-magnetic

material giantally changes within changing directed magnetic fields

http://www.research.ibm.com/research/demos/gmr/cyberdemo1.htm

Relational Database Systems 2 – Wolf-Tilo Balke – Institut für Informationssysteme

16

(17)

Perpendicular

Recording (used since 2005)

Longitudal Recording limited to ~200 Gb/inch

2

due to

superparamagnetic effect

Thermal energy may spontaneously change magnetic direction

Perpendicular recording allows for up to 1000 Gb/inch

2

Very simplified:

Align

magnetic field orthogonal

to surface instead of

parallel

Magnetic regions can be

smaller

(18)

Usage of magnetic

grains

instead of continuous

magnetic material

Between magnetic direction transitions,

Neel Spikes

are

formed

Areas of unsure magnetic direction

Neel Spikes are larger for continuous materials

Magnetic regions can be smaller as the transition width can be

reduced

Relational Database Systems 2 – Wolf-Tilo Balke – Institut für Informationssysteme

18

(19)

A hard disk is made up of multiple

double-sided platters

Platter sides are called

surfaces

Platters are fixed on main

spindle

and rotate at equal and constant speed

(common: 5400 rpm / 7200 rpm)

Each surface has it’s own

read

and

write head

Heads are attached to

arms

Arms can position heads

along the surface

Heads cannot move

inde-pendently

Heads have no contact

to surface and hover on

top of an air bearing

(20)

Each surface is divided into circular

tracks

Some disks may use spirals

All tracks of all surfaces with the same diameter

are called

cylinder

Data within the same cylinder can be accessed very

efficiently

Relational Database Systems 2 – Wolf-Tilo Balke – Institut für Informationssysteme

20

EN 13.2

(21)

Each track is subdivided into

sectors

of equal

capacity

a) Fixed angle sector subdivision

Same number of sectors per track, changing density,

constant speed

b) Fixed data density

Outer tracks have more sectors than inner tracks

Transfer speed higher on

outer tracks

Adjacent sectors can be

(22)

Hard drives are not completely reliable!

Drives do

fail

Means for physical failure recovery are necessary

Backups

Redundancy

Hard drives age and

wear

down.

Wear significantly increases by:

Contact

cycles (head parking)

Spindle start-stop

Power-on

hours

Operation outside ideal environment

Temperature too low/high

Unstable voltage

Relational Database Systems 2 – Wolf-Tilo Balke – Institut für Informationssysteme

22

2.2 HD – Reliability

(23)

Reliability

measures are

statistical

values assuming

certain usage patterns

Desktop usage (all per year): 2,400 hours, 10,000 motor

start/stops, 25°C temperature

Server usage (all per year): 8,760 hours, 250 motor start/stops,

40°C temperature

Non-Recoverable read errors:

A sector on the surface

cannot be read anymore – the data is lost

Desktop disk: 1 per 10

14

read bits, Server: 1 per 10

15

read bits

Disk can detect this!

Maximum

contact cycles

: Maximum number of allowed

head contacts (parking)

(24)

Mean Time Between Failure (

MTBF

): Statistically

anticipated time for a large disk population failing to

50%

Drive manufactures usually use optimistic simulations to

guess the MTBF

Desktop: 0.7 million hours (80 years), Server: 1.2 million

hours (137 years) –

Manufacturers values

Annualized Failure Rate (

AFR

): Probability of a failure

per year based on MTBF

AFR = OperatingHoursPerYear / MTBF

hours

Desktop: 0.34%, Server: 0.73%

Relational Database Systems 2 – Wolf-Tilo Balke – Institut für Informationssysteme

24

2.2 HD – Reliability

(25)

Failure rate during a hard disks lifespan is not constant

Can be better modeled by the “

bathtub curve

” having

3 components

Infant Mortality Rate

Wear Out Failures

Random Failures

(26)

Report by Google

100,000 consumer grade disks

(80-400GB, ATA Interface,

5,400-7,200 RPM)

Results (among others)

Drives fail often!

There is infant mortality

High usage increases infant

mortality, but not later failure

rates

Observed AFR is around

7%

and

MTBF

16.6 years

!

Relational Database Systems 2 – Wolf-Tilo Balke – Institut für

Informationssysteme

26

Failure trends in a large disk drive population

E. Pinheiro, W.-D. Weber, L. A. Barroso 5th USENIX conference on File and Storage Technologies, 2007

2.2 Real World Failure Rates

(27)

Seagate ST32000641AS 2 TB (Desktop Harddrive, 2011)

Manufacturer’s specifications

2.2 HD – Example Specs

Specification

Value

Capacity

2 TB

Platters

4

Heads

8

Cylinders

16,383

Sectors per track

63

Bytes per sector

512

Spindle Speed

7200 RPM

MTBF

85 years

(28)

Assume a storage need of 100 TB. Only following

HDs are available

Capacity:

1 TB

capacity each

MTBF: 100,000 hours each (ca.

11 years

)

Consider using 100 of these disks independently

(w/o RAID).

Total Storage: 100,000 GB = 100 TB

MTBF: 1,000 hours (ca.

42 days

)

THIS IS BAD!

More sophisticated ways of using multiple disks are

needed

Relational Database Systems 2 – Wolf-Tilo Balke – Institut für Informationssysteme

28

(29)

Alternative to hard-drives:

SSD

Use microchips which retain data in non-volatile

memory chips

and contain

no moving parts

Use the same interface as hard disk drives

Easy replacement in most applications possible

Key components

Memory

Controller

(30)

Flash-Memory

Most SSDs use NAND-based flash memory

Retains memory even without power

Slower than DRAM solutions

Single-level cell versus multi-level cell

Wears down!

DRAM

Use volatile Random Access Memory

Ultrafast data access (< 10 microseconds)

Sometimes use internal battery or external power device to

ensure data persistence

Only for Applications that require even faster access, but do

not need data persistence after power loss

Relational Database Systems 2 – Wolf-Tilo Balke – Institut für Informationssysteme

30

2.2 Memory

(31)

The controller is an embedded processor

Incorporates the electronics that bridge the

NAND memory components to the host

computer

Some of its functions

error correction,

wear leveling

,

bad block

mapping

, read and write caching, encryption, garbage

collection

(32)

Advantages

Low access time and latency

No moving parts

shock resistant

MTBF about 2 million hours

Lighter and more energy-efficient than HDDs

Disadvantages

Divided into blocks/pages

If one byte changes the whole page has to be written

The old page will be marked as stale

Only whole blocks can be deleted

Limited ability of being rewritten (between 3,000 and 100,000

cycles per page)

Wear leveling algorithms assure that write operations are equally

distributed to the pages

Relational Database Systems 2 – Wolf-Tilo Balke – Institut für Informationssysteme

32

2.2 SSD – Summary

(33)

The

disk controller

organizes low level

access to the disk

e.g. head positioning, error checking, signal

processing

Usually integrated into the disk

Provides

unified

and

abstracted

interface

to access the disks (e.g. LBA)

Connects disk to a peripheral bus (e.g. IDE,

SCSI, FiberChannel, SAS)

The

host bus adapter

(HBA) bridges

between the peripheral bus and systems

internal bus (like PCIe, PCI)

Internal Bus usually integrated into systems

main board

Often confused as being the disk controller

DAS

(Directly Attached Storage)

2.2 HD – Controller

Disk

Controller

Host Bus

Adapter

Peripheral

Bus

Internal Bus

Inner System / Mainboard

(34)

Sectors can be logically grouped to

blocks

by the

operating system

Sectors in a block do not necessarily need to be

adjacent

e.g. NTFS defaults to 4 KiB per block

8 sectors on a modern disk

Hardware address

of a block is combination of

Cylinder number, surface number, block number

within track

Controller maps hardware address to

logical block

address

(LBA)

Relational Database Systems 2 – Wolf-Tilo Balke – Institut für Informationssysteme

34

2.2 HD – Controller

(35)

Disk controller transfers content of whole blocks to

buffer

Buffer resides in

primary storage

and can be accessed

efficiently

Time needed to transfer a random block (4KiB/Block on

ST3100034AS): (<10 msec)

Seek Time:

Time needed to position head to correct cylinder (<8

msec)

Latency (Rotational Delay):

Time until the correct block arrives

below the head (<0.14 msec)

Block Transfer Time:

Time to read all sectors of block (<0.01 msec)

Bulk Transfer Rate for

n

adjacent blocks (<20msec for

n

=10)

(36)

Locating data on a disk is a major

bottleneck

Try operating on data already in buffer

Aim for bulk transfer, avoid random block transfer

Relational Database Systems 2 – Wolf-Tilo Balke – Institut für Informationssysteme

36

2.2 HD – Controller

(37)

A single HD is often not sufficient

Limited capacity

Limited speed

Limited reliability

Idea: Combine multiple HD into a

RAID Array

(

R

edundant

A

rray of

I

ndependent

Disk

s)

RAID Array treats multiple hardware disks as a single

logical disk

More HDs for increased capacity

(38)

The RAID controller connects

to multiple hard disks

Disks are

virtualized

and

appear to be just one single

logical

disk

The RAID controller acts as an

extended specialized HBA

(Host Bus Adapter)

Still

DAS

(Directly Attached

Storage)

Relational Database Systems 2 – Wolf-Tilo Balke – Institut für Informationssysteme

38

2.3 RAID Controller

RAID Controller

Peripheral Bus

Internal Bus

Represented

as single

logical Disk

(39)

Mirroring

(or shadowing): Increases reliability by

complete redundancy

Idea: Mirror Disks are

exact copies

of original disk

Not space efficient

Read speed can be

n

times as fast, write speed does not

increase

Increases reliability. Assume

Two disks with a MTBF 11 years each

One original disk, one mirror disk

Assume disk failures are independent of

each other (unrealistic)

Disk replacement time of 10 hours

(40)

Striping:

Improve performance by parallelism

Idea: Distribute data among all disks for increased

performance

BitLevel Striping:

Split all bits of a byte to the disks

e.g. for 8 disks, write

i

-th bit to disk

i

Number of disk needs to be a power of 2

Each disk is involved in each access

Access rate does not increase

Read and write transfer speed linearly increases

Simultaneous accesses not possible

Good for speeding up few, sequential and large

accesses

Relational Database Systems 2 – Wolf-Tilo Balke – Institut für Informationssysteme

40

Silber 11.3

(41)

Block Level Striping:

Distribute blocks among

the disks

Only one disk is involved reading a specific block

Read and write speed of a single block not increased

Other disks still free to read/write other blocks

Read and write speed of multiple accesses increase

Good for large number of parallel accesses

(42)

Error Correction Codes:

Increase reliability

with computed redundancy

Hamming Codes

(~1940)

Can

detect

and

repair

1 bit errors within

a set of n

data bits

by computing k

parity bits

n

= 2

k

-

k

– 1

n=1, k=2;

n=4, k=3;

n = 11, k=4;

n = 26, k=5; …

Especially used for in-memory and tape error

correction

Media cannot detect errors autonomously

Not really used for hard drives anymore

Relational Database Systems 2 – Wolf-Tilo Balke – Institut für Informationssysteme

42

(43)

Interleaved Parity

(Reed-Solomon Algorithm on

GD(2) Galois Field)

Can

repair

1-bit errors (when the error is known)

Hard Disks can detect read errors themselves, no need

for complete Hamming codes

Basic Idea:

From

n

data pieces D

1

,…,D

n

compute a parity data D

p

by

combining data using logical

XOR

(eXclusive OR)

XOR

is associative and commutative

Important: A

XOR

B

XOR

B = A

i.e. D

p

= D

1

XOR

D

2

XOR

XOR

D

n

(44)

Interleaved Parity. Example:

A = 0101, B = 1100, C = 1011

P

=

0010

= A

XOR

B

XOR

C

C is lost.

P

= A

XOR

B

XOR

C

C =

P

XOR

A

XOR B

C =

A

XOR

B

XOR

C

XOR

A

XOR

B

C =

A

XOR

A

XOR

B

XOR

B

XOR

C

C = 0

XOR

C

C = 1011

Relational Database Systems 2 – Wolf-Tilo Balke – Institut für Informationssysteme

44

2.3 RAID Principles – Interleaved Parity

0101

XOR

1100

XOR

1011

P

0010

0010

XOR

0101

XOR

1100

C

1011

(45)

The 3 RAID principles can be combined in multiple ways

Not every combination is useful

This led to the definition of 7 core RAID

levels

RAID 0 – RAID 6

The most dominant levels are RAID 0, RAID 1, RAID 1+0, RAID 5

In following examples, assume

A MTBF

of 100,000 hours (11.42 years) per disk

A Mean Time to Repair (MTTR) of 6 hours

Failure rate is constant and failures between disks are independent

MTBF

raid

is the mean time to data loss within the raid if each failing

disk is replaced within the MTTR

D

is the number of drives in the RAID set

C=200 GB

is capacity of one disk, C

capacity of whole raid

(46)

Mean Time to Repair (

MTTR

)

MTTR = TimeToNotice + RebuildTime

Assume time to notice of 0.5 hours

Rebuild time

is the time for completely writing back lost

data

Assume disk capacity of 200GB

Write back speed of 10 MB/sec

Consisting of reading remaining disks

Computing parity / Reconstructing data

Rebuild time around 5.5 hours

During rebuild, a RAID is especially

vulnerable

MTTR = 6 hours

Relational Database Systems 2 – Wolf-Tilo Balke – Institut für Informationssysteme

46

(47)

File A (A1-Ax), File B (B1-Bx), File C (C1-Cx)

Raid 0

Block-Level-Striping

only

Increased parallel access and transfer speeds, reduced

reliability

All disks contain data (0% overhead)

Works with any number of disks

MTBF

raid

= MTBF

disk

/ D

4 disks:

MTBF

raid

= 2.86 years

C

raid

= 800 GB

(0 GB wasted (0%))

Common size:

2 disks

(48)

Raid 1

Mirroring

only

Increased reliability, increased read transfer speed, low space

efficiency

MTBF

raid

= MTBF

disk

D

/ (D! * MTTR

D-1

)

4 disks

:

MTBF

raid

= 2.2 trillion years

C

raid

= 200 GB

(600 GB wasted (75%))

Age of universe may be around 15 billion years…

Common size:

2 disks

MTBF

raid

= 95,130 years

C

raid

= 200 GB

(200 GB wasted (50%))

Relational Database Systems 2 – Wolf-Tilo Balke – Institut für Informationssysteme

48

2.3 RAID Levels

(49)

RAID 2

Not used anymore

in practice

was used in old mainframes

Bit-Level-Striping

Use Hamming Codes

Usually Hamming Code(7,4) – 4 data bits, 3 parity bits

Reliable 1-Bit error recovery (i.e. one disk may fail)

3 redundant disks per 4 data disks (75% overhead)

Ratio better for larger number of disks

MTBF

raid

= MTBF

disk

2

/ (D*(D-1) * MTTR)

7

disks (does not really make sense for 4 –

not comparable

to other values)

MTBF

= 4,530 years

(50)

RAID 3

Interleaved-Parity

Byte-Level-Striping

Dedicated Parity Disk

Bottleneck! Every write operation

needs to update the parity disk.

No parallel writes

1 redundant disk per

n

data disks

Overhead decreases with number of disks

while reliability decreases

25% overhead for 4 data disks

MTBF

raid

= MTBF

disk

2

/ (D*(D-1) * MTTR)

4 disks

MTBF

raid

= 15,854 years

C

raid

= 600 GB

(200 GB wasted (25%))

Relational Database Systems 2 – Wolf-Tilo Balke – Institut für Informationssysteme

50

2.3 RAID Levels

(51)

RAID 4

Block-Level Striping

As RAID 3 otherwise

4 disks

(common size)

MTBF

raid

= 15,854 years

C

raid

= 600 GB

(200 GB wasted (25%))

5 disks

(also common size)

MTBF

raid

= 9,513 years

C

raid

= 800 GB

(200 GB wasted (20%))

(52)

RAID 5

Parity is distributed

among the

hard disks

May allow for parallel block writes

As RAID 4 otherwise

Bottleneck when writing many files

smaller than a block

Whole parity block has to be read and

re-written for each minor write

Can recover from a single disk failure

MTBF

raid

and C

raid

as for RAID 3 & 4

Relational Database Systems 2 – Wolf-Tilo Balke – Institut für Informationssysteme

52

2.3 RAID Levels

(53)

RAID 6

Two independent parity blocks distributed

among the disks

May be implemented by parity on orthogonal data or by using Reed-Solomon on

GF(2

8

)

As RAID 5 otherwise

2 redundant disks per

n

data disks

Can recover from a

double disk failure

No vulnerability during single failure rebuild

Very suitable for larger arrays

Writer overhead due to more complicated parity computation

MTBF

raid

= MTBF

disk

3

/ (D*(D-2)*(D-1) * MTTR

2

)

4 disks

MTBF

raid

= 132 million years

C

raid

= 400 GB

(400 GB wasted (50%))

8 disks

(common)

(54)

Additionally, there are hybrid levels combing the core levels

RAID 0+1, RAID 1+0, RAID 5+0, RAID 5+1, RAID 6+6, …

Raid 1+0

Mirrored sets nested in a striped set

RAID 0 on sets of RAID 1 sets

Very high read and write transfer speeds

, increased reliability, low space efficiency,

limited maximum size

Most performant RAID combination

D

1

= Drives per RAID 1, D

0

=Number of RAID1 sets

MTBF

raid

= MTBF

disk

D1

/ (D

1

! * MTTR

D1-1

) / D

0

4 disks

: D

1

= 2, D

0

= 2

MTBF

raid

= 47,565 years

C

raid

= 400 GB

(400 GB wasted (50%))

6 disks

: D

1

= 2, D

0

= 3

MTBF

raid

= 31,706 years

C

raid

= 600 GB

(600 GB wasted (50%))

Relational Database Systems 2 – Wolf-Tilo Balke – Institut für Informationssysteme

54

(55)

RAIDs controllers

directly

connect storage to the

system bus

Storage available to only one system/ server/ application

Number of disks is limited

Consumer grade RAID: 2-4 disks

Enterprise grade RAID: 8-24+ disks

Solutions

NAS

(Network Attached Storage): Provide abstracted

file

systems

via network (

software solution

)

SAN

(Storage Area Network): Virtualized logical storage

within a specialized network on

block level

(

hardware

(56)

Before discussing NAS, we need file

systems

A file systems is a

software

for

abstracting

file operations on a

logical storage device

Files are a collection of binary data

Creating, reading, writing, deleting, finding,

organizing

How does a file access translate into

top-level operations on a logical

storage device?

e.g. which blocks have to be read/written?

Bridge between

application software

and (abstracted)

hardware

Relational Database Systems 2 – Wolf-Tilo Balke – Institut für Informationssysteme

56

2.4 File Systems vs. Raw Devices

Application

Software

File System

Logical

Storage

(57)

Raw Devices access allows

applications to bypass the OS and

the file system

Application may directly tune

aspects of physical storage

There is still the hard drive

controller…so, its not really direct

May lead to very efficient

implementations

2.4 File Systems vs. Raw Devices

Application

Software

(58)

Idea: Provide a remote

file system

using already

available

network

infrastructure

NAS

: Network Attached Storage

Use specialized

network protocols

(e.g. CIFS, NFS,

FTP, etc)

Easiest case: File Server (e.g. Linux+Samba)

Advantages:

Easy

to setup, easy to use,

cheap

infrastructure

Allows

sharing

of storage among several systems

Abstracts on file system level (easy for most

applications)

Disadvantages

Inefficient

and slow

large protocol and processing overhead

Abstracts on file system level (not suitable for special

purposes like raw devices or storage virtualization)

Relational Database Systems 2 – Wolf-Tilo Balke – Institut für Informationssysteme

58

2.4 NAS – Network Attached Storage

Application

Software

File System

Logical

Storage

Network

NAS

Ser

ver

(59)

SANs offer specialized

high-speed networks

for storage devices

Usually uses local FibreChannel

networks

Remote location may be connected via Ethernet

or IP-WAN (Internet)

Network uses specialized storage protocols

iFCP (SCSI on FiberChannel)

iSCSI (SCSI on TCP/IP)

HyperSCSI (SCSI on raw ethernet)

SANs provide

raw block level

access to

logical storage

devices

Logical disks of any size can be offered by the SAN

For a client system using a logical disk, it might

appears like a local disk or RAID

Client system has full control over file systems on

2.4 SAN – Storage Area Network

Application

Software

File System

(60)

Relational Database Systems 2 – Wolf-Tilo Balke – Institut für Informationssysteme

60

2.4 SAN – Storage Area Network

SAN HBA

SAN/RAID HBA

Peripheral Bus (SCSI, SAS, etc.)

SAN Switch

SAN Switch

SAN HBA

SAN Bus (iFCP)

SAN

HBA

SAN

HBA

SAN Switch

NAS Protocol

(CIFS)

Ethernet

Network

WAN-SAN Bus (HyperSCSI)

NAS

Head

(61)

Advantages:

Very efficient

Highly optimized local network infrastructure

Optimized protocols with low overhead

Very flexible

(any number of systems may use any number of

disks at any location)

Helps for disaster protection

SAN can transparently span to even remote locations

May also employ

NAS heads

for NAS-like behavior

Disadvantages

(62)

There are different types of storage

Usually, there is a storage hierarchy

Faster, smaller, more expensive storage

Slower, bigger, less expensive storage

Hard drives are currently the most popular media

Mechanical device

High sequential transfer rates,

Bad random access times, low random transfer rates

Prone to failure

DBMS must be optimized for the used storage

devices!

Relational Database Systems 2 – Wolf-Tilo Balke – Institut für Informationssysteme

62

2 Physical Storage

(63)

Access Pathes

Physical Data Access

Index Structures

Physical Tuning

References

Related documents

The Cat and the Mouse New Vocabulary agree hungry mouth waters stare surprised travel yawn.. Suggested Background Information, Activities and Questions

fender. The rubber washer can provide vibration isola- tion and seems to have been part of original re- mote reservoir installations. Lock washers should be used behind the

In a surprise move, the Central Bank of Peru (BCRP) reduced its benchmark interest rate by 25 basis points (bps) to 3.25% in mid-January following disappointing economic growth data

Monte Carlo simulations were used to assess organ and effective doses resulting from default scan protocols (head, thorax, and pelvis) implemented in V1.6 and

Although many parents understand the importance of talking to their kids about sex, far fewer seem to realize just how important it is for them to talk to their children about

Failure to thrive- inadequate weight gain or deviation from growth standards for age and sex, based on standardized National Centers for Health Statistics (NCHS) growth

Understood as public proposals for how African Americans could employ forms of self- help to improve some dimension of black life, uplift appeals marked a rich site of rhetorical

Given these observa- tions, we propose that a copula approach provides a promising way forward across research domains for a more refined description of coupling between nodes, and