Introduction to I/O and
Disk Management
Secondary Storage Management
Disks — just like memory, only different
Why have disks?
- Memory is small. Disks are large.
  - Short-term storage for memory contents (e.g., swap space).
  - Reduce what must be kept in memory (e.g., code pages).
- Memory is volatile. Disks are forever (?!)
How to approach persistent storage
Disks first, then file systems.
- Bottom up.
- Focus on the device characteristics that dominate performance or reliability (they become the focus of the software).
Disk capacity and processor performance are the crown jewels of computer engineering.
File systems have won, but at what cost?
- iPod, iPhone, TiVo, PDAs, laptops, desktops all have file systems.
- Google is made possible by a file system.
- File systems rock because they are:
  - Persistent.
  - Hierarchical (non-cyclical (mostly)).
  - Rich in metadata (remember cassette tapes?)
  - Indexable (hmmm, a weak point?)
The price is complexity of implementation.
Different types of disks
Advanced Technology Attachment (ATA)
- Standard interface for connecting storage devices (e.g., hard drives and CD-ROM drives)
- Also referred to as IDE (Integrated Drive Electronics), ATAPI, and UDMA.
- ATA standards only allow cable lengths in the range of 18 to 36 inches. CHEAP.
Small Computer System Interface (SCSI)
- Requires a controller on the computer and on the disk.
- Controller commands are sophisticated and allow reordering.
USB or FireWire connections to ATA disks
Different types of disks
Bandwidth ratings
- These are unachievable: 50 MB/s is the max off the platters.
- Peak rate refers to transfer from the disk device's memory cache.

Mode    Speed
UDMA0   16.7 MB/s
UDMA1   25.0 MB/s
UDMA2   33.3 MB/s
UDMA3   44.4 MB/s
UDMA4   66.7 MB/s
UDMA5   100.0 MB/s
UDMA6   133.0 MB/s
SCSI    320 MB/s

SATA II (serial ATA)
- 3 Gb/s (still only 50 MB/s off the platter, so why do we care?)
- Cables are smaller and can be longer than pATA.
- Enables multiple drives on the same bus
Flash: An upcoming technology
Flash memory gaining popularity
- One Laptop per Child has 1 GB of flash (no disk)
- Vista supports flash as an accelerator
- Future is hybrid flash/disk, or just flash?
- Erased a block at a time (100,000 write-erase cycles)
- Pages are 512 bytes or 2,048 bytes
- Read 18 MB/s, write 15 MB/s
- Lower power than (spinning) disk
[Chart: flash vs. disk cost, GB per dollar]
Anatomy of a Disk
Basic components:
- Block/Sector
- Track
- Cylinder
- Head
- Platter
- Surface
- Spindle
[Diagram: disk structure, the big picture: sectors 0..s–1 on each track of each surface]
Anatomy of a Disk
Seagate 73.4 GB Fibre Channel Ultra 160 SCSI disk
Specs:
- 12 platters, 24 heads, 12 arms
- 14,100 tracks, variable # of sectors/track, 512 bytes/sector
- 10,000 RPM; average rotational latency: 2.99 ms
- Seek times: track-to-track 0.6/0.9 ms, average 5.6/6.2 ms (includes acceleration and settle time)
- 160-200 MB/s peak transfer rate, 1-8K cache
Anatomy of a Disk
Example: Seagate Cheetah ST373405LC (March 2002)
Specs:
- Capacity: 73 GB
- 8 surfaces per pack
- # cylinders: 29,549
Disk Operations
Read/Write operations
Present disk with a sector address
- Old: DA = (drive, surface, track, sector)
- New: Logical block address (LBA)
Heads moved to the appropriate track
- seek time
- settle time
The appropriate head is enabled
Wait for the sector to appear under the head
- "rotational latency"
Read/write the sector
- "transfer time"
Read time = seek time + rotational latency + transfer time
          = 5.6 ms + 2.99 ms + 0.014 ms
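The read-time sum above can be checked directly from the drive's specs. A minimal sketch, using the Seagate numbers from these slides (5.6 ms average seek, 10,000 RPM) and assuming 424 sectors on the target track (the middle-track figure used in a later example):

```python
# Disk access time = seek + rotational latency + transfer (one 512-byte sector).
RPM = 10_000
revolution_ms = 60_000 / RPM                   # 6.0 ms per revolution
avg_seek_ms = 5.6                              # average seek from the spec
rotational_latency_ms = revolution_ms / 2      # on average, half a revolution
sectors_per_track = 424                        # assumed track geometry

# One sector passes under the head in 1/424th of a revolution.
transfer_ms = revolution_ms / sectors_per_track

read_time_ms = avg_seek_ms + rotational_latency_ms + transfer_ms
print(f"rotational latency: {rotational_latency_ms:.2f} ms")   # ~3 ms (spec: 2.99)
print(f"transfer time:      {transfer_ms:.3f} ms")             # ~0.014 ms
print(f"total read time:    {read_time_ms:.2f} ms")
```

Note how thoroughly seek and rotation dominate: the actual data transfer is two orders of magnitude cheaper than positioning the head.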
Disk access latency
Which component of disk access time is the longest?
- A. Rotational latency
- B. Transfer latency
Disk Addressing
Software wants a simple "disk virtual address space"
consisting of a linear array of sectors.
- Sectors numbered 1..N, each 512 bytes (typical size).
- Writing 8 surfaces at a time writes a 4 KB page.
Hardware has structure:
- Which platter?
- Which track within the platter?
- Which sector within the track?
The hardware structure affects latency.
¾ Reading from sectors in the same track is fast.
¾ Reading from the same cylinder group is faster than seeking.
Disk Addressing
Mapping a 3-D structure (surface, track, sector) to a 1-D structure (linear block numbers 0..n)
[Diagram: surfaces 0..2p–1, tracks 0..t–1, sectors 0..s–1 flattened into blocks 0..n]
Mapping criteria
- Block n+1 should be as "close" as possible to block n
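One plausible 3-D to 1-D mapping satisfying this criterion is a sketch like the following: fill every sector of a track, then every surface of a cylinder, then move to the next cylinder, so consecutive blocks stay close. The geometry constants are assumptions for illustration; real drives use zoned recording and remap blocks internally.

```python
# Simplified (cylinder, surface, sector) <-> LBA mapping.
SURFACES = 8              # hypothetical geometry
SECTORS_PER_TRACK = 424   # hypothetical; real tracks vary by zone

def to_lba(cylinder: int, surface: int, sector: int) -> int:
    """(cylinder, surface, sector) -> logical block address."""
    return (cylinder * SURFACES + surface) * SECTORS_PER_TRACK + sector

def from_lba(lba: int) -> tuple[int, int, int]:
    """Logical block address -> (cylinder, surface, sector)."""
    sector = lba % SECTORS_PER_TRACK
    track = lba // SECTORS_PER_TRACK   # track index across all surfaces
    return track // SURFACES, track % SURFACES, sector

# Round-trip sanity check.
assert from_lba(to_lba(3, 5, 17)) == (3, 5, 17)
```

With this layout, blocks n and n+1 differ only in sector number until a track boundary, then only in surface until a cylinder boundary, so a sequential scan seeks as rarely as possible.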
The Impact of File Mappings
File access times: Contiguous allocation
Array elements map to contiguous sectors on disk
- Case 1: Elements map to the middle tracks of the platter (424 sectors/track)
  Read time = seek + latency + transfer
            = 5.6 + 3.0 + (2,048 / 424) × 6.0
            = 8.6 + 29.0 = 37.6 ms
Transfer time = (number of revolutions required to transfer the data) × (time per revolution). The seek and latency terms are constant; the transfer term varies with sectors/track.
The Impact of File Mappings
File access times: Contiguous allocation
Array elements map to contiguous sectors on disk
- Case 1: Elements map to the middle tracks of the platter (424 sectors/track)
  5.6 + 3.0 + (2,048 / 424) × 6.0 = 8.6 + 29.0 = 37.6 ms
- Case 2: Elements map to the inner tracks of the platter (212 sectors/track)
  5.6 + 3.0 + (2,048 / 212) × 6.0 = 8.6 + 58.0 = 66.6 ms
- Case 3: Elements map to the outer tracks of the platter (636 sectors/track)
  5.6 + 3.0 + (2,048 / 636) × 6.0 = 8.6 + 19.3 = 27.9 ms

Disk Addressing
The impact of file mappings: Non-contiguous allocation
Array elements map to random sectors on disk
- Each sector access results in a disk seek:
  2,048 × (5.6 + 3.0) ms = 17.6 seconds
[Diagram: file blocks 0..n scattered across random surfaces, tracks, and sectors]
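The contiguous and random cases above are the same arithmetic with different seek counts. A small sketch using the slides' numbers (5.6 ms seek, 3.0 ms latency, 6.0 ms per revolution, 2,048 sectors to read):

```python
# Contiguous vs. random layout for a 2,048-sector file.
avg_seek_ms, latency_ms, revolution_ms = 5.6, 3.0, 6.0
n_sectors = 2_048

def contiguous_ms(sectors_per_track: int) -> float:
    # One seek + one rotational latency, then (n / sectors_per_track)
    # full revolutions of pure transfer.
    transfer = (n_sectors / sectors_per_track) * revolution_ms
    return avg_seek_ms + latency_ms + transfer

def random_ms() -> float:
    # Every sector pays a full seek + rotational latency.
    return n_sectors * (avg_seek_ms + latency_ms)

print(f"middle tracks (424 sectors/track): {contiguous_ms(424):.1f} ms")  # ~37.6
print(f"inner tracks  (212 sectors/track): {contiguous_ms(212):.1f} ms")  # ~66.6
print(f"random placement: {random_ms() / 1000:.1f} s")                    # ~17.6 s
```

The random layout is roughly 500x slower for the same data, which is why block allocation policy (and defragmentation) matters so much.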
Practical Knowledge
If the video you are playing off your hard drive skips,
defragment your file system.
OS block allocation policy is complicated.
Defragmentation allows the OS to revisit layout with
global information.
Defragmentation Decisions
Files written when the disk is nearly full are more
likely to be fragmented.
- A. True
- B. False
In a multiprogramming/timesharing environment, a queue
of disk I/O requests can form
Disk Head Scheduling
Maximizing disk throughput
[Diagram: CPU issues I/O requests addressed by (surface, track, sector); a queue forms at the disk alongside other I/O]
Disk Head Scheduling
Examples
Assume a queue of requests exists to read/write tracks:
- 83, 72, 14, 147, 16, 150
and the head is on track 65
[Track axis: 0 25 50 65 75 100 125 150]
Disk Head Scheduling
Minimizing head movement
Greedy scheduling: shortest seek time first (SSTF)
- Rearrange queue from: 150, 16, 147, 14, 72, 83
  To: 72, 83, 147, 150, 16, 14
[Track axis: 0 25 50 65 75 100 125 150; head starts at 65]
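The greedy rearrangement above can be sketched in a few lines: repeatedly pick the pending request nearest the current head position.

```python
# Shortest-seek-time-first (SSTF) disk head scheduling.
def sstf(head: int, requests: list[int]) -> tuple[list[int], int]:
    """Return (service order, total tracks moved)."""
    pending = list(requests)
    order, moved = [], 0
    while pending:
        # Greedy step: nearest pending track to the current head position.
        nxt = min(pending, key=lambda t: abs(t - head))
        moved += abs(nxt - head)
        head = nxt
        pending.remove(nxt)
        order.append(nxt)
    return order, moved

order, moved = sstf(65, [150, 16, 147, 14, 72, 83])
print(order, moved)   # [72, 83, 147, 150, 16, 14] 221
```

SSTF minimizes each individual seek, not the total: it can starve far-away requests if nearby ones keep arriving, which motivates SCAN below.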
Disk Head Scheduling
SCAN scheduling
- Rearrange queue from: 150, 16, 147, 14, 72, 83
  To: 16, 14, 72, 83, 147, 150
"SCAN" scheduling: Move the head in one direction until all requests in that direction have been serviced, and then reverse. Also called elevator scheduling.
Moves the head 187 tracks.
[Track axis: 0 25 50 75 100 125 150; head starts at 65, moving toward lower tracks]
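A sketch of the elevator pass, matching the slide's numbers. Note one assumption: this version reverses at the last pending request rather than traveling all the way to the disk edge (strictly speaking, that is the LOOK variant), which is what reproduces the 187-track figure.

```python
# SCAN/elevator scheduling: sweep one direction, then the other.
def scan(head: int, requests: list[int], ascending: bool = False) -> tuple[list[int], int]:
    """Return (service order, total tracks moved)."""
    below = sorted(t for t in requests if t <= head)
    above = sorted(t for t in requests if t > head)
    # Sweep down first (descending), then sweep back up, or vice versa.
    order = (above + below[::-1]) if ascending else (below[::-1] + above)
    moved = sum(abs(b - a) for a, b in zip([head] + order, order))
    return order, moved

order, moved = scan(65, [150, 16, 147, 14, 72, 83])
print(order, moved)   # [16, 14, 72, 83, 147, 150] 187
```

Compared with SSTF's 221 tracks on the same queue, the single sweep both moves the head less here and bounds how long any request can wait.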
Disk Head Scheduling
Other variations
C-SCAN scheduling ("Circular" SCAN)
- Move the head in one direction until an edge of the disk is reached and then reset to the opposite edge
[Track axis: 0 25 50 75 100 125 150]
LOOK scheduling
- Like SCAN, but reverse direction at the last pending request instead of at the disk edge
Disk Performance
Disk partitioning
Disks are typically partitioned to minimize the largest possible seek time
- A partition is a collection of cylinders
- Each partition is a logically separate disk
[Diagram: Partition A | Partition B, each a contiguous group of cylinders]
Disks – Technology Trends
Disks are getting smaller in size
- Smaller → spin faster, shorter distance for the head to travel, lighter weight
Disks are getting denser
- More bits/square inch → small disks with large capacities
Disks are getting cheaper
- 2x/year since 1991
Disks are getting faster
- Seek time, rotational latency: 5-10%/year (2-3x per decade)
- Bandwidth: 20-30%/year (~10x per decade)
Management of Multiple Disks
Using multiple disks to increase disk throughput
Disk striping (RAID-0)
- Blocks broken into sub-blocks that are stored on separate disks, similar to memory interleaving
- Provides higher disk bandwidth through a larger effective block size
[Diagram: OS disk blocks 0 1 2 3 map round-robin to physical blocks on four disks; stripes 8 9 10 11 and 12 13 14 15 shown likewise]
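The round-robin placement can be sketched as a two-line mapping, assuming the 4-disk layout in the diagram:

```python
# RAID-0 striping: logical block i lives on disk (i mod N) at offset (i div N).
N_DISKS = 4   # assumed array width, matching the slide's example

def stripe(block: int) -> tuple[int, int]:
    """logical block -> (disk index, physical block on that disk)"""
    return block % N_DISKS, block // N_DISKS

# Logical blocks 0..3 land on four different disks, so a 4-block
# request can be serviced by all spindles in parallel.
print([stripe(b) for b in range(4)])   # [(0, 0), (1, 0), (2, 0), (3, 0)]
```

This is exactly the memory-interleaving trick applied to spindles: adjacent logical blocks hit different devices, multiplying sequential bandwidth by the array width.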
Management of Multiple Disks
Using multiple disks to improve reliability & availability
To increase the reliability of the disk, redundancy
must be introduced
- Simple scheme: disk mirroring (RAID-1)
- Write to both disks, read from either.
Who controls the RAID?
Hardware
- (+) Tend to be reliable (hardware implementers test)
- (+) Offload parity computation from the CPU
- (+) A bit faster for rewrite-intensive workloads
- (-) Dependent on the card for recovery (replacements?)
- (-) Must buy a card (for the PCI bus)
- (-) Serial reconstruction of a lost disk
Software
- (-) Software has bugs
- (-) Ties up the CPU to compute parity
- (+) Other OS instances might be able to recover
- (+) No additional cost
- (+) Parallel reconstruction of a lost disk
Management of Multiple Disks
Using multiple disks to increase disk throughput
RAID (redundant array of inexpensive disks)
- Byte-wise striping of the disks (RAID-3), or block-wise striping of the disks (RAID-0/4/5)
- Provides better performance and reliability
Improving Reliability and Availability
RAID-4: Block-interleaved parity striping
- Allows one to recover from the crash of any one disk
- Example: storing blocks with values 8, 9, 10, 11, 12, 13, 14, 15, 0, 1, 2, 3

RAID-4 layout (values in 4-bit binary; each parity block is the XOR of the data blocks in its stripe):

Disk 1       Disk 2       Disk 3       Parity Disk
 8 = 1000     9 = 1001    10 = 1010    1011
11 = 1011    12 = 1100    13 = 1101    1010
14 = 1110    15 = 1111     0 = 0000    0001
 1 = 0001     2 = 0010     3 = 0011    0000

Improving Reliability and Availability
RAID-5: Block-interleaved parity striping, with the parity blocks rotated across all disks rather than held on a dedicated parity disk
[Diagram: Disk 1 | Disk 2 | Disk 3 | Disk 4 | Disk 5]
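The parity trick behind both RAID-4 and RAID-5 is plain XOR, and recovery is the same XOR run over the survivors. A minimal sketch using the first stripe of the RAID-4 example (data blocks 8, 9, 10):

```python
# Parity-based recovery: parity = XOR of the data blocks in a stripe,
# so any single missing block is the XOR of everything that remains.
from functools import reduce

def parity(blocks: list[int]) -> int:
    """XOR all blocks together."""
    return reduce(lambda a, b: a ^ b, blocks)

stripe = [8, 9, 10]          # 1000 ^ 1001 ^ 1010
p = parity(stripe)           # parity block for this stripe
assert p == 0b1011

# Disk 2 dies: recompute its block from the other data blocks plus parity.
recovered = parity([stripe[0], stripe[2], p])
assert recovered == stripe[1]
print(f"parity={p:04b}, recovered block={recovered}")
```

XOR works bit-for-bit on whole sectors the same way it works on these 4-bit values, which is why one parity disk protects any single-disk failure, but not two at once.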