CSC 2233:
Topics in Computer System Performance and Reliability: Storage Systems!
Note: some of the slides in today’s lecture are borrowed from a course taught by Greg Ganger and Garth Gibson at Carnegie Mellon University
Who am I?
What makes storage systems so cool?
1. Combines so many topic areas:
hardware meets OS meets networking meets distributed systems meets security meets AI meets HCI…
2. This is where great jobs are!
Designers and implementers still needed
not just testing :)
Continuing growth area for the future
The Internet is a network, but the web is a storage system
Strong existing companies: EMC, NetApp, …
Core competency for Internet services: Google, Microsoft, Amazon, …
and still support for start-ups
3. Still so much room to contribute:
performance actually matters here
in fact, it dominates other parts of system performance in many cases
… and reliability too
storage management wide open
and storage is starting to “take over” computation
Big data …
Lots is and will be happening
Solid state drives and other technologies?
Amdahl’s Law
Speedup limited to fraction improved
obvious, but fundamental, observation
[Figure: two bars of 50 units each; the BLUE bar shrinks from 50 to 5]
90% reduction in BLUE yields only 45% reduction in total
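The arithmetic behind the slide's 50/50 example can be checked with a short sketch. The function and variable names below are illustrative and not part of the lecture.

```python
def total_time_after(times, component, reduction):
    """Return total time after cutting one component's time by `reduction` (0..1)."""
    improved = dict(times)
    improved[component] = times[component] * (1.0 - reduction)
    return sum(improved.values())


# The slide's example: two components of 50 time units each.
before = {"blue": 50.0, "other": 50.0}
after = total_time_after(before, "blue", 0.90)   # blue shrinks from 50 to 5

total_before = sum(before.values())
total_reduction = 1.0 - after / total_before     # fraction of total time saved
speedup = total_before / after                   # overall speedup factor

print(f"total time: {total_before:.0f} -> {after:.0f}")
print(f"total reduction: {total_reduction:.0%}, speedup: {speedup:.2f}x")
```

Even if the BLUE part were eliminated entirely, the total could drop by at most 50%, because the other half is untouched.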
Technology Trends
[Figure: technology trends by year, 2000 to 2010; values normalized relative to 2000 on a log scale (1, 10, 100); series: CPU Performance, Memory Bandwidth, Disk Bandwidth, Network Bandwidth, Network Latency, Disk Latency]
Consequence: storage performance dominates
[Figure: two bars on a 0-100 scale, each split into CPU Time and I/O Time]
Storage systems: fun quotes
“I/O certainly has been lagging in the last decade”
Seymour Cray, 1976
“Also, I/O needs a lot of work”
David Kuck, 1988
“In 3 to 5 years, we will start seeing servers as peripherals to storage”
SUN Chief Technology Officer, 1998
“Scalable I/O is perhaps the most overlooked area of high-performance computing R&D”
Suggested R&D topic report for 2005-2009
Logistics & Administratives
Class time: Thu 10am – 12pm
Office hours:
By appointment
Class web page
www.cs.toronto.edu/~bianca/csc2233.html
Grading
30% class participation
Participation in class discussions
(Read all papers prior to class)
Class presentation of research paper
70% class project
No exams, no homework, no paper summaries
Class project
Can be done in a team of two or alone
Start looking for a partner now!
On a research project you pick
I will suggest possible projects (see course web page)
You can propose your own
Start thinking about it soon, proposal due in ~3 weeks
Output: workshop quality research paper (10-12 pages)
Even better: conference quality paper
Use the LaTeX template on the course web page
All reports will be published as tech reports
I will help you get there --- multiple milestones:
Project proposal
Related work
Status reports
Final report
And meetings with instructor
Topic of class project
Project topic must be related to the topic of the class
Is it OK to have overlap with my research / my course project in another course?
You cannot get academic credit for the same piece of work twice
Paper presentation
Each of you will present one or two papers in class
Format of the presentation:
30 min presentation of paper
5-15 min paper review
Good points
Bad points
10 min class discussion that you lead!
Prepare questions!
Paper presentation
What I do not want:
A long laundry list of all things the paper did
What I do want:
A lecture style presentation of the paper
Including background material your fellow classmates might need to understand the paper
A critical discussion of the paper
Strengths & Weaknesses
Prepare questions!
Purpose of presentation
Wrong answers:
“To give a verbal version of the paper, cramming all its content into 30 min”
“To impress people with your technical depth and thoroughness”
In fact, no one cares about these things
The goal is to filter out the main points of the paper and present them well
By the end, everybody in the audience should remember 2-3 take-home messages
What’s on each slide?
Each slide should have one basic point
There should NOT be tons of text
Use sentence fragments
Use pictures everywhere you possibly can!
A picture says more than 1000 words
Saves text and thus slides
Much easier to process
Rest of today: Some review …
What are storage systems all about?
Memory/storage hierarchy
Memory/storage hierarchies
Balancing performance with cost
Small memories are fast but expensive
Large memories are slow but cheap
Exploit locality to get the best of both worlds
locality = re-use/nearness of accesses
allows most accesses to use small, fast memory
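A rough sketch of why locality matters: the average access time of a two-level hierarchy is a weighted mix of the fast and slow levels. The latencies below are assumed round numbers chosen for illustration, not values from the lecture's table.

```python
def average_access_time(hit_ratio, fast_ns, slow_ns):
    """Expected access time when a fraction `hit_ratio` of accesses hit the fast level."""
    return hit_ratio * fast_ns + (1.0 - hit_ratio) * slow_ns


FAST_NS = 100.0          # assumed DRAM-like latency (illustrative only)
SLOW_NS = 10_000_000.0   # assumed disk-like latency of 10 ms (illustrative only)

for hit_ratio in (0.90, 0.99, 0.999):
    t = average_access_time(hit_ratio, FAST_NS, SLOW_NS)
    print(f"hit ratio {hit_ratio:.3f}: average {t / 1000:.1f} us")
```

With a gap this large between the two levels, even a small miss ratio dominates the average, which is why the hit ratio of the small, fast memory matters so much.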
[Figure: memory/storage hierarchy, labeled by Capacity and Performance]
Example memory hierarchy values
Notice the huge access time gap between DRAM and disk
Where will SSDs go?
What are storage systems all about?
Memory/storage hierarchy
Combining many technologies to balance costs/benefits
No longer the focal point of storage system design
Still important though
Maybe more so with new technologies arriving on the market
Persistence
Storing data for lengthy periods of time
To be useful, it must also be possible to find it again later
this brings in data organization, consistency, and management issues
This is where the serious action is
Why persistence is important
Some statistics:
Among companies who lose data in a disaster, 50% never re-open and 90% are out of business within two years
Even smaller incidents can be costly
Reproducing some tens of megabytes of accounting data can take several weeks and cost tens of thousands of dollars
Bad PR!
What is a storage system: Big Picture
[Figure: an Application hands data objects Bob1 through Bob4 to the Storage System]
Application gives data objects & their IDs to storage
The storage system keeps the data objects and returns one upon request (by ID)
Storage Systems & Interfaces
What is a “Storage System”?
Hardware (devices, controllers, interconnect) and software (file system, device drivers, firmware) dedicated to providing management of and access to persistent storage.
One view: defined by collection of interfaces
Program -> File system -> Device driver -> I/O controller -> Physical media
(high level of abstraction -> no abstraction)
Storage Software Interfaces
Understands files and
OS sees storage as linear array of blocks
OS’s view of storage device
Common disk block size: 512 bytes
Number of blocks: device capacity / block size
Common OS-to-storage requests defined by few fields
R/W, block #, # of blocks, memory source/dest
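The request fields listed above can be sketched as a small record handed to a toy in-memory device. `BlockRequest` and `ToyDisk` are hypothetical names for illustration; real device interfaces differ in detail.

```python
from dataclasses import dataclass

BLOCK_SIZE = 512  # bytes, the common disk block size named on the slide


@dataclass
class BlockRequest:
    is_write: bool      # R/W
    block_number: int   # starting block #
    block_count: int    # # of blocks
    buffer: bytearray   # memory source (write) or destination (read)


def number_of_blocks(capacity_bytes, block_size=BLOCK_SIZE):
    """Number of blocks = device capacity / block size."""
    return capacity_bytes // block_size


class ToyDisk:
    """In-memory stand-in for a device exposing a linear array of blocks."""

    def __init__(self, capacity_bytes):
        self.num_blocks = number_of_blocks(capacity_bytes)
        self.data = bytearray(self.num_blocks * BLOCK_SIZE)

    def submit(self, req):
        start = req.block_number * BLOCK_SIZE
        end = start + req.block_count * BLOCK_SIZE
        if req.block_number < 0 or end > len(self.data):
            raise ValueError("request outside device")
        if len(req.buffer) != end - start:
            raise ValueError("buffer size must match request size")
        if req.is_write:
            self.data[start:end] = req.buffer      # memory -> device
        else:
            req.buffer[:] = self.data[start:end]   # device -> memory


disk = ToyDisk(capacity_bytes=1024 * 1024)         # hypothetical 1 MiB device
payload = bytearray(b"x" * (2 * BLOCK_SIZE))
disk.submit(BlockRequest(True, 7, 2, payload))     # write blocks 7 and 8
readback = bytearray(2 * BLOCK_SIZE)
disk.submit(BlockRequest(False, 7, 2, readback))   # read them back
```

Note that nothing in the request mentions files or directories; the device only sees block numbers.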
How does the OS implement the abstraction of files and directories on top of this logical array of disk blocks?
File System Implementation
File systems define a block size (e.g., 4KB)
Disk space is allocated in granularity of blocks
Default usage of LBN space: Superblock | Bitmap | Space to store files and directories
Notice the terminology clash here: “block” is used for different things by the file system and the disk interface… and this kind of thing is common in storage systems!!
A “Master Block” (aka superblock) determines the location of the root directory
Always at a well-known disk location
Often replicated across disk for reliability
A free map determines which blocks are free, allocated
Usually a bitmap, one bit per block on the disk
Also stored on disk, cached in memory for performance
Remaining disk blocks used to store files (and dirs)
There are many ways to do this
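A minimal sketch of the free map as a bitmap with one bit per block. The class name and the first-fit policy are assumptions for illustration; real file systems use smarter search and placement.

```python
class FreeMap:
    """One bit per block: 0 = free, 1 = allocated."""

    def __init__(self, num_blocks):
        self.num_blocks = num_blocks
        self.bits = bytearray((num_blocks + 7) // 8)   # 8 blocks per byte

    def is_allocated(self, block):
        return bool(self.bits[block // 8] & (1 << (block % 8)))

    def allocate(self):
        """Mark the lowest-numbered free block as allocated and return it."""
        for block in range(self.num_blocks):
            if not self.is_allocated(block):
                self.bits[block // 8] |= 1 << (block % 8)
                return block
        raise RuntimeError("no free blocks")

    def free(self, block):
        """Clear the block's bit so it can be handed out again."""
        self.bits[block // 8] &= ~(1 << (block % 8)) & 0xFF


free_map = FreeMap(num_blocks=30)
a = free_map.allocate()   # lowest free block
b = free_map.allocate()   # next free block
free_map.free(a)
c = free_map.allocate()   # reuses the block that was just freed
```

For a sense of scale under these assumptions, a bitmap for one million 4 KB blocks takes about 122 KB, which is why it is cheap to cache in memory.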
Disk Layout Strategies
Files span multiple blocks
How do you allocate the blocks for a file?
1. Contiguous allocation
Contiguous Allocation
Directory:
File Name   Start Blk   Length
File A      2           3
File B      9           5
File C      18          8
File D      27          2
[Figure: disk with blocks numbered 0-29, showing each file’s contiguous run]
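With contiguous allocation, a directory entry of (start block, length) is all that is needed to locate any block of a file. The sketch below uses the directory entries from the slide; the function names are illustrative.

```python
# Directory entries from the slide: file name -> (start block, length in blocks).
directory = {
    "File A": (2, 3),
    "File B": (9, 5),
    "File C": (18, 8),
    "File D": (27, 2),
}


def physical_block(name, logical_block):
    """Map a file-relative block index to a disk block number."""
    start, length = directory[name]
    if not 0 <= logical_block < length:
        raise IndexError("block index past end of file")
    return start + logical_block   # one addition, no extra disk reads


def blocks_of(name):
    """List every disk block the file occupies."""
    start, length = directory[name]
    return list(range(start, start + length))


print(blocks_of("File B"))           # [9, 10, 11, 12, 13]
print(physical_block("File C", 7))   # 25
```

The cost shows up when File A wants to grow past 3 blocks: block 5 is the only free block after it before File B begins, so the file must be moved or the disk compacted.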
Disk Layout Strategies
Files span multiple disk blocks
How do you find all of the blocks for a file?
1. Contiguous allocation
Like memory
Fast, simplifies directory access
Inflexible, causes fragmentation, needs compaction
2. Linked, or chained, structure
Linked Allocation
[Figure: disk with blocks numbered 0-29; File B’s blocks are chained from block 1 to block 22]
Directory:
File Name   Start Blk   Last Blk
…           …           …
File B      1           22
…           …           …
Each block points to the next, directory points to the first
Good for sequential access, bad for all others
3. Indexed structure (indirection, hierarchy)
An “index block” contains pointers to many other blocks
Handles random better, still good for sequential
May need multiple index blocks (linked together)
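The difference between linked and indexed structures shows up in how many block reads a random access costs. In this sketch the start (1) and last (22) blocks match the slide's File B entry; the intermediate block numbers are made up for illustration.

```python
END = -1

# Linked allocation: each block stores the number of the next block.
# Intermediate blocks (8, 3, 14) are hypothetical.
next_block = {1: 8, 8: 3, 3: 14, 14: 22, 22: END}


def linked_lookup(start, logical_block):
    """Follow pointers from the start block; return (disk block, blocks read)."""
    block, reads = start, 1
    for _ in range(logical_block):
        block = next_block[block]      # pointer lives inside the previous block
        if block == END:
            raise IndexError("block index past end of file")
        reads += 1
    return block, reads


# Indexed allocation: one index block lists every data block of the file.
index_block = [1, 8, 3, 14, 22]


def indexed_lookup(index, logical_block):
    """Read the index block, then the data block: two reads for any position."""
    if not 0 <= logical_block < len(index):
        raise IndexError("block index past end of file")
    return index[logical_block], 2


print(linked_lookup(1, 4))               # (22, 5): reads grow with position
print(indexed_lookup(index_block, 4))    # (22, 2): constant number of reads
```

If the index block is cached in memory, the indexed lookup drops to a single disk read per access.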
Indexed Allocation: Unix Inodes
Unix inodes implement an indexed structure for files
Each file is represented by an inode
Each inode contains 15 block pointers
First 12 are direct block pointers (e.g., 4 KB data blocks)
Then single, double, and triple indirect
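A quick sketch of how far 15 pointers reach. It uses the slide's 4 KB data blocks and assumes 4-byte block pointers, which is an assumption for illustration; the pointer size varies between file systems.

```python
BLOCK_SIZE = 4096          # 4 KB data blocks, as in the slide's example
POINTER_SIZE = 4           # bytes per block pointer (assumption)
NUM_DIRECT = 12            # first 12 pointers are direct
PTRS_PER_BLOCK = BLOCK_SIZE // POINTER_SIZE   # pointers held by one indirect block

LEVELS = ("single indirect", "double indirect", "triple indirect")


def max_file_blocks():
    """Data blocks reachable through direct plus the three indirect pointers."""
    return NUM_DIRECT + sum(PTRS_PER_BLOCK ** d for d in (1, 2, 3))


def pointer_level(logical_block):
    """Return which inode pointer covers a file-relative block index."""
    if logical_block < 0:
        raise IndexError("negative block index")
    limit = NUM_DIRECT
    if logical_block < limit:
        return "direct"
    for depth, name in enumerate(LEVELS, start=1):
        limit += PTRS_PER_BLOCK ** depth
        if logical_block < limit:
            return name
    raise IndexError("beyond maximum file size")


print(pointer_level(11), "|", pointer_level(12), "|", pointer_level(1036))
print(f"max file size: {max_file_blocks() * BLOCK_SIZE / 2**40:.2f} TiB")
```

Small files are served entirely by the direct pointers, so they pay no indirection cost; only large files touch the indirect blocks.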
Unix Inodes and Path Search
Unix Inodes are not directories
They describe where on the disk the blocks for a file are placed
Directories are files, so inodes also describe where the blocks for directories are placed on the disk
Directory entries map file names to inodes
To open “/one”, use Master Block to find inode for “/” on disk and read inode into memory
inode allows us to find data block for directory “/”
Read “/”, look for entry for “one”
This entry locates the inode for “one”
Read the inode for “one” into memory
The inode says where first data block is on disk
Read that block into memory to access the data in the file
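The lookup steps for “/one” can be traced with a toy model in which dictionaries stand in for on-disk inodes and data blocks. All inode and block numbers here are made up for illustration.

```python
# Toy on-"disk" state; every number below is hypothetical.
ROOT_INODE = 2                      # location recorded in the master block (assumed)
inodes = {
    2: {"is_dir": True,  "blocks": [100]},   # inode for "/"
    7: {"is_dir": False, "blocks": [200]},   # inode for "one"
}
data_blocks = {
    100: {"one": 7},                # directory block: file name -> inode number
    200: b"hello from /one",        # file data block
}


def lookup(path):
    """Walk the path from the root; return the inode number it names."""
    inode_no = ROOT_INODE
    for name in [part for part in path.split("/") if part]:
        inode = inodes[inode_no]                 # read the inode into memory
        if not inode["is_dir"]:
            raise NotADirectoryError(name)
        for block_no in inode["blocks"]:         # read the directory's data blocks
            entries = data_blocks[block_no]
            if name in entries:
                inode_no = entries[name]         # entry locates the next inode
                break
        else:
            raise FileNotFoundError(path)
    return inode_no


def read_first_block(path):
    """Resolve the path, then read the first data block named by its inode."""
    inode = inodes[lookup(path)]
    return data_blocks[inode["blocks"][0]]


print(read_first_block("/one"))
```

Each path component costs at least one inode read and one directory block read, which is why the placement of inodes relative to data matters for seek time.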
Data and Inode Placement
Original Unix FS had two placement problems:
1. Data blocks allocated randomly in aging file systems
Blocks for the same file allocated sequentially when FS is new
As FS “ages” and fills, need to allocate into blocks freed up when other files are deleted
Problem: Deleted files essentially randomly placed
So, blocks for new files become scattered across the disk
2. Inodes allocated far from blocks
All inodes at beginning of disk, far from data
Traversing file name paths, manipulating files, directories requires going back and forth from inodes to data blocks
Both of these problems generate many long seeks
Cylinder Groups
BSD Fast File System (FFS) addressed placement problems using the notion of a cylinder group (aka allocation groups in lots of modern FS’s)
Disk partitioned into groups of cylinders
Data blocks in same file allocated in same cylinder group
Files in same directory allocated in same cylinder group
Inodes for files allocated in same cylinder group as file data blocks
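The placement rules above can be sketched as "allocate everything for a file from its directory's group". This is a deliberately simplified model, not FFS itself: it treats the inode as an ordinary block, has no fallback when a group fills up, and all names and sizes are hypothetical.

```python
class CylinderGroups:
    """Toy allocator that keeps a separate free list per cylinder group."""

    def __init__(self, num_groups, blocks_per_group):
        self.blocks_per_group = blocks_per_group
        # Free block numbers per group, lowest first.
        self.free = [
            list(range(g * blocks_per_group, (g + 1) * blocks_per_group))
            for g in range(num_groups)
        ]

    def group_of(self, block):
        return block // self.blocks_per_group

    def allocate_in(self, group):
        """Take the lowest free block in `group` (no fallback when it is full)."""
        if not self.free[group]:
            raise RuntimeError("cylinder group full")
        return self.free[group].pop(0)

    def create_file(self, directory_group, num_data_blocks):
        """Place the inode and all data blocks in the directory's group."""
        inode_block = self.allocate_in(directory_group)
        data = [self.allocate_in(directory_group) for _ in range(num_data_blocks)]
        return inode_block, data


cg = CylinderGroups(num_groups=4, blocks_per_group=100)
inode_block, data = cg.create_file(directory_group=2, num_data_blocks=3)
print(inode_block, data)   # all block numbers fall inside group 2 (200-299)
```

Because the inode and data land in the same group, moving between them needs only short seeks.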
[Figure: cylinder group organization, with labels Superblock and Cylinder Group]
More FFS solutions
Small blocks (1K) in orig. Unix FS caused 2 problems:
Low bandwidth utilization
Small max file size (function of block size)
=> fix using a larger block (4K)
Problem: Media failures
Replicate master block (superblock)
Problem: Device oblivious
Parameterize according to device characteristics
File Buffer Cache
Applications exhibit significant locality for reading and writing files
Idea: Cache file blocks in memory to capture locality
This is called the file buffer cache
Cache is system wide, used and shared by all processes
Reading from the cache makes a disk perform like memory
Even a 4 MB cache can be very effective
Issues
The file buffer cache competes with VM (tradeoff here)
Like VM, it has limited size
Need replacement algorithms
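One common replacement algorithm is least recently used (LRU). The sketch below is a minimal LRU buffer cache, chosen here only as an example of a replacement policy; `BufferCache` and the `read_block` callback are hypothetical names.

```python
from collections import OrderedDict


class BufferCache:
    """Fixed-size cache of file blocks with LRU replacement."""

    def __init__(self, capacity_blocks, read_block):
        self.capacity = capacity_blocks
        self.read_block = read_block      # function: block number -> bytes
        self.blocks = OrderedDict()       # ordered least recently used first
        self.hits = 0
        self.misses = 0

    def get(self, block_no):
        if block_no in self.blocks:
            self.hits += 1
            self.blocks.move_to_end(block_no)    # mark as most recently used
            return self.blocks[block_no]
        self.misses += 1
        data = self.read_block(block_no)         # go to "disk"
        self.blocks[block_no] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)      # evict least recently used
        return data


cache = BufferCache(capacity_blocks=2, read_block=lambda n: b"block %d" % n)
for block_no in (1, 2, 1, 3, 2):
    cache.get(block_no)
print(cache.hits, cache.misses, list(cache.blocks))
```

In this trace, block 2 is evicted to make room for block 3 and then immediately requested again, which shows how a cache that is too small for the working set thrashes.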
Read Ahead
Many file systems implement “read ahead”
FS predicts that the process will request next block
FS goes ahead and requests it from the disk
This can happen while the process is computing on previous block
Overlap I/O with execution
When the process requests block, it will be in cache
Complements the on-disk cache, which also is doing read ahead
For sequentially accessed files, can be a big win
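A minimal sketch of sequential read-ahead: when a read follows directly after the previous one, the next block is fetched as well. The prefetch here is synchronous for simplicity; a real file system would issue it asynchronously so that it overlaps with computation. All names are hypothetical.

```python
class ReadAheadFile:
    """Detects sequential access and prefetches the following block."""

    def __init__(self, read_block):
        self.read_block = read_block   # function: block number -> bytes
        self.cache = {}
        self.last_block = None
        self.disk_reads = []           # log of block numbers fetched from "disk"

    def _fetch(self, block_no):
        if block_no not in self.cache:
            self.disk_reads.append(block_no)
            self.cache[block_no] = self.read_block(block_no)

    def read(self, block_no):
        """Return (data, was_cache_hit) for the requested block."""
        hit = block_no in self.cache
        self._fetch(block_no)
        # Predict sequential access: if this read follows the previous one,
        # request the next block now so it is cached before it is asked for.
        if self.last_block is not None and block_no == self.last_block + 1:
            self._fetch(block_no + 1)
        self.last_block = block_no
        return self.cache[block_no], hit


f = ReadAheadFile(read_block=lambda n: b"data %d" % n)
hits = [f.read(n)[1] for n in (0, 1, 2, 3)]
print(hits)            # [False, False, True, True]
print(f.disk_reads)    # [0, 1, 2, 3, 4]
```

Once the sequential pattern is detected at block 1, every later request finds its block already cached.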
Caching Writes
On a write, some applications assume that data makes it through the buffer cache and onto the disk
As a result, writes are often slow even with caching
Several ways to compensate for this
“write-behind”
Maintain a queue of uncommitted blocks
Periodically flush the queue to disk
Unreliable
Battery backed-up RAM (NVRAM)
As with write-behind, but maintain queue in NVRAM
Expensive
Log-structured file system
Always write contiguously at end of previous write
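The write-behind option, and why the slide calls it unreliable, can be sketched with a queue of uncommitted blocks and a simulated crash. A dictionary stands in for the disk, and all names are hypothetical.

```python
class WriteBehindCache:
    """Buffers writes in memory and pushes them to disk on a periodic flush."""

    def __init__(self, disk):
        self.disk = disk      # dict standing in for the device: block -> bytes
        self.dirty = {}       # uncommitted blocks (latest data per block)

    def write(self, block_no, data):
        """Return immediately; the data is only in memory until the next flush."""
        self.dirty[block_no] = data

    def read(self, block_no):
        if block_no in self.dirty:
            return self.dirty[block_no]
        return self.disk.get(block_no)

    def flush(self):
        """Called periodically: push queued blocks to disk in block order."""
        for block_no in sorted(self.dirty):
            self.disk[block_no] = self.dirty[block_no]
        flushed = len(self.dirty)
        self.dirty.clear()
        return flushed

    def crash(self):
        """Simulate power loss: anything not yet flushed is gone."""
        lost = sorted(self.dirty)
        self.dirty.clear()
        return lost


disk = {}
cache = WriteBehindCache(disk)
cache.write(5, b"a")
flushed = cache.flush()     # block 5 reaches the disk
cache.write(6, b"b")
lost = cache.crash()        # block 6 never made it
print(disk, lost)
```

Keeping the `dirty` queue in battery-backed RAM, as in the NVRAM option, would let it survive the crash at the cost of more expensive hardware.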
Remainder of the course
Other optimizations:
Other file system designs: log-structured, journaling
Devices: Hard disks & Solid state drives
Reliability & fault tolerance
Performance modeling
Distributed file systems: Google & NetApp
Parallel file systems: GPFS & PanFS
Storage for data-intensive computing