A Primer on Object Storage, Cloud Storage, and High Capacity File Systems
Presented by:
Chris Robertson
Sr. Solution Architect
Cambridge Computer
A primer on Object Storage, Cloud Storage, and High Capacity File Systems
Merit 2012
About Your Lecturer: Chris Robertson
SA at Cambridge Computer
• 25% of my time I do what industry analysts do
• 75% of my time is client-facing, solving problems and reconciling
to budgets
Cambridge Computer
• Expertise in storage networking, data protection, and data life
cycle management
• Founded in 1991
• Based in Boston with regional teams spread around the country
• Unique business model with no costs or commitments to our
clients (ask us how this is possible)
• Clients of all shapes and sizes
– Museums, K12, Defense Contractors, Banks, etc.
– Everyone has data. No one wants to lose it!
A Unique Business Model: Combining
the Best of All Worlds. . .
What is Cloud Storage?
“The Cloud” has the same challenges that any other
enterprise has
• The design challenges of cloud storage are relevant to private
users with large private data collections.
Cloud storage has three major incarnations
• Enterprise storage for applications that are hosted in the cloud
– Dynamic provisioning of storage with careful attention to balancing
capacity and performance.
• Hosted backups
– Granular / efficient backups with data automatically
stored off site
• Redundant object storage
– Geographically dispersed, redundant storage for data that does not
change much.
A Typical Cloud Service: 3 Copies of
Each Object Stored Somewhere
The Cloud is Accessed Through SOAP/
REST Software Interface
[Diagram: a dedicated appliance and/or software app (the “on-ramp”) presents a file or block interface on one side and a SOAP/REST interface on the other]
Traditional Storage Models Don’t
Scale
Data accumulates over time
• If your primary storage capacity doubles, then BOTH the CAPACITY
and the SPEED of your backup system must double.
• Backups take too long. Restores take too long.
Storage devices become BRITTLE as they get bigger and bigger
• The bigger they are, the harder they fall
Wholesale data migration between storage devices is
impractical. Massive storage systems must allow for
in-place upgrades.
Moving a PB is Heavy Lifting
Data Rate     Example                                       Total Time (Approximate)
140 MB/sec    LTO-5 tape drive at full tilt, without        82.5 days
              factoring in compression
1 GB/sec      A beefy Virtual Tape Library;                 11 days
              a dedicated 10Gb Ethernet
1.5 Mb/sec    A dedicated T-1                               176 years
156 Mb/sec    An OC3                                        640 days
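The arithmetic behind a table like this can be sketched in a few lines of Python. This is a rough estimate only: it assumes a decimal petabyte (10^15 bytes), a link that runs at its full rated speed the whole time, and no protocol overhead. The function name `transfer_days` and the rate labels are illustrative, not from the slides. Results land in the same ballpark as the slide's figures but will not match them exactly, since the slide's own assumptions are not stated.

```python
SECONDS_PER_DAY = 86_400
PB = 10**15  # assumption: decimal petabyte, in bytes


def transfer_days(total_bytes: float, bytes_per_second: float) -> float:
    """Days needed to move total_bytes at a sustained rate."""
    return total_bytes / bytes_per_second / SECONDS_PER_DAY


# Rates from the slide, converted to bytes per second.
# Megabit rates (Mb) are divided by 8 to get bytes.
rates = {
    "LTO-5 tape drive (140 MB/sec)": 140e6,
    "VTL or 10Gb Ethernet (1 GB/sec)": 1e9,
    "T-1 (1.5 Mb/sec)": 1.5e6 / 8,
    "OC3 (156 Mb/sec)": 156e6 / 8,
}

for name, rate in rates.items():
    days = transfer_days(PB, rate)
    print(f"{name}: about {days:,.0f} days ({days / 365:.1f} years)")
```

The point survives any reasonable choice of assumptions: at WAN speeds, moving a petabyte is measured in years, not days.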
Bigger Hard Drives: Friend or Foe?
The Good News: As drives grow bigger we can achieve
more capacity with fewer devices
• Fewer devices = higher density, lower power consumption, fewer
device failures
The Bad News
• MTBF not growing as fast
• Bandwidth into device not growing as fast
• Consequences
– Unreliability (per bit) growing
– Accessibility of data (per bit) shrinking
– Drive rebuild times are longer, which increases overall risk of data loss
RAID Rebuilds Take Too Long
RAID 5 rebuilds take too long
• On the order of 36 hours per TB
• 4TB drive could take a week to rebuild
RAID 6 (double parity) offers some protection
• But what happens when we have 8TB drives?
The more stuff you have the higher the chance of
failures.
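A quick back-of-the-envelope sketch of both claims, in Python. The 36 hours per TB figure is the one quoted on the slide. The failure model is an assumption added here for illustration: drives fail independently, each with the same probability `p` over some period, and the value 0.03 is hypothetical.

```python
HOURS_PER_TB = 36  # rebuild rate quoted on the slide


def rebuild_days(drive_tb: float, hours_per_tb: float = HOURS_PER_TB) -> float:
    """Estimated rebuild time in days for one drive of the given size."""
    return drive_tb * hours_per_tb / 24


def p_any_failure(n_drives: int, p: float) -> float:
    """Chance that at least one of n drives fails, assuming independent
    failures with identical per-drive probability p (a simplification)."""
    return 1 - (1 - p) ** n_drives


for size_tb in (1, 2, 4, 8):
    print(f"{size_tb} TB drive: about {rebuild_days(size_tb):.1f} days to rebuild")

# Hypothetical per-drive failure probability, for illustration only.
for n in (10, 100, 1000):
    print(f"{n} drives: {p_any_failure(n, 0.03):.1%} chance of at least one failure")
```

At 36 hours per TB, a 4TB drive works out to 6 days, which is where the "a week" figure comes from, and an 8TB drive doubles that.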
Redundancy Between Cabinets: Can
You Have Too Much Redundancy?
Is this really a good idea?
How long will it take to re-mirror a 14TB RAID 6 stripe?
Is there a better way to protect against a device failure?
• Replication?
• Backup?
• Mirroring at a different level of abstraction?
How Big is the Building Block?
What Are You Building?               What Size Building Block?
An outhouse?                         Brick
The foundation for a new house?      Cinder Block
A pyramid?                           Boulder
Object Storage – More than
Just the Cloud
Objects Represent a Different Way to
Address Data
Block
Blocks are addressed by Device ID and sequential block
number.
File
Files are addressed by UNC paths:
\\MyServer\MyFolder\MyFile.doc
Object
Objects are addressed by an ID that is unique to the
storage system.
- Sequentially assigned number
- Randomly assigned number
- A hash derived as a function of the object's content
- A combination of things
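The four ID schemes listed above can be sketched as follows. This is an illustration only, not any particular product's scheme: the function names are made up here, SHA-256 is an assumed choice of hash, and the "combination" format is arbitrary.

```python
import hashlib
import itertools
import uuid

_counter = itertools.count(1)  # in-memory counter; a real system would persist it


def sequential_id() -> str:
    """Sequentially assigned number."""
    return str(next(_counter))


def random_id() -> str:
    """Randomly assigned number (here, a random UUID)."""
    return uuid.uuid4().hex


def content_id(data: bytes) -> str:
    """Hash derived from the object's content (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def combined_id(data: bytes) -> str:
    """A combination: sequence number plus a prefix of the content hash."""
    return f"{next(_counter)}-{content_id(data)[:16]}"


payload = b"quarterly-report.pdf contents"
print(sequential_id())
print(random_id())
print(content_id(payload))
print(combined_id(payload))
```

Only the content-derived ID is reproducible from the data alone: the same bytes always yield the same address, no matter which system stores them.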
What is an “Object”?
An object is a chunk of data that can be individually addressed and
manipulated
• A file is a chunk of data
– A zip file containing many files is a chunk of data
• A file can be made up of several chunks of data
• A block is a chunk of data
• A volume (a range of blocks) is made up of chunks of data
• Pages, extents, chunks, chunklets are objects consisting of multiple blocks
Email?
• An email message is a chunk of data
• An email attachment is a chunk of data
• An email message along with its attachments could be treated as a single
chunk of data.
Often objects have associated metadata
• Descriptive information or tags
• Provenance
Content Addressing
Content addressing calculates a hash of the data that
makes up the object and uses the hash as an address
• Locality independence
– An object can live in multiple locations for:
• Redundancy
• Parallelism
• Local processing affinity
• Data integrity
– The object can be compared against its hash for integrity checking
– If the hash test fails, simply retrieve a copy of the object and repair the
corrupt object
• Deduplication
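A minimal in-memory sketch of these properties, assuming SHA-256 as the hash and three replicas held as plain dictionaries. The class name `ContentStore` and its methods are hypothetical, and a real system would spread replicas across devices or sites.

```python
import hashlib


def address_of(data: bytes) -> str:
    """The object's address is the hash of its content."""
    return hashlib.sha256(data).hexdigest()


class ContentStore:
    def __init__(self, replicas: int = 3):
        # Each replica maps address -> bytes; stands in for a separate location.
        self.replicas = [dict() for _ in range(replicas)]

    def put(self, data: bytes) -> str:
        address = address_of(data)
        for replica in self.replicas:
            # Deduplication: identical content maps to the same address,
            # so a second put of the same bytes stores nothing new.
            replica.setdefault(address, data)
        return address

    def get(self, address: str) -> bytes:
        # Integrity check: accept only a copy whose hash matches its address.
        good = None
        for replica in self.replicas:
            data = replica.get(address)
            if data is not None and address_of(data) == address:
                good = data
                break
        if good is None:
            raise KeyError(address)
        # Self-healing: overwrite missing or corrupt copies with the good one.
        for replica in self.replicas:
            if replica.get(address) != good:
                replica[address] = good
        return good


store = ContentStore()
addr = store.put(b"hello, object storage")
store.replicas[0][addr] = b"bit rot"      # simulate silent corruption
print(store.get(addr))                    # served from a healthy replica
print(store.replicas[0][addr])            # the corrupt copy has been repaired
```

Because the address is derived from the content, any copy from any location can be verified without consulting a separate checksum database.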
Self Healing and Data
Basic Object-Level Redundancy: An
Alternative to RAID and Mirroring
Redundant Objects Propagate on
Device Failure
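One way to picture this behavior is the toy model below. It is a sketch under stated assumptions, not a description of any specific product: devices are dictionaries, placement is random, three copies are kept (matching the "3 copies" cloud example earlier in the deck), and the names `Cluster`, `holders`, and `fail_device` are invented for illustration.

```python
import random


class Cluster:
    def __init__(self, device_names, copies: int = 3):
        self.copies = copies
        self.devices = {name: {} for name in device_names}

    def put(self, object_id: str, data: bytes) -> None:
        # Place each copy on a different, randomly chosen device.
        for name in random.sample(sorted(self.devices), self.copies):
            self.devices[name][object_id] = data

    def holders(self, object_id: str) -> list:
        return [n for n, objs in self.devices.items() if object_id in objs]

    def fail_device(self, name: str) -> None:
        lost_ids = list(self.devices.pop(name))
        for object_id in lost_ids:
            survivors = self.holders(object_id)
            if not survivors:
                continue  # every copy was lost; nothing to rebuild from
            data = self.devices[survivors[0]][object_id]
            # Re-replicate from a surviving copy onto devices lacking one.
            spares = [n for n in self.devices if n not in survivors]
            for target in spares[: self.copies - len(survivors)]:
                self.devices[target][object_id] = data


cluster = Cluster(["d1", "d2", "d3", "d4", "d5"])
cluster.put("obj-1", b"payload")
victim = cluster.holders("obj-1")[0]
cluster.fail_device(victim)
print(len(cluster.holders("obj-1")))  # back to the target copy count
```

Unlike a RAID rebuild, which funnels all recovery writes into one replacement drive, this kind of recovery can spread the work across every surviving device.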
Erasure-Coded Data Protection: An
Alternative to Parity-Based RAID
You Can Lose X% of Your Storage
Without Losing Data
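The idea can be illustrated with the simplest possible erasure code: split an object into k data fragments and add one XOR parity fragment, so any single fragment can be lost and rebuilt. This is a deliberately minimal sketch; production systems typically use Reed-Solomon-style codes that add several parity fragments, and the function names and the 10-of-16 layout below are illustrative assumptions. With k data fragments out of n total, the "X%" in the title is (n - k) / n.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> list:
    """Split data into k equal fragments (zero-padded) plus one XOR parity."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")
    fragments = [padded[i * size:(i + 1) * size] for i in range(k)]
    return fragments + [reduce(xor_bytes, fragments)]


def recover(fragments: list, missing_index: int) -> bytes:
    """Rebuild one missing fragment by XOR-ing all the others."""
    present = [f for i, f in enumerate(fragments) if i != missing_index]
    return reduce(xor_bytes, present)


def tolerable_loss(k: int, n: int) -> float:
    """Fraction of fragments that can be lost when any k of n suffice."""
    return (n - k) / n


fragments = encode(b"object storage!", k=4)
lost = 2
assert recover(fragments, lost) == fragments[lost]
print(f"4+1 layout tolerates {tolerable_loss(4, 5):.0%} fragment loss")
print(f"a hypothetical 10-of-16 layout tolerates {tolerable_loss(10, 16):.1%}")
```

The trade-off against keeping full copies is capacity: three replicas cost 200% overhead, while a k-of-n code costs (n - k) / k.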
Some Real-World Examples of
Object-Based Storage
Splitting SAN I/O into a Block Stream
and an Object Stream
Object-Based File System with Erasure
Coding and Global Dedupe
SharePoint with External Blob Storage
Gateway of some sort
Shared File System Leveraging a
Cloud-based Object Store
Object-Based Archive File System:
Automatic Backup to Tape
The Mwah Hah Hah Plan to Conquer
the World