DIY B ild Y
O
P i t
DIY: Build Your Own Private or
Hybrid Storage Cloud
yb d Sto age C oud
Howard Marks Chief Scientist
Where I Come From
• 25 years as a consultant in the midmarket F th h l f b th d th t • From the school of been there, done that, broke that too • Reviewing products for magazines since ‘87 • Concentrating on storage/servers this century • Now running independent test lab/analyst firm • Twitter @DeepStoragenet [email protected]As If You Didn’t Know
•
We’re drowning in data
•
We’re drowning in data
– Standard file systems breaking down – Too Small • 16TB‐100TB A f illi fil • A few million files • 255 byte pathStorage is Supposed to Be Getting
Cheaper!
Cheaper!
• Disk cost is droppingDisk cost is dropping Storage Is No It’s rapidly • $250 buys: Storage Is Cheap! No It s Not! $ y – 1994: 2 GB – 1999: 20 GB – 2004: 200 GB – 2009: 2000 GB • But enterprise storage costs keep rising! 4
What is Cloud Storage
• Well it’s what public cloud providers sell I fi i l El i – Infinitely Elastic – Relatively low cost / / • S3 ~15¢/GB/Mo – Typically object interface ibl – Internet accessible – Multi‐tenant – You don’t have to manage it!So What’s a Private Storage Cloud
• Massively scalable N ll l i – Not really elastic • Step function for growth • Can’t shrink • Can t shrink • Data protection by policy l l i i – Fault tolerance, copies, retention, etc. – Includes location • Store 2 copies in location 1 plus 1 in each of 3 other locationsWhat’s Cloud Storage Good For?
• Reduced TCO through reduced management I l d d d b k – Includes reduced backup • Large data stores • Low change rates – Especially of individual objectsp y j • Not latency sensitive• Archives rich data stores etc • Archives, rich data stores, etc.
An Advertising Agency
• Digital Asset Management System H ld d ( i d id ) – Holds ads (print and video) • Accessed by creative types in NY, Chicago, LA, SF • Metadata and index in each office • Media on cloud storage– Public cloudPublic cloud
Public or Private
• Public cloud more elastic and pay as you go Si lifi lti it • Simplifies multi‐site access • But: – Network adds latency – Security concerns • But you could encrypt data before it leaves – Control concerns – International issues like privacy/Patriot ActTypical Cloud Storage Architecture
• Redundant Array of
I d d N d (RAIN)
Independent Nodes (RAIN)
– All peers or
Access nodes and storage nodes – Access nodes and storage nodes
• Low cost x86 servers • Low cost SATA DAS • Low cost SATA DAS • Object storage
C ld i l d NAS
Which is Your Idea of Low Cost
Storage?
Storage?
Backblaze Storage Pod CLARiiON Full of 1TB Drives Backblaze Storage Pod CLARiiON Full of 1TB Drives
The Difference is Philosophy
p y
• Enterprise systems have high subsystem li bili reliability – Use redundant components to make sub‐systems f l l fault tolerant – Subsystem failure creates a crisis • Cloud systems accept node failures – Reliability comes from software and redundant dataFile Systems and Object Stores
y
j
File System Object Store • Limits: – Disk Capacity (16‐100TB) – Path (255 char) • Store/retrieve file by URI/URL• Usually has extended
Path (255 char) – Number of files – Metadata • Usually has extended metadata: – Retention • Syntax: – Open(file) – Lock(2343,100) – Protection policy • No limits – Path depth ( , ) – Write(2343,”hello” – Close(file) – Path depth – Files/folder – Total files
Object Access
j
• Object stores use HTTP derived syntax P bj – Put object to save – Get object to read • New object replaces old one • API usually based on SOAP or REST • Content Addressable Storage– Special case of object store where URI=data hashSpecial case of object store where URI=data hash – Inherent single instance storage
Data Protection Methods
Data Protection Methods
• Conventional RAIDConventional RAID
• Object replication/dispersal Obj li i i h
• Object replication with RAID • Erasure and dispersal codes
Erasure Codes
Erasure Codes
• Beyond RAID for protection and integrityBeyond RAID for protection and integrity • Usually based on Reed‐Solomon math C id hi h i l • Can provide higher protection at lower overhead – EG: Survive 4 drive failures w/25% overhead • Dispersal codes add location
How Erasure Codes Work
• Breaks data into blocks
Bl k t i d t d f d
• Blocks contain data and forward correction codes
• System can return data from x of ySystem can return data from x of y blocks • If 10 of 15 any 10 blocks can y reconstruct data • Store each block on different disk/node • Store 5 blocks each in 3 locations
Private Cloud Storage Products
g
• EMC Atmos
• Caringo Castor
– SW only powers Dell Dx6000
• Hitachi Content Platform • NetApp Storage GridNetApp Storage Grid
– Was Bycast
• DDN Web Object Store • DDN Web Object Store
Erasure Code Based Systems
y
• OceanStore
– Academic project at BerkeleyAcademic project at Berkeley
• Cleversafe
– RAIN dispersal systemRAIN dispersal system
• Amplidata
– RAIN dispersal system w/Atom storage nodesp y / g
• NEC HYDRAstor
– Deduplicating grid for backup/archivep g g p – No dispersal
Hybrid Cloud Options
y
p
• Any combination of on‐premise and internet storage could be called hybrid cloud
storage could be called hybrid cloud • Models:
Cluster on premise replicates to public provider – Cluster on‐premise replicates to public provider
• Atmos to Atmos (AT&T Synaptic Cloud)
– On‐premise replicates to colocation clusterOn premise replicates to colocation cluster – Gateway/archiving system writes to both
• Also used for dedicated infrastructure byAlso used for dedicated infrastructure by public provider
Clustered NAS
• Cluster of NAS heads w/shared storageg • Can have file/policy based replication Etc. • Can scale to 10+TB • Low latency/higher performance E l • Examples: – IBM SONAS– Symantec FileStoreSymantec FileStore – Gluster