Trends in Enterprise Backup
1 © Copyright 2009 EMC Corporation. All rights reserved.
Shankar Balasubramanian
Architect, EMC
Outline
Protection Storage
Deduplication Basics
Data Protection Issues
Tapes are archaic
– Unreliable
– Inflexible (needs to be streamed)
– Requires manual intervention for Disaster Recovery
– Restore performance is abysmal (no random access)
Backup is too slow
– Traditional architecture moves all the data repeatedly
Full restore is too slow
– Primary storage not that good at writes, esp. small files
– Must restore all data before any is available
Introducing disk-based protection storage built on deduplication
Protection Storage vs. Primary Storage
Primary Storage
– Workload: continuous random accesses; mostly reads; lots of metadata accesses
– Performance: latency and throughput (IOPS)
– Cost: dollars / IOPS
Protection Storage
– Workload: large batches of accesses; mostly writes; few metadata accesses
– Performance: sequential throughput
– Cost: dollars / TB
Protection Storage Features
Cost
– Global Compression™ for data reduction
  • Inline deduplication
  • Local compression (Lempel-Ziv: LZ, GZ, GZ-fast styles)
Performance
– Excellent sequential throughput
– Only fair random-access and small-file performance
Deduplication Backup Storage for D2D+DR
[Diagram: backup clients → backup/media servers → onsite retention storage; replication over the WAN → offsite disaster-recovery storage with retention/restore servers]
Deduplicating storage systems take the role of tape libraries
– Plug-and-play with standard backup software
Replication of reduced data for WAN vaulting
– Recover locally or remotely
Appliance packaging with options:
– Storage system with controller, firmware, and disks
Deduplication techniques
– Regular storage array: 1:1
– LZ compression / whitespace reduction: ~2:1
– Single Instance Storage (file level): ~3:1
– Fixed block (fixed blocks, snapshots): ~3:1
– Variable segment (backup target): ~20:1
Deduplication significantly reduces replication WAN bandwidth and power.
How Deduplication Works
[Diagram: a file system / VTL backup / archive application sends a data stream — 1st full backup, 1st incremental, 2nd full backup — made of unique variable segments (4 KB-12 KB) A-F plus redundant data segments]
1. Segment: data from the backup stream is sliced into variable-size segments
2. Fingerprint: a fingerprint is computed for each segment
3. Filter: fingerprints are compared against the summary vector and cache
   – If the fingerprint is new, continue
   – If the fingerprint is a duplicate, store a reference and drop the duplicate segment
4. Compress: groups of new segments are compressed using LZ, GZ, or GZ-fast
5. Write: segments and metadata are written to containers; containers are written to disk
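The five steps above can be sketched in a few lines of Python. This is a toy illustration, not the Data Domain implementation: it uses fixed-size chunks in place of variable-size segmentation, SHA-1 as the fingerprint, a plain dict in place of the summary vector and cache, and a list in place of on-disk containers.

```python
import hashlib
import zlib

def segment(stream: bytes, size: int = 8192):
    """Toy fixed-size splitter standing in for variable-size
    segmentation (the real system uses 4 KB-12 KB segments)."""
    for i in range(0, len(stream), size):
        yield stream[i:i + size]

def deduplicate(stream: bytes):
    """Sketch of the pipeline: segment, fingerprint, filter,
    compress, write (here, collect into a 'container' list)."""
    index = {}       # fingerprint -> container slot (stand-in for SV + cache)
    container = []   # compressed unique segments ("written to disk")
    recipe = []      # per-segment references to reconstruct the stream
    for seg in segment(stream):
        fp = hashlib.sha1(seg).digest()           # Fingerprint
        if fp not in index:                       # Filter: new segment
            index[fp] = len(container)
            container.append(zlib.compress(seg))  # Compress + Write
        recipe.append(index[fp])                  # duplicate: reference only
    return container, recipe

# A stream where the second "full backup" repeats the first three segments
data = (b"A" * 8192 + b"B" * 8192 + b"C" * 8192) * 2 + b"D" * 8192
container, recipe = deduplicate(data)
print(len(recipe), len(container))   # 7 segments referenced, only 4 stored
```

The recipe list is what makes restores possible: each backup is just a sequence of references into the shared segment store.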
Deduplication work flow
[Diagram: inside the Data Domain system, the data stream passes through steps 1-5; new segments and their metadata are packed into containers, and containers are written to disk]
Compression Effect
[Chart: data stored in TB (0-30) vs. weeks in use, for a 1 TB data set with 1 full + 6 incrementals per week; traditional storage capacity grows steeply while capacity-optimized storage stays nearly flat — the gap is the disk savings]
First full backup: 2-4x data reduction
File-level incrementals: 5-10x
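A back-of-envelope calculation shows why the two curves in the chart diverge. The reduction factors (2-4x for the first full, 5-10x for incrementals) come from the slide; the 10% daily change rate and the midpoint reduction factors are assumptions made here for illustration.

```python
# Hedged sketch of the slide's scenario: 1 TB data set,
# 1 full + 6 incrementals per week, over 10 weeks.
full_tb = 1.0
incr_fraction = 0.10   # assumption: each incremental covers ~10% of the data
weeks = 10

# Traditional storage keeps every backup at full size.
traditional = weeks * (full_tb + 6 * incr_fraction * full_tb)

# Deduplicated storage: fulls reduced ~3x, incrementals ~7.5x (midpoints).
dedup = weeks * (full_tb / 3 + 6 * incr_fraction * full_tb / 7.5)

print(round(traditional, 1), round(dedup, 2))   # → 16.0 4.13
```

Even with these modest assumptions the deduplicated footprint is roughly a quarter of the traditional one, and the gap widens the longer the retention period.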
CPU-centric Deduplication: SISL
(Stream-Informed Segment Layout)
Deduplication at backup/recovery speeds
Deduplication is done by looking for matching fingerprints
– 1 TB of physical storage has about 125 million fingerprints (one per ~8 KB segment)
– We cannot store all the fingerprints in RAM: that would need too much memory
– Caching the fingerprints does not help: fingerprint lookups are effectively random
– We cannot look fingerprints up on disk directly: that would require too many disks reading in parallel
DD answer: SISL
– Stream-Informed Segment Layout; includes:
– Summary Vector in RAM says whether a segment is new
  Set bits in the SV for each segment stored
  Summary Vector is a Bloom filter for new segments
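The Summary Vector can be sketched as a standard Bloom filter. This is a minimal illustration, not the DDOS data structure: sizes, hash count, and the SHA-1-based hashing scheme are all choices made here. The key property is asymmetric certainty: "absent" is always correct, so a new segment is never mistaken for a duplicate; "present" may rarely be a false positive, which only costs a wasted lookup.

```python
import hashlib

class SummaryVector:
    """Minimal Bloom-filter sketch of the in-RAM Summary Vector."""

    def __init__(self, bits: int = 1 << 20, hashes: int = 4):
        self.bits, self.hashes = bits, hashes
        self.vector = bytearray(bits // 8)   # all bits start at 0

    def _positions(self, fingerprint: bytes):
        # Derive several bit positions from one segment fingerprint.
        for i in range(self.hashes):
            h = hashlib.sha1(bytes([i]) + fingerprint).digest()
            yield int.from_bytes(h[:8], "big") % self.bits

    def add(self, fingerprint: bytes):
        for p in self._positions(fingerprint):
            self.vector[p // 8] |= 1 << (p % 8)

    def maybe_contains(self, fingerprint: bytes) -> bool:
        # False -> definitely new; True -> probably stored already.
        return all(self.vector[p // 8] & (1 << (p % 8))
                   for p in self._positions(fingerprint))

sv = SummaryVector()
fp = hashlib.sha1(b"segment A").digest()
print(sv.maybe_contains(fp))   # False: definitely a new segment
sv.add(fp)
print(sv.maybe_contains(fp))   # True: seen before (or a rare false positive)
```

A 1-bit-per-entry vector like this is why the filter fits in RAM when the full fingerprint index (125 million 20-byte fingerprints per TB) cannot.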
Segment Localities
[Diagram: the DDFS log structure stores localities — metadata "abcd" with segments A-D, "efgh" with E-H, "ijkl" with I-L, …, "stuv" with S-V — with metadata and segment data adjacent on disk]
Localities
– Stream-informed storage units
– Neighboring unique segments stored together
– Fingerprints and segments stored together with other metadata
One seek can retrieve hundreds of fingerprints into RAM
– Fast caching for fingerprint lookup
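The payoff of localities can be sketched as a prefetching cache. Everything here is a stand-in (string fingerprints, a list of sets as the "disk"): the point is that one seek loads a whole locality's fingerprints, so the neighbors of a duplicate segment — which tend to be duplicates too, since backup streams repeat in runs — are resolved from RAM with no further disk reads.

```python
# Sketch of locality-based fingerprint prefetch (hypothetical layout:
# each on-disk "locality" groups fingerprints of segments that arrived
# together in the original stream).
disk_localities = [
    {"fpA", "fpB", "fpC", "fpD"},
    {"fpE", "fpF", "fpG", "fpH"},
]
cache = set()   # fingerprints currently resident in RAM
seeks = 0       # disk seeks performed

def lookup(fp: str) -> bool:
    """Return True if fp is a duplicate; prefetch its locality on a miss."""
    global seeks
    if fp in cache:
        return True                  # resolved in RAM, no disk access
    for locality in disk_localities:
        if fp in locality:
            seeks += 1               # one seek loads the whole locality
            cache.update(locality)   # prefetch all neighboring fingerprints
            return True
    return False                     # genuinely new segment

hits = [lookup(fp) for fp in ["fpA", "fpB", "fpC", "fpD"]]
print(hits, seeks)   # all four duplicates found with a single seek
```

Without the locality grouping, the same four lookups could each have required a random disk read.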
Data Invulnerability Architecture
Designed from the ground up for data protection
– File system simplicity & resiliency
Five lines of defense against data loss
– End-to-end verification
– Fault avoidance and containment
– Continuous fault detection and healing
End-to-End Verification at Backup Time
DDOS tests recoverability asynchronously after backups
– File system consistency
– Data integrity on disk
Primary storage can't verify after write
– It would be too slow
– Primary storage discovers problems during restore
Fault Avoidance and Containment
Custom log-structured file system architecture
New data never overwrites good data
– Previous backups are not at risk
Fewer complex data structures mean fewer bugs
– No bitmaps and link counts to corrupt
NVRAM for fast, safe restart
DD-RAID does no partial-stripe writes
Continuous Fault Detection and Healing
DD-RAID 6
– Protection against:
  Two disk failures
  Disk read errors during reconstruction
  Operator pulling the wrong disk
– Verifies data integrity and stripe coherency after writes
On-the-fly error detection and correction
– All on-disk structures covered by strong checksums
– Data correctness verified on every disk read
– Data errors corrected automatically from RAID
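The verify-on-every-read idea can be sketched with a simple checksummed block format. This is an illustration, not the DDOS on-disk layout: the assumption here is that each block carries a CRC32 of its payload, checked on read, with repair falling back to RAID redundancy on a mismatch.

```python
import zlib

def write_block(payload: bytes) -> bytes:
    """Prefix the payload with a CRC32 checksum before it goes to disk."""
    return zlib.crc32(payload).to_bytes(4, "big") + payload

def read_block(block: bytes) -> bytes:
    """Verify the checksum on every read; a mismatch triggers repair."""
    stored, payload = block[:4], block[4:]
    if zlib.crc32(payload).to_bytes(4, "big") != stored:
        raise IOError("checksum mismatch: repair from RAID parity")
    return payload

block = write_block(b"segment data")
assert read_block(block) == b"segment data"     # clean read passes

corrupted = block[:-1] + bytes([block[-1] ^ 0xFF])   # flip payload bits
try:
    read_block(corrupted)
except IOError as e:
    print(e)   # checksum mismatch: repair from RAID parity
```

Because every read verifies, silent corruption is caught when it happens rather than discovered years later during a restore.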
File System Recoverability
Self-describing data format
– Metadata structures rebuildable from Locality log
File system check (FSCK), if needed, is fast
– Checks and repairs done on de-duplicated data
Run checks on 4 TB of data, not 80 TB
– No overwrite means it’s safe to bring system back on line
Summary
Data protection needs to change
– Eliminate tape; backup should be fast, restore should be easy
Data Domain core technology can eliminate tape
– Backup to disk at the cost of tape
– Keep data safe despite the reduced number of copies