EMC DATA DOMAIN
OVERVIEW
ATEA Tromsö
16 November 2010
Peter Karlsson
EMC Backup Recovery Systems Division
• Division HQ: Santa Clara, CA
• 10 R&D locations
  – 1,800 employees
• Solutions
  – Data Domain, EDL, Avamar, NetWorker, DPA
• Data protection storage systems
  – > 50,000 systems installed
  – > 40,000 customers
  – > 13,000 petabytes under protection
• Global sales, support and services
EMC Data Domain:
Leadership and Innovation
• Deduplication storage systems
  – More than 12,000 systems installed
  – More than 4,300 customers
  – More than 2,600 PB under Data Domain protection worldwide
• A history of industry firsts
  – First deduplication NAS
  – First deduplication volume replication
  – Largest deduplication array
  – First deduplication directory replication
  – First deduplication virtual tape library
Backup Redesign is Hot
[Chart: "What are your top storage initiatives?" Axis: % respondents, 0% to 50%. Series: Wave 9, Wave 10, Wave 11, Wave 12, Wave 13. Initiatives listed: Green Storage, Securing Storage, New Data Center, Thin Provisioning, Data Migration, Expanding Replication, Disaster Recovery, Improving Forecasting, Archiving, Improving Performance, Virtualization Adoption, Backup Redesign, Technology Refresh, Consolidation, Tiered Storage Build Out]
What Is Causing Companies to Redesign
Their Backup & Recovery Environments?
• Data growth
• Server virtualization
Digital Information Created and Replicated Worldwide
[Chart: exabytes per year, 2008 to 2012; axis from 0 to 2,500 exabytes]
5-fold growth in 4 years
Source: IDC Digital Universe white paper, sponsored by EMC, May 2009
Old Paradigm
Physical Environment: Low overall server utilization and plenty of bandwidth for backup
[Figure: Server A, Server B, Server C; CPU utilization axis 0% to 100%; 20 percent resource utilization]
New Paradigm
Virtual Environment: High overall server utilization and little bandwidth for backup
[Figure: Virtual Server A, Virtual Server B, Virtual Server C on ESX Server hardware; CPU utilization axis 0% to 100%; 80 percent resource utilization]
Impact of VMware Adoption
Phase 1: IT Production (Cost Efficiency)
Phase 2: Business Production (Quality of Service)
Phase 3: ITaaS (Business Agility)
[Chart: Virtual Environment versus Physical Environment across the phases; axes 0% to 100%; values shown: 15%, 30%, 70%, 85%]
Increasing Backup & Recovery Needs
A Proactive Approach Is Needed
[Diagram: two timelines, each running through Phase 1, Phase 2, Phase 3; one is labeled "Reactive Approach"]
Deduplication Dramatically Reduces
Storage Capacity Requirements
Deduplication
10–30 times less data stored versus fulls + incrementals with typical retention policies
With Data Domain Deduplication Storage
Systems, You Can…
Retain longer
Keep backups onsite longer with less disk
for fast, reliable restores, and eliminate the
use of tape for operational recovery
Replicate smarter
Move only deduplicated data over existing
networks with up to 99% bandwidth
efficiency for cost-effective disaster recovery
Recover reliably
Continuous fault detection and self-healing
ensure data recoverability to meet service
level agreements
Data Domain Basics
Easy integration with existing environment
[Diagram labels: Control Tier, Target Tier, Disaster Recovery Tier; Replication; CIFS, NFS, NDMP, DD Boost (Ethernet); Virtual Tape Library (VTL) over Fibre Channel]
DD880 appliance
• 4U
• 2 to 6 ports: 10 and 1 Gigabit Ethernet; 8 Gb/s Fibre Channel
• RAID 6
• 5.4 TB to 142.5 TB usable capacity with shelves
• 2 TB or 1 TB 7.2k rpm SATA HDD in shelf
• File system
• NVRAM
• N+1 fans and redundant, hot-plug power supplies
Data Domain Infrastructure and
Ecosystem
It works with what you have
VMware
Microsoft
Microsoft SharePoint
Oracle
SAP
Backup
Midrange and
Mainframe Partners
BusTech
LaserVault
Luminex
Archive
NAS, SAN, DAS
Data Deduplication: Technology Overview
Store more backups in a smaller footprint
Backups, shown as sequences of data segments:
Friday Full Backup: A B C D A E F G
Mon Incremental: A B H
Tues Incremental: C B I
Weds Incremental: E G J
Thurs Incremental: A C K
Second Friday Full Backup: B C D E F L G H
Unique segments stored: A B C D E F G H I J K L
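The segment sequences in the figure can be replayed in a few lines of Python. This is a minimal sketch of the general idea only (store a segment the first time it appears, reference it afterwards), not Data Domain's implementation. The single-letter labels stand in for data segments, and the names `BACKUPS` and `deduplicate` are hypothetical.

```python
# Backups from the figure, each written as its sequence of segment labels.
BACKUPS = [
    ("Friday Full", "ABCDAEFG"),
    ("Mon Incremental", "ABH"),
    ("Tues Incremental", "CBI"),
    ("Weds Incremental", "EGJ"),
    ("Thurs Incremental", "ACK"),
    ("Second Friday Full", "BCDEFLGH"),
]


def deduplicate(backups):
    """Return (store, report).

    store  -- unique segments in first-seen order
    report -- one (name, logical_segments, new_segments) tuple per backup
    """
    store = []
    seen = set()
    report = []
    for name, segments in backups:
        new_count = 0
        for segment in segments:
            if segment not in seen:
                # First occurrence: this segment has to be written to disk.
                seen.add(segment)
                store.append(segment)
                new_count += 1
        report.append((name, len(segments), new_count))
    return store, report


store, report = deduplicate(BACKUPS)
for name, logical, new in report:
    print(f"{name:20s} logical={logical} newly stored={new}")

logical_total = sum(logical for _, logical, _ in report)
print(f"{logical_total} logical segments, {len(store)} stored "
      f"({logical_total / len(store):.1f}x reduction)")
```

In this toy run the second Friday full adds only one new segment (L), which is why a repeated full backup deduplicates far better than the first one. The ratio printed here reflects only the letters in the figure; real ratios depend on the data and the retention policy.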
Backup                  Logical Data   Estimated Reduction   Physical
Monday Incremental      100 GB         7–10x                 10 GB
Tuesday Incremental     100 GB         7–10x                 10 GB
Wednesday Incremental   100 GB         7–10x                 10 GB
Thursday Incremental    100 GB         7–10x                 10 GB
Second Friday Full      1 TB           50–60x                18 GB
TOTAL                   2.4 TB         7.8x                  308 GB
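The TOTAL row can be checked with simple arithmetic. The sketch below assumes decimal units (1 TB = 1000 GB), which is the reading that reproduces 7.8x. The itemized rows add up to less than the TOTAL; the remainder presumably belongs to the first Friday full backup, which is not itemized here. That attribution is an inference from the totals, not something the slide states.

```python
# Rows as (backup, logical GB, physical GB), taken from the table.
# Assumption: 1 TB = 1000 GB (decimal units).
ROWS = [
    ("Monday Incremental", 100, 10),
    ("Tuesday Incremental", 100, 10),
    ("Wednesday Incremental", 100, 10),
    ("Thursday Incremental", 100, 10),
    ("Second Friday Full", 1000, 18),
]
TOTAL_LOGICAL_GB = 2400   # 2.4 TB
TOTAL_PHYSICAL_GB = 308


def reduction(logical_gb, physical_gb):
    """Reduction factor: logical data divided by physical data stored."""
    return logical_gb / physical_gb


overall = reduction(TOTAL_LOGICAL_GB, TOTAL_PHYSICAL_GB)
print(f"Overall reduction: {overall:.1f}x")

# Whatever the itemized rows do not account for.
unlisted_logical = TOTAL_LOGICAL_GB - sum(row[1] for row in ROWS)
unlisted_physical = TOTAL_PHYSICAL_GB - sum(row[2] for row in ROWS)
print(f"Not itemized: {unlisted_logical} GB logical, "
      f"{unlisted_physical} GB physical "
      f"({reduction(unlisted_logical, unlisted_physical):.0f}x)")

for name, logical_gb, physical_gb in ROWS:
    print(f"{name:22s} {reduction(logical_gb, physical_gb):5.1f}x")
```

The per-row factors come out at 10x for each incremental and about 55.6x for the second full, both inside the ranges quoted in the table.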
Retain: Store More for Longer with Less
[Table: Week 1. Columns: Backup, Cumulative Logical Data, Estimated Reduction, Physical]
Data Integrity:
Data Invulnerability Architecture
Trust but verify; "hope" is not a strategy
• Data verification
  – Checksum
  – Deduplication, write to disk
  – Verify
• Self-healing file system
  – Cleaning
  – Expired data
  – Defrag
  – Verify
• Other
  – RAID 6
  – NVRAM
  – Snapshots
[Diagram labels: Data; Generate Checksum; Global Compression; Local Compression; RAID; File System; Verify: verify the file system metadata integrity, verify user data integrity]
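The checksum, write, verify sequence named on this slide can be illustrated generically. The sketch below is not the Data Invulnerability Architecture itself: it assumes SHA-256 as the checksum, zlib as a stand-in for local compression, and a Python dict as a stand-in for disk. The function names `write_verified` and `verify` are hypothetical.

```python
import hashlib
import zlib


def write_verified(disk, key, data):
    """Checksum the data, write it, then read it back and verify."""
    checksum = hashlib.sha256(data).hexdigest()
    disk[key] = zlib.compress(data)          # stand-in for the write path
    read_back = zlib.decompress(disk[key])   # read what was actually stored
    if hashlib.sha256(read_back).hexdigest() != checksum:
        raise IOError(f"verification failed right after writing {key!r}")
    return checksum


def verify(disk, key, checksum):
    """Re-check stored data later, e.g. from a background scan."""
    try:
        data = zlib.decompress(disk[key])
    except zlib.error:
        return False                         # stored bytes are damaged
    return hashlib.sha256(data).hexdigest() == checksum


disk = {}
checksum = write_verified(disk, "segment-1", b"backup data " * 100)
print(verify(disk, "segment-1", checksum))

# Simulate on-disk corruption by flipping the bits of the last stored byte.
damaged = disk["segment-1"]
disk["segment-1"] = damaged[:-1] + bytes([damaged[-1] ^ 0xFF])
print(verify(disk, "segment-1", checksum))
```

Computing the checksum before the data enters the write path is what lets the later check catch damage introduced anywhere in between.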
Network-Efficient Replication for True
Disaster Recovery
Lowers WAN costs; improves service level agreements
[Diagram labels: Source: remote sites (Data Domain system; backup data, archive data). Destination: data center hub (Data Domain DDX Array with DD880s); supports hundreds. WAN links each marked 1–5%. Home, DB, DIR A]
Flexible replication
• One-to-many
• Many-to-one
• Bi-directional
• System-to-system
• Cascaded
95–99% cross-site bandwidth reduction
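The bandwidth figures follow from sending only data the destination does not already hold. A minimal sketch of that idea, assuming segments are identified by a SHA-256 fingerprint and ignoring the cost of exchanging the fingerprints themselves; `fingerprint` and `replicate` are hypothetical names, and this is not the DD Replicator protocol.

```python
import hashlib


def fingerprint(segment):
    """Content fingerprint used to recognize a segment at either site."""
    return hashlib.sha256(segment).hexdigest()


def replicate(source_segments, destination_store):
    """Send only segments missing at the destination.

    Returns (logical_bytes, sent_bytes).
    """
    logical_bytes = 0
    sent_bytes = 0
    for segment in source_segments:
        logical_bytes += len(segment)
        fp = fingerprint(segment)
        if fp not in destination_store:
            destination_store[fp] = segment   # crosses the WAN
            sent_bytes += len(segment)
    return logical_bytes, sent_bytes


# Toy data: 100 distinct 1 KiB segments, 97 of them already at the hub.
segments = [i.to_bytes(4, "big") * 256 for i in range(100)]
hub = {fingerprint(s): s for s in segments[:97]}

logical, sent = replicate(segments, hub)
saved = 1 - sent / logical
print(f"logical={logical} B, sent={sent} B, bandwidth saved={saved:.0%}")
```

With 3 of 100 segments new, 3% of the logical data crosses the link, which sits inside the 1–5% range shown on the slide. The 97/3 split is chosen for illustration only.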
Industry’s Most Scalable Inline
Deduplication Systems
DD140 Remote Office Appliance
DD600 Appliance Series
DD880
Global Deduplication Array
DDX Array Series: up to 16 controllers
Software options: DD Boost, DD Virtual Tape Library, DD Replicator, DD Retention Lock, and DD Encryption
REPLICATE AFTER
DE-DUPLICATION
Backup de-duplication
Is Data Deduplication a Good Thing?
• Without De-Duplication
  – No reduction in local backup storage
  – No reduction in replication time and bandwidth
  – No reduction in offsite storage
• Leveraging De-Duplication
  – Reduced local backup storage
  – Reduced replication time and bandwidth
  – Reduced offsite storage
Methodology:
Inline vs. Post-Process Deduplication
Post-Process: Deduplication After Storing
[Diagram labels: Deduplication, Store; 3x disk accesses to shared store]
The more processes, the more resource contention
− Copy to tape: Too slow to stream tape
− Recovery: Service level agreement predictability
− Replication: Poor time-to-disaster recovery
− Deduplication: If interleaved with backup or restore
More administration to fight these issues
Inline:
Other activities unimpeded
− Predictable
− Simpler
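One way to read "3x disk accesses" is as three passes over the shared store: write the raw backup, read it back to deduplicate, then write the deduplicated result. That reading is an assumption, not something the slide spells out. The toy model below counts segment-level accesses under that assumption; both function names are hypothetical.

```python
def inline_accesses(segments):
    """Deduplicate before writing: only new segments reach disk."""
    seen = set()
    writes = 0
    for segment in segments:
        if segment not in seen:
            seen.add(segment)
            writes += 1
    return {"passes": 1, "disk_accesses": writes}


def post_process_accesses(segments):
    """Store first, deduplicate afterwards (assumed three-pass model)."""
    staged_writes = len(segments)       # pass 1: write the raw backup
    staged_reads = len(segments)        # pass 2: read it back
    dedup_writes = len(set(segments))   # pass 3: write unique segments
    return {
        "passes": 3,
        "disk_accesses": staged_writes + staged_reads + dedup_writes,
    }


backup = "ABCDAEFG"   # toy backup: 8 segments, 7 of them unique
print("inline:      ", inline_accesses(backup))
print("post-process:", post_process_accesses(backup))
```

The extra passes are also what competes with copy-to-tape, recovery, and replication for the same disks, which is the contention the bullets describe.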
What about Disaster Recovery?
• Post-process: DR restore point is usually obsolete
[Timeline, TIME axis. Data Domain (Inline): replicate during backup, then DR-ready. Post-Process: store to cache, dedupe, replicate, then DR-ready]
Data is only safe when safely off-site
Enterprise Recoverability Readiness
at Disaster Recovery Site
Data Domain Inline Deduplicated Replication
• Replicate during backup
• Disaster recovery (DR)-ready
"Adaptive" Post-process Deduplicated Replication
• Backup to cache: backup time 1.7 times longer than Data Domain
• Deduplicate and replicate: <50% ingest speed; two times longer if uncompressed at fixed bandwidth
• DR-ready
"Scheduled" Post-process Deduplicated Replication
• Backup to cache: backup time 1.1 times longer than Data Domain
• Deduplicate and replicate: <50% ingest speed; two times longer if uncompressed at fixed bandwidth
• DR-ready
VTL/Tape/Truck
• Backup to VTL
• Copy to tape
• Truck to storage
• Truck from storage
• ?
Performance: CPU-Centric vs. Spindle-Bound
[Chart: throughput in MB/s (50 to 1,500) versus number of disk spindles (50, 100, 150, 200). Labels: Data Domain; most deduplication; Fibre Channel; SATA]
Scalability:
Data Domain Systems Trajectory
Data Domain SISL Scaling Architecture: CPU-Centric
[Chart: throughput in GB/s versus addressable capacity in terabytes post-RAID (physical). Axis values shown: 0.04, 1.25, 1.5, 3, 5, 70, > PB. Labels: DD880, July 2009, industry's fastest backup storage controller; multi-controller]
Why Data Domain?
• Less disk to resource, less to manage
  – CPU-centric deduplication
  – Inline
  – Green
• Simple, mature, and flexible
  – Simple, mature appliance
  – Nearline tier: any fabric, any software, backup or nearline applications: data center or remote office
• Resilience and disaster recovery
  – Storage of last resort
EMC’s Information Infrastructure Portfolio
Backup and Archive Platforms
Storage Platforms
Virtualization and Connectivity
Storage Software
Backup/Recovery: NetWorker • Avamar • Data Protection Advisor
Replication: Local • Remote • Multi-site • CDP
Security: RSA • enVision • Encryption • IPv6
Manage: Ionix ControlCenter • Virtual Provisioning
Automate: PowerPath • Ionix for IT Operations Intelligence • FAST
Mobility: Virtual LUN • SAN Copy • VPLEX
Avamar
Celerra
Connectrix
VPLEX
Disk Library
Atmos
Cloud optimized storage
Iomega
File Management Appliance
RecoverPoint
CDP and CRR
Distributed Federation
SAN connectivity
Data Domain
Gen 2 Data Store
Avamar Virtual Edition for VMware
Avamar VM
Symmetrix
DMX-4 V-Max Flash
CLARiiON
CX4 AX4 Flash
Centera
Gen 4LP Node
Consumer and SMB storage
Flash