EMC Data de-duplication
not ONLY for IBM i
EMC’s focus is
IT
Infrastructure
EMC is a
TECHNOLOGY
EMC Portfolio
Cloud Infrastructure and Services
[Figure: EMC acquisitions timeline, 2003 to 2009/2010]
Categories: Services; Virtualization / Data Mobility; Resource Management; Content Management; Availability / Archiving; Consumer / Small Business; Information Security; Data Warehouse / Big Data
Acquisitions shown: Documentum, Ask Once, Document Sciences, X-Hive, Rainfinity, ProActivity, Acartus, Captiva, VMware, Akimbi, Illuminator, Indigo Stone, Dolphin, Interlink, Internosis, Astrum, Smarts, nlayers, Voyence, BusinessEdge, Geniant, Infra, Legato, Avamar, Kashya, Dantz, Mozy, Pi, Iomega, WysDM, Conchango, Data Domain, FastScale, ConfigureSoft, Authentica, Network Intelligence, RSA, Valyd, Tablus, Verid, Archer, Kazeon, Greenplum, Isilon
Having Great Technology is Not Enough
Backup System Infrastructure
Every backup environment has a bottleneck.
It may be a VERY FAST bottleneck, but it will determine the maximum throughput obtainable with your system.
What is deduplication?
• Data deduplication (often called "intelligent compression") is a method of reducing storage needs by eliminating redundant data.
• Only one unique instance of the data is actually retained on storage media, such as disk or tape.
• Redundant data is replaced with a pointer to the unique data copy.
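The three points above can be illustrated with a minimal Python sketch. It is not Data Domain's implementation: the `DedupStore` class, the fixed 8-byte segment size, and the use of SHA-256 fingerprints as pointers are all assumptions chosen to keep the example short.

```python
import hashlib


class DedupStore:
    """Toy store (hypothetical): each unique segment is kept exactly once."""

    def __init__(self, segment_size=8):
        self.segment_size = segment_size  # assumption: fixed-size segments
        self.segments = {}                # fingerprint -> unique segment bytes

    def write(self, data: bytes) -> list:
        """Store data and return the list of pointers (fingerprints) describing it."""
        pointers = []
        for i in range(0, len(data), self.segment_size):
            segment = data[i:i + self.segment_size]
            fingerprint = hashlib.sha256(segment).hexdigest()
            # Keep the segment only if this fingerprint has not been seen before.
            self.segments.setdefault(fingerprint, segment)
            pointers.append(fingerprint)
        return pointers

    def read(self, pointers) -> bytes:
        """Rebuild the original data by following the pointers."""
        return b"".join(self.segments[fp] for fp in pointers)

    def physical_bytes(self) -> int:
        """Bytes actually held on storage (unique segments only)."""
        return sum(len(s) for s in self.segments.values())


store = DedupStore(segment_size=8)
monday = b"AAAAAAAABBBBBBBBCCCCCCCC"
tuesday = b"AAAAAAAABBBBBBBBDDDDDDDD"  # only the last segment changed
monday_pointers = store.write(monday)
tuesday_pointers = store.write(tuesday)
logical_bytes = len(monday) + len(tuesday)
```

Both backups can be read back in full, yet the second backup adds only its one changed segment to physical storage.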
EMC DataDomain Mission …
Make this
Customer Example: 20x Footprint Reduction
One DD System
• 180 TB stored
• 8 TB of disk used
• 20x Reduction
• Replicated off-site
Red Line = Amount of data written to Data Domain (virtual storage)
Green Line = Disk Space Consumed (physical storage)
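As a quick check of the figures on this slide, the reduction ratio is simply logical data written divided by physical disk consumed. The helper name `reduction_ratio` is hypothetical. Note that the two capacities as printed give 22.5x; the slide's "20x" is presumably a rounded headline or was measured at a different point on the chart.

```python
def reduction_ratio(logical_tb: float, physical_tb: float) -> float:
    """Footprint reduction = data written (virtual) / disk consumed (physical)."""
    if physical_tb <= 0:
        raise ValueError("physical_tb must be positive")
    return logical_tb / physical_tb


# Figures taken from the customer example slide.
ratio = reduction_ratio(logical_tb=180, physical_tb=8)
print(f"{ratio:.1f}x reduction")
```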
What are the reasons you do not have de-duplication yet?
Cost (disk is more expensive than tape)
Flexibility (only SAN support)
Performance (disk I/O bottleneck)
Data Safety & Reliability (only one copy on disk)
Many vendors offer the same – is it true?
What needs to be compared?
• Speed (backup/restore)
• Flexibility (various protocols)
• Scalability (upgrade options)
• Supportability (IBM i, Open Systems, Mainframe)
• Simplicity (management, maintenance)
• Size (space, de-dupe ratio)
• Efficient replication (bandwidth reduction)
• Data Safety (the most important), Encryption
• Support (good people, local people)
• Cost (compare all costs)
Why EMC Data Domain for IBM i?
Feature / Benefit
Integrate with Ease and Flexibility
– Data Domain presents IBM 3584-L32 tape library/libraries and LTO3 (3580-TD3) drives via Fibre Channel to IBM i hosts
Supportability
– BRMS and IBM i native commands support
Retain More Backups with De-Duplication
– As IBM i data is compressible, it is in the de-duplication wheelhouse
– Store weeks of full backups on disk in a minimal footprint for rapid database restores
Recover Data Reliably and Efficiently
– De-duplication with replication drives WAN-efficient disaster recovery
– EMC’s Data Domain Data Invulnerability Architecture (DIA) ensures reliable recovery; DIA should resonate well with IBM i customers
Improve Performance
– Unrivaled 8+ TB/hr aggregate and 1+ TB/hr single-stream, inline de-duplication
– Single-stream throughput capabilities are important to understand when considering DB2 backups
– Data Domain allows for greater parallelization of backups and restores
Simplify Infrastructure
– Dedicate Virtual Resource to Each Application: each LPAR can have dedicated virtual drives
Deduplication Statistics for IBM i
• Outliers apply
– In general, the same concepts apply to IBM i environments as to any other environment in terms of data de-duplication.
• The following de-dupe ratios were discovered during the test process:
– Banking: 21×
– Retail: 24×
– Shipping: 52×
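To put the ratios above in terms of disk space, a ratio of N× means that only 1/N of the logical data occupies physical storage, so the saving is 1 − 1/N. The sketch below applies that formula to the three ratios quoted on the slide; the function name `space_saved_percent` is hypothetical.

```python
def space_saved_percent(dedupe_ratio: float) -> float:
    """Percentage of logical data that does not need physical disk."""
    if dedupe_ratio < 1:
        raise ValueError("a de-dupe ratio below 1x is not meaningful here")
    return (1 - 1 / dedupe_ratio) * 100


# Ratios as quoted on the slide.
test_results = {"Banking": 21, "Retail": 24, "Shipping": 52}
savings = {name: round(space_saved_percent(r), 1) for name, r in test_results.items()}

for name, percent in savings.items():
    print(f"{name}: {test_results[name]}x -> {percent}% less disk")
```

The relationship flattens quickly: going from 21× to 52× more than doubles the ratio but moves the saving by only about three percentage points.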
Performance: CPU-Centric versus Spindle-Bound
[Chart: throughput (MB/s, axis from 50 to 6,000) versus number of disk spindles (50 to 200); series: Data Domain, Fibre Channel, SATA; annotation: "Most deduplication"]
Price / Performance: CPU-centric wins over time
Source: http://seagate.com/docs/pdf/whitepaper/economies_capacity_spd_tp.pdf
• Improve price / performance along with CPUs
• Keep price competitive with tape automation
• Alternative
– Speed through spindle count
– Huge amounts of wasted disk space
Data Domain – Data Flow
Appliance-based
Disk Systems
Hash Table / Previous Stored Versions
Data Domain Core Focus
Speed
SISL (Stream Informed Segment Layout)
Deduplication
Storage
Stream Informed Segment Layout (SISL)
• Summary Vector: memory-based structure to help quickly identify new segments
• Segment Locality: data layout to maximize probability of locating duplicates
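One common way to build a memory-based structure that quickly identifies new segments is a Bloom filter. The sketch below assumes that approach purely for illustration; the slide does not say how the Summary Vector is built, and the class name, bit-array size, and hash count are all hypothetical. A "no" answer is definite, so a new segment can be written without any disk index lookup, while a "yes" answer is only probable and would still need confirming against the on-disk index.

```python
import hashlib


class SummaryVector:
    """Bloom-filter-style bit array (illustrative assumption, not the product design)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, fingerprint: bytes):
        # Derive several bit positions from one fingerprint by salting the hash.
        for k in range(self.num_hashes):
            digest = hashlib.sha256(bytes([k]) + fingerprint).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, fingerprint: bytes) -> None:
        for pos in self._positions(fingerprint):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, fingerprint: bytes) -> bool:
        """False means definitely new; True means possibly already stored."""
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(fingerprint))


vector = SummaryVector()
stored = hashlib.sha256(b"segment already on disk").digest()
vector.add(stored)
incoming = hashlib.sha256(b"segment never seen before").digest()
```

Here `vector.might_contain(stored)` is always true. For `incoming`, the filter will almost certainly answer false, which is the fast path that avoids a disk lookup.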
Data Invulnerability Architecture (DIA)
Four key elements of the Data Domain Data Invulnerability Architecture:
• End-to-end verification
• Fault avoidance and containment
• Continuous fault detection and healing
• File system recoverability
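The first and third elements above can be sketched in general terms: verify data by reading it back after the write, and re-check stored checksums periodically. This is a generic illustration, not the DIA implementation, whose details the slide does not give. The dictionary standing in for disk and the names `write_verified` and `scrub` are hypothetical.

```python
import hashlib


def write_verified(storage: dict, key: str, data: bytes) -> str:
    """Write data, then read it back and confirm the checksum before acknowledging."""
    checksum = hashlib.sha256(data).hexdigest()
    storage[key] = (data, checksum)
    stored_data, stored_checksum = storage[key]  # read back what was written
    if hashlib.sha256(stored_data).hexdigest() != stored_checksum:
        raise IOError(f"verification failed for {key!r}")
    return checksum


def scrub(storage: dict) -> list:
    """Background check: return keys whose data no longer matches its checksum."""
    return [key for key, (data, checksum) in storage.items()
            if hashlib.sha256(data).hexdigest() != checksum]


disk = {}
write_verified(disk, "backup-001", b"payroll table")
clean_result = scrub(disk)

# Simulate silent corruption of the stored bytes, leaving the checksum unchanged.
disk["backup-001"] = (b"payroll tab1e", disk["backup-001"][1])
corrupt_result = scrub(disk)
```

A real system would follow detection with repair, for example from RAID parity or a replica; the sketch stops at detection.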
Data Domain Basics
Easy Integration with Existing Environments
[Diagram labels: Backup & Archive Applications; Control Tier, Target Tier, DR Tier; LAN, SAN, WAN; Replication]
1 – CIFS
2 – NFS
3 – NDMP
4 – OST
5 – DD Boost
6 – VTL
DD890 Appliance
10 and 1 Gb Ethernet; 4 and 8 Gb Fibre Channel
Up to 285 TB usable capacity with disk shelves
Deduplicating file system
Industry’s Most Scalable Inline Deduplication Systems
Model                       Speed (DD Boost)  Speed (other)  Logical capacity  Raw capacity   Usable capacity
DD140                       490 GB/hr         450 GB/hr      9–43 TB           1.5 TB         0.86 TB
DD610                       1.3 TB/hr         675 GB/hr      40–195 TB         Up to 6 TB     Up to 3.98 TB
DD630                       2.1 TB/hr         1.1 TB/hr      84–420 TB         Up to 12 TB    Up to 8.4 TB
DD670                       5.4 TB/hr         3.6 TB/hr      0.6–2.7 PB        Up to 76 TB    Up to 55.9 TB
DD860                       9.8 TB/hr         5.1 TB/hr      1.4–7.1 PB        Up to 192 TB   Up to 142 TB
DD890                       14.7 TB/hr        8.1 TB/hr      2.9–14.2 PB       Up to 384 TB   Up to 285 TB
Global Deduplication Array  26.3 TB/hr        10.7 TB/hr     5.7–28.5 PB       Up to 768 TB   Up to 570 TB
DD Archiver                 9.8 TB/hr         4.3 TB/hr      5.7–28.5 PB       Up to 768 TB   Up to 570 TB
Software options:
DD Boost, DD Virtual Tape Library, DD Replicator,
DD Retention Lock, and DD Encryption
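The throughput figures on this slide translate directly into a best-case backup window: data size divided by throughput. The sketch below uses the DD890 rates quoted above and a hypothetical 20 TB full backup. The quoted rates are maximums, so real windows will be longer once host, network, and single-stream limits are taken into account.

```python
def backup_hours(data_tb: float, throughput_tb_per_hr: float) -> float:
    """Best-case backup window in hours at a sustained throughput."""
    if throughput_tb_per_hr <= 0:
        raise ValueError("throughput must be positive")
    return data_tb / throughput_tb_per_hr


# DD890 rates as quoted on the slide (TB/hr).
DD890_RATES = {"DD Boost": 14.7, "other": 8.1}
FULL_BACKUP_TB = 20  # hypothetical backup size, chosen for illustration

windows = {mode: backup_hours(FULL_BACKUP_TB, rate) for mode, rate in DD890_RATES.items()}
for mode, hours in windows.items():
    print(f"{mode}: about {hours:.1f} h for {FULL_BACKUP_TB} TB")
```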
Replication Topologies
[Diagram labels: Source, Destination; Entire “Collection”; BOOST Backup Image]
Multi-Site Protection for Remote Office
[Diagram labels: Remote Sites, Data Center Hub, WAN; 1-5% (shown three times); Data Domain System; Home, DIR A, DB; Archive Data, Backup Data]
DD Encryption Software
Industry’s first encryption of deduplicated data at rest
• Protects against loss of disk or system
– Inline encryption provides immediate protection while preserving deduplication
– Works with all protocols and applications
• Uses RSA BSAFE® FIPS 140-2 validated cryptographic libraries
• Replicate encrypted data
• Security officer role for dual authentication
– Requires one admin user and one security officer role user for lock, passphrase, and disable functions
Inline: deduplication and encryption before storing
DD Boost Software
• Distributes parts of deduplication process to backup server
• Supports majority of backup software market
– Symantec NetBackup and Backup Exec
– EMC NetWorker
• Speeds backups by up to 50%
• Process more backups with existing resources
– 20–40% less overall impact to backup server
– 80–99% less LAN bandwidth
• Enables Data Domain replication management from the backup application
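The LAN bandwidth saving follows from doing part of the deduplication on the backup server: if the server fingerprints segments first, only segments the appliance does not already hold need to cross the network. The sketch below illustrates that idea only. The real DD Boost protocol is not described on the slide, and the `Appliance` class, `boost_backup` function, and fixed 8-byte segments are hypothetical.

```python
import hashlib

SEGMENT_SIZE = 8  # assumption: fixed-size segments, to keep the example short


class Appliance:
    """Stand-in for the deduplication target (hypothetical interface)."""

    def __init__(self):
        self.segments = {}  # fingerprint -> segment bytes

    def missing(self, fingerprints) -> set:
        """Tell the backup server which fingerprints are not yet stored."""
        return {fp for fp in fingerprints if fp not in self.segments}

    def store(self, segments_by_fingerprint: dict) -> None:
        self.segments.update(segments_by_fingerprint)


def boost_backup(appliance: Appliance, data: bytes) -> int:
    """Back up data; return the segment payload bytes sent over the LAN."""
    segments = [data[i:i + SEGMENT_SIZE] for i in range(0, len(data), SEGMENT_SIZE)]
    fingerprints = [hashlib.sha256(s).hexdigest() for s in segments]
    needed = appliance.missing(fingerprints)  # small fingerprint exchange
    payload = {fp: s for fp, s in zip(fingerprints, segments) if fp in needed}
    appliance.store(payload)                  # only new segments are shipped
    return sum(len(s) for s in payload.values())


target = Appliance()
first_sent = boost_backup(target, b"AAAAAAAABBBBBBBB")   # nothing stored yet
second_sent = boost_backup(target, b"AAAAAAAACCCCCCCC")  # one segment changed
```

The sketch counts segment payload only and ignores the fingerprint traffic itself, which is small relative to the segment data.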
Data Domain Archiver
Data Domain Controller
Active Tier
Archive Tier
Backups