EMC Disk Library with EMC Data Domain Deployment Scenario
Best Practices Planning
Copyright © 2010 EMC Corporation. All rights reserved.
EMC believes the information in this publication is accurate as of its publication date. The information is subject to change without notice.
THE INFORMATION IN THIS PUBLICATION IS PROVIDED “AS IS.” EMC CORPORATION MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO THE INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Use, copying, and distribution of any EMC software described in this publication requires an applicable software license.
For the most up-to-date listing of EMC product names, see EMC Corporation Trademarks on EMC.com. All other trademarks used herein are the property of their respective owners.
Part Number h6925
Table of Contents
Executive summary
Introduction
Audience
Terminology
Disk Library and Data Domain deployment overview
Supported environments
Supported Data Domain systems
Supported EMC Disk Library systems
Physical connectivity
Sizing the Disk Library with the Data Domain system
Conclusion
References
Executive summary
Today’s IT environments are faced with the combination of data growth and shrinking backup windows.
Recovery time objectives (RTOs) and recovery point objectives (RPOs) are also becoming more stringent, increasing the importance of a highly reliable, high-performance backup environment.
As a complement to tape for long-term, offsite storage, backup-to-disk and the EMC® Disk Library products have emerged as powerful solutions. Customers seeking the advanced virtual tape library (VTL) functionality of the Disk Library as well as the ROI benefits of deduplication can leverage a Disk Library deployment with Data Domain. This enables customers to move data to Data Domain deduplication storage systems for longer-term retention of data and network-efficient replication.
Replication of deduplicated backup data is supported with Data Domain deduplication storage interoperability. The advantage of replicating with Data Domain is that only deduplicated data is replicated, which significantly reduces bandwidth requirements. Further, Data Domain Replicator software sends only new and unique data segments to the remote location. In addition, all data is verified independently at the remote site after copy.
Data Domain also offers flexible deployment options to meet a broad scope of data protection needs.
Replicator software provides a full range of replication options, including collection, tape pool, many-to-one, bi-directional, and cascaded replication. Data Domain network-efficient replication is an efficient way to satisfy vaulting requirements.
There are three Disk Library software options available to enable deployment with Data Domain:
• Automated Tape Caching
• Embedded EMC NetWorker® storage node
• Embedded Symantec NetBackup media server
Any one of these software options will allow you to copy Disk Library virtual tapes to a Data Domain system.
Introduction
This white paper provides an overview of best practices for deploying the Disk Library together with Data Domain systems.
Audience
This white paper is intended for EMC customers, EMC system engineers, and members of the EMC and partner professional services communities who are interested in incorporating EMC Disk Libraries and Data Domain systems into a backup environment.
Terminology
• Automated Tape Caching – Licensable option that allows data to be temporarily stored on the Disk Library. That data is eventually written to back-end physical tape, or a Data Domain appliance, allowing space to be freed up on the Disk Library.
• Deduplication – Process of detecting and identifying the redundant variable-length blocks (or data segments) within a given set of data to eliminate redundancy.
• DD690 – Data Domain DD690 system.
• DD880 – Data Domain DD880 system.
• DL4000 Series – EMC Disk Library 4000 series appliances.
• DL5000 Series – EMC Disk Library 5000 series appliances.
• Embedded media server — A feature available on the Disk Library providing Symantec NetBackup media server functionality embedded within the Disk Library engine(s). This allows for NetBackup environment awareness of duplicate copies of virtual tapes that are exported to a physical library connected to the back end of a Disk Library and controlled by the embedded media server.
• Embedded storage node — A feature available on the Disk Library providing NetWorker storage node functionality embedded within the Disk Library engine. This allows for NetWorker environment awareness of clone copies of virtual tapes that are exported to a physical library connected to the back end of a Disk Library and controlled by the embedded storage node.
• Engine — A Disk Library or Data Domain deduplication appliance server.
• Flex port — Fibre Channel (FC) ports on the Disk Library server that can be configured as either front-end (SAN client) ports or back-end (physical library) ports. Flex ports do not connect to the EMC storage arrays. See also library port.
• Library port — Fibre Channel (FC) ports on the Disk Library server(s) used to connect to a back-end physical library, another Disk Library, or a Data Domain appliance. These ports are also referred to as initiator ports.
• Remote replication – Backup data residing on a Data Domain appliance is copied over a LAN or WAN to another Data Domain appliance in deduplicated form for disaster recovery protection.
• SAN client — A backup server that connects through a FC SAN to a Disk Library.
• SAN client port — FC ports on the Disk Library server used to connect backup servers (clients of the Disk Library). These ports are also referred to as target ports.
• Server — A Disk Library or Data Domain appliance server. Also known as an engine.
• Tape migration – The process of sending data from the Disk Library to the Data Domain system using Automated Tape Caching.
• TLU — Tape library unit, sometimes referred to as a physical library unit (PLU).
• Virtual tape library (VTL) – Software emulation of a physical tape library system.
Disk Library and Data Domain deployment overview
The Disk Library with the Data Domain deployment scenario is a 4000 or 5000 series Disk Library with a Data Domain DD690 or DD880 system as shown in Figure 1. In this deployment scenario, data in the Disk Library virtual tape cartridges is migrated or copied to the Data Domain system where it is deduplicated to remove data redundancies, resulting in longer data retention capability than a stand-alone Disk Library.
The Data Domain system does not need to be dedicated to the Disk Library. While operations are occurring from the Disk Library to the Data Domain system, concurrent NAS or VTL jobs can be occurring in parallel on the Data Domain system.
Figure 1. EMC Disk Library with the Data Domain deployment data flow
The EMC Disk Library:
• Provides significant performance advantages over tape-based solutions since data is written to disk
• Eliminates single points of failure through a high availability (HA) design with redundant components and active engine failover
• Presents itself as one of many standard, open-system tape library and tape drive formats to backup applications
The Data Domain deduplication storage system:
• Eliminates redundant data from backups to reduce storage requirements, enabling longer onsite retention and reducing replication costs
• Performs sub-file, variable block length deduplication as data is ingested into the system
• Includes built-in data compression that is additive to deduplication in the data reduction process
The Data Domain Replicator software option leverages the Data Domain system's deduplication and compression capabilities, substantially reducing the amount of backup data that needs to be sent to a remote site. Data Domain Replicator software provides rapid local and remote restore with the following benefits:
• Permits bi-directional replication between Data Domain systems
• Replicates deduplicated virtual tapes to reduce bandwidth requirements
• Further reduces network traffic as only changed data is sent to the target Data Domain system
• Automatically replicates tapes to the target system
• Provides detailed replication reporting through the Data Domain GUI and CLI
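To make the idea of sending only new and unique data segments concrete, the following minimal Python sketch models a target that tracks segments by fingerprint. It is an illustration only and not the Data Domain Replicator implementation: it assumes fixed-size segments and SHA-256 fingerprints for simplicity (the Data Domain system deduplicates on variable-length segments), and the names `split_segments`, `fingerprint`, `replicate`, and `SEGMENT_SIZE` are hypothetical.

```python
import hashlib

SEGMENT_SIZE = 8  # hypothetical fixed segment size in bytes, kept tiny for illustration


def split_segments(data: bytes, size: int = SEGMENT_SIZE) -> list[bytes]:
    """Split data into fixed-size segments (a simplification of variable-length segmenting)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def fingerprint(segment: bytes) -> str:
    """Identify a segment by its SHA-256 digest (an assumed fingerprint scheme)."""
    return hashlib.sha256(segment).hexdigest()


def replicate(data: bytes, target_store: dict[str, bytes]) -> int:
    """Send only the segments the target does not already hold.

    Returns the number of bytes actually transferred.
    """
    sent = 0
    for segment in split_segments(data):
        key = fingerprint(segment)
        if key not in target_store:
            target_store[key] = segment
            sent += len(segment)
    return sent


target: dict[str, bytes] = {}
first_backup = b"AAAAAAAABBBBBBBBCCCCCCCC"
second_backup = b"AAAAAAAABBBBBBBBDDDDDDDD"  # only the last segment changed

first_sent = replicate(first_backup, target)    # all three segments are new
second_sent = replicate(second_backup, target)  # only the changed segment is sent
print(first_sent, second_sent)
```

In this toy run the second backup transfers a third of the bytes of the first, because two of its three segments are already present on the target.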
Supported environments
The Disk Library with the Data Domain system supports the backup applications and versions listed in the EMC Support Matrix on Powerlink® for the Disk Library.
Supported Data Domain systems
The following Data Domain systems are supported:
• DD690 — Version 4.7.3.1 or later
• DD880 — Version 4.7.3.1 or later
Supported EMC Disk Library systems
• DL4000 Series — Version 3.3 SP1 or later
• DL5000 Series — Version 4.0 or later
These Disk Library models have been qualified with the above Data Domain systems. All configurations require an official EMC Request for Product Qualification (RPQ).
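The minimum versions listed above can be checked programmatically before planning a deployment. The sketch below is one possible approach under stated assumptions: the version string format (`X.Y[.Z...]` optionally followed by `SPn`), the ordering rule (service pack as a tie-breaker), and the names `MINIMUM_VERSIONS`, `parse_version`, and `is_supported` are all hypothetical. Passing this check does not replace the RPQ requirement.

```python
import re

# Minimum versions as listed in this paper; the dictionary keys are illustrative.
MINIMUM_VERSIONS = {
    "DD690": "4.7.3.1",
    "DD880": "4.7.3.1",
    "DL4000": "3.3 SP1",
    "DL5000": "4.0",
}


def parse_version(text: str) -> tuple[tuple[int, ...], int]:
    """Parse 'X.Y[.Z...] [SPn]' into ((X, Y, ...), n); n is 0 when no service pack is given."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)*)(?:\s+SP(\d+))?\s*", text, re.IGNORECASE)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    numbers = tuple(int(part) for part in match.group(1).split("."))
    service_pack = int(match.group(2) or 0)
    return numbers, service_pack


def is_supported(model: str, installed: str) -> bool:
    """Return True if the installed version meets the listed minimum for the model."""
    if model not in MINIMUM_VERSIONS:
        return False
    have_nums, have_sp = parse_version(installed)
    need_nums, need_sp = parse_version(MINIMUM_VERSIONS[model])
    # Pad with zeros so that, for example, 4.0 and 4.0.0 compare as equal.
    width = max(len(have_nums), len(need_nums))
    have = have_nums + (0,) * (width - len(have_nums))
    need = need_nums + (0,) * (width - len(need_nums))
    return (have, have_sp) >= (need, need_sp)
```

For example, `is_supported("DL4000", "3.3")` is false under these assumptions because the service pack is missing, while `is_supported("DL4000", "3.3 SP1")` is true.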
Physical connectivity
The Disk Library consists of one or two servers (engines) attached to one or two CLARiiON® arrays.
The Disk Library connects to the Data Domain system through a storage area network (SAN) using one to four FC ports on each Disk Library engine (maximum of four connections to the Data Domain system allowed) connected to one to four ports on a Data Domain system. For four-port connectivity, two FC HBA cards must be installed in the Data Domain system. Direct-connecting FC cables from a Disk Library to a Data Domain system is not supported.
Each Disk Library engine has four Fibre Channel library ports (4, 5, 8, and 9) for initiator mode SCSI attach. Any one of these ports can be used for connection to the Data Domain appliance. With the Disk Library system, any unused library ports are available for connecting a physical tape library or another Disk Library for use as a back-end library.
Figure 2 shows one of the possible ways the Disk Library can be interconnected with the two Data Domain models.
[Figure 2 diagram: a Data Domain system (ports labeled 5A, 5B, 6A, and 6B, plus Ethernet ports for replication, management, and data) connects through an FC SAN to Disk Library Server A and Disk Library Server B. Each Disk Library server shows FC ports numbered 0 through 11, with the remaining FC ports available for SAN connections to backup servers, and the servers attach to a CLARiiON array.]
Figure 2. Disk Library with four Data Domain system interconnections
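The connectivity rules in this section can be expressed as a small planning check. The following Python sketch is a hypothetical aid, not an EMC tool: the `EngineLink` structure and `validate` function are assumptions introduced for illustration, and the sketch applies the one-to-four port limit per engine as the text reads.

```python
from dataclasses import dataclass

LIBRARY_PORTS = {4, 5, 8, 9}  # initiator-mode library ports named in this paper


@dataclass
class EngineLink:
    """Hypothetical description of one Disk Library engine's links to a Data Domain system."""
    engine_ports: set[int]   # Disk Library FC ports used for the Data Domain connection
    data_domain_ports: int   # number of Data Domain FC ports used
    data_domain_hbas: int    # number of FC HBA cards installed in the Data Domain system
    via_san: bool = True     # False models a direct FC cable connection


def validate(link: EngineLink) -> list[str]:
    """Return a list of rule violations; an empty list means the link passes these checks."""
    problems = []
    if not link.via_san:
        problems.append("direct FC connection is not supported; connect through a SAN")
    if not 1 <= len(link.engine_ports) <= 4:
        problems.append("use one to four FC ports per Disk Library engine")
    if not link.engine_ports <= LIBRARY_PORTS:
        extra = sorted(link.engine_ports - LIBRARY_PORTS)
        problems.append(f"ports {extra} are not library ports (4, 5, 8, 9)")
    if not 1 <= link.data_domain_ports <= 4:
        problems.append("use one to four Data Domain FC ports")
    if link.data_domain_ports == 4 and link.data_domain_hbas < 2:
        problems.append("four-port connectivity requires two FC HBA cards")
    return problems
```

For example, `validate(EngineLink({4, 5, 8, 9}, 4, 1))` reports that four-port connectivity requires two FC HBA cards, while the same link with two HBA cards returns an empty list.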
Sizing the Disk Library with the Data Domain system
To properly size the Disk Library when it is configured with the Data Domain system, retention requirements must be thoroughly reviewed and adjusted to accommodate the longer retention times possible with deduplication technology. Each system must be sized separately according to the desired retention scheme, and data access needs must be anticipated to take full advantage of each system's features; this may require backup policies to be re-evaluated. Storage capacity must be sized to adequately handle the amount of data expected to be retained in both native and deduplicated format. Please contact your EMC representative to properly size the environment in which this interoperability will be used.
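As a rough illustration of why the two systems are sized separately, the sketch below estimates native capacity on the Disk Library and deduplicated capacity on the Data Domain system. It is a deliberately simplified model and not the EMC sizing process: the formulas, the function names, and every input value (daily backup volume, retention periods, and a constant 10:1 reduction ratio) are assumptions chosen only to show the arithmetic. Real reduction ratios vary with data type, change rate, and retention.

```python
def native_capacity_tb(daily_backup_tb: float, retention_days: int) -> float:
    """Capacity needed to hold backups in native (non-deduplicated) form."""
    return daily_backup_tb * retention_days


def deduplicated_capacity_tb(daily_backup_tb: float, retention_days: int,
                             reduction_ratio: float) -> float:
    """Capacity needed after an assumed overall reduction ratio (10 means 10:1)."""
    if reduction_ratio <= 0:
        raise ValueError("reduction_ratio must be positive")
    return daily_backup_tb * retention_days / reduction_ratio


# Hypothetical inputs: short native retention on the Disk Library,
# longer deduplicated retention on the Data Domain system.
disk_library_tb = native_capacity_tb(daily_backup_tb=2.0, retention_days=14)
data_domain_tb = deduplicated_capacity_tb(daily_backup_tb=2.0, retention_days=90,
                                          reduction_ratio=10.0)
print(disk_library_tb, data_domain_tb)
```

With these assumed inputs, 14 days of native retention needs more capacity (28 TB) than 90 days of deduplicated retention (18 TB), which is why retention policies are worth revisiting when deduplication is introduced.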
Moving data from the Disk Library to the Data Domain deduplication storage system
There are three methods for tape migration or copying data from the Disk Library to the Data Domain system. These three methods use existing features within the Disk Library software.
• Automated Tape Caching
With the Automated Tape Caching feature, the virtual tapes act as disk-based cache to physical libraries such that data is first written to virtual tapes in a VTL and later copied to virtual tapes in the Data Domain system based on user-defined policies. The movement of data over the SAN from the Disk Library to a Data Domain system is done by a process called tape migration. Migrating data causes a copy of the data to exist in two physical locations, one on the Disk Library and one on the Data Domain system. All reads and writes of the data will occur with the copy of data present on the Disk Library.
This data resides on both systems until a reclamation process is run on the Disk Library. Reclamation removes the data in the virtual tape on the Disk Library and replaces it with a pointer (or tape stub) to the data on the Data Domain system. After reclamation, the data is only present in its compressed and deduplicated form on the Data Domain system.
This is the recommended feature when you are using a backup application other than EMC NetWorker or Symantec NetBackup. It requires a Disk Library license key to activate.
For best practices planning and for more information on how to set up and use Automated Tape Caching on the Disk Library, please see the EMC Disk Library Automated Tape Caching Feature — A Detailed Review white paper on Powerlink.
• Embedded storage node (EMC NetWorker)
The embedded storage node software treats the Disk Library emulated libraries and drives as if they were physical tape libraries and drives. From a backup application point of view, the devices are standard backup targets. Using the EMC NetWorker cloning operation, data is cloned from the Disk Library to the Data Domain system based on user-defined policies. Data is also expired on the Disk Library and the Data Domain system based on user-defined policies.
This is the recommended feature when you are using the EMC NetWorker backup application. It requires a Disk Library license key to activate.
For best practices planning and for more information on how to set up and use the embedded storage node on the Disk Library, please see the EMC Disk Library with NetWorker — Best Practices Planning white paper on Powerlink.
• Embedded media server (Symantec NetBackup)
The embedded media server software treats the Disk Library emulated libraries and drives as if they were physical tape libraries and drives. From a backup application point of view, the devices are standard backup targets. Using the Symantec NetBackup duplication operation, data is duplicated from the Disk Library to the Data Domain system based on user-defined policies. Data is also expired on the Disk Library and the Data Domain system based on user-defined policies.
This is the recommended feature when you are using the Symantec NetBackup backup application. It requires a Disk Library license key to activate.
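The Automated Tape Caching life cycle described above (write to the Disk Library, tape migration, then reclamation) can be pictured as a simple state model. The Python sketch below is a conceptual illustration only: `TapeState` and `VirtualTape` are hypothetical names, not part of the Disk Library software, and the sketch does not model policies, rewrites, or recalls.

```python
from enum import Enum


class TapeState(Enum):
    CACHED = "data on Disk Library only"
    MIGRATED = "data on Disk Library and Data Domain"
    RECLAIMED = "stub on Disk Library, data on Data Domain"


class VirtualTape:
    """Hypothetical model of one virtual tape under Automated Tape Caching."""

    def __init__(self, barcode: str) -> None:
        self.barcode = barcode
        self.state = TapeState.CACHED  # backup data is first written to the Disk Library

    def migrate(self) -> None:
        """Tape migration: copy the tape's data to the Data Domain system."""
        if self.state is not TapeState.CACHED:
            raise RuntimeError(f"{self.barcode}: already migrated")
        self.state = TapeState.MIGRATED

    def reclaim(self) -> None:
        """Reclamation: replace the Disk Library copy with a pointer (tape stub)."""
        if self.state is not TapeState.MIGRATED:
            raise RuntimeError(f"{self.barcode}: migrate before reclaiming")
        self.state = TapeState.RECLAIMED

    def locations(self) -> list[str]:
        """Systems that currently hold a full copy of the tape's data."""
        if self.state is TapeState.CACHED:
            return ["Disk Library"]
        if self.state is TapeState.MIGRATED:
            return ["Disk Library", "Data Domain"]
        return ["Data Domain"]


tape = VirtualTape("VT0001")  # hypothetical barcode
tape.migrate()
after_migration = tape.locations()
tape.reclaim()
after_reclamation = tape.locations()
```

The model makes the key point explicit: between migration and reclamation the data exists in two physical locations, and only after reclamation does it reside solely on the Data Domain system.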
Using the Disk Library with the Data Domain system
The most common scenarios for using the Disk Library with the Data Domain system are discussed below.
These scenarios use the existing Disk Library software options (Automated Tape Caching, embedded storage node, or embedded media server) to copy data to the Data Domain system.
• Copying data from the Disk Library to the Data Domain system
In this scenario, either one or two engines write data to the Data Domain system. Data is migrated from the Disk Library (using Automated Tape Caching) or copied (using the embedded storage node or embedded media server) to the Data Domain system. Data is sent to the Data Domain system through a Fibre Channel SAN, which can be either a standard SAN or an extended Fibre Channel SAN.
With the Automated Tape Caching feature, the backup application sees the local copy of data and data access is through the Disk Library. With the embedded storage node or embedded media server, the backup application is aware of both copies of data and data access is through the backup application.
• Copying data from the Disk Library to Data Domain and to a physical tape library
In this scenario, data is copied to the Data Domain system and a physical tape library via the embedded storage node or embedded media server. In this configuration, the data can reside on each of the three units for different retention periods. Each engine must see both the Data Domain system and the physical tape library, since the data is seen by each engine individually. In a dual-engine configuration, each engine can write to its own Data Domain system and physical tape unit.
• Copying data from the Disk Library to multiple Data Domain systems
Here, the two Disk Library engines write data to two separate Data Domain systems. Data can either be migrated from the Disk Library (using Automated Tape Caching) or copied (using the embedded media server/storage node) to each Data Domain system from its specific Disk Library engine. This is well suited for environments that require the highest performance and wish to fully utilize the performance capabilities of the Disk Library.
• Copying data to the Data Domain and replicating to another Data Domain
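In this scenario, data copied from the Disk Library to the Data Domain system is then replicated to another Data Domain system using Data Domain standard replication commands. A dedicated Disk Library on the target side is not required, although in some tape caching environments, a Disk Library on the target side may be required.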
Conclusion
The information presented in this white paper is intended to provide an overview of a Disk Library with the Data Domain deployment scenario in common backup environments. For more in-depth best practices planning and configuration suggestions, please see the associated white papers available on Powerlink.
References
• EMC Disk Library and EMC Data Domain Solution Sizing Process Guide (for EMC employees only)
• EMC Disk Library Automated Tape Cache Feature - A Detailed Review white paper
• EMC Disk Library DL4106, DL4206, and DL4406 Version 3.2 - Best Practices Planning white paper
• EMC Disk Library with NetWorker - Best Practices Planning white paper
• EMC Disk Library with VERITAS NetBackup - Best Practices Planning white paper
• EMC CLARiiON Backup Storage Solutions - The Value of CLARiiON Disk Library with TSM: A Detailed Review white paper
• Data Domain EMC NetWorker V7.4 Application Introduction
• Data Domain VERITAS NetBackup 6.5 Application Introduction
• Data Domain IBM Tivoli Storage Manager Integration Guide