Storage Choices
for Virtual Machine
Disaster Recovery
When you’re sizing up storage solutions for DR in a virtual environment, you should consider the issues of vendor support; storage architecture; replication options; deduplication; and recovery options.
In this E-Guide, learn about several concerns you need to be aware of when planning for disaster recovery in a virtual server environment.
Table of Contents:
Leveraging storage replication for VM disaster recovery
Disaster recovery planning in a virtualized environment
Hospital stitches in iSCSI switch for virtual DR
Resources from Hewlett-Packard
Leveraging storage replication for VM disaster recovery
Storage replication is a popular method for synchronizing production and disaster recovery (DR) sites in virtual server environments. Whether you’re using array-based replication or a storage virtualization appliance for replication, several variables will influence the efficiency of your storage topology as it relates to DR. When you’re sizing up storage solutions for DR, you should consider five issues:
• Vendor support
• Storage architecture
• Replication options
• Deduplication or single instance storage support
• Recovery options
Of course, there are several ways to get data from a production site to a DR site. Rather than simply give a high-level overview of these alternative virtual machine (VM) replication methods, this article will take a deeper look at specific storage array considerations. However, when it comes to architecting replication for virtual environments, this article can only scratch the surface. Many storage and DR optimization tricks are vendor-specific. Be sure to check your storage and server virtualization vendors’ documentation and architecture guides for details relevant to your particular environment.
Let’s set a baseline by assuming the high-level storage replication architecture shown below. Note: The network storage could be network attached storage (NAS) or either a Fibre Channel (FC) or iSCSI storage array.
All major network storage vendors offer tools for replicating data on an array from one site to another. Most of them use asynchronous replication for site-to-site network storage synchronization, since WAN throughput is usually too low, or the distance between sites too great, for synchronous replication. With asynchronous replication, writes are committed to primary storage, then replicated based on the replication policy set by the storage administrators.
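The commit-then-replicate behavior described above can be illustrated with a toy model. This is only a sketch: the class name `AsyncReplicatedVolume` and its methods are hypothetical and do not correspond to any vendor's API. It shows that writes are acknowledged by the primary first, and that whatever sits in the pending queue is what the DR site would lack if the primary failed at that moment.

```python
from collections import deque


class AsyncReplicatedVolume:
    """Toy model: writes commit locally first and ship to the DR copy later."""

    def __init__(self):
        self.primary = {}        # block number -> data at the production site
        self.replica = {}        # block number -> data at the DR site
        self.pending = deque()   # writes committed locally but not yet shipped

    def write(self, block, data):
        # The write is acknowledged as soon as primary storage has it.
        self.primary[block] = data
        self.pending.append((block, data))

    def replicate(self):
        # Called on whatever schedule the replication policy dictates.
        shipped = 0
        while self.pending:
            block, data = self.pending.popleft()
            self.replica[block] = data
            shipped += 1
        return shipped


vol = AsyncReplicatedVolume()
vol.write(0, b"alpha")
vol.write(1, b"beta")
lag_before = len(vol.pending)   # writes the DR site is missing right now
shipped = vol.replicate()       # one replication cycle drains the queue
in_sync = vol.primary == vol.replica
```

A synchronous design would instead update the replica inside `write` before acknowledging, which is why it depends on a fast, short link between sites.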
Vendor support
Although most storage array vendors offer some form of asynchronous replication, the choice of array vendor nevertheless usually matters. When evaluating storage options, vendor support is a key criterion. A storage array should be supported on products from your environment’s virtualization vendor and OS vendor. Also check whether your enterprise application vendors name the array among their supported storage platforms. Storage platforms that leave a portion of your infrastructure unsupported constitute a risk.
You should also look at your backup vendor’s list of supported storage platforms. Many enterprise backup products are capable of managing snapshots on most popular network storage platforms. A storage platform that integrates with your existing data protection software should be given more consideration than one that does not.
Storage architecture
The way in which storage is architected to support virtualization can have a dramatic effect on replication performance, and thus DR response. Fault tolerance through RAID is required; any storage array should be deployed at RAID level 5 at a minimum.
In terms of DR response, you need to look at how each VM’s virtual disk storage is allocated, as well as how temporary file locations are configured in each VM’s guest operating system. When a storage array is configured to support virtualization, you should set aside a volume set for transient or temporary data. How you deal with transient data should be determined by the service level requirements of the VMs you support. For VM data that is synchronously mirrored over dark fiber between two locations, certain application- or service-centric temporary files may be critical and will need to be replicated too. However, for VMs that are asynchronously replicated to a DR site, in most cases replicating temp files would be a waste of bandwidth and storage space.
Getting back to the storage configuration details, assume you’ve set aside enough volume space (e.g. storage LUN, NFS mount, etc.) for your virtual infrastructure’s temporary data. Once the storage for transient data has been allocated, you should configure the virtual infrastructure so that the following files are stored on the transient data volumes:
• Hypervisor swap files
• Virtual machine guest OS:
  - Swap file
  - Pagefile
  - OS and application temp folders
  - User temp directories
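As a planning aid, the list above can be turned into a small classifier that sorts file paths into those belonging on a replicated volume and those belonging on a transient volume. This is a minimal sketch: the patterns in `TRANSIENT_PATTERNS` are illustrative assumptions (the `.vswp` extension is used as an example of a hypervisor swap file name), and the function names are hypothetical. Check your hypervisor and guest OS documentation for the actual file names and locations in your environment.

```python
from fnmatch import fnmatchcase

# Illustrative patterns only; adjust for your hypervisor and guest OSes.
TRANSIENT_PATTERNS = [
    "*.vswp",           # hypervisor swap file (example naming)
    "*/pagefile.sys",   # Windows guest pagefile
    "*/swapfile",       # Linux guest swap file
    "*/temp/*",         # OS, application and user temp folders
    "*/tmp/*",
]


def is_transient(path):
    """Return True if the path looks like swap, pagefile or temp data."""
    normalized = path.replace("\\", "/").lower()
    return any(fnmatchcase(normalized, p) for p in TRANSIENT_PATTERNS)


def split_by_volume(paths):
    """Group paths by the kind of volume they should live on."""
    plan = {"replicated": [], "transient": []}
    for path in paths:
        key = "transient" if is_transient(path) else "replicated"
        plan[key].append(path)
    return plan


plan = split_by_volume([
    "vm1/vm1.vswp",
    "C:\\pagefile.sys",
    "/var/tmp/session.dat",
    "/data/db/orders.mdf",
])
```

In this example only `/data/db/orders.mdf` ends up in the replicated group; the other three paths match a transient pattern.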
Replication options
Each application’s service level requirements should drive the replication requirements of any storage platform. Platforms that offer synchronous and asynchronous replication features, along with block level incremental replication and granular snapshot features, are more likely to meet all of your storage replication requirements. The bottom line should always be the storage solution’s ability to leverage replication in order to meet your recovery time objectives (RTOs) and recovery point objectives (RPOs).
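One way to test a replication schedule against an RPO is a back-of-the-envelope check. The sketch below assumes a simplified model that is not taken from any vendor's documentation: a replication cycle starts every `interval_min` minutes and takes `transfer_min` minutes to complete, so in the worst case the DR copy is `interval_min + transfer_min` minutes behind production. The function names and sample numbers are hypothetical.

```python
def worst_case_data_loss_min(interval_min, transfer_min):
    """Worst-case age of the DR copy under the simplified model above."""
    return interval_min + transfer_min


def meets_rpo(interval_min, transfer_min, rpo_target_min):
    """True if the worst-case exposure stays within the RPO target."""
    return worst_case_data_loss_min(interval_min, transfer_min) <= rpo_target_min


hourly = meets_rpo(60, 20, 60)     # 80 minutes of exposure vs a 60-minute RPO
frequent = meets_rpo(15, 5, 60)    # 20 minutes of exposure vs a 60-minute RPO
```

Under these assumptions the hourly schedule misses a 60-minute RPO while the 15-minute schedule meets it, which is why block level incremental replication (smaller, faster transfers) matters for tight RPOs.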
Deduplication or single instance storage support
A high number of VMs with identical OSes, applications or services will often reside on the same storage array. Storage nodes with built-in data deduplication or single instance storage support will offer significant storage savings by eliminating data redundancy on storage blocks. Note: To realize these storage savings, the storage array should also support thin provisioning. Otherwise a virtual hard disk file (for example, a .vmdk file on a VMFS volume) would consume all of its allocated space at the time it is provisioned. Thin provisioning would allow the virtual hard disk to consume its assigned storage as the virtual hard disk grows in size. With ESX server, thin provisioning is supported by thin formatting VMDKs.
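The idea of eliminating redundant blocks across similar VMs can be sketched with content hashing. This is an illustration of the general technique, not how any particular array implements it; the 4 KB block size, the function names and the tiny synthetic "disk images" are assumptions made for the example.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size for the example


def dedup_store(disks):
    """Store each distinct block once and keep a per-disk list of digests."""
    store = {}       # digest -> block data, stored once
    manifests = {}   # disk name -> ordered list of block digests
    for name, image in disks.items():
        digests = []
        for offset in range(0, len(image), BLOCK_SIZE):
            block = image[offset:offset + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            store.setdefault(digest, block)
            digests.append(digest)
        manifests[name] = digests
    return store, manifests


def rebuild(store, manifest):
    """Reassemble a disk image from its manifest."""
    return b"".join(store[d] for d in manifest)


# Two VMs share the same two "OS" blocks and differ in one data block.
os_image = b"A" * BLOCK_SIZE + b"B" * BLOCK_SIZE
disks = {
    "vm1": os_image + b"1" * BLOCK_SIZE,
    "vm2": os_image + b"2" * BLOCK_SIZE,
}
store, manifests = dedup_store(disks)
logical_blocks = sum(len(m) for m in manifests.values())
unique_blocks = len(store)
```

Here six logical blocks are held as four stored blocks, and each disk can still be rebuilt exactly from its manifest. The more VMs share a common OS image, the larger the saving.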
One of the key benefits of deduplicated storage is that the amount of data to be replicated to the DR site will be significantly reduced, by as much as 60%. You could optimize WAN throughput with a WAN accelerator device, but this won’t reduce storage costs. Deduplicated storage will not only reduce the WAN bandwidth needed to replicate storage but will also reduce the total amount of storage needed for a given virtual infrastructure. By reducing the amount of storage you need to replicate, you’ll also be able to replicate storage more frequently and thus reduce your RPO.
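To see what a reduction of this size means for a replication window, a rough calculation helps. The sketch below is illustrative only: the 200 GB change set and the 100 Mbit/s link are made-up numbers, the function name is hypothetical, and the model ignores protocol overhead and link contention. Only the 60% figure comes from the text above.

```python
def transfer_hours(changed_gb, wan_mbps, dedup_reduction=0.0):
    """Hours to ship changed_gb over a wan_mbps link after deduplication."""
    effective_gb = changed_gb * (1.0 - dedup_reduction)
    megabits = effective_gb * 8 * 1000   # decimal units: 1 GB = 8000 megabits
    return megabits / wan_mbps / 3600


raw_hours = transfer_hours(200, 100)            # about 4.4 hours
deduped_hours = transfer_hours(200, 100, 0.60)  # about 1.8 hours
```

With these assumed inputs, the same link that needed well over four hours per cycle needs under two, leaving room to run replication cycles more often.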
Recovery options
Many storage arrays only provide volume-level recovery for virtual machines. While volume-level recovery is usually what you need for DR, you should look at storage platforms that offer granular file level recovery for files residing in virtual hard disks. Platforms that offer you the ability to recover previous volumes or previous versions of single files from snapshots allow you to leverage the storage solution for both DR and day-to-day file recovery operations. Such solutions would save on the required storage space for data protection operations, as most file-level backups would be unnecessary since file recovery could come from previous volume-level snapshots.
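The difference between volume-level and file-level recovery from the same snapshots can be shown with a toy model. This is a sketch under simplifying assumptions: the `SnapshotVolume` class is hypothetical, a "volume" is just a dictionary of file paths to contents, and each snapshot is a full point-in-time copy rather than the block-level deltas a real array would keep.

```python
class SnapshotVolume:
    """Toy volume whose snapshots serve both DR and single-file recovery."""

    def __init__(self):
        self.files = {}       # path -> file contents (bytes)
        self.snapshots = []   # point-in-time copies of self.files

    def snapshot(self):
        # Contents are immutable bytes, so a shallow copy is a safe snapshot.
        self.snapshots.append(dict(self.files))
        return len(self.snapshots) - 1

    def restore_volume(self, snap_id):
        # Volume-level recovery: everything goes back to the snapshot state.
        self.files = dict(self.snapshots[snap_id])

    def restore_file(self, snap_id, path):
        # File-level recovery: only one file is rolled back.
        self.files[path] = self.snapshots[snap_id][path]


vol = SnapshotVolume()
vol.files["report.doc"] = b"v1"
vol.files["db.mdf"] = b"orders-1"
snap = vol.snapshot()

vol.files["report.doc"] = b"corrupted"
vol.files["db.mdf"] = b"orders-2"
vol.restore_file(snap, "report.doc")   # db.mdf keeps its newer contents
```

After the single-file restore, `report.doc` is back to its snapshot version while `db.mdf` retains its later changes; a volume-level restore would have rolled both back.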
Disaster recovery planning in a virtualized environment
Because of its ease of deployment and integration, server virtualization can be a highly effective tool for disaster recovery. Server virtualization addresses three concerns related to disaster recovery:
• Cost: Virtualization allows companies to reduce the number of physical servers they deploy at production and recovery sites.
• Procurement delays: Virtualization eliminates most hardware dependencies.
• Rapid recovery: Virtualized server images can be rapidly deployed and in some cases, moved across physical systems.
When considering server virtualization in a disaster recovery strategy, storage administrators must take into account their data protection, recovery granularity and restore objectives.
Data protection (backups)
One challenging aspect of server virtualization and disaster recovery is that without valid and usable backup data, there is not much to recover. Virtualization alone does not ensure recoverability of the data. Several options are available for backing up virtual servers, and the results of each will vary. If a conventional backup agent is installed on each virtual machine, you can expect to get results comparable to those in a physical server environment.
Recovery granularity
If image-level backups (VMDK) are preferred over conventional backup agents because of the software cost reduction benefits, the backup strategy must be designed to be non-disruptive and to provide granular (file-level) restore capabilities. Third-party software tools, such as vConvert from Vizioncore Inc. and PlateSpin Forge, help automate full and incremental image backups without taking the virtual server offline. This capability also allows for file-level restores.
Restore performance
One thing to keep in mind when backing up and restoring virtual servers is I/O performance. Virtualization is appealing because it allows the consolidation of otherwise under-utilized server resources. Although this holds true for many systems during production hours, it is often not the case during backups or, more specifically, during restores. The I/O generated by the simultaneous restore of 10 virtual servers on one physical system in a disaster recovery situation can become a serious bottleneck. Simply having hardware available does not guarantee that recovery time objectives (RTOs) will be met.
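A quick estimate shows why simultaneous restores can break an RTO. The numbers below are hypothetical (ten 60 GB virtual servers, a host that sustains 200 MB/s of restore throughput, a 30-minute RTO), the function name is made up for the example, and the model assumes the VMs simply share the host's throughput.

```python
def restore_minutes(vm_sizes_gb, host_throughput_mb_s):
    """Minutes to restore all VMs when they share one host's throughput."""
    total_mb = sum(vm_sizes_gb) * 1000   # decimal units: 1 GB = 1000 MB
    return total_mb / host_throughput_mb_s / 60


one_vm = restore_minutes([60], 200)        # a single VM restored alone
ten_vms = restore_minutes([60] * 10, 200)  # ten VMs restored on the same host
rto_minutes = 30
rto_met = ten_vms <= rto_minutes
```

Under these assumptions a single VM restores in 5 minutes, but ten on the same host take 50 minutes and miss the 30-minute RTO, even though the hardware was available from the start.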
RTOs/RPOs
However, all the virtualization, image backups and data replication in the world will not be much of a disaster recovery strategy unless there is an off-site component. A typical scenario includes virtual servers deployed or ready to be deployed at an alternate site with backups sent offsite or data replicated between sites.
Hospital stitches in iSCSI switch for virtual DR
When St. Joseph Healthcare began planning for automated load balancing and failover between its primary and secondary data centers last year, its IT department started with an architecture, and then tried to find components to build that architecture using trusted Fibre Channel vendors.
But along the way, the hospital found an iSCSI switch from Sanrad Inc. to be a more viable tool for pulling all the pieces together.
The hospital wanted to send each newly provisioned volume of storage to the secondary site automatically and asynchronously, keeping the two sites reasonably in sync. The goal was not only to be able to balance workloads between the primary site and hot disaster recovery sites using VMware Inc.’s VMotion, but also to be able to lose the entire primary data center almost instantaneously and fail over to the secondary site automatically with no impact to users.
The first piece of the hospital’s disaster recovery puzzle was a two-year-old EMC DMX 800 disk array that it moved to the secondary site. The next addition was a Fibre Channel and FATA-based EVA 8100 from Hewlett-Packard. Eric Nelson, St. Joseph’s CIO and IT director, said the EMC array he inherited was still too new to decommission, but he’d also grown familiar with HP in previous jobs and wanted to use the more intuitive virtualized provisioning on the EVA.
This complicated the replication plans, however. Nelson knew putting a second HP system at the secondary site for homogeneous replication would let him transfer data fast enough between the sites, but he didn’t want to lose his investment in the EMC array. That’s when the hospital’s local VAR, Total Tec, suggested Sanrad’s V-Switch product as a way to get asynchronous replication between the heterogeneous arrays.
“I was skeptical about iSCSI,” Nelson said. “I had used Fibre Channel for years and years and had trouble envisioning putting things like my high-transaction databases on iSCSI.” But using the Sanrad switch would save him $300,000 over its Fibre Channel alternatives. He tried it and was surprised by its performance. “It supports our high-transaction SQL cluster just fine,” he said, pegging SQL’s transaction rate at 100 to 140 per second. “I have noticed no bottlenecks.”
The hospital’s storage team first presents Fibre Channel LUNs from the EVA at the primary site to the Sanrad switch, then presents iSCSI LUNs to most of the 140 physical and virtual servers running in the production environment, both at the primary and secondary sites. The Sanrad switch then automatically mirrors every volume of data to the EMC array at the secondary site.
Nelson admitted that there are a few unique attributes of his environment that allow adequate performance for this architecture. One is that the hospital owns the pipe from one site to another. Another is the hospital’s dual Cisco Catalyst 4510 backbone, which has enough bandwidth to run three separate VLANs to support iSCSI, each of which also has its own blade in the Cisco chassis.
The ultimate test of this infrastructure has yet to be carried out. “We can just about get all our critical servers running simultaneously on the hosts at each physical site, but it’s pretty tight,” Nelson said. The hospital has run some test failovers, and while it has verified that the mirroring is working on a day-to-day basis, it has not yet performed a full live shutdown of the primary site.
In the meantime, Nelson said he’s discovered another advantage to iSCSI over Fibre Channel—the ability to directly attach LUNs to virtual hosts. “Trying to attach a Fibre Channel LUN to a virtual host is like trying to plug in a USB device—you need an HBA,” he said. “With iSCSI, the target and initiator can both be software-based.” In fact, he said, if he’d known at the outset what he now knows about iSCSI, “I would’ve had no problem at all basing all of this on an iSCSI SAN and saving even more money.”
Resources from Hewlett-Packard
HP StorageWorks 4400 Enterprise Virtual Array (EVA 4400) Performance
Storage Virtualization and the HP StorageWorks Enterprise Virtual Array
About Hewlett-Packard
Hewlett-Packard is one of the world's largest computer companies and the foremost producer of test and measurement instruments. The company's more than 29,000 products are used by people for personal use and in industry, business, engineering, science, medicine and education.
In addition, the company makes networking products, medical electronic equipment, instruments and systems for chemical analysis, handheld calculators and electronic components.
HP is among the top 20 on the Fortune 500 list. The company had net revenue of $42.9 billion in its 1997 fiscal year. More than 56 percent of its business comes from outside the United States, and more than two-thirds of that is from Europe. Other principal markets are Japan, Canada, Australia, the Far East and Latin America. HP ranks among the top 10 U.S. exporters. HP is No. 5 among Fortune's Most Admired Companies and No. 10 among Fortune's Best Companies to Work for in America.
Headquartered in Palo Alto, California, the company employs more than 120,000 people, of whom some 69,000 work in the United States. HP has major sites in 28 U.S. cities and in Europe, Asia Pacific, Latin America and Canada.
HP sells its products and services through about 600 sales and support offices and distributorships in more than 120 countries, and through resellers and retailers.