VMware Virtualization Disaster Recovery Solution

According to market research on game-changing virtualization technology conducted by International Data Corporation (IDC) in 2011, workloads running on virtualized platforms have increased greatly in recent years. In 2010, the growth rate exceeded 50% for the first time, and in 2011 it reached 59%. The growth rate continues to rise steadily, and virtualization is being applied in a wider range of scenarios. A service interruption in a virtualized data center running critical applications may cause huge losses to the enterprise that owns it. As a result, a disaster recovery center has become a necessary part of a data center.

Essential characteristics of virtualization, including partitioning, isolation, encapsulation, and independence, make virtualized environments more flexible. Enterprises pose more demanding requirements for disaster recovery in virtualized environments, including simple and convenient disaster recovery management and maintenance, cost-effective and flexible disaster recovery drills, and failover.

VMware vCenter Site Recovery Manager (SRM) is a business continuity and disaster recovery solution provided by VMware. SRM helps users plan, test, and execute the recovery of virtual machine services between the production and disaster recovery sites. For enterprises' critical applications, the OceanStor enterprise storage system provides the SRA suite, which works with SRM to provide simple and efficient disaster recovery solutions for virtualized environments.

HUAWEI OceanStor 18000 Series Enterprise Storage System

Disaster Recovery White Paper 5 Disaster Recovery Solutions

• Simple management and low maintenance costs

Disaster recovery management is integrated into the original vSphere management platform.

Complex array management and virtual machine operations are encapsulated into simple operations on the user interface. The solution is easy to deploy and maintain, which greatly reduces maintenance costs.

• One-click switchover that minimizes the RTO

After the disaster recovery solution is deployed, users can switch services over to the disaster recovery center with a one-click operation. The system then completes the switchover process automatically. This minimizes the RTO and avoids manual operations that could result in severe consequences.

• Flexible disaster recovery plan configuration and on-demand drills

Users can make disaster recovery plans based on their actual needs and perform efficient disaster recovery drills. The drills use a private network and therefore do not affect services at the original production site. During a drill, users can view the result of each step in real time. This improves users' confidence in the disaster recovery solution.

6 Advantages of the Disaster Recovery Solutions

6.1 Over a Decade of Accumulated Investment in Storage

As a professional Chinese storage vendor with over 10 years of experience in the storage field, Huawei insists on developing its own storage products and is dedicated to providing simple and cost-efficient storage products and solutions that optimize enterprise IT applications.

Huawei now has five R&D centers worldwide with over 3000 professional storage R&D engineers. It has invested over 10 million USD in its storage compatibility test lab, storage performance test lab, solution verification center, and China's first disaster recovery and cloud storage lab in Chengdu. Drawing on its capabilities in chips, networks, core software and hardware technologies, and shared platforms, Huawei delivers a full range of storage products and solutions: from unified storage and big data storage to cloud storage, from main storage to auxiliary storage, and from hardware to independent software. With the best R&D capability in China, Huawei delivers one-stop infrastructure solutions and services covering storage, servers, cloud computing, and networks. Within the professional storage R&D team, over 400 engineers are dedicated to disaster recovery solutions. With constant investment and innovation, these engineers deliver localized and customized storage services and core technology support for users.

6.2 Multi-Level Disaster Recovery Solution

The OceanStor enterprise storage system adopts the innovative Smart Matrix architecture, which has multiple engines (each containing two controllers) and supports scale-out. The system can therefore deliver performance that fits the requirements of applications in each construction phase of the IT system, and provides customized disaster recovery solutions (backup, media disaster recovery, data-level disaster recovery, or application-level disaster recovery) with predictable service levels for customers.

6.3 Business Continuity and Professional Service Throughout the Entire Process

Huawei designed disaster recovery solutions using the OceanStor enterprise storage system based on a thorough and accurate understanding of customer requirements for disaster recovery. Besides solutions, Huawei also provides customers with professional consulting services covering business continuity planning and analysis, site selection for the disaster recovery center, equipment room design, infrastructure, disaster recovery design, deployment and implementation, and operation and maintenance of the disaster recovery center.

6.4 Simplified and Efficient Disaster Recovery with High Replication Ratio and Wide Compatibility

HyperReplication of the OceanStor enterprise storage system supports data replication from 32 storage devices to one storage device for central backup (a 32:1 replication ratio, which is four to eight times that of comparable software from another vendor). This allows disaster recovery resources to be shared and greatly reduces the cost of deploying disaster recovery devices.
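As a rough illustration of the resource sharing that a 32:1 fan-in allows, the following sketch computes the minimum number of central backup devices needed for a given number of source devices. The function name is hypothetical, and the sketch assumes that only the fan-in ratio limits a central device; capacity and bandwidth are ignored.

```python
import math

MAX_FAN_IN = 32  # 32:1 replication ratio stated above


def central_devices_needed(source_devices: int, fan_in: int = MAX_FAN_IN) -> int:
    """Minimum number of central backup devices, considering the fan-in ratio only."""
    if source_devices < 0:
        raise ValueError("source_devices must be non-negative")
    if fan_in < 1:
        raise ValueError("fan_in must be at least 1")
    return math.ceil(source_devices / fan_in)
```

Under these assumptions, 100 branch devices would need four central devices, where a 1:1 design would need 100.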

In addition, Huawei is the first vendor to implement data replication among high-end, mid-range, and low-end storage systems, and it offers multiple types of multi-hop 3DC disaster recovery modes. Huawei's simple, hierarchical, low-TCO disaster recovery solutions are therefore well suited to Chinese customers (for example, government departments with a vertical hierarchical structure).

6.5 Centralized Disaster Recovery Management Based On the All In One Platform

After a disaster recovery system is built, managing this complicated system is a big challenge. Traditional management methods cannot manage an entire disaster recovery system centrally. Based on a thorough understanding of customer requirements for managing a disaster recovery system, Huawei developed the All In One disaster recovery management platform. This platform enables customers to centrally monitor and manage networks, storage devices, and servers in both the production center and the disaster recovery center and to manage disaster recovery services visually, simplifying the management and maintenance of a disaster recovery system.

Disaster recovery solutions of the OceanStor enterprise storage system provide intuitive, application-level management of disaster recovery services in a standardized process. Users can create, deploy, manage, and maintain disaster recovery services on the standard disaster recovery management interface using wizards. To ensure the availability of disaster recovery data, disaster recovery drills are mandatory. Huawei provides one-click drills that automatically improve and mount disaster recovery data and verify data availability by integrating the management functions of application systems, achieving true application-level disaster recovery management.

7 Disaster Recovery Drill and Switchover Process

7.1 Local (Same-City) Disaster Recovery Drill Process

Local disaster recovery drills are performed for the following purposes:

1) To test availability of backup data at the disaster recovery center without affecting services in the production center.

2) To test whether services at the production center can be switched to the disaster recovery center (testing the feasibility of a switchover process as well as availability of data in the disaster recovery center) by simulating a production center fault.

The specific drill processes are as follows:

7.1.1 Drill Process for Testing Data Availability at the Local Disaster Recovery Site

With the snapshot technology, the OceanStor enterprise storage system can generate a readable and writable snapshot for the secondary LUN of a remote replication pair. Host access to the snapshot does not affect the snapshot source LUN (the secondary LUN). In this way, the drill does not affect services at the production center or the disaster recovery system.

The drill process is as follows:

1) Prepare for the drill.

• Make sure that the remote replication pair is in a normal state.

• Make sure that the network between the backup hosts and storage systems at the local disaster recovery center is normal.

2) Perform the drill.

• Create a snapshot for the secondary LUN of the synchronous remote replication pair (batch creation is supported if there are multiple secondary LUNs).

• Map the snapshot to the backup host at the local disaster recovery center.

• Run services on the backup host at the local disaster recovery center.

• Test the availability and consistency of data at the local disaster recovery center.

3) Restore the environment after the drill.

• Stop host test services at the local disaster recovery center.

• Delete the mapping from the snapshot to the backup host in the local disaster recovery center.

• Delete the snapshot.
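The drill above follows a create, use, and clean-up pattern, which can be sketched as follows. `FakeArray` and its methods are hypothetical in-memory stand-ins written for this example and do not represent the OceanStor management interface. The point of the sketch is that the snapshot mapping and the snapshot itself are removed even if the data test fails.

```python
from contextlib import contextmanager


class FakeArray:
    """Hypothetical in-memory stand-in for a storage array (not a real API)."""

    def __init__(self):
        self.snapshots = set()
        self.mappings = set()  # (snapshot, host) pairs

    def create_snapshot(self, lun):
        name = f"snap_{lun}"
        self.snapshots.add(name)
        return name

    def map_to_host(self, snapshot, host):
        self.mappings.add((snapshot, host))

    def unmap_from_host(self, snapshot, host):
        self.mappings.discard((snapshot, host))

    def delete_snapshot(self, snapshot):
        self.snapshots.discard(snapshot)


@contextmanager
def drill_snapshot(array, secondary_lun, backup_host):
    """Create and map a snapshot, then always unmap and delete it."""
    snap = array.create_snapshot(secondary_lun)
    try:
        array.map_to_host(snap, backup_host)
        yield snap
    finally:
        array.unmap_from_host(snap, backup_host)
        array.delete_snapshot(snap)


def run_drill(array, secondary_lun, backup_host, verify):
    """Run verify(snapshot) against the mapped snapshot and return its result."""
    with drill_snapshot(array, secondary_lun, backup_host) as snap:
        return verify(snap)
```

Here `verify` stands for whatever availability and consistency checks the backup host runs against the snapshot.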

7.1.2 Process for Switching Services to the Local Disaster Recovery Center

To test the switchover process, services in the production center are actually switched to the local disaster recovery center. This drill interrupts services at the production center.

The detailed process is as follows:

1) Prepare for the drill.

• Make sure that the remote replication pair is in a normal state.

• Make sure that the upper-layer application environment at the local disaster recovery center is ready.

• Make sure that the network between the backup host and storage system at the local disaster recovery center is normal.

2) Perform the drill.

• Stop host services at the production center.

• Delete the mapping from the primary LUN of the synchronous remote replication pair to the host at the production center.

• Perform a primary/secondary switchover for the synchronous remote replication pair from the production center to the local disaster recovery center.

• Map the secondary LUN to the backup host at the local disaster recovery center.

• Run services on the backup host at the local disaster recovery center.

• Test the availability and consistency of data at the local disaster recovery center.

3) Restore the environment (for details, see the switchback process).
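The ordering constraints in the switchover steps above can be modeled with a small state object. This is a minimal sketch under stated assumptions: `ReplicationPair`, `swap_roles`, and `switch_over` are hypothetical names invented for illustration, not part of any Huawei or VMware interface. The model refuses a primary/secondary switchover unless the pair is in a normal state and the primary LUN has been unmapped from the production host.

```python
from dataclasses import dataclass, field


@dataclass
class ReplicationPair:
    """Hypothetical model of a remote replication pair (not a product API)."""
    primary_site: str
    secondary_site: str
    state: str = "normal"
    # Hosts mapped to the LUN that currently serves host I/O.
    mapped_hosts: set = field(default_factory=set)

    def swap_roles(self):
        if self.state != "normal":
            raise RuntimeError("pair must be in a normal state")
        if self.mapped_hosts:
            raise RuntimeError("unmap the primary LUN from all hosts first")
        self.primary_site, self.secondary_site = self.secondary_site, self.primary_site


def switch_over(pair, production_host, backup_host):
    """Apply the switchover steps in the documented order and return a log."""
    log = [f"stop services on {production_host}"]
    pair.mapped_hosts.discard(production_host)
    log.append(f"unmap primary LUN from {production_host}")
    pair.swap_roles()
    log.append(f"primary is now {pair.primary_site}")
    pair.mapped_hosts.add(backup_host)
    log.append(f"map LUN to {backup_host} and start services")
    return log
```

Calling `swap_roles()` while the production host is still mapped raises an error, which mirrors the requirement to delete the mapping before the switchover.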

7.2 Local Switchback Drill Process

7.2.1 Switchback Process for Testing Data Availability at the Local Disaster Recovery Site

No switchback is needed. During the test, the application systems at both the production and disaster recovery centers are running normally.

7.2.2 Process for Switching Services Back

HyperReplication/S supports reverse incremental synchronization. During a disaster recovery drill, HyperReplication/S records the addresses of new host data written to the LUNs at the production center and the local disaster recovery center. As a result, only incremental data is copied during the service switchback from the local disaster recovery center to the production center, which shortens the switchback duration.

The detailed process is as follows:

1) Prepare for the switchback.

• Make sure that the remote replication pair is in a normal state.

• Make sure that the upper-layer application environment at the production center is ready.

• Make sure that the network between the host and storage system at the production center is normal.

2) Perform the switchback.

• Stop services on the backup host at the local disaster recovery center.

• Delete the mapping between the backup host and storage system at the local disaster recovery center.

• Start synchronization for the synchronous remote replication pair from the local disaster recovery center to the production center to achieve data consistency between the primary and secondary LUNs.

• Perform a primary/secondary switchover for the synchronous remote replication pair from the local disaster recovery center to the production center.

• Map the LUN to the production host at the production center.

• Run services on the production host.

3) Perform the post-switchback check.

• Check that services of the production system are normal.

• Check that the disaster recovery system is normal by viewing the remote replication pair status.
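The reverse incremental synchronization used during switchback can be illustrated with a toy changed-block tracker. This is only a conceptual sketch: `TrackedLun` and `reverse_incremental_sync` are hypothetical names, and the actual on-array mechanism of HyperReplication/S is not described in this document. The idea shown is that only the block addresses recorded as written on either side since the switchover are copied back, rather than the whole LUN.

```python
class TrackedLun:
    """Hypothetical block device that records which block addresses were written."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.changed = set()

    def write(self, address, data):
        self.blocks[address] = data
        self.changed.add(address)


def reverse_incremental_sync(source, target):
    """Copy only blocks written on either side since tracking began.

    Returns the number of blocks copied from source to target.
    """
    addresses = source.changed | target.changed
    for address in sorted(addresses):
        target.blocks[address] = source.blocks[address]
    source.changed.clear()
    target.changed.clear()
    return len(addresses)
```

In this model, if services at the disaster recovery center changed two blocks of a four-block LUN, the switchback copies two blocks instead of four.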

7.3 Remote Disaster Recovery Drill Process

Remote disaster recovery drills are performed for the following purposes:

1) To test the backup data availability without affecting services in the production center.

2) To test whether services at the production center can be switched to the remote disaster recovery center (testing the feasibility of a switchover process and availability of data in the disaster recovery center) by simulating a situation where both the production center and local disaster recovery center fail.

The specific drill processes are as follows:

7.3.1 Drill Process for Testing Data Availability at the Remote Disaster Recovery Site

With the snapshot technology, the OceanStor enterprise storage system can generate a readable and writable snapshot for the secondary LUN of the remote replication pair. Host access to the snapshot does not affect the snapshot source LUN (the secondary LUN). In this way, the drill does not affect services at the production center or the disaster recovery system.

The drill process is as follows:

1) Prepare for the drill.

• Make sure that the remote replication pair is in a normal state.

• Make sure that the network between the backup host and storage system at the remote disaster recovery center is normal.

2) Perform the drill.

• Create a snapshot for the secondary LUN of the asynchronous remote replication pair at the remote disaster recovery center (batch creation is supported if there are multiple secondary LUNs).

• Map the snapshot to the backup host at the remote disaster recovery center.

• Run services on the backup host at the remote disaster recovery center.

• Test the availability and consistency of data at the remote disaster recovery center.

3) Restore the environment after the drill.

• Stop host test services at the remote disaster recovery center.

• Delete the mapping from the snapshot to the backup host in the remote disaster recovery center.

• Delete the snapshot.

7.3.2 Process for Switching Services to the Remote Disaster Recovery Center

To test the switchover process, services in the production center are actually switched to the remote disaster recovery center. This drill interrupts services at the production center.

The drill process is as follows:

1) Prepare for the drill.

• Make sure that the remote replication pair is in a normal state.

• Make sure that the upper-layer application environment at the remote disaster recovery center is ready.

• Make sure that the network between the backup host and storage system at the remote disaster recovery center is normal.

2) Perform the drill.

• Stop host services at the production center.

• Delete the mapping from the primary LUN of the remote replication pair to the host at the production center.

• Perform a primary/secondary switchover for the synchronous remote replication pair from the production center to the local disaster recovery center.

• Perform a primary/secondary switchover for the asynchronous remote replication pair from the local disaster recovery center to the remote disaster recovery center.

• Map the secondary LUN to the backup host at the remote disaster recovery center.

• Run services on the backup host at the remote disaster recovery center.

• Test the availability and consistency of data at the remote disaster recovery center.

3) Restore the environment (for details, see the switchback process).

7.4 Remote Switchback Drill Process

7.4.1 Switchback Process for Testing Data Availability at the Remote Disaster Recovery Site

No switchback is needed. During the test, the application systems at both the production and disaster recovery centers are running normally.

7.4.2 Process for Switching Services Back

HyperReplication/A supports reverse incremental synchronization. During a disaster recovery drill, HyperReplication/A records the addresses of new host data written to the LUNs at the disaster recovery centers. As a result, only incremental data is copied during the service switchback from the remote disaster recovery center to the local disaster recovery center and the production center, which shortens the switchback duration.

The detailed process is as follows:

1) Prepare for the switchback.

• Make sure that the remote replication pair is in a normal state.

• Make sure that the upper-layer application environment at the production center is ready.

• Make sure that the network between the host and storage system at the production center is normal.

2) Perform the switchback.

• Stop services on the backup host at the remote disaster recovery center.

• Delete the mapping between the backup host and storage system at the remote disaster recovery center.

• Start synchronization for the asynchronous remote replication pair from the remote disaster recovery center to the local disaster recovery center to achieve data consistency between the primary and secondary LUNs.

• When the asynchronous remote synchronization is complete, start synchronization for the synchronous remote replication pair from the local disaster recovery center to the production center.

• After the synchronization is complete, perform a primary/secondary switchover for the asynchronous remote replication pair from the remote disaster recovery center to the local disaster recovery center.

• Perform a primary/secondary switchover for the synchronous remote replication pair from the local disaster recovery center to the production center.

• Map the primary LUN to the production host at the production center.

• Run services on the production host.

3) Perform the post-switchback check.

• Check that services of the production system are normal.

• Check that the disaster recovery system is normal by viewing the remote replication pair status.
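In the cascaded topology, the switchback order matters: data is synchronized hop by hop from the remote site inward before any roles are swapped. The following sketch derives that order for a replication chain. The function name and the textual step format are hypothetical and serve only to make the sequencing explicit.

```python
def cascaded_switchback_plan(sites):
    """Return ordered steps to move services from the last site back to the first.

    `sites` lists the replication chain from production outward,
    for example ["production", "local DR", "remote DR"].
    """
    if len(sites) < 2:
        raise ValueError("need at least two sites")
    steps = []
    # Synchronize from the outermost site inward, one hop at a time.
    for i in range(len(sites) - 1, 0, -1):
        steps.append(f"sync {sites[i]} -> {sites[i - 1]}")
    # Then swap primary/secondary roles hop by hop in the same order.
    for i in range(len(sites) - 1, 0, -1):
        steps.append(f"switchover {sites[i]} -> {sites[i - 1]}")
    return steps
```

For the chain production, local DR, remote DR, the plan is two synchronizations (remote to local, then local to production) followed by two switchovers in the same order, matching the procedure above.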

8 Technology Requirements for Disaster Recovery Solution Implementation

For HyperReplication/S, a write success response is returned only after the data in each write request has been written to both the primary site and the secondary site. If the primary site is far from the secondary site, the write latency of foreground applications becomes long enough to affect foreground services. Therefore, HyperReplication/S is usually implemented where the primary site is close to the secondary site, for example, in same-city disaster recovery. HyperReplication/S poses the following requirements:

• Distance between the primary and secondary sites < 200 km

• Minimum link bandwidth ≥ 64 Mbit/s

• Unidirectional transmission latency < 1 ms

• Actual network bandwidth > peak write I/O bandwidth
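The HyperReplication/S requirements above can be expressed as a simple pre-deployment check. This is a minimal sketch: the `Link` type and the function name are hypothetical, and for simplicity a single bandwidth figure is used for both the minimum link bandwidth and the actual network bandwidth.

```python
from dataclasses import dataclass


@dataclass
class Link:
    distance_km: float
    bandwidth_mbit_s: float
    one_way_latency_ms: float
    peak_write_mbit_s: float


def sync_replication_issues(link: Link) -> list:
    """Return the HyperReplication/S requirements listed above that the link fails."""
    issues = []
    if not link.distance_km < 200:
        issues.append("distance must be < 200 km")
    if not link.bandwidth_mbit_s >= 64:
        issues.append("link bandwidth must be >= 64 Mbit/s")
    if not link.one_way_latency_ms < 1:
        issues.append("one-way latency must be < 1 ms")
    if not link.bandwidth_mbit_s > link.peak_write_mbit_s:
        issues.append("bandwidth must exceed peak write I/O bandwidth")
    return issues
```

An empty list means the link meets all four stated requirements. A link that fails any of them may be a better fit for HyperReplication/A.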

For HyperReplication/A, the write latency of foreground applications is independent of the distance between the primary and secondary sites. Therefore, HyperReplication/A applies to disaster recovery scenarios where the primary and secondary sites are far away from each other, or the network bandwidth is limited. The following lists the requirements posed by HyperReplication/A:

• No explicit limit on the WAN distance between the primary and secondary sites;

In document HUAWEI OceanStor Series Enterprise Storage System Disaster Recovery White Paper HUAWEI TECHNOLOGIES CO., LTD. Issue 01. (Page 30-42)