
EMC Backup and Recovery

for Microsoft Applications

Deduplication Enabled by

EMC CLARiiON and Data Domain

A Detailed Review

EMC Information Infrastructure Solutions

Abstract

EMC® Data Domain® deduplication storage systems dramatically reduce the amount of disk storage needed to retain and protect Microsoft Exchange and SharePoint data. This white paper provides best practices and demonstrates how the EMC Data Domain deduplication storage solution integrates with EMC backup and recovery products to


Copyright © 2010 EMC Corporation. All rights reserved.

EMC believes the information in this publication is accurate as of its publication date. The information is subject to change without notice.

THE INFORMATION IN THIS PUBLICATION IS PROVIDED “AS IS.” EMC CORPORATION MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO THE INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Use, copying, and distribution of any EMC software described in this publication requires an applicable software license.

For the most up-to-date listing of EMC product names, see EMC Corporation Trademarks on EMC.com. All other trademarks used herein are the property of their respective owners.


Table of Contents

Executive summary
Business case
Product solution
Key results
Introduction
Overview
Purpose
Scope
Audience
Key components
EMC CLARiiON CX4-480
EMC Replication Manager
EMC NetWorker
EMC Data Domain DD690
EMC SnapView
Microsoft Office SharePoint Server
Microsoft Exchange 2010
Exchange 2010 DAG
VMware ESX Server
Kroll Ontrack
Environment profile
Physical environment
Hardware resources
Software resources
Microsoft Exchange design
Exchange 2010 design in a virtualized environment
Introduction
Exchange user profiles
Storage design for Exchange database and log LUNs
Building Block
Step 1: Identify requirements
Step 2: Calculate storage requirements
Step 3: Identify Exchange server Mailbox design
Step 4: Finalize Exchange server Mailbox storage configuration
Exchange DAG configuration
Storage design and consideration for Replication Manager
Automation scripts during backup
NetWorker save set configuration for Exchange 2010
Microsoft SharePoint design
SharePoint 2007 design in a virtualized environment
Introduction
SharePoint content database consideration
SharePoint farm search component consideration
SharePoint virtualization resource allocation
SharePoint storage design
Backup and recovery design for SharePoint 2007
Introduction
Full disaster backup and recovery design
VSS providers overview
VSS Writer overview
LAN-free backup design for SharePoint full farm
Clone group design
Snapshot policy consideration
Full farm conventional recovery design
Granular backup and recovery design for SharePoint
Introduction
Granular LAN-based backup and recovery design
Save set configuration for SharePoint
Data Domain design and configuration
Data Domain system overview
Data Domain sizing considerations
Data Domain deduplication ratio considerations
Data Domain space management considerations
Data Domain VTL with NetWorker
Data Domain configuration
Testing and validation
Introduction
Exchange backup scenarios
Introduction
Scenario 1: Initial Exchange 2010 full backup
Scenario 2: Daily full backup
Exchange 2010 recovery scenarios
Introduction
Scenario 1: Single database recovery
SharePoint backup using VSS framework
Introduction
Workload simulation
Scenario 1: Initial SharePoint farm full backup
Scenario 2: Daily full farm backup
Scenario 3: Database-level full backup
Granular backup for SharePoint scenarios
Introduction
Scenario 1: Granular backup
SharePoint recovery using VSS
Introduction
Scenario 1: Full farm recovery using NMM 2.2 SP1
Scenario 2: Database-level conventional recovery
Granular recovery for SharePoint
Introduction
Scenario 1: Item-level granular recovery using NMM 2.2 SP1
Scenario 2: Item-level recovery using Kroll Ontrack 6.0
Scenario 3: Site-level recovery using Kroll Ontrack 6.0
Combined backup scenarios
Introduction
Scenario 1: Combined initial full backup to Data Domain
Scenario 2: Combined daily full backup to Data Domain after new data generation
Combined recovery scenario
Introduction
Conclusion
Summary
Key findings
Next steps
References
White papers
Product documentation
Other documentation
Additional information
Introduction
Automation scripts during backup


Executive summary

Business case

Data within enterprise environments is growing fast, according to a Digital Universe study by analyst firm IDC. IDC estimates that the worldwide volume of digital data grew by 62 percent between 2008 and 2009 to nearly 800,000 petabytes (PB), and projects that this Digital Universe will grow to 1.2 million PB, or 1.2 zettabytes (ZB), in 2010 and reach 35 ZB by 2020.

Microsoft Exchange and SharePoint data stores are also experiencing massive data growth because of the increasing capacity of Exchange mailboxes, rising use of multimedia files, and increasing desire to share and collaborate on these documents in an enterprise environment.

As these environments continue to scale and expand, their criticality increases while protecting them becomes more and more difficult.

EMC, with its rich portfolio of hardware, software, and partner offerings, is well positioned to offer a solution that combines recommendations and best practices for creating a robust backup solution for both Exchange 2010 and SharePoint.

Product solution

The solution illustrates how the EMC® Data Domain® deduplication storage solution integrates with EMC NetWorker®, Replication Manager software, and Kroll Ontrack PowerControls to provide a full or granular-level backup of Exchange Server 2010 and Enterprise SharePoint Server 2007.


Key results

This white paper demonstrates the following benefits:

• Reduced backup timeframes: Backing up 9.2 TB of data takes 4 days with a traditional tape-based backup solution, whereas this solution completes a daily full backup in less than 10 hours. Configured as a virtual tape library (VTL), Data Domain can use Fibre Channel (FC) to improve throughput.

• Reduced backup storage requirements: Compared with the traditional 100 percent backup storage capacity requirement on tape, only a relatively small amount of additional capacity is required on Data Domain for daily full backups with a data deduplication ratio of 25:1.

Note: The deduplication ratio you achieve depends on how much duplicate data exists in your specific environment.

• No impact to production: Because the passive copy of the Exchange DAG is backed up with space-efficient LAN-free snapshots (through CLARiiON® SnapView and the NetWorker proxy client), backup operations do not affect the production environment.

• Reduced physical infrastructure footprint: By consolidating the Active Directory, domain controllers, and Exchange/SharePoint application servers onto the VMware virtualization platform, the number of physical servers needed in this solution is significantly reduced.

• Simple and efficient item-level recovery: EMC selects the Kroll Ontrack tool for both mailbox-level and item-level granular recovery. It also saves


Introduction

Overview

This white paper describes the benefits of integrating Data Domain deduplication with the NetWorker family for Exchange and SharePoint backup and restore. It also covers some of the features of the NetWorker Module for Microsoft Applications (NMM) and offers some insight into how customers can back up SharePoint and stream all backed-up data to highly efficient Data Domain backup storage through the NetWorker family.

Purpose

The purpose of this white paper is to:

• Demonstrate a rapid and efficient backup and restore of a multi-terabyte Microsoft environment using EMC snapshot replication, backup, and backup deduplication technologies.

• Validate the backup and restore performance for SharePoint 2007 and Exchange 2010 by integrating NetWorker with Replication Manager and Data Domain deduplication storage.

• Validate the deduplication function and document the deduplication ratio of Data Domain and NetWorker, which makes an ideal long-term backup solution for Exchange and SharePoint.

• Provide the design and architecture on VMware virtualization deployment to reduce physical footprint.

• Summarize best practices, including a Data Domain and NMM design overview and considerations.

Scope

The scope of this white paper is to:

• Present an overview of the concepts and technologies in the solution

• Document the backup and restore performance and deduplication ratios of Data Domain for both Exchange Server 2010 and Enterprise SharePoint Server 2007, across scenarios that include granular restore

• Present realistic capabilities and the deduplication ratio of the Data Domain product

This white paper does not provide detailed installation instructions. Actual implementations can vary from the parameter testing results shown, due to customer-specific environmental factors.

Audience

This white paper is intended for corporate management and business decision-makers, including storage, server, and IT managers, and application engineers, as well as storage integrators, consultants, and distributors. Database administrators who wish to restore Exchange mail, SharePoint documents, and file system data will also find this paper helpful.

Key components

This section briefly describes the key solution components. For details on all of the components that make up the architecture, see the “Environment profile” section in this paper.

EMC CLARiiON CX4-480

The EMC CLARiiON CX4-480 is a versatile and cost-effective solution for organizations seeking an alternative to server-based storage. The EMC CLARiiON CX4-480 delivers performance, scalability, and advanced data management features in one easy-to-use storage solution.

EMC Replication Manager

Replication Manager automates and simplifies the management of replicas. It orchestrates critical business applications, middleware, and underlying EMC replication technologies to create and manage replicas at the application level for a variety of purposes, including operational recovery, backup, restore, development, and simulation. Customers interested in reducing manual scripting efforts, improving recovery, and creating parallel access to information can implement Replication Manager to put the right data in the right place at the right time.

EMC NetWorker

NetWorker helps organizations to control costs by bringing management and control of the entire information environment into one central offering. NetWorker uses this centralized, broad protection to bridge the gap between traditional backup and deduplication backup and allows new backup technologies to be introduced nondisruptively into complex IT operations by providing a common platform for both.

EMC Data Domain DD690

Data Domain solutions can perform data deduplication while maintaining high levels of performance and reliability. Data deduplication enables organizations to reduce back-end capacity requirements by minimizing the amount of redundant data that is ultimately written to disk backup targets. The actual data reduction can vary significantly from organization to organization or from application to application, depending on a number of factors, the most important being the rate at which data is changing, the frequency of backup and archive events, and how long that data is retained online.

Data Domain integrates easily into existing data centers and can be configured with FC connections to the SAN.

EMC SnapView

EMC SnapView lets you create local point-in-time snapshots and complete data clones for testing, backup, and recovery operations. With SnapView, you can create multiple copies of production data on your EMC CLARiiON networked storage system quickly and easily.


Microsoft Office SharePoint Server

Microsoft Office SharePoint Server (MOSS) is an integrated suite of server capabilities that can help to improve organizational effectiveness by providing comprehensive content management and enterprise search, accelerating shared business processes, and facilitating information sharing across boundaries for better business insight. Additionally, this collaboration and content management server provides IT professionals and developers with the platform and tools they need for server administration, application extensibility, and interoperability.

Microsoft Exchange 2010

Microsoft Exchange Server 2010 is designed to meet today’s communication and collaboration challenges. It provides advanced e-mail and scheduling while delivering new methods of access for employees, greater productivity for IT administrators, and increased security and compliance capabilities for organizations. Exchange Server 2010 introduces significant improvements in its database. Mailbox servers can now be defined as part of a Database Availability Group (DAG) to provide automatic recovery at the individual mailbox database level instead of at the server level. Furthermore, the transactional input/output (I/O) requirements for Exchange 2010 have been reduced from those in Exchange Server 2007. With these new features, customers can now deploy much larger mailboxes than with previous versions of Exchange Server, and can use less expensive drive types such as Serial Attached SCSI (SAS) and Serial Advanced Technology Attachment (SATA) for Exchange Server 2010 mailbox storage.

Exchange 2010 DAG

A Database Availability Group (DAG) is a set of up to 16 Microsoft Exchange Server 2010 Mailbox servers that provide automatic database-level recovery from a database, server, or network failure. Mailbox servers in a DAG monitor each other for failures. When a Mailbox server is added to a DAG, it works with the other servers in the DAG to provide automatic, database-level recovery from database, server, and network failures.

VMware ESX Server

VMware ESX Server is software for partitioning, consolidating, and managing servers in mission-critical environments. Ideally suited for enterprise data centers, ESX Server minimizes the total cost of ownership of computing infrastructure by increasing resource utilization and maximizing administration flexibility.

Kroll Ontrack

Kroll Ontrack provides technology-driven services and software to help corporate, legal, and government entities and consumers recover, search, analyze, produce, and present data efficiently and cost-effectively. In addition to its award-winning suite of software, it also provides data recovery, advanced search, paper and electronic discovery, computer forensics, ESI and trial consulting, and presentation services.


Environment profile

This section identifies and briefly describes the technology and components used in the environment.

Physical environment


Hardware resources

The hardware used to validate the solution is listed in the following table.

Equipment                 Quantity  Configuration

CX4-480                   1         45 x 300 GB 15k FC disks
                                    105 x 1 TB 7.2k SATA II disks

SAN Switch                1         Cisco MDS 9509

IP Switch                 1         Cisco Catalyst 3560E

Data Domain DD690         1         16 TB raw devices

Dell PowerEdge R900       1         Virtual Machines (Cluster):
24 core, 128 GB RAM                 2 x Exchange MBX Server (2 x 4 vCPUs/24 GB)
2 D-P, 4 G Emulex HBAs              1 x Exchange HUB/CAS Server (4 vCPUs/8 GB)
2 x Quad NICs                       1 x Domain Controller (2 x 4 vCPUs/4 GB)
                                    1 x Replication Manager Server (2 vCPUs/4 GB)

Dell PowerEdge R900       1         Virtual Machines (Cluster):
24 core, 128 GB RAM                 1 x MOSS Index Server (8 vCPUs/6 GB)
2 D-P, 4 G Emulex HBAs              2 x MOSS web front ends (WFEs) (2 x 4 vCPUs/4 GB)
2 x Quad NICs                       1 x MOSS App (Central Admin/Excel/Docu) (2 x vCPUs/4 GB)
                                    1 x NetWorker Server (4 vCPUs/2 GB)

Dell PowerEdge R900       1         Virtual Machines (Cluster):
24 core, 128 GB RAM                 2 x MOSS WFEs (2 x 4 vCPUs/4 GB)
2 D-P, 4 G Emulex HBAs              1 x MOSS SQL Server (8 vCPUs/16 GB)
2 x Quad NICs                       1 x Exchange MBX Server (4 vCPUs/24 GB)
                                    1 x Exchange HUB/CAS Server (4 vCPUs/8 GB)

DC Server                 1         Dell PowerEdge R710
                                    8 core, 32 GB RAM, 3 x 4-port NICs

NetWorker Proxy Server    1         Dell PowerEdge 6850/R710
                                    8 core, 32 GB RAM, 3 x 4-port NICs
                                    2 dual-port, 4 Gb/s Emulex HBA


Software resources

The software used to validate the solution is listed in the following table.

Software                                      Version

Windows Server 2008 R2 Enterprise edition     RTM
Windows Server 2008 Enterprise edition        SP2
Microsoft SQL Server 2008                     SP1
Microsoft Office SharePoint Server 2007 SP2   SP2 with July Cumulative Update 12.0.6510
Microsoft Exchange 2010 Enterprise Edition    RTM (14.0.639.21)
Exchange MAPI and CDO 1.2.1                   Latest
EMC PowerPath®                                5.3 SP1
EMC PowerPath/VE                              5.4 SP1
EMC Navisphere® Agent/CLI                     6.29.5.0.66
Visual Studio Test Suite                      2008
KnowledgeLake Document Loader                 Latest
NetWorker Module for Microsoft Applications   2.2 SP1
NetWorker                                     7.6
Replication Manager                           5.2.3
Kroll Ontrack PowerControls                   6.0
Data Domain Enterprise Manager                4.8.0.3 (beta code)
Driver for Emulex                             7.2.20.6
VMware vSphere                                4.0
Microsoft LoadGen 2010                        Beta
Solutions Enabler                             7.1.1


Microsoft Exchange design

Exchange 2010 design in a virtualized environment

Introduction

The design and testing principles applied to this environment demonstrate how Exchange users with large mailboxes can achieve a high level of backup and recovery performance while using minimal resources. Testing was based on virtualized Exchange 2010 servers, with a DAG implemented to provide mailbox database high availability. Snapshots taken from passive DAG copies are used for both backup and recovery, which minimizes the impact on active mailbox databases during backup and speeds up recovery. This solution is intended not only to meet the basic functionality requirements when deploying an efficient, repeatable backup and recovery design on a large-scale virtualized Microsoft Exchange Server 2010 platform, but also to provide a solid foundation for future growth and development of the environment.

Exchange user profiles

The following table summarizes the Exchange environment profile in this solution.

Profile characteristic           Value

Number of users                  8,000
Exchange 2010 IOPS               0.15 (Very Heavy)
Read/Write Ratio                 3:2
Mailbox server                   3 (Virtual Machines)
Number of DAG copies             2
User count per server            4,000
Mailbox size                     1 GB
Number of databases per server   10
User count per database          400
RAID type                        RAID 10, 1 TB 7.2k SATA

Storage design for Exchange database and log LUNs

Sizing and configuring storage for Microsoft Exchange Server 2010 can be a complicated process because many variables and factors differ from organization to organization. One way to simplify sizing and configuration is to define a unit of measure: a building-block.

Building Block

A building-block represents the amount of disk and server resources required to support a specific number of Exchange 2010 users. The amount of required resources is derived from a specific user profile type, mailbox size, and disk requirements.

Using the building-block approach takes out the guesswork and simplifies the implementation of the Exchange 2010 Mailbox server. Once the initial building-block is designed, an organization can multiply it until the desired number of Microsoft Exchange users (that is, Messaging Application Programming Interface (MAPI) Outlook users) is supported while satisfying the performance metrics that Microsoft recommends for Exchange Server. EMC’s best practices involving the building-block approach for Exchange Server design have proved very successful across many customer implementations. The process of creating a building-block involves four steps:

1. Identify user requirements.

2. Identify and calculate storage requirements (based on both IOPS and capacity).

3. Identify your Exchange Mailbox server database design.

4. Finalize the Exchange Mailbox server storage configuration.

Step 1: Identify requirements

As the Exchange user profiles outlined above show, the test environment needs to support 8,000 users (with 4,000 users per server) at 0.15 IOPS per user and a 1 GB mailbox quota.

Step 2: Calculate storage requirements

Use this formula to calculate storage for the Exchange 2010 Mailbox server role:

((IOPS * %R) + WP * (IOPS * %W)) / Physical Disk Speed = Required Physical Disks

Where                 Is

IOPS                  the number of input/output operations per second
%R                    the percentage of I/Os that are reads
%W                    the percentage of I/Os that are writes
WP                    the RAID write penalty multiplier (RAID 1=2, RAID 5=4)
Physical Disk Speed   for the CLARiiON CX4™ series, 155 for 15k rpm FC drives, 130 for 10k rpm FC drives, and 55 for 7.2k rpm SATA drives


Note:

Microsoft also provides an Exchange 2010 Mailbox server role requirements calculator with some additional variables:

IOPS calculation:

Calculations are based on the targeted user profile as listed above and availability of 1 TB 7.2k rpm drives on the CLARiiON CX4-480. It is essential to calculate IOPS first, and then capacity.

On each Mailbox server, 4,000 users generate 720 IOPS (that is, 4,000 * 0.15 IOPS + 20% headroom). With RAID 10 (write penalty of 2) and 55 IOPS per 7.2k rpm SATA drive, at least 19 spindles are required (18.3 rounded up to 19):

((720 * 0.6) + 2 * (720 * 0.4)) / 55 = 18.3
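The IOPS calculation above can be reproduced with a few lines of Python. This is only a sketch of the formula in this section: the function and variable names are illustrative, and the 60/40 read/write split is taken from the 3:2 read/write ratio in the user profile table.

```python
import math


def required_spindles(iops, read_fraction, write_penalty, disk_iops):
    """Apply ((IOPS * %R) + WP * (IOPS * %W)) / Physical Disk Speed."""
    write_fraction = 1.0 - read_fraction
    backend_iops = iops * read_fraction + write_penalty * iops * write_fraction
    return backend_iops / disk_iops


users_per_server = 4000
iops_per_user = 0.15
headroom = 0.20            # 20% headroom on top of the profile IOPS

# Host IOPS per Mailbox server, including headroom.
host_iops = users_per_server * iops_per_user * (1 + headroom)

# RAID 10 write penalty is 2; a 7.2k rpm SATA drive is rated at 55 IOPS.
raw = required_spindles(host_iops, read_fraction=0.6, write_penalty=2, disk_iops=55)
spindles = math.ceil(raw)

print(f"{host_iops:.0f} IOPS need {raw:.1f} spindles, rounded up to {spindles}")
```

Changing `write_penalty` to 4 models RAID 5 with the same inputs, which is a quick way to see why the write-heavy profile favors RAID 10 here.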

Capacity calculation

On each Mailbox server, at least 5,400 GB formatted capacity is required for 4,000 mailboxes (that is, 4,000 * 1 GB + 35% = 5,400 GB, where the 35 percent reservation is for deleted items retention).

Note: Because the LoadGen 2010 Beta version was used, an additional 50 percent of the mailbox data (2,000 GB) is reserved for indexing. As a result, at least 7,400 GB is required for each Mailbox server. This additional reservation is not necessary on a production Exchange server.

For log files, at least 800 GB formatted capacity is required (that is, 4,000 users * 29 logs per user per day * 1 MB per log * 7 days retention = 793 GB).

So the total space required based on capacity is 8,200 GB (7,400 GB + 800 GB). Four 1 TB Serial Advanced Technology Attachment (SATA) disks were grouped as one RAID 10 group (2+2), which provides 1.8 TB formatted capacity. Therefore, to provide total required capacity for 8,200 GB, it will require a total of 20 disks in five RAID 10 groups on the CLARiiON CX4-480 (8,200 / 1024 / 1.8 = 4.4 round up to 5).
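The capacity figures can be checked the same way. The sketch below is illustrative only: the constant names are invented for this example, the 1 MB log file size is an assumption (it is the standard Exchange 2010 transaction log size and is what makes the 793 GB figure work out), and the 50 percent indexing reserve is applied to the raw mailbox data, which is the reading that yields 7,400 GB.

```python
import math

MAILBOXES = 4000
MAILBOX_GB = 1
DELETED_ITEMS_RESERVE = 0.35   # deleted items retention
INDEX_RESERVE = 0.50           # LoadGen 2010 Beta indexing (test environment only)
LOGS_PER_USER_PER_DAY = 29
LOG_FILE_MB = 1                # assumption: 1 MB transaction log files
RETENTION_DAYS = 7
RAID_GROUP_TB = 1.8            # formatted capacity of one RAID 10 (2+2) group
DISKS_PER_GROUP = 4

mailbox_gb = MAILBOXES * MAILBOX_GB
database_gb = mailbox_gb * (1 + DELETED_ITEMS_RESERVE)
database_with_index_gb = database_gb + mailbox_gb * INDEX_RESERVE
log_gb = MAILBOXES * LOGS_PER_USER_PER_DAY * LOG_FILE_MB * RETENTION_DAYS / 1024

# The paper rounds the log requirement up to 800 GB before adding it in.
log_rounded_gb = math.ceil(log_gb / 100) * 100
total_gb = database_with_index_gb + log_rounded_gb

raid_groups = math.ceil(total_gb / 1024 / RAID_GROUP_TB)
disks = raid_groups * DISKS_PER_GROUP

print(f"database: {database_gb:.0f} GB, with index reserve: {database_with_index_gb:.0f} GB")
print(f"logs: {log_gb:.0f} GB, total: {total_gb:.0f} GB")
print(f"RAID 10 groups: {raid_groups}, disks: {disks}")
```

Setting `INDEX_RESERVE` to 0 gives the capacity a production server would need under the same profile.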

Number of disks required

Based on the calculations above, capacity requirements exceed IOPS requirements (20 disks versus 19). In total, twenty 1 TB SATA drives were grouped as five RAID 10 groups to fulfill both IOPS and capacity requirements.


Step 3: Identify Exchange server Mailbox design

The next step is to identify how many databases to configure per Exchange server, which involves determining how large the databases need to be. Based on the capacity of each RAID group in this solution, each database LUN and log LUN pair is configured to support 400 users, and each RAID group accommodates two database LUNs and two log LUNs. Each RAID group therefore hosts 800 users in total. In summary, a building-block is created that provides all the necessary requirements for performance, capacity, and data protection to support 800 users. The table below summarizes the final building-block created for this configuration.

Item                        Description

Number of users supported   800
User profile supported      0.15
Mailbox size                1 GB
Disk size and type          1 TB 7.2k rpm SATA drives
RAID type                   RAID 10
Database LUN size           780 GB
Log LUN size                120 GB
Total disks required        1 RAID 10 (2+2) group, 4 disks (per database copy)

Step 4: Finalize Exchange server Mailbox storage configuration

Scaling the configuration up to 4,000 users per the server requirement will require five of these building-blocks, with a total of 20 disks with five RAID 10 groups for each database copy.
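As a rough illustration of how the building-block scales, the following sketch multiplies the 800-user block up to a target user count. The function name and the dictionary keys are hypothetical; the block values come from the building-block table above.

```python
import math

BLOCK_USERS = 800          # two 400-user databases per RAID group
BLOCK_DISKS = 4            # one RAID 10 (2+2) group
DATABASES_PER_BLOCK = 2


def scale_building_blocks(users_per_server):
    """Return the resources needed per database copy for one Mailbox server."""
    blocks = math.ceil(users_per_server / BLOCK_USERS)
    return {
        "building_blocks": blocks,
        "raid_groups": blocks,
        "disks": blocks * BLOCK_DISKS,
        "databases": blocks * DATABASES_PER_BLOCK,
    }


print(scale_building_blocks(4000))
```

For 4,000 users this gives five building-blocks, 20 disks, and 10 databases, which matches the per-server figures in the user profile table.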

To improve performance, it is recommended that each Exchange database and its corresponding log LUNs be placed on separate RAID groups.


Exchange DAG configuration

High availability is provided for this solution with the use of Microsoft Database Availability Groups (DAGs). Within a DAG, a set of Mailbox servers uses continuous replication to provide automatic recovery in the event of failures.

In this environment, each database has two DAG copies, and three Exchange 2010 Mailbox servers are deployed. Exchange 2010 Mailbox servers 1 and 2 each host 10 active database copies. Exchange 2010 Mailbox server 3 hosts the 20 passive database copies.

In this way, the snapshots of passive DAG copies will be used for backup, thus eliminating the performance influence on the active DAG copies (for detailed information about Exchange server backup, please refer to the “Backup and recovery design for Exchange” section).
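The copy layout described above can be modeled as a small lookup table, which makes it easy to see which server the backup snapshots come from. This is a sketch only: the database names and the server name MBX03 are hypothetical labels chosen for illustration.

```python
from collections import Counter

# Hypothetical database names DB01..DB20.
DATABASES = [f"DB{n:02d}" for n in range(1, 21)]

layout = {}
for index, db in enumerate(DATABASES):
    # Mailbox servers 1 and 2 each hold 10 active copies;
    # Mailbox server 3 holds the passive copy of every database.
    active = "MBX01" if index < 10 else "MBX02"
    layout[db] = {"active": active, "passive": "MBX03"}

active_counts = Counter(copy["active"] for copy in layout.values())
passive_counts = Counter(copy["passive"] for copy in layout.values())

# Snapshots for backup are taken only from the passive copies.
backup_sources = sorted(db for db, copy in layout.items() if copy["passive"] == "MBX03")

print(dict(active_counts), dict(passive_counts), len(backup_sources))
```

Because every backup source sits on the passive-only server, snapshot and backup I/O never lands on the LUNs serving active users.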


Exchange virtualization resource allocation

The Exchange servers were deployed on virtualized machines. The virtualization allocation of this solution is detailed in the following table.

Server role                    vCPUs  Memory (GB)  Boot disk (GB)  Raw device mapping disk

DC Server x 1                  2      4            80              N/A
Exchange 2010 HUB/CAS Server   2      8            100             N/A
Exchange 2010 Mailbox Server   4      24           100             780 GB x 10 (Database LUNs)
                                                                   120 GB x 10 (Log LUNs)
Replication Manager Server     2      4            80              N/A

Note:

Microsoft provides detailed information on how to calculate memory and CPU requirements for Exchange Server 2010:


Backup and recovery design for Exchange

Overview

Replication Manager helps organizations to safeguard their business-critical Exchange 2010 data with point-in-time, disk-based replicas or continuous data protection sets that can be restored to any significant point within the protection window. With its awareness of the Exchange 2010 environment, Replication Manager wizards guide the process of linking Exchange intelligence with EMC replication software. Replication Manager supports Exchange 2010 in standalone or DAG environments.

Microsoft Volume Shadow Copy Service (VSS) coordinates with Exchange 2010, the replication software, and CLARiiON to enable application-aware data management. VSS enables Replication Manager to create application-aware replicas. During replication, Replication Manager coordinates with the storage and Exchange 2010 to create a snapshot or clone, which is a point-in-time copy of the volumes that contain the data, logs, and system files for Exchange 2010 databases. Replication Manager coordinates with VSS and Exchange 2010 to freeze the databases while the snapshot is created and then thaw them, resuming the flow of data after the replication is complete.

The EMC NetWorker client/server environment enables organizations to protect their enterprise from the loss of valuable data. In a network environment, where the amount of data grows rapidly when servers are added to the network, the need to protect data becomes crucial. EMC NetWorker products give organizations the power and flexibility to meet such a challenge.

LAN-free backup design for Exchange 2010

The following image illustrates the LAN-free configuration of the Exchange 2010 backup used in this solution. The snapshots of the Exchange 2010 DAG passive copies are mounted on the Replication Manager mount host, which in this solution is also the NetWorker Storage Node and is connected to Data Domain through FC, so backup data flows through the SAN.

This design avoids the network traffic while rolling the snapshot data to Data Domain. It also minimizes the impact on the production environment during the backup.


The data flow of the LAN-free topology for Exchange 2010 backup is listed in the following table:

Stage  Description

1      NetWorker Server initializes the backup request to the Exchange 2010 DAG passive copies.

2      Before the backup, the NetWorker pre-script function calls the Replication Manager CLI to create snapshots of the Exchange 2010 database storage volumes automatically.

3      After a quick snapshot, the replicas are mounted and made visible on the Replication Manager mount host.

4      The Replication Manager mount host, in this case the NetWorker Storage Node, uses the snapshot in primary storage to transfer the data into Data Domain devices through FC.

NetWorker Server, the Replication Manager server, and the Exchange 2010 DAG passive node server communicate over the LAN. However, the data itself is not transferred across the LAN because the backup client (the Replication Manager mount host) is also the NetWorker Storage Node. Data Domain is attached directly to the NetWorker Storage Node and configured as the VTL.

Storage design and consideration for Replication Manager

For Replication Manager to create a replica on CLARiiON, either CLARiiON SnapView™ clone technology or CLARiiON SnapView snapshot technology can be used. A CLARiiON SnapView clone is a full mirror that is the same size as the source volume. CLARiiON SnapView snapshot technology uses copy-on-first-write to create point-in-time snapshots.

The Exchange 2010 DAG feature makes it possible to use the CLARiiON SnapView snapshot feature in this solution: because the snapshots are taken of Exchange 2010 DAG passive copies, backing them up has no impact on the production environment. CLARiiON SnapView can create or destroy a snapshot in seconds, regardless of the LUN size, because it does not actually copy data. Compared with a CLARiiON SnapView clone, this significantly reduces replication time and space requirements.

To configure a CLARiiON SnapView snapshot, a reserved LUN pool with the proper number and size of LUNs (also known as snapshot cache) should be allocated for the snapshot function. In this particular solution, 40 LUNs with a total of 36 TB volume capacity needed to be backed up, so 80 x 45 GB LUNs were created to form the snapshot cache, which is a total of 20 percent of production data.
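The snapshot cache arithmetic above can be checked with a short sketch. The function names and the decimal unit convention (1 TB = 1,000 GB) are assumptions made for illustration; the 80 x 45 GB and 36 TB figures are taken from this solution.

```python
def reserved_lun_pool_gb(lun_count: int, lun_size_gb: float) -> float:
    """Total capacity of a reserved LUN pool (snapshot cache) in GB."""
    return lun_count * lun_size_gb


def pool_fraction(pool_gb: float, source_gb: float) -> float:
    """Reserved pool size as a fraction of a source capacity."""
    return pool_gb / source_gb


# Figures from this solution: 80 reserved LUNs of 45 GB each.
pool_gb = reserved_lun_pool_gb(80, 45)

# Assumption: decimal units, so 36 TB of volume capacity is 36,000 GB.
volume_capacity_gb = 36 * 1000

print(f"Snapshot cache: {pool_gb:,.0f} GB")
print(f"Share of volume capacity: {pool_fraction(pool_gb, volume_capacity_gb):.0%}")
```

Under these assumptions the pool is 3,600 GB, or 10 percent of the 36 TB of provisioned volume capacity. The 20 percent figure in the text is stated relative to production data rather than volume capacity, so the two percentages are not directly comparable.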

For more information on how to calculate the snapshot cache size, refer to EMC SnapView for Navisphere Administrator’s Guide.

Automation scripts during backup

EMC NetWorker provides the savepnpc command, so that the pre-script and post-script of the backup can be easily customized. By using the Replication Manager CLI function, the whole backup procedure can be managed by NetWorker.


NetWorker save set configuration for Exchange 2010

To maximize the backup performance, consider:

• Raising the backup load by properly setting up parallelism in NetWorker

• Balancing the data flow during backup in parallel

Configure the Save Set attribute of the client resource to achieve this. For each Exchange 2010 database, back up the .edb database file and the log folder. In this particular solution, 20 Exchange 2010 mailbox databases in total needed to be backed up, so the client parallelism value was set to 20. This means that 20 database files can be backed up simultaneously if 20 drives are assigned from Data Domain. This design ensured that there was sufficient backup load while keeping the backup data flow for each backup session properly balanced.

The following table lists the backup order and the content to be backed up:

Backup order Backup content

1 MBX01 (10 DBs and 10 logs)

2 MBX02 (10 DBs and 10 logs)

Note Because log folders take much less time to back up than database files, 20 database files are backed up simultaneously for most of the backup window.

To achieve the backup order above, the following values were specified in the Save Set attribute of the client resource.

C:\MBX01\DB1\MBX01_DB1.edb
C:\MBX01\LOG1
C:\MBX01\DB2\MBX01_DB2.edb
C:\MBX01\LOG2
…
C:\MBX02\DB10\MBX02_DB10.edb
C:\MBX02\LOG10
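A Save Set list of this shape could be generated rather than typed by hand. The following is a minimal sketch, not the method used in this solution. It assumes that the entries elided above follow the same naming pattern as the entries shown, and the helper name `exchange_save_sets` is hypothetical.

```python
def exchange_save_sets(servers, db_count, root="C:\\"):
    """Build a Save Set list: for each server, each database file
    followed by its log folder, in server order."""
    entries = []
    for server in servers:
        for n in range(1, db_count + 1):
            # Assumed pattern, inferred from the entries shown in the text.
            entries.append(f"{root}{server}\\DB{n}\\{server}_DB{n}.edb")
            entries.append(f"{root}{server}\\LOG{n}")
    return entries


save_sets = exchange_save_sets(["MBX01", "MBX02"], db_count=10)
print("\n".join(save_sets))
```

Listing MBX01 before MBX02 preserves the backup order in the table above, and the 40 entries correspond to the 20 database files and 20 log folders.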


Microsoft SharePoint design

SharePoint 2007 design in a virtualized environment

Introduction

The following design factors should be considered for a virtualized SharePoint 2007 environment:

• SharePoint content database

• SharePoint farm search component

• SharePoint virtualization resource allocation

• SharePoint storage design

• Clone design

SharePoint content database consideration

The SharePoint farm is designed as a publishing/collaboration portal. It includes 1 TB of user content consisting of 10 SharePoint site collections, each populated with 100 GB of content data.

Microsoft recommends a 100 GB content size for each content database as a soft limit. The storage design best practices are 130-150 GB for the data volume and 25-50 GB for the log volume. This solution uses:

• 100 GB content database data files on 150 GB LUNs

• 5 GB content database transaction log files on 30 GB LUNs

SharePoint farm search component consideration

During the full farm backup, it is important to back up the SharePoint search database. Two types of search components are available in the SharePoint farm:

• Enterprise search engine: Office SharePoint Server Search Service (Osearch)

• SharePoint help information search engine: Windows SharePoint Services (WSS) Search Service (SPsearch), a very small search index (less than 100 MB) that must still be backed up

Both Osearch and SPsearch engines are configured to store:

• Content Index (CI) files, which are stored on the physically exposed drive letter (LUN) on the index and query servers. The WFEs are also configured as query servers for better query performance through query load balancing.

• The SSP search database, which stores metadata and crawler history information for the search system, and typically requires more disk space than the index.

For definitions about Office Search Engine and WSS Search terms, refer to Microsoft TechNet websites.


SharePoint virtualization resource allocation

The SharePoint farm uses two of three ESX servers. The virtualization allocation of this solution is detailed in the following table:

Server role | vCPUs | Memory (GB) | Boot disk (GB) | Raw device mapping disk
WFE Server x 4 | 4 | 4 | 40 | 40 GB x 1 (query volume)
Index Server | 2 | 4 | 40 | 150 GB x 1 (office search index volume); 30 GB x 1 (WSS help search index volume)
Application Excel Server | 2 | 2 | 40 | (none listed)
SQL Server 2008 | 4 | 8 | 40 | 150 GB x 10 (content database data volume); 30 GB x 10 (content database log volume); 50 GB x 1 (configuration database volume); 200 GB + 50 GB (SharePoint SSP Search database data and log volumes); 80 GB x 5 (SQL temp database and log volumes)


SharePoint storage design

This white paper uses the Microsoft recommended sizing of 100 GB for each SharePoint 2007 content database.

A total of 1 TB user data was split into 10 x 100 GB content databases on 150 GB volumes, which used RAID 5 protection.

The following table describes the detailed disk layouts, and the size and number of disks holding SharePoint data across the whole solution.

Description | Size (GB) | Quantity | Total (GB) | Drive and RAID type | Number of disks
SQL MOSS content (Databases - Data) x 10 | 150 | 10 | 1,500 | 300 GB 15k RAID 5 (8+1) FC disks | 9
SQL MOSS content (Databases - Log) x 10 | 30 | 10 | 300 | (shared with the row above) |
SQL MOSS configuration (Databases and Log) | 50 | 1 | 50 | 300 GB 15k RAID 5 (4+1) FC disks | 5
Index volume - Index Server | 150 | 1 | 150 | (shared with the row above) |
Query volume - WFEs | 40 | 4 | 160 | 300 GB 15k RAID 5 (3+1) FC disks | 4
WSS Index volume - Index Server | 40 | 1 | 40 | (shared with the row above) |
System volume - SQL | 40 | 1 | 40 | (shared with the row above) |
SQL TempDB Data & Log x 5 | 80 | 5 | 400 | 300 GB 15k RAID 10 (5+5) FC disks | 10
SQL MOSS SSP Search Database | 200 | 1 | 200 | (shared with the row above) |
SQL MOSS SSP Search Database Log | 50 | 1 | 50 | (shared with the row above) |


Backup and recovery design for SharePoint 2007

Introduction

EMC NetWorker provides disaster and granular backup and recovery for many applications:

• Full disaster backup and recovery: The entire volume or database for the application is backed up, and the entire volume or database is recovered as a whole. Individual items cannot be selected for backup or recovery. Incremental-level backup is not supported by VSS but is supported in granular backups.

• Granular backup and recovery: Individual items can be selected for backup, and individual items can be selected for recovery.

Full disaster backup and recovery design

This solution uses a VSS framework for consistent point-in-time application snapshots, delivering quick recovery and off-host backup. This solution also demonstrates full recovery for a distributed SharePoint farm and for individual SharePoint content databases.

VSS providers overview

This solution uses two kinds of Volume Shadow Copy Service (VSS) providers for SharePoint full farm backup and recovery:

• Microsoft VSS Provider (software-based)

• CLARiiON VSS Provider (hardware-based)

The default VSS provider software on the Windows platform is Microsoft Software Shadow Copy Provider.

For more information on how Microsoft VSS works on Windows 2008, refer to the article on the Microsoft TechNet website:

For a WFE server, NetWorker can back up one file of the system volume, so Microsoft VSS was used for WFEs.

VSS hardware providers (the EMC VSS Provider in this solution), which are used to back up the SQL and Index servers, enable the creation of shadow copies at the hardware level without imposing load on the production server. For the purposes of VSS, the snapshot or clone is referred to as a shadow. An option to make the shadow transportable is also provided, which allows you to mount or import the shadow on another client. If a shadow is not marked as transportable, you cannot mount the shadow or perform rollback recovery.


VSS Writer overview

NetWorker and NMM integrate with Microsoft Office SharePoint Server 2007 by using the SharePoint Volume Shadow Copy Services (VSS) Writer. The Microsoft SharePoint Server 2007 VSS Writer is dependent on the Microsoft SQL Server 2005/2008 VSS Writer.

Using the SharePoint VSS Writer, EMC NetWorker takes VSS snapshots of the entire SharePoint farm for data protection. For more detailed information about VSS Writer configuration, refer to EMC NetWorker Module for Microsoft Applications Release 2.2 SP1 Application Technical Notes for SharePoint and Exchange.

LAN-free backup design for SharePoint full farm

The transportable technology of VSS hardware providers allows the data clone/snapshot to be mounted onto a non-production environment (proxy client) for backup tasks. The benefits of using the proxy client are listed as follows:

• The lifetime of the data can be controlled without affecting the performance of the existing servers.

• Hardware resources such as the processor, memory, and network can be optimized for serving the client or the user application. The hardware resources of the proxy host can be used for backing up the data to the storage node.

• Multiple independent copies of the data volumes can be managed across several machines.

The following image illustrates the LAN-free configuration of SharePoint used in this solution. This design avoids network traffic when rolling the clone data to the Data Domain. It also minimizes the impact on the production environment during the backup.


The data flow of the LAN-free topology in this solution is listed in the following table:

Stage Description

1 NetWorker Server initializes the request to the application servers (in this solution, they included SQL servers and SharePoint Index servers) with the EMC VSS Provider installed.

2 The application servers use the EMC VSS Provider to create the clones/snapshots for the storage volumes.

3 Clones are mounted and visible in the NetWorker Proxy Server.

4 The proxy client, in this case the storage node, uses the clone/snapshot in primary storage to transfer the data to the Data Domain device through FC.

Data Domain is attached directly to the NetWorker Storage Node as a virtual tape library.

CLARiiON SnapView clone technology is used in this solution because it minimizes the impact on the production environment. The backup program reads data from clone LUNs mounted on a non-production client rather than from a snapshot, which reads data from the production database LUNs. For fast-changing databases, split-mirror technology such as CLARiiON clones or Symmetrix® business continuance volumes is recommended.

For more information about the CLARiiON SnapView recommendation, refer to EMC NetWorker Module for Microsoft Applications Release 2.2 SP1 Application Technical Notes for SharePoint and Exchange.


Clone group design

VSS providers for CLARiiON also require creating a local clone copy for LUNs. The total clone size that the database requires is 2,400 GB.

For better performance, 15 x 1 TB SATA drives were used for the clone LUNs. The 15 spindles were configured into three RAID 5 (4+1) groups yielding 12 TB. The database clone LUNs were distributed evenly among the three RAID 5 groups.
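The clone group capacity above follows from simple RAID 5 arithmetic. The sketch below uses nominal disk sizes and ignores formatting overhead, which is an assumption; the function names are hypothetical.

```python
def raid5_usable_tb(groups: int, data_disks: int, disk_tb: float) -> float:
    """Nominal usable capacity of several RAID 5 groups.
    Only the data disks count; the parity disk in each group holds no user data."""
    return groups * data_disks * disk_tb


def per_group_load_gb(total_gb: float, groups: int) -> float:
    """Capacity placed on each group when LUNs are spread evenly."""
    return total_gb / groups


# Three RAID 5 (4+1) groups of 1 TB SATA drives, as in this solution.
usable_tb = raid5_usable_tb(groups=3, data_disks=4, disk_tb=1.0)

# 2,400 GB of database clone LUNs distributed evenly across the groups.
clone_gb_per_group = per_group_load_gb(total_gb=2400, groups=3)

print(f"Usable clone capacity: {usable_tb:.0f} TB")
print(f"Clone data per RAID group: {clone_gb_per_group:.0f} GB")
```

With these inputs the three groups yield 12 TB, and each group carries 800 GB of clone data.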

Snapshot policy consideration

EMC NetWorker provides two preconfigured policies that can be used with NMM:

• Serverless backup: A single snapshot is taken per day. The data is then backed up to traditional tape and the snapshot is deleted.

• Daily: Eight snapshots are taken per day. The data in the first snapshot is backed up to tape. Each snapshot expires after 24 hours.

In this solution, serverless backup is used for SharePoint full-farm backup.

Retaining the snapshot enables you to perform a snapshot restore for the databases. A snapshot restore is much faster than a conventional restore, which reads data from the backup media. The disadvantage of keeping a snapshot is that the disk space used by the snapshot grows rapidly during the day.

Full farm conventional recovery design

A full recovery of a distributed SharePoint farm requires that each machine in the farm is configured as a Client resource in NetWorker.

Upon recovery, each machine uses a proxy client to read data from the backup target (Data Domain) and restore the entire farm over the LAN back to the production environment.

One or more content databases can be recovered after the configuration database has been restored. In previous releases of NMM, a user was unable to select only individual SharePoint content databases for restore. When any content database was selected, the corresponding configuration and generic databases were also selected for recovery. NMM 2.2 SP1 provides the ability to select individual SharePoint content databases for recovery.


Granular backup and recovery design for SharePoint

Introduction

In this solution, granular backup and recovery uses a LAN-based topology. The SharePoint 2007 backup utility does not support item-level recovery if the data is missing from the first-level and second-level recycle bins. One remedy is to restore the entire backup or snapshot to a secondary farm and retrieve the item from the DR site. The backup or snapshot is not restored directly to the production server, which eliminates potential risks. However, it is time-consuming and expensive to set up a DR farm to do the recovery.

Another way is to restore whole sites to the production environment directly, which eliminates the need for, and expense of, a recovery server. However, restoring the entire database can take many hours and impact the business. In addition, restoring directly to the production server overwrites all of the content currently on it, which is not desirable. In this solution, after running a granular backup of a SharePoint site, an item-level granular recovery was completed using NMM 2.2 SP1.

The SharePoint granular backup does not use SQL or SharePoint VSS writers, so it is not necessary to register these writers prior to creating a client resource for SharePoint 2007 granular backup. NMM 2.2 SP1 offers granular backup for SharePoint 2007. Granular backup provides the finest granularity available with SharePoint backup down to the object level. It also provides the ability to back up incrementally. NMM 2.2 SP1 leverages content migration APIs (STSADM “export” command) of SharePoint 2007 by exporting every document and its metadata in the content site one by one.

Granular LAN-based backup and recovery design

In this solution, a LAN-based topology is used for granular backup. The workflow is listed in the following table:

Stage Description

1 One or more SharePoint WFEs with NMM 2.2 SP1 installed are set up as clients for granular backup.

2 NetWorker Server sends requests to the WFEs and sets the proxy client as the data mover.

3 WFEs request backup data from SQL Server and objects are staged in the folder on the WFEs.

4 The WFE then ships this data to the NetWorker proxy client.

5 The NetWorker proxy client transfers the data into Data Domain device through FC.


As the following image illustrates, a public network is used to transfer data from the SQL server back end to the WFE during data extraction. The public network utilization can affect the backup performance.

The 15 SATA disks that contain the clone LUNs also hold five staging LUNs for SharePoint granular backup. The five 1 TB LUNs serve the four WFEs and the SQL server as staging folders that store backup streams during granular backup.

Note The staging LUN should be large enough for the staging to take place. If there is insufficient space for the temp folder, the granular backup will fail. A capacity utilization of 70 percent of these SATA disks ensures better backup performance and I/O throughput during the backup.

For more detailed information about staging folder size calculation, refer to EMC NetWorker Module for Microsoft Applications Release 2.2 SP1 Application Technical Notes for SharePoint and Exchange.

Save set configuration for SharePoint

For more information about Save set settings for SharePoint full farm backup and granular backup, refer to EMC NetWorker Module for Microsoft Applications Release 2.2 SP1 Administration Guide.


Data Domain design and configuration

Data Domain system overview

Data Domain systems are disk-based deduplication appliances and gateways that provide data protection and disaster recovery (DR) for the enterprise. Data Domain operating system (DD OS) provides both a CLI for performing all system operations, and Enterprise Manager (a graphical user interface) for configuration, management, and monitoring.

Data integrity

The Data Domain Data Invulnerability Architecture protects against data loss from hardware and software failures. Storage in most Data Domain systems is set up in a double parity RAID 6 configuration (two parity drives). Additionally, most configurations include one or two hot spares in each enclosure.

Data compression

DD OS stores only unique data. Through Data Domain Global Compression technology, a Data Domain system pools redundant data from each backup image. Any duplicate data are stored only once. The storage of unique data is invisible to backup software, which sees the entire virtual file system. DD OS data compression is independent of data format. Data can be structured, such as databases, or unstructured, such as text files. Data can be from file systems or raw volumes.

Restore operations

With disk backup through the Data Domain system, incremental backups are always reliable and access time for files is measured in milliseconds. Furthermore, with a Data Domain system, full backups can be performed more frequently without the penalty of storing redundant data.

From a Data Domain system, file restores go quickly and create little contention with backup or other restore operations. Unlike tape drive backups, multiple processes can access a Data Domain system simultaneously. A Data Domain system allows your site to offer safe, user-driven, single-file restore operations.

Data Domain sizing considerations

Storage capacity needs to be sized to adequately handle the amount of data to be retained. Backups that are larger than expected or contain data that deduplicates poorly can require much more storage space.

Although there are many factors that might affect the deduplication ratio, it is possible to estimate Data Domain storage capacity required for particular backup scenarios. Typical compression ratios are about 20:1 on average over many weeks. A backup that includes many duplicate or similar files (files copied several times with minor changes) benefits the most from compression.

In this particular solution, 16 x 931 GB drives were configured into RAID 6 (12+2) with two hot spares, so the total available capacity displayed in the Data Domain console is about 10 TB. At a typical 20:1 ratio, this means the Data Domain system can accept almost 200 TB of real backup data. In this solution, the data that needed to be backed


EMC strongly recommends performing a sizing assessment when including replication in the backup environment. The sizing assessment can help to determine if replication can occur within the required timeframe, based on the replication network bandwidth and estimated amount of data to replicate on each day.
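The capacity figures quoted in this section can be reproduced with a rough estimate. This is a sketch under stated assumptions: the function names are hypothetical, the 20:1 ratio is the typical average cited above rather than a guarantee, and file system overhead is not modeled (the console value of about 10 TB is taken as given).

```python
def raid6_data_capacity_gb(data_disks: int, disk_gb: float) -> float:
    """Capacity of the data disks in a RAID 6 group.
    Parity disks and hot spares are excluded by the caller."""
    return data_disks * disk_gb


def logical_capacity_tb(usable_tb: float, dedup_ratio: float) -> float:
    """Estimated pre-deduplication backup data that fits in usable_tb."""
    return usable_tb * dedup_ratio


# 16 drives = 12 data + 2 parity + 2 hot spares; only 12 hold data.
raw_data_gb = raid6_data_capacity_gb(data_disks=12, disk_gb=931)

# About 10 TB is reported by the Data Domain console after overhead.
logical_tb = logical_capacity_tb(usable_tb=10, dedup_ratio=20)

print(f"Data disk capacity before overhead: {raw_data_gb:,.0f} GB")
print(f"Estimated logical capacity at 20:1: {logical_tb:.0f} TB")
```

The estimate scales linearly with the deduplication ratio, so data that deduplicates poorly reduces the logical capacity proportionally.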

Data Domain deduplication ratio considerations

The term “deduplication ratio” refers to the ratio of data before deduplication to the amount of data after deduplication.
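The definition above can be expressed as a small calculation. The sketch below applies it to the 7.4 TB daily full backup and 37:1 ratio reported in the testing section of this paper. The stored size it derives is an illustration of the definition, not a measured value, and the function names are hypothetical.

```python
def dedup_ratio(before_tb: float, after_tb: float) -> float:
    """Deduplication ratio: data before deduplication divided by data after."""
    return before_tb / after_tb


def stored_size_tb(before_tb: float, ratio: float) -> float:
    """Physical space implied by a given ratio (inverse of dedup_ratio)."""
    return before_tb / ratio


def space_savings(ratio: float) -> float:
    """Fraction of the original data that does not need to be stored."""
    return 1 - 1 / ratio


# A 7.4 TB backup at 37:1 implies roughly 0.2 TB of newly stored data.
stored_tb = stored_size_tb(before_tb=7.4, ratio=37)

print(f"Implied stored size: {stored_tb:.2f} TB")
print(f"Space savings at 37:1: {space_savings(37):.1%}")
```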

There are many factors that affect the deduplication ratio. Some key factors are listed below:

• Retaining data for longer periods of time improves the chance that common data already exists in storage, resulting in greater storage savings and a better deduplication ratio.

• Backups of Exchange and SharePoint are known to contain redundant data and are good deduplication candidates.

• After the first full backup, the data change rate affects the deduplication ratio of subsequent backups.

• Compressing or encrypting data during backup reduces the deduplication ratio and is therefore not recommended.

• EMC recommends multiplexing be turned off when using the Data Domain storage system as a VTL with NetWorker.

Data Domain space management considerations

EMC recommends running space reclamation weekly as per the default. This feature can be scheduled or run manually.

If possible, schedule space reclamation to occur outside peak ingestion windows. This reduces the competition for resources and minimizes any impact on ingestion, deduplication, or replication.

Data Domain VTL with NetWorker

The following describes general EMC NetWorker settings and best practices for optimizing the backup environment when using Data Domain as a VTL:

• Avoid running disk-intensive applications such as virus scanning on the backup client when it is backing up or restoring files.

• Use parallelism on the client when backing up data for increasing backup load.

• Assign libraries and drives for the exclusive use of each backup host to ensure the best possible performance.

• Balance the backup start times rather than scheduling hundreds of backups to begin at the same time. Look at the savegroup and client completion times, or drive activity, to balance the load.


• To ensure a steady-state load, examine the drive target sessions and try to keep a certain number of sessions (more than 10 percent) running throughout the backup window. Fewer sessions risk stalling target devices; more sessions place unnecessary load on the infrastructure.

• On large systems with more than several hundred gigabytes to protect, eliminate data travel through the network by configuring the client as a storage node (LAN-free topology).

• Increase the number of storage nodes and devices if possible, for better performance.

Data Domain configuration

In this particular solution, Data Domain is configured as a VTL and connected to the NetWorker Storage Node. The LAN-free environment is enabled. Configuration details are as follows:

• Exchange and SharePoint share the same NetWorker Storage Node. Adding more storage nodes improves combined backup and recovery performance if required.

• Two libraries are configured, one for the exclusive use of Exchange 2010 and the other for SharePoint 2007.

• In total, 20 IBM LTO-3 drives are configured for Exchange 2010 so that all 20 database files can be backed up simultaneously. The tape size is 400 GB so that each database file (around 370 GB) fits on one tape.

• In total, 16 IBM LTO-1 drives are configured for SharePoint 2007. The tape size is 100 GB.

• On the Storage Node Server, one dual-port 4 Gb HBA card connects to the Data Domain system and another connects to the primary storage (backup source). This design ensures the best performance during backup.

• On the Data Domain system, there are two 4 Gb HBA ports. Each port is assigned half of the drives for load balancing.


Testing and validation

Introduction

This section describes the design validation and performance results for this solution. The backup and restore features were validated under different scenarios and performance was measured for both SharePoint 2007 and Exchange 2010. An EMC Data Domain DD690 appliance was used for data deduplication. Microsoft LoadGen 2010 was used to generate mailbox data and simulate a MAPI workload. The KnowledgeLake Document Loader was used to provide continual data population during testing to simulate SharePoint user data growth. The data grows on a daily basis. Daily full backup tests were then performed to validate the solution design.

Exchange backup scenarios

Introduction

The following table lists the Exchange backup scenarios performed in this solution:

Test Scenario Description

1 Initial Exchange 2010 full backup

2 Daily full backup

Scenario 1: Initial Exchange 2010 full backup

This test scenario was to validate the initial Exchange 2010 full backup performance and deduplication ratio in Data Domain. This should be a one-time event.

The test results showed:

• It took 9 hours and 19 minutes to do an initial full Exchange 2010 backup of 6.5 TB of data.

• The deduplication ratio was 1.46:1.

• The backup throughput to Data Domain was 214 MB/s on average.

Note It takes some time to seed the grid on Data Domain when performing the initial backup. What matters is the backup time and deduplication ratio for a daily full backup following the initial full backup (see the results in Exchange backup scenario 2).

The following graph shows the Data Domain statistics during the initial full Exchange 2010 backup. As you can see from the graph below, backup throughput is


Scenario 2: Daily full backup

This scenario was to validate the daily full backup performance and deduplication ratio after running LoadGen for eight hours, which generated about 250 GB of log data. Since Data Domain contains the initial full backup data, the deduplication ratio increased greatly.

The test results showed that:

• It took 4 hours and 54 minutes to back up 7.4 TB of data into Data Domain. The backup rate was about 1.5 TB per hour.

• The deduplication ratio was 37:1. The total deduplication ratio of the two full Exchange 2010 backups was 2.84:1.

Note The test result of deduplication ratio is based on using LoadGen, which might be different from real-world data. LoadGen is not a tool to test deduplication ratio.

• The backup throughput to Data Domain was 468 MB/s on average and Data Domain CPU utilization was above 25 percent during the backup. Because Data Domain contains initial Exchange 2010 backup data, the backup throughput was higher than the initial backup and CPU utilization was lower, compared to the very first time.
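The rates reported above can be cross-checked from the data volume and elapsed times. This is a sketch under two assumptions that the paper does not state explicitly: binary units (1 TB = 1,024 x 1,024 MB), and the use of the 4 hours and 36 minutes database and log backup time from the results table for the MB/s figure. The function names are hypothetical.

```python
def tb_per_hour(data_tb: float, hours: int, minutes: int) -> float:
    """Average backup rate in TB per hour over the given elapsed time."""
    return data_tb / (hours + minutes / 60)


def mb_per_second(data_tb: float, hours: int, minutes: int) -> float:
    """Average throughput in MB/s, assuming 1 TB = 1,024 * 1,024 MB."""
    seconds = (hours * 60 + minutes) * 60
    return data_tb * 1024 * 1024 / seconds


# Rate over the total backup window of 4 hours and 54 minutes.
rate_tb_h = tb_per_hour(7.4, 4, 54)

# Throughput over the database and log backup time of 4 hours and 36 minutes.
rate_mb_s = mb_per_second(7.4, 4, 36)

print(f"Backup rate: {rate_tb_h:.2f} TB/h")
print(f"Average throughput: {rate_mb_s:.1f} MB/s")
```

Under these assumptions the sketch gives about 1.51 TB per hour and about 468.6 MB/s, which is consistent with the reported figures.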


environment numbers were gathered by setting all of the 4 Gb ports to Shared Rate mode, a more common configuration. This means that by adding more FC bandwidth, it is possible to scale to a backup rate of more than 1.5 TB per hour.

The following table lists the daily full backup results after LoadGen simulation:

Amount of backup data | RM snapshot and mount replica time | Backup time (database and log) | Total backup window | Deduplication ratio
7.4 TB | 28 minutes for all 40 snaps | 4 hours and 36 minutes | 4 hours and 54 minutes | 37:1

Note The number of snaps affects the total RM snapshot and mount replica time. It is recommended to mount fewer than 20 volumes on each RM mount host, so adding more mount hosts (backup servers) will shorten the total backup window.

The following image shows the SAN switch bandwidth during the daily full backup after LoadGen simulation. The total throughput is about 68.5 MB/s.


The following graph shows the Data Domain statistics during the daily full backup after LoadGen simulation. The Data Domain CPU utilization is about 25 percent.


Exchange 2010 recovery scenarios

Introduction

The following table lists the Exchange recovery scenarios performed in this solution:

Test Scenario Description

1 Single database recovery

2 Single Exchange 2010 mailbox recovery

Scenario 1: Single database recovery

This test shows the RTO to recover a single Exchange 2010 database by using both Replication Manager snapshot and Data Domain backup data. Snapshot restore provides a quick way for Exchange 2010 database recovery and you can also recover old backup data from Data Domain by leveraging the NetWorker client. The test results showed that:

• It took only 8 minutes to restore one Exchange 2010 database from an RM snapshot.

• It took 2 hours and 37 minutes to recover one Exchange 2010 database (386 GB of data) from Data Domain to Mailbox Server MBX01.

• For Data Domain recovery, the calculated recovery speed from Data Domain was about 41.96 MB/s. In our testing environment, the bottleneck was the network speed. The recovery window could have been shortened by increasing the network bandwidth, for example by using a 10 Gb network instead of the 1 Gb network in the testing environment.

• For Data Domain recovery, the average CPU usage of Data Domain was 10 percent and the maximum disk utilization of Data Domain was 23 percent, which means Data Domain resources were enough to support more network bandwidth or multiple recovery sessions.
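The calculated recovery speed above can be reproduced from the restored size and the elapsed time. The sketch assumes binary units (1 GB = 1,024 MB). The 100 MB/s rate in the second calculation is a purely hypothetical value used to illustrate how a faster network would shorten the recovery window; it is not a measured result, and the function names are hypothetical.

```python
def recovery_rate_mb_s(data_gb: float, hours: int, minutes: int) -> float:
    """Average recovery speed in MB/s, assuming 1 GB = 1,024 MB."""
    seconds = (hours * 60 + minutes) * 60
    return data_gb * 1024 / seconds


def recovery_hours(data_gb: float, rate_mb_s: float) -> float:
    """Time in hours to recover data_gb at a sustained rate."""
    return data_gb * 1024 / rate_mb_s / 3600


# Measured in this solution: 386 GB restored in 2 hours and 37 minutes.
measured_rate = recovery_rate_mb_s(386, 2, 37)

# Hypothetical what-if: the same database at a sustained 100 MB/s.
what_if_hours = recovery_hours(386, rate_mb_s=100)

print(f"Measured recovery speed: {measured_rate:.2f} MB/s")
print(f"Recovery time at a hypothetical 100 MB/s: {what_if_hours:.2f} hours")
```

With these inputs the measured speed works out to about 41.96 MB/s, matching the figure reported above.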

The following table lists the performance counters captured on the Mailbox Server MBX02 to measure the impact on the production environment during the Data Domain recovery.

Mailbox Server CPU usage (%)

Network Utilization (%)


The following graph demonstrates the Data Domain performance during a full point-in-time recovery.

Scenario 2: Exchange 2010 mailbox recovery

Microsoft provides a mechanism to recover data at the mailbox level or item level, which is called a recovery database (RDB). An RDB is a special kind of mailbox database that is used to mount a restored mailbox database and extract data from it as part of a recovery operation.

Perform the steps in the following table to recover data using RDB:

Step Action

1 Restore the database files and log files from the tape, and put them under predefined folders on the recovery Mailbox server.

2 Create the recovery database with the cmdlet New-MailboxDatabase and a switch –Recovery. Specify the database path and log file path as the predefined folders. For example:

New-MailboxDatabase -Recovery -Name RDB1 -Server MBX03 -EdbFilePath "C:\Recovery\RDB1\DB\MBX02_DB1.EDB" -LogFolderPath "C:\Recovery\RDB1\Log"

3 Use the Restore-Mailbox cmdlet to recover mailbox-level data or item-level data.


For more details about recovery databases, refer to the following articles on the Microsoft TechNet website:

The advantage of this recovery method is that RDB is a built-in feature of Exchange 2010 so there is no need to buy additional software or licenses to create RDBs on existing Exchange servers. However, the recovery procedure consumes CPU, memory, and network resources on Exchange servers. To minimize the impact, organizations can have a dedicated Exchange Mailbox server for RDBs, which is not a cost-effective solution. Currently, there is no GUI for the RDB feature. All operations need to be run through cmdlets, which adds complexity to the recovery.

Ontrack PowerControls is mailbox recovery software that overcomes some of the disadvantages of RDB. Ontrack PowerControls for Exchange works with existing Exchange Server backup architecture and procedures, and enables the recovery of individual mailboxes, folders, messages, attachments, calendar items, notes, and tasks directly to the production Exchange Server or to any PST file. This powerful software also lets you search and create a copy of all archived e-mail that matches a given keyword or criteria.

In this solution, Ontrack PowerControls is installed on the NetWorker proxy client, which minimizes the impact on the production environment.

Perform the steps listed in the following table to recover data using Ontrack PowerControls:

Step Action

1 Restore the database files and log files from the tape to the NetWorker Proxy node. In this way, there is no impact on the LAN and the Exchange servers.

2 Run Ontrack PowerControls for Exchange on the NetWorker Proxy node, and specify the location where the database files and log files are restored as shown in the following image.


3 Ontrack PowerControls lists all mailboxes contained in this database. Through the GUI, select the specific mailbox for restore as shown in the following image.

4 When restoring the data, select the format in which to export the data as shown in the following image.


5 For item-level restore, expand the mailbox folder hierarchy, and select the e-mail items to be restored as shown in the following image.

6 Select the export format as shown in the following image.

A significant advantage of this solution design is that when the data to be recovered is contained in the last snapshot, the snapshot can simply be mounted to the Proxy node for recovery. There is no need to wait for a restore from tape, which greatly reduces the restore time.


SharePoint backup using VSS framework

Introduction

The following table lists the SharePoint backup scenarios using the VSS framework performed in this solution:

Test Scenario Description

1 Initial SharePoint farm full backup

2 Daily full farm backup

3 Database-level granular full backup

Workload simulation

The KnowledgeLake Document Loader was used to provide continual data population during testing to simulate SharePoint user data growth. The intention was to measure the Data Domain deduplication ratio as SharePoint content data increases.

Content creation was accomplished with KnowledgeLake Document Loader Lite software. The software takes a series of documents and modifies copies of them to generate unique documents. It then distributes the document copies into document libraries in the SharePoint farm. The data population ran for 8 hours per day.

Scenario 1: Initial SharePoint farm full backup

This test scenario validated the performance of the initial SharePoint farm full backup, and the resulting deduplication ratio in Data Domain, using VSS in a LAN-free topology.

The test results are as follows:

• It took 3 hours and 41 minutes to complete a full backup of 1149 GB of data in total to the Data Domain system.

• The deduplication ratio was 1.55:1, and the post-compression data size was 740.48 GB.

• The average response time for the SATA disks of clone LUNs was 3 milliseconds during backup.

• The average write throughput to Data Domain was 152.1 MB/s and Data Domain CPU utilization was 43.9 percent during the backup.
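The reported deduplication ratio can be checked against the two data sizes listed above. The following is a simple worked calculation, assuming the ratio is defined as the pre-compression size divided by the post-compression size. The space-savings percentage is derived here for illustration and is not a figure reported by the test.

```latex
\[
\text{deduplication ratio}
  = \frac{\text{pre-compression size}}{\text{post-compression size}}
  = \frac{1149\ \text{GB}}{740.48\ \text{GB}}
  \approx 1.55
\]
\[
\text{space savings}
  = 1 - \frac{740.48\ \text{GB}}{1149\ \text{GB}}
  \approx 0.356 = 35.6\%
\]
```

A ratio of 1.55:1 therefore corresponds to roughly one third less disk capacity consumed than the size of the data backed up in this initial full backup.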
