
EMC Solutions at Microsoft: Optimizing Exchange Backup and Recovery with VSS (Volume Shadowcopy Service) Technology Integration




EMC Perspective

EMC CLARiiON, SnapView, and EMC Replication Manager/SE Best Practices


Executive Summary

Microsoft IT serves as the front-line customer for the various product development groups at Microsoft, deploying new technologies to test and prove them in an enterprise environment. After the Microsoft IT group deployed Microsoft® Windows Server 2003, the latest edition of the company’s enterprise operating system, they decided to take advantage of the Volume Shadowcopy Service (VSS) available in the new operating system.

The adoption of VSS backup for Microsoft’s Exchange Server 2003 led to some discoveries and changes in the VSS functionality in Windows Server 2003. First, Microsoft needed a third-party vendor to provide the interface, a VSS requestor that would complete the solution. After evaluating the available products, Microsoft went with EMC, which had the most mature solution on the market.

Working together, engineers from Microsoft and EMC developed a solution that used Microsoft’s existing EMC® CLARiiON® storage system and the VSS capability in Windows Server 2003 to meet stringent standards for backup and restore times. The final result exceeded everyone’s expectations and allows Microsoft to restore Exchange service and data in a few minutes instead of hours.

The benefits of using this VSS solution for backup and restore, especially when used with the EMC storage solution, have enabled Microsoft to meet its service-level agreement (SLA) goals for its Exchange servers and reduced backup and restore times dramatically.

By employing this VSS solution in its corporate Exchange environment, Microsoft IT has significantly reduced administrative overhead for Exchange, improved system performance and service availability, and improved its own ability to meet its SLA obligations. Those benefits should become even more dramatic as the company expands the solution to all of its Exchange servers worldwide.

This paper highlights the best practices learned from developing and deploying the VSS solution for Exchange using EMC CLARiiON storage technologies and EMC Replication Manager/SE.

Introduction: Microsoft Case Study

In Windows Server 2003, Microsoft introduced Volume Shadowcopy Service (VSS), which lets administrators back up and restore data very quickly and reliably through point-in-time imaging. Windows Server 2003 VSS utilizes free space on an NTFS volume to make copies of the data.

Microsoft acquired an EMC CLARiiON CX700 array and began optimizing Exchange for performance and availability. Microsoft IT wanted to run peak I/O activity with a certain level of response times, so they worked with EMC to balance I/O across the array in a combination of striping and concatenation. The solution bound individual LUNs into metaLUNs, EMC’s new technology, which the newest FLARE® storage-system operating environment supported.

Situation

Microsoft wanted a third-party tool to showcase the VSS technology infrastructure in Windows 2003 for taking instant snapshots of storage volumes. The product was to be used on Microsoft’s own corporate Exchange servers.

Solution

Microsoft worked closely with EMC to develop a product that met Microsoft’s SLAs for backing up/restoring its own production Exchange servers. The joint solution kept backup times within the current backup windows, but allowed any amount of data to be restored within minutes.

Benefits

• Speed. Fast recovery of any amount of data for Microsoft’s e-mail, well within the required timeframe. EMC Replication Manager/SE further shortened the backup jobs by 80 minutes.

• Flexibility. Microsoft IT can offload backups to tape anytime and can recover mail data and begin log replay within two minutes, regardless of the amount of data.

• Reliability. The joint EMC and Microsoft solution showed extremely high reliability in backup during initial testing under realistic data loads. Replication Manager/SE enforces best practices such as data checking, validation, and integrity checks.

Products & Technologies

• EMC CLARiiON CX700

• Microsoft Exchange Server 2003

• EMC Replication Manager/SE

• EMC SnapView


In connection with this optimization, Microsoft wanted to leverage the VSS capability in Exchange 2003 by creating an enterprise-level solution for its own corporate environment. In the fall of 2003, a call went out to vendors to supply a product that could implement such a solution for Microsoft. The Microsoft team evaluated EMC’s VSS product and found the EMC solution the most compelling one available.

Working together, engineers from the two companies improved the speed and reliability of the original design to create a solution that would meet Microsoft’s goals for the project. These goals included:

• Perform almost instantaneous restores while still fitting backups into the existing four-hour backup window

• 100 percent reliability of the backup

• Pass testing under realistic data volume load

The result of this collaboration was a product known as EMC SnapView™ Integration Module for Microsoft Exchange (SIME). It grew to meet all of Microsoft’s requirements and significantly improved the availability of Microsoft’s corporate Exchange data and service.

In April 2004, Microsoft deployed the VSS solution on a CLARiiON CX700 array in its production Exchange environment. The configuration included an active-active-passive three-node cluster; each cluster held two Exchange Virtual Servers, each in turn containing four Exchange Storage Groups. Each Exchange Storage Group contained five databases, each of which held 200 users. Each Exchange Virtual Server therefore served 4,000 users (4 storage groups × 5 databases × 200 users) and held between 800 GB and 1 TB of data, a staggering amount to back up in a short time. This configuration was later followed by two additional identically configured CX700s and related cluster configurations. One additional machine located outside of the clusters functioned as a mount host/requestor in the VSS process.
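The sizing arithmetic in the preceding paragraph can be checked with a short sketch. The counts are the ones quoted above; the function names are hypothetical and exist only for illustration.

```python
# Figures quoted in the deployment description above.
STORAGE_GROUPS_PER_EVS = 4
DATABASES_PER_STORAGE_GROUP = 5
USERS_PER_DATABASE = 200
EVS_PER_CLUSTER = 2


def databases_per_evs() -> int:
    """Number of mailbox databases behind one Exchange Virtual Server."""
    return STORAGE_GROUPS_PER_EVS * DATABASES_PER_STORAGE_GROUP


def users_per_evs() -> int:
    """Number of users served by one Exchange Virtual Server."""
    return databases_per_evs() * USERS_PER_DATABASE


def users_per_cluster() -> int:
    """Number of users served by one three-node cluster (two active EVSs)."""
    return users_per_evs() * EVS_PER_CLUSTER


if __name__ == "__main__":
    print("databases per EVS:", databases_per_evs())
    print("users per EVS:", users_per_evs())
    print("users per cluster:", users_per_cluster())
```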

The VSS solution was such a success that EMC decided to integrate it into its Replication Manager product line. More work and refinements were ahead for the EMC/Microsoft team, ultimately leading to a vastly improved VSS product, Replication Manager/SE. In August 2005, Microsoft deployed Replication Manager/SE and an additional two CLARiiON CX700 arrays in its production Exchange environment. The deployment was so successful that Microsoft was able to implement a new SLA: restore Exchange service within one hour.

Replication Manager/SE exceeds Microsoft’s SLA requirements, cutting backup time by an additional 80 minutes. Without this VSS solution, basic e-mail service could be restored within a few hours, but restoration of the data would take 24 hours and require a merge process. With the current solution, Microsoft IT can recover e-mail service and up to 300 GB of data within just two minutes (not including log replay).


Reasons for Microsoft IT to Acquire New Solutions

In late 2003, the time was right for Microsoft to acquire a new backup and restore solution for its Exchange servers. SLAs were in place that the existing solution couldn’t meet, and Windows 2003 had introduced new technology that could potentially serve the purpose.

Recoverability within SLA Time Difficult

SLAs for Microsoft’s Exchange servers included a four-hour backup window—down from the previous rate of eight hours—and 100 percent reliability in the backup job activity. The new backup/restore solution also had to pass a two-day test that simulated the real data volumes on Exchange with no data loss.

With the advent of VSS technology, Microsoft IT had a tool that could help it better meet its SLA requirements for Exchange support. Figure 1 shows the new restore times that Microsoft was able to achieve with its VSS solution.

Figure 1: Restore Times Using EMC VSS Solutions, Based on a 10 Percent Daily Change in Data

How Replication Manager/SE Works

Replication Manager/SE provides support for replica creation on Microsoft Windows platforms attached to CLARiiON CX and CX3 series arrays. Replication Manager/SE is fully integrated with the Microsoft framework that facilitates the creation of application snapshot backups. Specifically, Replication Manager/SE provides support for Exchange 2003 snapshots via VSS, which provides the framework to create point-in-time, transportable snapshots (clone or copy-on-write) of Exchange 2003.

VSS has three basic components: the requestor, the writer, and the provider. A VSS requestor is typically a backup application; it requests a shadow copy set. Replication Manager/SE is a VSS requestor. The VSS writer is the application-specific logic needed in the snapshot creation and restore/recovery process. The VSS writer is provided by Exchange 2003 or other applications. The VSS provider is third-party hardware control software that actually creates the shadow copy. EMC has providers for both CLARiiON and Symmetrix® systems. The Volume Shadowcopy Service coordinates these components’


Figure 2: Volume Shadowcopy Service Components

VSS provides point-in-time recovery and roll-forward recovery via Copy and Full backup modes. Both modes back up the databases and transaction logs, but only Full mode truncates the logs after successful backup. Since these snapshots are transportable, they can also be used for repurposing. For instance, if your server is attached to a storage area network (SAN), you can mask the shadow copy from the production server and unmask it to another server that can reuse it for backup or mailbox-level recovery.

Replication Manager/SE as the requestor starts the VSS snapshot process. First, Replication Manager/SE performs a pre-check. It checks the application version and whether the storage space for making Exchange replicas is available. Second, Replication Manager/SE establishes the pairs of production LUNs and clone LUNs and re-syncs the clone LUNs with their source LUNs. It then works with VSS to create a point-in-time replica. Figure 3 illustrates the steps involved in creating an Exchange VSS backup with Replication Manager/SE.


Replication Manager/SE then instructs the Exchange VSS writer to truncate the logs so that only those changes that haven’t been committed to the database remain. Conversely, if the integrity check fails, the replica is considered invalid and the logs are not truncated. Thus, Replication Manager/SE enforces Exchange best practices and ensures 100 percent reliability of the VSS backup it creates. Figure 4 below shows the typical steps in a Replication Manager/SE backup operation.
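The control flow described in this section (pre-check, clone re-sync, replica creation, integrity check, and log truncation only after a successful check in Full mode) can be sketched as follows. This is a minimal illustration, not Replication Manager/SE code: every function and parameter name is hypothetical, and the individual steps are passed in as plain callables so the sketch stays self-contained.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class BackupResult:
    valid: bool = False           # replica passed the integrity check
    logs_truncated: bool = False  # transaction logs were truncated
    steps: List[str] = field(default_factory=list)


def run_backup(
    precheck: Callable[[], bool],         # hypothetical: version and space checks
    resync_clones: Callable[[], None],    # hypothetical: re-sync clone LUNs
    create_replica: Callable[[], None],   # hypothetical: VSS point-in-time replica
    integrity_check: Callable[[], bool],  # hypothetical: e.g. an eseutil check
    truncate_logs: Callable[[], None],    # hypothetical: ask the writer to truncate
    mode: str = "full",                   # "full" truncates logs, "copy" does not
) -> BackupResult:
    result = BackupResult()
    if not precheck():
        result.steps.append("precheck failed")
        return result
    result.steps.append("precheck ok")

    resync_clones()
    result.steps.append("clones resynced")

    create_replica()
    result.steps.append("replica created")

    # A failed check invalidates the replica and leaves the logs untouched.
    if not integrity_check():
        result.steps.append("integrity check failed")
        return result
    result.valid = True
    result.steps.append("integrity check ok")

    if mode == "full":
        truncate_logs()
        result.logs_truncated = True
        result.steps.append("logs truncated")
    return result


if __name__ == "__main__":
    noop = lambda: None
    ok = run_backup(lambda: True, noop, noop, lambda: True, noop)
    print(ok.valid, ok.logs_truncated, ok.steps)
```

The point of the structure is that truncation is unreachable unless the integrity check has passed, which mirrors the guarantee the text describes.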

Figure 4: Steps in a Replication Manager/SE Backup Operation

As mentioned in the introduction, Microsoft IT’s Exchange production environment has four Exchange storage groups per Exchange Virtual Server. Each storage group is configured to have two LUNs: one for mail store databases and the other for log files. Within Replication Manager/SE, a replication job is created for each Exchange storage group. Therefore, each Exchange Virtual Server has four Replication Manager/SE jobs.

A limitation in Microsoft’s Exchange VSS writer prevents two Exchange 2003 replicas from running at the same time on the same machine, even if they are replicating different storage groups. To improve backup performance and work around this limitation, Replication Manager/SE implements multi-job parallelism. Rather than waiting until the first job finishes completely before starting the second job, Replication Manager/SE starts the second job’s re-sync process as soon as the first job’s re-sync is done. The second job’s re-sync therefore progresses while the first job is still running the eseutil integrity check. Since the re-sync step typically takes 35–40 minutes and the eseutil integrity check, running at 300 MB/s, takes 18–20 minutes to complete, this multi-job parallelism represents a significant time savings. In fact, it took one hour and 20 minutes off Microsoft’s VSS backup window.
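A toy timing model shows why overlapping one job’s re-sync with the previous job’s integrity check shortens the window. It assumes only two phases per job (re-sync, then check), that re-syncs run one after another, and that checks also run one at a time; these simplifications and the function names are illustrative assumptions. The model is not expected to reproduce the 80-minute saving reported above, which covers the full production workflow.

```python
from typing import List, Tuple

Job = Tuple[float, float]  # (resync_minutes, check_minutes)


def serial_minutes(jobs: List[Job]) -> float:
    """Each job runs to completion before the next one starts."""
    return sum(resync + check for resync, check in jobs)


def pipelined_minutes(jobs: List[Job]) -> float:
    """Next job's re-sync starts as soon as the previous re-sync is done."""
    resync_done = 0.0  # time at which the latest re-sync finished
    check_done = 0.0   # time at which the latest integrity check finished
    for resync, check in jobs:
        resync_done += resync
        # A check needs its own re-sync done and the previous check finished.
        check_done = max(resync_done, check_done) + check
    return check_done


if __name__ == "__main__":
    # Four jobs per Exchange Virtual Server, upper-bound durations from the text.
    jobs = [(40.0, 20.0)] * 4
    print("serial:   ", serial_minutes(jobs))
    print("pipelined:", pipelined_minutes(jobs))
    print("saved:    ", serial_minutes(jobs) - pipelined_minutes(jobs))
```

Under these assumptions every integrity check except the last is hidden behind the following re-sync, so the saving grows with the number of jobs.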


Figure 5: Steps in a Replication Manager/SE Restore Operation

Best Practices and Lessons Learned

When Microsoft IT deployed EMC’s VSS solutions for tuning and testing, they ran backups for seven days with a 100 percent success rate. Because VSS has a 10-second restriction (a writer can freeze or hold writes for no more than 10 seconds), the database and log clone devices must be “fractured” (broken) from their sources within that 10-second window. Some vendors’ configurations use separate database LUNs for the five data files plus one log LUN, which can take too long to fracture. For speed, Microsoft and EMC use only one database LUN and one log LUN.

VSS has strict time requirements, and the mount process (the VSS import) also involves interaction with Windows plug-and-play device handling. As a result, many observed mount failures were caused by an overloaded mount host. Microsoft and EMC’s experience shows that the number of LUNs mounted to a mount host should generally not exceed 32. Assuming each Exchange Virtual Server has four storage groups and each storage group has two LUNs (one for database and one for logs), a mount host can typically handle up to four Exchange Virtual Servers.
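The mount-host guideline reduces to a simple division, sketched below. The 32-LUN ceiling and the per-server layout come from the paragraph above; the function name and parameters are hypothetical.

```python
def max_evs_per_mount_host(
    lun_limit: int = 32,             # guideline ceiling from the text
    storage_groups_per_evs: int = 4,
    luns_per_storage_group: int = 2, # one database LUN plus one log LUN
) -> int:
    """How many Exchange Virtual Servers one mount host can serve."""
    luns_per_evs = storage_groups_per_evs * luns_per_storage_group
    return lun_limit // luns_per_evs


if __name__ == "__main__":
    print("EVSs per mount host:", max_evs_per_mount_host())
```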

It is recommended to use dedicated and separate LUNs for each storage group’s database and log files. Since EMC CLARiiON storage arrays use the first four disks for the operating environment and internal management, it is not advisable to place Exchange database or log LUNs on these four disks.

Server Configuration Best Practices

The Microsoft/EMC team developed several best practices for server configuration as they worked on the VSS solutions. Combined with existing best practices, they yielded the following recommendations:

• Place information stores on a separate LUN from their transaction logs.

• Relocate the working directory or checkpoint file to the same LUN as the transaction logs.

• Disable circular logging.

• Understand “peak I/O per mailbox” requirements (e.g., for Microsoft IT, it’s 1.2 I/Os per second per user).

• Select a storage vendor with a VSS solution that completely adheres to the Exchange VSS requirements.

• Separate random and sequential I/O into different RAID groups.

• For good I/O performance under load:

– Average read latency should not exceed 20 ms (5 ms was Microsoft IT’s requirement).

– Average write latency should not exceed 20 ms (15 ms was Microsoft IT’s requirement).
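The “peak I/O per mailbox” figure and the latency ceilings in this list can be turned into a quick sizing check. The 1.2 I/Os per second per user, the 20 ms general limits, and Microsoft IT’s 5 ms and 15 ms targets are taken from the list; the function names and the sample measurements are hypothetical.

```python
def peak_iops(users: int, iops_per_user: float = 1.2) -> float:
    """Peak host I/O rate the storage must sustain for a given user count."""
    return users * iops_per_user


def latency_ok(
    avg_read_ms: float,
    avg_write_ms: float,
    read_limit_ms: float = 20.0,   # general guideline from the list
    write_limit_ms: float = 20.0,  # general guideline from the list
) -> bool:
    """True when both measured average latencies are within their limits."""
    return avg_read_ms <= read_limit_ms and avg_write_ms <= write_limit_ms


if __name__ == "__main__":
    # 4,000 users is the per-server figure given earlier in the paper.
    print("peak IOPS for 4,000 users:", peak_iops(4000))
    # Hypothetical measurements checked against Microsoft IT's stricter targets.
    print(latency_ok(4.0, 12.0, read_limit_ms=5.0, write_limit_ms=15.0))
```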

Storage Design Best Practices

Microsoft and EMC optimized the storage configuration as described in the following section.

CLARiiON CX700 Design Approach

Microsoft and EMC started with testing of the storage system, using LoadSim and Jetstress to validate the SAN design. They spent a lot of time testing with simulations of peak I/O activity with certain disk read/write times. Exchange is an I/O-intensive server application, so Exchange design has various I/O requirements. To achieve very fast reads and writes, the team balanced I/O across the array—stripe and concatenate—using CLARiiON’s new metaLUN technology. The metaLUN technology enabled the Microsoft/EMC team to bind one LUN on top of 24 disks. Note that if you assign one LUN to one disk, you can get into a bottleneck situation.

Sector alignment details were very important as well. Sector alignment ensures that the disk geometry matches the OS/application I/O access granularity. Frequently, when a disk is formatted, the partition starts at an odd offset. The evenly sized I/Os coming from the host then straddle the boundaries of the storage system’s on-disk layout, causing two I/Os where there should be only one. This situation can degrade performance. EMC first identified the problem and developed a sector-alignment solution for the Symmetrix storage system in 1998. Since then it has become an EMC standard best practice, one that most other vendors have adopted as well.
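The effect of a misaligned partition can be illustrated by counting how many back-end stripe elements a single host I/O touches. The 63-sector partition start, the 512-byte sector, and the 64 KB stripe element used below are illustrative assumptions that do not come from this paper, and the function name is hypothetical.

```python
SECTOR_BYTES = 512            # assumed sector size
ELEMENT_BYTES = 64 * 1024     # assumed stripe element size on the array


def elements_touched(partition_start: int, io_offset: int, io_size: int,
                     element_size: int = ELEMENT_BYTES) -> int:
    """Number of stripe elements one host I/O spans.

    partition_start: byte offset of the partition on the LUN
    io_offset:       byte offset of the I/O within the partition
    io_size:         length of the I/O in bytes (must be positive)
    """
    first_byte = partition_start + io_offset
    last_byte = first_byte + io_size - 1
    return last_byte // element_size - first_byte // element_size + 1


if __name__ == "__main__":
    misaligned_start = 63 * SECTOR_BYTES   # assumed legacy default offset
    aligned_start = 128 * SECTOR_BYTES     # 64 KB boundary
    io = 64 * 1024
    print("misaligned:", elements_touched(misaligned_start, 0, io))
    print("aligned:   ", elements_touched(aligned_start, 0, io))
```

With the misaligned start, a host I/O that should map to one element straddles two, which is the doubling of back-end I/O the paragraph above describes.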

Microsoft originally used a 4 KB cluster size, but would likely change the cluster size next time. Forcing the cluster size to 64 KB when formatting the NTFS partition may work around some dramatic performance degradation introduced in Windows Server 2003 SP1 when using 4 KB cluster sizes. The Microsoft product group supports the 64 KB recommendation, though it is not officially a best practice.


Figure 6: Logical Mapping of Production and Backup Hosts to CLARiiON LUNs and SnapView Clones

Optimizations

In the original EMC VSS solution design, the Microsoft/EMC team used clones for the VSS replica, taking a snapshot of that clone to present to the host. This protected the clone data from being changed or corrupted by the host that was accessing it. However, the engineers at Microsoft IT found this setup to have performance issues, and recoverability in case of a problem was complicated.

The team then re-worked the way the solution manages clones, deciding to have two clones associated with each production LUN. This time, they alternated clones on each backup, always having one “previously good backup” on a clone. This configuration automates recovery while maintaining the protection required during the mounting and reading process. This solution consumes a bit more disk space, but the team found that the value of the solution outweighs that drawback.
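The alternating-clone scheme can be sketched as a small rotation: each backup targets the clone that does not hold the last good backup, and the “last good” marker moves only after a backup succeeds. The class and method names are hypothetical, and the sketch ignores everything except clone selection.

```python
from typing import Optional, Tuple


class ClonePair:
    """Tracks which of two clones holds the most recent good backup."""

    def __init__(self, names: Tuple[str, str] = ("clone_a", "clone_b")) -> None:
        self.names = names
        self.last_good: Optional[str] = None

    def next_target(self) -> str:
        """Clone to overwrite next: never the one holding the last good backup."""
        if self.last_good is None or self.last_good == self.names[1]:
            return self.names[0]
        return self.names[1]

    def record(self, target: str, succeeded: bool) -> None:
        """Move the last-good marker only when the backup succeeded."""
        if succeeded:
            self.last_good = target


if __name__ == "__main__":
    pair = ClonePair()
    for outcome in (True, True, False, True):
        target = pair.next_target()
        pair.record(target, outcome)
        print(target, "ok" if outcome else "failed", "last good:", pair.last_good)
```

A failed backup leaves the marker where it was, so the next attempt reuses the same target and the previously good clone stays untouched.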

Lessons Learned

During the process of refining their VSS solution with Microsoft’s experts, EMC helped discover the cluster “isalive” termination bug in VSS. Microsoft worked to resolve this problem in a hotfix and subsequent service pack, and EMC helped test and verify the success of the resolution.

From the development and deployment of the EMC VSS solutions, Microsoft learned the importance of end-to-end functionality. Microsoft and EMC pushed the backup window possibilities, and Replication Manager/SE now exceeds Microsoft’s backup SLAs. Microsoft was able to optimize its storage configuration, and gained the value of high availability for service and data.


Conclusion

Microsoft initially chose EMC’s VSS partnership because EMC was a market leader with a sophisticated product and plenty of support. And now, EMC provides the first VSS integration solution running within the Microsoft corporate internal IT environment.

Not everyone needs VSS, but it is necessary to meet stringent SLAs such as Microsoft’s, and for the ability to back up and recover data as well as service quickly. Without VSS, Microsoft could restore service within its SLA, but not data for clients. With VSS, it can restore 200–300 GB of data in two minutes—restoring that data without VSS would take hours.

Now, Microsoft has three EMC arrays in its Exchange configuration. “The EMC solution has been very effective for us,” says Ryan McDonald of Microsoft.

What makes EMC Replication Manager/SE different from the competition? In large part, the difference is that Replication Manager/SE enforces best practices for backup and integrity checks and performance. And Replication Manager/SE delivers better backup windows because it has the ability to do parallel operations.

Replication Manager/SE grew out of the first VSS work jointly with Microsoft and was the first end-to-end hardware VSS solution that the Microsoft IT Exchange team was able to work with. Through this collaboration, the right and required steps for a VSS backup and restore were created. These steps became the documented process for VSS solutions to gain Microsoft approval. Additionally, EMC engineers have given significant attention to the way Replication Manager/SE interacts with the CLARiiON to achieve performance while meeting the above requirements.

Further Reading

Replication Manager/SE Quick Start Guide

EMC Solution Suite for Microsoft Exchange (www.EMC.com/solutions/microsoft)

Replication Manager/SE (www.EMC.com/products/storage_management/replication_mgr_se)

EMC Corporation
Hopkinton, Massachusetts 01748-9103
1-508-435-1000
In North America 1-866-464-7381

EMC believes the information in this publication is accurate as of its publication date. The information is subject to change without notice.

THE INFORMATION IN THIS PUBLICATION IS PROVIDED “AS IS.” EMC CORPORATION MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO THE INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Use, copying, and distribution of any EMC software described in this publication requires an applicable software license.

EMC², EMC, EMC ControlCenter, AlphaStor, ApplicationXtender, Captiva, Catalog Solution, Celerra, CentraStar, CLARalert, CLARiiON, ClientPak, Connectrix, Co-StandbyServer, Dantz, Direct Matrix Architecture, DiskXtender, DiskXtender 2000, Documentum, EmailXaminer, EmailXtender, EmailXtract, eRoom, FLARE, HighRoad, InputAccel, Navisphere, OpenScale, PowerPath, Rainfinity, RepliStor, ResourcePak, Retrospect, Smarts, SnapShotServer, SnapView/IP, SRDF, Symmetrix, TimeFinder, VisualSAN, VSAM-Assist, WebXtender, where information lives, Xtender, and Xtender Solutions are registered trademarks and EMC Developers Program, EMC OnCourse, EMC Proven, EMC Snap, EMC Storage Administrator, Acartus, Access Logix, ArchiveXtender, Authentic Problems, Automated Resource Manager, AutoStart, AutoSwap, AVALONidm, C-Clip, Celerra Replicator, Centera, CLARevent, Codebook Correlation Technology, Common Information Model, CopyCross, CopyPoint, DatabaseXtender, Direct Matrix, EDM, E-Lab, Enginuity, FarPoint, Global File Virtualization, Graphic Visualization, InfoMover, Invista, MirrorView, NetWin, NetWorker, OnAlert, Powerlink, PowerSnap, RecoverPoint, RepliCare, SafeLine, SAN Advisor, SAN Copy, SAN Manager, SDMS, SnapImage, SnapSure, SnapView, StorageScope, SupportMate, SymmAPI, SymmEnabler, Symmetrix DMX, UltraPoint, UltraScale, Viewlets, and VisualSRM are trademarks of EMC Corporation. All other trademarks used herein are the property of their respective owners.
