White Paper
Abstract
Users are faced with many options and tradeoffs when choosing a backup strategy for Microsoft SQL Server databases. This white paper maps out those choices and examines how EMC® Data Domain® deduplication storage systems preserves data integrity, meets stringent RTO/RPO objectives, and integrates easily into a multitude of active SQL or third-party backup environments.
February 2012
Backup and Recovery for Microsoft SQL Server
Using EMC Data Domain Deduplication Storage Systems
Copyright © 2012 EMC Corporation. All Rights Reserved. EMC believes the information in this publication is accurate as of its publication date. The information is subject to change without notice.
The information in this publication is provided “as is.” EMC Corporation makes no representations or warranties of any kind with respect to the information in this publication, and
specifically disclaims implied warranties of merchantability or fitness for a particular purpose.
Use, copying, and distribution of any EMC software described in this publication requires an applicable software license.
For the most up-to-date listing of EMC product names, see EMC Corporation Trademarks on EMC.com.
Microsoft and Microsoft SQL Server, Microsoft Exchange, Microsoft SharePoint, and Microsoft Hyper-V are registered trademarks or trademarks of Microsoft, Inc. in the United States and/or other jurisdictions. All other trademarks used herein are the property of their respective owners.
Table of Contents
Executive summary... 5
Audience ... 6
Introduction ... 6
Data Domain Product Background ... 8
Advantages of Data Domain in a SQL Server Environment ... 8
EMC Data Domain Boost ... 9
SQL Server Basics ... 10
Terminology ... 11
Types of Backups ... 11
Recovery Models ... 12
Recovery Techniques ... 12
Data Domain Integration Best Practices... 14
Compression ... 15
Multiplexing ... 15
Encryption ... 15
Backup Application Based Data Deduplication ... 16
Blocksize ... 16
Stripes ... 16
Backup Command ... 18
Data Transfer Rates ... 18
Integration ... 19
Solution Planning ... 20
Additional Considerations ... 20
Backup Types ... 21
IP Network Considerations ... 21
Data Domain and Third-Party Backup Applications ... 21
Conclusion ... 22
Appendix A: Index Fragmentation... 23
Addressing the Challenge ... 23
Appendix B: Additional resources ... 24
Microsoft Resource Links ... 24
List of Figures
Figure 1: Native MS-SQL Database Backup Tool ... 6
Figure 2: NetWorker – MS SQL Client Properties – VSS Snapshot Configuration ... 7
Figure 3: Dual MS SQL Database Backups – NetWorker and Native SQL Server Back Up ... 8
Figure 4: EMC Data Domain Boost ... 9
Figure 5: Microsoft SQL Server Management Studio Databases... 10
Figure 6: Selection Recovery Model ... 12
Figure 7: Restore Database Dialog Box ... 13
Figure 8: Restore Database Options ... 13
Figure 9: Restore the Initial Full Backup then the First Transaction Log Backup ... 14
Figure 10: NetWorker MS SQL Client Restore GUI Example ... 14
Figure 11: Native SQL Backup - Disable Compression ... 15
Figure 12: Multi-striped Database Backup – Eight Stripes ... 18
Figure 13: Database Backup to a Null Device ... 18
Figure 14: Multiple Null Disk Devices ... 19
Figure 15: Nominal Database Backup Performance ... 19
Figure 16: NetWorker Management Console ... 22
Executive summary
Many database administrators prefer native Microsoft SQL Server backups directly to disk compared to using third-party backup applications. When utilizing native SQL Server backup, there is no reliance on the backup administrative team to perform backups or play a role in database recovery. Additionally, there is no longer a need for the database administrator to become proficient in deploying, configuring, administering, or maintaining third-party backup applications.
Historically, native SQL backups have had some drawbacks for a couple of reasons: • Native SQL backup facilities do not provide automated media management
capabilities and therefore must write to disk devices. While backups performed to disk media eliminated the challenge of manually managing tape cartridges, this method also introduced the need for a considerable amount additional disk. Conventional wisdom has traditionally been that the cost of disk versus removable tape media was significantly higher.
• Backup to disk did not meet the requirement of retaining an offsite copy of database backups as part of a disaster recovery strategy. Native backup to disk fell short of providing a viable solution for this requirement.
Deployed as database backup media, EMC® Data Domain® deduplication storage systems address the legacy shortcomings of performing native database backups to disk for the following reasons:
• Data Domain deduplication storage systems optimize storage capacity, making retention and replication of backup data exceptionally cost and network-efficient by providing 10-30x data reduction
• Data Domain systems are simple to integrate utilizing traditional backup software, but also offer an alternative with high-speed, cost-effective backup directly to a CIFS network share, utilizing native SQL Server backup. Users have the choice to eliminate the need for third-party SQL Server backup application agents and their associated operational costs and maintenance fees.
• Data Domain Replicator provides up to 99% reduction in bandwidth required, which enables users to send data offsite for faster “time-to-DR”
• EMC NetWorker integration with EMC Data Domain Boost (DD Boost) significantly increases performance by distributing parts of the deduplication process to NetWorker storage nodes or applications hosts, and serves as a solid foundation for additional integration between NetWorker and Data Domain systems
• Data Domain systems benefit from the EMC Data Domain Data Invulnerability Architecture – continuous recovery verification, fault detection and self healing, and other resiliency features transparent to the backup application.
This white paper provides information about the use of Data Domain deduplication storage as backup media for Microsoft SQL Server backups.
Audience
This white paper is intended as a guide for data protection architects, SQL Server database administrative staff, backup administrators and EMC partners seeking information about integrating Data Domain deduplication storage systems as a key component in a comprehensive backup and recovery strategy.
Introduction
Microsoft SQL Server backup methodology falls into one of two generic categories. The first consists of native SQL Server database backups. This backup technique creates SQL database backups using tools and utilities native to Microsoft SQL Server and does not rely on third-party backup application software (see figure 1). The native database backup tool performs a full database backup to disk through a CIFS network share. The tool is easy to use and
provides a feature set that
addresses business requirements. Benefits include the use of backup and recovery interfaces familiar to
the database administrative staff. This ability is included with Microsoft SQL Server, and there are no additional third-party software license fees.
The second backup methodology uses backup application software that integrates with Microsoft SQL Server to perform SQL database backups based on the Virtual Device Interface (VDI). This solution is typically packaged as a database agent
specifically for Microsoft SQL Server and a particular backup application. When VDI is used, the backup application allows setting customized backup and recovery
parameters similar to those that can be employed when using native Microsoft SQL tools and utilities.
EMC NetWorker backup software has the capability to utilize available snapshot technologies designed to provide application consistency for the backup and recovery processes. The EMC NetWorker Module for Microsoft Applications (NMM) delivers unified, online backup and recovery utilizing Microsoft Virtual Shadow Copy Services (VSS) for Microsoft applications including SQL Server, Exchange, SharePoint, and Hyper-V.
The NetWorker graphical user interface (Figure 2) is an example of a backup application that utilizes Microsoft VSS protection for SQL.
Figure 2: NetWorker – MS SQL Client Properties – VSS Snapshot Configuration When the snapshot type is based on Microsoft VSS, the backup application is the VSS requestor, the SQL Server is the VSS writer, and backup is coordinated with a VSS provider. Advanced backup and recovery features such as disk staging and instant recovery may be available with these implementations depending on the backup application and agent being used.
Additional Concepts
Sometimes customers utilizing the native Microsoft SQL Server database backup methodology, augment their solution with third-party backup client agents that effectively protect the native backup data as a flat file. This two-phased methodology is effectively “backing up a backup.” Among the perceived benefits of the augmented solution is that it allows segregation of the SQL database administrative staff from the data protection staff while providing means to retain database backups in conformance with sound business practices and standardized corporate retention policies.
There is another variant of the same methodology of an augmented backup solution that utilizes two backup solutions in combination to satisfy business objectives. Native SQL database backups are performed to a Data Domain system and
subsequently backup is performed by a third-party backup application and written to the same Data Domain system (Figure 3).
Figure 3: Dual MS SQL Database Backups – NetWorker and Native SQL Server Back Up
Data Domain Product Background
Data Domain deduplication storage systems minimize backup and recovery times, storage and network bandwidth, and risk of data loss. EMC offers a range of Data Domain systems to meet the backup and archive requirements for companies of all sizes as they seek to reduce costs and simplify data management.
Data Domain systems also offer replication that is extremely easy to deploy. The primary advantage of Data Domain system replication is that the data is deduplicated and compressed prior to being sent over the network.
Advantages of Data Domain in a SQL Server Environment
Data Domain systems can be directly integrated into Microsoft SQL Server
environments as disk backup media. In addition, Data Domain systems support all leading enterprise backup and archive applications for seamless integration into existing IT infrastructures.
The use of different backup methodologies with Microsoft SQL Server and Data Domain systems typically has a negligible effect on overall data deduplication ratios.
This enables users to perform native database backups in conjunction with database backups controlled by a third-party backup application without affecting
deduplication efficiency. This includes third-party backup applications that use a SQL agent, with or without VSS snapshots. Additionally, the use of different numbers of stripes or different blocksize values also has a negligible impact on deduplication ratios.
Data Domain network-efficient replication can be used to create offsite copies of SQL backups faster and more economically than legacy tape-based strategies. Data Domain replication makes advanced disaster recovery preparedness for SQL Server a reality.
EMC Data Domain Boost
EMC Data Domain Boost (DD Boost)distributes parts of the deduplication process from the Data Domain system to the backup server or application client. In addition to storage node support, NetWorker 7.6 SP2 or later supports DD Boost-based backup from application hosts for Microsoft applications and databases. This is driving new efficiency for users with NetWorker and Data Domain. By sending only unique data from the NetWorker server or application client to the Data Domain system, less LAN bandwidth is required, backups are 50 percent faster and the whole aggregate system more manageable.
NetWorker provides operational capabilities for configuring, monitoring, and reporting of backup and restores for Data Domain devices. This functionality is provided through the NetWorker Management Console (NMC) portal. The NMC portal is accessible from any supported remote Internet browser. The NMC Device
Configuration Wizard simplifies the configuration of storage devices, backup clients, storage (target) pools, volume labeling, and save set cloning.
DD Boost dramatically increases the aggregate throughput, up to 50% faster than NFS, and reduces the amount of data transferred over the network by 80 to 99 percent. These efficiencies can help eliminate future costs by leveraging existing backup servers and Ethernet networks.
Figure 4: EMC Data Domain Boost
• Increases backup speed up to 50% faster
• Reduces network traffic
• Clone-controlled replication
- Schedules replication - Catalog awareness
• Ease of use
- Wizard automated configuration - Monitoring and reporting
With DD Boost, backup applications can control replication between multiple Data Domain systems and provide backup administrators with a single point of
management for tracking all backups and duplicate copies. This paradigm allows backup administrators to efficiently create DR copies of their backups over the WAN using DD Replicator software and keep catalog consistency for easy disaster recovery. This also provides the flexibility for administrators to manage different retention periods for each copy of data.
With NetWorker, the Data Domain replication process is managed by standard NetWorker cloning, ensuring that NetWorker can recognize and manage a replicated (remote) copy of data and assign unique retention policies to it. The administrator has the ability to schedule the cloning process to run at a time that is most
appropriate for the business.
SQL Server Basics
A Microsoft SQL server instance includes system and user databases. As depicted in Figure 5, system databases are created at installation and include:
• The “master” database, which records all system-level information for a Microsoft SQL server. It contains records for all login accounts and all system configuration settings. The master database records the existence and location of all other
databases.
• The “model” database, which is used as a template that contains the default settings for all databases created within the
Microsoft SQL Server instance
• The “msdb” database, which is used for scheduling, alerts, and jobs
• The “tempdb” database, which serves as a global resource that contains all temporary tables and temporary stored procedures. It is
re-created every time the Microsoft SQL Server instance is started.
Data protection strategies for the system databases are dependent on the database being protected. For instance, transaction log backups are not supported for the master database.
The master database cannot be recovered if a functional version of it does not already exist. Recovery procedures for the master database may include re-installing
Microsoft SQL Server such that a backup of the pre-disaster master database can then be restored.
Figure 5: Microsoft SQL Server Management Studio Databases
The model and msdb databases can contain customized data such as user-specific templates, scheduling information, as well as backup and restore history information. Without a data protection strategy, these items will need to be manually
reconstructed in the event of a disaster.
The tempdb database is empty when the SQL instance is shut down, and does not require protection as it is re-created at startup.
Terminology
Entire databases, specific database files, file groups, and transaction log backups are among the supported backup types with Microsoft SQL Server. This section defines the terminology associated with a given backup type.
Types of Backups
Database backups
• Database Backup – This is a full backup of an entire database and represents the state of the database at the point when the backup is completed
• Differential Database Backup – This is a backup of all the files within a database, and contains only the extents modified since the most recent full backup of each file. Restoring a database protected with full and differential backups to the most recent point in time includes recovering the most recent full and differential backup.
Partial backups
• Partial Backup – Partial backups provide flexibility for backing up databases that contain some number of read-only file groups. This is a partial backup of all data in the primary filegroup, each read/write filegroup, and any optionally specified read-only files or filegroups.
• Differential Partial Backup – This backup contains only the extents modified since the prior partial backup of the same set of filegroups
File backups
• File Backup – This consists of a full backup of all data in one or more files or filegroups
• Differential File Backup – This is a backup of one or more files containing data extents changed since the prior full backup of each file
Transaction log backups
• Regular transaction log backups are required when using the full or bulk-logged recovery models. This backup contains all log records that have not been backed up previously.
Figure 6: Selection Recovery Model Copy-Only backups
• Database backups usually change the database in some way, such as truncating a transaction log in the case of a full database backup. Copy-Only backups can be used in cases where a backup of a database is required without changing the database.
Recovery Models
Microsoft SQL Server includes three recovery models: simple, bulk logged, and full (see Figure 6). The desired recovery model can be deployed based on requirements. Functionally, each recovery model differs with regard to how backup and recovery strategies are executed.
• The full recovery model includes log backups. This model typically has no
exposure to data loss. Point-in-time recovery is possible, up to including the last committed transaction.
• The bulk logged recovery model requires log backups. This model permits high-performance bulk copy operations. Recovery to the end of any backup is possible; point-in-time recovery is not supported.
• The simple recovery model
consists of performing full backups only. Logs are not backed up. In the event database recovery is required, the most recent full backup can be restored. Any changes that occurred subsequent to the last full backup must be redone. From a transactional perspective, the database can only be recovered to the point of the prior full backup.
Recovery Techniques
The technique used to restore a database will vary based on the recovery model being used as well as the backup types being performed. Figures 7-10 provide a brief look at restoring a database that was protected using the full recovery model with full and transaction log backups. A single full backup was performed, followed by five
transaction log backups. Figure 7 depicts the restore database dialog box and general database restore attributes. By default the full backup and subsequent transaction log backups are all selected. Clicking the “OK” button would initiate recovery to the most recent possible point in time. Alternately, recovery to a specific point in time is also possible.
Figure 7: Restore Database Dialog Box
Figure 8: Restore Database Options
Figure 8 depicts restore database options and available database recovery options. By default an existing database will not be overwritten. Also note that by default the recovery state is “RESTORE WITH RECOVERY,” which leaves the recovered database in an online and unstable state after the restore process completes.
Figure 9: Restore the Initial Full Backup then the First Transaction Log Backup
EMC NetWorker and third party backup applications will each have a unique recovery interface for databases. Many automate and coordinate the recovery of full and transaction log backups similar to the way native Microsoft SQL Server tools and utilities do.
Figure 10: NetWorker MS SQL Client Restore GUI Example
Data Domain Integration Best Practices
Table 1 presents a summary of the suggested best practices settings for Microsoft SQL Server backup to Data Domain deduplication storage systems.
Table 1: Recommended Backup Software Settings PARAMETERS AFFECTING DEDUPLICATION
PERFORMANCE SETTING
SQL Server 2008 native compression NO_COMPRESSION Third-party backup application SQL Server local
compression Disabled
Third-party backup application multiplexing Disabled Third-party backup application encryption Disabled Third-party backup application deduplication Disabled
Figure 9 is an example of a recovery transaction that restores the initial full backup, followed by the first transaction log backup. The remaining transaction logs were not included in this query for brevity.
Figure 10 is an example of the NetWorker MS SQL client restore GUI
Compression
Specific to SQL Server 2008 Enterprise and later versions, backup compression can be enabled or disabled. The default product installation does not compress backups. A server-level compression setting can be applied that alters default behavior. The use of the COMPRESSION keyword within a backup SQL transaction explicitly enables backup compression. The use of the NO_COMPRESSION keyword within a backup SQL transaction explicitly disables backup compression.
Figure 11 illustrates SQL Server 2008 properties for native compression; the “Compress backup”
service level property is used for backup jobs that do not explicitly enable or disable compression.
Backup application software compression should be disabled because the Data Domain system can fingerprint unique data segments more efficiently for
deduplication if the data segments sampled are not already compressed. Backup windows can be extended and CPU performance can be impacted on the backup client if the backup software is tasked with performing compression. Local compression is provided for on the Data Domain storage system.
Multiplexing
When the Data Domain system is integrated as a backup device with a backup application that supports multiplexed backups, EMC recommends disabling multiplexed backups. Multiplexing limits the ability of the Data Domain system to deduplicate incoming data.
Historically used as a speed matching solution where multiple slower data streams were multiplexed into a single stream to take advantage of a somewhat faster tape drives, backups to disk drives obtain no advantage from multiplexing. Whether
deployed as a CIFS share, NFS mount, VTL, or OpenStorage / DD Boost disk pool, Data Domain systems accommodate writing multiple backup streams in parallel without multiplexing.
Encryption
Encrypted files are by definition, unique. The encryption software that is part of the backup application will create unique files, on-the-fly for each backup, defeating the deduplication capabilities of the deduplication storage system. Data Domain
Encryption software provides encryption of data at rest and is persistent in flight
during replication with Data Domain Replicator software.
Figure 11: Native SQL Backup - Disable Compression
Backup Application Based Data Deduplication
Disabling deduplication from the backup application software will provide better performance and allow the Data Domain system to offload this work. Data Domain systems are optimized to provide the very best ingest performance and deduplication ratios.
Table 2: Recommended Backup Software Parameter Settings PARAMETERS AFFECTING BACKUP AND
RECOVERY PERFORMANCE SETTING
BLOCKSIZE Default 512 byte or higher based on
performance improvements
Stripes Consider the use of multiple stripes to
improve backup and restore data transfer rates
Blocksize
The, “BLOCKSIZE” keyword can be used to alter physical block size used when writing to backup media. By default the backup process will automatically select a block size appropriate for the backup device. Supported sizes are 512, 1K, 2K, 4K, 8K, 16K, 32K and 64K bytes. The default value used for disk backup is 512 bytes.
The default 512-byte size yields excellent performance with Data Domain systems. Third-party backup applications may substitute their own default value. The fact that this parameter can be adjusted is included as reference. The use of larger sizes may improve or degrade performance. Users are encouraged to investigate further to determine what value may provide optimal results in their environment.
Stripes
While not a keyword within the context of Microsoft SQL Server, the term stripes correlates to the number of simultaneous backup streams to be created for a given backup operation. In the case of disk backups with SQL Server, multi-streamed backups are performed by specifying a number of backup disk targets with the BACKUP command.
Table 3: Mount Options
MOUNT OPTIONS SETTING
When performing native database backups
UNC path
When using a third-party backup server Dependent on backup application and server OS type
When the Data Domain system is used as a disk backup media for native Microsoft SQL Server backups, configuration is performed utilizing a CIFS share.
As a general rule, the UNC path to the share should be used instead of a mapped drive because:
a) Scheduled backups may execute when no user is logged in to the server b) When Sqlservr.exe is executed as a service, it has no relation to a login
session
Table 4: Miscellaneous Options
MISCELLANEOUS OPTIONS CONFIGURATION
Comingling native and third-party backup application database backups to the same Data Domain system
Yes
Replication Yes
Comingling native and third-party backups to a Data Domain system should have only a negligible impact on deduplication ratios because of the variable segment
processing and Stream Informed Segment Layout (SISL) architected into Data Domain systems.
Since Data Domain Replicator software only sends unique, compressed data segments to the remote system it is ideal for network-efficient disaster recovery. Table 5: Infrastructure Configuration
INFRASTRUCTURE CONFIGURATION
Server Disk Subsystem Database and log files should be placed on disk storage with performance
attributes facilitating required
transaction and backup performance metrics
IP Network Dedicated backup network that meets or
exceeds bandwidth requirements for the desired data transfer rate
EMC Data Domain System Sized to meet or exceed ingest rate and backup retention capacity requirements
Backup Command
The recommended use of SQL stripes is as a speed matching technology. Multiple backup streams from a given database can be simultaneously written to a target Data Domain system in an effort to achieve an aggregate data transfer rate that aligns with business requirements.
Figure 12: Multi-striped Database Backup – Eight Stripes
Data Transfer Rates
Multiple business objectives are considered when determining required backup and recovery data transfer rates. Decision criteria include backup window duration, log growth, and recovery time.
Figure 13: Database Backup to a Null Device
A reasonable place to start any backup performance investigation is to understand the theoretical maximum speed at which SQL Server can process a given database backup. Performing a database backup to a null disk device provides an estimate of that maximum achievable speed in a given environment. Figure 13 depicts a
database backup to a null device.
The results of the query indicate that the theoretical maximum rate at which the SQL Server backup function can extract data from this database using a single stripe is approximately 80 MB/sec. Regardless of the data transfer rate at which the backup media can accept data, backing up this database as it currently stands will be limited to 80 MB/sec when using a single stripe.
Figure 12 illustrates a multi-striped database backup that uses eight stripes in an effort to improve backup data transfer rate performance. Multiple stripes can be used to better match data transfer rate capabilities between source and destination media.
By definition, slow backups are those that fail to meet or exceed business objectives. Understanding factors that can affect performance is critical to removing them from the environment.
Figure 14: Multiple Null Disk Devices
Figure 15: Nominal Database Backup Performance
Integration
EMC NetWorker and third-party backup applications used to protect Microsoft SQL Server can also take advantage of Data Domain systems employed as backup media. Data Domain systems are easily configured as varied backup media types and
protocols including VTL, CIFS share, NFS mount, or Data Domain Boost (DD Boost) for backup applications such as EMC NetWorker.
Additionally, DD Boost enables managed replication capabilities known as, “clone controlled replication” with EMC NetWorker.
In this scenario, backup images are replicated from one Data Domain system to another under the direct control of NetWorker or other supported backup
applications. DD Boost monitoring, reporting, and cataloging of replicated backup images and savesets can be used to architect a comprehensive disaster recovery plan.
Figure 15 depicts nominal database backup performance improvement with a moderately tuned eight-stripe SQL database backup with an aggregate data transfer rate of approximately 172 MB/sec, indicating that the
network-attached backup devices are not limiting throughput.
Figure 14 depicts a database backup to multiple null disk devices.
Solution Planning
Capacity and performance planning play a critical role in both successful deployment and ongoing production usage of a Data Domain system. Detailed capacity analysis should be performed by a knowledgeable EMC Velocity partner or an EMC technical consultant. The analysis considers database sizes, growth rates, change rates, and retention periods as input criteria. Performance analysis considers data points such as the required aggregate data transfer rate for backups, connection topology
requirements to support the data transfer rate, and the Data Domain system required to meet or exceed the required data transfer rate.
Beyond capacity and performance planning are additional considerations for Data Domain system replication.
Additional Considerations
Replication ScopeReplicating all database backups is certainly possible. However, many users will want to implement replication at a more granular level. Production database backups are usually excellent replication candidates, whereas development and test database backups are less critical. An analysis of network bandwidth and destination disk space requirements should be performed by a knowledgeable EMC Velocity partner or an EMC technical consultant.
Replication Topology
Backups are typically replicated to serve as a second backup copy for recovery in the event of a disaster. When backups from a primary site are being replicated to a secondary site, planning is relatively straightforward. Users with multiple primary sites may decide to implement a bidirectional replication solution where database backups from either site are replicated to the alternate site. Proper planning should render an outline detailing which database backups are being replicated to each location.
Tape Consolidation
Some users replicate backup images to a central location for disaster recovery purposes while also using the solution as a vehicle that enables centralized tape creation. The third-party backup application used to create tape-based backup copies will dictate any additional considerations or restrictions that this solution involves. A knowledgeable EMC Velocity partner or an EMC technical consultant will be able to assist with this planning task.
Backup Types
The goal of backups is to satisfy recovery time and point objectives. Outlining a strategy of full, differential, and transaction log backups is beyond the scope of this paper. That stated, there are a few key points worth noting:
• Performing full backups frequently with Data Domain deduplication storage does not create a storage usage penalty, as redundant database segments do not consume additional disk space. While this may appear to enable the ability to perform full backups more frequently, the load full backups place on the SQL server and connection topology to the Data Domain system should be taken into consideration.
• When split-mirror or snapshot backups are performed and controlled by a third-party backup application, the Data Domain system is easily integrated as a backup storage device. The features provided by these backup techniques (low-impact backups, instant recovery, and so on) do not preclude the use of Data Domain technology.
IP Network Considerations
When Data Domain systems are deployed as a CIFS backup share, EMC recommends interconnecting SQL servers and Data Domain systems using a dedicated backup area network. When deployment is in conjunction with a backup application as a CIFS share, NFS mount, or OpenStorage / DD Boost disk pool, EMC similarly recommends interconnecting backup application media servers and Data Domain systems using a dedicated backup area network.
Whenever possible, the network used for backup and recovery communications should be segregated from other production networks. This best practice
recommendation seeks to assure that network bandwidth is available for backup and restore jobs to meet or exceed business objectives.
Network bandwidth requirements may dictate the need for a topology that supports data transfers in excess of 125 MB/s. All Data Domain systems support the use of multiple GbE network interfaces, and the use of 10 GbE network interfaces.
A knowledgeable Data Domain system engineer will be able to assist with planning the deployment based on user requirements and available resources.
Data Domain and Third-Party Backup Applications
When Data Domain systems are integrated with EMC NetWorker and third-party backup applications, it is important to note that Microsoft SQL Server backup parameters are handled the same as when compared to a native SQL Server backup implementation. The COMPRESSION, and BLOCKSIZE keywords, as well as any striping, are still valid parameters. Some of these settings may or may not be unavailable when using a third-party backup application.
Figure 16: NetWorker Management Console
Users of third-party backup applications seeking to exploit the full complement of available Microsoft SQL Server backup options should contact their software provider in the event additional information is required.
Conclusion
A Data Domain system makes an excellent target for Microsoft SQL Server backups because it integrates easily and seamlessly into existing SQL Server environments. Data Domain systems allow the SQL Server administrative team to retain a greater number of full backup images online, thereby optimizing recovery options while occupying minimal footprint in the data center, utilizing native backup tools that are familiar to SQL Server administrators.
The addition of a Data Domain system into the environment greatly reduces
dependence on legacy tape and provides faster “time-to-DR” with network-efficient replication.
When Data Domain Boost integration with EMC NetWorker is leveraged, performance can be greatly improved and the managed replication includes the remote backup image in the saveset database for easy recovery.
It is for all of these reasons that more people choose to build their backup solutions using EMC products and technology.
Figure 16 depicts the NetWorker
Management Console interface for
configuring MS SQL Backups.
Appendix A: Index Fragmentation
Index fragmentation affects I/O performance of queries whose data pages do not reside in the Microsoft SQL Server data cache. A variety of techniques are commonly used to reduce index fragmentation, including but not limited to “DBCC
INDEXDEFRAG”, “DBCC DBREINDEX”, and “CREATE INDEX WITH DROP EXISTING”. While these techniques are effective in reducing index fragmentation, they can also have a negative impact on deduplication. Database administrative teams that routinely defragment all indexes at some predetermined frequency may notice reduced data deduplication rates on their Data Domain systems. The end result is reduced storage efficiency.
Index defragmentation has the effect of reorganizing the pages within a database such that Data Domain deduplication sees the backup data stream as new, unique data. In addition to the inefficient use of backup device storage space, this can also impact the ability to replicate database backups using Data Domain replication. A greater quantity of unique data blocks equates to replicating a greater quantity of data over what may be a bandwidth limited WAN.
Database administrative teams may find themselves in a situation where index fragmentation impacts query performance, and frequent index defragmentation impacts backup storage device performance in terms of deduplication and replication rates.
Addressing the Challenge
EMC recommends addressing these challenges with a balanced approach. For instance, instead of defragmenting all indexes based on a schedule, consider defragmentation based on thresholds. Additionally EMC recommends the use of index keys that are less prone to fragmentation in the first place.
Is index fragmentation the only issue impacting transaction performance? I/O subsystem performance, memory usage, and CPU utilization can all have a negative impact on query performance. These issues should be diagnosed and resolved versus the use of frequent automatic index defragmentation to improve performance.
File fragmentation can also impact performance. Many small databases sharing the same logical disk volume combined with the use of the “autogrowth” property can cause logically sequential database files to allocate non-sequential physical storage on disk. Ideally, administrators should set the size of database files at deployment to accommodate potential future growth.
While it may be impossible to anticipate the size of a given database three years into the future, doing so helps to reduce the possibility that file fragmentation will impact query performance. If automatically growing database files is a requirement, consider growing in large chunks versus small chunks. It may be impractical to locate each database on a unique logical volume, but consider doing so for databases that are
expected to grow considerably over time. Finally, disk file fragmentation can be reduced by Windows file system defragmentation utilities such as the Windows “Disk Defragmenter.”
Do all indexes need to be defragmented or just a subset?
EMC recommends the use of index defragmentation tools based on thresholds and limits versus automatically defragmenting every index on every table whether it is required or not. The suggestion is to understand what indexes and their
corresponding fragmentation levels impact performance.
These indexes should be monitored for a specific fragmentation threshold, and action taken to defragment these indexes only when necessary. Selective index
defragmentation will have less impact on production and will assist in preserving the ability to efficiently deduplicate database backups.
Figure 17: DBCC “showcontig” Command Output
Structuring indexes and keys so as to minimize fragmentation may or may not be realistic in all cases, but it should be considered as it potentially reduces the need to defragment indexes frequently. Index and key inserts that occur at the end of the table and index are likely to reduce fragmentation. Deletes that occur in contiguous chunks also assist in reducing fragmentation.
Appendix B: Additional resources
Microsoft Resource Links
Backing Up and Restoring Databases in SQL Server - from SQL Server 2008 Books Online
Backing Up and Restoring Databases in SQL Server - from SQL Server 2005 Books Online
Optimizing Backup and Restore Performance in SQL Server - SQL Server 2005 Books Online
Microsoft SQL Server Community
Figure 17 depicts the DBCC showcontig command output. It includes extent scan fragmentation data indicating that index “C_CustomerI1” does not require defragmentation at this time.
EMC Data Domain links
EMC Backup and Recovery for Microsoft Applications — Deduplication Enabled by EMC CLARiiON and Data Domain white paper
EMC Data Domain Family products and deduplication technology
Technical Notes - Using EMC® NetWorker® Module for SQL Server® with Data Domain Boost® for Improved Backup and Recovery Performance
EMC Data Domain Boost® Software
EMC Data Domain SISL Scalability Architecture — A Detailed Review white paper
EMC Data Domain Replicator Software — A Detailed Review white paper
EMC Data Invulnerability Architecture: Ensuring Data Integrity and Storage System Recoverability white paper
EMC Networker ®Software
EMC NetWorker Online Community
IDC Study – Worldwide Purpose Built Backup Appliances: