Maximizing Availability of Microsoft SQL Server 2012 on Vblock
Systems Table of Contents
1
© 2013 VCE Company, LLC. All Rights Reserved.
MAXIMIZING AVAILABILITY OF
MICROSOFT SQL SERVER 2012
ON VBLOCK™ SYSTEMS
January 2013
© 2013 VCE Company, LLC. All Rights Reserved.
2
© 2013 VCE Company, LLC. All Rights Reserved.
Contents
Introduction ...4 Business case...4 Solution overview ...4 Key results ...5 Objectives ...5 Scope ...5 Audience...6 Feedback ...6 Technology overview ...7 Vblock Systems ...7 Compute components ...7 Network components ...7 Storage components...8 Virtualization components ...8 EMC PowerPath/VE ...8Microsoft Windows Server 2008 R2 with Service Pack 2 ...8
Microsoft SQL Server 2012 ...9 Architecture overview ...10 Physical architecture ...10 Logical architecture ...11 Hardware resources ...11 Software resources ...12 Design considerations ...13
Determining high-availability and disaster recovery needs ...13
Choosing a SQL Server 2012 AlwaysOn design ...14
Designing the overall solution architecture ...14
Sizing and configuring infrastructure components ...16
Compute configuration ...16
Network configuration ...16
Storage configuration...16
Virtualization configuration ...19
Solution validation ...20
3
© 2013 VCE Company, LLC. All Rights Reserved.
Workload generation ...21
Distance emulation ...22
Test methodology ...23
Test scenarios ...23
Creating and configuring the Availability Group...23
Test results ...25
Site failure ...25
Network failure...27
Storage array failure ...28
Server/node failure ...30
Conclusion ...31
Next steps ...31
4
© 2013 VCE Company, LLC. All Rights Reserved.
Introduction
In today’s dynamic business environment, continuous uptime and fast recovery from adverse
conditions are vital to meeting service-level agreements (SLAs) for high availability of business critical applications and application data. Planning for business disruptions and minimizing their impact is critical for enterprises, government agencies, and service providers to reduce revenue losses and compliance risks, including potential lawsuits. Designing and deploying a database like Microsoft SQL Server 2012, in conjunction with a highly available converged infrastructure helps organizations to avert these risks and control deployment and maintenance costs.
This solution architecture paper describes the VCE solution for supporting a geographically dispersed SQL Server 2012 environment protected by AlwaysOn technology with multi-subnet support at synchronous distances of 0, 100, and 300 kilometers.
Business case
Because of the demand for increasingly higher availability service levels, many organizations are upgrading or migrating to SQL Server 2012 to leverage its AlwaysOn features for high availability and disaster recovery. SQL Server 2012 AlwaysOn is an integrated, flexible, and cost-efficient solution that provides redundancy within and across data centers, fast application failover for maximum availability, and data protection of critical applications.
These migrations can replace older infrastructures, which are difficult to manage and expensive to support. Building an infrastructure by separately deploying and managing compute, storage, and virtualization resources for databases and other business systems has been complex,
time-consuming, and costly—until now.
Converged infrastructures that take advantage of virtualized databases like SQL Server are rapidly transforming the way organizations deploy and manage their IT assets and resources. With a converged infrastructure, IT has a single flexible, scalable platform to manage, which reduces system complexity and IT operational overhead. With fewer servers consuming less space, power, and cooling, IT organizations also see significant capital and operational expense savings. For this reason, deploying SQL Server 2012 and taking advantage of its AlwaysOn functionality on a leading
converged infrastructure helps organizations achieve business continuity objectives at a lower risk and cost.
Solution overview
The VCE solution is built on geographically dispersed SQL Server 2012 environments deployed on Vblock™ Systems protected by AlwaysOn technology in a primary and disaster recovery configuration across two data centers. Deploying SQL Server 2012 on Vblock Systems enhances the benefits of the AlwaysOn high-availability and disaster recovery features. Vblock Systems immediately deliver a pre-engineered, tested, and validated converged infrastructure that supports your high-availability requirements while reducing the total cost of ownership (TCO) and improving time to market.
5
© 2013 VCE Company, LLC. All Rights Reserved.
By combining the power of the Vblock converged infrastructure and a virtualized SQL Server 2012, your organization can:
Demonstrate business-critical levels of high performance and availability between data centers Ensure site resiliency and meet recovery objectives when faced with application, storage,
network, and node failures
Achieve business-continuity SLAs end-to-end at a lower risk and cost
Key results
Testing and validation of the VCE solution demonstrated the following:
Using SQL Server 2012 AlwaysOn Availability Groups, the Recovery Time Objective (RTO) of less than one hour, and the Recovery Point Objective (RPO) of zero data loss were met in all failure scenarios at distances of 0, 50,100, and 300 kilometers.
The Vblock System 300 can easily service SQL Server 2012 mission-critical online transaction processing (OLTP) production workloads while providing high availability through a combination of the Availability Groups feature and the storage capabilities of the Vblock converged
infrastructure.
Deploying SQL Server 2012 on a pair of Vblock Systems is simple and efficient, and provides improved data protection under typical transaction loads.
Objectives
The primary objective of this solution architecture paper is to demonstrate how to reduce the impact of hardware and software failures by using virtualized SQL Server 2012 for high availability and disaster recovery. This document includes best practice for facilitating SQL Server 2012 deployments on Vblock Systems to maximize continuous availability of the application and ensure business continuity during disasters.
Scope
The scope of this solution architecture is to:
Validate the Vblock System for critical, enterprise-level, SQL Server 2012 deployments in a disaster recovery configuration featuring AlwaysOn Availability Groups.
Demonstrate the high availability of SQL Server 2012 on Vblock Systems during disaster recovery.
Provide best practices for a SQL Server 2012 deployment on the Vblock System.
6
© 2013 VCE Company, LLC. All Rights Reserved.
Audience
The primary audience of this solution architecture paper is technical practitioners, such as data center managers, database architects, and database administrators (DBAs). In addition, IT management teams would also be interested in reviewing this document as part of a product evaluation.
Feedback
To suggest documentation changes and provide feedback on this paper, contact
7
© 2013 VCE Company, LLC. All Rights Reserved.
Technology overview
The following sections summarize the technologies used in the VCE solution: Vblock Systems
EMC PowerPath/VE
Microsoft Windows Server 2008 R2 with Service Pack 2 Microsoft SQL Server 2012
For more information about the overall test environment including the distance emulator, load generation tools, and test measurement tools, refer to the Test environment design section.
Vblock Systems
Growing business challenges demand an IT infrastructure that is swift, agile, and economical— enabling organizations to adapt to change quickly and cost-effectively. VCE designs and delivers Vblock Systems, which seamlessly integrate leading compute, network, and storage technologies. Through intelligent discovery, awareness, and automation, Vblock Systems provide the highest levels of virtualization and application performance.
The Vblock System 300 is an agile and efficient data center-class system, providing modular and scalable performance. It features a high-density, compact fabric switch; tightly-integrated, fabric-based blade servers; and best-in-class, unified storage. Organizations look to the platform for unified storage, to support high-bandwidth applications, and for ease of use. Available in several models, this solution uses the Vblock 300EX.
Each Vblock model has a base configuration, which is a minimum set of compute and storage components as well as fixed network resources. Within the base configuration, certain hardware aspects can be customized. Together, the components offer balanced CPU, I/O bandwidth, and storage capacity relative to the compute and storage arrays in the system.
For more information, go to http://www.vce.com/vblock.
Compute components
The compute components in Vblock Systems are built on the Cisco Unified Computing System (UCS) line of products. Components include one or more blade server chassis, compute blades, I/O
modules, and the fabric interconnects that connect the unified fabric to the rest of the environment.
Network components
8
© 2013 VCE Company, LLC. All Rights Reserved.
Storage components
The Vblock Systems used in this solution contain VNX5300 storage arrays. The VNX5300 is a high-end, enterprise storage array that supports multiple drive technologies, including Flash, serial-attached SCSI, SAS, and nearline SAS (NL-SAS) drives, and a full range of RAID types. Intel Xeon processors power the VNX series for intelligent storage that automatically and efficiently scales in performance, while ensuring data integrity, security, and high availability for mission-critical SQL Server environments.
Virtualization components
Virtualization components include VMware ESXi, VMware vCenter Server, and VMware vSphere. VMware vCenter Server provides a scalable and extensible platform that forms the foundation for virtualization management. It centrally manages VMware vSphere environments, allowing IT administrators control over the virtual environment.
For more information on VMware vCenter Server, go to http://www.vmware.com/products/vcenter-server/overview.html.
EMC PowerPath/VE
EMC PowerPath/VE performs multipathing for the storage arrays on the Vblock System.
PowerPath/VE works in combination with SAN storage to intelligently manage Fibre Channel (FC), iSCSI, and Fibre Channel over Ethernet (FCoE) I/O paths. Installed on the vSphere host,
PowerPath/VE scales to the maximum number of virtual machines on the host. The virtual machines do not have PowerPath/VE installed nor are they aware that PowerPath VE is managing them. PowerPath/VE dynamically load balances I/O requests and automatically detects, and recovers from, path failures. PowerPath/VE standardizes path management across heterogeneous physical and virtual environments and ensures application uptime.
Microsoft Windows Server 2008 R2 with Service Pack 2
For this solution, the cluster feature available with Windows Server 2008 R2 with Service Pack (SP) 2 was used. A failover cluster is a group of independent computers that work together to increase the availability of applications and services. The clustered servers, called nodes, are connected by physical cables and software. If one of the cluster nodes fails, another node begins to provide service (a process known as failover). Users experience a minimum of disruptions in service.
Microsoft cluster technologies guard against three specific types of failure:
Application and service failures, which affect application software and essential services. System and hardware failures, which affect hardware components such as CPUs, drives,
memory, network adapters, and power supplies.
9
© 2013 VCE Company, LLC. All Rights Reserved.
Microsoft SQL Server 2012
Microsoft SQL Server is a database management and analysis system for e-commerce,
line-of-business, and data warehousing solutions. The latest version, SQL Server 2012, adds new high availability and disaster recovery solutions through AlwaysOn Availability Groups.
The new SQL Server 2012 AlwaysOn feature provides flexible design choices for selecting an appropriate high-availability and disaster recovery solution for applications. SQL Server AlwaysOn was developed for applications that require high uptime, need protection against failures within a data center and adequate redundancy against failures across data centers.
SQL Server 2012 AlwaysOn provides availability at either the application database or instance level: Availability Groups is a new feature that helps protect application databases from planned and
unplanned downtime.
The Availability Groups feature provides an enterprise-level alternative to database mirroring. An availability group supports a failover environment for a set of read-write primary databases and one to four sets of corresponding secondary databases. Optionally, secondary databases can be made available for read-only access and/or some backup operations.
Failover Cluster Instance provides availability for the entire SQL Server instance by enabling failover of the instance rather than individual database(s).
The Failover Cluster Instance feature enhances existing SQL Server Failover Clustering with support for multisite clustering across subnets for improved site protection, a flexible failover policy for better control over instance failover, and improved diagnostics for faster
10
© 2013 VCE Company, LLC. All Rights Reserved.
Architecture overview
VCE solutions are designed to reflect and validate real-world deployments. This paper describes a disaster recovery solution for SQL Server 2012 on the Vblock System using AlwaysOn Availability Groups. The following sections describe the physical and logical architecture of the solution.
Physical architecture
The solution uses Windows Server in a Windows Server Failover Cluster (WSFC) on two Vblock 300EX Systems. One Vblock 300EX System serves as the primary data center site and the other Vblock System as the disaster recovery data center site The Vblock 300EX System is optimal for the storage, compute, hypervisor, and network requirements of this solution. Each of the ESX hosts resides on a VNX5300 storage array connected through MDS 9148 and Nexus 5010. Figure 1 shows the physical architecture of the solution.
11
© 2013 VCE Company, LLC. All Rights Reserved.
Logical architecture
Figure 2 shows the logical architecture of the solution.
Figure 2. Logical architecture
Hardware resources
Table 1 shows the hardware resources used in this solution.
Table 1. Solution hardware resources
Equipment Quantity Configuration
Compute 6 Cisco UCS B200 M2 blade server packs (12-core, 96 GB memory)
Network switches 4 Cisco Nexus 5548
Storage platform 2 EMC VNX5300
Aggregation SAN switch 4 Cisco MDS 9148 Virtualization 2 VMware vSphere 5.0
VMware vCenter Server 5.0
12
© 2013 VCE Company, LLC. All Rights Reserved.
Software resources
Table 2 lists the software used in this solution.
Table 2. Solution software resources
Description Quantity Version Purpose
Windows Server 2008 R2 6 2008 R2 x64 SP1 Server operating system Microsoft SQL Server 2012
Enterprise Edition 2 2012 Database server
EMC PowerPath/VE 2 5.5.1 Advanced multipathing
13
© 2013 VCE Company, LLC. All Rights Reserved.
Design considerations
To meet high-availability and disaster recovery requirements using SQL Server 2012 AlwaysOn Availability Groups, it is important for an organization to:
Determine high-availability and disaster recovery needs. Choose a SQL Server 2012 AlwaysOn design.
Design an overall solution architecture. Size and configure infrastructure components.
Follow best practices for SQL Server 2012 from VMware, Cisco, and EMC
The following sections discuss some of the design considerations for the implementation of Availability Groups. For more information about prerequisites, restrictions, and other recommendations, refer to us/library/ff878487.aspx and
http://msdn.microsoft.com/en-us/library/hh510238.aspx.
Determining high-availability and disaster recovery needs
Disaster recovery is the ability to recover data in case the production system is damaged, destroyed or becomes unavailable for an undeterminable period. One of the first things to do when creating a disaster recovery plan is to determine how much data and time you can afford to lose. The primary measures to use to determine high availability and disaster recovery needs including data recovery metrics, are:
High availability—Refers to a system or component that is continuously operational for a desirable length of time. Availability can be measured relative to "100% operational" or "never failing." Disaster recovery is a subset of business continuity. High availability can be a concept within a data center or across data centers.
Disaster recovery—Refers to a strategy preparing for recovery and continuation of the technology infrastructure vital to an organization after a natural or human-induced disaster. Disaster recovery is also a subset of business continuity. Recovery Point Objective and Recovery Time Objective are the key metrics to measure disaster recovery readiness. Recovery Point Objective (RPO)—Describes a point in time from which data can be restored.
For instance, if there is data corruption, loss, or unavailability, what point in time can a valid copy of the data be restored from?
Example
“The database is corrupted. Can we restore from 15 minutes ago?" This would be an RPO of 15 minutes.
Recovery Time Objective (RTO)—The amount of time in which you want data or hardware restored after a data corruption or hardware failure.
Example
14
© 2013 VCE Company, LLC. All Rights Reserved.
Figure 3. Using RPO and RTO to determine data recovery needs
The disaster recovery goal of the VCE solution is recovery in less than one hour with zero data loss.
Choosing a SQL Server 2012 AlwaysOn design
SQL Server 2012 AlwaysOn provides flexible design choices for selecting an appropriate solution for your application that provides protection against failures within a data center and adequate
redundancy across data centers.
Microsoft has identified the following design patterns as the most commonly used by their customers for high availability and disaster recovery:
Multi-site Failover Cluster Instance for a local high-availability and disaster recovery solution Availability Groups for a local high-availability and disaster recovery solution
Failover Cluster Instance for local high availability, and Availability Groups for a disaster recovery solution
For an overview of these design patterns for high availability and disaster recovery, refer to http://msdn.microsoft.com/en-us/sqlserver/gg490638).
Designing the overall solution architecture
Setting up Availability Groups requires: At least two servers (one primary replica and one secondary replica) running Microsoft Windows Server 2008 R2 Enterprise Edition
Microsoft SQL Server 2012 Enterprise Edition
Windows Server Failover Clustering (WSFC) configured between the hosts
Table 3 lists important Microsoft prerequisites and requirements to consider when designing a high-availability and disaster recovery solution. For more information, refer to
15
© 2013 VCE Company, LLC. All Rights Reserved.
Table 3. Prerequisites and Requirements
Component Prerequisites and requirements
Windows Server 2008 R2 Availability Groups are not supported on domain controllers. Each host computer must be a node in a WSFC cluster.
A WSFC node can host only one availability replica for a given Availability Group. On a given WSFC node, one or more instances of SQL Server can host availability replicas for many Availability Groups.
Ask your database administrators how many WSFC nodes are required to support the availability replicas of the planned Availability Groups. Every computer on which a server instance hosts an availability replica
must possess sufficient disk space for all the databases in the Availability Group.
SQL Server 2012 The instances of SQL Server that host availability replicas for a given Availability Group must reside on separate nodes of a single WSFC cluster. Each server instance must be running the Enterprise Edition of SQL Server
2012.
Enable the AlwaysOn Availability Groups feature on each server instance that will host an availability replica for any availability group.
Availability Groups For a given Availability Group, availability replicas must be hosted by server instances running on different nodes of the same WSFC cluster. However, virtual machines on the same physical computer can each host an availability replica for the same Availability Group because each virtual machine acts as a separate computer.
Each Availability Group supports one primary replica and up to four secondary replicas.
The number of databases and Availability Groups you can put on a computer (virtual or physical) depends on the hardware and workload, but there is no enforced limit.
Thoroughly test your environment with a production-like workload to ensure it can handle peak workload capacity within your application service level agreements (SLAs). Consider load under failure conditions as well as expected response times.
Availability Database Must be a read-write database. Read-only databases cannot be added to Availability Groups.
Must be a multi-user database.
Must not belong to any existing Availability Group. Must not be configured for database mirroring.
The infrastructure used for testing and validation was built on a Windows Server 2008 R2 Enterprise Edition operating system platform with a four-node WSFC consisting of a two-node cluster on the primary data center site and a two-node cluster on the secondary data center site using a stretched, active-active failover cluster configuration.
16
© 2013 VCE Company, LLC. All Rights Reserved.
Sizing and configuring infrastructure components
The following sections describe how the components of the converged infrastructure on the Vblock 300EX were configured for validation of the solution. Before you begin, check the latest software and hardware requirement for SQL Server 2012. Current requirements are listed at
http://technet.microsoft.com/en-us/library/ms143506.aspx.
Compute configuration
For the Vblock solution, two Vblock 300EX Systems were used to represent the primary and secondary (disaster recovery) data center sites. Each Vblock contained four Cisco UCS B200 M2 blade server packs. Each blade server had a 12-core CPU and 96 GB of memory.
Network configuration
Microsoft strongly recommends that you use the same network links for communications between Windows Server Failover Cluster (WSFC) cluster members and communications between availability replicas. Using separate network links can cause unexpected behaviors if some of the links fail (even intermittently).
To improve throughput, manageability, application separation, high availability, and security, virtual local area networks (VLANs) were used to segregate the network traffic of various types.
Storage configuration
SQL AlwaysOn Availability Groups require separate, non-shared storage between data centers. The following sections describe the storage configuration for the solution.
RAID groups
In accordance with best practices from Microsoft, EMC, and VMware, the SQL Server database configuration was split into two RAID groups for data and transaction log data.
For application data, RAID 5 disks were used. RAID 5 was chosen for two main reasons:
Lower cost of data protection compared to RAID 10, with RAID 5 requiring only 20 percent of the total usable capacity for storing parity data
High-read performance because all drives contribute to the delivery of data
To ensure availability of transaction log data in disaster recovery situations, RAID 10 disk groups were used for transaction log data.
Table 4 shows the RAID group configurations in the VNX5300 storage arrays.
17
© 2013 VCE Company, LLC. All Rights Reserved.
RAID Group
No. No. of Disks Disk Type Disk Size (GB) RPM RAID Type Purpose
10 5 SAS 600 10K RAID
(4+1) ESXi boot LUNs, VM Boot and Swap
239, 240 2 SAS 600 10K Hot
Spare Hot Spare Storage pools
To optimize read performance, a RAID 5 storage pool was used for Microsoft SQL Server application data. A separate RAID 10 storage pool was used for the SQL Server transaction logs. Table 5 shows the storage pool configurations, which are identical in both Vblock Systems.
Table 5. Storage pools
Pool Name No. of
Disks Disk Type Disk Size (GB) RPM RAID Type Purpose
MSSQL_R5 10 SAS 600 10K RAID 5 For SQL Server DB
MSSQL_R10 8 SAS 600 10K RAID 10 For SQL Server
Transaction Log Disk storage
To maximize performance and availability, best practices from Microsoft, EMC, Cisco, and VMware were used to configure the virtualized storage to the physical infrastructure.
18
© 2013 VCE Company, LLC. All Rights Reserved.
Table 6. Disk storage layout—Vblock 300EX 1
Drive
Letter Storage (GB) Hard Disk No. Purpose ESXi Disk Type
VMware
Datastore VNX Array LUN Size (GB)
VNX Storage Pool
C: 100 Hard Disk 1 Boot and
Swap vmdk vm_boot_swap 600 RG #10 D: 1024 Hard Disk 2 DB vmdk sql_db 2500 MSSQL_DB L: 700 Hard Disk 3 Log vmdk sql_log 1700 MSSQL_LOG C: 100 Hard Disk 1 Boot and
Swap vmdk vm_boot_swap 600 RG #10 D: 1024 Hard Disk 2 DB vmdk sql_db 2500 MSSQL_DB L: 700 Hard Disk 3 Log vmdk sql_log 1700 MSSQL_LOG C: 100 Hard Disk 1 Boot and
Swap vmdk vm_boot_swap 600 RG #10 C: 100 Hard Disk 1 Boot and
Swap vmdk vm_boot_swap 600 RG #10 C: 100 Hard Disk 1 Boot and
Swap vmdk vm_boot_swap 600 RG #10 C: 100 Hard Disk 1 Boot and
Swap vmdk vm_boot_swap 600 RG #10 C: 1100 Hard Disk 1 Boot and
19
© 2013 VCE Company, LLC. All Rights Reserved.
Table 7. Disk storage layout—Vblock 300EX 2
Drive
Letter Storage (GB) Hard Disk # Purpose ESXi Disk Type
VMware
Datastore VNX Array LUN Size (GB)
VNX Storage Pool
C: 100 Hard Disk 1 Boot and
Swap vmdk vm_boot_swap 600 RG #10 D: 1024 Hard Disk 2 DB vmdk sql_db 2500 MSSQL_DB L: 700 Hard Disk 3 Log vmdk sql_log 1700 MSSQL_LOG C: 100 Hard Disk 1 Boot and
Swap vmdk vm_boot_swap 600 RG #10 D: 1024 Hard Disk 2 DB vmdk sql_db 2500 MSSQL_DB L: 700 Hard Disk 3 Log vmdk sql_log 1700 MSSQL_LOG
Virtualization configuration
The following table describes the virtual machines created to validate the solution. Data Center Site Virtual Machine Role Primary Windows Server 2008 R2 SP2
domain controller (ms-dc1) Provides DNS, Active Directory, and DHCP services SQL Server 2012 (ms-sql-db1) Provides database services
SQL Server 2012 (ms-sql-db2) Provides database services Cluster File Share Witness
(ms-sql-wns) Provides cluster FSW services SQL Server 2012 Native Client
(ms-sql-client1) Provides access to data in the SQL Server 2012 database TPC-E (MSTPCE1) Provides test workload data
TPC-E (MSTPCE2) Provides test workload data
20
© 2013 VCE Company, LLC. All Rights Reserved.
Solution validation
The testing of this solution validated the use of the high-availability and disaster recovery capabilities of SQL 2012 AlwaysOn Availability Groups with the Vblock System to enable business continuity of Microsoft operations.
Test environment design
The solution infrastructure was deployed on Windows Server 2008 R2 in a four-node Windows Server Failover Cluster (WSFC) over two Vblock 300EX Systems. One Vblock 300EX System served as the primary data center site and the other Vblock System served as the secondary data center site. The data center sites were connected by a simulated WAN.
A SQL Server 2012 instance was deployed on each node in the WSFC configuration. The SQL Server instances share an AlwaysOn Availability Group for high-availability and disaster recovery purposes. With this configuration, the primary data center site can fail over to the secondary data center site easily while maintaining availability and minimizing costly downtime.
Two additional virtual machines were created on the primary data center site for TPC-E, which was used to generate the application test workload. A distance emulator between the two Vblock Systems emulated a WAN for testing over distance.
Figure 4 shows the test environment.
21
© 2013 VCE Company, LLC. All Rights Reserved.
Workload generation
To test high availability and disaster recovery, VCE ran OLTP-like workloads against the SQL Server 2012 database environment in parallel with a simulated real world failure event and verified that the application and database were online and available during the failure. Workloads were generated using TPC Benchmark™ E (TPC-E).
TPC-E consists of a set of transactional operations designed to exercise system functionalities in a manner representative of complex OLTP application environments. To help users relate to the components of the benchmark, these transactional operations have been given a real-world context, depicting the activity of a brokerage firm. The workload is centered on the activity of processing brokerage trades and uses a schema that is divided in four sets of tables.
TPC-E models the activity of brokerage firm that must manage customer accounts, execute customer trade orders, and be responsible for the interactions of customers with financial markets. The
brokerage firm interacts with financial markets to execute orders on behalf of the customers and updates relevant account information.
The number of customers defined for the brokerage firm can be varied to represent the workloads of different size businesses.
This benchmark is composed of a set of transactions that are executed against three sets of database tables that represent market data, customer data, and broker data. A fourth set of tables contains generic dimension data such as zip codes. Figure 5 shows the key components of the environment:
22
© 2013 VCE Company, LLC. All Rights Reserved.
Distance emulation
Since the link between the production and disaster recovery sites is often a factor in the failure of a solution, it is important to understand the capacity, latencies, and utilization of the link when designing a disaster recovery solution. To characterize the effects that network latencies and bandwidth can have on the VCE SQL Server Availability Groups solution, multiple failure tests were performed at varying emulated distances.
In general, network latencies correspond to different distances. Round-trip latencies can vary from zero milliseconds (ms) to 250 ms or higher. Table 8 shows distances and their corresponding link latencies. The numbers shown in the table are estimates and can vary depending on link quality, network hops, and backbone hops.
Table 8. Distance and Average Link Latency
Approximate distance One-way latency (ms) Round-trip latency (ms)
50 km 0.25 0.5 500 km 2.5 5 1,000 km 5 10 5,000 km 25 50 10,000 km 50 100 20,000 km 100 200 22,500 km 125 250
23
© 2013 VCE Company, LLC. All Rights Reserved.
Test methodology
The following test methodology was used:
Primary database tables with data for up to 75,000 customers were created and populated using Microsoft TPC-E.
TPC-E tools generated the OLTP-like workloads used for testing purposes.
Various failure scenarios were emulated while workloads were run against the SQL Server 2012 databases.
Successful automatic and manual failover of the SQL Server database from the primary to the secondary site was verified for each failure scenario.
Baseline testing was conducted at 0 km between the primary and secondary sites. Failover testing was conducted with emulated distances of 50 km, 100 km, and 300 km between the primary site and the secondary site.
Replication was set to synchronous-commit mode at all distances. Under synchronous-commit mode, before committing transactions, a synchronous-commit primary replica waits for a synchronous-commit secondary replica to acknowledge that it has finished hardening the log. Synchronous-commit mode ensures that once a given secondary database is synchronized with the primary database, committed transactions are fully protected.
CPU, memory, storage, and network utilization was monitored before, during, and after each test to ensure a full production workload was running during peak activity.
To ensure successful failover and failback, row counts for three key tables (ADDRESS, BROKER, and DAILY_MARKET) were checked before, during, and after failover.
Test scenarios
The following failure scenarios were used to test the solution:
Data center site failure Network failure Storage array failure Server/node failure
Creating and configuring the Availability Group
Using the Availability Group Wizard, an Availability Group was configured with a primary replica (MS-SQL-DB1\PROD01) and one secondary replica (MS-SQL-DB2\PROD01) on the primary (production) data center site, and two secondary replicas (MS-SQL-DB3\PROD03 and MS-SQL-DB4\PROD04) on the secondary (disaster recovery) data center site. This provides redundancy in case of multiple failures on primary and secondary sites.
24
© 2013 VCE Company, LLC. All Rights Reserved.
Figure 6. SQL Server AlwaysOn Availability Group configuration
Automatic Failover Mode is an option that provides high availability by ensuring that the database is quickly made available again after the loss of the primary replica. Failover takes place without manual intervention in the event of an unplanned disaster.
To configure an availability group for automatic failover, the current primary replica and one secondary replica must be set to synchronous-commit mode with automatic failover. Synchronized mode
provides for little to no data loss but requires more available network bandwidth and is limited to latency requirements per SQL Server AlwaysOn limits.
To maximize performance and minimize data loss for disaster recovery, all replicas were configured for synchronous-commit mode with automatic failover.
25
© 2013 VCE Company, LLC. All Rights Reserved.
Figure 7. Failover and failback configuration
Test results
The following sections describe results for failover of the SQL Server 2012 AlwaysOn Availability Group in the event of:
Site failure Network failure Storage failure Server/node failure
Site failure
26
© 2013 VCE Company, LLC. All Rights Reserved.
First, baseline site failover testing was conducted without any load at a distance of 0 km. After
baseline testing, failover tests were conducted with a full production workload and emulated distances of 50 km, 100 km, and 300 km between the primary and secondary sites. Table 9 summarizes the test results.
Table 9. Site failure results
Distance Load/No load Automatic
failover time Manual failover time Failback time Result
Baseline No load ~ 2.5 minutes ~ 3 minutes — Zero data loss 50 km Load < 2 minutes ~ 3 minutes ~ 3 minutes Zero data loss 100 km Load ~ 5 minutes ~ 7 minutes ~ 6 minutes Zero data loss 300 km Load ~ 6 minutes ~ 9 minutes ~ 8 minutes Zero data loss A row count query was issued on the primary and secondary sites to verify zero data loss before, during, and after failover testing. The row counts for the ADDRESS, BROKER, and DAILY_MARKET financial transaction tables on the primary and secondary sites matched, which means no data was lost (Figure 8).
27
© 2013 VCE Company, LLC. All Rights Reserved.
Network failure
To emulate network failure, the network adapter was disabled from a cluster node virtual machine (Figure 9). This test was performed with a full production workload and an emulated distance of 100 km between the primary and secondary sites. Table 10 summarizes the test results.
Figure 9. Network failure emulation Table 10. Network failure results
Distance Load/No load Automatic
failover time Manual failover time Failback time Result
28
© 2013 VCE Company, LLC. All Rights Reserved.
Figure 10. Network failure row count query and results
Storage array failure
To emulate a storage array failure, storage was removed from the vb319-esxi-01 and vb319-esxi-02 storage groups on the VNX array on the primary site (Figure 11).
29
© 2013 VCE Company, LLC. All Rights Reserved.
Figure 11. Storage failure emulation Table 11. Storage failure results
Distance Load/No load Automatic
failover time Manual failover time Failback time Result
30
© 2013 VCE Company, LLC. All Rights Reserved.
Server/node failure
To emulate a server or node failure, the blade servers on which the cluster node virtual machines were installed on the primary site were powered off. Table 12 summarizes the test results.
Table 12. Server/node failure results
Distance Load/No load Automatic
failover time Manual failover time Failback time Result
100 km Load 6 minutes 7.4 minutes — Zero data loss A row count query was issued on the primary and secondary sites to verify zero data loss before, during, and after failover testing. The row counts for the ADDRESS, BROKER, and DAILY_MARKET financial transaction tables on the primary and secondary sites matched, which means no data was lost.
31
© 2013 VCE Company, LLC. All Rights Reserved.
Conclusion
This solution architecture paper provides a validated business continuity solution for Microsoft SQL Server 2012 on the Vblock System using the high-availability and disaster recovery capabilities of AlwaysOn Availability Groups. Testing and validation demonstrated that the solution provides data protection for SQL Server 2012 that meets SLA requirements for recovery from a broad range of failures.
The key findings of testing of this solution are as follows:
Using SQL Server 2012 AlwaysOn Availability Groups, the Recovery Time Objective (RTO) of less than one hour, and the Recovery Point Objective (RPO) of zero data loss were met in all forced manual and automatic failover scenarios at distances of 0, 50,100, and 300 kilometers. Through a combination of the Availability Groups feature and the storage capabilities of the
Vblock converged infrastructure, the Vblock System 300 can easily service SQL Server 2012 mission-critical online transaction processing (OLTP) production workloads while providing high availability.
Zero data loss is possible under typical customer high-volume workloads by following best practices from VMware, Cisco, and EMC for SQL Server 2012 mission-critical database environments that include high-volume update, read, and write transactional data.
By combining the power of the Vblock converged infrastructure and a virtualized SQL Server 2012, an organization can demonstrate business-critical levels of high performance and availability of database platforms, while accelerating the time to deployment and simplifying IT operations.
Next steps
32
© 2013 VCE Company, LLC. All Rights Reserved.
References
Introducing SQL Server AlwaysOn (http://msdn.microsoft.com/en-us/sqlserver/gg490638) Hardware and Software Requirements for Installing SQL Server 2012
(http://msdn.microsoft.com/en-us/library/ms143506.aspx)
Prerequisites, Restrictions, and Recommendations for AlwaysOn Availability Groups (SQL Server)(http://msdn.microsoft.com/en-us/library/ff878487.aspx)
Microsoft SQL Server on VMware Best Practices Guide
(http://www.vmware.com/files/pdf/sql_server_best_practices_guide.pdf)
© 2013 VCE Company, LLC. All Rights Reserved.
ABOUT VCE
VCE, formed by Cisco and EMC with investments from VMware and Intel, accelerates the adoption of converged infrastructure and cloud-based computing models that dramatically reduce the cost of IT while improving time to market for our customers. VCE, through the Vblock Systems, delivers the industry's only fully integrated and fully virtualized cloud infrastructure system. VCE
solutions are available through an extensive partner network, and cover horizontal applications, vertical industry offerings, and application development environments, allowing customers to focus on business innovation instead of integrating, validating, and managing IT infrastructure.
For more information, go to http://www.vce.com.
THE INFORMATION IN THIS PUBLICATION IS PROVIDED "AS IS." VCE MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO THE INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED WARRANTIES OR MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.