Praha 2013
11.
ročník největší odbornéIT konference v ČR!Michal Tinthofer
|Michal.Tinthofer@Woodler.eu|
SQL Server AlwaysOn
Overview
Introduction to AlwaysOn & Availability Groups
How to set up AlwaysOn
Managing and tuning AlwaysOn
Monitoring Availability Groups
Common issues
to AlwaysOn & Availability Groups
SQL Server Always On Solutions
AlwaysOn ≠ Availability Groups
AlwaysOn = (SQL Server Failover Cluster Instances + Availability Groups) Availability Groups ≠ Database Mirroring
SQL Server Always On Solutions
Availability Groups • Non-Shared Storage Solution • (Group of) Database level HA • (Group of) Database level DR
• DR replica can be Active Secondary
• Databases must be in FULL recovery model Failover Cluster Instance
(for local HA) + Availability groups (for DR)
• Combined Shared and Non-Shared Storage • Instance level HA
• (Group of) Database level DR
• DR replica can be Active Secondary
The logical topology of a representative AlwaysOn solution
Windows Server Failover Clustering (WSFC) Cluster
WSFC Quorum Witness Remote File Share (Optional)
Network Subnet A Network Subnet B
Node B1 SQL Server Instance 3 Storage Node B2 SQL Server Instance 4 Storage Instance Network Name Instance Network Name WSFC Configuration WSFC Configuration Node A3 SQL Server Instance 2 Storage Instance Network Name WSFC Configuration Node A2 Node A1 SQL Server Failover Cluster Instance 1 Shared Storage Instance Network Name WSFC Configuration WSFC Configuration
AlwaysOn Availability Group
Availability Group Virtual Network Name
Set Up AlwaysOn
Pre-installation Planning:
1.1 Synchronous or asynchronous replication?
Up to three synchronous Total of four replicas + primary
1.2 Where should be Automatic failover?
Two Automatic failover partners (primary and secondary)
2.1 Decision is made by number of nodes and votes used to
prevent cluster split brain
Odd: Node majority
Even: Node and file share majority
How many nodes need to be up for the cluster to stay online, and which nodes should vote.
3.1. How many databases do you want in each group?
3.2. Do you want to move groups around for load balancing?
3.3. Do you need to separate reporting databases from the
OLTP databases
Report db can be read-only replica of production OLTP?
3.4. If just one database on an instance fails, what do you want to happen?
3.5. How many AG you need based on db priority and function
Auto Failover and synch replica or manual failover and asynch replica.
4. If any other servers require restores from your new 2012 instance, they should get 2012 first.
(Think development, disaster recovery, or reporting servers.)
5. Review non-database stuff installed on old servers.
Plan for SSIS, SSAS, SSRS, DTS packages, logins, Agent jobs, etc.
1. Choose number and location of replicas
2. Choose Quorum model (based on number of replicas)
3. Design Availability Groups
4. Plan DR strategy with 2k12 SQL servers only
Windows Configuration
Create a single instance account for all SQL Servers
You can use different accounts
per node if NO Kerberos auth. will ever be used for listener.
Use same drive letters for across all nodes (No C:\ drive please ) Use static IP addresses for all
servers & subnets.
If SQL Failover Cluster instance (FCI) will be used:
Reserve IP also for SQL instance &
Cluster management
Configure DTC if clustered
Achieve network redundancy between nodes before SQL installation.
No separate heartbeat network necessary
Use more NICs with separate IPs or Team them.
Where backups will be done?
Choose preferred & secondary server
What if failover to DR site occurs ?
Plan for both sites (DR & HA) Use windows update to get
servers up to date.
Installation
Single domain should be used!
Install Windows prerequisite hotfixes for Availability
Groups
http://msdn.microsoft.com/en-us/library/ff878487.aspx#WinHotfixes Install the hotfix to allow nodes with 0 votes
-http://support.microsoft.com/kb/249403
This is important to do on all nodes
Open Windows firewall ports 1433,1434 and 5022 on
all nodes.
Enable Instant File Initialization.
.NET Framework 3.5.1 feature and the Failover
Cluster Installation step by step
Install Failover
Cluster Installation step by step
Create a primary
cluster via
Cluster Installation step by step
Pass validation
tests
SCSI SAN must
support SCSI-3
Persistent
Reservation
Cluster Installation step by step
Specify cluster name for MSCS service
Cluster Installation step by step
Windows Cluster
created
With two nodes
for Primary site
AlwaysOn
Cluster Installation settings
Create folder on
shared server
Will be used for new quorum settings R/W access
Go to cluster
management and
change quorum mode
Servers will need to exchange backup
Cluster Installation settings
Quorum is managed by the WSFC, irrespective of the number of SQL Server instances, number of nodes, number of availability groups
Important goal: Design to ensure
Unavailability of the DR site (or the node at DR site) , or loss of network connectivity between sites should not impact the
quorum of the WSFC Two steps:
Node votes: First decide which nodes should have a vote
Primary Data Center Disaster Recovery Data Center
SQL Server Primary
SQL Server Secondary
Windows Server Failover Cluster (single WSFC crossing two data centers)
Availability Group SQL Server Secondary Synchronous Asynchronous File Share
Use this quorum model with a protected file share witness.
The Fileshare Witness always has 1 vote.
Cluster Installation settings
Add an additional voting node to the WSFC in the primary data center, and then use the this quorum model.
Cluster Installation settings
Node Majority
Primary Data Center Disaster Recovery
Data Center
SQL Server Primary
SQL Server Secondary
Windows Server Failover Cluster (single WSFC crossing two data centers)
Availability Group
SQL Server Secondary
Synchronous
Asynchronous
Cluster Installation settings
Pick Node and file share majority
Cluster Installation settings
Confirm new quorum settings
Important design goal is to ensure that:
AlwaysOn
Cluster Installation settings
Cluster Installation settings
Add shared disk to cluster
Suooprted
SCSI
iSCSI
SQL Clustered Instance setup
Prefer default instance over named
Use single AD service account for all instances
You will be able to use Kerberos authentication in future
Also use same collation on all instances
Use same drive letters on all nodes
Apply latest Service Pack for SQL Server 2012
Enable AlwaysOn HADR via Pwshell or ConfMgr.
Prepare SAN storage
Confirm network routing between subnets
Configure SQL Instance as you do usually after new installation
SQL Clustered Instance setup
SQL Clustered Instance setup
SQL Clustered
Instance setup
SQL Clustered Instance setup
Pick features and
SQL Clustered Instance setup
Pick name of cluster
resource group
There will be stored
resources for PROD
cluster
Then choose available
shared storage
And IP address on
SQL Clustered Instance setup
Add accounts to
sysadmin role
Optionally you can
make AD group which
will hold all service
accounts of AG
SQL Clustered Instance setup
Set Service account for
new instance
Finish installation and
SQL Clustered Instance setup
SQL Clustered Instance setup
Choose existing FCI
Then confirm network
interface used for
communication
Fill password for service
SQL Clustered Instance setup
After successful
installation you
will be asked for
SP
Patch the server
AlwaysOn
Enable AlwaysON on PROD FCI
Via Configuration Manager
Or Powershell
DR Cluster Installation
DR Cluster Installation
Some resources cannot be used on those
DR nodes
Such as:
Shared drives
PROD LAN
DR Cluster Installation
Remove Cluster Votes for offsite nodes.
Also remove nodes from possible &
preferred owners on PROD FCI resources
We need to separate
AlwaysOn
DR Cluster Installation
Install second FCI
This will be our DR site
Pick features and
name
NW name
Hostname of instance
ID Instance
Depends on Type
Pick name of cluster
resource group
DR Cluster Installation
Then choose available
shared storage
And IP address on
which DR instance
will listen
Set Service account
DR Cluster Installation
TEMPDB on Local Disk
Enables use of local storage for
TEMPDB
Can use solid state storage to
improve performance of TEMPDB-heavy workloads
Saves money on storage
replication licensing
Reduces cross-data center storage
replication traffic
DR Cluster Installation
Same as on PROD
FCI add second
node on DR FCI
DR Cluster Configuration
Enable AlwaysOn on DR FCI
Remove PROD nodes from possible
FCI Configuration
If named instance used, set static port on All nodes TCP/IP
FCI Configuration
Again ensure that correct possible & preferred owners
are set on all resources in cluster application
AlwaysOn
AlwaysOn (Availability groups) setup
Databases should be:
In Full Recovery mode Full backup must exist
FCI+AG using manual
Special Case: Automatic Failover for DR
Use of 3rd Data Center and synchronous mode
Primary Data Center Disaster Recovery
Data Center SQL Server
Primary
SQL Server Secondary
Windows Server Failover Cluster
Availability Group
Synchronous File Share
AlwaysOn (Availability groups) setup
AlwaysOn (Availability groups) setup
Build your backup strategy for DR
Create AG Listener to provide centralized Client management.
AlwaysOn (Availability groups) setup
Select Initial
Synchronization method
Don’t forget on AG
prerequisites
AlwaysOn (Availability groups) setup
Fixing AG prerequisites
In this case folder structure
should be same
Create same folders on
AlwaysOn (Availability groups) setup
Validation scripting
and process
Logins
Endpoints
“Hard_endpoint”
Start XE session
“AlwaysOn_Health”
Create AG
Add Listener
Join Members
Backup/Restore dbs
Wait 5 min to start
AlwaysOn
Result: FCI +AG
Three separate FCIs
Production
Holds PROD SQL service
Disaster Recovery
Holds DR SQL service
AlwaysOn
Holds AG Listener
Two separate LANs
Nodes are evenly
Disaster = Primary site is down
Manual Process involved to bring database service online on the DR site Force Quorum on the secondary in the DR site
Start-ClusterNode –Name "DRNODE1" –FixQuorum
If still running Stop-ClusterNode –Name "DRNODE1"
Execute FORCE SERVICE ALLOW DATA LOSS
ALTER AVAILABILITY GROUP [AlwaysOnAG] FORCE_FAILOVER_ALLOW_DATA_LOSS;
Adjust quorum model and/or node votes
(Get-ClusterNode "DRNode1").NodeWeight=1 (Get-ClusterNode "DRNode2").NodeWeight=1 (Get-ClusterNode "PrimaryNode1").NodeWeight=0 (Get-ClusterNode "PrimaryNode2").NodeWeight=0
Reporting: Get-ClusterNode | fl NodeName, NodeWeight
Adjust quorum model node votes to 1 for primary site nodes Resume synchronization for all paused AlwaysOn databases Optionally switch to synchronous mode
To ensure transaction safety Failback to primary site
Switch back to asynchronous mode
Adjust quorum model node votes to 0 for dr site nodes
How to failover via GUI or TSQL
SQL Server Failover Cluster Instance DNS Settings
RegisterAllProvidersIP setting for AG listener
Reducing client recovery latency after failover
Connection strings that set MultiSubnetFailover to true/yes
AlwaysOn Availability Groups sets the RegisterAllProvidersIP property to 1 in order to reduce re-connection time after a failover
Clients will connect to all IP addresses simultaneously Available providers
.NET Framework 3.5 SP1 + connectivity patch
SQL Native Client 11.0 ODBC
Microsoft JDBC driver 4.0 for SQL Server
Legacy strings
change RegisterAllProvidersIP to 0
The active IP address (instead of all) is listed in the Client Access Point in the WSFC cluster, reducing latency for legacy clients.
Change HostRecordTTL to 300
Testing MultiSubnetFailover
Same process as ReadOnlyRouting in my previous
FCI monitoring
On SQL 2k8
ResDLL query @@Servername
If two times no SQL response = resource down, then failover
On SQL 2k12
ResDLL query SP_SERVER_DIAGNOSTICS Continuous send of data from SQL instance
Is more precise
Better configuration of failover policy
Flexible Failover Policy
FailureConditionLevel (0 to 5)
5 – Failover or restart on any qualified failure
4 – Failover or restart on moderate SQL Server errors 3 – Failover or restart on critical SQL Server errors 2 – Failover or restart on SQL Server unresponsive 1 – Failover or restart on SQL Server down
0 – No Automatic Failover or restart
User sets new Cluster properties
HealthCheckTimeout and FailureConditionLevel
Flexible Failover Policy
Configuration can be viewed and reconfigured from SQL
server or Cluster resource
HealthCheckTimeout Default 60sec Min 15sec Speed of Responses FailureConditionLevel Default 3
Failover or restart on critical SQL Server errors
Should be enough
User Configurable
Diagnostics are always captured
Instance root LOG\%SQLDIAG%