About Aidan Finn
• Technical Sales Lead at MicroWarehouse (Dublin)
• Working in IT since 1996
• MVP (Virtual Machine)
• Experienced with Windows Server/Desktop, System Center, virtualisation, and IT infrastructure
• @joe_elway
• http://www.aidanfinn.com
• http://www.petri.co.il/author/aidan-finn
Books
High Availability
• From the Hyper-V perspective, HA is about infrastructure fault tolerance
• Example:
1. HostA is one of a number of hosts in a cluster
2. Every host in the cluster stores VM files on shared storage, such as a SAN
3. VM01 is running on HostA
4. HostA stops running
5. VM01 automatically fails over to another host in the cluster
6. VM01 automatically boots up
• There is some downtime for VM01 but it is minimized
A Typical Hyper-V Cluster
• Two or more hosts
• Each host is connected to a set of networks with special roles
The Heartbeat
You there? Yes
• Failover Clustering conducts health monitoring between nodes to detect when servers are no longer available
• When servers are unresponsive, clustering takes recovery action
Failover Cluster Virtual Adapter
• Failover Cluster Virtual Adapter (NetFT) is a virtual network adapter that builds fault-tolerant TCP connections across all available interfaces between nodes in the cluster
• NetFT is the mechanism by which clusters use multiple cluster-enabled adapters to communicate
• Seamless internode communication
– NetFT will dynamically and seamlessly switch cluster communication to a different network (based on priority) when a network fails
• Long story short: The cluster can use multiple enabled networks for cluster communications and is fault tolerant
Heartbeat Detection
• Heartbeats are sent over UDP port 3343
• WS2012 R2 Hyper-V clusters:
– Nodes exchange heartbeats every 1 second
– Tolerate up to 10 seconds of missed heartbeats (5 on non-Hyper-V clusters) for nodes on the same subnet
– Tolerate up to 20 seconds of missed heartbeats (5 on non-Hyper-V clusters) for nodes on different subnets
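The tolerances above are the heartbeat interval multiplied by a missed-heartbeat threshold. A minimal Python sketch of that arithmetic follows; the function names and the lookup table are illustrative assumptions, not a cluster API, and the numbers are the WS2012 R2 figures from this slide.

```python
# Simplified model: a node is treated as down once it has been silent for
# (heartbeat interval x missed-heartbeat threshold) seconds.
HEARTBEAT_DELAY_S = 1  # from the slide: one heartbeat per second

# Thresholds (in missed heartbeats) taken from the figures on this slide.
# The dictionary keys are hypothetical labels used only in this sketch.
THRESHOLDS = {
    ("hyper-v", "same-subnet"): 10,
    ("hyper-v", "cross-subnet"): 20,
    ("other", "same-subnet"): 5,
    ("other", "cross-subnet"): 5,
}


def seconds_until_node_declared_down(cluster_type, topology):
    """Return how long a node may stay silent before recovery starts."""
    return THRESHOLDS[(cluster_type, topology)] * HEARTBEAT_DELAY_S


def is_node_down(seconds_since_last_heartbeat, cluster_type, topology):
    """True once the silence has reached the tolerated window."""
    limit = seconds_until_node_declared_down(cluster_type, topology)
    return seconds_since_last_heartbeat >= limit
```

For example, a host that has been silent for 7 seconds is still considered alive in a Hyper-V cluster on one subnet, but would already be treated as failed in a non-Hyper-V cluster.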
What is Quorum?
• Quorum is when you have enough voters to come to an agreement
• Primary function of cluster is to keep mission critical services online
• It needs to accomplish this without causing corruption or confusion
Explaining Quorum
Quorum Basics
• Sticking with WS2012 R2 to keep this simple
• Two types of vote breaker or witness
– Witness disk
• A 1 GB LUN that is created on the shared storage just for this purpose
• Configured as a witness disk in the cluster
• Owner of the disk is the vote breaker in case of tied vote for quorum
– File Share Witness
• Originally intended for multi-site clusters
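To make the tie-breaking role of the witness concrete, here is a minimal Python sketch of a majority vote. It assumes one vote per node plus one vote for the witness; `has_quorum` and its parameters are hypothetical names for illustration only.

```python
def has_quorum(reachable_nodes, total_nodes, has_witness=True, sees_witness=True):
    """Return True if this side of the cluster holds a strict majority of votes.

    Assumption for this sketch: every node has one vote and the witness
    (disk or file share) adds one more vote to the side that can reach it.
    """
    total_votes = total_nodes + (1 if has_witness else 0)
    my_votes = reachable_nodes + (1 if has_witness and sees_witness else 0)
    return my_votes > total_votes / 2
```

Take a four-node cluster that splits two and two. Without a witness neither half has a majority, so `has_quorum(2, 4, has_witness=False)` is false for both. With a witness, the half that owns it holds 3 of 5 votes and stays online, while the other half holds 2 of 5 and stops.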
Explaining Quorum
Other Quorum Concepts
• Sequential host failure (WS2012)
– Scenario when one host after another after another drops offline
– Quorum can be still obtained, even if less than half the nodes are online
• Dynamic quorum (WS2012 R2)
– The cluster adjusts the quorum votes automatically as nodes go offline or come back
– Intended to give cluster more chance of staying online
– ALWAYS have a quorum witness
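The effect of sequential failures can be sketched with a deliberately simplified model. This is not Microsoft's actual algorithm. The assumptions are that hosts fail one at a time, that the witness stays reachable, that votes are recounted after each failure, and that the witness vote only counts when the number of voting nodes is even. All function names are hypothetical.

```python
def witness_vote(voting_nodes):
    """Assumption: the witness votes only when the node count is even,
    which keeps the total number of votes odd."""
    return 1 if voting_nodes % 2 == 0 else 0


def min_online_nodes_static(nodes):
    """Fixed vote count: every node plus the witness always has a vote."""
    total_votes = nodes + 1
    online = nodes
    # Keep failing one node while the survivors plus witness keep a majority.
    while online > 1 and (online - 1) + 1 > total_votes / 2:
        online -= 1
    return online


def min_online_nodes_dynamic(nodes):
    """Votes are recounted after each failure, one host at a time."""
    online = nodes
    while online > 1:
        total_votes = online + witness_vote(online)
        surviving_votes = (online - 1) + witness_vote(online)
        if surviving_votes <= total_votes / 2:
            break  # losing one more node would lose quorum
        online -= 1  # node drops out and its vote is removed
    return online
```

Under these assumptions a four-node cluster with fixed votes stops once fewer than two nodes remain, while the recounting model keeps running down to a single node, which matches the point that quorum can still be obtained with less than half the nodes online.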
Why Storage Is Needed
• The Hyper-V hosts provide HA to VMs
• Each host must have access to the VMs’ storage
• There is no replication from host-to-host inside a cluster
• All VMs are stored on shared storage
• Options include
– SAS storage area network (SAN)
– iSCSI SAN
– Fibre channel SAN
– Fibre Channel over Ethernet (FCoE) SAN
– PCI RAID (WS2012+)
Connectivity
• Each node is connected to the shared storage
• Exact same connectivity
• Dual path connectivity
– Multipath IO (MPIO) for traditional storage
– SMB Multichannel for SMB 3.0 storage
• All disks/LUNs/shares on the storage are assigned to all nodes
– Each host has equal access
Cluster-in-a-Box
• Take the requirements of a cluster
• Put it into a single enclosure
– 2+ blade servers with own power + networking
– JBOD or PCI RAID shared storage
Cluster Shared Volume (CSV)
• Microsoft’s cluster file system
• Makes the volume on the disk active/active across all nodes
• Store lots of VMs on a single volume
– All able to run on any node in the cluster
• Every node connected to the disk can read/write to the volume
• One node owns the volume and is responsible for metadata operations:
– Owner
– AKA CSV Coordinator
• No drive letter
Redirected IO
• A process used by CSV (only)
– Nodes in cluster redirect storage IO to pass over cluster network via CSV coordinator
– Done on per-CSV basis, not per-cluster
• Used by W2008 R2 CSV for backup
– Caused concern
– Redirected IO NOT USED FOR BACKUP SINCE WS2012
• Redirected IO is used by WS2012 for:
– Very brief metadata operations: permissions, file metadata, file create, file open, file extend
Controlling Redirected IO
• On W2008 R2:
– Redirected IO went across the cluster communications network
– Network with lowest routing metric (could be manipulated)
• On WS2012 and later:
– Uses SMB 3.0 and SMB Multichannel
– Can flood equal speed networks between nodes if not controlled
– Use SMB Multichannel Constraints to select which networks to talk to other cluster nodes
CSV Cache
• A read cache for virtual hard disks stored on the CSV
• Uses percentage of cluster node’s RAM for the cache
– Size of cache is set once per cluster
– Boost read performance, e.g. VDI boot storm
– (Get-Cluster).SharedVolumeBlockCacheSizeInMB = 512
• WS2012
– Up to 20% of nodes’ RAM could be assigned to cache
– Enable each required CSV for CSV Cache
– Get-ClusterSharedVolume "Cluster Disk 1" | Set-ClusterParameter CsvEnableBlockCache 1
– Required CSV to be disabled/enabled to start caching
• WS2012 R2
Other CSV 2.0 Improvements
• WS2012:
– Uses mount point instead of junction point
– Single synchronised VSS Snapshot for backup - no Redirected IO during backup
– Can enable BitLocker
– NTFS on CSV appears as CSVFS
– Supported for Hyper-V and Scale-Out File Server
• WS2012 R2:
Converged Networks
• In W2008 R2 we would have had 1 NIC or NIC team per required network
– Lots of NICs
– Very expensive to add 10 GbE or faster networking for peak usage
• Converged networks concept:
– Aggregate fewer NICs into an accumulation of bandwidth
– Divide that bandwidth up using WS2012+ QoS into required networks
– Makes adopting 10 GbE or faster more economic for medium/larger companies
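The idea of dividing pooled bandwidth can be sketched in a few lines of Python. This assumes QoS is expressed as relative weights that guarantee each network a minimum share of the aggregated NICs. The network names and weights in the example are invented for illustration and are not recommendations.

```python
def guaranteed_bandwidth_gbps(nic_speeds_gbps, weights):
    """Split the pooled NIC bandwidth into per-network minimum shares.

    nic_speeds_gbps: speeds of the NICs being aggregated, in Gbps.
    weights: mapping of network name to relative QoS weight (assumed values).
    """
    pool = sum(nic_speeds_gbps)
    weight_sum = sum(weights.values())
    return {name: pool * weight / weight_sum for name, weight in weights.items()}


# Two 10 GbE NICs carved into four logical networks (hypothetical weights).
shares = guaranteed_bandwidth_gbps(
    [10, 10],
    {"Management": 10, "Cluster": 10, "Live Migration": 30, "Virtual Machines": 50},
)
```

In this model a share is a floor under contention rather than a cap, so an otherwise idle pool could still be used by a single busy network.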
Non-Converged with iSCSI
Creating a Cluster
• Easier than ever
• Get the pieces right first:
– Storage
– Networking
• Process:
1. Validate the cluster – fix until it passes
2. Deploy the cluster
Completing the Cluster
• Run Windows Update
– To get updates published via Windows Update
• Search for “Recommended update for Windows Server 2012 R2 Failover Clustering”
Configure Cluster Networks
• Rename the networks in Failover Cluster Manager
– I name them after the NICs that are on the networks
• Select your Live Migration network(s)
Configure Witness
• Cluster wizard will automatically find a suitable Disk Witness if one is available
– Make sure you check this
Configure Storage
• SMB 3.0 Storage
– Create one share for File Share Witness
– Create one or more shares for storing VMs
– Add all hosts to a security group
– Add all admins to a security group
– Grant full control to the shares
• Disk storage
– Provision 1 * 1 GB disk for disk witness
– Provision 1 or more LUNs per node in the cluster to store VMs
– Connect the disks to all nodes in the cluster
– Activate (GPT) and format the disks in Disk Manager on one node
Cluster-Aware Updating (CAU)
• Simple orchestration of cluster node updates
• Determines updates needed, moves workloads off nodes for updates
– Uses Windows Update Agent direct from Microsoft or from WSUS
– Identifies node with least load
– Puts node in maintenance mode
– Verifies success, then moves to next node
• Maintains service availability without impacting cluster quorum
• Can be:
• Can be:
– Scheduled
Update Coordinator
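The update pass described on this slide can be sketched as a loop. This is a simplified Python model rather than the real orchestration. Load is approximated as a VM count per host, draining sends all VMs to the least-loaded other host, and `install_updates` is a hypothetical callback that returns True on success.

```python
def run_updating_pass(loads, install_updates):
    """Update every node once, one node at a time, least-loaded first.

    loads: mapping of node name to number of running VMs (needs 2+ nodes).
    install_updates: hypothetical callable(node_name) -> bool.
    Returns the update order and the VM placement after the run.
    """
    loads = dict(loads)          # work on a copy
    remaining = set(loads)       # nodes still waiting for updates
    order = []
    while remaining:
        # Identify the node with the least load (ties broken by name).
        node = min(sorted(remaining), key=lambda n: loads[n])
        # Maintenance mode: drain its VMs to the least-loaded other node.
        others = [n for n in loads if n != node]
        target = min(others, key=lambda n: loads[n])
        loads[target] += loads[node]
        loads[node] = 0
        # Verify success before moving to the next node.
        if not install_updates(node):
            raise RuntimeError(f"update failed on {node}; stopping the run")
        remaining.discard(node)
        order.append(node)
    return order, loads


order, final_loads = run_updating_pass(
    {"HostA": 5, "HostB": 2, "HostC": 8},
    install_updates=lambda node: True,
)
```

Only one node is out of service at any point in this model, which is why the VMs stay available and quorum is not put at risk during the run.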
Enabling Cluster Self-Updating
• Place all cluster nodes and cluster computer account in an OU for the cluster
• Delegate rights to cluster CAP
– Create/manage computer objects in this OU
– This is used to create another CAP/computer object for self-updating CAU
Use FCM
• All management of HA VMs is locked out in Hyper-V Manager
• Use Failover Cluster Manager
• You can order failover of VMs using Virtual Machine Priority (High/Medium/Low)
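As a rough illustration of priority-ordered failover, the Python sketch below sorts VMs so that High comes before Medium and Medium before Low, keeping the original order within each priority. The VM names and the `failover_start_order` helper are hypothetical.

```python
# Lower number means the VM is started earlier after a failover.
PRIORITY_RANK = {"High": 0, "Medium": 1, "Low": 2}


def failover_start_order(vms):
    """vms: list of (vm_name, priority) pairs. Returns names in start order.

    sorted() is stable, so VMs with the same priority keep their input order.
    """
    ranked = sorted(vms, key=lambda vm: PRIORITY_RANK[vm[1]])
    return [name for name, _priority in ranked]


start_order = failover_start_order(
    [("Web01", "Low"), ("SQL01", "High"), ("App01", "Medium"), ("DC01", "High")]
)
```

Here the two High VMs start first, followed by the Medium and then the Low VM, which is useful when surviving hosts do not have capacity to start everything at once.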
Backup
• There are products that support a WS2012 R2 Hyper-V cluster
• And then there are products that do it at least decently
• Test & research
• Do not trust sales & marketing