
High Availability (HA) Aidan Finn


Academic year: 2021

About Aidan Finn

• Technical Sales Lead at MicroWarehouse (Dublin)

• Working in IT since 1996

• MVP (Virtual Machine)

• Experienced with Windows Server/Desktop, System Center, virtualisation, and IT infrastructure

• @joe_elway

• http://www.aidanfinn.com

• http://www.petri.co.il/author/aidan-finn


Books


Agenda

• Item 1

• Item 2

• Item 3


High Availability

• From the Hyper-V perspective, HA is about infrastructure fault tolerance

• Example:

1. HostA is one of a number of hosts in a cluster

2. Every host in the cluster stores VM files on shared storage, such as a SAN

3. VM01 is running on HostA

4. HostA stops running

5. VM01 automatically fails over to another host in the cluster

6. VM01 automatically boots up

• There is some downtime for VM01 but it is minimized


A Typical Hyper-V Cluster

• Two or more hosts

• Each host is connected to a set of networks with special roles


The Heartbeat

You there? Yes

• Failover Clustering conducts health monitoring between nodes to detect when servers are no longer available

• When servers are unresponsive, clustering takes recovery action


Failover Cluster Virtual Adapter

• Failover Cluster Virtual Adapter (NetFT) is a virtual network adapter that builds fault-tolerant TCP connections across all available interfaces between nodes in the cluster

• NetFT is the mechanism by which clusters use multiple cluster-enabled adapters to communicate

• Seamless internode communication

– NetFT will dynamically and seamlessly switch cluster communication to a different network (based on priority) when a network fails

• Long story short: The cluster can use multiple enabled networks for cluster communications and is fault tolerant


Heartbeat Detection

• Runs on port 3343 (heartbeats are sent as UDP unicast)

• WS2012 R2 Hyper-V clusters:

– Nodes exchange heartbeats every 1 second

– Tolerates up to 10 seconds of missed heartbeats (5 on non-Hyper-V clusters) for nodes on the same subnet

– Tolerates up to 20 seconds of missed heartbeats (5 on non-Hyper-V clusters) for nodes on different subnets
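The heartbeat interval and tolerance are exposed as cluster common properties. A minimal sketch for inspecting them, assuming it is run on a cluster node (or a machine with the Failover Clustering tools) with the FailoverClusters module available:

```powershell
# Assumes the FailoverClusters module is installed and a cluster is reachable.
Import-Module FailoverClusters

# Delay values are in milliseconds between heartbeats;
# threshold values are the number of missed heartbeats tolerated.
Get-Cluster |
    Format-List Name, SameSubnetDelay, SameSubnetThreshold, CrossSubnetDelay, CrossSubnetThreshold
```

Multiplying a delay by its threshold gives the tolerance window described above, for example 1000 ms x 10 missed heartbeats = 10 seconds.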


What is Quorum?

• Quorum is when you have enough voters to come to an agreement

• Primary function of cluster is to keep mission critical services online

• It needs to accomplish this without causing corruption or confusion


Explaining Quorum


Quorum Basics

• Sticking with WS2012 R2 to keep this simple

• Two types of vote breaker or witness

– Witness disk

• A 1 GB LUN that is created on the shared storage just for this purpose

• Configured as a witness disk in the cluster

• Owner of the disk is the vote breaker in case of tied vote for quorum

– File Share Witness

• Originally intended for multi-site clusters


Explaining Quorum


Other Quorum Concepts

• Sequential host failure (WS2012)

– Scenario when one host after another after another drops offline

– Quorum can be still obtained, even if less than half the nodes are online

• Dynamic quorum (WS2012 R2)

– When the cluster rigs the quorum voting process

– Intended to give cluster more chance of staying online

– ALWAYS have a quorum witness
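To see how votes are currently being counted, the node and cluster properties can be inspected. This is a rough sketch, assuming a WS2012 or later cluster and the FailoverClusters module; it only reads state and changes nothing:

```powershell
Import-Module FailoverClusters

# 1 means dynamic quorum is enabled for the cluster.
(Get-Cluster).DynamicQuorum

# NodeWeight is the vote assigned to a node;
# DynamicWeight is the vote the cluster is counting for it right now.
Get-ClusterNode | Format-Table Name, State, NodeWeight, DynamicWeight -AutoSize

# Show the quorum type and the witness resource, if one is configured.
Get-ClusterQuorum
```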


Why Storage Is Needed

• The Hyper-V hosts provide HA to VMs

• Each host must have access to the VMs’ storage

• There is no replication from host-to-host inside a cluster

• All VMs are stored on shared storage

• Options include

– SAS storage area network (SAN)

– iSCSI SAN

– Fibre Channel SAN

– Fibre Channel over Ethernet (FCoE) SAN

– PCI RAID (WS2012 +)


Connectivity

• Each node is connected to the shared storage

• Exact same connectivity

• Dual path connectivity

– Multipath IO (MPIO) for traditional storage

– SMB Multichannel for SMB 3.0 storage

• All disks/LUNs/shares on the storage are assigned to all nodes

– Each host has equal access


Cluster-in-a-Box

• Take the requirements of a cluster

• Put it into a single enclosure

– 2+ blade servers with own power + networking

– JBOD or PCI RAID shared storage


Cluster Shared Volume (CSV)

• Microsoft’s cluster file system

• Makes the volume on the disk active/active across all nodes

• Store lots of VMs on a single volume

– All able to run on any node in the cluster

• Every node connected to the disk can read/write to the volume

• One node owns the volume and is responsible for metadata operations:

– Owner

– AKA CSV Coordinator

• No drive letter
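The owner (coordinator) of each CSV can be listed and moved from PowerShell. A small sketch, assuming the FailoverClusters module; "Cluster Disk 1" and "Host2" are placeholder names, not values from this deck's environment:

```powershell
Import-Module FailoverClusters

# List each CSV, its state, and the node that currently coordinates it.
Get-ClusterSharedVolume | Format-Table Name, State, OwnerNode -AutoSize

# Move coordination of one CSV to another node (placeholder names).
Move-ClusterSharedVolume -Name "Cluster Disk 1" -Node "Host2"

# CSVs are mounted as folders under C:\ClusterStorage instead of drive letters.
Get-ChildItem C:\ClusterStorage
```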


Redirected IO

• A process used by CSV (only)

– Nodes in cluster redirect storage IO to pass over cluster network via CSV coordinator

– Done on per-CSV basis, not per-cluster

• Used by W2008 R2 CSV for backup

– Caused concern

– Redirected IO NOT USED FOR BACKUP SINCE WS2012

• Redirected IO is used by WS2012 for:

– Very brief metadata operations: permissions, file metadata, file create, file open, file extend
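On WS2012 R2 the IO mode of each CSV can be checked per node. A minimal sketch, assuming a WS2012 R2 cluster (the cmdlet is not present on WS2012) and the FailoverClusters module:

```powershell
Import-Module FailoverClusters

# StateInfo reports Direct, FileSystemRedirected, or BlockRedirected per node;
# the reason column explains why a volume is in file system redirected mode.
Get-ClusterSharedVolumeState |
    Format-Table Name, Node, StateInfo, FileSystemRedirectedIOReason -AutoSize
```

A volume that stays redirected for long periods is worth investigating, since the brief metadata operations listed above should not hold it there.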


Controlling Redirected IO

• On W2008 R2:

– Redirected IO went across the cluster communications network

– Network with lowest routing metric (could be manipulated)

• On WS2012 and later:

– Uses SMB 3.0 and SMB Multichannel

– Can flood equal speed networks between nodes if not controlled

– Use SMB Multichannel Constraints to select which networks are used to talk to other cluster nodes
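A constraint is defined per remote server name. The sketch below assumes a node that should reach "Host2" only over NICs named "Cluster1" and "Cluster2"; all three names are placeholders, and the command would be repeated on each node for each of the other nodes:

```powershell
# Restrict SMB Multichannel traffic to Host2 to the named interfaces only.
# "Host2", "Cluster1" and "Cluster2" are hypothetical names for illustration.
New-SmbMultichannelConstraint -ServerName "Host2" -InterfaceAlias "Cluster1", "Cluster2"

# Review the constraints defined on this node.
Get-SmbMultichannelConstraint
```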


CSV Cache

• A read cache for virtual hard disks stored on the CSV

• Uses percentage of cluster node’s RAM for the cache

– Size of cache is set once per cluster

– Boost read performance, e.g. VDI boot storm

– (Get-Cluster).SharedVolumeBlockCacheSizeInMB = 512

• WS2012

– Up to 20% of nodes’ RAM could be assigned to cache

– Enable each required CSV for CSV Cache

– Get-ClusterSharedVolume "Cluster Disk 1" | Set-ClusterParameter CsvEnableBlockCache 1

– Required CSV to be disabled/enabled to start caching

• WS2012 R2


Other CSV 2.0 Improvements

• WS2012:

– Uses mount point instead of junction point

– Single synchronised VSS Snapshot for backup - no Redirected IO during backup

– Can enable BitLocker

– NTFS on CSV appears as CSVFS

– Supported for Hyper-V and Scale-Out File Server

• WS2012 R2:


Converged Networks

• In W2008 R2 we would have had 1 NIC or NIC team per required network

– Lots of NICs

– Very expensive to add 10 GbE or faster networking for peak usage

• Converged networks concept:

– Aggregate fewer NICs into an accumulation of bandwidth

– Divide that bandwidth up using WS2012+ QoS into required networks

– Makes adopting 10 GbE or faster more economic for medium/larger companies
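One common way to build a converged design is a NIC team, a virtual switch with weight-based QoS, and management OS virtual NICs. The sketch below is an assumption about how this could look, not the deck's own configuration; the adapter, team, switch, and vNIC names and the weights are all placeholders:

```powershell
# Team two physical NICs (placeholder names "NIC1" and "NIC2").
New-NetLbfoTeam -Name "ConvergedTeam" -TeamMembers "NIC1", "NIC2" `
    -TeamingMode SwitchIndependent -LoadBalancingAlgorithm HyperVPort

# Create a virtual switch on the team with weight-based minimum bandwidth QoS.
# Doing this over a remote session on the same NICs can drop the connection.
New-VMSwitch -Name "ConvergedSwitch" -NetAdapterName "ConvergedTeam" `
    -MinimumBandwidthMode Weight -AllowManagementOS $false

# Add one management OS virtual NIC per required network and give each a weight.
$weights = @{ "Management" = 10; "Cluster" = 10; "LiveMigration" = 30 }
foreach ($name in $weights.Keys) {
    Add-VMNetworkAdapter -ManagementOS -Name $name -SwitchName "ConvergedSwitch"
    Set-VMNetworkAdapter -ManagementOS -Name $name -MinimumBandwidthWeight $weights[$name]
}
```

The weights are relative shares that only apply under contention, so an idle network does not waste bandwidth.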


Non-Converged with iSCSI


Creating a Cluster

• Easier than ever

• Get the pieces right first:

– Storage

– Networking

• Process:

1. Validate the cluster – fix until it passes

2. Deploy the cluster
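The validate-then-deploy process can also be run from PowerShell. A minimal sketch, assuming two prepared hosts; the node names, cluster name, and IP address are placeholders:

```powershell
Import-Module FailoverClusters

# 1. Validate: produces a report to review; fix issues and rerun until it passes.
Test-Cluster -Node "Host1", "Host2"

# 2. Deploy: create the cluster with a name and a static management IP address.
New-Cluster -Name "HVC1" -Node "Host1", "Host2" -StaticAddress "10.0.1.50"
```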


Completing the Cluster

• Run Windows Update

– To get updates published via Windows Update

• Search for “Recommended update for Windows Server 2012 R2 Failover Clustering”


Configure Cluster Networks

• Rename the networks in Failover Cluster Manager

– I name them after the NICs that are on the networks

• Select your Live Migration network(s)
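Renaming can be scripted as well as done in Failover Cluster Manager. A short sketch, assuming the default network name "Cluster Network 1" and a NIC called "Management" (both placeholders for whatever the cluster actually shows):

```powershell
Import-Module FailoverClusters

# Review the networks the cluster discovered.
# Role: 0 = not used by the cluster, 1 = cluster only, 3 = cluster and client.
Get-ClusterNetwork | Format-Table Name, Role, Address -AutoSize

# Rename a cluster network to match the NIC that sits on it.
(Get-ClusterNetwork -Name "Cluster Network 1").Name = "Management"
```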


Configure Witness

• Cluster wizard will automatically find a suitable Disk Witness if one is available

– Make sure you check this
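Checking the witness, and changing it if the wizard picked the wrong disk, might look like the sketch below. The disk resource name and share path are placeholders:

```powershell
Import-Module FailoverClusters

# Check which quorum type and witness resource the wizard selected.
Get-ClusterQuorum

# Use a specific cluster disk as the witness (placeholder resource name).
Set-ClusterQuorum -NodeAndDiskMajority "Cluster Disk 1"

# Alternative for SMB 3.0 storage: use a file share witness (placeholder path).
# Set-ClusterQuorum -NodeAndFileShareMajority "\\FS1\Witness"
```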


Configure Storage

• SMB 3.0 Storage

– Create one share for File Share Witness

– Create one or more shares for storing VMs

– Add all hosts to a security group

– Add all admins to a security group

– Grant full control to the shares

• Disk storage

– Provision 1 * 1 GB disk for disk witness

– Provision 1 or more LUNs per node in the cluster to store VMs

– Connect the disks to all nodes in the cluster

– Activate (GPT) and format the disks in Disk Manager on one node
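The disk storage steps can be sketched in PowerShell instead of Disk Manager. This assumes the new LUN shows up as disk number 2 on the node and becomes "Cluster Disk 2" in the cluster; both are placeholders to check against the real environment before running anything that formats a disk:

```powershell
Import-Module FailoverClusters

# On one node: bring the LUN online, initialise it as GPT, and format it with NTFS.
Set-Disk -Number 2 -IsOffline $false
Initialize-Disk -Number 2 -PartitionStyle GPT
New-Partition -DiskNumber 2 -UseMaximumSize |
    Format-Volume -FileSystem NTFS -NewFileSystemLabel "CSV1" -Confirm:$false

# Add every disk that is visible to all nodes into the cluster.
Get-ClusterAvailableDisk | Add-ClusterDisk

# Convert a VM storage disk (not the witness) into a Cluster Shared Volume.
Add-ClusterSharedVolume -Name "Cluster Disk 2"
```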


• Simple orchestration of cluster node updates

• Determines updates needed, moves workloads off nodes for updates

– Uses Windows Update Agent direct from Microsoft or from WSUS

– Identifies node with least load

– Puts node in maintenance mode

– Verifies success, then moves to next node

• Maintains service availability without impacting cluster quorum

• Can be:

– Scheduled

Update Coordinator


Enabling Cluster Self-Updating

• Place all cluster nodes and cluster computer account in an OU for the cluster

• Delegate rights to cluster CAP

– Create/manage computer objects in this OU

– This is used to create another CAP/computer object for self-updating CAU
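Once the OU rights are delegated, self-updating can be enabled from PowerShell. A minimal sketch, assuming the ClusterAwareUpdating module is installed; the cluster name and schedule are placeholders:

```powershell
Import-Module ClusterAwareUpdating

# Add the self-updating role: here, the second Sunday of every month.
Add-CauClusterRole -ClusterName "HVC1" -DaysOfWeek Sunday -WeeksOfMonth 2 `
    -MaxFailedNodes 0 -EnableFirewallRules -Force

# Optionally start an updating run immediately instead of waiting for the schedule.
Invoke-CauRun -ClusterName "HVC1" -Force

# Check on a run that is in progress.
Get-CauRun -ClusterName "HVC1"
```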


Use FCM

• All management of HA VMs is locked out in Hyper-V Manager

• Use Failover Cluster Manager

• You can order failover of VMs using Virtual Machine Priority (High/Medium/Low)
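Virtual Machine Priority is stored as the Priority property of the VM's cluster group, so it can be set in bulk from PowerShell. A short sketch; "VM01" is the example VM name used earlier in this deck:

```powershell
Import-Module FailoverClusters

# Priority values: 3000 = High, 2000 = Medium, 1000 = Low, 0 = No Auto Start.
(Get-ClusterGroup -Name "VM01").Priority = 3000

# Review the priority and current owner of every clustered role.
Get-ClusterGroup | Format-Table Name, OwnerNode, State, Priority -AutoSize
```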


Backup

• There are products that support a WS2012 R2 Hyper-V cluster

• And then there are products that do it at least decently

• Test & research

• Do not trust sales & marketing
