vSphere HA and vSphere Fault Tolerance
Module 9
You Are Here
1. Course Introduction
2. Software-Defined Data Center
3. Creating Virtual Machines
4. vCenter Server
5. Configuring and Managing Virtual Networks
6. Configuring and Managing Virtual Storage
7. Virtual Machine Management
8. Resource Management and Monitoring
9. vSphere HA and vSphere Fault Tolerance
10. Host Scalability
11. vSphere Update Manager and Host Maintenance
12.
Importance
Most organizations rely on computer-based services like email, databases, and Web-based applications. The failure of any of these services can mean lost productivity and revenue.
Configuring highly available, computer-based services is extremely important for an organization to remain competitive in contemporary business environments.
Module Lessons
Lesson 1: Introduction to vSphere HA
Lesson 2: vSphere HA Architecture
Lesson 3: Configuring vSphere HA
Lesson 4: Introduction to vSphere Fault Tolerance
Lesson 1: Introduction to vSphere HA
Learner Objectives
By the end of this lesson, you should be able to meet the following objectives:
• Describe the options that you can configure to make your VMware vSphere® environment highly available
• Discuss the response of VMware vSphere® High Availability when a VMware ESXi™ host, a virtual machine, or an application fails
Protection at Every Level
vSphere makes it possible to reduce planned downtime, prevent unplanned downtime, and recover rapidly from outages.
[Figure: protection levels (Component, Server, Storage, Data, Site). Features shown: NIC Teaming, Storage Multipathing; vSphere vMotion, vSphere DRS; vSphere Storage vMotion; Site Recovery Manager; vSphere Replication, Third-Party Backup Solutions, vSphere Data Protection; vSphere HA and vSphere Fault Tolerance.]
vCenter Server Availability: Recommendations
Make VMware vCenter Server™ and the components that it relies on highly available.
vCenter Server relies on these major components:
• vCenter Server database:
– Create a cluster for the database.
• Authentication identity source:
– For example, VMware vCenter™ Single Sign-On™ and Active Directory.
– Set up with multiple redundant servers.
Methods for making vCenter Server available:
About vSphere HA
vSphere HA uses multiple ESXi hosts configured as a cluster to provide rapid recovery from outages and cost-effective high availability for applications running in virtual machines.
• Protects against server failures
• Protects against application failures
• Protects against datastore accessibility failures
• Protects virtual machines against network isolation
vSphere HA Scenarios: ESXi Host Failure
When a host fails, vSphere HA restarts the affected virtual machines on other hosts.
[Figure: vCenter Server, three ESXi hosts, and Virtual Machines A through F; Virtual Machines A and B appear a second time.]
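The restart behavior described above can be sketched with a small in-memory model. This is an illustration only: the `restart_on_failure` function, the host names, and the "fewest running virtual machines" placement rule are assumptions made for the example, not the placement logic that vSphere HA actually uses.

```python
from typing import Dict, List


def restart_on_failure(placement: Dict[str, List[str]], failed_host: str) -> Dict[str, List[str]]:
    """Return a new placement with the failed host's VMs restarted elsewhere.

    Assumption: each VM goes to the surviving host that currently runs the
    fewest VMs (ties broken by host name). Real vSphere HA placement weighs
    far more than a VM count.
    """
    survivors = {h: list(vms) for h, vms in placement.items() if h != failed_host}
    if not survivors:
        raise RuntimeError("no surviving hosts to restart virtual machines on")
    for vm in placement.get(failed_host, []):
        target = min(survivors, key=lambda h: (len(survivors[h]), h))
        survivors[target].append(vm)
    return survivors


# Hypothetical three-host cluster matching the slide's six virtual machines.
cluster = {
    "esxi-1": ["VM A", "VM B"],
    "esxi-2": ["VM C", "VM D"],
    "esxi-3": ["VM E", "VM F"],
}
after = restart_on_failure(cluster, "esxi-1")
```

In this model, the two virtual machines from the failed host end up spread across the two surviving hosts, and the original `cluster` mapping is left unmodified.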
vSphere HA Scenarios: Guest Operating System Failure
When a virtual machine stops sending heartbeats or the virtual machine process (vmx) crashes, vSphere HA resets the virtual machine.
[Figure: vCenter Server and a vSphere HA cluster of three ESXi hosts; Virtual Machines A through F, each running VMware Tools.]
vSphere HA Scenarios: Application Failure
When an application fails, vSphere HA restarts the affected virtual machine on the same host. Requires installation of VMware Tools™.
[Figure: vCenter Server and a vSphere HA cluster of three ESXi hosts; Virtual Machines A through F, each running an application.]
Importance of Redundant Heartbeat Networks
In a vSphere HA cluster, heartbeats have these characteristics:
• Heartbeats are sent between the master host and the slave hosts.
• They are used to determine whether a master host or slave host has failed.
• They are sent over a heartbeat network.
Redundant heartbeat networks ensure reliable failure detection.
Heartbeat network implementation:
Redundancy Using NIC Teaming
You can use NIC teaming to create a redundant heartbeat network on ESXi hosts.
Ports or port groups used must be VMkernel ports.
Redundancy Using Additional Networks
You can also create redundancy by configuring more heartbeat networks: On each ESXi host, create a second VMkernel port on a separate virtual switch with its own physical adapter.
Review of Learner Objectives
You should be able to meet the following objectives:
• Describe the options that you can configure to make your VMware vSphere® environment highly available
• Discuss the response of VMware vSphere® High Availability when a VMware ESXi™ host, a virtual machine, or an application fails
Lesson 2: vSphere HA Architecture
Learner Objectives
By the end of this lesson, you should be able to meet the following objectives:
• Describe the heartbeat mechanisms used by vSphere HA
• Identify and discuss other failure scenarios
vSphere HA Architecture: Agent Communication
To configure high availability, ESXi hosts are grouped into an object called a cluster.
[Figure: vCenter Server (vpxd) and three ESXi hosts (one master, two slaves), each running FDM and hostd, connected by the management network and attached to datastores.]
vSphere HA Architecture: Network Heartbeats
The master host sends periodic heartbeats to the slave hosts so that the slave hosts know that the master host is alive.
[Figure: vCenter Server, a master host and two slave hosts running Virtual Machines A through F, Management Network 1 and Management Network 2, and VMFS and NAS/NFS datastores.]
vSphere HA Architecture: Datastore Heartbeats
[Figure: vCenter Server, a master host and two slave hosts running Virtual Machines A through F, Management Network 1 and Management Network 2, VMFS and NAS/NFS datastores, and the Cluster Edit Settings window.]
Datastores are used as a backup communication channel to detect virtual
Additional vSphere HA Failure Scenarios
• Slave host failure
• Master host failure
• Host isolation
• Virtual machine storage failure:
– Virtual Machine Component Protection
• All Paths Down
• Permanent Device Loss
Failed Slave Host
When a slave host does not respond to the network heartbeat issued by the master host, the master vSphere HA agent tries to identify the cause.
[Figure: vCenter Server, a master host, a slave host, and a failed slave host running Virtual Machines A through F; primary and alternate heartbeat networks; file locks on a NAS/NFS datastore (lock file) and a VMFS datastore (heartbeat region).]
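One way to picture how the master might narrow down the cause when a slave host stops responding to network heartbeats is a small decision function. This is a sketch under stated assumptions: the three inputs (network heartbeat, datastore heartbeat, ping response), the `HostState` names, and the `classify_slave` function are illustrative and are not taken from the slides or from the FDM agent itself.

```python
from enum import Enum


class HostState(Enum):
    LIVE = "live"
    ALIVE_BUT_UNREACHABLE = "alive but unreachable on the heartbeat network"
    FAILED = "failed"


def classify_slave(network_heartbeat: bool,
                   datastore_heartbeat: bool,
                   responds_to_ping: bool) -> HostState:
    """Classify a slave host from the master's point of view.

    Assumption: any remaining sign of life (a datastore heartbeat or a ping
    reply) means the host is still running, so its virtual machines should
    not be restarted elsewhere.
    """
    if network_heartbeat:
        return HostState.LIVE
    if datastore_heartbeat or responds_to_ping:
        return HostState.ALIVE_BUT_UNREACHABLE
    return HostState.FAILED
```

Only the last case, where every signal is missing, would justify restarting the host's virtual machines on other hosts in this model.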
Failed Master Host
When the master host is placed in maintenance mode or crashes, the slave hosts detect that the master host is no longer issuing heartbeats.
[Figure: vCenter Server; a failed master host (MOID: 99) and slave hosts (MOID: 98 and MOID: 100) running Virtual Machines A through F; primary and alternate heartbeat networks; default gateway (isolation address); file locks on a NAS/NFS datastore (lock file) and a VMFS datastore (heartbeat region). MOID = Managed Object ID.]
Isolated Host
If the host does not observe election traffic on the management network and cannot ping its default gateway, the host is isolated.
[Figure: three ESXi hosts running Virtual Machines A through F, primary and alternate heartbeat networks, and the default gateway (isolation address).]
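The isolation rule stated on this slide is a conjunction of two conditions, which a short sketch makes explicit. The function name `is_isolated` and the choice to accept a list of ping results (so that more than one isolation address can be modeled) are assumptions for illustration.

```python
from typing import Iterable


def is_isolated(observes_election_traffic: bool,
                isolation_ping_results: Iterable[bool]) -> bool:
    """Return True when a host should consider itself isolated.

    A host is isolated only if BOTH conditions hold:
      1. it sees no election traffic on the management network, and
      2. it cannot ping any of its isolation addresses
         (by default, the default gateway).
    """
    return not observes_election_traffic and not any(isolation_ping_results)
```

For example, `is_isolated(False, [False])` is `True`, while a single successful ping, as in `is_isolated(False, [False, True])`, is enough for the host not to declare itself isolated.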
Design Considerations
Host isolation events can be minimized through good design:
• Implement redundant heartbeat networks.
• Implement redundant isolation addresses.
If host isolation events do occur, good design enables vSphere HA to determine whether the isolated host is still alive.
Implement datastores so that they are separated from the management network by using one or both of the following approaches:
• Fibre Channel over fiber optic
Virtual Machine Storage Failures
With an increasing number of virtual machines and datastores on each host, storage connectivity issues have high costs but are infrequent.
Connectivity problems can be due to:
• Network or switch failure
• Array misconfiguration
• Power outage
Virtual machine availability is affected:
• Virtual machines on affected hosts are difficult to manage.
• Applications with attached disks crash.
Virtual Machine Component Protection
Virtual Machine Component Protection (VMCP) protects against storage failures in a virtual machine.
Only vSphere HA clusters that contain ESXi 6 hosts can be used to enable VMCP.
[Figure: two ESXi hosts. Labels: runs on cluster enabled for vSphere HA; VMCP detects and responds to failures; application availability and remediation.]
Review of Learner Objectives
You should be able to meet the following objectives:
• Describe the heartbeat mechanisms used by vSphere HA
• Identify and discuss other failure scenarios
Lesson 3: Configuring vSphere HA
Learner Objectives
By the end of this lesson, you should be able to meet the following objectives:
• Recognize the prerequisites for creating and using a vSphere HA cluster
• Configure a vSphere HA cluster
About Clusters
A cluster is a collection of ESXi hosts and their associated virtual machines, configured to share their resources.
vCenter Server manages cluster resources like a single pool of resources.
Components such as vSphere HA and VMware vSphere® Distributed Resource Scheduler™ are configured on
vSphere HA Prerequisites
• All hosts must be licensed for vSphere HA.
• A cluster must contain at least two hosts.
• All hosts must be configured with static IP addresses. If you are using DHCP, you must ensure that the address for each host persists across reboots.
• All hosts must have at least one management network in common.
• All hosts must have access to the same virtual machine networks and datastores.
• For Virtual Machine Monitoring to work, VMware Tools™ must be installed.
• Only vSphere HA clusters that contain ESXi 6 hosts can be used to enable VMCP.
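The host-level prerequisites listed above lend themselves to a simple pre-flight check. The sketch below is a hypothetical model: the `Host` record, its field names, and the `check_prerequisites` function are assumptions for illustration and do not correspond to any vSphere API. It covers only the checks that can be expressed per host or across hosts (licensing, host count, addressing, shared networks, and shared datastores).

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Host:
    name: str
    licensed_for_ha: bool
    persistent_address: bool  # static IP, or DHCP address that survives reboots
    management_networks: FrozenSet[str]
    vm_networks: FrozenSet[str]
    datastores: FrozenSet[str]


def check_prerequisites(hosts: List[Host]) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if len(hosts) < 2:
        problems.append("cluster must contain at least two hosts")
    for h in hosts:
        if not h.licensed_for_ha:
            problems.append(f"{h.name}: not licensed for vSphere HA")
        if not h.persistent_address:
            problems.append(f"{h.name}: address does not persist across reboots")
    if hosts:
        # At least one management network must be common to every host.
        if not frozenset.intersection(*(h.management_networks for h in hosts)):
            problems.append("no management network common to all hosts")
        # Every host must see the same VM networks and the same datastores.
        if len({h.vm_networks for h in hosts}) > 1:
            problems.append("hosts do not share the same virtual machine networks")
        if len({h.datastores for h in hosts}) > 1:
            problems.append("hosts do not share the same datastores")
    return problems
```

Returning a list of problems rather than a single boolean lets an administrator see every unmet prerequisite in one pass.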
Configuring vSphere HA Settings
When you create a vSphere HA cluster or configure a cluster, you must configure settings that determine how the feature works.
vSphere HA Settings: Virtual Machine Monitoring (1)
You use Virtual Machine Monitoring settings to control the monitoring of virtual machines.
vSphere HA Settings: Datastore Heartbeating
A heartbeat file is created on the selected datastores and is used in the event of a management network failure.
vSphere HA Settings: Admission Control
vCenter Server uses admission control to ensure that:
• Sufficient resources are available in a cluster to provide failover protection
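As a rough illustration of what "sufficient resources for failover" can mean, the sketch below models a slot-based check. It assumes a slot is sized by the largest CPU and memory reservations among the virtual machines, and that the worst case is the loss of the hosts with the most slots. All function names, the tuple layout, and the numbers are hypothetical; real admission control has more policies and more inputs than this.

```python
from typing import List, Tuple

Reservation = Tuple[int, int]  # (CPU in MHz, memory in MB)
Capacity = Tuple[int, int]     # (CPU in MHz, memory in MB)


def slot_size(vm_reservations: List[Reservation]) -> Reservation:
    """Slot = largest CPU reservation and largest memory reservation."""
    cpu = max((r[0] for r in vm_reservations), default=0)
    mem = max((r[1] for r in vm_reservations), default=0)
    return max(cpu, 1), max(mem, 1)  # clamp to avoid dividing by zero


def slots_on_host(capacity: Capacity, slot: Reservation) -> int:
    """A host holds as many slots as its scarcer resource allows."""
    return min(capacity[0] // slot[0], capacity[1] // slot[1])


def has_failover_capacity(hosts: List[Capacity],
                          vm_reservations: List[Reservation],
                          host_failures_to_tolerate: int = 1) -> bool:
    """True if the VMs still fit after the largest hosts are lost."""
    slot = slot_size(vm_reservations)
    counts = sorted(slots_on_host(h, slot) for h in hosts)
    keep = max(0, len(counts) - host_failures_to_tolerate)
    return sum(counts[:keep]) >= len(vm_reservations)


# Hypothetical cluster: three identical hosts and three virtual machines.
hosts = [(10000, 32768)] * 3
vms = [(500, 1024), (1000, 2048), (2000, 4096)]
ok = has_failover_capacity(hosts, vms)
```

Here the slot is (2000 MHz, 4096 MB), each host holds five slots, and ten slots remain after one host failure, so three virtual machines fit.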
vSphere HA Settings: Advanced Options
To customize vSphere HA behavior, you set advanced vSphere HA options.
To force the cluster not to use the default isolation address (default gateway):
• das.usedefaultisolationaddress = false
To force the cluster to ping alternate isolation addresses:
• das.isolationaddressX = pingable address
To force the cluster to wait beyond the default 30-second isolation action window:
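The two isolation-address options named above (das.usedefaultisolationaddress and das.isolationaddressX) can be assembled programmatically before they are entered in the cluster settings. The helper below is a sketch: the function name `isolation_options` is hypothetical, and numbering X from 0 with at most ten addresses is an assumption that should be checked against the documentation for your vSphere version. It covers only those two options and only builds a dictionary of option names and values; it does not talk to vCenter Server.

```python
import ipaddress
from typing import Dict, List


def isolation_options(addresses: List[str]) -> Dict[str, str]:
    """Build advanced-option key/value pairs for alternate isolation addresses.

    Assumptions: X in das.isolationaddressX starts at 0, and at most ten
    addresses are accepted.
    """
    if not 1 <= len(addresses) <= 10:
        raise ValueError("provide between 1 and 10 isolation addresses")
    options = {"das.usedefaultisolationaddress": "false"}
    for index, address in enumerate(addresses):
        ipaddress.ip_address(address)  # raises ValueError if not a valid IP
        options[f"das.isolationaddress{index}"] = address
    return options


# Example with addresses from the documentation range 192.0.2.0/24.
opts = isolation_options(["192.0.2.1", "192.0.2.2"])
```

Validating each address up front catches typing mistakes before they reach the cluster, where an unreachable isolation address could contribute to a false isolation event.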
Configuring Virtual Machine Overrides
You can override the vSphere HA settings that are set on a cluster for individual virtual machines in that cluster.
Before changing the networking settings on an ESXi host (adding port groups, removing virtual switches, and so on), you must suspend the Host Monitoring feature and place the host in maintenance mode.
This practice prevents unwanted attempts to fail over virtual machines.
Cluster Resource Reservation
The Resource Reservation tab reports total cluster CPU, memory, memory overhead, storage capacity, the capacity reserved by virtual machines, and how much capacity is still available.
Monitoring Cluster Status
Lab 21: Using vSphere HA
Demonstrate vSphere HA functionality
1. Create a Cluster Enabled for vSphere HA
2. Add Your ESXi Host to a Cluster
3. Test vSphere HA Functionality
4. View the vSphere HA Cluster Resource Usage
5. Manage vSphere HA Slot Size
6. Configure a vSphere HA Cluster with Strict Admission Control
7. Prepare for Upcoming Labs
Review of Learner Objectives
You should be able to meet the following objectives:
• Recognize the prerequisites for creating and using a vSphere HA cluster
• Configure a vSphere HA cluster
Lesson 4: Introduction to vSphere Fault Tolerance
Learner Objectives
By the end of this lesson, you should be able to meet the following objectives:
• List VMware vSphere® Fault Tolerance requirements and limitations
• Describe vSphere Fault Tolerance operation
vSphere Fault Tolerance
vSphere Fault Tolerance provides instantaneous failover and continuous availability:
• Zero downtime
• Zero data loss
• No loss of TCP connections
[Figure: primary and secondary virtual machines on ESXi, labeled "Instantaneous Failover" and "Fast Checkpointing".]
vSphere Fault Tolerance Features (1)
vSphere Fault Tolerance protects mission-critical, high-performance applications regardless of the operating system used.
vSphere Fault Tolerance:
• Supports up to four virtual CPUs
• Supports up to 64 GB of memory
• Supports VMware vSphere® vMotion® for primary and secondary virtual machines
• Creates a secondary copy of all virtual machine files, including disks
• Provides fast checkpoint copying to keep primary and secondary CPUs synchronized
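The two numeric limits on this slide (four virtual CPUs and 64 GB of memory) can be expressed as a small sizing check. The constant and function names below are hypothetical, and the check covers only these two limits, not the full set of vSphere Fault Tolerance requirements.

```python
from typing import List

MAX_FT_VCPUS = 4        # "up to four virtual CPUs"
MAX_FT_MEMORY_GB = 64   # "up to 64 GB of memory"


def ft_limit_violations(vcpus: int, memory_gb: float) -> List[str]:
    """Return the sizing limits that a virtual machine exceeds, if any."""
    violations = []
    if vcpus > MAX_FT_VCPUS:
        violations.append(f"{vcpus} vCPUs exceeds the limit of {MAX_FT_VCPUS}")
    if memory_gb > MAX_FT_MEMORY_GB:
        violations.append(f"{memory_gb} GB exceeds the limit of {MAX_FT_MEMORY_GB} GB")
    return violations
```

A virtual machine with 4 vCPUs and 64 GB of memory returns an empty list, while one with 8 vCPUs and 128 GB returns two violations.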
vSphere Fault Tolerance Features (2)
vSphere Fault Tolerance:
• Supports thin-provisioned disks
• Supports memory virtualization hardware assist
• Supports Enhanced vMotion Compatibility clusters
How vSphere Fault Tolerance Works with vSphere HA and vSphere DRS
vSphere Fault Tolerance works with vSphere HA and vSphere DRS.
vSphere HA:
• Is required for vSphere Fault Tolerance
• Restarts failed virtual machines
• Is vSphere Fault Tolerance aware
vSphere DRS:
• Selects the virtual machine’s location at power-on
• Does not balance fault-tolerant virtual machines in a balanced cluster
[Figure: three ESXi hosts showing the primary machine, the secondary machine, and a new secondary machine.]
Redundant VMDKs
vSphere Fault Tolerance creates two complete virtual machines.
Each virtual machine has its own .vmx configuration file and .vmdk files. Each of these virtual machines can be on a different datastore.
[Figure: primary and secondary virtual machines, each with a .vmx file and .vmdk files, on Datastore 1 and Datastore 2.]
vSphere Fault Tolerance Checkpoint
vSphere Fault Tolerance supports multiple processors.
Changes on the primary machine are not processed on the secondary machine. The memory is updated on the secondary.
[Figure: two ESXi hosts connected by the FT network. Labels: Input, Result X (on each host).]
vSphere vMotion: Precopy
During a vSphere vMotion migration, a second virtual machine is created on the destination host. Then the memory of the source virtual machine is copied to the destination.
[Figure: VM A on the source and destination hosts, connected by the vSphere vMotion network, with the memory bitmap and memory precopy; the end user connects through the virtual machine port group.]
vSphere vMotion: Memory Checkpoint
In vSphere vMotion migration, checkpoint data is the last bit of memory that keeps changing.
[Figure: VM A on the source and destination hosts, connected by the vSphere vMotion network, with the memory bitmap and checkpoint data; the end user connects through the virtual machine port group.]
vSphere Fault Tolerance Fast Checkpointing
The SMP FT checkpoint interval is dynamic by default. It adapts to maximize the workload performance and can range from as small as a few milliseconds to as large as several hundred milliseconds.
[Figure: primary host and secondary host connected by the Fault Tolerance network. The checkpoint carries vmx config, devices, disks, and VM memory.]
Shared Files
vSphere Fault Tolerance has shared files:
• shared.vmft prevents UUID change.
• .ftgeneration is for the split-brain condition.
[Figure: primary host and secondary host sharing the shared.vmft and .ftgeneration files.]
shared.vmft File
The shared.vmft file, which is found on a shared datastore, is the vSphere Fault Tolerance metadata file and contains the primary and secondary instance UUIDs and the primary and secondary vmx paths.
[Figure: labels UUID-1, UUID-2, Ref: UUID-1, UUID-1.]
Enabling vSphere Fault Tolerance on a Virtual Machine
You can turn on vSphere Fault Tolerance for a virtual machine through the VMware vSphere® Web Client.
Review of Learner Objectives
You should be able to meet the following objectives:
• List VMware vSphere® Fault Tolerance requirements and limitations
• Describe vSphere Fault Tolerance operation
Lesson 5: vSphere Replication and vSphere Data Protection
Learner Objectives
By the end of this lesson, you should be able to meet the following objectives:
• Describe VMware vSphere® Replication™
• Identify vSphere® Data Protection™ requirements
• List vSphere Data Protection sizing guidelines
• Describe vSphere Data Protection installation and configuration
About vSphere Replication
vSphere Replication is an extension to vCenter Server.
It provides hypervisor-based virtual machine replication and recovery.
[Figure: source vSphere site and target vSphere site linked by vSphere Replication.]
How Replication Works
vSphere Replication enables replication of a virtual machine from a source site to a target site, monitoring and managing the status of the replication, and recovering the virtual machine at the target site.
Replication Between Two Sites
Steps for Full Recovery
vSphere Replication integrates with Volume Shadow Copy Service through VMware Tools.
1. Right-click and select Recover.
2. Select a target folder.
3. Select a target resource.
4. Click Finish.
About vSphere Data Protection
vSphere Data Protection is a robust, easily deployed, disk-based backup and recovery solution.
vSphere Data Protection Requirements and Architecture
vSphere Data Protection requires vCenter Server, either the Windows implementation or vCenter Server™ Appliance™.
Creating and Editing a vSphere Data Protection Backup Job
You create and edit a backup job on the Backup tab of the vSphere Data Protection UI in the vSphere Web Client.
Performing Restores with vSphere Data Protection
You can restore an entire virtual machine from the Restore tab in the vSphere Data Protection UI:
• The administrator can browse the list of protected virtual machines and select one or more restore points.
Review of Learner Objectives
You should be able to meet the following objectives:
• Describe VMware vSphere® Replication™
• Identify vSphere® Data Protection™ requirements
• List vSphere Data Protection sizing guidelines
• Describe vSphere Data Protection installation and configuration