(1)

VMware- Customer Support Day

November 16, 2010

(2)

Agenda

9:30 AM - Welcome/Kick-Off

Bob Good, Manager, Systems Engineering

9:40 AM - Support Engagement

Laura Ortman, Director, Global Support Services (GSS)

10:00 AM - Storage Best Practices

Ken Kemp, Escalation Engineer

11:00 AM - Keynote – VMware Virtualization and Cloud Management

Doug Huber, Director, Systems Engineering

12:00 PM - Lunch/Q&A with the experts (Group A) / VMware Express – Private Viewing (Group B)

1:00 PM - Lunch/Q&A with the experts (Group B) / VMware Express – Private Viewing (Group A)

2:00 PM - View 4.5 Overview/Network Best Practices

David Garcia, Release Readiness Manager

3:15 PM - Break

Interactive Session

(3)

Storage Best Practices

Ken Kemp – Escalation Engineer, Global Support Services

(4)

Agenda

Performance

SCSI Reservations

Performance Monitoring

• esxtop

Common Storage Issues

• Snapshot LUNs

• Virtual Machine Snapshot

• iSCSI Multi Pathing

• All Paths Dead (APD)

(5)

Disk subsystem bottlenecks cause more performance problems than CPU or RAM deficiencies

Your disk subsystem is considered to be performing poorly if it is experiencing:

• Average read and write latencies greater than 20 milliseconds

• Latency spikes greater than 50 milliseconds that last for more than a few seconds

Performance
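
The two thresholds on this slide can be turned into a simple check. The following is a minimal Python sketch, not a VMware tool: the function name, the evenly spaced samples, and the reading of "more than a few seconds" as more than 3 seconds are illustrative assumptions.

```python
from statistics import mean

AVG_LIMIT_MS = 20.0     # average latency threshold from the slide
SPIKE_LIMIT_MS = 50.0   # spike threshold from the slide
SPIKE_SECONDS = 3       # assumption: "more than a few seconds" read as > 3 s


def is_disk_performing_poorly(latencies_ms, sample_interval_s=1.0):
    """Return True if the latency samples breach either threshold.

    latencies_ms: evenly spaced latency samples in milliseconds.
    sample_interval_s: seconds between samples (assumed, not from the slide).
    """
    if not latencies_ms:
        return False
    if mean(latencies_ms) > AVG_LIMIT_MS:
        return True
    # Look for a run of consecutive spike samples longer than SPIKE_SECONDS.
    run = 0
    for value in latencies_ms:
        run = run + 1 if value > SPIKE_LIMIT_MS else 0
        if run * sample_interval_s > SPIKE_SECONDS:
            return True
    return False
```

Samples could come from any latency source, for example values exported from esxtop; the export step is outside this sketch.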

(6)

Performance vs. Capacity comes into play at two main levels

• Physical drive size

• Hard disk performance doesn’t scale with drive size

• In most cases the larger the drive the lower the performance.

• LUN size

• Larger LUNs tend to host more VMs, which can lead to contention on that particular LUN

• LUN size is often related to physical drive size, which can compound performance problems

Performance vs. Capacity

(7)

You need 1 TB of space for an application

• 2 x 500GB 15K RPM SAS drives = ~300 IOPS

• Capacity needs satisfied, Performance low

• 8 x 146GB 15K RPM SAS drives = ~1168 IOPS

• Capacity needs satisfied, Performance high

Performance – Physical Drive Size
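
The arithmetic behind this comparison can be sketched in a few lines of Python. The per-drive IOPS figures are not stated on the slide; they are back-derived from the slide totals (300 / 2 and 1168 / 8) and should be treated as assumptions. RAID overhead is ignored.

```python
def array_estimate(drive_count, drive_size_gb, iops_per_drive):
    """Return (raw capacity in GB, aggregate IOPS) for identical drives."""
    return drive_count * drive_size_gb, drive_count * iops_per_drive


# Assumed per-drive rates, derived from the slide totals.
few_large = array_estimate(2, 500, 150)    # 2 x 500GB 15K RPM SAS
many_small = array_estimate(8, 146, 146)   # 8 x 146GB 15K RPM SAS

print("2 x 500GB:", few_large)
print("8 x 146GB:", many_small)
```

Both layouts reach roughly 1 TB, but the layout with more spindles delivers several times the aggregate IOPS.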

(8)

SCSI Reservations – when an initiator requests/reserves exclusive use of a target (LUN)

• VMFS is a clustered file system

• Uses SCSI reservations to protect metadata

• To preserve the integrity of VMFS in multi host deployments

• One host has complete access to the LUN exclusively

• A reboot or release command will clear the reservation

• The virtual machine monitor uses SCSI-2 reservations

SCSI Reservations – Why?

(9)

What causes SCSI Reservations?

• When a VMDK is created, deleted, placed in REDO mode, has a snapshot (delta) file, or is migrated (reservations from both the source ESX and the target ESX), or when the VM is suspended (because a suspend file is written).

• When a VMDK is created from a template, SCSI reservations are taken on the source and the target

• When a template is created from a VMDK, a SCSI reservation is generated

SCSI Reservations

(10)

• Simplify/verify deployments so that virtual machines do not span more than one LUN

• This will ensure SCSI reservations do not impact more than one LUN

• Determine if any operations are occurring on a LUN on which you want to perform another operation

• Snapshots

• VMotion

• Template Deployment

• Use a single ESX server as your deployment server to limit/prevent conflicts with other ESX servers attempting to perform similar operations

SCSI Reservation Best Practice

(11)

• Inside vCenter, limit access to actions that initiate reservations to administrators who understand the effects of reservations, to control WHO can perform such operations

• Schedule virtual machine reboots so that only one LUN is impacted at any given time

• A power on and a power off are considered separate operations, and both will create a reservation

• VMotion

• Use care when scheduling backups. Consult the backup provider best practices information

• Use care when scheduling Anti Virus scans and updates

SCSI Reservation Best Practice - Continued

(12)

• Monitoring /var/log/vmkernel for:

• 24/0 0x0 0x0 0x0

• SYNC CR messages

• In a shared environment like ESX there will be some SCSI reservations, which is normal. Seeing hundreds of them is not.

• Check for Virtual Machines with snapshots

• Check for HP management agents still running the storage agent

• Check LUN presentation for Host mode settings

• Call VMware support to dig into it further

SCSI Reservation Monitoring
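
Counting the two message patterns named on this slide is easy to script. The sketch below is a hedged illustration: the function names are made up, and it only does substring matching on the two strings from the slide, without assuming anything else about the vmkernel log format.

```python
RESERVATION_MARKERS = ("24/0 0x0 0x0 0x0", "SYNC CR")


def count_reservation_messages(lines):
    """Count log lines containing each reservation-related marker."""
    counts = {marker: 0 for marker in RESERVATION_MARKERS}
    for line in lines:
        for marker in RESERVATION_MARKERS:
            if marker in line:
                counts[marker] += 1
    return counts


def scan_vmkernel_log(path="/var/log/vmkernel"):
    """Scan the log file named on the slide and return the marker counts."""
    with open(path, errors="replace") as handle:
        return count_reservation_messages(handle)
```

A handful of matches is expected in a shared environment; counts in the hundreds are the signal to investigate further.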

(13)

Storage Performance Monitoring

Ken Kemp – Escalation Engineer, Global Support Services

(14)

esxtop

(15)

DAVG = Raw response time from the device

KAVG = Amount of time spent in the VMkernel, aka. virtualization overhead

GAVG = Response time that would be perceived by virtual machines

D + K = G

esxtop - Continued
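
The relationship D + K = G can be expressed directly. This is a small Python sketch with hypothetical helper names; the input values would be read from the esxtop counters described above.

```python
def gavg_ms(davg_ms, kavg_ms):
    """Guest-perceived latency: device latency plus VMkernel latency."""
    return davg_ms + kavg_ms


def kernel_share(davg_ms, kavg_ms):
    """Fraction of guest-perceived latency spent in the VMkernel."""
    total = gavg_ms(davg_ms, kavg_ms)
    return kavg_ms / total if total else 0.0
```

A high kernel share with a low DAVG points at the host rather than the array, which is one way to decide where to look first.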

(16)

esxtop - Continued

(17)

esxtop - Continued

(18)

• What are correct values for these response times?

• As with all things revolving around performance, it is subjective

• Obviously the lower these numbers are the better

• ESX will continue to function with nearly any response time; how well it functions is another matter

• Any command that is not acknowledged by the SAN within 5000ms (5 seconds) will be aborted. This is where perceived disk performance takes a sharp dive

esxtop - Continued

(19)

Common Storage Issues

Ken Kemp – Escalation Engineer, Global Support Services

(20)

How is a LUN detected as a snapshot in ESX?

• When an ESX 3.x server finds a VMFS-3 LUN, it compares the SCSI_DiskID information returned from the storage array with the SCSI_DiskID information stored in the LVM header.

• If the two IDs do not match, the VMFS-3 volume is not mounted.

A VMFS volume on ESX can be detected as a snapshot for a number of reasons:

• LUN ID change

• SCSI version supported by array changed (firmware upgrade)

• Identifier type changed – Unit Serial Number vs NAA ID

Snapshot LUNs

(21)

Resignaturing Methods

ESX 3.5

• Enable LVM resignaturing on the first ESX host

Set Configuration > Advanced Settings > LVM > LVM.EnableResignaturing to 1.

ESX 4

• Single-volume resignaturing

Configuration > Storage > Add Storage > Disk / LUN

Select the volume to resignature > select Mount or Resignature

Snapshot LUNs - Continued

(22)

What is a Virtual Machine Snapshot?

• A snapshot captures the entire state of the virtual machine at the time you take the snapshot.

• This includes:

  • Memory state – The contents of the virtual machine's memory.

  • Settings state – The virtual machine settings.

  • Disk state – The state of all the virtual machine's virtual disks.

Virtual Machine Snapshots

(23)

Common issues:

• Snapshots filling up a datastore

  • Offline commit

  • Clone VM

• Parent has changed

  • Contact VMware Support

• No snapshots found

  • Create a new snapshot, then commit

Virtual Machine Snapshot - Continued

(24)

ESX 4, Set Up Multi-pathing for Software iSCSI

Prerequisites:

• Two or more NICs.

• Unique vSwitch.

• Supported iSCSI array.

• ESX 4.0 or higher

ESX4 iSCSI Multi-pathing

(25)

Using the vSphere CLI, connect the software iSCSI initiator to the iSCSI VMkernel ports.

• Repeat this command for each port:

esxcli swiscsi nic add -n <port_name> -d <vmhba>

• Verify that the ports were added to the software iSCSI initiator by running the following command:

esxcli swiscsi nic list -d <vmhba>

• Use the vSphere Client to rescan the software iSCSI initiator.

ESX4 iSCSI Multi-pathing - Continued

(26)

This example shows how to connect the software iSCSI initiator vmhba33 to VMkernel ports vmk1 and vmk2.

• Connect vmhba33 to vmk1:

esxcli swiscsi nic add -n vmk1 -d vmhba33

• Connect vmhba33 to vmk2:

esxcli swiscsi nic add -n vmk2 -d vmhba33

• Verify vmhba33 configuration:

esxcli swiscsi nic list -d vmhba33

ESX4 iSCSI Multi-pathing - Continued
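
When several VMkernel ports need binding, the command lines from these slides can be generated rather than typed. This Python sketch only builds the strings; it does not run esxcli, and the helper name is hypothetical.

```python
def bind_commands(vmhba, vmk_ports):
    """Build the esxcli lines that bind VMkernel ports to a software iSCSI adapter.

    Uses the 'esxcli swiscsi nic add/list' forms shown on the slides.
    """
    commands = [f"esxcli swiscsi nic add -n {port} -d {vmhba}" for port in vmk_ports]
    # Finish with the verification command for the same adapter.
    commands.append(f"esxcli swiscsi nic list -d {vmhba}")
    return commands


for line in bind_commands("vmhba33", ["vmk1", "vmk2"]):
    print(line)
```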

(27)

The Issue

• You want to remove a LUN from a vSphere 4 cluster

• You move or Storage vMotion the VMs off the datastore that is being removed (otherwise, the VMs would hard crash if you just yank out the datastore)

• After removing the LUN, VMs on OTHER datastores become unavailable (not crashing, but periodically unavailable on the network)

• The ESX logs show a series of errors starting with "NMP"

All Paths Dead (APD)

(28)

Workaround 1

• In the vSphere Client, vacate the VMs from the datastore being removed (migrate or Storage vMotion).

• In the vSphere Client, remove the datastore.

• In the vSphere Client, remove the storage device.

• Only then, in your array management tool, remove the LUN from the host.

• In the vSphere Client, rescan the bus.

Workaround 2

• Only available in ESX/ESXi 4 U1

• esxcfg-advcfg -s 1 /VMFS3/FailVolumeOpenIfAPD

All Paths Dead - Continued

(29)

4.1 Storage Additions

Storage I/O Control, which prioritizes I/O from virtual machines that reside on different ESX servers but use the same shared VMFS volume.

New I/O statistics, including NFS throughput and latency counters.

vStorage APIs for Array Integration (VAAI), which allow certain storage operations, such as cloning and zeroing, to be offloaded from the host to the array.

(30)

Questions

(31)

VMware View 4.5 Overview

David Garcia Jr - Global Support Services

(32)

Agenda

View (Overview)

User Experience (Highlights)

Performance & Scalability (Tiered Storage, View Composer)

Management (View Manager)

(33)

Hypervisor Performance

Storage Infrastructure Performance

vCenter Performance

Client Performance

VMware View Performance

[Figure: VDI deployment scope, covering the storage infrastructure, network infrastructure, server and virtualization stack, vCenter Server, View Server, and remote clients]

(34)

View 4.5 Architecture overview

Support for vSphere 4.1 and vCenter 4.1 - Delivers integration with the most widely-deployed desktop virtualization platform in the industry.

Takes advantage of optimizations for View virtual desktops.

Lowest Cost Reference Architectures - VMware has worked with partners such as Dell, HP, Cisco, NetApp, and EMC to provide prescriptive reference architectures to enable

(35)

View 4.5 Product highlights

Full Windows 7 Support

View Manager Enhancements

• Increasing Scale and Efficiency

• System and User Diagnostics

• Extensibility

PCoIP Updates: Smart Card Support

View Client with Local Mode (aka Offline Support)

Support for vSphere 4.1

(36)

Native Windows Client

Thin-Client Support

Thick clients or refurbished PCs

Broad industry support

Flexible client access from multiple devices

Mac OS 10.5+

Native Mac Client (RDP)

NEW

(37)

Single Sign On

Authentication to Virtual Desktop

Windows Username/Password

Smart Cards/Proximity Cards

Client Based (MAC Address)

USB connected biometric devices

Integration with MS AD

• No Domain change, schema change, password change

Supports "Tap and Go" functionality

• Integrates with SSO Vendors – Imprivata, Sentillion, Juniper, etc

Simplified Sign-on

Connection Server

Single sign on to virtual desktop and apps

(38)

Web download portal

Enhanced capability to manage distribution of full View Windows Client including PCoIP, ThinPrint and USB redirection features

Ability to distribute current and legacy versions of View Client

Broker URL automatically passed to Windows client upon launch

Experimental Java based Mac and Linux Web Access no longer supported (use installable Mac Client in View 4 and View Open Client for Linux)

(39)

Value propositions of local desktops

For IT

Extend View benefits to mobile users with laptops

Enable Bring Your Own PC (BYOPC) programs for employees & contractors

Extend View benefits to remote/branch offices with poor/unreliable networks

For End Users

Mobility – check out VM to local laptop for offline usage

Disaster Recovery – VM replicated to datacenter

Flexibility – BYOPC and personal desktop productivity

[Figure: View Client with Local Mode on Windows, running Guest VM 1 and Guest VM 2]

(40)

High-level features of View in 2010, with details:

• Run anywhere: After initial checkout, the desktop can be used at home or on the road without network connectivity.

• Broad hardware support: Works with almost any modern laptop today.

• Encrypted and secure: AES encryption of the desktop and centrally managed policies to control access and usage.

• Data centralization & control: Admin can pull all data back up to the datacenter on demand.

• High quality user experience: Support for Win7 Aeroglass effects, DirectX 9 w/3D, distortion-free sound & multimedia.

• Reasonable CAPEX costs: Up and running with a single ESX box & local storage!

• Disaster recovery options: Can schedule data replication to server for rapid, seamless recovery from hardware loss or failure.

• Single Image Management w/View: Works off the same management infrastructure & images as the rest of the View deployment.

High level features of local desktops in 2010

(41)

View 4.5 major management feature highlights

Up to 10,000 Desktops

Admin Features

• High perf GUI

• Role based Admin

• Event DB, Dashboard

• View PowerCLI extension

Storage Optimization

• Tiered storage

• Disposable disk/Local swap file redirection

• VM on local storage

Composer Enhancements

• Sysprep support

• Fast refresh

• Persistent Disk Management

Simplified Sign-on

• Smart-card/Proximity card

• Client (MAC/device ID), support of Kiosk mode

ThinApp Integration

• App repo scanning

• Pool/Desktop ThinApp assignment

(42)

Core broker: Performance & scalability

10,000 VM Pod (5 connection servers + 2 standby)

Federated Pool Management

Connection server instance in a cluster will be responsible for VM operations on VMs belonging to the same pool

Reduced locking/synchronization overhead

Enhanced tracker w/ caching

Reduced extra reloading from ADAM Datastore

Refresh UI with 5,000 objects in seconds!

(43)

View Composer improvements overview

Customization/Provisioning

• Sysprep support

• Refresh, Recompose and Rebalance for Floating Pool

Storage Performance and Optimization

• Tiered support

• Optimization

• Disposable disk and Local swap file redirect

• Allow creation of linked-clones on local storage

Management

• Full Management of Persistent Disk (formerly known as UDD)

(44)

View Composer: Tiered storage

Allow master VM replica to reside in a separate datastore

Use high performance storage to boost performance (e.g. reboot, virus scan)

(45)

View Composer: Other storage optimization

Local swap file redirect

• Does not reduce storage use, but allows cheap local storage to hold individual VM swap files

Allow creation of linked-clones using local data stores

• Wizard will not filter out local data stores for use of VM cloning

• Allow use of cheap local storage for non-persistent pool VMs

(46)

View Composer: Customization/provisioning

• Sysprep support

  • Sysprep helps resolve the SID management issue: a new SID will be generated for each cloned VM

• The Three 'R's

  • Refresh

  • Recompose

  • Rebalance

(47)

View Composer: Enhanced management functions

Persistent Disk (formerly known as UDD) Management

• Detach/Migrate/Archive/Reattach

• Managed as "first class object"

Garbage collection scripts

• Remove one or more linked-clone VM(s) by name(s) from View, SVI, VC, and AD

(48)

Administration improvements in 2010

Provides Increased Management Efficiency:

Monitoring, Diagnostics and Supportability

Features

• Scalable Admin UI in Flex

• Role-based Administration

• System and End-User Troubleshooting

• Monitoring Dashboard

• Diagnostics

• Supportability

• Reporting and Auditing Enablement

• Events

• View Management Pack for SCOM

(49)

Scalable admin UI

Based on Adobe Flex

Rich application feel

Scalability

Easy navigation

Cross-Platform

(50)

Role-based administration

Delegated administration

Flexible Roles

Helpdesk, etc

Custom roles

LDAP-based access control on folders

(51)

System and end-user troubleshooting: Dashboard

Surface key information to administrators

Drill-down as needed

• Locate root cause

System health status

View components

vCenter components

Status of desktops

Status of client-hosted endpoints

• Datastore usage

VMs on storage LUN

(52)

Reporting and auditing enablement: Events

Formally defined events

• Events have a unique well defined identifier

• Standard attributes include module, user, desktop, machine

Provides a unified view across View components

• No more needing to review logs on each broker, agent!

Managed with a configurable database

Accessible with:

• VMware View Administrator

• Direct access (SQL) for other reporting tools

• Powershell

• Vdmadmin provides textual reports (csv or xml)

(53)

View management pack for SCOM

(54)

Links & Resources

Documentation, Release Notes http://www.vmware.com/support/pubs/view_pubs.html

VMware View 4.5 Release Notes

VMware View Architecture Planning Guide

VMware View Administrator's Guide

VMware View Installation Guide

VMware View Upgrade Guide

VMware View Integration Guide

Technical Papers http://www.vmware.com/resources/techresources/cat/91,156

VMware View Optimization Guide for Windows 7 (VMware, Ensynch; 09/27/2010)

Vblock Powered Solutions for VMware View (VMware, Cisco, EMC; 09/09/2010)

Virtual Desktop Sizing Guide with VMware View 4.0 and VMware vSphere 4.0 Update1 (Mainline; 05/21/2010)

Application Presentation to VMware View Desktops with Citrix XenApp (VMware; 05/20/2010)

(55)

Questions

(56)

vSphere Networking Best Practices

David Garcia Jr - Global Support Services

(57)

Agenda

vSwitches & Portgroups

NIC Teaming

Link Aggregation (802.3ad static mode)

Failover Configuration

Spanning Tree Protocol

Network I/O Control

Load-Based Teaming

VMDirectPath, VMXNET3, FCoE CNA & 10GigE

VLAN Trunking (802.1q)

Tips & Tricks

Troubleshooting Tips

Must Read & KB Links

(58)

Designing the Network

How do you design the virtual network for performance and availability while maintaining isolation between the various traffic types (e.g. VM traffic, VMotion, and Management)?

Starting point depends on:

Number of available physical ports on server

Required traffic types

2 NIC minimum for availability, 4+ NICs per server preferred

802.1Q VLAN trunking highly recommended for logical scaling (particularly with low NIC port servers)

Examples are meant as guidance and do not represent strict requirements in terms of design

(59)

ESX Virtual Switch: Capabilities

Layer 2 switch—forwards frames based on 48-bit destination MAC address in frame

MAC address known by registration (it knows its VMs!)—no MAC learning required

Can terminate VLAN trunks (VST mode) or pass trunk through to VM (VGT mode)

Physical NICs associated with Switches

NIC teaming (of uplinks)

• Availability: uplink to multiple physical switches

• Load sharing: spread load over uplinks

[Figure: VM0 and VM1 attached to a vSwitch; a MAC address is assigned to each vnic]

(60)

ESX Virtual Switch: Forwarding Rules

The vSwitch will forward frames

• VM <-> VM

• VM <-> Uplink

But not forward

• vSwitch to vSwitch

• Uplink to Uplink

ESX vSwitch will not create loops in the physical network

And will not affect Spanning Tree
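
The forwarding rules on this slide reduce to a small decision function. The Python sketch below is a simplified model for illustration only; the function name and the "vm"/"uplink" labels are hypothetical and do not correspond to any ESX API.

```python
VALID_KINDS = {"vm", "uplink"}


def vswitch_forwards(src_kind, dst_kind, same_vswitch=True):
    """Return True if a frame would be forwarded under the slide's rules.

    src_kind / dst_kind: "vm" or "uplink" (hypothetical labels).
    same_vswitch: whether both ports belong to the same vSwitch.
    """
    if src_kind not in VALID_KINDS or dst_kind not in VALID_KINDS:
        raise ValueError("port kind must be 'vm' or 'uplink'")
    if not same_vswitch:
        return False   # no vSwitch-to-vSwitch forwarding
    if src_kind == "uplink" and dst_kind == "uplink":
        return False   # no uplink-to-uplink forwarding, hence no loops
    return True        # VM <-> VM and VM <-> uplink are forwarded
```

Because a frame arriving on one uplink is never sent out another, the vSwitch cannot form a loop in the physical network.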

[Figure: VM0 and VM1 on a vSwitch, a second vSwitch, and the physical network; MAC a, MAC b, MAC c]

(61)

Port Group Configuration

A Port Group is a template for one or more ports with a common configuration

Assigns VLAN to port group members

L2 Security: select "reject" to see only frames for VM MAC addr

Promiscuous mode/MAC address change/Forged transmits

Traffic Shaping: limit egress traffic from VM

Load Balancing: Origin VPID, Src MAC, IP-Hash, Explicit

Failover Policy: Link Status & Beacon Probing

Notify Switches: "yes" gratuitously tells switches of MAC location

Failback: "yes" if no fear of blackholing traffic, or use Failover Order in "Active Adapters"

Distributed Virtual Port Group (vNetwork Distributed Switch)

All above plus:

Bidirectional traffic shaping (ingress and egress)

Network VMotion—network port state migrated upon VMotion

(62)

NIC Teaming for Load Sharing & Availability

NIC Teaming aggregates multiple physical uplinks for:

Availability—reduce exposure to single points of failure (NIC, uplink, physical switch)

Load Sharing—distribute load over multiple uplinks (according to selected NIC teaming algorithm)

Requirements:

Two or more NICs on same vSwitch

Teamed NICs on same L2 broadcast domain

[Figure: VM0 and VM1 on a vSwitch with a NIC team of uplinks]

(63)

NIC Teaming with vDS

Teaming Policies Are Applied in DV Port Groups to dvUplinks

[Figure: a vDS spanning four hosts (esx09a, esx09b, esx10a and esx10b in tml.local), each with Service Console and vmkernel ports]

"Orange" DV Port Group Teaming Policy, dvUplinks 0 to 3 and the vmnics behind them on each host:

esx09a.tml.local | esx09b.tml.local | esx10a.tml.local | esx10b.tml.local
vmnic0 | vmnic0 | vmnic0 | vmnic2
vmnic1 | vmnic1 | vmnic1 | vmnic0
vmnic2 | vmnic2 | vmnic2 | vmnic3
vmnic3 | vmnic3 | vmnic3 | vmnic1

KB - vNetwork Distributed Switch on ESX 4.x - Concepts Overview (1010555)

(64)

NIC Teaming Options

For each option: name, algorithm (what the vmnic is chosen based upon), and physical network considerations.

• Originating Virtual Port ID: vnic port. Teamed ports in same L2 domain (BP: team over two physical switches).

• Source MAC Address: MAC seen on vnic. Teamed ports in same L2 domain (BP: team over two physical switches).

• IP Hash*: Hash(SrcIP, DstIP). Teamed ports configured in static 802.3ad "Etherchannel"; no LACP; needs MEC to span 2 switches.

• Explicit Failover Order: highest order uplink from active list. Teamed ports in same L2 domain (BP: team over two physical switches).

Best Practice: Use Originating Virtual PortID for VMs
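
To make the difference between the teaming algorithms concrete, here is a hedged Python sketch. The modulo and XOR formulas are illustrative stand-ins, not the actual ESX implementation; they only show that port-ID teaming pins a vnic to one uplink, while IP hash pins each source/destination pair to one uplink.

```python
import ipaddress


def uplink_by_port_id(virtual_port_id, uplink_count):
    """Pick an uplink index from the originating virtual port ID (illustrative)."""
    return virtual_port_id % uplink_count


def uplink_by_ip_hash(src_ip, dst_ip, uplink_count):
    """Pick an uplink index from a hash of source and destination IP (illustrative)."""
    src = int(ipaddress.ip_address(src_ip))
    dst = int(ipaddress.ip_address(dst_ip))
    return (src ^ dst) % uplink_count
```

Under either scheme a single conversation between two IP addresses always maps to the same uplink, so one session never uses more than one NIC.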

(65)

Link Aggregation

(66)

Link Aggregation - Continued

EtherChannel

Cisco's term for link aggregation (port trunking), a technology used primarily on Cisco switches

Can be created from between two and eight active Fast Ethernet, Gigabit Ethernet, or 10 Gigabit Ethernet ports

LACP or IEEE 802.3ad

Link Aggregation Control Protocol (LACP) is included in IEEE specification as a method to control the bundling of several physical ports together to form a single logical channel

Only supported on Nexus 1000v

EtherChannel vs. 802.3ad

EtherChannel and IEEE 802.3ad standards are very similar and accomplish the same goal

There are few differences between the two, other than that EtherChannel is Cisco proprietary and 802.3ad is an open standard

EtherChannel Best Practice

One IP to one IP connections over multiple NICs are not supported (Host A one connection session to Host B uses only one NIC)

Supported Cisco configuration: EtherChannel Mode ON – ( Enable Etherchannel only)

Supported HP configuration: Trunk Mode

Supported switch Aggregation algorithm: IP-SRC-DST short for (IP-Source-Destination) Global Policy on Switch

(67)

Failover Configurations

• Link Status Only relies solely on the link status provided by the network adapter

  • Detects failures such as cable pulls and physical switch power failures

  • Cannot detect configuration errors

    • Switch port being blocked by spanning tree

    • Switch port configured for the wrong VLAN

  • Cannot detect cable pulls on the other side of a physical switch

• Beacon Probing sends out and listens for beacon probes

  • Beacon probes are Ethernet broadcast frames sent by physical adapters to detect upstream network connection failures, on all physical Ethernet adapters in the team, as shown in the figure

  • Detects many of the failures mentioned above that are not detected by link status alone

  • Should not be used as a substitute for a redundant Layer 2 network design

  • Most useful to detect failures in the closest switch to the ESX Server hosts

• Beacon Probing Best Practice

  • Use at least 3 NICs for triangulation

  • If only 2 NICs are in the team, the probe can't determine which link failed

  • Shotgun mode results

•KB - What is beacon probing? (1005577)

•KB - ESX host network flapping error when Beacon Probing is selected (1012819)

•KB - Duplicated Packets Occur when Beacon Probing Is Selected Using vmnic and VLAN Type 4095 (1004373)

•KB - Packets are duplicated when you configure a portgroup or a vSwitch to use a route that is based on IP-hash and Beaconing Probing policies simultaneously (1017612)

Figure — Using beacons to detect upstream network connection failures.
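
Why three NICs are needed for triangulation can be shown with a toy model. The Python sketch below is a simplification, not the ESX algorithm: the `heard` input format is hypothetical, and an uplink counts as failed when it receives no beacons from any other team member.

```python
def failed_uplinks(nics, heard):
    """Guess which team members have lost upstream connectivity.

    nics:  list of uplink names in the team.
    heard: dict mapping each uplink to the set of team members whose
           beacons it received (hypothetical input format).
    Returns a list of uplinks judged failed, or None when the result is
    ambiguous because no uplink heard any other.
    """
    team = set(nics)
    isolated = [n for n in nics if not (heard.get(n, set()) & (team - {n}))]
    if isolated and len(isolated) == len(nics):
        return None   # every uplink is isolated: cannot tell which link failed
    return isolated
```

With two NICs and one broken link, neither NIC hears the other, so the function returns None; with three NICs the two healthy ones still hear each other and the odd one out is identified.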

(68)

Spanning Tree Protocol (STP) Considerations

Spanning Tree Protocol is used to create loop-free L2 tree topologies in the physical network

Some physical links are put in "blocking" state to construct the loop-free tree

ESX vSwitch does not participate in Spanning Tree and will not create loops with uplinks

ESX uplinks will not block and are always active (full use of all links)

[Figure: VM0 and VM1 (MAC a, MAC b) on a vSwitch uplinked to physical switches. The switches send BPDUs every 2s to construct and maintain the Spanning Tree topology; the vSwitch drops BPDUs; one physical link is shown as blocked]

Recommendations for Physical Network Config:

1. Leave Spanning Tree enabled on physical network and ESX facing ports (i.e. leave it as is!)

2. Use "portfast" or "portfast trunk" on ESX facing ports (puts ports in forwarding state immediately)

(69)

ESX 4.1 Introduces Network I/O Control

VMware® vSphere™ 4.1 ("vSphere") introduces a number of enhancements and new features to virtual networking.

Network I/O Control (NetIOC)—flexibly partition and assure service for ESX/ESXi traffic types and flows on a vNetwork Distributed Switch (vDS)

Load-Based Teaming (LBT)—an additional and selectable load-balancing policy on the vDS to enable dynamic adjustment of the load distribution over a team of NICs

Network performance—vmkernel TCP/IP stack and guest virtual-machine network performance enhancements

Scale—enhancements to network scaling with the vDS

IPv6 NIST Compliance—IPv6 enhancements to comply with U.S. National Institute of Standards and Technology (NIST) Host Profile

Cisco Nexus 1000V Enhancements—support for new features and enhancements on the Cisco Nexus 1000V

(70)

Network I/O Control Usage

(71)

Load-Based Teaming (LBT)

LBT is another traffic-management feature of the vDS introduced with vSphere 4.1. LBT avoids network congestion on the ESX/ESXi host uplinks caused by imbalances in the mapping of traffic to those uplinks.

LBT enables customers to optimally use and balance network load over the available physical uplinks attached to each ESX/ESXi host.

LBT helps avoid situations where one link may be congested, while other links may be relatively underused.

How LBT works

LBT dynamically adjusts the mapping of virtual ports to physical NICs to best balance the network load entering or leaving the ESX/ESXi 4.1 host. When LBT detects an ingress or egress congestion condition on an uplink, signified by a mean utilization of 75% or more over a 30-second period, it will attempt to move one or more of the virtual-port-to-vmnic mapped flows to lesser-used links within the team.
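
The congestion trigger described above can be sketched as follows. This Python example is an illustration under stated assumptions: the sampling interval, function names, and the "least-used uplink" selection rule are hypothetical, and only the 75% and 30-second figures come from the text.

```python
from statistics import mean

CONGESTION_THRESHOLD = 0.75   # mean utilization from the text
WINDOW_SECONDS = 30           # observation period from the text


def is_congested(utilization_samples, sample_interval_s=1.0):
    """True if mean utilization over the last 30 seconds is 75% or more."""
    window = int(WINDOW_SECONDS / sample_interval_s)
    if len(utilization_samples) < window:
        return False   # not enough history yet
    return mean(utilization_samples[-window:]) >= CONGESTION_THRESHOLD


def pick_target_uplink(mean_utilization_by_uplink, congested_uplink):
    """Pick the least-used other uplink in the team, or None if none is better."""
    others = {k: v for k, v in mean_utilization_by_uplink.items() if k != congested_uplink}
    if not others:
        return None
    target = min(others, key=others.get)
    if others[target] < mean_utilization_by_uplink[congested_uplink]:
        return target
    return None
```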

Configuring LBT

LBT is an additional load-balancing policy available within the teaming and failover settings of a dvPortGroup on a vDS. LBT appears as "Route based on physical NIC load."

*LBT is not available on the vNetwork Standard Switch (vSS).

(72)

VMXNET3—The Para-virtualized VM Virtual NIC

• Next evolution of "Enhanced VMXNET" introduced in ESX 3.5

• Adds

MSI/MSI-X support (subject to guest operating system kernel support)

Receive Side Scaling (supported in Windows 2008 when explicitly enabled through the device's Advanced configuration tab)

Large TX/RX ring sizes (configured from within the virtual machine)

High performance emulation mode (Default)

• Supports

High DMA

TSO (TCP Segmentation Offload) over IPv4 and IPv6

TCP/UDP checksum offload over IPv4 and IPv6

Jumbo Frames

(73)

VMDirectPath for VMs

[Figure: I/O device, device driver, and virtual layer]

What is it?

Enables direct assignment of PCI devices to VM

Types of workloads

I/O Appliances

High performance VMs

Details

Guest controls the physical H/W

Requirements

vSphere 4

I/O MMU

Used for DMA Address Translation (Guest Physical -> Host Physical) and protection

Generic device reset (FLR, Link Reset, ...)

KB - Configuring VMDirectPath I/O pass-through devices on an ESX host (1010789)

(74)

FCoE on ESX

VMware ESX Support

• FCoE supported since ESX 3.5u2

• Requires Converged Network Adapters ("CNAs", see HCL), e.g.

Emulex LP21000 Series

Qlogic QLE8000 Series

• Appears to ESX as:

10GigE NIC

FC HBA

• SFP+ pluggable transceivers

Copper twin-ax (<10m)

Optical

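[Figure: a CNA (Converged Network Adapter) in an ESX host presents a 10GigE NIC to the vSwitch and a Fibre Channel HBA; the CNA connects over FCoE to an FCoE switch, which connects to Fibre Channel and Ethernet]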

(75)

Using 10GigE

2x 10GigE common/expected

10GigE CNAs or NICs

Possible Deployment Method

Active/Standby on all Portgroups

VMs "sticky" to one vmnic

SC/vmk ports sticky to other

Use Ingress Traffic Shaping to control traffic type per Port Group

If FCoE, use Priority Group bandwidth reservation (on CNA utility)

[Figure: one vSwitch on two 10GE uplinks carrying iSCSI, NFS, VMotion, FT and SC traffic (plus SC#2) alongside FCoE. Callouts: FCoE Priority Group bandwidth reservation (in CNA config utility); ingress (into switch) traffic shaping policy control on Port Group; per-traffic bandwidth labels ranging from low b/w and 1-2G to variable/high b/w and 2Gbps+]

(76)

Traffic Types on a Virtual Network

Virtual Machine Traffic

• Traffic sourced and received from virtual machine(s)

• Isolate from each other based on service level

VMotion Traffic

• Traffic sent when moving a virtual machine from one ESX host to another

• Should be isolated

Management Traffic

• Should be isolated from VM traffic (one or two Service Consoles)

• If VMware HA is enabled, includes heartbeats

IP Storage Traffic: NFS and/or iSCSI via vmkernel interface

• Should be isolated from other traffic types

Fault Tolerance (FT) Logging Traffic

• Low latency, high bandwidth

(77)

VLAN Trunking to Server

IEEE 802.1Q VLAN Tagging

Enables logical network partitioning (Traffic separation)

Scale traffic types without scaling physical NICs

Virtual machines connect to virtual switch ports (like access ports on physical switch)

Virtual switch ports are associated with a particular VLAN (VST mode), defined in PortGroup

Virtual switch tags packets exiting host

[Figure: VM0 and VM1 on a vSwitch, attached to Port Group "Blue" (VLAN 20) and Port Group "Yellow" (VLAN 10); the uplinks are VLAN trunks carrying VLANs 10 and 20]

802.1Q header: 8100 followed by a 12-bit VLAN id field (0-4095)
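
The 802.1Q header layout can be illustrated by building the 4-byte tag by hand. This is a minimal Python sketch using only the standard library; the priority and drop-eligible bits are part of the 802.1Q tag but are not mentioned on the slide, and the function names are hypothetical.

```python
import struct

TPID_8021Q = 0x8100   # the "8100" value from the slide


def build_vlan_tag(vlan_id, priority=0, dei=0):
    """Return the 4-byte 802.1Q tag for a 12-bit VLAN ID."""
    if not 0 <= vlan_id <= 4095:
        raise ValueError("VLAN ID must fit in 12 bits (0-4095)")
    if not 0 <= priority <= 7 or dei not in (0, 1):
        raise ValueError("priority is 3 bits, dei is 1 bit")
    tci = (priority << 13) | (dei << 12) | vlan_id
    return struct.pack("!HH", TPID_8021Q, tci)


def parse_vlan_id(tag):
    """Extract the VLAN ID from a 4-byte 802.1Q tag."""
    tpid, tci = struct.unpack("!HH", tag)
    if tpid != TPID_8021Q:
        raise ValueError("not an 802.1Q tag")
    return tci & 0x0FFF
```

In VST mode it is the vSwitch that inserts this tag on frames leaving the host and strips it on frames delivered to the VM.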

(78)

VLAN Tagging Options

[Figure: three vSwitch / physical switch pairs, one per tagging mode]

• VST – Virtual Switch Tagging: VLAN assigned in Port Group policy; VLAN tags applied in vSwitch

• VGT – Virtual Guest Tagging: VLAN tags applied in guest; PortGroup set to VLAN "4095"

• EST – External Switch Tagging: VLAN tags applied by the external physical switch

(79)

VLAN Tagging: Further Example

KB - Sample configuration of virtual switch VLAN tagging (VST Mode) and ESX Server (1004074)

Uplinks A, B, and C are connected to trunk ports on the physical switch which carry four VLANs (e.g. VLANs 10, 20, 50, 90)

Ports 1-14 emit untagged frames, and only those frames which were tagged with their respective VLAN ID (equivalent to an "access port" on a physical switch)

• Port Group VLAN ID set to one of 1-4094

Port 15 emits tagged frames for all VLANs

• Port Group VLAN ID set to 4095 (for vSS) or "VLAN Trunking" on vDS DV Port Group

[Figure: vSwitch with ports 1-15 and uplinks A, B and C. The uplinks are VLAN trunks carrying VLANs 10, 20, 50, 90; groups of ports act as access ports on VLAN 10, VLAN 20 and VLAN 50; on port 15 all VLANs (10, 20, 50, 90) are trunked to the VM]

interface GigabitEthernet1/2
 description host32-vmnic0
 switchport trunk encapsulation dot1q
 switchport trunk native vlan 999
 switchport trunk allowed vlan 10,20,50,90
 switchport mode trunk
 spanning-tree portfast trunk

Example configuration on Physical Switch

(80)

Private VLANs: Traffic Isolation for Every VM

Solution: PVLAN

• Place VMs on the same virtual network but prevent them from communicating directly with each other (saves VLANs!)

• Avoids scaling issues from assigning one VLAN and IP subnet per VM

Details

• Instead, configure a SINGLE DV port group to have a SINGLE isolated* VLAN (ONLY ONE)

• Attach all your VMs to this SINGLE isolated VLAN DV port group

[Figure: Distributed Switch with PVLAN. Private VLAN traffic isolation between guest VMs; common primary VLAN on uplinks.]

(81)

Private VLANs - Continued

[Figure, without PVLAN: 12 VMs (W2003EE-32-A, W2003EE-32-B, …) on a vNetwork Distributed Switch, each with its own port group (PG).]

TOTAL COST: 12 VLANs (one per VM)

[Figure, with PVLAN: the same 12 VMs on a vNetwork Distributed Switch, all attached to one PG (with Isolated PVLAN).]

TOTAL COST: 1 PVLAN (over 90% savings…)
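The "over 90% savings" figure follows directly from the VLAN counts on this slide. A minimal worked example in Python, with a hypothetical function name:

```python
def vlan_savings(vm_count: int, vlans_with_pvlan: int = 1) -> float:
    """Fraction of VLANs saved by replacing one VLAN per VM with PVLAN."""
    vlans_without_pvlan = vm_count  # one VLAN (and IP subnet) per VM
    return (vlans_without_pvlan - vlans_with_pvlan) / vlans_without_pvlan


# 12 VMs: 12 VLANs without PVLAN versus 1 isolated PVLAN
savings_percent = round(vlan_savings(12) * 100, 1)
```

For the 12 VMs shown, 11 of 12 VLANs are no longer needed, which is where the slide's figure comes from. The saving grows as more VMs share the single isolated port group.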

(82)

Tips & Tricks

KB - Changing a MAC address in a Windows virtual machine (1008473)

When a physical machine is converted into a virtual machine, the MAC address of the network adapter is changed. This can pose a problem when software is installed where the licensing is tied to the MAC address.

KB – Configuring speed and duplex of an ESX Server host network adapter (1004089)

ESX recommended settings for Gigabit Ethernet speed and duplex when connecting to a physical switch port are as follows:

Auto Negotiate <-> Auto Negotiate

Mixing a hard-coded setting with auto-negotiate is not recommended.

KB - Sample Configuration - Network Load Balancing (NLB) Multicast mode over routed subnet - Cisco Switch Static ARP Configuration (1006525)

NLB Multicast Mode – Static ARP Resolution

NLB packets are unconventional: the IP address is unicast while the MAC address is multicast. Because of this mismatch, switches and routers drop NLB packets.

Because routers and switches drop NLB multicast packets, the ARP tables of the switches are not populated with the cluster IP and MAC address.
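The unicast-IP-with-multicast-MAC mismatch described above can be checked programmatically. The sketch below is illustrative only: the function names are hypothetical, and the example cluster IP and MAC are made-up values (NLB multicast mode MAC addresses begin with 03-BF). A MAC address is multicast when the least significant bit of its first octet is set.

```python
import ipaddress


def mac_is_multicast(mac: str) -> bool:
    """True when the group bit (lowest bit of the first octet) is set."""
    first_octet = int(mac.replace("-", ":").split(":")[0], 16)
    return bool(first_octet & 0x01)


def is_nlb_style_mismatch(ip: str, mac: str) -> bool:
    """True for a unicast IP address paired with a multicast MAC address."""
    ip_is_unicast = not ipaddress.ip_address(ip).is_multicast
    return ip_is_unicast and mac_is_multicast(mac)


# Hypothetical NLB cluster address pair, for illustration only
cluster_ip = "10.0.0.100"
cluster_mac = "03:bf:0a:00:00:64"
mismatch = is_nlb_style_mismatch(cluster_ip, cluster_mac)
```

A pairing that returns `True` here is the kind of address combination that switches and routers refuse to learn dynamically, which is why the KB article uses a static ARP entry.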

(83)

Troubleshooting Tips

(84)

Troubleshooting with Esxtop

(85)

Esxtop Traffic

(86)

Capturing Traffic

(87)

ESX tcpdump

(88)

Wireshark in a VM

(89)

Technical Papers

Must Read…

http://www.vmware.com/technical-resources/virtual-networking/

Conclusion

This study compares performance results for e1000 and vmxnet virtual network devices on 32-bit and 64-bit guest operating systems using the netperf benchmark. The results show that when a virtual machine is running with software virtualization, e1000 is better in some cases and vmxnet is better in others. Vmxnet has lower latency, which sometimes comes at the cost of higher CPU utilization. When hardware virtualization is used, vmxnet clearly provides the best performance.

Conclusion

VMXNET3, the newest generation of virtual network adapter from VMware, offers performance on par with or better than its previous generations in both Windows and Linux guests. Both the driver and the device have been highly tuned to perform better on modern systems. Furthermore, VMXNET3 introduces new features and enhancements, such as TSO6 and RSS. TSO6 makes it especially useful for users deploying applications that deal with IPv6 traffic, while RSS is helpful for deployments requiring high scalability. All these features give VMXNET3 advantages that are not possible with previous generations of virtual network adapters. Moving forward, to keep pace with an ever-increasing demand for network bandwidth, we recommend customers migrate to VMXNET3 if performance is of top concern to their deployments.

(90)

KB Links

KB - Cisco Discovery Protocol (CDP) network information via command line and VirtualCenter on an ESX host (1007069)

Utilizing Cisco Discovery protocol (CDP) to get switch port configuration information.

This command is utilized to troubleshoot network connectivity issues related to VLAN tagging methods on virtual and physical port settings.

KB - Troubleshooting network issues with the Cisco show tech-support command (1015437)

If you experience networking issues between vSwitch and physical switched environment, you can obtain information about the configuration of a Cisco router or switch by running the show tech-support command in privileged EXEC mode.

Note: This command does not alter the configuration of the router.

KB - ESX host or virtual machines have intermittent or no network connectivity (1004109)

KB - Troubleshooting Nexus 1000V vDS network issues (1014977)

KB - Cisco Nexus 1000V installation and licensing information (1013452)

Cisco Nexus 1000V Troubleshooting Guide, Release 4.0(4)SV1(2) 20/Jan/2010

Cisco Nexus 1000V Troubleshooting Guide, Release 4.0(4)SV(1) 21/Jan/2010

(91)

KB Links - Continued

KB - Troubleshooting network connection issues using Address Resolution Protocol (ARP) (1008184)

IEEE OUI and Company id Assignments http://standards.ieee.org/regauth/oui/index.shtml

KB - Network performance issues (1004087)

KB - Low Network Throughput in Windows Guest when Running UDP Application (5298153)

KB - Performance of Outgoing UDP Packets Is Poor (10172)

KB - Poor Network File Copy performance between local VMFS and shared VMFS (1003554)

KB - Cannot connect to ESX 4.0 host for 30-40 minutes after boot (1012942)

Ensure that DNS is configured and reachable from the ESX host

KB - Identifying issues with and setting up name resolution on ESX Server (1003735)

Note: localhost must always be present in the hosts file. Do not modify or remove the entry for localhost

The hosts file must be identical on all ESX Servers in the cluster

There must be an entry for every ESX Server in the cluster

Every host must have an IP address, Fully Qualified Domain Name (FQDN), and short name

The hosts file is case sensitive. Be sure to use lowercase throughout the environment
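The hosts file rules above lend themselves to an automated check. The following Python sketch is an assumption-laden illustration, not a VMware tool: the function name, the message strings, and the example host names are hypothetical. It checks a single file; comparing the file across hosts for identity would be a separate step.

```python
from typing import List


def check_hosts_file(text: str, cluster_short_names: List[str]) -> List[str]:
    """Return a list of rule violations found in hosts-file text."""
    problems = []
    entries = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blanks
        if line:
            entries.append(line.split())

    names = {name for fields in entries for name in fields[1:]}
    if "localhost" not in names:
        problems.append("missing localhost entry")

    for fields in entries:
        if "localhost" in fields[1:]:
            continue  # the localhost entry is left as-is
        # Each host needs an IP address, an FQDN, and a short name
        if len(fields) < 3 or "." not in fields[1]:
            problems.append(f"{fields[0]}: need IP, FQDN, and short name")
        if any(name != name.lower() for name in fields[1:]):
            problems.append(f"{fields[0]}: use lowercase names")

    for short_name in cluster_short_names:
        if short_name not in names:
            problems.append(f"no entry for {short_name}")
    return problems
```

An empty result means the file passes these checks; running the same check on every ESX Server in the cluster and comparing the file contents covers the remaining rule that the files be identical.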

(92)

Questions

(93)

ESXi Readiness

Planning your migration to VMware ESXi, the next-generation hypervisor architecture.

David Garcia Jr - Global Support Services

(94)

The Gartner Group says…

"The major benefit of ESXi is the fact that it is more lightweight - under 100MB versus 2GB for VMware ESX with the service console."

"Smaller means fewer patches"

"It also eliminates the need to manage a separate Linux console (and the Linux skills needed to manage it)…"

As of August 2010: "VMware users should put a plan in place to migrate to ESXi during the next 12 to 18 months."

(95)

VMware ESXi and ESX hypervisor architectures comparison

VMware ESX Hypervisor Architecture

• Code base disk footprint: ~2 GB

• VMware agents run in Console OS

• Nearly all other management functionality provided by agents running in the Console OS

• Users must log into Console OS in order to run commands for configuration and diagnostics

VMware ESXi Hypervisor Architecture

• Code base disk footprint: <100 MB

• VMware agents ported to run directly on VMkernel

• Authorized 3rd party modules can also run in VMkernel to provide hw monitoring and drivers

• Other capabilities necessary for integration into an enterprise datacenter are provided natively

• No other arbitrary code is allowed on the system

(96)

Call to action for customers

Start testing ESXi

If you've not already deployed, there's no better time than the present

Ensure your 3rd party solutions are ESXi Ready

Monitoring, backup, management, etc. Most already are.

Bid farewell to agents!

Familiarize yourself with ESXi remote management options

Transition any scripts or automation that depended on the COS

Powerful off-host scripting and automation using vCLI, PowerCLI, …

Plan an ESXi migration as part of your vSphere upgrade

Testing of ESXi architecture can be incorporated into overall vSphere

(97)

Visit the ESXi and ESX Info Center today

http://vmware.com/go/ESXiInfoCenter

(98)

Questions

(99)

Break

(100)

vSphere 4 - Performance Best Practices

Kenneth Kemp, Escalation Engineer

(101)

Agenda

Technical Guides

ESX 4.x Performance & Troubleshooting

• Memory

• CPU

vCenter Performance & Troubleshooting

• High Availability

• Distributed Resource Scheduler

• Fault Tolerance

• Resource Pool Designs

• HW Considerations and Settings

(102)

Technical Guides

(103)

Memory

(104)

Memory – Resource Types

• When assigning a VM a "physical" amount of RAM, all you are really doing is telling ESX how much memory a given VM process will maximally consume past the overhead.

• Whether or not that memory is physical depends on a few factors: host configuration, DRS shares/limits/reservations, and host load.

• Generally speaking, it is better to OVER-commit than UNDER-commit.

(105)

Memory – Overhead & Reclamation

ESX memory space overhead

• Service Console: 272 MB

• VMkernel: 100 MB+

Per-VM memory space overhead increases with:

• Number of VCPUs

• Size of guest memory

• 32 or 64 bit guest OS

ESX memory space reclamation

• Page sharing

• Ballooning

(106)

Memory – Page Tables

Page tables

• ESX cannot use guest page tables

• ESX Server maintains shadow page tables

• Translate memory addresses from virtual to machine

• Per process, per VCPU

• VMM maintains physical (per VM) to machine maps

• No overhead from "ordinary" memory references

Overhead

• Page table initialization and updates

• Guest OS context switching

[Figure: VA (virtual address) → PA (physical address) → MA (machine address)]
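The shadow page table idea on this slide can be illustrated with two dictionaries. This is a conceptual sketch only, with hypothetical names and made-up page numbers; real shadow page tables are hardware page-table structures maintained by the VMM, not Python dictionaries.

```python
from typing import Dict


def build_shadow_page_table(
    guest_page_table: Dict[int, int],  # guest virtual page -> guest physical page
    physical_to_machine: Dict[int, int],  # per-VM physical page -> machine page
) -> Dict[int, int]:
    """Compose VA->PA and PA->MA into the direct VA->MA shadow mapping."""
    return {
        va: physical_to_machine[pa]
        for va, pa in guest_page_table.items()
        if pa in physical_to_machine  # unbacked pages get no shadow entry
    }


# Illustrative page numbers only
guest_pt = {0: 5, 1: 7}
pmap = {5: 100, 7: 42}
shadow = build_shadow_page_table(guest_pt, pmap)

# An ordinary memory reference is a single lookup in the shadow table.
machine_page = shadow[1]

# When the guest changes its page table, the shadow must be updated too.
guest_pt[1] = 5
shadow = build_shadow_page_table(guest_pt, pmap)
```

The sketch shows why ordinary references carry no extra cost (one lookup in the composed table) while guest page table updates and context switches do: each one forces the composed mapping to be rebuilt or swapped.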

(107)

Memory – Over-commitment & Sizing

Avoid high active host memory over-commitment

• Total memory demand = active working sets of all VMs + memory overhead – page sharing

• No ESX swapping: total memory demand < physical memory

Right-size guest memory

• Define adequate guest memory to avoid guest swapping

• Per-VM memory space overhead grows with guest memory
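The sizing rule above is simple arithmetic and can be written out directly. In this Python sketch the function names and all numbers are hypothetical examples, not measurements from any host.

```python
from typing import Iterable


def total_memory_demand_mb(
    active_working_sets_mb: Iterable[int],
    memory_overhead_mb: int,
    page_sharing_savings_mb: int,
) -> int:
    """Active working sets of all VMs + memory overhead - page sharing."""
    return (
        sum(active_working_sets_mb)
        + memory_overhead_mb
        - page_sharing_savings_mb
    )


def host_may_swap(demand_mb: int, physical_memory_mb: int) -> bool:
    """No ESX swapping requires total memory demand < physical memory."""
    return demand_mb >= physical_memory_mb


# Three VMs with made-up working sets, overhead, and sharing savings
demand = total_memory_demand_mb([2048, 4096, 1024], 900, 600)
```

With these example figures the demand is 7468 MB, so an 8 GB host (8192 MB) stays under the limit while a 6 GB host (6144 MB) would be expected to swap.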

(108)

Memory – NUMA considerations

Increasing a VM's memory on a NUMA machine

• Will eventually force some memory to be allocated from a remote node, which will decrease performance

• Try to size the VM so both CPU and memory fit on one node

[Figure: Node 0 and Node 1]
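The "fit on one node" guideline can be expressed as a quick check. This Python sketch is a deliberate simplification with hypothetical names and numbers: it ignores memory already used on the node by other VMs and by the hypervisor itself.

```python
from typing import Tuple


def numa_placement(
    vm_vcpus: int,
    vm_memory_gb: int,
    cores_per_node: int,
    memory_per_node_gb: int,
) -> Tuple[bool, int]:
    """Return (fits_on_one_node, memory_gb_that_must_come_from_a_remote_node)."""
    fits = vm_vcpus <= cores_per_node and vm_memory_gb <= memory_per_node_gb
    remote_gb = max(0, vm_memory_gb - memory_per_node_gb)
    return fits, remote_gb
```

On a hypothetical host with 8 cores and 32 GB per node, a 4-vCPU VM with 24 GB fits on one node, while growing it to 40 GB pushes 8 GB onto the remote node.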
