Produced in cooperation with:
HP Technology Forum & Expo 2009
© 2009 Hewlett-Packard Development Company, L.P.
The information contained herein is subject to change without notice
2972 – Linux Options and Best Practices for Scale-up Virtualization
2 September 3, 09
Virtualization – The basics
Platform Virtualization
• A virtual representation of the computer
− Provides a virtual hardware platform
• Needs operating system & virtualization
• Guest OS completely independent from host OS
− Hypervisor manages resources for guests/domUs
− Management environment
− Host/dom0
− High degree of fault isolation
− Cooperative or non-cooperative guest
Virtualization Basics
Platform Virtualization types
• Full virtualization (or HVM)
− e.g. Xen, KVM, HP Integrity Virtual Machine
− Multiple unmodified guests
• Emulation or HW assist
− Technically: any OS
• Paravirtualization/Cooperative Virtualization
− e.g. Xen, PV IO drivers
− Guest OSes are ported to
Virtualization – The basics
Operating System Virtualization
• Virtualization at OS layer
− Appears to be a discrete OS instance
− Guest & Host OS cannot differ
• Can support different minor versions of the same major version
− Referred to as containers or zones
− Strong fault isolation not possible
• Normally not the goal
− Applications:
Platform Virtualization
Linux Virtualization
A Xen based architecture
[Diagram: the Xen hypervisor runs on the hardware platform. Guest 0 (dom0) holds the device drivers and the backend drivers. PV guests (domU) run user apps over frontend drivers. Hardware virtual machines (domU) run user apps and can use PV drivers.]
Oracle VM
• New server virtualization software and support
− Free product download from the web
• Based on Xen 3.x open source software
• Runs both Linux and Windows domains
− Supports paravirtualization on all hardware and hardware virtualization (VT-i, VT-x, VT-d or AMD-V) on latest x86 hardware
− 64-bit and 32-bit guests
− Up to 64-way SMP
− Up to 32 virtual processors per guest
− Includes live migration at no additional cost
− Integrated, browser-based management console
− Free downloadable VM images
Red Hat Virtualization
• Bundled in RHEL v5.x
• Based on Xen 3.3
• Runs both Linux and Windows domains (guests)
− Supports paravirtualization on all hardware and hardware virtualization
− 64-bit and 32-bit guests
− Includes live migration at no additional cost
− X and web-based management tools
Novell SLES/Xen Virtualization
• Bundled with SLES 10 & 11
− Based on Xen 3.3
• SLES/Xen runs both Linux and Windows domains (guests)
− Supports paravirtualization on all hardware and hardware virtualization (VT-i, VT-x, VT-d or AMD-V) on latest x86 hardware
− 64-bit and 32-bit guests
− Up to 64-way SMP
− Integration with YaST2 management tools
− High Availability software
Virtualization technologies
Linux Kernel-based Virtual Machine (KVM)
• Uses standard Linux kernel
− Kernel module for virtualization
• Kernel Samepage Merging (KSM; earlier called Kernel Shared Memory)
− Simply put:
• Share identical memory pages between guests
• >10% savings without performance impact
• Virtual IO drivers
− Back-ported by distros
• Tech Preview in SLES 11
• Announced for RHEL v5.4
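The page-sharing idea behind KSM can be illustrated with a toy model. This is only a sketch: the function names are made up for illustration, and a hash digest stands in for the kernel's actual mechanism, which compares page contents directly and keeps merged pages copy-on-write.

```python
import hashlib


def merge_identical_pages(guests):
    """Count pages before and after merging identical ones.

    guests: dict mapping a guest name to a list of page contents (bytes).
    Returns (total_pages, unique_pages).
    """
    unique = set()
    total = 0
    for pages in guests.values():
        for page in pages:
            total += 1
            # A digest stands in for a byte-by-byte comparison of page contents.
            unique.add(hashlib.sha256(page).digest())
    return total, len(unique)


def savings_fraction(total, unique):
    """Fraction of pages that no longer need their own physical copy."""
    return 0.0 if total == 0 else (total - unique) / total
```

Two guests that each hold one zero-filled page and one distinct page have four pages in total but only three unique ones, so one page in four is saved in this model. Real savings depend on how similar the guests are.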
Linux Virtualization
KVM Architecture
[Diagram: the Linux kernel, with its device drivers and the KVM module, runs on the hardware platform. Each guest (vmX) runs its user apps inside a Qemu process, alongside regular Linux applications (user apps).]
Platform virtualization
Limits
Category            Xen (OSS)        KVM                VMware
pCPUs               126              4096               64
vCPUs               32               16[1]              8
Memory: host/guest  1 TB / 80 GB     4 TB / 1.4 TB[2]   1 TB / 255 GB
Mem over-commit     Balloon driver   KSM                Yes
NPT/EPT             Yes              Yes                Yes
PCI pass-through    Yes              Yes[3]             Yes[3]
“Accelerated” IO    G, D             D                  D
G = ParaVirtualized guest, D = ParaVirtualized IO drivers
RHEL 5.3 Xen (PV guests)
AIO Read: 4 guests, 4 vCPUs, 1 vDisk, 30GB vRAM
[Chart: bandwidth (MB/s, 0 to 400) per guest, guests 1 to 4]
AIOD READ - Full NUMA
[Chart: bandwidth (MB/s, 0 to 400) per guest, guests 1 to 4]
Kernel Virtual Machine
AIO Read: 4 guests, 4 vCPUs, 1 vDisk, 30GB vRAM
Kernel Virtual Machine
AIO Read: 8 guests, 4 vCPUs, 1 vDisk, 30GB vRAM
OS Virtualization
Parallels Virtuozzo Containers
Advanced containers for Linux
Parallels Virtuozzo Containers sits on top of a standard Linux distribution
Each Virtual Private Server:
• Has its own processes, users, files and provides
full root or administrator access
• Owns IP addresses, port numbers, filtering and
routing rules
• Can have its own versions of system libraries or
different patch levels
• Could delete, add, modify any file, install its
own application software or system software in its exclusive area
• Runs the same O.S. the host is running
Linux kernel cpuset
Soft partitions in the Linux kernel
• CPUs & memory
− Exclusively or shared
• Simple and powerful to use
• Management tools
− Standard file system & OS tools
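Because cpusets are managed through a pseudo file system, ordinary file operations are enough to create one. The sketch below assumes the cpuset file system is mounted at a directory passed in as `root` (for example with `mount -t cpuset none /dev/cpuset`), where the control files are named `cpus`, `mems`, `cpu_exclusive` and `tasks`; on a cgroup mount the same files carry a `cpuset.` prefix. The function names are illustrative only.

```python
import os


def _write(directory, filename, value):
    """Write one value into a control file, as `echo value > file` would."""
    with open(os.path.join(directory, filename), "w") as f:
        f.write(value)


def create_cpuset(root, name, cpus, mems, cpu_exclusive=False):
    """Create a cpuset under `root` and assign it CPUs and memory nodes.

    cpus and mems are list strings such as "2-3" or "0,4".
    Returns the path of the new cpuset directory.
    """
    path = os.path.join(root, name)
    os.makedirs(path, exist_ok=True)  # mkdir creates the cpuset
    _write(path, "cpus", cpus)
    _write(path, "mems", mems)
    _write(path, "cpu_exclusive", "1" if cpu_exclusive else "0")
    return path


def attach_task(cpuset_path, pid):
    """Move a task into the cpuset; cpus and mems must be set first."""
    _write(cpuset_path, "tasks", str(pid))
```

A call such as `create_cpuset("/dev/cpuset", "batch", "2-3", "1")` followed by `attach_task(path, pid)` mirrors the usual `mkdir` and `echo` sequence, and needs root privileges on a real system.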
Virtualization technologies
• LinuX Containers (LXC)
− Showing up as cgroup (Control Group) extensions
• Some form of cgroup/cpusets has been present since early 2.6 releases
− Mainline kernel capability
• Lots happened between 2.6.27 and 2.6.30 (and beyond)
• As a tech preview in SLES 11
• Unknown status for future RHEL release(s)
− Evolving functionality
• There seems to be interaction between OpenVZ & LXC developers
LinuX Containers - LXC
Native containers – an emerging technology
• Expands existing cgroup capability
− Aggregates CPU, Memory, IO resources
• Network & storage IO
− One or more tasks and their children
• Includes resource capping capabilities
− IO, Memory & CPU
• Managed using libvirt (or cset)
• Updated in recent upstream kernels
− Mainline kernel base for RHEL 6, SLES 11
• Technology preview in SLES 11
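Resource capping for a group of tasks can be sketched directly against the cgroup control files. This example assumes a cgroup (v1) hierarchy with the memory and cpu controllers mounted together at `root`, which is a simplification, and uses the control files `memory.limit_in_bytes`, `cpu.shares` and `tasks`. Note that `cpu.shares` is a relative weight rather than a hard cap. The function names are hypothetical.

```python
import os


def _write(directory, filename, value):
    """Write one value into a cgroup control file."""
    with open(os.path.join(directory, filename), "w") as f:
        f.write(str(value))


def cap_group(root, name, memory_limit_bytes=None, cpu_shares=None):
    """Create a control group and apply optional memory and CPU settings."""
    path = os.path.join(root, name)
    os.makedirs(path, exist_ok=True)  # mkdir creates the group
    if memory_limit_bytes is not None:
        _write(path, "memory.limit_in_bytes", memory_limit_bytes)
    if cpu_shares is not None:
        _write(path, "cpu.shares", cpu_shares)
    return path


def add_task(group_path, pid):
    """Place a task in the group; children it forks later stay in the group."""
    _write(group_path, "tasks", pid)
```

In practice libvirt or the LXC tools would perform these writes; the sketch only shows what happens underneath.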
The Future:
Split LRU For Improved VM Scalability
• Large systems can perform poorly under high memory demand
− Page replacement requires spinlocked LRU scans
− Each core will typically scan / reclaim under pressure
− 128 GB = 32 million 4 KB x86-64 pages
− Inadequate swap worsens behavior – no way to free memory
• LRU split into anonymous and file-backed lists
− Tailored reclaim policies, improved locking
• Non-reclaimable pages removed from LRUs
− mlocked, tmpfs, etc.
• Results in much better scanning efficiency
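The page-count figure and the scanning benefit can be checked with a toy model. This is a sketch, not the kernel's algorithm: it assumes 4 KiB pages, treats GB as GiB, and assumes reclaim can only free file-backed pages (as when swap is inadequate). All names are illustrative.

```python
PAGE_SIZE = 4096  # bytes; the common x86-64 base page size


def page_count(ram_bytes, page_size=PAGE_SIZE):
    """Number of base pages needed to cover the given amount of RAM."""
    return ram_bytes // page_size


def scan_single_list(lru, want):
    """Reclaim `want` file pages from one mixed list.

    Returns (reclaimed, scanned); every page visited counts as scanned,
    including anonymous and mlocked pages that cannot be freed here.
    """
    reclaimed = scanned = 0
    for kind in lru:
        if reclaimed == want:
            break
        scanned += 1
        if kind == "file":
            reclaimed += 1
    return reclaimed, scanned


def split_lists(lru):
    """Split a mixed list into anon and file lists, dropping unevictable pages."""
    anon = [kind for kind in lru if kind == "anon"]
    file_lru = [kind for kind in lru if kind == "file"]
    return anon, file_lru


def scan_file_list(file_lru, want):
    """With a dedicated file list, every scanned page is a reclaim candidate."""
    reclaimed = min(want, len(file_lru))
    return reclaimed, reclaimed
```

`page_count(128 * 2**30)` gives 33,554,432 pages, which is 32 × 2^20. For a list that repeats mlocked, anon, file, freeing 10 file pages takes 30 scans on the single list and 10 on the split file list.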
Virtualization Future
• Management tools:
− The new “battle-front”
• Lots of options
− libguestfs
• Batch configuration changes
− Modify file system structure
− Run commands in guest context
− Scriptable
− From host environment:
• Dormant guests
• Running guests
Beneficial Upstream Work
Significant benefits for KVM
• Big Kernel Lock pushdown / elimination
• Lockless page cache
• VFS cleanup
− Includes global inode lock scope reduction, other performance improvements
• ext3 fsync performance
• NUMA node hugepage allocation
• Improved page reclamation
Best Practices:
Maximize portability
• Data management
− File-backed storage for OS
• Std. File Systems
• Cluster File Systems
• Networked File Systems
− Data on shared LVM or raw storage
• Fibre Channel
• iSCSI
• Guest management
− Leverage libvirt portability
• Xen, KVM, LXC, OpenVZ, etc.
• virt-* utilities
• For RHEL/SLES:
− Always install both bare-metal
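The libvirt portability point comes down to the connection URI: the same management command works against different hypervisors once the URI changes. The sketch below builds `virsh` command lines for the local-host URIs of the drivers named above. The dictionary keys and the helper name are illustrative, and remote or session URIs are left out.

```python
# Local libvirt connection URIs for the drivers named on the slide.
LIBVIRT_URIS = {
    "xen": "xen:///",
    "kvm": "qemu:///system",
    "lxc": "lxc:///",
    "openvz": "openvz:///system",
}


def virsh_command(hypervisor, *args):
    """Build a virsh argument list that targets the given hypervisor."""
    uri = LIBVIRT_URIS[hypervisor]
    return ["virsh", "--connect", uri, *args]
```

For example, `virsh_command("kvm", "list", "--all")` and `virsh_command("xen", "list", "--all")` differ only in the URI. The resulting list can be handed to `subprocess.run` on a host where libvirt is installed.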
Best Practices:
Maximize portability
• Think in terms of appliances
− Start with .vmdk based images
• Simplified deployment
• Always use PV IO drivers
− Assuming they’re available
• Windows too…
• OS configuration
− Xen
• Use “poor man’s NUMA”:
− Pin Dom0 CPUs on nodes/cells with IO attached
• Interleave memory
− KVM – hugepages
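Both OS configuration tips can be sketched as small helpers. The first builds `xm vcpu-pin` commands that pin each dom0 vCPU to one physical CPU on the node with IO attached; which CPUs belong to that node is an assumption the caller must supply. The second estimates how many huge pages a KVM guest of a given size needs, assuming 2 MiB huge pages. The function names are hypothetical.

```python
def dom0_pin_commands(io_node_cpus, dom0_vcpus):
    """Build one `xm vcpu-pin` command per dom0 vCPU.

    io_node_cpus: physical CPU numbers on the node/cell with IO attached.
    dom0_vcpus: number of vCPUs assigned to dom0.
    """
    if dom0_vcpus > len(io_node_cpus):
        raise ValueError("more dom0 vCPUs than CPUs on the IO node")
    return [
        ["xm", "vcpu-pin", "Domain-0", str(vcpu), str(io_node_cpus[vcpu])]
        for vcpu in range(dom0_vcpus)
    ]


def hugepages_needed(guest_mem_bytes, hugepage_bytes=2 * 2**20):
    """Huge pages required to back the guest's memory, rounded up."""
    return -(-guest_mem_bytes // hugepage_bytes)  # ceiling division
```

A 30 GiB guest, like those in the AIO read tests, needs 15,360 huge pages of 2 MiB. The pool itself is usually reserved through `/proc/sys/vm/nr_hugepages`.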
In Summary
• Ever evolving landscape of virtualization
− Starting to stabilize
• Oracle is a little bit of a wild-card