Dr. Martin Land
Modern Microprocessors — Fall 2012

Virtual Machines
Virtual Machine (VM)
Layered model of computation
Software and hardware divided into logical layers
Layer n receives services from server layer n – 1
Layer n provides services to client layer n + 1
Layers interact through well-defined programming interfaces
Virtual layer
Software emulation of hardware or software layer n
Transparent to layer n + 1
Provides services to layer n + 1 as expected from the real layer n
Virtual layer n can run at some layer m ≠ n in the real system
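The transparency requirement can be sketched in a few lines: a client at layer n + 1 calls the same interface whether layer n is real or a software emulation backed by some other real layer. All class and method names here are illustrative, not from any real system.

```python
# A virtual layer n emulates the real layer's interface, so the client
# (layer n + 1) cannot tell the two apart.

class RealLayer:
    """Real layer n: serves requests from its client layer n + 1."""
    def service(self, request):
        return f"result({request})"

class VirtualLayer:
    """Software emulation of layer n; actually runs on a real host layer."""
    def __init__(self, host_layer):
        self.host = host_layer
    def service(self, request):
        # Translate the client's request into host operations, then return
        # exactly what the real layer n would have returned.
        return self.host.service(request)

class Client:
    """Layer n + 1: uses whatever layer n it is given."""
    def __init__(self, layer_n):
        self.layer_n = layer_n
    def run(self):
        return self.layer_n.service("op")

# Transparency: same observable behavior over the real and virtual layer
assert Client(RealLayer()).run() == Client(VirtualLayer(RealLayer())).run()
```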
[Figure: a virtual layer n, running at layer m of the real system, appears as layer n to layer n + 1 of the virtual system]

Examples of Virtual Systems
Web browser exchanges data with server
Cloud computing
Virtual: service level agreement (SLA) specifies infrastructure requirements; user sees hardware/software configuration and performance
Real: provider assembles a virtual configuration that meets the SLA requirements and may be implemented in any way

[Figure: browser and local OS/protocol stack exchange data over a real network with the server's protocol stack and web server; the server the client sees is virtual]

Types of Virtual Machine
Process Virtual Machine
VM provides application interpretation above OS
Hosted Virtual Machine
Virtual machine monitor (VMM)
Runs above primary OS / below guest OS
Provides guest OS with software emulation of real hardware system
System Virtual Machine
Emulation of system-level hardware environment
Runs above physical hardware and below one or more OSs
[Figure: layer stacks for a basic system (application / OS / hardware), process VM (application / VM / OS / hardware), hosted VM (application / guest OS / VMM / OS / hardware), and system VM (applications / OSs / VMM / hardware)]

Process VM Example — Java
Designed for program portability between platforms
Provides standard interface to software
Java VM located above a standard OS
Interface to hardware is implementation dependent
I/O operations performed by calls to the OS
Java compiled to bytecode
Bytecode usually run (interpreted) in Java VM
Java without VM
Java bytecode processors in IBM mainframes: the native machine language (ISA) is Java bytecode, so bytecode executes without interpretation
http://java.sun.com/docs/books/tutorial/getStarted/intro/definition.html
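What "running bytecode in a VM" means can be illustrated with a toy dispatch loop; this is deliberately not real JVM bytecode, just a minimal stack machine showing software interpretation of compiled instructions.

```python
# Toy bytecode interpreter: fetch each opcode and emulate it in software.
# Opcodes ("push", "add", "ret") are invented for this sketch.

def interpret(code):
    stack, pc = [], 0
    while pc < len(code):
        op = code[pc]; pc += 1
        if op == "push":
            stack.append(code[pc]); pc += 1
        elif op == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "ret":
            return stack.pop()

# (2 + 3) compiled to toy bytecode
assert interpret(["push", 2, "push", 3, "add", "ret"]) == 5
```

A bytecode processor, as in the IBM mainframe example, removes this loop entirely: the hardware decode stage plays the role of the `if/elif` dispatch.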
Hosted VM Example — Guest OS Over OS
DOS command line interface over Windows
Windows allocates a 1 MB virtual memory space and copies the DOS kernel into low memory
System calls handled by the guest DOS kernel
DOS accesses to hardware are trapped and served by the Windows host OS, with responses returned to DOS
Concurrent DOS windows
Multiple allocations of 1 MB virtual memory spaces
DEBUG
Application running in a virtual DOS machine sees the 1 MB memory space allocated by Windows
Register values: Windows emulates real values to DOS; DEBUG presents DOS values to the user
Parallels, VirtualBox, VMware, DOSBox, ...
Host Windows, Linux, DOS, … as guest OSs over a host OS

[Figure: DEBUG running in a virtual-8086 DOS window above Windows on real hardware]
Virtual Machine in IBM z/990 Mainframe
Hardware
CPUs, I/O system, internal communication network
VMM (hypervisor)
Operator console for partitioning/configuring CPUs and I/O
Provides hardware emulation as an abstraction to the OS layer
OS
Each logical partition (LPAR) runs a separate instance of an operating system
Runs z/OS, MVS, VM, Unix, Linux, Windows, … instances in parallel
Non-Windows OS versions expect to see the hypervisor (not hardware)
User
User sees the single-user interface provided by one OS

[Figure: users above OS LPARs above the VMM (systems manager / hypervisor) above hardware]
VM as System Management Tool
Isolate user environments on a single hardware platform
Multiple copies of a single operating system running independently
Multiple operating systems running concurrently
Maintain higher security
Resource management
Hardware redundancy
High availability
Recovery management
Hardware pooling
Assemble a hardware cluster
Map applications to hardware efficiently
Load balancing
Remap applications to hardware

[Figure: applications remapped across OS instances on VMM-managed servers for load balancing]
z/990 Parallel Sysplex Model
Parallel Sysplex
Merges 2 to 32 instances of z/OS into a single system
Applications divide work and data among LPARs
High capacity for very large workloads
Resource sharing
Dynamic workload balancing
Geographical diversity
Coupled LPARs on remote physical systems
Physical backup
Automatic failure recovery
Continuous availability
[Figure: two coupled systems, each with users above OS LPARs above a systems manager and hardware, linked by a coupling facility]

Virtualization for Server Systems
Old file server model
One application per physical server
Server specified for worst-case load
Large number of typically underutilized servers
Huge aggregate spare capacity
Competition from mainframes
VMM provides dynamic load balancing
Hardware provides centralized power, cooling, monitoring, backup
High SAR: scalability, availability, reliability
Lower cost per served client than a server farm
Virtualization in servers
Partition hardware resources to run independent applications
Intel virtualization
IA-32 and IA-64 ISA support
I/O chipset support
HP Virtual Partitions (vPars)
Hewlett-Packard, "Installing and Managing HP-UX Virtual Partitions (vPars)"
Boot Order
System VM Organization
Hypervisor
Virtual machine monitor (VMM)
Lowest layer above the physical hardware (host), a uniprocessor or multiprocessor system
Creates virtual machine (VM) environments for guest OSs
Allocates physical host resources to virtual resources
VM overhead
Processor-intensive applications: low overhead
Infrequent use of OS calls
Most instructions run directly on hardware
I/O-intensive applications: high overhead
Frequent use of OS calls
OS calls for I/O services run in emulation
I/O-limited applications
Program throughput limited by I/O latency
Emulation adds relatively small overhead
VMM Requirements
Hardware abstraction
Guest environment must replicate hardware
VMM must present well-defined software interface to OS
Protection
Isolate guests from one another
Protect VMM from guest OS and application software
Guest software cannot change allocation of physical resources
Privilege
VMM runs in kernel mode
Guest OSs and applications run in user mode
Hardware support for VMM
Virtualization primitives built into mainframe ISAs
Any OS or application access to hardware causes a trap to the VMM
VMM catches every access to the hardware abstraction layer (HAL)
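The trap-to-VMM requirement is the classic trap-and-emulate pattern, sketched below. The instruction set, `Trap` exception, and handler names are illustrative stand-ins, not any real ISA.

```python
# Trap-and-emulate sketch: guest instructions run "natively" unless they
# touch hardware; those trap to the VMM, which emulates them against
# virtual (not physical) state.

class Trap(Exception):
    pass

PRIVILEGED = {"out", "load_cr3"}          # toy set of hardware-touching ops

def guest_execute(instr):
    if instr[0] in PRIVILEGED:
        raise Trap(instr)                 # hardware faults to the VMM
    return ("native", instr)

def vmm_handle(trap, virtual_state):
    op, *args = trap.args[0]
    virtual_state[op] = args              # emulate against virtual hardware
    return ("emulated", op)

virtual_state = {}
results = []
for instr in [("add", 1, 2), ("out", 0x3F8, 65), ("mov", 7)]:
    try:
        results.append(guest_execute(instr))
    except Trap as t:
        results.append(vmm_handle(t, virtual_state))

assert [r[0] for r in results] == ["native", "emulated", "native"]
assert virtual_state == {"out": [0x3F8, 65]}
```

This only works if *every* privileged operation traps; the later IA-32 slides show what goes wrong when some do not.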
Virtualization Awareness
Virtualization-aware guest OS
OS written to run above a VMM/hypervisor
Expects to interact with a virtual host
Does not expect full or direct control of physical hardware
OS code interfaces with hypervisor code
No need to remap (bluff) pointers intended for real hardware
May be presented with a view of the real system for limited operations
Example: mainframe OS
Writes I/O outputs to the hypervisor interface
Does not attempt to configure I/O hardware devices
A particular OS may be given direct control of a particular I/O device
Virtualization-unaware guest OS
OS written to run above physical hardware
Expects full and direct control of real hardware
Requires extensive intervention and remapping by the VMM
Hardware Emulation Activities
OS sees hardware through operations
OS instructions cause the CPU to initiate memory and I/O operations
I/O devices initiate DMA operations and interrupts
VMM manages I/O device DMA
Translates OS interrupt handlers from guest format to host format

VMM emulation and the real hardware operation it wraps:
CPU memory access: VMM translates data/instructions from guest to host format and remaps the address space; real hardware reads data or instructions from, and writes data to, host memory
CPU I/O device access: VMM translates data/instructions from guest to host format and remaps the I/O port space; real hardware reads data or instructions from, and writes data to, the real host I/O device
DMA or IRQ: VMM translates I/O device actions; real hardware performs the device DMA or interrupt
Full/Partial System Emulation
Full system emulation
VMM intervenes in every OS access to hardware
CPU: translates guest ISA to host ISA
Memory: translates memory size and organization
Chipset: translates guest configuration instructions to host
I/O devices: translates guest drivers to host drivers
CPU emulation example
Run a Nintendo game on a PC
Translate each Nintendo instruction to the IA-32 instruction set
Partial system emulation
Part of the host hardware is presented to the OS unchanged
VMM passes guest operations to the host with minimal intervention
Most system VMs emulate a subset/superset of the real host hardware
CPU emulation only in special cases
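The memory-size/organization translation above reduces, at its core, to remapping guest addresses to host addresses. A page-granular sketch, with an illustrative map and page size:

```python
# Guest-to-host memory remapping: guest "physical" pages are backed by
# host physical pages chosen by the VMM. Map contents are illustrative.

PAGE = 4096
gpa_to_hpa = {0: 7, 1: 3, 2: 12}      # guest page number -> host page number

def translate(gpa):
    page, offset = divmod(gpa, PAGE)
    return gpa_to_hpa[page] * PAGE + offset

# Guest address 0x1004 = guest page 1, offset 4 -> host page 3, offset 4
assert translate(0x1004) == 3 * PAGE + 4
```

Real VMMs do this with multilevel tables (shadow page tables, or hardware-walked nested tables), but the mapping being computed is the same.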
Software Emulation of I/O Hardware
Advantages
VMM provides emulation of widely supported device hardware
Guest OS runs available device drivers without modification
Difficulties
Requires very accurate device emulation
Includes hardware revisions and "bug emulation"
Performance issues
VMM intervention on every guest OS access to an I/O device
Context switch from guest OS to VMM
VMM emulates the I/O access and accesses the real I/O device
Context switch back to guest OS with the response
Adds considerable overhead
Emulation is compute-intensive — increases CPU utilization
Least-bad case
Virtual device = real device
Remap I/O ports — no change to driver operation
Bootstrap Process in System VM
Workstation without VMM
System boot: CPU loads the initial system loader (ISL); ISL points to the system boot device; boot device contains the OS
Device discovery: OS loader writes to host I/O space; chipset and I/O devices respond
OS loads drivers for host devices
OS provides the user interface

Workstation with VMM
System boot: CPU loads the initial system loader (ISL); ISL points to the system boot device; boot device contains the VMM
Device discovery: VMM loader writes to host I/O space; chipset and I/O devices respond
VMM loads drivers for host devices
VMM provides the administrator interface
Secondary boot: administrator configures VM partitions and points the VMM to the device containing the OS boot image; VMM boots the OS into a partition
Guest device discovery: OS loader writes to virtual I/O space; VMM responds for the I/O devices
OS loads drivers for virtual devices
OS provides the user interface
Virtualization Difficulties for IA-32
IA-32 designed to provide hardware support to an OS
Memory segmentation
Virtual memory and paging
Task management
Interrupt management
Protection and privilege for segmentation, paging, interrupts
Workaround virtualization
Treat the OS like a user application
Can create a kludge on IA-32 systems
IA-32 operating systems
Expect to have the highest privilege
Can easily discover their lowered privilege

[Figure: real stack (application / OS / hardware) vs. virtual stack (application / OS kernel running as user code / VMM / hardware)]
Memory Resource Compression
OS manages resources using IA-32 system tables
Assigns the pointer to the page table root (directory)
Manages page table entries
Manages memory segmentation with descriptor tables limited to 8 K entries
Global descriptor table (GDT)
Maps segment pointers to virtual addresses
Defines segment type (code, data, system) and privilege level
Interrupt descriptor table (IDT)
Maps interrupts and traps to service routines
Memory compression
VMM must reserve part of guest virtual memory for management
OS expects to see the full virtual memory space
Table resource compression
VMM requires entries in the GDT and IDT for management of the OS
VMM must prevent OS access to its descriptors
Ring Aliasing
Privilege rings
Memory segments assigned privilege from 0 (highest) to 3 (lowest)
Privilege stored in the segment descriptor (table entry defining the segment)
Access rights for code limited to segments of the same or lower privilege
Privilege copied into the code segment selector (pointer to the segment via its descriptor)
User mode ~ ring 3
OS kernel mode ~ ring 0
Ring aliasing
Deprivileging
Run the VMM at ring 0 and the OS at ring 1
Issues
Paging protection distinguishes only two privilege levels
4-level privilege not supported in 64-bit systems
OS can read its CPL from the code segment selector

[Figure: rings 3 to 0; access granted when CPL is at least as privileged as DPL, denied otherwise]
CPL — privilege level of code segment
DPL — privilege level for data access or branch target
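The "OS can read its CPL" problem follows directly from the selector format: the low two bits of the CS selector hold the privilege level, readable with an ordinary, non-faulting MOV. A sketch (the selector values are typical examples, not fixed by the architecture):

```python
# On IA-32, the low two bits of the code segment selector give the
# current privilege level (CPL), so a deprivileged guest OS can
# discover it is not at ring 0 without any instruction faulting.

def cpl(cs_selector):
    return cs_selector & 0x3

assert cpl(0x08) == 0    # example kernel code selector: ring 0
assert cpl(0x1B) == 3    # example user code selector: ring 3
assert cpl(0x09) == 1    # guest OS deprivileged to ring 1 sees CPL 1
```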
Non-Faulting Access to Privileged State
Privileged registers
Control configuration of hardware systems
VMM must
Intercept OS accesses to privileged registers
Provide virtual values determined for the guest environment
Access to privileged registers in IA-32
Access by unprivileged software usually prevented
Causes a protection fault
VMM emulates the response to the guest instruction
Some unprivileged instructions access privileged state and do not fault
Guest OS can determine that it does not control the CPU
On user access to system state
Protection fault on write
No fault on read

[Figure: GDTR, IDTR, LDTR, and TR hold pointers to the GDT, IDT, LDT, and current task segment]
System Calls and Interrupts
System calls
Application in ring 3 invokes the OS in ring 0
Requires an indirect mechanism (call gate)
Redirects to a hidden ring 0 address
VMM must emulate call gates
SYSENTER instruction provides fast calls to ring 0
With a deprivileged guest, SYSENTER calls the VMM instead of the guest OS
SYSEXIT instruction ends a SYSENTER routine
Faults if executed at lower privilege than ring 0
VMM must emulate responses to SYSENTER/SYSEXIT
Interrupts
Interrupts can be masked by controlling the interrupt flag (IF)
VMM must mask interrupts and handle them by emulation
Some OSs toggle IF frequently, requiring many VMM interventions
Intel Virtualization Technology (VT)
Virtual machine monitor
Hardware boots (third-party) VMM software instead of an OS
VMM configures hardware resources among guest systems
Remaps hardware locations to virtual pointers for guests
OSs boot within guest partitions
Hardware support for virtualization
VT-enabled processors alternate between operating modes
Root mode grants full hardware control to the VMM
Non-root mode presents virtual pointers to the guest OS
VT-enabled chipset
Grants control of I/O to root mode
Remaps I/O channels for non-root mode
Operating system
Sees the virtual machine as a real system
Operates in ring 0 for maximum privilege
Sends instructions to hardware pointers in the usual way
http://www.intel.com/technology/itj/2006/v10i3/index.htm

[Figure: VMM in VMX root mode with real full privilege (ring 0); guest OS in VMX non-root mode with virtual full privilege (ring 0); user at ring 3]
[Figure: CPU, RAM, and ROM attached to a PCI host-to-bus bridge (bus controller); graphics and I/O devices on the PCI expansion bus; disk and legacy I/O behind an ISA bridge on the ISA/EISA bus]

System Issues in Virtualization
CPU virtualization support
Handles operations initiated by the CPU
Memory access by guest software
VMM assigns a virtual address space to the guest OS
I/O access by guest software
VMM translates OS driver output for the host device
Chipset virtualization support
Handles operations initiated by I/O devices
Interrupts and DMA accesses by an I/O device
Intercepted by the VMM and remapped
VT-x for IA-32 Processor Virtualization
Virtual machine extensions (VMX)
VMX root operation
Operating mode designed for the VMM
Grants highest-privilege access to host CPU hardware state
VMX non-root operation
Operating mode designed for the guest OS
Presents the OS with a virtual host configured by the VMM
OS sees standard ring 0 access to virtual IA-32 resources
OS access to privileged state trapped by the VMM
Mode transitions
VM entry: VMX root operation → VMX non-root operation
VM exit: VMX non-root operation → VMX root operation

[Figure: VMM runs in VMX root mode above the hardware; guest OS and user run in VMX non-root mode, crossing between modes via VM entry and VM exit]
Virtual Machine Control Structure
Virtual-machine control structure (VMCS)
Used for mode transition management
VM entry
Saves processor state to the VMCS host-state area
Loads processor state from the VMCS guest-state area
VM exit
Saves processor state to the VMCS guest-state area
Loads processor state from the VMCS host-state area
VMCS host-state area
Segment register selectors for VMM operations
Privileged system table pointers (GDTR, IDTR, TR, page table root)
VMCS guest-state area
Segment register selectors for OS operations
Virtual system table pointers determined by the VMM
VMM physical address space not mapped into the guest OS virtual address space
Interrupt flag (IF)
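The symmetric save/load on VM entry and VM exit can be sketched as two state swaps through VMCS-like areas. The field names and values are illustrative, not the real VMCS layout.

```python
# VM entry/exit sketch: each transition saves the outgoing processor
# context to one VMCS area and loads the incoming context from the other.

vmcs = {
    "host_state":  {"cs": 0x08, "gdtr": 0x1000},   # VMM's real pointers
    "guest_state": {"cs": 0x08, "gdtr": 0x8000},   # virtual pointers set by VMM
}

def vm_entry(cpu):
    vmcs["host_state"] = dict(cpu)          # save VMM (host) processor state
    cpu.update(vmcs["guest_state"])         # load guest processor state
    cpu["mode"] = "non-root"

def vm_exit(cpu):
    vmcs["guest_state"] = dict(cpu)         # save guest processor state
    cpu.update(vmcs["host_state"])          # load VMM processor state
    cpu["mode"] = "root"

cpu = {"cs": 0x08, "gdtr": 0x1000, "mode": "root"}
vm_entry(cpu)
assert cpu["gdtr"] == 0x8000 and cpu["mode"] == "non-root"
vm_exit(cpu)
assert cpu["gdtr"] == 0x1000 and cpu["mode"] == "root"
```

Because the guest's table pointers live only in the guest-state area, the guest never sees the VMM's real GDTR/IDTR values: the compression and aliasing problems of the earlier slides disappear.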
VMCS Details
Referenced by physical address
No page table entry in any guest address space
Location determined by VMM software
VMCS structure
Not determined by the architecture
Interface defined as a set of host VMCS-access instructions
VMM author chooses the implementation
VM entry
Loads table pointers from the VMCS
Pointer updates cause a context shift to the VM process
VMM can optionally inject a virtual event (interrupt) to cause a VM response
VM exit
VM saves context to memory
All VMs exit to a common entry point in the VMM
VM exit records details of the reason for the exit in the VMCS
VMM provides a detailed response to the VM exit
VMCS Control Fields
Settable options for interrupt virtualization
Interrupt-window exiting: VM exit when the guest allows interrupts
External-interrupt exiting: VM exit on external interrupt; external interrupts not maskable by the guest
Guest/Host mask for control register virtualization
Status flags in control registers determine processor options
VMM masks selected flags to prevent writes by the guest
Guest write to a masked flag causes a VM exit
Guest reads the flag value specified by the VMM in the VMCS
VM exit bitmaps
VMM chooses the subset of guest actions that cause VM exit
Exception bitmap: 32 exceptions that optionally cause VM exit
I/O bitmap: each 16-bit I/O port can be set to VM exit on guest access
Instruction bitmap: selects privileged instructions that cause VM exit
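The bitmap mechanism is simple bit tests at each potential exit point; a sketch with illustrative choices of which events trap:

```python
# VMCS-style exit bitmaps: the VMM sets one bit per exception vector
# (and marks I/O ports) that should force a VM exit; clear bits let the
# guest handle the event directly. The selections below are examples.

EXC_PF = 14                       # page-fault vector
exception_bitmap = 1 << EXC_PF    # only page faults exit to the VMM

io_exit_ports = {0x3F8}           # e.g. trap guest access to a serial port

def exception_causes_vm_exit(vector):
    return bool(exception_bitmap & (1 << vector))

def io_causes_vm_exit(port):
    return port in io_exit_ports

assert exception_causes_vm_exit(14) is True     # page fault -> VMM
assert exception_causes_vm_exit(3) is False     # breakpoint stays in guest
assert io_causes_vm_exit(0x3F8) is True
assert io_causes_vm_exit(0x60) is False
```

The design point is selectivity: the VMM pays a VM-exit round trip only for the events it actually needs to intervene in.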
VT-x Solves Virtualization Problems
Ring aliasing and compression
Guest software runs at its intended privilege level
Address-space compression
Guest/VMM transitions can change the virtual address space
Guest software has full use of its own address space
VMCS resides in the physical address space
Does not use the linear address space
Nonfaulting access to privileged state
VMCS controls interrupts
VMM allows guest OS access to privileged registers
Accesses cause a transition to the VMM
System calls
Guest OS runs at ring 0 as intended
Interrupts
Response to interrupts controlled through the VMCS
VT-x Exception Handling
Exception not set in bitmap: OS handles it directly
Exception set in bitmap: VM exit to VMM; VMM services the exception and updates system tables; VM entry with event injection; event injection replicates the exception to the OS; OS continues
Possible updates: page tables, system registers, I/O configuration, ...
Interrupt Virtualization
Set the external-interrupt exiting option
Interrupt causes VM exit to the VMM; VMM prepares system tables; VM entry with event injection
Event injection replicates the interrupt to the OS
Possible updates: interrupt tables, system registers, I/O configuration, ...
VT-d for PCI Chipset Virtualization
VMM allocates resources to guest OSs
Virtual address space
Virtual I/O devices mapped to real I/O devices
OS accesses a real I/O device through the VMM mapping
DMA remapping
OS configures virtual I/O devices
Enables device-initiated DMA operations to the guest address space
Real I/O device must write to the guest OS through the emulation mapping
Interrupt remapping
Real I/O devices may interrupt the CPU
Interrupt intended for one guest OS
Real I/O device must deliver the interrupt to the guest OS through the emulation mapping

[Figure: CPU, RAM, and I/O devices connected through the bridge, where VT-d remapping takes place]
http://www.intel.com/technology/itj/2006/v10i3/index.htm
DMA Protection Domains
Protection domain
Subset of physical memory allocated for device-initiated DMA
Protection domains may be allocated to
VMM
Guest OS
Driver process running under a guest OS
I/O device
May be assigned to a protection domain
Can only perform DMA to its assigned protection domain
DMA address translation
I/O device DMA request to the bridge contains a memory address
VT-d treats the request address as a DMA virtual address (DVA)
May be the Guest Physical Address (GPA) of a guest OS
May be a general software-generated virtual I/O address
DVA translated to a Host Physical Address (HPA)
Mapping I/O Devices to Protection Domains
PCI device requester ID
Identifies the DMA device and request
Assigned by PCI configuration software during device discovery
Root-entry table
Index: 8-bit bus number from the requester ID
Entry: pointer to a context-entry table
Context-entry table
Index: 8-bit device/function number from the requester ID
Entry: pointer to the page structure used to translate the DVA
Page structure
Multilevel table structure
Similar to IA-32 page tables
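The whole lookup path can be sketched end to end: requester ID selects root entry, then context entry, then the page structure that translates the DVA. A single-level "page table" stands in for the real multilevel structure, and all table contents are illustrative.

```python
# VT-d lookup sketch: 16-bit requester ID (bus in bits 15:8,
# device/function in bits 7:0) -> root entry -> context entry ->
# page structure -> HPA.

root_table = {0: "ctx_bus0"}                       # bus -> context table
context_tables = {"ctx_bus0": {0x10: "domA_pt"}}   # dev/fn -> page structure
page_tables = {"domA_pt": {0: 7, 1: 3}}            # DVA page -> HPA page

PAGE = 4096

def dma_translate(requester_id, dva):
    bus = (requester_id >> 8) & 0xFF
    devfn = requester_id & 0xFF
    ctx = context_tables[root_table[bus]]
    pt = page_tables[ctx[devfn]]
    page, offset = divmod(dva, PAGE)
    return pt[page] * PAGE + offset            # KeyError = outside the domain

# Device 0x10 on bus 0 writes to DVA 0x1008 inside its protection domain
assert dma_translate(0x0010, 0x1008) == 3 * PAGE + 8
```

A missing entry at any level plays the role of a protection-domain violation: the device has no mapping, so the DMA cannot reach that memory.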
[Figure: PCI requester ID fields: bus, device, function]

Address Space Overview
[Figure: guest virtual addresses (GVA) translate through guest page tables to guest physical addresses (GPA) in emulated physical memory, which the VMM maps to host physical addresses (HPA); a device's DMA request ID selects root-entry and context-entry table entries whose page structures translate DMA virtual addresses (DVA) to HPA]
IA-32 Interrupt Handling
Legacy interrupts
Interrupt controller in the chipset handles device interrupts
Programmable Interrupt Controller (PIC) integrated into the ISA chipset
APIC (Advanced PIC) integrated into the PCI chipset
I/O device assigned an interrupt request (IRQ) connection to the APIC
APIC
Translates the device IRQ to an 8-bit CPU interrupt number n
Sends a hardware interrupt signal (INTR) to the processor
CPU
Loads 64-bit entry n from the Interrupt Descriptor Table (IDT)
Entry points to the Interrupt Service Routine (ISR)
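The IDT lookup is just indexed access into 8-byte gate descriptors, with the handler offset split across two fields (bits 0-15 and 48-63 in the protected-mode interrupt gate). A sketch with an illustrative handler address:

```python
# Legacy IDT lookup: interrupt number n selects the n-th 64-bit gate
# descriptor; the ISR offset is reassembled from the descriptor's
# low and high offset fields.

def isr_offset(idt, n):
    desc = idt[n]                          # 64-bit gate descriptor
    low = desc & 0xFFFF                    # offset bits 15:0
    high = (desc >> 48) & 0xFFFF           # offset bits 31:16
    return (high << 16) | low

# Build a gate whose handler lives at 0xC0101234 (illustrative address),
# with type/attribute byte 0x8E and code segment selector 0x08
handler = 0xC0101234
gate = (handler & 0xFFFF) | ((handler >> 16) << 48) | (0x8E << 40) | (0x08 << 16)
idt = {0x21: gate}

assert isr_offset(idt, 0x21) == handler
```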
Message signaled interrupts (MSI)
I/O APIC in the PCI chipset formats the IRQ signal into a structured message
Message transferred on the PCI bus as a device-initiated DMA operation
Local APIC in the CPU receives and decodes the message
Message Interrupt Handling
Local APIC
CPU interrupt controller
Receives/decodes local interrupt signals
Receives interrupt messages from the I/O APIC
I/O APIC
PCI chipset interrupt controller
Receives/decodes device IRQ signals
Sends/receives interrupt messages
IA-32 Intel Architecture Software Developer’s Manual
Interprocessor Interrupts
Interprocessor Interrupt (IPI)
Subset of the APIC interrupt message table
CPU writes to the interrupt command register (ICR) in its local APIC
Local APIC issues an IPI message on the system bus
Used to boot and spawn threads in a multiprocessor system
Interrupt Remapping
Message signaled interrupt (MSI)
Encodes the interrupt vector and destination processor
Real I/O device not aware of the guest OS view of the emulated I/O device
VMM must intercept the MSI
VMM redefines the interrupt message format
Provides a substitute MSI
DMA write request contains
A message identifier
No interrupt attributes (vector and destination processor)
Requester ID of the real I/O device generating the interrupt
Requester ID mapped through the table structure (root/context tables)
Points to an interrupt remapping table (IRT)
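The indirection can be sketched as two dictionary hops: the substitute MSI carries only a message identifier plus the device's requester ID, and the IRT entry (not the device) supplies the real vector and destination CPU. All table contents are illustrative.

```python
# Interrupt-remapping sketch: requester ID -> IRT; message ID -> IRT
# entry holding the actual interrupt attributes.

irt_for_device = {0x0010: "irt_devA"}                    # requester ID -> IRT
irts = {"irt_devA": {5: {"vector": 0x41, "dest_cpu": 2}}}

def remap_interrupt(requester_id, message_id):
    irt = irts[irt_for_device[requester_id]]
    return irt[message_id]        # attributes live here, not in the message

attrs = remap_interrupt(0x0010, 5)
assert attrs == {"vector": 0x41, "dest_cpu": 2}
```

Because the attributes live in VMM-controlled tables, a device can never aim an interrupt at a CPU or vector the VMM did not assign to it.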
Caching of Remapping Structures
VT-d supports hardware caching of remapping tables
Root/context tables
Paging structures
IOTLB
Interrupt remapping table entries
VMM responsible for maintaining the remapping cache
Must invalidate stale cache entries
Remapping errors
DMA access request returns an error message
Device response to the error is implementation dependent
Errors logged to the VMM
VMM may reset the cache or the I/O device configuration tables
VirtualBox
Open-source hosted VMM by Oracle (Sun Microsystems)
Runs on Intel and AMD x86 hardware
Runs above Windows, Linux, Mac OS X (Intel), Solaris
Provides VMs with guest OSs
Standard DOS, Windows, Linux, OS/2, FreeBSD, Solaris
Uses hardware virtualization support if available (not required)
Scheduling
Host OS grants timeslices to the VM
VM sub-processes scheduled by the guest OS

[Figure: guest OS and applications above the VirtualBox hypervisor, alongside a host application above the host OS, all on x86 hardware]
VirtualBox Architecture
Front end (client)
VirtualBox hypervisor
Runs above the host OS
Without Intel VT, performs workaround virtualization
Hypervisor runs in ring 0 of the guest context
Guest OS runs as a user program in ring 1 of the guest context
Limited use of Intel VT if available
Back end (server)
Ring 0 driver in the host OS
Copes with the "gory details of x86 architecture"
Allocates physical memory for the VM (guest OS)
Saves/restores guest CPU context during host interrupts
Registers and descriptor tables
No intervention in guest OS process management

[Figure: guest OS and application above the hypervisor; VirtualBox ring 0 driver inside the host OS]
Modern Microprocessors — Fall 2012
CPU
Operating
Modes
Runs native (not emulated) on the CPU
Host applications at ring 3
Host OS code at ring 0
Guest "safe" application code at ring 3
Non-system activities
Makes system calls to the guest OS
Runs emulated on the CPU at ring 3
Guest application code that causes guest OS interventions
Disabling interrupts
Traps on prohibited accesses
Real-mode code
Each instruction interpreted by the VirtualBox driver
Interpreted code runs on the CPU instead of native code
Runs native on the CPU at ring 1
Guest OS ring 0 code
Xen
Open-source system VMM
Runs on Intel and AMD x86 hardware
Runs directly above the hardware
Linux required to build and install Xen
Provides VMs with guest OSs
Linux, Solaris, Windows XP, 2003 Server
Hardware virtualization support required for a Windows guest OS
Para-virtualization for Linux/Unix guest OSs
OS kernel modified to support Xen explicitly
Operating systems ported to run on Xen
Similar effort to porting an OS to a new hardware platform
Para-virtual machine architecture very similar to native hardware
User-space applications and libraries not modified
Xen Architecture Overview, http://wiki.xensource.com/xenwiki
Xen Architecture
Xen hypervisor
Directly above the hardware
Boots the system on start-up
Domain 0
Initialized by the hypervisor on boot
Runs XenLinux, a modified Linux kernel
Provides Domain Management and Control (DMC)
Domain U
VM running a guest OS

[Figure: multiple Domain U guest OSs with applications, and Domain 0 running XenLinux with DMC, above the Xen hypervisor on x86 hardware]
Hypervisor
Full privilege
Operates directly on the hardware in ring 0
Functions
CPU scheduling for virtual machines
Memory partitioning
Provides hardware abstraction to virtual machines
No awareness of
Networking
External storage devices
Video
Common I/O

[Figure: hypervisor holding the CPU scheduler, memory partitioner, page tables, process list, and I/O, between the domains and x86 hardware]
Domain 0
XenLinux
Modified Linux kernel running in a unique VM over the hypervisor
Direct privileged access rights to physical I/O resources
Provides I/O virtualization to Domain U guest VMs
Generic I/O drivers
Network backend driver
Manages local networking hardware
Processes all VM networking requests from Domain U guests
Block backend driver
Manages local storage disks
Processes all read/write data requests from Domain U guests

[Figure: Domain 0 hosts the I/O drivers used by Domain U guests above the hypervisor]
Domain U PV
Domain U PV guests
Paravirtualized VMs running modified Linux/UNIX kernels
OS expectations
No direct access to host hardware
Shares host hardware with other VMs
Guest drivers provide I/O access
PV network driver and PV block driver access the backend drivers in Domain 0

[Figure: PV drivers in Domain U connect to backend drivers in Domain 0]
Domain U HVM
Domain U HVM guests
Fully virtualized machines
Run standard Windows or another unmodified OS
OS runs in VMX non-root operation with VT-x
OS expectations
No hardware virtualization
Not sharing with other VMs
Normal hardware access for boot
Xen virtual firmware runs in VMX root operation with VT-x
Simulates the BIOS expected by the OS on initial startup
I/O support
No special drivers
Domain 0 runs a Qemu-dm daemon for each HVM guest
Supports Domain U HVM guest networking and disk access requests

[Figure: unmodified OS drivers in Domain U served by the Qemu-dm daemon in Domain 0]
Domain Management
Xend daemon
Python application running in Domain 0
System manager for the Xen environment
Processes requests as XML remote procedure calls (RPC)
Qemu-dm
Daemon handles networking and disk requests from Domain U HVM guests
Provides full emulation of hardware for standard OS I/O drivers
Virtual firmware
Provides full emulation of the BIOS for the Domain U HVM guest OS

[Figure: Xend and Qemu in Domain 0 support Windows (HVM), Linux (PV), and Unix (PV) domains above the hypervisor]
Domain U PV to Domain 0 Communication
Domain U PV guest requests I/O from Domain 0 via the hypervisor
No direct support in the hypervisor for I/O
Inter-domain event channel
Domain 0 and each Domain U have a shared memory area
Asynchronous inter-domain interrupts implemented in the hypervisor
Example: Domain U PV guest data write to hard disk
Guest OS sends a write request to the PV block driver
Guest PV block driver writes the data to Domain 0 shared memory through the hypervisor
Guest PV block driver sends an inter-domain interrupt to Domain 0 through the hypervisor
Domain 0 receives the interrupt from the hypervisor
Interrupt triggers PV block backend driver access to shared memory
Backend driver reads the blocks from Domain U PV guest shared memory
Backend driver writes the data to hard disk
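The write path above can be sketched with plain lists standing in for the shared memory area and event channel; all names are illustrative stand-ins for Xen's grant tables and event channels.

```python
# Domain U PV -> Domain 0 disk-write sketch: the guest's PV block driver
# places data in shared memory and signals an event; Domain 0's backend
# driver consumes the data and performs the real disk write.

shared_memory = []          # area shared by Domain U and Domain 0
disk = []                   # Domain 0's real storage
pending_events = []         # inter-domain "interrupts" via the hypervisor

def pv_block_write(blocks):                 # runs in Domain U
    shared_memory.extend(blocks)
    pending_events.append("blkif")          # hypervisor delivers to Domain 0

def backend_handle_event():                 # runs in Domain 0
    if pending_events and pending_events.pop() == "blkif":
        disk.extend(shared_memory)          # read shared memory, write disk
        shared_memory.clear()

pv_block_write(["block0", "block1"])
backend_handle_event()
assert disk == ["block0", "block1"] and shared_memory == []
```

Note the hypervisor's limited role: it moves no data itself, only the shared mapping and the event notification, which is why PV I/O is cheaper than full device emulation.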
I/O Driver Communication
[Figure: a write request flows from an application through the PV block driver in Domain U, via hypervisor interrupts and shared memory, to the backend block driver in Domain 0 (DMC), which writes the disk]
Xen PV and HVM Performance
Dong et al., "Extending Xen with Intel Virtualization Technology", Intel Technology Journal
Test Configuration
Intel Xeon @ 2.3 GHz
4 GB DDR2 533 MHz memory
160 GB Seagate SATA disk
Intel E100 Ethernet controller
I/O Bottleneck
Bottleneck: single Ethernet controller
Guest OS tasks waiting for I/O access hide the performance degradation caused by virtualization
Web server running over native Linux without Xen: threads compete above 2.5 Gbps
Web server running over XenLinux in Domain 0: threads compete above 1.9 Gbps
Web server running over XenLinux in Domain U PV: threads compete above