Case Study: Xen, a VMM based on Paravirtualization
Performance and security isolation
• The run-time behavior of an application is affected by other applications running concurrently on the same platform and competing for CPU cycles, cache, main memory, disk and network access. Thus, it is difficult to predict the completion time!
• Performance isolation - a critical condition for QoS guarantees in shared computing environments.
• A VMM is a much simpler and better specified system than a traditional operating system. Example - Xen has approximately 60,000 lines of code; Denali has only about half, 30,000.
• Xen /ˈzɛn/ is a hypervisor using a microkernel design, providing services that allow multiple computer operating systems to execute on the same computer hardware concurrently.
• The University of Cambridge Computer Laboratory developed the first versions of Xen. The Xen community develops and maintains Xen as free and open-source software, subject to the requirements of the GNU General Public License (GPL), version 2. Xen is currently available for the IA-32, x86-64 and ARM instruction sets.
• The security vulnerability of VMMs is considerably reduced as the systems expose a much smaller number of privileged functions.
Computer architecture and virtualization
• Conditions for efficient virtualization
– A program running under the VMM should exhibit a behavior essentially identical to that demonstrated when running on an equivalent machine directly.
– The VMM should be in complete control of the virtualized resources.
– A statistically significant fraction of machine instructions must be executed without the intervention of the VMM.
• Two classes of machine instructions:
– Sensitive - require special precautions at execution time:
• Control sensitive - instructions that attempt to change either the memory allocation or the privileged mode.
• Mode sensitive - instructions whose behavior is different in the privileged mode.
– Innocuous - not sensitive.
Full virtualization and paravirtualization
• Full virtualization – a guest OS can run unchanged under the VMM as if it were running directly on the hardware platform.
– Requires a virtualizable architecture.
– Example: VMware.
• Paravirtualization - a guest operating system is modified to use only instructions that can be virtualized. Reasons for paravirtualization:
– Some aspects of the hardware cannot be virtualized.
– Improved performance.
– Present a simpler interface.
– Examples: Xen, Denali.
Full virtualization and paravirtualization
Hardware abstractions are sets of routines in software that emulate some
History
• In the early 2000s, the lack of hardware support for virtualization on the x86 architecture became a disadvantage; AMD (Advanced Micro Devices) and Intel responded with first-generation hardware support for x86 virtualization.
• In 2005, Intel released two Pentium 4 models that support VT-x.
• In 2006, AMD released several Athlon 64 models supporting AMD-V (code-named Pacifica).
x86 poses some problems
• Certain x86 instructions were impossible to truly ‘virtualize’ in the classical sense.
• For example, the ‘smsw’ instruction can be executed at any privilege level, and in any processor mode, revealing to software the current hardware status (e.g., Vm, Rf).
• Intel’s Vanderpool Project endeavored to remedy this (using new processor modes).
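The problem can be phrased as a set check: classical trap-and-emulate virtualization works only if every sensitive instruction is also privileged, that is, it traps when executed outside the most privileged level. The sketch below is illustrative only; the instruction table is a small hand-picked sample rather than a complete x86 description, and the function name `non_virtualizable` is hypothetical.

```python
# Illustrative sample, NOT a complete x86 instruction table.
# "sensitive":  reads or changes privileged machine state.
# "privileged": traps when executed outside the most privileged level.
INSTRUCTIONS = {
    "add":  {"sensitive": False, "privileged": False},  # innocuous
    "lgdt": {"sensitive": True,  "privileged": True},   # traps, so the VMM can emulate it
    "lmsw": {"sensitive": True,  "privileged": True},   # traps, so the VMM can emulate it
    "smsw": {"sensitive": True,  "privileged": False},  # runs at any level without trapping
}


def non_virtualizable(instructions):
    """Return the sensitive instructions that do not trap (hypothetical helper)."""
    return sorted(
        name
        for name, props in instructions.items()
        if props["sensitive"] and not props["privileged"]
    )


if __name__ == "__main__":
    print(non_virtualizable(INSTRUCTIONS))
```

Any instruction reported by this check executes in the guest without the VMM ever seeing it, which is why hardware extensions or paravirtualization are needed.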
VT-x
• Virtualization Technology for x86 CPUs
• VT-x has two modes of operation and two transitions between them (VM entry and VM exit).
• Two new processor execution modes:
– VMX ‘root’ mode (for VM Managers)
– VMX ‘non-root’ mode (for VM Guests)
• Ten new hardware instructions
• A six-part VMCS data structure
VT-x
Cloud Computing: Theory and
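Interaction of VMs and VMM
[Figure: the VM Monitor (host) enters VMX operation with VMXON and leaves it with VMXOFF. A VM Entry transfers control from the monitor to a guest (VM #1 or VM #2); a VM Exit returns control to the monitor.]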
VT-d, a new virtualization architecture
• I/O MMU virtualization gives VMs direct access to peripheral devices.
• VT-d supports:
– DMA address remapping: address translation for device DMA transfers.
– Interrupt remapping: isolation of device interrupts and VM routing.
– I/O device assignment: devices can be assigned by an administrator to a VM in any configuration.
– Reliability features: it reports and records DMA and interrupt errors that may otherwise corrupt memory and impact VM isolation.
VMCS
• Virtual Machine Control Structure
– A six-part data structure (fits in a page frame)
– One VMCS for each VM, one for the Monitor
– CPU is told the physical address of each VMCS
– Software must first “initialize” each VMCS
– Then no further direct access to a VMCS
– Access is indirect (via VMX instructions)
Six logical groups
• Organization of contents in the VMCS:
– The ‘Guest-State’ area
– The ‘Host-State’ area
– The VM-execution Control fields
– The VM-exit Control fields
– The VM-entry Control fields
– The VM-exit Information fields
The ten VMX instructions
• VMXON and VMXOFF
• VMPTRLD and VMPTRST
• VMCLEAR
• VMWRITE and VMREAD
• VMLAUNCH and VMRESUME
• VMCALL
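How these instructions move a processor between VMX root and non-root mode can be sketched as a small state machine. This is a toy model written for illustration, not a description of real hardware behavior: the class `VmxCpu`, its method names, and the string VMCS handles are all assumptions, and most architectural checks are omitted.

```python
class VmxError(Exception):
    """Raised when an operation is issued in the wrong state."""


class VmxCpu:
    """Toy model of one logical processor's VMX state (illustrative only)."""

    def __init__(self):
        self.mode = "off"          # "off", "root", or "non-root"
        self.current_vmcs = None   # VMCS made current by VMPTRLD
        self.launched = set()      # VMCSs already started with VMLAUNCH

    def _require(self, condition, message):
        if not condition:
            raise VmxError(message)

    def vmxon(self):
        self._require(self.mode == "off", "already in VMX operation")
        self.mode = "root"

    def vmxoff(self):
        self._require(self.mode == "root", "VMXOFF needs root mode")
        self.mode = "off"
        self.current_vmcs = None

    def vmptrld(self, vmcs):
        self._require(self.mode == "root", "VMPTRLD needs root mode")
        self.current_vmcs = vmcs

    def vmclear(self, vmcs):
        self._require(self.mode == "root", "VMCLEAR needs root mode")
        self.launched.discard(vmcs)
        if self.current_vmcs == vmcs:
            self.current_vmcs = None

    def vmlaunch(self):
        self._require(self.mode == "root", "VMLAUNCH needs root mode")
        self._require(self.current_vmcs is not None, "no current VMCS")
        self._require(self.current_vmcs not in self.launched, "use VMRESUME")
        self.launched.add(self.current_vmcs)
        self.mode = "non-root"     # VM entry

    def vmresume(self):
        self._require(self.mode == "root", "VMRESUME needs root mode")
        self._require(self.current_vmcs in self.launched, "use VMLAUNCH first")
        self.mode = "non-root"     # VM entry

    def vm_exit(self):
        self._require(self.mode == "non-root", "no guest is running")
        self.mode = "root"         # control returns to the monitor
```

A monitor would call `vmxon()`, `vmptrld(...)`, and `vmlaunch()` once per guest, then alternate `vm_exit()` and `vmresume()` until it shuts down with `vmxoff()`.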
Xen Architecture
◇ Virtual machine layer
◇ Hypervisor layer
◇ Hardware/physical layer
Hardware or physical layer:
Physical hardware components including memory, CPU, network cards, and disk drives.
Hypervisor layer:
Thin layer of software that runs on top of the hardware. The Xen hypervisor gives each virtual machine a dedicated view of the hardware.
Virtual machine layer:
VM Techniques (1) - Full-Virtualization
• Technical aspects:
Full virtualization is a virtualization technique that provides a complete simulation of the underlying hardware, that is, a total abstraction of the underlying physical system, and creates a complete virtual system in which the guest operating system can execute.
In such an environment, any software capable of executing on the raw hardware, in particular any operating system (the guest operating system), can run in the virtual machine. No modification is required in the guest operating system or application; neither is even aware that it is running within a virtualized environment.
• Typical solution of Full-Virtualization:
❑ Commercial: VMware ESX, Microsoft Virtual Server, Citrix XenServer.
Full-Virtualization - Continued
• Advantages:
❑ Operating System does not need to be modified in order to run in a virtualized environment.
❑ A virtual machine can be moved smoothly and easily to a different virtualization system.
Example: converting a VMware guest image into a Xen image
• Disadvantages:
❑ Incurs a performance and resource penalty on VMs.
VM Techniques (2) - Para-Virtualization
• Technical aspects:
Para-virtualization is a virtualization technique that attempts to provide most services directly from the underlying hardware instead of abstracting it. Para-virtualization allows for near-native performance.
Para-virtualization requires that a guest operating system be modified to support virtualization. This typically means that guest operating systems are limited to open source systems such as Linux.
• Typical solution of Para-Virtualization:
❑ Commercial: Sun Solaris Containers.
Para-Virtualization - Continued
• Advantages:
❑ Para-virtualized guest system comes closer to native performance than a fully virtualized guest.
❑ The latest virtualization CPU support is not needed for para-virtualized guests.
• Disadvantages:
❑ Requires that a guest operating system be modified to support virtualization. This typically means that guest operating systems are limited to open source systems such as Linux.
Xen - a VMM based on paravirtualization
• The goal of the Cambridge group - design a VMM capable of scaling to about 100 VMs running standard applications and services without any modifications to the Application Binary Interface (ABI).
• Linux, Minix, NetBSD, FreeBSD, NetWare, and OZONE can operate as paravirtualized Xen guest OS running on x86, x86-64, Itanium, and ARM architectures.
• Xen domain - ensemble of address spaces hosting a guest OS and applications running under the guest OS. Runs on a virtual CPU.
– Dom0 - dedicated to execution of Xen control functions and privileged instructions.
– DomU - a user domain.
Xen: Approach and Overview
• Xen: paravirtualization
– Exposes some of the underlying hardware to the guest OS
• Better performance
• Needs modifications to the OS
Xen implementation on x86 architecture
• Xen runs at privilege Level 0, the guest OS at Level 1, and applications at Level 3.
• The IDE interface was originally designed for rotating hard disk drives (HDDs) in PC systems. An IDE DOM (Disk-On-Module) is a small flash storage module that plugs directly into the IDE connector of the host motherboard.
• The x86 architecture does not support either the tagging of TLB entries or the software management of the TLB. Thus, address space switching, when the VMM activates a different OS, requires a complete TLB flush; this has a negative impact on the performance.
• A translation lookaside buffer (TLB) is a cache that memory management hardware uses to improve virtual address translation speed.
• Solution - load Xen in a 64 MB segment at the top of each address space and delegate the management of hardware page tables to the guest OS with minimal intervention from Xen. This region is not accessible or re-mappable by the guest OS.
• Xen schedules individual domains using the Borrowed Virtual Time (BVT) scheduling algorithm.
Memory Management
• Depends on the hardware support
– Software-managed TLB
• Associate address space IDs with TLB tags
• Allow coexistence of OSes
• Avoid TLB flushing across OS boundaries
• x86 does not have a software-managed TLB
– Xen exists at the top 64 MB of every address space
– Avoids TLB flushing when a guest OS enters/exits Xen
– Each OS can only map to memory it owns
– Writes are validated by Xen
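The last two points can be sketched as an ownership check applied to every page-table write. This is a minimal sketch under stated assumptions, not Xen's actual code: the class `PageTableValidator`, its methods, and the use of plain dictionaries for page tables are hypothetical.

```python
class PageTableValidator:
    """Toy stand-in for the hypervisor's check on guest page-table writes."""

    def __init__(self):
        self.owner = {}  # machine frame number -> owning domain id

    def assign(self, frame, domain):
        """Record that a machine frame belongs to a domain."""
        self.owner[frame] = domain

    def mmu_update(self, domain, page_table, virtual_page, frame):
        """Apply a mapping only if the requesting domain owns the frame."""
        if self.owner.get(frame) != domain:
            return False               # rejected: frame belongs to someone else
        page_table[virtual_page] = frame
        return True


if __name__ == "__main__":
    validator = PageTableValidator()
    validator.assign(frame=100, domain=1)
    validator.assign(frame=200, domain=2)
    table = {}
    print(validator.mmu_update(1, table, virtual_page=7, frame=100))
    print(validator.mmu_update(1, table, virtual_page=8, frame=200))
```

The guest keeps managing its own page tables, but a mapping to a frame it does not own never takes effect.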
CPU
• x86 supports 4 levels of privileges
– 0 for OS, and 3 for applications
– Xen downgrades the privilege of OSes
– System-call and page-fault handlers registered to Xen
–
“fast handlers” for
Device I/O
•
Xen exposes a set of simple device abstractions
The Cost of Porting an OS to Xen
•
Privileged instructions
•
Page table access
•
Network driver
•
Block device driver
Control Management
• Separation of policy and mechanism
• Domain0 hosts the application-level management software
–
Creation and deletion
Control Transfer: Hypercalls and Events
• Hypercall: synchronous calls from a domain to Xen
– Analogous to system calls
• Events: asynchronous notifications from Xen to domains
Data Transfer: I/O Rings
CPU Scheduling
• Borrowed virtual time scheduling
– Allows temporary violations of fair sharing to favor recently-woken domains
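The idea can be illustrated with a heavily simplified model: each domain accumulates virtual time in inverse proportion to its weight, the scheduler runs the domain with the smallest effective virtual time, and a recently woken domain may temporarily subtract a "warp" from its virtual time to jump the queue. The field and function names below are hypothetical, and the full algorithm has further parameters (such as limits on how long a domain may warp) that are left out.

```python
from dataclasses import dataclass


@dataclass
class Domain:
    name: str
    weight: float          # larger weight -> larger CPU share
    warp: float = 0.0      # virtual time the domain may "borrow"
    warping: bool = False  # set when a latency-sensitive domain wakes up
    avt: float = 0.0       # actual virtual time consumed so far

    @property
    def evt(self):
        """Effective virtual time used for the scheduling decision."""
        return self.avt - (self.warp if self.warping else 0.0)


def pick_next(domains):
    """Run the domain with the smallest effective virtual time."""
    return min(domains, key=lambda d: d.evt)


def run(domain, time_slice):
    """Charge a time slice; heavier domains are charged less virtual time."""
    domain.avt += time_slice / domain.weight


if __name__ == "__main__":
    a = Domain("a", weight=1.0, avt=10.0)
    b = Domain("b", weight=1.0, warp=50.0, avt=30.0)
    print(pick_next([a, b]).name)   # b has consumed more, so a runs
    b.warping = True                # b has just been woken
    print(pick_next([a, b]).name)   # b borrows virtual time and runs first
```

Because the borrowed time is only subtracted for the comparison, `b` still pays for what it uses: its `avt` keeps growing, so fairness is restored once warping stops.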
Time and Timers
• Xen provides each guest OS with
– Real time (since machine boot)
– Virtual time (time spent for execution)
– Wall-clock time
• Each guest OS can program a pair of alarm timers
–
Real time
Virtual Address Translation
• No shadow page tables (as used by VMware)
• Xen provides constrained but direct MMU updates
• All guest OSes have read-only access to page tables
Physical Memory
• Reserved at domain creation time
Network
• Virtual firewall-router attached to all domains
• Round-robin packet scheduler
• To send a packet, enqueue a buffer descriptor into the transmit ring
• Use scatter-gather DMA (no packet copying)
– A domain needs to exchange page frames to avoid copying
Disk
• Only Domain0 has direct access to disks
• Other domains need to use virtual block devices
– Use the I/O ring
– Reorder requests prior to enqueuing them on the ring
– If permitted, Xen will also reorder requests to improve performance
Dom0 components
• XenStore – a Dom0 process.
– Supports a system-wide registry and naming service.
– Implemented as a hierarchical key-value storage.
– A watch function informs listeners of changes of the key in storage they have subscribed to.
– Communicates with guest VMs via shared memory using Dom0 privileges.
• Toolstack - responsible for creating, destroying, and managing the resources and privileges of VMs.
– To create a new VM, a user provides a configuration file describing memory and CPU allocations and device configurations.
– Toolstack parses this file and writes this information in XenStore.
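A hierarchical key-value store with watches, as described above for XenStore, can be sketched in a few lines. This is a toy model and not the XenStore API: the class `KeyValueRegistry`, its methods, and the example paths are illustrative assumptions.

```python
from collections import defaultdict


class KeyValueRegistry:
    """Toy hierarchical key-value store with watches (not the XenStore API)."""

    def __init__(self):
        self.data = {}                    # full path -> value
        self.watches = defaultdict(list)  # watched path -> callbacks

    def watch(self, prefix, callback):
        """Subscribe to changes at a path or anywhere beneath it."""
        self.watches[prefix].append(callback)

    def write(self, path, value):
        """Store a value and notify every listener watching this subtree."""
        self.data[path] = value
        for prefix, callbacks in self.watches.items():
            subtree = prefix.rstrip("/") + "/"
            if path == prefix or path.startswith(subtree):
                for callback in callbacks:
                    callback(path, value)

    def read(self, path):
        return self.data[path]


if __name__ == "__main__":
    registry = KeyValueRegistry()
    registry.watch("/local/domain/1", lambda p, v: print("changed:", p, v))
    # A toolstack-like writer records a VM's configuration (paths are examples).
    registry.write("/local/domain/1/memory/target", "262144")
    registry.write("/local/domain/2/memory/target", "131072")  # no listener
```

Only the first write triggers the callback, because the watch covers the subtree of domain 1 alone.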
Xen abstractions for networking and I/O
• Each domain has one or more Virtual Network Interfaces (VIFs) which support the functionality of a network interface card. A VIF is attached to a Virtual Firewall-Router (VFR).
• Split drivers have a front-end in the DomU and the back-end in Dom0; the two communicate via a ring in shared memory.
• Ring - a circular queue of descriptors allocated by a domain and accessible within Xen. Descriptors do not contain data; the data buffers are allocated out-of-band by the guest OS.
• Two rings of buffer descriptors, one for packet sending and one for packet receiving, are supported.
• To transmit a packet:
– a guest OS enqueues a buffer descriptor to the send ring,
– then Xen copies the descriptor and checks safety,
– copies only the packet header, not the payload, and
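The ring abstraction used for sending can be sketched as a fixed-size circular queue of descriptors with separate producer and consumer counters. This is a simplified illustration, not Xen's ring layout: the class `DescriptorRing` and the `(buffer_id, length)` descriptor format are assumptions, and a real ring also carries responses in the opposite direction.

```python
class DescriptorRing:
    """Toy circular queue of buffer descriptors shared by two parties."""

    def __init__(self, size):
        self.size = size
        self.slots = [None] * size
        self.producer = 0   # total descriptors enqueued by the guest
        self.consumer = 0   # total descriptors taken by the other side

    def enqueue(self, descriptor):
        """Guest side: place a descriptor; fail if the ring is full."""
        if self.producer - self.consumer == self.size:
            return False
        self.slots[self.producer % self.size] = descriptor
        self.producer += 1
        return True

    def dequeue(self):
        """Consumer side: take the oldest descriptor, or None if empty."""
        if self.consumer == self.producer:
            return None
        descriptor = self.slots[self.consumer % self.size]
        self.consumer += 1
        return descriptor


if __name__ == "__main__":
    send_ring = DescriptorRing(size=2)
    # Descriptors only reference buffers; the packet data lives elsewhere.
    send_ring.enqueue(("buffer-0", 1500))
    send_ring.enqueue(("buffer-1", 64))
    print(send_ring.enqueue(("buffer-2", 128)))  # ring is full
    print(send_ring.dequeue())
```

Keeping the counters monotonic and reducing them modulo the ring size only when indexing makes the full and empty cases easy to tell apart.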
Xen 2.0
• Optimization of:
– Virtual interface - takes advantage of the capabilities of some physical NICs, such as checksum offload.
– I/O channel - rather than copying a data buffer holding a packet, each packet is allocated in a new page and then the physical page containing the packet is re-mapped into the target domain.
– Virtual memory - takes advantage of the superpage and global page mapping hardware on Pentium and Pentium Pro processors. A superpage entry covers 1,024 pages of physical memory and the address translation mechanism maps a set of contiguous pages to a set of contiguous physical pages. This helps reduce the number of TLB misses.
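A short calculation shows why superpages reduce TLB misses. It assumes the standard 4 KB x86 base page, which the text does not state explicitly, and a purely hypothetical TLB with 64 entries.

```python
PAGE_SIZE = 4 * 1024            # assumed 4 KB base page
PAGES_PER_SUPERPAGE = 1024      # from the text: one entry covers 1,024 pages
TLB_ENTRIES = 64                # hypothetical TLB size, for illustration only

superpage_size = PAGE_SIZE * PAGES_PER_SUPERPAGE


def tlb_reach(entries, bytes_per_entry):
    """Memory that can be addressed without a TLB miss."""
    return entries * bytes_per_entry


if __name__ == "__main__":
    mib = 1024 * 1024
    print("superpage size (MiB):", superpage_size // mib)
    print("reach with 4 KB pages (KiB):", tlb_reach(TLB_ENTRIES, PAGE_SIZE) // 1024)
    print("reach with superpages (MiB):", tlb_reach(TLB_ENTRIES, superpage_size) // mib)
```

With the same number of entries, each superpage mapping covers 1,024 times more memory, so far fewer translations fall outside the TLB.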
Performance comparison of virtual machines
• Compare the performance of Xen and OpenVZ with that of a standard operating system, plain vanilla Linux.
• The questions examined are:
– How does the performance scale with the load?
– What is the impact of a mix of applications?
– What are the implications of the load assignment on individual servers?
• The main conclusions:
– The virtualization overhead of Xen is considerably higher than that of OpenVZ; this is due primarily to L2-cache misses.
– The performance degradation when the workload increases is also noticeable for Xen.
The setup for the performance comparison of a native Linux system with OpenVZ, and the Xen systems. The applications are a web server and a MySQL database server. (a) The first
The darker side of virtualization
• In a layered structure, a defense mechanism at some layer can be disabled by malware running at a layer below it.
• It is feasible to insert a rogue VMM, a Virtual-Machine Based Rootkit (VMBR), between the physical hardware and an operating system.
• Rootkit - malware with privileged access to a system.
• The VMBR can enable a separate malicious OS to run surreptitiously and make this malicious OS invisible to the guest OS and to the applications running under it.
• Under the protection of the VMBR, the malicious OS could:
– observe the data, the events, or the state of the target system.
– run services, such as spam relays or distributed denial-of-service attacks.