Hardware Virtualization Technology and Its Security
Dr. Qingni Shen
Peking University
Main Points
VMM technology
Intel VT technology
Virtual Machine Monitors (VMMs)
A VMM is a software layer
Allows many virtual machines to share hardware
Allows unmodified software to run without changes
...
Virtual Machine Monitor (VMM)
[Figure: virtual machines VM0, VM1, ..., VMn, each running a guest OS (Guest OS0 ... Guest OSn) and applications (App0 ... Appn), on top of the VMM; the platform hardware below comprises processor/CS, memory, and I/O devices]
Purpose of Virtualization
Workload Isolation
Workload Consolidation
Workload Migration
Workload Embedding
[Figure: software stacks (App, OS, VMM, HW) before and after isolation, consolidation, migration, and embedding]
Virtualization Usage Models
Legacy software support
Test
The active partition
Manageable
…
Server consolidation
Failure recovery architecture
Highly elastic data center
Manageable
…
[Figure: client and server usage models mapped to isolation, consolidation, migration, and embedding]
What is Intel VT technology
Formerly known by the codenames Vanderpool* & Silvervale*
VT is a collection of hardware enhancements
VT is designed to simplify virtualization software
VT brings new value and new opportunities
VT-x and VT-i are the first VT products, implemented in Intel processors and chipsets
VT-x enhances CPU virtualization for IA-32
Main components of Intel-VT
Intel VT, designed by Intel Corporation, is a hardware-assisted virtualization solution. It includes:
VT-x/VT-i for the CPU
VT-d for the chipset
Core functions of VT-x/VT-i
Intel flexible priority technology (Intel VT FlexPriority)
Intel flexible migration technology (Intel VT FlexMigration)
Intel VT extended page tables
Intel VT FlexPriority
While executing a task, the processor receives requests, or interrupts, from other devices and applications that need attention. To minimize the performance impact, a special register in the processor, the task priority register (TPR), tracks the priority of the current task, so that only interrupts with a higher priority than the running task are serviced immediately. Intel FlexPriority creates a virtual copy of the TPR that the guest OS can read, and in some cases modify, without VMM intervention. This yields a significant performance improvement for 32-bit operating systems that access the TPR frequently. (For instance, application performance on Windows Server* 2000 improves by 35%.)
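The priority check described above can be sketched in a few lines. This is a simplified model rather than Intel's implementation: it assumes the x86 APIC convention that bits 7:4 of a vector and of the TPR form the priority class, it ignores interrupts already in service, and it treats every guest write as free when a virtual copy exists (the slide says only "in some cases"). The names `priority_class`, `is_serviced_now`, and `GuestTpr` are made up for illustration.

```python
def priority_class(value: int) -> int:
    """Return bits 7:4 of an 8-bit interrupt vector or TPR value."""
    return (value >> 4) & 0xF


def is_serviced_now(vector: int, tpr: int) -> bool:
    """An interrupt is serviced only if it outranks the current task priority."""
    return priority_class(vector) > priority_class(tpr)


class GuestTpr:
    """Hypothetical model of a guest's TPR, with or without a virtual copy."""

    def __init__(self, virtual_copy: bool) -> None:
        self.virtual_copy = virtual_copy
        self.value = 0
        self.vmm_interventions = 0

    def write(self, value: int) -> None:
        if not self.virtual_copy:
            # Without a virtual copy, the VMM has to step in on every access.
            self.vmm_interventions += 1
        self.value = value & 0xFF
```

With a TPR of 0x40, vector 0x51 (class 5) is serviced while vector 0x45 (class 4) is held back; a guest that rewrites the TPR often pays one VMM intervention per write only when no virtual copy is available.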
Intel VT FlexMigration
An important advantage of virtualization is that running applications can be migrated between physical machines with no downtime. Intel VT FlexMigration aims to make migration seamless between current and future servers based on Intel processors, even if the newer system offers an enhanced instruction set. With this technology, management software can expose a consistent set of instructions on all servers in a migration pool, so that workloads migrate seamlessly. The result is a more flexible, unified server resource pool that works across hardware generations.
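One way to picture the "consistent set of instructions" is as the intersection of the feature sets of every server in the pool. The sketch below assumes exactly that and nothing more; the function names, host names, and feature labels are illustrative and do not come from any Intel or hypervisor API.

```python
from typing import Dict, Set


def common_feature_set(pool: Dict[str, Set[str]]) -> Set[str]:
    """Return the instruction-set features supported by every server in the pool."""
    if not pool:
        return set()
    return set.intersection(*pool.values())


def can_migrate(vm_features: Set[str], target_features: Set[str]) -> bool:
    """A VM can move to a target only if the target offers everything the VM was shown."""
    return vm_features <= target_features


# Hypothetical pool: an older server and a newer one with an extra extension.
pool = {
    "old-server": {"sse2", "sse3"},
    "new-server": {"sse2", "sse3", "sse4_1"},
}
baseline = common_feature_set(pool)
```

A VM that was only ever shown `baseline` can move in either direction, whereas a VM started with the full feature set of the newer server cannot move back to the older one.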
Challenges in developing a VMM
The OS and applications should not know that they are sharing CPU resources with others
The VMM should be able to protect itself from threats posed by guest software
The VMM should be able to keep the software stacks in different VMs independent of one another
The VMM should be able to provide a virtual hardware platform interface to guest software
[Figure: VM0 and VM1, each with a guest OS and applications, on top of the VM monitor and platform hardware]
CPU virtualization on the current IA architecture requires complex software design
Software solution: guest deprivileging
The VMM handles errors raised during guest OS operation
Virtualization holes in the IA architecture:
• Ring aliasing
• Non-trapping instructions
• Out-of-bounds errors
• Interrupt virtualization
• Context switching of CPU state
• Address space compression
Complex software techniques:
• Source code modification
• Binary code modification
Sensitive instructions misbehave when the guest OS runs in a ring above ring 0
The VMM executes privileged instructions on behalf of guest software
[Figure: VM0 and VM1, each with a guest OS and applications, on top of the VM monitor and platform hardware]
VT removes the virtualization holes and the need for complex software
Intel® Virtualization Technology
Guest software runs in a new mode with reduced privileges:
• Applications still run in ring 3
• The OS runs in a deprivileged ring 0
• The VMM runs in a new mode with all privileges
[Figure: VM0 and VM1, each with a guest OS and applications, on top of the VM monitor and platform hardware]
An overview of VT-x
Operation Mode
Guest OS/VMM transitions
Virtual-machine control structure (VMCS)
Principle of VM exit
Operation mode
VMX root mode:
Has all privileges; intended for running the VMM
VMX non-root mode:
Has a subset of privileges; intended for running guest software
Reduces guest software privileges without relying on ring levels
VMX operation mode
Root operation mode
The VMM runs in root operation mode
Non-root operation mode
Guest software runs in non-root operation mode
VM Entry and VM Exit
VM Entry
From the VMM into the guest
Loads guest state from the VMCS and enters non-root mode
VMLAUNCH performs the initial entry
VMRESUME re-enters a virtual machine that has already been launched
[Figure: VM entry transfers control from the VM monitor to a VM and VM exit transfers it back; VM0 and VM1 each run a guest OS and applications on the physical host hardware]
VM Exit
From the guest into the VMM
Enters VMX root mode
Saves guest state into the VMCS
Loads VMM state from the VMCS
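The entry and exit transitions can be modeled as a small state machine. This is a toy sketch under simplifying assumptions, not how the hardware works internally: processor state is a plain dictionary, the VMM is assumed to have filled in the host state before the first entry, and the `Vmcs` and `Cpu` classes are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Vmcs:
    """Toy VMCS holding only the pieces needed for entry and exit."""
    guest_state: dict = field(default_factory=dict)
    host_state: dict = field(default_factory=dict)
    launched: bool = False
    exit_reason: str = ""


class Cpu:
    def __init__(self, vmm_state: dict) -> None:
        self.mode = "root"            # the VMM starts in VMX root mode
        self.state = dict(vmm_state)

    def vm_entry(self, vmcs: Vmcs, instruction: str) -> None:
        if self.mode != "root":
            raise RuntimeError("VM entry is only possible from root mode")
        if instruction == "VMLAUNCH" and vmcs.launched:
            raise RuntimeError("VMLAUNCH is for the first entry only")
        if instruction == "VMRESUME" and not vmcs.launched:
            raise RuntimeError("VMRESUME needs a launched VMCS")
        vmcs.launched = True
        self.state = dict(vmcs.guest_state)   # load guest state from the VMCS
        self.mode = "non-root"

    def vm_exit(self, vmcs: Vmcs, reason: str) -> None:
        if self.mode != "non-root":
            raise RuntimeError("VM exit is only possible from non-root mode")
        vmcs.guest_state = dict(self.state)   # save guest state into the VMCS
        vmcs.exit_reason = reason
        self.state = dict(vmcs.host_state)    # load VMM state from the VMCS
        self.mode = "root"
```

A typical round trip is `vm_entry(vmcs, "VMLAUNCH")`, guest execution, `vm_exit(vmcs, reason)`, VMM handling, then `vm_entry(vmcs, "VMRESUME")`, which picks the guest up where it left off.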
[Figure sequence: IA-32 operation with rings 0 and 3 becomes VT-x operation with VMX root operation and VMX non-root operation, each with its own rings 0 and 3. VMLAUNCH enters VM 1, a VM exit returns to VMX root operation, and VMRESUME re-enters VM 1. Further VMLAUNCH instructions start VM 2 ... VM n, each VM with its own VMCS (VMCS1, VMCS2, ..., VMCSn).]
Virtual Machine Control Structure (VMCS)
A VMCS is a control structure stored in memory
Only one VMCS is active at a time
VMCS payload:
VM execution, exit, and entry controls
Guest and host state
VM-exit information fields
The VMCS layout has no uniform standard, so different implementations may define it differently
VMPTRLD:
Loads a pointer to the VMCS
Virtual machine control structure (VMCS)
Intel defines the VMCS to support VMX operation. The structure can be manipulated only with VMCLEAR, VMPTRLD, VMREAD, and VMWRITE.
a) Guest-state area: processor state used when the processor switches from root mode to non-root mode;
b) Host-state area: processor state used when the processor switches from non-root mode to root mode;
c) VM-execution control fields: determine which events force the processor out of non-root operation mode into root operation mode while a VM is running;
d) VM-exit control fields: information used when the VM exits from non-root operation mode;
e) VM-entry control fields: information read when the VM enters non-root operation mode;
f) VM-exit information fields: record the reason when the VM exits from non-root operation mode to root operation mode.
Causes of VM Exit
Paging-state exits, so that the VMM can operate on the page tables
CR3 accesses, the INVLPG instruction (TLB invalidation)
Page faults
CR0/CR4 accesses
State that needs virtualization
CPUID, RDMSR, WRMSR, RDPMC, RDTSC, MOV DRx
Exceptions and I/O accesses
32-entry exception bitmap, I/O-port access bitmap
Control of asynchronous events
When the guest blocks interrupts, the VMM must handle the situation
Detecting guest state to facilitate VM scheduling
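The two bitmaps mentioned above can be read as simple bit tests: a set bit means the corresponding exception vector or I/O port forces a VM exit. The sketch below assumes a single flat I/O bitmap with one bit per port (real processors split the port range across more than one bitmap page), and the helper names are invented for illustration.

```python
def exception_causes_exit(exception_bitmap: int, vector: int) -> bool:
    """Bit n of the 32-entry exception bitmap selects exception vector n."""
    if not 0 <= vector < 32:
        raise ValueError("exception vectors are 0..31")
    return bool((exception_bitmap >> vector) & 1)


def set_io_intercept(io_bitmap: bytearray, port: int) -> None:
    """Mark an I/O port so that guest accesses to it cause a VM exit."""
    io_bitmap[port // 8] |= 1 << (port % 8)


def io_access_causes_exit(io_bitmap: bytearray, port: int) -> bool:
    """One bit per port: a set bit means the access exits to the VMM."""
    return bool((io_bitmap[port // 8] >> (port % 8)) & 1)


# Example: intercept page faults (vector 14) and one illustrative port.
bitmap = 1 << 14
ports = bytearray(65536 // 8)      # assumed flat bitmap covering 64K ports
set_io_intercept(ports, 0x3F8)
```

Here `exception_causes_exit(bitmap, 14)` is true while vector 3 passes straight to the guest, and only port 0x3F8 is intercepted.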
Benefits: VT helps improve VMMs
VT reduces dependence on the guest OS
No need for binary patching or translation
Provides support for legacy systems
VT improves robustness
No need for complex software techniques
Simpler VMM with a smaller Trusted Computing Base (TCB)
VT improves performance
Device Virtualization (VT-d)
I/O is an important component of a server. Greater CPU computing power leads to faster data processing only if data reaches the CPU smoothly. I/O capability, whether for storage, the network, graphics cards, or memory, is therefore a critical part of enterprise-level architecture.
Without VT-d, the VMM must take part directly in every interaction with I/O devices, which both slows down data transfer and increases processor load because of frequent VMM activity. VT-d gives the guest OS a mechanism for direct access to real hardware, which greatly reduces the load on the server processor.
Current approaches to I/O virtualization
Emulating the I/O device: The VMM emulates an I/O device for the guest, fully reproducing the device's functionality so that the guest can use the corresponding real drivers. This approach offers excellent compatibility (whether or not the device physically exists), but emulation noticeably hurts performance.
Additional software interface: This model resembles I/O emulation, except that the VMM provides the VM with a set of direct device interfaces to make virtualization more efficient. This is somewhat like DirectX on Windows. It performs better than I/O emulation but reduces compatibility.
Design of VT-d
The key to I/O virtualization is handling DMA and interrupt requests (IRQs).
Intel VT-d is a hardware-assisted virtualization technology based in the north bridge. DMA-virtualization and IRQ-virtualization hardware built into the north bridge greatly improves the reliability, flexibility, and performance of I/O.
Traditional IOMMUs (I/O memory management units) distinguish devices by memory address range. This is easy to build but makes DMA isolation hard to achieve. VT-d therefore updates the IOMMU architecture to support multiple DMA protection domains, which makes DMA virtualization possible. This is also called DMA remapping.
I/O devices generate many interrupt requests, so I/O virtualization must separate these requests correctly and route them to the right virtual machines. Traditional devices raise interrupts in two ways: through the I/O interrupt controller, or through MSIs (message signaled interrupts), which are sent directly as DMA write requests. Because the target memory address must be embedded in the DMA request, this architecture requires full access to all memory addresses and cannot provide interrupt isolation.
VT-d's interrupt-remapping architecture solves this problem by redefining the MSI format. The new MSI is still a DMA write request, but it carries a message ID instead of the target memory address. By maintaining a table structure, the hardware can map different message IDs to different VM domains. The interrupt-remapping architecture in VT-d supports all I/O sources, including IOAPICs, and all interrupt types, such as ordinary MSI and extended MSI-X.
DMA Remapping
DMA remapping provides hardware isolation for device accesses to memory. Every device is assigned to a specific domain through its own I/O page tables. When a device attempts to access system memory, the DMA-remapping hardware intercepts the access, decides whether to allow it, and determines the real address at the same time. Frequently used I/O page-table structures are cached. DMA remapping can be configured independently for each device.
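A minimal software model of this check might look as follows. It assumes a single-level I/O page table per domain and 4 KiB pages; real VT-d hardware uses multi-level tables and caches, which are left out. The class, method names, device IDs, and domain names are illustrative only.

```python
PAGE_SIZE = 4096


class DmaRemapper:
    """Toy model: each device belongs to one domain with its own I/O page table."""

    def __init__(self) -> None:
        self.device_domain = {}   # device id -> domain id
        self.page_tables = {}     # domain id -> {io_page: (host_page, writable)}

    def assign(self, device: str, domain: str) -> None:
        self.device_domain[device] = domain
        self.page_tables.setdefault(domain, {})

    def map_page(self, domain: str, io_page: int, host_page: int, writable: bool) -> None:
        self.page_tables.setdefault(domain, {})[io_page] = (host_page, writable)

    def translate(self, device: str, io_addr: int, write: bool) -> int:
        """Intercept a DMA access: check permission and return the real address."""
        domain = self.device_domain.get(device)
        if domain is None:
            raise PermissionError("device is not assigned to any domain")
        entry = self.page_tables[domain].get(io_addr // PAGE_SIZE)
        if entry is None:
            raise PermissionError("address is not mapped for this domain")
        host_page, writable = entry
        if write and not writable:
            raise PermissionError("page is read-only for this domain")
        return host_page * PAGE_SIZE + io_addr % PAGE_SIZE
```

A device assigned to one domain can only reach the host pages mapped in that domain's table; any other address, or a write to a read-only page, is rejected before it reaches memory.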
Interrupt Remapping
Interrupt remapping remaps and routes the interrupt requests that come from I/O devices.
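Routing by message ID can be sketched as a table lookup. The model below assumes only what the slides state, namely that a table maps each message ID to a VM domain; the stored vector, the blocking of unknown IDs, and all names are illustrative assumptions rather than the VT-d table format.

```python
from typing import Dict, Optional, Tuple


class InterruptRemapTable:
    """Toy table mapping a message ID to (VM domain, interrupt vector)."""

    def __init__(self) -> None:
        self.entries: Dict[int, Tuple[str, int]] = {}

    def program(self, message_id: int, domain: str, vector: int) -> None:
        """The VMM fills the table; devices only ever supply a message ID."""
        self.entries[message_id] = (domain, vector)

    def route(self, message_id: int) -> Optional[Tuple[str, int]]:
        """Return the destination for a message ID, or None if it is blocked."""
        return self.entries.get(message_id)


table = InterruptRemapTable()
table.program(message_id=7, domain="vm1", vector=0x41)
```

Because the device supplies only an index and the VMM owns the table, a device cannot pick an arbitrary destination: `table.route(7)` reaches `vm1`, and an unprogrammed ID such as 8 goes nowhere.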
New design of IOMMU
The IOMMU manages device access to system memory. It sits between the peripheral devices and the host, translates the addresses in device requests to system memory addresses, and checks the permissions for each access.
With an IOMMU, every device can be assigned to a protection domain. The domain defines the I/O page translations used by every device in it and specifies the read/write permissions of every I/O page. For virtualization, the VMM can place all devices assigned to a specific guest OS in the same protection domain, which creates a set of address translations and access restrictions for the devices used by that guest OS.
Two kinds of new device virtualization based on VT-d
Direct assignment of I/O devices: A physical I/O device is assigned directly to a VM. In this model, drivers inside the VM communicate directly with the hardware device, with little or no VMM involvement. For robustness, hardware virtualization support is needed to isolate and protect the hardware resources so that only the designated VM can use them. The hardware also needs multiple I/O container partitions so that it can serve multiple VMs at the same time.
This model almost completely eliminates the need to run drivers in the VMM.
The CPU, for example, although not an I/O device in the usual sense, is allocated to VMs in this way, while CPU resources remain under the management of the VMM.
Shared I/O devices: This model extends the direct-assignment model. It places a high demand on the device, which must support multiple functional interfaces, each of which can be assigned to a VM independently. This model delivers very high virtualization performance.
Network Virtualization (VT-c)
Intel VT-c further optimizes networking for virtualization. In essence, this set of technologies works like a post office: it sorts all incoming letters, packages, and envelopes and delivers them to their respective destinations. Because these functions are implemented in dedicated network chips, Intel VT-c significantly speeds up delivery and reduces the load on the VMM and the server processor. VT-c includes:
Virtual Machine Device Queues (VMDq)
VMDq
In a traditional server virtualization environment, the VMM must classify every individual data packet and deliver it to its assigned VM, which consumes many processor cycles. With VMDq, dedicated hardware in the Intel server network card performs this classification, and the VMM only has to deliver the presorted packet groups to the appropriate guest OS. This reduces I/O latency and frees more processor cycles for business applications. Intel VT-c can more than double I/O throughput, so that virtualized applications approach the throughput of the host. Each server can consolidate more applications with fewer I/O bottlenecks.
Network virtualization model
Currently, all VM software with network capabilities has built-in virtual switches, and most of it also provides router functions on that basis. The aim is to connect multiple virtual machines into one or more networks, as a real switch or router would.
Structure of VMDq
VMDq provides a classification/sorting engine that operates at layer 2 of the ISO OSI seven-layer model and implements part of the function of a switch. To deliver adequate performance it must use buffer queues, so a network card that supports VMDq also supports the RSS receive-side extension.
The layer 2 classification/sorting engine is implemented in hardware on the VMDq-capable network card. It uses the MAC address or VLAN to send each packet to the queue of the specified VM (this queue is called a pool). The VMM software that performs the virtual switch task then only has to copy the data at the end. This greatly improves the efficiency of the virtual network.
A network card that supports VMDq queues usually supports RSS queues as well. For example, the Intel 82576EB network card supports 8 VM queues and 16 RSS queues. These are essentially 16 send/receive queue pairs, which means every VM can be assigned two pairs.
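The sorting step can be sketched as a lookup from MAC address or VLAN to a pool. This is a software illustration of what the slides say the card does in hardware, not a driver interface; the class, the lookup order (MAC first, then VLAN), and the sample addresses are assumptions, while the default of 8 pools follows the 82576EB example above.

```python
from typing import Dict, List, Optional


class VmdqClassifier:
    """Toy layer 2 sorter: one receive queue (pool) per VM."""

    def __init__(self, num_pools: int = 8) -> None:
        self.pools: List[List[bytes]] = [[] for _ in range(num_pools)]
        self.mac_to_pool: Dict[str, int] = {}
        self.vlan_to_pool: Dict[int, int] = {}

    def receive(self, dst_mac: str, vlan: Optional[int], payload: bytes) -> Optional[int]:
        """Place a frame in its VM's pool; return the pool index or None if unmatched."""
        pool = self.mac_to_pool.get(dst_mac)
        if pool is None and vlan is not None:
            pool = self.vlan_to_pool.get(vlan)
        if pool is None:
            return None
        self.pools[pool].append(payload)
        return pool


nic = VmdqClassifier()
nic.mac_to_pool["52:54:00:00:00:01"] = 0   # hypothetical MAC of VM 0
nic.vlan_to_pool[20] = 3                   # hypothetical VLAN served by VM 3
```

After classification, the VMM only needs to hand each pool's frames to the matching guest instead of inspecting every packet itself.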
Diagram of VMDq Acceleration Structure
Virtual Machine Direct Connect (VMDc)
Building on the PCI-SIG Single Root I/O Virtualization (SR-IOV) standard, VM direct connect (VMDc) lets VMs access network I/O hardware directly, which improves performance significantly.
As mentioned before, Intel VT-d supports a direct communication channel between a guest OS and an I/O port. SR-IOV extends this by supporting multiple communication channels per I/O port. For example, a single Intel 10 Gigabit server network card can give each of 10 guest OSes a protected, private 1 Gb/s link. These links bypass the VMM switch, which further improves I/O performance and reduces the load on the server processors.
Security Analysis of VT-d
Hardware virtualization addresses security problems of virtual systems and provides better isolation of system hardware resources.
However, the hardware is complicated, so some security problems remain to be solved, and attackers have already discovered some vulnerabilities in hardware virtualization.
Attack Scenario
Consider a virtual system that builds a driver domain with the aid of Intel VT-d.
Driver domains are similar to traditional VMs, but they are given access to selected devices such as a network card or a disk controller.
An attacker can attempt to gain complete control of the whole system by means of such a driver domain.
In this attack scenario, we suppose that the attacker has already gained full control of a driver domain.
MSI (Message Signaled Interrupts)
MSI format (from the Intel developer manual):
All three attacks described below use I/O devices to generate MSIs in order to carry out the attack.
1) Threat based on SIPI construction
The SIPI (Start-up Inter-Processor Interrupt) is a key feature of any multiprocessor (or multi-core) system based on Intel processors. The BIOS uses SIPIs to initialize all processors and distribute tasks to them at startup.
When the system starts, only one processor, called the bootstrap processor (BSP), is active, and its job is to initialize the other processors so that they work properly.
A SIPI tells the target processor to start executing boot code at address 0xVV000, where VV is the vector carried by the SIPI. For a SIPI to take effect, the target CPU must first be sent an INIT interrupt, which resets the CPU into the wait-for-SIPI state. Under normal circumstances, the BSP sends SIPIs to all other processors.
The only mechanism for sending a SIPI is the local Advanced Programmable Interrupt Controller (local APIC).
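The relationship between the SIPI vector and the start address, and the requirement that INIT comes first, can be written down directly. The `Processor` class below is a toy model with invented names; only the address rule 0xVV000 and the wait-for-SIPI state come from the text.

```python
from typing import Optional


def sipi_start_address(vector: int) -> int:
    """Vector VV selects the start address 0xVV000, i.e. vector * 4096."""
    if not 0 <= vector <= 0xFF:
        raise ValueError("the SIPI vector is one byte")
    return vector << 12


class Processor:
    """Toy application processor that honors a SIPI only after INIT."""

    def __init__(self) -> None:
        self.state = "running"
        self.start_address: Optional[int] = None

    def receive_init(self) -> None:
        self.state = "wait-for-SIPI"

    def receive_sipi(self, vector: int) -> bool:
        if self.state != "wait-for-SIPI":
            return False                      # SIPI has no effect
        self.start_address = sipi_start_address(vector)
        self.state = "running"
        return True
```

For example, vector 0x9A makes the target processor start at 0x9A000, but only if it is in the wait-for-SIPI state when the SIPI arrives.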
SIPI format (from the Intel developer manual):
2) System call injection attack
[Figure: driver domain, CPU#0, CPU#1, CPU#2, hypervisor, and NIC; the NIC delivers vector 0x82, which is handled as a hypercall]
3) #AC-based injection attack
An injected #AC can be used to confuse the stack layout seen by the exception handler.
The #AC exception is the only exception that meets the following two requirements:
Its vector value is greater than 15, so it can be delivered by MSI;
It is interpreted as an exception whose handler expects an error code on the stack, although no error code is stored when it arrives by MSI.
Normal stack layout for an #AC exception, from low to high addresses:
ErrorCode, RIP, CS, RFLAGS, RSP, SS
If a device delivers an MSI with vector value 0x11 (#AC), the #AC handler is triggered on the target CPU. Because the handler expects an error code on top of the stack, it misinterprets the other values on the stack. In this case CS is taken for RIP, RFLAGS is taken for CS, and so on.
When the exception handler finishes, it executes the IRET instruction to pop the saved register values and jumps back to CS:RIP, which means the handler actually returns to RFLAGS:CS.
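The off-by-one-slot misreading can be reproduced with a list standing in for the stack. The sketch assumes the simplified picture in the text: the MSI-delivered interrupt pushes five values with no error code, while the handler reads the slots according to the six-slot layout it expects. The register values are made up for illustration.

```python
from typing import Dict

# What is actually on the stack (low to high) when #AC arrives as an MSI.
PUSHED_WITHOUT_ERROR_CODE = ["RIP", "CS", "RFLAGS", "RSP", "SS"]
# What the #AC handler believes is on the stack (low to high).
EXPECTED_WITH_ERROR_CODE = ["ErrorCode", "RIP", "CS", "RFLAGS", "RSP", "SS"]


def handler_view(pushed: Dict[str, int]) -> Dict[str, int]:
    """Read the real stack slots using the layout the handler expects."""
    stack = [pushed[name] for name in PUSHED_WITHOUT_ERROR_CODE]
    return {
        expected: stack[slot]
        for slot, expected in enumerate(EXPECTED_WITH_ERROR_CODE)
        if slot < len(stack)
    }


# Hypothetical register values at the time of the interrupt.
real = {"RIP": 0x401000, "CS": 0xE008, "RFLAGS": 0x246, "RSP": 0x7FF000, "SS": 0x0}
seen = handler_view(real)
```

Here `seen["RIP"]` holds the real CS and `seen["CS"]` holds the real RFLAGS, so the IRET at the end of the handler returns to RFLAGS:CS, as described above.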