Intel® Virtualization Technology
Examining VT-x and VT-d
August, 2007
v 1.0
Intel, the Intel logo, Pentium, and VTune are trademarks or registered trademarks of Intel
Corporation or its subsidiaries in the United States and other countries.
Peter Carlston,
Platform Architect
Embedded & Communications Processor Division
Agenda
Software Virtualization Challenges
Silicon Enhancements for Virtualization
• New Processor Hardware Architecture Extensions
• New Memory and I/O Controller Extensions
• New 1/10Gb Ethernet NIC Extensions
• Intel New Product Cadence & “Penryn”
Summary
Copyright © 2007, Intel Corporation. All rights reserved.
*Other brands and names are the property of their respective owners
Non-Virtualized System
[Figure: Apps (Rings 1 - 3) run on top of the Operating System (Ring 0), which runs on the Platform Hardware]
• OS runs in Ring-0.
• Uses “time slicing” to run multiple applications.
• OS drivers control all access to the platform hardware.
Software-Based VMM Challenges
[Figure: several OSes, each with its own Apps, run in Rings 1 - 3 on top of the Virtual Machine Monitor (Ring 0), which runs on the Platform Hardware]
• VMM operates in Ring-0
• Traditional OS domain
• OS Ring-0 code now runs in Rings 1-3
• OS “de-privileged”
• VMM must resolve potential conflicts between OSes:
• Binary patching, etc.
• Paravirtualization
• Can lead to performance and stability issues
Intel® Processor Virtualization Technology
[Figure: several OSes (label 0D), each with its own Apps (label 3D), run on top of a Light-Weight Virtual Machine Monitor (label 0P)]
• OS runs at privilege level 0 as expected
• No excessive faulting
• No expensive SW virtualization “hacks”
• Improved performance and stability
• Applications run in ring 3 as expected
• Applications remain unchanged
[Figure: Platform Hardware: Memory, Keyboard / Mouse, Graphics, Storage, Network, Processors]
• VMM now runs in new CPU execution mode
• HW-based mode transitions
• Memory protection in HW
• VMM is independent of HW
• VMM controls memory paging state and exceptions
[Figure labels: ROOT, NON-ROOT]
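Software usually has to confirm that the new CPU execution mode exists before it tries to use it. The following is a minimal sketch, assuming a GCC or Clang toolchain that provides `<cpuid.h>`; the helper names `ecx_has_vmx` and `cpu_has_vmx` are made up for illustration. It relies on the documented fact that CPUID leaf 1 reports VMX (VT-x) support in ECX bit 5.

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/Clang wrapper around the CPUID instruction */
#endif

/* CPUID leaf 1 reports VMX (VT-x) support in ECX bit 5. */
#define CPUID_1_ECX_VMX (UINT32_C(1) << 5)

/* Hypothetical helper: decode the VMX bit from a leaf-1 ECX value. */
int ecx_has_vmx(uint32_t ecx)
{
    return (ecx & CPUID_1_ECX_VMX) != 0;
}

/* Hypothetical helper: 1 if the CPU advertises VT-x, 0 if it does not,
 * -1 if the answer cannot be determined (leaf missing or non-x86 build). */
int cpu_has_vmx(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return -1;                  /* CPUID leaf 1 not available */
    return ecx_has_vmx(ecx);
#else
    return -1;                      /* not built for an x86 target */
#endif
}
```

A result of 1 only means the processor advertises the feature; platform firmware can still leave VT-x disabled, so a VMM would normally perform further checks before entering root mode.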
VMM Types
Hosted Virtual Machine Monitor
VMM sits on top of a Host OS. VMM uses host OS device drivers
Hypervisor VMM
Primary software layer directly on top of hardware; has built-in device drivers
[Figure: Hypervisor VM Monitor (VMM). VM 1 to VM N, each with a Guest OS and Apps, run on the hypervisor VMM, which sits directly on the Physical Host Hardware]
[Figure: Hosted VM Monitor (VMM). VM 1 to VM N, each with a Guest OS and Apps, run on the hosted VMM, which runs alongside host Apps on top of the Host OS]
Virtualization Event Frequencies with VT-x
VT-x architecture substantially reduces the frequency
of virtualization events a VMM needs to process
[Figure: stacked bar chart of virtualization events per million instructions (y-axis 0 to 250), comparing Base and VT-x for SYSmark Internet and SYSmark Office. Event categories: Other, I/O Operations, Control-Register Accesses, Interrupt Handling]
Evolution of Intel® Hardware Virtualization Technology
[Figure: timeline with labels Past, Today, Public Specs]
Past: No Hardware Support
Software-only VMMs
• Binary translation
• Paravirtualization
Vector 1: Processor Focus (VT-x)
Establish foundation for virtualization in the IA-32 and Itanium architectures…
… followed by on-going evolution of support:
• Micro-architectural (e.g., lower VM switch times)
• Architectural (e.g., extended page tables (EPT))
Vector 2: Platform Focus (VT-d)
Hardware support for IO-device virtualization:
• Device DMA remapping
• Direct assignment of I/O devices to VMs
• Device-independent control over DMA
Vector 3: I/O Assisted Virtualization
Standards for IO-device sharing:
• Multi-context I/O devices
• Endpoint device translation caching
• Under definition in the PCI-SIG*
Vector 4: Trust (TXT)
Trusted Execution Technology
• Secure Launch
• Memory protection, hardened keys
VMM Software Evolution
Simpler and more secure VMM through foundation of virtualizable ISAs
Increasingly better CPU and I/O virtualization performance and functionality as I/O devices and VMMs exploit infrastructure provided by VT-x, VT-d
I/O Virtualization Without VT-d
[Figure: Emulation Based Virtual I/O. An unmodified OS (e.g. Linux*, Ring 0) and its Application (Ring 3) run in a VM Partition on a virtualized HW instance. The guest's Driver A and Buffer talk to Virtual Device Emulation in the VMM, which uses its own Driver A, Driver B and Buffers to reach the I/O Device Hardware (Device A, Device B, e.g. a NIC)]
VT-d Features
DMA remapping
• Multi-level page tables allow SW to manage host physical memory and set up a hierarchy with page directories and tables
• SW controllability for page walk snooping
• Super page support (i.e. >4KB)
DMA fault logging
• Fault recording registers
• Advanced fault logging uses memory-resident fault log
Interrupt remapping
• Routes based on originator ID
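The DMA remapping idea above can be sketched as a small software model of a two-level table walk. This is an illustration only, not the VT-d table format: the structure names, the 9-bit index width, the permission bits and `DMA_FAULT` are all assumptions chosen to keep the example short. A device-visible address is split into a directory index, a table index and a page offset; a missing table or insufficient permission is reported as a fault instead of reaching memory.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12u                 /* 4 KB pages */
#define LEVEL_BITS 9u                  /* assumed: 512 entries per level */
#define ENTRIES    (1u << LEVEL_BITS)
#define PTE_READ   UINT64_C(1)         /* hypothetical permission bits */
#define PTE_WRITE  UINT64_C(2)
#define DMA_FAULT  UINT64_MAX          /* hypothetical "blocked" result */

typedef struct { uint64_t host_page; uint64_t perms; } leaf_entry;
typedef struct { leaf_entry e[ENTRIES]; } page_table;
typedef struct { page_table *tables[ENTRIES]; } page_directory;

/* Translate a device DMA address to a host physical address.
 * 'need' is the access being attempted (PTE_READ and/or PTE_WRITE). */
uint64_t dma_translate(const page_directory *dir, uint64_t dma_addr,
                       uint64_t need)
{
    uint64_t dir_idx = (dma_addr >> (PAGE_SHIFT + LEVEL_BITS)) & (ENTRIES - 1);
    uint64_t tbl_idx = (dma_addr >> PAGE_SHIFT) & (ENTRIES - 1);
    uint64_t offset  = dma_addr & ((UINT64_C(1) << PAGE_SHIFT) - 1);

    /* Addresses beyond what two levels can map are rejected. */
    if (need == 0 || (dma_addr >> (PAGE_SHIFT + 2 * LEVEL_BITS)) != 0)
        return DMA_FAULT;

    const page_table *tbl = dir->tables[dir_idx];
    if (tbl == NULL)
        return DMA_FAULT;               /* no table: page not mapped */

    const leaf_entry *pte = &tbl->e[tbl_idx];
    if ((pte->perms & need) != need)
        return DMA_FAULT;               /* not mapped or permission missing */

    return (pte->host_page << PAGE_SHIFT) | offset;
}
```

In this model, a VMM would build one such hierarchy per assigned device (or per VM) so that a device can only reach the host pages that belong to its guest; the fault path corresponds to the DMA fault logging listed above.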
Intel® Hardware Virtualization Technology for Directed I/O (VT-d)
[Figure: Remapping Scheme. A NIC with VMDq sits behind the PCI Express root ports; the VT-d remapping scheme in the MCH performs Device Assignment and Address Translation against the memory map of Virtual Machine 1 (CPU #1, VMCS), marking accesses as OK or Error!. Unmodified Operating Systems (Ring 0) and Applications (Ring 3) run above the VMM]
[Figure: Direct Assigned I/O via VT-d. In the VM Partition, the OS with Driver A, its Buffer and the Application sit above the VMM; DMA Re-Map sits between the VMM and the I/O Device Hardware (Device A, Device B)]
VT-x and VT-d Summary
Modern multi-core processors’ outstanding performance/watt ratios are enabling new use models, including Virtualization
Intel® Virtualization Technology™:
• Increases system performance
• Improves system robustness
• Allows creation of simpler/smaller Hypervisors
Part of industry momentum towards improved system performance, security, and trust
Receive Side Scaling (RSS)
[Figure: Core 0 to Core 3, each with I$ and D$, with an L2 Cache per pair of cores; MSI-X interrupts from receive queues rx_que0, rx_que1, rx_que2 and rx_que3 are delivered to the cores]
hash = (tcp->th_sport) ^ (tcp->th_dport) ^ (ip->ip_src.s_addr) ^ (ip->ip_dst.s_addr);
hash = hash % PRIME_NUMBER;
return lookup_table[hash];
The Receive Side Scaling feature is now designed into Intel® NICs
The NIC driver configures a redirection table in the NIC
The NICs provide queues; each queue is assigned a MAC address
The NIC hardware hashes the arriving packet’s header fields (the IP addresses and TCP ports in the sample code) and returns a queue #
The NICs are MSI-X enabled, so an interrupt can be generated per queue
Interrupts can be assigned to individual cores: “Interrupt Affinity”
• IRQ handler lives in core’s L2 cache. Greatly increases performance
NICs can also be configured to load balance IRQs across cores
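The queue selection described on this slide can be sketched as a self-contained C model. It follows the slide's XOR-and-modulo hash rather than the hash a real NIC implements (RSS hardware commonly uses a Toeplitz hash); the `flow_key` and `rss_table` types, the table size of 127 and the round-robin table fill are assumptions made for illustration.

```c
#include <stdint.h>

#define PRIME_NUMBER   127u   /* assumed table size, standing in for the slide's PRIME_NUMBER */
#define RSS_NUM_QUEUES 4u     /* rx_que0 .. rx_que3 */

/* Hypothetical flow key: the header fields the slide's hash uses. */
typedef struct {
    uint32_t src_addr, dst_addr;
    uint16_t src_port, dst_port;
} flow_key;

/* Hypothetical redirection table: hash bucket -> receive queue number. */
typedef struct { uint8_t queue[PRIME_NUMBER]; } rss_table;

/* What a driver might do at setup: spread buckets round-robin over queues. */
void rss_table_init(rss_table *t)
{
    for (unsigned i = 0; i < PRIME_NUMBER; i++)
        t->queue[i] = (uint8_t)(i % RSS_NUM_QUEUES);
}

/* XOR the header fields together, as in the slide's snippet. */
uint32_t flow_hash(const flow_key *k)
{
    return k->src_port ^ k->dst_port ^ k->src_addr ^ k->dst_addr;
}

/* Reduce the hash to a bucket and look up the queue for this packet. */
unsigned rss_select_queue(const rss_table *t, const flow_key *k)
{
    return t->queue[flow_hash(k) % PRIME_NUMBER];
}
```

Because every packet of a flow hashes to the same bucket, the flow stays on one queue and therefore on one core. With a plain XOR the two directions of a connection also land on the same queue, since swapping source and destination does not change the result.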
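[Figure: Direct Cache Access (DCA). DMA and DCA traffic passes through the MCH (Blackford) between Memory and Core 0 / Core 1 (each with I$ and D$, sharing an L2 Cache); numbered steps 1 to 3 and a Hash# label mark the paths]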
Receive
• Packet received by network controller
• Arriving packet data is DMAed into system memory
• Packet header marked with DCA tag
• MCH issues Cache Hint to target Core
• Core pre-fetches tagged DCA data into cache
Transmit
• Core posts data to be transmitted into memory
• NIC reads data from memory and sends it out on the wire
• NIC places completion notice with DCA tag in memory
• Completion is pre-fetched into the core’s cache
Intel® Product Cadence
[Figure: tick-tock roadmap with a 2-year interval per process generation; timeline labels 2005, 2006, 2007, 2008]
65nm: TICK Pentium® D, Xeon™, Core™ Processor; TOCK Core 2 Processor, Xeon® Processor
45nm (High-k Dielectric): TICK PENRYN Family; TOCK NEHALEM
32nm: TICK WESTMERE; TOCK SANDY BRIDGE
[Figure: roadmap overview; the recoverable labels are grouped below]
Max Thermal Design Power: 1W, 8W, 35W, 40W
Processors: “Silverthorne/Poulsbo”; Intel® Core™ 2 Duo; Intel® Xeon® 5100; Intel® Xeon® 5300; Dual Core; Quad-Core
Range: Ultra Mobile; Industry Leading Performance Per Watt; HPC
Information Assurance (MILS): Intel® Virtualization Technology; Trusted Execution Technology; IO Device Virtualization
“Rich Internet Experience”
Graphics / Floating Point Processing (Integrated, External): Gen 4 Graphics Core; New Graphics/FP Instructions; “Larrabee”
I/O Optimization: Greater Bandwidth; Less Latency (1 Gb, 10 Gb, > FPGA): Interrupt Coalescing; Per-Core Queues; Direct Cache Access; Front Side Bus-Attached FPGAs
Silicon and power: Extended Life-Cycle Silicon; Ultra Fine-Grained Power Management; Enhanced Sleep States; SOC; Large L2 Caches
Software: Intel® Performance Primitives (Media & Signal Processing); Intel Math Kernel Library; Threading Tools; Performance Analyzers; C++ & Fortran* Compilers
All product information and dates are preliminary and subject to change without notice
Next Generation Intel 45 nm High-k Process Technology – “Penryn” Family
• ~2x larger transistor budget provides freedom to add new features and higher performance with cost effective die sizes
• >20% faster transistor switching speed delivers higher core speeds and increased instructions per clock
• Lower leakage current reduces power consumption or enables more capability and performance within a given power envelope compared to 65nm processors
Improved OS Synchronization Primitive Performance – Penryn Family
Faster locked instruction performance
• Key primitive for multiple thread synchronization
– Faster locks enable more concurrency between threads
• Up to 55-80% faster
Faster interrupt masking control
• Execution time critical to OS for shared resource control
• Uarch improvements to eliminate pipeline stalls in the common case
• CLI/STI instructions as much as 100% faster
Faster access of Time Stamp Counter
• RDTSC instr as much as 3x faster
• Key functionality for database servers, OS time-of-day services frequent in transaction processing
Example spin lock sequence:
Used for controlling access to shared resources (e.g. I/O, kernel state)
Applicable to multi-threaded/multi-processor (MT/MP) OS environments
spin_lock:
    lock dec [edi]        ; atomic decrement
    jns lock_acquired     ; exit if lock was 1
spin:
    pause                 ; otherwise loop until
    cmp [edi], 0          ;   lock is released
    jle spin
    jmp spin_lock         ; try to reacquire lock
lock_acquired:
    ret
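For readers more comfortable with C than assembly, the same sequence can be approximated with C11 atomics. This is a sketch under assumptions rather than a translation endorsed by the slide: `spin_unlock` does not appear in the original and is assumed to release the lock by storing 1, and `cpu_relax` is a hypothetical wrapper that issues PAUSE through the GCC/Clang builtin on x86 and does nothing elsewhere.

```c
#include <stdatomic.h>

/* Hypothetical wrapper for the PAUSE hint used in the assembly loop. */
static inline void cpu_relax(void)
{
#if defined(__x86_64__) || defined(__i386__)
    __builtin_ia32_pause();
#endif
}

/* Lock convention from the slide: 1 = free, 0 or negative = held. */
void spin_lock(atomic_int *lock)
{
    for (;;) {
        /* "lock dec [edi]" + "jns": atomically decrement and succeed
         * if the new value is not negative (the old value was 1). */
        int old = atomic_fetch_sub(lock, 1);
        if (old - 1 >= 0)
            return;

        /* "spin:" loop: wait with plain reads until the lock looks free,
         * then go back and retry the atomic decrement. */
        while (atomic_load(lock) <= 0)
            cpu_relax();
    }
}

/* Assumed release operation (not shown on the slide): mark the lock free. */
void spin_unlock(atomic_int *lock)
{
    atomic_store(lock, 1);
}
```

Waiting with plain loads between attempts keeps the locked decrement, the operation the slide says Penryn speeds up, off the bus while the lock is held by another thread.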
Penryn Family Virtualization Performance Improvements
[Figure: VM 0 and VM 1, each running a Guest OS and Apps, on top of the VM Monitor, which runs on the Physical Host Hardware]