First generation hardware support - Nesting Virtual Machines in Virtualization Test Frameworks

duced. In Xen, well known for its use of paravirtualization, the real device drivers reside in a privileged guest known as domain 0. A description of Xen can be found in subsection 3.6.3. However, Xen is not the only hypervisor that uses paravirtualization for I/O. VMware has a paravirtualized I/O device driver, vmxnet, that shares data structures with the hypervisor [10]. “A Performance Comparison of Hypervisors” states that, by using the paravirtualized vmxnet network driver, network I/O-intensive datacenter applications can now run with acceptable network performance [24].

3.2.3 Memory management

Paravirtual interfaces can be used by both the hypervisor and the guest to reduce hypervisor complexity and overhead in virtualizing x86 paging [19]. With a paravirtualized memory management unit, the guest operating system's page tables are registered directly with the MMU [22]. To avoid the overhead and complexity associated with shadow page tables, the guest operating system has read-only access to its page tables. A page table update is passed to Xen via a hypercall and validated before being applied. Guest operating systems can queue page table updates locally and apply the entire batch with a single hypercall. This minimizes the number of hypercalls needed for memory management.
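The batching scheme can be sketched in C. The guest queues page table updates locally, and a single flush passes the whole batch to the hypervisor, which validates each entry before writing it.

`hypercall_update_pt` is a hypothetical stand-in that counts transitions and applies a toy ownership check. It is loosely modeled on the idea behind Xen's batched MMU-update hypercall, not on its actual interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_PTES        16u   /* toy page table with 16 entries */
#define BATCH_MAX        8u   /* hypothetical queue capacity */
#define GUEST_MAX_FRAME 1024u /* hypothetical: guest owns frames 0..1023 */

/* One queued update: which entry to change and its new value. */
struct pt_update {
    uint32_t index;
    uint64_t value;
};

/* Owned by the "hypervisor": the guest may read it but not write it. */
static uint64_t page_table[NUM_PTES];
static unsigned hypercall_count;

/* Stand-in for a validate-and-apply hypercall. Toy validation: the index
 * must be in range and the frame number (value >> 12) must belong to the
 * guest. Returns the number of updates actually applied. */
static size_t hypercall_update_pt(const struct pt_update *reqs, size_t n)
{
    size_t applied = 0;
    hypercall_count++; /* one guest -> hypervisor transition per batch */
    for (size_t i = 0; i < n; i++) {
        if (reqs[i].index >= NUM_PTES || (reqs[i].value >> 12) >= GUEST_MAX_FRAME)
            continue; /* rejected by validation */
        page_table[reqs[i].index] = reqs[i].value;
        applied++;
    }
    return applied;
}

/* Guest-side queue of pending updates. */
struct pt_queue {
    struct pt_update pending[BATCH_MAX];
    size_t n;
};

/* Send all pending updates in a single hypercall. */
static size_t queue_flush(struct pt_queue *q)
{
    size_t applied = q->n ? hypercall_update_pt(q->pending, q->n) : 0;
    q->n = 0;
    return applied;
}

/* Queue one update locally; flush first if the queue is full. */
static void queue_update(struct pt_queue *q, uint32_t index, uint64_t value)
{
    if (q->n == BATCH_MAX)
        queue_flush(q);
    q->pending[q->n++] = (struct pt_update){ index, value };
}
```

Queuing several updates and calling `queue_flush` once costs a single simulated hypercall. Any entry that fails validation is skipped rather than written.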

3.3 First generation hardware support

In the meantime, processor vendors noticed that virtualization was becoming increasingly popular. They addressed the virtualization problem on the x86 architecture by introducing hardware-assisted virtualization. Hardware support for processor virtualization enables simple, robust and reliable hypervisor software [25]. It eliminates the need for the hypervisor to monitor, trap and emulate certain instructions on behalf of the guest OS [26]. Intel and AMD provide these hardware extensions as Intel VT-x and AMD SVM, respectively [11, 27, 28]. The first generation of hardware support introduces a data structure for virtualization, together with new instructions and a new execution flow.

In AMD SVM, the data structure is called the virtual machine control block (VMCB). The VMCB combines control state with the guest's processor state, and each guest has its own VMCB.

The control state has two parts:
- a list of the instructions or events in the guest that must be intercepted;
- various control bits that specify the execution environment of the guest or indicate special actions to be taken before guest code runs.

The VMCB is accessed by reading and writing memory at its physical address. The execution environment of the guest is referred to as guest mode; that of the hypervisor is called host mode.

The new VMRUN instruction transfers control from host mode to guest mode. It saves the current processor state and loads the corresponding guest state from the VMCB. The processor then runs guest code until an intercept event occurs. This results in a #VMEXIT. At that point the processor writes the current guest state back to the VMCB and resumes host execution at the instruction following the VMRUN, so the processor is executing the hypervisor again. The hypervisor can retrieve information from the VMCB to handle the exit. Once it has emulated the effect of the exiting operation, the hypervisor can execute VMRUN again to return to guest mode.
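The VMRUN/#VMEXIT cycle can be illustrated with a user-space simulation in C. `struct vmcb`, `sim_vmrun` and the toy instruction set are hypothetical simplifications. The real VMCB has an AMD-defined layout, and the real VMRUN is a privileged instruction.

The sketch only mirrors the control flow:
1. Run the guest until an intercepted event occurs.
2. Record the exit code in the VMCB.
3. Let the hypervisor emulate the operation and update the saved guest state.
4. Run the guest again.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, heavily simplified VMCB. */
enum exit_code { EXIT_NONE, EXIT_CPUID, EXIT_HLT };

struct vmcb {
    /* control area */
    int intercept_cpuid;     /* intercept bit: exit when the guest runs CPUID */
    enum exit_code exitcode; /* why the last #VMEXIT happened */
    /* state-save area */
    uint64_t rip;            /* index of the next toy guest instruction */
    uint64_t rax;
};

/* Toy guest "program": prog[rip] is the instruction at that position. */
enum guest_insn { G_NOP, G_CPUID, G_HLT };

/* Stand-in for VMRUN: run guest code until an intercepted instruction,
 * then record the exit code (the #VMEXIT). This toy always treats HLT
 * as intercepted because it marks the end of the program. */
static void sim_vmrun(struct vmcb *vmcb, const enum guest_insn *prog)
{
    for (;;) {
        enum guest_insn insn = prog[vmcb->rip];
        if (insn == G_HLT) {
            vmcb->exitcode = EXIT_HLT;
            return;
        }
        if (insn == G_CPUID && vmcb->intercept_cpuid) {
            vmcb->exitcode = EXIT_CPUID;
            return;
        }
        vmcb->rip++; /* non-intercepted instructions run without an exit */
    }
}

/* Hypervisor loop: VMRUN, inspect the exit, emulate, VMRUN again.
 * Returns the number of exits handled. */
static int run_guest(struct vmcb *vmcb, const enum guest_insn *prog)
{
    int exits = 0;
    for (;;) {
        sim_vmrun(vmcb, prog);
        exits++;
        switch (vmcb->exitcode) {
        case EXIT_CPUID:
            vmcb->rax = 0x1234; /* emulate with a hypothetical result */
            vmcb->rip++;        /* skip the emulated instruction */
            break;              /* then re-enter the guest */
        case EXIT_HLT:
            return exits;       /* guest halted: stop running it */
        default:
            assert(0 && "unexpected exit code");
            return exits;
        }
    }
}
```

With CPUID interception enabled, each CPUID in the toy program produces one exit that the hypervisor emulates. With it disabled, the guest runs until HLT without intervention. This is exactly what the intercept list in the VMCB controls.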

Intel's implementation of hardware support is similar to AMD's, although the terminology differs. Intel uses a virtual machine control structure (VMCS) instead of a VMCB. A VMCS is manipulated with the new instructions VMCLEAR, VMPTRLD, VMREAD and VMWRITE, which clear, load, read from and write to a VMCS, respectively. Instead of host and guest mode, the hypervisor runs in “VMX root operation” and the guest in “VMX non-root operation”. Software enters VMX operation by executing the VMXON instruction. From then on, the hypervisor can use a VMEntry to transfer control to one of its guests. Two instructions trigger a VMEntry: VMLAUNCH, used for the first VMEntry with a given VMCS, and VMRESUME, used for subsequent ones. As with AMD SVM, the hypervisor regains control through VMExits. Eventually, the hypervisor can leave VMX operation with the VMXOFF instruction.

Figure 3.2: Execution flow using virtualization based on Intel VT-x.

Figure 3.2 shows the execution flow of a guest virtualized with hardware support. The VMXON instruction starts VMX operation and VMXOFF stops it. The guest is started with a VMEntry, which loads the guest state from the guest's VMCS into the processor. The hypervisor regains control through a VMExit when the guest tries to execute a privileged instruction. After the hypervisor has intervened, a VMEntry transfers control back to the guest. Finally, the guest can shut down, and control is handed back to the hypervisor with a VMExit.
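The ordering rules of the VMX instructions can be captured in a small state model. The C sketch below is a hypothetical model, not code that executes VMX instructions. It tracks three things: whether VMX operation is on, whether a VMCS is loaded, and the launch state of that VMCS. It enforces that VMLAUNCH is only valid for a VMCS in the clear state (as left by VMCLEAR), while VMRESUME requires a VMCS that has already been launched.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of VMX instruction ordering; no real VMX instructions run. */
enum launch_state { VMCS_CLEAR, VMCS_LAUNCHED };

struct vcpu_model {
    bool vmx_on;             /* set by VMXON, cleared by VMXOFF */
    bool in_guest;           /* true between a VMEntry and the next VMExit */
    bool vmcs_loaded;        /* set by VMPTRLD */
    enum launch_state state; /* reset to clear by VMCLEAR */
};

static bool vmxon(struct vcpu_model *c)
{
    if (c->vmx_on)
        return false;
    c->vmx_on = true;
    return true;
}

/* VMCLEAR marks the VMCS clear and, if it was current, no longer current. */
static bool vmclear(struct vcpu_model *c)
{
    if (!c->vmx_on || c->in_guest)
        return false;
    c->state = VMCS_CLEAR;
    c->vmcs_loaded = false;
    return true;
}

/* VMPTRLD makes the VMCS the current one. */
static bool vmptrld(struct vcpu_model *c)
{
    if (!c->vmx_on || c->in_guest)
        return false;
    c->vmcs_loaded = true;
    return true;
}

/* VMEntry: VMLAUNCH (resume == false) needs a clear VMCS,
 * VMRESUME (resume == true) a launched one. */
static bool vm_entry(struct vcpu_model *c, bool resume)
{
    if (!c->vmx_on || c->in_guest || !c->vmcs_loaded)
        return false;
    if (c->state != (resume ? VMCS_LAUNCHED : VMCS_CLEAR))
        return false; /* real hardware reports a VMfail here */
    c->state = VMCS_LAUNCHED;
    c->in_guest = true;
    return true;
}

/* VMExit, e.g. after the guest executed a privileged instruction. */
static void vm_exit(struct vcpu_model *c)
{
    assert(c->in_guest);
    c->in_guest = false;
}

static bool vmxoff(struct vcpu_model *c)
{
    if (!c->vmx_on || c->in_guest)
        return false;
    c->vmx_on = false;
    return true;
}
```

A typical sequence follows the flow of the figure:
1. Call `vmxon`, `vmclear` and `vmptrld`.
2. Call `vm_entry(c, false)` for the first entry.
3. Alternate `vm_exit` and `vm_entry(c, true)` until the hypervisor finishes.
4. Call `vmxoff`.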

The basic idea behind first generation hardware support is to fix the problem that the x86 architecture cannot be virtualized with classical trap-and-emulate. The VMExit forces a transition from guest to hypervisor, following a philosophy of trapping all exceptions and privileged instructions. Nevertheless, each transition between the hypervisor and a virtual machine costs a fixed number of processor cycles.

When the hypervisor has to handle a complex operation, this overhead is relatively low compared with the work performed. For a simple operation, however, the cost of switching from guest to hypervisor and back is relatively high. Process creation, context switches and small page table updates are all simple operations that therefore incur a large relative overhead. In these cases, software solutions such as binary translation and paravirtualization perform better than hardware-supported virtualization.
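The argument about fixed transition costs reduces to a simple ratio. Suppose a round trip between guest and hypervisor costs t cycles and the operation itself needs w cycles. The fraction of time lost to the transition is then t / (t + w). The small C helper below computes this ratio; the cycle counts in the example that follows are hypothetical, not measurements.

```c
#include <assert.h>

/* Fraction of an operation's total cycles spent on the
 * guest -> hypervisor -> guest round trip: t / (t + w). */
static double transition_overhead(double transition_cycles, double work_cycles)
{
    assert(transition_cycles >= 0.0 && work_cycles >= 0.0);
    assert(transition_cycles + work_cycles > 0.0);
    return transition_cycles / (transition_cycles + work_cycles);
}
```

For instance, take t = 1000 cycles. An operation that needs 100 cycles of real work spends about 91% of its time in the transition (1000/1100). One that needs 99 000 cycles spends only 1% (1000/100000).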

The overhead can be reduced by lowering the number of processor cycles required for a transition between guest and hypervisor. The exact number of extra processor cycles depends on the processor architecture. For Intel, the format and layout of the VMCS in memory is not architecturally defined. This allows implementation-specific optimizations that improve performance in VMX non-root operation and reduce the latency of VMEntry and VMExit [29]. Intel and AMD are reducing these latencies with each new processor generation; figure 3.3 shows this for Intel.
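Because the VMCS layout is implementation-specific, software accesses fields through VMREAD and VMWRITE with a field identifier rather than by computing memory offsets. The following C sketch mimics that indirection.

The field names, the slot mapping and the `vmcs_read`/`vmcs_write` helpers are hypothetical. Real VMCS field encodings are defined in Intel's manual. The real instructions also operate on the current VMCS selected with VMPTRLD, whereas this sketch passes it as an explicit pointer.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical field identifiers: only these are visible to software. */
enum vmcs_field { F_GUEST_RIP, F_GUEST_RSP, F_EXIT_REASON, F_COUNT };

/* Implementation-private storage; a different processor model could
 * reorder, cache or compress these slots without changing any caller. */
struct vmcs_region {
    uint64_t slots[F_COUNT];
};

/* Private field-to-slot mapping, deliberately scrambled to show that
 * callers never depend on the in-memory layout. */
static const unsigned slot_of[F_COUNT] = { 2, 0, 1 };

/* Stand-in for VMWRITE. */
static void vmcs_write(struct vmcs_region *v, enum vmcs_field f, uint64_t val)
{
    assert(f < F_COUNT);
    v->slots[slot_of[f]] = val;
}

/* Stand-in for VMREAD. */
static uint64_t vmcs_read(const struct vmcs_region *v, enum vmcs_field f)
{
    assert(f < F_COUNT);
    return v->slots[slot_of[f]];
}
```

Callers only see `vmcs_read` and `vmcs_write`. The `slot_of` table and the storage layout could therefore change between processor models without affecting hypervisor code, which is the freedom the text describes.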

Figure 3.3: Latency reductions by CPU implementation [30].

System calls are an example of a complex operation with a low relative transition overhead. With hardware-supported virtualization, a system call does not automatically transfer control from the guest to the hypervisor. Hypervisor intervention is only needed when the system call executes critical instructions. Even then, the relative overhead is low, because a system call is fairly complex and already requires many processor cycles.

First generation hardware support includes neither I/O virtualization nor memory management unit (MMU) virtualization. Hypervisors that use the first generation hardware extensions therefore need a software technique to virtualize I/O devices and the MMU. For the MMU, this can be done with shadow page tables or by paravirtualizing the MMU.
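Shadow page tables can be illustrated with a single-level toy model in C. Three tables are involved:
- the guest's own mapping from guest-virtual pages to guest-physical frames;
- the hypervisor's mapping from guest-physical frames to host-physical frames;
- the shadow table, which is the composition of the two and is the one the hardware MMU would actually use.

The sketch assumes that the hypervisor traps every guest page table write (for instance by write-protecting those pages) and then resynchronizes the affected shadow entry. All names and sizes are hypothetical. Real implementations also handle multi-level tables, permissions and TLB flushes.

```c
#include <assert.h>
#include <stdint.h>

#define PAGES   8u         /* toy address spaces with 8 pages each */
#define INVALID UINT32_MAX /* marks an unmapped entry */

/* Guest-maintained: guest-virtual page -> guest-physical frame. */
static uint32_t guest_pt[PAGES];
/* Hypervisor-maintained: guest-physical frame -> host-physical frame. */
static uint32_t p2m[PAGES];
/* Hypervisor-maintained shadow: guest-virtual page -> host-physical
 * frame. This is the table the hardware MMU walks while the guest runs. */
static uint32_t shadow_pt[PAGES];

/* Recompute one shadow entry from the guest entry and the p2m map. */
static void shadow_sync(uint32_t vpage)
{
    assert(vpage < PAGES);
    uint32_t gfn = guest_pt[vpage];
    shadow_pt[vpage] = (gfn == INVALID || gfn >= PAGES) ? INVALID : p2m[gfn];
}

/* Trap handler for a guest page table write: apply it, then resync. */
static void on_guest_pt_write(uint32_t vpage, uint32_t gfn)
{
    assert(vpage < PAGES);
    guest_pt[vpage] = gfn;
    shadow_sync(vpage);
}

/* Start with empty tables and a hypothetical host placement of frames. */
static void shadow_init(void)
{
    for (uint32_t i = 0; i < PAGES; i++) {
        guest_pt[i] = INVALID;
        p2m[i] = PAGES - 1 - i;
        shadow_pt[i] = INVALID;
    }
}
```

In this model, every guest page table write costs a trap into the hypervisor. This is why small page table updates are expensive when only first generation hardware support is available.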
