PROCESSOR VIRTUALIZATION ON EMBEDDED LINUX SYSTEMS

Geoffrey Papaux, Daniel Gachet, and Wolfram Luithardt

Institute of Smart and Secured Systems (iSIS), University of Applied Sciences and Arts Western Switzerland // Fribourg, Boulevard de Pérolles 80, CH-1705 Fribourg, Switzerland

web: isis.eia-fr.ch

ABSTRACT

The advent of increasingly powerful low-power processors offers new opportunities for embedded systems. Instead of multiple small microprocessors, each devoted to a single task, a centralized multi-core processor can be used to run all applications while ensuring isolation and resource allocation for critical tasks. Seeing a growing interest in bringing virtualization to embedded systems, IP suppliers such as ARM have added hardware extensions to their architectures to provide native virtualization support.

KVM is an open source hypervisor integrated in the Linux Kernel that supports ARM. During this work, a prototype running KVM/ARM on the TI OMAP5432 uEVM board has been created, with a complete software stack that simplifies virtual machine management.

The benchmarks show an overhead of 1% to 3% for CPU-intensive applications, but also reveal a performance degradation of about 60% for memory-intensive applications such as matrix multiplication. Paravirtualized and emulated devices have also been compared: emulated devices perform extremely poorly, while paravirtualized devices achieve near-native performance.

1. INTRODUCTION

Over the past few years, the domain of embedded systems has been going through a revolution driven by increasingly powerful low-power microprocessors. Many everyday electrical devices now integrate a microprocessor, gaining new features and opportunities. A new generation of cheap, low-power, yet very powerful microprocessors will change the way industrial equipment is built. ARM, a well-known actor in this field, designs a wide range of low-power and highly efficient microprocessors to satisfy the increasing performance needs of modern embedded systems. This change has started a new trend: most equipment manufacturers are moving from proprietary operating systems towards open source systems, and Linux now plays a much more important role.

The new multi-core processors give new opportunities to companies developing embedded systems. Instead of using several microprocessors spread over the system board, each performing a specific task, a single multi-core processor could run multiple operating systems performing all the needed operations. This implies the ability to dedicate some cores to specific tasks, or to update applications running on a dedicated core without affecting the other ones. To satisfy system requirements, it could be necessary to run a different operating system on each core, for instance a standard Linux Kernel to handle non-time-critical tasks (e.g. human-machine interface, remote access interface, firewall, etc.) while another handles highly reactive tasks (e.g. routing, switching, hardware device management, etc.).

The need to run multiple systems on the same platform while ensuring resource isolation was addressed years ago by virtualization technologies. Server infrastructures have widely adopted virtualization to optimize resource usage and service availability. With the increasing performance of new microprocessor generations, this is becoming a reality on embedded systems as well. However, before wide adoption by the industry, some issues, such as portability and performance penalties, have to be addressed.

1.1 Objectives

The objective pursued by the present work [5] is two-fold: (i) implement and validate the virtualization architecture on a real hardware device, in this work the Texas Instruments OMAP5432 uEVM board featuring a dual-core Cortex-A15 processor; (ii) perform a first performance analysis of the virtualization mechanisms and of the hardware virtualization extensions that have recently been added to the ARM architecture and are supported by Linux and KVM.

1.2 Novelties

In order to increase the popularity of virtualization within embedded systems, the present work marks an additional milestone in porting Linux and KVM virtualization to ARM embedded systems, and extends the knowledge base on the performance of the virtualization extensions.

1.3 Structure of the paper

This paper is organized as follows: Section 2 presents the main application domains of virtualization in embedded Linux systems. Section 3 introduces the state of the art in virtualization mechanisms, from simple mechanisms up to complete virtualization. Section 4 explains the porting of KVM/ARM to the TI OMAP 5432 uEVM board. Section 5 presents the results of the performance measurements performed on several use cases. Finally, Section 6 summarizes the work and outlines the next steps.

2. APPLICATION DOMAINS

Bringing virtualization to embedded systems opens various application scenarios, given the widespread use of ARM processors. Besides consolidating multiple small microprocessors into a single multi-core processor, it allows embedded system vendors to transition smoothly to Linux while keeping legacy applications running in a virtual machine.


Virtualization within embedded systems will often provide better software architectures, increasing the overall system robustness, reliability and security, with a positive impact on the scalability and simplicity of the system. Live application updates could be another topic of interest. Apart from the server farm and desktop use cases, two major domains can be foreseen in the embedded field: heavy embedded systems and real-time embedded systems.

In heavy embedded systems, such as modern telecom equipment, several challenging aspects have to be mastered: software complexity, security, reliability, modularity and maintainability. As shown in Figure 1, virtualization provides a very powerful way to address those issues. Some virtual machines (guests) can be devoted to tasks requiring isolation and high security (e.g. remote management interfaces), sharing a set of processor cores, while time-critical tasks run on a guest with a dedicated core.

[Figure: a microprocessor with Cores 1 to 4 running a hypervisor that hosts guests labeled (Switching), (Management) and (Device), with Switch and Remote Management connections.]

Figure 1: Heavy Embedded System.

In real-time embedded systems, as in the field of process control systems, communication is becoming increasingly important. However, it is still crucial for such systems to preserve the performance and accuracy of all real-time aspects. As depicted in Figure 2, virtualization may enable running a real-time guest OS for process control alongside other operating systems for auxiliary tasks on the same device.

[Figure: a microprocessor with Core 1 and Core 2 running a hypervisor that hosts guests labeled (Device), (Process Control) and (Communication, Firewall), with Proprietary Hardware and Remote Management connections.]

Figure 2: Real-Time Embedded System.

Linux with KVM virtualization on low-cost, high-performance ARM microprocessors seems very promising for solving these recurring issues encountered in the embedded systems industry. The present work focuses mainly on the first scenario (heavy embedded systems); further work will be necessary to analyze the real-time use case. In particular, the additional level of indirection introduced by the virtualization layer (hypervisor) must be carefully analyzed to determine whether real-time requirements can still be fulfilled.

3. STATE OF THE ART

Isolating multiple applications running on the same hardware can be performed at several levels. The first option, which can be regarded as lightweight virtualization, is based on the concept of containers. The second family, offering stronger isolation but usually considered heavier, runs true virtual machines on top of a hypervisor.

3.1 Containers

A container is an isolated user space instance sharing a common kernel with the host; it can be seen as an enhanced version of chroot. This solution is also called operating system level virtualization, because the operating system kernel, and hence the hardware, is shared between all containers. FreeBSD has its own jail mechanism, while Linux has two major alternatives: OpenVZ and LXC (Linux Containers). OpenVZ requires a specific kernel, while LXC relies entirely on features provided by a stock Linux Kernel. LXC uses cgroups to limit resource usage (disk I/O, bandwidth, CPU time, memory, etc.) and kernel namespaces to isolate process IDs, the network stack, IPC, etc.
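As an illustration of the cgroups side of LXC, the following Python sketch computes the control-file writes that would cap a container's CPU time and memory. It assumes the cgroup v2 unified hierarchy mounted at /sys/fs/cgroup, which postdates the LXC versions discussed here (they used cgroup v1, with different file names); the function name container_limits is hypothetical, nothing is written to disk, and namespace isolation is not shown.

```python
from pathlib import PurePosixPath

# Assumption: cgroup v2 unified hierarchy mounted here (cgroup v1 uses other paths).
CGROUP_ROOT = PurePosixPath("/sys/fs/cgroup")
CPU_PERIOD_US = 100_000  # period for the cpu.max quota, in microseconds


def container_limits(name: str, cpu_share: float, memory_bytes: int, pid: int) -> dict[str, str]:
    """Map each cgroup v2 control file of a hypothetical container to its content.

    cpu_share is the number of CPUs' worth of time allowed (0.5 = half a core).
    A privileged manager would write the values in this insertion order, so
    that the limits are in place before the process joins the group.
    """
    if cpu_share <= 0 or memory_bytes <= 0:
        raise ValueError("limits must be positive")
    group = CGROUP_ROOT / name
    quota_us = round(cpu_share * CPU_PERIOD_US)
    return {
        str(group / "cpu.max"): f"{quota_us} {CPU_PERIOD_US}",  # "$QUOTA $PERIOD"
        str(group / "memory.max"): str(memory_bytes),           # hard limit in bytes
        str(group / "cgroup.procs"): str(pid),                   # attach the process last
    }
```

In practice, the cpu and memory controllers must also be enabled in the parent group's cgroup.subtree_control file before these control files appear.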

Because of their nature, containers are tightly coupled to Linux and cannot run other operating systems. But they offer an efficient solution to isolate multiple user space instances with little overhead, which can be deployed on every type of machine running a Linux Kernel. Containers open new opportunities for application development and deployment, as illustrated by the recent release of Docker, an open source project aiming to provide an infrastructure for packaging an application and its dependencies in a single container that can be shipped and deployed very easily.

Containers cannot offer and ensure full isolation because of the shared host Kernel. Indeed, if an application running in a container can gain root access on the host through a Kernel vulnerability, it has full access to the host machine and all running containers. Similarly, a Kernel crash caused in a container takes down the host and all running containers at the same time.

3.2 “True” virtualization

Pure virtualization solutions, namely full virtualization, paravirtualization and hardware-assisted virtualization, ensure much better isolation between the virtualized systems and the hypervisor. First, the hardware seen by the virtual machines is an abstraction presented by the hypervisor, so the guest system is completely decoupled from the real hardware. Second, the operating system running in each virtual machine is fully independent: no OS is shared between the host and the guests, so each machine can run a different one. In this architecture, the hypervisor plays a central role. It is responsible for provisioning and managing the guests, as well as for ensuring isolation between the guests and the host. This implies virtualizing the CPU, the memory and the I/O devices, using various techniques that have been experimented with and refined over the years.

For this work, we focus on hardware-assisted virtualization. The hardware provides a set of extensions that the hypervisor can use and configure in order to run unmodified guest systems. In the x86 world, these extensions are Intel VT-x and AMD-V. More extensions have been added afterwards, e.g. for memory virtualization (Intel EPT) and device virtualization (Intel VT-d). Paravirtualization nevertheless remains a valuable asset for high-performance device virtualization.


Several software solutions exist for x86 platforms. VMware is a well-established commercial vendor with a variety of products covering needs from desktop to server virtualization. The open source counterpart for desktop virtualization is VirtualBox. For server virtualization, there are two major open source actors. Xen, based on a microkernel architecture, supports both paravirtualization (Xen PV) and hardware-assisted virtualization (Xen HVM) for running unmodified guest systems. The second open source alternative, KVM (Kernel-based Virtual Machine), is fully integrated in the Linux Kernel and targets only hardware-assisted virtualization.

3.3 Virtualization on ARM

The idea of bringing virtualization technologies to ARM platforms is relatively new; therefore, the available solutions are under active development and are continuously improved. Since the ARM architecture is not strictly virtualizable (see Section 4.1), some attempts to bring virtualization support through paravirtualization or binary translation have emerged from companies and the open source community. Both the Xen and KVM communities have seen contributors working on paravirtualized solutions: Xen ARM PV and EmbeddedXEN [7] on the Xen side, and the KVM for ARM project [2] on the KVM side.

Recognizing the growing interest in virtualization on their platform, ARM started working on extensions to their architecture to add hardware virtualization support. When the ARM virtualization extensions were released, new KVM and Xen projects started to implement virtualization solutions taking advantage of these features: KVM/ARM [3] and Xen on ARM, respectively.

4. KVM/ARM ON TI OMAP 5432 UEVM

There were two open source alternatives for bringing virtualization to the TI OMAP 5432 uEVM board: KVM and Xen. Both were merged in the Linux Kernel at roughly the same time (Kernel version 3.8 for Xen and 3.9 for KVM). In this work, KVM was chosen over Xen for the following reasons:

1. Hypervisor architecture: KVM extends the Linux Kernel so that Linux itself becomes a hypervisor. This has the advantage of reusing mature Kernel components and benefiting directly from Linux improvements, which is valuable when dealing with a recent hardware board such as the TI OMAP 5432 uEVM;
2. Full virtualization support: for its ARM port, the Xen development team decided to drop the QEMU stack and rely heavily on paravirtualization. KVM/ARM stays in line with the original x86 KVM and supports running an unmodified guest without any paravirtualization layer, which is a great asset when it comes to running proprietary or legacy systems in a virtual machine;
3. Maturity: at the beginning of our work, the KVM/ARM port was considered more mature, because of its more advanced upstream integration in the Linux Kernel and the resources available on other ARM development boards successfully running KVM/ARM. On Xen, for instance, guest SMP support was only introduced later, in the 3.10 Linux Kernel.

4.1 ARM Virtualization Extensions

The traditional ARM architecture does not meet the requirements for being strictly virtualizable, as defined in [6]. For instance, some bits of the Current Program Status Register (CPSR) are only accessible from a privileged mode (e.g. bits [4:0], which define the current processor mode). An attempt to write these bits from user mode (MSR instruction) is simply ignored, without any notification or trap. Reading the CPSR from user mode is allowed (MRS instruction), but the privileged bits (such as the mode bits [4:0]) have an UNKNOWN value, as specified in the Architecture Reference Manual [1]. This is only one example of an ARM sensitive instruction; other examples include the coprocessor access instructions (MCR/MRC).
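The consequence for a trap-and-emulate hypervisor can be sketched with a small Python model. A guest kernel deprivileged to User mode that executes MSR to change mode neither succeeds nor traps, so the hypervisor never learns about the attempt. This is a simplification written for illustration (it ignores the E, A, I, F and T bits and all other CPSR semantics); the mode encodings and the set of User-writable bits follow our reading of the ARMv7 Architecture Reference Manual [1], and the function names are hypothetical.

```python
# ARMv7 CPSR.M[4:0] mode encodings.
MODES = {
    0b10000: "usr", 0b10001: "fiq", 0b10010: "irq", 0b10011: "svc",
    0b10110: "mon", 0b10111: "abt", 0b11010: "hyp", 0b11011: "und",
    0b11111: "sys",
}
MODE_MASK = 0x1F
# Bits a User-mode MSR may change: N, Z, C, V, Q (31..27) and GE[3:0] (19..16).
APSR_MASK = 0xF80F0000


def mode_name(cpsr: int) -> str:
    """Decode the processor mode from the low five bits of a CPSR value."""
    return MODES.get(cpsr & MODE_MASK, "reserved")


def user_mode_msr(cpsr: int, value: int) -> tuple[int, bool]:
    """Simplified model of 'MSR CPSR, <value>' executed in User mode.

    Only the APSR bits are updated. The mode bits keep their old value and
    no exception is raised, so the second element (trapped) is always False.
    """
    new_cpsr = (cpsr & ~APSR_MASK & 0xFFFFFFFF) | (value & APSR_MASK)
    return new_cpsr, False
```

For example, a guest kernel running in User mode (CPSR mode 0b10000) that tries to write the Supervisor encoding 0b10011 stays in User mode, and the returned trapped flag is False: this silent failure is exactly what the hardware virtualization extensions remove.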

ARM added a set of extensions [9] enabling the development of a full virtualization solution relying exclusively on hardware features, thus eliminating the need for paravirtualization or dynamic binary translation. These extensions are optional for the ARMv7-A architecture and integrated into the ARMv8-A architecture. They include a new execution mode for the hypervisor (HYP mode, illustrated in Figure 3), a two-stage address translation mechanism (see Figure 4), the Large Physical Address Extension (LPAE), a Generic Interrupt Controller and a Generic Timer. At the time of writing, two 32-bit processors include the virtualization extensions: the Cortex-A15 and the Cortex-A7. The two 64-bit ARM processors, the Cortex-A57 and the Cortex-A53, will also include them in the near future. The TI OMAP 5432 uEVM board is powered by two Cortex-A15 cores implementing the ARM virtualization extensions [8].

[Figure: each guest runs App1 and App2 in User mode (non-privileged) on top of a Guest OS in Supervisor mode (privileged); the Virtual Machine Monitor / Hypervisor runs below them in Hyp mode (more privileged).]

Figure 3: New HYP execution mode introduced with the ARM virtualization extensions.

[Figure: the virtual address spaces (VA) of Guest 1 and Guest 2 are mapped by a stage 1 translation, owned by the guest OS, to intermediate physical address spaces (IPA), which a stage 2 translation, owned by the hypervisor, maps to the physical address space (PA).]

Figure 4: Two-stage memory address translation [4].
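The two-stage translation of Figure 4 can be modeled in a few lines of Python. Each stage is reduced here to a flat dictionary from page number to frame number, whereas real LPAE tables are multi-level; the 4 KiB page size and all names are illustrative assumptions. The point of the sketch is that the guest controls only stage 1, while the hypervisor alone decides where each intermediate physical page really lives.

```python
PAGE_SHIFT = 12                  # assumption: 4 KiB pages at both stages
PAGE_MASK = (1 << PAGE_SHIFT) - 1


class TranslationFault(Exception):
    """Raised when a page is not mapped at one of the two stages."""


def translate(addr: int, table: dict[int, int], stage: int) -> int:
    """Translate addr through a flat page-number -> frame-number table."""
    try:
        frame = table[addr >> PAGE_SHIFT]
    except KeyError:
        raise TranslationFault(f"stage {stage} fault at {addr:#x}") from None
    return (frame << PAGE_SHIFT) | (addr & PAGE_MASK)


def guest_access(va: int, stage1: dict[int, int], stage2: dict[int, int]) -> int:
    """Resolve a guest virtual address to a physical address."""
    ipa = translate(va, stage1, stage=1)     # guest OS tables: VA -> IPA
    return translate(ipa, stage2, stage=2)   # hypervisor tables: IPA -> PA
```

With stage 1 mapping VA page 0x10 to IPA page 0x80, the address 0x10123 resolves to 0x200123 if the hypervisor maps IPA page 0x80 to PA page 0x200 for one guest, and to 0x300123 if it maps it to 0x300 for another: both guests see the same IPA layout while being isolated in physical memory. In KVM/ARM, a stage-2 fault traps to the hypervisor, which can then, for example, allocate memory or emulate an MMIO access.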


4.2 Software architecture of an embedded hypervisor

The aim of this work was to set up a fully featured software stack to turn the TI OMAP 5432 uEVM into an embedded hypervisor with easy remote management and configuration of virtual machines, as illustrated in Figure 5.

[Figure: OMAP 5432 uEVM hardware (Cortex-A15, memory, storage, I/O devices); host Linux Kernel 3.12.4 with kvm, running QEMU 1.6, Libvirtd 1.1.4 and sshd; virtual machines running Linux Kernel 3.8 and Linux Kernel 3.12 guests, each with App1 and App2, using virtio devices.]

Figure 5: Software stack for turning TI OMAP 5432 uEVM into an embedded hypervisor.

The first piece of software is the boot loader. Das U-Boot, the widely adopted open source boot loader, has been used; it is also the default boot loader provided by TI for this board. U-Boot starts a KVM/ARM-capable Linux Kernel, and a root file system is mounted with the user space applications for managing the hypervisor and starting virtual machines. First, a traditional SSH daemon is started to enable remote access to the hypervisor. Libvirt runs on the system for virtual machine management. The virtio paravirtualized interface provides high-performance devices instead of inefficient device emulation. On top of the hypervisor, three vanilla Linux Kernel versions have been tested as guests: 3.8 and 3.10 booted successfully with one virtual CPU, and 3.12 booted with two virtual CPUs.
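Libvirt expects each virtual machine as an XML domain definition. The Python sketch below generates a minimal one for a KVM/ARM guest with virtio disk and network devices, mirroring the bridged setup used for the network benchmarks. The element names are standard libvirt domain XML; the vexpress-a15 machine type, the kernel command line, the bridge name br0 and all file paths are assumptions for illustration, and whether a given libvirt release assigns the virtio-mmio addresses this machine type needs should be checked against its documentation.

```python
import xml.etree.ElementTree as ET


def arm_guest_xml(name: str, memory_mib: int, vcpus: int, kernel: str,
                  dtb: str, disk: str, bridge: str = "br0") -> str:
    """Build a minimal libvirt domain definition for a KVM/ARM guest."""
    dom = ET.Element("domain", type="kvm")
    ET.SubElement(dom, "name").text = name
    ET.SubElement(dom, "memory", unit="MiB").text = str(memory_mib)
    ET.SubElement(dom, "vcpu").text = str(vcpus)

    os_el = ET.SubElement(dom, "os")
    # Assumption: QEMU 1.6-era board model; later QEMU versions offer "virt".
    ET.SubElement(os_el, "type", arch="armv7l", machine="vexpress-a15").text = "hvm"
    ET.SubElement(os_el, "kernel").text = kernel  # guest zImage, booted directly
    ET.SubElement(os_el, "dtb").text = dtb
    ET.SubElement(os_el, "cmdline").text = "console=ttyAMA0 root=/dev/vda rw"

    devices = ET.SubElement(dom, "devices")
    disk_el = ET.SubElement(devices, "disk", type="file", device="disk")
    ET.SubElement(disk_el, "source", file=disk)
    ET.SubElement(disk_el, "target", dev="vda", bus="virtio")  # paravirtualized block device
    nic = ET.SubElement(devices, "interface", type="bridge")
    ET.SubElement(nic, "source", bridge=bridge)
    ET.SubElement(nic, "model", type="virtio")                  # paravirtualized NIC
    return ET.tostring(dom, encoding="unicode")
```

Once written to a file, such a definition can be registered with virsh define and started with virsh start.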

4.3 Issues

Porting KVM/ARM to the TI OMAP 5432 uEVM board was expected to be a relatively easy process, because the board was shipped with Linux and KVM is integrated in the Kernel. However, because KVM support on ARM is relatively new, the latest software versions are often required for it to work properly. This was true for most of the software stack when our work started in September 2013. The following issues had to be resolved:

4.3.1 U-Boot

Before the Linux Kernel is started, both CPUs have to enter the new HYP mode. This has to be done quite early in the boot process, while the processor is still in secure mode, and is therefore a task for U-Boot, which has to be modified accordingly. Moreover, another patch is required to get networking working in U-Boot on the OMAP board (e.g. for loading a Kernel through TFTP).
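Whether the modified U-Boot really hands both CPUs over in HYP mode can be checked from the Kernel log. The Python sketch below classifies dmesg output using the messages that, to our knowledge, the ARM Kernel prints at boot; the message strings are quoted from memory of arch/arm/kernel/setup.c and should be verified against the Kernel version in use, and the function name is hypothetical.

```python
# Messages printed by arch/arm/kernel/setup.c in 3.x kernels, as we recall
# them; check the exact wording against the Kernel in use.
HYP_OK = "All CPU(s) started in HYP mode"
HYP_MISMATCH = "started in wrong/inconsistent modes"
SVC_ONLY = "All CPU(s) started in SVC mode"


def boot_mode_from_log(log: str) -> str:
    """Classify how the boot loader handed the CPUs to the Kernel."""
    if HYP_OK in log:
        return "hyp"           # every CPU entered HYP mode: KVM/ARM can initialize
    if HYP_MISMATCH in log:
        return "inconsistent"  # e.g. only the boot CPU was switched to HYP mode
    if SVC_ONLY in log:
        return "svc"           # boot loader did not enter HYP mode at all
    return "unknown"
```

On the board, the text can be obtained by running dmesg (for instance through subprocess) and passed to boot_mode_from_log; anything other than "hyp" points at the boot loader rather than at KVM itself.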

4.3.2 Linux Kernel

First of all, a Kernel patched by TI is required. At the beginning of our work (September 2013), the version provided was 3.8, while KVM/ARM requires at least 3.9. Work was ongoing on a 3.12 version; it is now complete and TI has officially released a supported 3.12 Kernel for the OMAP board. In the meantime, however, we had to experiment with heavily patched Kernels. Even with the official 3.12 version, some patches are still necessary for KVM/ARM to work on the board. For instance, the LPAE feature has to be enabled in the Kernel for KVM/ARM support. Once LPAE is enabled, USB fails to initialize, with an error about the ehci-omap driver using a 32-bit DMA mask instead of a 64-bit one. Another issue is that the new vGIC is not fully defined in the OMAP5 device tree file. A further modification of this device tree file is required to define the hardware PMU, enabling the use of hardware performance counters with the Linux perf subsystem.
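The Kernel configuration requirements above can be checked mechanically. The sketch below parses a Kernel .config file and reports which KVM/ARM-related options are missing. The option names are those we recall from 3.x-era arch/arm Kconfig files; later Kernels may have renamed or dropped some of them (e.g. the vGIC and timer options), so the REQUIRED list is an assumption to adapt to the tree in use.

```python
# Assumed option names from 3.x-era arch/arm Kconfig; verify against your tree.
REQUIRED = {
    "CONFIG_ARM_LPAE": "y",
    "CONFIG_ARM_VIRT_EXT": "y",
    "CONFIG_VIRTUALIZATION": "y",
    "CONFIG_KVM": "y",
    "CONFIG_KVM_ARM_HOST": "y",
    "CONFIG_KVM_ARM_VGIC": "y",
    "CONFIG_KVM_ARM_TIMER": "y",
}
NOT_SET = " is not set"


def parse_config(text: str) -> dict[str, str]:
    """Parse .config lines into {option: value}; disabled options map to 'n'."""
    opts: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# CONFIG_") and line.endswith(NOT_SET):
            opts[line[2:-len(NOT_SET)]] = "n"      # "# CONFIG_FOO is not set"
        elif line.startswith("CONFIG_") and "=" in line:
            key, _, value = line.partition("=")    # "CONFIG_FOO=y"
            opts[key] = value
    return opts


def missing_kvm_options(text: str) -> list[str]:
    """Return the required options that are absent or not set to the expected value."""
    opts = parse_config(text)
    return sorted(k for k, v in REQUIRED.items() if opts.get(k, "n") != v)
```

Running missing_kvm_options on the contents of the Kernel's .config gives an empty list when every option listed for the "Hypervisor" configuration is enabled.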

4.3.3 QEMU and Libvirt

QEMU is the user space application that uses the KVM interface provided by the Linux Kernel to start virtual machines. We therefore need a version that supports the new KVM/ARM. Since support for paravirtualized virtio devices was introduced in QEMU 1.6 (as an experimental feature), we recommend using at least this version.
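QEMU drives KVM through ioctl calls on the /dev/kvm character device. The Python sketch below performs the first two steps of that sequence, checking the API version and creating an empty VM, which is a quick way to confirm that the host Kernel exposes KVM. The request numbers are derived from the _IO(0xAE, nr) definitions in <linux/kvm.h>, where the stable API version is 12; probe_kvm is a hypothetical helper name, and a Linux host is assumed.

```python
import fcntl
import os

KVMIO = 0xAE            # ioctl "type" byte used by <linux/kvm.h>
KVM_API_VERSION = 12    # stable KVM API version


def _io(type_: int, nr: int) -> int:
    """Equivalent of the Kernel's _IO() macro: no direction, no payload."""
    return (type_ << 8) | nr


KVM_GET_API_VERSION = _io(KVMIO, 0x00)
KVM_CREATE_VM = _io(KVMIO, 0x01)


def probe_kvm(path: str = "/dev/kvm"):
    """Return None if KVM is unavailable, else a small status dict."""
    try:
        kvm_fd = os.open(path, os.O_RDWR | os.O_CLOEXEC)
    except OSError:
        return None  # no KVM support, or insufficient permissions
    try:
        version = fcntl.ioctl(kvm_fd, KVM_GET_API_VERSION)
        if version != KVM_API_VERSION:
            return {"api_version": version, "vm_created": False}
        vm_fd = fcntl.ioctl(kvm_fd, KVM_CREATE_VM, 0)  # 0 = default machine type
        os.close(vm_fd)  # the ioctl returns a new file descriptor for the VM
        return {"api_version": version, "vm_created": True}
    finally:
        os.close(kvm_fd)
```

QEMU then goes on to create vCPUs (KVM_CREATE_VCPU), map each vCPU's shared kvm_run structure, and loop on KVM_RUN, handling exits such as MMIO accesses to emulated devices.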

Libvirt was tightly coupled to its original target platform, x86, and versions prior to 1.1.3 still rely on the presence of a PCI bus. This generates an error on the ARM platform, where no PCI bus is provided. Therefore, at least version 1.1.3 is needed for KVM/ARM.
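The minimum versions identified above can be encoded in a simple check. The sketch below extracts the first dotted version number found in a tool's --version output and compares it with the minimums given in this section. Since the exact wording of that output differs between releases, the parser deliberately only looks for the first number; the names MINIMUM and meets_minimum are hypothetical.

```python
import re

# Minimum versions for KVM/ARM, as stated in this section.
MINIMUM = {"qemu": (1, 6, 0), "libvirt": (1, 1, 3)}


def parse_version(text: str) -> tuple[int, int, int]:
    """Extract the first 'X.Y' or 'X.Y.Z' number from a version string."""
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"no version number in {text!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def meets_minimum(component: str, version_output: str) -> bool:
    """True if the reported version is at least the required one."""
    return parse_version(version_output) >= MINIMUM[component]
```

Tuples compare element by element in Python, so (1, 1, 4) >= (1, 1, 3) and (1, 5, 3) < (1, 6, 0) behave as version comparisons without any extra code.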

4.4 Yocto layer meta-kvm-arm

TI is involved in the Yocto Project and officially releases layers for its platforms (meta-ti layer). TI has even been accepted as a “Yocto Project Participant”. Thanks to the modular approach and flexibility offered by Yocto, the meta-ti BSP layer can be combined with any other layer to build a custom Linux distribution that meets specific needs. In our case, we combined it with the meta-virtualization layer, which packages additional software required to build a fully featured hypervisor. However, since some patched or newer software versions were needed, a custom layer has been created: meta-kvm-arm. It customizes and overrides some recipes through .bbappend files. This layer provides an omap5-evm-kvm machine along with a new image, kvm-image-extended, including all pieces of software presented in this section. Selecting this machine and this image generates everything needed to run KVM/ARM on the OMAP board (Kernel image, DTB, U-Boot, root file system, etc.).

5. PERFORMANCE MEASUREMENTS

Several benchmarks have been run on our setup to measure the overhead introduced by virtualization on ARM. Three configurations are compared, all built with the same Yocto build system and all based on a 3.12.4 Kernel:

• Native: all virtualization-related features are disabled;
• Hypervisor: LPAE, vGIC and KVM virtualization enabled;
• Virtual machine: guest kernel running on top of QEMU and KVM/ARM, accessed over SSH to run the tests.


5.1 Raw computational performance

The aim of this benchmark series is to compare purely computational tasks. NBench performs various operations such as sorting and bitfield manipulation. The results in Figure 6 show almost no performance impact for such tasks (1% to 3%).
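Because the NBench indices are normalized to the native configuration, the overhead of each test is simply one minus its index. The short Python computation below applies this to the virtual machine values of Figure 6 and recovers the 1% to 3% range quoted above; the dictionary and function names are illustrative.

```python
# Virtual machine indices from Figure 6 (native = 1.00 for every test).
VM_INDEX = {
    "numeric sort": 0.97,
    "string sort": 0.98,
    "bitfield": 0.98,
    "fp emulation": 0.98,
    "fourier": 0.99,
}


def overhead_percent(index: float) -> int:
    """Overhead relative to native, in whole percent, for a normalized index."""
    return round((1.0 - index) * 100)


overheads = {test: overhead_percent(idx) for test, idx in VM_INDEX.items()}
```

The same function applied to the Hypervisor column gives 1% for numeric sort and 0% elsewhere, i.e. simply enabling LPAE, vGIC and KVM on the host costs almost nothing for these workloads.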

NBench (normalized performance index, higher is better)

Test            Native   Hypervisor   Virtual Machine
Numeric sort    1.00     0.99         0.97
String sort     1.00     1.00         0.98
Bitfield        1.00     1.00         0.98
FP emulation    1.00     1.00         0.98
Fourier         1.00     1.00         0.99

Figure 6: NBench benchmark results.

5.2 I/O Devices

The next benchmark compares virtualized devices to real hardware. We measured network bandwidth with iperf, executed as a server and as a client (TCP) with default options. Figure 7 shows the poor results of emulation. On the other hand, the virtio paravirtualized interface shows near-native performance. When started as a client on the OMAP board, iperf reports a smaller bandwidth, probably because of the additional CPU load needed to generate network traffic.

[Figure: Network Bandwidth (VM with bridged interface); iperf TCP bandwidth in Mb/s as server and as client for Native, Hypervisor, VM Virtio and VM Emulation. Values shown: VM Virtio 93.26 (server) and 79.52 (client); VM Emulation 2.26 and 2.67; Native and Hypervisor between 94.12 and 94.78.]

Figure 7: Network bandwidth benchmark results comparing hardware, emulated and paravirtualized network devices.

5.3 Memory performance

After running other advanced benchmark series [5], we noticed a substantial performance degradation of about 60% for memory-intensive applications such as matrix multiplication. A simpler operation, reading a 1-D array, shows that random memory accesses are heavily impacted, as illustrated in Figure 8. Small arrays show no difference, but a performance impact appears for arrays occupying 4 MB of memory and grows with larger arrays. Sequential accesses, on the other hand, do not seem to be impacted at all. This may be explained by a caching issue, either for the array data or in other caches such as the TLB, due to the two-stage address translation. We also collected various hardware events with the Linux perf tool, but could not find an explanation for this performance issue.
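One hypothesis consistent with these observations, though not confirmed by the perf measurements, is the higher cost of a TLB miss under two-stage translation. The Python sketch below counts worst-case memory references for a page-table walk, assuming multi-level tables at both stages and no walk caches; the level counts are illustrative parameters, and tlb_reach_bytes takes the TLB size as an input rather than assuming a Cortex-A15 value.

```python
def nested_walk_refs(stage1_levels: int, stage2_levels: int) -> int:
    """Worst-case memory references for one TLB miss under two-stage translation.

    Each stage-1 descriptor lives at an IPA, so fetching it costs a full
    stage-2 walk plus the descriptor read itself; the final IPA produced by
    stage 1 then needs one more stage-2 walk. Walk caches are ignored.
    """
    return stage1_levels * (stage2_levels + 1) + stage2_levels


def tlb_reach_bytes(entries: int, page_size: int = 4096) -> int:
    """Memory covered by a TLB holding `entries` page mappings."""
    return entries * page_size


NATIVE_REFS = 3                        # assumption: 3-level LPAE walk on the host
NESTED_REFS = nested_walk_refs(3, 3)   # assumption: 3 levels at each stage
```

With three levels per stage, a TLB miss may cost up to 15 memory references in the guest instead of 3 natively. Random accesses over an array larger than the TLB reach miss far more often than sequential ones, which stay on the same page for many consecutive reads; this would be compatible with the pattern in Figure 8, but it remains a hypothesis.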

[Figure: two panels, "1D Array Random Read" (random index computed in main loop) and "1D Array Sequential Read" (main loop iterator as sequential index); x-axis: array memory usage; y-axis: time [s]; series: hypervisor, vm.]

Figure 8: Benchmark result showing the performance issue observed for random memory read.

6. CONCLUSION

The present work has demonstrated that building a working prototype running KVM/ARM on the TI OMAP5432 uEVM board, with a rich software stack providing easier virtual machine management, is feasible today and delivers very good performance. The benchmarking results show a very low overhead for purely computational tasks. However, a performance issue has been detected for memory-intensive applications such as matrix multiplication, which needs further investigation. Paravirtualized devices have shown near-native performance. This work is only a first step; the prototype opens various perspectives for future work, such as interrupt latency measurement or access to specific hardware devices from the guests. Extensive tests shall be performed to determine to what extent a virtualized architecture can fulfill real-time requirements. ARM big.LITTLE processors and the 64-bit ARM versions will also open new application domains. The Yocto layer created can easily be extended to support new ARM boards in the future.

ACKNOWLEDGEMENT

Many thanks go to the HES-SO//Fribourg (University of Applied Sciences and Arts Western Switzerland // Fribourg) and the iSIS (Institute of Smart and Secured Systems) competence network for their support.

REFERENCES

[1] ARM. ARM Architecture Reference Manual, ARMv7-A and ARMv7-R edition. 2012.

[2] C. Dall and J. Nieh. KVM for ARM. 2010.

[3] C. Dall and J. Nieh. KVM/ARM: Experiences Building the Linux ARM Hypervisor. 2013.

[4] J. Goodacre. Hardware accelerated Virtualization in the ARM Cortex Processors, 2010.

[5] G. Papaux. Virtualization on ARM processors under Linux. Technical report, HES-SO//Master, 2014.

[6] G. Popek and R. Goldberg. Formal requirements for virtualizable third generation architectures. 1974.

[7] D. Rossier. EmbeddedXEN: A Revisited Architecture of the XEN hypervisor. June 2012.

[8] TI. Technical Reference Manual OMAP543x. 2013.

[9] P. Varanasi and G. Heiser. Hardware-supported
