RCL: Software Prototype



D3.2.1

Scheduled delivery 30.06.2014

Actual delivery 30.06.2014

Version 1.0

Responsible Partner IBM

Dissemination Level

PU Public

Revision History

Date        Editor          Status  Version  Changes
18.05.2014  Ronen Kat       Draft   0.1      Outline
25.05.2014  Yossi Kuperman  Draft   0.2      Added section 3
05.06.2014  Ronen Kat       Draft   0.3      Merge IBM and Red Hat updates
15.06.2014  Ronen Kat       Draft   0.4      Added introduction and finishing
22.06.2014  Ronen Kat       Draft   0.5      Input from internal reviewers
23.06.2014  Dave Gilbert    Draft   0.6      Address comments from reviewers on Red Hat sections
30.06.2014  Ronen Kat       Final   1.0      Finishing

Contributors

Ronen Kat – IBM, Yossi Kuperman – IBM

Dave Gilbert – Red Hat, Andrea Arcangeli – Red Hat

Internal Reviewers

Vasileios Anagnostopoulos, Luis Tomás

Copyright

This report is © by IBM and other members of the ORBIT Consortium 2013-2016. Its duplication is allowed only in the integral form for anyone's personal use and for the purposes of research or education.

Acknowledgements

The research leading to these results has received funding from the EC Seventh Framework Programme FP7/2007-2013 under grant agreement n° 609828.


Glossary of Acronyms

Acronym  Definition
D        Deliverable
DoW      Description of Work
EC       European Commission
PM       Project Manager
PO       Project Officer
WP       Work Package
MTU      Maximum Transmission Unit


Table of Contents

1. Executive Summary
2. Introduction
3. I/O Consolidation
4. Memory Consolidation and Externalization
5. Cloud Management


List of Figures

Figure 1: Split I/O deployment
Figure 2: Memory externalization nested test-bed

List of Tables

Table 1: I/O consolidation: components internal names
Table 2: I/O consolidation: status of components
Table 3: I/O consolidation: status of API
Table 4: Memory consolidation and externalization: status of components

1. Executive Summary

This document summarizes the prototype development work done as part of WP3. For this interval, the first nine months of the project, we report on Task 3.1 (T3.1) and Task 3.2 (T3.2) only, as Task 3.3 (T3.3) has not yet started.

Development of the I/O consolidation layer (T3.1) and of the memory consolidation and externalization layer (T3.2) is well under way, and development completeness is well beyond the 25% required for this deliverable by the quality plan in D1.2.1 [1].

The next prototype deliverable, D3.2.2, is scheduled for June 2015 (month 21 of the project).

2. Introduction

This first software prototype delivery realizes the design and specification outlined in deliverable D3.1.1 [2].

The work and status of developing the I/O consolidation layer is described in Chapter 3, the work and status of memory consolidation and externalization layer is described in Chapter 4, and the work and status of the cloud management is described in Chapter 5.

2.1. Progress toward feature and API completeness

Development is well under way, and feature completeness is close to (or even above) 50% of the planned features. API development has reached about 25% for the I/O consolidation layer and more than 50% for the memory consolidation and externalization layer.

Work on cloud management has not started yet; it is scheduled to start in October 2014 per the project plan.

3. I/O Consolidation

Our objective is to externalize I/O resources and consolidate all of them in a single dedicated appliance. We do so by detaching the back-end logic responsible for handling I/O from the hypervisor software stack and moving this logic from the compute servers that host the VMs to a remote server dedicated to I/O virtualization.

The prototype comprises the components described in deliverable D3.1.1. The internal names of the components are listed in Table 1, and their status is listed in Table 2.

3.1. Status of prototype

The prototype implementation is capable of creating block and network virtual devices. The exposed virtual devices are partially functional and not yet ready to be deployed.

It is possible to communicate with the load generator over a virtual network device, and to read or write a block of data on a remote block device that resides in the memory of the I/O hypervisor.

We deployed the Split I/O prototype on three machines in our lab (depicted in Figure 1, from right to left): the load generator, the I/O hypervisor (back-end), and the host that runs the VM (front-end). Each machine is an IBM System x3550 M4 with two sockets, each holding an 8-core Intel Xeon E2660 CPU running at 2.2 GHz. Each machine is further equipped with 56 GB of memory and an Intel x520 dual-port 10 Gbps SR-IOV NIC.

The prototype is implemented for the Linux 3.9 kernel as a set of new kernel modules. Each module corresponds to a component described in deliverable D3.1.1 (see Table 1).

Modules 3.1.1, 3.1.21, 3.1.22, and 3.1.23 are installed on the I/O hypervisor machine, and modules 3.1.1, 3.1.11, 3.1.12, and 3.1.13 are installed on the VM. Module 3.1.1 is therefore installed on both the VM and the I/O hypervisor.

Component 3.1.1 (Ethernet transport) is used by both the front-end and the back-end. Its main purpose is to transport data between the two ends. It does so efficiently by operating at layer 2 (the MAC layer), which avoids higher layers such as TCP/IP that incur high overhead. Block I/O requests have arbitrary sizes, while the size of a network packet that can be sent over the wire is bounded by the Maximum Transmission Unit (MTU), so component 3.1.1 needs to fragment requests before sending them to the other end for processing. The fragment size is determined by the MTU.
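The fragmentation step can be illustrated with a minimal Python sketch. It is not the vrio_eth.ko implementation: the header layout (request id, fragment index, fragment count) and the function names `fragment` and `reassemble` are assumptions made for this example only, and the actual wire format of the Ethernet transport is not described in this document.

```python
import struct

# Hypothetical fragment header: request id, fragment index, fragment count.
# This layout is an assumption for illustration, not the vRIO wire format.
HEADER = struct.Struct("!IHH")


def fragment(request_id, payload, mtu=1500):
    """Split a request into fragments that each fit within the MTU."""
    chunk = mtu - HEADER.size
    if chunk <= 0:
        raise ValueError("MTU too small for the fragment header")
    count = max(1, -(-len(payload) // chunk))  # ceiling division
    return [
        HEADER.pack(request_id, i, count) + payload[i * chunk:(i + 1) * chunk]
        for i in range(count)
    ]


def reassemble(fragments):
    """Rebuild the original request; fragments may arrive out of order."""
    parts = {}
    expected = None
    for frag in fragments:
        _, index, count = HEADER.unpack_from(frag)
        expected = count
        parts[index] = frag[HEADER.size:]
    if expected is None or len(parts) != expected:
        raise ValueError("missing fragments")
    return b"".join(parts[i] for i in range(expected))
```

Carrying the fragment index and count in every fragment lets the receiver detect loss and tolerate reordering without relying on a higher-layer protocol, which is one plausible way to compensate for bypassing TCP/IP.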

To instantiate a virtual device, we created a special user-space utility that operates the generic back-end kernel module (module 3.1.21). The utility is invoked on the I/O hypervisor with the following details: device type (block or net) and backing device (e.g. a local block device). This in turn instructs the VM to expose a virtual device.

Below is a mapping between our internal module names and the components described in D3.1.1.

Component                              Module name      Installed on
3.1.1 - Split I/O Ethernet Transport   vrio_eth.ko      VM + I/O hypervisor
3.1.11 - Split I/O Generic Front-End   vrio_generic.ko  VM
3.1.12 - Split I/O Block Front-End     vrio_gblk.ko     VM
3.1.13 - Split I/O Net Front-End       vrio_gnet.ko     VM
3.1.21 - Split I/O Generic Back-End    vrio_generic.ko  I/O hypervisor
3.1.22 - Split I/O Block Back-End      vrio_hblk.ko     I/O hypervisor
3.1.23 - Split I/O Net Back-End        vrio_hnet.ko     I/O hypervisor
3.1.31 - Split I/O management module   vrio.py          I/O hypervisor

Table 1: I/O consolidation: components internal names

Notes:

1. vRIO (virtual Remote I/O) is the internal development name for Split I/O.

2. Module vrio_generic.ko is used both for the generic front-end and the generic back-end.

3.2. External interactions

Management of the Split I/O hypervisor is provided through the Split I/O hypervisor Python module. Its development status is given in Table 2.

The management library will be used by the cloud management layer as part of T3.3.

3.3. Completion status of components

Table 2 shows the development status of the Split I/O modules included in this prototype.

Component                              Status and progress
3.1.1 - Split I/O Ethernet Transport   70% completed
3.1.11 - Split I/O Generic Front-End   70% completed
3.1.12 - Split I/O Block Front-End     40% completed
3.1.13 - Split I/O Net Front-End       50% completed
3.1.21 - Split I/O Generic Back-End    70% completed
3.1.22 - Split I/O Block Back-End      40% completed
3.1.23 - Split I/O Net Back-End        50% completed
3.1.31 - Split I/O management module   Not started

Table 2: I/O consolidation: status of components

Table 3 shows the development status of the Split I/O API functions.

API function              Status and progress
RESET_GUEST_DEVICES()     Not started
CREATE_BLOCK_DEVICE()     80% completed
REMOVE_BLOCK_DEVICE()     Not started
CREATE_NETWORK_DEVICE()   80% completed
REMOVE_NETWORK_DEVICE()   Not started

Table 3: I/O consolidation: status of API

4. Memory Consolidation and Externalization

Our objective is to provide a mechanism that allows the hypervisor to retrieve memory pages from a remote system for use in a VM. This mechanism is being used as part of a post-copy migration implementation, in which a migrated VM starts running on the destination host before all of its memory has been copied over.

4.1. Status of prototype

The prototype can perform small post-copy migrations, with page requests making the full round trip in limited situations, but it is incomplete and not yet stable. The prototype consists of modifications to both QEMU and the Linux kernel (currently v3.13). The components are as described in deliverable D3.1.1.

The current deployment is a testing environment consisting of a nested hypervisor, which allows all components to be tested easily on one machine, as shown in Figure 2. The two QEMU instances are the source (left) and the destination, which has a partial copy of its memory.

The Linux kernel running in the L1 guest includes the 'Linux kernel mm subsystem enhancements' (component 3.2.1). These allow the destination QEMU instance to mark an area of memory as 'userfault', i.e. external, which causes QEMU to be notified when the guest accesses a page in that area. At a later point in time, QEMU uses another kernel modification to atomically (and efficiently) move a page into place to satisfy the previous request.
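As a rough illustration of this fault-and-place cycle, the following Python toy model mimics the behaviour in user space. It is only a sketch under simplifying assumptions: the class `UserfaultRegion`, its methods, and the synchronous callback are hypothetical stand-ins for this example and do not correspond to the kernel interface, in which the faulting access waits until the page has been placed.

```python
PAGE_SIZE = 4096


class UserfaultRegion:
    """Toy model of guest memory whose missing pages are reported on access."""

    def __init__(self, num_pages, on_fault):
        self.pages = [None] * num_pages  # None marks a page not yet present
        self.on_fault = on_fault         # stand-in for the notification to QEMU

    def read(self, page_no):
        if self.pages[page_no] is None:
            # The guest touched an external page: notify the handler,
            # which is expected to place the page before returning.
            self.on_fault(page_no)
        return self.pages[page_no]

    def place_page(self, page_no, data):
        """Install a whole page in one step, never a partial one."""
        if len(data) != PAGE_SIZE:
            raise ValueError("a page must be placed whole")
        if self.pages[page_no] is None:
            self.pages[page_no] = bytes(data)


# Usage: the dictionary plays the role of the memory held by the source.
source = {n: bytes([n]) * PAGE_SIZE for n in range(4)}
faults = []


def handle_fault(page_no):
    faults.append(page_no)
    region.place_page(page_no, source[page_no])


region = UserfaultRegion(4, handle_fault)
first = region.read(2)   # triggers a fault, then returns the placed page
second = region.read(2)  # page is now present, no further fault
```

Placing only whole pages is the property that matters here: the guest either sees no page at all (and faults) or sees the complete contents, never a partially copied page.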

The QEMU running in the L1 guest contains the modifications from components 3.2.2 and 3.2.3.

3.2.2 – 'Remote memory front end' routes page requests and page data between the kernel on the destination machine and the network towards the source machine. Requests from the kernel are recorded in a page ownership data structure and sent as requests along a 'return path' to the source VM.

3.2.3 – 'Remote memory handler' satisfies page requests from the destination machine and routes control messages to the Remote memory front end; these include the messages that initiate the transition to postcopy.
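The interplay between the two components can be sketched in Python as follows. This is an illustrative model, not the QEMU code: the class names mirror the component names, but the three page states, the method names, and the use of a `deque` as a stand-in for the return path are all assumptions introduced for this example.

```python
from collections import deque
from enum import Enum


class PageState(Enum):
    REMOTE = "remote"        # still held only by the source
    REQUESTED = "requested"  # request sent, data not yet received
    LOCAL = "local"          # received and placed on the destination


class RemoteMemoryFrontEnd:
    """Destination side: records requests and forwards them to the source."""

    def __init__(self, num_pages, return_path):
        self.state = [PageState.REMOTE] * num_pages  # page ownership record
        self.return_path = return_path  # queue standing in for the network

    def request_page(self, page_no):
        # Only the first fault on a page produces a message to the source.
        if self.state[page_no] is PageState.REMOTE:
            self.state[page_no] = PageState.REQUESTED
            self.return_path.append(page_no)

    def receive_page(self, page_no, data, memory):
        # Ignore duplicates so that a placed page is never overwritten.
        if self.state[page_no] is not PageState.LOCAL:
            memory[page_no] = data
            self.state[page_no] = PageState.LOCAL


class RemoteMemoryHandler:
    """Source side: answers the page requests arriving on the return path."""

    def __init__(self, source_memory, return_path):
        self.source_memory = source_memory
        self.return_path = return_path

    def serve(self, front_end, dest_memory):
        while self.return_path:
            page_no = self.return_path.popleft()
            data = self.source_memory[page_no]
            front_end.receive_page(page_no, data, dest_memory)
```

Recording each request before sending it lets the front end suppress duplicate requests when the guest faults on the same page more than once while the data is still in flight.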


The existing QEMU migration protocol has been modified to include commands to initiate the postcopy mode, and to provide a bidirectional transport allowing the destination to request pages. The mode is enabled using an extra migration-capability flag.

4.2. External interactions

None yet.

4.3. Completion status of components

The figures below reflect a prototype that is starting to work: the basics are in place, but they need to be filled out, made more robust, and tidied up before submission to the upstream projects.

Component                                     Status and progress
3.2.1 Linux kernel mm subsystem enhancements  60% completed
3.2.2 Remote memory front end                 50% completed
3.2.3 Remote memory handler                   50% completed

Table 4: Memory consolidation and externalization: status of components

Figure 2: Memory externalization nested test-bed. Labels in the figure: Host system (running Linux with unmodified KVM); Host (unmodified) QEMU instance; L1 guest (running modified Linux kernel); L1 modified QEMU instance (1); L1 modified QEMU instance (2); L2 guest (unmodified).


4.4. Additional notes

The next step is to finish the functionality and stabilise the current version so that arbitrary guests can be transferred.

5. Cloud Management

The cloud management is part of Task 3.3 which is scheduled to start at M13 (October 2014). Therefore, no components of the cloud management are included in this deliverable.

6. References

[1]. ORBIT document - D1.2.1 - Quality Plan, February 12, 2014

[2]. ORBIT document - D3.1.1 - RCL Design And Open Specification
