Ver1.1 2013/02/10
Etsuji Nakai
Deploying Baremetal Instances with OpenStack
NII dodai-compute2.0 project
$ who am i
– Etsuji Nakai
  • Senior solution architect and cloud evangelist at Red Hat.
  • Working for NII (National Institute of Informatics, Japan) as a cloud technology consultant.
– The author of the “Professional Linux Systems” series.
  • Available only in Japanese. Translation offers from publishers are welcome ;-)
Why does baremetal matter?
General usecase:
– I/O-intensive applications (RDB)
– Realtime applications (deterministic latency)
– Native processor features
– etc.
Specific usecase in the “Academic Research Cloud (ARC)” of NII:
– Flexible extension of existing server clusters.
Academic Research Cloud (ARC) in NII, today.
This is a prototype of the Japan-wide research cloud.
– It's now running in NII's laboratories, and will be extended into a Japan-wide research cloud.
– Research labs can extend their existing clusters (HPC clusters, cloud infrastructures, etc.) by attaching baremetal servers from the resource pool.
[Diagram: existing HPC clusters and cloud infrastructures connect over an L2 connection (VLAN) to a baremetal resource pool, whose on-demand provisioning/de-provisioning enables flexible extension of existing clusters.]
Future plan of the ARC.
ARC will be extended into a Japan-wide cloud with a SINET4 WAN connection.
– SINET4 is an MPLS-based wide-area Ethernet service for academic facilities in Japan, operated by NII.
– http://www.sinet.ad.jp/index_en.html
[Diagram: existing HPC clusters, cloud infrastructures, and baremetal resource pools at multiple sites, interconnected over the MPLS-based wide-area Ethernet.]
Overview of dodai-compute1.0
What is dodai-compute?
– A baremetal driver extension of Nova, currently used in ARC.
  • Designed and developed by NII in 2012.
  • Based on Diablo with Ubuntu 11.10.
  • Source code: https://github.com/nii-cloud/dodai-compute
– Upside: a simple extension aimed at the specific usecase :-)
– Downside: unsuitable for general usecases :-(
  • Cannot manage a mixed environment of baremetal and hypervisor hosts.
  • One-to-one mapping from instance flavor to baremetal host. (No scheduling logic to select a suitable host automatically.)
  • Nonstandard use of availability zones. (Used for host status management.)
The most outstanding issue:
– It's not merged in upstream. No community support, no future!
Planning of ARC baremetal provisioning feature
It should be designed based on the framework in the upstream.
– Existing framework: GeneralBareMetalProvisioningFramework.
  • So-called “NTTdocomo-openstack.”
  • Blueprint: http://wiki.openstack.org/GeneralBareMetalProvisioningFramework
  • Source code: https://github.com/NTTdocomo-openstack/nova
As a first step, we compared the architectures of “dodai-compute” and “NTTdocomo-openstack”, and considered the following:
– What's common and what's different?
– What can be further generalized in “NTTdocomo-openstack”?
– What should be added so it can be used for ARC?
The goals of the project “dodai-compute2.0”:
– Extend the upstream framework for ARC.
– Not to become a private branch, but to stay in the upstream.
Note:
– The NTTdocomo-openstack branch has been merged into the upstream with many modifications. Although this slide is based on the NTTdocomo-openstack branch, future extensions will be made directly on the upstream.
By the way, what does “dodai” stand for?
1. Base, Foundation, Framework, etc...
Comparison of dodai-compute1.0 and NTTdocomo-openstack
Today's Topics
1. Coupling Structure with Nova Scheduler.
2. OS Provisioning Mechanism.
General flow of instance launch
[Diagram: Nova Scheduler selects a host for a new instance from the hosts registered to it, and asks the Compute Driver on that host to launch the instance; the driver launches a VM alongside the VMs already running there.]
Question:
– How can we apply baremetal servers in place of VM instances in this flow?
A1. Register “Baremetal Pool” as an “Instance Host”
[Diagram: each “Baremetal Pool” is registered to Nova Scheduler as an instance host. The scheduler selects a pool for the new instance and asks its Compute Driver to launch; the driver then selects a baremetal server in the pool and launches it.]
dodai-compute takes this approach. Its driver acts as a single host which accommodates multiple baremetal servers.
A2. Register each baremetal as a “Single Instance Host”
[Diagram: each baremetal server is registered to Nova Scheduler as a host. The scheduler selects a baremetal server for the new instance and asks the Compute Driver to launch the selected server.]
NTTdocomo-openstack takes this approach. Its driver acts as a proxy for baremetal servers, each of which accommodates just one instance.
Class structure for coupling with Nova
dodai-compute1.0 and NTTdocomo-openstack have basically the same class structure in terms of coupling with Nova.
– The drawing shows the case of dodai-compute1.0.
– NTTdocomo-openstack uses “BareMetalDriver” in place of “DodaiConnection”.
– https://github.com/nii-cloud/dodai-compute/wiki/Developer-guide
[Diagram: a base class for different kinds of virtualization hosts, with a driver for libvirt-managed hypervisors (KVM/LXC) and a driver for baremetal management.]
How does Nova Scheduler see baremetal servers?
dodai-compute's driver acts as a single host which accommodates multiple baremetal servers.
– It's like representing a baremetal pool as a single “host” which runs baremetal servers as its “VMs”.
– The scheduling policy is implemented on the driver side. (Nova Scheduler has no choice of hosts.)
[Diagram: Nova API → Nova Scheduler → Nova Compute (DodaiConnection). The scheduler recognizes it as a single host of “baremetal VMs”; the driver chooses the host to provision by referring to the dodai db (baremetal server information).]
How does Nova Scheduler see baremetal servers?
The NTTdocomo-openstack driver acts as a proxy for all baremetal hosts.
– Each baremetal server is seen as an independent host which can accommodate up to one instance.
– The scheduling policy is implemented as a part of Nova Scheduler. It uses “extra_specs” metadata to distinguish baremetal hosts from hypervisor hosts.
[Diagram: Nova API → Nova Scheduler → Nova Compute (BareMetalDriver). The scheduler recognizes all baremetal hosts, each holding just one instance; the driver registers all hosts by referring to the baremetal db (baremetal server information).]
Considerations on the Nova Scheduler coupling
dodai-compute
– Scheduling (server selection logic) is up to the driver.
  • Currently, there's no intelligence in the driver's scheduler. One-to-one mappings between physical servers and instance types are pre-defined.
  • However, it enables users to choose a baremetal server explicitly.
NTTdocomo-openstack
– Scheduling (server selection logic) is up to Nova Scheduler.
  • Currently, the standard “Filter Scheduler” is used.
  • “instance_type_extra_specs=cpu_arch:x86_64” is used to distinguish baremetal hosts from hypervisor hosts.
  • Users cannot choose a specific baremetal server explicitly. This must be addressed for the ARC usecase. We may use additional “labels” in instance_type_extra_specs, like: “instance_type_extra_specs=cpu_arch:x86_64,racklocation:a32”
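The label-matching idea above can be sketched as follows. This is not the actual Nova Filter Scheduler code; it is a minimal illustration of how extra_specs entries could be matched against per-host capability metadata (the "racklocation" label is the proposed extension, not an existing Nova key):

```python
# Minimal sketch of extra_specs-style host filtering (illustration only,
# not the actual Nova Filter Scheduler implementation).

def host_passes(host_capabilities, extra_specs):
    """Return True only if every extra_specs entry matches the host's
    registered capabilities."""
    for key, required in extra_specs.items():
        if host_capabilities.get(key) != required:
            return False
    return True

# A baremetal host registered with the proposed extra label.
host = {"cpu_arch": "x86_64", "racklocation": "a32"}

# Matches on cpu_arch alone; fails when the requested rack differs.
assert host_passes(host, {"cpu_arch": "x86_64"})
assert not host_passes(host, {"cpu_arch": "x86_64", "racklocation": "b01"})
```

With such labels, a flavor pinned to "racklocation:a32" would only ever be scheduled onto the baremetal servers registered under that label, giving users indirect but explicit control over server selection.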
OS Installation Mechanism of dodai-compute1.0
The basic flow of OS installation in dodai-compute1.0:
– The management IPs (IPMI) of the baremetal servers are stored in the database.
– The driver prepares a boot image and an installation script.
– The actual installation work is handled by the script.
[Diagram: baremetal server, OS installation server, and PXE boot server.
(1) The baremetal driver fetches the target image from Glance (a tar ball of the root filesystem contents) and prepares the installation script.
(2) The installation script URL is passed to the PXE boot image as a kernel parameter.
(3) The booted server fetches the installation script and runs it.
(4) The script fetches the image tar ball and extracts it to the local disk.]
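Step (2) can be sketched as follows. This is an illustration only (the kernel/initrd file names and the "install_script" parameter name are assumptions, not the actual dodai-compute code): the driver would generate a pxelinux.cfg entry whose append line carries the installation script URL down to the booting node.

```python
# Sketch of generating a pxelinux.cfg entry that passes the installation
# script URL as a kernel parameter (parameter name is hypothetical).

def build_pxe_entry(kernel, initrd, script_url):
    """Return a pxelinux.cfg entry embedding the install script URL."""
    return (
        "default install\n"
        "label install\n"
        f"  kernel {kernel}\n"
        f"  append initrd={initrd} install_script={script_url}\n"
    )

entry = build_pxe_entry("vmlinuz", "initrd.img",
                        "http://192.168.1.10/scripts/install.sh")
assert "install_script=http://192.168.1.10/scripts/install.sh" in entry
```

The boot image's init script can then read /proc/cmdline, pick out the install_script parameter, and fetch and execute the script, which corresponds to steps (3) and (4).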
OS Installation Mechanism of NTTdocomo-openstack
The basic flow of OS installation in NTTdocomo-openstack:
– The management IPs (IPMI) of the baremetal servers are stored in the database.
– The driver prepares a boot image and an installation script.
– The actual installation work is handled by the script.
[Diagram: baremetal server, OS installation server, and PXE boot server.
(1) The baremetal driver fetches the target image from Glance (a dd image of the root filesystem) and prepares the installation script.
(2) The installation script is embedded into the init script of the PXE boot image.
(3) The booted server exports its local disk as an iSCSI LUN and asks the installation service to fill it.
(4) The installation server attaches the iSCSI LUN and copies the dd image to it.]
OS Installation Mechanism
The basic framework is the same for both of them:
– The management IPs (IPMI) of the baremetal servers are stored in the database.
– The driver prepares a PXE boot image to start the OS installation.
– The actual installation work is handled by scripts in the boot image.
The difference just lies in the actual installation method.
– Installation script of dodai-compute1.0:
  • Makes partitions and filesystems on the local disk.
  • Fetches the tar.gz image and unbundles it directly to the local filesystem.
  • Installs grub to the local disk.
– Installation script of NTTdocomo-openstack:
  • Starts tgtd (the iSCSI target daemon) and exports the local disk as an iSCSI LUN.
  • Asks the external “Installation Server” to install the OS in that LUN.
  • The installation server attaches the LUN and copies the “dd” image to it.
  • Grub is not installed. The baremetal server relies on PXE boot even for bootstrapping the OS provisioned on the local disk.
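To make the NTTdocomo-openstack-style export concrete, the boot-time script essentially runs a short tgtadm sequence to publish the local disk as an iSCSI LUN. The sketch below only builds those command lines (the IQN and device name are placeholders, and the exact commands used in the real branch may differ):

```python
# Sketch of the tgtadm sequence for exporting a local disk as an iSCSI
# LUN via tgtd (IQN and device path are hypothetical examples).

def iscsi_export_commands(tid, iqn, device):
    """Return the tgtadm commands to create a target, attach the local
    disk as LUN 1, and allow all initiators to bind."""
    return [
        f"tgtadm --lld iscsi --op new --mode target --tid {tid} -T {iqn}",
        f"tgtadm --lld iscsi --op new --mode logicalunit --tid {tid} --lun 1 -b {device}",
        f"tgtadm --lld iscsi --op bind --mode target --tid {tid} -I ALL",
    ]

cmds = iscsi_export_commands(1, "iqn.2013-02.example:node01-disk", "/dev/sda")
assert cmds[1].endswith("-b /dev/sda")
```

The installation server then logs in to this target, sees the node's disk as a local block device, and writes the dd image to it, which is why no grub installation happens on the node itself.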
Considerations on OS Installation Mechanism
Registered machine images need to have meta-data to specify:
– The type of installation service.
– The installation service's FQDN.
  • We may use the “properties” attribute of the image.
This could give us a more general framework.
[Diagram: baremetal server, OS installation servers A and B, and PXE boot server.
(1) The baremetal driver prepares the target image in the corresponding installation service.
(2) It prepares the PXE boot image (initrd script) corresponding to the selected installation service.
(3) The script in the initrd starts the installation using the selected installation service.]
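The dispatch on image meta-data could look like the sketch below. The property keys ("install_service_type", "install_service_fqdn") are hypothetical names for illustration, not existing Glance conventions:

```python
# Sketch of selecting an installation service from image properties
# (property names are assumptions, not an existing convention).

def select_install_service(image_properties):
    """Return (service_type, fqdn) for the installation service named
    in the image's properties; default to the dd-image method."""
    service_type = image_properties.get("install_service_type", "dd")
    fqdn = image_properties["install_service_fqdn"]
    return service_type, fqdn

props = {"install_service_type": "kickstart",
         "install_service_fqdn": "installer-b.example.com"}
assert select_install_service(props) == ("kickstart", "installer-b.example.com")
```

The driver would use the returned pair in steps (1) and (2) above: stage the image on that installation server, and pick the matching initrd script for the PXE boot image.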
Considerations on OS Installation Mechanism
Candidates for the installation service:
– Existing ones, such as those in dodai-compute and NTTdocomo-openstack.
– We'd like to add a Kickstart method, too.
  • The image contains a ks.cfg file instead of an actual binary image.
  • The installation service installs the baremetal server using Kickstart.
Kickstart gives more flexibility and ease of use for customizing image contents.
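As a rough idea of what such an image would carry, a minimal ks.cfg might look like this (all values are placeholders; a real ARC profile would add its own partitioning, network, and package customizations):

```
# Minimal illustrative ks.cfg (placeholder values)
install
url --url=http://installer.example.com/centos/6/os/x86_64
lang en_US.UTF-8
rootpw changeme
clearpart --all --initlabel
autopart
reboot
%packages
@core
%end
```

Because the image is just this text file, customizing the installed contents means editing a few lines rather than rebuilding and re-uploading a binary disk image.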
Network configuration of dodai-compute1.0
L2 separation is done by VLAN.
– Each lab has its own fixed VLAN ID assigned on SINET4.
– dodai-compute asks the OpenFlow controller to set up a port/VLAN mapping. The VLAN is explicitly specified by the user.
– Mappings between the baremetal's NICs and the associated switch ports are stored in the database.
OS-side configuration is done by the local agent.
– NIC bonding is also configured for redundancy.
– NIC bonding is mandatory in ARC.
[Diagram: a baremetal server with a fixed management IP on the management network (PXE boot / agent operations, dodai-compute) and a bonded service IP across service network switches #1 and #2, which connect to SINET4 via VLAN trunking under the OpenFlow controller. The service IP and bonding configuration are done by the local agent based on requests from dodai-compute.]
Network configuration of NTTdocomo-openstack
The virtual network is managed by the Quantum API and the NEC OpenFlow plug-in.
– L2 separation is done by port-based packet separation using flowtable entries.
– Mappings between the baremetal's NICs and the associated switch ports are stored in the database.
– When a user specifies two or more NICs, the driver chooses unused NICs from the database and sets up the flowtable entries for the associated ports.
– VLAN-based separation needs to be added for the ARC usecase.
– A NIC bonding mechanism needs to be added for the ARC usecase.
[Diagram: a baremetal server with a fixed management IP on the management network (PXE boot, BaremetalDriver) and a service IP on the service network switch, managed by the OpenFlow controller.]
How will Quantum API be used for ARC usecase?
Using the Quantum API and plugins is the preferable choice for ARC, but we need some modifications/extensions, too.
VLAN-based separation needs to be added for the ARC usecase.
– Our plan is to add a “BareMetal VLAN plugin” which configures port/VLAN mappings using flowtable entries, or directly configures port-VLAN on Cisco switches.
– This enables not only SINET4 VLAN connections but also interconnection with VM instances using the OVS plugin (via VLAN).
A NIC bonding mechanism needs to be added for the ARC usecase.
– As all NICs of the baremetal servers are registered in the database, we may add redundancy information there. (e.g. NIC-A should be paired with NIC-B for bonding.)
– We may still need a local agent to make the actual bonding configuration.
[Diagram: a hypervisor host (OVS plugin, port VLANs) and a baremetal server (BareMetal VLAN plugin) connected through service network switches #1 and #2, with VLAN trunking to SINET4.]
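The redundancy-information idea can be sketched as follows. The record layout (a "bond_peer" field on each registered NIC row) is a hypothetical design, not an existing schema:

```python
# Sketch of pairing registered NICs for bonding using a hypothetical
# "bond_peer" field stored alongside each NIC record in the database.

def pick_bond_pair(nic_records):
    """Return the first unused NIC together with its registered peer,
    or None if no free pair exists."""
    by_address = {nic["address"]: nic for nic in nic_records}
    for nic in nic_records:
        if not nic["in_use"]:
            peer = by_address.get(nic["bond_peer"])
            if peer is not None and not peer["in_use"]:
                return nic, peer
    return None

nics = [
    {"address": "aa:bb:cc:00:00:01", "bond_peer": "aa:bb:cc:00:00:02", "in_use": False},
    {"address": "aa:bb:cc:00:00:02", "bond_peer": "aa:bb:cc:00:00:01", "in_use": False},
]
pair = pick_bond_pair(nics)
assert pair is not None and pair[0]["bond_peer"] == pair[1]["address"]
```

The driver would set up flowtable or VLAN entries for both ports of the selected pair, while the local agent builds the actual bonding interface on the OS side.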
Summary
Target areas for the future extension:
1. Scheduler extension for grouping of baremetal servers.
– Allowing users to specify the baremetal servers to be used.
2. Multiple OS provisioning methods.
– Allowing multiple types of OS images, such as:
  • dd image (NTTdocomo-openstack style)
  • tar ball (dodai-compute style)
  • Kickstart installation (new feature)
3. Baremetal Quantum plugin for VLAN inter-connection.
– Allowing inter-connection to existing VLAN networks.
– Allowing NIC-bonding configuration.