Hadoop MapReduce in
Eucalyptus Private Cloud
Johan Nilsson
May 27, 2011
Bachelor’s Thesis in Computing Science, 15 credits
Supervisor at CS-UmU: Daniel Henriksson
Examiner: Pedher Johansson
Ume˚
a University
Department of Computing Science
SE-901 87 UME˚
A
Abstract
This thesis investigates how setting up a private cloud using the Eucalyptus Cloud system could be done along with it’s usability, requirements and limitations as an open-source cloud platform providing private cloud solutions. It also studies if using the MapReduce framework through Apache Hadoop’s implementation on top of the private Eucalyptus Cloud can provide near linear scalability in terms of time and the amount of virtual machines in the cluster.
Analysis has shown that Eucalyptus is lacking in a few usability areas when setting up the cloud infrastructure in terms of private networking and DNS lookups, yet the API that Eucalyptus provides gives benefits when migrating from public clouds like Amazon. The MapReduce framework is showing an initial near-linear relation which is declining when the amount of virtual machines is reaching the maximum of the cloud infrastructure.
Contents
1 Introduction 1 2 Problem Description 3 2.1 Problem Statement . . . 3 2.2 Goals . . . 4 2.3 Related Work . . . 43 Virtualized cloud environments and Hadoop MapReduce 5 3.1 Virtualization . . . 5
3.1.1 Networking in virtual operating systems . . . 7
3.2 Cloud Computing . . . 8
3.2.1 Amazon’s public cloud service . . . 9
3.3 Software study - Eucalyptus . . . 10
3.3.1 The different parts of Eucalyptus . . . 11
3.3.2 A quick look at the hypervisors in Eucalyptus . . . 12
3.3.3 The Metadata Service . . . 13
3.3.4 Networking modes . . . 14
3.3.5 Accessing the system . . . 15
3.4 Software study - Hadoop MapReduce & HDFS . . . 16
3.4.1 HDFS . . . 17
3.4.2 MapReduce . . . 19
4 Accomplishment 23 4.1 Preliminaries . . . 23
4.2 Setup, configuration and usage . . . 24
4.2.1 Setting up Eucalyptus . . . 24
4.2.2 Configuring an Hadoop image . . . 31
4.2.3 Running MapReduce on the cluster . . . 34
4.2.4 The MapReduce implementation . . . 36
5 Results 39 5.1 MapReduce performance times . . . 40
6 Conclusions 43 6.1 Restrictions and limitations . . . 43 6.2 Future work . . . 44
7 Acknowledgements 47
References 49
List of Figures
3.1 A hypervisor can have multiple guest operating systems in it.. . . 6
3.2 Different types of hypervisor-based server and machine virtualizations. . . 6
3.3 Simplified visualization of cloud computing. . . 8
3.4 Overview of the components in Eucalyptus on rack based servers. . . 12
3.5 Metadata request example in Eucalyptus. . . 14
3.6 The HDFS node structure.. . . 18
3.7 The interaction between nodes when a file is read from HDFS. . . 19
3.8 The MapReduce phases in Hadoop MapReduce.. . . 21
4.1 Eucalyptus network layout on the test servers. . . 25
4.2 The optimal physical layout compared to the test environment.. . . 26
4.3 Network traffic in the test environment. . . 27
5.1 Runtimes on a 2.9 GB database.. . . 41
5.2 Runtimes on a 4.0 GB database.. . . 41
5.3 Runtimes on a 9.6 GB database.. . . 41
5.4 Map task times on a 2.9 GB database.. . . 42
5.5 Map task times on a 4.0 GB database.. . . 42
5.6 Map task times on a 9.6 GB database.. . . 42
Chapter 1
Introduction
By using a cloud service a company, organization or even a private person can outsource management, maintenance and administration of large clusters of servers but still keep the benefits. While using a public cloud provider is sufficient for most tasks; bandwidth, storage, data protection or pricing details might encourage companies to house a private cloud. The infrastructure to control and maintain the cloud can be proprietary like Microsoft Hyper-V Cloud [17], VMware vCloud [21] and Citrix Open Cloud [4], but there are also a number of free and open-source solutions like Eucalyptus Cloud, OpenNebula [19] and CloudStack [5].
The cloud can provide the processing power, but the actual framework to take benefit of the these distributed instances does not inherently come with the machines. The Hadoop MapReduce claims to provide very high scalability and stability across a large cluster [7]. It is meant to run on dedicated servers, but there is nothing that limits them from running on a virtual machine.
This thesis is a study performed at the University of Ume˚a, Department of Comput-ing Science to provide familiarity with the cloud and it’s related technologies in general, focusing specifically on the Eucalyptus cloud infrastructure. It shows a mean of setting up a private cloud, along with using the Hadoop MapReduce idiom/framework on top of the cloud showing the benefits and requirements of running MapReduce on a Eucalyptus private cloud. As a proof of concept a simple MapReduce test is implemented and tested on the cloud to provide an analysis of the distributed computation of MapReduce.
The report will have a software study on the systems used in the thesis, followed by a description of the configuration, setup and usage of Eucalyptus and Hadoop. Finally the result from the analysis will be presented along with a short conclusion.
Chapter 2
Problem Description
This thesis is two-fold. It will provide a relatively large software study of the Eucalyptus cloud and a general overview of some the technologies it uses. It will also study what Hadoop MapReduce is and how it can be used in conjunction with Eucalyptus.
The first part of the thesis is to analyse how to setup an Eucalyptus private cloud in a small environment; what the requirements are to run and maintain it and what problems and/or benefits the current implementation of it has. This is a documentation and imple-mentation of one way to configure the infrastructure to deliver virtual machines in small manner to a private user/company/organization.
The second part is to test how well Hadoop MapReduce is performing in a virtual cluster. The machines used for the cluster will be virtual machines delivered through the Eucalyptus cloud that has been set up in the course of the thesis. A (simple) MapReduce application will be implemented to process a subset of Wikipedia’s articles and the time it takes to process this, based on the number of nodes that the cluster runs on, will be measured. In a perfect environment the MapReduce framework can deliver near-linear performance [7] but that is without the extra overhead of running on small virtual machines.
2.1
Problem Statement
By first setting up a small Eucalyptus Cloud on a few local servers the thesis can answer which problems and obstacles there are when preparing the open-source infrastructure. The main priority is setting up a cloud that can deliver virtual instances capable of running Hadoop MapReduce on them to supply a base to perform the analysis of the framework.
Simplifying launching Hadoop MapReduce clusters inside the Eucalyptus Cloud is of second priority after setting up the infrastructure and testing the feasibility of MapReduce on virtual machines. This can include scripts, stand-alone programs or utilities beyond Eucalyptus and/or Hadoop.
2.2
Goals
The goals of this thesis is to do a software study and analysis of the performance and usability of Hadoop MapReduce running on top of virtual machines inside an Eucalyptus Cloud infrastructure. It will study means to setup, launch, maintain and remove virtual instances that can together form a MapReduce cluster. The following are the specific goals of this thesis:
– Demonstrate a way of setting up a private cloud infrastructure using the Eucalyp-tus Cloud system. This includes configuring subsystems that EucalypEucalyp-tus uses like hypervisors, controller libraries and networking systems.
– Create a virtual machine image containing Hadoop MapReduce that will provide ease of use and minimal manual configuration at provisioning time.
– Provide a way to easily create and remove virtual instances inside the private cloud, adjusting the size of Hadoop worker nodes available in it’s cluster.
– Test the Hadoop MapReduce framework on a virtual cluster inside the private cloud. This is to show what kind of performance increase a user gains when adding more virtual nodes to the cluster, and if it is a near linear increase.
2.3
Related Work
Apache Whirr is a collection of scripts that has sprung out as a project of its own. The purpose of Whirr is to simplify controlling virtual nodes inside a cloud like Amazon Web Services [10]. Whirr controls everything from launching, removing and maintaining instances that Hadoop then can utilize in a cluster.
Another similar controller program is Puppet [14] from Puppet Labs. This program fully controls instances and clusters inside an EC2-compatible (AWS or Eucalyptus for example) cloud. It uses a program outside the cloud infrastructure that can control whether to launch, edit or remove instances. Puppet also controls the Hadoop MapReduce cluster inside the virtual cluster. Mathias Gug, an Ubuntu Developer, has tested how to deploy a virtual cluster inside an Ubuntu Enterprise Cloud using Puppet. The results can be found on his blog [13].
Hadoop’s commercial and enterprise offspring, Cloudera [6], has released a distribu-tion called CDH. The current version, version 3, contains a virtual machine with Hadoop MapReduce configured along with Apache Whirr instructions. This is to simplify launching and configuring Hadoop MapReduce clusters inside a cloud. These releases also contains extra packages for enterprise clusters, such as Pig, Hive, Sqoop and HBase. CDH also uses Apache Whirr to simplify AWS deployment.
Chapter 3
Virtualized cloud environments
and Hadoop MapReduce
This in-depth study focuses on explaining some key concepts regarding cloud computing, virtualization and clustering along with how certain specific software solutions work based on these concepts. As some of the software are used in the practical implementation of the thesis the in-depth study naturally focuses on how these work in practical environment.
3.1
Virtualization
The term virtualization refers to creating a virtual environment instead of an actual physical one. This enables a physical system to run different logical solutions on it by virtually creating an environment that meets the demand of the solution. By virtually creating several different operating systems on one physical workstation the administrator can create a cluster of computers that acts as if they were physical.
There are several different methods of virtualization. Network Virtualization refers to creating virtual networks that can be used for segmenting, subnetworking or creating vir-tual private networks (VPN) as a few examples. Desktop virvir-tualization enables a user to access his local desktop from a remote location and is commonly used in large corporations or authorities to ensure security and accessibility. A more common virtualization usually encountered by a home user is Application Virtualization which enables compilation of code to machine instruction running in a certain environment. Examples of this include Java VM and Microsoft’s .NET framework. In cloud computing, Server & Machine Virtualization are extensively used to virtually create new computers that can act as a completely different operating system independent of the underlying system it runs on [26].
Without virtualization situations would arise where machines would use only a percent-age of their maximum capacity. If the server would have virtualization active and enable more operating systems run on the physical hardware, the hardware would be used more
effectively. This is why server and machine virtualization is of great benefit when creating a cloud environment because the cloud host can maximize effectivity and distribute resources without having to buy a physical server each time a new instance is needed.
The system that keeps track of the machine virtualization is called a hypervisor. Hy-pervisors are mediators that translates calls from the virtualized OS to the hardware and acts as a security guard. The guard prevents different virtual instances from accessing each other’s memory or storage areas that is outside their virtual bounds. When a hypervisor creates a new virtual instance (a Guest OS ), the hypervisor ’marks’ memory, CPU and storage areas to be used by that instance [22]. The underlying hardware is usually the limit of how many virtual instances that can be run on one physical machine.
Figure 3.1: A hypervisor can have multiple guest operating systems in it.
Depending on the type of hypervisor it can either work directly with the hardware (called type 1 virtualization) or on top of an already installed OS (called type 2 virtualization). The type used varies based on which hypervisor, underlying OS or hardware installed that is installed. These different variations demands different requirements for each system; a hypervisor might flawlessly work on one hardware/OS setup but might be inoperable in a slightly different variation [26]. See figure 3.2.
3.1. Virtualization 7
Hypervisor-based virtualization is the most commonly used one [22], but several different variants of it exists. Kernel-based virtualization employs specialized OS kernels, where the kernel runs a separate version of itself along with a virtual machine on the physical hardware. In practice one could say that the kernel acts as a hypervisor and it is usually a Linux kernel that uses this technique. Hardware virtualization does not rely on any software OS, but instead uses specialized hardware along with a special hypervisor to provide virtualization. The benefit of this is that the OS running inside the hypervisor does not have to be modified, which normal software hypervisor virtualization requires [22]. Technologies for providing hardware virtualization on the CPU’s (native virtualization) are based on the CPU developers such as Intel VT-x or AMD-V.
The operating systems that runs in the virtual environment are called machine images. These images can be be put to sleep and then stored on the hard drive with their current installation, configuration and even running processes hibernated. When requested, the images can then be restored to their running state to continue to finish what they did before hibernation. This allows dynamic activation and deactivation of resources.
3.1.1
Networking in virtual operating systems
With operating systems acting inside a hypervisor and not directly contacting the physical hardware the problem arises when there are several instances that wants to communicate on the network. They do not actually exists with a physical Network Interface Card (NIC) connected to them, so the hypervisor has to ensure that the right instance receives the correct network packages.
The way the networking is handled depends on the hypervisor. There are four techniques used to create virtual NICs [26]:
NAT Networking
NAT (Network Area Translation) is the same type of technique used in common home routers. It translates an external IP address to an internal one, which enables multiple internal IPs. The packets sent are recognized by the port they are sent to and from. The hypervisor provides the NAT translation and the VMs reside in a subnetwork with the hypervisor acting as the router.
Bridge Networking
Bridging the networking is basically connecting the virtual NIC with the physical hardware NIC. The hypervisor sets up the bridge and the virtual OS connects to it as it believes it to be a physical. The benefit of this is that the Virtual Machine will show up on the local network just as any other physical machine.
Host-only
Host-only networking is the local variant of networking. The hypervisor disables net-working to external machines outside of the VM which defeats the purpose of the VM in a cloud environment. This is used on local machines mostly.
Hybrid
These can connect to most of the other types of networking styles and in some ways can act as a bridge to a host-only VM.
Networking the virtual machines in a proper way is crucial when setting up a virtualized cloud. The virtual machines have to be able to connect to the cloud system network to provide resources.
3.2
Cloud Computing
Cloud computing is a type of distributed computing that provides elastic, dynamic process-ing power and storage when needed. At its essence it basically gives the user the computprocess-ing power when it needs it. The term cloud refers to typical visual representation of Internet in a diagram; a cloud. What cloud computing means is that there is a collection of computers that can give the customer/user the amount of computational power needed, without them having to worry about maintenance or hardware [20].
Typically a cloud is hosted on a server farm with a large amount of clustered computers. These provide the hardware resources. The cloud provider (the organization that hosts the servers) offers an interface for users to pay for a certain amount of processing power, storage or computers in a business model. These resources can then be increased or decreased based on demand, so the user only needs to focus on it’s contents whereas the provider takes care of maintenance, security and networking.
Figure 3.3: Simplified visualization of cloud computing.
The servers in the server farm are usually virtualized, although they are not required to be to actually be included in a cloud. Virtualization is a ground pillar in cloud computing; it enables the provider to maximize the processing power on the raw hardware and gives the cloud elasticity, the ability for users to scale the instances required. It also helps providing two other key features of a cloud: multitenacity, the sharing of resources, and massive scalability, the ability to have huge amounts of processing systems and storage areas (tens of thousands of systems with large amounts of terabytes or petabytes of data) [16].
There are three major types of services that can be provided from a cloud. These are usually different levels of access for the user, ranging from having control of just a few components to the operating system itself [16]:
Infrastructure as a Service (IaaS)
3.2. Cloud Computing 9
be on dedicated hardware (that is, not virtualized) where the user has to install what-ever they want on the system themselves. The user is given access to the operating system, or the ability to create their own through images that they create (typically in a virtualized environment). This is used when the user wants the raw processing power of alot of systems or needs a huge amount of storage.
Platform as a Service (PaaS)
PaaS does not give as much freedom as the IaaS, but instead focusing on having key applications already installed on the systems delivered. These are used to provide the user the systems needed in a quick and accessible way. The users can then modify the applications to their needs. An example of this would be a hosted Website; the tools for hosting the website (along with extra systems like databases, web service engine etc.) is installed and the user can create the page without having to think about networking or accessibility.
Software as a Service(SaaS)
SaaS is generally transparent for the user. It gives the user a software that has its process in a cloud. The user can only interact with the software itself and is more often unaware that it is being processed in a cloud. A quick example is the Google Docs, where users can edit documents that are hosted and processed in the Google cloud.
The cloud providers in the business model often uses web interfaces to enable users to increase or decrease the instances they use. They are then billed by the amount of space required or processing power used (depending on what type of service that is bought) in a pay-as-you-go system. This type is of the IaaS type, which for example Amazon, Proofpoint and Rightscale can provide [16]. However a cloud does not necessarily only exists on the Internet as business model of delivering computational power or storage. A cloud could either be public; which means that the machines delivered resides on the Internet, private; which means that the cluster is hosted locally or hybrid ; where the instances are local at start but can use public cloud services on-demand if the private cloud does not have sufficient power [20].
Cloud computing is also used in a lot more different systems. Google uses cloud com-puting to provide the backbone to their large systems such as Gmail, Google App Engine and Google Sites. Google App Engine provides a PaaS for the sole purpose of creating and hosting web sites. The Google Apps is a SaaS cloud that is a distributed system of handling office types of files; a clouded Microsoft Office [16].
3.2.1
Amazon’s public cloud service
Amazon was one of the first big companies moving into being a large cloud system provider [20]. Amazon is interesting in terms of being one of the first giants that provides an API to access data through their Amazon Web Service (AWS) and web pages. Since Eucalyptus uses the same API, albeit with a different and open source implementation, a closer look on Amazon and their service is interesting. Amazon provides several different services [20], but in terms of this thesis there are some of more interest:
Amazon Simple Storage Service (S3)
The Simple Storage Service is Amazon’s way of providing vast amounts of storage space to the user. A user can pay for the amount of space needed, from just a few gigabytes to several petabytes. Also, fees apply to the amount of data transfered to and from the storage. S3 uses ”buckets” which in layman terms can be seen as folders to store data within. These buckets are stored somewhere inside the cloud and are replicated on several devices to provide redundancy. Using standard protocols such as HTTP, SOAP, REST and even BitTorrent to transfer the data to the S3; the Simple Storage Service provides ease of access [3] to the user.
Amazon Elastic Compute Cloud (EC2)
The Elastic Compute Cloud is a way to provide a dynamic/elastic amount of com-putational power. Amazon gives the user the ability to pay for nodes. These nodes are virtualized computers that can take an Amazon Machine Image and use it as the image that runs on their virtualized environment (see section 3.1). The EC2 aims at supplying large amounts of CPU and RAM to the user, but it is up to the user to write and execute the applications to use the resources [2]. These virtualized computers, nodes, are contained inside a security group, or a virtual network consisting of all the EC2 nodes the user has payed for. During computation they can be linked together to provide a strong distributed computational base.
Amazon Elastic Block Store (EBS)
While the S3 is focused on storage it does not focus on speed and fast access to the data. When using the EC2 system the data that has to be processed must be stored in a fast-to-access way to avoid downtime for the EC2 system. The S3 does not provide that, so Amazon has created a way of attaching virtual, reliable and fast devices to be attached to the EC2. This is called the Elastic Block Storage, EBS. EBS differs to S3 in that the volumes cannot be as small or large as the S3 (1 GB - 1 TB on EBS compared to 1 B - 5 TB on S3 [1, 3]) but instead has faster read-write times and are easier to attach to EC2 instances. One EBS volume can only be attached to one EC2 instance at a time, but one EC2 instance can have several EBS volumes attached to it. The EBS also offers the ability to snapshot the EBS volume and store the snapshot on a different storage medium, for example the S3 [1].
As an example of using the Amazon EC2, the New York Times used EC2 and S3 in conjunction to convert 4 TB of articles from 1851 - 1980 (around 11 million articles) stored in TIFF images to PDF format. By using 100 Amazon EC2 nodes, the NY Times converted the TIFF-images to 1.5 TB of PDFs in less than 24 hours, a conversion that would take far greater time if done on a single computer [18]. On a side note, the NY Times also used Apache Hadoop installed on their AMIs to process the data (see section 3.4).
3.3
Software study - Eucalyptus
Eucalyptus is a free open-source cloud management system that is using the same API as the AWS are using. This enables tools that originally where developed for Amazon to be used with Eucalyptus, but with the added benefit of Eucalyptus being free and open-source. It provides the same functionality in terms of IaaS deployment and can be used as
3.3. Software study - Eucalyptus 11
a private, hybrid or even a public cloud system with enough hardware. Instances running inside Eucalyptus runs Eucalyptus Machine Images (EMI, cleverly name after AMI), which can either be created by the user or downloaded as pre-packaged version. An EMI can contain either a Windows, Linux or CentOS operating system [8]. At the time of writing Eucalyptus does not support Mac OS.
3.3.1
The different parts of Eucalyptus
Eucalyptus resides on the host operating systems of which is installed on. Since it uses libraries and hypervisors that are restricted to the Linux OS it cannot be run on other operating systems like Microsoft Windows or Apple OS. When Eucalyptus starts it contacts its’ different components to determine the layout and setup of the systems it controls. These components are configured using configuration files in each component. They all have different responsibilities and areas to create a complete system that can handle dynamic creation of virtualized instances, large storage environments and user access control.
Providing the same features as Amazon in terms of computation clouds and storage, the components inside Eucalyptus have different names but with equal functionality and API [8]:
– Walrus
Walrus is the name of the storage container system, similar to Amazon S3. It stores data in buckets and have the same API to read and write data in a redundant system. Eucalyptus offers a way to limit access and size of the storage buckets through the same means as S3, by enforcing user credentials and size limits. Walrus is written in Java, and is accessible through the same means as S3 (SOAP, REST or Web Browser).
– Cloud Controller
The Cloud Controller (CLC) is the Eucalyptus implementation of the Elastic Compute Cloud (EC2) that Amazon provides. The CLC is responsible for starting, stopping and controlling instances in the system, as this is providing the computational power (CPU & RAM) to the user. The CLC is indirectly contacting the hypervisors through Cluster Controllers (CC) and Node Controllers (NC). The CLC is written in Java.
– Storage Controller
This is the equivalent to the EBS found in Amazon. The Storage Controller (SC) is responsible for providing fast dynamic storage devices with low latency and variable storage size. It resides outside the virtual CLC-instances, but can communicate with them as external devices in a similar fashion of the EBS system. The SC is written in Java.
Beneath the Cluster Controller, on every physical machine lies the Node Controller (NC). Written in C, this component is in direct contact with the hypervisor. The CC and SC talks with the NC to determine the availability, access and need of the hypervisors. The CC and SC runs on a cluster-level, which means they only need one per cluster. Usually the SC and CC are deployed on the head-node of each cluster - that is a defined machine marked as being the ”leader” of the rest of the physical machines in the cluster - but if the cloud only
consists of one large cluster the CLC, SC, CC and Walrus all can reside on the head node, the front-end node.
Figure 3.4: Overview of the components in Eucalyptus on rack based servers.
All components communicate with each other over SOAP with WS-security [8]. To make up the entire system Eucalyptus has more parts which were mentioned in brief earlier. The Cluster Controller, written in C, is responsible for an entire physical cluster of machines to provide scheduling and network control of all the machines under the same physical switch/router. See figure 3.4. While the CLC is responsible for controlling most of the instances and their requests (creating, deleting, setting EMIs etc) it talks with both the CC and SC on the cluster level. Walrus on the other hand is only responsible for storage actions and thus only talks with the SC.
The front-end serves as the maintenance access point. If a user wants more instances or needs more storage allocated to them, the front-end has Walrus and the CLC ready to accept requests and propagate them to CCs and SCs. This provides the user the transparency of the Eucalyptus cloud. The user cannot tell where and how the storage is created, only that they actually received more storage space by requesting it from the front-end.
3.3.2
A quick look at the hypervisors in Eucalyptus
To be able to create and destroy virtualized instances on demand the Node Controller needs to talk with a hypervisor installed on the machine it is running on. Currently, Eucalyptus only supports the hypervisors Xen and KVM. To be able to communicate with them, Eucalyptus utilizes the libvirt virtualization API and virsh.
3.3. Software study - Eucalyptus 13
operating systems on it. This requires that the Host OS’s must be modified to do calls to the hypervisor instead of the actual hardware [22]. Xen can also support hardware virtualization but that requires specialized virtualization hardware. See section 3.1. The first guest operating system that Xen virtualizes is called dom0 (basically the first domain) and is automatically booted into when starting the computer. When Xen runs a virtual machine, the drivers are run in user-space, which means that every OS runs inside the memory of a user instead of the kernel’s memory space. Xen provides networking by bridging the NIC (see Section 3.1.1).
KVM, Kernel-based Virtual Machine, is a hypervisor built using the OS’s kernel. This means that KVM uses calls far deeper into the OS architecture, which in turn provide greater speed. KVM is very small and built into the Linux kernel but it cannot by itself provide CPU paravirtualization. To be able to do that it uses the QEMU CPU emulator. QEMU is in short an emulator designed to simulate different CPU’s through API calls. The usage of QEMU inside KVM means that KVM sometimes is referred to as qemu/KVM. When KVM runs inside the kernel-space, it uses calls through QEMU to interact with the user-space parts, like creating or destroying a virtual machine [22]. Like Xen, KVM also bridges the NIC to provide networking to the virtual machines.
When Eucalyptus creates EMIs to be used inside the cloud system, it requires images along with kernel and ramdisk pairs that works for the designated hypervisor. A RAM disk is not required in the beginning since the ramdisk image defines the state of the RAM memory in the virtual machine [22], but if the image has been installed and is running when put to sleep a ramdisk image should come with it (assume there were none, then when the virtual machine resumed all the RAM would be empty). Since the images might look different depending on which hypervisor created them; Xen and KVM cannot load each others images. This provides an interesting point to Eucalyptus: if in theory there was an image and ramdisk/kernel pair that would work on both the hypervisors, Eucalyptus could run physical machines that had either KVM or Xen installed on them and boot any Xen/KVM image without encountering any virtualization problems. With the current discrepancy, the machines in the cloud are forced to run a specific hypervisor so that the EMIs can be loaded across any Node Controller in the cloud.
3.3.3
The Metadata Service
Just like Amazon, Eucalyptus has a metadata service available for the virtual machines [8]. What the metadata does is supplying information to VMs about themselves. This is achieved by the VMs contacting the CLC with a HTTP request. The CLC checks from which VM the call is made and returns the requested information based on the VM which made the call. An example, if the CLC’s IP is 169.254.169.254 then a VM could make a request like:
http://169.254.169.254:8773/latest/meta-data/<name of metadata tag>
http://169.254.169.254:8773/latest/user-data/<name of user-defined metadata tag>
The metadata tag can be anything from standard default ones like the kernel-id, security-groups or the public hostname or it could be specific ones defined by the administration. This is a method of obtaining data to setup information when new instances are created or
Figure 3.5: Metadata request example in Eucalyptus.
destroyed. The metadata calls have the exact same callnames and structure like the AWS, so tools used inside the AWS system works with the Eucalyptus metadata.
3.3.4
Networking modes
With Eucalyptus installed on all the machines in the cluster(s), the different components call each other with SOAP commands on the physical NIC. However, when new instances are created they need to have their networking set up on the fly. Since the physical network might have other settings regarding how the NIC’s retrieve their IP’s Eucalyptus have different modes to give the virtual machines access to the network. The virtual machines communicate with each other using virtual subnets. These subnets must not in any way be in the same area as the physical net, used by the components of Eucalyptus (notice the difference between components like CLC, Walrus, NC etc and virtual machines). The CC has one connection towards the virtual subnet and another bridged to the physical network [8].
Networking modes inside the Eucalyptus cloud system differs in how much freedom and connectivity the instances have. Some modes adds features to the VM networks:
– Elastic IPs is in short a way to supply the user with a range of IPs that the VMs can use. These can then be used towards external, public, IPs and is ideal if the user needs a persistent Web server for example.
– Security groups is way of giving the user of a group of instances control what can be done or not in terms of network traffic. For example one security group can enforce that no ICMP calls is answered, or that one cannot make SSH connections.
3.3. Software study - Eucalyptus 15
– VM isolation prevents VMs from different security groups to contact each other if activated. When running a public cloud providing IaaS, this is almost a must-have.
The different modes gives different benefits and drawbacks and some is even required to use under certain circumstances. There are four different networking modes in Eucalyptus. In three modes, the front-end acts as the DHCP server distributing IPs to the virtual machines. The fourth mode require an external DHCP server to distribute IPs to virtual machines [8]. In all networking modes the VMs can be connected to from an external source if given a public IP and the security group allows it. The different modes are the following:
SYSTEM
The only networking mode that requires an external DHCP server to serve new VM instances with IPs. This mode requires little configuration since it does not limit internal interaction. It does however provide no extra things like security groups, elastic IPs or VM isolation. This mode is better if used when the cloud is private and there are few users that share the cloud.
STATIC
STATIC mode requires that the DHCP server on the network is either turned off or configured to not serve a specific IP-range that the VMs use. The front-end has to take care of the DHCP server towards the instances, but through a non-dynamic way by adding pairs of MAC-addresses and IPs to VMs. Just like SYSTEM, it does not provide the benefits that you normally associate with a public cloud like elastic IPs, VM isolation or security groups.
MANAGED
MANAGED mode gives the most freedom to the cloud. Advanced features like VM isolation, elastic IPs and security groups are available by creating virtual networks between the VMs.
MANAGED-NoVLAN
If the physical network relies on VLAN then normal MANAGED mode will not work (since several VLAN packets on top of each others will cause problems for the routing). In this mode most of the MANAGED mode features are still there except VM isolation.
When setting up the Eucalyptus networking mode one has to consider what type of cloud it is and what kind of routing setup is made on the physical network.
3.3.5
Accessing the system
When a user wants to create, remove or edit the instances they can either contact them directly through SSH (if they have public IPs) or they can control the instances by using Eucalyptus’ web interface. By logging in on the front-end with a username and password the user or admin can configure settings of the system. Also, tools developed for AWS can be used for this since Eucalyptus supports the same API calls [8].
Similar there is a tool called euca2ools for administration. It is a Command Line Interface tool that is used to manipulate the instances that a user has running. An admin
using euca2ools has more access than an ordinary user. Euca2ools is almost mandatory when working with Eucalyptus.
3.4
Software study - Hadoop MapReduce & HDFS
Hadoop is an open source software package from the Apache Foundation which contains different systems aimed at file storage, analysis and processing of large amounts of data ranging from only a few gigabytes to several hundreds or thousands of petabytes. Hadoop’s software is all written in Java, but the different parts are separate projects by themselves so bindings to other programming languages exists on per-project basis. The three major subprojects of Hadoop are the following [9]:
HDFS
The Hadoop Distributed File System, HDFS, is a specialized filesystem to store large amounts of data across a distributed system of computers with very high through-put and multiple replication on a cluster. It provides reliability between the different physical machines to support a base for very fast computations on a large dataset.
MapReduce
MapReduce is a programming idiom for analyzing and process extremely large datasets in a fast, scalable and distributed way. Originally conceived by Google as a way of handling the enormous amount of data produced by their search bots [23], it has been adapted in a way that it can run on a cluster of normal commodity machines.
Common
The Hadoop Common subproject provides interfaces and components built in Java to support distributed filesystems and I/O. This is more of a library that has all the features that HDFS and MapReduce uses to handle the distributed computation. It has the the code for persistent data structures and Java RPC that HDFS needs to store clustered data [23].
While these are the projects that Hadoop have as it’s major subprojects, there are several other that are related to the Hadoop package. These are generally projects related to distributed systems which either uses the major Hadoop subprojects or are related to them in some way:
Pig
Pig introduces a higher level data-flow language and framework when doing parallel computation. It can work in conjunction with MapReduce and HDFS and has an SQL-like syntax.
Hive
Hive is data warehouse infrastructure with a basic query language, Hive QL, which is based of SQL. Hive is designed to easily integrate and work together with the data storage of MapReduce jobs.
3.4. Software study - Hadoop MapReduce & HDFS 17
HBase
A distributed database designed to support large tables of data with a scalable infras-tructure on top of normal commodity hardware. It’s main usage is when to handle extremely large database tables, i.e. billions of rows on millions of columns.
Avro
By using Remote Procedure Calls (RPC), Avro provides a data serialization system to be used in distributed systems. Avro can be used when parts of a system needs to communicate through the network.
Chukwa
Built on top of HDFS and MapReduce, Chukwa is a monitoring system when a large distributed system needs to be monitored.
Mahout
A large machine learning library. It uses the MapReduce to provide scalability and handling of large datasets.
ZooKeeper
ZooKeeper is mainly a service for distributed systems control, monitoring and syn-chronization.
HDFS and MapReduce are intended to work on commodity hardware which is the oppo-site to specialized high-end server hardware designed for computational-heavy processing. The idea is to be able to use the Hadoop software on a cluster of not-that-high-end com-puters and still get a very good result in terms of throughput and reliability. An example taken from Hadoop - The Definitive Guide [24] of a commodity hardware:
Processor: 2 quad-core 2-2.5GHz CPUs Memory: 16-24 GB ECC RAM
Harddrive: 4 x 1TB SATA disks Network: Gigabit Ethernet
Since Hadoop is designed to use multiple large harddrives and multiple CPU cores, having more of them is almost always a benefit. The ECC RAM stands for Error Correction Code RAM and is almost a must have since Hadoop uses a lot of memory in processing and reportedly sees a lot of checksum errors on clusters without it [24]. Using Hadoop on a large cluster of racked physical machines in a two-level network architecture is a common setup.
3.4.1
HDFS
The Hadoop Distributed File System is designed to be a filesystem that gives a fast access rate and reliability for very large datasets. HDFS is basically a Java program that com-municates with other networked instances of HDFS through RPC to store blocks of data across a cluster. It is designed to work well with large file sizes (which can vary from just a hundreds of MBs to several PBs), but since it focuses more on delivering high amount of data between the physical machines it has a slower access rate and higher latency [23].
HDFS is split into three software parts. The NameNode is the ”master” of the filesys-tem that keeps track of where and how the files are stored in the filesysfilesys-tem. The DataNode is the ”slave” in the system and is controlled by the NameNode. There is also a Secondary NameNode which, contrary to what it’s name says, is not a replacement of the NameNode. The secondary NameNode is optional, which is explained why later in the section.
When HDFS stores files in it’s filesystem it splits the data into blocks. These blocks of raw data is of configurable size (defined in the NameNode configuration) but the default size is 64 MB. This is compared to a normal disk block which is 512 bytes [23]. When a datafile has been split up into blocks, the NameNode sends the blocks to the different DataNodes (other machines) where they are stored on disk. The same block can be sent to multiple DataNodes which will provide redundancy and higher throughput when another system requests access to the file.
The NameNode is responsible for keeping track of the location of the file among the DataNode, as well as the tree structure that the filesystem uses. The metadata about each file is also stored in the NameNode, like which original datafile it belongs to and it’s relation to other blocks. This data is stored on-disk in the NameNode in form of two files: the namespace image and the edit log. The exact block locations on the DataNodes is not stored in the namespace image, this is reconstructed on startup by communicating with the DataNodes and then only kept in memory [23].
Figure 3.6: The HDFS node structure.
Due to the NameNode keeping track of the metadata of the files and the tree structure of the file system it is also a single point of failure. If it breaks down the whole HDFS filesystem will be invalid since the DataNodes only stores the data on disk without any knowledge of the structure. Even the secondary NameNode cannot work without the NameNode, since the secondary NameNode is only responsible validating the namespace image of the NameNode. Due to the large data amounts that file metadata can provide the NameNode and secondary NameNodes should be different machines (and separated from the DataNodes) on a large system [23].
However, as of Hadoop 0.21.0 work has begun to remove the Secondary NameNode and replace it with a Checkpoint Node and Backup Node which are meant to keep track of the NameNode and keep an up-to-date copy of the NameNode. This will work as a backup
3.4. Software study - Hadoop MapReduce & HDFS 19
in case of a NameNode breakdown [11], lowering the risk of failure if the NameNode crashes.
By default, the NameNode replicates each block by a factor of three. That is, the NameNode tries to keep three copies of each block on different DataNodes at each time. This will provide both redundancy and more throughput for the client that uses the filesystem. To provide better redundancy and throughput HDFS is also rack-aware, that is it wants to know which rack each node resides in and how ”far” in terms of bandwidth each node is from each other. That way the NameNode can keep more copies of blocks on one rack for faster throughput, but additional copies on other racks for better redundancy.
Figure 3.7: The interaction between nodes when a file is read from HDFS.
DataNodes are more or less data-dummies that takes care of storing and sending file data to and from clients. When started they have the NameNode’s location defined as an URL in their configuration file. This is by default localhost, which needs to be changed as soon as there are more than one node in the cluster.
When a user wants to read a file it uses a Hadoop HDFS client that contacts the Na-meNode. The NameNode then fetches the block locations and return the locations to the client, forcing the client to do the reading and merging of blocks from the DataNodes. See Figure 3.7. Since HDFS requires a special client to interact with the filesystem it is not as easy as mounting a NFS (Network File System) and reading from it in an operating system. However, there are bindings to HTTP and FTP available and software like FUSE, Thrift or WebDAV can also work with HDFS [23]. Using FUSE on top of HDFS would mean that one can mount it as a normal Unix userspace drive.
3.4.2
MapReduce
MapReduce is a programming idiom/model for processing extremely large datasets using distributed computing on a computer cluster. It is invented and patented by Google. The word MapReduce derives from two typical functions used within functional programming, the Map and Reduce functions [7]. Hadoop has taken this framework and implemented it to be able to run it on top of a cluster of computers that are not high-end , similar to HDFS, through a license from Google. The purpose of Hadoop’s MapReduce is to be able to utilize the combined resources of a large cluster of commodity hardware. MapReduce relies on a
distributed file system, where HDFS is currently one of the few supported.
MapReduce phases
The MapReduce framework is split up into two major phases; the Map phase and Reduce phase. The entire framework is built around Key-Value pairs and the only thing that is communicated between the different parts of the framework are Key-Value pairs. The keys and values can be user-implemented, but they are required to be serialized since they are communicated across the network. Keys and values can range from simple primitive types to large data types. When implementing a MapReduce problem the problem has to be able to be split into n parts, where n is at least the amount of Hadoop nodes in the cluster.
It is important to understand that while the different phases in a MapReduce job can be regarded as sequential, they are in fact working in parallel as much as possible. The shuffle and reduce phases can start working as soon as one map task has completed and this is usually the case. Depending on work slots available across the cluster each job is divided as much as possible. The MapReduce framework is built around these components:
InputFormat
Reads file(s) on the DFS, tables from a DBMS or what the programmer wants it to read. This phase takes an input of some sorts and splits it into InputSplits.
InputSplits
An InputSplit is dependant on what the input data is. It is a subset of the data read and one InputSplit is sent to each Map task.
MAP
The Map phase takes a key-value pair generated through the InputSplit. Each node runs one map task and is run in parallel with each other. One Map task takes a key-value pair, process it and generates another key-value pair.
Combine
The optional combine phase is a local task run directly after each map task on each node. It does a mini-reduce by combining all keys that are the same generated from the current map task.
Shuffle
When the nodes have completed their map task it enters the shuffle phase, where data is communicated with each node. Key-value pairs are passed between the nodes to append, sort and partition it. This is the only phase where the nodes communicate with each other.
Shuffle - Append
Appending the data during the shuffle phase is generally just putting all the data together. Shuffle append is automatically done by the framework.
Shuffle - Sort
The sort phase is when the keys are sorted by either a default way or in a programmer-implemented way.
3.4. Software study - Hadoop MapReduce & HDFS 21
Shuffle - Partition
The partition phase is the last phase of the shuffle. This calculates how the combined data should be split out to the reduces. It can either be handled in a default way or programmer-implemented. It should generate an equal amount of data to each reducer for optimal performance.
REDUCE
Reduce is done by taking all the key-value pairs with the same key and performing some kind of reducing on the values. Each reduce takes a subset of all the key-value pairs, but will always have all the values to one key. For example, the (Foo, Bar) and (Foo, Bear) will go to the same reducer as (Foo, [Bar, Bear]). If a reducer has one key, no other reducer will receive that key.
Output
Each reducer generates one output to a storage. The output can be controlled through subclassing OutputFormat. By default the output is generating part-files for each re-ducer in the form of files named part-r-00000, part-r-00001 etc. This can be controlled through an implementation of OutputFormat.
Although each inputsplit is sent to one Map task, the programmer can tell the Input-Format (through a RecordReader) to read across the boundaries of the split given. This enables the InputFormat to read a subset of data without having to combine them from two or more maps. When one inputsplit has been read across it’s boundaries, the latter split will begin after where the former stopped. The size of the inputsplit given to a map task is most oftenly dependant on the size of the data and the size of a HDFS block. Since Hadoop MapReduce is optimized for - and most oftenly runs on - HDFS, the block size of HDFS most often dictates the size of the split if it is read from a file.
Figure 3.8: The MapReduce phases in Hadoop MapReduce.
The key-values given to the mapper does not always require the keys to be meaningful. The values can be the only interesting for the mapper, outputting a different key and value
after computation. The general flow of key-value pairs in the MapReduce framework is the following:
map(K1, V 1) → list(K2, V 2) reduce(K2, list(V 2)) → list(V 3)
However, when implementing MapReduce the framework takes care of generating the lists and combining the values to one key. The programmer only needs to focus on what one map or reduce task does and the framework will apply it n times up to until the job is mapped. In terms of the Hadoop framework it can be regarded as:
framework in(data → (K1, V 1)) map(K1, V 1) → (K2, V 2)
framework shuffle((K2, V 2) → (K2, list(V 2))) reduce(K2, list(V 2)) → (K3, V 3)
framework out((K3, V 3) → (K3, list(V 3)))
When starting a Hadoop MapReduce cluster it requires a master that contains the HDFS NameNode and the JobTracker. The JobTracker is the scheduler and input of a MapReduce job. The JobTracker communicates with TaskTrackers that runs on other nodes in the cluster. A node that only contains a JobTracker and HDFS DataNode is generally known as a slave. TaskTrackers periodically pings the JobTracker and checks whether a free task to work on is ready or not. If the JobTracker has a task ready it is sent to the TaskTracker which performs it. Generally on a large cluster the master is separate from the slaves, but on smaller clusters the master also runs a slave.
Chapter 4
Accomplishment
This chapter describes how the configuration of Eucalyptus and Hadoop was done. It describes one way to set up an Eucalyptus cloud and one way to run Hadoop MapReduce in it. While it describes one way, it should not be regarded as the definite way to do it. Eucalyptus can run several different networking modes on top of different OS’s which means that the following configurations are not the only solution.
4.1
Preliminaries
The hardware available for this thesis were nine rack servers running Debian 5, kernel version 2.6.26, connected through one gigabit switch with virtual network support. Of these, one was the designated DHCP server of the subnet, only serving specific MAC addresses a specific IP address and not giving out any IP to an unknown MAC address. This server also had a shared NFS /home for the other eight servers in the subnet. Due to the other eight server being dependant on this server, it was ruled out of the Eucalyptus setup. Out of the eight available, four supported the Xen hypervisor and four supported the KVM hypervisor. This is the hardware settings of the servers.
CPU
AMD Opteron 246 Quad-core on test01-04 AMD Opteron 2346 HE Quad-core on test05-08 RAM 2 GB on test01-04 4 GB on test05-08 HDD 27 GB on /home 145 GB/server
Initially Eucalyptus version 1.5.1 was chosen for this thesis, but due to problematic configurations of the infrastructure a later version was used in the final testing; 2.0.2. See
section 4.2.1 for further explanation. For Hadoop MapReduce the latest version, 0.21.0, was chosen due to the fact that in this release a major API change has been made. 0.21.0 also has a large number of bug fixes implemented into it compared to the earlier versions. None of the servers had any Hadoop or Eucalyptus previously installed on them.
4.2
Setup, configuration and usage
The following sections contains a description of how the Eucalyptus and Hadoop MapReduce software were setup. It is divided into three different subsection; the first focuses on how to configure Eucalyptus on the servers available, the second on how to create an image with Hadoop suitable for the environment and finally the MapReduce implementation along with how to get it running. Installing, configuring and running Eucalyptus requires a user with root access to the systems of which it is installed on.
The configuration is based on a Debian host OS, as this was the system it was run on. This means that some packages or commands either does not exists or have a different command structure on other OS’s like CentOS or RedHat Linux.
4.2.1
Setting up Eucalyptus
Compared to Xen, KVM works more ”out of the box” since it is tightly configurated with the native Linux kernel. The choice of hypervisor was then to use KVM to avoid any problems that might occur between the host OS and the hypervisor. This meant that four out of eight servers could not be used as servers inside the cloud infrastructure. Eucalyptus can probably be set up to use both Xen and KVM if it loads an image adapted to the correct hypervisor, but that is out of scope of this thesis.
Installing Eucalyptus can be done using the package manager of the OS. In Debian, apt-get can be used once the repository of Eucalyptus has been added to /etc/apt/sources.list. Depending on the version used the repository location is different. To add Eucalyptus 2.0.2, edit the sources.list file and add the following line:
deb http://eucalyptussoftware.com/downloads/repo/eucalyptus/2.0.2/debian/ squeeze main
Calling apt-get install eucalyptus-nc eucalyptus-cc eucalyptus-cloud eucalyptus-sc will in-stall all the parts of Eucalyptus on the server it was called on. Starting, stopping and rebooting Eucalyptus services is done through the init.d/eucalyptus-* scripts.
The physical network setup is to have one server act as the front-end that has all the cloud, cluster and storage controllers running on it. The three other servers only run virtual machines and talk to the front-end. The FE does not run any instances at all to avoid any issues with networking and resources. Public IPs can be ”booked” in the CLC configuration file and for the test environment the IP ranges from *.*.*.193-225 were set as available for Eucalyptus. This needs to be communicated with the network admin, as the Eucalyptus software must assume that these are IP addresses are free and no one else are using them.
4.2. Setup, configuration and usage 25
Figure 4.1: Eucalyptus network layout on the test servers along with what services were running on each server.
In terms of the networking, different modes has different benefits. However, due to the configuration of the subnet DHCP server the SYSTEM mode is not an option. The STATIC configuration simplifies setting up new VMs but it prevents benefits like VM isolation and especially the metadata service. So the choice is to use either the MANAGED or the MANAGED-NOVLAN mode. By setting up a virtual network between two of the server one can verify whether the network is able to use virtual LANs. This is documented on the Eucalyptus network configuration page [8]. The network verifies as VLAN clean, that is it is able to run VLANs.
Most of the problems encountered when setting up an Eucalyptus infrastructure is re-lated to the network. The networking inside the infrastructure consists of several layers, and with the error logs available (found in the /var/log/eucalyptus/*.log files) there is a lot of information to search through when searching for errors. The configuration file that Eucalyptus uses, /etc/eucalyptus/eucalyptus.conf, has a configuration setting that sets the verbosity level of the logging. When installing, at least INFO setting is found to be recom-mended, while DEBUG can be set when there are errors that are hard to find the source of.
When working with the Eucalyptus configuration file, it is important to note that the different subsystems (NC, CC, CLC, SC and Walrus) uses the same configuration file but different parts of it. As an example, changing the VIRTIO * settings on the Cloud controller has no effect, as it is a setting that only the Node controller uses. This might cause confusion in the beginning of the configuration, but the file itself is by default very well documented.
When setting up the initial Eucalyptus version - Eucalyptus 1.5.1 - problems occur with compability of newer libraries. Since 1.5.1 uses libraries that are relatively new compared to the Eucalyptus itself, it attempts to load libraries that has changed the names and/or locations. The 1.5.1 version does for example attempt to load the Groovy library through an import call that points to a non-existent location. To remedy these library problems 1.6.2 was selected as the next Eucalyptus version to try. This demanded a kernel and distribution upgrade from Debian 5 to 6.
Eucalyptus 1.6.2 has a better support for KVM networking and calls to newer libraries so it should work better than 1.5.1. 1.6.2 does not run without any problems though. Booting the NCs on the three node server (test06-08) as well as the Cluster controller on the front-end (test05) goes without any problems but the cloud controller silently dies a
few seconds after launching it. This is an error that cannot be found in the logs since it is output directly to stderr from the daemon, which itself is directed to /dev/null. By running the CLC directly from /usr/sbin/eucalyptus-cloud instead of /etc/init.d/eucalyptus-clc one can see that 1.6.2 still has dependency problems with newer Hibernate and Groovy libraries. This can be solved by downgrading the libraries to earlier version, however this can cause compability issues with other software running on the servers.
To prevent compability issues the latest version, 2.0.2, was installed on the four servers used. This proved to be a working concept. All the Eucalyptus services are running, but they are not connected to each other. The CLC is not aware of any SC, Walrus or CC running, and the CC is not aware of any nodes available. Since Eucalyptus is a web service that runs on Apache Web Server one can verify that the services are running by calling
ps auxw | grep euca
to determine that the correct daemons are running. This should show an httpd-daemon running under the eucalyptus user.
There are two ways of connecting the different parts of the system. Either the correct settings can be set in the configuration file at the CLC and then rebooting it or running the euca conf CLI program. What euca conf does is actually change the configuration file and reboot a specific part of the CLC. The cloud controller can then connect to the NCs through REST calls (which can be seen by reading the /var/log/eucalyptus/cc.log file at the front-end). This means that the Eucalyptus infrastructure is working in one term, but the virtual machines themselves can still be erroneously configured.
Network configuration
4.2. Setup, configuration and usage 27
Configuring the right network settings for the VMs that run inside the cloud is a some-what trial and error procedure. Even though there are documentation on network setup, it does not explain why the setup should be in the documented way. It also assumes that the cloud has a specific setup regarding the physical network. In a perfect environment the front-end should have two NIC’s, with one NIC connected to the external network and one to the internal. See Figure 4.2.
The front-end will act as a virtual switch in any case which means that in a single-NIC layout the traffic passes through the physical layer to the front-end, where it is switched and then passed through the same network again as shown in Figure 4.3. This means that on the NCs the Private and public interface settings are in fact the same NIC, with a bridge specified so that virsh knows where to attach new VMs. On the front-end there is no virtual machines but only a connection from the outside network (the NIC, eth0) and a connection to the internal network (here a bridge).
Figure 4.3: Network traffic in the test environment. Dotted lines indicate a connection on the same physical line.
While the documentation specifies that MANAGED should work even in the non-optimal layout the physical network works on, none of the different combinations of settings on the NC and FE could make the VMs to properly connect to the ”outside” network. An indication of a faulty network configuration is by checking the public IP address of the newly created instance through the euca2ools or another tool like hybridfox, or read through the logs. If the public IP shows 0.0.0.0 it is generally an indication of a faulty network setting. To properly configure the network for the VMs, the front-end and node controllers needs to have configured settings in the configuration file that match each other. Table 4.1 shows variables shared on the NC and front-end found in the configuration file, with their setting in the test environment that provides a working network.
Variable Front-end Value NC Value Comment
VNET MODE MANAGED-NOVLAN MANAGED-NOVLAN Network environment. Same on NC and FE. VNET PUBINTERFACE eth0 eth0 The public interface to
use.
VNET PRIVINTERFACE br0 eth0 The private interface to the nodes.
VNET BRIDGE - br0 Bridge to connect the VMs to.
Table 4.1: Configuration variables and their corresponding values in the environment.
When a VM is booted, the VMs network (which itself is a virtual bridged NIC) attaches to the bridge specified in the configuration file. This requires that the node controller and the frond end has created a bridge to the physical network.
When running on Ubuntu or Debian as host OS, disabling the package NetworkManager is required since it can - and most likely will - cause problems. NetworkManager can be scheduled to check the network to a certain setting and reboot it if it is configured in another way. This problem was identified at least twice as a root to networking issues during the course of this thesis.
Node Controller configuration
The Node controllers are responsible for communicating with the hypervisor on each ma-chine. In Eucalyptus, this is done by talking to virsh, which is a virtualization binary for Linux. Virsh then takes care of telling KVM in this case which image/ramdisk/kernel to load and a few different settings. Most of these can be left untouched, but there are a few things that needs to be changed according to Eucalyptus settings. Virsh is made to run stand-alone, and in order for Eucalyptus to run virsh it has to be set as the user who runs it. These settings is specified in the Eucalyptus administration handbook, see the references [8].
If the VMs fail to receive an IP address one can specify in the virsh settings to use VNC, which is a remote desktop system. Using VNC can be regarded as a security issue and should only be used as a means to troubleshoot. VNC can be added to all the VMs launched in the NC where the changes are made. This requires an image to be used where a username/password pair already exists on the image, compared to the root access that is given through keypairs in Eucalyptus default environment. VNC was used in the initial troubleshooting of the networking by editing the /usr/share/eucalyptus/gen kvm libvirt xml file. In the <devices> element, adding this gives VNC accessibility:
<graphics type=’vnc’ port=’-1’ autoport=’yes’ keymap=’en-us’ listen=’0.0.0.0’/>
Node controllers running MANAGED-NOVLAN mode requires the NIC to be bridged. This can be done using the linux package bridge-utils with a command named brctl. When brctl is installed bridges can be added or removed by editing the network configuration file, /etc/network/interfaces, and rebooting the networking daemon through the command /etc/init.d/networking restart. The NCs all have the same networking configuration on the test system which is:
iface eth0 inet dhcp auto lo
iface lo inet loopback auto br0
iface br0 inet dhcp bridge_ports eth0
This configuration means that the eth0 is not activated but configured by default, whilst the loopback address (127.0.0.1 or localhost) is always on. The bridge is attached to the eth0 interface where it is automatically configured and run.
4.2. Setup, configuration and usage 29
In the Eucalyptus configuration file most of the settings can be left to the default values as the configuration file is mostly set to run ”out-of-the-box”. Changing the NC PORT con-figuration can be useful when there are other services running on the server, but it requires changing the settings on the front-end too, as this is the port where the different Eucalyp-tus parts talks to each other. The only things that the NC needs to be set are VNET MODE, VNET PUBINTERFACE, VNET PRIVINTERFACE, VNET BRIDGE and HYPERVISOR (either ”xen” or ”kvm”, where kvm was used in this thesis).
Front-end configuration
Front-end refers to the Cloud controller, which in this case also is the Walrus, Storage Controller and Cluster controller. On larger clusters the SC, CC and Walrus should run on other servers to minimize load. Cluster controllers also specify what is referred to as Availability Zones in AWS terms.
Configuring the front-end is done by editing the eucalyptus.conf file. The changes re-quired to be made here is larger than on the NCs, as in this file the networking mode, subsystem locations and external software are specified. Some of the changes in the config-uration file can be done using the euca conf script to administrate the system, but editing the configuration file is suggested to ensure that all the changes are exactly as the ones needed. In the test environment the following settings are changed from the default (see 4.1 for VNET-modes of the front-end):
DISABLE DNS
Enabling DNS allows the VMs to use Fully Qualified Domain Names. This requires that the network admin adds the front-end address to the public nameserver as the nameserver of Eucalyptus VMs. This is further described in the handbook [8]. Set to N in the test environment as we want to use FQDNs due to Hadoop require-ments.
SCHEDPOLICY
The policy on how the front-end chooses which NC to place a new VM on. ROUNDROBIN is the default which means that the front-end cycles through each NC to add a new VM. This gives an even distribution of VMs.
Set to ROUNDROBIN.
VNET DHCPDAEMON
Specifies the location of the binary to the DHCP server daemon. The front-end will run this on the internal net to distribute IP-addresses to the VMs.
Set to /usr/sbin/dhcpd3.
VNET DHCPUSER
Sets the user to run the dhcp server. Can be either dhcpd or root. Disabled on the test environment, which defaults to root.
VNET SUBNET
Specifies which internal IP range the VMs should use inside the internal network. These cannot be accessed from outside. The IP range should be something that does not exists on the current subnet.
VNET NETMASK
In conjunction with the subnet setting, the netmask specifies which addresses can be given out of the subnet.
A netmask of 255.255.0.0 which is the test setting, will enable VMs to receive internal addresses from 10.10.1.2 - 10.10.255.255. The 10.10.1.1 is the address of the DHCP server in the internal network (basically the front-end).
VNET DNS
Address to the external DNS. When a properly configured image boots, it will fetch this address from the meta-data service and add it to its /etc/hosts file.
VNET ADDRSPERNET
Determines how many addresses can be set per virtual subnet. Recommended is either 32 or 64 on large clouds. It is quite irrelevant on the test environment as there is not enough physical hardware to support that many VMs.
Set to 32.
VNET PUBLICIPS
This can be either a range or a comma-separated list of IP addresses. These are the addresses on the public external net that is free to use.
Set to 130.239.48.193-130.239.48.225 on the test environment.
As the front-end takes care of distributing IP addresses to new VMs in all modes except SYSTEM, the front-end needs to have it’s internal DHCP server running. The DHCP daemon is an external binary that has to be set in the configuration file. This binary will only run when there are VMs in the cloud running, and it will only listen to, and answer calls from, VMs. An issue with Eucalyptus 2.0.2 is that it is programmed to work towards the dhcpd3 daemon, whereas the latest daemon which replaces dhcpd3 in the Debian repo is the isc-dhcp-server. The isc server has an incompatible API which renders it unusable to Eucalyptus 2.0.2. This means that to run the Eucalyptus DHCP service the system needs the older dhcpd3 server installed and not the isc-server. This can be done by forcing apt-get to install the older version of dhcpd3.
Accessing and managing a Virtual Machine
To launch a virtual machine inside the Eucalyptus cloud the user requires keypairs that are created and retrieved through the web interface of the Eucalyptus. The default user, the admin, has privileges to run, add, remove or edit any images inside the cloud, while an ordinary user cannot do as much. In the test environment the admin privileges was used so that images could easily be updated into the cloud.
The first thing required is to generate a keypair that is bound to the user (admin in this case). This keypair is used in all the managing of both the Eucalyptus features and instances. By logging into https://<front-end>:8443/#login and choosing Credentials one can download the required keypair files. Extracting and using source eucarc in a CLI gives the current command interface access to euca2ools credentials, which means CLI interaction with Eucalyptus features. The keypair, which is a file ending with .private, is required to use SSH into an image.