Temperature-Aware Virtual Machine Scheduling in
Green Clouds
Thesis submitted in partial fulfillment of the requirements for the award of degree
of
Masters of Technology
in
Computer Science and Applications
Submitted By
Vikas Kumar
(Roll No. 601003029)
Under the supervision of:
Dr. Rajesh Kumar
Associate Professor
School of Mathematics and Computer Applications
Thapar University,
Patiala – 147004.
June 2012
Abstract
Rapid growth in the demand for computational power by business, scientific and web applications has led to the creation of large-scale data centers that consume enormous amounts of electrical power, which increases both electricity bills and heat dissipation. The increased power consumption and performance of a data center raise the operating temperature of the computing facility.
High temperature gradients degrade reliability and performance; vigorous cooling is therefore required to keep the equipment and the software stable. Moreover, high energy consumption not only increases operational cost, which reduces the profit margin of Cloud providers, but also leads to high carbon emissions, which harm the environment. Hence, energy-efficient solutions are required to minimize the impact of cloud computing on both the cost of operation and the environment.
In this thesis, a new approach for scheduling virtual machines (VMs) in a Cloud environment is presented that provides efficient green enhancements within a scalable Cloud Computing architecture. The proposed approach, “Temperature-Aware Virtual Machine Scheduling in Green Clouds”, aims to keep the temperature of a virtualized Cloud system below a critical temperature threshold by scheduling VMs according to the temperature of each node, while ensuring reliable quality of service (QoS). Thus, apart from saving energy and money by avoiding huge investment in cooling, it also reduces the carbon footprint. To demonstrate the feasibility of our approach from a performance perspective, quantitative results are also presented.
Contents
Certificate
Acknowledgement
Abstract
Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Cloud Computing
1.2 Service Models
1.2.1 Infrastructure as a Service (IaaS)
1.2.2 Platform as a Service (PaaS)
1.2.3 Software as a Service (SaaS)
1.3 Deployment Models
1.3.1 Private Cloud
1.3.2 Public Cloud
1.3.3 Community Cloud
1.3.4 Hybrid Cloud
1.4 Various Cloud Providers
1.5 Benefits of Cloud Computing
1.6 Impediments to Cloud Adoption
1.7 Green Computing
1.8 Organization of thesis
Chapter 2 Literature Survey
2.1 Virtualization
2.1.1 Paravirtualization
2.1.2 Full virtualization
2.1.3 Virtualization Solution Providers
2.2 Dynamic Voltage and Frequency Scaling
2.3 Contemporary Approaches for Energy Efficiency in Cloud Data Centers
2.4 Novel Approaches for Energy Saving in Network Infrastructure
2.5 Power efficient software
2.5.1 Power Efficient Software Principles
Chapter 3 Problem Statement
3.1 Gap Analysis
3.2 Objectives of Thesis
Chapter 4 Temperature-Aware Virtual Machine Scheduling in Green Clouds
4.1 Architecture
4.2 Working of Proposed Scheduling Technique
4.2.1 Migration of VMs
4.3 Algorithm for VM Migration
4.4 Flow Chart for VM Migration
4.5 Algorithm for Allocation of Node to VM Request
4.6 Flow Chart for Allocation of Node to VM Request
Chapter 5 Experimental Results
5.1 Implementation
5.1.1 Platform Setup
5.1.2 Paravirtualized Cloud Environment Setup
5.1.3 Monitoring the Temperature of Node
5.2 Evaluation
5.2.1 Test Case 1
5.2.2 Test Case 2
5.2.3 Test Case 3
Chapter 6 Conclusions and Future Scope
6.1 Conclusion
6.2 Future Scope
List of Figures
Fig. 1.1 Evolution of Cloud Computing from Mainframe
Fig. 1.2 Hosting, Outsourcing and Cloud Computing
Fig. 1.3 Cloud Computing service models
Fig. 1.4 Private Cloud deployment models
Fig. 1.5 Public Cloud deployment models
Fig. 1.6 Community Cloud Computing delivery model
Fig. 1.7 Hybrid Cloud Computing delivery model
Fig. 2.1 Paravirtualization
Fig. 2.2 Full Virtualization
List of Tables
Table 5.1 Platform Configurations
Table 5.2 Global List
Table 5.3 Local List of node 1
Table 5.4 Local List of node 2
Table 5.5 Global List
Table 5.6 Local List of node 1
Table 5.7 Local List of node 2
Table 5.8 Global List
Table 5.9 Local List of node 1
Table 5.10 Local List of node 2
Table 5.11 Global List
Table 5.12 Local List of node 1
Table 5.13 Local List of node 2
Table 5.14 Global List
Table 5.15 Local List of node 1
Table 5.16 Local List of node 2
Table 5.17 Global List
Table 5.18 Local List of node 1
Chapter 1
Introduction
The underlying concept of Cloud Computing dates back to the mainframe days of the 1960s, when the idea of utility computing was proposed by MIT computer scientist John McCarthy. He wrote that “computation may someday be organized as a public utility” [1]. Utility computing became a business for companies such as IBM and Oracle, which saw the potential for enormous profit in this type of business and started providing computing services. Fig. 1.1 shows the evolution of Cloud Computing from the mainframe.
Fig. 1.1. Evolution of Cloud Computing from Mainframe
Grid Computing developed from the idea of linking a number of computers to increase scalability and availability. The Grid specifically refers to leveraging computers for a particular application, whereas Cloud Computing leverages multiple resources along with computational resources to provide services to end users.
1.1 Cloud Computing
The NIST definition of Cloud Computing is: “A model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction” [2]. Cloud Computing is the delivery of computing as a service rather than as a product, whereby shared resources, software, and information are provided to computers and other devices as a utility (like the electricity grid) over a network (typically the Internet).
The primary idea of Cloud Computing is that organizations no longer manage and own their IT infrastructure, but have it delivered as a service by a Cloud Service Provider (CSP). In recent years there has been a trend to outsource more and more IT services to external parties. It is difficult to make a clear distinction between shared service centers (SSC), hosting, outsourcing and Cloud Computing. Fig. 1.2 shows the difference between these terms based on three aspects: delivery of service, management of IT resources and ownership of assets. The further to the right these aspects can be plotted on the arrows, the more the arrangement can be regarded as Cloud Computing.
To describe Cloud Computing and its fundamental difference from traditional IT or outsourcing, the following characteristics [2] can be used:
Resource Pooling: Contrary to traditional IT, resources are shared by multiple customers (multi-tenancy).
Rapid Elasticity: Cloud services can be easily scaled up and down according to the demands of the customer.
Measured Service: Customers only pay for the service they use ('pay-as-you-use' or by subscription) instead of paying for long-term licenses and/or investing in hardware that is not related to actual usage.
Broad Network Access: Although leased lines and exclusive networks can be used for Cloud Computing, its primary infrastructure is the public internet.
On-Demand Self-service: In contrast to the vast majority of traditional IT, Cloud services can be used almost instantly.
An easy-to-understand example of Cloud Computing is e-mail. In the traditional IT model, organizations had their own e-mail servers, which were managed by the company's IT administrators. The e-mail was only available within the office, and the IT administrators had to manage and back up the e-mail for the whole organization. When a server reached its full capacity, the administrators had to deploy extra servers. With Cloud Computing, organizations buy mail as a service from a CSP, e.g. Gmail or Microsoft Office 365. The CSP stores the e-mails somewhere on its servers, manages the backups, and delivers nearly 100% availability from anywhere in the world. When a mailbox is full, the CSP offers an easy and cheap way to buy extra storage space. The organization only pays for the amount of service it uses.
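The difference between paying for actual usage and investing in capacity up front can be illustrated with a small calculation. The following is only a sketch: the monthly storage figures, the unit price and the function names are hypothetical and do not come from any provider's price list.

```python
def metered_cost(usage_gb_per_month, price_per_gb):
    """Total cost when each month is billed for the storage actually used."""
    return sum(gb * price_per_gb for gb in usage_gb_per_month)


def provisioned_cost(usage_gb_per_month, price_per_gb):
    """Total cost when capacity for the peak month is paid for every month."""
    peak = max(usage_gb_per_month)
    return peak * price_per_gb * len(usage_gb_per_month)


# Hypothetical mailbox storage (GB) over six months and an illustrative price.
usage = [10, 12, 15, 30, 18, 14]
price = 2  # currency units per GB per month (assumed)

print("pay-as-you-use:", metered_cost(usage, price))
print("peak-provisioned:", provisioned_cost(usage, price))
```

Under these assumed numbers the metered total is lower because capacity sized for the single peak month sits idle during the other months.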
1.2 Service Models
To be able to talk more specifically about the services, Cloud Computing can be split into three service models: Software, Platform and Infrastructure as a Service [2]. These service models depict the degree of service or control that the Cloud Service Provider (CSP) has over the services and the degree of freedom possessed by the customer. Fig. 1.3 gives a graphical representation of the different service models and their components. The blue blocks are managed by the customer; grey blocks are delivered as a service by the CSP.
Fig. 1.3. Cloud Computing service models [2]
1.2.1 Infrastructure as a Service (IaaS)
Using Infrastructure as a Service (IaaS), the customer buys infrastructure services from a CSP, but manages the layers on top of the infrastructure itself. In this service model, the CSP offers processing power, storage, networks, and other fundamental computing resources. The consumer is able to deploy and run operating systems and applications on it. The consumer does not manage or control the underlying Cloud infrastructure but has control over operating systems, storage, deployed applications, and possibly limited control of select networking components such as firewalls [4]. Examples of IaaS are Amazon Elastic Compute Cloud and Terremark Enterprise Cloud.
1.2.2 Platform as a Service (PaaS)
In the Platform as a Service (PaaS) model, the CSP offers a development platform on top of the services delivered with IaaS (described earlier). The consumer is able to deploy applications onto the Cloud infrastructure created using programming languages and tools supported by the Cloud provider. The consumer does not manage or control the underlying Cloud infrastructure but has control over the deployed applications [4]. Examples of PaaS platforms are Amazon Elastic Beanstalk, Microsoft Azure Platform, Force.com and Google App Engine.
1.2.3 Software as a Service (SaaS)
In the Software as a Service (SaaS) model, the CSP offers software as a service, including the applications. The applications are accessible from various client devices through a thin client interface such as a web browser. The consumer does not manage or control the underlying Cloud infrastructure, but may be able to set limited user-specific application configuration settings [4]. Examples of common SaaS applications are Gmail, Office 365 and SalesForce.com.
With SaaS, the customer takes the full application services from the CSP. The customer IT department does not have to install or deploy any software; the application can be used via the internet. The customer IT department (or business analysts) can configure the application to the customer's needs, but only within the boundaries offered by the CSP. The customer only pays for the capacity used; this can depend on, for example, the number of users and/or premium options in the software.
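The split of responsibility across the three service models described in this section can be summarized as a small lookup. This is a minimal sketch: the layer names in `LAYERS` are a simplifying assumption and not the exact blocks of Fig. 1.3, while the boundaries follow the descriptions above (the IaaS consumer controls the operating system and everything above it, the PaaS consumer controls only the deployed applications, and the SaaS consumer controls none of the stack).

```python
# Simplified stack, ordered from the bottom up (layer names are assumed).
LAYERS = ["hardware", "virtualization", "operating system", "runtime", "application"]

# Lowest layer that the customer manages in each service model.
FIRST_CUSTOMER_LAYER = {
    "IaaS": "operating system",
    "PaaS": "application",
    "SaaS": None,  # the CSP delivers the whole stack
}


def customer_managed(model):
    """Return the layers the customer manages under the given service model."""
    first = FIRST_CUSTOMER_LAYER[model]
    if first is None:
        return []
    return LAYERS[LAYERS.index(first):]


def csp_managed(model):
    """Return the layers delivered as a service by the CSP."""
    managed_by_customer = customer_managed(model)
    return [layer for layer in LAYERS if layer not in managed_by_customer]


for model in ("IaaS", "PaaS", "SaaS"):
    print(model, "customer:", customer_managed(model), "CSP:", csp_managed(model))
```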
1.3 Deployment Models
Services of Cloud Computing can be delivered with any of four deployment models: private, public, hybrid or community. These deployment models describe who owns, manages and is responsible for the services.
1.3.1 Private Cloud
In a Private Cloud, the services are completely dedicated to the particular customer; resources are not shared with other customers. The Cloud infrastructure is operated solely for an organization. It may be managed by the organization or a third party and may exist on premise or off premise [5].
Fig. 1.4 shows that the Private Cloud is only used by one customer; resources are not shared with other customers. The Cloud service may be offered by the customer's IT department itself, or by an external CSP. The Dutch government is an example of an organization which is building its own internal Private Cloud.
Fig. 1.4. Private Cloud Computing delivery model [3]
1.3.2 Public Cloud
In a Public Cloud, the delivered services are shared with other customers. The Cloud infrastructure is made available to the general public and is owned by a provider selling Cloud services [5].
Fig. 1.5 shows that in the Public Cloud, resources are shared with multiple customers, which may operate in different market segments, and may have different security demands. Public Clouds offer most of the Cloud advantages, as the CSP can optimally utilize the resources by sharing them among multiple customers.
1.3.3 Community Cloud
The Community Cloud combines aspects of the Private Cloud and the Public Cloud: resources are shared, but only with other customers that have the same requirements. The Cloud infrastructure is shared by several organizations and supports a specific community that has shared concerns. It may be managed by the organizations or a third party and may exist on premise or off premise [5].
Fig. 1.6 shows an example of a community Cloud, which is in this case used for a government community. The users of this community Cloud (government agencies; all purple blocks in the diagram) have the same demands and security requirements for their IT. Google offers such a government Cloud with the Google Gov Cloud.
Fig. 1.6. Community Cloud Computing delivery model [3]
1.3.4 Hybrid Cloud
A Hybrid Cloud combines multiple deployment models. Hybrid cloud is a composition of two or more clouds (private, community, or public) that remain unique entities but are bound together, offering the benefits of multiple deployment models. It can also be defined as multiple cloud systems that are connected in a way that allows programs and data to be moved easily from one deployment system to another [5].
Fig. 1.7 gives a graphical representation of a Hybrid Cloud consisting of a Public Cloud and a Private Cloud.
Fig. 1.7. Hybrid Cloud Computing delivery model [3]
1.4 Various Cloud Providers
This section presents a brief overview of some of the well-known Cloud providers in the market, distinguishing them on the basis of the services they offer and the way they manage the resources they provide.
Amazon Web Services (AWS)
AWS [6] refers to the services offered by Amazon to cover the entire service spectrum, i.e. IaaS, SaaS and PaaS. AWS includes a number of components:
Amazon Elastic Compute Cloud (EC2): The IaaS product of Amazon is the leader in its class. It supplies customers with a pay-as-you-go resource that can include storage or computation. EC2 has a web interface for requesting virtual machines as server instances. An EC2 instance appears to the customer like physical hardware and lets the customer control the settings of the entire software stack. Server instances are available in three different sizes, each one having a different amount of memory, computing power, and bandwidth.
Amazon Simple Storage Service (S3) implements a dynamically scalable storage service which can be used to host applications that are subsequently offered to end-users.
Amazon SimpleDB realizes a database (DB) and provides it as a web service. Developers store and query data items via web service requests. Amazon liberates these developers from worrying about the database's internal complexity.
Rackspace
Rackspace [6] offers infrastructure as a service, called Cloud Servers, and a platform as a service, called Cloud Sites, to host web applications. Rackspace also provides Cloud Files as a storage service which can be combined with a content delivery network (CDN) service. This latter service competes directly with the CDN from Amazon called CloudFront, but Rackspace, unlike Amazon, does not charge for bandwidth consumption between the storage service and the CDN.
GoGrid
GoGrid [6] provides infrastructure as a service, standing as a direct competitor to Amazon and Rackspace. GoGrid offers a competitive service consisting of dedicated hosted servers in its Cloud facilities. Thus it is a provider of virtual or physical infrastructure on demand, unlike Amazon (which only supplies virtual infrastructure on demand). Additionally, GoGrid complements the offer of dedicated infrastructure with a hybrid environment that enables users of its dedicated hosting service to request virtual resources to handle usage spikes.
Salesforce
Salesforce [6] is one of the pioneers in Cloud Computing. Salesforce's first and still main product is a Customer Relationship Management (CRM) web service. Salesforce has focused on enterprise customers and has added new applications on top of its CRM. While earlier Salesforce only offered SaaS class products, in 2002 Salesforce shifted towards the PaaS market with the release of a new platform that allows developers to develop applications that will execute natively on its platform or be integrated with third party services. Salesforce is responsible for scaling the platform up or down as needed, thus making the addition of new physical resources transparent to the user.
The Salesforce development environment is based on the Eclipse integrated development environment (IDE) and uses a new programming language called APEX. APEX is closely related to C# and Java. Salesforce also provides non-programmers with tutorials and models to enable them to compose business web applications in a visual way.
Google App Engine
Google's PaaS product [6] is a platform to develop and host web applications on Google's servers. The user can leverage Google's distributed and scalable file systems (BigTable and File System), along with technologies used by Google's wide range of web applications, e.g. Gmail, Docs, Google Reader, Maps, Earth or YouTube.
Although in the beginning the only programming language supported was Python, Java is now supported as well, and it is forecast that other programming languages will be allowed in the future. In a move towards connecting both Clouds, Google and Salesforce have recently provided libraries that allow the developer to access each other's web services application programming interface (API) through applications. Once installed, the application can seamlessly make web service API calls to each other's services, hence integrating applications hosted on both Clouds.
Microsoft Windows Azure
Microsoft's PaaS service is called Windows Azure [6]. This is a very new (commercially it became available in February 2010) Cloud platform offering that provides developers with on-demand computing and storage to host, scale, and manage web applications on the Internet using Microsoft's datacenters.
The Azure Services platform currently runs only .NET Framework applications but Microsoft has indicated that a large range of languages will be supported. Indeed, two software development kits (SDKs) have already been made available for interoperability with the Azure Services platform that enable Java and Ruby developers to integrate their application with .NET services.
Eucalyptus
Eucalyptus [6] is not comparable in size or capacity with the previously discussed Cloud providers, but is worth including because of its distinctive purpose. It is an open source Cloud Computing framework developed by the University of California at Santa Barbara as an alternative to Amazon EC2. The main goal of Eucalyptus was to enable researchers to perform research in the field of Cloud Computing. This initiative is unique as no other Cloud system combines support for open development with the goals of being easy to install and maintain. Its IaaS model is fully compatible with Amazon's EC2 as Eucalyptus uses the same API as AWS.
1.5 Benefits of Cloud Computing
In addition to lower expenses, enterprises can benefit from many other primary benefits associated with Cloud Computing. These can be summarized as follows [5]:
Cost
Companies can save the considerable costs associated with building, maintaining, and operating a data center, especially power and cooling related expenditures. Additionally, the model allows firms to lower expenditures on support staff, particularly those providing infrastructure support, systems management and help desk services.
Capability/Scalability
Many organizations have simply run out of existing capacity due to limitations on power consumption. With the Cloud, companies can scale quickly and efficiently without added investment.
Many Cloud providers even offer burstable infrastructure that automatically expands and contracts to meet peak performance periods.
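The idea of infrastructure that expands and contracts with demand can be sketched as a simple sizing rule. This is an illustrative assumption rather than any provider's actual autoscaling policy: the per-instance capacity, the instance limits and the load figures are all hypothetical.

```python
import math


def instances_needed(load, capacity_per_instance, min_instances=1, max_instances=10):
    """Return how many instances cover `load`, clamped to the allowed range."""
    required = math.ceil(load / capacity_per_instance)
    return max(min_instances, min(max_instances, required))


# Hypothetical request rates over five periods; each instance is assumed
# to handle 100 requests per second.
loads = [120, 450, 980, 300, 40]
plan = [instances_needed(load, 100) for load in loads]
print(plan)
```

The pool grows during the peak period and shrinks again afterwards, so capacity is only held while the load requires it.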
More Green
Businesses are being pressured to reduce their impact on the environment in the form of greenhouse gases. As a result, they are now required to report their carbon emission.
Outsourcing via a Cloud solution enables companies to become more environmentally friendly.
Efficient Use of Computing Resources
The advent of virtualization has provided companies with ways to efficiently use their computer resources. Users no longer require separate servers for different applications. With virtualization multiple server technologies can run from a single server. This shift to virtualization supports the growth of Cloud Computing due to the increased capabilities of servers.
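How virtualization lets several applications share one server can be illustrated with a simple packing heuristic. The sketch below uses first-fit decreasing, which is chosen here only for illustration and is not the scheduling technique proposed in this thesis; the application loads and the host capacity are hypothetical percentages.

```python
def consolidate(vm_loads, host_capacity):
    """Place VM loads on hosts using the first-fit decreasing heuristic."""
    hosts = []  # each host is a list of the VM loads placed on it
    for load in sorted(vm_loads, reverse=True):
        if load > host_capacity:
            raise ValueError("VM does not fit on any host")
        for host in hosts:
            if sum(host) + load <= host_capacity:
                host.append(load)
                break
        else:
            # No existing host has room, so a new host is switched on.
            hosts.append([load])
    return hosts


# Six applications that would each need a separate server without virtualization.
apps = [20, 35, 10, 50, 25, 15]
placement = consolidate(apps, 100)
print(len(placement), "hosts:", placement)
```

With these assumed loads, six applications fit on two hosts instead of six separate servers.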
Matches Current Computing Trends
The introduction of netbooks has shifted a lot of sales from computers and laptops with more powerful processors and extended capabilities to less powerful and more efficient platforms. This signals that users are looking for computers that meet their needs and are affordable. Cloud Computing matches this trend because much of the processing overhead is handled at the servers and not on the user's computer, so the need for an extremely powerful computer is reduced.
1.6 Impediments to Cloud Adoption
Enterprises should carefully consider five major impediments to the successful adoption of Cloud Computing. These challenges are as follows [7]:
Security
Commercial Cloud providers offer broad access to end users, and accordingly roles and access permissions are less controllable. Legacy security measures such as firewalls and intrusion detection must be duplicated within the Cloud. Virtualization greatly adds to the complexity of this process and introduces new threats in areas such as virtual switches and hypervisors.
Privacy/Compliance
Issues related to privacy include jurisdiction of information (where and under what set of laws), access and controls, the availability of audit trails, and compliance with industry and legal standards and regulations.
Immaturity of Vendors/Offerings
Public Cloud IaaS providers have yet to develop a strong track record in supporting large production or enterprise systems on a ready to use basis.
Risk Mitigation
It is difficult to determine how well a provider is mitigating data location, loss, or security oriented risks. In fact, some providers have simply gone out of business. Consequently, requirements for data protection should be strictly governed through the use of contractual service level agreements.
Legacy Applications
Core business applications are often highly customized, convoluted, and entangled. As a result, prior to moving them to the Cloud a re-engineering effort is often required to modernize and rationalize an applications portfolio before it is deemed “Cloud-worthy.”
As discussed earlier in this chapter, Cloud Computing is a highly scalable and cost-effective infrastructure for running High Performance Computing, enterprise and Web applications. However, the growing demand for Cloud infrastructure has drastically increased the energy consumption of data centers, which has become a critical issue. High energy consumption not only translates to high operational cost, which reduces the profit margin of Cloud providers, but also leads to high carbon emissions, which are not environmentally friendly. Hence, energy-efficient solutions are required to minimize the impact of Cloud Computing on both operating cost and the environment. Cloud Computing has therefore taken a green direction, i.e. it is evolving into Green Computing. There are several technologies and concepts employed by Cloud providers to achieve better utilization of resources and energy compared to traditional computing, which will be discussed in Chapter 2. Before moving to Chapter 2, a brief introduction to Green Computing is presented in the next section.
1.7 Green Computing
Green computing is the study and practice of using computing resources efficiently. The primary objective of such a program is to account for the triple bottom line (i.e. “People, Planet, Profit”) [8].
Green Computing is a very hot topic nowadays, not only because of rising energy costs and potential savings, but also due to the impact on the environment. The energy needed to operate and cool computing systems has grown significantly in recent years, primarily due to the volume of systems and computing that companies now heavily rely upon.
Reasons for promoting green, or energy-efficient, computing are:
Climate Change
First and foremost, conclusive research shows that CO2 and other emissions are causing global climate and environmental damage. Preserving the planet is a valid goal because it aims to preserve life. Planets like ours, which support life, are very rare. Neither our solar system nor nearby star systems are known to have other M-class planets.
Savings
Green Computing can lead to serious cost savings over time. Reductions in energy costs from servers, cooling, and lighting can generate savings for corporations.
Reliability of Power
As energy demands in the world go up, energy supply is declining or flat. Energy-efficient systems help ensure healthy power systems. Also, more companies are generating more of their own electricity, which further motivates them to keep power consumption low.
Despite the huge surge in computing power demands, there are ways through which organizations can reduce their energy consumption and CO2 footprint while maintaining the required levels of computing performance.
A number of practices can be applied to achieve Green Computing in the Cloud, such as improvement of applications' algorithms, energy-efficient hardware, Dynamic Voltage and Frequency Scaling (DVFS), terminal servers and thin clients, and virtualization of computer resources [9].
1.8 Organization of thesis
This thesis is organized as follows:
Chapter 2 – This chapter describes in detail the literature survey on important related works and tools for optimizing energy consumption in Cloud Computing.
Chapter 3 – This chapter states the problem, which has been derived from the literature survey.
Chapter 4 – This chapter gives the details of our proposed solution, “Temperature-Aware Virtual Machine Scheduling in Green Clouds”.
Chapter 5 – This chapter presents the results derived from the implementation of the proposed solution.
Chapter 6 – This chapter describes the conclusion and future research work.
Chapter 2
Literature Survey
In this chapter we have surveyed related works and techniques for efficient energy management of resources in Cloud Computing and mapped them to our taxonomy to guide future design and development efforts.
2.1 Virtualization
In the Cloud model what customers really pay for, that is what they dynamically rent, are virtual machines. This enables the Cloud service provider to share the Cloud infrastructure located in a data center between multiple customers.
Fig. 2.1. Paravirtualization [10]
Fig. 2.2. Full Virtualization [10]
Virtualization strictly refers to the abstraction of computer resources using virtual machines: software implementations of machines that execute programs as if they were separate physical machines. Virtualization allows multiple operating systems to be executed simultaneously on the same physical machine. Virtualization and the dynamic migration of virtual machines allow Cloud Computing to make the most efficient use of the currently available physical resources. Virtualization is achieved by adding a layer beneath the OS, between the OS and the hardware. This additional layer makes it possible to run several OS instances on top of the same underlying resources. Two different options for this virtualization layer exist:
Type-1: This kind of virtualization layer is called a hypervisor. It is installed directly onto the system, and has direct access to the hardware. For this reason it is the fastest, most scalable, and most robust option.
Type-2: The second option is a hosted architecture, in which the virtualization layer is placed on top of a host operating system.
Both options for virtualization are applicable to x86 architecture systems. This platform will be used in this thesis as it is by far the most common architecture nowadays. Due to its dominance in the PC market, most operating systems are designed to be compatible with this architecture. There are two different techniques to perform virtualization: paravirtualization and full virtualization.
2.1.1 Paravirtualization
Paravirtualization involves modification of the guest operating system (OS). Today this method is only supported for open source operating systems, which limits its applicability. However, paravirtualization offers higher performance than full virtualization because it does not need to trap and translate every OS call. Fig. 2.1 depicts the virtual view of paravirtualization.
2.1.2 Full virtualization
A full virtualization platform is a bare-bones, operating-system-like kernel called the hypervisor, usually with a management console that runs on top of the kernel. There is no host operating system. In VMware's case, the hypervisor is composed of a virtual machine monitor component and a kernel component, where the latter provides all the hardware abstraction. These components are not based on traditional operating system technology, but are designed for efficiently managing virtual machines. Fig. 2.2 shows the virtual view of full virtualization.
2.1.3 Virtualization Solution Providers
In this section we will discuss the three most popular virtualization technology solution providers: the VMware solutions, the Xen hypervisor and KVM. All of these systems support ways to perform power management (which will be described later in this section); however, none of them allows coordination of VMs' specific calls for power state changes. Other important capabilities supported by the mentioned virtualization solutions are offline and live migration of VMs. They enable transferring VMs from one physical host to another, and thus have facilitated the development of different techniques for virtual machine consolidation and load balancing [11].
VMware
VMware, Inc. is a company providing virtualization software, founded in 1998 and based in Palo Alto, California, USA. VMware's desktop software runs on Microsoft Windows, Linux, and Mac OS X, while VMware's enterprise hypervisors for servers, VMware ESX and VMware ESXi, are bare-metal embedded hypervisors that run directly on server hardware without requiring an additional underlying operating system.
VMware software provides a completely virtualized set of hardware to the guest operating system. VMware software virtualizes the hardware for a video adapter, a network adapter, and hard disk adapters. The host provides pass-through drivers for guest USB, serial, and parallel devices. In this way, VMware virtual machines become highly portable between computers, because every host looks nearly identical to the guest. In practice, a system administrator can pause operations on a virtual machine guest, move or copy that guest to another physical computer, and then resume execution exactly at the point of suspension. Alternatively, for enterprise servers, a feature called VMotion allows the migration of operational guest virtual machines between similar but separate hardware hosts sharing the same storage. Each of these transitions is completely transparent to any users on the virtual machine at the time it is being migrated [12].
VMware Workstation, Server, and ESX take a more optimized path to running target operating systems on the host than emulators, which simulate the function of each CPU instruction on the target machine one by one, or dynamic recompilers, which compile blocks of machine instructions the first time they execute and then use the translated code directly when the code runs subsequently. VMware software does not emulate an instruction set for hardware that is not physically present. This significantly boosts performance, but can cause problems when moving virtual machine guests between hardware hosts using different instruction sets (such as those found in 64-bit Intel and AMD CPUs), or between hardware hosts with a differing number of CPUs.
Although VMware virtual machines run in user mode, VMware Workstation itself requires the installation of various drivers in the host operating system, notably to switch the Global Descriptor Table (GDT) and the Interrupt Descriptor Table (IDT) dynamically. The VMware product line can also run different operating systems on a dual-boot system simultaneously by booting one partition natively while using the other as a guest within VMware Workstation [10].
Xen
The Xen hypervisor is an open-source bare-metal virtualization technology developed collaboratively by the Xen community and engineers from over 20 innovative data center solution vendors [13]. Xen is used as the basis for a number of different commercial and open-source applications, such as server virtualization, Infrastructure as a Service (IaaS), desktop virtualization, security applications, and embedded and hardware appliances. Xen powers some of the largest clouds in production today.
Some of Xen's key features are [14]:
Small footprint and interface: Xen uses a microkernel design with a small memory footprint (around 1 MB) and a limited interface to the guest, which makes it more robust and secure than other hypervisors.
Operating system agnostic: Most installations run with Linux as the main control stack, but a number of other operating systems can be used instead, including NetBSD and OpenSolaris.
Driver Isolation: Xen has the capability to allow the main device driver for a system to run inside of a virtual machine. If the driver crashes, the VM containing the driver can be rebooted and the driver restarted without affecting the rest of the system.
KVM
KVM (Kernel-based Virtual Machine) is an open-source full virtualization solution for Linux on x86 hardware containing virtualization extensions (Intel VT or AMD-V). Limited paravirtualization support is available for Linux and Windows guests through the VirtIO framework, which supports a paravirtual Ethernet card, a paravirtual disk I/O controller, adjustment of guest memory usage, and a VGA graphics interface using SPICE or VMware drivers.
KVM consists of a loadable kernel module, kvm.ko, that provides the core virtualization infrastructure, and a processor-specific module, kvm-intel.ko or kvm-amd.ko. KVM has also been ported to FreeBSD and Illumos as a loadable kernel module. FreeBSD and Illumos bring technologies such as DTrace and ZFS to the hypervisor, offering additional visibility and storage reliability [15].
2.2 Dynamic Voltage and Frequency Scaling
Dynamic Voltage and Frequency Scaling (DVFS) is a technique for lowering the operating frequency and voltage of a computing resource, which considerably decreases its power consumption. DVFS also reduces the number of instructions a processor can issue in a given amount of time, thus reducing performance. This, in turn, increases the run time of program segments that are sufficiently CPU-bound.
This technique was originally used in portable and laptop systems to conserve battery power, and has since migrated to the latest server chipsets. Current implementations in the CPU market include Intel's SpeedStep and AMD's PowerNow! technologies, which dynamically raise and lower both the frequency and the CPU voltage [9].
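The power and run-time trade-off described above can be illustrated with a small sketch. It assumes the standard CMOS dynamic-power approximation P = C * V^2 * f, which is not stated in this thesis; the effective capacitance, the two operating points and the cycle count are hypothetical values chosen only for illustration.

```python
def dynamic_power(capacitance_f, voltage_v, frequency_hz):
    """Dynamic power in watts under the assumed model P = C * V^2 * f."""
    return capacitance_f * voltage_v ** 2 * frequency_hz


def cpu_bound_runtime(cycles, frequency_hz):
    """Run time in seconds of a purely CPU-bound segment of `cycles` cycles."""
    return cycles / frequency_hz


# Hypothetical values, not measurements from the thesis.
C_EFF = 1e-9      # effective switched capacitance (farads)
CYCLES = 4e9      # length of the CPU-bound segment (cycles)
STATES = [("high", 1.2, 2.0e9), ("low", 1.0, 1.0e9)]  # (label, volts, hertz)

for label, volts, hertz in STATES:
    power = dynamic_power(C_EFF, volts, hertz)
    runtime = cpu_bound_runtime(CYCLES, hertz)
    print(f"{label}: {power:.2f} W, {runtime:.1f} s, {power * runtime:.2f} J")
```

Under this assumed model, the lower operating point draws less power but doubles the run time of the CPU-bound segment, which is the performance penalty described in the text.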
2.3 Contemporary Approaches for Energy Efficiency in Cloud Data Centers
This section discusses in detail various works on efficient energy management at the data center level.
Load Management for Power and Performance in Clusters
Pinheiro et al. [16] have proposed a technique for managing a cluster of physical machines with the objective of minimizing power consumption while providing the required Quality of Service (QoS). The authors use the throughput and execution time of applications as constraints for ensuring the QoS. Nodes are assumed to be homogeneous. The algorithm periodically monitors the load and decides which nodes should be turned on or off to minimize the power consumed by the system while providing the expected performance. To estimate performance, the authors apply a notion of demand for resources, where resources include CPU, disk and network interface. This notion is used to predict the performance degradation and throughput resulting from workload migration, based on historical data. However, the demand estimation is static; that is, the prediction does not consider possible changes in demand over time. To determine when to add or remove a node, the authors introduce a total demand threshold that is set statically for each resource and is intended to solve the problem of the latency caused by adding a node.
The algorithm is executed on a master node, which creates a single point of failure and might become a performance bottleneck in a large system. Performance may also suffer badly when the demand growth rate is high.
Energy-Aware Consolidation for Cloud Computing
Srikantaiah et al. [17] have investigated the problem of dynamic consolidation of applications in virtualized heterogeneous systems in order to minimize energy consumption while meeting performance requirements. The authors have explored the impact of workload consolidation on the energy-per-application metric as a function of both CPU and disk utilization. They found that the energy consumption per application follows a "U"-shaped curve. When utilization is low, the resource spends a large fraction of time idle and is not used efficiently, which makes it more expensive in terms of the energy-performance metric. When utilization is high, the increased cache miss rate, context switches and scheduling conflicts degrade performance and lengthen execution time, so energy consumption becomes high again.
The main drawback is that the problem investigated depends on the workload type and the application, so the approach is not suitable for a general-purpose Cloud environment.
Energy-efficient server clusters
Elnozahy et al. [18] have investigated the problem of power-efficient resource management in a single web-application environment with fixed response time and load balancing handled by the application. The two main power-saving techniques are switching computing nodes on or off and Dynamic Voltage and Frequency Scaling (DVFS). The main idea of this approach is to estimate the total CPU frequency required to provide the necessary response time, determine the optimal number of physical nodes, and set a proportional frequency on all the nodes.
However, the transition time for switching the power of a node is not considered. Load balancing is handled by an external system, that is, a system other than the one on which this algorithm runs. The algorithm is centralized, which creates a Single Point of Failure (SPF) and reduces scalability. Despite the variable nature of the workload, the resource usage data are not approximated, which may result in inefficient performance.
Virtual Power: Coordinated Power Management
Nathuji and Schwan [19] have studied power management techniques in the context of virtualized data centers, which had not been done before. Besides hardware scaling and VM consolidation, the authors have introduced and applied a new power management technique called “soft resource scaling”. The idea is to emulate hardware scaling by giving a VM less resource time using the scheduling capability of the virtual machine monitor (VMM). The authors found that a combination of “hard” and “soft” scaling may provide higher power savings because the number of hardware scaling states is limited.
The authors have proposed an architecture in which resource management is divided into local and global policies. At the local level, the guest OS governs the power management strategies. However, such management may be inefficient, as the guest OS may be power-unaware. Consolidation of VMs is handled by global policies, which apply live migration to reallocate VMs. However, the global policies are not discussed in detail with respect to QoS requirements.
Distributed Application Scheduling Based on Prediction of Communication Events
Dodonov and De Mello [20] have proposed an approach to scheduling distributed applications in Grids based on predictions of communication events. They have proposed the migration of communicating processes if the migration cost is lower than the cost of the predicted communication with the objective of minimizing the total execution time. They have shown that the approach can be effectively applied in Grids.
However, the approach is not viable for virtualized data centers, as the cost of VM migration is higher than the cost of process migration.
Data Center Network Virtualization Architecture
Guo et al. [21] have proposed and implemented a virtual cluster management system that allocates resources in a way that satisfies bandwidth guarantees. The allocation is determined by a heuristic that minimizes the total bandwidth utilization. The VM allocation is adapted, i.e. migration is performed, when some of the VMs are deallocated or powered off, but the protocols for the migration are defined statically.
However, the VM allocation is not dynamically adapted depending on the current network load. Moreover, in this approach energy consumption by the network due to the migration of VMs to minimize the bandwidth is not considered.
Towards Energy-Aware Scheduling in Data Centers using Machine Learning
Berral et al. [22] presented a theoretical approach for handling energy-aware scheduling in data centers. Here, the authors propose a framework which provides an allocation methodology using techniques that include turning on or off machines, power-aware allocation algorithms and machine learning to deal with uncertain information while the expected QoS is maintained through the avoidance of SLA violations.
In order to save energy, the strategy proposed in this paper is simple: reduce the number of active nodes by using workload consolidation and turning off those that remain inactive. To achieve this, the authors propose a scheduling algorithm named “dynamic backfilling”, which allows the migration of workloads among servers in order to achieve greater consolidation and thus a reduction in the number of active nodes. These workload movements are governed by policies that include System Occupation (SO), Current Job Performance (CJP) and Expected SLA Satisfaction (ESS), with the aim of improving the migration process and reducing SLA violations.
In order to reduce performance degradation, machine learning techniques are introduced to predict the customer satisfaction level of each job before it is placed or moved across the servers in the data center. Additional thresholds on the number of working nodes are used to control how frequently servers are turned on or off and to limit the overhead caused by these operations.
Although the authors mention the inclusion of different workload types in the experiments, they do not describe how these types are handled by the proposed mechanism, even though their results confirm that there are significant differences in energy-performance among distinct types of workloads.
Multi-Tiered On-Demand Resource Scheduling
Song et al. [23] have proposed allocating resources to applications according to their priorities in a multi-application virtualized cluster. The approach requires machine learning to obtain utility functions for the applications, as well as defined application priorities. It does not apply migration of VMs to optimize the allocation continuously (the allocation is static). To ensure the QoS, resources are allocated to applications in proportion to their priorities. Each application can be deployed using several VMs instantiated on different physical nodes. Only CPU and RAM utilization are taken into account in resource management decisions.
When resources are limited, the performance of a low-priority application is intentionally degraded and the resources are allocated to critical applications. The authors have proposed scheduling at three levels: the application-level scheduler dispatches requests among an application's VMs; the local-level scheduler allocates resources to the VMs running on a physical node according to their priorities; and the global-level scheduler controls the resource flow among applications.
The potential limitations of the proposed approach are that it requires machine learning to obtain the utility functions for applications and that it does not use VM migration to adapt the allocation at run time. The approach is therefore suitable only for enterprise environments, where applications can have explicitly defined priorities.
Power and Performance Management via Lookahead Control
Kusic et al. [24] have addressed the problem of power management in virtualized heterogeneous environments using Limited Lookahead Control (LLC). The authors' objective is to maximize the resource provider's profit by minimizing both power consumption and SLA violations. A Kalman filter is applied to estimate the number of future requests, predict the future state of the system and perform the necessary reallocations. However, in contrast to heuristic-based approaches, the proposed model requires simulation-based learning for application-specific adjustments, which is achieved with the help of neural networks. Moreover, the complexity of the model, including the use of neural networks, may significantly increase the execution time of the optimization controller, which makes the approach unsuitable for large-scale real-world systems.
Optimal Power Management for Server Farm to Support Green Computing
Niyato et al. [25] have proposed an approach that addresses the energy saving problem for data centers. They introduce a mechanism that works in two different sections of distributed data centers. First, each data center works with an optimal power management module that makes decisions about server mode switching (turning servers on or off) to minimize power consumption. Additionally, a module named the job broker decides which data center each user is assigned to, with the aim of minimizing the total cost, which is composed of network and power consumption costs.
This approach considers the allocation of only one job per server. When the job is finished, the server sends a message to the scheduler indicating its status. The scheduler can then assign a new job to the server or deactivate it. This, together with waking servers in advance, could be very beneficial to performance in scenarios where all the workloads have high computing demands. However, the energy savings in real cloud scenarios could be seriously affected by the heterogeneity of workloads and the lack of mechanisms for handling heterogeneous hardware infrastructure. Allocating jobs with low resource demands to entire servers could waste a significant amount of resources.
pMapper: Power and Migration Cost Aware Application Placement
Verma et al. [26] have formulated the problem of power-aware dynamic placement of applications in virtualized heterogeneous systems as a continuous optimization. At each time frame, the placement of VMs is optimized through reallocation, i.e. live migration of VMs, in order to minimize power consumption and maximize performance. The authors have applied a heuristic for the bin packing problem with variable bin sizes and costs.
Continuous migrations are performed to optimize the system, but no information is given about how the cost of a migration is calculated; only the concept of the cost of VM live migration is introduced. The proposed approach may not handle strict SLA requirements, since SLAs may be violated by continuous migration in the system. The authors suggest several original directions for future work, such as consideration of memory bandwidth, more advanced use of idle states and extension of the theoretical proofs for the problem.
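To make the bin-packing view of VM placement concrete, the sketch below uses a generic first-fit decreasing heuristic with bins (hosts) of different sizes. It is not the pMapper algorithm and ignores power and migration costs; the function name, VM demands and host capacities are hypothetical, with CPU expressed in integer percentage points.

```python
def first_fit_decreasing(vm_demands, host_capacities):
    """Place each VM on the first host that still has enough free capacity.

    vm_demands: dict mapping VM name -> CPU demand
    host_capacities: dict mapping host name -> CPU capacity (bins may differ)
    Returns a dict mapping VM name -> host name, or None if no host fits.
    """
    free = dict(host_capacities)  # remaining capacity per host
    placement = {}
    # Consider the largest VMs first, which is the "decreasing" part.
    for vm, demand in sorted(vm_demands.items(), key=lambda kv: kv[1], reverse=True):
        placement[vm] = None
        for host in free:  # hosts are tried in insertion order
            if free[host] >= demand:
                free[host] -= demand
                placement[vm] = host
                break
    return placement


# Hypothetical example: four VMs, two hosts of different sizes.
vms = {"a": 50, "b": 70, "c": 30, "d": 20}
hosts = {"h1": 100, "h2": 80}
print(first_fit_decreasing(vms, hosts))
```

A power-aware variant would order the hosts by a cost metric rather than by insertion order; that ordering is left out here because the source does not describe it.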
Optimal Power Allocation in Server Farms
Liu et al. [27] described an approach that aims to reduce the power consumption of data centers by reducing the number of servers that are turned on. To achieve this, the authors present an architecture composed of components such as monitoring services, a migration manager, the managed environment, and a front end that provides information to users. Although the authors describe this architecture as their final proposal, the paper focuses mainly on the live migration algorithm, which searches for the optimal placement of virtual machines that minimizes the total cost. The cost is calculated from the physical machine cost, the virtual machine status and the virtual machine migration cost. Performance is maintained with the help of a workload simulator, which takes the resource requirements and collects real-time measurements from the data center.
However, it is not explained how the overhead caused by turning servers on or off is handled. Moreover, the authors focus on only one type of workload.
Power-Aware Scheduling of Virtual Machines in DVFS-enabled Clusters
Laszewski et al. [28] have presented a scheduling mechanism which aims to reduce the power consumption in virtualized clustered environments by dynamically reducing processor speeds. The mechanism is composed of three algorithms that work together to allocate workloads in a virtualized cluster based on the required and available processor speeds of the underlying physical nodes. The algorithms continuously monitor the status of the VMs to adjust the processor speed on each node, reducing power consumption. To achieve this, the approach uses profiles describing the available and maximum processor speed of each server in the cluster.
To maintain performance levels, the Xen hypervisor performance governor is used to enable manual control of the frequencies according to the workload requirements. Additionally, a performance evaluation with varying numbers of VMs and operating frequencies is presented. nBench, a Linux CPU benchmark, is used to simulate compute-intensive jobs and measure CPU performance at the same time.
However, the performance results in this approach are never correlated with the energy reduction obtained. Moreover, the authors assume only one type of workload with fixed behavior. This does not necessarily hold in a real Cloud scenario, where applications with different behavioral patterns coexist.
2.4 Novel Approaches for Energy Saving in Network Infrastructure
Gupta et al. [29] have suggested putting network interfaces, links, switches and routers into sleep modes when they are idle in order to save the energy consumed by the Internet backbone and consumers. Building on the foundation laid by Gupta et al. [29], a number of research works have addressed energy-efficient traffic routing by Internet Service Providers (ISPs) and the application of sleep modes and performance scaling to network devices.
Chiaraviglio and Matta [30] have proposed cooperation between ISPs and content providers that allows an efficient simultaneous allocation of compute resources and network paths that minimizes energy consumption under performance constraints.
Koseoglu and Karasan [31] have applied an approach of joint allocation of computational resources and network paths to Grid environments based on the optical burst switching technology with the objective of minimization of job completion times.
Tomas et al. [32] have investigated the problem of scheduling Message Passing Interface (MPI) jobs in Grids considering network data transfers satisfying the QoS requirements.
2.4 Power efficient software
Future improvements in energy efficiency are likely to result from rethinking algorithms and applications at the higher levels of the computing stack, alongside improvements at the circuitry and other low-level components [9].
Mayo et al. [33] discovered that even simple tasks, such as listening to music or making a phone call, can consume significantly different amounts of energy on a variety of heterogeneous mobile devices. As these tasks have the same purpose on each device, the results show that the implementation of the task and the system on which it is performed can have a dramatic impact on efficiency.
Software has always been constructed and optimized to maximize its efficiency in certain terms. Other optimizations are carried out for scalability or robustness but rarely made for energy consumption.
However, techniques exist to reduce the power needed by a piece of software to complete a set of tasks, leading engineers to realize that software can be constructed in an energy efficient manner [33].
The potential to enhance energy efficiency through software will depend upon the dissemination of these techniques to make them as ubiquitous as other performance enhancing measures.
2.4.1 Power Efficient Software Principles
Saxe outlines three key principles to producing Power Efficient software [34]:
The amount of work done by the software directly corresponds to the amount of resources consumed. Therefore if more energy is applied and the system runs in a higher state then the software will do more useful work by some magnitude appropriate to the energy increase.
The software will minimize the amount of unnecessary computation by using an event-based architecture over a polling system, and therefore remain dormant until action is required.
There should be extra care taken to ensure that the software has no problems with memory leaks or freeing unallocated memory. These problems will cause increased interference from the host operating system, resulting in additional energy consumption.
A number of research works on energy-efficient resource management in data centers have been discussed in this chapter. However, the problem of thermal management in the context of virtualized data centers has not been investigated to date. Moreover, to the best of our knowledge there is no study of a comprehensive approach that combines optimization of VM placement according to the current utilization of resources with thermal optimizations for virtualized data centers. Therefore, the exploration of such an approach is crucial for Cloud Computing environments to thrive. On the basis of the literature survey, the problem is formulated in the next chapter.
Chapter 3
Problem Statement
The previous chapter discussed related works and techniques available for energy-efficient resource management in clouds. This chapter focuses on the problem statement taken up in this thesis.
3.1 Gap Analysis
Data centers hosting Cloud applications consume huge amounts of electrical energy, contributing to high operational costs and a large carbon footprint. Therefore, we need Green Cloud Computing solutions that not only save energy for the environment but also reduce the operational costs and CO2 emissions of these data centers.
The literature survey reviewed various related works on efficient management of energy and Cloud resources. On the basis of the literature survey, the following gaps are identified:
There is no algorithmic approach for the allocation of virtual machines in the Cloud that optimizes the temperature of the data center.
There is no algorithmic approach for the migration of virtual machines in the Cloud that optimizes the temperature of the data center.
In short, a huge amount of energy is spent on cooling the data center in order to keep hardware and software stable. This can be avoided by a robust algorithm whose core function is to keep the temperature of the data center within the threshold temperature limit by scheduling virtual machines in the Cloud.
3.2 Objectives of Thesis
The objectives of the thesis are as follows:
Proposing Temperature-Aware Virtual Machine Scheduling in Green Clouds.
Establishing Paravirtualized Cloud Environment.
The cost of cooling data centers is increasing day by day because of growing performance requirements and load on these data centers. Traditional techniques such as cooling the data center with water or air are not sufficient for today's state-of-the-art data centers. Moreover, most advanced approaches, such as load balancing and load skewing, focus on efficient utilization of resources against the cost of operation. Thus, there is a need for a technique whose core job is to keep the temperature of the data center within the threshold temperature limit, so that the large cost of cooling the data center can be saved.
The detailed design and algorithm for the problem formulated in Chapter 3 are discussed in Chapter 4.
Chapter 4
Temperature-Aware Virtual Machine Scheduling in Green Clouds
This chapter discusses how the problem stated in the previous chapter can be solved in a virtualized Cloud environment.
4.1 Architecture: Temperature-Aware Virtual Machine Scheduling in Green Clouds
Our proposed technique consists of two levels. At the top is the Load Balancer, whose core functions are to handle new VM requests and to control the migration of VMs from one node to another. To perform these functions efficiently, it keeps track of each node in the Virtualized Cloud System and maintains the following information in tabular form, called the Global List:
Current Temperature.
Optimal Temperature.
Critical Temperature.
CPU Utilization.
Free Memory (RAM).
The Load Balancer sorts the above information in increasing order of the current temperature of the nodes. The bottom level, i.e. the second level, consists of the physical nodes on which VMs run. Every node is connected to the Load Balancer. Apart from the Load Balancer's Global List, every node maintains a Local List, which contains the following information about the VMs on it:
VM Name.
Type of Operating System.
CPU Utilization.
Number of Processors required.
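The two lists described above could be represented as follows. This is a minimal sketch, not the implementation used in this thesis: the class names, field names, the `node_id` field and the sample values are assumptions, and the Local List is attached to the node record only to keep the example compact.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class VMRecord:
    """One row of a node's Local List."""
    name: str
    os_type: str
    cpu_utilization: float      # percent
    processors_required: int


@dataclass
class NodeRecord:
    """One row of the Load Balancer's Global List."""
    node_id: str                # hypothetical identifier, not named in the text
    current_temp: float         # degrees Celsius
    optimal_temp: float
    critical_temp: float
    cpu_utilization: float      # percent
    free_memory_mb: int
    local_list: List[VMRecord] = field(default_factory=list)


def sorted_global_list(nodes):
    """Return the Global List in increasing order of current temperature."""
    return sorted(nodes, key=lambda node: node.current_temp)


# Hypothetical sample data.
global_list = sorted_global_list([
    NodeRecord("node-b", 61.0, 50.0, 65.0, 80.0, 1024),
    NodeRecord("node-a", 44.0, 50.0, 65.0, 20.0, 4096),
    NodeRecord("node-c", 52.5, 50.0, 65.0, 55.0, 2048),
])
print([node.node_id for node in global_list])
```

Keeping the list sorted by current temperature means the coolest node is always at the front, which is where a temperature-aware allocator would look first.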
Fig.4.1. System Architecture
Updating Global and Local Lists
The Global List maintained by the Load Balancer is updated after a fixed interval of time. This interval is set by the cloud administrator, and its duration depends on the complexity of the system and the services it provides. Besides this fixed interval, the Load Balancer updates the Global List on the occurrence of any of the following events:
Request for new VM arrives.
Migration of VMs.
Shutdown of VMs.
Power on of VMs.
The Local List, which is maintained by every node, is updated on the occurrence of any of the following events:
Migration of VMs.
Power on of VMs.
Shutdown of VMs.
Waiting Queue
It is the queue through which requests for new virtual machines are handled. Every new request for a virtual machine is added at the end of the Waiting Queue. The Load Balancer services VM requests in the same order in which they appear in the Waiting Queue. If a VM request is assigned a node, the Load Balancer removes that request from the Waiting Queue; otherwise it moves on to the next request, and continues servicing the queue until the end.
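One pass over the Waiting Queue, as described above, can be sketched as follows. The rule for choosing a node is not part of this description, so it is passed in as a hypothetical `find_node` callback that returns a node name or `None`; the request and node names are also illustrative.

```python
from collections import deque


def service_waiting_queue(waiting_queue, find_node):
    """Service every request once, in FIFO order.

    Requests that get a node are removed from the queue; the others stay in
    the queue in their original order. Returns a dict request -> node.
    """
    placements = {}
    still_waiting = deque()
    while waiting_queue:
        request = waiting_queue.popleft()
        node = find_node(request)          # hypothetical placement rule
        if node is None:
            still_waiting.append(request)  # no node yet: keep the request
        else:
            placements[request] = node     # placed: drop it from the queue
    waiting_queue.extend(still_waiting)
    return placements


# Hypothetical example: only "vm2" cannot be placed in this pass.
queue = deque(["vm1", "vm2", "vm3"])
placed = service_waiting_queue(
    queue, lambda request: None if request == "vm2" else "node-a"
)
print(placed, list(queue))
```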
Critical Queue
It is a simple queue that contains information about those nodes whose Current Temperature is equal to or greater than their Critical Temperature. The Critical Queue is used in the migration of VMs from one physical node to another in order to optimize the temperature of the Virtualized Cloud System. Information in the Critical Queue is sorted in decreasing order of the Current Temperature of the nodes.
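Building the Critical Queue from per-node temperature readings can be sketched as below. The `NodeTemp` type, its field names and the sample readings are assumptions made for illustration only.

```python
from typing import List, NamedTuple


class NodeTemp(NamedTuple):
    node_id: str          # hypothetical identifier
    current_temp: float   # degrees Celsius
    critical_temp: float  # degrees Celsius


def build_critical_queue(nodes: List[NodeTemp]) -> List[NodeTemp]:
    """Keep nodes at or above their Critical Temperature, hottest first."""
    hot = [n for n in nodes if n.current_temp >= n.critical_temp]
    return sorted(hot, key=lambda n: n.current_temp, reverse=True)


# Hypothetical readings.
readings = [
    NodeTemp("node-a", 66.0, 65.0),
    NodeTemp("node-b", 58.0, 65.0),
    NodeTemp("node-c", 74.5, 72.0),
    NodeTemp("node-d", 65.0, 65.0),
]
critical_queue = build_critical_queue(readings)
print([n.node_id for n in critical_queue])
```

Sorting hottest first means the node most in need of relief is considered first when VMs are migrated.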
Critical Temperature
It is the core temperature set by the original equipment manufacturer (OEM). Running the core at or beyond the Critical Temperature requires a significant amount of cooling for computation to be performed optimally. Moreover, running a machine beyond the Critical Temperature for a significant amount of time may produce irreversible changes in the hardware, degrading its computational efficiency and shortening the life of the core. The Critical Temperature differs between cores, e.g. 65°C for Core i3/i5/i7 and 72°C for Dual core 2080 [8].
Optimal Temperature
It is a core temperature set by the OEM. The core's performance does not degrade and no instability occurs while the core runs at or below the Optimal Temperature. Moreover, running the core at or below the Optimal Temperature does not require extensive cooling [9]. Running the core at or below this temperature therefore saves cooling cost and increases the life of the hardware.
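The two thresholds defined above split a node's temperature range into three bands, which can be sketched as a small classifier. The band labels, in particular "intermediate", and the 50°C optimal value in the example are assumptions used only for illustration; the 65°C critical value is the Core i3/i5/i7 figure quoted earlier.

```python
def thermal_state(current_temp, optimal_temp, critical_temp):
    """Classify a core temperature against its OEM thresholds."""
    if current_temp >= critical_temp:
        return "critical"      # at or beyond the Critical Temperature
    if current_temp <= optimal_temp:
        return "optimal"       # no extensive cooling required
    return "intermediate"      # hypothetical label for the band in between


# Hypothetical optimal threshold of 50 degrees Celsius.
for temp in (45.0, 58.0, 65.0):
    print(temp, thermal_state(temp, optimal_temp=50.0, critical_temp=65.0))
```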
4.2 Working of Proposed Scheduling Technique
Our Temperature-Aware Scheduling Technique is divided into two parts:
I. Migration of VMs.