Server virtualization is an essential feature of modern data centers. It is an ingenious technology that provides applications with an alternate view of the execution environment and of resources, independent of the underlying system and architecture. Such abstraction is often necessary to run untrusted software over the internet and on different system architectures. Modern data centers and virtualization depend on each other: virtualization is used routinely in data centers to run various types of applications on high-end servers. Virtualization can be implemented on different computing systems in various ways.
In high-end servers, the virtualization layer usually resides over the physical hardware. In this case, the guest operating systems (OSs) run on top of the virtualization layer and interact with a hypervisor rather than directly with the physical hardware. Virtualization provides two main advantages. First, it allows software to run
*Image source: https://commons.wikimedia.org/wiki/File:Young_Poincare.jpg
3.1. INTRODUCTION 45
on a system for which the software was not designed; second, it provides an easy way to run an untrusted application on a remote machine. Both abilities are crucial for deploying applications over the internet. For these reasons, virtualization has become a popular choice for data centers all over the world.
The history behind the inception of the Cloud is fascinating. As computing hardware became ever cheaper during the late nineties, various enterprises started building up vast caches of commodity hardware, primarily to meet their own computational requirements. However, it was soon realized that most data centers have a low level of utilization [174]: it can be as low as 5% and hardly ever crosses 20%. Low utilization translates directly into high operational cost and a high energy bill, both of which are real concerns for any data center owner [175]. At the same time, the rapid growth of the internet gave small and medium-scale internet-based service providers an opportunity to acquire additional resources on a short-term basis. These businesses often face a sudden deluge of traffic that may last only a short time. Such requests cannot be served sustainably with the physical resources available to most small and medium-scale businesses, yet purchasing a large cache of physical hardware for occasional service requirements is not economical. Therefore, an alternative to the conventional process of acquiring computing resources was necessary, and out of this mixture of drastically different economic demands, the idea of the Cloud was conceived to help operators on both sides of the spectrum.
The Cloud is a technology born out of necessity; it is the concept of using computing power as a utility. The Cloud has already proven to be one of the most successful innovations of the past decade, and constant efforts are being made to make it more accessible for high-performance computing (HPC) [8]. As already mentioned, a fundamental concept of the Cloud is to rent computing power as a utility, not the physical hardware. This is a revolutionary idea in the field of computing, and it would not have been possible without virtualization. The Cloud allows users to deploy their applications over the internet, while the actual physical machines in the data center remain out of reach of the Cloud user.
The Cloud offers users the ability to rent any amount of resources within a short period without prior reservation, run any application on them, and subsequently release them without any strings attached. At the heart of the whole process is virtualization technology, which allows the physical resources to be divided into virtual units
and rented over the internet. Virtualization ensures that Cloud users have full control over the rented Virtual Machines (VMs), while Cloud providers retain total control over the physical hardware. However, running multiple VMs on one physical machine causes resource contention, which leads to performance degradation. Unexpected performance degradation may be acceptable for applications such as web services, but it may not be acceptable for parallel applications. This performance variance is influenced by many factors, including resource contention among the running VMs, the architecture of the hypervisor itself, and the handling of I/O requests by the virtualization system.
Scheduling is an integral part of executing parallel applications. To fully exploit the inherent parallelism of an application, it is first broken into smaller tasks, which are then scheduled for execution. During the scheduling process, each task is assigned a start time, a finish time, and a processor to execute on. Once the application has been broken down into smaller tasks, it can be represented by a task graph for convenience. A crucial part of scheduling is estimating the start time and finish time of each task, because a child task cannot start executing until all of its parent tasks have finished. As mentioned earlier, tasks running on VMs often face unpredictable performance degradation; as a result, their finish times cannot be estimated accurately during the scheduling process.
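The start-time and finish-time estimation described above can be illustrated with a minimal Python sketch. It assumes that the task graph, the estimated run times, and the task-to-processor assignment are already given; the function name `schedule`, the four-task example graph, and all numbers are hypothetical and are not taken from the experiments in this chapter. Tasks are visited in topological order, which is one simple valid policy and not necessarily the one used later.

```python
from collections import deque


def schedule(durations, parents, assignment):
    """Return {task: (start, finish)} for a task graph.

    durations:  {task: estimated run time}
    parents:    {task: [parent tasks]}; entry tasks may be omitted
    assignment: {task: processor id} (assumed to be fixed in advance)
    """
    children = {t: [] for t in durations}
    pending = {t: len(parents.get(t, [])) for t in durations}
    for child, ps in parents.items():
        for p in ps:
            children[p].append(child)

    ready = deque(t for t in durations if pending[t] == 0)
    proc_free = {}  # time at which each processor becomes idle
    times = {}
    while ready:
        task = ready.popleft()
        proc = assignment[task]
        # A child may start only after every parent has finished.
        parents_done = max((times[p][1] for p in parents.get(task, [])),
                           default=0.0)
        start = max(parents_done, proc_free.get(proc, 0.0))
        finish = start + durations[task]
        times[task] = (start, finish)
        proc_free[proc] = finish
        for c in children[task]:
            pending[c] -= 1
            if pending[c] == 0:
                ready.append(c)
    if len(times) != len(durations):
        raise ValueError("task graph contains a cycle")
    return times


# Hypothetical example: A feeds B and C, which both feed D.
durations = {"A": 2.0, "B": 3.0, "C": 1.0, "D": 2.0}
parents = {"B": ["A"], "C": ["A"], "D": ["B", "C"]}
assignment = {"A": 0, "B": 0, "C": 1, "D": 0}

baseline = schedule(durations, parents, assignment)

# Same graph, but task B runs 50% slower than estimated.
slowed = dict(durations, B=durations["B"] * 1.5)
delayed = schedule(slowed, parents, assignment)
```

In this example the exit task D finishes at time 7.0 under the estimated run times, but at 8.5 once task B runs 50% slower, even though the run time of D itself is unchanged.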
Since a delay in the execution of one task of the task graph can have a cascading effect, it can make the whole scheduling scheme inefficient [176]. Therefore, to schedule a parallel application effectively on a virtualized system, it is necessary to analyze the performance impact of co-located VMs [177–186]. Performance prediction models for VMs can also be useful for improving energy efficiency [187–193]. Efficient use of energy has become a burning issue for data centers today [194–197]. A computing system achieves its highest efficiency only when all hardware components are fully active and the system is running at full load. Techniques such as temporarily shutting down one or more processor cores or some blocks of memory have been shown not to be energy efficient [198]. Virtualization can be an effective way of reducing energy consumption through the consolidation of VMs.
As mentioned above, the servers of many data centers remain under-utilized most of the time [174]. One way to increase utilization is to consolidate more VMs on the same server. Being able to run more VMs on the same server means higher server utilization for the same amount of physical resources. On the other hand, consolidating
too many VMs on the same physical server will degrade the performance of the VMs. A model that can predict the performance penalty of consolidation will help to choose the optimal number of VMs to consolidate [26, 182, 191, 199]. The model will indicate how much performance will be lost when a certain number of VMs are consolidated on a server.
In this scenario, the physical system remains fully active, while the number of consolidated VMs may be varied to achieve different performance-versus-utilization trade-offs. Obviously, consolidating more VMs on fewer running physical servers means more performance degradation, while running additional physical servers means better performance, but at the cost of extra energy consumption. Predicting such trade-off behavior with a model can help the system administrator make better decisions.
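As a rough illustration of how such a model could support this decision, the following Python sketch tabulates the trade-off between packing density and predicted slowdown. The penalty function `slowdown`, its parameters `alpha` and `beta`, and the helper names are placeholders invented for this example; they stand in for a real prediction model and do not describe the model developed in this chapter.

```python
import math


def slowdown(vms_per_server, alpha=0.08, beta=1.5):
    """Hypothetical penalty model: predicted run time relative to a VM
    running alone on the server (1.0 means no degradation)."""
    return 1.0 + alpha * (vms_per_server - 1) ** beta


def consolidation_options(total_vms, max_per_server, penalty=slowdown):
    """List (vms_per_server, servers_needed, predicted_slowdown)."""
    options = []
    for k in range(1, max_per_server + 1):
        servers = math.ceil(total_vms / k)
        options.append((k, servers, penalty(k)))
    return options


def fewest_servers_within(total_vms, max_per_server, max_slowdown,
                          penalty=slowdown):
    """Pick the option that needs the fewest servers while keeping the
    predicted slowdown within max_slowdown; None if nothing qualifies."""
    feasible = [o for o in
                consolidation_options(total_vms, max_per_server, penalty)
                if o[2] <= max_slowdown]
    if not feasible:
        return None
    # Fewest servers first, then the smaller predicted slowdown.
    return min(feasible, key=lambda o: (o[1], o[2]))


# Hypothetical scenario: 12 VMs, at most 6 per server, and a tolerated
# slowdown of 35% relative to running alone.
choice = fewest_servers_within(12, 6, 1.35)
```

Under these assumed parameters the sketch selects three VMs per server on four servers; a stricter slowdown limit would push the choice toward more servers and hence higher energy consumption.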
To build a performance model for co-located VMs, several factors need to be considered [200]. First, the relationship between the resource consumption of the various VMs is not linear. Second, many hardware parameters and counters have to be taken into account, which increases the dimensionality of the problem. Third, building such a model requires various benchmarks that can put stress on different system resources, both individually and collectively. Fourth, the experiments have to be done with a variable number of simultaneously running VMs: because the number of co-located VMs on a server may change at any time, the effect of that number on performance needs to be examined.
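To give a sense of how quickly the experimental space grows when both the workload type and the number of co-located VMs vary, the following Python sketch enumerates the possible workload mixes. The three workload labels and the limit of three VMs are assumptions chosen for illustration only, not the configuration used in the experiments.

```python
from itertools import combinations_with_replacement

# Hypothetical workload classes, one per stressed resource.
WORKLOADS = ("cpu", "memory", "disk_io")


def colocation_mixes(workloads, max_vms):
    """Yield every unordered mix of workloads for 1..max_vms
    simultaneously running VMs (one workload per VM)."""
    for n in range(1, max_vms + 1):
        yield from combinations_with_replacement(workloads, n)


mixes = list(colocation_mixes(WORKLOADS, 3))
```

With three workload classes and at most three VMs, this already gives 3 + 6 + 10 = 19 distinct mixes, and the count grows rapidly as more workload classes or VMs are added.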
It should be noted that although modern data centers are made up of heterogeneous hardware, this does not necessarily mean that every machine configuration differs from every other. Usually, hardware for a data center is purchased in large batches, and each batch consists of machines of similar configuration.
This chapter begins with experiments on consolidated VMs and tasks. Various benchmark suites are used to collect resource usage data from various VM combinations. Separate benchmarks are used to put stress on separate computing resources, and the resource consumption data from multiple VMs are collected.
A set of benchmarks is used to measure the usage of various VM resources such as CPU, memory, and disk I/O. In addition, an Online Transaction Processing (OLTP) benchmark is used to put stress on all three resources simultaneously. The benchmarks are run on VMs