to build a prediction model, which is then used to predict the execution time variation of another set of workloads.
The profiled data can be used with any machine learning technique to build a prediction model. In this dissertation, two techniques are used to build models: Least Squares Regression (LSR) and Artificial Neural Networks (ANN). The accuracies of the prediction models built with the two techniques are also presented and compared.
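As an illustration of how profiled data can feed such a model, the following is a minimal sketch of a least squares fit, not the implementation used in this dissertation. The feature columns (counts of CPU-intensive and I/O-intensive co-located VMs), the function names `fit_lsr` and `predict_lsr`, and all numbers are assumptions made for illustration; the data are synthetic and are not experimental results.

```python
import numpy as np


def fit_lsr(features, targets):
    """Fit ordinary least squares with an intercept term.

    Returns the coefficient vector [intercept, w_1, ..., w_k].
    """
    X = np.column_stack([np.ones(len(features)), np.asarray(features, dtype=float)])
    y = np.asarray(targets, dtype=float)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def predict_lsr(coef, features):
    """Predict targets for new feature rows using fitted coefficients."""
    X = np.column_stack([np.ones(len(features)), np.asarray(features, dtype=float)])
    return X @ coef


# Synthetic placeholder data (assumption, not measured):
# each row is [CPU-intensive co-located VMs, I/O-intensive co-located VMs].
train_features = [[1, 0], [2, 1], [3, 1], [4, 2], [5, 3]]
# Synthetic targets generated from y = 1.0 + 0.5 * cpu + 0.25 * io,
# standing in for the normalized execution time of a target VM.
train_targets = [1.0 + 0.5 * cpu + 0.25 * io for cpu, io in train_features]

coef = fit_lsr(train_features, train_targets)

# Predict for a combination that was not part of the training set.
unseen = [[6, 2]]
print(predict_lsr(coef, unseen))
```

Because the synthetic targets are exactly linear in the features, the fit recovers the generating coefficients; real profiled data would be noisy, and the prediction error would have to be evaluated on held-out workloads.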
1.3 Structure of the dissertation
This first chapter is the introduction. The remaining chapters are organized as described below.
• Chapter 2 discusses the background of virtualization. It describes the origin and development of virtualization technology over the decades and gives a classification of virtualization technologies. Many classifications of virtualization exist, so it is important to know how they differ and which ones matter for the Cloud and data centers.
The chapter also discusses how the various computing resources can be virtualized and the benefits of using virtualization in the Cloud and data centers. The basics of Grid computing, and how the Cloud emerged from Grid technology, are discussed. Furthermore, a classification of the various types of Cloud is provided.
• Chapter 3 begins the experiments with consolidated VMs and discusses the importance of the research. The experiments are conducted with nine different benchmark suites, which are described in the chapter. The benchmarks can be divided into four categories based on their resource usage intensities.
The first three categories are CPU-, memory-, and I/O-intensive benchmarks. The fourth category consists of multiple-resource-intensive benchmarks, that is, benchmarks that use several types of resources at the same time. The benchmarks are executed concurrently in consolidated VMs. In this chapter, experiments are conducted on the Xen hypervisor. The resource usage data of the consolidated VMs are collected
after the experiments have finished; they show how the resource usage pattern of the VMs changes as the number of consolidated VMs changes. The results of this chapter also helped in designing the experiments for the remaining chapters of the dissertation.
• Chapter 4 introduces and discusses the new consolidation benchmarking technique, called the ICBM [43]. The chapter describes how the ICBM is applied to the VMs of a hypervisor; in this case, the Xen hypervisor is used to run the VMs. The ICBM can be divided into several steps, which are described in the chapter.
For experimental purposes, the consolidated VMs are divided into two categories: target and co-located VMs. The target VM is monitored for performance variation, whereas the co-located VMs are used to create resource contention. Increasing the number of co-located VMs in the system is what creates the resource contention.
Various workloads are run on the co-located VMs, and the execution time variation of the target VM is collected. The profiled data are used to train LSR models, which are then used to predict the execution time variation of the target VM. Separate workloads are used in the co-located VMs during training and testing. Thus, the combinations of co-located VMs used for testing are different from those used for training.
The execution time variation of the target VM depends on the resource contention created by the co-located VMs. Thus, different combinations of co-located VMs affect the execution time of the target VM differently. Data are collected for a selected combination of co-located VMs, and the model is trained on them. The experimental results show that the LSR model can predict the execution time variation of the target VM for other combinations of co-located VMs.
This chapter contains material that was published as [43]. The author of this dissertation carried out all the work related to the publication: the author designed and conducted the experiments, collected and analyzed the data, and wrote the draft of the paper. Since the publication, the author has also added more experimental results and data to the chapter.
• Chapter 5 presents a framework for scheduling and profiling parallel workflows on VMs. The concept of the ICBM is also extended to parallel workflows. In the previous chapter, the ICBM is introduced for analyzing the performance of a group of VMs. The ICBM runs a group of tasks in the VMs according to predefined patterns. Those patterns are stored in a file beforehand; the scheduler uses the patterns to run the tasks on the VMs and to collect their performance variation data.
This chapter extends the concept of the ICBM to parallel workflows. A parallel workflow consists of many individual tasks; this chapter shows that the ICBM can be applied to those tasks, too. The framework is designed with several goals in mind, which are described in this chapter. The implementation of the framework is divided into several modules, which are also described. Finally, experiments are conducted with a real-world parallel workflow. The framework executed all the tasks of the workflow and collected the execution time variation data.
This chapter was previously published as [44]. The author of this dissertation designed and conducted all of the experiments. The author also collected and analyzed the experimental data and compiled the draft of the paper.
• In Chapter 6, the experiments are conducted with the ICBM on multiple hypervisors. Three hypervisors are set up for the experiments: ESXi, XenServer, and Xen.
The experimental results show that the execution time variation does not depend on the location of the VM on the server; rather, it depends on the total number of VMs. That is, VM performance is mainly affected by the total number of VMs on the system. If the total number of VMs in the system is increased, the performance degrades; if the number of VMs is decreased, the performance improves. Thus, the relationship is inverse.
Various resource-intensive benchmarks are run on the three hypervisors, and the execution time variation is profiled. The number of co-located VMs on the server is changed to vary the resource contention in the system; this resource contention is responsible for the execution time variation of the VMs.