
Network aware virtual machine allocation and decision tree based MapReduce run time prediction in the cloud


Academic year: 2021





University of Wollongong
Research Online
University of Wollongong Thesis Collection
2013

Network aware virtual machine allocation and decision tree based MapReduce run time prediction in the cloud
Jing Tai Piao
University of Wollongong

Recommended Citation
Piao, Jing Tai, Network aware virtual machine allocation and decision tree based MapReduce run time prediction in the cloud, Master of Information and Communication Technology - Research thesis, School of Information Systems and Technology, University of Wollongong, 2013. http://ro.uow.edu.au/theses/4279

Research Online is the open access institutional repository for the University of Wollongong. For further information contact the UOW Library: [email protected].


School of Information Systems and Technology

Network aware virtual machine allocation and decision tree based MapReduce run time prediction in the cloud

Jing Tai Piao

"This thesis is presented as part of the requirements for the award of the Degree of of the University of Wollongong"

08/2013

DECLARATION

I, Jing Tai Piao, declare that this thesis, submitted in fulfilment of the requirements for the award of Master of Information and Communication Technology by research, in the School of Information Systems and Technology, University of Wollongong, is wholly my own work unless otherwise referenced or acknowledged. The document has not been submitted for qualification at any other academic institution.

Signature:
Date: 14/08/2013

ACKNOWLEDGEMENTS

Completing this thesis took more than my own effort; it is the result of the hard work of many people. Here, I would like to express my deepest gratitude to all of those who sincerely helped me.

First, I would like to thank my supervisor, Dr. Jun Yan, for his continuous support and constant encouragement. I could not have completed this thesis without his guidance and advice. After I started my career I had little time to make progress on writing the thesis, but Dr. Jun Yan pushed and encouraged me through the hardest time.

Finally, thanks to my beloved wife, Ying Tian. I really appreciate that you took care of me when I did not have time to do the housework.

TABLE OF CONTENTS

CHAPTER 1 Introduction
  1.1 The concept of cloud computing
  1.2 The layers of cloud computing
  1.3 Advantage of cloud computing
  1.4 Statement of Problem
  1.5 Research Aims and Methodologies
  1.6 Thesis Outline
CHAPTER 2 Literature Review
  2.1 Introduction
  2.2 Basic Interior Network Architecture of Cloud Datacenter
  2.3 Major cloud computing platforms
  2.4 Virtual machine allocation policies
    2.4.1 Virtual machine allocation policies in practice
      2. Striping policy
      3. Packing policy
    2.4.2 Academic proposals of virtual machine allocation policies
  2.5 Virtual machine migration policies
    2.5.1 Virtual machine migration policies in practice
    2.5.2 Academic proposals of virtual machine migration
  2.6 Performance and computing resource prediction approaches
      1. Regression
      2. Clustering Techniques
      3. Support Vector Machine (SVM)
      4. Kernel Canonical Correlation Analysis (KCCA)
      5. Decision Tree Learning
  2.7 Requirement analysis
CHAPTER 3 Network aware virtual machine allocation and migration in cloud computing
  3.1 Introduction
  3.2 Research scenario and challenges
  3.3 The Virtual Machine Allocation Approach
  3.4 The Virtual Machine Migration Approach
  3.5 Algorithm complexity
  3.6 Summary
CHAPTER 4 Computational Resource Consumption Prediction
  4.1 Introduction
  4.2 Decision Tree Overview
    4.1.1 ID3 Decision Tree Algorithm
    4.1.2 C4.5 Decision Tree Algorithm
    4.1.3 CART Decision Tree Algorithm
  4.2 Introduction of MapReduce
  4.3 Prediction Model
  4.4 Scheduling Approach
  4.5 Summary
CHAPTER 5 Experiments
  5.1 Experiment of Network aware VM Allocation & Migration Policy
    5.1.1 Introduction
    5.1.2 Experiments process
  5.2 Experiment Evaluation and Results of Decision tree based Execution Time Prediction
  5.3 Workload Generation and Experiment process
  5.4 Experiment Result Evaluation
  5.5 Summary
CHAPTER 6 Conclusion and Future Works
  6.1 Summary
  6.2 Limitations
  6.3 Future Works
References
Appendix A The workload generation script

LIST OF FIGURES

Figure 1.1, Figure 1.2
Figure 2.1
Figures 3.1 to 3.3
Figures 4.1 to 4.5
Figures 5.1 to 5.12

LIST OF TABLES

Table 4.1

ABSTRACT

The emergence of cloud computing brings an entirely new computing paradigm which allows the user to request virtualized physical computing resources from a data center. These physical computing resources are presented in the form of virtual machines (VMs). Within a cloud datacenter, all of the computing resources are virtualized as a pool. The VMs share the computing resources in this pool, including CPU cores, memory and disks. Theoretically, the user could acquire infinite computing capability. In addition, computing resource virtualization facilitates the migration of a VM from one physical node to another so that downtime can be eliminated. However, current VM allocation and migration policies aim at maximizing the utilization rate of physical computing resources in the datacenter, and the network communication cost is largely ignored. As the scale of the data center grows, the logical distance between a VM and its data can become larger, so the communication cost increases.

This research aims at proposing a VM allocation and migration policy that takes network I/O performance into consideration so that the communication cost can be optimized. The allocation policy decides the initial physical placement of the VM in the datacenter, whereas the migration policy moves the VM to its next physical placement when the network status deteriorates. However, implementing the proposed VM allocation and migration policy may decrease the physical resource utilization rate. We therefore propose an approach to compromise between network status optimization and the decrease of the resource utilization rate. In this approach, we predict the execution time of the VM so that the available CPU time of a physical node is known before the VM is actually deployed. If the VM is allocated on the physical node with the maximum available CPU time, the utilization rate can be optimized. Among the nodes with better network status and the nodes with more available CPU time, we assign an index to each node so that we can strike a balance between communication cost and physical resource utilization rate.

CHAPTER 1 INTRODUCTION

This chapter gives an introduction to the original research work reported in this thesis, including its research context, research aims, research methodologies, and research outputs. First, a brief introduction to cloud computing and services is given to build the background knowledge for this research. In the next section, we dissect cloud computing and services into different layers and compare the main types of cloud computing. In the problem statement section, we narrow down to the particular research problems addressed in this thesis, and describe the aims and goals of this research. In the following section, we introduce the methodology employed in this research. Finally, the structure of the rest of the thesis is given.

1.1. The concept of cloud computing

In the past decades, the computing paradigm has experienced several revolutions, from the single standalone personal computer (PC), to client/server (C/S) computing, to peer-to-peer (P2P) distributed computing, and to cloud computing [36]. In the early age of computers, the majority of users were satisfied with working on a standalone computer. The advent of the C/S computing model brought the first revolution of the computing paradigm. Instead of interacting with an isolated PC, the C/S model allows the user, usually from a client computer, to access applications, data and other resources residing on remote servers. The client normally has less memory, storage space and processing power than the server. A server is typically designed to serve multiple clients, so the computational work is largely centralized on the server side.

In the third phase, the P2P paradigm was developed. In the P2P environment, the computational tasks and control are decentralized to multiple computers. In contrast to the traditional C/S paradigm, all of the computers on the network perform the same role. Each computer is recognized as a peer, and resources and services can be exchanged freely among peers. One of the most important subsets of P2P is the distributed computing paradigm, which allows idle computers across a network to dedicate their computing power to a large, processor-intensive project [36]. In the distributed computing environment, the involved computers run computing activities in their spare time and the results are uploaded to the distributed network periodically.

With the growth of the Internet, cloud computing has emerged as a new computing paradigm. In this phase, computing resources such as processing power and data storage can be exposed as services over the network [36]. As a revolutionary computing paradigm, cloud computing is gaining increasing attention in both academia and industry. The leading enterprises in the IT industry, such as Amazon, Google, Sun, Oracle, IBM, Microsoft and Apple, as well as non-profit organizations, are establishing their own cloud computing platforms by implementing unique cloud computing technologies that follow their business or technical strategies.

Literally, a cloud icon is used to represent the Internet in network diagrams. The concept of cloud computing therefore implies that computational services can be accessed via an Internet connection. The underlying infrastructure of cloud computing contains hundreds or even thousands of physical computers in a data center. This large number of computers is combined and interconnected by virtualization technology. Any physical computing resources, such as CPU, memory and disk space, can be merged logically into a new virtualized instance. All cloud computing services are built on such infrastructure virtualization. As shown in Figure 1.1, the software in cloud computing runs on top of the virtualization layer.

From the cloud user’s perspective, both the software and the underlying infrastructure of cloud computing are exposed as APIs. According to the definition of the National Institute of Standards and Technology (NIST) [51], cloud computing is a computing model which enables the user to access, in a ubiquitous and convenient way, a shared pool of configurable computing resources which can be rapidly provisioned and released based on the demand of the user. The physical foundation of the cloud consists of a massive number of computers which are virtualized as a single unit. The computing resources, including CPU time, memory, disk space and bandwidth, are merged into a pool. In cloud computing, these resources are exposed as services over the Internet in different layers (see Figure 1.1). All of these services are highly scalable, as the resources can be provisioned and released logically and virtually in the fundamental layer of cloud computing.

[Figure 1.1 elements: access to Software as a Service via web browser; access to Infrastructure as a Service via APIs; access to Software as a Service via APIs; software; virtualization layer; physical machines in the cloud underlying infrastructure]

Figure 1.1 The interaction between cloud computing services and end user

To sum up, the essential characteristics of cloud computing can be summarized as follows:

• On-demand self-service: Computing capabilities, such as CPU time and data storage, can be obtained automatically based on demand.
• Broad network access: Services of cloud computing should be available over the network through standard APIs.
• Resource pooling: The provider’s computing resources should be virtualized as a single pool to allow multiple users to access them simultaneously. The resources are consumed in a multi-tenant model. Users should only be charged for the resources they request.
• Rapid elasticity: Computing resources can be dynamically and automatically provisioned and released.
• Measured service: Resource usage can be transparently controlled, monitored and reported to both the service provider and the consumer.

As mentioned above, cloud computing does not only provision computing capabilities. In some cases, service providers offer applications which are built upon the fundamental cloud infrastructure. Thus, the services of cloud computing can be seen as multiple layers depending on the service level. In the next section, we study and examine the services of cloud computing in different layers.

1.2. The layers of cloud computing

As shown in Figure 1.2, all of the services in cloud computing fall into three categories, namely Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS).

[Figure 1.2 elements: vertical integration; Software as a Service (SaaS); Platform as a Service (PaaS); Infrastructure as a Service (IaaS); horizontal integration]

Figure 1.2 The Categories of Cloud Computing

The fundamental layer of cloud computing is called IaaS, which offers basic computing capabilities as services over the Internet. The basic computing capabilities include data storage space, CPU cycles, memory capacity, network bandwidth, and so on. Examples of IaaS include Amazon EC2 [2], Amazon S3 [4] and Microsoft Azure [33]. Amazon EC2 and Amazon S3 have become important parts of Amazon Web Services. In Amazon EC2, for instance, users are able to build an entire virtual system which resides on thousands of physical machines, and the scale and capacity of the system can be dynamically adjusted according to business demand and growth.

PaaS is built upon the cloud infrastructure. PaaS aims at offering a platform for application developers to build and deploy applications into the cloud infrastructure. It typically integrates operating systems, the programming language execution environment, databases, and web servers. The users of PaaS can develop and deploy their applications on a cloud platform. A well-known example of PaaS is Google App Engine [20]. Google App Engine integrates with development tools such as Eclipse. Developers can develop and run their applications on Google’s infrastructure. An application in Google App Engine starts to serve once it has been uploaded, and there is no server to maintain. These applications are highly scalable, depending on the growth of network traffic and data storage.

In the highest layer, the functionalities of software can be described and published as services. From the user’s perspective, these services are accessible remotely via the Internet, but the underlying infrastructure of these services is completely inaccessible and invisible to the user. In practice, two well-known examples of SaaS are Google applications (Google Apps) and Salesforce.com. Google Apps currently offers Google Docs, Google Calendar, the Google email service (Gmail) and other services. These services are available online but do not require the user to set up, configure or maintain any server.

1.3. Advantage of cloud computing

Cloud computing brings significant benefits in four respects:

• High flexibility. Cloud computing services are able to scale their computing or data storage resources up or down according to business requirements. Cloud computing resources can be added or released almost instantaneously, and the user only pays for the resources which are actually used.
• Cost efficiency. Compared with traditional IT solutions, cloud computing is a more economical choice to implement, maintain and upgrade. Because all cloud services are provisioned across the Internet, the user of cloud services only needs an inexpensive computer to access the Internet rather than purchasing or maintaining a number of expensive servers.
• Disaster recovery. Cloud services keep the customers’ data remotely in the data center. In a sense, this can be seen as an off-site backup of the customers’ IT infrastructure. Furthermore, most cloud service providers have disaster recovery systems in place to secure the data against disasters. For example, the Windows Azure [33] platform has embedded a disaster recovery mechanism into the heart of its system: its data storage service has built-in redundancy to secure its data and guarantee that the application remains available.
• Energy saving. Greenhouse gas emission has become a major concern for the IT industry. Cloud computing services reduce electricity consumption by implementing scheduling and load balancing algorithms so that the power utilization rate can be improved significantly, thereby reducing the emission of greenhouse gases.

1.4. Statement of Problem

Services of cloud computing lie on three different levels, i.e., infrastructure (IaaS), platform (PaaS) and software (SaaS). These services are easily accessible across the Internet and can also be flexibly tailored to satisfy the user’s varying demands. The users of cloud services do not need to purchase and maintain any hardware, and cloud services are charged only on the basis of usage. In practice, Amazon EC2 enables the user to request virtual computers in the form of instances from cloud data centers. These instances vary in cost, CPU frequency, memory and disk capacity. A small instance contains 1.7 GB of memory, 1 virtual core and 160 GB of local data storage. A medium instance contains 3.75 GB of memory, 1 virtual core and 410 GB of local data storage. Amazon EC2 [2] is highly scalable because an instance can be changed from one type to another within a few minutes. Compared with Amazon EC2, a traditional host upgrade requires extra expenditure of capital and time.

As the users have no control over the fundamental infrastructure of the cloud, the reliability and performance of the applications or services which reside on the cloud largely rely on the promises in the cloud service providers’ Service Level Agreements (SLAs). In terms of reliability, the well-known PaaS provider Microsoft Azure announces that it can achieve 99.9% - 99.95% [22], while Amazon EC2 claims that its reliability can reach 99.95% [33]. This availability commitment guarantees that the applications or services running on the cloud platform remain accessible. However, the performance of cloud services is largely ignored by the entire industry. For some data-intensive applications, for instance stock or currency analysis applications running on a cloud platform, time restriction can be a critical concern for the user. In this case, the result should be given within a certain time period; otherwise, the result would become worthless due to the fluctuation of the market.

Unfortunately, current cloud computing platforms are not performance-oriented; most of them concentrate on how to maximize the utilization rate of the underlying resources. As the infrastructure of the cloud data center is completely opaque to the user, the application could be hosted in a place that is far away from where its relevant data are stored. For instance, before a user executes a MapReduce application based on Amazon Elastic MapReduce, the user could save the related data into Amazon S3, which is another cloud data storage service supported by Amazon. In this case, the virtual machine would be allocated regardless of how fast the application can access the data. Once the virtual machine has been placed onto a physical machine with poor network I/O performance, the data interaction would become a bottleneck for the performance of the whole application. According to the literature review, however, contemporary virtual machine provisioning and placement approaches rarely consider optimizing the virtual machine allocation policy with respect to network I/O performance [22].

Besides, in the cloud data center, computing resource virtualization and processor sharing often result in unstable network I/O performance within a cloud environment. For example, the TCP/UDP throughput between small instances in Amazon EC2 frequently varies between 1 Gb/s and 0 [22]. Instead of keeping the VM on one physical node for its entire lifetime, migrating the virtual machine to a node with better network conditions would improve the overall performance of a cloud-based application. In practice, however, migrating VMs may decrease the utilization rate of the underlying resources.

To address this issue, we propose an approach to balance communication cost optimization against the resource utilization rate. In this approach, we first adopt a machine learning based algorithm to predict the execution time of the VM before it is actually deployed into the cloud. Thus, we can obtain the available CPU time on each physical node. We assign an available CPU time index, which is obtained by multiplying the available CPU time by a predefined weight nt, while the network status index is computed by multiplying the network aware allocation priority by another weight nn. The sum of the two weights (nt and nn) equals 1. The user is able to adjust the proportion of the weights to balance the computing resource utilization rate against the network communication status. Therefore, the VM allocation priority can be decided by comparing the sum of the available CPU time index and the network status index.

1.5. Research Aims and Methodologies

This research focuses on addressing the network I/O performance aware virtual machine allocation and migration problem and the computing resource demand prediction problem. The aims of this research are:

1. to investigate virtual machine allocation and migration policies with the consideration of network I/O performance;
2. to propose an estimation model that estimates the computing resource demand before the actual execution of an application within cloud computing, so that, based on the prediction, we can compromise between the computing resource utilization rate and the communication cost.

To achieve the first aim, a comprehensive literature review has been conducted. Previous research related to virtual machine migration and placement is reviewed and analyzed to identify gaps. Based on the comparison result, we propose a network aware virtual machine placement and migration approach. In the experiments, we have conducted a quantitative study to compare the application’s performance with and without migration.
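The weighted allocation index described in Section 1.4 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation evaluated in this thesis: the names Node, allocation_priority and pick_node are hypothetical, the variables n_t and n_n stand for the weights nt and nn, both inputs are assumed to be normalised to the range [0, 1], and a larger sum is assumed to mean a higher allocation priority.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Node:
    """A candidate physical node (hypothetical structure for illustration)."""
    name: str
    available_cpu_time: float  # assumed to be normalised to [0, 1]
    network_priority: float    # network aware allocation priority, assumed in [0, 1]


def allocation_priority(node: Node, n_t: float, n_n: float) -> float:
    """Sum of the available CPU time index and the network status index."""
    if abs(n_t + n_n - 1.0) > 1e-9:
        raise ValueError("the weights n_t and n_n must sum to 1")
    cpu_index = n_t * node.available_cpu_time
    network_index = n_n * node.network_priority
    return cpu_index + network_index


def pick_node(nodes: Sequence[Node], n_t: float, n_n: float) -> Node:
    """Return the node with the largest index (assumption: larger is better)."""
    return max(nodes, key=lambda node: allocation_priority(node, n_t, n_n))


# Illustrative values only; they are not measurements from the experiments.
nodes = [
    Node("node-a", available_cpu_time=0.9, network_priority=0.2),
    Node("node-b", available_cpu_time=0.3, network_priority=0.8),
]
cpu_heavy = pick_node(nodes, n_t=0.8, n_n=0.2)
network_heavy = pick_node(nodes, n_t=0.2, n_n=0.8)
print(cpu_heavy.name, network_heavy.name)
```

With these illustrative values, weighting CPU availability more heavily selects node-a, while weighting network status more heavily selects node-b, which shows how the user can shift the balance by adjusting the two weights.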

To achieve the second aim, we have utilized both quantitative and qualitative research to collect and analyze data. Firstly, we reviewed previous performance prediction approaches. Secondly, we chose a machine learning approach to predict the performance of the application. Thirdly, we collected the experiment data by observing a sample application. Using this data set as the learning set, we apply a machine learning algorithm to find the relationship pattern between computing resources, performance and application characteristics. Eventually, we propose a prediction based approach to balance the physical resource utilization rate against the communication cost.

1.6. Thesis Outline

The thesis reports our research on the network aware VM placement and migration policy as well as the prediction of VM execution time in the context of cloud computing. The rest of this thesis is organized as follows. In Chapter 2, previous related research is reviewed, and the advantages and limitations of the former approaches are examined and discussed. In Chapter 3, we introduce the proposed network aware virtual machine allocation and migration policy. In Chapter 4, the performance prediction of MapReduce applications is proposed. In Chapter 5, the experimental results are presented and discussed. Finally, in Chapter 6, the limitations and future works of this research are discussed.

CHAPTER 2 LITERATURE REVIEW

2.1. Introduction

In this chapter, we first study the general network architecture within a cloud data center to establish the research scope of this thesis. Secondly, this chapter reviews the former research in three areas, namely VM allocation, migration, and computing resource prediction in cloud computing.

In cloud computing, each VM gives the illusion of a dedicated physical machine, encapsulating the abstraction of computing resources and workloads. VM allocation in cloud computing means settling the VM at a position within the underlying physical infrastructure. A VM allocation policy is a mechanism implemented in a cloud datacenter to assign the VM. These policies are usually implemented to maximize the utilization rate of computing resources or to save energy in a cloud datacenter. In Section 2.4, we review and discuss previous allocation policies in both industrial implementations and academic proposals.

The VM migration studied in this thesis means migrating VMs in a cloud data center on the fly. The virtualization of the cloud infrastructure allows VMs to be migrated from one physical machine to another without any disruption of the running tasks. Currently, live VM migration mechanisms already play an important role in achieving the goals of load balancing, energy saving, disaster recovery and system maintenance [49]. In Section 2.2, we present the basic interior network architecture of a cloud data center.

In Section 2.5, we review and discuss the live migration mechanisms in both industrial implementations and academic research. It can be seen that there is a tradeoff between the maximization of performance and the resource utilization rate [50]. Arbitrarily allocating and migrating the VMs in pursuit of the best performance would result in the VMs being distributed unevenly and widely across the entire physical infrastructure, so the utilization rate of the physical machines would be degraded. To balance the performance and the resource utilization rate, we propose a scheduling mechanism based on application resource consumption prediction. The existing research in this area is reviewed and discussed in Section 2.6.

2.2. Basic Interior Network Architecture of Cloud Datacenter

This section aims at providing an overview of the network architecture within the cloud datacenter. In a typical cloud datacenter, a Data Center Network (DCN) is built to interconnect tens, sometimes hundreds, of thousands of servers to deliver various cloud services to the public [14]. These servers host the VMs, and each VM is associated with one or multiple publicly visible and routable IP addresses because a VM may reside across multiple servers. Here, the publicly exposed IP address is used to allow the user to send requests, and it is named the virtual IP address (VIP), whereas the direct IP address (DIP) is used for internal routing. The typical network architecture of a cloud datacenter is shown in the diagram below [21]:

(23) Figure 2.1 The basic network topology of cloud data center

First, users' requests reach the layer 3 border routers (BRs) and access routers (ARs) according to their external IP addresses. Second, the BRs and ARs identify the associated layer 2 domain based on the destination VIP. The VIPs are preconfigured on the load balancers (LBs), which are connected to the top-level switches. For each VIP, the LB holds a list of DIPs mapped to it, and each DIP points to a particular server in the rack below the LB.

2.3. Major cloud computing platforms

Amazon is one of the best-known commercial cloud computing service providers. All of the Amazon cloud-based web services can be accessed remotely. The core Amazon cloud computing services include Amazon S3 [4], Amazon Elastic Compute Cloud (EC2) [2] and Amazon MapReduce [3]. Amazon S3 is a cloud-based data storage service that allows users to store their data in Amazon cloud data centers. Amazon declared that there were 905 billion objects in Amazon S3 at the end of the first quarter of 2012. The Amazon cloud data center is able to handle

(24) up to 650,000 requests per second for those objects at peak times. Amazon EC2 allows users to request virtualized computing resources in the form of virtual machines. The sizes of the virtual machines are predefined by Amazon, and several instance types are available for users to choose from. Users can not only choose instances freely to host their applications but also combine those instances into a virtual cluster. In addition, Amazon opens its API to users so that they can develop customized applications that access the Amazon cloud services.

Eucalyptus [13] provides a Linux-based IaaS that is fully compatible with the Amazon APIs, so that users can manage their instances in both Eucalyptus and Amazon. In addition, instances in Eucalyptus can be migrated into Amazon seamlessly.

Windows Azure provides IaaS and PaaS offerings that aim to host web-based applications. Azure allows users to build their applications using different programming languages and technologies, such as ASP.NET, PHP and Node.js. It also allows users to create either Windows Server or Linux virtual machines on its physical infrastructure.

2.4. Virtual machine allocation policies

2.4.1. Virtual machine allocation policies in practice

Creating a virtual machine on a traditional server, such as Windows Server 200X or a Linux server, means that part of the computing resources of the physical host, such as CPU capacity, memory or disk space, is assigned to the virtual machine. This step is usually performed manually by an experienced system administrator. The size of the virtual machine is based on the actual requirements or on the experience of the system administrator. Once the virtual machine has been created, the

(25) underlying load balancing mechanism takes over control of running the virtual machine. If the virtual machine is hosted by a group of computers, say a cluster, the load balancing mechanism allocates the virtual machine to an idle computer.

In cloud computing, hundreds or even thousands of virtual machines may run simultaneously, and each virtual machine has different requirements with regard to CPU, memory, disk space and so on. For this reason, it is nearly impossible to create a general configuration or to set the machines up manually. Instead, data centers implement a series of mechanisms or algorithms to organize and manage the virtual machines automatically. These mechanisms focus on different aspects: some on reducing energy consumption, some on increasing the utilization rate of physical computing resources, and others on optimizing system performance. The popular virtual machine allocation policies are described and analyzed below.

1. Round robin policy

The simplest virtual machine allocation policy is round robin. This policy is the default scheduling policy in the Eucalyptus cloud platform [13]. Round robin iterates through the available hosts until it finds a host with sufficient free resources to accommodate the virtual machine. The process is repeated for each subsequent virtual machine until all of the VMs are allocated. Round robin follows the "first in, first served" (FIFS) principle, and a VM is simply allocated to a physical node that has sufficient resources, so the usage of physical resources cannot be maximized [13].
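The round robin policy can be illustrated with a minimal Python sketch. The `Host` and `VM` classes, their fields and the resource units are hypothetical names introduced only for this illustration. The text does not say where the scan for the next VM starts, so the sketch assumes that it resumes at the host after the one used last, which is the usual round robin behaviour.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    free_cpu: float   # hypothetical unit, e.g. cores
    free_mem: float   # hypothetical unit, e.g. GB


@dataclass
class VM:
    name: str
    cpu: float
    mem: float


def round_robin_allocate(vms, hosts):
    """Place each VM, in arrival order, on the first host that fits.

    Assumption: the scan for the next VM resumes at the host after the
    one used last; the text only says hosts are checked iteratively.
    """
    placement = {}
    cursor = 0  # index of the host where the next scan starts
    for vm in vms:
        for step in range(len(hosts)):
            idx = (cursor + step) % len(hosts)
            host = hosts[idx]
            if host.free_cpu >= vm.cpu and host.free_mem >= vm.mem:
                # Reserve the resources and remember the placement.
                host.free_cpu -= vm.cpu
                host.free_mem -= vm.mem
                placement[vm.name] = host.name
                cursor = (idx + 1) % len(hosts)
                break
        else:
            placement[vm.name] = None  # no host has enough free resources
    return placement


hosts = [Host("Host1", 4, 8), Host("Host2", 4, 8), Host("Host3", 4, 8)]
vms = [VM("VM1", 2, 4), VM("VM2", 2, 4), VM("VM3", 2, 4)]
result = round_robin_allocate(vms, hosts)
```

With three equally sized hosts, the three VMs end up on three different hosts even though a single host could hold two of them, which is the utilization weakness noted above.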

(26) 2. Striping policy

Striping first discards all hosts that do not have enough resources. Among the remaining hosts, it chooses the host that currently hosts the fewest virtual machines. The process is repeated until all of the VMs are allocated. This policy is adopted by the OpenNebula cloud platform [12]. The striping policy aims to maximize the resources available to the VMs on a host, but it spreads the VMs widely across the entire infrastructure.

3. Packing policy

The Packing policy is the opposite of the Striping policy. For each new virtual machine, it first discards all hosts that do not have the available resources to host the virtual machine. From the remaining hosts, it finds the one that is currently hosting the greatest number of virtual machines and assigns the virtual machine to that host. This process continues until all VMs are allocated. The Packing policy seeks to consolidate the virtual machines onto as few hosts in the host pool as possible. It is currently available as one of the built-in policies in the OpenNebula cloud platform, and is implemented as the Greedy policy option in Eucalyptus [27].

2.4.2. Academic proposals of virtual machine allocation policies

This section reviews previous literature on virtual machine allocation policies. The review shows that previous research concentrates on sharing physical computing resources more effectively in time or space, while the influence of network I/O performance is largely ignored.

In [45] and [44], the virtual machine allocation problem is formulated as a Constraint Satisfaction Problem (CSP). In their approach, the CPU and memory capacities of a physical machine $P_j$ are denoted by $C_j^{CPU}$ and $C_j^{RAM}$, respectively.

(27) Let the vector $H_j = (h_{j1}, h_{j2}, \ldots, h_{jv})$ denote the VMs that are running on $P_j$, and let $R = (r_1, r_2, \ldots, r_v)$ denote the resource requirements of each VM. The resource constraints can then be represented as:

$$\sum_{l=1}^{v} r_l^{CPU} \cdot h_{jl} \le C_j^{CPU}, \quad 1 \le j \le q$$

$$\sum_{l=1}^{v} r_l^{RAM} \cdot h_{jl} \le C_j^{RAM}, \quad 1 \le j \le q$$

This means that the total computing resources that can be assigned to VMs are constrained by the total resources of the physical machine. The solution of this problem minimizes the total number of activated physical machines in the data center.

The approach proposed in [15] focuses on maximizing the economic revenue of the cloud service provider by introducing a Combinatorial Auction. Here, the problem of allocating VMs is modeled as a winner determination problem.

In [15], the researchers take data I/O performance into account when allocating data replicas for MapReduce applications in the cloud. The location of data in the cloud is balanced by round-robin and serpentine allocation policies so that the data blocks are distributed evenly across the physical nodes. Even though this approach balances the data transactions among the data nodes, it relies on historical data transaction statistics and therefore cannot balance the resource allocation at runtime.

In general, previous computing resource allocation approaches largely ignore network I/O performance, so that VMs may be allocated at a non-optimized physical distance from the relevant data. This situation causes extra data read and write overhead and may eventually lead to the degradation of application completion

(28) time. For data intensive applications in cloud computing, a group of virtual machines is created and combined into a virtual cluster (e.g. Amazon Elastic MapReduce). In this case, the performance of the entire application is determined not only by how many computing resources are assigned to the virtual machines but also by the network I/O performance between the parts of the virtual cluster. For a data intensive application, performance may therefore not improve even if more physical resources are assigned, which leads to a large waste of money for the end user.

2.5. Virtual machine migration policies

Virtual machine migration in cloud computing usually means that a virtual machine can be seamlessly transferred from one physical machine to another on the fly. During the transfer, neither the application nor the end user should be aware that the migration is occurring. Virtual machine migration is important to cloud computing for load balancing, energy saving and failure recovery. It allows the cloud to transfer virtual machines from a busy physical host to an idle one and thereby maximize the utilization rate of physical computing resources and the energy efficiency. If a fatal error occurs on a physical machine so that it can no longer host a virtual machine, the migration mechanism can transfer the virtual machine to a healthy physical machine, so that the availability of cloud computing is increased dramatically compared with traditional hosting technologies. In fact, Amazon EC2 claims an availability of up to 99.95% for its hosting service, which means that the monthly downtime of a virtual machine is less than 21.6 minutes (0.05% of a 30-day month). The migration approaches that have been adopted are discussed in more detail below:

(29) 2.5.1. Virtual machine migration policies in practice

The best-known live migration approach is Pre-copy [7][32], which has been implemented in the Xen and VMware virtualization platforms. This practical migration policy emphasizes migration efficiency in order to overcome issues that arise during migration, such as migration failures, migration conflicts and migration thrashing [48]. In this approach, migration serves load balancing, online maintenance and energy saving, but the network status is ignored entirely.

2.5.2. Academic proposals of virtual machine migration

The research presented in [48] concerns the migration of virtual machines at runtime. The usage of CPU, memory and network bandwidth on the physical machines is monitored, and the historical resource consumption of the virtual machines is recorded to predict whether a physical machine will become overloaded in CPU, memory or network bandwidth. Migration is triggered once an overload trend is detected. However, this approach relies mostly on the analysis of historical data to predict the future computing resource demands of the virtual machine. Other research, including [1] [24] [30] [29], aims at migrating VMs for different purposes, such as improving power efficiency and satisfying performance requirements. Some migration approaches also consider network performance and communication costs when determining migration policies. However, the existing research relies heavily on statistical methods to migrate the files that communicate with each other most frequently. Thus, the optimization procedure can only be conducted after a relatively long period in which statistics are collected. This drawback makes statistics-based VM allocation or migration approaches hard to apply in a

(30) runtime circumstance. If the applications communicate with their data only for a short time, the lack of statistics may make these allocation or migration approaches inapplicable.

2.6. Performance and computing resource prediction approaches

The problem of predicting the computing resource consumption of applications has been studied for a long time. It is desirable to predict the computational resource requirements of an application before the application is actually run. In distributed computing, such as grid computing and cloud computing, the accuracy of prediction is significant for improving the efficiency of resource utilization and guaranteeing application performance. The prediction approaches are derived from the basic assumption that applying the same computing resources to similar applications yields similar performance [42]. Previous research models an application with various features and measures its performance with given computing resources. The application feature vectors and performance data are combined into a training set, and a reasonably reliable prediction can be made by applying statistical machine learning algorithms to this training set. In the rest of this chapter, we summarize the major machine learning algorithms that have been used to predict computing resource consumption or performance.

1. Regression

Linear regression is the simplest machine learning algorithm. It assumes that the relationship between the dependent variable (y) and the independent variables (x) can be described by a linear function. Theoretically, it can be expected that the completion time of an application decreases as the computing resources it consumes increase. This approach is applied in [1] to simplify the sophisticated modeling

(31) problem for medical image processing applications. In practice, however, increased computing resources may not be related to the decline in completion time in a linear way [11]. Rather, a number of approaches have proven to be more accurate than the linear model [39] [16].

2. Clustering Techniques

Clustering techniques partition the data set according to the similarity of the variables. Traditionally, clustering techniques classify the training data by defining a distance between variables in the dataset. Even though clustering techniques such as K-means can be used to identify the nearest data within the sample space, it is difficult to model the similarity of application features, because the characteristics of an application are normally independent of each other.

3. Support Vector Machine (SVM)

Support vector machines (SVMs) have the advantage of handling a large number of attributes in non-linear scenarios. Their disadvantages are that they introduce extra computational overhead and rely heavily on the size of the training set [11].

4. Kernel Canonical Correlation Analysis (KCCA)

In [39] [16], the characteristics of an application are represented in multiple dimensions in a space, so that the Euclidean distance between applications in that space describes their similarity. The advantage of this kind of approach is that the prediction can be highly accurate. However, the accuracy usually relies on abundant historical data, and the computational complexity is relatively higher than that of other approaches.

(32) 5. Decision Tree Learning

Decision tree learning has been adopted in some research to study the relationship between application performance and application features. Traditionally, a decision tree is a data mining method used to predict the class of a target variable from a set of samples. For a given training set, a decision tree is constructed by iteratively splitting the source set into subsets based on tests of one or several attributes. At each node of the tree, a separate criterion is used to divide the training set into different categories. The splitting process continues until every leaf of the tree is pure.

ID3 and C4.5 are two classical algorithms for generating a decision tree. In ID3, the concept of information gain is borrowed from information theory to select the best classifier among the attributes: the attribute that generates the maximum information gain is chosen first as a classifier. The information gain is calculated from the entropy of each subset. The formula below shows the computation of the entropy of a set S:

$$E(S) = -\sum_{i=1}^{n} \mathrm{freq}(C_i, S) \log_2 \mathrm{freq}(C_i, S)$$

where

• $E(S)$ is the total entropy of the set S.
• $n$ is the number of different classes in the set S.
• $\mathrm{freq}(C_i, S)$ is the relative frequency with which class $C_i$ occurs in the set S.

The information gain can be computed by the following formula:

(33)

$$G(S, A) = E(S) - \sum_{i=1}^{m} \mathrm{freq}(A_i, S)\, E(S_{A_i})$$

where

• $G(S, A)$ is the information gain after the split.
• $E(S)$ is the information entropy of the set S.
• $\mathrm{freq}(A_i, S)$ is the relative frequency with which the value $A_i$ of attribute A occurs in the set S.
• $S_{A_i}$ is the subset of S in which attribute A takes the value $A_i$.
• $m$ is the number of different values of the attribute A in the set S.

Thus, for a given set S, the decision tree can be constructed by following the ID3 process below:

• Among all of the attributes A = {A1, A2, A3, …, Am}, select the attribute Ax that generates the maximum gain as a classifier to split the whole set S into S+ and S-.
• Use the subset S+ or S- as a new node and compute a new splitter by finding the maximum information gain again.
• Iterate the splitting process until the entropy equals zero.

The algorithm C4.5 is an extension of ID3 in the following respects:

• C4.5 can deal with attributes that have continuous ranges.
• C4.5 can classify training sets with unknown attribute values.

If we model the historical execution information of applications as the training set and the application characteristics as attributes, it is feasible to use a decision tree learning algorithm to classify the performance of the applications. By learning from the tree, it is possible to predict the performance of an incoming application before actually executing it. So far, the decision tree algorithm C4.5 has been used in [17] to classify applications into different time intervals. However, the time intervals are

(34) static so that the control granularity cannot be optimized. In [17] [25] [36] [40] [43], the application attributes are selected and grouped into subsets so that the search can be done over templates instead of a tree structure. The characteristics of an application are described as a template, such as several predefined attributes, so that the similarity between applications can be judged by comparing each attribute. These studies take advantage of the fact that decision tree algorithms ignore the correlation between attributes and classify the training set according to its characteristics only. In addition, the results show that the prediction accuracy of the template approach is better than that of adopting complex regression as the characteristic split method [43].

2.7. Requirement analysis

According to the discussion and analysis in this chapter, the existing virtual machine allocation and migration approaches do not take network I/O performance into account. The first goal of this research is therefore to propose an approach that detects the network conditions in the cloud and allocates each virtual machine to a physical host with those conditions taken into consideration. Because the network conditions within a cloud change constantly, a virtual machine migration approach should be developed as a sub-goal.

Allocating and migrating virtual machines with a bias towards network I/O conditions would decrease the utilization rate of physical resources. For this reason, the second goal of this research is to develop a trade-off mechanism to balance the network

(35) I/O performance and the physical computing resource utilization. To achieve this goal, once the candidate hosts for allocation or migration have been found, the available computing resources, or the available time of the computing resources, on these candidate hosts should be detected and compared as well. If each candidate host has an allocation/migration index W, then the value of W should consist of a network I/O performance index WN and a computing resource index WR. By applying weights to the network I/O performance and to the available computing resources (or their available time), we can increase the weight of network I/O if network I/O is more important than the computing resource utilization rate, and conversely increase the weight of computing resources if their utilization rate is more important. By comparing the values of the index W, the final position of the VM can be decided and a balance between network I/O and computing resource utilization rate can be achieved.
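The weighting idea behind the index W can be illustrated with a small Python sketch. The text states only that weights are applied to the network index and the resource index. The weighted sum used here, the normalisation of both indices to the range [0, 1], the function names and the sample values are all assumptions made for illustration.

```python
def allocation_index(w_n, w_r, weight_network=0.5):
    """Combine the two indices into one score W (higher is better).

    w_n: network I/O performance index of a candidate host, in [0, 1]
    w_r: computing resource index of the same host, in [0, 1]
    weight_network: share given to network I/O; the rest goes to resources

    Assumption: W is a convex combination of the two indices. The text
    only says that weights are applied, not how the terms are combined.
    """
    if not 0.0 <= weight_network <= 1.0:
        raise ValueError("weight_network must lie in [0, 1]")
    return weight_network * w_n + (1.0 - weight_network) * w_r


def best_host(candidates, weight_network=0.5):
    """Return the name of the candidate host with the largest index W."""
    return max(
        candidates,
        key=lambda name: allocation_index(*candidates[name], weight_network),
    )


# Hypothetical candidates: name -> (network index, resource index).
candidates = {
    "Host1": (0.9, 0.2),   # close to the data, but nearly full
    "Host2": (0.4, 0.8),   # farther from the data, plenty of capacity
}

network_first = best_host(candidates, weight_network=0.8)
resource_first = best_host(candidates, weight_network=0.2)
```

With these made-up values, a high network weight selects Host1 and a high resource weight selects Host2, which is the trade-off the requirement analysis asks for.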

(36) CHAPTER 3 NETWORK AWARE VIRTUAL MACHINE ALLOCATION AND MIGRATION IN CLOUD COMPUTING

3.1. Introduction

Relying on the virtualization of the underlying physical computing resources, cloud computing brings a completely new computing paradigm in which applications and their data no longer physically reside on users' computers. Instead, applications and data are deployed on virtualized computers, which can be seen as the concrete form of cloud computing. The formal name of these virtualized computers is Virtual Machine, also known as VM. VMs are built on virtualized physical resources, which means that the computing resources of a VM may not belong to the same physical computer, or even to any particular physical host. From the user's perspective, these VMs function fully as normal computers for processing and storing data. The capacity of a VM can be set up and modified during its entire life cycle by changing values in the cloud user interface. The cost of updating a VM in the cloud is much less than that of upgrading a physical host, and the underlying VM load balancing and migration are entirely invisible to the end user.

According to the literature review in the previous chapter, the currently implemented VM allocation mechanisms aim at maximizing the utilization rate of computing resources. From the network perspective, the application and its data are allocated arbitrarily within the data center. Thus, a VM could be allocated at an unoptimized logical or physical distance from its data. To illustrate this problem, consider the following case. Suppose there are three VMs, namely VM1, VM2 and VM3, and that these VMs will host three different data intensive

(37) applications, respectively. Assume that each VM processes its own associated data, i.e., VM1 is associated with Data1, VM2 with Data2 and VM3 with Data3. The data is distributed as shown in Figure 3.1. Data1 is stored as two segments, Segment1 and Segment2: Segment1 is stored on Physical Host1, whereas Segment2 is stored on Physical Host2. Data2 is stored on Physical Host3 and Data3 is stored on Physical Host2. Under a traditional VM allocation policy (e.g. Round Robin), the first VM, VM1, is assigned to Host1. Then, VM2 finds the next host with maximum available computational resources, namely Host2. This process is repeated, so that VM3 is allocated on Host3. In this case, however, the application on VM2 has to go through Router2 and Router3 to access Data2. Similarly, the application on VM3 has to go through Router3 and Router2 to reach Host2. If the applications on VM2 and VM3 are data intensive, the communication cost could be huge and the overall application performance may be negatively affected.

For data intensive applications, the volume of data is typically in the range of terabytes or even petabytes. In this case, the total processing time of the application is largely spent on data I/O. If the network status is not considered when allocating VMs, the unnecessary communication cost between a VM and its data can decrease performance.

(38) [Figure 3.1 diagram labels: Data1 (Segment 1), Data1 (Segment 2), Data3, Data2; Data Hosts 1-3; Routers 1-3; VM1-VM3; Physical Hosts 1-3]

Figure 3.1 The result of implementing the Round Robin to allocate VMs

Similarly, Figure 3.2 shows the result of VM allocation under the packing policy. Among all physical hosts with sufficient resources to host the VM, the packing policy finds the physical machine that already hosts the largest number of VMs. This process continues until all of the VMs are hosted. As a result, all three VMs are allocated on the same physical host. In this circumstance, the computing resource utilization rate is maximized, and Physical Host2 and Physical Host3 can run in an idle state so that power consumption is reduced. However, the logical distance between VM2 and its data has increased, so the performance of the application on VM2 could be worse than under Round Robin in Figure 3.1. The application on VM3 can access its data via Router 1 and Router 2, so its logical distance is not increased compared with Figure 3.1.

(39) [Figure 3.2 diagram labels: Data1 (Segment 1), Data1 (Segment 2), Data3, Data2; Data Hosts 1-3; Routers 1-3; Physical Hosts 1-3; VM1-VM3]

Figure 3.2 The result of implementing the packing policy to allocate VMs

From the network perspective, the best situation is most likely the one shown in Figure 3.3. As most of Data1, which is associated with VM1, resides on Physical Host1, VM1 should be allocated on this node so that it is closer to its data. VM2 should be allocated on Physical Host3, whereas VM3 should be allocated on Physical Host2, to reduce the logical distance and the data transmission cost.

(40) [Figure 3.3 diagram labels: Data1 (Segment 1), Data1 (Segment 2), Data3, Data2; Data Hosts 1-3; Routers 1-3; VM1-VM3; Physical Hosts 1-3]

Figure 3.3 Allocating VMs based on the proposed network aware VM allocation policy

Thus, the proposed network aware VM allocation policy should achieve two goals. First, it should be able to describe the distribution of the data in the cloud data center. Second, it should be able to calculate the logical distance between a VM and its data so that the best position of the VM with regard to the network status can be found.

In addition, as the network bandwidth is shared by a great number of applications simultaneously, bandwidth may be assigned to and released from a VM frequently. Thus, the best physical position from which a VM accesses its data may not be permanent. It is necessary to find a way to monitor the network status dynamically and migrate VMs accordingly.

(41) 3.2. Research scenario and challenges

To achieve the goal of optimizing network I/O between a VM and its data, a few challenges must be addressed. In cloud computing, data is likely to be physically distributed across multiple physical nodes. For example, Data1 in Figure 3.1 is split into Segment1 and Segment2. The first challenge is therefore to find a mathematical model that represents the position of the data and the data access speed at the same time. In addition, dynamically allocating or migrating VMs with consideration of the network status should optimize the communication cost between a VM and its data. However, this approach may have the side effect of decreasing the utilization rate of the underlying physical resources. As shown in Figure 3.3, if VM2 and VM3 are allocated on Physical Host3 and Physical Host2 respectively, the utilization rate of the underlying physical resources decreases. Therefore, the second challenge is how to retain the utilization rate of physical resources when considering network I/O.

To address the first challenge, we introduce two matrices to model the relationship between the data distribution and the data transfer time, as described in the next section. To resolve the second issue, we introduce a machine learning based mechanism to predict the running time of an application before allocating it on a physical machine. By doing so, we can schedule the VMs based on time rather than space so as to optimize the utilization rate of computing resources. This approach is discussed in detail in Chapter 4, Section 3.

In terms of virtual machine migration, there are two challenges as well. The first challenge concerns the conditions that trigger migration. As the network status

(42) may constantly change in the cloud computing environment, the network I/O performance can sometimes become intolerable (e.g. in case of a router failure). In this case, VMs should be moved to new physical hosts to retain performance. The second challenge is how to find the new physical machine to host the VM when migration is needed.

3.3. The Virtual Machine Allocation Approach

In this section, the proposed network aware VM allocation approach is introduced. In this approach, the time needed to transfer a standard network packet between physical nodes can be measured by the upper network layer. We assume that there are m applications, associated with a set of data $Data = \{d_1, d_2, \ldots, d_m\}$ respectively. According to how the data is distributed, a data distribution matrix $D_{n,m}$ can be constructed as below:

$$D = \begin{bmatrix} d_{1,1} & d_{1,2} & \cdots & d_{1,m} \\ d_{2,1} & d_{2,2} & \cdots & d_{2,m} \\ \vdots & \vdots & \ddots & \vdots \\ d_{n,1} & d_{n,2} & \cdots & d_{n,m} \end{bmatrix}$$

The number of columns m is the total number of data sets that will be accessed by the applications. The number of rows n is the total number of data storage nodes. For example, if there is only one application in the entire cloud infrastructure, and its data is split into two blocks that are each stored on a particular physical node, then we can use the notation $d_{1,1}$ and $d_{2,1}$ to represent these two data blocks. The matrix D can be built as below:

$$D = \begin{bmatrix} d_{1,1} \\ d_{2,1} \end{bmatrix}$$
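As a minimal illustration of how such a matrix could be assembled, the Python sketch below builds D from a list of stored blocks. The function name, the 0-based indices and the block sizes are hypothetical and are not taken from the thesis.

```python
def distribution_matrix(blocks, n_nodes, m_datasets):
    """Build the n x m matrix D from (node, dataset, size) triples.

    blocks: iterable of (node_index, dataset_index, size), 0-based indices
    Rows of D are data storage nodes, columns are data sets.
    """
    D = [[0] * m_datasets for _ in range(n_nodes)]
    for node, dataset, size in blocks:
        D[node][dataset] += size
    return D


# One application whose data is split into two blocks on two storage nodes.
# The sizes (40 and 60, in arbitrary units) are made up for illustration.
D = distribution_matrix([(0, 0, 40), (1, 0, 60)], n_nodes=2, m_datasets=1)
```

Here `D` is a 2 x 1 column, matching the single-application example: one row per storage node and one column for the application's data.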

(43) The sum of a column, $\sum_{i=1}^{n} d_{i,t}$, is the size of the data associated with a particular application t. So, if an application t will process data $d_t$, the size of $d_t$ can be computed by the formula below:

$$d_t = \sum_{i=1}^{n} d_{i,t}, \quad d_t \in Data$$

If there are i physical nodes $P = \{P_1, P_2, \ldots, P_i\}$ available to host applications and j physical nodes $O = \{O_1, O_2, \ldots, O_j\}$ available to store the data, we can use a matrix $S_{i,j}$ to describe the data transfer speed between each application host and each data node. The number of rows n in D and the number of columns j in S have the same meaning: they both represent the number of data storage nodes. Therefore, j = n. In the matrix S, each cell $S_{i,j}$ denotes the network speed between the physical machine $P_i$ and the associated data storage node $O_j$. For example, the cell $S_{2,3}$ represents the data transfer speed between the physical node $P_2$ and the data storage node $O_3$, where $P_2 \in P$, $O_3 \in O$. In addition, the network speed $S_{i,j}$ can be represented through a function Speed that is associated with the time slot $\Delta t$ used to transfer a standard network packet p. So, we have

$$S_{i,j} = 1 / Speed(p, \Delta t)$$

So, we have the matrix $S_{i,j}$ as below:

(44)

$$S = \begin{bmatrix} s_{1,1} & s_{1,2} & \cdots & s_{1,j} \\ s_{2,1} & s_{2,2} & \cdots & s_{2,j} \\ \vdots & \vdots & \ddots & \vdots \\ s_{i,1} & s_{i,2} & \cdots & s_{i,j} \end{bmatrix}$$

The data access time matrix $T_{i,m}$, which represents the data access time from each physical machine to the related data, can be computed by the following formula:

$$T = [S \times D]_{i \times m}$$

For instance, the data access time matrix

$$T = \begin{bmatrix} 3 & 4 \\ 2 & 5 \end{bmatrix}$$

contains the following information:

• The number of rows i = 2 indicates that there are two physical machines available to host VMs. We use the notation h1 and h2 to denote these two physical machines.
• The number of columns m = 2 indicates that two sets of data are involved. We use the notation d1 and d2 to denote them.
• The values in the first row indicate that the access time from h1 to d1 is 3 milliseconds, whereas the access time from h1 to d2 is 4 milliseconds.
• The values in the second row indicate that the access time from h2 to d1 is 2 milliseconds, whereas the access time from h2 to d2 is 5 milliseconds.

For an arriving application that intends to access d1, it is desirable to place the VM that will host this application on h2, because h2 has the shortest data access time.
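The computation of T and the choice of the host with the shortest access time can be sketched in Python as follows. This is an illustration under assumptions rather than the thesis implementation: the function names are hypothetical, indices are 0-based, and the values of S and D are invented so that their product reproduces the example matrix T above.

```python
def access_time_matrix(S, D):
    """Return T = S x D as a list of rows.

    S: i x j matrix, S[h][k] = time per unit of data between host h and
       storage node k (the inverse of the link speed)
    D: j x m matrix, D[k][t] = amount of data set t stored on node k
    T: i x m matrix, T[h][t] = time for host h to read all of data set t
    """
    if any(len(row) != len(D) for row in S):
        raise ValueError("columns of S must match rows of D")
    m = len(D[0])
    return [
        [sum(s_hk * D[k][t] for k, s_hk in enumerate(row)) for t in range(m)]
        for row in S
    ]


def best_host_for(T, t):
    """Index of the host (row of T) with the shortest access time to data t."""
    return min(range(len(T)), key=lambda h: T[h][t])


# Hypothetical values: two hosts, two storage nodes, two data sets,
# with one unit of each data set stored on its own node.
S = [[3, 4],
     [2, 5]]
D = [[1, 0],
     [0, 1]]

T = access_time_matrix(S, D)
host_for_d1 = best_host_for(T, 0)   # index 1 corresponds to h2
host_for_d2 = best_host_for(T, 1)   # index 0 corresponds to h1
```

Because each entry of T sums the per-node transfer times of a data set, a data set split across several storage nodes contributes one term per node to the access time of every candidate host.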

In practice, however, even when the physical host with the best data I/O performance has been identified, the VM cannot be placed on that host if it does not have enough available resources. Thus, it is necessary to first remove those physical nodes that do not have enough physical resources to support the arriving VM request. In other words, our approach does not need to traverse all of the physical nodes in the data center. Rather, only the nodes that have enough resources are taken into account. To find the qualified physical nodes that have enough resources, we assume that the computing resources can be denoted as a set:

$$C = \{P, M\}$$

which indicates the total computation capacity of a physical machine. Here, the notation $P$ represents the total processor capacity of the physical machine and $M$ denotes its total memory capacity. Let $C_o$ represent the total occupied computing resources on a host. $C_o$ can be represented as

$$C_o = \left\{ \sum_{i=1}^{n} P_{vm_i}, \sum_{i=1}^{n} M_{vm_i} \right\}$$

where $n$ is the number of running VMs on the host. The total occupied processor capacity equals the sum of the processor capacity used by all VMs, and the total occupied memory equals the sum of the memory used by all VMs. Therefore, the available computing resource set can be represented as:

$$C_A = \left\{ P - \sum_{i=1}^{n} P_{vm_i},\; M - \sum_{i=1}^{n} M_{vm_i} \right\}$$

where $n$ indicates the number of running VMs. Let $C_R = \{P_R, M_R\}$ denote the computation capacity requirement of the newly arriving VM. $P_R$ and $M_R$ denote the processor requirement and the memory requirement, respectively. Then, if the available computation resource on the physical machine satisfies

$$P - \sum_{i=1}^{n} P_{vm_i} > P_R$$

and

$$M - \sum_{i=1}^{n} M_{vm_i} > M_R$$

then the VM can be placed on this physical machine. For simplicity, we use $C_A$ to represent the total available computing resources and $C_R$ to denote the computing resources requested by the arriving VM, so the qualified physical hosts must satisfy the relationship $C_A > C_R$. After the placement of the arriving VM, the available computing resources change because some resources have been assigned to the VM. So, the new available resources can be computed by the formula below, where $P_A$ and $M_A$ are the processor and memory capacity available before the placement:

$$C_A = \{P_A - P_R,\; M_A - M_R\}$$

So far, the proposed network aware VM allocation policy can be described in the following steps:

1. Firstly, for an arriving VM, the physical nodes which do not have enough resources to support the VM are ruled out, and the remaining physical nodes are the qualified nodes.

2. Secondly, for the qualified nodes, the network speed matrix $S_{i,j}$ and the data distribution matrix $D_{n,m}$ are obtained.

3. Thirdly, the data transfer time matrix $T_{i,m}$ is obtained by multiplying $S_{i,j}$ by $D_{n,m}$.

4. Finally, the values in each row of the matrix $T_{i,m}$ are summed. The row with the minimum total value represents the node with the best data I/O performance.

The algorithm that implements the above steps is presented as follows.

SET S[m][j], D[j][n], CR, CA[m]
FOR each PhysicalNode ∈ m
    IF CA[m] > CR THEN
        FOR each S[m][j] in m and D[j][n] in column n
            T[m][n] += S[m][j] × D[j][n]
        END LOOP
        Total T = Σ_{i=1..n} T[m][i]
        IF Minimum T > Total T THEN
            Minimum T = Total T
        END IF
    ELSE
        Discard the PhysicalNode
    END IF
END LOOP
RETURN m with Minimum T

For a cloud computing environment with physical nodes (m) and data nodes (n), the algorithm is initialized with a network speed matrix S[m][j], a data distribution matrix D[j][n], the computing resource requirement of the VM (CR) and the available computing resources on each physical node, CA[m]. The value of each cell in the data transfer time matrix T[m][n] is obtained by the formula T[m][n] += S[m][j] × D[j][n]. After the data transfer time matrix has been generated, the values in each row are summed; the row with the minimum total is the best place to allocate the VM in terms of the network condition.
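The procedure can also be sketched in Python. This is only an illustration under stated assumptions, not the implementation used in this thesis: resources are modelled as (processor, memory) pairs, the function name select_host is hypothetical, and all numeric values are invented for the example.

```python
def select_host(S, D, available, required):
    """Return (host_index, total_time) for the qualified host whose row of
    T = S x D has the smallest sum, or None if no host qualifies.

    S[h][k]      -- assumed per-unit transfer time from host h to storage node k
    D[k][c]      -- assumed size of data item c stored on storage node k
    available[h] -- (processor, memory) still free on host h, i.e. C_A
    required     -- (processor, memory) requested by the arriving VM, i.e. C_R
    """
    p_req, m_req = required
    best = None
    for host, (p_avail, m_avail) in enumerate(available):
        # Step 1: rule out hosts that do not satisfy C_A > C_R.
        if not (p_avail > p_req and m_avail > m_req):
            continue
        # Steps 2-3: compute this host's row of the transfer time matrix T.
        row = [
            sum(S[host][k] * D[k][c] for k in range(len(D)))
            for c in range(len(D[0]))
        ]
        # Step 4: keep the host with the smallest row total.
        total = sum(row)
        if best is None or total < best[1]:
            best = (host, total)
    return best


# Hypothetical data: host 2 has the fastest links but too few free resources.
S = [[3, 4], [2, 4], [1, 1]]
D = [[1, 0], [0, 1]]
available = [(4, 8), (4, 8), (1, 1)]
required = (2, 4)

choice = select_host(S, D, available, required)  # (1, 6)
```

Computing only the rows of qualified hosts mirrors the point made earlier that the approach does not need to traverse every physical node in the data center.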

3.4. The Virtual Machine Migration Approach

Because unpredictable network latency may occur between the routers in the data center, the data transmission among the VMs cannot be guaranteed. According to observations in other research, the network I/O condition within Amazon EC2 [2] is unstable [46]. The researchers measured the TCP/UDP throughput between 6 small instance pairs and 3 medium instance pairs over 150 hours. The throughput was calculated each time 256 KB of data was transmitted across the network. The findings show that TCP/UDP transmission is drastically unstable in Amazon EC2, varying from the full link rate of nearly 1 Gb/s down to 0 Gb/s. This research indicates that the VM scheduling and balancing processes within cloud computing may cause additional network congestion. The unstable network characteristics may therefore jeopardize the quality of cloud computing services with regard to the application completion time. In this research, we assume that there is a Service Level Agreement (SLA) $T_{SLA}$ between the cloud service consumer and the service provider regarding the maximum application processing time. If

$$\sum_{i=1}^{n} T_{x,i} > T_{SLA}$$

then the VM should be migrated to optimize the data access time.
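The migration condition can be expressed as a short check. The sketch below assumes that x indexes the row of T belonging to the physical machine currently hosting the VM; the function name needs_migration and the SLA values are hypothetical.

```python
def needs_migration(T, host, t_sla):
    """Return True when the total data access time of the row for `host`
    exceeds the SLA limit, i.e. sum_i T[host][i] > T_SLA."""
    return sum(T[host]) > t_sla


# Example access time matrix (milliseconds) and two assumed SLA limits.
T = [[3, 4], [2, 5]]
violates_tight_sla = needs_migration(T, 0, 6)    # 3 + 4 = 7 > 6
violates_loose_sla = needs_migration(T, 0, 10)   # 7 > 10 does not hold
```

In a running system T would have to be re-measured periodically, since the condition only becomes true when the observed network speed changes.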
