
VMPP ALGORITHMS

In document Networking for Big Data Chapman pdf (Page 53-55)

Because VM placement strategy is so significant to data centers, much research has focused on this problem. The purpose of this chapter is to introduce some representative literature, which may give us a clearer and more comprehensive view of the problem. In this chapter, we mainly focus our attention on the following topics:

• Minimizing server cost
• Minimizing network cost
• Jointly minimizing server cost and network cost
• Trade-off between cost and utility

Server Cost Minimization Algorithms

Server cost minimization is the most basic problem of VMPP and has been studied extensively. The problem can be described as minimizing the number of servers used while satisfying the resource demands of the VMs, such as CPU, memory, and network.

Basic Algorithm

The basic server number minimization VMPP can be treated as a static one-dimensional bin packing problem (BPP) [17]: the sets of VMs and servers are given and assumed to be static; the resource is simplified to one dimension (e.g., CPU); and the objective is to pack all VMs onto servers such that the number of servers used is minimized and the capacity of each server is not less than the total resource demand of the VMs packed on it. This VMPP can be solved approximately by the first-fit bin packing algorithm [17], that is, by assigning each VM, one after another, to the first server that has enough remaining capacity for it, until all the VMs are packed.
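The first-fit procedure can be sketched in a few lines of Python. This is a minimal illustration, not the implementation from [17]: it assumes identical servers and integer one-dimensional demands (for example, CPU as a percentage of one server), and the function name `first_fit` is hypothetical.

```python
from typing import List, Sequence


def first_fit(demands: Sequence[int], capacity: int) -> List[int]:
    """Return, for each VM in input order, the index of the server hosting it."""
    remaining: List[int] = []  # free capacity of each server opened so far
    assignment: List[int] = []
    for demand in demands:
        if demand > capacity:
            raise ValueError(f"demand {demand} exceeds server capacity {capacity}")
        for server, free in enumerate(remaining):
            if demand <= free:  # first opened server with enough room
                remaining[server] -= demand
                assignment.append(server)
                break
        else:  # no opened server fits this VM: open a new one
            remaining.append(capacity - demand)
            assignment.append(len(remaining) - 1)
    return assignment


demands = [50, 70, 50, 20, 40, 20, 50, 10, 60]
placement = first_fit(demands, capacity=100)
print(placement)             # [0, 1, 0, 1, 2, 2, 3, 1, 4]
print(len(set(placement)))   # 5
```

In this example first-fit opens 5 servers, although a 4-server packing exists (70+20+10, 60+40, 50+50, 50+20). The heuristic is fast but offers no optimality guarantee.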

Multidimensional Resources

In practice, VMs need multidimensional resources such as CPU, memory, and network resources to run applications. Mishra and Sahoo [18] proposed a method for the server number minimization VMPP with respect to multidimensional resources, which is similar to the vector BPP. For simplicity, we demonstrate the main idea in two dimensions, as in Figure 2.2.

A Survey of Virtual Machine Placement in Cloud Computing for Big Data    ◾   31  

The horizontal axis represents the normalized CPU, and the vertical axis represents the normalized memory. Vectors TC and RU denote the total capacity and the resource utilization of a server, respectively. Vector RI, which is the difference between vector RU and RU's projection on TC, can be used as a metric of the resource imbalance of a server: the smaller RI is, the more balanced the server's resource utilization. By minimizing the imbalance of server resource utilization (i.e., the magnitude of vector RI), we can increase multidimensional server resource utilization and thus reduce the number of servers needed. The main idea of this bin packing heuristic is as follows. First, assign the least imbalanced VM to the first server. Second, assign to the current server the next VM that meets the following two requirements: (1) its resource demand does not exceed the remaining capacity of the current server; (2) after it has been assigned, the current server has the smallest RI magnitude among all such candidate VMs. If the current server does not have enough remaining capacity to host any remaining VM, open a new server. Finally, repeat this process until all VMs are packed.
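The two-dimensional heuristic above can be sketched as follows. This is an illustration of the textual description only, not code published by Mishra and Sahoo [18]. It assumes normalized resources so that TC = (1, 1), breaks ties by the lowest VM index, and starts every new server with the least imbalanced remaining VM; the helper names are hypothetical.

```python
import math
from typing import List, Tuple

Vec = Tuple[float, float]  # (normalized CPU, normalized memory); TC = (1, 1)
EPS = 1e-9  # tolerance for floating-point capacity checks


def imbalance(ru: Vec) -> float:
    """|RI|: length of RU minus its projection onto TC = (1, 1)."""
    cpu, mem = ru
    # The projection is ((cpu + mem) / 2, (cpu + mem) / 2), so the remainder
    # is ((cpu - mem) / 2, (mem - cpu) / 2), whose length is |cpu - mem| / sqrt(2).
    return abs(cpu - mem) / math.sqrt(2)


def add(a: Vec, b: Vec) -> Vec:
    return (a[0] + b[0], a[1] + b[1])


def fits(ru: Vec, vm: Vec) -> bool:
    return ru[0] + vm[0] <= 1 + EPS and ru[1] + vm[1] <= 1 + EPS


def imbalance_aware_pack(vms: List[Vec]) -> List[List[int]]:
    """Return the VM indices placed on each server, in opening order."""
    if any(cpu > 1 + EPS or mem > 1 + EPS for cpu, mem in vms):
        raise ValueError("a VM exceeds the capacity of a whole server")
    unplaced = set(range(len(vms)))
    servers: List[List[int]] = []
    while unplaced:
        # Open a server with the least imbalanced remaining VM (ties: lowest index).
        first = min(unplaced, key=lambda i: (imbalance(vms[i]), i))
        unplaced.remove(first)
        hosted, ru = [first], vms[first]
        while True:
            candidates = [i for i in unplaced if fits(ru, vms[i])]
            if not candidates:
                break  # current server cannot host any remaining VM
            # Pick the VM that leaves the server with the smallest |RI|.
            best = min(candidates, key=lambda i: (imbalance(add(ru, vms[i])), i))
            unplaced.remove(best)
            hosted.append(best)
            ru = add(ru, vms[best])
        servers.append(hosted)
    return servers


print(imbalance_aware_pack([(0.8, 0.2), (0.2, 0.8), (0.7, 0.3), (0.3, 0.7)]))
# [[2, 3], [0, 1]]
```

A CPU-heavy VM ends up paired with a memory-heavy one, because that pairing drives RU back toward the TC diagonal.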

Dynamic Resource Demand

For dynamic VMPP, the future VM demand is unknown and may change over short time scales. We therefore need to recompute the placement configuration and redeploy the VMs, preferably faster than the resource demand changes significantly. One way of solving the dynamic server cost minimization VMPP is to first predict the future VM demand, treat that prediction as deterministic, and then solve the problem as a static BPP. Bobroff et al. [19] proposed a prediction technique and solved the VMPP using the First Fit Decreasing (FFD) algorithm [20]. More prediction approaches can be found in Wood et al. [21].
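The predict-then-pack approach can be sketched as below. The predictor `predict_peak` (peak of the last few samples) is a hypothetical stand-in, not the technique of Bobroff et al. [19]; the packing step is standard FFD, that is, first-fit applied to VMs sorted by decreasing demand. Demands are assumed to be integer CPU percentages of one server.

```python
from typing import Dict, List, Sequence


def predict_peak(history: Sequence[int], window: int = 3) -> int:
    """Hypothetical predictor: peak demand over the last `window` samples."""
    return max(history[-window:])


def first_fit_decreasing(demands: Dict[str, int], capacity: int) -> Dict[str, int]:
    """Map each VM name to a server index, packing the largest demands first."""
    remaining: List[int] = []  # free capacity of each opened server
    placement: Dict[str, int] = {}
    # Decreasing demand; ties broken by name so the result is deterministic.
    for vm, demand in sorted(demands.items(), key=lambda kv: (-kv[1], kv[0])):
        if demand > capacity:
            raise ValueError(f"{vm} needs {demand} > capacity {capacity}")
        for server, free in enumerate(remaining):
            if demand <= free:
                remaining[server] -= demand
                placement[vm] = server
                break
        else:  # no opened server fits: open a new one
            remaining.append(capacity - demand)
            placement[vm] = len(remaining) - 1
    return placement


# Recent CPU samples (percent of one server) per VM; values are illustrative.
histories = {
    "a": [90, 30, 50, 45], "b": [70, 65, 60], "c": [50, 20, 35],
    "d": [10, 20, 15], "e": [40, 25, 30], "f": [5, 20, 10],
    "g": [45, 50, 50], "h": [10, 5, 10], "i": [55, 60, 40],
}
predicted = {vm: predict_peak(h) for vm, h in histories.items()}
placement = first_fit_decreasing(predicted, capacity=100)
print(len(set(placement.values())))  # 4
```

With these predicted demands (total 370), FFD uses 4 servers, which matches the lower bound of ceil(370 / 100) = 4. The old spike of 90 in VM "a" falls outside the prediction window and is ignored.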

[Figure 2.2: Server resource vectors TC, RU, and RI; horizontal axis: normalized CPU, vertical axis: normalized memory (MEM).]

However, recent studies [13,22] have shown that data center network traffic patterns are highly volatile, so it is unreliable to treat VM network demand as deterministic. Wang et al. [23] proposed treating future network bandwidth usage as a random variable following a Gaussian distribution for online VMPP (an online problem is one in which new VMs may arrive and old VMs may leave, and this information cannot be known in advance). This VMPP is a stochastic bin packing problem (SBPP). The difference between BPP and SBPP is that, when we reduce the latter to the former, the size of a VM's demand depends on the other VMs packed on the same server. To minimize the number of servers, we first need to decide which VMs should be packed together so that their effective sizes are reduced, and then solve the problem as a BPP. Wang et al. [23] and Breitgand and Epstein [24] both proposed methods for dividing the VMs into groups.
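The following sketch shows why a VM's effective size depends on its co-located VMs. It assumes, as an illustration rather than as the exact formulation of [23] or [24], that VM bandwidth demands are independent Gaussians and that a server may overflow with probability at most `overflow_prob`. Under independence, variances add, so the standard deviation of a group grows more slowly than the sum of the individual standard deviations.

```python
import math
from statistics import NormalDist
from typing import Sequence, Tuple

Demand = Tuple[float, float]  # (mean, standard deviation) of bandwidth, e.g. Mbps


def effective_size(group: Sequence[Demand], overflow_prob: float) -> float:
    """Capacity a server needs so that P(total demand > capacity) <= overflow_prob.

    Assumes the demands are independent Gaussians, so their sum is Gaussian
    with mean = sum of means and variance = sum of variances.
    """
    z = NormalDist().inv_cdf(1.0 - overflow_prob)
    mean = sum(mu for mu, _ in group)
    std = math.sqrt(sum(sigma ** 2 for _, sigma in group))
    return mean + z * std


vms = [(200.0, 100.0)] * 3  # three identical VMs
capacity, eps = 1000.0, 0.05

alone = sum(effective_size([vm], eps) for vm in vms)  # sizes taken one by one
together = effective_size(vms, eps)                   # size of the whole group
print(round(alone, 1), round(together, 1))  # 1093.5 884.9
print(together <= capacity)                  # True: one server suffices
```

Summing individual effective sizes would suggest that the three VMs cannot share a 1000 Mbps server, while the group's joint effective size fits comfortably. This is the size reduction that grouping aims to exploit.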

The advantage of treating server minimization VMPP as a BPP is that efficient algorithms such as first-fit and FFD can solve the problem quickly; the disadvantage is that these algorithms do not guarantee an optimal solution.

Alternatives to Bin-Packing

Instead of treating the server number minimization problem as a BPP, Hermenier et al. [25] treat it as a constraint satisfaction problem (CSP) [26], which can be generally defined as finding a solution for a set of objectives under some constraints. For this VMPP, the number of servers is the objective to minimize, and the constraints can be expressed as follows: for every server, its resource capacity is not less than the total resource demand of the VMs on it. This CSP can be solved by dynamic programming [27]. The advantage of treating VMPP as a CSP is that the solutions are better than those of heuristics based on local optimization, such as first-fit and FFD, and the optimal solution can often be found. The disadvantage is the excessive time needed to solve the problem. Hermenier et al. [25] also proposed some methods to reduce the computation time.
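To illustrate the exact-but-slow trade-off, the sketch below finds a packing with the fewest servers by plain backtracking search. This is a deliberately simple stand-in: it is neither the dynamic programming approach cited above nor the solver used by Hermenier et al. [25]. It assumes identical servers and integer one-dimensional demands, and the function names are hypothetical. It tries k servers starting from the lower bound and enforces the capacity constraint at every step; the worst-case running time is exponential in the number of VMs.

```python
from typing import List


def _place(demands: List[int], idx: int, free: List[int], server_of: List[int]) -> bool:
    """Depth-first search: try to place demands[idx:] into the servers in `free`."""
    if idx == len(demands):
        return True
    tried = set()
    for s, room in enumerate(free):
        # Constraint: the server must still cover the VM's demand. Servers with
        # equal free capacity are interchangeable, so only one of them is tried.
        if demands[idx] <= room and room not in tried:
            tried.add(room)
            free[s] -= demands[idx]
            server_of[idx] = s
            if _place(demands, idx + 1, free, server_of):
                return True
            free[s] += demands[idx]  # undo and backtrack
    return False


def min_servers_exact(demands: List[int], capacity: int) -> List[List[int]]:
    """Return the VM indices on each server for a packing with the fewest servers."""
    if any(d > capacity for d in demands):
        raise ValueError("a VM does not fit on an empty server")
    if not demands:
        return []
    order = sorted(range(len(demands)), key=lambda i: -demands[i])  # big VMs first
    sorted_demands = [demands[i] for i in order]
    k = -(-sum(demands) // capacity)  # ceil(total / capacity): a lower bound
    while True:
        server_of = [-1] * len(demands)
        if _place(sorted_demands, 0, [capacity] * k, server_of):
            servers: List[List[int]] = [[] for _ in range(k)]
            for pos, s in enumerate(server_of):
                servers[s].append(order[pos])
            return servers
        k += 1  # infeasible with k servers: allow one more


demands = [50, 70, 50, 20, 40, 20, 50, 10, 60]
servers = min_servers_exact(demands, capacity=100)
print(len(servers))  # 4
```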

Network Cost Minimization Algorithms

Compared with server cost minimization VMPP, network cost minimization VMPP is relatively new. Traditionally, network problems have been addressed by improving the underlying network architecture. An early work that considers network cost minimization in VMPP is Meng et al. [13].

Minimizing Network Costs between VMs

For VMPP, the traffic cost between two VMs i and j can be defined as $T_{ij} D_{ij}$, in which $T_{ij}$ denotes the traffic rate from VM i to VM j, and $D_{ij}$ denotes the communication cost between the hosts of VMs i and j. The traffic rate can be represented by the amount of data transferred per unit time, and the communication cost by the number of switches on the routing path.
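As a concrete illustration of $T_{ij} D_{ij}$, the sketch below computes the total traffic cost of a placement, with $D_{ij}$ taken as the number of switches on the path. The topology is an assumption chosen for the example: a three-tier tree in which each host is identified by hypothetical (pod, rack, server) coordinates, so traffic crosses 0, 1, 3, or 5 switches.

```python
from typing import Dict, Tuple

Location = Tuple[int, int, int]  # hypothetical (pod, rack, server) coordinates


def switch_hops(a: Location, b: Location) -> int:
    """D_ij as the number of switches on the path in an assumed three-tier tree."""
    if a == b:
        return 0  # same server: traffic stays inside the host
    if a[:2] == b[:2]:
        return 1  # same rack: top-of-rack (ToR) switch only
    if a[0] == b[0]:
        return 3  # same pod: ToR -> aggregation -> ToR
    return 5      # different pods: ToR -> aggregation -> core -> aggregation -> ToR


def traffic_cost(rates: Dict[Tuple[str, str], int], host: Dict[str, Location]) -> int:
    """Sum of T_ij * D_ij over all directed VM pairs (i, j) that exchange traffic."""
    return sum(t * switch_hops(host[i], host[j]) for (i, j), t in rates.items())


rates = {("a", "b"): 10, ("b", "a"): 10, ("a", "c"): 1}  # T_ij, e.g. in MB/s
spread = {"a": (0, 0, 0), "b": (1, 0, 0), "c": (0, 0, 1)}
same_rack = {"a": (0, 0, 0), "b": (0, 0, 1), "c": (0, 0, 1)}
print(traffic_cost(rates, spread))     # 101
print(traffic_cost(rates, same_rack))  # 21
```

Moving the chatty pair a and b into one rack cuts the cost from 101 to 21, which is the kind of gain network-aware placement looks for.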

In Meng et al. [13], each server is treated as consisting of several slots, where a slot represents a portion of the server's resources. For simplicity, the numbers of slots and VMs are assumed to be equal, and the mapping between VMs and slots is one-to-one. The goal is to find a permutation function $\pi: [1, \ldots, n] \to [1, \ldots, n]$ that minimizes the total traffic cost

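$$\min_{\pi} \sum_{i=1}^{n} \sum_{j=1}^{n} T_{ij} D_{\pi(i)\pi(j)}$$

where $D_{\pi(i)\pi(j)}$ is the communication cost between the slots assigned to VMs i and j.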
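For very small n, the objective can be evaluated and minimized by exhaustive search over all n! permutations, as sketched below. This only illustrates the objective: it is not the algorithm of Meng et al. [13], and it becomes infeasible quickly, because minimizing this sum over permutations is an instance of the quadratic assignment problem. The slot layout and cost values are assumptions chosen for the example, and `pi[i]` denotes the slot of VM i.

```python
from itertools import permutations
from typing import List, Sequence, Tuple

Matrix = List[List[int]]


def placement_cost(T: Matrix, D: Matrix, pi: Sequence[int]) -> int:
    """Sum over i, j of T[i][j] * D[pi[i]][pi[j]], where pi[i] is VM i's slot."""
    n = len(T)
    return sum(T[i][j] * D[pi[i]][pi[j]] for i in range(n) for j in range(n))


def best_permutation(T: Matrix, D: Matrix) -> Tuple[Tuple[int, ...], int]:
    """Exhaustive search over all n! permutations; feasible only for tiny n."""
    best = min(permutations(range(len(T))), key=lambda pi: placement_cost(T, D, pi))
    return best, placement_cost(T, D, best)


# Slots 0-1 share one rack and slots 2-3 another (assumed costs: 1 within, 3 across).
D = [[0, 1, 3, 3],
     [1, 0, 3, 3],
     [3, 3, 0, 1],
     [3, 3, 1, 0]]
# VMs 0<->2 and 1<->3 exchange 10 units of traffic in each direction.
T = [[0, 0, 10, 0],
     [0, 0, 0, 10],
     [10, 0, 0, 0],
     [0, 10, 0, 0]]
print(placement_cost(T, D, (0, 1, 2, 3)))  # 120
print(best_permutation(T, D))              # ((0, 2, 1, 3), 40)
```

The identity mapping puts each communicating pair in different racks. The best permutation places VMs 0 and 2 in one rack and VMs 1 and 3 in the other.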
