Coordinating Virtual Machine Migrations in Enterprise Data Centers and Clouds


Haifeng Chen (1), Hui Kang (2), Guofei Jiang (1), Yueping Zhang (1)

(1) NEC Laboratories America, Inc., 4 Independence Way, Princeton, NJ 08540

(2) SUNY Stony Brook University, Stony Brook, NY 11794

{haifeng, gfj, yueping}@nec-labs.com, [email protected]

Abstract

Virtual machine (VM) migration usually requires a considerable amount of system resources such as network bandwidth. In the case of multiple simultaneous migrations, which happen regularly in data center operations, such resource demands increase dramatically and are difficult to satisfy immediately. In this paper we propose a scheduling method for multiple VM migrations to guarantee the fast completion of those tasks and hence reduce their impact on system performance. We consider two aspects to achieve that. First, we analyze the VM migration behavior and build a simulation tool to predict the time of multiple migrations under different link conditions and VM characteristics. By analyzing the simulation outputs, we can discover the best bandwidth sharing policy for each network link, i.e., the number of concurrent migrations that leads to the shortest completion time. Based on the link sharing policy, we further propose a bin-packing algorithm to organize bandwidth resources from all the network links and allocate them to different migration tasks. As a result of our global resource assignment, the migration tasks can fully utilize the available resources in the whole network to achieve fast completion. Experimental results have demonstrated the effectiveness of our migration scheduling approach.

Keywords-resource scheduling; optimization

I. Introduction

Live VM migration is being widely utilized in virtualized data centers and clouds due to its capability of maintaining high system performance under dynamic workloads. However, VM migration requires considerable network bandwidth and other resources, which may in consequence lead to performance degradation of the migrating VM during the period of migration. While that resource demand and VM performance drop are usually affordable for a single VM migration due to the short period of that process, it is challenging to manage multiple concurrent migrations because the system may not immediately have enough resources to meet the dramatic resource demands from many VMs. As a result, it will take much longer for multiple migrations to complete, which leads to long performance degradation for those VMs. To this end, this paper investigates the behavior of concurrent VM migrations, and proposes a solution to schedule multiple migrations appropriately so as to avoid the adverse impacts caused by resource shortages.

The motivation behind our study comes from the fact that multiple VM migrations show up regularly in real system operations. For instance, if some physical machines need to be removed from service for maintenance, all the VMs in those machines have to be migrated to other places. Since applications are nowadays comprised of many VMs distributed across several machines for the purpose of load balancing and fault tolerance, the workload surge in an application may require the rearrangement of several VM instances in the system. An even worse situation is that some system faults such as configuration mistakes may trigger a large number of VM migrations. In those cases, it is important to handle concurrent VM migrations in an effective way, so that they can be completed as fast as possible and hence the total performance degradation time for those VMs can be minimized.

Note that the process of VM migration is different from traditional data migrations [7][10]. While data migrations are conducted between storage devices, in which the storage IO usually becomes the resource bottleneck, VM migration mainly moves VM memory pages between machines, where the network bandwidth becomes the scarce resource in most cases. More importantly, unlike data migration, where the size of the transferred data is usually fixed, the contents that need to be transferred in VM migration vary with the available network bandwidth as well as the characteristics of the VM such as its memory dirty rate. This is due to the mechanism of iterative memory pre-copy implemented in most migration software, where the number of memory pages that need to be transferred in each pre-copy round depends on the speed of transfer in previous rounds. The same migration task may take much longer than expected in a low-bandwidth network compared with one that has enough bandwidth, especially when the memory dirty rate of that VM is high.


The unique features of VM migration pose new challenges when multiple VMs request to migrate simultaneously. First, since those migrations may have overlapped links in their migration paths, we need to determine whether to let them share a link by initiating them concurrently, and what the maximum number of concurrent migrations allowed in that link is. Link sharing between multiple migrations can improve the overall utilization of network bandwidth due to the resource multiplexing between migration flows, and thus contributes to the quick completion of migrations. However, the amount of transferred memory pages also increases, since each VM is allocated only a portion of the bandwidth in the overlapped links. We need to find a balance that gives the optimal number of concurrent migrations sharing the network link.

Striking such a balance is difficult because it depends on several factors such as the link capacity and the VM memory dirty rate. In addition, the dependency of migration performance on those factors is non-linear and hard to predict with mathematical formulas. Therefore, this paper proposes to use software simulation to identify the optimum link sharing policy under different VM and link conditions. Our simulation follows the source code implementation in Xen to predict the VM migration time given the available link bandwidth and VM characteristics. In the case of multiple VM migrations, we incorporate several extra factors in the simulation. For example, some migration overheads, such as the time spent in the initial resource reservation and the final VM activation in the target machine, can be saved by the parallelism of multiple migrations. We also model the behavior of bandwidth usage when multiple migrations share the network link. By running the simulation tool, we can compare VM migration performance under different conditions, and then generate the optimal sharing policy for each link, i.e., the number of concurrent migrations that achieves the minimal total time based on the link's available bandwidth and the VM's memory page dirty rate.

In addition to network link sharing, another challenge for multiple VM migrations is the assignment of global system resources to each migration to achieve the quick completion of those tasks. This is because the network is composed of many links with various available bandwidths, and multiple migrations travel through different sets of network links. While the link sharing strategy is only concerned with the number of migrations in each individual link, the global resource assignment attempts to find the optimal scheduling of VM migrations to fully utilize the bandwidth in the whole network. For example, if a VM's migration path does not overlap with those of other tasks, we can initiate it immediately. But if some links along a VM's migration path are shared with other migrations, we need to find a plan to order those tasks based on the migration sharing policies in those links. This paper presents a bin-packing algorithm to address the global resource assignment, in which the bin represents all the links in the network and the item denotes each migration task. While the capacity of the bin is determined by the available bandwidth in all network links, the size of each item is associated with the bandwidth demand of each migration, which can be estimated from the migration sharing policy in each link along that VM's migration path. Given the bin capacity and item sizes, we use the first-fit decreasing (FFD) heuristic to allocate each migration to a corresponding bin so that the total number of bins needed to host those migrations is minimized. As a result, we achieve the quickest completion of migration tasks, because the number of bins corresponds to the length of the period for executing those migrations.

In the experiments, we first set up a testbed to run migrations and compare the real evaluation results with simulation outputs. We demonstrate that our simulation tool can provide accurate predictions of migration performance under different conditions. We then further evaluate our migration scheduling algorithm by simulating different numbers of VM migrations in a data center. Results show that with the help of our migration scheduler, we can achieve the fast completion of multiple migrations.

II. Virtual Machine Migration

Our study focuses on the pre-copy migration technique implemented in common virtualization software such as Xen and VMware. It makes use of an iterative multi-pass algorithm to transfer VM guest memory in successive steps. In each iteration, only the memory that has been dirtied in the interim is sent. When the pre-copy stage is terminated, the final state is sent to the new host and the transfer of control to the new physical machine is completed. Figure 1 presents the main stages of VM migration from host A to destination B. The stop condition for the iterative pre-copy step varies with different products, based on the observed convergence of memory page transfer.

The migration of a VM from the source machine A to the destination host B includes the following steps:

1) Resource Reservation: resources at the destination host B are reserved.

2) Iterative pre-copy: VM memory pages modified during the previous iteration are transferred to the destination. The entire RAM is sent in the first iteration.

3) Stop-and-copy: The VM is halted for the final transfer round. The remaining memory pages are transferred to the destination B.

4) Commitment and Activation: After machine B indicates that it has successfully received a consistent copy of the VM, resources are re-attached to the VM in host B.


Two metrics are commonly used to quantify VM migration performance: the total migration time and the service downtime. While the total migration time covers the duration of the entire migration process, the service downtime is the interval between when the VM is suspended at machine A and when it is activated in machine B, i.e., the last stage of migration. In this paper, we mainly focus on the total migration time because it is the period in which the migrating VM demonstrates performance degradation. Note that other VMs located in the source and destination machines may also be impacted due to their competition with the migration process for system resources [2][12]. Here we do not investigate the interference between the migrating VM and background applications. Given the available system resources, our goal is to properly schedule migration tasks to reduce their total time, so that the period of performance drop in migrating VMs (and possibly background VMs) can be reduced.

In reality, when multiple migrations occur concurrently, their overall migration time may become much longer due to the pre-copy mechanism. In order to demonstrate this, we first model the migration time of a single VM given its memory size $V$ and memory dirty rate $R$ in a network link with available bandwidth $B$. Among the four stages described in Figure 1, the time spent in the 'resource reservation' and 'commitment and activation' stages is relatively stable, and can be described by two constant values, PremigrationOverhead and PostmigrationOverhead. However, the time for the pre-copy and stop-and-copy stages varies significantly with the link condition and VM characteristics. It starts with the transfer of all VM memory pages, which takes time $t_0 = V/B$, followed by the transfer of the memory pages that were dirtied in the previous round, which takes $t_1 = R \cdot t_0/B$, $t_2 = R \cdot t_1/B$, $\cdots$, $t_{k-1} = R \cdot t_{k-2}/B$, $\cdots$. The pre-copy process is stopped at the $K$th round when the stop conditions are met. After that, the source VM is suspended and the remaining memory is transferred to the destination, which takes $t_K = R \cdot t_{K-1}/B$. So the period for the pre-copy and stop-and-copy is computed as

$$T_0 = t_0 + t_1 + \cdots + t_K = \frac{V}{B} \times \frac{1-(R/B)^{K+1}}{1-R/B} \tag{1}$$
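As a quick sanity check of Equation (1), the following minimal sketch sums the per-round times directly and compares them with the closed form. The function names and the sample values are illustrative assumptions, and $V$, $R$ and $B$ are taken to be in consistent units (for example MB, MB/s and MB/s); it is not the simulator used in the paper.

```python
def precopy_rounds(V, R, B, K):
    """Return [t_0, ..., t_K] with t_0 = V/B and t_k = R * t_{k-1} / B."""
    times = [V / B]
    for _ in range(K):
        times.append(R * times[-1] / B)
    return times


def closed_form_T0(V, R, B, K):
    """Closed form of Equation (1); the R == B case is handled separately."""
    ratio = R / B
    if ratio == 1.0:
        return (V / B) * (K + 1)
    return (V / B) * (1 - ratio ** (K + 1)) / (1 - ratio)


if __name__ == "__main__":
    # Assumed sample values: 1000 MB memory, 20 MB/s dirty rate, 100 MB/s link, K = 5.
    V, R, B, K = 1000.0, 20.0, 100.0, 5
    print(sum(precopy_rounds(V, R, B, K)), closed_form_T0(V, R, B, K))
```

The two printed values should agree up to floating-point rounding, since the round times form a geometric series with ratio $R/B$.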

As a whole, the total migration time is described as

$$T_{mig} = \text{PremigrationOverhead} + T_0 + \text{PostmigrationOverhead} \tag{2}$$

Note that VM migration is different from pure data migration. While in data migration the size of the transferred data is fixed under different network conditions, if we ignore the data resent after packet losses, the size of the transferred content in VM migration varies with the link bandwidth as well as the VM memory dirty rate, as shown in equation (1). Given larger network bandwidth, the time $t_i$ for each pre-copy iteration will be shorter, which in consequence generates less content $R \cdot t_i$ for the next round and hence leads to earlier convergence of the pre-copy. On the other hand, a network with low bandwidth will generate more memory pages that need to be transferred in the pre-copy iterations.
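As a small worked illustration of this effect, the volume transferred during the pre-copy and stop-and-copy stages is $B \cdot T_0$, which by equation (1) depends only on $V$, $K$ and the ratio $R/B$ (with $R$ expressed in the same units as $B$). The round count $K = 2$ and the two ratios below are assumed values chosen for illustration, not measurements.

```latex
\begin{align*}
B \cdot T_0 &= V \cdot \frac{1-(R/B)^{K+1}}{1-R/B} \\
R/B = 0.1,\ K = 2:\quad B \cdot T_0 &= V \cdot \frac{1 - 0.001}{0.9} \approx 1.11\,V \\
R/B = 0.5,\ K = 2:\quad B \cdot T_0 &= V \cdot \frac{1 - 0.125}{0.5} = 1.75\,V
\end{align*}
```

Under these assumptions, lowering the bandwidth so that $R/B$ rises from 0.1 to 0.5 increases the transferred volume from about 1.11 to 1.75 times the VM memory size.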

Due to the unique behavior of VM migration, we face new challenges when multiple VMs request migration simultaneously. First, if migration tasks traverse some common network links, we need to determine whether to execute those tasks in one batch or in several sequential batches. Making such a decision requires the prediction of VM migration performance under those different conditions. However, the migration performance depends on several factors including the link bandwidth and VM characteristics, and such dependencies are non-linear, as shown in equation (1). We cannot directly use equations (1) and (2) to predict the migration time because it is hard to obtain the value $K$ in (1). Instead, this paper uses software simulation to discover the migration performance under different link sharing strategies. Based on the simulation results, we can identify the optimal sharing policy among migrations in the overlapped network links.

While the optimal link sharing is associated with each individual network link, it is also necessary to organize the global resources from all network links and assign them to multiple migration tasks, because each migration usually travels through several network links. We need to find a schedule that determines which VM needs to migrate first and together with which other VMs, so that the bandwidth in the whole network can be fully utilized and hence the total migration time can be minimized. In the paper, we transform the migration scheduling into a bin-packing problem, in which the bin represents the network links and the item denotes each migration task. We present an algorithm to pack migrations into the smallest number of bins, which leads to the quickest completion of those tasks.

As a whole, our method exploits both the bandwidth sharing in each link and the global resource assignment to achieve the shortest period of multiple migrations. We will present the link bandwidth sharing in Section III, and the global resource assignment in Section IV respectively. Section V will present experimental results of our approach.

III. Network Link Sharing

This section studies the performance of executing several migrations simultaneously in one network link. Given $s$ migration tasks, if we execute them in parallel, the total migration time $T^{(pal)}$ becomes

$$T^{(pal)} = \max\{T_{mig-1}, T_{mig-2}, \cdots, T_{mig-s}\} \tag{3}$$

where $T_{mig-i}$ is the duration of the $i$th migration. Note that the value of $T_{mig-i}$ varies with the number of concurrent migrations $s$ due to the differences in allocated bandwidth. In this section, we estimate $T_{mig-i}$ under various settings of $s$, and identify the best network link sharing strategy, i.e., the number $s$ of concurrent migrations that leads to the shortest total migration time. We use the normalized time $T^{(pal)}/s$ to compare migrations under different $s$ values.

The case $s = 1$ corresponds to performing the migrations sequentially. As the value of $s$ increases, more migrations are executed in parallel. Compared with sequential migration, there are both advantages and disadvantages if multiple migrations are issued simultaneously. First, we can reduce the total amount of pre-migration and post-migration overhead in parallel migrations. This is because the time spent in the 'resource reservation' and 'commitment and activation' stages of VM migration is less dependent on the allocated network bandwidth. By executing those two stages in parallel for all migrations, we can reduce the total migration time.

Scheduling several migrations simultaneously can also improve the bandwidth utilization of network links. Since VM memory pages are transmitted over a TCP socket in the migration, the additive increase and multiplicative decrease mechanism in TCP congestion control [11] may underutilize the network bandwidth. While tuning some TCP configurations, such as allocating large TCP buffers, may improve the bandwidth utilization, a dedicated TCP flow still cannot make use of all the available bandwidth due to its implementation overhead. By assigning multiple migration flows to a single link, we can multiplex the usage of network bandwidth to improve the migration efficiency. Note that here we assume that the background network traffic initiated by non-migrating VMs has much shorter durations than the migration flows, which is typical in enterprise network operations. As a consequence, those background traffic flows may not significantly interfere with the long migration flows [4].

On the other hand, due to the sharing of the network link in parallel migrations, the network bandwidth allocated to each migration is reduced compared with the sequential case. As described in Section II, such a bandwidth reduction will generate more memory pages that need to be transferred during the migration, which in consequence leads to an increase of VM migration time.

In order to find a balance between the benefits and disadvantages of network link sharing, we need to first predict the VM migration performance under different link sharing strategies. However, due to the complexity of VM migration, its duration depends on several factors such as the available bandwidth in the link and the VM memory dirty rates. Table I presents two examples to illustrate this, in which we compare the time of two migrations when they are executed in a sequential order and simultaneously. In the first example, we migrate VMs with 1GB memory in a link with 1Gbps bandwidth. It shows that when the VMs have 2k memory dirty pages per second, it takes only 20 seconds for parallel migration to complete, whereas sequential migration consumes 22 seconds. However, when the VM memory dirty rate increases to 15k pages per second, parallel migration becomes slower than the sequential one. In the second example, we compare the migration time of two VMs with 1GB memory and a dirty rate of 10k memory pages per second in different link situations. It shows that while parallel migration is faster when two migrations are executed in a link with 1Gbps available bandwidth, sequential migration becomes much faster when the link bandwidth drops to 300 Mbps.

| Case | VM page dirty rate | Link bandwidth | Sequential migration | Parallel migration |
|---|---|---|---|---|
| Case 1 | 2 k/s | 1 Gbps | 22 s | 20 s |
| Case 1 | 15 k/s | 1 Gbps | 27 s | 38 s |
| Case 2 | 10 k/s | 1 Gbps | 23 s | 21 s |
| Case 2 | 10 k/s | 300 Mbps | 78 s | 90 s |

Table I. Total time of VM migrations when they are executed sequentially and in parallel.

The above observations illustrate the complex behavior of VM migration, whose performance is hard to predict accurately. Even in a single VM situation, the migration time is a nonlinear function of the link condition and VM characteristics, as shown in equation (1). When several extra factors are involved in multiple simultaneous migrations, it becomes even harder to develop a mathematical model to predict the VM migration performance. As an alternative, this paper uses software simulation to identify the migration performance under different link sharing strategies. Our simulation follows the source code of VM migration implemented in Xen, which is in accordance with the stages described in Figure 1. Since there have already been some studies on simulating a single VM migration [1], we do not present that part in detail here.

Our focus is to predict the migration time $T_{mig-i}$ of each VM under $s$ simultaneous migrations, and hence their overall migration time $T^{(pal)}$.

As described in Section II, the VM migration time $T_{mig-i}$ is composed of three parts: the PremigrationOverhead, the time $T_0$ spent in the 'iterative pre-copy' and 'stop-and-copy' stages, and the PostmigrationOverhead. In multiple simultaneous migrations, we use the same values of PremigrationOverhead and PostmigrationOverhead as in the single VM migration, because those parameters usually do not change much with respect to the allocated resources. However, the time $T_0$ changes significantly due to the bandwidth sharing between multiple migrations. In the case when $s$ migrations are executed simultaneously in a link with available bandwidth $B$, each migration is allocated only $B/s$ bandwidth. Considering the overhead of TCP in bandwidth utilization, we further express the allocated bandwidth for each migration as

$$\tilde{B} = B \cdot (1-\Delta)/s \tag{4}$$

where $\Delta$ represents the degree of bandwidth reduction due to the overhead of TCP congestion control. The value of $\Delta$ decreases with the increase of $s$, i.e., the number of concurrent migrations, due to the multiplexing of bandwidth usage between multiple migrations. In the paper, the $\Delta$ value is determined by the following equation

$$\Delta = 0.25 \cdot \exp\{-0.5 \cdot (s-1)\} \tag{5}$$

As we can see from equations (4) and (5), while $\tilde{B}$ is only around $0.75B$ when one migration is executed in the link, the real utilized bandwidth for each migration exponentially approaches $B/s$ as the number of concurrent migrations $s$ increases. Given the real bandwidth $\tilde{B}$, we can simulate the time $T_0$ for each migration, and hence its total migration time $T_{mig-i}$.
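Equations (4) and (5) can be evaluated directly, as in the minimal sketch below. The function names are illustrative assumptions; the constants 0.25 and 0.5 are the ones given in equation (5).

```python
import math


def tcp_overhead(s):
    """Equation (5): fraction of bandwidth lost to TCP overhead with s flows."""
    return 0.25 * math.exp(-0.5 * (s - 1))


def effective_bandwidth(link_bw, s):
    """Equation (4): bandwidth actually available to each of s migrations."""
    return link_bw * (1.0 - tcp_overhead(s)) / s


if __name__ == "__main__":
    link_bw = 1000.0  # assumed link bandwidth in Mbps
    for s in (1, 2, 3):
        per_vm = effective_bandwidth(link_bw, s)
        # s * per_vm is the aggregate bandwidth used by all s flows together.
        print(s, round(per_vm, 1), round(s * per_vm, 1))
```

For a single flow the sketch reproduces the $0.75B$ figure stated above, and the aggregate utilization $s \cdot \tilde{B}$ grows toward $B$ as $s$ increases.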

[Figure 2 plot. Axes: memory dirty rate (pages/sec, 0 to 15000), link bandwidth (Mbps, 0 to 2000), number of migrations (1 to 3).]

Figure 2. The optimal link sharing policy for multiple migrations with respect to the link bandwidth and VM memory dirty rates.

By using our simulation tool, we evaluate the total VM migration time under various link sharing strategies, given different VM characteristics and available link bandwidth. From the simulation results, we find that the optimal link sharing mainly depends on two factors: the available link bandwidth and the memory dirty rates of the migrating VMs. This is because those two metrics determine the size of the extra content, in addition to the original VM memory, that needs to be transferred during the migration. After summarizing many simulation scenarios, we can obtain the optimal number $s$ of concurrent migrations, given the specific link bandwidth and VM memory dirty rates, that leads to the shortest total migration time.
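The search for the optimal $s$ can be sketched with a toy model. This is not the Xen-based simulator described in the paper: the 4096-byte page size, the 2-second combined overhead, and the pre-copy stop rules (at most 30 rounds, at most 3 times the memory size sent, or fewer than 200 KB left dirty) are all assumptions made for illustration, as are the function names.

```python
import math


def tcp_overhead(s):
    """Equation (5)."""
    return 0.25 * math.exp(-0.5 * (s - 1))


def migration_time(mem, dirty_rate, bw, overhead=2.0,
                   max_rounds=30, max_sent_factor=3.0, stop_bytes=200 * 1024):
    """Toy pre-copy model: mem in bytes, dirty_rate and bw in bytes/s.

    overhead stands in for PremigrationOverhead + PostmigrationOverhead
    (assumed value). The stop rules are assumptions, not Xen's exact logic.
    """
    to_send, sent, elapsed = float(mem), 0.0, 0.0
    for _ in range(max_rounds):
        round_time = to_send / bw
        elapsed += round_time
        sent += to_send
        # Memory dirtied while this round was in flight, capped at the VM size.
        to_send = min(dirty_rate * round_time, float(mem))
        if to_send <= stop_bytes or sent >= max_sent_factor * mem:
            break
    elapsed += to_send / bw  # stop-and-copy of the remaining dirty memory
    return overhead + elapsed


def best_sharing(mem, dirty_rate, link_bw, max_s=3):
    """Return the s in 1..max_s minimizing the normalized time T_pal / s."""
    best_s, best_norm = 1, float("inf")
    for s in range(1, max_s + 1):
        per_vm_bw = link_bw * (1.0 - tcp_overhead(s)) / s  # equation (4)
        norm = migration_time(mem, dirty_rate, per_vm_bw) / s
        if norm < best_norm:
            best_s, best_norm = s, norm
    return best_s


if __name__ == "__main__":
    gbps = 125e6  # 1 Gbps expressed in bytes/s
    for pages_per_s in (2000, 15000):
        print(pages_per_s, best_sharing(1e9, pages_per_s * 4096, gbps))
```

All VMs are assumed identical, so $T^{(pal)}$ equals the time of any one of them. Under these assumed parameters the toy model shows the same qualitative trend as Figure 2 (more sharing at low dirty rates, sequential migration at high dirty rates), but its absolute times should not be read as the paper's results.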

For ease of explanation, we assume that all the migrating VMs have 1GB memory and share the same memory dirty rate, and plot some of our results in Figure 2. The x and y axes in the figure represent the available link bandwidth and the VM memory dirty rate respectively, whereas the z axis describes the optimal number of concurrent migrations given different link and VM settings. It shows that the optimal migration number $s$ increases with the increase of link bandwidth, but decreases with the increase of VM memory dirty rates. For the region with large link bandwidth and small memory dirty rates, we can run 3 concurrent migrations to achieve the best performance. This is because the benefits of link bandwidth sharing surpass the disadvantage of increased memory dirty pages when bandwidth is sufficient and memory dirty rates are low. On the other hand, when the link bandwidth is small and VMs have large memory dirty rates, sequential migration performs the best. Note that it is hard to estimate the $s$ value based on just one of those two metrics. Only when we have the values of both the link bandwidth and the VM memory dirty rates can we locate the corresponding region in Figure 2 and then determine the optimal concurrent migration number.

IV. Global Resource Assignment

Section III only presents the bandwidth sharing in a single network link for multiple migrations. In reality, the whole network is composed of a large number of communication links organized by a certain topology design. Meanwhile, each migration usually covers a set of network links in its migration path, i.e., from the source to the destination machine. Based on the bandwidth sharing policy in each link, this section identifies an optimal assignment of global network resources for multiple migrations to achieve the minimal total migration time.

Figure 3. Multiple migrations in the network.

Figure 3 presents an example to illustrate the problem of global resource assignment. In a typical three-tier data center network, there are three migrations $M_1$, $M_2$ and $M_3$, from source $s_i$ to the destination machine $d_i$, $i = 1, 2, 3$, whose migration paths are plotted as the dashed, solid, and dash-dot lines respectively. As we can see, since the migration $M_3$ has a totally disjoint path from $M_1$ and $M_2$, $M_3$ can be scheduled concurrently with $M_1$ and $M_2$. On the other hand, since $M_1$ and $M_2$ have overlapped links in their migration paths, whether we can schedule them concurrently depends on the bandwidth sharing policy in those common links. That is, if the available bandwidths in the overlapped links are large enough to support parallel migrations for the quick completion of both tasks, we can execute $M_1$ and $M_2$ concurrently; otherwise we execute them sequentially.

While it is relatively easy to handle only two migrations with overlapped links as in the above example, it becomes a combinatorial optimization problem to find the best organization of many migrations with different overlapped links along their migration paths. In this paper, we use a bin-packing algorithm [8] to address that issue. We treat all the links in the network as a bin, and use a multi-dimensional vector $C$ to represent its capacity. That is, we index each link in the network, and measure the available bandwidth in those links as

$$C = [c_1, c_2, \cdots, c_r]^\top \tag{6}$$

where $r$ equals the number of links in the bin. In practice, the value of $r$ depends on the number of physical links in the network as well as the network configuration. For example, when the network is operated in full-duplex mode, which is typical in network configurations, the value of $r$ equals twice the number of physical links, because the two traffic directions are treated separately. If the network is configured with equal-cost multipath (ECMP) load sharing [5], we need to combine the multiple links that are used for balancing the load into one logical link, and only include the logical link in the vector (6).

The item in our bin-packing algorithm corresponds to each migration task. Given the indices of network links, we use an $r$-dimensional binary vector $P^{(i)} = [1, 0, 0, \cdots, 1]^\top$ to represent the path of migration $M_i$, in which the value '1' in the $j$th entry indicates the inclusion of the $j$th link in the path and '0' its exclusion. The end-to-end bandwidth demand for migration $M_i$ is defined as a vector

$$D^{(i)} = P^{(i)} \times d^{(i)} = [1, 0, 0, \cdots, 1]^\top \times d^{(i)} \tag{7}$$

where $d^{(i)}$ is the expected bandwidth allocated to $M_i$, which will be determined in Section IV-A.
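The path vector $P^{(i)}$ and demand vector $D^{(i)}$ of equation (7) can be built as in the following sketch. The helper names, the zero-based link indices and the sample values are illustrative assumptions.

```python
def path_vector(num_links, path_links):
    """Binary vector P: entry j is 1 if link j lies on the migration path."""
    return [1 if j in path_links else 0 for j in range(num_links)]


def demand_vector(path, d):
    """Equation (7): D = P * d, the per-link bandwidth demand of one migration."""
    return [p * d for p in path]


if __name__ == "__main__":
    r = 5                            # assumed number of (directed) links
    P = path_vector(r, {0, 4})       # migration traverses links 0 and 4
    D = demand_vector(P, 500.0)      # assumed expected bandwidth d in Mbps
    print(P, D)
```

In a full-duplex network, each physical link would contribute two entries to these vectors, one per traffic direction, as noted above for the capacity vector $C$.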

Now given the capacity of the bin and the resource demands of all items, the bin-packing algorithm packs those items into the smallest number of bins. By doing so, we can achieve the quickest completion of all migrations, because the number of bins generated by bin-packing represents the total duration of those migrations. In Figure 4, we demonstrate such a process by first assuming that all migrations take the same amount of time $T$. The x axis in the figure denotes the time, which is divided into $b$ intervals $T_1, T_2, \cdots, T_b$ with equal length $T$. Since all migrations are assumed to have the same duration $T$, the network has the bandwidth capacity $C$ at the beginning of each epoch $T_i$. The y axis in Figure 4 represents the migration tasks. Considering the link sharing policy discussed in Section III, we can only initiate a subset of migrations in each epoch $T_i$. Our bin-packing solution is to find an optimal assignment of VM migrations into those epochs, so that all the migrations can be completed in the shortest time.

Figure 4. Migration scheduling is regarded as a bin-packing process.

Several questions need to be addressed within the bin-packing framework. 1) How do we determine the expected bandwidth allocation $d^{(i)}$ in equation (7), and hence the demand $D^{(i)}$, for each migration task? 2) What are the details of the bin-packing algorithm used for migration scheduling? 3) Since Figure 4 assumes the same duration for all migrations, how do we deal with the situation where migration tasks have different durations? The following sections describe our solutions to those issues.

A. Estimate the Migration Demand

The expected bandwidth $d^{(i)}$ allocated to each migration $M_i$ is important for achieving the shortest completion of multiple migrations. If $d^{(i)}$ is small, we can pack more migrations in each bin, and the total migration period, i.e., the number of epochs in Figure 4, can be reduced. However, if $d^{(i)}$ is too small, there will be many concurrent migrations in each bin competing for network resources, and hence the migration time for those concurrent tasks, i.e., the length of each epoch, will become longer. Since the optimal link sharing policy discovered in Section III provides a guideline about the number of concurrent migrations in each network link, we determine the value of $d^{(i)}$ based on such policies in all the links along $M_i$'s migration path.

We denote the links in $M_i$'s path as a set $\{l_1^{(i)}, l_2^{(i)}, \cdots, l_k^{(i)}\}$, each of which has available bandwidth $c_j$, $j = 1, \cdots, k$. In the following we first estimate $M_i$'s bandwidth demand in each link $l_j^{(i)}$, and then derive the overall demand $d^{(i)}$ from those local estimations. Given the available capacity $c_j$ of link $l_j^{(i)}$ and a migrating VM with memory page dirty rate $R_i$, we can identify its optimal bandwidth sharing policy $s_j^{(i)}$ in that link from the simulation results in Section III, which represents the optimal number of such VMs that can migrate simultaneously in the link to achieve the minimal total migration time. From that we determine the bandwidth demand of $M_i$ in link $l_j^{(i)}$ as

$$d_j^{(i)} = c_j / s_j^{(i)} \tag{8}$$

That is, the local demand is determined so as to allow $s_j^{(i)}$ such concurrent migrations in the link to achieve the optimal migration time. If the available link capacity is not big enough to support parallel migrations, i.e., $s_j^{(i)} = 1$, the migration $M_i$ can consume all the available bandwidth in that link. Otherwise, $M_i$ only obtains a portion of the bandwidth capacity because it has to share the link with other migration tasks.

Once we find the bandwidth demand $d_j^{(i)}$ for all the links $l_j^{(i)}$, $j = 1, \cdots, k$ along $M_i$'s path, the overall bandwidth demand of $M_i$ is determined as the maximum demand among all the local estimations

$$ d^{(i)} = \max\left\{ d_1^{(i)}, d_2^{(i)}, \cdots, d_k^{(i)} \right\} \tag{9} $$

The intuition here is that the overall demand $d^{(i)}$ should satisfy all the local demands $d_j^{(i)}$ to ensure the quick completion of migrations. Note that although rare, it is possible that $d^{(i)}$ may exceed the available bandwidth of some links in $M_i$'s path. In that case, $d^{(i)}$ is determined as the minimal bandwidth capacity of those links.
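The demand estimation in equations (8) and (9) can be sketched as follows. This is only an illustration under stated assumptions: the function names, the `toy_policy` lookup, and the capacities in Mbps are hypothetical, whereas in the paper the sharing policy $s_j^{(i)}$ comes from the simulation results of Section III.

```python
from typing import Callable, Sequence


def link_demand(capacity: float, sharing: int) -> float:
    """Equation (8): local demand d_j = c_j / s_j on one link."""
    if sharing < 1:
        raise ValueError("sharing policy must allow at least one migration")
    return capacity / sharing


def overall_demand(
    capacities: Sequence[float],
    dirty_rate: float,
    optimal_sharing: Callable[[float, float], int],
) -> float:
    """Equation (9): take the largest local demand along the path, then cap
    it at the smallest available capacity if it exceeds some link."""
    local = [link_demand(c, optimal_sharing(c, dirty_rate)) for c in capacities]
    return min(max(local), min(capacities))


def toy_policy(capacity_mbps: float, dirty_rate: float) -> int:
    """Hypothetical stand-in for the Section III lookup (not from the paper):
    allow two concurrent migrations on links with at least 800 Mbps free."""
    return 2 if capacity_mbps >= 800 else 1


# Both links can be shared by two migrations, so the demand is 1000 / 2.
print(overall_demand([1000, 900], 2000, toy_policy))
# The local maximum (500) exceeds the 400 Mbps link, so the demand is capped.
print(overall_demand([1000, 400], 2000, toy_policy))
```

Capping at the smallest capacity on the path is equivalent to capping at the smallest of the exceeded links, since the narrowest link is always among those exceeded.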

B. Find the Bin-Packing Solution

As shown in Figure 4, we want to pack the migration tasks into the smallest number of time slots, where the aggregated demand of the migrations assigned to each time slot should not exceed the capacity of the network. Such a bin-packing problem is NP-hard, and the first-fit decreasing (FFD) heuristic [3] is used here to identify a near-optimal solution. The FFD method sorts migration tasks in decreasing order of bandwidth demand, and attempts to place each task in the earliest epoch that can accommodate it. It can be described in the following steps.

1) Transform the resource demand vector $D^{(i)}$ for each migration $M_i$, described in equation (7), into a scalar $\eta^{(i)}$, $D^{(i)} \to \eta^{(i)}$, where $\eta^{(i)}$ equals the summation of all the elements in $D^{(i)}$;

2) Sort the resource demands based on their transformed $\eta^{(i)}$ values;

3) Scan migration tasks from high to low $\eta^{(i)}$. For each selected migration task, we try to place it in the earliest time interval $T$ that still has free capacity to host it;

4) Repeat Step 3) until all the migration jobs have been allocated.

In Step 1 we transform the demand vector $D$ into a scalar $\eta$ because it is not straightforward to use the vector $D$ to compare and sort bandwidth demands between different migrations. Here we use the summation of all the elements in $D$ as the transformed scalar $\eta$. The intuition is that the migrations that require more bandwidth and also cover more links should get higher priorities in the scheduling. After the above steps, the number of assigned bins $T_b$ gives the total duration of the migration tasks, measured in epochs.
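The four steps above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `ffd_schedule`, the dictionary layout, and the example numbers are assumptions. Each demand vector holds one entry per link, with zero for links that are not on the migration's path.

```python
from typing import Dict, List, Sequence


def ffd_schedule(
    demands: Dict[str, Sequence[float]],
    capacity: Sequence[float],
) -> List[List[str]]:
    """Pack migrations into epochs with first-fit decreasing (FFD).

    demands[m][j] is the bandwidth migration m needs on link j;
    capacity[j] is the available bandwidth of link j in every epoch.
    """
    # Steps 1-2: reduce each demand vector to its sum and sort descending.
    order = sorted(demands, key=lambda m: sum(demands[m]), reverse=True)

    epochs: List[List[str]] = []   # migrations assigned to each epoch
    free: List[List[float]] = []   # remaining per-link capacity of each epoch

    for m in order:                # Steps 3-4: scan from high to low demand
        d = demands[m]
        if any(dj > cj for dj, cj in zip(d, capacity)):
            raise ValueError(f"migration {m} does not fit in an empty epoch")
        for idx, room in enumerate(free):
            if all(dj <= rj for dj, rj in zip(d, room)):
                break              # earliest epoch with enough room
        else:
            epochs.append([])      # no epoch fits: open a new one
            free.append(list(capacity))
            idx = len(epochs) - 1
        epochs[idx].append(m)
        free[idx] = [rj - dj for rj, dj in zip(free[idx], d)]
    return epochs


# Hypothetical example with three links of 1000 Mbps each.
example = {
    "A": [600, 600, 0],
    "B": [400, 0, 400],
    "C": [1000, 0, 0],
    "D": [0, 400, 300],
}
print(ffd_schedule(example, [1000, 1000, 1000]))
```

In this example the tasks are scanned in the order A, C, B, D; task C cannot share the first link with A, so it opens a second epoch, while B and D fill the remaining room in the first one.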

C. Extend to General Situations

In practice, the migration tasks have different time durations due to their variances in VM memory size, memory page dirty rate, and so on. There are no clear boundaries between time slots, depicted as vertical dashed lines in Figure 4, to synchronize migration tasks. In those general situations, we still use the FFD-based heuristic to schedule migrations. However, the start of each migration is triggered by event signals rather than by time. That is, when a migration is completed, it sends a signal to the migration scheduler. Upon receiving that event, the scheduler computes the current available bandwidth in the network links

$$ C^{(new)} = C - \sum_{\text{active migrations } M_k} D^{(k)} \tag{10} $$

where $C$ is the original link capacity described in equation (6) and the $D^{(k)}$ are the bandwidth demands of ongoing migrations. We regard $C^{(new)}$ as the current bin size, and scan the ordered migration tasks in the waiting list with an attempt to allocate as many of them as possible to fill the capacity $C^{(new)}$. The whole process stops when all the VMs have migrated to their target machines.
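A sketch of the event-driven variant may help make equation (10) concrete. The handler name `on_migration_complete`, the data structures, and the example values are hypothetical; the paper does not specify how the scheduler is implemented.

```python
from typing import Dict, List, Sequence


def available_capacity(
    capacity: Sequence[float],
    active: Dict[str, Sequence[float]],
) -> List[float]:
    """Equation (10): C_new = C minus the demands of all active migrations."""
    remaining = list(capacity)
    for demand in active.values():
        remaining = [r - d for r, d in zip(remaining, demand)]
    return remaining


def on_migration_complete(
    finished: str,
    capacity: Sequence[float],
    active: Dict[str, Sequence[float]],
    waiting: List[str],
    demands: Dict[str, Sequence[float]],
) -> List[str]:
    """Handle a completion signal: release the finished task's bandwidth and
    start as many waiting tasks as fit, scanning in the pre-sorted order."""
    active.pop(finished, None)
    room = available_capacity(capacity, active)
    started = []
    for m in list(waiting):        # waiting is sorted by decreasing demand
        d = demands[m]
        if all(dj <= rj for dj, rj in zip(d, room)):
            room = [rj - dj for rj, dj in zip(room, d)]
            active[m] = d
            waiting.remove(m)
            started.append(m)
    return started


# Hypothetical state: A and C are running, D and B are waiting.
demands = {"A": [600, 600], "C": [400, 400], "D": [600, 0], "B": [500, 0]}
active = {"A": demands["A"], "C": demands["C"]}
waiting = ["D", "B"]
print(on_migration_complete("A", [1000, 1000], active, waiting, demands))
```

When A completes, the free capacity becomes [600, 600]; D fits and starts, while B has to wait because only 0 Mbps remains on the first link.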

V. Experimental Results

In this section we first run VM migrations under different situations and compare the real migration time with simulation results. After that, we use the simulation data to mimic a large scale system where a number of VMs request to migrate. We compare different strategies of scheduling those migrations, and demonstrate the good performance of our approach.

A. Validate the Simulation Model

In order to validate the accuracy of our simulation model in predicting the time of multiple VM migrations, we set up a test bed in which two physical machines connect to a switch through 1 Gbps network links. Of the two machines, one serves as the migration source and the other as the destination. We use the Cisco SRW2016 16-port Gigabit switch in the experiment. Each physical server has an Intel Core2 Quad 2.5 GHz processor, 4 GB of RAM, and a 500 GB hard disk. In the experiment, we migrate different numbers of VMs from source to destination, and record the migration time of those tasks.

In order to generate different network link conditions, we initiate background VMs in both the migration source and destination machines. The network performance testing application iperf [6] is executed in background VMs to generate traffic from the migration source to destination machine. By tuning the intensity of background traffic, we can have situations where the network links have different available bandwidths.

Besides the background traffic, we also change the memory dirty rates of those migrating VMs. We run a micro benchmark application in the VM, which regularly updates VM memory pages based on a memory page dirty rate that is predefined in a configuration file. We modify that configuration in different migration runs to produce VMs with different memory dirty characteristics.

[Figure 5, panels (a)-(i): migration time (sec) versus page dirty rate (k/sec), real migration time and simulation time, for 1, 2, and 3 VMs over 1 Gbps, 500 Mbps, and 300 Mbps links]

Figure 5. Real evaluation and simulation results when different numbers of VMs with various memory dirty rates concurrently migrate through network links with different available bandwidths.

Based on the above settings, we write a script to generate a number of migration situations by changing the VM memory dirty rates, the network link available bandwidth, and the number of concurrent migrations in the link. In each situation, we run VM migrations three times, and compute the average migration time of three repeated runs to represent its performance. We then compare real migration performance with the corresponding output from our simulation tool to verify the accuracy of simulation.

Figure 5 presents some of our evaluation results. It contains 9 panels that cover different migration situations. In each panel, we plot the total migration time (the y axis) with respect to different VM memory dirty rates (the x axis) from 1k memory pages per second to 15k pages per second. The solid lines represent the results of real evaluations, whereas the dashed lines denote the simulation outcome. Figure 5(a)-(c) presents the performance of a single VM migration in the network link with 1 Gbps, 500 Mbps, and 300 Mbps available bandwidth, respectively, while Figure 5(d)-(f) and (g)-(i) illustrate the migration performance of 2 and 3 concurrent VMs under the same network conditions.

As we can see, our simulation results are close to the real evaluation outputs in all those situations, especially when the network has large available bandwidth or the number of concurrent VMs is small. Note that the evaluation results are the average of three repeated migration runs. We also check the variance of migration time across repeated runs. While the variances are not plotted in the figure, they increase with the number of concurrent VMs and decrease with the available network bandwidth. This is reasonable because the performance of VM migrations may become more unpredictable in resource-limited situations. It also explains a few discrepancies between real evaluation and simulation results in figures such as Figure 5(h) and (i). In spite of that, our simulation tool provides an accurate prediction of migration performance under various situations.


In addition to the total migration time, we also use the monitoring tool ‘virt-top’ running in libvirt [9] to record the resource utilization of both physical and virtual machines during the migration process. In Figure 6, we plot the idle CPU utilization in the migration source machine to illustrate the migration process of two VMs with 1GB memory size and 2k memory pages per second dirty rates in a 1 Gbps network link. While Figure 6(a) presents the situation where those two migrations occur sequentially, Figure 6(b) presents the results of parallel migrations. Here we use idle CPU time to illustrate the migration process because that measurement gives clear indications of the start and end of migrations. As we can see, there are two transient performance drops in the sequential migration, which coincide with the ‘commitment and activation’ stages of the two migrations. On the other hand, we see only one significant performance drop in the parallel case because the two migrations complete their last stages simultaneously. Overall the sequential migration consumes around 24 seconds, whereas it only takes 20 seconds to complete the parallel migration. Such a performance gain in parallel migration mainly comes from the network bandwidth multiplexing between the two migration tasks, as well as their parallel executions of the ‘resource reservation’ and ‘commitment and activation’ stages as described in Section II.

[Figure 6, panels (a) and (b): idle CPU versus time (sec)]

Figure 6. Host CPU utilizations when two VMs with 1GB memory and 2k dirty memory pages per second migrate through a 1 Gbps link (a) sequentially, and (b) in parallel.

Under the same network situation as in the previous case, we increase the VM memory dirty rate to 15k pages per second, and then compare the migration performance when two VMs are migrated sequentially and in parallel. The results are plotted in Figure 7. It shows that while the migration curves have shapes similar to those in Figure 6, both sequential and parallel migrations take longer than in the previous case. This is because the increased VM memory dirty rate produces many extra memory pages that need to be transferred during the migration. In addition, the number of generated extra pages is larger in parallel migration due to the limited network bandwidth allocated to each migration task. As a result, the time for parallel migration significantly increases to 38 seconds as shown in Figure 7(b), whereas the sequential migration time only increases to 27 seconds.

[Figure 7, panels (a) and (b): idle CPU versus time (sec)]

Figure 7. Host CPU utilizations when two VMs with 1GB memory and 15k dirty memory pages per second migrate through a 1 Gbps link (a) sequentially, and (b) in parallel.
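The trade-off between sequential and parallel migration can be illustrated with a deliberately simplified pre-copy model. It is not the simulation tool of Section III, and it does not reproduce the measured times in Figures 6 and 7. All parameters are assumptions: 4 KB pages (so 1GB is 262144 pages), a 1 Gbps link approximated as 30000 pages per second, a fixed 3-second overhead per migration for the non-transfer stages, a 50-page stop threshold, and at most 30 pre-copy rounds.

```python
def precopy_time(mem_pages, bw_pages, dirty_rate, stop_pages=50, max_rounds=30):
    """Transfer time of an idealised iterative pre-copy: each round resends
    the pages dirtied during the previous round, followed by one final
    stop-and-copy transfer of whatever is still dirty."""
    to_send, total = float(mem_pages), 0.0
    for _ in range(max_rounds):
        elapsed = to_send / bw_pages
        total += elapsed
        # Pages dirtied while this round was being sent (at most all memory).
        to_send = min(float(mem_pages), dirty_rate * elapsed)
        if to_send <= stop_pages:
            break
    return total + to_send / bw_pages


def sequential_time(n, mem_pages, bw_pages, dirty_rate, overhead):
    """n migrations one after another, each with the full link bandwidth."""
    return n * (overhead + precopy_time(mem_pages, bw_pages, dirty_rate))


def parallel_time(n, mem_pages, bw_pages, dirty_rate, overhead):
    """n identical migrations at once: overheads overlap, bandwidth is split."""
    return overhead + precopy_time(mem_pages, bw_pages / n, dirty_rate)


MEM, BW, OVERHEAD = 262144, 30000, 3.0   # assumed values, see lead-in
for rate in (2000, 15000):
    seq = sequential_time(2, MEM, BW, rate, OVERHEAD)
    par = parallel_time(2, MEM, BW, rate, OVERHEAD)
    print(f"dirty rate {rate}: sequential {seq:.1f}s, parallel {par:.1f}s")
```

Under these assumptions the parallel case is faster at 2k pages per second, because the overlapped overhead outweighs the extra dirty pages, and much slower at 15k pages per second, because each VM's share of the link barely keeps up with its dirty rate. The model only reproduces the direction of the effect, not its magnitude.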

We have evaluated many other migration cases. From our evaluations we see that the sequential and simultaneous migrations often outperform each other in various cases according to the different VM characteristics and link conditions. Given such observations of complex migration behaviors, we confirm that software simulation is an effective way to predict the migration performance. In the experiments we run our simulation tool extensively to cover as many different migration situations as possible. From the simulation results we can identify the optimal link sharing policy, i.e., the number of concurrent VMs that can lead to the shortest migration time. We have already plotted that result in Figure 2.

B. Migration Performance Evaluation

We use the data generated from our simulation tool to evaluate the performance of multiple VM migrations in a typical two-tiered data center network. There is an aggregation switch at the top of the network hierarchy, which connects to eight edge switches via 10Gbps network links. Each edge switch in turn connects to sixteen servers via 1Gbps network links. We create background traffic in the network. While the traffic in 1Gbps network links is randomly generated from a predefined range, the traffic in 10Gbps links is the aggregation of incoming traffic from its associated 1Gbps links. There are four virtual machines in each physical server. Each VM has 1GB memory and the memory dirty rate is randomly chosen between 4k and 8k memory pages per second. As a result, the system consists of 64 physical servers and 256 VMs in total. Note that since our focus is the network impact on migration performance, we assume that all physical machines have enough CPU and memory resources to host VMs.


We generate different numbers of VM migrations in the system and simulate the migration time. For each migration, the source VM is randomly selected from machines connecting to the left two edge switches, and its destination is from those machines connecting to the right two edge switches. We use the bin-packing method described in Section IV-B to schedule those migrations. Here the network links are configured in full-duplex mode. As a result, the number of links in each bin, i.e., the size of the vector $C$ in equation (6), is twice the number of physical links in the network. The resource demand of each migration is determined by the VM’s memory dirty rate and the available bandwidths of all links along its migration path. Given the bin capacity and resource demand of each migration, we vary the number of VM migrations in the system and simulate the migration time under those different situations. In order to demonstrate the performance of our approach, we compare our results with the performance of fixed $k$-simultaneous migrations, where at most $k$ migrations are executed simultaneously in each round and a new round starts only after the previous $k$ migrations have all completed. In the experiments we choose $k = 4$ and $8$ for the comparison.
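The fixed $k$-simultaneous baseline can be sketched as follows. The function name and the example durations are hypothetical, and the sketch treats the per-migration durations as given inputs; in the actual comparison each duration itself depends on how many migrations share the links in that round.

```python
from typing import Sequence


def k_simultaneous_total(durations: Sequence[float], k: int) -> float:
    """Total time when tasks run in rounds of at most k, where a new round
    starts only after every migration in the previous round has finished."""
    if k < 1:
        raise ValueError("k must be at least 1")
    total = 0.0
    for start in range(0, len(durations), k):
        # A round lasts as long as its slowest migration.
        total += max(durations[start:start + k])
    return total


# Hypothetical durations in seconds: rounds are [10, 30], [20, 5], [40].
print(k_simultaneous_total([10, 30, 20, 5, 40], 2))  # prints 90.0
```

The example shows the weakness of fixed rounds: short migrations finish early, yet the bandwidth they release stays unused until the slowest task in the round completes.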

[Figure 8, panels (a) and (b): migration time (seconds) versus total number of migrations, for 4 simultaneous migrations, 8 simultaneous migrations, and bin-packing based scheduling]

Figure 8. The duration of different numbers of VM migrations under (a) lightly loaded and (b) heavily loaded network conditions.

Figure 8 presents the results of VM migrations when they are executed under two different network bandwidth conditions. While in Figure 8(a) we generate around 500 Mbps background network traffic in each second-tier link to create a relatively lightly loaded network, in Figure 8(b) we increase the background traffic to create a heavily loaded situation. The x axis in both figures represents the number of migrations that need to execute, and the y axis represents the migration time. The curves are obtained from the average performance of ten repeated simulations, in which the solid and dashed lines denote the results of 4-simultaneous migration and 8-simultaneous migration respectively, and the dash-dot lines represent the results of our scheduling method. As we can see, when the network is lightly loaded, the 8-simultaneous migration completes faster than the 4-simultaneous migration, because the large network bandwidth allows more simultaneous migrations to share the resource and accelerate the task. On the other hand, when the available network bandwidth is limited, the 4-simultaneous migration performs better than the 8-simultaneous migration, as shown in Figure 8(b). This is due to the large number of dirty memory pages generated by the 8 simultaneous migrations under the bandwidth-limited condition. Nevertheless, compared with the strategy of fixed $k$-simultaneous migrations, our bin-packing based scheduling completes faster in both situations. It automatically adapts to the network conditions and generates the migration schedule with the shortest completion time among the compared strategies.

VI. Conclusions

This paper has proposed a novel method to coordinate multiple VM migrations in enterprise data centers and clouds. It has considered the migration sharing in each network link, as well as the global network bandwidth assignment for migration tasks. While the network link sharing has been addressed by software simulation, we have proposed a bin-packing algorithm to deal with the global resource assignment. As a result, the total time for those migration tasks can be minimized. The experiments have validated the effectiveness of our approach.

References

[1] S. Akoush, R. Sohan, A. Rice, A. W. Moore, and A. Hopper. Predicting the performance of virtual machine migration. In 18th Annual IEEE/ACM International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS ’10), pages 37–46, 2010.

[2] C. Clark, K. Fraser, S. Hand, et al. Live migration of virtual machines. In Proceedings of the 2nd conference on Symposium on Networked Systems Design & Implementation (NSDI ’05), pages 273–286, Berkeley, CA, 2005.

[3] E. G. Coffman, M. R. Garey, and D. S. Johnson. Approximation algorithms for bin packing: a survey. In Approximation Algorithms for NP-hard Problems, pages 46–93, 1997.

[4] L. Guo and I. Matta. The war between mice and elephants. Technical report, Boston, MA, USA, 2001.

[5] C. Hopps. Analysis of an equal-cost multi-path algorithm, 2000.

[6] Iperf. http://iperf.sourceforge.net/.

[7] Y. Kim. Data migration to minimize the total completion time. J. Algorithms, 55:42–57, April 2005.

[8] B. Korte and J. Vygen. Combinatorial Optimization: Theory and Algorithms. Springer, 4th edition, 2007.

[9] Libvirt. http://libvirt.org/index.html.

[10] C. Lu, G. A. Alvarez, and J. Wilkes. Aqueduct: Online data migration with performance guarantees. In Proceedings of the 1st USENIX Conference on File and Storage Technologies (FAST ’02), 2002.

[11] A. Tanenbaum. Computer Networks. Prentice Hall Professional Technical Reference, 4th edition, 2002.

[12] Y. Wu and M. Zhao. Performance modeling of virtual machine live migration. In Proceedings of the 4th IEEE International Conference on Cloud Computing (CLOUD ’11), pages 492–499, 2011.