Live Migration Basics - Adaptive Resource Relocation in Virtualized Heterogeneous Clusters

In virtualization, the term migration means moving a running instance of a virtual machine (a guest operating system) from one physical host to another. Migrating operating system instances across distinct physical hosts is a useful tool for administrators of data centers and clusters: It allows a clean separation between hardware and software, and facilitates fault management, load balancing, and low-level system maintenance [46].

Xen provides two types of migration, ‘live’ and ‘non-live’. In non-live migration (simply called migration), the VM is stopped and its memory pages are transferred to the destination VMM. During this phase the VM cannot respond to any external stimuli; this period is the downtime of the migrating VM. Once the memory pages have been transferred, the VM is resumed at the destination VMM. The downtime is on the order of seconds on traditional GigE infrastructure [46].

Xen uses the pre-copy migration technique [46] to achieve live migration. The design has a total of six stages, as shown in Figure 3.4. The first two stages ensure that the source and destination VMMs are compatible and that the destination VMM has sufficient resources available to host the new VM.
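As a rough check on the non-live downtime quoted above, the time to push an entire memory image over the link can be estimated from the guest memory size and the link rate. The sketch below is only a back-of-the-envelope model: the function name, the guest sizes, and the use of the nominal 1 Gbit/s line rate (ignoring protocol overhead and the time to suspend and resume the VM) are assumptions made for illustration, not measurements from this work.

```python
def stop_and_copy_downtime(memory_bytes: int, link_bits_per_s: float) -> float:
    """Seconds needed to send the whole memory image over the link.

    Assumes the VM is paused for the entire transfer and that the link
    runs at its nominal rate (no protocol overhead, no compression).
    """
    return memory_bytes * 8 / link_bits_per_s


GIB = 1024 ** 3
GIGE = 1e9  # assumed nominal Gigabit Ethernet line rate in bits per second

# Hypothetical guest memory sizes, chosen only for illustration.
for size_gib in (0.5, 1, 4):
    seconds = stop_and_copy_downtime(int(size_gib * GIB), GIGE)
    print(f"{size_gib} GiB guest: roughly {seconds:.1f} s of downtime")
```

Under these assumptions the estimate scales linearly with guest memory, which is why non-live migration of even modestly sized guests costs seconds of downtime on GigE.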

Stage 0: Pre-Migration. Active VM on Host A. An alternate physical host may be preselected for migration. Block devices mirrored and free resources maintained.

Stage 1: Reservation. Initialize a container on the target host.

Stage 2: Iterative Pre-copy. Enable shadow paging. Copy dirty pages in successive rounds.

Stage 3: Stop and copy. Suspend VM on Host A. Generate ARP to redirect traffic to Host B. Synchronize all remaining VM state to Host B.

Stage 4: Commitment. VM state on Host A is released.

Stage 5: Activation. VM starts on Host B. Connects to local devices. Resumes normal operation.

Timeline annotations in the figure: VM running normally on Host A; overhead due to copying; downtime (VM out of service); VM running normally on Host B.

Figure 3.4: Stages in default Xen migration [courtesy: [46]].

In the third stage (called iterative pre-copy), the memory pages of the VM being migrated are transferred from the source VMM to the destination VMM. This pre-copying occurs in a bounded number of iterations, or rounds; a code inspection revealed that the upper bound is 30 iterations. In the first iteration, all of the memory pages are transferred to the receiving VMM host. In each subsequent iteration, only those memory pages that were modified (dirtied) since the previous iteration are transferred. The developers suggest that this technique results in fewer pages being transferred in each subsequent iteration, and that the downtime of the VM is therefore minimized. Once a certain threshold on the number of dirtied memory pages is reached, the stop-and-copy phase is activated. In this phase, the running VM on the source VMM is stopped and the remaining dirty memory pages, the CPU state, etc. are transferred to the destination VMM. In the next two phases, the source and destination VMMs exchange the final acknowledgment and the VM is activated on the destination VMM.

Both live and non-live migration require the VM file system to be hosted on a shared network resource such as a network file system or a storage area network. As VMware’s VMotion and Microsoft’s Hyper-V live migration are proprietary software, little information is available about the internals of their migration mechanisms. Clark et al. [46] state that the architecture of VMware’s VMotion is similar to that of Xen’s live migration.
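The iterative pre-copy loop described above can be illustrated with a small simulation. This is a minimal sketch under simplifying assumptions, not Xen's implementation: the function name and parameters are hypothetical, the guest is assumed to dirty pages at a constant rate, and the stop threshold of 50 pages is an assumed value. Only the bound of 30 rounds comes from the text.

```python
def simulate_precopy(total_pages, dirty_rate, bandwidth,
                     max_rounds=30, stop_threshold=50):
    """Simulate bounded iterative pre-copy followed by stop-and-copy.

    total_pages    -- guest memory size in pages
    dirty_rate     -- pages dirtied per second (assumed constant)
    bandwidth      -- pages transferred per second
    max_rounds     -- bound on pre-copy rounds (30, as stated in the text)
    stop_threshold -- assumed dirty-page count that triggers stop-and-copy

    Returns (rounds, pages_sent, downtime_seconds).
    """
    to_send = total_pages  # the first round transfers all memory pages
    pages_sent = 0
    rounds = 0
    while rounds < max_rounds:
        rounds += 1
        round_time = to_send / bandwidth
        pages_sent += to_send
        # Pages dirtied while this round was in flight must be resent.
        to_send = min(total_pages, int(dirty_rate * round_time))
        if to_send <= stop_threshold:
            break
    # Stop-and-copy: the VM is paused while the remaining pages are sent.
    downtime = to_send / bandwidth
    pages_sent += to_send
    return rounds, pages_sent, downtime


# Hypothetical workload: 1 GiB guest with 4 KiB pages on a fast link.
rounds, sent, downtime = simulate_precopy(
    total_pages=262144, dirty_rate=3000, bandwidth=30000)
print(f"rounds={rounds}, pages sent={sent}, downtime={downtime * 1000:.2f} ms")
```

In this model the dirty set shrinks from round to round only when the dirty rate is below the transfer rate. When the guest dirties pages as fast as they can be sent, the loop runs to the round bound and the final stop-and-copy is as long as a non-live transfer.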

Live Migration vs Process Migration

Process migration is the act of transferring a process between two machines. It enables dynamic load distribution, fault resilience, eased system administration, and data access locality [81].

Process migration [91, 32, 50] was a hot topic in systems research in the late 1990s, but the approach has seen very little use in real-world applications [46]. Milojicic et al. [81] cite the complexity of implementation and the dependency on an operating system as the main obstacles to wider use of process migration. The complexity includes handling the residual dependencies that a migrated process retains on the machine from which it migrated. Examples of residual dependencies include open file descriptors, shared memory segments, and other local resources. Another issue with process migration is the transparency of migration to the user. In most migration solutions, the application needs to be migration-aware. For example, LSF [109] employs a checkpoint and restart facility for process migration, but this requires the application to be rewritten to support checkpoint and restart.
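To illustrate what being migration-aware means in practice, the following sketch shows a toy computation that saves and restores its own state explicitly. It is a generic, hypothetical example and does not use or represent the LSF interface: the function name, the JSON checkpoint format, and the `stop_after` parameter that simulates an interruption are all assumptions made for illustration.

```python
import json
import os
import tempfile


def run_sum(n, checkpoint_path, stop_after=None):
    """Sum 0..n-1, resuming from a checkpoint file if one exists.

    stop_after simulates a migration request: after that many steps the
    state is written to checkpoint_path and None is returned.
    """
    state = {"i": 0, "total": 0}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            state = json.load(f)  # restart: reload the saved progress
    steps = 0
    while state["i"] < n:
        if stop_after is not None and steps == stop_after:
            with open(checkpoint_path, "w") as f:
                json.dump(state, f)  # checkpoint: persist loop state
            return None
        state["total"] += state["i"]
        state["i"] += 1
        steps += 1
    return state["total"]


with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "checkpoint.json")
    first = run_sum(10, path, stop_after=4)   # interrupted part-way
    second = run_sum(10, path)                # resumed, e.g. on another host
    print(first, second)
```

The point of the sketch is that the application itself must decide what state to save and how to rebuild it, which is the burden that virtual machine migration avoids.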

One can use OpenMPI’s checkpoint and restart facility [65] to migrate an MPI application without any change to the application source. However, our experiments suggest that the migration time is on the order of minutes, which is not suited to our research. In any case, this facility was in its infancy at the time we started our research.

In contrast, the migration of virtual machines provides a better alternative. Because it is robust and leaves no residual dependencies, virtual machine migration has found its way into most mainstream operating system distributions; e.g., the Kernel-based Virtual Machine (KVM) [4] has been part of the Linux kernel since version 2.6.20, and Microsoft’s Hyper-V [14] is distributed with Windows Server 2008. Live migration is also operating system independent, which makes it very attractive for Grid and Cloud computing.
