Pwnetizer - Initial Considerations - Pwnetizer: Improving Availability in Cloud. Computing thro

3.4.1 Live Migration as a Starting Point

At first glance, VM Cloning sounds relatively simple. Why not just conduct Live VM Migration and leave the source VM turned on at the end? This would leave us with two VMs with the same main memory contents. However, both VMs would be writing to the same disk file, which is problematic because normal Operating Systems assume that they have dedicated access to the local hard disks.

As a result, neither of the VMs would be able to determine what the other VM may have written to disk, which inevitably leads to data loss and corruption after some time. For two VMs to be able to properly share the same disk file, a clustered filesystem and a custom OS would be required.

Nonetheless, such an arrangement would mean that a single compromised VM could take over the secondary storage used by all other clone VMs, which is undesirable.

Suppose that we decide to leave the source VM turned on after migration. For us to rule out persistent storage conflicts and guarantee VM independence, we require a disk cloning mechanism to ensure that the original VM and the clone VM are writing to separate disk files, which must be completely identical at cloning time and can diverge thereafter. Unfortunately, given that disk files represent a VM’s entire filesystem (i.e., OS files, application binaries and data, and user documents), they can be several gigabytes in size. For the cloned disk file to be fully consistent with respect to the clone VM’s main memory state, the disk cloning procedure must be performed during live migration’s stop-and-copy phase, when both VMs are suspended and no writes are being issued to the original VM’s disk. Consequently, disk cloning becomes the main downtime bottleneck: modern hard disks have write speeds on the order of 50 MB/s, which means that making a copy of a 1 GB disk file would require roughly 20 seconds of downtime. Our Pwnetizer cloning strategy minimizes this downtime by maintaining a local mirror of the networked filesystem share, which will be covered in Section 3.5.3.
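The downtime estimate above is a simple division of disk file size by sustained write speed. The following minimal Python sketch reproduces the arithmetic; the function name and the decimal unit constants are illustrative assumptions, and the model ignores read speed, caching, and filesystem overhead.

```python
MB = 10**6  # decimal megabyte, matching the 50 MB/s figure in the text
GB = 10**9  # decimal gigabyte


def copy_downtime_seconds(disk_bytes, write_bytes_per_s):
    """Estimate the time needed to write a full copy of a disk file.

    Hypothetical helper: assumes the copy is limited only by the
    sustained write speed of the target disk.
    """
    if write_bytes_per_s <= 0:
        raise ValueError("write speed must be positive")
    return disk_bytes / write_bytes_per_s


# A 1 GB disk file at 50 MB/s, as in the text.
print(copy_downtime_seconds(1 * GB, 50 * MB))  # 20.0
```

Because the estimate grows linearly with disk size, a disk file of several gigabytes would push the downtime well beyond the 20-second figure under the same assumptions.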

Leaving the source VM turned on after Live VM Migration leads to another problem: the two VMs will end up having the exact same network configuration (i.e., MAC and IP addresses), which translates into connectivity issues for both. Hence, one of them must acquire a new network identity to avoid networking conflicts. The main challenge in this case is for the clone VM to detect that cloning has happened so that it can begin reconfiguring its network. A way of doing this is described in Section 3.5.5.

From this discussion, it is clear that Live VM Migration serves as a good starting point for VM Cloning, as it results in a new VM with fully-consistent main memory state. Nonetheless, two non-trivial challenges (secondary storage independence and network reconfiguration) must be dealt with for the two VMs (original and clone) to coexist with each other.

3.4.2 Precopy vs Postcopy

In this section, we summarize the two most popular memory migration algorithms and evaluate their suitability for the VM Cloning scenario.

Pre-Copy

The pre-copy algorithm proposed by Clark et al. [8] keeps the source VM running for most of the migration procedure. It uses an iterative push phase, followed by a minimal stop-and-copy phase. The iterative nature of the algorithm is the result of what is known as dirty pages: memory pages that have been modified in the source VM since the last page transfer must be sent again to the destination VM. At first, iteration i will be dealing with fewer dirty pages than iteration i − 1.

Unfortunately, the available bandwidth and workload characteristics will eventually cause some pages to be updated at a faster rate than the rate at which they can be transferred to the destination VM. At that point, the stop-and-copy procedure must be executed. The stop-and-copy phase is when the CPU state and any remaining inconsistent pages are sent to the new VM, leading to a fully consistent state.
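The iterative push phase can be illustrated with a toy model. The sketch below is not the algorithm of Clark et al. [8] as implemented in any hypervisor; it assumes a constant dirtying rate, a constant bandwidth (both in pages per second), and that every dirtied page is distinct. The function name, parameters, and stopping threshold are hypothetical.

```python
def simulate_precopy(total_pages, dirty_rate, bandwidth,
                     stop_threshold=50, max_rounds=30):
    """Toy model of pre-copy's iterative push phase.

    total_pages    -- number of pages in the VM's memory
    dirty_rate     -- pages dirtied per second (assumed constant)
    bandwidth      -- pages transferred per second (assumed constant)
    stop_threshold -- stop iterating once this few pages remain
    Returns (rounds, pages_sent_while_running, pages_left_for_stop_and_copy).
    """
    to_send = total_pages  # the first round pushes all of memory
    sent = 0
    rounds = 0
    while rounds < max_rounds and to_send > stop_threshold:
        sent += to_send
        rounds += 1
        # Pages dirtied while this round was being sent must be resent.
        dirtied = min(total_pages, dirty_rate * to_send // bandwidth)
        if dirtied >= to_send:
            # Dirtying keeps up with the transfer: iterating no longer helps.
            to_send = dirtied
            break
        to_send = dirtied
    return rounds, sent, to_send


# Dirtying is 10x slower than the transfer: the dirty set shrinks each round.
print(simulate_precopy(100_000, 1_000, 10_000))   # (4, 111100, 10)
# Dirtying outpaces the transfer: fall back to stop-and-copy immediately.
print(simulate_precopy(100_000, 20_000, 10_000))  # (1, 100000, 100000)
```

In the first call, the 11,100 pages sent beyond the VM's 100,000 pages are the retransmissions that the iterative scheme incurs; the pages left at the end are what the stop-and-copy phase would have to send while the VM is suspended.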

Post-Copy

Post-copy migration defers the memory transfer phase until after the VM’s CPU state has already been transferred to the target and resumed there. As opposed to pre-copy, where the source VM is powered on during the migration process, post-copy delegates execution to the destination VM. In the most basic form, post-copy first suspends the migrating VM at the source node, copies minimal processor state to the target node, resumes the virtual machine at the target node, and begins fetching memory pages from the source over the network. Variants of post-copy arise in terms of the way pages are fetched. The main benefit of the post-copy approach is that each memory page is transferred at most once, thus avoiding the duplicate transmission overhead of pre-copy [25].
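The basic demand-fetching form of post-copy can be sketched as follows. This is a simplified illustration rather than an implementation from [25]: the class name is hypothetical, a dictionary stands in for the suspended source VM's memory, and a dictionary lookup stands in for a network fetch.

```python
class PostCopyDestination:
    """Toy model of the destination VM's memory under basic post-copy."""

    def __init__(self, source_memory):
        self._source = source_memory  # stand-in for the source node's memory
        self._local = {}              # pages already present at the destination
        self.network_fetches = 0

    def _fault_in(self, page_no):
        # A page is fetched over the "network" only on first access.
        if page_no not in self._local:
            self._local[page_no] = self._source[page_no]
            self.network_fetches += 1

    def read(self, page_no):
        self._fault_in(page_no)
        return self._local[page_no]

    def write(self, page_no, data):
        # Writes also fault the page in first; the dirtied copy then lives
        # only at the destination and never travels back to the source.
        self._fault_in(page_no)
        self._local[page_no] = data


source = {0: b"a", 1: b"b", 2: b"c"}
dst = PostCopyDestination(source)
dst.read(0)
dst.read(0)          # second access is local, no new fetch
dst.write(1, b"x")   # dirtying happens at the destination
print(dst.network_fetches)  # 2
print(source[1])            # b'b'
```

Each page crosses the network at most once, which is the property the text attributes to post-copy, and the source's copy of a page dirtied at the destination remains stale.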

Suitability for VM Cloning

The key difference between VM Migration and VM Cloning is that the former switches computation from one host to the other, whereas the latter must end up with computation running on both hosts. Figure 3.2 contrasts the pre-copy and post-copy procedures’ timelines. From a performance standpoint, post-copy hinders the VM’s performance the most because pages that trigger a page fault must be fetched over the network, which takes on the order of milliseconds as opposed to the nanoseconds that local DRAM accesses take. On the other hand, the total amount of data transferred is larger in the case of pre-copy due to its iterative nature; post-copy transfers each memory page only once because page dirtying takes place on the destination side. Consequently, it is hard to decide between pre-copy and post-copy solely based on performance.

VM Cloning puts two new dimensions on the table: secondary storage consistency and networking conflicts. At the end of VM Cloning, both VMs must have internet connectivity and be assigned a secondary storage device that coincides with their operating system’s view of the filesystem. The easiest way to tackle connectivity in VM Cloning is to leave the original VM’s network configuration untouched and have the clone VM drop all ongoing sessions and assume a different IP+MAC address set. Under the pre-copy scheme, the source VM remains active while its memory pages are transferred, so connectivity is not an issue. Meanwhile, the post-copy scheme initially pauses the source VM and starts the clone VM, requiring an immediate update in the LAN’s packet forwarding rules (through a gratuitous ARP reply) to preserve connectivity and keep TCP sessions up and running on the clone VM. At the end, the source VM will have to assume a new network identity, given that its clone will have taken its place inside the network. Of course, this is burdensome because packet forwarding rules have to be forcefully updated to redirect TCP/IP packets to the clone VM even though the source VM is to come alive once again. Nevertheless, this inconvenience might be justified by the guarantee that post-copy provides in terms of network efficiency (i.e., each memory page needs to be transferred only once).

Figure 3.2: The timeline of (a) pre-copy vs (b) post-copy migration. Taken from [25].

Secondary storage consistency is the defining criterion when evaluating suitability in the VM Cloning scenario. When employing pre-copy, page dirtying is happening on the source VM’s side; hence, by the time the clone VM comes alive, both VMs will possess a main memory state that is consistent with the source VM’s secondary storage. Thus, the problem in that case lies in making a copy of that secondary storage for the clone VM to use. Post-copy further complicates things by having the dirtied pages on the clone VM’s side once the process is over, which means that the clone VM’s memory state will be consistent with a secondary storage that does not correspond to the one that existed when the source VM was paused. Consequently, the source VM will travel back in time when it is resumed and will require a copy of the secondary storage made at the beginning of the cloning process. This is clearly suboptimal, as it will force some computations that have already been carried out by the clone VM to be executed once more inside the source VM. It may also trigger system clock issues inside the source VM as a result of having paused its Operating System for an extended period of time (several seconds), which is a problem when maintaining application logs.

Simply updating the clock time is not a viable solution, as it will lead to time-based jobs (e.g., cron jobs on UNIX) being skipped. In addition, mechanisms to speed up the clock to get to the correct time (e.g., ntpdate on UNIX) require a reliable NTP server or Hypervisor support (e.g., VMware Tools) and are OS-dependent.

Taking all factors into account, the pre-copy page transfer algorithm seems to fit better with the VM Cloning scenario than its counterpart. For this reason, Pwnetizer will extend pre-copy rather than post-copy in order to materialize full VM Cloning with negligible downtime.
