7.3 Virtualization and Emulation Mechanisms
7.3.1 Virtual Machine Based Replay
Virtual machine mechanisms [15; 10; 33] may allow replay on a host environment different from the one used for recording, but they typically rely on the availability of a complete virtual machine image, including all software code, the entire file system, and additional file snapshots, to resume execution. Not only does this require a large amount of data, but it is also often impractical for bug reproduction, as customers are unlikely to allow application vendors to hold an entire replica of their custom proprietary software. The decoupling provided by partial checkpointing also differs from that of virtual machines, which may decouple replay of an application in a guest operating system from dependencies on the hosting environment, but still require, during replay, the complete virtual machine image with all of the installed binaries used at the time of recording.
The space requirement of virtual machine snapshots would be several orders of magnitude higher than that of partial checkpoints. Since VM snapshotting involves checkpointing the state of the entire operating system and its applications, including the state of secondary storage, each snapshot contains a large amount of data and can take several tens of seconds or even minutes to complete. Crosscut [11] aims to reduce this size by extracting a subset of the data offline from a complete recording of a VM. However, it still requires heavyweight instrumentation during recording, and the original log it generates is large. In contrast, since vPlay captures only the most relevant application-level state, it is able to take several partial checkpoints of the application per second. The high checkpoint frequency also allows quick forward and backward movement through the execution during replay. Furthermore, virtual machine based logging imposes high runtime overhead because of the large number of low-level hardware events it must handle. For instance, only a fraction of the network traffic processed by a virtual network card would be visible to the application and consumed by it, yet logging at the virtual machine level has to deal with all of that traffic.
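The link between checkpoint frequency and quick movement through a recording can be illustrated with a small sketch. It is not vPlay's implementation; the function names and the millisecond timestamps are hypothetical, and it assumes only that seeking to a point in the recording means restoring the latest checkpoint taken at or before that point and re-executing forward from there.

```python
from bisect import bisect_right


def nearest_checkpoint(checkpoint_times, target):
    """Return the index of the latest checkpoint taken at or before target.

    checkpoint_times is a sorted list of timestamps (hypothetical unit:
    milliseconds since the start of the recording).
    """
    i = bisect_right(checkpoint_times, target) - 1
    if i < 0:
        raise ValueError("target precedes the first checkpoint")
    return i


def seek_cost(checkpoint_times, target):
    """Amount of recorded execution to re-execute in order to reach target."""
    i = nearest_checkpoint(checkpoint_times, target)
    return target - checkpoint_times[i]


# Four checkpoints per second versus one checkpoint per minute.
frequent = [0, 250, 500, 750, 1000]
sparse = [0, 60000]

print(seek_cost(frequent, 600))   # re-execute from the checkpoint at 500
print(seek_cost(sparse, 59000))   # re-execute from the checkpoint at 0
```

Under this assumption, both forward and backward jumps reduce to the same operation, and the re-execution needed for any jump is bounded by the interval between consecutive checkpoints.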
Chapter 8
Applications and Extensions
8.1 Debugging as a Service
Cloud computing is an emerging service paradigm in which managed virtual assets are offered to users as a service by the cloud service provider. The assets typically consist of a preconfigured operating system or application platform packaged as instances of virtual machine appliances. Since the customer does not have to acquire and manage dedicated resources, businesses use cloud resources to simplify their infrastructure by outsourcing their operations to the cloud provider.
Minimizing application downtime is a common objective for cloud providers, end customers, and software vendors alike, yet problem determination in a cloud environment remains an elusive and time-consuming process. As legacy applications adapt to the incipient cloud environments, whose response characteristics differ significantly from those of physical hardware, many latent application bugs surface. A cloud environment hosts several virtual machines with over-committed memory and CPU resources on the same physical hardware in a multi-tenant configuration. This sharing of resources leaves applications vulnerable to interference from other virtual machines, which can trigger unexpected behavior.
Most existing debugging tools are designed to be used by developers in a development environment and are unsuitable for a cloud ecosystem. When an application fails in a managed cloud environment, the user has little access to, or control over, the underlying environment. Since traditional debugging processes cannot be applied, the user has to depend on the cloud itself for assistance.
While the cloud computing model presents certain challenges to applications that are trying to adapt to the cloud, it also provides an excellent substrate for experimental and innovative services. Features and services that involve multiple software components and require custom changes and configuration of the software stack can be easily deployed on the cloud in a contained manner. For example, one of the hurdles to widespread adoption of vPlay is that the vPlay system needs to be installed both on the platform where the application runs in production and on the target platform where the developer analyzes and resolves the problem. Sophisticated debugging tools such as vPlay, with a rich feature set and support for a wide range of unmodified applications, often rely on specific kernel extensions to gain access to application internals, and these extensions may not be supported by mainstream kernels. Deploying such extensions into existing environments is challenging in practice. Any kernel extension deployed as a kernel module on a host platform may violate the service agreement with the operating system distribution vendor, which raises the adoption barrier.
In this section, we present a practical debugging framework and model, using emerging cloud computing ecosystems as a reference. The framework consists of two components: a recording appliance and a replay appliance. The recording appliance is part of the cloud infrastructure and produces a recording of the bugs encountered by an application in the cloud. The replay appliance is provided as a simple hardware device that reads previously generated recordings and reproduces the bugs captured within them for the developer to analyze.
The solution benefits end users by minimizing application downtime, application vendors by enabling them to fix problems quickly, and cloud providers by enabling them to offer value-added debugging services.