3.3 Partial Checkpointing in User Space
3.3.1 Instrumentation Framework
vPlay-user uses an instrumentation architecture based on a small change to ptrace functionality that allows a process to set itself as its own ptrace parent. Under normal ptrace semantics, a process that attempts to attach to itself as a debugger receives an error. It is possible to attach to the application from an external process instead, but ptrace was originally designed for debugging, and its runtime overhead in that configuration is unacceptable for common-case use. The simple extension provided by vPlay-user allows a process to be notified of events that it generates itself, in addition to any external process, such as a debugger, that may have registered to receive those notifications. In the conventional debugging paradigm, a separate debugger process controls the execution of the debugged process. The kernel notifies the debugger through a SIGCHLD signal whenever the debugged application encounters events such as system calls, receipt of a signal, or process completion. The debugger is allowed to take necessary actions before the event is processed. In such a model, each application event generates several context switches between the debugger and the debugged application, resulting in high overhead. The ptrace extension implemented by vPlay-user avoids this context switching overhead by allowing the application itself to be the debugger. The challenge, however, is to transparently embed code into the process address space to handle the debugging events posted by the kernel. The rest of this section describes the architecture that permits this operation.
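The restriction that motivates the extension can be observed on a stock kernel. The following is a minimal sketch, not part of vPlay-user; the helper name try_self_attach is hypothetical. It asks unmodified ptrace to attach the calling process to itself and reports whether the kernel accepted the request.

```c
#include <errno.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <unistd.h>

/* Ask stock ptrace to make the calling process its own tracer.
 * Returns 0 if the kernel accepted the request, or -1 (with errno set
 * by ptrace) if it refused. */
int try_self_attach(void)
{
    errno = 0;
    long rc = ptrace(PTRACE_ATTACH, getpid(), (void *)0, (void *)0);
    return rc == 0 ? 0 : -1;
}
```

On an unmodified Linux kernel the call is expected to fail, which is the behavior the vPlay-user extension relaxes.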
Figure 3.3: User space instrumentation via vPlay-user agent
The application is started by an initial process called the vPlay-user agent. The vPlay-user agent is implemented as a self-contained, statically linked application program whose load address is chosen to be in an address range not commonly used by applications. On Linux/x86, we have chosen the address range 0x08000000 - 0x08031000 to load the vPlay-user agent, since common Linux/x86 applications don’t use addresses below 0x08048000. As a part of its initialization, the agent maps a region of memory with the MAP_SHARED attribute. The agent exclusively uses this region to hold its internal data structures. Any state created by the agent residing in one process can be accessed by its counterparts in other processes via the shared memory region. Agents in different processes or threads of the application communicate through the shared memory region, arbitrated through futex-based synchronization.
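As an illustration of a MAP_SHARED region arbitrated by a futex, the sketch below maps a small shared structure and guards it with a very simple lock. The structure layout, the field names, and the functions agent_shared_init, agent_lock and agent_unlock are assumptions made for this example; the text does not describe the agent's actual data structures or locking protocol.

```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical layout of the agent's shared state. */
struct agent_shared {
    atomic_int lock;       /* 0 = free, 1 = held */
    long events_recorded;  /* example field protected by the lock */
};

/* Map one shared anonymous region. Forked children inherit it at the
 * same address, and anonymous mappings start out zero-filled. */
struct agent_shared *agent_shared_init(void)
{
    void *p = mmap(NULL, sizeof(struct agent_shared), PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

static long futex(atomic_int *addr, int op, int val)
{
    return syscall(SYS_futex, addr, op, val, NULL, NULL, 0);
}

void agent_lock(struct agent_shared *s)
{
    int expected = 0;
    while (!atomic_compare_exchange_strong(&s->lock, &expected, 1)) {
        futex(&s->lock, FUTEX_WAIT, 1);  /* sleep only while lock == 1 */
        expected = 0;
    }
}

void agent_unlock(struct agent_shared *s)
{
    atomic_store(&s->lock, 0);
    futex(&s->lock, FUTEX_WAKE, 1);      /* wake one waiter, if any */
}
```

The lock always issues a wake on unlock, which is simple rather than fast. The non-private futex operations are used deliberately so that waiters in other processes sharing the mapping can be woken.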
The agent starts the application by creating a child process and directly mapping the memory regions and segments described by the application’s binary into memory. Loading of an application binary is typically performed by the kernel as a part of the exec system call. vPlay-user agent, however, implements the exec operation in user space. The goal in doing so is to retain vPlay-user agent’s own memory regions within the application’s address space as new memory regions of the application are added. Invoking the standard exec system call would cause all existing memory regions along with the agent to be unmapped and replaced with new memory regions specified in the application binary. Performing exec in user space allows vPlay-user agent’s interception routines to be embedded within the application’s address space.
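The first step of a user-space exec is locating the loadable segments described by the binary. The sketch below covers only that step: it walks the ELF program headers and counts the PT_LOAD entries that a loader would then have to map. It assumes glibc's ElfW macro and a binary of the native ELF class; the function name count_load_segments is hypothetical, and the actual mapping, stack setup and jump to the start address are not shown.

```c
#include <elf.h>
#include <link.h>    /* ElfW(): selects Elf32_* or Elf64_* for this build */
#include <stdio.h>
#include <string.h>

/* Count the PT_LOAD program headers of an ELF file.
 * Returns -1 on I/O error or if the file is not a native-class ELF. */
int count_load_segments(const char *path)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;

    ElfW(Ehdr) eh;
    int count = -1;
    if (fread(&eh, sizeof eh, 1, f) == 1 &&
        memcmp(eh.e_ident, ELFMAG, SELFMAG) == 0 &&
        eh.e_phentsize == sizeof(ElfW(Phdr))) {
        count = 0;
        for (unsigned i = 0; i < eh.e_phnum; i++) {
            ElfW(Phdr) ph;
            if (fseek(f, (long)(eh.e_phoff + i * sizeof ph), SEEK_SET) != 0 ||
                fread(&ph, sizeof ph, 1, f) != 1) {
                count = -1;
                break;
            }
            if (ph.p_type == PT_LOAD)
                count++;  /* a loader would map this segment at p_vaddr */
        }
    }
    fclose(f);
    return count;
}
```

For example, count_load_segments("/proc/self/exe") inspects the running program's own binary.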
vPlay-user agent installs a signal handler for the SIGSYS signal before transferring control to the application by jumping to the application’s start address. Once the application takes control, any events posted by the ptrace subsystem are handled by the signal handler, which is a part of the vPlay-user agent. The application is prevented from installing its own handler for SIGSYS: the sigaction family of system calls is intercepted, and requests that target SIGSYS are disallowed. When SIGSYS arrives, the signal handler takes necessary actions such as recording the system call return value and argument data. The application passes the system call arguments in processor registers, which are saved on the signal stack by the kernel when the SIGSYS signal is posted to the application. Within the signal handler context, the agent is able to process the system call arguments by reading and writing to the signal stack. The system call return value can also be altered as desired by modifying the respective register on the signal stack. The agent can virtualize the system call, emulate or nullify it, or process it in any other way it wants by calling the same system call with altered parameters or other system calls, all in user space.
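The way a handler reaches the registers saved on the signal stack can be sketched with the standard SA_SIGINFO interface. This example is an approximation under stated assumptions: a stock kernel has no vPlay-user ptrace extension, so the signal is posted with tgkill as a stand-in, and the register rewrite is shown for x86-64 only. The names agent_sigsys_handler, agent_install_sigsys_handler and post_sigsys_to_self are hypothetical.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <string.h>
#include <sys/syscall.h>
#include <ucontext.h>
#include <unistd.h>

volatile sig_atomic_t seen_signo;   /* last signal the handler observed */

/* Runs on the signal stack; ctx points at the register state the kernel
 * saved there when it posted the signal. */
static void agent_sigsys_handler(int signo, siginfo_t *info, void *ctx)
{
    ucontext_t *uc = ctx;
    (void)info;
    seen_signo = signo;
#if defined(__x86_64__)
    /* x86-64 only: overwrite the saved RAX so that the interrupted system
     * call appears to return 42 once the handler returns. */
    uc->uc_mcontext.gregs[REG_RAX] = 42;
#else
    (void)uc;
#endif
}

int agent_install_sigsys_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = agent_sigsys_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSYS, &sa, NULL);
}

/* Stand-in for the kernel extension: deliver SIGSYS to the calling thread. */
long post_sigsys_to_self(void)
{
    return syscall(SYS_tgkill, getpid(), syscall(SYS_gettid), SIGSYS);
}
```

Because the signal is delivered on the way out of the tgkill system call, the value written into the saved register is what the caller of post_sigsys_to_self would be expected to see as the return value on x86-64. Arguments can be read from the same saved register array in the same way.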
ptrace notifications are disabled whenever the agent runs its own code. A flag in the ptrace extension indicates whether to post system call events to the application. When a monitored application thread makes a system call, the ptrace extension first resets the flag and sends a SIGSYS signal to the thread. Since the notifications are disabled, any system calls made by the agent code that runs as a part of the SIGSYS handler would not cause further signals to be issued. After the agent completes its processing, it re-enables the notifications and returns control to the application.
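The flag protocol described above can be modeled in a few lines. This is a user-space model of the described behavior, not kernel code from vPlay-user; the structure notify_state and the functions on_syscall and agent_done are hypothetical names.

```c
#include <stdbool.h>

/* Hypothetical per-thread state kept by the ptrace extension. */
struct notify_state {
    bool enabled;      /* post system call events to the thread? */
    unsigned signals;  /* number of SIGSYS signals posted so far */
};

/* Modeled at every system call entry. Returns true if SIGSYS is posted. */
bool on_syscall(struct notify_state *st)
{
    if (!st->enabled)
        return false;     /* agent code is running: stay silent */
    st->enabled = false;  /* reset the flag before posting the signal */
    st->signals++;
    return true;
}

/* Called by the agent just before it returns control to the application. */
void agent_done(struct notify_state *st)
{
    st->enabled = true;
}
```

In this model, one application system call followed by any number of agent system calls produces exactly one signal until agent_done re-enables notifications.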
The memory region occupied by the vPlay-user agent is marked read-only while the application code runs so that potentially buggy application code does not accidentally corrupt the agent’s memory. Any attempts by the application to change the region’s permissions are disallowed by intercepting the mprotect system call, which is the only user interface to change memory permissions. The permissions for the agent’s memory region are changed to read-write by the ptrace kernel extension before posting the SIGSYS signal, and the agent changes them back to read-only before returning control to the application code. This mechanism ensures that the agent retains secure control over the application’s system calls even though it is a part of the application’s user address space.
vPlay-user agent emulates the sigreturn system call, which is normally called implicitly at the end of the signal handler. At the end of system call processing, control is transferred back to the application by returning from the signal handler. Normally, the signal stack is set up such that the application automatically calls the sigreturn system call when it returns from the signal handler. However, that would trigger another debug event, resulting in unbounded recursion. To avoid the problem, vPlay-user adjusts the signal stack to remove the implicit call to sigreturn and instead performs the sigreturn operation directly in user space. This involves loading the processor registers saved on the signal stack and jumping directly to the application code.
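The idea of reloading a saved register set from user code, instead of asking the kernel to do it through sigreturn, can be illustrated with the getcontext and setcontext library calls. This is only an analogue chosen for the example: the agent's own register-loading code is not shown in the text, and the function name reload_saved_context is hypothetical.

```c
#include <ucontext.h>

/* Save the current register context, then reload it once from user code.
 * Returns how many times execution passed the save point. */
int reload_saved_context(void)
{
    ucontext_t saved;
    volatile int passes = 0;   /* volatile: must survive the context reload */

    getcontext(&saved);        /* execution resumes here after setcontext() */
    passes++;
    if (passes == 1)
        setcontext(&saved);    /* load the saved registers and jump */
    return passes;
}
```

The function passes the save point twice: once after the initial getcontext and once after setcontext reloads the saved registers.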
The initial process started by the agent and the entire process hierarchy rooted at the initial process automatically inherit the instrumentation. The agent’s code and the shared memory region are automatically mapped into any children of the agent program and their successors at the same address, as a part of the fork system call. The signal handler state is also inherited by child processes and threads, so any signals received by child threads are directed to the respective agent stubs mapped at the same address in all the processes. Events generated by child threads are posted to the respective threads and handled by the copy of the vPlay-user agent embedded within the host process of each thread.
vPlay-user also installs signal handlers for other signals that indicate error conditions, such as SIGSEGV and SIGBUS. This allows vPlay-user to intercept exceptions caused by application failure, such as a segmentation violation or divide by zero, and cause checkpoints to be written to disk. If the application itself attempts to install a signal handler for one of these signals, the corresponding system call is intercepted and the function pointer of the application’s signal handler is saved separately to be called later.
vPlay-user agent can be controlled by sending signals to the target application. vPlay-user registers a signal handler for a reserved user signal to enable an external process to communicate with the agent. To control the behavior of the agent or direct it to perform an action, the user can simply send the reserved user signal to the respective thread. In particular, this allows an external process to periodically send the start and stop commands to enforce periodic partial checkpointing, and to stop recording and write the checkpoints at any time based on an external fault detection system. Since the reserved signal sent to the application is first processed by the embedded vPlay-user agent, the agent is able to take necessary action before calling the original signal handler that the user may have installed. When the application is not being recorded or replayed, vPlay-user agent silently forwards all system calls and signals to the application. When the application fails due to an exception such as SIGSEGV, any data recorded until that point is saved to disk.
The basic start and stop primitives that control partial checkpointing are implemented by the vPlay-user agent. A recording interval commences with an external process sending all application threads a reserved signal to have them reach a barrier. At the barrier, the agent first records the current processor context of the thread. Register context is obtained from the signal stack, and descriptor entries used by the thread are obtained through the API provided by the operating system. On Linux, get_thread_area is used to read the GDT and modify_ldt is used to read the LDT.
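Reading the LDT through modify_ldt can be sketched as follows. glibc provides no wrapper for this call, so the sketch invokes it through syscall(); it covers only the LDT read and leaves out the get_thread_area side. The function name agent_read_ldt and the constant LDT_ENTRY_BYTES are hypothetical, and the call may be unavailable on non-x86 builds or blocked in sandboxed environments.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

#define LDT_ENTRY_BYTES 8   /* size of one x86 segment descriptor */

/* Copy the calling process's LDT into buf.
 * Returns the number of bytes read (0 when no LDT is installed),
 * or -1 if the call fails or is not available on this architecture. */
long agent_read_ldt(void *buf, size_t len)
{
#ifdef SYS_modify_ldt
    return syscall(SYS_modify_ldt, 0 /* func 0: read the LDT */, buf, len);
#else
    (void)buf;
    (void)len;
    return -1;              /* not an x86 build */
#endif
}
```

A process that never installed an LDT entry would be expected to get 0 back, in which case there is no descriptor state to record for it.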
Like the kernel implementation, when an application is recorded, each thread within the application undergoes recording. Every input that crosses the application boundary is intercepted and recorded. Each thread in the application records its private processor state, and one thread per process records the common memory state. The partial checkpoints generated are stored in separate buffers by the vPlay-user agent within the respective processes, and written to disk on request or when a failure is detected.
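The buffer-then-flush behavior can be sketched with a fixed-size per-process buffer. The record format, the buffer size and the names ckpt_buffer, ckpt_append and ckpt_flush are assumptions for this example; the text does not specify how vPlay-user lays out its partial checkpoints.

```c
#include <stdio.h>
#include <string.h>

#define CKPT_BUF_BYTES 4096   /* hypothetical per-process buffer size */

struct ckpt_buffer {
    unsigned char data[CKPT_BUF_BYTES];
    size_t used;
};

/* Append one recorded input. Returns 0 on success, -1 if it does not fit. */
int ckpt_append(struct ckpt_buffer *b, const void *rec, size_t len)
{
    if (len > CKPT_BUF_BYTES - b->used)
        return -1;
    memcpy(b->data + b->used, rec, len);
    b->used += len;
    return 0;
}

/* Write the buffered bytes to path and reset the buffer.
 * Returns the number of bytes written, or -1 on I/O error. */
long ckpt_flush(struct ckpt_buffer *b, const char *path)
{
    FILE *f = fopen(path, "wb");
    if (!f)
        return -1;
    size_t n = fwrite(b->data, 1, b->used, f);
    int close_rc = fclose(f);
    if (n != b->used || close_rc != 0)
        return -1;
    b->used = 0;
    return (long)n;
}
```

In this arrangement ckpt_flush would be called either when an external request arrives or from the fault path, so that whatever was recorded up to the failure reaches disk.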