Breakpoint-Based Fault Injector - Data-Type-Aware Fault Injection on Multiple Computer Systems

Chapter 4. Data-Type-Aware Fault Injection on Multiple Computer Systems

4.3. Tool

4.3.2. Breakpoint-Based Fault Injector

Both the OS and application software maintain large runtime states for various types of data. Different types of data are likely to show different failure behaviors when the data are corrupted by hardware faults. It is thus necessary to classify fault injection results as a function of corrupted data type, except for types of data that are negligibly small from a statistical point of view. All EFI fault injectors are designed to support data-type-aware fault injection; the exact set of supported data types depends on the type of fault injector.

The EFI breakpoint-based fault injector can inject faults into OS and user programs on commodity CPUs. It does so with the breakpoint-based fault injector module, which resides in the OS kernel. When the OS kernel is hosted by a hypervisor, the same fault injector module can be used, although the performance overhead of a fault injection operation varies depending on how the underlying hardware- or software-breakpoint mechanism is virtualized. This implies that EFI would have different performance overheads on type-I and type-II hypervisors (e.g., hardware-assisted virtualization vs. binary translation).

Footnote 15. We use the Jython programming language for plugins in our current implementation. Jython has a simple syntax (similar to the widely used Python scripting language), and its programs are dynamically linkable to the EFI controller software written in Java. Thus, the plugin programming interface call is implemented as a procedure call from the Jython plugin script to the externally exposed Java methods of the controller software. The Jython and Python projects are at http://jython.org and http://python.org, respectively.

Footnote 16. The proc file system (procfs) is a special-purpose file system in UNIX-variant OSes that dynamically provides information about resources managed by the OS (e.g., hardware, kernel, and user process resources). Each procfs is organized as a conventional hierarchical file system structure and is typically mapped under the /proc directory of the root file system.

Breakpoint-based fault injection uses a hardware- or software-breakpoint mechanism to obtain control of a target system when a fault injection target is being accessed. Users can configure the fault injector module by using shell commands or file I/O operations (e.g., the proc file system interface in Linux; see footnote 16). The configuration command includes the target process identifier, breakpoint address, injection target type, injection target address, and error bitmask. The breakpoint-based fault injector module in an injector node (see Figure 4.2) sets a breakpoint on a specific virtual address. If the breakpoint is triggered, the breakpoint handler emulates a soft error in a target system state. Injections into the following data types are supported for both kernel- and user-level software.

(i) Processor register. The breakpoint-based fault injector can inject faults into general-purpose control and data registers as well as special-purpose registers in processors. A breakpoint is set on an instruction in the code memory of the virtual address space of a target process (or of any process if the target is the OS kernel; for example, the Linux kernel resides in the fourth gigabyte of the virtual address space, and the kernel space is shared by all processes in the system). We use either a hardware breakpoint feature of the processor or a software-breakpoint mechanism. A software breakpoint is implemented through dynamic rewriting of the OS kernel code for kernel-level injection, or through the use of ptrace system calls (see footnote 17) for user-level injection. The fault injector module uses the context of the target OS kernel or user process, which is saved at the entry of every interrupt handling event. Such context information (e.g., general-purpose register values) is stored in the kernel stack or the process control block (PCB) of the preempted process, depending on the OS implementation. The breakpoint handler of the fault injector module emulates an error by modifying the target register value saved in the stack or the PCB. The corrupted context is restored to the processor hardware registers just before returning from the breakpoint handler to the preempted target process. Thus, when the process resumes, the corrupted register value is visible to, and can be used by, the target process. If the fault injection target is one of certain special-purpose registers (e.g., an MMX register in x86), the breakpoint handler modifies the target register value directly, because many special-purpose registers are neither saved at the entry of an interrupt handler nor modified inside an interrupt handler.

(ii) Memory data. In memory, the supported injection targets are the code, static data, dynamic data, and stack memory segments. A hardware breakpoint is set on the virtual address of the target data and is triggered before the target address is accessed for a read or a write. The breakpoint handler changes the target memory value (e.g., by using a provided error bitmask) in order to emulate the effect of a fault. If the target is part of a statically allocated memory space, the symbol information in the binary of the target program or in the image of the OS kernel is used to identify the actual type of data or instruction stored at the target virtual address.
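The two injection paths above, corrupting a saved register value and corrupting a memory word, can be sketched in user-space Python. This is only an illustration under stated assumptions: the text does not say how the error bitmask is applied, so the sketch assumes a bitwise XOR (bit flips) on 32-bit values, and `SavedContext`, `inject_register`, and `inject_memory` are hypothetical names rather than parts of EFI.

```python
from dataclasses import dataclass, field

WORD_MASK = 0xFFFFFFFF  # assumption: 32-bit registers and memory words


@dataclass
class SavedContext:
    """Hypothetical stand-in for the context saved on interrupt entry
    (kernel stack or PCB, depending on the OS)."""
    registers: dict = field(default_factory=dict)


def inject_register(ctx: SavedContext, name: str, bitmask: int) -> int:
    """Flip the bits selected by `bitmask` in the saved copy of a register.

    In a real handler, the saved context is restored to the hardware
    registers on return, which makes the corruption visible to the target.
    """
    corrupted = (ctx.registers[name] ^ bitmask) & WORD_MASK
    ctx.registers[name] = corrupted
    return corrupted


def inject_memory(memory: bytearray, offset: int, bitmask: int) -> int:
    """Flip the bits selected by `bitmask` in one little-endian 32-bit word."""
    word = int.from_bytes(memory[offset:offset + 4], "little")
    corrupted = (word ^ bitmask) & WORD_MASK
    memory[offset:offset + 4] = corrupted.to_bytes(4, "little")
    return corrupted
```

With XOR, a bitmask with a single bit set models a single-bit flip, and applying the same bitmask twice restores the original value, which can be convenient when an experiment has to be undone.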

Footnote 17. ptrace (Process Trace) is a system call that allows the calling process not only to control and monitor the execution of another process but also to inspect and manipulate that process's internal state. It is available in most UNIX-variant OSes, including Linux, because it provides the key mechanism for software debuggers.
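The configuration command described in this section carries five fields: the target process identifier, breakpoint address, injection target type, injection target address, and error bitmask. The sketch below shows one way such a command could be encoded and decoded. The space-separated text format, the class name `InjectionConfig`, and the target type strings are assumptions made for illustration; the actual EFI command syntax is not given here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InjectionConfig:
    """Hypothetical container for the five configuration fields."""
    pid: int              # target process identifier
    breakpoint_addr: int  # virtual address where the breakpoint is set
    target_type: str      # assumed labels, e.g. "register" or "memory"
    target_addr: int      # address (or register index) to corrupt
    error_bitmask: int    # bits to corrupt in the target

    def to_command(self) -> str:
        """Encode the fields as one space-separated line (assumed format)."""
        return (f"{self.pid} {self.breakpoint_addr:#x} {self.target_type} "
                f"{self.target_addr:#x} {self.error_bitmask:#x}")


def parse_command(line: str) -> InjectionConfig:
    """Decode a line produced by InjectionConfig.to_command()."""
    pid, bp, target_type, target_addr, mask = line.split()
    return InjectionConfig(int(pid), int(bp, 16), target_type,
                           int(target_addr, 16), int(mask, 16))
```

In a deployment along the lines described above, a string like this would be written to the module's procfs entry through ordinary file I/O; the path of that entry is not specified here, so it is left out of the sketch.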

4.4. Measurement Method

In realizing data-type-aware fault injection, the main technical difficulty comes from injection targets that reside in dynamically allocated resources. We present a data-type-aware fault injection technique for dynamic memory data. This technique consists of (i) the object tracker module (see Figure 4.2), which tracks dynamic memory objects and translates a symbolic identifier (which specifies the type of a memory object) to a virtual address, and (ii) the profiler module, which monitors properties of the dynamic memory regions (e.g., size and read/write ratio).
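The roles of the two modules can be illustrated with a small Python sketch. It is a simplified model, not the EFI implementation: the class and method names are hypothetical, the symbolic identifier is assumed to be a type name plus an index into the live objects of that type, and the allocation and access events are assumed to be reported by some external hook.

```python
from collections import defaultdict


class ObjectTracker:
    """Tracks live dynamic objects per type (hypothetical interface)."""

    def __init__(self):
        self._live = defaultdict(dict)  # type name -> {address: size}

    def on_alloc(self, type_name, address, size):
        self._live[type_name][address] = size

    def on_free(self, type_name, address):
        self._live[type_name].pop(address, None)

    def resolve(self, type_name, index=0):
        """Translate (type name, index) to the virtual address of the
        index-th live object of that type, ordered by address.
        Returns None if no such object exists."""
        addresses = sorted(self._live.get(type_name, {}))
        return addresses[index] if 0 <= index < len(addresses) else None


class Profiler:
    """Counts reads and writes per object type (hypothetical interface)."""

    def __init__(self):
        self._reads = defaultdict(int)
        self._writes = defaultdict(int)

    def on_access(self, type_name, is_write):
        if is_write:
            self._writes[type_name] += 1
        else:
            self._reads[type_name] += 1

    def read_write_ratio(self, type_name):
        """Reads per write, or None if no write has been observed."""
        writes = self._writes[type_name]
        return self._reads[type_name] / writes if writes else None
```

An injection campaign could then ask the tracker for the address of a live object of a given type and pass that address to the breakpoint-based injector as the injection target address.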
