Fault Injection Design and Implementation

4 Software Fault Injector

4.3 Fault Injection Design and Implementation

The fault injector was implemented on an Intel Pentium IV system running Red Hat Linux 7.3 (kernel version 2.4.18-3). It has also been tested with Red Hat Linux 9 (kernel version 2.4.20-8) and Ubuntu 10.04 (kernel version 2.6.32-31). The algorithm responsible for dynamically linking the fault injector with the OS kernel was implemented using Linux Loadable Kernel Modules (LKMs) [7].

The DBench-FI fault injector is based on common characteristics and concepts of modern preemptive multitasking operating systems, which explains its high level of portability, not found in other SWIFI tools. For reasons explained below, two mechanisms of modern operating systems are of particular importance in the methodology used by DBench-FI: the memory management mechanism, under which every process running on the system is viewed as having its own memory address space, and the process management mechanism, which implements the abstraction of multiple processes seemingly running simultaneously, even on systems with a single processor. A thorough description of the components and mechanisms of the Linux kernel is given in [Mauerer 2008, Kerrisk 2010, Love 2010].

[7] Loadable Kernel Modules allow a running operating system kernel to be dynamically extended, increasing its flexibility concerning the addition of new hardware support or functionality. They are usually used by device drivers and filesystems. Currently, most modern Unix-like operating systems, such as Solaris, Linux and FreeBSD, use or support LKMs.

It is worth pointing out that in Linux, as in all monolithic architectures, the operating system functionality is concentrated within the kernel. Linux is considered essentially monolithic [8]: it is packed in a single, large binary image that includes all its subsystems (process management, memory management, file systems, etc.) and runs in a single address space [9]. At the same time, the Linux kernel is also modular, as it supports the dynamic insertion and removal of code at runtime, thus compensating for some of the known disadvantages of monolithic kernels [10]. As a consequence, the Linux kernel is not considered a pure monolithic kernel, as it incorporates both monolithic and microkernel ideas.

[8] Although the Linux kernel incorporates both monolithic and microkernel ideas, it was originally developed according to the monolithic paradigm in order to avoid the need to develop a message-passing mechanism and a module-loading architecture, thereby accelerating the delivery of a ready-to-run and fully operational OS [Maxwell 2002].

[9] The great majority of commercial Unix variants are monolithic. The most notable exceptions are Carnegie Mellon's Mach 3.0 and other Unix-like systems based on this microkernel, such as Mac OS X and the GNU Hurd, which follow a microkernel approach [Bovet et al. 2005].

[10] The supporters of monolithic kernels point to the greater efficiency and performance of a design in which everything runs in a single address space, when compared to the overhead caused by the message-passing mechanisms that must exist between the various processes of a microkernel. On the other hand, microkernel supporters claim that microkernels force system programmers to use “clean” and modularized programming approaches, which makes new system modules easier to develop. Other benefits of the microkernel architecture are the dynamic extensibility of the kernel and the ability to swap kernel components at runtime and, consequently, a more efficient use of system memory, since modules are loaded only when they are actually required. These characteristics support the greater flexibility, portability and maintainability of microkernel designs when compared to the monolithic variants.

The kernel function responsible for deciding the next executable task to be dispatched to the CPU, known as schedule, assumes a special role in the design of DBench-FI. The schedule function is called in the following circumstances: (i) a task yields the processor; (ii) a task blocks in an I/O operation; (iii) a task uses up its time slice (quantum); or (iv) a task is preempted by another task with higher priority [11]. Figure 4-2 gives a common view of the Linux kernel architecture, focusing on the interaction between applications, scheduler and hardware.

[11] Although the Linux kernel is preemptive (user-mode processes may always be interrupted), there are some critical regions of the kernel that cannot be preempted by the scheduler until their execution ends. For this reason the Linux kernel is said to provide soft real-time behavior (it tries to schedule applications within timing deadlines, although it may not always succeed). Usually, fully preemptive kernels are associated with hard real-time operating systems, since they ensure compliance with very stringent timing requirements for scheduling.

Another important characteristic for the design and implementation of DBench-FI is the Linux memory management system, which is designed to be architecture independent. Like any modern multitasking operating system, the Linux kernel provides memory protection mechanisms (vital to system stability), which prevent a user process from illegitimately accessing a memory area that belongs to another user process or to the kernel itself. Moreover, any user process running on the target system is regarded as having its own virtual memory address space [12], which includes its code, data and stack areas. A representation of a user process address space in Linux is shown in Figure 4-3. It is worth noting that the kernel is mapped in the address space of every process, in the top area of its memory address space (from TASK_SIZE [13] to 2^32 or 2^64, in IA-32 or IA-64 systems, respectively).

[12] Virtual memory refers to the practice of lying to processes about the real (physical) addresses at which they reside. Each user process is given the illusion that its address space always starts at 0 and extends from there. It is worth noting that some purists differentiate the concept of virtual memory from the notion of “disk-as-memory”. In fact, although virtual memory is usually associated with swapping and paging techniques, it can, sensu stricto, be differentiated from them (the latter refer to the OS ability to blend primary and secondary storage, allowing processes to use all of their memory as if it were always available): an OS can give each process a logical address space without making any association between primary and secondary storage [Maxwell 2002].

[13] In Linux, every user process has its own virtual address space ranging from 0 to TASK_SIZE (an architecture-specific constant defined as a kernel symbol, which represents the maximum size, in bytes, that a user process can access; since the address space always starts at 0, it equals the highest address that a user process can access plus 1). On IA-32 systems, for instance, TASK_SIZE assumes the value of 3 GiB (i.e., 3 × 2^30 bytes).
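The IA-32 figures quoted for TASK_SIZE can be checked with a few lines of arithmetic. The sketch below is only an illustration written in Python; the helper name is_user_address is hypothetical and is not part of the kernel or of DBench-FI.

```python
# Address-space arithmetic for IA-32, using the values quoted in the text.
GIB = 2**30
TASK_SIZE = 3 * GIB              # first address that no longer belongs to user space
ADDRESS_SPACE = 2**32            # size of the 32-bit virtual address space

last_user_address = TASK_SIZE - 1
kernel_bytes = ADDRESS_SPACE - TASK_SIZE


def is_user_address(addr: int) -> bool:
    """Hypothetical helper: True if addr lies in the user part (0 .. TASK_SIZE - 1)."""
    return 0 <= addr < TASK_SIZE


print(hex(TASK_SIZE))            # 0xc0000000
print(hex(last_user_address))    # 0xbfffffff
print(kernel_bytes // GIB)       # 1
```

Under these values, the user part ends at 0xBFFFFFFF and the remaining 1 GiB, up to 0xFFFFFFFF, is the region in which the kernel is mapped.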

Figure 4-2 – The Linux operating system architecture.

[Figure 4-2 labels: Processes/Tasks in user mode, each with Process Stack, Unused Memory, Process Data (Heap), Process Code and Kernel regions; the Scheduler in system mode; Hardware.]

To understand how the fault injector interacts with the memory management functions of the OS kernel, it is important to point out the most significant differences between the mapped regions. The code segment, referred to as Process Code in Figure 4-3, is write-protected and shared by all processes that execute the code it contains. This is a significant difference from the remaining areas (data and stack), which are private to each process and writable. Another fundamental distinction between the code area and the data and stack areas is that the former cannot be dynamically allocated. In fact, a Linux user process can dynamically allocate three types of memory: stack, heap and mmapped memory [14]. A thorough description of the components and mechanisms of the Linux kernel is given in [Mauerer 2008, Kerrisk 2010, Love 2010].

[14] The range of valid virtual addresses of a process can change throughout its lifetime, as the kernel allocates and deallocates memory according to the needs of the process. A process can allocate memory by increasing the size of the heap, that is, by raising the program break (the current limit of the heap) through the brk() and sbrk() calls (upon which the well-known malloc functions are based). A process can also create and free memory mappings in its virtual address space, using the mmap() and munmap() system calls, respectively. The process stack grows and shrinks dynamically as functions are called and return. Special processor registers are used for this purpose, as explained later in this chapter.

As already mentioned, DBench-FI was initially developed for the purpose of injecting faults in the memory address space of a given process. Its first version, presented in [Costa et al. 2003], can inject stuck-at-0, stuck-at-1 and bit-flip faults in the data segment of any user process (as well as in its stack area). Thereafter, the ability to inject faults in the code segment of any process was added, as well as the possibility of injecting faults that assume a user-defined value supplied through a fault information file, as depicted in Figure 4-1. In the context of software fault emulation, this new type of fault, together with the possibility of targeting the code segment of any process, enables the use of more representative fault models. In fact, these improvements made DBench-FI compatible with the state of the art in software fault models, the aforementioned G-SWFIT, presented in [Durães et al. 2006].
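The four fault types named above can be illustrated on a simulated memory image. This is a minimal Python sketch, not the DBench-FI implementation: the function names, the inject helper and the byte values are assumptions made for illustration, and each fault is applied once to a single byte.

```python
# Bit-level fault models applied to one byte of a simulated memory image.
def stuck_at_0(value: int, bit: int) -> int:
    """Force the given bit (0..7) to 0."""
    return value & ~(1 << bit) & 0xFF


def stuck_at_1(value: int, bit: int) -> int:
    """Force the given bit (0..7) to 1."""
    return (value | (1 << bit)) & 0xFF


def bit_flip(value: int, bit: int) -> int:
    """Invert the given bit (0..7)."""
    return (value ^ (1 << bit)) & 0xFF


FAULT_MODELS = {
    "stuck-at-0": stuck_at_0,
    "stuck-at-1": stuck_at_1,
    "bit-flip": bit_flip,
}


def inject(memory: bytearray, offset: int, model: str,
           bit: int = 0, new_value: int = 0) -> int:
    """Apply one fault to memory[offset] and return the original byte."""
    original = memory[offset]
    if model == "user-defined":
        # Value taken, for instance, from a fault information file.
        memory[offset] = new_value & 0xFF
    else:
        memory[offset] = FAULT_MODELS[model](original, bit)
    return original


image = bytearray([0b1111_1111, 0b0000_0000, 0b0000_1000, 0x74])
inject(image, 0, "stuck-at-0", bit=3)
inject(image, 1, "stuck-at-1", bit=3)
inject(image, 2, "bit-flip", bit=3)
# Illustrative code mutation: 0x74 and 0x75 are the x86 short JE and JNE opcodes.
inject(image, 3, "user-defined", new_value=0x75)
print(image.hex())   # f7080075
```

The user-defined case is what makes code-segment mutations possible, since an arbitrary replacement byte cannot in general be expressed as a single stuck-at or bit-flip fault.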

Figure 4-3 – The process virtual address space in IA-32 systems.

[Figure 4-3 labels: user space from 0x00000000 to 0xBFFFFFFF, holding Process Code, Process Data (Heap) ending at the program break, Unused Memory available for mmap, and Process Stack; kernel space from TASK_SIZE to 2^32 (0xFFFFFFFF).]

It is worth noting that, as expected, these latest enhancements did not involve any change in the methodology or in the model of the fault injector. It should also be emphasized that, since the code segment may be shared across multiple processes, faults injected in that area may affect the behavior of all processes that share that region.

As one of the goals of DBench-FI consists of injecting faults in the address space of any process, including the operating system kernel itself, two different solutions were initially considered, both based on a new process running in kernel mode:

• Intercepting the OS scheduler and detecting the target process in order to access its virtual address space. It is worth noting that the virtual address space of a process is only available when that same process is chosen by the schedule function to use the CPU;

• Accessing the memory area of the target process through the lookup of the corresponding page table entries used by the memory management system of the OS. It is worth pointing out that the OS kernel maintains a page table for each process, in order to map the virtual addresses of a process to the corresponding physical addresses.
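To give an idea of what the second alternative involves, the following Python sketch models a page table lookup. It assumes classic IA-32 two-level paging with 4 KiB pages (no PAE), which the text does not specify; the dictionaries are toy stand-ins for the per-process page directory and page tables, and the frame number is invented for the example.

```python
# Toy model of a two-level IA-32 page table walk (assumed: 4 KiB pages, no PAE).
PAGE_SHIFT = 12          # 4 KiB pages: the low 12 bits are the offset
INDEX_BITS = 10          # 1024 entries per page directory and per page table


def split_address(vaddr: int):
    """Split a 32-bit virtual address into (directory index, table index, offset)."""
    directory = vaddr >> (PAGE_SHIFT + INDEX_BITS)
    table = (vaddr >> PAGE_SHIFT) & ((1 << INDEX_BITS) - 1)
    offset = vaddr & ((1 << PAGE_SHIFT) - 1)
    return directory, table, offset


def translate(page_directory: dict, vaddr: int):
    """Return the physical address mapped to vaddr, or None if it is unmapped."""
    directory, table, offset = split_address(vaddr)
    page_table = page_directory.get(directory)
    if page_table is None or table not in page_table:
        return None
    return (page_table[table] << PAGE_SHIFT) | offset


# One mapped page near the top of user space; frame number 0x1234 is made up.
page_directory = {0x2FF: {0x3FF: 0x1234}}
print(hex(translate(page_directory, 0xBFFFF123)))   # 0x1234123
print(translate(page_directory, 0x08048000))        # None
```

Even in this simplified form, the lookup depends on the paging layout of the processor, which gives some intuition for why this route is tied more closely to the architecture than the scheduler interception.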

Clarity, elegance and portability justified the choice of the first solution, the interception of the OS scheduler. To detect the moment when the target process has been chosen to use the CPU, and its virtual memory address space is therefore available for the injection of faults, DBench-FI dynamically intercepts and changes the OS schedule function. The required fault can then be injected.
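The interception idea can be modelled in user space as a wrapper around a scheduler. The Python sketch below is a simplified analogy, not kernel code: the Task and Scheduler classes, the round-robin policy and the install_injector helper are all hypothetical. What it preserves is the order of events: the hook runs at the entry of schedule, checks whether the task currently on the CPU is the target, injects the fault if so, and then hands control to the unchanged original scheduler.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass
class Task:
    pid: int
    memory: bytearray            # stand-in for the task's virtual address space


class Scheduler:
    """Toy round-robin scheduler; 'current' is the task holding the CPU."""

    def __init__(self, tasks):
        self._queue = cycle(tasks)
        self.current = None

    def schedule(self):
        self.current = next(self._queue)
        return self.current


def install_injector(sched, target_pid, offset, mask):
    """Redirect sched.schedule to new_schedule; return a function that undoes it."""
    original_schedule = sched.schedule
    state = {"injected": False}

    def new_schedule():
        task = sched.current     # its address space is the one currently active
        if task is not None and task.pid == target_pid and not state["injected"]:
            task.memory[offset] ^= mask          # bit-flip fault
            state["injected"] = True
        return original_schedule()               # scheduling policy is untouched

    sched.schedule = new_schedule

    def uninstall():
        del sched.schedule       # drop the redirection, restoring the original

    return uninstall


tasks = [Task(pid, bytearray(b"\x0f" * 4)) for pid in (1, 2, 3)]
sched = Scheduler(tasks)
uninstall = install_injector(sched, target_pid=2, offset=0, mask=0x01)
for _ in range(4):
    sched.schedule()
uninstall()
print([t.memory.hex() for t in tasks])   # ['0f0f0f0f', '0e0f0f0f', '0f0f0f0f']
```

Only the memory of the task with pid 2 is modified, and after uninstall the scheduler object behaves exactly as before the hook was installed.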

In a first step, the address of the kernel schedule function is found, and execution is then redirected to a new function called new_schedule, responsible for both the target process detection and the fault injection. The memory address where the schedule function resides is determined through a search in the Linux file /proc/ksyms [15], which contains a list of every symbol exported by the OS kernel (known as the kernel symbol table) [16]. This methodology presents a higher degree of portability across different versions and distributions of the Linux OS when compared, for example, with the memory pattern search algorithm used in the first version of the fault injector [Costa et al. 2003]. This approach requires a kernel that supports LKMs, but these are also required for the dynamic installation of the Fault Injector Core Module. Moreover, considering the benefits of the dynamic extensibility of the kernel, typical of microkernel architectures, most current Linux kernels and distributions are compiled with this option enabled, which is indeed the default. It is important to mention that this methodology requires supervisor privileges, since access to both the LKM features and the /proc/ksyms file demands them for security reasons.

[15] The Linux file /proc/ksyms is created on-the-fly when the kernel boots up. For Linux kernels version 2.6 and above, the /proc/ksyms file was replaced by /proc/kallsyms.

[16] The file /boot/System.map could also be used for this purpose, since it contains all symbols used by the kernel. However, this file is usually used for debugging purposes and is sometimes not available (as it is not required for the OS booting process).
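The symbol table search can be sketched as a simple text scan. The Python example below assumes the three-column "address type name" layout of /proc/kallsyms; the sample lines and their addresses are invented for illustration, and the function name find_symbol is hypothetical. On a real system the file itself would be read instead of the sample string, with supervisor privileges as noted above.

```python
# Sample in the "address type name [module]" layout of /proc/kallsyms.
# All addresses and the module entry below are made up for illustration.
SAMPLE_SYMBOLS = """\
c0119a20 T schedule
c0119f40 T schedule_timeout
c011a060 T wake_up_process
d0823000 t helper_fn\t[some_module]
"""


def find_symbol(table_text: str, name: str):
    """Return the address of the symbol called exactly `name`, or None."""
    for line in table_text.splitlines():
        fields = line.split()
        # Compare the whole name so that "schedule" does not match
        # "schedule_timeout".
        if len(fields) >= 3 and fields[2] == name:
            return int(fields[0], 16)
    return None


print(hex(find_symbol(SAMPLE_SYMBOLS, "schedule")))   # 0xc0119a20
print(find_symbol(SAMPLE_SYMBOLS, "no_such_symbol"))  # None
```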

The procedure used by DBench-FI is illustrated in Figure 4-4 and consists of the following steps:

1) Determine the runtime address of the schedule kernel function in the OS kernel symbol table;

2) Copy the first nine bytes of the kernel schedule function (represented by instructions A, B and C in Figure 4-4) to a new function called saved_instructions;

3) Generate a jump instruction with the runtime address of the new_schedule function (where the target process detection and the fault injection will take place) and overwrite the first bytes of the schedule code with the generated jump instruction;

4) Create a jump instruction in order to execute the saved nine bytes of the kernel schedule function (saved in step 2 to saved_instructions) after the execution of new_schedule;

5) Create a jump instruction in order to execute the rest of the original schedule function code (from the 10th byte onward of the original schedule function).
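Steps 2 to 5 can be simulated on a byte array standing in for kernel memory. The Python sketch below is a model under stated assumptions, not the DBench-FI code: the text does not say which jump encoding is used, so a 5-byte x86 near jump (opcode 0xE9 with a 32-bit relative displacement) is assumed, and the unused bytes of the nine-byte window are padded with NOPs. All addresses are invented, the "instructions" are placeholder bytes, and the relocated bytes are assumed to be position independent.

```python
import struct

JMP_REL32 = 0xE9    # x86 near jump opcode, followed by a signed 32-bit displacement
NOP = 0x90
JMP_LEN = 5
SAVED_LEN = 9       # bytes copied from the start of schedule (value given in the text)


def make_jmp(src: int, dst: int) -> bytes:
    """Encode a jump located at src that transfers control to dst."""
    rel = (dst - (src + JMP_LEN)) & 0xFFFFFFFF   # relative to the next instruction
    return bytes([JMP_REL32]) + struct.pack("<I", rel)


def jmp_target(mem: bytearray, src: int) -> int:
    """Decode the destination of the jump stored at src."""
    assert mem[src] == JMP_REL32
    (rel,) = struct.unpack_from("<i", mem, src + 1)
    return src + JMP_LEN + rel


def install_hook(mem, schedule, new_schedule, new_schedule_tail, saved_instructions):
    """Apply steps 2-5 to the simulated memory; return the bytes that were saved."""
    original = bytes(mem[schedule:schedule + SAVED_LEN])
    # Step 2: copy the first nine bytes of schedule to saved_instructions.
    mem[saved_instructions:saved_instructions + SAVED_LEN] = original
    # Step 5: after the saved bytes, jump back to the 10th byte of schedule.
    back = saved_instructions + SAVED_LEN
    mem[back:back + JMP_LEN] = make_jmp(back, schedule + SAVED_LEN)
    # Step 4: new_schedule ends with a jump to saved_instructions.
    mem[new_schedule_tail:new_schedule_tail + JMP_LEN] = make_jmp(
        new_schedule_tail, saved_instructions)
    # Step 3: overwrite the start of schedule with a jump to new_schedule.
    patch = make_jmp(schedule, new_schedule) + bytes([NOP]) * (SAVED_LEN - JMP_LEN)
    mem[schedule:schedule + SAVED_LEN] = patch
    return original


def remove_hook(mem, schedule, original):
    """Undo the redirection by restoring the saved bytes."""
    mem[schedule:schedule + SAVED_LEN] = original


mem = bytearray(0x400)
SCHEDULE, NEW_SCHEDULE, NEW_TAIL, SAVED = 0x100, 0x200, 0x2F0, 0x300
mem[SCHEDULE:SCHEDULE + 12] = bytes(range(0xA0, 0xAC))   # placeholder code bytes
original = install_hook(mem, SCHEDULE, NEW_SCHEDULE, NEW_TAIL, SAVED)
print(hex(jmp_target(mem, SCHEDULE)))            # 0x200
print(hex(jmp_target(mem, NEW_TAIL)))            # 0x300
print(hex(jmp_target(mem, SAVED + SAVED_LEN)))   # 0x109
remove_hook(mem, SCHEDULE, original)
```

In the model, the patch to schedule itself is written last, so that the saved copy and both return jumps are already in place when the redirection becomes active; this ordering is a design choice of the sketch and is not stated in the text.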

It should be noticed that, considering the methodology used by the fault injector, as well as the implementation of the new_schedule function in a high-level language (C), it is fundamental to restore the stack after the identification of the target process and before the jump (step 4) to the original schedule instructions (saved in saved_instructions). This need is justified by the following two reasons:

1. The compiler, according to the calling conventions, automatically creates a prologue and an epilogue, which allow the use of the stack for passing data between the caller code and the called code;

2. The function new_schedule finishes with a jump to saved_instructions (step 4) instead of using the conventional epilogue [17].

It should also be noticed that when the fault injector kernel module is loaded, the policy and the main algorithm of the original operating system scheduler remain the same. Additionally, when the module is unloaded or removed, the redirections that were made are undone and the scheduler is restored exactly to its original form.

Concerning intrusiveness, it is important to emphasize that when the fault injector is loaded but no faults are injected, the performance penalty corresponds to the ten machine instructions that were added in order to intercept and redirect the scheduler. This guarantees a very low, practically negligible intrusiveness on current processors.

[17] The x86 family processors have two registers for manipulating data on the stack: ESP and EBP. While the first points to the top of the stack, the second is used to reference data on the stack. At the end of a subprogram, the original values of the registers are restored (they are saved at the start of the subprogram). Detailed information about the stack and the calling conventions is presented in [Carter 2006].

Figure 4-4 – The DBench-FI fault injector methodology.

In document Dependability Benchmarking for Large and Complex Systems (Page 113-126)