Case Studies - LnCm fault model : complexity and validation

target the ISA registers.

Similar to the work presented in this chapter, Ayatolahi et al. [9] mimicked bit- flips in registers of a real hardware platform. In addition, they investigated the impact of L1C1and L1C2on program execution. The work presented here differs from that presented in [9] mainly in the DBU fault model assumed. The DBU fault model in [9] selects a single location and flips two bits in that location, while in this work, in addition to the L1C1 model assumed in [9], the L2C1 fault model that chooses two locations and flips one bit in each location is also assumed. Another difference is: in [9], faults are also injected in memory words and the bit-error sensitivity for different target locations is investigated, in order to provide an insight regarding the results. Both the work presented here and that presented in [9] reported a higher level of benign (No Impact) executions for SBUs and a higher proportion of crash failures for DBUs.

6.2 Case Studies

As an assessment on how fault models impact on program executions, a series of experiments is conducted, using soft-error injections on seven embedded software systems where single bit and double bits errors were injected into CPU registers of the target systems. The aim of the study is to investigate how failure mode varies for different fault models. The first set of evaluations focused on how error resilience and error sensitivity varies for the different fault models, and the second set of evaluations focused how error error resilience and error sensitivity varies for different target locations. Error sensitivity is taken to be the probability that a fault causes software system failure as defined in Chap- ter 3 and error resilience taken to be the probability that a fault does not result in a program failure defined under the failure scheme in Chapter 3.4.3 .

6.2. CASE STUDIES 68

(See Section 3.2). This section describes the parameters adopted for the target programs to use them with the fault injection tool described in Chapter 3 (See Section 3.4.2.

6.2.1 System Instrumentation

The LLFI tool was used for all fault injection experiments conducted in this the- sis. The fault injection experiments reported in this chapter are undertaken on seven programs: derivatives, step, cubic, Isqrt, Corners, Step and Smoothing. To perform fault injection with LLFI, the source code of the software system is first compiled into a single IR byte-code. Twelve variables were instrumented in each target program. A golden run was created for each program. Three fault models, L1C1, L1C2 and L2C1 (See Chapter 3.2), were adopted for the fault injections experiments in this chapter. In line with the fault models assumed in this chapter, bit-flip faults were injected into bit-positions for all instrumented program variables. Nine different input sets are selected for each target program. The combination of input and target program is called an execution flow. This means, for each target program under each fault model experiments were conducted for nine execution flows. The target systems and their input data are described in Section 3.3.

For the L1C1 model, each fault injection input execution flow entailed a single bit-flip in a program variable at one bit-position, i.e., no multiple fault are introduced in any single input execution flow. For the L1C2 model, each fault injection input execution flow entailed a double bit-flip in a program variable at two bit-positions, i.e., not more than two faults are introduced in any single program execution, and only a single variable can be targeted. For the L2C1 model, each fault injection input execution flow entailed single bit-flip in two program variables at one bit-position each, i.e., exactly two variables are targeted in any single input execution flow, and no multiple faults can be inserted

6.2. CASE STUDIES 69

in any variable.

6.2.2 Experimental Procedure

Before commencing the fault injection experiments, the CFG of IR byte-code of the program was partitioned into three parts, namely: (i) early, (ii) central and (iii) late. These partitions are defined as block locations (or blocks for short). From each block location, four target locations, were choosen at random, i.e., target locations are partitioned and selected according to their placement in the IR byte-code of the program, and also according to their execution location in the program. A target location (or location for short) is defined as a given register used by the program. When an L1C1error is injected, a single location is selected. On the other hand, two locations are selected for L2C1 errors. A fault injection experiment is the injection of an error under the assumed fault model in a given target location or pair of locations. A fault injectioncampaign for a fault model is a set of experiments for a given input and program, i.e., a set of experiments for an execution flow.

Once a location (or pairs of locations) have been selected, bit-flip errors were then injected exhaustively in the locations to cover all possible combination. Errors are only injected in target location(s) immediately before the target location is read to avoid unnecessary overwrites. For each selected location, fault is injected only once during the execution of the program. For L1C1 and L1C2 errors, there are twelve target locations for each campaign and for L2C1 errors there are x_r

target locations pairs,x= 12 being the number of all chosen target locations andr= 2, the number of locations to target in any given experiment. Under the L1C1fault model,nexperiments were conducted in each target location,n being the length of the register. A total 48,384 L1C1s were introduced in the various programs. For L1C2 errors, n_r experiments were performed in each location,nbeing the size of the target location andr= 2 (the number of

6.2. CASE STUDIES 70

bits to flip in the said location). A total of 1,524,096 L1C2 errors were injected across the different target programs. Under the L2C1 model, for each location pair, n×m experiments were undertaken,m, nbeing the length of the target locations. Overall, a total of 17,031,168 L2C1 errors were injected into the target programs.

Table 6.1: Register classificaiton scheme

Type

Data Type

ADD/FADD Returns the sum of its two operands. Binary Control or Other SUB/FSUB Returns the difference of its two operands. Binary Control

or Other MUL/FMUL Returns the product of its two operands. Binary Control

or Other DIV/FDIV Returns the quotient of its two operands. Binary Control

or Other SHL Shifts a value to the left a specified number

of its, and returns the shifted value. Binary Other

BITCAST

Converts a value to a second type without changing any bits. It returns the converted value.

Casting Other

SITOFP

Converts a signed integer value to a floating point value. It returns the converted value.

Casting Other

ZEXT

Zero extends its operand to a larger size, i.e., fills the higher order bits of the value with zero until it reaches the size of the destination type.

Casting Other

LOAD

Specifies the memory address form which to load. The location of the memory pointed is loaded.

MAAC Control or Other

ALLOCA

Allocates memory on the stack frame of the currently executing function and returns a pointer.

MAAC Pointer

GETELEM

Gets the address of a subelement of an aggregate data structure. It performs address calculations only and does not access memory. It returns a pointer to an element.

MAAC Pointer

Further, target locations were categorised based on register instruction type and the type of data held in the register. Table 6.1 depicts register classification scheme.

6.2. CASE STUDIES 71

A target location is classified based on the type of operation it performs (see Table 6.1) as follows:

• Binary: If it performs binary or bitwise binary operations in a program, i.e., it performs computaions, such as adding, on two operands and returns the results of the operation.

• Casting: If it performs bit conversions operations, such as casting, con- verting value of one data type to another data type.

• Memory Access and Address Computation (MAAC): If it loads

data from memory, allocates memory on stack or gets address of a subelement of an aggregate data structure.

For L2C1 errors, target location pairs are classified as above only if both locations belong to the same category, and locations of mixed categories are classified as follows:

• Binary and Casting: If one location is Binary and the second is Casting. • Binary and MAAC: If one target location is Binary and the other is

MAAC.

• Casting and MAAC: If the target location is made of Casting and

MAAC pair.

A target location is classified based on the type of data it holds (see Table 6.1) as follows:

In document LnCm fault model : complexity and validation (Page 95-100)