Residue Codes - Arithmetic Codes - Hardware Error Detection Using AN-Codes

3. Arithmetic Codes


All applications of the Berger code that are known to us are implemented in hardware. In [LTRN92] the Berger code is used in the design of a self-checking arithmetic logical unit (ALU). The authors present hardware-implemented check bit prediction algorithms for addition, two's complement subtraction, logic operations, and the shift and rotate operations.

The authors of [LOBR09] also present a self-checking ALU that is realized using the Berger code. Additionally, this ALU corrects transient errors: the code is used to detect errors, and a redundant execution corrects the errors that occurred.

We will not use the Berger code for our software-implemented error detection for the following reasons:

• The Berger code is not sufficient for an implementation in software.

• It does not facilitate the detection of all error symptoms that we defined in Section 2.5 and might not detect multiple bitflips.

3.2. Residue Codes

Residue codes are systematic and separate codes. The code word for the functional value x is the tuple of x and its residue modulo a code-specific constant A that is greater than 1. Thus, the encoded version of x in a residue code is

$x_c = (x,\ x \bmod A) = (x,\ x_A)$ with $A > 1$.

The code parameter A is used to adjust the detection capability of the code. The larger A is chosen, the less probable undetected errors become, because fewer functional values share the same residue.

A code word is valid if the check bits equal the residue of the functional value modulo A, that is, if the following equation holds:

$x \bmod A = x_A$.
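To make the role of A concrete, consider an error that changes only the functional value by an additive error value e while the check bits stay intact. This is a simplifying assumption of this sketch, not the fault model of the text; the symbols x' and e are introduced here for illustration only.

```latex
x' = x + e, \qquad
x' \bmod A = x_A
\iff x + e \equiv x \pmod{A}
\iff e \equiv 0 \pmod{A}
```

Under this assumption the error stays undetected only if e is a multiple of A, and among any A consecutive integers exactly one is a multiple of A.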

Note that the check bits $x_A$ exist redundantly to x. Thus, when using a residue code, we have to execute additional operations that compute the result's check bits for each operation. Hence, x and $x_A$ of the tuple $x_c = (x, x_A)$ can be used to check the validity of $x_c$.
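The encoding and the validity check described above can be sketched in C as follows. The names `res_word`, `res_encode`, `res_is_valid` and the choice A = 251 are assumptions made for this illustration; they do not come from the text.

```c
#include <stdbool.h>
#include <stdint.h>

/* Code parameter A (> 1); the value 251 is an arbitrary assumption. */
#define RES_A 251u

/* Code word of a residue code: functional value plus separate check bits. */
typedef struct {
    uint32_t value;    /* functional value x */
    uint32_t residue;  /* check bits x_A = x mod A */
} res_word;

/* Encode x as the tuple (x, x mod A). */
static res_word res_encode(uint32_t x) {
    res_word w = { x, x % RES_A };
    return w;
}

/* A code word is valid iff x mod A equals the stored check bits. */
static bool res_is_valid(res_word w) {
    return w.value % RES_A == w.residue;
}
```

With an odd A such as the one assumed here, a single bitflip in `value` changes x by a power of two, which is never a multiple of A, so `res_is_valid` reports the mismatch.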

Further known residue codes are multiresidue codes and inverse residue codes. Multiresidue codes, i.e., residue codes that use multiple different residues, can be used to implement error correction as shown in [Rao70] and [Rao74, chapter 5]. For inverse residue codes the check bits are formed as $A - x_A$. According to [Avi71], these codes are better at detecting repeated-use faults, i.e., faults where a stuck bit is used several times. This kind of fault may occur in circuits that implement shift operations.


Table 3.1 summarizes which operations are supported by residue codes and which are not. For the supported operations, the table shows how the functional value and the check bits, i.e., the residue, are computed. Of course, many operations could be implemented using other operations and programming constructs such as loops and branches. The table only presents solutions that apply basic arithmetic and logical operators to the input residues. In particular, solutions that require a loop or branch are not presented.

encoded operation             implementation
                              functional value    check bits

arithmetic operations:
  z_c = x_c +_c y_c           z = x + y           z_A = (x_A + y_A) mod A
  z_c = x_c -_c y_c           z = x - y           z_A = (x_A - y_A) mod A
  z_c = x_c *_c y_c           z = x * y           z_A = (x_A * y_A) mod A
  z_c = x_c /_c y_c           z = x / y           not directly encodable
  signed numbers:             supported

shift operations:
  z_c = x_c <<_c y_c          z = x << y          not directly encodable
  z_c = x_c >>_c y_c          z = x >> y          not directly encodable

logical boolean operations:
  or:  z_c = x_c ||_c y_c     z = x || y          z_A = x_A + y_A - x_A * y_A
  and: z_c = x_c &&_c y_c     z = x && y          z_A = x_A * y_A
  not: z_c = !_c x_c          z = !x              z_A = 1 - x_A

bitwise boolean operations:
  or:  z_c = x_c |_c y_c      z = x | y           not directly encodable
  and: z_c = x_c &_c y_c      z = x & y           not directly encodable
  not: z_c = ~_c x_c          z = ~x              not directly encodable

comparisons:                  not supported

Table 3.1.: Implementation of encoded operations for residue codes.

Note that we must not use the functional value z of the result for computing the check bits $z_A$. That would be nothing more than a redundant computation instead of an arithmetically encoded one. Of course, it could always be used as a less safe fallback solution. It is less safe because redundancy is susceptible to permanently faulty hardware.

While the redundant computation of the check bits for addition, subtraction, and multiplication is straightforward, we know of no solution for division. Of course, division can be emulated expensively using a loop that subtracts the divisor from the dividend until the remainder is smaller than the divisor.

The supported arithmetic operations addition, subtraction, and multiplication support positive and negative numbers as well. Thus, arithmetic with signed and unsigned numbers can be realized using a residue code.
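A minimal C sketch of the three directly encodable arithmetic operations, including negative operands, could look as follows. The type and function names and A = 251 are assumptions for illustration. The sketch also assumes that no operation overflows the 32-bit functional value, because a wrapped result would no longer match its residue.

```c
#include <stdbool.h>
#include <stdint.h>

#define RES_A 251  /* code parameter A; arbitrary assumption for this sketch */

typedef struct {
    int32_t value;    /* functional value */
    int32_t residue;  /* check bits, kept in the range [0, A) */
} res_word;

/* Non-negative residue: C's % may return a negative result for negative x. */
static int32_t res_mod(int64_t x) {
    int64_t r = x % RES_A;
    return (int32_t)(r < 0 ? r + RES_A : r);
}

static res_word res_encode(int32_t x) {
    res_word w = { x, res_mod(x) };
    return w;
}

static bool res_is_valid(res_word w) {
    return res_mod(w.value) == w.residue;
}

/* The check bits of the result are derived from the input residues only,
 * never from the functional result z. */
static res_word res_add(res_word x, res_word y) {
    res_word z = { x.value + y.value, res_mod((int64_t)x.residue + y.residue) };
    return z;
}

static res_word res_sub(res_word x, res_word y) {
    res_word z = { x.value - y.value, res_mod((int64_t)x.residue - y.residue) };
    return z;
}

static res_word res_mul(res_word x, res_word y) {
    res_word z = { x.value * y.value, res_mod((int64_t)x.residue * y.residue) };
    return z;
}
```

Normalizing the residue into [0, A) in `res_mod` is a design choice of this sketch; it lets the same check work for signed and unsigned functional values.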

At first glance, computing the check bits for the left shift operation x << y seems to be no problem. Because a left shift is equivalent to a multiplication by a power of two, the residue of the result is $(x_A \cdot 2^y) \bmod A$. However, the computation of $2^y$ uses the functional value y directly instead of its residue $y_A$. Thus, any error that modifies y is not detectable. Encoding the left shift therefore requires an encoded version of the power-of-two computation. This also has to be emulated, for example by multiplications by two in a loop. However, residue codes can only protect the multiplication, but neither the required comparison nor the loop itself.
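The weakness described above can be made visible with a small C sketch. The function `res_shl` is a hypothetical emulation written for this illustration, with A = 251 assumed; it also assumes that no bits are shifted out of the 32-bit functional value.

```c
#include <stdint.h>

#define RES_A 251u  /* code parameter A; arbitrary assumption for this sketch */

typedef struct {
    uint32_t value;    /* functional value */
    uint32_t residue;  /* check bits */
} res_word;

/* Emulated left shift on the check bits: z_A = (x_A * 2^y) mod A.
 * The loop bound is the functional value y.value, not the residue of y, so the
 * loop and its comparison are not protected by the code. */
static res_word res_shl(res_word x, res_word y) {
    uint32_t pow2_res = 1u % RES_A;
    for (uint32_t i = 0; i < y.value; ++i) {
        pow2_res = (pow2_res * 2u) % RES_A;  /* protected multiplication by two */
    }
    res_word z = { x.value << y.value, (x.residue * pow2_res) % RES_A };
    return z;
}
```

If `y.value` is corrupted from 3 to 4 while its check bits stay at 3, both the functional result and the check bits are computed from the wrong shift amount. The result is a valid code word for the wrong value, so the error passes the check.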

The right shift operations are equivalent to a division by an appropriate power of two. However, since we do not know a way to compute the residue for the division operation directly, we also do not know a way to compute the residues for right shift operations. Additionally, the right shift also requires the encoded version of the power-of-two computation.

Boolean logical operations are easily implemented using the knowledge of how to emulate them with arithmetic operations. In contrast to the ANSI-C standard, this implementation makes it necessary to restrict boolean values to 1 for true and 0 for false. Note that the remainders $x_A$ and $y_A$ of the boolean values x and y are equal to x and y, that is, either 0 or 1. Thus, the implementations of the boolean operations for the residue computations are nothing more than a redundant, different implementation.

To summarize, to the best of our knowledge, directly encoding division, bitwise logical operations, comparisons, and control flow statements with a residue code is not possible. These either have to be protected by other means or have to be implemented using operations for which an encoded version exists.
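The arithmetic emulation of the boolean logical operations on the check bits can be sketched in C as follows. The function names are assumptions for illustration, and all inputs are assumed to be restricted to 0 and 1 as required above.

```c
/* Check bits of boolean values restricted to 0 (false) and 1 (true).
 * For A > 1 the residue of such a value equals the value itself. */

/* or:  z_A = x_A + y_A - x_A * y_A */
static int res_bool_or(int xA, int yA) {
    return xA + yA - xA * yA;
}

/* and: z_A = x_A * y_A */
static int res_bool_and(int xA, int yA) {
    return xA * yA;
}

/* not: z_A = 1 - x_A */
static int res_bool_not(int xA) {
    return 1 - xA;
}
```

Because the results already lie in {0, 1}, no additional reduction modulo A is needed in this sketch.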

Residue codes can be implemented in software as well as in hardware. Let us look at the following function foo implemented in pseudocode similar to C:

    int foo(int x, int y, int z) {
        int u = x + y;
        int v = u + z;
        return v;
    }

If this function is protected by a residue code implemented in software, it would compute the residues of all computation results in parallel, as the following function foo_c does:

    (int, int) foo_c(int x, int y, int z, int xA, int yA, int zA) {
        int u  = x + y;
        int uA = (xA + yA) % A;   // u % A
        int v  = u + z;
        int vA = (uA + zA) % A;   // v % A
        return (v, vA);
    }
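Since C has no tuple return type, a compilable counterpart of the pseudocode needs a small result type. The following sketch assumes a struct `res_word`, the constant `RES_A` = 251 in place of A, and non-negative inputs without overflow; none of these choices is prescribed by the text.

```c
#include <stdint.h>

#define RES_A 251  /* code parameter A; arbitrary assumption for this sketch */

/* Result tuple (v, vA) of the encoded function. */
typedef struct {
    int32_t value;    /* functional value v */
    int32_t residue;  /* check bits vA */
} res_word;

/* Encoded version of foo: the residues are computed in parallel to the
 * functional values and only from the input residues. */
static res_word foo_c(int32_t x, int32_t y, int32_t z,
                      int32_t xA, int32_t yA, int32_t zA) {
    int32_t u  = x + y;
    int32_t uA = (xA + yA) % RES_A;  /* expected to equal u % A */
    int32_t v  = u + z;
    int32_t vA = (uA + zA) % RES_A;  /* expected to equal v % A */
    res_word r = { v, vA };
    return r;
}
```

A caller would check the result with `r.value % RES_A == r.residue` before using `r.value`.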

Residue codes can detect the following errors: faulty operations and modified operands, because these result in non-matching residues. They can also detect data and control flow errors such as exchanged operands and exchanged operators to some extent, because the residues, i.e., the check bits, are computed separately. For an exchanged operand or operator to remain undetected, two such errors have to neutralize each other. If in the above example u is erroneously replaced by x, then either x has to have the same residue as u or uA also has to be replaced with xA. Otherwise, the error will result in a mismatch between v and its expected residue vA with high probability. Finally, residue codes cannot detect lost updates.
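The exchanged-operand case from the paragraph above can be reproduced with a small fault-injection sketch in C. The function `foo_c_faulty_is_valid` is hypothetical and hard-codes the fault; A = 251 and non-negative inputs are assumed.

```c
#include <stdbool.h>
#include <stdint.h>

#define RES_A 251  /* code parameter A; arbitrary assumption for this sketch */

/* Faulty variant of the encoded foo: in the functional path u is erroneously
 * replaced by x, while the check bits are still derived from uA.
 * Returns whether the resulting tuple (v, vA) passes the validity check. */
static bool foo_c_faulty_is_valid(int32_t x, int32_t y, int32_t z) {
    int32_t xA = x % RES_A;
    int32_t yA = y % RES_A;
    int32_t zA = z % RES_A;
    int32_t uA = (xA + yA) % RES_A;  /* check bits of the intended u = x + y */
    int32_t v  = x + z;              /* injected fault: x used instead of u */
    int32_t vA = (uA + zA) % RES_A;  /* check bits computed separately */
    return v % RES_A == vA;
}
```

For x = 100, y = 200, z = 300 the check fails and the error is detected. For y = 251 the values x and u have the same residue, so the same fault passes the check, which matches the condition stated above.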

We know of only one example application of residue codes: the fault-tolerant STAR computer [AGM+71], which was developed at the Jet Propulsion Laboratory in the 1960s and used an inverse residue code for error detection. In that computer the code was implemented in hardware. The first versions of STAR used an AN-code (see the next section). However, that was replaced because the hardware implementation of a separate residue code was more efficient.

We will not use residue codes for our software-implemented error detection for the following reasons:

• Residue codes do not facilitate the detection of all error symptoms that we defined in Section 2.5.

• They provide no directly encoded version of division. However, encoding right shifts requires an encoded division. Shift operations will be needed especially for bitwise logical operations and unaligned loads and stores, and they should not be slowed down additionally by an emulated division operation, that is, a division that is implemented using other directly encoded operations.
