5.4 End-to-End Register Value and Dataflow Validation
5.4.1 Implementing End-to-End Residue Checking
Arithmetic codes have been studied extensively for protecting data as well as arithmetic and logic functional units (computation). They are based on attaching a redundant code to every data word. While data is protected by verifying the associated redundant code, arithmetic operations are protected by operating on the data and the codes in parallel. That is, arithmetic codes are preserved by correct arithmetic operations: a correctly executed operation taking valid code words as input produces a result that is also a valid code word. Several arithmetic codes exist (see Section 3.2), such as AN codes, Berger codes, residue codes and parity codes.
We choose residue codes [11, 58, 96] to build a system where register values and computation are protected against errors. Among the available separable arithmetic codes, a residue code is much smaller than a Berger code, and residue functional units require much less area than Berger functional units [96, 105]. Compared to parity prediction, residue codes are less invasive and cheaper for wide multipliers and adders [134].
Residue codes are based on the property that the residue of the result of an arithmetic operation can be computed either from the residues of the operands or directly from the result through a modular division. Given two input values N1 and N2, and R being the chosen residue modulus, the arithmetic property ((N1 mod R) • (N2 mod R)) mod R = (N1 • N2) mod R holds true for most of the common operations '•'.
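The property above can be checked numerically. The following is a minimal sketch, assuming R = 3 and the operand values 25 and 17 purely for illustration; the helper names residue and property_holds are hypothetical and do not come from the cited works.

```python
import operator

R = 3  # assumed modulus for illustration (the text later chooses R = 3)

def residue(n: int, r: int = R) -> int:
    """Residue of n modulo r (Python's % is non-negative for r > 0)."""
    return n % r

def property_holds(op, n1: int, n2: int, r: int = R) -> bool:
    """Compare the residue predicted from the operand residues
    with the residue computed directly from the result."""
    from_residues = op(residue(n1, r), residue(n2, r)) % r
    from_result = residue(op(n1, n2), r)
    return from_residues == from_result

# Addition, subtraction and multiplication all preserve the residue.
for name, op in [("add", operator.add), ("sub", operator.sub), ("mul", operator.mul)]:
    print(name, property_holds(op, 25, 17))
```

Only addition, subtraction and multiplication are exercised here; other operations need dedicated residue functional units, as the text discusses later.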
Figure 5.4 shows a typical implementation of residue checking. The computation ⊗ is performed independently on the regular data (operating on A and B and producing O) and on the redundant codes (operating on RA and RB and producing RAB). Then, in order to verify that the data values A and B as well as the functional unit operation are correct, the redundant code of O is computed through function R(O) and compared against RAB. A mismatch indicates an error.

Fig. 5.4: Concurrent error detection with residue codes
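The checking flow of Figure 5.4 can be sketched in software as follows. This is an illustrative model only, assuming R = 3; the function checked_op and its fault_mask parameter (used to inject a bit flip into the data result) are hypothetical names introduced for this example.

```python
import operator

R = 3  # assumed modulus for illustration

def checked_op(op, a: int, ra: int, b: int, rb: int, fault_mask: int = 0):
    """Model of concurrent error detection with residue codes.

    a, b       : data operands
    ra, rb     : residues that accompany the operands
    fault_mask : bits to flip in the data result (hypothetical fault injection)
    Returns (O, ok), where ok is False when R(O) != RAB.
    """
    o = op(a, b) ^ fault_mask   # regular functional unit, possibly faulty
    rab = op(ra, rb) % R        # residue functional unit
    ro = o % R                  # residue generator R(O)
    return o, ro == rab

# Fault-free addition: the residues match.
print(checked_op(operator.add, 10, 10 % R, 7, 7 % R))
# A bit flip in the adder output is flagged.
print(checked_op(operator.add, 10, 10 % R, 7, 7 % R, fault_mask=0b100))
# A corrupted source operand (value changed, residue stale) is flagged as well.
print(checked_op(operator.add, 10 ^ 1, 10 % R, 7, 7 % R))
```

The last call illustrates why the same comparison covers both the operands and the functional unit: a stale operand residue no longer agrees with the residue of the computed result.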
If R has the form R = 2^k − 1 for some k (for example, R being 3, 7, 15, etc.), the residue code is called low-cost, because it allows a simple calculation of the residue value.³ It is important to note that a low-cost residue leaves one of the 2^k values representable in its k bits unused. The reason is that a modulus of the form 2^k cannot be used: any fault affecting the word at bit position i, where i ≥ k, would remain undetected. From a fault coverage perspective, if multiple faults add or subtract a multiple of 2^k − 1 to the value, the faults are undetectable (they alias back into the same residue value). A modulo-3 residue detects not only all single-bit errors, but also most 2-bit errors. When using a low-cost residue, burst faults of up to k − 1 bits are guaranteed to be detected [11, 13, 213]. We choose R = 3; previous works [105, 141, 189] show that the implementation costs are rather small. These costs are discussed in Section 5.6.
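The low-cost residue computation and the coverage claims above can be illustrated with a short sketch. It assumes non-negative integers and a 32-bit word for the fault sweep; the function names low_cost_residue and single_bit_flips_detected are hypothetical. The folding step relies on the fact that 2^k is congruent to 1 modulo 2^k − 1, so summing k-bit chunks preserves the residue.

```python
def low_cost_residue(n: int, k: int) -> int:
    """Residue of a non-negative n modulo 2**k - 1, without division.

    The number is split into k-bit chunks that are summed; the sum is
    folded again until it fits in k bits.
    """
    mask = (1 << k) - 1
    while n > mask:
        total = 0
        while n:
            total += n & mask  # add the lowest k-bit chunk
            n >>= k            # move to the next chunk
        n = total
    # The all-ones pattern is congruent to 0 modulo 2**k - 1.
    return 0 if n == mask else n

def single_bit_flips_detected(word: int, width: int = 32, k: int = 2) -> bool:
    """True if every single-bit flip in `word` changes the residue."""
    good = low_cost_residue(word, k)
    return all(low_cost_residue(word ^ (1 << i), k) != good for i in range(width))

# All single-bit errors are caught by a modulo-3 residue (k = 2).
print(single_bit_flips_detected(0xDEADBEEF))
# A 2-bit error that adds 3 (a multiple of R) aliases to the same residue.
print(low_cost_residue(0b1000, 2) == low_cost_residue(0b1011, 2))
```

The second check shows one of the 2-bit errors that a modulo-3 residue cannot detect: flipping two adjacent zero bits adds exactly 3 to the word.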
The research community and the industry have proposed effective residue functional units for most of the common operations (that is, functional blocks that compute the expected residue of the result from the residues of the operands).
Residue functional units have been studied for integer arithmetic operations, including addition, subtraction, multiplication, division and square root [96, 141, 152, 153, 169, 189, 210]. Similar ideas have also been applied to logical operations, including AND, OR and XOR operations [19, 58, 125, 177, 213], as well as shifts [74]. Residue functional units for single-precision and double-precision floating-point operations (such as addition, subtraction, multiplication, division, multiplication with addition and multiplication with subtraction) are also supported [46, 68, 76, 77, 105, 124]. Residue checking has also been generalized to vector (SIMD) operations [21, 77].

³ The residue of an n-bit number is computed by dividing the binary number into k-bit chunks, and then summing these chunks through modulo-(2^k − 1) addition. This makes the residue encoders extremely simple to implement, because no division or multiplication is needed [57, 210].

Fig. 5.5: End-to-end residue checking: extensions in the backend logic (residue hardware is shown colored)
The separability of residue codes simplifies the implementation of the checking component. Residues are not intrusive to existing designs: execution units are left as they are, while the residue of the result is computed concurrently without impacting the delay of the original circuit. Moreover, for the cases where a residue functional unit is not cost-effective and is not implemented (for example, for small logic blocks), the separability allows the designers to skip the checking of the operation, while still providing error detection for the source operands and computing the result's residue through function R(O).

There are two different possibilities for embedding a residue code in a self-checking system: residue codes can be applied only locally inside the functional units, or the complete system can compute with encoded operands [96, 119].
In the early days of arithmetic codes, residue codes were applied locally inside the functional units. This basic design option is commonly referred to as a "self-checking system" [209]. In this design, the residues of the source operands are computed before they are fed into the residue functional unit, possibly introducing extra delay in the computation and checking part.
Forty years later, Iacobovici extended the concept to out-of-order processors in which the complete processor computes with encoded operands [96, 119], and named this kind of residue protection "end-to-end residue checking" [75, 76]. Figure 5.5 shows an implementation of such an end-to-end residue checking scheme. Residue codes are calculated where data originates: (i) loads from the data cache, and (ii) outputs of the functional units. Residue codes flow through the bypass network and are stored in the register file. This way, data is protected in an end-to-end fashion: from the point where it originates to the point where it is consumed. Notice that in this implementation we replace parity with residue coding, since both protect the data. Correctness of the functional units is verified by the residue checkers placed next to them. Furthermore, this design option not only avoids adding residue generators to compute codes on the fly for the source operands, but also minimizes the delay introduced.
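The end-to-end flow can be mimicked with a small software model. This is only a sketch under stated assumptions: R = 3, a register file modelled as a dictionary, and the residue written back being the one produced by the residue adder. The names Reg, load and add are hypothetical and are not taken from Figure 5.5.

```python
from dataclasses import dataclass

R = 3  # assumed modulus for illustration

@dataclass
class Reg:
    value: int
    residue: int  # travels with the value through bypasses and register file

def load(value: int) -> Reg:
    """Residue generated where data originates (e.g. a load from the data cache)."""
    return Reg(value, value % R)

def add(a: Reg, b: Reg) -> Reg:
    """Adder with a residue adder and a residue checker next to it."""
    out = a.value + b.value
    predicted = (a.residue + b.residue) % R  # residue functional unit
    if out % R != predicted:                 # residue check against R(O)
        raise RuntimeError("residue mismatch: operand or adder error")
    return Reg(out, predicted)               # value and residue are written back together

# Register file modelled as a dictionary (illustrative register names).
rf = {"r1": load(20), "r2": load(22)}
rf["r3"] = add(rf["r1"], rf["r2"])
print(rf["r3"])

# A bit flip in a stored value is caught when the register is next consumed,
# because the stored residue no longer matches the corrupted value.
rf["r3"].value ^= 1 << 4
try:
    add(rf["r3"], rf["r1"])
except RuntimeError as err:
    print("detected:", err)
```

No residue generator is needed on the source operands in this model: the consumer reuses the residues that were stored alongside the values, which mirrors the delay argument made in the text.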