

4.1.3 Fault Coverage Evaluation Methodology

The fault injection methodology in this thesis aims at modeling the faults caused by any source of error (transient errors, intermittent errors, design bugs, or other hard faults) in an advanced out-of-order processor pipeline, and at studying how the proposed techniques respond when detecting or diagnosing them.

From a circuit-level perspective, a fault can affect a stored bit in a sequential element or affect the transistors and wires of combinational logic blocks. However, modeling these faults requires a gate-level model of the processor pipeline. Even though gate-level modeling allows accurate measurements, microarchitectural-level fault injection regimes are more desirable from a design perspective [155]:

• Simulation speed: Fault simulation at the gate level is extremely time-consuming. Gate-level models are very detailed, and their simulation speed is orders of magnitude lower than that of performance simulators. Given that many fault injections are required to reach a high degree of confidence, simulating them at the circuit level becomes practically infeasible.

• Reliability decisions during design path-finding: Fault injection at the circuit level is not suitable for use during early design phases. Early reliability estimates must be made in order to guide and adapt the design, in the same way as is done with power or temperature budgets. This calls for reasonably accurate, cost-effective methods to obtain error coverage metrics, and microarchitectural-level models (such as timing simulators) represent a sweet spot. Furthermore, abstract models, not circuit-level models, are the ones available during these stages.

• Fault masking: Fault injection at the gate level has the downside of fault masking [165]. Quantifying masking effects is critical when computing accurate (non-pessimistic) processor failure rates, but masked faults must be ignored when evaluating the error coverage potential of a fault-tolerance technique. A better approach is to model directly, in a microarchitectural simulator, the non-masked anomalies or failure scenarios caused by circuit-level faults. These simulators are aware of several sources of masking: instructions on wrong (mispeculated) paths, instructions with dead results, and instructions subject to some types of logical masking can be identified and avoided during fault injection. As a result, the incidence of unmasked faults is higher when using these models, which leads to a more rigorous evaluation of the fault-tolerance techniques.

Table 4.2: Simulator configuration

Parameter | Value
Frequency | 2.8 GHz
Technology | 32 nm
Voltage | 1.1 V
Main Memory | DDR3-1600‡, 48/54 ns for open/random RAM page, + 27 cycles for load-to-use latency
Last-Level Cache (LLC) | 2 MB, 16-way, write-back, 27 cycles load-to-use, 2 slices, 1 R/W port of 32B each; runs at core f/V, 32B ring
Unified Second-Level Cache (L2$) | 256 KB, 8-way, write-back, 12 cycles load-to-use, 1 R/W port of 64B
Data Cache (D$) | 32 KB, 8-way, write-back, 2 cycles hit†, 2 R/W ports of 32B, 64B lines
Miss Status Holding Register (MSHR) | 16 outstanding misses
Instruction Cache (I$) | 32 KB, 8-way, 3 cycles hit, 1 R/W port of 16B
Data/Instruction Translation Lookaside Buffer (DTLB/ITLB) | 128 entries, 8-way, 25 cycles per miss
Branch Predictors | GShare [112], 16-bit history; PHT-BTB 8K entries, bimodal, 4-way; 16-entry return-address stack; 14 cycles misprediction penalty
Decode width | up to 4 micro-instructions
Rename width | up to 4 micro-instructions
Allocator Queue (Alloc) | 12 entries (micro-instructions)
Allocate width | up to 4 micro-instructions
Rename Tables (RATs) | 1 frontend RAT, 8 checkpoint RATs
Issue Queue (IQ) | 32-entry scheduler, connects to 6 exec ports
Issue width | up to 6 micro-instructions
INT Operations [exec ports] | ALU [0/1/5], LEA [0/1], Shift [0/5], Mult-Div [1], Jump Unit [5]
FP Operations [exec ports] | Adder [1], Mult-Div [0]
SIMD INT/FP Operations [exec ports] | SIMD INT: ALU [1/5], Mult-Div [0], Shift [0/5], Other [1/5]; SIMD FP: Add [1], Mult-Div [0], Other [5]
Load-Store Queue (LSQ) | 30 loads, 20 stores (up to 2 loads and 1 store per cycle)
Memory Operations [exec ports] | Load Address [2/3], Store Address [3], Store Data [4]
Register Files (RF) | 128 INT, 128 FP-SIMD, 2 bypass levels
Reorder Buffer (ROB) | 128 entries
Commit width | up to 4 micro-instructions (max. 1 non-bogus store)

† +2 cycles for load-to-use latency due to address calculation.

We use a fault injection approach in which fault locations at the microarchitectural level that end up manifesting as the same visible failure scenario are grouped together [155]. For example, faults in a register scoreboard entry, in a shift register, in a select request or bid signal, or in the latency of producers can all result in prematurely issued instructions. To build these groups, the pipeline stages and processor components described in Appendix A are thoroughly inspected to identify the high-level visible faults that can be modeled in a timing simulator, enabling a fast and reasonably accurate evaluation. We have used fault studies, such as Reddy's [155], to guide the identification of our particular failure scenarios. For fault locations not analyzed in previous works, we have conducted fault injection studies to understand the resulting failure scenario and to reason about the conditions under which the fault is masked or manifests.
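The grouping idea can be pictured as an inverted table: many low-level fault locations map to one visible failure scenario, and the timing simulator only needs to model the scenario. The sketch below is illustrative only; the location and scenario identifiers are hypothetical names, and only the "premature issue" group (with the five locations listed in the text) comes from the source.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical identifiers: the five locations below are the examples given
# in the text; all of them manifest as prematurely issued instructions.
FAULT_LOCATION_TO_SCENARIO: Dict[str, str] = {
    "register_scoreboard_entry": "premature_issue",
    "shift_register": "premature_issue",
    "select_request_signal": "premature_issue",
    "select_bid_signal": "premature_issue",
    "producer_latency_field": "premature_issue",
}


def group_by_scenario(location_to_scenario: Dict[str, str]) -> Dict[str, List[str]]:
    """Invert the table: each failure scenario -> every location that causes it."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for location, scenario in location_to_scenario.items():
        groups[scenario].append(location)
    return dict(groups)


if __name__ == "__main__":
    for scenario, locations in group_by_scenario(FAULT_LOCATION_TO_SCENARIO).items():
        print(f"{scenario}: {len(locations)} locations -> {locations}")
```

Injecting into a single scenario in the timing simulator then stands in for faults at every location grouped under it.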

In each of the next chapters, we detail the different failure scenarios that can arise when faults affect the hardware involved in implementing the register dataflow, memory dataflow and control flow recovery logic. For each failure scenario, we list the hardware components that, when faulty, can cause that type of failure. For every considered failure scenario, 1000 effective faults are injected per benchmark. Faults are injected one at a time, randomly, during the first 10M instructions. Each experiment is then allowed to run for 100M instructions to let the fault manifest. An injection experiment is rejected (not effective) when the fault is masked. A fault is considered masked when all of the following conditions are satisfied: (i) the architectural state in the functional simulator is not corrupted (i.e., the state matches the expected golden state), (ii) the functional simulator does not report an error (no assert in the benchmark is raised and the simulated benchmark returns no wrong exit status), and (iii) the watchdog timer (described in Section 4.2) does not trigger.
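The campaign described above can be sketched as a loop that keeps injecting until the target number of effective faults is reached, rejecting masked runs with the three-condition test. This is a minimal sketch under stated assumptions: `run_experiment` is a hypothetical stub that draws outcomes from arbitrary probabilities instead of running the timing simulator, the `detected` field and the coverage/confidence computation (a normal-approximation 95% interval) are illustrative additions, not part of the thesis methodology.

```python
import math
import random
from dataclasses import dataclass

INJECTION_WINDOW = 10_000_000   # faults are injected within the first 10M instructions
RUN_LENGTH = 100_000_000        # each experiment then runs for 100M instructions
EFFECTIVE_TARGET = 1000         # effective faults per failure scenario and benchmark


@dataclass
class RunResult:
    state_matches_golden: bool  # (i) architectural state equals the golden state
    simulator_error: bool       # (ii) assert raised or wrong exit status
    watchdog_fired: bool        # (iii) watchdog timer triggered
    detected: bool              # hypothetical: technique under test flagged the fault


def is_masked(r: RunResult) -> bool:
    """Masked only if all three conditions hold; masked runs are rejected."""
    return r.state_matches_golden and not r.simulator_error and not r.watchdog_fired


def run_experiment(inject_at: int, run_length: int, rng: random.Random) -> RunResult:
    """Hypothetical stand-in for one timing-simulator run.

    A real run would inject at instruction `inject_at` and simulate
    `run_length` instructions; this stub ignores both and draws outcomes
    from arbitrary probabilities purely to exercise the campaign logic.
    """
    if rng.random() < 0.7:  # arbitrary masking rate
        return RunResult(True, False, False, False)
    return RunResult(False, rng.random() < 0.3, rng.random() < 0.1, rng.random() < 0.9)


def campaign(seed: int = 0):
    """Inject faults one at a time until EFFECTIVE_TARGET effective faults."""
    rng = random.Random(seed)
    effective = detected = rejected = 0
    while effective < EFFECTIVE_TARGET:
        inject_at = rng.randrange(INJECTION_WINDOW)  # random injection point
        result = run_experiment(inject_at, RUN_LENGTH, rng)
        if is_masked(result):
            rejected += 1  # not effective: discard and inject a new fault
            continue
        effective += 1
        detected += result.detected
    coverage = detected / effective
    # Normal-approximation 95% half-width of the coverage estimate
    half_width = 1.96 * math.sqrt(coverage * (1 - coverage) / effective)
    return effective, rejected, coverage, half_width


if __name__ == "__main__":
    eff, rej, cov, hw = campaign()
    print(f"effective={eff} rejected={rej} coverage={cov:.3f} ± {hw:.3f}")
```

Counting only effective faults toward the target is what makes masked injections "free" from the coverage point of view: they consume simulation time but never dilute the coverage estimate.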

The timing simulator and its interface to the functional simulator have been heavily modified to support explicit fault injection. First, the timing simulator has been extended to explicitly model microarchitectural structures that were originally modeled only implicitly. These include hardware blocks such as the bypass network, the bypass and register file data, branch coloring fields for wrong-path tracking, logical register destinations, latency fields in the issue queue, and ready fields. In addition, the performance simulator has been modified to include buggy methods. The objectives are twofold: first, buggy methods enable fault injection for hardware locations that cannot be explicitly modeled at the microarchitectural level; second, they allow us to check that the proposed solutions also cover functional design bugs. Examples include buggy methods for the wake-up logic, select logic, load-store queue logic, instruction squashing logic, input multiplexors, and ROB walk logic.
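To make the notion of a "buggy method" concrete, the sketch below pairs a correct wake-up routine with an illustrative buggy variant. Everything here is hypothetical (the `IQEntry` structure, the two-source layout and the specific bug are assumptions for illustration); it only shows how a buggy method can reproduce a visible failure scenario such as a prematurely issued instruction without modeling the faulty gates themselves.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class IQEntry:
    """Hypothetical issue-queue entry: two source tags and their ready bits."""
    src_tags: Tuple[int, int]
    ready: List[bool] = field(default_factory=lambda: [False, False])

    def can_issue(self) -> bool:
        return all(self.ready)


def wakeup(entry: IQEntry, broadcast_tag: int, buggy: bool = False) -> None:
    """Broadcast a producer's destination tag to one issue-queue entry.

    Correct: only sources whose tag matches the broadcast become ready.
    Buggy (illustrative design bug): the tag comparison is bypassed, so all
    sources become ready and the consumer may issue prematurely.
    """
    for i, tag in enumerate(entry.src_tags):
        if buggy or tag == broadcast_tag:
            entry.ready[i] = True


if __name__ == "__main__":
    good = IQEntry(src_tags=(7, 9))
    wakeup(good, broadcast_tag=7)
    print(good.can_issue())   # False: source tag 9 is still pending

    bad = IQEntry(src_tags=(7, 9))
    wakeup(bad, broadcast_tag=7, buggy=True)
    print(bad.can_issue())    # True: premature issue
```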

For locations explicitly modeled in the performance simulator, faults are injected as single bit flips. For locations not explicitly modeled, faults are modeled as activations of the buggy simulator methods. As noted by Meixner [186], the duration of the injected faults has no fundamental impact on the coverage of end-to-end schemes. For non-transient faults, instead of letting faults persist for the whole experiment, we have chosen a more pessimistic approach in which they behave like “short intermittent” faults. This approach provides lower bounds on error detection coverage for permanent faults, as the opportunity to detect them is limited to a single fault activation rather than consecutive ones. Note that, because the proposed techniques rely on spatial redundancy, permanent faults can be detected (the checked hardware is different from the hardware implementing error detection). Furthermore, design heterogeneity protects against design bugs (the checker logic is different from the checked logic).
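The two injection mechanisms can be sketched as follows: a single bit flip on an explicitly modeled field, and a "short intermittent" activation that switches a method to its buggy version for exactly one call. This is a sketch under assumptions; `flip_bit`, `OneShotBug`, the 64-bit default width and the min/max select example are hypothetical and not taken from the simulator described here.

```python
from typing import Any, Callable


def flip_bit(value: int, bit: int, width: int = 64) -> int:
    """Single-bit-flip fault on an explicitly modeled field of `width` bits."""
    if not 0 <= bit < width:
        raise ValueError(f"bit {bit} outside a {width}-bit field")
    return value ^ (1 << bit)


class OneShotBug:
    """Short-intermittent fault for a location that is not explicitly modeled.

    Wraps the correct and buggy versions of a simulator method. Once armed,
    the buggy version runs for exactly one call; afterwards the correct
    version is used again.
    """

    def __init__(self, correct: Callable[..., Any], buggy: Callable[..., Any]):
        self.correct = correct
        self.buggy = buggy
        self.armed = False
        self.activations = 0

    def arm(self) -> None:
        self.armed = True

    def __call__(self, *args: Any, **kwargs: Any) -> Any:
        if self.armed:
            self.armed = False       # disarm: the bug fires only once
            self.activations += 1
            return self.buggy(*args, **kwargs)
        return self.correct(*args, **kwargs)


if __name__ == "__main__":
    print(bin(flip_bit(0b1010, 0)))   # 0b1011

    # Hypothetical select logic: grant the oldest request (smallest age).
    select = OneShotBug(correct=min, buggy=max)
    select.arm()
    print(select([3, 5, 8]))  # 8: buggy grant on the single activation
    print(select([3, 5, 8]))  # 3: back to correct behavior
```

Limiting a non-transient fault to one activation, as `OneShotBug` does, gives the detection mechanism a single chance to catch it, which is why the resulting coverage figures are lower bounds for permanent faults.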

Methods such as AVF analysis [126, 128] have not been used because, despite being suitable for computing vulnerability estimates for SRAM and CAM structures, they cannot estimate the vulnerability of combinational logic.