10.2 System-Level Characterization, CTMDP Modeling, and Analysis of
10.2.3 Instruction Based Characterization, Modeling, and Analysis
Characterization of Radiation-Induced Soft-Errors
Ionizing radiation can disturb the potential at electrical nodes of microelectronic devices and force them to change state. The number of ionizing particles traversing the device’s sensitive area is computed from the particle flux, expressed in particles per square centimeter per second. If such a transient radiation event transfers enough energy, it may generate an SEU (i.e., a bit-flip in a memory cell) or a transient fault in the combinational circuitry of the device. The SEU may then propagate through different states, but it does not necessarily cause a system failure. In fact, most SEUs are masked. The masking mechanisms depend on the SEU propagation path, the current state of the system, and the fault characteristics (e.g., injection time and site). Thus, only a subset of the faults introduced in a system result in errors. The percentage of these errors that the system can detect within a certain time limit is defined as the system’s coverage factor.
In this work, to accurately estimate the SEU vulnerability of the different variables in an application, details of the microarchitecture specification, the technology node, and the SEU behavior are taken into consideration. The proposed high-level analysis investigates the vulnerability of each variable by examining the SEU propagation probabilities to different states and the lifetime of each variable. The lifetime of a variable is defined as the time period from the injection of an SEU in a register until that SEU is masked or becomes inactive (i.e., it no longer affects the system’s behavior). It can be characterized by the set of states in which the variable is active: the state where the variable is defined, every subsequent state in which the variable is used as an operand of another operation, and all the states on the path between the definition state and the usage state [63].
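The state-set view of a variable's lifetime can be illustrated with a short sketch. It assumes a straight-line sequence of states, each defining at most one variable and reading some operands. The `State` tuple, the `live_states` function, and the example `PROGRAM` are hypothetical names introduced here for illustration and are not part of the proposed analysis.

```python
from typing import NamedTuple, Optional, Sequence, Set, Tuple


class State(NamedTuple):
    """One state of a straight-line program (hypothetical representation)."""
    defines: Optional[str]   # variable written in this state, if any
    uses: Tuple[str, ...]    # variables read as operands in this state


def live_states(program: Sequence[State], var: str) -> Set[int]:
    """Return the indices of the states in which `var` is active.

    Active states are the definition state, every later state that uses
    the variable before it is redefined, and all states in between.
    """
    active: Set[int] = set()
    def_idx: Optional[int] = None
    for i, state in enumerate(program):
        # A use extends the lifetime from the latest definition up to here.
        if def_idx is not None and var in state.uses:
            active.update(range(def_idx, i + 1))
        # Operands are read before the result is written, so a
        # (re)definition is handled after the use check.
        if state.defines == var:
            def_idx = i
            active.add(i)
    return active


# Illustrative program (assumed, not taken from the source):
PROGRAM = [
    State("a", ()),            # 0: a = ...
    State("b", ()),            # 1: b = ...
    State("c", ("a", "b")),    # 2: c = a + b
    State("d", ("c",)),        # 3: d = c * 2
    State(None, ("a", "d")),   # 4: store a, d
]

print(sorted(live_states(PROGRAM, "b")))  # [1, 2]
print(sorted(live_states(PROGRAM, "a")))  # [0, 1, 2, 3, 4]
```

In this sketch, an SEU injected into the register holding `b` can only matter in states 1 and 2, whereas `a` stays exposed over the whole sequence. A larger active set is the intuition behind a longer lifetime and a higher vulnerability.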
Microarchitecture-Based Characterization
The proposed analysis characterizes the instruction set of a target microprocessor based on its microarchitecture specification. The purpose of this characterization is to identify the registers involved in each step of the required computation. The instruction type is determined from the opcode. Each instruction type is associated with a model that specifies which registers in the integer unit’s pipeline are involved during the normal execution of the target instruction, as well as in the different fast-forwarding cases [86]. Based on the sequence of instructions to be executed, the approach identifies which registers are involved and selects the appropriate execution path for each instruction. This enables an accurate high-level estimation of the lifetime of the injected faults.
For example, considering the LEON3 pipeline, the characterization of the arithmetic operation multiplication (MUL) includes:
DECODE INST register (Decode stage)
RFO DATA1, RFO DATA2, and RD CTRL (Register Access stage)
R E OP1 and R E OP2 (Execute stage)
R M RESULT (Memory Access stage)
R X RESULT (Exception stage)
R W RESULT (Write Back stage)
Similarly, in the characterization of the MULimm instruction, the R A IMM register is active instead of RFO DATA2. While this modeling stage depends on the target processor, it needs to be done only once and can be easily adapted to a new target.
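One way to picture such a characterization is as a lookup table from instruction type to the registers touched at each pipeline stage. The sketch below is only an illustration under assumptions: the dictionary layout, the stage abbreviations used as keys, and the helpers `immediate_variant` and `registers_involved` are hypothetical, and the register names are copied as printed in the MUL example.

```python
from typing import Dict, List, Tuple

StageModel = Dict[str, List[str]]

# Registers involved in MUL, per pipeline stage, as listed in the text.
MUL_MODEL: StageModel = {
    "DE": ["DECODE INST"],
    "RA": ["RFO DATA1", "RFO DATA2", "RD CTRL"],
    "EX": ["R E OP1", "R E OP2"],
    "MA": ["R M RESULT"],
    "XC": ["R X RESULT"],
    "WB": ["R W RESULT"],
}


def immediate_variant(model: StageModel) -> StageModel:
    """Derive the immediate form: R A IMM is active instead of RFO DATA2."""
    variant = {stage: list(regs) for stage, regs in model.items()}
    variant["RA"] = [
        "R A IMM" if reg == "RFO DATA2" else reg for reg in variant["RA"]
    ]
    return variant


# Hypothetical model library keyed by instruction type.
LIBRARY: Dict[str, StageModel] = {
    "MUL": MUL_MODEL,
    "MULimm": immediate_variant(MUL_MODEL),
}


def registers_involved(instructions: List[str]) -> List[List[Tuple[str, str]]]:
    """For each instruction, list the (stage, register) pairs it touches."""
    return [
        [(stage, reg) for stage, regs in LIBRARY[op].items() for reg in regs]
        for op in instructions
    ]


for op, regs in zip(["MUL", "MULimm"], registers_involved(["MUL", "MULimm"])):
    print(op, regs)
```

Because the table is built once per target processor, adapting the analysis to a new target would, in this sketch, amount to replacing the entries of `LIBRARY`.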
[Figure 63 diagram: S and F states arranged along the pipeline stages FE, DE, RA, EX, MA, XC, and WB, with λ-labeled transitions.]
Figure 63: Proposed Probabilistic Model of SEU Propagation Through the Processor Pipeline in an ADD Instruction
Instruction-Based Markov Modeling
The LEON3 processor implements the full SPARC V8 standard [62], including hardware multiply, divide, and multiply-accumulate instructions. SPARC is a RISC-based CPU instruction set architecture. The integer unit (IU) of the LEON3 has a seven-stage pipeline structured according to the Harvard architecture. Based on the instruction type, determined from the opcode, a probabilistic model of SEU propagation through the registers involved at each pipeline stage is proposed. This instruction-specific model captures the probabilistic details of the propagation path and the impact of an SEU on the operation of the instruction. A separate model is constructed for each type of instruction (e.g., ADD, ADD Imm, MUL) to account for its use of the pipeline registers. For instance, the ADD instruction uses different pipeline registers than the ADD Imm instruction, so the possible fault propagation paths in these two instructions are not equivalent. The fault propagation paths of all considered instruction types have been obtained through the generation of counterexamples in our model checking and are modeled, in a library, as Markov chains. An illustrative example of the ADD instruction model is shown in Fig. 63.
Starting from state S0 (error-free fetching), an SEU affecting the PC at the fetch stage can alter the address of the next instruction, which may result in wrong or invalid operations. This is shown in Fig. 63 by the transition from state S0 to state F1 (wrong or invalid PC) with rate λ_pc, the rate of occurrence of an SEU-induced error in the PC register. Moreover, in the fetch stage, an SEU affecting the instruction cache (icache) can alter the data accessed from memory and stored in the instruction register. This is represented in Fig. 63 by the transition from state S0 to state S1 with rate λ_icache, the rate of a bit-flip in the memory.
In the decode stage, an SEU can affect the instruction register, which is represented in Fig. 63 by the transition from state S2 (error-free state) to state S4 with rate λ_inst. Such an SEU can cause an error in the decoded data for any of the operands or for the operation.
In the register-access (RA) stage, an SEU can propagate from the decode stage to data1 (transition from state S4 to state S8 with rate λ_DT_1) or to data2 (transition from state S4 to state S7 with rate λ_DT_2). An SEU can also propagate from the decode stage to the write address (RD CTRL), which corresponds to the transition from state S4 to state F2 with rate λ_RD_ctrl. In addition, an SEU can directly affect any of the registers at the RA stage. This is represented by the transitions from S5 (error-free state) to F2, S7, and S8, as shown in Fig. 63.
In the execute stage, an SEU propagated from the decode stage can cause execution of the wrong operation, or execution of the right operation on the wrong data. This case is represented by the transitions from S10 and S11 to F3 with rates λ_prop2 and λ_prop2, respectively. In this case, the SEU may be logically masked in the data path. An error in the result of the execute stage propagates to the following stages (MA, XC, WB). Moreover, an error can be injected at the MA, XC, or WB stage, resulting in an error in the data stored in memory.
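The transitions described above can be turned into a small numerical sketch. It assumes the rates are constant and can be read as transition rates of a continuous-time Markov chain. Every numeric value is a placeholder chosen for illustration, not a measured rate. Only the transitions whose rates are named in the text are included, so the transitions out of S5 are omitted. The helpers `generator` and `transient` are hypothetical, and forward-Euler integration is used only for simplicity.

```python
from typing import Dict, List, Tuple

# Transitions of the ADD model that the text names with a rate.
# All numeric values below are illustrative placeholders.
RATES: Dict[Tuple[str, str], float] = {
    ("S0", "F1"): 1e-3,    # lambda_pc: SEU-induced error in the PC
    ("S0", "S1"): 2e-3,    # lambda_icache: bit-flip in the icache
    ("S2", "S4"): 1e-3,    # lambda_inst: SEU in the instruction register
    ("S4", "S8"): 0.5,     # lambda_DT_1: propagation to data1
    ("S4", "S7"): 0.3,     # lambda_DT_2: propagation to data2
    ("S4", "F2"): 0.2,     # lambda_RD_ctrl: propagation to write address
    ("S10", "F3"): 0.4,    # lambda_prop2 (as printed in the text)
    ("S11", "F3"): 0.4,    # lambda_prop2 (as printed in the text)
}

STATES: List[str] = sorted({s for edge in RATES for s in edge})


def generator(states: List[str],
              rates: Dict[Tuple[str, str], float]) -> List[List[float]]:
    """Build the generator matrix Q; each row sums to zero."""
    idx = {s: i for i, s in enumerate(states)}
    q = [[0.0] * len(states) for _ in states]
    for (src, dst), rate in rates.items():
        q[idx[src]][idx[dst]] += rate
        q[idx[src]][idx[src]] -= rate
    return q


def transient(q: List[List[float]], p0: List[float],
              t: float, steps: int = 2000) -> List[float]:
    """Approximate p(t) for dp/dt = p Q using forward-Euler steps."""
    dt = t / steps
    n = len(p0)
    p = list(p0)
    for _ in range(steps):
        p = [p[j] + dt * sum(p[i] * q[i][j] for i in range(n))
             for j in range(n)]
    return p


# Usage: start with an erroneous instruction register (state S4) and
# look at where the error has moved after one (arbitrary) time unit.
index = {s: i for i, s in enumerate(STATES)}
start = [0.0] * len(STATES)
start[index["S4"]] = 1.0
dist = transient(generator(STATES, RATES), start, t=1.0)
for name in ("S4", "S7", "S8", "F2"):
    print(name, round(dist[index[name]], 4))
```

With these placeholder rates, the probability mass leaving S4 splits among S8, S7, and F2 in proportion to the three outgoing rates, which is the kind of propagation probability the instruction-level model is meant to expose.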