4.5. Single event effects (SEE)
4.5.2. System level response
4.5.2.3. Single event functional interrupt (SEFI)
SEFIs represent the most disruptive form of non-destructive SEE. Although this type of anomaly was previously predicted for space environments (Koga et al., 1985), the term single event functional interrupt (SEFI) was first mentioned
1996). SEFI is defined as any non-destructive failure mode that leads to the malfunction (or interruption of normal operation) of part or all of the device (Bougerol et al., 2008). This definition contrasts with that of certain authors, who define SEFI as the cause of a higher error rate than expected due to uniformly distributed upsets (Crain et al., 1999; LaBel et al., 1996).
The causes and effects of SEFIs vary with the type of component and the technology used. In general, a SEFI is linked to an upset (SET or SEU) in a control area that configures a specific function, and it leads to the loss of that function. In contrast to SEUs and SETs, which may or may not affect the operation of the device, every type of SEFI leads to a direct malfunction. Figure 4-12 shows an SET affecting combinational logic that is not blocked by the logical and electrical masking mechanisms (as in Figure 4-13) and that propagates to a register in a control area within the latch window (as in Figure 4-14). If the affected register is being used by a vital part of the system software, a SEFI could take place.
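The latch-window condition described above can be sketched as a simple timing check. This is a minimal illustration, not a model taken from the cited figures: the function name, the parameters and the rule that the pulse must span the whole setup-and-hold window are all assumptions made for the example, and times are in arbitrary integer units (for instance picoseconds).

```python
def set_is_latched(pulse_start, pulse_width, clock_edge, setup, hold):
    """Return True if a transient pulse covers the whole latching window.

    Assumption: the register captures the transient only if the pulse is
    present from (clock_edge - setup) to (clock_edge + hold).
    """
    window_start = clock_edge - setup
    window_end = clock_edge + hold
    pulse_end = pulse_start + pulse_width
    return pulse_start <= window_start and pulse_end >= window_end


# A 200-unit pulse arriving just before the clock edge is captured ...
captured = set_is_latched(900, 200, clock_edge=1000, setup=50, hold=30)
# ... while the same pulse arriving mid-cycle is temporally masked.
masked = set_is_latched(400, 200, clock_edge=1000, setup=50, hold=30)
```

Under these assumptions, only the first transient reaches the control register, so only it could go on to cause a SEFI.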
Table 4-5. Classification of SEFI
Name | Also called | Typical effect | Recovery procedure | Technology affected | Examples
Logic SEFI | Address error, recoverable burst error, temporary block error | Reading/writing of the wrong row or column; 512 to 8k addresses in error | Rewriting of the right value | Complex memories such as SDRAM | Fuse latch upsets (SEFLUs)
Soft SEFI | Resettable SEFI | Functionality loss of up to a full memory bank | Refresh cycles | FPGA, microprocessors, complex memories | Stuck block errors
Hard SEFI | Permanent SEFI, Reboot SEFI | Complete loss of functionality | Complete power cycle of the device | FPGA, microprocessors, complex memories | Events that induce data and functionality loss that cannot be recovered
As microcircuits become more complex, they also become more susceptible to SEFIs. Affected devices include SDRAMs with complex internal architecture, such as state machines (Harboe-Sorensen et al., 2007), FLASH memories (Irom and Nguyen, 2007; Nguyen et al., 1999; Oldham et al., 2008), FPGAs (Czajkowski et al., 2006) and microprocessors (Czajkowski et al., 2005). Depending on cause, consequences and recovery procedure, SEFIs can be classified as logic, soft or hard (see Table 4-5).
Logic SEFIs (Bougerol et al., 2008): in memories, these are also called “address errors”, “recoverable burst errors” (R. Ladbury et al., 2006) or “temporary block errors”, and they mainly comprise row and column errors. The upset of a row or column register leads to the reading or writing of the wrong row/column. This type of SEFI typically causes between X and 8k addresses in error, where X is the number of addresses per row/column (Bougerol et al., 2008). Rewriting the correct values recovers functionality (Schagaev and Buhanova, 2001).
Examples of logic SEFIs are “fuse latch upsets”, also called SEFLUs (Bougerol et al., 2011, 2010), which lead to the wrong addressing of a whole row/column. Because manufacturers are experiencing an increasing number of defective cells, they add spare cells and subject the devices to reliability tests. If a cell is found defective during those tests, fuse latches are used to disable the corresponding row/column. The typical signature of a fuse latch upset is a number of addresses in error that is a multiple of X, where X is the number of addresses belonging to a column/row.
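The two signatures given above (between X and 8k addresses in error for a logic SEFI, and a multiple of X for a fuse latch upset) can be checked with a few lines of code. This is a minimal sketch: the function names are hypothetical, "8k" is assumed to mean 8192, and X = 512 is used only as an example value.

```python
def within_logic_sefi_range(error_count, addresses_per_line, upper=8 * 1024):
    """True if the number of addresses in error lies between X and 8k.

    addresses_per_line is X; upper=8192 is an assumed reading of "8k".
    """
    return addresses_per_line <= error_count <= upper


def looks_like_fuse_latch_upset(error_count, addresses_per_line):
    """True if the error count is a non-zero multiple of X."""
    return (error_count >= addresses_per_line
            and error_count % addresses_per_line == 0)


X = 512  # example number of addresses per row/column
two_full_lines = looks_like_fuse_latch_upset(1024, X)  # 2 * X
partial_burst = looks_like_fuse_latch_upset(700, X)    # not a multiple of X
```

A matching count is only a hint, since other error mechanisms can produce the same number of failing addresses by coincidence.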
Soft SEFIs, also called “Resettable SEFIs” (Bougerol et al., 2008; Lawrence, 2007), are due to upsets in the device configuration area and usually cause the loss of functionality of several thousand addresses, up to a full memory bank. Reconfiguring the device with a mode register set command can recover the functionality (but not the data). Examples are “block SEFIs”, also called “stuck block errors”, observed in the IBM Luna-ES rev C during heavy ion testing (“NASA/GSFC Landsat-7 Project Office, Private
specific value. Since simple writing was not sufficient, device refresh cycles were used to clear the problem. SEUs in selected areas of an FPGA, such as the JTAG bit serial configuration port, can lead to an inability to reconfigure the device.
Hard SEFIs (Bougerol et al., 2010; Harboe-Sorensen et al., 2007), also called “Reboot SEFIs” (Bougerol et al., 2008), “permanent SEFIs” (Slayman, 2005), “non resettable errors” (Lawrence, 2007, p. 512) or “persistent non recoverable errors” (R. Ladbury et al., 2006), can be induced by different phenomena and lead to the complete loss of memory functionality. Possible causes of this type of catastrophic SEFI are upsets in the internal state machine or counter registers, or the activation of special modes. One example is an SEU in one of the power-on reset registers, which can lead to the removal of the entire configuration area. A complete power cycle of the device is required to recover.
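Because the three classes differ mainly in the recovery procedure they require, an observed SEFI can be classified by the mildest step that restores operation. The sketch below illustrates that escalation. The step names and the `device_ok_after` callback are hypothetical placeholders for device-specific actions and checks, not part of any real test framework.

```python
from enum import Enum


class SefiClass(Enum):
    LOGIC = "logic"
    SOFT = "soft"
    HARD = "hard"


# Recovery steps ordered from least to most intrusive (names are illustrative).
RECOVERY_LADDER = [
    (SefiClass.LOGIC, "rewrite"),     # rewrite the correct values
    (SefiClass.SOFT, "reconfigure"),  # mode register set / refresh cycles
    (SefiClass.HARD, "power_cycle"),  # complete power cycle of the device
]


def classify_by_recovery(device_ok_after):
    """Apply each step in order and return the first (class, step) that works.

    device_ok_after(step) is a caller-supplied function that performs the
    step and returns True if the device operates normally afterwards.
    Returns (None, None) if no step restores operation.
    """
    for sefi_class, step in RECOVERY_LADDER:
        if device_ok_after(step):
            return sefi_class, step
    return None, None


# Example: a device that only recovers after reconfiguration.
result = classify_by_recovery(lambda step: step == "reconfigure")
```

A result of `(None, None)` would suggest a failure outside the three classes, which by definition are non-destructive.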
Fortunately, the probability of SEFI is low compared to other types of SEEs (Slayman, 2005). The reasons for that are:
The ratio of the periphery logic area to the memory array area is very low;
The critical charge for logic gates is usually higher than for SRAM cells;
Most of the periphery logic is combinational, and therefore less susceptible to upsets due to the three inherent masking mechanisms.
SEFIs can also be classified as high current SEFIs if they involve a certain increase in current (Koga et al., 2001a, 2001b).
In addition to causing SEFIs in complex memories, energetic particles can also strike other circuits, such as the error detection and correction mechanisms, and so affect the functioning of the whole circuit. In FPGAs, SEFIs can cause the device to stop functioning normally and therefore require a power reset in order to resume normal operation. In microprocessors, SEFIs can induce upsets in the program counter, illegal branching and jumps to undefined states.
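The effect of a program counter upset can be illustrated with a single bit flip. This is a minimal sketch under stated assumptions: the addresses, the code region bounds and the helper names are invented for the example and do not describe any particular processor.

```python
def flip_bit(value, bit):
    """Return value with the given bit position inverted (a modelled SEU)."""
    return value ^ (1 << bit)


def is_illegal_jump(pc, code_start, code_end):
    """True if the program counter points outside the valid code region."""
    return not (code_start <= pc < code_end)


CODE_START, CODE_END = 0x1000, 0x2000  # assumed valid code region
pc = 0x1004                            # assumed current program counter

# An upset in a high-order bit sends execution outside the code region.
pc_high = flip_bit(pc, 20)
jumps_outside = is_illegal_jump(pc_high, CODE_START, CODE_END)

# An upset in a low-order bit stays inside the region but at the wrong
# instruction, which corresponds to an unintended branch.
pc_low = flip_bit(pc, 2)
stays_inside = not is_illegal_jump(pc_low, CODE_START, CODE_END)
```

In this sketch the first case corresponds to a jump to an undefined state and the second to illegal branching within otherwise valid code.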