Bitstream Based Fault Injection Techniques

3.6 Evaluation of Hardening Techniques

3.6.2 Bitstream Based Fault Injection Techniques

Bitstream based fault injection techniques, also known as fault emulation techniques, are an inexpensive and widely used tool [73, 85, 87, 198, 234, 239, 305–307] for characterizing specific design implementations. Because faults can be injected continuously over periods ranging from hours to months, they make it possible to collect large amounts of data, and they are valuable for studying the behaviour of the DUT in the presence of bit errors. Although less realistic than radiation tests, the results obtained with these techniques are highly accurate. For this reason, fault emulation is commonly used as a complementary tool to physical injection.

They mainly consist of reconfiguring the DUT with a corrupted configuration bitstream and checking its behaviour in order to evaluate its reliability. Figure 3.21 shows the basic flow of bitstream based fault injection.

Figure 3.21: Basic flow of bitstream based fault injection.

First, the corrupted bitstream has to be created. The usual way is to flip the desired bits of the configuration bitstream. When determining the critical bits of a particular implementation, an alternative is to generate as many corrupted bitstreams as there are configuration bits, in order to check the effect of each configuration bit. Another way is to choose the positions of the corrupted bits at random, as in [307]. Considering the large number of bits in a configuration bitstream, these alternatives require a lot of time and large amounts of storage. Since reconfiguration is a time-consuming process, the size of the bitstream to be loaded directly affects the duration of the fault injection process. For this reason, instead of loading the entire bitstream, an alternative is to generate partial bitstreams and use the dynamic partial reconfiguration capability of SRAM-based FPGAs [308, 309]. This practice reduces both the injection time and the memory needed to store the different corrupted bitstreams.
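The bit-flipping step and the two target selection strategies described above (exhaustive and random) can be sketched as follows. This is a minimal illustration, not the flow of any cited tool: it treats the bitstream as a plain byte buffer, assumes MSB-first bit numbering, and ignores headers, configuration commands and CRC words that a real bitstream contains. The names `flip_bit`, `exhaustive_targets` and `random_targets` are hypothetical.

```python
import random
from typing import Iterator, List


def flip_bit(bitstream: bytes, bit_index: int) -> bytes:
    """Return a copy of `bitstream` with exactly one bit inverted."""
    if not 0 <= bit_index < len(bitstream) * 8:
        raise IndexError("bit index outside the bitstream")
    corrupted = bytearray(bitstream)
    byte_index, bit_in_byte = divmod(bit_index, 8)
    # Assumption: bit 0 is the most significant bit of byte 0.
    corrupted[byte_index] ^= 0x80 >> bit_in_byte
    return bytes(corrupted)


def exhaustive_targets(num_bits: int) -> Iterator[int]:
    """One injection per configuration bit (critical-bit search)."""
    return iter(range(num_bits))


def random_targets(num_bits: int, count: int, seed: int = 0) -> List[int]:
    """`count` distinct bit positions chosen at random, reproducible via `seed`."""
    rng = random.Random(seed)
    return rng.sample(range(num_bits), count)


# Toy two-byte "bitstream" standing in for real configuration data.
golden = bytes([0b10101010, 0b11110000])
corrupted = flip_bit(golden, 9)  # inverts the second bit of the second byte
```

Generating the corrupted copy on the fly from a list of target positions, instead of storing one file per fault, is one way to limit the storage problem mentioned above; whether that is feasible depends on the injection setup.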

If the hardening approach focuses on the memory elements, it is worth making a special effort to inject errors into the stored data. This strategy requires studying the bitstream structure. In the case of Xilinx devices, the .ll location file is a very helpful resource. However, it requires a certain amount of processing effort.
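As an illustration of the processing effort involved, the sketch below extracts the bit positions of selected storage elements from a .ll file. It assumes that each relevant line has the form `Bit <offset> <frame address> <frame offset> <key=value ...>`; this layout should be checked against the file generated for the actual device. The sample content, the net names and the `parse_ll` helper are hypothetical.

```python
import re
from typing import Dict, List, NamedTuple


class LatchLocation(NamedTuple):
    bit_offset: int       # position of the bit inside the bitstream
    frame_address: int    # configuration frame that holds the bit
    frame_offset: int     # position of the bit inside that frame
    info: Dict[str, str]  # remaining key=value fields (Block, Latch, Net, ...)


# Assumed line layout: "Bit <offset> <frame address> <frame offset> <info>"
_LINE = re.compile(r"^Bit\s+(\d+)\s+(0x[0-9a-fA-F]+)\s+(\d+)\s+(.*)$")


def parse_ll(text: str) -> List[LatchLocation]:
    """Parse the 'Bit' lines of a .ll file; any other line is skipped."""
    entries = []
    for line in text.splitlines():
        match = _LINE.match(line.strip())
        if match is None:
            continue
        offset, frame, frame_offset, rest = match.groups()
        info = dict(f.split("=", 1) for f in rest.split() if "=" in f)
        entries.append(
            LatchLocation(int(offset), int(frame, 16), int(frame_offset), info)
        )
    return entries


# Hypothetical excerpt, for illustration only.
SAMPLE = """\
; comment line, ignored by the parser
Bit  3274335 0x0040009f    927 Block=SLICE_X0Y49 Latch=AQ Net=counter<0>
Bit  3274399 0x0040009f    991 Block=SLICE_X0Y49 Latch=BQ Net=counter<1>
"""

entries = parse_ll(SAMPLE)
# Bit positions of every flip-flop whose net name starts with "counter".
targets = [e.bit_offset for e in entries
           if e.info.get("Net", "").startswith("counter")]
```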

The second step in the basic fault injection flow is to reprogram the FPGA with the corrupted file. Since this can be done through different configuration interfaces, this aspect directly involves the experimental setup. Figure 3.22(a) shows the least expensive strategy, which is to use a single FPGA device as both DUT and fault injector. Although this approach can be practical for testing particular portions of a design, it requires a deep study of the implementation and its bitstream, and it does not allow a complete fault injection test. This is because the fault injection logic has to be implemented in the DUT itself. Hence, a faulty bitstream could also affect the fault injection logic, and even the test logic. A widespread practice, as Figure 3.22(b) depicts, is to use an additional external board to manage the reconfiguration of the DUT [310]. This scheme avoids the possibility of corrupting the fault injection logic itself. Nevertheless, it increases the cost because additional devices are needed. An in-between solution is to use SoC devices, as in [165], where a Zynq device was used. As Figure 3.22(c) shows, in this case the Processing System manages the creation of the corrupted bitstream and downloads it through the PCAP interface to configure the programmable logic. This approach avoids both the need for additional hardware and the risk of corrupting the fault injection controller.

After reconfiguring the FPGA with the corrupted bitstream, the functionality has to be tested by running the test application and comparing the results obtained with reference values. There are several ways of obtaining these reference results. The simplest one is to pre-calculate and store the responses of the application under test for specific inputs and conditions. In this way, after running the application, the results can be compared with the pre-calculated responses. Another alternative is to use an additional implementation of the target application and run both the DUT and the replica in parallel, checking the outputs at runtime.
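The two ways of obtaining reference results can be contrasted with a small sketch. The functions `reference_design` and `faulty_design` are hypothetical software stand-ins for the fault-free application and for a DUT whose least significant output bit is stuck at 0; in a real setup these would be accesses to the hardware.

```python
from typing import Callable, Dict, Iterable, List


def check_against_table(dut: Callable[[int], int],
                        golden: Dict[int, int]) -> List[int]:
    """Compare DUT responses with pre-calculated ones; return failing inputs."""
    return [x for x, expected in golden.items() if dut(x) != expected]


def check_against_replica(dut: Callable[[int], int],
                          replica: Callable[[int], int],
                          inputs: Iterable[int]) -> List[int]:
    """Run DUT and a fault-free replica side by side; return failing inputs."""
    return [x for x in inputs if dut(x) != replica(x)]


def reference_design(x: int) -> int:
    # Hypothetical fault-free behaviour of the application under test.
    return (3 * x + 1) & 0xFF


def faulty_design(x: int) -> int:
    # Hypothetical effect of an injected fault: output bit 0 stuck at 0.
    return reference_design(x) & 0xFE


inputs = range(4)
# Alternative 1: responses computed once and stored before the campaign.
golden_table = {x: reference_design(x) for x in inputs}
table_mismatches = check_against_table(faulty_design, golden_table)
# Alternative 2: replica evaluated in parallel with the DUT.
replica_mismatches = check_against_replica(faulty_design, reference_design, inputs)
```

The stored table costs memory but needs no extra logic during the test, whereas the replica needs a second implementation but no stored responses.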

A common practice to increase the efficiency of functionality tests is to use test vectors. This implies that each injected fault is tested with a vector of different inputs. Bearing in mind that different inputs can exercise different bits, this strategy widens the range of errors that can be observed. After performing the actual test, the results obtained can be saved to a file before the next test is run. In this way, all the results obtained can be analysed in future fault tolerance evaluations.
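A minimal sketch of such a campaign loop is given below: each fault is exercised with every test vector and one result row per fault is written to a CSV log. The DUT is replaced by a hypothetical software model (`run_dut`) in which the configuration is a 4-bit mask applied to the input, so that a fault is only visible for inputs that use the corrupted bit. All names and values are illustrative assumptions.

```python
import csv
import io
from typing import Callable, Iterable, List, TextIO


def run_campaign(fault_bits: Iterable[int],
                 test_vectors: List[int],
                 run_dut: Callable[[int, int], int],
                 golden: Callable[[int], int],
                 log: TextIO) -> None:
    """Test every fault with every vector and log one row per fault."""
    writer = csv.writer(log)
    writer.writerow(["fault_bit", "failing_inputs", "outcome"])
    for bit in fault_bits:
        failing = [v for v in test_vectors if run_dut(bit, v) != golden(v)]
        outcome = "failure" if failing else "no_effect"
        writer.writerow([bit, " ".join(str(v) for v in failing), outcome])


MASK = 0x0F  # hypothetical fault-free "configuration"


def golden(value: int) -> int:
    return value & MASK


def run_dut(fault_bit: int, value: int) -> int:
    # Hypothetical DUT model: the injected fault flips one bit of the mask.
    return value & (MASK ^ (1 << fault_bit))


log = io.StringIO()  # a real campaign would write to a file instead
run_campaign([0, 1, 4], [0x00, 0x01, 0x10], run_dut, golden, log)
```

With the single vector 0x00 none of the three faults would be observed, which illustrates why each fault is tested with several inputs.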

Figure 3.22: Different bitstream based fault injection setups. (a) Autonomous internal fault injection scheme. (b) External fault injection scheme (labels: configuration bitstream, external board, configuration interface, DUT board (FPGA)). (c) SoC fault injection scheme.

Several fault injection tools have been presented in the literature. Some of the most representative ones are listed below:

• Xilinx provides the so-called Soft Error Mitigation (SEM) Controller [186], which can be used to inject, detect and correct errors in the configuration memory of 7 series devices. Nevertheless, it is only suitable for the configuration memory and not for BRAMs or distributed memories. Regarding BRAMs, Xilinx offers a fault injection mechanism for BRAMs with its CORE Generator. However, due to its limitations, this mechanism is not practical for many applications.

• The Fault Injection (FI) tool presented in [311] is a bitstream based tool for Virtex FPGAs that addresses only configuration memory cells and user registers. This tool is especially designed to meet the requirements of applications in which the FPGA undergoes frequent reconfigurations. It modifies the configuration bitstream while it is being loaded into the device, without using standard synthesis tools, which makes it independent of the system used. Nevertheless, since this tool cannot access the built-in FSM state transitions, additional fault injection techniques are required to test them.

• The XRTC Virtex-5 Fault Injector is another fault injection tool for Virtex devices, presented in [312]. This FPGA fault injection system for testing digital FPGA circuits has been designed in conjunction with the Xilinx Radiation Test Consortium. Its main goal is to achieve high customization and full bitstream coverage at a high fault injection rate.

• The FPGA-based Fault Injection TOol (FITO) proposed in [313] makes it possible to inject faults emulating SEUs and SETs at the RTL level of an FPGA design by adding extra ports and connections to the flip-flops.

• The FLIPPER tool presented in [314], as its name suggests, is able to provoke bit-flips within the configuration memory using the partial reconfiguration capability of SRAM-based FPGAs.

• Fault injection Using SEmulation (FUSE) [315] is a tool built on the concept of semulation (a combination of simulation and emulation). It makes it possible to take advantage of the benefits of both techniques: the higher fault injection speed of emulation and the flexibility and observability of HDL simulation.

• The Shadow Components-based Fault Injection Technique (SCFIT) [93] tool is also based on semulation. It uses TCL scripts to access the FPGA through the JTAG configuration interface.

• Flexible on-chip fault Injector for run-time Dependability validation with target specific COmmand language (FIDYCO) [316] is a tool that combines hardware and software schemes. While the hardware includes the target FPGA implementation, the software is located in an external computer. Due to the flexibility provided by this tool, the designer is able to test a variety of components. It mainly consists in moving the fault injector towards the target and, after that, translating the target to the FPGA.

• Direct Fault Injection (DFI) [96] focuses on injecting errors into soft-core processor implementations. This approach is a combination of multiple fault injection methods such as FITO, FUSE, etc.

• NETlist Fault Injection (NETFI) [317] is a tool that makes it possible to inject faults in designs written in any HDL (Verilog, VHDL, etc.). The main idea of this approach is to modify the built-in FPGA resources that are to be used by the netlist after the synthesis process.

• The Fault-Injection Fault Analysis tool (FIFA) [318] allows faults to be injected at RTL level. It implements two versions of the DUT in the FPGA device.

• The platform presented in [319], called FT-UNSHADES2, focuses on carrying out the fault injection with a hardware assistant that can accelerate the analysis process. It uses a mother board connected to two daughter boards. It has different operating modes to deal with ASICs and FPGAs. The FPGA mode injects the fault through the bitstream. However, it also has an additional Beam Testing mode to be used with the system placed in an ion beam. This mode allows one daughter board to be exposed to the ion beam while the other is kept safe, so that the two can be compared, acting as a coincidence detector.

• Advanced System for the TEst under Radiation of Integrated Circuits and Systems (ASTERICS) is a platform used in [91]. This upgraded version of the THESIC+ platform is built using two FPGAs. While the first one manages the communication between a computer and the ASTERICS board using a hard processor, the second contains the DUT with the user design to be tested, the injection module and the memory controller. A watchdog implemented in the first board checks for possible errors that generate a loss in the sequence.

• In [310], a low hardware overhead injection approach for FPGA designs, which avoids the need for special injection boards, was presented. This work offers two fault injection approaches: an external SEU flow, which is a fault injection approach managed by an external device, and a single bit error test flow, which uses internal reconfiguration to obtain a high injection performance.

• The work in [320] proposes a high-speed fault injection system along with a methodology able to test the sensitivity of a design’s bitstream to SEUs. This system is especially designed for soft-core processors, and it can also be used for radiation testing purposes.
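Several of the tools listed above, such as FITO, inject faults at RTL level by adding extra ports to the flip-flops rather than by corrupting the bitstream. The idea can be illustrated with a behavioural model: a flip-flop with an additional `inject` input that inverts the value being stored, placed in a three-stage shift register. This is a simplified sketch of the general concept, not the HDL used by any of the cited tools; all names are hypothetical.

```python
from typing import List, Optional, Tuple


class InjectableFlipFlop:
    """D flip-flop model with an extra port that emulates an SEU."""

    def __init__(self) -> None:
        self.q = 0

    def clock(self, d: int, inject: bool = False) -> int:
        # With inject asserted, the stored value is the inverse of d.
        self.q = (d & 1) ^ int(inject)
        return self.q


def simulate(data: List[int], n_stages: int = 3,
             inject: Optional[Tuple[int, int]] = None) -> List[List[int]]:
    """Shift `data` through the register; `inject` is (cycle, stage) or None."""
    stages = [InjectableFlipFlop() for _ in range(n_stages)]
    trace = []
    for cycle, bit in enumerate(data):
        previous = [ff.q for ff in stages]  # values before the clock edge
        inputs = [bit] + previous[:-1]      # each stage samples its predecessor
        for index, (ff, d) in enumerate(zip(stages, inputs)):
            ff.clock(d, inject=(inject == (cycle, index)))
        trace.append([ff.q for ff in stages])
    return trace


golden_trace = simulate([1, 0, 0])
# SEU emulated in stage 1 during the second clock cycle (cycle index 1).
faulty_trace = simulate([1, 0, 0], inject=(1, 1))
```

Comparing the two traces shows the upset erasing the bit that was travelling through the register, so the error is still visible one cycle after the injection.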

One relevant drawback of these techniques is that they are not able to inject faults across all the sequential elements of the design, especially when the design uses proprietary IP cores, where the physical mapping is unspecified. Another drawback is that these techniques tend to overemphasize cache and register errors [295]. Due to the inherent nature of bitstreams, these techniques are highly technology dependent. Although the flows and concepts can be used in different devices, the implementation of each approach has to be adapted to the particular specifications of the DUT, which may require investigating its design. Although the use of these tools is very helpful, investigating new fault injection approaches is out of the scope of this work.
