
4.6 Lockstep Approaches

4.6.4 Lockstep Approaches Overview

Three lockstep approaches have been proposed to deal with the issues that arise when checkpointing and rollback techniques are used, each following a distinct strategy. Table 4.6 summarizes their most relevant characteristics.

Table 4.6: Lockstep approaches overview.

Characteristic          HW Fast   Low Overhead   Autonomous
HW overhead             high      low            moderate
HW modifications        high      no             moderate
Time demand             low       high           low
Performance penalty     high      low            moderate
Platform independence   yes       relative       relative
PS requirement          no        yes            no

The HW Fast Lockstep approach achieves a high checkpoint frequency and fast rollback recovery while remaining platform independent. It also avoids the need for any bitstream processing. However, since it demands considerable hardware adaptations to add the different elements, it increases the hardware overhead and reduces the maximum achievable operating frequency.

The Bitstream Based Low Overhead Lockstep approach uses the BBBA developed in this work as the basis for checkpointing and rollback and does not introduce any resource overhead. As a result, it does not affect the maximum operating frequency of the design. Another relevant advantage of this approach is that it does not require any adaptation of the target processor. Nevertheless, the price to pay is a low checkpointing frequency and a relatively slow rollback recovery, due to the time demanded by the bitstream readback and write processes. The BBBA also requires the Processing System and an external memory to store the context. Although in Zynq-based designs this could be a negligible issue thanks to the dual ARM processor, adopting this approach depends on technical aspects of the hardware platform used. Hence, it can be considered a relatively platform-independent solution.

The Bitstream Based Autonomous Lockstep approach combines several aspects of the two previous methods and thus represents a trade-off solution. It uses the Approach to Manage Data of Registers with the Bitstream to save and load the context of the registers, and it protects the data and program memories with an ECC strategy. Because of this, it also avoids the need to process the bitstream.

This approach therefore provides autonomous, fast checkpoint and recovery processes with a moderate hardware overhead. Its drawbacks are related to the implementation of the ECC logic, which increases the hardware overhead and reduces the maximum frequency. Furthermore, because it relies on several Xilinx primitives, it is a platform-dependent technique. However, it could be possible to adapt the main ideas to other vendors' technology.

The proposed approaches provide different solutions, depending on the requirements of the application to be designed. When availability is a crucial aspect, the HW Fast Lockstep approach could be more advisable. On the other hand, in applications that do not demand real-time response, the Bitstream Based Low Overhead Lockstep approach could be a good candidate for saving resources and power. The Bitstream Based Autonomous Lockstep approach is a halfway solution that provides fast context saving and recovery operations but requires small adaptations to the designs to be hardened.
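As an illustration of how Table 4.6 can guide the choice, the following minimal sketch encodes the table as a dictionary and filters the approaches by required characteristics. The names `TABLE_4_6` and `candidates`, and the key spellings, are hypothetical helpers introduced here; only the table values come from the text.

```python
# Values transcribed from Table 4.6; key names are illustrative assumptions.
TABLE_4_6 = {
    "HW Fast": {
        "hw_overhead": "high", "hw_modifications": "high",
        "time_demand": "low", "performance_penalty": "high",
        "platform_independence": "yes", "ps_requirement": "no",
    },
    "Low Overhead": {
        "hw_overhead": "low", "hw_modifications": "no",
        "time_demand": "high", "performance_penalty": "low",
        "platform_independence": "relative", "ps_requirement": "yes",
    },
    "Autonomous": {
        "hw_overhead": "moderate", "hw_modifications": "moderate",
        "time_demand": "low", "performance_penalty": "moderate",
        "platform_independence": "relative", "ps_requirement": "no",
    },
}


def candidates(**required):
    """Return the approaches whose table entries match every requirement."""
    return [
        name
        for name, row in TABLE_4_6.items()
        if all(row[key] == value for key, value in required.items())
    ]
```

For instance, `candidates(ps_requirement="no")` keeps only the approaches that work without the Processing System, while `candidates(hw_modifications="no")` keeps only those that leave the target processor untouched.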

4.7 Proposed Synchronization Approaches for Repaired Soft-Core Processors in Hardware Redundancy Based Schemes

This section addresses the scarcity of investigations around synchronization in hardware redundancy based systems by proposing, implementing and evaluating five different synchronization methodologies. Since the triple redundancy method is one of the most established redundancy levels, the approaches in this section will be based on TMR (Triple Modular Redundancy) schemes. Nevertheless, the majority of concepts presented can be applied to other redundancy levels.

The five approaches span a broad spectrum of alternatives, from minimal hardware overhead to completely hardware-based synchronization. This allows balancing the trade-off between implementation cost and synchronization speed, depending on the requirements of the target application. The performance of the proposed techniques is verified and compared on the PICDiY processor. However, all the approaches are of a general nature and can easily be migrated to other processor architectures. Furthermore, the presented methods are not restricted to a set-up implementing TMR and DPR (Dynamic Partial Reconfiguration). They are applicable to any TMR-protected processor system to recover a processor element that was forced out of sync by an SEE.

When implementing a combination of TMR and DPR for the realization of fault-tolerant systems, a pure reconfiguration of a faulty module is not sufficient if the reconfigured module holds an internal state: that state must also be synchronized with the healthy modules. This synchronization is especially critical for processors, because their state is composed of a number of different registers. For the PICDiY processor the following elements need to be synchronized: the program counter, the stack pointer and the stack content, the special function registers (FSR, STATUS, INTCON, PCL and PCLATH) and the data memory. These elements will be referred to as synchronization objects in the remainder of this work.
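The synchronization objects listed above can be pictured as a single state record that has to be copied from a healthy instance to the repaired one. The sketch below is a software model only, not the hardware mechanism used in this work; the class name `PicdiyState`, the function `synchronize`, the stack depth and the data memory size are all assumptions made for illustration.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class PicdiyState:
    """Software model of the PICDiY synchronization objects (sizes assumed)."""
    program_counter: int = 0
    stack_pointer: int = 0
    # Assumed stack depth of 8 entries, for illustration only.
    stack: list = field(default_factory=lambda: [0] * 8)
    # Special function registers named in the text.
    sfr: dict = field(default_factory=lambda: {
        "FSR": 0, "STATUS": 0, "INTCON": 0, "PCL": 0, "PCLATH": 0,
    })
    # Assumed data memory size of 64 bytes, for illustration only.
    data_memory: list = field(default_factory=lambda: [0] * 64)


def synchronize(healthy: PicdiyState) -> PicdiyState:
    """Return an independent copy of the state of a healthy instance."""
    return copy.deepcopy(healthy)
```

A deep copy is used so that the repaired instance does not share mutable containers with the healthy one, mirroring the fact that each processor instance owns its own registers and memory.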

In general, finding an adequate synchronization strategy for a given application implies balancing a trade-off. On the one hand, adding specialized hardware for the processor synchronization enables a very fast synchronization process. On the other hand, implementing the synchronization with little or no extra hardware, combined with software, results in less implementation overhead and a lower impact on the critical path of the design.

The structure of the synchronization method impacts the duration of two sub-steps of the whole recovery process. The first is the time required to copy the correct values of the different synchronization objects to a recently reconfigured processor instance in the coarse-grain TMR setup. This time will be called copy-time. The second is the time from the detection of an error by the voter until the point at which the system is ready to start the synchronization process. This second time is named wait-for-sync. Some approaches cannot begin the synchronization immediately, because they first need to finish ongoing calculations before CPU time can be spent on updating a reconfigured processor instance. The time from the detection of an error to the re-synchronization has implications for the overall system robustness, because during this period the TMR system operates with only two functional instances, making it vulnerable to consecutive SEUs.
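To make the role of the voter concrete, the following is a minimal sketch of a generic majority voter for three module outputs. It is a textbook TMR voter written for illustration, not the voter implemented in this work, and the function name `vote` is an assumption.

```python
def vote(outputs):
    """Majority-vote the outputs of three redundant modules.

    Returns (voted_value, faulty_index), where faulty_index is None when
    all three outputs agree. Raises ValueError when no two outputs agree.
    """
    a, b, c = outputs
    if a == b == c:
        return a, None
    if a == b:
        return a, 2  # third module disagrees
    if a == c:
        return a, 1  # second module disagrees
    if b == c:
        return b, 0  # first module disagrees
    raise ValueError("no majority: more than one module disagrees")
```

The `ValueError` branch corresponds to the situation the text warns about: once one instance is out of sync, a second upset in another instance leaves the voter without a majority.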

The whole SEU recovery process is illustrated in Figure 4.29, where the time requirement is defined by the sum of four components: the time needed to detect the error, the wait-for-sync time, the time consumed for repairing the SEU by partial reconfiguration and the copy-time. Whereas the time for partial reconfiguration is determined by the size of the reconfigured partition and the reconfiguration speed, the time to detection is not affected by the synchronization approach; it depends only on the application and is hence considered beyond the scope of this work.

Table 4.7: General synchronization objects and accessibility for PICDiY, PicoBlaze and MicroBlaze processors.

Synchronization object   PICDiY       PicoBlaze    MicroBlaze
CPU-flags                read/write   no access    read/write
Stack-pointer            no access    no access    read/write
Stack-content            no access    no access    read/write
Program counter          read/write   no access    read/write
CPU registers            read/write   read/write   read/write
Data memory              read/write   read/write   read/write

Figure 4.29: SEU recovery process and impact of synchronization times.
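The four-component sum behind Figure 4.29 can be written as a small timing model. This is a sketch under stated assumptions: the function names are hypothetical, and the reconfiguration time is modelled as partition size divided by a constant configuration throughput, which is a simplification not taken from the text.

```python
def reconfiguration_time(partition_bytes, throughput_bytes_per_s):
    """Assumed model: constant configuration throughput."""
    return partition_bytes / throughput_bytes_per_s


def recovery_time(t_detect, t_wait_for_sync, t_reconfig, t_copy):
    """Total SEU recovery time as the sum of its four components (seconds)."""
    return t_detect + t_wait_for_sync + t_reconfig + t_copy


# Hypothetical numbers, chosen only to show how the terms combine.
t_reconfig = reconfiguration_time(100_000, 400_000_000)
total = recovery_time(1e-6, 5e-6, t_reconfig, 20e-6)
```

Only `t_wait_for_sync` and `t_copy` change with the synchronization approach, so comparing approaches amounts to comparing those two arguments while the other two are held fixed.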

In the following, different synchronization approaches are presented using the PICDiY processor as an example. The synchronization objects of this specific architecture are summarized in Table 4.7. This table also contains the synchronization objects of the PicoBlaze, a close alternative, and of the MicroBlaze, a more powerful and complex processor architecture. Despite their simpler architecture, the PICDiY and the PicoBlaze are more demanding in terms of synchronization. For the MicroBlaze, all synchronization objects are accessible via software. The PICDiY and the PicoBlaze, however, have an inherent need for additional hardware when a complete synchronization is desired, because some elements are neither readable nor writable by software.
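The conclusion drawn from Table 4.7 can be checked mechanically. The sketch below transcribes the table into a dictionary and lists, for each processor, the synchronization objects that software cannot reach and that would therefore need additional hardware. The names `ACCESS` and `needs_extra_hardware` are hypothetical; the table values are those given in the text.

```python
# Accessibility of each synchronization object, transcribed from Table 4.7.
ACCESS = {
    "CPU-flags":       {"PICDiY": "read/write", "PicoBlaze": "no access",  "MicroBlaze": "read/write"},
    "Stack-pointer":   {"PICDiY": "no access",  "PicoBlaze": "no access",  "MicroBlaze": "read/write"},
    "Stack-content":   {"PICDiY": "no access",  "PicoBlaze": "no access",  "MicroBlaze": "read/write"},
    "Program counter": {"PICDiY": "read/write", "PicoBlaze": "no access",  "MicroBlaze": "read/write"},
    "CPU registers":   {"PICDiY": "read/write", "PicoBlaze": "read/write", "MicroBlaze": "read/write"},
    "Data memory":     {"PICDiY": "read/write", "PicoBlaze": "read/write", "MicroBlaze": "read/write"},
}


def needs_extra_hardware(processor):
    """List the objects that software cannot access on the given processor."""
    return [obj for obj, row in ACCESS.items() if row[processor] != "read/write"]
```

Calling `needs_extra_hardware("MicroBlaze")` yields an empty list, which matches the statement that all of its synchronization objects are accessible via software.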

It should be noted that this work does not consider the program memory as a synchronization object. This is because, when a program memory module is repaired by means of DPR, its content is also initialized by the bitstream. Nevertheless, since the program memory is implemented as a ROM, the BBBA-based approach is the only one among those proposed that could synchronize it. The only way for the other four approaches to synchronize this module would be to implement it as a writable memory block.

In document Contributions to the fault tolerance of soft-core processors implemented in SRAM-based FPGA Systems. (Page 175-179)