A new Architecture for recovering from MCUs in programmable memories operating in a radiation environment is presented. It combines the advantages of reconfiguration-based and redundancy-based techniques to obtain a hybrid system that is suitable for fault-tolerant requirements and optimized in terms of operating time and area.

This architecture merges different well-known techniques, such as:

• Single Error Correction techniques [5], [6].

• Interleaving Distance concept (for instance, in [35], [36]).

• Detection and correction decoupling concept (for instance, in [7], [26]), based on an independent parity check.

It also proposes new features, such as:

• The implementation of a permanent reference, not vulnerable to radiation, called Hardwired Seed Bits (HSB), which contains the essential information for error retrieval. The HSB consist of an extremely small number of bits from which the whole memory, including the EDAC functions, can be reconstructed under the multilevel approach of the proposed architecture.

• A system that can be physically implemented as a small code in static memory and is transparent to the logic layer. For instance, in an FPGA, it can be implemented at the SRAM level. The content of the HSB and the Checkword is generated during the development phase of the equipment and depends on the final code that the system protects.

• A suitable algorithm for fault-tolerant applications, included in the system to drive the operation of the architecture. It runs the operating sequence within the main program, so its operation is transparent to the user.

The proposed architecture reduces area compared with a TMR-EDAC approach, which means that a smaller FPGA can be selected for the system. In the example presented in this work, the proposed system runs up to 3 times faster than a classical Hamming code.
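The role of the interleaving distance mentioned above can be illustrated with a minimal sketch. The mapping below (physical column to logical word by modulo) and the function names are assumptions chosen for illustration only; they are not the physical layout used in this work. An MCU is modelled, as a simplification, as a run of physically adjacent cells in one row.

```python
def physical_to_logical(col: int, distance: int) -> tuple[int, int]:
    """Map a physical column to (word index, bit index).

    Hypothetical layout: with interleaving distance `distance`, column c
    holds bit c // distance of logical word c % distance, so physically
    adjacent cells belong to different words.
    """
    return col % distance, col // distance


def errors_per_word(first_col: int, mcu_width: int, distance: int) -> dict[int, int]:
    """Count how many bits of each logical word are flipped by one MCU.

    The MCU is modelled as `mcu_width` adjacent cells starting at
    `first_col` (a simplifying assumption).
    """
    counts: dict[int, int] = {}
    for col in range(first_col, first_col + mcu_width):
        word, _bit = physical_to_logical(col, distance)
        counts[word] = counts.get(word, 0) + 1
    return counts


# A 3-cell MCU with interleaving distance 4: each word sees at most one
# flipped bit, so a single-error-correcting code per word can repair it.
hits = errors_per_word(first_col=5, mcu_width=3, distance=4)
assert max(hits.values()) == 1

# The same MCU with distance 2 places two flips in one word,
# which exceeds single-error-correction capability.
hits_short = errors_per_word(first_col=5, mcu_width=3, distance=2)
assert max(hits_short.values()) == 2
```

Under this model, an interleaving distance at least as large as the MCU width turns one multiple upset into several independent single errors.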

7.1 SUMMARY OF ORIGINAL CONTRIBUTIONS

The proposed solution is reconfiguration-based: it aims to restore the original values as soon as possible.

The main advantage of this kind of linear code is its lightweight correction coding, because each checksum bit is calculated from parity computations over alternate data bits. This coding allows a limited number of errors to be detected and corrected, but only errors that occur in the data bits, not in the redundant bits themselves. The main contributions of the proposed Architecture are the following:

A Multilevel Architecture, which allows two or more levels of checksum to be concatenated, in which every level protects only the redundant checksum bits of the previous levels. This multilevel architecture scales to any memory size.

The information of the whole memory is highly compressed into a small set of bits, the HSB (Hardwired Seed Bits), which are immune to radiation. From them, the EDAC (Error Detection And Correction) codes included in the Architecture can completely restore the whole memory.

This Architecture significantly reduces the failure probability of memories exposed to radiation, because it recovers not only from Single Event Upsets (SEUs) but also from Multiple Cell Upsets (MCUs). This low failure probability opens the possibility of fulfilling fault-tolerant requirements, since it can meet safety requirements such as those established in the demanding standards of the aerospace industry to avoid catastrophic failures.

This Architecture significantly optimizes the resources involved. This is particularly important because solving MCUs normally increases the complexity of the system exponentially, in terms of both the devices involved and the operation time. The contribution in this case is the integration of different techniques, such as detection-correction decoupling and bit interleaving.

This Architecture allows fault-tolerant performance. Not only does it present a very low failure probability, but it also extends the protection to the EDAC functions themselves.

A procedure to determine the failure probability of the proposed Architecture has been presented. This procedure allows the Architecture to be configured so that it fulfills the safety requirements. However, further work has been identified, since an accurate probability estimation involves several different aspects; this has been proposed as future work. In the present work, a worst-case estimation was presented as a first approach, which was useful for assessing the experimental results.
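The parity-based coding and the multilevel idea summarized in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the coding used in this work: it uses a textbook Hamming single-error-correcting checksum, two levels only, and an 8-bit data word. The function names and the choice to treat the last level as the radiation-immune reference (playing the role of the HSB) are hypothetical.

```python
def data_positions(k: int) -> list[int]:
    """1-based Hamming positions that carry data (all non-powers of two)."""
    pos, p = [], 1
    while len(pos) < k:
        if p & (p - 1):  # true when p is not a power of two
            pos.append(p)
        p += 1
    return pos


def checksum(data: list[int]) -> list[int]:
    """Hamming check bits of `data`; bit j is the parity of the data bits
    whose position has bit j set (alternating groups of positions)."""
    pos = data_positions(len(data))
    syndrome = 0
    for p, bit in zip(pos, data):
        if bit:
            syndrome ^= p
    r = pos[-1].bit_length() if pos else 0
    return [(syndrome >> j) & 1 for j in range(r)]


def correct(data: list[int], check: list[int]) -> list[int]:
    """Return a repaired copy of `data`, assuming `check` is trusted and
    at most one bit of `data` is flipped."""
    pos = data_positions(len(data))
    recomputed = checksum(data)
    syndrome = sum((a ^ b) << j for j, (a, b) in enumerate(zip(recomputed, check)))
    fixed = list(data)
    if syndrome in pos:
        fixed[pos.index(syndrome)] ^= 1
    return fixed


# Level 1 protects the data; level 2 protects only the level-1 check bits.
data = [1, 0, 1, 1, 0, 0, 1, 0]
level1 = checksum(data)
level2 = checksum(level1)  # assumed immune to radiation in this sketch

# One upset in the data and one in the level-1 checksum.
bad_data = list(data)
bad_data[4] ^= 1
bad_level1 = list(level1)
bad_level1[1] ^= 1

# Recovery proceeds from the trusted top level downwards.
good_level1 = correct(bad_level1, level2)
good_data = correct(bad_data, good_level1)
assert good_level1 == level1
assert good_data == data
```

In this sketch each level is much shorter than the one it protects (8 data bits, 4 level-1 bits, 3 level-2 bits), which is why only the final, very small level needs to be stored in a permanent form.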

7.2 FUTURE WORK

This work proposes a step-ahead concept to optimize the performance of SRAM memories working in a radiation environment. It has provided knowledge about the complex nuclear environment that causes the radiation problem in airborne electronics, as well as about coding mechanisms for error protection.

Several open topics and new questions have arisen from the work performed in this Thesis. Future work will continue the development of the Framed HSB Architecture described herein. The future work packages are presented and briefly described below:

1. Implementation technologies:

• HSB implementation: A basic HSB implementation could be built with jumper pins (a pull-up/pull-down approach). However, this would not be practical for large memories, and it would be impossible for dynamic functions. In these cases, a programmable HSB capability is required. Manufacturers may be able to solve this issue, for example by inserting some FLASH Seed Bits (FSB) in the memory. A TMR-HSB can also be implemented within the memory, which loses the hardwired approach but is feasible given the very small number of bits involved.

• Space condition: Physical control of the bit locations is required to implement the interleaving distance. The FPGA has a complex physical structure composed basically of RAM blocks that are dynamically interconnected through routing matrices. Furthermore, each RAM block itself includes configurable elements such as Look-Up Tables (LUTs). The FPGA synthesis algorithm should be reprogrammed to implement the Framed HSB Architecture functions, taking into account not only the logic but also the physical bit distribution.

2. Development to explore the limits of the Framed HSB Architecture:

• Time condition: The fault-tolerant requirement implies reducing the failure rate below a threshold required by the System Safety Assessment, which is related to the DAL. The time condition means that the system operation has to be fast enough for double consecutive radiation events taking place in the same iteration cycle to be negligible. This condition is related not only to the clock frequency of the system but also to the fact that the whole memory is not scrubbed at once but frame by frame.

• Probability calculations: All real systems are susceptible to failure. Therefore, the safety failure probability thresholds and the calculations related to double consecutive radiation impacts in the Framed HSB Architecture require additional investigation. The Architecture may isolate consecutive radiation events into a set of single radiation events, even within the same cycle, if they affect different frames. An issue arises only when consecutive radiation events in the same scrubbing cycle affect bits in common frames. This fact improves the failure probability of the Architecture, although it was not taken into account in this work.

• Scrubbing algorithm: An MCU appears as a set of neighboring bits affected by a single neutron radiation event. The algorithm of the Framed HSB Architecture could therefore be improved with an intelligent scrubbing sequence in which both redundant SEC functionalities work simultaneously and focus the correction procedure on the neighboring frames. This algorithm could reduce the exposure time.

3. Detection and correction of double consecutive radiation impacts:

• Double consecutive radiation impacts: The probabilistic calculation performed in this Thesis is based on the Total Failure Rate (TFR). The TFR includes all possible cases of n-Multiple Uncorrectable Impacts, and the calculation reveals that the TFR is equal to the double consecutive radiation impact rate. Therefore, the only constraint on the failure rate under this Architecture is the double consecutive radiation event. This issue requires additional investigation, based on an algorithm that could complement the detection and correction of double consecutive radiation impacts.
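The frame-isolation effect noted under "Probability calculations" can be illustrated with a rough sketch. It is not the probability procedure of this Thesis. It assumes that radiation events arrive as a Poisson process, that two events land in independent, uniformly distributed frames, and that three or more events per cycle are negligible. All numeric values are hypothetical placeholders, not measurements.

```python
import math


def p_two_or_more(rate_per_s: float, cycle_s: float) -> float:
    """P(two or more events in one scrubbing cycle) for Poisson arrivals."""
    mu = rate_per_s * cycle_s
    # 1 - e^-mu - mu*e^-mu, written with expm1 to limit cancellation
    # when mu is very small.
    return -math.expm1(-mu) - mu * math.exp(-mu)


def p_same_frame(n_frames: int) -> float:
    """P(two independent, uniformly located events hit the same frame)."""
    return 1.0 / n_frames


# Hypothetical placeholder values (assumptions for illustration only).
RATE_PER_S = 1e-6  # radiation events per second over the whole memory
CYCLE_S = 1e-2     # duration of one full scrubbing cycle
N_FRAMES = 256     # number of frames scrubbed per cycle

# Worst case: any double event within a cycle counts as a failure.
worst_case = p_two_or_more(RATE_PER_S, CYCLE_S)

# Refinement: only double events that share a frame count as a failure.
refined = worst_case * p_same_frame(N_FRAMES)
assert 0.0 < refined < worst_case
```

Under these assumptions the refined estimate is lower than the worst case by a factor equal to the number of frames. A real MCU affects neighboring bits that may span several frames, so an accurate model would need the additional investigation described above.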