1.3 Thesis Contributions
1.3.2 Component-Level Analysis Phase
In this phase, the vulnerability analysis focuses on each individual component of the system design. Unlike the fault tree analysis phase, where the components were treated as black boxes, this phase elaborates and analyzes each component based on its internal architecture. The goal is to identify the sources of the component’s vulnerability and to study the most efficient ways to mitigate that vulnerability internally. This phase includes the following steps:
First, the component’s behavior description is obtained. For example, if the component to be analyzed is an Arithmetic Logic Unit (ALU), we must obtain information about its architecture, what operations it performs, what it is connected to, etc.
Next, a library of system operations is built. Much like classes in an object-oriented language, this library contains models of the operations (subcomponents) that serve as the building blocks for the component to be analyzed, such as logic gates, adders, multipliers, etc.
Based on the component’s behavior description, operations from the library of system operations are instantiated in order to construct the model of the component at system-level.
Once the model is ready, an exhaustive fault injection analysis is conducted in order to identify any weaknesses in the component.
The component, subjected to fault injections, is verified against a set of properties derived from the system specifications. This verification process is done through probabilistic model checking with PRISM. With this analysis, the impact of each fault injection scenario is obtained.
Next, with the results obtained through the probabilistic analysis, the limit of the component’s availability is estimated.
If the availability of the component does not meet the preestablished performance metrics, TMR mitigation is applied to the most critical subcomponents. In this case, the model is updated to reflect the changes, and an evaluation of the mitigation overhead is conducted. Then, the probabilistic analysis process is repeated.
If the availability of the component meets the preestablished performance metrics partially (i.e., the component passes the test but some of its subcomponents do not), fault mitigation must be applied to the critical components and the model must be reevaluated, restarting the fault injection process.
If all the availability metrics are met (component overall, and all the subcomponents), the phase ends and the results are reported.
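The iterative part of these steps can be sketched as follows. This is only an illustration under stated assumptions, not the thesis flow itself: the per-subcomponent availabilities would come from the PRISM analysis and are hard-coded here, the component is assumed to be a series composition of its subcomponents, the TMR voter is assumed ideal, and the names `tmr`, `mitigate`, and both targets are hypothetical.

```python
from math import prod


def tmr(a: float) -> float:
    """Availability of a triplicated unit with an ideal 2-of-3 majority voter."""
    return 3 * a**2 - 2 * a**3


def mitigate(avail: dict[str, float], component_target: float, sub_target: float):
    """Apply TMR to the weakest unprotected subcomponent until both targets are met.

    Assumption: the component works only if every subcomponent works,
    so its availability is the product of the subcomponent availabilities.
    """
    avail = dict(avail)            # work on a copy of the input
    protected: list[str] = []      # subcomponents that received TMR
    while True:
        component = prod(avail.values())
        failing = [name for name, a in avail.items() if a < sub_target]
        if component >= component_target and not failing:
            return avail, protected, component   # all metrics met: phase ends
        candidates = [name for name in avail if name not in protected]
        if not candidates:
            raise RuntimeError("targets not reachable with TMR alone")
        worst = min(candidates, key=avail.get)   # most critical subcomponent
        avail[worst] = tmr(avail[worst])         # update the model
        protected.append(worst)                  # then repeat the analysis


# Hypothetical numbers standing in for results of the probabilistic analysis.
initial = {"adder": 0.99, "multiplier": 0.90, "mux": 0.995}
final, protected, component = mitigate(initial, component_target=0.95, sub_target=0.95)
```

In this sketch the length of `protected` gives a crude proxy for the mitigation overhead, since each entry corresponds to one triplicated subcomponent plus a voter.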
Finally, an additional step takes place, in which the mitigations proposed by each of the phases are compared and the most efficient solution is chosen. The problems addressed by this phase of the methodology are encapsulated in questions Q1, Q2, Q3, Q5 and Q6, presented in Section 1.2. The advancements achieved by the proposed techniques are the following:
1. Investigate the impact of soft-errors at system-level using PMC: The proposed system-level approach consists of modeling the system to be tested with all its components and their expected logical behaviors. Then, the fault-injection points are identified and the fault propagation paths are obtained by counter-example generation with PMC. Subsequently, several vulnerability metrics, such as Mean Time To Failure and Mean Time To Recover, are evaluated probabilistically. These metrics are evaluated for each individual type of fault in the system. Furthermore, the analysis computes the contribution of each component of the system to a failure. This idea generated the following publications:
C4: Ammar, M., Bany Hamad, G., Ait Mohamed, O., Savaria, Y., Velazco, R. Comprehensive vulnerability analysis of systems exposed to SEUs via probabilistic model checking. In IEEE European Conference on Radiation and Its Effects on Components and Systems (RADECS 2016).
J2: Ammar, M., Bany Hamad, G., Ait Mohamed, O., Savaria, Y. System-Level Analysis of the Vulnerability of Processors Exposed to Single-Event Upsets via Probabilistic Model Checking. IEEE Transactions on Nuclear Science. 2017 Sep;64(9):2523-30.
B1: Ammar, M., Bany Hamad, G., Ait Mohamed, O., Savaria, Y. (2018). System-Level Modeling and Analysis of the Vulnerability of a Processor to Single Event Upsets (SEUs). In Velazco, R., McMorrow, D., Estela, J. Radiation Effects on Integrated Circuits and Systems for Space Applications (pp. 13-38), Springer, 2019. DOI: 978-3-030-04660-6_2.
2. Application-based analysis of the impact of soft-errors on a CPS using PMC: To provide a better estimation of CPS vulnerability, not only the hardware but also the software (application) must be considered in the analysis. Based on my component-level modeling approach for fault injection and generation of fault propagation paths, I have proposed a new analysis technique to perform PMC on an application execution trace. For each instruction, the propagation of SEUs is modeled as a Continuous-Time Markov Chain (CTMC), based on the hardware’s microarchitecture. From these models, a full estimation of the fault propagation probabilities and latency through each instruction is computed. Furthermore, this model allows the analysis of fault propagation probabilities through the entire program execution. This analysis resulted in the following publications:
C5: Bany Hamad, G., Ammar, M., Ait Mohamed, O., and Savaria, Y. System-Level Characterization, Modeling, and Probabilistic Formal Analysis of LEON3 Vulnerability to Transient Faults. In IEEE European Conference on Radiation and Its Effects on Components and Systems (RADECS 2018).
J3: Bany Hamad, G., Ammar, M., Ait Mohamed, O., and Savaria, Y. New Insights Into Soft-Faults Induced Cardiac Pacemakers Malfunctions Analyzed at System-Level Via Model Checking. IEEE Access. PP. 1-1. 10.1109/ACCESS.2018.2876318, 2018.
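To make the CTMC-based view of fault propagation concrete, the following minimal sketch computes two quantities of the kind described above for a toy chain: the probability that an injected SEU eventually reaches a failure state, and the mean time until the fault is resolved (either masked or turned into a failure). In the thesis these quantities are obtained with PRISM; here a small hand-written linear solver stands in for the model checker. The state names, the transition rates, and the function names `solve` and `analyze` are all hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def analyze(rates, target):
    """Return (probability of reaching `target`, mean time to absorption) per transient state.

    `rates` maps each transient state to its outgoing transition rates.
    States that appear only as destinations are treated as absorbing.
    """
    transient = list(rates)
    idx = {s: i for i, s in enumerate(transient)}
    n = len(transient)
    a = [[0.0] * n for _ in range(n)]
    b_prob = [0.0] * n
    b_time = [0.0] * n
    for s, out in rates.items():
        i = idx[s]
        exit_rate = sum(out.values())
        a[i][i] = 1.0
        b_time[i] = 1.0 / exit_rate          # mean sojourn time in state s
        for t, r in out.items():
            p = r / exit_rate                # jump probability s -> t
            if t in idx:
                a[i][idx[t]] -= p
            elif t == target:
                b_prob[i] += p
    prob = solve(a, b_prob)
    time = solve(a, b_time)
    return dict(zip(transient, prob)), dict(zip(transient, time))


# Hypothetical rates (events per time unit) for one instruction.
rates = {
    "latent": {"propagated": 2.0, "masked": 1.0},
    "propagated": {"failure": 1.0, "masked": 3.0},
}
prob, time = analyze(rates, target="failure")
```

With these illustrative rates, a fault starting in the `latent` state reaches `failure` with probability 1/6, and the mean time until it is either masked or causes a failure is 0.5 time units. Chaining such per-instruction models along an execution trace is one way to picture how propagation probabilities over a whole program could be assembled.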