IV. AHR MIPS Performance Evaluation
4.5 Error-Free HITL Simulation
4.5.4 Second Attempt Results
Having learned from the first attempt that implementing a processor and its memulator on the same FPGA confounds any meaningful energy analysis, the second attempt at a HITL simulation placed the processor on one FPGA and its memulator on a separate FPGA. Unfortunately, none of the MIPS processors functioned properly in this configuration. The problem was investigated in detail for the TMR MIPS processor.
The first troubleshooting step was to use the light emitting diodes (LEDs) and hex displays on the processor and memulator DE-10 Standard boards to display relevant signal and state information. These visual indicators revealed that TMR MIPS attempted to read the first instruction from the memulator. The memulator then returned data to TMR MIPS, and TMR MIPS appeared to receive it. At this point, however, the TMR Voter failed to transition to the next state, in which it should pass the results from memory to the three Basic MIPS processors. Instead, it remained in the state in which it continued to request the first instruction from memory.
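The observed symptom can be illustrated with a small behavioral model of the voter's memory handshake. This is a minimal sketch under stated assumptions. The state names `REQUEST_MEM` and `RETURN_DATA`, the `mem_ready` input, and the two-state structure are hypothetical simplifications, not the actual TMR Voter design. The model shows that the voter leaves the request state only on a clock edge where it samples memory as ready. If the returned data never registers on the voter side, the trace stays in `REQUEST_MEM` indefinitely, which is the behavior seen on the LEDs and hex displays.

```python
from enum import Enum, auto


class VoterState(Enum):
    # Hypothetical state names; the real TMR Voter states are defined in its VHDL.
    REQUEST_MEM = auto()  # drive a memory request to the memulator
    RETURN_DATA = auto()  # forward memory data to the three Basic MIPS processors


def next_state(state: VoterState, mem_ready: bool) -> VoterState:
    """Expected transition rule: leave REQUEST_MEM only once memory signals ready."""
    if state is VoterState.REQUEST_MEM:
        return VoterState.RETURN_DATA if mem_ready else VoterState.REQUEST_MEM
    # Simplification: after returning data, issue the next memory request.
    return VoterState.REQUEST_MEM


def run(ready_seen_by_voter: list[bool]) -> list[VoterState]:
    """Step the FSM once per clock edge using the mem_ready value the voter samples."""
    state = VoterState.REQUEST_MEM
    trace = [state]
    for ready in ready_seen_by_voter:
        state = next_state(state, ready)
        trace.append(state)
    return trace


# Expected behavior: the ready response is sampled, so the voter advances.
print(run([False, True])[-1])   # VoterState.RETURN_DATA
# Observed failure: the voter never registers the response and keeps requesting.
print(run([False] * 4)[-1])     # VoterState.REQUEST_MEM
```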
The second step was to implement a clock divider on each FPGA, with a divide ratio that could be adjusted using the switches on that DE-10 Standard board. The two clock dividers were independent of one another. Each set the clock speed only for the FPGA on its own board. Reducing the clock frequency to approximately 2 Hz on both FPGAs allowed the TMR MIPS processor to function correctly. When the clock frequency was increased on both boards, TMR MIPS functioned properly for a while but then encountered a Type A or Type B error, even though no errors were being injected. Occasionally, TMR MIPS froze as observed in the first troubleshooting step. In these cases, however, it stopped at different TMR Voter and Basic MIPS processor states from the one observed in that step. Operating TMR MIPS at such a low frequency would prevent any meaningful measurement of program runtime and energy usage, so further troubleshooting steps were pursued.
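The divide ratios involved can be made concrete with a cycle-level model of a counter-based clock divider. This is a sketch under an assumption. It assumes a toggle-style divider in which the output flips every `n_half` input cycles, giving f_out = f_in / (2 · n_half). The text does not state the divider's internal structure or how the switch settings mapped to divide ratios. The function names are hypothetical. Under this assumption, slowing the 50 MHz board clock to about 2 Hz requires a half-period count of 12,500,000, while dividing to 25 MHz requires a count of 1.

```python
def half_period_count(f_in_hz: float, f_out_hz: float) -> int:
    """Input cycles per output half-period for a toggle-style divider (assumed design)."""
    return round(f_in_hz / (2 * f_out_hz))


def simulate_divider(n_half: int, n_cycles: int) -> list[int]:
    """Cycle-level model: count input clock edges and toggle the output every n_half edges."""
    out, count, trace = 0, 0, []
    for _ in range(n_cycles):
        count += 1
        if count == n_half:
            out ^= 1      # toggle the divided clock
            count = 0     # restart the half-period count
        trace.append(out)
    return trace


print(half_period_count(50_000_000, 2))           # 12500000 (about 2 Hz)
print(half_period_count(50_000_000, 25_000_000))  # 1 (25 MHz)
print(simulate_divider(3, 12))  # output period of 6 input cycles
```

Because each board ran its own divider from its own oscillator, a model like this describes each FPGA separately. Nothing in it relates the phase of one board's divided clock to the other's.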
The third troubleshooting step was to use Quartus II to perform a timing analysis on the TMR MIPS processor and TMR Memulator. These timing analyses showed that neither design could run at the default 50 MHz clock frequency, but both could run at 25 MHz. A clock divider that halved the clock frequency to 25 MHz was implemented on the processor and memulator, and Quartus II indicated that the updated designs passed the timing analysis. When implemented in hardware, however, the results were identical to those observed in the first troubleshooting step.
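The timing result can be expressed with the usual setup-slack relation. This is a simplified worked equation. It ignores clock skew, clock uncertainty and hold checks, which the Quartus II timing analyzer also models. Here t_co is the register clock-to-output delay, t_logic the combinational delay along a register-to-register path, and t_su the destination register's setup time. These symbols are introduced here for illustration. Under this simplification, failing at 50 MHz but passing at 25 MHz places the worst-case path delay between one and two 50 MHz periods.

```latex
T_{\mathrm{clk}} = \frac{1}{f_{\mathrm{clk}}}, \qquad
T_{50\,\mathrm{MHz}} = 20\ \mathrm{ns}, \qquad
T_{25\,\mathrm{MHz}} = 40\ \mathrm{ns}

\mathrm{slack} = T_{\mathrm{clk}} - \left( t_{\mathrm{co}} + t_{\mathrm{logic}} + t_{\mathrm{su}} \right) \ge 0

d_{\max} = \max_{\text{paths}} \left( t_{\mathrm{co}} + t_{\mathrm{logic}} + t_{\mathrm{su}} \right)
\quad\Longrightarrow\quad
20\ \mathrm{ns} < d_{\max} \le 40\ \mathrm{ns}
```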
The final troubleshooting step built upon the third by using the Quartus II SignalTap tool to monitor signals on the TMR MIPS processor and TMR Memulator during operation. Once again, the results of the SignalTap analysis were identical to those of the first troubleshooting step.
The results of this troubleshooting process suggest a problem with how Quartus II implemented the TMR MIPS processor design in hardware. This problem was somehow partially corrected when the switch-controlled clock divider was implemented. Investigating the inner workings of Quartus II and how it makes its hardware implementation decisions is beyond the scope of this research. Future work may implement the TMR Voter in the same manner as the Basic MIPS processor: entirely from NAND gates and D flip-flops rather than as a state machine described in high-level VHDL. This approach may force Quartus II to implement the hardware so that the TMR Voter no longer fails to make a state transition after receiving a response from memory.
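The kind of gate-level construction proposed for future work can be sketched behaviorally. This is a minimal illustration under stated assumptions. It models two generic building blocks: a 2-of-3 majority function built only from NAND gates, and an edge-triggered D flip-flop. It is not the Basic MIPS or TMR Voter netlist, and the function and class names are hypothetical. The NAND form relies on De Morgan's law: NAND3(NAND(a,b), NAND(b,c), NAND(a,c)) = ab + bc + ac.

```python
def nand(*inputs: int) -> int:
    """n-input NAND on 0/1 values."""
    return 0 if all(inputs) else 1


def majority_nand(a: int, b: int, c: int) -> int:
    """2-of-3 majority using only NAND gates: NAND3(NAND(a,b), NAND(b,c), NAND(a,c))."""
    return nand(nand(a, b), nand(b, c), nand(a, c))


class DFlipFlop:
    """Behavioral positive-edge D flip-flop: q takes d only on a rising clock edge."""

    def __init__(self) -> None:
        self.q = 0
        self._prev_clk = 0

    def step(self, clk: int, d: int) -> int:
        if clk == 1 and self._prev_clk == 0:  # rising edge detected
            self.q = d
        self._prev_clk = clk
        return self.q


# Register a voted bit: two of the three copies agree on 1.
ff = DFlipFlop()
ff.step(0, majority_nand(1, 1, 0))
print(ff.step(1, majority_nand(1, 1, 0)))  # 1
```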
4.6 Summary
This chapter discussed the approach to verifying that Basic MIPS (unmitigated processor), TMR MIPS (TMR strategy), TSR MIPS (TSR strategy), and AHR MIPS (AHR strategy) worked as designed. It further examined how the architectures could be compared with one another in terms of program runtime and energy usage in the absence of errors. These comparisons were intended to support useful determinations about the relative advantages and disadvantages of each architecture, with the goal of highlighting the advantages of AHR over TMR or TSR alone. It was shown that, in the absence of errors, AHR does the following:
- It combines two different redundancy methods.
- It allows switching between these methods.
- It provides flexibility in selecting when to switch.
- It opens up a tradespace in time and energy, allowing AHR to perform like TMR, like TSR, or anywhere in between.
This chapter also discussed how to perform Hardware-in-the-Loop (HITL) simulations to collect representative data on the performance of these architectures when implemented in hardware.
These simulations, analyses, and HITL simulations have not yet demonstrated the ability of AHR to operate in a radiation environment where SEUs and SETs may occur. Chapter V examines how to inject errors into TMR, TSR, and AHR MIPS so that Chapter VI can evaluate their performance in the presence of errors.