The majority of testing occurred out of system and was done to verify the functionality
of the GTMR structure and memory interfaces. Exhaustive testing of all XUM softcore
instructions was not conducted; the softcore was assumed to function properly because no
major changes were applied to it. Instead, the primary test program used was a slightly modified
version of the UART demo program provided in the original source code for the XUM
softcore that would output a character every half second and echo back any characters
received. To minimize hardware differences between the flight board and development
board, the Genesys 2 development board was used. This board features a Kintex-7
FPGA of the same size and speed grade as that of the flight hardware, and so its
performance should be similar. It also contains Flash, DDR3, and SD card memories on
board, so we could verify those interfaces. Because a majority of the testing occurred on
the Genesys 2 development board and not flight hardware, the communications interface
primarily used was XUM’s UART module directly connected to a personal computer
(PC) vice the NPSAT PC-104 interface. Internal logic analyzers, which communicate via
the Joint Test Action Group (JTAG) signal lines between the PC and FPGA, were used
extensively to both debug and verify the design.
1. Fault Injection Testing
Fault testing was performed by manipulation of the system clock lines. By forcing
one of the clocks to run at a different speed from the other two, we observed the effect
that a SET on an individual system clock would have on the softcore. A SET on a system
clock can have one of two results: the affected system progresses to its next state earlier
than it should, or parts of the affected system slip into a metastable state and either
progress to the next state or miss several clock cycles. SEUs on individual registers were
not tested separately because a system-wide clock fault essentially causes an SEU in every
register of the affected system.
The case where a SET causes the affected system to progress to the next state is
shown in Figure 19. This is simulated by running one of the system clocks at twice the
frequency of the other two system clocks. As can be seen at sample-time two, all three
systems are synchronized. There is a slight propagation delay in the incoming instruction
seen in systems zero and one; however, by sample-time three those outputs become
stable. At sample-time four, system clock zero experiences its SET. We see that system
zero has transitioned to its next state based on its vote for the program counter (PC).
While systems one and two maintain that the PC should be 0000000Ch, system zero’s
vote has changed to 0000020Ch. Though system zero’s vote changed, the majority vote
of all PCs, which drives follow-on combinational logic in all three systems, remains
0000000Ch. At sample-time six, we see all the systems resynchronize and agree that the
PC should be 0000020Ch, with system zero fully restored.
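The voting behavior described for Figure 19 can be illustrated with a minimal Python sketch. It assumes a bitwise two-of-three majority voter, which is a common TMR construction; the source does not give the voter's actual HDL, and the function names majority_vote and dissenting_systems are hypothetical.

```python
def majority_vote(a, b, c):
    """Bitwise 2-of-3 majority of three equally wide words (assumed voter form)."""
    return (a & b) | (a & c) | (b & c)


def dissenting_systems(votes):
    """Return the indices of systems whose vote differs from the majority."""
    voted = majority_vote(*votes)
    return [i for i, v in enumerate(votes) if v != voted]


# PC votes at sample-time four in Figure 19: system zero has advanced early.
pc_votes = [0x0000020C, 0x0000000C, 0x0000000C]
voted_pc = majority_vote(*pc_votes)
print(f"voted PC = {voted_pc:08X}h, dissenting systems = {dissenting_systems(pc_votes)}")
```

With the votes from sample-time four, the voted PC stays at 0000000Ch and only system zero is reported as dissenting, which matches the masking behavior seen in the waveform.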
The case where a SET causes the affected system’s PC to go into a simulated
metastable state and not progress is shown in Figure 20. This is accomplished by running
one of the system clocks at one quarter the speed of the other two system clocks. At the
rising edge of system clocks zero and one, the votes for the PC are updated, and since
they are the same, the voted value sent to all follow-on logic in all three systems is the
correct value. Due to the simulated metastable state of system two’s PC, its vote cannot
be relied upon, and in this example it is frozen at the value where it entered this
simulated metastable state. This does not necessarily have to be the case; in a true
metastability, the register could capture new values sporadically. Regardless, at
sample-time 268, system two’s PC registers have recovered, and system two is once
again synchronized with the other two systems.
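The slow-clock case can be sketched as a small behavioral model. This is a simplified illustration under stated assumptions, not the GTMR design itself: each system holds one PC register, the next PC is always the voted PC plus four (a hypothetical stand-in for the real next-state logic), and the slow system only latches on every fourth tick. The names simulate_slow_clock, slow_system, and divide_by are hypothetical.

```python
def majority_vote(a, b, c):
    """Bitwise 2-of-3 majority of three equally wide words (assumed voter form)."""
    return (a & b) | (a & c) | (b & c)


def simulate_slow_clock(ticks, slow_system=2, divide_by=4, start_pc=0):
    """Model three PC registers where one system's clock runs divide_by times slower."""
    pcs = [start_pc] * 3
    history = []
    for t in range(1, ticks + 1):
        # All three systems compute their next state from the voted PC.
        next_pc = majority_vote(*pcs) + 4  # assumption: sequential fetch only
        for i in range(3):
            has_clock_edge = (i != slow_system) or (t % divide_by == 0)
            if has_clock_edge:
                pcs[i] = next_pc
        history.append((t, list(pcs), majority_vote(*pcs)))
    return history


for t, pcs, voted in simulate_slow_clock(8):
    print(t, [f"{pc:08X}h" for pc in pcs], f"voted={voted:08X}h")
```

In this model the slow system's vote stays frozen between its clock edges while the voted PC keeps advancing, and the slow register picks up the correct value again at its next edge because its input is derived from the voted PC.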
Fault injection into configuration memory was not tested. The primary method for
verifying the fault tolerance in the configuration memory would have been to subject the
device to radiation testing. Once again, due to the extremely limited development time,
we were not able to conduct radiation testing, and this needs to be verified on orbit. Since
the SEM module does produce error signals, we will be able to differentiate errors
resulting from configuration memory upsets from those originating in user memory.
Figure 19. Waveform Capture Demonstrating GTMR Correction of a Fault Induced by a Fast Clock
Figure 20. Waveform Capture Demonstrating GTMR Correction of a Fault Induced by a Slow Clock
2. Performance Testing
Periodically throughout development, waveforms were captured to determine
how quickly data were being passed between the various levels of memory in an effort to
determine where the softcore could be optimized. As shown previously in Figures 19 and
20, we were able to provide the XUM processors with new instructions every
single clock cycle from the L1 cache, which is the optimal case. The penalty incurred for
an L1 cache miss is depicted in Figure 21. By counting the number of samples Inst_Ready
was low and dividing by four, since the 200 MHz sample rate yields four samples per cycle
of the 50 MHz system clock, we determined the number of CPU clock cycles the processor
was stalled. Inst_Ready was low for samples
376 to 442, 76 total samples. This translates to a miss penalty of 19 processor clock
cycles. During these 19 clock cycles, we transferred 1024 bits of data, or 32 instructions,
to the L1 cache. Finally, the miss penalty for the L2 cache is shown in Figure 22. As with
the L1 cache, we determined the total miss penalty by observing the number of
samples Inst_Ready is low, which in this case equals the miss penalty in clock cycles
because the sample rate was reduced to 50 MHz, matching the system clock. We see the
total L2 cache miss time is approximately 26,500 clock cycles, a majority of which is
spent with the SD card either internally locating and preparing the data
(CurrentState = 0Ch) or shifting the data block to the FPGA (CurrentState = 0Dh).
The L2 miss penalty could be further reduced if the 4-bit SD
protocol were used vice the 1-bit SPI protocol; however, this only reduces the portion of
the penalty incurred during CurrentState = 0Dh and saves approximately 3,000 clock
cycles. In the case where a write-back must be performed prior to allocation of new data,
these penalties essentially double.
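The miss-penalty arithmetic above can be reproduced with a short Python sketch. The sample counts, clock rates, and the approximate 3,000-cycle saving are taken from the text; the helper name stall_cycles and the conversion to wall-clock time are illustrative additions, and the 32-bit instruction width is inferred from 1024 bits holding 32 instructions.

```python
CPU_CLOCK_HZ = 50_000_000                         # system clock stated in the text
CLOCK_PERIOD_NS = 1_000_000_000 // CPU_CLOCK_HZ   # 20 ns per CPU clock cycle


def stall_cycles(samples_low, sample_rate_hz):
    """Convert a count of logic-analyzer samples into CPU clock cycles."""
    samples_per_cycle = sample_rate_hz // CPU_CLOCK_HZ
    return samples_low // samples_per_cycle


# L1 miss: Inst_Ready low for 76 samples captured at 200 MHz (Figure 21).
l1_cycles = stall_cycles(76, 200_000_000)
l1_time_ns = l1_cycles * CLOCK_PERIOD_NS

# 1024 bits are moved into the L1 cache per miss; assumed 32-bit instructions.
instructions_per_refill = 1024 // 32

# L2 miss: roughly 26,500 samples at 50 MHz, one sample per cycle (Figure 22).
l2_cycles = stall_cycles(26_500, 50_000_000)
l2_time_us = l2_cycles * CLOCK_PERIOD_NS / 1000

# Rough L2 penalty if the 4-bit SD protocol saved about 3,000 cycles.
l2_cycles_4bit = l2_cycles - 3_000

print(f"L1 miss: {l1_cycles} cycles ({l1_time_ns} ns), {instructions_per_refill} instructions")
print(f"L2 miss: {l2_cycles} cycles ({l2_time_us} us), about {l2_cycles_4bit} with 4-bit SD")
```

At 50 MHz, the 19-cycle L1 penalty corresponds to 380 ns, and the approximately 26,500-cycle L2 penalty corresponds to roughly 530 microseconds.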
Figure 21. Waveform Capture of an L1 Cache Miss with a Sample Rate of 200 MHz
Figure 22. Waveform Capture of an L2 Cache Miss with a Sample Rate of 50 MHz