
A. SOFTCORE DEVELOPMENT

1. XUM Softcore Processor Modifications

We first verified the functionality of the XUM SOC softcore before performing any modifications. To eliminate design variables, we conducted this verification on an XUPV5-LX110T development board, the same board used to build the original core [18]. Later development was conducted on a Genesys 2 development board featuring a Kintex-7 325T FPGA. We found one minor issue involving the UART instantiated within the system and reported it to the author of the core for verification. We also noticed that memory accesses were not keeping up with the demands of the processor. We modified the core to resolve these two issues and then applied techniques to make it internally triple-modular redundant.


a. UART

The UART, which serves a dual function of either boot-loading a new program into memory or allowing a user program to communicate [18], was found to stop receiving data for user programs after sending its first transmission. We discovered that during the write-back of any character, BootSwEnabled transitioned from low to high, and following that transition, the receive first in, first out (FIFO) module quickly emptied. The issue is caused by a race condition. When BootSwEnabled goes high, the UART switches to boot-load mode and empties the receive FIFO while searching for a specific string of characters, causing the user program to miss all of its input data. This is shown in Figure 8. Although we corrected the issue with the UART and used it extensively in testing, we removed the entire module from the final design because NPSAT-1 communicates over its PC-104 bus. However, the concept of using the communication interface to temporarily control the instruction data bus for reprogramming the operating system was kept and adapted into our PC-104 interface.
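The failure mode can be reproduced with a toy Python model. The race is reduced to a flag that a transmit incorrectly raises, and `BOOT_SYNC`, `UartModel`, `tx_sets_boot_mode`, and `echo_session` are hypothetical names. The sketch shows the symptom and one way to avoid it (keeping a transmit from touching the boot-mode control); it is not necessarily the fix applied to XUM.

```python
from collections import deque

BOOT_SYNC = "XUM"  # hypothetical boot-load sync string, not taken from the XUM source


class UartModel:
    """Toy model of a UART shared by a boot loader and a user program."""

    def __init__(self, tx_sets_boot_mode: bool) -> None:
        self.rx_fifo = deque()
        self.boot_sw_enabled = False
        # True reproduces the race: a character write-back raises BootSwEnabled.
        self.tx_sets_boot_mode = tx_sets_boot_mode
        self.sync_window = ""

    def host_sends(self, text: str) -> None:
        self.rx_fifo.extend(text)

    def user_transmit(self, ch: str) -> None:
        # The character itself is not modeled; only its side effect on boot mode.
        if self.tx_sets_boot_mode:
            self.boot_sw_enabled = True

    def tick(self) -> None:
        # Boot-load mode drains the receive FIFO while hunting for the sync string.
        while self.boot_sw_enabled and self.rx_fifo:
            self.sync_window = (self.sync_window + self.rx_fifo.popleft())[-len(BOOT_SYNC):]

    def user_read(self):
        return self.rx_fifo.popleft() if self.rx_fifo else None


def echo_session(tx_sets_boot_mode: bool) -> str:
    """Host sends 'hello'; the user program echoes each character it reads."""
    uart = UartModel(tx_sets_boot_mode)
    uart.host_sends("hello")
    received = []
    while (ch := uart.user_read()) is not None:
        received.append(ch)
        uart.user_transmit(ch)  # echo the character back to the host
        uart.tick()
    return "".join(received)
```

`echo_session(True)` returns `"h"`: the first echoed character puts the model into boot-load mode and the rest of the input is drained. `echo_session(False)` returns `"hello"`.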

Figure 8. Waveform Demonstrating XUM UART Improperly Transitioning BootSwEnabled on Write

b. Adjusting the Memory Access Protocol

The XUM core utilizes a four-way handshake to retrieve instructions and data from memory. In the original core, memory is instantiated using block RAM with registered outputs. Examining both the code and the waveforms produced when executing a simple program on the processor, we noticed that even with the memory clocked at twice the speed of the processor (66 MHz and 33 MHz, respectively), the memory was unable to keep pace with the processor and took three central processing unit (CPU) clock cycles to retrieve one instruction, as shown in Figure 9. The author himself notes that since “instruction memory fetches only once per handshake, the minimum theoretical CPI is greatly increased from 1 to between 3 and 4” [18].
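To make the three-cycle figure concrete, the following Python sketch models a four-phase request/acknowledge handshake edge by edge. Its latencies are assumptions rather than values taken from the XUM source: every handshake signal is registered, the block RAM needs one memory clock to produce registered data, and the memory clock runs at twice the CPU clock. The function name `simulate_four_phase` and its state names are hypothetical.

```python
def simulate_four_phase(num_fetches: int, mem_clocks_per_cpu: int = 2) -> list[int]:
    """Return the CPU cycle in which each instruction fetch completes.

    Time advances in memory-clock periods; the CPU state machine acts only on
    every `mem_clocks_per_cpu`-th edge. Each side samples the other's handshake
    signal as it was registered before the current edge.
    """
    cpu_req, mem_ack = 0, 0
    cpu_state, mem_state = "IDLE", "IDLE"
    completions = []
    t = 0
    while len(completions) < num_fetches:
        req_seen, ack_seen = cpu_req, mem_ack
        # Memory side (every memory clock edge).
        if mem_state == "IDLE" and req_seen:
            mem_state = "READ"  # issue the block RAM read
        elif mem_state == "READ":
            mem_ack, mem_state = 1, "WAIT_REQ_LOW"  # registered data now valid
        elif mem_state == "WAIT_REQ_LOW" and not req_seen:
            mem_ack, mem_state = 0, "IDLE"
        # CPU side (only on CPU clock edges).
        if t % mem_clocks_per_cpu == 0:
            if cpu_state == "IDLE":
                cpu_req, cpu_state = 1, "WAIT_ACK"
            elif cpu_state == "WAIT_ACK" and ack_seen:
                completions.append(t // mem_clocks_per_cpu)  # instruction latched
                cpu_req, cpu_state = 0, "WAIT_ACK_LOW"
            elif cpu_state == "WAIT_ACK_LOW" and not ack_seen:
                cpu_req, cpu_state = 1, "WAIT_ACK"  # begin the next fetch
        t += 1
    return completions


done = simulate_four_phase(5)
print(done)                                     # [2, 5, 8, 11, 14]
print([b - a for a, b in zip(done, done[1:])])  # [3, 3, 3, 3]
```

Under these assumptions an instruction is latched every third CPU cycle. Most of that time is spent waiting for each handshake edge to be registered and observed by the other side, not reading the block RAM, which is why doubling the memory clock alone does not restore a CPI of 1.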

Figure 9. Waveform Capture of XUM Instruction Fetch Utilizing Four-way Handshake Protocol

To resolve this issue, we implemented an L1 cache onboard the FPGA and replaced the handshake protocol with a simple flagging protocol. In the new protocol, we assume that the processor makes a request to the L1 cache, either a read or a write, every CPU cycle. Based on this assumption, we designed our cache to resolve a hit within the same CPU clock cycle in which the request is received. Because the cache can respond every CPU clock cycle, only a single valid signal needs to be routed from memory to the processor. If the valid signal is deasserted, the processor stalls; otherwise, it accepts the data on the data lines.
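The flagging protocol can be illustrated with a small cycle-level model. This Python sketch assumes a read-only, direct-mapped, word-addressed cache with a fixed miss penalty; the geometry, the `MISS_PENALTY` value, and all names are hypothetical, and writes are omitted. A hit returns valid data in the same cycle; on a miss the valid flag stays low and the processor retries the same address on the next cycle.

```python
MISS_PENALTY = 4  # hypothetical CPU cycles needed to fill a line from main memory


class DirectMappedCache:
    """Read-only, word-addressed, direct-mapped L1 model with a valid flag."""

    def __init__(self, memory: list[int], num_lines: int = 16) -> None:
        self.memory = memory
        self.num_lines = num_lines
        self.tags = [None] * num_lines
        self.lines = [0] * num_lines
        self.fill_countdown = 0  # cycles left on the outstanding line fill

    def cycle(self, addr: int):
        """Service one request per CPU clock; return (valid, data)."""
        index, tag = addr % self.num_lines, addr // self.num_lines
        if self.tags[index] == tag:
            return True, self.lines[index]  # hit resolved in the same cycle
        if self.fill_countdown == 0:
            self.fill_countdown = MISS_PENALTY  # start a fill on a new miss
        self.fill_countdown -= 1
        if self.fill_countdown == 0:
            self.tags[index], self.lines[index] = tag, self.memory[addr]
        return False, None  # valid low: the processor stalls this cycle


def run(cache: DirectMappedCache, fetch_addrs: list[int]):
    """Processor side: retry the same address until the valid flag is high."""
    cycles, fetched = 0, []
    for addr in fetch_addrs:
        while True:
            cycles += 1
            valid, word = cache.cycle(addr)
            if valid:
                fetched.append(word)
                break
    return cycles, fetched


cycles, fetched = run(DirectMappedCache(list(range(100, 164))), [0, 1, 2, 3] * 10)
print(cycles)  # 56
```

With 40 fetches cycling over four addresses, only the four cold misses stall, giving 40 + 4 × 4 = 56 cycles; every hit costs exactly one cycle, so no handshake is needed on the hit path.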

c. Application of the GTMR Architecture

We applied the GTMR architecture to XUM in two steps. We first applied BTMR to the system as a whole and then applied GTMR to the system module by module. We also structured our code to mirror the BTMR hierarchy, with all voting, both BTMR and GTMR, taking place at the level where the systems are triplicated, as shown in Figure 10. The benefit of this structure is that the internal interconnect of the systems beneath the TMR hierarchical level is largely undisturbed, so any system below that level can be treated as a single operable system. The inputs and outputs of this single system can be mapped directly to the top-level ports, and the system remains functional. This allows us to test a single system for functionality and to reduce compilation times.


Figure 10. Schematic Drawing of the CFTP TMR Level


In applying BTMR, we must first identify all parts of the system that cannot be triplicated; these are instantiated at the top level. In the original core, these were only the top-level pins; however, we also placed the clock management tile (CMT) at the top level and generated triplicated clock outputs from it. This minimizes the clock skew and jitter that chaining multiple CMTs together can induce. The top module then routes the appropriate signals to the TMR level.

At the TMR level, the entire system was triplicated. Inputs from the top level were distributed to each of the three systems, and the outputs of each system were voted on before being sent to the top level. This completed the BTMR implementation, and we verified that the system was operable with three processors executing synchronously.
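Output voting at the TMR level can be sketched in Python. The bitwise `majority` function is the standard 2-of-3 voter; the eight-bit `Counter` is a hypothetical stand-in for one copy of the system, and `run_btmr` injects upsets by flipping a bit in one copy's register before that cycle's update.

```python
def majority(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 vote: each output bit takes the value held by at least two inputs."""
    return (a & b) | (a & c) | (b & c)


class Counter:
    """Hypothetical stand-in for one copy of the triplicated system: an 8-bit counter."""

    def __init__(self) -> None:
        self.count = 0

    def step(self) -> int:
        self.count = (self.count + 1) & 0xFF
        return self.count


def run_btmr(cycles: int, upsets: dict[int, tuple[int, int]]) -> list[int]:
    """Run three copies in lockstep and vote only on their outputs.

    `upsets` maps a cycle number to (copy index, bit) flipped before that cycle's update.
    """
    copies = [Counter() for _ in range(3)]
    voted = []
    for cyc in range(cycles):
        if cyc in upsets:
            idx, bit = upsets[cyc]
            copies[idx].count ^= 1 << bit  # single-event upset in one copy's register
        outputs = [c.step() for c in copies]
        voted.append(majority(*outputs))
    return voted


# One upset: the faulty copy stays wrong internally, but the voted output is correct.
print(run_btmr(10, {3: (0, 7)}))  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```

A single upset is masked, but because voting happens only on outputs, the faulty copy is never corrected. A later upset in the same bit of a second copy lets two wrong copies outvote the good one: `run_btmr(10, {3: (0, 7), 6: (1, 7)})` returns `[1, 2, 3, 4, 5, 6, 135, 136, 137, 138]`.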

To progress from our BTMR XUM system to a GTMR system, we first had to locate every register inside the XUM softcore. We then modified these registers so that their outputs were routed directly to the TMR level rather than to the connecting logic. After passing through the GTMR voter, each signal was routed back to the connecting logic for which it was originally destined. Furthermore, all registers were modified to update every clock cycle. When the processor stalled or a register was not being actively written, such as those in the register file, each register was still updated with its voted output. Because modifying any single processor module did not compromise the functionality of the system, we were able to apply GTMR module by module and verify the system after updating each module so that any errors introduced could be identified more readily.
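The effect of routing every register through a voter and reloading it every cycle can be shown with the following sketch. For brevity it assumes a single shared voter per register (a hardware implementation may triplicate the voters as well), and `GtmrRegister` and `hold_with_upsets` are hypothetical names. The register holds a constant, as an unwritten register-file entry would.

```python
def majority(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 vote."""
    return (a & b) | (a & c) | (b & c)


class GtmrRegister:
    """Hypothetical model of one triplicated register whose copies feed a voter."""

    def __init__(self, reset_value: int = 0) -> None:
        self.copies = [reset_value] * 3

    def voted(self) -> int:
        return majority(*self.copies)

    def clock(self, write_enable: bool = False, write_data: int = 0) -> None:
        # Even when not written (stall, idle register-file entry), every copy is
        # reloaded with the voted value, so a single-copy upset lasts one cycle.
        next_value = write_data if write_enable else self.voted()
        self.copies = [next_value] * 3


def hold_with_upsets(cycles: int, upsets: dict[int, tuple[int, int]], feedback: bool) -> list[int]:
    """Hold 0x5A, inject upsets, and record the voted value each cycle."""
    reg = GtmrRegister(0x5A)
    seen = []
    for cyc in range(cycles):
        if cyc in upsets:
            idx, bit = upsets[cyc]
            reg.copies[idx] ^= 1 << bit  # upset in one copy
        seen.append(reg.voted())
        if feedback:
            reg.clock()  # GTMR: reload the voted value every cycle
        # Without feedback, each copy simply keeps whatever it holds.
    return seen
```

With feedback, two upsets in different copies three cycles apart never disturb the voted value, because the first is scrubbed on the next clock. Without feedback, the same pair of upsets corrupts the output from cycle 6 onward (0x5A becomes 0xDA).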

We applied this standardized BTMR and GTMR process to every submodule later instantiated within CFTP, with the following exceptions: the CMT, the internal configuration access port and frame error correction code primitives, and the double data rate type three (DDR3) IP softcore.
