3.4 Techniques That Address High Defect Rates
4.2.4 Bitstream Loader
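[Figure: Conventional flow (User Design, Pack, Place, Route, Bitstream Load) compared with the CYA flow (User Design, Pack, Place, Route, Alternatives Generation, CYA Bitstream Loader). The CYA bitstream carries the base configuration, alternatives, testing instructions, and additional metadata. The CYA Bitstream Loader loads the base configuration and then, starting with the first net, tests a path. A good path is kept/committed and the loader moves to the next net, finishing when no nets remain; a bad path causes the loader to try the next path, aborting when no paths remain.]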
I decompose the loader into four components: a programmer, a deprogrammer, a tester, and a controller.
Programmer
By the term “programmer” I refer to that element of any FPGA bitstream loader that sets the configuration bits. In the simplest case, we might have random access to the configuration bits (cf. Xilinx 6200 [90]). This makes it fast and easy to set each bit but demands greater area overhead for configuration support than conventional configuration chains.
At the cost of more time and work loading the bitstream, we can exploit the frame schemes that exist in modern FPGAs (e.g., [91, 93]). Specifically, we could organize the bitstream to specify the frame address and the address of bits to change within the frame. Then, the loader can: (1) read out the old frame, (2) change the specified bits in the frame, and (3) load the modified frame. This is the same kind of operation used by Xilinx JBits to perform bitstream modification on Virtex series components [28]. Such a scheme would require no changes to the core of the FPGA. The cost is longer load times, as we must spend an entire frame read/write sequence for every frame touched by an alternative (see Section 4.6.2 and Tables 4.4 and 4.5).
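The three-step frame update can be sketched as follows. This is a minimal illustration rather than a description of any real device interface: the `FrameMemory` class, its `read_frame`/`write_frame` methods, and the `(frame_addr, bit_offset, value)` change format are all assumptions introduced here. The sketch groups changes by frame so that each touched frame costs one read and one write, which is the per-frame cost noted above.

```python
from typing import Dict, Iterable, List, Tuple


class FrameMemory:
    """Hypothetical stand-in for a frame-addressable configuration memory."""

    def __init__(self, num_frames: int, frame_bits: int) -> None:
        self.frames: List[List[int]] = [[0] * frame_bits for _ in range(num_frames)]
        self.frame_ops = 0  # counts whole-frame reads and writes

    def read_frame(self, addr: int) -> List[int]:
        self.frame_ops += 1
        return list(self.frames[addr])

    def write_frame(self, addr: int, data: List[int]) -> None:
        self.frame_ops += 1
        self.frames[addr] = list(data)


def apply_changes(mem: FrameMemory, changes: Iterable[Tuple[int, int, int]]) -> None:
    """Apply (frame_addr, bit_offset, value) changes by read-modify-write."""
    # Group the changes so that each touched frame is read and written once.
    by_frame: Dict[int, List[Tuple[int, int]]] = {}
    for frame_addr, bit, value in changes:
        by_frame.setdefault(frame_addr, []).append((bit, value))

    for frame_addr, bits in by_frame.items():
        frame = mem.read_frame(frame_addr)   # (1) read out the old frame
        for bit, value in bits:
            frame[bit] = value               # (2) change the specified bits
        mem.write_frame(frame_addr, frame)   # (3) load the modified frame


mem = FrameMemory(num_frames=4, frame_bits=8)
apply_changes(mem, [(1, 3, 1), (1, 5, 1), (2, 0, 1)])
```

In this toy run, three bit changes touch two frames, so `mem.frame_ops` ends at four frame operations rather than six.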
Deprogrammer
When a path fails, the loader must undo the configuration changes to release resources for other paths. Functionally, the deprogrammer must roll back the configuration to its state before the last path was added. One way to accomplish this is to record changes made during programming so that they can be reverted; this has the advantage of demanding no semantic understanding of the bitstream, but requires space to store the changes. An alternative might reuse the path specification used for programming, with the configuration sense reversed.
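The change-recording approach might look like the following sketch. The `UndoLog` class and the `(frame address, bit offset)` keys are hypothetical; the point is only that the log stores old values and needs no knowledge of what the bits mean.

```python
from typing import Dict, List, Tuple

Bit = Tuple[int, int]  # (frame address, bit offset): hypothetical addressing


class UndoLog:
    """Records the old value of every bit written while programming one path."""

    def __init__(self, config: Dict[Bit, int]) -> None:
        self.config = config
        self.log: List[Tuple[Bit, int]] = []

    def set_bit(self, bit: Bit, value: int) -> None:
        # Save the previous value (unset bits are assumed to read as 0).
        self.log.append((bit, self.config.get(bit, 0)))
        self.config[bit] = value

    def commit(self) -> None:
        # The path passed its test: the saved values are no longer needed.
        self.log.clear()

    def rollback(self) -> None:
        # The path failed: restore the saved values in reverse order.
        while self.log:
            bit, old = self.log.pop()
            self.config[bit] = old


config: Dict[Bit, int] = {(0, 0): 1}
undo = UndoLog(config)
undo.set_bit((0, 0), 0)
undo.set_bit((0, 1), 1)
undo.rollback()
```

The storage cost mentioned above shows up as the length of `log`, which grows with the number of bits the path sets and is released on either `commit` or `rollback`.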
Tester
The tester is responsible for testing each path and reporting the success or failure of the test. The bitstream loader only needs to know if the end-to-end path test fails. The alternatives encoded in the bitstream directly tell the loader what to try next when a test fails. If the bitstream loader does not have random access into the bitstream, the loader will need adequate local space to store the current test specification to be used with the sequence of alternatives.
One simple way to support testing is to drive and recover data using the internal clustered logic block (CLB) flip-flops. These flip-flops can still be used for observability even if they are not used in the design. In some cases, we might reconfigure the CLB logic to facilitate testing. For example, the source CLB would be configured to drive the path under test from a flip-flop, and the destination LUT would be configured as a buffer with its flip-flop enabled. It may be possible to set up the tests using bitstream configuration, trigger transition tests using readback capture, and then view the results using configuration and state readback [92, 93]. End-to-end connectivity tests check that we can see both driven zeros and driven ones at the destination.
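A minimal sketch of the end-to-end connectivity check follows. The `drive_and_capture` callback is an assumption standing in for whatever mechanism loads a value into the source flip-flop and reads back the destination flip-flop; the three lambda "paths" are toy models, not real fault data.

```python
from typing import Callable


def connectivity_test(drive_and_capture: Callable[[int], int]) -> bool:
    """Pass only if both a driven 0 and a driven 1 are seen at the destination."""
    return all(drive_and_capture(value) == value for value in (0, 1))


good_path = lambda value: value   # toy model: the value arrives unchanged
stuck_low = lambda value: 0       # toy model: destination always reads 0
stuck_high = lambda value: 1      # toy model: destination always reads 1

results = [connectivity_test(p) for p in (good_path, stuck_low, stuck_high)]
```

Driving both values matters because a path whose destination is stuck at one value would pass a test that drove only that value.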
Timing tests can be performed using a variant of “launch-from-capture” transition fault testing (e.g., [73, 86]). This simple change from “does the signal get through” to “does the signal arrive on time” can be combined with an outer loop to optimize overall delay, as described in Sections 5.2 and 6.1.
To expand the class of resources covered by this approach, LUTs/subblocks can be tested with conventional procedures (e.g., test patterns) before this path testing. Additionally, it may be necessary to swap LUTs at the start or end of a path before performing the path test.
Controller
The controller coordinates the other units to implement Algorithms 4.2 and 4.3. It is a very straightforward entity that makes no complex decisions and performs no complex actions. Notably, the controller does not need to understand the FPGA architecture or the semantics of the configuration bits. All the intelligence about the meaning of the bitstream is effectively compiled into the bitstream. The controller only needs to mechanically follow the bitstream load program. With suitable test support, the embedded PowerPC on Virtex devices could be used to run this algorithm, using the internal configuration access port (ICAP) [93] to perform the configuration.
Algorithm 4.2: Bitstream Load Algorithm
input: Bitstream B
output: Boolean configured?
foreach 2PTNet N ∈ {B} do
    found ← FALSE;
    while (not found) do
        if N.outOfPaths then
            return failure;
        P ← N.nextPath;
        if P.isUsable(P) then
            Program(P);
            if Test(P) then
                found ← TRUE;
            else
                DeProgram(P);
return success;
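The control flow of Algorithm 4.2 can be sketched in Python as below. This is an illustration under stated assumptions, not the loader implementation: a net is modeled as an ordered list of alternative paths, a path as a tuple of resource names, and the four callbacks (`is_usable`, `program`, `test`, `deprogram`) stand in for the components described above. The `simulate` helper and its defect model are hypothetical and exist only to exercise the loop.

```python
from typing import Callable, Iterable, Sequence, Set, Tuple

Path = Sequence[str]  # hypothetical: a path is a sequence of resource names


def load_bitstream(
    nets: Iterable[Iterable[Path]],
    is_usable: Callable[[Path], bool],
    program: Callable[[Path], None],
    test: Callable[[Path], bool],
    deprogram: Callable[[Path], None],
) -> bool:
    """Try each net's alternatives in order; fail if any net runs out of paths."""
    for alternatives in nets:
        found = False
        for path in alternatives:
            if not is_usable(path):
                continue              # skip paths that conflict with prior choices
            program(path)
            if test(path):
                found = True          # keep this path and move to the next net
                break
            deprogram(path)           # roll back and try the next alternative
        if not found:
            return False              # out of paths for this net: abort
    return True


def simulate(nets, defective: Set[str]) -> Tuple[bool, Set[str]]:
    """Toy harness: resources are names, and a path fails if it touches a defect."""
    used: Set[str] = set()
    ok = load_bitstream(
        nets,
        is_usable=lambda p: used.isdisjoint(p),
        program=lambda p: used.update(p),
        test=lambda p: defective.isdisjoint(p),
        deprogram=lambda p: used.difference_update(p),
    )
    return ok, used


ok, used = simulate([[("a", "b"), ("a", "c")], [("d",)]], defective={"b"})
```

In this toy run the first net's first alternative touches the defective resource `b`, so it is deprogrammed and the second alternative is committed instead. Note that the loop itself never inspects what a path means, which mirrors the controller's role described above.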
Algorithm 4.3: isUsable Function for Bitstream Load Algorithm
input: Path P
output: Boolean usable
/* First free up the LUT, if required and possible */
if inUse(P.node) then
    if canMove(currentOccupant(P.node)) then
        move(currentOccupant(P.node));
    else
        return FALSE;
/* To utilize fanout, two-point nets may share common prefixes */
while P.hasMore and P.next.matches(CurrentConfiguration) do
    P.step;
while P.hasMore do
    if inUse(P.next.node) then
        return FALSE;
    P.step;
return TRUE;
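The three phases of Algorithm 4.3 can be sketched as follows, under several simplifying assumptions that are mine rather than the text's: a path is split into its LUT and an ordered list of routing hops, "in use" and "matches the current configuration" are modeled as set membership, and a single hypothetical `try_move` callback combines the `canMove` check with the `move` action.

```python
from typing import Callable, Sequence, Set


def is_usable(
    lut: str,
    hops: Sequence[str],
    in_use: Set[str],
    shared: Set[str],
    try_move: Callable[[str], bool],
) -> bool:
    """Decide whether a path can be programmed without disturbing other nets.

    lut      -- the LUT the path needs (hypothetical naming)
    hops     -- routing resources along the path, in order
    in_use   -- resources currently occupied
    shared   -- resources already configured for this net's own fanout
    try_move -- relocates the LUT's occupant; returns False if it cannot
    """
    # Phase 1: free up the LUT, if required and possible.
    if lut in in_use and not try_move(lut):
        return False

    # Phase 2: step over the prefix that is already configured for this net.
    i = 0
    while i < len(hops) and hops[i] in shared:
        i += 1

    # Phase 3: every remaining resource must be free.
    return all(hop not in in_use for hop in hops[i:])


never_move = lambda lut: False
shares_prefix = is_usable("L1", ["w1", "w2", "w3"], {"w1"}, {"w1"}, never_move)
blocked = is_usable("L1", ["w1", "w2"], {"w2"}, set(), never_move)
```

Here `shares_prefix` is accepted because `w1` is occupied only by the same net's existing fanout, while `blocked` is rejected because `w2` belongs to another net.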