
In-Circuit Debugging Tool

In this section, we present the proposed in-circuit debugging method. It consists of two phases, the design (generalized) phase and the debugging (specialised) phase, and Figure 3.4 contrasts it with the conventional design flow.

The first stage is the design phase. Following the PConf flow, it creates a virtual network of multiplexers that contains Boolean functions and describes multiple signal sets. The parameterized debugging architecture is created during this phase by annotating the selection bits of the multiplexers that route the internal signals as parameters. The debugging infrastructure is added to the design, the design is then synthesized, placed and routed, and a generalized bitstream (containing Boolean functions) is created. This differs from a conventional bitstream: it describes a virtual FPGA overlay whose Boolean functions specify how signals are connected to the trace buffers.

[Figure 3.4 panels: Conventional (Step 1: create and implement the design; Step 2: add a debugging core to the design; Step 3: use an analyzer to debug the design, adapting the core and reimplementing as needed) and Proposed (Step 1: create and implement the design, adding the debugging hardware; Step 2: use an analyzer to debug the design)]

Figure 3.4: Outline of the conventional versus the proposed debugging process. The elimination of one step boosts runtime efficiency.

At the end of these steps, before the bitstream is loaded into the FPGA configuration memory, the tool evaluates the Boolean functions for the parameter values that denote which signals are observed and creates a specialised (conventional) bitstream. The intermediate steps that complete the design phase are the main focus of this work.
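To make the generalized-to-specialised step concrete, the following Python sketch models a PConf as a list of configuration bits, each either a constant or a Boolean function of the debugging parameters. It is an illustration under stated assumptions, not the tool's actual data format: the names `specialize` and `ConfigBit` and the parameter `s` are hypothetical, and a real PConf covers the whole configuration memory rather than four bits.

```python
from typing import Callable, Dict, List, Union

Params = Dict[str, bool]
# Assumption: a configuration bit is either a constant or a Boolean
# function of the debugging parameters (hypothetical representation).
ConfigBit = Union[bool, Callable[[Params], bool]]


def specialize(pconf: List[ConfigBit], params: Params) -> List[int]:
    """Evaluate every parameterized bit for concrete parameter values."""
    bits = []
    for bit in pconf:
        value = bit(params) if callable(bit) else bit
        bits.append(int(value))
    return bits


# Hypothetical 2:1 trace multiplexer with selection parameter "s":
# bit 2 closes the switch from signal A, bit 3 the switch from signal B.
pconf: List[ConfigBit] = [
    True,                  # user-design bit, not parameterized
    False,                 # user-design bit, not parameterized
    lambda p: not p["s"],  # route signal A to the trace buffer
    lambda p: p["s"],      # route signal B to the trace buffer
]

print(specialize(pconf, {"s": False}))  # [1, 0, 1, 0]
print(specialize(pconf, {"s": True}))   # [1, 0, 0, 1]
```

Evaluating the same PConf with a different value of `s` yields a different specialised bitstream without rerunning synthesis, placement or routing, which is the property the debugging phase relies on.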

The second stage, the debugging phase, performs the actual debugging. During this phase, the design can be reconfigured to observe different signals in each debugging turn. Signals that are not traced at the same time can share routing resources (depending on the parameter settings). This is outlined in Figure 3.5.

Each phase is described step-by-step in the remainder of this section.

3.4.1 Design phase

The method used to apply the proposed technique enables automatic generation of PConfs starting from the DUT and is based on the same steps as conventional FPGA tool flows: synthesis, technology mapping, placement and routing [56].

Figure 3.6 outlines this stage of the tool flow.

Custom Hardware Addition

At the beginning of the design phase, the tool locates the internal signals and adds a multiplexer network connected to them. Then, the trace buffers are integrated into the design.

In more detail, the tool reads the DUT, locates the internal signals and connects them to the debugging infrastructure randomly. The multiplexers' selection bits are annotated as parameters, which indicate whether or not a given signal is selected for observation in a certain debugging run. To achieve this, the multiplexers are implemented not in the regular logic resources but in the FPGA's reconfiguration resources (the configuration cells that control the routing switches), which significantly reduces the overhead.

[Figure 3.5 panels: Offline stage (compile time); Online stage (debug time); TCON]

Figure 3.5: Proposed debugging flow. The two distinct stages, offline and online, boost runtime efficiency.

The tool creates an overlay network that routes the candidate signals to the available trace buffers and automatically generates the annotation that will later enable the parameterization of the added hardware. Figure 3.7 shows, layer by layer, how signal parameterization is achieved.

The middle layer shows the FPGA and the signals that need to be observed. The virtual (overlay) layer then adds the infrastructure that multiplexes these signals to the trace buffers (top layer). Therefore, after the hardware addition, the DUT contains an integrated debugging infrastructure: the extended circuit, with tracing infrastructure attached to its signals, is almost the same size as the original circuit. Hence, FPGA resources no longer have to be reserved for the multiplexer network and the trace buffers before implementation.
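The hardware-addition step can be sketched as follows, assuming (for illustration only) that each candidate signal is connected to exactly one trace buffer's multiplexer and that the selection bits are binary-encoded. The helper names (`build_mux_network`, `selection_width`, `parameters_for`) and the parameter naming scheme `sel_tb<buffer>_<bit>` are hypothetical. The sketch shuffles the signals to mimic the random connection described above and then derives the parameter values that make a requested set of signals observable.

```python
import random
from typing import Dict, List

# Hypothetical model: trace-buffer index -> ordered list of signals on its multiplexer.
Network = Dict[int, List[str]]


def build_mux_network(signals: List[str], num_buffers: int, seed: int = 0) -> Network:
    """Randomly distribute the candidate signals over the trace-buffer multiplexers."""
    rng = random.Random(seed)
    shuffled = list(signals)
    rng.shuffle(shuffled)
    network: Network = {tb: [] for tb in range(num_buffers)}
    for i, sig in enumerate(shuffled):
        network[i % num_buffers].append(sig)  # deal round-robin after shuffling
    return network


def selection_width(num_inputs: int) -> int:
    """Selection bits of a num_inputs:1 multiplexer, i.e. ceil(log2(n)), at least 1."""
    return max(1, (num_inputs - 1).bit_length())


def parameters_for(network: Network, wanted: List[str]) -> Dict[str, bool]:
    """Parameter values that route every wanted signal to its trace buffer."""
    params: Dict[str, bool] = {}
    owner: Dict[int, str] = {}
    for sig in wanted:
        tb = next((t for t, ins in network.items() if sig in ins), None)
        if tb is None:
            raise ValueError(f"{sig} is not connected to the debugging network")
        if tb in owner:
            raise ValueError(f"{sig} and {owner[tb]} share trace buffer {tb}")
        owner[tb] = sig
        index = network[tb].index(sig)
        for b in range(selection_width(len(network[tb]))):
            params[f"sel_tb{tb}_{b}"] = bool((index >> b) & 1)  # binary-encoded select
    return params


signals = [f"n{i}" for i in range(8)]
net = build_mux_network(signals, num_buffers=2, seed=1)
print(net)
# {'sel_tb0_0': True, 'sel_tb0_1': True, 'sel_tb1_0': True, 'sel_tb1_1': False}
print(parameters_for(net, [net[0][3], net[1][1]]))
```

Parameters of buffers that receive no requested signal are left unspecified (don't-care). Two requested signals that land on the same buffer cannot be observed in the same run; they have to be traced in different debugging turns.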

The parameterized debugging infrastructure is integrated into the normal CAD flow in order to affect the critical path delay as little as possible, to avoid additional routing stress when new signals are to be traced, and to automate the process. Since the debugging infrastructure is added incrementally during compilation, it is optimized alongside the original design.

[Figure 3.6 blocks: HDL circuit; Synthesis (Quartus), producing a synthesised gate-level netlist; Signal-Set Select Toolset (SignalRank, GateRank, Signal Param); Add Debugging Infrastructure; Logic optimisation (ABC); parameterised netlist; Technology Mapping towards (T)LUTs & (T)CONs (TCONMAP); Packing (TPack); Placement; Routing (TRoute); Place & Route; .net netlist of logic & heterogeneous blocks; FPGA architecture description file; parameterised FPGA configuration; Verification Phase]

Figure 3.6: Schematic of the generic stage of the proposed tool flow.

Synthesis

At this point, the DUT is synthesized. The synthesis step can be performed by ABC or by any tool that can synthesize functional blocks for an FPGA flow and communicate directly with it. ABC is part of the VTR flow [97], a common academic FPGA CAD flow. This is needed because the next stage uses the graphs generated by ABC. Additionally, the synthesis tool must be able to pass on the parameterization information without altering the design.

[Figure 3.7 layers: FPGA Design; Overlay; Trace Buffers (TB)]

Figure 3.7: Demonstration of the separate layers: the user circuit, the parameterized multiplexers and the trace buffers.

TCON Technology Mapping

During technology mapping, the parameterized Boolean network of the added debugging infrastructure, generated during signal parameterization, is not mapped directly onto the resource primitives available in the target FPGA architecture. Instead, it is first mapped onto abstract primitives that introduce, and allow, the reconfigurability of the logic and routing resources: tunable LUTs (TLUTs) and tunable connections (TCONs). Therefore, the extra multiplexers added to guide the internal signals to the (also added) trace buffers have their selection bits parameterized into Boolean functions and are mapped onto these virtual abstract primitives.
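The idea behind mapping the trace multiplexers onto tunable connections can be illustrated with a small Python sketch: a parameterized 4:1 multiplexer is replaced by four connections, each enabled by a Boolean function of the selection parameters, so no LUTs are needed for the multiplexer itself. This is a simplified model, not TCONMAP's algorithm; `mux_to_tcons` and the parameter names `s0` and `s1` are hypothetical.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

Params = Dict[str, bool]
# (input signal, readable activation expression, activation function)
TCon = Tuple[str, str, Callable[[Params], bool]]


def mux_to_tcons(inputs: List[str], sel: List[str]) -> List[TCon]:
    """Replace a parameterized multiplexer by tunable connections (sketch).

    Input i drives the output iff the selection parameters encode i in binary,
    with sel[0] as the least significant bit.
    """
    tcons: List[TCon] = []
    for i, name in enumerate(inputs):
        bits = [bool((i >> b) & 1) for b in range(len(sel))]
        literals = [s if v else "!" + s for s, v in zip(sel, bits)]
        expr = " & ".join(reversed(literals))  # most significant bit first

        def active(p: Params, bits: List[bool] = bits) -> bool:
            # The connection is closed only for the matching selection code.
            return all(p[s] == v for s, v in zip(sel, bits))

        tcons.append((name, expr, active))
    return tcons


tcons = mux_to_tcons(["a", "b", "c", "d"], ["s0", "s1"])
for name, expr, _ in tcons:
    print(f"{name} -> out when {expr}")

# Sanity check: every parameter assignment enables exactly one connection.
for s0, s1 in product([False, True], repeat=2):
    p = {"s0": s0, "s1": s1}
    assert sum(fn(p) for _, _, fn in tcons) == 1
```

The printed activation functions (`!s1 & !s0` for `a` up to `s1 & s0` for `d`) are exactly what the configuration bits of the corresponding routing switches would become in a PConf.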

TPaR Placement and Routing

Next, the Tuneable Place and Route tool (TPaR) packs, places and routes the netlist using the TPack, TPlace and TRoute algorithms, respectively [12, 56, 147]. These algorithms implement the DUT in such a way that its routing resources can be reused to route a new subset of signals to the trace buffers during debugging. Hence, signals that are not traced at the same time can share routing resources (depending on the parameter settings).
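The sharing rule above can be illustrated with a brute-force check: two parameterized connections may use the same routing wire only if no parameter assignment activates both. The Python sketch below encodes this condition, assuming each connection's activation is given as a Boolean function of the parameters; `can_share_wire` and the example connections are hypothetical and not part of TRoute, and exhaustive enumeration only scales to a handful of parameters.

```python
from itertools import product
from typing import Callable, Dict, List

Params = Dict[str, bool]
Activation = Callable[[Params], bool]


def can_share_wire(f: Activation, g: Activation, params: List[str]) -> bool:
    """True if no assignment of the parameters activates both connections."""
    for values in product([False, True], repeat=len(params)):
        p = dict(zip(params, values))
        if f(p) and g(p):
            return False
    return True


# Hypothetical connections controlled by the selection parameter "s".
def sig_a(p: Params) -> bool:
    return not p["s"]  # signal A is routed to the trace buffer when s = 0


def sig_b(p: Params) -> bool:
    return p["s"]  # signal B is routed to the trace buffer when s = 1


def user_net(p: Params) -> bool:
    return True  # a fixed connection of the user design


print(can_share_wire(sig_a, sig_b, ["s"]))     # True: never active together
print(can_share_wire(sig_a, user_net, ["s"]))  # False: both active when s = 0
```

A practical router needs a more scalable test than enumeration, but the condition it has to decide is the same: mutually exclusive activation functions can reuse one wire.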

3.4.2 Debugging phase

At the end of the computationally intensive design phase, TPaR produces a PConf.

The debugging phase is the specialised stage of the flow, in which the DUT's trigger is initiated and debugging can start. In each debugging turn, the design can be reconfigured to trace a different set of signals, while signals that are not traced at the same time share routing resources (depending on the parameter settings). In each debugging cycle, the designer's new signal selection translates directly into a new evaluation of the Boolean functions that represent the selected signals, and the multiplexer network is then partially reconfigured, using Dynamic Partial Reconfiguration, to route exactly the signals the designer wishes to trace.

The multiplexer network added by the signal parameterization tool is reconfigured with the specialised solution, which is evaluated according to the new signals that have to be observed. To observe a new set of signals, a specialised configuration is generated automatically, with minimal user interaction and without halting the DUT. The Specialized Configuration Generator evaluates the Boolean functions for specific parameter values to generate a specialized bitstream. It is usually implemented on an embedded processor, which is also responsible for writing the specialized bitstream into the configuration memory through the HWICAP. It is worth noting that only the configuration cells of the routing switch boxes and of the connection boxes for the memory resources are reprogrammed, instead of performing a full recompilation and/or reconfiguration, as is the case in related work. Therefore, implementing a PConf avoids extra recompilations during debugging: selecting new signals only requires evaluating Boolean functions and partially reconfiguring the affected cells, rather than recompiling and fully reconfiguring the design.
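One debugging turn on the embedded processor can be sketched as follows: evaluate the PConf for the new parameter values, compare the result with the configuration currently loaded, and rewrite only the frames that changed. In this Python sketch, `FRAME_SIZE`, `debug_turn` and the `write_frame` callback are hypothetical stand-ins; in particular, `write_frame` takes the place of the HWICAP driver calls, whose real interface is not modelled here.

```python
from typing import Callable, Dict, List

Params = Dict[str, bool]
ConfigBit = Callable[[Params], bool]

FRAME_SIZE = 4  # hypothetical frame size in bits, for illustration only


def specialize(pconf: List[ConfigBit], params: Params) -> List[int]:
    """Evaluate every configuration-bit function for the given parameters."""
    return [int(f(params)) for f in pconf]


def changed_frames(old: List[int], new: List[int]) -> List[int]:
    """Indices of the frames that contain at least one modified bit."""
    return sorted({i // FRAME_SIZE for i, (a, b) in enumerate(zip(old, new)) if a != b})


def debug_turn(pconf: List[ConfigBit], loaded: List[int], params: Params,
               write_frame: Callable[[int, List[int]], None]) -> List[int]:
    """Specialize for new parameters and rewrite only the frames that changed."""
    new = specialize(pconf, params)
    for f in changed_frames(loaded, new):
        # Hypothetical hook standing in for the HWICAP write of one frame.
        write_frame(f, new[f * FRAME_SIZE:(f + 1) * FRAME_SIZE])
    return new


# Frame 0: fixed user-design bits. Frame 1: one-hot switches of a 4:1 trace mux.
pconf: List[ConfigBit] = [lambda p: True, lambda p: False, lambda p: True, lambda p: True]
pconf += [lambda p, i=i: 2 * p["s1"] + p["s0"] == i for i in range(4)]

written = []
loaded = specialize(pconf, {"s0": False, "s1": False})         # trace input 0
loaded = debug_turn(pconf, loaded, {"s0": True, "s1": False},  # now trace input 1
                    lambda f, bits: written.append((f, bits)))
print(written)  # [(1, [0, 1, 0, 0])]
```

Frame 0, which holds only user-design bits, is never rewritten; only the frame holding the trace-multiplexer switches changes, consistent with reprogramming just the debugging network while the DUT keeps running.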

Moreover, there are no FPGA resources dedicated to the inserted multiplexers.