3.6 Results and Discussion
3.6.1 Offline Comparison of Area Utilization
First, the debugging infrastructure, which incrementally adds a multiplexer network connected to the trace-buffer infrastructure, is integrated into the design. It is assumed that the trace-buffers already exist in the FPGA and that the user adds only an overlay network that can route all possible signals to all possible trace-buffers.
For the comparison of area utilization, the benchmarks were synthesized with Vivado 2016.1 and the two PConf tool flows. The FPGA architectures used were a Virtex-7 (in Vivado) and a theoretical architecture that mimics the basic structures of commercial architectures but offers full transparency of the configurable infrastructure, so that it is supported by the PConf tool flows. All architectures contain 6-input LUTs for a fair comparison.

       Vivado         ILA              ABC    PPD    FPD
       LUT    FF      LUT     FF       LUT    LUT    LUT
b12    224    119     15538   19715    309    343    329
b13    37     51      4092    5552     51     72     59
b14    2165   219     46749   53316    1047   1165   1101
b15    1948   416     26710   33464    1952   2015   1979

Table 3.1: Logic utilization of the debugging infrastructure for the benchmarks implemented with different tools. The original design is mapped with Vivado and ABC. The two proposed methods are PPD (partially parameterized debug) and FPD (fully parameterized debug). ILA is the trace-based debugger by Xilinx and ABC is an academic synthesis tool that supports PConf.
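The ratios discussed for Table 3.1 can be recomputed directly from its LUT columns. The following Python sketch is only a reading aid: the dictionary layout and the helper names (`ratio`, `overhead_percent`) are assumptions introduced here and are not part of the Vivado or PConf tool flows. ABC is taken as the baseline for PPD and FPD because both are mapped on the theoretical architecture.

```python
# LUT counts copied from Table 3.1 (the FF columns are omitted).
TABLE_3_1 = {
    "b12": {"vivado": 224, "ila": 15538, "abc": 309, "ppd": 343, "fpd": 329},
    "b13": {"vivado": 37, "ila": 4092, "abc": 51, "ppd": 72, "fpd": 59},
    "b14": {"vivado": 2165, "ila": 46749, "abc": 1047, "ppd": 1165, "fpd": 1101},
    "b15": {"vivado": 1948, "ila": 26710, "abc": 1952, "ppd": 2015, "fpd": 1979},
}


def ratio(bench, numerator, denominator):
    """Return LUTs(numerator) / LUTs(denominator) for one benchmark."""
    row = TABLE_3_1[bench]
    return row[numerator] / row[denominator]


def overhead_percent(bench, flow, baseline="abc"):
    """Return the LUT overhead of `flow` over `baseline` in percent."""
    return 100.0 * (ratio(bench, flow, baseline) - 1.0)


if __name__ == "__main__":
    for bench in TABLE_3_1:
        print(
            f"{bench}: ILA/PPD = {ratio(bench, 'ila', 'ppd'):.1f}x, "
            f"ILA/FPD = {ratio(bench, 'ila', 'fpd'):.1f}x, "
            f"FPD over ABC = {overhead_percent(bench, 'fpd'):.1f}%"
        )
```

For b13, for instance, the ILA-to-PPD ratio evaluates to about 56.8. Per-benchmark ratios of this kind are not averages, so they need not coincide with the aggregate figures quoted in the text.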
Multiplexer Network
The proposed debugging architecture (the one attached to each internal signal) needs tunable connections and tunable LUTs. If it is implemented with the conventional tools, it needs up to 56× more LUTs to ensure that all signals can be linked to the trace-buffers. The proposed architecture has only a small impact on the number of physical LUTs of each design. This is because the debugging infrastructure is parameterized and integrated in the reconfiguration resources. Hence, the number of physical resources remains almost constant. The multiplexer network creates virtual connections between the signals and the trace-buffers, as explained in Section 3.3, which reduces the debugging overhead significantly.
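The difference between a conventional and a parameterized multiplexer network can be illustrated with a small behavioural model. The sketch below is a simplified illustration and not the PConf implementation: the class `MuxNetwork`, the functions `mux_tree_luts` and `capture`, and the LUT estimate (one 6-input LUT per 4:1 multiplexer) are assumptions made here. It shows that once the select values are treated as parameters, each multiplexer collapses into a fixed connection between one signal and one trace-buffer.

```python
from math import ceil


def mux_tree_luts(num_inputs):
    """Rough LUT estimate for one num_inputs:1 multiplexer whose select
    lines are ordinary signals. Assumption: a 6-input LUT holds one 4:1
    multiplexer (4 data + 2 select inputs), so each LUT removes up to
    three signals from the tree."""
    return ceil((num_inputs - 1) / 3) if num_inputs > 1 else 0


class MuxNetwork:
    """Overlay that can connect any observed signal to any trace-buffer."""

    def __init__(self, signals, num_buffers):
        self.signals = list(signals)
        self.num_buffers = num_buffers

    def specialize(self, select):
        """Treat `select` as parameters: return the fixed routing
        {buffer index: signal name} that remains for these values."""
        if len(select) != self.num_buffers:
            raise ValueError("one select value per trace-buffer is required")
        routing = {}
        for buf, idx in enumerate(select):
            if not 0 <= idx < len(self.signals):
                raise ValueError(f"select value {idx} is out of range")
            routing[buf] = self.signals[idx]
        return routing


def capture(routing, values):
    """Sample the routed signals, ordered by trace-buffer index."""
    return [values[routing[buf]] for buf in sorted(routing)]


if __name__ == "__main__":
    net = MuxNetwork(["a", "b", "c", "d", "e"], num_buffers=2)
    routing = net.specialize([4, 1])  # buffer 0 <- "e", buffer 1 <- "b"
    print(routing, capture(routing, {"a": 0, "b": 1, "c": 0, "d": 1, "e": 1}))
    print("conventional LUTs per buffer:", mux_tree_luts(len(net.signals)))
```

In this model, observing a different set of signals corresponds to calling `specialize` with new select values, which mirrors a reconfiguration of the virtual connections rather than a change in the number of physical LUTs.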
Integrated Logic Analyzer
To support commercial FPGA architectures (Xilinx Virtex-7), a design and verification process similar to the one described in Section 3.4 is followed, with minor modifications. The basic difference is that Vivado’s Integrated Logic Analyzer (ILA) core is used instead of any parameterization-based optimization. The core has been configured so that it can debug as many signals as possible, with the trigger and probe settings kept minimal to maximize the area savings. However, even though the area overhead of the debugging core is kept minimal, there is a massive increase in the used resources. As a result, two designs (b14, b15) cannot be placed on the FPGA, which requires a larger device and a repetition of the whole process. The results are given in the ILA column of Table 3.1.
With our proposed methods, the only area overhead is a minimal increase in the number of physical FPGA resources (LUTs) after the installation of the debugging architecture. This is expected, as not all the multiplexers’ inputs are parameterized. There is a large increase in the virtual FPGA resources (TLUTs and TCONs), but with minimal impact on the design’s area and performance. The results are shown in Table 3.1. The benchmarks were first synthesized with the Vivado and ABC tools. The Vivado and ABC columns show the area in number of LUTs for each benchmark without any debugging infrastructure. The ILA column shows the area impact after the instantiation of the debugging core on a commercial device. The PPD column shows the Partially Parameterized Debugging (PPD) results, obtained with the TLUT flow, where only the LUTs were parameterized. The FPD column shows the Fully Parameterized Debugging (FPD) results, where the complete debugging infrastructure has been parameterized with the TCON tool flow (parameterized routing infrastructure). We can observe that the proposed tool flows (last two columns) need significantly fewer debugging resources than the vendor tool (ILA).

Figure 3.8: Graphical representation of the area results (in LUTs) and the number of parameterized resources (TLUTs, TCONs). LUT in the pie diagrams denotes the total number of LUTs needed (including those that contain parameterized resources). TLUT and TCON denote the number of parameterized resources needed with respect to the LUTs.
Figure 3.8 visualizes the number of actual and parameterized resources needed for the debugging infrastructure. The lower part of the figure shows the area of the virtual multiplexer network mapped to TLUTs and TCONs instead of actual resources. The figure illustrates how the virtual resources (TLUTs in PPD and TCONs in FPD) allow the designs to maintain a low area overhead despite the insertion of the debugging infrastructure. We can observe that, due to the virtualization, the area overhead increases by up to 10% with the proposed method.
The results indicated as PPD contain LUTs and TLUTs, while the results indicated as FPD contain LUTs and TCONs. In general, the commercial tool can use fewer LUTs for the golden design (Vivado column) than the tools that target theoretical architectures (ABC column). This is probably because the commercial tools contain better optimizations for numerical operations than the academic tools, which target generic theoretical FPGA architectures. However, designs with more multiplexers are better optimized by the PConf tools. TCONs in particular are heavily used to implement the multiplexing between internal signals and trace-buffers in the debugging architecture, because multiplexers are well suited to optimizations of shared resources.
One can observe in Table 3.1 that, with Vivado 2016.1, the implementations on a Virtex-7 are on average 21.3× larger than the proposed implementations. The proposed architecture (FPD) with the integrated debugging functionality is in fact 11× smaller than the design integrated with Vivado’s ILA and 1.9× larger than the golden application (ABC column). Therefore, the debugging functionality can be integrated in a given design with an area penalty of approximately 10%. However, debugging can be more complicated for some kinds of designs, such as those with third-party IPs that realize high-frequency, high-density communication (neural networks) or those with high DSP utilization. In these cases, the proposed approach should be integrated in the Vivado tool flow in such a way that it can still access the internal signals within the IP. This is investigated in Chapter 4.

       Vivado                           Proposed
       AXI-HWICAP   AXI-HWICAP (DMA)    HWICAP   MiCAP-Pro
b12    480          1690                3.2      1.03
b13    500          1684                2.9      0.83
b14    2050         7500                13.8     3.846
b15    2420         8530                8.97     2.49

Table 3.2: Comparison of reconfiguration time between the proposed technique and the Vivado-based implementation.