Generic CAD Tools for FPGAs - Related Work and State of the Art

3. Related Work and State of the Art


by a multitude of programmable logic cells of the underlying physical platform. This is the price to be paid for portability and higher flexibility, and it is essentially analogous to the comparison of FPGA vs. ASIC. The area overhead factor of a virtual FPGA over its underlying physical FPGA platform depends mainly on the granularity of the underlying platform and on how well the virtual resources can be matched by the physical resources. This factor is specific to each combination of virtual architecture and underlying platform. Thus, the same virtual FPGA has a different area efficiency on one underlying platform than on another, and a change in the design parameters of the virtual FPGA can reverse the ranking. This is why it has been difficult to compare the few existing virtual FPGA architectures with each other. Any context-free conclusion about the superiority of one virtual architecture over the others is of limited validity, not only due to the lack of transferable quantification but also due to the different purposes and capabilities of the existing solutions. To bring more transparency regarding area efficiency and to facilitate future comparisons, Chapter 6.3 introduces transferable metrics that can be applied to virtual architectures for various target technologies.

3.4. Generic CAD Tools for FPGAs

The basic purpose of CAD tools for FPGAs is to map an application onto an FPGA architecture and to generate a bitstream that configures the resources of the FPGA so that they form the circuit implementing the application.

Existing vendor tools for COTS FPGAs generally do not support custom FPGA architectures. Over the past two decades, a number of academia-driven efforts have provided parameterizable tools for exploring custom architectures and mapping applications onto them.

SIS, a system for sequential circuit synthesis presented in [88], is a framework for testing different algorithms and for synthesizing and optimizing sequential circuits. It accepts any one of the following as input: a state transition table, a signal transition graph or a logic-level description. It generates an optimized netlist for the underlying technology while preserving the input-output behavior.

In 2006, a technique based on And-Inverter Graphs (AIGs) was introduced in [75]. An AIG represents combinational logic as a network of two-input ANDs and inverters. By alternating between AIG rewriting and AIG balancing, area is optimized without increasing delay and delay is optimized without increasing area, respectively. Implemented in the sequential logic synthesis and verification tool ABC, this technique proved faster than SIS and MVSIS while offering better quality, which is beneficial for applications like hardware emulation, estimation of design complexity and equivalence checking.
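The AIG representation can be illustrated with a minimal sketch. The class below is a hypothetical toy model, not the data structure of ABC: the literal encoding, all names and the hand-built "balanced" variant are assumptions made for illustration. It shows that a chain of ANDs and a balanced tree of ANDs compute the same function with the same number of AND nodes, while the balanced form has fewer logic levels.

```python
class AIG:
    """Toy And-Inverter Graph (illustrative only, not ABC's implementation).

    A literal is an int: 2*node for the plain signal, 2*node + 1 for its
    inversion. Node 0 is reserved for the constant FALSE.
    """

    def __init__(self):
        self.input_names = {}   # node id -> primary input name
        self.ands = {}          # node id -> (literal, literal)
        self._by_fanin = {}     # (literal, literal) -> node id
        self._next_node = 1

    def _new_node(self):
        node = self._next_node
        self._next_node += 1
        return node

    def add_input(self, name):
        node = self._new_node()
        self.input_names[node] = name
        return 2 * node

    def and_(self, a, b):
        # Structural hashing: reuse an existing AND with the same fan-ins.
        key = (min(a, b), max(a, b))
        if key not in self._by_fanin:
            node = self._new_node()
            self.ands[node] = key
            self._by_fanin[key] = node
        return 2 * self._by_fanin[key]

    @staticmethod
    def not_(a):
        # An inverter is just the flipped lowest bit of the literal.
        return a ^ 1

    def evaluate(self, literal, values):
        node, inverted = literal >> 1, bool(literal & 1)
        if node == 0:
            value = False
        elif node in self.ands:
            a, b = self.ands[node]
            value = self.evaluate(a, values) and self.evaluate(b, values)
        else:
            value = bool(values[self.input_names[node]])
        return value != inverted

    def depth(self, literal):
        # Number of AND levels between the literal and the primary inputs.
        node = literal >> 1
        if node not in self.ands:
            return 0
        a, b = self.ands[node]
        return 1 + max(self.depth(a), self.depth(b))

    def cone(self, literal):
        # Set of AND nodes in the transitive fan-in of the literal.
        node = literal >> 1
        if node not in self.ands:
            return set()
        a, b = self.ands[node]
        return {node} | self.cone(a) | self.cone(b)


g = AIG()
a, b, c, d = (g.add_input(name) for name in "abcd")
chain = g.and_(g.and_(g.and_(a, b), c), d)       # ((a & b) & c) & d
balanced = g.and_(g.and_(a, b), g.and_(c, d))    # (a & b) & (c & d)
print(g.depth(chain), g.depth(balanced))         # logic levels of each form
print(len(g.cone(chain)), len(g.cone(balanced))) # AND nodes of each form
```

In this sketch the balanced form is constructed by hand; an actual balancing pass would derive it automatically from the chain.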

Quartus Integrated Synthesis (QIS), made available through the Quartus University Interface Program (QUIP) [7], can be used in two modes: either as a comparison tool or as a front end that converts VHDL/Verilog design code into formats used by the academic tools. The tool accepts input not only as VHDL/Verilog code but also as schematics, instantiated LPM modules and IP cores. It supports language constructs such as FOR, GENERATE and GENERIC, finite-state machines, RAM and multipliers by converting them into synthesizable subsets, logic, embedded memory blocks and DSP blocks, respectively.

[14] describes a CAD tool called Versatile Place and Route (VPR) for FPGA architectures. It takes the mapped netlist and an architectural description of the targeted FPGA as input and performs placement and routing, the latter either as global routing only or as a combination of global and detailed routing. The output also helps to determine routed wire length, track count and maximum net length. Although the architecture description does not include segments spanning more than one logic block, the tool is flexible enough for new routing architecture features to be added.
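What the three routing figures mean can be made concrete with a small sketch. The data model below is an assumption for illustration and does not reflect VPR's file formats or internals: each routed net is given as the list of channel segments it occupies, one unit of wire per segment, and the track count is taken as the largest number of nets sharing a single channel segment.

```python
from collections import Counter


def routing_stats(routed_nets):
    """Summarize a routing result.

    routed_nets maps a net name to the channel segments it occupies
    (hypothetical representation, one unit of wire length per segment).
    """
    # Count how many distinct nets pass through each channel segment.
    usage = Counter(seg for segs in routed_nets.values() for seg in set(segs))
    lengths = {net: len(set(segs)) for net, segs in routed_nets.items()}
    return {
        "total_wire_length": sum(lengths.values()),
        "max_net_length": max(lengths.values(), default=0),
        # Tracks needed in the most congested channel segment.
        "track_count": max(usage.values(), default=0),
    }


example = {
    "n1": ["ch0", "ch1", "ch2"],
    "n2": ["ch1", "ch2"],
    "n3": ["ch2"],
}
stats = routing_stats(example)
print(stats)
```

Here segment `ch2` is shared by all three nets, so three tracks are needed there even though the longest net only spans three segments.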

Following [14], VPR 5 [64] adds four new features to the VPR tool: single-driver routing, modeling of heterogeneous logic blocks such as hard memories and multipliers, optimized electrical models for different process technologies, and a set of regression tests that verify the functionality and quality of the output results in order to maintain the robustness of the tool.

A rich toolset called MEANDER, which consists of unmodified academic tools (FreeHDL, SIS, T-VPACK), modified academic tools (E2FMT, ACE, VPR) and new tools (DIVINER, DRUID, DUTYS, DAGGER), was presented in [96]. Taking the VHDL design files of the application as input, this toolset performs all necessary steps: elaboration, format translation, synthesis, logic optimization, activity estimation, packing, placement and routing onto custom island-style FPGA architectures. It also includes a tool for bitstream generation, which however is not compatible with custom FPGA architectures other than the AMDREL FPGA. Another specialty of MEANDER is a web interface for operating the tools on a remote server (currently hosted at [69]) from any web browser or through ssh, without the need for on-site installation.

NAROUTO [93] is an open-source framework for architecture-level exploration of heterogeneous FPGAs in terms of delay, area and power/energy estimation. To automate the annotation of the generated netlist, a new toolset called Heterogeneous Support Toolset (HST) has also been developed. One of the merits of this framework is its ability to handle designs with IP cores more efficiently.

Even though the tools named above are flexible and cover most of the steps needed for application mapping, some parts are still missing for a complete toolflow from design entry to the final bitstream. The employed architecture models are very abstract, which is good for design space exploration but somewhat decoupled from actual implementation. Most critically, none of the existing tools is able to create executable bitstreams for custom FPGAs, because they were intended mainly for exploration purposes and either lack a bitstream generation tool as backend or offer no way to bring in details about custom configuration mechanisms and organization. Note that MEANDER contains the tool DAGGER for bitstream generation for the AMDREL FPGA architecture; however, the tool is not suitable for other custom and virtual FPGAs. Furthermore, the existing tools offer no way to manipulate and verify the application mapping results in detail through a GUI. To close these gaps, the framework proposed in this thesis contains a new tool called V-FPGA Explorer. It complements the tools named above by bridging abstract layout and actual configuration. It transforms the textual synthesis and layout results of the other tools into an object-oriented representation suitable for graphical display, taking the custom architectural mechanisms into account, and generates the final bitstreams to configure a custom V-FPGA. At any time the layout and function can be altered through a GUI or through XML files. It supports a rich set of parameters for architecture customization of the V-FPGA and is capable of generating custom architecture abstracts (so-called architecture files) that are required by the place & route and DSE tools. Additional features are testbench generation for simulation and script generation for parameter sweeps and for running VPR in batch mode to automate benchmarking and extend DSE capabilities. In conjunction with QUIP, ABC, SIS, VPR and MEANDER it forms a powerful and rather complete toolflow for application mapping onto custom V-FPGA architectures.
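The idea of script generation for parameter sweeps can be sketched as follows. This is not the V-FPGA Explorer implementation: the parameter names (`lut_size`, `channel_width`), the wrapper command `run_flow` and its options are hypothetical placeholders, and the command template is deliberately left to the caller so that no particular tool invocation is assumed.

```python
from itertools import product


def sweep_points(parameters):
    """Return one dict per combination of the given parameter values."""
    names = sorted(parameters)
    return [dict(zip(names, values))
            for values in product(*(parameters[name] for name in names))]


def batch_script(parameters, command_template):
    """Render a shell script with one command line per sweep point."""
    lines = ["#!/bin/sh"]
    for point in sweep_points(parameters):
        lines.append(command_template.format(**point))
    return "\n".join(lines) + "\n"


# Hypothetical architecture parameters and a hypothetical wrapper command.
params = {"lut_size": [4, 6], "channel_width": [8, 12, 16]}
template = "run_flow --lut-size {lut_size} --channel-width {channel_width}"
script = batch_script(params, template)
print(script)
```

Enumerating the full Cartesian product keeps the sketch simple; a real sweep may want to prune combinations that are known to be infeasible.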

3.5. 3D FPGA Architectures

Since routing resources and interconnect account for the majority of area and delay, there is a growing focus on 3D interconnects and routing architectures, realized by 3D stacking of multiple die layers with through-silicon vias (TSVs) or microbumps. In most cases, the vertical connections between layers are established either in switch boxes or in logic blocks.

The Rothko 3D-FPGA introduced in [70] and [59] is based on stacked layers of a sea-of-gates architecture with metal interconnections (called interlayer vias) between layers. The vertical connections are made through the unified RLBs (Routing & Logic Blocks): as shown in Figure 3.10, an RLB can connect to its adjacent neighbours within a layer as well as to one RLB above and one RLB below in other layers. Each layer allows horizontal routing in one direction only. Since two layers are stacked face-to-face and thus have opposite routing directions, the routing direction of a path can be changed by changing the layer through the vertical interconnects. Logic-wise, an RLB contains a 3-input LUT.
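The effect of opposite routing directions on stacked layers can be illustrated with a strongly simplified model. The sketch below is an assumption-laden toy, not the Rothko routing fabric: it considers a single row of RLBs, lets layer 0 route only towards increasing positions and layer 1 only towards decreasing positions, and allows a layer change at every position. A breadth-first search then shows that a path has to change layers in order to reverse its direction.

```python
from collections import deque


def route(src, dst, row_length, layer_direction=(+1, -1)):
    """Shortest path between two (position, layer) nodes, or None.

    layer_direction[i] is the only horizontal step allowed on layer i
    (hypothetical simplification of one-directional routing per layer).
    """
    def neighbours(node):
        pos, layer = node
        candidates = [
            (pos + layer_direction[layer], layer),  # horizontal hop
            (pos, layer - 1),                       # via to the layer below
            (pos, layer + 1),                       # via to the layer above
        ]
        return [(p, l) for p, l in candidates
                if 0 <= p < row_length and 0 <= l < len(layer_direction)]

    previous = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in neighbours(node):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None


forward = route((0, 0), (3, 0), row_length=5)   # stays on layer 0
backward = route((3, 0), (1, 0), row_length=5)  # must detour via layer 1
print(forward)
print(backward)
```

In this model the backward connection costs two additional vertical hops, one to reach the layer with the opposite direction and one to return.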

In [33], a 3D pipelined asynchronous FPGA is presented in which vertical connections between stacked layers are made through the SBs (switch boxes) of the routing infrastructure using pipelined 3D switches.

A model-based study of a monolithically stacked 3D FPGA is presented in [61] and [62]. Here the stacking is envisioned to take place within the same die (see Figure 3.11), which allows a higher density of vertical interconnects than chip or wafer stacking. However, the monolithic approach is limited to adding only switch and configuration memory layers on top of the usual layers of a 2D FPGA, i.e. the logic remains in a CMOS layer while only switch transistors and configuration memory cells are moved to additional mask layers.

A 2-layer 3D-FPGA approach in which routing and logic area are mostly separated onto different layers is presented in [113]. As shown in Figure 3.12, the logic and a small part of the routing are on the first layer, while the second layer contains only routing. In contrast to most other approaches, the 3D connections are located not at the switch blocks but at the inputs and outputs of the logic blocks. The authors showed that building 3D connections on the input and output pins of logic blocks can reduce the channel width for the MCNC benchmarks by 57% on average.

Figure 3.10.: Rothko 3D FPGA [59]: a) routing structure of a layer, b) connectivity between RLBs (figure labels: Pin, RLB, connection to layer above, connection to layer below)

Figure 3.11.: Lin et al.: Monolithically stacked 3D FPGA [61] (figure labels: SB, CB, LB; memory layer, switch layer, CMOS layer)


The architecture-level exploration of Siozios et al. in [94] indicates that stacking FPGA layers with TSVs in switch boxes can, on average for the MCNC benchmarks, reduce total wire length by 13% and power consumption by 32% while increasing overall performance by 35%. Further improvements in [97], using a heterogeneous mix of 2D and 3D switch boxes and different regions, reduce the number of vertical interconnects without penalty, making an area reduction of 37% possible compared to homogeneous 3D FPGAs. Compared to 2D FPGAs, they show an improvement of 41% in delay, 32% in total power consumption and 36% in total wire length.

Similar to [43], [84], [99], [94] and [97], the 3D V-FPGA architecture presented in this thesis uses TSVs through SBs to establish connections between stacked FPGA layers. Compared to the other option of providing vertical interconnects through logic blocks, this choice is justified primarily by tool support, scalability and the possibility to support heterogeneous layers if needed. However, the 3D V-FPGA differs from the prior art in that each port of a PSM can have an exclusive TSV connecting it to the identical port of a PSM on a different layer. A technique called LoopbackPropagation (see Section 4.1.4) then allows the TSV signal to be routed to other ports of the same PSM as well. This can reduce the overall number of switches and configuration bits. Furthermore, the number and distribution of TSVs can be parameterized in the 3D V-FPGA architecture.

4. V-FPGA: Virtual Field Programmable Gate

In document The Customizable Virtual FPGA: Generation, System Integration and Configuration of Application-Specific Heterogeneous FPGA Architectures (Page 51-57)