Hardware construction languages provide additional expressibility and parameterizability to hardware designers, greatly encouraging the development of reusable hardware libraries. While successful in their own right, HCLs lack the ability to fully separate their source code from underlying platforms or technologies. The next chapter discusses how a hardware compiler framework enables this separation and provides additional capabilities to Chisel hardware libraries.
Figure 3.4: Three processors (Rocket, BOOM, and DecVec) reuse each other’s code. Modules used by all three designs include an ALU, a MulDiv unit, an ICache, a TLB, a Decoder, and an FPU. Modules used by Rocket and BOOM include a non-blocking data cache, a PTW, a CSR, and a BTB.
Chapter 4
FIRRTL: A Hardware Intermediate Representation and Compiler Framework
Just as software compilers transform general-purpose code into specialized assembly, a hardware compiler transforms general RTL into specialized RTL. The FIRRTL compiler enables this automatic transformation of a design, unlocking a huge amount of potential through optimizations and other generic transformations. This potential is best understood through two main features of the FIRRTL compiler framework: (1) its reusable transformations, and (2) its extensibility for customizations.
At first glance, writing an RTL transformation may seem like over-engineering; if a user wants to inline a module, why write an entire transformation when inlining manually is not very difficult? The key observation is that inlining is a common procedure required for most physical design implementations, and thus automation via transformation saves significant future manual effort. Indeed, writing a transformation only saves effort if its use case is common enough that all future uses amortize the cost of the initial transformation development. This argument is similar to the argument in Chapter 1 for how reusable hardware libraries amortize their development costs over time. By making transformations easy to write and integrate within a compiler framework, the upfront development cost of a transformation is reduced and the number of tasks worth automating increases.
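The amortization argument can be stated as a simple break-even condition. The symbols below are introduced here purely for illustration and do not appear elsewhere in the text: let C_dev be the one-time cost of developing a transformation, c_man the cost of performing the task manually once, c_auto the cost of invoking the transformation once, and n the number of future uses.

```latex
\[
  C_{\mathrm{dev}} + n\,c_{\mathrm{auto}} \;<\; n\,c_{\mathrm{man}}
  \quad\Longleftrightarrow\quad
  n \;>\; \frac{C_{\mathrm{dev}}}{c_{\mathrm{man}} - c_{\mathrm{auto}}},
  \qquad \text{assuming } c_{\mathrm{man}} > c_{\mathrm{auto}}.
\]
```

Under this simplified model, reducing C_dev lowers the number of uses needed to break even, so tasks that recur less often can still be worth automating.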
To motivate the need for an extendable hardware compiler infrastructure, consider the following example: a streaming digital-signal processing (DSP) hardware library. Every component in this library has a decoupled interface, where a queue of unknown size could be instantiated between each component. An unfortunate and unavoidable consequence of this library is that, if the queue size is zero, then the decoupled ready and valid signals between the components are vestigial yet form a combinational path. Ideally these signals would never be generated, but detecting this circumstance depends on knowing a neighbor’s configuration; the local information available to a given library component generator is not sufficient.
Using an extendable hardware compiler framework enables the authors of this streaming DSP library to analyze the entire design topology and remove these vestigial combinational paths automatically by writing and integrating their own custom transformation. Supporting this use case requires the compiler framework to inspect and modify a design, be extendable with custom transformations, and support a robust mechanism for communicating information throughout the compilation process (e.g. which signals were generated by the library, so that other combinational paths remain untouched).
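The core decision of such a custom transformation can be sketched on a toy model of the design topology. This is a minimal illustration and not FIRRTL's API: the `Link` record, the `vestigial_handshakes` function, and the component names are all hypothetical, and the set `library_generated` stands in for the metadata that marks which components the library produced.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """Hypothetical record of one decoupled connection in the design."""
    producer: str
    consumer: str
    queue_depth: int  # 0 means no queue was instantiated on this link


def vestigial_handshakes(links, library_generated):
    """Select the links whose ready/valid signals may be removed.

    A link qualifies only if its queue depth is zero AND both endpoints were
    generated by the library, so unrelated combinational paths are untouched.
    """
    return [
        link
        for link in links
        if link.queue_depth == 0
        and link.producer in library_generated
        and link.consumer in library_generated
    ]


# Toy design: "dma" is not a library component, so its link is left alone.
links = [
    Link("fir", "fft", 0),
    Link("fft", "mixer", 4),
    Link("mixer", "dma", 0),
]
library_generated = {"fir", "fft", "mixer"}
removable = vestigial_handshakes(links, library_generated)
```

The selection needs both a neighbor's configuration (the queue depth) and provenance metadata, which is exactly the whole-design information a single component generator lacks.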
This chapter discusses FIRRTL’s hardware compiler framework (HCF) through the following topics: (1) an analysis of LLVM, an existing software compiler framework; (2) an introduction to FIRRTL’s intermediate representation (IR); (3) a description of value inference (e.g. widths) and its implementation; (4) support for arbitrary metadata throughout the compilation process; (5) the mechanisms for transforms to inspect and modify a design; (6) a description of interesting transformations; and (7) an evaluation of the FIRRTL compiler framework.
4.1 Background
Modern software compiler frameworks, such as LLVM [27], consist of (1) frontends, (2) transformations, and (3) backends. A frontend parses programs written in a specific programming language (e.g. C++ or Rust) into a compiler-specific IR. IR-to-IR transformations, such as optimization passes, can then operate on and modify the program’s structure. Finally, a backend converts the IR into a program in the target ISA, e.g. ARM or x86. This structure of translating an input language into an IR enables reusing transformations among multiple designs and languages.
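The reuse that this structure enables can be sketched with a toy pipeline. Everything here is hypothetical and deliberately simplistic: the IR is just a list of instruction strings, and the frontends, the `drop_nops` transformation, and the backends are stand-ins rather than anything from LLVM or FIRRTL.

```python
from typing import Callable, List

IR = List[str]  # toy IR: a flat list of instruction strings


def make_compiler(frontend: Callable[[str], IR],
                  transforms: List[Callable[[IR], IR]],
                  backend: Callable[[IR], str]) -> Callable[[str], str]:
    """Compose a frontend, a list of IR-to-IR transforms, and a backend."""
    def compile_source(source: str) -> str:
        ir = frontend(source)
        for transform in transforms:  # each transform maps IR to IR
            ir = transform(ir)
        return backend(ir)
    return compile_source


# Two toy frontends for two different surface syntaxes.
def frontend_semicolons(src: str) -> IR:
    return [s.strip() for s in src.split(";") if s.strip()]


def frontend_lines(src: str) -> IR:
    return [s.strip() for s in src.splitlines() if s.strip()]


# One transformation, written once against the IR and shared by both compilers.
def drop_nops(ir: IR) -> IR:
    return [op for op in ir if op != "nop"]


# Two toy backends for two different output formats.
def backend_upper(ir: IR) -> str:
    return "\n".join(op.upper() for op in ir)


def backend_numbered(ir: IR) -> str:
    return "\n".join(f"{i}: {op}" for i, op in enumerate(ir))


compiler_a = make_compiler(frontend_semicolons, [drop_nops], backend_upper)
compiler_b = make_compiler(frontend_lines, [drop_nops], backend_numbered)
```

Because `drop_nops` only ever sees the IR, adding a third frontend or backend would not require touching it.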
Figure 4.1: LLVM can create a C++-to-x86 compiler or a Rust-to-ARM compiler, yet share internal transformations on LLVM-IR. Similarly, our HCF can create a Chisel-to-ASIC-Verilog compiler or a Verilog-to-FPGA-Verilog compiler and share internal transformations.
Like FIRRTL, LLVM originated as an academic project; it was started in 2000 by Chris Lattner, advised by Vikram Adve, at the University of Illinois at Urbana-Champaign. Since these beginnings, the "compiler infrastructure project" has revolutionized compiler research by providing an open-source, modular, and modern compiler.
The LLVM compiler has a modular design consisting of many passes that operate on a common, well-defined intermediate representation (IR) of a program. This IR is independent of both the source language and the target machine.
LLVM’s compiler infrastructure is composed of passes that operate in sequence on a program’s IR. Each pass accepts a program’s IR and returns a modified IR. Passes are chained together until the program is optimized, simplified, and instrumented.
Some passes require analyzing the program before modifying it, and many of these analyses can be shared among passes. However, other passes invalidate previously run analyses, which must then be rerun. This presents an interesting challenge: how does the compiler know when to recompute analyses? Another challenge is pass dependency: some passes must run after other passes, so how is this ordering determined?
LLVM solves both of these challenges with a mechanism called pass scheduling and registration. As part of their interface, passes specify the following:
• any prerequisite passes (default is no other passes)
• any passes they invalidate/preserve (default is invalidating all other passes)
Note that references to prerequisite or invalidated passes are by name, which can be brittle under code modifications. Additionally, incorrect specification of prerequisites or invalidations can cause undefined runtime behavior.
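The bookkeeping that such a declaration enables can be sketched with a toy pass manager. This is not LLVM's actual interface: `PassSpec`, `ToyPassManager`, and the three example passes are hypothetical, and the sketch keeps only the two ideas above, namely that prerequisites are computed on demand and that a pass invalidates every cached analysis it does not explicitly preserve.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class PassSpec:
    """Hypothetical pass declaration: a name, a body, and its contract."""
    name: str
    run: Callable                     # run(ir, analyses) -> new IR or result
    requires: Tuple[str, ...] = ()    # default: no prerequisite passes
    preserves: Tuple[str, ...] = ()   # default: invalidates everything


class ToyPassManager:
    def __init__(self, specs):
        self.specs = {spec.name: spec for spec in specs}
        self.cache = {}   # analysis name -> still-valid result
        self.log = []     # order in which passes actually executed

    def _ensure(self, name, ir):
        """Run a prerequisite analysis only if no valid result is cached."""
        if name in self.cache:
            return
        spec = self.specs[name]
        for dep in spec.requires:
            self._ensure(dep, ir)
        self.log.append(name)
        self.cache[name] = spec.run(ir, self.cache)

    def run(self, pipeline, ir):
        for name in pipeline:
            spec = self.specs[name]
            for dep in spec.requires:
                self._ensure(dep, ir)
            self.log.append(name)
            ir = spec.run(ir, self.cache)
            # Drop every cached analysis this pass did not declare preserved.
            self.cache = {k: v for k, v in self.cache.items()
                          if k in spec.preserves}
        return ir


manager = ToyPassManager([
    PassSpec("count", lambda ir, analyses: len(ir)),
    # Renaming keeps the instruction count valid, so it preserves "count".
    PassSpec("rename", lambda ir, analyses: [op.upper() for op in ir],
             requires=("count",), preserves=("count",)),
    # Deduplication changes the count and therefore preserves nothing.
    PassSpec("dedupe", lambda ir, analyses: list(dict.fromkeys(ir)),
             requires=("count",)),
])
result = manager.run(["rename", "dedupe", "rename"], ["a", "a", "b"])
```

In this run the `count` analysis executes twice: once before the first `rename`, and again before the second `rename` because `dedupe` invalidated it. If `dedupe` wrongly declared `preserves=("count",)`, the stale count would be silently reused, which illustrates why a mis-specified contract is hard to debug.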
After passes are declared, they must be registered (either statically or dynamically) to a global Pass Manager with the following:
• Command-line option name
• Name of the pass
• Whether it only walks the control-flow graph without modifying it
• Whether it is an analysis pass
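A registration step of this kind can be sketched as a decorator that records a pass under its command-line option. The names `Registration`, `REGISTRY`, `register_pass`, and `passes_from_options` are hypothetical; LLVM's own registration is a C++ mechanism and differs in detail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Registration:
    option: str        # command-line option name
    name: str          # name of the pass
    cfg_flag: bool     # the control-flow-graph flag from the list above
    is_analysis: bool  # whether it is an analysis pass


REGISTRY = {}  # toy global pass manager: option -> (registration, pass class)


def register_pass(option, name, cfg_flag=False, is_analysis=False):
    """Class decorator that statically registers a pass under `option`."""
    def decorator(cls):
        if option in REGISTRY:
            raise ValueError(f"option {option!r} is already registered")
        REGISTRY[option] = (Registration(option, name, cfg_flag, is_analysis), cls)
        return cls
    return decorator


@register_pass("strip-nops", "Remove no-op instructions")
class StripNops:
    def run(self, ir):
        return [op for op in ir if op != "nop"]


def passes_from_options(options):
    """Instantiate the registered passes selected on the command line."""
    return [REGISTRY[option][1]() for option in options]


selected = passes_from_options(["strip-nops"])
```

Looking passes up by option name lets a driver assemble a pipeline from the command line without importing each pass class directly.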
LLVM has three pass categories: analysis passes, transform passes, and utility passes. Analysis passes compute information that other passes can use, and their results can be reused by multiple passes. Transform passes mutate the program in some way, and can use (or invalidate) analysis passes. Utility passes provide functionality that does not otherwise fit this categorization, e.g. passes to extract functions to bitcode.
Each pass can take on one of a variety of traversal types. An ImmutablePass does not traverse the program and only reports statistics or other information. A ModulePass operates on the entire program, and thus its execution cannot be optimized. A CallGraphSCCPass traverses the program bottom-up on the call graph and can only access local information. Due to its traversal behavior, it is possible to optimize its execution, but it is difficult to write one that is conceptually correct. A FunctionPass visits each function independently of the other functions. This makes it easily parallelizable and conceptually simple, but limited in functionality. Finally, a LoopPass executes on each loop in a function, independently of all the other loops in the function.
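The trade-off between traversal scope and schedulability can be sketched with two toy pass kinds. The helper names and the dictionary-based module are hypothetical and only mimic the distinction described above; they are not LLVM's classes.

```python
from concurrent.futures import ThreadPoolExecutor


def run_function_pass(function_pass, module, workers=4):
    """Apply a per-function pass; each function is visited independently,
    so the visits can be distributed across worker threads."""
    names = list(module)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        bodies = list(pool.map(function_pass, (module[n] for n in names)))
    return dict(zip(names, bodies))


def run_module_pass(module_pass, module):
    """Apply a whole-program pass; it sees everything, so it runs as one unit."""
    return module_pass(module)


# Function-scoped: needs only the body of the function being visited.
def strip_nops(body):
    return [op for op in body if op != "nop"]


# Module-scoped: needs every function to decide which ones are never called.
def drop_uncalled(module, root="main"):
    called = {op.split()[1]
              for body in module.values()
              for op in body if op.startswith("call ")}
    return {name: body for name, body in module.items()
            if name == root or name in called}


module = {"main": ["call f", "nop"], "f": ["add", "nop"], "g": ["nop"]}
pruned = run_module_pass(drop_uncalled, module)
cleaned = run_function_pass(strip_nops, pruned)
```

`strip_nops` cannot express `drop_uncalled`, since it never sees another function; that restriction is what makes it safe to run on all functions concurrently.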
The LLVM compiler infrastructure is well designed, and it offers several takeaways for designing the FIRRTL compiler infrastructure. First, the correspondence between a pass’s specified behavior and its actual behavior is enforced only by the user; a mismatch would be difficult to debug. Second, the variety of program traversal APIs restricts what a pass can do, but also enables optimizations such as interleaving the execution of multiple passes for better cache behavior from data locality. Third, it is important to leave room for multithreaded compilation. Finally, a compiler framework should provide ample infrastructure to simplify writing and integrating a compiler pass.
FIRRTL’s compiler framework is similarly structured: Chisel and Verilog frontends parse into its IR, transformation passes provide simplification, optimization, and instrumentation, and the resulting IR can either be simulated directly or passed to one of many Verilog backends tailored for simulators, FPGAs, or ASIC technology processes. Dependencies between transformations are specified, enabling a total ordering of transformations to be determined prior to running the compiler. Custom transformations are integrated automatically through this dependency interface.
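Deriving a total ordering from declared dependencies amounts to a topological sort, sketched below with Python's standard `graphlib` module. This is an illustration of the idea and not FIRRTL's scheduler; the transformation names and their prerequisite sets are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter


def schedule(prerequisites):
    """Return one total ordering in which every transformation runs after
    all of its declared prerequisites. Raises CycleError if none exists."""
    return list(TopologicalSorter(prerequisites).static_order())


# Hypothetical built-in transformations: name -> set of prerequisites.
prerequisites = {
    "InferTypes": set(),
    "InferWidths": {"InferTypes"},
    "LowerTypes": {"InferWidths"},
    "EmitVerilog": {"LowerTypes"},
}

# A custom transformation is integrated by declaring where it fits;
# no existing entry other than the emitter's prerequisites has to change.
prerequisites["RemoveVestigialHandshakes"] = {"InferTypes"}
prerequisites["EmitVerilog"] = {"LowerTypes", "RemoveVestigialHandshakes"}

order = schedule(prerequisites)
```

Several valid orderings may exist, so the sketch guarantees only that each transformation appears after its prerequisites. A contradictory set of declarations is rejected with `CycleError` before any transformation runs, which matches the goal of fixing the order prior to compilation.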