CHAPTER 5 ARCHITECTURALLY RETARGETABLE COMPILATION

5.3 The Compiler Class

The Compiler class is responsible for taking the application designer’s description of an application and converting it into a set of configurations for the hardware platform and a series of run-time events for the run-time system to carry out. The prototype compiler, called BasicCompiler, makes some assumptions about the characteristics of the applications that it will compile. These assumptions simplify the compilation process and are not requirements of the overall automated design strategy. More advanced compilers could certainly be written that relax this initial set of assumptions. The Compiler abstract class provides a fixed interface so that future compilers can be seamlessly inserted into the tool set.
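The role of the fixed interface can be illustrated with a minimal Python sketch. Only the names Compiler and BasicCompiler come from the text; the `compile` method, its parameters, the `CompilerOutput` container, and the `build` helper are assumptions introduced for illustration and are not the tool set's actual API.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class CompilerOutput:
    """Hypothetical container for the two kinds of compiler output."""
    script: list = field(default_factory=list)    # run-time events, in order
    netlists: list = field(default_factory=list)  # one netlist per configuration


class Compiler(ABC):
    """Fixed interface that every compiler implementation exposes."""

    @abstractmethod
    def compile(self, application, architecture) -> CompilerOutput:
        """Translate an application description for a target architecture."""


class BasicCompiler(Compiler):
    """Prototype compiler; the body is a placeholder in this sketch."""

    def compile(self, application, architecture) -> CompilerOutput:
        return CompilerOutput()


def build(compiler: Compiler, application, architecture) -> CompilerOutput:
    # The caller depends only on the abstract interface, so any
    # Compiler subclass can be substituted without changing this code.
    return compiler.compile(application, architecture)
```

Because `build` is written against the abstract class alone, a more capable compiler could later be passed in place of BasicCompiler without touching the rest of the tool chain.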

The primary limitation of the BasicCompiler is that it can only handle UnorderedStage objects that contain a single type of operation. Even given this restriction, many different kinds of interesting applications can be developed. The image interpolator, for instance, does not suffer at all from this restriction. The invRows unordered stage contains only one type of operation, the InvRowOp operation. Unordered stages can contain many instances of that operation. The invRows stage contains one InvRowOp per row in the input image. Under the restriction, invRows could not, for instance, contain both InvRowOp and InvColOp operations.

The BasicCompiler begins by ensuring that the application description passed to it meets the single-operation-type requirement. The compiler’s outputs are a RunTimeScript object and a set of EDIF netlists for hardware configurations.
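The single-operation-type check might look like the following Python sketch. The class names UnorderedStage, InvRowOp, and InvColOp come from the text; their fields and the function `check_single_operation_type` are hypothetical, chosen only to make the example self-contained.

```python
from dataclasses import dataclass, field


@dataclass
class InvRowOp:
    row: int  # hypothetical field: index of the image row to process


@dataclass
class InvColOp:
    col: int  # hypothetical field: index of the image column to process


@dataclass
class UnorderedStage:
    name: str
    operations: list = field(default_factory=list)


def check_single_operation_type(stage: UnorderedStage) -> type:
    """Return the stage's one operation type, or raise if it mixes types."""
    kinds = {type(op) for op in stage.operations}
    if len(kinds) != 1:
        names = sorted(k.__name__ for k in kinds)
        raise ValueError(
            f"stage {stage.name!r} must hold exactly one operation type, "
            f"found {names}"
        )
    return kinds.pop()
```

A stage holding 128 InvRowOp instances passes the check, while a stage mixing InvRowOp and InvColOp instances is rejected before any hardware is designed.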

The compiler works on one unordered stage at a time. The first step is to design hardware for executing the application. The compiler examines the stage and determines what type of operation is used in it. Queries are made to the target hardware architecture to determine how many processing elements are available, the sizes of the elements, and the number of memory interfaces that each element supports. The compiler then determines how many instances of the operation can be built on the RTR CCM subject to the restrictions on the number of available memory ports and the amount of configurable logic available per device. Given that each operation requires a memory port, in the absence of some other limiting constraint, one operation can be instantiated per memory port available in the processing element. Sometimes, this is not possible due to restrictions on the amount of configurable logic that is available. Even if a given processing element has multiple memory interfaces, if an operation consumes enough of the element’s configurable resources, it may not be possible to assign multiple instances of the operation to that element.
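The instance count described above reduces to taking the tighter of two limits on each processing element. The Python sketch below assumes that each operation needs exactly one memory port and a fixed amount of logic; the `ProcessingElement` fields, the function names, and the logic units are all hypothetical and do not reflect the real architecture query interface.

```python
from dataclasses import dataclass


@dataclass
class ProcessingElement:
    memory_ports: int  # memory interfaces the element supports
    logic_cells: int   # configurable logic available (arbitrary units)


def instances_per_element(pe: ProcessingElement, op_logic_cells: int) -> int:
    """Each instance needs one memory port and op_logic_cells of logic."""
    by_ports = pe.memory_ports
    by_logic = pe.logic_cells // op_logic_cells
    # The tighter of the two constraints decides how many instances fit.
    return min(by_ports, by_logic)


def total_instances(elements, op_logic_cells: int) -> int:
    """Sum the per-element instance counts over the whole platform."""
    return sum(instances_per_element(pe, op_logic_cells) for pe in elements)
```

With illustrative numbers, an element with two memory ports and 1000 logic units holds two instances of a 300-unit operation (port-limited) but only one instance of a 600-unit operation (logic-limited).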

Once the compiler has maximized the number of operations that it can instantiate in hardware, it calls the JHDL technology mapper to produce netlists for programming the processing elements. Methods from the Operation objects are called to generate the operation-specific hardware. Concrete implementations of the hardware architecture abstraction provide any platform-specific interface hardware that is necessary. The result is a series of netlists that are processed through vendor tools to produce hardware configurations.

At this point, the compiler has created hardware instantiations of the operator type for this stage on the platform’s processing elements. Each hardware instantiation of the operation is capable of executing any of the specific instances of the operations represented in the unordered stage. In the example of the image interpolator targeted for the WildForce board, the compiler creates one hardware instance of the RowInvOp operation on each of the WildForce’s processing elements. The invRows stage, for a 128x128 image, has 128 instances of the RowInvOp operator in it. So the compiler’s next step is to schedule each of the 128 RowInvOp operations on one of the hardware instances.

The compiler accomplishes this by adding a series of events to the RunTimeScript object. The first event is a ConfigureProcessingElements event that causes the hardware configurations, generated from the netlists produced earlier, to be loaded onto the processing elements. In the WildForce interpolator example, four Xilinx bit files are loaded onto the four processing elements. Next, the compiler chooses one instance of the operation from the unordered stage for each available hardware instantiation. In the case under discussion, the compiler will choose four RowInvOp operations from the invRows stage and assign each of them to one of the operation instantiations on the four processing elements. For each of the operation instances selected, FetchMemory events are added to the RunTimeScript to move its input data from the user’s memory space to the local memory space of the processing elements. A HardwareInterlude event is then added to the RunTimeScript to execute the hardware configuration. The processing elements perform the operations assigned to them and store their results in the processing elements’ memory space, so the compiler adds a set of RetireMemory events that move the resulting data from the accelerator board back to the user’s memory space. One of two things can happen next. If all of the operation instances from the unordered stage have been scheduled on the hardware, the compiler proceeds to the next unordered stage. If operations remain, the cycle discussed above is repeated with one exception: since hardware configured to perform the desired set of operations is already present on the RTR CCM, no ConfigureProcessingElements event is generated. The compiler simply generates another set of fetch, execute, and retire events.
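The event sequence for one unordered stage can be sketched in Python as follows. The event names are taken from the text; representing events as tuples, assuming one FetchMemory and one RetireMemory event per scheduled operation, and the function `schedule_stage` itself are assumptions made for illustration.

```python
def schedule_stage(operations, hardware_instances):
    """Return the list of run-time events for one unordered stage."""
    n = len(hardware_instances)
    if n == 0:
        raise ValueError("at least one hardware instance is required")

    # The configuration is loaded once; later rounds reuse it.
    events = [("ConfigureProcessingElements",)]

    for start in range(0, len(operations), n):
        # Pick up to one pending operation per hardware instance.
        batch = operations[start:start + n]
        pairs = list(zip(hardware_instances, batch))

        for hw, op in pairs:
            events.append(("FetchMemory", hw, op))    # inputs to local memory
        events.append(("HardwareInterlude",))         # run the hardware
        for hw, op in pairs:
            events.append(("RetireMemory", hw, op))   # results back to the user
    return events
```

For the interpolator case, 128 row operations on four hardware instances yield 32 rounds of fetch, execute, and retire events preceded by a single ConfigureProcessingElements event.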

If the application contains StageSoftware objects, the compiler will translate those into SoftwareInterlude events.
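How software stages might interleave with hardware stages in the script is sketched below in Python. StageSoftware, UnorderedStage, and SoftwareInterlude are names from the text; the `name` fields, the `compile_stages` function, and the `"HardwareStageEvents"` placeholder (standing in for the configure, fetch, execute, and retire events of a hardware stage) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StageSoftware:
    name: str


@dataclass
class UnorderedStage:
    name: str


def compile_stages(stages):
    """Walk the stages in order and emit one script entry per stage."""
    script = []
    for stage in stages:
        if isinstance(stage, StageSoftware):
            # Software stages become a single run-time event.
            script.append(("SoftwareInterlude", stage.name))
        elif isinstance(stage, UnorderedStage):
            # Placeholder for the hardware event sequence of this stage.
            script.append(("HardwareStageEvents", stage.name))
        else:
            raise TypeError(f"unsupported stage type: {type(stage).__name__}")
    return script
```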

When the entire process is finished, the compiler has produced a series of run-time events that will sequence the user’s application on the target hardware.

In document Architecture-Independent Design for Run-Time Reconfigurable Custom Computing Machines (Page 68-71)