CHAPTER 5 ARCHITECTURALLY RETARGETABLE COMPILATION
5.2 The RunTime Class
The RunTime class implements the basic run-time functionality of an application. Recall that an application consists of a hierarchical tree of ordered and unordered stages. Using a defined set of ordering rules, the hierarchical structure can be flattened into a unique ordered sequence of StageUnordered objects. Each StageUnordered object contains a set of Operation objects. Each Operation object defines methods for moving data back and forth between the host and the RTR CCM. Each operation also provides a description of a hardware operation that reads data from a memory interface, performs computations, and returns results to the memory interface. No restrictions are placed on how the hardware uses the memory interface; it may read and write data in any pattern. The Operation objects all target an abstract memory interface. Since memory interfaces differ in their sizes and word widths, the application programmer specifies the set of word widths that the application supports and the minimum memory size that a potential platform must provide. An application can also perform some of its computation or housekeeping in software by including StageSoftware objects in the hierarchy.
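The word-width and minimum-size constraints described above can be pictured as a simple compatibility check between an application and a candidate platform memory. The following is a minimal sketch only: the class MemoryRequirement, its fields, and the method isSatisfiedBy() are hypothetical names introduced for illustration, and the choice to measure memory size in words is an assumption.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: none of these names come from the framework itself.
public class MemoryRequirementSketch {

    /** What an application declares: acceptable word widths and a minimum size. */
    static final class MemoryRequirement {
        final Set<Integer> supportedWordWidths; // word widths in bits
        final int minWords;                     // minimum memory size, in words (assumed unit)

        MemoryRequirement(int minWords, Integer... widths) {
            this.minWords = minWords;
            this.supportedWordWidths = new HashSet<>(Arrays.asList(widths));
        }

        /** True if a platform memory of the given shape can host the application. */
        boolean isSatisfiedBy(int platformWordWidth, int platformWords) {
            return supportedWordWidths.contains(platformWordWidth)
                    && platformWords >= minWords;
        }
    }

    public static void main(String[] args) {
        // An application that supports 16- and 32-bit words and needs 64K words.
        MemoryRequirement req = new MemoryRequirement(65536, 16, 32);
        System.out.println(req.isSatisfiedBy(32, 262144)); // width and size both fit
        System.out.println(req.isSatisfiedBy(8, 262144));  // unsupported word width
        System.out.println(req.isSatisfiedBy(32, 1024));   // memory too small
    }
}
```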
At run time, the operations need to be sequenced onto the RTR CCM. This is done by executing a series of RunTimeEvent objects created by the compiler; every user application is reduced to a series of these event objects. The process that generates this sequence is discussed in Section 5.3. There are five types of run-time events: ConfigureProcessingElements, FetchMemory, RetireMemory, HardwareInterlude, and SoftwareInterlude. All of them inherit from the RunTimeEvent abstract class. The RunTimeEvent hierarchy is diagrammed in Figure 3-8. The code fragment that implements the run-time execution of the ConfigureProcessingElements event is given in Listing 5-6. The ConfigureProcessingElements event causes hardware configurations to be loaded onto the target platform. The event holds an array of Configuration objects that specify which processing element should be programmed with which bit file. Executing the configuration event simply involves cycling through the array of configurations and calling the hardware architecture’s LoadConfiguration() on each one.
if( curEvent instanceof ConfigureProcessingElements ) {
    ConfigureProcessingElements event =
        (ConfigureProcessingElements)curEvent;
    // Load each configuration onto its processing element.
    for( int j=0 ; j<event.getNumConfigs() ; j++ ) {
        hardware.LoadConfiguration(event.getConfigNum(j));
    }
}
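To make the configuration step concrete, the fragment above can be exercised against a stand-in for the hardware architecture. This is a sketch under stated assumptions: the fields of Configuration (pe, bitFile), the RecordingHardware stub, and the bit file names are all hypothetical; only the names ConfigureProcessingElements, getNumConfigs(), getConfigNum(), and LoadConfiguration() are taken from the text.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: Configuration's fields and the hardware stub are assumptions.
public class ConfigureSketch {

    /** Assumed shape: which processing element gets which bit file. */
    static final class Configuration {
        final int pe;
        final String bitFile;
        Configuration(int pe, String bitFile) { this.pe = pe; this.bitFile = bitFile; }
    }

    static abstract class RunTimeEvent { }

    static final class ConfigureProcessingElements extends RunTimeEvent {
        private final Configuration[] configs;
        ConfigureProcessingElements(Configuration... configs) { this.configs = configs; }
        int getNumConfigs() { return configs.length; }
        Configuration getConfigNum(int j) { return configs[j]; }
    }

    /** Stand-in for the hardware architecture; records calls instead of programming a device. */
    static final class RecordingHardware {
        final List<String> log = new ArrayList<>();
        void LoadConfiguration(Configuration c) {
            log.add("PE" + c.pe + " <- " + c.bitFile);
        }
    }

    /** Mirrors the configuration branch of the run-time dispatch. */
    static void run(RunTimeEvent curEvent, RecordingHardware hardware) {
        if (curEvent instanceof ConfigureProcessingElements) {
            ConfigureProcessingElements event = (ConfigureProcessingElements) curEvent;
            for (int j = 0; j < event.getNumConfigs(); j++) {
                hardware.LoadConfiguration(event.getConfigNum(j));
            }
        }
    }

    public static void main(String[] args) {
        RecordingHardware hw = new RecordingHardware();
        run(new ConfigureProcessingElements(
                new Configuration(0, "stage_a.bit"),   // hypothetical bit files
                new Configuration(1, "stage_b.bit")), hw);
        hw.log.forEach(System.out::println);
    }
}
```

Recording the calls rather than touching a device keeps the sketch runnable on any host; a real hardware architecture class would program the named processing element instead.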
The FetchMemory event fetches data from the host-side application’s data space and writes it into a memory on the accelerator platform. There is a FetchMemory event associated with the execution of every operation. The event causes the data that the operation needs to be loaded into the proper memory for processing. The FetchMemory object has a reference to the operation that is to be executed. It calls the operation’s userToCCM() method to get an array of MemoryImage objects that need to be moved to the processing array. Each element of the MemoryImage array is passed to the hardware architecture’s LoadMemory() method. A code fragment demonstrating the actions taken to resolve a FetchMemory event is shown in Listing 5-7.
else if ( curEvent instanceof FetchMemory ) {
    FetchMemory event = (FetchMemory)curEvent;
    Operation curOp = event.getOperation();
    for( int j=0 ; j<event.getNumMemPorts() ; j++ ) {
        MemoryImage mem[] = curOp.userToCcm();
        MemPair dest = event.getMemPortNum(j);
        for( int memNum=0 ; memNum<mem.length ; memNum++ ) {
            hardware.LoadMemory(dest.pe,dest.mem,mem[memNum]);
        }
    }
}
Listing 5-7: FetchMemory Code Fragment.
The RetireMemory event is the complement of the FetchMemory event. RetireMemory causes the processed data from an executed operation to be brought back from the CCM and returned to the application’s data space. RetireMemory is more complex than FetchMemory because of the random memory access issues discussed in Section 4.4. The retire event has a reference to the operation that has been executed and calls its ccmToBuffer() method to get an array of MemoryImage objects that need to be moved from the CCM to the host. It calls the hardware architecture’s RetrieveMemory() method for each MemoryImage object in the array. As demonstrated by the example shown in Listing 4-7, the application code can set up the MemoryImage objects returned by ccmToBuffer() to reference temporary buffers. These same MemoryImage objects are then passed to the operation’s bufferToUser() method, so that the application code can place the data back into its own memory in any desired fashion. Again, as discussed in Section 4.4, if this step is unnecessary, the application code can simply define an empty bufferToUser() method. A code fragment showing the process invoked by a RetireMemory object is shown in Listing 5-8.
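The two-step retire path described above (retrieve into a temporary buffer, then let the operation copy results into place) can be sketched as follows. This is an illustration under assumptions, not the framework's code: the layout of MemoryImage, the StubHardware class, and the stride-2 copy-back pattern of StridedOperation are all hypothetical; only the method names ccmToBuffer(), bufferToUser(), and RetrieveMemory() come from the text.

```java
import java.util.Arrays;

// Sketch only: MemoryImage's layout, the stub hardware, and the strided
// copy-back pattern are illustrative assumptions.
public class RetireSketch {

    /** Assumed minimal MemoryImage: a view onto a host-side int array. */
    static final class MemoryImage {
        final int[] data;
        MemoryImage(int[] data) { this.data = data; }
    }

    /** Stub CCM memory that copies its contents into the supplied image. */
    static final class StubHardware {
        final int[] ccmMemory;
        StubHardware(int[] ccmMemory) { this.ccmMemory = ccmMemory; }
        void RetrieveMemory(int pe, int mem, MemoryImage image) {
            System.arraycopy(ccmMemory, 0, image.data, 0, image.data.length);
        }
    }

    /** Operation whose results must land in every other slot of the user array. */
    static final class StridedOperation {
        final int[] userData;
        private final int[] temp; // temporary buffer filled by the hardware
        StridedOperation(int[] userData) {
            this.userData = userData;
            this.temp = new int[userData.length / 2];
        }
        /** Hands the run time an image that references the temporary buffer. */
        MemoryImage[] ccmToBuffer() {
            return new MemoryImage[] { new MemoryImage(temp) };
        }
        /** Scatters the retrieved results into the user's data space. */
        void bufferToUser(MemoryImage[] images) {
            int[] src = images[0].data;
            for (int i = 0; i < src.length; i++) {
                userData[2 * i] = src[i];
            }
        }
    }

    /** Retrieve every image from the CCM, then let the operation place the data. */
    static void retire(StridedOperation curOp, StubHardware hardware) {
        MemoryImage[] mem = curOp.ccmToBuffer();
        for (MemoryImage image : mem) {
            hardware.RetrieveMemory(0, 0, image);
        }
        curOp.bufferToUser(mem);
    }

    public static void main(String[] args) {
        int[] user = new int[6];
        retire(new StridedOperation(user), new StubHardware(new int[] {7, 8, 9}));
        System.out.println(Arrays.toString(user)); // prints [7, 0, 8, 0, 9, 0]
    }
}
```

An operation whose results are already contiguous could instead return images that reference the user array directly and leave bufferToUser() empty.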
else if ( curEvent instanceof RetireMemory ) {
    RetireMemory event = (RetireMemory)curEvent;
    Operation curOp = event.getOperation();
    for( int j=0 ; j<event.getNumMemPorts() ; j++ ) {
        MemPair dest = event.getMemPortNum(j);
        MemoryImage mem[] = curOp.ccmToBuffer();
        for( int memNum=0 ; memNum<mem.length ; memNum++ ) {
            hardware.RetrieveMemory(dest.pe,dest.mem,mem[memNum]);
        }
        curOp.bufferToUser(mem);
    }
}
Listing 5-8: RetireMemory Code Fragment.
Both of the interlude events have relatively simple implementations. The HardwareInterlude object represents the hardware execution of an operation. It simply calls the ExecuteHardware() method of the hardware architecture. The SoftwareInterlude object represents the execution of a StageSoftware object. It has a reference to the StageSoftware object to be executed and simply calls its execute() method. A code fragment showing the handling of both interlude events is given in Listing 5-9.
else if ( curEvent instanceof HardwareInterlude ) {
    hardware.ExecuteHardware();
}
else if ( curEvent instanceof SoftwareInterlude ) {
    SoftwareInterlude event = (SoftwareInterlude)curEvent;
    event.execute();
}
Listing 5-9: Interlude Event Code Fragment.
At run time, an application executes by following a basic cycle of these events. It begins by executing a series of ConfigureProcessingElements events to load hardware configurations onto the processing elements of the RTR CCM. Next, FetchMemory events move input data into the processing elements’ local memories. Then a HardwareInterlude event causes the hardware to execute the currently configured operations. When execution has finished, RetireMemory events move the resulting data from the accelerator back to the host. The exact series of events and the repetition of the cycle are determined by the compiler.
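The basic cycle described above can be sketched as a loop over an event list. This is a simplified illustration: the event classes here carry no payload, the hardware calls are replaced by log entries, the Runnable standing in for a StageSoftware object is an assumption, and the particular sequence returned by basicCycle() is a hand-written example rather than compiler output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch only: payload-free events and a logging dispatcher, for illustration.
public class EventCycleSketch {

    static abstract class RunTimeEvent { }
    static final class ConfigureProcessingElements extends RunTimeEvent { }
    static final class FetchMemory extends RunTimeEvent { }
    static final class HardwareInterlude extends RunTimeEvent { }
    static final class RetireMemory extends RunTimeEvent { }
    static final class SoftwareInterlude extends RunTimeEvent {
        private final Runnable stage; // stands in for a StageSoftware object
        SoftwareInterlude(Runnable stage) { this.stage = stage; }
        void execute() { stage.run(); }
    }

    /** Walks the event list in order and records what each event would do. */
    static List<String> run(List<RunTimeEvent> events) {
        List<String> log = new ArrayList<>();
        for (RunTimeEvent curEvent : events) {
            if (curEvent instanceof ConfigureProcessingElements) {
                log.add("configure");          // would call LoadConfiguration()
            } else if (curEvent instanceof FetchMemory) {
                log.add("fetch");              // would call LoadMemory()
            } else if (curEvent instanceof RetireMemory) {
                log.add("retire");             // would call RetrieveMemory()
            } else if (curEvent instanceof HardwareInterlude) {
                log.add("execute");            // would call ExecuteHardware()
            } else if (curEvent instanceof SoftwareInterlude) {
                ((SoftwareInterlude) curEvent).execute();
                log.add("software");
            }
        }
        return log;
    }

    /** One hand-written pass through the cycle, followed by a software stage. */
    static List<RunTimeEvent> basicCycle() {
        List<RunTimeEvent> events = Arrays.asList(
                new ConfigureProcessingElements(),
                new FetchMemory(),
                new HardwareInterlude(),
                new RetireMemory(),
                new SoftwareInterlude(() -> { /* host-side housekeeping */ }));
        return events;
    }

    public static void main(String[] args) {
        System.out.println(run(basicCycle()));
        // prints [configure, fetch, execute, retire, software]
    }
}
```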