
Pre-fabrication Design Space Exploration

5.3 Software Tool-suite Generation for rASIP

5.3.2 C Compiler Generation for rASIP

To enable fast design space exploration, the availability of a High-Level Language (HLL) compilation flow is extremely important. It contributes to design efficiency in two ways. Firstly, it allows quick changes to the application without slow, error-prone changes to the assembly code. Secondly, legacy software applications, which are usually written in a high-level language, can be easily targeted to the current processor. In the proposed rASIP design space exploration flow, the re-targetable C compiler generator from LISA is used extensively. Since the rASIP description is an extension of LISA, the compiler generation flow can be integrated without any modifications.

The re-targetable compiler generation flow from LISA is based on a modular compiler development approach. The machine-independent phases of the compiler (typically referred to as the front-end) are written only once. The machine-dependent back-end of the compiler is composed of several modules, as shown in figure 5.8.

For a clear understanding, the tasks of the various back-end modules in a compilation flow are briefly described here.

Instruction Selection The input of an instruction selector is the Intermediate Representation (IR) of the application source code. The IR is typically in the form of a syntax tree. In the instruction selection phase, the syntax tree is covered by the available machine instructions. Typically, this is done via tree-pattern matching in a dynamic programming framework.
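The covering step can be illustrated with a small sketch. The operator set, the instruction names (ldi, add, mul, mac) and their cycle costs below are hypothetical and are not taken from any LISA model. The code only shows how a bottom-up dynamic-programming pass could pick the cheapest matching pattern at each node of the syntax tree.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical IR operators and machine instructions (illustration only). */
typedef enum { OP_REG, OP_CONST, OP_ADD, OP_MUL } Op;
typedef enum { RULE_NONE, RULE_LDI, RULE_ADD, RULE_MUL, RULE_MAC } Rule;

typedef struct Node {
    Op op;
    struct Node *left, *right;  /* NULL for leaves */
    int cost;                   /* cheapest cover of this subtree, set by label() */
    Rule rule;                  /* instruction chosen at this node, set by label() */
} Node;

/* Assumed instruction costs in cycles. */
enum { COST_LDI = 1, COST_ADD = 1, COST_MUL = 2, COST_MAC = 2 };

/* Bottom-up labelling: children are solved first, so every pattern
 * can reuse the stored cost of the subtrees it leaves uncovered. */
void label(Node *n)
{
    if (n == NULL)
        return;
    label(n->left);
    label(n->right);

    switch (n->op) {
    case OP_REG:    /* value already lives in a register */
        n->cost = 0;
        n->rule = RULE_NONE;
        break;
    case OP_CONST:  /* load immediate */
        n->cost = COST_LDI;
        n->rule = RULE_LDI;
        break;
    case OP_MUL:
        assert(n->left != NULL && n->right != NULL);
        n->cost = COST_MUL + n->left->cost + n->right->cost;
        n->rule = RULE_MUL;
        break;
    case OP_ADD:
        assert(n->left != NULL && n->right != NULL);
        /* Pattern 1: plain add on top of both operand subtrees. */
        n->cost = COST_ADD + n->left->cost + n->right->cost;
        n->rule = RULE_ADD;
        /* Pattern 2: multiply-accumulate covering ADD(MUL(a, b), c). */
        if (n->left->op == OP_MUL) {
            int c = COST_MAC + n->left->left->cost + n->left->right->cost
                  + n->right->cost;
            if (c < n->cost) { n->cost = c; n->rule = RULE_MAC; }
        }
        /* Pattern 3: the mirrored form ADD(c, MUL(a, b)). */
        if (n->right->op == OP_MUL) {
            int c = COST_MAC + n->right->left->cost + n->right->right->cost
                  + n->left->cost;
            if (c < n->cost) { n->cost = c; n->rule = RULE_MAC; }
        }
        break;
    }
}
```

For the expression a * b + c with all operands in registers, the plain cover costs 3 cycles (mul plus add) under these assumed costs, while the mac pattern covers both nodes for 2 cycles, so the root is labelled with the mac rule.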

Register Allocation The intermediate variables used in the assembly instructions must be associated with the available physical registers of the processor. This is precisely the task of the register allocator. Typically, register allocation is done by performing a liveness analysis of each variable, building an interference graph from it and then performing bounded graph coloring [112]. If the number of live variables exceeds the number of physical registers, data is temporarily stored in data memory. This event is referred to as spilling.
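As a rough illustration of coloring with a bounded number of registers, the sketch below colors an interference graph greedily and marks a variable as spilled when no register is free. This greedy scheme is a deliberate simplification and is not claimed to be the algorithm of [112]; the array sizes and all names are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_VARS 8
#define MAX_REGS 16
#define SPILLED  (-1)

/* interferes[i][j] is true when variables i and j are live at the same time.
 * On return, reg_of[v] holds the register assigned to v, or SPILLED.
 * The function returns the number of spilled variables. */
int color_graph(int n_vars, bool interferes[][MAX_VARS],
                int n_regs, int reg_of[])
{
    assert(n_vars >= 0 && n_vars <= MAX_VARS);
    assert(n_regs >= 0 && n_regs <= MAX_REGS);

    int spills = 0;
    for (int v = 0; v < n_vars; v++) {
        /* Registers already taken by previously colored neighbours of v. */
        bool used[MAX_REGS] = { false };
        for (int u = 0; u < v; u++) {
            if (interferes[v][u] && reg_of[u] != SPILLED)
                used[reg_of[u]] = true;
        }

        /* Take the lowest free register; if none is left, spill v. */
        reg_of[v] = SPILLED;
        for (int r = 0; r < n_regs; r++) {
            if (!used[r]) {
                reg_of[v] = r;
                break;
            }
        }
        if (reg_of[v] == SPILLED)
            spills++;
    }
    return spills;
}
```

With three mutually interfering variables and only two registers, the third variable cannot be colored and is spilled; with three registers every variable receives its own register.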

Scheduling In the scheduling phase, the compiler must decide the order in which the processor instructions are to be executed. For multi-issue processors, the scheduler also needs to check for fine-grained instruction-level parallelism and perform the necessary code compaction. To decide on the execution order, the scheduler checks the data dependencies between processor instructions as well as structural hazards. On the basis of these dependencies, the instructions can be represented as nodes in a Directed Acyclic Graph (DAG) with the dependencies as edges. The basic scheduling algorithms operate by selecting, one by one, the nodes with no unscheduled predecessors. The quality of scheduling can be strongly improved by scheduling instructions across basic blocks or by increasing the size of basic blocks through other optimization techniques.
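The node-selection loop described above can be sketched as follows for a single-issue processor. The dependency matrix layout and the function name are assumptions made for this example; latencies, structural hazards and priority heuristics are left out on purpose.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_INSN 8

/* depends_on[c][p] is true when instruction c must run after instruction p.
 * On success, order[slot] holds the instruction issued in that slot.
 * Returns false if the dependencies contain a cycle (i.e. not a DAG). */
bool list_schedule(int n, bool depends_on[][MAX_INSN], int order[])
{
    assert(n >= 0 && n <= MAX_INSN);

    bool done[MAX_INSN] = { false };
    for (int slot = 0; slot < n; slot++) {
        int pick = -1;

        /* Select the first instruction whose predecessors are all scheduled. */
        for (int c = 0; c < n && pick < 0; c++) {
            if (done[c])
                continue;
            bool ready = true;
            for (int p = 0; p < n; p++) {
                if (depends_on[c][p] && !done[p])
                    ready = false;
            }
            if (ready)
                pick = c;
        }

        if (pick < 0)
            return false;   /* nothing is ready: cyclic dependencies */

        done[pick] = true;
        order[slot] = pick;
    }
    return true;
}
```

If instruction 0 depends on instruction 2 and instruction 3 depends on instruction 0, the loop issues the instructions in the order 1, 2, 0, 3. A production scheduler would replace the "first ready node" choice with a priority function such as the critical-path length.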

Code Emission The code emitter simply prints the results of the previous phases in the form of machine-level assembly code.

Using the LISA compiler generator, the machine-dependent back-end modules can be easily re-targeted to the processor under development. As depicted in figure 5.8, in a typical re-targetable compiler the back-end is automatically generated from a back-end description. In LISA, these back-end descriptions are either extracted automatically or entered manually by the designer via a Graphical User Interface (GUI).

Re-targeting the Compiler for Instructions with Latency : Here we limit our discussion to the specific features of the LISA-based compiler generator that help the designer re-target the compiler after enhancing the processor with instruction-set extensions containing latency. The inquisitive reader may refer to [95] [117] [144] for a detailed study on compiler generation for the complete ISA.

After the instruction-set is extended with a custom instruction, the compiler must be extended to cover the following aspects. Firstly, the instruction selector must be made aware of the new custom instruction. Secondly, the register usage of the custom instruction must be made explicit to help the register allocator distribute the register resources among the processor instructions. Finally, the dependencies between the custom instruction and the other processor instructions need to be outlined so that the machine instructions are scheduled properly.

For re-targeting the C compiler to new custom instructions, a complete re-generation of the compiler back-end is unnecessary. An incremental flow on top of this re-targetable framework is faster and helps the designer speed up the design space exploration. This is achieved by using inline assembly function calls to directly invoke assembly statements from the C application.


The inline assembly function defines the usage of its arguments and the custom instruction syntax itself. Within the C application, the custom instructions are called like normal functions; during compilation, each call is replaced by the assembly syntax portion of the inline assembly function.

asm unsigned short bitrev(unsigned short addr, unsigned short n_bit_rev){
    .packs "@{temp} = bit_rev @{addr} @{n_bit_rev};",1
    .packs "@{} = @{temp};",2
}

Figure 5.9. Inline Assembly

An inline assembly function as well as its usage is shown in figure 5.9. This function performs a special operation named bit reversal in the re-configurable block. The inline assembly function is composed of two parts, namely the directives and the syntax. The arguments passed in the inline assembly call can be either constant values or variables. Variables to be used locally within the inline assembly function can be declared using the .scratch directive. If an argument is a variable, its target register resource(s) can be specified by the .restrict directive. The exact syntax to be emitted for the inline assembly function during compilation is described, following the inline assembly directives, with the .packs keyword. The return value of the function can be specified, as indicated, by the @ directive inside the inline assembly syntax.
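For readers without access to the custom instruction, a portable C reference can stand in for it during functional testing. The sketch below assumes that bit_rev reverses the lowest n_bit_rev bits of addr, as is common in FFT addressing; this semantics and the name bitrev_ref are assumptions of the example, since the text does not define the operation precisely.

```c
#include <assert.h>

/* Assumed semantics: return the lowest n_bit_rev bits of addr in reversed
 * order. Bits of addr above position n_bit_rev - 1 are discarded. */
unsigned short bitrev_ref(unsigned short addr, unsigned short n_bit_rev)
{
    assert(n_bit_rev <= 16);

    unsigned short result = 0;
    for (unsigned short i = 0; i < n_bit_rev; i++) {
        /* Shift the bits collected so far and append bit i of addr. */
        result = (unsigned short)((result << 1) | ((addr >> i) & 1u));
    }
    return result;
}
```

In the C application the custom instruction is invoked with the same call shape, for example idx = bitrev(i, 3), and the compiler substitutes the assembly text given in the .packs lines for the call.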

Connecting the inline assembly functions to the re-targeting of the compiler back-ends, it can be observed that specifying the exact assembly syntax in the C code completely avoids the need to extend the instruction selector. The inline assembly directives such as .scratch and .restrict account for the back-end description used in the register allocation phase. The scheduling phase requires a somewhat more elaborate information update than only passing the directives. First of all, the inline assembly function call can be attributed with the .barrier directive. This ensures that, during the IR-based optimization phase, the inlined assembly instructions are not moved across basic blocks.

The delay in producing the results of instructions in the FPGA can be accounted for in two ways. Firstly, the custom instruction syntax can be extended with no-operation instructions (NOPs). This prevents any meaningful instruction from being issued as long as the custom instruction is executing. Of course, this yields a pessimistic schedule. A more sophisticated approach is to update the latency table in the compiler back-end description. A latency table (as shown in figure 5.10), a typical data structure for generating the scheduler, contains the dependencies between the various instructions (or classes of instructions). The latency table is arranged as a two-dimensional matrix. The rows of this matrix consist of all processor instructions (or classes of instructions) that write to a resource, referred to as producers. The columns consist of all processor instructions (or classes of instructions) that read from a resource, referred to as consumers. With the new custom instructions, the latency table can be extended with new entries in the rows and/or columns. Following that, the dependencies (RAW, WAW and WAR) for each pair of instructions are to be updated in the latency table. Note that the producer and consumer identification (P4, C7) as well as the resources produced (resourceR, written at cycle 2) and consumed (resourceR, read at cycle 0) by the custom instructions are specified by the .packs directive in the inline assembly function.
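A latency table of this kind can be pictured as a small two-dimensional array indexed by producer and consumer class. The sketch below is only an illustration: the class names, the table values and the rule "RAW latency = write cycle - read cycle + 1" are assumptions of this example and are not taken from the LISA compiler generator.

```c
#include <assert.h>

/* Hypothetical producer (row) and consumer (column) classes. */
enum Producer { P_ALU, P_LOAD, P_BITREV, N_PRODUCERS };
enum Consumer { C_ALU, C_STORE, C_BITREV, N_CONSUMERS };

/* RAW latencies in cycles; the values are illustrative only. */
static int latency_table[N_PRODUCERS][N_CONSUMERS] = {
    /*               C_ALU  C_STORE  C_BITREV */
    /* P_ALU    */ {   1,     1,       1 },
    /* P_LOAD   */ {   2,     2,       2 },
    /* P_BITREV */ {   0,     0,       0 },  /* new row, filled by fill_row() */
};

/* Assumed convention: a consumer that reads the resource in its cycle
 * read_cycle must issue this many cycles after a producer that writes
 * the resource in its cycle write_cycle. */
int raw_latency(int write_cycle, int read_cycle)
{
    assert(write_cycle >= read_cycle);
    return write_cycle - read_cycle + 1;
}

/* Extend the table for a new producer, assuming that every consumer
 * class reads the resource in the same cycle. */
void fill_row(enum Producer p, int write_cycle, int read_cycle)
{
    for (int c = 0; c < N_CONSUMERS; c++)
        latency_table[p][c] = raw_latency(write_cycle, read_cycle);
}

/* Earliest cycle in which a dependent consumer may be issued. */
int earliest_issue(enum Producer p, int producer_issue_cycle, enum Consumer c)
{
    return producer_issue_cycle + latency_table[p][c];
}
```

With a resource written at cycle 2 and read at cycle 0, the assumed rule gives a distance of 3 cycles. Unlike NOP padding, the scheduler is free to fill the intervening slots with independent instructions.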

asm unsigned short bitrev(unsigned short addr, unsigned short n_bit_rev){
    .packs "@{temp} = bit_rev @{addr} @{n_bit_rev};",3
    .packs "@{} = @{temp};",4
}

Figure 5.10. Inline Assembly with Latency

In document Language-driven exploration and implementation of partially re-configurable ASIPs (rASIPs) (Page 68-71)