
Behavioral Template (BT) Generator

In document Copyright Undertaking (Page 140-144)

System-level simulations

7.3 Behavioral Template (BT) Generator

Figure 7.2 also shows the main structure of the BT. The inputs to the template generator are the target HLS frequency and technology library, the interface type (standard or custom), and the latency (Ldly) given in clock cycles. Alternatively, the user can specify the BT's latency manually, which allows different scenarios to be explored quickly. To make the template more flexible, the latency is passed as an input parameter to each template, making it fully parameterizable at runtime: the template does not need to be re-synthesized each time the latency changes.

The template generator takes these inputs and generates two different types of synthesizable ANSI-C programs: BT type I and BT type II. Both types of templates are composed of 4 main parts:

Part 1: Interface read. The first part contains the synthesizable interface API required to read data into the accelerator. The choice of interfaces is based on the synthesizable APIs provided by the commercial tool used in this work. HLS tools provide libraries of synthesizable APIs for standard interfaces in order to facilitate the work of the designer. Currently, FIFO, RAM, AHB, and AXI are supported. For custom interfaces (e.g. a module that has to interface with an LCD display), the user can encapsulate the interface as a synthesizable function and include it in the interface API library used by the BT generator. In this case the interface is taken from this library and only the computational and delay loops are generated (parts 2 and 3).

Fig. 7.4: Input RTL design and the resulting C design using the proposed method of VeriIntel2C. Panel (a): RTL description of a FIR filter.

Part 2: Computational Loop. The second part differs based on the type of BT to be generated. If no COND constructs are found in the original code to be abstracted, a BT of type I is generated. For this type of BT, the computational loop performs some basic computation on the input data. This is important because the HLS tool would optimize the logic of the entire template away if no computation were performed. Hence this part ensures that the template structure is preserved. If COND constructs are present, a BT of type II is generated. In this case the code that computes the COND condition is preserved from the original BIP, including the loop where it is used. The rest of the code is fully abstracted away. In the worst case, this could mean that the BIP cannot be abstracted at all and the BT would be exactly the same as the BIP. It should nevertheless be noted that most accelerators lead to BTs of type I.

Part 3: Delay Loop. The third part contains the delay loop that makes the BT's final latency match the latency of the original BIP. For this purpose a loop containing only a timing descriptor is used. The timing descriptor symbolizes a clock cycle. Commercial HLS tools normally support different scheduling modes. The traditional one is the automatic scheduling mode, in which the HLS scheduler automatically times the behavioral description. The other is the manual mode, in which the user times the behavioral description by inserting the clock boundaries directly in the code. In SystemC this is done with wait statements, while at the ANSI-C level it is vendor specific. In the commercial HLS tool used in this work, a $ sign denotes a clock boundary. Hence a loop with N iterations takes N cycles to execute. For HLS tools that do not allow this type of delay control, the delay can easily be achieved in the computational loop by performing simple operations sequentially in a for loop.

As mentioned previously, the main problem in some applications is that the exact circuit latency is unknown a priori. For this purpose the proposed method has two options. By default, if the user does not specify the latency as an input, the method synthesizes (HLS) the input BIP once and extracts the latency reported by the HLS tool. For BTs of type I, with no COND, this should be accurate. For BTs of type II, with COND, the method synthesizes the BIP and extracts the minimum and maximum latency reported by the HLS tool, both for the full original C description and for a C description containing only the COND part of the code.

Part 4: Interface write. Finally, the last part contains the write-back portion of the interface, again using the synthesizable API provided by the vendor or the custom interface encapsulated in a library by the user.
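The four parts can be illustrated with a minimal plain-C sketch of a type I template. This is not the generator's actual output: the names bt_type1, BT_N and cycles are hypothetical, plain arrays stand in for the vendor interface API, and a counter increment stands in for the vendor-specific clock-boundary marker, which standard C cannot express.

```c
#include <assert.h>
#include <stddef.h>

#define BT_N 8  /* hypothetical number of input words */

/* Hypothetical model of a type I behavioral template (BT).
 * in/out stand in for the vendor interface API (FIFO, RAM, AHB, AXI).
 * ldly is the delay-loop latency in clock cycles, passed at run time.
 * Returns the number of modeled delay cycles. */
unsigned bt_type1(const int in[BT_N], int out[BT_N], unsigned ldly)
{
    int buf[BT_N];
    int acc = 0;
    unsigned cycles = 0;
    unsigned d;
    size_t i;

    assert(in != NULL && out != NULL);

    /* Part 1: interface read. */
    for (i = 0; i < BT_N; i++)
        buf[i] = in[i];

    /* Part 2: computational loop. Trivial work on the inputs so that
     * a synthesis tool cannot optimize the whole template away. */
    for (i = 0; i < BT_N; i++)
        acc += buf[i];

    /* Part 3: delay loop. One iteration models one clock cycle; a real
     * template would place the tool's clock-boundary marker here. */
    for (d = 0; d < ldly; d++)
        cycles++;

    /* Part 4: interface write. */
    for (i = 0; i < BT_N; i++)
        out[i] = acc;

    return cycles;
}
```

Because ldly is an ordinary function argument, changing the latency only changes the bound of the delay loop, which mirrors the runtime parameterization of the template.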

With regard to the time required to generate the BT, it should be noted that the most time-consuming part is the HLS run needed to extract the latency of the accelerator. Only a single HLS run is required for a BT of type I (without data dependencies), while for a BT of type II (with loop data dependencies) our template abstraction method requires two HLS runs: the first on the original behavioral description and the second on the optimized one, in order to determine the latency of the delay loop. During the experimental phase we noted that a single HLS run on any of the benchmarks did not exceed 10 seconds. It should also be noted that the RTL to C conversion is extremely fast, taking less than 1 second. If the user knows the accelerator's latency or wants to explore different what-if scenarios, the time to generate these templates is negligible, as the latency is passed as a parameter to the BT.
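One plausible reading of the two-synthesis flow is that the delay loop makes up the difference between the latency of the original description and that of the reduced one. The sketch below encodes that assumption only; the text does not state the exact formula, the struct and function names are hypothetical, and the input numbers would come from the HLS tool's latency report.

```c
#include <assert.h>

/* Hypothetical latency report, in clock cycles, extracted from one HLS run. */
struct hls_latency {
    unsigned min;
    unsigned max;
};

/* Cycles the delay loop must add so that the template matches the original.
 * Assumption: delay = latency(original) - latency(reduced), clamped at zero. */
unsigned delay_cycles(unsigned original, unsigned reduced)
{
    return original > reduced ? original - reduced : 0u;
}

/* Type II: apply the same rule to the min and max latencies separately. */
struct hls_latency delay_bounds(struct hls_latency original,
                                struct hls_latency reduced)
{
    struct hls_latency d;

    assert(original.min <= original.max);
    assert(reduced.min <= reduced.max);

    d.min = delay_cycles(original.min, reduced.min);
    d.max = delay_cycles(original.max, reduced.max);
    return d;
}
```

When the user supplies the latency directly, neither synthesis run is needed and the supplied value can be passed to the template as is.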
