Liang et al.’s Hi-DMM is a framework that modifies source code containing dynamic memory allocations to make it amenable to high-level synthesis [33]. Through a source-to-source transformation, it replaces generic malloc and free calls with specialized, HLS-amenable buddy allocators.
Their program analysis distinguishes three types of allocation request:
1. Constant-Coarse-Grained Allocation (CCGA): Requested byte size is large and known at compile time
2. Constant-Fine-Grained Allocation (CFGA): Requested byte size is small and known at compile time
3. Variable-Grained Allocation (VGA): Requested byte size is not known at compile time
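The three request types above can be illustrated with a minimal C sketch. It is not Hi-DMM's analysis: the function name classify_request, the alloc_class enum, and the COARSE_THRESHOLD_BYTES cutoff are hypothetical, and the source does not state where Hi-DMM draws the line between small and large requests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* The three allocation request types listed above. */
typedef enum { CCGA, CFGA, VGA } alloc_class;

/* Hypothetical cutoff between "small" and "large" requests; the value
 * Hi-DMM actually uses is not given in the text. */
#define COARSE_THRESHOLD_BYTES 1024u

/* Classify one request. size_is_constant says whether the byte size is
 * known at compile time; size_bytes is ignored when it is not. */
alloc_class classify_request(bool size_is_constant, size_t size_bytes)
{
    if (!size_is_constant)
        return VGA;
    return (size_bytes >= COARSE_THRESHOLD_BYTES) ? CCGA : CFGA;
}
```

Under these assumptions, a call such as malloc(4096) would be classified as CCGA, malloc(16) as CFGA, and malloc(n) with a runtime value n as VGA.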
After the analysis, each specialized allocator is paired with one of the allocation request types in an attempt to improve performance. These specialized buddy allocators can only be accessed through an HLS handshake protocol, which introduces additional latency to the design. Additionally, Hi-DMM automatically partitions the heap through a graph-based analysis of the program using Karger’s algorithm [53]. This algorithm iteratively cuts the graph until the desired number of heaps is reached. The authors’ approach to heap partitioning and dynamic memory support is optimized for their allocation library. In this thesis, we present a static analysis that determines the number of safe heap partitions. Our framework is also general to any well-defined allocation mechanism.
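To make the graph partitioning step concrete, the following C sketch applies random edge contraction in the style of Karger's algorithm and stops once k groups remain. It is an illustration under stated assumptions, not Hi-DMM's implementation: the edge type, the contract_to_k function, the MAX_NODES limit, and the interpretation of nodes as allocation sites are all hypothetical. A single run yields one random partition; Karger's algorithm is normally repeated and the smallest cut kept.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_NODES 64  /* hypothetical upper bound on graph size */

typedef struct { int u, v; } edge;

static int parent[MAX_NODES];

/* Union-find lookup with path halving. */
static int find_root(int x)
{
    while (parent[x] != x) {
        parent[x] = parent[parent[x]];
        x = parent[x];
    }
    return x;
}

/* Contract randomly chosen edges until k groups remain or no edges are
 * left. Returns the number of groups (or -1 if scratch memory cannot be
 * allocated); group_of[i] receives the representative of node i's group. */
int contract_to_k(int n_nodes, const edge *edges, int n_edges, int k,
                  unsigned seed, int *group_of)
{
    assert(n_nodes > 0 && n_nodes <= MAX_NODES && k >= 1);
    for (int i = 0; i < n_nodes; ++i)
        parent[i] = i;

    edge *work = NULL;  /* working copy of the edge list */
    if (n_edges > 0) {
        work = malloc((size_t)n_edges * sizeof *work);
        if (work == NULL)
            return -1;
        memcpy(work, edges, (size_t)n_edges * sizeof *work);
    }

    srand(seed);
    int groups = n_nodes;
    int remaining = n_edges;
    while (groups > k && remaining > 0) {
        int pick = rand() % remaining;
        int ru = find_root(work[pick].u);
        int rv = find_root(work[pick].v);
        if (ru != rv) {          /* edge crosses two groups: contract it */
            parent[ru] = rv;
            --groups;
        }
        work[pick] = work[--remaining];  /* drop the used or self-loop edge */
    }
    free(work);

    for (int i = 0; i < n_nodes; ++i)
        group_of[i] = find_root(i);
    return groups;
}
```

For a connected graph the loop always reaches exactly k groups, because at least one edge crossing two groups remains until then.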
4.10 Summary
In this chapter, we reviewed previous work that investigates the use and optimization of dynamic memory allocation routines, either as a hardware module or within the high-level synthesis process. We also commented on state-of-the-art tools and whether they support the synthesis of dynamic memory allocation routines during the HLS process. From this review, we identified gaps in the study of dynamic memory allocation in HLS. In particular, there has been no comparative evaluation of dynamic memory allocation algorithms in the HLS context. Our work evaluates the high-level synthesis of five dynamic memory allocation algorithms in a number of environments. From this evaluation, we provide a guideline to help HLS designers select an allocator. Additionally, our work explores the performance and area optimizations that are possible with dynamic memory allocation algorithms in HLS.
Chapter 5
dmbenchhls: A Dynamic Memory Allocation Benchmark Suite
There is no standardized way to evaluate the high-level synthesis of dynamic memory allocation techniques and associated techniques. In this chapter, we explore a variety of methods to evaluate dynamic memory allocation within high-level synthesis. We first review several memory request patterns. We then review applications that require dynamic memory allocation.
5.1 Memory Request Patterns
We define a number of memory patterns which typically appear within applications and were previously suggested in [54]. These memory patterns are listed as follows:
5.1.1 Triangle
//==-- Triangular Memory Pattern --==//
int *arr[BOUND];
for (int i = 0; i < BOUND; ++i)
    arr[i] = malloc(choose_size());

//.. Do computation with 2-D Array
for (int i = 0; i < BOUND; ++i)
    free(arr[i]);
Figure 5.1: The triangle memory pattern.
The triangle memory pattern, presented in Fig. 5.1, iteratively requests memory upfront. Once the memory can be released, this pattern iteratively releases the reserved memory. This pattern can vary in a number of ways. Request sizes can be constant or computed by a function (e.g., randomly generated or linearly increasing, represented by choose_size()). Additionally, memory need not be released in the order in which it was requested. The number of design choices for this pattern is large, and dynamic memory allocation algorithms may therefore perform differently depending on the allocation sizes and their ordering. We implement the triangle memory pattern to iteratively request memory in a linear fashion, where the loop iteration index dictates the request size. Additionally, we release memory in the same order it was allocated. This provides a realistic evaluation of an allocator, in the sense that a linearly increasing request size is intended to stress the underlying allocation algorithm [54].
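The implementation choices described above (index-driven request sizes, release in allocation order) can be sketched as a self-contained C function. This is a minimal sketch, not the benchmark's source: the BOUND value, the exact size formula in choose_size, and the triangle_pattern function with its return value are assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>

#define BOUND 16  /* hypothetical number of simultaneously live blocks */

/* Hypothetical size function: the request grows linearly with the index. */
static size_t choose_size(int i)
{
    return (size_t)(i + 1) * sizeof(int);
}

/* Runs the pattern once. Returns the sum of the values stored in the
 * blocks, or -1 if any allocation fails. */
long triangle_pattern(void)
{
    int *arr[BOUND];
    int live = 0;
    long sum = 0;

    /* Ramp up: every block stays live until the release phase. */
    for (int i = 0; i < BOUND; ++i) {
        arr[i] = malloc(choose_size(i));
        if (arr[i] == NULL) {
            sum = -1;
            break;
        }
        arr[i][0] = i;  /* touch the block so the allocation is used */
        ++live;
    }

    /* Stand-in for the computation over the allocated blocks. */
    if (sum == 0) {
        for (int i = 0; i < live; ++i)
            sum += arr[i][0];
    }

    /* Ramp down: release in the same order the blocks were allocated. */
    for (int i = 0; i < live; ++i)
        free(arr[i]);

    return sum;
}
```

Releasing in reverse order, or in a shuffled order, only requires changing the index sequence of the final loop.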
5.1.2 Square
//==-- Square Memory Pattern --==//
for (int i = 0; i < BOUND; ++i) {
    int *arr = (int *)malloc(choose_size());
    //.. do some things
    free(arr);
}
Figure 5.2: The square memory pattern.
The square memory pattern requests memory, executes program logic, and then immediately releases that memory. This request-do-release pattern is iterative. As with the triangle pattern, the request size can be constant or produced by a function. Our implementation requests memory with a size based on the loop index, executes program logic (which does not contain any other memory requests), and then releases the memory.
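The request-do-release loop described above can be sketched as follows. This is a minimal sketch rather than the benchmark's source: the BOUND value, the size formula, and the square_pattern function with its return value are assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>

#define BOUND 16  /* hypothetical number of iterations */

/* Runs the pattern once. Returns a checksum of the work done on each
 * buffer, or -1 if any allocation fails. */
long square_pattern(void)
{
    long sum = 0;

    for (int i = 0; i < BOUND; ++i) {
        size_t n = (size_t)i + 1;  /* element count follows the loop index */
        int *buf = malloc(n * sizeof *buf);
        if (buf == NULL)
            return -1;

        /* Stand-in for program logic; it makes no other memory requests. */
        for (size_t j = 0; j < n; ++j)
            buf[j] = (int)j;
        sum += buf[n - 1];

        free(buf);  /* released before the next request is issued */
    }
    return sum;
}
```

At most one block is live at any time, which is the contrast with the triangle pattern, where all blocks are live together.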
5.1.3 Random
The random memory pattern, depicted in Fig. 5.3, consists of a compute kernel that randomly generates malloc requests at runtime (Lines 34 and 35). Each randomly generated malloc is given a randomly generated request size as input (Line 35). Each request is also given a random lifetime (Line 14), which dictates the number of iterations to wait until free is invoked on this request (Lines 19 to 31).
Our implementation holds the state of randomly generated mallocs in a List structure, which has a fixed size (Line 8).
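The mechanism described above, in which random requests with random sizes and lifetimes are tracked in a fixed-size list, can be sketched in C as follows. This sketch is not the code of Fig. 5.3, and the line numbers cited in the text do not apply to it. The request struct, the constants LIST_SIZE, MAX_REQUEST, and MAX_LIFETIME, the 50% request probability, and the random_pattern function are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>

#define LIST_SIZE    8    /* hypothetical fixed capacity of the list */
#define MAX_REQUEST  256  /* hypothetical largest request, in bytes */
#define MAX_LIFETIME 5    /* hypothetical longest lifetime, in iterations */

typedef struct {
    void *ptr;      /* NULL marks an empty slot */
    int lifetime;   /* iterations left before the block is freed */
} request;

/* Runs the pattern. Returns the number of allocations that were never
 * freed, which is 0 when every request is released. */
int random_pattern(int iterations, unsigned seed)
{
    request list[LIST_SIZE] = {{NULL, 0}};
    int allocs = 0, frees = 0;

    srand(seed);
    for (int it = 0; it < iterations; ++it) {
        /* Age every live request and free those whose lifetime expired. */
        for (int s = 0; s < LIST_SIZE; ++s) {
            if (list[s].ptr != NULL && --list[s].lifetime <= 0) {
                free(list[s].ptr);
                list[s].ptr = NULL;
                ++frees;
            }
        }
        /* Randomly decide whether to issue a new request this iteration. */
        if (rand() % 2 == 0) {
            for (int s = 0; s < LIST_SIZE; ++s) {
                if (list[s].ptr != NULL)
                    continue;
                size_t size = (size_t)(rand() % MAX_REQUEST) + 1;
                list[s].ptr = malloc(size);
                if (list[s].ptr != NULL) {
                    list[s].lifetime = rand() % MAX_LIFETIME + 1;
                    ++allocs;
                }
                break;  /* at most one new request per iteration */
            }
        }
    }
    /* Drain the requests that are still live at the end of the run. */
    for (int s = 0; s < LIST_SIZE; ++s) {
        if (list[s].ptr != NULL) {
            free(list[s].ptr);
            ++frees;
        }
    }
    return allocs - frees;
}
```

When the list is full, a new request is skipped for that iteration, so the fixed list size bounds the number of simultaneously live blocks.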
As an additional evaluation methodology, we created five C applications that require dynamic memory allocation routines and are amenable to hardware implementation. Each benchmark in this suite differs significantly in high-level behavior and was inspired by real-life applications [55, 56].
• priq: A priority queue. An array of random numbers is queued (one at a time) and then popped, exhibiting a square memory pattern.