A Pre-allocated Address Allocation Scheme

In this section, we describe a novel dynamic memory allocation algorithm that is amenable to the high-level synthesis process. Through a review and analysis of existing dynamic memory allocation mechanisms, we identified the following shortcomings of prior approaches:

• Searching for free memory is costly (e.g., traversing a doubly-linked list to find free memory, or reading a bitmap to find a free segment).

• Once free memory is located, calculating its address can be computationally complex (e.g., see the bitmap allocator in Section 3.3).

• Repeated requests of similar sizes are handled poorly by existing approaches (i.e., there is no pool of pre-allocated, similar-sized memory segments).

Additionally, we have not yet described how these algorithms translate to hardware, nor their performance and area impacts; so far, our discussion has remained purely algorithmic. Some of these algorithmic features may prevent performance optimizations during the high-level synthesis process. Based on these findings, we propose a new dynamic memory allocation scheme that is both flexible and designed for hardware.

Our algorithm makes use of a key-value data structure, which we represent as:

$$ f : K \rightarrow V \tag{3.1} $$

where f is a function that maps each member (key) of K to a member of V. In our scheme, K is the set of positive integers, Z⁺, which represents all possible memory request sizes. We wish to map a request size to a pre-allocated segment of memory that meets or exceeds it. Pre-allocated segments are grouped into bins by their allocation size, which is always a power of two. Each bin has up to n pre-allocated memory segments of the corresponding size. All pre-allocated segments in a bin come from a contiguous memory region (i.e., the second segment in the bin begins directly after the end of the first segment's reservation), and each bin's memory space is disjoint from the others (i.e., an array is reserved for each bin). To track the reservation status of the pre-allocated segments, an n-bit vector is included with each bin to mark whether each address has been reserved. The set of all pre-allocated addresses is the set V. This is shown in Fig. 3.5, where a request size β (e.g., β = 16) maps to a list of candidate addresses (A₁₆₀, A₁₆₁, ...) whose reservation is tracked by a bit-vector (b₁₆₀, b₁₆₁, ...).

Chapter 3. Dynamic Memory Allocation Schemes 19

Figure 3.5: The general structure of a pre-allocated address allocation scheme.
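To make the layout concrete, the following C sketch models the bins of Fig. 3.5 as plain structures. It is a minimal illustration under assumed parameters, not the thesis's implementation: the bin sizes (1 to 64 bytes), n = 8 segments per bin, the heap base address, and the names bin_t, init_bins and seg_addr are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical configuration (not specified in the text). */
#define MAX_LOG2  6                 /* largest bin holds 2^6 = 64-byte segments */
#define NUM_BINS  (MAX_LOG2 + 1)    /* bins for 1, 2, 4, ..., 64 bytes */
#define N_SEGS    8                 /* n: segments per bin, one bit each */

typedef struct {
    uint32_t base;  /* first address of this bin's contiguous array */
    uint32_t size;  /* segment size in bytes, a power of two */
    uint8_t  bv;    /* n-bit vector: bit i == 1 means segment i is reserved */
} bin_t;

/* Place each bin's array directly after the previous one, so bins are
   disjoint and the segments inside a bin are back to back. */
static void init_bins(bin_t bins[NUM_BINS], uint32_t heap_base) {
    uint32_t next = heap_base;
    for (int k = 0; k < NUM_BINS; k++) {
        bins[k].base = next;
        bins[k].size = 1u << k;
        bins[k].bv   = 0;               /* nothing reserved yet */
        next += N_SEGS * bins[k].size;
    }
}

/* Address of segment i in bin k: base + i * segment size. */
static uint32_t seg_addr(const bin_t *b, unsigned i) {
    assert(i < N_SEGS);
    return b->base + i * b->size;
}
```

Calling init_bins(bins, 0x1000) places the 16-byte bin at 0x1078, with its segments at 0x1078, 0x1088, and so on. Starting the heap at a non-zero base keeps address 0 available as the failure value that the scheme returns.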

We now define the mapping function f.

Suppose a memory request of β bytes is issued. Our algorithm must first identify which bin of pre-allocated segments to select from. Since each bin holds segments whose size is a power of two, we compute ⌈log₂ β⌉ (the base-2 logarithm, rounded up) to select the bin whose segment size, 2^⌈log₂ β⌉, meets or exceeds the request size. The n-bit vector associated with that bin is then inspected to locate an available address. To avoid a linear search through all n bits for a reservable address, we adopt the following technique from [33] to locate a free, pre-allocated address. Given the bin's bit-vector, Bv = {bvₙ₋₁, ..., bv₁, bv₀}, the following operation is performed:

$$ (\lnot B_v) \land \lnot\big((\lnot B_v) - 1\big) $$

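This isolates the first (lowest-position) 0-valued bit in the bit-vector. Here, ¬ and ∧ denote bitwise NOT and AND, and the subtraction treats ¬Bv as an unsigned n-bit integer: complementing Bv turns every free (0) bit into a 1, subtracting 1 clears the lowest such 1 and sets every bit below it, so complementing that result and ANDing it with ¬Bv leaves only the lowest free bit set. If every bit is reserved, the result is 0. Each bit position maps to a unique address in the bin (i.e., bv₀ represents the reservation status of Av₀), so the isolated bit identifies a free address. We provide an example: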

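$$
\begin{aligned}
B_v &= \{1, 1, 1, 0, 1, 1, 0, 1\} \\
\lnot B_v &= \{0, 0, 0, 1, 0, 0, 1, 0\} \\
\lnot B_v - 1 &= \{0, 0, 0, 1, 0, 0, 0, 1\} \\
\lnot(\lnot B_v - 1) &= \{1, 1, 1, 0, 1, 1, 1, 0\} \\
(\lnot B_v) \land \lnot\big((\lnot B_v) - 1\big) &= \{0, 0, 0, 0, 0, 0, 1, 0\}
\end{aligned}
$$

The bit-vectors above are written with bv₀ as the rightmost (least-significant) entry, so the single remaining 1 marks bv₁: Av₁ is the lowest-position free address in the bin.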
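The bit manipulation can be checked in C against the 8-bit example above. This is a minimal sketch assuming an 8-segment bin whose bit-vector is held in a uint8_t with bv₀ as the least-significant bit; lowest_free_mask and free_index are hypothetical names. The explicit uint8_t casts keep the arithmetic at n = 8 bits, since C would otherwise promote the operands to int.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper for an 8-segment bin: returns a one-hot mask of the
   lowest 0-valued bit of bv, computed as (~bv) & ~((~bv) - 1).
   Casts keep every intermediate at 8 bits (C promotes uint8_t to int).
   Returns 0 when every bit is 1, i.e. the bin is full. */
static uint8_t lowest_free_mask(uint8_t bv) {
    uint8_t inv   = (uint8_t)~bv;          /* free (0) bits become 1 */
    uint8_t below = (uint8_t)(inv - 1u);   /* clears inv's lowest 1, sets bits under it */
    return (uint8_t)(inv & (uint8_t)~below);
}

/* Hypothetical helper: bit position of the lowest free segment, or -1 if the
   bin is full. The loop stands in for a one-hot-to-binary encoder. */
static int free_index(uint8_t bv) {
    uint8_t mask = lowest_free_mask(bv);
    assert(mask == 0 || (mask & (mask - 1)) == 0);   /* at most one bit set */
    for (int i = 0; i < 8; i++)
        if (mask == (uint8_t)(1u << i))
            return i;
    return -1;
}
```

For the example, lowest_free_mask(0xED), i.e., Bv = {1, 1, 1, 0, 1, 1, 0, 1}, yields 0x02, and free_index reports bit 1. In hardware, the loop in free_index would more plausibly be realized as a one-hot-to-binary encoder than as a sequential search.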

After a free address has been uncovered, the bit representing this address is marked as reserved (updated to a logic-1), and the address is returned to the requester.
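The bin-selection step, ⌈log₂ β⌉, can be sketched as a small C helper. This assumes bins are indexed by k so that bin k holds 2^k-byte segments; bin_index is a hypothetical name, and the loop is written for clarity rather than as the hardware structure the scheme would be synthesized into.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns k = ceil(log2(beta)), the index of the
   smallest power-of-two bin (2^k bytes) that meets or exceeds beta.
   Requires 1 <= beta <= 2^31 so that 1u << k never overflows. */
static unsigned bin_index(uint32_t beta) {
    assert(beta >= 1u && beta <= (1u << 31));
    unsigned k = 0;
    while ((1u << k) < beta)   /* stop at the first 2^k >= beta */
        k++;
    return k;
}
```

For example, bin_index(12) and bin_index(16) both select bin 4 (16-byte segments), while bin_index(17) selects bin 5. One possible alternative, which we suggest rather than take from the text, is to derive k from a leading-zero count of β − 1 (for β > 1, k = 32 − clz(β − 1) on a 32-bit value), which maps naturally to a priority encoder.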

We define another key-value mapping, f⁻¹ : V → B, which maps an address back to the bit representing its reservation status (here, B denotes the set of all reservation bits). This mapping is used to reset that bit, releasing the address's reservation. f⁻¹ is described below.

When a reserved memory address is released, the following process occurs. Since every address belongs to exactly one bin, all bins are searched for the incoming address. Once the bin is identified, the bit representing the address is located by searching that bin's bit-vector for the matching address (recall that the bit-to-address mapping is also unique). Once found, the corresponding bit is reset to 0.
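The release procedure can be sketched in the same style. This minimal C version mirrors the two searches in the text (first over the bins, then over the matched bin's segments); the parameters and the names bin_t, init_bins and pa_free are hypothetical, and in hardware the address comparisons would likely be performed in parallel rather than in nested loops.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical configuration: 7 bins (1..64-byte segments), 8 segments each. */
#define NUM_BINS 7
#define N_SEGS   8

typedef struct {
    uint32_t base;  /* first address of the bin's contiguous array */
    uint32_t size;  /* segment size in bytes (power of two) */
    uint8_t  bv;    /* bit i == 1 means segment i is reserved */
} bin_t;

/* Lay the bins out back to back starting at heap_base. */
static void init_bins(bin_t bins[NUM_BINS], uint32_t heap_base) {
    uint32_t next = heap_base;
    for (int k = 0; k < NUM_BINS; k++) {
        bins[k].base = next;
        bins[k].size = 1u << k;
        bins[k].bv   = 0;
        next += N_SEGS * bins[k].size;
    }
}

/* f^-1: find the bin and bit that own addr, then clear that bit.
   Returns 1 if addr is a pre-allocated address, 0 otherwise (an assumed
   convention; the text does not say how unknown addresses are handled). */
static int pa_free(bin_t bins[NUM_BINS], uint32_t addr) {
    for (int k = 0; k < NUM_BINS; k++) {          /* search every bin */
        for (unsigned i = 0; i < N_SEGS; i++) {   /* search the bin's bits */
            if (bins[k].base + i * bins[k].size == addr) {
                bins[k].bv &= (uint8_t)~(1u << i); /* release: bit -> 0 */
                return 1;
            }
        }
    }
    return 0;
}
```

Because the bins are disjoint, at most one (bin, bit) pair can match a given address, so the search can stop at the first hit.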

We extend our algorithm to handle two additional cases. First, when the scheme receives a request for β bytes, it checks whether the request is larger than the largest bin size; if so, address 0 is returned. Second, all pre-allocated segments in the selected bin may already be reserved. In this case, a linear search takes place through V, iterating over all members of V whose pre-allocated memory segments are equal to or larger than the requested size. If a free address is located during this search, it is reserved and returned using the procedure above. Otherwise, address 0 is returned.
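Putting the pieces together, the allocation path, including both fallback cases, might look like the following minimal C sketch. All parameters and names (MAX_LOG2, N_SEGS, HEAP_BASE, pa_init, try_bin, pa_malloc) are hypothetical. The text describes the fallback as a linear search over V; this sketch assumes that search can proceed bin by bin in order of increasing segment size, reusing the bit trick within each bin. It also assumes a request of 0 bytes returns address 0, since K = Z⁺ excludes it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical configuration, not taken from the thesis. */
#define MAX_LOG2  6                /* largest bin: 2^6 = 64-byte segments */
#define NUM_BINS  (MAX_LOG2 + 1)
#define N_SEGS    8                /* n segments per bin (fits a uint8_t) */
#define HEAP_BASE 0x1000u          /* non-zero, so 0 can mean "no address" */

typedef struct {
    uint32_t base;  /* first address of the bin's contiguous array */
    uint32_t size;  /* segment size in bytes (power of two) */
    uint8_t  bv;    /* bit i == 1 means segment i is reserved */
} bin_t;

static bin_t bins[NUM_BINS];

static void pa_init(void) {
    uint32_t next = HEAP_BASE;
    for (int k = 0; k < NUM_BINS; k++) {
        bins[k].base = next;
        bins[k].size = 1u << k;
        bins[k].bv   = 0;
        next += N_SEGS * bins[k].size;
    }
}

/* Reserve the lowest free segment of bin k; return its address, or 0 if full. */
static uint32_t try_bin(int k) {
    uint8_t inv  = (uint8_t)~bins[k].bv;                  /* free bits -> 1 */
    uint8_t mask = (uint8_t)(inv & (uint8_t)~(uint8_t)(inv - 1u));
    if (mask == 0)
        return 0;                                         /* bin is full */
    unsigned i = 0;
    while (!(mask & (1u << i)))                           /* one-hot -> index */
        i++;
    bins[k].bv |= mask;                                   /* mark reserved */
    return bins[k].base + i * bins[k].size;
}

/* Allocate beta bytes; returns address 0 on failure. */
static uint32_t pa_malloc(uint32_t beta) {
    /* Reject requests above the largest bin (and, by assumption, beta == 0). */
    if (beta == 0 || beta > (1u << MAX_LOG2))
        return 0;
    int k = 0;
    while ((1u << k) < beta)           /* k = ceil(log2(beta)) */
        k++;
    uint32_t addr = try_bin(k);
    /* Selected bin full: try each larger bin in turn. */
    for (int j = k + 1; addr == 0 && j < NUM_BINS; j++)
        addr = try_bin(j);
    return addr;
}
```

With these parameters, a 12-byte request is served from the 16-byte bin; once all eight 16-byte segments are reserved, the next 16-byte request spills into the 32-byte bin, and any request above 64 bytes returns 0.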

3.6 Summary

In this chapter, we reviewed four dynamic memory allocation algorithms from the literature. Based on this investigation, we proposed a new dynamic memory allocation algorithm that addresses some shortcomings of previous methods and is both flexible and designed for hardware.

Chapter 4

Related Work

We review prior work that attempts to bridge the gap between high-level synthesis and both dynamic data structures and dynamic memory allocation techniques. We highlight work that also aims to improve the performance and area of dynamic memory allocation schemes within high-level synthesis.

4.1 Dynamic Memory Allocation Support in Modern High-Level Synthesis Tools

We review modern (as of 2019) commercial and academic HLS tools and indicate whether each tool supports the high-level synthesis of dynamic memory allocation schemes. We limit our review to tools that accept C or C++ as input. We summarize this information in Table 4.1 and briefly describe the high-level synthesis tools that support synthesizable dynamic memory allocation (i.e., tools that can synthesize a dynamic memory allocation algorithm implemented in C into a hardware description language).

CHC is a high-level synthesis compiler and associated tool chain from Altium [36]. This tool accepts a large subset of the C language as input (we believe recursion and function pointers remain unsupported).

It also supports dynamic memory allocation, although exactly how this support is implemented is unclear. Therefore, we cannot determine whether, or which, optimizations this tool applies to programs that use dynamic memory allocation.

Both Bambu [25] and gcc2verilog [44] support dynamic memory allocation constructs during the HLS process. Both frameworks are built on the GNU Compiler Collection (GCC).

These tools operate on GCC's intermediate representation (IR), GIMPLE [45, 13]. GIMPLE representations of dynamic memory allocation algorithms can be constructed and linked into the GIMPLE representation of the user program. If the user does not provide alternate definitions of malloc() and free(), system-level definitions (i.e., malloc() and free() from the operating system) or tool-included definitions are used instead. These definitions are written in the C programming language and use on-chip BRAMs for heap memory; thus, no additional on-chip processor is required to use these algorithms. For example, Bambu provides a C implementation of malloc() and free() taken from [46], which employs a linked-list memory allocation algorithm and implements heap memory in on-chip BRAMs. However, this can be undesirable, since it is not clear whether the provided algorithm is best suited to a particular HLS application [9].

Chapter 4. Related Work 22

Table 4.1: List of modern, C-based HLS tools and their support for synthesizing dynamic memory allocation constructs. Entries marked -ᵘ indicate that the documentation is unclear.

HLS Tool               Proprietor        Synthesizable Dynamic Memory Support
eXCite [34]            Y Explorations    No
Catapult HLS [35]      Mentor Graphics   No
CHC [36]               Altium            Yes
Stratus [37]           Cadence           -ᵘ
DK Design Suite [38]   Mentor Graphics   -ᵘ
ROCCC [39]             Jacquard Comp.    No

Although these three modern tools support synthesizing dynamic memory constructs to hardware, they have several limitations. Users may not be able to (1) set the size of the heap, (2) specify how many heaps are assigned to the program to facilitate parallelism, or (3) explore the performance and area impacts of different allocation strategies. The work presented in this thesis differs from these tools: we present an exploration of a variety of dynamic memory allocation mechanisms and their hardware equivalents, as well as automated performance and area optimizations for dynamic memory allocation mechanisms. Our work also allows the user to express a variety of design constraints pertinent to dynamic memory allocation algorithms.


4.2 Synthesis of Hardware Models in C With Pointers and

In document Dynamic Memory Allocation Techniques for High-Level Synthesis. Nicholas V. Giamblanco (Page 28-33)