3.3 HALLOC: A Hierarchical Memory Allocator
3.3.3 Memory De-allocation
Hierarchical allocation implies hierarchical de-allocation, but at this point we do not free local chunks back to the SHM-Arena or to a super-chunk. The reasoning behind this decision is that even if we returned chunks to the SHM-Arena, we could still see fragmentation. Once a super-chunk is added to a local allocator, it is broken down into even smaller chunks (as and when new allocation requests arrive), which prevents coalescing at a global level. As mentioned earlier, there are two types of de-allocation; we refer to them as local free and remote free.
Each core maintains a garbage list containing chunks marked for de-allocation. In the case of a remote free, the chunk is added to the garbage list of its owner core.
As with the allocation functions, several functions deal with the de-allocation of memory allocated with scc_malloc. They are listed below:
scc_free A wrapper function and counterpart to scc_malloc. This wrapper hides all the details of de-allocation from the programmer and calls the appropriate internal de-allocation function.
scc_free_local The local de-allocator and counterpart to the local allocator scc_malloc_local; like the allocator, it is described in [71]. scc_free_local returns the chunk being freed to the core-local free list and provides immediate coalescing. This function is called only for a local free and only by the owner core.
scc_free_remote When a chunk is de-allocated by a remote free, scc_free_remote is called to add the chunk to the appropriate garbage list. This function is called by any core that is not the owner of the chunk being freed. In the case of a remote free, deferred coalescing takes place to reduce fragmentation.
scc_free_garbage This function is called to de-allocate the chunks that have been placed in the garbage list of a core.
Algorithm 3 Algorithm scc_free to De-allocate Memory Pointed to by p
1: if not (shmStart ≤ p ≤ shmEnd) then
2:     OS_standard_free(p) ◃ p points to private memory
3:     return
4: if owner-id = core-id then
5:     mutex_lock()
6:     scc_free_local(p) ◃ K&R free
7:     mutex_unlock()
8: else
9:     air_lock()
10:     scc_free_remote(p, owner-id) ◃ add chunk to garbage list of owner core
11:     air_unlock()
We know the addresses at which the SHM-Arena starts and ends from Listing 3.2. We can use this information to implement a simple protection mechanism that checks whether an address is valid: an address is a valid shared memory address if the condition shmStart ≤ address ≤ shmEnd is satisfied (line 1). If the memory pointed to by p was allocated in the private region using the standard malloc (the malloc provided by the OS), it must be freed using the standard free (line 2). This case can occur in practice, and the check also serves as a protection measure. When p was allocated by scc_malloc, one of two actions is performed. In the first case, the owner-id from the chunk header and the core-id are compared and found to be the same (line 4). Then the function scc_free_local, which is the standard free function corresponding to scc_malloc_local, is called under a mutex lock to ensure thread-safe operation (lines 5–7). Immediate coalescing is also performed by scc_free_local; a detailed description and implementation details can be found in [71]. In the second case, where the ids do not match, scc_free_remote is called. Since scc_free_remote can be invoked by multiple cores simultaneously, it is protected by a lock implemented using the Atomic Increment Counter (AIR), described in Section 2.5.3. Since the implementation of scc_free_remote is very simple, we omit any code or pseudocode. In short, each core maintains an array in which each element points to the garbage list of another core, and the index of an element is equal to the core-id/owner-id. A chunk being remote freed is simply inserted at the beginning of a garbage list.
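The dispatch in Algorithm 3 and the head insertion performed by scc_free_remote can be sketched in C as follows. This is a minimal illustration, not the HALLOC implementation: the chunk_t header layout (placed directly before the payload), the single garbage_list array indexed by owner id, NUM_CORES, the local_frees counter, and the empty lock functions are all assumptions introduced here so that the sketch compiles on its own. The real scc_free_local is the K&R-style free described in [71].

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define NUM_CORES 48            /* assumption: number of cores */

/* Hypothetical chunk header; the text only states that it holds the owner id. */
typedef struct chunk {
    struct chunk *next;         /* link used while the chunk is on a garbage list */
    int owner_id;               /* core that allocated the chunk */
} chunk_t;

/* Stand-ins for state that the allocator would set up elsewhere. */
static uintptr_t shm_start, shm_end;        /* bounds of the SHM-Arena */
static int core_id;                         /* id of the calling core */
static chunk_t *garbage_list[NUM_CORES];    /* one list head per owner core */
static int local_frees;                     /* demo counter only */

/* Placeholder locks: a real build would use a mutex and the AIR-based lock. */
static void mutex_lock(void) {}
static void mutex_unlock(void) {}
static void air_lock(void) {}
static void air_unlock(void) {}

/* Placeholder for the K&R-style local free with immediate coalescing. */
static void scc_free_local(void *p) { (void)p; local_frees++; }

/* Remote free: insert the chunk at the head of the owner's garbage list. */
static void scc_free_remote(chunk_t *c, int owner) {
    assert(owner >= 0 && owner < NUM_CORES);
    c->next = garbage_list[owner];
    garbage_list[owner] = c;
}

void scc_free(void *p) {
    uintptr_t addr = (uintptr_t)p;
    if (!(shm_start <= addr && addr <= shm_end)) {
        free(p);                            /* p points to private memory */
        return;
    }
    chunk_t *c = (chunk_t *)p - 1;          /* assumed: header precedes payload */
    if (c->owner_id == core_id) {
        mutex_lock();
        scc_free_local(p);                  /* local free by the owner core */
        mutex_unlock();
    } else {
        air_lock();
        scc_free_remote(c, c->owner_id);    /* defer to the owner core */
        air_unlock();
    }
}
```

Because the insertion touches only the list head, the critical section guarded by the AIR lock stays constant-time regardless of how many chunks are already waiting on the list.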
Algorithm 4 Algorithm scc_free_garbage to De-allocate Memory from Garbage List
1: air_lock()
2: glfirst ← garbage_list ◃ copy the garbage list
3: garbage_list ← NULL ◃ make the garbage list empty
4: air_unlock()
5: mutex_lock()
6: while glfirst ≠ NULL do
7:     glnext ← chunk after glfirst in list
8:     scc_free_local(glfirst) ◃ K&R free
9:     glfirst ← glnext
10: mutex_unlock()
Algorithm 4 describes how a core de-allocates the chunks that were added to its garbage list by other cores. This algorithm is executed during the garbage clean-up process, which is invoked by scc_malloc when the first attempt to allocate memory fails. There are two stages in scc_free_garbage:
Lines 1–4 We protect this region with a lock implemented with AIR so that the garbage list cannot be corrupted by concurrent access from other cores. Once the lock is acquired, we copy the garbage list to a local list and set the garbage list to NULL (to mark it empty). This is an inexpensive operation: because the garbage list is a linked list, copying the pointer to its first chunk is enough. This strategy reduces the time that other cores have to wait to access the garbage list while the owner core is in its clean-up routine.
Lines 5–10 Here, since we are accessing the free list (which is accessed only by the owner core), a mutex is enough to protect against concurrent access. We then simply loop through the copied garbage list until its end and add each chunk to the free list by calling scc_free_local.
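The two stages of Algorithm 4 can be sketched in C as shown below. This is an illustrative sketch under stated assumptions rather than the actual HALLOC code: the chunk_t type with a single next link, the free_list and freed_count variables, the empty lock functions, and the simplified scc_free_local (which only pushes the chunk onto a list and performs no coalescing) are hypothetical stand-ins.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical chunk header reduced to the link field needed here. */
typedef struct chunk {
    struct chunk *next;
} chunk_t;

static chunk_t *garbage_list;   /* filled by other cores via remote free */
static chunk_t *free_list;      /* touched only by the owner core */
static int freed_count;         /* demo counter only */

/* Placeholder locks: a real build would use the AIR-based lock and a mutex. */
static void air_lock(void) {}
static void air_unlock(void) {}
static void mutex_lock(void) {}
static void mutex_unlock(void) {}

/* Simplified stand-in for the K&R-style free: push onto the free list. */
static void scc_free_local(chunk_t *c) {
    assert(c != NULL);
    c->next = free_list;
    free_list = c;
    freed_count++;
}

void scc_free_garbage(void) {
    /* Stage 1 (lines 1-4): detach the whole list by copying its head pointer. */
    air_lock();
    chunk_t *gl_first = garbage_list;
    garbage_list = NULL;
    air_unlock();

    /* Stage 2 (lines 5-10): drain the private copy into the free list. */
    mutex_lock();
    while (gl_first != NULL) {
        /* Read the successor first; scc_free_local overwrites the link. */
        chunk_t *gl_next = gl_first->next;
        scc_free_local(gl_first);
        gl_first = gl_next;
    }
    mutex_unlock();
}
```

Reading gl_next before the call matters in this sketch, because the stand-in scc_free_local reuses the same link field; a real free routine may likewise overwrite the chunk's contents when it coalesces.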