Software Perspective - THESIS DEPOSIT - Addressing parallel application memory consumption

2. THESIS DEPOSIT

2.2 Software Perspective

In this section we evaluate the memory system from a software perspective, focusing on how applications interact with the operating system and in-turn the underlying hardware. From this we can then evaluate how memory allocations are handled and the different methods of measuring memory consumption.

2.2.1 Allocations

Different programming languages have different methods of allocating space, such as Fortran’s ‘allocatable’ or C++’s ‘new’. These calls all result in POSIX user level memory management calls, such as malloc and calloc. These allocations are, in turn, handled by the memory allocator which manages the virtual address space. The allocator will request memory pages from the operating system via one of the two system calls, ‘mmap’ and ‘brk’. Although theses system calls are generally only used by the memory allocator they can be called explicitly by the user for advanced memory management.

Malloc Malloc is the standard memory allocation function, which returns a block of virtual memory address space of the given size (in bytes). The memory is not automatically initialised to a value or touched, and so the allocator may employ lazy allocation principles.

Calloc Unlike malloc, calloc initialises all of the allocated bits to zero thus preventing lazy allocation. Calloc is designed for allocating a block of memory composed of an array of elements of specified size.

Realloc Realloc is designed for extending a region of memory, or potentially shrinking it. Where contiguous space in the virtual address space is available the call extends the memory region in-place, otherwise it relocates it, in both cases the content is preserved.

Free The free function call returns a block of memory to the allocator, and enables the region to be used by other allocations. Unlike languages with garbage collection, in POSIX languages free must be called explicitly to deallo- cate memory.

mmap Memory Map (mmap) is an allocator for larger, independent, chunks

of memory. This methodology reduces fragmentation, as chunks can be indepen- dently deallocated. In addition to mapping physical memory to virtual memory ‘mmap’ can be used to map other devices, or even files, to memory. We note that whilst this memory is not technically part of the virtual memory heap we do not differentiate between the two.

brk The use of ‘brk’ is a more traditional way of obtaining more memory

from the operating system. It does this by expanding the current data area in a continuous block. A call to ‘brk’ with a negative value will shrink the data area, again in a continuous block. Calls move the ‘program break’ location, defining the end of the data segment for that process. Due to the use of continuous regions of memory, fragmentation can occur, which can prevent the whole data region from being shrunk. To minimise these effects of fragmentation, ‘brk’ is traditionally only used for small allocations.

The research in this thesis tends to sit at the application level, and thus is allocator agnostic. As such our experiments are based on the default system

Kernel Space Stack Memory Maps Text Segment Data Segment BSS Segment Heap mmap brk Max stack size (RLIMIT_STACK)

Dynamic libraries, file mappings & large allocations

Small allocations

Uninitialised static variables Initialised static variables Binary Image (ELF) Low address

High address

Figure 2.5: Structure of virtual memory address space for application

allocator.

Lazy Allocation

Many operating systems operate on the principle of lazy allocations. This means that not all of the memory allocated is mapped from virtual memory to physical memory instantly.

With a large allocation, malloc may return a valid virtual memory address, but not all of the pages are mapped to physical pages. Only when a page is accessed is it actually mapped. This behaviour allows the operating system to oversubscribe hardware pages, between applications, making use of space which has been allocated but not used. This can cause Out Of Memory (OOM) issues with the system, where the OOM killer will kill processes to free memory.

This behaviour is one of the main differences between malloc and calloc, as calloc initialises the memory, thus invoking the mapping. This ensures the memory is actually available when the application comes to use it.

2.2.2 Virtual Memory

Figure 2.5 represents the layout of a C style application in virtual memory on a Linux style operating system. From this it is clear to see the regions of memory where consumption can occur. In the figure we differentiate between the memory map region and the heap, and present more discussion on these regions in Section 2.2.1.

Memory, from an application perspective, is generally split into two loca- tions: the stack and the heap (including the memory mapped region).

The Stack This region is generally used for small bits of data, such as variables and function parameters. It is composed of ‘stack frames’, which represent the state of the current function. Stack frames are stacked in order of call, thus are easily traversable and accessible through a single stack pointer.

Memory Mapping Segment This region of memory is consumed by calls

to ‘mmap’, and is for use for dynamic libraries and the mapping files or devices to memory. Memory maps can also be used by allocators to store large objects.

The Heap This region is managed by calls to ‘brk’, which maintains a continuous data region. The majority of allocations are served by this data region. The term is used throughout this thesis to refer to the memory region consumed by allocations, though the allocator may use either ‘brk’ or ‘mmap’ to handle the specific allocations.

2.2.3 Reporting Consumption

The use of virtual memory management in Linux makes the recording of memory consumptions somewhat complex. It introduces subtleties of what should be viewed as the memory consumption of an application, and how it should be reported.

continuous, but is actually mapped to a number of pages of physical memory. Pages inside the virtual memory space may or may not actually be loaded into memory, depending on current usage. Similarly different pages of physical memory can be shared between multiple virtual memory spaces, in the form of shared memory.

We detail a few of these perspectives here, and discuss the role they play in the research presented in this thesis. Of these, one focuses on application level and the remaining three focus on the operating system level analysis.

Allocations

At the highest level the allocation’s size is the volume of memory requested by the application. This is the memory consumption we choose to focus on, as it is closely coupled to the application. As such this volume should not differ between platforms or operating systems, though the memory consumed by system libraries may do.

We note that due to the specifics of allocator memory management, discussed in Section 2.2.1, we must measure allocations at the application level, as opposed to the system level. Monitoring the system calls to ‘mmap’ and ‘brk’ would only indicate the memory assigned to the allocator, rather than what the application specifically requested. As such there is no allowance for understanding the effects of lazy allocations here; we simply record what the application requests, and not how the operating system handles the allocations.

Virtual Memory Size

Virtual Memory Size (VSZ) represents an over-prediction of memory consumption, by recording a total of everything that is used and everything that can be used. This represents the complete size of the virtual memory address space. This size also includes memory which hasn’t been loaded into physics memory pages. As such this metric is very close to measuring allocations at the application level, except with reduced information about the structure of the

allocations.

Resident Set Size

Resident Set Size (RSS) represents a total memory consumption which includes shared memory. Whilst shared memory can be accessed by multiple different threads, the RSS value of all owning threads will include the full volume of the shared memory in their report.

Proportional Set Size

Proportional Set Size (PSS) is seen as a fair way of representing memory consumption, but accounting for a proportional cost of shared libraries. Rather than RSS assuming each process takes full account of the memory used by shared libraries PSS splits the consumption of each shared library, by the number of processes sharing it.

This method allows heavily shared libraries to contribute a small volume of consumption to each process utilising it. This is especially applicable within the domain of HPC, where multiple instances of the same application are loaded on the same node.

In document Addressing parallel application memory consumption (Page 52-57)