
9.2 Memory Footprint

9.2.1 Static Memory Allocation

In this test, we try to estimate the static memory allocation overhead of run-time environments in order to find out whether our C code generated from RPython is able to compete with handwritten C/C++.

For every programming language and programming environment, there is some infrastructure that allows the program to run. For instance, for a C program, the standard C library has to be loaded into memory and some data structures have to be instantiated. The standard Java environment contains not only a huge and powerful standard library, but also the byte-code interpreter and a JIT compiler.

Chapter 9. Benchmarks

| Program \ List Length | 100 | 1000 | 10000 | 100000 |
|---|---|---|---|---|
| PyPy-C, BoehmGC | 663 | 663 | 881 | 3043 |
| PyPy-C, refGC, uClibc | 82 | 82 | 348 | 1696 |
| C | 279 | 279 | 541 | 1892 |
| C, uClibc | 74 | 74 | 336 | 1688 |
| C, BoehmGC | 598 | 598 | 602 | 2208 |
| CPP | 807 | 811 | 811 | 2159 |
| PyPy-JVM | 11694 | 11690 | 11989 | 13476 |
| CPython | 2781 | 2748 | 4579 | 21586 |

Table 9.1: VmRSS Memory Consumed by a Process with a Linked List, in KiB.

There may also be a significant language-dependent overhead connected with dynamic object allocation, i.e., allocation on the heap. The overhead of plain C consists only of the data structures needed for the heap management itself. Support for object-oriented programming needs additional infrastructure such as virtual method tables and data for run-time type identification. Objects in fully dynamic languages such as Python or JavaScript are usually implemented as hash tables, which allow flexibility but are much more memory-hungry than C structs.

The PyPy compiler generates C code; however, it also adds a significant amount of infrastructure: the support for OOP, exceptions, garbage collection, and functions from the standard Python library.

In this benchmark, we have a test program whose data structures consist of a single dynamically allocated linked list. Items of the list are "native" objects of the particular environment and contain only one integer as a payload; see figure 9.2.

We let the program allocate a single list of a given length and observe the memory consumption. The lengths of the lists are 100, 1000, 10000 and 100000. The memory consumed by the shortest list is negligible, so in this case the overall memory consumption can be attributed to the mandatory data structures of the particular programming environment.

After the program allocates the list, it iterates through the list in an infinite loop. This loop keeps the list reachable, which prevents the GC from releasing it; moreover, the resulting steady state allows us to reliably measure the amount of consumed memory.
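
The structure of such a test program can be sketched in Python as follows. This is a minimal sketch, not the original benchmark source: only the `Item` class follows figure 9.2, while the helper names `build_list`, `traverse_once`, and `run_forever` are hypothetical and chosen for illustration.

```python
class Item:
    """List node with a single integer payload, as in figure 9.2."""

    def __init__(self, n, next):
        self.n = n
        self.next = next


def build_list(length):
    """Allocate a singly linked list with `length` items (hypothetical helper)."""
    head = None
    for i in range(length):
        head = Item(i, head)  # prepend, so the last allocated item is the head
    return head


def traverse_once(head):
    """Walk the list once and return the number of items visited."""
    count = 0
    item = head
    while item is not None:
        count += 1
        item = item.next
    return count


def run_forever(head):
    """Keep the list reachable and the process in a steady state.

    Never returns; the process is meant to be measured from outside
    and then terminated.
    """
    while True:
        traverse_once(head)
```

Under these assumptions, `run_forever(build_list(100000))` would correspond to the largest configuration in the benchmark.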

The amount of memory actually consumed (VmRSS) for the various numbers of allocated objects and the various execution environments can be seen in table 9.1 and in figures 9.3 and 9.4.
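
The text does not state how the VmRSS values were collected. On Linux, one common way is to read the `VmRSS` line from `/proc/<pid>/status` while the measured process is in its steady state. The following is a minimal sketch under that assumption; the function names `parse_vmrss_kib` and `read_vmrss_kib` are hypothetical.

```python
def parse_vmrss_kib(status_text):
    """Return the VmRSS value in KiB from the text of /proc/<pid>/status.

    Returns None if the text contains no VmRSS line.
    """
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line has the form "VmRSS:      1892 kB".
            return int(line.split()[1])
    return None


def read_vmrss_kib(pid):
    """Read the current VmRSS of process `pid` on Linux (hypothetical helper)."""
    with open("/proc/%d/status" % pid) as status_file:
        return parse_vmrss_kib(status_file.read())
```

Parsing is kept separate from file access so that the parser can be checked against a captured status text without a running process.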

    # Python
    class Item:
        def __init__(self, n, next):
            self.n = n
            self.next = next

    // C
    struct Item {
        int n;
        struct Item *next;
    };

    struct Item *Item_init(int n, struct Item *next) {
        struct Item *result = XMALLOC(sizeof(struct Item));
        result->n = n;
        result->next = next;
        return result;
    }

    // C++
    class Item {
    public:
        int n;
        Item *next;
        Item(int n, Item *next);
    };

    Item::Item(int n, Item *next) {
        this->n = n;
        this->next = next;
    }

[Figure: VmRSS in MiB (0 to 3) versus the number of allocated objects (100, 1000, 10000, 100000). Series: PyPy-C, BoehmGC; PyPy-C, refGC, uClibc; C; C, uClibc; C, BoehmGC.]

Figure 9.3: Memory Consumption

Analysis of the Results

First, see the graph in figure 9.3. The main variant for our purposes is PyPy-C-BoehmGC. For the lists of lengths 100 and 1000, the memory consumption is almost the same as for C-BoehmGC, so we can say that the PyPy-C-BoehmGC run-time environment is quite compact. However, for the lengths of 10000 and 100000, it is obvious that the allocation of a single object is more expensive in PyPy-C-BoehmGC.
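
The per-object cost can be roughly estimated from table 9.1 by dividing the growth in VmRSS between the shortest and the longest list by the number of additional items. The sketch below does this for three variants; the function name `bytes_per_item` is hypothetical. The estimate is coarse, because VmRSS is counted in whole pages and allocators typically grow the heap in chunks.

```python
# VmRSS in KiB from table 9.1, for list lengths 100 and 100000.
RSS_KIB = {
    "PyPy-C, BoehmGC": (663, 3043),
    "C, BoehmGC": (598, 2208),
    "C": (279, 1892),
}


def bytes_per_item(rss_short_kib, rss_long_kib, short_len=100, long_len=100000):
    """Estimate the average memory cost of one list item in bytes."""
    return (rss_long_kib - rss_short_kib) * 1024.0 / (long_len - short_len)


for variant, (rss_100, rss_100000) in sorted(RSS_KIB.items()):
    estimate = bytes_per_item(rss_100, rss_100000)
    print("%-16s %.1f bytes per item" % (variant, estimate))
```

With the figures from the table, this estimate comes out at roughly 24 bytes per item for PyPy-C with Boehm GC and roughly 16.5 bytes per item for both C variants.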

The variant denoted as C, which manages memory manually, consumes less memory than the equivalent program that uses a garbage collector, C-BoehmGC. There is a static overhead caused by the Boehm GC library code; on the other hand, we cannot say with certainty whether the allocation of a single object is more expensive when Boehm GC is used.

Apart from libgc, which implements Boehm GC, there is another library with significant memory requirements: the standard C library implementation, the GNU C Library (glibc). The variant denoted as C-uClibc, which uses a more compact implementation of the standard library, consumes less memory. The saving is only in terms of code size; the cost of allocating an object on the heap is the same as with glibc.

[Figure: VmRSS in MiB (0 to 25) versus the number of allocated objects (100, 1000, 10000, 100000). Series: PyPy-C, BoehmGC; PyPy-JVM; C; CPython; CPP.]

Figure 9.4: Memory Consumption (with Java and CPython)

The pitfall is that Boehm GC is not compatible with this library, so we have to use the reference-counting GC that can be generated by PyPy itself. It is sufficient for our single-threaded test program. The memory consumption of the PyPy-C-refGC-uClibc variant is almost identical to the memory requirements of C-uClibc. It seems that the overhead of the object-oriented support is negligible, which is not surprising as we do not really employ OOP features in this program. In contrast, there was a measurable difference between C-BoehmGC and PyPy-C-BoehmGC, which means that Boehm GC is less efficient for C code generated by PyPy.

Now see the graph in figure 9.4. We also have a C++ implementation of the test. It shows only that the standard C++ library is slightly bigger than the standard C library.

More interesting is the PyPy-JVM variant. It is run by the standard Sun/Oracle Java 6. We can see that the static overhead is enormous, as the Java run-time environment is powerful and, apart from a huge standard library, also contains an interpreter and a JIT compiler. On the other hand, the memory consumed by the list itself is comparable to that of the C environment.


| Program | Consumed Memory [MiB] |
|---|---|
| PyPy-C, BoehmGC | 22.2 |
| PyPy-C, refGC | 16.0 |
| C | 15.6 |
| C, BoehmGC | 31.0 |

Table 9.2: VmRSS memory consumed by repeated allocation cycles, in MiB.

The last variant is the RPython source run by the standard Python interpreter. The interpreter itself is much more compact than the JVM, but it also consumes much more memory than compiled C. Allocation of objects is extremely expensive. This is due to the fact that every object instance contains a hash table in order to provide fully dynamic behavior.
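
The per-instance hash table mentioned above can be observed directly in CPython, where the attributes of an ordinary class instance are exposed through its `__dict__`. The following is a small illustrative sketch using the `Item` class from figure 9.2; the attribute name `extra` is made up for the example.

```python
class Item:
    def __init__(self, n, next):
        self.n = n
        self.next = next


item = Item(1, None)

# The attributes of an ordinary instance are exposed as a per-instance
# dictionary, which CPython implements as a hash table.
print(item.__dict__)

# Because the storage is a dictionary, attributes can be added at run time.
item.extra = "added later"
print(sorted(item.__dict__))
```

This flexibility is what a C struct with a fixed layout does not offer, and it is paid for in memory on every allocated object.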