
CHAPTER 2 LITERATURE REVIEW

2.2 Process Virtual Machines

2.2.4 Runtime Code Compilation

The previous sections dealt with process VMs and their building blocks, especially bytecode interpreters. For a given scenario, the VM translates high-level expressions to bytecode, which is then sent to an interpreter and executed. This means that, for each expression, an expression tree is built and translated to bytecode at runtime and only then executed. Though this approach is very flexible compared to static program compilation, it is expected to have some startup overhead. In many scenarios, such as implementing filtering in a web server using traditional bytecode, this overhead is undesirable. Coupled with the time taken to interpret and execute the bytecode, it has an adverse impact on performance. A way to optimize this execution model and reduce the overhead is to use dynamic compilation techniques such as Just-in-Time (JIT) compilation: instead of executing the bytecode directly in the interpreter, it is first converted to native code for the given architecture and then executed. Though some startup time is still present with JIT compilation, the speedup obtained from native code in most cases outweighs this drawback by a large factor.
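The interpretation path described above can be illustrated with a minimal sketch. The nested-tuple expression tree, the opcode names (push, load, add, mul, gt) and the example filter expression are assumptions made for illustration and do not correspond to any particular VM:

```python
# Hypothetical stack bytecode, invented for this sketch:
#   ("push", constant), ("load", variable_name), ("add",), ("mul",), ("gt",)

def to_bytecode(tree):
    """Translate a nested-tuple expression tree to bytecode (post-order walk)."""
    if isinstance(tree, (int, float)):
        return [("push", tree)]
    if isinstance(tree, str):
        return [("load", tree)]
    op, left, right = tree
    return to_bytecode(left) + to_bytecode(right) + [(op,)]


def interpret(code, env):
    """Decode and execute one instruction at a time on an operand stack."""
    stack = []
    for instr in code:
        op = instr[0]
        if op == "push":
            stack.append(instr[1])
        elif op == "load":
            stack.append(env[instr[1]])
        elif op in ("add", "mul", "gt"):
            right = stack.pop()
            left = stack.pop()
            if op == "add":
                stack.append(left + right)
            elif op == "mul":
                stack.append(left * right)
            else:
                stack.append(left > right)
        else:
            raise ValueError(f"unknown opcode: {op}")
    return stack.pop()


# Filter expression "pkt_len + 4 > 64", built and translated at runtime.
tree = ("gt", ("add", "pkt_len", 4), 64)
code = to_bytecode(tree)
result = interpret(code, {"pkt_len": 100})
```

Every call to interpret repeats the opcode decoding for each instruction, which is the per-execution cost that JIT compilation aims to remove.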

JIT Compilation Overview

JIT compilation has been used in various scenarios and can take multiple forms, from dynamic translation (architecture to architecture, as in ISA simulators) to an intermediate interpreter/runtime compiler implementation. In this section, JIT is discussed in the context of its use as an intermediate compiler invoked at runtime and running in parallel to, or replacing, the interpretation phase, just as in the case of BPF. In [27], an early implementation of JIT in JDK 1.1 is discussed in detail. There are two possible execution methods after generating bytecode using javac: convert it to native code using JIT and then execute it on the hardware, or use the Java interpreter and execute it on the JVM directly. In a traditional non-JIT scenario, at runtime, the JVM loads the class files, determines the semantics of each individual bytecode, and performs the appropriate computation. Adding the translation to native code improves the speed to a great extent. Newer Java implementations use HotSpot technology, where the code is interpreted at first but, as the VM detects that certain routines or components are heavily used, these are dynamically compiled and executed. This dynamic compilation approach has a few major benefits and drawbacks compared to plain static compilation, which are still relevant in current JIT implementations.

Advantages:

1. One of the major advantages of dynamic compilation is that the generated code can be better optimized than statically compiled code [28]. Modern JIT compilers make use of performance counters to collect information about how the program behaves at runtime, so that the dynamically compiled code can be optimized for that behavior.

2. Better optimized code can be achieved through method inlining, control-flow optimization, dead-code removal, loop combining, loop unrolling, or architecture-specific optimization during native code generation, by choosing an optimization strategy suited to the code profile.
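As an illustration of one optimization from this list, the sketch below removes dead code from a toy bytecode sequence. It is a simplified, purely static stand-in: it only drops instructions that cannot be reached from the entry point, whereas a profiling JIT could also use runtime information. The opcodes (jmp, jz, ret) and the assumption that all jump targets are valid instruction indices are hypothetical:

```python
def remove_dead_code(code):
    """Drop unreachable instructions and retarget the remaining jumps.

    Assumes ("jmp", target) is unconditional, ("jz", target) may fall through,
    ("ret",) ends execution, and every jump target is a valid index.
    """
    reachable = set()
    worklist = [0]
    while worklist:
        pc = worklist.pop()
        if pc in reachable or pc >= len(code):
            continue
        reachable.add(pc)
        op = code[pc][0]
        if op == "jmp":
            worklist.append(code[pc][1])
        elif op == "jz":
            worklist.append(code[pc][1])
            worklist.append(pc + 1)
        elif op != "ret":
            worklist.append(pc + 1)

    # Map each surviving instruction's old index to its new index.
    kept = sorted(reachable)
    new_index = {old: new for new, old in enumerate(kept)}

    optimized = []
    for old in kept:
        instr = code[old]
        if instr[0] in ("jmp", "jz"):
            instr = (instr[0], new_index[instr[1]])
        optimized.append(instr)
    return optimized


program = [
    ("push", 1),   # 0
    ("jmp", 4),    # 1
    ("push", 99),  # 2: never executed
    ("add",),      # 3: never executed
    ("ret",),      # 4
]
optimized = remove_dead_code(program)
```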

Drawbacks:

1. In dynamic compilation, there is always an initial warm-up time, which induces a certain overhead. This may be small or large depending on the optimizations applied and on how dynamic compilation is implemented.

2. Some optimizations can be non-deterministic in nature and are not suitable for all applications. For example, in real-time and interactive (GUI) applications, they may cause unpredictable behavior [28] if not implemented properly.

To summarize, plain interpretation of bytecode is slower than executing bytecode compiled to native code, while static compilation offers fewer opportunities for optimization than JIT compilation. For the limited JIT requirements in our research scenario, a significant speed improvement at a possibly very low initial startup cost can be expected. The subsections below discuss typical JIT compiler elements and subsequently some simple and moderately complex implementations of JIT for our needs.

JIT Compiler Design

A JIT compiler has to be closely coupled to the process VM or the simple bytecode generator in the machine. A modern JIT compiler design may be divided into four major constituent elements:

VM-JIT Compiler Interface: Assuming that the target section of code for compilation has been chosen (based on certain statistics of function use), there has to be an interface through which the interpreter informs the compiler to begin compilation, specifying arguments such as the function name, the bytecode, etc. Such an interface is responsible for bootstrapping and invoking the JIT compiler. This interface, along with the actual compiler code, can be considered the compiler frontend. The compiler library can be dynamically loaded, or be made part of the VM itself, and provide methods such as jitCompileInit(). There also has to be a provision for the compiler to access the required data structures and code in the VM.
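A minimal sketch of such a hand-off is given below, assuming a per-function call counter as the usage statistic. The names VMFunction, jit_compile and HOT_THRESHOLD, the threshold value and the toy bytecode are all hypothetical, and Python source compiled with eval() merely stands in for native code:

```python
HOT_THRESHOLD = 3  # assumed value; real VMs tune this from profiling data


def interpret(code, x):
    """Interpreter for a toy bytecode of (opcode, constant) pairs applied to x."""
    for op, k in code:
        if op == "add":
            x = x + k
        elif op == "mul":
            x = x * k
        else:
            raise ValueError(f"unknown opcode: {op}")
    return x


def jit_compile(name, code):
    """Hypothetical compiler entry point: receives the function name and bytecode.

    Python source compiled with eval() stands in for native code here.
    """
    symbols = {"add": "+", "mul": "*"}
    expr = "x"
    for op, k in code:
        expr = f"({expr} {symbols[op]} {k!r})"
    return eval(f"lambda x: {expr}")


class VMFunction:
    """A VM-side function record with a call counter and a compiled-code slot."""

    def __init__(self, name, code):
        self.name = name
        self.code = code
        self.calls = 0
        self.compiled = None

    def __call__(self, x):
        if self.compiled is None:
            self.calls += 1
            if self.calls >= HOT_THRESHOLD:
                # The function is now considered hot: hand it to the compiler.
                self.compiled = jit_compile(self.name, self.code)
        if self.compiled is not None:
            return self.compiled(x)
        return interpret(self.code, x)


f = VMFunction("scale", [("add", 2), ("mul", 3)])
first_two = [f(1), f(1)]                # both calls are interpreted
still_interpreted = f.compiled is None
later = [f(1), f(1)]                    # the third call triggers compilation
```

The caller never needs to know which path was taken, which is the property the interface is meant to provide.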

Compilation Logic: This is the actual code of the compiler that performs the compilation. Usually, the main argument it needs is the bytecode data structure to be translated, and it returns the executable native code. For example, GNU LibJIT has a function called jit_function_compile, which takes the jit_function_t structure as its argument [29]. This structure contains the IR (Intermediate Representation) of the function before it is compiled. Upon compilation, the machine code (for a specific architecture) is generated and stored in the same structure. In most simple JIT compilers, code generation is a direct switch statement that converts individual IR statements to native code. The compiler can be more sophisticated and attempt to optimize the generated code, which distinguishes optimizing from non-optimizing compilers. The optimizing type can selectively compile only commonly used methods, detect dead code, etc. The non-optimizing type usually compiles all the methods, which may be expensive in both memory and time. However, it is simple to implement, and useful for small VM designs (such as filtering machines).
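The switch-style code generation used by simple, non-optimizing compilers can be sketched as follows. This is not LibJIT code: the opcode set is hypothetical, and the "target code" emitted for each instruction is Python source rather than machine code, so that the example stays self-contained and portable:

```python
BINARY_OPS = {"add": "+", "mul": "*", "gt": ">"}  # hypothetical opcode set


def jit_compile(code, name="compiled"):
    """Emit one fixed snippet of target code per instruction, then compile it.

    Python source stands in for native code; a real JIT would emit machine
    instructions for the host architecture at this point.
    """
    lines = [f"def {name}(env):", "    stack = []"]
    for instr in code:
        op = instr[0]
        # The "switch": each opcode maps directly to a target-code template.
        if op == "push":
            lines.append(f"    stack.append({instr[1]!r})")
        elif op == "load":
            lines.append(f"    stack.append(env[{instr[1]!r}])")
        elif op in BINARY_OPS:
            lines.append("    right = stack.pop()")
            lines.append("    left = stack.pop()")
            lines.append(f"    stack.append(left {BINARY_OPS[op]} right)")
        else:
            raise ValueError(f"unknown opcode: {op}")
    lines.append("    return stack.pop()")

    namespace = {}
    exec(compile("\n".join(lines), "<jit>", "exec"), namespace)
    return namespace[name]


# Bytecode for the filter expression "pkt_len + 4 > 64".
bytecode = [("load", "pkt_len"), ("push", 4), ("add",), ("push", 64), ("gt",)]
accept = jit_compile(bytecode)
```

Opcode decoding happens once, at compile time; calling accept afterwards runs only the emitted code.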

Code & Execution Management: Upon generation of the code, the compiler provides certain functions so that the compiled code can be invoked and executed in the VM’s context seamlessly, just as it would have been interpreted. The invocation is fairly straightforward: upon reaching the desired target function, we can pause the interpretation phase, compile the function and then execute it in the same context, for example with a function such as jit_function_apply() in LibJIT [29]. Upon return, we resume from where the pause happened. Before execution, the VM has to figure out how execution control is to be switched. This can vary between approaches. For example, in simple compilers, all the bytecode is sometimes pre-compiled in one go at the start, and then executed directly by the VM.
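The pause, compile, execute and resume sequence can be sketched as below. The opcodes, the helper names (compile_leaf, run) and the choice to share the operand stack as the common context are assumptions for illustration. As before, Python source stands in for native code and no LibJIT call is used:

```python
def compile_leaf(code):
    """Stand-in for native code generation for a function without calls.

    The emitted code works directly on the VM's operand stack, so it runs
    in the same context as the interpreter.
    """
    lines = ["def native(stack):", "    pass"]
    for instr in code:
        op = instr[0]
        if op == "push":
            lines.append(f"    stack.append({instr[1]!r})")
        elif op in ("add", "mul"):
            symbol = "+" if op == "add" else "*"
            lines.append("    right = stack.pop()")
            lines.append("    left = stack.pop()")
            lines.append(f"    stack.append(left {symbol} right)")
        else:
            raise ValueError(f"unknown opcode: {op}")
    namespace = {}
    exec("\n".join(lines), namespace)
    return namespace["native"]


def run(main, functions):
    """Interpret `main`; compile each called function on first use."""
    stack, compiled, pc = [], {}, 0
    while pc < len(main):
        instr = main[pc]
        op = instr[0]
        if op == "push":
            stack.append(instr[1])
        elif op == "call":
            target = instr[1]
            if target not in compiled:
                # Interpretation is paused here while the target is compiled.
                compiled[target] = compile_leaf(functions[target])
            compiled[target](stack)  # execute on the shared operand stack
        else:
            raise ValueError(f"unknown opcode: {op}")
        pc += 1  # resume interpretation right after the call site
    return stack.pop(), compiled


functions = {"double_plus_one": [("push", 2), ("mul",), ("push", 1), ("add",)]}
main = [("push", 5), ("call", "double_plus_one"), ("call", "double_plus_one")]
value, compiled = run(main, functions)
```

The second call reuses the code compiled for the first, so compilation happens only once per function.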

Memory Management: This is another major aspect of a JIT compiler. Memory is needed during both the compilation and the execution phases. For example, the compiled code has to be placed at a specific memory location and freed once it is no longer used. Managing stack offsets is yet another task during execution. Apart from that, in an implementation that repeatedly switches between the VM’s interpreter and the compiled code, the interpreter’s PC has to be saved and the native code’s jump tables have to be laid out. There is also global data, such as data dynamically allocated during code execution, which possibly has to go in the VM memory. Most JIT compilers need a well-defined strategy for managing memory.
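One possible strategy for the allocation and freeing of compiled code is a bounded code cache. The sketch below is an assumption made for illustration, not a description of any particular JIT: the class name CodeCache, the byte budget and the least-recently-used eviction policy are all hypothetical, and plain byte strings stand in for generated machine code:

```python
from collections import OrderedDict


class CodeCache:
    """Holds compiled code blobs within a fixed byte budget (LRU eviction)."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.entries = OrderedDict()  # function name -> code bytes, oldest first

    def install(self, name, code):
        """Store compiled code, freeing older entries if space is needed."""
        if len(code) > self.capacity:
            raise MemoryError("compiled code larger than the whole cache")
        if name in self.entries:
            self.free(name)
        while self.used + len(code) > self.capacity:
            oldest = next(iter(self.entries))
            self.free(oldest)
        self.entries[name] = code
        self.used += len(code)

    def lookup(self, name):
        """Return the code for `name`, or None if it must be (re)compiled."""
        code = self.entries.get(name)
        if code is not None:
            self.entries.move_to_end(name)  # mark as recently used
        return code

    def free(self, name):
        """Release the space held by one compiled function."""
        self.used -= len(self.entries.pop(name))


cache = CodeCache(capacity_bytes=8)
cache.install("a", b"\x90" * 4)
cache.install("b", b"\x90" * 4)
cache.lookup("a")                # "a" becomes the most recently used entry
cache.install("c", b"\x90" * 4)  # evicts "b" to make room
```

A function evicted from the cache falls back to interpretation, or is recompiled, the next time it is called.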

In document Low-Impact System Performance Analysis Using Hardware Assisted Tracing Techniques (Page 41-44)