
2.2 Just-in-Time Compilation

2.2.2 Classic JIT Optimizations and Techniques

A number of JIT optimizations have been developed over the years to tackle the execution overheads of dynamic languages. Below is a list of some of the best known ones:

DIR Dispatch Overhead Removal – This is the first and easiest overhead to tackle. This is also the only optimization that AOT compilers for dynamic languages can implement. This optimization can be implemented by inlining the semantics of the DIR instructions.
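As a rough illustration of inlining instruction semantics, the sketch below contrasts an interpreter loop with a "compiled" function in which the dispatch has been removed. The three-opcode stack bytecode (LOAD, ADD, RET) and all function names are hypothetical and chosen only for this example; generating Python source with exec stands in for emitting machine code.

```python
# Hypothetical three-opcode stack bytecode; all names are illustrative.
LOAD, ADD, RET = "LOAD", "ADD", "RET"

# DIR for the expression: return a + b
program = [(LOAD, "a"), (LOAD, "b"), (ADD, None), (RET, None)]


def interpret(program, env):
    """Interpreter loop: every instruction pays for fetch, decode and dispatch."""
    stack = []
    for opcode, arg in program:
        if opcode == LOAD:
            stack.append(env[arg])
        elif opcode == ADD:
            rhs = stack.pop()
            lhs = stack.pop()
            stack.append(lhs + rhs)
        elif opcode == RET:
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode {opcode!r}")


def compile_without_dispatch(program):
    """Emit straight-line code with the semantics of each opcode inlined."""
    lines = ["def compiled(env):", "    stack = []"]
    for opcode, arg in program:
        if opcode == LOAD:
            lines.append(f"    stack.append(env[{arg!r}])")
        elif opcode == ADD:
            lines.append("    rhs = stack.pop()")
            lines.append("    lhs = stack.pop()")
            lines.append("    stack.append(lhs + rhs)")
        elif opcode == RET:
            lines.append("    return stack.pop()")
        else:
            raise ValueError(f"unknown opcode {opcode!r}")
    namespace = {}
    exec("\n".join(lines), namespace)  # stands in for machine-code generation
    return namespace["compiled"]


compiled = compile_without_dispatch(program)
```

The compiled function no longer inspects opcodes at run time, but `lhs + rhs` still performs a dynamic type dispatch, which is the overhead the following techniques target.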

    w1_type = w1.type
    w2_type = w2.type
    if w1_type == IntObj:
        guard(w2_type == IntObj)
        i1 = w1.intval
        i2 = w2.intval
        res = i1 > i2
    else:
        if w1_type == UserInstanceObj1:
            res = call(UserInstanceObj1_gt_fun, w1, w2)
        elif w1_type == UserInstanceObj2:
            res = call(UserInstanceObj2_gt_fun, w1, w2)
        else:
            gt_fun = call(get_fun, w1, "__gt__")
            res = call(gt_fun, w1, w2)
    if res == True:
        retval = w1
    else:
        retval = w2

Figure 2.8: JIT Pseudo-IR with a Polymorphic Inline Cache (PIC) – Corresponds to the pseudo-IR from Figure 2.4, but after two additional user types have been encountered and optimized using a PIC.

Type Feedback – Even though dynamic languages allow different types at each DIR instruction, in practice only a few types are observed at any given instruction. Optimizing away the type dispatch overhead builds on this observation. The first step is for the interpreter (or sometimes the first-tier DER, as in JavaScriptCore [jav]) to collect type information and feed it to the optimizing compiler. The optimizing compiler then uses this information to generate DER specialized for the observed types. It also inserts guards that check whether the types of the variables match the types the DER was optimized for. This technique was first used in Self [HU94a].
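The collect-then-specialize cycle described above can be sketched as follows. This is a minimal model under stated assumptions, not how any particular VM is implemented: the AddSite class, the GuardFailed exception and the HOT_THRESHOLD value are all hypothetical, and a Python closure stands in for the specialized DER.

```python
from collections import Counter


class GuardFailed(Exception):
    """Raised when specialized code sees a type it was not compiled for."""


class AddSite:
    """One `a + b` site with a generic path, a type profile and a fast path."""

    HOT_THRESHOLD = 3  # arbitrary value chosen for this sketch

    def __init__(self):
        self.profile = Counter()  # type feedback collected on the generic path
        self.specialized = None   # stands in for the optimized DER

    def generic_add(self, a, b):
        # The "interpreter" records which operand types it observes.
        self.profile[(type(a), type(b))] += 1
        return a + b

    def compile_specialized(self):
        # Specialize for the most frequently observed pair of operand types.
        (type_a, type_b), _count = self.profile.most_common(1)[0]

        def specialized(a, b):
            # Guards: the fast path is only valid for the profiled types.
            if type(a) is not type_a or type(b) is not type_b:
                raise GuardFailed
            return a + b  # a real JIT would emit a type-specific add here

        self.specialized = specialized

    def execute(self, a, b):
        if self.specialized is not None:
            try:
                return self.specialized(a, b)
            except GuardFailed:
                pass  # fall back to the generic path below
        result = self.generic_add(a, b)
        hot = sum(self.profile.values()) >= self.HOT_THRESHOLD
        if self.specialized is None and hot:
            self.compile_specialized()
        return result


site = AddSite()
for i in range(3):
    site.execute(i, 1)          # warm-up on the generic path
fast = site.execute(10, 20)     # runs the specialized path
slow = site.execute("a", "b")   # guard fails, generic path is used
```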

Inline Caching – One of the optimizations that the JIT compiler can perform using type feedback is inline caching. For example, in a dynamic-language method call such as foo.bar(), the type of the receiver object (foo) often cannot be deduced statically. In practice, however, very few types tend to be observed at a given call site. Normally, a type dispatch needs to be performed to find the concrete method that corresponds to the type of the receiver object. This optimization inlines the concrete method into the DER, along with a type check to ensure that the receiver object is indeed of the expected type. Figure 2.4 shows an example of what inline caching might look like in the DER. In this example, the types of the arguments to the max function are observed to be IntObj. Using this observation, the semantics of the greater-than function (__gt__) for integer types have been inlined in the trace. Type checking is done using guards, which allow falling back to the interpreter in case different types are observed when executing the DER. Note that even though type feedback is often used to determine which types to inline cache, inline caching can also be used without type feedback, relying instead on type heuristics (based on common paradigms in the language). This technique was first used in the Smalltalk system [DS84].
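A monomorphic inline cache for a call site such as foo.bar() might be modeled as below. The CallSite class and the Dog and Cat example types are hypothetical. The sketch simplifies in two ways: it caches the looked-up method instead of inlining its body, and on a miss it simply re-caches rather than deoptimizing. It also assumes plain instance methods (no per-instance overrides, classmethods or staticmethods).

```python
class CallSite:
    """Monomorphic inline cache for one `receiver.<method_name>()` call site."""

    def __init__(self, method_name):
        self.method_name = method_name
        self.cached_type = None
        self.cached_method = None
        self.hits = 0
        self.misses = 0

    def call(self, receiver, *args):
        receiver_type = type(receiver)
        if receiver_type is self.cached_type:
            # Guard passed: skip the lookup and call the cached method.
            self.hits += 1
            return self.cached_method(receiver, *args)
        # Guard failed (or first execution): do the full type dispatch
        # and overwrite the single cache entry.
        self.misses += 1
        method = getattr(receiver_type, self.method_name)
        self.cached_type = receiver_type
        self.cached_method = method
        return method(receiver, *args)


class Dog:
    def speak(self):
        return "woof"


class Cat:
    def speak(self):
        return "meow"


site = CallSite("speak")
first = site.call(Dog())   # miss: the cache is filled with Dog.speak
second = site.call(Dog())  # hit: the guard on the receiver type passes
third = site.call(Cat())   # miss: the cache is overwritten with Cat.speak
```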

Polymorphic Inline Caches (PIC) – Polymorphic inline caches are an extension of inline caching. While the vast majority of call sites do have a single receiver object type (monomorphic), deoptimizing to the interpreter every time the receiver type differs from the cached one can be expensive. This approach grows the inline cache to cover all the common receiver object types observed at each call site. Figure 2.8 shows an example PIC in the DER that corresponds to the earlier DER in Figure 2.4, after additional user-defined types have been encountered and the DER has been recompiled. This particular example uses a polymorphic inline cache for the builtin IntObj type as well as for two different user-defined types. If the argument is of a fourth type, this code falls back to looking up the __gt__ function for that type at run time, which can be expensive. This technique was first implemented in Self [HCU91].
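The growth of a PIC over time can be sketched in the same style. The PolymorphicCallSite class, the MAX_ENTRIES limit and the Meters and Seconds types are assumptions made for illustration. The linear scan over cache entries plays the role of the if/elif chain in Figure 2.8, and the final lookup corresponds to its generic get_fun fallback.

```python
class PolymorphicCallSite:
    """Polymorphic inline cache: one entry per receiver type seen so far."""

    MAX_ENTRIES = 4  # arbitrary limit chosen for this sketch

    def __init__(self, method_name):
        self.method_name = method_name
        self.entries = []       # (type, method) pairs, checked in order
        self.slow_lookups = 0

    def call(self, receiver, *args):
        receiver_type = type(receiver)
        for cached_type, cached_method in self.entries:
            if receiver_type is cached_type:
                return cached_method(receiver, *args)
        # No entry matched: perform the expensive generic lookup, then
        # extend the cache so the next call with this type is fast.
        self.slow_lookups += 1
        method = getattr(receiver_type, self.method_name)
        if len(self.entries) < self.MAX_ENTRIES:
            self.entries.append((receiver_type, method))
        return method(receiver, *args)


class Meters:
    def __init__(self, value):
        self.value = value

    def __gt__(self, other):
        return self.value > other.value


class Seconds:
    def __init__(self, value):
        self.value = value

    def __gt__(self, other):
        return self.value > other.value


site = PolymorphicCallSite("__gt__")
results = [
    site.call(3, 2),                    # builtin int
    site.call(Meters(1), Meters(2)),    # first user-defined type
    site.call(Seconds(5), Seconds(1)),  # second user-defined type
]
lookups_after_warmup = site.slow_lookups
site.call(7, 9)
site.call(Meters(4), Meters(3))
```

After the three warm-up calls, further calls with any of the cached types are served from the cache without another lookup.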

On-Stack Replacement (OSR) – One issue with having multiple representations of the program (the DIR and the DERs produced through multiple levels of compilation) is that switching from one representation to another is difficult, because each representation uses a different stack layout. This is especially a problem if the current DER receives a different type than what it was specialized for and needs to deoptimize. It cannot simply return from the currently executing DER, because there is intermediate state in the current stack frame that needs to be communicated to the interpreter so that it can continue execution from that point. Similarly, JIT compilation might need to take place while the interpreter is in the middle of executing a method, which is especially important for loops. OSR retains enough meta-information about the stack layout of each representation to reconstruct the stack layout of the target representation. This technique, too, was first demonstrated in Self [HU94b].
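The deoptimization direction of OSR can be sketched as follows. This is a toy model under stated assumptions: the function names, the Deoptimize exception and the dictionary-based interpreter frame are hypothetical. The "compiled" loop keeps its state in local variables, and on a guard failure it materializes that state in the layout the "interpreter" expects, so that execution resumes mid-loop instead of restarting. The opposite direction (entering compiled code from a running interpreter loop) is not shown.

```python
class Deoptimize(Exception):
    """Carries the reconstructed interpreter frame out of optimized code."""

    def __init__(self, frame):
        super().__init__("guard failed")
        self.frame = frame


def interpret_sum(items, frame=None):
    """Generic version; starts fresh or resumes from a reconstructed frame."""
    if frame is None:
        frame = {"i": 0, "total": 0}
    while frame["i"] < len(items):
        frame["total"] = frame["total"] + items[frame["i"]]
        frame["i"] += 1
    return frame["total"]


def optimized_sum(items):
    """Version specialized for ints; its state lives in local variables."""
    i = 0
    total = 0
    while i < len(items):
        item = items[i]
        if type(item) is not int:
            # Guard failed: rebuild the interpreter's frame layout from the
            # optimized state so the interpreter can continue from here.
            raise Deoptimize({"i": i, "total": total})
        total += item
        i += 1
    return total


def run_sum(items):
    try:
        return optimized_sum(items)
    except Deoptimize as deopt:
        return interpret_sum(items, deopt.frame)


all_ints = run_sum([1, 2, 3])
mixed = run_sum([1, 2, 0.5, 4])  # deoptimizes at index 2, then resumes
```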

    def f(n, x):
        if n == 0:
            return 1
        elif n % 2 == 1:
            return x * f(n - 1, x)
        else:
            return f(n // 2, x) ** 2

    def f_13(x):
        return x * ((x * (x ** 2)) ** 2) ** 2

Figure 2.9: Partial Evaluation Example – The partial evaluation of f with n == 13 yields f_13, and it has identical semantics to f as long as n == 13. Example adapted from [JG02].

Semantic Overhead Optimizations – In addition to type specialization techniques, there are also a number of techniques that target the semantic overheads of dynamic languages. One example is the infinite-bitwidth integer type used in Python. Even though integers semantically need to behave as if they had infinitely many bits, in practice most values fit in machine-bitwidth-sized integers. To handle this efficiently, the VM uses a machine-bitwidth-sized integer to represent the dynamic-language integer, but checks for overflow on any operation that might cause one. In case of an overflow, it changes the representation of the integer to use multiple words. Similarly, in Python, lists can contain any data type. In practice, however, it is quite common for programmers to store objects of the same type, such as a list of integers. The VM can then opportunistically use an integer array to hold the data. Besides saving memory, this also reduces the number of type checks for reads. For correctness, the VM checks insertions into the list and changes the list to a generic object array if an object of a different type is inserted. Note that these optimizations, unlike the optimizations targeting DIR dispatch and type dispatch, can be implemented for interpreter-only VMs.
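The list specialization described in the Semantic Overhead Optimizations paragraph can be sketched with the standard array module. The StrategyList class is a hypothetical illustration, not the data structure of any specific VM. It stores values in a signed 64-bit array while every element is a machine-sized integer, and switches to a generic object list when a value of another type, or an integer that overflows 64 bits, is appended.

```python
from array import array


class StrategyList:
    """Stores machine integers compactly until a different value arrives."""

    def __init__(self):
        self._ints = array("q")  # signed 64-bit integer storage
        self._objects = None     # generic storage, used after generalizing

    @property
    def is_specialized(self):
        return self._objects is None

    def append(self, value):
        if self.is_specialized:
            if type(value) is int:
                try:
                    self._ints.append(value)
                    return
                except OverflowError:
                    pass  # value does not fit in 64 bits
            self._generalize()
        self._objects.append(value)

    def _generalize(self):
        # Switch representation: copy the integers into a generic list.
        self._objects = list(self._ints)
        self._ints = None

    def __getitem__(self, index):
        # Reads from the specialized storage need no per-element type check.
        storage = self._ints if self.is_specialized else self._objects
        return storage[index]

    def __len__(self):
        storage = self._ints if self.is_specialized else self._objects
        return len(storage)


numbers = StrategyList()
numbers.append(1)
numbers.append(2)
still_compact = numbers.is_specialized
numbers.append("three")  # a different type forces the generic layout

big = StrategyList()
big.append(2 ** 70)      # too large for 64 bits, so also generic
```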

In document Co-Optimizing Hardware Design and Meta-Tracing Just-in-Time Compilation (Page 34-37)