COMPILER INFRASTRUCTURE 43 - Compiling Scala for Performance

NEW REFERENCE(scala.Tuple2)
DUP
NEW REFERENCE(A)
DUP
CALL_METHOD A.<init>
NEW REFERENCE(B)
DUP
CALL_METHOD B.<init>
CALL_METHOD scala.Tuple2.<init>

Figure 3.4: ICode for instantiating a pair.

and its corresponding initialization call. In fact, the two instructions don’t need to be in the same basic block:

new Pair(if (n > 0) new A else new B, new B)

This is a global data flow analysis problem. We rely on a classical reaching definitions analysis to recover the def-use chain for uninitialized objects.
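The reaching-definitions idea can be sketched in Scala. This is a minimal sketch, not the compiler's implementation. It assumes a simplified instruction set (New, Dup, LoadLocal, CZJump and Init are hypothetical names) and gives each NEW a hypothetical site number. Each stack slot is treated as a variable whose definitions are the NEW sites that may have pushed it. At a merge point the stacks are joined slot by slot. As a result, the receiver of Pair.<init> in the last block can still be traced back to the NEW in the first block.

```scala
// Simplified ICode subset; `site` is a hypothetical unique id for each NEW instruction.
sealed trait Instr
case class New(site: Int, cls: String) extends Instr
case object Dup extends Instr
case class LoadLocal(name: String) extends Instr
case object CZJump extends Instr                       // pops the tested value
case class Init(cls: String, argc: Int) extends Instr  // CALL_METHOD cls.<init>

// Each stack slot holds the set of NEW sites that may have produced it (head = top).
type Stack = List[Set[Int]]

def transfer(s: Stack, i: Instr): Stack = i match {
  case New(site, _)  => Set(site) :: s
  case Dup           => s.head :: s
  case LoadLocal(_)  => Set.empty[Int] :: s
  case CZJump        => s.tail
  case Init(_, argc) => s.drop(argc + 1)               // consumes receiver and arguments
}

// Merge point: stack heights agree on verified bytecode, so join slot by slot.
def join(a: Stack, b: Stack): Stack = {
  require(a.length == b.length, "stack heights must agree at merge points")
  a.zip(b).map { case (x, y) => x ++ y }
}

// new Pair(if (n > 0) new A else new B, new B), split into basic blocks
val blocks: Map[Int, List[Instr]] = Map(
  1 -> List(New(1, "Pair"), Dup, LoadLocal("n"), CZJump),
  2 -> List(New(2, "A"), Dup, Init("A", 0)),
  3 -> List(New(3, "B"), Dup, Init("B", 0)),
  4 -> List(New(4, "B"), Dup, Init("B", 0), Init("Pair", 2)))
val succs: Map[Int, List[Int]] = Map(1 -> List(2, 3), 2 -> List(4), 3 -> List(4), 4 -> Nil)

// Forward worklist iteration: computes the entry stack of every reachable block.
def analyze(entry: Int): Map[Int, Stack] = {
  var states: Map[Int, Stack] = Map(entry -> Nil)
  var work = List(entry)
  while (work.nonEmpty) {
    val b = work.head
    work = work.tail
    val out = blocks(b).foldLeft(states(b))(transfer)
    for (s <- succs(b)) {
      val merged = states.get(s).map(join(_, out)).getOrElse(out)
      if (!states.get(s).contains(merged)) {
        states = states.updated(s, merged)
        work = work :+ s
      }
    }
  }
  states
}

// Def-use chain: for every <init> call, the NEW sites that may define its receiver.
def initReceivers(states: Map[Int, Stack]): List[(String, Set[Int])] =
  states.toList.sortBy(_._1).flatMap { case (b, entry) =>
    // pair each instruction with the stack state just before it
    blocks(b).scanLeft(entry)(transfer).zip(blocks(b)).collect {
      case (s, Init(cls, argc)) => (cls, s(argc))      // receiver sits below argc arguments
    }
  }

println(initReceivers(analyze(1)))
```

The call to Pair.<init> in block 4 is linked to site 1, which is the NEW in block 1. The argument slot beneath it carries both site 2 and site 3, one from each branch.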

Abstraction recovery

Even though ICode and Java bytecode are very similar, there are a number of operations that require special treatment in the parser:

dup_x1/2 Duplicate-and-exchange bytecodes are not supported in ICode. dup_x1 duplicates the top element of the stack and inserts the copy two values down the stack (dup_x2 inserts it three values down). The operation is simulated using temporary variables; it occurs rarely in practice.

iinc Increments a local variable by a constant. Simulated using arithmetic operations.

modules Loads of the special module instance variable are converted back to a single LOAD_MODULE instruction.

box/unbox Calls to the runtime library that box or unbox primitive values are converted to explicit BOX/UNBOX operations.
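The iinc and dup_x1 rewrites can be sketched as a small translation, with a stack interpreter to check that they preserve behavior. The instruction names (LoadLocal, StoreLocal, Const, Add, Iinc, DupX1, Plain) are hypothetical simplifications of ICode and bytecode, and so is the temporary naming scheme. dup_x2 and two-word (long/double) values are not modeled.

```scala
// Simplified ICode subset (hypothetical names; real ICode has richer instructions)
sealed trait ICode
case class LoadLocal(name: String) extends ICode
case class StoreLocal(name: String) extends ICode
case class Const(value: Int) extends ICode
case object Add extends ICode

// Bytecodes the parser must translate specially, plus a wrapper for one-to-one ones
sealed trait Bytecode
case class Iinc(local: String, delta: Int) extends Bytecode
case object DupX1 extends Bytecode
case class Plain(op: ICode) extends Bytecode

def recover(code: List[Bytecode]): List[ICode] = {
  var counter = 0
  def freshTemp(): String = { counter += 1; "$tmp" + counter } // hypothetical naming scheme
  code.flatMap {
    // iinc x, d  ==>  x = x + d
    case Iinc(x, d) => List(LoadLocal(x), Const(d), Add, StoreLocal(x))
    // ..., v2, v1  ==>  ..., v1, v2, v1, using two temporaries
    case DupX1 =>
      val t1 = freshTemp()
      val t2 = freshTemp()
      List(StoreLocal(t1), StoreLocal(t2), LoadLocal(t1), LoadLocal(t2), LoadLocal(t1))
    case Plain(op) => List(op)
  }
}

// Tiny interpreter used to check the translation (stack head = top of stack)
def run(code: List[ICode], locals: Map[String, Int], stack: List[Int] = Nil): (Map[String, Int], List[Int]) =
  code.foldLeft((locals, stack)) {
    case ((ls, st), LoadLocal(x))  => (ls, ls(x) :: st)
    case ((ls, st), StoreLocal(x)) => (ls + (x -> st.head), st.tail)
    case ((ls, st), Const(v))      => (ls, v :: st)
    case ((ls, st), Add)           => (ls, (st(1) + st.head) :: st.drop(2))
  }

println(run(recover(List(Iinc("i", 3))), Map("i" -> 5)))  // local i becomes 8
println(run(recover(List(DupX1)), Map.empty, List(1, 2)))   // stack becomes List(1, 2, 1)
```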

3.2.3 Data Flow Analysis framework

To facilitate the implementation of various data flow analyses in the compiler, we provide a simple framework for iterative forward and backward data flow analysis [38]. The framework is parameterized by a semilattice that models the abstract values of the analysis domain. The semilattice is specified in terms of the Scala type of the domain values and an implementation of the least upper bound of two abstract values. For instance, suppose we are interested in the liveness of local variables [38]. A variable is live at a certain program point if there is at least one control-flow path starting at that program point on which the variable is used without being reassigned in between. The lattice is implemented as follows:

import scala.collection.immutable.ListSet

object livenessLattice extends SemiLattice {
  type Elem = Set[Local]

  val top: Elem = new ListSet[Local]
  val bottom: Elem = new ListSet[Local]

  // Union: a variable live on either path is live
  def lub2(exceptional: Boolean)(a: Elem, b: Elem): Elem = a ++ b
}

The abstract values of this analysis are sets of variables. When the values of two control-flow paths are combined, the result is the union of the two sets. Since liveness is a backward analysis, values are combined at split points. A variable that is live on either of the two successor paths is therefore live at the split point.

The analysis is fully implemented by providing an implementation for the transfer function of a basic block. The transfer function takes a basic block and the current value at the exit of the block (for a backward analysis like liveness) and returns the value at the entry of the block:

def blockTransfer(b: BasicBlock, out: lattice.Elem): lattice.Elem = gen(b) ++ (out -- kill(b))

where the gen and kill sets describe the effect of a basic block on local variables [38]. gen contains the variables that are read in b before being assigned, while kill contains the variables that are assigned in b.
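Computing gen and kill from a block's instructions, and then applying the blockTransfer formula, can be sketched as follows. The Instr encoding (Load, Store and Other) is a hypothetical simplification that keeps only reads and writes of locals. The gen/kill definitions are the standard ones for liveness.

```scala
// Hypothetical instruction view: only reads and writes of locals matter for liveness
sealed trait Instr
case class Load(local: String) extends Instr
case class Store(local: String) extends Instr
case object Other extends Instr

// gen(b): locals read before any write in b; kill(b): locals written in b
def genKill(block: List[Instr]): (Set[String], Set[String]) =
  block.foldLeft((Set.empty[String], Set.empty[String])) {
    case ((gen, kill), Load(x)) if !kill(x) => (gen + x, kill)
    case ((gen, kill), Store(x))            => (gen, kill + x)
    case (acc, _)                           => acc
  }

// Backward transfer: live-in = gen ++ (live-out -- kill)
def blockTransfer(block: List[Instr], out: Set[String]): Set[String] = {
  val (gen, kill) = genKill(block)
  gen ++ (out -- kill)
}

// Block for:  y = x; x = y + 1
val b = List(Load("x"), Store("y"), Load("y"), Other, Store("x"))
println(genKill(b))                       // gen = {x}, kill = {x, y}
println(blockTransfer(b, Set("y", "z")))  // live-in = {x, z}
```

The read of y is not added to gen because y is assigned earlier in the same block. The block itself supplies that value.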

The framework provides an implementation of a worklist algorithm for computing a fixed point of the transfer functions.
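A worklist solver for the backward liveness analysis can be sketched as follows. This is an illustrative sketch, not the framework's code. It assumes a simplified, hypothetical SemiLattice trait that has only bottom and lub, unlike the lattice interface shown earlier with top and lub2. It also assumes a hypothetical Block summary that holds precomputed gen/kill sets and successor ids. Whenever a block's live-in set changes, its predecessors are put back on the worklist.

```scala
// Hypothetical, simplified lattice interface
trait SemiLattice {
  type Elem
  def bottom: Elem
  def lub(a: Elem, b: Elem): Elem
}

object Liveness extends SemiLattice {
  type Elem = Set[String]
  val bottom: Elem = Set.empty
  def lub(a: Elem, b: Elem): Elem = a ++ b
}

// A basic block summarized by its gen/kill sets and successor ids
case class Block(gen: Set[String], kill: Set[String], succs: List[Int])

// Backward worklist iteration until no live-in set changes
def liveIn(cfg: Map[Int, Block]): Map[Int, Set[String]] = {
  val preds: Map[Int, List[Int]] =
    cfg.toList.flatMap { case (b, blk) => blk.succs.map(s => (s, b)) }.groupMap(_._1)(_._2)
  var liveInOf: Map[Int, Set[String]] = cfg.map { case (b, _) => (b, Liveness.bottom) }
  var work = cfg.keys.toList
  while (work.nonEmpty) {
    val b = work.head
    work = work.tail
    val blk = cfg(b)
    val out = blk.succs.map(liveInOf).foldLeft(Liveness.bottom)(Liveness.lub) // join over successors
    val newIn = blk.gen ++ (out -- blk.kill)                                 // transfer function
    if (newIn != liveInOf(b)) {
      liveInOf = liveInOf.updated(b, newIn)
      work = work ++ preds.getOrElse(b, Nil) // predecessors depend on this block's live-in set
    }
  }
  liveInOf
}

// 1: x = ...   2: if (x > 0)   3: y = x + 1; x = x - 1 (back to 2)   4: return y
val cfg = Map(
  1 -> Block(Set(), Set("x"), List(2)),
  2 -> Block(Set("x"), Set(), List(3, 4)),
  3 -> Block(Set("x"), Set("x", "y"), List(2)),
  4 -> Block(Set("y"), Set(), Nil))
println(liveIn(cfg))
```

Termination follows because the sets only grow and are bounded by the finite set of locals.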

3.3 Optimizations

3.3.1 Inlining

Inlining is the basis on which other optimizations build. Bringing the callee into the caller provides more context for analysis and more opportunities for optimization. It is therefore very important to be able to inline “interesting” methods such as map or foreach, in order to reach the ultimate goal: inlining closure methods and removing anonymous function objects altogether.

Inlining depends on being able to resolve a method call statically, that is, to know which implementation is going to be selected at runtime. As described in detail in § 6.2.1, numerous techniques have been proposed in the literature [50, 22, 7, 14, 45]. All of them require the whole program. Most importantly, they are not precise enough. Class Hierarchy Analysis (CHA) considers only methods implemented in subtypes of the static type of the receiver. Rapid Type Analysis (RTA) prunes this set further by keeping only classes that appear in a call to new. In the most important use case, closures, the static type of the receiver is always one of the FunctionN traits. These traits have a very large number of implementations in a typical program (1200 in the standard library alone). All of them are instantiated, so RTA cannot rule any of them out.


Figure 3.5: Type Flow Analysis lattice

Our technique is straightforward: propagate types from allocation sites to the call site, through local variables and stack slots. This is similar to Sundaresan’s VTA [50]. However, instead of propagating through the whole program (collapsing methods into one single node in the call graph), we propagate only inside the method. The types are flow-sensitive, meaning that we may have a more precise type for a variable on one of the branches. Because we do not make a whole-program assumption, the inlining decision can be taken only if the method is final.

Type flow analysis

Type Flow Analysis (TFA) infers the types of local variables and stack elements at every point in a method. We use a classical forward data flow analysis, formulated in terms of a type lattice. We begin by defining the type lattice, which consists of the 9 primitive types and the class hierarchy, ordered by the usual subtyping relation.
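In the class-hierarchy part of this lattice, the least upper bound of two class types is their nearest common superclass. The sketch below assumes single inheritance and a small hypothetical hierarchy. In Scala, traits can make the least upper bound a compound type, and the primitive types are not modeled here.

```scala
// Hypothetical single-inheritance hierarchy: class -> direct superclass
val superclassOf: Map[String, String] = Map(
  "A" -> "Base", "B" -> "Base", "Base" -> "Object", "Pair" -> "Object")

// The class itself followed by all of its superclasses, nearest first
def ancestors(t: String): List[String] =
  t :: superclassOf.get(t).map(ancestors).getOrElse(Nil)

// Least upper bound: the nearest ancestor of a that is also an ancestor of b
def lub(a: String, b: String): String = {
  val ofB = ancestors(b).toSet
  ancestors(a).find(ofB).getOrElse("Object")
}

println(lub("A", "B"))     // Base
println(lub("A", "Pair"))  // Object
println(lub("A", "Base"))  // Base
```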

The abstract values of this analysis are pairs of local variable state and stack state:

$$\bigl(x \mapsto T,\ [T_1, T_2, \ldots, T_n]\bigr)$$

The first element of the pair maps local variables to types, while the second element is a list of types corresponding to the stack.

We define the ordering relation through a least upper bound operation on the elements of this lattice. We base the operation on the implicit ordering relation defined by the subtyping relation on Scala types, and define a least upper bound for bindings and stacks, as shown in Figure 3.5. Special care has to be taken for control-flow paths involving exceptions. There may be control-flow merge points at the beginning of an exception handler, for instance when different basic blocks are covered by the same exception handler. In that case the least upper bound has to be the special exception handler stack. This stack contains exactly one element, whose type is the type of the exception being caught. This is a direct consequence of the semantics of Java exception handlers: when an exception handler is invoked, it has exactly one value on the stack (the exception that was thrown). Figure 3.6 shows an exception handler covering an object instantiation.

new Pair(if (n > 0) new A else new B, new B)

(covered by handler 6)
NEW Pair
DUP
LOAD_LOCAL n
CZJUMP 2 : 3

NEW REFERENCE(A)
DUP
CALL_METHOD A.<init>
JUMP 4

NEW REFERENCE(B)
DUP
CALL_METHOD B.<init>
JUMP 4

...

Figure 3.6: Control-flow merge point for exception handlers. The exception handler is a successor of all other basic blocks.

The exception handler is a successor of all other basic blocks, but their output stacks do not necessarily have the same number of elements or the same types. The least upper bound of any set of stacks flowing into an exception handler is the special exception handler stack. Interestingly, this does not affect local variables. Their state remains valid in the exception handler, and their least upper bound is computed in the normal way.
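The least upper bound of two abstract states, including the exception-handler case, can be sketched as follows. The sketch makes three assumptions, since Figure 3.5 is not reproduced here. First, normal merges join stacks slot by slot and bindings variable by variable. Second, a local bound on only one path is dropped. Third, the type-level join is passed in as a parameter. TState, lubState and toyLub are hypothetical names.

```scala
// A TFA abstract value: types of locals and of stack slots (head = top of stack)
case class TState(locals: Map[String, String], stack: List[String])

// Join two states; `handler` is Some(exceptionType) when the merge point starts an
// exception handler. `lubType` is the type-level join, supplied by the caller.
def lubState(a: TState, b: TState, handler: Option[String])(lubType: (String, String) => String): TState = {
  // Locals are joined the normal way in both cases.
  // Assumption: a local bound on only one of the paths is dropped.
  val common = a.locals.keySet.intersect(b.locals.keySet)
  val locals = common.map(x => x -> lubType(a.locals(x), b.locals(x))).toMap
  val stack = handler match {
    case Some(exc) => List(exc) // handler stack: exactly the caught exception
    case None =>
      require(a.stack.length == b.stack.length, "stack heights must agree at a normal merge")
      a.stack.zip(b.stack).map { case (x, y) => lubType(x, y) }
  }
  TState(locals, stack)
}

// Toy type join for the example: equal types stay, anything else goes to Object
val toyLub: (String, String) => String = (x, y) => if (x == y) x else "Object"

// Two predecessors of a handler with different stack heights
val s1 = TState(Map("n" -> "Int", "p" -> "A"), List("Pair", "Pair"))
val s2 = TState(Map("n" -> "Int", "p" -> "B"), List("Pair", "Pair", "A"))
println(lubState(s1, s2, Some("Throwable"))(toyLub)) // locals joined, stack = List(Throwable)
```

The mismatched stack heights are harmless at the handler, because the handler's stack never depends on the incoming stacks.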

The analysis is defined in terms of an abstract state and the effect of each ICode instruction. Since all instructions are typed, it is straightforward to model their effect on the stack and on the local variable environment. The only instructions that may introduce more precise types are NEW and CHECK_CAST.

Inlining

TFA provides more precise types at a call site than the static type of the receiver.

If the type is precise enough to identify exactly one possible method implementation, and that method is final, a decision to inline may be taken. The decision depends on safety and heuristic criteria.
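The inlining decision at a single call site can be sketched under the assumption of a hypothetical ClassInfo model with superclass links and per-method final flags. With only the static receiver type Function1, the lookup finds a non-final method, so inlining is refused. TFA can propagate a more precise type from the NEW of a closure class, here the hypothetical anonfun$1. With that type, the lookup finds a single final implementation.

```scala
// Hypothetical class model: superclass link and declared methods (name -> isFinal)
case class ClassInfo(superclass: Option[String], methods: Map[String, Boolean])

// Walk up from the receiver type to the implementation that would be selected
def lookup(classes: Map[String, ClassInfo], cls: String, method: String): Option[(String, Boolean)] =
  classes.get(cls).flatMap { c =>
    c.methods.get(method) match {
      case Some(isFinal) => Some((cls, isFinal))
      case None          => c.superclass.flatMap(lookup(classes, _, method))
    }
  }

// Inline only if the resolved implementation is final: the TFA type is an upper
// bound, so a non-final method could still be overridden by a subclass at runtime.
def inlineTarget(classes: Map[String, ClassInfo], receiverType: String, method: String): Option[String] =
  lookup(classes, receiverType, method).collect { case (owner, true) => owner + "." + method }

val classes = Map(
  "Function1" -> ClassInfo(None, Map("apply" -> false)),
  "anonfun$1" -> ClassInfo(Some("Function1"), Map("apply" -> true)))

println(inlineTarget(classes, "Function1", "apply")) // static type only: None
println(inlineTarget(classes, "anonfun$1", "apply")) // type from NEW: Some(anonfun$1.apply)
```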

Assuming the ICode for the callee method m is available, inlining m into method c is safe when all of the following are true:

visibility Method m does not access any private members, or if it does, both m and c are in the same compilation unit. In the latter case, the member is made public.

hierarchy Method m does not call methods through super.
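The two safety conditions listed here can be checked as in the sketch below. CalleeInfo and inlineSafety are hypothetical names. The sketch assumes that the callee's private member accesses and super calls have already been collected from its ICode. It checks only the visibility and hierarchy conditions that appear in this excerpt. It returns the private members that would have to be made public, or None if inlining is unsafe.

```scala
// Hypothetical summary of facts about a callee m, gathered from its ICode
case class CalleeInfo(unit: String, privateMembers: Set[String], callsSuper: Boolean)

// Checks only the visibility and hierarchy criteria. Returns the private members
// that must be made public, or None if inlining m into the caller is unsafe.
def inlineSafety(m: CalleeInfo, callerUnit: String): Option[Set[String]] =
  if (m.callsSuper) None                                 // hierarchy: no super calls
  else if (m.privateMembers.isEmpty) Some(Set.empty[String]) // visibility: nothing private
  else if (m.unit == callerUnit) Some(m.privateMembers)  // same unit: make them public
  else None                                              // private access across units

println(inlineSafety(CalleeInfo("A.scala", Set("secret"), callsSuper = false), "A.scala"))
println(inlineSafety(CalleeInfo("A.scala", Set("secret"), callsSuper = false), "B.scala"))
```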

