Virtual Machine Software - Software Partition

3.4 Software Partition

3.4.1 Virtual Machine Software

A typical Java virtual machine implemented purely in software loads Java classes, manages resources such as memory and threads, provides an interface to the virtual machine for native code, and most importantly, controls the bytecode execution. The part of a virtual machine that executes bytecode is called its execution engine. This is the part that typically uses the largest amount of CPU time during the execution of a Java application.

The simplest software implementation of a bytecode execution engine is a bytecode interpreter. In an interpreter, the software fetches one instruction at a time, then branches according to the opcode and finally executes the native instructions implementing the Java bytecode instruction in question. This loop is continued until the interpreter encounters an instruction that requires special processing. For example, a method invocation instruction may require calling a native function or loading a new class to the virtual machine.
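The fetch, dispatch, and execute loop described above can be sketched in C++ as follows. This is only a minimal illustration, not the REALJava implementation: the `Frame` structure, the `Exit` type, and the `interpret` and `runTimesTen` functions are hypothetical names introduced here, and only five opcodes are handled. The opcode values themselves are the standard JVM encodings.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Standard JVM opcode values; every other name in this sketch is hypothetical.
enum Opcode : std::uint8_t {
    BIPUSH   = 0x10,
    ILOAD_0  = 0x1a,
    ISTORE_0 = 0x3b,
    IMUL     = 0x68,
    RETURN   = 0xb1
};

enum class Exit { Returned, NeedsSpecialProcessing };

struct Frame {
    std::vector<std::int32_t> locals;  // local variable area
    std::vector<std::int32_t> stack;   // operand stack
    std::size_t pc = 0;                // bytecode program counter
};

// Runs until the method returns or an opcode outside this sketch is met.
Exit interpret(const std::vector<std::uint8_t>& code, Frame& f) {
    while (f.pc < code.size()) {
        const std::uint8_t opcode = code[f.pc++];  // fetch
        switch (opcode) {                          // dispatch on the opcode
        case ILOAD_0:
            f.stack.push_back(f.locals.at(0));
            break;
        case BIPUSH:
            assert(f.pc < code.size());
            // The operand is one signed byte following the opcode.
            f.stack.push_back(static_cast<std::int8_t>(code[f.pc++]));
            break;
        case IMUL: {
            assert(f.stack.size() >= 2);
            const std::int32_t b = f.stack.back(); f.stack.pop_back();
            const std::int32_t a = f.stack.back(); f.stack.pop_back();
            // Multiply as unsigned so overflow wraps, as Java int arithmetic does.
            f.stack.push_back(static_cast<std::int32_t>(
                static_cast<std::uint32_t>(a) * static_cast<std::uint32_t>(b)));
            break;
        }
        case ISTORE_0:
            assert(!f.stack.empty());
            f.locals.at(0) = f.stack.back();
            f.stack.pop_back();
            break;
        case RETURN:
            return Exit::Returned;
        default:
            --f.pc;  // leave pc pointing at the unhandled instruction
            return Exit::NeedsSpecialProcessing;
        }
    }
    return Exit::Returned;
}

// Interprets a four-instruction program that multiplies local variable 0 by 10.
std::int32_t runTimesTen(std::int32_t x) {
    const std::vector<std::uint8_t> code = {ILOAD_0, BIPUSH, 10, IMUL, ISTORE_0, RETURN};
    Frame f;
    f.locals = {x};
    interpret(code, f);
    return f.locals.at(0);
}
```

Calling `runTimesTen(7)` leaves 70 in local variable 0. An opcode that the switch does not cover, such as a method invocation, makes the loop return `Exit::NeedsSpecialProcessing` so that the caller can handle it outside the loop.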

Although an interpreter is simple to implement, it is not very efficient. Since the Java virtual machine is entirely stack-based, interpreting bytecode requires a large number of memory accesses even for relatively simple operations. For example, the following sequence of instructions, which multiplies a local variable by 10, requires six stack accesses:

    iload_0      ; push 1
    bipush 10    ; push 1
    imul         ; pop 2, push 1
    istore_0     ; pop 1

The source code for the example could be x = x * 10;. Besides the stack accesses, the code segment also accesses the local variable area twice, once for the initial load and once for the final store. This brings the number of data-side memory accesses to eight. On the instruction side, there are five accesses in this example: the four opcodes and the operand byte of the bipush instruction. Since the data and the instruction stream are likely to be placed in the same physical memory, the total number of accesses is 13, and this still excludes housekeeping such as updating the PC and the top-of-stack pointer.
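The tally above can be checked mechanically. The sketch below records, for each of the four instructions, how many values it pushes and pops, how often it touches the local variable area, and how many bytes it occupies in the instruction stream. The `Cost` and `Tally` structures and both function names are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <vector>

// Per-instruction memory traffic, taken from the listing and text above.
struct Cost {
    const char* mnemonic;
    int pushes;         // operand stack writes
    int pops;           // operand stack reads
    int localAccesses;  // loads or stores to the local variable area
    int codeBytes;      // opcode plus operand bytes fetched
};

struct Tally {
    int stack = 0;        // stack accesses
    int locals = 0;       // local variable accesses
    int instruction = 0;  // instruction-side accesses
    int data() const { return stack + locals; }
    int total() const { return data() + instruction; }
};

Tally tally(const std::vector<Cost>& sequence) {
    Tally t;
    for (const Cost& c : sequence) {
        t.stack += c.pushes + c.pops;
        t.locals += c.localAccesses;
        t.instruction += c.codeBytes;
    }
    return t;
}

// The x = x * 10 sequence from the text.
Tally timesTenTally() {
    const std::vector<Cost> sequence = {
        {"iload_0",   1, 0, 1, 1},
        {"bipush 10", 1, 0, 0, 2},  // one opcode byte plus one operand byte
        {"imul",      1, 2, 0, 1},
        {"istore_0",  0, 1, 1, 1},
    };
    return tally(sequence);
}
```

For this sequence the fields come out as 6 stack accesses, 2 local variable accesses, and 5 instruction-side accesses, so `data()` is 8 and `total()` is 13, matching the figures in the text.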

There is also a per-instruction overhead in an interpreter caused by the pointer access used to fetch the instruction and by the instruction dispatch itself. Many optimizations have been developed to reduce this overhead.

Direct threading and inline threading [16] seek to reduce the instruction fetch and instruction dispatch time by converting opcodes into the corresponding native instructions before execution. Just-in-time compilation [11] further optimizes the generated code and also reduces stack accesses by using the host CPU's registers to store intermediate results. Because the Java instructions are replaced with native instructions, these optimization strategies also remove the bytecode program counter.
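The idea of translating opcodes before execution can be illustrated with a portable approximation. Standard C++ has no computed goto, so the sketch below uses call threading instead of true direct threading: a one-time translation pass replaces each opcode with a pointer to its handler function, and the execution loop then calls handlers in sequence without decoding any opcode. All names (`State`, `Cell`, `translate`, `execute`, `timesTenThreaded`) are hypothetical, and only the four opcodes of the earlier example plus return are covered.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct State {
    std::vector<std::int32_t> locals;
    std::vector<std::int32_t> stack;
    bool running = true;
};

struct Cell;
using Handler = void (*)(State&, const Cell&);

// One pre-decoded instruction: handler address plus an inline operand.
struct Cell {
    Handler handler;
    std::int32_t operand;
};

void opIload0(State& s, const Cell&) { s.stack.push_back(s.locals.at(0)); }
void opPush(State& s, const Cell& c) { s.stack.push_back(c.operand); }
void opImul(State& s, const Cell&) {
    assert(s.stack.size() >= 2);
    const std::int32_t b = s.stack.back(); s.stack.pop_back();
    const std::int32_t a = s.stack.back(); s.stack.pop_back();
    // Unsigned multiply so that overflow wraps like Java int arithmetic.
    s.stack.push_back(static_cast<std::int32_t>(
        static_cast<std::uint32_t>(a) * static_cast<std::uint32_t>(b)));
}
void opIstore0(State& s, const Cell&) {
    assert(!s.stack.empty());
    s.locals.at(0) = s.stack.back();
    s.stack.pop_back();
}
void opReturn(State& s, const Cell&) { s.running = false; }

// Translation pass, run once per method: opcode bytes become handler pointers.
std::vector<Cell> translate(const std::vector<std::uint8_t>& code) {
    std::vector<Cell> cells;
    for (std::size_t pc = 0; pc < code.size(); ++pc) {
        switch (code[pc]) {
        case 0x1a: cells.push_back({opIload0, 0}); break;   // iload_0
        case 0x10:                                          // bipush
            if (pc + 1 >= code.size()) { pc = code.size(); break; }
            cells.push_back({opPush, static_cast<std::int8_t>(code[++pc])});
            break;
        case 0x68: cells.push_back({opImul, 0}); break;     // imul
        case 0x3b: cells.push_back({opIstore0, 0}); break;  // istore_0
        default:   pc = code.size(); break;  // return or unsupported: stop here
        }
    }
    cells.push_back({opReturn, 0});  // sentinel guarantees termination
    return cells;
}

// Execution loop: no opcode decoding, only an indirect call per instruction.
void execute(const std::vector<Cell>& cells, State& s) {
    s.running = true;
    const Cell* ip = cells.data();
    while (s.running) {
        const Cell& c = *ip++;
        c.handler(s, c);
    }
}

std::int32_t timesTenThreaded(std::int32_t x) {
    const std::vector<Cell> cells = translate({0x1a, 0x10, 10, 0x68, 0x3b, 0xb1});
    State s;
    s.locals = {x};
    execute(cells, s);
    return s.locals.at(0);
}
```

Note that the stack traffic is unchanged here; only the per-instruction fetch and dispatch cost is reduced. Removing the stack accesses as well requires keeping intermediate values in registers, which is what the just-in-time compilation mentioned above does.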

The structure of the execution engine needs to be changed to support a bytecode co-processor. Rather than executing the bytecode instructions in software, the virtual machine loads the required method’s code segment into the local memory of the co-processor and sets the internal registers to appropriate values. Execution then continues on the co-processor until it encounters an instruction that it cannot execute or the software commands it to halt. The execution in the co-processor can also be suspended because the current thread has used its time slice. The thread scheduling algorithm is discussed later in Section 3.4.3.
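The hand-off described above might look roughly like the following sketch. It does not reflect the actual REALJava interfaces: `CoProcModel` is a hypothetical software stand-in for the co-processor, the time slice is counted in instructions purely for simplicity, and the model executes only bipush and imul, treating every other opcode as one it cannot execute.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class Stop { Trapped, Halted, SliceExpired };

// Hypothetical software model of the co-processor.
struct CoProcModel {
    std::vector<std::uint8_t> codeMem;   // local memory: method code segment
    std::vector<std::int32_t> stackMem;  // local memory: Java stack
    std::size_t pc = 0;                  // internal program counter register
    bool haltRequested = false;          // set by the host software

    // Runs until a trap, a halt request, or the end of the time slice.
    Stop run(int slice) {
        while (slice-- > 0) {
            if (haltRequested) return Stop::Halted;
            if (pc >= codeMem.size()) return Stop::Trapped;
            switch (codeMem[pc]) {
            case 0x10:  // bipush: push the signed operand byte
                if (pc + 1 >= codeMem.size()) return Stop::Trapped;
                stackMem.push_back(static_cast<std::int8_t>(codeMem[pc + 1]));
                pc += 2;
                break;
            case 0x68: {  // imul
                if (stackMem.size() < 2) return Stop::Trapped;
                const std::int32_t b = stackMem.back(); stackMem.pop_back();
                const std::int32_t a = stackMem.back(); stackMem.pop_back();
                stackMem.push_back(static_cast<std::int32_t>(
                    static_cast<std::uint32_t>(a) * static_cast<std::uint32_t>(b)));
                pc += 1;
                break;
            }
            default:
                return Stop::Trapped;  // instruction the model cannot execute
            }
        }
        return Stop::SliceExpired;
    }
};

// Host side: load the code segment, set the registers, then run until the
// co-processor hands control back. `resumes` counts expired time slices.
std::int32_t runOnCoProcessor(const std::vector<std::uint8_t>& code,
                              int slice, int& resumes) {
    assert(slice > 0);
    CoProcModel cp;
    cp.codeMem = code;  // load the method's code segment into local memory
    cp.pc = 0;          // set the internal registers
    resumes = 0;
    for (;;) {
        const Stop stop = cp.run(slice);
        if (stop != Stop::SliceExpired) break;  // trap or halt: back to software
        ++resumes;  // a real scheduler could switch to another thread here
    }
    // The software would now handle the trapped instruction; this sketch
    // simply reports the value on top of the stack.
    return cp.stackMem.empty() ? 0 : cp.stackMem.back();
}
```

With the code bytes `{0x10, 6, 0x10, 7, 0x68, 0xb8, 0, 0}` and a slice of two instructions, the model runs both pushes, is suspended once, resumes to perform the multiplication, and then traps on the invocation opcode 0xb8 with 42 on top of the stack.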

Since most of the processing is done on the co-processor, many improvements to the software part of the execution engine become unnecessary and impractical to implement. Because the virtual machine needs to be able to update the stack and the internal registers of the co-processor when the execution is transferred from one domain to the other, optimizations that reduce stack accesses or replace the program counter become unusable as such. However, they also become largely unnecessary: the co-processor takes care of most of the menial stack manipulation, and it updates the program counter and stack pointer in parallel with the actual execution, utilizing the inherently parallel nature of hardware.

The software partition of the REALJava virtual machine is implemented in C++. The virtual machine supports JNI [99] and the standard edition of the Java 2 platform [100]. Currently the REALJava virtual machine runs on Windows and Linux on x86 computers, or on Linux on PowerPC-based systems. Since the software is coded in C++ with no assembler optimizations, porting the software to new architectures and operating systems should be relatively easy. The virtual machine also contains a simple emulator of the hardware’s capabilities, which can be used for testing new functionality in software.

The structure of the REALJava virtual machine is shown in Figure 3.7. Like a generic Java virtual machine, it contains a native interface, a heap memory manager and a class loader. The execution engine of the REALJava virtual machine, however, is split between the software and the co-processor.

The software partition of the virtual machine also manages the co-processor’s local memory, which holds the Java stacks and method code segments, and implements a simple thread scheduler for allocating threads to the co-processor.

[Figure 3.7: block diagram. Labels recovered from the figure: Host CPU, Class Loader, Native Interface, Native Methods, Thread Management, Scheduler, Execution Engine, ALU, Registers, System Memory, Method Area, Class Data, Heap, Swap Space, Local Memory, Java Stacks.]

Figure 3.7: Logical layout of the REALJava Virtual Machine. The left side of the figure is the host CPU’s domain, while the right side is the co-processor’s domain. On both sides the upper part represents the computational loads and the lower part shows the memory regions.

3.4.2 Bytecode Execution and Method Invocation and Re-

In document Tero Säntti. A Co-Processor Approach for Efficient Java Execution in Embedded Systems (Page 72-74)