
3.5 Data Structures

3.5.4 Other Data Structures

Besides the three main data structures, REALJava has six additional data structures. All of them are stored in the system memory and are thus controlled by the software partition. They are described next.

(1) The reference table mentioned above, used for referencing the Java objects in the heap.
(2) The static member area, used for storing the static members of classes. The data items here are initialized when a new class is loaded.
(3) The class area, used for storing the methods and the constant pools of loaded classes.
(4) The string area, used for storing unique strings, such as “Hello world!”.
(5) The code segment area, which contains the executable native code of the software partition and the libraries required for I/O and native interfaces.
(6) The swap area, which is created if the co-processor's local memory becomes full and its contents need to be swapped to the system memory. This area is initialized only if it is required.

The run-time sizes of the areas are highly application dependent. Since all of the areas, except the code segment area, can grow when needed, the initial sizes can be set to any values that are suitable for a given system. REALJava itself imposes no limits, except that the pointers to the regions are 32 bits long, which restricts the total size of all areas to 4 GB. This should be more than sufficient for embedded systems.
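The growth rule and the 32-bit limit can be illustrated with a small software model. This is only a sketch under stated assumptions: the class `SystemMemoryAreas`, the area names, and the byte-addressed interpretation of the 32-bit pointers are hypothetical and are not taken from the REALJava implementation.

```python
ADDRESS_SPACE = 1 << 32  # assumption: 32-bit pointers address individual bytes

# Hypothetical names for the five areas that may grow at run time.
GROWABLE = {"reference_table", "static_members", "class_area", "string_area", "swap"}


class SystemMemoryAreas:
    """Toy bookkeeping model of the area sizes (not the real allocator)."""

    def __init__(self, code_segment_size, initial_sizes):
        # The swap area is never preallocated; it appears on first use.
        unknown = set(initial_sizes) - (GROWABLE - {"swap"})
        if unknown:
            raise ValueError(f"unexpected areas: {sorted(unknown)}")
        self.sizes = {"code_segment": code_segment_size, **initial_sizes}
        if self.total() > ADDRESS_SPACE:
            raise MemoryError("areas exceed the 32-bit address range")

    def total(self):
        """Return the combined size of all areas in bytes."""
        return sum(self.sizes.values())

    def grow(self, area, extra_bytes):
        """Enlarge a growable area, creating it if it does not exist yet."""
        if area not in GROWABLE:
            raise ValueError(f"{area!r} cannot grow")
        if self.total() + extra_bytes > ADDRESS_SPACE:
            raise MemoryError("areas would exceed the 32-bit address range")
        self.sizes[area] = self.sizes.get(area, 0) + extra_bytes
```

In this model the swap area does not exist until `grow("swap", n)` is called for the first time, mirroring the statement that it is initialized only when required, while any attempt to grow the code segment is rejected.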

3.6 Chapter Summary

In this chapter the REALJava virtual machine was described at a conceptual level. The execution model used in the research for this thesis was shown. The hardware/software partitioning was defined, followed by the structure of the hardware partition. The software partition was also discussed briefly, with the emphasis on the components related to the co-processor. Finally, the details of the internal data structures were specified.

Chapter 4

Prototyping the REALJava Virtual Machine

The Java virtual machine presented in Chapter 2 was partitioned in Chapter 3. In this chapter FPGA technology is used to prototype the resulting REALJava virtual machine. The prototypes have also been presented in [57]. The assumptions made during the partitioning and structural specification are replaced by ones that fit the target technology. This chapter begins with details of the major changes required to meet the FPGA-specific constraints. The FPGA platforms used for the prototypes are briefly described, along with the properties of the PowerPC 405 embedded processor integrated in the larger FPGA platforms. The FPGA-specific tools and techniques are outlined next. The physical size of the co-processor core is also evaluated against comparison systems ranging from Java co-processors to full general-purpose processors. Finally, the communication subsystems for all of the platforms are presented.

4.1 Major Changes from the Conceptual Model to the FPGA Implementation

The first and possibly the most drastic change concerns the instruction folding unit, which was dropped from the design. Folding was not included because the FPGAs in use do not have fast ROMs, which were specified in Section 3.3.7 to be a prerequisite. In addition, the FSM inside the folding unit would have to be synchronous, and thus run at the same speed as the rest of the system, which is clearly prohibitive. Since folding was left out, other measures had to be taken to keep the execution rate high enough. To this end, the data flow between the stack and the ALU was modified. All the data in the stack goes through a cache and is automatically forwarded to the ALU. The ALU maps the data items as operands using predefined rules obtained from the Java bytecode instruction set. The rules state, for instance, that in integer subtraction the topmost entry of the stack is subtracted from the second entry. Since Java compilers produce code that minimizes the number of stack locations used, most of the code follows the pattern load, load, compute, store. This means that the top two locations of the stack are generally loaded from the local variable area just before an arithmetic operation, so the data items are in the cache and thus readily presented to the ALU.

A partial version of folding is performed at the output of the ALU. If the result is going to be moved into a local variable, it is written there directly. In the straightforward implementation the result would first be written to the top of the stack and then immediately moved to the local variable. This form of instruction folding only needs to check whether the next instruction is a local variable store. The check is easy to implement, since the instruction fetch unit provides the instruction stream parameters to the ALU. The parameters are located after the related instruction, so reading the parameter data during the execution of an instruction that does not require parameters actually provides the next instruction.
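The operand mapping rule and the partial folding at the ALU output can be illustrated with a small software model. This is a minimal sketch, not the hardware design: the opcode values are the standard JVM encodings of `iload`, `istore`, `iadd` and `isub`, whereas the function names, the list-based stack, and the `folded` counter are assumptions introduced purely for illustration.

```python
# Standard JVM opcode values; everything else is a hypothetical software model.
ILOAD, ISTORE, IADD, ISUB = 0x15, 0x36, 0x60, 0x64


def to_int32(value):
    """Wrap a Python int to the signed 32-bit range of Java int arithmetic."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


def run(code, local_vars):
    """Execute a tiny bytecode subset; return (operand stack, folded stores)."""
    stack, folded, pc = [], 0, 0
    while pc < len(code):
        op = code[pc]
        if op == ILOAD:
            stack.append(local_vars[code[pc + 1]])
            pc += 2
        elif op == ISTORE:
            local_vars[code[pc + 1]] = stack.pop()
            pc += 2
        elif op in (IADD, ISUB):
            top = stack.pop()      # topmost stack entry
            second = stack.pop()   # second entry
            # Subtraction takes the topmost entry away from the second one.
            result = to_int32(second - top if op == ISUB else second + top)
            # Arithmetic opcodes carry no parameters, so the byte after the
            # opcode is already the next instruction.
            if pc + 2 < len(code) and code[pc + 1] == ISTORE:
                local_vars[code[pc + 2]] = result  # bypass the stack entirely
                folded += 1
                pc += 3                            # skip the folded store
            else:
                stack.append(result)
                pc += 1
        else:
            raise ValueError(f"unsupported opcode {op:#x}")
    return stack, folded
```

For the load, load, compute, store pattern, for example `run(bytes([ILOAD, 0, ILOAD, 1, ISUB, ISTORE, 2]), [10, 3, 0])`, the model writes the difference directly into local variable 2 and leaves the operand stack empty. Without a trailing store the result is pushed onto the stack as usual.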

The last changes were made to the control register bank. The method invocation module described in Section 5.5 required two additional registers to be implemented in the co-processor. These registers provide the ALU with the number of local variables and the number of parameters required by the Java method to be invoked. Since both of these numbers are 16 bits long, they were combined into a single 32-bit register location, which is named LO. A set of other addresses was also added in order to minimize unnecessary communication between the CPU and the co-processor. These additions are presented in Chapter 5.
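Combining the two 16-bit counts into one 32-bit word can be sketched as follows. The text does not state which half of LO holds which value, so the field order used here (local variable count in the upper half, parameter count in the lower half) is an assumption, as are the function names.

```python
MASK16 = 0xFFFF


def pack_lo(num_locals, num_params):
    """Pack two 16-bit counts into one 32-bit word (field order is assumed)."""
    if not (0 <= num_locals <= MASK16 and 0 <= num_params <= MASK16):
        raise ValueError("both counts must fit in 16 bits")
    return (num_locals << 16) | num_params


def unpack_lo(word):
    """Split a 32-bit LO word back into (num_locals, num_params)."""
    return (word >> 16) & MASK16, word & MASK16
```

With this layout a single 32-bit write transfers both values to the co-processor, which is the kind of saving in CPU to co-processor communication that the register additions aim for.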

In document Tero Säntti. A Co-Processor Approach for Efficient Java Execution in Embedded Systems (Page 84-88)