An Energy Consumption Model for Java Virtual Machine


Sébastien Lafond, Johan Lilius
Åbo Akademi University, Department of Computer Science, Lemminkäisenkatu 14A, FIN-20520 Turku, Finland

Turku Centre for Computer Science, TUCS Technical Report No 597, March 2004
ISBN 952-12-1320-5, ISSN 1239-1891

Abstract

In recent years we have seen an explosion of markets for portable electronic devices such as PDAs, personal communicators and mobile phones. The size and complexity of applications, together with development constraints such as getting the product to market on time, make the use of high-level languages like Java necessary. Java 2 Micro Edition (J2ME) has emerged as a good solution for developing applications on those platforms. The main goal of the Java language is to allow application development with an abstraction of the target platform, making the concept “write once, run it anywhere” possible. The Java Virtual Machine (JVM) is an abstract machine that forms the interface between platform-independent applications and the hardware, possibly through an operating system. Thus the use of the Java language can be seen as adding one more layer, the Java virtual machine, between the hardware and software layers. In this paper we establish a general framework for estimating the energy consumption of an embedded Java virtual machine. We have designed a number of experiments to find the constant overhead of the virtual machine and to establish the energy consumption cost of individual Java opcodes. The results show that there is a basic constant overhead that is equal for every Java program, and that a subset of Java opcodes have an almost constant energy cost. We also show that memory access is a crucial component of energy consumption.

Keywords: Java virtual machine, KVM, energy consumption, opcode, bytecode

TUCS Laboratory: Embedded Systems Laboratory

Contents

1 Introduction
2 Energy Distribution in Handheld Devices
    2.1 Definitions
    2.2 Solutions for low power systems
3 An energy consumption model of Java applications
    3.1 Measurements methodology
        3.1.1 KVM enprofiler
4 Experiments
    4.1 First experiment
    4.2 Second experiment
    4.3 Third experiment
    4.4 Other experiments
    4.5 Java opcode energy cost
        4.5.1 Measurements methodology
        4.5.2 Results
5 Conclusion
A Annex

1 Introduction

In recent years we have seen an explosion of markets for portable electronic devices such as PDAs, personal communicators and mobile phones. These battery-operated devices provide more and more functionality and, as a consequence, become more and more complex. They have in common strong constraints on energy consumption, and thus maximizing battery life for such devices is crucial. Several techniques have been developed to optimize the energy consumption of embedded systems. These techniques can be hardware-based solutions, as well as software or co-design solutions [1]. Until now, techniques for low power optimization of software have mostly been applied at the processor instruction level [2, 3, 4], principally by using processor-specific instructions [5, 6]. Memory management techniques have also been widely applied to optimize energy consumption [7, 8, 9, 10]. Manufacturers have now understood that software optimization for lower energy consumption is an important issue. A typical example is the story of the Siemens C25 "power" [11].

The size and complexity of applications, together with development constraints such as getting the product to market on time, make the use of high-level languages like Java necessary. Java 2 Micro Edition (J2ME) has emerged as a good solution for developing applications on those platforms. Due to the wide diversity of hardware and operating systems used in handheld devices, portability across systems is not easy and requires effort. Java technology makes it possible to develop applications usable on a range of different systems and platforms without having to worry about portability. We can easily predict that developer communities and companies will provide more and more Java applications for such handheld devices to satisfy an increasing demand. However, few studies have addressed energy consumption at the virtual machine level.
The study of the energy consumption of Java applications is needed to build an energy consumption model of the virtual machine. This model will be used to quantify the energy needed by applications.

2 Energy Distribution in Handheld Devices

We mostly know the problems related to power dissipation through the noisy fans and the increasingly sophisticated cooling systems that need to be installed on modern PC processors. As the number of transistors integrated on one chip increases, the power dissipation follows the same trend. For an Intel Pentium IV on a 90 nm process the power dissipation can reach 115 watts, and the processor can draw up to 91 amperes through its 86 supply voltage pins. But as soon as a system needs to be mobile and autonomous, those extreme values make a Pentium IV unusable given the characteristics of standard batteries. For such a system, giving consideration to its energy consumption is crucial, so in this section we look at basic concepts of electrical power and energy and review different solutions for low power systems.

Figure 1: Convention for current and voltage notation for a dipole in a DC circuit (current I through the dipole, voltage U across it)

2.1 Definitions

Using the current and voltage notation convention shown in figure 1, we can define the instantaneous power for a dipole in a direct current (DC) circuit as follows:

\[ P(t) = U(t) \cdot I(t) \]

where $P(t)$ is the instantaneous power in Watt (W) dissipated by one dipole at time $t$, $U(t)$ the potential difference in Volt (V) across its inputs at time $t$, and $I(t)$ the current in Ampere (A) passing through the dipole at time $t$.

The average power dissipated by one dipole between the times $t_0$ and $t_1$, in Watt, is defined by:

\[ P_{avg} = \frac{1}{t_1 - t_0} \int_{t_0}^{t_1} U(t) \cdot I(t) \, dt \]

where $U(t)$ is the function of the potential difference in Volt across its inputs, and $I(t)$ the function of the current in Ampere passing through the dipole.

The energy $E$, in Joule (J), consumed by one dipole between the times $t_0$ and $t_1$ is defined by:

\[ E = \int_{t_0}^{t_1} U(t) \cdot I(t) \, dt \]

Energy is a measure of power expended over time. In other words, one Joule is equivalent to one watt dissipated or radiated during one second. The misuse of the terms energy and power in the literature has often led to confusion. In this document we will try to respect their definitions, with only one exception concerning the term "low power", which has already been adopted by the community as meaning "low average power".

Figure 2: System A - Power dissipation and energy consumed (power and consumed energy E plotted against time between t0 and t1)

Figure 3: System B - Power dissipation and energy consumed (power and consumed energy E' plotted against time between t0 and t1)

2.2 Solutions for low power systems

In order to extend the battery life of a handheld device two solutions can be applied: increase the amount of energy stored in the device, or decrease its energy consumption. Much effort has been put into increasing battery capacity, but the solutions are costly in terms of price, volume and weight [12]. Attention has therefore turned to developing so-called low power systems. Energy savings can be obtained by specific hardware implementations and/or by techniques for low power optimization of software [13]. In this section we look at system level and software solutions that can be applied once the hardware implementation has been defined.

The power dissipated by one CMOS component depends on the square of its supply voltage $V_{dd}$, its frequency $f$ in Hertz (Hz), its lumped capacitance $C$ in Farad (F), its activity factor $\alpha$ and its leakage current $I_{leak}$:

\[ P = \alpha \cdot C \cdot f \cdot V_{dd}^2 + V_{dd} \cdot I_{leak} \]

where $V_{dd} \cdot I_{leak}$ represents the static power (dissipated due to the leakage current), and $\alpha \cdot C \cdot f \cdot V_{dd}^2$ the dynamic power dissipated by the CMOS circuit. We can see that decreasing the supply voltage has a quadratic impact on the energy consumption, as the dynamic power is proportional to the square of the supply voltage. Historically the supply voltage of processors was 5 V, but nowadays the most common value is 1.2 V. However, the supply voltage cannot be decreased indefinitely, as doing so also requires shrinking the process technology. Shrinking the process technology in turn increases the leakage current [14] and thus the energy consumed due to static power dissipation. This means that efficient techniques for reducing energy consumption have to deal with both the static and the dynamic power dissipated by a CMOS circuit.

System level solutions are based on hardware and software collaboration to achieve energy consumption reductions. They all have in common that they reduce energy consumption regardless of the instantaneous power dissipation. Indeed, a system can from time to time have a relatively high instantaneous power dissipation and still be described as a low power system, i.e. one having a low average power dissipation. Figures 2 and 3 illustrate this: system A will consume less energy during the period $[t_0, t_1]$ than system B, as system A has a lower average power dissipation than system B. If both systems perform the same task and satisfy the same requirements, system A should be chosen to save energy.

The deactivation of peripherals, or parts of them, during their idle time is certainly the most common technique for reducing system energy consumption. It requires defining power management policies that decide whether or not to put a peripheral into a given sleep mode.
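As a minimal sketch of such a policy, assuming the simplest possible rule (a fixed idle-time threshold), the decision can be written as below. The class name IdleTimeoutPolicy, its methods and the 100 ms threshold are hypothetical and chosen only for illustration; they are not taken from any particular operating system.

```java
/**
 * Hypothetical fixed-timeout power management policy (illustration only):
 * a peripheral may be put to sleep once it has been idle for a given time.
 */
final class IdleTimeoutPolicy {
    private final long timeoutMillis;  // idle time after which sleeping is allowed
    private long lastActivityMillis;   // time of the most recent peripheral activity

    IdleTimeoutPolicy(long timeoutMillis, long nowMillis) {
        this.timeoutMillis = timeoutMillis;
        this.lastActivityMillis = nowMillis;
    }

    /** Called whenever the peripheral is used; restarts the idle period. */
    void recordActivity(long nowMillis) {
        lastActivityMillis = nowMillis;
    }

    /** Returns true when the idle time already spent reaches the threshold. */
    boolean shouldSleep(long nowMillis) {
        return nowMillis - lastActivityMillis >= timeoutMillis;
    }

    public static void main(String[] args) {
        IdleTimeoutPolicy policy = new IdleTimeoutPolicy(100L, 0L);
        System.out.println(policy.shouldSleep(50L));   // idle for 50 ms
        System.out.println(policy.shouldSleep(100L));  // idle for 100 ms
        policy.recordActivity(120L);
        System.out.println(policy.shouldSleep(150L));  // idle for 30 ms since last use
    }
}
```

A shorter threshold lets the peripheral sleep sooner, but increases the risk of sleeping just before it is needed again, so the threshold has to be chosen with the wake-up cost of the peripheral in mind.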
Most policies are based on the idle time already spent by the peripherals, but a policy can also use prediction algorithms. It is generally the operating system (OS) that handles these policies, as the OS has a more global view of system activity than each individual peripheral. Nevertheless, applications running on the system can behave unpredictably, mainly through user interfaces. Overly aggressive power management policies can also degrade system performance and make the system inefficient.

Matching the processor throughput to the current workload by dynamically scaling down the processor frequency can also save energy. The achievable frequency of a CMOS processor is roughly proportional to its supply voltage, as the voltage drives propagation delays and transition times. Thus reducing the processor frequency makes it possible to also decrease the processor supply voltage and obtain a quadratic energy saving. Figure 4 shows the relation between frequency and voltage on an ARM926EJ-S processor core and figure 5 shows the energy saving obtainable through dynamic voltage and frequency scaling (DVS) [15]. The efficiency of DVS depends on the algorithm used to determine the frequency and voltage best suited to the current workload. Algorithms for DVS have been an active research area, and several solutions have been proposed [16, 17, 18, 19].

Figure 4: Voltage vs. frequency for ARM926EJ-S processor (Vcore in V against frequency in MHz, from 0 to 200 MHz; curves: measured Vcc, Vcc + safety margin)

Figure 5: Energy used vs. frequency for ARM926EJ-S processor (energy use in % against frequency in MHz, from 0 to 200 MHz)

Purely software-based optimization is however needed when system level solutions are not available and/or hardware knowledge is missing. A processor instruction level power analysis model was developed in [20, 21], in which a base cost for each instruction and an overhead cost between adjacent instructions were measured. This model is used to evaluate the energy reduction that can be achieved through software optimization. Such optimizations are mainly based on the fact that the more instructions a processor has to execute, the more energy it consumes. As a result, techniques for low power optimization of software require the same analysis and use the same solutions as execution time optimization. By unrolling loops, i.e. replicating the body of a loop k times and incrementing the loop counter by k, we can reduce loop overhead, remove branch instructions and create more opportunities for instruction scheduling. In a similar way, inlining procedures (also called in-line expansion), i.e. replacing a function call by the body of the function, eliminates the overhead of the function call and improves the possibilities for compiler analysis and optimization. But both optimizations have the disadvantage of increasing code size, and thus one needs to trade off energy reduction against memory size, all the more so as larger memory requirements increase static energy consumption. For some applications, and depending on the architecture, hand optimization might be possible, particularly if the application is small. As shown in [22], using register operands instead of memory operands, thanks to optimal register allocation, can lead to a 40% reduction in energy consumption.
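The loop unrolling transformation described above can be illustrated with a small sketch, here written in Java with k = 4. The class and method names are hypothetical, and whether this transformation actually saves energy on a given platform has to be measured; the sketch only shows the mechanics of replicating the body and handling the leftover iterations.

```java
/** Illustration of manual loop unrolling with k = 4 (hypothetical example). */
final class LoopUnrollingSketch {

    /** Reference version: one addition and one loop test per element. */
    static long sumRolled(int[] data) {
        long sum = 0;
        for (int i = 0; i < data.length; i++) {
            sum += data[i];
        }
        return sum;
    }

    /** Unrolled version: the body is replicated 4 times and the counter advances by 4. */
    static long sumUnrolled(int[] data) {
        long sum = 0;
        int i = 0;
        int limit = data.length - (data.length % 4); // largest multiple of 4 within the array
        for (; i < limit; i += 4) {
            sum += data[i];
            sum += data[i + 1];
            sum += data[i + 2];
            sum += data[i + 3];
        }
        for (; i < data.length; i++) { // remaining 0 to 3 elements
            sum += data[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3, 4, 5, 6, 7};
        System.out.println(sumRolled(data) + " " + sumUnrolled(data));
    }
}
```

The unrolled version executes roughly a quarter of the loop tests and branches of the reference version, at the price of a larger method body, which is the code size trade-off mentioned above.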
In [23] the use of the specific instruction LAB, which loads two registers in one cycle, brings a reduction of up to 50% compared with two MOV instructions. These last two examples show how important it is for the compiler to use all the capabilities of an architecture.

3 An energy consumption model of Java applications

The main goal of the Java language is to allow the development of applications with an abstraction of the target platform, making the concept “write once, run it anywhere” possible. The Java Virtual Machine (JVM) is an abstract machine that forms the interface between platform-independent applications and the hardware, possibly through an operating system. Thus the use of the Java language can be seen as adding one more layer, the Java virtual machine, between the hardware and software layers. As has been done for processor instructions, it would be interesting to see how well estimation and optimization techniques can be applied at the level of virtual machine opcodes.

Figure 6: Simple view of the JVM life cycle (Start JVM, initialization of the VM, load the class containing the main method, interpreter loop, exit)

Figure 6 shows a simple view of the JVM life cycle. An efficient energy model should characterize each stage of the life cycle, and thus show in which stage(s) effort needs to be concentrated to achieve energy optimization. Such a model clearly needs to consider the system's hardware and software configuration and is therefore not directly portable. But the methodology used to build it can easily be applied to different configurations. As shown in [24], memory consumption must also be included in the model, as memory might represent the biggest source of energy consumption in a typical embedded system. This is all the more important as the JVM is a stack machine and will therefore have relatively high memory activity. In addition, it will also be interesting to look at the cost differences between the two possible Java code execution modes: interpretation and just-in-time (JIT) compilation. JIT compilation significantly increases execution speed, but at the same time increases the memory footprint. A trade-off between execution time and memory footprint will certainly have to be found to reach the optimum for energy consumption.

3.1 Measurements methodology

We chose the Sun Microsystems K Virtual Machine (KVM), CLDC v1.0.3, as it has been developed for resource-constrained platforms and its source code is freely available. KVM is a small virtual machine containing about 50-80 KB of object code in its standard configuration, with a total memory footprint of 128-256 KB depending on the application running on it.
KVM can run on 16-bit or 32-bit RISC/CISC processors clocked at 25 MHz or more. To build an energy model of the KVM we modified the energy profiler enprofiler [25] developed by the Embedded Systems Group at Dortmund University. Enprofiler is a processor instruction level energy profiler for ARM7TDMI processor cores operating in Thumb [26] mode, which also accounts for the energy consumption of memory accesses. It was built from physical measurements done on an Atmel AT91EB01 evaluation board consisting of an AT91M40400 processor clocked at 33 MHz, an external 512 Kbyte 16-bit SRAM, an external 128 Kbyte 16-bit Flash memory and several other peripherals. A detailed description of the energy model used by enprofiler is given in [27].

The Thumb instruction set is a subset of the most common 32-bit ARM instructions compressed into 16-bit opcodes, which reduces code size by about 30% on average compared with the corresponding ARM code. The Thumb instruction set is used in systems where cost and energy consumption are important factors, as Thumb code requires less memory and is on average 45% faster than ARM code with 16-bit memory.

Figure 7 shows the measurement methodology used to characterize each stage of the KVM life cycle. The Java class generator generates Java applications from template classes, with the possibility of modifying parameters inside the class source code. Using the Java-to-C translator JCC (Java Code Compact), we compile and link together the JVM source code and the generated Java application. The executable code is run on the ARM7TDMI emulator ARMulator, which traces the instructions, memory accesses and events that occur during the application execution. From this trace we extract all information concerning the memory accesses (their size, type (read, write, sequential, non-sequential) and address) as well as the instruction addresses and their opcodes. The energy profiler reads the emulator trace together with an instruction energy consumption database providing processor instruction costs, and a memory energy consumption database providing the cost of a memory access depending on its address, size and type. The energy profiler estimates the energy consumed by the application and provides information on how the energy is distributed between the processor and the memories for each KVM stage.

The main steps of this scheme are as follows:

Figure 7: General measurements methodology scheme (Java class generator, Java application source code, Java Code Compact (JCC), Java Virtual Machine source code, compiler and linker, executable (application + VM), processor emulator (ARM7TDMI), platform data (memory mapping), energy consumption per instruction, processor trace file (memory, instruction, event, register, bus), energy profiler, energy, battery specification, maximum running time of the application)

3.1.1 KVM enprofiler

The KVM enprofiler analyzes four input files and provides details about the energy consumed by the application. It reports the number of instructions, memory accesses and garbage collections that occur during the execution. From the linker we collect a few useful symbols and their addresses in order to report the energy consumed by the different stages of the KVM. The eight symbols used are:

- main: this symbol represents the main() function of KVM, and is used by the KVM enprofiler to detect the start of the KVM execution.
- StartJVM: this symbol represents the StartJVM(argc, argv) function (in the StartJVM.c source file). This function only checks whether the user gave a class name as argument, and then calls the KVM_Start() function.
- KVM_Start: this symbol represents the KVM_Start() function (in the StartJVM.c source file). This function initializes the VM, the global variables, the profiling variables, the memory system, the hashtable, the class loading interface, the Java system classes, the class file verifier and the event handling system. It also initializes the multithreading system after loading the main application class. Just before calling the interpreter it pushes onto the Java stack the classes JavaLangOutOfMemoryError, JavaLangSystem, JavaLangString, JavaLangThread and JavaLangClass. In this way JavaLangClass is the first class to be initialized.
- garbageCollect: this symbol represents the garbageCollect function (in the garbage.c source file), which performs mark-and-sweep garbage collection.
- ExitGarbage: this symbol has been added to the KVM source code so that the KVM enprofiler can detect the end of the garbage collector.
- Interpret: this symbol represents the Interpret() function (in the execute.c source file), which runs the interpreter loop.
- KVM_Cleanup: this symbol represents the KVM_Cleanup() function (in the StartJVM.c source file). It runs several finalization functions of the system when the VM is shut down.
- ExitVM: this symbol is used by the KVM enprofiler to detect the end of the KVM execution.

The KVM enprofiler needs as inputs:

1. An instruction energy consumption database providing processor instruction costs.
2. A memory energy consumption database providing the energy consumption of a memory access depending on its address, size and type.
3. The processor trace file.
4. A file generated by the linker providing all needed symbols and their addresses.

4 Experiments

We ran the measurement process over a set of small and basic applications to characterize each stage of the KVM life cycle and see whether some stages dominate others.

4.1 First experiment

The first application tested, see figure 8, is an application that does absolutely nothing. We ran it through the KVM enprofiler in order to find out whether constant overheads in the energy consumption can be determined. We can predict that one or several stages, like StartJVM, will have constant energy consumption, as they are application independent. The results are given in figure 9.
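To make concrete how such per-stage results can be obtained from the inputs listed in section 3.1.1, the following is a deliberately simplified sketch of the accounting. It is not the enprofiler implementation: the TraceEntry layout, the class and method names and the rule "a stage starts when the trace reaches its symbol address" are assumptions made for illustration, and nested stages (for example a garbage collection ending at ExitGarbage and returning to the interpreter) are not handled.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Simplified, hypothetical sketch of per-stage energy accounting over a processor trace. */
final class StageEnergySketch {

    /** One trace entry, with its costs already looked up in the two energy databases. */
    static final class TraceEntry {
        final long address;            // address of the executed instruction
        final double instructionCost;  // energy of the instruction itself
        final double memoryCost;       // energy of the memory accesses it caused

        TraceEntry(long address, double instructionCost, double memoryCost) {
            this.address = address;
            this.instructionCost = instructionCost;
            this.memoryCost = memoryCost;
        }
    }

    /** Sums the energy per stage; a new stage begins when a known symbol address is executed. */
    static Map<String, Double> energyPerStage(List<TraceEntry> trace, Map<Long, String> stageSymbols) {
        Map<String, Double> totals = new LinkedHashMap<>();
        String currentStage = "before main";
        for (TraceEntry entry : trace) {
            String entered = stageSymbols.get(entry.address);
            if (entered != null) {
                currentStage = entered;
            }
            totals.merge(currentStage, entry.instructionCost + entry.memoryCost, Double::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        Map<Long, String> symbols = new HashMap<>();
        symbols.put(0x100L, "main");       // made-up addresses
        symbols.put(0x200L, "Interpret");
        List<TraceEntry> trace = Arrays.asList(
                new TraceEntry(0x100L, 1.0, 2.0),
                new TraceEntry(0x104L, 1.0, 0.0),
                new TraceEntry(0x200L, 2.0, 4.0),
                new TraceEntry(0x204L, 2.0, 0.0));
        System.out.println(energyPerStage(trace, symbols));
    }
}
```

Keeping the instruction and memory costs separate per entry is what would allow the results to be split into "Inst" and "Mem" parts for each stage.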

    public class HelloWorld {
        public static void main(String arg[]) {
            // nothing to do
        }
    }

Figure 8: First experiment - The empty application

Figure 9: First experiment - Energy consumption distributions (two pie charts: energy distribution per KVM stage, split into instructions and memory for StartJVM, KVM Start, Interpret, KVM Clean and Garbage; memory/instructions distribution with memory at 75.31% and instructions at 24.69%)

From figure 9 we can already make some remarks. Even though this application does absolutely nothing, the interpreter stage represents about 70% of the energy consumed, and memory accesses are the principal source of energy consumption, at 75% of the total energy consumed. We can also note that the KVM automatically launches one garbage collection just before starting to interpret the class file(s). Thus for each application tested at least one garbage collection will consume energy.

4.2 Second experiment

The second experiment runs a Java class, see figure 10, that instantiates several objects of class MyClass. This class does not contain any fields and has just one main method. Each new MyClass object created by main is stored in the heap and contains only a reference to the class definition area. Each instantiation creates a new stack frame and calls the MyClass constructor. This constructor calls the java/lang/Object constructor and creates a second new stack frame. The stack frame created by the main method has an operand stack of depth two and three local variables, which include the length and the loop index.

    public class MyClass {
        public static void main(String arg[]) {
            int n = X;
            for (int i = 0; i <= n; i++) {
                new MyClass();
            }
        }
    }

Figure 10: Second experiment - Class file source code and loop length effect on energy consumption (plot of the energy consumption in µJ, from 0 to 80000, against the length of the loop, from 0 to 1000, for StartJVM, KVM Start, Interpret, KVM Clean and Garbage, each split into instructions and memory)

From the energy consumption graph in figure 10 we note that only the energy consumed by the interpreter depends on the loop length. All other stages of the KVM consume a constant amount of energy, including the garbage collector, as the maximum number of created objects was not enough to fill up the Java heap and trigger a garbage collection. It is also important to notice that the energy consumed by the interpreter stage grows linearly with the loop length. This can be explained by the fact that the interpreter loops over a constant sequence of Java opcodes, see figure 11. The KVM enprofiler evaluates the cost of a memory access according to the memory technology, i.e. it has for each memory type (RAM, ROM, Flash, etc.) an average cost for each access type regardless of its address. As the new opcode allocates the same amount of memory for every object created, it has an identical cost for each execution.

    ...
     4 goto 18
     7 new #2            -> create a new 'MyClass' object in the heap
    10 dup               -> duplicate the reference of the new object in the operand stack
    11 invokespecial #3  -> call the constructor
    14 pop               -> remove the top of the operand stack
    15 iinc 2 1          -> increment the second local variable by 1
    18 iload_2           -> load the second local variable in the operand stack (i)
    19 iload_1           -> load the first local variable in the operand stack (length)
    20 if_icmple 65543
    ...

Figure 11: Second experiment - Java opcodes corresponding to the for loop

The energy distribution for a loop length of 1000, see figure 12, is similar to that of the first experiment, with an even more dominant interpreter stage representing over 95% of the total energy consumed.
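The linear dependence of the interpreter energy on the loop length can be captured by a simple affine model, E(n) = overhead + cost per iteration × n. The sketch below fits such a model by ordinary least squares; it is an illustration and not part of the measurement tool chain. The class name and the sample values in main are hypothetical and are not measurements from this report.

```java
/** Affine model E(n) = overhead + costPerIteration * n, fitted by least squares (illustration). */
final class InterpreterEnergyModel {
    final double overhead;          // energy consumed for a loop length of zero
    final double costPerIteration;  // additional energy per loop iteration

    InterpreterEnergyModel(double overhead, double costPerIteration) {
        this.overhead = overhead;
        this.costPerIteration = costPerIteration;
    }

    /** Fits the model to (loop length, energy) samples using ordinary least squares. */
    static InterpreterEnergyModel fit(double[] loopLengths, double[] energies) {
        int count = loopLengths.length;
        if (count < 2 || energies.length != count) {
            throw new IllegalArgumentException("need at least two matching samples");
        }
        double meanN = 0.0;
        double meanE = 0.0;
        for (int i = 0; i < count; i++) {
            meanN += loopLengths[i];
            meanE += energies[i];
        }
        meanN /= count;
        meanE /= count;
        double sumNE = 0.0;
        double sumNN = 0.0;
        for (int i = 0; i < count; i++) {
            double dn = loopLengths[i] - meanN;
            sumNE += dn * (energies[i] - meanE);
            sumNN += dn * dn;
        }
        if (sumNN == 0.0) {
            throw new IllegalArgumentException("loop lengths must not all be equal");
        }
        double slope = sumNE / sumNN;
        return new InterpreterEnergyModel(meanE - slope * meanN, slope);
    }

    /** Predicted interpreter energy for a given loop length. */
    double predict(double loopLength) {
        return overhead + costPerIteration * loopLength;
    }

    public static void main(String[] args) {
        double[] n = {0.0, 500.0, 1000.0};      // hypothetical loop lengths
        double[] e = {10.0, 60.0, 110.0};       // hypothetical energies
        InterpreterEnergyModel model = fit(n, e);
        System.out.println(model.overhead + " " + model.costPerIteration + " " + model.predict(2000.0));
    }
}
```

In such a model the intercept plays the role of the interpreter overhead, while the slope reflects the cost of the constant opcode sequence executed in each iteration.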
If we compare the numerical values of the StartJVM, KVM_Start and KVM_Cleanup stages with those obtained during the first experiment, we note that they are identical. This shows that constant overhead costs can be defined for some of the KVM stages, as they behave independently of the Java application running on the KVM.

Figure 12: Second experiment - Energy distribution for a loop length equal to 1000 (two pie charts: energy distribution per KVM stage; memory/instructions distribution with memory at 70.07% and instructions at 29.93%)

4.3 Third experiment

The third experiment runs a Java class which implements a basic arithmetic operation; see figure 13 for the class source code. In figure 14 the distribution of energy between processor instructions and memory accesses is almost identical to the one obtained in the second experiment. This observation suggests that whether an application is dominated by arithmetic operations or by object instantiation does not influence the energy distribution between processor instructions and memory accesses. The distribution of consumed energy between stages is also similar to the preceding one, with the overall interpreter contribution slightly lower at about 85%. This difference is explained by the lower energy consumed to interpret this third Java class: as the other stages have a constant energy consumption, the share of the interpreter stage in the energy distribution decreases. Figure 15 shows the relation between the energy consumed by the application and the loop length. The graph is similar to the previous one, with constant values for all stages except the interpreter. For the same reasons as in the second experiment, the energy consumed by the interpreter is linear in the loop length. But its instruction and memory slopes are about five times lower, as the Java opcodes needed to increment one integer consume less energy than those needed to instantiate one MyClass object.
    public class MyClass2 {
        public static void main(String arg[]) {
            int length = X;
            int i = 0;
            for (int number = 0; number <= length; number++) {
                i++;
            }
        }
    }

Figure 13: Third experiment - Java class source code

Figure 14: Third experiment - Energy distribution for loop length equal to 1000 (two pie charts: energy distribution per KVM stage; memory/instructions distribution with memory at 69.99% and instructions at 30.01%)

Figure 15: Third experiment - Energy consumption vs loop length (plot of the energy consumption in µJ, from 0 to 22500, against the length of the loop, from 0 to 1000, for StartJVM, KVM Start, Interpret, KVM Clean and Garbage, each split into instructions and memory)

4.4 Other experiments

We ran numerous other Java classes through the KVM enprofiler and all observations on their results were similar to those described above. That is, some KVM stages consume a constant amount of energy independently of the Java application running on the KVM. Table 1 shows the three constant stages and their energy consumption due to processor instruction execution and memory accesses respectively. In addition, from the first experiment we can define the interpreter overhead cost, as it did not have to interpret any Java opcode from the main class method.

    Stage          Instructions    Memory
    StartJVM       89.2            210.94
    KVM_Start      748.81          1639.18
    KVM_Cleanup    144.92          326.38
    Interpreter    3552            8273

Table 1: KVM energy consumption overhead in µJ

For all experiments done, the total energy consumption follows the same distribution: about 70% of the energy is consumed by memory accesses and 30% by processor instruction execution. The energy consumption of each KVM stage also follows the same distribution.

The energy consumption of garbage collection is a very interesting topic and would need further exploration of all the variables needed to characterize it. Several factors can influence the garbage collection behavior and thus its energy consumption: the size of the heap, the sizes and numbers of live and dead objects, and heap fragmentation. However, as shown in figure 16, the garbage collection stage will hardly exceed 10% of the total energy consumed by an application running on the KVM. Figure 16 shows the energy distribution of a small Java class which instantiates new objects in a loop. Each object is approximately 2 KB in size. Thus for a 128 KB heap and a loop length of 1000 the garbage collector needs to be triggered 16 times to reclaim the heap space occupied by unreferenced objects created by the loop. Table 2 shows the energy consumed by the interpreter and the garbage collector for a loop length of 1000.
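The arithmetic behind these figures can be checked with a short sketch. It uses the values reported in Tables 1 and 2, and assumes, as a rough simplification, that the number of collections is the total allocated volume divided by the heap size, rounded up, ignoring heap space used by the KVM itself. The class and method names are hypothetical.

```java
/** Worked arithmetic for the garbage collection example (values taken from Tables 1 and 2). */
final class GarbageCollectionShare {

    /** Rough estimate: total allocated volume divided by the heap size, rounded up. */
    static int estimatedCollections(int objects, int objectSizeKb, int heapSizeKb) {
        int totalKb = objects * objectSizeKb;
        return (totalKb + heapSizeKb - 1) / heapSizeKb; // ceiling division
    }

    /** Energy of the three constant stages (instructions + memory), from Table 1. */
    static double constantStages() {
        return 89.2 + 210.94 + 748.81 + 1639.18 + 144.92 + 326.38;
    }

    /** Interpreter energy (instructions + memory) for a loop length of 1000, from Table 2. */
    static double interpreter() {
        return 54035.0 + 127949.0;
    }

    /** Garbage collector energy (instructions + memory) for a loop length of 1000, from Table 2. */
    static double garbageCollector() {
        return 7789.0 + 17057.0;
    }

    /** Share of the garbage collector in the total energy, in percent. */
    static double garbageSharePercent() {
        double total = constantStages() + interpreter() + garbageCollector();
        return 100.0 * garbageCollector() / total;
    }

    public static void main(String[] args) {
        System.out.println("estimated collections: " + estimatedCollections(1000, 2, 128));
        System.out.println("garbage collection share (%): " + garbageSharePercent());
    }
}
```

With these inputs the estimate gives 16 collections, and the garbage collection share comes out at about 11.8%, which matches the sum of the two garbage collection slices of figure 16 (8.12% and 3.71%).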
    public class HelloWorld {
        int[] tab = new int[500];

        public static void main(String arg[]) {
            int length = X; // X stands for the loop length under test
            for (int number = 0; number <= length; number++) {
                new HelloWorld();
            }
        }
    }

[Pie chart "Energy Distribution" over the stages StartJVM, KVM Start, Interpreter, KVM Clean and Garbage collection, each split into Inst. and Mem.; labelled slices: 60.94%, 25.73%, 8.12%, 3.71%.]

Figure 16: Garbage collection weight in energy distribution

    Interpreter Inst.         54 035
    Interpreter Mem.         127 949
    Garbage Collect. Inst.     7 789
    Garbage Collect. Mem.     17 057

Table 2: Energy consumption values for a loop length of 1000 in NPO

From all the experiments it is clear that the interpreter stage is by far the main source of energy consumption, and a better understanding of it is needed to optimize the energy consumption of the KVM. As the interpreter reads and executes the Java bytecode, a closer view of the interpreter requires a finer-grained energy consumption model that looks at the cost of each interpreted Java opcode.

4.5 Java opcode energy cost

To better understand the interpreter's energy consumption, the energy cost of each Java opcode must be evaluated. As a strict class file structure needs to be respected, it is not possible to execute only one Java opcode. Thus the costs of two class files must be compared to estimate the cost difference between them. The general measurement methodology used to characterize each stage of the KVM life cycle can be reused with a different input: instead of Java application source code files, we use appropriate executable bytecode class files as input.

4.5.1 Measurements methodology

Figure 17 shows an abstract view of the class file generator used to create two class files, named ClassFile and ClassFile_Ref, whose energy costs are compared. The behavior of each studied Java opcode towards the Java operand stack and the local variables array has to be defined, i.e. the operand stack state needed before and resulting after the execution of the studied opcode, as well as the number of local variables needed. Figure 18 shows an example of generated bytecode classes for the Java opcode NOP (0x00). In this example, method 1 of ClassFile, the main method, executes 256 NOP opcodes, whereas method 1 of ClassFile_Ref executes only the compulsory return opcode needed to return void from the main method. By comparing the interpreter's energy consumption for both class files we obtain an estimate of the energy consumed by 256 NOP executions, and thus the energy cost of one NOP opcode.
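The comparison can be illustrated with a minimal sketch. It builds the two code arrays of the NOP example (256 NOP opcodes followed by return, versus return alone) and derives the cost of one opcode from the difference between two interpreter energy measurements. The class and method names are hypothetical, the energy values in main are made-up placeholders rather than measurements, and the sketch only covers one-byte opcodes that leave the operand stack untouched, such as NOP.

```java
import java.util.Arrays;

/** Sketch of the differential estimation of one opcode's energy cost. */
final class OpcodeCostEstimator {

    static final int NOP = 0x00;
    static final int RETURN = 0xb1;

    /** Code array of the studied class: `count` copies of the opcode, then return. */
    static byte[] studiedCode(int opcode, int count) {
        byte[] code = new byte[count + 1];
        Arrays.fill(code, 0, count, (byte) opcode);
        code[count] = (byte) RETURN;
        return code;
    }

    /** Code array of the reference class: only the compulsory return. */
    static byte[] referenceCode() {
        return new byte[] {(byte) RETURN};
    }

    /** Cost of one opcode = (studied energy - reference energy) / executions. */
    static double costPerOpcode(double studiedEnergy, double referenceEnergy, int executions) {
        return (studiedEnergy - referenceEnergy) / executions;
    }

    public static void main(String[] args) {
        byte[] studied = studiedCode(NOP, 256);
        byte[] reference = referenceCode();
        System.out.println("code lengths: " + studied.length + " and " + reference.length);

        // Hypothetical interpreter energies for the two class files (not measured values).
        double studiedEnergy = 1512.0;
        double referenceEnergy = 1000.0;
        System.out.println("estimated cost of one opcode: "
                + costPerOpcode(studiedEnergy, referenceEnergy, 256));
    }
}
```

The two code arrays have lengths 257 and 1, which matches the code lengths listed in Figure 18. Opcodes with operands or stack effects would additionally need the stack and local variable setup described above, in both class files.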
To ensure the quality of the estimation, for each opcode we generate several pairs of class files executing the studied opcode, and we also monitor possible energy consumption differences in all the other KVM stages.

[Diagram: the Java class file generator takes as input the opcode (+ argument), the opcode behavior with the operand stack, and the opcode behavior with the local variables array, and produces ClassFile and ClassFile_Ref.]

Figure 17: Bytecode executable class file generator

4.5.2 Results

Of all the Java opcodes, we do not study the 51 opcodes which handle floating point values, as floating point is not supported by the CLDC specification. In addition, the opcodes from 0x99 to 0xc9 (see Table 3 in the annex) cannot be estimated using this comparison method, as they either have a non-constant behavior or have to be executed in a specific context whose cost cannot be separated from the cost of the studied opcode. For example, it is not possible to separate by comparison the cost of the return opcode from the cost of invoking an interface method, as invoking a method implies executing the return opcode exactly once at some point. Also, the cost of interpreting the opcode new depends on several parameters, such as the index of the entry in the constant pool, the size of the new object's image, and whether the entry has already been resolved by the virtual machine.

Figures 19 and 20 in the annex show, for each studied opcode, its energy consumption due to processor instruction execution and memory accesses, as well as the number of processor cycles and processor instructions needed to execute it. On average one Java opcode consumes 0.995 NPO due to processor execution and 2.377 NPO due to memory access, with standard deviations of 0.093 and 0.222 NPO respectively. We can also notice that loading a value from the local variable array onto the operand stack is slightly more expensive than storing the same value back to the local variable array. It is also interesting to note that the opcode bipush consumes about 9% less energy than iload and 5% less than the iload_x opcodes.
Thus it is more energy efficient to load a constant integer lower than 256 onto the operand stack using bipush than to initialize the local variable array with the constant and use iload or iload_x. The same is true if a constant integer lower than 65536 has to be loaded onto the operand stack: it is more efficient to use the opcode sipush instead of iload, but if the integer constant can be stored in one of the first 4 local variables then iload_x becomes the most efficient opcode.

    ClassFile

    Method 1:
    0000d8 0009        access flags = 9
    0000da 0008        name = #8<main>
    0000dc 0009        descriptor = #9<([Ljava/lang/String;)V>
    0000de 0001        1 field/method attributes:
                       field/method attribute 0
    0000e0 0006        name = #6<Code>
    0000e2 00000119    length = 281
    0000e6 0000        max stack: 0
    0000e8 0001        max locals: 1
    0000ea 00000101    code length: 257
    0000ee 00          0 nop
    0000ef 00          1 nop
    0000f0 00          2 nop
    0000f1 00          3 nop
    ..............
    0001ed 00          255 nop
    0001ee b1          256 return
    0001ef 0000        0 exception table entries:

    ClassFile_Ref

    Method 1:
    0000d8 0009        access flags = 9
    0000da 0008        name = #8<main>
    0000dc 0009        descriptor = #9<([Ljava/lang/String;)V>
    0000de 0001        1 field/method attributes:
                       field/method attribute 0
    0000e0 0006        name = #6<Code>
    0000e2 00000019    length = 25
    0000e6 0000        max stack: 0
    0000e8 0001        max locals: 1
    0000ea 00000001    code length: 1
    0000ee b1          0 return
    0000ef 0000        0 exception table entries:

Figure 18: Example of generated byte-code class files

It is also important to compare the obtained values with the NOP energy consumption. As the opcode NOP is the first case statement in the interpreter's switch and does not execute any instruction, its energy consumption represents the minimum overhead cost due to the interpreter mechanism. For the most expensive studied opcode, dup2_x2, the interpreter mechanism overhead represents about 63% of its energy consumption.

5 Conclusion

Several observations have been made in this report concerning the energy consumption of the KVM. For the hardware configuration fixed by the KVM enprofiler, the distribution between the processor and the memories is constant over the KVM execution, with 70% of the energy consumed through memory accesses.
This shows the major importance of the memories for an embedded system's runtime performance. In addition, our analysis shows that the virtual machine interpreter is by far the main source of energy consumption. As the interpreter mechanism overhead is the predominant factor in the opcode execution cost, it will be interesting to look at the cost differences between the two possible Java execution modes: interpretation and JIT compilation.

(26) References [1] G. Cabillic T. Higuera V. Issarny J-P Lesot F. Parain, M. Banâtre. Techniques de réduction de la consommation dans les systèmes embarqués temps-réel. Technical report, INRIA Rennes, 2000. [2] Vivek Tiwari and Sharad Malik and Andrew Wolfe. Power Analysis of Embedded Software. In International Conference on Computer-Aided Design, San Jose CA., nov 1994. [3] A. Seth and R.B. Keskar and R. Venugopal. Algorithms for Energy Optimization Using Processor Instruction. In Cases’01, 2001. [4] R. Venugopal Anil Seth, Ravindra B Keskar. Algorithms for energy optimization using processor instructions. In International conference on Compilers, architecture, and synthesis for embedded systems- Atlanta, Georgia, USA, 2001. [5] Wen-Tsong Shiue. Retargetable Compilation for Low Power. Technical report, Silicon Metrics Corporation. [6] Sharad Malik Mike Tien-Chien Lee, Vivek Tiwari. Power analysis and lowpower scheduling. In International Symposium on System Synthesis, Cannes, France, sep 1995. [7] Vivek Tiwari Mike Tien-Chien Lee. A Memory Allocation Technique for Low-Energy Embedded DSP Software. In IEEE Symposium on Low Power Electronics, San Diego, CA, oct 1996. [8] Catherine H. Gebotys. Low Energy Memory and Register Allocation Using Network Flow. In Design Automation Conference, pages 435–440, jun 1997. [9] X. Fan, C. Ellis, and A. Lebeck. Memory controller policies for DRAM power management. International Symposium on Low Power Electronics and Design (ISLPED), aug 2001. [10] S. Wuytack. Global communication and memory optimizing transformations for low power systems. IEEE Intnl. Workshop on LPD, Napa, CA, 1994. [11] Siemens Press Office - Release N. ICP CD 19908.001. [12] Ronny Tits Luc Claesen, Hans de Kuyper. Low power applications at system level. In Low Power Design in Deep Submicron Electronics, pages 543–564, 1997. 20.

(27) [13] Cabillic Gilbert Higuera Teresa Issarny Valérie Lesot Jean-Philippe Parain Frédéric, Banâtre Michel. Techniques de réduction de la consommation dans les systèmes embarqué temps-réel. Technical report, INRIA Rennes, 2000. [14] Ghavam G. Shahidi Bijan Davari, Robert H. Dennard. Cmos scaling for high performance and low power - the next ten years. In Proceeding of the IEEE, Vol. 83, No4, pages 595–606, 1995. [15] Ravi Ambatipudi Clive Watts. ARM Information Quarterly Magazine, volume 2, number 3, pages 26–29. 2003. [16] Alan Demers Scott Shenker Mark Weiser, Brent Welch. Scheduling for reduced cpu energy. In First Symposium on Operating System Design and Implementation, November 1994. [17] Mani B. Srivastava Inki Hong, Miodrag Potkonjak. On-line scheduling of hard real-time tasks on variable voltage processor. In IEEE/ACM international conference on Computer-aided design, pages 653–656, 1998. [18] Gang Qu Miodrag Potkonjak Mani B. Srivastava Inki Hong, Darko Kirovski. Power optimization of variable voltage core-based systems. IEEE Transaction on computer-aided design of integrated circuits and systems, 18(12), 1999. [19] Tien-fu Chen Jian Liang Kuo. Dynamic voltage leveling scheduling for realtime embedded systems on low-power variable speed processors. CASSES 2002, Grenoble - France. [20] Andrew Wolfe Vivek Tiwari, Sharad Malik. Power analysis of embedded software: A first step towards software power minimization. In IEEE Transactions on VLSI Systems, December 1994. [21] Mike T.-C. Lee V. Tiwari. Power analysis of a 32 bit embedded microcontroller. In Prof. Asia and South Pacific Design Automation Conf., pages 141–148, 1995. [22] Andrew Wolfe Vivek Tiwari, Sharad Malik. Compilation techniques for low energy: An overview. In Symposium on Low Power Electronics, October 1994. [23] Sharad Malik Masahiro Fujita Mike Tien-Chien Lee, Vivek Tiwari. Power analysis and minimization techniques for embedded dsp software. 
In IEEE Transactions on VLSI Systems, Decembre 1996. 21.

(28) [24] Mark C. Johnson Kaushik Roy. Software design for low power. In Low Power Design in Deep Submicron Electronics, pages 433–460, 1997. [25] Enprofiler. http://ls12-www.cs.uni-dortmund.de/research/encc/. [26] An introduction to thumb. Technical report, Advenced RISC Machines Ltd, 1995. [27] Lars Wehmeyer Peter Marwedel Stefan Steinke, Markus Knauer. An accurate fine grain instruction-level enrgy model supporting software optimization. In PATMOS 01, 2001.. 22.

A Annex

    Opcode              Inst. Cost in J   Mem. Cost in J   Nb Cycles   Nb Proc. Inst.
    nop 0x0             0.831440          1.989840         178         45
    aconst_null 0x1     0.890020          2.126940         190         49
    iconst_m1 0x2       0.899160          2.150940         192         50
    iconst_0 0x3        0.890020          2.126940         190         49
    iconst_1 0x4        0.890020          2.126940         190         49
    iconst_2 0x5        0.890020          2.126940         190         49
    iconst_3 0x6        0.890020          2.126940         190         49
    iconst_4 0x7        0.889760          2.126940         190         49
    iconst_5 0x8        0.890020          2.126940         190         49
    lconst_0 0x9        0.922300          2.192040         196         50
    lconst_1 0xa        0.930960          2.216040         198         51
    bipush 0x10         0.926900          2.214420         198         52
    sipush 0x11         0.990360          2.373900         212         58
    iload 0x15          1.013700          2.434380         216         55
    lload 0x16          1.167820          2.815440         248         63
    lload 0x16          1.167820          2.815440         248         63
    aload 0x19          1.013700          2.434380         216         55
    iload_0 0x1a        0.968120          2.322900         206         51
    iload_1 0x1b        0.968120          2.322900         206         51
    iload_2 0x1c        0.968120          2.322900         206         51
    iload_3 0x1d        0.968120          2.322900         206         51
    lload_0 0x1e        1.104800          2.655960         234         57
    lload_1 0x1f        1.104800          2.655960         234         57
    lload_2 0x20        1.104800          2.655960         234         57
    lload_3 0x21        1.104800          2.655960         234         57
    aload_0 0x2a        0.968120          2.322900         206         51
    aload_1 0x2b        0.968120          2.322900         206         51
    aload_2 0x2c        0.968120          2.322900         206         51
    aload_3 0x2d        0.968120          2.322900         206         51
    istore 0x36         1.004140          2.410380         214         54
    lstore 0x37         1.148940          2.767440         244         61

Figure 19: Java opcodes costs - Part 1

    Opcode              Inst. Cost in J   Mem. Cost in J   Nb Cycles   Nb Proc. Inst.
    astore 0x3a         1.004140          2.410380         214         54
    istore_0 0x3b       0.958800          2.298900         204         50
    istore_1 0x3c       0.958800          2.298900         204         50
    istore_2 0x3d       0.958800          2.298900         204         50
    istore_3 0x3e       0.958800          2.298900         204         50
    lstore_0 0x3f       1.086160          2.607960         230         55
    lstore_1 0x40       1.086160          2.607960         230         55
    lstore_2 0x41       1.086160          2.607960         230         55
    lstore_3 0x42       1.086160          2.607960         230         55
    astore_0 0x4b       0.958800          2.298900         204         50
    astore_1 0x4c       0.958800          2.298900         204         50
    astore_2 0x4d       0.958800          2.298900         204         50
    astore_3 0x4e       0.958800          2.298900         204         50
    pop 0x57            0.857440          2.037840         184         47
    pop2 0x58           0.857440          2.037840         184         47
    dup 0x59            0.928740          2.200260         198         50
    dup_x1 0x5a         1.040200          2.451780         220         55
    dup_x2 0x5b         1.119080          2.638200         236         59
    dup2 0x5c           1.026160          2.434680         218         56
    dup2_x1 0x5d        1.169000          2.751300         246         62
    dup2_x2 0x5e        1.321140          3.100140         276         69
    swap 0x5f           0.990280          2.338680         210         52
    iadd 0x60           0.957860          2.273580         204         51
    isub 0x64           0.957360          2.273580         204         51
    imul 0x68           0.959500          2.273580         204         51
    ineg 0x74           0.920080          2.176260         196         49
    ishl 0x78           0.976480          2.321580         208         53
    ishr 0x7a           0.976360          2.321580         208         53
    iushr 0x7c          0.976420          2.321580         208         53
    land 0x7f           1.127820          2.701320         240         63
    lor 0x81            1.128420          2.701320         240         63

Figure 20: Java opcodes costs - Part 2
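The comparisons discussed in section 4.5.2 can be re-derived from the two tables above. The sketch below copies the instruction and memory costs of a few opcodes and computes relative savings on the total cost (instruction plus memory). The class and method names are hypothetical, iload_0 is used as the representative of the iload_x opcodes, and taking the sum of both columns as "the" cost of an opcode is an assumption of this sketch.

```java
import java.util.HashMap;
import java.util.Map;

/** Relative cost comparisons between opcodes, using values from Figures 19 and 20. */
final class OpcodeCostComparison {

    /** Opcode name -> {instruction cost, memory cost}. */
    static final Map<String, double[]> COST = new HashMap<>();

    static {
        COST.put("nop", new double[] {0.831440, 1.989840});
        COST.put("bipush", new double[] {0.926900, 2.214420});
        COST.put("sipush", new double[] {0.990360, 2.373900});
        COST.put("iload", new double[] {1.013700, 2.434380});
        COST.put("iload_0", new double[] {0.968120, 2.322900});
        COST.put("dup2_x2", new double[] {1.321140, 3.100140});
    }

    /** Total cost of an opcode: instruction cost plus memory cost. */
    static double total(String opcode) {
        double[] cost = COST.get(opcode);
        if (cost == null) {
            throw new IllegalArgumentException("unknown opcode: " + opcode);
        }
        return cost[0] + cost[1];
    }

    /** Fraction of energy saved by using `cheaper` instead of `other`. */
    static double saving(String cheaper, String other) {
        return 1.0 - total(cheaper) / total(other);
    }

    public static void main(String[] args) {
        System.out.printf("bipush vs iload:   %.1f%% less%n", 100 * saving("bipush", "iload"));
        System.out.printf("bipush vs iload_0: %.1f%% less%n", 100 * saving("bipush", "iload_0"));
        System.out.printf("sipush vs iload:   %.1f%% less%n", 100 * saving("sipush", "iload"));
        System.out.printf("nop / dup2_x2:     %.1f%%%n", 100 * total("nop") / total("dup2_x2"));
    }
}
```

With these values the ordering from cheapest to most expensive is bipush, iload_0, sipush, iload, and the cost of nop is a little under two thirds of the cost of dup2_x2.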

    0x99 ifeq              0x9a ifne              0x9b iflt              0x9c ifge
    0x9d ifgt              0x9e ifle              0x9f if_icmpeq         0xa0 if_icmpne
    0xa1 if_icmplt         0xa2 if_icmpge         0xa3 if_icmpgt         0xa4 if_icmple
    0xa5 if_acmpeq         0xa6 if_acmpne         0xa7 goto              0xa8 jsr
    0xa9 ret               0xaa tableswitch       0xab lookupswitch      0xac ireturn
    0xad lreturn           0xae freturn           0xaf dreturn           0xb0 areturn
    0xb1 return            0xb2 getstatic         0xb3 putstatic         0xb4 getfield
    0xb5 putfield          0xb6 invokevirtual     0xb7 invokespecial     0xb8 invokestatic
    0xb9 invokeinterface   0xbb new               0xbc newarray          0xbd anewarray
    0xbe arraylength       0xbf athrow            0xc0 checkcast         0xc1 instanceof
    0xc2 monitorenter      0xc3 monitorexit       0xc4 wide              0xc5 multianewarray
    0xc6 ifnull            0xc7 ifnonnull         0xc8 goto_w            0xc9 jsr_w

Table 3: Opcodes from 0x99 to 0xc9

