Academic year: 2021

Share "Characterizing Java Virtual Machine for More Efficient Processor Power Management. Abstract"

Copied!
15
0
0

Loading.... (view fulltext now)

Full text

(1)

Characterizing Java Virtual Machine for More Efficient Processor Power Management

Marcelo S. Quijano, Lide Duan

Department of Electrical and Computer Engineering, University of Texas at San Antonio


Abstract

With CPUs dominating the power consumption of a datacenter, improving energy efficiency is a primary design goal for current processors, servers, and even datacenters. Power management schemes such as clock gating and power gating have therefore been proposed to lower processor power consumption. These schemes use the sleep states implemented in current commercial CPUs, dynamically transitioning the processor into a suitable sleep state based on its workload. In particular, power gating is a vital technique for eliminating leakage power, which now represents a significant portion of processor power dissipation. Although operating in sleep states reduces CPU idle power, transitioning into and out of these states incurs overhead in power consumption and wake-up latency. Existing power management solutions usually focus on the architecture layer, with no notion of higher layers such as the operating system and virtualization. As a result, their power savings may be suboptimal once cross-layer interaction is taken into account. In this paper, we propose investigating the Java Virtual Machine (JVM) to generate power-saving hints at the virtualization layer and provide useful information to the hardware for selecting adequate power limits for the underlying processor. Different components of the JVM already characterize the software they execute for optimization purposes. For instance, the garbage collector keeps track of the lifetime of objects in the JVM, and the Just In Time (JIT) compiler determines how frequently different methods are invoked. Information gathered by these JVM features can be used to construct a framework that guides effective low-level power management. Furthermore, benchmarking software can be used to characterize a given JVM implementation and isolate its most and least power-hungry components and program phases.

Introduction

Popular demand for websites, cloud storage, and databases, among other services, has caused datacenters to grow and expand in recent years. Increasing the number of servers running in a datacenter increases one important factor: power consumption. From 2005 to 2010, the electricity consumed by datacenters worldwide increased by approximately 56% [1]. This increase has driven interest in optimizing datacenter power efficiency. In modern datacenters, the CPU accounts for 42% of total peak power, making it the top power consumer [2]. Therefore, lowering the power consumption of server CPUs will reduce overall datacenter power.
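As a rough first-order illustration, assuming the 42% share [2] applies and the power of every other component stays fixed, a fractional cut in CPU power scales down total peak power by that share:

```latex
\frac{\Delta P_{\text{total}}}{P_{\text{total}}} = 0.42 \cdot \frac{\Delta P_{\text{CPU}}}{P_{\text{CPU}}},
\qquad \text{e.g. } 0.42 \times 20\% = 8.4\%
```

The 20% CPU reduction is a hypothetical figure, and facility overheads such as cooling are ignored in this estimate.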

The motive of this paper is to provide a new approach to making clock gating and power gating more efficient and cost effective. Clock gating stops the clock signal to idle circuits and is used to put the processor into different sleep states depending on its demand. Power gating cuts off the supply to idle cores or units, completely shutting them down whenever doing so reduces power consumption. The goal is to use data gathered by different components of the JVM as clues that help the power management hardware determine the correct power state for the underlying processor.
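The trade-off between sleep depth and transition overhead can be made concrete. The sketch below is a minimal illustration, not the method of this paper: it assumes each C-state is described by a target residency (the idle time needed before entering it pays off) and an exit latency, in the style of Linux cpuidle, and picks the deepest state that fits a predicted idle period. All state values are hypothetical; the helper `break_even_us` uses the break-even time t_be = E_transition / (P_shallow - P_deep).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CState:
    name: str
    target_residency_us: float  # minimum idle time for the state to pay off
    exit_latency_us: float      # time needed to wake back up to C0


# Hypothetical values for illustration only. On Linux, real values are exposed in
# /sys/devices/system/cpu/cpu0/cpuidle/state*/residency and .../latency.
STATES = [
    CState("C1", 2, 2),      # clock gated, shallow
    CState("C3", 100, 50),   # clock gated, deeper
    CState("C6", 400, 130),  # power gated, deep sleep
]


def break_even_us(transition_energy_uj, p_shallow_w, p_deep_w):
    """Idle time beyond which the deeper state saves energy.

    t_be = E_transition / (P_shallow - P_deep); microjoules / watts = microseconds.
    """
    saving_w = p_shallow_w - p_deep_w
    if saving_w <= 0:
        raise ValueError("the deeper state must draw less power")
    return transition_energy_uj / saving_w


def pick_state(predicted_idle_us, max_latency_us=float("inf")):
    """Deepest state whose target residency fits the predicted idle period and whose
    exit latency fits the latency budget; None means stay in C0."""
    best = None
    for state in STATES:  # ordered shallow -> deep
        if (state.target_residency_us <= predicted_idle_us
                and state.exit_latency_us <= max_latency_us):
            best = state
    return best.name if best else None
```

With these placeholder numbers, a predicted idle period of 500 µs selects C6, but the same period with a 100 µs latency budget falls back to C3; a reliable hint that an idle period will be long is what would let hardware choose the power-gated state safely.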

This paper explores the use of Powertop, a tool that provides real-time information about a program's effect on the factors that drive processor power, to gather information about the processor's states while running different benchmarks on different JVMs. First, background on the JVM's architecture, its high-level functionality, and the power consumption of JVMs is provided to help the reader understand the data and conclusions presented throughout the paper; the Background section also suggests how JVMs can be used to reduce processor power. Next, the experimental components are listed and defined, and their roles in the experiment are explained: Powertop, the DaCapo benchmarks, and the Icedtea and Jre8 JVMs. The Results and Analyses section then presents the results provided by Powertop when running a single benchmark in Icedtea, each benchmark individually in Icedtea, and each benchmark individually in both Icedtea and Jre8. Finally, the results are analyzed and conclusions are drawn from them.

Background

The Java Virtual Machine

A JVM is composed of three main sections: the Virtual Machine Runtime Area, the Garbage Collector, and the Just In Time (JIT) compiler. The VM Runtime Area is responsible for the JVM life cycle, exception handling, error handling, class loading, the interpreter, the Java Native Interface, and thread management and synchronization. The Garbage Collector handles memory allocation and reclamation for Java objects. The JIT compiler translates Java bytecode into native code for the underlying platform [3].

Java programs are compiled by a Java compiler into class files consisting of bytecode, i.e. the instruction set of the JVM. During execution, the class loader loads the class files into memory. The execution engine then runs the bytecode, using the JIT compiler to convert it into native machine code. Figure 1 provides a visual representation of a JVM [3] and shows how the runtime area communicates with the class loader and the execution engine.


Figure 1. The Java Virtual Machine Architecture
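The counter-based idea behind JIT compilation can be sketched as a toy model. This is a deliberate simplification under stated assumptions: real JVMs such as HotSpot use invocation and loop back-edge counters with much larger, tunable thresholds and compile in the background, whereas here a method is simply marked compiled once its call count reaches a hypothetical threshold. The per-method counts it keeps are the kind of frequency data this paper proposes to reuse as power hints.

```python
from collections import Counter


class TieredExecutor:
    """Toy model of counter-based JIT compilation: a method is interpreted until its
    invocation count reaches a threshold, after which it is treated as compiled."""

    def __init__(self, threshold=3):
        self.threshold = threshold  # hypothetical; real JVMs use far larger thresholds
        self.calls = Counter()      # method name -> number of invocations
        self.compiled = set()       # methods considered JIT-compiled

    def invoke(self, method):
        """Record one call and report how it would run."""
        self.calls[method] += 1
        if method not in self.compiled and self.calls[method] >= self.threshold:
            self.compiled.add(method)  # a real JVM would queue a compilation here
        return "native" if method in self.compiled else "interpreted"

    def hot_methods(self, n=3):
        """The n most frequently invoked methods, i.e. the frequency profile."""
        return [name for name, _ in self.calls.most_common(n)]
```

For example, calling `invoke("render")` four times with the default threshold yields two interpreted calls followed by two native ones, and `hot_methods(1)` then reports `["render"]`.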

The runtime area in Figure 1 contains five components; for the purposes of this paper, we explain only the heap, since it is the region of memory set aside for JVM use and the area where the garbage collector operates. The heap is divided into two sections: the young generation and the old generation. Short-lived data is stored in the young generation, while data that has survived several garbage collections is stored in the old generation. The JVM also keeps track of how many garbage collections each surviving object has survived. This data can be helpful in determining which objects are long-lived [3].
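The generational aging described above can be modeled in a few lines. This toy sketch assumes a two-generation heap in which each young collection an object survives increments its age, and objects that reach a tenuring threshold are promoted to the old generation. The threshold of 2 is hypothetical (HotSpot exposes the real one as the tunable `-XX:MaxTenuringThreshold`), and real collectors copy objects between survivor spaces rather than updating a dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class GenerationalHeap:
    """Toy two-generation heap: objects age by one on each young collection they
    survive and are promoted to the old generation at the tenuring threshold."""

    tenuring_threshold: int = 2                # hypothetical value
    young: dict = field(default_factory=dict)  # object id -> age (collections survived)
    old: set = field(default_factory=set)      # promoted, long-lived object ids

    def allocate(self, obj):
        self.young[obj] = 0  # new objects start in the young generation

    def minor_gc(self, live):
        """Collect the young generation; `live` is the set of reachable object ids."""
        survivors = {}
        for obj, age in self.young.items():
            if obj not in live:
                continue  # unreachable: its memory is reclaimed
            age += 1
            if age >= self.tenuring_threshold:
                self.old.add(obj)  # survived long enough: promote
            else:
                survivors[obj] = age
        self.young = survivors
```

The ages tracked this way are exactly the object-lifetime information that the garbage collector could expose as a hint.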

Power Consumption of JVMs

Research shows that the peak power consumption of JVMs is usually caused by the application being run rather than by the JVM's own components [4]. For this reason, research on reducing JVM power consumption through optimizations of the individual components has faded, and the focus has shifted toward modulating application peak power.

Using the JVM to Reduce Power

Given a general idea of how a JVM and its components function, it seems feasible to use the information that JVM components gather about the executing program to help the power management hardware decide on an adequate processor power state. For instance, the garbage collector records the lifetime of objects in the JVM's heap, and the JIT compiler profiles the application to determine how frequently its methods are called. Collecting this data may help determine the power state in which the processor should run.
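One hypothetical way to turn JVM observations into a power hint is to predict the next idle period from past ones. The sketch below uses an exponentially weighted moving average, a common generic predictor swapped in here purely for illustration; the `IdlePredictor` name, the smoothing factor, and the idea of feeding it idle gaps observed by the JVM are all assumptions, not an existing JVM API. A power manager could compare the predicted idle time against each sleep state's break-even time.

```python
class IdlePredictor:
    """Exponentially weighted moving average of observed idle periods (microseconds).

    Hypothetical hint source: a JVM could feed it the gaps it observes between bursts
    of application-thread activity. This is an assumption, not an existing JVM API.
    """

    def __init__(self, alpha=0.5):
        if not 0 < alpha <= 1:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha        # weight given to the newest observation
        self.estimate_us = None   # no prediction until the first observation

    def observe(self, idle_us):
        """Fold one observed idle period into the estimate and return the new estimate."""
        if self.estimate_us is None:
            self.estimate_us = float(idle_us)
        else:
            self.estimate_us = self.alpha * idle_us + (1 - self.alpha) * self.estimate_us
        return self.estimate_us
```

A larger `alpha` reacts faster to program phase changes, such as the start of a garbage collection, at the cost of noisier predictions.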


Experimental Setup

This section summarizes what Powertop provides and how it was used in the experiment. It also summarizes the purpose of the DaCapo benchmarks and how they were used in the experiments.

Power and Energy Measurements Using Powertop

Powertop is a Linux-based tool that provides real-time information about programs' influence on a processor's state and activity while they execute. Powertop has seven different modes, two of which were used in our experiments: HTML mode and workload mode. Together, these modes let Powertop run a specified workload and store its results in an HTML file. As depicted in Figure 2, the HTML file consists of eight tabs: Summary, CPU Idle, CPU Frequency, Software Info, Device Info, Tuning, Advanced Host Controller Interface (AHCI), and All.

The Summary tab lists the top power-consuming processes that kept waking up the processor while the benchmark ran. The CPU Idle tab shows the idle states (i.e. the C-states) broken down by CPU, core, and package. The server processor used in this experiment has four C-states: C0, C1, C3, and C6, where C0 is the active state, C6 is the power-gated deep sleep state, and C1 and C3 are the clock-gated idle states between active and deep sleep. The CPU Frequency tab presents the scalable active-state frequencies, also broken down by CPU, core, and package. The processor supports 16 scalable frequencies, ranging from 1200 MHz at the lowest to more than 2.6 GHz; the highest, turbo mode, varies from 2.7 GHz to 3.2 GHz. The Software Info tab shows the same top wakeup sources as the Summary tab but in more detail, including the GPU operations per second, disk I/O operations per second, and graphics wakeups per second caused by each program. The Device Info tab lists the hardware devices that consume the most power. The Tuning tab lists the devices that are not tuned for power management and shows the command to tune each one individually. The AHCI tab is not supported by the server used. Finally, the All tab shows the data from all the tabs in one place [5].


Figure 2. HTML File Created by Powertop
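The HTML-plus-workload setup described above can be scripted. The sketch below assumes Powertop's `--html=FILE` and `--workload=FILE` long options (check `powertop --help` on the target system) and a hypothetical wrapper shell script; it only builds the script text and the command line, leaving execution, which requires root, to the caller.

```python
import shlex


def workload_script_text(argv):
    """Return a /bin/sh script that runs argv; Powertop executes it as the workload."""
    # shlex.join quotes each argument, so paths containing spaces survive.
    return "#!/bin/sh\nexec " + shlex.join(argv) + "\n"


def powertop_command(workload_script, html_report):
    """Argv for one Powertop run in HTML mode with a workload (must run as root)."""
    return ["powertop", f"--html={html_report}", f"--workload={workload_script}"]
```

In use, the script text would be written to a file such as `run.sh` (a hypothetical name), made executable, and passed to `subprocess.run(powertop_command("./run.sh", "report.html"), check=True)` under `sudo`.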

The DaCapo Benchmark Suite

The DaCapo Benchmarks are a set of realistic, open source Java applications. They were created as an effort to produce Java benchmarks that followed two main criteria: diverse real applications and ease of use. These criteria were met by gathering diverse Java programs to increase the coverage of application behavior, using benchmarks that are easy to measure and have minimal dependencies, and provide a range of various input sizes [6].

The benchmarks used for this experiment are batik, fop, h2, luindex, lusearch, sunflow, and xalan. The batik benchmark produces a number of Scalable Vector Graphics (SVG) images. The fop benchmark converts an XSL-FO file to a PDF file. The h2 benchmark is an in-memory database benchmark that executes transactions against a banking-like application. The luindex benchmark indexes a set of documents comprising the works of Shakespeare and the King James Bible. The lusearch benchmark performs a text search of keywords over a corpus composed of the works of Shakespeare and the King James Bible. The sunflow benchmark renders a set of images using ray tracing. The xalan benchmark transforms XML documents into HTML files [6].
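The runs described in this paper (every benchmark, once, single-threaded, large input, on each JVM) can be enumerated programmatically. This sketch assumes the DaCapo 9.12 harness command line `java -jar <jar> -s <size> -t <threads> <benchmark>`; the jar path and JVM paths are hypothetical, and flag support and the available input sizes should be checked against the harness's `--help` output for the version in use.

```python
# Benchmarks used in this paper.
BENCHMARKS = ["batik", "fop", "h2", "luindex", "lusearch", "sunflow", "xalan"]


def dacapo_argv(java_bin, jar, benchmark, size="large", threads=1):
    """Argv for one DaCapo run: -s selects the input size, -t the thread count."""
    if benchmark not in BENCHMARKS:
        raise ValueError(f"unknown benchmark: {benchmark}")
    return [java_bin, "-jar", jar, "-s", size, "-t", str(threads), benchmark]


def experiment_matrix(jvms, jar):
    """One (jvm_name, benchmark, argv) triple per JVM/benchmark pair.

    `jvms` maps a label to a java executable, e.g. (hypothetical paths)
    {"icedtea": "/usr/lib/jvm/icedtea/bin/java", "jre8": "/opt/jre8/bin/java"}.
    """
    return [
        (name, bench, dacapo_argv(java_bin, jar, bench))
        for name, java_bin in jvms.items()
        for bench in BENCHMARKS
    ]
```

Each argv can then be wrapped in a workload script so that Powertop produces one HTML report per JVM/benchmark pair.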

The Java Virtual Machines

The two JVMs used in this experiment are Icedtea and Jre8. Icedtea is an open-source project created to build OpenJDK using free, non-proprietary tools. OpenJDK is an open-source Java implementation released by Sun Microsystems so that it could be used in free GNU/Linux distributions. Jre8 is a proprietary JVM owned by Oracle, which acquired Sun Microsystems in 2010. Since Jre8 is a proprietary JVM, we expected it to be the better JVM in terms of both power and effectiveness.

Results and Analyses

This section presents Powertop's results when running different experiments with different JVMs and different benchmarks. The first section focuses on running one benchmark and shows the useful data that Powertop provides. The second section presents results and conclusions from running each benchmark once using the Icedtea JVM. Finally, the third section compares Powertop's results when running each benchmark in each JVM, Icedtea and Jre8. Similarities, differences, and conclusions are stated based on the results.

Sunflow Benchmark Analyses Using Powertop

To demonstrate what Powertop can provide, Powertop was used to run the sunflow benchmark once, in a single thread, with a large input, and an HTML file was generated from the data gathered while the benchmark executed. The Icedtea JVM was used to run the benchmark. Figures 3, 4, and 5 depict the data taken from the HTML file that Powertop created for this run.

Figure 3. Wakeups/s while Executing Sunflow Benchmark Using Icedtea

The number of processor wakeups per second is an important aspect of power efficiency, and Powertop gathers this information while a benchmark executes. Figure 3 shows that around 225 wakeups per second occurred while the sunflow benchmark was executing. Data presented in later sections shows that sunflow wakes up the processor quite often compared to the other benchmarks executed in the Icedtea JVM.
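A quick calculation shows why the wakeup rate matters: the mean time between wakeups roughly bounds how long the processor can stay asleep without interruption, which limits how often deep states with long break-even times are worth entering. A minimal sketch, treating Powertop's reported rate as if it applied to a single core (a simplifying assumption):

```python
def mean_wakeup_interval_ms(wakeups_per_s):
    """Average time between wakeups in milliseconds; it roughly bounds the average
    length of an uninterrupted sleep period."""
    if wakeups_per_s <= 0:
        raise ValueError("wakeup rate must be positive")
    return 1000.0 / wakeups_per_s
```

For the sunflow run, `mean_wakeup_interval_ms(225)` gives about 4.4 ms between wakeups.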


Figure 4. Powertop Results Regarding Idle States while Executing Sunflow Benchmark Using Icedtea

Figure 4 shows the percentage of time that the processor spends in each idle state. Since the sunflow benchmark uses ray tracing to generate a set of images, it is expected to be computation intensive, and Figure 4 confirms this: while running sunflow, the processor spends over 80 percent of the time in the C0 (active) state and only about 15 percent of the time in the C6 deep sleep state.
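Residency percentages translate into power through a weighted average, P_avg = sum over states s of (residency_s × P_s). The sketch below assumes per-state power values are known; any values passed to it are hypothetical placeholders, since the CPU Idle tab reports residency rather than per-state power.

```python
def average_power_w(residency_pct, state_power_w):
    """Residency-weighted average power.

    residency_pct: state name -> percent of time spent in that state (should sum to ~100).
    state_power_w: state name -> assumed power draw in watts while in that state.
    """
    total = sum(residency_pct.values())
    if abs(total - 100.0) > 1.0:
        raise ValueError(f"residencies should sum to about 100%, got {total}")
    return sum(pct / 100.0 * state_power_w[state] for state, pct in residency_pct.items())
```

Because C0 typically draws far more power than C6, shifting even a few percent of residency from C0 to C6 lowers the average noticeably, which is why the C6 share is the figure of merit in the comparisons that follow.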

Figure 5. Powertop Results Regarding Turbo Mode vs Idle Mode while Executing Sunflow Benchmark Using Icedtea

To compare the time spent in turbo mode and in idle mode, Figure 5 was introduced. Turbo mode represents the processor running at its maximum frequency; these results were gathered from the CPU Frequency tab of the HTML file. The graph shows that the processor was in turbo mode close to 80 percent of the time and in idle mode only about 20 percent of the time.

[Figure 4 chart: percent of time in each idle state (C0, C1, C3, C6) for sunflow]

[Figure 5 chart: Percentage of Time in Turbo Mode vs Idle Mode for sunflow; series: Turbo Mode, Idle]

Figures 6 and 7 show the effect of varying the benchmark's input size. With the smaller input, the processor spends more time in idle states and less time in the active state than with the larger input. Based on the two figures, a larger input keeps the processor active for a larger share of the time and therefore likely makes it consume more power.

Figure 6. Powertop Results Regarding Turbo Mode vs Idle Mode while Executing Sunflow Benchmark Using Icedtea

Figure 7. Powertop Results Regarding Turbo Mode vs Idle Mode while Executing Sunflow Benchmark Using Icedtea

[Figure 6 chart: Small vs Large Input Idle States Comparison; percentage of time in each idle state (C0, C1, C3, C6); series: sunflow large input, sunflow small input]

[Figure 7 chart: Small vs Large Input Active vs Idle State Comparison; percentage of time spent in each state by input size (sunflow large input, sunflow small input); series: Turbo Mode, Idle]

Comparing Benchmarks Using Powertop

Seven benchmarks were profiled with Powertop for this experiment. This section explores the results Powertop provided when each benchmark was executed individually. Each benchmark was executed once, with the largest available input, in a single thread, and after each execution Powertop stored the results in an HTML file from which the data for the graphs was gathered.

Figure 8. Wakeups/s while Executing Each Benchmark Independently Using Icedtea

Figure 8 shows the processor's wakeups per second during the execution of each benchmark. The lusearch, sunflow, and xalan benchmarks wake the processor up most frequently, which is consistent with their being the most computationally intensive of the seven. Figure 8 also shows that the luindex benchmark caused the fewest wakeups. This makes sense, since it only indexes documents, which does not require much effort from the processor.

[Figure 8 chart: Events/s for each benchmark (batik, fop, h2, luindex, lusearch, sunflow, xalan)]

Figure 9. Powertop Results Regarding Idle States while Executing Each Benchmark Independently Using Icedtea

Figure 9 shows that sunflow and xalan spend the largest percentage of time in the C0 active state, that lusearch is the only benchmark that spends some time in the C1 state, and that luindex spends the largest percentage of time in the C6 deep sleep state. That sunflow and xalan spend most of their time in C0 agrees with Figure 8, since, along with lusearch, they caused the most wakeups per second. Generating a set of images with ray tracing and transforming XML documents into HTML files, which these two benchmarks do, are both computation intensive. Although lusearch is one of the benchmarks that woke the processor most often, it is in the active state only slightly more than 50 percent of the time, because a sizable chunk of its time is spent in C1. A text search of keywords over a corpus of data may need the processor constantly, but it is not as computationally intensive as the sunflow and xalan benchmarks. Finally, consistent with Figure 8, luindex spends the most time in deep sleep because it is the least computationally intensive.

[Figure 9 chart: Time Spent in Different Idle States based on Percentage Running Icedtea; percent of time in each idle state (C0, C1, C3, C6); series: batik, fop, h2, luindex, lusearch, sunflow, xalan]

Figure 10. Powertop Results Regarding Turbo Mode vs Idle Mode while Executing Each Benchmark Independently Using Icedtea

Figure 10 presents the percentage of time spent in turbo mode versus idle mode. The conclusions drawn from this graph are the same as those from Figures 8 and 9, for the same reasons: the most computationally intensive benchmarks, sunflow and xalan, spend the most time in turbo mode, and the least intensive benchmark, luindex, spends the greatest percentage of time in idle mode.

Figure 11. Execution Time by Each Benchmark Using Icedtea

Taking execution time into account does not change the picture much: Figure 11 shows that the sunflow and xalan benchmarks again take the longest to execute and the luindex benchmark the shortest.

[Figure 10 chart: Percentage of Time in Turbo Mode vs Idle State for each benchmark (batik, fop, h2, luindex, lusearch, sunflow, xalan); series: Turbo Mode, Idle]

[Figure 11 chart: Execution Time in ms (0 to 30000) for each benchmark (batik, fop, h2, luindex, lusearch, sunflow, xalan)]

Comparing JVMs Using Powertop

This section compares two Java virtual machines (JVMs), Icedtea and Jre8, and their results when running the benchmarks. First, the percentage of time each benchmark spends in the different idle states is compared between the two JVMs; next, the time spent in turbo versus idle mode is compared; finally, the execution time of each benchmark is contrasted between the two JVMs.

Figure 12. Powertop Results Regarding Idle States while Executing Each Benchmark Independently

Figure 12 suggests that Jre8 is more power efficient, since it spends a larger percentage of time in the C6 deep sleep state. The difference is most pronounced for the batik and fop benchmarks; all other benchmarks gave roughly the same results in both JVMs.

[Figure 12 charts: Time Spent in Different Idle States based on Percentage, one panel for Icedtea and one for Jre8; percent of time in each idle state (C0, C1, C3, C6); series: batik, fop, h2, luindex, lusearch, sunflow, xalan]

Figure 13. Powertop Results Regarding Turbo Mode vs Idle Mode while Executing Each Benchmark Independently

The time spent in turbo versus idle mode, shown in Figure 13, points the same way: Jre8 is more efficient. As with the idle states, the batik and fop benchmarks spend a notably greater percentage of time in idle mode when running in Jre8 than when running in Icedtea. The xalan benchmark also spends a significantly greater percentage of time in idle mode in Jre8 than in Icedtea. All other benchmarks showed similar turbo and idle mode results in both JVMs.

Looking at the execution time of each benchmark in each JVM in Figure 13, Jre8 executes most of the benchmarks faster: fop, luindex, sunflow, and xalan were executed slightly faster by Jre8, while Icedtea was faster at executing batik, h2, and lusearch. Note that the only substantial difference in execution time between the JVMs occurs for the batik benchmark, for which Jre8 takes nearly twice as long as Icedtea.

[Figure 13 charts: Percentage of Time in Turbo Mode vs Idle State, one panel Running in Icedtea and one Running in Jre8; series: Turbo Mode, Idle]
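Idle-state residency alone does not settle which JVM uses less energy, because energy is average power multiplied by execution time, E = P_avg × t. For batik, Jre8 spends more time idle but runs nearly twice as long as Icedtea, so its average power would have to be roughly half of Icedtea's just to break even. A minimal sketch of that comparison follows; the power values are hypothetical inputs, since the figures above report residency and execution time, not average power.

```python
def energy_j(avg_power_w, runtime_s):
    """Energy in joules: average power (W) times runtime (s)."""
    return avg_power_w * runtime_s


def relative_energy(p_a_w, t_a_s, p_b_w, t_b_s):
    """Energy of run A divided by energy of run B (below 1 means A uses less)."""
    return energy_j(p_a_w, t_a_s) / energy_j(p_b_w, t_b_s)


def break_even_power_ratio(t_a_s, t_b_s):
    """Largest P_a / P_b at which run A still uses no more energy than run B."""
    return t_b_s / t_a_s
```

For a run that takes twice as long, `break_even_power_ratio(2.0, 1.0)` returns 0.5; an average power only 25% lower would still mean `relative_energy(0.75, 2.0, 1.0, 1.0) == 1.5`, i.e. 50% more energy.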


Conclusions and Future Work

The motivation of this work is to generate power-saving hints at the virtualization layer and provide useful information to the hardware for selecting adequate power limits for the underlying processor. Powertop was presented as a possible JVM power measurement tool. It provided data that led to conclusions about which types of applications are power hungry and showed that Jre8 is more power efficient than the Icedtea JVM.

First, graphics-related programs, such as those that generate images, are power hungry when running in the JVM. Complex transformations from one file type to another can also consume a fair amount of power, as in the xalan benchmark, which converts XML files to HTML files. In contrast, a simpler file conversion from an XSL-FO file to a PDF file, as performed by the fop benchmark, does not consume as much power. A simple program such as indexing a set of documents is also power friendly. The next step, based on these conclusions, is to look into what happens inside the JVM that causes a program to consume more or less power. What is the state of the JVM when an application drives up its power consumption? Which functions keep the processor active the longest?

Second, Jre8 was slightly more efficient than Icedtea in terms of execution time, but the differences were minimal for most benchmarks. Jre8 was also more efficient in terms of power management, where the differences were more notable. The next step will be to analyze why Jre8 is more power efficient than Icedtea. What are the differences in the JVM's internal states when running a power-hungry versus a low-power program?

Additional future work includes analyzing the garbage collector while running a power-hungry application. Data on object lifetimes may be acquired and analyzed to explain why a particular object has a short or long lifetime within the JVM. The JIT compiler is another component that stores potentially useful data; for example, it uses different techniques to determine how frequently each method of the program running in the JVM is invoked.

References

[1] J. Koomey, “Growth in Data Center Electricity Use 2005 to 2010,” a report by Analytics Press, The New York Times, 2011.

[2] L. Barroso, J. Clidaras, and U. Hölzle, “The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines,” Second Edition, Morgan & Claypool Publishers, 2013.

[3] B. Venners, “Inside the Java Virtual Machine,” McGraw-Hill, New York, NY, 1996, Chap. 5.

[4] G. Contreras and M. Martonosi, “Techniques for Real-System Characterization of Java Virtual Machine Energy and Power Behavior,” IEEE, 2006.

[5] K. C. Accardi and A. Yates, “Powertop User’s Guide,” Intel Corporation, 2014.

[6] Blackburn, Garner, Hoffmann, Khan, McKinley, Bentzur, Diwan, Feinberg, Guyer, Hirzel, Hosking, Jump, Lee, Moss, Phansalkar, Stefanović, VanDrunen, von Dincklage, Wiedermann, “The DaCapo …”
