Failures classification and analysis of the
Java Virtual Machine
Domenico Cotroneo1,2, Salvatore Orlando1, Stefano Russo1,2
(1) Dipartimento di Informatica e Sistemistica - Università degli Studi di Napoli Federico II, Via Claudio 21, 80125 - Naples, Italy
(2) Laboratorio ITEM - Consorzio Interuniversitario Nazionale per l’Informatica, Via Diocleziano 328 - 80124 Naples, Italy
{cotroneo, saorland, sterusso}@unina.it
Abstract - This paper presents a failure analysis of the Java Virtual Machine (JVM) in order to provide useful insights into the nature of reported failures and to improve the understanding of the dependability aspects of the Java Virtual Machine. Failure data are extracted, in the form of reports, from publicly available bug databases, where developers and users of Java applications usually submit failures/bugs.
Results presented in this work clearly indicate that much more effort is still needed to improve the dependability of the JVM. In particular, the analysis revealed that i) built-in error detection mechanisms are characterized by a low coverage; ii) it is not possible to claim that the JVM achieves the same levels of dependability across different platforms, since a considerable number of failures depend on the Operating System or Hardware Platform (or both) on which the JVM was running; iii) developers have to face a trade-off between performance and reliability, since Just-in-time compilers and Garbage Collectors are responsible for more than 35% of reported failures; iv) the Java Virtual Machine is particularly susceptible to CPU-bound workloads. Finally, code fragments activating faults in the Java Virtual Machine are injected into Java applications. A monitoring infrastructure is set up to gain insights into the nature and causes of each failure. Preliminary results show that these faults could often be removed by changing the environment of the JVM.
Index Terms - Dependability, Java Virtual Machine, Failure Analysis, Failure Diagnosis
I. Introduction
Thanks to its high level of portability, its programming abstractions, and the improvements and extensions carried out in recent years, the Java platform has gained great popularity and is being adopted in a wider class of application scenarios. While in the past the major limitation of using Java as a development platform was a dramatic performance penalty, it is now possible to state that the benefits of using Java can be achieved at an acceptable performance penalty, as demonstrated by the benchmarks reported in [1]. Recently, we have been witnessing the use of Java in real-world critical applications, such as on-line banking or stock exchange trading. Moreover, Java is starting to be adopted in even more critical scenarios, such as process and remote control systems. For instance, Java has been used to develop the ROver Sequence Editor (ROSE), a component of the Rover Sequencing and Visualization Program (RSVP), used to control the Spirit Robot in the exploration of Mars [2].
Given such scenarios, we claim that the time is ripe for addressing dependability issues of the JVM. Current implementations have no direct support for fault tolerance: Java applications either ignore fault tolerance mechanisms or achieve them through approaches that are outside the scope of the JVM itself. Although in the industrial and academic community there is a growing need to improve the dependability of the Java Virtual Machine, only a few works in the existing literature have focused on this issue.
This work has been partially supported by the Italian Ministry for Education, University and Research (MIUR), within the framework of the FIRB Project “Middleware for advanced services over large-scale, wired-wireless distributed systems” (WEB-MINDS), and by the Regione Campania, within the framework of the “Centro di Competenza Regionale ICT”.
This paper represents a first step toward a dependability study of the Java Virtual Machine, which can be regarded as the “beating heart” of the Java Platform. The steps we aim to pursue are the following:
1. Since the JVM is a very complex system, classifying failures, their characteristics and their occurrences is quite a hard task. Thus, we start by presenting a failure characterization of the Java Virtual Machine based on data gathered from standard Bug Databases, which constitute the only publicly available source of failure data for the JVM. Despite their qualitative nature, these reports should be considered well-founded, since they are submitted and evaluated by skilled people. In order to perform a significant analysis of failure data, careful filtering is required.
2. Analyze the extracted data, thus providing useful insights into i) the nature of the failures of the JVM, ii) the sources of the errors and iii) the relationships between workloads and failures in the JVM.
3. Reproduce the conditions which led to such failures, in order to analyze in detail the behavior of the JVM in faulty conditions, thus allowing us to define several significant profiles to conduct an injection campaign aimed at discovering failure modes of the JVM. The behavior of the virtual machine when the fault is injected is captured using a monitoring infrastructure developed using the Java Platform Profiling Architecture (JPPA) [3].
4. Once results from the previous phases are analyzed, we can define JVM workload profiles in order to conduct a comprehensive field-data measurement campaign.
This work is concerned with the first two of the outlined steps, even though we are able to give some preliminary results about the third. Results presented in this work clearly indicate that much more effort is still needed to improve the dependability of the JVM. In particular, some of our key findings, which are detailed in the paper, are described below:
1. Built-in error detection mechanisms need to be improved, since they were not capable of detecting a considerable share of failures (45.03%).
2. It is not possible to claim that the JVM keeps the same levels of dependability across different platforms, since a large percentage of failures depended on the environment on which the JVM was running.
3. Developers have to face a trade-off between performance and reliability, since Garbage Collectors (which are often optimized for performance) and Just-in-time compilers are responsible, together, for more than 35% of failures.
4. The greatest part of failures (80%) occurred when the JVM was loaded with relevant workloads.
The rest of the paper is organized as follows. Section II gives an overview of the Java Platform and discusses previous relevant work in the field of dependability assessment. Section III briefly introduces an architectural model of the JVM. Sections IV, V and VI deal with the first two steps of the above-mentioned dependability study: the first two describe the analyzed data sources and the criteria used for the analysis, whereas the last discusses the obtained results. Section VII performs a preliminary investigation of the behavior of the Java Virtual Machine when faults are injected. Finally, section VIII concludes the paper by discussing directions of future work.
II. Background and Related Work
The Java platform was designed to be as portable as possible: Java technology components do not care what kind of computer, phone, TV, or Operating System they run on. They can work on any kind of compatible device for which an implementation of the Java Virtual Machine is available. However, there are many differences in terms of available resources and computation capabilities among the various devices supporting the Java Platform: Java-enabled devices range from smart phones to multiprocessor servers. The analysis reported in this paper addresses dependability issues of JVMs fully compliant with the Java Virtual Machine specification [4], usually employed in J2EE (Java 2 Enterprise Edition) and J2SE (Java 2 Standard Edition) applications. Currently, we do not address issues related to Virtual Machines employed in J2ME (Java 2 Micro Edition) applications.
Field failure data collection, measurement-based analysis and fault injection campaigns are the first steps of a process that leads to dependability modeling of computer systems and to the development of mechanisms for detection and recovery. Several studies addressed the characterization of the dependability of computer systems based on field failure data collection. In [5] Iyer and Tang performed a measurement analysis of field failure data for a DEC VAXCluster Multicomputer system, proposing Markovian models both for failures and errors. Many other works deal with the measurement-based dependability analysis of COTS operating systems such as Windows [6], [7], [8], [9] and Linux [10], [11].
Moreover, several studies were carried out on system dependability evaluation through fault injection. Some of them address the Java Platform, such as [12], [13] and [14]. The first proposes a pattern-based Fault Injector, named JACA, which performs fault injections using reflection, whereas the latter proposes another Fault Injector designed for network applications, named FIONA, which injects faults into the DatagramSocket class, using the JVM Tool Interface, included in JPPA.
[...] error information gathered from event logs, automated failure reporting tools or failure reports provided by users or maintenance staff. The work proposed in this paper aims to perform a preliminary analysis of the dependability of the JVM, gathering information from failure reports provided in bug databases. Moreover, the behavior of the JVM following the execution of faulty code, i.e. fragments of Java code which lead to JVM failure (taken from the above-mentioned Bug Databases), is analyzed.
[Figure 1: layered diagram. Java Applications sit on the Java API (JDK) and Vendor-specific Packages, above the JVM ISA. The JVM contains the Execution Unit (JIT Compiler(s), Interpreter, Exception Handling, JNI), the Memory Management Unit (Reference Handling, Garbage Collection, Finalization, Fast Allocation Mechanisms), the System Services Unit (Thread Management, Timers, Class Loader, Management & Debugging) and the OS Virtualization Layer Unit. Below the JVM are the Host System ABI, the User Host System ISA, the Operating System and the Hardware.]
Fig. 1. Architecture of the Java Virtual Machine
III. An overview of the architecture of the JVM
The JVM is a virtual machine belonging to the High Level Language VMs (HLL-VM) category [15]. An HLL-VM is a VM which adds support for cross-platform programming, providing a virtual Instruction Set Architecture (ISA) and virtualizing the Application Binary Interface (ABI) and the ISA exposed by the underlying Operating System and Hardware, thus making applications written for the virtual machine platform-independent.
The virtual ISA of the JVM is a set of instructions called bytecodes; programs written in Java are compiled into bytecodes. The JVM is composed of four main components, depicted in figure 1:
• Execution Unit - It dispatches and executes operations, emulating a CPU. An operation could be a translated bytecode instruction, a compiled bytecode instruction or a native instruction. The Interpreter translates single bytecode instructions into native machine code, whereas the Just-In-Time (JIT) compiler translates entire methods into native code, applying some optimizations. Native instructions, instead, need no translation, since they are not bytecodes but native machine instructions. They are dynamically loaded, linked and executed by the Java Native Interface (JNI). Moreover, the Exception Handler handles exceptions thrown by both Java Applications and the Virtual Machine. Exceptions thrown by applications are termed checked, while exceptions thrown by the VM are termed unchecked and are related to errors originating in the virtual machine.
• OS Virtualization Layer Unit - It provides a platform-independent abstraction of the host system’s ABI. This abstraction layer provides a common gateway for all JVM components to access the host system’s resources.
• Memory Management Unit - It handles both the JVM heap area and the stack area, managing object allocation, reference handling, object finalization and garbage collection. Moreover, Fast Allocation Mechanisms are provided to allocate temporary memory areas for internal VM operations.
• System Services Unit - Components included in this unit offer services to Java Applications. The Thread Management component handles thread creation and termination and implements mechanisms for thread synchronization as specified by the Java Virtual Machine Specification [4] and the Java Language Specification [16]. The Class Loader is in charge of dynamically loading and verifying Java classfiles (which contain bytecodes). The Timers component exposes functionalities to access system timers through the JVM. Finally, the Management and Debugging component includes functionalities for debugging Java applications and for the management of the JVM.
IV. Data Sources and data extraction procedure
Bug databases are a precious source of information related to the reliability and robustness of software systems: software faults are activated when buggy code is executed. Some kinds of bugs, namely Heisenbugs, Mandelbugs and Schroedinbugs, are particularly likely to elude all testing phases, since they usually disappear or alter their characteristics when one attempts to investigate them. Failure data presented in this paper are extracted from the Sun [17] and Jikes [18] bug databases. Other implementations, such as Kaffe and JRockit, had no public bug databases or very poor ones. These bug databases were periodically checked between June 2005 and October 2005.
Among thousands of submissions related to the whole Java Platform, 698 bug submissions related to JVM failures were selected and analyzed. This set was further refined by excluding submissions which met the following criteria:
• The bug has been marked as Fixed.
• The failure reported is related to a version of the JVM still under development or testing. Since our research is aimed at discovering information about failures of operational JVMs, we dropped these submissions (e.g., submissions related to J2SE 6.0).
• The submission is elusive or it does not contain enough information to characterize the failure.
• The failure report is related to a fault or an error in lower levels, such as operating system or hardware. We are interested only in failures originating from software faults in the JVM itself.
• The failure is attributable to errors in upper levels such as applications, middleware or application servers. Even if these reports are submitted as JVM-related bugs, their source is outside the JVM.
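The exclusion criteria above amount to a conjunction of simple predicates over each submission. The following is a minimal sketch of such a filter; the `Submission` type and its field names are hypothetical and do not reflect the actual schema of the Sun or Jikes bug databases, nor the (manual) procedure followed in this study.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the refinement step; all names are illustrative assumptions.
final class SubmissionFilter {

    static final class Submission {
        final String id;
        final boolean markedFixed;      // bug marked as Fixed
        final boolean preReleaseJvm;    // JVM version still under development or testing
        final boolean enoughInfo;       // report allows the failure to be characterized
        final boolean lowerLevelFault;  // fault located in the OS or hardware
        final boolean upperLevelFault;  // fault located in application or middleware

        Submission(String id, boolean markedFixed, boolean preReleaseJvm,
                   boolean enoughInfo, boolean lowerLevelFault, boolean upperLevelFault) {
            this.id = id;
            this.markedFixed = markedFixed;
            this.preReleaseJvm = preReleaseJvm;
            this.enoughInfo = enoughInfo;
            this.lowerLevelFault = lowerLevelFault;
            this.upperLevelFault = upperLevelFault;
        }
    }

    // A submission is retained only if none of the five exclusion criteria applies.
    static boolean keep(Submission s) {
        return !s.markedFixed
                && !s.preReleaseJvm
                && s.enoughInfo
                && !s.lowerLevelFault
                && !s.upperLevelFault;
    }

    static List<Submission> refine(List<Submission> all) {
        List<Submission> kept = new ArrayList<>();
        for (Submission s : all) {
            if (keep(s)) {
                kept.add(s);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Submission> all = Arrays.asList(
                new Submission("A", false, false, true, false, false),  // retained
                new Submission("B", true, false, true, false, false),   // excluded: fixed
                new Submission("C", false, false, false, false, false)); // excluded: too vague
        System.out.println(refine(all).size() + " of " + all.size() + " retained");
    }
}
```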
Among the initially selected submissions, 147 (29 from the Jikes database, 118 from Sun) were retained; only 3 of these submissions were related to Bohrbugs, which can be easily reproduced and located; 191 distinct failures were reported in these submissions. Each submission reports the environment on which the JVM was running, the configuration of the virtual machine (e.g., heap configuration, JIT compiler used) and stack traces. Many failure reports also contained a detailed description of the source of the failure (given by specialists in the evaluation section of the report itself) and information related to the frequency of the failure and its reproducibility.
V. Failure Classification Criteria
In this section we discuss the criteria we adopted to classify the extracted failure reports.
• Failure Manifestation - Reported failures were classified according to their manifestations (i.e., the message printed on the console). Five failure manifestation types were defined:
- VM Error Message - A Java Programming Language exception was thrown and reported to the user.
- OS Error Message - An Operating System level error message such as SIGSEGV was reported to the user.
TABLE I
Categories for the reliance on the environment
Category | Description
Platform-Ind. | The failure occurs independently of the environment
OS-Dep. | The failure occurs only on a specific Operating System
Platform-Dep. | The failure occurs only on a specific Hardware Platform
OS&Platform-Dep. | The failure occurs only on a specific Hardware Platform and Operating System
TABLE II
Workload levels
Level | Description | Examples
CPU BOUND | The Execution Unit is mainly stressed, many compilation tasks are performed and a lot of thread synchronization happens. | Application servers, transactional systems, parallel algorithms
MEMORY BOUND | Almost all available heap space is allocated; collections happen at a high frequency. | Scientific applications, database processing applications
I/O BOUND | Input-output operations on file systems, databases and network connections are mainly executed. | Web servers, transactional systems
COMMON | The application does not impose any particular workload. | Web browsers, e-mail clients, address books
- Hang/Deadlock - The JVM did not crash, but it stopped executing the application (or a part of it).
- Silent Crash - The JVM crashed silently, without printing any error message.
- Computation Error - Results obtained were different from the expected ones.
• Failure Source - By analyzing information attached to failure reports we were able to pinpoint the component(s) of the JVM in which the source of the error was located. According to the architectural view described in section III, the following categories and subcategories were defined:
- Execution Unit - This category is further divided into Shared Runtime, JIT Compilers, Interpreter and JNI subcategories.
- OS Virtualization Layer Unit.
- Memory Management Unit - This category is further divided into Garbage Collector and Reference Handling subcategories.
- System Services Unit - This category is further divided into Thread Management, Class Loader and Monitoring subcategories.
• Severity - A failure is defined as Catastrophic if it leads to the crash of the JVM, or non-Catastrophic if the JVM still runs despite the failure.
• Environment - Failure reports were classified according to their reliance on the environment on which the JVM was running. Four categories (described in table I) were defined.
• VM Activity - Past studies [19] showed that the average failure rate of a system correlates strongly with the average workload on the system. Failure reports were classified according to the workload imposed on the JVM when the failure was reported. Table II shows the qualitative workload levels defined.
• Failure Frequency - Failure reports were classified according to the frequency of failure occurrences. This information was extracted from JVM core dumps or from hints given by submitters. Table III describes the categories defined.
[Figure 2: two bar charts with Catastrophic and Non-Catastrophic series. (a) Percentage of failures per manifestation: OS-Level, VM-Level, Silent Crash, Hang/Deadlock, Comp. error. (b) Percentage of VM-level manifestations per error type: OutOfMemory, StackOverflow, RuntimeException, AssertionFailure, InternalError, Others.]
Fig. 2. (a) Failure manifestations distribution (b) detailed view of VM-level failure manifestations. Computation errors were captured comparing the “Expected Output” against the “Actual Output” in the failure report.
TABLE III
Timing categories
Category | Description
REGULAR | The failure occurs regularly whenever a particular sequence of operations is executed.
STARTUP | The failure occurs at JVM startup.
HOURLY | The failure occurs on average within an hour of JVM startup.
DAILY | The failure occurs on average once a day.
WEEKLY | The failure occurs on average once a week.
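The classification criteria of this section can be summarized as a small data model. The sketch below is only an illustration under our own naming assumptions (all type and field names are hypothetical); it encodes the categories listed above and in tables I, II and III, allows more than one failure source per report, and allows the environment, workload and frequency to be unknown, since not every report carried that information.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical data model of the classification scheme; names are illustrative.
final class FailureClassification {

    enum Manifestation { VM_ERROR_MESSAGE, OS_ERROR_MESSAGE, HANG_DEADLOCK, SILENT_CRASH, COMPUTATION_ERROR }
    enum Source { EXECUTION_UNIT, OS_VIRTUALIZATION_LAYER_UNIT, MEMORY_MANAGEMENT_UNIT, SYSTEM_SERVICES_UNIT }
    enum Severity { CATASTROPHIC, NON_CATASTROPHIC }
    enum Environment { PLATFORM_IND, OS_DEP, PLATFORM_DEP, OS_AND_PLATFORM_DEP }
    enum Workload { CPU_BOUND, MEMORY_BOUND, IO_BOUND, COMMON }
    enum Frequency { REGULAR, STARTUP, HOURLY, DAILY, WEEKLY }

    static final class FailureReport {
        final Manifestation manifestation;
        final Set<Source> sources;      // a failure may involve several components
        final Severity severity;
        final Environment environment;  // null when not deducible from the report
        final Workload workload;        // null when not deducible from the report
        final Frequency frequency;      // null when not deducible from the report

        FailureReport(Manifestation manifestation, Set<Source> sources, Severity severity,
                      Environment environment, Workload workload, Frequency frequency) {
            this.manifestation = manifestation;
            this.sources = EnumSet.copyOf(sources);
            this.severity = severity;
            this.environment = environment;
            this.workload = workload;
            this.frequency = frequency;
        }

        // True when the failure surfaced through the VM's own error reporting.
        boolean reportedByVm() {
            return manifestation == Manifestation.VM_ERROR_MESSAGE;
        }

        boolean involvesSeveralComponents() {
            return sources.size() > 1;
        }
    }

    public static void main(String[] args) {
        // Invented example report, not taken from the bug databases.
        FailureReport r = new FailureReport(
                Manifestation.OS_ERROR_MESSAGE,
                EnumSet.of(Source.EXECUTION_UNIT, Source.MEMORY_MANAGEMENT_UNIT),
                Severity.CATASTROPHIC,
                Environment.OS_DEP,
                Workload.CPU_BOUND,
                null);
        System.out.println(r.reportedByVm() + " " + r.involvesSeveralComponents());
    }
}
```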
VI. Results
In this section we discuss the results obtained from the analysis of the selected failure reports. The first part analyzes failure manifestations and their relationship with the environment on which the JVM runs. The second part highlights the role of internal JVM components in reported failures, whereas the third part sheds some light on the relationships between internal JVM components, failure frequency and workloads imposed on the JVM itself. To this aim, out of the 147 extracted submissions (accounting for 191 failure reports), 108 failure reports (56.54%) were selected for frequency analysis, 114 failure reports (59.69%) for workload analysis and 101 submissions (68.71%) for environment dependency analysis.
A. Failure manifestation analysis
Figure 2-a depicts a bar chart of failure manifestation and severity. The most recurrent manifestation is an OS-level message (45.03%), followed by VM-level messages (32.46%), hangs or deadlocks (11.52%), computation errors (5.76%) and silent crashes (5.24%). We found that the large majority of failures lead to a VM crash (86.06%). Only computation errors are always non-catastrophic. A quarter of hangs/deadlocks are non-catastrophic (the virtual machine is still able to run other tasks), whereas a small part of VM-level manifestations (13.09%) does not lead to a VM crash. Almost half of the failures manifested as OS-level messages (i.e., SIGBUS, SIGSEGV or ACCESS VIOLATION). This means that built-in error detection mechanisms are not able to cover all the activities of the JVM: in many cases the JVM crashes without detecting any abnormal condition.
VM-level manifestations appear when error detection mechanisms pinpoint faulty conditions. In these cases an unchecked exception can be thrown from the virtual machine, thus giving a chance to handle the faulty condition in application code. With respect to VM-level manifestations, figure 2-b depicts a bar chart of the various error messages reported and their severity. Among VM-level manifestations, the most recurrent is OutOfMemoryError (44.07%). InternalError (15.25%), RuntimeException and AssertionFailure (11.86% each), StackOverflow (6.78%) and other exceptions (10.17%) (e.g., NullPointerException) are reported less frequently. Even if applications could handle these conditions through the Java exception handling mechanism, we found that in the majority of cases (with the exception of RuntimeException manifestations) the consequences were catastrophic. This indicates either that the state of the virtual machine had become so corrupted that no recovery action was possible, or that no recovery action was taken in Java applications, since developers did not expect to face such extreme conditions.
[Figure 3: stacked bars. Left bar: UNKNOWN 27.59%, known 72.41%. Right bar: PLATFORM Ind. 39.08% (53.97%), OS Dep. 23.56% (32.54%), PLATFORM Dep. 3.45% (4.76%), OS&PLATFORM Dep. 6.32% (8.73%).]
Fig. 3. Relationships between failures and environment. In the bar reported on the right, the values without parentheses represent the absolute percentages of environment-dependent (or independent) failures, whereas the values in parentheses represent the percentages relative to the number of failures whose OS-dependency is deducible from submissions in bug databases.
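The remark that VM-level manifestations give application code a chance to react can be illustrated with a minimal sketch. The class and method names are hypothetical; the example merely asks for an array that typical JVM configurations cannot allocate and, if the VM raises an OutOfMemoryError, degrades to an empty array instead of terminating. Whether the request fails depends on the JVM and its heap settings, and, as discussed above, recovery of this kind is not always possible in practice.

```java
// Hypothetical example of handling a VM-level manifestation in application code.
final class OutOfMemoryRecovery {

    // Tries to allocate 'length' longs; falls back to an empty array when
    // the VM reports that the allocation cannot be satisfied.
    static long[] tryAllocate(int length) {
        try {
            return new long[length];
        } catch (OutOfMemoryError e) {
            // The VM detected the condition and surfaced it as an unchecked error.
            System.err.println("allocation failed: " + e.getMessage());
            return new long[0];
        }
    }

    public static void main(String[] args) {
        long[] small = tryAllocate(8);
        long[] huge = tryAllocate(Integer.MAX_VALUE); // about 16 GB if it succeeded
        System.out.println(small.length + " / " + huge.length);
    }
}
```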
To gain an understanding of the relationship between failures and the underlying environment, we analyzed the dependency of the reported failures on the Operating System and the Hardware Platform, as depicted in figure 3. In some cases we were not able to distinguish between environment-dependent and independent failures. In the remaining cases (more than 70%), we observed that 53.97% of the failures were platform-independent, i.e., the same application showed the same failure on different operating systems and hardware platforms. Even if only a small percentage of failures (4.76%) depended exclusively on the hardware platform, a more considerable percentage of failures were dependent on the Operating System (32.54%) or on both OS and Hardware (8.73%). These results indicate that there is a substantial dependency on the Operating System. Therefore, it is not possible to claim that Java applications keep the same levels of dependability across different operating systems.
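The relative percentages quoted above follow from the absolute ones reported in figure 3 by renormalizing over the failures whose environment dependency is known (72.41% of the total, i.e. all but the 27.59% marked as unknown). As a worked check, with rounding to two decimals:

```latex
\[
p_{\mathrm{rel}} = \frac{p_{\mathrm{abs}}}{1 - p_{\mathrm{unknown}}}, \qquad 1 - p_{\mathrm{unknown}} = 72.41\%
\]
\[
\frac{39.08}{72.41} \approx 53.97\%, \qquad
\frac{23.56}{72.41} \approx 32.54\%, \qquad
\frac{3.45}{72.41} \approx 4.76\%, \qquad
\frac{6.32}{72.41} \approx 8.73\%
\]
```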
To gain a more detailed view of the relationship between failures and operating systems, we analyzed OS-dependent failures reported on Windows, Linux and Solaris. The results of this analysis are reported in table IV. They show that the dependency on the underlying operating system is more critical on Linux than on Windows and Solaris. However, the fourth column of table IV highlights that there is a large margin of uncertainty, since in many cases it was not possible to distinguish whether the failure is OS-dependent or OS-independent.
TABLE IV
Detailed view of OS-dependent failures
OS | OS-DEP % | OS-IND % | UNKNOWN %
Windows | 27.03% | 38.71% | 32.43%
Linux | 40.00% | 52.46% | 12.00%
Solaris | 19.35% | 43.86% | 40.32%
[Figure 4: pie chart of failure sources among the OS Virtualization Layer, the Execution Unit, the Memory Management Unit and the System Services Unit; extracted slice values: 15.69%, 44.12%, 16.18%, 24.02%.]
Fig. 4. Failure sources
B. Failure sources analysis
By analyzing stack traces and core dumps attached to bug submissions it is possible to provide useful insights into failure sources, pinpointing the JVM components in which errors were located. Often the source of a failure is located in more than one component: 22.47% of the reported failures were due to errors in more than one component.
TABLE V
Unit | Subcomponent | Failures
Execution Unit | Execution | 19.33%
Execution Unit | Optimizing JIT | 15.13%
Execution Unit | JNI | 3.78%
Execution Unit | Base JIT | 4.20%
Execution Unit | Interpreter | 2.10%
Memory Management Unit | GC | 16.81%
Memory Management Unit | Ref Handler | 5.04%
Memory Management Unit | Other Memory-Related | 1.26%
System Services | Thread Management | 10.92%
System Services | Class Loader | 2.52%
System Services | Monitoring | 0.84%
OS Virtualization Layer | | 18.07%
[Figure 5: two bar charts with one series per JVM unit (OS Virtualization Layer, Execution Unit, Memory Management Unit, System Services Unit). (a) Percentage of failures per frequency category: REGULAR, STARTUP, HOURLY, DAILY, WEEKLY. (b) Percentage of failures per workload level: CPU BOUND, I/O BOUND, MEM BOUND, COMMON.]
Fig. 5. Frequency and workload classification of failures with respect to JVM components
The percentage of failures for each component of the JVM is depicted in figure 4. It is clearly visible that the greatest part of failures is due to the Execution Unit. Moreover, looking at the details of the subcomponents described in section III, which are reported in table V, it is clear that:
- The greatest part of failures in the Memory Management Unit (72.73%) is due to the Garbage Collector.
- Runtime support operations and optimized just-in-time compilation tasks account for 77.36% of Execution Unit failures.
- The greatest part of failures in the System Services Unit (76.41%) is due to the Thread Management sub-component.
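The within-unit shares listed above can be recomputed from the subcomponent percentages of table V. The sketch below (class and method names are hypothetical) divides the share of one or more subcomponents by the total of their unit; the results agree with the figures quoted in the text up to small rounding differences, which is expected since table V itself is rounded to two decimals.

```java
import java.util.Locale;

// Hypothetical helper recomputing within-unit shares from table V.
final class SubcomponentShares {

    // Percentage of 'part' within the sum of all subcomponents of one unit.
    static double shareWithin(double part, double... unit) {
        double total = 0.0;
        for (double v : unit) {
            total += v;
        }
        return 100.0 * part / total;
    }

    public static void main(String[] args) {
        // Values (percent of all failures) taken from table V.
        double[] execution = {19.33, 15.13, 3.78, 4.20, 2.10};
        double[] memory = {16.81, 5.04, 1.26};
        double[] services = {10.92, 2.52, 0.84};

        double gc = shareWithin(16.81, memory);
        double runtimeAndOptJit = shareWithin(19.33 + 15.13, execution);
        double threads = shareWithin(10.92, services);

        System.out.println(String.format(Locale.ROOT,
                "GC %.2f%%, runtime+optimizing JIT %.2f%%, thread management %.2f%%",
                gc, runtimeAndOptJit, threads));
    }
}
```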
By analyzing these results it is possible to argue that:
- Runtime support operations, such as method invocation, stack frame allocation and deallocation or exception handling, seem to be the most critical dependability bottleneck in the JVM.
- The optimizing JIT compiler, even if it prominently improves the performance of Java applications, is one of the major sources of failures in the JVM; therefore Java developers have to cope with a trade-off between performance and reliability.
- The Garbage Collector still remains one of the most error-prone components in the JVM. In particular, low-pause or high-throughput garbage collectors seem to be critical for JVM reliability; therefore there is another trade-off between the performance of the collector and its reliability.
- The OS Virtualization Layer also has a deep impact on the dependability of the JVM. In particular, this component is responsible for 15.91% of Solaris failures, 14.29% of Windows failures and 24.21% of Linux failures.
These observations show that the JVM is a complex system characterized by several dependability bottlenecks. In particular, all performance-enabling components of the JVM represent a serious threat to JVM dependability. Furthermore, the interface between the virtual machine and the underlying environment is one of the most critical dependability bottlenecks for the JVM itself.
C. Relationships between failure frequency and workloads
We conclude the analysis with a discussion of the relationship between the frequency of failures, the workloads imposed on the virtual machine and the components of the virtual machine itself. Figure 5-a reports the percentage of errors with respect to JVM components for each frequency category.
"Regular" failures are the most recurrent ones (39.81%). Since regular failures are related to known issues in JVM implementations, Java developers can avoid them by adopting proper workarounds. Regular failures are mainly attributable to the Execution Unit (22.21%) and to the Memory Management Unit (12.96%). "Startup" (11.11%) and "Hourly" (10.19%) failures occur in the first stages of Java program execution. The OS Virtualization Layer and the Execution Unit are the main causes of hourly failures (6.48% and 2.78%, respectively), whereas each component plays an equivalent role in startup failures. Many non-regular failures show a daily or weekly frequency (19.44%).
It is worth noting that Execution Unit and System Services Unit failures increase when frequency decreases, whereas Memory Management Unit and OS Virtualization Layer Unit failures decrease when frequency decreases. This suggests the presence of software aging phenomena in these components (especially in JIT compilers, Shared Runtime Support and Thread Management sub-components). Further investigations are required to gain more details about the dynamics of these phenomena, which, as stated in [20], represent a consistent source of failures in software systems.
TABLE VI
Relationships between failure frequencies and workload levels
Frequency | CPU BOUND | I/O BOUND | MEM BOUND | COMMON
STARTUP | 20.00% | 30.00% | 10.00% | 40.00%
HOURLY | 50.00% | 40.00% | 10.00% | 0.00%
DAILY | 20.00% | 55.00% | 20.00% | 5.00%
WEEKLY | 40.00% | 40.00% | 5.00% | 15.00%
REGULAR | 15.15% | 15.15% | 27.27% | 42.42%
Figure 5-b reports the percentage of errors with respect to JVM components for each workload level defined. The greatest percentage of failures occurred under CPU Bound workloads (32.00%), followed by I/O Bound workloads (26.40%). Fewer failures occurred under Memory Bound workloads (21.60%) or "Common" workloads (20.00%).
It is clear that the greatest part of failures (80%) occurs when significant workloads are imposed on the JVM; moreover, CPU Bound and I/O Bound applications seem to be more critical for the JVM than Memory Bound applications. These results also indicate that CPU Bound and I/O Bound applications, such as Web Servers, mainly stress the Execution Unit (50.98% CPU Bound; 38.46% I/O Bound) and the OS Virtualization Layer (25.49% CPU Bound; 32.69% I/O Bound). On the other hand, the most relevant percentages of failures with non-significant workloads are attributable to errors in the Memory Management Unit (33.03%) and in the System Services Unit (37.04%). Therefore, since the JVM suffers mainly under CPU Bound and I/O Bound applications, it is possible to argue that the development of strategies and mechanisms aimed at increasing the reliability of the virtual machine should first address these kinds of applications.
Table VI shows that Regular and Startup failures usually occur when non-significant workloads are applied, thus confirming that many failures are due to bugs in JVM implementations or to issues in the interface between the VM and the underlying environment. Moreover, non-regular failures occur when significant workloads are applied. For instance, weekly failures usually occur when CPU Bound or I/O Bound workloads are applied.
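The reading of table VI given above can be made explicit by extracting, for each frequency category, the workload level with the largest share. The following sketch uses hypothetical class and method names and hard-codes the values of table VI; when two levels tie (as for WEEKLY, where CPU BOUND and I/O BOUND both reach 40%), it simply returns the first one in column order.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: dominant workload level per frequency category (table VI).
final class WorkloadByFrequency {

    static final String[] WORKLOADS = {"CPU BOUND", "I/O BOUND", "MEM BOUND", "COMMON"};

    // Rows of table VI, in percent, in the column order of WORKLOADS.
    static Map<String, double[]> tableVI() {
        Map<String, double[]> t = new LinkedHashMap<>();
        t.put("STARTUP", new double[] {20.00, 30.00, 10.00, 40.00});
        t.put("HOURLY", new double[] {50.00, 40.00, 10.00, 0.00});
        t.put("DAILY", new double[] {20.00, 55.00, 20.00, 5.00});
        t.put("WEEKLY", new double[] {40.00, 40.00, 5.00, 15.00});
        t.put("REGULAR", new double[] {15.15, 15.15, 27.27, 42.42});
        return t;
    }

    // Returns the workload with the largest share; ties resolve to the first column.
    static String dominant(double[] row) {
        int best = 0;
        for (int i = 1; i < row.length; i++) {
            if (row[i] > row[best]) {
                best = i;
            }
        }
        return WORKLOADS[best];
    }

    public static void main(String[] args) {
        for (Map.Entry<String, double[]> e : tableVI().entrySet()) {
            System.out.println(e.getKey() + " -> " + dominant(e.getValue()));
        }
    }
}
```

Under these values, COMMON is the largest single level for STARTUP and REGULAR failures, while the other frequency categories are dominated by CPU Bound or I/O Bound workloads.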
[Figure 6: block diagram. Labels recoverable from the diagram: Java Applications (Test App, Faulty Class), System Classes, Java Virtual Machine, JVMTI Agent, Monitor MBeans, Local Log (events); flows labelled Events, State information, Querying, JVM Events and State, Collected Data.]
Fig. 6. Fault Injection mechanism and monitoring infrastructure
TABLE VII
Faulty code fragments executed to analyze the behavior of the Virtual Machine
BUG ID | Failure manifestation | Code Description
4396719 | Silent Crash (Linux), ACCESS VIOLATION (Windows) | Iteratively allocates arrays of null objects.
5073365 | NullPointerException | Tries to change the priority of a thread after the thread has exited.
6343401 | Either Silent Crash, SIGSEGV or ACCESS VIOLATION | Executes several times a function copying an array of bytes into another array.
VII. Injecting faults into the JVM
The failure classification reported in the previous section highlights the most error-prone components in the virtual machine. By analyzing the extracted submissions, it is possible to define several fault injection profiles, either executing the code which activates the bug or reproducing the conditions under which the failure manifested. In this section we present an analysis of the behavior of the Java Virtual Machine when faults are injected through the execution of “faulty” code fragments.
The infrastructure used to inject faults and analyze the behavior of the virtual machine is shown in figure 6. The JVM is instrumented using the Java Platform Profiling Architecture [3], namely a JVM Tool Interface (JVMTI) agent and several Java Management Extensions Beans (the Monitor MBeans component depicted in figure 6). The former implements callbacks to handle events raised by the Virtual Machine. These callbacks use the JVMTI API to retrieve information about the Virtual Machine’s state. The latter capture more details about the state of the virtual machine and collect the information sent by the JVMTI agent.
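As an illustration of the kind of state information a monitoring bean can query, the following minimal sketch reads heap, thread and garbage collection counters through the standard `java.lang.management` platform MXBeans. It is not the Monitor MBeans component used in this work: the class name `JvmStateSnapshot` and the output format are assumptions made for the example.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.lang.management.ThreadMXBean;

// Hypothetical helper, not part of the paper's infrastructure.
final class JvmStateSnapshot {

    /** Builds a one-line summary of heap, thread and GC state from the platform MXBeans. */
    static String snapshot() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();

        long collections = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 when the collector does not report it
            if (count > 0) {
                collections += count;
            }
        }
        return "heapUsed=" + heap.getUsed()
                + " heapCommitted=" + heap.getCommitted()
                + " liveThreads=" + threads.getThreadCount()
                + " gcCount=" + collections;
    }

    public static void main(String[] args) {
        System.out.println(snapshot());
    }
}
```

Sampling such a snapshot periodically, next to the events delivered by the agent, gives a coarse view of how the state of the virtual machine evolves before a failure.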
When the faulty code is executed, the monitoring infrastructure collects data about both the evolution of the state of the JVM and the failure caused by that fault. This infrastructure is a part of a more complex monitoring system discussed in [21].
Component | Timestamp | Event | Additional Information
RUNTIME CORE | 20051117141924070 | THREAD START | GC Daemon
RUNTIME | 20051117141924070 | CONTEXT SWITCH | 14 RMI Reaper - 15 GC Daemon
MEMORY | 20051117141924071 | GC START |
MEMORY | 20051117141924119 | GC FINISH |
THREAD MANAGEMENT | 20051117141924119 | MONITOR WAITED | Reference Handler Ljava/lang/ref/Reference$Lock; - NOTIFIED 17538308
THREAD MANAGEMENT | 20051117141924119 | MONITOR WAIT | Reference Handler Ljava/lang/ref/Reference$Lock; - 17538308 0
THREAD MANAGEMENT | 20051117141924119 | MONITOR WAITED | Finalizer Ljava/lang/ref/ReferenceQueue$Lock; - NOTIFIED 24212267
THREAD MANAGEMENT | 20051117141924119 | MONITOR WAIT | Finalizer Ljava/lang/ref/ReferenceQueue$Lock; - 24212267 0
THREAD MANAGEMENT | 20051117141924119 | MONITOR WAIT | GC Daemon Lsun/misc/GC$LatencyLock; - 12455463 60000
RUNTIME | 20051117141924119 | CONTEXT SWITCH | 15 GC Daemon - 12 Thread-2
...
CLASSLOADER | 20051117141924199 | LOAD | Ljava/util/TreeMap$KeyIterator; -
CLASSLOADER | 20051117141924199 | PREPARE | Ljava/util/TreeMap$PrivateEntryIterator; -
CLASSLOADER | 20051117141924199 | PREPARE | Ljava/util/TreeMap$KeyIterator; -
MEMORY | 20051117141927046 | GC START |
MEMORY | 20051117141927048 | GC FINISH |
MEMORY | 20051117141927379 | GC START |
MEMORY | 20051117141927380 | GC FINISH |
MEMORY | 20051117141927550 | GC START |
MEMORY | 20051117141927551 | GC FINISH |
...
MEMORY | 20051117141929066 | GC START |
MEMORY | 20051117141929068 | GC FINISH |
MEMORY | 20051117141929251 | GC START |
MEMORY | 20051117141929252 | GC FINISH |
MEMORY | 20051117141929407 | GC START |
[Figure annotations: Collections in Faulty Conditions; JVM Crash]
Fig. 7. Crash during garbage collection
By analyzing the collected data it is possible to identify the components that caused the failure, along with the error activated in those components. Moreover, in order to discover which fault led to the activated errors, the same faulty code fragment is executed with different configurations of the virtual machine.
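Re-executing a fragment under several virtual machine configurations can be automated with a small launcher. The sketch below is only an assumption about how such a sweep might be organized: the class `ConfigSweep`, the target class name `FaultyFragment` and the chosen option sets are hypothetical, while `-client`, `-server` and `-Xms` are standard HotSpot launcher options (`-client` may be ignored on 64-bit builds).

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical launcher, not the tool used in the paper.
final class ConfigSweep {

    /** Builds the command line of one child JVM: java <vmOptions...> -cp <classPath> <mainClass>. */
    static List<String> buildCommand(List<String> vmOptions, String classPath, String mainClass) {
        List<String> cmd = new ArrayList<>();
        cmd.add(System.getProperty("java.home") + File.separator + "bin" + File.separator + "java");
        cmd.addAll(vmOptions);
        cmd.add("-cp");
        cmd.add(classPath);
        cmd.add(mainClass);
        return cmd;
    }

    /** Runs the child JVM and returns its exit code; a non-zero code signals an abnormal end. */
    static int run(List<String> command) throws IOException, InterruptedException {
        Process child = new ProcessBuilder(command).inheritIO().start();
        return child.waitFor();
    }

    public static void main(String[] args) throws Exception {
        // "FaultyFragment" is a placeholder for the class containing the faulty code.
        String mainClass = args.length > 0 ? args[0] : "FaultyFragment";
        List<List<String>> configurations = Arrays.asList(
                Arrays.asList("-client"),
                Arrays.asList("-server"),
                Arrays.asList("-server", "-Xms64m"),
                Arrays.asList("-server", "-Xms256m"));
        for (List<String> options : configurations) {
            int exitCode = run(buildCommand(options, ".", mainClass));
            System.out.println(options + " -> exit code " + exitCode);
        }
    }
}
```

Running each configuration in a separate process keeps a crash of the JVM under test from taking the launcher down with it.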
Three types of code fragments, summarized in table VII, are analyzed. These are extracted from the Sun Hotspot Bug Database. The first proves that the error in a component could be activated by a fault in another component, whereas the second points out that the thread management sub-component should not be considered very reliable. Finally, the third code fragment confirms the trade-off between performance and reliability regarding the optimizing JIT compiler.
A. Bug 4396719 - Error during garbage collection
This failure is catastrophic: the JVM crashes after a certain number of iterations. Analyzing the event log (depicted in figure 7), it is evident that the crash occurred during garbage collection (the last event logged was GC START, without a matching GC FINISH event). Therefore the failure has been activated by an error in the garbage collector. The first part of figure 7 shows the events logged when a collection is performed in “normal” conditions: a daemon thread (GcDaemon) is activated, the collection is executed (GC START and GC FINISH event pair), and then the heap is freed (by the Finalizer thread). Instead, the second part of figure 7 reports the events logged when the collection is performed in “faulty” conditions: the Garbage Collector is invoked several times consecutively (about 5 times per second) and no object is freed during these collections (the Finalizer thread is never activated).
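The check applied to the log, a GC START with no matching GC FINISH, is easy to mechanize. The following sketch assumes whitespace-separated log lines of the form "MEMORY <timestamp> GC START", as in figure 7; the class `GcLogCheck` and this line format are assumptions made for the example, not the actual log parser of the monitoring infrastructure.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical log check, assuming lines shaped like those in figure 7.
final class GcLogCheck {

    /** Returns the timestamp of a GC START that has no later GC FINISH, or null if all are paired. */
    static String findUnfinishedCollection(List<String> logLines) {
        String openStart = null;
        for (String line : logLines) {
            String[] fields = line.trim().split("\\s+");
            if (fields.length < 4 || !fields[0].equals("MEMORY")) {
                continue; // not a garbage collection event
            }
            String event = fields[2] + " " + fields[3];
            if (event.equals("GC START")) {
                openStart = fields[1];
            } else if (event.equals("GC FINISH")) {
                openStart = null;
            }
        }
        return openStart;
    }

    public static void main(String[] args) {
        // Last events of the log shown in figure 7.
        List<String> tail = Arrays.asList(
                "MEMORY 20051117141929251 GC START",
                "MEMORY 20051117141929252 GC FINISH",
                "MEMORY 20051117141929407 GC START");
        System.out.println(findUnfinishedCollection(tail)); // prints 20051117141929407
    }
}
```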
Although it could seem that this error is activated by a fault in the memory management unit due to low memory conditions, we observed the failure at the same point even with greater heap sizes. Moreover, the error is activated both with the client and the server virtual machine. Nevertheless, when the initial heap size is increased, the error is activated later. This could mean that the fault activated by this code fragment is located in the mechanisms that manage heap resizing (the heap size dynamically grows or shrinks according to application requirements).
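To make the workload shape concrete, the sketch below iteratively allocates arrays whose elements are all null and reports how the committed heap changes. It is not the code attached to bug 4396719, which is not reproduced in this paper: the class `NullArrayAllocation`, the iteration count, the array length and the choice to keep no array reachable are all assumptions, and no crash should be expected on a JVM where the bug has been fixed.

```java
// Hypothetical workload in the spirit of the first fragment of table VII.
final class NullArrayAllocation {

    /** Allocates `iterations` arrays of `length` null references and returns the total slot count. */
    static long allocate(int iterations, int length) {
        long slots = 0;
        for (int i = 0; i < iterations; i++) {
            Object[] block = new Object[length]; // every element is null
            slots += block.length;               // the array becomes garbage after this iteration
        }
        return slots;
    }

    public static void main(String[] args) {
        Runtime runtime = Runtime.getRuntime();
        long committedBefore = runtime.totalMemory();
        long slots = allocate(100000, 4096);
        long committedAfter = runtime.totalMemory();
        System.out.println("allocated slots: " + slots);
        System.out.println("committed heap before: " + committedBefore + " bytes");
        System.out.println("committed heap after:  " + committedAfter + " bytes");
        System.out.println("maximum heap:          " + runtime.maxMemory() + " bytes");
    }
}
```

Starting the program with different values of the standard `-Xms` option changes the initial heap size, which is the parameter varied in the experiment described above.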
B. Bug 5073365 - Error setting thread priority
This failure is non-catastrophic. A NullPointerException is thrown when trying to set the priority of a terminated thread. The error is clearly located in the Thread Management sub-component. The log reported in figure 8 shows that no finalization or garbage collection occurs between the thread termination (THREAD END event) and the exception (EXCEPTION event). Thus the Thread object is still reachable and alive. Moreover, none of the methods of the class Thread were JIT-compiled, thus the failure is not caused by errors during code optimization. The fault that led to this failure is attributable to an erroneous update of the thread’s data structures upon its termination.
Component | Timestamp | Event | Additional Information
THREAD MANAGEMENT | 20051117144259222 | MONITOR WAIT | main LBug5073365; 22293109
RUNTIME CORE | 20051117144259222 | THREAD START | Thread-3
RUNTIME | 20051117144259223 | CONTEXT SWITCH | Thread-2 Thread-3
RUNTIME CORE | 20051117144259725 | THREAD END | Thread-3
THREAD MANAGEMENT | 20051117144259726 | MONITOR WAITED | main LBug5073365; NOTIFIED 22293109
RUNTIME | 20051117144259726 | EXCEPTION | main setPriority(I)V Ljava/lang/Thread; 28 java.lang.NullPointerException
RUNTIME | 20051117144259726 | CONTEXT SWITCH | Thread-3 main Ljava/lang/NullPointerException;
Fig. 8. An exception is thrown after thread termination
Failure details
Timestamp | 20051117153216617
Message | SIGSEGV
Current Thread | CompilerThread1 Java daemon in VM
Other details | opto: 32 Bug6343401.compressMsg([BI)I (153 bytes)
Fig. 9. Failure during optimizing JIT compilation
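The condition that activates this failure, changing the priority of a thread that has already exited, can be probed with a few lines of Java. This is a minimal sketch written for illustration, not the code attached to bug 5073365; the class `PriorityAfterExit` and the "NPE"/"OK" return values are assumptions. On a virtual machine where the bug is fixed the call is expected to complete without an exception, so the sketch reports either outcome.

```java
// Hypothetical probe in the spirit of the second fragment of table VII.
final class PriorityAfterExit {

    /** Returns "NPE" if setPriority on a terminated thread throws NullPointerException, else "OK". */
    static String probe() throws InterruptedException {
        Thread worker = new Thread(new Runnable() {
            public void run() {
                // exits immediately
            }
        });
        worker.start();
        worker.join(); // the worker thread has terminated at this point
        try {
            worker.setPriority(Thread.MAX_PRIORITY);
            return "OK";
        } catch (NullPointerException e) {
            return "NPE";
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("setPriority after exit: " + probe());
    }
}
```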
C. Bug 6343401 - Error in just-in-time compilation
This failure is catastrophic: the JVM crashes each time this code is executed. Analyzing the collected data, it is not possible to find any anomaly in the behavior of the virtual machine. However, analyzing the failure in more detail (summarized in figure 9), it is evident that an error during just-in-time compilation led to a JVM crash. The error is no longer activated when the same code is executed with the “client” VM. This means that the compilation of the faulty method compressMsg (see figure 9) activates a fault in the Optimizing JIT Compiler sub-component, which in turn fails to optimize that method, leading to the crash of the Java Virtual Machine.
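The workload that triggers the compiler is a method copying one byte array into another, called often enough to be selected for optimization. The sketch below shows that shape only: the body of the real compressMsg method is not given in this paper, so `copyBytes`, `exercise` and the class `HotCopy` are hypothetical stand-ins. The standard `CompilationMXBean` is used to report whether a JIT compiler was active during the run.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical workload in the spirit of the third fragment of table VII.
final class HotCopy {

    /** Copies the first `len` bytes of `msg` into a new array and returns how many were copied. */
    static int copyBytes(byte[] msg, int len) {
        byte[] out = new byte[len];
        for (int i = 0; i < len; i++) {
            out[i] = msg[i];
        }
        return out.length;
    }

    /** Calls copyBytes repeatedly so that a JIT compiler is likely to select it for compilation. */
    static long exercise(int calls, int len) {
        byte[] msg = new byte[len];
        long total = 0;
        for (int i = 0; i < calls; i++) {
            total += copyBytes(msg, len);
        }
        return total;
    }

    public static void main(String[] args) {
        long copied = exercise(200000, 153);
        System.out.println("bytes copied: " + copied);

        CompilationMXBean jit = ManagementFactory.getCompilationMXBean(); // null if there is no JIT
        if (jit == null) {
            System.out.println("no JIT compiler available");
        } else if (jit.isCompilationTimeMonitoringSupported()) {
            System.out.println(jit.getName() + ": " + jit.getTotalCompilationTime() + " ms compiling");
        } else {
            System.out.println(jit.getName() + ": compilation time not monitored");
        }
    }
}
```

Running the same class with the standard `-Xint` option disables compilation altogether, which gives one more configuration to compare against the client and server virtual machines.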
VIII. Conclusions and Future Work
This paper presented a failure analysis for the Java Virtual Machine. The results of the analysis indicated how failures are distributed with respect to failure manifestations, host system environment, internal JVM components, frequency and workloads. We showed that there is a non-negligible dependency of JVM reliability on the Operating System on which it runs, and that the Execution Unit is responsible for the greatest percentage of reported failures. Furthermore, even if a considerable number of failures are related to bugs in JVM implementations, there is a close relationship between failures and the workloads imposed on the JVM.
We then investigated the behavior of the virtual machine when faults are injected, which allowed us to gain more insight into its dependability issues.
Starting from the analysis presented in this paper, we are going to perform a fault injection campaign to investigate the behavior of the virtual machine when faults are injected into its components. Once enough knowledge about JVM failure modes is acquired, we will be able to conduct a comprehensive field-data measurement campaign aimed at performing a dependability assessment of the various implementations of the Java Virtual Machine.
References
[1] J.M. Bull, L.A. Smith, L. Pottage, and R. Freeman. Benchmarking Java against C and Fortran for Scientific Applications. Proceedings of the joint ACM-ISCOPE conference on Java Grande, 2001.
[2] Frank Hartman and Scott Maxwell. Driving the Mars Rover. Linux Journal, (125):68–70, September 2004.
[3] Java Community Process (JCP). JSR-163: Java Platform Profiling Architecture (JPPA), 2004.
[4] T. Lindholm and F. Yellin. The Java(TM) Virtual Machine Specification. Sun Microsystems, 2nd edition, 1999.
[5] D. Tang and R.K. Iyer. Dependability measurement and modeling of a multicomputer system. IEEE Transactions on Computers, Volume 42(Issue 1):Pages 62–75, January 1993.
[6] A. Kalakech, K. Kanoun, Y. Crouzet, and J. Arlat. Benchmarking the dependability of Windows NT4, 2000 and XP. Proceedings of the 2004 International Conference on Dependable System and Networks (DSN04), June 2004.
[7] R.K. Iyer, Z. Kalbarczyk, and M. Kalyanakrishnam. Measurement-based analysis of networked system availability. Performance Evaluation Origins and Directions, Ed. G. Haring, Ch. Lindemann, M. Reiser.
[8] R.K. Iyer, Z. Kalbarczyk, and J. Xu. Networked Windows NT system field data analysis. 1999 Pacific Rim International Symposium on Dependable Computing (PRDC99), December 1999.
[9] C. Simanche, M. Kaaniche, and A. Saidane. Event log based dependability analysis of Windows NT and 2K systems. 2002 Pacific Rim International Symposium on Dependable Computing (PRDC02), December 2002.
[10] W. Gu, R.K. Iyer, Z. Kalbarczyk, and Z. Yang. Characterization of Linux kernel behavior under errors. 2003 International Conference on Dependable System and Networks (DSN03), June 2003.
[11] C. Simanche and M. Kaaniche. Measurement-based availability analysis of Unix systems in a distributed environment. 12th International Symposium on Software Reliability Engineering (ISSRE01), November 2001.
[12] E. Martins, M.F. Rubira, and N.G.M. Leme. A reflective fault injection tool based on patterns. Proceedings of the International Conference on Dependable Systems and Networks (DSN ’02), June 2002.
[13] R.L.O. Morales, E. Martins, and N.V. Mendes. Fault injection approach based on dependence analysis. Proceedings of the 29th Annual Computer Software and Applications Conference (COMPSAC ’05), 2005.
[14] G. Jacques-Silva, R.J. Debres, J. Gerchmann, and T. Silva Weber. Fiona: A fault injector for dependability evaluation of Java-based networks applications. Proceedings of the 3rd IEEE International Symposium on Network Computing and Applications (NCA ’04), 2004.
[15] J.E. Smith and R. Nair. The architecture of virtual machines. IEEE Computer, Volume 38(Issue 5):Pages 32–38, May 2005.
[16] J. Gosling, B. Joy, G. Steele, and G. Bracha. The Java Language Specification. Sun Microsystems, 3rd edition, 2005.
[17] Hotspot Java virtual machine. http://java.sun.com/products/hotspot/.
[18] Jikes Research Virtual Machine. http://jikesrvm.sourceforge.net.
[19] R.K. Iyer, S.E. Burtner, and E.J. McCluskey. A statistical failure/load relationship: results of a multicomputer study. IEEE Transactions on Computers, Volume C-31:Pages 697–705, July 1982.
[20] K.S. Trivedi, K. Vaidyanathan, and K. Goseva-Popstojanova. Modeling and analysis of software aging and rejuvenation. Proceeding of the 33rd annual Symposium on Simulation, Washington D.C., 2000.
[21] Salvatore Orlando. Dependability analysis of the Java Virtual Machine. Proceedings of the 2005 International Conference on Dependable Systems and Networks (DSN 05), Supplemental Volume, June 2005.