

4.1.3 Components of a JVM

According to Meyer and Downing (1997), a JVM consists of several subcomponents:

• The class management subcomponent provides the functionality to load classes from external sources, such as jar files.

• The class verifier performs validity checks.

• The execution subcomponent manages the interpretation and execution of the bytecode at runtime; this does not involve the compiler, which is itself a JAVA program.

• Further subcomponents handle threading and metadata.

• The platform-dependent abstraction classes hide details of internal runtime and communication structures.

• The task of the boot class loader is to load the system libraries into the virtual machine.

The execution environment

The functionality of a JAVA Runtime Environment can be divided into several zones that JAVA programs touch on their path of execution: the JVM core, the system libraries, the native interface layer, and the boot class loader and verifier.

First presented is the core JVM, the location where, as shown above, bytecode instructions are interpreted and where thread and memory management are performed. JAVA is a portable language, which means that the platform-independent functionality of the JRE is itself implemented in JAVA and located in the system libraries (such as rt.jar in the JRE from Sun). Platform-dependent code, such as code for wrapping native printing, GUI integration, or platform-specific file system handling, is typically integrated into the JRE as native methods.

The task of the bytecode verifier is to ensure that only classes compliant with the JVM specification are loaded into the virtual machine. Coglio (2003) presented a detailed analysis of the bytecode verification process.

Classes

One of the core concepts of the JAVA language is portability. Therefore the class files do not differ in format from platform to platform. To support evolution of the language, there are different versions of class files corresponding to the versions of the JAVA language. Newer versions may integrate new concepts of the JAVA language, such as inner classes, while remaining backward compatible. This means that a class file of version 47.0 (JAVA 1.3.1) still runs in a virtual machine for version 48.0 (JAVA 1.4.1), but not vice versa.

Class Loading

A JAVA application is composed of several components and implementation classes. Each class and interface is distributed in its own class

TheJAVAVirtual Machine 119

file; inner classes are likewise distributed in their own class files. Class files contain not only the bytecode but also the metadata needed to describe constants, fields, methods, exceptions, and dependencies on other classes and interfaces. This is important for the linking step during class loading by the JVM, as linking is accomplished dynamically by resolving the symbolic names to concrete classes of the application.

When an application starts up, a recursive class-loading algorithm is initiated in order to load all necessary classes. Recursion arises when a class A is loaded that depends on (uses, extends, or implements) another class B; B then has to be loaded before A. Class loading begins with the physical transfer of the bytes from external storage into the VM. In the next step the class A is analyzed and the root class files (interfaces and base classes) of A are loaded. This step may include recursion.

After the class is loaded by resolving and loading the dependencies, the bytecode verification process is performed. After successful verification, the class in memory is initialized by calling the static initializer (the <clinit> method). This method is responsible for setting the static variables of a class to their initial values. An attempt to load a class designed for a newer JVM is rejected by throwing a java.lang.UnsupportedClassVersionError.
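The initialization order described above can be observed with a small sketch. The class names (InitOrderDemo, Base, Derived) are hypothetical and only serve as an illustration; the sketch assumes nothing beyond the standard rule that a superclass is initialized before its subclass.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo classes; the names are illustrative only.
final class InitOrderDemo {
    // Records the order in which the static initializers (<clinit>) run.
    static final List<String> LOG = new ArrayList<>();

    static class Base {
        static { LOG.add("Base.<clinit>"); }
    }

    static class Derived extends Base {
        static int counter = 42;               // assigned inside Derived.<clinit>
        static { LOG.add("Derived.<clinit>"); }
    }

    static List<String> run() {
        // The first active use of Derived initializes Base first, then Derived.
        int value = Derived.counter;
        LOG.add("counter=" + value);
        return LOG;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Because counter is not a compile-time constant, reading it counts as an active use of Derived and triggers its static initializer.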

Class Files

JAVA class files are the platform-independent container format for executable JAVA code. They are typically generated by the JAVA compiler, which is part of the JDK. Other class file generators are also available; they translate other languages, such as Jython or Groovy (Codehaus Foundation, 2006), into the standardized class file format.

[Figure: JAVA source code (public class Cat with a method bite(int times)) and a bytecode assembler listing (.class public Dog with a method bite that uses invokestatic seekVictim) are both translated into class files (CA FE BA BE 00 03 00 2D ...). The class loader hands the class files to the bytecode verifier (pass 1 to pass 4) before they reach the JVM.]

Figure 4.1: Class files and the bytecode verifier

Class File Format

Besides executable code blocks, JAVA class files also embed metadata that supports both verification and execution of compiled programs for the JAVA virtual machine. A JAVA class may define methods and fields, so a JAVA class file contains a method table and a field table.

Class files are physical representations of JAVA class and interface definitions. They consist of a sequence of bytes conforming to the bytecode format of the JAVA virtual machine specification.

Every valid class file starts with a header that begins with a magic constant (0xCAFEBABE). The version of the target JVM for which the program was compiled follows; in this case the major version is 0x0031, which yields version 49.0. This is the value used for JAVA version 1.5, in contrast to JAVA version 1.4, which uses 48.0 as its identifier. A class file is structured as shown in Figure 4.2. It uses three basic big-endian (hi/lo) number types (u1 = 8-bit value, u2 = 16-bit value, and u4 = 32-bit value) and a complex table type.
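Reading the header can be sketched in a few lines of JAVA. The class name ClassFileHeader and its field names are hypothetical; the sketch only assumes the layout described above (u4 magic, u2 minor version, u2 major version, all big-endian).

```java
import java.nio.ByteBuffer;

// Hypothetical helper; not part of any JDK API.
final class ClassFileHeader {
    final int minorVersion;
    final int majorVersion;

    private ClassFileHeader(int minorVersion, int majorVersion) {
        this.minorVersion = minorVersion;
        this.majorVersion = majorVersion;
    }

    static ClassFileHeader parse(byte[] classBytes) {
        ByteBuffer in = ByteBuffer.wrap(classBytes);   // big-endian by default
        if (in.remaining() < 8) {
            throw new IllegalArgumentException("truncated class file header");
        }
        int magic = in.getInt();                       // u4 magic constant
        if (magic != 0xCAFEBABE) {
            throw new IllegalArgumentException("not a class file");
        }
        int minor = in.getShort() & 0xFFFF;            // u2, read as unsigned
        int major = in.getShort() & 0xFFFF;            // u2, read as unsigned
        return new ClassFileHeader(minor, major);
    }
}
```

For the byte sequence CA FE BA BE 00 00 00 31, parse reports major version 49 and minor version 0.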

The constant pool follows the header. The constant pool is a list that is preceded by a u2-sized count item (holding the number of constant pool index slots plus one).


The body of the list consists of the constant pool entries themselves, which occupy one index slot fewer than the value of the count item. The constant pool is followed by metadata about the class itself. The metadata specifies the class identity, the inheritance relations, and the access modifier flags of the class. Classes can implement several interfaces, which are referred to by the entries in the interface table. In order to store state in objects, classes need fields. A field table consists of a length counter and the field_info entries. The behavior of classes is specified by methods written in the JAVA bytecode language. A method table with a length counter follows the specification of the fields. The class file is completed by a set of optional and non-optional attributes such as debugging information (Venners, 1999).

Every interface the class implements is described by a constant pool entry pointing to the name of the interface. Fields and methods both have an extended metadata area, which consists of a name, access flags, and the signature; their attributes (see below) store exceptions and bytecode.

Constants such as arbitrary strings or technical names such as method names are stored in the constant pool. This allows a constant to be stored only once, even if it is used multiple times, and to be addressed by a unique reference number. This keeps class files as small as possible and reduces loading time in bandwidth-limited scenarios.

A method in the method table is described by its visibility settings and its code block. The JAVA language specification does not support self-modifying code, so constructs to directly access a code block do not exist. From a code design perspective, this rules out the self-modification patterns often used in x86 code, as well as branching into code placed between text blocks; both are commonly used as a precaution against reverse engineering.

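Code stored in class files can be manipulated by a class loader prior to execution. Because the defineClass methods of java.lang.ClassLoader are final, a custom class loader typically overrides findClass (or loadClass), applies specific transformations with methods from a bytecode engineering library, and then passes the transformed bytes to defineClass.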
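A minimal sketch of such a transforming class loader follows. The class name TransformingClassLoader, the in-memory map of class bytes, and the transformer hook are assumptions made for illustration; a real transformer would call into a bytecode engineering library, which is left out here.

```java
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical loader; the byte source and the transformer are illustrative assumptions.
final class TransformingClassLoader extends ClassLoader {
    private final Map<String, byte[]> classBytes;     // binary class name -> raw class file
    private final UnaryOperator<byte[]> transformer;  // stands in for a bytecode library call

    TransformingClassLoader(ClassLoader parent,
                            Map<String, byte[]> classBytes,
                            UnaryOperator<byte[]> transformer) {
        super(parent);
        this.classBytes = classBytes;
        this.transformer = transformer;
    }

    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        byte[] original = classBytes.get(name);
        if (original == null) {
            throw new ClassNotFoundException(name);
        }
        // Manipulate the class file before the JVM verifies and links it.
        byte[] transformed = transformer.apply(original);
        return defineClass(name, transformed, 0, transformed.length);
    }
}
```

Classes known to the parent loader, such as java.lang.String, are still resolved by delegation; findClass is only consulted for names the parent cannot load.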

Class {
    u4 magic_value; // always 0xCAFEBABE
    u2 minor_version;
    u2 major_version;
    u2 constant_pool_element_count;
    constpool_info constants[constant_pool_element_count-1];
    u2 access_flags;
    u2 this_class;
    u2 super_class; // extends
    u2 interface_count; // implements
    u2 interfaces[interface_count];
    u2 field_count;
    field_info fields[field_count];
    u2 method_count;
    method_info methods[method_count];
    u2 attribute_count;
    attribute_info attributes[attribute_count];
}


Constant-Pool entries

Constant pool entries are typed: every entry holds a value and a one-byte type indicator. The constant pool defines the symbolic names of classes (CONSTANT_Class type), fields (CONSTANT_Fieldref), methods (CONSTANT_Methodref), and interface methods (CONSTANT_InterfaceMethodref), which are used to perform the dynamic linking during class loading. It also holds the strings and numeric constants used in the bytecode. The constant pool is implemented as an indexed table holding size − 1 entries, because index 0 is unused. CONSTANT_Long and CONSTANT_Double entries occupy two locations in the constant pool due to their length of 64 bits. All constant pool entries except CONSTANT_Utf8 string entries have a fixed value size.
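The indexing rules can be made concrete with a sketch that walks a constant pool without interpreting the values. The class name ConstantPoolWalker is hypothetical; the tag numbers and entry sizes follow the JVM specification, and the buffer is assumed to be positioned at the u2 count item directly after the header.

```java
import java.nio.ByteBuffer;

// Hypothetical helper; not part of any JDK API.
final class ConstantPoolWalker {

    /** Walks the pool and returns the number of physical entries it contains. */
    static int countEntries(ByteBuffer in) {
        int count = in.getShort() & 0xFFFF;      // u2: highest valid index + 1
        int entries = 0;
        // Index 0 is unused, so valid indices run from 1 to count - 1.
        for (int index = 1; index < count; index++) {
            int tag = in.get() & 0xFF;           // u1 type indicator
            switch (tag) {
                case 1:                          // CONSTANT_Utf8: u2 length, then the bytes
                    skip(in, in.getShort() & 0xFFFF);
                    break;
                case 3: case 4:                  // Integer, Float
                case 9: case 10: case 11:        // Fieldref, Methodref, InterfaceMethodref
                case 12: case 17: case 18:       // NameAndType, Dynamic, InvokeDynamic
                    skip(in, 4);
                    break;
                case 5: case 6:                  // Long, Double: 64 bits, two index slots
                    skip(in, 8);
                    index++;
                    break;
                case 7: case 8: case 16:         // Class, String, MethodType
                case 19: case 20:                // Module, Package
                    skip(in, 2);
                    break;
                case 15:                         // MethodHandle
                    skip(in, 3);
                    break;
                default:
                    throw new IllegalArgumentException("unknown constant pool tag " + tag);
            }
            entries++;
        }
        return entries;
    }

    private static void skip(ByteBuffer in, int byteCount) {
        in.position(in.position() + byteCount);
    }
}
```

Only CONSTANT_Utf8 needs its length read from the entry itself; every other entry type can be skipped by a size known from the tag alone.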

Attributes

A typical attribute is described as in Figure 4.3.

Attribute {
    u2 attr_name_idx;
    u4 attr_length;
    u1 info[attr_length];
}

Figure 4.3: Attribute description

The value attr_name_idx specifies the name of the attribute (such as Exceptions); the other defining elements are the length of the attribute and a data buffer holding its content. Attributes are described by their name and hold complex non-optional information such as the bytecode, exceptions, inner classes, and start-up values for static fields, as well as optional information such as metadata useful for debugging. Annotations made by the compiler, such as deprecation markers or synthetic accessor methods for inner classes, are also stored in attributes. Attributes are identified by a key string name, such as Code, InnerClasses, LocalVariableTable, and others. New attribute types can therefore be added to the class file format without changing its physical representation.
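Because every attribute carries its own length, a reader can collect or skip attributes without understanding their content. The following sketch illustrates this; the names AttributeReader and RawAttribute are hypothetical, and the length is read as a four-byte value, which is how the JVM specification stores it.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not part of any JDK API.
final class AttributeReader {

    static final class RawAttribute {
        final int nameIndex;   // constant pool index of the attribute name
        final byte[] info;     // uninterpreted attribute payload

        RawAttribute(int nameIndex, byte[] info) {
            this.nameIndex = nameIndex;
            this.info = info;
        }
    }

    /** Reads a u2 attribute count followed by that many attributes. */
    static List<RawAttribute> readAll(ByteBuffer in) {
        int count = in.getShort() & 0xFFFF;
        List<RawAttribute> result = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            int nameIndex = in.getShort() & 0xFFFF;
            int length = in.getInt();            // four-byte length field
            if (length < 0 || length > in.remaining()) {
                throw new IllegalArgumentException("bad attribute length " + length);
            }
            byte[] info = new byte[length];
            in.get(info);                        // unknown attribute types are simply carried along
            result.add(new RawAttribute(nameIndex, info));
        }
        return result;
    }
}
```

To interpret an attribute, the name index would be resolved against the constant pool and compared with names such as Code or InnerClasses.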

Object management

The data structures described above serve the purpose of running JAVA applications. A fundamental concept of object-oriented languages like JAVA is the creation of typed object instances on the fly. Therefore, the virtual machine needs to manage the metadata for identity, internal state, threading, reflection, and reference counters for the lifecycle (garbage collection) of the created JAVA objects. Every JAVA object has an identity that is calculated by the hashCode method of its class implementation. A JAVA object may optionally have instance variables (fields), which store the data associated with the object. For reflective purposes an object needs access to class metadata, which is obtained through the Object.getClass() method. As a prerequisite, code must be granted the RuntimePermission "accessDeclaredMembers" to be allowed to look up arbitrary fields within an object. Because this functionality is restricted, the reflection API is not fully available to applets, which blocks untrusted code from accessing non-public data. Object instances can be used as references for thread synchronization; therefore monitor references have to be created and managed.

When an object instance is created, the memory space to store its internal data is allocated from the heap; the exact number of bytes needed can be derived from the class metadata. After allocation, the memory for the instance variables is set to an initial state (normally zero bytes). The pointer to the class structure is stored in the appropriate field of the object block. Then the dynamic initializer (constructor) is invoked for the object instance; <init> is the internal method name for a constructor. A constructor implementation can be used to incorporate checks (check point pattern) on the initialization parameters and environmental settings before creating the object.

The later discussion will show that there are configurations in the object-lifecycle state model that bypass the security function of constructors. This is the case when objects are injected into the JVM through the Serialization API.

After invoking the constructor, the object is initialized and available for usage by invoking other methods or accessing the fields.
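That deserialization creates an object without running the constructor of its serializable class can be shown with a small sketch. The class names ConstructorBypassDemo and Account are hypothetical. The sketch only demonstrates that the constructor, and with it the parameter check, is not invoked; it does not craft a manipulated byte stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical demo classes; the names are illustrative only.
final class ConstructorBypassDemo {

    static final class Account implements Serializable {
        private static final long serialVersionUID = 1L;
        static int constructorCalls = 0;   // counts how often <init> actually ran
        final int balance;

        Account(int balance) {
            constructorCalls++;
            // Check point: reject invalid initialization parameters.
            if (balance < 0) {
                throw new IllegalArgumentException("negative balance");
            }
            this.balance = balance;
        }
    }

    /** Serializes the object to a byte array and reads it back. */
    static Account roundTrip(Account original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                // The Account constructor is not invoked here.
                return (Account) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

After one explicit construction and one round trip, two distinct Account instances exist while constructorCalls has increased only once, so any check placed solely in the constructor never sees the deserialized state.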

The performance of the JAVA virtual machine depends on the lookup speed of information stored in the fields of object instances. Therefore, indirections needed for bytecode portability are resolved during execution to allow faster execution. This technique is named quickening; it replaces time-consuming indirect lookups in the constant pool for methods, fields, etc. with quick lookups of the real native memory location. The slow, portable GETFIELD bytecode instructions are then replaced inline with special internal bytecode operations such as GETFIELD_QUICK. This replacement technique allows further invocations to benefit from the initial quickening step.

For portability reasons a JAVA compiler never emits these native opcodes. They are part of an encapsulated JVM design feature for optimizing bytecode sequences. After bytecode optimizations are applied, the JVM typically applies hotspot compilation techniques. These may vary with the usage pattern of JAVA. A desktop client JVM is typically optimized for quick startup and instant execution, whereas a server JVM spends more effort on identifying performance hot spots and replacing them with adequate, faster native code.

Static and virtual invocation

JAVA allows methods that are not related to an object identity to be associated with the class. Those methods are called static methods, in contrast to non-static instance methods, which are bound to a specific object instance. In JAVA, method invocation is triggered by messages. In order to invoke a static method, the JVM simply has to look up the class in its internal class table and invoke the bytecode stored for the particular method. For the invocation of non-static methods (so-called instance methods) the object has to be looked up, and the instance becomes the this pointer of an invocation context, which can be used in the method control flow. The current this pointer is typically propagated to the subsequent instance method calls that this object invokes. Inheritance makes the lookup time-intensive; therefore, in order to gain performance, the JVM has the internal optimization option to flatten the class hierarchy and copy inherited methods to the invoked subclass instead of looking up the definition in the superclass on each invocation.
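The difference between the two invocation kinds can be illustrated with a short sketch. The classes DispatchDemo, Animal, and Cat are hypothetical examples. A static method is selected from the class named at the call site, whereas an instance method is selected from the runtime class of the receiver.

```java
// Hypothetical demo classes; the names are illustrative only.
final class DispatchDemo {

    static class Animal {
        static String kind() { return "animal"; }    // bound to the class, no this pointer
        String sound() { return "..."; }              // bound to an instance
    }

    static class Cat extends Animal {
        static String kind() { return "cat"; }        // hides Animal.kind(), does not override it
        @Override
        String sound() { return "meow"; }             // overrides Animal.sound()
    }

    static String describe(Animal a) {
        // Animal.kind() is resolved against the class named here (static invocation);
        // a.sound() is looked up in the runtime class of a (virtual invocation).
        return Animal.kind() + ":" + a.sound();
    }

    public static void main(String[] args) {
        System.out.println(describe(new Cat()));      // prints "animal:meow"
    }
}
```

At the bytecode level a JAVA compiler emits invokestatic for the first call and invokevirtual for the second.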

The JAVA language specification not only supports inheritance, it also features the decoupling of callers and callees via the definition of interfaces. Classes can implement multiple interfaces; therefore method and field lookups also have to take into account the multiple interfaces a class implements. Important security restrictions on the method lookup mechanism are the accessibility flags of fields, methods, and classes. They limit access in steps ranging from private through protected to public availability.

Native Interaction

A JAVA virtual machine is very limited in its functionality when it cannot access functions available on the native platform, such as I/O, memory management, networking, or graphical capabilities. These are normally defined through the API of the underlying operating system and of frameworks close to the system. In order to provide a safe and portable execution environment, these native functions are equipped with special checks, and wrappers are integrated in the system classes to guard the call path from an application to a native function. This prevents the injection of illegal parameters, which could break the stability of the JAVA runtime environment.

The JAVA specification marks platform-specific methods, which are written in C or C++, with the keyword native. These methods are called using a specific calling convention, the JAVA Native Interface (JNI) (Gordon, 1998). JNI provides a bidirectional methodology to bridge control flow between the native platform and the JVM. The native platform can use JNI functions to start a JVM, and conversely the JVM can call the native platform by using native methods.

Stubs in private classes are responsible for the implementation; they call native functions written in C or C++. Public functions that are available to the end user call these native stubs, typically with added parameter checks. Instead of having to deal with platform-dependent calling conventions, the JNI provides a portable way to call native functionality by providing abstraction headers for the C compiler of the native platform. JNI also specifies the management of JAVA objects, invocation rules for JAVA methods from native code, exception handling, wrapping of native return values, and class loading functionality. The management facilities for functions written with the JNI include inspection, update, and creation of simple JAVA objects and arrays.
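The pattern of a public method that validates its parameters before delegating to a private native stub can be sketched as follows. The class NativeBufferApi and its methods are hypothetical, and no native library is supplied, so a call that passes the checks fails with java.lang.UnsatisfiedLinkError in this sketch.

```java
// Hypothetical API; the names and the native stub are illustrative only.
final class NativeBufferApi {

    // Native stub: the implementation would be written in C or C++ against the JNI headers.
    private static native int fillNative(byte[] target, int offset, int length);

    /** Public entry point that guards the call path to the native function. */
    static int fill(byte[] target, int offset, int length) {
        if (target == null) {
            throw new NullPointerException("target");
        }
        // Reject ranges that would let native code write outside the array.
        if (offset < 0 || length < 0 || length > target.length - offset) {
            throw new IndexOutOfBoundsException(
                "offset=" + offset + ", length=" + length + ", size=" + target.length);
        }
        return fillNative(target, offset, length);
    }
}
```

Keeping the stub private ensures that untrusted callers can reach the native code only through the checked entry point.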

Problems with JNI

The security of a JAVA system depends directly on the security level of the defined native interfaces. Once control reaches a native function, the JAVA security mechanisms can be bypassed or misused. Therefore entry to native code should be avoided or appropriately guarded by restricting parameters, so that native code cannot be called with parameters from untrustworthy sources that aim to exploit underlying native vulnerabilities such as buffer or heap overflows (Koziol et al., 2004) in order to take over the native control flow.