Java Virtual Machine, JVM
Teodor Rus
rus@cs.uiowa.edu
The University of Iowa, Department of Computer Science
These slides have been developed by Teodor Rus. They are copyrighted materials and may not be used in other course settings outside of the University of Iowa in their current form or modified form without the express written permission of the copyright holder. During this course, students are prohibited from selling notes to or being paid for taking notes by any person or commercial firm without the express written permission of the copyright holder.
Target of the assembler
The target of the assembler in this class is the language of a virtual machine (VM):
VM = ⟨Processor, Program, ExecutionModel⟩
Note:
1. The VM should be such that it can be used to simulate the computation of real machines;
2. The Java Virtual Machine (JVM) is such a machine.
Rationale
1. A stack-based abstract machine of this kind (under the name of P-machine) was successfully used as target in many projects on compiler design and implementation;
2. JVM is successfully used as an abstract machine simulating the computation performed by current real machines in Java language environments;
3. Interpreters simulating the execution of JVM programs on real hardware are extensively implemented and accepted;
4. Oolong, an assembly language for the JVM, is available;
5. Finally, this provides a good educational experience.
Processor abstraction
• A processor abstraction needs to represent any concrete hardware; hence it should consist of a virtual computer together with an implementation of that computer on each concrete system.
• Once the virtual computer is implemented on a particular system all programs written for the virtual computer will run on that system.
• This allows programmers to write programs once (for the virtual computer) and run them anywhere. (Java slogan)
Fact
• The virtual computer operates on an abstract memory handling objects rather than bits and bytes.
• That is, the virtual computer hides the complexities of real hardware, such as:
1. Memory structure and addressing;
2. Intricacies of instruction patterns;
3. Program control and data flows.
JVM specification
JVM is a computer abstraction defined by:
• The set of operations that it performs, called bytecodes;
• The structure of the programs the JVM can execute, called the class file format (CFF);
• The verification algorithm that ensures the integrity of JVM programs.
Program execution
1. JVM takes its instructions from the CFF;
2. Operations performed by JVM take their operands from a stack and generate their results on the stack. Hence address computation is not a problem;
3. JVM operates on objects rather than operating on bits and bytes. Hence, interpretation is the same for all instructions.
JVM instructions
JVM instructions are classified into 6 groups, including:
1. instructions whose operands are on top of the stack.
Examples: add, mul, div, etc.;
2. instructions for object allocation;
3. instructions for method invocation;
4. instructions for retrieving and modifying fields in objects;
5. instructions for moving information between stack and objects.
Examples:
load n (pushes the value of local variable n onto the stack);
store n (pops the value on top of the stack into local variable n).
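The stack discipline described above can be illustrated with a toy interpreter. This is a minimal sketch, not the real JVM: the class name ToyStackMachine, the opcode numbers, and the instruction encoding {opcode, operand} are assumptions made for illustration only.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy stack machine; opcodes and encoding are hypothetical, not real JVM bytecodes. */
final class ToyStackMachine {
    static final int LOAD = 0, STORE = 1, ADD = 2, MUL = 3, HALT = 4;

    /** Runs a program whose instructions are {opcode, operand}; returns the locals. */
    static int[] run(int[][] program, int[] locals) {
        Deque<Integer> stack = new ArrayDeque<>(); // the operand stack
        int pc = 0;                                // index of the current instruction
        while (program[pc][0] != HALT) {
            int[] ins = program[pc];
            switch (ins[0]) {
                case LOAD:  stack.push(locals[ins[1]]); break;            // local -> stack
                case STORE: locals[ins[1]] = stack.pop(); break;          // stack -> local
                case ADD:   stack.push(stack.pop() + stack.pop()); break; // operands on top
                case MUL:   stack.push(stack.pop() * stack.pop()); break;
                default: throw new IllegalStateException("unknown opcode " + ins[0]);
            }
            pc++;
        }
        return locals;
    }

    public static void main(String[] args) {
        // Computes locals[2] = (locals[0] + locals[1]) * locals[1].
        int[][] program = {
            {LOAD, 0}, {LOAD, 1}, {ADD, 0}, {LOAD, 1}, {MUL, 0}, {STORE, 2}, {HALT, 0}
        };
        int[] locals = run(program, new int[] {2, 3, 0});
        System.out.println(locals[2]); // prints 15
    }
}
```

Note that no instruction names a memory address: operands are found on the stack, which is why address computation is not a concern.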
Example
Consider the following JVM code:
getstatic java/lang/System/out Ljava/io/PrintStream;
ldc "Hello, world"
invokevirtual java/io/PrintStream/println (Ljava/lang/String;)V
Example, continuation
The meaning of this code is:
1. Retrieve the value of the out field in the class java/lang/System and push it on the stack; this is an object of the class java/io/PrintStream
2. Push the constant "Hello, world" on the stack
3. Invoke the method println, which is defined in the class java/io/PrintStream and expects the stack to contain an object of java/lang/String and a reference to out, an object of the class java/io/PrintStream
Class File Format, CFF
• Represents a Java class as a stream of bytes; the Java platform has methods for converting Java class files into classes in the JVM
• A CFF is not necessarily a file; it can be stored in a database, fetched across the network, packaged in a Java archive (JAR) file, etc.
• CFF is standardized and is manipulated by the ClassLoader, part of the Java platform
Note
• If one stores CFF in a nonstandard form then one needs to construct an appropriate ClassLoader to handle it.
Verification algorithm
• Purpose: ensures that programs follow a set of rules that are designed to protect the integrity of JVM programs;
• The verification algorithm performs an abstract interpretation of the CFF. If this fails, the JVM program in the CFF is rejected.
Note: this doesn’t mean that one cannot write a JVM program that, while conforming to the rules implemented by the verification algorithm, violates the integrity of the JVM.
Java Platform
• JVM performs fundamental computational tasks but it lacks features for doing computer-oriented things like graphics, Internet communications, etc.
• The Java platform includes the JVM and a collection of classes that are collected into the package java.
• Examples of such packages: java.applet, java.io, java.awt (abstract window toolkit), java.security, etc.
Assumptions
• JVM cannot function independently of the Java platform.
• We assume further that the Java platform contains java.lang.Object, java.lang.ClassLoader, java.lang.String, java.lang.Class
• Note the dot notation for Java and the slash notation for JVM
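The relation between the two notations can be sketched in a few lines. The class and method names below (JvmNames, toInternalName, toDescriptor) are hypothetical helpers introduced for illustration; the L...; form is the descriptor syntax that appears in the JVM code of these slides, for example Ljava/io/PrintStream;.

```java
/** Converts Java (dot) class names to JVM (slash) names; helper names are illustrative. */
final class JvmNames {
    /** java.lang.String -> java/lang/String */
    static String toInternalName(String dottedName) {
        return dottedName.replace('.', '/');
    }

    /** java.lang.String -> Ljava/lang/String; (descriptor of an object type) */
    static String toDescriptor(String dottedName) {
        return "L" + toInternalName(dottedName) + ";";
    }

    public static void main(String[] args) {
        System.out.println(toInternalName("java.lang.ClassLoader")); // java/lang/ClassLoader
        System.out.println(toDescriptor("java.io.PrintStream"));     // Ljava/io/PrintStream;
    }
}
```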
JVM architecture
JVM is divided into four conceptual data spaces:
• Class area, where the JVM program (consisting of byte codes and constants) is kept;
• Java stack, which keeps track of which methods have been called and the data associated with each method invocation;
• Heap, where objects are kept;
• Native method stacks, for supporting native methods.
Class area
Stores the classes loaded into the system; each class is defined in terms of the properties:
• Its superclass;
• List of interfaces (possibly empty);
• List of fields;
• List of methods and their implementations stored in the method area;
• List of constants, stored in the constant pool.
All properties of a class are immutable (i.e., unchangeable).
Class descriptors
• Each field is defined by a descriptor that shows the properties of the object occupying that field such as static or not;
• For nonstatic fields there is a copy in each object of the class; for static fields there is a single copy for the entire class of objects;
• Each method is defined by a descriptor that shows the method type and the method modifiers, such as abstract, static, etc.;
• An abstract method has no implementation; a non-abstract method has an implementation defined in terms of JVM instructions.
Example of class representation
Figures 1 and 2 depict two class areas
ClassName: GamePlayer
Superclass: java/lang/Object
Fields:
  name   descriptor: Ljava/lang/String;   modifiers: none
Methods:
  main   descriptor: Ljava/lang/String;   modifiers: public, static   -> main method implementation
Figure 1: The GamePlayer class representation
Example, continuation
ClassName: ChessPlayer
Superclass: GamePlayer
Fields:
  color   descriptor: I   modifiers: private
  piece   descriptor: I   modifiers: static
Methods:
  getMove   descriptor: ()LMove;   modifiers: public   -> getMove method implementation
Figure 2: The ChessPlayer class representation
JVM stack
JVM operates on a stack of stack frames.
• A stack frame consists of three elements:
1. The operand stack, which contains the operands of the operations performed by JVM;
2. The array of local variables of the method;
3. The program counter (PC), which initially points to the first instruction of the method.
Execution model
• Each time a method is invoked a new stack frame is created and is pushed on the JVM stack;
• When a method terminates its stack frame is popped out.
The JVM performs the loop:
while (PC.opcode != Halt) {
    Execute(PC);
    PC := Next(PC);
}
More on execution model
• The top frame of the JVM stack represents the currently executing method and is called the active frame (AF);
• Only the operand stack and the local variable array in the active frame can be used during JVM program execution;
• Each operation performed by the JVM evaluates an expression whose operands are on the operand stack and leaves the result on the operand stack;
• When a method calls another method, the PC of the caller is saved in the caller's frame; when the callee completes, its result is on top of the operand stack and the caller is resumed using its saved PC and its own array of local variables.
The Heap
• Each object is associated with a class (its type) in the class area and is stored in the heap.
• Each object has a number of slots for storing fields; there is one slot for each nonstatic field in the class associated with the object.
• Each object has a number of slots storing methods that operate on that object; there is one method for each abstract method of the class associated with the object.
Example object
Figure 3 shows the heap representation of an object of the class ChessPlayer.
Object of the class ChessPlayer:
  ToTheClass -> ChessPlayer (Superclass -> GamePlayer)
  color: 1
  pieces: 16
  ToTheName -> java/lang/String object holding the player’s name, Pooky C
Figure 3: An object of the class ChessPlayer
Native method stacks
• Native methods are methods implemented in languages other than JVM bytecode;
• Native methods allow the programmer to handle situations that cannot be handled completely in Java, such as interfacing with platform-dependent features or legacy code;
• Native methods are executed using C-like stacks;
• Native methods do not exist on all JVM implementations; moreover, different JVM implementations may have different standards for native methods;
• The standard Java Native Interface (JNI) should be available as the documented way to write native methods.
Garbage collection
• Each object consumes some memory from the heap;
• Eventually the memory allocated to a JVM object is reclaimed;
• JVM reclaims object’s memory automatically through a process called garbage collection;
• An object is ready to be garbage collected when it is no longer “alive”.
Object liveness
The rules that determine whether an object is alive are:
1. If there is a reference to the object on the stack then the object is alive;
2. If there is a reference to the object in a local variable on the stack or in a static field, then the object is alive;
3. If a field of an alive object contains a reference to the object then the object is alive;
4. JVM may internally keep references to certain objects, for example to support native methods. These objects are alive.
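The liveness rules amount to reachability from a set of roots (references on the stack, in local variables, in static fields, or held internally by the JVM). The sketch below illustrates this under simplifying assumptions: objects are identified by strings, and the names Liveness, liveObjects, and demo are hypothetical. It is not how any particular garbage collector is implemented.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Computes the set of alive objects as everything reachable from the roots. */
final class Liveness {
    /**
     * @param roots  objects referenced from the stack, locals, static fields, or the JVM itself
     * @param fields for each object, the objects referenced by its fields
     */
    static Set<String> liveObjects(Set<String> roots, Map<String, List<String>> fields) {
        Set<String> live = new HashSet<>();
        Deque<String> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            String obj = work.pop();
            if (live.add(obj)) { // first visit: follow the references in its fields
                work.addAll(fields.getOrDefault(obj, Collections.<String>emptyList()));
            }
        }
        return live;
    }

    /** A small example heap: "orphan" is referenced by nothing, so it is not alive. */
    static Set<String> demo() {
        Map<String, List<String>> fields = new HashMap<>();
        fields.put("player", Arrays.asList("name"));
        fields.put("orphan", Arrays.asList("name"));
        return liveObjects(new HashSet<>(Arrays.asList("player")), fields);
    }

    public static void main(String[] args) {
        System.out.println(new TreeSet<>(demo())); // prints [name, player]
    }
}
```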
Verification process
• Ensures that class files follow certain rules;
• Allows JVM to assume that a class has certain safety properties and to make optimizations based on this;
• Makes it possible to safely download Java applets from Internet;
• The Java compiler generates correct code. However, a JVM programmer can bypass the compiler's restrictions. The verification algorithm checks for this.
How does it work?
It asks questions about CFF, such as:
• Is it a structurally valid class?
• Are all constant references correct?
• Are all instructions valid?
• Will stack and locals contain values of appropriate type?
• Do the classes used really exist, and are they correct?
JVM machine language syntax
• Level 0:
byte codes,
indices in CFF (integers),
indices in the array of local variables, constant tags.
• Level 1:
constants and instructions;
• Level 2:
Class File Format, CFF.
JVM codes
1. JVM uses Unicode character codes (rather than ASCII or EBCDIC). The Unicode Consortium manages these codes;
2. Unicode was designed so that it can accommodate any known character set used by people’s alphabets;
3. The Unicode Transformation Formats UTF-8, UTF-16, and UTF-32 represent Unicode characters using 8-bit, 16-bit (half-word), and 32-bit (word) code units: a character takes 1 to 4 bytes in UTF-8, 2 or 4 bytes in UTF-16, and always 4 bytes in UTF-32.
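The size of a character in the byte-oriented and half-word-oriented formats can be checked directly with the standard java.nio.charset API. The class name UtfLengths and its two helper methods are assumptions made for this sketch; characters are written as Unicode escapes so the source file encoding does not matter.

```java
import java.nio.charset.StandardCharsets;

/** Measures how many bytes a string occupies in UTF-8 and UTF-16. */
final class UtfLengths {
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    static int utf16Length(String s) {
        // UTF_16BE is used so that no byte-order mark is added to the output.
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        String[] samples = {"A", "\u00e9", "\u20ac", "\uD83D\uDE00"}; // A, e-acute, euro sign, U+1F600
        for (String s : samples) {
            System.out.println(utf8Length(s) + " bytes in UTF-8, " + utf16Length(s) + " bytes in UTF-16");
        }
    }
}
```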
Constant tags
Table 1: Constant tags
Tag  Type      Format     Interpretation
1    UTF8      2+n bytes  First 2 bytes encode length n, followed by n bytes of the text of the constant
2    undefined
3    Integer   4 bytes    Text of a signed integer
4    Float     4 bytes    Text of IEEE 754 floating-point number
5    Long      8 bytes    Text of long signed integer
6    Double    8 bytes    Text of IEEE 754 double-precision number
7    Class     2 bytes    Reference to class name, a UTF8 constant
8    String    2 bytes    Reference to string name, a UTF8 constant
9    FieldRef  4 bytes    First 2 show a Class constant, second 2 a NameAndType constant (tag 12 below)
Constant tags, continuation
Table 2: Constant tags
Tag  Type                Format   Interpretation
10   MethodRef           4 bytes  Same as FieldRef
11   InterfaceMethodRef  4 bytes  Same as FieldRef
12   NameAndType         4 bytes  First 2 point to name, second 2 point to descriptor. Both are UTF8 constants
Is CFF structurally valid?
• The first 4 bytes of the CFF must contain the hex values CA FE BA BE, which is the magic number;
• Following the magic number are the minor and major version numbers; each takes two bytes interpreted as a 16-bit unsigned integer:
Example: JDK 1.0, 1.1: Major = 0x2D (45), Minor = 0x3 (3);
Java 2: Major = 0x2E (46), Minor = 0; if Major = 45 then Minor > 3
• Figure 4 shows the structure of a CFF
Structure of the CFF
Magic# Minor Major CnstPool Class Super Interface Fields Methods
Figure 4: Structure of a properly formatted CFF
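The first structural check, on the magic number and the two version fields, can be sketched as follows. The class name ClassFileHeader and the method readVersion are hypothetical; the sketch only relies on java.nio.ByteBuffer, which reads multi-byte values in big-endian order by default.

```java
import java.nio.ByteBuffer;

/** Reads the first 8 bytes of a class file: magic number, minor and major version. */
final class ClassFileHeader {
    /** Returns {minor, major}; throws if the magic number is not CA FE BA BE. */
    static int[] readVersion(byte[] classFile) {
        ByteBuffer in = ByteBuffer.wrap(classFile); // big-endian by default
        if (in.getInt() != 0xCAFEBABE) {
            throw new IllegalArgumentException("bad magic number");
        }
        int minor = in.getShort() & 0xFFFF; // 16-bit unsigned
        int major = in.getShort() & 0xFFFF; // 16-bit unsigned
        return new int[] {minor, major};
    }

    public static void main(String[] args) {
        byte[] header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0x00, 0x03, 0x00, 0x2D};
        int[] version = readVersion(header);
        System.out.println("major=" + version[1] + " minor=" + version[0]); // major=45 minor=3
    }
}
```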
More on CFF structure
• Most sections begin with a count, which is a two-byte unsigned integer, followed by count instances of some pattern of bytes;
• Example: (see tags in Tables 1, 2)
1. The constant pool starts with a count followed by as many constant patterns as it specifies;
2. Each constant pattern consists of a one-byte tag and a number of bytes on which the constant is written;
3. The tag describes the kind of constant that follows and how many bytes it takes;
4. If any tag is invalid or the file ends before the correct number of constants is found, then the CFF is rejected.
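A walk over the constant pool driven by the tags might look like the sketch below. It handles only the tags 1 and 7 to 12 from the tables, and it assumes, as in the hello.class dump of these slides, that a count of n is followed by entries numbered 1 to n-1. The class name ConstantPoolWalker and the output format are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Walks a constant pool (count, then tagged entries) and describes each entry. */
final class ConstantPoolWalker {
    static List<String> walk(byte[] pool) {
        ByteBuffer in = ByteBuffer.wrap(pool);
        int count = in.getShort() & 0xFFFF;           // two-byte unsigned count
        List<String> entries = new ArrayList<>();
        for (int index = 1; index < count; index++) { // assumption: entries 1 .. count-1
            int tag = in.get() & 0xFF;                // one-byte tag
            switch (tag) {
                case 1: {                             // UTF8: 2-byte length n, then n bytes
                    byte[] text = new byte[in.getShort() & 0xFFFF];
                    in.get(text);
                    entries.add(index + ": UTF8 " + new String(text, StandardCharsets.UTF_8));
                    break;
                }
                case 7: case 8:                       // Class, String: one 2-byte index
                    entries.add(index + ": tag " + tag + ", index " + (in.getShort() & 0xFFFF));
                    break;
                case 9: case 10: case 11: case 12:    // two 2-byte indices
                    entries.add(index + ": tag " + tag + ", indices "
                            + (in.getShort() & 0xFFFF) + " and " + (in.getShort() & 0xFFFF));
                    break;
                default:                              // tags 3 to 6 are omitted from this sketch
                    throw new IllegalArgumentException("invalid or unsupported tag " + tag);
            }
        }
        return entries;
    }

    public static void main(String[] args) {
        // count = 3; entry 1 = Class with name index 2; entry 2 = UTF8 "Foo"
        byte[] pool = {0x00, 0x03, 0x07, 0x00, 0x02, 0x01, 0x00, 0x03, 'F', 'o', 'o'};
        System.out.println(walk(pool)); // [1: tag 7, index 2, 2: UTF8 Foo]
    }
}
```

If the input ends before all entries are read, ByteBuffer throws BufferUnderflowException, which plays the role of rejecting the CFF in this sketch.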
Check constant references
• Class and String constants must have references to UTF8;
• FieldRef, MethodRef, InterfaceMethodRef must have a class index that is a Class constant and a name-and-type index;
• NameAndType constants must have two indices pointing to UTF8.
Example JVM code
Figure 5 shows a portion of the code:
.class Foo
.super Bar
.implements Baz
.field field1 LFoo;
.method isEven (I)Z
; ; ...
.end method

Bytes                Interpretation
1 0                  Constant pool count 16^2 = 256
1 0 6 i s E v e n    Entry 1: UTF8 isEven
1 0 4 ( I ) Z        Entry 2: UTF8 (I)Z
1 0 6 f i e l d 1    Entry 3: UTF8 field1
1 0 5 L F o o ;      Entry 4: UTF8 LFoo;
1 0 3 B a z          Entry 5: UTF8 Baz
7 0 5                Entry 6: Class, name index = 5
1 0 3 B a r          Entry 7: UTF8 Bar
7 0 7                Entry 8: Class, name index = 7
1 0 3 F o o          Entry 9: UTF8 Foo
7 0 9                Entry 10: Class, name index = 9
. . .
0 A                  This class index (10 = Foo)
0 8                  Superclass index (8 = Bar)
0 1                  Interface count = 1
0 6                  Interface index (6 = Baz)
0 1                  Fields count = 1
0 0                  There are no field flags
0 3                  Field name index (3 = field1)
0 4                  Field descriptor index (4 = LFoo;)
0 0                  Field attributes count = 0
0 1                  Method count = 1
0 0                  There are no method flags
0 1                  Method name index (1 = isEven)
0 2                  Method descriptor index (2 = (I)Z)
0 1                  Method attributes count
. . .                Method attributes
Figure 5: Oolong code for the class Foo and the corresponding portion of its CFF
Are all instructions valid?
Once we know that the overall class structure is valid we can look at method bodies to check if the instructions are correctly formatted.
Problem to be solved
• Does each instruction begin with a recognized opcode?
• If instruction takes a constant pool reference as argument, does it point to an actual
constant pool entry with the correct type?
• If the instruction uses a local variable, is the local variable index within the correct range?
• If the instruction is a branch, does it point to the beginning of an instruction?
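Two of these checks (recognized opcodes, and branch targets that fall on the beginning of an instruction) can be sketched by walking the code array once. The sketch knows only the four opcodes that occur in the hello-world method of these slides (getstatic b2, ldc 12, invokevirtual b6, return b1); the class name InstructionWalker and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Finds instruction boundaries in a code array, for a small subset of opcodes. */
final class InstructionWalker {
    /** Length in bytes (opcode plus operands); 0 means "not recognized by this sketch". */
    static int length(int opcode) {
        switch (opcode) {
            case 0x12: return 2; // ldc: 1-byte constant pool index
            case 0xb1: return 1; // return: no operands
            case 0xb2: return 3; // getstatic: 2-byte constant pool index
            case 0xb6: return 3; // invokevirtual: 2-byte constant pool index
            default:   return 0;
        }
    }

    /** Returns the offsets at which instructions begin, or throws if the code is malformed. */
    static List<Integer> instructionStarts(byte[] code) {
        List<Integer> starts = new ArrayList<>();
        int pc = 0;
        while (pc < code.length) {
            int len = length(code[pc] & 0xFF);
            if (len == 0) throw new IllegalArgumentException("unrecognized opcode at " + pc);
            if (pc + len > code.length) throw new IllegalArgumentException("truncated instruction at " + pc);
            starts.add(pc);
            pc += len;
        }
        return starts;
    }

    /** A branch target is acceptable only if it is the beginning of an instruction. */
    static boolean isValidBranchTarget(byte[] code, int target) {
        return instructionStarts(code).contains(target);
    }

    public static void main(String[] args) {
        byte[] code = {(byte) 0xb2, 0x00, 0x06, 0x12, 0x01, (byte) 0xb6, 0x00, 0x07, (byte) 0xb1};
        System.out.println(instructionStarts(code)); // [0, 3, 5, 8]
    }
}
```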
A closer look at CFF
Consider the Java "hello world" program:
public class hello {
    public static void main(String argv[]) {
        System.out.println("Hello, world");
    }
}
Note: the file hello.java, containing this program, is mapped by the Java compiler (javac hello.java) into the CFF file hello.class that is interpreted by JVM.
To understand CFF we look at the file hello.class
Notation
Represent the CFF in three columns:
1. Left column: offset, in hex, into CFF
2. Middle column: bytes at the offset location in hex
3. Right column: interpretation of the middle column by JVM
Example
File header
000000 cafebabe          Magic = ca fe ba be
000004 0003              Minor version = 3
000006 002d              Major version = 2*16 + 13 = 45
Constant pool
000008 0020              There are 2 * 16 = 32 constants in the pool
00000a 08001f            1: a string at index 16 + 15 = 31 in CFF
00000d 07001d            2: a class name at index 16 + 13 = 29 in CFF
000010 070018            3: a class name at index 16 + 8 = 24 in CFF
000013 07000e            4: a class name at index 14 in CFF
000016 070013            5: a class name at index 19 in CFF
000019 090002000a        6: FieldRef: class index 2, name-and-type index 10
00001e 0a00040009        7: MethodRef: class index 4, name-and-type index 9
000023 0a0003000b        8: MethodRef: class index 3, name-and-type index 11
000028 0c000c0017        9: NameAndType: name index 12, descriptor index 23
00002d 0c0016001c        10: NameAndType: name index 22, descriptor index 28
000032 0c001b001e        11: NameAndType: name index 27, descriptor index 30
000037 010007            12: UTF8, length 7
00003a 7072696e746c6e    println
Constant pool, continuation
000041 01000d 13: UTF8, length 13
000044 436f6e7374616e7456616c7565 ConstantValue
000051 010013 14: UTF8, length 19
000054 6a6176612f696f2f5072696e74537472 java/io/PrintStream
000067 01000a 15: UTF8, length 10
00006a 457863657074696f6e73 Exceptions
000074 01000a 16: UTF8, length 10
000077 68656c6c6f2e6a617661 hello.java
000081 01000f 17: UTF8, length 15
000084 4c696e654e756d6265725461626c65 LineNumberTable
000093 01000a 18: UTF8, length 10
000096 536f7572636546696c65 SourceFile
0000a0 010005 19: UTF8, length 5
0000a3 68656c6c6f hello
Constant pool, continuation
0000a8 01000e 20: UTF8, length 14
0000ab 4c6f63616c5661726961626c6573 LocalVariables
0000b9 010004 21: UTF8, length 4
0000bc 436f6465 Code
0000c0 010003 22: UTF8, length 3
0000c3 6f7574 out
0000c6 010015 23: UTF8, length 21
0000c9 284c6a6176612f6c616e672f53747269 (Ljava/lang/String;)V
0000de 010010 24: UTF8, length 16
0000e1 6a6176612f6c616e672f4f626a656374 java/lang/Object
0000f1 010004 25: UTF8, length 4
0000f4 6d61696e main
0000f8 010016 26: UTF8, length 22
0000fb 285b4c6a6176612f6c616e672f537472 ([Ljava/lang/String;)V
Constant pool, continuation
000111 010006 27: UTF8, length 6
000114 3c696e69743e <init>
00011a 010015 28: UTF8, length 21
00011d 4c6a6176612f696f2f5072696e745374 Ljava/io/PrintStream;
000132 010010 29: UTF8, length 16
000135 6a6176612f6c616e672f53797374656d java/lang/System
000145 010003 30: UTF8, length 3
000148 282956 ()V
00014b 01000c 31: UTF8, length 12
00014e 48656c6c6f2c20776f726c64 Hello, world
Constant entries
• The first constants are strings codified as UTF8 entries
• Strings are followed by small constants, 3, 4, 5, etc. (of which there are none in the example), codified on a byte
• These are followed by integer and long constants codified as two’s complement signed integers on 32 and 64 bits respectively.
• Floating and double constants are codified as shown in Table 3
Other fields
Fields, Methods, and Class entries:
• Constants with tags 9, 10, 11 have identical formats. They are used to refer to fields and methods in field and method instructions such as getfield, putstatic, invokevirtual
• Example: constant 7 in the constant pool is 0a 0004 0009, i.e.:
1. 0a = 10, so it is a MethodRef
2. The class containing the method is at index 4, whose name is at index 14, i.e., java/io/PrintStream
3. The name and descriptor are at index 9: name index 12 (println), descriptor index 23 [(Ljava/lang/String;)V]
This is enough information to call the method; constant 7 is used to code the arguments of Oolong instructions
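The chain of indices followed in this example can be traced mechanically. The sketch below stores just the six constants involved, with the indices and texts taken from the hello.class dump; the class MethodRefResolver, its two maps, and the method names are hypothetical and do not mirror any JVM data structure.

```java
import java.util.HashMap;
import java.util.Map;

/** Resolves a MethodRef constant to "class/name descriptor" by following its indices. */
final class MethodRefResolver {
    final Map<Integer, String> utf8 = new HashMap<>(); // UTF8 constants
    final Map<Integer, int[]> refs = new HashMap<>();  // Class: {name}; NameAndType: {name, descriptor};
                                                       // MethodRef: {class, nameAndType}

    String resolveMethod(int index) {
        int[] methodRef = refs.get(index);
        int[] classEntry = refs.get(methodRef[0]);
        int[] nameAndType = refs.get(methodRef[1]);
        return utf8.get(classEntry[0]) + "/" + utf8.get(nameAndType[0]) + " " + utf8.get(nameAndType[1]);
    }

    /** The six constants of hello.class that constant 7 depends on. */
    static MethodRefResolver helloPool() {
        MethodRefResolver pool = new MethodRefResolver();
        pool.refs.put(7, new int[] {4, 9});   // MethodRef: class index 4, name-and-type index 9
        pool.refs.put(4, new int[] {14});     // Class: name index 14
        pool.refs.put(9, new int[] {12, 23}); // NameAndType: name index 12, descriptor index 23
        pool.utf8.put(12, "println");
        pool.utf8.put(14, "java/io/PrintStream");
        pool.utf8.put(23, "(Ljava/lang/String;)V");
        return pool;
    }

    public static void main(String[] args) {
        // prints java/io/PrintStream/println (Ljava/lang/String;)V
        System.out.println(helloPool().resolveMethod(7));
    }
}
```

The resolved string is exactly the argument of the invokevirtual instruction in the Oolong code shown earlier.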
Class information
• Following the constant pool is the information about the class itself, which consists of name, type, and access flags, as seen below
• Example: hello.java
00015b 0021    two bytes, access flags = 33
00015d 0005    two bytes, index of this in constant pool, 5
00015f 0003    two bytes, index of super in constant pool, 3
000161 0000    two bytes, number of interfaces, 0
Access flags
are interpreted as a bit-vector as seen below:
Bit   Name            Meaning
1     ACC_PUBLIC      The class is public
2-4                   Not used
5     ACC_FINAL       The class is final
6     ACC_SUPER       Superclass methods are treated specially by invokespecial
7-9                   Not used
10    ACC_INTERFACE   The class is an interface
11    ACC_ABSTRACT    The class is abstract
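Decoding the bit-vector is a matter of testing individual bits, where bit k of the table corresponds to the mask 1 << (k - 1). The sketch below covers only a subset of the class flags (bits 1, 5, 6, and 10); the class name ClassAccessFlags and the method decode are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Decodes a subset of the class access flags from the two-byte bit-vector. */
final class ClassAccessFlags {
    static List<String> decode(int flags) {
        List<String> names = new ArrayList<>();
        if ((flags & 0x0001) != 0) names.add("ACC_PUBLIC");    // bit 1
        if ((flags & 0x0010) != 0) names.add("ACC_FINAL");     // bit 5
        if ((flags & 0x0020) != 0) names.add("ACC_SUPER");     // bit 6
        if ((flags & 0x0200) != 0) names.add("ACC_INTERFACE"); // bit 10
        return names;
    }

    public static void main(String[] args) {
        // The access flags of hello.class are 0x0021 = 33.
        System.out.println(decode(0x0021)); // [ACC_PUBLIC, ACC_SUPER]
    }
}
```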
Fields and Methods
After the class information come two two-byte counts, the number of fields and the number of methods, each followed by its entries. In our example they are:
000163 0000    Number of fields is zero
000165 0002    There are two methods in this class
Fields and methods have identical formats.
000167 0009    access flags of the method = 9
000169 0019    name of the method is index 25 in constant pool (main)
00016b 001a    descriptor of the method has index 26 in constant pool
Method access flags
Are specified in the table:
Bit   Name               Meaning
1     ACC_PUBLIC         The field/method is public
2     ACC_PRIVATE        The field/method is private
3     ACC_PROTECTED      The field/method is protected
4     ACC_STATIC         The field/method is static
5     ACC_FINAL          The field/method is final
6     ACC_SYNCHRONIZED   The method is synchronized
7     ACC_VOLATILE       The field is volatile
8     ACC_TRANSIENT      The field is transient
9     ACC_NATIVE         The method is native
10                       Unused
11    ACC_ABSTRACT       The method is abstract
Attributes
• After the general method or field information the CFF contains a list of attributes
• Fields and methods have different kinds of attributes. Methods have a single attribute giving the implementation of the method; most fields have no attributes at all
• Only the ConstantValue attribute is defined for fields
• Attributes for the methods are represented as shown below
Attributes for methods
00016d 0001        1 method attribute: method attribute 0 follows
00016f 0015        Name: at index 21 in constant pool, Code
000171 00000025    Length of the Code attribute is 37 bytes
000175 0002        Maximum stack is 2 slots
000177 0001        Maximum space for locals is 1
The actual byte code
Disp.   Bytecode   Addr   Interpretation
000179  00000009          Code length: 9 bytes
00017d  b20006     0000   getstatic #6, index 6 in constant pool
000180  1201       0003   ldc #1, index 1 in constant pool
000182  b60007     0005   invokevirtual #7, index 7 in constant pool
000185  b1         0008   return
Note: code of length up to 4G bytes (2^32) is allowed; however, other constraints limit code size to 64K.
Observations
1. There are two forms of the ldc instruction, ldc and ldc_w: ldc requires a one-byte argument interpreted as an index 0..255 in the constant pool; ldc_w requires a two-byte argument that may refer to any constant
2. In either case the constant pool entry must be an Integer, Float, or String; Long and Double constants are loaded with ldc2_w
Exception table
Following the byte code is the exception table, which begins with a two-byte count, the number of entries:
000186 0000 there are no exceptions in this method
Note: following the exception table, the Code attribute may have attributes of its own, such as debugging info.
Main method
The main method has one attribute, LineNumberTable:
000188 0001        1 code attribute: code attribute 0 follows
00018a 0011        Name: index 17 in constant pool: LineNumberTable
00018c 0000000a    Length of attribute: 10
000190 0002        Number of entries: 2
000192 0000        Start PC: 0
000194 0005        Line number: 5
000196 0008        Start PC: 8
000198 0003        Line number: 3
Method 1
Starts after the code attribute of method 0
00019a 0000        Access flags = 0
00019c 001b        Name: index 27 in constant pool (<init>)
00019e 001e        Descriptor: index 30 in constant pool, ()V
0001a0 0001        1 method attribute: method attribute 0
0001a2 0015        Name: index 21 in constant pool (Code)
0001a4 0000001d    Length of the attribute: 29
0001a8 0001        Maximum stack: 1
0001aa 0001        Maximum locals: 1
0001ac 00000005    Code length: 5
0001b0 2a          00000000 aload_0
0001b1 b70008      00000001 invokespecial #8
0001b4 b1          00000004 return
Method 1, continuation
0001b5 0000        0 exception table entries
0001b7 0001        1 code attribute: code attribute 0:
0001b9 0011        Name: index 17 in constant pool (LineNumberTable)
0001bb 00000006    Length of attribute: 6
0001bf 0001        Length of table: 1
0001c1 0000 0001   Start PC: 0, Line number: 1
Class attributes
• The CFF ends with a list of class attributes
• A class can have any attributes it wants, but only the SourceFile attribute is defined in the Java specification
0001c5 0001        1 class file attribute. Attribute 0:
0001c7 0012        Name: index 18 in constant pool (SourceFile)
0001c9 00000002    Length: 2 bytes
0001cd 0010        Name: index 16 in constant pool (hello.java)