Java Virtual Machine, JVM


Academic year: 2021


(1)

Java Virtual Machine, JVM

Teodor Rus

rus@cs.uiowa.edu

The University of Iowa, Department of Computer Science

These slides have been developed by Teodor Rus. They are copyrighted materials and may not be used in other course settings outside of the University of Iowa in their current form or modified form without the express written permission of the copyright holder. During this course, students are prohibited from selling notes to or being paid for taking notes by any person or commercial firm without the express written permission of the copyright holder.

(2)

Target of the assembler

The target of the assembler in this class is the language of a virtual machine (VM):

$$VM = \langle Processor, Program, ExecutionModel \rangle$$

Note:

1. The VM should be such that it can be used to simulate the computation of real machines;

2. The Java Virtual Machine is such a machine.

(3)

Rationale

1. A virtual machine of this kind (under the name of P-machine) was successfully used as a target in many projects on compiler design and implementation;

2. JVM is successfully used as an abstract machine simulating the computation performed by current real machines in Java language environments;

3. Interpreters simulating the execution of JVM programs on real hardware are extensively implemented and accepted;

4. Oolong, an assembly language for the JVM, is available;

5. Finally, this provides a good educational experience.

(4)

Processor abstraction

A processor abstraction must be able to represent any concrete hardware; hence it consists of a virtual computer together with an implementation of that computer on each real system.

Once the virtual computer is implemented on a particular system all programs written for the virtual computer will run on that system.

This allows programmers to write programs once (for the virtual computer) and run them anywhere. (Java slogan)

(5)

Fact

The virtual computer operates on an abstract memory handling objects rather than bits and bytes.

That is, the virtual computer hides the complexities of real hardware, such as:

1. Memory structure and addressing;

2. Intricacies of instruction patterns;

3. Program control and data flows.

(6)

JVM specification

JVM is a computer abstraction defined by:

The set of operations that it performs, called bytecodes;

The structure of the programs the JVM can execute, called the class file format (CFF);

The verification algorithm that ensures the integrity of JVM programs.

(7)

Program execution

1. JVM takes its instructions from the CFF;

2. Operations performed by the JVM take their operands from a stack and leave their results on the stack. Hence address computation is not a problem;

3. The JVM operates on objects rather than on bits and bytes. Hence, interpretation is the same for all instructions.

(8)

JVM instructions

JVM instructions are classified in 6 groups:

1. instructions whose operands are on top of the stack

Examples: add, mul, div, etc.;

2. instructions for object allocation;

3. instructions for method invocation;

4. instructions for retrieving and modifying fields in objects;

5. instructions for moving information between stack and objects.

Examples:

load n (moves the value of local variable n onto stack);

store n (stores the value on top of the stack into local variable n)

(9)

Example

Consider the following JVM code:

getstatic java/lang/System/out Ljava/io/PrintStream;

ldc "Hello, world"

invokevirtual java/io/PrintStream/println (Ljava/lang/String;)V

(10)

Example, continuation

The meaning of this code is:

1. Retrieve the value of the out field in the class java/lang/System and push it on the stack; this is an object of the class java/io/PrintStream;

2. Push the constant "Hello, world" on the stack;

3. Invoke the method println, which is defined in the class java/io/PrintStream and expects the stack to contain an object of java/lang/String and a reference to out, an object of the class java/io/PrintStream.
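The three steps can be imitated from Java source with the reflection API. The following is only an illustrative sketch: the class name HelloSteps and the method run are made up for this example, and reflection is not how the JVM itself executes these instructions.

```java
import java.io.PrintStream;
import java.lang.reflect.Field;
import java.lang.reflect.Method;

// Hypothetical helper: mirrors getstatic / ldc / invokevirtual with reflection calls.
final class HelloSteps {
    /** Prints the message and returns the object fetched in step 1. */
    static Object run(String message) {
        try {
            // Step 1 (getstatic): read the static field "out" of java.lang.System.
            Field outField = System.class.getField("out");
            Object out = outField.get(null); // null receiver because the field is static

            // Step 2 (ldc): the constant is simply the String argument here.

            // Step 3 (invokevirtual): find println(String) in PrintStream and call it on out.
            Method println = PrintStream.class.getMethod("println", String.class);
            println.invoke(out, message);
            return out;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        run("Hello, world");
    }
}
```

Note how the receiver (out) and the argument (the string) are both supplied to the call, just as both must be on the operand stack before invokevirtual.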

(11)

Class File Format, CFF

Represents a Java class as a stream of bytes;

The Java platform has methods for converting Java class files into classes in the JVM;

A CFF is not necessarily a file; it can be stored in a database, fetched across the network, kept as part of a Java archive (JAR) file, etc.;

The CFF is standardized and is manipulated by the ClassLoader, part of the Java platform.

(12)

Note

If one stores a CFF in a nonstandard form, then one needs to construct an appropriate ClassLoader to handle it.
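As a rough sketch of such a ClassLoader, assume the class bytes are kept in an in-memory map instead of files. The names ByteArrayClassLoader, store, and tryLoad are hypothetical; findClass, defineClass, and loadClass are the standard java.lang.ClassLoader methods.

```java
import java.util.Map;

// Hypothetical loader for a nonstandard store: class name (dot notation) -> CFF bytes.
final class ByteArrayClassLoader extends ClassLoader {
    private final Map<String, byte[]> store;

    ByteArrayClassLoader(Map<String, byte[]> store) {
        // Delegate to the loader that loaded this class for everything not in the store.
        super(ByteArrayClassLoader.class.getClassLoader());
        this.store = store;
    }

    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        byte[] cff = store.get(name);
        if (cff == null) {
            throw new ClassNotFoundException(name);
        }
        // defineClass hands the raw bytes to the JVM, which turns them into a class.
        return defineClass(name, cff, 0, cff.length);
    }

    /** Convenience wrapper: returns null instead of throwing when the class is unknown. */
    static Class<?> tryLoad(Map<String, byte[]> store, String name) {
        try {
            return new ByteArrayClassLoader(store).loadClass(name);
        } catch (ClassNotFoundException e) {
            return null;
        }
    }
}
```

Because loadClass asks the parent loader first, platform classes such as java.lang.String are still found in the usual way; only names the parent cannot resolve reach findClass.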

(13)

Verification algorithm

Purpose: ensures that programs follow a set of rules that are designed to protect the integrity of JVM programs;

The verification algorithm performs an abstract interpretation of the CFF. If this fails, the JVM program in the CFF is aborted.

Note: this does not mean that one cannot write a JVM program that, while conforming to the rules implemented by the verification algorithm, violates the integrity of the JVM.

(14)

Java Platform

The JVM performs fundamental computational tasks, but it lacks features for doing computer-oriented things like graphics, Internet communications, etc.

The Java platform includes the JVM and a collection of classes that are collected into the package java.

Examples of such packages: java.applet, java.io, java.awt (Abstract Window Toolkit), java.security, etc.

(15)

Assumptions

The JVM cannot function independently of the Java platform.

We assume further that the Java platform contains java.lang.Object, java.lang.ClassLoader, java.lang.String, and java.lang.Class.

Note the dot notation for Java and the slash notation for the JVM.
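The relation between the two notations can be illustrated with a small sketch. The class and method names below are hypothetical, and toDescriptor is only meant for ordinary (non-array, non-primitive) classes.

```java
// Hypothetical helper converting Java (dot) names to JVM (slash) names.
final class InternalNames {
    /** java.lang.String becomes java/lang/String. */
    static String toInternalName(Class<?> c) {
        return c.getName().replace('.', '/');
    }

    /** Field descriptor of an ordinary class type, e.g. Ljava/lang/String; */
    static String toDescriptor(Class<?> c) {
        return "L" + toInternalName(c) + ";";
    }

    public static void main(String[] args) {
        System.out.println(toInternalName(String.class));
        System.out.println(toDescriptor(java.io.PrintStream.class));
    }
}
```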

(16)

JVM architecture

JVM is divided into four conceptual data spaces:

Class area, where the JVM program (consisting of byte codes and constants) is kept;

Java stack, which keeps track of which methods have been called and the data associated with each method invocation;

Heap, where objects are kept;

Native method stacks, for supporting native methods.

(17)

Class area

Stores the classes loaded into the system; each class is defined in terms of the properties:

Its superclasses;

List of interfaces (possibly empty);

List of fields;

List of methods and their implementations stored in the method area;

List of constants, stored in the constant pool.

All properties of a class are immutable (i.e., are unchangeable)

(18)

Class descriptors

Each field is defined by a descriptor that shows the properties of the object occupying that field such as static or not;

For nonstatic fields there is a copy in each object of the class; for static fields there is a single copy for the entire class of objects;

Each method is defined by a descriptor that shows method type and method modifiers which are abstract, static, etc.;

An abstract method has no implementation; a non-abstract method has an implementation defined in terms of JVM instructions.

(19)

Example of class representation

Figures 1 and 2 depict two class areas.

ClassName: GamePlayer
Superclass: java/lang/Object
Fields:
    name    dscrptr: Ljava/lang/String;    modifs: none
Methods:
    main    dscrptr: Ljava/lang/String;    modifs: public, static    -> main method implementation

Figure 1: The GamePlayer class representation

(20)

Example, continuation

ClassName: ChessPlayer
Superclass: GamePlayer
Fields:
    color      dscrptr: I    modifs: private
    piece      dscrptr: I    modifs: static
Methods:
    getMove    dscrptr: ()LMove;    modifs: public    -> getMove method implementation

Figure 2: The ChessPlayer class representation

(21)

JVM stack

JVM operates on a stack of stack frames.

A stack frame consists of three elements:

1. The operand stack, which contains the operands of the operations performed by JVM;

2. The array of local variables of the method;

3. The program counter (PC), which starts at the first instruction of the method.

(22)

Execution model

Each time a method is invoked a new stack frame is created and is pushed on the JVM stack;

When a method terminates its stack frame is popped out.

The JVM performs the loop:

while (PC.opcode != Halt) {
    Execute(PC);
    PC := Next(PC);
}
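To make the loop concrete, here is a minimal sketch of a stack machine with an operand stack, an array of local variables, and a program counter. The opcodes (HALT, PUSH, LOAD, STORE, ADD, MUL) and their numeric values are invented for illustration; they are not real JVM bytecodes.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy interpreter, assuming a made-up instruction set with one optional int operand.
final class TinyStackMachine {
    static final int HALT = 0, PUSH = 1, LOAD = 2, STORE = 3, ADD = 4, MUL = 5;

    /** Runs the code and returns the final contents of the local variable array. */
    static int[] run(int[] code, int localCount) {
        Deque<Integer> stack = new ArrayDeque<>(); // operand stack
        int[] locals = new int[localCount];        // array of local variables
        int pc = 0;                                // program counter
        while (code[pc] != HALT) {
            switch (code[pc]) {
                case PUSH:  stack.push(code[pc + 1]);           pc += 2; break;
                case LOAD:  stack.push(locals[code[pc + 1]]);   pc += 2; break;
                case STORE: locals[code[pc + 1]] = stack.pop(); pc += 2; break;
                case ADD:   stack.push(stack.pop() + stack.pop()); pc += 1; break;
                case MUL:   stack.push(stack.pop() * stack.pop()); pc += 1; break;
                default: throw new IllegalStateException("unknown opcode " + code[pc]);
            }
        }
        return locals;
    }
}
```

For instance, the program PUSH 2, PUSH 3, ADD, STORE 0, LOAD 0, PUSH 4, MUL, STORE 1, HALT leaves 5 in local 0 and 20 in local 1. No instruction names a memory address: operands are found on the stack or by local variable index.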

(23)

More on execution model

The top frame of JVM stack shows the currently executing method and is called active frame (AF);

Only the operand stack and the local variable array in the active frame can be used during JVM program execution;

Each operation performed by the JVM evaluates an expression whose operands are on the operand stack and leaves the result on the operand stack;

When a method calls another method, the PC of the caller is saved in the caller's frame; when the callee completes, its result is on top of the caller's operand stack and the caller is resumed using its saved PC and its own array of local variables.

(24)

The Heap

Each object is associated with a class (its type) in the class area and is stored in the heap.

Each object has a number of slots for storing fields; there is one slot for each nonstatic field in the class associated with the object.

Each object has a number of slots storing methods that operate on that object; there is one method for each abstract method of the class associated with the object.

(25)

Example object

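Figure 3 shows the heap representation of an object of the class ChessPlayer.

[Figure 3 content: the object holds the field color: 1 and a reference to its class ChessPlayer; the class holds pieces: 16 and a reference to its superclass GamePlayer; the name field refers to a java/lang/String object holding the player's name, Pooky C.]

Figure 3: An object of the class ChessPlayer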

(26)

Native method stacks

Native methods are methods implemented in languages other than that of the JVM;

Native methods allow the programmer to handle situations that cannot be handled completely by Java, such as interfacing with platform-dependent features or legacy code;

Native methods are executed using C-like stacks;

Native methods do not exist on all JVM implementations; moreover, different JVM implementations may have different standards for native methods;

The standard Java Native Interface, JNI, should be available for native method documentation.

(27)

Garbage collection

Each object consumes some memory from the heap;

Eventually the memory allocated to a JVM object is reclaimed;

The JVM reclaims an object's memory automatically through a process called garbage collection;

An object is ready to be garbage collected when it is no longer "alive".

(28)

Object liveness

The rules that determine whether an object is alive are:

1. If there is a reference to the object on the stack then the object is alive;

2. If there is a reference to the object in a local variable on the stack or in a static field, then the object is alive;

3. If a field of an alive object contains a reference to the object then the object is alive;

4. JVM may internally keep references to certain objects, for example to support native methods. These objects are alive.
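Taken together, the rules say that an object is alive if it can be reached from a set of roots. The sketch below assumes a simplified heap model: Node and liveSet are hypothetical names, the roots list stands for the references covered by rules 1, 2, and 4, and following fields implements rule 3. It is not how any particular JVM collector is implemented.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Toy reachability computation over a hypothetical object graph.
final class Liveness {
    /** Hypothetical heap object: a name plus the references stored in its fields. */
    static final class Node {
        final String name;
        final List<Node> fields = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    /** Returns every object reachable from the roots; everything else is garbage. */
    static Set<Node> liveSet(List<Node> roots) {
        Set<Node> live = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<Node> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            Node n = work.pop();
            if (live.add(n)) {          // first visit: the object is alive
                work.addAll(n.fields);  // rule 3: its fields keep other objects alive
            }
        }
        return live;
    }
}
```

The identity-based set makes the traversal terminate even when objects refer to each other in a cycle.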

(29)

Verification process

Ensures that class files follow certain rules;

Allows JVM to assume that a class has certain safety properties and to make optimizations based on this;

Makes it possible to safely download Java applets from Internet;

The Java compiler generates correct code. However, a JVM programmer can bypass the compiler's restrictions; the verification algorithm checks for this.

(30)

How does it work?

It asks questions about CFF, such as:

Is it a structurally valid class?

Are all constant references correct?

Are all instructions valid?

Will stack and locals contain values of appropriate type?

Do the classes used really exist, and are they correct?

(31)

JVM machine language syntax

Level 0:

byte codes,

indices in CFF (integers),

indices in the array of local variables,

constant tags.

Level 1:

constants and instructions;

Level 2:

Class File Format, CFF.

(32)

JVM codes

1. The JVM uses Unicode character codes (rather than ASCII or EBCDIC). The Unicode Consortium manages these codes;

2. Unicode was designed so that it can accommodate any known character set used by people's alphabets;

3. The Unicode Transformation Formats UTF-8, UTF-16, and UTF-32 represent Unicode characters using units of one byte, two bytes (half-word), and four bytes (word), respectively; in UTF-8 and UTF-16 a character may take more than one unit.
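A short sketch can show how many bytes a character takes in UTF-8 and UTF-16. EncodedLengths and its methods are hypothetical names; the big-endian UTF-16 charset is used so that no byte-order mark is counted.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper measuring the encoded size of a string.
final class EncodedLengths {
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    static int utf16Length(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        // "A", e-acute (U+00E9), euro sign (U+20AC), and an emoji (U+1F600).
        String[] samples = {"A", "\u00e9", "\u20ac", "\uD83D\uDE00"};
        for (String s : samples) {
            System.out.println(utf8Length(s) + " bytes in UTF-8, "
                    + utf16Length(s) + " bytes in UTF-16");
        }
    }
}
```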

(33)

Constant tags

Table 1: Constant tags

Tag  Type      Format   Interpretation
1    UTF8      2+n      First 2 bytes encode length n, followed by n bytes of the text of the constant
2    undefined
3    Integer   4 bytes  Text of a signed integer
4    Float     4 bytes  Text of IEEE 754 floating-point number
5    Long      8 bytes  Text of long signed integer
6    Double    8 bytes  Text of IEEE 754 double-precision number
7    Class     2 bytes  Reference to class name, a UTF8 constant
8    String    2 bytes  Reference to string name, a UTF8 constant
9    FieldRef  4 bytes  First 2 show a Class constant, second 2 a NameAndType constant (tag 12 below)

(34)

Constant tags, continuation

Table 2: Constant tags

Tag  Type                Format   Interpretation
10   MethodRef           4 bytes  Same as FieldRef
11   InterfaceMethodRef  4 bytes  Same as FieldRef
12   NameAndType         4 bytes  First 2 point to name, second 2 point to descriptor. Both are UTF8 constants

(35)

Is CFF structurally valid?

The first 4 bytes of the CFF must contain the hex values CA FE BA BE, which form the magic number;

Following the magic number are the minor and major versions; each takes two bytes interpreted as a 16-bit unsigned integer.

Example:

JDK 1.0, 1.1: Major = 0x2D (45), Minor = 0x3 (3);

Java 2: Major = 0x2E (46), Minor = 0; if Major = 45 then Minor > 3.

Figure 4 shows the structure of a CFF

(36)

Structure of the CFF

Magic# Minor Major CnstPool Class Super Interface Fields Methods

Figure 4: Structure of a properly formatted CFF
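The check of the magic number and the two version fields can be sketched as follows. ClassFileHeader is a hypothetical name; the code only inspects the first eight bytes and does not validate the rest of the CFF.

```java
import java.nio.ByteBuffer;

// Hypothetical reader for the first eight bytes of a class file.
final class ClassFileHeader {
    final int minor;
    final int major;

    private ClassFileHeader(int minor, int major) {
        this.minor = minor;
        this.major = major;
    }

    static ClassFileHeader parse(byte[] bytes) {
        if (bytes.length < 8) {
            throw new IllegalArgumentException("truncated class file");
        }
        ByteBuffer in = ByteBuffer.wrap(bytes); // ByteBuffer is big-endian by default
        if (in.getInt() != 0xCAFEBABE) {
            throw new IllegalArgumentException("bad magic number");
        }
        int minor = in.getShort() & 0xFFFF; // mask to read as 16-bit unsigned
        int major = in.getShort() & 0xFFFF;
        return new ClassFileHeader(minor, major);
    }
}
```

Applied to the bytes CA FE BA BE 00 03 00 2D, parse yields minor 3 and major 45.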

(37)

More on CFF structure

Most sections begin with a count, which is a two-byte unsigned integer, followed by count instances of some pattern of bytes;

Example: (see tags in Tables 1, 2)

1. The constant pool starts with a count followed by the constant patterns; constants are numbered from 1, so for a count of n the last valid index is n - 1;

2. Each constant pattern consists of a one-byte tag and a number of bytes on which the constant is written;

3. The tag describes the kind of constant that follows and how many bytes it takes;

4. If any tag is invalid, or the file ends before the correct number of constants is found, then the CFF is rejected.
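A structural scan of the constant pool along these lines might look like the sketch below. ConstantPoolScanner is a hypothetical name. The code relies on two facts from the JVM specification that the slides do not spell out: valid indices run from 1 to count - 1, and a Long or Double constant occupies two index slots.

```java
import java.nio.ByteBuffer;

// Hypothetical scanner: checks tags and lengths, records the tag of each slot.
final class ConstantPoolScanner {
    /** Expects the buffer positioned at the constant pool count. */
    static int[] scan(ByteBuffer in) {
        int count = readU2(in);
        int[] tags = new int[count];          // slot 0 is unused and stays 0
        for (int i = 1; i < count; i++) {
            require(in, 1);
            int tag = in.get() & 0xFF;
            tags[i] = tag;
            switch (tag) {
                case 1:  skip(in, readU2(in)); break;        // UTF8: length n, then n bytes
                case 3: case 4: skip(in, 4); break;          // Integer, Float
                case 5: case 6: skip(in, 8); i++; break;     // Long, Double take two slots
                case 7: case 8: skip(in, 2); break;          // Class, String
                case 9: case 10: case 11: case 12: skip(in, 4); break; // refs, NameAndType
                default: throw new IllegalArgumentException("invalid constant tag " + tag);
            }
        }
        return tags;
    }

    private static int readU2(ByteBuffer in) {
        require(in, 2);
        return in.getShort() & 0xFFFF;
    }

    private static void skip(ByteBuffer in, int n) {
        require(in, n);
        in.position(in.position() + n);
    }

    private static void require(ByteBuffer in, int n) {
        if (in.remaining() < n) {
            throw new IllegalArgumentException("file ends before the constant pool is complete");
        }
    }
}
```

Both rejection cases from item 4 surface as an IllegalArgumentException: an unknown tag hits the default branch, and a truncated file fails one of the require checks.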

(38)

Check constant references

Class and String constants must have references to UTF8 constants;

FieldRef, MethodRef, and InterfaceMethodRef constants must have a class index that is a Class constant and a name-and-type index that is a NameAndType constant;

NameAndType constants must have two indices pointing to UTF8 constants.

(39)

Example JVM code

Figure 5 shows a portion of the code:

.class Foo
.super Bar
.implements Baz
.field field1 LFoo;
.method isEven (I)Z
; ...
.end method

(40)

Constant pool:

Bytes              Index  Interpretation
. . .
1 0                       Constant pool count, 16^2 = 256
1 0 6 i s E v e n  1      UTF8 isEven
1 0 4 ( I ) B      2      UTF8 (I)B
1 0 6 f i e l d 1  3      UTF8 field1
1 0 5 L F o o ;    4      UTF8 LFoo;
1 0 3 B a z        5      UTF8 Baz
7 0 5              6      Class: name index = 5
1 0 3 B a r        7      UTF8 Bar
7 0 7              8      Class: name index = 7
1 0 3 F o o        9      UTF8 Foo
7 0 9              10     Class: name index = 9

Class body:

Bytes  Interpretation
0 A    This class index (10 = Foo)
0 8    Superclass index (8 = Bar)
0 1    Interface count = 1
0 6    Interface index (6 = Baz)
0 1    Fields count = 1
0 0    There are no field flags
0 3    Field name index (3 = field1)
0 4    Field descriptor index (4 = LFoo;)
0 0    Field attributes count = 0
0 1    Method count = 1
0 0    There are no method flags
0 1    Method name index (1 = isEven)
0 2    Method descriptor index (2 = (I)B)
0 1    Method attributes count
. . .  Method attributes

Figure 5:

(41)

Are all instructions valid?

Once we know that the overall class structure is valid, we can look at method bodies to check if the instructions are correctly formatted.

(42)

Problem to be solved

Does each instruction begin with a recognized opcode?

If the instruction takes a constant pool reference as an argument, does it point to an actual constant pool entry with the correct type?

If the instruction uses a local variable, is the local variable index within the correct range?

If the instruction is a branch, does it point to the beginning of an instruction?

(43)

A closer look at CFF

Consider the Java "hello world" program:

public class hello {
    public static void main(String argv[]) {
        System.out.println("Hello, world");
    }
}

Note: the file hello.java, containing this program, is mapped by the Java compiler (javac hello.java) into the CFF file hello.class, which is interpreted by the JVM.

To understand CFF we look at the file hello.class

(44)

Notation

Represent the CFF in three columns:

1. Left column: offset, in hex, into CFF

2. Middle column: bytes at the offset location in hex

3. Right column: interpretation of the middle column by JVM

(45)

Example

File header

000000  cafebabe  Magic = ca fe ba be
000004  0003      Minor version = 3
000006  002d      Major version = 2*16 + 13 = 45

(46)

Constant pool

000008  0020            Constant pool count = 2 * 16 = 32 (entries are numbered 1 to 31)
00000a  08001f          1: a string at index 16 + 15 = 31 in constant pool
00000d  07001d          2: a class name at index 16 + 13 = 29 in constant pool
000010  070018          3: a class name at index 16 + 8 = 24 in constant pool
000013  07000e          4: a class name at index 14 in constant pool
000016  070013          5: a class name at index 19 in constant pool
000019  090002000a      6: FieldRef: class index 2, name-and-type index 10
00001e  0a00040009      7: MethodRef: class index 4, name-and-type index 9
000023  0a0003000b      8: MethodRef: class index 3, name-and-type index 11
000028  0c000c0017      9: NameAndType: name index 12, descriptor index 23
00002d  0c0016001c      10: NameAndType: name index 22, descriptor index 28
000032  0c001b001e      11: NameAndType: name index 27, descriptor index 30
000037  010007          12: UTF8, length 7
00003a  7072696e746c6e  println

(47)

Constant pool, continuation

000041 01000d 13: UTF8, length 13

000044 436f6e7374616e7456616c7565 ConstantValue

000051 010013 14: UTF8, length 19

000054 6a6176612f696f2f5072696e74537472 java/io/PrintStream

000067 01000a 15: UTF8, length 10

00006a 457863657074696f6e73 Exceptions

000074 01000a 16: UTF8, length 10

000077 68656c6c6f2e6a617661 hello.java

000081 01000f 17: UTF8, length 15

000084 4c696e654e756d6265725461626c65 LineNumberTable

000093 01000a 18: UTF8, length 10

000096 536f7572636546696c65 SourceFile

0000a0 010005 19: UTF8, length 5

0000a3 68656c6c6f hello

(48)

Constant pool, continuation

0000a8 01000e 20: UTF8, length 14

0000ab 4c6f63616c5661726961626c6573 LocalVariables

0000b9 010004 21: UTF8, length 4

0000bc 436f6465 Code

0000c0 010003 22: UTF8, length 3

0000c3 6f7574 out

0000c6 010015 23: UTF8, length 21

0000c9 284c6a6176612f6c616e672f53747269 (Ljava/lang/String;)V

0000de 010010 24: UTF8, length 16

0000e1 6a6176612f6c616e672f4f626a656374 java/lang/Object

0000f1 010004 25: UTF8, length 4

0000f4 6d61696e main

0000f8 010016 26: UTF8, length 22

0000fb 285b4c6a6176612f6c616e672f537472 ([Ljava/lang/String;)V

(49)

Constant pool, continuation

000111 010006 27: UTF8, length 6

000114 3c696e69743e <init>

00011a 010015 28: UTF8, length 21

00011d 4c6a6176612f696f2f5072696e745374 Ljava/io/PrintStream;

000132 010010 29: UTF8, length 16

000135 6a6176612f6c616e672f53797374656d java/lang/System

000145 010003 30: UTF8, length 3

000148 282956 ()V

00014b 01000c 31: UTF8, length 12

00014e 48656c6c6f2c20776f726c64 Hello, world

(50)

Constant entries

The first constants are strings codified as UTF8 entries;

Strings are followed by small constants, 3, 4, 5, etc. (of which there are none in the example), codified on a byte;

These are followed by integer and long constants codified as two's complement signed integers on 32 and 64 bits, respectively;

Float and double constants are codified as shown in Table 3.

(51)

Other fields

Fields, Methods, and Class entries:

Constants with tags 9, 10, and 11 have identical formats. They are used to refer to fields and methods in field and method instructions such as getfield, putstatic, invokevirtual.

Example: constant 7 in constant pool is 0a 0004 0009 i.e:

1. 0a = 10, it is a MethodRef

2. Class containing the method is at index 4 whose name is at index 14, i.e., java/io/PrintStream

3. Name and descriptor is at index 9: name index 12 (println), descriptor index 23 [(Ljava/lang/String;)V]

This is enough info to call the method; Constant 7 is used to code the arguments of Oolong instructions
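The chain of references followed in this example can be sketched with a small in-memory model of the constant pool. PoolResolver and its methods are hypothetical; each entry is stored either as UTF8 text or as the list of indices it refers to.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical constant pool model: index -> String (UTF8) or int[] (referenced indices).
final class PoolResolver {
    private final Map<Integer, Object> pool = new HashMap<>();

    void utf8(int index, String text) { pool.put(index, text); }
    void refs(int index, int... indices) { pool.put(index, indices); }

    private String text(int index) { return (String) pool.get(index); }
    private int[] indices(int index) { return (int[]) pool.get(index); }

    /** Resolves a FieldRef or MethodRef to class, member name, and descriptor. */
    String resolveMember(int index) {
        int[] ref = indices(index);   // {class index, name-and-type index}
        int[] cls = indices(ref[0]);  // {class name index}
        int[] nat = indices(ref[1]);  // {name index, descriptor index}
        return text(cls[0]) + "." + text(nat[0]) + text(nat[1]);
    }

    public static void main(String[] args) {
        PoolResolver p = new PoolResolver();
        p.refs(7, 4, 9);                      // MethodRef
        p.refs(4, 14);                        // Class
        p.refs(9, 12, 23);                    // NameAndType
        p.utf8(14, "java/io/PrintStream");
        p.utf8(12, "println");
        p.utf8(23, "(Ljava/lang/String;)V");
        System.out.println(p.resolveMember(7));
    }
}
```

With the entries of the hello example loaded as in main, constant 7 resolves to java/io/PrintStream.println(Ljava/lang/String;)V.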

(52)

Class information

Following the constant pool is the information about the class itself, which consists of name, type, and access flags, as seen below.

Example: hello.java

00015b  0021  two bytes, access flags = 33
00015d  0005  two bytes, index of this in constant pool, 5
00015f  0003  two bytes, index of super in constant pool, 3
000161  0000  two bytes, number of interfaces, 0

(53)

Access flags

are interpreted as a bit-vector as seen below:

Bit   Name           Meaning
1     ACC_PUBLIC     The class is public
2-4   Not used
5     ACC_FINAL      The class is final
6     ACC_SUPER      Superclass methods are treated specially by invokespecial
7-9   Not used
10    ACC_INTERFACE  The class is an interface
11    ACC_ABSTRACT   The class is abstract
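Decoding the bit-vector amounts to testing one mask per flag. In the sketch below, ClassAccessFlags is a hypothetical name, while the hexadecimal masks are the values the JVM specification assigns to the class flags.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical decoder for the two-byte class access flags.
final class ClassAccessFlags {
    static List<String> decode(int flags) {
        List<String> names = new ArrayList<>();
        if ((flags & 0x0001) != 0) names.add("ACC_PUBLIC");
        if ((flags & 0x0010) != 0) names.add("ACC_FINAL");
        if ((flags & 0x0020) != 0) names.add("ACC_SUPER");
        if ((flags & 0x0200) != 0) names.add("ACC_INTERFACE");
        if ((flags & 0x0400) != 0) names.add("ACC_ABSTRACT");
        return names;
    }

    public static void main(String[] args) {
        System.out.println(decode(0x0021)); // flags of the hello class
    }
}
```

For the hello class, the access flags 0x0021 (33) decode to ACC_PUBLIC and ACC_SUPER.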

(54)

Fields and Methods

After the class information come four bytes that describe the number of fields and methods. In our example they are:

000163  0000  Number of fields is zero
000165  0002  There are two methods in this class

Fields and methods have identical formats.

000167  0009  access flags of the method = 9
000169  0018  name of the method is index 24 in constant pool (main)
00016b  001a  descriptor of the method has index 26 in constant pool

(55)

Method access flags

Are specified in the table:

Bit  Name              Meaning
1    ACC_PUBLIC        The field/method is public
2    ACC_PRIVATE       The field/method is private
3    ACC_PROTECTED     The field/method is protected
4    ACC_STATIC        The field/method is static
5    ACC_FINAL         The field/method is final
6    ACC_SYNCHRONIZED  The method is synchronized
7    ACC_VOLATILE      The field is volatile
8    ACC_TRANSIENT     The field is transient
9    ACC_NATIVE        The method is native
10   Unused
11   ACC_ABSTRACT      The method is abstract

(56)

Attributes

After the general method or field information, the CFF contains a list of attributes.

Fields and methods have different kinds of attributes. Methods have a single attribute giving the implementation of the method; most fields have no attributes at all.

Only the ConstantValue attribute is defined for fields.

Attributes for the methods are represented as shown below.

(57)

Attributes for methods

00016d  0001      1 method attribute: method attribute 0 follows
00016f  0015      name: at index 21 in constant pool, Code
000171  00000025  Length of the attribute is 37 bytes
000175  0002      Maximum stack is 2 slots
000177  0001      Maximum space for locals is 1

(58)

The actual byte code

Disp.   Bytecode  Addr  Interpretation
000179  00000009        Code length: 9 bytes
00017d  b20006    0000  getstatic #6, index 6 in constant pool
000180  1201      0003  ldc #1, index 1 in constant pool
000182  b60007    0005  invokevirtual #7, index 7 in constant pool
000185  b1        0008  return

Note: code of length up to 4G bytes (2^32) is allowed; however, other constraints limit code size to 64K.
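The interpretation column can be reproduced by a very small disassembler. The sketch below is hypothetical (TinyDisassembler is not part of any tool) and knows only the six opcodes that occur in the hello example; the opcode values themselves are the standard JVM ones.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical disassembler covering only the opcodes used in hello.class.
final class TinyDisassembler {
    static List<String> disassemble(byte[] code) {
        List<String> out = new ArrayList<>();
        int pc = 0;
        while (pc < code.length) {
            int opcode = code[pc] & 0xFF;
            switch (opcode) {
                case 0x12: out.add(pc + ": ldc #" + (code[pc + 1] & 0xFF)); pc += 2; break;
                case 0x2a: out.add(pc + ": aload_0"); pc += 1; break;
                case 0xb1: out.add(pc + ": return"); pc += 1; break;
                case 0xb2: out.add(pc + ": getstatic #" + u2(code, pc + 1)); pc += 3; break;
                case 0xb6: out.add(pc + ": invokevirtual #" + u2(code, pc + 1)); pc += 3; break;
                case 0xb7: out.add(pc + ": invokespecial #" + u2(code, pc + 1)); pc += 3; break;
                default: throw new IllegalArgumentException("unrecognized opcode " + opcode);
            }
        }
        return out;
    }

    /** Reads a two-byte big-endian unsigned constant pool index. */
    private static int u2(byte[] code, int at) {
        return ((code[at] & 0xFF) << 8) | (code[at + 1] & 0xFF);
    }
}
```

Given the nine code bytes b2 00 06 12 01 b6 00 07 b1, it lists getstatic #6 at address 0, ldc #1 at 3, invokevirtual #7 at 5, and return at 8, matching the Addr column above.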

(59)

Observations

1. There are two forms of the ldc instruction, ldc and ldc_w: ldc requires a one-byte argument interpreted as an index 0..255 in the constant pool; ldc_w requires a two-byte argument that may refer to any constant;

2. In either case the constant pool entry must be an Integer, Float, or String; Long and Double constants are loaded with ldc2_w.

(60)

Exception table

Following byte code is an exception table entry which begins with two-byte count, the number of entries:

000186 0000 there are no exceptions in this method

Note: following the exception handler table, the code attribute may have attributes of its own, such as debugging info.

(61)

Main method

The main method has one attribute, LineNumberTable:

000188  0001      1 code attribute: code attribute 0 follows
00018a  0011      Name: index 17 in constant pool: LineNumberTable
00018c  0000000a  Length of attribute: 10
000190  0002      Number of entries: 2
000192  0000      Start PC: 0
000194  0005      Line number: 5
000196  0008      Start PC: 8
000198  0003      Line number: 3

(62)

Method 1

Starts after the code attribute of method 0

00019a  0000      Access flags = 0
00019c  001c      Name: index 28 in constant pool (<init>)
00019e  001e      Descriptor: index 30 in constant pool, ()V
0001a0  0001      1 method attribute: method attribute 0
0001a2  0015      Name: index 21 in constant pool (Code)
0001a4  0000001d  Length of the attribute: 29
0001a8  0001      Maximum stack: 1
0001aa  0001      Maximum locals: 1
0001ac  00000005  Code length: 5
0001b0  2a        00000000  aload_0
0001b1  b70008    00000001  invokespecial #8
0001b4  b1        00000004  return

(63)

Method 1, continuation

0001b5  0000       0 exception table entries
0001b7  0001       1 code attribute: code attribute 0:
0001b9  0011       Name: index 17 in constant pool (LineNumberTable)
0001bb  00000006   Length of attribute: 6
0001bf  0001       Length of table: 1
0001c1  0000 0001  Start PC: 0, Line number: 1

(64)

Class attributes

CFF ends with a list of class attributes

A class can have any attributes it wants but only SourceFile attribute is defined in Java specification

0001c5  0001      1 class file attribute; attribute 0:
0001c7  0012      Name: index 18 in constant pool (SourceFile)
0001c9  00000002  Length: 2 bytes
0001cd  0010      Name: index 16 in constant pool (hello.java)
