Formulas (P)
Code   Symbol          Grammar
0x00   true
0x01   false
0x02   ∧               P P
0x03   ∨               P P
0x04   ⇒               P P
0x05   ¬               P
0x06   ∀               bv P
0x07   ∃               bv P
0x10   =               E E
0x11   >               E E
0x12   <               E E
0x13   ≤               E E
0x14   ≥               E E
0x16   <:              type(ident) type(ident)
0x17   ≠               E E
Expressions (E)
Code   Symbol          Grammar
0x20   +               E E
0x21   −               E E
0x22   ∗               E E
0x23   /               E E
0x24   %               E E
0x25   −               E
0x40   int constant    i
0x50   type            ident
0x51   elemtype        E
0x52   result
0x53   typeof          E
0x54   TYPE
0x55   old
0x61   arrAccess(,)    E E
0x63   .               E E
0x70   this
0x80   null
0x90   FieldConstRef   ident
0xA0   Reg             digits
0xD0   EXC
0xE0   bv              intLiteral
Modifies (modLocation)
Code   Symbol          Grammar
0xD2   nothing
0xD3   everything
0xD4   arrayModAt(,)   E specIndex
0xD5   .               E FieldConstRef
0xD6   Reg             ident
Index of modified array elements (specIndex)
Code   Symbol          Grammar
0xE1   all
0xE2   ...             E E
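The tables above define a prefix encoding: each opcode is followed by the encodings of its operands in the order given by the Grammar column. As a hedged illustration, the following Python sketch renders such an encoding for a subset of the codes. It assumes that the input is already split into tokens and that literal operands (the integer after 0x40 and the bound-variable index after 0xE0) appear as plain Python ints directly after their opcode; the byte widths of operands are not specified here, and the names `OPCODES` and `decode` are hypothetical.

```python
# Hypothetical renderer for a subset of the prefix encoding in the tables.
# Assumption: tokens are already split; the literal after 0x40 (int constant)
# and after 0xE0 (bound variable) is a plain Python int.

# opcode -> (symbol, operand kinds); kinds are documentation only and are
# not type-checked: "P" = formula, "E" = expression, "lit" = literal token.
OPCODES = {
    0x00: ("true", []),
    0x01: ("false", []),
    0x02: ("∧", ["P", "P"]),
    0x03: ("∨", ["P", "P"]),
    0x04: ("⇒", ["P", "P"]),
    0x05: ("¬", ["P"]),
    0x10: ("=", ["E", "E"]),
    0x11: (">", ["E", "E"]),
    0x12: ("<", ["E", "E"]),
    0x13: ("≤", ["E", "E"]),
    0x14: ("≥", ["E", "E"]),
    0x17: ("≠", ["E", "E"]),
    0x20: ("+", ["E", "E"]),
    0x21: ("−", ["E", "E"]),
    0x22: ("∗", ["E", "E"]),
    0x25: ("−", ["E"]),
    0x40: ("int", ["lit"]),
    0x70: ("this", []),
    0x80: ("null", []),
    0xE0: ("bv", ["lit"]),
}


def decode(tokens, pos=0):
    """Render the term starting at tokens[pos]; return (text, next_pos)."""
    op = tokens[pos]
    symbol, kinds = OPCODES[op]
    pos += 1
    if op == 0x40:                      # int constant: print the literal
        return str(tokens[pos]), pos + 1
    if op == 0xE0:                      # bound variable: print bv<index>
        return f"bv{tokens[pos]}", pos + 1
    args = []
    for _ in kinds:                     # decode each operand recursively
        text, pos = decode(tokens, pos)
        args.append(text)
    if not args:                        # nullary: true, false, this, null
        return symbol, pos
    if len(args) == 1:                  # unary: ¬ and unary minus
        return f"{symbol}{args[0]}", pos
    return f"({args[0]} {symbol} {args[1]})", pos


# (0 ≤ bv1) ∧ ¬(bv1 = 0), encoded in prefix order
formula = [0x02, 0x13, 0x40, 0, 0xE0, 1, 0x05, 0x10, 0xE0, 1, 0x40, 0]
text, end = decode(formula)
```

The quantifiers 0x06 and 0x07 would need their own rendering case (binder followed by body), which is omitted here.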
Appendix B
Proofs of properties from Section 7.1.2
Lemma 7.1.2.1. For any statement or expression $SE$ which does not terminate with return or athrow, start label $s$ and end label $e$, the compiler produces a list of bytecode instructions $\ulcorner s, SE, e \urcorner_m$ such that instruction $e+1$ may execute after $e$, i.e. $e \longrightarrow e+1$.
Proof: The proof is by structural induction over the compiled statement. We sketch the case for the compositional statement, the other cases being similar. Recall that, by the compiler definition, the compilation of $S_1;S_2$ starting at index $s$ is
$$\ulcorner s, S_1;S_2, e \urcorner_m = \ulcorner s, S_1, e' \urcorner_m;\ \ulcorner e'+1, S_2, e \urcorner_m.$$
By the induction hypothesis, the lemma holds for $\ulcorner e'+1, S_2, e \urcorner_m$, hence $e \longrightarrow e+1$, which means that this case holds.
Qed.
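The execution relation $i \longrightarrow j$ used in Lemma 7.1.2.1 can be sketched as a successor function over a toy instruction set. In the following Python sketch, instructions are tuples such as ("goto", t) or ("ifge", t) with absolute jump targets; the tuple encoding and the helpers `successors` and `falls_through` are assumptions, and exceptional control flow (athrow, exception handlers) is deliberately not modeled. Under these assumptions, the lemma says that the last instruction $e$ of a compilation which does not end in return or athrow has $e+1$ among its successors.

```python
# Toy model of the execution relation i --> j (possible successors of i).
# Assumptions: instructions are tuples, jump targets are absolute indexes,
# and exceptional control flow (athrow, handlers) is not modeled.
CONDITIONAL = {"ifeq", "ifne", "iflt", "ifge", "ifgt", "ifle"}


def successors(code, i):
    """Return the set of indexes that may execute right after code[i]."""
    op = code[i][0]
    if op in ("return", "athrow"):
        return set()                    # no intraprocedural successor here
    if op == "goto":
        return {code[i][1]}
    if op in CONDITIONAL:
        return {i + 1, code[i][1]}      # fall through or take the jump
    return {i + 1}                      # sequential instruction


def falls_through(code, e):
    """Lemma 7.1.2.1 in this toy model: does e --> e+1 hold?"""
    return e + 1 in successors(code, e)


# Hypothetical instruction list; only its control flow matters here.
example = [("iload", 1), ("ifge", 3), ("ineg",), ("nop",), ("return",)]
```

Here the nop at index 3 falls through to index 4, while the return at index 4 has no successor in this model.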
In what follows, we need several auxiliary lemmas that allow us to prove the statements from Section 7.1.2. The next lemma states that every jump instruction in the compilation of a statement targets an instruction which is also in the compilation of the statement. For illustration, we may return to the example in Fig. 7.6 on page 101 and focus on the compilation of the if statement if (i >= 0) then {v = i} else {v = −i}, which comprises instructions 4 to 12. Sequential instructions have the next instruction as their successor, so they respect this condition; only jump instructions may transfer control outside the compilation of the if statement. The compilation contains two jump instructions: instruction 5, ifge 10, which jumps inside the compilation of the if statement, and instruction 9, goto 12, which also jumps inside. Thus, the compilation of the if statement respects the property.

Lemma B.1 (Jumps in statement compilation target instructions inside the statement compilation). For any statement or expression $SE$, the compiler produces a list of bytecode instructions $\ulcorner s, SE, e \urcorner_m$ such that no jump instruction ($\mathtt{goto}$ or $\mathtt{if}_{cond}$) in the compilation transfers control outside the region of the compilation $\ulcorner s, SE, e \urcorner_m$.
Proof: The proof is done by contradiction and uses structural induction. We sketch the proof for the compilation of the if statement, the rest of the cases being similar.
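Suppose that this is not true. Recall that the compilation of a conditional statement starting at index $s$ (Fig. 7.2 on page 98) is of the form:
$$\begin{aligned}
\ulcorner s, \mathtt{if}(E_1\ cond\ E_2)\ \mathtt{then}\{S_1\}\ \mathtt{else}\{S_2\}, e \urcorner_m = {} & \ulcorner s, E_1, e' \urcorner_m\ \ulcorner e'+1, E_2, e'' \urcorner_m \\
& e''+1 : \mathtt{if}_{cond}\ e'''+2 \\
& \ulcorner e''+2, S_2, e''' \urcorner_m \\
& e'''+1 : \mathtt{goto}\ e \\
& \ulcorner e'''+2, S_1, e-1 \urcorner_m \\
& e : \mathtt{nop}
\end{aligned}$$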
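Because, by the induction hypothesis, every jump in $\ulcorner e'''+2, S_1, e-1 \urcorner_m$ targets an instruction inside $\ulcorner e'''+2, S_1, e-1 \urcorner_m$, no jump inside $\ulcorner e'''+2, S_1, e-1 \urcorner_m$ can target an instruction outside the compilation of the conditional statement. Similarly, such a jump cannot be contained in $\ulcorner e''+2, S_2, e''' \urcorner_m$, nor in the compilations of $E_1$ and $E_2$. The only jumps that remain possible are $e''+1 : \mathtt{if}_{cond}\ e'''+2$ and $e'''+1 : \mathtt{goto}\ e$. But neither of them leaves the region, since their targets $e'''+2$ and $e$ both lie between $s$ and $e$. This contradicts the assumption, so the lemma holds for this case.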
Qed.
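The compilation scheme for the conditional statement and the property of Lemma B.1 can be checked on small examples with the following Python sketch. It is a hypothetical toy compiler, not the compiler of Fig. 7.2: expressions are a single variable or integer constant compiled to one load or push instruction, the condition is fixed to >= (emitted as if_icmpge), and an assignment compiles to a push or load followed by a store. It follows the layout of the scheme above ($E_1$, $E_2$, the conditional jump to the then branch, the else branch $S_2$, goto $e$, the then branch $S_1$, and the final nop) and then checks that every jump target lies in $[s, e]$.

```python
# Hypothetical toy compiler following the layout of the if-compilation
# scheme: E1; E2; if_cond -> then branch; else branch; goto end; then; nop.
# Statements: ("assign", var, expr), ("seq", S1, S2), and
# ("if", E1, E2, S1, S2) meaning if (E1 >= E2) then {S1} else {S2}.
# Expressions: an int constant or a variable name.

def compile_expr(code, expr):
    """Append the code of a constant or variable; return its end index."""
    code.append(("push", expr) if isinstance(expr, int) else ("load", expr))
    return len(code) - 1


def compile_stmt(code, stmt):
    """Append the code of stmt, starting at len(code); return end index e."""
    kind = stmt[0]
    if kind == "assign":
        compile_expr(code, stmt[2])
        code.append(("store", stmt[1]))
    elif kind == "seq":
        compile_stmt(code, stmt[1])     # <s, S1, e'>
        compile_stmt(code, stmt[2])     # <e'+1, S2, e>
    elif kind == "if":
        _, e1, e2, then_s, else_s = stmt
        compile_expr(code, e1)          # <s, E1, e'>
        compile_expr(code, e2)          # <e'+1, E2, e''>
        branch = len(code)              # e''+1: if_icmpge e'''+2, patched below
        code.append(None)
        compile_stmt(code, else_s)      # <e''+2, S2, e'''>
        jump = len(code)                # e'''+1: goto e, patched below
        code.append(None)
        code[branch] = ("if_icmpge", len(code))   # e'''+2 = start of S1
        compile_stmt(code, then_s)      # <e'''+2, S1, e-1>
        code.append(("nop",))           # e: nop
        code[jump] = ("goto", len(code) - 1)
    return len(code) - 1


def jumps_stay_inside(code, s, e):
    """Lemma B.1 check: every jump in code[s..e] targets an index in [s, e]."""
    return all(s <= ins[1] <= e for ins in code[s:e + 1]
               if ins[0] in ("goto", "if_icmpge"))


# Two unrelated instructions first, so the if statement starts at s = 2.
code = [("nop",), ("nop",)]
stmt = ("if", "i", 0, ("assign", "v", "i"), ("assign", "v", -1))
e = compile_stmt(code, stmt)
```

With this input, the conditional jump at index 4 targets the then branch at 8 and the goto at index 7 targets the final nop at 10, both inside $[2, 10]$.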
The next property of the compiler is that any statement or expression is compiled into a list of bytecode instructions such that the instructions inside the compilation cannot be targeted by instructions outside it, except for the first instruction of the compilation. For instance, the instructions 15 to 30 in the compilation of the while statement in Fig. 7.6 on page 101 can be reached from outside the statement compilation only by passing through the instruction at index 15.
Lemma B.2 (Compilation of statements and expressions cannot be jumped into from outside). Let $SE'$ and $SE$ be statements or expressions such that $SE'$ is a substatement of $SE$ ($SE[SE']$), with compilations $\ulcorner s, SE, e \urcorner_m$ and $\ulcorner s', SE', e' \urcorner_m$. Let $j$ be an instruction index in the compilation of $SE$ ($j \in \ulcorner s, SE, e \urcorner_m$) which is not in the compilation of $SE'$ ($j \notin \ulcorner s', SE', e' \urcorner_m$), and let $k$ be an instruction index in $\ulcorner s', SE', e' \urcorner_m$. If $k$ may execute after $j$ ($j \longrightarrow k$), then $k$ is the first instruction in the compilation of $SE'$, i.e. $k = s'$.
Proof: The proof is by induction over the structure of statements and expressions. We sketch here the case for the compositional statement; the other cases are similar and use Lemma B.1 and Lemma 7.1.2.1. The compositional statement $S_1;S_2$ has the compilation $\ulcorner s, S_1;S_2, e \urcorner_m$, which by the compiler definition is $\ulcorner s, S_1, e' \urcorner_m\ \ulcorner e'+1, S_2, e \urcorner_m$. By the induction hypothesis, the lemma holds both for $\ulcorner s, S_1, e' \urcorner_m$ and for $\ulcorner e'+1, S_2, e \urcorner_m$. It remains to show that there are no jumps from the compilation of $S_1$ into the compilation of $S_2$ and vice versa. Both directions follow from Lemma B.1, which states that all jumps in a statement compilation target instructions inside it. Moreover, from Lemma 7.1.2.1 we have $e' \longrightarrow e'+1$, which conforms to the statement, since $e'+1$ is the first instruction of $\ulcorner e'+1, S_2, e \urcorner_m$. Thus, the case for the compositional statement holds.
Qed.
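Lemma B.2 can also be phrased as a check over a concrete instruction list: collect every edge $j \longrightarrow k$ whose source lies outside a region $[s', e']$ and whose target lies inside it, and verify that each such target is $s'$. The following Python sketch does this for a toy instruction encoding (tuples with absolute jump targets, no exceptional edges); the helper names and the hand-written example program are assumptions made for illustration.

```python
# Hypothetical checker for Lemma B.2 on a toy instruction list.
# Assumptions: tuples ("goto", t), ("ifge", t), ("return",) or sequential
# instructions; exceptional edges are not modeled.

def successors(code, i):
    op = code[i][0]
    if op in ("return", "athrow"):
        return set()
    if op == "goto":
        return {code[i][1]}
    if op == "ifge":
        return {i + 1, code[i][1]}
    return {i + 1}


def entries_from_outside(code, s, e):
    """Targets k in [s, e] of edges j --> k whose source j is outside [s, e]."""
    return {k for j in range(len(code)) if not s <= j <= e
            for k in successors(code, j) if s <= k <= e}


def only_first_is_entered(code, s, e):
    """True if the region [s, e] can only be entered through s."""
    return entries_from_outside(code, s, e) <= {s}


# Loop-shaped program: the body [2, 3] is entered only through index 2,
# whereas the arbitrary range [3, 4] is entered at both 3 and 4.
program = [("iload", "i"), ("goto", 4), ("iinc", "i"), ("nop",),
           ("iload", "i"), ("ifge", 2), ("return",)]
```

A range such as $[3, 4]$ fails the check, which is consistent with it not being the compilation of any single statement.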
Lemma 7.1.2.2 (Compilation of expressions). For any expression $E$, starting label $s$ and end label $e$, the compilation $\ulcorner s, E, e \urcorner_m$ is a block of bytecode instructions in the sense of Def. 7.1.2.1.
Proof:
Following Def. 7.1.2.1 of a block of bytecode instructions, we have to check that the compilation of an expression respects three conditions. The first condition of Def. 7.1.2.1 states that none of the instructions is the target of an instruction outside the compilation of the expression, except for the first instruction. This follows from Lemma B.2. The second condition of Def. 7.1.2.1 requires that there are no control transfer instructions (jumps, return and athrow) in the list of instructions representing the compilation of an expression, i.e. that every instruction in the compilation of an expression is in execution relation with the next instruction. This is established by induction over the structure of the expression. The third condition of Def. 7.1.2.1 states that in the compilation $\ulcorner s, E, e \urcorner_m$ no instruction, except possibly the first one, is in a loop execution relation with its predecessor in the sense of Def. 3.9.1 in Chapter 5, Section 3, page 25. Assume that this is not the case. Then there exists $i$ with $s < i \le e$ such that there is a loop edge $i-1 \longrightarrow_l i$ between $i$ and its predecessor. Following Def. 3.9.1, every execution path reaching instruction $i-1$ must then pass through instruction $i$ before. As all the instructions in the compilation of an expression are sequential, this can only hold if there is a jump to instruction $i$ from outside the compilation of the expression. But this contradicts the first condition. Hence our hypothesis is false, and the third condition of Def. 7.1.2.1 holds for the compilation of expressions.
Qed.
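The first two conditions of Def. 7.1.2.1, as paraphrased in the proof above, can be checked mechanically. The following Python sketch is a hypothetical checker over a toy instruction list: it treats goto, conditional jumps, return and athrow as control transfer instructions, and it leaves out the third (loop-edge) condition, which needs the dominance-based reading of Def. 3.9.1. The instruction names and example lists are assumptions.

```python
# Hypothetical check of the first two block conditions (Def. 7.1.2.1) on a
# toy instruction list; the third condition (loop edges) is not checked.
TRANSFER = {"goto", "ifge", "iflt", "return", "athrow"}


def successors(code, i):
    op = code[i][0]
    if op in ("return", "athrow"):
        return set()
    if op == "goto":
        return {code[i][1]}
    if op in ("ifge", "iflt"):
        return {i + 1, code[i][1]}
    return {i + 1}


def is_block(code, s, e):
    # Condition 1: only s may be the target of an edge coming from outside.
    for j in range(len(code)):
        if not s <= j <= e and any(s < k <= e for k in successors(code, j)):
            return False
    # Condition 2: no control transfer instruction inside [s, e].
    return all(code[i][0] not in TRANSFER for i in range(s, e + 1))


# Compilation of an expression such as a + b * 2 occupies indexes 1..5.
expr_code = [("nop",), ("iload", "a"), ("iload", "b"), ("iconst", 2),
             ("imul",), ("iadd",), ("istore", "x"), ("return",)]
# A jump into the middle of [1, 3] violates condition 1.
bad_code = [("goto", 2), ("iload", "a"), ("iload", "b"), ("iadd",)]
```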
We now proceed to the lemma which establishes that the bytecode control flow graph corresponding to the compilation of a statement contains loops only if the statement contains loops.
Lemma 7.1.2.3. The compilation $\ulcorner s, S, e \urcorner_m$ of a statement $S$ may contain instructions $k$ and $j$ which are respectively a loop entry and a loop end in the sense of Def. 3.9.1, page 43 (i.e. $j \longrightarrow_l k$) if and only if $S$ contains a substatement $S'$ which is a loop statement and
$$j = \mathit{loopEntry}_{S'} - 1 \wedge k = \mathit{loopEntry}_{S'}$$
Proof: By structural induction over the compiled statement. The direction in which the statement contains a loop statement is trivial. We show the other direction for the compositional statement and for the if statement.
Compositional statement. Let us have the statement $S_1;S_2$ and its compilation $\ulcorner s, S_1;S_2, e \urcorner_m$. From the compiler definition, we have
$$\ulcorner s, S_1;S_2, e \urcorner_m = \ulcorner s, S_1, e' \urcorner_m\ \ulcorner e'+1, S_2, e \urcorner_m$$
By the induction hypothesis, the lemma holds for the compilations of $S_1$ and $S_2$, which are $\ulcorner s, S_1, e' \urcorner_m$ and $\ulcorner e'+1, S_2, e \urcorner_m$. Let us see which other execution edges are possible in the compilation.
Following Lemma 7.1.2.1 and Lemma B.2, the only such edge is $e' \longrightarrow e'+1$. We show by contradiction that the execution relation between $e'$ and $e'+1$ is not a loop execution relation. Assume that it is, i.e. $e' \longrightarrow_l e'+1$. Following Def. 3.9.1, this means that every path $P$ in the control flow graph from the program entry point which reaches $e'$ has a subpath $subP$ which does not pass through $e'$ and which passes through $e'+1$. This is possible in two cases:
• There is a jump from outside $\ulcorner s, S_1;S_2, e \urcorner_m$ to the instruction $e'+1$. Two possibilities exist. Either $S_1;S_2$ is a substatement of a statement $S_3$, i.e. $S_3[S_1;S_2]$, and an instruction from the compilation $\ulcorner s'', S_3, e'' \urcorner_m$ of $S_3$ which does not belong to $\ulcorner s, S_1;S_2, e \urcorner_m$ jumps to $e'+1$; this is not possible following Lemma B.2. Or $S_3$ precedes or follows $S_1;S_2$, i.e. $S_3;S_1;S_2$ or $S_1;S_2;S_3$; but such a jump cannot exist, by Lemma B.1.
• There is a jump instruction in $\ulcorner s, S_1, e' \urcorner_m$ to $e'+1$ other than the instruction $e'$; this is not possible following Lemma B.1.
Hence $e' \longrightarrow_l e'+1$ does not hold.
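Conditional statement. By definition, its compilation is
$$\begin{aligned}
\ulcorner s, \mathtt{if}(E_1\ cond\ E_2)\ \mathtt{then}\{S_1\}\ \mathtt{else}\{S_2\}, e \urcorner_m = {} & \ulcorner s, E_1, e' \urcorner_m\ \ulcorner e'+1, E_2, e'' \urcorner_m \\
& e''+1 : \mathtt{if}_{cond}\ e'''+2 \\
& \ulcorner e''+2, S_2, e''' \urcorner_m \\
& e'''+1 : \mathtt{goto}\ e \\
& \ulcorner e'''+2, S_1, e-1 \urcorner_m \\
& e : \mathtt{nop}
\end{aligned}$$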
By the induction hypothesis, the lemma holds for the compilations of the substatements. The possible loop execution edges are:
• $e' \longrightarrow_l e'+1$. This would mean that every execution path reaching $e'$ passes through $e'+1$ before. Let us see how $e'+1$ can be reached from the program entry point. From Lemma B.2, there can be no jumps from outside the statement compilation into it, except to the first instruction. Thus, every control flow path $P$ reaching $e'+1$ has a subpath $subP_s$ which first reaches the instruction at index $s$ and does not pass through any instruction of the conditional statement. Because all the instructions in $\ulcorner s, E_1, e' \urcorner_m$ are sequential (Lemma 7.1.2.2), every path from the program entry point reaching $e'+1$ passes through $e'$ first. Thus the assumption is false. Similar reasoning establishes that neither $e'' \longrightarrow_l e''+1$, nor $e''+1 \longrightarrow_l e''+2$, nor $e''+1 \longrightarrow_l e'''+2$ holds.
• $e''' \longrightarrow_l e'''+1$. This is not possible because $e'''$ can be reached from the program entry point by a path $P$ which has a subpath $subP_s$ reaching $s$ without passing through any instruction of the conditional statement. From the instruction $s$, the control flow path passes through $s \ldots e', e'+1 \ldots e'', e''+1, e''+2 \ldots e'''$. Thus there is a path from the program entry point to $e'''$ which does not pass through $e'''+1$, and therefore $e''' \longrightarrow_l e'''+1$ does not hold. In the same way, we can show that neither $e'''+1 \longrightarrow_l e$ nor $e-1 \longrightarrow_l e$ holds.
Qed.
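As used in the proofs above, Def. 3.9.1 characterizes a loop edge $j \longrightarrow_l k$ by requiring that $k$ may execute after $j$ and that every path from the program entry point reaching $j$ passes through $k$, i.e. $k$ dominates $j$. The following Python sketch computes dominators with the standard iterative data-flow algorithm and lists the loop edges of a toy instruction list. The encoding, the helper names and the sample programs are assumptions; in particular, whether the thesis compiler lays out while loops exactly like `loop_code` is not claimed here. On this loop shape the only loop edge is $3 \longrightarrow_l 4$, i.e. $(\mathit{loopEntry} - 1, \mathit{loopEntry})$ with $\mathit{loopEntry} = 4$, the shape described by Lemma 7.1.2.3, and the if-then-else shape has none.

```python
# Hypothetical loop-edge detection following the dominance reading of
# Def. 3.9.1: j -->l k iff k may execute after j and every path from the
# entry point (index 0) to j passes through k, i.e. k dominates j.

def successors(code, i):
    op = code[i][0]
    if op == "return":
        return set()
    if op == "goto":
        return {code[i][1]}
    if op == "ifge":
        return {i + 1, code[i][1]}
    return {i + 1}


def dominators(code, entry=0):
    """Iterative data-flow computation; assumes every index is reachable."""
    n = len(code)
    preds = {i: set() for i in range(n)}
    for j in range(n):
        for k in successors(code, j):
            if k < n:
                preds[k].add(j)
    dom = {i: set(range(n)) for i in range(n)}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for i in range(n):
            if i == entry or not preds[i]:
                continue
            new = set.intersection(*(dom[p] for p in preds[i])) | {i}
            if new != dom[i]:
                dom[i] = new
                changed = True
    return dom


def loop_edges(code):
    """All edges (j, k) such that k executes after j and k dominates j."""
    dom = dominators(code)
    return {(j, k) for j in range(len(code)) for k in successors(code, j)
            if k < len(code) and k in dom[j]}


# while-loop shape: jump to the test, body, test with a backward branch
loop_code = [("iload", "i"), ("goto", 4), ("iinc", "i"), ("nop",),
             ("iload", "i"), ("ifge", 2), ("return",)]
# if-then-else shape: no loop edges expected
if_code = [("iload", "i"), ("ifge", 4), ("iconst", 0), ("goto", 5),
           ("iconst", 1), ("nop",), ("return",)]
```

Note that the backward branch $5 \longrightarrow 2$ is not a loop edge under this reading, because the body start 2 does not dominate the test at 5.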
To establish Property 7.1.2.4, we need several auxiliary lemmas. First, we show that the regions described in the elements of the exception handler table correspond to statements declared in the try clause of try catch or try finally statements. For instance, we can return to the example in Fig. 7.7 on page 103 and see that the exception handler table abs.ExcHandler contains one element, which describes the unique exception handler in the method. In particular, it states that the region between 2 and 20 is protected from NullExc by the bytecode starting at index 22. We may remark that the region between 2 and 20 corresponds to the compilation of the if statement.
Lemma B.3 (Exception handler element corresponds to a statement). Every element $(s, e, eH, Exc)$ in the exception handler table $m.\mathit{excHndlS}$ resulting from the compilation of method $m$ is such that there exist statements $S_1$, $S_2$ and $S$ such that $\ulcorner s, S_1, e \urcorner_m$ and $S$ is either a try catch statement of the form $S = \mathtt{try}\{S_1\}\ \mathtt{catch}(Exc)\{S_2\}$ or a try finally statement of the form $S = \mathtt{try}\{S_1\}\ \mathtt{finally}\{S_2\}$.
Proof: The proof is by contradiction and follows directly from the definition of the compiler. In particular, by the compiler definition, elements are added to $m.\mathit{excHndlS}$ only when compiling try catch and try finally statements, and the guarded region of the newly added element corresponds to the compilation of the try block $S_1$.
Qed.
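The lookup performed by findExcHandler can be sketched as a first-match scan over the exception handler table, in the style of the JVM's handler search. The definition in Section 3.5 is not reproduced here, so the following Python sketch is an assumption: entries are tuples $(s, e, eH, Exc)$ with an inclusive end $e$, as in conditions (5) and (6) in the proof of Lemma 7.1.2.4, the class hierarchy is given by a hypothetical `SUPER` map, and None stands for "no handler". The first sample table mirrors the one described for Fig. 7.7 (region 2 to 20 protected from NullExc by the handler at 22); the class names other than NullExc and the second table are made up.

```python
# Hypothetical sketch of findExcHandler: return the handler of the first
# table entry whose protected region [s, e] (inclusive) contains the index
# and whose caught class is the thrown class or one of its superclasses.

# Assumed class hierarchy: child -> parent (None marks the root).
SUPER = {"NullExc": "RuntimeExc", "RuntimeExc": "Exc", "Exc": None}


def is_subclass(cls, ancestor):
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUPER.get(cls)
    return False


def find_exc_handler(exc, index, table):
    """Return eH of the first matching (s, e, eH, Exc) entry, or None."""
    for s, e, handler, caught in table:
        if s <= index <= e and is_subclass(exc, caught):
            return handler
    return None


# Table modeled on the description of Fig. 7.7: region 2..20, handler 22.
ABS_TABLE = [(2, 20, 22, "NullExc")]
# Made-up nested try blocks: the inner entry is listed first.
NESTED_TABLE = [(4, 8, 30, "NullExc"), (2, 20, 22, "Exc")]
```

Listing inner try blocks before outer ones makes the first-match scan select the innermost enclosing handler.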
In the following, when we state that a statement $S$ is either a try catch or a try finally statement with try substatement $S'$ and we are not interested in the catch or finally part, we write $S = \mathtt{try}\{S'\}\ldots$
From the compiler definition, we can also see that the indexes of the instructions in the compilation $\ulcorner s, SE, e \urcorner_m$ of $SE$ all lie between $s$ and $e$.
Lemma B.4 (Indexes of the compilation of expressions and statements). The compilation $\ulcorner s, SE, e \urcorner_m$ of $SE$ is such that
• $s \le e$
• every instruction in $\ulcorner s, SE, e \urcorner_m$ has an index between $s$ and $e$
This follows directly from the definition of the compiler function.
We now establish several properties of the substatement relation, which has been introduced in Section 7.1.2. The next lemma establishes that the compiler preserves the substatement relation between statements. In particular, if a statement is a substatement of another one, then all of its instructions are contained in the compilation of the other one. Also, if neither of two statements is a substatement of the other, then their compilations are disjoint.

Lemma B.5 (Substatement relation preserved by the compiler). For all statements $S_1$ and $S_2$ with respective compilations $\ulcorner s_1, S_1, e_1 \urcorner_m$ and $\ulcorner s_2, S_2, e_2 \urcorner_m$, the following holds:
• if $S_2$ is a substatement of $S_1$ ($S_1[S_2]$), then $s_1 \le s_2$ and $e_2 \le e_1$
• if $S_2$ is not a substatement of $S_1$ ($\neg S_1[S_2]$) and $S_1$ is not a substatement of $S_2$ ($\neg S_2[S_1]$), then $e_1 < s_2$ or $e_2 < s_1$
This also follows from the definition of the compiler function.
The next lemma states that if an instruction is part of the compilations of two source statements, then these statements are in a substatement relation.
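Lemma B.6 (Common instructions in the compilation of statements). For all statements $S_1$ and $S_2$ with respective compilations $\ulcorner s_1, S_1, e_1 \urcorner_m$ and $\ulcorner s_2, S_2, e_2 \urcorner_m$, if there is an index $k$ such that $s_1 \le k \le e_1$ and $s_2 \le k \le e_2$, then $S_1[S_2]$ or $S_2[S_1]$.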
Proof: By contradiction.
The case when $S_1$ and $S_2$ are the same is trivial. Let $S_1$ and $S_2$ be different. Assume that the above is not true, i.e. (1) $\ulcorner s_1, S_1, e_1 \urcorner_m$, (2) $\ulcorner s_2, S_2, e_2 \urcorner_m$, (3) $s_1 \le k \le e_1 \wedge s_2 \le k \le e_2$, (4) $\neg S_1[S_2] \wedge \neg S_2[S_1]$. From (4) and Lemma B.5, case 2, we obtain $e_1 < s_2$ or $e_2 < s_1$. But this contradicts (3), and thus the lemma also holds in this case.
Qed.
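Taken together, Lemmas B.5 and B.6 say that the index ranges $[s, e]$ of statement compilations form a laminar family: any two of them are either nested or disjoint, so a shared index forces nesting. The following Python sketch checks these properties for a list of ranges; the helper names are hypothetical and the ranges in the example are made up for illustration, not taken from a particular figure.

```python
from itertools import combinations


def nested(a, b):
    """True if range a = (s1, e1) is contained in range b = (s2, e2)."""
    return b[0] <= a[0] and a[1] <= b[1]


def disjoint(a, b):
    """Lemma B.5, case 2: e1 < s2 or e2 < s1."""
    return a[1] < b[0] or b[1] < a[0]


def laminar(ranges):
    """Every pair of ranges is nested (either way) or disjoint."""
    return all(nested(a, b) or nested(b, a) or disjoint(a, b)
               for a, b in combinations(ranges, 2))


def shares_index(a, b):
    """Some k with s1 <= k <= e1 and s2 <= k <= e2 (hypothesis of Lemma B.6)."""
    return max(a[0], b[0]) <= min(a[1], b[1])


# Made-up compilation ranges of a statement and its substatements.
RANGES = [(0, 30), (4, 12), (5, 8), (13, 30), (15, 30)]
```

In a laminar family, `shares_index(a, b)` implies `nested(a, b) or nested(b, a)`, which is exactly the conclusion of Lemma B.6.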
Lemma 7.1.2.4 (Exception handler property). Let $S$ be a statement in method $m$ which is neither a try catch statement nor a try finally statement. Assume that statement $S'$ is its direct substatement, i.e. $S[[S']]$, and let their respective compilations be $\ulcorner s, S, e \urcorner_m$ and $\ulcorner s', S', e' \urcorner_m$. Then the exception handlers for the instruction points $e$ and $e'$ are the same:
$$\forall Exc,\ \mathit{findExcHandler}(Exc, e, m.\mathit{excHndlS}) = \mathit{findExcHandler}(Exc, e', m.\mathit{excHndlS})$$
Proof: By contradiction. Assume the following:
(1) $S[[S']]$, i.e. $S'$ is a strict substatement of $S$
(2) $\ulcorner s, S, e \urcorner_m$, i.e. the compilation of $S$ lies between $s$ and $e$
(3) $\ulcorner s', S', e' \urcorner_m$, i.e. the compilation of $S'$ lies between $s'$ and $e'$
(4) $\exists s_1, e_1, eH_1, s_2, e_2, eH_2, Exc$ such that
$(s_1, e_1, eH_1, Exc)$ is in the exception handler table $m.\mathit{excHndlS}$
$(s_2, e_2, eH_2, Exc)$ is in the exception handler table $m.\mathit{excHndlS}$
$\mathit{findExcHandler}(Exc, e, m.\mathit{excHndlS}) = eH_1$
$\mathit{findExcHandler}(Exc, e', m.\mathit{excHndlS}) = eH_2$
$eH_1 \neq eH_2$
From the definition of the function $\mathit{findExcHandler}$ in Section 3.5, page 33, if $\mathit{findExcHandler}$ returns an exception handler for an index, then the index is in the region protected by that handler:
(5) $s_1 \le e \le e_1$
(6) $s_2 \le e' \le e_2$
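From Lemma B.3, we know that protected regions in the exception handler table correspond to source statements:
(7) $\exists S_1, \mathit{stmt}_{try1}$ such that $\ulcorner s_1, S_1, e_1 \urcorner_m$ and $\mathit{stmt}_{try1} = \mathtt{try}\{S_1\}\ldots$
(8) $\exists S_2, \mathit{stmt}_{try2}$ such that $\ulcorner s_2, S_2, e_2 \urcorner_m$ and $\mathit{stmt}_{try2} = \mathtt{try}\{S_2\}\ldots$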
From Lemma B.5, we know that the compiler preserves the substatement relation, and thus from (1) we conclude that the first and last indexes of the compilation of $S'$ lie in the region determined by the first index $s$ and the last index $e$ of the compilation of $S$:
(9) $s \le s' \wedge e' \le e$
From Lemma B.4 for instructions in a source statement compilation: