Establishing the equivalence between verification conditions on source and bytecode

tions on source and bytecode level

In the following, we proceed with establishing the equivalence between source and bytecode proof obligations. As we have already seen, the source programming language supports expressions and statements. Because statements and expressions play different roles in a source program language they need a different treatment here in this proof. Let us remind briefly about the semantics of these constructs and their compilation. Expressions evaluate to a value and thus, their compilation affects the operand stack on execution time. As we have discussed previously, because we compile for a stack based virtual machine, an expression compilation results in a sequence of instructions whose execution must leave on the stack top the expression value. Statements have a different role in the language. They do not have values, but they control the control flow in the program.

We focus now on the relation between thewp predicate transformer functions for expressions on bytecode and source level. But before entering into technical details, we illustrate this relation by an example given in Fig.7.8. The figure contains three parts. The first part shows the compilation of a source expression sqr + 2∗s in a methodmstarting at index i. There we show the steps that the compiler will take for the compilation of the expression: compile the access of the variable sqr, the multiplication 2∗s and compile their addition.

The second part calculates the preconditions of the instructions resulting from the expression compilation against a postcondition that the stack top element is equal to 5 (st(cntr) = 5 ). Actually, this postcondition requires that the evaluation of the expression must be equal to 5. This is because as we said in the beginning of the section the compiler translates source expressions to

7.2 Establishing the equivalence between verification conditions on source and

bytecode level 105

a sequence of bytecode instructions such that their execution must leave the expression value on the stack top. Note that every instruction is followed by its postcondition and is preceded by its weakest precondition. This means that the weakest predicate of an instruction is the postcondition of the predecessor instruction. This is because compilation of expressions may not contain loop entries (see previous Section 7.1.2.2, page 102).

The third part shows how the precondition of the source expression is calculated w.r.t. the postconditionv= 5 wherevis the special logical variable which stands for the value of the expression sqr + 2∗s. Thus, the bytecode postcondition st(cntr) = 5 and the source postcondition v= 5 express the same condition respectively on source and bytecode. Note that substituting the abstract variablev with the stack top in the source postconditionv= 5[v\st(cntr)] results in the bytecode postcondition.

Let us focus on the intermediate stages in the calculation of the precondition of the whole expression. We may remark that the resulting postcondition of the source expression 2∗s isvsqr+

v2∗s= 5 and that the postcondition at line 1.9 for the last instruction of its compilation (instruction i+3:mul) isst(cntr−1) +st(cntr) = 5. Substituting in the source postcondition the abstract variablev2∗s withst(cntr) andvsqr withst(cntr-1) results in the bytecode postcondition

(1)(vsqr+v2∗s= 5)[v2∗s\st(cntr)][vsqr\st(cntr−1)]≡st(cntr−1) +st(cntr) = 5 We can remark that the precondition of the first instruction (i+1:const2) of the compilation of the expression 2∗s is st(cntr) + 2∗s = 5. We may remark that the precondition of the source expression 2∗s in the third part of the figure is equivalent tovsqr+ 2∗s = 5. Substituting in the source precondition the abstract variablevsqrwithst(cntr) results in a formula which is equivalent to the bytecode precondition

(2)(vsqr+ 2∗s = 5)[vsqr\st(cntr)]≡st(cntr) + 2∗s = 5

Equivalences(1)and(2)give the intuition of how the predicate transformers work over source expressions and their bytecode compilation. The predicate transformer over the instructions representing an expression substitutes the stack top element expression st(cntr) with the value of the expression and the stack counter is decremented. The predicate transformer over a source expression calculates a predicate where the abstract variable representing its value is substituted with the expression value. More formally, this is expressed in the next lemma.

Theorem 7.2.1. For every expression E and its compilation ps,E, eqmlet have a formula ψ and

expressionsw1. . . wk, k≥0such that

ψ[v\st(cntr)][wi\st(cntr−i)]i=1...k = inter(e, e+ 1,m)∧

∀Exc,excPost(Exc) =m.excPostIns(Exc, e)

then the following holds

wpsrc₍_E_{, ψ,}_excPost_,_m₎

v[wi\st(cntr−i+ 1)]i=1...k=wp(s,m)

Proof: the proof is by structural induction over the expression structure. We sketch the cases for field access and arithmetic expressions.

field access From the compiler definition we have

ps,E.f, eqm= _eps,_:_{getfield f}E, e−1qm; (7.2.1.1)

By initial hypothesis we have that we have a formulaψover the source language such that if the abstract variableswiwhich stand for expression values are substituted with stack expressions the formula will be the same as the formula inter(e, e+ 1,m). Or formally, for somek such thatk≥0 we have the following equality:

pi,sqr + 2∗s, i+ 4qm = pi,sqr, iqm pi+ 1,2∗s, i+ 3qm i+4:add where pi,sqr, iqm = i : loadsqr pi+ 1,2∗s, i+ 3qm = i+1:const2 i+2:loads i+3:mul 1.1 sqr + 2∗s = 5 1.2 i : loadsqr 1.3 st(cntr) + 2∗s = 5 1.4 i+1:const2 1.5 st(cntr−1) +st(cntr)∗s = 5 1.6 i+2:loads 1.7 st(cntr−2) +st(cntr−1)∗st(cntr) = 5 1.8 i+3:mul 1.9 st(cntr−1) +st(cntr) = 5 1.10 i+4:add 1.11 st(cntr) = 5 2.1 wpsrc(sqr + 2∗s, v= 5,excPost,m)v= 2.2 wpsrc(sqr,wpsrc(2∗s,(v= 5)[v\vsqr+v2∗s],excPost,m)v2∗s,excPost,m)vsqr= 2.3 wpsrc_(sqr_,_wpsrc₍₂_,_wpsrc_(s_,₍_v

sqr+v2∗s= 5)[v2∗s\v2∗vs],excPost,m)vs,excPost,m)v2,excPost,m)vsqr=

2.4 wpsrc_(sqr_,_wpsrc₍₂_,₍_v sqr+v2∗vs= 5)[vs\s],excPost,m)v2,excPost,m)vsqr= 2.5 wpsrc_(sqr_,₍_v sqr+v2∗s = 5)[v2\2],excPost,m)vsqr= 2.6 (vsqr+ 2∗s = 5)[vsqr\sqr] = 2.7 sqr + 2∗s = 5

Figure 7.8: Expression, its compilation and their respective preconditions

From the definition of thewp for source field access expressions, we have also

wpsrc₍_E_._f_{, ψ,}_excPost_,_m₎ v=wpsrc(E, ψ0,excPost,m)v1 where ψ0 ₌ v16=null⇒ψ[v\v1.f] ∧

v1=null⇒excPost(NullExc)

(7.2.1.3)

Because the execution relation betweeneande+ 1 is not a loop backedge by Lemma 7.1.2.2 which establishes that compilation of expressions does not contain loop entries and from the Def. 5.3.1.1 of the function inter: (5) inter(e−1, e,m) =wp(e,m)

definition of thewp function for getfield

wp(e,m) =

st(cntr)6=null⇒ inter(e, e+ 1,m)[st(cntr)\st(cntr).f]

∧

st(cntr) =null⇒m.excPostIns(NullExc, e)

From (7.2.1.1), (7.2.1.2) and (7.2.1.3) and the above facts for thewp function over bytecode and source respectively, we conclude:

ψ0[v1\st(cntr)][wi\st(cntr−i)]i=1...k= inter(e−1, e,m) (7.2.1.4) For the exceptional postcondition functions on source and bytecode we get from Lemma 7.1.2.5 and the initial hypothesis that

∀Exc,excPost(Exc) =m.excPostIns(Exc, e−1) (7.2.1.5) We apply the induction hypothesis over (7.2.1.4) and (7.2.1.5) and obtain

wpsrc(E, ψ0,excPost,m)v1[wi\st(cntr−i+ 1)]i=1...k=wp(s ,m) Finally, (7.2.1.3) and the last equality allows us to conclude that this case holds

7.2 Establishing the equivalence between verification conditions on source and

bytecode level 107

arithmetic expression From the compiler definition we have

(1)ps,E1 op E2, eqm=

ps,E1, e0qm;

pe0_{+ 1}_,_E

2, e−1qm;

e:op

As in the previous case, we can conclude that there exists a formulaψover source expressions with the following property

ψ[v\st(cntr)][wi\st(cntr−i)]i=1...k = inter(e, e+ 1,m) (7.2.1.6) We also remind the formulation of the wp function for source arithmetic expressions

wpsrc₍_E

1op E2, ψ,excPostsrc,m)v =wpsrc(E1,wpsrc(E2, ψ0,excPostsrc,m)v2,excPost src_,

m)v1 withψ0₌_ψ_[_v_\_v

1opv2]

It follows from Lemma 7.1.2.2 about expressions that the compilation of an expression results in a list of instructions which does not contain loop entries. Thus, from the Def.5.3.1.1 of the function inter we get

inter(e−1, e,m) =wp(e,m) It follows from thewp for an arithmetic instruction

wp(e,m) = inter(e, e+ 1,m)[cntr\cntr−1][st(cntr−1)\st(cntr−1)op st(cntr)] which because of (7.2.1.6) is equal to

ψ[v\st(cntr)][wi\st(cntr−i)]i=1...k[cntr\cntr−1][st(cntr−1)\st(cntr−1)op st(cntr)] Because the formulaψrefers to source expressions and does not contain stack expressions we can conclude by applying substitutions

ψ[v\st(cntr−1)opst(cntr)][wi\st(cntr−i+ 1)]i=1...k which is equal to

ψ0[v2\st(cntr)][v1\st(cntr−1)][wi\st(cntr−i+ 1)]i=1...k

From the last equalities we can apply the induction hypothesis overE2 andψ0 and get

wpsrc(E2, ψ0,excPostsrc,m)v2[v1\st(cntr)][wi\st(cntr−i+1)]i=1...k=wp(e0+1,m) (7.2.1.7) As it follows from Lemma 7.1.2.2 we have that inter(e0_{, e}0₊₁_,_m_{) =}_wp₍_e0₊₁_,_m_{) and because}

of the last result (7.2.1.7), we can apply the induction hypothesis overE1, we conclude that

this case holds.

Qed.

The next lemma states the same property but this time for the compilation of statements. Theorem 7.2.2. Let us have the statement S , its compilation ps,S, eqmin methodm, the formula

ψ and the exceptional functionexcPost: ExcType→P such that 1. ife+ 1 exists thenψ= inter(e, e+ 1,m)

3. ∀Exc,excPost(Exc) =m.excPostIns(Exc, e)

then the following holds:

wpsrc(S, ψ,excPost,m) =wp(s ,m)

Proof : the proof is by structural induction on statements and uses the properties of the compiler shown before. We give here the proof for the cases of compositional statement, while and try catch statement. The rest of the cases proceed in a similar way.

compositional statement

wpsrc(S1;S2, ψ,excPost,m) =wpsrc(S1,wpsrc(S2, ψ,excPostsrc,m),excPostsrc,m)

From the compiler definition we get

ps,S1;S2, eqm=ps,S1, e0qm;pe0+ 1,S2, eqm

It follows from the induction hypothesis overS2 and the initial hypothesis about the post-

condition

wpsrc(S2, ψ,excPost,m) =wp(e0+ 1 ,m) (7.2.2.1)

Lemma B.1 states thateande+ 1 are in execution relation Lemma 7.1.2.3 states that loop edges (−→l_{) appear only on loop statement compilation and thus, the edge between}_e0 _and

e0_{+ 1 is not a loop edge. In that case from Def. 5.3.1.1 of the function} _inter _{we get}

inter(e0, e0+ 1,m) =wp(e0+ 1,m) (7.2.2.2) The statementS1 is a strict substatement ofS1;S2 and thus, from Property 7.1.2.4 follows

∀Exc,m.findExcHandler(Exc, e,m.excHndlS) =m.findExcHandler(Exc, e0,m.excHndlS) From the above conclusion and the Def. 5.3.2.1 of functionexcPostIns

∀Exc, m.excPostIns(Exc, e) =m.excPostIns(Exc, e0) (7.2.2.3) From (7.2.2.1), (7.2.2.2) and (7.2.2.3) we get that

wpsrc(S1,wpsrc(S2, ψ,excPost,m),excPost,m) =wp(s ,m)

From this last equality we conclude that this case holds.

while statement Let us remind the definition of the wp for while statements

wpsrc₍_while₍_E

1 cond E2) [INV,modif] {S},nPostsrc,excPostsrc,m) =

INV∧ ∀ m, m∈modif, INV⇒wpsrc₍_E 1,wpsrc(E2,P,excPostsrc,m)v2,excPost src ,m)v1 where

P= (v1condv2)⇒ wpsrc(S,INV,excPostsrc,m) ∧

¬(v1condv2)⇒nPostsrc

We remind that the compilation of the while statement results as follows ps,while(E1cond E2)[INV,modif] {S}, eqm=

s:gotoe0_{+ 1;} ps+ 1,S, e0 qm; pe0_{+ 1}_,_E 1, e00qm; pe00_{+ 1}_,_E 2, e−1qm; e:ifcond s+ 1;

7.2 Establishing the equivalence between verification conditions on source and

bytecode level 109

From Lemma 7.1.2.3 we know that the execution relation between e0 _and _e0_{+ 1 is a loop}

execution relation. From Def. 5.3.1.1, 68 of the function inter we conclude that

inter(e0, e0+ 1,m) =INV (7.2.2.4) From Lemma 7.1.2.4 we get that the exception handlers for the last indexe0_{in the compilation}

ofS and the indexeare the same. Thus, from the initial hypothesis for the exception handler function, we conclude that

∀Exc,excPost(Exc) =m.excPostIns(Exc, e0) (7.2.2.5) We can apply the induction hypothesis over (7.2.2.4) and (7.2.2.5) and statementS and we get

wpsrc(S,INV,excPostsrc,m) =wp(s+ 1,m) (7.2.2.6) From Lemma 7.1.2.3 we get that the execution relation between eand s+ 1 is not a loop execution relation. From def. 5.3.1.1, we conclude that wp(s+ 1 ,m) = inter(e, s+ 1,m). Thus, from (7.2.2.6) and the definition of thewp function forifcond, we conclude that the

wp of the instruction at indexeis equivalent to

wp(e ,m) =

st(cntr)cond st(cntr−1)⇒wpsrc₍_S_,_INV_,_excPostsrc_,

∧

¬(st(cntr)cond st(cntr−1))⇒ inter(e, e+ 1,m)

Using the latter, the previous Lemma 7.2.1 for expressions is applied twice overE1 andE2

and thus, we obtain

wpsrc₍_E

1,wpsrc(E2,

(v1condv2)⇒ wpsrc(S,INV,excPostsrc,m) ∧ ¬(v1condv2)⇒nPostsrc ,excPostsrc,m)v2,excPost src_, m)v1 = wp(e0_{+ 1} _,_m₎

From Lemma 7.1.2.3, we get thate0_{+ 1 is a loop entry instruction. From Def. 5.3.1.1 of the}

function inter, we get that

wp(s ,m) = inter(s, e0_{+ 1}_,_m_{) =}

INV ∧ ∀m, m∈modif,(INV⇒wp(e0_{+ 1}_,_m₎₎

From the last results, we obtain that this case holds.

try catch statement Let us remind the definition of the weakest precondition predicate transformer for try catch statement

wpsrc₍_try_{_S_1}_catch₍_ExcType_Var₎_{_S_2}_{, ψ,}_excPost_,_m_{) =}

wpsrc₍_S

1, ψ,excPost(⊕ExcType→wpsrc(S2, ψ,excPost,m)),m) (7.2.2.7)

Also, by definition of the compiler function, we have that the compilation of try catch statement results in the following list of instructions

ps,try{S1}catch(ExcTypeVar){S2}, eqm=

ps,S1, e0qm;e0+ 1 :gotoe;pe0+ 2,S2, e−1qm;e:nop addExcHandler(m, s, e0_{, e}0_{+ 2}_, _ExcType₎

We apply the induction hypothesis overS2 and the initial hypothesis

From Lemma 7.1.2.6, we know that the indexeseande0 _{has the same exception handlers}

∀Exc,¬(Exc<:ExcType)⇒

(findExcHandler(Exc, e0_,_m_._excHndlS_{) =}_{findExcHandler}₍_Exc_{, e,}_m_._excHndlS₎₎_∧

findExcHandler(ExcType, e0_,_m_._excHndlS_{) =}_e0_{+ 2}

From the definition ofexcPostInsand the initial hypothesis about the exceptional postcondition functions for indexeseande0

∀Exc,excPost(⊕ExcType→wp(e0+ 2,m))(Exc) =m.excPostIns(Exc, e0) (7.2.2.8) Lemma 7.1.2.3 tells us that there are no loop edges betweene0 _and_e0_{+ 1,}_e0_{+ 1 and}_e_,_e_and

e+ 1. Thus, we can conclude from the definition of the function inter that

inter(e0, e0+ 1,m) = inter(e0+ 1, e,m) = inter(e, e+ 1,m)

We apply induction hypothesis over the last conclusion and (7.2.2.7), (7.2.2.8), we get that this case holds.

Qed.

As a conclusion of the current chapter, we would like to make several remarks. Such an equivalence between proof obligations on source and bytecode and the fact that we have proof for the soundness of the bytecode verification condition generator gives us the soundness of the source weakest precondition calculus:

Theorem 7.2.3 (Soundness of bytecodewp implies soundness of sourcewp).If the weakest precondition over bytecode programs is sound then the weakest precondition over source programs is sound.

Another point is that here we ignore the difference between names and types on source and bytecode level. In this chapter, we have considered that the compiler does not change variable, class and method names. This is not true for Java compilers, as these names are basically compiled into indexes in the constant pool table. The second point in the formalization presented here is that the compilers compiles boolean types into integer types. This neither holds for real Java compiler. However, this difference is a minor detail and this means that the equivalence between source and bytecode verification conditions in Java can be established modulo names.

7.3 Related work

Several works dealing with the relation between the verification conditions over source and its compilation into a low level programming language exist.

Barthe, Rezk and Saabas in [21] also argue that verification conditions produced over source code and bytecode produced by a nonoptimizing compiler are equivalent. The verification condition generators over source and bytecode are such that verification conditions are discharged immediately when they are generated. This is different from our approach where the verification condition generator are propagated by the verification condition generator up to the entry point of a method body. However, the first technique requires much stronger annotations than the latter. The source language which they use supports method invokation, exception throwing and han- dling. The evaluation of expressions do not throw exceptions which simplifies the reasoning over expressions.

In [99], Saabas and Uustalu present a goto language provided with a compositional structure called SGoto. They give a Hoare logic rules for the SGoto language and compiler from the source language into SGoto. They show that if a source program has a Hoare logic derivation against a pre and postcondition then its compilation in SGoto will also have a Hoare logic derivation in the aforementioned Hoare logic rules for the SGoto language. A limitation of such an approach is that the bytecode logic is defined over structured pieces of code. In particular, in a PCC framework

7.3 Related work 111

this could be problematic. The first reason is that the code producer must supply along with the bytecode and the certificate the structure of the unstructured code that the client has used to generate the certificate. The second reason is that the certificate can be potentially large as it consists of a Hoare logic derivation and thus, contains annotation for every instruction in the bytecode. In a PCC scenario, this could slow down the downloading time of the certificate if it comes via a network or could be problematic if it must be stored on a device where it will be checked especially if the device has limited resources.

In [12], F.Bannwart and P.Muller show how to transform a Hoare style logic derivation on source Java like program into a Hoare style logic derivation of a Java bytecode like program. This solution however has similar shortcoming as the previously cited work.

We would like to make a final remark concerning the ability to use such a scheme. In order that a verification framework be able to exploit the equivalence between source and bytecode verification conditions it must be provided with both verification on source and bytecode. For instance, the Spec# [15] programming system being not provided with a mechanism for source verification may not benefit from this fact.

Chapter 8

Constrained memory consumption

policies using verification

condition generator

Memory consumption policies provide a means to control resource usage on constrained devices, and play an important role in ensuring the overall quality of software systems, and in particular resistance against resource exhaustion attacks. Such memory consumption policies have been previously enforced through static analysis, which yield automatic bounds at the cost of precision, or run-time analysis, which incur an overhead that is not acceptable for constrained devices.

Several approaches have been suggested to date to enforce memory consumption policies for programs; all approaches are automatic, but none of them is ideally suited for TPDs(short for Trusted Personal Devices), either for their lack of precision, or for the runtime penalty they impose on programs:

• Static analysis and abstract interpretations: in such an approach, one performs an abstract execution of an approximation of the program. The approximation is chosen to be coarse enough to be computable, as a result of which it yields automatically bounds on memory consumption, but at the cost of precision. Such methods are not very accurate for recursive methods and loops, and often fail to provide bounds for programs that contain dynamic object creation within a loop or a recursive method;

• Proof-carrying code: here the program comes equipped with a specification of its memory consumption, in the form of statements expressed in an appropriate program logic, and a certificate that establishes that the program verifies the memory consumption specification attached to it. The approach potentially allows for precise specifications. However, existing works on proof carrying code for resource usage sacrifice the possibility of enforcing accurate policies in favor of the possibility of generating automatically the specification and the certificate, in line with earlier work on certifying compilation;

• Run-time monitoring: here the program also comes equipped with a specification of its memory consumption, but the verification is performed at run-time, and interrupted if the memory consumption policy is violated.

Such an approach is both precise and automatic, but incurs a runtime overhead which makes it unsuitable for TPDs.

In this chapter, we study the use of logical methods to specify and verify statically precise memory consumption policies for Java bytecode programs. In particular, we describe a methodology how

In document Vérification de bytecode et ses applications (Page 110-121)