The IR Type System - Grabowski, Robert (2012): Information flow analysis for mobile code

5.3.3 Semantics Preservation

The transformation algorithm preserves the semantics: if a bytecode method execution starting in an initial store $s$ and heap $h$ results in a final store $s'$ and final heap $h'$ (with an empty operand stack), then the execution of the corresponding IR method in the same initial store $s$ and heap $h$ results in the same final store $s'$ and heap $h'$ (with a default temporary variable store).

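Theorem 5.6 Let $P_{BC}$ be a bytecode program, and $P_{IR}$ be an IR program such that $P_{IR} = \mathrm{BC2IR}(P_{BC})$. Let $m$ be a method, $\diamond$ be a domain lattice, $s, s'$ be stores, and $h, h'$ be heaps. If $(s,h) \stackrel{m}{\Longrightarrow}{}^{BC}_{\diamond} (s',h')$ then $(s,h) \stackrel{m}{\Longrightarrow}{}^{IR}_{\diamond} (s',h')$.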

The full proof of this theorem is given in Appendix B. It relies on the correctness of the BC2IRinstr function: if the input stack $as$ describes the initial operand stack, then the output stack $as'$ describes the final operand stack of the execution of the bytecode instruction:

Proposition 5.7 Suppose $(m,i,s,h,\rho) \xrightarrow{B}_{\diamond} (m,i',s',h',\rho')$. Let $s_t$ be a temporary variable store, and $as, as'$ be abstract stacks and $I$ be an IR instruction such that

\[ \llbracket as \rrbracket_{s,s_t,h} = \rho \quad\text{and}\quad \mathrm{BC2IRinstr}(i, B, as) = (I, as'). \]

Then there exists a temporary variable store $s_t'$ such that $(m,i,s,s_t,h) \xrightarrow{I}_{\diamond} (m,i',s',s_t',h')$ and $\llbracket as' \rrbracket^{\diamond}_{s',s_t',h'} = \rho'$.

5.4 The IR Type System

This section presents an information flow type system for the intermediate language. With the translation from bytecode and the semantics preservation result, one obtains a technique to prove the noninterference property for bytecode programs.

The type system follows the spirit of the system for the (high-level) DSD language. In fact, as IR code contains some constructs of DSD, the type system reuses high-level DSD typing judgements for these constructs. Moreover, we shall see later that the compilation to bytecode and the translation to IR code are type preserving — that is, if the original DSD program is typable, then the corresponding IR program is typable, too.

The IR type system follows the small-step semantics of the IR language. For example, if there is an instruction $I$ such that $(m,i) \stackrel{I}{\longmapsto} (m,i')$, then it assigns to $i$ a precondition $Q$ and an initial label $pc$, and to $i'$ a postcondition $Q'$ and a final label $pc'$ that hold before and after the execution of the instruction, respectively. A program is well-typed if it is possible to assign unique typing information to each address in this way.

5.4.1 The Purpose of the Pseudo-Instructions

A common problem for information flow type systems for unstructured code is that the instructions on which a subbranch ranges are not obvious from the syntax. This information, however, is needed to update the pc label in order to prevent indirect leaks. While it is easy to "raise" the pc label at a branching point, an expressive type system also needs to safely "lower" the pc label back to the previous level at the corresponding confluence point, which is the first instruction address that all execution paths that start at the branching point eventually reach.
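
To make the notion of a confluence point concrete, the following Python sketch computes it for a small control flow graph as the immediate post-dominator of the branching address. The graph encoding (a dictionary from addresses to successor lists), the function names, and the example program are assumptions made for illustration only. The sketch further assumes that every address can reach the exit address. It is not the mechanism used in this chapter, which relies on pseudo-instructions instead.

```python
def post_dominators(succ, exit_node):
    """For every node, compute the set of nodes that lie on all paths to exit_node."""
    nodes = set(succ)
    pdom = {n: set(nodes) for n in nodes}
    pdom[exit_node] = {exit_node}
    changed = True
    while changed:
        changed = False
        for n in nodes - {exit_node}:
            # A node is post-dominated by itself and by whatever
            # post-dominates all of its successors.
            new = {n} | set.intersection(*(pdom[s] for s in succ[n]))
            if new != pdom[n]:
                pdom[n] = new
                changed = True
    return pdom


def confluence_point(succ, exit_node, branch):
    """First address that every path starting at `branch` eventually reaches."""
    pdom = post_dominators(succ, exit_node)
    candidates = pdom[branch] - {branch}
    # The strict post-dominators of `branch` form a chain; the closest one
    # is post-dominated by all others and so has the largest pdom set.
    return max(candidates, key=lambda n: len(pdom[n]))


# Hypothetical program: address 0 branches to 1 or 3, both paths meet at 4.
cfg = {0: [1, 3], 1: [2], 2: [4], 3: [4], 4: [5], 5: []}
print(confluence_point(cfg, exit_node=5, branch=0))
```

In this example, address 4 is the point at which a type system could safely lower the pc label again.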

A common approach is to provide the structural information in the form of externally computed control dependence regions, an approach taken for the bytecode type system by Barthe et al. [BBR04]. The type system then uses this precomputed information for the manipulation of the pc label. The authors point out that the information may be easily provided by a compiler, because it has access to the control flow structure from the program's high-level syntax. The problem, however, is that this delegation to external computation mechanisms introduces a new structure which must be trusted or verified: the computed regions must satisfy a set of safe over-approximation properties to ensure that they indeed describe the subbranches.

To simplify the proofs, I therefore follow the approach by Medel, Compagnoni, and Bonelli [MCB05] by using pseudo-instructions that indicate the start and the end of a subbranch. These instructions need to be available in the IR code. In contrast to control dependence regions, they are directly verified by the type system itself. Furthermore, a bytecode compiler can be easily extended to additionally produce these pseudo-instructions. Alternatively, if arbitrary bytecode without pseudo-instructions is to be analysed, one may compute control dependence regions and use this information to determine the points where the pseudo-instructions should be inserted.

5.4.2 Confluence Point Stacks

The typing judgement maintains confluence point stacks, which record the enclosing subbranches before and after each instruction. These are stacks of pairs of program addresses and labels:

\[ \Delta \in (\mathit{Adr} \times \mathit{Lab})^{*} \]

The top element on the stack indicates the address of the confluence point where the current subbranch ends, and the pc label which shall be restored at that point. The element below the top indicates the address and the pc label of the confluence point of the subbranch surrounding the current one, and so on. The stacks are manipulated by the IR pseudo-instructions cpush $j$ and cjmp $j$ in the following fashion:

• cpush $j$ pushes the address $j$ and the current pc label on the stack $\Delta$ in the type system. In the semantics, the instruction has no effect.

• cjmp $j$ pops the address $j$ and the label $pc$ from the stack $\Delta$, jumps to $j$, and restores the label $pc$. In the semantics, cjmp $j$ jumps to the address $j$.
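
The effect of the two pseudo-instructions on the type-level state can be sketched in a few lines of Python. The two-point lattice `LOW`/`HIGH`, the list encoding of the stack, and the function names `type_cpush` and `type_cjmp` are assumptions chosen for illustration; constraint sets are left out entirely.

```python
LOW, HIGH = 0, 1  # hypothetical two-point label lattice with LOW below HIGH


def type_cpush(j, pc, stack):
    """cpush j: push (j, pc) onto the stack; the pc label stays as it is."""
    return pc, [(j, pc)] + stack


def type_cjmp(j, pc, stack):
    """cjmp j: pop (j, pc') from the stack and restore pc' as the current label."""
    if not stack or stack[0][0] != j:
        raise ValueError(f"cjmp {j}: top of the stack does not name address {j}")
    saved_pc = stack[0][1]
    return saved_pc, stack[1:]


# Entering a subbranch that ends at address 7, with a guard of label HIGH.
pc, stack = type_cpush(7, LOW, [])
pc = max(pc, HIGH)                   # the branching instruction raises pc
pc, stack = type_cjmp(7, pc, stack)  # the confluence point lowers it again
print(pc == LOW, stack)
```

Nested subbranches simply push further pairs, so the label saved for the innermost branch is always restored first.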

The compiler translates a high-level conditional statement if $e$ then $S_1$ else $S_2$ as follows: first, a cpush $j$ instruction is generated, where $j$ is the address of the first instruction that follows the compiled statement (and also the confluence point). Then, the actual statement is compiled, such that the final instruction of both compiled branches $S_1$ and $S_2$ is cjmp $j$. This causes the execution to jump to $j$. In the type system, the cpush $j$ instruction causes the initial pc label and $j$ to be pushed on the stack. At the branching instruction, the pc label is possibly raised to a different label $pc'$. At the final cjmp $j$ instructions, the type system restores the original pc label from the stack.
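
The translation scheme just described can be sketched as follows. The tuple encoding of instructions, the function name `compile_if`, and the concrete layout (else-branch first, then-branch at the jump target) are assumptions for illustration; the sketch also assumes that `if e j` jumps to `j` when the guard holds and that both branches are already compiled, position-independent instruction lists.

```python
def compile_if(guard, then_code, else_code, start):
    """Lay out `if guard then S1 else S2` beginning at address `start`.

    Assumed layout: cpush j; if guard j_then; S2...; cjmp j; S1...; cjmp j
    Returns the instruction list and the confluence point j.
    """
    j_then = start + 2 + len(else_code) + 1  # first address of the S1 code
    j = j_then + len(then_code) + 1          # first address after the statement
    code = [("cpush", j), ("if", guard, j_then)]
    code += else_code + [("cjmp", j)]        # fall-through branch ends in cjmp j
    code += then_code + [("cjmp", j)]        # jump-target branch ends in cjmp j
    return code, j


code, j = compile_if("x", [("block", "y := 1")], [("block", "y := 2")], start=0)
for address, instruction in enumerate(code):
    print(address, instruction)
print("confluence point:", j)
```

Both branches end in the same `cjmp j`, so the type system pops the pair pushed by `cpush j` exactly once on every path.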

5.4.3 The Type System

In the following, let us fix a method $m$. I define a typing judgement

\[ \Gamma, \Gamma_t \vdash i, pc, \Delta, Q \xrightarrow{I} i', pc', \Delta', Q' \]

which means that with instruction $I$, it is possible to get from program point $i$ to $i'$ such that if $pc, \Delta, Q$ is valid in the initial state, then $pc', \Delta', Q'$ is valid in the final state. (I will give the exact notion of validity later.) The judgement is defined by the rules shown in Figure 5.3.

The type system relies on the judgement $\Gamma \cup \Gamma_t \vdash e : \ell$ from the high-level type system, where the type environments for ordinary and temporary variables are combined. Also, I use $\ell \sqsubseteq_Q \ell'$ as defined in the high-level type system.

Assignment blocks are typed with another judgement

\[ \Gamma, \Gamma_t, \Delta, pc \vdash \{Q\}\ a\ \{Q'\} \]

as given by the rules in Figure 5.4. Assignment blocks are typed just like sequential compositions of statements in the high-level type system. The typing rules for individual assignments make use of the corresponding high-level typing rules, with the variable type environments combined: $\Gamma \cup \Gamma_t, pc \vdash \{Q\}\ a\ \{Q'\}$. As in the high-level type system, I require that the object reference expressions for field updates and method calls are access paths $\pi$. The only extension to the original typing rules, apart from taking $\Gamma_t$ into account, is that assigned variables and fields may not occur in the confluence point stack $\Delta$.

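\[ \frac{\Gamma, \Gamma_t \vdash i, pc, \Delta, Q_0 \xrightarrow{I} i', pc', \Delta', Q_0' \qquad Q \Rightarrow Q_0 \qquad Q_0' \Rightarrow Q'}{\Gamma, \Gamma_t \vdash i, pc, \Delta, Q \xrightarrow{I} i', pc', \Delta', Q'} \]

\[ \frac{I = \mathtt{if}\ e\ j \qquad \Gamma \cup \Gamma_t \vdash e : \ell \qquad i' \in \{i+1, j\}}{\Gamma, \Gamma_t \vdash i, pc, \Delta, Q \xrightarrow{I} i', pc \sqcup \ell, \Delta, Q} \]

\[ \frac{I = \mathtt{if}\ e\ j \qquad \Gamma \cup \Gamma_t \vdash e : \ell \qquad e = \ell_1 \sqsubseteq \ell_2}{\Gamma, \Gamma_t \vdash i, pc, \Delta, Q \xrightarrow{I} j, pc \sqcup \ell, \Delta, Q \cup \{\ell_1 \sqsubseteq \ell_2\}} \]

\[ \frac{I = \mathtt{jmp}\ j}{\Gamma, \Gamma_t \vdash i, pc, \Delta, Q \xrightarrow{I} j, pc, \Delta, Q} \]

\[ \frac{I = \mathtt{cpush}\ j}{\Gamma, \Gamma_t \vdash i, pc, \Delta, Q \xrightarrow{I} i+1, pc, (j, pc) :: \Delta, Q} \]

\[ \frac{I = \mathtt{cjmp}\ j}{\Gamma, \Gamma_t \vdash i, pc, (j, pc') :: \Delta, Q \xrightarrow{I} j, pc', \Delta, Q} \]

\[ \frac{I = \mathtt{block}\ a \qquad \Gamma, \Gamma_t, \Delta, pc \vdash \{Q\}\ a\ \{Q'\}}{\Gamma, \Gamma_t \vdash i, pc, \Delta, Q \xrightarrow{I} i+1, pc, \Delta, Q'} \]

Figure 5.3: IR type system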

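\[ \frac{}{\Gamma, \Gamma_t, \Delta, pc \vdash \{Q\}\ \epsilon\ \{Q\}} \]

\[ \frac{\Gamma, \Gamma_t, \Delta, pc \vdash \{Q\}\ a\ \{Q'\} \qquad \Gamma, \Gamma_t, \Delta, pc \vdash \{Q'\}\ as\ \{Q''\}}{\Gamma, \Gamma_t, \Delta, pc \vdash \{Q\}\ a :: as\ \{Q''\}} \]

\[ \frac{a \in \{x := e,\ x := \mathtt{new}\ C(e)\} \qquad x \notin \Delta \qquad \Gamma \cup \Gamma_t, pc \vdash \{Q\}\ a\ \{Q'\}}{\Gamma, \Gamma_t, \Delta, pc \vdash \{Q\}\ a\ \{Q'\}} \]

\[ \frac{a = \pi.f := e \qquad f \notin \Delta \qquad \Gamma \cup \Gamma_t, pc \vdash \{Q\}\ a\ \{Q'\}}{\Gamma, \Gamma_t, \Delta, pc \vdash \{Q\}\ a\ \{Q'\}} \]

\[ \frac{a = x := \pi.m(e) \qquad x \notin \Delta \qquad f \neq f_\delta \Rightarrow f \notin \Delta \qquad \Gamma \cup \Gamma_t, pc \vdash \{Q\}\ a\ \{Q'\}}{\Gamma, \Gamma_t, \Delta, pc \vdash \{Q\}\ a\ \{Q'\}} \]

Figure 5.4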
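
The side conditions $x \notin \Delta$ and $f \notin \Delta$ on assignments can be illustrated with a small Python sketch. It reads them as "the assigned name does not occur in any label stored on the stack", which is my interpretation for the purpose of this example. The encoding of a label as the set of names it mentions and the function names are likewise assumptions.

```python
def mentioned_names(stack):
    """All variable and field names that occur in some label stored on the stack."""
    return {name for _address, label in stack for name in label}


def assignment_allowed(target, stack):
    """Side condition `target not in Delta` for an assigned variable or field."""
    return target not in mentioned_names(stack)


# Stack entry: (confluence address, set of names the saved pc label mentions).
delta = [(12, frozenset({"x"})), (30, frozenset())]
print(assignment_allowed("y", delta))  # y is not mentioned by any saved label
print(assignment_allowed("x", delta))  # x occurs in the label saved for address 12
```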


5.4.4 Well-Typed Programs

A type mapping $\Lambda$ for a method $m$ associates each instruction address $i \in \mathrm{dom}(IR(m))$ with a program counter label $pc$, a confluence point stack $\Delta$, and a constraint set $Q$. To access the components of $\Lambda(i)$, I also write $\Lambda_{pc}(i)$, $\Lambda_{\Delta}(i)$, and $\Lambda_{Q}(i)$ in the following.

We are interested in type mappings that are well-formed with respect to the small-step typing rules:

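Definition 5.8 A type mapping $\Lambda$ is derivable for a method body $IR(m)$ and type environments $\Gamma$ and $\Gamma_t$, if for all $i, i' \in \mathrm{dom}(IR(m))$, whenever $(m,i) \stackrel{I}{\longmapsto} (m,i')$, then $i, i' \in \mathrm{dom}(\Lambda)$ and

\[ \Gamma, \Gamma_t \vdash i, \Lambda_{pc}(i), \Lambda_{\Delta}(i), \Lambda_{Q}(i) \xrightarrow{I} i', \Lambda_{pc}(i'), \Lambda_{\Delta}(i'), \Lambda_{Q}(i'). \]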

An IR method is well-typed if the method signature $\mathrm{msig}(m)$ can be extended by a type environment $\Gamma_t$ such that a type mapping $\Lambda$ can be derived for its instructions, where the type information for $\mathrm{mentry}(m)$ and $\mathrm{mexit}(m)$ matches the one given by the signature.

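Definition 5.9 A method $m$ is well-typed with respect to a signature

\[ \mathrm{msig}(m) = [\Gamma, pc, Q, Q'] \]

if there exists a type environment $\Gamma_t$ and a type mapping $\Lambda$ such that

1. $\Lambda$ is derivable for $IR(m)$, $\Gamma$ and $\Gamma_t$, and

2. $\Lambda(\mathrm{mentry}(m)) = (pc, \epsilon, Q)$, and

3. $\Lambda(\mathrm{mexit}(m)) = (pc, \epsilon, Q')$.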
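
As a rough illustration of Definitions 5.8 and 5.9, the following Python sketch checks a candidate type mapping against a simplified version of the rules in Figure 5.3. It is a sketch under strong assumptions: constraint sets and the typing of assignment blocks are omitted, guards carry their label directly, the lattice is a hypothetical two-point one, and the tuple encoding of instructions (including the `exit` marker) is invented for this example.

```python
LOW, HIGH = 0, 1  # hypothetical two-point lattice; the join is max


def successors(i, instr):
    """Addresses reachable from address i in one small step."""
    op = instr[0]
    if op == "if":                # ("if", guard_label, j)
        return [i + 1, instr[2]]
    if op in ("jmp", "cjmp"):     # ("jmp", j) or ("cjmp", j)
        return [instr[1]]
    if op == "exit":
        return []
    return [i + 1]                # ("cpush", j) and ("block",)


def step(instr, pc, stack):
    """Post-state (pc', stack') demanded by the rule for instr, or None."""
    op = instr[0]
    if op == "if":
        return max(pc, instr[1]), stack
    if op == "cpush":
        return pc, ((instr[1], pc),) + stack
    if op == "cjmp":
        if not stack or stack[0][0] != instr[1]:
            return None
        return stack[0][1], stack[1:]
    return pc, stack              # jmp and block leave pc and stack unchanged


def derivable(code, typing):
    """Every small step must be matched by the typing of both endpoints."""
    for i, instr in enumerate(code):
        for succ in successors(i, instr):
            if i not in typing or succ not in typing:
                return False
            if step(instr, *typing[i]) != typing[succ]:
                return False
    return True


def well_typed(code, typing, sig_pc, entry, exit_):
    """Derivable, with entry and exit typed by the signature label and an empty stack."""
    return (derivable(code, typing)
            and typing.get(entry) == (sig_pc, ())
            and typing.get(exit_) == (sig_pc, ()))


program = [("cpush", 6), ("if", HIGH, 4), ("block",), ("cjmp", 6),
           ("block",), ("cjmp", 6), ("exit",)]
inner = (HIGH, ((6, LOW),))
typing = {0: (LOW, ()), 1: (LOW, ((6, LOW),)), 2: inner, 3: inner,
          4: inner, 5: inner, 6: (LOW, ())}
print(well_typed(program, typing, LOW, entry=0, exit_=6))
```

Replacing one of the `cjmp 6` instructions by a plain `jmp 6` makes the check fail for this mapping, because the raised pc label and the non-empty stack would then reach the exit address.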

An IR program $P_{IR}$ is well-typed if all its methods are well-typed.

5.4.5 Soundness

The following theorem states the main soundness result for IR programs.

Theorem 5.10 If an IR program $P_{IR}$ is well-typed, then it is universally noninterferent.

The proof of the theorem can be found in Appendix C. With the semantics preservation result, we immediately get a soundness result for the underlying bytecode program.

Theorem 5.11 Let $P_{BC}$ be a bytecode program. Let $P_{IR}$ be an IR program such that $P_{IR} = \mathrm{BC2IR}(P_{BC})$ and $P_{IR}$ is well-typed. Then $P_{BC}$ is universally noninterferent.

PROOF Let $m$ be a method such that $\mathrm{msig}(m) = [\Gamma, pc, Q, Q']$. Let $(s_1,h_1)$, $(s_2,h_2)$, $(s_1',h_1')$, $(s_2',h_2')$ be store-heap pairs, $\beta$ be a bijection, $\diamond$ be a domain lattice, and $k \in \mathrm{Dom}_{\diamond}$ be a domain. Furthermore, let

• $(s_1,h_1) \models_{\diamond} Q$ and $(s_1',h_1') \models_{\diamond} Q$, and

• $\vdash_{\diamond} (s_1,h_1) \sim^{\beta}_{\Gamma,k} (s_1',h_1')$, and

• $(s_1,h_1) \stackrel{m}{\Longrightarrow}{}^{BC}_{\diamond} (s_2,h_2)$ and $(s_1',h_1') \stackrel{m}{\Longrightarrow}{}^{BC}_{\diamond} (s_2',h_2')$.

Applying the semantics preservation result (Theorem 5.6) to both executions, we know there are corresponding IR executions $(s_1,h_1) \stackrel{m}{\Longrightarrow}{}^{IR}_{\diamond} (s_2,h_2)$ and $(s_1',h_1') \stackrel{m}{\Longrightarrow}{}^{IR}_{\diamond} (s_2',h_2')$. As $P_{IR}$ is well-typed, we can apply the soundness theorem (Theorem 5.10) and get that $m$ is universally noninterferent; that is, there exists a bijection $\beta' \supseteq \beta$ such that

• $(s_2,h_2) \models_{\diamond} Q'$ and $(s_2',h_2') \models_{\diamond} Q'$, and

• $\vdash_{\diamond} (s_2,h_2) \sim^{\beta'}_{\Gamma,k} (s_2',h_2')$.

It follows by definition that the bytecode version of $m$ is universally noninterferent. As this can be shown for each method, the entire program $P_{BC}$ is universally noninterferent.

In document Grabowski, Robert (2012): Information flow analysis for mobile code in dynamic security environments. Dissertation, LMU München: Fakultät für Mathematik, Informatik und Statistik (Page 85-90)