

4.1.1 Overview of DEX Bytecode

A program P is given by its list of instructions in Figure 4.1. The set R is the set of DEX virtual registers, V is the set of values, PP is the set of program points, and ρ is the mapping from registers to values.

As in the case of the JVM, we assume that the program comes equipped with the set of class names C and the set of fields F. The program will also be extended with array manipulation instructions, and it is parameterized by the set of available DEX types TD, analogous to the Java types TJ. The DEX language also deals with method invocation. As for the JVM, DEX programs also come with a set m of method names. The method names and signatures themselves are represented explicitly in the DEX file; as such, the lookup function required differs from its JVM counterpart in that we do not need the class argument. Thus, in the sequel, we remove this lookup function and overload the method ID to refer to the code as well. DEX uses two special registers. We use ret for the first one, which holds the return value of a method invocation. A moveresult instruction behaves like a move instruction with the special register ret as the source register. The second special register is ex, which stores an exception thrown for the next instruction. Figure 4.1 contains the list of DEX instructions.

4.1.2 Operational Semantics

A state in DEX is just ⟨i, ρ, h⟩, where ρ is a mapping from registers to values and h is the heap. It is similar to that of the JVM, with several differences; e.g., the state in DEX does not have an operand stack, but its functionality is covered by the registers (local variables) ρ. The function fresh : Heap → L is an allocator function that, given a heap, returns the location for the new object. The function default : C → O returns for each class a default object of that class. For every field of that default object, the value is 0 if the field is of numeric type, and null if the field is of object type. Similarly for defaultArray : N × TD → (N ⇀ V). The relation ↝, which defines transitions between states, is ↝ ⊆ State × (State + (V × Heap)).
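The helper functions above can be sketched in Python as follows. This is only an illustration under stated assumptions: the heap is modeled as a dictionary from integer locations to objects, a class is represented by a hypothetical map from field names to a kind ("numeric" or "object"), and null is modeled as None. None of these encodings are prescribed by the formal development.

```python
from itertools import count

def fresh(heap):
    """Return a location not yet used in the heap (assumption: smallest unused integer)."""
    return next(loc for loc in count(1) if loc not in heap)

def default(field_kinds):
    """Default object: 0 for numeric fields, None (null) for object fields."""
    return {f: 0 if kind == "numeric" else None for f, kind in field_kinds.items()}

def default_array(n, kind):
    """Default array contents: a map from each index 0..n-1 to the default value."""
    return {i: 0 if kind == "numeric" else None for i in range(n)}

# Allocate one object of a hypothetical class with two fields.
heap = {}
loc = fresh(heap)
heap[loc] = default({"count": "numeric", "next": "object"})
```

Here default takes the field description directly instead of a class name, which avoids modeling a class table.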

The notation op denotes here the standard interpretation of the arithmetic operation op in the domain V of values (although there is no arithmetic operation on locations). The operator ⊕ denotes function update: ρ ⊕ {r ↦ v} is the new function ρ′ such that ∀i ∈ dom(ρ) ∖ {r}. ρ′(i) = ρ(i) and ρ′(r) = v. The operator ⊕ is overloaded

§4.1 Syntax, Semantics, and Type System for Android Bytecode 61

binop op r, ra, rb     Do the binary operation op on the values contained in ra and rb, then store the result in r.

const r, v             Fill register r with the constant value v.

move r, rs             Copy the value stored in rs to r.

ifeq r, t              Conditional jump if r has the value 0.

ifneq r, t             Conditional jump if r has any value other than 0.

goto t                 Unconditional jump to the program point t.

return rs              Return the value stored in rs.

new r, c               Create a new object of class c and put the reference in r.

iget r, ro, f          Get the value contained in the field f of the object referenced by ρ(ro) and store it in register r.

iput rs, ro, f         Read the value contained in register rs and store it in the field f of the object referenced by ρ(ro).

newarray r, rl, t      Create a new array of type t which has rl elements and put the reference to the array in r.

arraylength r, ra      Store the length of the array referenced by ra in register r.

aget r, ra, ri         Get the value stored in the array referenced by ρ(ra) at index ρ(ri) and put it in register r.

aput rs, ra, ri        Put the value stored in register rs into the array referenced by ρ(ra) at index ρ(ri).

invoke n, m, p⃗         Invoke ρ(p⃗[0]).m with n arguments stored in p⃗.

moveresult r           Store invoke's result in r. This instruction has to be placed directly after invoke, otherwise the result is lost.

throw r                Throw the exception object stored in r.

moveexception r        Store the exception in r. This instruction has to be the first instruction in the handler region.

where op ∈ {+, −, ×, /}, v ∈ Z, {r, ra, rb, rs} ∈ R, t ∈ PP, c ∈ C, f ∈ F and ρ : R → Z.

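\[ \dfrac{P_m[i] = \mathrm{const}(r, v)}{\langle i, \rho, h\rangle \leadsto_{m,\mathrm{Norm}} \langle i+1, \rho \oplus \{r \mapsto v\}, h\rangle} \]

\[ \dfrac{P_m[i] = \mathrm{move}(r, r_s) \quad r_s \in \mathrm{dom}(\rho)}{\langle i, \rho, h\rangle \leadsto_{m,\mathrm{Norm}} \langle i+1, \rho \oplus \{r \mapsto \rho(r_s)\}, h\rangle} \]

\[ \dfrac{P_m[i] = \mathrm{binop}(op, r, r_a, r_b) \quad r_a, r_b \in \mathrm{dom}(\rho) \quad n = \rho(r_a)\ \underline{op}\ \rho(r_b)}{\langle i, \rho, h\rangle \leadsto_{m,\mathrm{Norm}} \langle i+1, \rho \oplus \{r \mapsto n\}, h\rangle} \]

\[ \dfrac{P_m[i] = \mathrm{goto}(t)}{\langle i, \rho, h\rangle \leadsto \langle t, \rho, h\rangle} \]

\[ \dfrac{P_m[i] = \mathrm{ifeq}(r, t) \quad \rho(r) = 0}{\langle i, \rho, h\rangle \leadsto_{m,\mathrm{Norm}} \langle t, \rho, h\rangle} \]

\[ \dfrac{P_m[i] = \mathrm{ifeq}(r, t) \quad \rho(r) \neq 0}{\langle i, \rho, h\rangle \leadsto_{m,\mathrm{Norm}} \langle i+1, \rho, h\rangle} \]

\[ \dfrac{P_m[i] = \mathrm{return}(r_s) \quad r_s \in \mathrm{dom}(\rho)}{\langle i, \rho, h\rangle \leadsto_{m,\mathrm{Norm}} \rho(r_s), h} \]

\[ \dfrac{P_m[i] = \mathrm{new}(r, c) \quad l = \mathrm{fresh}(h)}{\langle i, \rho, h\rangle \leadsto \langle i+1, \rho \oplus \{r \mapsto l\}, h \oplus \{l \mapsto \mathrm{default}(c)\}\rangle} \]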

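\[ \dfrac{P_m[i] = \mathrm{iget}(r, r_o, f) \quad \rho(r_o) \in \mathrm{dom}(h) \quad f \in \mathrm{dom}(h(\rho(r_o)))}{\langle i, \rho, h\rangle \leadsto_{m,\mathrm{Norm}} \langle i+1, \rho \oplus \{r \mapsto h(\rho(r_o)).f\}, h\rangle} \]

\[ \dfrac{P_m[i] = \mathrm{iget}(r, r_o, f) \quad \rho(r_o) = null \quad l' = \mathrm{fresh}(h)}{\langle i, \rho, h\rangle \leadsto_{m,\mathrm{np}} \mathrm{RuntimeExcHandling}(h, l', \mathrm{np}, i, \rho)} \]

\[ \dfrac{P_m[i] = \mathrm{iput}(r_s, r_o, f) \quad \rho(r_o) = null \quad l' = \mathrm{fresh}(h)}{\langle i, \rho, h\rangle \leadsto_{m,\mathrm{np}} \mathrm{RuntimeExcHandling}(h, l', \mathrm{np}, i, \rho)} \]

\[ \dfrac{P_m[i] = \mathrm{iput}(r_s, r_o, f) \quad \rho(r_o) \in \mathrm{dom}(h) \quad f \in \mathrm{dom}(h(\rho(r_o)))}{\langle i, \rho, h\rangle \leadsto_{m,\mathrm{Norm}} \langle i+1, \rho, h \oplus \{\rho(r_o) \mapsto h(\rho(r_o)) \oplus \{f \mapsto \rho(r_s)\}\}\rangle} \]

\[ \dfrac{P_m[i] = \mathrm{newarray}(r, r_l, t) \quad l = \mathrm{fresh}(h) \quad \rho(r_l) \geq 0}{\langle i, \rho, h\rangle \leadsto \langle i+1, \rho \oplus \{r \mapsto l\}, h \oplus \{l \mapsto (\rho(r_l), \mathrm{defaultArray}(\rho(r_l), t), i)\}\rangle} \]

\[ \dfrac{P_m[i] = \mathrm{arraylength}(r, r_a) \quad \rho(r_a) \in \mathrm{dom}(h)}{\langle i, \rho, h\rangle \leadsto_{m,\mathrm{Norm}} \langle i+1, \rho \oplus \{r \mapsto h(\rho(r_a)).\mathrm{length}\}, h\rangle} \]

\[ \dfrac{P_m[i] = \mathrm{arraylength}(r, r_a) \quad \rho(r_a) = null \quad l' = \mathrm{fresh}(h)}{\langle i, \rho, h\rangle \leadsto_{m,\mathrm{np}} \mathrm{RuntimeExcHandling}(h, l', \mathrm{np}, i, \rho)} \]

\[ \dfrac{P_m[i] = \mathrm{aget}(r, r_a, r_i) \quad \rho(r_a) \in \mathrm{dom}(h) \quad 0 \leq \rho(r_i) < h(\rho(r_a)).\mathrm{length}}{\langle i, \rho, h\rangle \leadsto_{m,\mathrm{Norm}} \langle i+1, \rho \oplus \{r \mapsto h(\rho(r_a))[\rho(r_i)]\}, h\rangle} \]

\[ \dfrac{P_m[i] = \mathrm{aget}(r, r_a, r_i) \quad \rho(r_a) = null \quad l' = \mathrm{fresh}(h)}{\langle i, \rho, h\rangle \leadsto_{m,\mathrm{np}} \mathrm{RuntimeExcHandling}(h, l', \mathrm{np}, i, \rho)} \]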

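\[ \dfrac{P_m[i] = \mathrm{aput}(r_s, r_a, r_i) \quad \rho(r_a) \in \mathrm{dom}(h) \quad 0 \leq \rho(r_i) < h(\rho(r_a)).\mathrm{length}}{\langle i, \rho, h\rangle \leadsto_{m,\mathrm{Norm}} \langle i+1, \rho, h \oplus \{\rho(r_a) \mapsto h(\rho(r_a)) \oplus \{\rho(r_i) \mapsto \rho(r_s)\}\}\rangle} \]

\[ \dfrac{P_m[i] = \mathrm{aput}(r_s, r_a, r_i) \quad \rho(r_a) = null \quad l' = \mathrm{fresh}(h)}{\langle i, \rho, h\rangle \leadsto_{m,\mathrm{np}} \mathrm{RuntimeExcHandling}(h, l', \mathrm{np}, i, \rho)} \]

\[ \dfrac{P_m[i] = \mathrm{moveresult}(r) \quad r \in \mathrm{dom}(\rho)}{\langle i, \rho, h\rangle \leadsto_{m,\mathrm{Norm}} \langle i+1, \rho \oplus \{r \mapsto \rho(ret)\}, h\rangle} \]

\[ \dfrac{P_m[i] = \mathrm{invoke}(n, m', \vec{p}) \quad \vec{p} \in \mathrm{dom}(\rho) \quad \langle 1, \{\vec{x} \mapsto \vec{p}\}, h\rangle \leadsto^{+}_{m'} v, h'}{\langle i, \rho, h\rangle \leadsto_{m,\mathrm{Norm}} \langle i+1, \rho \oplus \{ret \mapsto v\}, h'\rangle} \]

\[ \dfrac{\begin{array}{c} P_m[i] = \mathrm{invoke}(n, m', \vec{p}) \quad \vec{p} \in \mathrm{dom}(\rho) \quad \langle 1, \{\vec{x} \mapsto \vec{p}\}, h\rangle \leadsto^{+}_{m'} \langle l'\rangle, h' \\ e = \mathrm{class}(h'(l')) \quad \mathrm{Handler}_m(i, e) = t \quad e \in \mathrm{excAnalysis}(m') \end{array}}{\langle i, \rho, h\rangle \leadsto_{m,e} \langle t, \rho \oplus \{ex \mapsto l'\}, h'\rangle} \]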

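\[ \dfrac{P_m[i] = \mathrm{invoke}(n, m', \vec{p}) \quad l' = \mathrm{fresh}(h) \quad \rho(\vec{p}[0]) = null}{\langle i, \rho, h\rangle \leadsto_{m,\mathrm{np}} \mathrm{RuntimeExcHandling}(h, l', \mathrm{np}, i, \rho)} \]

\[ \dfrac{\begin{array}{c} P_m[i] = \mathrm{invoke}(n, m', \vec{p}) \quad \vec{p} \in \mathrm{dom}(\rho) \quad \langle 1, \{\vec{x} \mapsto \vec{p}\}, h\rangle \leadsto^{+}_{m'} \langle l'\rangle, h' \\ e = \mathrm{class}(h'(l')) \quad \mathrm{Handler}_m(i, e)\uparrow \quad e \in \mathrm{excAnalysis}(m') \end{array}}{\langle i, \rho, h\rangle \leadsto_{m,e} \langle l'\rangle, h'} \]

\[ \dfrac{\begin{array}{c} P_m[i] = \mathrm{throw}(r) \quad \rho(r) \in \mathrm{dom}(h) \quad e = \mathrm{class}(h(\rho(r))) \\ \mathrm{Handler}_m(i, e) = t \quad e \in \mathrm{classAnalysis}(m, i) \end{array}}{\langle i, \rho, h\rangle \leadsto_{m,e} \langle t, \rho \oplus \{ex \mapsto \rho(r)\}, h\rangle} \]

\[ \dfrac{\begin{array}{c} P_m[i] = \mathrm{throw}(r) \quad \rho(r) \in \mathrm{dom}(h) \quad e = \mathrm{class}(h(\rho(r))) \\ \mathrm{Handler}_m(i, e)\uparrow \quad e \in \mathrm{classAnalysis}(m, i) \end{array}}{\langle i, \rho, h\rangle \leadsto_{m,e} \langle \rho(r)\rangle, h} \]

\[ \dfrac{P_m[i] = \mathrm{throw}(r) \quad l' = \mathrm{fresh}(h) \quad \rho(r) = null}{\langle i, \rho, h\rangle \leadsto_{m,\mathrm{np}} \mathrm{RuntimeExcHandling}(h, l', \mathrm{np}, i, \rho)} \]

\[ \dfrac{P_m[i] = \mathrm{moveexception}(r) \quad r \in \mathrm{dom}(\rho)}{\langle i, \rho, h\rangle \leadsto_{m,\mathrm{Norm}} \langle i+1, \rho \oplus \{r \mapsto \rho(ex)\}, h\rangle} \]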

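\[ \mathrm{RuntimeExcHandling} : \mathrm{Heap} \times \mathcal{L} \times \mathcal{C} \times \mathcal{PP} \times (\mathcal{R} \rightharpoonup \mathcal{V}) \to \mathrm{State} + (\mathcal{L} \times \mathrm{Heap}) \]

defined as

\[ \mathrm{RuntimeExcHandling}(h, l', C, i, \rho) = \begin{cases} \langle t, \rho \oplus \{ex \mapsto l'\}, h \oplus \{l' \mapsto \mathrm{default}(C)\}\rangle & \text{if } \mathrm{Handler}_m(i, C) = t \\ \langle l'\rangle, h \oplus \{l' \mapsto \mathrm{default}(C)\} & \text{if } \mathrm{Handler}_m(i, C)\uparrow \end{cases} \]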

There is also an additional partial function Handlerm : PP × C ⇀ PP for method m, which gives the handler address for a given program point and type of exception. Given a program point i and a thrown exception c, if Handlerm(i, c) = t then control is transferred to program point t; if the handler is undefined (denoted by Handlerm(i, c) ↑), then the exception is uncaught in method m.
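The two cases of RuntimeExcHandling can be illustrated with a short Python sketch. The encoding is an assumption made for illustration only: the handler table is a dictionary keyed by (program point, exception class), a missing key models Handlerm(i, C) ↑, and the two result shapes are distinguished by a string tag. The default argument is a caller-supplied stand-in for the default function.

```python
def runtime_exc_handling(heap, loc, exc_class, i, rho, handler, default):
    """Allocate the exception object at `loc`, then jump to the handler or escape.

    `handler` is a dict {(program_point, exception_class): target}; a missing
    key models the handler being undefined.
    """
    # h ⊕ {l' ↦ default(C)}: the heap extended with the new exception object.
    new_heap = {**heap, loc: default(exc_class)}
    target = handler.get((i, exc_class))
    if target is not None:
        # Caught: continue at the handler, with ex pointing to the exception object.
        return ("state", target, {**rho, "ex": loc}, new_heap)
    # Uncaught: the method terminates with the exception location and the heap.
    return ("exception", loc, new_heap)


# A handler at program point 7 covers null pointer exceptions raised at point 3.
handlers = {(3, "np"): 7}
caught = runtime_exc_handling({}, 1, "np", 3, {"r0": None}, handlers, lambda c: {"class": c})
uncaught = runtime_exc_handling({}, 1, "np", 4, {"r0": None}, handlers, lambda c: {"class": c})
```

The input heap and register map are copied rather than mutated, mirroring the functional reading of ⊕.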

Figure 4.2 shows the operational semantics for DEX instructions. Here we give the intuition for each instruction:

const(r,v): updates the mapping for register r with the value v. This instruction is the equivalent of the JVM instruction push v. Whereas push always puts the constant value v at the top of the stack, const specifies the register in which the value will be stored.

move(r,rs): takes the value contained in rs and updates the mapping for register r with this value. This instruction simulates the instructions store x and load x in the JVM. The only difference is that the JVM instructions relate the top of the stack and a particular local variable x (store puts the value at the top of the stack into the local variable x, and load puts the value of the local variable x on top of the stack), while there is no such restriction for move.

binop(op,r,ra,rb): takes the values contained in ra and rb, and applies the binary operation to these values. The mapping for register r is then updated with the resulting value. This instruction is the equivalent of the JVM instruction binop op, with the difference that in the JVM the operands and the location of the result are fixed (it always pops the top two values from the stack and then places the result on top of the stack), whereas in DEX we have to specify the registers (ra and rb for the operands and r as the target register).

goto(t): transfers control to program point t. This instruction behaves exactly like its JVM counterpart.

ifeq(r,t): transfers control to program point t if the value contained in register r is 0. Otherwise, execution continues with the next program point. This instruction behaves very similarly to its JVM counterpart. The only difference is that in the JVM the conditional jump depends on the top of the stack, whereas in DEX it depends on the register specified.

return(rs): ends the execution with the value contained in register rs. Just like ifeq, it is very similar to the JVM's return, except that the JVM always returns the value at the top of the stack, whereas DEX can specify the register from which the value is returned.

new(r,c): updates the mapping for register r with a reference to a newly created object of class c, initialized with the default value. The JVM also has new, with the difference that in the JVM the newly instantiated object reference is put on top of the stack, while in DEX we have to specify where the reference is stored.


iget(r,ro,f): gets the value contained in the field f of the object whose reference is contained in ro, and then updates the mapping for register r with this value. If there is no exception, the instruction succeeds and the program continues with the next program point. In the case where there is an exception, the behavior of the program is modeled by the function RuntimeExcHandling, which creates an exception object and then either transfers control to the handler of that exception if it exists, or terminates the execution with the exception object. This behavior is exhibited by all exception-throwing instructions, so we will not repeat it for the remaining ones. iget is the DEX version of getfield f, with more flexibility to specify where the reference to the object is (ro) and where the value is put (r). The JVM instruction pops the object reference from the top of the stack and then pushes the value onto the top of the stack.

iput(rs,ro,f): gets the value contained in register rs and copies it to the field f of the object whose reference is contained in ro. iput is an exception-throwing instruction, so the behavior mentioned in iget is also applicable here. In the JVM, putfield f pops the top two values from the stack for the source value and the object reference, whereas in DEX we have to specify the register containing the source value (rs) and the register containing the object reference (ro).

newarray(r,rl,t): creates a new array of type t with length ρ(rl), initialized with the default value, and then updates the mapping for register r with the reference to this newly created array. It is very similar to the JVM's newarray t in that it creates a new array of type t with a certain length. The difference lies in where the length of the array comes from and where the reference to the newly created array is put. In the JVM, the length of the array is the value at the top of the stack, and the reference is put on top of the stack, while in DEX the length of the array is contained in register rl and the reference is put in r.

arraylength(r,ra): updates the mapping for register r with the length of the array whose reference is contained in register ra. arraylength is an exception-throwing instruction, so the behavior mentioned in iget is also applicable here. As usual, it behaves similarly to its JVM counterpart, except that in the JVM the array reference is always located at the top of the stack and the length of the array is put on top of the stack. DEX's arraylength takes the array reference stored in ra and puts the length in register r.

aget(r,ra,ri): gets the value at index ρ(ri) of the array whose reference is contained in ra, and then updates the mapping for register r with this value. aget is an exception-throwing instruction, so the behavior mentioned in iget is also applicable here. arrayload in the JVM uses the top two values from the stack for the array reference and its index, and then puts the array element on top of the stack. In DEX, by contrast, the array location, the array index, and the register that receives the value are all specified explicitly.

aput(rs,ra,ri): gets the value contained in register rs and then updates the array (whose reference is contained in register ra) at index ρ(ri) with this value. aput is an exception-throwing instruction, so the behavior mentioned in iget is also applicable here. arraystore in the JVM uses the top three values from the stack for the source value, the array reference, and its index. In DEX, by contrast, the array location, the array index, and the register containing the source value are all specified explicitly.

moveresult(r): get the value contained in the pseudo register ret and update the mapping for register r with this value. There is no JVM equivalent of this instruction.

invoke(n,m,p⃗): executes the method m with the parameters contained in registers p⃗. The resulting value of executing the method is stored in the pseudo register ret. invoke is an exception-throwing instruction, so the behavior mentioned in iget is also applicable here. There are several differences from the JVM's invoke. Firstly, in the JVM the number of parameters n is implicit in the method definition. Secondly, the parameters for the invoked method in the JVM are located on top of the stack, whereas in DEX we can specify which registers hold the parameters for the method.

throw(r): throws the exception contained in register r. If a handler for this exception exists, control is transferred to the handler. Otherwise, the program ends abruptly with the exception as the result. The difference from its JVM counterpart is the location of the exception. In the JVM, the exception is always obtained from the top of the stack, whereas in DEX it is obtained from register r.

moveexception(r): gets the value contained in the pseudo register ex and updates the mapping for register r with this value. This instruction only ever appears as the first instruction in an exception handler. There is no equivalent of this instruction in the JVM.
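To make the register-based reading of these rules concrete, the following Python sketch interprets the heap-free fragment (const, move, binop, goto, ifeq, ifneq, return). It is a minimal illustration under stated assumptions, not the formalization itself: instructions are encoded as tuples, registers are named by strings, program points are 1-based (matching the entry point used in the invoke rule), and division is left out because its rounding behavior is not fixed here.

```python
import operator

# Assumption: only +, -, × are modeled; "/" is omitted in this sketch.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def run(program, rho=None, entry=1):
    """Execute tuple-encoded instructions and return the value given to `return`."""
    rho = dict(rho or {})   # register map ρ, copied so the caller's map is untouched
    i = entry               # current program point (1-based)
    while True:
        instr = program[i - 1]
        name, args = instr[0], instr[1:]
        if name == "const":
            r, v = args
            rho = {**rho, r: v}                          # ρ ⊕ {r ↦ v}
            i += 1
        elif name == "move":
            r, rs = args
            rho = {**rho, r: rho[rs]}                    # ρ ⊕ {r ↦ ρ(rs)}
            i += 1
        elif name == "binop":
            op, r, ra, rb = args
            rho = {**rho, r: OPS[op](rho[ra], rho[rb])}  # ρ ⊕ {r ↦ ρ(ra) op ρ(rb)}
            i += 1
        elif name == "goto":
            (i,) = args
        elif name == "ifeq":
            r, t = args
            i = t if rho[r] == 0 else i + 1
        elif name == "ifneq":
            r, t = args
            i = t if rho[r] != 0 else i + 1
        elif name == "return":
            (rs,) = args
            return rho[rs]
        else:
            raise ValueError(f"unsupported instruction: {name}")


# Sum 3 + 2 + 1 with a loop; the comments give the program points.
program = [
    ("const", "acc", 0),                # 1
    ("const", "n", 3),                  # 2
    ("const", "one", 1),                # 3
    ("ifeq", "n", 8),                   # 4: leave the loop when n = 0
    ("binop", "+", "acc", "acc", "n"),  # 5
    ("binop", "-", "n", "n", "one"),    # 6
    ("goto", 4),                        # 7
    ("return", "acc"),                  # 8
]
result = run(program)
```

Every instruction names its source and target registers explicitly, which is the difference from the stack-based JVM instructions that the descriptions above emphasize.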

Successor Relation. The successor relation closely resembles that of the JVM: each instruction has its next instruction as its successor, except jump instructions, return instructions, and instructions that throw an exception. As with the JVM, whenever it is clear from the context, we drop the tag to reduce clutter.

• Pm[i] = goto t. The successor relation is i ↦Norm t.

• Pm[i] = ifeq t or Pm[i] = ifneq t. In this case, there are two successor relations, denoted by i ↦Norm i+1 and i ↦Norm t.

• Pm[i] is an instruction throwing a null pointer exception, and there is a handler for it (Handler(i, np) = t). In this case, the successor is t, denoted by i ↦np t.

• Pm[i] is an instruction throwing a null pointer exception, and there is no handler for it (Handler(i, np) ↑). In this case it is a return point, denoted by i ↦np.

• Pm[i] = throw, throwing an exception C ∈ classAnalysis(m, i), and the handler is Handler(i, C) = t. The successor relation is i ↦C t.

• Pm[i] = throw, throwing an exception C ∈ classAnalysis(m, i), and the handler is Handler(i, C) ↑. It is a return point, and the successor relation is i ↦C.

• Pm[i] = invoke mID, throwing an exception C ∈ excAnalysis(mID), and the handler is Handler(i, C) = t. The successor relation is i ↦C t.

• Pm[i] = invoke mID, throwing an exception C ∈ excAnalysis(mID), and the handler is Handler(i, C) ↑. It is a return point, and the successor relation is i ↦C.

• Pm[i] is any other instruction. The successor is the immediately following instruction, denoted by i ↦Norm i+1.
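The case analysis above can be sketched as a Python function that returns the tagged successors of a program point. Several choices here are assumptions made for illustration: instructions are tuples whose first component is the instruction name, a return point is encoded as a target of None, return itself is treated as a return point with tag Norm, exception-throwing instructions other than throw keep their normal successor i+1, and the exc_classes argument stands in for the results of classAnalysis(m, i) or excAnalysis(mID).

```python
# Instructions that may raise a null pointer exception in the semantics above.
NP_THROWING = {"iget", "iput", "arraylength", "aget", "aput", "invoke", "throw"}

def successors(i, instr, handler, exc_classes=()):
    """Return a set of (tag, target) pairs; target None marks a return point.

    `handler` maps (program_point, exception_class) to a handler address.
    `exc_classes` lists the exception classes the instruction may raise.
    """
    name = instr[0]
    if name == "goto":
        return {("Norm", instr[1])}
    if name in ("ifeq", "ifneq"):
        # Fall-through and jump target; the target is the last operand.
        return {("Norm", i + 1), ("Norm", instr[2])}
    if name == "return":
        return {("Norm", None)}
    succ = set()
    if name != "throw":
        succ.add(("Norm", i + 1))
    if name in NP_THROWING:
        succ.add(("np", handler.get((i, "np"))))
    if name in ("throw", "invoke"):
        for c in exc_classes:
            succ.add((c, handler.get((i, c))))
    return succ


# iget at point 2 with a null pointer handler at point 9.
iget_succ = successors(2, ("iget", "r", "ro", "f"), {(2, "np"): 9})
# throw at point 5 with no handlers: both successors are return points.
throw_succ = successors(5, ("throw", "r"), {}, exc_classes=("E",))
```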

4.1.3 Type System

The transfer rules of DEX are defined in terms of a register typing rt : (R → S) instead of a stack typing. Note that this register typing is total and is restricted to the registers used in a method; their number is statically determined, so we know in advance how many registers there are. To be more concrete, if a method only uses 16 registers, then rt is a map from these 16 registers to security levels, as opposed to the whole range of 65535 possible registers according to And [2017].

The typing system is formally parameterized by:

Γ: a table of method signatures, needed to define the transfer rules for method invocation;

ft: a map from fields to their global policy level;

CDR: a structure consisting of (region, jun);

se: a security environment;

sgn: the method signature of the current method;

rt: the register typing after the instruction is executed.

Thus the complete form of a judgment parameterized by a tag τ ∈ {Norm + C} is

Γ, ft, region, se, sgn, i ⊢τ RT

although in the case where some elements are unnecessary, we may not write the full notation whenever it is clear from the context. In the table of operational semantics,