Type inference rules and trails - Preemptive type checking in dynamically typed programs

We introduce trails in the type judgements for both present and future use types. The type inference is therefore expressed using two inductively defined relations written as

hs, T i `_p u : τ and hs, T i `_f u : τ

where s is a truncated execution point and T is a trail. A trail is a set of pairs hs, ui of truncated execution points and variables. They represent the previously visited execution points (together with the variables that triggered the visit) and are used to ensure termination of the inference. This is explained in more detail later on.

Similar to the judgement in the preceding section, hs, T∅i `pu : τ (where T∅ is the empty trail) denotes that u will have type τ after the current instruction has been executed. The judgement hs, T∅i `f u : τ denotes that the variable u is required to have type τ in order to execute the instructions from the current instruction onward without raising a TypeError.

We now define the type inference rules with trails for both present and future use types. The rules for this are given in Figures4.3–4.6.

The axioms for inferring `p(cf. Figure4.3) account for situations in which the present type is fully determined by the current instruction. For example, after loading a constant (Rule pLC) the accumulator is known to have the type of the constant that has just been loaded. The inference rules in Figure4.4all follow a shared pattern: the types of the relevant variables in each previous state are calculated and the present type is the union of these. The relevant variables are instruction dependent. For example, in Rule pSG1 for the instruction SG x the type of x depends on the type of tos in the previous states.

Again, for the rules for `f we have axioms in Figure 4.5 and inference rules in Figure 4.6. Many of the axioms assign a future use type of > to a variable. This type indicates that there are no constraints on this variable and follows in cases where that variable is just about to be

s = hP, pci :: ... hs, xi 6∈ T Ppc = SG x hs_i, T ∪ {hs, xi}i `p tos : τifor each si ∈ prev(s)

hs, T i `px :F τi

pSG1

s = hP, pci :: ... hs, tosi 6∈ T Ppc = LG x hs_i, T ∪ {hs, tosi}i `p x : τifor each si ∈ prev(s)

hs, T i `p tos :F τi

pLG1

s = hP, pci :: ... hs, yi 6∈ T Ppc = SG x x 6= y hs_i, T ∪ {hs, yi}i `p y : τifor each si ∈ prev(s)

hs, T i `p y :F τi

pSG2

s = hP, pci :: ... hs, yi 6∈ T Ppc= LG x hs_i, T ∪ {hs, yi}i `py : τifor each si ∈ prev(s)

hs, T i `py :F τi

pLG2

s = hP, pci :: ... hs, ui 6∈ T Ppc ∈ {RET, JP pc0, CF f } hs_i, T ∪ {hs, ui}i `p u : τifor each si∈ prev(s)

hs, T i `p u :F τi

pRET/JP/CF

s = hP, pci :: ... hs, xi 6∈ T Ppc ∈ {LC c, JIF pc0, strOp, intOp, isInst τ } hs_i, T ∪ {hs, xi}i `px : τifor each si∈ prev(s)

hs, T i `p x :F τi

Figure 4.4: Inference rules for the `pjudgement.

overwritten (Rules fSET and fSG1). Hence, whatever is currently present in these variables is irrelevant. Otherwise, the immediate uses are recorded in the type (Rules fJIF, fSTR and fINT). Two interesting rules are fLG1 and fCF1. In these a variable is used but its contents remain intact so there may be future uses also. In the case of fCF1, f has to be a function type and also whatever it needs to be in the succeeding points. For this purpose we introduce a meet operation on types, written u· . This is defined in the following rules, which are applied in top-down order:

τ u· (τ1t τ2) = (τ u· τ1) t (τ u· τ2) (τ1t τ2)u· τ = (τ1u·τ ) t (τ2u· τ )

τ u· > = τ >u· τ = τ τ u· τ = τ τ1u·τ2 = ⊥

We use trails of computations to prevent cycles in the proof tree when trying to infer the type of a particular variable. Without trails, cycles would occur (see previous section) since some of the rules for `pand `f are defined in terms of other derivations for `por `f in the previous or next execution points in a program. Since a program usually contains cycles in the control flow graph

s = hP, pci :: ... hε, T i `f u : > fINIT Ppc= RET hhP, pci :: ε, T i `f u : > fEND Ppc = SG x hs, xi 6∈ T hs, T i `f x : > fSG1 hs, ui ∈ T hs, T i `f u : ⊥ fTRAIL Ppc = raise hs, ui 6∈ T hs, T i `f u : > fRAISE Ppc ∈ {LC c, LG x, isInst τ } hs, tosi 6∈ T hs, T i `f tos : > fSET Ppc= JIF pc0 hs, tosi 6∈ T hs, T i `f tos : Bool fJIF Ppc = strOp hs, tosi 6∈ T hs, T i `f tos : Str fSTR Ppc = intOp hs, tosi 6∈ T hs, T i `f tos : Int fINT

Figure 4.5: Inference rules for the `f judgement (axioms).

such as loops, these would also manifest themselves in the proof tree of a `p or `f judgement, as these branches along splits/joins in the control flow graph.

However, whenever the current execution point and variable are already present in the given trail, pTRAIL or fTRAIL are applied and the inferred type is ⊥. This is because no more type information can be gained by passing through the same execution point looking for the type of the same variable twice. If this trail entry is not present, the entry is added to the current trail if another rule is applied. Now that we explained the role of trails, we recap the notation hs, T i `p u : τ . This denotes that u will have type τ after the current instruction has been executed, omitting all type information that can be obtained by considering the items in the trail. The type inference algorithms encoded in our inference rules are clearly related to symbolic execution, with some key variations. Our inference rules only relate one variable at a time with each application. In the case of present types and whenever there is a control flow join, the analysis is forked and the result from each is joined using t. In the case of future use types and whenever there is a control flow split, the same happens as well. This helps collapse the state space of the analysis. Another difference to symbolic execution is that we ignore the predicates of conditional jump instructions. This is because we only look at one variable at a time and we do not keep track of the actual possible values inside the variables, but only their type. Due to this design, we also explore some unfeasible paths. Since it is not possible in general to statically determine the iteration bounds in loops and recursion, we need a mechanism to minimise the number of times the analysis iterates through the loops. In our case, we solve this problem using trails.

It is worth noting that the trail sets T are finitely bounded. This is due to the fact that call stacks are truncated to a fixed depth and that, for a given program, there are finitely many code locations and finitely many variables. For a given program, we write TU to denote the maximum trail containing all truncated execution point/variable pairs. Trails greatly facilitate the termination proof in the next section. Furthermore, if the type system is extended such that

s = hP, pci :: ... hs, xi 6∈ T Ppc= LG x hs_i, T ∪ {hs, xi}i `f tos : υifor each si ∈ next(s)

hsi, T ∪ {hs, xi}i `f x : νi for each si ∈ next(s) hs, T i `_f x :F(υ_iu·ν_i) fLG1

s = hP, pci :: hP0, ni :: ... hs, ui 6∈ T Ppc = RET hsi, T ∪ {hs, ui}i `f u : τifor each si ∈ next(s)

hs, T i `_f u :F τ_i fRET

s = hP, pci :: ... hs, tosi 6∈ T Ppc = SG x hsi, T ∪ {hs, tosi}i `f x : τi for each si∈ next(s)

hs, T i `_f tos :F τ_i fSG2

s ∈ hP, pci :: ... hs, yi 6∈ T Ppc ∈ {LG x, SG x} x 6= y hsi, T ∪ {hs, yi}i `f y : τifor each si∈ next(s)

hs, T i `_f y :F τi

fLG2/SG3

s = hP, pci :: ... hs, f i 6∈ T Ppc= CF f hsi, T ∪ {hs, f i}i `f f : τi for each si ∈ next(s)

hs, T i `_f f :F(τiu· Fn)

fCF1

s = hP, pci :: ... hs, ui 6∈ T Ppc = CF f u 6= f hsi, T ∪ {hs, ui}i `f u : τifor each si ∈ next(s)

hs, T i `_f u :F τi

fCF2

s = hP, pci :: ... hs, ui 6∈ T Ppc= JP n hsi, T ∪ {hs, ui}i `f u : τifor each si ∈ next(s)

hs, T i `_f u :F τi

fJP

s = hP, pci :: ... hs, xi 6∈ T Ppc∈ {LC c, JIF n, intOp, strOp, isInst τ } hsi, T ∪ {hs, xi}i `f x : τifor each si ∈ next(s)

hs, T i `_f x :F τi

Figure 4.6: Inference rules for the `f judgement.

the number of equivalence classes of types is no longer finite, the termination proof in the next section would still hold.

In the previous section we showed an example, where we tried to infer present types but ran into a cycle. We shall now use the rules defined in Figure4.5and Figure4.6to derive the future use type of a variable f at hM, 0i for program M :

M = [CF f ; 0 LG f ; 1 intOp; 2 RET 3 ]

Since there is no value that can be used both as a function and as an integer, we expect that the future use type of f be ⊥. We proceed to build the proof tree, which starts with the judgement hhM, 0i, T_∅i `_f f : ⊥. We omit the control flow graph in this example, as there is no branching. This is simply a sequence of ascending execution points.

hhM, 3i, {.., hhM, 2i, f i}i `_f f : >fEND hhM, 2i, {.., hhM, 1i, f i}i `f f : >

hhM, 2i, {.., hhM, 1i, f i}i `f tos : Int fINT

hhM, 1i, {hhM, 0i, f i}i `_f f : Int fLG1 hhM, 0i, T∅i `f f : ⊥

fCF1

We note that as we go up the proof tree, the trail is updated with the last element derived in the tree. For presentation purposes, we omit the previous elements from the trail. Note that the meet operation is used in the conclusion of fCF1 on Fn and Int, and in the conclusion of fLG1 on Int and >.

In the following sections we show that the type inference rules are correct and terminating.

In document Preemptive type checking in dynamically typed programs (Page 66-70)