Static Analysis - Comparison with other work

3.3 Comparison with other work

3.3.5 Static Analysis

In Section 2.3.2, we have seen that approximations of reachable terms have been used for the flow analysis of imperative, functional and logic programs. As explained in that section, since all those analyses share a common mechanism, we focus on the analysis of (Jones, 1987; Jones and Andersen, 2007) because it is defined through TRS and thus makes the comparison more accurate. We now show how to use equational completion to perform a similar analysis. If necessary, the precision of the approximation can easily be improved. In particular, we show how to lift the precision of an approximation from a basic flow analysis to a shape analysis. Instead of building by hand the automata produced by equational completion, we use the Timbuk tool, which is presented in more detail in Chapter 4.

The contributions of (Jones and Andersen, 2007) are to deal with higher-order functions and lazy evaluation. Since the higher-order part can be handled by reusing their encoding of higher-order functions into first-order TRS (detailed in Section 3.3.5), we focus here on the lazy evaluation part. This example is interesting because the resulting grammar is fully detailed in their paper, so that we can compare the results. In (Jones and Andersen, 2007), the functional program is directly given in its TRS form:

g(N) -> first(N, sequence(nil))
first(nil, Xs) -> nil
first(cons(one,M), cons(X,Xs)) -> cons(X,first(M,Xs))
sequence(Y) -> cons(Y,sequence(cons(one,Y)))
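The rules above can be read as a lazy functional program. The following Python sketch is only an illustration of that reading, not part of the analysis of (Jones and Andersen, 2007) or of Timbuk: it assumes that cons lists are modelled as Python lists, that nil is the empty list, and that the infinite result of sequence is modelled by a generator. The function names mirror the TRS symbols; everything else is hypothetical.

```python
def sequence(y):
    """Lazily produce y, cons(one, y), cons(one, cons(one, y)), ...

    Mirrors sequence(Y) -> cons(Y, sequence(cons(one, Y))): the generator
    never terminates by itself, so the consumer decides how much to unfold.
    """
    while True:
        yield y
        y = ["one"] + y  # builds a fresh list, earlier results are untouched


def first(n_list, xs):
    """Take one element of xs for every 'one' found in n_list.

    Mirrors first(nil, Xs) -> nil and
    first(cons(one, M), cons(X, Xs)) -> cons(X, first(M, Xs)).
    """
    result = []
    for symbol in n_list:
        if symbol != "one":
            # No rule of the TRS applies to first(cons(atom, M), ...).
            raise ValueError("first is only defined on lists of 'one'")
        result.append(next(xs))
    return result


def g(n_list):
    """Mirrors g(N) -> first(N, sequence(nil))."""
    return first(n_list, sequence([]))
```

With this encoding, g(["one", "one", "one"]) unfolds the generator exactly three times, which is the behaviour that an eager evaluation of sequence(nil) could not provide.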

For any list N composed of n one symbols, the function g computes the list of the first n elements of the infinite list [nil, [one], [one, one], ...]. Note that this program needs a lazy or outermost evaluation strategy to terminate, because the sequence function does not terminate and builds the infinite list [nil, [one], [one, one], ...]. The initial set of terms is defined by the following automaton:

Automaton A0
States q0 ql qa q1 qnil
Final States q0
Transitions
g(ql) -> q0
cons(qa,ql) -> ql
cons(q1,ql) -> ql
cons(q1,qnil) -> ql
cons(qa,qnil) -> ql
nil -> qnil
atom -> qa
one -> q1

recognizing all terms of the form g(l), where l is any list of atoms that can be one or any other atom, as in (Jones and Andersen, 2007). Though (Jones and Andersen, 2007) do not comment on it, since the set of atoms is potentially infinite and grammars or automata can only be finite, it is necessary to abstract it finitely. We do this using two distinct constants: one, representing itself, and atom, representing all the other atoms distinct from one. In (Jones and Andersen, 2007), the objective is to infer the term structure of possible values for parameters and results of every function f without a priori knowledge of the inputs of f. Since completion covers all reachable terms, it also covers those that can be reached by a lazy evaluation. In fact, we can achieve exactly the same flow analysis and obtain the same result using equations defining a similar independent attribute approximation. This can be done using contextual equations (see Section 4.1.2). Recall that the intuition behind an independent attribute approximation for a function f is simply to merge together all possible call values for f. Hence, for the function first, which has two parameters, such an approximation can be defined using the single contextual equation:

[first(X,Y), first(Z,U)] => [X=Z Y=U]

Similarly, for sequence the equation is:

[sequence(X), sequence(Y)] => [X=Y]

Using those equations, we obtain a completed automaton of 11 states and 18 transitions. Among the transitions, we can find the following subset recognizing the set of results of the g calls:

nil -> q13
cons(q10,q13) -> q13
nil -> q10
cons(q3,q10) -> q10
one -> q3

which is the same result as the one obtained by (Jones and Andersen, 2007), i.e. any list whose elements are flat lists of one symbols.
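To make the reading of these five transitions concrete, here is a minimal Python sketch of a bottom-up run of a nondeterministic tree automaton. It is an illustration only and does not reflect how Timbuk is implemented. Terms are assumed to be nested tuples such as ("cons", head, tail), and the helper names states_of, recognizes and to_term are hypothetical. Only the transitions quoted above are encoded, not the whole completed automaton.

```python
from itertools import product

# The subset of transitions quoted in the text, keyed by
# (symbol, tuple of argument states). nil reaches two states.
TRANSITIONS = {
    ("nil", ()): {"q13", "q10"},
    ("cons", ("q10", "q13")): {"q13"},
    ("cons", ("q3", "q10")): {"q10"},
    ("one", ()): {"q3"},
}


def states_of(term, transitions=TRANSITIONS):
    """Return every state reachable by a bottom-up run on the term."""
    symbol, *args = term
    arg_states = [states_of(arg, transitions) for arg in args]
    reached = set()
    # Try every combination of states reached by the subterms.
    for combo in product(*arg_states):
        reached |= transitions.get((symbol, combo), set())
    return reached


def recognizes(term, state, transitions=TRANSITIONS):
    """True if the term is recognized into the given state."""
    return state in states_of(term, transitions)


def to_term(items):
    """Encode a (possibly nested) Python list as a cons/nil term."""
    term = ("nil",)
    for item in reversed(items):
        element = to_term(item) if isinstance(item, list) else (item,)
        term = ("cons", element, term)
    return term
```

For instance, recognizes(to_term([[], ["one"], ["one", "one"]]), "q13") holds, whereas a list containing the constant atom, or a flat list of one symbols at top level, is not recognized into q13 by these transitions.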

Adapting the approximation to the property to prove

In order to illustrate the impact of equations on the precision of the approximation, we aim at proving a property of the reverse function. This function is classically defined by:

append(nil,X) -> X
append(cons(X,Y), Z) -> cons(X, append(Y,Z))
rev(nil) -> nil
rev(cons(X,Y)) -> append(rev(Y), cons(X,nil))
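The four rules above can be executed directly on ground terms. The Python sketch below is a hypothetical illustration, assuming terms are nested tuples and using an innermost strategy chosen only for simplicity; the names NIL, cons, to_term and normalize are not taken from the source.

```python
NIL = ("nil",)


def cons(head, tail):
    return ("cons", head, tail)


def to_term(symbols):
    """Encode a flat Python list of constant names as a cons/nil term."""
    term = NIL
    for name in reversed(symbols):
        term = cons((name,), term)
    return term


def normalize(term):
    """Rewrite a ground term to normal form with the append/rev rules."""
    symbol, *args = term
    args = [normalize(arg) for arg in args]  # innermost: arguments first
    if symbol == "append":
        left, right = args
        if left == NIL:                      # append(nil, X) -> X
            return right
        if left[0] == "cons":                # append(cons(X,Y), Z) -> cons(X, append(Y,Z))
            _, head, tail = left
            return cons(head, normalize(("append", tail, right)))
    elif symbol == "rev":
        (arg,) = args
        if arg == NIL:                       # rev(nil) -> nil
            return NIL
        if arg[0] == "cons":                 # rev(cons(X,Y)) -> append(rev(Y), cons(X,nil))
            _, head, tail = arg
            return normalize(("append", ("rev", tail), cons(head, NIL)))
    return (symbol, *args)                   # no rule applies at the root
```

On a sample input such as the list a, a, b, c, d, normalizing rev of its encoding yields the encoding of d, c, b, a, a. The analysis discussed next aims at establishing this kind of shape for every input of the initial language, without enumerating the inputs.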

Assume that we want to know what the result of rev(l) can be, where l can be any flat list of a, b, c and d (in that order) such that l contains at least one occurrence of each symbol. The language rev(l) is defined by the following tree automaton:

Automaton A0
States q0 qla qlb qlc qld qnil qf qa qb qc qd
Final States q0
Transitions
f(qla) -> q0
cons(qa, qla) -> qla
cons(qa, qlb) -> qla
cons(qb, qlb) -> qlb
cons(qb, qlc) -> qlb
cons(qc, qlc) -> qlc
cons(qc, qld) -> qlc
cons(qd, qld) -> qld
cons(qd, qnil) -> qld
nil -> qnil
a -> qa
b -> qb
c -> qc
d -> qd

The expected result is, of course, the language of flat lists where symbols are in the opposite order and occur at least once. This can be seen as a shape analysis. If we use an independent attribute approximation, as in the previous section, using the following equations:

[append(X,Y), append(Z, U)] => [X=Z Y=U]
[rev(X), rev(Y)] => [X=Y]

the Timbuk tool produces a tree automaton where state q29 recognizes the result of rev(l):

nil -> q29
cons(q7,q29) -> q29
cons(q8,q29) -> q29
cons(q9,q29) -> q29
cons(q10,q29) -> q29
d -> q10
c -> q9
b -> q8
a -> q7

This language is the language of flat lists possibly containing the symbols a, b, c and d, but in any order. This result is coherent with an independent attribute approximation, since all call values of append are merged together. For any function f(x, y), we can improve the approximation by merging together the calling values for x on one side and for y on the other side only if the calling contexts are similar. This is the same idea as the one used to refine a 0-CFA analysis into a 1-CFA analysis: take the direct calling context into account. For the append symbol, for instance, this can be done using the following kind of equations:

[cons(append(X,Y),_), cons(append(Z,U),_)] => [X=Z Y=U]
[cons(_,append(X,Y)), cons(_,append(Z,U))] => [X=Z Y=U]
[append(append(X,Y),_), append(append(Z,U),_)] => [X=Z Y=U]
[append(_,append(X,Y)), append(_,append(Z,U))] => [X=Z Y=U]

where we merge call values of append only if the calling context at depth 1 is the same. Even though this improves the precision of the approximation, the resulting automaton still does not preserve the order of symbols in the list. In fact, even by distinguishing between all calling contexts of depth k ∈ N (as in a k-CFA analysis), the approximation would not be precise enough to obtain the expected result. However, we can construct a different approximation using the single equation:

append(append(X,Y),Z)=append(X,Z)

Using this equation, we obtain an approximation preserving the order of symbols: the resulting language contains any flat list of d, c, b and a in that order. However, the approximation is still too coarse, since there is no guarantee on the occurrence of every symbol in the list. This is due to the fact that, using the previous equation, we have in particular the following equality:

append(append(cons(b, nil), cons(a, nil)), nil) = append(cons(b, nil), nil)

meaning that every occurrence of the first term is equivalent to the second. This equation preserves the order of symbols but not their occurrence in the list: the symbol a has disappeared. Finally, it is possible to use the following equations:

cons(a, cons(a, X))=cons(a,X)
cons(b, cons(b, X))=cons(b,X)
cons(c, cons(c, X))=cons(c,X)
cons(d, cons(d, X))=cons(d,X)

expressing more precisely where contractions of infinite lists have to be performed. These equations make it possible to construct a completed tree automaton whose recognized language is the expected one, and which contains 19 states and 59 transitions.
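Since all the lists considered in this subsection are flat, the three result languages described in prose can be compared as word languages. The following Python sketch rests on that assumption: each regular expression is our reading of the corresponding description in the text (any order, order preserved without guaranteed occurrence, order preserved with every symbol occurring), not an output of Timbuk, and the names used are hypothetical.

```python
import re

# One regular expression per approximation, read off the prose descriptions.
LANGUAGES = {
    # Independent attribute approximation (state q29): any order.
    "independent_attribute": re.compile(r"[abcd]*"),
    # Equation append(append(X,Y),Z)=append(X,Z): order kept, symbols may vanish.
    "order_only": re.compile(r"d*c*b*a*"),
    # Contraction equations on cons: order kept, every symbol occurs.
    "expected": re.compile(r"d+c+b+a+"),
}


def accepted_by(symbols):
    """Map each approximation name to whether it contains the flat list."""
    word = "".join(symbols)
    return {
        name: pattern.fullmatch(word) is not None
        for name, pattern in LANGUAGES.items()
    }
```

Under this reading, the list d, b, a is accepted by the first two languages but rejected by the expected one, and the list a, b, c, d is accepted only by the independent attribute approximation. This matches the increasing precision described above.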

Dealing with higher-order functions

In (Jones and Andersen, 2007), another contribution is to deal with higher-order functions by encoding them in curried form into term rewriting systems. First, here is the functional program:

cons = λXλY.(X : Y)
double = λX.(X : X)
map = λFλL.if L = nil then nil else (F X) : (map F Xs)
f = λX.(map double X) : (map (cons a) X)

where ’a’ is an atom, and ’:’ and ’nil’ are the usual constructors for lists. Here is its encoding into a TRS, in curried form, where app stands for application, fcons stands for ’cons’ and cs stands for ’:’:

app(app(fcons, X), Y) -> cs(X, Y)
app(double, X) -> cs(X, X)
app(app(map, F), nil) -> nil
app(app(map, F), cs(X, Xs)) -> cs(app(F,X), app(app(map, F), Xs))
app(f,X) -> cs(app(app(map, double), X), app(app(map, app(fcons, a)), X))

In (Jones and Andersen, 2007), the objective is to infer the term structure of possible values for parameters and results of every function f without a priori knowledge on the inputs of the f function. We can do a similar analysis using Timbuk with the following set of contextual equations:

[app(app(map, X), Y), app(app(map, Z), U)] => [X=Z Y=U]
[app(double, X), app(double,Y)] => [X=Y]

Note that equations on double and fcons are not even necessary for the completion to terminate.
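The functional program given at the start of this subsection can be transcribed with curried Python functions, which may help when reading the app-based TRS encoding. This is a sketch only: it assumes that a cs cell is modelled as a tuple ("cs", head, tail), that nil and atoms are plain strings, and that the names cs, fcons, double, map_ and f simply mirror the symbols of the TRS.

```python
NIL = "nil"


def cs(head, tail):
    """The list constructor ':' of the source program."""
    return ("cs", head, tail)


def fcons(x):
    # app(app(fcons, X), Y) -> cs(X, Y)
    return lambda y: cs(x, y)


def double(x):
    # app(double, X) -> cs(X, X)
    return cs(x, x)


def map_(func):
    # app(app(map, F), nil) -> nil
    # app(app(map, F), cs(X, Xs)) -> cs(app(F, X), app(app(map, F), Xs))
    def apply_to(lst):
        if lst == NIL:
            return NIL
        _, head, tail = lst
        return cs(func(head), apply_to(tail))
    return apply_to


def f(x):
    # app(f, X) -> cs(app(app(map, double), X), app(app(map, app(fcons, a)), X))
    return cs(map_(double)(x), map_(fcons("a"))(x))
```

Each Python call func(arg) corresponds to one app(func, arg) node of the encoding, so a partial application such as fcons("a") plays the role of the term app(fcons, a).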

Practical contributions

Experimenting with tree automata and completion quickly requires tool support. The main reason is that, contrary to a word automaton, a tree automaton is a formal structure that is difficult to draw. Hence, reasoning about tree automata on paper is hard. The other reason is that, even on simple examples, tree automata completion can produce large and complex automata that cannot be managed by hand.
