Completeness in Abstract Interpretation - Automatic abstraction for bit vectors using decision


6.5.3 Completeness in Abstract Interpretation

Completeness in abstract interpretation was already considered by Cousot and Cousot [78, Sect. 7]. However, more than two decades passed after their work before Giacobazzi et al. [108] provided a constructive characterization of completion for the general case. Most notably, they only require a Scott-continuous transformer, which is a very general assumption, and then show how the problem of making abstract interpretations complete can be reduced to minimal extensions or refinements of the domain. In essence, their technique amounts to computing the least fixed point of a characterization that involves repeatedly computing closures. Our method for αV_exact(ϕ) is, of course, a significantly less general result, as we require an expressive (disjunctive) domain D and use incremental SAT solving to converge onto a complete transformer, which is possible because the base domain of Boolean formulae is finite. This approach can indeed be seen as minimally extending the transformer to capture certain effects of ϕ. Later, Giacobazzi and Quintarelli [105] showed that CEGAR fits into the methodology of completing abstract interpretations. As far as we are aware, our algorithm is the first that effectively computes complete transformers for bit-vectors.

6 Complete Transformers

6.6 Discussion

The focus of Chap. 3, Chap. 4, and Chap. 5 was over-approximation, with the aim of deriving abstractions and transformers that subsume the reachable states and transition relations of a concrete program, whilst being as precise as possible. The topic of this chapter is different, although refinements based on backward interpretation have already been woven into the forward analysis in Chap. 3.2. It is that of computing transformers (or abstractions) that are complete (or under-approximate) w.r.t. the base semantics, i.e., propositional Boolean formulae ϕ, which is achieved using a parametric procedure αV_exact(ϕ). The procedure is itself parametric in two abstract domains:

• a domain (G, ⊑_G) to specify ranges or sets of values, and

• a domain (U, ⊑_U) to specify equalities symbolically.

Note, however, that this limitation to two domains is inconsequential for correctness. It merely entails that transformers can often be represented compactly.

For finite domains, disjunctive completion can always be achieved, and there has been much interest in optimality, e.g., using power-set liftings [98] or closures [106, Sect. 3]. However, as our work is not concerned with domain constructions, but rather with the computation of complete transformers on complete domains, we have concentrated on power-set liftings of standard domains (e.g., octagons paired with affine equalities). The drawback of complete abstract interpretation [108] is its high cost, both for representing the reachable states and for deriving transformers. In the worst case, one has to resort to a representation that is as complex as the Boolean base semantics itself. Thus, instead of attempting to perform complete abstract interpretation directly, we apply a method that performs complete backward analysis only upon encountering a potential property violation, trying to find a path to the program entry by going backward step by step. Ideally, there are few such warnings.
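The on-demand backward step described above can be illustrated with a small sketch. It is not the procedure of this chapter: the 4-bit word size, the two-instruction program, and the helper names `pre_image` and `backward` are assumptions made for illustration, and exhaustive enumeration over explicit sets stands in for the SAT-based computation on a power-set domain.

```python
# Sketch only: exact backward analysis over explicit sets of 4-bit words.
# Enumeration replaces the SAT queries; all names here are hypothetical.
WORDS = range(16)

def pre_image(step, targets):
    """Exact set of inputs that `step` maps into `targets`."""
    return {x for x in WORDS if step(x) in targets}

def backward(steps, bad):
    """Walk from the flagged states back to the program entry.

    Returns the entry states that reach `bad`; an empty set means
    the warning is spurious.
    """
    states = set(bad)
    for step in reversed(steps):
        states = pre_image(step, states)
        if not states:
            return set()
    return states

# Hypothetical program: clear the lowest bit, then add 2 with wrap-around.
steps = [lambda x: x & 0b1110, lambda x: (x + 2) % 16]

spurious = backward(steps, {7})   # odd results are unreachable
genuine = backward(steps, {8})    # reachable from the entry states 6 and 7
```

A forward interval analysis of this program yields the full range [0, 15] for the result and therefore flags both targets. Only the backward pass separates the spurious warning from the one that has a witness at the program entry.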

7 Conclusion

Abstract interpretation provides a methodology that guarantees sound approximations of all states reachable in any concrete execution of a program. Among other algorithmic aspects, correctness of abstract interpretation ultimately depends on the correctness of the transfer functions for every program statement in every abstract domain used. This is not without problems (cf. [184]), since designing and implementing transfer functions is a challenging task, especially if the operations are low-level and diverse [115], as in binary analysis [9, 185, 186]. Prior to our work, the state of the art in abstract interpretation for bit-vectors was the manual design of transformers for each operation in a program.

7.1 Discussion

This dissertation can be seen as a response to this unsatisfactory situation in the sense that it advocates automatic abstraction as a key component of abstract interpretation frameworks. Particularly, we have discussed a collection of techniques that eliminate the need to handcraft transformers for the widely used numerical domains of intervals, value sets, octagons, convex polyhedra, arithmetical congruences, affine equalities, and bounded polynomials altogether, based on relational encodings of the concrete semantics of instructions and basic blocks. A commonality of all techniques is that they exploit the structure of the underlying abstract domain to guide the search for a sound (and optimal) abstraction. The search is, in turn, implemented using incremental SAT solving.
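To make the idea of a domain-guided search concrete, the following sketch computes the upper bound of the best interval for the output of a single instruction. It is an illustration under stated assumptions, not the algorithm of the preceding chapters: the 4-bit word size, the relation `add3`, and the function names are hypothetical, and the brute-force `satisfiable` function stands in for one incremental SAT query on the relational encoding.

```python
# Sketch only: bitwise search for the largest feasible output value.
# `satisfiable` is a brute-force stand-in for a SAT query.
WIDTH = 4  # hypothetical word size, small enough for enumeration

def satisfiable(rel, lo, hi, y_min):
    """Is there x in [lo, hi] and y >= y_min such that rel(x, y) holds?"""
    return any(rel(x, y)
               for x in range(lo, hi + 1)
               for y in range(y_min, 1 << WIDTH))

def max_output(rel, lo, hi):
    """Largest y related to some x in [lo, hi], or None if there is none."""
    if not satisfiable(rel, lo, hi, 0):
        return None
    best = 0
    # Fix the bits of the maximum from the most significant bit downwards:
    # a bit is kept whenever some output at least as large still exists.
    for bit in reversed(range(WIDTH)):
        candidate = best | (1 << bit)
        if satisfiable(rel, lo, hi, candidate):
            best = candidate
    return best

def add3(x, y):
    """Relational semantics of y = x + 3 on 4-bit words (wrap-around)."""
    return y == (x + 3) % (1 << WIDTH)

upper_small = max_output(add3, 2, 5)     # outputs are 5..8
upper_wrap = max_output(add3, 11, 14)    # outputs are 14, 15, 0, 1
```

The search needs one query per bit of the word rather than one per value, which is the sense in which the structure of the interval domain guides it. The lower bound can be found symmetrically.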

In a nutshell, the desire to reason about instructions that operate on finite machine words — rather than unbounded integers — manifests itself in two notable design decisions.

Relational Semantics We model each instruction in the domain of propositional Boolean formulae. Encodings for entire blocks are then derived by relational composition of the instructions that constitute the respective block. This approach confers two significant advantages:

1. Low-level operations can straightforwardly be expressed in Boolean logic.

2. Automatic abstraction uses off-the-shelf solvers and thus directly benefits


Finite Machine Arithmetic Instead of targeting wrap-around arithmetic using modular domains, which are notoriously difficult to support, we identify over- and underflow modes prior to the analysis, and then derive a transformer for each feasible mode combination. This choice leads to a formulation of transfer functions as guarded updates: a guard describes a class of inputs that satisfy a mode combination, whereas an update stipulates how a class of inputs is transformed into a class of outputs.
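As a minimal sketch of guarded updates, consider the unsigned addition of a constant on 8-bit words, abstracted by intervals. The word size, the triple representation of a guarded update, and the names `guarded_updates` and `apply_modes` are assumptions made for illustration. The guards are written down by hand here, whereas the framework derives them automatically.

```python
# Sketch only: guarded updates for y = (x + c) mod 2^WIDTH on intervals.
WIDTH = 8
MOD = 1 << WIDTH

def guarded_updates(c):
    """One (guard_lo, guard_hi, offset) triple per feasible overflow mode.

    Assumes 0 <= c < MOD. The guard is an interval on x; the update is
    y = x + offset, which is exact within the guard.
    """
    modes = [
        (0, MOD - 1 - c, c),            # no overflow: y = x + c
        (MOD - c, MOD - 1, c - MOD),    # overflow:    y = x + c - 2^WIDTH
    ]
    # Drop modes whose guard is empty (for c = 0 nothing can overflow).
    return [m for m in modes if m[0] <= m[1]]

def apply_modes(modes, lo, hi):
    """Output intervals for x in [lo, hi], one per mode that is reachable."""
    out = []
    for g_lo, g_hi, offset in modes:
        a, b = max(lo, g_lo), min(hi, g_hi)   # intersect input with guard
        if a <= b:
            out.append((a + offset, b + offset))
    return out

modes = guarded_updates(10)
split = apply_modes(modes, 240, 250)   # the input straddles the overflow point
plain = apply_modes(modes, 0, 5)       # only the no-overflow mode applies
```

Within each mode the update is an ordinary affine map on unbounded integers, so no modular domain is needed. The wrap-around is captured entirely by the case split on the guards.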

As a first application, we have discussed the problem of value set analysis for control flow reconstruction in Chap. 3. Indeed, the runtimes of the analysis are much smaller than we expected initially, given that a SAT solver is invoked on any application of a transformer. However, for relational abstract domains such as octagons or convex polyhedra, this form of online evaluation of transformers becomes intractable, which can be seen from the runtimes presented in Chap. 4.4 and Chap. 5.5. For relational abstractions, we thus compute transformers prior to the analysis itself. Computing the output of a block then amounts to evaluating a (linear or polynomial) map, rather than invoking a decision procedure. Apart from being correct by construction in a principled way, our approach features two more interesting properties:

1. We generate symbolic best transformers, which are optimal in the sense that more descriptive abstractions do not exist (in the respective abstract domain).

2. We generate transformers for blocks rather than individual instructions. We thus obtain more precise transformers than possible by computing abstractions separately, instruction by instruction.
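The precision gain of the second property can be seen on a two-instruction block. The sketch below is illustrative only: the block `t = x; y = t - x` on 4-bit words and the function names are hypothetical, and enumeration replaces the SAT-based derivation. It compares the best interval obtained per instruction with the best interval obtained for the block as a whole.

```python
# Sketch only: interval abstraction per instruction versus per block.
from itertools import product

MOD = 16  # hypothetical 4-bit words

def hull(values):
    """Smallest interval that contains all given values."""
    values = list(values)
    return (min(values), max(values))

def stepwise(lo, hi):
    """Abstract after every instruction; the link between t and x is lost."""
    xs = range(lo, hi + 1)
    t_lo, t_hi = hull(xs)                      # instruction 1: t = x
    ts = range(t_lo, t_hi + 1)
    # Instruction 2: y = t - x, with t and x only known by their intervals.
    return hull((t - x) % MOD for t, x in product(ts, xs))

def blockwise(lo, hi):
    """Abstract the composed relation of the whole block: y = x - x."""
    return hull((x - x) % MOD for x in range(lo, hi + 1))

per_instruction = stepwise(2, 5)   # differences -3..3 wrap to the full range
per_block = blockwise(2, 5)        # always zero
```

Both results are best abstractions of what they abstract. The difference arises because the intermediate interval for `t` discards the equality between `t` and `x`, which the block-level encoding retains.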

Computing transformers instead of designing them manually eliminates one important cause of incorrect implementations of abstract interpretation and yields optimal abstractions, too. Yet, the correctness argument comes at a price: we have to assume correctness of the underlying decision procedure, which is not always easy to establish [45]. However, we have not observed any unexpected results from a solver in our experiments, and we have also cross-checked the generated abstractions against results obtained using explicit-state model checking with [mc]square [207].

A distinguishing feature of our framework is that slight variations of the discussed algorithms are sufficient to generate complete (or under-approximate) abstractions rather than over-approximations. This feature makes our approach amenable to the generation of counterexample traces or the elimination of spurious warnings. Even though under-approximation integrates with abstract interpretation as smoothly as over-approximation does, comparatively few known techniques intentionally use such constructions; this may be explained by the difficulty of designing descriptive under-approximate or even complete transformers. Specifically, our technique structurally depends on some form of power-set construction as the abstract domain

In document Automatic abstraction for bit vectors using decision procedures (Page 171-175)