Tracking Definitions and Shape-Class Generators

2.3 Shape Analysis

2.3.6 Tracking Definitions and Shape-Class Generators

Instead of directly considering shape classes, we separate two aspects of shape classes. First, a tracking definition provides information about which pointers and which field predicates need to be tracked on a syntactic level. Second, given a tracking definition, a shape-class generator determines which predicates are actually added to the shape class.

A tracking definition D = (T, Ts, Φ) consists of (1) a set T of tracked pointers, which is the set of variable identifiers that may be pointing to some node in a shape graph; (2) a set Ts ⊆ T of separating pointers, which is the set of variable identifiers for which we want the corresponding predicates (e.g., points-to, reachability) to be abstraction predicates (i.e., precisely tracked, no value 1/2 allowed); and (3) a set Φ of field assertions. A tracking definition D = (T, Ts, Φ) refines a tracking definition D′ = (T′, Ts′, Φ′) if T′ ⊆ T, Ts′ ⊆ Ts, and Φ′ ⊆ Φ. We denote the set of all tracking definitions by 𝒟. The coarsest tracking definition (∅, ∅, ∅) is denoted by D0.
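As an illustration, the refinement relation on tracking definitions can be sketched in Python as follows. The class name TrackingDefinition, the helper td, and the encoding of field assertions as plain strings are assumptions made for this sketch; they are not part of the formal development.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrackingDefinition:
    """D = (T, Ts, Phi); field assertions are kept as plain strings (an assumption)."""
    tracked: frozenset      # T: tracked pointers
    separating: frozenset   # Ts: separating pointers, required to be a subset of T
    assertions: frozenset   # Phi: field assertions

    def __post_init__(self):
        if not self.separating <= self.tracked:
            raise ValueError("Ts must be a subset of T")

    def refines(self, other):
        """True if self refines other: each component of other is contained in self."""
        return (other.tracked <= self.tracked
                and other.separating <= self.separating
                and other.assertions <= self.assertions)


def td(tracked=(), separating=(), assertions=()):
    """Hypothetical convenience constructor that accepts any iterables."""
    return TrackingDefinition(frozenset(tracked), frozenset(separating),
                              frozenset(assertions))


D0 = td()                                   # coarsest tracking definition
D = td({"l1", "l2"}, {"l1"}, {"data = 0"})  # tracks l1 and l2, separates l1
```

In this sketch, every tracking definition refines D0 and itself, while D0 refines only itself.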

A shape-class generator (SCG) is a function m : 𝒟 → 𝒮 that takes as input a tracking definition and returns a shape class, which consists of core predicates, instrumentation predicates, and abstraction predicates. While useful SCGs contain points-to and field predicates for pointers and field assertions from the tracking definition, and the predicate eq, other predicates need to be added by appropriate SCGs. An SCG m refines an SCG m′ (denoted by m ⊑ m′) if m(D) ≼ m′(D) for every tracking definition D. We require that the set of SCGs contains at least the coarsest element m0, which is a constant function that generates for each tracking definition the shape class (∅, ∅, ∅). Furthermore, we require each SCG to be monotonic: given an SCG m and two tracking definitions D and D′, if D ≼ D′, then m(D) ≼ m(D′).

A shape type 𝕋 = (σ, m, D) consists of a structure type σ, an SCG m, and a tracking definition D. For example, consider the type σ = {data, succ} (corresponding to the C type struct node {int data; struct node *succ;}) and the tracking definition D = ({l1, l2}, {l1}, {data = 0}). To form a shape type for a singly-linked list, we can choose an SCG that takes a tracking definition D = (T, Ts, Φ) and produces a shape class S = (Pcore, Pinstr, Pabs) with the following components: the set Pcore of core predicates contains the default binary predicate eq (for distinguishing summary nodes), a binary predicate succ for representing links between nodes in the list, a unary points-to predicate for each variable identifier in T, and a unary field predicate for each assertion in Φ. The set Pinstr of instrumentation predicates contains for each variable identifier in T a reachability predicate. The set Pabs of abstraction predicates contains all core and instrumentation predicates about separating pointers from Ts. More precise shape types for singly-linked lists can be defined by providing an SCG that adds more instrumentation predicates (e.g., cyclicity).
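The singly-linked-list SCG described above can be sketched as a small Python function. The predicate names (pt_x, reach_x, field[...]) and the function names sll_scg and class_refines are hypothetical. The ordering on shape classes used by class_refines (component-wise inclusion) is an assumption of this sketch, since the ordering itself is defined elsewhere.

```python
def sll_scg(tracked, separating, assertions):
    """Hypothetical SCG for singly-linked lists: (T, Ts, Phi) -> (Pcore, Pinstr, Pabs)."""
    points_to = {f"pt_{x}" for x in tracked}
    field = {f"field[{a}]" for a in assertions}
    core = {"eq", "succ"} | points_to | field
    instr = {f"reach_{x}" for x in tracked}
    # Abstraction predicates: core and instrumentation predicates
    # that talk about the separating pointers in Ts.
    abstraction = ({f"pt_{x}" for x in separating}
                   | {f"reach_{x}" for x in separating})
    return frozenset(core), frozenset(instr), frozenset(abstraction)


def class_refines(s, s_prime):
    """Assumed ordering: s refines s_prime if each component of s_prime is contained in s."""
    return all(b <= a for a, b in zip(s, s_prime))


# Shape class for the tracking definition D = ({l1, l2}, {l1}, {data = 0}).
fine = sll_scg({"l1", "l2"}, {"l1"}, {"data = 0"})
# Shape class for a coarser tracking definition ({l1}, {}, {}).
coarse = sll_scg({"l1"}, set(), set())
```

Under the assumed ordering, fine refines coarse but not the other way around, which is what monotonicity requires for this pair of tracking definitions.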

A shape-abstraction specification is a set of shape types. The specification Ψ̂ defines a shape abstraction Ψ in the following way: a shape type 𝕋 ∈ Ψ̂ yields a shape class S ∈ Ψ with S = 𝕋.m(𝕋.D). (We use the notation X.y to denote the component y of a structure X.) Given a program P, the initial shape-abstraction specification Ψ̂0 is defined as the set {(σ, m0, D0) | σ is a structure type of P}; the initial shape region G0 corresponds to ⊤Ψ, where Ψ is the shape abstraction corresponding to Ψ̂. Region G0 does not constrain the state space.

Chapter 3: Configurable Program Analysis

3.1 Motivation

Automatic program verification requires a choice between precision and efficiency. The more precise a method, the fewer false alarms it will produce, but also the more expensive it is, and thus applicable to fewer programs. Historically, this trade-off was reflected in two major approaches to static verification: program analysis and model checking. While in principle each of the two approaches can be (and has been) viewed as a subcase of the other [Schmidt 1998; Steffen 1991; Cousot and Cousot 1995], such theoretical relationships have had little impact on the practice of verification. Program analyzers, by and large, still target the efficient computation of few simple facts about large programs; model checkers, by contrast, still focus on the removal of false alarms through ever more refined analyses of relatively small programs. Emphasizing efficiency, static program analyzers are usually path-insensitive, because the most efficient abstract domains lose precision at the join points of program paths. Emphasizing precision, software model checkers, on the other hand, usually never join abstract domain elements (such as predicates), but explore an abstract reachability tree that keeps different program paths separate.

In order to experiment with the trade-offs, and in order to be able to set the dial between the two extreme points, we have developed and implemented a new framework that permits customized program analyses. Traditionally, customization has meant choosing a particular abstract interpreter (abstract domain and transfer functions, perhaps a widening operator) [Lev-Ami and Sagiv 2000; Dwyer and Clarke 1996; Martin 1998; Tjiang and Hennessy 1992], or a combination of abstract interpreters [Gulwani and Tiwari 2006; Cousot and Cousot 1979; Codish et al. 1993; Lerner et al. 2002]. Here, we go a step further in that we also configure the execution engine of the chosen abstract interpreters. At one extreme (typical for program analyzers), the execution engine propagates abstract domain elements along the edges of the control-flow graph of a program until a fixpoint is reached [Cousot and Cousot 1977]. At the other extreme (typical for model checkers), the execution engine unrolls the control-flow graph into a reachability tree and decorates the tree nodes with abstract domain elements, until each node is ‘covered’ by some other node that has already been explored [Beyer et al. 2007]. In order to customize the execution of a program analysis, we define and implement a meta engine that needs to be configured by providing, in addition to one or more abstract interpreters, a merge operator and a termination check.

The merge operator indicates when two nodes of a reachability tree are merged, and when they are explored separately: in classical program analysis, two nodes are merged if they refer to the same control location of the program; in classical model checking, no nodes are merged. The termination check indicates when the exploration of a path in the reachability tree is stopped at a node: in classical program analysis, when the corresponding abstract state does not represent new (unexplored) concrete states (i.e., a fixpoint has been reached); in classical model checking, when the corresponding abstract state represents a subset of the concrete states represented by another node. Our motivation is practical, not purely theoretical: while it is theoretically possible to redefine the abstract interpreter to capture different merge operators and termination checks within a single execution engine, we wish to reuse abstract interpreters as building blocks, while still experimenting with different merge operators and termination checks. This is particularly useful when several abstract interpreters are combined. In this case, our meta engine can be configured by defining a composite merge operator from the component merge operators, a composite termination check from the component termination checks, and also a composite transfer function from the component transfer functions.
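To make the role of the two parameters concrete, the following Python sketch shows a worklist engine that is configured by a merge operator and a termination check. It is a minimal illustration under stated assumptions, not necessarily the meta engine implemented in this work: the toy control-flow graph EDGES, the abstract state encoding (location, set of possible values of x), and all function names are hypothetical.

```python
# Toy control-flow graph (assumption): location -> list of
# (constant assigned to x, or None for a no-op; successor location).
EDGES = {0: [(None, 1), (None, 2)], 1: [(1, 3)], 2: [(2, 3)], 3: []}


def transfer(state):
    """Abstract successors of a state (location, frozenset of possible x values)."""
    loc, values = state
    return [(succ, values if const is None else frozenset({const}))
            for const, succ in EDGES[loc]]


def merge_sep(new, old):
    """Model-checking style: never merge, keep the old state unchanged."""
    return old


def merge_join(new, old):
    """Program-analysis style: join states that share a control location."""
    if new[0] == old[0]:
        return (old[0], old[1] | new[1])
    return old


def stop_sep(state, reached):
    """Stop if a single reached state at the same location covers the state."""
    return any(state[0] == r[0] and state[1] <= r[1] for r in reached)


def run(initial, transfer, merge, stop):
    """Worklist engine parameterized by a merge operator and a termination check."""
    reached = {initial}
    waitlist = [initial]
    while waitlist:
        state = waitlist.pop()
        for succ in transfer(state):
            # Let the merge operator combine succ with each reached state.
            for old in list(reached):
                new = merge(succ, old)
                if new != old:
                    reached.discard(old)
                    reached.add(new)
                    if old in waitlist:
                        waitlist.remove(old)
                    if new not in waitlist:
                        waitlist.append(new)
            # Explore succ further only if it is not already covered.
            if not stop(succ, reached):
                reached.add(succ)
                waitlist.append(succ)
    return reached


def values_at(reached, loc):
    """Sorted value sets of all reached states at a given location."""
    return sorted(sorted(v) for l, v in reached if l == loc)


INITIAL = (0, frozenset({0}))
joined = run(INITIAL, transfer, merge_join, stop_sep)
separate = run(INITIAL, transfer, merge_sep, stop_sep)
```

On this toy graph, the configuration with merge_join ends with a single state at location 3 that carries the value set {1, 2}, whereas the configuration with merge_sep keeps two states at location 3, one per path.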

Combining the advantages of different execution engines for different abstract interpreters can yield dramatic results, as was shown by predicated lattices [Fischer et al. 2005]. That work combined predicate abstraction with a data-flow domain: the data-flow analysis becomes more precise by distinguishing different paths through predicates; at the same time, the efficiency of a lattice-based analysis is preserved for facts that are difficult to track by predicates. However, the configuration of predicated lattices is just one possibility, combining abstract reachability trees for the predicate domain with a join-based analysis for the data-flow domain. Another example is lazy shape analysis [Beyer et al. 2006], where we combined predicate abstraction and shape analysis. Again, we ‘hard-wired’ one particular such combination: no merging of nodes; termination by checking coverage between individual nodes; Cartesian product of transfer functions. Our new, configurable implementation permits the systematic experimentation with many variations. We show that different configurations can lead to large, example-dependent differences in precision and performance. In particular, it is often useful to use non-Cartesian transfer functions, where information flows between multiple abstract interpreters, e.g., from the predicate state to the shape state (or lattice state), and vice versa. By choosing suitable abstract interpreters and configuring the meta engine, we can also compare the effectiveness and efficiency of symbolic versus explicit representations of values, and the use of different pointer alias analyses in software model checking.
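The difference between a Cartesian and a non-Cartesian composite transfer function can be illustrated with a small Python sketch. The two toy component domains (a single predicate "x == 0" and an interval for x), the deliberately imprecise interval transfer, and all function names are assumptions chosen for this illustration; they do not correspond to the predicate and shape domains used in this work.

```python
from itertools import product


def pred_transfer(p, op):
    """Predicate component: does 'x == 0' hold? True, False, or None (unknown)."""
    if op == "assume x == 0":
        return [True]
    if op == "x = x + 1":
        return [None]  # sound but imprecise: the truth value is forgotten
    return [p]


def interval_transfer(iv, op):
    """Interval component (lo, hi) for x; deliberately ignores assume edges."""
    lo, hi = iv
    if op == "x = x + 1":
        return [(lo + 1, hi + 1)]
    return [iv]


def cartesian_transfer(state, op):
    """Cartesian composite: each component is computed independently."""
    p, iv = state
    return list(product(pred_transfer(p, op), interval_transfer(iv, op)))


def strengthen(p, iv):
    """Information flow from the predicate component to the interval component."""
    if p is True:
        lo, hi = max(iv[0], 0), min(iv[1], 0)
        return (lo, hi) if lo <= hi else None  # None: infeasible combination
    return iv


def strengthened_transfer(state, op):
    """Non-Cartesian composite: Cartesian successors refined by strengthen."""
    result = []
    for p, iv in cartesian_transfer(state, op):
        refined = strengthen(p, iv)
        if refined is not None:
            result.append((p, refined))
    return result


start = (None, (-10, 10))
cartesian = cartesian_transfer(start, "assume x == 0")
strengthened = strengthened_transfer(start, "assume x == 0")
```

In this sketch the Cartesian successor keeps the interval (-10, 10), while the strengthened successor narrows it to (0, 0); for a start interval that excludes 0, the strengthened transfer yields no successor at all.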

In recent years we have observed a convergence of historically distinct program verification techniques. It is indeed difficult to say whether our configurable verifier is a model checker (as it is based on Blast) or a program analyzer (as it is configured by choosing a set of abstract interpreters and some parameters for executing and combining them). We believe that the distinction is no longer practically meaningful (it has not been theoretically meaningful for some time), and that this signals a new phase in automatic software-verification tools.

In document Software Verification by Combining Program Analyses of Adjustable Precision (Page 57-61)