Since the representation of program state is a logical predicate, there is an alternative to keeping a complete representation of the state at
7.5 Reasoning about Data Structures and Classes
The contract of the binary search procedure can be specified in a relatively simple, self-contained manner. Imagine, though, that it is part of a module that maintains a dictionary structure (e.g., the relation between postal codes and the nearest airport with air-freight capability). In that case, the responsibility for keeping the table in sorted order would belong to the module itself, and not to its clients. If implemented in a modern object-oriented language, the data structure would not even be visible to the client, but would rather be encapsulated within a class.
Modular reasoning about programs must follow the modular structure of program designs, with the same layering of design secrets. We must have ways of specifying contracts for classes and other modules that do not expose what the program constructs encapsulate.
Fortunately there are well-developed methods for modular specification and verification of modules that encapsulate data structures.
A data structure module provides a collection of procedures (methods) whose specifications are strongly interrelated. Their contracts with clients are specified by relating them to an abstract model of their (encapsulated) inner state. For example, the behavior of a dictionary object can be abstractly modeled as a set of 〈key,value〉 pairs. Reflecting the desired encapsulation and information hiding, the abstract model of the value of a dictionary structure is the same whether the structure is implemented using sorted arrays, a hash table, or a tree.
A module may be required to establish and preserve certain structural characteristics of the data structure it maintains. For example, if the dictionary structure is maintained as a pair of sorted arrays, then it is the responsibility of the dictionary module to maintain the arrays in sorted order. If the structure is a balanced search tree, then the responsibility is to properly initialize and maintain the tree structure. Such a required property is called a structural invariant, and it is directly analogous to a loop invariant. When reasoning about a loop invariant, we begin by showing that it is established when execution first reaches the loop; this corresponds to showing that the data structure is properly initialized. The methods of the data structure module correspond to paths through the body of the loop. Each method must preserve the structural invariant; that is, if the invariant holds before invocation of the method, then it must still hold when the method returns.
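The analogy with a loop invariant can be made concrete with a minimal sketch. The class below is hypothetical (the names SortedArrayDictionary and repOk are assumptions made for illustration, not taken from the text): it keeps a dictionary as a pair of parallel arrays and exposes the structural invariant as a checkable predicate. The invariant is established by initialization, and put is written so that it still holds on return.

```java
import java.util.Arrays;

/** Hypothetical dictionary kept as two parallel arrays sorted by key. */
final class SortedArrayDictionary {
    private String[] keys = new String[4];
    private String[] values = new String[4];
    private int size = 0;

    /** Structural invariant: the first `size` keys are non-null and strictly increasing. */
    boolean repOk() {
        if (size < 0 || size > keys.length || keys.length != values.length) return false;
        for (int i = 0; i < size; i++) {
            if (keys[i] == null) return false;
            if (i > 0 && keys[i - 1].compareTo(keys[i]) >= 0) return false;
        }
        return true;
    }

    /** Inserts or updates a pair; must preserve the invariant. */
    void put(String key, String value) {
        if (key == null) throw new IllegalArgumentException("null key");
        int pos = Arrays.binarySearch(keys, 0, size, key);
        if (pos >= 0) {            // key already present: overwrite its value
            values[pos] = value;
            return;
        }
        int insertAt = -(pos + 1); // insertion point that keeps the keys sorted
        if (size == keys.length) { // grow both arrays together
            keys = Arrays.copyOf(keys, 2 * size);
            values = Arrays.copyOf(values, 2 * size);
        }
        System.arraycopy(keys, insertAt, keys, insertAt + 1, size - insertAt);
        System.arraycopy(values, insertAt, values, insertAt + 1, size - insertAt);
        keys[insertAt] = key;
        values[insertAt] = value;
        size++;
    }

    /** Relies on the invariant: binary search is valid only on sorted keys. */
    String get(String key) {
        int pos = Arrays.binarySearch(keys, 0, size, key);
        return pos >= 0 ? values[pos] : null;
    }
}
```

Checking repOk() after construction and after every call to put mirrors the two proof obligations for a loop invariant: it holds initially, and each path through the "body" preserves it.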
The second responsibility of a class or other data structure module is that its behavior must faithfully reflect the abstract model. To make this precise, one posits an abstraction function that maps concrete object states to abstract model states. The abstraction function for a dictionary object would map the object to a set of 〈key,value〉 pairs. Using the conventional notation φ for an abstraction function, the contract of the get method of java.util.Map might include a pre- and postcondition that can be expressed as the Hoare triple
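One plausible rendering of such a triple is sketched below. The names dict (the map object), k (a key), v (the value paired with k in the abstract model), and o (the result of the call) are assumptions chosen for illustration:

```latex
\{\; \langle k, v \rangle \in \phi(\mathit{dict}) \;\}
\quad o = \mathit{dict}.\mathrm{get}(k) \quad
\{\; o = v \;\}
```

Read informally: if the abstract model of dict contains the pair 〈k,v〉 before the call, then the value returned by get(k) is v. The triple mentions only φ(dict), never the arrays, hash table, or tree used to implement the map.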
Explicit consideration of the abstract model, abstraction function, and structural invariant of a class or other data structure module is the basis not only of formal or informal reasoning about correctness, but also of designing test cases and test oracles.
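As a rough illustration of how these three ingredients can drive a test oracle, the sketch below uses a hypothetical ListDictionary whose concrete state is an unsorted list of pairs. The class names, the method names repOk and abstraction, and the choice of java.util.Map to represent the abstract set of 〈key,value〉 pairs are all assumptions made for this example. The oracle predicts the effect of put on the abstract model and compares the prediction with the abstraction of the concrete object after the call.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical dictionary whose concrete state is an unsorted list of pairs. */
final class ListDictionary {
    private final List<String[]> pairs = new ArrayList<>();

    /** Structural invariant: no key occurs in more than one pair. */
    boolean repOk() {
        for (int i = 0; i < pairs.size(); i++)
            for (int j = i + 1; j < pairs.size(); j++)
                if (pairs.get(i)[0].equals(pairs.get(j)[0])) return false;
        return true;
    }

    /** Abstraction function: maps the concrete list to the abstract set of pairs. */
    Map<String, String> abstraction() {
        Map<String, String> model = new HashMap<>();
        for (String[] p : pairs) model.put(p[0], p[1]);
        return model;
    }

    void put(String key, String value) {
        for (String[] p : pairs) {
            if (p[0].equals(key)) { p[1] = value; return; }
        }
        pairs.add(new String[] {key, value});
    }

    String get(String key) {
        for (String[] p : pairs) if (p[0].equals(key)) return p[1];
        return null;
    }
}

/** Oracle sketch: checks a single put against the abstract model. */
final class DictionaryOracle {
    static boolean putAgreesWithModel(ListDictionary d, String key, String value) {
        Map<String, String> expected = d.abstraction(); // model state before the call
        expected.put(key, value);                        // predicted effect on the model
        d.put(key, value);                               // actual effect on the object
        return d.repOk()
            && expected.equals(d.abstraction())
            && value.equals(d.get(key));
    }
}
```

Because the oracle is phrased entirely in terms of the abstract model, the same check could be reused for a sorted-array or tree implementation, provided that implementation supplies its own abstraction function and invariant check.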
Summary
Symbolic execution is a bridge from an operational view of program execution to logical and mathematical statements. The basic symbolic execution technique is like hand execution using symbols rather than concrete values. To use symbolic execution for loops, procedure calls, and data structures encapsulated in modules (e.g., classes), it is necessary to proceed hierarchically, composing facts about small parts into facts about larger parts. Compositional reasoning is closely tied to strategies for specifying intended behavior.
Symbolic execution is a fundamental technique that finds many different applications. Test data generators use symbolic execution to derive constraints on input data. Formal verification systems combine symbolic execution to derive logical predicates with theorem provers to prove them. Many development tools use symbolic execution techniques to perform or check program transformations, for example, unrolling a loop for performance or refactoring source code.
Human software developers can seldom carry out symbolic execution of program code in detail, but often use it (albeit informally) for reasoning about algorithms and data structure designs. The approach to specifying preconditions, postconditions, and invariants is also widely used in programming, and is at least partially supported by tools for run-time checking of assertions.
Further Reading
The techniques underlying symbolic execution were developed by Floyd [Flo67] and Hoare [Hoa69], although the fundamental ideas can be traced all the way back to Turing and the beginnings of modern computer science. Hantler and King [HK76] provide an excellent, clear introduction to symbolic execution in program verification. Kemmerer and Eckman [KE85] describe the design of an actual symbolic execution system, with discussion of many pragmatic details that are usually glossed over in theoretical descriptions.
Generation of test data using symbolic execution was pioneered by Clarke [Cla76], and Howden [How77, How78] described an early use of symbolic execution to test programs. The PREfix tool described by Bush, Pincus, and Sielaff [BPS00] is a modern application of symbolic testing techniques with several refinements and simplifications for adequate performance on large programs.
Exercises
7.1 We introduce symbols to represent variables whose value may change, but we do not bother to introduce symbols for variables whose value remains unchanged in the code we are symbolically executing. Why are new symbols necessary in the former case but not in the latter?
7.2 Demonstrate that the statement return dictValues[mid] at line 27 of the binary search program of Figure 7.1 always returns the value of the input key.
7.3 Compute an upper bound to the number of iterations through the while loop of the binary search program of Figure 7.1.
7.4 The body of the loop of the binary search program of Figure 7.1 can be modified as follows:

    if (comparison < 0) {
        /* dictKeys[mid] too small; look higher */
        low = mid + 1;
    }
    if (comparison > 0) {
        /* dictKeys[mid] too large; look lower */
        high = mid - 1;
    }
    if (comparison == 0) {
        /* found */
        return dictValues[mid];
    }

Demonstrate that the path that traverses the false branch of all three statements is infeasible.
7.5 Write the pre- and postconditions for a program that finds the index of the maximum element in a nonempty set of integers.