Bounded Symbolic Execution Using Incremental Constraint Solving
7.2 Our Approach
Our approach can be used in the verification mode to check whether the desired property holds for a program, or in the testing mode to generate test inputs.
Given as input a parametrized program (annotated with specifications), the class bounds, and the loop bounds, our approach explores the program in depth-first order and checks the path conditions using an incremental SMT solver. In the verification mode, our approach terminates either when it finds a program input that leads the program to satisfy the negated specifications (i.e., a counterexample to the specifications), or when all paths have been explored.
In the testing mode, our approach ignores the specifications and terminates when it has explored all paths corresponding to the different combinations of decisions. Our symbolic execution approach is divided into three steps. First, the decision graph is constructed from the analyzed program. Second, path conditions are constructed from the decision graph. Third, the path conditions are solved using an incremental solver during path exploration.
7.2.1 Construction of the Decision Graph
A central part of our approach is the construction of a decision graph (a compact version of the verification graph) to support path condition generation and path exploration. We build this graph from an acyclic verification graph, which is constructed from the analyzed program as shown in Section 3.1. In addition to the code transformations used in the construction of the verification graph, our approach uses heuristics to detect loops with concrete (i.e., non-symbolic) conditionals. For example, it identifies that the loop for(int i=0; i<K; i++) iterates exactly K times. In addition, constants are unfolded and unreachable code is removed. Using constant (and null-value) propagation, we avoid adding unnecessary exceptional branches. For example, there is no need to prepend an exceptional branch to x.f=y; when the object x is known to be non-null at the field access. On the other hand, if x is known to be null at the current statement, we replace this statement by an edge that targets an error/exit node of the decision graph. Note that, as the decision graph is acyclic, each variable definition is dynamically unique, i.e., we can treat each defined variable as a constant. It is important to note that our approach performs all these transformations and the path condition generation (to be shown in Section 7.2.2) before path exploration.
Figure 7.1 shows a decision graph for the add function in Fig. 7.1(a). The code of function add is for illustrative purposes only; the decision graph in Fig. 7.1(b) is constructed from the add function, where new variables (b1 and b2) are introduced and each variable is defined only once, similar to SSA [Rosen et al., 1988]. It is important to note that a similar effect can be obtained without applying this transformation, e.g., by hashing the assignment statements.
//@ ensures \result >= 0;
int add(int a, int b) {
    if (a < 0) {
        b = a + b;
    }
    if (b < 0) {
        b = -b;
    }
    return b;
}
a < 0: b1 = a + b
a ≥ 0: b1 = b
b1 < 0: b2 = 0 − b1
b1 ≥ 0: b2 = b1
¬(b2 ≥ 0)
(a) A simple code (b) Decision graph
Fig. 7.1: A decision graph in (b) for the code in (a). Its edges contain the branch conditions and the basic code block (highlighted).
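The structure of Fig. 7.1(b) can be sketched in a few lines of Java. The sketch below is only an illustration under our own assumptions: the class DecisionGraphSketch and its Node and Edge types are hypothetical names, and conditions and definitions are kept as plain strings rather than as the internal representation used by the approach.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the decision graph of Fig. 7.1(b); all names are assumptions. */
class DecisionGraphSketch {
    /** An edge carries a branch condition and a single-assignment definition (as text). */
    static final class Edge {
        final String condition;   // e.g. "a < 0"
        final String definition;  // e.g. "b1 = a + b"; empty if the edge defines nothing
        final Node target;
        Edge(String condition, String definition, Node target) {
            this.condition = condition;
            this.definition = definition;
            this.target = target;
        }
    }

    static final class Node {
        final List<Edge> out = new ArrayList<>();
    }

    /** Builds the acyclic graph for add: two decisions, then the negated postcondition. */
    static Node buildAddGraph() {
        Node n0 = new Node(), n1 = new Node(), n2 = new Node(), exit = new Node();
        n0.out.add(new Edge("a < 0", "b1 = a + b", n1));
        n0.out.add(new Edge("a >= 0", "b1 = b", n1));
        n1.out.add(new Edge("b1 < 0", "b2 = 0 - b1", n2));
        n1.out.add(new Edge("b1 >= 0", "b2 = b1", n2));
        n2.out.add(new Edge("!(b2 >= 0)", "", exit));
        return n0;
    }

    /** Enumerates every root-to-leaf path in depth-first order. */
    static List<List<String>> paths(Node root) {
        List<List<String>> result = new ArrayList<>();
        collect(root, new ArrayList<String>(), result);
        return result;
    }

    private static void collect(Node node, List<String> prefix, List<List<String>> result) {
        if (node.out.isEmpty()) {
            result.add(new ArrayList<>(prefix));
            return;
        }
        for (Edge e : node.out) {
            prefix.add(e.condition);
            if (!e.definition.isEmpty()) prefix.add(e.definition);
            collect(e.target, prefix, result);
            // Backtrack: drop what this edge contributed to the prefix.
            if (!e.definition.isEmpty()) prefix.remove(prefix.size() - 1);
            prefix.remove(prefix.size() - 1);
        }
    }

    public static void main(String[] args) {
        for (List<String> p : paths(buildAddGraph())) System.out.println(p);
    }
}
```

Because every variable is defined once per path, the four enumerated paths can share the definition of b1 on their common prefix, which is the property exploited in the following sections.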
7.2.2 Path Condition Generation
It is well known that sharing structurally equal expressions can reduce the space and time requirements of constraint solving, especially when dealing with large constraints. Modern SMT solvers identify the sharing automatically, but there is a cost associated with doing so, and the mechanism used to identify sharing is not optimal for the analyzed programs. For this reason, we construct the path conditions in a representation that facilitates the identification of the sharing.
In particular, we translate the branch conditions and assignment statements on the edges of the decision graph into SMT assertions and variable definitions, respectively, using the rules presented in Fig. 3.6. In contrast to constructing a single long path condition over the program arguments for each branch, we treat the variables defined in the assignment statements as the references to the common sub-expressions (i.e., to the expressions on the right-hand side of the statements), and use them to construct many short path conditions.
Our representation of path conditions carries code-level information into the constraints, which facilitates the elimination of common sub-expressions during SMT solving.
Consider, for example, the code fragment if(.)a=x+y;if(a+z>10){.}.
With traditional symbolic execution, the path corresponding to the traversal of the true branches is denoted by the constraint ... x + y + z > 10. Our approach, however, translates this constraint into ... a1 = x0 + y0 ∧ a1 + z0 > 10, as it identifies that the expression denoted by a1 can be reused in other contexts.
The use of such a representation increases space requirements, i.e., it increases the number of variables and conjuncts in the constraint. On the other hand, it helps the constraint solver by letting it associate information with newly defined symbols (in this case, a1).
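To make the difference between the two encodings concrete, the following Java sketch prints both forms for the fragment above. It is an illustration under our own assumptions: the class PathConditionSketch, the Expr interface, and the symbol names x_0, y_0, z_0, and a_1 are hypothetical, and the output is an SMT-LIB fragment without the declarations of the input symbols.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch contrasting an inlined path condition with a definitional one. */
class PathConditionSketch {
    /** An expression prints itself in SMT-LIB prefix form; names found in 'inline' are expanded. */
    interface Expr { String smt(Map<String, Expr> inline); }

    static Expr atom(String name) {
        return inline -> {
            Expr def = inline.get(name);
            return def == null ? name : def.smt(inline);
        };
    }

    static Expr app(String op, Expr... args) {
        return inline -> {
            StringBuilder sb = new StringBuilder("(").append(op);
            for (Expr a : args) sb.append(' ').append(a.smt(inline));
            return sb.append(')').toString();
        };
    }

    // Right-hand side of a = x + y, and the guard a + z > 10 written over the new symbol a_1.
    static final Expr SUM = app("+", atom("x_0"), atom("y_0"));
    static final Expr GUARD = app(">", app("+", atom("a_1"), atom("z_0")), atom("10"));

    /** Traditional form: the definition of a_1 is substituted into every use. */
    static String inlined() {
        Map<String, Expr> defs = new HashMap<>();
        defs.put("a_1", SUM);
        return "(assert " + GUARD.smt(defs) + ")";
    }

    /** Definitional form: a_1 is defined once and later conjuncts refer to it by name. */
    static String named() {
        Map<String, Expr> none = Collections.emptyMap();
        return "(define-fun a_1 () Int " + SUM.smt(none) + ")\n"
             + "(assert " + GUARD.smt(none) + ")";
    }

    public static void main(String[] args) {
        System.out.println(inlined());
        System.out.println(named());
    }
}
```

The definitional form emits one more command, which reflects the additional space mentioned above, but any further constraint that mentions a_1 stays short and visibly shares the sub-expression.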
A constraint solver does not admit destructive state updates; symbols that have been defined in the stack cannot be reassigned. For that reason, to enable the use of incremental solving it is necessary to use a functional program representation (e.g., an SSA-like program representation) whose variables can be assigned only once. This can be obtained explicitly in constructing the decision graph, by transforming the program into a functional representation, or implicitly, by renaming symbols on-the-fly during state-space exploration.
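The implicit variant, renaming symbols on the fly, can be sketched as a table of version counters. The Java code below is a minimal illustration and not the implementation of the approach: the class SsaRenamerSketch and the name_version naming scheme are our assumptions, operands are assumed to be variable names (not literals), and restoring the counters on backtracking is omitted.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of on-the-fly renaming into single-assignment symbols. */
class SsaRenamerSketch {
    private final Map<String, Integer> version = new HashMap<>();

    /** Symbol for the current definition of a program variable (version 0 is the input value). */
    String use(String var) {
        return var + "_" + version.getOrDefault(var, 0);
    }

    /** Allocates a fresh symbol for an assignment to var and returns it. */
    String define(String var) {
        int next = version.getOrDefault(var, 0) + 1;
        version.put(var, next);
        return var + "_" + next;
    }

    /** Translates "target = left op right" into a single-assignment equation. */
    String assign(String target, String left, String op, String right) {
        // Operands are renamed before the target so that b = a + b reads the old b.
        String l = use(left);
        String r = use(right);
        return define(target) + " = " + l + " " + op + " " + r;
    }

    /** Renames the first assignment of the add function and reports the symbol b now maps to. */
    static String demo() {
        SsaRenamerSketch r = new SsaRenamerSketch();
        return r.assign("b", "a", "+", "b") + "; " + r.use("b");
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Since no symbol is ever reassigned, each equation produced this way can be handed to the solver as a definition without conflicting with earlier ones on the same path.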
In our approach, each sub-expression that is reused triggers the definition of a new frame in the assertion stack. Symbolic execution restores state when backtracking by selectively dropping frames from the assertion stack.
Figure 7.2 shows side by side the SMT formulas produced with this optimization disabled (Stack) and enabled (SymbolicJ) for the add method presented in Fig. 7.1(a). In contrast to Stack, which generates fresh constraints at decision points, SymbolicJ reuses expressions. For example, in Fig. 7.2(b), SymbolicJ introduces variable b_1 in query 1 to refer to a + b, and uses it in queries 2 and 4. Later in this chapter we evaluate how much such a transformation can speed up stack-based constraint solving.
(assert (not (>= b_2 0))) (check-sat) ; unsat
(define-fun b_2 () Int b_1) (push) ; query 5
(assert (not (>= b_2 0))) (check-sat) ; unsat
(define-fun b_1 () Int b) (push) ; query 7
(assert (< b_1 0)) (check-sat) ; sat
(define-fun b_2 () Int (- 0 b_1)) (push) ; query 8
(assert (not (>= b_2 0))) (check-sat) ; unsat
(define-fun b_2 () Int b_1) (push) ; query 10
(assert (not (>= b_2 0))) (check-sat) ; unsat
(pop) (pop) (pop) (exit)
Fig. 7.2: The SMT-LIB scripts expressing the path conditions of the Java add method presented in Fig. 7.1(a). They are generated using stack-based approaches for the verification of the add method. Modern SMT solvers provide an assertion stack to incrementally solve problems that share similar sets of definitions and assertions. SMT-LIB provides the push and pop commands to manipulate this stack. Each stack frame stores an assertion set, which includes locally scoped functions and logical formulas. The command (check-sat) returns sat if the conjunction of all assertion sets in the stack is satisfiable, or unsat otherwise. The SMT comments, i.e., the text following the semicolon mark ";", indicate what happens during exploration.
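The frame discipline described in the caption can be modelled without a solver. The following Java sketch is a simplified model under our own assumptions: the class AssertionStackSketch and its methods are hypothetical, it only tracks which commands are in scope, and it does not decide satisfiability.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

/** Hypothetical model of an SMT-LIB assertion stack; it records commands, it does not solve. */
class AssertionStackSketch {
    private final Deque<List<String>> frames = new ArrayDeque<>();

    AssertionStackSketch() {
        frames.push(new ArrayList<String>()); // base frame, never popped
    }

    void push() { frames.push(new ArrayList<String>()); }

    void pop() {
        if (frames.size() == 1) throw new IllegalStateException("cannot pop the base frame");
        frames.pop(); // definitions and assertions of the top frame go out of scope
    }

    void add(String command) { frames.peek().add(command); }

    /** All commands currently in scope, from the oldest frame to the newest. */
    List<String> active() {
        List<String> all = new ArrayList<>();
        for (Iterator<List<String>> it = frames.descendingIterator(); it.hasNext();) {
            all.addAll(it.next());
        }
        return all;
    }

    /** Replays the two branches of the second decision of add on top of a shared prefix. */
    static List<String> demo() {
        AssertionStackSketch s = new AssertionStackSketch();
        s.add("(declare-const a Int)");
        s.add("(declare-const b Int)");
        s.push();
        s.add("(define-fun b_1 () Int (+ a b))");
        s.add("(assert (< a 0))");
        s.push();
        s.add("(assert (< b_1 0))");
        s.add("(define-fun b_2 () Int (- 0 b_1))");
        s.pop();                                  // backtrack: b_2 and (< b_1 0) are dropped
        s.push();
        s.add("(assert (>= b_1 0))");
        s.add("(define-fun b_2 () Int b_1)");     // b_2 may be defined again in the new frame
        return s.active();
    }

    public static void main(String[] args) {
        for (String c : demo()) System.out.println(c);
    }
}
```

After the pop, the definition of b_1 and the assertion on a remain in scope, so the second branch only adds its own frame instead of restating the shared prefix.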
7.2.3 Path Exploration
Our technique takes as input the root of the decision graph, an initial model (i.e., a concrete input vector), and an optional time budget for exploring the program state space. To simplify the presentation, we omit the time budget and define our path exploration as a recursive depth-first search (DFS) through the decision graph. It explores the edges of the decision graph according to the decisions taken by the current model. By construction, it builds up a satisfiable stack of assertions as it drives execution towards a feasible path. Consequently, it only explores a new path when execution hits a dead end. In that case, either the desired property holds for the explored path (in the verification mode), or a new test case is generated (in the testing mode). Then our approach backtracks to the last unvisited path in the decision graph.
Algorithm 4 shows the path exploration algorithm. We use the following functions in its definition.
• genTest produces a test case for the input model;
• pushContext and popContext are wrappers for the SMT-LIB commands push and pop, respectively;
• loadDefsAndAsserts augments the logical context of the solver with the definitions and assertions passed as argument;
• check-sat-and-get-model returns a feasible model if the set of assertions passed as argument is satisfiable; it returns null otherwise;
• hasSyntheticAsserts indicates whether the edge carries an assertion introduced by loop unrolling. It returns true if the argument edge is associated with the branch of the bound-hit iteration of the loop;
• eval checks whether the decision associated with a branch is satisfied by the concrete input;
• terminates signals that the overall symbolic execution terminates.
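The interplay of these functions in the testing mode can be sketched as follows. This Java code is a simplified illustration and not Algorithm 4 itself: the class TraverseSketch is hypothetical, a brute-force search over a tiny input domain stands in for check-sat-and-get-model, a plain stack of edges stands in for the solver context, and the time budget, hasSyntheticAsserts, and the verification mode are omitted.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

/** Hypothetical sketch of model-guided depth-first exploration (testing mode only). */
class TraverseSketch {
    static final class Node { final List<Edge> out = new ArrayList<>(); }

    /** An edge tests a condition on the state {a, b} and then applies its assignment. */
    static final class Edge {
        final Predicate<int[]> cond;
        final UnaryOperator<int[]> update;
        final Node target;
        Edge(Predicate<int[]> cond, UnaryOperator<int[]> update, Node target) {
            this.cond = cond;
            this.update = update;
            this.target = target;
        }
    }

    private final Deque<Edge> stack = new ArrayDeque<>(); // stands in for the assertion stack
    final List<int[]> tests = new ArrayList<>();          // inputs collected by genTest

    /** Replays the stacked edges on a concrete input; true if every condition holds (eval). */
    private boolean satisfies(int[] input) {
        int[] s = input.clone();
        for (Iterator<Edge> it = stack.descendingIterator(); it.hasNext();) {
            Edge e = it.next();
            if (!e.cond.test(s)) return false;
            s = e.update.apply(s);
        }
        return true;
    }

    /** Brute-force stand-in for check-sat-and-get-model over inputs in [-2, 2]. */
    private int[] checkSatAndGetModel() {
        for (int a = -2; a <= 2; a++) {
            for (int b = -2; b <= 2; b++) {
                int[] m = {a, b};
                if (satisfies(m)) return m;
            }
        }
        return null; // no input in the domain follows the stacked edges
    }

    void traverse(Node node, int[] model) {
        if (model == null) return;                              // infeasible prefix
        if (node.out.isEmpty()) { tests.add(model); return; }   // genTest at a leaf
        for (Edge e : node.out) {
            stack.push(e);                                      // pushContext + loadDefsAndAsserts
            // Keep the current model if it already takes this branch; otherwise query the solver.
            int[] next = satisfies(model) ? model : checkSatAndGetModel();
            traverse(e.target, next);
            stack.pop();                                        // popContext
        }
    }

    /** Explores the two decisions of the add function starting from the model a = 0, b = 0. */
    static List<int[]> runOnAdd() {
        Node n0 = new Node(), n1 = new Node(), exit = new Node();
        n0.out.add(new Edge(s -> s[0] < 0, s -> new int[] {s[0], s[0] + s[1]}, n1));
        n0.out.add(new Edge(s -> s[0] >= 0, UnaryOperator.identity(), n1));
        n1.out.add(new Edge(s -> s[1] < 0, s -> new int[] {s[0], -s[1]}, exit));
        n1.out.add(new Edge(s -> s[1] >= 0, UnaryOperator.identity(), exit));
        TraverseSketch t = new TraverseSketch();
        t.traverse(n0, new int[] {0, 0});
        return t.tests;
    }
}
```

In this sketch the solver stand-in is queried only when the current model does not already follow the edge being explored, which mirrors the idea that a new path is solved for only after the current one has been exhausted.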
7.3 Evaluation
This section presents the experiments we have conducted to evaluate various techniques for symbolic execution. We aim to understand the extent to which constraint solving can be optimized. Our hypothesis is that two factors are important in determining the efficiency of symbolic execution: (i) the use of incremental solving (since many path constraints from symbolic execution are similar), and (ii) the use of common sub-expression elimination (since clause sharing plays an important role in constraint solving).
7.3.1 Experimental Setup
We considered five techniques to evaluate the effectiveness of the cache-based and stack-based approaches to incremental solving, and to investigate the
Algorithm 4 Path Exploration Algorithm
1: function TRAVERSE(node, model)
2:   if model == null then
3:     return
4:   end if
5:   if node.hasNoChildren then