Potential Solutions - Automated Black Box Generation of Structured Inputs for Use in Software Testing

With the insights from Section 1.5 in hand, I argue that a solution to the structured black-box input generation problem must satisfy the following criteria:

• The solution must be based on encoding test inputs using logical constraints
• The logical constraint language must be expressive
• The solver for these constraints must be capable of generating many (100K+) inputs
• The solver for these constraints must be reasonably fast (100+ solutions per minute)

Ideally, I also want to reuse existing logics and constraint solvers as much as possible; the topic of this dissertation is black-box testing, not logic and constraint solver design. Fortunately, there are many existing logics and constraint solvers discussed in the literature. An assortment of these is discussed below, along with their relative merits and problems.

1.6.1 SMT Solvers

SMT solvers [45] are incredibly popular for a wide variety of constraint satisfaction problems. For example, the paper for the Z3 SMT solver [47] has been cited over 1,000 times [49], and Z3 is but one of many SMT solvers. Over time, the state of the art in SMT solvers has improved dramatically, thanks in part to regular competitions [50, 51]. This improvement has led to dramatic increases in both performance and scalability, allowing modern SMT solvers to handle problems with thousands of variables and perhaps millions of constraints [45]. Additionally, solvers like Z3 [47] can regularly handle constraints over undecidable logics, making SMT solvers incredibly expressive.

For these reasons, SMT solvers may seem like viable solutions to the generalized structured black-box test generation problem. After all, their underlying logic for test inputs is remarkably expressive, and the solvers are quite fast. However, there is a major downside to SMT solvers for our purposes: they tend to be designed with the assumption that only one solution is needed, whereas we need hundreds of thousands for test generation. One can naively generate multiple solutions by adding blocking clauses to the original query [45]; these clauses constrain the problem so that any new solution must differ from all previously generated solutions. One then queries the solver again with these blocking clauses, and repeats this process of adding blocking clauses and solving until some predetermined stopping point. However, blocking clauses are problematic for two reasons:

1. As described, we must completely restart solving for each new set of blocking clauses. Considering that most solver work is expected to be in the initial query, in contrast to the blocking clauses, this is wasteful.

2. This approach fundamentally does not scale to many solutions. If we want k solutions, we will need to add in k sets of blocking clauses, where the size of each set is bounded by the number of variables in the original problem [52]. With this in mind, the size of the blocking clauses will eventually exceed the size of the problem as k increases, and this may occur relatively early for variable-rich problems.

While the above problems can be at least somewhat overcome by modifying the solver itself (e.g., [53, 54]), such changes begin to blur the meaning of “SMT solver”. Additionally, these changes cause their own problems. For example, while Z3 [47] has limited support for generating multiple solutions, using these capabilities can severely limit what the solver can practically reason about [48]. From personal experience, Z3 often becomes orders of magnitude slower if even two solutions are requested. As such, SMT solvers are overall inappropriate for solving the generalized structured black-box test generation problem.
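The naive blocking-clause loop described above can be sketched as follows. This is a minimal illustration in Python, not code from any real SMT solver: `solve` is a hypothetical brute-force stand-in for the solver, and the clause encoding (DIMACS-style signed integers) is an assumption made for the example. It shows both problems at once: every iteration restarts the search from scratch, and the query grows by one clause per solution found.

```python
from itertools import product

def solve(clauses, num_vars):
    """Toy stand-in for a solver: return the first satisfying assignment, or None.

    A clause is a list of non-zero ints: literal v means variable v is True,
    and -v means variable v is False.
    """
    for bits in product([False, True], repeat=num_vars):
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            return bits
    return None

def enumerate_with_blocking(clauses, num_vars, k):
    """Collect up to k distinct solutions by re-solving with blocking clauses."""
    clauses = list(clauses)                # copy; the query grows as we go
    solutions = []
    while len(solutions) < k:
        model = solve(clauses, num_vars)   # full restart on every iteration
        if model is None:
            break                          # no further solutions exist
        solutions.append(model)
        # Blocking clause: at least one variable must differ from this model.
        clauses.append([-(i + 1) if value else (i + 1)
                        for i, value in enumerate(model)])
    return solutions, len(clauses)

# (x1 or x2) and (not x1 or x3), over three variables
original = [[1, 2], [-1, 3]]
solutions, final_size = enumerate_with_blocking(original, num_vars=3, k=10)
```

Here the original query has two clauses, yet the final query holds one extra clause for every solution produced, each mentioning all three variables. For a request of hundreds of thousands of solutions, the blocking clauses would dwarf the original problem.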

1.6.2 ALLSAT Solvers

Unlike SMT solvers, ALLSAT solvers are designed from the ground up to produce multiple solutions for a given query [52]. In this way, ALLSAT solvers are far superior to SMT solvers for solving the structured black-box test generation problem. However, ALLSAT solvers are nonetheless inappropriate for this problem.

The fundamental problem with ALLSAT solvers is that they are technologically primitive relative to SMT solvers. For one, the underlying logic of ALLSAT solvers is far more restrictive than that of SMT solvers. ALLSAT solvers are based on basic boolean satisfiability, whereas SMT solvers can handle more complex datatypes like mathematical integers and bitvectors [28]. This is likely not a fundamental limitation, as SMT solvers tend to be based around augmenting DPLL-based SAT solvers [45, 55, 56] with additional algorithms to handle more complex constraints [57]. However, either such algorithms have not been developed for the ALLSAT context, or they have yet to be practically implemented.

A second problem related to the relatively primitive nature of ALLSAT solvers is that they are limited to too few solutions for our purposes. There is no competition for improving the state of the art in ALLSAT solvers (like that for SMT solvers [50]), though prerequisite steps have been performed recently [52]. Looking at the results in Toda et al. [52], modern ALLSAT solvers currently seem restricted to producing only a few thousand solutions total, with a generation rate of less than 200 solutions per minute. While the generation rate is fast enough for our purposes, the limitation to a few thousand solutions makes modern ALLSAT solvers impractical for testing large industrial applications, where we expect to need hundreds of thousands to millions of inputs to sufficiently explore the code.

For these reasons, while ALLSAT solvers seem applicable to this problem in theory, they are still too primitive for my purposes. Until the state of the art in ALLSAT solvers improves, ALLSAT solvers are not a viable solution to the generalized structured black-box test case generation problem.

1.6.3 Answer Set Programming

As the name implies, answer set programming [58, 59] is geared towards generating multiple solutions for the same constraints. In this way, answer set programming is similar to ALLSAT solvers. However, the constraint language for answer set programming is far more expressive than that of ALLSAT solvers, and appears Prolog-like [60, 61] in construction. This makes answer set programming intuitively more attractive than ALLSAT solvers for the generalized black-box structured test input generation problem.

Unfortunately, much like with ALLSAT solvers, the state of the art in answer set programming is still primitive relative to SMT solvers. While there is an established competition for improving answer set programming [62], it is younger than the equivalent competition for SMT solvers [50]. Looking closer at the answer set programming competition, it is currently biased towards incredibly difficult problems with fewer than 30 solutions [62]. Considering that test generation problems are usually simpler but require hundreds of thousands of inputs, we can extrapolate that the current state of the art in answer set programming may not be appropriate.

We do not need to extrapolate far, however. Clasp [63, 64], a popular answer set programming implementation, also happens to have ALLSAT solver capabilities. Moreover, Clasp was evaluated in Toda et al. [52] alongside other ALLSAT solver implementations. All these implementations were deemed inappropriate for structured test case generation in Section 1.6.2. As such, assuming the answer set programming capabilities of Clasp behave similarly to its ALLSAT solver capabilities, we can conclude that Clasp is overall inappropriate for this problem. Given that Clasp is a popular answer set programming implementation, it seems safe to assume that Clasp is representative of answer set programming as a whole, so we can conclude that the state of the art in answer set programming is still too primitive for structured black-box test input generation.

1.6.4 Constraint Logic Programming

Much like SMT solvers, constraint logic programming (CLP) [40, 41] offers the user a highly expressive constraint language. However, unlike SMT solvers, CLP is well-suited to generating many solutions. Additionally, there is no fundamental limit to the number of solutions which can be generated, in stark contrast to the apparent limits seen with ALLSAT solvers (Section 1.6.2) and answer set programming (Section 1.6.3). CLP has been around longer than some of its competitors, and multiple high-performance implementations already exist (e.g., SICStus Prolog [65], GNU Prolog [66], SWI-Prolog [67], and ECLiPSe [68]). For these reasons, CLP is the only solution discussed which satisfies all of the criteria required of a solution.
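To give a rough feel for what on-demand generation of many solutions looks like, the following is a minimal sketch using Python generators as a loose analogy for backtracking search. It is not CLP code, and the chosen input structure (balanced-parenthesis strings) and the function names are assumptions made purely for illustration. Each nested loop plays the role of a choice point, and the consumer decides how many solutions to pull from the stream.

```python
from itertools import count, islice

def balanced(n):
    """Yield every balanced-parenthesis string with exactly n pairs."""
    if n == 0:
        yield ""
        return
    for inner in range(n):                 # choice point: size of the first group
        for left in balanced(inner):
            for right in balanced(n - 1 - inner):
                yield "(" + left + ")" + right

def all_balanced():
    """Unbounded stream of inputs, smallest first (iterative deepening on size)."""
    for n in count():
        yield from balanced(n)

# Pull only as many inputs as the test campaign needs.
first_ten = list(islice(all_balanced(), 10))
```

Nothing in the generator fixes an upper bound on the number of inputs; asking for more is a matter of changing the argument to `islice`, and no previously found solution has to be recorded in the query to obtain the next one.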

The one caveat of CLP is that the constraint language is so rich that it is Turing-complete. Because this can easily lead to situations where constraint satisfaction cannot proceed, CLP gives the programmer fine-grained control over how constraint search is performed. Specifically, CLP features a straightforward operational semantics which is simple enough to be described with a single inference rule (namely SLD-resolution [69]). This operational semantics makes the search behavior of CLP constraints not only predictable, but exploitable. CLP users can effectively “inject” domain knowledge into the code through optimizations. Such optimizations bear a striking resemblance to the approach of defining domain-specific constraint satisfaction algorithms (seen in Toda et al. [52]). However, these CLP optimizations show one major difference: these optimizations are in CLP code itself, as opposed to escaping CLP entirely and defining novel search strategies. This speaks to the applicability of CLP to a large variety of domains, as we do not need to define whole new search algorithms for each new domain.
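One common form of such an optimization is to check a constraint as early as the search order allows, rather than after a complete candidate has been built. The sketch below illustrates the idea in Python as an analogy only; it is not CLP code and not taken from the dissertation, and the example constraint (strictly increasing tuples) and the function names are assumptions. Both functions produce the same solutions, but the second cuts off failing branches as soon as the constraint is violated.

```python
from itertools import product

def naive(domain, length):
    """Generate-and-test: build every tuple, then check the ordering constraint."""
    explored = 0
    results = []
    for candidate in product(domain, repeat=length):
        explored += 1
        if all(a < b for a, b in zip(candidate, candidate[1:])):
            results.append(candidate)
    return results, explored

def pruned(domain, length):
    """Same constraint, checked as each element is chosen."""
    explored = 0
    results = []

    def extend(prefix):
        nonlocal explored
        if len(prefix) == length:
            results.append(tuple(prefix))
            return
        for value in domain:
            explored += 1
            if prefix and value <= prefix[-1]:
                continue                   # constraint fails: abandon this branch now
            extend(prefix + [value])

    extend([])
    return results, explored

domain = range(6)
naive_results, naive_explored = naive(domain, 4)
pruned_results, pruned_explored = pruned(domain, 4)
```

With this small domain, the naive version examines all 1296 complete tuples, while the pruned version makes 252 choices to find the same 15 solutions. The knowledge that ordering can be checked on prefixes is expressed in the same language as the rest of the search, without defining a new search algorithm.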

This fine-grained control over search in CLP is both a blessing and a curse. This control allows CLP to be extended to wildly disparate testing domains and to problems which run the entire gamut in complexity. This generality comes at the cost of occasionally being difficult to use; the user may need to inject domain knowledge if search becomes slow, as opposed to simply picking a smarter implementation with a better search strategy as one might do with an SMT solver. That said, a smart enough engine might not exist (e.g., one may already be using the best SMT solver available), in which case one would almost assuredly be completely stuck without CLP.
