Overarching Thesis - Automated Black Box Generation of Structured Inputs for Use in Software Testing

The insights in Section 1.5, along with the discussion of potential solutions in Section 1.6, lead me to my formal thesis statement:

Thesis Statement: We can represent structured test inputs as solutions to systems of logical constraints. In order to encode many kinds of test inputs and test a wide variety of systems, we need an expressive logic combined with a high-performance constraint solver capable of finding many solutions. I observe that CLP meets these criteria, and so CLP can ultimately solve the generalized structured black-box input generation problem.

To be clear, solutions other than CLP are possible, though CLP is the only solution I am aware of which meets all the solution criteria defined in Section 1.6.
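To make the idea of "test inputs as solutions to constraints" concrete, the following is a minimal sketch written in Python rather than in a CLP language. It imitates, by hand, the backtracking search that a CLP engine provides natively. The structured input here (a strictly increasing list of integers) and the names `sorted_lists`, `length`, and `domain` are hypothetical illustrations and are not drawn from the case studies.

```python
from typing import Iterator, List


def sorted_lists(length: int, domain: range) -> Iterator[List[int]]:
    """Enumerate every strictly increasing list of `length` values from `domain`.

    Each yielded list is one solution to the constraint system
    "xs[i] < xs[i + 1] for all i". This is a hand-written backtracking
    search, used only to illustrate the idea; it is not a CLP engine.
    """

    def extend(prefix: List[int]) -> Iterator[List[int]]:
        if len(prefix) == length:
            yield list(prefix)  # A complete solution: emit a copy.
            return
        for value in domain:
            # Check the ordering constraint as early as possible, so that
            # partial lists which cannot become solutions are pruned.
            if prefix and value <= prefix[-1]:
                continue
            prefix.append(value)
            yield from extend(prefix)
            prefix.pop()  # Backtrack and try the next candidate value.

    yield from extend([])
```

For example, `list(sorted_lists(2, range(3)))` produces `[[0, 1], [0, 2], [1, 2]]`. Every solution is a valid test input, and asking for more solutions simply resumes the search, which is the behavior the thesis statement relies on when it calls for a solver "capable of finding many solutions".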

In order to demonstrate that CLP is a valid solution to the generalized structured black-box input generation problem, I must demonstrate that:

• The logic of CLP is expressive enough to encode a wide variety of test generation problems, including highly complex ones

• Some arbitrary CLP engine is fast enough to produce a sufficient number of tests within a reasonable timeframe

To this end, I present a series of case studies wherein CLP was applied to test case generation in some particular domain. I have intentionally chosen a wide variety of domains, including those where test inputs have significantly complex structure. I submit this as evidence of the generality and expressibility of CLP for test case generation.

Moreover, where applicable, CLP was able to find real bugs in popular, industry-grade software. Additionally, at multiple points CLP was compared to direct competitors in specific domains. These comparisons revealed that CLP consistently generates test inputs orders of magnitude more quickly than the competition. I submit this as evidence that modern CLP engines are fast enough to solve the generalized structured black-box input generation problem, and overall have little difficulty producing the large numbers of inputs required for effective testing.

1.7.1 Organization of This Document

The rest of this document goes through each of the aforementioned case studies in detail. A brief description of each one of these case studies follows:

1. Chapter 2 shows how CLP can be applied to fuzzing dynamic languages, in particular JavaScript. This chapter also demonstrates that CLP is a strict generalization of the stochastic grammar approach described in Section 1.3.1. This chapter is based on work published in ASE’14 (Citation: [70]; DOI: 10.1145/2642937.2642963; © 2014 ACM).

2. Chapter 3 shows how CLP can be applied to generating highly constrained data structures, well beyond anything ever previously attempted. Not only can CLP be applied to this problem, it was orders of magnitude faster than its competitors, namely Korat [34] and UDITA [35]. This chapter is based on work published in ICSE’15 (Citation: [36]; DOI: 10.1109/ICSE.2015.26; © 2015 IEEE).

3. Chapter 4 shows how CLP can be applied to fuzzing the typechecker of the Rust programming language [71], which features a complex type system. This was the first work which ever attempted to efficiently generate tests for a type system of this level of complexity. During this process, 14 developer-confirmed bugs were found. This chapter is based on work published in ASE’15 (Citation: [72]; DOI: 10.1109/ASE.2015.65; © 2015 IEEE).

4. Chapter 5 shows how CLP can be applied to fuzzing SMT solvers, via the generation of guaranteed satisfiable and unsatisfiable formulas. This process found 23 bugs, of which 22 have been fixed at this point. This chapter is based on work submitted (though not accepted) to ICSE’17.

5. Chapter 6 shows how CLP can be applied to fuzzing tokenizers and parsers, specifically in the context of finding bugs in student solutions to an educational problem. This process revealed CLP to be strictly better at finding bugs than a traditional manually-generated test suite, and exposed deficiencies in the student-provided assignment specification. This chapter is based on work published in ITiCSE’17 (Citation: [73]; DOI: 10.1145/3059009.3059051; © 2017 ACM).

6. Chapter 7 shows how to apply CLP to generating well-typed programs in a non-trivial language, specifically in an educational context. This chapter is written somewhat in a tutorial style, and offers lots of code samples for a complex generation problem. This chapter is unpublished.

The last two chapters (specifically Chapters 8 and 9) discuss certain limitations of CLP-based test generation, along with how these limitations are overcome. While this work has not been published, it has assisted me during multiple testing projects over the years. Although these latter chapters do not directly defend my thesis statement, they are so closely connected to the case studies described that I felt it necessary to include them. They also highlight technical and engineering details which are relevant to using CLP for test case generation in practice.

The rest of this thesis assumes the reader has at least a passing familiarity with CLP; readers who are not familiar with CLP should consult Appendix A. Mathematical formalisms are presented in multiple areas of this thesis; details behind the notation used are available in Appendix B.

Pronouns Used in This Document

Both “I” and “we” are used in this document. Uses of “I” refer to thoughts and work which I consider entirely my own. However, much of this work was produced through collaboration and discussion with others, particularly with the people mentioned in the acknowledgements section. While I am the main author of all the work in this document, I do not claim total ownership over such collaborative portions, and so I use “we” for such cases. Occasionally I use a royal “we” as well, referring to things in a more general sense.

Case Study: Generating Interesting JavaScript Programs

2.1 Introduction

This chapter features a case study wherein CLP is applied to the generation of JavaScript programs with predictable runtime behaviors. The purpose of this case study is to demonstrate that CLP can be used not only to implement stochastic grammars (discussed in Chapter 1, Section 1.3.1), but also to implement program generators strictly more expressive than is possible with stochastic grammars alone. The sort of CLP-based generators defined in this chapter can perform simple but effective semantic reasoning about JavaScript programs, which is well beyond anything possible with a syntax-oriented approach like that of stochastic grammars.

With regard to the thesis, this chapter demonstrates that CLP is expressive, fast, and capable of generating many programs. Specifically, CLP is significantly more expressive than the otherwise popular stochastic grammars, to the point where CLP can accurately reason about the semantic properties of generated programs. Despite the added expressibility, CLP is nonetheless able to generate thousands to hundreds of thousands of programs per second.
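The contrast between a syntax-oriented generator and one that reasons about semantics can be sketched on a toy arithmetic language. The sketch below is in Python, not CLP, and it targets a three-operator expression language rather than JavaScript. All names (`sample`, `safe_exprs`, `evaluate`) and the 0.4 stopping probability are assumptions made purely for illustration. `sample` plays the role of a stochastic grammar: it chooses productions at random and knows nothing about runtime behavior. `safe_exprs` instead tracks each expression's value while building it, and uses that value to rule out division by zero.

```python
import random
from typing import Iterator, Tuple, Union

# An expression is an integer literal or a tuple (operator, left, right).
Expr = Union[int, Tuple[str, "Expr", "Expr"]]


def sample(rng: random.Random, depth: int) -> Expr:
    """Stochastic-grammar style: random production choice, syntax only."""
    if depth == 0 or rng.random() < 0.4:
        return rng.randint(0, 2)
    op = rng.choice(["+", "-", "/"])
    return (op, sample(rng, depth - 1), sample(rng, depth - 1))


def safe_exprs(depth: int) -> Iterator[Tuple[Expr, int]]:
    """Enumerate (expression, value) pairs that never divide by zero.

    The value is computed alongside the expression, so the semantic
    constraint "divisor is nonzero" can be checked during generation.
    """
    for literal in range(3):
        yield literal, literal
    if depth == 0:
        return
    for left, left_value in safe_exprs(depth - 1):
        for right, right_value in safe_exprs(depth - 1):
            yield ("+", left, right), left_value + right_value
            yield ("-", left, right), left_value - right_value
            if right_value != 0:  # Semantic constraint, not a syntactic one.
                yield ("/", left, right), left_value // right_value


def evaluate(expr: Expr) -> int:
    """Reference evaluator; "/" is floor division and fails on a zero divisor."""
    if isinstance(expr, int):
        return expr
    op, left, right = expr
    left_value, right_value = evaluate(left), evaluate(right)
    if op == "+":
        return left_value + right_value
    if op == "-":
        return left_value - right_value
    return left_value // right_value
```

Every expression produced by `safe_exprs` evaluates without error and with a value known ahead of time, whereas nothing prevents `sample` from producing an expression such as `("/", 1, 0)`. In a CLP setting the same effect would be obtained by stating the constraint declaratively and letting the engine perform the search, rather than by threading values through a generator by hand.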
