Conclusion - Automated Black Box Generation of Structured Inputs for Use in Software Testing

This chapter has explored a very complex generator in depth, and it has shown how such a complex generator can arise incrementally from a simpler generator. While this process is certainly involved and difficult at times, the end result is a generator which is orders of magnitude faster than competitors, with room remaining for optimization. Although no industrial-grade software was tested with this generator, it is by far the most sophisticated generator I have ever worked on. This holds both at a theoretical level (generating parametrically polymorphic, generic programs is challenging even in theory) and at a pragmatic level (there are a multitude of coding challenges, and the code transformation described in the latter portion of Section 7.4.2 is arguably a novel Prolog design pattern). Thanks to this generator, a test suite consisting of over 100K programs was generated for a complex CS162 assignment within a matter of hours. The programs in the test suite are sizable and perform a wide variety of complex operations, so they should give student solutions a good stress test.

Overall, this serves as an excellent case study in support of my thesis. These test cases could not have been encoded at all were CLP not so expressive. Additionally, with significant optimization, CLP was able to generate programs fast enough to form an effective test suite within a reasonable amount of time. Such optimizations were only possible because CLP gives the programmer fine-grained control over how constraint search is performed, allowing me to take advantage of domain-specific properties like whether or not a type is inhabited. Among the constraint solvers discussed in Chapter 1, Section 1.6, this capability to control search is unique to CLP. Other constraint solvers would simply be too slow for the problem, if the problem could even be encoded at all. All of this serves as evidence that CLP is an excellent solution to the generalized structured black-box test input generation problem.

Improving CLP for Testing: Typed-Prolog

8.1 Introduction and Motivation

In this chapter, a type system and related language for CLP are discussed. This chapter does not contribute directly to my thesis' argument; that is, it does not speak to the generality, expressiveness, or performance of CLP when applied to structured black-box test case generation. Even so, I have used this language with great success to implement a wide variety of test case generators. With this in mind, this chapter is not about strengthening the thesis. Rather, it discusses closely related technologies and insights that help put the contributions of this thesis into practice.

While CLP has proven itself to be immensely useful for automated test case generation, it is still imperfect for this task. In this chapter, I discuss and address three particular problems of concern:

1. CLP lacks a static type system, so bugs can easily slip into CLP code

2. CLP has no proper counterpart to the higher-order functions of functional languages, and overall abstracts over computation poorly

3. CLP lacks a portable, sane module system

To elaborate further on the first problem, CLP is not a statically-typed language, nor can it even properly be qualified as a dynamically-typed language. For this reason, bugs can very easily slip into CLP-based test case generators, and these bugs are particularly challenging to debug in the CLP setting. Moreover, given the number of tests generated, it may take a significant amount of time and resources before it becomes apparent that a bug is present at all. This is because bugs often manifest as a failure to produce a particular program of interest. Such a failure is difficult to check for, especially given the often vague definition of “interest”. This problem is elaborated on in Section 8.3.1.

As for the second problem, it is a hindrance that CLP lacks proper support for abstracting over computation, as with higher-order functions. This leads to repetitive code, and it makes it difficult to port ideas from functional languages into CLP. This is especially frustrating because functional languages and CLP share an idiom that stresses a pure core (i.e., code without side effects like mutable state). Moreover, the idiomatic CLP “solution” to abstracting over computation entails the dynamic construction of programs. It is fundamentally no different from performing frequent calls to an eval-like routine (as seen in Lisp, JavaScript and Python). This entails all the usual problems of eval, including debugging difficulties and severe performance penalties. This problem is further discussed in Section 8.3.2.
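
To make the eval comparison concrete, consider how standard Prolog passes a relation as an argument. The relation is just a term, such as `add(10)`, and `call/3` builds the goal `add(10, X, Y)` from it at runtime before resolving that goal against the program's clauses. The following Python sketch models only that construct-then-look-up step. The `DATABASE` table, `call_n`, and the stand-in predicates are hypothetical. Each stand-in is also simplified to a deterministic function of its inputs rather than a real relation.

```python
from typing import Callable

# Hypothetical toy database mapping a predicate indicator (name, arity) to a
# Python stand-in for that predicate. Each stand-in is deterministic and
# returns the value of the predicate's last (output) argument.
DATABASE: dict[tuple[str, int], Callable[..., object]] = {
    ("add", 3): lambda n, x: x + n,   # add(N, X, Y) :- Y is X + N.
    ("double", 2): lambda x: x * 2,   # double(X, Y) :- Y is X * 2.
}

def call_n(closure: tuple, *extra):
    """Model of call/N: append extra arguments to the closure term, then
    resolve the resulting goal by name and arity only at runtime."""
    name, *bound = closure
    goal_args = [*bound, *extra]
    arity = len(goal_args) + 1  # +1 for the implicit output argument
    pred = DATABASE.get((name, arity))
    if pred is None:
        # A misspelled relation name is only discovered here, at runtime.
        raise LookupError(f"unknown procedure {name}/{arity}")
    return pred(*goal_args)

def maplist(closure: tuple, xs: list) -> list:
    """Apply a closure to each element, in the style of maplist/3."""
    return [call_n(closure, x) for x in xs]

print(maplist(("add", 10), [1, 2, 3]))
```

Here `maplist(("add", 10), [1, 2, 3])` returns `[11, 12, 13]`. A typo such as `maplist(("dubble",), [1])` is rejected only when the lookup actually runs, which is analogous to the existence error a standard engine raises for an unknown procedure. No check happens before execution, and every call pays for a dynamic lookup.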

With the third problem, it is far more difficult than it should be to split up CLP code into multiple files with distinct namespaces. There is no standardized CLP module system [177], so different implementations solve this problem in different, sometimes incompatible ways. Indeed, even relatively popular implementations like GNU Prolog [66] completely lack any sort of module system [177], making it difficult to port the same CLP code to different engines. More discussion of this problem follows in Section 8.3.3.

In order to solve these problems, I propose a new language which exists as a thin wrapper on CLP: Typed-Prolog. Typed-Prolog features a type system based on Hindley-Milner [178, 179], solving the first problem via static types and static typechecking. A key type in Typed-Prolog is that of higher-order relations, which occupy a similar place to higher-order functions and have a similar look and feel. Typed-Prolog also features a simple but effective module system which can handle the import and export of code and data between files. Arguably the best part about Typed-Prolog is that it compiles down to standard CLP without the use of any eval-like built-in procedures, even if higher-order relations are used. This ensures maximum portability across different engines, without any significant performance costs.
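
This section does not describe how higher-order relations are compiled. One standard technique with the property claimed above is defunctionalization (due to Reynolds). Each anonymous relation becomes a plain data term recording its captured variables, and every higher-order call goes through a generated first-order dispatch relation. The Python sketch below emits such Prolog for a `map` relation and two closures. It illustrates the general approach as an assumption and is not a description of Typed-Prolog's actual compiler. The names `Lambda`, `call_lambda2`, and `compile_program` are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Lambda:
    """A hypothetical anonymous relation appearing in source code."""
    name: str        # fresh constructor name chosen by the compiler
    captured: tuple  # variables captured from the enclosing clause
    params: tuple    # formal parameters of the relation
    body: str        # body goal, as Prolog source text

def closure_term(lam: Lambda) -> str:
    """The first-order term that replaces the lambda at its use site."""
    if lam.captured:
        return f"{lam.name}({', '.join(lam.captured)})"
    return lam.name

def dispatch_clause(lam: Lambda) -> str:
    """One clause of the dispatch relation call_lambdaN for this lambda."""
    n = len(lam.params)
    head_args = ", ".join([closure_term(lam), *lam.params])
    return f"call_lambda{n}({head_args}) :- {lam.body}."

# A higher-order map relation, already rewritten to call the first-order
# dispatch relation instead of call/3.
MAP_CLAUSES = [
    "map([], _, []).",
    "map([X|Xs], F, [Y|Ys]) :- call_lambda2(F, X, Y), map(Xs, F, Ys).",
]

def compile_program(lambdas: list) -> list:
    """Emit plain Prolog: the map relation plus one dispatch clause per lambda."""
    return MAP_CLAUSES + [dispatch_clause(lam) for lam in lambdas]

add_n = Lambda("lambda_add", ("N",), ("X", "Y"), "Y is X + N")
double = Lambda("lambda_double", (), ("X", "Y"), "Y is X * 2")

for clause in compile_program([add_n, double]):
    print(clause)
```

When loaded into a Prolog engine, the emitted clauses answer `map([1, 2, 3], lambda_add(10), L)` with `L = [11, 12, 13]` without any use of `call/N`. This works because `call_lambda2/3` is an ordinary predicate whose clauses are fixed at compile time, so the engine can index on the closure term like any other first argument. The trade-off is that the compiler must know every lambda of a given arity when it generates the dispatch relation.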

Overall, this chapter makes the following major contributions:

• A thorough discussion of the three aforementioned problems, in the context of automated test case generation (in Section 8.3)

• A formalization of the type system employed to solve the typing problem (in Section 8.4)

• A discussion of how higher-order relations are handled in Typed-Prolog (in Section 8.5)

• A discussion of how the module system works in Typed-Prolog (in Section 8.6)

• A discussion of how Typed-Prolog has been applied to test case generation, and

8.2 Related Work

Previous chapters lacked their own related work sections, because the related work for the overall thesis (Chapter 1, Section 1.3) made any chapter-specific related work redundant. This chapter, however, is about a better CLP rather than about using CLP for testing, so its related work is radically different from that of prior chapters. As such, this chapter features its own separate related work section.

Mycroft et al. [180] first studied the application of the polymorphic Hindley-Milner type system [179, 178] to Prolog. This system was powerful enough to encode polymorphic lists and operate over them in a way similar to that of Typed-Prolog. Mycroft et al. also discuss “higher-order objects” which bear close resemblance to Typed-Prolog’s higher-order relations. However, these are only discussed at a high level, and are not included in the main type system or its implementation.
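
The following Python sketch illustrates what a Hindley-Milner-style checker buys for Prolog code. It type-checks a conjunction of goals against polymorphic predicate signatures using standard first-order unification with an occurs check. It is not the formalization of Mycroft et al. or of this chapter. The signature table, the toy term encoding (Python ints as integers, strings as Prolog variables, Python lists as lists), and all function names are hypothetical.

```python
from dataclasses import dataclass
from itertools import count

@dataclass(frozen=True)
class TVar:          # a type variable, e.g. the A in list(A)
    name: str

@dataclass(frozen=True)
class TCon:          # a type constructor applied to argument types, e.g. list(int)
    name: str
    args: tuple = ()

class PrologTypeError(Exception):
    """Raised when goals cannot be assigned consistent types."""

INT = TCon("int")
_counter = count()

def list_of(elem):
    return TCon("list", (elem,))

def fresh():
    return TVar(f"_T{next(_counter)}")

def resolve(t, subst):
    # Follow variable bindings until reaching an unbound variable or a constructor.
    while isinstance(t, TVar) and t in subst:
        t = subst[t]
    return t

def occurs(v, t, subst):
    # Occurs check: binding v to a type containing v would create an infinite type.
    t = resolve(t, subst)
    if t == v:
        return True
    return isinstance(t, TCon) and any(occurs(v, a, subst) for a in t.args)

def unify(a, b, subst):
    # Unify two types, extending subst in place or raising PrologTypeError.
    a, b = resolve(a, subst), resolve(b, subst)
    if a == b:
        return
    if isinstance(a, TVar):
        if occurs(a, b, subst):
            raise PrologTypeError(f"infinite type: {a.name} occurs in {b}")
        subst[a] = b
    elif isinstance(b, TVar):
        unify(b, a, subst)
    elif a.name != b.name or len(a.args) != len(b.args):
        raise PrologTypeError(f"cannot unify {a} with {b}")
    else:
        for x, y in zip(a.args, b.args):
            unify(x, y, subst)

# Hypothetical polymorphic signature table: length(list(A), int).
A = TVar("A")
SIGNATURES = {"length": (list_of(A), INT)}

def instantiate(sig):
    # Give each use of a polymorphic signature its own fresh type variables.
    mapping = {}
    def inst(t):
        if isinstance(t, TVar):
            if t not in mapping:
                mapping[t] = fresh()
            return mapping[t]
        return TCon(t.name, tuple(inst(arg) for arg in t.args))
    return tuple(inst(t) for t in sig)

def type_of(term, env, subst):
    # Toy term encoding: ints are integers, strings are Prolog variables, lists are lists.
    if isinstance(term, int):
        return INT
    if isinstance(term, str):
        if term not in env:
            env[term] = fresh()
        return env[term]
    if isinstance(term, list):
        elem = fresh()
        for item in term:
            unify(elem, type_of(item, env, subst), subst)
        return list_of(elem)
    raise ValueError(f"unsupported term: {term!r}")

def check_body(goals):
    # Type-check a conjunction of goals; variable types are shared across goals.
    env, subst = {}, {}
    for name, args in goals:
        expected = instantiate(SIGNATURES[name])
        if len(expected) != len(args):
            raise PrologTypeError(f"no signature for {name}/{len(args)}")
        for arg, t in zip(args, expected):
            unify(type_of(arg, env, subst), t, subst)
    return env, subst
```

Two calls show the behavior:

- `check_body([("length", ([1], "N")), ("length", ([[1]], "M"))])` succeeds, because each use of `length` receives fresh type variables. This is the polymorphism that lets one list relation serve lists of integers and lists of lists alike.
- `check_body([("length", ("L", "N")), ("length", ("N", "M"))])` raises `PrologTypeError`, since `N` cannot be both an `int` and a list. The inconsistency is caught before any goal runs.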

O’Keefe mentions that a variety of type systems are available for CLP [174], though none of these are available in any modern engine I am aware of (the source is from 1990). A search for Prolog type systems from this period revealed a number of unsound type systems. For example, Lakshman et al. [181] discuss a type system which is supposedly based on Mycroft et al. [180], though upon closer examination the type system is unsound. It purports to handle full metaprogramming features, including assert and retract, in a static manner, which is fundamentally impossible without additional restrictions. It is questionable what utility fundamentally unsound type systems have, given that they cannot reliably be used to catch bugs in CLP code ahead of time.

While work on adding type safety to CLP itself seems to have waned, other logic languages do have sound type systems. Both Mercury [182] and Curry [183] are based on the Mycroft-O’Keefe [180] system, and they augment it with typeclasses [184, 107] and capabilities equivalent to the higher-order relations in Typed-Prolog. Mercury additionally features advanced mode analysis which can be used to automatically avoid the inefficient “generate-and-filter” style [185]. While these features make Mercury and Curry look appealing for test case generation, they are not well-suited to this task. Mercury is prohibitively restrictive in what it allows you to write, and it forces the programmer to bear a significant type annotation burden. I briefly used Mercury for test case generation purposes. I stopped once it became apparent that a relatively simple generation task could not be encoded in Mercury without repeatedly escaping from Mercury to another language. Curry has no standard implementation, and none of the existing implementations are built with performance in mind, in stark contrast to modern CLP engines. Additionally, the semantics behind Curry are relatively complex compared to other logic languages [186], further hindering adaptability.
