Information-Flow Control - Random Testing For Language Design

4.5 Evaluation

4.5.3 Information-Flow Control

For a second large case study, we turned to the information-flow control case study of Chapter 3, re-implementing methods for generating indistinguishable machine states. Given an abstract stack machine with data and instruction memories, a stack, and a program counter, one attaches labels (security levels) to runtime values, propagating them during execution and restricting potential flows of information from high (secret) to low (public) data. The desired security property, termination-insensitive noninterference, states that if we start with two indistinguishable abstract machines s1 and s2 (i.e., all their low-tagged parts are identical) and run each of them to completion, then the resulting states s1’ and s2’ are also indistinguishable.
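To make the notion of indistinguishability concrete, here is a minimal Python sketch. It is not the definition used in Chapter 3: the two-point label set, the `Atom` type, and the function names are all assumptions introduced for illustration, and a machine state is reduced to a single list of labeled values.

```python
from dataclasses import dataclass

LOW, HIGH = "L", "H"  # hypothetical two-point label lattice


@dataclass(frozen=True)
class Atom:
    """A runtime value tagged with a security label (illustrative only)."""
    value: int
    label: str


def indist_atom(a: Atom, b: Atom) -> bool:
    """Atoms agree if both are high, or both are low with equal payloads."""
    if a.label != b.label:
        return False
    return a.label == HIGH or a.value == b.value


def indist_list(xs, ys) -> bool:
    """Pointwise indistinguishability of two memories or stacks."""
    return len(xs) == len(ys) and all(indist_atom(x, y) for x, y in zip(xs, ys))


s1 = [Atom(1, LOW), Atom(42, HIGH)]
s2 = [Atom(1, LOW), Atom(7, HIGH)]   # differs from s1 only in a secret value
s3 = [Atom(2, LOW), Atom(42, HIGH)]  # differs from s1 in a public value
```

Under these assumptions, `s1` and `s2` are indistinguishable while `s1` and `s3` are not; noninterference asks that this relation be preserved by execution.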

In “Testing Noninterference, Quickly” [65], we found that efficient testing of this property could be achieved in two ways: either by generating instruction memories that allow for long executions and checking for indistinguishability at each low step (called LLNI, low-lockstep noninterference), or by looking for counterexamples to a stronger invariant (strong enough to prove noninterference), generating two arbitrary indistinguishable states and then running for a single step (SSNI, single-step noninterference). In both cases, there is some effort involved in generating indistinguishable machines: for efficiency, one must first generate one abstract machine s and then vary s to generate an indistinguishable one s’. In writing such a generator for variations, one must effectively reverse the indistinguishability predicate between states and then keep the two artifacts in sync.
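The generate-then-vary scheme can be sketched in Python as follows. This is a toy under stated assumptions, not the handwritten Haskell generator: states are flat lists of labeled integers, and `gen_state`, `vary`, and `low_view` are hypothetical names. The point to notice is that `vary` has to mirror `low_view` by hand, which is exactly the synchronization burden described above.

```python
import random
from dataclasses import dataclass

LOW, HIGH = "L", "H"  # hypothetical two-point label lattice


@dataclass(frozen=True)
class Atom:
    value: int
    label: str


def gen_state(rng, size=5):
    """Generate the first machine state: a list of randomly labeled atoms."""
    return [Atom(rng.randint(0, 9), rng.choice([LOW, HIGH])) for _ in range(size)]


def vary(rng, state):
    """Re-roll only the high-labeled payloads; low atoms are copied verbatim."""
    return [Atom(rng.randint(0, 9), HIGH) if a.label == HIGH else a for a in state]


def low_view(state):
    """What a public observer sees: high payloads are masked out."""
    return [(a.label, a.value if a.label == LOW else None) for a in state]


rng = random.Random(0)
s = gen_state(rng)
s_prime = vary(rng, s)  # indistinguishable from s by construction
```

If the observer model in `low_view` changes (say, labels themselves become secret), `vary` must be updated to match, or the generator silently produces distinguishable pairs.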

We first investigated the stronger property (SSNI) by encoding the indistinguishability predicate in Luck and using our prototype to generate small, indistinguishable pairs of states. In 216 lines of code we were able to describe both the predicate and the generator for indistinguishable machines. The same functionality required >1000 lines of complex Haskell code in the handwritten version. The handwritten generator is reported to generate an average of 18400 tests per second, while the Luck prototype generates an average of 1450 tests per second, around 12.5 times slower.

In Chapter 3, to generate long sequences of instructions we used generation by execution: starting from a machine state where data memories and stacks are instantiated, we generate the current instruction, ensuring that it does not cause the machine to crash, then allow the machine to take a step and repeat. While intuitively simple, this extra piece of generator functionality took significant effort to code, debug, and optimize for effectiveness, resulting in more than 100 additional lines of code. The same effect was achieved in Luck by the following 6 intuitive lines, where we just put the previous explanation in code:

sig runsLong :: Int -> AS -> Bool
fun runsLong len st =
  if len <= 0 then True
  else case step st of
    | 99 % Just st' -> runsLong (len - 1) st'
    | 1 % Nothing -> True
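For comparison, the explicit generate-step-repeat loop that the Luck predicate replaces might look roughly like the Python sketch below. It assumes a drastically simplified machine (a bare integer stack, straight-line code, four made-up instructions) and does not model the 99/1 branch weights; every name in it is hypothetical.

```python
import random


def valid_instrs(stack):
    """Instructions that cannot crash in the current state."""
    ok = [("PUSH", 1), ("NOOP", None)]
    if len(stack) >= 1:
        ok.append(("POP", None))
    if len(stack) >= 2:
        ok.append(("ADD", None))
    return ok


def step(stack, instr):
    """Execute one instruction; return the new stack, or None on a crash."""
    op, arg = instr
    if op == "PUSH":
        return stack + [arg]
    if op == "NOOP":
        return stack
    if op == "POP" and len(stack) >= 1:
        return stack[:-1]
    if op == "ADD" and len(stack) >= 2:
        return stack[:-2] + [stack[-2] + stack[-1]]
    return None


def gen_by_exec(rng, length):
    """Pick a non-crashing instruction, step the machine, and repeat."""
    stack, program = [], []
    for _ in range(length):
        instr = rng.choice(valid_instrs(stack))
        program.append(instr)
        stack = step(stack, instr)
    return program


def runs_long(program):
    """Replay a program from the empty stack; True if it never crashes."""
    stack = []
    for instr in program:
        stack = step(stack, instr)
        if stack is None:
            return False
    return True


program = gen_by_exec(random.Random(1), 20)
```

Note that `valid_instrs` duplicates the crash conditions already present in `step`; in the Luck version the generator is obtained from the `step` predicate itself, so there is no second copy to maintain.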

We evaluated our generator on the same set of buggy information-flow analyses. We were able to find all of the same bugs, with similar effectiveness (number of bugs found per 100 tests). However, the Luck generator was 24 times slower (Luck: 150 tests/s, Haskell: 3600 tests/s). We expect to be able to improve this result (and the rest of the results in this section) with a more efficient implementation that compiles Luck programs to QuickCheck generators directly, instead of interpreting them in a minimally tuned prototype.

The prototype gives the user enough flexibility to match the effectiveness of state-of-the-art generators while significantly reducing the amount of code and effort required. This suggests that the approach Luck takes is promising and points towards the need for a real, optimizing implementation.

Acknowledgments

The work presented in this Chapter was the basis for the POPL 2017 paper “Beginner’s Luck” [81], with Diane Gallois-Wong, Catalin Hriţcu, John Hughes, Benjamin Pierce and Li-yao Xia. While the majority of the work presented in this section is mine, with the exception of the pattern match expansion algorithm that is attributed to Li-yao, this work would not have been possible without the constant discussions about the semantics of Luck with all the collaborators and especially Benjamin.

Chapter 5 Generating Good Generators for Inductive Relations

In Chapter 2, we introduced QuickChick, a QuickCheck clone for property-based testing in Coq, and demonstrated its functionality. In particular, compared to similar tools in other proof assistants like Isabelle [17], QuickChick gives the user the full customizability that QuickCheck provides: one can easily write and compose generators using an established combinator library.

However, as we saw earlier, for complex properties and especially specifications involving sparse preconditions, setting up PBRT-style testing can involve substantial work. Writing generators for well-distributed random data for such properties can be both complex and time consuming, sometimes to the point of being a research contribution in its own right [65, 66, 103]!

In the previous chapter, we identified two techniques for automatically deriving a generator from a given precondition: narrowing and constraint solving. Automatic narrowing-based generators can achieve testing effectiveness (measured as bugs found per test generated) comparable to hand-written custom generators, even for challenging examples [31, 43, 81].

Unfortunately, both hand-written and narrowing-based automatic generators are subject to bugs. For hand-written ones, this is because generators for complex conditions are often complex themselves, sometimes more so than the condition itself; moreover, they must be kept in sync if the condition is changed, another source of errors. Automatic generators do not suffer from the latter problem, but narrowing solvers are themselves rather complex beasts, whose correctness is therefore questionable. Even for Luck, presented in Chapter 4, the accompanying proof of correctness covers only an abstract model of the core algorithm, not the rather large Haskell implementation.

Bugs in generators can come in two forms: they can generate too much, or too little—i.e., they can be either unsound or incomplete. Unsoundness can lead to false positives, which can waste significant amounts of time. Incompleteness can lead to ineffective testing, where certain bugs in the program under test can never be found because the generator will never produce an input that provokes them. Both problems can be detected—unsoundness by double-checking whether generated values satisfy the property, incompleteness by techniques such as mutation testing [71]—and unsoundness can be mitigated by filtering away generated values that fail the double-check, but incompleteness bugs can require substantial effort to understand and repair.
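The two failure modes are easy to see on a toy condition. The Python sketch below uses sortedness of short integer lists as the target predicate; the three generators and the `filtered` helper are hypothetical and chosen only to exhibit one unsound generator, one incomplete generator, and one that is both sound and complete for lists of length at most 2 over {0, 1, 2}.

```python
import random


def is_sorted(xs):
    """The condition that generated values are supposed to satisfy."""
    return all(a <= b for a, b in zip(xs, xs[1:]))


def raw_list(rng):
    """An arbitrary list of length 0..2 over {0, 1, 2}."""
    return [rng.randint(0, 2) for _ in range(rng.randint(0, 2))]


def gen_unsound(rng):
    """Too much: may return lists that violate is_sorted."""
    return raw_list(rng)


def gen_incomplete(rng):
    """Too little: only constant lists, so [0, 1] is never produced."""
    return [rng.randint(0, 2)] * rng.randint(0, 2)


def gen_good(rng):
    """Sound and complete: every sorted list arises by sorting itself."""
    return sorted(raw_list(rng))


def filtered(gen, rng):
    """Mitigate unsoundness by retrying until the double-check passes."""
    while True:
        xs = gen(rng)
        if is_sorted(xs):
            return xs


rng = random.Random(0)
unsound = [gen_unsound(rng) for _ in range(1000)]
repaired = [filtered(gen_unsound, rng) for _ in range(1000)]
incomplete = [gen_incomplete(rng) for _ in range(1000)]
good = [gen_good(rng) for _ in range(1000)]
```

Filtering repairs `gen_unsound` at the cost of discarded samples, but no amount of filtering makes `gen_incomplete` produce `[0, 1]`: a bug that only shows up on strictly increasing inputs would stay hidden.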

The core contribution of this Chapter is a method for compiling a large class of logical conditions, expressed as Coq inductive relations, into random generators together with soundness and completeness proofs for these generators. We do not prove that the compiler itself is correct in the sense that it can only produce good generators; rather, we adopt a translation validation approach [107] where we produce a checkable certificate of correctness along with each generator. A side benefit of this approach is that, by compiling inductive relations into generators, we avoid the interpretive overhead of existing narrowing-based generators. As discussed in the previous Chapter, this overhead is one of the reasons existing generators can be an order of magnitude slower than their hand-written counterparts.

We have implemented our method as an extension of QuickChick. Using QuickChick, a Coq user can write down desired properties like

Conjecture preservation : forall (t t' : tm) (T : ty),

and look for counterexamples with no additional effort:

QuickChick preservation.
→ QuickChecking preservation...
Passed 10000 tests

The technical contributions of this chapter are as follows:

• We present a Luck-inspired method for compiling a large class of inductive definitions into random generators. Section 5.1 introduces our compilation algorithm through a sequence of progressively more complex examples; Section 5.2 describes it in full detail.

• We show how this algorithm can also be used to produce proofs of (possibilistic) correctness for every derived generator (Section 5.3). Indeed, by judicious application of Coq’s typeclass features, we can use exactly the same code to produce both generators and proof terms.

• To evaluate the applicability of our method, we applied the QuickChick implementation to a large part of Software Foundations [106], a machine-checked textbook on programming language theory. Of the 232 nontrivial theorems we considered, 84% are directly amenable to PBRT (the rest are higher-order properties that would at least require significant creativity to validate by random testing); of these, 83% can be tested using our algorithm. We discuss these findings in detail in Section 5.4.1.

• To evaluate the efficiency of our generators, we compare them to fine-tuned handwritten generators for information-flow control abstract machines (Section 5.4.2) and for well typed STLC terms (Section 5.4.3). The derived generators were 1.75× slower than the custom ones, demonstrating a significant speedup over previous interpreted approaches such as Luck [81].

We conclude and draw directions for future work in Section 5.5. The implementation of the algorithm in QuickChick, further integrating testing and proving in the Coq proof assistant and providing more push-button-style automation while retaining customizability, is described in the next chapter (Section 6.1).

5.1 Good Generators, by Example

The main focus of this chapter is to derive correct generators for simply-typed data satisfying dependently-typed, inductive invariants. This section uses examples to showcase different behaviors that our generation algorithm needs to exhibit; the algorithm itself will be described more formally in the following section. In particular, we are going to give a few progressively more complex inductive characterizations of trees, and detail how we can automatically produce a generator for trees satisfying those characterizations. We first encountered Coq trees in Chapter 2. We repeat their standard definition here for the reader’s convenience:

Inductive Tree A :=
  | Leaf : Tree A
  | Node : A -> Tree A -> Tree A -> Tree A.
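As a point of reference for the constrained generators that follow, an unconstrained, size-bounded generator for this datatype can be sketched in Python. This is only an analogue for illustration, not QuickChick output: the classes mirror `Leaf` and `Node`, node labels are fixed to small integers, and the 25% early-leaf probability is an arbitrary assumption.

```python
import random
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Leaf:
    pass


@dataclass(frozen=True)
class Node:
    value: int
    left: "Tree"
    right: "Tree"


Tree = Union[Leaf, Node]


def gen_tree(rng, size):
    """Return a random tree of height at most `size`."""
    if size == 0 or rng.random() < 0.25:
        return Leaf()
    # The size bound decreases on every recursive call, so generation terminates.
    return Node(rng.randint(0, 9), gen_tree(rng, size - 1), gen_tree(rng, size - 1))


def height(t):
    """Height of a tree, with a leaf counted as 0."""
    return 0 if isinstance(t, Leaf) else 1 + max(height(t.left), height(t.right))


t = gen_tree(random.Random(0), 3)
```

A generator like this produces arbitrary trees; the harder problem addressed in this chapter is producing only those trees that satisfy a given inductive relation.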
