Experiences from Extending the Machine - Random Testing For Language Design

A natural question that arose from this line of work is whether our testing methodology scales to larger, more realistic machines. To answer that question we extended the machine to include registers, as well as advanced IFC features such as a richer lattice of first-class labels and dynamically allocated memory with mutable labels. Presenting all the features of that machine here is beyond the scope of this thesis; the interested reader is referred to the journal version of the “Testing Noninterference, Quickly” paper [66]. However, two specific extensions will allow us to continue the discussion of custom generators: using a larger label lattice and parameterizing the IFC rules in a rule table.

3.3.1 Decoupling of Generators and Predicates

In the extended machine, we moved from a two-point lattice for labels to an arbitrary lattice. For the purposes of this section, we can restrict our attention to a four-element diamond lattice:

\[ \ell ::= L \mid M_1 \mid M_2 \mid H \]

where $L \sqsubseteq M_1$, $L \sqsubseteq M_2$, $M_1 \sqsubseteq H$, and $M_2 \sqsubseteq H$. The labels $M_1$ and $M_2$ are incomparable. With this richer lattice, our definition of “low” and “high” becomes relative to an arbitrary observer label $\ell$: we call some label $\ell'$ low with respect to $\ell$ if $\ell' \sqsubseteq \ell$, and high otherwise.
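As an illustration only, the diamond lattice and the observer-relative notion of “low” can be sketched in a few lines of Python. The names `Label`, `flows_to`, `join`, and `is_low` are hypothetical and are not taken from the thesis or from its Coq development.

```python
from enum import Enum


class Label(Enum):
    L = "L"
    M1 = "M1"
    M2 = "M2"
    H = "H"


# For each label, the set of labels it may flow to (reflexive and transitive).
_UPWARD = {
    Label.L: {Label.L, Label.M1, Label.M2, Label.H},
    Label.M1: {Label.M1, Label.H},
    Label.M2: {Label.M2, Label.H},
    Label.H: {Label.H},
}


def flows_to(a: Label, b: Label) -> bool:
    """The lattice order: True when a is below or equal to b."""
    return b in _UPWARD[a]


def join(a: Label, b: Label) -> Label:
    """Least upper bound of a and b."""
    upper_bounds = _UPWARD[a] & _UPWARD[b]
    # The least upper bound is the one that flows to every other upper bound.
    return next(c for c in upper_bounds
                if all(flows_to(c, d) for d in upper_bounds))


def is_low(label: Label, observer: Label) -> bool:
    """A label is low with respect to an observer if it flows to the observer."""
    return flows_to(label, observer)
```

Under this sketch, `is_low(Label.M2, Label.M1)` is false: relative to an observer at `M1`, the label `M2` counts as high even though it is not the top of the lattice.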

In the new setting, a correct definition of indistinguishability of machine states requires the program counters to be equal only if their labels are “low” compared to the observer label; otherwise they can be different. QuickChick quickly finds a counterexample if we use an indistinguishability relation that is too restrictive. During our proof efforts for the register machine, we initially got the “fix” wrong: we allowed one machine to be “high” while the other was “low”. Such faulty definitions were not uncommon in our original designs; we used QuickChick to find much more subtle ones throughout our efforts. What makes this particular bug interesting, however, is that our testing infrastructure (even MSNI) could not find it at all!

The reason is, once again, that our generators for indistinguishable states were incomplete with respect to the (now faulty!) indistinguishability predicate. Indeed, our generator for indistinguishable machines never created starting configurations where one machine was in a “high” state and the other one in a “low” state, even though that was allowed by the indistinguishability relation. As a result, a large part of the state space was not exercised and a counterexample could not be produced. One of the main goals of this thesis is to tightly couple generators and predicates, so that such occurrences cease to exist.
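The coverage gap can be illustrated with a small Python sketch restricted to program counters. Everything here is an assumption made for illustration: the two predicates are one plausible reading of the correct and the faulty relation, `incomplete_generator` is a hypothetical stand-in for our generator, and pairs are enumerated exhaustively rather than sampled at random. The sketch only shows that the generator never reaches the pairs on which the two predicates disagree; finding an actual noninterference violation would additionally require running the machines.

```python
from itertools import product

LABELS = ["L", "M1", "M2", "H"]
STRICTLY_BELOW = {("L", "M1"), ("L", "M2"), ("L", "H"), ("M1", "H"), ("M2", "H")}


def flows_to(a, b):
    return a == b or (a, b) in STRICTLY_BELOW


def indist_pc_correct(obs, pc1, pc2):
    """Both pcs low and equal, or both pcs high."""
    low1, low2 = flows_to(pc1[1], obs), flows_to(pc2[1], obs)
    if low1 and low2:
        return pc1 == pc2
    return not low1 and not low2


def indist_pc_faulty(obs, pc1, pc2):
    """Assumed shape of the faulty fix: also accepts one low and one high pc."""
    low1, low2 = flows_to(pc1[1], obs), flows_to(pc2[1], obs)
    if low1 and low2:
        return pc1 == pc2
    return True


def all_pcs(addrs):
    # A pc is modeled as a pair (address, label).
    return [(a, l) for a in addrs for l in LABELS]


def incomplete_generator(obs, addrs):
    """Hypothetical generator: it only ever pairs low with low or high with high."""
    for pc1 in all_pcs(addrs):
        if flows_to(pc1[1], obs):
            yield pc1, pc1
        else:
            for pc2 in all_pcs(addrs):
                if not flows_to(pc2[1], obs):
                    yield pc1, pc2


def disagreements(obs, pairs):
    """Pairs on which the faulty and the correct predicate differ."""
    return [(p, q) for p, q in pairs
            if indist_pc_faulty(obs, p, q) != indist_pc_correct(obs, p, q)]
```

For observer `"M1"`, `disagreements` is empty over the output of `incomplete_generator` but non-empty over all pairs, for example on `((0, "L"), (0, "H"))`. This is exactly the part of the state space that was never exercised.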

3.3.2 Debugging Generators

In the extended machine, to avoid cluttering the step function with the IFC logic and injected bugs, we parameterized the step relation to take a rule table as an argument. A single rule would receive a number of labels as inputs (the pc label and the labels of any arguments), potentially perform checks and return labels for the result and the new pc. For example, the IFC entry for the Store instruction of the stack machine (whose semantics are repeated here for convenience)

\[
\frac{i(pc) = \mathit{Store} \qquad m(p) = n' @ \ell'_n \qquad \ell_p \vee \ell_{pc} \sqsubseteq \ell'_n \qquad m' = m[p := n @ (\ell_n \vee \ell_p \vee \ell_{pc})]}
     {pc @ \ell_{pc} \quad p @ \ell_p : n @ \ell_n : s \quad m \;\Rightarrow\; (pc+1) @ \ell_{pc} \quad s \quad m'}
\;(\textsc{Store})
\]

would look like this:

| Check | Final pc Label | Result Label |
|---|---|---|
| $\ell_p \vee \ell_{pc} \sqsubseteq \ell'_n$ | $\ell_{pc}$ | $\ell_n \vee \ell_p \vee \ell_{pc}$ |
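To make the shape of such an entry concrete, here is a minimal Python sketch of a rule table with a single Store entry. The names `Rule`, `RULE_TABLE`, `apply_rule`, and the dictionary keys `"pc"`, `"p"`, `"n"`, and `"n_old"` (standing for $\ell'_n$) are hypothetical; the actual development is written in Coq and may be organized differently.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

STRICTLY_BELOW = {("L", "M1"), ("L", "M2"), ("L", "H"), ("M1", "H"), ("M2", "H")}


def flows_to(a: str, b: str) -> bool:
    return a == b or (a, b) in STRICTLY_BELOW


def join(a: str, b: str) -> str:
    if flows_to(a, b):
        return b
    if flows_to(b, a):
        return a
    return "H"  # the only incomparable pair in the diamond is M1 and M2


Labels = Dict[str, str]


@dataclass(frozen=True)
class Rule:
    check: Callable[[Labels], bool]    # side condition that must hold
    final_pc: Callable[[Labels], str]  # label of the new pc
    result: Callable[[Labels], str]    # label of the result


# Store: check l_p v l_pc below l'_n; the pc label is unchanged;
# the stored value is tainted with l_n v l_p v l_pc.
STORE_RULE = Rule(
    check=lambda ls: flows_to(join(ls["p"], ls["pc"]), ls["n_old"]),
    final_pc=lambda ls: ls["pc"],
    result=lambda ls: join(ls["n"], join(ls["p"], ls["pc"])),
)

RULE_TABLE = {"Store": STORE_RULE}


def apply_rule(table: Dict[str, Rule], instr: str,
               labels: Labels) -> Optional[Tuple[str, str]]:
    """Return (final pc label, result label), or None if the check fails."""
    rule = table[instr]
    if not rule.check(labels):
        return None
    return rule.final_pc(labels), rule.result(labels)
```

Keeping the rule as data, separate from the step function, is what makes it possible to swap in a different table without touching the machine semantics.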

This factorization of IFC rules allowed a more systematic approach to debugging our generators. Since we are striving for a sound and permissive set of information-flow rules, every check performed and every tainting of result labels has to be essential; in other words, if we were to remove a check or a label join, our testing infrastructure should lead to a counterexample. By doing exactly that, we can systematically generate mutants of each rule using the lattice structure of the labels. For example, the Store rule above gives rise to the following mutants, where each row depicts a single dropped taint or check.

Check F inal pc Label Result Label `pc Ď`1n `pc `n_`p_`pc `p Ď`1n `pc `n_`p_`pc True `pc `n_`p_`pc `p_`pc Ď`1n K `n_`p_`pc `p_`pc Ď`1n `pc `p_`pc `p_`pc Ď`1n `pc `n_`pc `p_`pc Ď`1n `pc `n_`p `p_`pc Ď`1n `pc `n `p_`pc Ď`1n `pc `p `p_`pc Ď`1n `pc `pc `p_`pc Ď`1n `pc K
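One way to enumerate such mutants mechanically is sketched below in Python. It rests on an assumed encoding that is not taken from the thesis: each component of a rule is represented by the set of labels joined in it (for the check, the labels joined on the left of the flows-to test), and a mutant weakens exactly one component to a strict subset of those labels. The empty set stands for the bottom label, which for the check amounts to `True`.

```python
from itertools import combinations

# Assumed encoding of the Store rule: the labels joined in each component.
STORE = {
    "check": frozenset({"p", "pc"}),        # l_p v l_pc flows to l'_n
    "final_pc": frozenset({"pc"}),          # l_pc
    "result": frozenset({"n", "p", "pc"}),  # l_n v l_p v l_pc
}


def strict_subsets(labels):
    """All strict subsets of a set of labels, largest first."""
    items = sorted(labels)
    for size in range(len(items) - 1, -1, -1):
        for subset in combinations(items, size):
            yield frozenset(subset)


def mutants(rule):
    """Weaken exactly one component of the rule by dropping joined labels."""
    for component, labels in rule.items():
        for weaker in strict_subsets(labels):
            yield {**rule, component: weaker}


def show(labels):
    """Render a join of labels; the empty join is the bottom label."""
    return " v ".join(f"l_{name}" for name in sorted(labels)) or "bottom"


if __name__ == "__main__":
    for m in mutants(STORE):
        print(f"{show(m['check'])} <= l'_n | {show(m['final_pc'])} | {show(m['result'])}")
```

For the Store rule this yields eleven mutants (three for the check, one for the final pc label, and seven for the result label), matching the number of rows in the table above.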

If QuickChick cannot find a counterexample to a specific mutant, it has revealed something interesting: either the testing is not complete or the IFC rule is too strict! This is a particularly fortunate situation compared to standard mutation testing [71]: it completely avoids the “equivalent mutant problem” by construction. Unfortunately, preliminary attempts to extend this approach to a more general setting, such as arbitrary inductive properties, have failed. This is discussed further in future work (Chapter 8).
