Mutating Reduceron Code - Static Methods to Check Low-Level Code for a Graph Reduction Machine

In our experiments to evaluate code checkers, we need a framework that emulates the situation in which code has been altered, perhaps maliciously, to cause damage during evaluation on the target machine. Possible approaches include randomly created programs, hand-coded programs, and mutated programs. Each has advantages and drawbacks.

Randomly created programs. Let us first consider the idea of producing arbitrary random programs [75–78], which seems to fit our testing framework. The QuickCheck framework [77] could be used to produce programs randomly. However, the random approach is not a good candidate for our testing purposes because the programs created are completely arbitrary. For instance, we could obtain the empty list [ ] as the list of templates. In the random scenario, it is complex to design generators that yield programs close to genuine programs, or programs that are not too expensive in terms of computation steps.

Alteration by hand. Another possibility is to create programs entirely by hand, or to modify existing programs by hand. This technique comes close to what we want: bad code produced or altered by hand. Our experiments with hand-written code show that it is easy to make a mistake, for example confusing the order of function arguments. The drawback is that producing programs by hand takes far longer than, for example, generating them randomly.

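PTR rule

$$
\langle p,\ h,\ \mathrm{PTR}\ x : s,\ u \rangle \Longrightarrow \mathit{state}
$$

where

$$
\begin{aligned}
\mathit{upd} &= (1 + \mathrm{length}\ s,\ x) \\
\mathit{state} &=
\begin{cases}
\langle p,\ h,\ h_x : s,\ \mathit{upd} : u \rangle & \text{if } 0 \le x < \mathrm{length}\ h \\
(2,\ \text{'Invalid HEAP address'}) & \text{otherwise}
\end{cases}
\end{aligned}
$$

PRI rule

$$
\langle p,\ h,\ \mathrm{PRI}\ f : x : y : s,\ u \rangle \Longrightarrow \mathit{state}
$$

where

$$
\begin{aligned}
\mathit{ps} &= \{\,(+),\ (<=),\ (-),\ (==),\ (/=)\,\} \\
\mathit{state} &=
\begin{cases}
\langle p,\ h,\ \mathrm{primApply}\ f\ x\ y : s,\ u \rangle & \text{if } f \in \mathit{ps} \\
(7,\ \text{'Invalid primitive'}) & \text{otherwise}
\end{cases}
\end{aligned}
$$

CON rule

$$
\langle p,\ h,\ \mathrm{CON}\ n\ j : s,\ u \rangle \Longrightarrow \mathit{state}
$$

where

$$
\mathit{state} =
\begin{cases}
\langle p,\ h,\ \mathrm{FUN}\ 0\ (i + j) : s,\ u \rangle & \text{if } n < \mathrm{length}\ s \ \wedge\ s_n = \mathrm{TAB}\ i \\
(5,\ \text{'Bad STACK pattern'}) & \text{otherwise}
\end{cases}
$$

FUN rule

$$
\langle p,\ h,\ \mathrm{FUN}\ n\ f : s,\ u \rangle \Longrightarrow \mathit{state}
$$

where

$$
\begin{aligned}
(\mathit{pop},\ \mathit{spine},\ \mathit{apps}) &= p_f \\
h' &= h \mathbin{+\!\!+} \mathrm{map}\ (\mathrm{instApp}\ s\ h)\ \mathit{apps} \\
s' &= \mathrm{instApp}\ s\ h\ \mathit{spine} \mathbin{+\!\!+} \mathrm{drop}\ \mathit{pop}\ s \\
\mathit{state} &=
\begin{cases}
\langle p,\ h',\ s',\ u \rangle & \text{if } f < \mathrm{length}\ p \\
(5,\ \text{'Bad Template address'}) & \text{otherwise}
\end{cases}
\end{aligned}
$$

Update rule

$$
\langle p,\ h,\ \mathit{top} : s,\ (\mathit{sa}, \mathit{ha}) : u \rangle \Longrightarrow \mathit{state}
$$

where

$$
\begin{aligned}
n &= 1 + \mathrm{length}\ s - \mathit{sa} \\
h' &= \mathrm{update}\ \mathit{ha}\ (\mathit{top} : \mathrm{take}\ n\ s)\ h \\
\mathit{state} &=
\begin{cases}
\langle p,\ h',\ \mathit{top} : s,\ u \rangle & \text{if } \mathrm{arity}\ \mathit{top} > n \\
(5,\ \text{'Bad STACK pattern'}) & \text{otherwise}
\end{cases}
\end{aligned}
$$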
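To make the shape of these rules concrete, the PTR and PRI rules can be read as a step function that returns either a new machine state or an error pair. The following is only a minimal sketch under assumptions that are not in the source: the state is a Python tuple (p, h, s, u) of lists, atoms are tuples such as ("PTR", 3), primitive arguments are plain integers, and `PRIMS` stands in for primApply.

```python
import operator

# Hypothetical encoding (not from the source): atoms are tuples such as
# ("PTR", 3) or ("PRI", "(+)"); primitive arguments are plain integers.
PRIMS = {
    "(+)": operator.add,
    "(-)": operator.sub,
    "(<=)": operator.le,
    "(==)": operator.eq,
    "(/=)": operator.ne,
}


def step(state):
    """Apply the PTR or PRI rule to the atom on top of the stack.

    Returns a new state (p, h, s, u), or an (error code, message) pair
    when the side condition of the rule fails.
    """
    p, h, s, u = state
    top, rest = s[0], s[1:]
    if top[0] == "PTR":
        x = top[1]
        if 0 <= x < len(h):
            upd = (1 + len(rest), x)  # (stack address, heap address)
            return (p, h, [h[x]] + rest, [upd] + u)
        return (2, "Invalid HEAP address")
    if top[0] == "PRI":
        f = top[1]
        if f in PRIMS:
            # The rule's pattern requires two arguments below the primitive.
            x, y, *tail = rest
            return (p, h, [PRIMS[f](x, y)] + tail, u)
        return (7, "Invalid primitive")
    raise ValueError("rule not covered by this sketch")
```

Returning the error pair as an ordinary value, rather than raising an exception, mirrors the way the rules above define `state` by cases.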

Mutated programs. The mutation technique was proposed by DeMillo [79] in the late 1970s and developed further in [80–87]. By mutating, we intend to emulate the scenario in which the code is close to genuine code: a mutant is the original code slightly modified. The idea of mutation is to start from an original valid program (one that terminates under the operational semantics).

Once we have this genuine program, we mutate it by altering selected atoms, by swapping two contiguous atoms, or by deleting an arbitrary atom. Because each mutation is small, every mutant is based on a genuine program rather than being arbitrary, as in the random approach.
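The three kinds of mutation can be sketched as follows. This is an illustrative sketch only: it assumes a program can be treated as a flat list of atoms, and the function name `first_order_mutants` and the `replacements` parameter are hypothetical, not taken from the tool described here.

```python
def first_order_mutants(atoms, replacements):
    """Return every mutant that differs from `atoms` by one small change.

    `atoms` is a list of atoms; `replacements` lists the candidate atoms
    that may be substituted for an existing one (an assumption here).
    """
    mutants = []
    for i, atom in enumerate(atoms):
        # Alter: replace the atom at position i by a different candidate.
        for new in replacements:
            if new != atom:
                mutants.append(atoms[:i] + [new] + atoms[i + 1:])
        # Delete: drop the atom at position i.
        mutants.append(atoms[:i] + atoms[i + 1:])
        # Swap: exchange two contiguous atoms, skipping swaps of equal
        # atoms because they would reproduce the original program.
        if i + 1 < len(atoms) and atoms[i] != atoms[i + 1]:
            mutants.append(atoms[:i] + [atoms[i + 1], atoms[i]] + atoms[i + 2:])
    return mutants
```

For a program of three distinct atoms and one replacement candidate, this yields three altered, three shortened and two swapped mutants.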

The number of mutations that we can create is proportional to the number of atoms in the original program. Mutation takes the best of the other two techniques: it yields realistic programs, and it produces them automatically. If the number of atoms in a given program is too small to produce enough mutations, we can create compound mutations by applying a series of mutations (depth zero, depth 1, ..., depth n). In the mutation-testing literature, mutations of depth one are called FOMs (first-order mutations) and those of any depth greater than one are called HOMs (higher-order mutations) [33].
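Compound mutations can be sketched as repeated application of a single-step mutation function. The names below (`mutants_at_depth`, `delete_one`) are hypothetical, and `delete_one` is just one simple mutation used to keep the example self-contained.

```python
def delete_one(program):
    """Example single-step mutation: all ways of deleting one atom."""
    return [program[:i] + program[i + 1:] for i in range(len(program))]


def mutants_at_depth(program, mutate, depth):
    """Apply `mutate` (program -> list of mutants) `depth` times in series.

    Depth 0 returns the original program alone, depth 1 the first-order
    mutants, and any greater depth the higher-order mutants.
    """
    level = [program]
    for _ in range(depth):
        level = [m for prog in level for m in mutate(prog)]
    return level
```

The same higher-order mutant can be reached along several paths, so the result may contain duplicates. Whether to remove them is a design choice this sketch leaves open.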

Mutated programs and Random Selection. For the purpose of our experiments, we first generate a list of all possible mutations, and then randomly extract an arbitrary number of mutations from that list.
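The selection step might look like the sketch below. It assumes sampling without replacement and an optional seed for reproducible experiments. Both are assumptions, since the text does not say how the random extraction is performed.

```python
import random


def select_mutants(all_mutants, k, seed=None):
    """Pick up to `k` distinct mutants at random from the full list."""
    rng = random.Random(seed)  # a private generator keeps runs reproducible
    return rng.sample(all_mutants, min(k, len(all_mutants)))
```

Capping `k` at the length of the list avoids an error when a small program yields fewer mutants than requested.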

Our Mutation Testing Approach. For our work we reuse the idea of damaging code from mutation testing. But the use we make of mutant programs is different. In classic mutation testing, mutations are used to measure the effectiveness of a test suite in terms of the ability to detect faults [33]. If a test suite can detect a problem in a mutant, then that mutant is “killed” (a good thing!). A key step in mutation testing is to compare the output of the original valid program against the output of each mutant.

In our case we do not compare the results computed by each mutant with those computed by the original program. Instead, for each mutant, we compare its behaviour under the operational semantics against the outcome of static checking. Our aim is to measure the effectiveness of the static checker. We do not care about the correctness of a mutant; if the machine computes any value without crashing, that is fine for our purposes.

The well-known problem of equivalent mutants [33] can arise when mutation testing is used to evaluate test suites. A mutant may happen always to compute the same result as the original program, so no test suite can kill it. This problem is of no concern here. We only want to know whether each mutant is well-behaved or not, and whether it is well-checked or not. The result it computes is ignored.
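This comparison amounts to tallying each mutant by two yes/no outcomes. The sketch below assumes two hypothetical predicates: `runs_ok` reports whether the machine evaluates a mutant without crashing, and `checks_ok` reports whether the static checker accepts it. Neither name comes from the source.

```python
from collections import Counter


def classify(mutants, runs_ok, checks_ok):
    """Tally mutants by the pair (well-behaved, well-checked).

    `runs_ok` and `checks_ok` are caller-supplied predicates standing in
    for the operational semantics and the static checker respectively.
    """
    return Counter((runs_ok(m), checks_ok(m)) for m in mutants)
```

The four cells of the resulting table summarise how well the checker's verdicts agree with actual behaviour. The cell (False, True), holding ill-behaved mutants that the checker accepts, is presumably the one of most concern.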

Random Testing and Further Discussion. Another way of constructing realistic test programs is to use random generators based on attribute grammars, as in the work of Drienyovszky, Horpácsi et al. [88], where QuickCheck is applied to test refactoring tools for Erlang. The attribute-grammar generator provides a more expressive mechanism [89] than a context-free-grammar generator. The code for the generators is more concise and maintainable than code developed using the standard generator method of QuickCheck.

The idea of using program generators to automate the testing of programming-language tools can be quite successful. For instance, Daniel, Dig et al. [90] found 45 previously unreported bugs in Eclipse and NetBeans, the most widely used refactoring tools for Java. Their generators are inspired by QuickCheck generators, although they do not use random testing. Instead, their approach uses Java classes to generate test programs imperatively and exhaustively.

We could have used similar program-generation techniques to test our tools. However, we decided to use the idea of mutants because it is a simple approach to implement, and it produces tests that are quite close to a well-behaved program.
