Macario Polo, University of Castilla-La Mancha, Spain
Mario Piattini, University of Castilla-La Mancha, Spain
Abstract
This chapter presents a new testing technique called “test-case mutation.” The idea is to apply a set of specific mutation operators to test cases for object-oriented software, which produces different versions of the original test cases. Then, the results of the original test case and of its corresponding mutants are compared; if they are very similar, the technique highlights the possible presence of a fault in the class under test.
The technique seems useful for testing the correctness of strongly constrained classes.
The authors have implemented a supporting tool that is also described in the chapter.
Introduction
Source-code mutation is a testing technique whose main goal is to check the quality of the test cases used to test programs. Basically, a program mutant is a copy of the program under test with a small change in its source code, such as the substitution of “+” by “-.” This small change simulates a fault in the program, so the mutant is a faulty version of the program under test. If a test case is executed on both the program being tested and a mutant and their outputs differ, the mutant is said to be “killed.” This means that the test case has found the fault introduced in the original program and, therefore, the test case is “good.”
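As a minimal sketch of this idea, consider a one-line program and the mutant obtained by substituting “+” by “-.” The function names (add, add_mutant, kills) are hypothetical and chosen only for illustration; they do not come from any mutation tool.

```python
def add(a, b):
    """Program under test."""
    return a + b


def add_mutant(a, b):
    """Mutant: the '+' operator has been replaced by '-'."""
    return a - b


def kills(test_input, original, mutant):
    """A test input kills the mutant when the two outputs differ."""
    return original(*test_input) != mutant(*test_input)


print(kills((2, 3), add, add_mutant))  # True: 5 differs from -1
print(kills((2, 0), add, add_mutant))  # False: both versions return 2
```

The second input shows why the quality of test cases matters: a test case that uses zero as the second operand cannot reveal the seeded fault.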
Changes in the source code are seeded by mutation operators that, in many cases, are language-dependent (i.e., there are mutation operators specifically designed for Java, C++, etc.).
Although powerful, source-code mutation is a computationally very expensive testing technique (Baudry, Fleurey, Jézéquel, & Traon, in press; Choi, Mathur, & Pattison, 1989; Duncan, 1993; Mresa & Bottaci, 1999; Weiss & Fleyshgakker, 1993). In fact, source-code mutation has several very costly steps:
• Mutant generation: Offutt, Rothermel, Untch, and Zapf (1996) reported an experiment in which a suite of 10 Fortran-77 programs, ranging from 10 to 48 executable statements, yielded between 183 and 3010 mutants. Mresa and Bottaci (1999) showed a set of 11 programs with a mean of 43.7 lines of code that produced 3211 mutants. The MuJava tool (Ma et al., 2004), when applied to a Java version of the triangle-type program with 37 lines of code, produces 469 mutants.
• Mutant execution: According to Ma, Offutt, and Kwon (2004), research in this line has proposed the use of nonstandard computer architectures (e.g., Krauser, Mathur, & Rego, 1991) and weak mutation. In weak mutation, the state of the mutant is examined immediately after the execution of the modified statement, and the mutant is considered killed if that state is incorrect, even when the incorrect state does not propagate to the end of the program. Weak mutation was initially introduced by Howden. Offutt and Lee (1994) concluded that weak mutation is a cost-effective alternative to strong mutation for unit testing of noncritical applications.
• Result analysis: Besides the study of mutants, both killed and alive, this step also involves the discovery of functionally equivalent mutants, which, according to Mresa and Bottaci (1999), “is the activity that consumes the most time.”
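The difference between weak and strong mutation can be illustrated with a small sketch. The functions below are hypothetical; the trace list is an assumption introduced only to expose the state right after the modified statement, which is what weak mutation inspects.

```python
def is_positive_sum(a, b, trace):
    total = a + b        # statement that the mutant modifies
    trace.append(total)  # state immediately after that statement
    return total > 0


def is_positive_sum_mutant(a, b, trace):
    total = a - b        # mutant: '+' replaced by '-'
    trace.append(total)
    return total > 0


def weakly_killed(a, b):
    """Weak mutation: compare the state right after the mutated statement."""
    original_trace, mutant_trace = [], []
    is_positive_sum(a, b, original_trace)
    is_positive_sum_mutant(a, b, mutant_trace)
    return original_trace != mutant_trace


def strongly_killed(a, b):
    """Strong mutation: compare only the final outputs."""
    return is_positive_sum(a, b, []) != is_positive_sum_mutant(a, b, [])


print(weakly_killed(3, 1), strongly_killed(3, 1))  # True False
print(weakly_killed(1, 3), strongly_killed(1, 3))  # True True
```

With the input (3, 1) the intermediate values differ (4 and 2), but both versions return True, so the mutant is killed only in the weak sense.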
In order to reduce the number of mutants generated, Mathur (1991) proposed “selective mutation.” This line of work has also been pursued by other authors: Mresa and Bottaci (1999), Offutt et al. (1996), and Wong and Mathur (1995) conducted experiments to find a set of sufficient mutation operators that decreases the number of mutants generated without information loss. Mresa and Bottaci (1999) and Wong and Mathur (1995) also investigated the power of randomly selected mutants and compared it to selective mutation. Hirayama, Yamamoto, Okayasu, Mizuno, and Kikuno (2002) proposed a new testing process that starts with a prioritization of program functions from several viewpoints; according to these authors, this prioritization reduces the number of test cases generated without degrading the results.
Along the same line, Kim, Clark, and McDermid (2001) analyzed the effectiveness of several strategies for test-case generation with the goal of finding out which one kills more mutants.
Offutt and Pan (1997) demonstrated that it is possible to automatically detect functionally equivalent mutants. A mutant is equivalent to the original program if it is impossible to find any test data that kills the mutant. The authors use constraints for this task. In their experiments, they detected almost 50% of equivalent mutants.
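A small sketch may help to see what an equivalent mutant looks like. In the hypothetical function below, replacing “>” by “>=” changes which branch is taken only when both arguments are equal, and in that case either branch returns the same value. The brute-force search is an assumption made for illustration; it is not the constraint-based approach of Offutt and Pan, and a search over a finite range can only suggest equivalence, never prove it.

```python
def maximum(a, b):
    """Program under test."""
    if a > b:
        return a
    return b


def maximum_mutant(a, b):
    """Mutant: '>' replaced by '>='."""
    if a >= b:
        return a
    return b


def find_killing_input(values):
    """Return the first input pair that kills the mutant, or None."""
    for a in values:
        for b in values:
            if maximum(a, b) != maximum_mutant(a, b):
                return (a, b)
    return None


print(find_killing_input(range(-5, 6)))  # None: no input in the range kills it
```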
When testing object-oriented software, it is common to consider a test case for a class K as a sequence of invocations to methods of K. In testing environments such as JUnit, NUnit, and other tools of this family, each test case consists of the construction of an instance of the class being tested, followed by the execution of some of the services offered by the class. The obtained object is then compared with the expected object. With other testing tools, such as MuJava (Ma et al., 2004), a test case is also composed of a sequence of calls to some of the class services.
The manual generation of test cases for a class is a hard and costly task. To facilitate it, Kirani and Tsai (1994) proposed a technique called “method-sequence specification” to represent the causal relationship between the methods of a class. This specification documents the correct order in which the methods of a class can be invoked by client classes, and can be expressed using state machines or regular expressions. As it is also interesting to test a class when its methods are invoked in an order different from “the correct” one, the specification (regular expression or state machine) can also be used to generate other sequences.
Once the tester has a set of sequences (correct or not), he or she must combine them with test values to obtain a set of test cases.
From one point of view, a test case composed of a sequence of methods, with actual values passed as parameters, can be understood as a way of using the class being tested. Any variation in the test case results in a different way of using the class. In many cases, the state of the instance after being used in one way or another should be different. One only needs to think of a bank account, one of the simplest examples. The final state of a new instance is different if we deposit and then withdraw, or if we withdraw and then deposit; the state of the instance will also be different if we deposit 100 euros rather than 200.
As a matter of fact, unit tests rely on the principle that an instance must always be in a correct state; if an instance cannot reach a correct state, it must report this by throwing an exception. A class is defined by a set of fields and operations; moreover, it may have some class invariants, and some pre- and postconditions annotating its operations.
Class operations are divided into queries (which do not change the instance state) and commands (which change the instance state). To apply a command to an instance, the instance must:
1. Be in a correct state (state precondition). For example, a banking account must not be locked against withdrawals.
2. Receive correct arguments (argument precondition). For example, the amount to be withdrawn must be greater than zero.
The evaluation of the state precondition may require taking into account the argument precondition, since arguments can be invalid by themselves or due to a conflict with the current state of the instance (e.g., if a banking account has a balance of 100, it must be impossible to withdraw 200).
The result of applying a command will be:
1. An instance in a correct state (state postcondition).
2. Possibly, a return value according to the class semantics (result postcondition).
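The preconditions and postconditions described above can be sketched on the banking-account example. The class below is a hypothetical illustration: the field names, the choice of exception types, and the decision to return the new balance from withdraw are assumptions, not part of the chapter.

```python
class Account:
    """Hypothetical strongly constrained class with an invariant."""

    def __init__(self):
        self.balance = 0
        self.locked = False

    def invariant(self):
        # Class invariant: the balance is never negative.
        return self.balance >= 0

    def deposit(self, amount):
        if amount <= 0:  # argument precondition
            raise ValueError("amount must be greater than zero")
        self.balance += amount
        assert self.invariant()  # state postcondition

    def withdraw(self, amount):
        if self.locked:  # state precondition
            raise RuntimeError("account is locked against withdrawals")
        if amount <= 0:  # argument precondition
            raise ValueError("amount must be greater than zero")
        if amount > self.balance:  # argument in conflict with current state
            raise ValueError("insufficient balance")
        self.balance -= amount
        assert self.invariant()  # state postcondition
        return self.balance      # result postcondition: the new balance


account = Account()
account.deposit(100)
try:
    account.withdraw(200)  # rejected: the balance is only 100
except ValueError as error:
    print(error)           # insufficient balance
print(account.withdraw(40))  # 60
```

A rejected command leaves the instance in its previous correct state and reports the problem through an exception.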
Given a test case composed of a sequence of commands (operations that change the instance state) executed on the same instance, it is possible to obtain many versions by introducing some changes, such as removing a method invocation or repeating one, changing the order of invocation of methods, interchanging the values of two parameters of compatible types, setting to zero the value of a numeric parameter, setting to null a complex parameter, reducing the size of a parameter array, etc. The state of the instance will probably be different depending on which version of the test case has been executed.
Each version of the test case may be obtained by applying a test-case mutation operator.
If source-code mutants simulate faults in the program under test, test-case mutants simulate faults in the way of using the class being tested: in a strongly constrained class, we expect different ways of use to put the respective instances in different states. Thus, the test-case mutation technique we present in this chapter is especially useful for testing strongly constrained classes.
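A minimal sketch of the idea, under several assumptions, is given below. The four operators (remove a call, repeat a call, set numeric arguments to zero, swap adjacent calls), the state snapshot, and the ratio used to compare states are illustrative choices; they are not the operator set or the measure defined later in the chapter, nor the behavior of the supporting tool. FaultyAccount contains a deliberately seeded fault.

```python
class Account:
    def __init__(self):
        self.balance = 0

    def deposit(self, amount):
        if amount <= 0:
            raise ValueError("invalid amount")
        self.balance += amount

    def withdraw(self, amount):
        if amount <= 0 or amount > self.balance:
            raise ValueError("invalid amount")
        self.balance -= amount


class FaultyAccount(Account):
    def withdraw(self, amount):
        if amount <= 0 or amount > self.balance:
            raise ValueError("invalid amount")
        # Seeded fault: the balance is never updated.


def mutants(test_case):
    """Yield mutated copies of a test case (a list of (method, args) pairs)."""
    n = len(test_case)
    for i in range(n):
        yield test_case[:i] + test_case[i + 1:]   # remove call i
        yield test_case[:i + 1] + test_case[i:]   # repeat call i
        name, args = test_case[i]
        if any(a != 0 for a in args):             # zero the numeric arguments
            zeroed = (name, tuple(0 for _ in args))
            yield test_case[:i] + [zeroed] + test_case[i + 1:]
    for i in range(n - 1):                        # swap adjacent calls
        swapped = list(test_case)
        swapped[i], swapped[i + 1] = swapped[i + 1], swapped[i]
        yield swapped


def final_state(cls, test_case):
    """Run the calls on a fresh instance; rejected calls leave it unchanged."""
    instance = cls()
    for name, args in test_case:
        try:
            getattr(instance, name)(*args)
        except ValueError:
            pass
    return dict(vars(instance))


def same_state_ratio(cls, test_case):
    """Fraction of mutants that end in the same state as the original."""
    expected = final_state(cls, test_case)
    results = [final_state(cls, m) == expected for m in mutants(test_case)]
    return sum(results) / len(results)


case = [("deposit", (100,)), ("withdraw", (30,))]
print(same_state_ratio(Account, case))        # 0.0
print(same_state_ratio(FaultyAccount, case))  # 4 of 7 mutants, about 0.57
```

For the correct class, every mutant leaves the instance in a state different from that of the original test case. For the faulty class, more than half of the mutants reach the same state as the original, which points to a possible fault in the withdraw operation.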
Initially, the chapter describes the proposed technique, giving some concepts and definitions. Then it presents the results of its application to a set of programs. The tool we have implemented for supporting the technique is later explained. Finally, we draw our conclusions and future lines of work.