Adaptive Mutation - Experimental Analysis

5.5 Experimental Analysis

5.5.3 Adaptive Mutation

Motivated by the results of the previous section, we present here a new mutation operator that changes its behavior throughout the search. The difference between this adaptive operator, denoted by MDα, and the ones studied in the previous section is the probability distribution used for selecting a class. In MDα the probability distribution is:

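$$
p(c, c') =
\begin{cases}
\dfrac{\left(\frac{1}{d(c,c')}\right)^{\alpha}}{\sum_{r \in U,\, r \neq c} \left(\frac{1}{d(c,r)}\right)^{\alpha}} & \text{if } c \neq c' \\[2ex]
0 & \text{if } c = c'
\end{cases}
\tag{5.10}
$$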

In this expression, if α = 0, we have the uniform mutation MU and if α = 1, we have the distance-based mutation MDn. We can use values higher than 1 for α. If the value of α is high, then the mutation only selects classes that are close to the ones in the individual. We can see α as an exploitation-exploration parameter. A low value for α leads to an explorative search. A high value leads to an exploitative search. In order to make it adaptive we must change the value of α throughout the search. We use a linear increase for α, that is:

α = λ · step (5.11)

where λ is a parameter called adaptive speed. With this expression for α, the adaptive mutation behaves like MU at the beginning of the search and gradually switches to the behavior of MDn as the search progresses. The higher the value of λ, the faster this change in behavior. If λ = 0 we have the uniform mutation, MU. On the other hand, if λ = 1/T, then α reaches 1 after T steps and MDα behaves like MDn at that point.
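The distribution in Equation (5.10) and the linear schedule in Equation (5.11) can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used in the experiments: the function names, the representation of the class universe U as a list, and the `distance` callable are assumptions introduced here, and the distance between two different classes is assumed to be strictly positive.

```python
import random


def adaptive_alpha(lam, step):
    """Linear schedule of Equation (5.11): alpha = lambda * step."""
    return lam * step


def mutation_probabilities(c, classes, distance, alpha):
    """Probability of replacing class c by each other class (Equation 5.10).

    `classes` plays the role of the universe U and `distance(a, b)` the role
    of d(a, b). Both are hypothetical stand-ins and d(a, b) > 0 is assumed
    whenever a != b. The class c itself is excluded, so p(c, c) = 0.
    """
    others = [r for r in classes if r != c]
    weights = [(1.0 / distance(c, r)) ** alpha for r in others]
    total = sum(weights)
    return {r: w / total for r, w in zip(others, weights)}


def mutate_class(c, classes, distance, lam, step, rng=random):
    """Draw a replacement for c using the adaptive distribution at `step`."""
    probs = mutation_probabilities(c, classes, distance, adaptive_alpha(lam, step))
    candidates = list(probs)
    return rng.choices(candidates, weights=[probs[r] for r in candidates], k=1)[0]
```

With α = 0 every weight equals 1, so the sketch reduces to a uniform choice among the other classes, and with α = 1 each weight is the inverse distance, which matches the two special cases described above.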

The adaptive speed λ is a new parameter, so we must analyze the behavior of the algorithm for different values of λ in order to give some guidelines for selecting its value. A low value for λ means a very explorative search. A high value for λ makes the algorithm change very quickly from the explorative phase to a very exploitative one. It is well known in the metaheuristic field that one of the key points in the design of an algorithm is to strike the right balance between exploration and exploitation. Thus, we expect the best value for λ to be neither too high nor too low. In order to support this hypothesis we have applied our test data generator using the adaptive mutation to the nine programs presented in Section 5.4.3. We used nine different values for λ and performed 100 independent runs for each program and configuration. In all cases the generator was executed until 100% branch coverage was obtained, and we use the number of evaluations for comparison purposes. In Figure 5.7 we show the average number of evaluations for all the programs and the nine values of λ. We have also included the results of MU (λ = 0).

As expected, when extreme values for λ are used the effort required to reach total coverage is higher. In particular, when the uniform mutation is used (λ = 0) the effort is higher than for intermediate values of λ (there are statistically significant differences that confirm this observation). On the other hand, when λ = 1/60, the highest value of λ tested, the effort required increases again. The reason is that the search reaches a very exploitative stage in a few steps, in which newly generated solutions are similar to the parent solutions. In this situation it is difficult for the algorithm to reach the objective. The best values for λ are between 1/200 and 1/100.
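A small worked example may help to see why a large α leaves little room for exploration. The numbers are illustrative and are not taken from the experiments: assume a class c with only three alternative classes at distances 1, 2 and 4. Applying Equation (5.10) to these distances gives:

$$
\begin{aligned}
\alpha = 0 &: \quad p = \left(\tfrac{1}{3},\ \tfrac{1}{3},\ \tfrac{1}{3}\right) \\
\alpha = 1 &: \quad p = \left(\tfrac{4}{7},\ \tfrac{2}{7},\ \tfrac{1}{7}\right) \approx (0.571,\ 0.286,\ 0.143) \\
\alpha = 5 &: \quad p = \left(\tfrac{1024}{1057},\ \tfrac{32}{1057},\ \tfrac{1}{1057}\right) \approx (0.969,\ 0.030,\ 0.001)
\end{aligned}
$$

Under these assumed distances, with λ = 1/60 the value α = 5 is reached after only 300 steps, and from then on the nearest class is selected in roughly 97% of the mutations.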

We have also compared our proposals against a random search. The random search proposes random classes for the vector of objects that is used as a test case. It reaches 100% branch coverage only in obj2 1, with an average of 1302 evaluations. For the other (more complex) programs we stopped the random search after 50,000 evaluations, and the average coverage obtained varies from 21% in obj4 3 to 99% in obj3 1. In the results shown in Figure 5.7, the highest average number of evaluations needed for 100% branch coverage is around 8,000. Thus, we conclude that our proposals are much better than a simple random search.

Figure 5.7: Average number of evaluations required for 100% branch coverage in all the test programs for different values of λ.

The results of Figure 5.7 also show the “difficulty” of the programs for the test data generator. From the results we can sort the programs according to the effort required to reach 100% branch coverage. We can observe that, except in a few cases, this ranking is independent of the value of λ and is correlated with the value i + j where i is the number of atomic conditions per logical expression, and j is the nesting degree. Furthermore, we can observe that the influence of λ on the results is higher in the “most difficult” programs, as we could expect.

5.6 Conclusions

In this chapter we have studied one aspect of OO software, inheritance, and proposed some approaches that help to better guide the search for test data in the context of OO evolutionary testing. In particular, we have answered RQ1 by providing a distance measure to compute the branch distance in the presence of the instanceof operator in Java programs. We have also proposed two mutation operators that change the solutions based on the distance measure defined. In addition to the proposals we have performed a set of experiments to test our hypothesis. First, we have analyzed the most important parameters of the algorithm in order to select the best configuration. After that, we have analyzed and compared one of the proposed mutation operators against a uniform mutation. Finally, we have proposed an adaptive mutation operator that balances exploration and exploitation during the search, and we have studied its main parameter.

One of the main conclusions of this work is that the difficulty of testing a program depends on the number of atomic conditions per logical expression and on the nesting degree. Since we are interested in measuring the complexity of testing a program, in the next chapter we analyze the most common static measures and define a testing complexity measure to estimate the effort required to test software.

Estimating Software Testing Complexity

6.1 Introduction

Since the birth of the software industry, there has been a high interest in measuring the effort, in terms of time and cost, required by a task. Nowadays, software applications are essential for industry, so software developers need to measure all sorts of elements. Tom DeMarco stated [59]: “You can not control what you cannot measure. Measurement is the prerequisite to management control”. The importance of metrics has also been highlighted by the famous physicist Lord Kelvin [198]: “When you can measure what you are speaking about, and express it in numbers, you know something about it; but when you cannot measure it, when you cannot express it in numbers, your knowledge is of a meagre and unsatisfactory kind: it may be the beginning of knowledge, but you have scarcely, in your thoughts, advanced to the state of science”. For these reasons, in this chapter we focus on complexity measures, which quantify the effort required to complete a task.

First, we need to define what program complexity means. Basili [27] defines complexity as a measure of the resources used by a system while interacting with a piece of software to perform a given task. If the interacting system is a computer, then complexity is defined by the execution time and storage required to perform the computation described by the program. If the interacting system is a programmer, then complexity is defined by the difficulty of performing tasks such as coding, debugging, testing or modifying the software. Some metrics have been introduced as all-purpose measures of software complexity; however, these measures seem to be ineffective for measuring testing complexity [107]. The absence of a metric that properly measures the difficulty of testing a piece of code encourages us to characterize the testing complexity, and leads us to the following research question RQ1: how difficult is the automated testing of a piece of code?

Testing complexity can be seen as the difficulty for a computer to create a test suite that finds errors in the developed code. Finding errors in the early stages of development is an important task that reduces the cost of the project. It is estimated that half the time spent on software project development, and more than half its cost, is devoted to testing the product [158]. For this reason, in recent years researchers have attempted to predict fault-prone software modules using complexity metrics [228]. In addition, the overall experimental results show that complexity metrics are able to predict fault-prone source code [232].

Most previous works define the testing complexity as the number of test cases required [130, 227]. Some works try to compute a lower bound [30] on the number of test cases required, and others try to provide a better understanding of the testing criterion used to generate those test cases [139]. However, they do not focus on the effort needed to generate these test cases. In a recent work, Nogueira focuses on the correlation between the complexity of the SUT and the complexity of the test cases [164], but that work does not propose any estimation measure.

In this thesis we propose a new complexity measure with the aim of helping the tester to find errors in the code. This measure is intended to better predict the behavior of an automatic test data generator depending on the SUT. This original complexity measure, called “Branch Coverage Expectation” (BCE), is the main contribution of this chapter. The definition of the new measure relies on a Markov model that represents the program. Based on the model of a program, we can also provide an estimation of the number of random test cases that must be generated to obtain a given coverage. From these estimations, we can create a theoretical prediction of the evolution of the coverage depending on the number of generated test cases. This second contribution will help testers to gain some knowledge about the possible evolution of the testing phase.

The validation of the proposed measure is also addressed in this work. For the theoretical validation of the BCE complexity measure we have used the validation framework proposed by Kitchenham et al. [120]. For the experimental validation we have used Evolutionary and Random Testing techniques, which are the most popular search algorithms for automatically generating test cases [10, 14, 74, 128], to compare our estimation with the real value obtained by several test data generators.

Finally, we also analyze software complexity measures at program level and we discuss a number of issues associated with these known measures. In addition, we have performed an experimental study of correlations with the aim of highlighting the existing relationships among some static measures. We are especially interested in the existing relationships between the static measures and the branch coverage. In this experimental study we have used two large groups of automatically generated programs and one group of real-world ones to serve as a benchmark.

In document Optimization Techniques for Automated Software Test Data Generation (Page 80-84)