Experimental Setup and Analysis - Optimal Multi-Objective Pairwise Testing

9.4 Optimal Multi-Objective Pairwise Testing

9.4.3 Experimental Setup and Analysis

The experimental corpus of our evaluation consists of a benchmark of 118 feature models, whose number of products ranges from 16 to 640, that are publicly available from the SPL Conqueror [190] and the SPLOT [194] repositories. The objectives to optimize are the number of products required to test the SPL and the achieved coverage. Additionally, as a performance measure we have also analyzed the time required to run the algorithm, since we want the algorithm to be as fast as possible.

We computed the Pareto optimal front for each model. Figure 9.2 shows this front for our running example GPL, where total coverage is obtained with 12 products, and for every test suite size the obtained coverage is also optimal. As our approach is able to compute the Pareto optimal front for every feature model in our corpus, it makes no sense to analyze the quality of the solutions. Instead, we consider it more interesting to study the scalability of our approach. To that end, we analyzed the execution time of the algorithm as a function of the number of products represented by the feature model, as shown in Figure 9.3. In this figure we can observe a tendency: the higher the number of products, the higher the execution time. Although it cannot be clearly appreciated in the figure, the execution time does not grow linearly with the number of products; the growth is faster than linear.
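To make the notion of a Pareto optimal front for this problem concrete, the following is a minimal brute-force sketch for a toy product line. It is not the zero-one program or the SAT-based algorithm of this chapter; it simply enumerates every test suite of each size and keeps the best coverage. The function names and the three-feature example are assumptions introduced for illustration, and coverage is assumed to count every valid selected/unselected combination of each feature pair.

```python
from itertools import combinations


def pairs_covered(product, features):
    """Pairwise combinations covered by one product (a set of selected features).

    Each entry records a feature pair together with whether each feature
    of the pair is selected in the product.
    """
    return {
        (f, f in product, g, g in product)
        for f, g in combinations(features, 2)
    }


def pareto_front(products, features):
    """Exhaustively compute the non-dominated (suite size, coverage) points."""
    per_product = [pairs_covered(p, features) for p in products]
    # Only combinations that appear in at least one valid product count.
    valid_pairs = set().union(*per_product)
    front = []
    best_so_far = 0
    for size in range(1, len(products) + 1):
        best = max(
            len(set().union(*suite))
            for suite in combinations(per_product, size)
        )
        if best > best_so_far:  # a larger suite must improve coverage
            front.append((size, best / len(valid_pairs)))
            best_so_far = best
        if best == len(valid_pairs):  # total coverage reached
            break
    return front


# Hypothetical product line: A is mandatory, B and C are optional.
toy_products = [frozenset("A"), frozenset("AB"), frozenset("AC"), frozenset("ABC")]
toy_front = pareto_front(toy_products, ["A", "B", "C"])
print(toy_front)
```

The enumeration grows exponentially with the number of products, which is why it only serves as an illustration for very small models.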

In order to check our intuition, we performed a Spearman’s rank correlation test. This test’s coefficient ρ takes into account the rank of the samples instead of the samples themselves. The correlation coefficient between the execution time and the number of products denoted by a feature model is 0.831. This is a very high value that confirms our expectations: the higher the number of products, the higher the execution time of the algorithm. We also computed the Spearman’s rank correlation for the execution time against the number of features of the feature models, which was considerably lower (0.407). This is because two feature models with the same number of features can denote significantly different numbers of products depending on the constraints derived from the relationships between the features. In summary, the answer to RQ4 is that the best indicator of the execution time of our approach is the number of products denoted by a feature model.

Figure 9.3: Time (log scale) required to find the optimal Pareto set against the number of products of the feature models.
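The rank correlation used in this analysis can be sketched as follows. This is a minimal, dependency-free illustration of Spearman’s ρ (the Pearson correlation of the ranks, with tied values sharing their average rank). The sample values are hypothetical and are not the measurements of our experiments.

```python
def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long samples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    dev_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return cov / (dev_x * dev_y)


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Hypothetical measurements for five feature models (illustration only).
num_products = [16, 40, 96, 200, 640]
num_features = [5, 9, 7, 8, 6]
seconds = [0.2, 0.5, 3.1, 40.0, 900.0]

rho_products = spearman_rho(num_products, seconds)
rho_features = spearman_rho(num_features, seconds)
print(rho_products, rho_features)
```

Because only ranks matter, the faster-than-linear growth of the execution time does not weaken the coefficient: any monotonically increasing relationship yields a ρ close to 1.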

9.5 Conclusions

Throughout this chapter we have filled several existing gaps in the SPL literature. We have tackled the pairwise test data generation problem in SPL, then we have successfully applied classical MO techniques to SPL, and finally we have presented an exact approach for computing the optimal Pareto front. Let us draw some conclusions separately for each of them.

First, we have formalized an SPL testing prioritization scheme (in Section 2.4.3) and presented its implementation with PPGS. We evaluated PPGS with 235 feature models of different characteristics using different selection criteria for product prioritization. Furthermore, we compared PPGS with the greedy algorithm pICPL, a comparison that totalled 79,800 independent runs. Our analysis showed that while PPGS obtains overall shorter covering arrays, it exhibits a performance difference with pICPL that tends to decrease for the feature models with a larger number of products.

Second, we studied the behavior of classical multi-objective evolutionary techniques applied to SPL pairwise testing. The group of algorithms was selected to cover a diverse array of techniques and concepts of multi-objective evolutionary computing. In addition, we studied the impact of seeding on performance. Our evaluation unequivocally showed that seeding with knowledge from a single-objective technique produces significantly better results in less time. It also suggests that using this seeding strategy with any of NSGA-II, SPEA2 or MOCell yields results of similar quality. Our findings enable software engineers facing SPL combinatorial testing challenges to select not just one solution (as in the case of single-objective techniques) but instead to select from an array of test suite possibilities that can better match their economic or technological constraints.

Finally, we have proposed an approach to exactly obtain the optimal Pareto set of the multi-objective SPL pairwise testing problem. We defined a zero-one linear mathematical program and an algorithm based on SAT solvers for obtaining the optimal Pareto set. By construction, the solution obtained using this approach is optimal and could serve as a reference for measuring the quality of the solutions proposed by approximate methods. The evaluation revealed a generally large runtime for our feature models. This fact prompted us to analyze the impact of the number of products and the number of features on runtime. We found a high correlation in the first case and a low correlation in the second case. As a result of this finding, our future work is twofold. First, we want to streamline the mathematical program representation in order to reduce the runtime of the algorithm. We observed that some of the constraints can be redundant. For instance, features that are selected in all the products of the product line do not need a variable since they are valid for any product. Similarly, there are pairs of feature combinations, that is, $c_{i,j,k,l}$ variables, that are not valid according to the feature model and hence can be eliminated [100]. We also noticed that removing some of the redundant constraints can increase the runtime, while adding more constraints could help the SAT solver search for a solution. We plan to study the right balance between reducing and augmenting constraints. Second, we will look at larger feature models to further study the scalability of our approach.
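The two kinds of redundancy mentioned above can be illustrated with a small sketch. It assumes, purely for illustration, that the valid products of the feature model have already been enumerated; in practice this information would more likely come from queries to the feature model or a SAT solver. The function names and the toy product line are hypothetical.

```python
from itertools import combinations, product as cartesian


def core_features(products, features):
    """Features selected in every valid product; they need no decision variable."""
    return [f for f in features if all(f in p for p in products)]


def invalid_pair_combinations(products, features):
    """Pairwise combinations that no valid product exhibits.

    Each entry is (feature, selected?, feature, selected?); the corresponding
    coverage variables could be dropped from the mathematical program.
    """
    invalid = []
    for f, g in combinations(features, 2):
        for sel_f, sel_g in cartesian((True, False), repeat=2):
            seen = any(
                (f in p) == sel_f and (g in p) == sel_g for p in products
            )
            if not seen:
                invalid.append((f, sel_f, g, sel_g))
    return invalid


# Hypothetical product line: A is mandatory, B and C are optional.
toy_products = [frozenset("A"), frozenset("AB"), frozenset("AC"), frozenset("ABC")]
toy_features = ["A", "B", "C"]

toy_core = core_features(toy_products, toy_features)
toy_invalid = invalid_pair_combinations(toy_products, toy_features)
print(toy_core)
print(toy_invalid)
```

In this toy model, every combination in which A is unselected is invalid, so the variables for those combinations carry no information. Whether removing them actually speeds up the solver is precisely the open question raised above.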

Conclusions and Future Lines of Research

Conclusions and Future Work

This thesis proposes a variety of contributions to the software testing field, mainly using metaheuristic techniques. We have covered a wide range of aspects related to testing a program: procedural and object-oriented source code, structural and functional paradigms, single-objective and multi-objective problems, isolated test cases and test sequences, and theoretical and experimental work. Regarding the analysis carried out, we have placed particular emphasis on the statistical analysis to assess the practical significance of the results. This thesis is the beginning of a line of research that should be continued. For this reason, in this chapter we also describe some open issues that we think are interesting to tackle in the near future.

10.1 Conclusions

In summary, we present here the conclusions drawn from the main contributions of this thesis:

1. Definition of a new distance measure for the instanceof operator in Object Oriented programs. In this work we have focused on one aspect of OO software, inheritance, to propose some approaches that can help to better guide the search for test data in the context of OO evolutionary testing. In particular, we have proposed a distance measure to compute the branch distance in the presence of the instanceof operator in Java programs. We have also proposed two mutation operators that change the solutions based on the distance measure defined. One of them is an adaptive mutation operator that is able to perform a better search. Its main parameter λ controls the speed at which the search changes from exploration to exploitation behavior. The experimentation confirms that the search works worse with extreme values of λ. Finally, since we are interested in measuring the complexity of testing a program, one of the main conclusions of this work is that the difficulty of testing a program depends on the number of atomic conditions per logical expression and the nesting degree.

2. Definition of a new complexity measure called “Branch Coverage Expectation”. In this work we dealt with testing complexity from an original point of view: a program is more complex if it is more difficult to test automatically. Therefore, we defined the “Branch Coverage Expectation” in order to provide some knowledge about the difficulty of testing programs. This measure is founded on a Markov model of the program, which provides a theoretical background. The analysis of this measure indicates that it is more correlated with branch coverage than the other static measures studied. This means that it is a good way of estimating the difficulty of testing a program. We think, supported by the results, that this measure is useful for predicting the behaviour of an automatic test data generator.

3. Theoretical prediction of the number of test cases needed to cover a concrete percentage of the program. Our Markov model of a program can be used to provide an estimation of the number of test cases needed to cover a concrete percentage of the program. We have compared our theoretical prediction with an average of real executions of a test data generator. The results show that our prediction is very similar to the evolution of a real execution of the test data generator. This model can help project managers to predict the evolution of the testing phase, which consequently can save time and cost for the entire project. This theoretical prediction could also be very useful to determine the coverage percentage achieved with a particular number of test cases.

4. Proposal of a whole test suite approach for solving the multi-objective test data generation problem. We have studied the Multi-Objective Test Data Generation Problem with the aim of analyzing the performance of a direct whole test suite multi-objective approach versus the application of mono-objective algorithms followed by a test case selection. Previous results in the literature have only focused on the coverage of a program, while the oracle cost is a significant cost that has been ignored. We have evaluated four state-of-the-art multi-objective optimization algorithms (MOCell, NSGA-II, SPEA2, and PAES), two mono-objective algorithms (GA and ES), and two random algorithms. In terms of convergence towards the optimal Pareto front, GA and MOCell have been the best solvers in our comparison. Although the multi-objective approach works very well in most of the programs, we realized that dealing with only one branch at a time (mono-objective approach) can be more effective when the program under test has a high nesting degree. However, we highly recommend the direct approach if we have time restrictions.

5. Comparison of different prioritization strategies in Software Product Lines and Classification Trees. We have studied the Prioritized Pairwise Test Data Generation Problem with the aim of analyzing the performance of several approaches. We have compared five different approaches related to the CTM, and two related to SPL, four of them proposed by us. We have performed experiments on a great number of different scenario/distribution combinations and for different values of weight coverage, which makes our study meaningful. The genetic algorithm outperforms the other algorithms in most scenarios and distributions; it is the best choice when one has some time restrictions or the execution of a test case is quite costly. In the SPL experimentation, our analysis also showed that while our parallel genetic approach obtains overall shorter covering arrays, it exhibits a performance difference with the parallel version of ICPL that tends to decrease for the feature models with a larger number of products.

6. Definition of the Extended Classification Tree Method to generate test sequences. We have defined an entire model (ECTM) which both industry and academia could use to completely describe all aspects needed to generate sequences of tests for testing a program. We have presented two different metaheuristic approaches to optimize the automatic generation of test sequences for the CTM. The first is a genetic algorithm with a memory operator (GTSG), which is able to preserve the memory required to evaluate individuals, while also allowing the algorithm to compute a solution faster than without the operator. The second is an ACO algorithm that is able to obtain good quality solutions using little memory. Our comparison shows that ACOts is the best algorithm in the comparison (with a state-of-the-art greedy algorithm and the GTSG). It has a good tradeoff between test suite size and coverage. Its benefits are clear: we can save costs and time by executing all test steps sequentially, because the previous test step puts the software in the adequate state to test the next functionality.

7. Exploration of the effect of different seeding strategies in the computation of the Pareto fronts in SPL. We studied the behaviour of classical multi-objective evolutionary techniques applied to SPL pairwise testing. In addition, we studied the impact of seeding on performance. The group of algorithms was selected to cover a diverse array of techniques and concepts of multi-objective evolutionary computing. Our evaluation unequivocally showed that seeding with knowledge from a single-objective technique produces significantly better results in less time. It also suggests that using this seeding strategy with any of NSGA-II, SPEA2 or MOCell yields results of similar quality. Our findings enable software engineers facing SPL combinatorial testing challenges to select not just one solution (as in the case of single-objective techniques) but instead to select from an array of test suite possibilities that can better match their economic or technological constraints.

8. Proposal of an exact technique for the computation of optimal Pareto fronts in SPL. We have proposed an approach to exactly obtain the optimal Pareto set of the multi-objective SPL pairwise testing problem. We defined a zero-one linear mathematical program and an algorithm based on SAT solvers for obtaining the optimal Pareto set. Since the solution obtained using this approach is optimal, it could serve as a reference for measuring the quality of the solutions proposed by approximate methods. The evaluation revealed a generally large runtime for the feature models. This fact prompted us to analyze the impact of the number of products and the number of features on runtime. We found a high correlation in the first case and a low correlation in the second one. This means that the computation time of the optimal Pareto front depends mainly on the number of products defined by the feature model.

One of the topics that we found of particular interest in this thesis is the proposal of a new complexity measure to provide some knowledge about the difficulty of testing programs. In fact, the University of Malaga has considered registering this work as an international patent (PCT/ES2015/000100).
