Future Work - Optimization Techniques for Automated Software Test Data Generation

There are different future lines possible after the work contained in the thesis. Broadly speaking, these lines are the continuation with the design of the Markov model of a program, the extension of the feature model corpus and the group of multi-objective algorithms analyzed, the proposition of trajectory search-based algorithm for the generation of test sequences, and a validation strategy based on software testing techniques for the traffic light programs. Our future proposals in them are as follows.

As to the continuation with the design of the Markov model of a program, we plan to improve the representation of a program, but without losing its simplicity. We plan to advance in the knowledge of the features of a program that occurs in a condition. The computation of the probabilities associated to a concrete decision is a great challenge to improve our measure. In

addition, we will take into account the data dependencies in the probabilities computed for the Markov model. This fact will provide more precision in the transition probabilities. Besides that, we plan to consider the amount of resources needed to execute different test cases, in particular when loops are involved in the execution of a test case. We would also like to apply our complexity measure to real-world software and compare the results with the real difficulty of testing the program by an expert.

Our work opens up several research venues, which we plan to address as part of our future work. One of them is the extension of the feature model corpus and the group of multi-objective algorithms analyzed. Besides these objectives, our goal is to expand our study beyond pairwise coverage (t ≥ 2), to integrate domain knowledge such as control flow information as an optimization objective for test code generation, and to characterize when a particular multi-objective algorithm performs better based, for example, on structural metrics of the feature models [24].

Moreover, an interesting research topic nowadays is the generation of test sequences. We have proposed different approaches for the automatic generation of test sequences to be integrated in the CTE XL professional tool. We need to collect more real scenarios for comparison purposes, this is absolutely necessary when you are adding functionality to a professional tool. Although we have obtained great results with the ACOts algorithm, we plan to propose a trajectory search-based algorithm such as Simulated Annealing that has obtained great results in Combinatorial Testing (Covering Arrays [200]) and might suit this problem.

Regarding the evaluation of the exact approach for the computation of the true Pareto front in SPL, we found a high correlation with the number of products defined by the feature model of a program. As a result of this finding our future work, we wants to streamline the mathematical program representation in order to reduce the runtime of the algorithm. We observed that some of the constraints can be redundant. For instance, features that are selected in all the products of the product line do not need a variable since they are valid for any product. Similarly, there are pairs of feature combinations, that are not valid according to the feature model and hence can be eliminated [100]. We also noticed that removing some of the redundant constraints can increase the runtime, while adding more constraints could help the SAT solver search for a solution. We plan to study the right balance of both by reducing and augmenting constraints.

Finally, we also plan to apply the algorithms proposed in this thesis for solving real-world instance problems. In concrete, we are interested in applying them for solving problems related to the scheduling of traffic lights. We plan to propose a validation strategy for the traffic light programs to be used by the human experts of Smart Mobility. We will propose the use of a traffic feature model with priorities, which allows us not only to reduce the number of traffic scenarios to test the available cycle programs, but also to generate the most important scenarios. A first journal article following this research line is under review at the moment of writing.

Publications Supporting this PhD

Thesis Dissertation

In this appendix, we present the set of scientific articles that have been published during the years in which this thesis has been developed. These publications speak for the interest, validity, and impact on the scientific community and literature of the work contained in this thesis, since they have appeared in impact fora, and have been subjected to peer review by expert researchers. International Journals indexed by ISI-JCR

[1] Ferrer, J., Kruse P. M., Chicano F., and Alba E. (2014). Search based algorithms for test sequence generation in functional testing. Information and Software Technology.

[2] Ferrer, J., Chicano F., and Alba E. (2013). Estimating software testing complexity. Informa- tion and Software Technology. 55, 2125 − 2139.

[3] Ferrer, J., Chicano F., and Alba E. (2012). Evolutionary algorithms for the multi-objective test data generation problem. Software Practice and Experience 42, 1331 − 1362.

International Journals indexed by ISI-JCR under review

[4] Ferrer J. ,Garc´ıa-Nieto J., Chicano F., and Alba E. (2015). Intelligent Testing of Traffic Light Programs: Validation in Smart Mobility Scenarios. Submitted to Mathematical Problems in Engineering.

International Journals

[5] Ferrer, J., Chicano F., and Alba E. (2010). Correlation between static measures and code coverage in evolutionary test data generation. International Journal of Software Engineering and its Applications. 4, 57−79.

Book Chapters and LNCS Series

[6] R. Lopez-Herrejon, J. Ferrer, F. Chicano, A.Egyed, and E.Alba Evolutionary Computation for Software Product Line Testing: An Overview and Open Challenges (to appear)- Book chapter.

[7] Chicano, F., Ferrer J., and Alba E. (2011). Elementary Landscape Decomposition of the Test Suite Minimization Problem. Third International Symposium, SSBSE 2011, Szeged, Hungary, September 10−12, 2011, p.48−63.

International Conferences (Core A)

[8] Garca-Nieto J., Ferrer, J., and Alba E. (2014). Optimising Traffic Lights with Metaheuristics: Reduction of Car Emissions and Consumption. Proceedings of the IEEE International Joint Conference on Neural Networks (IJCNN 2014), Beijing, China, July 6−11, 2014, 48−54. [9] Lopez-Herrejon, R. Erick, Ferrer J., Chicano F., Haslinger E. Nicole, Egyed A., and Alba E.

(2014). A parallel evolutionary algorithm for prioritized pairwise testing of software product lines. Genetic and Evolutionary Computation Conference, GECCO ’14, Vancouver, BC, Canada, July 12−16, 2014. p. 1255−1262.

[10] Lopez-Herrejon, R. E., Chicano F., Ferrer J., Egyed A., and Alba E. (2013). Multi-objective Optimal Test Suite Computation for Software Product Line Pairwise Testing. IEEE Interna- tional Conference on Software Maintenance, Eindhoven, The Netherlands, September 22−28, 2013. p. 404−407.

[11] Ferrer, J., Kruse P. M., Chicano F., and Alba E. (2012). Evolutionary algorithm for prioritized pairwise test data generation. Genetic and Evolutionary Computation Conference, GECCO ’12, Philadelphia, PA, USA, July 7−11, 2012. p. 1213−1220.

[12] Ferrer, J., Chicano F., and Alba E. (2009). Dealing with inheritance in OO evolutionary testing. Genetic and Evolutionary Computation Conference, GECCO 2009, Proceedings, Montreal, Qubec, Canada, July 8−12, 2009. p.1665−1672.

International Conferences (Core B, Core C, and non-Core)

[13] Lopez-Herrejon, R. Erick, Ferrer J., Chicano F., Egyed A., and Alba E. (2014). Comparative analysis of classical multi-objective evolutionary algorithms and seeding strategies for pairwise testing of Software Product Lines. Proceedings of the IEEE Congress on Evolutionary Computation, CEC 2014, Beijing, China, July 6−11, 2014. p. 387−396.

[14] Ferrer, J., Chicano F., and Alba E. (2011). Benchmark Generator for Software Testers. Artificial Intelligence Applications and Innovations - 12th INNS EANN-SIG International Conference, EANN 2011 and 7th IFIP WG 12.5 International Conference, AIAI 2011, Corfu, Greece, September 15−18, 2011, Proceedings, Part II. p. 378−388.

[15] Ferrer, J., Chicano F., and Alba E. (2010). Measuring Testing Complexity. 2nd International Symposium on Search Based Software Engineering, Benevento, Italy (Short paper online URL: http://ssbse.org/2010/fastabstracts/ssbse2010 fastabstract 02.pdf)

National Conferences

[16] Ferrer, J., Alba, E., and Chicano F. (2015) Sistema Inteligente para la Recogida de Residuos en las Ciudades basado en Predicciones de Llenado. Proceedings of the I Congreso Ciudades Inteligentes, Madrid, March 24−25, 2015, p. 319−324

[17] Ferrer, J., Kruse P. M., Chicano F., and Alba E. (2015) Generaci´on de secuencias de pruebas funcionales con algoritmos bio-inspirados. X Congreso Espa˜nol de Metaheursticas, Algorit- mos Evolutivos y Bioinspirados (MAEB), Merida, February 4−6, 2015, p. 59−66.

[18] Ferrer, J., Garca-Nieto J., Alba E., and Chicano F. (2013). Validación Inteligente para la Sin- cronización de Semáforos Basada en Feature Models. IX Congreso Espaol de Metaheursticas, Algoritmos Evolutivos y Bioinspirados (MAEB), Albacete, September 17−20, p. 812−821. [19] Ferrer, J., Chicano F., and Alba E. (2009). On the Correlation between Static Measures

and Code Coverage using Evolutionary Test Case Generation. Actas de los Talleres de las JISBD, San Sebasti´an, September 8−9, vol 3, n 1, p.50−61.

Articles in CoRR - not reviewed

[20] Roberto E. Lopez-Herrejon, Javier Ferrer, Francisco Chicano, Evelyn Nicole Haslinger, Alexan- der Egyed, and Enrique Alba. (2014) Towards a Benchmark and a Comparison Framework for Combinatorial Interaction Testing of Software Product Lines. CoRR abs/1401.5367 [21] Roberto E. Lopez-Herrejon, Javier Ferrer, Francisco Chicano, Lukas Linsbauer, Alexander

Egyed, and Enrique Alba. (2014) A Hitchhiker’s Guide to Search-Based Software Engineering for Software Product Lines. CoRR abs/1406.2823

Resumen en Espa˜nol

B.1 Introducci´on

La mayor´ıa de los pa´ıses dependen de sistemas complejos basados en computadores que gobiernan las principales infraestructuras y servicios, as´ı como la mayor´ıa de dispositivos electrónicos usados diariamente. Consecuentemente, producir y mantener software es esencial para la sociedad actual, lo que significa un importante desaf´ıo para la comunidad cient´ıfica. La ciencia de la computación es un campo de investigación en expansión que involucra el entendimiento y diseño de los computadores y su software. El principal objetivo de este campo es el estudio de los procesos algor´ıtmicos automáticos que escalan para eliminar cuellos de botella. Uno de los más relevantes aspectos de la investigación en computadores es el diseño y desarrollo de nuevos algoritmos eficientes capaces de resolver problemas complejos en cada vez menos tiempo. Adicionalmente, la comunidad de investigadores en este campo está determinada a enfrentar estos problemas que no pudieron ser resueltos previamente.

Los problemas del mundo real son duros (en la mayor´ıa de casos NP-duros), que significa que la complejidad del problema crece de forma exponencial con su tamaño. Cuando se usan técnicas exactas, estas aseguran encontrar la solución óptima en problemas relativamente pequeños, pero son extremadamente lentos en problemas medios y grandes. La mayor´ıa de los problemas del mundo real tiene un espacio de búsqueda grande, a la vez que esátn sujetos a restricciones e incertidumbres, as´ı que son dif´ıciles de resolver por los algoritmos exactos. Con este objetivo en mente, los investigadores han aplicado técnicas emergentes como las metaheur´ısticas, las cuales han probado su utilidad enfrentando problemas duros. Los enfoques metaheur´ısticos son capaces de proporcionar soluciones cuasi-óptimas en un tiempo moderado, as´ı que ofrecen un buen compromiso entre calidad de las soluciones y el ahorro de recursos, que es de hecho, el principal objetivo de las compañ´ıas industriales.

La Ingenier´ıa del Software es una de las disciplinas de ingenier´ıa que se preocupa de todos los aspectos de la producción de software, desde las etapas más tempranas de la especificación hasta el mantenimiento cuando el software ya está en uso [192]. El desarrollo de software confiable es un desaf´ıo clave para la industria del software, de hecho, se estima que alrededor de la mitad del tiempo empleado en el desarrollo del proyecto software y más de la mitad de su coste se debe a las pruebas del producto. La automatización de la generación de pruebas podr´ıa reducir el coste de todo el proyecto, esto explica porque tanto la industria del software como la academia están interesados en las herramientas de pruebas automáticas. Como la generación de pruebas adecuadas

implica un coste computacional grande, los enfoques basados en b´usqueda son esenciales para lidiar con este problema.

En esta tesis aplicamos técnicas de búsqueda metaheur´ısticas para la optimización de problemas derivados del proceso de automatización de pruebas, particularmente el problema de auto- matización de la generación de datos de prueba para encontrar errores en el código fuente. Hay un rango amplio de problemas de generación de datos de prueba, en este trabajo intentamos abarcar ambos tipos de pruebas de software, de caja blanca as´ı como de caja negra. En pruebas de caja blanca, también llamadas pruebas estructurales, los ingenieros de pruebas requieren conocimiento de cómo está implementado el software, en contraste en pruebas de caja negra o pruebas funcionales, donde los ingenieros de pruebas se concentran en qué hace el software, en vez de cómo lo hace.

Este es el contexto de este trabajo de tesis. Estamos determinados a proponer técnicas metaheur´ısticas para resolver las diferentes variantes de la generación de datos automatizada para software. Analizamos las diferentes técnicas con el objetivo de obtener resultados óptimos en calidad y coste, ambos aspectos muy importantes en el desarrollo software debido a la falta de recursos. Debido al carácter intr´ınseco hemos propuesto técnicas de optimización mono-objetivo para ma- ximizar la calidad del conjunto de pruebas y técnicas multi-objetivo que consideran adicionalmente el coste de las pruebas. Además, pensamos que para comprender y resolver un problema necesi- tamos extraer todo el conocimiento posible. Consecuentemente proponemos una medida nueva de complejidad para predecir, de una mejor manera, la dificultad de probar una pieza de código con el objetivo de ayudar a los gestores de los proyectos a estimar el esfuerzo necesario para llevar a cabo esta importante tarea del desarrollo del software.

In document Optimization Techniques for Automated Software Test Data Generation (Page 183-194)