
3 Using Entropy Measures for Comparison of Software Traces

3.5 Summary

In this work, we analyze the applicability of entropies to the predictive classification of traces related to software defects. Our validating case study shows promising performance of extended entropies with emphasis on rare events ($q \in \{10^{-5}, 10^{-4}\}$). The events are based on triplets (3-words) of “characters” incorporating information about the function name, the depth of the function call, and the type of probe point ($c = FDT$).
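As an illustration of the event construction summarized above, the following minimal sketch builds 3-words from a toy trace and evaluates the Shannon and Rényi entropies of the resulting word distribution. The trace contents, the function names `l_words`, `word_probabilities`, `shannon_entropy`, and `renyi_entropy`, the use of overlapping windows, and the natural logarithm are all assumptions made for this example; the entropy formulas are the standard textbook forms and may differ in normalization from the definitions used earlier in the chapter.

```python
from collections import Counter
import math


def l_words(trace, l=3):
    """Return all overlapping l-words (tuples of l consecutive characters)."""
    return [tuple(trace[i:i + l]) for i in range(len(trace) - l + 1)]


def word_probabilities(trace, l=3):
    """Estimate l-word probabilities as relative frequencies in the trace."""
    counts = Counter(l_words(trace, l))
    total = sum(counts.values())
    return [n / total for n in counts.values()]


def shannon_entropy(probs):
    """Shannon entropy in nats: -sum(p * ln p)."""
    return -sum(p * math.log(p) for p in probs)


def renyi_entropy(probs, q):
    """Renyi entropy of order q; q = 1 is treated as the Shannon limit."""
    if q == 1:
        return shannon_entropy(probs)
    return math.log(sum(p ** q for p in probs)) / (1.0 - q)


# Hypothetical trace: each "character" is (function name, call depth, probe type),
# mirroring the c = FDT encoding described in the text.
trace = [
    ("main", 0, "entry"),
    ("parse", 1, "entry"),
    ("parse", 1, "exit"),
    ("eval", 1, "entry"),
    ("eval", 1, "exit"),
    ("main", 0, "exit"),
]

probs = word_probabilities(trace, l=3)
h_shannon = shannon_entropy(probs)
h_renyi_small_q = renyi_entropy(probs, q=1e-5)  # small q emphasizes rare events
```

In this toy trace every 3-word occurs exactly once, so the distribution is uniform and both entropies reduce to the logarithm of the number of distinct words. Real traces have skewed word frequencies, which is where the choice of $q$ starts to matter.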

In the future, we are planning to increase the number of datasets under study, derive additional measures of distance (e.g., using tree classification algorithms) and identify an optimal set of combinations of parameters.

We had to exclude a subset of entropies with $E = L$, $q = 10^{2}$ for all $l$ and $c$ from $\Lambda$. The values of entropies obtained with these parameters are very large ($> 10^{100}$), which leads to numeric instability of (6). We keep just one of the various named $q = 1$ entropies to avoid redundancy.
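The magnitude problem noted above can be reproduced with a small numeric sketch. It assumes the standard textbook forms of the Landsberg-Vedral, Rényi, and Tsallis entropies with natural logarithms, and a hypothetical uniform distribution over 100 events; the function names are illustrative and the chapter's own definitions may differ in normalization.

```python
import math


def power_sum(probs, q):
    """Sum of p_i ** q over the distribution."""
    return sum(p ** q for p in probs)


def landsberg_vedral(probs, q):
    """Landsberg-Vedral entropy: (1 - 1 / sum(p^q)) / (1 - q), for q != 1."""
    return (1.0 - 1.0 / power_sum(probs, q)) / (1.0 - q)


def renyi(probs, q):
    """Renyi entropy: ln(sum(p^q)) / (1 - q), for q != 1."""
    return math.log(power_sum(probs, q)) / (1.0 - q)


def tsallis(probs, q):
    """Tsallis entropy: (1 - sum(p^q)) / (q - 1), for q != 1."""
    return (1.0 - power_sum(probs, q)) / (q - 1.0)


# Hypothetical distribution: 100 equally likely events.
probs = [0.01] * 100
q = 100.0

lv_value = landsberg_vedral(probs, q)   # dominated by 1 / sum(p^q), which is huge
renyi_value = renyi(probs, q)           # the logarithm keeps this moderate
tsallis_value = tsallis(probs, q)       # bounded above by 1 / (q - 1)
```

Under these assumptions the power sum is about $10^{-198}$, so its reciprocal pushes the Landsberg-Vedral value far beyond $10^{100}$, while the Rényi and Tsallis values at the same $q$ stay small. This is consistent with the exclusion applying only to $E = L$.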

