
2 SIFT: A Scalable Iterative-Unfolding Technique for Filtering Execution Traces

2.7 Conclusion And Future Work

The comparison of traces resulting from the execution of a software system is of considerable interest for a variety of purposes, such as software testing [1-3], program comprehension [13-15], and security [11]. In this paper, we have proposed a new iterative-unfolding approach, called SIFT, for filtering out traces to help speed up the overall comparison process.

The essence of this approach is that it iteratively compares traces at different levels of compression, from high to low, and in the process rapidly eliminates dissimilar traces, eventually leaving only the residual, similar traces at the lowest level of compression. Once similar traces are identified, they can be passed to external tools for further analysis. We use fingerprinting techniques for compressing traces, and comparison and clustering algorithms for identifying similar traces.
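The filtering loop described above can be illustrated with a minimal sketch. It is not the implementation evaluated in this paper: the three fingerprint levels, the Jaccard distance, the 0.4 thresholds, and the names `fingerprint`, `jaccard_distance`, and `sift` are all assumptions chosen only to show how candidates are eliminated level by level.

```python
from collections import Counter


def fingerprint(trace, level):
    """Compress a trace (a list of event names) into a set.

    Hypothetical levels, from most to least compressed:
      0: which events occurred
      1: which events occurred, and how often
      2: which pairs of consecutive events occurred
    """
    if level == 0:
        return frozenset(trace)
    if level == 1:
        return frozenset(Counter(trace).items())
    return frozenset(zip(trace, trace[1:]))


def jaccard_distance(a, b):
    """Return 0.0 for identical sets and 1.0 for disjoint sets."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def sift(query, candidates, thresholds=(0.4, 0.4, 0.4)):
    """Keep only the candidates that stay close to `query` at every level.

    `candidates` maps a trace name to its list of events. Traces rejected
    at a highly compressed level are never compared at the costlier levels.
    """
    survivors = dict(candidates)
    for level, limit in enumerate(thresholds):
        target = fingerprint(query, level)
        survivors = {
            name: trace
            for name, trace in survivors.items()
            if jaccard_distance(target, fingerprint(trace, level)) <= limit
        }
    return survivors


query = ["open", "read", "read", "close"]
candidates = {
    "a": ["open", "read", "read", "close"],   # identical to the query
    "c": ["connect", "send", "recv"],         # rejected at level 0
    "d": ["open", "read", "close"],           # rejected at level 1
    "e": ["read", "open", "read", "close"],   # rejected at level 2
}
similar = sift(query, candidates)
```

In this toy run only trace `a` survives all three levels; the surviving traces are the ones that would be handed to an external tool for detailed analysis.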

Our approach can be packaged as a framework in which the component algorithms and techniques can be replaced with alternatives, making the framework portable to other development environments. Further details are available online in [24, Section 5]; the usage environment of this technology is described in [24, Section 6].

The paper describes a significant case study involving 1416 traces from a large, distributed software system in use at numerous sites worldwide. The running time of the approach grows linearly with the number of traces when comparing a trace against a set of traces, and quadratically when comparing within a set of traces using clustering techniques (see Figure 10). The timings obtained from the case study data are feasible in a practical environment. From these results, we thus conclude that the iterative-unfolding approach is scalable for use in a practical environment.
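To make the linear and quadratic growth concrete, the following sketch counts trace-to-trace comparisons for the two usage modes. It assumes, purely for illustration, that the clustering step compares every unordered pair of traces exactly once; the function names are hypothetical, and the counts say nothing about the cost of an individual comparison.

```python
def one_to_set_comparisons(n_traces):
    """One query trace compared against each of n_traces traces: linear."""
    return n_traces


def within_set_comparisons(n_traces):
    """Every unordered pair within a set of n_traces traces: quadratic."""
    return n_traces * (n_traces - 1) // 2


# The case study size reported above.
n = 1416
one_to_set = one_to_set_comparisons(n)   # 1416
within_set = within_set_comparisons(n)   # 1001820
```

Under this assumption, doubling the number of traces doubles the work in the first mode and roughly quadruples it in the second.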

We plan to conduct a number of further case studies. These include, for example, increasing the dataset size; validation of set-to-set comparisons; and cost-benefit analysis in a practical environment. Our tool development effort is ongoing, with the long-term goal of transferring the technology to the production environment.

References

[1] Avvari, M. V., Chin, P. A., Nandigama, M. K. and Dhanikonda, Software application test coverage analyzer. U.S. Patent #6,978,401, Sun Microsystems, Inc., U.S., (2005).

[2] Biermann, A. and Feldman, J. On the Synthesis of Finite State Machines from Samples of their Behavior. IEEE Trans. Computers, 21, 6 (1972), 592–597.

[3] Cook, J. E. and Wolf, A. Discovering Models of Software Processes from Event-Based Data. ACM Trans. Software Eng. and Methodology, 7, 3 (1998), 215-249.

[4] Cotroneo, D., Pietrantuono, R., Mariani, L. and Pastore, F. Investigation of Failure Causes in Workload-Driven Reliability Testing. In Proc. 4th International Workshop on Software Quality Assurance (2007), 78-85.

[5] Dallmeier, V., Lindig, C. and Zeller, A. Lightweight Defect Localization for Java. In Proc. European Conference on Object Oriented Programming (2005), 528-550.

[6] Davison, M., Gittens, M., Godwin, D., Madhavji, N. H., Miranskyy, A. V. and Wilding, M. Improvement of computer software test coverage analysis. U.S. Patent Application # 11/549410, IBM Corp., (2006).

[7] Elbaum, S., Kanduri, S. and Andrews, A. Trace anomalies as precursors of field failures: an empirical study. Empir. Software Eng., 12, 5 (2007), 447-469.

[8] Elbaum, S., Rothermel, G., Kanduri, S. and Malishevsky, A. G. Selecting a Cost-Effective Test Case Prioritization Technique. Software Quality Control, 12, 3 (2004), 185-210.

[9] Feder, T. and Motwani, R. Clique partitions, graph compression and speeding-up algorithms. J. Comput. Syst. Sci., 51, 2 (1995), 261-272.

[10] Fremuth-Paeger, C. and Jungnickel, D. Balanced network flows. VIII. A revised theory of phase-ordered algorithms and the O( sqrt(n) m log(n^2 /m)/log n) bound for the nonbipartite cardinality matching problem. Networks, 41, 3 (2003), 137-142.

[11] Greevy, O., Ducasse, S. and Girba, T. Analyzing Feature Traces to Incorporate the Semantics of Change in Software Evolution Analysis. In Proc. 21st IEEE Int'l Conference on Software Maintenance (2005), 347-356.

[12] Hamou-Lhadj, A. and Lethbridge, T. C. Compression techniques to simplify the analysis of large execution traces. In Proc. 10th Int'l Wkshp on Program Comprehension (2002), 159-168.

[13] Haran, M., Karr, A., Orso, A., Porter, A. and Sanil, A. Applying classification techniques to remotely-collected program execution data. SIGSOFT Softw. Eng. Notes, 30, 5 (2005), 146-155.

[14] Jain, A. K., Murty, M. N. and Flynn, P. J. Data clustering: a review. ACM Computing Surveys, 31, 3 (1999), 264-323.

[15] kBehavior http://www.lta.disco.unimib.it/kbehavior/.

[16] Kuhn, A. and Greevy, O. Exploiting the Analogy Between Traces and Signal Processing. In Proc. 22nd IEEE Int'l Conference on Softw. Maintenance (2006), 320-329.

[17] Lee, W., Stolfo, S. J. and Chan, P. K. Learning patterns from unix process execution traces for intrusion detection. In Proc. AAAI Workshop: AI Approaches to Fraud Detection and Risk Management (1997), 50-56.

[18] Levenshtein, V. I. Binary codes capable of correcting deletions, insertions, and reversals (Russian). Doklady Akademii Nauk SSSR, 163, 4 (1966), 845-848.

[19] Li, M., Ma, B., Kisman, D. and Tromp, J. PatternHunter II: Highly Sensitive and Fast Homology Search. J. of Bioinformatics and Computational Biology, 2, 3 (2004), 417-440.

[20] Mariani, L. and Pezzè, M. Inference of component protocols by the kBehavior algorithm. University of Milano Bicocca, 2004.

[21] Mariani, L. and Pezzè, M. Dynamic Detection of COTS Component Incompatibility. IEEE Software, 24, 5 (2007), 76-85.

[22] Masri, W., Podgurski, A. and Leon, D. An Empirical Study of Test Case Filtering Techniques Based on Exercising Information Flows. IEEE Trans. Softw. Eng., 33, 7 (2007), 454-477.

[23] Miranskyy, A. V., Gittens, M. S., Madhavji, N. H. and Taylor, C. A. Usage of Long Execution Sequences for Test Case Prioritization. In Suppl. Proc. of 18th IEEE Int'l Symp. on Softw. Reliability Eng. (2007).

[24] Miranskyy, A. V., Madhavji, N. H., Gittens, M. S., Davison, M., Wilding, M. and Godwin, D. An Iterative, Multi-Level, and Scalable Approach to Comparing Execution Traces, TR-74.209. IBM Center for Advanced Studies (CAS), Toronto, 2007 (https://www.ibm.com/ibm/cas/publications/index.shtml).

[25] Moe, J. and Carr, D. A. Using execution trace data to improve distributed systems. Softw. Pract. Exper., 32 (2002), 889-906.

[26] Myers, E. An O(ND) Difference Algorithm and Its Variations. Algorithmica, 1, 2 (1986), 251-266.

[27] Podgurski, A., Leon, D., Francis, P., Masri, W., Minch, M., Sun, J. and Wang, B. Automated support for classifying software failure reports. In Proc. 25th Int'l Conference on Software Engineering (2003), 465-475.

[28] Reiss, S. P. and Renieris, M. Encoding program executions. In Proc. 23rd Int'l Conf. on Software Engineering (2001), 221-230.

[29] Renieris, M., Ramaprasad, S. and Reiss, S. P. Arithmetic program paths. In Proc. 10th European software engineering conference held jointly with 13th ACM SIGSOFT international symposium on Foundations of software engineering (2005), 90-98.

[30] Rothermel, G., Harrold, M. J., Ostrin, J. and Hong, C. An Empirical Study of the Effects of Minimization on the Fault Detection Capabilities of Test Suites. In Proc. International Conference on Software Maintenance (1998), 34-43.

[31] Yuan, C., Lao, N., Wen, J.-R., Li, J., Zhang, Z., Wang, Y.-M. and Ma, W.-Y. Automated known problem diagnosis with event traces. In Proc. 2006 EuroSys conference (2006), 357-388.
