Not long ago, refactoring to improve the design of code was impractical and done only by a select group of hero programmers. Refactoring tools then empowered the average programmer to explore the design space like a pro. A similar situation exists today for parallel refactorings. These, however, require more in-depth analysis than design refactorings, which makes automating them even more valuable. In this decade, refactorings that simplify parallel applications and improve their performance can become as transformative as design refactorings were in the previous decade. This thesis demonstrates the feasibility of automated refactorings that increase the maintainability, scalability and efficiency of parallel applications by presenting algorithms that automate two such transformations for very different domains. Furthermore, it presents the beginnings of a refactoring catalogue for high performance computing.
The first of the two automated refactorings addresses the need for immutable classes in irregular shared-memory parallel applications. Programmers use immutability to simplify both sequential and parallel applications, and to reduce the need for synchronization in shared-memory parallel applications. Although some classes are designed from the beginning to be immutable, other classes are retrofitted with immutability. Transforming mutable classes into immutable ones is tedious and error-prone. Our tool, Immutator, automates the analysis and transformations required to make a class immutable. Experiments and case studies of manual transformations, as well as running Immutator on 346 open-source classes, show that Immutator is useful. It is applicable to more than 33% of the studied classes. It is safer than manual transformation, which introduced between 2 and 6 errors per class. It can save the programmer significant work (analyzing 57 methods and editing 45 lines) and time (27 minutes) per transformed class.
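To make the kind of change involved concrete, the following is a minimal hand-written sketch of retrofitting immutability onto a small class. It is not output from Immutator; the classes `MutablePoint` and `ImmutablePoint` and the demo class are hypothetical names introduced only for illustration. The sketch assumes the usual ingredients of an immutable Java class: a `final` class, `final` fields, and a former mutator rewritten to return a new object.

```java
// Illustrative sketch only: class names are hypothetical, not taken from the thesis.
class ImmutabilityDemo {
    public static void main(String[] args) {
        ImmutablePoint p = new ImmutablePoint(1, 2);
        ImmutablePoint q = p.translate(3, 4);
        // p keeps its original coordinates; q holds the translated ones.
        System.out.println(p.getX() + "," + p.getY());
        System.out.println(q.getX() + "," + q.getY());
    }
}

// Before: a mutable class whose state can change after construction.
class MutablePoint {
    private int x, y;

    MutablePoint(int x, int y) { this.x = x; this.y = y; }

    int getX() { return x; }
    int getY() { return y; }

    // Mutator: updates the receiver in place.
    void translate(int dx, int dy) { x += dx; y += dy; }
}

// After: an immutable counterpart written by hand.
final class ImmutablePoint {
    private final int x, y;

    ImmutablePoint(int x, int y) { this.x = x; this.y = y; }

    int getX() { return x; }
    int getY() { return y; }

    // The former mutator now leaves the receiver untouched
    // and returns a fresh object carrying the new state.
    ImmutablePoint translate(int dx, int dy) {
        return new ImmutablePoint(x + dx, y + dy);
    }
}
```

Even in this small case every caller of `translate` must be updated to use the returned object, which hints at why the manual transformation is tedious and error-prone for classes with many methods.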
The second transformation addresses the difficulty of constructing datatypes for regular distributed-memory applications. The limits on scaling shared-memory programming upwards will cause more distributed-memory systems to be deployed in the future. The increasing relative cost of communication will cause hardware vendors to move towards more advanced network subsystems. The prevalence of non-contiguous transfers, especially in scientific applications, will make hardware capable of non-contiguous zero-copy transfers common. Such features can only be used if datatypes are provided. Datatypes can simplify distributed applications and can, with the right hardware, drastically reduce the cost of non-contiguous communication compared with packing code. We have presented an algorithm that automates the process of converting packing code to datatype code. We have implemented this algorithm as a refactoring tool and evaluated it by transforming the packing code in the NAS LU application. The evaluation shows that the algorithm is applicable and that it generates good datatypes.
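The relationship between packing code and a datatype can be illustrated with a small self-contained sketch. It does not use MPI and does not reproduce the algorithm presented in this thesis. `PackingSketch` and `StridedType` are hypothetical names, and the descriptor only loosely mirrors the (count, block length, stride) shape of `MPI_Type_vector`. The sketch assumes a row-major grid from which one column is sent.

```java
import java.util.Arrays;

// Illustrative sketch only: names are hypothetical and no MPI library is involved.
class DatatypeDemo {
    public static void main(String[] args) {
        int rows = 4, cols = 5, col = 2;
        double[] grid = new double[rows * cols];
        for (int i = 0; i < grid.length; i++) grid[i] = i;

        // Packing code: an explicit loop copies the column into a buffer.
        double[] packed = PackingSketch.packColumn(grid, rows, cols, col);

        // Datatype code: the same layout described declaratively.
        StridedType column = new StridedType(rows, 1, cols);
        double[] gathered = column.gather(grid, col);

        System.out.println(Arrays.equals(packed, gathered));
    }
}

class PackingSketch {
    // Copies one column of a row-major grid into a contiguous buffer.
    static double[] packColumn(double[] grid, int rows, int cols, int col) {
        double[] buf = new double[rows];
        for (int i = 0; i < rows; i++) {
            buf[i] = grid[i * cols + col];
        }
        return buf;
    }
}

// Describes `count` blocks of `blockLength` elements, `stride` elements apart.
final class StridedType {
    final int count, blockLength, stride;

    StridedType(int count, int blockLength, int stride) {
        this.count = count;
        this.blockLength = blockLength;
        this.stride = stride;
    }

    // Stand-in for what a communication library or network adapter would do
    // with the descriptor: walk the layout starting at `offset`.
    double[] gather(double[] src, int offset) {
        double[] out = new double[count * blockLength];
        int k = 0;
        for (int b = 0; b < count; b++) {
            for (int j = 0; j < blockLength; j++) {
                out[k++] = src[offset + b * stride + j];
            }
        }
        return out;
    }
}
```

The point of the descriptor form is that the layout becomes data rather than a loop, so a lower layer is free to transfer the elements directly instead of copying them into an intermediate buffer first.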
These transformations demonstrate the feasibility of refactoring for maintainable, scalable and efficient parallelism. In the future, we hope that many more such refactoring transformations will be developed to improve the productivity of the parallel programmer.