
Both the unblocked and the blocked direct Hessenberg reduction algorithms take $O(N^3)$ flops and $O(N^3/B)$ I/Os. For large matrices, performance on machines with multiple levels of memory can be improved by the two-step reduction, since almost all the operations of the first step are matrix-matrix operations. We show that the reduction of a nonsymmetric matrix to banded Hessenberg form of bandwidth $t$ takes $O(N^3/(\min\{t,\sqrt{M}\}B))$ I/Os. We also show that the slab-based algorithm performs best when the slab width $k$ is chosen as $\min\{\sqrt{M}, t\}$. Finally, we observe that, in the existing slab-based algorithms, some of the elementary matrix operations, such as matrix multiplication, must be handled I/O-efficiently to achieve optimal I/O performance.
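To make the parameter choice concrete, the following is a minimal sketch that evaluates the slab width and the leading terms of the two I/O bounds for given values of N, M, B and t. It assumes the banded bound is read as N^3/(min{t, √M} B), consistent with the slab width min{√M, t}. The function names are hypothetical, and because the bounds are asymptotic, the numbers indicate orders of magnitude only, with all constant factors ignored.

```python
import math


def slab_width(M: int, t: int) -> int:
    """Slab width k = min{sqrt(M), t}, rounded down to an integer (at least 1)."""
    return max(1, min(math.isqrt(M), t))


def direct_io_term(N: int, B: int) -> float:
    """Leading term N^3 / B of the direct Hessenberg reduction bound."""
    return N**3 / B


def banded_io_term(N: int, M: int, B: int, t: int) -> float:
    """Leading term N^3 / (min{t, sqrt(M)} * B) of the banded reduction bound.

    Assumption: the sqrt(M) reading of the bound; constants are dropped.
    """
    return N**3 / (min(t, math.sqrt(M)) * B)


if __name__ == "__main__":
    # Illustrative values only: matrix order N, memory size M, block size B,
    # and bandwidth t are not taken from the paper.
    N, M, B, t = 10_000, 1_000_000, 1_000, 64
    k = slab_width(M, t)
    direct = direct_io_term(N, B)
    banded = banded_io_term(N, M, B, t)
    print(f"slab width k      = {k}")
    print(f"direct I/O term   = {direct:.3e}")
    print(f"banded I/O term   = {banded:.3e}")
    print(f"ratio (direct/banded) = {direct / banded:.1f}")
```

With these illustrative values the bandwidth t is smaller than √M, so t determines both the slab width and the saving in the leading term; once t exceeds √M, the memory size becomes the limiting factor instead.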

