The author thanks the U.S. National Science Foundation (NSF grants MIP 9702569, CCR 9711566, and CCR 0073582) for supporting this work.
FIGURE 1.16 Complexity in multithreading/multiprocessing. [Figure: superscalar, multiscalar, superthreading, SpMT processor, trace processor, and DMT designs shown in terms of PE interconnect complexity (mostly wire delays), static partitioning complexity (compilation and programming), and dynamic partitioning complexity (mostly logic delays), with a note on the difficulty in scaling up the number of PEs.]
*Although it is possible to pipeline a crossbar interconnect so that it accepts a new request every cycle, pipelining does not shorten its long inter-PE latency, which would still increase the number of clock cycles required to execute a program compared with scalable interconnects [27].
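The distinction between throughput and latency in the footnote above can be made concrete with a small cycle-count model. This is a minimal Python sketch, not taken from [27]: the function names, the 4-cycle crossbar latency, and the 1-cycle neighbor-hop latency are hypothetical, and the ring model assumes each value travels only to an adjacent PE.

```python
# Hypothetical cycle-count model (not from [27]): a pipelined crossbar
# (long latency, one new request per cycle) versus a scalable ring whose
# adjacent-PE hop is short. All latencies are illustrative assumptions.


def dependent_chain_cycles(steps: int, compute_cycles: int, transfer_latency: int) -> int:
    """Cycles for a chain in which each PE must receive the previous PE's
    result before it can compute, so every transfer latency is exposed."""
    return steps * (compute_cycles + transfer_latency)


def pipelined_batch_cycles(requests: int, transfer_latency: int, issue_interval: int = 1) -> int:
    """Cycles for independent requests issued back to back into a pipelined
    network: one request enters every issue_interval cycles, and the last
    one completes transfer_latency cycles after it enters."""
    if requests == 0:
        return 0
    return (requests - 1) * issue_interval + transfer_latency


CROSSBAR_LATENCY = 4  # assumed: pipelined crossbar spanning all PEs
RING_HOP_LATENCY = 1  # assumed: neighbor-to-neighbor link in a ring

if __name__ == "__main__":
    for name, lat in [("crossbar", CROSSBAR_LATENCY), ("ring hop", RING_HOP_LATENCY)]:
        print(f"{name:8s} independent x10: {pipelined_batch_cycles(10, lat):3d} cycles, "
              f"dependent chain x10: {dependent_chain_cycles(10, 1, lat):3d} cycles")
```

With these assumed numbers, ten independent requests take 13 cycles on the pipelined crossbar versus 10 on the ring, but a chain of ten dependent transfers takes 50 versus 20 cycles, because each step must wait out the full transfer latency that pipelining cannot hide.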
References
1. H. Akkary and M.A. Driscoll, “A Dynamic Multithreading Processor,” Proceedings of the 31st International Symposium on Microarchitecture, 1998.
2. R. Alverson, D. Callahan, D. Cummings, B. Koblenz, A. Porterfield, and J.B. Smith, “The Tera Computer System,” Proceedings of International Conference on Supercomputing, pp. 1–6, 1990.
3. D.E. Culler and J.P. Singh, Parallel Computer Architecture: A Hardware/Software Approach, Morgan Kaufmann, 1999.
4. W.J. Dally and S. Lacy, “VLSI Architecture: Past, Present, and Future,” Proceedings of Advanced Research in VLSI Conference, 1999.
5. P. Dubey, K. O’Brien, K.M. O’Brien, and C. Barton, “Single-Program Speculative Multithreading (SPSM) Architecture: Compiler-assisted Fine-Grained Multithreading,” Proceedings of International Conference on Parallel Architecture and Compilation Techniques (PACT ’95), 1995.
6. M. Dubois, C. Scheurich, and F.A. Briggs, “Memory Access Buffering in Multiprocessors,” Proceedings of the 13th International Symposium on Computer Architecture, pp. 434–442, 1986.
7. K. Ebcioglu and E.R. Altman, “DAISY: Dynamic Compilation for 100% Architectural Compatibility,” Proceedings of the 24th Annual International Symposium on Computer Architecture, pp. 26–37, 1997.
8. M. Franklin and G.S. Sohi, “The Expandable Split Window Paradigm for Exploiting Fine-Grain Parallelism,” Proceedings of the 19th International Symposium on Computer Architecture, pp. 58–67, 1992.
9. M. Franklin, “The Multiscalar Architecture,” Ph.D. Thesis, Technical Report TR 1196, Computer Sciences Department, University of Wisconsin, Madison, 1993.
10. M. Franklin, “Multi-Version Caches for Multiscalar Processors,” Proceedings of International Conference on High Performance Computing, 1995.
11. M. Franklin and G.S. Sohi, “ARB: A Hardware Mechanism for Dynamic Reordering of Memory References,” IEEE Transactions on Computers, vol. 45, no. 5, pp. 552–571, May 1996.
12. K. Gharachorloo et al., “Memory Consistency and Event Ordering in Scalable Shared-Memory Multiprocessors,” Proceedings of the 17th International Symposium on Computer Architecture, pp. 15–25, 1990.
13. J.R. Goodman, “Cache Consistency and Sequential Consistency,” Technical Report 61, IEEE SCI Committee, 1990.
14. S. Gopal, T.N. Vijaykumar, J.E. Smith, and G.S. Sohi, “Speculative Versioning Cache,” Proceedings of the 4th International Symposium on High Performance Computer Architecture (HPCA-4), 1998.
15. L. Hammond, B.A. Nayfeh, and K. Olukotun, “A Single-Chip Multiprocessor,” IEEE Computer, September 1997.
16. R. Hookway, “Running 32-bit 386 Applications on Alpha NT,” Proceedings of IEEE COMPCON 97, pp. 37–42, 1997.
17. K. Hwang and Z. Xu, Scalable Parallel Computing, WCB/McGraw-Hill, New York, 1998.
18. R. Joy and K. Kennedy, President’s Information Technology Advisory Committee (PITAC): Interim Report to the President, National Coordination Office for Computing, Information and Communication, 4201 Wilson Blvd, Suite 690, Arlington, VA 22230, August 10, 1998.
19. V. Krishnan and J. Torrellas, “A Chip Multiprocessor Architecture with Speculative Multithreading,” IEEE Transactions on Computers, September 1999.
20. L. Lamport, “How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs,” IEEE Transactions on Computers, vol. C-28, pp. 690–691, September 1979.
21. O.C. Maquelin, H.H.J. Hum, and G.R. Gao, “Costs and Benefits of Multithreading with Off-the-Shelf RISC Processors,” Proceedings of the 1st International EURO-PAR Conference, 1995.
22. P. Marcuello, A. Gonzalez, and J. Tubella, “Speculative Multithreaded Processors,” Proceedings of International Conference on Supercomputing, 1998.
23. R. Nair and M.E. Hopkins, “Exploiting Instruction Level Parallelism in Processors by Caching Scheduled Groups,” Proceedings of the 24th Annual International Symposium on Computer Architecture, pp. 13–25, 1997.
24. N. Nishi et al., “A 1-GIPS 1-W Single-Chip Tightly Coupled Four-Way Multiprocessor with Architecture Support for Multiple Control-Flow Execution,” Proceedings of the 47th International Solid-State Circuits Conference, pp. 418–475, 2000.
25. D. Padua, “Polaris: An Optimizing Compiler for Parallel Workstations and Scalable Multiprocessors,” Technical Report 1475, University of Illinois at Urbana-Champaign, Center for Supercomputing Research & Development, January 1996.
26. C. Polychronopoulos, M.B. Girkar, M.R. Haghighat, C.L. Lee, B.P. Leung, and D.A. Schouten, “The Structure of Parafrase-2: An Advanced Parallelizing Compiler for C and Fortran,” Languages and Compilers for Parallel Computing, MIT Press, Cambridge, MA, 1990.
27. N. Ranganathan and M. Franklin, “An Empirical Study of Decentralized ILP Execution Models,” Proceedings of the 8th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-VIII), pp. 272–281, 1998.
28. E. Rotenberg, Q. Jacobson, Y. Sazeides, and J.E. Smith, “Trace Processors,” Proceedings of the 30th International Symposium on Microarchitecture, pp. 138–148, 1997.
29. B.J. Smith, “The Architecture of HEP,” Parallel MIMD Computation: HEP Supercomputer and Its Applications, pp. 41–55, MIT Press, Cambridge, MA.
30. G.S. Sohi, S.E. Breach, and T.N. Vijaykumar, “Multiscalar Processors,” Proceedings of the 22nd Annual International Symposium on Computer Architecture, pp. 414–425, 1995.
31. J.G. Steffan and T.C. Mowry, “The Potential for Using Thread-Level Data Speculation to Facilitate Automatic Parallelization,” Proceedings of the 4th International Symposium on High Performance Computer Architecture, 1998.
32. K.K. Sundararaman and M. Franklin, “Multiscalar Execution along a Single Flow of Control,” Proceedings of International Conference on Parallel Processing (ICPP), pp. 106–113, 1997.
33. M. Thistle and B.J. Smith, “A Processor Architecture for Horizon,” Proceedings of Supercomputing ’88, pp. 35–41, 1988.
34. M. Tremblay et al., “The MAJC Architecture: A Synthesis of Parallelism and Scalability,” IEEE MICRO, pp. 12–25, November/December 2000.
35. J.-Y. Tsai and P.-C. Yew, “The Superthreaded Architecture: Thread Pipelining with Run-Time Data Dependence Checking and Control Speculation,” Proceedings of the 1996 Conference on Parallel Architectures and Compilation Techniques (PACT ’96), pp. 35–46, 1996.
36. S. Vajapeyam and T. Mitra, “Improving Superscalar Instruction Dispatch and Issue by Exploiting Dynamic Code Sequences,” Proceedings of the 24th Annual International Symposium on Computer Architecture, pp. 1–12, 1997.
37. U. Vishkin, S. Dascal, E. Berkovich, and J. Nuzman, “Explicit Multi-threaded (XMT) Bridging Models for Instruction Parallelism,” Proceedings of the 10th ACM Symposium on Parallel Algorithms and Architectures (SPAA), pp. 140–151, 1998.
38. S. Wallace, B. Calder, and D. Tullsen, “Threaded Multiple Path Execution,” Proceedings of the 25th Annual International Symposium on Computer Architecture, pp. 238–249, 1998.
39. “The National Technology Roadmap for Semiconductors,” Semiconductor Industry Association, 1997.