5.2
Future works
By using a central limit theoram [13] a Dynamic Set Insertion Policy can be implemented by considering Least Recently Used(LRU) cache re- placement policy, if a application is cache friendly.
A Dynamic Set Insertion Policy can be implemented by considering Bit Set Insertion Policy (BSIP) cache replacement policy, if a application is thrashing.
Cache replacement policies can be compared using power consumed as a performance parameter.
Cache replacement policies can be implemented by considering thread level parallelism in multicore processors.
Bibliography
[1] John P Hayes. Computer Organization and Architecture. McGraw-Hill, Inc. Newyork,USA, 1978.
[2] William Stallings. Computer organization and architecture: designing for performance. Pearson Education India, 1993.
[3] Behrooz Parhami. Computer architecture: from microprocessors to supercom- puters. Oxford University Press New York, NY, 2005.
[4] Xian-He Sun and Yong Chen. Reevaluating amdahls law in the multicore era. Journal of Parallel and Distributed Computing, 70(2):183–188, 2010.
[5] Moinuddin K Qureshi and Yale N Patt. Utility-based cache partitioning: A low-overhead, high-performance, runtime mechanism to partition shared caches. In Proceedings of the 39th Annual IEEE/ACM International Sympo- sium on Microarchitecture, pages 423–432. IEEE Computer Society, 2006.
[6] Aamer Jaleel, William Hasenplaugh, Moinuddin Qureshi, Julien Sebot, Si- mon Steely Jr, and Joel Emer. Adaptive insertion policies for managing shared caches. In Proceedings of the 17th international conference on Parallel architectures and compilation techniques, pages 208–219. ACM, 2008.
[7] Yuejian Xie and Gabriel H Loh. Pipp: promotion/insertion pseudo- partitioning of multi-core shared caches. In ACM SIGARCH Computer Ar- chitecture News, volume 37, pages 174–183. ACM, 2009.
[8] Alan Jay Smith. Cache memories. ACM Computing Surveys (CSUR), 14(3):473–530, 1982.
Bibliography
[9] James R Goodman. Using cache memory to reduce processor-memory traffic. In ACM SIGARCH Computer Architecture News, volume 11, pages 124–131. ACM, 1983.
[10] Fei Guo and Yan Solihin. An analytical model for cache replacement pol- icy performance. In ACM SIGMETRICS Performance Evaluation Review, volume 34, pages 228–239. ACM, 2006.
[11] Asit Dan and Don Towsley. An approximate analysis of the LRU and FIFO buffer replacement schemes, volume 18. ACM, 1990.
[12] Song Jiang and Xiaodong Zhang. Lirs: an efficient low inter-reference recency set replacement policy to improve buffer cache performance. In ACM SIG- METRICS Performance Evaluation Review, volume 30, pages 31–42. ACM, 2002.
[13] Moinuddin K Qureshi, Aamer Jaleel, Yale N Patt, Simon C Steely, and Joel Emer. Adaptive insertion policies for high performance caching. In ACM SIGARCH Computer Architecture News, volume 35, pages 381–391. ACM, 2007.
[14] Aamer Jaleel, Kevin B Theobald, Simon C Steely Jr, and Joel Emer. High performance cache replacement using re-reference interval prediction (rrip). In ACM SIGARCH Computer Architecture News, volume 38, pages 60–71. ACM, 2010.
[15] Hassan Ghasemzadeh, Sepideh Mazrouee, and Mohammad Reza Kakoee. Modified pseudo lru replacement algorithm. In Engineering of Computer Based Systems, 2006. ECBS 2006. 13th Annual IEEE International Sympo- sium and Workshop on, pages 6–pp. IEEE, 2006.
[16] Mainak Chaudhuri. Pseudo-lifo: the foundation of a new family of re- placement policies for last-level caches. In Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture, pages 401–412. ACM, 2009.
Bibliography
[17] Donghee Lee, Jongmoo Choi, Honggi Choe, Sam H Noh, Sang Lyul Min, and Yookun Cho. Implementation and performance evaluation of the lrfu replace- ment policy. In EUROMICRO 97.’New Frontiers of Information Technology’. Short Contributions., Proceedings of the 23rd Euromicro Conference, pages 106–111. IEEE, 1997.
[18] Theodore Johnson and Dennis Shasha. X3: A low overhead high performance buffer management replacement algorithm. 1994.
[19] Sorav Bansal and Dharmendra S Modha. Car: Clock with adaptive replace- ment. In FAST, volume 4, pages 187–200, 2004.
[20] Kamil Kedzierski, Miquel Moreto, Francisco J Cazorla, and Mateo Valero. Adapting cache partitioning algorithms to pseudo-lru replacement policies. In Parallel & Distributed Processing (IPDPS), 2010 IEEE International Sympo- sium on, pages 1–12. IEEE, 2010.
[21] Wayne A Wong and Jean-Loup Baer. Modified lru policies for improving second-level cache behavior. In High-Performance Computer Architecture, 2000. HPCA-6. Proceedings. Sixth International Symposium on, pages 49–60. IEEE, 2000.
[22] Jaekyu Lee and Hyesoon Kim. Tap: A tlp-aware cache management policy for a cpu-gpu heterogeneous architecture. In High Performance Computer Architecture (HPCA), 2012 IEEE 18th International Symposium on, pages 1–12. IEEE, 2012.
[23] Vineeth Mekkat, Anup Holey, Pen-Chung Yew, and Antonia Zhai. Managing shared last-level cache in a heterogeneous multicore processor. In Proceedings of the 22nd international conference on Parallel architectures and compilation techniques, pages 225–234. IEEE Press, 2013.
[24] Francesc Alted. Why modern cpus are starving and what can be done about it. Computing in Science & Engineering, 12(2):68–71, 2010.
Bibliography
[25] Kun Luo, Jayanth Gummaraju, and Manoj Franklin. Balancing thoughput and fairness in smt processors. In Performance Analysis of Systems and Software, 2001. ISPASS. 2001 IEEE International Symposium on, pages 164– 171. IEEE, 2001.
[26] Rafael Ubal, Byunghyun Jang, Perhaad Mistry, Dana Schaa, and David Kaeli. Multi2sim: A simulation framework for cpu-gpu computing. In Proceedings of the 21st international conference on Parallel architectures and compilation techniques, pages 335–344. ACM, 2012.
[27] Steven Cameron Woo, Moriyoshi Ohara, Evan Torrie, Jaswinder Pal Singh, and Anoop Gupta. The splash-2 programs: Characterization and method- ological considerations. In ACM SIGARCH Computer Architecture News, volume 23, pages 24–36. ACM, 1995.
[28] David Tam, Reza Azimi, Livio Soares, and Michael Stumm. Managing shared l2 caches on multicore systems in software. In Workshop on the Interaction between Operating Systems and Computer Architecture, 2007.
[29] Alan Jay Smith. Cache evaluation and the impact of workload choice, vol- ume 13. IEEE Computer Society Press, 1985.
[30] Mark D Hill and Alan Jay Smith. Evaluating associativity in cpu caches. Computers, IEEE Transactions on, 38(12):1612–1630, 1989.
[31] Alan Jay Smith. Line (block) size choice for cpu cache memories. Computers, IEEE Transactions on, 100(9):1063–1075, 1987.
[32] Paul Sweazey and Alan Jay Smith. A class of compatible cache consistency protocols and their support by the ieee futurebus. In ACM SIGARCH Com- puter Architecture News, volume 14, pages 414–423. IEEE Computer Society Press, 1986.
[33] Erlin Yao, Yungang Bao, Guangming Tan, and Mingyu Chen. Extending am- dahl’s law in the multicore era. ACM SIGMETRICS Performance Evaluation Review, 37(2):24–26, 2009.
Bibliography
[34] Seongbeom Kim, Dhruba Chandra, and Yan Solihin. Fair cache sharing and partitioning in a chip multiprocessor architecture. In Proceedings of the 13th International Conference on Parallel Architectures and Compilation Tech- niques, pages 111–122. IEEE Computer Society, 2004.
[35] Mainak Chaudhuri, Jayesh Gaur, Nithiyanandan Bashyam, Sreenivas Subra- money, and Joseph Nuzman. Introducing hierarchy-awareness in replacement and bypass algorithms for last-level caches. In Proceedings of the 21st in- ternational conference on Parallel architectures and compilation techniques, pages 293–304. ACM, 2012.
[36] Nitin Chaturvedi and S Gurunarayanan. Study of various factors affecting performance of multi-core processors. International Journal, 2013.
[37] Yi Yang, Ping Xiang, Mike Mantor, and Huiyang Zhou. Cpu-assisted gpgpu on fused cpu-gpu architectures. In High Performance Computer Architecture (HPCA), 2012 IEEE 18th International Symposium on, pages 1–12. IEEE, 2012.
[38] Alan Jay Smith. Cache memories. ACM Computing Surveys (CSUR), 14(3):473–530, 1982.
[39] Rafael Ubal, Byunghyun Jang, Perhaad Mistry, Dana Schaa, and David Kaeli. Multi2sim: A simulation framework for cpu-gpu computing. In Proceedings of the 21st international conference on Parallel architectures and compilation techniques, pages 335–344. ACM, 2012.