Conclusions - Speeding up architectural simulation through high-level core abstractions and sam

Exploring a variety of system parameters in a short amount of time is critical to designing successful future architectures. With the ever-growing number of processors per system and cores per socket, simulating these larger systems in reasonable amounts of time is challenging. Combine the growing number of cores with larger cache sizes, and it becomes clear that longer, accurate simulations are needed to evaluate next-generation system designs effectively. At the same time, complex core-uncore interactions and multi-core effects due to heterogeneous workloads make realistic models of modern processor architectures even more important. In this work, we present the combination of a highly accurate, yet easy-to-develop core model, the interval model, with a fast, parallel simulation infrastructure. This combination provides accurate simulation of modern computer systems with high simulation performance, up to 2.0 MIPS.

Even when compared against a one-IPC model that takes into account attributes of many superscalar, out-of-order processors, the interval model provides a key simulation trade-off point for architects. We have shown a 23.8% average absolute error when simulating a 16-core Intel X7460-based system, less than half the 59.3% error of our one-IPC model. Because interval simulation and Sniper provide a detailed understanding of both the hardware and the software, and allow for a number of accuracy and simulation-performance trade-offs, we conclude that they are a useful complement in the architect's toolbox for simulating high-performance multi-core and many-core systems.
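The comparison above rests on average absolute error. The following is a minimal sketch of that metric, assuming error is computed per benchmark as the relative difference between simulated and measured execution time and then averaged; the exact definition behind the reported numbers is an assumption here, and `average_absolute_error` is a hypothetical helper.

```python
from typing import Sequence

def average_absolute_error(simulated: Sequence[float], measured: Sequence[float]) -> float:
    """Mean of |simulated - measured| / measured over all benchmarks (assumed metric)."""
    if len(simulated) != len(measured) or len(measured) == 0:
        raise ValueError("expected two non-empty sequences of equal length")
    errors = [abs(s - m) / m for s, m in zip(simulated, measured)]
    return sum(errors) / len(errors)

# Hypothetical runtimes in seconds for three benchmarks; NOT results from this work.
hardware = [10.0, 20.0, 40.0]
interval = [11.0, 18.0, 44.0]   # each off by 10%
one_ipc = [15.0, 30.0, 20.0]    # each off by 50%
print(round(average_absolute_error(interval, hardware), 3))  # 0.1
print(round(average_absolute_error(one_ipc, hardware), 3))   # 0.5

# The reported figures: 23.8% is less than half of 59.3% (half would be 29.65%).
assert 23.8 < 59.3 / 2
```

Taking the absolute value before averaging keeps over- and under-predictions from canceling each other out, so a model cannot look accurate merely because its errors have mixed signs.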

Chapter 3

An Evaluation of High-Level Mechanistic Core Models

In the previous chapter, we introduced the Sniper Multi-Core Simulator and compared it to the one-IPC core model. We showed that the interval simulation models used in Sniper provide higher accuracy while maintaining good performance.

In this chapter, we explore, analyze and compare the accuracy and simulation speed of a new high-abstraction core model. We introduce the instruction-window centric core model, a mechanistic core model that bridges the gap between interval simulation and cycle-accurate simulation by enabling high-speed simulations with higher levels of detail. In addition, we describe a number of enhancements to interval simulation to improve its accuracy while maintaining simulation speed.

3.1 Introduction

In this chapter, we provide an overview of interval simulation and present improvements that both build on interval simulation and extend it in a new direction for higher simulation accuracy. We extend the original interval simulation model to take into account limited execution units, and improve its handling of overlapping memory accesses through a more detailed dependency analysis of memory accesses. These modifications improve accuracy for a range of workloads at a minimal increase in complexity. In addition, we present a new core model that uses the insights from interval modeling and combines them with a detailed model of the instruction window, or reorder buffer (ROB). We call this methodology instruction-window centric (IW-centric) simulation, because the reorder buffer is central to determining how much core-level performance an application can extract. While the original interval simulation methodology calculates the ILP of an application analytically, IW-centric simulation models micro-op dependencies and issue timing in detail, providing additional accuracy with respect to fine-grained events. The cost of this additional level of detail is a somewhat lower simulation speed. IW-centric simulation therefore represents a different point on the speed versus accuracy trade-off, and can be a good middle ground between interval simulation (as described in Chapter 2) and cycle-accurate modeling.
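To make the contrast with analytical ILP estimation concrete, here is a minimal sketch of the instruction-window centric idea: micro-ops enter a reorder buffer, issue once their producers have completed, and commit in order, so dependencies and issue timing are resolved cycle by cycle rather than analytically. It assumes a simplified core in which a single `width` parameter limits dispatch, issue, and commit, functional units are unlimited, and each micro-op is described only by a latency and its producers. `MicroOp`, `simulate_iw_centric`, and the default parameter values are hypothetical and are not Sniper's implementation.

```python
from collections import deque
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass(frozen=True)
class MicroOp:
    latency: int                 # execution latency in cycles (must be >= 1)
    deps: Tuple[int, ...] = ()   # indices of earlier micro-ops this one depends on

def simulate_iw_centric(uops: List[MicroOp], rob_size: int = 4, width: int = 2) -> int:
    """Cycle-by-cycle walk of a reorder buffer; returns cycles until the last micro-op commits."""
    for i, u in enumerate(uops):
        if u.latency < 1 or any(d < 0 or d >= i for d in u.deps):
            raise ValueError(f"micro-op {i}: need latency >= 1 and only earlier dependencies")
    done_at: List[Optional[int]] = [None] * len(uops)  # cycle at which each result is ready
    rob: deque = deque()                                # window contents, oldest first
    next_dispatch = committed = cycle = 0
    while committed < len(uops):
        # Commit: retire up to `width` finished micro-ops, strictly in program order.
        retired = 0
        while rob and retired < width and done_at[rob[0]] is not None and done_at[rob[0]] <= cycle:
            rob.popleft()
            committed += 1
            retired += 1
        # Issue: oldest first, up to `width` micro-ops whose producers have completed
        # (this sketch assumes unlimited functional units).
        issued = 0
        for idx in rob:
            if issued == width:
                break
            ready = all(done_at[d] is not None and done_at[d] <= cycle for d in uops[idx].deps)
            if done_at[idx] is None and ready:
                done_at[idx] = cycle + uops[idx].latency
                issued += 1
        # Dispatch: fill free window entries, up to `width` micro-ops per cycle.
        dispatched = 0
        while next_dispatch < len(uops) and len(rob) < rob_size and dispatched < width:
            rob.append(next_dispatch)
            next_dispatch += 1
            dispatched += 1
        cycle += 1
    return cycle

independent = [MicroOp(1) for _ in range(4)]
chain = [MicroOp(1)] + [MicroOp(1, (i,)) for i in range(3)]  # each depends on the previous
print(simulate_iw_centric(independent))  # 4
print(simulate_iw_centric(chain))        # 6
```

The dependent chain takes longer than four independent micro-ops on the same window, an effect that a per-cycle window model captures directly. Walking every cycle is also what makes this style of model slower than interval simulation, consistent with the speed trade-off described above.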

More specifically, we make the following contributions in this work:

• We present issue contention for interval simulation. Issue contention takes into account core-level instruction execution limitations (e.g., a limited number of functional units) to more accurately predict core performance.

• We improve the dependency analysis of the interval model by differentiating between instructions or micro-ops that depend on long-latency loads and those that do not.

• We introduce the instruction-window centric core model, which is a new speed-vs.-accuracy trade-off point between interval simulation and traditional detailed cycle-level core model simulations.

• We present a detailed analysis of the trade-offs between the one-IPC, interval, and instruction-window centric simulation models, and provide both single-core and multi-core analysis across a number of simulated hardware configurations to demonstrate the effect that each core model has on the scaling and accuracy of the simulations.

• We validate these core models against real hardware and show single-core average errors of 11% for the instruction-window centric model and 24% for interval simulation.
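Issue contention can be illustrated with a small calculation. The sketch below assumes, as an illustration rather than Sniper's actual code, that an interval-style model estimates the sustainable dispatch rate from the instruction window using Little's law: the number of micro-ops in the window divided by the cycles needed to execute them. Without contention, that time is the window's critical path; with contention, it can be no shorter than the cycles needed to push every micro-op through its limited functional units. The micro-op tuple format, the functional-unit classes, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical micro-op description: (functional-unit class, latency in cycles,
# indices of earlier micro-ops in the same window that it depends on).
Uop = Tuple[str, int, Tuple[int, ...]]

def critical_path(window: List[Uop]) -> int:
    """Length in cycles of the longest dependence chain through the window."""
    finish: List[int] = []
    for _, latency, deps in window:
        start = max((finish[d] for d in deps), default=0)  # wait for the slowest producer
        finish.append(start + latency)
    return max(finish, default=0)

def contention_bound(window: List[Uop], units: Dict[str, int]) -> int:
    """Fewest cycles needed to issue every micro-op, given units[c] units of class c."""
    counts: Dict[str, int] = {}
    for fu, _, _ in window:
        counts[fu] = counts.get(fu, 0) + 1
    return max((math.ceil(n / units[fu]) for fu, n in counts.items()), default=0)

def effective_dispatch_rate(window: List[Uop], units: Dict[str, int], dispatch_width: int) -> float:
    """Little's law: micro-ops in the window divided by the cycles the window needs,
    capped at the dispatch width."""
    cycles = max(critical_path(window), contention_bound(window, units), 1)
    return min(float(dispatch_width), len(window) / cycles)

window = [("alu", 1, ()) for _ in range(8)]  # eight independent 1-cycle ALU micro-ops
print(effective_dispatch_rate(window, {"alu": 4}, dispatch_width=4))  # 4.0
print(effective_dispatch_rate(window, {"alu": 2}, dispatch_width=4))  # 2.0
```

With four ALUs the window still sustains the full dispatch width; with only two, the contention bound dominates and the rate halves, a slowdown that a dependency-only estimate (critical path of one cycle) would miss.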

This chapter is organized as follows. We first discuss high-level core models that are used to provide a speed versus accuracy trade-off for microprocessor simulation. Next, we present improvements to interval simulation and introduce a new core model, instruction-window centric simulation, that improves accuracy with respect to hardware. Finally, we evaluate these core models for both speed and accuracy, and discuss how core model resolution affects microarchitecture conclusions.

3.2 Core-level Abstractions

In document Speeding up architectural simulation through high-level core abstractions and sampling (Page 58-63)