Open-source multiprocessor simulators - Improving prefetching mechanisms for tiled CMP platforms

To accurately simulate the effects of prefetching, the cores, the memory system and the NoC must all be emulated realistically. Table 2.2 summarizes the main open-source simulators that we analyzed to assess whether they provide the features needed to accurately simulate a CMP system with prefetching. MARSSx86 [62] provides a full simulation environment for a multi-core system and its memory hierarchy. Moreover, its memory hierarchy includes a prefetching module with several prefetching engines whose performance within the hierarchy can be evaluated. However, it does not support the simulation of the NoC. The second simulator, Simics-GEMS [54], meets most of the requirements, but it is practically obsolete: the researchers who developed the GEMS part of Simics-GEMS abandoned the project some years ago to build the newer gem5 simulator [7]. This full-system simulator can model either a simple (Classic) or a detailed (Ruby) memory hierarchy. Its two memory models are listed on separate rows in Table 2.2, which shows that, with Ruby, gem5 can simulate a detailed NoC.

Note that the Classic memory system of gem5 models a simplistic bus between the cores and the shared cache, and between the cache and main memory. This bus can also report contention statistics. As the number of cores increases, however, a bus is no longer a realistic interconnect, so the Ruby memory system, which models a more realistic switched NoC, is needed. This detailed memory system, in turn, does not support any kind of prefetching. Therefore, to the best of our knowledge, no open-source simulator in its official release is capable of accurately simulating prefetching in multi-core environments with a large number of cores.
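The scalability argument against the Classic bus can be made concrete with a back-of-the-envelope bound. This is a minimal sketch under our own idealized assumptions, not a model used in this thesis: a single shared bus of bandwidth B, a k x k mesh whose links each carry b, B equal to b, uniform random traffic, and no injection, router or protocol limits. Under these assumptions the bus gives each of N cores at most B/N, while the mesh bisection bounds per-core throughput at 4b/k = 4b/√N. The function names are hypothetical.

```python
import math


def bus_share(bus_bw: float, n_cores: int) -> float:
    """Per-core bandwidth bound when n_cores share a single bus."""
    return bus_bw / n_cores


def mesh_share(link_bw: float, n_cores: int) -> float:
    """Bisection bound on per-core throughput of a k x k mesh under uniform
    random traffic. The N/2 nodes on one side send half of their traffic
    across the k-link bisection, so (N / 2) * rate * (1 / 2) <= k * link_bw,
    i.e. rate <= 4 * link_bw / k. Injection and router limits are ignored."""
    k = math.isqrt(n_cores)
    if k * k != n_cores or k % 2:
        raise ValueError("n_cores must be k*k with k even")
    return 4 * link_bw / k


if __name__ == "__main__":
    # Assumption: bus bandwidth and mesh link bandwidth are both 1.0.
    for n in (16, 64, 256):
        bus, mesh = bus_share(1.0, n), mesh_share(1.0, n)
        print(f"{n:4d} cores: bus {bus:.4f}  mesh {mesh:.4f}  ratio {mesh / bus:.0f}x")
```

With B = b the mesh advantage is 4N/k = 4k, so it grows with the square root of the core count (64x at 256 cores in this idealized setting), which illustrates why a bus stops being a realistic interconnect for large core counts.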


Simulator      Core  Memory  NoC  Up-to-date  Prefetch

MARSSx86       X     X       -    X           X

Simics-GEMS    X     X       X    -           -

gem5-Classic   X     X       -    X           X

gem5-Ruby      X     X       X    X           -

Table 2.2 Requirements for multi-core simulators.
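The elimination argument behind this comparison can be written down as a small set computation. This is only an illustrative sketch: the feature names and the `MISSING` mapping are our own hypothetical encoding, and each simulator is marked only with the gap that the text states explicitly (a simulator may lack more than what is listed).

```python
# Hypothetical encoding of the requirement columns of Table 2.2.
REQUIRED = frozenset({"core", "memory", "noc", "up_to_date", "prefetch"})

# Gaps stated explicitly in the text; a simulator may lack more than listed.
MISSING = {
    "MARSSx86": {"noc"},            # no NoC simulation
    "Simics-GEMS": {"up_to_date"},  # GEMS abandoned in favour of gem5
    "gem5-Classic": {"noc"},        # simple bus instead of a switched NoC
    "gem5-Ruby": {"prefetch"},      # detailed NoC, but no prefetching
}


def supported(sim: str) -> frozenset:
    """Features a simulator offers: every requirement except its known gaps."""
    return REQUIRED - MISSING[sim]


def fully_suitable() -> list:
    """Simulators that offer every required feature."""
    return [sim for sim in MISSING if supported(sim) == REQUIRED]


if __name__ == "__main__":
    for sim in MISSING:
        print(f"{sim:13s} lacks {sorted(MISSING[sim])}")
    print("fully suitable:", fully_suitable())
```

Because every candidate lacks at least one feature, `fully_suitable()` returns an empty list, matching the conclusion of the text; in this encoding, removing `"prefetch"` from the gaps of gem5-Ruby would make it the only suitable candidate.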

2.5.1 Benchmarks

Several benchmark suites (SPLASH, SPEC and PARSEC) are supported by this simulator. Among them, we chose PARSEC because it is the multithreaded suite with the widest range of workloads. According to [6], a benchmark suite should satisfy the following five requirements:

• Support for multi-threaded applications: Shared-memory CMPs are already ubiquitous. The trend for future processors is to deliver large performance improvements through increasing core counts on CMPs while providing only modest serial performance improvements. Consequently, applications that require additional processing power will need to be parallel.

• Support for emerging workloads: Rapidly increasing processing power is enabling a new class of applications whose computational requirements were beyond the capabilities of earlier generations of processors. Such applications differ significantly from earlier ones. Future processors will be designed to meet the demands of these emerging applications, and a benchmark suite should represent them.

• Need to be diverse: Applications are increasingly diverse, run on a variety of platforms and accommodate different usage models. They include interactive applications such as computer games, offline applications such as data mining programs, and programs with different parallelization models. Specialized collections of benchmarks can be used to study some of these areas in more detail, but decisions about general-purpose processors should be based on a diverse set of applications. While a truly representative suite is impossible to create, reasonable effort should be made to maximize the diversity of the program selection. The number of benchmarks must be large enough to capture a sufficient range of the characteristics of the target application space.

• Employ state-of-the-art techniques: A number of application domains have changed dramatically over the last decade and now use very different algorithms and techniques. Visual applications, for example, increasingly integrate physics simulations to generate more realistic animations. A benchmark suite should not only represent emerging applications but also use state-of-the-art techniques.

• Support research: A benchmark suite intended for research has additional requirements compared to one used only for benchmarking real machines. Benchmark suites intended for research usually go beyond pure scoring systems and provide infrastructure to instrument, manipulate and perform detailed simulations of the included programs efficiently.

Fig. 2.9 Qualitative summary of the inherent key characteristics of PARSEC benchmarks. The pipeline model is a data-parallel model which also uses functional partitioning. PARSEC workloads were chosen to cover different application domains, parallel models and runtime behaviors.
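The Fig. 2.9 caption describes the pipeline model as data-parallel with functional partitioning (in PARSEC, dedup and ferret are organized this way). The following is a minimal sketch under our own assumptions, not code from PARSEC: `run_pipeline` and `_start_stage` are hypothetical names. Each stage is one function (functional partitioning), stages run concurrently and exchange items through FIFO queues, and a stage may run several worker threads on different items (data parallelism). Because of CPython's global interpreter lock, the sketch shows the structure of the model, not its speedup.

```python
import queue
import threading

_STOP = object()  # sentinel telling a stage worker to exit


def _start_stage(func, inbox, outbox, n_workers):
    """Start n_workers threads that apply func to every item taken from inbox.

    Several workers per stage give data parallelism inside one functional stage.
    """
    def worker():
        while True:
            item = inbox.get()
            if item is _STOP:
                return
            outbox.put(func(item))

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    return threads


def run_pipeline(items, stages):
    """Push items through stages, a list of (func, n_workers) pairs.

    Stages run concurrently and communicate through FIFO queues.
    Output order is not preserved.
    """
    queues = [queue.Queue() for _ in range(len(stages) + 1)]
    workers = [
        _start_stage(func, q_in, q_out, n)
        for (func, n), q_in, q_out in zip(stages, queues, queues[1:])
    ]
    for item in items:
        queues[0].put(item)
    # Shut stages down front to back: once every worker of a stage has joined,
    # all of its outputs are already queued ahead of the next stage's sentinels.
    for (_, n), q_in, threads in zip(stages, queues, workers):
        for _ in range(n):
            q_in.put(_STOP)
        for t in threads:
            t.join()
    out = queues[-1]
    return [out.get() for _ in range(out.qsize())]


if __name__ == "__main__":
    # Hypothetical three-stage pipeline: parse -> square (4 workers) -> format.
    result = run_pipeline(["3", "1", "2"], [(int, 1), (lambda x: x * x, 4), (str, 2)])
    print(sorted(result))
```

Each stage receives exactly one sentinel per worker, and only after the upstream stage has finished, so no item is lost; a worker that reads a sentinel stops consuming, which guarantees that every worker receives exactly one.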

Figure 2.9 presents a qualitative summary of the key characteristics of the PARSEC benchmarks. PARSEC workloads were selected to include different combinations of parallel models, machine requirements and runtime behaviors. PARSEC meets all the requirements outlined above, which is why it was chosen as the benchmark suite for the experiments in this thesis.

Chapter 3 Prefetching evaluation in multi-core platforms

Good ideas are not adopted automatically. They must be driven into practice with courageous patience.

Hyman Rickover
