
2.8 Other SDRAM Optimization Techniques

This section briefly introduces some other SDRAM-related techniques that also attempt to improve SDRAM performance by exploiting locality and parallelism.

2.8.1 The Impulse Memory System

Proposed by John Carter et al., the Impulse Memory System adds two features to a traditional memory controller. First, an extra stage of address translation is added to the memory controller to allow applications to control how their data is accessed and cached. Second, the Impulse controller supports prefetching at the memory controller, which reduces the effective latency to memory [10, 76].

Real systems usually do not populate the entire physical address space; for example, a system with 4GB of physical address space may have only 1GB of memory installed. The Impulse system uses these unused addresses to constitute a shadow address space, which the Impulse controller maps to physical memory. By giving applications and the operating system control over the use of shadow addresses, Impulse supports application-specific optimizations that restructure data.

As an example, consider a program that accesses five elements of an array. Given the physical memory layout of the requested elements shown in Figure 2.21(a), a conventional memory system loads the five elements in five separate memory accesses, each of which contains a full cache line of contiguous physical memory. The five elements then occupy five cache lines, although only a subset of each line is requested.

In an Impulse memory system, an application can configure the memory controller to export a dense shadow space alias that contains just the requested elements, and have the OS map a new set of virtual addresses, which fall into the same cache line. The application can then access the elements via the virtual alias in a single memory access as shown in Figure 2.21(b). In this way the five requested elements occupy only one cache line and require only one bus transfer.

Figure 2.21: The Impulse Memory System. (a) Conventional Memory System; (b) Impulse Memory System. Each panel shows the cache, the memory bus and the physical memory; in panel (b) the Impulse controller sits between the memory bus and the physical memory.
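The effect of the dense shadow alias on cache-line usage can be illustrated with a small model. The following is a minimal sketch, not the Impulse implementation: the cache line size, element size, stride, shadow base address and the `ShadowRemapper` class are all assumptions chosen for illustration.

```python
CACHE_LINE_BYTES = 32  # assumed cache line size
ELEM_BYTES = 4         # assumed element size


def lines_touched(addresses, line_bytes=CACHE_LINE_BYTES):
    """Return the set of cache-line indices covered by the byte addresses."""
    return {addr // line_bytes for addr in addresses}


class ShadowRemapper:
    """Toy model of the extra translation stage in the memory controller.

    Consecutive shadow addresses are mapped to an arbitrary list of
    physical addresses, so scattered elements appear densely packed.
    """

    def __init__(self, shadow_base, physical_addresses, elem_bytes=ELEM_BYTES):
        self.shadow_base = shadow_base
        self.elem_bytes = elem_bytes
        self.targets = list(physical_addresses)

    def shadow_addresses(self):
        """Dense alias addresses, one per remapped element."""
        return [self.shadow_base + i * self.elem_bytes
                for i in range(len(self.targets))]

    def translate(self, shadow_addr):
        """Shadow address -> physical address (done by the controller)."""
        index = (shadow_addr - self.shadow_base) // self.elem_bytes
        return self.targets[index]


# Five requested elements, 256 bytes apart in physical memory (assumed stride).
physical = [0x1000 + i * 256 for i in range(5)]

# Shadow space placed in an unpopulated part of the address space (assumed).
remap = ShadowRemapper(shadow_base=0x4000_0000, physical_addresses=physical)

conventional_lines = len(lines_touched(physical))
impulse_lines = len(lines_touched(remap.shadow_addresses()))
```

In this model the conventional layout touches one cache line per element, while the five shadow addresses fall into a single line; the controller still gathers the five physical locations behind the scenes via `translate`.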

The Impulse memory controller also supports prefetching by adding a small amount of SRAM on the controller. For non-remapped data, prefetching is useful for reducing the latency of sequentially accessed data. The Impulse memory system is designed for applications that do not exhibit sufficient locality, e.g. sparse matrix, database and multimedia applications.

2.8.2 SDRAM Controller Policy Predictor

SDRAM controller policy is introduced in Section 2.3. Two static controller policies, CPA and OP, are commonly used and can be selected through the BIOS on many systems [71]. However, which policy yields better performance largely depends upon an application's access pattern. A dynamic controller policy, which chooses the policy separately for each access, can reduce access latency.

An ideal dynamic controller policy, referred to as Dynamic Upper Bound, will be presented in Section 3.2.1.3. The dynamic upper bound policy uses future information only available in simulations to give an upper bound of performance improvement that a real dynamic controller policy can achieve without reordering accesses [58].
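The relationship between the two static policies and an oracle that sees future accesses can be sketched with a simple latency model. This is an illustrative sketch only, not the simulator used in the thesis: the timing values are hypothetical, accesses are served strictly in order, each precharge issued after an access is assumed to finish before the next access to that bank, and overlap between banks is ignored.

```python
# Hypothetical timing parameters in cycles (assumed, not from the thesis).
T_RP, T_RCD, T_CL = 3, 3, 3


def access_latency(open_row, row):
    """Latency of one access given the row currently open in its bank."""
    if open_row is None:          # bank precharged: activate + column access
        return T_RCD + T_CL
    if open_row == row:           # row hit: column access only
        return T_CL
    return T_RP + T_RCD + T_CL    # row conflict: precharge first


def simulate(accesses, keep_open):
    """Total latency of in-order accesses; keep_open(i, accesses) decides
    whether the row is left open after access i."""
    open_rows = {}
    total = 0
    for i, (bank, row) in enumerate(accesses):
        total += access_latency(open_rows.get(bank), row)
        open_rows[bank] = row if keep_open(i, accesses) else None
    return total


def open_page(i, accesses):
    return True                   # OP: always leave the row open


def close_page_autoprecharge(i, accesses):
    return False                  # CPA: always precharge after the access


def oracle(i, accesses):
    """Keep the row open only if the next access to this bank hits it.
    Requires future knowledge, so it is usable only in simulation."""
    bank, row = accesses[i]
    for next_bank, next_row in accesses[i + 1:]:
        if next_bank == bank:
            return next_row == row
    return False


# Example trace of (bank, row) pairs, made up for illustration.
trace = [(0, 1), (0, 1), (0, 2), (0, 2), (1, 7), (0, 3), (0, 1)]

results = {
    "OP": simulate(trace, open_page),
    "CPA": simulate(trace, close_page_autoprecharge),
    "oracle": simulate(trace, oracle),
}
```

Under these assumptions the oracle never pays a row conflict and never misses a row hit, so its total latency is no worse than either static policy on any trace.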


While the dynamic upper bound policy requires future access information, a two-level dynamic SDRAM controller policy predictor proposed by Ying Xu uses a history-based predictor to decide whether to leave the accessed row open or precharge the bank after completing each access [74]. This controller policy predictor is similar to branch predictors [40, 75], making predictions using history information.
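The analogy with branch predictors can be made concrete with a generic two-level scheme. The sketch below is an assumption-laden illustration and not the design from [74]: the class name, the 2-bit history length, the shared pattern table and the initial counter value are all hypothetical choices borrowed from textbook two-level branch predictors.

```python
class TwoLevelPagePredictor:
    """Generic two-level predictor for the open-row/precharge decision.

    Level 1: a per-bank history register of recent outcomes
             (1 = the next access to the bank hit the same row).
    Level 2: a table of 2-bit saturating counters indexed by that history.
    """

    def __init__(self, history_bits=2):
        self.mask = (1 << history_bits) - 1
        self.history = {}    # bank -> history register
        self.counters = {}   # history pattern -> counter in 0..3

    def predict_keep_open(self, bank):
        """True: leave the row open; False: precharge the bank."""
        pattern = self.history.get(bank, 0)
        return self.counters.get(pattern, 2) >= 2

    def update(self, bank, was_row_hit):
        """Train with the observed outcome of the previous decision."""
        pattern = self.history.get(bank, 0)
        counter = self.counters.get(pattern, 2)
        counter = min(3, counter + 1) if was_row_hit else max(0, counter - 1)
        self.counters[pattern] = counter
        self.history[bank] = ((pattern << 1) | int(was_row_hit)) & self.mask


def run(trace, predictor):
    """Replay (bank, row) accesses; return (correct, total) predictions."""
    last_row, pending = {}, {}
    correct = total = 0
    for bank, row in trace:
        if bank in last_row:
            hit = row == last_row[bank]
            total += 1
            correct += int(pending[bank] == hit)
            predictor.update(bank, hit)
        pending[bank] = predictor.predict_keep_open(bank)
        last_row[bank] = row
    return correct, total


# Each row is accessed exactly twice, so outcomes alternate hit, miss, hit, ...
trace = [(0, i // 2) for i in range(20)]
correct, total = run(trace, TwoLevelPagePredictor())
```

On this alternating trace neither static policy is right more than about half the time, whereas the history-indexed counters learn the pattern after a single misprediction.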

2.8.3 Adaptive Data Placement

The performance of a virtual paging system is often evaluated by how fast a virtual page can be allocated or freed. However, how fast a page can be accessed at runtime also affects performance, especially when the main memory has nonuniform access latencies.

SDRAM address mapping techniques, as will be presented in Chapter 4, can change access distribution in the SDRAM address space to exploit parallelism. However, SDRAM address mapping must be static and does not reflect any dynamic changes in program behavior.

The operating system, provided with knowledge of the memory hierarchy, can intelligently place data in the SDRAM space to exploit the parallelism available in the main memory. Theoretically, an intelligent virtual paging system can achieve at least the same performance improvement as SDRAM address mapping techniques, at the cost of added complexity in operating system page allocation. In addition, the virtual paging system may be able to change the data placement as program access patterns change [37].
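One way an operating system could use knowledge of the SDRAM organization is to spread newly allocated pages across banks. The following is a minimal sketch under stated assumptions, not the mechanism of [37]: the bank count, the frames-per-bank layout, the `bank_of` mapping and the `BankAwareAllocator` class are hypothetical, and page migration in response to changing access patterns is not modeled.

```python
from collections import defaultdict

NUM_BANKS = 4        # assumed number of SDRAM banks
FRAMES_PER_BANK = 8  # assumed: contiguous page frames share a bank


def bank_of(frame):
    """Assumed mapping from physical page frame number to SDRAM bank."""
    return frame // FRAMES_PER_BANK


class BankAwareAllocator:
    """Hands out free page frames from the least-loaded bank."""

    def __init__(self, num_frames):
        self.free = defaultdict(list)          # bank -> free frames
        for frame in range(num_frames):
            self.free[bank_of(frame)].append(frame)
        self.load = [0] * NUM_BANKS            # pages placed per bank

    def allocate(self):
        candidates = [b for b in range(NUM_BANKS) if self.free[b]]
        if not candidates:
            raise MemoryError("no free page frames")
        # Prefer the bank holding the fewest allocated pages.
        bank = min(candidates, key=lambda b: (self.load[b], b))
        self.load[bank] += 1
        return self.free[bank].pop(0)


allocator = BankAwareAllocator(NUM_BANKS * FRAMES_PER_BANK)
frames = [allocator.allocate() for _ in range(4)]
banks = [bank_of(f) for f in frames]
```

A naive allocator that returns the lowest free frame would place all four pages in bank 0 under this mapping; the bank-aware version places one page in each bank, so accesses to different pages can proceed in different banks.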

Chapter 3

Methodology

This chapter describes the simulation environments and methodologies used in the thesis. Two modified simulators, SimpleScalar v3.0d and M5 v1.1, are used for the SDRAM address mapping and access reordering mechanism studies respectively. Decisions and considerations in selecting simulators are discussed. Modifications made to the simulators in order to support the studies are presented.

3.1 Methodologies Used in the Thesis

Simulators are widely used in computer architecture and microprocessor studies [35, 2, 1]. Using these architecture simulators is an efficient way to study memory organizations and optimization techniques. However, many simulators focus on microprocessor studies, such as pipeline or cache organizations, and use simplified main memory modules, which may not be accurate for memory studies.

The selected simulators are revised to replace the original main memory modules with more detailed SDRAM modules. The techniques being studied, including the proposed and existing techniques, are implemented in simulation modules and added into the simulators. Using standard benchmarks, these techniques are simulated and examined. Trace files generated by the simulators are used to validate the implementations. Further improvements are made based on analysis of simulation results.