
4. FLEXIBLE MEMORY: A NOVEL MAIN MEMORY FRAMEWORK BASED ON

4.3 Challenges of Compressed Memory

Traditional main memory design follows two principles (although some design variants may differ):

• Data are stored in raw format: each byte of data occupies one physical address.

• Data are placed in strict order according to their assigned addresses. A physical address is converted to a device cell location by simple truncation into bit fields.
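In Python, "simple truncation" means the controller slices fixed bit fields out of the physical address, with no lookup table involved. The field order and widths below are hypothetical, chosen only for illustration; real controllers use their own (often interleaved) mappings.

```python
# Hypothetical DRAM address layout, least-significant field first.
# Widths are illustrative assumptions, not taken from any real controller.
FIELDS = [
    ("offset", 6),    # byte within a 64-byte block
    ("column", 7),
    ("bank", 3),
    ("rank", 1),
    ("channel", 1),
    ("row", 14),
]


def decode(paddr: int) -> dict:
    """Split a physical address into DRAM coordinates by pure bit selection."""
    loc = {}
    for name, bits in FIELDS:
        loc[name] = paddr & ((1 << bits) - 1)  # keep the low `bits` bits
        paddr >>= bits                          # discard them, move to next field
    return loc
```

For example, `decode(0x12345678)` yields offset 56, column 89, bank 2, rank 0, channel 0, row 1165. Every field is a fixed function of the address, which is exactly the property that variable-size compressed blocks break.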

Being straightforward, simple, and effective, the traditional memory design has long been used for main memory systems. When the speed gap between processors and memory devices was narrow, the traditional design was more than sophisticated enough to handle the demand of memory traffic. However, as computing capability has grown rapidly following Moore's law, the load on memory traffic has become much heavier. Memory systems are under stress in many cases, especially in server clusters. It has been reported that on the IBM eServer, main memory alone consumes as much as 40% [35] of total system energy. Given this high power consumption and unsatisfactory performance, we conclude that traditional memory design is starting to hit its limits in latency, capacity, power, and bandwidth.

Power, as the currency with which more performance is bought, poses a great and unique challenge. It has also drawn increasing attention in both academia and industry as mobile devices grow in popularity: such devices generally have a very limited energy budget, and performance per watt has become one of the top performance metrics. However, previous memory compression works do not model DRAM operation in detail and thus cannot accurately estimate the DRAM power changes caused by memory compression.

Bandwidth illustrates the limitations of traditional memory design well. Bandwidth is defined as the maximum amount of data that the memory system can transfer in a fixed amount of time. In multi-core processors, bandwidth demand is usually high, so memory requests have to be queued, which increases queuing delay. The straightforward solution is to upgrade to a higher-frequency bus or add more memory channels, which incurs monetary cost and, more importantly, higher power consumption.
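The arithmetic behind this trade-off can be sketched in a few lines. Peak bandwidth is transfer rate times bus width times channel count, and queuing delay grows sharply as demand approaches that peak. The M/M/1 queue used here is a textbook approximation chosen for illustration, not a model of any particular memory controller; DDR4-2400 on a 64-bit channel is just an example configuration.

```python
def peak_bandwidth_gbs(transfer_rate_mts: float, bus_width_bits: int,
                       channels: int = 1) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return transfer_rate_mts * 1e6 * bytes_per_transfer * channels / 1e9


def mm1_queue_wait(arrival_rate: float, service_rate: float) -> float:
    """Mean time a request waits in queue for an M/M/1 queue, in the same
    time unit as 1/service_rate. Only valid while utilization < 1."""
    utilization = arrival_rate / service_rate
    if utilization >= 1:
        raise ValueError("queue is unstable: demand exceeds bandwidth")
    return utilization / (service_rate - arrival_rate)
```

A single DDR4-2400 channel gives `peak_bandwidth_gbs(2400, 64)` of about 19.2 GB/s, and a second channel doubles it to about 38.4 GB/s. With a service rate of one request per nanosecond, the mean queuing wait is 1 ns at 50% utilization but about 9 ns at 90%. Raising the service rate (faster bus, more channels) and lowering the demand in bytes (compression) both move utilization away from that knee.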

To escape this dilemma, we need to break the strong binding between the number of data bytes and the amount of information. In traditional memory design, 1 byte of data ≡ 1 byte of information. In essence, however, information is what processors need, while the amount of data is what burdens the memory system.

The optimal approach, then, is to reduce the number of bytes transferred on the bus while keeping the amount of information conveyed unchanged. Main memory compression is an obvious way to save both power and bandwidth.
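To make "fewer bytes, same information" concrete, the sketch below compresses a 64-byte block with a simplified base-plus-delta encoding, similar in spirit to Base-Delta-Immediate (BΔI) compression. It is chosen only for illustration and is not necessarily the compression algorithm used in this work; the one-byte tag and the single 8-byte-base/1-byte-delta format are assumptions.

```python
import struct

BLOCK = 64  # assumed block (cache-line) size in bytes


def compress_block(block: bytes) -> bytes:
    """Encode eight 64-bit words as base + 1-byte deltas when possible."""
    assert len(block) == BLOCK
    words = struct.unpack("<8Q", block)            # eight little-endian uint64
    base = words[0]
    deltas = [w - base for w in words]
    if all(-128 <= d <= 127 for d in deltas):
        # 1 tag byte + 8-byte base + eight 1-byte deltas = 17 bytes
        return b"\x01" + struct.pack("<Q", base) + struct.pack("<8b", *deltas)
    return b"\x00" + block                         # incompressible: tag + raw data


def decompress_block(data: bytes) -> bytes:
    """Invert compress_block; the result carries exactly the original information."""
    if data[0] == 0:
        return data[1:]
    base = struct.unpack_from("<Q", data, 1)[0]
    deltas = struct.unpack_from("<8b", data, 9)
    return struct.pack("<8Q", *(base + d for d in deltas))
```

A block of eight nearby pointers, such as `0x7FFF0000 + 8*i`, shrinks from 64 to 17 bytes and decompresses to the identical 64 bytes, so the bus would carry roughly a quarter of the data for the same information.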

However, main memory compression breaks the assumptions on which traditional memory systems are built. It therefore poses several challenges that, if not handled well, lead to significant complexity and performance degradation. To address them, we propose Flexible Memory, a new DRAM-based memory scheme.

The first challenge is addressing. Blocks no longer have a fixed size, so allocating each compressed block the same space and location it would occupy in traditional uncompressed memory would waste memory and negate the whole purpose of main memory compression. Therefore, memory blocks should be placeable in arbitrary order. However, traditional memory systems rely on simple, implicit address arithmetic to locate a block, which cannot easily be adapted to arbitrarily placed blocks.
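The tension described above can be sketched as follows. With fixed 64-byte slots, a block's offset follows from its index alone but compression saves no space; packing blocks back to back saves space, but each offset becomes a prefix sum of earlier compressed sizes, which the controller cannot derive from the address. The block sizes used in the example are made up for illustration (`accumulate(..., initial=...)` needs Python 3.8 or later).

```python
from itertools import accumulate
from typing import List, Tuple

BLOCK = 64  # assumed uncompressed block size in bytes


def fixed_slot_offset(index: int) -> int:
    """Traditional layout: the offset follows from the block index alone."""
    return index * BLOCK


def packed_offsets(sizes: List[int]) -> List[int]:
    """Back-to-back layout: each offset is the sum of all earlier compressed
    sizes, so it depends on the data rather than on the address alone."""
    return list(accumulate(sizes, initial=0))[:-1]


def footprints(sizes: List[int]) -> Tuple[int, int]:
    """(bytes used with fixed 64-byte slots, bytes used when packed)."""
    return len(sizes) * BLOCK, sum(sizes)
```

For compressed sizes `[17, 64, 9, 30]`, packing places the blocks at offsets `[0, 17, 81, 90]` and uses 120 bytes instead of 256, but locating block 3 now requires knowing the sizes of blocks 0 to 2.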

To address this challenge, we propose a new data structure, the BMT (Block Mapping Table), together with a set of supporting components. Its core data is an offset/size pair for every block in each compressed page. The BMT resides in main memory together with the memory pages, and we propose a fully associative BMT Cache to provide timely access to each page's BMT without doubling the number of memory accesses. The main memory controller can then read the in-page offset of each memory block from its BMT. We also add another layer of address mapping, after the traditional virtual-to-physical address translation, that supports various page sizes; we call the resulting address the virtual physical address. By combining the virtual physical address with BMT information, a block can easily be located in DRAM.
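A minimal Python sketch of this lookup path, under stated assumptions: 4 KiB pages, 64-byte blocks, a BMT stored as a list of `(offset, size)` pairs per page, and LRU replacement in the fully associative BMT Cache. The names `BMTCache`, `locate`, and `page_base` are hypothetical; `page_base` merely stands in for the virtual physical address mapping by giving the DRAM location where each variable-size page starts.

```python
from collections import OrderedDict
from typing import Dict, List, Tuple

PAGE_SIZE = 4096                       # assumed uncompressed page size (bytes)
BLOCK = 64                             # assumed block size (bytes)
BmtEntries = List[Tuple[int, int]]     # per page: (offset, size) for each block


class BMTCache:
    """Fully associative cache of per-page BMTs with LRU replacement (sketch)."""

    def __init__(self, capacity: int, bmt_in_dram: Dict[int, BmtEntries]):
        self.capacity = capacity
        self.bmt_in_dram = bmt_in_dram  # stands in for BMTs stored in main memory
        self.entries = OrderedDict()     # page number -> BMT, kept in LRU order
        self.misses = 0                  # each miss costs one extra DRAM read

    def lookup(self, page: int) -> BmtEntries:
        if page in self.entries:         # fully associative: any slot, any page
            self.entries.move_to_end(page)
            return self.entries[page]
        self.misses += 1
        bmt = self.bmt_in_dram[page]
        self.entries[page] = bmt
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used page
        return bmt


def locate(paddr: int, page_base: Dict[int, int],
           cache: BMTCache) -> Tuple[int, int]:
    """Map a physical address to (DRAM byte address, compressed size) of its block."""
    page, in_page = divmod(paddr, PAGE_SIZE)
    block = in_page // BLOCK             # the whole block is fetched, then decompressed
    offset, size = cache.lookup(page)[block]
    return page_base[page] + offset, size
```

Only BMT Cache misses add a memory access; while a page's BMT stays cached, every block in that page is located with a single DRAM read of the data itself.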

The second challenge is handling page reorganization. In compressed main memory, fat writes [33] are the most common and major source of overhead. A fat write arises because each block can have a different compressed size: a write request may produce a larger compressed block that no longer fits into its original, smaller slot, disrupting the block layout. To preserve correctness, blocks must be moved to adjust the layout and make room for the new block. Without proper data structures, policies, and logic support, this can incur high overhead.
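To see why fat writes are expensive when blocks must stay in sequential order, the sketch below counts how many blocks and bytes have to shift when block `i` grows. The page capacity and the example sizes are hypothetical, and the function name `fat_write_cost` is illustrative only.

```python
from typing import List, Tuple

PAGE_CAPACITY = 4096  # hypothetical space reserved for one compressed page


def fat_write_cost(sizes: List[int], i: int, new_size: int) -> Tuple[int, int, bool]:
    """For a page whose blocks are packed back to back in index order, return
    (blocks_moved, bytes_moved, overflow) caused by rewriting block i."""
    if new_size <= sizes[i]:
        return 0, 0, False                 # fits in its old slot: nothing moves
    later = sizes[i + 1:]                  # every later block must shift right
    grown = sum(sizes) - sizes[i] + new_size
    return len(later), sum(later), grown > PAGE_CAPACITY
```

With sizes `[17, 64, 9, 30]`, growing block 0 to 40 bytes shifts 3 blocks (103 bytes), while the same growth on the last block moves nothing; in the worst case the page overflows and must be relocated entirely.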

To tackle this challenge, we rely on the flexibility provided by the BMT. Because any block can be placed at any location with enough free space for its compressed size, rather than following a sequential order of any kind, we can move away any block that stands in the way of a layout adjustment. With minimal restrictions, the main memory controller can choose the most efficient block movement to make room for new blocks.
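One possible placement policy that this flexibility enables is sketched below: first-fit relocation of a grown block into any free gap of the page, which changes only that block's BMT entry and moves no other block. First-fit is an assumed policy chosen for illustration, not necessarily the one used by the controller; the function name is hypothetical.

```python
from typing import List, Optional, Tuple


def first_fit_relocate(bmt: List[Tuple[int, int]], i: int, new_size: int,
                       capacity: int) -> Optional[int]:
    """Return a new offset for block i (now new_size bytes) without moving any
    other block, or None if no free gap in the page is large enough."""
    off, size = bmt[i]
    if new_size <= size:
        return off                          # still fits in its current slot
    # Ranges occupied by every *other* block; block i's old space becomes free.
    used = sorted((o, o + s) for j, (o, s) in enumerate(bmt) if j != i)
    cursor = 0
    for start, end in used + [(capacity, capacity)]:  # sentinel closes last gap
        if start - cursor >= new_size:
            return cursor                   # first gap that is large enough
        cursor = max(cursor, end)
    return None                             # caller must move blocks or the page
```

For a BMT of `[(0, 17), (17, 30), (100, 9)]` in a 128-byte region, growing block 0 to 40 bytes places it at offset 47, in the hole left between existing blocks. Only when no gap is large enough does the controller need to move other blocks, and since the BMT imposes no ordering, it can pick whichever blocks are cheapest to move.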

The third challenge is extracting the most bandwidth benefit from main memory compression. Previously proposed main memory compression works rely on a DRAM Cache to hold additionally fetched data, which may serve future memory accesses. However, this relies on memory access locality: for applications with poor memory access patterns, DRAM Cache-based traffic reduction may not perform as well as expected.
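The dependence on locality can be illustrated with a toy model. Each DRAM burst is assumed to return an aligned group of `blocks_per_burst` consecutive compressed blocks (for example, four blocks at a uniform 4:1 compression ratio), and the extra blocks go into a small LRU buffer standing in for the DRAM Cache. All parameters are hypothetical.

```python
from collections import OrderedDict
from typing import Iterable


def count_bursts(trace: Iterable[int], blocks_per_burst: int,
                 buffer_blocks: int) -> int:
    """Count DRAM bursts needed to serve a trace of block numbers when each
    burst brings in an aligned group of consecutive compressed blocks and the
    extra blocks are kept in an LRU buffer of `buffer_blocks` entries."""
    buffer = OrderedDict()
    bursts = 0
    for blk in trace:
        if blk in buffer:                       # served without a DRAM access
            buffer.move_to_end(blk)
            continue
        bursts += 1
        first = blk - blk % blocks_per_burst    # start of the aligned group
        for b in range(first, first + blocks_per_burst):
            buffer[b] = None
            buffer.move_to_end(b)
        while len(buffer) > buffer_blocks:      # evict least recently used
            buffer.popitem(last=False)
    return bursts
```

A sequential scan of 64 blocks needs only 16 bursts with four blocks per burst, whereas a trace that touches every fourth block still needs all 64 bursts: the additionally fetched blocks are never used, so the traffic reduction disappears.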
