1. Memory technology & Hierarchy

(1)

1. Memory technology

& Hierarchy

RAM types

Advances in Computer Architecture

Andy D. Pimentel

(2)

Advances in Computer Architecture, Andy D. Pimentel

Memory wall

“Memory wall” = divergence between CPU and RAM speed

We can increase bandwidth by introducing concurrency in memory access (e.g. through pipelining accesses) but this requires regular access patterns

random accesses to main memory can cause severe performance degradation

(3)

Memory hierarchy - issues

Conflicting requirements in a memory system: we want both large ^and ^fast

Electronic systems have higher latencies as they increase in size

Speed of light is approximately 1ns for 30cms N.b. 1 ns is 3 clock cycles in a state of the art processor

Pins, wires, connectors etc. all add resistance and capacitance which delay signals significantly

A memory hierarchy attempts to make a large slow memory appear fast by

buffering data in smaller faster memories close to the processor

(4)

Memory hierarchy - issues

Can make memories faster but this requires more power Need to drive long wires on chip across memory array – to do this fast needs more power

Memory performance is a compromise between power and performance

(As is processor performance today)

(5)

RAM cell designs

(6)

Components - DRAM

DRAM - is a very dense form of RAM - it is volatile

access is destructive & data must be re-written

charge also leaks from the capacitor which stores the data so data must be refreshed periodically (“dynamic” RAM)

Typical DRAM chip characteristics

256-1024 Mbit and 2-800MHz cycle Uses two cycle Row/Column addressing (RAS/CAS)

two-stage access and requirement to rewrite data contribute to slow cycle time

but good design can help for regular accesses

(7)

DRAM - RAS/CAS addressing

(8)

DRAM refresh

Time Threshold

voltage

0 Stored

1 Written Refreshed Refreshed Refreshed

10s of ms before needing

refresh cycle

Voltage for 1

Voltage

for 0

(9)

Components - SRAM

SRAM - is less dense but faster than DRAM

uses four transistors to store data and two transistors to access it - access is non-destructive

data is stable while the RAM has power connected (“static” RAM) Typical SRAM chip characteristics

~64Mbit and 100MHz-1GHz cycle

although SRAM cycle times are similar to DRAM, SRAM is true random access memory  

DRAM can only read consecutive bits at the cycle rate, therefore DRAM has a much larger latency time SRAM is also used for memory on the processor chip

registers and cache both use SRAM technology

the smaller the memory the “faster” it operates: lower latency

(10)

New technology

Memristor a circuit element in which the resistance is a function of the history of the current through the device

Memristor theory was formulated and named by Leon Chua in 1971 

Ref: Chua, Leon O (1971), "Memristor -The Missing Circuit Element", IEEE Transactions on Circuit Theory CT-18 (5): 507–519, doi:10.1109/TCT.1971.1083337

HP announced they teamed up with Hynix to produce a commercial product dubbed "ReRam" August 2010 

http://www.reuters.com/article/idUS254583059320100901

has the potential to be dense and non-volatile

(11)

Bandwidth vs. Latency

Memory latency is the time delay required to obtain a specific item of data This is larger in DRAM than in SRAM

SRAM can access any bit each cycle

DRAM is restricted to bits in the same row, CAS cycles

Memory Bandwidth is the rate at which data can be accessed (e.g. bits per second)

Bandwidth unit is normally 1/cycle time

This rate can be improved by concurrent access

(12)

Improving DRAM bandwidth

Using locality to get maximum bandwidth One RAS multiple CAS e.g.

Fast page mode DRAM

E(xtended) D(ata) O(utput) RAMs Burst-mode DRAMs

Burst EDO RAM

S(ynchronous)DRAM By improving the interface

DDR SDRAM and RAMBUS

(13)

Example - Synchronous DRAM

SDRAM changed the memory interface from asynchronous to synchronous and uses a form of pipelining – will return to this concept later

DDR SDRAM - double data rate uses transfers on both rising and

falling clock edges

(14)

Different SDRAM technologies

Advanced memory technologies

Despite the performance improvement in the overall system due to use of SDRAM, the growing performance gap between the memory and processor must be filled by more advanced memory technologies. These technologies, which are described on the following pages, boost the overall performance of systems using the latest high-speed processors (Figure 8).

Figure 8. Peak bandwidth comparison of SDRAM and advanced SDRAM technologies

Double Data Rate SDRAM technologies

Double Data Rate (DDR) SDRAM is advantageous for systems that require higher bandwidth than can be obtained using SDRAM. Basically, DDR SDRAM doubles the transfer rate without increasing the frequency of the memory clock. This section describes three generations of DDR SDRAM technology.

DDR-1

To develop the first generation of DDR SDRAM (DDR-1), designers made enhancements to the SDRAM core to increase the data rate. These enhancements include prefetching, double transition clocking, strobe-based data bus, and SSTL_2 low voltage signaling. At 400 MHz, DDR increases memory bandwidth to 3.2 GB/s, which is 400 percent more than original SDRAM.

Prefetching

In SDRAM, one bit per clock cycle is transferred from the memory cell array to the input/output (I/O) buffer or data queue (DQ). The I/O buffer releases one bit to the bus per pin and clock cycle (on the rising edge of the clock signal). To double the data rate, DDR SDRAM uses a technique called

prefetching to transfer two bits from the memory cell array to the I/O buffer in two separate pipelines.

Then the I/O buffer releases the bits in the order of the queue on the same output line. This is known as a 2n-prefetch architecture because the two data bits are fetched from the memory cell array before they are released to the bus in a time multiplexed manner.

10

(15)

RAMBUS DRAMs

This is an interface improvement using a pipelined bus interface sometimes called a split-transaction

Bus comprises row and column address line + 18 bits of data

3 transactions on bus simultaneously (RAS/CAS/Data) High clock rate (400MHz) with data transfers on both edges

Note that neither technique (SDRAM or RAMBUS) can

improve the latency to access a single item of data

(16)

Flash memory

S o u r c e l i n e s

B i t l i n e s

Word lines

n+

n−

p subs- trate

Control gate Floating gate Source

Drain

(17)

Memory wall – solutions

The most common solution to the memory wall is to cache data

requires locality of access or memory reuse

compiler optimisations can help to localise data

Can also design banked memory systems to provide high bandwidth to random memory locations

Some access patterns will still break the memory Can design processors that tolerate high-latency memory accesses  

“don’t wait do something else”

Requires concurrency in instruction execution

(18)

1. Memory technology & Hierarchy