• No results found

1. Memory technology & Hierarchy

N/A
N/A
Protected

Academic year: 2022

Share "1. Memory technology & Hierarchy"

Copied!
19
0
0

Loading.... (view fulltext now)

Full text

(1)

1. Memory technology

& Hierarchy

RAM types

Advances in Computer Architecture

Andy D. Pimentel

(2)

Advances in Computer Architecture, Andy D. Pimentel

Memory wall

“Memory wall” = divergence between CPU and RAM speed

We can increase bandwidth by introducing concurrency in memory access (e.g. through pipelining accesses) but this requires regular access patterns

random accesses to main memory can cause severe performance degradation

(3)

Advances in Computer Architecture, Andy D. Pimentel

Memory hierarchy - issues

Conflicting requirements in a memory system: we want both large and fast

Electronic systems have higher latencies as they increase in size

Speed of light is approximately 1ns for 30cms N.b. 1 ns is 3 clock cycles in a state of the art processor

Pins, wires, connectors etc. all add resistance and capacitance which delay signals significantly

A memory hierarchy attempts to make a large slow memory appear fast by

buffering data in smaller faster memories close to the processor

(4)

Advances in Computer Architecture, Andy D. Pimentel

Memory hierarchy - issues

Can make memories faster but this requires more power Need to drive long wires on chip across memory array – to do this fast needs more power

Memory performance is a compromise between power and performance

(As is processor performance today)

(5)

Advances in Computer Architecture, Andy D. Pimentel

RAM cell designs

(6)

Advances in Computer Architecture, Andy D. Pimentel

Components - DRAM

DRAM - is a very dense form of RAM - it is volatile

access is destructive & data must be re-written

charge also leaks from the capacitor which stores the data so data must be refreshed periodically (“dynamic” RAM)

Typical DRAM chip characteristics

256-1024 Mbit and 2-800MHz cycle Uses two cycle Row/Column addressing (RAS/CAS)

two-stage access and requirement to rewrite data contribute to slow cycle time

but good design can help for regular accesses

(7)

Advances in Computer Architecture, Andy D. Pimentel

DRAM - RAS/CAS addressing

(8)

Advances in Computer Architecture, Andy D. Pimentel

DRAM refresh

Time Threshold

voltage

0 Stored

1 Written Refreshed Refreshed Refreshed

10s of ms before needing

refresh cycle

Voltage for 1

Voltage

for 0

(9)

Advances in Computer Architecture, Andy D. Pimentel

Components - SRAM

SRAM - is less dense but faster than DRAM

uses four transistors to store data and two transistors to access it - access is non-destructive

data is stable while the RAM has power connected (“static” RAM) Typical SRAM chip characteristics

~64Mbit and 100MHz-1GHz cycle

although SRAM cycle times are similar to DRAM, SRAM is true random access memory 


DRAM can only read consecutive bits at the cycle rate, therefore DRAM has a much larger latency time SRAM is also used for memory on the processor chip

registers and cache both use SRAM technology

the smaller the memory the “faster” it operates: lower latency

(10)

Advances in Computer Architecture, Andy D. Pimentel

New technology

Memristor a circuit element in which the resistance is a function of the history of the current through the device

Memristor theory was formulated and named by Leon Chua in 1971


Ref: Chua, Leon O (1971), "Memristor -The Missing Circuit Element", IEEE Transactions on Circuit Theory CT-18 (5): 507–519, doi:10.1109/TCT.1971.1083337

HP announced they teamed up with Hynix to produce a commercial product dubbed "ReRam" August 2010


http://www.reuters.com/article/idUS254583059320100901

has the potential to be dense and non-volatile

(11)

Advances in Computer Architecture, Andy D. Pimentel

Bandwidth vs. Latency

Memory latency is the time delay required to obtain a specific item of data This is larger in DRAM than in SRAM

SRAM can access any bit each cycle

DRAM is restricted to bits in the same row, CAS cycles

Memory Bandwidth is the rate at which data can be accessed (e.g. bits per second)

Bandwidth unit is normally 1/cycle time

This rate can be improved by concurrent access

(12)

Advances in Computer Architecture, Andy D. Pimentel

Improving DRAM bandwidth

Using locality to get maximum bandwidth One RAS multiple CAS e.g.

Fast page mode DRAM

E(xtended) D(ata) O(utput) RAMs Burst-mode DRAMs

Burst EDO RAM

S(ynchronous)DRAM By improving the interface

DDR SDRAM and RAMBUS

(13)

Advances in Computer Architecture, Andy D. Pimentel

Example - Synchronous DRAM

SDRAM changed the memory interface from asynchronous to synchronous and uses a form of pipelining – will return to this concept later

DDR SDRAM - double data rate uses transfers on both rising and

falling clock edges

(14)

Advances in Computer Architecture, Andy D. Pimentel

Different SDRAM technologies

Advanced memory technologies

Despite the performance improvement in the overall system due to use of SDRAM, the growing performance gap between the memory and processor must be filled by more advanced memory technologies. These technologies, which are described on the following pages, boost the overall performance of systems using the latest high-speed processors (Figure 8).

Figure 8. Peak bandwidth comparison of SDRAM and advanced SDRAM technologies

Double Data Rate SDRAM technologies

Double Data Rate (DDR) SDRAM is advantageous for systems that require higher bandwidth than can be obtained using SDRAM. Basically, DDR SDRAM doubles the transfer rate without increasing the frequency of the memory clock. This section describes three generations of DDR SDRAM technology.

DDR-1

To develop the first generation of DDR SDRAM (DDR-1), designers made enhancements to the SDRAM core to increase the data rate. These enhancements include prefetching, double transition clocking, strobe-based data bus, and SSTL_2 low voltage signaling. At 400 MHz, DDR increases memory bandwidth to 3.2 GB/s, which is 400 percent more than original SDRAM.

Prefetching

In SDRAM, one bit per clock cycle is transferred from the memory cell array to the input/output (I/O) buffer or data queue (DQ). The I/O buffer releases one bit to the bus per pin and clock cycle (on the rising edge of the clock signal). To double the data rate, DDR SDRAM uses a technique called

prefetching to transfer two bits from the memory cell array to the I/O buffer in two separate pipelines.

Then the I/O buffer releases the bits in the order of the queue on the same output line. This is known as a 2n-prefetch architecture because the two data bits are fetched from the memory cell array before they are released to the bus in a time multiplexed manner.

10

(15)

Advances in Computer Architecture, Andy D. Pimentel

RAMBUS DRAMs

This is an interface improvement using a pipelined bus interface sometimes called a split-transaction

Bus comprises row and column address line + 18 bits of data

3 transactions on bus simultaneously (RAS/CAS/Data) High clock rate (400MHz) with data transfers on both edges

Note that neither technique (SDRAM or RAMBUS) can

improve the latency to access a single item of data

(16)

Advances in Computer Architecture, Andy D. Pimentel

Flash memory

S o u r c e l i n e s

B i t l i n e s

Word lines

n+

n−

p subs- trate

Control gate Floating gate Source

Drain

(17)

Advances in Computer Architecture, Andy D. Pimentel

Memory wall – solutions

The most common solution to the memory wall is to cache data

requires locality of access or memory reuse

compiler optimisations can help to localise data

Can also design banked memory systems to provide high bandwidth to random memory locations

Some access patterns will still break the memory Can design processors that tolerate high-latency memory accesses 


“don’t wait do something else”

Requires concurrency in instruction execution

(18)

Advances in Computer Architecture, Andy D. Pimentel

Summary

Need for new dense + fast technology Most common solution: cache data

Requires locality of access, or memory reuse

Design processors that tolerate high-latency memory accesses – “don’t wait do something else”

Trend: hardware multithreading 


in current and future chips

(19)

Summary

Memory wall

Limits of RAM components

New components (?)

Caches

& hw multithreading

cause

problem

solutions

References

Related documents

States that the course is No Longer Offered for this award but it still exists in WECM. States that the course is no longer

Vertical flux of zonal and meridional momentum flux was calculated using the radial velocity variances of the east-west and north-south beam pairs (i.e. in the four-beam

Remove the axle mounting screw and locknut that secure the patient operated wheel lock, spacers, and rear wheel to the wheelchair frame3. Secure rear wheel, patient operated

The BMD16N is fully compatible with the s88-N standard, It can be connected directly to any equip- ment bearing the s88-N logo.. Less chance of

Pro obličeje jsou zřejmě vhodnější metody pracující s lokální charakteristikou okolí (jako jsou Haarovy příznaky nebo LBP), než detekce na základě gradientů.

1.2 The resolution consolidates the precepts of Nottinghamshire County Council, Nottinghamshire Police and Crime Commissioner, Nottinghamshire Fire Authority,

Recommendation 11: when looking at options for long term care services for First nations people living on-Reserve, consideration should be given to establishing

a) To establish a mathematical model to investigate Casson fluid model with the effects of magnetic field and nano-particles yields a moving cylinder. b) To introduce an