Memory Technology
Computer Science 104
Lecture 16
© Alvin R. Lebeck CPS 104
Administrivia
• Midterm II Next Wednesday
• HW 4 due Friday
Outline
• Review
• Big Picture of Memory
• Memory Technology: SRAM, DRAM
Reading: C.9
Today’s Lecture: Memory
The Five Steps of a Load Instruction
[Figure: timing diagram of a load instruction. The five steps are Instruction Fetch, Instr Decode / Reg Fetch, Address, Data Memory, and Reg Wr. Signals shown changing from Old Value to New Value after the clock edge: PC; Rs, Rt, Rd, Op, Func; ALUctr; RegWr; ExtOp; ALUSrc; busA; busB; Address; busW. Delays labeled: Clk-to-Q, Instruction Memory Access Time, Delay through Control Logic, Register File Access Time, Delay through Extender & Mux, ALU Delay, Data Memory Access Time, Register File Write Time.]
A Pipelined Datapath
[Figure: pipelined datapath with stages Ifetch, Reg/Dec, Exec, Mem, and WrB, separated by the IF/ID, ID/Ex, Ex/Mem, and Mem/Wr registers. Components shown: PC, IUnit, RFile (Ra, Rb, Rw, Di), Exec Unit (busA, busB, Imm16), and Data Mem (RA, WA, Di, Do). Control signals shown: RegWr, ExtOp, ALUOp, ALUSrc, RegDst, MemWr, MemtoReg, Branch, Zero.]
A More Extensive Pipelining Example
Clock cycle:              1       2        3        4        5        6     7     8
0: Load                   Ifetch  Reg/Dec  Exec     Mem      WrB
4: R-type                         Ifetch   Reg/Dec  Exec     Mem      WrB
8: Store                                   Ifetch   Reg/Dec  Exec     Mem   WrB
12: Beq (target is 1000)                            Ifetch   Reg/Dec  Exec  Mem   WrB
• End of Cycle 4: Load’s Mem, R-type’s Exec, Store’s Reg, Beq’s Ifetch
• End of Cycle 5: Load’s WrB, R-type’s Mem, Store’s Exec, Beq’s Reg
• End of Cycle 6: R-type’s WrB, Store’s Mem, Beq’s Exec
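The end-of-cycle snapshots can be reproduced with a few lines of Python. This is only a sketch: it assumes an ideal pipeline with no stalls, one instruction issued per cycle, and the function and variable names are made up for illustration.

```python
# Stage names as used on the slide.
STAGES = ["Ifetch", "Reg/Dec", "Exec", "Mem", "WrB"]

# (name, cycle in which the instruction is fetched); cycles are 1-based.
PROGRAM = [("Load", 1), ("R-type", 2), ("Store", 3), ("Beq", 4)]


def stage_in_cycle(issue_cycle, cycle):
    """Return the stage an instruction occupies in `cycle`, or None.

    Assumption of this sketch: no stalls, so an instruction simply
    advances one stage per clock cycle.
    """
    index = cycle - issue_cycle
    if 0 <= index < len(STAGES):
        return STAGES[index]
    return None


def snapshot(cycle):
    """Map each in-flight instruction to the stage it is in during `cycle`."""
    result = {}
    for name, issue_cycle in PROGRAM:
        stage = stage_in_cycle(issue_cycle, cycle)
        if stage is not None:
            result[name] = stage
    return result


if __name__ == "__main__":
    for cycle in (4, 5, 6):
        print("End of cycle", cycle, snapshot(cycle))
```

In cycle 6 the Load no longer appears in the snapshot because it finished its WrB stage in cycle 5.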
Initial Representation: Finite State Diagram
[Figure: finite state diagram of the control, with numbered states Ifetch, Rfetch/Decode, BrComplete, RExec, Rfinish, OriExec, OriFinish, AdrCal, SWMem, LWmem, and LWwr, plus Wait states. Each state lists the control signals set to 1, the ALUOp and ALUSelB values, and the don't-care (x) signals. Transitions are labeled beq, Rtype, Ori, lw or sw, lw, and sw.]
Big Picture
• The Five Classic Components of a Computer
    Processor (Control, Datapath), Memory, Input, Output
• Today’s Topic: Memory Technology
Where Are We?
[Figure: layered view of a computer system. Software: Application, Operating System, Compiler, Firmware. Interface between HW and SW: Instruction Set Architecture, Memory, I/O. Hardware: CPU, Memory, I/O system, Digital Design, Circuit Design. A "You are here" marker points at Memory.]
Review: Program’s View of Memory
• Memory is a large linear array of bytes.
    Each byte has a unique address (location).
    Example: one byte of data at address 0x100 and another at 0x101
• Most computers have instructions with byte (8-bit) addressing.
• Data may have to be aligned on a word (4 byte) or double word (8 byte) boundary.
    int is 4 bytes
    double precision floating point is 8 bytes
• 32-bit vs. 64-bit addresses
    we will assume 32-bit for the rest of the course, unless otherwise stated
[Figure: memory drawn as a column of bytes at byte addresses 0, 1, 2, 3, 4, ..., 2^n-1, with example data 00110110 and 00001100.]
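As a small illustration of byte addressing and alignment, the sketch below checks whether a byte address sits on a word or double word boundary and converts a byte address to a word index. The helper names are hypothetical, and the 4-byte word size is the one stated on the slide.

```python
WORD_BYTES = 4          # word size from the slide
DOUBLE_WORD_BYTES = 8   # double word size from the slide


def is_aligned(byte_address, size):
    """True if `byte_address` is a multiple of `size` bytes."""
    return byte_address % size == 0


def word_index(byte_address, word_bytes=WORD_BYTES):
    """Index of the word that contains `byte_address`."""
    return byte_address // word_bytes


def align_down(byte_address, size):
    """Largest aligned address that is not above `byte_address`."""
    return byte_address - (byte_address % size)


if __name__ == "__main__":
    for addr in (0x100, 0x101, 0x104):
        print(hex(addr),
              "word aligned:", is_aligned(addr, WORD_BYTES),
              "double word aligned:", is_aligned(addr, DOUBLE_WORD_BYTES))
```

Under these definitions an int could be stored at 0x100 or 0x104, while a double could be stored at 0x100 but not at 0x104.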
[Figure: the same memory labeled by word address.]
Our Naïve View of Memory (Single Cycle)
[Figure: single-cycle datapath with a PC, an Ideal Instruction Memory (Instruction Address in, Instruction out), a register file of 32 32-bit registers (Rw, Ra, Rb; 5-bit Rd, Rs, Rt fields; 16-bit Imm), an ALU with inputs A and B, and an Ideal Data Memory (Address, Data In, DataOut), all clocked by Clk.]
Question
• What issues do we need to worry about in
System Organization
[Figure: the CPU and Cache connect through the Core Chip Set (I/O Bridge) to the Memory Bus, which leads to Memory, and to the I/O Bus, which leads to the Disk Controller (Disks), Graphics Controller (Graphics), and Network Interface (Network). Devices signal the CPU with interrupts. Part of the diagram is labeled "The memory hierarchy".]
Processor and Caches
[Figure: Processor (Control, Datapath, Registers), Level One Cache, and Level Two Cache, with a link labeled "To main memory".]
Main Memory
[Figure: a Memory Controller, connected to the processor, drives a Memory Bus with DIMM Slots 0 to 7. Each DRAM DIMM carries multiple DRAM chips.]
Why is it called DRAM?
Memory Technology
• Random Access:
    “Random” is good: access time is the same for all locations
    DRAM: Dynamic Random Access Memory
    » High density, low power, cheap, slow
    » Dynamic: needs to be “refreshed” regularly
    » Main memory
    SRAM: Static Random Access Memory
    » Low density, high power, expensive, fast
    » Static: content will last “forever” (until power loss)
    » Caches
• “Not-so-random” Access Technology:
    Access time varies from location to location and from time to time
    Examples: Disk, DVD/CD
• Sequential Access Technology: access time linear in location (e.g., Tape)
Random Access Memory (RAM) Technology
• Why do computer professionals need to know about RAM technology?
    Processor performance is usually limited by memory latency and bandwidth.
    Latency: The time it takes to access a single word in memory.
    Bandwidth: The average speed of access to memory (Words/Sec).
    As integrated circuit (IC) densities increase, lots of memory will fit on the processor chip
    » Tailor on-chip memory to specific needs.
      - Instruction cache
      - Data cache
      - Write buffer
• What makes RAM different from a bunch of flip-flops?
    Density: RAM is much more dense
    Speed: RAM access is slower than flip-flop (register) access.
DRAM
Year   Size      Cycle Time
1980   64 Kb     250 ns
1983   256 Kb    220 ns
1986   1 Mb      190 ns
1989   4 Mb      165 ns
1992   16 Mb     145 ns
1995   64 Mb     120 ns
1999   128 Mb    100 ns
2003   256 Mb    100 ns
2007   2 Gb      55 ns
2010   2 Gb      20 ns

          Capacity         Speed
Logic:    2x in 3 years    2x in 3 years
DRAM:     4x in 3 years    1.4x in 10 years
Disk:     2x in 3 years    1.4x in 10 years

Size: 1000:1!   Cycle time: 2:1!   (these ratios match the 1980 to 1995 rows)
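The two ratios can be checked directly against the table. The sketch below copies the table rows into a Python list (sizes converted to bits with binary prefixes, which is an assumption about how Kb, Mb, and Gb are meant) and computes the capacity and cycle-time ratios between any two years.

```python
KB, MB, GB = 2**10, 2**20, 2**30

# (year, size in bits, cycle time in ns), copied from the DRAM table.
DRAM_GENERATIONS = [
    (1980, 64 * KB, 250),
    (1983, 256 * KB, 220),
    (1986, 1 * MB, 190),
    (1989, 4 * MB, 165),
    (1992, 16 * MB, 145),
    (1995, 64 * MB, 120),
    (1999, 128 * MB, 100),
    (2003, 256 * MB, 100),
    (2007, 2 * GB, 55),
    (2010, 2 * GB, 20),
]


def ratios(first_year, last_year):
    """Return (capacity ratio, cycle-time ratio) between two table rows."""
    by_year = {year: (size, cycle) for year, size, cycle in DRAM_GENERATIONS}
    first_size, first_cycle = by_year[first_year]
    last_size, last_cycle = by_year[last_year]
    return last_size // first_size, first_cycle / last_cycle


if __name__ == "__main__":
    capacity, speed = ratios(1980, 1995)
    print("1980 to 1995: capacity %d:1, cycle time %.2f:1" % (capacity, speed))
```

For 1980 to 1995 this gives a capacity ratio of 1024:1 and a cycle-time ratio of about 2.08:1, which is where the rounded 1000:1 and 2:1 come from.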
Static RAM Cell
6-Transistor SRAM Cell
[Figure: six-transistor cell storing a value and its complement (1 0 or 0 1), connected to the bit and bit-bar lines and gated by the word (row select) line.]
• Write:
    1. Drive bit lines (bit=1, bit-bar=0)
    2. Select row
• Read:
    1. Precharge bit and bit-bar to Vdd (set to 1)
    2. Select row
    3. Cell pulls one line low (pulls to 0)
    4. Sense amp on column detects difference between bit and bit-bar
Typical SRAM Organization: 16-word x 4-bit
[Figure: array of SRAM cells, 16 rows by 4 columns. An Address Decoder takes A0 to A3 and selects one of the word lines Word 0 to Word 15. Each column has a Wr Driver & Precharger fed by Din 0 to Din 3, controlled by WrEn and Precharge, and a Sense Amp producing Dout 0 to Dout 3.]
Logic Diagram of a Typical SRAM
[Figure: block labeled "2^N words x M bit SRAM" with an N-bit address input A, an M-bit data pin D, and control inputs WE_L and OE_L.]
• Write Enable is usually active low (WE_L)
• Din and Dout are combined to save pins:
    A new control signal, output enable (OE_L), is needed
    WE_L is asserted (Low), OE_L is deasserted (High)
    » D serves as the data input pin
    WE_L is deasserted (High), OE_L is asserted (Low)
    » D is the data output pin
    Both WE_L and OE_L are asserted:
    » Result is unknown. Don’t do that!!!
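The WE_L / OE_L rules can be captured in a small behavioral model. This is a sketch, not a description of any real part: the class name and method are hypothetical, timing is ignored, a high-impedance D pin is modeled as None, and the "unknown result" case is modeled by raising an error.

```python
class SramModel:
    """Behavioral model of a 2^N words x M bit SRAM with a shared D pin."""

    def __init__(self, n_addr_bits, m_data_bits):
        self.words = [0] * (1 << n_addr_bits)
        self.data_mask = (1 << m_data_bits) - 1

    def access(self, addr, we_l, oe_l, d_in=None):
        """Apply one access. Signals are active low: 0 means asserted.

        Returns the value driven on D for a read, or None when the
        SRAM is not driving D (write or idle).
        """
        writing = (we_l == 0)
        reading = (oe_l == 0)
        if writing and reading:
            # The slide says the result is unknown; this sketch refuses.
            raise ValueError("WE_L and OE_L must not both be asserted")
        if writing:
            if d_in is None:
                raise ValueError("a write needs data on D")
            self.words[addr] = d_in & self.data_mask
            return None
        if reading:
            return self.words[addr]
        return None  # neither asserted: D is high impedance


if __name__ == "__main__":
    sram = SramModel(n_addr_bits=4, m_data_bits=4)   # 16-word x 4-bit
    sram.access(3, we_l=0, oe_l=1, d_in=0b1010)      # write
    print(bin(sram.access(3, we_l=1, oe_l=0)))       # read back
```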
Typical SRAM Timing
[Figure: timing diagrams for the 2^N words x M bit SRAM (pins A, D, WE_L, OE_L). Write Timing: the Write Address on A and Data In on D are held around the WE_L pulse, with the Write Setup Time and Write Hold Time marked. Read Timing: D goes from High Z through Junk to Data Out, with valid Data Out appearing one Read Access Time after each Read Address.]
Introduction to DRAM
• Dynamic RAM (DRAM):
    Refresh required
    Very high density
    Low power (0.1 - 0.5 W active, 0.25 - 10 mW standby)
    Low cost per bit
    Pin sensitive (few pins):
    » Output Enable (OE_L)
    » Write Enable (WE_L)
    » Row address strobe (ras)
    » Col address strobe (cas)
[Figure: N x N bit cell array with row select, sense amps (SA) and column select, a log2 N-bit addr input, data pin D, and control inputs WE_L and OE_L.]
[Figure: DRAM cell connected to the bit line and gated by row select.]
• Write:
    1. Drive bit line
    2. Select row
• Read:
    1. Precharge bit line to Vdd (1)
    2. Select row
    3. Cell and bit line share charges
    » Very small voltage changes on the bit line
    4. Sense (fancy sense amp)
    » Can detect changes of ~1 million electrons
    5. Write: restore the value
• Refresh
    1. Just do a dummy read to every cell.
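Why a read must end with a write-back, and why a dummy read works as a refresh, can be seen in a toy model of one cell. Everything numeric here is an assumption made for illustration (charge normalized to 1.0, a 0.5 sensing threshold, an arbitrary leak fraction); real cells are analog and are not described by these values.

```python
class DramCellModel:
    """Toy model of a DRAM cell: a stored charge that leaks over time."""

    FULL_CHARGE = 1.0   # normalized charge for a stored 1 (assumption)
    THRESHOLD = 0.5     # sense-amp decision point (assumption)

    def __init__(self):
        self.charge = 0.0

    def write(self, bit):
        """Drive the bit line and select the row."""
        self.charge = self.FULL_CHARGE if bit else 0.0

    def leak(self, fraction):
        """Lose `fraction` of the remaining charge (time passing)."""
        self.charge *= (1.0 - fraction)

    def read(self):
        """Sense the cell, then restore the value (step 5 on the slide)."""
        bit = 1 if self.charge > self.THRESHOLD else 0
        self.charge = 0.0   # charge sharing disturbs the stored value
        self.write(bit)     # write back what was sensed
        return bit

    def refresh(self):
        """A refresh is just a dummy read."""
        self.read()


if __name__ == "__main__":
    cell = DramCellModel()
    cell.write(1)
    cell.leak(0.3)
    cell.refresh()          # charge is back to full
    cell.leak(0.3)
    print(cell.read())      # still reads 1
```

Without the refresh between the two leak steps, the charge in this model would fall below the threshold and the stored 1 would be read as 0.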
Classical DRAM Organization (square)
[Figure: square RAM Cell Array. A row decoder driven by the row address asserts one word (row) select line. The bit (data) lines run to the Sense-Amps, Column Selector & I/O Circuits, which take the Column Address and produce data.]
• Row and Column Address together:
    Select 1 bit at a time
Each intersection represents a 1-T DRAM Cell
Typical DRAM Organization
• Typical DRAMs: access multiple bits in parallel
    Example: 2 Mb DRAM = 256K x 8 = 512 rows x 512 cols x 8 bits
    Row and column addresses are applied to all 8 planes in parallel
[Figure: eight planes, Plane 0 to Plane 7, each one “plane” of 256 Kb DRAM (512 rows x 512 cols) driving one data bit, D<0> to D<7>.]
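The 256K x 8 example can be sketched as eight bit planes that all receive the same row and column address. The class and function names are hypothetical, and the choice of which address bits form the row (upper 9 bits) and the column (lower 9 bits) is an assumption of this sketch, not something the slide specifies.

```python
ROWS = 512
COLS = 512
PLANES = 8
COL_BITS = 9   # log2(512)


def split_address(addr):
    """Split an 18-bit location address into (row, column).

    Assumption: the upper 9 bits select the row, the lower 9 the column.
    """
    if not 0 <= addr < ROWS * COLS:
        raise ValueError("address out of range")
    return addr >> COL_BITS, addr & (COLS - 1)


class Dram256Kx8:
    """Eight planes of 512 x 512 bits, one plane per data bit D<i>."""

    def __init__(self):
        # One byte per stored bit keeps the sketch simple.
        self.planes = [bytearray(ROWS * COLS) for _ in range(PLANES)]

    def write(self, addr, value):
        row, col = split_address(addr)
        for i in range(PLANES):
            # The same row and column go to every plane in parallel.
            self.planes[i][row * COLS + col] = (value >> i) & 1

    def read(self, addr):
        row, col = split_address(addr)
        value = 0
        for i in range(PLANES):
            value |= self.planes[i][row * COLS + col] << i
        return value


if __name__ == "__main__":
    dram = Dram256Kx8()
    dram.write(513, 0xA5)
    print(split_address(513), hex(dram.read(513)))
```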
Increasing Bandwidth - Interleaving
Access Pattern without Interleaving:
[Figure: CPU and a single Memory. The CPU starts the access for D1; D1 becomes available; the access for D2 can start only after the full Cycle Time.]
Access Pattern with 4-way Interleaving:
[Figure: CPU and Memory Banks 0 to 3. The CPU accesses Bank 0, Bank 1, Bank 2, and Bank 3 one after another; after that, Bank 0 can be accessed again.]
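The benefit of interleaving can be estimated with a simple timing model. The sketch below assumes that consecutive addresses map to consecutive banks (bank = address mod number of banks), that the CPU can start one access per time unit, and that each bank is busy for a full cycle time after an access starts. The function name and the example numbers are made up for illustration.

```python
def total_time(addresses, n_banks, access_time, cycle_time, issue_time=1):
    """Time until the last requested word is available.

    access_time: delay from starting an access to data being available.
    cycle_time:  minimum time between two accesses to the same bank.
    issue_time:  minimum time between two accesses started by the CPU.
    """
    bank_free = [0] * n_banks   # earliest start time for each bank
    next_issue = 0              # earliest time the CPU can start an access
    finish = 0
    for addr in addresses:
        bank = addr % n_banks
        start = max(next_issue, bank_free[bank])
        bank_free[bank] = start + cycle_time
        next_issue = start + issue_time
        finish = max(finish, start + access_time)
    return finish


if __name__ == "__main__":
    words = range(8)  # eight consecutive words
    print("no interleaving:", total_time(words, 1, access_time=3, cycle_time=4))
    print("4-way interleaving:", total_time(words, 4, access_time=3, cycle_time=4))
```

With these example numbers, eight consecutive words take 31 time units from a single bank and 10 time units with four banks. If every access hits the same bank, for example addresses 0, 4, 8, 12 with four banks, the model gives no improvement over a single bank.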
MICRON 2Gb DRAM (512Mx4, circa 2010)
Fast Memory Systems: DRAM specific
• Modern DRAMs
    Synchronous DRAM (SDRAM): Provide a clock signal to DRAM, transfer synchronous to system clock
    Double Data Rate DRAM (DDR, DDR2, DDR3)
    » transfer data on both clock edges
    Also RAMBUS
    » Each chip a module vs. slice of memory
    » Short bus between CPU and chips
    » Does own refresh
    » Variable amount of data returned
    » 1 byte / 2 ns (500 MB/s per chip)
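The 500 MB/s figure follows directly from 1 byte every 2 ns, and "both clock edges" means two transfers per clock period. The sketch below works through both pieces of arithmetic. The function names are hypothetical, and the 100 MHz clock in the usage example is an arbitrary value chosen for illustration, not a figure from the lecture.

```python
def bandwidth_mb_per_s(bytes_per_transfer, ns_per_transfer):
    """Bandwidth in MB/s (1 MB = 10^6 bytes).

    1 byte per ns is 10^9 bytes/s = 1000 MB/s, hence the factor 1000.
    """
    return bytes_per_transfer * 1000 / ns_per_transfer


def transfers_per_microsecond(clock_mhz, double_data_rate):
    """Transfers per microsecond for a given clock.

    A clock of f MHz has f rising edges per microsecond; a double data
    rate part also transfers on the falling edges.
    """
    edges_per_cycle = 2 if double_data_rate else 1
    return clock_mhz * edges_per_cycle


if __name__ == "__main__":
    print(bandwidth_mb_per_s(1, 2), "MB/s per chip")
    print(transfers_per_microsecond(100, double_data_rate=True),
          "transfers per microsecond at an example 100 MHz clock")
```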
Summary of Memory Technology
• DRAM is slow but cheap and dense:
    Good choice for presenting the user with a BIG memory system
    Uses one transistor, must be refreshed.
• SRAM is fast but expensive and not very dense:
    Good choice for providing the user FAST access time.
    Uses six transistors, holds state as long as power is supplied.
• GOAL:
    Present the user with large amounts of memory using the cheapest technology.
    Provide access at the speed offered by the fastest technology.