Memory Technology
Computer Science 104
Lecture 16
© Alvin R. Lebeck
Administrivia
• Midterm II Next Monday
• Memory
Today’s Lecture
Outline
• Review
• Big Picture of Memory
• Memory Technology
    SRAM
    DRAM
Reading
    C.9
The Five Steps of a Load Instruction
[Timing diagram: Instruction Fetch, Instr Decode / Reg Fetch, Address, Data Memory, Reg Wr. Signals shown changing from old value to new value: PC; Rs, Rt, Rd, Op, Func; ALUctr; RegWr; ExtOp; ALUSrc; busA; busB; Address. Delays marked: Clk-to-Q, Instruction Memory Access Time, Delay through Control Logic, Register File Access Time, Delay through Extender & Mux, ALU Delay, Data Memory Access Time.]
A Pipelined Datapath
[Figure: pipelined datapath with stages Ifetch, Reg/Dec, Exec, Mem, WrB, separated by the IF/ID, ID/Ex, Ex/Mem, and Mem/Wr registers. Blocks shown: PC, IUnit, RFile (Ra, Rb, Rw, Di), Exec Unit (busA, busB, Imm16), Data Mem (RA, WA, Di, Do), and muxes. Control signals shown: RegWr, ExtOp, ALUOp, ALUSrc, RegDst, MemWr, MemtoReg, Branch, Zero, Clk.]
A More Extensive Pipelining Example
Clock cycle:              1        2        3        4        5        6        7        8
0: Load                   Ifetch   Reg/Dec  Exec     Mem      WrB
4: R-type                          Ifetch   Reg/Dec  Exec     Mem      WrB
8: Store                                    Ifetch   Reg/Dec  Exec     Mem      WrB
12: Beq (target is 1000)                             Ifetch   Reg/Dec  Exec     Mem      WrB
(The diagram marks the ends of Cycles 4 through 7.)
• End of Cycle 4: Load’s Mem, R-type’s Exec, Store’s Reg, Beq’s Ifetch
• End of Cycle 5: Load’s WrB, R-type’s Mem, Store’s Exec, Beq’s Reg
• End of Cycle 6: R-type’s WrB, Store’s Mem, Beq’s Exec
Initial Representation: Finite State Diagram
[Figure: finite state diagram of the multicycle control, states numbered 0 through 11. States: Ifetch, Rfetch/Decode, BrComplete, RExec, Rfinish, OriExec, OriFinish, AdrCal, SWMem, LWmem, LWwr. Each state lists the control signals set to 1 (for example PCWr, IRWr, and ALUOp=Add in Ifetch), the don't-care signals (x), and "Others: 0s". Transitions out of Rfetch/Decode are labeled lw or sw, Rtype, Ori, and beq; AdrCal branches on lw and sw.]
Where Are We?
• The Five Classic Components of a Computer
• Today’s Topic: Memory Technology
[Figure: system layers from software to hardware: Application, Operating System, Compiler, Firmware, Instruction Set Architecture (the interface between HW and SW), CPU, Memory, I/O system, Digital Design, Circuit Design, with a "You are here" marker.]
Review: Program’s View of Memory
[Figure: a column of byte addresses 0, 1, 2, 3, 4, ..., 2^n - 1 beside the data stored at each, with sample bytes 00110110 and 00001100.]
• Memory is a large linear array of bytes.
    Each byte has a unique address (location).
    Byte of data at address 0x100, and at 0x101
• Most computers have instructions with byte (8-bit) addressing.
• Data may have to be aligned on a word (4-byte) or double-word (8-byte) boundary.
    int is 4 bytes
    double-precision floating point is 8 bytes
• 32-bit vs. 64-bit addresses
    We will assume 32-bit for the rest of the course, unless otherwise stated.
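The alignment rule on this slide can be sketched in a few lines of Python. This is only an illustration: the helper names `is_aligned` and `align_up` are made up here, and the sizes are the word (4-byte) and double-word (8-byte) boundaries named on the slide.

```python
WORD = 4         # bytes, e.g. an int
DOUBLE_WORD = 8  # bytes, e.g. a double-precision float


def is_aligned(address: int, size: int) -> bool:
    """True if the byte address is a multiple of the access size."""
    return address % size == 0


def align_up(address: int, size: int) -> int:
    """Round a byte address up to the next multiple of size (a power of two)."""
    return (address + size - 1) & ~(size - 1)


if __name__ == "__main__":
    for addr in (0x100, 0x101, 0x104, 0x108):
        print(hex(addr),
              "word-aligned:", is_aligned(addr, WORD),
              "double-word-aligned:", is_aligned(addr, DOUBLE_WORD))
    print("next double-word boundary after 0x101:", hex(align_up(0x101, DOUBLE_WORD)))
```

With byte addressing, 0x100 and 0x101 name adjacent bytes, but only 0x100 is a legal starting address for an aligned int or double.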
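Our Naïve View of Memory (Single Cycle)
[Figure: single-cycle datapath. The PC supplies the Instruction Address to an Ideal Instruction Memory, which returns the 32-bit Instruction. Fields Rs, Rt, Rd (5 bits each) and Imm (16 bits) feed a file of 32 32-bit registers (Ra, Rb, Rw) and the ALU (inputs A and B). An Ideal Data Memory has Data Address, Data In, and DataOut ports. PC, register file, and data memory are clocked by Clk.]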
Question
• What issues do we need to worry about in
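System Organization
[Figure: the CPU and its cache connect over the Memory Bus to Memory and to the I/O Bridge (Core Chip Set). The I/O Bus behind the bridge hosts the Disk Controller (disks), the Graphics Controller (graphics), and the Network Interface (network). Devices signal the CPU with interrupts.]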
The memory hierarchy
Processor and Caches
[Figure: a Processor containing Control, the Datapath with its Registers, and the Level One Cache, together with a Level Two Cache.]
Main Memory
[Figure: a Memory Controller, connected to the processor, drives the Memory Bus to DIMM Slots 0 through 7. Each DRAM DIMM carries multiple DRAM chips.]
Why is it called DRAM?
• Random Access:
    “Random” is good: access time is the same for all locations
    DRAM: Dynamic Random Access Memory
      » High density, low power, cheap, slow
      » Dynamic: needs to be “refreshed” regularly
      » Main memory
    SRAM: Static Random Access Memory
      » Low density, high power, expensive, fast
      » Static: content will last “forever” (until power loss)
      » Caches
• “Not-so-random” Access Technology:
    Access time varies from location to location and from time to time
    Examples: Disk, DVD/CD
• Sequential Access Technology: access time linear in location (e.g., tape)
Random Access Memory (RAM) Technology
• Why do computer professionals need to know about RAM technology?
    Processor performance is usually limited by memory latency and bandwidth.
    Latency: the time it takes to access a single word in memory.
    Bandwidth: the average speed of access to memory (words/sec).
    As integrated circuit (IC) densities increase, lots of memory will fit on the processor chip
      » Tailor on-chip memory to specific needs.
        - Instruction cache
        - Data cache
        - Write buffer
• What makes RAM different from a bunch of flip-flops?
    Density: RAM is much more dense
    Speed: RAM access is slower than flip-flop (register) access.
DRAM
Year    Size      Cycle Time
1980    64 Kb     250 ns
1983    256 Kb    220 ns
1986    1 Mb      190 ns
1989    4 Mb      165 ns
1992    16 Mb     145 ns
1995    64 Mb     120 ns
1999    128 Mb    100 ns
2003    256 Mb    100 ns
2007    2 Gb      55 ns
        1000:1!   2:1!

          Capacity         Speed
Logic:    2x in 3 years    2x in 3 years
DRAM:     4x in 3 years    1.4x in 10 years
Disk:     2x in 3 years    1.4x in 10 years
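The "1000:1" and "2:1" ratios can be checked against the DRAM table with a little arithmetic. The sketch below assumes Kb, Mb, and Gb mean 2^10, 2^20, and 2^30 bits, and that the ratios compare the 1980 and 1995 rows; both are assumptions, and the function names are invented for illustration.

```python
KB, MB, GB = 2**10, 2**20, 2**30  # assumed binary units, in bits

# (year, capacity in bits, cycle time in ns), copied from the DRAM table
DRAM_GENERATIONS = [
    (1980, 64 * KB, 250), (1983, 256 * KB, 220), (1986, 1 * MB, 190),
    (1989, 4 * MB, 165), (1992, 16 * MB, 145), (1995, 64 * MB, 120),
    (1999, 128 * MB, 100), (2003, 256 * MB, 100), (2007, 2 * GB, 55),
]


def ratios(first, last):
    """Return (capacity ratio, cycle-time ratio) between two table rows."""
    (_, cap0, cycle0), (_, cap1, cycle1) = first, last
    return cap1 / cap0, cycle0 / cycle1


def growth_per_period(ratio, years, period):
    """Growth factor per `period` years that compounds to `ratio` over `years`."""
    return ratio ** (period / years)


if __name__ == "__main__":
    row_1980, row_1995 = DRAM_GENERATIONS[0], DRAM_GENERATIONS[5]
    capacity, speed = ratios(row_1980, row_1995)
    print(f"1980 to 1995: capacity {capacity:.0f}:1, cycle time {speed:.2f}:1")
    print(f"capacity growth per 3 years: {growth_per_period(capacity, 15, 3):.2f}x")
```

Under these assumptions the capacity ratio compounds to 4x every 3 years, matching the DRAM capacity trend on the slide, while cycle time improves only about 2:1 over the same 15 years.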
Static RAM Cell
6-Transistor SRAM Cell
[Figure: SRAM cell holding complementary values (1 0 or 0 1), connected to the bit and bit-bar lines and gated by the word (row select) line.]
• Write:
    1. Drive bit lines (bit = 1, bit-bar = 0)
    2. Select row
• Read:
    1. Precharge bit and bit-bar to Vdd (set to 1)
    2. Select row
    3. Cell pulls one line low (pulls to 0)
    4. Sense amp on column detects difference between bit and bit-bar
[Figure: 16-word x 4-bit SRAM array. An Address Decoder (inputs A0 through A3) drives word lines Word 0 through Word 15. Each of the four columns of SRAM cells has a Wr Driver & Precharger (data inputs Din 0 through Din 3, controls WrEn and Precharge) and a Sense Amp (outputs Dout 0 through Dout 3).]
Logic Diagram of a Typical SRAM
[Figure: 2^N words x M bit SRAM with N address pins (A), M data pins (D), and control inputs WE_L and OE_L.]
• Write Enable is usually active low (WE_L)
• Din and Dout are combined to save pins:
    A new control signal, output enable (OE_L), is needed
    WE_L is asserted (Low), OE_L is deasserted (High)
      » D serves as the data input pin
    WE_L is deasserted (High), OE_L is asserted (Low)
      » D is the data output pin
    Both WE_L and OE_L are asserted:
      » Result is unknown. Don’t do that!!!
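The WE_L / OE_L rules for the shared D pin can be written as a small behavioral model. This is a sketch only, not a timing-accurate description of any real part; the class name `Sram` and the method `cycle` are invented here, and 0 stands for an asserted (low) active-low signal.

```python
class Sram:
    """Behavioral sketch of a 2^N-word x M-bit SRAM with a shared data pin D."""

    def __init__(self, n_addr_bits: int, m_data_bits: int):
        self.words = [0] * (1 << n_addr_bits)
        self.mask = (1 << m_data_bits) - 1

    def cycle(self, addr: int, we_l: int, oe_l: int, d_in=None):
        """Apply one access; return the value driven on D, or None if D is not driven.

        Active-low convention: 0 = asserted, 1 = deasserted.
        """
        if we_l == 0 and oe_l == 0:
            # The slide calls this case "unknown"; the model refuses it.
            raise ValueError("WE_L and OE_L both asserted: result is unknown")
        if we_l == 0:                      # write: D is the data input pin
            if d_in is None:
                raise ValueError("a write needs data on D")
            self.words[addr] = d_in & self.mask
            return None
        if oe_l == 0:                      # read: D is the data output pin
            return self.words[addr]
        return None                        # neither asserted: D is not driven


if __name__ == "__main__":
    sram = Sram(n_addr_bits=4, m_data_bits=4)   # 16 words x 4 bits
    sram.cycle(addr=3, we_l=0, oe_l=1, d_in=0b1010)
    print(bin(sram.cycle(addr=3, we_l=1, oe_l=0)))
```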
[Timing diagrams for the 2^N words x M bit SRAM (pins A, D, WE_L, OE_L). Write Timing: the Write Address on A and Data In on D, with the Write Setup Time and Write Hold Time marked relative to WE_L. Read Timing: D goes from High Z to Data Out one Read Access Time after each Read Address appears on A; A and D carry junk in between.]
Introduction to DRAM
• Dynamic RAM (DRAM):
    Refresh required
    Very high density
    Low power (0.1 - 0.5 W active, 0.25 - 10 mW standby)
    Low cost per bit
    Pin sensitive (few pins):
      » Output Enable (OE_L)
      » Write Enable (WE_L)
      » Row address strobe (ras)
      » Col address strobe (cas)
[Figure: N x N bit cell array with row decoder, sense amps (SA) and column select; log2 N address (addr) pins; data pin D; controls WE_L and OE_L.]
[Figure: DRAM cell connected to a row select line and a bit line.]
• Write:
    1. Drive bit line
    2. Select row
• Read:
    1. Precharge bit line to Vdd (1)
    2. Select row
    3. Cell and bit line share charges
      » Very small voltage changes on the bit line
    4. Sense (fancy sense amp)
      » Can detect changes of ~1 million electrons
    5. Write: restore the value
• Refresh
    1. Just do a dummy read to every cell.
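Why a dummy read works as a refresh can be seen in a toy model of the cell. Everything numeric below is invented for illustration (the leak rate, the sense threshold, the time step), as is the class name `DramCell`; the only points taken from the slide are that a read ends by restoring the value and that refresh is a dummy read.

```python
class DramCell:
    """Toy model of one DRAM cell: a leaky stored charge (numbers are made up)."""

    FULL = 1.0        # charge stored for a 1
    THRESHOLD = 0.5   # hypothetical sense-amp decision point
    KEEP = 0.9        # hypothetical fraction of charge kept per time step

    def __init__(self):
        self.charge = 0.0

    def write(self, bit: int) -> None:
        self.charge = self.FULL if bit else 0.0

    def tick(self, steps: int = 1) -> None:
        """Let the stored charge leak for some number of time steps."""
        self.charge *= self.KEEP ** steps

    def read(self) -> int:
        bit = 1 if self.charge > self.THRESHOLD else 0
        self.write(bit)   # step 5 of a read: restore the value
        return bit

    def refresh(self) -> None:
        self.read()       # a refresh is a dummy read; the result is discarded


if __name__ == "__main__":
    neglected, refreshed = DramCell(), DramCell()
    neglected.write(1)
    refreshed.write(1)
    for step in range(1, 13):
        neglected.tick()
        refreshed.tick()
        if step % 3 == 0:
            refreshed.refresh()
    print("without refresh:", neglected.read(), "with refresh:", refreshed.read())
```

In this model the neglected cell leaks below the threshold and reads back 0, while the periodically refreshed cell still reads 1.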
Classical DRAM Organization (square)
[Figure: square RAM Cell Array. A row decoder driven by the row address selects one word (row) select line; the bit (data) lines feed the Sense-Amps, Column Selector & I/O Circuits, which take the Column Address and produce the data. Each intersection represents a 1-T DRAM Cell.]
• Row and Column Address together:
    Select 1 bit at a time
• Typical DRAMs: access multiple bits in parallel
    Example: 2 Mb DRAM = 256K x 8 = 512 rows x 512 cols x 8 bits
    Row and column addresses are applied to all 8 planes in parallel
[Figure: eight planes, Plane 0 through Plane 7, each one a 256 Kb DRAM array of 512 rows x 512 cols, supplying data bits D<0> through D<7>.]
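The arithmetic in the 2 Mb example, and the way one address picks a row and a column in every plane, can be sketched as follows. Treating the high 9 bits as the row and the low 9 bits as the column is an assumption made for illustration; the slide does not say which half is which, and `split_address` is an invented name.

```python
ROWS, COLS, PLANES = 512, 512, 8
ROW_BITS = COL_BITS = 9            # 2^9 = 512

# 512 rows x 512 cols = 256K locations; 8 planes give 256K x 8 = 2 Mb
assert ROWS * COLS == 256 * 1024
assert ROWS * COLS * PLANES == 2 * 2**20


def split_address(addr: int):
    """Split an 18-bit location address into (row, col).

    Assumption: row = high 9 bits, col = low 9 bits.
    """
    if not 0 <= addr < ROWS * COLS:
        raise ValueError("address out of range for a 256K x 8 DRAM")
    return addr >> COL_BITS, addr & (COLS - 1)


if __name__ == "__main__":
    row, col = split_address(0x2A5C3)
    # The same (row, col) is applied to all 8 planes; plane i supplies bit D<i>.
    print(f"row {row}, col {col}, read from planes 0..{PLANES - 1} in parallel")
```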
Logic Diagram of a Typical DRAM
[Figure: 256K x 8 DRAM with 9 address pins (A), 8 data pins (D), and control inputs RAS_L, CAS_L, WE_L, OE_L.]
• Control Signals (RAS_L, CAS_L, WE_L, OE_L) are all active low
• Din and Dout are combined (D):
    WE_L is asserted (Low), OE_L is deasserted (High)
      » D serves as the data input pin
    WE_L is deasserted (High), OE_L is asserted (Low)
      » D is the data output pin
• Row and column addresses share the same pins (A)
    RAS_L goes low: Pins A are latched in as row address
    CAS_L goes low: Pins A are latched in as column address
    RAS/CAS edge-sensitive
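The shared address pins and edge-sensitive strobes can be modeled as two latches that capture A only on a high-to-low transition. This is a behavioral sketch with invented names (`DramAddressLatch`, `drive`), not a description of a specific device.

```python
class DramAddressLatch:
    """Sketch of multiplexed addressing: 9 shared pins A, latched on falling strobes."""

    ADDR_MASK = 0x1FF  # 9 address pins

    def __init__(self):
        self.ras_l = 1   # active low, so 1 means deasserted
        self.cas_l = 1
        self.row = None
        self.col = None

    def drive(self, a: int, ras_l: int, cas_l: int) -> None:
        """Present new pin values; latch A only on a falling edge of a strobe."""
        if self.ras_l == 1 and ras_l == 0:
            self.row = a & self.ADDR_MASK
        if self.cas_l == 1 and cas_l == 0:
            self.col = a & self.ADDR_MASK
        self.ras_l, self.cas_l = ras_l, cas_l


if __name__ == "__main__":
    latch = DramAddressLatch()
    latch.drive(a=338, ras_l=0, cas_l=1)   # RAS_L falls: A is the row address
    latch.drive(a=451, ras_l=0, cas_l=0)   # CAS_L falls: A is the column address
    latch.drive(a=7, ras_l=0, cas_l=0)     # no edge: A is ignored
    print(latch.row, latch.col)
```

Because the strobes are edge-sensitive, holding RAS_L and CAS_L low while A changes does not disturb the latched row and column.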
• Every DRAM access begins at:
    The assertion of RAS_L
    2 ways to write: early or late v. CAS
      Early Wr Cycle: WE_L asserted before CAS_L
      Late Wr Cycle: WE_L asserted after CAS_L
[Timing diagram: write cycles for the 256K x 8 DRAM (9 address pins A, 8 data pins D; RAS_L, CAS_L, WE_L, OE_L). In each cycle A carries the Row Address, then the Col Address, and junk otherwise; D carries Data In and junk otherwise. Marked intervals: WR Access Time for each of the two cycles and the DRAM WR Cycle Time.]
Asynchronous DRAM Read Timing
• Every DRAM access begins at:
    The assertion of RAS_L
    2 ways to read: early or late v. CAS
      Early Read Cycle: OE_L asserted before CAS_L
      Late Read Cycle: OE_L asserted after CAS_L
[Timing diagram: read cycles for the 256K x 8 DRAM (9 address pins A, 8 data pins D; RAS_L, CAS_L, WE_L, OE_L). In each cycle A carries the Row Address, then the Col Address, and junk otherwise; D goes from High Z through junk to Data Out and back to High Z. Marked intervals: Read Access Time, Output Enable Delay, and the DRAM Read Cycle Time.]
Increasing Bandwidth - Interleaving
Access Pattern without Interleaving:
[Figure: CPU and a single Memory. The CPU starts the access for D1, waits until D1 is available, and only then starts the access for D2.]
Access Pattern with 4-way Interleaving:
[Figure: CPU and Memory Banks 0 through 3. The CPU accesses Bank 0, Bank 1, Bank 2, and Bank 3 in turn; after that, Bank 0 can be accessed again.]
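The benefit of interleaving can be estimated with a small simulation. The sketch below assumes low-order interleaving (consecutive words go to consecutive banks), a bank that stays busy for 4 time units per access, and a CPU that can start one access per time unit; none of these numbers come from the slides, and the function names are invented.

```python
def bank_of(word_addr: int, n_banks: int) -> int:
    """Low-order interleaving (an assumption): consecutive words rotate through banks."""
    return word_addr % n_banks


def total_time(n_words: int, n_banks: int, bank_busy: int = 4, issue: int = 1) -> int:
    """Time to finish n_words consecutive accesses, in arbitrary time units."""
    bank_free = [0] * n_banks   # time at which each bank can start a new access
    now = 0                     # earliest time the CPU can start its next access
    finish = 0
    for addr in range(n_words):
        bank = bank_of(addr, n_banks)
        start = max(now, bank_free[bank])   # wait if the bank is still busy
        bank_free[bank] = start + bank_busy
        finish = max(finish, bank_free[bank])
        now = start + issue
    return finish


if __name__ == "__main__":
    print("no interleaving:   ", total_time(8, n_banks=1))
    print("4-way interleaving:", total_time(8, n_banks=4))
```

With one bank, each access waits for the previous one to finish. With four banks, by the time Bank 3 has been started, Bank 0 is free again, so the accesses overlap.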
Fast Memory Systems: DRAM specific
• Multiple CAS accesses within one row: several names
    page mode, fast page mode, EDO
    64 Mbit DRAM: cycle time = 100 ns, page mode = 20 ns
• New DRAMs
    Synchronous DRAM (SDRAM): provide a clock signal to the DRAM; transfers are synchronous to the system clock
    Double Data Rate DRAM (DDR, DDR2, DDR3)
      » Transfer data on both clock edges
    Also RAMBUS
      » Each chip a module vs. slice of memory
      » Short bus between CPU and chips
      » Does own refresh
      » Variable amount of data returned
      » 1 byte / 2 ns (500 MB/s per chip)
    Cached DRAM (CDRAM): keep entire row in SRAM
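The two sets of numbers on this slide, 100 ns cycle time versus 20 ns page mode and 1 byte every 2 ns, can be turned into quick estimates. The sketch assumes the first access to a row costs a full cycle and each further access to the same row costs the page-mode time, and it takes 1 MB as 10^6 bytes; the function names are illustrative.

```python
CYCLE_NS = 100      # full cycle time of the 64 Mbit DRAM example
PAGE_MODE_NS = 20   # page-mode access time of the same example


def time_without_page_mode(n_accesses: int) -> int:
    """Every access pays a full cycle."""
    return n_accesses * CYCLE_NS


def time_with_page_mode(n_accesses: int) -> int:
    """Assumption: first access to the row pays a full cycle, the rest pay page-mode time."""
    if n_accesses == 0:
        return 0
    return CYCLE_NS + (n_accesses - 1) * PAGE_MODE_NS


def bandwidth_mb_per_s(bytes_per_transfer: int, ns_per_transfer: int) -> float:
    """Bytes per nanosecond converted to MB/s (1 MB = 10^6 bytes)."""
    return bytes_per_transfer * 1000 / ns_per_transfer


if __name__ == "__main__":
    n = 8
    print(f"{n} accesses in one row: {time_without_page_mode(n)} ns without page mode, "
          f"{time_with_page_mode(n)} ns with page mode")
    print(f"1 byte / 2 ns = {bandwidth_mb_per_s(1, 2):.0f} MB/s")
```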
Summary of Memory Technology
• DRAM is slow but cheap and dense:
    Good choice for presenting the user with a BIG memory system
    Uses one transistor, must be refreshed.
• SRAM is fast but expensive and not very dense:
    Good choice for providing the user FAST access time.
    Uses six transistors, holds state as long as power is supplied.
• GOAL:
    Present the user with large amounts of memory using the cheapest technology.
    Provide access at the speed offered by the fastest technology.