
Administrivia. Memory Technology. The Five Steps of a Load Instruction. Today's Lecture. A Pipelined Datapath


Academic year: 2021


Memory Technology

Computer Science 104

Lecture 16

2 © Alvin R. Lebeck CPS 104

Administrivia

•  Midterm II Next Wednesday

•  HW 4 due Friday

Today’s Lecture

•  Memory

Outline
•  Review
•  Big Picture of Memory
•  Memory Technology
   -  SRAM
   -  DRAM

Reading
C.9

The Five Steps of a Load Instruction

[Figure: timing diagram of a load instruction across its five steps: Instruction Fetch, Instr Decode / Reg Fetch, Address, Data Memory, Reg Wr. Signals shown changing from Old Value to New Value after each clock edge: PC; Rs, Rt, Rd, Op, Func; ALUctr; ExtOp; ALUSrc; RegWr; busA; busB; Address; busW. Delays annotated: Clk-to-Q, Instruction Memory Access Time, Delay through Control Logic, Register File Access Time, Delay through Extender & Mux, ALU Delay, Data Memory Access Time, Register File Write Time.]

A Pipelined Datapath

[Figure: five-stage pipelined datapath with stages Ifetch, Reg/Dec, Exec, Mem, WrB, separated by the IF/ID, ID/Ex, Ex/Mem and Mem/Wr registers. Blocks: PC, IUnit, RFile (Ra, Rb, Rw, Di), Exec Unit, Data Mem (RA, WA, Di, Do), multiplexers. Signals: Clk, RegWr, ExtOp, ALUOp, ALUSrc, RegDst, MemWr, MemtoReg, Branch, Zero, busA, busB, Imm16, Rs, Rt, Rd, PC+4.]

A More Extensive Pipelining Example

Clock                      Cycle 1   Cycle 2   Cycle 3   Cycle 4   Cycle 5   Cycle 6   Cycle 7   Cycle 8
0: Load                    Ifetch    Reg/Dec   Exec      Mem       WrB
4: R-type                            Ifetch    Reg/Dec   Exec      Mem       WrB
8: Store                                       Ifetch    Reg/Dec   Exec      Mem       WrB
12: Beq (target is 1000)                                 Ifetch    Reg/Dec   Exec      Mem       WrB

•  End of Cycle 4: Load’s Mem, R-type’s Exec, Store’s Reg, Beq’s Ifetch
•  End of Cycle 5: Load’s WrB, R-type’s Mem, Store’s Exec, Beq’s Reg
•  End of Cycle 6: R-type’s WrB, Store’s Mem, Beq’s Exec
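The cycle-by-cycle bullets above can be reproduced with a small sketch. It assumes an ideal pipeline with no stalls, one instruction fetched per cycle starting at Cycle 1; the names `STAGES`, `PROGRAM`, `stage_at` and `snapshot` are illustrative and not from the lecture.

```python
# Five pipeline stages, in order, as named on the slide.
STAGES = ["Ifetch", "Reg/Dec", "Exec", "Mem", "WrB"]

# (instruction label, cycle in which it is fetched); assumes no stalls.
PROGRAM = [("Load", 1), ("R-type", 2), ("Store", 3), ("Beq", 4)]


def stage_at(fetch_cycle, cycle):
    """Return the stage an instruction occupies during `cycle`, or None."""
    index = cycle - fetch_cycle
    if 0 <= index < len(STAGES):
        return STAGES[index]
    return None


def snapshot(cycle):
    """Map each in-flight instruction to its stage during `cycle`."""
    result = {}
    for name, fetch_cycle in PROGRAM:
        stage = stage_at(fetch_cycle, cycle)
        if stage is not None:
            result[name] = stage
    return result


if __name__ == "__main__":
    for cycle in (4, 5, 6):
        print("Cycle", cycle, snapshot(cycle))
```

In Cycle 6 the Load no longer appears in the snapshot because it completed its WrB stage in Cycle 5.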


Initial Representation: Finite State Diagram

[Figure: finite state diagram of the multicycle control. States: Ifetch, Rfetch/Decode, BrComplete, RExec, Rfinish, OriExec, OriFinish, AdrCal, SWMem, LWmem, LWwr. Transitions out of Rfetch/Decode are labeled beq, Rtype, Ori, and lw or sw; AdrCal branches on lw and sw. Each state lists the control signals set to 1 (for example PCWr, IRWr, RegWr, MemWr, ExtOp, ALUSelA), the ALUOp and ALUSelB values, and the don't-care (x) signals (for example IorD, PCSrc, MemtoReg, RegDst).]

Big Picture

•  The Five Classic Components of a Computer

[Figure: Processor (Control, Datapath), Memory, Input, Output]

•  Today’s Topic: Memory Technology

Where Are We?

[Figure: layers of a computer system. Software: Application, Operating System, Compiler, Firmware. Interface Between HW and SW: Instruction Set Architecture, Memory, I/O ("You are here."). Hardware: CPU, Memory, I/O system, Digital Design, Circuit Design.]

Review: Program’s View of Memory

[Figure: memory as a column of bytes, with Byte Address 0, 1, 2, 3, 4, ..., 2^n - 1 and example Data values 00110110 and 00001100]

•  Memory is a large linear array of bytes.
   -  Each byte has a unique address (location).
   -  Byte of data at address 0x100, and 0x101
•  Most computers have instructions with byte (8-bit) addressing.
•  Data may have to be aligned on word (4 byte) or double word (8 byte) boundary.
   -  int is 4 bytes
   -  double precision floating point is 8 bytes
•  32-bit vs. 64-bit addresses
   -  we will assume 32-bit for rest of course, unless otherwise stated
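As a minimal sketch of the alignment rule above, the helpers below check whether a byte address falls on a word or double-word boundary and round an address up to the next boundary. The function names and constants are illustrative assumptions, not part of the lecture.

```python
WORD = 4          # bytes in a word (e.g., an int)
DOUBLE_WORD = 8   # bytes in a double word (e.g., a double)
ADDRESS_BITS = 32 # address width assumed for the rest of the course


def is_aligned(address, size):
    """True if `address` is a multiple of `size` bytes."""
    return address % size == 0


def align_up(address, size):
    """Smallest multiple of `size` that is >= `address`."""
    return (address + size - 1) // size * size


if __name__ == "__main__":
    print(is_aligned(0x100, WORD))          # 0x100 is word aligned
    print(is_aligned(0x101, WORD))          # 0x101 is not
    print(hex(align_up(0x101, WORD)))       # next word boundary
    print(2 ** ADDRESS_BITS)                # number of addressable bytes
```

With 32-bit byte addresses the largest address is 2^32 - 1, which matches the 2^n - 1 label in the byte-address picture for n = 32.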

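[Figure: the same memory drawn as an array of words, labeled by Word Address]

Our Naïve View of Memory (Single Cycle)

[Figure: single-cycle datapath. PC feeds the Instruction Address of an Ideal Instruction Memory, which returns the Instruction. The 5-bit Rs, Rt, Rd fields select Ra, Rb, Rw in a file of 32 32-bit registers; Imm is 16 bits. The ALU takes 32-bit inputs A and B. An Ideal Data Memory has Data Address, Data In and DataOut ports. PC, register file and data memory are clocked by Clk.]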

Question

•  What issues do we need to worry about in

System Organization

[Figure: block diagram with labels CPU, Cache, Memory Bus, Memory, I/O Bridge (Core Chip Set), I/O Bus, Disk Controller (two Disks), Graphics Controller (Graphics), Network Interface (Network), and interrupts. The CPU-side path through Cache and Memory is annotated "The memory hierarchy".]

Processor and Caches

[Figure: labels Processor, Control, Datapath, Registers, Level One Cache, Level Two Cache, and an arrow "To main memory"]

Main Memory

[Figure: a Memory Controller (labeled "To Processor") drives the Memory Bus, which connects DIMM Slot 0 through DIMM Slot 7; a DRAM DIMM carries multiple DRAM chips]

Why is it called DRAM?

Memory Technology

•  Random Access:
   -  “Random” is good: access time is the same for all locations
   -  DRAM: Dynamic Random Access Memory
      »  High density, low power, cheap, slow
      »  Dynamic: needs to be “refreshed” regularly
      »  Main memory
   -  SRAM: Static Random Access Memory
      »  Low density, high power, expensive, fast
      »  Static: content will last “forever” (until power loss)
      »  Caches
•  “Not-so-random” Access Technology:
   -  Access time varies from location to location and from time to time
   -  Examples: Disk, DVD/CD
•  Sequential Access Technology: access time linear in location (e.g., Tape)

Random Access Memory (RAM) Technology

•  Why do computer professionals need to know about RAM technology?
   -  Processor performance is usually limited by memory latency and bandwidth.
   -  Latency: the time it takes to access a single word in memory.
   -  Bandwidth: the average speed of access to memory (words/sec).
   -  As integrated circuit (IC) densities increase, lots of memory will fit on the processor chip
      »  Tailor on-chip memory to specific needs.
         -  Instruction cache
         -  Data cache
         -  Write buffer
•  What makes RAM different from a bunch of flip-flops?
   -  Density: RAM is much more dense
   -  Speed: RAM access is slower than flip-flop (register) access.

DRAM

Year    Size      Cycle Time
1980    64 Kb     250 ns
1983    256 Kb    220 ns
1986    1 Mb      190 ns
1989    4 Mb      165 ns
1992    16 Mb     145 ns
1995    64 Mb     120 ns
1999    128 Mb    100 ns
2003    256 Mb    100 ns
2007    2 Gb      55 ns
2010    2 Gb      20 ns

Size: 1000:1!    Cycle Time: 2:1!

         Capacity         Speed
Logic:   2x in 3 years    2x in 3 years
DRAM:    4x in 3 years    1.4x in 10 years
Disk:    2x in 3 years    1.4x in 10 years
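The gap between capacity growth and speed growth can be checked directly from the DRAM table. The sketch below recomputes the improvement ratios between two years; the data are the table rows, while the dictionary layout and the `improvement` helper are illustrative. The 1000:1 and 2:1 annotations appear to correspond to the 1980 and 1995 rows, which is an inference from the arithmetic, not something the slide states.

```python
KB = 2 ** 10
MB = 2 ** 20
GB = 2 ** 30

# year -> (size in bits, cycle time in ns), copied from the DRAM table.
DRAM = {
    1980: (64 * KB, 250),
    1983: (256 * KB, 220),
    1986: (1 * MB, 190),
    1989: (4 * MB, 165),
    1992: (16 * MB, 145),
    1995: (64 * MB, 120),
    1999: (128 * MB, 100),
    2003: (256 * MB, 100),
    2007: (2 * GB, 55),
    2010: (2 * GB, 20),
}


def improvement(first_year, last_year):
    """Return (capacity ratio, cycle-time ratio) between two table rows."""
    first_size, first_cycle = DRAM[first_year]
    last_size, last_cycle = DRAM[last_year]
    return last_size / first_size, first_cycle / last_cycle


if __name__ == "__main__":
    capacity, speed = improvement(1980, 1995)
    print(f"1980 to 1995: capacity {capacity:.0f}:1, cycle time {speed:.2f}:1")
    capacity, speed = improvement(1980, 2010)
    print(f"1980 to 2010: capacity {capacity:.0f}:1, cycle time {speed:.2f}:1")
```

Capacity grows by three to four orders of magnitude over the table while cycle time improves by roughly one, which is why memory latency ends up limiting processor performance.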


Static RAM Cell

6-Transistor SRAM Cell

[Figure: 6-transistor SRAM cell holding complementary values (1 0 or 0 1) between the bit lines bit and bit-bar, gated by the word (row select) line]

•  Write:
   1. Drive bit lines (bit = 1, bit-bar = 0)
   2. Select row
•  Read:
   1. Precharge bit and bit-bar to Vdd (set to 1)
   2. Select row
   3. Cell pulls one line low (pulls to 0)
   4. Sense amp on column detects difference between bit and bit-bar

Typical SRAM Organization: 16-word x 4-bit

[Figure: 16 x 4 array of SRAM cells. An Address Decoder on A0 to A3 drives word lines Word 0, Word 1, ..., Word 15. Each of the four columns has a Wr Driver & Precharger (inputs Din 0 to Din 3, controlled by WrEn and Precharge) at one end and a Sense Amp producing Dout 0 to Dout 3 at the other.]

Logic Diagram of a Typical SRAM

[Figure: 2^N words x M bit SRAM block with N-bit address input A, M-bit data pin D, and control inputs WE_L and OE_L]

•  Write Enable is usually active low (WE_L)
•  Din and Dout are combined to save pins:
   -  A new control signal, output enable (OE_L), is needed
   -  WE_L is asserted (Low), OE_L is deasserted (High)
      »  D serves as the data input pin
   -  WE_L is deasserted (High), OE_L is asserted (Low)
      »  D is the data output pin
   -  Both WE_L and OE_L are asserted:
      »  Result is unknown. Don’t do that!!!
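The role of the shared D pin under the two active-low controls can be sketched as a small behavioral model. This is an illustration only: the class and function names are hypothetical, signals are modeled as 0 (Low, asserted) and 1 (High, deasserted), and the "high-Z" result when neither signal is asserted is an assumption, since the slide does not describe that case.

```python
def sram_data_pin(we_l, oe_l):
    """Role of the shared D pin for active-low WE_L and OE_L (0 = asserted)."""
    writing = we_l == 0
    reading = oe_l == 0
    if writing and reading:
        return "unknown"   # both asserted: "Don't do that!!!"
    if writing:
        return "input"     # D serves as the data input pin
    if reading:
        return "output"    # D is the data output pin
    return "high-Z"        # assumption: pin not driven when neither is asserted


class SRAM:
    """Behavioral model of a 2^N words x M bit SRAM."""

    def __init__(self, n, m):
        self.words = [0] * (2 ** n)
        self.mask = (1 << m) - 1

    def access(self, address, we_l, oe_l, d=None):
        """Apply one access; returns the value driven on D for a read."""
        role = sram_data_pin(we_l, oe_l)
        if role == "unknown":
            raise ValueError("WE_L and OE_L asserted together")
        if role == "input":
            self.words[address] = d & self.mask
            return None
        if role == "output":
            return self.words[address]
        return None


if __name__ == "__main__":
    sram = SRAM(n=4, m=4)                      # 16-word x 4-bit
    sram.access(3, we_l=0, oe_l=1, d=0b1010)   # write
    print(bin(sram.access(3, we_l=1, oe_l=0))) # read back
```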

Typical SRAM Timing

[Figure: timing waveforms for the 2^N words x M bit SRAM (signals A, D, WE_L, OE_L). Write Timing: the Write Address on A and Data In on D are held around the WE_L pulse, with the Write Setup Time before it and the Write Hold Time after it. Read Timing: with OE_L asserted, D goes from High Z or Junk to Data Out one Read Access Time after each Read Address is applied to A.]

Introduction to DRAM

•  Dynamic RAM (DRAM):
   -  Refresh required
   -  Very high density
   -  Low power (0.1 - 0.5 W active, 0.25 - 10 mW standby)
   -  Low cost per bit
   -  Pin sensitive (few pins):
      »  Output Enable (OE_L)
      »  Write Enable (WE_L)
      »  Row address strobe (ras)
      »  Col address strobe (cas)

[Figure: N x N bit cell array with row decoder, sense amps (SA) and column select; log2 N address lines (addr); data pin D; control inputs WE_L and OE_L]

•  Write:
   1. Drive bit line
   2. Select row
•  Read:
   1. Precharge bit line to Vdd (1)
   2. Select row
   3. Cell and bit line share charges
      »  Very small voltage changes on the bit line
   4. Sense (fancy sense amp)
      »  Can detect changes of ~1 million electrons
   5. Write: restore the value
•  Refresh
   1. Just do a dummy read to every cell.

[Figure: DRAM cell connected to the row select line and the bit line]
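The read and refresh steps above can be illustrated with a toy model of one DRAM cell. This is a sketch under loudly stated assumptions: charge is a number between 0 and 1, leakage is a fixed fraction per time step, and the sense threshold is 0.5. None of these values come from the lecture; they only show why a read must restore the value and why refresh is a dummy read.

```python
class DramCell:
    """Toy model of a 1-transistor DRAM cell (all numeric values are assumptions)."""

    FULL = 1.0        # charge stored for a 1
    THRESHOLD = 0.5   # sense amp decision point

    def __init__(self):
        self.charge = 0.0

    def write(self, bit):
        """Drive the bit line and select the row."""
        self.charge = self.FULL if bit else 0.0

    def leak(self, fraction):
        """Lose a fraction of the stored charge over time."""
        self.charge *= (1.0 - fraction)

    def read(self):
        """Sense the value, then write it back (step 5: restore)."""
        bit = 1 if self.charge > self.THRESHOLD else 0
        self.charge = 0.0     # sharing charge with the bit line disturbs the cell
        self.write(bit)       # restore the sensed value at full strength
        return bit

    def refresh(self):
        """Refresh is just a dummy read."""
        self.read()


if __name__ == "__main__":
    refreshed, neglected = DramCell(), DramCell()
    refreshed.write(1)
    neglected.write(1)
    for cell in (refreshed, neglected):
        cell.leak(0.3)
    refreshed.refresh()            # restores full charge
    for cell in (refreshed, neglected):
        cell.leak(0.3)
    print(refreshed.read(), neglected.read())
```

The refreshed cell still reads as 1, while the neglected cell has leaked below the threshold and reads as 0.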

Classical DRAM Organization (square)

[Figure: square RAM Cell Array. A row decoder driven by the row address selects one word (row) select line; the bit (data) lines feed the Sense-Amps, Column Selector & I/O Circuits, which are driven by the Column Address and produce data. Each intersection represents a 1-T DRAM Cell.]

•  Row and Column Address together:
   -  Select 1 bit at a time

Typical DRAM Organization

•  Typical DRAMs: access multiple bits in parallel
   -  Example: 2 Mb DRAM = 256K x 8 = 512 rows x 512 cols x 8 bits
   -  Row and column addresses are applied to all 8 planes in parallel

[Figure: eight planes, Plane 0 through Plane 7, each one “Plane” of 256 Kb DRAM organized as 512 rows x 512 cols and producing one data bit, D<0> through D<7>]
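The 256K x 8 example can be made concrete by splitting an address into its row and column parts. The sketch below assumes the upper 9 bits of the 18-bit address select the row and the lower 9 bits select the column; that ordering, and the names used, are illustrative assumptions rather than something the slide specifies.

```python
ROWS = 512
COLS = 512
PLANES = 8            # one plane per data bit D<0>..D<7>
ROW_BITS = 9          # log2(512)
COL_BITS = 9          # log2(512)


def split_address(address):
    """Split an 18-bit location address into (row, column).

    Assumes the row is the upper 9 bits and the column the lower 9 bits.
    """
    if not 0 <= address < ROWS * COLS:
        raise ValueError("address out of range")
    row = address >> COL_BITS
    col = address & (COLS - 1)
    return row, col


if __name__ == "__main__":
    print(ROWS * COLS)            # locations per plane (256K)
    print(ROWS * COLS * PLANES)   # total bits (2 Mb)
    print(split_address(513))     # same (row, col) is applied to all 8 planes
```

Because the same row and column go to every plane, one access returns 8 bits, one from each plane.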


Increasing Bandwidth - Interleaving

Access Pattern without Interleaving:

[Figure: CPU connected to a single Memory. Timeline: Start Access for D1, D1 available, then Start Access for D2 one Cycle Time after the first access started.]

Access Pattern with 4-way Interleaving:

[Figure: CPU connected to Memory Bank 0, Memory Bank 1, Memory Bank 2 and Memory Bank 3. Timeline: Access Bank 0, Access Bank 1, Access Bank 2, Access Bank 3, then "We can Access Bank 0 again".]
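The benefit of interleaving can be estimated with a small timing sketch. It assumes sequential accesses are spread round-robin over the banks, each bank is busy for one full cycle time per access, and the CPU can start at most one access per time unit. The function name, parameters and time units are illustrative assumptions.

```python
def finish_time(num_accesses, num_banks, cycle_time, issue_interval=1):
    """Time at which the last of `num_accesses` sequential accesses completes.

    Access i goes to bank i % num_banks. A bank cannot start a new access
    until `cycle_time` units after its previous access started.
    """
    bank_free = [0] * num_banks   # earliest time each bank can start again
    now = 0                       # earliest time the CPU can issue an access
    last_done = 0
    for i in range(num_accesses):
        bank = i % num_banks
        start = max(now, bank_free[bank])
        bank_free[bank] = start + cycle_time
        last_done = start + cycle_time
        now = start + issue_interval
    return last_done


if __name__ == "__main__":
    print(finish_time(8, num_banks=1, cycle_time=4))  # no interleaving
    print(finish_time(8, num_banks=4, cycle_time=4))  # 4-way interleaving
```

With a cycle time of 4 units and 4 banks, Bank 0 is ready again exactly when the CPU has finished starting accesses to Banks 1 through 3, so an access can begin every time unit.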


MICRON 2Gb DRAM (512Mx4, circa 2010)


Fast Memory Systems: DRAM specific

•  Modern DRAMs
   -  Synchronous DRAM (SDRAM): provide a clock signal to the DRAM; transfers are synchronous to the system clock
   -  Double Data Rate DRAM (DDR, DDR2, DDR3)
      »  Transfer data on both clock edges
   -  Also RAMBUS
      »  Each chip a module vs. slice of memory
      »  Short bus between CPU and chips
      »  Does own refresh
      »  Variable amount of data returned
      »  1 byte / 2 ns (500 MB/s per chip)
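The "1 byte / 2 ns (500 MB/s per chip)" figure is a unit conversion, sketched below together with the effect of transferring on both clock edges. The helper names are illustrative, MB is taken as 10^6 bytes, and the 100 MHz clock in the usage example is a hypothetical value, not one from the slide.

```python
def bandwidth_mb_per_s(bytes_per_transfer, ns_per_transfer):
    """Peak bandwidth in MB/s (1 MB = 10**6 bytes).

    1 byte per ns is 10**9 bytes/s, i.e. 1000 MB/s.
    """
    return bytes_per_transfer * 1000 / ns_per_transfer


def transfers_per_us(clock_mhz, double_data_rate):
    """Transfers per microsecond for a given clock frequency in MHz.

    Single data rate: one transfer per clock cycle.
    Double data rate: one transfer on each clock edge, so two per cycle.
    """
    return clock_mhz * (2 if double_data_rate else 1)


if __name__ == "__main__":
    print(bandwidth_mb_per_s(1, 2))                        # per-chip figure on the slide
    print(transfers_per_us(100, double_data_rate=False))   # hypothetical 100 MHz clock
    print(transfers_per_us(100, double_data_rate=True))
```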


Summary of Memory Technology

•  DRAM is slow but cheap and dense:
   -  Good choice for presenting the user with a BIG memory system
   -  Uses one transistor, must be refreshed.
•  SRAM is fast but expensive and not very dense:
   -  Good choice for providing the user FAST access time.
   -  Uses six transistors, holds state as long as power is supplied.
•  GOAL:
   -  Present the user with large amounts of memory using the cheapest technology.
   -  Provide access at the speed offered by the fastest technology.
