• No results found

Pipeline

N/A
N/A
Protected

Academic year: 2021

Share "Pipeline"

Copied!
20
0
0

Loading.... (view fulltext now)

Full text

(1)

ECE4680 Pipeline.1

ECE4680 Pipeline.1 2002-4-32002-4-3

ECE4680

ECE4680

Computer Organiza

Computer Organiza

tion

tion

and Arch

and Arch

itecture

itecture

De

De

signin

signin

g a Pipeline Processor 

g a Pipeline Processor 

ECE4680 Pipeline.2

ECE4680 Pipeline.2 2002-4-32002-4-3

 A Si

 A Si

ng

ng

le C

le C

yc

yc

le Pr

le Pr

oc

oc

ess

ess

or 

or 

32 32 ALUctr  ALUctr  Clk  Clk   busW  busW RegWr  RegWr  32 32 32 32  busA  busA 32 32  busB  busB 5 5 5 5 55 R Rw w RRa a RRbb 32 32-bit 32 32-bit Registers Registers Rs Rs Rt Rt Rt Rt Rd  Rd  RegDst RegDst E  E  x x  t    t    e   e  n n  d   d   e   e  r  r  M M  u  u x x Mux Mux 32 32 16 16 imm16 imm16 ALUSrc ALUSrc ExtOp ExtOp M M  u  u x x MemtoReg MemtoReg Clk  Clk  Data In Data In WrEn WrEn 32 32 Adr  Adr  Data Data Memory Memory 32 32 MemWr  MemWr  A A L  L   U  U Zero Zero 0 0 1 1 0 0 1 1 0 0 1 1 Instruction Instruction Fetch Unit Fetch Unit Clk  Clk  Instruction<31:0> Instruction<31:0> Jump Jump Branch Branch <  <  2   2   1   1    :    :  2  2    5    5   <  <  1   1    6    6    :    :  2  2    0    0   <  <  1   1   1   1    :    :  1  1    5    5   <  <   0    0    :    :  1  1    5    5   >  >  Imm1 Imm166 Rd  Rd  Main Main Control Control op op ALU ALU Control Control func func ALUop ALUop 3 3 RegDst RegDst ALUSrc ALUSrc

::

<5:0> <5:0> <31:26> <31:26> Instr<15:0> Instr<15:0> Zero Zero 3 3

(2)

ECE4680 Pipeline.3

ECE4680 Pipeline.3 2002-4-32002-4-3

Dra

Dra

wbacks o

wbacks o

f this

f this

Single Cy

Single Cy

cle Processor 

cle Processor 

!

!Long cycle time:Long cycle time:

• Cycle time must be Cycle time must be long enough for the load instructiolong enough for the load instructio n:n:

-- PC’s PC’s Clock -to-Q Clock -to-Q ++

-- InstructioInstructio n Men Memory Access Time +mory Access Time +

-- Register File Register File Access Time Access Time ++

--  AL AL U Delay U Delay (add(add resres s cs c alcalc ululatiati onon ) ) ++

-- Data Data Memory Access Time Memory Access Time ++

-- Register File Register File Setup TSetup Time ime ++

-- Clock SkewClock Skew

!

!Cycle time is much longer than needeCycle time is much longer than needed for all other instrd for all other instr uctionsuctions .. Examples:

Examples:

• R-R-type instructype instruc tions dtions d o not o not require data merequire data memory accessmory access

• Jump dJump d oes not require ALU operation nor data memory accessoes not require ALU operation nor data memory access

Overview of a Multiple Cycle I

Overview of a Multiple Cycle I

mpl

mpl

ementation

ementation

!

!The root of the single cycle processor’s prThe root of the single cycle processor’s pr oblems:oblems:

• The cyThe cycle time has to be long enough for tcle time has to be long enough for t he slowest instructihe slowest instructi onon

!

!Solution:Solution:

• Break Break the instruction into smaller the instruction into smaller stepssteps

• Execute eExecute each step (instead of the entire instrach step (instead of the entire instr uction) in uction) in one cycleone cycle

-- Cycle time: time it takes to execute the longest stepCycle time: time it takes to execute the longest step

-- Keep all the steps to Keep all the steps to have similar lengthhave similar length

• This is This is the ethe essence of the ssence of the multiple cycle processor multiple cycle processor 

!

!The aThe advantagedvantages of s of the multiple cthe multiple c ycle processor:ycle processor:

• Cycle time is much shorter Cycle time is much shorter 

• Different instructions take different number of cycles to completeDifferent instructions take different number of cycles to complete

-- Load takes five cyclesLoad takes five cycles

-- Jump oJump o nly takes three cyclesnly takes three cycles

• Al Allolo wws a fs a f unun ctct ioio nal nal unun it it to to be ube u sed sed momo re tre t han han ononce pce p er ier i nsns trtr ucuc titi onon

Why will this hinder Why will this hinder

(3)

ECE4680 Pipeline.5 2002-4-3

Multiple Cycle Processor 

!MCP: A functio nal unit to be used more than once per instruction

Ideal Memory WrAdr  Din RAdr  32 32 32 Dout MemWr  32 A L   U 32 32 ALUOp ALU Control I   n  s   t   r   u  c   t   i    o n R  e   g 32 IRWr  32 Reg File Ra Rw  busW Rb 5 5 32  busA 32  busB RegWr  Rs Rt M  u x 0 1 Rt Rd  PCWr  ALUSelA Mux 0 1 RegDst M  u x 0 1 32 PC MemtoReg Extend ExtOp M  u x 0 1 32 0 1 2 3 4 16 Imm 32 << 2 ALUSelB M  u x 1 0 Target 32 Zero Zero PCWrCond PCSrc BrWr   32 IorD ECE4680 Pipeline.6 2002-4-3

Outline of Today’s

Lecture---Pipelining

!Introductio n to the Concept of Pipelined Processor

!Pipelined Datapath and Pipelined Contr ol

!How to Avoid Race Condition in a Pipeline Design?

(4)

ECE4680 Pipeline.7 2002-4-3

Timing Diagram of a Load Instruction

Clk  PC Rs, Rt, Rd, Op, Func Clk-to-Q ALUctr 

Instruction Memory Access Time

Old Value New Value

RegWr Old Value New Value

Delay through Control Logic

 busA

Register File Access Time

Old Value New Value

 busB

ALU Delay

Old Value New Value

Old Value New Value

 New Value Old Value

ExtOp Old Value New Value

ALUSrc Old Value New Value

Address Old Value New Value

 busW Old Value New

Delay through Extender & Mux

Data Memory Access Time

Instruction Fetch Instr Decode /  Reg. Fetch

Address Data Memory Reg Wr

R  e   g i    s   t    e  r  F  i   l    e   W r  i    t    e  T  i   m  e  1 3 2

The Five Stages of Load

!Ifetch: Instruction Fetch

• Fetch the instruct ion from the Instructi on Memory

!Reg/Dec: Registers Fetch and Instruction Decode

!Exec: Calculate the memory address

!Mem: Read the data fro m the Data Memory

!Wr: Write the data back to the register file

Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5

Ifetch Reg/Dec Exec Mem Wr Load

(5)

ECE4680 Pipeline.9 2002-4-3

Key Ideas Behind Pipelining

!Grading t he mid t erm exams:

• 5 problems, five people grading the exam

• Each person ONLY grades one problem

• Pass the exam to the next person as soon as one finishes his part

• As su me eac h p ro blem t akes 0.5 ho ur to gr ade

- Each in dividual exam still takes 2.5 hours t o grade

- But w ith 5 people, all exams can b e graded muc h quick er 

!The load inst ruction h as 5 stages:

• Five independent functional units t o work on each stage

- Each funct ional unit is used only once

• The 2nd load can start as soon as the 1st finis hes its Ifet stage

• Each load still t akes five cycles to complete

• The throughput, however, is much hig her 

ECE4680 Pipeline.10 2002-4-3

Pipelining th e Load Instruc tion

!The five independent functional units in the pipeline datapath are:

• Instruction Memory for the Ifetch stage

• Register File’s Read ports (bus A and busB) for the Reg/Dec stage

• ALU fo r t he Ex ec s tag e

• Data Memory for th e Mem stage

• Register File’s Write port (bus W) for the Wr stage

!One instruction enters the pipeline every cycle

• One instruction c omes out of the pipeline (complete) every cycle

• Comparison with m -cycle processor: 5x3 cycles versus 7 cycles

• The “ Effective” Cycles per Instruct ion (CPI) is 1

Clock

Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7

Ifetch Reg/Dec Exec Mem Wr 1st lw

Ifetch Reg/Dec Exec Mem Wr 2nd lw

Ifetch Reg/Dec Exec Mem Wr 3rd lw

Regiester file is used 2 times but no problem !

(6)

ECE4680 Pipeline.11 2002-4-3

The Four Stages of R-type

!Ifetch: Instruction Fetch

• Fetch the instruct ion from the Instructi on Memory

!Reg/Dec: Registers Fetch and Instruction Decode

!Exec: ALU operates on the two r egister operands

!Wr: Write the ALU output back to the register file

Cy cle 1 Cycle 2 Cycle 3 Cycle 4

Ifetch Reg/Dec Exec Wr R-type

Pipelining the R-type and Load Instruct ion

!Ideal Case

• each functional unit is used once per instruct ion AND

• all instructio ns are the same, just as discussed in the previous 2 slides.

!We have a prob lem:

• Two instruct ions try t o write to th e register file at the same time!

!We will see many other probl ems involved in pi pelining.

Clock

Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9

Ifetch Reg/Dec Exec Wr R-type

Ifetch Reg/Dec Exec Wr R-type

Ifetch Reg/Dec Exec Mem Wr Load

Ifetch Reg/Dec Exec Wr R-type

Ifetch Reg/Dec Exec Wr R-type

(7)

ECE4680 Pipeline.13 2002-4-3

Important Observation

!Each functional unit can only be used once per instruct ion:

• necessary but not su fficient.

!Each functional unit mus t be used at the same stage for all instruc tions:

• Load uses Register File’s Write Port during its 5th stage

• R-type uses Register File’s Write Port du ring it s 4th s tage

Ifetch Reg/Dec Exec Mem Wr Load

1 2 3 4 5

Ifetch Reg/Dec Exec Wr R-type

1 2 3 4

Problem !

ECE4680 Pipeline.14 2002-4-3

Solution 1: Insert “ Bubble” into the Pipeline

!Insert a “ bubble” into the pipeline to prevent 2 writes at the same cycle

• The control logic can be complex

!No instructi on is compl eted during Cycle 5:

• The “ Effective” CPI for load is 2

Clock

Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9

Ifetch R-type

Ifetch Reg/Dec Exec Mem Wr Load

Ifetch Reg/Dec Exec R-type

Ifetch Reg/Dec R-type

Ifetch Reg/Dec Exec Wr

Reg/Dec Exec Wr Ifetch Reg/Dec Exec

Wr

Exec Wr

Reg/Dec Exec Wr Ifetch Reg/Dec Exec

Wr

Exec Wr

Pipeline Bubble

(8)

ECE4680 Pipeline.15 2002-4-3

Soluti on 2: Delay R-type’s Writ e by One Cycle

!Delay R-type’s r egister write b y one c ycle:

• Now R-type instruct ions also use Reg File’s write port at Stage 5

• Mem stage is a NOOP stage: nothin g is being done

Clock

Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9

Ifetch Reg/Dec Wr

R-type

Ifetch Reg/Dec Wr

R-type

Ifetch Reg/Dec Exec Mem Wr Load

Ifetch Reg/Dec Wr

R-type

Ifetch Reg/Dec Wr

R-type

Ifetch Reg/Dec Exec Wr

R-type Mem Exec Exec Exec Exec 1 2 3 4 5 Mem Mem Mem Mem

The Four Stages of Stor e

!Ifetch: Instruction Fetch

• Fetch the instruct ion from the Instructi on Memory

!Reg/Dec: Registers Fetch and Instruction Decode

!Exec: Calculate the memory address

!Mem: Write the data int o the Data Memory

Cy cle 1 Cycle 2 Cycle 3 Cycle 4

Ifetch Reg/Dec Exec Mem

Store Wr

(9)

ECE4680 Pipeline.17 2002-4-3

The Four Stages of Beq

!Ifetch: Instruction Fetch

• Fetch the instruct ion from the Instructi on Memory

!Reg/Dec: Registers Fetch and Instruction Decode

!Exec: See slide 23 for more details

• ALU co mp ares th e tw o r egi st er o per and s

• Adder cal cu lat es t he b ran ch tar get addres s

!Mem: If the registers we compared in the Exec stage are the same,

• Write the branch target address into the PC

Cy cle 1 Cycle 2 Cycle 3 Cycle 4

Ifetch Reg/Dec Exec Mem

Beq Wr

Find the difference of Beq between pipeline processor and

M-cycle processor, think why?

NOOP!

ECE4680 Pipeline.18 2002-4-3

 A Pi pel in ed Dat apath

I   F   /   I   D R  e   g i    s   t    e  r  I   D  /   E  x R  e   g i    s   t    e  r  E  x  /   M  e  m R  e   g i    s   t    e  r  M  e  m  /    W r  R  e   g i    s   t    e  r  P   C Data Me m WA Di RA Do I    U n i    t   A I RFile Di Ra Rb Rw MemWr  RegWr    ExtOp Exec Unit  busA  busB Imm16 ALUOp ALUSrc M  u x 1 0 MemtoReg 1 0 RegDst Rt Rd  Imm16 PC+4 PC+4 Rs Rt P   C  +  4   Zero Branch 1 0 Clk

Ifetch Reg/Dec Exec Mem Wr

Why need such registers?  N  o  d   a  t    a  p  a  t   h   u  d   e  r   W r   ?  Clock-to-Q delay

(10)

ECE4680 Pipeline.19 2002-4-3

The Instruction Fetch Stage

I   F   /   I   D  :  l    w  $   1   , 1   0   0   (    $   2   )   I   D  /   E  x R  e   g i    s   t    e  r  E  x  /   M  e  m R  e   g i    s   t    e  r  M  e  m  /    W r  R  e   g i    s   t    e  r  P   C = 1  4  Data Me m WA Di RA Do I    U n i    t   A I RFile Di Ra Rb Rw MemWr  RegWr    ExtOp Exec Unit  busA  busB Imm16 ALUOp ALUSrc M  u x 1 0 MemtoReg 1 0 RegDst Rt Rd  Imm16 PC+4 PC+4 Rs Rt P   C  +  4   Zero Branch 1 0 Clk

Ifetch Reg/Dec Exec Mem

You are here!

!Loc ation 10: lw $1, 0x100($2) $1 <- Mem[($2) + 0x100]

 A Detail View of th e Ins tr uc ti on Uni t

!Loc ation 10: lw $1, 0x100($2) I   F   /   I   D  :  l    w  $   1   , 1   0   0   (    $   2   )   P   C = 1  4  1 0 10 A  d   d   e  r  Instruction Memory “4” Instruction Address Clk Ifetch

You are here!

Reg/Dec

(11)

ECE4680 Pipeline.21 2002-4-3

The Decod e / Regis ter Fetch Stage

I   F   /   I   D  :  I   D  /   E  x  :  R  e   g  . 2   &  0  x 1   0   0  E  x  /   M  e  m R  e   g i    s   t    e  r  M  e  m  /    W r  R  e   g i    s   t    e  r  P   C Data Me m WA Di RA Do I    U n i    t   A I RFile Di Ra Rb Rw MemWr  RegWr    ExtOp Exec Unit  busA  busB Imm16 ALUOp ALUSrc M  u x 1 0 MemtoReg 1 0 RegDst Rt Rd  Imm16 PC+4 PC+4 Rs Rt P   C  +  4   Zero Branch 1 0 Clk

Ifetch Reg/Dec Exec Mem

You are here!

!Loc ation 10: lw $1, 0x100($2) $1 <- Mem[($2) + 0x100]

ECE4680 Pipeline.22 2002-4-3

Load’s Add ress Calculation Stage

I   F   /   I   D  :  I   D  /   E  x R  e   g i    s   t    e  r  E  x  /   M  e  m  :  L   o  a  d  ’    s  A  d   d  r   e   s   s  M  e  m  /    W r  R  e   g i    s   t    e  r  P   C Data Me m WA Di RA Do I    U n i    t   A I RFile Di Ra Rb Rw MemWr  RegWr  ExtOp=1 Exec Unit  busA  busB Imm16 ALUOp=Add  ALUSrc=1 M  u x 1 0 MemtoReg 1 0 RegDst=0 Rt Rd  Imm16 PC+4 PC+4 Rs Rt P   C  +  4   Zero Branch 1 0 Clk

Ifetch Reg/Dec Exec Mem

You are here!

!Loc ation 10: lw $1, 0x100($2) $1 <- Mem[($2) + 0x100]

Remember Rt/Rd was connected to Rfile before?

(12)

ECE4680 Pipeline.23 2002-4-3

 A Detail View of th e Exec ut io n Un it

I   D  /   E  x R  e   g i    s   t    e  r  E  x  /   M  e  m  :  L   o  a  d  ’    s  M  e  m  o r   y A  d   d  r   e   s   s  ALU Control ALUctr  32  busA 32  busB E  x  t    e  n  d   e  r  M  u x 16 imm16 ALUSrc=1 ExtOp=1 3 A L   U Zero 0 1 32 ALUout 32 A  d   d   e  r  3 ALUOp=Add  << 2 32 PC+4 Target 32 Clk Exec

You are here!

Mem

Load’s Memory Access Stage

I   F   /   I   D  :  I   D  /   E  x R  e   g i    s   t    e  r  E  x  /   M  e  m R  e   g i    s   t    e  r  M  e  m  /    W r   :  L   o  a  d  ’    s  D  a  t    a P   C Data Mem WA Di RA Do I    U n i    t   A I RFile Di Ra Rb Rw MemWr=0 RegWr    ExtOp Exec Unit  busA  busB Imm16 ALUOp ALUSrc M  u x 1 0 MemtoReg 1 0 RegDst Rt Rd  Imm16 PC+4 PC+4 Rs Rt P   C  +  4   Zero Branch=0 1 0 Clk

Ifetch Reg/Dec Exec Mem

You are here!

(13)

ECE4680 Pipeline.25 2002-4-3

Load’s Write Back Stage

I   F   /   I   D  :  I   D  /   E  x R  e   g i    s   t    e  r  E  x  /   M  e  m R  e   g i    s   t    e  r  M  e  m  /    W r  R  e   g i    s   t    e  r  P   C Data Mem WA Di RA Do I    U n i    t   A I RFile Di Ra Rb Rw MemWr  RegWr=1   ExtOp Exec Unit  busA  busB Imm16 ALUOp ALUSrc M  u x 1 0 MemtoReg=1 1 0 RegDst Rt Rd  Imm16 PC+4 PC+4 Rs Rt P   C  +  4   Zero Branch 1 0 Clk

Ifetch Reg/Dec Exec Mem

You are somewhere out there!

!Loc ation 10: lw $1, 0x100($2) $1 <- Mem[($2) + 0x100]

Wr

ECE4680 Pipeline.26 2002-4-3

How About Control Signals?

I   F   /   I   D  :  I   D  /   E  x R  e   g i    s   t    e  r  E  x  /   M  e  m  :  L   o  a  d  ’    s  A  d   d  r   e   s   s  M  e  m  /    W r  R  e   g i    s   t    e  r  P   C Data Mem WA Di RA Do I    U n i    t   A I RFile Di Ra Rb Rw MemWr  RegWr  ExtOp=1 Exec Unit  busA  busB Imm16 ALUOp=Add  ALUSrc=1 M  u x 1 0 MemtoReg 1 0 RegDst=0 Rt Rd  Imm16 PC+4 PC+4 Rs Rt P   C  +  4   Zero Branch 1 0

Ifetch Reg/Dec Exec Mem

!Key Observation: Control Signals at Stage N = Func (Instr. at Stage N)

• N = Exec, Mem, or Wr 

!Example: Controls Signals at Exec Stage = Func(Load’s Exec)

Wr

Why no control signals at 1stand 2ndstages?

(14)

ECE4680 Pipeline.27 2002-4-3

Pipeline Control

!The Main Contro l generates the c ontrol signals duri ng Reg/Dec

• Contro l sig nals fo r Exec (ExtOp, ALUSrc, ...) are used 1 cycl e later 

• Control signals for Mem (MemWr Branch) are used 2 cycles later 

• Control sig nals for Wr (MemtoReg MemWr) are used 3 cycles later 

I   F   /   I   D R  e   g i    s   t    e  r  I   D  /   E  x R  e   g i    s   t    e  r  E  x  /   M  e  m R  e   g i    s   t    e  r  M  e  m  /    W r  R  e   g i    s   t    e  r 

Reg/Dec Exec Mem

ExtOp ALUOp RegDst ALUSrc Branch MemW r  MemtoReg RegWr  Main Control ExtOp ALUOp RegDst ALUSrc MemtoReg RegWr  MemtoReg RegWr  MemtoReg RegWr  Branch MemW r    Branch MemW r  Wr

Beginn ing o f the Wr’s Stage: A Real World Problem

! At t he beg inn in g of t he Wr stag e, we have a pr obl em if :

• RegAdr’s (Rd or Rt) Clk-to-Q > RegWr’s Clk-to-Q

!Similarly, at the beginning of the Mem stage, we have a problem if:

• WrAdr’ s Clk-to-Q > MemWr’s Clk-to-Q

!We have a race condition between Address and Write Enable!

!Why can M-cycle processor’s approach not be useful?

E  x  /   M  e  m M  e  m  /    W r   RegAdr  RegWr MemWr   Data WrAdr  Data Reg File Data Memory Clk RegAdr RegWr RegWr’s Clk-to-Q RegAdr’s Clk-to-Q Clk WrAdr MemWr MemWr’s Clk-to-Q WrAdr’s Clk-to-Q

(15)

ECE4680 Pipeline.29 2002-4-3

The Pipelin e Prob lem

!Multiple Cycle design prevents race condition between Addr and WrEn:

• Make sure Address i s stable by th e end of Cycle N

• As ser ts WrEn du ri ng Cycl e N + 1

!This approach can NOT be used in the pipeline design because:

• Must be able to write the register file every cycle

• Must be able write the data memory every cycle

Clock

Ifetch Reg/Dec Exec Mem Wr Store

Ifetch Reg/Dec Exec Mem Wr Store

Ifetch Reg/Dec Exec Mem Wr R-type

Ifetch Reg/Dec Exec Mem Wr R-type

!Solution? Recall 1-cycle processor’s approach.

ECE4680 Pipeline.30 2002-4-3

Synchronize Register File & Synchronize Memory

!Solution: And the Write Enable signal with t he Clock

• This is t he ONLY place where gating t he clock is used

• MUST consult circuit expert to ensure no tim ing violation:

- Example: Clock High Time > Write Access Delay

WrEn I_Addr  I_Data Reg File or Memory Clk I_Addr I_WrEn Address Data I_WrEn C_WrEn C_WrEn Clk Address Data WrEn Reg File or Memory

Synchronize Memory and Register File Address, Data, and WrEn must be stable at least 1 set-up time before the Clk edge Write occurs at the cycle following the clock edge that captures the signals Actual write

(16)

ECE4680 Pipeline.31 2002-4-3

 A Mo re Ex ten si ve Pipeli ning Exam ple

Clock

Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8

Ifetch Reg/Dec Exec Mem Wr 0: Load

Ifetch Reg/Dec Exec Mem Wr 4: R-type

Ifetch Reg/Dec Exec Mem Wr 8: Store

Ifetch Reg/Dec Exec Mem Wr 12: Beq (target is 1000) End of  Cycle 4 End of  Cycle 5 End of  Cycle 6 End of  Cycle 7

!End of Cycle 4: Load’s Mem, R-type’s Exec, Store’s Reg, Beq’s Ifetch

!End of Cycle 5: Load’s Wr, R-type’s Mem, Store’s Exec, Beq’s Reg

!End of Cycle 6: R-type’s Wr, Store’s Mem, Beq’s Exec

!End of Cycle 7: Store’s Wr, Beq’s Mem

Pipelining Example: End of Cycle 4

!0: Load’s Mem 4: R-type’s Exec 8: Store’s Reg 12: Beq’s Ifetch

I   F   /   I   D  :  B   e   q I   n  s   t   r   u  c   t   i    o n I   D  /   E  x  :   S   t    o r   e  ’    s   b   u  s  A  & B  E  x  /   M  e  m  :  R - t    y  p  e  ’    s  R  e   s   u l    t   M  e  m  /    W r   :  L   o  a  d  ’    s  D  o  u  t   P   C = 1   6  Data Mem WA Di RA Do I    U n i    t   A I RFile Di Ra Rb Rw RegWr=0   ExtOp=x Exec Unit  busA  busB Imm16 ALUOp=R-type ALUSrc=0 M  u x 1 0 MemtoReg=x 1 0 RegDst=1 Rt Rd  Imm16 PC+4 PC+4 Rs Rt P   C  +  4   Zero Branch=0 1 0 12: Beq’s Ifet

8: Store’s R eg 4: R-type’s Exec 0: Load’s Mem

Clk 

MemWr=0 Clk 

(17)

ECE4680 Pipeline.33 2002-4-3

Pipelining Example: End of Cycle 5

!0: Lw’ s Wr 4: R’s Mem 8: Store’s Exec 12: Beq’s Reg 16: R’s Ifetch

I   F   /   I   D  :  I   n  s   t   r   u  c   t   i    o n  @ 1   6  I   D  /   E  x  :  B   e   q ’    s   b   u  s  A  & B  E  x  /   M  e  m  :   S   t    o r   e  ’    s  A  d   d  r   e   s   s  M  e  m  /    W r   :  R - t    y  p  e  ’    s  R  e   s   u l    t   P   C = 2   0  Data Me m WA Di RA Do I    U n i    t   A I RFile Di Ra Rb Rw RegWr=1   ExtOp=1 Exec Unit  busA  busB Imm16 ALUOp=Add  ALUSrc=1 M  u x 1 0 MemtoReg=1 1 0 RegDst=x Rt Rd  Imm16 PC+4 PC+4 Rs Rt P   C  +  4   Zero Branch=0 1 0 16: R’s Ifet

12: Beq’s Reg 8: Store’s Exec 4: R-type’s Mem 0: Load’s Wr

Clk 

MemWr=0 Clk 

ECE4680 Pipeline.34 2002-4-3

Pipelining Example: End of Cycle 6

!4: R’s Wr 8: Store’s Mem 12: Beq’s Exec 16: R’s Reg 20: R’s Ifet

I   F   /   I   D  :  I   n  s   t   r   u  c   t   i    o n  @ 2   0  I   D  /   E  x  :  R - t    y  p  e  ’    s   b   u  s  A  & B  E  x  /   M  e  m  :  B   e   q ’    s  R  e   s   u l    t    s  M  e  m  /    W r   :   N  o  t   h  i   n  g f    o r   S   t   P   C = 2  4  Data Me m WA Di RA Do I    U n i    t   A I RFile Di Ra Rb Rw RegWr=1   ExtOp=1 Exec Unit  busA  busB Imm16 ALUOp=Sub ALUSrc=0 M  u x 1 0 MemtoReg=0 1 0 RegDst=x Rt Rd  Imm16 PC+4 PC+4 Rs Rt P   C  +  4   Zero Branch=0 1 0 20: R-type’s Ifet

16: R-type’s R eg 12: Beq’s E xec 8: Store’s M em 4: R-type’s Wr

Clk 

MemWr=1 Clk 

(18)

ECE4680 Pipeline.35 2002-4-3

Pipelining Example: End of Cycle 7

!8: Store’s Wr 12: Beq’s Mem 16: R’s Exec 20: R’s Reg 24: R’s Ifet

I   F   /   I   D  :  I   n  s   t   r   u  c   t   i    o n  @ 2  4  I   D  /   E  x  :  R - t    y  p  e  ’    s   b   u  s  A  & B  E  x  /   M  e  m  :  R  t    y  p  e  ’    s  R  e   s   u l    t    s  M  e  m  /    W r   :   N  o  t   h  i   n  g f    o r  B   e   q P   C = 1   0   0   0  Data Me m WA Di RA Do I    U n i    t   A I RFile Di Ra Rb Rw RegWr=0   ExtOp=x Exec Unit  busA  busB Imm16 ALUOp=R-type ALUSrc=0 M  u x 1 0 MemtoReg=x 1 0 RegDst=1 Rt Rd  Imm16 PC+4 PC+4 Rs Rt P   C  +  4   Zero Branch=1 1 0 24: R-type’s Ifet

20: R-type’s Reg 16: R-type’s Exec 12: Beq’s Mem 8: Store’s Wr

Clk 

MemWr=0 Clk 

The Delay Branch Phenom enon

! Al th ou gh Beq is fet ch ed d urin g Cycl e 4:

• Target address is NOT written int o the PC until the end of Cycle 7

• Branch’s target is NOT fetched until Cycle 8

• 3-instruc tion delay before the branch take effect

!This is referred to as Branch Hazard:

• Clever design techniques c an reduce the delay to ONE instru ction

Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9 Cycle 10 Cycle 11

Ifetch Reg/Dec Exec Mem Wr

Ifetch Reg/Dec Exec Mem Wr 16: R-type

Ifetch Reg/Dec Exec Mem Wr

Ifetch Reg/Dec Exec Mem Wr 24: R-type

12: Beq (target is 1000)

20: R-type Clk

Ifetch Reg/Dec Exec Mem Wr 1000: Target of Br

(19)

ECE4680 Pipeline.37 2002-4-3

The Delay Load Phenomenon

! Al th ou gh Lo ad i s f etc hed duri ng Cycle 1:

• The data is NOT wri tten int o the Reg File until the end of Cycle 5

• We cannot read this value from the Reg File until Cycle 6

• 3-instruc tion delay before the load take effect

!This is r eferred to as Data Hazard:

• Clever design techniques c an reduce the delay to ONE instru ction

Clock

Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8

Ifetch Reg/Dec Exec Mem Wr I0: Load

Ifetch Reg/Dec Exec Mem Wr Plus 1

Ifetch Reg/Dec Exec Mem Wr Plus 2

Ifetch Reg/Dec Exec Mem Wr Plus 3

Ifetch Reg/Dec Exec Mem Wr Plus 4

ECE4680 Pipeline.38 2002-4-3

Summary

!Disadvantages of the Single Cycle Processor 

• Long cycle time

• Cycle time is too long fo r all instruct ions except the Load

!Multiple Clock Cycle Processor:

• Divide the instruct ions into sm aller steps

• Execute each step (instead of th e entire instruction) in on e cycle

!Pipeline Processor:

• Natural enhancement of the multiple clock cycle processor 

• Each functional unit can only be used once per instruct ion

• If a instruction is going t o use a functional unit:

- it must use it at the same stage as all other instructions

• Pipeline Control:

- Each stage’s cont rol signal depends ONLY on the instruction

(20)

ECE4680 Pipeline.39 2002-4-3

Single Cycle, Multiple Cycle, vs. Pipeline

Clk

Cycle 1

Multiple Cycle Implementation:

Ifetch Reg Exec Mem Wr

Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9 Cycle 10

Load Ifetch Reg Exec Mem Wr

Ifetch Reg Exec Mem

Load Store

Pipeline Implementation:

Ifetch Reg Exec Mem Wr

Store Clk

Single Cycle Implementation:

Load Store Waste

Ifetch R-type

Ifetch Reg Exec Mem Wr

R-type

References

Related documents