ECE4680 Pipeline.1 2002-4-3
ECE4680 Computer Organization and Architecture
Designing a Pipeline Processor
A Single Cycle Processor
[Figure: single cycle datapath. Instruction Fetch Unit (Jump, Branch, Zero) producing Instruction<31:0>; 32 x 32-bit register file (Ra, Rb, Rw, busA, busB, busW, RegWr); Extender (imm16, ExtOp); ALU (ALUctr, Zero); Data Memory (Adr, Data In, WrEn, MemWr); muxes controlled by RegDst, ALUSrc, MemtoReg; Main Control (op = Instr<31:26>) and ALU Control (func = Instr<5:0>, ALUop)]
Drawbacks of this Single Cycle Processor
!Long cycle time:
• Cycle time must be long enough for the load instruction:
- PC’s Clock-to-Q +
- Instruction Memory Access Time +
- Register File Access Time +
- ALU Delay (address calculation) +
- Data Memory Access Time +
- Register File Setup Time +
- Clock Skew
!Cycle time is much longer than needed for all other instructions. Examples:
• R-type instructions do not require data memory access
• Jump does not require ALU operation nor data memory access
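The cycle-time sum above can be made concrete with a small sketch. The delay values are hypothetical placeholders (the slides give no numbers); only the structure of the sum comes from the slide. The R-type and Jump paths are a simplification that removes only the terms the slide names.

```python
# Hypothetical delays in nanoseconds; the slides give no numeric values.
DELAYS_NS = {
    "pc_clk_to_q": 1,
    "instr_mem_access": 10,
    "reg_file_access": 5,
    "alu_delay": 8,
    "data_mem_access": 10,
    "reg_file_setup": 1,
    "clock_skew": 1,
}

# Load uses every term; R-type skips data memory; Jump skips ALU and data memory.
LOAD_PATH = list(DELAYS_NS)
R_TYPE_PATH = [t for t in LOAD_PATH if t != "data_mem_access"]
JUMP_PATH = [t for t in LOAD_PATH if t not in ("alu_delay", "data_mem_access")]


def path_delay(terms):
    """Sum the delays of the listed terms."""
    return sum(DELAYS_NS[t] for t in terms)


# The single clock period must fit the slowest instruction, which is the load.
cycle_time = path_delay(LOAD_PATH)

for name, path in [("load", LOAD_PATH), ("R-type", R_TYPE_PATH), ("jump", JUMP_PATH)]:
    needed = path_delay(path)
    print(f"{name}: needs {needed} ns, idles {cycle_time - needed} ns per cycle")
```

With these made-up numbers the clock period is set by the load, and every other instruction leaves part of each cycle unused.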
Overview of a Multiple Cycle Implementation
!The root of the single cycle processor’s problems:
• The cycle time has to be long enough for the slowest instruction
!Solution:
• Break the instruction into smaller steps
• Execute each step (instead of the entire instruction) in one cycle
- Cycle time: time it takes to execute the longest step
- Keep all the steps to have similar length
• This is the essence of the multiple cycle processor
!The advantages of the multiple cycle processor:
• Cycle time is much shorter
• Different instructions take different number of cycles to complete
- Load takes five cycles
- Jump only takes three cycles
• Allows a functional unit to be used more than once per instruction
Why will this hinder
Multiple Cycle Processor
!MCP: A functional unit can be used more than once per instruction
[Figure: multiple cycle datapath. PC (PCWr, PCWrCond, PCSrc); Ideal Memory (RAdr, WrAdr, Din, Dout, MemWr, IorD); Instruction Reg (IRWr); Reg File (Ra, Rb, Rw, busA, busB, busW, RegWr, RegDst); Extend (Imm 16, ExtOp) and << 2; ALU (ALUSelA, ALUSelB, ALUOp, ALU Control, Zero); Target register (BrWr); MemtoReg mux]
Outline of Today’s Lecture: Pipelining
!Introduction to the Concept of Pipelined Processor
!Pipelined Datapath and Pipelined Control
!How to Avoid Race Condition in a Pipeline Design?
Timing Diagram of a Load Instruction
[Figure: timing diagram of a load over one clock period. Signals changing from Old Value to New Value: PC; Rs, Rt, Rd, Op, Func; ALUctr; ExtOp; ALUSrc; RegWr; busA; busB; Address; busW. Delays shown: Clk-to-Q, Instruction Memory Access Time, Delay through Control Logic, Register File Access Time, Delay through Extender & Mux, ALU Delay, Data Memory Access Time, Register File Write Time. Phases: Instruction Fetch; Instr Decode / Reg. Fetch; Address; Data Memory; Reg Wr]
The Five Stages of Load
!Ifetch: Instruction Fetch
• Fetch the instruction from the Instruction Memory
!Reg/Dec: Registers Fetch and Instruction Decode
!Exec: Calculate the memory address
!Mem: Read the data from the Data Memory
!Wr: Write the data back to the register file
Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5
Ifetch Reg/Dec Exec Mem Wr Load
Key Ideas Behind Pipelining
!Grading the midterm exams:
• 5 problems, five people grading the exam
• Each person ONLY grades one problem
• Pass the exam to the next person as soon as one finishes his part
• Assume each problem takes 0.5 hour to grade
- Each individual exam still takes 2.5 hours to grade
- But with 5 people, all exams can be graded much quicker
!The load instruction has 5 stages:
• Five independent functional units to work on each stage
- Each functional unit is used only once
• The 2nd load can start as soon as the 1st finishes its Ifetch stage
• Each load still takes five cycles to complete
• The throughput, however, is much higher
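The grading analogy can be checked with a few lines of arithmetic. This is a minimal sketch: the single-grader baseline and the batch size of 20 exams are assumptions added for illustration, while the 5 problems and 0.5 hour per problem come from the slide.

```python
def serial_hours(num_exams, num_problems=5, hours_per_problem=0.5):
    """Assumed baseline: one grader handles every problem of every exam."""
    return num_exams * num_problems * hours_per_problem


def pipelined_hours(num_exams, num_problems=5, hours_per_problem=0.5):
    """One grader per problem; each exam is passed down the line."""
    if num_exams == 0:
        return 0.0
    # The first exam needs num_problems steps; every later exam
    # finishes exactly one step after the exam in front of it.
    return (num_problems + num_exams - 1) * hours_per_problem


for n in (1, 20):
    print(f"{n} exam(s): serial {serial_hours(n)} h, pipelined {pipelined_hours(n)} h")
```

A single exam takes 2.5 hours either way, which is the latency point the slide makes; the gain appears only in how quickly the whole batch finishes.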
Pipelining the Load Instruction
!The five independent functional units in the pipeline datapath are:
• Instruction Memory for the Ifetch stage
• Register File’s Read ports (busA and busB) for the Reg/Dec stage
• ALU for the Exec stage
• Data Memory for the Mem stage
• Register File’s Write port (busW) for the Wr stage
!One instruction enters the pipeline every cycle
• One instruction comes out of the pipeline (completes) every cycle
• Comparison with the multiple cycle processor for three loads: 5 x 3 = 15 cycles versus 7 cycles
• The “Effective” Cycles per Instruction (CPI) is 1
Clock
Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7
Ifetch Reg/Dec Exec Mem Wr 1st lw
Ifetch Reg/Dec Exec Mem Wr 2nd lw
Ifetch Reg/Dec Exec Mem Wr 3rd lw
Register file is used 2 times (read ports in Reg/Dec, write port in Wr), but this is no problem!
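The three-load timing diagram can be regenerated with a short sketch. It assumes one instruction is issued per cycle with no stalls, as in the diagram; the function names are illustrative only.

```python
STAGES = ["Ifetch", "Reg/Dec", "Exec", "Mem", "Wr"]


def pipeline_schedule(num_instructions):
    """Return {instruction index: {cycle: stage}}; instruction i is fetched in cycle i + 1."""
    return {
        i: {i + 1 + s: stage for s, stage in enumerate(STAGES)}
        for i in range(num_instructions)
    }


def total_cycles_pipelined(n):
    """Cycles until the last of n instructions leaves the pipeline."""
    return len(STAGES) + n - 1 if n else 0


def total_cycles_multicycle(n):
    """Cycles when each five-step instruction runs to completion before the next starts."""
    return len(STAGES) * n


schedule = pipeline_schedule(3)
for i, row in schedule.items():
    cells = [row.get(c, "-") for c in range(1, total_cycles_pipelined(3) + 1)]
    print(f"lw #{i + 1}: " + " ".join(f"{c:8}" for c in cells))
```

For three loads this gives 7 cycles pipelined against 15 cycles on the multiple cycle processor, matching the comparison on the slide.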
The Four Stages of R-type
!Ifetch: Instruction Fetch
• Fetch the instruction from the Instruction Memory
!Reg/Dec: Registers Fetch and Instruction Decode
!Exec: ALU operates on the two register operands
!Wr: Write the ALU output back to the register file
Cycle 1 Cycle 2 Cycle 3 Cycle 4
Ifetch Reg/Dec Exec Wr R-type
Pipelining the R-type and Load Instruction
!Ideal Case
• each functional unit is used once per instruction AND
• all instructions are the same, just as discussed in the previous 2 slides.
!We have a problem:
• Two instructions try to write to the register file at the same time!
!We will see many other problems involved in pipelining.
Clock
Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9
Ifetch Reg/Dec Exec Wr R-type
Ifetch Reg/Dec Exec Wr R-type
Ifetch Reg/Dec Exec Mem Wr Load
Ifetch Reg/Dec Exec Wr R-type
Ifetch Reg/Dec Exec Wr R-type
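The write-port collision in the diagram can be found mechanically. The sketch below assumes one instruction is fetched per cycle and uses the stage lists from the slides (four stages for R-type, five for Load); the helper name `write_port_conflicts` is made up for this example.

```python
from collections import defaultdict

STAGE_LISTS = {
    "R-type": ["Ifetch", "Reg/Dec", "Exec", "Wr"],
    "Load": ["Ifetch", "Reg/Dec", "Exec", "Mem", "Wr"],
}


def write_port_conflicts(program, stage_lists=STAGE_LISTS):
    """Return {cycle: [instruction indices]} for cycles with more than one instruction in Wr.

    Instruction i (0-based) is fetched in cycle i + 1.
    """
    writers = defaultdict(list)
    for i, kind in enumerate(program):
        wr_cycle = (i + 1) + stage_lists[kind].index("Wr")
        writers[wr_cycle].append(i)
    return {cycle: idx for cycle, idx in writers.items() if len(idx) > 1}


# Same order as the diagram: two R-types, a Load, then two more R-types.
program = ["R-type", "R-type", "Load", "R-type", "R-type"]
print(write_port_conflicts(program))
```

The Load (index 2) and the R-type fetched right after it (index 3) both reach Wr in Cycle 7, which is the conflict the slide points out.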
Important Observation
!Each functional unit can only be used once per instruction:
• necessary but not sufficient.
!Each functional unit must be used at the same stage for all instructions:
• Load uses Register File’s Write Port during its 5th stage
• R-type uses Register File’s Write Port during its 4th stage
Ifetch Reg/Dec Exec Mem Wr Load
1 2 3 4 5
Ifetch Reg/Dec Exec Wr R-type
1 2 3 4
Problem !
Solution 1: Insert “Bubble” into the Pipeline
!Insert a “bubble” into the pipeline to prevent 2 writes at the same cycle
• The control logic can be complex
!No instruction is completed during Cycle 5:
• The “Effective” CPI for load is 2
[Figure: pipeline timing diagram, Cycle 1 to Cycle 9, for an R-type, a Load, and the R-type instructions that follow; a Pipeline Bubble delays the R-type instructions behind the Load]
Solution 2: Delay R-type’s Write by One Cycle
!Delay R-type’s register write by one cycle:
• Now R-type instructions also use Reg File’s write port at Stage 5
• Mem stage is a NOOP stage: nothing is being done
Ifetch Reg/Dec Exec Mem Wr R-type
1 2 3 4 5
Clock
Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9
Ifetch Reg/Dec Exec Mem Wr R-type
Ifetch Reg/Dec Exec Mem Wr R-type
Ifetch Reg/Dec Exec Mem Wr Load
Ifetch Reg/Dec Exec Mem Wr R-type
Ifetch Reg/Dec Exec Mem Wr R-type
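A quick check of why the delayed write removes the conflict: once every instruction passes through the same five stages, each one reaches Wr exactly four cycles after its fetch, so no two instructions can share a write cycle. The sketch assumes one fetch per cycle and no stalls.

```python
# With Solution 2 every instruction uses the same five stages;
# for R-type the Mem stage is a NOOP.
UNIFORM_STAGES = ["Ifetch", "Reg/Dec", "Exec", "Mem", "Wr"]


def wr_cycles(program):
    """Cycle in which each instruction uses the write port; instruction i is fetched in cycle i + 1."""
    offset = UNIFORM_STAGES.index("Wr")
    return [(i + 1) + offset for i, _ in enumerate(program)]


program = ["R-type", "R-type", "Load", "R-type", "R-type"]
cycles = wr_cycles(program)
print(dict(zip(program, cycles)) if False else list(zip(program, cycles)))

# Distinct write cycles mean the register file write port is never contended.
no_conflict = len(set(cycles)) == len(cycles)
print("write port conflict-free:", no_conflict)
```

The five instructions write in Cycles 5 through 9, one per cycle, at the cost of one idle stage per R-type instruction.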
The Four Stages of Store
!Ifetch: Instruction Fetch
• Fetch the instruction from the Instruction Memory
!Reg/Dec: Registers Fetch and Instruction Decode
!Exec: Calculate the memory address
!Mem: Write the data into the Data Memory
Cycle 1 Cycle 2 Cycle 3 Cycle 4
Ifetch Reg/Dec Exec Mem
Store Wr
The Four Stages of Beq
!Ifetch: Instruction Fetch
• Fetch the instruction from the Instruction Memory
!Reg/Dec: Registers Fetch and Instruction Decode
!Exec: See slide 23 (A Detail View of the Execution Unit) for more details
• ALU compares the two register operands
• Adder calculates the branch target address
!Mem: If the registers we compared in the Exec stage are the same,
• Write the branch target address into the PC
Cycle 1 Cycle 2 Cycle 3 Cycle 4
Ifetch Reg/Dec Exec Mem
Beq Wr
Compare Beq in the pipeline processor with Beq in the multiple cycle processor. What is different, and why?
NOOP!
A Pipelined Datapath
[Figure: pipelined datapath. PC and IUnit; IF/ID Register; RFile (Ra, Rb, Rw, Di, RegWr; Rs, Rt, Rd, RegDst); ID/Ex Register; Exec Unit (busA, busB, Imm16, PC+4, ExtOp, ALUOp, ALUSrc, Zero); Ex/Mem Register; Data Mem (RA, WA, Di, Do, MemWr, Branch); Mem/Wr Register; MemtoReg mux; Clk]
Ifetch Reg/Dec Exec Mem Wr
Why do we need such registers? Why is there no datapath under Wr? Clock-to-Q delay
The Instruction Fetch Stage
[Figure: pipelined datapath during Ifetch. IF/ID: lw $1, 100($2); PC = 14]
Ifetch Reg/Dec Exec Mem
You are here!
!Location 10: lw $1, 0x100($2) $1 <- Mem[($2) + 0x100]
A Detail View of the Instruction Unit
!Location 10: lw $1, 0x100($2)
[Figure: Instruction Unit. PC supplies Instruction Address 10 to the Instruction Memory; an Adder adds “4” so that PC = 14; the fetched Instruction, lw $1, 100($2), is latched into IF/ID; Clk]
Ifetch
You are here!
Reg/Dec
The Decode / Register Fetch Stage
[Figure: pipelined datapath during Reg/Dec. ID/Ex: Reg. 2 & 0x100]
Ifetch Reg/Dec Exec Mem
You are here!
!Location 10: lw $1, 0x100($2) $1 <- Mem[($2) + 0x100]
Load’s Address Calculation Stage
[Figure: pipelined datapath during Exec. Ex/Mem: Load’s Address; ExtOp=1, ALUOp=Add, ALUSrc=1, RegDst=0]
Ifetch Reg/Dec Exec Mem
You are here!
!Location 10: lw $1, 0x100($2) $1 <- Mem[($2) + 0x100]
Remember Rt/Rd was connected to Rfile before?
ECE4680 Pipeline.23 2002-4-3
A Detail View of the Execution Unit
[Figure: Execution Unit between the ID/Ex Register and Ex/Mem (Load’s Memory Address). Extender (imm16, 16 to 32 bits, ExtOp=1); Mux selecting busB or the extended immediate (ALUSrc=1); ALU producing ALUout and Zero, driven by ALU Control (ALUOp=Add, ALUctr); Adder producing Target from PC+4 and the immediate << 2; Clk]
Exec
You are here!
Mem
Load’s Memory Access Stage
[Figure: pipelined datapath during Mem. Mem/Wr: Load’s Data; MemWr=0, Branch=0]
Ifetch Reg/Dec Exec Mem
You are here!
Load’s Write Back Stage
[Figure: pipelined datapath during Wr. RegWr=1, MemtoReg=1]
Ifetch Reg/Dec Exec Mem Wr
You are somewhere out there!
!Location 10: lw $1, 0x100($2) $1 <- Mem[($2) + 0x100]
How About Control Signals?
[Figure: pipelined datapath with Load in Exec. Ex/Mem: Load’s Address; ExtOp=1, ALUOp=Add, ALUSrc=1, RegDst=0]
Ifetch Reg/Dec Exec Mem Wr
!Key Observation: Control Signals at Stage N = Func (Instr. at Stage N)
• N = Exec, Mem, or Wr
!Example: Control Signals at Exec Stage = Func(Load’s Exec)
Why are there no control signals at the 1st and 2nd stages?
Pipeline Control
!The Main Control generates the control signals during Reg/Dec
• Control signals for Exec (ExtOp, ALUSrc, ...) are used 1 cycle later
• Control signals for Mem (MemWr, Branch) are used 2 cycles later
• Control signals for Wr (MemtoReg, RegWr) are used 3 cycles later
[Figure: Main Control in Reg/Dec feeding the pipeline registers. ID/Ex carries ExtOp, ALUSrc, ALUOp, RegDst, MemWr, Branch, MemtoReg, RegWr; Ex/Mem carries MemWr, Branch, MemtoReg, RegWr; Mem/Wr carries MemtoReg, RegWr]
Reg/Dec Exec Mem Wr
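The way control signals travel with the instruction can be sketched as a bundle that shrinks at each pipeline register. The signal values for lw are the ones shown in the load walkthrough slides; the dictionary layout and the helper names `main_control_decode` and `advance` are assumptions made for this illustration, not the actual hardware structure.

```python
# Control values for lw as shown in the load walkthrough slides.
LW_CONTROL = {
    "Exec": {"ExtOp": 1, "ALUSrc": 1, "ALUOp": "Add", "RegDst": 0},
    "Mem": {"MemWr": 0, "Branch": 0},
    "Wr": {"MemtoReg": 1, "RegWr": 1},
}


def main_control_decode(control):
    """Reg/Dec: produce the full bundle that is latched into the ID/Ex register."""
    return {stage: dict(signals) for stage, signals in control.items()}


def advance(bundle, consumed_stage):
    """Drop the signals just used; the rest moves on to the next pipeline register."""
    return {stage: sig for stage, sig in bundle.items() if stage != consumed_stage}


id_ex = main_control_decode(LW_CONTROL)  # latched at the end of Reg/Dec
exec_signals = id_ex["Exec"]             # used 1 cycle later
ex_mem = advance(id_ex, "Exec")
mem_signals = ex_mem["Mem"]              # used 2 cycles later
mem_wr = advance(ex_mem, "Mem")
wr_signals = mem_wr["Wr"]                # used 3 cycles later

print(exec_signals, mem_signals, wr_signals, sep="\n")
```

Each stage therefore sees only the signals decoded for the instruction it currently holds, which is the key observation from the previous slide.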
Beginning of the Wr’s Stage: A Real World Problem
!At the beginning of the Wr stage, we have a problem if:
• RegAdr’s (Rd or Rt) Clk-to-Q > RegWr’s Clk-to-Q
!Similarly, at the beginning of the Mem stage, we have a problem if:
• WrAdr’s Clk-to-Q > MemWr’s Clk-to-Q
!We have a race condition between Address and Write Enable!
!Why can the M-cycle processor’s approach not be used here?
[Figure: Ex/Mem register driving WrAdr, Data, and MemWr into the Data Memory; Mem/Wr register driving RegAdr, Data, and RegWr into the Reg File; waveforms against Clk comparing RegAdr’s Clk-to-Q with RegWr’s Clk-to-Q, and WrAdr’s Clk-to-Q with MemWr’s Clk-to-Q]
The Pipeline Problem
!Multiple Cycle design prevents race condition between Addr and WrEn:
• Make sure Address is stable by the end of Cycle N
• Assert WrEn during Cycle N + 1
!This approach can NOT be used in the pipeline design because:
• Must be able to write the register file every cycle
• Must be able to write the data memory every cycle
Clock
Ifetch Reg/Dec Exec Mem Wr Store
Ifetch Reg/Dec Exec Mem Wr Store
Ifetch Reg/Dec Exec Mem Wr R-type
Ifetch Reg/Dec Exec Mem Wr R-type
!Solution? Recall 1-cycle processor’s approach.
Synchronize Register File & Synchronize Memory
!Solution: AND the Write Enable signal with the Clock
• This is the ONLY place where gating the clock is used
• MUST consult circuit expert to ensure no timing violation:
- Example: Clock High Time > Write Access Delay
[Figure: Reg File or Memory with inputs Address, Data, WrEn, and Clk; internal latched signals I_Addr, I_Data, I_WrEn; C_WrEn is I_WrEn gated with Clk; waveforms of Clk, Address, Data, WrEn, I_WrEn, and C_WrEn with the actual write marked]
Synchronize Memory and Register File:
• Address, Data, and WrEn must be stable at least 1 set-up time before the Clk edge
• Write occurs at the cycle following the clock edge that captures the signals
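The gating idea can be illustrated at the logic level. This is a rough sketch only: it assumes C_WrEn is simply the latched I_WrEn ANDed with Clk, as the solution bullet suggests, and it models each clock cycle as two half-cycle samples with made-up values. Real designs need the timing checks the slide warns about.

```python
def gated_write_enable(i_wren: int, clk: int) -> int:
    """C_WrEn = I_WrEn AND Clk (assumed gate, per the slide's solution bullet)."""
    return i_wren & clk


# Two half-cycle samples per clock cycle: (Clk level, latched I_WrEn).
# I_WrEn is high only during the second cycle in this made-up trace.
samples = [(1, 0), (0, 0), (1, 1), (0, 1), (1, 0), (0, 0)]

c_wren = [gated_write_enable(i_wren, clk) for clk, i_wren in samples]
print(c_wren)
```

The write pulse appears only while Clk is high in the cycle where the latched enable is set, so the address and data captured at the clock edge are already stable when the write happens.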
A More Extensive Pipelining Example
Clock
Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8
Ifetch Reg/Dec Exec Mem Wr 0: Load
Ifetch Reg/Dec Exec Mem Wr 4: R-type
Ifetch Reg/Dec Exec Mem Wr 8: Store
Ifetch Reg/Dec Exec Mem Wr 12: Beq (target is 1000)
!End of Cycle 4: Load’s Mem, R-type’s Exec, Store’s Reg, Beq’s Ifetch
!End of Cycle 5: Load’s Wr, R-type’s Mem, Store’s Exec, Beq’s Reg
!End of Cycle 6: R-type’s Wr, Store’s Mem, Beq’s Exec
!End of Cycle 7: Store’s Wr, Beq’s Mem
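The per-cycle snapshots listed above follow from one rule: an instruction fetched in cycle f is in stage number (c - f) during cycle c. A minimal sketch, assuming one fetch per cycle and no stalls (the function name `stages_at` is illustrative):

```python
STAGES = ["Ifetch", "Reg/Dec", "Exec", "Mem", "Wr"]

# (address, instruction) in program order, as in the example.
PROGRAM = [(0, "Load"), (4, "R-type"), (8, "Store"), (12, "Beq")]


def stages_at(cycle, program=PROGRAM):
    """Map each in-flight instruction to the stage it occupies during the given cycle.

    The instruction at index i is fetched in cycle i + 1.
    """
    result = {}
    for i, (addr, name) in enumerate(program):
        stage_index = cycle - (i + 1)
        if 0 <= stage_index < len(STAGES):
            result[f"{addr}: {name}"] = STAGES[stage_index]
    return result


for cycle in range(4, 8):
    print(f"End of Cycle {cycle}: {stages_at(cycle)}")
```

Instructions that have already left the pipeline, such as the Load after Cycle 5, simply drop out of the snapshot.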
Pipelining Example: End of Cycle 4
!0: Load’s Mem 4: R-type’s Exec 8: Store’s Reg 12: Beq’s Ifetch
[Figure: pipelined datapath at the end of Cycle 4. IF/ID: Beq Instruction; ID/Ex: Store’s busA & B; Ex/Mem: R-type’s Result; Mem/Wr: Load’s Dout; PC = 16. Exec: ExtOp=x, ALUOp=R-type, ALUSrc=0, RegDst=1. Mem: MemWr=0, Branch=0. Wr: RegWr=0, MemtoReg=x]
Pipelining Example: End of Cycle 5
!0: Lw’s Wr 4: R’s Mem 8: Store’s Exec 12: Beq’s Reg 16: R’s Ifetch
[Figure: pipelined datapath at the end of Cycle 5. IF/ID: Instruction @ 16; ID/Ex: Beq’s busA & B; Ex/Mem: Store’s Address; Mem/Wr: R-type’s Result; PC = 20. Exec: ExtOp=1, ALUOp=Add, ALUSrc=1, RegDst=x. Mem: MemWr=0, Branch=0. Wr: RegWr=1, MemtoReg=1]
Pipelining Example: End of Cycle 6
!4: R’s Wr 8: Store’s Mem 12: Beq’s Exec 16: R’s Reg 20: R’s Ifetch
[Figure: pipelined datapath at the end of Cycle 6. IF/ID: Instruction @ 20; ID/Ex: R-type’s busA & B; Ex/Mem: Beq’s Results; Mem/Wr: Nothing for St; PC = 24. Exec: ExtOp=1, ALUOp=Sub, ALUSrc=0, RegDst=x. Mem: MemWr=1, Branch=0. Wr: RegWr=1, MemtoReg=0]
Pipelining Example: End of Cycle 7
!8: Store’s Wr 12: Beq’s Mem 16: R’s Exec 20: R’s Reg 24: R’s Ifetch
[Figure: pipelined datapath at the end of Cycle 7. IF/ID: Instruction @ 24; ID/Ex: R-type’s busA & B; Ex/Mem: R-type’s Results; Mem/Wr: Nothing for Beq; PC = 1000. Exec: ExtOp=x, ALUOp=R-type, ALUSrc=0, RegDst=1. Mem: MemWr=0, Branch=1. Wr: RegWr=0, MemtoReg=x]
The Delay Branch Phenomenon
!Although Beq is fetched during Cycle 4:
• Target address is NOT written into the PC until the end of Cycle 7
• Branch’s target is NOT fetched until Cycle 8
• 3-instruction delay before the branch takes effect
!This is referred to as Branch Hazard:
• Clever design techniques can reduce the delay to ONE instruction
Clk
Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9 Cycle 10 Cycle 11
Ifetch Reg/Dec Exec Mem Wr 12: Beq (target is 1000)
Ifetch Reg/Dec Exec Mem Wr 16: R-type
Ifetch Reg/Dec Exec Mem Wr 20: R-type
Ifetch Reg/Dec Exec Mem Wr 24: R-type
Ifetch Reg/Dec Exec Mem Wr 1000: Target of Br
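The branch delay follows directly from the stage in which the PC is updated. The sketch below reproduces the numbers on the slide for a branch resolved in Mem. Resolving the branch in Reg/Dec to get a one-instruction delay is an assumption added for illustration; the slide does not say which technique it has in mind.

```python
STAGES = ["Ifetch", "Reg/Dec", "Exec", "Mem", "Wr"]


def branch_delay(fetch_cycle, resolve_stage="Mem"):
    """Return (cycle at whose end the PC is written,
               cycle in which the target is fetched,
               number of instructions fetched between branch and target)."""
    pc_update_cycle = fetch_cycle + STAGES.index(resolve_stage)
    target_fetch_cycle = pc_update_cycle + 1
    delay_instructions = target_fetch_cycle - fetch_cycle - 1
    return pc_update_cycle, target_fetch_cycle, delay_instructions


# Beq fetched in Cycle 4 and resolved in its Mem stage, as on the slide.
print(branch_delay(4))

# Hypothetical variant: the branch decision is moved into Reg/Dec.
print(branch_delay(4, resolve_stage="Reg/Dec"))
```

The three delay instructions in the first case are the ones at addresses 16, 20, and 24 in the diagram.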
The Delay Load Phenomenon
!Although Load is fetched during Cycle 1:
• The data is NOT written into the Reg File until the end of Cycle 5
• We cannot read this value from the Reg File until Cycle 6
• 3-instruction delay before the load takes effect
!This is referred to as Data Hazard:
• Clever design techniques can reduce the delay to ONE instruction
Clock
Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8
Ifetch Reg/Dec Exec Mem Wr I0: Load
Ifetch Reg/Dec Exec Mem Wr Plus 1
Ifetch Reg/Dec Exec Mem Wr Plus 2
Ifetch Reg/Dec Exec Mem Wr Plus 3
Ifetch Reg/Dec Exec Mem Wr Plus 4
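The load delay can be derived the same way as the branch delay. This sketch assumes registers are read in Reg/Dec and written at the end of Wr, with no forwarding, which is the situation the slide describes.

```python
STAGES = ["Ifetch", "Reg/Dec", "Exec", "Mem", "Wr"]


def load_use_delay(load_fetch_cycle=1):
    """Return (cycle at whose end the loaded value is written,
               first cycle in which the value can be read from the Reg File,
               number of following instructions that cannot see the value)."""
    write_cycle = load_fetch_cycle + STAGES.index("Wr")
    first_read_cycle = write_cycle + 1
    # An instruction reads its registers in Reg/Dec, one cycle after its own fetch.
    first_safe_fetch_cycle = first_read_cycle - STAGES.index("Reg/Dec")
    delayed = first_safe_fetch_cycle - load_fetch_cycle - 1
    return write_cycle, first_read_cycle, delayed


print(load_use_delay(1))
```

In the diagram, Plus 1 through Plus 3 read the register file too early, and Plus 4 (fetched in Cycle 5, Reg/Dec in Cycle 6) is the first instruction that reads the loaded value.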
Summary
!Disadvantages of the Single Cycle Processor
• Long cycle time
• Cycle time is too long for all instructions except the Load
!Multiple Clock Cycle Processor:
• Divide the instructions into smaller steps
• Execute each step (instead of the entire instruction) in one cycle
!Pipeline Processor:
• Natural enhancement of the multiple clock cycle processor
• Each functional unit can only be used once per instruction
• If an instruction is going to use a functional unit:
- it must use it at the same stage as all other instructions
• Pipeline Control:
- Each stage’s control signal depends ONLY on the instruction that is currently in that stage
Single Cycle, Multiple Cycle, vs. Pipeline
Single Cycle Implementation:
Load Store Waste
Multiple Cycle Implementation:
Clk: Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9 Cycle 10
Load: Ifetch Reg Exec Mem Wr
Store: Ifetch Reg Exec Mem
R-type: Ifetch
Pipeline Implementation:
Load: Ifetch Reg Exec Mem Wr
Store: Ifetch Reg Exec Mem Wr
R-type: Ifetch Reg Exec Mem Wr