Computer Architecture
ELEC3441
Lecture 4 – Single Cycle Processor
Dr. Hayden Kwok-Hay So
Department of Electrical and
Electronic Engineering
Overview
n
First implementation the RISC-V ISA in this
course
•
More variations to come…
n
Single cycle processor:
•
Each instruction takes 1 cycle to complete
n
Idealized memory
•
Instantaneous read
•
Single cycle write
n
Implements base RV32
HKU EEE ENGG3441 - HS 2
Full RISCV1Stage Datapath (HW1)
3 +4 Instruction Mem Reg File IType Sign Extend Decoder Data Mem ir[24:20] br/jmp pc+4 p c_ se l ir[31:20] rs1 ALU Control Signals w b _ se l Reg File rf _ w e n va l me m_ rw PC tohost testrig_tohost cpr_en me m_ va l addr wdata rdata Inst BrJmp TargGen ir[19:15] ir[31], ir[7], ir[30:25], ir[11:8]] PC+4 jalr 0 rs2 Branch CondGen br_eq? br_lt? co n tro l st a tu s re g ist e rs Execute Stage br_ltu? PC addr BType Sign Extend JumpReg TargGen Op2Sel Op1Sel AluFun data wa wd en addr data ir[ 1 1 :7 ] 13
HKU EEE ENGG3441 - HS HKU EEE ENGG3441 - HS 4
RV32I
RISC-V v2.1
specification
Hardware Elements
n
Combinational circuits
•
Mux, Decoder, ALU, ...
n
Synchronous state elements
•
Flipflop, Register, Register file, SRAM, DRAM
Edge-triggered: Data is sampled at the rising edge
Clk
D
Q
En
ff
Q
D
Clk
En
OpSelect - Add, Sub, ... - And, Or, Xor, Not, ... - GT, LT, EQ, Zero, ... Result Comp? A BALU
Sel O A0 A1 An-1 Mux..
.
lg(n) A De co d e r..
.
O0 O1 On-1 lg(n)HKU EEE ENGG3441 - HS 5
Register
n
An n-bit register can be constructed by combining
n
DFFs in parallel
n
Each DFF responsible for read/write of 1 bit of
the input
n
Shared control signal:
•
Clock, reset, enable, …
HKU EEE ENGG3441 - HS 6
ff
Q
0D
0 Clk Enff
Q
1D
1ff
Q
2D
2ff
Q
n-1D
n-1...
...
...
register
D Q enD<n-1:0>
Q<n-1:0>
Register File
n
Reads are combinational
•
rd=regfile(ra) in the same cycle
n
Writes take place at rising the clock edge
•
Write only take place if WE=1 at clock edge
n
Read address (ra) and write address (wa) choose which
register to read/write
n
RISCV needs a register file with
2 read ports
and
1 write port
n
What happen in these cases?
•
ra1 = ra2
•
ra1 = wa1
7ReadData1
ReadAddr1
ReadAddr2
WriteAddr
Register
file
2R+1W
ReadData2
WriteData
WE
Clock
rd1 ra1 ra2 wa wd rd2 weHKU EEE ENGG3441 - HS
Register File Implementation
n
2R + 1W, 32 registers, each 32 bits wide
n
Decoder select 1 out of 32 register depending on wa, ra1, ra2
n
On writes:
•
Only 1 of the 32 register has WE=1
n
On reads:
•
Only 1 of the 32 register may output to rd{1,2} bus
•
The same register may output to both rd1 and rd2 bus (e.g. rs1 = rs2)
8
reg 31
wa
clk
reg 1
wd
we
ra1
rd1
rd2
reg 0
…
32…
5 32 32…
ra2
5 5A Simple Memory Model
9
Reads and writes are always completed in one cycle
• a Read can be done any time (i.e. combinational)
• a Write is performed at the rising clock edge
if it is enabled
⟹ the write address and data
must be stable at the clock edge
Later in the course we will present a more realistic model of memory
MAGIC
RAM
ReadData
WriteData
Address
WriteEnable
Clock
32 32 32HKU EEE ENGG3441 - HS
Instruction Execution
10
Execution of a RISC-V instruction involves:
1. instruction fetch
2. decode and register fetch
3. ALU operation
4. memory operation (optional)
5. write back to register file (optional)
+ the computation of the
next instruction
address
HKU EEE ENGG3441 - HS
Datapath: Reg-Reg ALU Instructions
RegWrite Timing?
0x4 Add clk addr inst Inst. Memory PC Inst<19:15> Inst<24:20> Inst<11:7> <31:25,14:12> OpCode ALU ALU Control RegWriteEn clk rd1 GPRs ra1 ra2 wa wd rd2 we 31 25 24 20 19 15 14 12 11 7 6 0funct7 rs2 rs1 funct3 rd opcode
7 5 5 3 5 7 0000000 src2 src1 ADD/SLT/SLTU dest OP 0000000 src2 src1 AND/OR/XOR dest OP 0000000 src2 src1 SLL/SRL dest OP 0100000 src2 src1 SUB/SRA dest OP
rd ß (rs1) func (rs2)
11 HKU EEEDatapath: Reg-Imm ALU Instructions
Sign Ext inst<31:20> OpCode 0x4 Add clk addr inst Inst. Memory PC ALU RegWriteEn clk rd1 GPRs ra1 ra2 wa wd rd2 we inst<19:15> inst<11:7> inst<14:12> ALU Control
rd ß (rs1) op imm
HKU EEE ENGG3441 - HS 12
31 20 19 15 14 12 11 7 6 0
imm[11:0] rs1 funct3 rd opcode
12 5 3 5 7
I-immediate[11:0] src ADDI/SLTI[U] dest OP-IMM I-immediate[11:0] src ANDI/ORI/XORI dest OP-IMM
Conflicts in Merging Datapath
13 Sign Ext OpCode 0x4 Add clk addr inst Inst. Memory PC ALU RegWriteEn clk rd1 GPRs ra1 ra2 wa wd rd2 we <19:15> <11:7> <31:20> <31:25,14:12> ALU Control <14:12>Introduce
muxes
<24:20> 31 27 26 25 24 20 19 15 14 12 11 7 6 0funct7 rs2 rs1 funct3 rd opcode R-type imm[11:0] rs1 funct3 rd opcode I-type imm[11:5] rs2 rs1 funct3 imm[4:0] opcode S-type HKU EEE ENGG3441 - HS
ALU Op2 Select
14 Sign Ext OpCode 0x4 Add clk addr inst Inst. Memory PC ALU RegWriteEn clk rd1 GPRs ra1 ra2 wa wd rd2 we <19:15> <11:7> <31:20> ALU Control <24:20> 31 27 26 25 24 20 19 15 14 12 11 7 6 0
funct7 rs2 rs1 funct3 rd opcode R-type imm[11:0] rs1 funct3 rd opcode I-type imm[11:5] rs2 rs1 funct3 imm[4:0] opcode S-type
op2Sel Reg / Imm
If OPCODE==“OP” then op2sel = ‘1’ else ‘0’
HKU EEE ENGG3441 - HS
Quick Quiz
n
How do you implement op2sel in hardware?
HKU EEE ENGG3441 - HS 15
If OPCODE==“OP” then op2sel = ‘1’ else ‘0’
opcode
op2sel
OP
=?
opcode
op2sel
OP
=?
1
0
1
0
How do you implement this?
=?
2
1
Determining ALU functions
n
All basic integer R-R instructions have opcode = OP
(“0110011”)
•
only funct3 and funct7 are needed to determine needed ALU
function:
•
E.g. 000èAdd, 001èShiftLeft, 100èXOR,0100000èSub, …
n
Immediate instructions requires the same ALU function,
but has slightly different encoding
•
ADDI is same as ADD, except no need to check for Sub in
funct7
•
opcode = OP-IMM (“0010011”)
n
Need opcode to help determine ALU function
•
More cases like these come up later…
HKU EEE ENGG3441 - HS 16
imm[11:5] rs2 rs1 010 imm[4:0] 0100011 SW rs1,rs2,imm
imm[11:0] rs1 000 rd 0010011 ADDI rd,rs1,imm
imm[11:0] rs1 010 rd 0010011 SLTI rd,rs1,imm
0000000 rs2 rs1 000 rd 0110011 ADD rd,rs1,rs2
0100000 rs2 rs1 000 rd 0110011 SUB rd,rs1,rs2
imm[11:0] rs1 110 rd 0010011 ORI rd,rs1,imm
imm[11:0] rs1 111 rd 0010011 ANDI rd,rs1,imm
000000 shamt rs1 001 rd 0010011 SLLI rd,rs1,shamt
000000 shamt rs1 101 rd 0010011 SRLI rd,rs1,shamt
010000 shamt rs1 101 rd 0010011 SRAI rd,rs1,shamt
ALU Instructions Datapath
17 <30,14:12,6:0> Op2Sel Reg / Imm Sign Ext OpCode 0x4 Add clk addr inst Inst. Memory PC ALU RegWriteEn clk rd1 GPRs rs1 rs2 wa wd rd2 we <19:15> <24:20> ALU Control <11:7> 31 27 26 25 24 20 19 15 14 12 11 7 6 0funct7 rs2 rs1 funct3 rd opcode R-type imm[11:0] rs1 funct3 rd opcode I-type imm[11:5] rs2 rs1 funct3 imm[4:0] opcode S-type
<31:20>
HKU EEE ENGG3441 - HS
Load Instructions
n
Use ALU for address calculation
n
Mux to select data for regfile: mem or ALU
18 WBSel ALU / Mem Op2Sel base offset OpCode ALU Control ALU 0x4 Add clk addr inst Inst. Memory PC RegWriteEn clk rd1 GPRs rs1 rs2 wa wd rd2 we Sign Ext clk MemWrite addr wdata rdata Data Memory we imm[11:0] rs1 f3 rd opcode
I
offset[11:0] base width dest LOAD
Load: (dest) ß M[(base) + offset]
<11:7>
HKU EEE ENGG3441 - HS
Store Instructions
n
Also use ALU for address calculation
nNo need to write back to regfile
n
Need to tell memory it is a write è Set MemWrite to ‘1’
19 WBSel ALU / Mem Op2Sel base offset OpCode ALU Control ALU 0x4 Add clk addr inst Inst. Memory PC RegWriteEn clk rd1 GPRs rs1 rs2 wa wd rd2 we Sign Ext clk MemWrite addr wdata rdata Data Memory we
S
imm[11:5] rs2 rs1 f3 imm[4:0] opcodeStore: M[(base) + offset] ß (src)
offset[11:5] src base width offset[4:0] STOREHKU EEE ENGG3441 - HS
RISC-V Conditional Branches
n
Requires:
•
1. Logic to compare register values (rs1 and rs2)
•
2. Datapath to calculate branch target address relative to
PC
n
Current implementation: dedicated logic for both 1
and 2
•
Dedicated comparison logic (=, <, [≠, ≥])
•
Dedicated adder for jump target calculation
n
May use ALU for (2) above
•
Performance tradeoff…
20
SB
imm[10:5] rs2 rs1 funct3 imm[4:1] opcodeoffset[12,10:5] src2 src1 BEQ/BNE BLT[U] BGE[U} offset[11,4:0] BRANCH imm[11] imm[12]
if (rs1 BR_OP rs2) then jump to PC + branch_imm
Conditional Branches
(BEQ/BNE/BLT/BGE/BLTU/BGEU)
21 0x4 Add PCSel clk WBSel MemWrite addr wdata rdata Data Memory we Op2Sel OpCode Bcomp? clk clk addr inst Inst. Memory PC rd1 GPRs rs1 rs2 wa wd rd2 we Branch Imm ALU ALU Control Add br pc+4 RegWrEn Br LogicHKU EEE ENGG3441 - HS
RISC-V
Unconditional JAL
UB
offset[20:1] dest JALjump to PC + j_imm; rd ß PC+4
imm[10:1] imm[19:12] rd opcode
imm[20] Imm[11] 0x4 Add PCSel clk WBSel MemWrite addr wdata rdata Data Memory we Op2Sel OpCode Bcomp? clk clk addr inst Inst. Memory PC rd1 GPRs rs1 rs2 wa wd rd2 we Branch Imm ALU ALU Control Add brjmp pc+4 RegWrEn Br Logic Jump Imm
HKU EEE ENGG3441 - HS 22
JALR
23 0x4 Add PCSel clk WBSel MemWrite addr wdata rdata Data Memory we Op2Sel OpCode Bcomp? clk clk addr inst Inst. Memory PC rd1 GPRs rs1 rs2 wa wd rd2 we ALU ALU Control Add brjmp pc+4 RegWrEn Br Logic imm[11:0] rs1 f3 rd opcodeI
offset[11:0] base 000 dest JALR
jump to imm + (rs1); rd ß PC+4
offset Sign
Ext
jmpreg
HKU EEE ENGG3441 - HS
Hardwired Control is pure
Combinational Logic
n
Decoding instruction determines the setting of
various muxes and ALU function
n
Simple decoding helps to make faster hardware
24
combinational
logic
op code
Bcomp?
Op2Sel
AluFunc
MemWrite
WBSel
RegWriteEn
PCSel
Hardwired Control Table (Excerpt)
Instruction Op2Sel AluFunc WBSel RegWriteEn MemWrite PCSel
ADD
SUB
ADDI
SLL
LW
SW
BEQ
JAL
JALR
HKU EEE ENGG3441 - HS 25
RS2
RS2
IMI
RS2
IMI
IMS
IMB
IMJ
IMI
ADD
SUB
ADD
SLL
ADD
ADD
X
X
X
ALU
ALU
ALU
ALU
MEM
X
X
PC+4
PC+4
T
T
T
T
T
F
F
T
T
F
F
F
F
F
T
F
F
F
PC+4
PC+4
PC+4
PC+4
PC+4
PC+4
PC+4/BA
JA
JRA
• Op2Sel: rs2, {I,B,J}-type immediate IM{I, B, J}
• AluFunc: Add, Sub, Shift, XOR, etc
• WBSel: what values to write to rd
Single-Cycle Hardwired Control
We will assume clock period is sufficiently long for
all of the following steps to be “completed”:
1.
Instruction fetch
2.
Decode and register fetch
3.
ALU operation
4.
Data fetch if required
5.
Register write-back setup time
⟹
t
C
> t
IFetch
+ t
RFetch
+ t
ALU
+ t
DMem
+ t
RWB
At the rising edge of the following clock, the PC,
register file and memory are updated
26 HKU EEE ENGG3441 - HS
Full RISCV1Stage Datapath (HW1)
27 +4 Instruction Mem Reg File IType Sign Extend Decoder Data Mem ir[24:20] br/jmp pc+4 p c_ se l ir[31:20] rs1 ALU Control Signals w b _ se l Reg File rf _ w e n va l me m_ rw PC tohost testrig_tohost cpr_en me m_ va l addr wdata rdata Inst BrJmp TargGen ir[19:15] ir[31], ir[7], ir[30:25], ir[11:8]] PC+4 jalr 0 rs2 Branch CondGen br_eq? br_lt? co n tro l st a tu s re g ist e rs Execute Stage br_ltu? PC addr BType Sign Extend JumpReg TargGen Op2Sel Op1Sel AluFun data wa wd en addr data ir[ 1 1 :7 ] 13
HKU EEE ENGG3441 - HS 28