Microarchitecture: Datapath & Control
CIT 595 Spring 2010
Instruction Set Architecture
ISA = programmer-visible components & operations
Memory organization
- Address space -- how many locations can be addressed?
- Addressability -- how many bits per location?
Register set
- How many? What size?
Instruction set
- Opcodes
- Data types
- Addressing modes
All information needed to write/generate a machine language program
Instruction
Fundamental unit of work
Constituents
Opcode: operation to be performed (e.g. ADD, LD)
Operands: data/locations to be used for operation
- Source: location that contains the data/instruction
- Destination: location that will store the result of computation
- Immediate: data value encoded in the instruction itself, not contained at a particular location
LC-3 Overview: Memory and Registers
Memory
Address space: 2^16 locations (16-bit addresses)
Addressability: 16 bits
Registers
Temporary storage (memory access takes longer)
Eight general-purpose registers: R0 - R7 (each 16 bits wide)
Other registers: not directly addressable, but used by and affected by instructions
- E.g. PC (program counter), condition codes (NZP)
Word Size
Number of bits normally processed by ALU in one instruction
Also width of registers
LC-3 word size is 16 bits
LC-3 ISA: Overview
Opcodes
16 opcodes (bits [15:12] of instruction: 2^4 = 16 possible values)
Types of instructions:
- Operate instructions: E.g. ADD
- Data movement instructions: E.g. LDR, LEA, STR
- Control instructions: E.g. BR, JMP, TRAP, JSR, RTI
Operate and data movement instructions (except stores) set/clear condition codes, based on result
- N = negative (< 0), Z = zero (= 0), P = positive (> 0)
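A minimal sketch of how the N/Z/P codes follow from a 16-bit two's complement result. The function name `condition_codes` is hypothetical and only for illustration.

```python
def condition_codes(result: int) -> str:
    """Return 'N', 'Z', or 'P' for a 16-bit two's complement result."""
    value = result & 0xFFFF      # keep only the low 16 bits (LC-3 word size)
    if value == 0:
        return "Z"
    if value & 0x8000:           # bit 15 set means the value is negative
        return "N"
    return "P"


print(condition_codes(0x0005))   # P
print(condition_codes(0x0000))   # Z
print(condition_codes(0xFFFB))   # N (0xFFFB is -5 in 16-bit two's complement)
```

Exactly one of N, Z, P is set after each such instruction.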
Addressing Modes
How is the location of an operand (data to be acted upon) specified?
Non-memory addresses: register, immediate (literal)
Memory addresses: base+offset, PC-relative, indirect
Data Types
16-bit 2’s complement integer
Example: ADD Instruction Format
LC-3 ADD : Add the contents of R2 to the contents of R6, and store the result in R6.
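A sketch of how this ADD could be packed into a 16-bit word, assuming the register-mode layout opcode [15:12], DR [11:9], SR1 [8:6], bit [5] = 0, SR2 [2:0]. Treating R2 as SR1 and R6 as SR2 is an assumption about operand order; the helper name `encode_add_reg` is hypothetical.

```python
def encode_add_reg(dr: int, sr1: int, sr2: int) -> int:
    """Encode LC-3 'ADD DR, SR1, SR2' (register mode, bit [5] = 0)."""
    for reg in (dr, sr1, sr2):
        if not 0 <= reg <= 7:
            raise ValueError("register numbers must be 0..7")
    # opcode 0001 in bits [15:12], DR in [11:9], SR1 in [8:6], SR2 in [2:0]
    return (0b0001 << 12) | (dr << 9) | (sr1 << 6) | sr2


word = encode_add_reg(dr=6, sr1=2, sr2=6)
print(f"{word:016b}")   # 0001110010000110
print(hex(word))        # 0x1c86
print(word >> 12)       # 1, the opcode field in bits [15:12]
```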
Example: LDR Instruction Format
Add the value 6 to the contents of R3 to form a memory address. Load the contents of memory at that address and place the resulting data in R2.
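A sketch of the base+offset address calculation for this LDR, assuming the layout opcode [15:12] = 0110, DR [11:9], BaseR [8:6], offset6 [5:0]. The names `sext` and `ldr`, the register contents, and the memory contents are made up for illustration.

```python
def sext(value: int, bits: int) -> int:
    """Sign-extend the low `bits` bits of value."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return value - (1 << bits) if value & sign_bit else value


def ldr(instr: int, regs: list, memory: dict) -> int:
    """Execute LDR on a toy machine state; returns the effective address."""
    dr = (instr >> 9) & 0x7
    base_r = (instr >> 6) & 0x7
    offset6 = sext(instr & 0x3F, 6)
    address = (regs[base_r] + offset6) & 0xFFFF   # base + offset
    regs[dr] = memory.get(address, 0)             # load into DR
    return address


regs = [0] * 8
regs[3] = 0x3000                      # hypothetical contents of R3
memory = {0x3006: 0x1234}             # hypothetical memory contents
instr = 0b0110_010_011_000110         # LDR R2, R3, #6
addr = ldr(instr, regs, memory)
print(hex(addr))      # 0x3006
print(hex(regs[2]))   # 0x1234
```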
Microarchitecture (Machine Internals)
Describes a large number of details that are hidden in the programming model
Constituent parts of the processor and
How these interconnect and interoperate to implement the architectural specification
Computer = processing unit + memory system + I/O
Processing unit = control + datapath
Control = finite state machine
Inputs = machine instruction, datapath conditions
Outputs = register transfer control signals, ALU operation codes
Instruction interpretation = instruction fetch, decode, execute, write
Datapath = functional units + registers
All the logic used to process information
- Functional units = ALU, multipliers, dividers, etc.
Control Unit
Circuitry that
controls the flow of information through the processor, and
coordinates activities of the other units within it.
Is an FSM (finite state machine)
States enumerate all possible configurations the machine can be in
Uses the opcode information & some other inputs (e.g. condition codes, interrupt signal) to determine the next state and output
- Decides for each stage in the instruction processing cycle:
- Which registers/memory locations are enabled?
- Which operation should the ALU perform?
- Choose ALU output or memory output?
Instruction Processing Cycle
FETCH instruction from memory
DECODE instruction
EVALUATE ADDRESS
FETCH OPERANDS
EXECUTE operation
STORE result
E.g. LC3 FSM diagram
Variations in Processing Cycle
Example in LC3
Evaluate Address and Execute are combined as they both use ALU (adder)
Operand Fetch is separated into Register Fetch and Memory Access
Store consists of only register writes
Memory Write is part of Memory Access
Thus we have a total of 6 stages
Simple LC3 Datapath
[Figure: simple LC-3 datapath. The control unit takes instruction bits [15:12] & [5] as inputs and drives the numbered control signals.]
Memory
Harvard architecture (physically separate storage for instructions and data)
As opposed to the von Neumann model
Dominant in RISC-style architectures, e.g. ARM processors
Instruction Memory is set to read by default
Data Memory can be read or written based on WE (write enable)
WE = 1 means write
Registers/Register File
2 read ports
Default read (no need for enable)
Rd1 – SR1, BaseR
Rd2 – SR2
1 write port
Needs enable (WE)
WR – DR
Source: Prof. Milo Martin at Upenn
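A toy model of the register file described above: two read ports that are always readable and one write port gated by WE. The class and method names are hypothetical, and the clock edge is modeled as a plain method call.

```python
class RegisterFile:
    """Eight 16-bit registers, 2 read ports, 1 write port with enable."""

    def __init__(self):
        self.regs = [0] * 8

    def read(self, rd1: int, rd2: int):
        # Reads need no enable: both ports always return a value.
        return self.regs[rd1], self.regs[rd2]

    def clock_edge(self, wr: int, value: int, we: int) -> None:
        # The write only takes effect when WE = 1.
        if we:
            self.regs[wr] = value & 0xFFFF


rf = RegisterFile()
rf.clock_edge(wr=6, value=42, we=1)
rf.clock_edge(wr=2, value=99, we=0)   # ignored, WE = 0
print(rf.read(6, 2))                  # (42, 0)
```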
ALU
ADD
Reg-Reg
Immediate
- Bits [4:0]
Default: control unit signal set to 0 for ADD
LDR, STR
Used to evaluate the address of the load and store
- Bits [5:0]
Need to expand to accommodate other instructions
E.g. NOT instruction
MAR (Memory Address) MUX (for Data Memory)
PC and PC MUX
PC (16-bit register) update is based on the PC MUX
PC + 1 (default)
Address based on BR, JMP, TRAP
- For BR to work in this implementation, condition code (CC) registers are needed
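A sketch of the PC update as a 2-input multiplexer. It assumes that select value 1 picks PC + 1 and select value 0 picks the computed target address, which is inferred from the ADD, LDR, and TRAP control-signal rows in these slides rather than stated outright. The function name `next_pc` is hypothetical.

```python
def next_pc(pc: int, target: int, pc_mux: int, pc_enable: int = 1) -> int:
    """Return the PC value after one update (16-bit wraparound)."""
    if not pc_enable:
        return pc                          # PC holds its value
    if pc_mux == 1:                        # assumed encoding: 1 selects PC + 1
        return (pc + 1) & 0xFFFF
    return target & 0xFFFF                 # assumed encoding: 0 selects target


print(hex(next_pc(0x3000, 0x4000, pc_mux=1)))   # 0x3001
print(hex(next_pc(0x3000, 0x4000, pc_mux=0)))   # 0x4000
```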
DRval MUX
Select appropriate DR (destination register) value
Control Signals
0: 3-bit Rd1 (Source 1 – SR1/BaseR)
1: 3-bit Rd2 (Source 2 – SR2)
2: 3-bit Wr (Destination Register – DR)
3: 1-bit WE for Register (to control register update)
4: 2-bit MUX to select 2nd operand to ALU
5: 2-bit MAR (Memory Address) MUX to select address to the Data Memory
6: 1-bit PC enable (to update PC)
7: 1-bit WE for Data Memory (to control memory update)
8: 1-bit ALUMEM MUX (to select output of Memory or ALU)
9: 1-bit DRval MUX (to select value written to Destination register)
10: 1-bit PC MUX (select value of PC)
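One way to picture the control unit's output is as a lookup from instruction to an 11-field control word. The sketch below uses the ADD, LDR, and TRAP rows given in these slides; the short field names in `SIGNALS` are hypothetical labels for signals 0-10, and values are kept as strings so don't-cares ("x") survive.

```python
# Hypothetical short names for control signals 0..10, in order.
SIGNALS = ["Rd1", "Rd2", "Wr", "RegWE", "ALU2MUX", "MARMUX",
           "PCen", "MemWE", "ALUMEM", "DRvalMUX", "PCMUX"]

# Rows copied from the ADD, LDR, and TRAP control-signal tables.
ROWS = {
    "ADD":  ["I[8:6]", "I[2:0]", "I[11:9]", "1", "00", "00", "1", "0", "1", "0", "1"],
    "LDR":  ["I[8:6]", "xxx",    "I[11:9]", "1", "10", "00", "1", "0", "0", "0", "1"],
    "TRAP": ["xxx",    "xxx",    "111",     "1", "xx", "01", "1", "0", "0", "1", "0"],
}


def control_word(instr: str) -> dict:
    """Return the control signals for an instruction as name -> value."""
    return dict(zip(SIGNALS, ROWS[instr]))


print(control_word("LDR")["ALUMEM"])   # 0
print(control_word("ADD")["ALUMEM"])   # 1
```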
ADD Instruction
CONTROL SIGNALS (columns 0-10 follow the Control Signals list; Rd1 = SR1, Rd2 = SR2, Wr = DR)
Instr  I[15:12]  I[5]  0 (Rd1)  1 (Rd2)  2 (Wr)    3  4   5   6  7  8  9  10
ADD    0001      0     I[8:6]   I[2:0]   I[11:9]   1  00  00  1  0  1  0  1
LDR Instruction
CONTROL SIGNALS (columns 0-10 follow the Control Signals list; Rd1 = SR1/BaseR, Rd2 = SR2, Wr = DR)
Instr  I[15:12]  I[5]  0 (Rd1)  1 (Rd2)  2 (Wr)    3  4   5   6  7  8  9  10
LDR    0110      x     I[8:6]   xxx      I[11:9]   1  10  00  1  0  0  0  1
JMP Instruction
[Figure: datapath and control-signal table for JMP (opcode I[15:12], I[5], control signals 0-10)]
TRAP
Bits:  15 14 13 12 | 11 10 9 8 | 7 6 5 4 3 2 1 0
TRAP:   1  1  1  1 |  0  0 0 0 | trapvect8
Used to get data in and out of the computer
Calls operating system “service routine”
Identified by 8-bit trap vector
Execution resumes after OS code executes
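A small sketch of pulling the 8-bit trap vector out of a TRAP instruction word, following the bit layout above. The function name `decode_trap` is hypothetical, and the example word 0xF025 is just a sample encoding with vector x25.

```python
def decode_trap(instr: int) -> int:
    """Return trapvect8 (bits [7:0]) of a TRAP instruction word."""
    if (instr >> 12) & 0xF != 0b1111:
        raise ValueError("not a TRAP instruction (opcode must be 1111)")
    return instr & 0xFF


print(hex(decode_trap(0xF025)))   # 0x25
```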
TRAP Instruction
R7 = PC + 1
CONTROL SIGNALS (columns 0-10 follow the Control Signals list)
Instr  I[15:12]  I[5]  0 (Rd1)  1 (Rd2)  2 (Wr)    3  4   5   6  7  8  9  10
TRAP   1111      x     xxx      xxx      111 (R7)  1  xx  01  1  0  0  1  0
Sequencing an instruction
For appropriate sequencing of instruction processing cycle i.e. F->D->EA->OP->EX->S
Updating PC and condition code registers
Reading/updating registers and memory
Use the clock to sequence each phase of an instruction by raising the right signals at the right time
It takes a fixed number of clock ticks/cycles (repetitions of the rising or falling edge) to execute each instruction
How is this done?
Sequencing an instruction (contd..)
We connect the clock to a synchronous counter and the counter to the decoder
The decoder output that is enabled is based on the counter outputs (i.e. which cycle you are in)
The control signals are a combination of the decoder output, opcode and some other inputs
[Figure: Clock -> n-bit counter -> n x 2^n decoder]
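A sketch of the counter-plus-decoder arrangement: an n-bit counter advances once per clock tick, and an n x 2^n decoder turns its value into a one-hot timing signal. The names `decoder` and `timing_signals` are hypothetical.

```python
def decoder(count: int, n: int) -> list:
    """n x 2^n decoder: exactly one of the 2^n outputs is 1."""
    return [1 if i == count else 0 for i in range(2 ** n)]


def timing_signals(n: int, ticks: int):
    """Yield the decoder outputs for each clock tick."""
    count = 0
    for _ in range(ticks):
        yield decoder(count, n)
        count = (count + 1) % (2 ** n)   # synchronous counter wraps around


for outputs in timing_signals(n=2, ticks=5):
    print(outputs)
# [1, 0, 0, 0]
# [0, 1, 0, 0]
# [0, 0, 1, 0]
# [0, 0, 0, 1]
# [1, 0, 0, 0]
```

Each output line would act as one timing signal that gates the control signals for that cycle.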
Hardwired Control Unit
Control signals are produced by a combinational circuit from a combination of
Opcode bits
Other signals such as interrupts, or condition codes (NZP)
Timing info (T1 to Tn) – these signals are essential for proper sequencing through the instruction cycle
Clocking Methodology
How long should the clock cycle be so that we complete one phase of the instruction cycle?
When is data valid or stable?
So that it can be read or written
Do not want to end up with a mix of old and new data
In a processor only memory elements can store values
This means any collection of combinational logic must have its
Inputs coming from a set of memory elements and
Outputs written into a set of memory elements
Clocking Methodology (contd..)
The length of the clock cycle is determined as follows:
The time necessary for the signals to reach memory element 2 defines the length of the clock cycle
i.e. the minimum clock cycle time must be at least as great as the maximum propagation delay of the circuit
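A sketch of the rule above: the clock period must cover the slowest path between memory elements. The component delays and the paths per instruction are invented numbers for illustration, not measurements of the LC-3 datapath.

```python
# Hypothetical propagation delays in nanoseconds.
COMPONENT_DELAY_NS = {
    "register read": 1.0,
    "mux": 0.5,
    "ALU": 2.0,
    "data memory": 3.0,
}

# Hypothetical combinational paths exercised by two instructions.
PATHS = {
    "ADD": ["register read", "mux", "ALU", "mux"],
    "LDR": ["register read", "mux", "ALU", "data memory", "mux"],
}


def path_delay(components: list) -> float:
    """Total propagation delay along one path."""
    return sum(COMPONENT_DELAY_NS[c] for c in components)


def min_clock_period(paths: dict) -> float:
    """The clock period must be at least the longest path delay."""
    return max(path_delay(p) for p in paths.values())


print(path_delay(PATHS["ADD"]))   # 4.0
print(min_clock_period(PATHS))    # 7.0
```

With these made-up numbers the LDR path sets the cycle length, even though ADD would finish sooner.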
Example of Clock Cycle Length
[Figure: simple LC-3 datapath]
Programmable Control Unit
Programmable Control Unit (contd..)
Each machine instruction is in turn implemented by a series of instructions called microinstructions
Microinstructions encode
Control signals for carrying out a particular stage in the instruction cycle
The address of the most likely next microinstruction
The microinstructions form a microprogram, which is stored in programmable memory
Sometimes called the Control Store
E.g. Flash memory is non-volatile and reprogrammable
E.g. LC3 Implemented using Program Control
The behavior of the LC-3 during a given clock cycle is completely described by the 49-bit microinstruction
39 bits for control signals
10 bits for possible next state of the machine
Each phase of the instruction cycle may require more than one microinstruction
E.g. Fetch stage takes 3 microinstructions
A 6-bit address is used to look up the memory
There are 52 possible microinstructions (states) that can describe LC3's behavior
Memory size 2^6 x 49
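The control store arithmetic can be checked directly. In the sketch below, the field layout inside the 49-bit word (control signals in the upper 39 bits, next-state information in the lower 10) is an assumption made for illustration; the slides give only the field widths.

```python
ADDRESS_BITS = 6
CONTROL_BITS = 39
NEXT_STATE_BITS = 10
USED_STATES = 52

width = CONTROL_BITS + NEXT_STATE_BITS   # bits per microinstruction
entries = 2 ** ADDRESS_BITS              # addressable control store rows
total_bits = entries * width

print(width)                   # 49
print(entries)                 # 64
print(total_bits)              # 3136
print(entries - USED_STATES)   # 12 rows left unused


def split_microinstruction(word: int):
    """Split a 49-bit word into (control, next_state_info); layout assumed."""
    control = (word >> NEXT_STATE_BITS) & ((1 << CONTROL_BITS) - 1)
    next_info = word & ((1 << NEXT_STATE_BITS) - 1)
    return control, next_info
```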
E.g. LC3 Implemented using Program Control
The microsequencer produces the 6-bit address
Corresponds to the next behavior of the processor
Combinational circuit based on
- 10 bits of the microinstruction
- 8 bits of additional info based on other events
Microprogram Control
[Figure: LC-3 state machine diagram. Source: Appendix C of Patt & Patel, Fig. C.2]
Fetch: state 18 (MAR<–PC, PC<–PC+1, test [INT]), state 33 (MDR<–M), state 35 (IR<–MDR)
Decode: state 32 (BEN<–IR[11] & N + IR[10] & Z + IR[9] & P, then dispatch on [IR[15:12]])
NOTES
B+off6 : Base + SEXT[offset6]
PC+off9 : PC + SEXT[offset9]
PC+off11 : PC + SEXT[offset11]
*OP2 may be SR2 or SEXT[imm5]
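The BEN (branch enable) expression from the state diagram, BEN <– IR[11] & N + IR[10] & Z + IR[9] & P, can be sketched as bit operations. The function name `branch_enable` is hypothetical, and N, Z, P are passed as 0/1 flags.

```python
def branch_enable(ir: int, n: int, z: int, p: int) -> int:
    """BEN = (IR[11] AND N) OR (IR[10] AND Z) OR (IR[9] AND P)."""
    want_n = (ir >> 11) & 1
    want_z = (ir >> 10) & 1
    want_p = (ir >> 9) & 1
    return (want_n & n) | (want_z & z) | (want_p & p)


BRZ = 0x0400     # BR with only bit [10] set (branch if zero)
BRNZP = 0x0E00   # BR with bits [11:9] all set (always taken)

print(branch_enable(BRZ, n=0, z=1, p=0))     # 1
print(branch_enable(BRZ, n=1, z=0, p=0))     # 0
print(branch_enable(BRNZP, n=0, z=0, p=1))   # 1
```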
Hardwired vs. Programmable Control
Complexity
There is an extra level of instruction interpretation in microprogrammed control, which makes it slower than hardwired control
Flexibility
Instruction and Control Logic are tied together in hardwired control, which makes it difficult to modify
New instructions can be easily added by changing only the microprogram in a microprogrammed control implementation