Laboratorio de Tecnologías de Información
Instruction Set Architecture Instruction Set Architecture
Arquitectura de Computadoras Arquitectura de Computadoras
Arturo D
Arturo Díaz Píaz Péérezrez Centro de Investigaci
Centro de Investigacióón y de Estudios Avanzados del IPNn y de Estudios Avanzados del IPN Laboratorio de Tecnolog
Laboratorio de Tecnologíías de Informacias de Informacióónn
Laboratorio de Tecnologías de Información
Instruction
Instruction Set Set
♦ ... the attributes of a [computing] system as seen by the programmer, i. e., the conceptual structure and functional behavior, as distinct from the
organization of the data flows and controls of logic design, and the physical implementation.
■ Amdahl, Blaaw, Brooks, 1964.
Laboratorio de Tecnologías de Información
Instruction Set Design Instruction Set Design
instruction set
software
hardware
Laboratorio de Tecnologías de Información
Instruction Set Architecture Instruction Set Architecture
♦ ... the attributes of a [computing] system as seen by the programmer, i. e., the conceptual structure and functional
behavior, as distinct from the organization of the data flows and controls of logic design, and the physical implementation.
■ Amdahl, Blaaw, Brooks, 1964.
♦ Organization of programmable storage
♦ Data types & data structures: encodings and representations
♦ Instruction formats
♦ Instruction (or Operand Code) Set
♦ Modes of addressing and accessing data items and instructions
♦ Exceptional conditions
Laboratorio de Tecnologías de Información
ISA: What Must be Specified?
ISA: What Must be Specified?
Instruction Fetch Instruction
Decode Operand
Fetch Execute
Result Store
Next
♦ Instruction Format or Encoding
■ how is it decoded?
♦ Location of operands and result
■ where other than memory?
■ how many explicit operands?
■ how are memory operands located?
■ which can or cannot be in memory?
♦ Data type and Size
♦ Operations
■ what are supported
♦ Successor instruction
Laboratorio de Tecnologías de Información
Evolution
Evolution of of Instruction Instruction Sets Sets
Single Accumulator (EDSAC 1950) Accumulator + Index Registers
(Manchester Mark I, IBM 700 series 1953 Separation of Programming Model
from Implementation High-level Language Based
(B5000 1963)
Concept of a Family IBM 360 1964 General Purpose Register Machines
Complex Instruction Sets (Vax, Intel 432 1977-80)
Load/Store Architecture (CDC 6600, Cray 1 1963-76)
RISC: MIPS, Sparc, 88000, IBM RS6000, ... 1987
Laboratorio de Tecnologías de Información
Basic ISA
Basic ISA Classes Classes
♦ Accumulator:
1 address add A acc ← acc + mem[A]
1+x address addx A acc ← acc + mem[A + x]
♦ Stack
0 address add tos ← tos + next
♦ General Purpose register
2 address add A B EA(A) ← EA(A) + EA(B) 3 address add A B C EA(A) ← EA(B) + EA(C)
♦ Load/Store
3 address add Ra Rb Rc Ra ← Rb + Rc load Ra Rb Ra ← mem[Rb]
Laboratorio de Tecnologías de Información
Code sequence for (C = A + B) for four classes of instruction sets:
Accumulator Load A Add B Store C
Register
(register-memory) Load R1,A
Add R1,B Store C, R1 Stack
Push A Push B Add Pop C
Register (load-store) Load R1,A Load R2,B
Add R3,R1,R2 Store C,R3
Comparison:
Bytes per instruction? Number of Instructions? Cycles per instruction?
Comparing Number of Instructions
Comparing Number of Instructions
Laboratorio de Tecnologías de Información
General Purpose Registers Dominate General Purpose Registers Dominate
♦ 1975-200x all machines use general purpose registers
♦ Advantages of registers
■ registers are faster than memory
■ registers are easier for a compiler to use
» e.g., (A*B) – (C*D) – (E*F) can do multiplies in any order vs.
stack
■ registers can hold variables
» memory traffic is reduced, so program is sped up (since registers are faster than memory)
Laboratorio de Tecnologías de Información
Caches vs.
Caches vs. Registers Registers
♦ Registers advantages
■ Faster (no addressing mode, no tags)
■ Deterministic (no misses)
■ Can duplicate for two ports
■ Short identifier (3-8 bits)
♦ Register disadvantages
■ Must save/restore on procedure calls
■ Can’t take the address of a register
■ Fixed size (FP, strings, structures)
■ Compiler must control (?)
Laboratorio de Tecnologías de Información
Caches vs.
Caches vs. Registers Registers ( ( cont cont ’ ’ d d ) )
♦ How many registers? More means
+ Hold operands longer (reducing memory traffic & potentially execution time)
- Longer register specifiers (except with register windows) - Slow registers
- More state slows context switches
Laboratorio de Tecnologías de Información
♦ Programmable storage
■ 232 x bytes of memory
■ 31 x 32-bit GPRs (R0 = 0)
■ 32 x 32-bit FP regs (paired DP)
■ HI, LO, PC
r0 0
r1
°
°
° r31 PC lo hi
MIPS I Registers
MIPS I Registers
Laboratorio de Tecnologías de Información
Data Movement Load (from memory)
Store (to memory)
memory-to-memory move register-to-register move input (from I/O device) output (to I/O device)
push, pop (to/from stack)
Arithmetic integer (binary + decimal) or FP Add, Subtract, Multiply, Divide
Logical not, and, or, set, clear
Shift shift left/right, rotate left/right
Typical Operations
Typical Operations
Laboratorio de Tecnologías de Información
Control (Jump/Branch) unconditional, conditional Subroutine Linkage call, return
Interrupt trap, return
Synchronization test & set (atomic r-m-w)
String search, translate
Graphics (MMX) parallel subword ops (4 16bit add)
Typical Operations Typical Operations
little change since 1960
Laboratorio de Tecnologías de Información
° Rank instruction Integer Average Percent total executed
1 load 22%
2 conditional branch 20%
3 compare 16%
4 store 12%
5 add 8%
6 and 6%
7 sub 5%
8 move register-register 4%
9 call 1%
10 return 1%
Top 10 80x86 Instructions
Top 10 80x86 Instructions
Laboratorio de Tecnologías de Información
Operation Summary Operation Summary
♦ Support these simple instructions, since they
will dominate the number of instructions executed:
■ load,
■ store,
■ add,
■ subtract,
■ move register-register,
■ and,
■ shift,
■ compare equal, compare not equal,
■ branch, jump,
■ call,
■ return;
Laboratorio de Tecnologías de Información
Operands
Operands for for ALU ALU instructions instructions
♦ ALU instructions combine operands (e.g. ADD)
♦ Number of explicit operands
■ Two - destination equals one source
■ Three - orthogonal
♦ Operands in registers or memory
■ Any combination -- VAX
» (orthogonal, but variable instr. formats)
■ At least one register -- much of 360
» (not orthogonal)
■ All registers -- CRAY, DLX, RISCs
» (orthogonal, but needs loads/stores)
Laboratorio de Tecnologías de Información
Memory Addressing Memory Addressing
♦ Since 1980 almost every machine uses addresses to level of 8-bits (byte)
♦ 2 questions for design of ISA:
■ Since could read a 32-bit word as four loads of bytes from sequential byte
addresses or as one load word from a single byte address,
» How do byte addresses map onto words?
» Can a word be placed on any byte boundary?
Laboratorio de Tecnologías de Información
♦ Big Endian: address of most significant byte = word address (xx00 = Big End of word)
■ IBM 360/370, Motorola 68k, MIPS, Sparc, HP PA
♦ Little Endian: address of least significant byte = word address (xx00 = Little End of word)
■ Intel 80x86, DEC Vax, DEC Alpha (Windows NT)
♦ Mode selectable
■ becoming more common: PowerPC, MIPS R10000
msb lsb
3 2 1 0
little endian byte 0
Addressing Objects:
Addressing Objects: Endian Endian Wars Wars
Laboratorio de Tecnologías de Información
0 1 2 3 Aligned
Not Aligned
Addressing Objects: Alignment Addressing Objects: Alignment
♦ Alignment: require that objects fall on address that is
multiple of their size.
Laboratorio de Tecnologías de Información
Alignment Alignment
♦ No restrictions
■ Simpler software
■ Hardware must detect misalignment and make 2 memory accesses
■ expensive logic, slows down all references
■ sometimes required for backward compatibility
♦ Restrictred alignment
■ software must guarantee alignment
■ hardware only detecs misalignment and traps
■ trap handler does it
♦ Middle group
■ misaligned data ok but requires multiple instructions
■ compiler must skill know
■ still trap on misaligned access
Laboratorio de Tecnologías de Información
A A “ “ typical typical ” ” RISC RISC
♦ 32-bit fixed format instruction (3 formats)
♦ 32 32-bit GPR (R0 contains zero, DP take pair)
♦ 3-address, reg-reg arithmetic instruction
♦ Single address mode for load/store: base+displacement
■ no indirection
♦ Simple branch conditions
♦ Delay branch
see: SPARC, MIPS MC88100, AMD2900, i960, i860,
PARisc, DEC Alpha, Clipper, CDC 6600, CDC 7600,
Cray-1, Cray-2, Cray-3, ...
Laboratorio de Tecnologías de Información
VAX VAX - - 11 11
♦ Variable format, 2 and 3 address instruction
♦ 32-bit word size, 16 GPR (four reserved)
♦ Rich set of addressing modes (apply to any operand)
♦ Rich set of operations
■ bit-field, stack, call, case, loop, string, poly, system)
♦ Rich set of data types (B, W, L, Q, O, F, D, G, H)
♦ Condition codes
OpCode A/M A/M A/M
Byte 0 1 n m
Laboratorio de Tecnologías de Información
VAX VAX - - 11: 11: Addressing Addressing Modes Modes
1. Register Ri
2. Base + Displacement M[Ri + v]
3. Immediate v
4. Register Indirect M[Ri]
5. Direct (absolute) M[v]
6. Base + Index M[Ri + Rj]
7. Scaled Index M[Ri + Rj*d + v]
8. Autoincrement M[Ri++]
9. Autodecrement M[Ri--]
10. Memory Indirec M[ M[Ri] ]
11. [Indirection chains]
Modes 1-4 account for 93 % of all operands on the VAX
Memory
Register File
Laboratorio de Tecnologías de Información
Addressing Mode Usage Addressing Mode Usage
♦ 3 programs measured on machine with all address modes (VAX)
--- Displacement: 42% avg, 32% to 55%
--- Immediate: 33% avg, 17% to 43%
--- Register deferred (indirect): 13% avg, 3% to 24%
--- Scaled: 7% avg, 0% to 16%
--- Memory indirect: 3% avg, 1% to 6%
--- Misc: 2% avg, 0% to 3%
♦ 75% displacement & immediate
♦ 85% displacement, immediate & register indirect
75% 85%
Laboratorio de Tecnologías de Información
° Avg. of 5 SPECint92 programs v. avg. 5 SPECfp92 programs
° 1% of addresses > 16-bits
° 12 - 16 bits of displacement needed 0%
5%
10%
15%
20%
25%
30%
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 Int. Avg. FP Avg.
Address Bits
Displacement Address Size?
Displacement Address Size?
Laboratorio de Tecnologías de Información
Immediate Size?
Immediate Size?
♦ 50% to 60% fit within 8 bits
♦ 75% to 80% fit within 16 bits
Laboratorio de Tecnologías de Información
Addressing Summary Addressing Summary
♦ Data Addressing modes that are important:
■ Displacement, Immediate, Register Indirect
♦ Displacement size should be 12 to 16 bits
♦ Immediate size should be 8 to 16 bits
Laboratorio de Tecnologías de Información
Variable:
Fixed:
Hybrid:
…
…
Generic Example of Instruction Format Widths
Generic Example of Instruction Format Widths
Laboratorio de Tecnologías de Información
Instruction Formats Instruction Formats
♦ If code size is most important, use variable length instructions
♦ If performance is most important, use fixed length instructions
♦ Recent embedded machines (ARM, MIPS) added optional mode to execute subset of 16-bit wide
instructions (Thumb, MIPS16); per procedure decide performance or density
♦ Some architectures actually exploring on-the-fly
decompression for more density.
Laboratorio de Tecnologías de Información
Instruction Format Instruction Format
♦ If have many memory operands per instruction and/or many addressing modes:
=>Need one address specifier per operand
♦ If have load-store machine with 1 address per instr.
and one or two addressing modes:
=> Can encode addressing mode in the opcode
Laboratorio de Tecnologías de Información
op rs rt rd
register Register (direct)
immed
op rs rt
Immediate
• All instructions 32 bits wide
Base+index
immed
op rs rt
register +
Memory
PC-relative
immed
op rs rt
PC +
Memory
• Register Indirect?
MIPS Addressing Modes/Instruction MIPS Addressing Modes/Instruction
Formats
Formats
Laboratorio de Tecnologías de Información
Most Popular ISA of all time: Intel Most Popular ISA of all time: Intel
80x86 80x86
♦ 1971: Intel invents microprocessor 4004/8008, 8080 in 1975
♦ 1975: Gordon Moore realized one more chance for new ISA before ISA locked in for decades
■ hired CS people in Oregon
■ weren’t ready in 1977 (CS people did 432 in 1980)
■ started crash effort for 16-bit microcomputer
♦ 1978: 8086 dedicated registers, segmented address, 16 bit
■ 8088; 8-bit external bus version of 8086
♦ 1980: IBM selects 8088 as basis for IBM PC
♦ 1980: 8087 floating point coprocessor: adds 60 instructions using hybrid stack/register scheme
♦ 1982: 80286 24-bit address, protection, memory mapping
Laboratorio de Tecnologías de Información
Intel x86 (IA
Intel x86 (IA - - 32) 32)
♦ 1989: 80486 & Pentium in 1992: faster + MP few instructions
♦ 1997: MMX multimedia extensions
♦ 200X: Superseded by IA-64 (Merced, McKinley, Itanium, etc.)
♦ “Difficult to explain and impossible to love”
■ See H&P Appendix D.8
♦ Eight 32-bit registers (EAX, EBX, ..., but also ESP, EBP)
♦ Also 16- and 8-bit version (AX, AH, AL)
♦ Most instructions have two operands, one possibly from memory
♦ One super-duper addressing mode w/ effective address =
■ base_reg + (index_reg * scaling_factor) + displacement
♦ Many formats: see H&P fig. D.8
Laboratorio de Tecnologías de Información
Intel MMX Intel MMX
♦ MultiMedia eXtension to IA-32 [Peleg & Weiser, IEEE Micro, 8/96]
■ Multimedia data values often need much less than 32 bits
■ But are organized in groups (e.g. red/green/blue)
■ So in 64-bit FP registers: 2x32, 4x16, 8x8
♦ E.g. ADDB (for byte)
■ 17 87 100 ... 6 more
■ +17 13 200 ... 6 more
■ --- --- --- ... ---
■ 34 100 255 ... 6 more
♦ MMX takes 16-element dot product (a
0*b
0+ a
1*b
1+ ... +
a *b )
Laboratorio de Tecnologías de Información
Control (Jump/Branch) unconditional, conditional Subroutine Linkage call, return
Interrupt trap, return
Synchronization test & set (atomic r-m-w)
String search, translate
Graphics (MMX) parallel subword ops (4 16bit add)
Typical Operations Typical Operations
little change since 1960
Laboratorio de Tecnologías de Información
Control Instructions Control Instructions
Taken or not taken?
Where is the target?
Link return address
Save or restore state Conditional
branches
X X
Jumps X
Procedure calls
X X X
Procedure returns
X X
O.S. calls X X X
O.O. returns X X
Laboratorio de Tecnologías de Información
(1) (1) Taken or not taken ? Taken or not taken ?
♦ Compare and branch instruction
+ No extra compare instruction
+ No state passed between instructions - Requires ALU operation
- Restricts code scheduling opportunities
♦ Implicitly set condition codes (Z, N, V, C)
+ Can be set “for free”
- Constrains code reordering - Extra state to save and restore
♦ Explicitly set condition codes (Z, N, V, C)
+ Can be set “for free”
+ Decouples branch/fetch from pipeline - Extra state to save and restore
Laboratorio de Tecnologías de Información
(1) (1) Taken or not taken ?, cont. Taken or not taken ?, cont.
♦ Condition in general-purpose register
+ No special state to save and implement but uses up a register - branch condition separated from branch logic in pipeline
♦ Some data for MIPS
■ > 80 % of compares for branches use immediates
■ > 80 % of these immediates are zero
■ 50 % compares for branches are =0 or != 0
♦ Compromise used in MIPS
■ Have branch-if = 0 and branch-if != 0
■ Have compare instructions (r1=r2, r1 != r2, r1 < r2, r1 <= r2, etc.)
♦ With pipelining, can we predict whether taken ?
■
Laboratorio de Tecnologías de Información
Arquitectura de Computadoras ISA- 40
(2) Where is the target ? (2) Where is the target ?
♦ Could use Arbitrary Specifier ?
+ Orthogonal and powerful
- More bits to specify, more time to decode
- branch execution and target separated in pipeline
♦ PC-relative with immediate
+ Position independence (helps linking), target computable in branch unit
+ Short immediate sufficient. MIPS word immediate:
<= 4 bits: 47 %
<= 8 bits: 94 %
<= 12 bits: 100 %
- Target must be known statically (to link) - Can’t jump arbitrarily far
- Other techniques are required for returns and distance jumps
Laboratorio de Tecnologías de Información
(2) Where is the target ?, cont.
(2) Where is the target ?, cont.
♦ Register
+ Short specification + Can jump anywhere + Dynamic target okay
(returns)
- Extra instruction to load register
♦ (Vectored) Trap
Critical for O.S. calls + Protection.
- Implementation headache
♦ Common compromise
■ (Conditional) branches (pc-rel)
■ (Unconditional) jumps (pc-rel, reg)
■ Procedure calls (pc-rel, reg)
■ Procedure returns (reg)
■ O.S. calls (trap)
■ O.S. returns (reg)
Laboratorio de Tecnologías de Información
(3) Link return address ? (3) Link return address ?
♦ Implicit register + Fast, simple
- SW must save register before next call
- Surprise traps or interrups ?
♦ Explicit register
- No important advantages over above
- Register must be specified
Required for procedure calls and O.S. calls
• Processor stack
+ Recursion supported directly - Complex instruction
Many recent architectures use implicit register
Laboratorio de Tecnologías de Información
(4) (4) Save or restore state ? Save or restore state ?
♦ What state ?
■ Procedure calls: registers
■ O.S. calls: registers and PSW (incl. CCs)
♦ Hardware need not save registers
■ Caller can save registers in use
■ Callee can save registers it will use
♦ Hardware register save
■ Which (IBM STM, VAX CALLS) ?
■ Is the above faster ?
■ Register windows
Laboratorio de Tecnologías de Información
MIPS:
MIPS: Register Register State State
♦ 32 integer registers
■ $0 is hardwared to 0
■ $31 is the return address register
■ software convention for other registers
♦ 32 single-precision FP registers or 16 double- precision FP registers
♦ PC and other special registers
Laboratorio de Tecnologías de Información
MIPS I Operation Overview MIPS I Operation Overview
♦ Arithmetic Logical:
■ Add, AddU, Sub, SubU, And, Or, Xor, Nor, SLT, SLTU
■ AddI, AddIU, SLTI, SLTIU, AndI, OrI, XorI, LUI
■ SLL, SRL, SRA, SLLV, SRLV, SRAV
♦ Memory Access:
■ LB, LBU, LH, LHU, LW, LWL,LWR
■ SB, SH, SW, SWL, SWR
Laboratorio de Tecnologías de Información
Multiply / Divide Multiply / Divide
♦ Start multiply, divide
■ MULT rs, rt
■ MULTU rs, rt
■ DIV rs, rt
■ DIVU rs, rt
♦ Move result from multiply, divide
■ MFHI rd
■ MFLO rd
♦ Move to HI or LO
■ MTHI rd
■ MTLO rd
♦ Why not third field for destination?
■ (Hint: how many clock cycles for multiply or divide vs. add?) Registers
HI LO
Laboratorio de Tecnologías de Información
Data Types Data Types
Bit: 0, 1
Bit String: sequence of bits of a particular length 4 bits is a nibble
8 bits is a byte
16 bits is a half-word 32 bits is a word
64 bits is a double-word Character:
ASCII 7 bit code
UNICODE 16 bit code Decimal:
digits 0-9 encoded as 0000b thru 1001b two decimal digits packed per 8 bit byte Integers:
2's Complement
Floating Point: exponent How many +/- #'s?
Laboratorio de Tecnologías de Información
Operand Size Usage Operand Size Usage
Frequency of reference by size
0% 20% 40% 60% 80%
Byte Halfword
Word Doubleword
0%
0%
31%
69%
7%
19%
74%
0%
Int Avg.
FP Avg.
• Support for these data sizes and types:
8-bit, 16-bit, 32-bit integers and
32-bit and 64-bit IEEE 754 floating point numbers
Laboratorio de Tecnologías de Información
MIPS arithmetic instructions MIPS arithmetic instructions
Instruction Example Meaning Comments
add add $1,$2,$3 $1 = $2 + $3 3 operands; exception possible subtract sub $1,$2,$3 $1 = $2 – $3 3 operands; exception possible add immediate addi $1,$2,100 $1 = $2 + 100 + constant; exception possible add unsigned addu $1,$2,$3 $1 = $2 + $3 3 operands; no exceptions subtract unsigned subu $1,$2,$3 $1 = $2 – $3 3 operands; no exceptions add imm. unsign. addiu $1,$2,100 $1 = $2 + 100 + constant; no exceptions multiply mult $2,$3 Hi, Lo = $2 x $3 64-bit signed product
multiply unsigned multu$2,$3 Hi, Lo = $2 x $3 64-bit unsigned product
divide div $2,$3 Lo = $2 ÷ $3, Lo = quotient, Hi = remainder Hi = $2 mod $3
divide unsigned divu $2,$3 Lo = $2 ÷ $3, Unsigned quotient & remainder Hi = $2 mod $3
Move from Hi mfhi $1 $1 = Hi Used to get copy of Hi
Laboratorio de Tecnologías de Información
MIPS logical instructions MIPS logical instructions
Instruction Example Meaning Comment
and and $1,$2,$3 $1 = $2 & $3 3 reg. operands; Logical AND or or $1,$2,$3 $1 = $2 | $3 3 reg. operands; Logical OR xor xor $1,$2,$3 $1 = $2 ⊕ $3 3 reg. operands; Logical XOR nor nor $1,$2,$3 $1 = ~($2 |$3) 3 reg. operands; Logical NOR and immediate andi $1,$2,10 $1 = $2 & 10 Logical AND reg, constant or immediate ori $1,$2,10 $1 = $2 | 10 Logical OR reg, constant xor immediate xori $1, $2,10 $1 = ~$2 &~10 Logical XOR reg, constant shift left logical sll $1,$2,10 $1 = $2 << 10 Shift left by constant
shift right logical srl $1,$2,10 $1 = $2 >> 10 Shift right by constant shift right arithm. sra $1,$2,10 $1 = $2 >> 10 Shift right (sign extend) shift left logical sllv $1,$2,$3 $1 = $2 << $3 Shift left by variable shift right logical srlv $1,$2, $3 $1 = $2 >> $3 Shift right by variable
shift right arithm. srav $1,$2, $3 $1 = $2 >> $3 Shift right arith. by variable
Laboratorio de Tecnologías de Información
MIPS data transfer instructions MIPS data transfer instructions
Instruction Comment SW 500(R4), R3 Store word SH 502(R2), R3 Store half SB 41(R3), R2 Store byte LW R1, 30(R2) Load word LH R1, 40(R3) Load halfword
LHU R1, 40(R3) Load halfword unsigned LB R1, 40(R3) Load byte
LBU R1, 40(R3) Load byte unsigned
LUI R1, 40 Load Upper Immediate (16 bits shifted left by 16)
Laboratorio de Tecnologías de Información
When does MIPS sign extend?
When does MIPS sign extend?
♦ When value is sign extended, copy upper bit to full value:
Examples of sign extending 8 bits to 16 bits:
00001010 ⇒ 00000000 00001010 10001100 ⇒ 11111111 10001100
♦ When is an immediate value sign extended?
■ Arithmetic instructions (add, sub, etc.) sign extend immediates even for the unsigned versions of the instructions!
■ Logical instructions do not sign extend
♦ Load/Store half or byte do sign extend, but unsigned
versions do not.
Laboratorio de Tecnologías de Información
Methods of Testing Condition Methods of Testing Condition
♦ Condition Codes
■ Processor status bits are set as a side-effect of arithmetic instructions (possibly on Moves) or explicitly by compare or test instructions.
ex: add r1, r2, r3 bz label
♦ Condition Register Ex: cmp r1, r2, r3
bgt r1, label
♦ Compare and Branch Ex: bgt r1, r2, label
Laboratorio de Tecnologías de Información
Conditional Branch Distance Conditional Branch Distance
Bits of Branch Dispalcement 0%
10%
20%
30%
40%
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Int. Avg. FP Avg.
• 25% of integer branches are 2 to 4 instructions
Laboratorio de Tecnologías de Información
Conditional Branch Addressing Conditional Branch Addressing
♦ PC-relative since most branches are relatively close to the current PC
♦ At least 8 bits suggested (±128 instructions)
♦ Compare Equal/Not Equal most important for integer programs (86%)
0% 50% 100%
EQ/NE GT/LE LT/GE
37%
23%
40%
86%
7%
7%
Int Avg.
FP Avg.
Laboratorio de Tecnologías de Información
Arquitectura de Computadoras ISA- 56
MIPS Compare and Branch MIPS Compare and Branch
♦ Compare and Branch
■ BEQ rs, rt, offset if R[rs] == R[rt] then PC-relative branch
■ BNE rs, rt, offset <>
♦ Compare to zero and Branch
■ BLEZ rs, offset if R[rs] <= 0 then PC-relative branch
■ BGTZ rs, offset >
■ BLT <
■ BGEZ >=
■ BLTZAL rs, offset if R[rs] < 0 then branch and link (into R 31)
■ BGEZAL >=!
♦ Remaining set of compare and branch ops take two instructions
♦ Almost all comparisons are against zero!
Laboratorio de Tecnologías de Información
MIPS jump, branch, compare MIPS jump, branch, compare
instructions instructions
Instruction Example Meaning
branch on equal beq $1,$2,100 if ($1 == $2) go to PC+4+100 Equal test; PC relative branch
branch on not eq. bne $1,$2,100 if ($1!= $2) go to PC+4+100 Not equal test; PC relative
set on less than slt $1,$2,$3 if ($2 < $3) $1=1; else $1=0 Compare less than; 2’s comp.
set less than imm. slti $1,$2,100 if ($2 < 100) $1=1; else $1=0 Compare < constant; 2’s comp.
set less than uns. sltu $1,$2,$3 if ($2 < $3) $1=1; else $1=0 Compare less than; unsigned numbers
set l. t. imm. uns. sltiu $1,$2,100 if ($2 < 100) $1=1; else $1=0 Compare < constant; unsigned numbers
jump j 10000 go to 10000
Jump to target address
Laboratorio de Tecnologías de Información
Signed vs. Unsigned Comparison Signed vs. Unsigned Comparison
R1=
0…00 0000 0000 0000 0001
R2=
0…00 0000 0000 0000 0010
R3=
1…11 1111 1111 1111 1111
♦ After executing these instructions:
slt r4,r2,r1 ; if (r2 < r1) r4=1; else r4=0 slt r5,r3,r1 ; if (r3 < r1) r5=1; else r5=0 sltu r6,r2,r1 ; if (r2 < r1) r6=1; else r6=0 sltu r7,r3,r1 ; if (r3 < r1) r7=1; else r7=0
♦ What are values of registers r4 - r7? Why?
r4 = ; r5 = ; r6 = ; r7 = ;
two two two
Laboratorio de Tecnologías de Información
Calls: Why Are Stacks So Great?
Calls: Why Are Stacks So Great?
Stacking of Subroutine Calls & Returns and Environments:
A:
CALL B
CALL C C:
RET
RET B:
A
A B
A B C
A B
A
Some machines provide a memory stack as part of the architecture (e.g., VAX)
Laboratorio de Tecnologías de Información
Memory Stacks Memory Stacks
Useful for stacked environments/subroutine call & return even if operand stack not part of architecture
Stacks that Grow Up vs. Stacks that Grow Down:
a b c
0 Little
inf. Big 0 Little
inf. Big
Memory Addresses SP
Next Empty?
Last Full?
How is empty stack represented?
Little --> Big/Last Full
POP: Read from Mem(SP) Decrement SP
PUSH: Increment SP
grows up
grows down
Little --> Big/Next Empty POP: Decrement SP
Read from Mem(SP) PUSH: Write to Mem(SP)
Laboratorio de Tecnologías de Información
Call Call - - Return Linkage: Stack Frames Return Linkage: Stack Frames
FP
ARGS
Callee Save Registers
Local Variables
SP
Reference args and local variables at fixed (positive) offset from FP
Grows and shrinks during expression evaluation (old FP, RA)
Many variations on stacks possible (up/down, last pushed / next )
High Mem
Low Mem
Laboratorio de Tecnologías de Información
0 zero constant 0
1 at reserved for assembler 2 v0 expression evaluation &
3 v1 function results 4 a0 arguments
5 a1 6 a2 7 a3
8 t0 temporary: caller saves . . . (callee can clobber) 15 t7
MIPS: Software conventions for MIPS: Software conventions for
Registers Registers
16 s0 callee saves . . . (callee must save) 23 s7
24 t8 temporary (cont’d) 25 t9
26 k0 reserved for OS kernel 27 k1
28 gp Pointer to global area 29 sp Stack pointer
30 fp frame pointer
31 ra Return Address (HW)
Laboratorio de Tecnologías de Información
MIPS / GCC Calling Conventions MIPS / GCC Calling Conventions
FP
fact: SP
addiu $sp, $sp, -32 sw $ra, 20($sp) sw $fp, 16($sp) addiu $fp, $sp, 32 . . .
sw $a0, 0($fp) ...
lw $31, 20($sp) lw $fp, 16($sp) addiu $sp, $sp, 32
ra old FP
ra low
address
FP SP ra
FP SP
Laboratorio de Tecnologías de Información
Details of the MIPS instruction set Details of the MIPS instruction set
♦ Register zero always has the value zero (even if you try to write it)
♦ Branch/jump and link put the return addr. PC+4 or 8 into the link register (R31) (depends on logical vs physical architecture)
♦ All instructions change all 32 bits of the destination register (including lui, lb, lh) and all read all 32 bits of sources (add, sub, and, or, …)
♦ Immediate arithmetic and logical instructions are extended as follows:
■ logical immediates ops are zero extended to 32 bits
■ arithmetic immediates ops are sign extended to 32 bits (including addu)
♦ The data loaded by the instructions lb and lh are extended as follows:
■ lbu, lhu are zero extended
■ lb, lh are sign extended
♦ Overflow can occur in these arithmetic and logical instructions:
■ add, sub, addi
■ it cannot occur in addu, subu, addiu, and, or, xor, nor, shifts, mult, multu, div, divu
Laboratorio de Tecnologías de Información
Delayed Branches Delayed Branches
♦ In the “Raw” MIPS, the instruction after the branch is executed even when the branch is taken?
■This is hidden by the assembler for the MIPS “virtual machine”
li r3, #7 sub r4, r4, 1 bz r4, LL addi r5, r3, 1 subi r6, r6, 2 LL: slt r1, r3, r5
Laboratorio de Tecnologías de Información
Branch & Pipelines Branch & Pipelines
execute
Branch
Delay Slot
Branch Target
By the end of Branch instruction, the CPU knows whether or not the branch will take place.
However, it will have fetched the next instruction by then, regardless of whether or not a branch will be taken.
Why not execute it?
ifetch execute
ifetch execute
ifetch execute LL: slt r1, r3, r5
li r3, #7
sub r4, r4, 1 bz r4, LL
addi r5, r3, 1
Time
ifetch execute
Laboratorio de Tecnologías de Información
Filling Delayed Branches Filling Delayed Branches
IF DEC & OP fetch
Branch:
execute successor even if branch taken!
Then branch target
or continue Single delay slot
impacts the critical path
•Compiler can fill a single delay slot with a useful instruction 50% of the time.
• try to move down from above jump
•move up from target, if safe
add r3, r1, r2 sub r4, r4, 1 bz r4, LL
NOP ...
Execute
IF DEC & OP fetch Execute
IF
Laboratorio de Tecnologías de Información
Miscellaneous MIPS I instructions Miscellaneous MIPS I instructions
♦ break A breakpoint trap occurs, transfers
control to exception handler
♦ syscall A system trap occurs, transfers control to exception handler
♦ coprocessor instrs. Support for floating point
♦ TLB instructions Support for virtual memory: discussed later
♦ restore from exception Restores previous interrupt mask &
kernel/user mode bits into status register
♦ load word left/right Supports misaligned word loads
♦ store word left/right Supports misaligned word stores
Laboratorio de Tecnologías de Información
MIPS:
MIPS: Instruction Instruction Set Set Format Format
■ load/store architecture with 3 explicit operands (ALU ops)
■ fixed 32-bit instructions
■ 3 instruction formats
» R-Type
» I-Type
» J-Type
■ 6 instruction set groups:
» load/store - data movement operations
» computational - arithmetic, logical, and shift operations
» jump/branch - including call and returns
» coprocessor - FP instructions
» coprocessor0 - memory management and exception handling
Laboratorio de Tecnologías de Información
R2000/3000
R2000/3000 Instruction Instruction Formats Formats
♦ R-type (register)
e.g. add $8, $17, $18 # $8 = $17 + $18
OpCode rs rt rd shamt funct
0 5
6 10
11 15
16 20
21 25
26 31
0 17 18 8 0 32
0 5
6 10
11 15
16 20
21 25
26 31
Laboratorio de Tecnologías de Información
• I-type (immediate)
e.g. addi $8, $17, -44 # $8 = $17 -44
lw $8, -44($17) # $8 = M[$17 - 44]
beq $17, $8, label # if( $8 == $17) go to label:
OpCode rs rt immediate
0 15
16 20
21 25
26 31
“op” 17 8 -44
0 15
16 20
21 25
26 31
R2000/3000
R2000/3000 Instruction Instruction Formats Formats
Laboratorio de Tecnologías de Información
• J-type (jump)
e.g. jump label # call label: ;
$31 = $pc + 8
OpCode target
0 25
26 31
3 -44
0 25
26 31
R2000/3000
R2000/3000 Instruction Instruction Formats Formats
Laboratorio de Tecnologías de Información
Bhandarkar
Bhandarkar and and Clark: RISC vs. CISC Clark: RISC vs. CISC
♦ Compares the VAX 8700 vs MIPS M/2000 (R3000 chip)
♦ Combines three fractors:
■ Architecture
■ Implementation
■ Compilers and OS
♦ Argues that:
■ Implementation effects are second order
■ Compilers are “similar”
■ RISCs are better than CISCs
Laboratorio de Tecnologías de Información
Bhandarkar
Bhandarkar and and Clark, cont. Clark, cont.
RISC factor
Risc Facto Instr
CPI Instr
vax mips mips
r = CPIvax
Bechmark Inst.
Ratio
CPI MIPS
CPI VAX
Ratio RISC factor
spice2g6 2.5 1.8 8.0 4.4 1.8
matrix300 2.4 3.0 13.8 4.5 1.9
nasa7 2.1 3.0 15.0 5.0 2.4
fpppp 2.9 1.5 15.2 10.5 2.7
tomcatv 2.9 2.1 17.5 8.2 2.9
dudoc 2.7 1.7 13.2 7.9 3.0
espresso 1.7 1.1 5.4 5.1 3.0
eqntott 1.1 1.3 4.4 3.5 3.3
li 1.6 1.1 6.5 6.0 3.7
geo. mean 2.2 1.7 9.9 5.8 2.7
Laboratorio de Tecnologías de Información
Bhandarkar
Bhandarkar and and Clark, cont. Clark, cont.
♦ Compensation Factors
■ Increase VAX CPI but decrease VAX instruction count
■ Increase MIPS instruction count
■ Example 1: Loads and stores vs. operand specifiers
■ Example 2: Necessary complex operations, e.g. loop branches
♦ Factors favoring VAX
■ Big immediate values
■ Not-taken branches incur no delay
Laboratorio de Tecnologías de Información
Bhandarkar
Bhandarkar and and Clark, cont. Clark, cont.
♦ Factors favoring MIPS
■ Operand specifier decoding
■ Number of registers
■ Separating floating point unit
■ Simple jumps and branches (lower latency)
■ Fancy VAX instructions: Unnecessary functionality
■ Instruction scheduling
■ Translation buffer
■ Branch displacement size
Laboratorio de Tecnologías de Información
Homework
Homework Assignment Assignment 3 3
♦ Make a review of the ISA for a specific processor
♦ Write a three page report explaining the following:
1. Kind of IS Architecture (RISC, CISC or other, 16-bit, 32- bit, 64-bit)
2. Classes of instructions (ALU, Memory Movement, Branches, etc.)
3. Addressing Modes (immediate, base+displacemente, indirect, etc.)
4. Displacements in branches and control flow instructions (call, ret)
5. Special instructions: system calls, traps, access to
special purpose registers
Laboratorio de Tecnologías de Información
Hw Hw 3 3 : List of Processors : List of Processors
1 Claudia Méndez Garza Xscale o ARM
2 José Alberto Ramírez Uresti Opteron AMD 64 bits
3 Víctor Echeverría Ríos Texas Instruments TMS320DM64x o un DSP
Due date: September 26th, 2008.
Laboratorio de Tecnologías de Información
L L á á minas complementarias no minas complementarias no expuestas en la clase que pueden expuestas en la clase que pueden
servir de soporte
servir de soporte
Laboratorio de Tecnologías de Información
VAX VAX - - 11 11
♦ Introduced by DEC in 1977: VAX- 11/780
♦ Upward compatible from PDP-11
♦ 32-bit “word” and addresses
♦ Virtual memory is first-class
♦ 16 GPRs (r15 is PC, r14 is SP), CCs
♦ Extremely orthogonal, memory- memory
♦ Decode as byte stream
■ Opcode: operation, number of operands
& operand type
■ Variable-length address specifiers
Laboratorio de Tecnologías de Información
VAX VAX - - 11, cont. 11, cont.
♦ Data types
■ 8-, 16-, 32-, 64-, 128-bit integers
■ F (32 bits), D (64), G (64), H (128) FP
■ Character string (8-bits/char)
■ Decimal (4-bits/digit)
■ Numeric string (8-bits/digit)
♦ Addresing modes include
–Literal (6 bits)
–8-, 16-, 32-bit immediates –Register, register deferred –8-, 16-, 32-bit displacements
–Indexed
–Autoincrement –Autodecrement
–Autoincrement deferred