Instruction Set Architecture

(1)

Computer Architecture (Arquitectura de Computadoras)

Arturo Díaz Pérez

Centro de Investigación y de Estudios Avanzados del IPN

Laboratorio de Tecnologías de Información

(2)

Instruction Set

♦ ... the attributes of a [computing] system as seen by the programmer, i.e., the conceptual structure and functional behavior, as distinct from the organization of the data flows and controls of logic design, and the physical implementation.

■ Amdahl, Blaauw, and Brooks, 1964.

(3)

Instruction Set Design

[Figure: the instruction set drawn as the interface between software and hardware]

(4)

Instruction Set Architecture

♦ ... the attributes of a [computing] system as seen by the programmer, i.e., the conceptual structure and functional behavior, as distinct from the organization of the data flows and controls of logic design, and the physical implementation.

■ Amdahl, Blaauw, and Brooks, 1964.

♦ Organization of programmable storage

♦ Data types & data structures: encodings and representations

♦ Instruction formats

♦ Instruction (or Operation Code) Set

♦ Modes of addressing and accessing data items and instructions

♦ Exceptional conditions

(5)

ISA: What Must be Specified?

[Figure: execution cycle: Instruction Fetch, Instruction Decode, Operand Fetch, Execute, Result Store, Next Instruction]

♦ Instruction Format or Encoding

■ how is it decoded?

♦ Location of operands and result

■ where other than memory?

■ how many explicit operands?

■ how are memory operands located?

■ which can or cannot be in memory?

♦ Data type and Size

♦ Operations

■ which operations are supported?

♦ Successor instruction

(6)

Evolution of Instruction Sets

Single Accumulator (EDSAC, 1950)

Accumulator + Index Registers (Manchester Mark I, IBM 700 series, 1953)

Separation of Programming Model from Implementation:
    High-level Language Based (B5000, 1963)
    Concept of a Family (IBM 360, 1964)

General Purpose Register Machines:
    Complex Instruction Sets (VAX, Intel 432, 1977-80)
    Load/Store Architecture (CDC 6600, Cray 1, 1963-76)

RISC (MIPS, Sparc, 88000, IBM RS6000, ..., 1987)

(7)

Basic ISA Classes

♦ Accumulator:
    1 address      add A           acc ← acc + mem[A]
    1+x address    addx A          acc ← acc + mem[A + x]

♦ Stack:
    0 address      add             tos ← tos + next

♦ General Purpose Register:
    2 address      add A B         EA(A) ← EA(A) + EA(B)
    3 address      add A B C       EA(A) ← EA(B) + EA(C)

♦ Load/Store:
    3 address      add Ra Rb Rc    Ra ← Rb + Rc
                   load Ra Rb      Ra ← mem[Rb]

(8)

Comparing Number of Instructions

Code sequence for (C = A + B) for four classes of instruction sets:

Stack      Accumulator    Register (register-memory)    Register (load-store)
Push A     Load A         Load R1,A                     Load R1,A
Push B     Add B          Add R1,B                      Load R2,B
Add        Store C        Store C,R1                    Add R3,R1,R2
Pop C                                                   Store C,R3

Comparison: Bytes per instruction? Number of instructions? Cycles per instruction?
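The instruction counts for C = A + B can be checked with a toy simulation. This is only a sketch: the tuple encoding of instructions and the function names (`run_stack`, `run_accumulator`, `run_load_store`) are assumptions made for this example, not part of any real ISA or of the original slides.

```python
# Toy interpreters for three ISA classes (hypothetical instruction encoding).
# Each function executes the program against `mem` and returns the number
# of instructions executed.

def run_stack(program, mem):
    """Stack machine: operands are implicit (top of stack)."""
    stack = []
    for op, *args in program:
        if op == "push":
            stack.append(mem[args[0]])
        elif op == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "pop":
            mem[args[0]] = stack.pop()
    return len(program)


def run_accumulator(program, mem):
    """Accumulator machine: one explicit memory address per instruction."""
    acc = 0
    for op, addr in program:
        if op == "load":
            acc = mem[addr]
        elif op == "add":
            acc += mem[addr]
        elif op == "store":
            mem[addr] = acc
    return len(program)


def run_load_store(program, mem):
    """Load/store machine: only load and store touch memory."""
    regs = {}
    for op, *args in program:
        if op == "load":        # load Rd, addr
            regs[args[0]] = mem[args[1]]
        elif op == "add":       # add Rd, Rs1, Rs2
            regs[args[0]] = regs[args[1]] + regs[args[2]]
        elif op == "store":     # store addr, Rs
            mem[args[0]] = regs[args[1]]
    return len(program)


STACK_PROG = [("push", "A"), ("push", "B"), ("add",), ("pop", "C")]
ACC_PROG = [("load", "A"), ("add", "B"), ("store", "C")]
LS_PROG = [("load", "R1", "A"), ("load", "R2", "B"),
           ("add", "R3", "R1", "R2"), ("store", "C", "R3")]
```

All three programs leave the same value in C; they differ in instruction count (4, 3, and 4 here), which is only one of the three comparison axes listed on the slide.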

(9)

General Purpose Registers Dominate

♦ 1975-200x all machines use general purpose registers

♦ Advantages of registers

■ registers are faster than memory

■ registers are easier for a compiler to use

» e.g., (A*B) - (C*D) - (E*F) can do multiplies in any order vs. stack

■ registers can hold variables

» memory traffic is reduced, so program is sped up (since registers are faster than memory)

(10)

Caches vs. Registers

♦ Register advantages

■ Faster (no addressing mode, no tags)

■ Deterministic (no misses)

■ Can duplicate for two ports

■ Short identifier (3-8 bits)

♦ Register disadvantages

■ Must save/restore on procedure calls

■ Can’t take the address of a register

■ Fixed size (FP, strings, structures)

■ Compiler must control (?)

(11)

Caches vs. Registers (cont'd)

♦ How many registers? More means:

+ Hold operands longer (reducing memory traffic & potentially execution time)

- Longer register specifiers (except with register windows)

- Slower registers

- More state slows context switches

(12)

MIPS I Registers

♦ Programmable storage

■ 2³² bytes of memory

■ 31 x 32-bit GPRs (R0 = 0)

■ 32 x 32-bit FP regs (paired DP)

■ HI, LO, PC

[Figure: register file r0 (always 0), r1 ... r31, plus PC, lo, and hi]

(13)

Typical Operations

Data Movement    Load (from memory)
                 Store (to memory)
                 memory-to-memory move
                 register-to-register move
                 input (from I/O device)
                 output (to I/O device)
                 push, pop (to/from stack)

Arithmetic       integer (binary + decimal) or FP
                 Add, Subtract, Multiply, Divide

Logical          not, and, or, set, clear

Shift            shift left/right, rotate left/right

(14)

Typical Operations (little change since 1960)

Control (Jump/Branch)    unconditional, conditional
Subroutine Linkage       call, return
Interrupt                trap, return
Synchronization          test & set (atomic r-m-w)
String                   search, translate
Graphics (MMX)           parallel subword ops (4 16bit add)

(15)

Top 10 80x86 Instructions

Rank   Instruction               Integer average (% of total executed)
1      load                      22%
2      conditional branch        20%
3      compare                   16%
4      store                     12%
5      add                       8%
6      and                       6%
7      sub                       5%
8      move register-register    4%
9      call                      1%
10     return                    1%

(16)

Operation Summary

♦ Support these simple instructions, since they will dominate the number of instructions executed:

■ load,

■ store,

■ add,

■ subtract,

■ move register-register,

■ and,

■ shift,

■ compare equal, compare not equal,

■ branch, jump,

■ call,

■ return;

(17)

Operands for ALU instructions

♦ ALU instructions combine operands (e.g. ADD)

♦ Number of explicit operands

■ Two - destination equals one source

■ Three - orthogonal

♦ Operands in registers or memory

■ Any combination -- VAX

» (orthogonal, but variable instr. formats)

■ At least one register -- much of 360

» (not orthogonal)

■ All registers -- CRAY, DLX, RISCs

» (orthogonal, but needs loads/stores)

(18)

Memory Addressing

♦ Since 1980, almost every machine has used byte (8-bit) addressing

♦ 2 questions for design of ISA:

■ Since a 32-bit word could be read as four byte loads from sequential byte addresses or as one word load from a single byte address,

» How do byte addresses map onto words?

» Can a word be placed on any byte boundary?

(19)

Addressing Objects: Endian Wars

♦ Big Endian: address of most significant byte = word address (xx00 = Big End of word)

■ IBM 360/370, Motorola 68k, MIPS, Sparc, HP PA

♦ Little Endian: address of least significant byte = word address (xx00 = Little End of word)

■ Intel 80x86, DEC Vax, DEC Alpha (Windows NT)

♦ Mode selectable

■ becoming more common: PowerPC, MIPS R10000

[Figure: a 32-bit word drawn from msb to lsb with bytes numbered 3 2 1 0; in little endian, byte 0 is the least significant byte]
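The two byte orders can be made concrete with Python's built-in `int.to_bytes` and `int.from_bytes`. This is a minimal sketch; the helper name `word_to_bytes` and the sample value `0x0A0B0C0D` are chosen only for illustration.

```python
def word_to_bytes(value, byteorder):
    """Return the 4 bytes of a 32-bit word as stored at increasing addresses."""
    return list(value.to_bytes(4, byteorder=byteorder))


WORD = 0x0A0B0C0D

# Big endian: the most significant byte sits at the word address (offset 0).
big = word_to_bytes(WORD, "big")

# Little endian: the least significant byte sits at the word address.
little = word_to_bytes(WORD, "little")

# Reading little-endian storage with big-endian rules scrambles the value.
misread = int.from_bytes(bytes(little), byteorder="big")
```

The `misread` value shows why byte order matters whenever binary data moves between machines that disagree on it.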

(20)

Addressing Objects: Alignment

[Figure: byte offsets 0 1 2 3 within a word, showing aligned and not-aligned placements]

♦ Alignment: require that objects fall on an address that is a multiple of their size.
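The alignment rule reduces to a modulo test. The sketch below assumes object sizes in bytes; the helper names `is_aligned` and `align_up` are hypothetical.

```python
def is_aligned(address, size):
    """True when `address` is a multiple of the object size (in bytes)."""
    return address % size == 0


def align_up(address, size):
    """Smallest aligned address that is >= `address`."""
    return (address + size - 1) // size * size
```

For example, a 4-byte word at address 0x1002 is not aligned, and the next aligned slot for it is 0x1004.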

(21)

Alignment

♦ No restrictions

■ Simpler software

■ Hardware must detect misalignment and make 2 memory accesses

■ expensive logic, slows down all references

■ sometimes required for backward compatibility

♦ Restricted alignment

■ software must guarantee alignment

■ hardware only detects misalignment and traps

■ trap handler does it

♦ Middle ground

■ misaligned data ok but requires multiple instructions

■ compiler must still know

■ still trap on misaligned access

(22)

A “typical” RISC

♦ 32-bit fixed format instruction (3 formats)

♦ 32 32-bit GPRs (R0 contains zero, DP values take a register pair)

♦ 3-address, reg-reg arithmetic instruction

♦ Single address mode for load/store: base+displacement

■ no indirection

♦ Simple branch conditions

♦ Delayed branch

see: SPARC, MIPS, MC88100, AMD2900, i960, i860, PARisc, DEC Alpha, Clipper, CDC 6600, CDC 7600, Cray-1, Cray-2, Cray-3, ...

(23)

VAX-11

♦ Variable format, 2 and 3 address instruction

♦ 32-bit word size, 16 GPR (four reserved)

♦ Rich set of addressing modes (apply to any operand)

♦ Rich set of operations

■ (bit-field, stack, call, case, loop, string, poly, system)

♦ Rich set of data types (B, W, L, Q, O, F, D, G, H)

♦ Condition codes

[Figure: instruction as a byte stream: OpCode in byte 0, followed by address/mode (A/M) specifiers in bytes 1 ... n ... m]

(24)

VAX-11: Addressing Modes

1. Register                 Ri
2. Base + Displacement      M[Ri + v]
3. Immediate                v
4. Register Indirect        M[Ri]
5. Direct (absolute)        M[v]
6. Base + Index             M[Ri + Rj]
7. Scaled Index             M[Ri + Rj*d + v]
8. Autoincrement            M[Ri++]
9. Autodecrement            M[Ri--]
10. Memory Indirect         M[ M[Ri] ]
11. [Indirection chains]

Modes 1-4 account for 93 % of all operands on the VAX

[Figure: operands drawn from the Register File and from Memory]
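The M[...] notation in the addressing-mode list can be read as a lookup procedure. The sketch below models the register file and memory as Python dictionaries and resolves an operand for eight of the listed modes (the autoincrement and autodecrement modes are left out because they also modify the register). The function name `operand` and the mode strings are assumptions for illustration, not VAX syntax.

```python
def operand(mode, regs, mem, ri=None, rj=None, v=0, d=1):
    """Return the operand value selected by an addressing mode.

    regs maps register names to values, mem maps addresses to values,
    v is the immediate/displacement field and d the scale factor.
    """
    if mode == "register":
        return regs[ri]
    if mode == "immediate":
        return v
    if mode == "displacement":          # M[Ri + v]
        return mem[regs[ri] + v]
    if mode == "register_indirect":     # M[Ri]
        return mem[regs[ri]]
    if mode == "direct":                # M[v]
        return mem[v]
    if mode == "base_index":            # M[Ri + Rj]
        return mem[regs[ri] + regs[rj]]
    if mode == "scaled_index":          # M[Ri + Rj*d + v]
        return mem[regs[ri] + regs[rj] * d + v]
    if mode == "memory_indirect":       # M[M[Ri]]
        return mem[mem[regs[ri]]]
    raise ValueError(f"unsupported mode: {mode}")


REGS = {"r1": 100, "r2": 3}
MEM = {50: 5, 100: 200, 103: 7, 104: 11, 116: 9, 200: 42}
```

Memory indirect needs two memory reads for one operand, which is one reason the rarely used modes are costly to support.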

(25)

Addressing Mode Usage

♦ 3 programs measured on machine with all address modes (VAX)

--- Displacement: 42% avg, 32% to 55%

--- Immediate: 33% avg, 17% to 43%

--- Register deferred (indirect): 13% avg, 3% to 24%

--- Scaled: 7% avg, 0% to 16%

--- Memory indirect: 3% avg, 1% to 6%

--- Misc: 2% avg, 0% to 3%

♦ 75% displacement & immediate

♦ 85% displacement, immediate & register indirect


(26)

Displacement Address Size?

° Avg. of 5 SPECint92 programs v. avg. 5 SPECfp92 programs

° 1% of addresses > 16-bits

° 12 - 16 bits of displacement needed

[Figure: histogram of displacement sizes; x-axis: address bits (0 to 15), y-axis: 0% to 30%; series: Int. Avg., FP Avg.]

(27)

Immediate Size?

♦ 50% to 60% fit within 8 bits

♦ 75% to 80% fit within 16 bits

(28)

Addressing Summary

♦ Data Addressing modes that are important:

■ Displacement, Immediate, Register Indirect

♦ Displacement size should be 12 to 16 bits

♦ Immediate size should be 8 to 16 bits
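Whether a given displacement or immediate fits in a field of a chosen width is a simple range check. The sketch below assumes two's-complement signed fields; the helper names are hypothetical.

```python
def fits_signed(value, bits):
    """True when `value` is representable as a two's-complement field of `bits` bits."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


def fits_unsigned(value, bits):
    """True when `value` is representable as an unsigned field of `bits` bits."""
    return 0 <= value < (1 << bits)
```

A compiler for a machine with 16-bit displacements would apply a check like this and fall back to a longer instruction sequence for the rare values that do not fit.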

(29)

Generic Example of Instruction Format Widths

[Figure: instruction width diagrams for variable, fixed, and hybrid formats]

(30)

Instruction Formats

♦ If code size is most important, use variable-length instructions

♦ If performance is most important, use fixed-length instructions

♦ Recent embedded machines (ARM, MIPS) added an optional mode that executes a subset of 16-bit-wide instructions (Thumb, MIPS16); the choice between performance and density is made per procedure

♦ Some architectures are exploring on-the-fly decompression for more density.

(31)

Instruction Format

♦ If there are many memory operands per instruction and/or many addressing modes:

=> Need one address specifier per operand

♦ If it is a load-store machine with 1 address per instruction and one or two addressing modes:

=> Can encode the addressing mode in the opcode

(32)

MIPS Addressing Modes/Instruction Formats

• All instructions 32 bits wide

• Register (direct): fields op, rs, rt, rd; the operand is in a register

• Immediate: fields op, rs, rt, immed; the operand is the immed field

• Base+index: fields op, rs, rt, immed; the operand is in memory at register + immed

• PC-relative: fields op, rs, rt, immed; the operand is in memory at PC + immed

• Register Indirect?

(33)

Most Popular ISA of all time: Intel 80x86

♦ 1971: Intel invents microprocessor 4004/8008, 8080 in 1975

♦ 1975: Gordon Moore realized one more chance for new ISA before ISA locked in for decades

■ hired CS people in Oregon

■ weren’t ready in 1977 (CS people did 432 in 1980)

■ started crash effort for 16-bit microcomputer

♦ 1978: 8086 dedicated registers, segmented address, 16 bit

■ 8088; 8-bit external bus version of 8086

♦ 1980: IBM selects 8088 as basis for IBM PC

♦ 1980: 8087 floating point coprocessor: adds 60 instructions using hybrid stack/register scheme

♦ 1982: 80286 24-bit address, protection, memory mapping

(34)

Intel x86 (IA-32)

♦ 1989: 80486, and Pentium in 1992: faster, plus a few multiprocessor (MP) instructions

♦ 1997: MMX multimedia extensions

♦ 200X: Superseded by IA-64 (Merced, McKinley, Itanium, etc.)

♦ “Difficult to explain and impossible to love”

■ See H&P Appendix D.8

♦ Eight 32-bit registers (EAX, EBX, ..., but also ESP, EBP)

♦ Also 16- and 8-bit version (AX, AH, AL)

♦ Most instructions have two operands, one possibly from memory

♦ One super-duper addressing mode w/ effective address =

■ base_reg + (index_reg * scaling_factor) + displacement

♦ Many formats: see H&P fig. D.8

(35)

Intel MMX

♦ MultiMedia eXtension to IA-32 [Peleg & Weiser, IEEE Micro, 8/96]

■ Multimedia data values often need much less than 32 bits

■ But are organized in groups (e.g. red/green/blue)

■ So in 64-bit FP registers: 2x32, 4x16, 8x8

♦ E.g. ADDB (for byte)

      17   87  100  ... 6 more
    + 17   13  200  ... 6 more
     ---  ---  ---  ... ---
      34  100  255  ... 6 more

♦ MMX takes 16-element dot product (a₀*b₀ + a₁*b₁ + ... + a₁₅*b₁₅)
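In the ADDB example, the third lane gives 100 + 200 = 255 rather than 300, which suggests unsigned saturating arithmetic on each byte. The sketch below models that reading lane by lane; the function names are hypothetical, and the wrap-around variant is included only for contrast.

```python
def add_bytes_saturating(a, b, max_value=255):
    """Lane-wise unsigned add that clamps each result at max_value."""
    return [min(x + y, max_value) for x, y in zip(a, b)]


def add_bytes_wrapping(a, b):
    """Lane-wise add that keeps only the low 8 bits of each result."""
    return [(x + y) & 0xFF for x, y in zip(a, b)]


LANES_A = [17, 87, 100]
LANES_B = [17, 13, 200]
```

Saturation is usually the desired behavior for pixel data: an overexposed channel stays at full brightness instead of wrapping around to a dark value.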


(37)

Control Instructions

                        Taken or      Where is      Link return   Save or
                        not taken?    the target?   address       restore state
Conditional branches    X             X
Jumps                                 X
Procedure calls                       X             X             X
Procedure returns                     X                           X
O.S. calls                            X             X             X
O.S. returns                          X                           X

(38)

(1) Taken or not taken?

♦ Compare and branch instruction

+ No extra compare instruction

+ No state passed between instructions

- Requires ALU operation

- Restricts code scheduling opportunities

♦ Implicitly set condition codes (Z, N, V, C)

+ Can be set “for free”

- Constrains code reordering

- Extra state to save and restore

♦ Explicitly set condition codes (Z, N, V, C)

+ Can be set “for free”

+ Decouples branch/fetch from pipeline

- Extra state to save and restore

(39)

(1) Taken or not taken?, cont.

♦ Condition in general-purpose register

+ No special state to save or implement

- Uses up a register

- Branch condition separated from branch logic in pipeline

♦ Some data for MIPS

■ > 80 % of compares for branches use immediates

■ > 80 % of these immediates are zero

■ 50 % compares for branches are =0 or != 0

♦ Compromise used in MIPS

■ Have branch-if = 0 and branch-if != 0

■ Have compare instructions (r1=r2, r1 != r2, r1 < r2, r1 <= r2, etc.)

♦ With pipelining, can we predict whether taken?

(40)

(2) Where is the target?

♦ Could use Arbitrary Specifier ?

+ Orthogonal and powerful

- More bits to specify, more time to decode

- branch execution and target separated in pipeline

♦ PC-relative with immediate

+ Position independence (helps linking), target computable in branch unit

+ Short immediate sufficient. MIPS word immediate:

<= 4 bits: 47 %

<= 8 bits: 94 %

<= 12 bits: 100 %

- Target must be known statically (to link)

- Can’t jump arbitrarily far

- Other techniques are required for returns and distant jumps

(41)

(2) Where is the target?, cont.

♦ Register

+ Short specification

+ Can jump anywhere

+ Dynamic target okay (returns)

- Extra instruction to load register

♦ (Vectored) Trap

Critical for O.S. calls

+ Protection

- Implementation headache

♦ Common compromise

■ (Conditional) branches (pc-rel)

■ (Unconditional) jumps (pc-rel, reg)

■ Procedure calls (pc-rel, reg)

■ Procedure returns (reg)

■ O.S. calls (trap)

■ O.S. returns (reg)

(42)

(3) Link return address?

Required for procedure calls and O.S. calls

♦ Implicit register

+ Fast, simple

- SW must save register before next call

- Surprise traps or interrupts?

♦ Explicit register

- No important advantages over above

- Register must be specified

♦ Processor stack

+ Recursion supported directly

- Complex instruction

Many recent architectures use implicit register

(43)

(4) Save or restore state?

♦ What state ?

■ Procedure calls: registers

■ O.S. calls: registers and PSW (incl. CCs)

♦ Hardware need not save registers

■ Caller can save registers in use

■ Callee can save registers it will use

♦ Hardware register save

■ Which (IBM STM, VAX CALLS) ?

■ Is the above faster ?

■ Register windows

(44)

MIPS: Register State

♦ 32 integer registers

■ $0 is hardwired to 0

■ $31 is the return address register

■ software convention for other registers

♦ 32 single-precision FP registers or 16 double-precision FP registers

♦ PC and other special registers

(45)

MIPS I Operation Overview

♦ Arithmetic/Logical:

■ Add, AddU, Sub, SubU, And, Or, Xor, Nor, SLT, SLTU

■ AddI, AddIU, SLTI, SLTIU, AndI, OrI, XorI, LUI

■ SLL, SRL, SRA, SLLV, SRLV, SRAV

♦ Memory Access:

■ LB, LBU, LH, LHU, LW, LWL, LWR

■ SB, SH, SW, SWL, SWR

(46)

Multiply / Divide

♦ Start multiply, divide

■ MULT rs, rt

■ MULTU rs, rt

■ DIV rs, rt

■ DIVU rs, rt

♦ Move result from multiply, divide

■ MFHI rd

■ MFLO rd

♦ Move to HI or LO

■ MTHI rs

■ MTLO rs

♦ Why not third field for destination?

■ (Hint: how many clock cycles for multiply or divide vs. add?)

[Figure: the HI and LO registers]

(47)

Data Types

Bit: 0, 1

Bit String: sequence of bits of a particular length
    4 bits is a nibble
    8 bits is a byte
    16 bits is a half-word
    32 bits is a word
    64 bits is a double-word

Character:
    ASCII 7 bit code
    UNICODE 16 bit code

Decimal:
    digits 0-9 encoded as 0000b thru 1001b
    two decimal digits packed per 8 bit byte

Integers:
    2's Complement

Floating Point:
    exponent
    How many +/- #'s?

(48)

Operand Size Usage

[Figure: bar chart of frequency of reference by operand size (axis 0% to 80%), Int Avg. vs. FP Avg.]

Size          Int Avg.    FP Avg.
Doubleword    0%          69%
Word          74%         31%
Halfword      19%         0%
Byte          7%          0%

• Support for these data sizes and types: 8-bit, 16-bit, 32-bit integers and 32-bit and 64-bit IEEE 754 floating point numbers

(49)

MIPS arithmetic instructions

Instruction          Example           Meaning                         Comments
add                  add $1,$2,$3      $1 = $2 + $3                    3 operands; exception possible
subtract             sub $1,$2,$3      $1 = $2 - $3                    3 operands; exception possible
add immediate        addi $1,$2,100    $1 = $2 + 100                   + constant; exception possible
add unsigned         addu $1,$2,$3     $1 = $2 + $3                    3 operands; no exceptions
subtract unsigned    subu $1,$2,$3     $1 = $2 - $3                    3 operands; no exceptions
add imm. unsign.     addiu $1,$2,100   $1 = $2 + 100                   + constant; no exceptions
multiply             mult $2,$3        Hi, Lo = $2 x $3                64-bit signed product
multiply unsigned    multu $2,$3       Hi, Lo = $2 x $3                64-bit unsigned product
divide               div $2,$3         Lo = $2 ÷ $3, Hi = $2 mod $3    Lo = quotient, Hi = remainder
divide unsigned      divu $2,$3        Lo = $2 ÷ $3, Hi = $2 mod $3    Unsigned quotient & remainder
move from Hi         mfhi $1           $1 = Hi                         Used to get copy of Hi

(50)

MIPS logical instructions

Instruction            Example            Meaning             Comment
and                    and $1,$2,$3       $1 = $2 & $3        3 reg. operands; Logical AND
or                     or $1,$2,$3        $1 = $2 | $3        3 reg. operands; Logical OR
xor                    xor $1,$2,$3       $1 = $2 ⊕ $3        3 reg. operands; Logical XOR
nor                    nor $1,$2,$3       $1 = ~($2 | $3)     3 reg. operands; Logical NOR
and immediate          andi $1,$2,10      $1 = $2 & 10        Logical AND reg, constant
or immediate           ori $1,$2,10       $1 = $2 | 10        Logical OR reg, constant
xor immediate          xori $1,$2,10      $1 = $2 ⊕ 10        Logical XOR reg, constant
shift left logical     sll $1,$2,10       $1 = $2 << 10       Shift left by constant
shift right logical    srl $1,$2,10       $1 = $2 >> 10       Shift right by constant
shift right arithm.    sra $1,$2,10       $1 = $2 >> 10       Shift right (sign extend)
shift left logical     sllv $1,$2,$3      $1 = $2 << $3       Shift left by variable
shift right logical    srlv $1,$2,$3      $1 = $2 >> $3       Shift right by variable
shift right arithm.    srav $1,$2,$3      $1 = $2 >> $3       Shift right arith. by variable

(51)

MIPS data transfer instructions

Instruction          Comment
SW 500(R4), R3       Store word
SH 502(R2), R3       Store half
SB 41(R3), R2        Store byte
LW R1, 30(R2)        Load word
LH R1, 40(R3)        Load halfword
LHU R1, 40(R3)       Load halfword unsigned
LB R1, 40(R3)        Load byte
LBU R1, 40(R3)       Load byte unsigned
LUI R1, 40           Load Upper Immediate (16 bits shifted left by 16)

(52)

When does MIPS sign extend?

♦ When a value is sign extended, its upper (sign) bit is copied into all of the added high-order bits:

Examples of sign extending 8 bits to 16 bits:

00001010 ⇒ 00000000 00001010
10001100 ⇒ 11111111 10001100

♦ When is an immediate value sign extended?

■ Arithmetic instructions (add, sub, etc.) sign extend immediates even for the unsigned versions of the instructions!

■ Logical instructions do not sign extend

♦ Loads of a halfword or byte sign extend, but the unsigned versions do not.
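The 8-bit to 16-bit examples can be reproduced with a few lines of bit manipulation. This is a sketch with hypothetical helper names; Python integers are unbounded, so the widths are passed explicitly.

```python
def sign_extend(value, from_bits, to_bits):
    """Copy bit (from_bits - 1) of `value` into bits from_bits .. to_bits - 1."""
    value &= (1 << from_bits) - 1
    if value & (1 << (from_bits - 1)):
        # Set every bit between the old width and the new width.
        value |= ((1 << to_bits) - 1) & ~((1 << from_bits) - 1)
    return value


def zero_extend(value, from_bits):
    """Keep the low from_bits bits; all higher bits become zero."""
    return value & ((1 << from_bits) - 1)
```

Applied to loads, `sign_extend` corresponds to the behavior described for lb and lh, and `zero_extend` to lbu and lhu.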

(53)

Methods of Testing Condition

♦ Condition Codes

■ Processor status bits are set as a side-effect of arithmetic instructions (possibly on Moves) or explicitly by compare or test instructions.

    ex: add r1, r2, r3
        bz label

♦ Condition Register

    ex: cmp r1, r2, r3
        bgt r1, label

♦ Compare and Branch

    ex: bgt r1, r2, label

(54)

Conditional Branch Distance

[Figure: histogram of branch distances; x-axis: bits of branch displacement (0 to 15), y-axis: 0% to 40%; series: Int. Avg., FP Avg.]

• 25% of integer branches are 2 to 4 instructions

(55)

Conditional Branch Addressing

♦ PC-relative since most branches are relatively close to the current PC

♦ At least 8 bits suggested (±128 instructions)

♦ Compare Equal/Not Equal most important for integer programs (86%)

[Figure: bar chart of comparison types used in branches (axis 0% to 100%), Int Avg. vs. FP Avg.]

Comparison    Int Avg.    FP Avg.
EQ/NE         86%         37%
GT/LE         7%          23%
LT/GE         7%          40%

(56)

MIPS Compare and Branch

♦ Compare and Branch

■ BEQ rs, rt, offset     if R[rs] == R[rt] then PC-relative branch

■ BNE rs, rt, offset     if R[rs] != R[rt] then PC-relative branch

♦ Compare to zero and Branch

■ BLEZ rs, offset        if R[rs] <= 0 then PC-relative branch

■ BGTZ rs, offset        if R[rs] > 0

■ BLTZ rs, offset        if R[rs] < 0

■ BGEZ rs, offset        if R[rs] >= 0

■ BLTZAL rs, offset      if R[rs] < 0 then branch and link (into R31)

■ BGEZAL rs, offset      if R[rs] >= 0 then branch and link

♦ Remaining set of compare and branch ops take two instructions

♦ Almost all comparisons are against zero!

(57)

MIPS jump, branch, compare instructions

Instruction            Example             Meaning                             Comment
branch on equal        beq $1,$2,100       if ($1 == $2) go to PC+4+100        Equal test; PC relative branch
branch on not eq.      bne $1,$2,100       if ($1 != $2) go to PC+4+100        Not equal test; PC relative
set on less than       slt $1,$2,$3        if ($2 < $3) $1=1; else $1=0        Compare less than; 2’s comp.
set less than imm.     slti $1,$2,100      if ($2 < 100) $1=1; else $1=0       Compare < constant; 2’s comp.
set less than uns.     sltu $1,$2,$3       if ($2 < $3) $1=1; else $1=0        Compare less than; unsigned numbers
set l. t. imm. uns.    sltiu $1,$2,100     if ($2 < 100) $1=1; else $1=0       Compare < constant; unsigned numbers
jump                   j 10000             go to 10000                         Jump to target address

(58)

Signed vs. Unsigned Comparison

R1 = 0…00 0000 0000 0000 0001 (base two)
R2 = 0…00 0000 0000 0000 0010 (base two)
R3 = 1…11 1111 1111 1111 1111 (base two)

♦ After executing these instructions:

    slt r4,r2,r1 ; if (r2 < r1) r4=1; else r4=0
    slt r5,r3,r1 ; if (r3 < r1) r5=1; else r5=0
    sltu r6,r2,r1 ; if (r2 < r1) r6=1; else r6=0
    sltu r7,r3,r1 ; if (r3 < r1) r7=1; else r7=0

♦ What are values of registers r4 - r7? Why?

r4 = ; r5 = ; r6 = ; r7 = ;
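One way to check an answer to this exercise is to model slt and sltu on 32-bit patterns. The sketch below is an illustrative model, not MIPS code; the helper names are hypothetical, and the register values are the ones given on the slide (R3 is taken to be all ones).

```python
MASK32 = 0xFFFFFFFF


def to_signed32(x):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def slt(a, b):
    """Set on less than, signed comparison."""
    return 1 if to_signed32(a) < to_signed32(b) else 0


def sltu(a, b):
    """Set on less than, unsigned comparison."""
    return 1 if (a & MASK32) < (b & MASK32) else 0


r1, r2, r3 = 1, 2, 0xFFFFFFFF
r4 = slt(r2, r1)
r5 = slt(r3, r1)
r6 = sltu(r2, r1)
r7 = sltu(r3, r1)
```

The only pair whose result depends on signedness is (r3, r1): the all-ones pattern is -1 when read as signed but the largest 32-bit value when read as unsigned.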

(59)

Calls: Why Are Stacks So Great?

Stacking of Subroutine Calls & Returns and Environments:

    A:  CALL B
    B:  CALL C
        RET
    C:  RET

Stack contents over time: A, then A B, then A B C, then A B, then A

Some machines provide a memory stack as part of the architecture (e.g., VAX)

(60)

Memory Stacks

Useful for stacked environments/subroutine call & return even if operand stack not part of architecture

Stacks that Grow Up vs. Stacks that Grow Down:

[Figure: memory addresses running from 0 (Little) to inf. (Big), with a stack holding a, b, c and its SP, drawn once growing up and once growing down]

Next Empty? Last Full?

How is empty stack represented?

Little --> Big/Last Full
    POP:  Read from Mem(SP); Decrement SP
    PUSH: Increment SP; Write to Mem(SP)

Little --> Big/Next Empty
    POP:  Decrement SP; Read from Mem(SP)
    PUSH: Write to Mem(SP); Increment SP
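The two stack-pointer conventions (SP at the last full slot vs. SP at the next empty slot) differ only in the order of the memory access and the SP update. The sketch below models both for a stack that grows toward bigger addresses; the class names are hypothetical and memory is a plain Python list.

```python
class LastFullStack:
    """Grows toward bigger addresses; SP points at the last full slot."""

    def __init__(self, size):
        self.mem = [None] * size
        self.sp = -1                    # empty: SP is one slot below the first entry

    def push(self, value):
        self.sp += 1                    # increment SP, then write
        self.mem[self.sp] = value

    def pop(self):
        value = self.mem[self.sp]       # read, then decrement SP
        self.sp -= 1
        return value


class NextEmptyStack:
    """Grows toward bigger addresses; SP points at the next empty slot."""

    def __init__(self, size):
        self.mem = [None] * size
        self.sp = 0                     # empty: SP points at the first slot

    def push(self, value):
        self.mem[self.sp] = value       # write, then increment SP
        self.sp += 1

    def pop(self):
        self.sp -= 1                    # decrement SP, then read
        return self.mem[self.sp]
```

In this model the empty stack is represented by SP = -1 under the last-full convention and by SP = 0 under the next-empty convention, which is one possible answer to the question the slide poses.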

(61)

Call-Return Linkage: Stack Frames

[Figure: stack frame between High Mem and Low Mem: ARGS, Callee Save Registers (old FP, RA), Local Variables; FP marks the top of the frame and SP the bottom]

Reference args and local variables at fixed (positive) offset from FP

Grows and shrinks during expression evaluation

Many variations on stacks possible (up/down, last pushed / next)

(62)

MIPS: Software conventions for Registers

0        zero     constant 0
1        at       reserved for assembler
2-3      v0-v1    expression evaluation & function results
4-7      a0-a3    arguments
8-15     t0-t7    temporary: caller saves (callee can clobber)
16-23    s0-s7    callee saves (callee must save)
24-25    t8-t9    temporary (cont’d)
26-27    k0-k1    reserved for OS kernel
28       gp       Pointer to global area
29       sp       Stack pointer
30       fp       frame pointer
31       ra       Return Address (HW)

(63)

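MIPS / GCC Calling Conventions

fact:
    addiu $sp, $sp, -32     # allocate a 32-byte frame
    sw    $ra, 20($sp)      # save the return address
    sw    $fp, 16($sp)      # save the old frame pointer
    addiu $fp, $sp, 32      # FP = top of the new frame
    . . .
    sw    $a0, 0($fp)       # store the argument at 0($fp)
    ...
    lw    $31, 20($sp)      # restore the return address ($31 is $ra)
    lw    $fp, 16($sp)      # restore the old frame pointer
    addiu $sp, $sp, 32      # release the frame

[Figure: three stack snapshots (before the call, inside fact, after the epilogue) showing FP, SP, and the saved ra and old FP, with low addresses at the bottom]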

(64)

Details of the MIPS instruction set

♦ Register zero always has the value zero (even if you try to write it)

♦ Branch/jump and link put the return addr. PC+4 or 8 into the link register (R31) (depends on logical vs physical architecture)

♦ All instructions change all 32 bits of the destination register (including lui, lb, lh) and all read all 32 bits of sources (add, sub, and, or, …)

♦ Immediate arithmetic and logical instructions are extended as follows:

■ logical immediates ops are zero extended to 32 bits

■ arithmetic immediates ops are sign extended to 32 bits (including addiu)

♦ The data loaded by the instructions lb and lh are extended as follows:

■ lbu, lhu are zero extended

■ lb, lh are sign extended

♦ Overflow can occur in these arithmetic and logical instructions:

■ add, sub, addi

■ it cannot occur in addu, subu, addiu, and, or, xor, nor, shifts, mult, multu, div, divu
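The difference between the trapping and non-trapping adds can be modeled on 32-bit values. This is a sketch: the exception class `OverflowTrap` and the function names are hypothetical stand-ins for the hardware behavior, and the overflow test used here (operands with equal signs, result with a different sign) is one standard formulation.

```python
MASK32 = 0xFFFFFFFF
SIGN32 = 0x80000000


class OverflowTrap(Exception):
    """Stands in for the arithmetic overflow exception."""


def addu(a, b):
    """Wrap-around add: never traps."""
    return (a + b) & MASK32


def add(a, b):
    """Signed add: traps when the 32-bit two's-complement result overflows."""
    a &= MASK32
    b &= MASK32
    result = (a + b) & MASK32
    # Overflow iff a and b have the same sign and the result's sign differs.
    if (~(a ^ b) & (a ^ result)) & SIGN32:
        raise OverflowTrap(f"{a:#x} + {b:#x}")
    return result
```

Both functions produce the same bit pattern whenever no overflow occurs; the only difference is whether the overflow case is reported.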

(65)

Delayed Branches

♦ In the “Raw” MIPS, the instruction after the branch is executed even when the branch is taken.

■ This is hidden by the assembler for the MIPS “virtual machine”

        li   r3, #7
        sub  r4, r4, 1
        bz   r4, LL
        addi r5, r3, 1      # delay slot: runs whether or not the branch is taken
        subi r6, r6, 2
    LL: slt  r1, r3, r5
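The effect of the delay slot on this code sequence can be seen in a toy interpreter. This is a sketch under stated assumptions: the tuple encoding, the `LABELS` table, and the `run` function are invented for illustration, and the model ignores branches placed in delay slots.

```python
PROGRAM = [
    ("li", "r3", 7),
    ("sub", "r4", "r4", 1),
    ("bz", "r4", "LL"),
    ("addi", "r5", "r3", 1),      # delay slot
    ("subi", "r6", "r6", 2),
    ("slt", "r1", "r3", "r5"),    # LL:
]
LABELS = {"LL": 5}


def run(program, labels, regs, delayed=True):
    """Execute the toy program; with delayed=True the instruction
    after a taken branch still executes before control transfers."""
    pc = 0
    pending = None                # target waiting for the delay slot to finish
    while pc < len(program):
        op, *a = program[pc]
        target = None
        if op == "li":
            regs[a[0]] = a[1]
        elif op in ("sub", "subi"):
            regs[a[0]] = regs[a[1]] - a[2]
        elif op == "addi":
            regs[a[0]] = regs[a[1]] + a[2]
        elif op == "slt":
            regs[a[0]] = 1 if regs[a[1]] < regs[a[2]] else 0
        elif op == "bz" and regs[a[0]] == 0:
            target = labels[a[1]]
        if pending is not None:   # the delay slot has just executed
            pc, pending = pending, None
        elif target is not None and delayed:
            pending, pc = target, pc + 1
        elif target is not None:
            pc = target
        else:
            pc += 1
    return regs
```

Starting with r4 = 1, the branch is taken in both models, but only the delayed model executes the addi, so the final slt sees a different r5 and produces a different r1.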

(66)

Branch & Pipelines

[Figure: pipeline timing diagram with time on the horizontal axis; each instruction is an ifetch stage followed by an execute stage, overlapped with its neighbors:

        li   r3, #7
        sub  r4, r4, 1
        bz   r4, LL           (Branch)
        addi r5, r3, 1        (Delay Slot)
    LL: slt  r1, r3, r5       (Branch Target)]

By the end of the Branch instruction, the CPU knows whether or not the branch will take place.

However, it will have fetched the next instruction by then, regardless of whether or not a branch will be taken.

Why not execute it?

(67)

Filling Delayed Branches

[Figure: pipeline stages (IF, DEC & OP fetch, Execute) for the branch and its successor: execute successor even if branch taken, then branch target or continue; a single delay slot impacts the critical path]

• Compiler can fill a single delay slot with a useful instruction 50% of the time.

• try to move down from above jump

• move up from target, if safe

    add r3, r1, r2
    sub r4, r4, 1
    bz  r4, LL
    NOP
    ...

(68)

Miscellaneous MIPS I instructions

♦ break: a breakpoint trap occurs, transfers control to exception handler

♦ syscall: a system trap occurs, transfers control to exception handler

♦ coprocessor instrs.: support for floating point

♦ TLB instructions: support for virtual memory (discussed later)

♦ restore from exception: restores previous interrupt mask & kernel/user mode bits into status register

♦ load word left/right: supports misaligned word loads

♦ store word left/right: supports misaligned word stores

(69)

MIPS: Instruction Set Format

■ load/store architecture with 3 explicit operands (ALU ops)

■ fixed 32-bit instructions

■ 3 instruction formats

» R-Type

» I-Type

» J-Type

■ 6 instruction set groups:

» load/store - data movement operations

» computational - arithmetic, logical, and shift operations

» jump/branch - including call and returns

» coprocessor - FP instructions

» coprocessor0 - memory management and exception handling

» special - system calls and breakpoints

(70)

R2000/3000 Instruction Formats

♦ R-type (register)

e.g. add $8, $17, $18 # $8 = $17 + $18

Field:    OpCode    rs       rt       rd       shamt    funct
Bits:     31-26     25-21    20-16    15-11    10-6     5-0
Value:    0         17       18       8        0        32
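The R-type field layout can be turned into a 32-bit word with shifts and ORs. The sketch below packs the field values shown for `add $8, $17, $18`; the function names `encode_r` and `decode_r` are hypothetical.

```python
def encode_r(opcode, rs, rt, rd, shamt, funct):
    """Pack R-type fields: opcode[31:26] rs[25:21] rt[20:16] rd[15:11] shamt[10:6] funct[5:0]."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


def decode_r(word):
    """Unpack a 32-bit word into the six R-type fields."""
    return (
        (word >> 26) & 0x3F,
        (word >> 21) & 0x1F,
        (word >> 16) & 0x1F,
        (word >> 11) & 0x1F,
        (word >> 6) & 0x1F,
        word & 0x3F,
    )


ADD_WORD = encode_r(0, 17, 18, 8, 0, 32)   # add $8, $17, $18
```

Note that the destination register rd comes first in assembly syntax but third among the register fields of the encoded word.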

(71)

R2000/3000 Instruction Formats

• I-type (immediate)

e.g. addi $8, $17, -44 # $8 = $17 - 44

     lw $8, -44($17) # $8 = M[$17 - 44]

     beq $17, $8, label # if ($8 == $17) go to label

Field:    OpCode    rs       rt       immediate
Bits:     31-26     25-21    20-16    15-0
Value:    “op”      17       8        -44
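The I-type layout packs a 16-bit immediate into the low half of the word, so a negative value such as -44 is stored in two's complement. In the sketch below, the function names are hypothetical, and the opcode numbers 8 (addi) and 35 (lw) are taken from the standard MIPS I encoding rather than from the slide, which shows only “op”.

```python
OP_ADDI = 8     # MIPS I opcode for addi
OP_LW = 35      # MIPS I opcode for lw (0x23)


def encode_i(opcode, rs, rt, immediate):
    """Pack I-type fields: opcode[31:26] rs[25:21] rt[20:16] immediate[15:0]."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (immediate & 0xFFFF)


def decode_immediate(word):
    """Extract the 16-bit immediate and sign extend it to a Python int."""
    imm = word & 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


ADDI_WORD = encode_i(OP_ADDI, 17, 8, -44)   # addi $8, $17, -44
LW_WORD = encode_i(OP_LW, 17, 8, -44)       # lw $8, -44($17)
```

The two instructions differ only in the opcode field, which is how a load-store machine can fold its single addressing mode into the opcode.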

(72)

R2000/3000 Instruction Formats

• J-type (jump)

e.g. jal label # call label; $31 = $pc + 8

Field:    OpCode    target
Bits:     31-26     25-0
Value:    3         -44

(73)

Bhandarkar and Clark: RISC vs. CISC

♦ Compares the VAX 8700 vs MIPS M/2000 (R3000 chip)

♦ Combines three factors:

■ Architecture

■ Implementation

■ Compilers and OS

♦ Argues that:

■ Implementation effects are second order

■ Compilers are “similar”

■ RISCs are better than CISCs

(74)

Bhandarkar and Clark, cont.

$$\text{RISC factor} = \frac{\text{CPI}_{\text{vax}} \times \text{Instr}_{\text{vax}}}{\text{CPI}_{\text{mips}} \times \text{Instr}_{\text{mips}}}$$

Benchmark    Inst. Ratio     CPI MIPS    CPI VAX    CPI Ratio     RISC factor
             (MIPS/VAX)                             (VAX/MIPS)
spice2g6     2.5             1.8         8.0        4.4           1.8
matrix300    2.4             3.0         13.8       4.5           1.9
nasa7        2.1             3.0         15.0       5.0           2.4
fpppp        2.9             1.5         15.2       10.5          2.7
tomcatv      2.9             2.1         17.5       8.2           2.9
doduc        2.7             1.7         13.2       7.9           3.0
espresso     1.7             1.1         5.4        5.1           3.0
eqntott      1.1             1.3         4.4        3.5           3.3
li           1.6             1.1         6.5        6.0           3.7
geo. mean    2.2             1.7         9.9        5.8           2.7
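The RISC factor column can be recomputed from the other columns, reading the instruction ratio as MIPS instructions per VAX instruction and the CPI ratio as VAX CPI over MIPS CPI (an interpretation that matches the spice2g6 and li rows). The function name `risc_factor` is hypothetical.

```python
def risc_factor(inst_ratio, cpi_mips, cpi_vax):
    """CPI advantage of MIPS divided by its instruction-count penalty.

    inst_ratio is (MIPS instructions) / (VAX instructions).
    """
    cpi_ratio = cpi_vax / cpi_mips
    return cpi_ratio / inst_ratio


spice = risc_factor(inst_ratio=2.5, cpi_mips=1.8, cpi_vax=8.0)
li = risc_factor(inst_ratio=1.6, cpi_mips=1.1, cpi_vax=6.5)
```

A factor above 1 means that the lower CPI of the RISC machine more than compensates for the extra instructions it executes. Because the table entries are rounded to one decimal, recomputed values may differ slightly from the printed column.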

(75)

Bhandarkar and Clark, cont.

♦ Compensation Factors

■ Increase VAX CPI but decrease VAX instruction count

■ Increase MIPS instruction count

■ Example 1: Loads and stores vs. operand specifiers

■ Example 2: Necessary complex operations, e.g. loop branches

♦ Factors favoring VAX

■ Big immediate values

■ Not-taken branches incur no delay

(76)

Bhandarkar and Clark, cont.

♦ Factors favoring MIPS

■ Operand specifier decoding

■ Number of registers

■ Separating floating point unit

■ Simple jumps and branches (lower latency)

■ Fancy VAX instructions: Unnecessary functionality

■ Instruction scheduling

■ Translation buffer

■ Branch displacement size

(77)

Homework Assignment 3

♦ Review the ISA of a specific processor

♦ Write a three-page report explaining the following:

1. Kind of IS Architecture (RISC, CISC or other, 16-bit, 32-bit, 64-bit)

2. Classes of instructions (ALU, Memory Movement, Branches, etc.)

3. Addressing Modes (immediate, base+displacement, indirect, etc.)

4. Displacements in branches and control flow instructions (call, ret)

5. Special instructions: system calls, traps, access to special purpose registers

(78)

Hw 3: List of Processors

1 Claudia Méndez Garza: XScale or ARM

2 José Alberto Ramírez Uresti: AMD Opteron (64-bit)

3 Víctor Echeverría Ríos: Texas Instruments TMS320DM64x or another DSP

Due date: September 26th, 2008.

(79)

Supplementary slides not presented in class that may serve as supporting material

(80)

VAX-11

♦ Introduced by DEC in 1977: VAX-11/780

♦ Upward compatible from PDP-11

♦ 32-bit “word” and addresses

♦ Virtual memory is first-class

♦ 16 GPRs (r15 is PC, r14 is SP), CCs

♦ Extremely orthogonal, memory-memory

♦ Decode as byte stream

■ Opcode: operation, number of operands & operand type

■ Variable-length address specifiers

(81)

VAX-11, cont.

♦ Data types

■ 8-, 16-, 32-, 64-, 128-bit integers

■ F (32 bits), D (64), G (64), H (128) FP

■ Character string (8-bits/char)

■ Decimal (4-bits/digit)

■ Numeric string (8-bits/digit)

♦ Addressing modes include

– Literal (6 bits)

– 8-, 16-, 32-bit immediates

– Register, register deferred

– 8-, 16-, 32-bit displacements

– Indexed

– Autoincrement

– Autodecrement

– Autoincrement deferred