(1)

Laboratorio de Tecnologías de Información

Instruction Set Architecture

Computer Architecture (Arquitectura de Computadoras)

Arturo Díaz Pérez

Centro de Investigación y de Estudios Avanzados del IPN

Laboratorio de Tecnologías de Información

(2)

Instruction Set

♦ ... the attributes of a [computing] system as seen by the programmer, i.e., the conceptual structure and functional behavior, as distinct from the organization of the data flows and controls of logic design, and the physical implementation.

■ Amdahl, Blaauw, Brooks, 1964.

(3)

Instruction Set Design

[Figure: the instruction set as the interface between software and hardware]

(4)

Instruction Set Architecture

... the attributes of a [computing] system as seen by the programmer, i.e., the conceptual structure and functional behavior, as distinct from the organization of the data flows and controls of logic design, and the physical implementation.

Amdahl, Blaauw, Brooks, 1964.

♦ Organization of programmable storage
♦ Data types & data structures: encodings and representations
♦ Instruction formats
♦ Instruction (or Operation Code) Set
♦ Modes of addressing and accessing data items and instructions
♦ Exceptional conditions

(5)

ISA: What Must be Specified?

Execution cycle: Instruction Fetch → Instruction Decode → Operand Fetch → Execute → Result Store → Next Instruction

♦ Instruction Format or Encoding
■ how is it decoded?
♦ Location of operands and result
■ where other than memory?
■ how many explicit operands?
■ how are memory operands located?
■ which can or cannot be in memory?
♦ Data type and Size
♦ Operations
■ what are supported
♦ Successor instruction

(6)

Evolution of Instruction Sets

♦ Single Accumulator (EDSAC 1950)
♦ Accumulator + Index Registers (Manchester Mark I, IBM 700 series 1953)
♦ Separation of Programming Model from Implementation
■ High-level Language Based (B5000 1963)
■ Concept of a Family (IBM 360 1964)
♦ General Purpose Register Machines
■ Complex Instruction Sets (Vax, Intel 432 1977-80)
■ Load/Store Architecture (CDC 6600, Cray 1 1963-76)
♦ RISC (MIPS, Sparc, 88000, IBM RS6000, ... 1987)

(7)

Basic ISA Classes

♦ Accumulator:
■ 1 address: add A (acc ← acc + mem[A])
■ 1+x address: addx A (acc ← acc + mem[A + x])
♦ Stack:
■ 0 address: add (tos ← tos + next)
♦ General Purpose Register:
■ 2 address: add A B (EA(A) ← EA(A) + EA(B))
■ 3 address: add A B C (EA(A) ← EA(B) + EA(C))
♦ Load/Store:
■ 3 address: add Ra Rb Rc (Ra ← Rb + Rc)
■ load Ra Rb (Ra ← mem[Rb])

(8)

Comparing Number of Instructions

Code sequence for (C = A + B) for four classes of instruction sets:

Stack     Accumulator   Register (register-memory)   Register (load-store)
Push A    Load A        Load R1,A                    Load R1,A
Push B    Add B         Add R1,B                     Load R2,B
Add       Store C       Store C,R1                   Add R3,R1,R2
Pop C                                                Store C,R3

Comparison: Bytes per instruction? Number of instructions? Cycles per instruction?
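The four code sequences can be compared with a small sketch. The Python below is a hypothetical toy interpreter, not any real ISA: the opcode names (`add0`, `lda`, `adda`, `sta`, `addm`, `add3`) and the counting scheme are assumptions made for illustration. It runs each C = A + B sequence and reports the instruction count and the number of data-memory accesses.

```python
def run(program, memory):
    """Run (opcode, *operands) tuples on a toy machine.

    Returns (final memory, instructions executed, data memory accesses).
    """
    mem = dict(memory)            # copy so the caller's memory is untouched
    stack, regs, acc = [], {}, 0
    accesses = 0
    for op, *a in program:
        if op == "push":          # stack: push mem[X]
            stack.append(mem[a[0]]); accesses += 1
        elif op == "pop":         # stack: mem[X] <- top of stack
            mem[a[0]] = stack.pop(); accesses += 1
        elif op == "add0":        # stack: 0-address add, tos <- tos + next
            stack.append(stack.pop() + stack.pop())
        elif op == "lda":         # accumulator: acc <- mem[X]
            acc = mem[a[0]]; accesses += 1
        elif op == "adda":        # accumulator: acc <- acc + mem[X]
            acc += mem[a[0]]; accesses += 1
        elif op == "sta":         # accumulator: mem[X] <- acc
            mem[a[0]] = acc; accesses += 1
        elif op == "load":        # register: R <- mem[X]
            regs[a[0]] = mem[a[1]]; accesses += 1
        elif op == "store":       # register: mem[X] <- R
            mem[a[0]] = regs[a[1]]; accesses += 1
        elif op == "addm":        # register-memory: R <- R + mem[X]
            regs[a[0]] += mem[a[1]]; accesses += 1
        elif op == "add3":        # load-store: Rd <- Rs + Rt
            regs[a[0]] = regs[a[1]] + regs[a[2]]
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return mem, len(program), accesses


PROGRAMS = {
    "stack": [("push", "A"), ("push", "B"), ("add0",), ("pop", "C")],
    "accumulator": [("lda", "A"), ("adda", "B"), ("sta", "C")],
    "register-memory": [("load", "R1", "A"), ("addm", "R1", "B"),
                        ("store", "C", "R1")],
    "load-store": [("load", "R1", "A"), ("load", "R2", "B"),
                   ("add3", "R3", "R1", "R2"), ("store", "C", "R3")],
}


def compare(a=2, b=3):
    """Return {class name: (value of C, instruction count, data accesses)}."""
    results = {}
    for name, program in PROGRAMS.items():
        mem, n_instr, n_mem = run(program, {"A": a, "B": b, "C": 0})
        results[name] = (mem["C"], n_instr, n_mem)
    return results


if __name__ == "__main__":
    for name, (c, n_instr, n_mem) in compare().items():
        print(f"{name:16s} C={c} instructions={n_instr} data accesses={n_mem}")
```

In this toy model all four sequences touch data memory three times and differ only in instruction count; bytes per instruction and cycles per instruction are not modeled.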

(9)

General Purpose Registers Dominate

♦ 1975-200x all machines use general purpose registers

♦ Advantages of registers

■ registers are faster than memory

■ registers are easier for a compiler to use

» e.g., (A*B) – (C*D) – (E*F) can do multiplies in any order vs. stack

■ registers can hold variables

» memory traffic is reduced, so program is sped up (since registers are faster than memory)

(10)

Caches vs. Registers

♦ Register advantages

■ Faster (no addressing mode, no tags)

■ Deterministic (no misses)

■ Can duplicate for two ports

■ Short identifier (3-8 bits)

♦ Register disadvantages

■ Must save/restore on procedure calls

■ Can’t take the address of a register

■ Fixed size (FP, strings, structures)

■ Compiler must control (?)

(11)

Caches vs. Registers (cont'd)

♦ How many registers? More means
+ Hold operands longer (reducing memory traffic & potentially execution time)
- Longer register specifiers (except with register windows)
- Slow registers
- More state slows context switches

(12)

MIPS I Registers

♦ Programmable storage
■ 2^32 x bytes of memory
■ 31 x 32-bit GPRs (R0 = 0)
■ 32 x 32-bit FP regs (paired DP)
■ HI, LO, PC

[Figure: register file r0 (always 0) through r31, plus PC, lo, and hi]

(13)

Typical Operations

Data Movement   Load (from memory)
                Store (to memory)
                memory-to-memory move
                register-to-register move
                input (from I/O device)
                output (to I/O device)
                push, pop (to/from stack)
Arithmetic      integer (binary + decimal) or FP
                Add, Subtract, Multiply, Divide
Logical         not, and, or, set, clear
Shift           shift left/right, rotate left/right

(14)

Typical Operations (little change since 1960)

Control (Jump/Branch)   unconditional, conditional
Subroutine Linkage      call, return
Interrupt               trap, return
Synchronization         test & set (atomic r-m-w)
String                  search, translate
Graphics (MMX)          parallel subword ops (4 16bit add)

(15)

Top 10 80x86 Instructions

Rank   Instruction              Integer average (percent of total executed)
1      load                     22%
2      conditional branch       20%
3      compare                  16%
4      store                    12%
5      add                      8%
6      and                      6%
7      sub                      5%
8      move register-register   4%
9      call                     1%
10     return                   1%

(16)

Operation Summary

♦ Support these simple instructions, since they will dominate the number of instructions executed:

■ load,

■ store,

■ add,

■ subtract,

■ move register-register,

■ and,

■ shift,

■ compare equal, compare not equal,

■ branch, jump,

■ call,

■ return;

(17)

Operands for ALU instructions

♦ ALU instructions combine operands (e.g. ADD)

♦ Number of explicit operands

■ Two - destination equals one source

■ Three - orthogonal

♦ Operands in registers or memory

■ Any combination -- VAX

» (orthogonal, but variable instr. formats)

■ At least one register -- much of 360

» (not orthogonal)

■ All registers -- CRAY, DLX, RISCs

» (orthogonal, but needs loads/stores)

(18)

Memory Addressing

♦ Since 1980, almost every machine has used byte addressing (addresses resolve to the level of 8 bits)

♦ 2 questions for design of ISA:

■ Since a 32-bit word could be read as four byte loads from sequential byte addresses or as one word load from a single byte address:

» How do byte addresses map onto words?

» Can a word be placed on any byte boundary?

(19)

Addressing Objects: Endian Wars

♦ Big Endian: address of most significant byte = word address (xx00 = Big End of word)
■ IBM 360/370, Motorola 68k, MIPS, Sparc, HP PA
♦ Little Endian: address of least significant byte = word address (xx00 = Little End of word)
■ Intel 80x86, DEC Vax, DEC Alpha (Windows NT)
♦ Mode selectable
■ becoming more common: PowerPC, MIPS R10000

[Figure: a 32-bit word drawn from msb to lsb with bytes numbered 3 2 1 0; in little endian, byte 0 is the lsb]
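The two byte orders can be checked with a short sketch. The Python below uses the built-in `int.to_bytes`; the helper name `byte_at` and the sample word `0x0A0B0C0D` are assumptions chosen for illustration.

```python
WORD = 0x0A0B0C0D   # most significant byte 0x0A, least significant byte 0x0D


def byte_at(word, offset, byteorder):
    """Return the byte stored at (word address + offset) for a 32-bit word."""
    return word.to_bytes(4, byteorder=byteorder)[offset]


if __name__ == "__main__":
    for order in ("big", "little"):
        layout = " ".join(f"{b:02X}" for b in WORD.to_bytes(4, byteorder=order))
        print(f"{order:6s} endian, addresses xx00..xx03: {layout}")
```

At the word address itself (offset 0), a big-endian machine holds the most significant byte and a little-endian machine holds the least significant byte.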

(20)

Addressing Objects: Alignment

♦ Alignment: require that objects fall on an address that is a multiple of their size.

[Figure: byte offsets 0 1 2 3 within a word, showing an aligned object and one that is not aligned]
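The alignment rule reduces to a remainder test. The sketch below is illustrative only: the function names and the 4-byte memory word width (`bus_width`) are assumptions, not something the slides specify.

```python
def is_aligned(address, size):
    """An object is aligned when its address is a multiple of its size."""
    return address % size == 0


def accesses_needed(address, size, bus_width=4):
    """Count how many bus_width-byte memory words the object overlaps."""
    first_word = address // bus_width
    last_word = (address + size - 1) // bus_width
    return last_word - first_word + 1
```

For example, a 4-byte word at address 2 is not aligned and straddles two 4-byte memory words, so hardware that permits it needs two accesses.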

(21)

Alignment

♦ No restrictions
■ Simpler software
■ Hardware must detect misalignment and make 2 memory accesses
■ Expensive logic, slows down all references
■ Sometimes required for backward compatibility
♦ Restricted alignment
■ Software must guarantee alignment
■ Hardware only detects misalignment and traps
■ Trap handler does it
♦ Middle ground
■ Misaligned data ok but requires multiple instructions
■ Compiler must still know
■ Still trap on misaligned access

(22)

A typical RISC

♦ 32-bit fixed format instruction (3 formats)

♦ 32 32-bit GPR (R0 contains zero, DP take pair)

♦ 3-address, reg-reg arithmetic instruction

♦ Single address mode for load/store: base+displacement

■ no indirection

♦ Simple branch conditions

♦ Delayed branch

see: SPARC, MIPS, MC88100, AMD2900, i960, i860, PARisc, DEC Alpha, Clipper, CDC 6600, CDC 7600, Cray-1, Cray-2, Cray-3, ...

(23)

VAX-11

♦ Variable format, 2 and 3 address instruction
♦ 32-bit word size, 16 GPR (four reserved)
♦ Rich set of addressing modes (apply to any operand)
♦ Rich set of operations
■ (bit-field, stack, call, case, loop, string, poly, system)
♦ Rich set of data types (B, W, L, Q, O, F, D, G, H)
♦ Condition codes

[Figure: instruction layout by byte: OpCode in byte 0, followed by address/mode (A/M) specifiers in bytes 1 through n and on to m]

(24)

VAX-11: Addressing Modes

1. Register Ri
2. Base + Displacement M[Ri + v]
3. Immediate v
4. Register Indirect M[Ri]
5. Direct (absolute) M[v]
6. Base + Index M[Ri + Rj]
7. Scaled Index M[Ri + Rj*d + v]
8. Autoincrement M[Ri++]
9. Autodecrement M[Ri--]
10. Memory Indirect M[ M[Ri] ]
11. [Indirection chains]

Modes 1-4 account for 93 % of all operands on the VAX

[Figure: operands located in the Register File and in Memory]
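The notation in the mode list maps directly onto code. The sketch below is a hypothetical model, not the VAX encoding: memory is a dictionary keyed by address, the auto modes step by 1 rather than by operand size, and autodecrement follows the slide's `M[Ri--]` notation literally. The function name `fetch_operand` and its parameters are assumptions.

```python
def fetch_operand(mode, regs, mem, i=None, j=None, v=0, d=1):
    """Return the operand selected by `mode`; auto modes also update regs[i]."""
    if mode == "register":            # 1. Ri
        return regs[i]
    if mode == "displacement":        # 2. M[Ri + v]
        return mem[regs[i] + v]
    if mode == "immediate":           # 3. v
        return v
    if mode == "register_indirect":   # 4. M[Ri]
        return mem[regs[i]]
    if mode == "direct":              # 5. M[v]
        return mem[v]
    if mode == "base_index":          # 6. M[Ri + Rj]
        return mem[regs[i] + regs[j]]
    if mode == "scaled_index":        # 7. M[Ri + Rj*d + v]
        return mem[regs[i] + regs[j] * d + v]
    if mode == "autoincrement":       # 8. M[Ri++]: use Ri, then step it up
        value = mem[regs[i]]
        regs[i] += 1
        return value
    if mode == "autodecrement":       # 9. M[Ri--] as written on the slide
        value = mem[regs[i]]
        regs[i] -= 1
        return value
    if mode == "memory_indirect":     # 10. M[M[Ri]]
        return mem[mem[regs[i]]]
    raise ValueError(f"unknown mode {mode!r}")
```

With `regs = {1: 100, 2: 4}` and `mem = {100: 7, 104: 9}`, the displacement mode with `i=1, v=4` and the base + index mode with `i=1, j=2` both read address 104.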

(25)

Addressing Mode Usage

♦ 3 programs measured on machine with all address modes (VAX)

--- Displacement: 42% avg, 32% to 55%

--- Immediate: 33% avg, 17% to 43%

--- Register deferred (indirect): 13% avg, 3% to 24%

--- Scaled: 7% avg, 0% to 16%

--- Memory indirect: 3% avg, 1% to 6%

--- Misc: 2% avg, 0% to 3%

♦ 75% displacement & immediate

♦ 88% displacement, immediate & register indirect


(26)

Displacement Address Size?

[Figure: histogram of displacement size (x-axis: Address Bits, 0 to 15; y-axis: 0% to 30%), Int. Avg. vs. FP Avg.]

° Avg. of 5 SPECint92 programs v. avg. 5 SPECfp92 programs

° 1% of addresses > 16-bits

° 12 - 16 bits of displacement needed

(27)

Immediate Size?

♦ 50% to 60% fit within 8 bits

♦ 75% to 80% fit within 16 bits

(28)

Addressing Summary

♦ Data Addressing modes that are important:

■ Displacement, Immediate, Register Indirect

♦ Displacement size should be 12 to 16 bits

♦ Immediate size should be 8 to 16 bits

(29)

Generic Example of Instruction Format Widths

[Figure: example instruction widths for three encoding styles: Variable, Fixed, and Hybrid]

(30)

Instruction Formats

♦ If code size is most important, use variable length instructions

♦ If performance is most important, use fixed length instructions

♦ Recent embedded machines (ARM, MIPS) added an optional mode to execute a subset of 16-bit wide instructions (Thumb, MIPS16); per procedure, decide performance or density

♦ Some architectures are actually exploring on-the-fly decompression for more density.

(31)

Instruction Format

♦ If there are many memory operands per instruction and/or many addressing modes:

=> Need one address specifier per operand

♦ If it is a load-store machine with 1 address per instruction and one or two addressing modes:

=> Can encode addressing mode in the opcode

(32)

MIPS Addressing Modes/Instruction Formats

• All instructions 32 bits wide

Register (direct)   op | rs | rt | rd      operand is in a register
Immediate           op | rs | rt | immed   operand is immed
Base+index          op | rs | rt | immed   operand is Memory[register + immed]
PC-relative         op | rs | rt | immed   target is Memory[PC + immed]

Register Indirect?

(33)

Most Popular ISA of all time: Intel 80x86

1971: Intel invents microprocessor 4004/8008, 8080 in 1975

1975: Gordon Moore realized one more chance for new ISA before ISA locked in for decades

hired CS people in Oregon

weren’t ready in 1977 (CS people did 432 in 1980)

started crash effort for 16-bit microcomputer

1978: 8086 dedicated registers, segmented address, 16 bit

8088; 8-bit external bus version of 8086

1980: IBM selects 8088 as basis for IBM PC

1980: 8087 floating point coprocessor: adds 60 instructions using hybrid stack/register scheme

1982: 80286 24-bit address, protection, memory mapping

(34)

Intel x86 (IA-32)

1989: 80486 & Pentium in 1992: faster + MP few instructions

1997: MMX multimedia extensions

200X: Superseded by IA-64 (Merced, McKinley, Itanium, etc.)

“Difficult to explain and impossible to love”

See H&P Appendix D.8

Eight 32-bit registers (EAX, EBX, ..., but also ESP, EBP)

Also 16- and 8-bit version (AX, AH, AL)

Most instructions have two operands, one possibly from memory

One super-duper addressing mode w/ effective address = base_reg + (index_reg * scaling_factor) + displacement

Many formats: see H&P fig. D.8

(35)

Intel MMX

♦ MultiMedia eXtension to IA-32 [Peleg & Weiser, IEEE Micro, 8/96]

■ Multimedia data values often need much less than 32 bits

■ But are organized in groups (e.g. red/green/blue)

■ So in 64-bit FP registers: 2x32, 4x16, 8x8

♦ E.g. ADDB (for byte)

■ 17 87 100 ... 6 more

■ +17 13 200 ... 6 more

■ --- --- --- ... ---

■ 34 100 255 ... 6 more

♦ MMX takes 16-element dot product ($a_0 \cdot b_0 + a_1 \cdot b_1 + \dots + a_{15} \cdot b_{15}$)
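The ADDB example on this slide can be sketched in software. The 100 + 200 = 255 column suggests unsigned saturation, so the Python below assumes a saturating byte add; it loops over the lanes, whereas the hardware adds all eight at once. The names `pack`, `unpack`, and `add_bytes_saturating` are hypothetical helpers, not MMX mnemonics.

```python
def pack(values, bits=8):
    """Pack small unsigned ints into one integer, element 0 in the low bits."""
    mask = (1 << bits) - 1
    word = 0
    for lane, value in enumerate(values):
        word |= (value & mask) << (lane * bits)
    return word


def unpack(word, lanes=8, bits=8):
    """Split a packed integer back into its lanes."""
    mask = (1 << bits) - 1
    return [(word >> (lane * bits)) & mask for lane in range(lanes)]


def add_bytes_saturating(a, b, lanes=8, bits=8):
    """Add lane by lane, clamping each sum at the largest lane value."""
    limit = (1 << bits) - 1
    sums = [min(x + y, limit)
            for x, y in zip(unpack(a, lanes, bits), unpack(b, lanes, bits))]
    return pack(sums, bits)
```

Clamping keeps an overflowing lane at 255 instead of wrapping around to a small value, which is the behavior the third column of the slide's example shows.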

(37)

Control Instructions

                       Taken or     Where is      Link return   Save or
                       not taken?   the target?   address       restore state
Conditional branches   X            X
Jumps                               X
Procedure calls                     X             X             X
Procedure returns                   X                           X
O.S. calls                          X             X             X
O.S. returns                        X                           X

(38)

(1) Taken or not taken?

♦ Compare and branch instruction
+ No extra compare instruction
+ No state passed between instructions
- Requires ALU operation
- Restricts code scheduling opportunities
♦ Implicitly set condition codes (Z, N, V, C)
+ Can be set “for free”
- Constrains code reordering
- Extra state to save and restore
♦ Explicitly set condition codes (Z, N, V, C)
+ Can be set “for free”
+ Decouples branch/fetch from pipeline
- Extra state to save and restore

(39)

(1) Taken or not taken? (cont.)

♦ Condition in general-purpose register

+ No special state to save and implement, but uses up a register

- Branch condition separated from branch logic in pipeline

♦ Some data for MIPS

■ > 80 % of compares for branches use immediates

■ > 80 % of these immediates are zero

■ 50 % compares for branches are =0 or != 0

♦ Compromise used in MIPS

■ Have branch-if = 0 and branch-if != 0

■ Have compare instructions (r1=r2, r1 != r2, r1 < r2, r1 <= r2, etc.)

With pipelining, can we predict whether taken ?

(40)

(2) Where is the target?

♦ Could use Arbitrary Specifier?
+ Orthogonal and powerful
- More bits to specify, more time to decode
- Branch execution and target separated in pipeline
♦ PC-relative with immediate
+ Position independence (helps linking), target computable in branch unit
+ Short immediate sufficient. MIPS word immediate:
    <= 4 bits: 47 %
    <= 8 bits: 94 %
    <= 12 bits: 100 %
- Target must be known statically (to link)
- Can’t jump arbitrarily far
- Other techniques are required for returns and distant jumps

(41)

(2) Where is the target? (cont.)

♦ Register
+ Short specification
+ Can jump anywhere
+ Dynamic target okay (returns)
- Extra instruction to load register
♦ (Vectored) Trap
■ Critical for O.S. calls
+ Protection
- Implementation headache

♦ Common compromise

■ (Conditional) branches (pc-rel)

■ (Unconditional) jumps (pc-rel, reg)

■ Procedure calls (pc-rel, reg)

■ Procedure returns (reg)

■ O.S. calls (trap)

■ O.S. returns (reg)

(42)

(3) Link return address?

♦ Required for procedure calls and O.S. calls
♦ Implicit register
+ Fast, simple
- SW must save register before next call
- Surprise traps or interrupts?
♦ Explicit register
- No important advantages over above
- Register must be specified
♦ Processor stack
+ Recursion supported directly
- Complex instruction
♦ Many recent architectures use implicit register

(43)

(4) Save or restore state?

♦ What state ?

■ Procedure calls: registers

■ O.S. calls: registers and PSW (incl. CCs)

♦ Hardware need not save registers

■ Caller can save registers in use

■ Callee can save registers it will use

♦ Hardware register save

■ Which (IBM STM, VAX CALLS) ?

■ Is the above faster ?

■ Register windows

(44)

MIPS: Register State

♦ 32 integer registers
■ $0 is hardwired to 0
■ $31 is the return address register
■ software convention for other registers
♦ 32 single-precision FP registers or 16 double-precision FP registers
♦ PC and other special registers

(45)

MIPS I Operation Overview

♦ Arithmetic Logical:

■ Add, AddU, Sub, SubU, And, Or, Xor, Nor, SLT, SLTU

■ AddI, AddIU, SLTI, SLTIU, AndI, OrI, XorI, LUI

■ SLL, SRL, SRA, SLLV, SRLV, SRAV

♦ Memory Access:

■ LB, LBU, LH, LHU, LW, LWL,LWR

■ SB, SH, SW, SWL, SWR

(46)

Multiply / Divide

♦ Start multiply, divide

■ MULT rs, rt

■ MULTU rs, rt

■ DIV rs, rt

■ DIVU rs, rt

♦ Move result from multiply, divide

■ MFHI rd

■ MFLO rd

♦ Move to HI or LO

■ MTHI rd

■ MTLO rd

♦ Why not third field for destination?

■ (Hint: how many clock cycles for multiply or divide vs. add?)

[Figure: the general Registers alongside the separate HI and LO registers]

(47)

Data Types

♦ Bit: 0, 1
♦ Bit String: sequence of bits of a particular length
■ 4 bits is a nibble
■ 8 bits is a byte
■ 16 bits is a half-word
■ 32 bits is a word
■ 64 bits is a double-word
♦ Character:
■ ASCII 7 bit code
■ UNICODE 16 bit code
♦ Decimal:
■ digits 0-9 encoded as 0000b thru 1001b
■ two decimal digits packed per 8 bit byte
♦ Integers:
■ 2's Complement
♦ Floating Point:
■ exponent
■ How many +/- #'s?

(48)

Operand Size Usage

Frequency of reference by size:

Size         Int Avg.   FP Avg.
Byte         7%         0%
Halfword     19%        0%
Word         74%        31%
Doubleword   0%         69%

Support for these data sizes and types: 8-bit, 16-bit, 32-bit integers and 32-bit and 64-bit IEEE 754 floating point numbers

(49)

MIPS arithmetic instructions

Instruction         Example           Meaning                         Comments
add                 add $1,$2,$3      $1 = $2 + $3                    3 operands; exception possible
subtract            sub $1,$2,$3      $1 = $2 – $3                    3 operands; exception possible
add immediate       addi $1,$2,100    $1 = $2 + 100                   + constant; exception possible
add unsigned        addu $1,$2,$3     $1 = $2 + $3                    3 operands; no exceptions
subtract unsigned   subu $1,$2,$3     $1 = $2 – $3                    3 operands; no exceptions
add imm. unsign.    addiu $1,$2,100   $1 = $2 + 100                   + constant; no exceptions
multiply            mult $2,$3        Hi, Lo = $2 x $3                64-bit signed product
multiply unsigned   multu $2,$3       Hi, Lo = $2 x $3                64-bit unsigned product
divide              div $2,$3         Lo = $2 ÷ $3, Hi = $2 mod $3    Lo = quotient, Hi = remainder
divide unsigned     divu $2,$3        Lo = $2 ÷ $3, Hi = $2 mod $3    Unsigned quotient & remainder
Move from Hi        mfhi $1           $1 = Hi                         Used to get copy of Hi
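The Hi/Lo split for `mult` and `multu` can be illustrated with a short sketch. The Python below models only the arithmetic result (a 64-bit product divided into two 32-bit halves), not timing or the register file; the helper names `to_signed32` and `split64` are assumptions.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(x):
    """Interpret the low 32 bits of x as a two's complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def split64(product):
    """Split a 64-bit product into (hi, lo) 32-bit halves."""
    product &= (1 << 64) - 1
    return (product >> 32) & MASK32, product & MASK32


def mult(rs, rt):
    """Signed 32 x 32 -> 64-bit multiply, returned as (Hi, Lo)."""
    return split64(to_signed32(rs) * to_signed32(rt))


def multu(rs, rt):
    """Unsigned 32 x 32 -> 64-bit multiply, returned as (Hi, Lo)."""
    return split64((rs & MASK32) * (rt & MASK32))
```

The same bit pattern gives different Hi values under the two instructions: `0xFFFFFFFF` is -1 for `mult` but 4294967295 for `multu`.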

(50)

MIPS logical instructions

Instruction           Example           Meaning           Comment
and                   and $1,$2,$3      $1 = $2 & $3      3 reg. operands; Logical AND
or                    or $1,$2,$3       $1 = $2 | $3      3 reg. operands; Logical OR
xor                   xor $1,$2,$3      $1 = $2 ⊕ $3      3 reg. operands; Logical XOR
nor                   nor $1,$2,$3      $1 = ~($2 | $3)   3 reg. operands; Logical NOR
and immediate         andi $1,$2,10     $1 = $2 & 10      Logical AND reg, constant
or immediate          ori $1,$2,10      $1 = $2 | 10      Logical OR reg, constant
xor immediate         xori $1,$2,10     $1 = $2 ⊕ 10      Logical XOR reg, constant
shift left logical    sll $1,$2,10      $1 = $2 << 10     Shift left by constant
shift right logical   srl $1,$2,10      $1 = $2 >> 10     Shift right by constant
shift right arithm.   sra $1,$2,10      $1 = $2 >> 10     Shift right (sign extend)
shift left logical    sllv $1,$2,$3     $1 = $2 << $3     Shift left by variable
shift right logical   srlv $1,$2,$3     $1 = $2 >> $3     Shift right by variable
shift right arithm.   srav $1,$2,$3     $1 = $2 >> $3     Shift right arith. by variable

(51)

MIPS data transfer instructions

Instruction        Comment
SW 500(R4), R3     Store word
SH 502(R2), R3     Store half
SB 41(R3), R2      Store byte
LW R1, 30(R2)      Load word
LH R1, 40(R3)      Load halfword
LHU R1, 40(R3)     Load halfword unsigned
LB R1, 40(R3)      Load byte
LBU R1, 40(R3)     Load byte unsigned
LUI R1, 40         Load Upper Immediate (16 bits shifted left by 16)

(52)

When does MIPS sign extend?

♦ When value is sign extended, copy upper bit to full value:

Examples of sign extending 8 bits to 16 bits:

00001010 ⇒ 00000000 00001010
10001100 ⇒ 11111111 10001100

♦ When is an immediate value sign extended?

■ Arithmetic instructions (add, sub, etc.) sign extend immediates even for the unsigned versions of the instructions!

■ Logical instructions do not sign extend

♦ Load/Store half or byte do sign extend, but unsigned versions do not.
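The two 8-bit to 16-bit examples can be reproduced with a small sketch. The function names below are assumptions; the code illustrates the bit manipulation and is not a description of the MIPS datapath.

```python
def sign_extend(value, from_bits, to_bits):
    """Copy the top bit of a from_bits-wide value into all the upper bits."""
    sign_bit = 1 << (from_bits - 1)
    value &= (1 << from_bits) - 1
    extended = (value ^ sign_bit) - sign_bit     # negative if sign bit was set
    return extended & ((1 << to_bits) - 1)


def zero_extend(value, from_bits):
    """Keep the low from_bits bits and fill the upper bits with zeros."""
    return value & ((1 << from_bits) - 1)
```

`sign_extend` matches what the slide attributes to arithmetic immediates and to `lb`/`lh`; `zero_extend` matches logical immediates and `lbu`/`lhu`.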

(53)

Methods of Testing Condition

♦ Condition Codes

Processor status bits are set as a side-effect of arithmetic instructions (possibly on Moves) or explicitly by compare or test instructions.

ex: add r1, r2, r3
    bz label

♦ Condition Register

ex: cmp r1, r2, r3
    bgt r1, label

♦ Compare and Branch

ex: bgt r1, r2, label

(54)

Conditional Branch Distance

[Figure: histogram of conditional branch distance (x-axis: Bits of Branch Displacement, 0 to 15; y-axis: 0% to 40%), Int. Avg. vs. FP Avg.]

• 25% of integer branches are 2 to 4 instructions

(55)

Conditional Branch Addressing

♦ PC-relative since most branches are relatively close to the current PC

♦ At least 8 bits suggested (±128 instructions)

♦ Compare Equal/Not Equal most important for integer programs (86%)

Comparison   Int Avg.   FP Avg.
EQ/NE        86%        37%
GT/LE        7%         23%
LT/GE        7%         40%

(56)

MIPS Compare and Branch

♦ Compare and Branch
■ BEQ rs, rt, offset    if R[rs] == R[rt] then PC-relative branch
■ BNE rs, rt, offset    if R[rs] != R[rt] then PC-relative branch

♦ Compare to zero and Branch
■ BLEZ rs, offset       if R[rs] <= 0 then PC-relative branch
■ BGTZ rs, offset       if R[rs] > 0 then PC-relative branch
■ BLTZ rs, offset       if R[rs] < 0 then PC-relative branch
■ BGEZ rs, offset       if R[rs] >= 0 then PC-relative branch
■ BLTZAL rs, offset     if R[rs] < 0 then branch and link (into R31)
■ BGEZAL rs, offset     if R[rs] >= 0 then branch and link (into R31)

♦ Remaining set of compare and branch ops take two instructions

♦ Almost all comparisons are against zero!

(57)

MIPS jump, branch, compare instructions

Instruction           Example           Meaning                          Comments
branch on equal       beq $1,$2,100     if ($1 == $2) go to PC+4+100     Equal test; PC relative branch
branch on not eq.     bne $1,$2,100     if ($1 != $2) go to PC+4+100     Not equal test; PC relative
set on less than      slt $1,$2,$3      if ($2 < $3) $1=1; else $1=0     Compare less than; 2’s comp.
set less than imm.    slti $1,$2,100    if ($2 < 100) $1=1; else $1=0    Compare < constant; 2’s comp.
set less than uns.    sltu $1,$2,$3     if ($2 < $3) $1=1; else $1=0     Compare less than; unsigned numbers
set l. t. imm. uns.   sltiu $1,$2,100   if ($2 < 100) $1=1; else $1=0    Compare < constant; unsigned numbers
jump                  j 10000           go to 10000                      Jump to target address

(58)

Signed vs. Unsigned Comparison

R1 = 0…00 0000 0000 0000 0001 (binary)
R2 = 0…00 0000 0000 0000 0010 (binary)
R3 = 1…11 1111 1111 1111 1111 (binary)

♦ After executing these instructions:

slt  r4,r2,r1 ; if (r2 < r1) r4=1; else r4=0
slt  r5,r3,r1 ; if (r3 < r1) r5=1; else r5=0
sltu r6,r2,r1 ; if (r2 < r1) r6=1; else r6=0
sltu r7,r3,r1 ; if (r3 < r1) r7=1; else r7=0

♦ What are values of registers r4 - r7? Why?

r4 = ; r5 = ; r6 = ; r7 = ;
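One way to check an answer to the question above is to model the two comparisons directly. The Python below is a sketch under the assumption of 32-bit registers; `to_signed32`, `slt`, and `sltu` are illustrative helpers that mirror the slide's pseudocode.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(x):
    """Interpret the low 32 bits of x as a two's complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def slt(a, b):
    """Set on less than, treating both operands as signed."""
    return int(to_signed32(a) < to_signed32(b))


def sltu(a, b):
    """Set on less than, treating both operands as unsigned."""
    return int((a & MASK32) < (b & MASK32))


r1, r2, r3 = 0x00000001, 0x00000002, 0xFFFFFFFF
r4 = slt(r2, r1)    # 2 < 1 is false
r5 = slt(r3, r1)    # signed: all ones is -1, and -1 < 1 is true
r6 = sltu(r2, r1)   # 2 < 1 is false
r7 = sltu(r3, r1)   # unsigned: all ones is the largest value, so false
```

Only r5 is set: the all-ones pattern in R3 is -1 when read as signed but the largest 32-bit value when read as unsigned.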

(59)

Calls: Why Are Stacks So Great?

Stacking of Subroutine Calls & Returns and Environments:

A:  CALL B
B:  CALL C
    RET
C:  RET

Stack contents over time: A → A B → A B C → A B → A

Some machines provide a memory stack as part of the architecture (e.g., VAX)

(60)

Memory Stacks

Useful for stacked environments/subroutine call & return even if operand stack not part of architecture

Stacks that Grow Up vs. Stacks that Grow Down:

[Figure: memory addresses from 0 (Little) to inf. (Big) holding items a, b, c, with SP marking the top; one stack grows up and the other grows down]

♦ Does SP point to the Next Empty slot or the Last Full slot?

♦ How is empty stack represented?

Little --> Big/Last Full
POP:  Read from Mem(SP); Decrement SP
PUSH: Increment SP; Write to Mem(SP)

Little --> Big/Next Empty
POP:  Decrement SP; Read from Mem(SP)
PUSH: Write to Mem(SP); Increment SP
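The two stack-pointer conventions can be sketched as follows for a stack that grows from little addresses toward big ones. The class names and the list-based memory are assumptions for illustration, and neither class checks for overflow or underflow.

```python
class LastFullStack:
    """SP points at the last full slot."""

    def __init__(self, size):
        self.mem = [None] * size
        self.sp = -1                  # empty stack: SP sits one below slot 0

    def push(self, value):
        self.sp += 1                  # Increment SP
        self.mem[self.sp] = value     # Write to Mem(SP)

    def pop(self):
        value = self.mem[self.sp]     # Read from Mem(SP)
        self.sp -= 1                  # Decrement SP
        return value

    def is_empty(self):
        return self.sp == -1


class NextEmptyStack:
    """SP points at the next empty slot."""

    def __init__(self, size):
        self.mem = [None] * size
        self.sp = 0                   # empty stack: SP points at slot 0

    def push(self, value):
        self.mem[self.sp] = value     # Write to Mem(SP)
        self.sp += 1                  # Increment SP

    def pop(self):
        self.sp -= 1                  # Decrement SP
        return self.mem[self.sp]      # Read from Mem(SP)

    def is_empty(self):
        return self.sp == 0
```

The two classes behave identically from the outside; they differ in the order of the SP update and the memory access, and in how an empty stack is represented.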

(61)

Call-Return Linkage: Stack Frames

[Figure: a stack frame between High Mem and Low Mem; from FP down to SP it holds ARGS, Callee Save Registers (old FP, RA), and Local Variables]

♦ Reference args and local variables at fixed (positive) offset from FP

♦ The area at SP grows and shrinks during expression evaluation

♦ Many variations on stacks possible (up/down, last pushed / next )

(62)

MIPS: Software conventions for Registers

0       zero    constant 0
1       at      reserved for assembler
2-3     v0-v1   expression evaluation & function results
4-7     a0-a3   arguments
8-15    t0-t7   temporary: caller saves (callee can clobber)
16-23   s0-s7   callee saves (callee must save)
24-25   t8-t9   temporary (cont’d)
26-27   k0-k1   reserved for OS kernel
28      gp      Pointer to global area
29      sp      Stack pointer
30      fp      frame pointer
31      ra      Return Address (HW)

(63)

MIPS / GCC Calling Conventions

fact:
    addiu $sp, $sp, -32
    sw    $ra, 20($sp)
    sw    $fp, 16($sp)
    addiu $fp, $sp, 32
    . . .
    sw    $a0, 0($fp)
    ...
    lw    $31, 20($sp)
    lw    $fp, 16($sp)
    addiu $sp, $sp, 32

[Figure: three stack snapshots showing FP and SP on entry, after ra and the old FP are saved in the new frame, and after the frame is released; low addresses at the bottom]

(64)

Details of the MIPS instruction set

Register zero always has the value zero (even if you try to write it)

Branch/jump and link put the return addr. PC+4 or 8 into the link register (R31) (depends on logical vs physical architecture)

All instructions change all 32 bits of the destination register (including lui, lb, lh) and all read all 32 bits of sources (add, sub, and, or, …)

Immediate arithmetic and logical instructions are extended as follows:

logical immediates ops are zero extended to 32 bits

arithmetic immediates ops are sign extended to 32 bits (including addu)

The data loaded by the instructions lb and lh are extended as follows:

lbu, lhu are zero extended

lb, lh are sign extended

Overflow can occur in these arithmetic and logical instructions:

add, sub, addi

it cannot occur in addu, subu, addiu, and, or, xor, nor, shifts, mult, multu, div, divu

(65)

Delayed Branches

♦ In the “Raw” MIPS, the instruction after the branch is executed even when the branch is taken

■ This is hidden by the assembler for the MIPS “virtual machine”

    li   r3, #7
    sub  r4, r4, 1
    bz   r4, LL
    addi r5, r3, 1
    subi r6, r6, 2
LL: slt  r1, r3, r5

(66)

Branch & Pipelines

                     Time →
li   r3, #7          ifetch  execute
sub  r4, r4, 1               ifetch  execute
bz   r4, LL                          ifetch  execute                    (Branch)
addi r5, r3, 1                               ifetch  execute            (Delay Slot)
LL: slt r1, r3, r5                                   ifetch  execute    (Branch Target)

By the end of Branch instruction, the CPU knows whether or not the branch will take place.

However, it will have fetched the next instruction by then, regardless of whether or not a branch will be taken.

Why not execute it?

(67)

Filling Delayed Branches

Branch:                      IF   DEC & OP fetch   Execute
Successor (delay slot):           IF               DEC & OP fetch   Execute
Branch target or continue:                         IF

♦ Execute successor even if branch taken! Then branch target or continue

♦ Single delay slot impacts the critical path

• Compiler can fill a single delay slot with a useful instruction 50% of the time.
■ try to move down from above jump
■ move up from target, if safe

    add r3, r1, r2
    sub r4, r4, 1
    bz  r4, LL
    NOP
    ...

(68)

Miscellaneous MIPS I instructions

break                    A breakpoint trap occurs, transfers control to exception handler
syscall                  A system trap occurs, transfers control to exception handler
coprocessor instrs.      Support for floating point
TLB instructions         Support for virtual memory: discussed later
restore from exception   Restores previous interrupt mask & kernel/user mode bits into status register
load word left/right     Supports misaligned word loads
store word left/right    Supports misaligned word stores

(69)

MIPS: Instruction Set Format

■ load/store architecture with 3 explicit operands (ALU ops)

■ fixed 32-bit instructions

■ 3 instruction formats

» R-Type

» I-Type

» J-Type

■ 6 instruction set groups:

» load/store - data movement operations

» computational - arithmetic, logical, and shift operations

» jump/branch - including call and returns

» coprocessor - FP instructions

» coprocessor0 - memory management and exception handling

(70)

R2000/3000 Instruction Formats

♦ R-type (register)

e.g. add $8, $17, $18 # $8 = $17 + $18

Field:   OpCode   rs      rt      rd      shamt   funct
Bits:    31-26    25-21   20-16   15-11   10-6    5-0
Value:   0        17      18      8       0       32
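The R-type field layout can be turned into a 32-bit word with shifts and ORs. The sketch below assumes the bit positions shown on the slide; the function names `encode_r` and `decode_r` are hypothetical.

```python
def encode_r(opcode, rs, rt, rd, shamt, funct):
    """Pack the six R-type fields into one 32-bit instruction word."""
    return ((opcode << 26) | (rs << 21) | (rt << 16)
            | (rd << 11) | (shamt << 6) | funct)


def decode_r(word):
    """Split a 32-bit instruction word back into its R-type fields."""
    return {
        "opcode": (word >> 26) & 0x3F,   # bits 31-26
        "rs": (word >> 21) & 0x1F,       # bits 25-21
        "rt": (word >> 16) & 0x1F,       # bits 20-16
        "rd": (word >> 11) & 0x1F,       # bits 15-11
        "shamt": (word >> 6) & 0x1F,     # bits 10-6
        "funct": word & 0x3F,            # bits 5-0
    }


# add $8, $17, $18 with the field values from the slide
ADD_WORD = encode_r(opcode=0, rs=17, rt=18, rd=8, shamt=0, funct=32)
```

Printing `hex(ADD_WORD)` shows the machine word for the example, and `decode_r(ADD_WORD)` recovers the six field values.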

(71)

R2000/3000 Instruction Formats

♦ I-type (immediate)

e.g. addi $8, $17, -44    # $8 = $17 - 44
     lw   $8, -44($17)    # $8 = M[$17 - 44]
     beq  $17, $8, label  # if ($8 == $17) go to label

Field:   OpCode   rs      rt      immediate
Bits:    31-26    25-21   20-16   15-0
Value:   “op”     17      8       -44
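The I-type layout can be sketched the same way. The slide leaves the opcode as “op”; the values 8 for `addi` and 35 for `lw` used below are the standard MIPS I opcodes and are supplied here as an addition to the slide. The function names are hypothetical.

```python
def encode_i(opcode, rs, rt, immediate):
    """Pack the I-type fields; the immediate is stored in 16 bits."""
    if not -(1 << 15) <= immediate < (1 << 16):
        raise ValueError("immediate does not fit in 16 bits")
    return (opcode << 26) | (rs << 21) | (rt << 16) | (immediate & 0xFFFF)


def decode_signed_immediate(word):
    """Recover the 16-bit immediate as a two's complement value."""
    imm = word & 0xFFFF
    return imm - (1 << 16) if imm & 0x8000 else imm


ADDI_OPCODE = 8    # standard MIPS I opcode for addi (not given on the slide)
LW_OPCODE = 35     # standard MIPS I opcode for lw (not given on the slide)

ADDI_WORD = encode_i(ADDI_OPCODE, rs=17, rt=8, immediate=-44)  # addi $8, $17, -44
LW_WORD = encode_i(LW_OPCODE, rs=17, rt=8, immediate=-44)      # lw $8, -44($17)
```

The two instructions differ only in the opcode field; the negative immediate is stored as the 16-bit pattern 0xFFD4 and sign extended when used.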

(72)

♦ J-type (jump)

e.g. jal label # call label; $31 = $pc + 8

Field:  OpCode   target
Bits:   31-26    25-0
Value:  3        -44
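The J-type layout can be sketched the same way. In MIPS the 26-bit target field holds a word address (the byte address shifted right by two), and opcode 3 is `jal`. The function name `encode_j_type` and the address `0x00400020` are hypothetical values chosen only for illustration.

```python
JAL = 3  # opcode value shown on the slide; in MIPS I this is jal (jump and link)


def encode_j_type(opcode, target_address):
    """Pack an opcode and a word-aligned byte address into a J-type word."""
    if target_address % 4 != 0:
        raise ValueError("jump targets must be word aligned")
    # Drop the two always-zero low bits and keep 26 bits of the word address.
    target = (target_address >> 2) & 0x03FFFFFF
    return (opcode << 26) | target


# Hypothetical call to a routine located at byte address 0x00400020.
print(f"{encode_j_type(JAL, 0x00400020):#010x}")  # 0x0c100008
```

The upper four bits of the destination are not encoded; the hardware takes them from the program counter, so a jump stays within the current 256 MB region.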


(73)

Bhandarkar and Clark: RISC vs. CISC

♦ Compares the VAX 8700 with the MIPS M/2000 (R3000 chip)

♦ Combines three factors:

■ Architecture

■ Implementation

■ Compilers and OS

♦ Argues that:

■ Implementation effects are second order

■ Compilers are “similar”

■ RISCs are better than CISCs

(74)

Bhandarkar and Clark, cont.

RISC factor:

$$\text{RISC factor} = \frac{\text{CPI}_{\text{VAX}} / \text{CPI}_{\text{MIPS}}}{\text{Instr}_{\text{MIPS}} / \text{Instr}_{\text{VAX}}}$$

Benchmark   Inst. ratio   CPI MIPS   CPI VAX   CPI ratio   RISC factor
spice2g6    2.5           1.8        8.0       4.4         1.8
matrix300   2.4           3.0        13.8      4.5         1.9
nasa7       2.1           3.0        15.0      5.0         2.4
fpppp       2.9           1.5        15.2      10.5        2.7
tomcatv     2.9           2.1        17.5      8.2         2.9
doduc       2.7           1.7        13.2      7.9         3.0
espresso    1.7           1.1        5.4       5.1         3.0
eqntott     1.1           1.3        4.4       3.5         3.3
li          1.6           1.1        6.5       6.0         3.7
geo. mean   2.2           1.7        9.9       5.8         2.7
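Reading the table columns, the RISC factor appears to be the CPI ratio (VAX over MIPS) divided by the instruction ratio (MIPS over VAX). The sketch below recomputes it for two rows under that assumption; the function name `risc_factor` is illustrative, and small differences from the listed values are expected because the slide's inputs are already rounded.

```python
def risc_factor(cpi_vax, cpi_mips, instr_ratio):
    """Net cycle-count advantage of MIPS: CPI ratio divided by instruction ratio."""
    cpi_ratio = cpi_vax / cpi_mips
    return cpi_ratio / instr_ratio


# (instruction ratio, CPI MIPS, CPI VAX) taken from two rows of the table.
rows = {
    "spice2g6": (2.5, 1.8, 8.0),
    "matrix300": (2.4, 3.0, 13.8),
}

for name, (instr_ratio, cpi_mips, cpi_vax) in rows.items():
    print(f"{name}: {risc_factor(cpi_vax, cpi_mips, instr_ratio):.2f}")
```

A factor above 1 means the lower CPI on MIPS more than offsets the larger number of instructions it executes.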

(75)

Bhandarkar and Clark, cont.

♦ Compensation Factors

■ Increase VAX CPI but decrease VAX instruction count

■ Increase MIPS instruction count

■ Example 1: Loads and stores vs. operand specifiers

■ Example 2: Necessary complex operations, e.g. loop branches

♦ Factors favoring VAX

■ Big immediate values

■ Not-taken branches incur no delay

(76)

Bhandarkar and Clark, cont.

♦ Factors favoring MIPS

■ Operand specifier decoding

■ Number of registers

■ Separating floating point unit

■ Simple jumps and branches (lower latency)

■ Fancy VAX instructions: Unnecessary functionality

■ Instruction scheduling

■ Translation buffer

■ Branch displacement size

(77)

Homework Assignment 3

♦ Review the ISA of a specific processor

♦ Write a three-page report explaining the following:

1. Kind of ISA (RISC, CISC, or other; 16-bit, 32-bit, 64-bit)

2. Classes of instructions (ALU, memory movement, branches, etc.)

3. Addressing modes (immediate, base+displacement, indirect, etc.)

4. Displacements in branches and control flow instructions (call, ret)

5. Special instructions: system calls, traps, access to special-purpose registers

(78)

Hw 3: List of Processors

1 Claudia Méndez Garza: XScale or ARM

2 José Alberto Ramírez Uresti: AMD Opteron (64-bit)

3 Víctor Echeverría Ríos: Texas Instruments TMS320DM64x or another DSP

Due date: September 26th, 2008.

(79)

Supplementary slides not presented in class that may serve as support

(80)

VAX-11

♦ Introduced by DEC in 1977: VAX-11/780

♦ Upward compatible from PDP-11

♦ 32-bit “word” and addresses

♦ Virtual memory is first-class

♦ 16 GPRs (r15 is PC, r14 is SP), CCs

♦ Extremely orthogonal, memory-memory

♦ Decode as byte stream

■ Opcode: operation, number of operands & operand type

■ Variable-length address specifiers

(81)

VAX-11, cont.

♦ Data types

■ 8-, 16-, 32-, 64-, 128-bit integers

■ F (32 bits), D (64), G (64), H (128) FP

■ Character string (8-bits/char)

■ Decimal (4-bits/digit)

■ Numeric string (8-bits/digit)

♦ Addressing modes include

–Literal (6 bits)

–8-, 16-, 32-bit immediates

–Register, register deferred

–8-, 16-, 32-bit displacements

–Indexed

–Autoincrement

–Autodecrement

–Autoincrement deferred
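To illustrate "decode as byte stream", the sketch below classifies the first byte of a VAX operand specifier. It assumes the usual VAX encoding, which the slide does not spell out: the high nibble selects the mode, the low nibble names a register, and modes 0-3 form a 6-bit short literal. The names `MODES` and `decode_specifier` are illustrative.

```python
# Mode nibble -> addressing mode name (modes 0-3 are short literals, handled below).
MODES = {
    0x4: "indexed",
    0x5: "register",
    0x6: "register deferred",
    0x7: "autodecrement",
    0x8: "autoincrement",
    0x9: "autoincrement deferred",
    0xA: "byte displacement",
    0xB: "byte displacement deferred",
    0xC: "word displacement",
    0xD: "word displacement deferred",
    0xE: "longword displacement",
    0xF: "longword displacement deferred",
}


def decode_specifier(byte):
    """Return (mode name, operand) for the first byte of an operand specifier."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("specifier must be a single byte")
    mode = byte >> 4
    if mode < 4:
        # Top two bits are 00: the low six bits are the literal value itself.
        return ("literal", byte & 0x3F)
    return (MODES[mode], byte & 0x0F)


print(decode_specifier(0x2A))  # ('literal', 42)
print(decode_specifier(0x53))  # ('register', 3)
print(decode_specifier(0x8F))  # ('autoincrement', 15)
```

Autoincrement on r15 (the PC) is how immediates are expressed: the operand bytes follow the specifier in the instruction stream, so the decoder cannot find the next operand until it has parsed this one.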
