• No results found

Chapter8-Pipelining.ppt

N/A
N/A
Protected

Academic year: 2020

Share "Chapter8-Pipelining.ppt"

Copied!
34
0
0

Loading.... (view fulltext now)

Full text

(1)

Computer Architecture and

Computer Architecture and

Organization

Organization

Chapter 8

Chapter 8

(2)

What is Pipelining?

What is Pipelining?

Implementation technique whereby multiple

Implementation technique whereby multiple

instructions are overlapped in execution

instructions are overlapped in execution

Throughput of an instruction pipeline is

Throughput of an instruction pipeline is

determined by how often an instruction exists

determined by how often an instruction exists

the pipeline - The

the pipeline - The time per instructiontime per instruction does not does not change, only

change, only throughputthroughput increases increases

(3)

What is Pipelining?

What is Pipelining?

Potential speedup = Number pipe

Potential speedup = Number pipe

stages

stages

Pipeline rate limited by slowest pipeline

Pipeline rate limited by slowest pipeline

stage (

stage (lengthlength of the machine cycle ) of the machine cycle )

Unbalanced lengths of pipe stages

Unbalanced lengths of pipe stages

reduces speedup (involves overhead) -

reduces speedup (involves overhead) -

since the clock can run no faster than the

since the clock can run no faster than the

time needed for the slowest pipeline stage

(4)

What is Pipelining?

What is Pipelining?

Time to “fill” pipeline and time to “drain”

Time to “fill” pipeline and time to “drain”

(

(pipeline bubbles or no-operations

(NOPs), load delay) it reduces speedupit reduces speedup

Dependencies (data, procedural, resource

Dependencies (data, procedural, resource

conflicts, anti-dependency) stall the

conflicts, anti-dependency) stall the

pipeline -

pipeline - implies that there would be a

chain of one or more hazards between the two instructions, thereby reducing or

(5)

True data dependency

True data dependency

ADD r1, r2 (r1 := r1+r2;)

ADD r1, r2 (r1 := r1+r2;)

MOVE r3,r1 (r3 := r1;)

MOVE r3,r1 (r3 := r1;)

Can fetch and decode second instruction

Can fetch and decode second instruction

in parallel with first

in parallel with first

Can NOT execute second instruction until

Can NOT execute second instruction until

first is finished

(6)

Procedural Dependency

Procedural Dependency

Can not execute instructions after a

Can not execute instructions after a

branch in parallel with instructions before a

branch in parallel with instructions before a

branch

branch

Also, if instruction length is not fixed,

Also, if instruction length is not fixed,

instructions have to be decoded to find out

instructions have to be decoded to find out

how many fetches are needed

how many fetches are needed

This prevents simultaneous fetches

(7)

Resource Conflicts

Resource Conflicts

Two or more instructions requiring access

Two or more instructions requiring access

to the same resource at the same time

to the same resource at the same time

– e.g. two arithmetic instructionse.g. two arithmetic instructions

Can duplicate resources

Can duplicate resources

(8)

Anti-dependency

Anti-dependency

Write-write dependency

Write-write dependency

– R3:=R3 + R5; (I1)R3:=R3 + R5; (I1) – R4:=R3 + 1; (I2)R4:=R3 + 1; (I2) – R3:=R5 + 1; (I3)R3:=R5 + 1; (I3)R7:=R3 + R4; (I4)R7:=R3 + R4; (I4)

– I3 can not complete before I2 starts as I2 I3 can not complete before I2 starts as I2

needs a value in R3 and I3 changes R3

(9)

Dependencies

(10)

Extra incrementer to update the PC

Extra incrementer to update the PC

more often (instead of the ALU)

more often (instead of the ALU)

A separate MDR for loads (memory to

A separate MDR for loads (memory to

CPU) and stores (CPU to memory)

CPU) and stores (CPU to memory)

High memory bandwidth to

High memory bandwidth to

accommodate more data to and from

accommodate more data to and from

memory

memory

(11)

Stage

Stage

A

A stagestage of the pipeline is a portion of the of the pipeline is a portion of the pipeline execution where different phases

pipeline execution where different phases

in different instructions execute at the

in different instructions execute at the

same time.

same time.

Stages are connected one to the next to

Stages are connected one to the next to

form a pipe - instructions enter at one end,

form a pipe - instructions enter at one end,

progress through the stages, and exit at

progress through the stages, and exit at

the other end.

(12)

Depth

Depth

The

The depthdepth of a pipeline is the number of of a pipeline is the number of instructions it can overlap

(13)

Designer’s Goal

Designer’s Goal

To balance the length of each pipeline stage

To balance the length of each pipeline stage

For a perfectly balanced stage:

Ns = Number of pipe stages

Time per instruction on the pipelined machine Tip =

Tip =

Tiup = Time per instruction on unpipelined machine

(14)

Example 1

Example 1

Given: Consider an unpipelined machine

Given: Consider an unpipelined machine

with 6 execution stages of lengths 55 ns,

with 6 execution stages of lengths 55 ns,

55 ns, 60 ns, 60 ns, 55 ns, and 55 ns.

55 ns, 60 ns, 60 ns, 55 ns, and 55 ns.

Determine: Instruction latency on this

Determine: Instruction latency on this

machine;

machine;

How much time does it take to execute How much time does it take to execute 100 instructions?

(15)

Solution to Example 1

Solution to Example 1

Instruction latency

Instruction latency = 55+55+60+60+55+55= 340 ns= 55+55+60+60+55+55= 340 ns

Time to execute 100 instructions

Time to execute 100 instructions = 100*340 = 100*340 = 34,000 ns

(16)

Example 2

Example 2

Given: Suppose we introduce pipelining on

Given: Suppose we introduce pipelining on

this machine. Assume that when

this machine. Assume that when

introducing pipelining, the clock skew adds

introducing pipelining, the clock skew adds

5ns of overhead to each execution stage

5ns of overhead to each execution stage

Determine: instruction latency on the

Determine: instruction latency on the

pipelined machine

pipelined machine

How much time does it take to execute How much time does it take to execute 100 instructions?

(17)

Solution to Example 2

Solution to Example 2

Remember that in the pipelined implementation, the

Remember that in the pipelined implementation, the

length of the pipe stages must all be the same,

length of the pipe stages must all be the same,

i.e., the speed of the slowest stage plus overhead.

i.e., the speed of the slowest stage plus overhead.

With 10 ns overhead it comes to:

With 10 ns overhead it comes to:

The length of pipelined stage

The length of pipelined stage = MAX(lengths of = MAX(lengths of unpipelined stages) + overhead = 60 + 10 = 70 ns

unpipelined stages) + overhead = 60 + 10 = 70 ns

Instruction latency

Instruction latency = 70 ns= 70 ns

Time to execute 100 instructionsTime to execute 100 instructions = 70*6*1 + = 70*6*1 + 70*1*99 = 420 + 6930 = 7350 ns

(18)

What is the speedup obtained from

What is the speedup obtained from

pipelining?

pipelining?

Solution:

Solution: not considering any not considering any stallsstalls introduced by introduced by different types of hazards

different types of hazards

Average instruction time not pipelined

Average instruction time not pipelined = 340 ns = 340 ns

Average instruction time pipelined

Average instruction time pipelined = 70 ns = 70 ns

Speedup

(19)

Instruction sequences (Sidetrack)

Instruction sequences (Sidetrack)

1. Fetch the instruction from

memory;

2. Decode the instruction (it

is an arithmetic instruction, but the CPU has to find that out through a decode operation);

3. Fetch the operands from the register file;

4. Apply the operands to the

ALU;

5. Write the result back to

the register file.

1. Fetch the instruction from

memory;

2. Decode the instruction (it is

a branch instruction);

3. Fetch the components of

the address from the

instruction or register file;

4. Apply the components of

the address to the ALU (address arithmetic);

5. Copy the resulting effective

address into the PC, thus accomplishing the branch.

(20)

Load and Store Instructions Sequence

1. Fetch the instruction from memory;

2. Decode the instruction (it is a load or store

instruction);

3. Fetch the components of the address from the

instruction or register file;

4. Apply the components of the address to the

ALU (address arithmetic);

5. Apply the resulting effective address to memory

along with a read (load) or write (store) signal. If it is a write signal, the data item to be written

(21)

Frequency of occurrence of instruction types for a variety of languages. The percentages do not sum

to 100 due to roundoff.

(Adapted from [Knuth, 1991].)

Statement

Statement Average Percent of TimeAverage Percent of Time

Assignment

Assignment 4747

If

If 2323

Call

Call 1515

Loop

Loop 66

Goto

Goto 33

Other

(22)

Pipelined VS. Unpipelined

Pipelined VS. Unpipelined

F – Fetch Instruction D – Decode Instruction

E – Execute to the ALU

Unpipelined

Pipelined

F D O E S F D O

S E

F D O

time

F D O E S

F D O E S

F D O E S

O – Fetch Operand

S – Store Result

Both fetching and executing the instruction is done by one unit

t

F D O E S

(23)

Hazards

Hazards

prevent the next instruction in the

prevent the next instruction in the

instruction stream from being

instruction stream from being

executing during its designated clock

executing during its designated clock

cycle

cycle

Three classes of hazards:

Three classes of hazards:

Structural hazardsStructural hazardsData hazardsData hazards

(24)

Structural Hazard

Structural Hazard

arise from resource conflicts when the

arise from resource conflicts when the

hardware cannot support all possible

hardware cannot support all possible

combinations of instructions in

combinations of instructions in

simultaneous overlapped execution

(25)

Common instances of structural

Common instances of structural

hazards arise when :

hazards arise when :

Some functional unit is not fully pipelined - then

Some functional unit is not fully pipelined - then

a sequence of instructions using that

a sequence of instructions using that

unpipelined unit cannot proceed at the rate of

unpipelined unit cannot proceed at the rate of

one per clock cycle

one per clock cycle

Some resource has not been duplicated enough

Some resource has not been duplicated enough

- to allow all combinations of instructions in the

- to allow all combinations of instructions in the

pipeline to execute

(26)

Solution of Structural Hazard

Solution of Structural Hazard

stall the pipeline for one clock cycle when

stall the pipeline for one clock cycle when

a data-memory access occurs (bubbles)

a data-memory access occurs (bubbles)

Stall - prevent the succeeding instruction

Stall - prevent the succeeding instruction

from doing its phase of the cycle

from doing its phase of the cycle

allows the stalled instruction to proceed

allows the stalled instruction to proceed

without conflict but defeats the purpose of

without conflict but defeats the purpose of

overlapping the instructions for the stage

(27)

Pipeline Bubble

Pipeline Bubble

Clock

Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9

Fetch Decode Exec Store R-type

Fetch Decode Exec Fetch Decode Oper Exec Store

Load

Fetch Decode Exec Store R-type

Fetch Decode Exec Store R-type

Fetch Decode Exec Store

Bubble Bubble Bubble Oper

Insert a bubble into the pipeline to prevent two writes or stores at the same cycle

(28)

Delay Store Cycle

Delay Store Cycle

Add NOP (No Operation)

Clock

Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9

Fetch Decode Exec Store R-type

Fetch Decode Exec Fetch Decode Oper Exec Store

Load

Fetch Decode Exec Store R-type

Fetch Decode Exec Store R-type

Fetch Decode Exec Store

NOP

NOP

NOP Oper

(29)

Data Hazard

Data Hazard

arise when an instruction depends on the

arise when an instruction depends on the

result of a previous instruction in a way

result of a previous instruction in a way

that is exposed by the overlapping of

that is exposed by the overlapping of

instructions in the pipeline

(30)

Data Hazard Classification

Data Hazard Classification

RAW (read after write)

RAW (read after write) - - jj tries to read a source before tries to read a source before ii

writes it, so

writes it, so jj incorrectly gets the old value incorrectly gets the old value

Solution: Forwarding (“Forward” result from one stage to

Solution: Forwarding (“Forward” result from one stage to

another)

another)

– The ALU result from the Op Fetch /Exec register is The ALU result from the Op Fetch /Exec register is

always fed back to the ALU input latches

always fed back to the ALU input latches

– If the forwarding hardware detects that the previous If the forwarding hardware detects that the previous

ALU operation has written the register corresponding

ALU operation has written the register corresponding

to the source for the current ALU operation,

to the source for the current ALU operation, control control logic

logic selects the forwarded result as the ALU input selects the forwarded result as the ALU input rather than the value read from the register file

rather than the value read from the register file

(31)

Data Hazard Classification

Data Hazard Classification

WAW (write after write)

WAW (write after write) - - j j tries to write an tries to write an operand before it is written by

operand before it is written by ii. The writes end . The writes end up being performed in the wrong order, leaving

up being performed in the wrong order, leaving

the value written by

the value written by ii rather than the value rather than the value written by

written by jj in the destination in the destination

Solution: Stall

Solution: Stall

(32)

Data Hazard Classification

Data Hazard Classification

WAR (write after read)

WAR (write after read) - - j j tries to write a tries to write a destination before it is read by

destination before it is read by ii , so , so ii incorrectly incorrectly gets the new value

gets the new value

Solution: Stall

Solution: Stall

(33)

Control Hazard

Control Hazard

arise from the pipelining of branches and

arise from the pipelining of branches and

other instructions that change the PC

other instructions that change the PC

Solution:

Solution: Stall the pipeline as soon as the Stall the pipeline as soon as the branch is detected - Program Counter value

branch is detected - Program Counter value

changed to target address

changed to target address

(34)

References

References

David Patterson and John L. Hennessy.

David Patterson and John L. Hennessy.

Computer Architecture: A Quantitative

Computer Architecture: A Quantitative

Approach. 2nd Edition. 1996.

Approach. 2nd Edition. 1996.

William Stallings. Computer Architecture

William Stallings. Computer Architecture

and Organization. 2000

and Organization. 2000

Andrie Coronel and Ariel Maguyon.

Andrie Coronel and Ariel Maguyon.

Computer Architecture slides. First Sem.

Computer Architecture slides. First Sem.

SY 2006-2007, Ateneo de Manila

SY 2006-2007, Ateneo de Manila

University

References

Related documents

Intra-regional trade is affected by its regional trade agreement known as the ASEAN Free Trade Area (AFTA), while its impact is expected to attract long-run investment

This paper describes several subsistence livelihoods (e.g. rice farming, fishing, and community forestry) that have traditionally been present in Northeast Thailand, a region known

mismo, otro at the Biblioteca Nacional, Buenos Aires, 2016. Reproduction of first page only, Lame Duck Books, Catalogue No. Five pages are published in Borges el mismo, otro,

cars Eco-cars Light cars Older model cars General Consumers Luxury foreign cars Appraisal Purchase(Stock) 50,000 Cars × gross profit (retail + wholesale). Directly managed

Township; sophomore at West Leyden High School in Northlake.. LIBERTY -- Katie Tenhouse, daughter of Art and Sharon Tenhouse of

ФОРМУЛА ВИНАХОДУ Спосіб візуалізації цифрових рентгенологічних зображень, який полягає в тому, що здійснюють 35 фільтрацію вхідного зображення для

Online trust partially mediates the relationships between Web site and consumer characteristics and behavioral intent, and this mediation is strongest (weakest) for sites

We have conducted the first airborne hygro- scopic growth closure study to utilize data from an Aero- dyne compact Time-of-Flight Aerosol Mass Spectrome- ter (C-ToF-AMS) coupled