• No results found

Data Dependences. A data dependence occurs whenever one instruction needs a value produced by another.

N/A
N/A
Protected

Academic year: 2021

Share "Data Dependences. A data dependence occurs whenever one instruction needs a value produced by another."

Copied!
26
0
0

Loading.... (view fulltext now)

Full text

(1)

Data Hazards

(2)

Hazards: Key Points

Hazards cause imperfect pipelining

They prevent us from achieving CPI = 1

They are generally causes by “counter flow” data dependences in the pipeline

Three kinds

Structural -- contention for hardware resources

Data -- a data value is not available when/where it is needed.

Control -- the next instruction to execute is not known.

Two ways to deal with hazards

Removal -- add hardware and/or complexity to work around the hazard so it does not exist

Bypassing/forwarding

Speculation

Stall -- Sacrifice performance to prevent the hazard from occurring

(3)

Data Dependences

A data dependence occurs whenever one

instruction needs a value produced by another.

Register values (for now)

Also memory accesses (more on this later)

add $s0, $t0, $t1 sub $t2, $s0, $t3 add $t3, $s0, $t4 and $t3, $t2, $t4

sw $t1, 0($t2)

ld $t3, 0($t2)

ld $t4, 16($s4)

(4)

Data Dependences

A data dependence occurs whenever one

instruction needs a value produced by another.

Register values (for now)

Also memory accesses (more on this later)

add $s0, $t0, $t1 sub $t2, $s0, $t3 add $t3, $s0, $t4 and $t3, $t2, $t4

sw $t1, 0($t2)

ld $t3, 0($t2)

ld $t4, 16($s4)

(5)

Data Dependences

A data dependence occurs whenever one

instruction needs a value produced by another.

Register values (for now)

Also memory accesses (more on this later)

add $s0, $t0, $t1 sub $t2, $s0, $t3 add $t3, $s0, $t4 and $t3, $t2, $t4

sw $t1, 0($t2)

ld $t3, 0($t2)

ld $t4, 16($s4)

(6)

In our simple pipeline, these instructions cause a hazard

Dependences in the pipeline

Deco EX

de

Fetch Mem Write

add $s0, $t0, $t1 back

Deco EX

de

Fetch Mem Write

sub $t2, $s0, $t3 back

Cycles

(7)

How can we fix it?

Ideas?

(8)

Solution 1: Make the compiler deal with it.

Expose hazards to the big A architecture

A result is available N instructions after the instruction that generates it.

In the meantime, the register file has the old value.

“delay slots”

What is N?

Can it change?

What can the compiler do?

Deco EX

de

Fetch Mem Write

back

(9)

Compiling for delay slots

add $s0, $t0, $t1 sub $t2, $s0, $t3 add $t3, $s0, $t4 and $t7, $t5, $t4

add $s0, $t0, $t1 and $t7, $t5, $t4 sub $t2, $s0, $t3 add $t3, $s0, $t4 Rearrange

instructions

The compiler must fill the delay slots with other instructions

What if it can’t?

(10)

Compiling for delay slots

add $s0, $t0, $t1 sub $t2, $s0, $t3 add $t3, $s0, $t4 and $t7, $t5, $t4

add $s0, $t0, $t1 and $t7, $t5, $t4 sub $t2, $s0, $t3 add $t3, $s0, $t4 Rearrange

instructions

The compiler must fill the delay slots with other instructions

What if it can’t?

No-ops

(11)

Solution 2: Stall

When you need a value that is not ready, “stall”

Suspend the execution of the executing instruction

and those that follow.

This introduces a pipeline “bubble.” A bubble is a lack of work to do. It moves through the pipeline like an

instruction.

Deco EX

de

Fetch Mem Write

add $s0, $t0, $t1 back

Fetch

sub $t2, $s0, $t3

Cycles

Deco EX

de

Mem Write

Stall back

(12)

Stalling the pipeline

Freeze all pipeline stages before the stage where the hazard occurred.

Disable the PC update

Disable the pipeline registers

This essentially equivalent to always inserting a nop when a hazard exists

Insert nop control bits at stalled stage (decode in our example)

How is this solution still potentially “better” than relying on the compiler?

(13)

Stalling the pipeline

Freeze all pipeline stages before the stage where the hazard occurred.

Disable the PC update

Disable the pipeline registers

This essentially equivalent to always inserting a nop when a hazard exists

Insert nop control bits at stalled stage (decode in our example)

How is this solution still potentially “better” than relying on the compiler?

(14)

The Impact of Stalling On Performance

ET = I * CPI * CT

I and CT are constant

What is the impact of stalling on CPI?

What do we need to know to figure it out?

(15)

The Impact of Stalling On Performance

ET = I * CPI * CT

I and CT are constant

What is the impact of stalling on CPI?

Fraction of instructions that stall: 30%

Baseline CPI = 1

Stall CPI = 1 + 2 = 3

New CPI =

(16)

The Impact of Stalling On Performance

ET = I * CPI * CT

I and CT are constant

What is the impact of stalling on CPI?

Fraction of instructions that stall: 30%

Baseline CPI = 1

Stall CPI = 1 + 2 = 3

New CPI = 0.3*3 + 0.7*1 = 1.6

(17)

Solution 3: Bypassing/Forwarding

Data values are computed in Ex and Mem but

“publicized in write back”

The data exists! We should use it.

Deco EX

de

Fetch Mem Write

back

results known Results "published"

to registers inputs are needed

(18)

Take the values, where ever they are

Bypassing or Forwarding

Deco EX

de

Fetch Mem Write

add $s0, $t0, $t1 back

Deco EX

de

Fetch Mem Write

sub $t2, $s0, $t3 back

Cycles

(19)

Forwarding Paths

Deco EX

de

Fetch Mem Write

add $s0, $t0, $t1 back

Deco EX

de

Fetch Mem Write

sub $t2, $s0, $t3 back

Cycles

Deco EX

de

Fetch Mem Write

back

Deco EX

de

Fetch Mem Write

back

sub $t2, $s0, $t3

sub $t2, $s0, $t3

(20)

Forwarding in Hardware

Read Address

Instruc(on Memory

Add

PC

4

Write  Data Read  Addr  1 Read  Addr  2 Write  Addr

Register File

 Data  1Read

 Data  2Read

16 32

ALU Shi<

le<  2

Add

Data Memory

Address Write  Data

Read

IFetch/Dec Dec/Exec Exec/Mem Data Mem/WB

Sign Extend Add

(21)

The forwarding unit detects instances when the destination and source registers of executing

instructions match

Set the control lines on the ALU input muxes accordingly

Stall if, for some reason, forwarding is not possible.

Forwarding Control

(22)

Forwarding for Loads

Load values come from the Mem stage

Deco EX

de

Fetch Mem Write

ld $s0, (0)$t0 back

Deco EX

de

Fetch Mem

sub $t2, $s0, $t3

Cycles

(23)

Forwarding for Loads

Load values come from the Mem stage

Deco EX

de

Fetch Mem Write

ld $s0, (0)$t0 back

Deco EX

de

Fetch Mem

sub $t2, $s0, $t3

Cycles

Time travel presents significant implementation challenges

(24)

What can we do?

Punt to the compiler

Easy enough.

Will work.

Same dangers apply as before.

Always stall.

Forward when possible, stall otherwise

Here the compiler still has leverage

Code will be faster if the compiler generates code as if there is a delay slot.

If the compiler can’t fix it, the hardware will stall

(25)

Performance cost of stalling

ET = I * CPI * CT

CPI = 1 + %Stall * StallTime

% Stall is determined by how aggressive our bypassing is and the quality of our compiler.

Stall time is related to pipeline depth. In our

case, it is 1 or 2, because our pipeline is shallow.

In deeper pipelines, it can larger.

(26)

Hardware Cost of Forwarding

In our pipeline, adding forwarding required relatively little hardware.

For deeper pipelines it gets much more expensive

Roughly: ALU * pipeline stages you need to forward over

Some modern processor have multiple ALUs (4-5)

And deeper pipelines (4-5 stages of to forward across)

Not all forwarding paths need to be supported.

If a path does not exist, the processor will need to stall.

References

Related documents

Turning to the effect of college application decisions, the results in Table 1.5 show that, even with additional control variables, there still remains a positive and

The birth of Alexander Daling, grandson of Eileen &amp; Michael Daling by Donna &amp; Mark Finkelstein The birth of Sadie Haaz, daughter of Rabbi Haaz &amp; Bonnie Rose Schulman

Process 10 – The DFAS/Payment Center Approval and Processing of Payment to TPPS Provider : Within 48 hours after monthly summary invoice certification, the TPPS creates an EDI 810

To identify molecular mechanisms underlying pediatric SEPN (pSEPN), we compared gene expression profiles of spinal (n = 6) and intracranial tumors (PF/ST, n = 30/30) from

When many secondary content area teacher candidates begin their course of study thinking they need to cover all the content through lecture or text book readings, they do realize

In session 16, the PWA were asked to produce their scripts in response to the primary researcher’s questions over videoconferencing, and then all participants completed the

A website designed for accessibility will provide useful alternative text to help users understand images, audio, and video content. This inclusion is also popular with Google’s