ECEN 676 Advanced Computer Architecture

Academic year: 2021

(1)

ECEN 676

Advanced Computer Architecture

Prof. Michel A. Kinsy

(2)

Memory Consistency in SMPs

[Figure: CPU-1 and CPU-2, each with its own cache (Cache-1, Cache-2), and Memory are attached to the CPU-Memory bus. Cache-1, Cache-2, and Memory each hold A = 100.]

(3)

Memory Consistency in SMPs

§ Suppose CPU-1 updates A to 200
§ Write-back: memory and cache-2 have stale values
§ Write-through: cache-2 has a stale value

[Figure: CPU-1 and CPU-2 with their caches on the CPU-Memory bus. Cache-1 holds A = 200; Cache-2 and Memory still hold A = 100.]
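The two write policies can be compared with a minimal sketch. It assumes each cache and memory is a plain dictionary and that nothing keeps the caches coherent; the names `cpu1_store` and `stale_copies` are hypothetical and not part of the lecture.

```python
def cpu1_store(policy, cache1, memory, addr, value):
    """CPU-1 stores value to addr; no other cache is notified (no coherence)."""
    cache1[addr] = value
    if policy == "write-through":
        memory[addr] = value  # write-through updates memory right away
    # write-back: memory is updated only when the line is written back later


def stale_copies(policy):
    """Return which copies of A are stale after CPU-1 writes A = 200."""
    cache1, cache2, memory = {"A": 100}, {"A": 100}, {"A": 100}
    cpu1_store(policy, cache1, memory, "A", 200)
    copies = {"cache-2": cache2["A"], "memory": memory["A"]}
    return sorted(name for name, value in copies.items() if value != cache1["A"])


print(stale_copies("write-back"))     # ['cache-2', 'memory']
print(stale_copies("write-through"))  # ['cache-2']
```

In both cases cache-2 keeps the old value, which is the problem the rest of the lecture addresses.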

(4)

Memory Consistency in SMPs

§ Suppose CPU-1 updates A to 200
§ Do these stale values matter?
§ What is the view of shared memory for programming?

[Figure: CPU-1 and CPU-2 with their caches on the CPU-Memory bus. Cache-1 holds A = 200; Cache-2 and Memory still hold A = 100.]

(5)

Prog T1          Prog T2
ST X, 1          LD Y, R1
ST Y, 11         ST Y’, R1
                 LD X, R2
                 ST X’, R2

Step                        Cache-1      Memory                    Cache-2
§ T1 is executed            X=1, Y=11    X=0, Y=10, X’=-, Y’=-     Y=-, Y’=-, X=-, X’=-
§ Cache-1 writes back Y     X=1, Y=11    X=0, Y=11, X’=-, Y’=-     Y=-, Y’=-, X=-, X’=-
§ T2 executed               X=1, Y=11    X=0, Y=11, X’=-, Y’=-     Y=11, Y’=11, X=0, X’=0
§ Cache-1 writes back X     X=1, Y=11    X=1, Y=11, X’=-, Y’=-     Y=11, Y’=11, X=0, X’=0

(- marks a location that holds no value yet.)

(6)

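Write-back Caches & SC

Prog T1          Prog T2
ST X, 1          LD Y, R1
ST Y, 11         ST Y’, R1
                 LD X, R2
                 ST X’, R2

Step                                Cache-1      Memory                    Cache-2
§ T1 is executed                    X=1, Y=11    X=0, Y=10, X’=-, Y’=-     Y=-, Y’=-, X=-, X’=-
§ Cache-1 writes back Y             X=1, Y=11    X=0, Y=11, X’=-, Y’=-     Y=-, Y’=-, X=-, X’=-
§ T2 executed                       X=1, Y=11    X=0, Y=11, X’=-, Y’=-     Y=11, Y’=11, X=0, X’=0
§ Cache-1 writes back X             X=1, Y=11    X=1, Y=11, X’=-, Y’=-     Y=11, Y’=11, X=0, X’=0
§ Cache-2 writes back X’ & Y’       X=1, Y=11    X=1, Y=11, X’=0, Y’=11    Y=11, Y’=11, X=0, X’=0

(- marks a location that holds no value yet.)

Result: incoherent. Memory ends with Y’ = 11 (the new Y) but X’ = 0 (the old X), although T1 stores X before Y.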
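The write-back trace (T1 runs, cache-1 writes back Y, T2 runs, cache-1 writes back X, cache-2 writes back X’ and Y’) can be replayed with a small sketch. It assumes caches and memory are dictionaries, uses the ASCII keys "X'" and "Y'" for X’ and Y’, and hard-codes the write-back order; the function name is hypothetical.

```python
def run_write_back_example():
    """Replay the write-back trace and return the final memory contents."""
    memory = {"X": 0, "Y": 10}
    cache1, cache2 = {}, {}

    # T1 is executed: with write-back caches both stores stay in cache-1.
    cache1["X"] = 1
    cache1["Y"] = 11

    # Cache-1 writes back Y, but not yet X.
    memory["Y"] = cache1["Y"]

    # T2 is executed: each load misses in cache-2 and fetches from memory,
    # and each store is kept (dirty) in cache-2.
    for src, dst in (("Y", "Y'"), ("X", "X'")):
        cache2[src] = memory[src]   # LD src, R
        cache2[dst] = cache2[src]   # ST dst, R

    # Cache-1 writes back X; cache-2 writes back X' and Y'.
    memory["X"] = cache1["X"]
    memory["X'"] = cache2["X'"]
    memory["Y'"] = cache2["Y'"]
    return memory


final = run_write_back_example()
# Under SC, seeing the new Y (11) implies seeing the new X (1),
# because T1 stores X before Y.
sc_violated = final["Y'"] == 11 and final["X'"] == 0
print(final, sc_violated)
```

The flag `sc_violated` is true for this trace: T2 observed the second store of T1 without observing the first.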

(7)

Write-through Caches & SC

§ Write-through caches don’t preserve sequential consistency either

Prog T1          Prog T2
ST X, 1          LD Y, R1
ST Y, 11         ST Y’, R1
                 LD X, R2
                 ST X’, R2

Step                 Cache-1      Memory                    Cache-2
(initial state)      X=0, Y=10    X=0, Y=10, X’=-, Y’=-     Y=-, Y’=-, X=0, X’=-
§ T1 is executed     X=1, Y=11    X=1, Y=11, X’=-, Y’=-     Y=-, Y’=-, X=0, X’=-
§ T2 executed        X=1, Y=11    X=1, Y=11, X’=0, Y’=11    Y=11, Y’=11, X=0, X’=0

(- marks a location that holds no value yet.)
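The write-through case can be sketched the same way. The sketch assumes cache-2 already holds X = 0 before T1 runs, as in the initial state of the slide, and that a load which hits in the cache never consults memory; the helper names `store` and `load` are hypothetical.

```python
def run_write_through_example():
    """Replay the write-through trace; return (memory, cache2)."""
    memory = {"X": 0, "Y": 10}
    cache1 = {"X": 0, "Y": 10}
    cache2 = {"X": 0}  # cache-2 holds a copy of X before T1 runs

    def store(cache, addr, value):
        cache[addr] = value
        memory[addr] = value  # write-through: memory is updated on every store

    def load(cache, addr):
        if addr not in cache:      # miss: fetch from memory
            cache[addr] = memory[addr]
        return cache[addr]         # hit: may return a stale copy

    # T1 is executed.
    store(cache1, "X", 1)
    store(cache1, "Y", 11)

    # T2 is executed.
    r1 = load(cache2, "Y")         # miss, reads the new Y = 11
    store(cache2, "Y'", r1)
    r2 = load(cache2, "X")         # hit on the stale X = 0
    store(cache2, "X'", r2)
    return memory, cache2


memory, cache2 = run_write_through_example()
print(memory)
```

Memory itself is up to date for X and Y, yet T2 still records the new Y together with the old X because the stale copy of X in cache-2 was never invalidated.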

(8)

Maintaining Sequential Consistency

§ Motivation: We can do without locks -- SC is sufficient for writing producer-consumer and mutual exclusion codes (e.g., Dekker)
§ Problem: Multiple copies of a location in various caches can cause SC to break down.
§ Hardware support is required such that
   § Only one processor at a time has write permission for a location
   § No processor can load a stale copy of the location after a write
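Why SC is enough for mutual exclusion can be illustrated with the flag handshake at the core of Dekker-style algorithms: each processor sets its own flag, then enters only if the other flag is still 0. The sketch below is a simplification (it omits Dekker's turn variable and retry loop) and enumerates every sequentially consistent interleaving; all names are hypothetical.

```python
def interleavings(a, b):
    """Yield every merge of a and b that preserves each program's order."""
    if not a:
        yield list(b)
        return
    if not b:
        yield list(a)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def run(schedule):
    """Execute one interleaving against a single shared memory (SC)."""
    mem = {"flag1": 0, "flag2": 0}
    entered = set()
    for proc, op in schedule:
        mine, other = ("flag1", "flag2") if proc == 1 else ("flag2", "flag1")
        if op == "set":
            mem[mine] = 1
        elif mem[other] == 0:      # op == "test"
            entered.add(proc)      # processor enters the critical section
    return entered


P1 = [(1, "set"), (1, "test")]
P2 = [(2, "set"), (2, "test")]
results = [run(s) for s in interleavings(P1, P2)]
both_entered = any(r == {1, 2} for r in results)
print(len(results), both_entered)
```

No interleaving lets both processors enter. If a processor could read a stale cached copy of the other flag, that guarantee would be lost, which is why the hardware support listed above is needed.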

(9)

A System with Multiple Caches

§ Modern systems often have hierarchical caches
§ Each cache has exactly one parent but can have zero or more children
§ Only a parent and its children can communicate directly

[Figure: cache hierarchy. Each processor (P) has its own L1 cache; L1 caches sit below L2 caches; an Interconnect links the hierarchy to memory (M).]

(10)

A System with Multiple Caches

§ Inclusion property is maintained between a parent and its children, i.e.,
§ a ∈ L_i implies that a ∈ L_{i+1}

[Figure: cache hierarchy. Each processor (P) has its own L1 cache; L1 caches sit below L2 caches; an Interconnect links the hierarchy to memory (M).]
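The parent/child structure and the inclusion property can be sketched as a small tree. This is an illustrative model only: the `Cache` class, its `fill` and `evict` methods, and `inclusion_holds` are hypothetical names, and addresses are tracked as a set without data or replacement policy.

```python
class Cache:
    """A cache with exactly one parent and zero or more children."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        self.lines = set()          # addresses currently cached here
        if parent is not None:
            parent.children.append(self)

    def fill(self, addr):
        """Bring addr in, filling every ancestor first to keep inclusion."""
        if self.parent is not None:
            self.parent.fill(addr)
        self.lines.add(addr)

    def evict(self, addr):
        """Drop addr here and in every descendant to keep inclusion."""
        for child in self.children:
            child.evict(addr)
        self.lines.discard(addr)


def inclusion_holds(cache):
    """True if every child's contents are a subset of its parent's."""
    return all(
        child.lines <= cache.lines and inclusion_holds(child)
        for child in cache.children
    )


l2 = Cache("L2")
l1a = Cache("L1a", parent=l2)
l1b = Cache("L1b", parent=l2)
l1a.fill("a")
print(sorted(l2.lines), inclusion_holds(l2))
```

Filling through the parent and evicting through the children are one way to keep a ∈ L_i implying a ∈ L_{i+1}; real designs differ in how they enforce it.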

(11)

Cache Coherence Protocols for SC

§ Write request:
   § The address is invalidated in all other caches before the write is performed, or
   § The address is updated in all other caches after the write is performed
§ Read request:
   § If a dirty copy is found in some cache, a write-back is performed before the memory is read
§ We will focus on Invalidation protocols as opposed to Update protocols
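The two options for a write request can be contrasted in a short sketch. It assumes each cache is a dictionary from address to value and ignores memory, dirty bits, and bus arbitration; the function name `write` and the protocol labels are hypothetical.

```python
def write(caches, writer, addr, value, protocol):
    """Perform a write by cache number `writer` under the given protocol."""
    others = [c for i, c in enumerate(caches) if i != writer]
    if protocol == "invalidate":
        for cache in others:
            cache.pop(addr, None)      # invalidate before the write
        caches[writer][addr] = value
    elif protocol == "update":
        caches[writer][addr] = value
        for cache in others:
            if addr in cache:
                cache[addr] = value    # update existing copies after the write
    else:
        raise ValueError(f"unknown protocol: {protocol}")


inv = [{"A": 100}, {"A": 100}]
write(inv, 0, "A", 200, "invalidate")

upd = [{"A": 100}, {"A": 100}]
write(upd, 0, "A", 200, "update")

print(inv, upd)
```

After the invalidating write the other cache has no copy of A and must miss on its next read; after the updating write it holds the new value.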

(12)

Warmup: Parallel I/O

§ DMA stands for Direct Memory Access

Either Cache or DMA can be the Bus Master and effect transfers

Page transfers occur while the Processor is running

[Figure: Proc. connects to Cache through Address (A), Data (D), and R/W lines. Cache, Physical Memory, and DMA (attached to DISK) share the Memory Bus, which also carries A, D, and R/W.]

(13)

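Problems with Parallel I/O

§ Memory → Disk: Physical memory may be stale if cache copy is dirty
§ Disk → Memory: Cache may hold (now stale) data corresponding to the memory being written

[Figure: Proc. and Cache, Physical Memory, and DMA (attached to DISK) on the Memory Bus. DMA transfers move a page between disk and physical memory while the cache holds cached portions of that page.]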

(14)

Snoopy Cache (Goodman, 1983)

§ Idea: have the cache watch (or snoop upon) DMA transfers, and then “do the right thing”
§ Snoopy cache tags are dual-ported

[Figure: Proc. and Cache, with the cache holding Tags and State plus Data (lines). One port with A, D, and R/W lines is used to drive the Memory Bus when the Cache is Bus Master; a snoopy read port with A and R/W lines is attached to the Memory Bus.]

(15)

Snoopy Cache Actions

Observed Bus Cycle                   Cache State           Cache Action
Read Cycle, i.e., Memory → Disk      Address not cached    No action
                                     Cached, unmodified    No action
                                     Cached, modified      Cache intervenes
Write Cycle, i.e., Disk → Memory     Address not cached    No action
                                     Cached, unmodified    Cache purges its copy
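The action table can be written as a direct lookup. The sketch below uses hypothetical short keys ("read"/"write" for the observed bus cycle and "not cached"/"unmodified"/"modified" for the cache state). The table above lists no action for a write cycle that hits a modified line, so the lookup returns None for that case rather than guessing.

```python
# (observed bus cycle, cache state) -> cache action, as listed in the table
SNOOP_ACTIONS = {
    ("read", "not cached"): "no action",
    ("read", "unmodified"): "no action",
    ("read", "modified"): "cache intervenes",
    ("write", "not cached"): "no action",
    ("write", "unmodified"): "cache purges its copy",
}


def snoop_action(bus_cycle, cache_state):
    """Return the listed action, or None if the table gives none."""
    return SNOOP_ACTIONS.get((bus_cycle, cache_state))


print(snoop_action("read", "modified"))     # cache intervenes
print(snoop_action("write", "unmodified"))  # cache purges its copy
```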

(16)

Shared Memory Multiprocessor

§ Use snoopy mechanism to keep all processors’ view of memory coherent

[Figure: processor cores P1, P2, and P3, each with its own Snoopy Cache (labelled Local Memories), share the Memory Bus with Physical Memory and the DMA unit that serves the DISKS.]

(17)

Cache State Transition Diagram

§ The MSI protocol
   M: Modified
   S: Shared
   I: Invalid

Each cache line has a tag: address tag plus state bits

Cache state in processor P1:
   I → M: Write miss
   I → S: Read miss
   S → M: P1 intends to write
   S → I: Other processor intends to write
   S → S: Read by any processor
   M → M: P1 reads or writes
   M → S: Other processor reads

(18)

Cache State Transition Diagram

§ The MSI protocol
   M: Modified
   S: Shared
   I: Invalid

Each cache line has a tag: address tag plus state bits

Cache state in processor P1:
   I → M: Write miss
   I → S: Read miss
   S → M: P1 intends to write
   S → I: Other processor intends to write
   S → S: Read by any processor
   M → M: P1 reads or writes
   M → I: Other processor intends to write, P1 writes back

(19)

Cache State Transition Diagram

§ The MSI protocol
   M: Modified
   S: Shared
   I: Invalid

Each cache line has a tag: address tag plus state bits

State transitions for P1's cache:
   I → M: Write miss
   I → S: Read miss
   S → M: P1 intends to write
   S → I: Other processor intends to write
   S → S: Read by any processor
   M → M: P1 reads or writes
   M → S: Other processor reads, P1 writes back
   M → I: Other processor intends to write, P1 writes back
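The MSI state diagram for P1's cache can be encoded as a transition table. This is a sketch, not a full protocol: the event names are hypothetical, bus messages are not modelled, and the entries for an Invalid line observing another processor's access are an assumption (the diagram draws no arrow for them, so they are modelled as no change).

```python
from enum import Enum


class State(Enum):
    M = "Modified"
    S = "Shared"
    I = "Invalid"


# (current state, event) -> (next state, P1 writes back?)
TRANSITIONS = {
    (State.I, "p1_read"): (State.S, False),      # read miss
    (State.I, "p1_write"): (State.M, False),     # write miss
    (State.I, "other_read"): (State.I, False),   # assumption: no change
    (State.I, "other_write"): (State.I, False),  # assumption: no change
    (State.S, "p1_read"): (State.S, False),      # read by any processor
    (State.S, "other_read"): (State.S, False),   # read by any processor
    (State.S, "p1_write"): (State.M, False),     # P1 intends to write
    (State.S, "other_write"): (State.I, False),  # other intends to write
    (State.M, "p1_read"): (State.M, False),      # P1 reads or writes
    (State.M, "p1_write"): (State.M, False),     # P1 reads or writes
    (State.M, "other_read"): (State.S, True),    # P1 writes back
    (State.M, "other_write"): (State.I, True),   # P1 writes back
}


def next_state(state, event):
    """Return (next state, whether P1 must write the line back)."""
    return TRANSITIONS[(state, event)]


print(next_state(State.M, "other_read"))
```

Only the two transitions out of M require a write-back, since M is the only state in which P1 holds data that memory does not.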

(20)

2 Processor Example

State transitions for P1's cache:
   I → M: Write miss
   I → S: Read miss
   S → M: P1 intends to write
   S → I: P2 intends to write
   M → M: P1 reads or writes
   M → S: P2 reads, P1 writes back
   M → I: P2 intends to write, P1 writes back

State transitions for P2's cache:
   I → M: Write miss
   I → S: Read miss
   S → M: P2 intends to write
   S → I: P1 intends to write
   M → M: P2 reads or writes
   M → S: P1 reads, P2 writes back
   M → I: P1 intends to write, P2 writes back

Sequence of events: P1 reads, P1 writes, P2 reads, P2 writes, P1 writes, P2 writes, P1 reads, P1 writes

(21)

2 Processor Example

State transitions for P1's cache:
   I → M: Write miss
   I → S: Read miss
   S → M: P1 intends to write
   S → I: P2 intends to write
   M → M: P1 reads or writes
   M → S: P2 reads, P1 writes back
   M → I: P2 intends to write, P1 writes back

State transitions for P2's cache:
   I → M: Write miss
   I → S: Read miss
   S → M: P2 intends to write
   S → I: P1 intends to write
   M → M: P2 reads or writes
   M → S: P1 reads, P2 writes back
   M → I: P1 intends to write, P2 writes back

Sequence of events: P1 reads, P1 writes, P2 reads, P2 writes, P1 writes, P2 writes, P1 reads, P1 writes
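The eight-event sequence of the 2 processor example can be traced with a small simulation. It assumes both caches start in I for the line in question, treats the line as a single address, and does not model the write-backs themselves; `step` and `EVENTS` are hypothetical names, with processors numbered 0 (P1) and 1 (P2).

```python
def step(states, proc, op):
    """Apply one read or write by processor `proc` to all cache states."""
    new_states = []
    for p, state in enumerate(states):
        if p == proc:
            if op == "write":
                new_states.append("M")   # write miss or intent to write
            else:
                new_states.append("S" if state == "I" else state)  # read
        elif op == "write":
            new_states.append("I")       # another processor intends to write
        else:
            new_states.append("S" if state == "M" else state)  # another reads
    return tuple(new_states)


# P1 reads, P1 writes, P2 reads, P2 writes, P1 writes, P2 writes, P1 reads, P1 writes
EVENTS = [(0, "read"), (0, "write"), (1, "read"), (1, "write"),
          (0, "write"), (1, "write"), (0, "read"), (0, "write")]

states = ("I", "I")
trace = []
for proc, op in EVENTS:
    states = step(states, proc, op)
    trace.append(states)

for (proc, op), (s1, s2) in zip(EVENTS, trace):
    print(f"P{proc + 1} {op:5s} -> P1: {s1}, P2: {s2}")
```

Each entry of `trace` is the pair (P1 state, P2 state) after the corresponding event; the line ends in M at P1 and I at P2.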

(22)

Observation

§ If a line is in the M state then no other cache can have a copy of the line!
§ Memory stays coherent: multiple differing copies cannot exist

[Figure: the MSI state transition diagram for the cache state in processor P1, with the same transitions as on the Cache State Transition Diagram slides.]
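The observation can be checked exhaustively on a small model. The sketch below assumes three processors, a single cache line, and MSI rules condensed into one function; it explores every state reachable from all-Invalid and tests that an M copy never coexists with another valid copy. All names are hypothetical.

```python
from itertools import product


def step(states, proc, op):
    """MSI next-state rule for every cache when `proc` performs `op`."""
    out = []
    for p, s in enumerate(states):
        if p == proc:
            out.append("M" if op == "write" else ("S" if s == "I" else s))
        elif op == "write":
            out.append("I")                       # invalidated by the writer
        else:
            out.append("S" if s == "M" else s)    # M is downgraded on a read
    return tuple(out)


def reachable_states(n_procs):
    """Explore all states reachable from all-Invalid by any event order."""
    start = ("I",) * n_procs
    seen, frontier = {start}, [start]
    while frontier:
        states = frontier.pop()
        for proc, op in product(range(n_procs), ("read", "write")):
            nxt = step(states, proc, op)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen


def exclusive_when_modified(states):
    """If some cache is in M, every other cache must be in I."""
    if "M" not in states:
        return True
    return states.count("M") == 1 and states.count("S") == 0


reachable = reachable_states(3)
invariant_holds = all(exclusive_when_modified(s) for s in reachable)
print(len(reachable), invariant_holds)
```

A check like this covers only the abstract state machine, not timing or bus races in a real implementation.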

(23)

Next Class
