• No results found

Multiprocessors. Multiprocessors, Multiprocessing, and Multithreading. Classifying Multiprocessors. Interconnection Network

N/A
N/A
Protected

Academic year: 2021

Share "Multiprocessors. Multiprocessors, Multiprocessing, and Multithreading. Classifying Multiprocessors. Interconnection Network"

Copied!
5
0
0

Loading.... (view fulltext now)

Full text

(1)

CSE 141 Dean Tullsen

Multiprocessors, Multiprocessing, and

Multithreading

more is better?

CSE 141 Dean Tullsen

Multiprocessors

why would you want a multiprocessor?

what things can it do well?

What things can’t it do well?

What things can it do that a bunch of computerscan’t do?

How much are you willing to pay?

(2)

CSE 141 Dean Tullsen

Memory Topology

UMA(Uniform Memory Access)

NUMA(Non-uniform Memory Access)

pros/cons? cpu cpu cpu cpu . . . M M M M . . . Network Cache Processor Cache Processor Cache Processor Single bus Memory I/O Network Cache Processor Cache Processor Cache Processor

Memory Memory Memory

CSE 141 Dean Tullsen

Programming Model

Shared Memory-- every processor can name every address

location

Message Passing-- each processor can name only it’s local

memory. Communication is through explicit messages.

pros/cons?

find the max of 100,000 integers on 10 processors.

Network Cache Processor Cache Processor Cache Processor

Memory Memory Memory

Parallel Programming

Processor A

index = i++;

Processor B

index = i++;

Shared-memory programming requires synchronization to provide mutual exclusion and prevent race conditions

– locks (semaphores) – barriers

i = 47

Multiprocessor Caches (Shared Memory)

the problem -- cache coherency

(3)

CSE 141 Dean Tullsen

Cache Coherency

write-update

– on each write, each cache holding that location updates its value

write-invalidate<= most common

– on each write, each cache holding that location invalidates the cache line.

both schemes MUCH easier on a bus-based multiprocessor

potentially requires a LOT of messages, but...

Cache tag and data Processor Single bus Memory I/O Snoop

tag Cache tagand data Processor

Snoop

tag Cache tagand data Processor

Snoop tag

CSE 141 Dean Tullsen

Cache Coherency

A good cache coherency protocol can avoid sending unnecessary (and expensive) invalidate or update messages.

Allows each cache line to be in one of several states.

MESI (Illinois) – modified – exclusive – shared – Invalid

What happens: – Load miss?

– Store hit to exclusive line? Modified? Shared? – Store miss?

Cache Coherency

How do you know when an external read/write occurs?

Snoopingprotocols

Directoryprotocols Cache tag and data Processor Single bus Memory I/O Snoop tag Cache tag and data Processor Snoop tag Cache tag and data Processor Snoop tag

Directory Memory Directory Memory

(4)

CSE 141 Dean Tullsen

Hardware Multithreading

Conventional Processor PC regs CPU instruction stream PC regs PC regs PC regs Multithreaded

CSE 141 Dean Tullsen

Superscalar Execution

Issue Slots

Time (proc cycles)

Vertical waste

Horizontal waste

Superscalar Execution with Fine-Grain Multithreading

Issue Slots

Time (proc cycles)

Thread 1

Thread 2

Thread 3

Simultaneous Multithreading

Issue Slots

Time (proc cycles)

Thread 1

Thread 2

Thread 3

Thread 4

(5)

CSE 141 Dean Tullsen

SMT Performance

0 1 2 3 4 5 6 7 1 2 3 4 5 6 7 8 Number of Threads Thr oughput ( Inst ruct ions per C y cl e ) Simultaneous Multithreading Fine-Grain Multithreading Conventional Superscalar

CSE 141 Dean Tullsen

Multithreaded Processors

• Coarse-grain multithreading (Alewife-MIT)

– context switch at long-latency operations (cache misses) • Fine-grain multithreading (Tera Supercomputer)

– context switch every cycle

• Simultaneous multithreading (SMT) (Tullsen, Eggers, Levy 1995)

– execute instructions from multiple threads in the same cycle – is only different from fine-grain multithreading in the context of

superscalar execution

– requires surprisingly few changes to a conventional out-of-order superscalar processor

– Was announced to be featured in the next Compaq Alpha processor (21464), but that processor never completed.

– Introduced in the Intel Pentium 4 processor – announced as “Hyper-threading technology.” (HT Technology)

– IBM Power 5, 6 has 2 cores, each 2-way SMT.

• Sun Niagara not SMT, but has eight cores, each capable of running four threads.

Multi-Core Processors

(aka Chip Multiprocessors)

Multiple cores on the same die, may or may not share L2 cache.

Intel, AMD, IBM all currently have dual-core processors (some SMT as well). Intel, AMD just announced quad-core. Sun Niagara 4x8.

Everyone’s roadmap seems to be increasingly multi-core.

CPU CPU CPU

CPU CPU CPU

Multiprocessors -- Key Points

Network vs. Bus

Message-passing vs. Shared Memory

Shared Memory is more intuitive, but creates problems for both the programmer (memory consistency, requiring synchronization) and the architect (cache coherency).

Multithreading gives the illusion of multiprocessing (including, in many cases, the performance) with very little additional hardware.

References

Related documents

This is the fi rst study to use a comparative effective- ness research design for a longitudinal dataset of hypertensive patients who receive care in a municipal health care system

Caching &amp; Cache Structures CACHE Processor Main Memory Address Address Data Data Address Tag Data Block Data. Byte Data Byte Data Byte Line 100 304 6848 copy of main

Impact of Inundation and Changes in Garrison Diversion Project Plans on the North Dakota Economy, Agricultural Economics Report No.. 127, Department

I agree to undertake assessment in the knowledge that information gathered will only be used for professional development purposes and can only be accessed by

You can choose the external access for an Add-On Instruction tag from the list on the New Add-On Instruction Parameter or Local Tag dialog box or from the External Access column

(A set-associative operation is assumed.) Included are the required blocks for a set-associative cache (tag arrays and control, cache buffer), the central memory

Several mechanisms of CG are specifically enforced by the Code, including executive compensation, board independence, audit committee, external (statutory) audit,

(A set-associative operation is assumed.) Included are the required blocks for a set-associative cache (tag arrays and control, cache buffer), the central memory