
SCSP3133: High Performance Data Processing / SCSR3233: High Performance & Parallel Computing. Module 2: Parallel Computers Architectures


Academic year: 2021


Outline Flynn’s Taxonomy Parallel Computer Architectures Interconnection Network Some parallel computers in this world


School of Computing, Faculty of Engineering,

Universiti Teknologi Malaysia

November 2, 2020


Outline

1. Flynn’s Taxonomy
2. Parallel Architectures
3. Interconnection Networks
   1. Network Design Issues
   2. Network Topology Properties
   3. Network Topology


Flynn’s Taxonomy

Flynn’s Taxonomy is a classification of computer systems by the number of instruction streams and data streams:
• SISD: single instruction stream, single data stream.
• SIMD: single instruction stream, multiple data streams.
• MISD: multiple instruction streams, single data stream.
• MIMD: multiple instruction streams, multiple data streams.
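As a small illustration of the classification, the sketch below maps a pair of stream counts to its class name. The function name `flynn_class` is a hypothetical helper introduced here, not something from the slides.

```python
def flynn_class(instruction_streams: int, data_streams: int) -> str:
    """Return the Flynn class name for the given stream counts (hypothetical helper)."""
    if instruction_streams < 1 or data_streams < 1:
        raise ValueError("stream counts must be at least 1")
    instr = "S" if instruction_streams == 1 else "M"  # single or multiple instruction streams
    data = "S" if data_streams == 1 else "M"          # single or multiple data streams
    return f"{instr}I{data}D"


for streams in [(1, 1), (1, 8), (4, 1), (4, 8)]:
    print(streams, flynn_class(*streams))
```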


SISD

SISD (single instruction stream, single data stream) is a computer architecture in which a single uni-core processor executes a single instruction stream to operate on data stored in a single memory. This corresponds to the von Neumann architecture.

MISD (multiple instruction, single data) is a type of parallel computing architecture in which many functional units perform different operations on the same data. Pipeline architectures belong to this type, though a purist might say that the data is different after processing by each stage in the pipeline. Fault-tolerant systems that execute the same instructions redundantly in order to detect and mask errors, in a manner known as task replication, may also be considered to belong to this type.

SIMD Architecture
• Computers with this architecture have a control unit in the CPU that performs simultaneous operations on multiple processing elements.
• Each processing element (PE) has its own processing memory, and we can say that each unit will process a

[Figure: Diagram to show SIMD architecture. One control unit (CU), five processing elements (PE), and one memory (M) per processing element.]
Where,
CU is the Control Unit in the CPU
PE is the processing element
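The SIMD organisation can be mimicked in software as a rough sketch: one control unit broadcasts the same instruction to every processing element, and each element applies it to the value in its own memory. The class names and the use of Python callables as "instructions" are assumptions made for illustration. Real SIMD hardware runs the elements in lock-step, which the loop below only stands in for.

```python
from typing import Callable, List


class ProcessingElement:
    """A PE with its own private memory (a single value here, for brevity)."""

    def __init__(self, local_data: float) -> None:
        self.memory = local_data

    def execute(self, instruction: Callable[[float], float]) -> None:
        # Apply the broadcast instruction to this PE's own data only.
        self.memory = instruction(self.memory)


class ControlUnit:
    """Issues one instruction at a time to all PEs."""

    def __init__(self, pes: List[ProcessingElement]) -> None:
        self.pes = pes

    def broadcast(self, instruction: Callable[[float], float]) -> None:
        # Hardware would do this simultaneously; the loop models it sequentially.
        for pe in self.pes:
            pe.execute(instruction)


pes = [ProcessingElement(x) for x in [1.0, 2.0, 3.0, 4.0, 5.0]]
cu = ControlUnit(pes)
cu.broadcast(lambda x: x * 2)   # same instruction, five different data items
cu.broadcast(lambda x: x + 1)
result = [pe.memory for pe in pes]
print(result)  # [3.0, 5.0, 7.0, 9.0, 11.0]
```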

MIMD Architecture
• Computers designed with this architecture have a set of processors that execute different operations simultaneously on different data sets.
• There are two types of MIMD systems:
  – MIMD systems with shared memory
  – MIMD systems with local memory

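MIMD Architecture Diagram
[Figure: two MIMD organisations, each drawn with five control units (CU) and five processing elements (PE).
Shared Memory: the processing elements use one SHARED MEMORY and a PROCESSOR INTERCONNECTION NETWORK.
Local Memory: each processing element has its own memory (M), and the elements are linked by a PROCESSOR INTERCONNECTION NETWORK.]
CU – Control Unit
PE – Processing element
M – Memory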


Shared Memory Multiprocessor

[Figure: P1/M1, P2/M2, ..., PN/MN attached to an Interconnection Network. M = memory, P = processor]
Processor(s) and memory are in the same computer.
Processors share a single global memory.
Processors communicate through shared data values.
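A minimal sketch of communication through shared data values, using Python threads as stand-ins for processors. The variable names and the choice of a lock to protect the shared value are illustrative assumptions, not part of the slides.

```python
import threading

shared = {"total": 0}      # the single global memory, visible to every thread
lock = threading.Lock()    # serialises updates so that no increment is lost


def worker(values) -> None:
    """Each 'processor' adds its share of the work into the shared total."""
    for v in values:
        with lock:
            shared["total"] += v


chunks = [range(0, 250), range(250, 500), range(500, 750), range(750, 1000)]
threads = [threading.Thread(target=worker, args=(chunk,)) for chunk in chunks]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(shared["total"])  # 499500, the sum of 0..999
```

No data is sent explicitly: every worker reads and writes the same location, which is why the updates need synchronisation.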

Parallel architecture: Multi-computers
• A multi-computer is a multiple-CPU computer without shared memory.
• Each processor only has direct access to its own local memory.
• The same address on different processors refers to two different physical memory locations.
• Since there is no shared memory, processors interact with each other by passing messages.


Multicomputer

[Figure: P1/M1, P2/M2, ..., PN/MN attached to an Interconnection Network. M = memory, P = processor]
Multiple computers are connected through an interconnection network.
Each computer consists of a local CPU and memory units.
A processor only has direct access to its own local memory.
If a processor requires data from other computers, the processors communicate with each other using a message passing mechanism.
A multicomputer is also known as a cluster computer.
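The message passing idea can be sketched with threads and queues: each simulated computer touches only its own local memory and exchanges values solely through messages. This is an illustration under stated assumptions (threads standing in for computers, `queue.Queue` standing in for the network). A real cluster would typically use a message passing library instead.

```python
import queue
import threading


def node(rank, local_memory, inboxes, results) -> None:
    """One simulated computer: reads only local_memory, communicates via messages."""
    partial = sum(local_memory)              # uses local memory only
    if rank == 0:
        total = partial
        for _ in range(len(inboxes) - 1):    # one message expected from each other node
            sender, value = inboxes[0].get() # blocking receive
            total += value
        results.put(total)
    else:
        inboxes[0].put((rank, partial))      # send the partial sum to node 0


local_memories = [[1, 2, 3], [10, 20], [100]]      # one private memory per node
inboxes = [queue.Queue() for _ in local_memories]  # one inbox per node
results = queue.Queue()

threads = [
    threading.Thread(target=node, args=(rank, mem, inboxes, results))
    for rank, mem in enumerate(local_memories)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

total = results.get()
print(total)  # 136
```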

Distributed Multiprocessors
• Another name for the distributed multiprocessor architecture:
  – Non-Uniform Memory Access (NUMA)
• Components in the architecture:
  – An interconnection bus
  – More than one (CPU + Cache + Memory + I/O devices) group connected to the bus.
• Therefore, memory and I/O devices are distributed.
• The distributed collection of memories forms one logical address space.
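One way to picture a single logical address space built from distributed memories is a mapping from a global address to a (node, local offset) pair. The sketch below assumes equal-sized memories laid out in contiguous blocks, and the cost figures are invented for illustration. Neither assumption comes from the slides.

```python
NODE_MEMORY_WORDS = 1024  # assumed size of each node's local memory, in words


def locate(global_address: int, num_nodes: int):
    """Map a global address to (home node, local offset) under block partitioning."""
    if not 0 <= global_address < num_nodes * NODE_MEMORY_WORDS:
        raise ValueError("address outside the logical address space")
    return divmod(global_address, NODE_MEMORY_WORDS)


def access_cost(requesting_node: int, global_address: int, num_nodes: int,
                local_cost: int = 1, remote_cost: int = 10) -> int:
    """Non-uniform access: a local reference is cheaper than a remote one (assumed costs)."""
    home, _ = locate(global_address, num_nodes)
    return local_cost if home == requesting_node else remote_cost


print(locate(2050, 4))          # (2, 2): word 2 of node 2's memory
print(access_cost(2, 2050, 4))  # 1, the address is local to node 2
print(access_cost(0, 2050, 4))  # 10, node 0 must go through the interconnect
```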


Distributed Shared Memory Multiprocessor

[Figure: P1/M1, P2/M2, ..., PN/MN attached to an Interconnection Network. M = memory, P = processor]
Distributed Shared Memory Multiprocessor:
All CPUs share access to what is virtually a single memory.
Processors communicate through shared data values.

Commercial vs Commodity
• Commercial:
  – Custom machine and switching network
  – Good balance between processor speed and communication network speed.
• Commodity:
  – Constructed from mass-produced equipment (computers, switches, etc.)
  – Less expensive
  – Higher latency and lower communication bandwidth.

Asymmetrical Multi-computers
• A front-end computer interacts with users and I/O devices.
• Back-end computers are exclusively for executing parallel programs.
• Advantages:
  – Simple OS in back-end computers.
  – Easier to understand, model and tune the performance of a parallel application.

Asymmetrical Multi-computers
• Disadvantages:
  – Single point of failure. If the front-end computer is down, the whole system is down.
  – Scalability is limited by the performance capability of the front-end computer.
  – The simple OS in back-end computers makes debugging more difficult. The front end is always needed to display debugging reports or messages.
  – Requires two distinct programs for every parallel application: front-end and back-end versions.

Symmetrical Multi-computers
• Every computer executes the same operating system and packages. Every computer has identical functionality.
• Users may log in and develop programs on any computer.
• Advantages:
  – Overcomes the disadvantages of the asymmetrical multi-computer mentioned earlier.

Symmetrical Multi-computers
• Disadvantages:
  – Difficult to maintain the illusion of “a single multi-computer” when users can log in to any machine.
  – Difficult to have consistent system performance because the workload of each computer varies with time.
  – Difficult to achieve high performance from parallel programs because processes must compete with other processes for resources (especially CPU and cache memory).

Parallel Architecture: Interconnection Networks
• Multi-processor computers use an interconnection network to allow:
  – Interaction between processors
  – Memory access by processors
• Two main principles of interconnection networks:
  – Shared Medium
  – Switched Medium

Interconnection Network
• Shared medium
  – Allows only one message at a time to be sent.
  – Procedure of message transmission:
    • Listen & Send (broadcast)
    • If a message collision has occurred, Wait (random delay) & Resend.
  – Issue: Performance degradation caused by message collisions on a heavily utilized shared medium.
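The listen, send, wait and resend procedure can be sketched as a slotted simulation. The model is a deliberate simplification and every parameter (slot-based time, a uniform random delay of 1 to `max_backoff` slots, a fixed seed) is an assumption for illustration only. Senders that find the medium idle in the same slot all transmit, and that is what produces a collision.

```python
import random


def simulate_shared_medium(num_senders: int, max_backoff: int = 8,
                           seed: int = 0, max_slots: int = 10_000):
    """Return (delivery order, number of collisions, slots used)."""
    rng = random.Random(seed)
    next_attempt = {s: 0 for s in range(num_senders)}  # every sender is ready at slot 0
    delivered = []
    collisions = 0
    slot = 0
    while next_attempt and slot < max_slots:
        ready = [s for s, t in next_attempt.items() if t == slot]
        if len(ready) == 1:
            # Exactly one sender used the medium: its message gets through.
            delivered.append(ready[0])
            del next_attempt[ready[0]]
        elif len(ready) > 1:
            # Collision: each sender involved waits a random delay, then resends.
            collisions += 1
            for s in ready:
                next_attempt[s] = slot + rng.randint(1, max_backoff)
        slot += 1
    return delivered, collisions, slot


delivered, collisions, slots = simulate_shared_medium(num_senders=4)
print(f"delivered={delivered} collisions={collisions} slots={slots}")
```

Raising `num_senders` while keeping `max_backoff` fixed tends to increase the collision count, which is the performance degradation the slide warns about.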

Interconnection Network
• Switched medium
  – Supports point-to-point message transmission among processors.
  – Supports concurrent transmission of multiple messages among different pairs of processors.
  – Supports scaling of the interconnection network to accommodate a greater number of processors.


Interconnection Network

Remote data is commonly required during computation, so communication (a connection) is needed.
The network connects processors to processors (distributed memory architecture) and provides a physical path for messages sent from one processor (computer) to another.
If there are $N$ nodes in the network, $N(N-1)/2$ wires (links) are required to ensure that every node has a direct connection to each of the other nodes.
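A quick calculation shows how fast the number of links grows when every node is directly connected to every other node. The function name is a hypothetical helper.

```python
def full_connection_links(n: int) -> int:
    """Links needed so that each of n nodes has a direct link to every other node."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return n * (n - 1) // 2


for n in (4, 16, 64, 1024):
    print(n, full_connection_links(n))
# 4 -> 6, 16 -> 120, 64 -> 2016, 1024 -> 523776
```

The quadratic growth is why large machines do not use a fully connected network.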


Interconnection Network

If $N$ is large, a direct connection between every pair of processors is rarely used in practice.
Instead, nodes are connected indirectly through a switched interconnection arranged according to a network topology.

Switch Network Topologies
• Categories of switch network topology:
  – Direct topology
    • The ratio of switch nodes to processor nodes is 1 (number of switch nodes = number of processors).
  – Indirect topology
    • The ratio of switch nodes to processor nodes is greater than 1.


Some common network topologies

Ismail Fauzi Isnin SCSP3133: High Performance Data Processing

[Figure: example networks with the following labels]
• Direct topology: 5 switch nodes, 5 processor nodes
• Direct topology: 16 switch nodes, 16 processor nodes
• Indirect topology: 9 switch nodes, 8 processor nodes
• Direct topology, shared medium (no switch): 8 processor nodes
• Direct topology: 8 switch nodes, 8 processor nodes
• Direct topology: 16 switch nodes, 16 processor nodes


Some common network topologies

[Figure: example networks with the following labels]
• Indirect topology: 32 switch nodes, 8 processor nodes
• Indirect topology: 15 switch nodes, 8 processor nodes
• Direct topology: 4 switch nodes, 4 processor nodes; 16 switch nodes, 16 processor nodes; 8 switch nodes, 8 processor nodes; 2 switch nodes, 2 processor nodes


Interconnection Network Topology Properties
Diameter ($D$): the number of links on the shortest route between the farthest pair of nodes. This determines the maximum distance/delay a message may travel/experience. It also represents the performance lower bound.
Bisection width ($W$): the minimum number of links that must be removed to split the network into two subnetworks of equal size. This determines the ability to support simultaneous global communication.


Interconnection Network Topology Properties
Bisection bandwidth: the maximum bit rate that can be transmitted between the two halves of the network.
Edges per switch node ($E$): the maximum number of edges (links) attached to any switch node (vertex). Ideally $E$ is a constant independent of network size.
Edge length ($L$): the maximum physical length of any wire (link). Ideally $L$ is a constant independent of network size.
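For a small network, the diameter, bisection width and edges per node can be computed directly from these definitions. The sketch below is illustrative only. It assumes a connected direct topology stored as adjacency lists, uses a 3-dimensional hypercube as the example, and finds the bisection width by brute force over every equal-size split, which is feasible only for tiny networks. All function names are hypothetical.

```python
from collections import deque
from itertools import combinations


def hypercube(d: int) -> dict:
    """Adjacency lists of a d-dimensional hypercube: neighbours differ in one bit."""
    return {v: [v ^ (1 << b) for b in range(d)] for v in range(2 ** d)}


def diameter(adj: dict) -> int:
    """Longest shortest path, found with a breadth-first search from every node."""
    def eccentricity(src):
        dist = {src: 0}
        pending = deque([src])
        while pending:
            u = pending.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    pending.append(w)
        return max(dist.values())
    return max(eccentricity(v) for v in adj)


def bisection_width(adj: dict) -> int:
    """Minimum number of links cut by any split into two equal halves (brute force)."""
    nodes = sorted(adj)
    best = None
    for part in combinations(nodes, len(nodes) // 2):
        side = set(part)
        cut = sum(1 for u in side for w in adj[u] if w not in side)
        best = cut if best is None else min(best, cut)
    return best


def edges_per_node(adj: dict) -> int:
    """Maximum number of links attached to any node."""
    return max(len(neighbours) for neighbours in adj.values())


cube = hypercube(3)
print(diameter(cube), bisection_width(cube), edges_per_node(cube))  # 3 4 3
```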

Switch Network Topologies
• 2D Mesh Network
  $N = d^2$, where $N$ is the number of processor nodes in the network.
  $S = N$, where $S$ is the number of switch nodes in the network.
  Direct topology
  $D = 2(\sqrt{N} - 1)$
  $W = \sqrt{N}$
  $E = 4$

[Figure: Mesh Network, 2 Dimensional]

Switch Network Topologies
• Binary Tree Network
  $N = 2^d$
  $S = 2N - 1$
  Indirect topology
  $D = 2\log_2 N$
  $W = 1$
  $E = 3$
  Constant edge length ($L$): No

Switch Network Topologies
• Butterfly Network
  $N = 2^d$
  $S = N(\log_2 N + 1)$
  Indirect topology
  $D = \log_2 N$
  $W = N/2$
  $E = 4$
  Constant edge length ($L$): No

[Figure: Butterfly network with 8 processor nodes and 32 switch nodes]

Switch Network Topologies
• Hypercube Network
  Also known as Binary n-Cube
  $N = 2^d$
  $S = N$
  Direct topology
  $D = \log_2 N$
  $W = N/2$
  $E = \log_2 N$
  Constant edge length ($L$): No
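The formulas listed for the 2D mesh, binary tree, butterfly and hypercube networks can be collected into one helper so that the topologies are easy to compare for a given number of processor nodes. This is a sketch that simply encodes the slide formulas. The function name and the dictionary layout are assumptions.

```python
import math


def topology_properties(kind: str, n: int) -> dict:
    """Return S, D, W and E for n processor nodes, using the slide formulas."""
    if kind == "mesh2d":
        side = math.isqrt(n)
        if side * side != n:
            raise ValueError("a 2D mesh needs n to be a perfect square")
        return {"S": n, "D": 2 * (side - 1), "W": side, "E": 4}
    d = n.bit_length() - 1          # d = log2(n) when n is a power of two
    if n < 1 or 2 ** d != n:
        raise ValueError("this topology needs n to be a power of two")
    if kind == "binary_tree":
        return {"S": 2 * n - 1, "D": 2 * d, "W": 1, "E": 3}
    if kind == "butterfly":
        return {"S": n * (d + 1), "D": d, "W": n // 2, "E": 4}
    if kind == "hypercube":
        return {"S": n, "D": d, "W": n // 2, "E": d}
    raise ValueError(f"unknown topology: {kind}")


for kind in ("mesh2d", "binary_tree", "butterfly", "hypercube"):
    print(kind, topology_properties(kind, 16))
```

With 8 processor nodes the helper gives 15 switch nodes for the binary tree and 32 for the butterfly, matching the counts labelled in the figures.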

Hypercube connected architectures
[Figure: hypercubes of dimension 0-D, 1-D, 2-D, 3-D and 4-D]


Interconnection Network Design Issues
The design of an interconnection network can influence the algorithm, implementation and performance of parallel programs. Four interconnection network design issues are:
• Bandwidth: the number of bits that can be transmitted in unit time, e.g. bits/sec
• Cost of construction
• Ease of construction


Interconnection Network Design Issues

Communication Latency ($T$)
$$T = t_s + t_w L$$
where
$T$ = total time required to send data from source to destination.
$t_s$ = message latency (startup time): time to send a zero-length message.
$t_w$ = network latency: time required to send a word of data.
$L$ = length of data in words.
Note: $t_s \gg t_w$ for most real parallel systems.
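A short calculation with the latency model shows why a large startup time favours a few large messages over many small ones. The values of `T_S` and `T_W` below are made-up illustrative numbers, not measurements from any real system.

```python
def communication_time(t_s: float, t_w: float, length_words: int) -> float:
    """T = t_s + t_w * L, in seconds."""
    return t_s + t_w * length_words


T_S = 50e-6    # assumed startup time: 50 microseconds
T_W = 0.1e-6   # assumed per-word time: 0.1 microseconds

# Send 1000 words as one message, or as 1000 one-word messages.
one_message = communication_time(T_S, T_W, 1000)
many_messages = 1000 * communication_time(T_S, T_W, 1)

print(f"one message:   {one_message * 1e6:.1f} us")
print(f"1000 messages: {many_messages * 1e6:.1f} us")
```

Because the startup time is paid once per message, batching data into fewer messages reduces the total communication time under this model.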


Examples of parallel computers
1. The Cosmic Cube
2. Touchstone Delta & Intel Personal Super Computer (iPSC)
3. The Beowulf Project
4. Accelerated Strategic Computing Initiative Project (ASCI)


The Cosmic Cube
Built in 1983.
Constructed out of 64 Intel 8086 microprocessors.
Computing speed: 5 to 10 MFlops (flops means floating-point operations per second).


Touchstone Delta & Intel Personal Super Computer (iPSC)

URL for further information about iPSC :

https://en.wikipedia.org/wiki/Intel_iPSC

Intel announced the iPSC/1 in 1985, with 32 to 128 nodes connected with Ethernet into a hypercube.

The Intel iPSC/2 was announced in 1987. The base setup was one cabinet with 16 Intel 80386 processors at 16 MHz, each with 4 MB of memory and an 80387 coprocessor on the same module.
Instead of Ethernet, a custom Direct-Connect Module with eight channels of about 2.8 Mbyte/s data rate each was used for hypercube interconnection.

Intel announced the iPSC/860 in 1990. The iPSC/860 consisted of up to 128 processing elements connected in a hypercube, each element consisting of an Intel i860 at 40–


The Beowulf Project
Built in 1994, at NASA’s Goddard Space Flight Center.
Constructed from commodity computers and freely available software.
Main issue: Computing speed and communication speed are not balanced.
Advantage: Lower cost and rapidly improving performance.


Accelerated Strategic Computing Initiative Project (ASCI)
A project by the U.S. Department of Energy.
Developed a series of five supercomputers:
1. ASCI Red, 1997-2005, 2 TFlops.
2. ASCI Blue Pacific, 1998, 3.9 TFlops.
3. ASCI Blue Mountain, 1999, 2.5 TFlops.
4. ASCI White, 2001, 12.3 TFlops.
5. ASCI Purple, 2005-2010, 100 TFlops.


What is the fastest supercomputer nowadays?
Let’s find out from the TOP500 list website:
http://www.top500.org


