Outline Flynn’s Taxonomy Parallel Computer Architectures Interconnection Network Some parallel computers in this world
SCSP3133: High Performance Data
Processing / SCSR3233: High
Performance & Parallel Computing
Module 2: Parallel Computers Architectures
School of Computing, Faculty of Engineering,
Universiti Teknologi Malaysia
November 2, 2020
Outline
1. Flynn’s Taxonomy
2. Parallel Architectures
3. Interconnection Networks
   1. Network Design Issues
   2. Network Topology Properties
   3. Network Topology
Flynn’s Taxonomy
Flynn’s Taxonomy is a classification of computer systems by the number of instruction streams and data streams:
SISD: single instruction stream, single data stream.
SIMD: single instruction stream, multiple data streams.
MISD: multiple instruction streams, single data stream.
MIMD: multiple instruction streams, multiple data streams.
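The four categories follow mechanically from the two stream counts. A minimal sketch, assuming a hypothetical helper `flynn_class` (not part of the course material) that takes the number of instruction streams and data streams:

```python
def flynn_class(instruction_streams: int, data_streams: int) -> str:
    """Return the Flynn category name for the given stream counts."""
    if instruction_streams < 1 or data_streams < 1:
        raise ValueError("stream counts must be at least 1")
    # "S" for a single stream, "M" for multiple streams.
    first = "S" if instruction_streams == 1 else "M"
    second = "S" if data_streams == 1 else "M"
    return f"{first}I{second}D"


for counts in [(1, 1), (1, 4), (3, 1), (8, 8)]:
    print(counts, flynn_class(*counts))
```

The helper only names the category; it says nothing about how a real machine is built.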
SISD
SISD (single instruction stream, single data stream) is a computer architecture in which a single uni-core processor executes a single instruction stream to operate on data stored in a single memory. This corresponds to the von Neumann architecture.
MISD (multiple instruction, single data) is a type of parallel computing architecture where many functional units perform different operations on the same data. Pipeline architectures belong to this type, though a purist might say that the data is different after processing by each stage in the pipeline. Fault-tolerant systems that execute the same instructions redundantly in order to detect and mask errors, in a manner known as task replication, may be considered to belong to this type.
SIMD Architecture
• Computers with this architecture have a
control unit in the CPU that performs
simultaneous operations on multiple
processing elements.
• Each processing element (PE) has its
own processing memory, and we can
say that each unit will process a
[Figure: Diagram to show SIMD architecture: one CU, five PEs, and one memory M per PE]
Where,
CU is the Control Unit in the CPU
PE is the processing element
M is the memory
MIMD Architecture
• Computers designed with this
architecture have a set of processors
that executes different operations
simultaneously on different data sets.
• There are two types of MIMD systems:
– MIMD systems with shared memory
– MIMD systems with local memory
[Figure: MIMD Architecture Diagram, two panels.
Shared Memory: five CU/PE pairs, a shared memory and a processor interconnection network.
Local Memory: five CU/PE pairs, a processor interconnection network and five memories (M).]
CU – Control Unit
PE – Processing element
M – Memory
Shared Memory Multiprocessor
[Figure: processors P1, P2, ..., PN and memories M1, M2, ..., MN connected through an interconnection network. M = memory, P = processor]
Shared Memory Multiprocessor
Processor(s) and memory
are in the same computer.
Processors share
a single global memory.
Processors communicate
through shared data values.
Parallel architecture:
Multi-computers
• A multi-computer is a multiple-CPU computer without shared memory.
• Each processor only has direct access to its own local memory.
• The same address on different processors refers to two different physical memory locations.
• Since there is no shared memory, processors interact with each other by passing messages.
Multicomputer
[Figure: computers (P1, M1), (P2, M2), ..., (PN, MN) connected through an interconnection network. M = memory, P = processor]
Multicomputer
Multiple computers are connected through an interconnection network.
Each computer consists of a local CPU and memory units.
A processor only has direct access to its own local memory.
If a processor requires data from another computer, the processors communicate with each other using a message-passing mechanism.
A multicomputer is also known as a cluster computer.
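The request/reply pattern behind message passing can be sketched in a few lines. This is only a single-machine simulation under stated assumptions: a Python thread stands in for a remote computer, two `queue.Queue` objects stand in for the interconnection network, and the names `remote_computer`, `requests` and `replies` are hypothetical. The point is that the requesting side never touches `local_memory` directly; it only sends and receives messages.

```python
import queue
import threading


def remote_computer(inbox: queue.Queue, outbox: queue.Queue) -> None:
    """Simulated computer: owns local_memory and answers one request."""
    local_memory = {"x": 42}          # private data, never accessed from outside
    key = inbox.get(timeout=5)        # receive a request message
    outbox.put(local_memory[key])     # send the reply message


requests = queue.Queue()   # messages travelling to the remote computer
replies = queue.Queue()    # messages travelling back

node = threading.Thread(target=remote_computer, args=(requests, replies))
node.start()

requests.put("x")                 # ask for the value stored under "x"
value = replies.get(timeout=5)    # wait for the answer to arrive
node.join()
print(value)
```

On a real multicomputer the queues would be replaced by a message-passing library; the queues here are an illustration, not the mechanism an actual cluster uses.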
Distributed Multiprocessors
• Another name for the distributed multiprocessor architecture:
– Non-Uniform Memory Access (NUMA)
• Components in the architecture:
– An interconnection bus
– More than one group of (CPU + cache + memory + I/O devices) connected to the bus.
• Therefore memory and I/O devices are distributed.
• The distributed collection of memories forms one logical address space.
Distributed Shared Memory Multiprocessor
[Figure: processors P1, P2, ..., PN and memories M1, M2, ..., MN connected through an interconnection network. M = memory, P = processor]
Distributed Shared Memory Multiprocessor:
All CPUs share access to what is logically a single memory.
Processors communicate through shared data values.
Commercial Vs Commodity
• Commercial:
– Custom machine and switching network
– Good balance between processor speed and
communication network speed.
• Commodity
– Constructed from mass-produced equipment (computers, switches, etc.)
– Less expensive
– Higher latency and lower communication
bandwidth.
Asymmetrical Multi-computers
• A front-end computer interacts with users and
I/O devices.
• Back-end computers are exclusively for
executing parallel programs.
• Advantages:
– Simple OS in back-end computers.
– Easier to understand, model and tune the
performance of a parallel application.
Asymmetrical Multi-computers
• Disadvantages:
– Single point of failure. If the front-end computer is down then the whole system is down.
– Scalability is limited by the performance capability of the front-end computer.
– The simple OS in the back-end computers makes debugging more difficult. The front end is always needed to display debugging reports or messages.
– Requires two distinct programs for every parallel application: front-end and back-end versions.
Symmetrical Multi-computers
• Every computer executes the same operating
system and packages. Every computer has
identical functionality.
• Users may log in and develop programs on any computer.
• Advantages:
– Overcomes the disadvantages of the asymmetrical multi-computer mentioned earlier.
Symmetrical Multi-computers
• Disadvantages:
– Difficult to maintain the illusion of “a single multi-computer” when users can log in to any machine.
– Difficult to have consistent system performance because the workload of the computers varies with time.
– Difficult to achieve high performance from parallel programs because processes must compete with other processes for resources (especially CPU and cache memory).
Parallel Architecture:
Interconnection Networks
• Multi-processor computers use an interconnection network to allow:
– Interaction between processors
– Memory access by processors
• Two main types of interconnection network:
– Shared Medium
– Switched Medium
Interconnection Network
• Shared medium
– Allows only one message to be sent at a time.
– Procedure of message transmission:
• Listen & Send (broadcast)
• If a message collision has occurred, Wait (random delay) & Resend.
– Issue: performance degrades because of message collisions on a heavily utilized shared medium.
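The send / collide / wait / resend cycle can be illustrated with a small slotted simulation. This is a simplified sketch, not a model of any real protocol: time is divided into slots, every sender starts in slot 0, the "listen" step is not modelled, and `simulate_shared_medium`, `max_backoff` and `max_slots` are hypothetical names chosen for the example.

```python
import random


def simulate_shared_medium(num_senders, max_backoff=8, seed=0, max_slots=10_000):
    """Each sender has one message; only one message fits in a time slot."""
    rng = random.Random(seed)
    next_try = {s: 0 for s in range(num_senders)}  # slot of each sender's next attempt
    delivered_at = {}                              # sender -> slot of successful send
    collisions = 0
    for slot in range(max_slots):
        if not next_try:
            break
        ready = [s for s, when in next_try.items() if when == slot]
        if len(ready) == 1:
            # Exactly one sender used the medium: the message gets through.
            delivered_at[ready[0]] = slot
            del next_try[ready[0]]
        elif len(ready) > 1:
            # Collision: every involved sender waits a random delay and resends.
            collisions += 1
            for s in ready:
                next_try[s] = slot + rng.randint(1, max_backoff)
    return {"delivered_at": delivered_at, "collisions": collisions}


result = simulate_shared_medium(5)
print(result["collisions"], "collision slots before all 5 messages got through")
```

Raising `num_senders` while keeping `max_backoff` fixed makes collisions more frequent, which is the performance degradation the slide refers to.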
Interconnection Network
• Switched medium
– Supports point-to-point message transmission among processors.
– Supports concurrent transmission of multiple messages among different pairs of processors.
– Supports scaling of the interconnection network to accommodate a greater number of processors.
Interconnection Network
Commonly, remote data is required during the computation, thus communication (connection) is needed.
The network connects processors to processors (distributed memory architecture) to provide a physical path for messages sent from one processor (computer) to another processor (computer).
If there are $N$ nodes in the network, $N(N-1)/2$ wires (links) are required to ensure every node has a direct connection to each of the other nodes.
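The count $N(N-1)/2$ is the number of unordered pairs of nodes. A short sketch, assuming a hypothetical helper `full_connection_links`, checks the formula against an explicit enumeration of pairs and shows how quickly the number of links grows:

```python
from itertools import combinations


def full_connection_links(n: int) -> int:
    """Links needed so that every one of n nodes is wired to every other node."""
    return n * (n - 1) // 2


# Cross-check against an explicit list of node pairs for a small network.
pairs = list(combinations(range(4), 2))
print(pairs)                      # the 6 links of a fully connected 4-node network
print(full_connection_links(4))

for n in (8, 64, 1024):
    print(n, "nodes need", full_connection_links(n), "links")
```

The quadratic growth is why large machines do not wire every pair of nodes directly.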
Interconnection Network
If $N$ is high, a direct connection between every pair of processors is rarely practical.
Instead, nodes are connected indirectly through a switched interconnection, arranged according to a network topology.
Switch Network Topologies
• Categories of switch network topology:
– Direct topology
• The ratio of switch nodes to processor nodes is 1, i.e. the number of switch nodes equals the number of processors.
– Indirect topology
• The ratio of switch nodes to processor nodes is greater than 1.
Some common network topologies
Ismail Fauzi Isnin SCSP3133: High Performance Data Processing
[Figure panels:]
Direct topology: 5 switch nodes, 5 processor nodes
Direct topology: 16 switch nodes, 16 processor nodes
Indirect topology: 9 switch nodes, 8 processor nodes
Direct topology: shared medium (no switch), 8 processor nodes
Direct topology: 8 switch nodes, 8 processor nodes
Direct topology: 16 switch nodes, 16 processor nodes
Some common network topologies
[Figure panels:]
Indirect topology: 32 switch nodes, 8 processor nodes
Indirect topology: 15 switch nodes, 8 processor nodes
Direct topology: 4 switch nodes, 4 processor nodes
Direct topology: 16 switch nodes, 16 processor nodes
Direct topology: 8 switch nodes, 8 processor nodes
Direct topology: 2 switch nodes, 2 processor nodes
Interconnection Network Topology Properties
Diameter ($D$) - the number of links on the shortest route between the farthest pair of nodes. This determines the maximum distance/delay a message may travel/experience. This also represents the performance lower bound.
Bisection width ($W$) - the minimum number of links that must be removed to split the network into two subnetworks of equal size. This determines the ability to support simultaneous global communication.
Interconnection Network Topology
Properties
Bisection bandwidth - the maximum bitrate that can be transmitted between the two halves of the network.
Edges per switch node ($E$) - the maximum number of edges (links) attached to any switch node (vertex). Ideally $E$ is a constant independent of network size.
Edge length ($L$) - the maximum physical length of any wire (link). Ideally $L$ is a constant independent of network size.
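Diameter and edges per switch node can be measured directly on a small network described as an adjacency list. The sketch below is an illustration under assumptions: the network is connected, the 8-node ring is an example chosen here rather than one taken from the slides, and `hops_from`, `diameter` and `max_edges_per_node` are hypothetical helper names. Diameter is found with a breadth-first search from every node.

```python
from collections import deque


def hops_from(graph, source):
    """Breadth-first search: fewest links from source to every reachable node."""
    dist = {source: 0}
    frontier = deque([source])
    while frontier:
        node = frontier.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                frontier.append(neighbour)
    return dist


def diameter(graph):
    """Shortest-route length between the farthest pair of nodes."""
    return max(max(hops_from(graph, s).values()) for s in graph)


def max_edges_per_node(graph):
    """Largest number of links attached to any single node."""
    return max(len(neighbours) for neighbours in graph.values())


# Example network: 8 switch nodes connected in a ring.
ring = {i: [(i - 1) % 8, (i + 1) % 8] for i in range(8)}
print("D =", diameter(ring), " E =", max_edges_per_node(ring))
```

Bisection width is harder to compute in general because it requires searching over ways to split the nodes, so it is left out of this sketch.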
Switch Network Topologies
• 2D Mesh Network
$N = d^2$, where $N$ is the number of processor nodes in the network.
$S = N$, where $S$ is the number of switch nodes in the network.
Direct topology
$D = 2(\sqrt{N} - 1)$
$W = \sqrt{N}$
$E = 4$
[Figure: 2-dimensional mesh network]
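The mesh formulas can be checked on a small case. A minimal sketch, assuming a mesh without wraparound links and hypothetical helpers `mesh_properties`, `mesh_hops` and `brute_force_diameter`: the shortest route between two switches is the sum of the row and column offsets, so the farthest pair is two opposite corners.

```python
import math
from itertools import product


def mesh_properties(n: int) -> dict:
    """Slide formulas for a 2D mesh with n processor nodes (n a perfect square)."""
    side = math.isqrt(n)
    if side * side != n:
        raise ValueError("n must be a perfect square")
    return {"S": n, "D": 2 * (side - 1), "W": side, "E": 4}


def mesh_hops(a, b) -> int:
    """Links on the shortest route between switches at (row, col) positions a and b."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def brute_force_diameter(side: int) -> int:
    """Largest shortest-route length over all pairs of switches."""
    nodes = list(product(range(side), repeat=2))
    return max(mesh_hops(a, b) for a in nodes for b in nodes)


print(mesh_properties(16))
print(brute_force_diameter(4))   # agrees with D for the 16-node mesh
```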
Switch Network Topologies
• Binary Tree Network
$N = 2^d$
$S = 2N - 1$
Indirect topology
$D = 2 \log_2 N$
$W = 1$
$E = 3$
Constant edge length $L$: No
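The tree figures can be reproduced by numbering the switches like a binary heap. This numbering is an assumption made for the sketch, not something the slides specify: the root is switch 1, the children of switch `k` are `2k` and `2k + 1`, and the $N$ leaf switches (one per processor) are numbered `N` to `2N - 1`. The helper names are hypothetical.

```python
def tree_switch_count(n: int) -> int:
    """Switches in a binary tree network with n processor nodes (n a power of 2)."""
    return 2 * n - 1


def tree_hops(a: int, b: int) -> int:
    """Links between switches a and b: climb towards the root until they meet."""
    hops = 0
    while a != b:
        # The larger heap index is never closer to the root, so move it up.
        if a > b:
            a //= 2
        else:
            b //= 2
        hops += 1
    return hops


def tree_diameter(n: int) -> int:
    """Longest shortest route between any two leaf switches."""
    leaves = range(n, 2 * n)
    return max(tree_hops(a, b) for a in leaves for b in leaves)


print(tree_switch_count(8), tree_diameter(8))
```

Every route between the left and right halves passes through the root, which is why the bisection width is 1.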
Switch Network Topologies
• Butterfly Network
$N = 2^d$
$S = N(\log_2 N + 1)$
Indirect topology
$D = \log_2 N$
$W = N/2$
$E = 4$
Constant edge length $L$: No
[Figure: butterfly network with 8 processor nodes and 32 switch nodes]
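The switch count and the value of $E$ can be checked by building the network. The wiring rule used below is the usual textbook one and is an assumption here, since the slide only gives the totals: switches sit in $\log_2 N + 1$ ranks of $N$ columns, and the switch at rank `r`, column `c` links to the next rank at column `c` and at the column obtained by flipping one bit of `c`. The function name `butterfly` is hypothetical.

```python
def butterfly(n: int):
    """Return (number of switches, max links per switch) for n processor nodes."""
    d = n.bit_length() - 1
    if n < 1 or (1 << d) != n:
        raise ValueError("n must be a power of 2")
    # d + 1 ranks, each holding n switches.
    switches = [(rank, col) for rank in range(d + 1) for col in range(n)]
    links = {s: 0 for s in switches}
    for rank in range(d):
        for col in range(n):
            straight = col
            cross = col ^ (1 << (d - 1 - rank))   # flip one bit of the column
            for target in (straight, cross):
                links[(rank, col)] += 1
                links[(rank + 1, target)] += 1
    return len(switches), max(links.values())


print(butterfly(8))
```

For 8 processor nodes this gives the 32 switch nodes shown in the figure, with at most 4 links on any switch.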
Switch Network Topologies
• Hypercube Network
Also known as Binary n-Cube
$N = 2^d$
$S = N$
Direct topology
$D = \log_2 N$
$W = N/2$
$E = \log_2 N$
Constant edge length $L$: No
[Figure: hypercube-connected architectures of dimension 0, 1, 2, 3 and 4]
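In a hypercube the node labels themselves describe the wiring: with nodes numbered 0 to $N-1$, two nodes are linked exactly when their binary labels differ in one bit. A minimal sketch with hypothetical helper names shows the neighbour rule and checks $D$, $E$ and $W$ for a small cube:

```python
def hypercube_neighbors(node: int, d: int) -> list:
    """Labels of the d nodes directly linked to node in a d-dimensional hypercube."""
    return [node ^ (1 << bit) for bit in range(d)]


def hypercube_hops(a: int, b: int) -> int:
    """Shortest route length: the number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


d = 3
n = 1 << d   # N = 2^d = 8 nodes
diameter = max(hypercube_hops(a, b) for a in range(n) for b in range(n))
# Links cut when the nodes are split by their highest label bit.
bisection = sum(1 for a in range(n // 2) if (a | (n // 2)) in hypercube_neighbors(a, d))

print(hypercube_neighbors(5, d))
print("D =", diameter, " E =", d, " W =", bisection)
```

Each hop on a shortest route corrects one differing bit, which is why the diameter equals the number of dimensions.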
Interconnection Network Design Issues
The design of an interconnection network can influence the algorithm, implementation and performance of parallel programs. Four interconnection network design issues are:
Bandwidth - number of bits that can be transmitted in unit time, e.g. bits/sec
Cost of construction
Ease of construction
Interconnection Network Design Issues
Communication Latency ($T$)
$$T = t_s + t_w L$$
where
$T$ = total time required to send data from source to destination.
$t_s$ = message latency (startup time): time to send a zero-length message.
$t_w$ = network latency: time required to send a word of data.
$L$ = length of data in words.
Note: $t_s \gg t_w$ for most real parallel systems.
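Because the startup time dominates, the latency model favours a few long messages over many short ones. A small sketch, using made-up values of 100 time units for the startup time and 1 time unit per word (these numbers are assumptions for illustration, not measurements), compares sending 1000 words in one message against sending them one word at a time:

```python
def communication_latency(t_s, t_w, length_words):
    """T = t_s + t_w * L, all times in the same unit."""
    return t_s + t_w * length_words


T_S = 100   # hypothetical startup time per message
T_W = 1     # hypothetical time per word

one_long_message = communication_latency(T_S, T_W, 1000)
many_short_messages = 1000 * communication_latency(T_S, T_W, 1)

print(one_long_message)      # one startup cost plus 1000 words
print(many_short_messages)   # 1000 startup costs plus 1000 words
```

With these values the single message takes 1100 time units and the thousand short messages take 101000, so combining data into fewer messages pays off whenever the startup time is large compared with the per-word time.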
Example of parallel computers
1. The Cosmic Cube
2. Touchstone Delta & Intel Personal Super Computer (iPSC)
3. The Beowulf Project
4. Accelerated Strategic Computing Initiative Project (ASCI)
The Cosmic Cube
Built in 1983.
Constructed out of 64 Intel 8086
microprocessors.
Computing speed: 5 to 10 MFlops (flops means floating-point operations per second).
Touchstone Delta & Intel Personal Super
Computer (iPSC)
URL for further information about iPSC :
https://en.wikipedia.org/wiki/Intel_iPSC
• Intel announced the iPSC/1 in 1985, with 32 to 128 nodes connected with Ethernet into a hypercube.
• The Intel iPSC/2 was announced in 1987.
• The base setup was one cabinet with 16 Intel 80386 processors at 16 MHz, each with 4 MB of memory and an 80387 coprocessor on the same module.
• Instead of Ethernet, a custom Direct-Connect Module with eight channels of about 2.8 Mbyte/s data rate each was used for hypercube interconnection.
• Intel announced the iPSC/860 in 1990. The iPSC/860 consisted of up to 128 processing elements connected in a hypercube, each element consisting of an Intel i860 at 40–
The Beowulf Project
Built in 1994, at NASA’s Goddard Space Flight
Center.
Constructed from commodity computers and freely available software.
Main Issue: Computing speed and
communication speed are not balanced.
Advantage: Lower cost and rapidly improving
performance.
Accelerated Strategic Computing Initiative
Project (ASCI)
A project by the U.S. Department of Energy.
Developed a series of five supercomputers:
1. ASCI Red, 1997-2005, 2 TFlops.
2. ASCI Blue Pacific, 1998, 3.9 TFlops.
3. ASCI Blue Mountain, 1999, 2.5 TFlops.
4. ASCI White, 2001, 12.3 TFlops.
5. ASCI Purple, 2005-2010, 100 TFlops.
What is the fastest supercomputer nowadays?
Let's find out from the TOP500 list website:
http://www.top500.org