Esteban Meneses, PhD
School of Computing, Costa Rica Institute of Technology
Scientific Computing Programming
with Parallel Objects
Parallel Architectures Galore
Personal Computing
Embedded Computing
Mobile Computing
Supercomputing
Moore’s Law
My Parallel Laptop
Processor (multicore): Intel Core i7, 4 cores, 2.5 GHz
Accelerator (manycore): NVIDIA GeForce GT 750M, 384 cores, 967 MHz
It’s movie time!
Speedup
Heat Transfer Problem
Speedup: Sequential 1, Multicore 3.7, Manycore 68.78
Time (seconds): Multicore 8.83, Manycore 0.475
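The speedup figures follow from dividing the sequential time by the parallel time. A minimal sketch of that arithmetic is given here. The sequential time is not legible in the slide, so the value of about 32.67 s is an assumption back-computed from the reported speedups, and the names `T_SEQUENTIAL` and `speedup` are illustrative.

```python
# Speedup = sequential time / parallel time.
# T_SEQUENTIAL is NOT given on the slide; it is an assumed value
# back-computed from the reported speedups (8.83 s * 3.7 ~= 32.67 s).
T_SEQUENTIAL = 32.67  # seconds (assumption)

# Parallel run times in seconds, as reported on the slide.
times = {"Multicore": 8.83, "Manycore": 0.475}


def speedup(t_sequential: float, t_parallel: float) -> float:
    """Return how many times faster the parallel run is."""
    return t_sequential / t_parallel


speedups = {name: speedup(T_SEQUENTIAL, t) for name, t in times.items()}
for name, s in speedups.items():
    print(f"{name}: {s:.2f}x")
```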
Supercomputer
Top500
Source: http://www.top500.org (June 2015)
Exascale
Challenges:
• Heterogeneity
• Low resilience
• Thermal variation
• Irregular computation
• Programmability
Big Data
Big Network (Internet of Things)
Big Compute (Exascale)
Big Intelligence
Source: http://www.top500.org (June 2015)
Single Program Multiple Data (SPMD)
Sequential: CPU
Parallel: CPU, CPU (data decomposition + communication via send/receive)
MPI:
• Poor functional decomposition
• Synchronized communication
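The SPMD pattern above can be sketched without an MPI installation. In this hedged example, Python threads stand in for ranks and `queue.Queue` objects stand in for send/receive; the names `spmd_program` and `run_spmd` are hypothetical. Every rank runs the same function on its own block of the data, and rank 0 blocks on each receive, which illustrates the synchronized communication the slide criticizes.

```python
import queue
import threading


def spmd_program(rank, size, data, inboxes, result):
    """The same program runs on every rank; only `rank` differs."""
    # Data decomposition: each rank owns one contiguous block.
    chunk = (len(data) + size - 1) // size
    local = data[rank * chunk:(rank + 1) * chunk]
    partial = sum(local)

    if rank == 0:
        total = partial
        for _ in range(size - 1):
            # Blocking receive: rank 0 waits until a message arrives.
            total += inboxes[0].get()
        result.append(total)
    else:
        # Send the partial sum to rank 0.
        inboxes[0].put(partial)


def run_spmd(data, size=4):
    """Launch `size` ranks (threads here, processes under real MPI)."""
    inboxes = [queue.Queue() for _ in range(size)]
    result = []
    threads = [
        threading.Thread(target=spmd_program,
                         args=(rank, size, data, inboxes, result))
        for rank in range(size)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result[0]


total = run_spmd(list(range(100)))
print(total)
```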
Parallel Objects
NAMD
Charm++
Entities and interactions
Asynchronous communication
Flexible Distribution
Non-blocking communication operations
Source: http://charm.cs.illinois.edu
Parallel Objects Model
• An application is decomposed into wudus (work and data units).
• Objects are reactive entities that expose an interface of remote methods.
• All message-passing operations are non-blocking: asynchronous method invocation.
• A message-driven execution similar to Active Messages.
• Objects know how to serialize and deserialize themselves through the pack-unpack (PUP) framework.
Goals:
  ❖ Latency hiding
  ❖ Load balancing
  ❖ Adaptivity
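A minimal single-process sketch of message-driven execution with asynchronous method invocation might look as follows. It is not Charm++ code: the `Scheduler` and `PingPong` classes are hypothetical and only illustrate that invoking a method enqueues a message and returns at once, while objects run only when a message is delivered to them.

```python
from collections import deque


class Scheduler:
    """Toy message scheduler: one queue of pending method invocations."""

    def __init__(self):
        self.messages = deque()

    def send(self, target, method, *args):
        # Asynchronous method invocation: enqueue and return immediately.
        self.messages.append((target, method, args))

    def run(self):
        # Message-driven execution: deliver messages until none remain.
        while self.messages:
            target, method, args = self.messages.popleft()
            getattr(target, method)(*args)


class PingPong:
    """A reactive object: it only does work when a message arrives."""

    def __init__(self, scheduler):
        self.scheduler = scheduler
        self.peer = None
        self.log = []

    def ping(self, hops):
        self.log.append(hops)
        if hops > 0:
            # Non-blocking: this returns before `peer.ping` executes.
            self.scheduler.send(self.peer, "ping", hops - 1)


sched = Scheduler()
a, b = PingPong(sched), PingPong(sched)
a.peer, b.peer = b, a
sched.send(a, "ping", 3)
sched.run()
print(a.log, b.log)
```

Because senders never wait, a real runtime can overlap one object's communication with another object's computation, which is where the latency-hiding goal comes from.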
Introspective Runtime System
• A thin layer between the application and the machine.
• Based on object-based overdecomposition: many more objects than processing entities.
• Components:
  • Message scheduler.
  • Routing tables.
  • Load and communication monitoring.
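To make the components concrete, here is a hedged sketch of a routing table and load monitor over an overdecomposed set of objects. The `Runtime` class, the round-robin initial placement, and the choice of eight objects on four nodes are assumptions for illustration, not the Charm++ implementation.

```python
class Runtime:
    """Toy runtime layer: routing table plus per-object load monitoring."""

    def __init__(self, nodes, num_objects):
        self.nodes = list(nodes)
        # Routing table: object id -> node currently hosting it.
        # Round-robin placement is an assumption for this sketch.
        self.location = {
            obj: self.nodes[obj % len(self.nodes)]
            for obj in range(num_objects)
        }
        # Load monitoring: accumulated work per object.
        self.load = {obj: 0.0 for obj in range(num_objects)}

    def deliver(self, obj, work):
        """Look up the hosting node and record the cost of the message."""
        node = self.location[obj]
        self.load[obj] += work
        return node

    def node_loads(self):
        """Aggregate the monitored object loads per node."""
        totals = {node: 0.0 for node in self.nodes}
        for obj, work in self.load.items():
            totals[self.location[obj]] += work
        return totals


# Overdecomposition: eight objects on four processing entities.
rt = Runtime(nodes="ABCD", num_objects=8)
for obj in range(8):
    rt.deliver(obj, work=obj + 1)
print(rt.node_loads())
```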
[Figure: objects mapped onto Node A, Node B, Node C, and Node D]
Migration
• The underlying system consists of a collection of processing entities (processors, or nodes).
• Objects are distributed among the processing entities. That assignment may change dynamically if load imbalance arises.
• An introspective runtime system detects performance bottlenecks and balances load by moving objects around.
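Migration can be pictured as packing an object at its source node, rebuilding it at the destination, and updating the location table. The sketch below is an illustration under stated assumptions: `pickle` stands in for the pack-unpack step, and the `Chunk` class, the `migrate` function, and the dictionary-based nodes are all hypothetical.

```python
import pickle


class Chunk:
    """A migratable object holding some application state."""

    def __init__(self, ident, values):
        self.ident = ident
        self.values = values

    def pack(self):
        # Serialize the object's state into bytes (stand-in for PUP).
        return pickle.dumps(self.__dict__)

    @classmethod
    def unpack(cls, payload):
        # Rebuild an equivalent object from the serialized state.
        obj = cls.__new__(cls)
        obj.__dict__.update(pickle.loads(payload))
        return obj


def migrate(nodes, location, ident, dest):
    """Move object `ident` to node `dest` and update the routing table."""
    src = location[ident]
    payload = nodes[src].pop(ident).pack()      # pack and remove at source
    nodes[dest][ident] = Chunk.unpack(payload)  # rebuild at destination
    location[ident] = dest


# Two nodes, three objects that all start on node A.
nodes = {"A": {}, "B": {}}
location = {}
for ident in range(3):
    nodes["A"][ident] = Chunk(ident, [ident] * 4)
    location[ident] = "A"

migrate(nodes, location, ident=2, dest="B")
print(location)
```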
[Figure: objects placed on Node A, Node B, Node C, and Node D]
Dynamic Load Balancing
• NP-complete problem.
• Runtime system collects load information and communication graph.
• Greedy strategies, graph partitioning.
• Runtime system shuffles objects around to avoid overloading.
• Principle of persistence.
• Based on PUP framework.
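One common greedy heuristic assigns the heaviest remaining object to the currently least-loaded processor. The sketch below shows that idea only; it is not claimed to be the strategy any particular runtime ships, it ignores the communication graph, and the function name `greedy_balance` and the sample loads are hypothetical. Under the principle of persistence, the loads measured so far are taken as a forecast of the near future.

```python
import heapq


def greedy_balance(object_loads, num_procs):
    """Assign the heaviest remaining object to the least-loaded processor."""
    # Min-heap of (current load, processor id).
    heap = [(0.0, proc) for proc in range(num_procs)]
    heapq.heapify(heap)

    assignment = {}
    by_weight = sorted(object_loads.items(), key=lambda kv: kv[1],
                       reverse=True)
    for obj, load in by_weight:
        total, proc = heapq.heappop(heap)
        assignment[obj] = proc
        heapq.heappush(heap, (total + load, proc))

    # Recompute the resulting load of each processor.
    proc_loads = [0.0] * num_procs
    for obj, proc in assignment.items():
        proc_loads[proc] += object_loads[obj]
    return assignment, proc_loads


# Hypothetical measured loads per object (arbitrary time units).
measured = {"a": 8.0, "b": 7.0, "c": 6.0, "d": 5.0, "e": 4.0, "f": 2.0}
assignment, proc_loads = greedy_balance(measured, num_procs=2)
print(assignment, proc_loads)
```

The result is a heuristic, not an optimum, which is consistent with the problem being NP-complete.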
Charm++
• Actively developed since the mid-1990s.
• Features language extensions, network layers, load balancers, tools, and several applications.
• Objects are called chares.
• Chare arrays are the main collection of objects.
Charm++ (cont.)
Charm++ Runtime System
MPI vs Charm++
Feature                     MPI      Charm++
Over-decomposition          No*      Yes
Load Balancing              No*      Yes
Fault Tolerance             No*      Yes
Non-blocking Collectives    Yes**    Yes
Dynamic Adaptivity          No       Yes
Introspection               No       Yes
Wide Adoption               Yes      No
* Some third-party libraries may implement this feature.
** As of MPI-3 standard.
Example: Heat Transfer Problem
Computational Fluid Dynamics
# Grids   # Particles  # Species  Required Memory (GB)  GFLOP per iteration  # Iterations  Serial Run-time (1 GFLOP/s)
10^6      6 x 10^6     9          1.69                  29.5                 60,000        20.5 days
10^6      6 x 10^6     19         2.48                  90.7                 60,000        63 days
5 x 10^6  50 x 10^6    19         24.0                  544.7                220,000       3.8 years
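The serial run-time column can be reproduced from the other columns: work per iteration times the number of iterations, divided by the 1 GFLOP/s rate stated in the header. The short sketch below checks that arithmetic; the names `cases` and `serial_days` and the 365-day year are choices made for this example.

```python
SECONDS_PER_DAY = 86_400
RATE_GFLOPS = 1.0  # serial rate stated in the table header (1 GFLOP/s)

# (GFLOP per iteration, number of iterations), one tuple per table row.
cases = [
    (29.5, 60_000),
    (90.7, 60_000),
    (544.7, 220_000),
]


def serial_days(gflop_per_iter, iterations, rate=RATE_GFLOPS):
    """Estimated serial run time in days at `rate` GFLOP/s."""
    seconds = gflop_per_iter * iterations / rate
    return seconds / SECONDS_PER_DAY


days = [serial_days(g, n) for g, n in cases]
for d in days:
    print(f"{d:.1f} days ({d / 365:.1f} years)")
```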
IPLMCFD
• IPLMCFD:
  ❖ Irregularly Portioned Lagrangian Monte Carlo Finite Difference.
  ❖ A massively parallel solver for turbulent reactive flows.
• Large eddy simulation (LES) via filtered density function (FDF).
Load Imbalance
• IPLMCFD uses a graph partitioning library (METIS) to redistribute work.
• Execution must be split into segments, with calls to repartition the cells in between.
IPLMCFD
• Goals:
  • Load balance processors through weighted graph partitioning.
  • To minimize the edge-cut.
• Irregularly shaped decompositions:
  • Disadvantages:
    • Nontrivial communication patterns
    • Increased communication cost.
  • Advantage (major):
    • Evenly distributed load among partitions.
P. H. Pisciuneri et al., SIAM J. Sci. Comput., vol. 35, no. 4, pp.
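The two quantities a weighted graph partitioner trades off, per-partition load and edge-cut, can be evaluated with a few lines of code. This sketch does not call METIS and does not compute a partition; it only scores two hand-written partitions of a hypothetical four-cell chain, and the names `edge_cut` and `partition_loads` are illustrative.

```python
def edge_cut(edges, part):
    """Sum the weights of edges whose endpoints lie in different partitions."""
    return sum(w for u, v, w in edges if part[u] != part[v])


def partition_loads(weights, part):
    """Total vertex weight (work) assigned to each partition."""
    loads = {}
    for cell, w in weights.items():
        loads[part[cell]] = loads.get(part[cell], 0) + w
    return loads


# Hypothetical chain of four cells: 0 - 1 - 2 - 3.
weights = {0: 4, 1: 1, 2: 1, 3: 2}         # work per cell
edges = [(0, 1, 1), (1, 2, 1), (2, 3, 1)]  # (cell, cell, communication weight)

regular = {0: 0, 1: 0, 2: 1, 3: 1}    # equal number of cells per partition
irregular = {0: 0, 1: 1, 2: 1, 3: 1}  # equal work per partition

for name, part in (("regular", regular), ("irregular", irregular)):
    print(name, partition_loads(weights, part), edge_cut(edges, part))
```

In this toy case the irregular split evens out the load at the same edge-cut; on realistic meshes, irregular shapes generally raise the communication cost, as the slide notes.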
Simulation of a Premixed Flame
Performance of IPLMCFD
Cost of Repartitioning
O(10^2)-O(10^3)
HPC Languages
Fortran
C/C++
Python
Chapel
UPC
HPF
CAF
Parallel Objects in Python
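# computes and patches are collections of parallel objects; i and j index them.
class Patch:
    particles = ...

    def send(self):
        # Asynchronous invocation of recv on a Compute object.
        computes[i, j].recv(self.particles)

    def update(self, part_info):
        ...

class Compute:
    def recv(self, particles):
        ...
        # Asynchronous invocation of update on both patches.
        patches[i].update(part_info)
        patches[j].update(part_info)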
Node X
Node Y
Node Z
Acknowledgments
• University of Illinois:
  ❖ Prof. Laxmikant V. Kalé (Computer Science)
• University of Pittsburgh:
  ❖ Dr. Patrick Pisciuneri (Center for Simulation and Modeling)
  ❖ Prof. Peyman Givi (Mechanical Engineering)
• Images extracted from Wikipedia, www.defenceindustrydaily.com, www.maclife.com, www.theregister.co.uk, and www.geforce.com
Conclusions
• Big potential of parallel objects in scientific computing:
  ❖ Simplified programming model
  ❖ Improved performance due to overdecomposition
  ❖ Dynamic load balancing
• Research opportunity:
  ❖ Parallel-objects abstractions in Python