Scientific Computing Programming with Parallel Objects


Academic year: 2021


Esteban Meneses, PhD

School of Computing, Costa Rica Institute of Technology


Parallel Architectures Galore

• Personal Computing
• Embedded Computing
• Mobile Computing
• Supercomputing
• Moore’s Law

My Parallel Laptop

• Processor (multicore): Intel Core i7, 4 cores, 2.5 GHz
• Accelerator (manycore): NVIDIA GeForce GT 750M, 384 cores, 967 MHz

It’s movie time!

Speedup

Heat Transfer Problem

[Figure: speedup (log scale) and execution time in seconds for the sequential, multicore, and manycore versions.]

• Speedup: Sequential 1, Multicore 3.7, Manycore 68.78
• Time (seconds): Multicore 8.83, Manycore 0.475
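Speedup is S = T_seq / T_par, and parallel efficiency is E = S / p for p processing units. The short Python sketch below checks the numbers above under two assumptions: that 8.83 s is the multicore time and 0.475 s the manycore time (the chart labels are ambiguous after extraction), and that p is taken from the core counts on the laptop slide. Under those assumptions both versions imply the same sequential time of roughly 32.7 s.

```python
# Speedups from the chart; core counts from the laptop slide.
# Assumption: 8.83 s is the multicore time and 0.475 s the manycore time.
speedup = {"Multicore": 3.7, "Manycore": 68.78}
cores = {"Multicore": 4, "Manycore": 384}
parallel_time = {"Multicore": 8.83, "Manycore": 0.475}  # seconds


def efficiency(s, p):
    """Parallel efficiency E = S / p: fraction of ideal linear speedup achieved."""
    return s / p


for name in speedup:
    # S = T_seq / T_par, so T_seq = S * T_par should agree across versions.
    implied_seq = speedup[name] * parallel_time[name]
    print(f"{name}: E = {efficiency(speedup[name], cores[name]):.3f}, "
          f"implied T_seq = {implied_seq:.2f} s")
```

Comparing the two efficiencies (about 0.93 versus 0.18) is only indicative, since a GPU core and a CPU core do very different amounts of work per cycle.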

Supercomputer

Top500

Source: http://www.top500.org (June 2015)

Exascale

Challenges:
• Heterogeneity
• Low resilience
• Thermal variation
• Irregular computation
• Programmability

[Figure: Big Data, Big Network, Big Compute, and Big Intelligence, with the labels "Internet of Things" and "Exascale".]

Source: http://www.top500.org (June 2015)

Single Program Multiple Data (SPMD)

[Figure: a sequential program on one CPU versus an MPI program on several CPUs, obtained through data decomposition plus communication (send/receive).]

• Poor functional decomposition
• Synchronized communication
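A minimal SPMD sketch in plain Python, assuming threads as stand-ins for MPI processes and one `queue.Queue` per rank as a mailbox: every thread runs the same `program(rank)`, the data is decomposed by rank, and `recv` blocks like a synchronized receive. With mpi4py the same pattern would use `comm.send` and `comm.recv` on `MPI.COMM_WORLD`; the helper names here (`send`, `recv`, `program`) are hypothetical.

```python
import queue
import threading

NUM_RANKS = 4
DATA = list(range(1, 101))  # global problem: sum of 1..100

# One mailbox per rank; Queue.get() blocks, mimicking a blocking receive.
inboxes = [queue.Queue() for _ in range(NUM_RANKS)]
result = {}


def send(dest, msg):
    inboxes[dest].put(msg)


def recv(rank):
    return inboxes[rank].get()  # blocks until a message arrives


def program(rank):
    """The single program every rank runs; behavior depends only on `rank`."""
    # Data decomposition: each rank takes a contiguous block of DATA.
    chunk = len(DATA) // NUM_RANKS
    lo = rank * chunk
    hi = len(DATA) if rank == NUM_RANKS - 1 else lo + chunk
    partial = sum(DATA[lo:hi])
    if rank == 0:
        total = partial
        for _ in range(NUM_RANKS - 1):
            total += recv(0)  # rank 0 waits for every other rank
        result["total"] = total
    else:
        send(0, partial)


threads = [threading.Thread(target=program, args=(r,)) for r in range(NUM_RANKS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(result["total"])  # 5050
```

Rank 0 cannot proceed until every partial sum has arrived; that lock-step waiting is the synchronized communication the slide lists as a drawback.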

Parallel Objects

[Figure: NAMD and Charm++.]

• Entities and interactions
• Asynchronous communication
• Flexible distribution
• Non-blocking communication operations

Source: http://charm.cs.illinois.edu

Parallel Objects Model

• An application is decomposed into wudus (work and data units).
• Objects are reactive entities that expose an interface of remote methods.
• All message-passing operations are non-blocking: methods are invoked asynchronously.
• Execution is message-driven, similar to Active Messages.
• Objects know how to serialize and deserialize themselves, through what is called the pack-unpack (PUP) framework.

Goals:
• Latency hiding
• Load balancing
• Adaptivity
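The pack-unpack idea can be sketched in Python: an object writes a single `pup` method, and the same method either records its fields or restores them depending on which "puper" is passed in. This mirrors the style of Charm++'s `void pup(PUP::er &p)` routines, but `Packer`, `Unpacker`, and `Particle` here are hypothetical names, and a Python list stands in for a real byte buffer.

```python
class Packer:
    """Records each field's value in order (the 'pack' direction)."""

    def __init__(self):
        self.buffer = []

    def __call__(self, value):
        self.buffer.append(value)
        return value  # leaves the object's field unchanged


class Unpacker:
    """Returns stored values in the same order (the 'unpack' direction)."""

    def __init__(self, buffer):
        self._it = iter(buffer)

    def __call__(self, value):
        return next(self._it)  # overwrites the field with the stored value


class Particle:
    def __init__(self, x=0.0, v=0.0):
        self.x = x
        self.v = v

    def pup(self, p):
        # One routine describes the state for both directions,
        # so packing and unpacking cannot drift out of sync.
        self.x = p(self.x)
        self.v = p(self.v)


src = Particle(1.5, -0.25)
packer = Packer()
src.pup(packer)

dst = Particle()  # e.g., a freshly constructed object on another node
dst.pup(Unpacker(packer.buffer))
print(dst.x, dst.v)  # 1.5 -0.25
```

Keeping a single `pup` method per class is the design point: adding a field means editing one line, and both directions pick it up.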

Introspective Runtime System

• A thin layer between the application and the machine.
• Based on object-based overdecomposition: many more objects than processing entities.
• Components:
  • Message scheduler.
  • Routing tables.
  • Load and communication monitoring.

[Figure: objects distributed over Node A, Node B, Node C, and Node D.]
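A toy model of two of the runtime components listed above, a routing table and load monitoring, under overdecomposition. It assumes 8 objects placed round-robin on the four nodes of the figure and counts delivered messages as the monitored "load"; a real runtime would measure CPU time and communication volume. All names are hypothetical.

```python
from collections import Counter

NODES = ["A", "B", "C", "D"]
NUM_OBJECTS = 8  # overdecomposition: 8 objects on 4 nodes (factor of 2)

# Routing table: object id -> node that currently hosts it (round-robin).
routing = {obj: NODES[obj % len(NODES)] for obj in range(NUM_OBJECTS)}

# Load monitoring: number of messages each node has processed.
messages_handled = Counter()


def deliver(obj):
    """Look up the object's node in the routing table and record the delivery."""
    node = routing[obj]
    messages_handled[node] += 1
    return node


for obj in [0, 1, 4, 5, 5, 7]:
    deliver(obj)

print(routing)
print(dict(messages_handled))  # {'A': 2, 'B': 3, 'D': 1}
```

Because messages are addressed to objects and resolved through the table, the runtime is free to change the table later without the application noticing.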

Migration

• The underlying system consists of a collection of processing entities (processors or nodes).
• Objects are distributed among the processing entities. That assignment may change dynamically if load imbalance arises.
• An introspective runtime system detects performance bottlenecks and balances load by moving objects around.

[Figure: objects migrating among Node A, Node B, Node C, and Node D.]
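Migration can be sketched as four steps: pack the object's state, remove it from its current node, unpack it on the destination, and update the routing table. Because senders address objects by id rather than by node, messages keep reaching the object after it moves. This sketch assumes JSON as the serialization format and in-process dictionaries as "nodes"; `Cell`, `create`, `migrate`, and `send` are hypothetical.

```python
import json


class Cell:
    """A migratable object whose whole state is its attributes."""

    def __init__(self, cid, temperature):
        self.cid = cid
        self.temperature = temperature


# Each node hosts a dict of live objects; the routing table maps id -> node.
nodes = {"A": {}, "B": {}}
routing = {}


def create(cid, node, temperature):
    nodes[node][cid] = Cell(cid, temperature)
    routing[cid] = node


def migrate(cid, dest):
    src = routing[cid]
    obj = nodes[src].pop(cid)                     # remove from the source node
    blob = json.dumps(vars(obj))                  # pack the state
    nodes[dest][cid] = Cell(**json.loads(blob))   # unpack on the destination
    routing[cid] = dest                           # future messages go to dest


def send(cid, delta):
    """Senders address objects by id, never by node."""
    nodes[routing[cid]][cid].temperature += delta


create(7, "A", 20.0)
send(7, 1.5)
migrate(7, "B")
send(7, 1.5)
print(routing[7], nodes["B"][7].temperature, 7 in nodes["A"])  # B 23.0 False
```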

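Dynamic Load Balancing

• Finding an optimal mapping of objects to processors is an NP-complete problem.
• The runtime system collects load information and the communication graph.
• Strategies include greedy algorithms and graph partitioning.
• The runtime system shuffles objects around to avoid overloading any processor.
• Principle of persistence: object loads and communication patterns tend to persist over time, so recently measured behavior predicts the near future.
• Based on the PUP framework.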
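A greedy strategy of the kind mentioned above can be sketched as follows: take the most recently measured object loads as predictions (principle of persistence), sort objects from heaviest to lightest, and assign each to the currently least-loaded processor using a min-heap. This is the classic longest-processing-time heuristic, shown as an illustration rather than Charm++'s actual balancer code; the object names and loads are made up. It ignores the communication graph, which graph-partitioning strategies take into account.

```python
import heapq


def greedy_balance(object_loads, num_procs):
    """Assign objects, heaviest first, to the currently least-loaded processor.

    A greedy heuristic: fast and usually good, but not guaranteed optimal.
    """
    heap = [(0.0, p) for p in range(num_procs)]  # (accumulated load, proc id)
    heapq.heapify(heap)
    assignment = {}
    for obj, load in sorted(object_loads.items(), key=lambda kv: kv[1], reverse=True):
        proc_load, proc = heapq.heappop(heap)
        assignment[obj] = proc
        heapq.heappush(heap, (proc_load + load, proc))
    return assignment


# Loads measured in recent iterations; by the principle of persistence
# they serve as predictions for the next iterations.
measured = {"o0": 5.0, "o1": 4.0, "o2": 3.0, "o3": 3.0, "o4": 2.0, "o5": 1.0}
plan = greedy_balance(measured, num_procs=2)

per_proc = [0.0, 0.0]
for obj, proc in plan.items():
    per_proc[proc] += measured[obj]
print(per_proc)  # [9.0, 9.0]
```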

Charm++

• Actively developed since the mid-1990s.
• Features language extensions, network layers, load balancers, tools, and several applications.
• Objects are called chares.
• Chare arrays are the main collection of objects.

Charm++ (cont.)

Charm++ Runtime System

MPI vs Charm++

Feature | MPI | Charm++
Over-decomposition | No* | Yes
Load Balancing | No* | Yes
Fault Tolerance | No* | Yes
Non-blocking Collectives | Yes** | Yes
Dynamic Adaptivity | No | Yes
Introspection | No | Yes
Wide Adoption | Yes | No

* Some third-party libraries may implement this feature.
** As of the MPI-3 standard.

Example: Heat Transfer Problem

Computational Fluid Dynamics

#Grids | #Particles | #Species | Required memory (GB) | GFLOP per iteration | #Iterations | Serial run-time (1 GFLOP/s)
10^6 | 6 x 10^6 | 9 | 1.69 | 29.5 | 60,000 | 20.5 days
10^6 | 6 x 10^6 | 19 | 2.48 | 90.7 | 60,000 | 63 days
5 x 10^6 | 50 x 10^6 | 19 | 24.0 | 544.7 | 220,000 | 3.8 years
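The serial run-times in the table follow from (GFLOP per iteration × number of iterations) / (1 GFLOP/s). A short check in Python, assuming a 365-day year for the last row:

```python
SECONDS_PER_DAY = 86_400
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY  # assumption: 365-day year


def serial_seconds(gflop_per_iter, iterations, rate_gflops=1.0):
    """Total work divided by the sustained serial rate (GFLOP/s)."""
    return gflop_per_iter * iterations / rate_gflops


# (GFLOP per iteration, number of iterations) for each row of the table.
cases = [(29.5, 60_000), (90.7, 60_000), (544.7, 220_000)]

for gflop, iters in cases:
    s = serial_seconds(gflop, iters)
    print(f"{s / SECONDS_PER_DAY:8.1f} days = {s / SECONDS_PER_YEAR:5.2f} years")
```

The three rows come out at about 20.5 days, 63 days, and 1387 days (3.8 years), matching the table.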

IPLMCFD

• IPLMCFD: Irregularly Portioned Lagrangian Monte Carlo Finite Difference, a massively parallel solver for turbulent reactive flows.
• Large eddy simulation (LES) via the filtered density function (FDF).

Load Imbalance

• IPLMCFD uses a graph partitioning library (METIS) to redistribute work.
• Execution must be split into phases separated by calls that repartition the cells.

IPLMCFD

• Goals:
  • Load-balance processors through weighted graph partitioning.
  • Minimize the edge-cut.
• Irregularly shaped decompositions:
  • Disadvantages:
    • Nontrivial communication patterns.
    • Increased communication cost.
  • Advantage (major):
    • Evenly distributed load among partitions.

P. H. Pisciuneri et al., SIAM J. Sci. Comput., vol. 35, no. 4, pp.
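The trade-off above can be made concrete by scoring two partitions of a small weighted cell graph. The sketch assumes a hypothetical 2 × 3 grid of cells in which two cells carry most of the work; it only evaluates partitions (load per part and edge-cut) and does not search for them, which is the job of a partitioner such as METIS.

```python
# Hypothetical 2 x 3 grid of cells:   0 1 2
#                                     3 4 5
# Vertex weight = work per cell; cells 0 and 1 are the expensive ones.
weights = {0: 4, 1: 4, 2: 1, 3: 1, 4: 1, 5: 1}
# Edges join neighboring cells that exchange data.
edges = [(0, 1), (1, 2), (3, 4), (4, 5), (0, 3), (1, 4), (2, 5)]


def evaluate(part):
    """Return (load per partition, edge-cut) for a cell -> partition mapping."""
    loads = {}
    for v, w in weights.items():
        loads[part[v]] = loads.get(part[v], 0) + w
    cut = sum(1 for u, v in edges if part[u] != part[v])
    return loads, cut


# Regular column split: few cut edges, poor balance.
column_split = {0: 0, 1: 0, 3: 0, 4: 0, 2: 1, 5: 1}
# Irregular, load-aware split: balanced, one more cut edge.
irregular = {0: 0, 3: 0, 4: 0, 1: 1, 2: 1, 5: 1}

print(evaluate(column_split))  # ({0: 10, 1: 2}, 2)
print(evaluate(irregular))     # ({0: 6, 1: 6}, 3)
```

The minimum-cut split leaves one processor with five times the work of the other, while the irregular split balances the load at the price of one extra cut edge, the same trade-off listed on the slide.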

Simulation of a Premixed Flame

Performance of IPLMCFD

Cost of Repartitioning

O(10^2) to O(10^3)

HPC Languages

• Fortran
• C/C++
• Python
• Chapel
• UPC (Unified Parallel C)
• HPF (High Performance Fortran)
• CAF (Coarray Fortran)

Parallel Objects in Python

    class Patch:
        particles = ...

        def send(self):
            computes[i, j].recv(self.particles)

        def update(self, part_info):
            ...

    class Compute:
        def recv(self, particles):
            ...
            patches[i].update(part_info)
            patches[j].update(part_info)

[Figure: Patch and Compute objects distributed over Node X, Node Y, and Node Z.]
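The pseudocode above leaves the runtime implicit. A self-contained sketch, assuming a single-process `Runtime` whose `invoke` queues a method call instead of executing it, shows the asynchronous, message-driven flow: patches send particles to a compute, and the compute updates both patches once it has heard from each. The `Runtime` class, the buffering in `Compute.recv`, and the particle-count "interaction" are hypothetical; the slide's `patches[i]` and `patches[j]` are simplified here to a list.

```python
from collections import deque


class Runtime:
    """Toy message-driven scheduler: invocations are queued, not executed."""

    def __init__(self):
        self.queue = deque()

    def invoke(self, obj, method, *args):
        self.queue.append((obj, method, args))  # returns immediately

    def run(self):
        while self.queue:
            obj, method, args = self.queue.popleft()
            getattr(obj, method)(*args)  # deliver one message


rt = Runtime()


class Patch:
    def __init__(self, pid, particles):
        self.pid = pid
        self.particles = particles
        self.received = []

    def send(self, compute):
        rt.invoke(compute, "recv", self.pid, self.particles)

    def update(self, part_info):
        self.received.append(part_info)


class Compute:
    """Waits for particles from all of its patches, then updates each one."""

    def __init__(self, patches):
        self.patches = patches
        self.buffer = {}

    def recv(self, pid, particles):
        self.buffer[pid] = particles
        if len(self.buffer) == len(self.patches):
            # Hypothetical interaction: just the total particle count.
            part_info = sum(len(p) for p in self.buffer.values())
            for patch in self.patches:
                rt.invoke(patch, "update", part_info)
            self.buffer.clear()


patches = [Patch(0, [1.0, 2.0]), Patch(1, [3.0])]
compute = Compute(patches)
for p in patches:
    p.send(compute)  # asynchronous: nothing has executed yet
rt.run()
print([p.received for p in patches])  # [[3], [3]]
```

Because `send` only enqueues a message, the patches never wait on the compute; the scheduler decides when each entry method runs, which is what lets a runtime overlap communication with other work.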

Acknowledgments

• University of Illinois: Prof. Laxmikant V. Kalé (Computer Science)
• University of Pittsburgh: Dr. Patrick Pisciuneri (Center for Simulation and Modeling) and Prof. Peyman Givi (Mechanical Engineering)
• Images extracted from Wikipedia, www.defenceindustrydaily.com, www.maclife.com, www.theregister.co.uk, and www.geforce.com

Conclusions

• Parallel objects have great potential in scientific computing:
  • Simplified programming model
  • Improved performance due to overdecomposition
  • Dynamic load balancing
• Research opportunity:
  • Parallel-objects abstractions in Python

Thank you!

[email protected]

www.emeneses.org
