• No results found

Charm++, what s that?!

N/A
N/A
Protected

Academic year: 2021

Share "Charm++, what s that?!"

Copied!
33
0
0

Loading.... (view fulltext now)

Full text

(1)

Charm++, what’s that?!

Les Mardis du dev’

François Tessier - Runtime team

(2)

1 Introduction

2 Charm++

3 Basic examples

4 Load Balancing

(3)

Outline

1 Introduction 2 Charm++ 3 Basic examples 4 Load Balancing 5 Conclusion

(4)

Parallel programming

Decomposition

What to do in parallel

Mapping

Which processor execute each task

Scheduling

The order

Machine dependent expression

(5)

Introduction

Scalable execution of parallel applications

Cluster computing

Number of cores is increasing

Butmemory per coreis decreasing (or increasing slowly)

Applications need to communicate more and more...

10 100 1000 10000 100000 1e+06 1e+07 1990 1995 2000 2005 2010 2015

#Cores / Memory per core (in MB)

Years

Number of cores and amount of memory per core of the first ranking cluster in the Top500 list since 1993

#Cores Memory per core

(6)

How to communicate between nodes?

Message passing paradigm Inter-process communication MPI, SOAP, Charm++

Message passing interface

Process 1 Process 2 Process 3 Process 4

(7)

Outline

1 Introduction 2 Charm++ 3 Basic examples 4 Load Balancing 5 Conclusion

(8)
(9)

Charm++

Presentation

Developed at the PPL (Urbana-Champaign, IL) High-level abstraction of parallel programming C++, Python

(10)

Presentation

Developed at the PPL (Urbana-Champaign, IL) High-level abstraction of parallel programming C++, Python

How to define Charm++?

Object-oriented

Based on the C++ programming language

Objects called chares (contain data, send and receive messages, perform task)

(11)

Charm++

Presentation

Developed at the PPL (Urbana-Champaign, IL) High-level abstraction of parallel programming C++, Python

How to define Charm++?

Object-oriented

Based on the C++ programming language

Objects called chares (contain data, send and receive messages, perform task)

Asynchronous message passing

Chares exchange messages to communicate

(12)

Presentation

Developed at the PPL (Urbana-Champaign, IL) High-level abstraction of parallel programming C++, Python

How to define Charm++?

Object-oriented

Based on the C++ programming language

Objects called chares (contain data, send and receive messages, perform task)

Asynchronous message passing

Chares exchange messages to communicate

Messages sent/received asynchronously to the code execution

Parallel

Chunks of code Messages

(13)

Charm++

Presentation

Developed at the PPL (Urbana-Champaign, IL) High-level abstraction of parallel programming C++, Python

How to define Charm++?

Object-oriented

Based on the C++ programming language

Objects called chares (contain data, send and receive messages, perform task)

Asynchronous message passing

Chares exchange messages to communicate

Messages sent/received asynchronously to the code execution

Parallel

Chunks of code Messages

Programming paradigm

Way of writing program

(14)

What is a Charm++ Program?

Collection of communicant chare objects, included the "main" chare (global object space)

No importance about the number of PU or the type of interconnect (RTS) Send a message by calling another chare’s entry point function (reception point)

(15)

Charm++

The Charm++ Compilation and the Runtime System

First strenght: it works!

Application written as a collection of communicating objects

Compilation, execution : Specific target platform, physical resources (-np) RTS manages the details of the physical resources...

... and can take some decisions about :

Mapping chare objects to physical processors Load-balancing chare objects

Checkpointing Fault-tolereance ...

(16)

1 Introduction

2 Charm++

3 Basic examples

4 Load Balancing

(17)

Charm++ - Hello World!

Listing 1: Hello.h # i f n d e f _ _ M A I N _ H _ _ # d e f i n e _ _ M A I N _ H _ _ c l a s s M a i n : p u b l i c C B a s e _ M a i n { p u b l i c: M a i n ( C k A r g M s g * msg ); M a i n ( C k M i g r a t e M e s s a g e * msg ); }; # e n d i f // _ _ M A I N _ H _ _ Listing 2: Hello.ci m a i n m o d u l e m a i n { m a i n c h a r e M a i n { e n t r y M a i n ( C k A r g M s g * msg ); }; }; Listing 3: Hello.C # i n c l u d e " m a i n . d e c l . h " # i n c l u d e " m a i n . h " // E n t r y p o i n t of C h a r m ++ a p p l i c a t i o n M a i n :: M a i n ( C k A r g M s g * msg ) { // P r i n t a m e s s a g e for the u s e r C k P r i n t f (" H e l l o ␣ W o r l d !\ n "); // E x i t the a p p l i c a t i o n C k E x i t (); } // C o n s t r u c t o r n e e d e d for c h a r e o b j e c t // m i g r a t i o n ( i g n o r e for now ) // N O T E : T h i s c o n s t r u c t o r d o e s not n e e d // to a p p e a r in the ". ci " f i l e M a i n :: M a i n ( C k M i g r a t e M e s s a g e * msg ) { } # i n c l u d e " m a i n . def . h "

(18)

Listing 4: 1Darray.ci m a i n m o d u l e h e l l o { r e a d o n l y C P r o x y _ M a i n m a i n P r o x y ; r e a d o n l y int n E l e m e n t s ; m a i n c h a r e M a i n { e n t r y M a i n ( C k A r g M s g * m ); e n t r y v o i d d o n e (v o i d); }; a r r a y [1 D ] H e l l o { e n t r y H e l l o (v o i d); e n t r y v o i d S a y H i (int f r o m ); }; }; Listing 5: 1Darray.C # i n c l u d e < s t d i o . h > # i n c l u d e" h e l l o . d e c l . h " /* r e a d o n l y */ C P r o x y _ M a i n m a i n P r o x y ; /* r e a d o n l y */ int n E l e m e n t s ; /* m a i n c h a r e */ c l a s s M a i n : p u b l i c C B a s e _ M a i n { p u b l i c: M a i n ( C k A r g M s g * m ) { if( m - > a r g c >1 ) n E l e m e n t s = a t o i ( m - > a r g v [ 1 ] ) ; d e l e t e m ; C k P r i n t f (" Run ␣ on ␣ % d ␣ p r o c e s s o r s ␣ for ␣ % d ␣ el .\ n ", C k N u m P e s () , n E l e m e n t s ); m a i n P r o x y = t h i s P r o x y ; C P r o x y _ H e l l o arr = C P r o x y _ H e l l o :: c k N e w ( n E l e m e n t s ); arr [ 0 ] . S a y H i ( -1); }; v o i d d o n e (v o i d) { C k P r i n t f (" All ␣ d o n e \ n "); C k E x i t (); }; }; /* a r r a y [1 D ] */ c l a s s H e l l o : p u b l i c C B a s e _ H e l l o { p u b l i c: H e l l o () { C k P r i n t f (" H e l l o ␣ % d ␣ c r e a t e d \ n ", t h i s I n d e x ); } H e l l o ( C k M i g r a t e M e s s a g e * m ) {} v o i d S a y H i (int f r o m ) { C k P r i n t f (" \" H e l l o \" ␣ f r o m ␣ H e l l o ␣ c h a r e ␣ # ␣ % d ␣ on ␣ " " p r o c e s s o r ␣ % d ␣ ( t o l d ␣ by ␣ % d ).\ n ", t h i s I n d e x , C k M y P e () , f r o m ); if ( t h i s I n d e x < n E l e m e n t s -1) t h i s P r o x y [ t h i s I n d e x + 1 ] . S a y H i ( t h i s I n d e x ); e l s e m a i n P r o x y . d o n e (); // D o n e !

(19)

Charm++ - 1Darray

$ ./charmrun +p3 ./hello 10

Running "Hello World" with 10 elements using 3 processors. "Hello" from Hello chare # 0 on processor 0 (told by -1). "Hello" from Hello chare # 1 on processor 1 (told by 0). "Hello" from Hello chare # 2 on processor 2 (told by 1). "Hello" from Hello chare # 3 on processor 0 (told by 2). "Hello" from Hello chare # 4 on processor 1 (told by 3). "Hello" from Hello chare # 5 on processor 2 (told by 4). "Hello" from Hello chare # 6 on processor 0 (told by 5). "Hello" from Hello chare # 7 on processor 1 (told by 6). "Hello" from Hello chare # 9 on processor 0 (told by 8). "Hello" from Hello chare # 8 on processor 2 (told by 7).

(20)

1 Introduction

2 Charm++

3 Basic examples

4 Load Balancing

(21)

Charm++ - Load balancing

Load-balancing

Balance the load between the processing units

(22)
(23)

Charm++ - Load balancing

Initial state

(24)

Listing 6: NewLB.C

C r e a t e L B F u n c _ D e f ( NewLB , " C r e a t e ␣ N e w L B ")

v o i d N e w L B :: w o r k ( B a s e L B :: L D S t a t s * s t a t s ) {

C k P r i n t f (" % d ␣ c h a r e s ␣ e x e c u t e d ␣ on ␣ % d ␣ p r o c s \ n ", stats - > n_obj , stats - > n p r o c s ( ) ) ; // P r o c e s s o r a r r a y P r o c A r r a y * p a r r = new P r o c A r r a y ( s t a t s ); // O b j e c t g r a p h O b j G r a p h * ogr = new O b j G r a p h ( s t a t s ); std :: vector < Vertex >:: i t e r a t o r v _ i t ; std :: vector < Edge >:: i t e r a t o r e _ i t ;

for ( v _ i t = ogr - > v e r t i c e s . b e g i n (); v _ i t != ogr - > v e r t i c e s . end (); ++ v _ i t ) {

d o u b l e l o a d = (* v _ i t ). g e t V e r t e x L o a d (); for ( e _ i t =(* v _ i t ). s e n d T o L i s t . b e g i n (); e _ i t ! = ( * v _ i t ). s e n d T o L i s t . end (); ++ e _ i t ) { int f r o m = (* v _ i t ). g e t V e r t e x I d (); int to = (* e _ i t ). g e t N e i g h b o r I d (); if ( f r o m != to ) { C k P r i n t f (" % d ␣ m s g s ␣ s e n t ␣ f r o m ␣ % d ␣ to ␣ % d \ n ", (* e _ i t ). g e t N u m M s g s () , from , to ); } } } ogr - > v e r t i c e s [ 1 2 ] . s e t N e w P e ( 2 ) ;

(25)

Charm++ - Load balancing

Load-balancing

For iterative applications

charmrun +p64 ./App +balancer newLB +LBDebug 1 [...] Objects can migrate using a pup function (Pack and Unpack)

Listing 7: PUP function c l a s s M y C l a s s { p u b l i c: int a ; int * b ; M y C l a s s () { int a = 42;

int * b = (int *) m a l l o c ( a *s i z e o f(int)); }

v o i d pup ( PUP :: er & p ) { p | a ; P U P a r r a y ( p , b , a ); if ( p . i s U n p a c k i n g ()) { b = (int *) m a l l o c ( a *s i z e o f(int)); } } }

(26)

Load-balancing

For iterative applications

charmrun +p64 ./App +balancer newLB +LBDebug 1 [...] Objects can migrate using a pup function (Pack and Unpack) Several load balancers

GreedyLB RefineLB

Hierarchical load balancers

Thermal aware load balancers...

(27)

Charm++ - Load balancing

Load-balancing

For iterative applications

charmrun +p64 ./App +balancer newLB +LBDebug 1 [...] Objects can migrate using a pup function (Pack and Unpack) Several load balancers

GreedyLB RefineLB

Hierarchical load balancers

Thermal aware load balancers...

0 2 4 6 1 3 5 7

Groups of chares assigned to cores

(28)

Load-balancing

For iterative applications

charmrun +p64 ./App +balancer newLB +LBDebug 1 [...] Objects can migrate using a pup function (Pack and Unpack) Several load balancers

GreedyLB RefineLB

Hierarchical load balancers

Thermal aware load balancers...

(29)

Charm++ - Load balancing

Load-balancing

For iterative applications

charmrun +p64 ./App +balancer newLB +LBDebug 1 [...] Objects can migrate using a pup function (Pack and Unpack) Several load balancers

GreedyLB RefineLB

Hierarchical load balancers

Thermal aware load balancers...

0 2 4 6 1 3 5 7

Groups of chares assigned to cores

(30)

kNeighbor

Benchmarks application designed to simulate intensive communication between processes

Experiments on PlaFRIM : 8 nodes with 8 cores on each (Intel Xeon 5550)

Dumm

yLB

GreedyLB reeBased

Ex

ecution time (in seconds)

0 500 1000 1500 2000 kNeighbor on 64 cores 256 elements − 1MB message size

(31)

Outline

1 Introduction 2 Charm++ 3 Basic examples 4 Load Balancing 5 Conclusion

(32)

To conclude...

Charm++, message passing since 1984 High-level abstraction of a parallel program Interesting features : load balancing, fault-tolerance,...

Useful tools; CharmDebug, Projections Used by some important applications : ChaNGa, NAMD, OpenAtom

Want to know more?

Website : http://charm.cs.uiuc.edu/

Tutorials : http://charm.cs.illinois. edu/tutorial/TableOfContents.htm

(33)

The End

Figure

Figure : Compilation Process for a Chare Class(Credits : http://charm.cs.illinois.edu)

References

Related documents