• No results found

Session 2: MUST. Correctness Checking

N/A
N/A
Protected

Academic year: 2021

Share "Session 2: MUST. Correctness Checking"

Copied!
35
0
0

Loading.... (view fulltext now)

Full text

(1)

Dr. Matthias S. Müller

(RWTH Aachen University)

Tobias Hilbrich

(Technische Universität Dresden)

Joachim Protze

(RWTH Aachen University, LLNL)

Email:

Center for Information Services and High Performance Computing (ZIH)

Session 2: MUST

(2)

Matthias Müller, Joachim Protze, Tobias Hilbrich

2

Content

!  

Motivation

!  

MPI usage errors

!  

Examples: Common MPI usage errors

Ø

Including MUST’s error descriptions

!  

Correctness tools

(3)

#include <mpi.h> #include <stdio.h>

int main (int argc, char** argv) {

int rank, size, buf[8];

MPI_Comm_rank (MPI_COMM_WORLD, &rank); MPI_Comm_size (MPI_COMM_WORLD, &size); MPI_Datatype type;

MPI_Type_contiguous (2, MPI_INTEGER, &type);

MPI_Recv (buf, 2, MPI_INT, size - rank, 123, MPI_COMM_WORLD, MPI_STATUS_IGNORE); MPI_Send (buf, 2, type, size - rank, 123, MPI_COMM_WORLD);

printf ("Hello, I am rank %d of %d.\n", rank, size); return 0;

}

(4)

Matthias Müller, Joachim Protze, Tobias Hilbrich

4

Content

!  

Motivation

!  

MPI usage errors

!  

Examples: Common MPI usage errors

Ø

Including MUST’s error descriptions

!  

Correctness tools

(5)

MPI usage errors

!  

MPI programming is error prone

!  

Bugs may manifest as:

Ø

Crashes

Ø

Hangs

Ø

Wrong results

Ø

Not at all! (Sleeping bugs)

(6)

Matthias Müller, Joachim Protze, Tobias Hilbrich

6

MPI usage errors (2)

!  

Complications in MPI usage:

Ø

Non-blocking communication

Ø

Persistent communication

Ø

Complex collectives (e.g. Alltoallw)

Ø

Derived datatypes

Ø

Non-contiguous buffers

!  

Error Classes include:

Ø

Incorrect arguments

Ø

Resource errors

Ø

Buffer usage

Ø

Type matching

(7)

Content

!  

Motivation

!  

MPI usage errors

!  

Examples: Common MPI usage errors

Ø

Including MUST’s error descriptions

!  

Correctness tools

(8)

Matthias Müller, Joachim Protze, Tobias Hilbrich

8

Skipping some errors

!  

Missing MPI_Init:

!  

Current release doesn’t start to work,

implementation in progress

!  

Missing MPI_Finalize:

!  

Current release doesn’t terminate all

analyses, work in progress

!

Src/dest rank out of range (size-rank): leads to

(9)

MUST: Tool design

Application

MPI Library

PMPI

In

te

rf

ace

P^

nMPI

L

ib

ra

ry

Split-comm-world

GTI: event forwarding, network

Rewrite-comm-world

Force status wrapper

Local Analyses

Non-Local Analyses

MPI Library

(10)

Matthias Müller, Joachim Protze, Tobias Hilbrich

10

GTI: event forwarding, network

Local Analyses

Non-Local Analyses

0

1

2

3

4

5

6

(11)

#include <mpi.h> #include <stdio.h>

int main (int argc, char** argv) {

int rank, size, buf[8]; MPI_Init (&argc, &argv);

MPI_Comm_rank (MPI_COMM_WORLD, &rank); MPI_Comm_size (MPI_COMM_WORLD, &size); MPI_Datatype type;

MPI_Type_contiguous (2, MPI_INTEGER, &type);

MPI_Recv (buf, 2, MPI_INT, size - rank - 1, 123, MPI_COMM_WORLD, MPI_STATUS_IGNORE); MPI_Send (buf, 2, type, size - rank - 1, 123, MPI_COMM_WORLD);

printf ("Hello, I am rank %d of %d.\n", rank, size); MPI_Finalize ();

return 0; }

(12)

Matthias Müller, Joachim Protze, Tobias Hilbrich

12

Click  for  graphical  representa2on  of  

the  detected  deadlock  situa2on.  

(13)

Graphical representation of deadlocks

Simple  call  

stack  for  this  

example.  

Rank  0  waits  

for  rank  1  

and  vv.  

(14)

Matthias Müller, Joachim Protze, Tobias Hilbrich

14

Fix1: use asynchronous receive

#include <mpi.h> #include <stdio.h>

int main (int argc, char** argv) {

int rank, size, buf[8]; MPI_Init (&argc, &argv);

MPI_Comm_rank (MPI_COMM_WORLD, &rank); MPI_Comm_size (MPI_COMM_WORLD, &size); MPI_Datatype type;

MPI_Type_contiguous (2, MPI_INTEGER, &type); MPI_Request request;

MPI_Irecv (buf, 2, MPI_INT, size - rank - 1, 123, MPI_COMM_WORLD, &request); MPI_Send (buf, 2, type, size - rank - 1, 123, MPI_COMM_WORLD);

printf ("Hello, I am rank %d of %d.\n", rank, size); MPI_Finalize ();

return 0; }

Use  asynchronous  

receive:  (MPI_Irecv)  

(15)

MUST detects errors in handling datatypes

Use  of  uncommited  

datatype:  

type

(16)

Matthias Müller, Joachim Protze, Tobias Hilbrich

16

Fix2: use MPI_Type_commit

#include <mpi.h> #include <stdio.h>

int main (int argc, char** argv) {

int rank, size, buf[8]; MPI_Init (&argc, &argv);

MPI_Comm_rank (MPI_COMM_WORLD, &rank); MPI_Comm_size (MPI_COMM_WORLD, &size); MPI_Datatype type;

MPI_Type_contiguous (2, MPI_INTEGER, &type); MPI_Type_commit (&type);

MPI_Request request;

MPI_Irecv (buf, 2, MPI_INT, size - rank - 1, 123, MPI_COMM_WORLD, &request); MPI_Send (buf, 2, type, size - rank - 1, 123, MPI_COMM_WORLD);

printf ("Hello, I am rank %d of %d.\n", rank, size); MPI_Finalize (); return 0; }

Commit  the  

datatype  before  

usage  

(17)

MUST detects errors in transfer buffer sizes / types

Size  of  sent  message  larger  

than  receive  buffer  

(18)

Matthias Müller, Joachim Protze, Tobias Hilbrich

18

Fix3: use same message size for send and receive

#include <mpi.h> #include <stdio.h>

int main (int argc, char** argv) {

int rank, size, buf[8]; MPI_Init (&argc, &argv);

MPI_Comm_rank (MPI_COMM_WORLD, &rank); MPI_Comm_size (MPI_COMM_WORLD, &size); MPI_Datatype type;

MPI_Type_contiguous (2, MPI_INTEGER, &type); MPI_Type_commit (&type);

MPI_Request request;

MPI_Irecv (buf, 2, MPI_INT, size - rank - 1, 123, MPI_COMM_WORLD, &request); MPI_Send (buf, 1, type, size - rank - 1, 123, MPI_COMM_WORLD);

printf ("Hello, I am rank %d of %d.\n", rank, size); MPI_Finalize ();

return 0; }

Reduce  the  message  

size  

(19)

MUST detects use of wrong argument values

Use  of  Fortran  type  in  C,  

datatype  mismatch  between  

(20)

Matthias Müller, Joachim Protze, Tobias Hilbrich

20

Fix4: use C-datatype constants in C-code

#include <mpi.h> #include <stdio.h>

int main (int argc, char** argv) {

int rank, size, buf[8]; MPI_Init (&argc, &argv);

MPI_Comm_rank (MPI_COMM_WORLD, &rank); MPI_Comm_size (MPI_COMM_WORLD, &size); MPI_Datatype type;

MPI_Type_contiguous (2, MPI_INT, &type); MPI_Type_commit (&type);

MPI_Request request;

MPI_Irecv (buf, 2, MPI_INT, size - rank - 1, 123, MPI_COMM_WORLD, &request); MPI_Send (buf, 1, type, size - rank - 1, 123, MPI_COMM_WORLD);

printf ("Hello, I am rank %d of %d.\n", rank, size); MPI_Finalize ();

return 0; }

Use  the  integer  

datatype  intended  

(21)

MUST detects data races in asynchronous communication

Data  race  between  send  and  

ascynchronous  receive  opera2on

(22)

Matthias Müller, Joachim Protze, Tobias Hilbrich

22

Graphical representation of the race condition

Graphical  representa2on  of  the  data  

race  loca2on

(23)

Fix5: use independent memory regions

#include <mpi.h> #include <stdio.h>

int main (int argc, char** argv) {

int rank, size, buf[8]; MPI_Init (&argc, &argv);

MPI_Comm_rank (MPI_COMM_WORLD, &rank); MPI_Comm_size (MPI_COMM_WORLD, &size); MPI_Datatype type;

MPI_Type_contiguous (2, MPI_INTEGER, &type); MPI_Type_commit (&type);

MPI_Request request;

MPI_Irecv (buf, 2, MPI_INT, size - rank - 1, 123, MPI_COMM_WORLD, &request); MPI_Send (buf + 4, 1, type, size - rank - 1, 123, MPI_COMM_WORLD);

printf ("Hello, I am rank %d of %d.\n", rank, size); MPI_Finalize ();

return 0;

Offset  points  to  

independent  

(24)

Matthias Müller, Joachim Protze, Tobias Hilbrich

24

MUST detects leaks of user defined objects

Leak  of  user  defined  

datatype  object  

!  

User defined objects include

!

MPI_Comms (even by MPI_Comm_dup)

!

MPI_Datatypes

(25)

Fix6: Deallocate datatype object

#include <mpi.h> #include <stdio.h>

int main (int argc, char** argv) {

int rank, size, buf[8]; MPI_Init (&argc, &argv);

MPI_Comm_rank (MPI_COMM_WORLD, &rank); MPI_Comm_size (MPI_COMM_WORLD, &size); MPI_Datatype type;

MPI_Type_contiguous (2, MPI_INT, &type); MPI_Type_commit (&type);

MPI_Request request;

MPI_Irecv (buf, 2, MPI_INT, size - rank - 1, 123, MPI_COMM_WORLD, &request); MPI_Send (buf + 4, 1, type, size - rank - 1, 123, MPI_COMM_WORLD);

printf ("Hello, I am rank %d of %d.\n", rank, size); MPI_Type_free (&type);

MPI_Finalize ();

Deallocate  the  

(26)

Matthias Müller, Joachim Protze, Tobias Hilbrich

26

MUST detects unfinished asynchronous communication

Remaining  unfinished  

asynchronous  receive  

(27)

Fix8: use MPI_Wait to finish asynchronous communication

#include <mpi.h> #include <stdio.h>

int main (int argc, char** argv) {

int rank, size, buf[8]; MPI_Init (&argc, &argv);

MPI_Comm_rank (MPI_COMM_WORLD, &rank); MPI_Comm_size (MPI_COMM_WORLD, &size);

MPI_Datatype type;

MPI_Type_contiguous (2, MPI_INT, &type); MPI_Type_commit (&type);

MPI_Request request;

MPI_Irecv (buf, 2, MPI_INT, size - rank - 1, 123, MPI_COMM_WORLD, &request); MPI_Send (buf + 4, 1, type, size - rank - 1, 123, MPI_COMM_WORLD);

MPI_Wait (&request, MPI_STATUS_IGNORE);

printf ("Hello, I am rank %d of %d.\n", rank, size); MPI_Type_free (&type);

MPI_Finalize ();

Finish  the  

asynchronous  

communica2on  

(28)

Matthias Müller, Joachim Protze, Tobias Hilbrich

28

No  further  error  

detected  

Hopefully  this  message  

applies  to  many  

(29)

Content

!  

Motivation

!  

MPI usage errors

!  

Examples: Common MPI usage errors

Ø

Including MUST’s error descriptions

!  

Correctness tools

(30)

Matthias Müller, Joachim Protze, Tobias Hilbrich

30

Classes of Correctness Tools

!  

Debuggers:

Ø

Helpful to pinpoint any error

Ø

Finding the root cause may be very hard

Ø

Won’t detect sleeping errors

Ø

E.g.: gdb, TotalView, Alinea DDT

!  

Static Analysis:

Ø

Compilers and Source analyzers

Ø

Typically: type and expression errors

Ø

E.g.: MPI-Check

!  

Model checking:

Ø

Requires a model of your applications

Ø

State explosion possible

(31)

Strategies of Correctness Tools

!  

Runtime error detection:

Ø

Inspect MPI calls at runtime

Ø

Limited to the timely interleaving that is observed

Ø

Causes overhead during application run

Ø

E.g.: Intel Trace Analyzer, Umpire, Marmot, MUST

!  

Formal verification:

Ø

Extension of runtime error detection

Ø

Explores ALL possible timely interleavings

Ø

Can detect potential deadlocks or type missmatches that

would otherwise not occur in the presence of a tool

Ø

For non-deterministic applications exponential exploration

space

(32)

Matthias Müller, Joachim Protze, Tobias Hilbrich

32

Content

!  

Motivation

!  

MPI usage errors

!  

Examples: Common MPI usage errors

Ø

Including MUST’s error descriptions

!  

Correctness tools

(33)

MUST Usage

1)

Compile and link application as usual

-

Link against the shared version of the MPI lib (Usually default)

2)

Replace “mpiexec” with “mustrun”

-

E.g.:

mustrun –np 4 myApp.exe input.txt output.txt

3)

Inspect “MUST_Output.html” in run directory

-

“MUST_Output/MUST_Deadlock.dot” exists in case of deadlock

-

Visualize with:

dot –Tps MUST_Deadlock.dot –o deadlock.ps

The mustrun script will use an extra process for

non-local checks (Invisible to application)

I.e.: “mustrun –np 4

” will issue a “mpirun –np 5

(34)

Matthias Müller, Joachim Protze, Tobias Hilbrich

34

MUST - Features

!  

Local checks:

Ø

Integer validation

Ø

Integrity checks (pointers valid, etc.)

Ø

Operation, Request, Communicator, Datatype,

Group usage

Ø

Resource leak detection

Ø

Memory overlap checks

!  

Non-local checks:

Ø

Collective verification

Ø

Lost message detection

Ø

Type matching (For P2P and collectives)

(35)

MUST - Features: Scalability

!  

Local checks largely scalable

!  

Non-local checks:

!  

Current default uses a central process

Ø

This process is an MPI task taken from the

application

Ø

Limited scalability ~100 tasks (Depending on

application)

!  

Distributed analysis available (tested with 10k

tasks)

Ø

Uses more extra tasks (10%-100%)

!  

Recommended: Logging to an HTML file

References

Related documents

(4) method for OFMSW management, whereas this waste collected in each town is transported to a theoretical biogas plant placed at Marineo, using the above efficient

A total number and ranking of Iraqi journals on 2012 was only three scientific journals that indexed in the Journal Citation Reports (JCR) and SCOPUS.In conclusion, Iraqi publications

In a unanimous decision, the Missouri Supreme Court reversed the trial court, holding that Dyer was not entitled to statutory expungement under section 610.122

Development Supplement, the present study used fixed effects models to examine within- child associations between changes in family income, cumulative risk exposure (as.. measured by

Provision of counseling and treatment for sexual trauma by the De- partment of Veterans Affairs to members of the Armed Forces.. Department of Veterans Affairs screening mechanism

• The effects of monetary policy on activity, as measured by the impact of policy-determined interest rate changes on housing market interest rates and thence to house prices

Disk storage technology still suggests large central disks because of their better price/capacity ratio and their performance characteristics: Accessing a remote fast disk over a