RA MPI Compilers Debuggers Profiling. March 25, 2009

(1)

RA

MPI Compilers

Debuggers

Profiling

(2)

Examples and Slides

To download examples on RA

1. mkdir class
2. cd class
3. wget http://geco.mines.edu/workshop/class2/examples/examples.tgz
4. tar -xzf examples.tgz
5. cd stommel

Slides

http://geco.mines.edu/workshop/tools

(3)

Experimental MPI

Versions

(4)

New MPI Compilers

Version

MVAPICH2 1.2

MVAPICH 1.1

OpenMPI 1.3.1

Both Intel and Portland Group Compilers

Support for Debuggers

(5)

Need to modify your Environment

Change .tcshrc or .bashrc file

Log out then log back in

Changes override mpi_selector settings
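
The override presumably works through search order: the rc-file settings put the experimental MPI's bin directory at the front of the path, so its wrappers are found before any default. A minimal sketch of that lookup rule follows; the two directories and the fake mpicc scripts are invented for the demonstration and have nothing to do with the real installation on RA.

```shell
#!/bin/sh
# Sketch: the directory that appears first in PATH supplies the command.
# Both directories and both fake "mpicc" scripts are made up for this demo.
first_mpicc() (
    demo=$(mktemp -d)
    for d in selector experimental; do
        mkdir -p "$demo/$d/bin"
        printf '#!/bin/sh\necho %s\n' "$d" > "$demo/$d/bin/mpicc"
        chmod +x "$demo/$d/bin/mpicc"
    done
    PATH="$demo/selector/bin:$PATH"       # stands in for the system default
    PATH="$demo/experimental/bin:$PATH"   # stands in for the rc-file prepend
    mpicc                                 # runs whichever comes first in PATH
    rm -rf "$demo"
)

first_mpicc
```

The function body runs in a subshell, so the real PATH of the calling shell is left untouched.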

(6)

.tcshrc settings

# Keep only one MPI_VERSION line active; if all three are left in, the last one wins.
setenv MPI_VERSION /lustre/home/apps/mpi/db/mvapich-1.1
setenv MPI_VERSION /lustre/home/apps/mpi/db/mvapich2-1.2
setenv MPI_VERSION /lustre/home/apps/mpi/db/openmpi1.3.1
setenv MPI_COMPILER intel
#setenv MPI_COMPILER pg
if ( $?MPI_COMPILER && $?MPI_VERSION ) then
    setenv MPI_BASE $MPI_VERSION/$MPI_COMPILER
    setenv LD_LIBRARY_PATH $MPI_BASE/lib:$LD_LIBRARY_PATH
    setenv LD_LIBRARY_PATH $MPI_BASE/lib/shared:$LD_LIBRARY_PATH
    setenv MANPATH $MPI_BASE/man:$MPI_BASE/shared/man:$MANPATH
    set path = ( $MPI_BASE/bin $path )
endif

(7)

.bashrc settings

# Keep only one MPI_VERSION line active; if all three are left in, the last one wins.
export MPI_VERSION=/lustre/home/apps/mpi/db/mvapich-1.1
export MPI_VERSION=/lustre/home/apps/mpi/db/mvapich2-1.2
export MPI_VERSION=/lustre/home/apps/mpi/db/openmpi1.3.1
export MPI_COMPILER=intel
#export MPI_COMPILER=pg
if [ -n "$MPI_COMPILER" ]; then
    if [ -n "$MPI_VERSION" ]; then
        export MPI_BASE=$MPI_VERSION/$MPI_COMPILER
        export LD_LIBRARY_PATH=$MPI_BASE/lib:$LD_LIBRARY_PATH
        export LD_LIBRARY_PATH=$MPI_BASE/lib/shared:$LD_LIBRARY_PATH
        export MANPATH=$MPI_BASE/man:$MPI_BASE/shared/man:$MANPATH
        export PATH=$MPI_BASE/bin:$PATH
    fi
fi
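
One detail worth checking in any bash version of these settings: a test of the form `[ -n $VAR ]` with an unquoted, empty variable collapses to `[ -n ]`, which is true, so the guard never fails. The sketch below shows the difference and wraps the same check in a small helper. The names `quoting_demo` and `mpi_base` are made up for illustration and are not part of the RA setup.

```shell
#!/bin/sh
# Sketch: why the variable inside [ -n ... ] should be quoted.
quoting_demo() {
    v=""    # stands in for an MPI_COMPILER that was never set
    if [ -n $v ];   then echo "unquoted: true"; else echo "unquoted: false"; fi
    if [ -n "$v" ]; then echo "quoted: true";   else echo "quoted: false";   fi
}

# Hypothetical helper: print the MPI_BASE directory for a version/compiler
# pair, or return non-zero if either argument is empty.
mpi_base() {
    if [ -n "$1" ] && [ -n "$2" ]; then
        echo "$1/$2"
    else
        return 1
    fi
}

quoting_demo
mpi_base /lustre/home/apps/mpi/db/openmpi1.3.1 intel
```

With the quotes in place the settings block is skipped when either variable is empty, instead of building paths that begin with a bare slash.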

(8)

Base Script

#!/bin/csh
#PBS -l nodes=2:ppn=8
#PBS -l walltime=00:02:00
#PBS -N testIO
#PBS -o stdout.$PBS_JOBID
#PBS -e stderr.$PBS_JOBID
#PBS -r n
#PBS -V
#---
cd $PBS_O_WORKDIR
sort -u $PBS_NODEFILE > mynodes.$PBS_JOBID
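
The node file that PBS hands to a job normally holds one line per processor slot, so a request for nodes=2:ppn=8 yields 16 lines naming only two hosts. The sketch below imitates that with a fake node file to show what the `sort -u` line leaves behind. The host names, the temporary file, and the helper name `unique_nodes` are assumptions for illustration.

```shell
#!/bin/sh
# Sketch: what `sort -u $PBS_NODEFILE` does, using a fake node file.
# Host names and the temporary file are invented for this demo.
unique_nodes() {
    sort -u "$1"
}

fake_nodefile=$(mktemp)
for slot in 1 2 3 4 5 6 7 8; do
    echo compute-9-9  >> "$fake_nodefile"
    echo compute-9-10 >> "$fake_nodefile"
done

wc -l < "$fake_nodefile"         # 16 slot entries
unique_nodes "$fake_nodefile"    # one line per distinct host
rm -f "$fake_nodefile"
```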

(9)

MPI Run commands

Version        Command
openmpi1.3.1   mpiexec -np 16 stc_06
mvapich2-1.2   mpiexec -np 16 /lustre/home/tkaiser/examples/stommel/stc_06 < st.in
mvapich-1.1    mpirun_rsh -hostfile $PBS_NODEFILE -np 16 stc_06 < st.in

mvapich-1.1
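
Since the launcher differs by MPI version, a run script can pick it from the MPI_VERSION path. The following is only a sketch under that assumption: `mpi_run_cmd` is a hypothetical helper that prints the command rather than running it, and it covers only the three versions named on this slide.

```shell
#!/bin/sh
# Sketch: choose the launch command from the MPI version directory name.
# mpi_run_cmd VERSION_PATH NPROCS PROGRAM   (hypothetical helper)
mpi_run_cmd() {
    version=$(basename "$1")
    case "$version" in
        openmpi*|mvapich2*)
            echo "mpiexec -np $2 $3" ;;
        mvapich-*)
            # \$ keeps PBS_NODEFILE unexpanded so it is resolved inside the job
            echo "mpirun_rsh -hostfile \$PBS_NODEFILE -np $2 $3" ;;
        *)
            echo "unknown MPI version: $version" >&2
            return 1 ;;
    esac
}

mpi_run_cmd /lustre/home/apps/mpi/db/openmpi1.3.1 16 stc_06
mpi_run_cmd /lustre/home/apps/mpi/db/mvapich-1.1 16 stc_06
```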

(10)
(11)

Not a big fan of debuggers

End up debugging the debugger

Steep learning curve

Can be misleading

Difficult for large processor count and the problem might only show up there

My favorite debuggers are

printf

(12)

However...

I recently used ddt to find a problem for which printf did not work. Without ddt it might have taken me weeks.

Print statements might make the problem go away

Debuggers are useful for learning a program that you have never seen

(13)

Allinea DDT debugger

X-Windows based

ssh -X ra

An initial setup is done the first time you run ddt

Works with both Portland Group and Intel Fortran

Good support for Fortran modules

(14)

Environment for ddt

.tcshrc

set path = ( /lustre/home/apps/ddt2.4.1/bin $path )
setenv DMALLOCPATH /lustre/home/apps/ddt2.4.1
setenv DMALLOC
setenv LD_LIBRARY_PATH $DMALLOCPATH/lib/64:$LD_LIBRARY_PATH

.bashrc

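export PATH=/lustre/home/apps/ddt2.4.1/bin:$PATH
export DMALLOCPATH=/lustre/home/apps/ddt2.4.1
export DMALLOC=""
export LD_LIBRARY_PATH=$DMALLOCPATH/lib/64:$LD_LIBRARY_PATH

Requires that you use an MPI that supports debugging, such as those listed above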

(15)

Debug Compile Line

mpicc -g \
    -L/lustre/home/apps/gdb-6.8/lib64 \
    -liberty \
    stc_06.c \
    -o stc_06.g

(16)

Debug Compile Line

mpicc -g -L/lustre/home/apps/gdb-6.8/lib64 -liberty \
    stc_06.c \
    /lustre/home/apps/ddt2.4.1/lib/64/libdmalloc.a -o \
    stc_06.g

Here we link to the debug memory library. This is required if you want to track memory usage in ddt.
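
The two compile lines differ only in the extra libdmalloc.a argument, so one helper can produce either. This is a sketch, not a build system: `debug_compile_cmd` is a hypothetical name, and it prints the command instead of running mpicc so it can be tried anywhere.

```shell
#!/bin/sh
# Sketch: assemble the debug compile line, optionally with the
# memory-debug library. debug_compile_cmd SOURCE.c [memdebug]
debug_compile_cmd() {
    src="$1"
    cmd="mpicc -g -L/lustre/home/apps/gdb-6.8/lib64 -liberty $src"
    if [ "${2:-}" = "memdebug" ]; then
        # static dmalloc library shipped with ddt 2.4.1 (path from the slide)
        cmd="$cmd /lustre/home/apps/ddt2.4.1/lib/64/libdmalloc.a"
    fi
    # foo.c becomes foo.g, following the naming used on the slides
    echo "$cmd -o ${src%.c}.g"
}

debug_compile_cmd stc_06.c
debug_compile_cmd stc_06.c memdebug
```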

(17)

stdin stdout stderr

stdin works for both Intel and Portland Group

stdout works with the Intel compiler without modification

Portland Group compiler requires a special call to be able to see stdout while the program is running (before MPI_Init)

This is NOT a bug

call setvbuf3f(6,2,0) for Fortran (unit 6 is stdout; type 2 selects no buffering)

(18)

Initial ddt setup

Run first time, creates a directory ~/.ddt

type ddt

Choose an MPI version

Choose a list of nodes (Default)

Note location of this file

Need to change this list to connect to running process

(19)
(20)

Running ddt

Select “Run and Debug a Program”

Select the program that you will run

Set number of processes

Most likely, set threads to “off”

Click Run

(21)

To show you...

Routine required for correct stdio with Portland Group compiler

Setting stdin

Module support

Changing values

(22)

Option: Let ddt submit a batch job

Your run script becomes a template in which ddt fills in the arguments at submit time

Tell ddt the particulars

Program

Input

# processors <= 16

ddt will watch the queue for your job to start and then connect

(23)

Let ddt submit a batch job

Change your run line to run ddt with your program as an argument

mpiexec -n 8 stf_03.g < st.in

for example, becomes

mpiexec -n NUM_PROCS_TAG DDTPATH_TAG/bin/ddt-debugger DDT_DEBUGGER_ARGUMENTS_TAG PROGRAM_ARGUMENTS_TAG

Add (not required, but useful for attaching to already running jobs):

sort -u $PBS_NODEFILE > mynodes.$PBS_JOBID
cp mynodes.$PBS_JOBID ~/.ddt/nodes
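
ddt performs the tag substitution itself at submit time. The sed sketch below only imitates it, to show roughly what a filled-in run line might look like. The helper name `fill_tag` and the chosen values (8 processes, the ddt 2.4.1 install path) are assumptions; the remaining tags are left untouched because their expansions are not shown on the slides.

```shell
#!/bin/sh
# Sketch: imitate template filling with sed. Not how ddt does it internally.
# fill_tag TAG VALUE   reads the template on stdin, writes the result to stdout
fill_tag() {
    sed "s|$1|$2|g"
}

template='mpiexec -n NUM_PROCS_TAG DDTPATH_TAG/bin/ddt-debugger DDT_DEBUGGER_ARGUMENTS_TAG PROGRAM_ARGUMENTS_TAG'

echo "$template" \
    | fill_tag NUM_PROCS_TAG 8 \
    | fill_tag DDTPATH_TAG /lustre/home/apps/ddt2.4.1
```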

(24)

A simple script

(more later for specific versions of MPI)

#!/bin/csh
#PBS -l nodes=1:ppn=8
#PBS -l walltime=00:10:00
#PBS -N testIO
#PBS -o stdout.$PBS_JOBID
#PBS -e stderr.$PBS_JOBID
#PBS -r n
#PBS -V
#---
cd $PBS_O_WORKDIR
#save a nicely sorted list of nodes
sort -u $PBS_NODEFILE > mynodes.$PBS_JOBID
cp mynodes.$PBS_JOBID ~/.ddt/nodes
#for openmpi
#mpiexec -n 8 stf_03.g < st.in

Note that the mpiexec line is commented out.

(25)
(26)
(27)

Let ddt submit the job for you

(28)
(29)

OpenMPI

Debug Script

#!/bin/csh
#PBS -l nodes=1:ppn=8
#PBS -l walltime=00:10:00
#PBS -N testIO
#PBS -o stdout.$PBS_JOBID
#PBS -e stderr.$PBS_JOBID
#PBS -r n
#PBS -V
#---
cd $PBS_O_WORKDIR
#save a nicely sorted list of nodes
sort -u $PBS_NODEFILE > mynodes.$PBS_JOBID
cp mynodes.$PBS_JOBID ~/.ddt/nodes
DDTPATH_TAG/bin/ddt-client DDT_DEBUGGER_ARGUMENTS_TAG mpiexec -np \
    NUM_PROCS_TAG EXTRA_MPI_ARGUMENTS_TAG PROGRAM_TAG \

(30)

MVAPICH2

Debug Script

#!/bin/csh
#PBS -l nodes=1:ppn=8
#PBS -l walltime=00:10:00
#PBS -N testIO
#PBS -o stdout.$PBS_JOBID
#PBS -e stderr.$PBS_JOBID
#PBS -r n
#PBS -V
#---
cd $PBS_O_WORKDIR
#save a nicely sorted list of nodes
sort -u $PBS_NODEFILE > mynodes.$PBS_JOBID
cp mynodes.$PBS_JOBID ~/.ddt/nodes

(31)

MVAPICH-1.1

Debug Script

#!/bin/csh
#PBS -l nodes=1:ppn=8
#PBS -l walltime=00:15:00
#PBS -N testIO
#PBS -o stdout.$PBS_JOBID
#PBS -e stderr.$PBS_JOBID
#PBS -r n
#PBS -V
cd $PBS_O_WORKDIR
#save a nicely sorted list of nodes
sort -u $PBS_NODEFILE > mynodes.$PBS_JOBID
cp mynodes.$PBS_JOBID ~/.ddt/nodes
mpirun_rsh -hostfile $PBS_NODEFILE -n \
    NUM_PROCS_TAG DDTPATH_TAG/bin/ddt-debugger \

(32)

Attaching to a batch job

Key here is that ddt needs to know where your job is running

Add the following two lines to your script

sort -u $PBS_NODEFILE > mynodes.$PBS_JOBID
cp mynodes.$PBS_JOBID ~/.ddt/nodes

(33)

Attaching to a batch job

(34)

To Attach to a Running Process

Session - New Session - Attach

List should pop up

(35)
(36)

Attaching to an interactive job

Key here is that ddt needs to know where your job is running

ddt will look in ~/.ddt/nodes for nodes to search

(37)

Attaching to an interactive job

(38)
(39)

Things to show...

Changing MPI version

Basic setup

Launching a parallel job

Setting break points

Seeing and changing variables

Seeing modules

(40)
(41)

Integrated Performance Monitoring (IPM)

Developed by Nick Wright of SDSC

http://www.sdsc.edu/us/tools/top/ipm/

Local limited documentation http://geco.mines.edu/ipm/

Available on RA for Experimental versions of MVAPICH*

Normal Compile - adding IPM library

Normal MPI run

Summary of MPI stats at the end of your run to stdout

(42)

Integrated Performance Monitoring (IPM)

Integrated Performance Monitoring (IPM) is a tool that allows users to obtain a concise summary of the performance and communication characteristics of their codes. IPM is invoked by the user at the time a job is run. By default, a short, text-based summary of the code's performance is provided, and a more detailed Web page summary with graphs to help visualize the

(43)

Environment Additions for IPM

.tcshrc

set path = ( $path /lustre/home/apps/pl/bin )
set path = ( $path /lustre/home/apps/ipm/bin )
setenv IPM_KEYFILE /lustre/home/apps/ipm/ipm_key

.bashrc

export PATH=$PATH:/lustre/home/apps/pl/bin
export PATH=$PATH:/lustre/home/apps/ipm/bin
export IPM_KEYFILE=/lustre/home/apps/ipm/ipm_key

(44)

Compiling for IPM

mpif90 -g stf_03.f90 -L$MPI_BASE/ipm/lib -lipm -o stf_03.ipm

VERSION              Works?
mvapich-1.1/pg       yes
mvapich-1.1/intel    Stay Tuned
mvapich2-1.2/pg      yes
mvapich2-1.2/intel   yes

(45)

##IPMv0.923####################################################################
#
# command : unknown (completed)
# host    : compute-9-9/x86_64_Linux     mpi_tasks : 8 on 1 nodes
# start   : 03/24/09/14:08:52            wallclock : 31.347469 sec
# stop    : 03/24/09/14:09:24            %comm     : 1.24
# gbytes  : 0.00000e+00 total            gflop/sec : 0.00000e+00 total
#
##############################################################################
# region  : *       [ntasks] = 8
#
#                   [total]       <avg>         min           max
# entries           8             1             1             1
# wallclock         250.773       31.3467       31.3465       31.3475
# user              250.589       31.3236       31.1813       31.3532
# system            0.448929      0.0561161     0.043993      0.089986
# mpi               3.10778       0.388473      0.112456      0.610158
# %comm                           1.23925       0.35875       1.94643
# gflop/sec         0             0             0             0
# gbytes            0             0             0             0
#
#
#                   [time]        [calls]       <%mpi>        <%wall>
# MPI_Recv          2.60098       32032         83.69         1.04
# MPI_Reduce        0.272061      8000          8.75          0.11
# MPI_Send          0.232291      32032         7.47          0.09
# MPI_Bcast         0.00119273    96            0.04          0.00
# MPI_Comm_size     0.000790782   24            0.03          0.00
# MPI_Allreduce     0.000330307   32            0.01          0.00
# MPI_Allgather     0.000130918   16            0.00          0.00
# MPI_Comm_rank     6.7791e-06    46            0.00          0.00
###############################################################################
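
The percentage columns in this summary are plain ratios of the times printed alongside them, which is a quick way to sanity-check a report. A minimal sketch with awk follows; `pct` is a made-up helper name, and the numbers are copied from the summary above.

```shell
#!/bin/sh
# Sketch: recompute the percentage columns of the IPM text summary.
# pct PART WHOLE prints 100 * PART / WHOLE with two decimals.
pct() {
    awk -v part="$1" -v whole="$2" 'BEGIN { printf "%.2f\n", 100 * part / whole }'
}

pct 3.10778 250.773    # %comm:            total mpi time / total wallclock
pct 2.60098 3.10778    # MPI_Recv <%mpi>:  time in MPI_Recv / total mpi time
pct 2.60098 250.773    # MPI_Recv <%wall>: time in MPI_Recv / total wallclock
```

The three lines print 1.24, 83.69 and 1.04, matching the %comm header value and the MPI_Recv row of the report.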

(46)

Generate a web page:

ipm_parse -html tkaiser.1237925332.870435.0

IPM profile for unknown (HTML report, 3/24/09 2:15 PM)

Report sections: Load Balance, Communication Balance, Message Buffer Sizes, Communication Topology, Switch Traffic, Memory Usage, Executable Info, Host List, Environment, Developer Info

command:          unknown
codename:         unknown
state:            running
username:         tkaiser
group:            tkaiser
host:             compute-9-9 (x86_64_Linux)
mpi_tasks:        8 on 1 hosts
start:            03/24/09/14:08:52
stop:             03/24/09/14:09:24
wallclock:        3.13475e+01 sec
%comm:            1.23924675013956
total memory:     0 gbytes
total gflop/sec:  0.255203764255523
switch(send):     0 gbytes
switch(recv):     0 gbytes

HPM Counter Statistics

Event   Ntasks   Avg    Min(rank)   Max(rank)
NULL    *        0.00   0 (0)       0 (0)

Communication Event Statistics (100.00% detail, 3.0422e-06 error)

             Buffer Size   Ncalls   Total Time   Min Time    Max Time    %MPI    %Wall
MPI_Recv     8016          12012    1.779        2.316e-06   1.511e-02   57.26   0.71
MPI_Recv     4016          8008     0.816        8.717e-07   1.487e-02   26.26   0.33
MPI_Reduce   8             8000     0.272        3.898e-06   9.003e-04   8.75    0.11
MPI_Send     8016          16016    0.192        4.191e-08   6.679e-05   6.17    0.08
MPI_Send     4016          16016    0.041        5.402e-08   4.328e-05   1.30    0.02

[Figures in the report: Communication % of MPI Time; Load balance by task: HPM counters (by MPI rank, by MPI time); Load balance by task: memory, flops, timings (by MPI rank, by MPI time); Communication balance by task, sorted by MPI time (by MPI rank, time detail by MPI time, time detail by rank, call list); Message Buffer Size Distributions: time (cumulative values, values); Message Buffer Size Distributions: Ncalls (cumulative values, values); Switch Traffic (volume by node); Communication Topology: point to point data flow (data sent, data recv, time spent, map_data file, map_adjacency file)]

(47)

Can profile sections

Fortran:

!turn on profiling
call mpi_pcontrol( 1,"proc_a"//char(0))
...
!turn off profiling
call mpi_pcontrol( -1,"proc_a"//char(0))

C:

/* turn on profiling*/
MPI_Pcontrol( 1,"proc_a");
...
/* turn off profiling*/
MPI_Pcontrol(-1,"proc_a");

Report will have a new page with the given label

(48)

What’s Missing? What are we doing about it?

Timeline style program tracing

Time in MPI routines

Communication patterns

Time in “other” routines

Memory Tracking

Performance numbers

Flops

(49)

Tracing

Evaluated a commercial package and rejected it

Will be installing Tau

http://www.cs.uoregon.edu/research/tau/home.php

Large package which does preprocessing of source

Works with many analysis packages

Includes memory tracking if malloc/allocate can be seen

(50)

Performance Information

Some Examples:

http://www.ncsa.uiuc.edu/UserInfo/Resources/Software/Tools/PAPI/

http://perfsuite.ncsa.uiuc.edu/publications/LJ135/x50.html

How do we get it?

(51)

PAPI - Performance API

http://icl.cs.utk.edu/papi/

Specifies a standard application programming interface (API) for accessing hardware performance counters available on most modern microprocessors

Used by both Tau and IPM

Can show the effects of different optimizations

(52)

Tau and PAPI part of POINT

http://nic.uoregon.edu/point

Productivity from Open, INtegrated Tools (POINT) project is funded as part of the NSF's Software Development for Cyberinfrastructure (SDCI) program

Goal: integrate, harden, and deploy an open,

(53)

Summary

The DDT debugger is available for parallel applications

DDT can also track memory usage

IPM is currently available for simple profiling

We will be installing additional performance analysis tools

(54)

### mpi settings ###
# Keep only one MPI_VERSION line active; if all three are left in, the last one wins.
setenv MPI_VERSION /lustre/home/apps/mpi/db/mvapich-1.1
setenv MPI_VERSION /lustre/home/apps/mpi/db/mvapich2-1.2
setenv MPI_VERSION /lustre/home/apps/mpi/db/openmpi1.3.1
setenv MPI_COMPILER intel
#setenv MPI_COMPILER pg
if ( $?MPI_COMPILER && $?MPI_VERSION ) then
    setenv MPI_BASE $MPI_VERSION/$MPI_COMPILER
    setenv LD_LIBRARY_PATH $MPI_BASE/lib:$LD_LIBRARY_PATH
    setenv LD_LIBRARY_PATH $MPI_BASE/lib/shared:$LD_LIBRARY_PATH
    setenv MANPATH $MPI_BASE/man:$MPI_BASE/shared/man:$MANPATH
    set path = ( $MPI_BASE/bin $path )
endif
### ddt settings ###
set path = ( /lustre/home/apps/ddt2.4.1/bin $path )
setenv DMALLOCPATH /lustre/home/apps/ddt2.4.1
setenv DMALLOC
setenv LD_LIBRARY_PATH $DMALLOCPATH/lib/64:$LD_LIBRARY_PATH
### ipm settings ###
set path = ( $path /lustre/home/apps/pl/bin )
set path = ( $path /lustre/home/apps/ipm/bin )
setenv IPM_KEYFILE /lustre/home/apps/ipm/ipm_key

(55)

### mpi settings ###
# Keep only one MPI_VERSION line active; if all three are left in, the last one wins.
export MPI_VERSION=/lustre/home/apps/mpi/db/mvapich-1.1
export MPI_VERSION=/lustre/home/apps/mpi/db/mvapich2-1.2
export MPI_VERSION=/lustre/home/apps/mpi/db/openmpi1.3.1
export MPI_COMPILER=intel
#export MPI_COMPILER=pg
if [ -n "$MPI_COMPILER" ]; then
    if [ -n "$MPI_VERSION" ]; then
        export MPI_BASE=$MPI_VERSION/$MPI_COMPILER
        export LD_LIBRARY_PATH=$MPI_BASE/lib:$LD_LIBRARY_PATH
        export LD_LIBRARY_PATH=$MPI_BASE/lib/shared:$LD_LIBRARY_PATH
        export MANPATH=$MPI_BASE/man:$MPI_BASE/shared/man:$MANPATH
        export PATH=$MPI_BASE/bin:$PATH
    fi
fi
### ddt settings ###
export PATH=/lustre/home/apps/ddt2.4.1/bin:$PATH
export DMALLOCPATH=/lustre/home/apps/ddt2.4.1
export DMALLOC=""
export LD_LIBRARY_PATH=$DMALLOCPATH/lib/64:$LD_LIBRARY_PATH
### ipm settings ###
export PATH=$PATH:/lustre/home/apps/pl/bin
export PATH=$PATH:/lustre/home/apps/ipm/bin
export IPM_KEYFILE=/lustre/home/apps/ipm/ipm_key
