Retrospective Theses and Dissertations
Iowa State University Capstones, Theses and Dissertations
1-1-2002
Deadlock detection in MPI programs
Yan Zou, Iowa State University
Recommended Citation
Zou, Yan, "Deadlock detection in MPI programs" (2002). Retrospective Theses and Dissertations. 21380. https://lib.dr.iastate.edu/rtd/21380
This Thesis is brought to you for free and open access by the Iowa State University Capstones, Theses and Dissertations at Iowa State University Digital Repository. It has been accepted for inclusion in Retrospective Theses and Dissertations by an authorized administrator of Iowa State University Digital Repository.
by
Yan Zou
A thesis submitted to the graduate faculty
in partial fulfillment of the requirements for the degree of MASTER OF SCIENCE
Major: Computer Engineering
Program of Study Committee:
Glenn R. Luecke, Co-major Professor
Douglas W. Jacobson, Co-major Professor
Julie Dickerson
Iowa State University
Ames, Iowa
2002
Graduate College
Iowa State University
This is to certify that the master's thesis of Yan Zou
has met the thesis requirements of Iowa State University
TABLE OF CONTENTS
LIST OF FIGURES
ABSTRACT
CHAPTER 1. GENERAL INTRODUCTION
Introduction
Thesis Organization
CHAPTER 2. DEADLOCK DETECTION IN MPI PROGRAMS
Abstract
1. Introduction
2. Deadlock Detection for Blocking Point-to-Point MPI Routines
2.1 Deadlock Detection Strategy
2.2 Handshake Procedure for mpi_bsend
2.3 Handshake Procedure for mpi_sendrecv and mpi_sendrecv_replace
2.4 Detection Strategy Considering mpi_probe Problem
3. Deadlock Detection for Non-blocking Point-to-Point MPI Routines
4. Deadlock Detection for Collective MPI Routines
5. Conclusions
CHAPTER 3. GENERAL CONCLUSIONS
APPENDIX. HANDSHAKE ROUTINES
BIBLIOGRAPHY
LIST OF FIGURES
Figure 1. Dependency cycle with two processes
Figure 2. Dependency with four processes
Figure 3. Code inserted prior to calling mpi_send, mpi_ssend or mpi_rsend
Figure 4. Code inserted prior to calling mpi_recv
Figure 5. A cycle with mpi_bsend involved
Figure 6. Code inserted prior to calling mpi_bsend
Figure 7. A dependency cycle involving mpi_sendrecv
Figure 8. Code inserted prior to calling mpi_sendrecv
Figure 9. Dependency cycle involving a call to mpi_probe
Figure 10. Handshake strategy in receiver side
Figure 11. Dependency cycle involving non-blocking calls
Figure 12. Situation where a deadlock will never occur
Figure 13. Situation where a deadlock will never occur
Figure 14. Potential deadlock situation
Figure 15. No actual or potential deadlock situation
Figure 16. No actual or potential deadlock situation
Figure 17. Code inserted prior to calling mpi_isend/mpi_issend/mpi_irsend and the corresponding mpi_wait
Figure 18. Code inserted prior to calling mpi_irecv and the corresponding mpi_wait
Figure 19. Collective operation with a missing call
Figure 20. Mismatched collective operations
Figure 21. Incorrectly ordered collective operations
Figure 22. Incorrectly ordered collective operations
Figure 23. Interleaved collective and send_recv operations
Figure 24. Code inserted prior to a collective routine (on process with rank > 0)
Figure 25. Code inserted prior to a collective routine (on process with rank = 0)
ABSTRACT
Message Passing Interface (MPI) [2, 3] is commonly used to write parallel programs for distributed memory parallel computers. MPI-CHECK is a tool developed by the Iowa State University's High Performance Computing Group to aid in the debugging of MPI programs that are written in free or fixed format Fortran 90 and Fortran 77. MPI-CHECK provides automatic compile-time checking and some run-time checking of MPI programs. However, MPI-CHECK 1.0 does not detect situations where possible deadlocks may occur. This thesis presents the methods used in MPI-CHECK 2.0 to detect many situations where actual and potential deadlocks may occur when using blocking and non-blocking point-to-point routines as well as when using collective routines.
CHAPTER 1. GENERAL INTRODUCTION
Introduction
The parallel programming paradigm is powerful in that it allows scientists and engineers to address a variety of computationally expensive problems. Message Passing Interface (MPI) [2][3] is commonly used to write parallel programs for distributed memory parallel computers. Unfortunately, debugging parallel and distributed programs is often more difficult than debugging sequential ones because of the inherently non-deterministic nature of these programs. A number of factors complicate parallel debugging. One difficult area involves detecting or locating communication errors. Concurrently executing processes complicate program understanding and can obscure the point of origin of errors: an error can originate in a process other than the one showing its symptoms. Accordingly, well-known sequential (deterministic) techniques such as step-by-step execution, breakpoints, and re-executing a program until a breakpoint is hit are not directly applicable in a distributed environment. New techniques and methods are needed, as well as tools that support this new style of debugging.
MPI-CHECK [1] is a tool developed by the Iowa State University's High Performance Computing Group to aid in the debugging of MPI programs that are written in free or fixed format Fortran 90 and Fortran 77. MPI-CHECK provides automatic compile-time checking and some run-time checking of MPI programs. However, MPI-CHECK 1.0 does not detect situations where possible deadlocks may occur.
This thesis presents methods for the automatic detection of actual and potential deadlocks in MPI programs. These methods have been implemented in MPI-CHECK 2.0 for MPI programs written in free or fixed format Fortran 90 and Fortran 77.
However, the methods presented in this thesis may also be applied to MPI programs written in C or C++.
While MPI-CHECK was being developed, a project named Umpire was being carried out at Lawrence Livermore National Laboratory. Umpire [4] is a tool for detecting MPI errors at run-time. Its deadlock detection function tracks blocking point-to-point and collective MPI communication calls, communicator management routines, and completions of non-blocking requests, and detects cycles in dependency graphs prior to program execution. Unlike MPI-CHECK, Umpire uses a central manager to collect the MPI call information and check it with a verification algorithm. The central manager then controls the execution of the MPI program. The manager communicates with all MPI processes via its shared memory buffers. Currently Umpire only runs on shared memory machines.
Other debugging efforts for distributed programs have focused on tools based on collecting data during program execution and then replaying the program in a controlled way using the collected data [5-9, 11, 13-17]. This approach has several problems, which are discussed in the following three paragraphs.
First, since each process generates a trace file during execution, hundreds of files may be generated when the program uses hundreds of processes. Users may run out of disk space because of the large volume of trace data.
Second, when a deadlock occurs, the trace files are likely to be incomplete. This can cause an incorrect replay of the program execution.
Third, some post-mortem debugging tools, such as Ariadne [8], require the user to provide information about the execution of their MPI programs. The problem here is that few users will be able to provide this information, and users tend to be reluctant to use debugging tools that require a learning curve.
Thesis Organization
Chapter 2 presents the paper "Deadlock Detection in MPI Programs", of which Yan Zou is the primary researcher and author. This paper introduces all the deadlock detection strategies in MPI-CHECK 2.0 and their implementation methodologies. General conclusions are given in chapter 3.
CHAPTER 2. DEADLOCK DETECTION IN MPI PROGRAMS
A paper to be submitted to the Journal of Concurrency and Computation: Practice and Experience
Yan Zou, Glenn Luecke
Abstract
Message Passing Interface (MPI) [2, 3] is commonly used to write parallel programs for distributed memory parallel computers. MPI-CHECK is a tool developed to aid in the debugging of MPI programs that are written in free or fixed format Fortran 90 and Fortran 77. This paper presents the methods used in MPI-CHECK 2.0 to detect many situations where actual and potential deadlocks occur when using blocking and non-blocking point-to-point routines as well as when using collective routines.
1. Introduction
MPI [2, 3] is commonly used to write programs for distributed memory parallel computers. Since writing and debugging MPI programs is often difficult and time consuming, MPI-CHECK [1] has been developed to help make this process easier and less time consuming. However, MPI-CHECK 1.0 does not detect situations where possible deadlocks may occur. This paper presents the methods used in MPI-CHECK 2.0 to detect situations where a deadlock may occur when using blocking and some non-blocking point-to-point routines as well as when using collective routines.
MPI-CHECK 2.0 detects "actual" and "potential" deadlocks in MPI programs. An "actual" deadlock occurs when a process waits for something to occur that will never occur. For example, an "actual" deadlock will occur if a process executes the MPI synchronous send, mpi_ssend, and there is no corresponding call to an MPI receive routine. A "potential" deadlock occurs in those situations where the MPI program may have an "actual" deadlock depending on the MPI implementation. For example, a "potential" deadlock will occur when a process executes the MPI standard send, mpi_send, if there is no corresponding MPI receive and if the call to mpi_send copies the message to a buffer and execution continues. Thus, "potential" deadlocks are not deadlocks in the sense that execution stops, but they are definitely produced by incorrect MPI code. Notice that if a process executes an MPI buffered send, mpi_bsend, and there is no corresponding MPI receive, then this is not considered a "potential" deadlock, and MPI-CHECK currently does not find this programming error. MPI-CHECK 2.0 detects "actual" and "potential" deadlocks when using blocking and some non-blocking point-to-point routines as well as when using collective routines.
While MPI-CHECK was being developed, a project named Umpire was being carried out at Lawrence Livermore National Laboratory. Umpire [4] is a tool for detecting MPI errors at run-time. Its deadlock detection function tracks blocking point-to-point and collective MPI communication calls, communicator management routines, and completions of non-blocking requests, and detects cycles in dependency graphs prior to program execution. Unlike MPI-CHECK, Umpire uses a central manager to collect the MPI call information and check it with a verification algorithm. The central manager then controls the execution of the MPI program. The manager communicates with all MPI processes via its shared memory buffers. Currently Umpire only runs on shared memory machines.
In this paper, detection of "actual" and "potential" deadlock situations involving blocking point-to-point MPI routines is discussed in section 2. Detection of actual and potential deadlock situations involving non-blocking point-to-point MPI routines is discussed in section 3. Detection of actual and potential deadlock situations caused by the incorrect use of collective MPI routines is discussed in section 4. Section 5 contains our conclusions.
2. Deadlock Detection for Blocking Point-to-Point MPI Routines
MPI provides both blocking and non-blocking point-to-point communication routines. Recall that when a process executes a blocking point-to-point routine, execution does not continue until it is safe to change the send/receive buffer. This section presents the methods used by MPI-CHECK to detect "actual" and "potential" deadlocks for blocking, point-to-point MPI routines.
2.1 Deadlock Detection Strategy
There are three categories of actual or potential deadlock situations that occur when using blocking, point-to-point MPI routines:
1. a process executes a receive routine and there is no corresponding call to a send routine,
2. a process executes mpi_send, mpi_ssend or mpi_rsend and there is no corresponding call to a receive routine, and
3. a send-receive cycle may exist due to incorrect usage of sends and receives.

It is obvious that the situation described in item 1 above causes an actual deadlock. As was explained in the introduction, the situation described in item 2 will cause an actual deadlock when using the synchronous send, mpi_ssend, and sometimes when using the standard send, mpi_send. The situation in item 2 is a potential deadlock when using mpi_send. According to the MPI standard, a ready mode send operation, mpi_rsend, may be started only if the matching receive has been posted; otherwise, the operation is erroneous and its outcome is undefined. MPI-CHECK does not currently determine if a matching receive has been posted prior to the execution of the call to mpi_rsend, but it does determine if there is a matching receive. Notice that if a process calls the buffered send, mpi_bsend, and there is no matching receive, then this is neither an actual nor a potential deadlock situation. Currently, MPI-CHECK does not check for matching receives when mpi_bsend is called. Detailed information about how MPI-CHECK handles mpi_bsend can be found in section 2.2.
Figures 1 and 2 illustrate the incorrect usage of sends and receives when using mpi_ssend for a dependency cycle with two processes and with four processes. Notice that no send can complete until the corresponding receive has been posted. This causes an actual deadlock. If one uses mpi_send in Figures 1 and 2, then either an actual or potential deadlock will occur depending on the implementation of mpi_send and the message size used. If one uses mpi_bsend for at least one of the sends in Figures 1 and 2, then no deadlock will occur and the usage is correct. MPI-CHECK will detect such cycles when using mpi_ssend, mpi_rsend, and mpi_send.
Case 1                        Case 2
Process 0    Process 1        Process 0    Process 1
Send         Send             Send         Recv
Recv         Recv             Send         Recv
Figure 1. Dependency cycle with two processes

Process 0    Process 1    Process 2    Process 3
Send         Send         Send         Send
Recv         Recv         Recv         Recv
Figure 2. Dependency with four processes
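The dependency cycles in Figures 1 and 2 can be made concrete with a small simulation. The sketch below is not part of MPI-CHECK, which detects cycles with timed handshakes rather than by analyzing the program; it is a toy written in Python for brevity. It steps each process through its calls under the worst-case assumption that every standard send behaves like mpi_ssend, while mpi_bsend always completes into a buffer. The operation tuples and the function name simulate are hypothetical.

```python
from collections import Counter


def simulate(programs):
    """Run blocking point-to-point calls to a fixed point; return the ranks left blocked.

    programs[rank] is a list of (kind, peer, tag) tuples, where kind is
      "ssend": completes only when peer is blocked in a matching "recv"
      "bsend": always completes; the message waits in a buffer
      "recv":  completes from a buffered message or by rendezvous with an "ssend"
    """
    pc = [0] * len(programs)    # index of the next call on each rank
    buffered = Counter()        # (source, dest, tag) -> number of buffered messages

    def current(rank):
        ops = programs[rank]
        return ops[pc[rank]] if pc[rank] < len(ops) else None

    progress = True
    while progress:
        progress = False
        for rank in range(len(programs)):
            op = current(rank)
            if op is None:
                continue
            kind, peer, tag = op
            if kind == "bsend":
                buffered[(rank, peer, tag)] += 1
                pc[rank] += 1
                progress = True
            elif kind == "recv" and buffered[(peer, rank, tag)] > 0:
                buffered[(peer, rank, tag)] -= 1
                pc[rank] += 1
                progress = True
            elif kind == "ssend" and current(peer) == ("recv", rank, tag):
                pc[rank] += 1   # rendezvous: sender and receiver complete together
                pc[peer] += 1
                progress = True
    return {rank for rank in range(len(programs)) if current(rank) is not None}


# Figure 1, Case 1: both ranks send to each other first.
fig1_case1 = [[("ssend", 1, 0), ("recv", 1, 0)],
              [("ssend", 0, 0), ("recv", 0, 0)]]
# Figure 2: four ranks in a ring, each sending to rank+1 and receiving from rank-1.
fig2 = [[("ssend", (r + 1) % 4, 0), ("recv", (r - 1) % 4, 0)] for r in range(4)]
# Case 1 again, but rank 0 uses a buffered send (compare Figure 5).
one_bsend = [[("bsend", 1, 0), ("recv", 1, 0)],
             [("ssend", 0, 0), ("recv", 0, 0)]]

print(simulate(fig1_case1), simulate(fig2), simulate(one_bsend))
```

Run as written, both ranks of Case 1 and all four ranks of Figure 2 are reported as blocked, while no rank stays blocked once one send is buffered, which matches the discussion above.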
We next discuss methods that could be used to automatically detect the actual and potential deadlocks discussed above. One possible method would be to have MPI-CHECK automatically replace all mpi_send and mpi_rsend calls in the MPI program with mpi_ssend. When the modified program is executed under the control of a debugger, the debugger will stop at the point of the deadlock. There are several problems with this approach. First, there may not be a parallel debugger available on the machine being used. Second, even if there is a parallel debugger, recompiling a large application code for the debugger and executing it under the debugger may take an unacceptably long time. For these reasons, this methodology for detecting actual and potential deadlocks was not used in MPI-CHECK.
Another possible methodology for finding actual and potential deadlocks for blocking routines would be to have the MPI program execute under the control of a central manager, similar to what is done in Umpire [4] for shared memory debugging. However, there are difficulties when using a central manager. For example, suppose one were debugging an application using p processes and the central manager is executing on one of these processes. If a deadlock were to occur on the process the central manager is executing on, then the central manager could not function. Notice also that the central manager will likely significantly delay MPI communication on its process. Thus, one would have to request p+1 processes when executing an MPI program with p processes. In addition, a central manager does not scale well to large numbers of processors. In [4], it was stated that Umpire might be extended to distributed memory parallel machines using the central manager approach. We decided not to use this approach for MPI-CHECK.
MPI-CHECK takes a different approach to the automatic detection of actual and potential deadlocks. The idea is for MPI-CHECK to insert "handshaking" code prior to each call to a send routine and each call to a receive routine. If the "handshake" does not occur within a time set by the user (the user can adjust the threshold to accommodate their platform and application), then MPI-CHECK will issue a warning message that a "handshake" has not occurred within the specified time, give the line number in the file where the problem occurred, and list the MPI routine exactly as it appears in the source listing. Users have the option of having MPI-CHECK stop execution of the MPI program when an actual or potential deadlock is detected or allowing MPI-CHECK to continue program execution after the problem is detected.
The "handshaking" strategy utilized by MPI-CHECK can be described as follows. Part of the handshaking involves comparing data from the call to the MPI send routine and the call to the MPI receive routine. If MPI-CHECK encounters a call to

mpi_send(buf, count, datatype, dest, tag, comm, ierror),

then the following information is stored in the character*512 variable send_info:

send_info = {filename, start_line, end_line, count, get_rank(comm), datatype, tag},

where start_line and end_line are the beginning and ending line numbers of the call to mpi_send in the file named "filename". If the line containing the mpi_send is not continued, then start_line and end_line will be the same.
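As an illustration of how send_info might be laid out, the following Python sketch packs the seven fields listed above into a fixed 512-character, blank-separated record and parses it back, roughly as a list-directed Fortran write/read pair would. The function names, and the use of a datatype name string (in Fortran MPI the datatype is an integer handle), are assumptions made for readability; what matters is that the packing and unpacking sides agree on the field order.

```python
RECORD_LEN = 512  # send_info is a character*512 variable in the instrumented code


def pack_send_info(filename, start_line, end_line, count, rank, datatype, tag):
    """Pack the handshake fields, in the order given in the text, into one record."""
    if " " in filename:
        raise ValueError("a blank in the filename would split it into two fields")
    fields = (filename, start_line, end_line, count, rank, datatype, tag)
    text = " ".join(str(field) for field in fields)
    if len(text) > RECORD_LEN:
        raise ValueError("send_info does not fit in the record")
    return text.ljust(RECORD_LEN)   # blank-pad, as Fortran pads character variables


def unpack_send_info(record):
    """Parse a record produced by pack_send_info back into typed fields."""
    filename, start, end, count, rank, datatype, tag = record.split()
    return filename, int(start), int(end), int(count), int(rank), datatype, int(tag)


record = pack_send_info("demo.f90", 29, 29, 100, 0, "mpi_real", 1)
print(len(record), unpack_send_info(record))
```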
The "handshake" for the mpi_send proceeds as follows: The process executing mpi_send sends send_info to process "dest" using a non-blocking send, mpi_isend, with a (hopefully) unique tag, MPI_CHECK_Tag1 + tag, to avoid possible tag conflicts with other messages. The following three possibilities may occur:
1. The message was never received on process "dest" and the sending process does not receive a response message within a specified time. In this case, a warning message is issued.
2. The message was received on process "dest", the information in send_info was consistent with the argument information in the call to mpi_recv, and process "dest" sends a reply to the sending process stating that everything is okay. The reply is received by calling mpi_irecv.
3. The message was received on process "dest", but the information in send_info was NOT consistent with the argument information in the call to mpi_recv. In this case, process "dest" issues a message stating what the inconsistencies are and then sends a reply to the sending process to indicate that the message was received.

The "handshake" for the mpi_recv proceeds as follows: The process executing mpi_recv waits to receive (by calling mpi_irecv) a message from "source" with tag MPI_CHECK_Tag1 + tag, where "tag" is obtained from the call to mpi_recv. (If "tag" is mpi_any_tag, then mpi_any_tag is used as the tag for receiving the message.) The following three possibilities may now occur:
1. A message was never received within the specified time. In this case, a warning message is issued.
2. A message was received and the information in send_info was consistent with the argument information in the call to mpi_recv. A reply message is sent to the sending process indicating that everything is okay.
3. The message was received and the information in send_info was NOT consistent with the argument information in the call to mpi_recv. In this case, a warning message is issued and then a reply is sent to the sending process to indicate that the message was received.
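The two halves of the handshake can be simulated in a single Python process, with threads standing in for MPI processes and one queue per (source, dest, tag) triple standing in for MPI message matching. This is a sketch under those assumptions, not the MPI-CHECK code: the tag offsets TAG1 and TAG2, the Mailbox class, and the warning texts are hypothetical, wildcards (mpi_any_source, mpi_any_tag) are not modeled, and the timeout is in seconds to keep the example fast.

```python
import queue
import threading

TAG1, TAG2 = 1000, 2000   # hypothetical stand-ins for MPI_CHECK_Tag1 and MPI_CHECK_Tag2


class Mailbox:
    """One FIFO per (source, dest, tag), a crude stand-in for MPI message matching."""

    def __init__(self):
        self._queues = {}
        self._lock = threading.Lock()

    def _queue(self, key):
        with self._lock:
            return self._queues.setdefault(key, queue.Queue())

    def send(self, source, dest, tag, payload):
        self._queue((source, dest, tag)).put(payload)

    def recv(self, source, dest, tag, timeout):
        """Return the next matching payload, or None if nothing arrives in time."""
        try:
            return self._queue((source, dest, tag)).get(timeout=timeout)
        except queue.Empty:
            return None


def handshake_send(box, rank, dest, tag, info, timeout, log):
    box.send(rank, dest, TAG1 + tag, info)          # like the mpi_isend of send_info
    if box.recv(dest, rank, TAG2 + tag, timeout) is None:
        log.append(f"WARNING [Process {rank}] mpi_send at line {info['line']} has been "
                   f"waiting; there may be no corresponding receive")


def handshake_recv(box, rank, source, tag, count, datatype, timeout, log):
    info = box.recv(source, rank, TAG1 + tag, timeout)
    if info is None:
        log.append(f"WARNING [Process {rank}] mpi_recv has been waiting; "
                   f"there may be no corresponding send")
        return
    if (info["count"], info["type"]) != (count, datatype):
        log.append(f"WARNING [Process {rank}] count/datatype inconsistent with "
                   f"the mpi_send at line {info['line']}")
    box.send(rank, info["rank"], TAG2 + info["tag"], "ok")   # reply in every case


def run_pair(recv_count, timeout=1.0):
    """Rank 0 sends 10 reals with tag 7; rank 1 expects recv_count reals (None: no receive)."""
    box, log = Mailbox(), []
    info = {"line": 29, "rank": 0, "count": 10, "type": "real", "tag": 7}
    threads = [threading.Thread(target=handshake_send,
                                args=(box, 0, 1, 7, info, timeout, log))]
    if recv_count is not None:
        threads.append(threading.Thread(target=handshake_recv,
                                        args=(box, 1, 0, 7, recv_count, "real", timeout, log)))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

Calling run_pair(10), run_pair(5) and run_pair(None) exercises the consistent case, the inconsistent case, and the case where the message is never received, respectively.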
If the call to mpi_recv uses mpi_any_source and/or mpi_any_tag, then the original call to mpi_recv might receive a message from a different mpi_send than the one with which the handshake was done. To avoid this problem, MPI-CHECK changes the original mpi_recv from
call mpi_recv(buf, count, datatype, source, tag, comm, status, ierror)

to

call mpi_recv(buf, count, datatype, send_rank, send_tag, comm, status, ierror)

where send_rank and send_tag come from the send_info received in the handshake. Also notice that in this non-deterministic situation, MPI-CHECK will only detect a deadlock that occurs in the order of the actual execution.
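To see why the receive is rewritten, consider a receiver that already has two pending messages with the same tag from different senders. The toy Python matcher below scans pending messages in arrival order; ANY, first_match and the message tuples are hypothetical names for this illustration only. With the wildcard arguments left in place, the receive would take the message from rank 2 even though the handshake was completed with rank 1; with source and tag replaced by send_rank and send_tag, it takes the message the handshake checked.

```python
ANY = None   # hypothetical stand-in for mpi_any_source / mpi_any_tag


def first_match(pending, source, tag):
    """Return the first pending (source, tag, payload) message that matches, or None."""
    for message in pending:
        msg_source, msg_tag, _ = message
        if source in (ANY, msg_source) and tag in (ANY, msg_tag):
            return message
    return None


# Messages queued at the receiver, in arrival order.
pending = [(2, 5, "from rank 2"), (1, 5, "from rank 1")]
# Suppose the handshake was completed with rank 1, so send_rank = 1 and send_tag = 5.
send_rank, send_tag = 1, 5

as_written = first_match(pending, ANY, ANY)              # mpi_any_source, mpi_any_tag
as_rewritten = first_match(pending, send_rank, send_tag)  # source/tag from send_info
print(as_written, as_rewritten)
```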
Figures 3 and 4 show the code inserted prior to calling each mpi_send, mpi_ssend, mpi_rsend and mpi_recv for the case when MPI-CHECK is configured to stop program execution when an actual or potential deadlock problem is found. The instrumented code for when MPI-CHECK is configured to continue execution when an actual or potential deadlock problem is found is listed in the appendix. The subroutine time_out is configured differently for each of these situations, and the code listed in the appendix works correctly for both. This instrumentation is accomplished by having MPI-CHECK insert a call to subroutine "handshake_send" before each mpi_send or mpi_ssend, and a call to subroutine "handshake_recv" before each call to mpi_recv.
In Figures 3 and 4, the variable MPI_CHECK_TIMEOUT is the waiting time, in minutes, specified by the user through the environment variable MPI_CHECK_TIMEOUT. For example, issuing the command

setenv MPI_CHECK_TIMEOUT 5

sets the waiting time to 5 minutes. MPI-CHECK has been designed so that users have the option of having MPI-CHECK stop execution of the MPI program when an actual or potential deadlock is detected or allowing MPI-CHECK to continue program execution after the problem is detected. This choice is specified using the environment variable ABORT_ON_DEADLOCK. For example, issuing the command

setenv ABORT_ON_DEADLOCK true

will cause the program to stop execution (mpi_abort is called) when an actual or potential deadlock is detected.

! Attempt to send the information in send_info to process of rank "dest":
call mpi_isend(send_info, 512, mpi_character, dest, MPI_CHECK_Tag1+tag, &
               comm, req1, ierror)
timer = mpi_wtime()        ! start the timer
flag = .false.
! Spin wait for MPI_CHECK_TIMEOUT minutes or until req1 is satisfied.
! mpi_wtime returns seconds, so the timeout is converted from minutes.
do while (.not. flag)
   if (mpi_wtime() - timer > 60.0d0*MPI_CHECK_TIMEOUT) then
      ! Print warning message; in this configuration time_out also stops execution
      call time_out(filename, start_line, end_line, 'mpi_send', MPI_CHECK_TIMEOUT)
   endif
   call mpi_test(req1, flag, status, ierror)   ! Test whether the isend has finished
enddo
! Check for a response from process of rank "dest":
call mpi_irecv(response, 1, mpi_integer, dest, MPI_CHECK_Tag2+tag, comm, req2, ierror)
timer = mpi_wtime()        ! restart the timer
flag = .false.             ! reset, otherwise this loop would be skipped
do while (.not. flag)      ! spin wait until req2 is satisfied
   if (mpi_wtime() - timer > 60.0d0*MPI_CHECK_TIMEOUT) then
      call time_out(filename, start_line, end_line, 'mpi_send', MPI_CHECK_TIMEOUT)
   endif
   call mpi_test(req2, flag, status, ierror)   ! Test whether the irecv has finished
enddo
! The original mpi_send, mpi_rsend or mpi_ssend
call mpi_send(buf, count, datatype, dest, tag, comm, ierror)
Figure 3. Code inserted prior to calling mpi_send, mpi_ssend or mpi_rsend

! Attempt to receive send_info from the process of rank "source":
if (tag == mpi_any_tag) then
   call mpi_irecv(send_info, 512, mpi_character, source, mpi_any_tag, &
                  comm, req1, ierror)
else
   call mpi_irecv(send_info, 512, mpi_character, source, &
                  MPI_CHECK_Tag1+tag, comm, req1, ierror)
endif
timer = mpi_wtime()        ! start the timer
flag = .false.
do while (.not. flag)      ! spin wait for MPI_CHECK_TIMEOUT minutes or until req1 is satisfied
   if (mpi_wtime() - timer > 60.0d0*MPI_CHECK_TIMEOUT) then
      ! Print warning message; in this configuration time_out also stops execution
      call time_out(filename, start_line, end_line, 'mpi_recv', MPI_CHECK_TIMEOUT)
   endif
   call mpi_test(req1, flag, status, ierror)   ! Test whether the irecv has finished
enddo
! Extract information from send_info
read(send_info, *) send_filename, send_startline, send_endline, send_rank, &
                   send_count, send_type, send_tag
! Compare send_count and send_type with the count and datatype in the mpi_recv;
! if they are not consistent, a warning message is issued.
! Send response to the sender
call mpi_send(response, 1, mpi_integer, send_rank, MPI_CHECK_Tag2+send_tag, &
              comm, ierror)
! The original mpi_recv, except that source and tag have been changed if
! mpi_any_source and mpi_any_tag were used in the original call
call mpi_recv(buf, count, datatype, send_rank, send_tag, comm, status, ierror)
Figure 4. Code inserted prior to calling mpi_recv
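The spin-wait loops of Figures 3 and 4 and the two environment variables can be modeled as follows. This Python sketch is an assumption-laden analogue, not MPI-CHECK's time_out routine: the defaults (1 minute, no abort) are hypothetical because the text does not state them, read_settings and spin_wait are illustrative names, and the clock is injectable so the behavior can be checked without waiting. It converts minutes to seconds, warns once per elapsed period when ABORT_ON_DEADLOCK is false, and raises on the first timeout when it is true.

```python
import os
import time


def read_settings(env=os.environ):
    """Return (timeout_seconds, abort_on_deadlock) from the environment."""
    minutes = float(env.get("MPI_CHECK_TIMEOUT", "1"))   # default is an assumption
    if minutes <= 0:
        raise ValueError("MPI_CHECK_TIMEOUT must be positive")
    abort = env.get("ABORT_ON_DEADLOCK", "false").strip().lower() == "true"
    return minutes * 60.0, abort


def spin_wait(test, timeout_s, abort, warn, clock=time.monotonic):
    """Poll test() until it returns True, like the mpi_test loops in Figures 3 and 4.

    Each time timeout_s elapses without completion, warn() is called. With abort
    set, a RuntimeError (standing in for mpi_abort) is raised after the first
    warning; otherwise the timer restarts so the warning repeats every period.
    """
    start = clock()
    while not test():
        if clock() - start > timeout_s:
            warn()
            if abort:
                raise RuntimeError("possible deadlock: aborting")
            start = clock()


# Usage with a fake clock that advances one "second" per reading.
ticks = iter(range(1000))
calls = {"n": 0}


def ready():
    calls["n"] += 1
    return calls["n"] > 10   # the request "completes" on the 11th test


warnings = []
spin_wait(ready, 2.5, False, lambda: warnings.append("still waiting"),
          clock=lambda: float(next(ticks)))
```

Restarting the timer after each warning is what produces one message per MPI_CHECK_TIMEOUT period when execution is allowed to continue.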
To illustrate the meaning of the information provided by MPI-CHECK, suppose one process issues an mpi_send in an attempt to send a message A to another process. Suppose further that there is no corresponding mpi_recv. Notice that when A is small, mpi_send will typically copy A into a system buffer and execution will continue, i.e., no actual deadlock will occur. However, MPI-CHECK will issue the following warning message in this situation for all positive values of MPI_CHECK_TIMEOUT:
WARNING [File=test_handshake2.f90, Line=29, Process 0] mpi_send has been waiting for 1 minute. There may not be a corresponding mpi_recv or mpi_irecv ready to receive.
call mpi_send(A,n,mpi_real,p-1,1,mpi_comm_world,ierror)
If ABORT_ON_DEADLOCK is set to false, then the above message will be issued every MPI_CHECK_TIMEOUT minutes. However, if there really is a corresponding receive but the executing process does not reach the call to this receive until just after the first warning message, then the above message will only be issued once.
Next suppose that one process issues an mpi_recv and there is no corresponding mpi_send. In this case, the process executing the mpi_recv cannot continue beyond this call, unlike the mpi_send in the previous paragraph, which may return after buffering the message. When MPI-CHECK encounters the situation of executing an mpi_recv for which there is no corresponding mpi_send, the following message is issued:
WARNING [File=test_handshake6.f90, Line=33, Process 1] mpi_recv has been waiting for 1 minute. There may not be a corresponding MPI send ready to send.
call mpi_recv(C, n, mpi_real, mpi_any_source, 1, mpi_comm_world, status, ierror)
Next suppose there is a send/receive dependency cycle for two processes as in Figure 1. If ABORT_ON_DEADLOCK = true, then the process that first executes the mpi_send will issue a warning and program execution will be terminated. If ABORT_ON_DEADLOCK = false, then both the sending and receiving processes will issue a warning every time period. Notice that with ABORT_ON_DEADLOCK = false, for all actual and potential deadlocks, warning messages from both the sending and receiving processes will continue to be issued until the program is manually aborted by the user. If the time period is set too short and there is no actual or potential deadlock, then these warning messages will stop being issued when execution continues beyond this point.
2.2 Handshake Procedure for mpi_bsend
If a process calls the MPI buffered send, mpi_bsend, and there is no corresponding MPI receive, then this situation is neither an "actual" nor a "potential" deadlock, and MPI-CHECK does not currently detect this incorrect MPI code. If a process calls an MPI receive, then this receive may be satisfied by a message from a buffered send. To determine if this is the case, MPI-CHECK needs to insert instrumentation prior to the call to mpi_bsend. Notice that if the same instrumentation were used for mpi_bsend as for mpi_send, then an actual or potential deadlock would be reported for the send-receive cycle in Figure 5. However, this is not a deadlock. This problem is solved by using the same handshaking procedure as for mpi_send, except that the waits for the completion of the mpi_isend and mpi_irecv are removed. Figure 6 shows the code inserted prior to the call to mpi_bsend.
Process 0    Process 1
bsend        send
recv         recv
Figure 5. A cycle with mpi_bsend involved
! Attempt to send the information in send_info to process of rank "dest":
call mpi_isend(send_info, 512, mpi_character, dest, MPI_CHECK_Tag1+tag, &
               comm, req1, ierror)
! Check for a response from process of rank "dest":
call mpi_irecv(response, 1, mpi_integer, dest, MPI_CHECK_Tag2+tag, comm, req2, ierror)
! Unlike Figure 3, there is no spin wait on req1 or req2 here.
! The original mpi_bsend
call mpi_bsend(buf, count, datatype, dest, tag, comm, ierror)
Figure 6. Code inserted prior to calling mpi_bsend
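Why removing the waits matters can be checked with a toy model of the instrumented Figure 5 program. In this Python sketch (hypothetical operation names, not MPI-CHECK code), "info" is the non-blocking send of send_info, "wait_reply" is the spin wait for the response, and "handshake_recv" is the receiver-side handshake that answers the sender. Process 1's mpi_send is assumed to behave like mpi_ssend, and the response receive that Figure 6 still posts is not modeled. With the sender-side wait kept before mpi_bsend, both processes stall in their handshakes and a deadlock would be reported; without it, every call completes.

```python
from collections import Counter


def run(programs):
    """Step every rank until none can progress; return the ranks left blocked."""
    pc = [0] * len(programs)
    chan = Counter()   # (kind, source, dest) -> messages in flight

    def cur(rank):
        ops = programs[rank]
        return ops[pc[rank]] if pc[rank] < len(ops) else None

    moved = True
    while moved:
        moved = False
        for r in range(len(programs)):
            op = cur(r)
            if op is None:
                continue
            kind, peer = op
            if kind in ("info", "bsend"):              # never blocks
                chan[(kind, r, peer)] += 1
            elif kind == "wait_reply" and chan[("reply", peer, r)]:
                chan[("reply", peer, r)] -= 1
            elif kind == "handshake_recv" and chan[("info", peer, r)]:
                chan[("info", peer, r)] -= 1
                chan[("reply", r, peer)] += 1          # answer the sender
            elif kind == "recv" and chan[("bsend", peer, r)]:
                chan[("bsend", peer, r)] -= 1
            elif kind == "ssend" and cur(peer) == ("recv", r):
                pc[peer] += 1                          # rendezvous with the receiver
            else:
                continue                               # blocked for now
            pc[r] += 1
            moved = True
    return {r for r in range(len(programs)) if cur(r) is not None}


# Process 1 of Figure 5 after instrumentation (mpi_send modeled as synchronous).
rank1 = [("info", 0), ("wait_reply", 0), ("ssend", 0), ("handshake_recv", 0), ("recv", 0)]
# Process 0 if mpi_bsend were instrumented like mpi_send (with the wait).
rank0_waiting = [("info", 1), ("wait_reply", 1), ("bsend", 1), ("handshake_recv", 1), ("recv", 1)]
# Process 0 instrumented as in Figure 6 (no wait before mpi_bsend).
rank0_no_wait = [("info", 1), ("bsend", 1), ("handshake_recv", 1), ("recv", 1)]

print(run([rank0_waiting, rank1]), run([rank0_no_wait, rank1]))
```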
2.3 Handshake Procedure for mpi_sendrecv and mpi_sendrecv_replace
To avoid actual and potential deadlocks, the mpi_sendrecv and mpi_sendrecv_replace routines should be used when exchanging data between processes instead of mpi_send and mpi_recv, see [2], unless non-blocking sends and receives are used. Recall that when a process executes an mpi_sendrecv or mpi_sendrecv_replace, the process sends a message to another process and expects to receive a message from a possibly different process. Thus, actual and/or potential deadlocks may occur because of missing sends and receives. In addition, send-receive cycles may occur when using mpi_sendrecv or mpi_sendrecv_replace and may cause actual or potential deadlocks. This is illustrated in Figure 7.
Process 0    Process 1    Process 2
mpi_send    mpi_recv
mpi_send    mpi_sendrecv    mpi_recv
Figure 7. A dependency cycle involving mpi_sendrecv
Since mpi_sendrecv and mpi_sendrecv_replace involve both the sending and receiving of messages, the handshaking procedure used by MPI-CHECK for these routines is a combination of the handshaking used for mpi_send and for mpi_recv. Figure 8 shows the code inserted by MPI-CHECK prior to calling mpi_sendrecv. The code inserted prior to calling mpi_sendrecv_replace is identical.
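The reason for interleaving the two halves, rather than running the send-side handshake to completion first, shows up in a symmetric exchange in which each of two processes calls mpi_sendrecv with the other as both dest and source. In the Python sketch below, threads stand in for processes and the queues and names are hypothetical. With the sequential ordering, each process waits for a reply that the other will only send after its own reply arrives, so both time out even though the exchange is correct; the interleaved ordering used in Figure 8 completes.

```python
import queue
import threading


def exchange(combined, timeout=0.5):
    """Two ranks each handshake an mpi_sendrecv with the other; return ranks that timed out."""
    inbox = {r: {"info": queue.Queue(), "reply": queue.Queue()} for r in (0, 1)}
    timed_out = []

    def arrived(q):
        try:
            q.get(timeout=timeout)
            return True
        except queue.Empty:
            return False

    def rank(r):
        peer = 1 - r
        inbox[peer]["info"].put(r)                   # post send_info (like mpi_isend)
        if combined:
            ok = arrived(inbox[r]["info"])           # receive the peer's send_info
            if ok:
                inbox[peer]["reply"].put(r)          # answer it at once
            ok = arrived(inbox[r]["reply"]) and ok   # then wait for our own reply
        else:
            ok = arrived(inbox[r]["reply"])          # send-side handshake first...
            if ok:
                ok = arrived(inbox[r]["info"])       # ...then the receive side
                if ok:
                    inbox[peer]["reply"].put(r)
        if not ok:
            timed_out.append(r)

    threads = [threading.Thread(target=rank, args=(r,)) for r in (0, 1)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(timed_out)
```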
! Attempt to send the information in send_info to process of rank "dest":
call mpi_isend(send_info, 512, mpi_character, dest, MPI_CHECK_Tag1+sendtag, &
               comm, req1, ierror)
! Check for a response from process of rank "dest"
! (the receiver replies with MPI_CHECK_Tag2 plus the tag it read from send_info):
call mpi_irecv(response, 1, mpi_integer, dest, MPI_CHECK_Tag2+sendtag, comm, req2, ierror)
! Attempt to receive recv_info from the process of rank "source":
if (recvtag == mpi_any_tag) then
   call mpi_irecv(recv_info, 512, mpi_character, source, mpi_any_tag, &
                  comm, req3, ierror)
else
   call mpi_irecv(recv_info, 512, mpi_character, source, &
                  MPI_CHECK_Tag1+recvtag, comm, req3, ierror)
endif
! Timing until req3 is done. Otherwise timeout.
! Extract information from recv_info
read(recv_info, *) send_filename, send_startline, send_endline, send_rank, &
                   send_count, send_type, send_tag
! Check count and datatype from recv_info against recvcount and recvtype in the
! mpi_sendrecv; if they are not consistent, a warning message is issued.
! Send response to the sender
call mpi_isend(response, 1, mpi_integer, send_rank, MPI_CHECK_Tag2+send_tag, &
               comm, req, ierror)
! Timing until req1 is done. Otherwise timeout.
! Timing until req2 is done. Otherwise timeout.
! The original mpi_sendrecv, except that recvtag and source have been changed
! if mpi_any_source and mpi_any_tag were used in the original call
call mpi_sendrecv(sendbuf, sendcount, sendtype, dest, sendtag, recvbuf, &
                  recvcount, recvtype, send_rank, send_tag, comm, status, ierror)
Figure 8. Code inserted prior to calling mpi_sendrecv

2.4 Detection Strategy Considering mpi_probe Problem

Actual and potential deadlocks can also be caused by the incorrect usage of mpi_probe. Recall that mpi_probe allows one to poll an incoming message to determine how to receive it. Since mpi_probe is blocking, if there is no corresponding send, then there will be an actual deadlock on the process executing the mpi_probe. Deadlock detection is complicated by the fact that a single send can satisfy multiple calls to mpi_probe [2, p. 52]. An additional problem may occur when mpi_probe causes a dependency cycle, see Figure 9.

Process 0    Process 1
Send (1)     Probe (2)
Send (2)     Recv (1)
             Recv (2)
Figure 9. Dependency cycle involving a call to mpi_probe

To detect these problems, we first insert the same handshaking code before the call to mpi_probe as is used for mpi_recv. However, this alone prevents the handshaking for mpi_recv and for other calls to mpi_probe from performing as desired: the sender performs only one handshake per send, so once a probe has consumed it, a later probe or receive of the same message would wait for a handshake message that never arrives. To avoid this problem, MPI-CHECK keeps a list of all calls to mpi_probe with a unique {communicator, tag, source}. In the code inserted prior to calling mpi_recv and mpi_probe, checking is done to determine if the {communicator, tag, source} for the mpi_recv or mpi_probe matches an entry in this list.
If it matches an entry in the list, then the handshaking is bypassed; otherwise, the handshaking is performed. If the { communicator, tag, source} of mpi_recv matches an entry in this list, then this entry is removed from the list. Figure 10 illustrates this handshake strategy.
To detect an actual or potential deadlock situation when mpi_probe is used, the handshake strategy introduced before is still applicable, but some alterations are required. If a corresponding mpi_probe exists before an MPI receive call, the receiver-side handshake procedure should be inserted before the probe. If more than one corresponding probe exists before an MPI receive call, the receiver-side handshake procedure should be inserted before the first probe. Thus, the problem we need to solve is to identify whether a corresponding probe already exists before any mpi_probe or mpi_recv (mpi_sendrecv, mpi_sendrecv_replace) call in an MPI program, and then decide whether a handshake procedure needs to be inserted. The modified handshake strategy for the receiver side is described in Figure 10. Because of space limitations, "MPI_RECV" in Figure 10 includes the mpi_recv, mpi_sendrecv, and mpi_sendrecv_replace routines.
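The list-based bookkeeping described above can be sketched as a small Fortran module. This is a minimal, hypothetical sketch rather than MPI-CHECK's own code: the module and function names are illustrative, ANY_SOURCE and ANY_TAG are stand-ins for mpi_any_source and mpi_any_tag so that it compiles without an MPI library, and the handshake itself is reduced to the source and tag it would reveal (msg_src, msg_tag). The matching rule, under which a wildcard in the call matches any recorded source or tag on the same communicator, follows the conditions tested in handshake_probe and handshake_recv in the Appendix.

```fortran
! Hypothetical sketch of the {communicator, tag, source} list (not MPI-CHECK code).
! ANY_SOURCE/ANY_TAG stand in for mpi_any_source/mpi_any_tag.
module probe_list_sketch
  implicit none
  integer, parameter :: ANY_SOURCE = -1, ANY_TAG = -1
  integer, parameter :: MAX_PROBES = 512, EMPTY = -9999
  integer :: probe_comm(MAX_PROBES) = EMPTY
  integer :: probe_src(MAX_PROBES) = EMPTY
  integer :: probe_tag(MAX_PROBES) = EMPTY
  integer :: probe_num = 0
contains
  ! Index of the first live entry matched by (comm, tag, src), or 0 if none.
  integer function find_probe(comm, tag, src) result(k)
    integer, intent(in) :: comm, tag, src
    integer :: i
    k = 0
    do i = 1, probe_num
      if (comm /= probe_comm(i)) cycle
      if (src /= ANY_SOURCE .and. src /= probe_src(i)) cycle
      if (tag /= ANY_TAG .and. tag /= probe_tag(i)) cycle
      k = i
      return
    end do
  end function find_probe

  ! Before mpi_probe: .true. if a handshake is needed. The source and tag
  ! revealed by that handshake (msg_src, msg_tag) are then recorded.
  logical function probe_needs_handshake(comm, tag, src, msg_src, msg_tag) result(need)
    integer, intent(in) :: comm, tag, src, msg_src, msg_tag
    need = (find_probe(comm, tag, src) == 0)
    if (need) then
      if (probe_num == MAX_PROBES) error stop 'too many probes to track'
      probe_num = probe_num + 1
      probe_comm(probe_num) = comm
      probe_src(probe_num) = msg_src
      probe_tag(probe_num) = msg_tag
    end if
  end function probe_needs_handshake

  ! Before mpi_recv: a matching entry means the handshake was already done
  ! at a probe, so the entry is consumed and no handshake is needed.
  logical function recv_needs_handshake(comm, tag, src) result(need)
    integer, intent(in) :: comm, tag, src
    integer :: k
    k = find_probe(comm, tag, src)
    need = (k == 0)
    if (.not. need) then
      probe_comm(k) = EMPTY
      probe_src(k) = EMPTY
      probe_tag(k) = EMPTY
    end if
  end function recv_needs_handshake
end module probe_list_sketch
```

For example, a first mpi_probe of a message from rank 2 with tag 7 needs a handshake; a second probe of the same message, a wildcard probe on the same communicator, and the mpi_recv that finally receives it do not. The mpi_recv removes the entry, so the next message from rank 2 with tag 7 is handshaken again.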
There is a non-blocking version of mpi_probe, called mpi_iprobe. Currently, MPI-CHECK does not analyze calls to mpi_iprobe.
Figure 10. Handshake strategy for the receiver side
3. Deadlock Detection for Non-blocking Point-to-Point MPI Routines
MPI also allows the use of non-blocking point-to-point routines: mpi_isend, mpi_issend, mpi_ibsend, mpi_irsend, and mpi_irecv. Non-blocking sends/receives can be matched with blocking receives/sends. Completion of non-blocking sends/receives is indicated by calls to mpi_wait, mpi_test, mpi_waitany, mpi_testany, mpi_waitall, mpi_testall, mpi_waitsome, and mpi_testsome. Currently, MPI-CHECK only checks for non-blocking sends/receives completed by mpi_wait and mpi_waitall. If other routines are used to indicate completion, MPI-CHECK will not check for completion, and under some circumstances MPI-CHECK may incorrectly report errors. If there are non-blocking send or receive calls without corresponding calls to mpi_wait or mpi_waitall, MPI-CHECK issues a warning message suggesting that the user add matching mpi_wait or mpi_waitall calls. Wildcards in non-blocking receive routines are currently not supported by MPI-CHECK.
As is the case with blocking sends and receives, actual and potential deadlocks may occur when using non-blocking sends and receives. The actual or potential deadlock will occur at the call to mpi_wait or mpi_waitall and not at the call to the non-blocking send or receive. For example, an actual or potential deadlock will occur if there is no matching send or receive. Dependency cycles may also occur with non-blocking routines. Figure 11 shows a dependency cycle when using non-blocking and blocking calls that causes either an actual or potential deadlock: each process blocks in mpi_wait on its own send before posting the receive that the other process's send needs, so neither send can complete unless MPI buffers the messages.
Process 0        Process 1
Isend(req1)      Isend(req2)
Wait(req1)       Wait(req2)
Recv             Recv
Figure 11. Dependency cycle involving non-blocking and blocking calls
Figures 12 and 13 show situations where a deadlock will never occur because of the progress statement in section 3.7.4 of the MPI specification [2]. Figure 14 shows a potential deadlock situation, whereas there is no actual or potential deadlock in the situation in Figure 15. In Figure 14, process 1 waits for message B before posting the receive for message A, so if send(A) on process 0 does not complete until its message is received, send(B) is never reached. The situation in Figure 15 is the same as in Figure 14 except that both calls to mpi_wait occur after the calls to mpi_irecv. There is also no actual or potential deadlock in the situation in Figure 16. Notice that this progress statement implies that the calls to mpi_wait in Figures 15 and 16 may be in any order and may be replaced by a single call to mpi_waitall using any order for req1 and req2.
Process 0        Process 1
ssend(A)         Irecv(A)
ssend(B)         Recv(B)
                 Wait
Figure 12. Situation where a deadlock will never occur

Process 0        Process 1
issend(A)        recv(A)
ssend(B)         recv(B)
Wait
Figure 13. Situation where a deadlock will never occur

Process 0        Process 1
send(A)          irecv(B)
send(B)          Wait(B)
                 irecv(A)
                 Wait(A)
Figure 14. Potential deadlock situation

Process 0        Process 1
send(A)          irecv(B)
send(B)          irecv(A)
                 Wait(B)
                 Wait(A)
Figure 15. No actual or potential deadlock situation

Process 0        Process 1
isend(A)         irecv(B)
isend(B)         irecv(A)
Wait(B)          Wait(B)
Wait(A)          Wait(A)
Figure 16. No actual or potential deadlock situation
MPI-CHECK detects actual and potential deadlocks involving non-blocking routines using a handshaking strategy that is similar to the one used for blocking routines. When a non-blocking send or receive is encountered, MPI-CHECK inserts code prior to the call that initiates the handshake but does not wait for the handshake to be completed. The code that completes the handshaking is inserted prior to the call to the corresponding mpi_wait (or mpi_waitall). Prior to the call to any of the non-blocking sends/receives, MPI-CHECK inserts a call to subroutine "handshake_isend"/"handshake_irecv". Prior to all corresponding waits, MPI-CHECK inserts a call to subroutine "handshake_wait". Code for these routines can be found in the Appendix. Figure 17 shows the code inserted prior to calling mpi_isend, mpi_issend, and mpi_irsend as well as the code inserted prior to calling the corresponding mpi_wait. The non-blocking buffered send, mpi_ibsend, is handled in a way similar to what was done for the blocking buffered send. Figure 18 shows the code inserted prior to calling mpi_irecv and its corresponding mpi_wait.
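The "check count and datatype" step performed once a receiver-side handshake has completed reduces to two comparisons. The sketch below is illustrative only: the module and routine names are hypothetical, and datatypes are compared as integer handles, as in handshake_wait and handshake_recv in the Appendix. Only a send count larger than the receive count is flagged, since MPI allows a message shorter than the receive buffer but treats a longer one as an error (truncation).

```fortran
! Hypothetical sketch of the receiver-side consistency check (not MPI-CHECK code).
module recv_check_sketch
  implicit none
contains
  ! send_count/send_type come from the sender's handshake record;
  ! recv_count/recv_type are the arguments of the receive being checked.
  subroutine check_recv(send_count, send_type, recv_count, recv_type, &
                        count_bad, type_bad)
    integer, intent(in) :: send_count, send_type, recv_count, recv_type
    logical, intent(out) :: count_bad, type_bad
    ! A shorter message than the receive buffer is legal in MPI;
    ! a longer one is an error, so only that case is flagged.
    count_bad = (send_count > recv_count)
    ! Datatypes are compared as integer handles.
    type_bad = (send_type /= recv_type)
    if (count_bad) print '(a)', 'WARNING: recv msg count is less than the send msg'
    if (type_bad) print '(a)', 'WARNING: recv msg type does not match the send msg'
  end subroutine check_recv
end module recv_check_sketch
```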
! Attempt to send the information in MPI_CHECK_Isendinfo(k) to process of rank "dest":
call mpi_isend(MPI_CHECK_Isendinfo(k), 512, mpi_character, dest, &
               MPI_CHECK_Tag1+tag, comm, MPI_CHECK_Ireq1(k), ierror)
! Attempt to receive a response from process of rank "dest":
call mpi_irecv(response, 1, mpi_integer, dest, MPI_CHECK_Tag2+tag, comm, &
               MPI_CHECK_Ireq2(k), ierror)
! the original non-blocking send
call mpi_isend(buf, count, datatype, dest, tag, comm, req, ierror)
! ...
! spin wait for MPI_CHECK_TIMEOUT minutes or until MPI_CHECK_Ireq1(k) is satisfied
Timer = MPI_Wtime()
Flag = .false.
do while (.Not. Flag)
   if (MPI_Wtime() - Timer > MPI_CHECK_TIMEOUT*60) then
      call outtime('test.f90', 31, 31, 'mpi_wait', MPI_CHECK_TIMEOUT)
   endif
   call mpi_test(MPI_CHECK_Ireq1(k), Flag, status, ierror)
enddo
! spin wait for MPI_CHECK_TIMEOUT minutes or until MPI_CHECK_Ireq2(k) is satisfied
Timer = MPI_Wtime()
Flag = .false.
do while (.Not. Flag)
   if (MPI_Wtime() - Timer > MPI_CHECK_TIMEOUT*60) then
      call outtime('test.f90', 31, 31, 'mpi_wait', MPI_CHECK_TIMEOUT)
   endif
   call mpi_test(MPI_CHECK_Ireq2(k), Flag, status, ierror)
enddo
! The original mpi_wait
call mpi_wait(req, status, ierror)
Figure 17. Code inserted prior to calling mpi_isend/mpi_issend/mpi_irsend and the corresponding mpi_wait

! Attempt to receive MPI_CHECK_Isendinfo(k) from process of rank "src":
call mpi_irecv(MPI_CHECK_Isendinfo(k), 512, mpi_character, src, &
               MPI_CHECK_Tag1+tag, comm, MPI_CHECK_Ireq(k), ierror)
! Attempt to send response to process of rank "src":
call mpi_isend(response, 1, mpi_integer, src, MPI_CHECK_Tag2+tag, &
               comm, temp_req, ierror)
! The original mpi_irecv
call mpi_irecv(buf, count, datatype, src, tag, comm, req, ierror)
! ...
! Timing before the original mpi_wait:
! spin wait for MPI_CHECK_TIMEOUT minutes or until MPI_CHECK_Ireq(k) is satisfied
Timer = MPI_Wtime()
Flag = .false.
do while (.Not. Flag)
   if (MPI_Wtime() - Timer > MPI_CHECK_TIMEOUT*60) then
      call outtime('test.f90', 36, 36, 'mpi_wait', MPI_CHECK_TIMEOUT)
   endif
   call mpi_test(MPI_CHECK_Ireq(k), Flag, status, ierror)
enddo
! Check count and datatype from MPI_CHECK_Isendinfo(k) with the count and datatype
! in the mpi_irecv. If data is not consistent, a warning message is issued.
! The original mpi_wait
call mpi_wait(req, status, ierror)
Figure 18. Code inserted prior to calling mpi_irecv and the corresponding mpi_wait

By inserting code prior to the call to mpi_finalize, MPI-CHECK is able to determine if there are any pending calls to non-blocking send or receive routines. This situation only occurs if a call to a non-blocking routine has been made and there is no corresponding call to mpi_wait. MPI-CHECK keeps track of all non-blocking calls and their corresponding requests by writing this information into the globally accessible integer array MPI_CHECK_Ireq of size MAX, which is set to 512 by default. When a request has been satisfied, it is removed from this array. When pending calls to non-blocking routines are found, MPI-CHECK issues a warning such as
WARNING: At least one nonblocking routine has no explicit termination. Please add
mpi_wait for them to get more information from MPI-CHECK!
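The request bookkeeping described above can be pictured as a fixed-size table. In the following minimal sketch (hypothetical names; request handles are plain integers so no MPI library is needed), each non-blocking call records its request, the code inserted before the matching mpi_wait clears it, and anything still recorded when mpi_finalize is reached is the case in which the warning above is printed. The table size of 512 mirrors the default value of MAX.

```fortran
! Hypothetical sketch of the pending-request table (not MPI-CHECK code).
! Request handles are plain integers here, so no MPI library is needed.
module request_table_sketch
  implicit none
  integer, parameter :: MAX_REQ = 512, UNUSED_SLOT = -9999
  integer :: pending(MAX_REQ) = UNUSED_SLOT
contains
  ! Inserted before a non-blocking send/receive: remember its request.
  subroutine record_request(req)
    integer, intent(in) :: req
    integer :: i
    do i = 1, MAX_REQ
      if (pending(i) == UNUSED_SLOT) then
        pending(i) = req
        return
      end if
    end do
    error stop 'more pending non-blocking calls than the table can track'
  end subroutine record_request

  ! Inserted before mpi_wait(req): clear the slot; .false. if req is unknown.
  logical function complete_request(req) result(found)
    integer, intent(in) :: req
    integer :: i
    found = .false.
    do i = 1, MAX_REQ
      if (pending(i) == req) then
        pending(i) = UNUSED_SLOT
        found = .true.
        return
      end if
    end do
  end function complete_request

  ! Inserted before mpi_finalize: requests never completed by a wait.
  integer function count_pending() result(n)
    n = count(pending /= UNUSED_SLOT)
  end function count_pending
end module request_table_sketch
```

A program that records two requests and waits on only one of them reaches mpi_finalize with count_pending() equal to 1.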
4. Deadlock Detection for Collective MPI Routines
Unlike point-to-point communication, collective routines in MPI 1 and MPI 2 come in blocking versions only. This section presents the methods used by MPI-CHECK to detect "actual" and "potential" deadlocks for MPI collective routines.
The following are the categories of actual and potential deadlock situations that can occur when using collective routines:
1. A collective routine is not called by all the processes in the communicator.
2. The processes in a communicator do not all call distinct collective routines in the same order.
3. Improper ordering of point-to-point and collective routines.
MPI-CHECK automatically detects all of the above problems.
Figure 19 illustrates the situation in item 1, where two processes execute mpi_gather while the third process does not. Notice that this may be an actual or potential deadlock depending on which process is the root process and on whether or not there is synchronization after the call to mpi_gather. Figure 20 also illustrates the situation of item 1, where two processes execute mpi_allgather while the third executes mpi_gather.
[Figure 19: processes 0, 1, and 2; two of them call Gather, and all three then call finalize.]
Figure 19. Collective operation with a missing call

Process 0      Process 1      Process 2
Gather         Allgather      Allgather
Figure 20. Mismatched collective operations

Process 0      Process 1      Process 2
Barrier        Barrier        Bcast
Bcast          Bcast          Barrier
Figure 21. Incorrectly ordered collective operations

Process 0      Process 1      Process 2
Reduce         Reduce         Bcast
Bcast          Bcast          Reduce
Figure 22. Incorrectly ordered collective operations

Process 0      Process 1      Process 2
Bcast          Bcast          Recv
Send                          Bcast
Figure 23. Interleaved collective and send-receive operations
Figures 21 and 22 illustrate the situation in item 2, incorrectly ordered collective operations. The MPI standard requires that "a correct, portable program must invoke collective communications so that deadlock will not occur, whether collective communications are synchronizing or not" [2].
Figure 23 illustrates the improper ordering of point-to-point and collective routines. If the root process for the mpi_bcast is process 2, then there will be an actual deadlock: process 2 is blocked in mpi_recv waiting for the message from process 0, while process 0 cannot reach its send until the broadcast from process 2 completes. If the root process for mpi_bcast is process 0 or 1, then there may or may not be an actual deadlock, depending on whether or not mpi_bcast is synchronizing. In all of these cases, the MPI code is incorrect, since the MPI standard [2] requires that "the relative order of execution of collective operations and point-to-point operations should be such, that even if the collective operations and the point-to-point operations are synchronizing no deadlock will occur".
MPI-CHECK detects the above errors by using a handshake strategy similar to what is done for the point-to-point routines. For the collective routines, p processes are involved rather than the 2 processes used for the point-to-point routines. The strategy is to use process 0 to collect information from all other processes. To do handshaking for the collective routines, all nonzero processes send the following information to process 0 in the send_info array:
send_info = {file_name, start_line, end_line, get_rank(comm), call_name},
where start_line and end_line are the beginning and ending line numbers of the call to the collective routine in the file named "file_name", and "call_name" is the name of that collective routine. Each nonzero process then sends send_info to the process of rank 0 in that communicator with a unique tag, MPI_CHECK_Tag3, to avoid possible tag conflicts with other messages. The following three possibilities may occur:
1. The message was never received on process 0, and the sending process does not receive a response message within a specified time. In this case, a warning message is issued.
2. The message was received on process 0, but the call_name in send_info was not the same as the collective call on process 0. In this case, process 0 issues a message stating what the inconsistencies are, and the program execution is stopped no matter what mode of execution is being used.
3. The message was received on process 0, and the call_name in send_info was the same as the collective call on process 0. Process 0 sends a reply to the sending process stating that everything is okay. The reply is received by calling mpi_irecv with tag MPI_CHECK_Tag4. Program execution is continued.
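One way to build and parse send_info is list-directed internal I/O on a character(512) buffer, which is how the routines in the Appendix write and read their handshake records. In the sketch below (hypothetical module and routine names, not MPI-CHECK's code), each character field is wrapped in apostrophes, so that a list-directed READ treats a file name or routine name as a single value even if it contains blanks or slashes.

```fortran
! Hypothetical sketch of packing/unpacking the collective handshake record.
module send_info_sketch
  implicit none
  integer, parameter :: INFO_LEN = 512
contains
  ! Write {file_name, start_line, end_line, rank, call_name} into info.
  subroutine pack_info(info, file_name, start_line, end_line, src_rank, call_name)
    character(len=INFO_LEN), intent(out) :: info
    character(len=*), intent(in) :: file_name, call_name
    integer, intent(in) :: start_line, end_line, src_rank
    write(info, *) "'" // trim(file_name) // "'", start_line, end_line, src_rank, &
                   "'" // trim(call_name) // "'"
  end subroutine pack_info

  ! Read the same five fields back; apostrophes delimit the character fields.
  subroutine unpack_info(info, file_name, start_line, end_line, src_rank, call_name)
    character(len=INFO_LEN), intent(in) :: info
    character(len=*), intent(out) :: file_name, call_name
    integer, intent(out) :: start_line, end_line, src_rank
    read(info, *) file_name, start_line, end_line, src_rank, call_name
  end subroutine unpack_info
end module send_info_sketch
```

On process 0, the unpacked call_name is what gets compared with the name of its own collective call.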
The above describes the handshake procedure for the processes with nonzero rank. We now describe the handshake procedure for process zero. Process zero issues p-1 mpi_irecv calls with tag MPI_CHECK_Tag3, attempting to obtain the p-1 copies of send_info sent by the nonzero processes. The following three possibilities may now occur:
1. All p-1 copies of send_info were received, and in each copy the call_name was consistent with the call_name on process 0. In this case, a reply message is sent to each of the p-1 sending processes indicating that the information is consistent.
2. At least one of the p-1 copies of send_info is never received within the specified time. In this case, a warning message is issued.
3. One of the p-1 copies of send_info was received, but the call_name in it was inconsistent with the call_name on process 0. In this case, a warning message is issued stating an unmatched collective call situation, and the program execution is stopped.
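The decision process 0 makes for one collective call can be sketched as a function over the call names it collects. In this illustrative sketch (hypothetical names; no MPI), the array holds the call_name of each record in the order process 0 would receive it, and a blank entry stands for a record that has not arrived when the timeout expires. As in the receive loop run by process 0, the first problem encountered decides the outcome.

```fortran
! Hypothetical sketch of process 0's decision for one collective call.
module collective_check_sketch
  implicit none
  integer, parameter :: ALL_CONSISTENT = 0, TIMED_OUT = 1, MISMATCH = 2
contains
  ! own_call: call_name of the collective on process 0.
  ! received: call_name from each record, in arrival order; a blank entry
  !           means no further record arrived before the timeout.
  integer function classify(own_call, received) result(outcome)
    character(len=*), intent(in) :: own_call
    character(len=*), intent(in) :: received(:)
    integer :: i
    outcome = ALL_CONSISTENT
    do i = 1, size(received)
      if (len_trim(received(i)) == 0) then
        outcome = TIMED_OUT      ! a warning is issued; process 0 keeps waiting
        return
      else if (received(i) /= own_call) then
        outcome = MISMATCH       ! unmatched collective call: execution stops
        return
      end if
    end do
  end function classify
end module collective_check_sketch
```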
Figures 24 and 25 show the code inserted prior to calling each collective MPI routine on the process of rank 0 and on the processes of rank > 0, respectively.
For a deadlock situation described in Figure 19, MPI-CHECK will issue the following message:
WARNING: [File=test_collective3.f90, Line= 28, Process 0] mpi_gather has been
waiting for XX minutes. There may not be corresponding mpi_gather on other
processes ready.
call mpi_gather(A,n,mpi_real,B,n,mpi_real,root,mpi_comm_world,ierror)
! Code inserted on processes of rank > 0:
! Attempt to send the information in send_info to process of rank 0:
call mpi_isend(send_info, 512, mpi_character, 0, MPI_CHECK_Tag3, comm, req1, ierror)
Flag = .false.
Timer = MPI_Wtime()
do while (.Not. Flag)
   if (MPI_Wtime() - Timer > MPI_CHECK_TIMEOUT*60) then
      call outtime('test.f90', 12, 12, 'mpi_gather', MPI_CHECK_TIMEOUT)
   endif
   call mpi_test(req1, Flag, status, ierror)
enddo
! Check for a response from process of rank 0:
call mpi_irecv(response, 1, mpi_integer, 0, MPI_CHECK_Tag4, comm, req2, ierror)
Flag = .false.
Timer = MPI_Wtime()
do while (.Not. Flag)
   if (MPI_Wtime() - Timer > MPI_CHECK_TIMEOUT*60) then
      call outtime('test.f90', 12, 12, 'mpi_gather', MPI_CHECK_TIMEOUT)
   endif
   call mpi_test(req2, Flag, status, ierror)
enddo
! the original collective call
call mpi_gather(sendbuf, sendcount, sendtype, recvbuf, recvcount, recvtype, &
                root, comm, ierror)

! Code inserted on the process of rank 0:
! Attempt to receive p-1 pieces of send_info from processes 1, 2, ..., p-1:
do i = 1, get_comm_size(comm)-1
   call mpi_irecv(handshake_info, 512, mpi_character, mpi_any_source, &
                  MPI_CHECK_Tag3, comm, req, ierror)
   Flag = .false.
   Timer = MPI_Wtime()
   do while (.Not. Flag)
      if (MPI_Wtime() - Timer > MPI_CHECK_TIMEOUT*60) then
         call outtime('test.f90', 12, 12, 'mpi_gather', MPI_CHECK_TIMEOUT)
      endif
      call mpi_test(req, Flag, status, ierror)
   enddo
   ! Extract information from send_info
   read(handshake_info, *) come_filename, come_startline, come_endline, come_rank, &
                           come_funcname
   ! unmatched collective call check
   if (come_funcname .NE. 'MPI_GATHER') then
      call mismatch('test.f90', 12, 12, 'MPI_GATHER', 0, come_filename, &
                    come_startline, come_endline, come_funcname, come_rank)
   endif
   ! Send response to the sender
   call mpi_send(response, 1, mpi_integer, come_rank, MPI_CHECK_Tag4, comm, ierror)
enddo
! the original collective call
call mpi_gather(sendbuf, sendcount, sendtype, recvbuf, recvcount, recvtype, &
                root, comm, ierror)

For a deadlock situation described in Figures 20, 21, and 22, MPI-CHECK will issue the following message:
WARNING: [File=test_collective3.f90, Line= 28] This collective routine on process
0 can not match the corresponding routine (the second one listed below) on process
1.
call mpi_gather(A,n,mpi_real,B,n,mpi_real,root,mpi_comm_world &
,ierror)
call mpi_allgather(A,n,mpi_real,B,n,mpi_real,mpi_comm_world,ierror)
For a deadlock situation described in Figure 23, MPI-CHECK will issue the following messages:
WARNING: [File=test_collective.f90, Line= 29, Process 0] mpi_bcast has been
waiting for XX minutes. There may not be corresponding routine on other processes
ready.
call mpi_bcast(A1, 1, mpi_real8, 0, mpi_comm_world, ierror)
WARNING: [File=test_collective.f90, Line= 34, Process 2] mpi_recv has been
waiting for XX minutes. There may not be corresponding MPI send ready to send.
call mpi_recv(A, n, mpi_real8, 0, 1, mpi_comm_world, status, ierror)
Please note that the deadlock detection methods for collective MPI routines described in this section only apply to communication within a single group of processes (intra-communication) and not to communication between disjoint groups of processes (inter-communication). The methods MPI-CHECK uses for detecting actual and potential deadlocks caused by the point-to-point MPI routines, described in sections 2 and 3, are applicable to both intra- and inter-communicator domains. Currently, MPI-CHECK does not support the detection of actual and potential deadlocks caused by collective routines in inter-communicator domains.
MPI 1.0 only allows intracommunicators to be used for collective routines. MPI 2.0 allows intercommunicators to be used for most of the collective routines; see [3]. For MPI-CHECK to detect actual and potential deadlocks for collective routines in intercommunicator domains, the following would need to be done. The MPI routine mpi_comm_test_inter(comm, flag, ierror) could be used to determine whether the communicator is an intracommunicator or an intercommunicator. To detect actual and potential deadlocks for intercommunicators, MPI-CHECK would have to be changed to accommodate the following differences:
1. Process zero in both the local and remote group will collect information from all the nonzero processes in the remote group.
2. The handshaking will then take place between each process zero and the nonzero processes in the remote group.
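The two changes listed above can be illustrated by enumerating which process collects handshake information from which. The sketch assumes that "the remote group" in item 1 is read from each process zero's own point of view, so that process 0 of each group collects from every nonzero process of the other group; the group labels A and B and the routine name are hypothetical, and no MPI calls are made.

```fortran
! Hypothetical sketch of the handshake pairs proposed for intercommunicators.
module intercomm_plan_sketch
  implicit none
contains
  ! Groups A (size na) and B (size nb) are joined by an intercommunicator.
  ! Process 0 of each group collects from every nonzero process of the
  ! other group. Each pair is written as 'collector<-sender', e.g. 'A0<-B2'.
  subroutine handshake_pairs(na, nb, pairs, npairs)
    integer, intent(in) :: na, nb
    character(len=*), intent(out) :: pairs(:)   ! needs na+nb-2 entries
    integer, intent(out) :: npairs
    integer :: r
    npairs = 0
    do r = 1, nb - 1
      npairs = npairs + 1
      write(pairs(npairs), '(a,i0)') 'A0<-B', r
    end do
    do r = 1, na - 1
      npairs = npairs + 1
      write(pairs(npairs), '(a,i0)') 'B0<-A', r
    end do
  end subroutine handshake_pairs
end module intercomm_plan_sketch
```

Under this reading, groups of sizes na and nb need (na - 1) + (nb - 1) handshakes per collective call.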
5. Conclusions
MPI-CHECK 1.0 [1] is a tool developed to aid in the debugging of MPI programs that are written in free or fixed format Fortran 90 and Fortran 77, but it does not contain any deadlock detection methods. This paper presents methods for the automatic detection of actual and potential deadlocks in MPI programs. These methods have been implemented in MPI-CHECK 2.0 for MPI programs written in free or fixed format Fortran 90 and Fortran 77. The methods presented in this paper may also be applied to MPI programs written in C or C++.
CHAPTER 3. GENERAL CONCLUSIONS
This thesis work upgrades MPI-CHECK, a tool developed to aid in the debugging of MPI programs that are written in free or fixed format Fortran 90 and Fortran 77, from version 1.0 to 2.0. New methods for the automatic detection of actual and potential deadlocks in MPI programs are presented and implemented in MPI-CHECK 2.0. The methods presented in this thesis may also be applied to MPI programs written in C or C++.
APPENDIX. HANDSHAKE ROUTINES
!---handshake check before mpi_wait
!---
SUBROUTINE handshake_wait(max_nonblock, ireq, ireq_type, isendinfo, &
                          ireq_count, ireq_datatype, ireq_1, ireq_2, &
                          req, filename, line_no, line_end, funcname)
  include 'mpif.h'
  integer max_nonblock, ireq(max_nonblock), ireq_type(max_nonblock)
  character(LEN=512) isendinfo(max_nonblock)
  integer ireq_count(max_nonblock), ireq_datatype(max_nonblock)
  integer ireq_1(max_nonblock), ireq_2(max_nonblock)
  integer req, timeout, line_no, line_end
  character(*) filename, funcname
  integer i, j
  double precision timer
  logical flag
  integer status(mpi_status_size)
  character(512) come_filename
  integer come_lineno, come_lineend, come_rank, come_count
  integer come_type, come_tag

  timeout = my_getenv_timeout()
  do i = 1, max_nonblock
    if (ireq(i) == req) then
      ireq(i) = -9999
      ! if this is a pending isend/issend/irsend
      if (ireq_type(i) == 0) then
        timer = MPI_Wtime()
        call mpi_test(ireq_1(i), flag, status, ierror)
        j = 1
        do while (.NOT. flag)
          if (MPI_Wtime() - timer > j*timeout*60) then
            call outtime(filename, line_no, line_end, funcname, j*timeout)
            j = j + 1
          endif
          call mpi_test(ireq_1(i), flag, status, ierror)
        enddo
        timer = MPI_Wtime()
        call mpi_test(ireq_2(i), flag, status, ierror)
        j = 1
        do while (.NOT. flag)
          if (MPI_Wtime() - timer > j*timeout*60) then
            call outtime(filename, line_no, line_end, funcname, j*timeout)
            j = j + 1
          endif
          call mpi_test(ireq_2(i), flag, status, ierror)
        enddo
      ! if this is a pending ibsend, do nothing
      else if (ireq_type(i) == 2) then
      ! if this is a pending irecv
      else
        timer = MPI_Wtime()
        call mpi_test(ireq_1(i), flag, status, ierror)
        j = 1
        do while (.NOT. flag)
          if (MPI_Wtime() - timer > j*timeout*60) then
            call outtime(filename, line_no, line_end, funcname, j*timeout)
            j = j + 1
          endif
          call mpi_test(ireq_1(i), flag, status, ierror)
        enddo
        read(isendinfo(i), *) come_filename, come_lineno, &
             come_lineend, come_rank, come_count, come_type, come_tag
        ! send recv message count check
        if (come_count > ireq_count(i)) then
          call sendrecv_mismatch(filename, line_no, line_end, &
               'recv msg count is less than the send msg from processor', &
               come_rank, come_filename, come_lineno, come_lineend)
        endif
        ! send recv message type check
        if (come_type /= ireq_datatype(i)) then
          call sendrecv_mismatch(filename, line_no, line_end, &
               'recv msg type does not match the send msg from processor', &
               come_rank, come_filename, come_lineno, come_lineend)
        endif
      endif
      exit
    endif
  enddo
END SUBROUTINE handshake_wait
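The timeout loops in these routines do not stop at the first warning: each time the elapsed time passes another multiple of the timeout, outtime is called with j*timeout minutes and j is incremented, so a stuck call is reported after 1, 2, 3, ... timeout periods. The sketch below replays that logic without MPI; mpi_test and MPI_Wtime are replaced by a simulated clock that advances dt seconds per poll and a request that completes on poll number done_after (both hypothetical parameters, as is the routine name).

```fortran
! Hypothetical sketch of the escalating-timeout polling loop; the clock
! and mpi_test are simulated so that no MPI library is needed.
module poll_sketch
  implicit none
contains
  ! done_after : poll number on which the simulated request completes
  ! dt         : simulated seconds that pass between two polls
  ! timeout_min: timeout in minutes, as returned by my_getenv_timeout()
  ! warned_at  : minutes reported by each warning (outtime's last argument)
  subroutine poll_with_warnings(done_after, dt, timeout_min, warned_at, nwarn)
    integer, intent(in) :: done_after, timeout_min
    double precision, intent(in) :: dt
    integer, intent(out) :: warned_at(:), nwarn
    double precision :: elapsed
    integer :: j, polls
    logical :: flag
    elapsed = 0d0
    polls = 1
    flag = (polls >= done_after)            ! first mpi_test
    j = 1
    nwarn = 0
    do while (.not. flag)
      elapsed = elapsed + dt                ! time passes while spinning
      if (elapsed > j*timeout_min*60d0) then
        nwarn = nwarn + 1                   ! call outtime(..., j*timeout)
        if (nwarn <= size(warned_at)) warned_at(nwarn) = j*timeout_min
        j = j + 1
      end if
      polls = polls + 1
      flag = (polls >= done_after)          ! next mpi_test
    end do
  end subroutine poll_with_warnings
end module poll_sketch
```

With a one-minute timeout, polls 30 seconds apart, and completion on the sixth poll, warnings for 1 and 2 minutes are issued before the request completes.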
!---handshake check before blocking send
!---
SUBROUTINE handshake_send(filename, line_no, line_end, funcname, comm, &
                          count, datatype, dest, tag, special_tag1, special_tag2, &
                          max, bsend_req, bsend_info, response)
  include 'mpif.h'
  character(*) filename, funcname
  integer line_no, line_end, comm, count, datatype, tag, dest
  integer special_tag1, special_tag2, timeout
  integer max
  integer bsend_req(max*2)
  character(512) bsend_info(max)
  integer response
  character(512) handshake_info
  integer i, j, req
  ! done is reset on every call (an initializer in the declaration would imply SAVE)
  integer done
  double precision timer
  logical flag, flag2
  integer status(mpi_status_size)

  timeout = my_getenv_timeout()
  done = 0
  IF (funcname == "MPI_BSEND") THEN
    do i = 1, max
      call mpi_test(bsend_req(i*2), flag, status, ierror)
      call mpi_test(bsend_req(i*2-1), flag2, status, ierror)
      if (flag .AND. flag2) then
        write(bsend_info(i), *) "'", filename, "'", line_no, line_end, &
             get_rank(comm), count, datatype, tag
        call mpi_isend(bsend_info(i), 512, mpi_character, dest, &
             special_tag1+tag, comm, bsend_req(i*2), ierror)
        call mpi_irecv(response, 0, mpi_integer, dest, &
             special_tag2+tag, comm, bsend_req(i*2-1), ierror)
        done = 1
        EXIT
      endif
    enddo
    if (done == 0) then
      write(*, *) "ERROR: Sorry, you are using more mpi_bsend and ", &
                  "mpi_ibsend than MPI-CHECK can check!"
      call MPI_Abort(MPI_COMM_WORLD, 1, ierror)
    endif
  ELSE
    write(handshake_info, *) "'", filename, "'", line_no, line_end, &
         get_rank(comm), count, datatype, tag
    call mpi_isend(handshake_info, 512, mpi_character, dest, &
         special_tag1+tag, comm, req, ierror)
    timer = MPI_Wtime()
    call mpi_test(req, flag, status, ierror)
    j = 1
    do while (.NOT. flag)
      if (MPI_Wtime() - timer > j*timeout*60) then
        call outtime(filename, line_no, line_end, funcname, j*timeout)
        j = j + 1
      endif
      call mpi_test(req, flag, status, ierror)
    enddo
    response = 0
    call mpi_irecv(response, 0, mpi_integer, dest, &
         special_tag2+tag, comm, req, ierror)
    timer = MPI_Wtime()
    call mpi_test(req, flag, status, ierror)
    j = 1
    do while (.NOT. flag)
      if (MPI_Wtime() - timer > j*timeout*60) then
        call outtime(filename, line_no, line_end, funcname, j*timeout)
        j = j + 1
      endif
      call mpi_test(req, flag, status, ierror)
    enddo
  END IF
END SUBROUTINE handshake_send
!---handshake check before mpi_probe
!---
SUBROUTINE handshake_probe(need_handshake, probe_num, probe_tag, &
                           probe_src, probe_comm, &
                           filename, line_no, line_end, funcname, comm, &
                           src, tag, special_tag1, special_tag2, &
                           come_rank, come_tag, max, req_g)
  include 'mpif.h'
  logical need_handshake
  integer probe_num, max, probe_tag(max), probe_src(max), probe_comm(max)
  character(*) filename, funcname
  integer line_no, line_end, comm, tag, src, special_tag1, special_tag2
  integer timeout, come_rank, come_tag, req_g(max)
  character(512) handshake_info, come_filename
  integer come_lineno, come_lineend, come_count, come_type
  integer i, j, response, req, done
  double precision timer
  logical flag
  integer status(mpi_status_size)

  timeout = my_getenv_timeout()
  done = 0
  ! to see whether there is a matching probe existing already
  ! (a wildcard in this call matches any recorded source or tag)
  need_handshake = .True.
  do i = 1, probe_num
    if (comm == probe_comm(i) .AND. &
        (src == mpi_any_source .OR. src == probe_src(i)) .AND. &
        (tag == mpi_any_tag .OR. tag == probe_tag(i))) then
      need_handshake = .False.
      EXIT
    endif
  enddo
  if (need_handshake) then
    if (probe_num == max) then
      write(*, *) "ERROR: Sorry, you are using more mpi_probe ", &
                  "than MPI-CHECK can check!"
      call MPI_Abort(MPI_COMM_WORLD, 1, ierror)
    endif
    if (tag == mpi_any_tag) then
      call mpi_irecv(handshake_info, 512, mpi_character, src, tag, &
                     comm, req, ierror)
    else
      call mpi_irecv(handshake_info, 512, mpi_character, src, &
                     special_tag1+tag, comm, req, ierror)
    endif
    timer = MPI_Wtime()
    call mpi_test(req, flag, status, ierror)
    j = 1
    do while (.NOT. flag)
      if (MPI_Wtime() - timer > j*timeout*60) then
        call outtime(filename, line_no, line_end, funcname, j*timeout)
        j = j + 1
      endif
      call mpi_test(req, flag, status, ierror)
    enddo
    read(handshake_info, *) come_filename, come_lineno, come_lineend, &
         come_rank, come_count, come_type, come_tag
    response = 1
    do i = 1, max
      call mpi_test(req_g(i), flag, status, ierror)
      if (flag) then
        call mpi_isend(response, 0, mpi_integer, come_rank, &
                       special_tag2+come_tag, comm, req_g(i), ierror)
        probe_num = probe_num + 1
        probe_src(probe_num) = come_rank
        probe_tag(probe_num) = come_tag
        probe_comm(probe_num) = comm
        done = 1
        EXIT
      endif
    enddo
    if (done == 0) then
      write(*, *) "ERROR: Sorry, you are doing more point-to-point ", &
                  "communication than MPI-CHECK can check!"
      call MPI_Abort(MPI_COMM_WORLD, 1, ierror)
    endif
  endif
END SUBROUTINE handshake_probe
!---handshake check before blocking recv
!---
SUBROUTINE handshake_recv(need_handshake, probe_num, max_probe, &
                          probe_tag, probe_src, probe_comm, &
                          filename, line_no, line_end, funcname, comm, &
                          src, tag, count, datatype, special_tag1, &
                          special_tag2, come_rank, come_tag, max, req_g)
  include 'mpif.h'
  logical need_handshake
  integer probe_num, max_probe
  integer probe_tag(max_probe), probe_src(max_probe), probe_comm(max_probe)
  character(*) filename, funcname
  integer line_no, line_end, comm, tag, src, count, datatype
  integer special_tag1, special_tag2, timeout
  integer come_rank, come_tag, max, req_g(max)
  character(512) handshake_info, come_filename
  integer i, j, response, req, come_lineno, come_lineend, come_type, come_count
  integer done
  double precision timer
  logical flag
  integer status(mpi_status_size)

  timeout = my_getenv_timeout()
  done = 0
  ! to see whether there is a matching probe existing already;
  ! in this case, the handshake has been done at that time
  need_handshake = .True.
  do i = 1, probe_num
    if (comm == probe_comm(i) .AND. &
        (src == mpi_any_source .OR. src == probe_src(i)) .AND. &
        (tag == mpi_any_tag .OR. tag == probe_tag(i))) then
      need_handshake = .False.
      ! the probe entry is consumed by this receive
      probe_tag(i) = -9999
      probe_src(i) = -9999
      probe_comm(i) = -9999
      EXIT
    endif
  enddo
  if (need_handshake) then
    if (tag == mpi_any_tag) then
      call mpi_irecv(handshake_info, 512, mpi_character, src, tag, &
                     comm, req, ierror)
    else
      call mpi_irecv(handshake_info, 512, mpi_character, src, &
                     special_tag1+tag, comm, req, ierror)
    endif
    timer = MPI_Wtime()
    call mpi_test(req, flag, status, ierror)
    j = 1
    do while (.NOT. flag)
      if (MPI_Wtime() - timer > j*timeout*60) then
        call outtime(filename, line_no, line_end, funcname, j*timeout)
        j = j + 1
      endif
      call mpi_test(req, flag, status, ierror)
    enddo
    read(handshake_info, *) come_filename, come_lineno, come_lineend, &
         come_rank, come_count, come_type, come_tag
    ! count match check
    if (come_count > count) then
      call sendrecv_mismatch(filename, line_no, line_end, &
           'recv msg count is less than the send msg from process', &
           come_rank, come_filename, come_lineno, come_lineend)
    endif
    ! datatype match check
    if (come_type /= datatype) then
      call sendrecv_mismatch(filename, line_no, line_end, &
           'recv msg type does not match the send msg from process', &
           come_rank, come_filename, come_lineno, come_lineend)
    endif
    ! send confirmation to sender
    response = 1
    do i = 1, max
      call mpi_test(req_g(i), flag, status, ierror)
      if (flag) then
        call mpi_isend(response, 0, mpi_integer, come_rank, &
                       special_tag2+come_tag, comm, req_g(i), ierror)
        done = 1
        EXIT
      endif
    enddo
    if (done == 0) then
      write(*, *) "ERROR: Sorry, you are doing more point-to-point ", &
                  "communication than MPI-CHECK can check!"
      call MPI_Abort(MPI_COMM_WORLD, 1, ierror)
    endif
  endif
END SUBROUTINE handshake_recv
!---handshake check before collective routines
!---
SUBROUTINE handshake_collective(comm, filename, line_no, line_end, &
                                funcname, special_tag1, special_tag2)
  include 'mpif.h'
  integer comm, timeout, line_no, line_end, special_tag1, special_tag2
  character(*) filename, funcname
  integer i, j, req, response, come_lineno, come_lineend, come_rank
  double precision timer
  character(512) handshake_info, come_filename, come_funcname
  logical flag
  integer status(mpi_status_size)

  timeout = my_getenv_timeout()
  if (get_rank(comm) == 0) then
    do i = 1, get_commsize(comm) - 1
      call mpi_irecv(handshake_info, 512, mpi_character, &
                     mpi_any_source, special_tag1, comm, req, ierror)
      timer = MPI_Wtime()
      call mpi_test(req, flag, status, ierror)
      j = 1