Startup and shutdown - Mastering IPython 4.0.pdf

The preceding examples all used infinite loops. While this is a valid choice for a system that is expected to run continuously, it is not a universal feature. It does, however, avoid some annoying behavior that can occur at startup and shutdown.

Messaging with ZeroMQ and MPI

[ 106 ]

Consider the PUSH/PULL architecture outlined previously. The ventilator, workers, and sink can all be spread to different processors by the scheduler. At some point, the ventilator will begin execution, open its socket, and start sending messages. It does not know how many workers will be there in total. The ventilator will simply start sending messages out as fast as it can to whoever is listening. If, when the first message is ready to be sent, there are only two workers that have started (and connected), one of those workers will get the message. ZeroMQ will not block the ventilator from sending messages until every worker is ready to accept them. If there is some delay such that the other 14 workers do not start within the time period that is required for the ventilator to send out all 16 messages, then the first two workers will get eight messages each and the remaining 14 workers will get none.

The sink has a similar problem. In this case, the sink does not know whether any workers are still processing a message. There is no necessary connection at the ZeroMQ level between the number of messages sent out, the number of workers that are connected to either the ventilator or the sink, and the number of messages that the sink should expect.

The solution in both cases is to set up "extra" sockets on other ports for out-of-band communication. In the aforementioned situation, the ventilator should set up a REP socket and each worker should connect to the REP socket using a REQ socket. When a worker started up, but before it started receiving "actual" messages, it would send a message to the ventilator to let the ventilator know that the worker was up and ready to receive messages. The server would delay sending any "actual" messages until it has received these out-of-band messages from all the workers.

A similar setup would work for the sink. In this case, the ventilator and the sink should set up a separate PAIR socket to communicate with each other. Every time the ventilator sent out a message to a worker, it would also send out an out-of-band message to the sink, letting the sink know that it should receive one more message from a worker. This way the sink could make sure that the number of messages it received was the same as the number of messages the ventilator sent. A special "no more messages" message could be used when the ventilator is done so that the sink does not sit around forever waiting to see if another message is going to be sent.

Discovery

All the previous examples assumed that all the communication processes were running on the same machine. Certainly, the servers could accept messages from other machines, but all client connections were to localhost. This was done at least partially to make the examples easy to run—putting everything on localhost means not needing a parallel machine on your desk.

Chapter 4 It also obviates another, more serious problem: how do the clients know where to connect to? Depending on the pattern, a process that wants to communicate needs to know on what machine and what port to send or listen.

A popular solution to this problem is to create an intermediary between the client and the server, called a broker or proxy. The broker sits in one well-known place that every server and client knows about (by having it hardcoded). This reduces the problem of finding where all the other processes are to simply connecting to a single endpoint.

Heavyweight solutions (for example, RabbitMQ and IBM Integration Bus) come with the broker built in. There are three reasons ZeroMQ avoids mandating the use of a broker:

• Brokers tend to become more complicated over time

• Brokers become bottlenecks for messages, as they all must pass through them

• The broker becomes a single point of failure

A directory architecture could be used as an alternative. In this case, there would be a single process responsible for knowing where all the endpoints were. When a process needs to know where to send/receive a message, it will check with the directory to find the server/port/socket type, and then set up its own socket. In this case, the directory does not pass any messages through itself, instead simply receiving and answering queries about the communications setup. While this may sound simpler than a broker, it is not without its issues.

Whether the solution to discovery is a broker, a directory, or some other scheme, ZeroMQ provides enough flexibility to implement a solution.

MPI

The Message Passing Interface (MPI) is a language-independent message passing library standard. It has been implemented in several languages, including Fortran, C/C++, and Python. This book will use the Mpi4py implementation. Chapter 3, Stepping Up to IPython for Parallel Computing, outlined the process for starting IPython using MPI. It will be assumed that IPython has been started this way in the following examples.

Messaging with ZeroMQ and MPI

[ 108 ]

In document Mastering IPython 4.0.pdf (Page 128-131)