Chapter 5 The NIMBUS Pipeline
5.4.1 Simple Queue Service (SQS) Performance
5.4.1.4 Exp:NIM1-4 SQS Queue Read Rates
The issues of reading queue message in a none distributed or parallel manner are consistent will the issues associated with writing messages to a queue. For each message read there is a connection overhead, but in addition to this there is also the overhead of deleting the message from the queue creating a double cost to message reading. When a message is read, a visibility timeout it set leaving the message invisible to others. Messages within the visibility timeout are referred to as a messages in flight, and the limit for the number of messages in flight is set currently at 120, 000. This provides a potential limit on the performance of a system using a single worker queue. A system which has the capacity to read more that that number of messages at one time, will be blocked reading messages until the total number of messages in flight is reduced below that threshold. This requires that either a message is returned to the queue or deleted by the reading worker.
While this is a limitation of the AWS SQS service it is not necessarily a limiting factor on the pipeline as it is possible to operate a larger number of queues, each containing file information for processing. If a pipeline was limited to operating at 120, 000 messages per second this would equate to 1.2M illion images per second using image data cubes of 10 stacked images, which in this pipeline would represent about 1 Terabyte per second, or 2.8 Petabytes per hour.
The issue of reading queues quickly is more related to the processing of post experiment analysis than with the running of the system. With the pipeline’s distributed nature, and the use of batch downloading per worker node, the processing time of the images is the limiting factor for the image processing rate, not the message download time for each worker. The use of the queues for distributed sharing of log files and result files however does require some thought on queue read performance. The reason a queue system is used to record result information and log file information is that it allows the worker nodes to be more independent. By centralising key information about their processing to a central queue, all of the pipeline log messages are available in a single queue, although there is no specific reason why multiple queues could not be used.
If the log files are being monitored for issues with workers or with processing then they need
to be constantly read and monitored with key value pairs9 being sought to identify issues. The
read rate of a log file queue must therefore have a similar read rate to the worker processing rate. A monitoring process reading the queue will be doing considerably less processing that a worker node, so it should be able to process messages considerably faster, or allow for multiple monitors to operate simultaneously (which is the case). A single monitoring server reading queue messages can read approximately 100 messages per second when running multiple threads, as shown below in Figure 5.16. Multiple monitoring servers could therefore reasonably be assumed to be able to read all log or result messages produced by a fully functional operating pipeline. Using Equation 9
A key/value pair could be a specific pattern within the log file data indicting an unusual or important state for a worker, such as WorkerID:STOP
5.3 it can be estimated that the message read rate would have to be slightly greater than twice the message processing rate of a system since each worker generates just over 2 messages per image file processed. If a single monitoring system can read 100 images per second, then the total monitoring node requirements is the message processing rate of the system per second divided by 100.
25.5 σ13.5%
Single Threaded, Sequential program
100.8 σ14.2%
Multi-threaded, 40 threads program
100.1 σ22.6%
Multi-threaded, 60 threads program
0 50 100
Message reads per second
Figure 5.16: Exp:NIM1-4. Messages read per second from a single monitor server node using varying levels of threads running with the standard deviation shown.
A number of strategies were considered for log file message reading which are briefly discussed. • Serial Download requires a single-threaded application to read messages from the queue one at a time. It is possible to increase the number of messages taken from the queue at a single read, however this did not provide any significant performance improvements on the message read performance. This approach, as shown in Figure 5.16 is not an optimal solution.
• Multi-Threaded Download provided considerable improvements in the overall message download rate with the number of threads being a configurable number similar to the Python listings showing message writing. In this case messages are deleted once they have been read so the queue is constantly reducing over time.
• Multi-Threaded Download with messages in flight is a faster message read given that the message is not deleted, but rather messages are given an exceptionally long visibility read time when downloaded. When a message is read, the time it remains off the queue can be set. If no messages are deleted then the message limit of 120, 000 is a system bottleneck after which no messages can be read until that number is reduced. This method only works for queues which are relatively small. In the experiments performed, queue messages can reach over 1 Million messages.
• Multi-Threaded Download with messages in flight hybrid is a compromise solution which estimates the number of messages within the queue and has a policy of deleting a proportion of them to take advantage of the messages-in-flight mechanism, while never allowing the maximum number of messages to be in flight. For large queues however, the advantage is diminished over time.