Chapter 5 The NIMBUS Pipeline
5.3.1 Experiment Metrics System
5.3.1.1 Experimental Parameters
Given the number of variables within the experimental setup, an exhaustive experimental approach would involve thousands of experiments. In total approximately 100 experiments were conducted with the primary aim of identifying the key variables within the system and to use that to maximise the performance of the NIMBUS pipeline within the given constraints of cost. The following seven variables were identified and where possible controlled for within each experiment.
• VAR1: Web Server Location. There are potentially significant impacts associated with the location of the web server providing files to the pipeline. While the configuration of 6 of the web servers was similar, the network between them and the workers was quite different. Network performance could have significant impact on the performance of a worker.
The locations of the web servers has already been highlighted in Figure 5.6. For most
experiments, the AWS US East (Virginia region) web servers were used while the remaining web servers provided scaling options for larger scale experiments. The fastest performing
web server was provided on loan from HEAnet 4, a high performance balanced web server
highly tuned for large data transfer when the experimental requirement was to eliminate the web server performance from an experiment. This resource was used in a number of experiments producing the highest system performance.
• VAR2: Number of web servers. Initial testing reviewed the impact of increasing the number of servers that served files to workers. Multiple servers allows for the testing of the network performance when running small numbers of workers. There are 8 Nginx web servers and 1 FTP cluster used.
• VAR3: Number of Instances. An instance is a virtual or physical machine running a unix environment capable of running 1 or more workers. The instances are primarily AWS EC2 ubuntu machines and the number run in an experiment ranged from 1 to 100.
• VAR4: Type of Instance. The EC2 instance types available range in size and config- uration. Experiments can be run with a number of EC2 instance types such as T1.Micro, T1.Large, M1.Large, M1.XLarge and C1.Xlarge. For a subset of experiments two physical machines were also used. A Sun/Oracle x4150 and an IBM i326e Server. Details of the specification of these given in Appendix Table D.3
• VAR5: Workers per Instance. For both single and multicore systems the use of multiple threads running independent workers allows for an investigation of the balance of CPU and Networking resources on an instances. Instances with more CPU cores should potentially improve the performance of the system if multiple worker instances are run.
4
• VAR6: Experimental Time. Most experiments ran for 20 minutes which was deemed long enough for the processing rate to be determined, and to ensure that a large volume of data could be processed. Some exceptions to this occurred near the final experiments to demonstrate that the processing rate could indeed be maintained over longer periods of time.
• VAR7: Batch Size. The C program which processes images can process all files it finds in a directory. If there is a single data file found then the file is read in along with the Master Files. The images are processed, the results posted and the C program terminates and is restarted for the new batch. If the batch size is 10 then the Master Files are read in and held in memory while each of the 10 files are processed. This reduces the number of times the C program is started and stopped and reduces the file I/O by only reading the Master Files once per batch instead of once per data file.
5.4
Results and Discussion
This Chapter defined and executed a series of experiments to determine the overall performance of the NIMBUS system architecture presented in this chapter. The results of each experiment are broken down and analysed to provide a comprehensive view of the system. In some cases the experiments cover multiple components of the system, but in all cases the context of the results and their contribution to the overall pipeline are discussed. There are four basic sets of experiments performed, message queue performance, single and multi node instance scaling, and pipeline limit testing. Table 5.11 shows the high-level experiments performed which are further broken down within this chapter. The web queues must be fast enough to service high levels of concurrent requests and the globally distributed computing nodes should be able to scale linearly until bottlenecks are observed. If bottlenecks are found then the architecture should be flexible enough to work around them. All data sources for all experiments and graphs are identified in Appendix Table D.2 which references an accompanying supplementary USB disk which contains raw and processed data relevant to these experiments.
Limits imposed on the experiments were based on limits of available resources, although where possible indications of scaling opportunities were identified. For the pipeline to be active a minimum of 1 worker is required to perform image cleaning and reduction. Multiple workers processes can run on a worker node/instance, which is typically a virtual AWS instance. The maximum number of instances activated within the experiments was 100, but the maximum number of workers was 10, 000. In some cases multiple runs of the same experiment were performed to ensure results were repeatable.
It is required that the processing rates used within these experiments are expressed correctly and consistently. While the original data set is stored as multiple images per file, and the raw
Reference Measure Description
Exp:NIM1 SQS Performance. Testing the read and writing times of the web message
queues
Exp:NIM2 Single Instance
Node Performance.
Determine the variables which affect the performance of the overall processing power of a single instance.
Exp:NIM3 Multi-Instance
Node Performance.
Focus on scaling the number of instances up to 100 looking for factors which could affect the scalability of the system.
Exp:NIM4 System Limits. Identify the full scalability of the pipeline and to identify
strategies to continue improving the system performance
Table 5.11: NIMBUS Experiments Overview
results from the experiments were measured as files per second, or images per second, a more useful representation of the processing rates is the amount of data processed over time. The conversion from files to bytes also needs to take into account that the data being processed is compressed so the concept of equivalent uncompressed data rate is also given. The following values and calculations are central to correctly determine processing rates.
• F Uncompressed File Size. An unprocessed image data cube is 7.297920 MB.
• Fc Compressed File Size. An unprocessed compressed data cube is 1.247040 MB.
• Fi Images per File. The number of images within an image data cube is 10.
• I Images size. An unprocessed compressed image.
I = Fc
Fi
(5.4) • C Compressed Rate. The size reduction of an image using the fpack utility.
C = Fc
F (5.5)
• Pf psProcessing rate in files per second. The number of files processed per second using the
NIMBUS pipeline.
• Pgps Processing rate in GB per second.The number of gigabytes processed per second of
compressed data using the NIMBUS pipeline.
Pgps=
Pf ps∗ Fi∗ I
1024 (5.6)
• bPgps Processing rate in equivalent uncompressed GB per second. The number of gigabytes
processed per second of equivalent uncompressed data using the NIMBUS pipeline.
b
Pgps=
Pgps
Reference Measure Description
Exp:NIM1-1 SQS Write Performance
Single Node
Testing the writing time of a single storage node using a series of threaded applications
Exp:NIM1-2 SQS Write Performance
Multi-Node
Testing the performance of the queue when multiple sources writing to it
Exp:NIM1-3 SQS Distributed Read
Performance
Testing the distribution of messages from multiple nodes and the impact this has on queue read per- formance
Exp:NIM1-4 SQS Queue Read Rates Testing the read speed of an SQS queue using a
series of threaded approaches
Table 5.12: NIMBUS SQS Performance Experiment Overview
• Pbgph Processing rate in equivalent uncompressed GB per hour. The number of gigabytes
processed per hour of equivalent uncompressed data using the NIMBUS pipeline.
b
Pgph = bPgps∗ 3600 (5.8)
• Pbtph Processing rate in equivalent uncompressed TB per hour. The number of terabytes
processed per hour of equivalent uncompressed data using the NIMBUS pipeline.
b
Ptph=
b
Pgph
1024 (5.9)