

4.3 Specialised Problems

4.3.3 Defining Unsuitable and Inefficient Problems

Based on the implementation issues we encountered, we observe that some problems are well suited to implementation using the proposed model, others are less so or require adaptation, and some are unsuitable for implementation at all.

The dependency structure of the method used to solve a problem is key to determining whether the problem can be solved by our model, and how efficiently. Broadly, problems fall into four categories of suitability for solving through our model:

• Ideal - These are problems that require only a small number of previous iterations to be maintained as dependencies of the current iteration. This allows large problem instances to be solved, because memory requirements on the GPU are low, and ensures the GPU performs a high ratio of computation operations to memory transactions. Examples of these problems include the longest common subsequence, knapsack, edit distance, and Manhattan tourist problems.

• Inefficient - Problems that require a large number of previous iterations to be maintained as dependencies limit the size of instances that can be solved, as more iterations must be stored on the GPU, but they can still be solved successfully. An example of a problem in this category is the subset sum problem, where the current iteration depends on all previous iterations. In this case the model still solves the problem successfully, but no previous iterations can be moved off the GPU, so no memory management occurs, which considerably limits the size of solvable problem instances.

• Requires Adaptation - If a problem has dependencies ahead of the wave front, it is unsuitable for implementation without an adaptation to support this. To make cells ahead of the wave front available to the current iteration, the current iteration should be kept in memory as a previous iteration until the wave front reaches the future cells it requires; at this point these can be filled, and the iteration rotated off the GPU. Examples of problems that fall into this category are the all pairs shortest path and chain matrix multiplication problems.

• Unsuitable / Requires Extensive Adaptation - Finally, some problems are unsuitable for implementation through the model without extensive changes. These are generally problems which require recursive child scoring grids to be created, such as the travelling salesman problem (TSP), or which have otherwise specialised, problem-specific solving methodologies. Whilst we have demonstrated solving the TSP in this thesis using the same basic paradigm as our model, this required extensive hard-coded adaptations.
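The memory argument behind the Ideal category can be illustrated with a minimal CPU-only sketch of the longest common subsequence problem. It is not the model's GPU implementation: the function name `lcs_length` is hypothetical, and treating each row of the scoring grid as one iteration is an assumption made for illustration. The point is only that, because each iteration depends on a single previous iteration, older iterations can be discarded as soon as the next one is complete.

```python
def lcs_length(a: str, b: str) -> int:
    """Return the length of the longest common subsequence of a and b.

    Only two rows of the scoring grid are held at any time: the previous
    iteration and the current one. (Assumption: one row = one iteration.)
    """
    prev = [0] * (len(b) + 1)          # iteration i-1 (row of zeros to start)
    for ch_a in a:
        curr = [0] * (len(b) + 1)      # iteration i, filled left to right
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                # Match: extend the subsequence ending at the diagonal cell.
                curr[j] = prev[j - 1] + 1
            else:
                # No match: carry over the better of the upper and left cells.
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr                    # the older row is no longer required
    return prev[len(b)]


if __name__ == "__main__":
    print(lcs_length("ABCBDAB", "BDCABA"))  # prints 4
```

By contrast, a problem in the Inefficient category would need every earlier row retained, so the working set would grow with the number of iterations rather than staying fixed at two rows.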

Summary

In this chapter we have described the mapping of the test problems onto the GPU through the use of our parallel model. We have demonstrated that, by using the input files and changing the dynamic programming statement, different problems can be solved. We also discussed the adaptations to the model required when solving problems with more complex dynamic programming algorithms. Finally, we considered problems which are unsuitable for implementation and presented the reasons for this.

Testing Methodology

Overview

This chapter describes the hardware environment used for carrying out the performance testing of the proposed model, and the testing methodology adopted when analysing the program. It covers the hardware specification of the machines used during testing, as well as the software used during both the compilation and execution of the program. We introduce the metrics that will be recorded during test execution, describe how these can be used to evaluate the model's performance, and detail how the test data is generated for each problem introduced. Finally, we provide some small scale benchmarks demonstrating the performance of the underlying hardware, giving the theoretical best case performance that the model could achieve.

5.1 Testing Environment and Hardware

In this section the hardware and software environment used across all testing is described. The majority of the testing took place on a desktop computer running the Arch Linux operating system. The code was compiled using the GNU Compiler Collection (GCC) 4.8, and GPU code was compiled using version 6.5 of the CUDA framework. The machine has an Intel 3930K CPU providing 6 cores clocked at 3.2 GHz, and 16 GB of DDR3 memory. It also contains 2 NVIDIA Titan GPUs, each of which provides 2688 CUDA cores clocked at 837 MHz. Note, however, that no multiple GPU tests were carried out, as the model has not been designed to support this; instead, the multiple GPUs simply enabled simultaneous testing in the smaller test cases.

A secondary desktop with a lower specification GPU was used in order to demonstrate how the code scales from one hardware environment to another. Code on this machine was compiled using GCC 4.6, and the GPU code was compiled using version 6 of the CUDA framework. In terms of hardware, this desktop has an Intel i7 4790 providing 4 cores clocked at 3.6 GHz, and 16 GB of DDR3 memory. It also contains an NVIDIA GTX 960 providing 1024 CUDA cores clocked at 1127 MHz. A full description of the role of the second system during testing is given shortly.
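As a rough guide to how the two GPUs compare, the core counts and clock speeds quoted above can be turned into an upper bound on single precision arithmetic throughput. The sketch below assumes each CUDA core retires one fused multiply-add (counted as 2 floating point operations) per clock cycle at the quoted clock speed; this assumption and the function name `peak_gflops` are illustrative and not taken from the testing itself. The figures are ceilings derived from the specification, not measured results.

```python
def peak_gflops(cuda_cores: int, clock_mhz: float, flops_per_cycle: int = 2) -> float:
    """Upper bound on single precision throughput in GFLOPS.

    Assumption: every core completes `flops_per_cycle` floating point
    operations per clock cycle (2 for one fused multiply-add).
    """
    # cores * MHz gives millions of core-cycles per second; divide by 1000 for giga.
    return cuda_cores * clock_mhz * flops_per_cycle / 1000.0


if __name__ == "__main__":
    titan = peak_gflops(2688, 837)     # primary test machine GPU
    gtx960 = peak_gflops(1024, 1127)   # secondary test machine GPU
    print(f"Titan:   {titan:.1f} GFLOPS")
    print(f"GTX 960: {gtx960:.1f} GFLOPS")
    print(f"Ratio:   {titan / gtx960:.2f}")
```

Under these assumptions the Titan's ceiling is a little under twice that of the GTX 960, which gives a first expectation for how run times might differ between the two machines on compute-bound workloads.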