Parallel Agent-based Modelling and Simulation

It has been demonstrated (in Chapter 2) that the inherently parallel nature of large-scale agent-based models can be exploited with relative ease on graphics processing hardware. The increased performance brings with it access to larger systems, which have attracted growing interest in recent years. For instance, biological systems such as algal photobioreactors can be modelled only at very small scale at present. Graphics processing hardware certainly shows great promise for accelerating such simulations on the laboratory desktop computer. The use of Graphical Processing Units (GPUs) in some of the world’s fastest supercomputers further attests to this.

Although models such as Boids appear to scale well across GPUs, a significant issue remains. The computational complexity of such models causes timestep computing time to grow very quickly as system size increases. This complexity arises from the interaction between agents: to determine which other agents are within communication distance, an agent must compute and test a distance against every other agent, so a naive timestep performs $O(n^2)$ distance tests for $n$ agents. The vast majority of authors in the Agent-based Modelling (ABM) community agree that agent-agent interaction is a crucially important aspect of ABM.
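
The all-pairs cost described above can be made concrete with a minimal sketch. It is not the implementation used in Chapter 2; the function name `brute_force_neighbours`, the two-dimensional unit square and the radius of 0.05 are assumptions chosen purely for illustration.

```python
import math
import random


def brute_force_neighbours(positions, radius):
    """Return (neighbour lists, number of distance tests) using all-pairs testing."""
    n = len(positions)
    neighbours = [[] for _ in range(n)]
    tests = 0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            tests += 1  # one distance computation and comparison per ordered pair
            if math.dist(positions[i], positions[j]) <= radius:
                neighbours[i].append(j)
    return neighbours, tests


# Hypothetical population: 200 agents placed at random in the unit square.
random.seed(1)
agents = [(random.random(), random.random()) for _ in range(200)]
_, tests = brute_force_neighbours(agents, radius=0.05)
print(tests)  # 39800, i.e. n * (n - 1)
```

Doubling the population roughly quadruples the number of tests, regardless of how few agents are actually within range.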

Fortunately, this problem has been considered in the N-body literature [203] since as early as 1985, by Andrew Appel [3], before the wide acceptance of ABM. Two major schools of thought exist on reducing complexity in systems such as these: approximation, as in the Barnes-Hut method [12], and redundancy elimination. The former approximates the collective effect of a cluster of agents (as do Scheffer et al. [252], though for a different purpose); the latter eliminates redundant interactions. If an agent only communicates and interacts with agents in close proximity, then any method that reduces the number of distance computations which do not result in an interaction should be employed. A good way of accomplishing this is spatial partitioning, which uses fast sorting algorithms to group agents together while guaranteeing that all agents within range will still interact.

While spatial partitioning techniques acceptably accelerate simulations in single-threaded environments, their use on a data-parallel architecture is less trivial. Many spatial partitioning algorithms construct their datastructures by recursion using pointer trees. Although recent advances in graphics hardware have improved support for recursion and dynamic memory allocation, these features remain difficult to utilise effectively. Warp divergence caused by branching code also reduces throughput: the SIMT architecture forces the threads of a warp to execute the same instructions, so threads which diverge are simply disabled (executing NOOPs) until their execution paths align again. For these reasons, a number of spatial partitioning algorithms have surfaced specifically for graphics hardware.

Chapter 2 discussed a spatial partitioning technique commonly termed the uniform grid [84] in the context of ABM. Green [84] used the algorithm and datastructure to accelerate collision detection computations in a particle simulation accelerated by NVIDIA’s CUDA. Considerable performance improvements are obtainable using this technique. Results from testing this algorithm in the context of ABM reiterated this, but also exposed a shortfall. Assuming that particles are uniformly distributed across the space (a condition always satisfied in the simulation of Green), an acceptably small number of adjacent particles will be iterated over for each particle during timestep computations. However, in the case of Boids, where collision detection is less important than flocking behaviours, agents can contract around a specific location, causing complexity to increase beyond $O(n^2)$ because all agents must again communicate. Not only this, but a datastructure must still be computed, which (depending on the algorithm) sometimes reaches $O(n^2)$ itself.
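
The contrast between uniformly distributed and contracted populations can be sketched as follows. This is a single-threaded illustration under stated assumptions, not Green's CUDA implementation: a Python dictionary stands in for the sorted cell array that a GPU version would typically build, the cell size is assumed equal to the interaction radius, and the names `build_grid` and `grid_neighbours` are hypothetical.

```python
import math
import random
from collections import defaultdict


def build_grid(positions, cell_size):
    """Bin agent indices by integer cell coordinate (stand-in for a sorted cell array)."""
    grid = defaultdict(list)
    for idx, (x, y) in enumerate(positions):
        grid[(int(x // cell_size), int(y // cell_size))].append(idx)
    return grid


def grid_neighbours(positions, radius):
    """Neighbour lists from a 3x3 cell search; cell size equals the radius (assumption)."""
    grid = build_grid(positions, radius)
    neighbours = [[] for _ in positions]
    tests = 0
    for i, (x, y) in enumerate(positions):
        cx, cy = int(x // radius), int(y // radius)
        # Any agent within `radius` must lie in this cell or one of its 8 neighbours.
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for j in grid.get((cx + dx, cy + dy), ()):
                    if j == i:
                        continue
                    tests += 1
                    if math.dist(positions[i], positions[j]) <= radius:
                        neighbours[i].append(j)
    return neighbours, tests


random.seed(2)
n, r = 500, 0.05
# Uniformly distributed agents: only nearby cells are searched.
uniform = [(random.random(), random.random()) for _ in range(n)]
# Contracted flock: every agent falls inside one grid cell.
clustered = [(0.52 + random.uniform(0, 0.01), 0.52 + random.uniform(0, 0.01))
             for _ in range(n)]

_, uniform_tests = grid_neighbours(uniform, r)
_, clustered_tests = grid_neighbours(clustered, r)
print(uniform_tests, clustered_tests)
```

In the contracted case the grid search performs exactly as many distance tests as the all-pairs approach, n(n - 1), while still paying the cost of building the grid.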

In summary, spatial partitioning can be useful, but the appropriate algorithm must always be chosen. The multi-stage programming paradigm which features in Chapters 5 and 6 of this dissertation provides the ability to “store” such algorithms and their datastructures for later use. The addition of compile-time checks may also help to select the appropriate algorithm, alongside “hints” from the user and run-time information collection.

With the improved performance afforded by using a single graphics processing unit, it is not unreasonable to consider the use of several. Section 2.2.3 considers this along with the use of a uniform grid spatial partitioning algorithm. Given that algorithmic improvements could be made, it was encouraging to see that the multiple-GPU version of the model could be supported by the same spatial partitioning algorithm and datastructure. The result was a reasonably scalable simulation, bounded by host memory. With additional improvements, storing the simulation entirely upon graphics hardware would remove this restriction, and a model could then scale freely, bounded only by the number of graphics processors on a single machine. The next step would be to use multiple host nodes in a high performance cluster, each with multiple graphics processors.
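
This section does not describe how agents are divided between devices. One common arrangement, assumed here purely for illustration, splits the space into slabs along one axis and gives each device read-only "halo" copies of agents lying within the interaction radius of its boundaries. The function `slab_partition` and its parameters are hypothetical, and the sketch assumes positions in the unit square and a radius no larger than the slab width.

```python
import random


def slab_partition(positions, num_devices, radius, width=1.0):
    """Assign agents to x-axis slabs; return (owned, halo) index lists per device."""
    slab_width = width / num_devices
    owned = [[] for _ in range(num_devices)]
    halo = [[] for _ in range(num_devices)]
    for idx, (x, _y) in enumerate(positions):
        d = min(int(x // slab_width), num_devices - 1)
        owned[d].append(idx)
        # An agent within `radius` of a slab boundary must be visible to the
        # device that owns the slab on the other side of that boundary.
        if d > 0 and x - d * slab_width <= radius:
            halo[d - 1].append(idx)
        if d < num_devices - 1 and (d + 1) * slab_width - x <= radius:
            halo[d + 1].append(idx)
    return owned, halo


random.seed(3)
pts = [(random.random(), random.random()) for _ in range(300)]
owned, halo = slab_partition(pts, num_devices=4, radius=0.05)
for device, (own, extra) in enumerate(zip(owned, halo)):
    print(device, len(own), len(extra))
```

Each device can then build its own uniform grid over its owned and halo agents, which is consistent with the observation that the same partitioning datastructure carried over to the multiple-GPU version.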

Throughout the discussion of these techniques in Chapter 2, visualisation took a secondary role, consistently providing qualitative feedback that was complemented by quantitative methods such as clustering and spatial histogramming. Visualisation methods are very important and deserve developmental effort proportional to that given to the model itself. Building a model without qualitative feedback is a difficult and error-prone endeavour.
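
As an illustration of the kind of quantitative measure meant by spatial histogramming, the following minimal sketch counts agents per cell of a regular grid and reports the fraction of occupied cells. The occupancy statistic, the function names and the unit-square domain are assumptions made for this example, not the measures used in Chapter 2.

```python
import random


def spatial_histogram(positions, bins, width=1.0):
    """Count agents per cell of a bins x bins grid covering the square [0, width)."""
    counts = [[0] * bins for _ in range(bins)]
    cell = width / bins
    for x, y in positions:
        col = min(int(x // cell), bins - 1)
        row = min(int(y // cell), bins - 1)
        counts[row][col] += 1
    return counts


def occupancy(counts):
    """Fraction of cells holding at least one agent; low values suggest clumping."""
    cells = [c for row in counts for c in row]
    return sum(1 for c in cells if c > 0) / len(cells)


random.seed(4)
spread = [(random.random(), random.random()) for _ in range(1000)]
flock = [(0.5 + random.uniform(0, 0.05), 0.5 + random.uniform(0, 0.05))
         for _ in range(1000)]

print(occupancy(spatial_histogram(spread, bins=10)))
print(occupancy(spatial_histogram(flock, bins=10)))
```

A measure of this kind can flag, without any rendering, the contraction of a flock that a visualisation would show qualitatively.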

Visualisation also presents its own problems, particularly that of performance. It is for this reason that performance measurements were always taken with visualisation disabled. Rendering upwards of one million agents is particularly time consuming. For these simulations, pixel shaders were used to render agents as spheres. The alternative was to render Boids, for example, as shapes indicating their current heading. While this is helpful for debugging purposes, it slows rendering tremendously. Concepts similar to laboratory-bench scale experiments seem more and more appropriate when building large-scale agent-based models. A technique not taken into account in Chapter 2 was the use of pixel buffer objects (PBOs), which take advantage of the fact that computation of the data destined to be rendered takes place on the graphics processing unit itself. While this removes some of the need to constantly copy memory between host and device (a computationally costly process), it would still be necessary to endure the wait for rendering many triangles, which could be mitigated by the point-rendering methods already mentioned.

In document Data parallel structural optimisation in agent based modelling : a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Computer Science at Massey University, Albany, New Zealand (Page 196-198)