Early work by [Perumalla, 2006] showed that DES style execution is possible on GPUs and could give a twofold improvement over sequential code. It also noted that converting DES code from a multicore based model to a streaming multiprocessor model required much work. This was followed up in 2008 [Perumalla and Aaby, 2008]. When benchmarked on three homogeneous models (Mood Diffusion, Game of Life and Schelling Segregation), the GPU performed 2 to 3 orders of magnitude faster than the (sequential) Agent Based Model (ABM) toolkits. It was noted [Perumalla and Aaby, 2008] that, compared to the ABM toolkits, the GPU code gained speed at the cost of modularity and reusability. Programming was also more difficult, and there were concerns about model correctness: artificial biases can be introduced for performance optimisation.
[Lysenko and D’Souza, 2008] also presented a GPU implementation of version 16 of the StupidModel benchmark [Railsback et al., 2005] (again a homogeneous model) that was an order of magnitude faster than ABM toolkit implementations. They demonstrated several data-parallel (Single Instruction, Multiple Data (SIMD)) algorithms suitable for ABM and concluded that the area showed promise.
Richmond [Richmond et al., 2009] developed an agent based GPU framework, scriptable in a C++ like scripting language, and used it to benchmark the flocking algorithm [Reynolds, 1987]. The results were promising, but no direct comparisons were made with equivalent sequential or multicore code.
Park [Park and Fishwick, 2010] showed that General-Purpose computing on Graphics Processing Units (GPGPU) could be used for simulations with irregular time steps (similar to the DES approach). He developed a framework that gave an order of magnitude speed-up, but its results were approximate and subject to numerical error, confirming Perumalla’s concerns about correctness.
More recently, Seok [Seok and Kim, 2012] compared homogeneous cellular models implemented both on multicores using OpenMP and on GPUs using Compute Unified Device Architecture (CUDA). When simulating the largest model, the GPU was 2.5 times as fast as a quad core Central Processing Unit (CPU). No reference was made to the difficulties of developing for either the GPU or the CPU.
Around the same time the Flame ABM framework [Richmond et al., 2010] was developed. It was designed to be easy to program and hides as much of the CUDA API from the user as possible. It describes agents and messages using X-machines [Holcombe, 1988]. X-machines are designed using formal techniques and encoded in the X-Machine Markup Language (XMML), while messages are described in eXtensible Markup Language (XML). Message implementation is still written in the low level CUDA C subset, with some extra restrictions on what the code is allowed to do (e.g. functions can only output one message). The GPU version of Flame shows a 250-fold speed-up over the non-GPU version. No direct comparisons between the speed of Flame and other ABM toolkits were found, although it is likely to be many times faster. For best results, the authors recommend using Flame for homogeneous models with many agents and simple messages [Richmond et al., 2010]. For many ABSS these conditions do not hold, so the attainable speed-up will be lower. The authors give no indication of how much lower.
3.5 Summary
Concurrency introduces extra complexity into program development (such as deadlock, livelock and nondeterminacy) [Sutter and Larus, 2005, Lee, 2000], but these difficulties can sometimes be offset by speed-up. The maximum attainable speed-up is described by Amdahl’s and Gustafson-Barsis’ Laws [Amdahl, 1967, Karbowski, 2008]. In ABSS, Gustafson-Barsis’ Law is more applicable because concurrency is used to let modellers produce larger and more complex simulations. The problem size grows with the available processors rather than staying fixed, as Amdahl’s Law assumes. The theoretical space-time complexity of algorithms can be measured using asymptotic analysis.
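The difference between the two laws can be illustrated with their usual formulations. Let N be the number of processors and p the parallelisable fraction. Amdahl’s Law gives S_A(N) = 1/((1 - p) + p/N), where p is a fraction of the sequential run time. Gustafson-Barsis’ Law gives S_G(N) = (1 - p) + pN, where p is a fraction of the parallel run time. The sketch below evaluates both. The function names and the value p = 0.95 are illustrative assumptions, not figures taken from the works cited above.

```python
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Amdahl's Law: speed-up for a FIXED problem size.

    parallel_fraction is the fraction of the sequential run time that can be
    parallelised; the remaining (1 - parallel_fraction) always runs serially.
    """
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / processors)


def gustafson_barsis_speedup(parallel_fraction: float, processors: int) -> float:
    """Gustafson-Barsis' Law: 'scaled' speed-up when the problem grows with N.

    parallel_fraction is the fraction of the parallel run time spent in
    parallelisable work; the serial part stays constant as N grows.
    """
    serial = 1.0 - parallel_fraction
    return serial + parallel_fraction * processors


if __name__ == "__main__":
    p = 0.95  # hypothetical parallelisable fraction, chosen for illustration
    for n in (1, 4, 16, 64):
        print(f"N={n:3d}  Amdahl={amdahl_speedup(p, n):6.2f}  "
              f"Gustafson-Barsis={gustafson_barsis_speedup(p, n):6.2f}")
```

With p = 0.95 and N = 16, Amdahl’s Law gives a speed-up of about 9.14, while Gustafson-Barsis’ Law gives 15.25. However many processors are added, Amdahl’s bound can never exceed 1/(1 - p) = 20. The scaled speed-up, in contrast, keeps growing linearly with N, which matches the ABSS case where models are made larger to use the extra processors.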
While work has been undertaken in parallelising ABSS before, it suffers from two deficiencies. The first is that it tends to parallelise only the simpler agent interaction types (for example, the StupidModel benchmark [Lysenko and D’Souza, 2008]). It is therefore difficult to judge how these approaches to concurrency will transfer to more complex ABSS.
In chapter six we present new algorithms for synchronous updating in ABSS that can cope with the most complex agent interaction type. We use asymptotic analysis to demonstrate that their space-time complexity belongs to the same complexity classes as that of asynchronous updating algorithms. We also prove that our new algorithms are deadlock and livelock free and that they are deterministic in execution.
The second problem is that results tend not to be repeatable. Even when they are (for example, because the original source code is available), they cannot easily be compared against other approaches [Edmonds and Hales, 2003].
What is required is a precisely defined benchmark that contains a wide range of interaction types of varying complexity. Producing such a benchmark requires tackling the replication problem in ABSS and defining a suitable range of agent behaviours.
In the next chapter we look at Sugarscape, a well known ABSS that contains a wide range of agent interaction types, varying in complexity from the very simple to the very complex. In chapter five Sugarscape is used as an example to show how to formally specify an ABM. This formally defined version of Sugarscape has the precision required for use as a benchmark. In appendix A the benchmark is used to test the new synchronous updating (SU) algorithms of chapter six. It demonstrates that, in practice, they achieve the performance predicted by the theoretical complexity measures derived in that chapter.
Chapter 4
Sugarscape
4.1 Sugarscape’s Place in ABM
Sugarscape is the simulation that demonstrated how ABM could be applied to the social sciences. It first appeared in the book Growing Artificial Societies [Epstein and Axtell, 1996] and remains influential today. Almost every major simulation toolkit (Swarm, Repast, MASON and NetLogo) [Railsback et al., 2006a, Berryman, 2008, Inchiosa and Parker, 2002] comes with a partial implementation of Sugarscape that demonstrates that toolkit’s approach to simulation. Several concurrency researchers [Lysenko and D’Souza, 2008, D’Souza et al., 2007] have used Sugarscape as a testbed for benchmarking different approaches to parallelising ABMs.
Although the rules of Sugarscape have been defined in [Epstein and Axtell, 1996] there is no general agreement on their exact meaning [Bigbee et al., 2007, Gilbert, 2014]. These difficulties hamper the ability of researchers to:
1. properly compare their approaches;
2. provide complete implementations of Sugarscape;
3. replicate their results.
Originally the rules were stated with an explicit assumption that the underlying implementation would be sequential. Concurrency was simulated by randomising the order in which each rule was applied to the individual agents. Models that follow this regime are termed asynchronous, although sequential is a more accurate term.
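The sketch below makes the distinction between random-order sequential ("asynchronous") updating and synchronous updating concrete. It uses a hypothetical one-dimensional ring of cells and a hypothetical local rule; neither is taken from Sugarscape. In the sequential regime each update immediately sees values already changed earlier in the same step, so the result depends on the visiting order. In the synchronous regime every new value is computed from the previous state (double buffering), so the order is irrelevant. Double buffering alone is only sufficient here because each cell writes solely to its own state. It is not the SU approach developed later in this thesis, and it does not by itself handle interactions where agents compete for or modify shared cells or one another.

```python
import random
from typing import List, Sequence


def rule(left: int, centre: int, right: int) -> int:
    """Hypothetical local rule: new state is the neighbourhood sum mod 10."""
    return (left + centre + right) % 10


def sequential_step(cells: List[int], order: Sequence[int]) -> None:
    """Apply the rule in place, visiting cells in the given order.

    Later updates see values already changed earlier in the same step,
    so different orders can give different results.
    """
    n = len(cells)
    for i in order:
        # cells[i - 1] wraps to the last cell when i == 0 (ring lattice).
        cells[i] = rule(cells[i - 1], cells[i], cells[(i + 1) % n])


def asynchronous_step(cells: List[int], rng: random.Random) -> None:
    """'Asynchronous' (random-order sequential) update: shuffle, then apply."""
    order = list(range(len(cells)))
    rng.shuffle(order)
    sequential_step(cells, order)


def synchronous_step(cells: List[int]) -> List[int]:
    """Synchronous update: every new value is computed from the old state.

    Reading from `cells` and writing to a new list (double buffering)
    makes the result independent of visiting order.
    """
    n = len(cells)
    return [rule(cells[i - 1], cells[i], cells[(i + 1) % n]) for i in range(n)]


if __name__ == "__main__":
    forward = [1, 2, 3, 4, 5]
    sequential_step(forward, [0, 1, 2, 3, 4])
    backward = [1, 2, 3, 4, 5]
    sequential_step(backward, [4, 3, 2, 1, 0])
    print(forward, backward, synchronous_step([1, 2, 3, 4, 5]))
```

Starting from [1, 2, 3, 4, 5], visiting the cells left to right gives [8, 3, 0, 9, 2], while right to left gives [6, 5, 2, 7, 0]. The synchronous step always gives [8, 6, 9, 2, 0].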
Sugarscape is a lattice based simulation, and its original implementation comprised twenty thousand lines of code.