Chapter 6 Conclusions and Future Work
6.4 Limitations and Future Work
Although the results from chapters 3 to 5 show clear benefits from the various methods, the exact improvements obtained will depend on the specific characteristics of the application to which they are applied. Potential factors that may impact the final runtime include:
1. The time per iteration
2. The time per phase of each iteration (i.e. time to calculate the prior, time to calculate the likelihood)
3. The move rejection rate
4. The proportion of different types of move (Ms, Mf, Mg, Ml, etc.).
5. The performance characteristics of different parts of the program, dependent on the hardware, operating system, and compiler/compiler settings utilised.
6. Caching issues: with SMP parallelisation methods, conflicting simultaneous memory access requests could cause problems, potentially including memory thrashing.
Further exploration of the consequences of changing these factors on an MCMC program’s runtime is desirable, particularly with the aim of creating or improving means of predicting runtimes (either through formulae or fast simulators such as that of section 4.3.1). To date all such factors have been treated as constant over a simulation run; more accurate runtime predictions may be possible if the consequences of varying these factors at runtime are examined. For example, if starting with an underpopulated initial state, the time per iteration would likely increase as more features are located and the size of the model (and statespace) increases. The move rejection rates are also very likely to change as the simulation nears convergence (a ‘near perfect’ state is going to reject more moves than a random initial state), so the benefits of the speculative methods change throughout the simulation.
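As an illustration of how a time-varying rejection rate could feed into such a prediction, the following Python sketch estimates the runtime of a speculative-move scheme under a deliberately simplified model. It assumes that each of the n moves evaluated concurrently is rejected independently with probability p_r, so that one speculative step advances the chain by 1 + p_r + ... + p_r^(n-1) iterations on average, and that overheads such as state cloning and thread synchronisation are negligible. The function names and the linear rejection-rate schedule are hypothetical and are not taken from the pMCMC framework.

```python
from typing import Callable


def expected_iterations_per_step(p_reject: float, n_threads: int) -> float:
    """Expected chain iterations consumed by one speculative step.

    Simplified model (an assumption): each speculated move is rejected
    independently with probability p_reject, and the step ends at the
    first accepted move or after n_threads rejections.
    """
    if not 0.0 <= p_reject <= 1.0:
        raise ValueError("p_reject must lie in [0, 1]")
    if n_threads < 1:
        raise ValueError("n_threads must be at least 1")
    # Geometric series 1 + p + p^2 + ... + p^(n-1).
    return sum(p_reject ** k for k in range(n_threads))


def predicted_runtime(
    total_iterations: int,
    time_per_iteration: float,
    n_threads: int,
    rejection_schedule: Callable[[int], float],
) -> float:
    """Predicted wall-clock time when the rejection rate varies over the run.

    rejection_schedule(i) returns the assumed rejection rate at iteration i.
    Each speculative step is charged one time_per_iteration, because the
    speculated moves are assumed to be evaluated concurrently.
    """
    elapsed = 0.0
    done = 0.0
    while done < total_iterations:
        p_reject = rejection_schedule(int(done))
        done += expected_iterations_per_step(p_reject, n_threads)
        elapsed += time_per_iteration
    return elapsed


# Hypothetical schedule: rejection rate rises linearly from 0.5 to 0.9
# as the simulated chain approaches convergence.
TOTAL = 100_000
rising = lambda i: 0.5 + 0.4 * i / TOTAL

sequential = predicted_runtime(TOTAL, 1e-3, 1, rising)
speculative = predicted_runtime(TOTAL, 1e-3, 4, rising)
print(f"predicted speedup with 4 threads: {sequential / speculative:.2f}")
```

Replacing the linear schedule with rejection rates measured from a real run would show how far a constant-rate prediction departs from one that tracks the changing rate.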
The simulations used to predict results in chapters 3 and 4 can be refined to reflect the implementation more accurately. For example, the simulator from section 4.3.1 should be updated to account for state cloning of non-negligible duration. Predictions for specific applications and platforms would also be enhanced by considering the varying costs of mutex operations over different hardware and software systems.
Another issue not fully addressed is determining the most appropriate size of partitions when using periodic partitioning (section 5.1). The smaller the partitions, the greater the benefits from the parallel processing in Ml phases (assuming sufficient processors). However, a smaller partition size also means that a greater proportion of the image cannot be modified during that Ml phase (as features that intersect with any of the partition boundaries may not be modified, and no modifications can be proposed that would cause a feature to intersect with a partition boundary), potentially delaying convergence of the chain. There is also the option of using speculative moves on the work done in each partition, as an alternative to shrinking the size of each partition. It would therefore be useful to determine the extent to which these concerns and options conflict in some typical MCMC applications, and how to arrive at an optimum compromise.
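The geometric side of this trade-off can be illustrated with a minimal sketch. It assumes, purely for illustration, square partitions of side s and circular features of fixed radius r, neither of which is required by the method of section 5.1. Under those assumptions a feature centre must lie at least r from every edge of its partition for the feature not to intersect a boundary, so the fraction of each partition in which features can be modified is ((s - 2r)/s)^2. The function name and the example sizes are hypothetical.

```python
def modifiable_fraction(partition_side: float, feature_radius: float) -> float:
    """Fraction of a square partition in which a feature centre may lie
    without the feature touching a partition boundary.

    Assumes square partitions and circular features of fixed radius
    (illustrative assumptions only).
    """
    if partition_side <= 0:
        raise ValueError("partition_side must be positive")
    if feature_radius < 0:
        raise ValueError("feature_radius must be non-negative")
    inner_side = partition_side - 2.0 * feature_radius
    if inner_side <= 0.0:
        # The partition is too small to contain any whole feature.
        return 0.0
    return (inner_side / partition_side) ** 2


# Halving the partition side repeatedly for a hypothetical feature radius of 8.
for side in (256, 128, 64, 32):
    print(side, round(modifiable_fraction(side, 8), 3))
```

The modifiable fraction falls increasingly quickly as the partition side approaches the feature diameter, which is one way of quantifying the cost that must be weighed against the additional parallelism.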
One matter particularly relating to the image splitting methods of chapter 5 is that of load balancing. There is no guarantee that subimages will require equal amounts of processor time to process, in which case the order in which subimages are scheduled for processing may greatly affect the final runtime. Load balancing is also a concern for all methods in heterogeneous multiprocessor environments (such as in clusters with processors of different capabilities). Task scheduling in such an environment is an active research area, one from which both the implementation and predictions of all methods from chapter 5 could benefit.
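The sensitivity to scheduling order can be illustrated with a small sketch. It assumes that the processing cost of each subimage is known or can be estimated in advance, which may not hold in practice, and that all processors are identical. It uses greedy list scheduling (each subimage goes to the processor that becomes free first) as a stand-in for whatever scheduler an implementation actually employs. Sorting the subimages by decreasing cost before dispatch is the well-known longest-processing-time-first heuristic. All names and cost values are hypothetical.

```python
import heapq
from typing import Sequence


def makespan(costs: Sequence[float], n_processors: int) -> float:
    """Finish time when subimages are dispatched, in the given order,
    to whichever processor becomes free first."""
    if n_processors < 1:
        raise ValueError("n_processors must be at least 1")
    # Min-heap of the times at which each processor next becomes free.
    free_at = [0.0] * n_processors
    heapq.heapify(free_at)
    for cost in costs:
        earliest = heapq.heappop(free_at)
        heapq.heappush(free_at, earliest + cost)
    return max(free_at)


# Hypothetical subimage costs, with the most expensive subimage scheduled last.
costs = [2.0, 2.0, 2.0, 2.0, 8.0]

in_given_order = makespan(costs, 2)
longest_first = makespan(sorted(costs, reverse=True), 2)
print(in_given_order, longest_first)
```

With these hypothetical costs and two processors, dispatching in the given order finishes after 12 time units, whereas dispatching the longest subimage first finishes after 8. Heterogeneous processors would require each processor's speed to be modelled as well, which this sketch does not attempt.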
Since the efforts here have been split between constructing the pMCMC framework, implementing the parallelisation methods, and implementing and fine-tuning the programs for the section 2.5 algorithm, it was not possible to implement and test a wider range of MCMC algorithms and applications. With the pMCMC framework now providing parallelisation and support code, it is hoped that implementers with specific expertise in MCMC algorithms will construct pMCMC applications of greater complexity and scope than was possible for this thesis. It would also be interesting to examine how differing memory footprints and access patterns impact the runtime of the various parallelisation methods. Another area for exploration is determining the extent to which more traditional MCMC programs (integral approximation being the classic example) can benefit from speculative moves and chains.
Finally, there are opportunities for expanding the pMCMC framework. Currently the framework provides Move classes implementing move operations for models consisting only of an unordered collection of independent features. A set of classes to facilitate the use of models with inter-feature relationships (features organised in binary trees, for instance) would be helpful for would-be developers of applications that exhibit non-independent features. Support for distributed execution across mediums other than MPI may be useful, and a more functional user interface would make the end-user simulators more accessible. More important would be in-built support for additional MCMC variants, (MC)³ for instance. Due to the internal design of pMCMC, adding periodic parallelisation-like variants such as (MC)³ should be a relatively easy undertaking.