Chapter 4 Parallelisation by Speculative Chains

4.2 Improving Speculative Moves

The presence of Ms moves has a detrimental effect when using speculative moves because they impair thread utilisation, as shown in fig. 4.3 a). In each program cycle involving an Ms move, threads performing an Mf move are left idle whilst they wait for the slow move to complete. The naive implementation of speculative moves presented earlier (fig. 3.2) guarantees that all speculative moves will be employed at each loop round the program cycle by synchronising the threads (waiting for all move calculations to complete) before starting the next set of move proposals. The threads are always used for each step, delaying the next step if just one thread is busy working (whether the results of that thread will be used or not). The alternative is to use the threads lazily: a thread will only be used for a step of the program cycle if that thread is available when it is needed.

Figure 4.3: Each row represents a thread, vertical lines indicate synchronisation points between threads, shapes represent work being done. Time passes from left to right. a) The presence of long-running moves reduces the benefits of the ‘naive’ speculative move implementation. b) By using threads only if they are not already busy we mitigate the adverse effect of longer-than-normal moves. (Legend: potential state-change; propose a new state; prior; likelihood; acceptance test (rejected); acceptance test (accepted).)
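As a rough model of the synchronised scheme described above, the following Python sketch computes how long one step takes and how many MCMC iterations it yields. The function name `naive_step` and the per-thread `durations` and `accepted` lists are assumptions introduced for illustration; they are not taken from the implementation in fig. 3.2. The sketch also assumes that a step contributes every move up to and including the first accepted one.

```python
def naive_step(durations, accepted):
    """Model one synchronised step of the naive scheme.

    durations[i] -- hypothetical time thread i needs for its move
    accepted[i]  -- whether the move on thread i passes its acceptance test
    Returns (iterations, step_time).
    """
    # Every thread is awaited, so the slowest move sets the step time.
    step_time = max(durations)
    # Moves count up to and including the first accepted one; any later
    # speculative results are discarded. If none is accepted, all count.
    iterations = len(accepted)
    for i, ok in enumerate(accepted):
        if ok:
            iterations = i + 1
            break
    return iterations, step_time


# Three fast moves and one slow move; the third move is accepted.
print(naive_step([1, 1, 1, 5], [False, False, True, False]))  # (3, 5)
```

In this example the step lasts five time units even though the slow move on the fourth thread contributes nothing to the chain.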

Under this revised implementation, if a proposed move is rejected we wait for the speculative move(s) to make their decisions and act accordingly (as before). However, when a proposed move is accepted, any additional speculative move threads that are still active are flagged as cancelled, and the primary thread immediately begins work on the follow-up move. When a new speculative move needs to be processed, any threads that are flagged as ‘cancelled’ but have not yet ceased processing are ignored, and for that program cycle fewer speculative moves than normal are used. Speculative moves are considered only if there is a thread ready and waiting to be used; a speculative move will not be employed if it delays work on moves that are guaranteed to count towards the total number of MCMC iterations performed. Since the maximum number of speculative moves may not be utilised if one or more threads are busy, the average number of ‘normal’ MCMC iterations performed in each step (loop round the program cycle) is reduced. More steps will be required to obtain the same number of MCMC iterations; however, the average time per step will decrease, as it is no longer necessary to wait for invalidated Ms moves to complete their (unnecessary) processing. The net result is an increased number of normal MCMC iterations performed per unit time.
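The lazy scheme can be sketched as a small timing model in Python. This is an illustration under stated assumptions rather than the author's implementation: the function `lazy_step` and its arguments (`now`, `durations`, `accepted`, `busy_until`) are hypothetical names, moves are assigned to idle threads in index order, and cancelled moves are assumed to run to completion.

```python
def lazy_step(now, durations, accepted, busy_until):
    """Model one step of lazy thread use, starting at time `now`.

    durations[i]  -- hypothetical time thread i would need for its move
    accepted[i]   -- whether that move would pass its acceptance test
    busy_until[i] -- time at which thread i finishes any cancelled work
    Returns (iterations, finish_time, new_busy_until).
    """
    # Only threads that are idle right now are given a move.
    idle = [i for i, t in enumerate(busy_until) if t <= now]
    new_busy = list(busy_until)
    iterations, finish, cancelled = 0, now, False
    for i in idle:
        if cancelled:
            # Flagged as cancelled: not waited for, but stays busy.
            new_busy[i] = now + durations[i]
            continue
        iterations += 1
        finish = max(finish, now + durations[i])
        cancelled = accepted[i]
    return iterations, finish, new_busy


# Step 1: the third move is accepted while the fourth (slow) one is running.
n1, t1, busy = lazy_step(0, [1, 1, 1, 5], [False, False, True, False], [0, 0, 0, 0])
print(n1, t1, busy)  # 3 1 [0, 0, 0, 5]
# Step 2: the fourth thread is still busy, so only three threads are used.
n2, t2, busy = lazy_step(t1, [1, 1, 1, 1], [False] * 4, busy)
print(n2, t2, busy)  # 3 2 [0, 0, 0, 5]
```

In this toy run the first step ends after one time unit instead of five, at the cost of running the second step with one thread fewer.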

Fig. 4.3 b) shows lazy thread use in action. In the first step shown, the fourth thread is taking longer than normal to complete: either the move being considered is from Ms, or processing was delayed by resource conflicts (e.g. a background process was temporarily allocated control of a processor core). Since the move considered on the third thread has been accepted, there is no need to wait for the fourth thread to complete, so it is flagged as cancelled. Once the move from the third thread has been applied, the next batch of moves is considered on the three available threads. When the fourth thread finally does complete processing, it discards its results (they are no longer relevant) and reverts to its idle state to await the next batch of moves to be considered.

It would be preferable for the fourth thread to simply cease processing immediately upon the determination that its results are irrelevant; that way, all threads would be available for the next step in the program cycle. Unfortunately this is not always achievable. Killing and restarting a thread in order to stop work on a move is not an option, as it does not allow the thread to release any resources it was using, potentially causing memory leaks and/or deadlocks (if the thread holds a mutex lock that it has not yet released at the time of the thread’s demise). ‘Terminating’ a cancelled move instead requires the ‘cancelled’ flag for that thread to be polled throughout each move’s time-consuming calculations, skipping the remaining calculations if the flag is set. Since this flag is shared between threads, access to it must be synchronised (reads/writes controlled by a pthread mutex), adding overhead even if the move is never cancelled. Frequent polling allows for a faster response to the cancellation flag being set, at the expense of increased overhead in repeatedly checking the mutex-protected flag and increased complexity in the move calculations in order to enable this polling to take place. The choice of whether it is worth enabling the premature termination of cancelled moves needs to be made on a case-by-case basis, taking into account the added difficulty of implementation, the frequency with which moves will need to be terminated, and whether the move can be terminated fast enough to make the added overhead worthwhile.
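The polling approach might look like the following minimal Python sketch. Python's `threading.Lock` stands in here for the pthread mutex mentioned above; the names `CancelFlag`, `run_move`, `poll_every` and `demo` are hypothetical, and the work is split into abstract "chunks" purely for illustration. To keep the example deterministic, the cancellation is triggered from inside the work callback rather than from a second thread.

```python
import threading


class CancelFlag:
    """A boolean shared between threads, guarded by a mutex."""

    def __init__(self):
        self._lock = threading.Lock()
        self._cancelled = False

    def cancel(self):
        with self._lock:
            self._cancelled = True

    def is_cancelled(self):
        with self._lock:
            return self._cancelled


def run_move(n_chunks, flag, poll_every, work):
    """Do up to n_chunks units of work, polling the flag every poll_every chunks.

    Returns the number of chunks actually processed.
    """
    done = 0
    for k in range(n_chunks):
        # Each poll costs one lock acquisition, even if never cancelled.
        if k % poll_every == 0 and flag.is_cancelled():
            break  # skip the remaining calculations
        work(k)
        done += 1
    return done


def demo(poll_every):
    flag = CancelFlag()
    # Stand-in for another thread cancelling the move during chunk 2.
    work = lambda k: flag.cancel() if k == 2 else None
    return run_move(10, flag, poll_every, work)


print(demo(poll_every=1))  # 3: stops at the very next chunk
print(demo(poll_every=4))  # 4: one extra chunk is wasted before the next poll
```

Polling on every chunk reacts sooner but takes the lock ten times in an uncancelled run of ten chunks, whereas polling every fourth chunk takes it only three times. This is the trade-off between responsiveness and overhead described above.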

To assist in this decision, consider the case where no moves can be terminated prematurely. It is possible for all threads performing speculative moves to become ‘occupied’ by cancelled Ms moves, leaving just the primary thread to perform work and resulting in near-sequential runtime. This will only become an issue if Ms moves are proposed faster than they can be cleared from the threads performing speculative moves. A thread that is performing an Ms move will take the same time as τs/τf fast (Mf) moves to complete the Ms move. For the remaining threads to be kept clear of another Ms move whilst the first is still being processed, the next Ms move should not be proposed for τs/τf iterations (the time it takes a slow move to process divided by the time it takes a fast move to process). In other words, the probability of proposing an Ms move on any of the n threads should be less than τf/τs, giving

$$1 - q_f^{\,n} < \frac{\tau_f}{\tau_s} \quad\Longrightarrow\quad q_f > \sqrt[n]{1 - \frac{\tau_f}{\tau_s}} \tag{4.3}$$

where q_f is the probability that an arbitrary move belongs to Mf as opposed to Ms. This means that fewer than roughly 10% of moves can be slow (from Ms) when a slow move takes 5 times as long as a fast one, or only about 0.5% of moves if slow moves take the time of 100 fast moves (these figures correspond to n = 2). Implementers are therefore encouraged to accommodate the early cessation of processing Ms moves as they design the move proposal and prior/likelihood implementation.
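The bound in eq. (4.3), q_f > (1 − τf/τs)^(1/n), can be evaluated directly. The short Python sketch below does so; the function names are hypothetical, and the choice n = 2 is an assumption made because it reproduces the percentages quoted above.

```python
def min_fast_probability(n, tau_f, tau_s):
    """Lower bound on q_f from eq. (4.3): the n-th root of 1 - tau_f/tau_s."""
    return (1.0 - tau_f / tau_s) ** (1.0 / n)


def max_slow_fraction(n, tau_f, tau_s):
    """Largest fraction of moves that may be slow, 1 - q_f, under eq. (4.3)."""
    return 1.0 - min_fast_probability(n, tau_f, tau_s)


# Slow move 5 times as long as a fast one, two threads.
print(round(max_slow_fraction(2, 1, 5), 4))    # 0.1056
# Slow move 100 times as long as a fast one, two threads.
print(round(max_slow_fraction(2, 1, 100), 4))  # 0.005
```

Because the n-th root of a number between 0 and 1 grows towards 1 as n increases, adding threads tightens the bound: the tolerable fraction of slow moves shrinks further.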

In document Parallel Markov Chain Monte Carlo (Page 103-107)