Closely allied with the location policy, this policy dictates when the thread should be transferred. If a task selected to receive a thread subsequently becomes overloaded when the thread arrives, the thread must then be transferred elsewhere if possible. The transfer policy [Sinha 1997] clarifies when the transfer will occur and what action should take place (if any) upon the transfer of the thread.
Usually a candidate thread will be transferred as soon as possible once the participating tasks and the thread itself have been identified. If the location policy is too sensitive, however, the donor task may find itself underloaded after the thread departs, or the recipient task may find itself overloaded after it arrives. A recipient that finds itself overloaded may simply return the thread, or it may retain it to be the subject of another round of load distribution activity.
To mitigate the fragility of the location policy, two thresholds may be utilised: the first, a ‘high-water mark’, indicates whether the task is overloaded; the second, a ‘low-water mark’, indicates whether the task is underloaded. Providing separation between the two avoids the consequential effects of thread transfer described above. An alternative to such a dual-threshold location policy is to include a ‘cool-off period’ in the transfer policy, so that an agreed transfer may be cancelled if the state of either participant changes in a way that would warrant reversing the transfer. The difficulty is in the selection of the delay. If reversal is uncommon, the delay hinders performance, and a large delay value exacerbates the problem.
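The two mechanisms just described can be illustrated with a minimal sketch. It assumes, purely for illustration, that load is a simple integer count of threads and that the marks sit at 8 and 2; the names `HIGH_WATER`, `LOW_WATER`, `classify`, `should_transfer` and `transfer_with_cool_off` are hypothetical and are not taken from the thesis.

```python
from typing import Callable, Tuple

HIGH_WATER = 8  # assumed value: a load above this counts as overloaded
LOW_WATER = 2   # assumed value: a load below this counts as underloaded


def classify(load: int) -> str:
    """Classify a task's load against the two thresholds."""
    if load > HIGH_WATER:
        return "over"
    if load < LOW_WATER:
        return "under"
    # The gap between the marks absorbs small fluctuations in load.
    return "normal"


def should_transfer(donor_load: int, recipient_load: int) -> bool:
    """Dual-threshold rule: move a thread only from an overloaded donor
    to an underloaded recipient."""
    return classify(donor_load) == "over" and classify(recipient_load) == "under"


def transfer_with_cool_off(
    sample_loads: Callable[[int], Tuple[int, int]], delay_ticks: int
) -> bool:
    """Cool-off alternative: agree the transfer at tick 0, then confirm it
    only if it is still warranted once the delay has expired.

    sample_loads(tick) is a hypothetical probe returning
    (donor_load, recipient_load) at the given tick.
    """
    if not should_transfer(*sample_loads(0)):
        return False
    return should_transfer(*sample_loads(delay_ticks))
```

With these assumed marks, moving one thread from a donor at load 9 to a recipient at load 1 leaves both in the "normal" band, so neither has cause to reverse the transfer. The cool-off variant reaches a similar outcome by waiting, which is where the cost of a poorly chosen delay appears.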
4·5 Effectiveness
Loosely-coupled multicomputers are unique in their need for communication between the processing elements. The lack of shared memory necessitates the transmission of threads and data throughout the network of processing elements. The amount of transmission is determined largely by the topology (a connectivity ratio approximating 1:1 or smaller indicates that in addition to performing evaluation of the program, processing elements must also act as routers) and information and initiation policies. Whatever the requirements, communication should be minimised. In a speculative evaluation context this means that:
• speculative threads should be transmitted as little as possible;
• when speculative data are transmitted they should be transmitted only as the result of a direct request;
• if a task is executing a speculative thread of low importance and more important work is available elsewhere, only one such thread should be transferred;
• transfers between donor and recipient processing elements should be as direct as possible; and
• tasks executing mandatory threads should not be delayed by the overheads of load distribution-related communication.
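One possible reading of the second and third points above is that a donor never pushes speculative work unsolicited, and that it answers a direct request with at most one thread. The sketch below illustrates that reading only. The names `SpecThread` and `answer_request` are hypothetical, and the convention that a larger `priority` means more important work is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SpecThread:
    """Hypothetical record for a speculative thread awaiting execution."""
    name: str
    priority: int  # assumption: a larger value means more important work


def answer_request(pool: List[SpecThread], requester_priority: int) -> Optional[SpecThread]:
    """Reply to a direct request with at most one thread.

    A thread is handed over only if it is more important than the work
    the requester is currently executing; otherwise nothing is sent.
    """
    better = [t for t in pool if t.priority > requester_priority]
    if not better:
        return None
    chosen = max(better, key=lambda t: t.priority)
    pool.remove(chosen)  # the donor gives up exactly one thread
    return chosen
```

For example, a donor holding threads of priority 3, 7 and 5 that receives a request from a task running work of priority 4 would hand over only the priority-7 thread and keep the other two.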
This final observation is crucial. Mandatory threads are known to contribute to the program’s outcome; the applicability of speculative threads is in doubt, since they may become mandatory or they may become irrelevant. Whenever the execution of a mandatory thread is delayed because the processing element is routing speculative messages, the overall execution time is affected. Multicomputers are particularly susceptible to these delays; in order to maintain utilisation of a processing element which has no work to perform, a thread must be obtained from another processing element in the network.
Network communication and routing take time. It will be faster to communicate with a neighbour than with a neighbour’s neighbour. Further, by the time a neighbour’s neighbour receives a load status message it may be out of date. For this reason (and to attempt to restrict the amount of load distribution-related network traffic) the multicomputer should be segmented into (overlapping) neighbourhoods with load distribution attempted only between processing elements in the same neighbourhood.
To be effective in the context of the speculative evaluation of a general-purpose program on a multicomputer, a global thread scheduling algorithm should:
• employ load distribution to increase performance;
• embrace the aims of load sharing in preference to ‘all-out’ load balancing to reduce complexity and processing overhead;
• react dynamically to best cope with situations as they arise;
• function in a decentralised manner to minimise complexity and communication, and to maximise fault tolerance and scalability;
• behave in a cooperative manner to ensure thread transfers benefit both donor and recipient;
• maintain a load estimation policy that stipulates a simple load metric based on the highest priority of available threads;
• incorporate an information policy that segments the multicomputer into overlapping neighbourhoods of processing elements to reduce load distribution-related communication and processing, and that advertises the load metric on demand when the task is particularly busy and with a zero threshold at other times to firstly decrease interruption at times of high load and secondly to ensure timely accuracy of load information at other times;
• adopt an adaptive initiation policy to exhibit the best performance of sender- and receiver-initiated transfers and avoid the worst of each;
• use the default “best” location policy;
• implement a selection policy to transfer threads only if they are yet to begin execution (to minimise complexity and communication time), and to exclude threads known to be small in granularity, large in size, or of high locality of reference;
• facilitate flexible thread transfers through an unrestricted migration-limiting policy; and
• allow fast turn-around time for thread requests by using an immediate transfer policy.
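Three of the simpler requirements in the list above (the load metric, the selection policy and the segmentation into neighbourhoods) can be sketched as follows. This is an illustration under stated assumptions rather than the algorithm of the thesis: the `Thread` record and its flags, the function names, the convention that a larger `priority` is more important, and the choice of "self plus directly linked processing elements" as a neighbourhood are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Set


@dataclass
class Thread:
    """Hypothetical thread record; the flags mirror the exclusion criteria."""
    priority: int               # assumption: larger means more important
    started: bool = False       # has the thread begun execution?
    small_grain: bool = False   # known to be small in granularity
    large: bool = False         # known to be large in size
    high_locality: bool = False # known to have high locality of reference


def load_metric(threads: Iterable[Thread]) -> Optional[int]:
    """Load estimate: the highest priority among the available threads,
    or None when the task has no work at all."""
    return max((t.priority for t in threads), default=None)


def transferable(threads: Iterable[Thread]) -> List[Thread]:
    """Selection policy: only threads yet to begin execution, excluding
    those flagged by any of the three exclusion criteria."""
    return [
        t for t in threads
        if not (t.started or t.small_grain or t.large or t.high_locality)
    ]


def neighbourhoods(links: Dict[int, Set[int]]) -> Dict[int, Set[int]]:
    """Assumed segmentation: each processing element's neighbourhood is
    itself plus the elements it is directly linked to, so adjacent
    neighbourhoods overlap."""
    return {pe: {pe} | set(nbrs) for pe, nbrs in links.items()}
```

On a ring of four processing elements, for instance, elements 0 and 1 would have neighbourhoods {0, 1, 3} and {0, 1, 2}; the shared members are what allow work to spread across the machine even though each exchange stays local.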
The spark percolation load distribution algorithm developed for this thesis is presented in Chapter Nine.
Chapter Five: Haskell & GHC