6·3·6 Concurrent Clean

The first annotation included in Concurrent Clean is {Par}²³, which indicates that the evaluation of the annotated expression should be undertaken on a new processing element. This is similar to the par annotation advocated in Chapter Seven but differs in two respects:

• there is no choice about the creation of a new thread; and

• the thread is created on a remote processing element.

An optional processing element allocation may be specified, indicating on which processing element the new thread should be created [Plasmeijer and van Eekelen 1993]. This resembles a restricted version of the parAtPE annotation advocated in Chapter Seven, but differs in that the spark is advisory and may be local in augmented GUM, rather than mandatory and remote as it is in the PABC machine.

The second annotation is {Self}²⁴, which indicates that the annotated expression should be evaluated with other threads in an interleaved manner on the current processing element. This is similar to the idea of families proposed in Chapter Seven although the behaviour of that annotation allows the flexibility of specifying a subset of threads that should be executed fairly rather than the entire pool of runnable threads as occurs with the PABC machine.

²³ In addition to the parallel annotation {Par} in [Plasmeijer and van Eekelen 1993] is the {P} annotation in [Nöcker, Smetsers, van Eekelen, and Plasmeijer 1991; Plasmeijer and van Eekelen 1993], which specifies root normal form rather than normal form; see [Plasmeijer and van Eekelen 1993]. The {P} annotation was written {e} in [van Eekelen, Plasmeijer, and Smetsers 1990] and {|P|} in [Loogen 1999; Plasmeijer et al. 1999].

²⁴ As with {Par} and {P}, {Self} also has a root normal form version: {I} in [Nöcker, Smetsers, van Eekelen, and Plasmeijer 1991; Plasmeijer and van Eekelen 1993]. This was written {i} in [van Eekelen, Plasmeijer, and Smetsers 1990] and {|I|} in [Loogen 1999; Plasmeijer et al. 1999].

Other than the {Par AT n} annotation, which explicitly creates a new thread on a remote processing element, there are no mechanisms in the PABC implementations to distribute load across the multicomputer. There is no load distribution process within the Concurrent Clean runtime system.

6·3·7 Partridge’s Contribution

The scheme proposed in [Partridge 1991 and 1992a] is quite elaborate. A thread’s priority is specified as a triple <p, specneed, specspec>, where p represents the thread’s actual priority for scheduling purposes and is a non-positive integer between a maximum value of 0 and a minimum value, minpriority. The irrelevant priority equates to (minpriority - 1). The specneed (either 0 or 1) and specspec (a non-negative number at least as large as specneed) values are used in the calculation of the priority of spawned threads. If the executing thread is mandatory, an annotated strict sub-expression results in the creation of a thread with a priority of 0, while an annotated non-strict sub-expression results in the creation of a thread with a priority of –1. If the executing thread is speculative with priority p, an annotated strict sub-expression results in the creation of a thread with a priority of (p - specneed), while an annotated non-strict sub-expression results in the creation of a thread with a priority of (p - specspec).
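The spawning rule just described can be summarised in a few lines. The following Python sketch is illustrative only: the names `Thread`, `child_priority` and `MIN_PRIORITY`, the concrete value chosen for the minimum priority, the test `p == 0` for a mandatory thread, and the clamping of results at the irrelevant priority are all assumptions made for this example rather than details taken from [Partridge 1991 and 1992a].

```python
from dataclasses import dataclass

MIN_PRIORITY = -10             # hypothetical value for minpriority
IRRELEVANT = MIN_PRIORITY - 1  # the irrelevant priority, (minpriority - 1)


@dataclass
class Thread:
    p: int         # actual scheduling priority; 0 is assumed to mean mandatory
    specneed: int  # 0 or 1: penalty for an annotated strict sub-expression
    specspec: int  # penalty for an annotated non-strict sub-expression (>= specneed)


def child_priority(parent: Thread, strict: bool) -> int:
    """Priority of the thread spawned by `parent` for an annotated sub-expression."""
    if parent.p == 0:
        # Mandatory parent: strict children are mandatory, others are speculative.
        return 0 if strict else -1
    penalty = parent.specneed if strict else parent.specspec
    # Assumption: results below the minimum are clamped to the irrelevant
    # priority; the text does not say what happens in this case.
    return max(parent.p - penalty, IRRELEVANT)
```

With specneed = specspec = 0, every speculative descendant of a mandatory thread keeps priority –1 in this model, so only two levels of priority are ever used.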

This is a very flexible but somewhat complicated scheme. The overhead of storing the triple in all threads must be questioned since the specneed and specspec values are system-wide. When specneed and specspec are 0 this scheme becomes Black–White (see Section 3·3·3·1). When specneed is 1 this scheme approximates Levels of Speculation (see Section 3·3·3·2), although Partridge’s annotations cannot textually specify a priority.

Chapter Six: Related Work

Each thread maintains a vector of priorities. This is similar to the scheme presented in this thesis, where each TSO maintains a list of priority and TSO global address pairs. If a parent thread discovers that a child thread is irrelevant, it replaces its priority in the child thread’s priority vector with minpriority. Parent threads contain slots holding the identity of a child thread, in a way analogous to the storage of a list of child threads by a thread in the system presented here.
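A minimal sketch of the priority vector described above is given here in Python. It is not Partridge’s implementation: the class and method names and the value of `MIN_PRIORITY` are invented for illustration, and in particular the rule that a thread runs at the highest priority demanded by any of its parents is an assumption, since the text does not state how the entries of the vector are combined.

```python
MIN_PRIORITY = -10  # hypothetical value for minpriority


class ChildThread:
    def __init__(self):
        # One entry per parent: parent identity -> priority that parent demands.
        self.priorities = {}

    def set_priority(self, parent_id, priority):
        self.priorities[parent_id] = priority

    def mark_irrelevant_for(self, parent_id):
        # A parent that no longer needs the child overwrites only its own entry.
        self.priorities[parent_id] = MIN_PRIORITY

    def effective_priority(self):
        # Assumption: the highest (closest to 0) demanded priority wins.
        return max(self.priorities.values(), default=MIN_PRIORITY)
```

Under this assumption a child shared by two parents keeps running at the remaining parent’s priority when the other parent declares it irrelevant.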

The priority management system combines the notification and evaluate-and-die thread synchronisation mechanisms. The priority maintenance scheme potentially suffers from the chasing problem (where the runtime system attempts to modify the priority of threads that subsequently divide creating further threads to be modified). This problem does not occur with the scheme presented in Chapters Seven and Eight.

To implement priority change and reduction of the graph, five messages are utilised (each is compared to the scheme presented in Chapter Eight):

• the creation of a child thread requires a request message to be sent from the parent to the child (whereas in this implementation a CHILD message is sent from the child to the parent);

• the termination of a child thread requires a kill message from the parent to the child (similar to the sending of a PRIORITY message);

• child thread termination is acknowledged with a killack message from the child to the parent (similar to the sending of a TSO_DEATH message);

• the priority of a thread is adjusted via an adjustpriority message from the parent to the child (similar to the sending of a PRIORITY message); and

• results are returned from the child to the parent through a result message (the graph is updated and no message is required in the implementation presented here).

6·3·8 GRIP, the Spineless-Tagless G-Machine, GUM, and GPH

GRIP [Peyton Jones et al. 1989; Hammond and Peyton Jones 1990; Mattson 1993a and 1993b; Akerholt et al. 1993], when based on the Spineless Tagless G-Machine [Peyton Jones 1992 and unpub.], and GUM possess one parallel-related annotation: par. The addition of this construct to Haskell results in the GPH language [Trinder et al. unpub.]. More information on GUM and GPH may be found in Chapter Five, where these have already been discussed.

In addition to the par and seq annotations, GHC has been written to accept a third

annotation in extended Haskell programs: fork [AQUA 1995; Peyton Jones et al.

is scheduled fairly with all other threads [AQUA 1996]. This is the basis for Concurrent Haskell [Peyton Jones et al. unpub.] (in contrast to Glasgow Parallel Haskell) and is not relevant to this thesis.

6·3·9 Mattson’s Contribution

Mattson [Mattson 1993a] uses annotations to introduce parallelism. These annotations may be accompanied by a compiler pragma (expressed as a comment) that specifies the speculative priority of the annotated expression (using the Percentiles scheme). Mattson observes that the placement of annotations can alter the results from positive to negative. Threads with a priority below a certain threshold (10%) are discarded. As in the scheme presented in this thesis, no direct control over the depth of speculation is provided.

Scheduling is pre-emptive to ensure that a thread’s priority is up-to-date. When a task exhausts its runnable mandatory threads, it looks in its speculative thread pool and executes the speculative thread of highest priority. No examination of the speculative thread pools of other processing elements occurs, even if a higher priority speculative thread exists on another processing element. This is intended to minimise the impact on other processing elements, but it is questionable since the current processing element is executing speculative work and the remote processing element may be doing so as well. In Chapters Seven and Eight a scheme for identifying which processing elements may be interrupted and which may not is presented to improve upon Mattson’s algorithm.
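The local scheduling policy attributed to Mattson above can be sketched as follows. This Python model is a simplification written for this discussion, not Mattson’s code: the class `ProcessingElement` and its methods are hypothetical, priorities are modelled as integer percentages, discarding is assumed to happen when a thread is added to the pool, mandatory threads are assumed to run in FIFO order, and pre-emption is not modelled.

```python
import heapq
from collections import deque

DISCARD_THRESHOLD = 10  # percent; threads below this priority are discarded


class ProcessingElement:
    def __init__(self):
        self.mandatory = deque()  # runnable mandatory threads (FIFO is assumed)
        self.speculative = []     # heap of (-priority, name): highest priority first

    def add_mandatory(self, name):
        self.mandatory.append(name)

    def add_speculative(self, name, priority):
        """Add a speculative thread; returns False if it is discarded."""
        if priority < DISCARD_THRESHOLD:
            return False
        heapq.heappush(self.speculative, (-priority, name))
        return True

    def next_thread(self):
        # Mandatory work always comes first.
        if self.mandatory:
            return self.mandatory.popleft()
        # Otherwise run the best local speculative thread.
        if self.speculative:
            return heapq.heappop(self.speculative)[1]
        # The speculative pools of other processing elements are never
        # consulted, so this processing element simply idles.
        return None
```

The final branch is the point criticised in the text: a processing element with nothing local to run stays idle even when a remote pool holds a high priority speculative thread.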

Irrelevant threads are killed by the garbage collector, as is done by Partridge; however, the reclamation of the space occupied by a speculative thread is discretionary. Mattson implements deferred updates, where a speculative thread does not overwrite its closure with a result but instead overwrites it with a deferred update closure (grey hole) consisting of a pointer to the result and a pointer to the unevaluated expression. When another speculative thread encounters the deferred update closure it extracts the result; when a mandatory thread encounters the deferred update closure it replaces it with the result. If the heap is nearly full when garbage collection occurs, the grey hole can be reverted to the original graph (if this is smaller than the resultant graph).
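The life cycle of a grey hole can be illustrated with a small model. The Python sketch below is an assumption-laden illustration rather than Mattson’s implementation: the names `Thunk`, `GreyHole` and `Heap` are hypothetical, the heap is a plain dictionary, and the size comparison between the original and the resultant graph is omitted, so reversion is unconditional here.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Thunk:
    compute: Callable[[], Any]  # the unevaluated expression


@dataclass
class GreyHole:
    result: Any      # pointer to the result
    original: Thunk  # pointer to the unevaluated expression


class Heap:
    def __init__(self):
        self.cells = {}

    def speculative_eval(self, addr):
        cell = self.cells[addr]
        if isinstance(cell, Thunk):
            # Deferred update: keep the original expression beside the result.
            cell = GreyHole(cell.compute(), cell)
            self.cells[addr] = cell
        # A speculative reader only extracts the result from a grey hole.
        return cell.result if isinstance(cell, GreyHole) else cell

    def mandatory_eval(self, addr):
        cell = self.cells[addr]
        if isinstance(cell, GreyHole):
            self.cells[addr] = cell.result     # commit the deferred update
        elif isinstance(cell, Thunk):
            self.cells[addr] = cell.compute()  # ordinary update
        return self.cells[addr]

    def revert_grey_holes(self):
        # Stand-in for the collector's action when the heap is nearly full.
        for addr, cell in list(self.cells.items()):
            if isinstance(cell, GreyHole):
                self.cells[addr] = cell.original
```

Because the grey hole retains the unevaluated expression, speculative work can be thrown away to recover space without losing the ability to recompute the value later.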

Mattson observes that few approaches to speculative thread scheduling have been developed:


• Hudak and Keller [Hudak and Keller 1982] give all speculative threads the same priority (which is the approach taken in the original GUM runtime system);

• Burton [Burton 1985b] fixes a thread’s priority when the thread is created;

• Partridge [Partridge 1991] allows increase and decrease of priorities; and

• Osborne [Osborne 1989] (claimed by Mattson to be the author of the only actual implementation prior to Mattson’s) argues that priorities should be relative to the parental context and that they can be increased/decreased (which is the approach used here).

The implementation of Miller and Epstein [Miller and Epstein 1989], in which thread priorities may increase (and may also effectively be made irrelevant), should be added to this list, as should the GHC compiler (in which speculative threads have no priority) [AQUA 1996], Eden (in which all speculative threads compete with mandatory threads) [Breitinger et al. 1998; Peña and Rubio 2001; Hernández et al. 2000], and the GranSim simulator developed by Loidl [Loidl 1998]. The list can now be further extended with the implementation described here.

In document Effective run time management of parallelism in a functional programming context (Page 145-149)