6·2·8 COBWEB and Norman

The work by Burn and others [Burn 1988a; Bevan, Burn, and Karia 1987] culminated in both hardware and software developments. Hankin, Osmon, and Shute [Hankin, Osmon, and Shute 1985] developed COBWEB (not an acronym), a parallel machine named after the topology formed by its processing elements. The architecture began as an experiment in wafer-scale integration [Boudillet, Gupta, and Winter 1991; Karia 1986; Shute and Osmon 1986]. COBWEB initially implemented a sequential abstract machine, Norman [Hankin et al. 1985], which was later replaced by the Spineless G-machine [Burn 1988a] (see below). The sequential machine was never actually built; it was only simulated.

Only conservative evaluation is employed; there is no speculation in the system, although a needed expression may be evaluated well before its value is required. The amount of evaluation to be undertaken on an expression is specified by evaluation transformers [Burn 1987, 1991a, and 1991b; Howe and Burn 1992; Finne and Burn 1993], which are similar to serial combinators. Each graph node is annotated with an evaluation transformer that dictates how much evaluation should be performed on it. Evaluation transformers are discussed further in Section 6·3·5.
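To make the idea concrete, the Python sketch below models the four evaluators usually given for lists in Burn's work: no evaluation, evaluation to weak head normal form, evaluation of the whole spine, and evaluation of the spine plus every element to weak head normal form. An evaluation transformer for a function maps the amount of evaluation demanded of its result to the amount that may safely be performed on an argument, which is what allows argument evaluation to start early without speculation. The thunk representation, the names `Evaluator`, `Thunk`, `lazy_list` and `force_to`, and the two transformer tables for hypothetical `length` and `sum` functions are illustrative assumptions, not COBWEB's implementation.

```python
from enum import IntEnum


class Evaluator(IntEnum):
    """Hypothetical encoding of the four list evaluators."""
    E0 = 0  # perform no evaluation
    E1 = 1  # evaluate to weak head normal form (the first cons cell or nil)
    E2 = 2  # evaluate the whole spine of the list
    E3 = 3  # evaluate the spine and every element to WHNF


class Thunk:
    """A suspended computation that is evaluated at most once."""

    def __init__(self, compute):
        self._compute = compute
        self._value = None
        self.evaluated = False

    def force(self):
        if not self.evaluated:
            self._value = self._compute()
            self.evaluated = True
            self._compute = None  # drop the suspension once updated
        return self._value


# A lazy list is a Thunk yielding None (nil) or a pair (head_thunk, tail_thunk).
def lazy_list(values, log):
    """Build a lazy list, recording in `log` the index of each element evaluated."""
    def build(i):
        if i == len(values):
            return None

        def element(i=i):
            log.append(i)
            return values[i]

        return (Thunk(element), Thunk(lambda: build(i + 1)))

    return Thunk(lambda: build(0))


def force_to(xs, ev):
    """Apply evaluator `ev` to the lazy list `xs`."""
    if ev == Evaluator.E0:
        return
    cell = xs.force()  # E1: weak head normal form
    if ev == Evaluator.E1:
        return
    while cell is not None:  # E2 and E3: walk the spine
        head, tail = cell
        if ev == Evaluator.E3:
            head.force()  # E3 also evaluates each element
        cell = tail.force()


# Hypothetical evaluation transformers: evaluator demanded of the result
# -> evaluator that may safely be applied to the list argument.
LENGTH_ET = {Evaluator.E0: Evaluator.E0, Evaluator.E1: Evaluator.E2}
SUM_ET = {Evaluator.E0: Evaluator.E0, Evaluator.E1: Evaluator.E3}

log = []
xs = lazy_list([10, 20, 30], log)
force_to(xs, LENGTH_ET[Evaluator.E1])  # spine only: no element evaluated
print(log)  # []
force_to(xs, SUM_ET[Evaluator.E1])     # spine and every element
print(log)  # [0, 1, 2]
```

Because `length` never inspects the elements, its transformer only licenses evaluation of the spine, whereas `sum` needs every element, so an implementation could safely evaluate the whole list ahead of demand.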

A shared thread pool was implemented originally; it was later distributed across the processing elements, and a load-balancing mechanism incorporating migration was added.

6·2·9 Roe’s Contribution

Roe [Roe 1989 and 1991] argues that functional language implementations must be efficient if they are to achieve speed-up on parallel architectures. He considers how the presence of useful parallelism in a program is best indicated, and concludes that strictness analysis is not enough: programmer-inserted annotations are necessary for the best performance. He further states that annotations should ideally not be intrusive. Although in [Roe 1989] he advocates evaluation transformers for their flexibility, he does not use them in [Roe 1991].

In [Roe 1991], Roe experiments with a shared-memory simulator that evaluates programs written in the FLIC language in parallel. FLIC is described in [Peyton Jones 1988]. Parallelism is indicated by the programmer using par and seq annotations, as in the GUM system.

Roe argues that sparks should be lightweight and stored, rather than converted into threads immediately. He also proposes a high watermark for the number of sparks, with additional sparks discarded once this maximum is exceeded. In the same vein, he advocates checking the graph node before a spark is converted into a thread, so that if another thread has already evaluated the node the spark can be discarded without incurring the cost of sparking. He further states that excess parallelism can be detrimental, since resources can be exhausted unnecessarily and scheduling becomes more difficult; parallelism should be limited to the amount needed to keep all processing elements occupied.
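A minimal sketch of a spark pool following Roe's recommendations might look like this: a spark is only a reference to a graph node, sparks beyond a high watermark are dropped, and the node is checked before a spark becomes a thread. As an additional assumption, sparks here are converted only when no thread is runnable. All names (`Node`, `Thread`, `SparkPool`, `schedule`) and the first-in, first-out ordering of the pool are hypothetical choices made for illustration.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical graph node; `evaluated` is set once some thread has reduced it."""
    name: str
    evaluated: bool = False


@dataclass
class Thread:
    node: Node


class SparkPool:
    """A bounded pool of lightweight sparks (each is just a node reference)."""

    def __init__(self, high_watermark):
        self.high_watermark = high_watermark
        self.sparks = deque()
        self.discarded = 0

    def spark(self, node):
        # Sparks are advisory, so any spark beyond the high watermark is dropped.
        if len(self.sparks) >= self.high_watermark:
            self.discarded += 1
            return False
        self.sparks.append(node)
        return True

    def next_thread(self):
        """Turn the oldest still-useful spark into a thread, or return None."""
        while self.sparks:
            node = self.sparks.popleft()
            if node.evaluated:
                # Another thread already evaluated the node: discard the spark
                # without paying for thread creation.
                self.discarded += 1
                continue
            return Thread(node)
        return None


def schedule(runnable, pool):
    """Prefer an existing runnable thread; only then convert a spark."""
    if runnable:
        return runnable.popleft()
    return pool.next_thread()


pool = SparkPool(high_watermark=2)
a, b, c = Node("a"), Node("b"), Node("c")
pool.spark(a)
pool.spark(b)
pool.spark(c)                    # exceeds the watermark: discarded
a.evaluated = True               # `a` was evaluated by some other thread
print(schedule(deque(), pool))   # Thread(node=Node(name='b', evaluated=False))
print(pool.discarded)            # 2
```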

Most of these recommendations are implemented in GUM: sparks and threads have different storage requirements, with sparks represented minimally; delayed sparking is used; no more than 500 sparks occupy each spark pool at any time (subsequent sparks are discarded); the graph node is examined before sparking to determine whether sparking is required (if not, the spark is discarded); and in the extended GUM presented here, no spark is converted into a thread unless either the runnable thread pool is empty or the priority of the highest-priority spark is higher than the highest priority of all the threads.
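The conversion rule of the extended GUM described above can be sketched with two priority queues. This is a hypothetical model, not GUM's run-time system: priorities are plain integers, a larger number means higher priority, and the `Scheduler` class and its methods are invented for illustration. A spark is converted only if no thread is runnable or if its priority strictly exceeds that of every runnable thread.

```python
import heapq
from itertools import count


class Scheduler:
    """Hypothetical model of the spark-to-thread conversion rule.

    Both pools are max-priority queues stored as heaps of
    (-priority, sequence_number, name); the sequence number breaks ties
    in arrival order and avoids comparing names.
    """

    def __init__(self):
        self._seq = count()
        self.threads = []  # runnable threads
        self.sparks = []   # lightweight sparks

    def _push(self, heap, name, priority):
        heapq.heappush(heap, (-priority, next(self._seq), name))

    def add_thread(self, name, priority):
        self._push(self.threads, name, priority)

    def add_spark(self, name, priority):
        self._push(self.sparks, name, priority)

    @staticmethod
    def _best(heap):
        return -heap[0][0]  # highest priority in a non-empty heap

    def should_convert(self):
        """Convert only if no thread is runnable or the best spark
        strictly outranks every runnable thread."""
        if not self.sparks:
            return False
        if not self.threads:
            return True
        return self._best(self.sparks) > self._best(self.threads)

    def run_next(self):
        """Return the name of the next thread to run, or None if idle."""
        if self.should_convert():
            neg_priority, _, name = heapq.heappop(self.sparks)
            self._push(self.threads, name, -neg_priority)
        if not self.threads:
            return None
        return heapq.heappop(self.threads)[2]


s = Scheduler()
s.add_thread("t1", priority=5)
s.add_spark("s1", priority=3)
s.add_spark("s2", priority=7)
print([s.run_next() for _ in range(4)])  # ['s2', 't1', 's1', None]
```

Using a strict comparison means a spark whose priority merely equals that of the best runnable thread stays in the pool, so existing threads are preferred and no extra thread is created needlessly.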

Finally, Roe notes in passing that many of the problems of speculative evaluation arise because of sharing. Chapters Seven and Eight of this thesis discuss a scheme for the dynamic maintenance of thread hierarchies that resolves these problems. Roe draws four main conclusions:

• that functional programming is an excellent paradigm for parallel programming (a conclusion reached by many researchers);

• that programmer-inserted annotations are better than compile-time analysis;

• that placing the par and seq annotations in higher-order functions is beneficial from an abstraction point of view; and

• that the evaluate-and-die thread creation mechanism, together with user-specified granularity control, provides good performance (which is the approach adopted here too).

6·2·10 GAML

GAML [Maranget 1991] is a shared-memory parallel implementation of the G-machine. Parallelism is introduced through a # annotation, which is similar to the par annotation in that it is advisory and may be ignored if processing resources cannot accommodate its execution. The annotation creates forks (sparks), which are stored in lightweight form in a fork pool until the pool of runnable threads is found to be empty, at which point a fork is converted into a thread. When evaluation begins on a closure, its tag is altered to mark it as under evaluation; if a thread subsequently enters such a closure, it is queued on the closure's notifier list. There are many similarities with GUM. The GAML compiler can also be configured to fork a thread immediately rather than store a fork in the fork pool.
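The closure-tagging and notifier-list mechanism can be sketched as follows. The three tag values, the string results returned by `enter`, and the `update` function are assumptions made for illustration; the text states only that the tag is altered when evaluation begins and that threads entering such a closure are queued on its notifier list. Waking the queued threads once the value is written back is the usual companion step and is likewise assumed here.

```python
from enum import Enum


class Tag(Enum):
    """Hypothetical closure states."""
    UNEVALUATED = 1
    UNDER_EVALUATION = 2
    EVALUATED = 3


class Closure:
    def __init__(self):
        self.tag = Tag.UNEVALUATED
        self.value = None
        self.notifier = []  # threads blocked waiting for this closure


def enter(closure, thread):
    """Model a thread entering a closure.

    Returns "value" if the result is available, "blocked" if the thread was
    queued on the notifier list, or "evaluate" if the caller must now evaluate
    the closure and then call update().
    """
    if closure.tag is Tag.EVALUATED:
        return "value"
    if closure.tag is Tag.UNDER_EVALUATION:
        closure.notifier.append(thread)
        return "blocked"
    closure.tag = Tag.UNDER_EVALUATION  # alter the tag before evaluating
    return "evaluate"


def update(closure, value, runnable):
    """Record the value and make every blocked thread runnable again."""
    closure.value = value
    closure.tag = Tag.EVALUATED
    runnable.extend(closure.notifier)
    closure.notifier.clear()


runnable = []
c = Closure()
print(enter(c, "A"), enter(c, "B"))  # evaluate blocked
update(c, 42, runnable)
print(runnable, enter(c, "B"))       # ['B'] value
```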

In document Effective run time management of parallelism in a functional programming context (Page 129-131)