6·2·20 The mSTG Machine

In his Master’s thesis, Harwood [Harwood 1994] analyses the shared-memory STG-machine. He produces a design for an abstract machine, the mSTG machine, which supports communication between processing elements using message passing, and speculative evaluation through and- and or-parallelism.

In the mSTG machine, closures can be sent messages and a thread may be terminated if it is found not to be required. Harwood states that, because of these two actions, black holes could not be used. Instead, much of the effect of black holes was achieved through special closure entry code, while the remainder of the closure remained unaltered. In the implementation described in Chapters Seven and Eight, threads are never terminated; rather, they are given a priority that ensures they are not scheduled. Thus no roll-back of closure content is necessary. Further, the retained thread hierarchy ensures that, if the closure’s value is subsequently found to be required, a priority alteration will cause the original thread to resume evaluating the closure. This is all done while retaining the black hole system.
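The difference between terminating a speculative thread and merely descheduling it can be illustrated with a small sketch. This is not the implementation from Chapters Seven and Eight: the names `Thread`, `Scheduler`, and `NOT_REQUIRED` are hypothetical, and the sketch assumes that a larger number means a higher priority. It shows only that a thread given a non-schedulable priority keeps its partial work and can resume after a later priority change.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical sentinel: a priority at which a thread is never scheduled.
NOT_REQUIRED = 0


@dataclass
class Thread:
    name: str
    priority: int        # assumption: larger value = more urgent
    steps_done: int = 0  # stands in for partially evaluated closure state


class Scheduler:
    """Toy scheduler that runs the schedulable thread with the highest priority."""

    def __init__(self) -> None:
        self.threads: List[Thread] = []

    def add(self, thread: Thread) -> None:
        self.threads.append(thread)

    def set_priority(self, name: str, priority: int) -> None:
        # Descheduling and reviving are both plain priority changes;
        # the thread and its work so far are never discarded.
        for t in self.threads:
            if t.name == name:
                t.priority = priority

    def step(self) -> Optional[str]:
        # Threads at NOT_REQUIRED stay in the list but are skipped.
        runnable = [t for t in self.threads if t.priority > NOT_REQUIRED]
        if not runnable:
            return None
        chosen = max(runnable, key=lambda t: t.priority)
        chosen.steps_done += 1
        return chosen.name
```

Because the descheduled thread is retained, no closure contents need to be rolled back, which is the property the paragraph above relies on.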

Parallel-or was implemented through the use of late binding, the notifier list of a closure, and entry and update code. This is close to the approach independently adopted in this thesis, although the mechanism in this thesis supports priorities and different scheduling conditions in addition to modified closure entry and update code. Harwood proposes the storage of task identifiers in closures which were to be evaluated. This roughly corresponds to the parent list presented in Chapters Seven and Eight; the portability and longevity of task identifiers and closure addresses are, however, completely overlooked by Harwood.

Harwood suggests six additional messages (drawn from [Partridge 1991]) for the implementation of parallel-or and parallel-and. These include:

• a message to request the evaluation of a closure to be undertaken at a given priority;

• a message to adjust the evaluation priority of a closure;

• a message to indicate that the value of a closure is no longer needed;

• an acknowledgment message to indicate that the above kill message was received; and

• a message to indicate that the closure being evaluated by a task has new messages to be evaluated.

The evaluation request message is similar to the CHILD message, which is augmented with the global address of the thread evaluating the closure. Closure result messages are already a part of the distributed GUM runtime system (the RESUME message), and the re-awakening of blocked threads already occurs through the use of blocking queues and update code. Priority adjustment messages are also sent in the implementation presented in this thesis, but they are sent to tasks rather than closures; it is, after all, the thread that is being executed, not the closure. Under the evaluate-and-die thread evaluation model, the storage of the priority within the closure rather than the thread is not only inefficient but flawed. The third proposed message corresponds loosely to the TSO_DEATH message. No acknowledgment message is necessary. Similarly, no ‘process messages’ message is required.
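The correspondence drawn above between Harwood's proposed messages and the messages used in this thesis can be summarised as a lookup table. The sketch below is illustrative only. The enumeration labels are hypothetical, since the source describes the proposed messages without naming them. Only CHILD, RESUME, and TSO_DEATH are names taken from the text; `None` marks a proposed message for which the text says no counterpart is needed.

```python
from enum import Enum, auto
from typing import Dict, List, Optional


class ProposedMessage(Enum):
    # Hypothetical labels for the messages described in the text.
    EVALUATE_AT_PRIORITY = auto()
    ADJUST_PRIORITY = auto()
    VALUE_NOT_NEEDED = auto()
    KILL_ACK = auto()
    PROCESS_MESSAGES = auto()
    CLOSURE_RESULT = auto()


# Counterpart in this thesis, as stated in the text; None = not required.
COUNTERPART: Dict[ProposedMessage, Optional[str]] = {
    ProposedMessage.EVALUATE_AT_PRIORITY: "CHILD",
    ProposedMessage.ADJUST_PRIORITY: "priority adjustment (sent to the task, not the closure)",
    ProposedMessage.VALUE_NOT_NEEDED: "TSO_DEATH",
    ProposedMessage.KILL_ACK: None,
    ProposedMessage.PROCESS_MESSAGES: None,
    ProposedMessage.CLOSURE_RESULT: "RESUME",
}


def without_counterpart() -> List[ProposedMessage]:
    """Proposed messages that the text says are unnecessary."""
    return [m for m in ProposedMessage if COUNTERPART[m] is None]
```

Calling `without_counterpart()` yields the acknowledgment and ‘process messages’ entries, the two messages the text identifies as unnecessary.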

Harwood adopts Partridge’s priority scheme. Despite use of the message-passing paradigm, no discussion of architecture or load distribution is provided.

6·2·21 DREAM

The DistRibuted Eden Abstract Machine (DREAM) is the abstract machine model for executing Eden programs [Breitinger, Klusik, Loogen, Ortega, Peña 1997; Clack 1999]; Eden itself is described in [Breitinger, Klusik, and Loogen 1998; Peña and Rubio 2001]. The DREAM is a sequential model consisting of one or more threads, a heap (shared by these threads), and references to values. These references may be for input (inports) or for output (outports). Each thread is connected to a single outport. Each heap is held locally and there is no need for (nor support for) a virtual shared heap. The overall program’s state is modelled through a collection of DREAMs, one per process.

An outport is simply a global reference to an inport — i.e. a pair consisting of the process identifier of the remote DREAM coupled with a channel identifier on the remote machine.

The state of a thread within a DREAM is defined by the following components [Breitinger, Klusik, et al. 1997]:

• a code pointer;

• an argument stack;

• a return stack; and

• an update stack.
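The structure described above can be written down as a few record types. The following is a minimal sketch, not the DREAM definition itself: the class and field names are hypothetical, the integer representations of identifiers and code pointers are assumptions, and the `spawn` helper is introduced purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass(frozen=True)
class Outport:
    """Global reference to an inport held by another DREAM."""
    remote_process: int  # process identifier of the remote DREAM
    remote_channel: int  # channel identifier on the remote machine


@dataclass
class ThreadState:
    code_pointer: int    # assumption: code addresses modelled as integers
    outport: Outport     # each thread is connected to a single outport
    argument_stack: List[Any] = field(default_factory=list)
    return_stack: List[Any] = field(default_factory=list)
    update_stack: List[Any] = field(default_factory=list)


@dataclass
class Dream:
    process_id: int
    # The heap is local: shared by this DREAM's threads, never by other DREAMs.
    heap: Dict[int, Any] = field(default_factory=dict)
    # Assumption: inports are keyed by channel identifier.
    inports: Dict[int, Any] = field(default_factory=dict)
    threads: List[ThreadState] = field(default_factory=list)

    def spawn(self, code_pointer: int, outport: Outport) -> ThreadState:
        """Hypothetical helper: create a thread with empty stacks."""
        thread = ThreadState(code_pointer, outport)
        self.threads.append(thread)
        return thread
```

A whole program would then be a collection of `Dream` values, one per process, each with its own heap.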

Each DREAM executes code from an intermediate language, the Parallel Eden Abstract Reduction Language (PEARL), which is an extension of the STG language containing primitive parallel constructs.

A simulator for Eden (the PARAllel DIstribution Simulator for Eden, or, PARADISE [Hernández et al. 2000]) has also been developed and is based on GranSim.

6·2·22 GranSim

GranSim [Loidl 1998] has already been introduced. As described in Chapter Two, GranSim is a simulator that allows the simulation of an annotated parallel functional program on a variety of architecture models. Many aspects of the execution system may be adjusted, including the size of packets, the communication model (synchronous versus asynchronous), and thread placement versus migration. Detailed profiles of the running program can be generated and examined for the purposes of redesigning the program’s algorithm, or for the detection of anomalous behaviour relating to execution system configuration.

GranSim and GUM share the STG-machine as the abstract machine model but GranSim is implemented on a multiprocessor. GranSim can be parameterised to resemble the original GUM runtime system.

GranSim extends GPH with three annotations (see Section 6·3·11).

Loidl also presents a static granularity analysis for a strict higher-order functional language, L.

6·2·23 Others

In addition to the above functional language implementations and abstract machine models, there are a number of others of lesser direct relevance to this work. Other abstract machines include: the Stack, Environment, Control, and Dump (SECD) machine [Landin 1964; Henderson 1980; Boudillet et al. 1991], Traub’s machine [Traub 1985], the Categorical Abstract Machine (CAM) [Cousineau, Curien, and Mauny 1985; Cousineau 1990], the Ponder Abstract Machine (PAM) [Fairbairn and Wray 1986], Flagship [Watson and Watson 1986; Sargeant 1986; Watson et al. 1987; Keane and Mayes 1992; Keane 1994; Tan and Chin 1992], Parallel Abstract Machine (PAM) [Loogen, Kuchen, Indermark, and Damm 1988 and 1989; Kingdon et al. 1989], George’s machine [George 1989], the Threaded Interpretive Graph Reduction Engine (TIGRE) [Koopman and Lee 1989], the Amsterdam Parallel Experimental Reduction Machine (APERM) [Hertzberger and Vree 1989], the Babel Abstract Machine (BAM) [Kuchen, Loogen, Moreno-Navarro, Rodríguez-Artalejo 1990], the Miranda Parallel Machine (MPM) [Olszewski 1991], the Categorical Multi-Combinator Machine (CMCM) [Thompson and Lins 1992], the PCKS-machine [Moreau 1994], and Gofer [Jones 1994]. Other implementations include those of: Magó [Magó 1979a and 1979b], Kluge [Kluge 1983], Bloss, Hudak, and Young [Bloss, Hudak, and Young 1988], Amamiya and Taniguchi [Amamiya and Taniguchi 1989], Osborne [Osborne 1989], Revesz [Revesz 1990], Skillicorn [Skillicorn 1991], Kaser, Pawagi, Ramakrishnan, and Sekar [Kaser, Pawagi, Ramakrishnan, and Sekar 1992], Bülck, Held, Kluge, Pantke, Rathsack, Scholz, and Schröder [Bülck, Held, Kluge, Pantke, Rathsack, Scholz, and Schröder 1994], Zuhdy, Fritzson, and Engström [Zuhdy, Fritzson, and Engström 1995], Flanagan and Nikhil [Flanagan and Nikhil 1996], and Davis [Davis 1997]. There are many others and the reader is referred to [Schreiner unpub.] and [Szymanski 1991] for recent compilations.

Excellent overviews of the stages involved in the parallel implementation of functional languages are given by [Burn 1990], [Peyton Jones 1987] (and in summary form [Peyton Jones 1989]), [Plasmeijer and van Eekelen 1993], and [Hammond 1994]. Interesting historical perspectives are included in [Hammond 1994] and [Martins 1992].
