1·1·3·1 Paradigm

When writing a program for a uni-processor, program development consists of algorithm development; the challenge is to implement a software solution to the problem. Through the use of high-level languages, the programmer gives little if any direct thought to the underlying machine model or physical hardware.

Writing software to be executed on a computer possessing multiple processing elements, however, is a different matter. In many programming languages and paradigms, the parallel programmer must devise two solutions within one:

• an algorithm for solving the original problem; and

• a mapping of sections of code identified by the programmer as suitable for parallel execution onto the underlying hardware.

If the compiler cannot shoulder the issues of parallel programming, the author of the parallel program must manage them. These issues include: which pieces of the code can be executed in parallel, whether the size of these code segments warrants their parallel execution, where in the code data dependencies reside, what critical regions exist, and how the sections of the program identified for parallel execution should be allocated to the underlying hardware.

Chapter One: Introduction

The choice of programming language paradigm may offer some general assistance: a programming language intrinsically mired in the assignment of values to memory locations emphasises the problems outlined above. Functional languages can provide an escape from these problems.

Many argue the virtues of functional programming languages (e.g. Backus [Backus 1978], Hughes [Hughes 1989], Roe [Roe 1991], and Hammond and Michaelson [Hammond and Michaelson 1999]). It is not an aim of this thesis to introduce the reader to functional languages or to propagandise in an area that is often the realm of religion-like debate. (The reader is referred to a review by Hudak [Hudak 1989], to Michaelson, Hammond, and Clack [Michaelson, Hammond, and Clack 1999], or to texts by Bird [Bird 1998], Bird and Wadler [Bird and Wadler 1988], or Field and Harrison [Field and Harrison 1988] for an insight into functional programming.)

Functional programming languages do, however, provide an expedient vehicle for investigating the thesis, since they possess a number of features that help avoid many of the problems given above. The paradigm and language of functional programming are the topic of Chapter Two, but two features of functional programming require early mention:

• functional programs are free of side effects (and are referentially transparent); and

• many functional language implementations are lazy, and evaluation of function arguments, sub-expressions, and list elements is deferred until the value is actually required.

The combination of these two features yields a program where many unevaluated components exist, which, due to the absence of side-effects, may be evaluated in any order. Given a computer with multiple processing elements, the spare processing capacity could be applied to evaluate the surplus parallel components — but which ones should be chosen? Some indication of the usefulness of each to the program’s outcome would assist in the allocation of these components to processing elements.
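As a minimal sketch of how laziness leaves components unevaluated, consider the following Haskell fragment. The names expensive, firstThree, and lazyPair are hypothetical and chosen only for illustration; they do not appear elsewhere in this thesis.

```haskell
-- Hypothetical stand-in for a costly, side-effect-free computation.
expensive :: Int -> Int
expensive n = sum [1 .. n * 1000]

-- Only the first three elements of the infinite list are ever demanded;
-- the remainder stay as unevaluated components of the program.
firstThree :: [Int]
firstThree = take 3 (map expensive [1 ..])

-- The second component of the pair is never demanded, so the error
-- it contains is never raised.
lazyPair :: Int
lazyPair = fst (42, error "never evaluated")
```

Because expensive has no side effects, the three demanded elements of firstThree could be computed in any order, or simultaneously, without changing the result.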

1·1·3·2 Parallelism

There are essentially two ways that useful parallelism may be identified in a functional language: compile-time analysis that deduces which pieces of the functional program may be executed simultaneously (for example, function arguments and list elements), and program annotations which advise the runtime system to concurrently execute fragments of the program.

There are two schemes for evaluating a functional program concurrently: conservative evaluation and speculative evaluation [Peyton Jones 1989]. Under the former, code fragments are only executed when it is certain that the value of the expression is required, i.e. lazy semantics hold. An example of the use of concurrent evaluation under the conservative evaluation scheme is given in Figure 1·1.

f :: Ord a => a -> a -> a -> a
f x y z = if x < y then y else z

Figure 1·1: An example function.

The function declared in Figure 1·1 clearly requires the values of x and y. The value of z, however, is only required if x ≥ y; there is, then, no definite need to evaluate z. When the function f is applied to arguments, conservative evaluation dictates that the values of x and y, if needed, may be calculated simultaneously (provided that the result of f is itself required).

Such usage information could be obtained through strictness analysis (e.g. [Wright 1992]).
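Conservative evaluation of the function in Figure 1·1 might be sketched as follows. This is an illustration only: it assumes GHC's par and pseq primitives (exported from GHC.Conc in the base library) rather than the annotations developed in this thesis, and the name fConservative is hypothetical.

```haskell
import GHC.Conc (par, pseq)

-- x is sparked for possible parallel evaluation while the current thread
-- evaluates y; both are certain to be needed by the comparison.
-- z is left untouched unless the comparison selects it.
fConservative :: Ord a => a -> a -> a -> a
fConservative x y z =
  x `par` (y `pseq` (if x < y then y else z))
```

Since z is never demanded when x < y, an application such as fConservative 1 2 undefined still yields 2. Any actual parallel speed-up would additionally depend on the runtime system having more than one processing element available.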

Under a speculative evaluation scheme, laziness is usurped by eager concurrent evaluation of expressions when there is some likelihood that the values of those expressions will be needed. In the example of Figure 1·1, all three arguments could be evaluated concurrently if sufficient computing resources were available. Note that although speculative evaluation can provide benefits, it also includes a number of complications:

• there is only a finite number of processing elements, hence there are now expressions whose value is required competing against expressions whose value may be required;

• it may be the case that some speculative expressions will be found not to be required (irrelevant), and these must be de-scheduled and terminated;

• it may be the case that some speculative expressions will be found to be necessary and the evaluation of these expressions should be expedited; and

• speculative evaluations may raise exceptions or diverge — neither of which should affect the outcome of the program.
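A speculative counterpart to the function in Figure 1·1 can be sketched in the same style. Again this is only an illustration under stated assumptions: it uses GHC's par and pseq primitives from GHC.Conc, the name fSpeculative is hypothetical, and the sketch does not address any of the complications listed above (prioritisation, termination of irrelevant work, or exceptions and divergence in speculative work).

```haskell
import GHC.Conc (par, pseq)

-- z is sparked even though it may turn out not to be required;
-- x is sparked as before, and y is evaluated by the current thread.
fSpeculative :: Ord a => a -> a -> a -> a
fSpeculative x y z =
  z `par` (x `par` (y `pseq` (if x < y then y else z)))
```

The result is the same as that of the conservative version whenever all three arguments are well defined; the only difference is that work on z may begin before it is known whether z is needed.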

A number of distributed parallel functional language implementations (e.g. GRIP […] and Plasmeijer 1991; Plasmeijer, van Eekelen, Pil, and Serrarens 1999], Eden [Breitinger, Klusik, and Loogen 1998; Peña and Rubio 2001], GUM [Trinder et al. 1996; Hammond et al. 1995; Hammond, Loidl, et al. unpub. a], and GranSim [Loidl 1998]) incorporate limited speculative evaluation. The reader is referred to Chapter Six for a review.

1·2 Aims

Parallel evaluation has many constituents: algorithm, language, architecture, topology, evaluation strategy, and load distribution. This thesis describes investigations into each major aspect and presents the components of an implementation of the efficient parallel execution of the functional programming language Haskell.

The system presented here aims to increase throughput and decrease wall clock execution time. For this to happen, faster hardware must be used, a different algorithm must be implemented, and/or the runtime system under which the algorithm’s implementation is evaluated must be modified.

The aim of this thesis is to develop an efficient strategy for handling speculative evaluation and load distribution on a multicomputer for the execution of functional programs. Through experimentation on a virtual multicomputer, it will be shown that speculative evaluation and effective prioritised load distribution are beneficial components of a runtime system, and, to a lesser extent, that program annotations are a useful, flexible, and natural approach for concurrency identification in a computer program.

This aim will be realised on the test-bed architecture if the execution times for a number of ‘typical’ programs can be reduced through the inclusion of such mechanisms.

1·3 Overview

The results of experiments conducted on a modified implementation of a compiler (GHC [Hall, Hammond, Partain, Peyton Jones, and Wadler 1992; AQUA 1996]) for an expanded GPH language [Trinder, Barry, Davis, Hammond, Junaidu, Klusik, Loidl, and Peyton Jones 1998 and unpub.] (an extension of Haskell [Hudak, Peyton Jones, Wadler, Boutel, Fairbairn, Fasel, Guzmán, Hammond, Hughes, Johnsson, Kieburtz, Nikhil, Partain, and Peterson 1992; Michaelson et al. 1999]) running a modified distributed runtime system (GUM [Trinder et al. 1996; Hammond et al. 1995; Hammond, Loidl, et al. unpub. a]) are presented. This system runs on a SPARCstation-20 running SunOS 5·5·1³ and utilises PVM [Geist, Beguelin, Dongarra, Jiang, Manchek, and Sunderam 1994] to facilitate computation between distributed processing elements.

There are four fronts to the work presented here:

• parallelism indication: five annotations have been added to the GPH language to control the priority of the evaluation of an expression, the processing element on which an expression should be evaluated, and the manner in which a sub-expression should be scheduled in relation to other sub-expressions of the same expression. Four schemes (two qualitative and two quantitative) are also presented for priority representation;

• scheduling: the scheduler has been modified to maximise load distribution and to pre-emptively execute related sub-expressions in a round-robin manner;

• speculative evaluation: GUM’s conservative evaluation mechanism has been replaced with a low-overhead, prioritised, speculative evaluation engine; and

• load distribution: GUM’s original “fishing” load distributor has been replaced with a decentralised, non-intrusive, priority-cognisant, adaptive, hybrid sender/receiver-initiated load distributor that implements spark percolation.

The test programs have been drawn from a suite of test programs (NoFib [Partain 1992]) commonly used, cited, and recommended [Michaelson and Hammond 1999] for (parallel and non-parallel) functional programming research and benchmarking. Ten test programs have been selected, representing a wide range of programming attributes: list processing, input/output, parsing, computation, and differing forms of parallelism. This has been done in an effort to evaluate the impact of the above changes on different aspects of computation.

The (average) execution time of each program run under four different schemes (sequential execution, parallel execution under the original GUM runtime system, parallel execution under a prioritised GUM runtime system with fishing load distribution, and parallel execution under a prioritised GUM runtime system with spark percolation load distribution) has been measured. The activities of garbage collection and speculative evaluation have also been examined; the results are graphed and analysed, and the observations presented.

3 SPARCstation is a trademark of SPARC International, Inc. and SunOS is a trademark of Sun Microsystems Inc.