Static Parallelization of Functional Programs: Elimination of Higher-Order Functions & Optimized Inlining


Christoph A. Herrmann, Jan Laitenberger, Christian Lengauer, and Christian Schaller

Fakultät für Mathematik und Informatik, Universität Passau, Germany {herrmann,lengauer}@fmi.uni-passau.de http://www.fmi.uni-passau.de/~lengauer

Abstract. Functional programs have long been recognized as attractive subjects of an implicit static parallelization because functional programming excludes artificial dependences, which would restrict parallelism.

One central concept which makes functional programming a powerful paradigm is the higher-order function, which can have functions appearing in its arguments or result. We present an automatic method of eliminating higher-order functions, which is based on earlier work by Bell, Bellegarde and Hook [2]. The number of auxiliary functions added in the process is subsequently minimized by inlining transformations.

Keywords: functional programming, Haskell, higher-order function, inlining, parallelization, skeletons.

1 Introduction

We report on first experiences with a new compiler for a functional language, called HDC, which supports the use of skeletons, i.e., higher-order functions which have customized parallel implementations, collected in a skeleton library.

The overall idea is to equip HDC with implicit, high-quality parallelism through these skeleton implementations.
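As an illustration of what a skeleton looks like at the source level, the following Haskell sketch defines a divide-and-conquer skeleton as an ordinary higher-order function. The name dc, its argument order and the example sumDC are hypothetical and are not taken from the HDC skeleton library; the sketch only shows the sequential meaning that a customized parallel implementation would have to preserve.

```haskell
-- Hypothetical divide-and-conquer skeleton (not the HDC library interface).
dc :: (a -> Bool)        -- is the problem trivial?
   -> (a -> b)           -- solve a trivial problem directly
   -> (a -> [a])         -- divide a problem into subproblems
   -> (a -> [b] -> b)    -- combine the partial solutions
   -> a -> b
dc isBasic basic divide combine = go
  where
    go x
      | isBasic x = basic x
      -- The subproblems are independent of each other, so a parallel
      -- implementation of the skeleton could solve them concurrently.
      | otherwise = combine x (map go (divide x))

-- Customizing the skeleton: sum a list by repeatedly halving it.
sumDC :: [Int] -> Int
sumDC = dc (\xs -> length xs <= 1) sum halves (\_ rs -> sum rs)
  where
    halves xs = let (l, r) = splitAt (length xs `div` 2) xs in [l, r]
```

The programmer supplies only the four customizing functions; the recursion structure, and hence the parallelism, is fixed by the skeleton.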

The compiler consists of quite a number of phases; we concentrate on two here. One performs higher-order elimination (HOE), the other inlining. More complete information on the compiler is available elsewhere [6].

The following section sketches the different phases of our parallelizing compiler. Section 3 describes the HOE algorithm proposed in the literature and our modifications to it. Section 4 comments on the quality of the generated first-order program. Section 5 concludes.

2 The HDC Compiler

P. Amestoy et al. (Eds.): Euro-Par’99, LNCS 1685, pp. 930–934, 1999.
© Springer-Verlag Berlin Heidelberg 1999

The HDC compiler [6, 9] translates a subset of Haskell [3] into an imperative language – at present, C with MPI calls. The main difference to Haskell is that HDC is strict, in order to facilitate a compile-time parallelization. (However, invisibly to the programmer, strictness is partly eliminated by inlining transformations.) The compiler is based on the principle of compilation by transformation, which has already been used successfully in the Glasgow Haskell compiler GHC [10], and consists of the following phases [6]:

1. scanning/parsing, using the tool happy
2. desugaring
3. list comprehension simplification
4. lambda lifting, let elimination [8]
5. simplification of list comprehensions
6. type checking
7. monomorphization
8. elimination of functional arguments (HOE)
9. elimination of mutual recursion (optional)
10. case elimination
11. generation of intermediate DAG code
12. tuple elimination
13. optimization cycle (optional)
    – inline expansion
    – rule-based DAG optimizations
    – size inference [5]
14. abstract code generation
15. automatic parallelization (optional)
16. code generation

3 Higher-Order Elimination (HOE)

The program subject to HOE must be well-typed according to the Hindley-Milner rules. It must also be closed, i.e., all functions cited must be available to the HOE procedure for a global analysis and transformation. The result of the HOE is an equivalent first-order functional program, which is also well-typed.

We are applying HOE in HDC because we want to avoid having to deal with higher-orderness in our target C code.

We base our work on a previous HOE algorithm for a more general setting [2]. We were able to simplify this algorithm significantly for our purposes. Most importantly, in order to simplify the generation of the target C code, our input to the HOE algorithm is monomorphic. The large number of functions introduced by the HOE and by the prerequisite desugaring phase is subsequently reduced substantially by the inlining transformations.

There is also a source translation from ML to Ada [13], which is based on the same general algorithm.

The general HOE algorithm uses a set of seven rewrite rules for the transformation. The idea is to replace the partial applications of a function by a kind of closure. A closure contains a function identifier and the values of the free variables in the partial application.
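The closure idea can be sketched in Haskell as follows. This is a minimal illustration under assumed names (addN, Closure, apply and mapC are all hypothetical), not output of the HDC compiler: the partial application addN n is replaced by a constructor that records which function is meant and the value of its free variable n.

```haskell
-- Higher-order original: map receives the partial application (addN n).
addN :: Int -> Int -> Int
addN n x = n + x

incAll :: Int -> [Int] -> [Int]
incAll n xs = map (addN n) xs

-- First-order counterpart. The constructor AddN identifies the function
-- addN and stores the value of the free variable n.
data Closure = AddN Int

-- Decodes a closure and performs the delayed application.
apply :: Closure -> Int -> Int
apply (AddN n) x = addN n x

-- First-order replacement for map, specialized to closures.
mapC :: Closure -> [Int] -> [Int]
mapC _ []       = []
mapC c (x : xs) = apply c x : mapC c xs

incAllFO :: Int -> [Int] -> [Int]
incAllFO n xs = mapC (AddN n) xs
```

Both versions compute the same result, but incAllFO, mapC and apply never pass or return a function.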


Some of the seven rules deal with restricting polymorphism and become obsolete in our monomorphic setting. Our modified HOE algorithm [11] uses the following set of four rules:

1. η-expansion. This rule expands function definitions which return functions as result with as many additional formal arguments as the function returned expects. If the result was polymorphic before monomorphization, the number of additional arguments may depend on the call. Applications of the expanded function then include the application of the function returned and deliver a non-function result.

2. Encode. This rule encodes functional arguments using constructors and introduces apply functions which decode them.

3. ApplVar. If in a function application the function is represented by a variable which is marked to carry a closure value, a temporary type inconsistency occurs during the transformation because a closure cannot be applied. This rule wraps the closure in a call to an additional apply function which takes the closure as an argument.

4. RemoveHOTypes. To clean things up, all function types appearing in data type definitions are replaced by an algebraic data type, which is parametrized with an identifier of the encoded type and encompasses all closures.

The algorithm starts with a phase of applications of rule 1, followed by a phase in which rules 2 and 3 are applied repeatedly in any order, and terminates with a phase of applications of rule 4.
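The effect of the four rules can be traced on a small monomorphic program. The Haskell sketch below shows a higher-order source and a first-order result side by side; all names (FunIntInt, applyIntInt and the FO-suffixed definitions) are hypothetical, and the sketch simplifies rule 4 by using one closure type per function type rather than a single parametrized type. It is meant as an illustration of the rules as described above, not as the exact output of the HOE phase.

```haskell
-- Higher-order source program (monomorphic).
inc, dbl :: Int -> Int
inc x = x + 1
dbl x = 2 * x

compose :: (Int -> Int) -> (Int -> Int) -> Int -> Int
compose f g x = f (g x)

incThenDbl :: Int -> Int
incThenDbl = compose dbl inc         -- returns a function: subject to rule 1

data Task = Task (Int -> Int) Int    -- function type in a data type: rule 4

runTask :: Task -> Int
runTask (Task f x) = f x

-- First-order result.

-- Rule 2 (Encode): one constructor per encoded functional argument.
data FunIntInt = Inc | Dbl

-- Rule 2 (Encode): the apply function decodes the constructors.
applyIntInt :: FunIntInt -> Int -> Int
applyIntInt Inc x = inc x
applyIntInt Dbl x = dbl x

-- Rule 3 (ApplVar): f and g now carry closures and cannot be applied
-- directly, so each use is wrapped in a call to the apply function.
composeFO :: FunIntInt -> FunIntInt -> Int -> Int
composeFO f g x = applyIntInt f (applyIntInt g x)

-- Rule 1 (eta-expansion): the extra formal argument x makes the result a
-- non-function value; rule 2 replaces dbl and inc by their encodings.
incThenDblFO :: Int -> Int
incThenDblFO x = composeFO Dbl Inc x

-- Rule 4 (RemoveHOTypes): the function type in the data type definition
-- is replaced by the closure type.
data TaskFO = TaskFO FunIntInt Int

runTaskFO :: TaskFO -> Int
runTaskFO (TaskFO f x) = applyIntInt f x
```

In this sketch, incThenDbl 5 and incThenDblFO 5 both evaluate dbl (inc 5), and runTask (Task inc 41) agrees with runTaskFO (TaskFO Inc 41).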

4 Experimental Results

Of paramount interest is the impact of the HOE algorithm on the target code.

Our first example, Karatsuba, is an optimized multiplication of two polynomials, each represented by a list of its coefficients [1, 4]. Our second example is the frequent set problem [12], a data mining application.
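For readers unfamiliar with the Karatsuba scheme, the following Haskell sketch multiplies two polynomials given as coefficient lists using three recursive multiplications of half-size polynomials instead of four. It is a plain recursive formulation written for this illustration, not the skeleton-based HDC source program; the coefficient order (lowest degree first), the base-case size and all function names are assumptions.

```haskell
-- Coefficients, lowest degree first (an assumption of this sketch).
type Poly = [Integer]

addP :: Poly -> Poly -> Poly
addP (x : xs) (y : ys) = x + y : addP xs ys
addP xs []             = xs
addP [] ys             = ys

subP :: Poly -> Poly -> Poly
subP xs ys = addP xs (map negate ys)

-- Multiplication by X^k.
shiftP :: Int -> Poly -> Poly
shiftP k p = replicate k 0 ++ p

-- Schoolbook multiplication, used as base case and as a reference.
mulNaive :: Poly -> Poly -> Poly
mulNaive [] _        = []
mulNaive (x : xs) ys = addP (map (x *) ys) (shiftP 1 (mulNaive xs ys))

-- Drops trailing zero coefficients so that results can be compared.
trim :: Poly -> Poly
trim = reverse . dropWhile (== 0) . reverse

-- With xs = x0 + X^h * x1 and ys = y0 + X^h * y1:
--   xs * ys = low + X^h * mid + X^(2h) * high
-- where mid = (x0 + x1) * (y0 + y1) - low - high.
karatsuba :: Poly -> Poly -> Poly
karatsuba xs ys
  | n <= 2    = mulNaive xs ys
  | otherwise = addP low (addP (shiftP h mid) (shiftP (2 * h) high))
  where
    n        = max (length xs) (length ys)
    h        = n `div` 2
    (x0, x1) = splitAt h xs
    (y0, y1) = splitAt h ys
    low      = karatsuba x0 y0
    high     = karatsuba x1 y1
    mid      = subP (subP (karatsuba (addP x0 x1) (addP y0 y1)) low) high
```

The three recursive calls are independent of each other, which is what makes the algorithm a natural fit for a divide-and-conquer skeleton.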

In Tab. 1, we have recorded some static characteristics of the code (row by row, as the compilation proceeds) and the effect of our optimizations.

The Karatsuba example is expressed with a skeleton whose parallelism is completely static, except for some parameters, e.g., the problem size. Thus, the compiler optimizations can only affect the local structure inside the customizing functions. The frequent set example is much more dynamic: optimizations can affect the structure of the entire implementation. Therefore, it pays to analyze the properties of the program after different phases of the compilation.

We have built an HDC interpreter for this purpose. Tab. 2 shows the results of an interpretation of the abstract code with two different samples A and B. The improvements after optimization demonstrate the important role inlining plays after the HOE.
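Why inlining matters after the HOE can be seen on a small first-order program of the kind the HOE produces. The Haskell sketch below is a hand-made illustration with hypothetical names (Fun, apply, compose); it does not reproduce the actual HDC optimizer. Once the encoded arguments are known constructors, inline expansion of compose and apply leaves no auxiliary call at all.

```haskell
-- First-order program as it might look after the HOE.
data Fun = Inc | Dbl

apply :: Fun -> Int -> Int
apply Inc x = x + 1
apply Dbl x = 2 * x

compose :: Fun -> Fun -> Int -> Int
compose f g x = apply f (apply g x)

-- Before inlining: one call of compose and two calls of apply, each of
-- which inspects a constructor at run time.
incThenDbl :: Int -> Int
incThenDbl x = compose Dbl Inc x

-- After inlining compose and then apply at the known constructors Dbl
-- and Inc, the auxiliary functions are no longer needed here.
incThenDblInlined :: Int -> Int
incThenDblInlined x = 2 * (x + 1)
```

Both definitions denote the same function; the inlined one needs fewer operations and fewer function definitions.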

The large amount of work is due partly to the nature of the problem and partly to the lack of sophistication of our source program, derived from Alg. 3.7 of [12] (there are cleverer ways [7]). Regardless of that, note that the optimizations reduce the number of operations by up to 30% and that there is a high potential of parallelism.

                             Karatsuba           frequent set
number of
1. source functions              7                   21
2. source lines                 30                   86
3. functions before HOE         75                  104
4. functions after HOE          37                  103
5. tree nodes                  416                  968

number of                  no opt.   opt.       no opt.   opt.
6. DAG functions              31      11           86      25
7. total DAG nodes           202     269          492     455
8. total abscode nodes       212     343          534     563

Table 1. Effect of compilation and optimizations on the program

input                no. of operations          no. of par. steps        average par.
sample  threshold    no opt.    opt.   ratio    no opt.   opt.  ratio    no opt.   opt.  ratio
A       0.5           12075     8782   0.73       355     224   0.63       34.0    39.2   1.15
B       0.5           55935    39559   0.71       893     586   0.66       62.6    67.5   1.08
B       0.2          360963   252887   0.70      1854    1239   0.67      194.7   204.1   1.05

Table 2. Run-time characteristics of the frequent set example
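The derived columns of Tab. 2 are not defined in the text. The numbers are consistent with the reading that each ratio is the optimized value divided by the unoptimized one, and that the average parallelism is the number of operations divided by the number of parallel steps. The Haskell sketch below checks this reading for sample A; the two definitions are an inference from the table, and the function names are hypothetical.

```haskell
-- Values for sample A, threshold 0.5, copied from Tab. 2.
opsNoOpt, opsOpt, stepsNoOpt, stepsOpt :: Double
opsNoOpt   = 12075
opsOpt     = 8782
stepsNoOpt = 355
stepsOpt   = 224

-- Assumed definition: optimized value relative to the unoptimized one.
ratio :: Double -> Double -> Double
ratio noOpt opt = opt / noOpt

-- Assumed definition: operations per parallel step.
averagePar :: Double -> Double -> Double
averagePar ops steps = ops / steps
```

Under these assumptions, ratio opsNoOpt opsOpt rounds to the tabulated 0.73, and averagePar yields the tabulated 34.0 and 39.2 for the unoptimized and optimized code, respectively.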

We do not yet have data on the speedup through parallelism. Compared to GHC-compiled code, however, the HDC Karatsuba example takes 20% longer sequentially, and the frequent set problem takes roughly 2 to 2.5 times as long [6]. This is the price we pay for not having to deal with higher-orderness in the target code.

5 Conclusions

We contend that the elimination of higher-order functions is especially useful for a parallelization via the use of skeletons. We have succeeded in applying our compilation techniques without difficulty to two realistic, application-level functional programs.

The higher-orderness of skeletons permits the combination of static and dynamic techniques in program parallelization. E.g., the frequent set example requires many skeletons – some static, some dynamic.


Acknowledgements

This work has been funded by the DFG under project RecuR2 and by the DAAD under an exchange project in the ARC programme. Our former team member Robert Günz deserves special thanks for implementing the first two compiler phases. We are also grateful to Françoise Bellegarde, Christophe Darlot and John O’Donnell for fruitful discussions.

References

[1] Alfred V. Aho, John E. Hopcroft, and Jeffrey D. Ullman. The Design and Analysis of Computer Algorithms. Series in Computer Science and Information Processing. Addison-Wesley, 1974.

[2] Jeffrey M. Bell, Françoise Bellegarde, and James Hook. Type-driven defunctionalization. ACM SIGPLAN Notices, 32(8):25–37, 1997. Proc. ACM SIGPLAN Int. Conf. on Functional Programming (ICFP’97).

[3] Richard Bird. Introduction to Functional Programming using Haskell. Series in Computer Science. Prentice Hall Europe, 2nd edition, 1998.

[4] Christoph A. Herrmann and Christian Lengauer. On the space-time mapping of a class of divide-and-conquer recursions. Parallel Processing Letters, 6(4):525–537, 1996.

[5] Christoph A. Herrmann and Christian Lengauer. Size inference of nested lists in functional programs. In Kevin Hammond, Tony Davie, and Chris Clack, editors, Proc. 10th Int. Workshop on the Implementation of Functional Languages (IFL’98), pages 346–364. Department of Computer Science, University College London, 1998.

[6] Christoph A. Herrmann, Christian Lengauer, Robert Günz, Jan Laitenberger, and Christian Schaller. A compiler for HDC. Technical Report MIP-9907, Fakultät für Mathematik und Informatik, Universität Passau, May 1999.

[7] Zhenjiang Hu. Personal communication at the Dagstuhl Seminar on High-Level Parallel Programming, April 1999.

[8] Thomas Johnsson. Lambda lifting: Transforming programs to recursive equations. In Jean-Pierre Jouannaud, editor, Proc. Conf. on Functional Programming Languages and Computer Architecture (FPCA’85), LNCS 201. Springer-Verlag, 1985.

[9] Lehrstuhl für Programmierung, Universität Passau. The HDC compiler project. http://www.fmi.uni-passau.de/cl/hdc/.

[10] Simon L. Peyton Jones. Compiling Haskell by program transformation: A report from the trenches. In Hanne Riis Nielson, editor, Programming Languages and Systems (ESOP’96), LNCS 1058, pages 18–44. Springer-Verlag, 1996.

[11] Christian Schaller. Elimination von Funktionen höherer Ordnung in Haskell-Programmen. Diplomarbeit, Fakultät für Mathematik und Informatik, Universität Passau, September 1998. In German.

[12] Hannu Toivonen. Discovery of Frequent Patterns in Large Data Collections. PhD thesis, Department of Computer Science, University of Helsinki, 1996.

[13] Andrew Tolmach and Dino P. Oliva. From ML to Ada: Strongly-typed language interoperability via source translation. J. Functional Programming, 8(4):367–412, July 1998.