The D-Box coordination language and the
support for distributed development in the Clean functional language
Main results of the PhD dissertation
Dr. Horváth Zoltán, professor
Eötvös Loránd University, Faculty of Informatics
H-1117 Budapest, Pázmány Péter sétány 1/C.
PhD School of Eötvös Loránd University, Faculty of Informatics
PhD Programme: The foundations of and methodologies in informatics
Leader of the PhD School and of the PhD programme:
The present thesis deals with a coordination language that supports the distributed evaluation of a Clean programme, and with its run-time system. The primitives of D-Clean, a higher-abstraction coordination language designed for the Clean functional language, are translated into definitions of the lower-level D-Box coordination language, in which the concepts of channel, box and protocol are defined. The primary coordination element is the box, which contains a computation task (a function or an expression) formulated in Clean. The parameters required by this expression are carried from other boxes by typed channels, which are also capable of minimal buffering. In addition, a channel supports the add and remove operations known from the queue data structure.
The channels are read by a protocol expression that hides the details of channel handling from the Clean expression. The protocol expression can perform basic transformations on the data coming through the channels, e.g. construct a list from them. The expression processes the data received from the protocol and computes the result. The results are handed to an output protocol, which transmits them to the outgoing channels. The input and output channels of the boxes are connected to each other. The nodes of the computation graph are the boxes and the edges are the channels. The topic of the dissertation is the syntax and semantics of the coordination language describing this graph.
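As an illustration only, the relationship between channel, input protocol, expression and output protocol can be sketched in Python. This is not the D-Box implementation (boxes hold Clean expressions and run distributed over a middleware); the names Channel, list_input_protocol, element_output_protocol and run_box, as well as the buffer capacity of 4, are assumptions made for the sketch.

```python
from collections import deque
from typing import List


class Channel:
    """Typed channel with a small buffer and queue-like add/remove."""

    def __init__(self, item_type: type, capacity: int = 4) -> None:
        self.item_type = item_type
        self.capacity = capacity  # "minimal buffering": an assumed small bound
        self._buffer = deque()

    def add(self, item) -> None:
        # The channel is typed: reject values of any other type.
        if not isinstance(item, self.item_type):
            raise TypeError(f"expected {self.item_type.__name__}")
        if len(self._buffer) >= self.capacity:
            raise BufferError("channel buffer is full")
        self._buffer.append(item)

    def remove(self):
        return self._buffer.popleft()

    def __len__(self) -> int:
        return len(self._buffer)


def list_input_protocol(channel: Channel) -> List:
    """Input protocol: hides channel handling and builds a list."""
    items = []
    while len(channel) > 0:
        items.append(channel.remove())
    return items


def element_output_protocol(results: List, channel: Channel) -> None:
    """Output protocol: puts each result element on the outgoing channel."""
    for value in results:
        channel.add(value)


def run_box(source: Channel, expression, sink: Channel) -> None:
    """A box: the expression never touches the channels directly."""
    data = list_input_protocol(source)
    element_output_protocol(expression(data), sink)


incoming, outgoing = Channel(int), Channel(int)
for n in (1, 2, 3):
    incoming.add(n)
run_box(incoming, lambda xs: [x * x for x in xs], outgoing)
```

The expression passed to run_box is an ordinary function on lists, which mirrors the point that the user's function carries no knowledge of channels.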
1. To create a coordination language that supports distributed functional programming. It should support the D-Clean higher-abstraction coordination language and should be suitable for developing distributed solutions in a functional programming language. To work out the syntactic and static semantic rules of the coordination language, on the basis of which D-Box language descriptions can be processed and checked.
2. To give the specifications of the operation of the coordination language, which can serve as the basis of a code-generating tool. To apply external pattern files during code generation, so that patterns can be worked out for other platforms and middleware as well. The code-generating tool should parameterize the patterns with types and produce Clean source code. To work out the macros that the code-generating tool applies and with which the patterns can be written. To create concrete pattern files with which code can be generated for a given operating system and middleware.
3. To create the run-time system that complements the services of the middleware, and to give an informal syntactic and semantic description of its API functions. These functions can be referred to in the pattern files. The system should be suitable for preparing the execution of projects generated from D-Box language descriptions and for supporting them while they run.
4. To test the environment created in the previous steps, to run concrete distributed computational tasks, and to measure effectiveness and performance.
Antecedents of topic choice
In the first phase of my research I dealt with the MPI and PVM programme libraries under Linux, which use the message-passing communication method. However, the complexity and weak typing of the C and C++ programming languages make them an unfavourable environment for writing distributed programmes, not to mention their cumbersome debugging and tracing methods.
Later, building on the possibilities offered by the .NET Framework, I prepared an implementation along similar guidelines for an object-oriented platform, in C#. The implementation was written in pure C#, and communication was provided by low-level socket handling; in essence it was a collection of classes furnished with communication-supporting methods. I prepared a simple run-time system for it, which later became the basis of the D-Box run-time system discussed in the present thesis.
The Clean functional programming language is an attractive programming platform owing to its many favourable characteristics. Unfortunately, Clean does not support distributed execution at the moment. It seemed necessary to develop a higher-level process-description and coordination system. To this end Zsók Viktória, a researcher at ELTE, designed the D-Clean language, with which communication can be described by means similar to functional language expressions. I joined the research at that point. The design of the D-Box language was a joint effort. In this research I prepared the D-Clean translator, the syntax and semantics of the D-Box language, and the D-Box code generator and run-time system. I also implemented the vast majority of the D-Box pattern files, and I prepared an earlier implementation of the D-Box protocol templates, which had to be reconsidered because of the SplitF protocol introduced later. The present protocol implementations in Clean were prepared by Diviánszky Péter, a researcher at ELTE.
In our solution, the functions written by the user do not have to contain any information related to communication or to the run-time system. The generated code hides this completely from the Clean functions. This makes it possible to develop and test the functions in a traditional environment and then to transfer them to the distributed environment without any alteration.
Antecedents of the research topic
Among the functional programming languages, Concurrent Haskell, which possesses a parallel evaluation system, was studied. The JoCaml language extends the Objective Caml language with the join calculus and has dedicated support for parallel and distributed development. It knows the concepts of channel and abstract location. Communication among the processes takes place through channels. Erlang supports the creation of concurrent, real-time, distributed and fault-tolerant systems. The language has built-in means for distribution and message transmission; it achieves distribution without a shared memory area, by sending messages. Hume is a strongly typed programming language with functional foundations, whose primary concern is checking that running time and resource use are bounded. It applies an asynchronous communication model whose basic element is the box. The boxes have unique identifiers, and the incoming and outgoing data are typed. The boxes can be connected; the connections are described separately from the box definitions, and in the course of this it is checked whether the connection types are correct. The boxes can also be connected through devices (stream, port, etc.). A box can even be connected to itself, forming a one-element loop. The connections among the boxes are called wires. Initial values can be placed on the wires, which is useful when creating loops. For the Eden functional programming language, Jost Berthold created an implementation language called EdI, in which he defined channel-creating and data-sending primitives. Because of the lazy evaluation of Eden, he defined evaluation strategies that can force the computation (and sending) of a value on the sending side.
With the ObjectIO library developed for the Clean functional programming language, interactive programmes containing menus and dialog windows can be developed. Uniqueness typing makes it possible to handle resources in a purely functional way: a value of a unique type cannot be duplicated.
The port of ObjectIO to Linux is not complete, so it cannot be relied upon in heterogeneous systems. Concurrent Clean was a purely functional, strongly typed language that supported parallel and distributed evaluation as well. Unfortunately this extension of the language could not keep pace with the newer language versions, and its development has stopped. In the early phase of Clean there was also a language version with transputer support. Annotations made it possible to mark the parts of an expression that can be evaluated in parallel. The parallelization strategies were based on these annotations, with which an evaluation order could be set.
The syntax of the D-Box coordination language is given both as an EBNF description and as lex + yacc definitions. The rules needed to check static syntactic correctness were described formally. The operation of the D-Box protocols was specified. The operation of the run-time system was given in the form of a natural operational semantics. The D-Box lexical and semantic analyzer is a programme written in C, not one generated from the BNF. C++, C and Clean code is read from external pattern files and parameterized on the basis of the information included in the D-Box definitions. This is solved with the help of macros built into the pattern files. The Start expression of the boxes is not created from patterns: since this code depends strongly on the applied protocol and channels, it is generated directly by the D-Box translator.
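The idea of parameterizing an external pattern file through macros can be sketched as follows. The thesis summary does not give the macro syntax, so the @NAME@ placeholder form, the macro names CHANNEL_NAME and CPP_TYPE, and the content of the pattern are all hypothetical; the sketch only shows the substitution step.

```python
import re

# Hypothetical pattern text; the real pattern files and macro names differ.
PATTERN = """\
// channel stub for @CHANNEL_NAME@
void send_@CHANNEL_NAME@(const @CPP_TYPE@& value);
"""

MACRO = re.compile(r"@([A-Z_]+)@")  # assumed macro syntax: @NAME@


def expand(pattern: str, values: dict) -> str:
    """Replace every macro with its value; unknown macros are an error."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in values:
            raise KeyError(f"undefined macro: {name}")
        return values[name]

    return MACRO.sub(substitute, pattern)


generated = expand(PATTERN, {"CHANNEL_NAME": "ch1", "CPP_TYPE": "int"})
```

Failing on an undefined macro, rather than leaving it in the output, is a design choice of this sketch: a leftover placeholder would otherwise surface only later as a compile error in the generated code.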
The syntactic correctness of the structure built up from lexical elements is checked by a parser that is not purely LALR(1). The attached static semantic analyzer operates in a separate pass, as does code generation. We perform no code-optimizing steps: optimization is not important in the code of the channels, and the code of the boxes is generated in Clean, where optimization is done by the Clean compiler and its lazy evaluation system. The middleware services of the run-time system are extended by special services essential to running the project. ICE functions are called from the Clean code through interfaces. The source of the interface is in the pattern files as well.
The syntax and static semantics of D-Box language
I have created a coordination language with which a computation graph can be defined whose nodes run applications written in the Clean functional programming language. The coordination language also supports special language elements, such as values of the *World type. Values of such types cannot be carried through channels; instead, a replacement value can be generated on the host side.
The syntax was given in EBNF form and in a form that can be processed by lex and yacc. During the static semantic check, the input channels of a box were checked against the applied input protocol, and the output protocol against the output channels. It was analyzed whether the protocol is able to create the input parameters of the applied expression and whether it is able to handle the output values. Furthermore, it was analyzed whether each input and output channel is used only once and whether the dynamic sub-graph startups can be fulfilled at run time. In addition, it was analyzed whether the communication between the boxes is closed with respect to the sub-graph.
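Two of the static checks listed above, namely that each channel is used only once on each side and that communication is closed within a sub-graph, can be sketched as a single counting pass. The Box record and the function check_channel_usage are assumptions made for the illustration; the real analyzer works on D-Box definitions and is written in C.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Box:
    name: str
    inputs: Tuple[int, ...]   # channel ids read by the input protocol
    outputs: Tuple[int, ...]  # channel ids written by the output protocol


def check_channel_usage(boxes: List[Box]) -> List[str]:
    """Report channels that lack exactly one reader and one writer.

    A channel with no reader or no writer inside the sub-graph means the
    communication is not closed; more than one means it is used twice.
    """
    readers = Counter(ch for box in boxes for ch in box.inputs)
    writers = Counter(ch for box in boxes for ch in box.outputs)
    errors = []
    for ch in sorted(set(readers) | set(writers)):
        if readers[ch] != 1:
            errors.append(f"channel {ch}: {readers[ch]} readers, expected 1")
        if writers[ch] != 1:
            errors.append(f"channel {ch}: {writers[ch]} writers, expected 1")
    return errors


pipeline = [Box("A", (), (1,)), Box("B", (1,), (2,)), Box("C", (2,), ())]
broken = [Box("A", (), (1,)), Box("B", (1,), ()), Box("C", (1,), (2,))]
```

Here pipeline passes the check, while broken is rejected because channel 1 has two readers and channel 2 has no reader.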
Specification of D-Box language primitives
I have introduced the syntactic and static semantic rules of the coordination language, on the grounds of which I have created a working syntax-checking programme. A template collection for the Windows operating system and the ICE middleware was also created and attached to the thesis as a DVD supplement. The names of the template files and the directory structure were put into an XML configuration file, which makes the code-generating phase easier to parameterize.
The specification of the coordination language discusses the operation of the protocols, the serialization of more complex structures (lists, lists of lists), and the steps of writing to a channel. We have also specified the reading of channels, the deserialization of a symbol sequence arriving on a channel, and the assembly of protocol results. The operation of the input and output protocols was precisely defined this way.
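The serialization of nested lists into a flat symbol sequence, and the reverse step on the receiving side, can be illustrated with a small sketch. The actual D-Box wire format is not given in this summary; the bracket markers OPEN and CLOSE and the functions serialize and deserialize are assumptions, and the payload is assumed to consist of integers so that it cannot collide with the markers.

```python
from typing import Iterable, Iterator

OPEN, CLOSE = "[", "]"  # assumed list delimiters, not the D-Box format


def serialize(value) -> Iterator:
    """Flatten an int or an arbitrarily nested list into a symbol sequence."""
    if isinstance(value, list):
        yield OPEN
        for item in value:
            yield from serialize(item)
        yield CLOSE
    else:
        yield value


def deserialize(symbols: Iterable):
    """Rebuild the value from the symbols arriving on a channel."""
    stream = iter(symbols)

    def parse(symbol):
        if symbol == OPEN:
            items = []
            for nxt in stream:  # continues on the shared iterator
                if nxt == CLOSE:
                    return items
                items.append(parse(nxt))
            raise ValueError("unterminated list")
        if symbol == CLOSE:
            raise ValueError("unexpected list terminator")
        return symbol

    return parse(next(stream))


wire = list(serialize([[1, 2], [3]]))
```

Because deserialize consumes the symbols one by one from an iterator, it can start rebuilding a list before the whole sequence has arrived, which fits the element-wise writing to a channel described above.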
The operation of the protocols is based on the lazy evaluation of the Clean language rather than on parallel evaluation. The processing starts with the output protocol, which triggers the evaluation of the expression. The expression takes the incoming data from the input protocol in a lazy way, and thereby causes the implicit reading of the input channels. The generated data are put on the channels by the output protocol.
The D-Box translator takes the definitions describing the computation graph either from a file or directly from the D-Clean translator. The latter is the result of the integration of the two translators. After the D-Box definitions had been evaluated and their static semantics checked, source code was generated on the basis of the pattern files. To support code generation, macros were developed, the description of which is included in the attachment of the present thesis.
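The demand-driven order in which the protocols operate can be imitated with Python generators, which approximate lazy evaluation only loosely. In this sketch the output protocol is the only active loop; the function names, the squaring expression and the log list are assumptions introduced to make the order of events visible.

```python
from collections import deque
from typing import Deque, Iterator, List


def input_protocol(channel: Deque[int], log: List[str]) -> Iterator[int]:
    """Reads the channel only when the next element is demanded."""
    while channel:
        item = channel.popleft()
        log.append(f"read {item}")
        yield item


def expression(items: Iterator[int], log: List[str]) -> Iterator[int]:
    """The user's computation; it knows nothing about channels."""
    for x in items:
        log.append(f"compute {x}")
        yield x * x


def output_protocol(results: Iterator[int], out: Deque[int], log: List[str]) -> None:
    """Drives the whole box: each demand here pulls one input element."""
    for r in results:
        out.append(r)
        log.append(f"write {r}")


log: List[str] = []
incoming: Deque[int] = deque([1, 2])
outgoing: Deque[int] = deque()
output_protocol(expression(input_protocol(incoming, log), log), outgoing, log)
```

After the run, log shows that reading, computing and writing interleave per element instead of running as three separate phases, which is the effect the paragraph above attributes to lazy evaluation.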
The Clean Start function of each generated computation node (box) is created by the translator without patterns, since practically every line of it depends on the types and number of the applied channels and on the protocol. We use the C++ linker, as we could not apply the Clean linker to the task. C++ object files are also linked with the code of the box.
Formal semantics and implementation of the run-time
The thesis describes the state of the run-time system and its operational semantics. During execution, starting from an initial state, the boxes forming the spine of the project are launched and the channel start commands are processed. In the meantime new boxes can be started dynamically and new channel start commands can enter the system. In the finish state of the project every box has been started and every channel start command has been fulfilled. The run-time system places the generated binary code into a code library service, from which the project can be started. In doing so, the system first instantiates the boxes belonging to the initial sub-graph. Once started, the boxes can request the starting of channels and the instantiation of boxes belonging to further sub-graphs. A name provider ensures that the running components find each other. The run-time system contains a scheduler, which decides which component is placed on which concrete computer.
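The state transitions described above can be sketched as a loop over pending start commands. This is a strongly simplified model, not the operational semantics of the thesis: the round-robin placement policy, the dictionary used as name provider, and the names run_project, initial_commands and requests are all assumptions.

```python
from collections import deque
from itertools import cycle
from typing import Dict, List, Tuple

Command = Tuple[str, str]  # ("box" | "channel", component name)


def run_project(
    initial_commands: List[Command],
    requests: Dict[str, List[Command]],
    hosts: List[str],
) -> Dict[str, str]:
    """Process start commands until none is pending (the finish state)."""
    scheduler = cycle(hosts)          # assumed policy: round-robin placement
    registry: Dict[str, str] = {}     # name provider: component -> host
    pending = deque(initial_commands)
    while pending:
        kind, name = pending.popleft()
        if name in registry:
            raise ValueError(f"{name} was started twice")
        registry[name] = next(scheduler)
        if kind == "box":
            # A started box may ask for channels and further boxes.
            pending.extend(requests.get(name, []))
    return registry


placement = run_project(
    initial_commands=[("box", "A")],
    requests={"A": [("channel", "c1"), ("box", "B")]},
    hosts=["host1", "host2"],
)
```

The loop terminates exactly when every issued command has been fulfilled, which corresponds to the project finish state; a real scheduler would of course weigh load and locality instead of cycling through the hosts.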
Verification of the applicability of the D-Box system on a problem class
Translation, code generation and execution were checked by solving a real-life, computation-intensive problem. Measurements and charts of the distributed execution of the application show that the generated code is efficient for this problem class. The running times of the distributed case, measured on 4, 8 and 16 nodes with varied cutting points and operation complexity, approached the expected maximal speed-up.
William Gropp, Ewing Lusk, Anthony Skjellum: Using MPI - Portable Parallel Programming with the Message-Passing Interface, MIT Press, 1999.
Al Geist, Adam Beguelin, Jack Dongarra, Weicheng Jiang, Robert Manchek, Vaidy Sunderam: PVM: Parallel Virtual Machine - A Users' Guide and Tutorial for Networked Parallel Computing, MIT Press, 1994, http://www.netlib.org/pvm3/book/pvm-book.html
Kevin Hammond, Greg Michaelson, Robert Pointon: The Hume Report, version 1.1, http://www-fp.cs.st-andrews.ac.uk/hume/report/
Jost Berthold: Explicit and Implicit Parallel Functional Programming: Concepts and Implementation, PhD dissertation, 2008, Marburg.
Jones, S. P., Gordon, A., Finne, S.: Concurrent Haskell, Conference Record of POPL '96: The 23rd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, Glasgow, 1996, 11 pp.
Finne, S., Jones, S. P. J.: Concurrent Haskell, in: Principles of Programming Languages, St. Petersburg Beach, Florida, 1996, pp. 295-308.
Fournet, C., Le Fessant, F., Maranget, L., Schmitt, A.: The JoCaml language beta release, Documentation and user's manual, INRIA, 2001.
Leroy, X. et al.: The Objective Caml Language (version 3.10), Software and documentation, available at http://caml.inria.fr, 2007.
Barklund, J., Virding, R.: Erlang Reference Manual, 1999. Available from http://www.erlang.org/download/erl_spec47.ps.gz (2007.06.01).
Kesseler, M.H.G.: The Implementation of Functional Languages on Parallel Machines with Distributed Memory, PhD Thesis, Catholic University of Nijmegen, 1996.
Serrarens, P.R.: Communication Issues in Distributed Functional Computing, PhD Thesis, University of Nijmegen, January 2001.
Horváth Z., Zsók V., Serrarens, P., Plasmeijer, R.: Parallel Elementwise Processable Functions in Concurrent Clean, Mathematical and Computer Modelling 38, pp. 865-875, Pergamon, 2003.
Horváth Zoltán, Hernyák Zoltán, Zsók Viktória: Coordination Language for Distributed Clean, Acta Cybernetica (ISSN: 0324-721X), Vol. 17 (2), Institute of Informatics, University of Szeged, Szeged, Hungary, 2005, pp. 247-271. Selected publication of CSCS PhD Conference in Computer Science.
Achten, P., Wierich, M.: A Tutorial to the Clean Object I/O Library, University of Nijmegen, 2000. http://www.cs.kun.nl/~clean
Plasmeijer, R., van Eekelen, M.: Functional Programming and Parallel Graph Rewriting, Addison-Wesley, 1993.
van Eekelen, M. et al.: Concurrent Clean, Technical Report no 90-20, November 1990, University of Nijmegen.
Plasmeijer, R., van Eekelen, M.: Concurrent Clean Language Report, University of Nijmegen, 2001.
Hernyák Zoltán: PEDPI as a Message Passing Interface with OO support, in: Striegnitz, Jörg; Davis, Kei (Eds.) (2003) Proceedings of the Workshop on Parallel/High-Performance Object-Oriented Scientific Computing (POOSC'03), Interner Bericht FZJ-ZAM-IB-2003-09, July 2003, pp. 93-100.
List of publications
1. Horváth Zoltán, Hernyák Zoltán, Zsók Viktória: Coordination Language for Distributed Clean, Acta Cybernetica (ISSN: 0324-721X), Vol. 17 (2), Institute of Informatics, University of Szeged, Szeged, Hungary, 2005, pp. 247-271. Selected publication of CSCS PhD Conference in Computer Science.
2. Horváth Zoltán, Hernyák Zoltán, Kozsik Tamás, Tejfel Máté, Ulbert Attila: A Data Intensive Application on a Cluster - Parallel Elementwise Processing, in: P. Kacsuk, D. Kranzlmüller, Zs. Nemeth, J. Volkert (Eds.): Distributed and Parallel Systems - Cluster and Grid Computing, Proc. 4th Austrian-Hungarian Workshop on Distributed and Parallel Systems, Kluwer Academic Publishers, The Kluwer International Series in Engineering and Computer Science, Vol. 706, pp. 46-53, Linz, Austria, September 29-October 2, 2002.
3. Zsók Viktória, Hernyák Zoltán, Horváth Zoltán: Designing Distributed Computational Skeletons in D-Clean and D-Box, in: Horváth Zoltán (ed.): Central European Functional Programming School (The First Central European Summer School, CEFP 2005, Budapest, Hungary, July 4-15, 2005), Revised Selected Lectures, Lecture Notes in Computer Science, ISSN 0302-9743, vol. 4164, 2006, pp. 229-265.
4. Zsók Viktória, Hernyák Zoltán, Horváth Zoltán: Distributed Pattern Design in D-Clean, Central European Functional Programming School, CEFP 2005, ELTE, Budapest, Hungary, July 4-15, 2005, Lecture Notes, 33 pages.
5. Zsók Viktória, Hernyák Zoltán, Horváth Zoltán: Improving the Distributed Elementwise Processing Implementation in D-Clean, in: Horváth Z., Kozma L., Zsók V. (eds): Proceedings of the 10th Symposium on Programming Languages and Software Tools (ISBN: 978-963-463-925-1), SPLST 2007, Dobogókő, Hungary, June 14-16, 2007, Eötvös University Press, 2007, pp. 256-264.
6. Zsók Viktória, Hernyák Zoltán, Horváth Zoltán: Distributed Pattern Design in D-Clean, in: Vene V., Meriste M. (eds): Proceedings of the Ninth Symposium on Programming Languages and Software Tools, ISBN: 9949-11-113-7, SPLST 2005, Tartu, Estonia, 13-14 August, 2005, Tartu University Press, 2005, pp. 220-234.
Publications in refereed proceedings
7. Zsók Viktória, Hernyák Zoltán, Horváth Zoltán: Distributed Computation on Cluster using D-Clean and D-Box. Extended abstract, in: Davis, K., Quintino, T., Striegnitz, J. (eds): 5th Workshop on Parallel/High Performance Object-Oriented Scientific Computing, POOSC'06 at 20th European Conference on Object-Oriented Programming, ECOOP 2006, Nantes, France, 3rd July, 2006, 3 pages. Summary: Object-Oriented Technology, ECOOP 2006 Workshop Reader, ECOOP 2006 Workshops, Nantes, France, July 3-7, 2006, Final Reports, LNCS 4379, Springer Verlag, 2007, pp. 141-145.
8. Horváth Zoltán, Hernyák Zoltán, Zsók Viktória: Implementing Distributed Skeletons using D-Clean and D-Box, in: Butterfield, A. (ed): Proceedings of the 17th International Workshop on Implementation and Application of Functional Languages, IFL 2005, Dublin, Ireland, September 19-21, 2005, pp. 1-16.
9. Hernyák Zoltán, Horváth Zoltán, Zsók Viktória: Clean-CORBA Interface Supporting Pipeline Skeleton, in: Csőke Lajos (ed.): Proceedings of 6th International Conference on Applied Informatics, Eger, Hungary, January 27-31, 2004. Eger, Hungary, B.V.B. Press, Vol. I. pp. 191-200.
Publications in international conference proceedings
10. Zsók Viktória, Horváth Zoltán, Hernyák Zoltán: Distributed Elementwise Processing in D-Clean, in: Nilsson, H. (ed): Proceedings of the Seventh Symposium on Trends in Functional Programming, TFP 2006, Nottingham, UK, 19-21 April, 2006, The University of Nottingham, pp. 378-386.
11. Hernyák Zoltán, Horváth Zoltán, Zsók Viktória: Design of Language Elements for Dynamic Distributed Computation of Clean Expressions on Clusters, in: Loidl, H-W. (ed): Proceedings of Fifth Symposium on Trends in Functional Programming, TFP 2004, Munich, Germany, November 25-26, 2004, Ludwig-Maximilians University, pp. 257-270.
12. Hernyák Zoltán: PEDPI as a Message Passing Interface with OO support, in: Striegnitz, Jörg; Davis, Kei (Eds.) (2003) Proceedings of the Workshop on Parallel/High-Performance Object-Oriented Scientific Computing (POOSC'03), Interner Bericht FZJ-ZAM-IB-2003-09, July 2003, pp. 93-100.