CHAPTER 5 EMBEDDING DIVERSE PROGRAMMING
5.1 Related Work
Multi-model (or multi-paradigm) programming, and the ability of programming languages to accommodate that style, is a topic that extends far beyond the confines of parallel computing. For example, C++ is often referred to as a multi-paradigm language, in that it supports programming with an imperative style, object-oriented programming, and template-based metaprogramming and functional programming. For the purposes of this discussion, however, we limit ourselves to those systems which allow multiple programming paradigms which are specifically parallel in nature. These systems can be roughly categorized as multi-paradigm parallel languages, parallel programming model extensions, interoperability frameworks, or runtime systems which unify multiple programming models.
Before we begin a discussion of the particular programming models that we have embedded in Charj, we should first make more explicit what we mean by “embedding” in this context, as this term can denote several distinct techniques for incorporating multiple models within a program or programming environment.
In some cases, supporting embedded programming models means that the language lets users or library writers define their own custom syntax which integrates closely with the parent language, so that syntax for domain-specific models which were not envisioned by the original language developers can be cleanly integrated into the basic model after the fact.
One example of this use is “Rakefile” syntax in the Ruby programming language, which provides constructs for managing software build and deployment processes similar to the common Unix “make” utility. Rakefiles are valid Ruby programs which incorporate this embedded special syntax to simplify the process of describing dependencies, build environments, and so on [38, 39]. Similar techniques have been applied in other high-level languages such as Scala [40] and Haskell [41].
This practice of users embedding syntax in a parent language is perhaps best exemplified by the Lisp macro system, which allows the programmer to specify sophisticated operations on the Lisp s-expressions that make up their program. This allows the creation of new control structures and similar constructs at the level of libraries or application code. The extensive use of such constructs in Lisp programs is made possible by the fact that Lisp programs are essentially explicit representations of their own syntax tree, with no intermediate syntax. As a result, Lisp is a very effective platform for embedding new syntax to support domain-specific languages or otherwise extend core language functionality [42].
Another use of “embedding” in the context of programming models is to describe the use of preprocessing tools which transform an input file that mixes the parent model and the embedded model, and output a pure program that consists only of code in the parent model. This allows the developer of the embedded model to piggyback on the infrastructure provided by the parent model and potentially lower the barrier to entry for users of the embedded model. For example, the MetaBorg system allows the user to embed domain-specific languages in the Java language to support such tasks as user interface specification and XML generation [43, 44], and Lua-ML allows the embedding of Lua code within ML applications [45].
However, these uses are not what we mean to indicate by the term “embedding.” Rather, we take embedding to mean that we incorporate syntax and other program constructs from other special-purpose programming models, or extensions to the base message-driven Charj programming model, directly within Charj programs, in a form that the Charj compiler understands and that targets a common runtime system. Whereas in the original Charm++ system these models would require either their own translator support or implementation as a C++ library, in Charj we can directly incorporate them as first-class citizens, making them possible targets for static checking and optimizations, as well as giving them simpler, more direct syntax. These models typically target a particular class of parallel interactions, such as disciplined access to global arrays, or the interaction between host hardware and accelerator hardware in software which runs on hybrid systems. By embedding these models within Charj, we make it easier for programmers to make use of the advantages they provide in their particular problem domains while maintaining the advantage of a unified Charj infrastructure.
We also avoid the need to embed models which would benefit from custom syntax in a language which supports custom syntax only weakly. While we discussed the embedding of custom syntax in Ruby and Lisp above, these languages are not widely used in HPC application development. In that arena one is much more likely to encounter programs written in C, C++, or Fortran. C and Fortran provide essentially no support for incorporating special-purpose syntax into their base syntaxes. C++ provides greater opportunities, particularly through its template metaprogramming facilities. Indeed, the multiphase shared arrays programming model which we embed in Charj and which is discussed further in section 5.4 was originally implemented as a C++ library, and later revised to improve the static checking provided to the programmer by leveraging the C++ type system. However, C++ provides only weak embedding facilities compared to languages like Lisp and Ruby, and greatly constrains the syntax and language constructs that a model implementer can feasibly provide. As we discuss in the subsequent section, embedding this model directly in Charj provides a variety of benefits that are not available to implementations that are constrained to use only the facilities provided by C++.
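To make the idea of leveraging the C++ type system for static checking more concrete, the following is a minimal single-process sketch of phase-typed array handles. It is not the actual multiphase shared arrays API: the names WriteHandle, ReadHandle, sync_to_read, sync_to_write, and phase_demo are hypothetical and chosen only for illustration, and the real library is distributed rather than backed by a local vector. The sketch assumes that each phase is represented by its own handle type, so that an operation which is illegal in the current phase simply does not exist on the handle.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical sketch only: these types are NOT the real MSA interface.
using Storage = std::shared_ptr<std::vector<int>>;

class ReadHandle;

// Handle for a write phase: it offers set() but deliberately no get().
class WriteHandle {
public:
    explicit WriteHandle(std::size_t n)
        : data_(std::make_shared<std::vector<int>>(n, 0)) {}

    void set(std::size_t i, int value) { data_->at(i) = value; }

    // Ends the write phase. The && qualifier forces the caller to give up
    // the handle explicitly with std::move.
    ReadHandle sync_to_read() &&;

private:
    friend class ReadHandle;
    explicit WriteHandle(Storage d) : data_(std::move(d)) {}
    Storage data_;
};

// Handle for a read phase: it offers get() but deliberately no set().
class ReadHandle {
public:
    int get(std::size_t i) const { return data_->at(i); }

    // Ends the read phase and returns a handle for a new write phase.
    WriteHandle sync_to_write() &&;

private:
    friend class WriteHandle;
    explicit ReadHandle(Storage d) : data_(std::move(d)) {}
    Storage data_;
};

inline ReadHandle WriteHandle::sync_to_read() && {
    return ReadHandle(std::move(data_));
}

inline WriteHandle ReadHandle::sync_to_write() && {
    return WriteHandle(std::move(data_));
}

// Usage: fill the array in a write phase, then sum it in a read phase.
int phase_demo() {
    WriteHandle w(3);
    w.set(0, 1);
    w.set(1, 2);
    w.set(2, 3);
    // w.get(0);  // would not compile: WriteHandle has no get()
    ReadHandle r = std::move(w).sync_to_read();
    assert(r.get(0) == 1);
    return r.get(0) + r.get(1) + r.get(2);
}
```

In this sketch a read during a write phase is rejected at compile time, but nothing stops the programmer from touching the moved-from handle w after the phase change. That gap is one illustration of the kind of constraint a library-only embedding in C++ may run into.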