
8.2 The Sprat PDE DSL

The Sprat PDE DSL is embedded into C++ via the piggyback pattern (Mernik et al. 2005), which means that the DSL is implemented completely in C++ and all DSL models are valid C++ code. This offers the advantage of full tool support (editors, debuggers, compilers, etc.) without any additional implementation effort. C++ was chosen as the host language mainly because C and C++ are widespread among numerical mathematicians, which supports user acceptance. In addition, the operator overloading capabilities of C++ allow the DSL to feature matrix-vector expressions with a “natural” syntax.

While the implementation of the Sprat PDE DSL itself uses features that are specific to C++ and not present in C, the language is designed so that domain experts who are proficient only in C, and not in C++, should also be able to use it. As the Sprat PDE DSL does not enforce a specific program structure (in contrast to a framework) and can be used alongside existing code, the DSL can be adopted stepwise in existing projects. All the aspects mentioned above are part of an effort to make the Sprat PDE DSL as easy to learn and use as possible and to ensure its acceptance in the target community.

The focus of the Sprat PDE DSL is on the implementation of special-purpose FEM solvers. It addresses developers of such algorithms rather than FEM practitioners, who are not necessarily interested in how a certain PDE or system of PDEs is solved. Therefore, the language does not feature the most abstract concepts of the FEM (say, variational forms) but concentrates on entities that allow one to conveniently model mesh-based PDE solvers.

From a technical perspective, the language consists of a set of header files written in C++11 that application programmers can use via include statements. These headers expose a set of classes, macros, and functions that interact with each other to implement four key feature areas, which are illustrated by Listing 8.1 and described below.

```
 1  DistributedVector u, q;
 2  ElementVectorArray F_L;
 3  ElementMatrixArray C;
 4  ElementMatrix D;
 5
 6  foreach_omp(tau, Elements(mesh), private(D), {
 7      foreach(i, ElementDoF(tau), {
 8          foreach(j, ElementDoF(tau), {
 9              D(i, j) = max(i.globalIndex(), j.globalIndex());
10          })
11      })
12      F_L[tau] = C[tau]*q + D*u;
13  })
14  u *= u.dotProduct(q);
15  u.exchangeData();
```

Listing 8.1. Sprat PDE DSL code snippet.

1. Coherent abstractions for the mesh topology. Solver algorithms implemented with the Sprat PDE DSL can be expressed independently of the employed mesh type and of the number of its spatial dimensions. Therefore, the mesh type and its dimension can be varied without having to modify the algorithm itself. This is made possible by abstractions, such as elements and Degrees of Freedom (DoF), that are coherent for all mesh types and any number of dimensions and “know” how to handle common mesh-specific tasks. An example of this can be seen in line 6 of Listing 8.1, where, depending on the type of mesh, tau is automatically chosen to be of a corresponding element type that adjusts its behavior accordingly (e. g., it would “know” how to correctly compute the integral, integrated by parts, that couples two of its degrees of freedom when assembling a discrete Laplace operator).

2. Lazily-evaluated matrix-vector arithmetic with a natural and declarative syntax, parallel execution, and DSOs. Lazy evaluation of matrix-vector expressions means that no temporary variables are created during their evaluation. For example, in line 12 of Listing 8.1, the assignment of the right-hand side expression to the element vector F_L[tau] is computed by directly adding the individual contributions of C[tau]*q and D*u to F_L[tau]. Additionally, we employ DSOs that fuse the computation of C[tau]*q and D*u into a single loop, which avoids iterating multiple times over matrices with a common structure. Moreover, the evaluation of matrix-vector expressions is automatically executed in parallel. For further details on lazy evaluation and DSOs in the Sprat PDE DSL, see Section 8.2.2.

3. (Parallel) iterations over sets. One of the most common tasks in numerical algorithms is to use an integer variable to iterate over some index range. The drawback of this approach is that the iteration variable has no semantic connection with the objects that are iterated over. We therefore added iterations over sets, which state explicitly that, e. g., the variable tau in line 6 iterates over the set of all elements of the mesh. Moreover, as the iteration variables are thin wrappers around indices, functionality related to the objects they represent can be requested from them directly (for example, in line 9, we ask the element DoF i for its global index). Iterations over sets can be parallelized using OpenMP clauses (Dagum and Menon 1998), as shown in line 6.

4. Optional Single Program, Multiple Data (SPMD) abstractions for parallel computing (Duncan 1990). We use MPI (Message Passing Interface Forum 2012) to distribute computations across different compute nodes. Many data types, such as the DistributedVector or the mesh classes, feature high-level abstractions that transparently handle data exchange between compute nodes. For example, in line 15, the distributed vector u is instructed to exchange data regarding duplicated ghost DoF in the mesh after being updated in line 14. The method for calculating the dot product of u and q in line 14 is also aware of the distributed nature of the problem and automatically computes the correct value for the global problem, not just the node-local answer. Furthermore, meshes can be automatically partitioned and distributed across different compute nodes. Since many computational scientists have already implemented their own parallelization framework (cf. Section 6.2.3 e)), we made the parallelization features of the Sprat PDE DSL completely optional (via compile-time switches).

To achieve good data locality, we combine MPI with OpenMP. OpenMP is used to automatically execute vector operations, such as the update of u in line 14, in parallel. Additionally, as mentioned before, OpenMP-enabled versions of our set-based iterations exist.
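To make the idea of iteration variables as thin wrappers around indices more concrete, the following C++ sketch shows one possible way to write set iterations with plain templates and lambdas. It is not the Sprat PDE DSL implementation (Listing 8.1 uses its own foreach and foreach_omp constructs); the mesh type TinyMesh, the classes Element and ElementDoF, and the functions forEachElement, forEachElementParallel, forEachElementDoF, and maxIndexMatrix are hypothetical names chosen for illustration. Because every accessor is a small inline member function that performs a single array look-up, a compiler can typically inline the index computation away.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal mesh: every element has the same number of DoF, and
// their global indices are stored element by element in one flat array.
struct TinyMesh {
    std::size_t dofPerElement;
    std::vector<std::size_t> connectivity;
    std::size_t numElements() const { return connectivity.size() / dofPerElement; }
};

// Iteration variable for "local DoF l of element e": a pointer and two indices,
// so it is cheap to copy and globalIndex() is a single inline array look-up.
class ElementDoF {
public:
    ElementDoF(const TinyMesh& mesh, std::size_t elem, std::size_t local)
        : mesh_(&mesh), elem_(elem), local_(local) {}
    std::size_t localIndex() const { return local_; }
    std::size_t globalIndex() const {
        return mesh_->connectivity[elem_ * mesh_->dofPerElement + local_];
    }
private:
    const TinyMesh* mesh_;
    std::size_t elem_;
    std::size_t local_;
};

// Iteration variable for an element; it "knows" how to hand out its DoF.
class Element {
public:
    Element(const TinyMesh& mesh, std::size_t index) : mesh_(&mesh), index_(index) {}
    std::size_t index() const { return index_; }
    std::size_t numDoF() const { return mesh_->dofPerElement; }
    ElementDoF dof(std::size_t local) const { return ElementDoF(*mesh_, index_, local); }
private:
    const TinyMesh* mesh_;
    std::size_t index_;
};

// Serial iteration over the set of all elements of a mesh.
template <typename Body>
void forEachElement(const TinyMesh& mesh, Body body) {
    for (std::size_t e = 0; e < mesh.numElements(); ++e) body(Element(mesh, e));
}

// Parallel variant: the pragma is ignored when OpenMP is not enabled. The body
// must not write to data shared between elements without synchronization.
template <typename Body>
void forEachElementParallel(const TinyMesh& mesh, Body body) {
    const long n = static_cast<long>(mesh.numElements());
#pragma omp parallel for
    for (long e = 0; e < n; ++e) body(Element(mesh, static_cast<std::size_t>(e)));
}

// Iteration over the DoF of one element.
template <typename Body>
void forEachElementDoF(const Element& tau, Body body) {
    for (std::size_t l = 0; l < tau.numDoF(); ++l) body(tau.dof(l));
}

// Counterpart of lines 7 to 11 of Listing 8.1: D(i, j) = max(global i, global j),
// stored row-major in a numDoF x numDoF array.
std::vector<std::size_t> maxIndexMatrix(const Element& tau) {
    const std::size_t n = tau.numDoF();
    std::vector<std::size_t> D(n * n);
    forEachElementDoF(tau, [&](ElementDoF i) {
        forEachElementDoF(tau, [&](ElementDoF j) {
            D[i.localIndex() * n + j.localIndex()] = std::max(i.globalIndex(), j.globalIndex());
        });
    });
    return D;
}
```

For a mesh with two elements sharing DoF 1, `maxIndexMatrix(Element(mesh, 1))` yields the entries 1, 2, 2, 2, which is what line 9 of the listing computes for that element.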


By combining these features, the Sprat PDE DSL allows one to express mesh-based PDE algorithms concisely, in a way that closely resembles their representation in mathematical textbooks or articles. Because of this high abstraction level, it is relatively easy to apply typical changes to a solver algorithm (such as generalizing it to more dimensions) and to check whether the implemented algorithm actually corresponds to a formal description such as one in a paper (since the two look almost the same). The compact notation for matrix-vector arithmetic also simplifies writing tests: checking, for example, that all entries of the sum of two vectors are positive can be expressed in a one-line statement (assert(u+v > 0)).
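The one-line test above can be sketched with a lazily evaluated sum whose comparison with a scalar means “holds for every entry”. This is a minimal illustration under our own assumptions, not the DSL's implementation: Vector, Sum, and the two operators are hypothetical, and only the pattern of a vector sum compared with a scalar is supported.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical vector type.
struct Vector {
    std::vector<double> data;
    std::size_t size() const { return data.size(); }
    double operator[](std::size_t i) const { return data[i]; }
};

// Lazily evaluated element-wise sum: it stores references to its operands
// and computes entries on demand instead of allocating a result vector.
struct Sum {
    const Vector& lhs;
    const Vector& rhs;
    std::size_t size() const { return lhs.size(); }
    double operator[](std::size_t i) const { return lhs[i] + rhs[i]; }
};

inline Sum operator+(const Vector& lhs, const Vector& rhs) {
    assert(lhs.size() == rhs.size());
    return Sum{lhs, rhs};
}

// Comparing a vector expression with a scalar yields true only if the
// comparison holds for every entry; evaluation stops at the first violation.
inline bool operator>(const Sum& expr, double bound) {
    for (std::size_t i = 0; i < expr.size(); ++i)
        if (!(expr[i] > bound)) return false;
    return true;
}
```

With these definitions, `assert(u + v > 0)` compiles as written, evaluates the sum entry by entry, and never materializes u + v as a vector.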

Furthermore, it can be assumed that the language is easy to learn for a numerical mathematician who is already familiar with C or C++. The user has to interact with only a handful of data types and is almost certainly already acquainted with the concepts of matrix-vector arithmetic and iterations over sets. Additionally, many FEM algorithms share a similar structure, which makes it possible to supply users with a skeleton for their implementation. Such a skeleton also encourages users to employ all the features of the DSL rather than reimplementing existing features in the host language.

8.2.1 Domain Meta-Model

To examine the features of the Sprat PDE DSL more closely, this section presents an overview of the meta-model of the language. As with the description of the meta-model of DSL hierarchies in Chapter 7, we use a combination of the Unified Modeling Language (UML) and Object-Z for this purpose. Note that, for the sake of clarity, our presentation of the PDE DSL meta-model focuses only on key concepts of the language. Therefore, the formal specification omits several details that are irrelevant to a high-level understanding of the DSL. For a complete reference of the language, we refer to the implementation available online (Johanson 2015d).

To make the presentation easy to follow, we divide the meta-model into the four main feature areas presented above. The SPMD abstractions for parallel computing, however, are represented only by the single meta-class Parallel Execution Environment.

Figure 8.2. Meta-model elements of the Sprat PDE DSL associated with the mesh topology.

Mesh Topology

All meta-model elements associated with the mesh topology are depicted in Figure 8.2. As one would expect from the theory of FEM solvers (Chapter 4), a FEM Mesh consists of Elements and Degrees Of Freedom, which are linked together via Element DoF. Elements have Hypersurfaces, from which, again, the Element DoF of the element can be accessed. Furthermore, a FEM Mesh is associated with a Parallel Execution Environment, which transparently handles all SPMD parallelization tasks related to the mesh, such as mesh partitioning. Regarding such mesh partitions, ghost DoF are represented by the corresponding specialization of the Degrees Of Freedom meta-class.
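The following sketch illustrates how a solver-side routine that only talks to elements and their DoF can be written once and reused for meshes of any dimension. It assumes a hypothetical structured mesh of hypercube cells with one DoF per vertex (multilinear elements, not the rectangular P1 mesh that ships with the DSL), with the dimension as a template parameter; StructuredMesh, dofIndex, and dofValence are illustrative names only. dofValence counts how many elements share each DoF and uses nothing but the mesh interface, so the same code runs unchanged on one-, two-, or three-dimensional meshes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical structured mesh of n^Dim hypercube cells with one DoF per
// vertex. The number of spatial dimensions is a compile-time parameter.
template <std::size_t Dim>
class StructuredMesh {
public:
    static constexpr std::size_t dofPerElement = std::size_t(1) << Dim;  // 2^Dim vertices

    explicit StructuredMesh(std::size_t cellsPerDirection) : n_(cellsPerDirection) {}

    std::size_t numElements() const { return power(n_); }
    std::size_t numDoF() const { return power(n_ + 1); }

    // Global index of local DoF `local` of element `elem`. Cells and vertices are
    // numbered lexicographically with direction 0 running fastest; bit d of
    // `local` selects the lower (0) or upper (1) vertex of the cell in direction d.
    std::size_t dofIndex(std::size_t elem, std::size_t local) const {
        std::size_t global = 0;
        std::size_t stride = 1;
        for (std::size_t d = 0; d < Dim; ++d) {
            const std::size_t cell = elem % n_;
            elem /= n_;
            global += (cell + ((local >> d) & 1u)) * stride;
            stride *= n_ + 1;
        }
        return global;
    }

private:
    std::size_t power(std::size_t base) const {
        std::size_t result = 1;
        for (std::size_t d = 0; d < Dim; ++d) result *= base;
        return result;
    }

    std::size_t n_;
};

// Dimension-independent routine: count how many elements share each DoF.
// It relies only on the mesh interface, so it works for any Dim (and for any
// other mesh type offering the same members).
template <typename Mesh>
std::vector<std::size_t> dofValence(const Mesh& mesh) {
    std::vector<std::size_t> valence(mesh.numDoF(), 0);
    for (std::size_t e = 0; e < mesh.numElements(); ++e)
        for (std::size_t l = 0; l < Mesh::dofPerElement; ++l)
            ++valence[mesh.dofIndex(e, l)];
    return valence;
}
```

For example, in a 2 x 2 x 2 cube mesh the center vertex (global index 13) is shared by all eight cells, while a 1D mesh with three cells gives the valences 1, 2, 2, 1.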

A FEM Mesh can be either structured or unstructured. The Sprat PDE DSL provides several default mesh implementations, such as a rectangular mesh with P1 elements. Code skeletons are provided for users of the DSL to develop their own mesh types (as indicated by the meta-classes with the dots). All meshes have to be implemented for arbitrarily many spatial dimensions.

Figure 8.3. Meta-model elements of the Sprat PDE DSL related to matrix-vector expressions.

Matrix-Vector Expressions

The second feature area is lazily-evaluated matrix-vector expressions, as illustrated in Figure 8.3. A Matrix Vector Expression is represented as a tree containing the typical arithmetic operators. A Terminal Value is either a floating-point scalar (Number), a Vector Type, or a Matrix Type.

Besides the usual dense vectors, the Sprat PDE DSL features views on parts or strides of vectors, element vectors (see Chapter 12), and distributed vector types, as well as combinations of those types. Distributed vectors use the Parallel Execution Environment discussed above to transparently handle data exchange for ghost DoF between different compute nodes.

Among the matrix types supported by the DSL are dense, sparse, and implicit operators as well as element matrices (see Chapter 12). Sparse matrices can be stored in different formats, such as Compressed Row Storage (CRS) or List of Lists (LIL) (Pissanetzky 2014). Implicit matrices represent operators that are not stored explicitly (i. e., one cannot access individual entries) but whose application to a vector is known. The Implicit Matrix meta-class is starred in Figure 8.3 because it was not present in the initial reference implementation; domain experts suggested implementing it during the evaluation of the Sprat PDE DSL (see Chapter 13).
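As a rough illustration of the difference between a stored sparse matrix and an implicit operator, the sketch below shows a matrix in Compressed Row Storage next to an operator that only knows how to apply the one-dimensional stencil (-1, 2, -1) to a vector. The names CRSMatrix, ImplicitLaplace1D, and multiply are hypothetical. Both types expose the same rows() and apply() members, so the generic multiply function works for either one through templates rather than virtual functions, in the spirit of the implementation principles discussed in Section 8.2.2.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef std::vector<double> Vec;

// Sparse matrix in Compressed Row Storage: the non-zeros of row r are
// values[k] for k in [rowPtr[r], rowPtr[r + 1]), located in columns colIdx[k].
struct CRSMatrix {
    std::vector<std::size_t> rowPtr;  // rows + 1 entries
    std::vector<std::size_t> colIdx;  // one entry per non-zero
    std::vector<double> values;       // one entry per non-zero

    std::size_t rows() const { return rowPtr.size() - 1; }

    void apply(const Vec& x, Vec& y) const {
        for (std::size_t r = 0; r < rows(); ++r) {
            double sum = 0.0;
            for (std::size_t k = rowPtr[r]; k < rowPtr[r + 1]; ++k) sum += values[k] * x[colIdx[k]];
            y[r] = sum;
        }
    }
};

// Implicit operator: no entries are stored; only the action of the 1D stencil
// (-1, 2, -1), with zero values outside the index range, is known.
struct ImplicitLaplace1D {
    std::size_t n;

    std::size_t rows() const { return n; }

    void apply(const Vec& x, Vec& y) const {
        for (std::size_t i = 0; i < n; ++i) {
            const double left = (i > 0) ? x[i - 1] : 0.0;
            const double right = (i + 1 < n) ? x[i + 1] : 0.0;
            y[i] = 2.0 * x[i] - left - right;
        }
    }
};

// Generic product y = A x for any operator type providing rows() and apply().
// The call is resolved at compile time, so no virtual dispatch is involved.
template <typename Operator>
Vec multiply(const Operator& A, const Vec& x) {
    Vec y(A.rows(), 0.0);
    A.apply(x, y);
    return y;
}
```

Storing the tridiagonal matrix with diagonal 2 and off-diagonals -1 in CRS and applying the implicit stencil give the same result for any input vector; only the CRS version can report individual entries.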

To handle matrix-vector expressions efficiently, their evaluation must not require the creation of temporary matrices or vectors. This is not only efficient but also allows the user of the Sprat PDE DSL to stay in full control of memory allocation. To model the constraints necessary to guarantee that no temporaries are required for the evaluation, we introduce all possible categories of Terminal Values as Object-Z classes.

Terminal Value

$$\mathit{size} : \mathbb{N}$$

A Null Terminal is introduced to model empty sub-trees in Object-Z.

Null Terminal (inherits Terminal Value)

$$\mathit{size} = 0$$

Number (inherits Terminal Value)

$$\mathit{size} = 1$$


For simplicity, we only consider square matrices with $\mathit{size} \times \mathit{size}$ entries in this specification. The implementation of the Sprat PDE DSL, however, also features general rectangular matrices.

Vector Type (inherits Terminal Value)

$$\mathit{size} > 0$$

Matrix Type (inherits Terminal Value)

$$\mathit{size} > 0$$

Due to limitations of the Object-Z specification language, we do not model the different types of matrix-vector expression nodes via inheritance as shown in Figure 8.3. Instead, each node is of class Matrix Vector Expression, and its node type is identified by a type attribute. In the Sprat PDE DSL, all binary operators (i. e., nodes that are of type plus, minus, times, or divide) with two vector-valued operands indicate the element-wise application of the corresponding operation between those operands.

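$$\mathit{NODETYPE} ::= \mathit{terminal} \mid \mathit{unary\_plus} \mid \mathit{unary\_minus} \mid \mathit{plus} \mid \mathit{minus} \mid \mathit{times} \mid \mathit{divide} \mid \mathit{null}$$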

If the node of the expression tree is a terminal, a corresponding terminal value is associated with it. The nodes attribute represents all nodes of the (sub-)expression tree rooted at this node; they are partitioned into the nodes of the left sub-tree, the node itself, and the nodes of the right sub-tree (each sub-tree possibly being empty).

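Matrix Vector Expression

$$
\begin{array}{l}
\mathit{type} : \mathit{NODETYPE} \\
\mathit{value} : \downarrow \text{Terminal Value} \\
\mathit{left}, \mathit{right} : \text{Matrix Vector Expression} \\
\mathit{nodes} : \mathbb{P}_1\, \text{Matrix Vector Expression} \\
\hline
\langle \mathit{left}.\mathit{nodes}, \{\mathit{self}\}, \mathit{right}.\mathit{nodes} \rangle \ \mathrm{partitions}\ \mathit{nodes}
\end{array}
$$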

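Depending on its type, the expression node has left and right children (binary), only a left child (unary), or no children (terminal and null type). The type of the value associated with the node is constrained accordingly:

$$
\begin{aligned}
\mathit{type} = \mathit{null} &\Leftrightarrow \mathit{value} \in \text{Null Terminal} \land \mathit{left}.\mathit{type} = \mathit{null} \land \mathit{right}.\mathit{type} = \mathit{null} \\
\mathit{type} = \mathit{terminal} &\Leftrightarrow \mathit{value} \in \text{Number} \cup \text{Vector Type} \cup \text{Matrix Type} \land \mathit{left}.\mathit{type} = \mathit{null} \land \mathit{right}.\mathit{type} = \mathit{null} \\
\mathit{type} = \mathit{unary\_plus} \lor \mathit{type} = \mathit{unary\_minus} &\Leftrightarrow \mathit{value} \in \text{Null Terminal} \land \mathit{left}.\mathit{type} \neq \mathit{null} \land \mathit{right}.\mathit{type} = \mathit{null} \\
\mathit{type} = \mathit{plus} \lor \mathit{type} = \mathit{minus} \lor \mathit{type} = \mathit{times} \lor \mathit{type} = \mathit{divide} &\Leftrightarrow \mathit{value} \in \text{Null Terminal} \land \mathit{left}.\mathit{type} \neq \mathit{null} \land \mathit{right}.\mathit{type} \neq \mathit{null}
\end{aligned}
$$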
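The arity constraints above translate directly into a recursive check. The following C++ sketch is a hypothetical rendering of the Object-Z class, not part of the DSL: an empty sub-tree (a node of type null carrying a Null Terminal) is represented by a null pointer, and the enumerations NodeType and ValueKind, the Node struct, and the functions makeNode and wellFormed are names chosen for illustration.

```cpp
#include <cassert>
#include <memory>

// Node types and terminal value categories; the Object-Z type "null" is
// modeled by a null pointer instead of an enumerator.
enum class NodeType { Terminal, UnaryPlus, UnaryMinus, Plus, Minus, Times, Divide };
enum class ValueKind { NullTerminal, Number, Vector, Matrix };

struct Node;
using Expr = std::shared_ptr<const Node>;

struct Node {
    NodeType type;
    ValueKind value;
    Expr left;   // nullptr models an empty sub-tree
    Expr right;  // nullptr models an empty sub-tree
};

inline Expr makeNode(NodeType type, ValueKind value, Expr left = nullptr, Expr right = nullptr) {
    return std::make_shared<Node>(Node{type, value, left, right});
}

// Checks, for every node, the equivalences of class Matrix Vector Expression:
// terminals carry a non-null value and no children, unary nodes have only a
// left child, binary nodes have both children, and operators carry no value.
inline bool wellFormed(const Expr& e) {
    if (!e) return true;  // empty sub-tree
    const bool hasLeft = e->left != nullptr;
    const bool hasRight = e->right != nullptr;
    const bool nullValue = e->value == ValueKind::NullTerminal;
    bool ok = false;
    switch (e->type) {
        case NodeType::Terminal:
            ok = !nullValue && !hasLeft && !hasRight;
            break;
        case NodeType::UnaryPlus:
        case NodeType::UnaryMinus:
            ok = nullValue && hasLeft && !hasRight;
            break;
        case NodeType::Plus:
        case NodeType::Minus:
        case NodeType::Times:
        case NodeType::Divide:
            ok = nullValue && hasLeft && hasRight;
            break;
    }
    return ok && wellFormed(e->left) && wellFormed(e->right);
}
```

For instance, the tree for -(A*u) + u passes the check, whereas a plus node with only one operand or a terminal without a value fails it.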

The set of all possible matrix-vector expressions can now be constrained to those valid in the Sprat PDE DSL, which are described by the set Valid MV Expressions. In order not to require any temporary variables for the assignment of an expression to a vector, the expression must contain matrix types only as the left-most operand of a top-level term, and this matrix must be multiplied with an expression containing only numbers and vectors. This implies that, in particular, we cannot allow any matrix-valued expressions in the Sprat PDE DSL.

Translating this constraint into the context of our expression trees, it means that any matrix node must be the left operand of a node of type times. Furthermore, no ancestor of this times node may be a times or divide node (to ensure that we are in a top-level term), and the right operand of the times node must not itself contain any matrix types (to ensure that we do not allow any matrix-valued expressions). Of course, all matrix and vector terminals must be of compatible size.

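$$
\begin{aligned}
\text{Valid MV Expressions} == \{\, & \mathit{expr} : \text{Matrix Vector Expression} \mid \\
& (\forall n : \mathit{expr}.\mathit{nodes} \mid n.\mathit{value} \in \text{Matrix Type} \bullet \\
& \quad (\exists o : \mathit{expr}.\mathit{nodes} \mid o.\mathit{type} = \mathit{times} \bullet o.\mathit{left} = n \land {} \\
& \qquad (\forall p : o.\mathit{right}.\mathit{nodes} \bullet p.\mathit{value} \notin \text{Matrix Type}) \land {} \\
& \qquad (\exists p : o.\mathit{right}.\mathit{nodes} \bullet p.\mathit{value} \in \text{Vector Type}) \land {} \\
& \qquad (\forall q : \mathit{expr}.\mathit{nodes} \mid q \neq o \land o \in q.\mathit{nodes} \bullet q.\mathit{type} \neq \mathit{times} \land q.\mathit{type} \neq \mathit{divide}))) \land {} \\
& (\exists i : \mathbb{N} \bullet \forall n : \mathit{expr}.\mathit{nodes} \bullet (n.\mathit{value} \in \text{Vector Type} \lor n.\mathit{value} \in \text{Matrix Type}) \Rightarrow n.\mathit{value}.\mathit{size} = i) \,\}
\end{aligned}
$$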
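The validity constraint can likewise be checked by walking the expression tree. The sketch below follows the prose description: every matrix terminal must be the left operand of a times node that has no times or divide ancestor and whose right operand contains at least one vector and no matrix, and all vector and matrix terminals must have the same size. In an embedded implementation, such a rule would more naturally be enforced at compile time (for example, through a Boost Proto grammar); this runtime version, with its hypothetical Kind, Node, and isValidMVExpression names, only illustrates the rule.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, simplified expression node: Number, Vector, and Matrix are
// terminals without children; unary nodes only use `left`.
enum class Kind { Number, Vector, Matrix, UnaryPlus, UnaryMinus, Plus, Minus, Times, Divide };

struct Node {
    Kind kind;
    std::size_t size;  // 1 for numbers, > 0 for vectors and matrices, 0 otherwise
    const Node* left;
    const Node* right;
};

inline bool isTimesOrDivide(const Node* n) {
    return n != nullptr && (n->kind == Kind::Times || n->kind == Kind::Divide);
}

// Number of nodes of kind k in the sub-tree rooted at n.
inline std::size_t countKind(const Node* n, Kind k) {
    if (n == nullptr) return 0;
    return std::size_t(n->kind == k ? 1 : 0) + countKind(n->left, k) + countKind(n->right, k);
}

// Checks the matrix placement rule for n and all nodes below it.
// aboveParent is true if some proper ancestor of `parent` is a times or divide node.
inline bool matricesPlacedValidly(const Node* n, const Node* parent, bool aboveParent) {
    if (n == nullptr) return true;
    if (n->kind == Kind::Matrix) {
        const bool leftOfTimes = parent != nullptr && parent->kind == Kind::Times && parent->left == n;
        if (!leftOfTimes || aboveParent) return false;                  // only "M * (...)" in a top-level term
        if (countKind(parent->right, Kind::Matrix) != 0) return false;  // no matrix-valued operand
        if (countKind(parent->right, Kind::Vector) == 0) return false;  // the product must be a vector
    }
    const bool flagForChildren = aboveParent || isTimesOrDivide(parent);
    return matricesPlacedValidly(n->left, n, flagForChildren) &&
           matricesPlacedValidly(n->right, n, flagForChildren);
}

// All vector and matrix terminals must share one size (the existential over i).
inline bool sizesConsistent(const Node* n, std::size_t& common) {
    if (n == nullptr) return true;
    if (n->kind == Kind::Vector || n->kind == Kind::Matrix) {
        if (common == 0) common = n->size;
        else if (common != n->size) return false;
    }
    return sizesConsistent(n->left, common) && sizesConsistent(n->right, common);
}

inline bool isValidMVExpression(const Node* root) {
    std::size_t common = 0;
    return matricesPlacedValidly(root, nullptr, false) && sizesConsistent(root, common);
}
```

Under this rule, A*x + y is valid, while x*A, 2*(A*x), and A*B are rejected, as is any expression that mixes terminals of different sizes.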

If matrix-vector expressions appear in the context of comparisons (with another vector or a floating-point value), even stricter constraints apply, which are characterized by the definition of Valid Comparison MV Expressions: no matrix terminals may appear at all.

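$$
\begin{aligned}
\text{Valid Comparison MV Expressions} == \{\, & \mathit{expr} : \text{Matrix Vector Expression} \mid \\
& (\forall n : \mathit{expr}.\mathit{nodes} \bullet n.\mathit{value} \notin \text{Matrix Type}) \land {} \\
& (\exists i : \mathbb{N} \bullet \forall n : \mathit{expr}.\mathit{nodes} \mid n.\mathit{value} \in \text{Vector Type} \bullet n.\mathit{value}.\mathit{size} = i) \,\}
\end{aligned}
$$

Iteration Over Sets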

In PDE solvers, one often needs to iterate over all instances of a particular aspect of the mesh topology, such as over all elements or DoF. Therefore, Iterations in the Sprat PDE DSL feature an Iteration Variable that successively assumes the value of every Set Element in a given Set, as depicted in Figure 8.4. The set elements can be traditional integer-valued indices but can also be any entity of the mesh topology. The loop body of the iteration can contain arbitrary C/C++ statements, which in turn might consist of matrix-vector expressions.

Figure 8.4. Meta-model elements of the Sprat PDE DSL associated with iterations over sets.

An iteration itself can be executed either serially or in parallel. Parallel iterations rely on OpenMP (Dagum and Menon 1998) and, therefore, might require additional OpenMP Clauses to control data sharing between threads. A special case of the parallel iteration is the Parallel Independent Iteration, which is only available for sets of mesh elements. It guarantees that the elements are iterated over in an order that allows values associated with the DoF of each element to be modified in parallel without the need for synchronization between threads. To accomplish this, the elements are divided into subsets forming maximal independent sets (Robson 1986) with respect to their DoF, so that no two elements within a subset share a DoF.
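One simple way to obtain such subsets is a greedy sweep: scan the elements that have not been assigned yet and add each one whose DoF do not overlap with those already taken by the current subset. Each subset built this way is a maximal independent set of the remaining elements in the graph in which two elements are adjacent if they share a DoF. The following C++ sketch assumes this greedy strategy, which may differ from what the Sprat PDE DSL actually does; independentSubsets and accumulate are hypothetical names. Within one subset, the OpenMP loop writes to disjoint DoF and therefore needs no atomics or locks.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// elementDoF[e] lists the global DoF indices of element e.
typedef std::vector<std::vector<std::size_t> > Connectivity;
typedef std::vector<std::vector<std::size_t> > Subsets;

// Greedy partition of the elements into subsets in which no two elements share
// a DoF. Each round scans all unassigned elements and adds every element that
// does not conflict with the round's subset so far, so the result is a maximal
// independent set of the remaining elements in the "shares a DoF" graph.
Subsets independentSubsets(const Connectivity& elementDoF, std::size_t numDoF) {
    Subsets subsets;
    std::vector<bool> assigned(elementDoF.size(), false);
    std::size_t remaining = elementDoF.size();
    while (remaining > 0) {
        std::vector<bool> taken(numDoF, false);
        std::vector<std::size_t> subset;
        for (std::size_t e = 0; e < elementDoF.size(); ++e) {
            if (assigned[e]) continue;
            bool conflict = false;
            for (std::size_t d : elementDoF[e]) conflict = conflict || taken[d];
            if (conflict) continue;
            for (std::size_t d : elementDoF[e]) taken[d] = true;
            assigned[e] = true;
            subset.push_back(e);
            --remaining;
        }
        subsets.push_back(subset);
    }
    return subsets;
}

// Adds `contribution` to all DoF of every element. Subsets are processed one
// after another; inside a subset the elements write to disjoint DoF, so the loop
// can run in parallel without atomics (the pragma is ignored without OpenMP).
void accumulate(const Connectivity& elementDoF, const Subsets& subsets, double contribution,
                std::vector<double>& values) {
    for (const std::vector<std::size_t>& subset : subsets) {
        const long n = static_cast<long>(subset.size());
#pragma omp parallel for
        for (long k = 0; k < n; ++k)
            for (std::size_t d : elementDoF[subset[static_cast<std::size_t>(k)]]) values[d] += contribution;
    }
}
```

For a chain of four 1D elements with DoF {0, 1}, {1, 2}, {2, 3}, {3, 4}, the sweep produces the subsets {0, 2} and {1, 3}, and accumulating 1.0 per element yields 1, 2, 2, 2, 1.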

8.2.2 DSL Implementation

For the Sprat PDE DSL to be accepted by the HPC community, the abstractions provided by the language must not significantly compromise the runtime performance of programs. This goal has been achieved (cf. Chapter 13) by observing three key principles:


1. Avoid using inheritance relationships that would require type introspection. Although we extensively employed inheritance to express the meta-model of the DSL, inheritance is generally avoided in the implementation to circumvent the computational cost of looking up type information at runtime.

2. Ensure that the compiler can apply code inlining as often as possible. For example, when iterating over sets with an iteration variable that is an object (e. g., a DoF) but can also be used as an index (e. g., the index of a DoF), the index look-up must not result in a function call. Function calls in this context would slow down execution considerably because index look-ups tend to occur very often (e. g., in the body of a loop over all DoF that is executed in every time step of the algorithm). To allow inlining as often as possible, the Sprat PDE DSL is implemented using only header files.

3. Make sure that matrix-vector expressions are evaluated lazily instead of eagerly, and apply DSOs to such expressions. We achieve this by using template meta-programming techniques (Abrahams and Gurtovoy 2004) and exploit optimization potential arising, e. g., from sparse matrices with the same sparsity patterns.
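To illustrate the kind of DSO mentioned in the third principle (and the fusion of C[tau]*q and D*u in Listing 8.1), the following C++ sketch computes y = A x + B z for two sparse matrices that share one sparsity pattern in Compressed Row Storage. The fused version sweeps the shared row pointers and column indices once instead of twice. The types Pattern and PatternMatrix and both functions are hypothetical; the sketch says nothing about how the DSL detects shared patterns.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef std::vector<double> Vec;

// Hypothetical sparsity pattern in Compressed Row Storage.
struct Pattern {
    std::vector<std::size_t> rowPtr;  // rows + 1 offsets into colIdx
    std::vector<std::size_t> colIdx;  // column of each non-zero
    std::size_t rows() const { return rowPtr.size() - 1; }
};

// A matrix that only stores values and refers to a (possibly shared) pattern.
struct PatternMatrix {
    const Pattern* pattern;
    Vec values;  // one value per non-zero of *pattern
};

// Unfused evaluation of y = A x + B z: two sweeps over the pattern data.
void unfusedProductSum(const PatternMatrix& A, const Vec& x, const PatternMatrix& B, const Vec& z, Vec& y) {
    const Pattern& pa = *A.pattern;
    for (std::size_t r = 0; r < pa.rows(); ++r) {
        y[r] = 0.0;
        for (std::size_t k = pa.rowPtr[r]; k < pa.rowPtr[r + 1]; ++k) y[r] += A.values[k] * x[pa.colIdx[k]];
    }
    const Pattern& pb = *B.pattern;
    for (std::size_t r = 0; r < pb.rows(); ++r)
        for (std::size_t k = pb.rowPtr[r]; k < pb.rowPtr[r + 1]; ++k) y[r] += B.values[k] * z[pb.colIdx[k]];
}

// Fused evaluation, valid only if A and B share one pattern: a single sweep
// reads rowPtr and colIdx once for both products.
void fusedProductSum(const PatternMatrix& A, const Vec& x, const PatternMatrix& B, const Vec& z, Vec& y) {
    assert(A.pattern == B.pattern);
    const Pattern& p = *A.pattern;
    for (std::size_t r = 0; r < p.rows(); ++r) {
        double sum = 0.0;
        for (std::size_t k = p.rowPtr[r]; k < p.rowPtr[r + 1]; ++k) {
            const std::size_t c = p.colIdx[k];
            sum += A.values[k] * x[c] + B.values[k] * z[c];
        }
        y[r] = sum;
    }
}
```

Both functions compute the same mathematical result; in general, the different summation order can lead to tiny floating-point differences, which do not occur for the small integer-valued inputs used to check the sketch.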

By default, expressions in C++ are evaluated eagerly, which means, for example, that for vectors u, v, and w the assignment u = u + v * w would be computed by creating a temporary vector t1 = v * w, then another temporary vector t2 = u + t1, which is finally copied over to u = t2. This results in unnecessary temporary variables and unnecessarily many iterations over the index range of the vectors. Instead, we would like the expression u + v * w from above to be evaluated lazily, only when its result is actually needed (when it is assigned to u), and the whole assignment statement to be computed in a single loop equivalent to:

for(int i=0; i<u.size(); i++) { u[i] = u[i] + v[i] * w[i]; }

To achieve this, an AST representation of the right-hand side u + v * w is needed. Since the AST of the compiler is not available to us in C++, we have to reconstruct it as a template type using template meta-programming. For example, a binary operator node can be represented by a type that is templated with its two child nodes (for details, see Abrahams and Gurtovoy 2004). Since implementing this is tedious and error-prone (especially regarding the type system and transformations of the AST), we use Boost Proto (Niebler 2007), which is itself an embedded DSL for embedding DSLs into C++. Boost Proto provides means for constructing, transforming, and executing template expressions in the form of an AST. It allows specifying a grammar for a DSL and automatically takes care of the necessary operator overloading.
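The following C++11 sketch shows the underlying expression-template technique written by hand, without Boost Proto, so that the reconstructed AST is visible. It is a minimal illustration under our own assumptions, not the Sprat PDE DSL implementation: Vector, Expr, Stored, BinaryExpr, Add, and Mul are hypothetical names, only element-wise + and * are supported, and all operands are assumed to have the same size. The CRTP base class Expr involves no virtual functions or runtime type look-ups. Leaf vectors are held by reference and inner nodes by value, so an expression object kept in a variable does not refer to destroyed intermediate nodes (the leaf vectors must still outlive it).

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <type_traits>
#include <utility>
#include <vector>

// CRTP base: lets operators accept "any expression" without virtual functions.
template <typename E>
struct Expr {
    const E& self() const { return static_cast<const E&>(*this); }
};

class Vector : public Expr<Vector> {
public:
    explicit Vector(std::size_t n, double value = 0.0) : data_(n, value) {}
    Vector(std::initializer_list<double> init) : data_(init) {}

    std::size_t size() const { return data_.size(); }
    double operator[](std::size_t i) const { return data_[i]; }

    // Evaluating an expression: one loop over the index range, no temporary vectors.
    template <typename E>
    Vector& operator=(const Expr<E>& expr) {
        const E& e = expr.self();
        assert(e.size() == size());
        for (std::size_t i = 0; i < size(); ++i) data_[i] = e[i];
        return *this;
    }

private:
    std::vector<double> data_;
};

// Leaf vectors are stored by reference, inner expression nodes by value.
template <typename T> struct Stored { typedef T type; };
template <> struct Stored<Vector> { typedef const Vector& type; };

struct Add { static double apply(double a, double b) { return a + b; } };
struct Mul { static double apply(double a, double b) { return a * b; } };

// AST node for an element-wise binary operation; entry i is computed on demand.
template <typename Op, typename L, typename R>
class BinaryExpr : public Expr<BinaryExpr<Op, L, R> > {
public:
    BinaryExpr(const L& l, const R& r) : l_(l), r_(r) {}
    std::size_t size() const { return l_.size(); }
    double operator[](std::size_t i) const { return Op::apply(l_[i], r_[i]); }

private:
    typename Stored<L>::type l_;
    typename Stored<R>::type r_;
};

template <typename L, typename R>
BinaryExpr<Add, L, R> operator+(const Expr<L>& l, const Expr<R>& r) {
    return BinaryExpr<Add, L, R>(l.self(), r.self());
}

template <typename L, typename R>
BinaryExpr<Mul, L, R> operator*(const Expr<L>& l, const Expr<R>& r) {
    return BinaryExpr<Mul, L, R>(l.self(), r.self());
}

// The type of u + v * w is the AST itself.
static_assert(std::is_same<decltype(std::declval<const Vector&>() +
                                    std::declval<const Vector&>() * std::declval<const Vector&>()),
                           BinaryExpr<Add, Vector, BinaryExpr<Mul, Vector, Vector> > >::value,
              "the expression type mirrors the AST");
```

With these definitions, `u = u + v * w;` builds a `BinaryExpr<Add, Vector, BinaryExpr<Mul, Vector, Vector>>` and evaluates it entry by entry inside `Vector::operator=`, which corresponds to the single loop shown earlier. Reading and writing u[i] in the same iteration is safe here because every operation is element-wise.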

Embedding a DSL into C++ this way offers the great advantage of getting full language support without any additional effort. There are, however, two drawbacks to this approach. First, the generation step from the DSL to the target language is implicit and, thus, there is no generated code that could be inspected. This can partly be overcome by looking at the output of different compiler stages, although we recognize that this can hardly compete with well-formatted code from an explicit generation step. A second drawback concerns error reporting. It is well known that many C++ compilers generate long and complicated error messages for errors involving template types. But even if this were overcome, the error reporting would still not be on the level of abstraction on which users write their code in the DSL (i. e., matrices and vectors