
[Figure 2.1 diagram: input C/C++/Fortran source code → C/C++/Fortran front-end → OpenMP front-end → Nanos++ lowering or Intel RTL lowering → C/C++/Fortran codegen → native compilation → linking with the Nanos++ runtime or Intel RTL libraries → OpenMP + SIMD binary. Legend: compiler components, user files, external tools, runtime libraries.]

Figure 2.1: Mercurium compilation diagram

2.3 The Mercurium Source-to-source Compiler

Mercurium [124, 57] is a research source-to-source compiler for the C, C++ and FORTRAN programming languages. It is developed at the Barcelona Supercomputing Center with the main goal of providing an infrastructure for fast prototyping of new parallel programming models. Mercurium already supports several programming models, such as OpenMP 3.1 [116] (partially), OmpSs and StarSs [12]. It can also handle CUDA and OpenCL code.

Figure 2.1 shows a high-level scheme of the phases of the Mercurium compiler. The compiler is structured in three main parts. The first is a front-end with full support for the C, C++ and FORTRAN syntaxes, which gathers all the symbolic and typing information into the intermediate representation of the compiler. The second is a pipelined sequence of phases that performs all the source-to-source transformations and implements the specific features required by the target programming model and runtime. In Figure 2.1, this sequence is represented by the OpenMP front-end phase and the lowering phases for the Nanos++ Runtime Library and the Intel OpenMP Runtime Library. The last part is the code generator, which faithfully regenerates the final source code from the intermediate representation of Mercurium. The output code is then ready to be compiled and linked with the chosen native compiler.
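The pipelined sequence of phases can be illustrated with a minimal C++17 sketch. Every name in it (`TranslationUnit`, `Phase`, `LoggingPhase`, `Pipeline`, `make_pipeline`, and the phase names) is hypothetical and does not correspond to Mercurium's actual classes. The IR is reduced to a list recording which phases ran, so the sketch only models the ordering of phases and the choice of lowering for the target runtime.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the IR shared by all phases. A real IR would hold
// the AST plus symbol and type information; here it only records which
// phases have run, in order.
struct TranslationUnit {
    std::vector<std::string> applied_phases;
};

// Hypothetical phase interface: every phase reads and rewrites the shared IR.
class Phase {
public:
    virtual ~Phase() = default;
    virtual void run(TranslationUnit& tu) = 0;
};

// Placeholder phase that only logs its name instead of transforming code.
class LoggingPhase : public Phase {
public:
    explicit LoggingPhase(std::string name) : name_(std::move(name)) {}
    void run(TranslationUnit& tu) override { tu.applied_phases.push_back(name_); }
private:
    std::string name_;
};

// Runs its phases strictly in insertion order.
class Pipeline {
public:
    void add(std::unique_ptr<Phase> phase) {
        assert(phase != nullptr);
        phases_.push_back(std::move(phase));
    }
    void run(TranslationUnit& tu) {
        for (auto& phase : phases_) phase->run(tu);
    }
private:
    std::vector<std::unique_ptr<Phase>> phases_;
};

// Middle part of Figure 2.1: the OpenMP front-end phase followed by the
// lowering for the selected runtime. The language front-end and the code
// generator would run before and after this pipeline.
Pipeline make_pipeline(bool target_nanos) {
    Pipeline p;
    p.add(std::make_unique<LoggingPhase>("openmp-frontend"));
    p.add(std::make_unique<LoggingPhase>(target_nanos ? "nanos-lowering"
                                                      : "intel-rtl-lowering"));
    return p;
}
```

For example, `make_pipeline(true).run(tu)` leaves `tu.applied_phases` holding `openmp-frontend` followed by `nanos-lowering`. Keeping every phase behind a common interface is one plausible way to support fast prototyping: a new programming-model feature becomes one more phase inserted into the sequence.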

    expression: NODECL_ADD([lhs]expression, [rhs]expression) type const-val
              | NODECL_LOWER_THAN([lhs]expression, [rhs]expression)
                  type const-val
              | NODECL_ARRAY_SUBSCRIPT([subscripted]expression,
                  [subscripts]expression-seq)
                  type const-val
              | NODECL_FUNCTION_CALL([called]expression, [args]argument-seq)
                  type const-val

Listing 2.2: Example of nodes of the intermediate representation of Mercurium. The whole grammar can be found in the file cxx-nodecl.def [124]

2.3.1 Intermediate Representation

The intermediate representation (IR) of Mercurium is a tree-like IR based on an abstract syntax tree (AST). This AST represents the input code at a high level and accurately. A high-level and accurate IR, as opposed to a low-level one, is indispensable for generating output source code that is as similar as possible to the input code, which is one of the main goals of the Mercurium source-to-source compiler.
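To illustrate why a tree-shaped, high-level IR helps keep the output close to the input, the following minimal C++17 sketch stores the Saxpy statement `z[j] = a * x[j] + y[j]` as a tree and regenerates its source text with a recursive walk, adding parentheses only where the tree shape requires them. The `Kind` values, `Node`, `sym`, `bin`, `operand`, `binary` and `unparse` are hypothetical and much simpler than Mercurium's nodes (Listing 2.2); they only cover the constructs needed for this statement.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical node kinds covering only the Saxpy statement.
enum class Kind { Symbol, Subscript, Mul, Add, Assign };

struct Node {
    Kind kind;
    std::string name;                             // used by Symbol nodes only
    std::vector<std::shared_ptr<Node>> children;
};
using NodePtr = std::shared_ptr<Node>;

NodePtr sym(const std::string& name) {
    return std::make_shared<Node>(Node{Kind::Symbol, name, {}});
}
NodePtr bin(Kind k, NodePtr lhs, NodePtr rhs) {
    return std::make_shared<Node>(Node{k, "", {lhs, rhs}});
}

// Binding strength of each construct; a higher value binds tighter.
int precedence(Kind k) {
    switch (k) {
        case Kind::Assign: return 1;
        case Kind::Add:    return 2;
        case Kind::Mul:    return 3;
        default:           return 4;  // subscripts and plain symbols
    }
}

std::string unparse(const NodePtr& n);

// Prints an operand, adding parentheses only when the tree shape needs them.
std::string operand(const NodePtr& child, Kind parent, bool is_right) {
    int pc = precedence(child->kind);
    int pp = precedence(parent);
    // Add and Mul are left-associative, so an equal-precedence right operand
    // needs parentheses; Assign is right-associative and does not.
    bool parens = pc < pp || (pc == pp && is_right && parent != Kind::Assign);
    std::string text = unparse(child);
    return parens ? "(" + text + ")" : text;
}

std::string binary(const NodePtr& n, const std::string& op) {
    assert(n->children.size() == 2);
    return operand(n->children[0], n->kind, false) + op +
           operand(n->children[1], n->kind, true);
}

// Regenerates source text from the tree with a recursive walk.
std::string unparse(const NodePtr& n) {
    switch (n->kind) {
        case Kind::Symbol:
            return n->name;
        case Kind::Subscript:
            assert(n->children.size() == 2);
            return operand(n->children[0], n->kind, false) + "[" +
                   unparse(n->children[1]) + "]";
        case Kind::Mul:    return binary(n, " * ");
        case Kind::Add:    return binary(n, " + ");
        case Kind::Assign: return binary(n, " = ");
    }
    return "";
}
```

Building the tree for the Saxpy statement and calling `unparse` on it yields `z[j] = a * x[j] + y[j]` again, and a tree for `(a + b) * c` keeps its parentheses. A lowered form such as three-address code would instead introduce temporaries, from which recovering the original statement is considerably harder.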

The Mercurium front-end builds the AST from the explicit code and data of the compilation unit. However, this AST carries minimal information about declarations, because the code generator of Mercurium can deduce it from the code representation. The type system and other symbolic information are represented separately from the AST, in independent data structures that are accessible from fields of the AST nodes.

A single grammar describes the set of AST nodes used to represent the C, C++ and FORTRAN programming languages. Most of these nodes are shared by the three languages, but a few specific nodes represent particularities of one language that are not present in the others. Listing 2.2 shows an excerpt of this grammar, in which four rules describe the nodes that represent arithmetic additions, lower-than comparisons, array subscripts and function calls. Each AST node consists of its kind, up to four child nodes and a set of external attributes that depend on the node kind. For example, these attributes can be the type of the node in the type system, a symbol, a scope of the language or an associated constant value (const-val). This grammatical description is automatically translated into a non-hierarchical class system that is then used by the developers of the compiler.
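The node layout described above (a kind, up to four children and kind-dependent attributes that point into separately stored type and symbol information) can be sketched in C++17 as follows. This is a hypothetical simplification: the `NodeKind` values only echo the names in Listing 2.2, and `Context`, `new_node`, `literal`, `symbol_ref` and `add` are illustrative helpers, not the classes that Mercurium generates from its grammar. The `add` helper also shows one way a const-val attribute could be filled in when both operands are constants.

```cpp
#include <array>
#include <cassert>
#include <deque>
#include <optional>
#include <string>

// Hypothetical node kinds; the names loosely echo Listing 2.2.
enum class NodeKind { IntegerLiteral, Symbol, Add, LowerThan, ArraySubscript, FunctionCall };

// Type and symbol information is kept outside the AST.
struct Type   { std::string name; };
struct Symbol { std::string name; const Type* type; };

// One AST node: its kind, up to four children and kind-dependent attributes.
struct Node {
    NodeKind kind;
    std::array<Node*, 4> children{};  // unused slots stay null
    const Type* type = nullptr;       // attribute: entry in the type table
    const Symbol* symbol = nullptr;   // attribute: set for Symbol nodes only
    std::optional<long> const_val;    // attribute: value if known at compile time
};

// Owns every node, type and symbol. std::deque keeps element addresses
// stable when new elements are appended, so the raw pointers stay valid.
struct Context {
    std::deque<Type> types;
    std::deque<Symbol> symbols;
    std::deque<Node> nodes;

    Node* new_node(NodeKind k) {
        nodes.push_back(Node{k});
        return &nodes.back();
    }
    Node* literal(long value, const Type* t) {
        Node* n = new_node(NodeKind::IntegerLiteral);
        n->type = t;
        n->const_val = value;
        return n;
    }
    Node* symbol_ref(const Symbol* s) {
        Node* n = new_node(NodeKind::Symbol);
        n->symbol = s;
        n->type = s->type;
        return n;
    }
    // Addition node; const_val is set (constant folding) only when both
    // operands have a known constant value.
    Node* add(Node* lhs, Node* rhs) {
        assert(lhs != nullptr && rhs != nullptr);
        Node* n = new_node(NodeKind::Add);
        n->children[0] = lhs;
        n->children[1] = rhs;
        n->type = lhs->type;  // simplification: no usual arithmetic conversions
        if (lhs->const_val && rhs->const_val)
            n->const_val = *lhs->const_val + *rhs->const_val;
        return n;
    }
};
```

For instance, adding two integer literals 10 and 2 produces an addition node whose `const_val` is 12, while adding a reference to a symbol leaves `const_val` empty; in both cases the type and symbol data themselves stay outside the tree.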

2.3.2 Analysis Infrastructure

The Mercurium analysis infrastructure was developed in parallel with the work of this thesis. Although some of its analysis techniques were implemented for research purposes, they provide the compiler with solid analysis capabilities.

[Figure 2.2 diagram: the nested structured nodes FunctionCode, OmpLoop and LoopFor, each with its ENTRY and EXIT nodes, and the simple nodes j = 0, j < N, z[j] = a * x[j] + y[j] and j++; the condition j < N has a TRUE and a FALSE edge.]

Figure 2.2: Example of a parallel control flow graph of the Saxpy computation

The main compiler analysis techniques implemented in Mercurium are use-definition analysis, liveness analysis, reaching definitions, induction variable analysis and range analysis [148, 125]. These techniques combine traditional and new approaches, adapted to the particular and restricted viewpoint of a source-to-source compiler. In the context of Mercurium, the restrictions on their implementation stem mainly from the characteristics of the intermediate representation (IR). Mercurium uses a high-level and accurate IR (see Section 2.3.1) in order to preserve the original structure of the input source code. As a consequence, the IR is generic and is neither in static single assignment (SSA) form nor in three-address form.
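Liveness, one of the analyses listed above, can be computed with the classic iterative backward data-flow algorithm. The following C++17 sketch applies it to a flattened, purely sequential CFG of the Saxpy loop from Figure 2.2, deliberately ignoring the structured nodes of the PCFG. The `Block` type and the `compute_liveness` and `saxpy_cfg` functions are hypothetical; the sketch assumes that `x`, `y` and `z` are pointer parameters whose values the loop body reads, and that no variable is live after the exit node.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

using VarSet = std::set<std::string>;

// A simple CFG node (hypothetical): its use/def sets and successors.
struct Block {
    VarSet use;              // variables read before being written in the block
    VarSet def;              // variables written in the block
    std::vector<int> succs;  // indices of successor blocks
    VarSet live_in, live_out;
};

// Iterative backward data-flow analysis:
//   live_out(B) = union of live_in(S) over every successor S of B
//   live_in(B)  = use(B) union (live_out(B) minus def(B))
// The loop repeats until no set changes, i.e. until a fixed point is reached.
void compute_liveness(std::vector<Block>& cfg) {
    bool changed = true;
    while (changed) {
        changed = false;
        // Reverse order usually converges faster for a backward problem.
        for (int b = static_cast<int>(cfg.size()) - 1; b >= 0; --b) {
            Block& blk = cfg[b];
            VarSet out;
            for (int s : blk.succs) {
                assert(s >= 0 && s < static_cast<int>(cfg.size()));
                out.insert(cfg[s].live_in.begin(), cfg[s].live_in.end());
            }
            VarSet in = blk.use;
            for (const std::string& v : out)
                if (blk.def.count(v) == 0) in.insert(v);
            if (in != blk.live_in || out != blk.live_out) {
                blk.live_in = std::move(in);
                blk.live_out = std::move(out);
                changed = true;
            }
        }
    }
}

// Flattened CFG of the Saxpy loop; block 4 is the exit node.
std::vector<Block> saxpy_cfg() {
    std::vector<Block> cfg(5);
    cfg[0].def = {"j"};                      cfg[0].succs = {1};     // j = 0
    cfg[1].use = {"j", "N"};                 cfg[1].succs = {2, 4};  // j < N
    cfg[2].use = {"a", "j", "x", "y", "z"};  cfg[2].succs = {3};     // z[j] = a * x[j] + y[j]
    cfg[3].use = {"j"}; cfg[3].def = {"j"};  cfg[3].succs = {1};     // j++
    return cfg;
}
```

Running `compute_liveness` on `saxpy_cfg()` reports that `a`, `N`, `x`, `y` and `z` are live on entry to the code, whereas `j` is not, because `j = 0` defines it before any use. Mercurium's analyses work on the PCFG and its high-level IR instead, so they must also handle structured nodes; this sketch only shows the underlying equations.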

All the analysis techniques implemented in Mercurium use a parallel control flow graph (PCFG). This structure is a hyper-graph that extends the classic CFG representation to describe parallelism, such as the semantics of the OpenMP parallel constructs. The nodes of this PCFG can be simple or structured, and they contain pointers to their counterpart nodes in the AST. Simple nodes represent the sequential execution of one or more statements. Structured nodes are themselves PCFGs and represent control-flow or parallel semantics. Further details about the PCFG infrastructure can be found in our work on the correctness of OpenMP tasks [131].

Figure 2.2 shows an example of a simple loop annotated with an OpenMP for directive, which is represented by the OmpLoop node. This node states that the iterations of the nested loop might be executed in parallel, and all the analysis algorithms implemented on the PCFG take this information into account. The nodes FunctionCode, OmpLoop and LoopFor are structured nodes, i.e., PCFGs with their corresponding Entry and Exit nodes. The remaining nodes in the example are simple nodes.
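The PCFG structure described above, with simple nodes for sequential code and structured nodes that are graphs in their own right, can be sketched in C++17 as a recursive data structure. In the following hypothetical sketch, `PNode`, `Edge`, `EdgeLabel`, `connect`, `saxpy_pcfg` and `runs_in_parallel` are illustrative names, not Mercurium's; each node stores its statement as text where Mercurium keeps a pointer to the AST, and placing `j = 0` inside `LoopFor` is a simplification of Figure 2.2. The final query shows how an analysis could use the nesting to learn that a statement may run in parallel.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

enum class EdgeLabel { Always, True, False };

struct PNode;
struct Edge {
    PNode* target;
    EdgeLabel label;
};

// A PCFG node (hypothetical). Simple nodes hold sequential code; structured
// nodes (FunctionCode, OmpLoop, LoopFor, ...) contain a whole inner graph
// delimited by their own ENTRY and EXIT nodes.
struct PNode {
    std::string label;  // statement text; stands in for a pointer to the AST
    bool structured;
    std::vector<std::unique_ptr<PNode>> inner;  // owned inner nodes (structured only)
    PNode* entry_node = nullptr;
    PNode* exit_node = nullptr;
    std::vector<Edge> succs;                    // edges to sibling nodes

    PNode(std::string l, bool is_structured)
        : label(std::move(l)), structured(is_structured) {
        if (structured) {
            entry_node = add(std::make_unique<PNode>("ENTRY", false));
            exit_node = add(std::make_unique<PNode>("EXIT", false));
        }
    }

    // Takes ownership of a node placed inside this structured node.
    PNode* add(std::unique_ptr<PNode> n) {
        assert(structured);
        inner.push_back(std::move(n));
        return inner.back().get();
    }
};

void connect(PNode* from, PNode* to, EdgeLabel l = EdgeLabel::Always) {
    from->succs.push_back({to, l});
}

// Builds a PCFG resembling Figure 2.2.
std::unique_ptr<PNode> saxpy_pcfg() {
    auto func = std::make_unique<PNode>("FunctionCode", true);
    PNode* omp = func->add(std::make_unique<PNode>("OmpLoop", true));
    connect(func->entry_node, omp);
    connect(omp, func->exit_node);

    PNode* loop = omp->add(std::make_unique<PNode>("LoopFor", true));
    connect(omp->entry_node, loop);
    connect(loop, omp->exit_node);

    PNode* init = loop->add(std::make_unique<PNode>("j = 0", false));
    PNode* cond = loop->add(std::make_unique<PNode>("j < N", false));
    PNode* body = loop->add(std::make_unique<PNode>("z[j] = a * x[j] + y[j]", false));
    PNode* incr = loop->add(std::make_unique<PNode>("j++", false));
    connect(loop->entry_node, init);
    connect(init, cond);
    connect(cond, body, EdgeLabel::True);
    connect(cond, loop->exit_node, EdgeLabel::False);
    connect(body, incr);
    connect(incr, cond);
    return func;
}

// True if a simple node with the given label is nested, at any depth,
// inside an OmpLoop node, i.e. its executions may happen in parallel.
bool runs_in_parallel(const PNode& n, const std::string& stmt, bool in_omp = false) {
    if (!n.structured) return in_omp && n.label == stmt;
    bool now = in_omp || n.label == "OmpLoop";
    for (const auto& child : n.inner)
        if (runs_in_parallel(*child, stmt, now)) return true;
    return false;
}
```

With this structure, `runs_in_parallel(*saxpy_pcfg(), "z[j] = a * x[j] + y[j]")` returns true because the loop body is nested, through `LoopFor`, inside the `OmpLoop` node. Owning the inner graph inside each structured node mirrors the hierarchical nature of the PCFG, while the edges only connect nodes at the same nesting level.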

In the context of this thesis, we make extensive use of the use-definition, liveness, reaching definitions and induction variable analyses.

In document SIMD@OpenMP : a programming model approach to leverage SIMD features (Page 43-46)