
2.2 Software Background and Trends

2.2.8 PGAS Programming Model

PGAS-based programming models aim to combine the ease of use of shared memory approaches such as OpenMP (Section 2.2.1) with the performance and scalability of message passing approaches such as MPI (Section 2.2.7). To provide shared memory constructs, they use a global address space and a one-sided communication model, which potentially allows any process to access any memory location. This global address space is, however, logically partitioned, with each segment assigned to a particular processing element within the overall application. The model is thus able to express memory access locality and maps well to the architecture of current generations of HPC platforms, which facilitates improved performance and scalability, potentially equal to or greater than that of the message passing model. It has also been recognised that the per-message overheads of models such as MPI may not be decreasing sufficiently for MPI to remain practicable on exascale system architectures, potentially necessitating the use of PGAS-based approaches [11].
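To make the idea of a partitioned global address space more concrete, the following is a minimal single-process sketch in plain C. It is an illustration under stated assumptions rather than part of any PGAS runtime: NPES, SEG_LEN, owner_of, global_put and global_get are hypothetical names, the block distribution is only one possible choice, and a real implementation would move remote data across the interconnect rather than through a shared C array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout: NPES processing elements (PEs), each with affinity
   to a contiguous block of SEG_LEN elements of one logically global array. */
#define NPES 4
#define SEG_LEN 8
#define GLOBAL_LEN (NPES * SEG_LEN)

/* A single backing store stands in for the global address space;
   row p is the partition (segment) owned by PE p. */
double global_space[NPES][SEG_LEN];

/* Which PE owns global element gidx (block distribution). */
int owner_of(size_t gidx) {
    assert(gidx < GLOBAL_LEN);
    return (int)(gidx / SEG_LEN);
}

/* Offset of gidx inside its owner's segment. */
size_t offset_of(size_t gidx) {
    assert(gidx < GLOBAL_LEN);
    return gidx % SEG_LEN;
}

/* Locality is explicit: an access is local when the caller owns the segment. */
int is_local(int my_pe, size_t gidx) {
    return owner_of(gidx) == my_pe;
}

/* One-sided store: the caller writes into the owner's segment directly;
   the owning PE executes no matching receive. */
void global_put(size_t gidx, double value) {
    global_space[owner_of(gidx)][offset_of(gidx)] = value;
}

/* One-sided load: the caller reads from the owner's segment directly. */
double global_get(size_t gidx) {
    return global_space[owner_of(gidx)][offset_of(gidx)];
}
```

For example, with four PEs of eight elements each, global element 13 lives at offset 5 of PE 1, so a global_put(13, 2.5) issued by any other PE is a remote, one-sided write, while is_local(1, 13) reports that the same access would be local for PE 1.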

Numerous PGAS languages and programming models are currently in existence, including but not limited to UPC, Global Arrays, X10 and Chapel; each of these is targeted at a different user base and differs subtly in its implementation of the general PGAS approach. This thesis examines the applicability of two additional PGAS implementations, CAF and OpenSHMEM, to explicit hydrodynamics applications, and the following sections provide background information on each of these models.

The CAF Programming Model

Several CAF extensions have been incorporated into the Fortran 2008 standard; these additions aim to make parallelism a first-class feature of the Fortran language. The extensions were originally proposed in 1998 by Numrich and Reid as a means of adding PGAS concepts to the main Fortran language using only minimal additional syntax [148].

CAF continues to follow the SPMD (Single Program Multiple Data) paradigm, with a program being split into a number of communicating processes known as images. The number of images is defined at runtime and remains fixed throughout the execution of the program; no language facility yet exists for dynamic image creation. Communications are all one-sided, with each image able to use a global address space to access memory regions on other images without the involvement of those remote images. The “=” operator is overloaded for local assignments and also for remote loads and stores. Increasingly, off-image loads and stores are being viewed as yet another level of the memory hierarchy [19]. In contrast to OpenSHMEM, CAF employs a predominantly compiler- and language-based approach (with no separate communications library), in which parallelism is explicitly part of the Fortran 2008 language. Consequently, the Fortran compiler is potentially able to reorder inter-image loads and stores with those local to a particular image.

The CAF language also enforces a local view of computation, requiring programmers to explicitly manage data locality and communication. Objects are declared to be co-arrays using an additional syntax operator, “[ ]”. Any object, whether an array or a scalar, can be declared as a co-array; when it is, a copy of the object must exist, with the same size, on each image within the overall CAF program. The square brackets essentially assign an additional dimension (potentially multiple dimensions) to a particular object, enabling the object to be uniquely referenced by other images. Images can use the “( )” notation to access the elements of a local array, but must use a combination of both notations, “( )[ ]”, to access the elements of remote co-array objects, whether they reside on the local node or on a remote node.
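The difference between the two notations can be illustrated with a halo exchange, a pattern common in explicit hydrodynamics codes. The sketch below is a hypothetical C emulation, not a description of how a Fortran compiler implements co-arrays: the array x, the sizes NUM_IMAGES and N, and the helpers local_elem, coindexed_elem and exchange_halos are assumptions for illustration, and the corresponding Fortran statements appear only in comments.

```c
#include <assert.h>

/* Hypothetical sizes: NUM_IMAGES images, N interior cells per image and one
   halo cell at each end (indices 0 and N + 1). Images are numbered from 1,
   as in Fortran. */
#define NUM_IMAGES 3
#define N 4

/* Emulates the co-array declaration  real :: x(0:N+1)[*]
   Row img - 1 holds image img's copy; every copy has the same shape. */
double x[NUM_IMAGES][N + 2];

/* x(i) as seen by image me: a purely local reference, the "( )" form. */
double *local_elem(int me, int i) {
    assert(me >= 1 && me <= NUM_IMAGES);
    assert(i >= 0 && i <= N + 1);
    return &x[me - 1][i];
}

/* x(i)[img]: a co-indexed reference, the "( )[ ]" form. In this emulation it
   resolves to the same storage; the owning image takes no part in the access. */
double *coindexed_elem(int img, int i) {
    return local_elem(img, i);
}

/* Halo exchange performed by image me, pulling neighbours' boundary cells:
     if (me > 1)            x(0)   = x(N)[me-1]
     if (me < num_images()) x(N+1) = x(1)[me+1]                          */
void exchange_halos(int me) {
    if (me > 1)
        *local_elem(me, 0) = *coindexed_elem(me - 1, N);
    if (me < NUM_IMAGES)
        *local_elem(me, N + 1) = *coindexed_elem(me + 1, 1);
}
```

In a real CAF program each image would perform the equivalent of exchange_halos for itself, with synchronisation before and after the exchange so that neighbours' boundary values are complete before they are read and are not overwritten while still in use.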

Two forms of synchronisation are available within the language: the sync all construct provides a global synchronisation capability, whilst the sync images construct provides functionality to synchronise particular subsets of images. Collective operators have not yet been standardised, although Cray have implemented their own versions of several commonly used operations. Additionally, no support exists for image “teams” or communicators within the current Fortran 2008 standard.
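The effect of sync all can be loosely approximated in C by treating POSIX threads as images and a barrier as the global synchronisation point. This is an analogy under stated assumptions: the threads share one node, the co-array counter is emulated as an ordinary array, image_main and run_images are hypothetical helper names, and sync images, which synchronises only the listed images, is not modelled.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical image count; threads stand in for CAF images. */
#define NUM_IMAGES 4

static int counter[NUM_IMAGES]; /* emulates  integer :: counter[*]          */
static int seen[NUM_IMAGES];    /* value each image read from its neighbour */
static pthread_barrier_t sync_all_barrier;

/* Body executed by every image (SPMD); arg carries a 0-based image index. */
static void *image_main(void *arg) {
    int me = (int)(intptr_t)arg;
    assert(me >= 0 && me < NUM_IMAGES);
    counter[me] = 100 + me;                   /* local store               */
    pthread_barrier_wait(&sync_all_barrier);  /* plays the role of sync all */
    /* Safe only after the barrier: read the right-hand neighbour's copy,
       the analogue of a co-indexed read  counter[right]. */
    seen[me] = counter[(me + 1) % NUM_IMAGES];
    return NULL;
}

/* Launch all images, wait for them to finish, and return 0 on success. */
int run_images(void) {
    pthread_t tid[NUM_IMAGES];
    if (pthread_barrier_init(&sync_all_barrier, NULL, NUM_IMAGES) != 0)
        return -1;
    for (int i = 0; i < NUM_IMAGES; i++)
        if (pthread_create(&tid[i], NULL, image_main, (void *)(intptr_t)i) != 0)
            return -1;
    for (int i = 0; i < NUM_IMAGES; i++)
        pthread_join(tid[i], NULL);
    pthread_barrier_destroy(&sync_all_barrier);
    return 0;
}
```

After run_images() returns, seen[me] holds 100 + (me + 1) % 4; without the barrier, an image could read its neighbour's counter before that neighbour had written it. On most POSIX systems the sketch is compiled with the -pthread flag.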

The OpenSHMEM Programming Model

The SHMEM programming model was originally developed by Cray for their T3D systems [81]. Although the technology has existed for some time, it was only recently standardised, in 2012, as part of the OpenSHMEM initiative [40, 157]. Under the OpenSHMEM programming model, communications between processes are all one-sided and are referred to as “puts” (remote writes) and “gets” (remote reads). The technology is able to express both intra- and inter-node parallelism, with the latter generally requiring explicit RDMA support from the underlying system layers. These constructs are also purported to offer lower latency and higher bandwidth than alternative approaches.

OpenSHMEM is not part of the Fortran or C language standards; instead, it is implemented as a library used alongside these existing sequential languages. Processes within OpenSHMEM programs make calls into the library to utilise its communication and synchronisation functionality, in a similar manner to how MPI libraries are utilised. The programming model operates at a much lower level than other PGAS models, such as CAF, and enables developers to utilise functionality significantly closer to the underlying hardware primitives. It also makes considerably more functionality available to application developers than higher-level PGAS models such as CAF.

The concept of a symmetric address space is intrinsic to the programming model. Each process makes areas of its memory accessible to the other processes within the overall application through the global address space supported by the programming model. How this functionality is realised is generally implementation dependent; however, it is often achieved using collective functions that allocate memory at the same relative address on each process.
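A symmetric heap of this kind, together with one-sided puts and gets, can be sketched in C by giving every process an identically laid-out buffer and allocating from all of them in lockstep, so that an offset valid on one PE is valid on every PE. The whole sketch runs in a single process, and the names sym_alloc, sym_addr, sim_put and sim_get are hypothetical. In recent versions of the OpenSHMEM specification the corresponding roles are played by routines such as shmem_malloc, shmem_putmem and shmem_getmem, whose internals are implementation specific.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sizes: NPES processes, each with a HEAP_BYTES symmetric heap. */
#define NPES 4
#define HEAP_BYTES 1024

/* heaps[p] stands in for the symmetric heap of PE p; all heaps share the
   same layout because every allocation is performed collectively. */
static unsigned char heaps[NPES][HEAP_BYTES];
static size_t heap_top = 0; /* identical on all PEs by construction */

/* Collective allocation: all PEs call this in the same order, so the
   returned offset names the same object on every PE. */
size_t sym_alloc(size_t nbytes) {
    size_t off = heap_top;
    assert(off + nbytes <= HEAP_BYTES);
    heap_top += (nbytes + 7) & ~(size_t)7; /* round up to 8 bytes */
    return off;
}

/* Translate a symmetric offset into an address in PE pe's heap. */
void *sym_addr(size_t off, int pe) {
    assert(pe >= 0 && pe < NPES && off < HEAP_BYTES);
    return &heaps[pe][off];
}

/* One-sided put: copy nbytes of local data to offset dest_off on PE pe. */
void sim_put(size_t dest_off, const void *src, size_t nbytes, int pe) {
    memcpy(sym_addr(dest_off, pe), src, nbytes);
}

/* One-sided get: copy nbytes from offset src_off on PE pe into local memory. */
void sim_get(void *dest, size_t src_off, size_t nbytes, int pe) {
    memcpy(dest, sym_addr(src_off, pe), nbytes);
}
```

Because sym_alloc is called in the same order on every PE, the offset it returns identifies the same object everywhere; a put therefore needs only the local source, the symmetric destination and the target PE number, and no cooperation from the target.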

To implement point-to-point synchronisation, it is necessary to utilise explicit “flag” variables, or potentially OpenSHMEM’s extensive locking routines, to control access to globally accessible memory locations. The concept of memory “fences”, which ensure the ordering of operations on remote memory locations, is also intrinsic to the programming model. Collective operations are part of the standard, although currently no all-to-one operations are defined, just their all-to-all equivalents.
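The flag-and-fence idiom can be sketched with C11 atomics, using one thread to stand in for the sending PE and another for the receiving PE. This is an analogy on shared memory, not an OpenSHMEM program: inbox, ready_flag, producer, consumer and run_exchange are hypothetical names. The release fence plays the role that shmem_fence plays in OpenSHMEM (ordering the data put before the flag put), and the acquire spin loop plays the role of a wait routine such as shmem_int_wait_until.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define NVALS 8 /* hypothetical payload length */

static double inbox[NVALS];       /* receiver's buffer, written "remotely" */
static atomic_int ready_flag = 0; /* receiver's flag, 1 = payload present  */
static double received_sum;       /* result computed by the receiver       */

/* Sender: write the payload, fence, then raise the flag. */
static void *producer(void *arg) {
    (void)arg;
    for (int i = 0; i < NVALS; i++)
        inbox[i] = (double)(i + 1);                           /* data "put" */
    atomic_thread_fence(memory_order_release);                /* the fence  */
    atomic_store_explicit(&ready_flag, 1, memory_order_relaxed); /* flag "put" */
    return NULL;
}

/* Receiver: spin until the flag is raised, then consume the payload. */
static void *consumer(void *arg) {
    (void)arg;
    while (atomic_load_explicit(&ready_flag, memory_order_acquire) == 0)
        ; /* busy-wait on the flag */
    double sum = 0.0;
    for (int i = 0; i < NVALS; i++)
        sum += inbox[i];
    received_sum = sum;
    return NULL;
}

/* Run one exchange and return the sum observed by the receiver. */
double run_exchange(void) {
    pthread_t send_tid, recv_tid;
    atomic_store(&ready_flag, 0);
    pthread_create(&recv_tid, NULL, consumer, NULL);
    pthread_create(&send_tid, NULL, producer, NULL);
    pthread_join(send_tid, NULL);
    pthread_join(recv_tid, NULL);
    assert(atomic_load(&ready_flag) == 1); /* the sender raised the flag */
    return received_sum;
}
```

Without the fence, the flag store could become visible before the payload stores, and the receiver could sum stale data; the fence is what turns the flag into a reliable signal that the payload is complete.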

In document Evaluating technologies and techniques for transitioning hydrodynamics applications to future generations of supercomputers (Page 53-56)