haloSwap() - Inter-Process Communication - High-performance computing for computational biology

3.3 Inter-Process Communication

3.3.1 haloSwap()

The communication process described by HaloSwap is implemented in the haloSwap() function of device.c, the full code for which is shown in Listing C.3. In practice, the separate send and receive actions shown above are combined in the MPI_Sendrecv() function. This prevents the need to establish separate connections to send and receive with the same neighbouring process.

Defining Hyperplanes for Communication

Each subdomain will have up to six surfaces that require halo exchange with surrounding processes. For each surface, two three-dimensional hyperplanes of New are defined: the surface of the subdomain to be sent, and the corresponding halo into which received data will be written. It is possible to facilitate the exchange of data between processes by buffering, then ‘unpacking’, the data.

The MPI library, however, offers an efficient mechanism by which buffering can be avoided. To the developer, data appears to be sent directly from, and received directly into, the required regions of an array. To do this, MPI requires that the regions of the array being accessed are defined as a derived datatype, using subarrays. A subarray allows a one-dimensional array to be viewed as containing n-dimensional data, and places constraints on that array such that sparsely arranged data within the array can be accessed as if contiguous in memory. The concept of subarrays fits Beatbox’s use of New well. Subarrays for halo-swapping are defined in the decomp_defineHaloTypes() function of decomp.c, which is shown in Listing C.4. Once defined, MPI_Sendrecv() calls can access New using the derived datatype definition, allowing subarrays to be sent or received in a single call. Examples of derived datatypes in use can be seen in Listing C.3.
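To make the idea of a subarray concrete, the following is a minimal sketch in plain C (no MPI calls) of how a region of an n-dimensional array maps onto offsets in the underlying one-dimensional storage. The Subarray struct and both functions are hypothetical names introduced for illustration; they are not taken from decomp.c. The three fields mirror the sizes, subsizes and starts arguments of MPI_Type_create_subarray(), and the (x, y, z, v) storage order with v varying fastest is an assumption.

```c
#include <stddef.h>

#define NDIMS 4  /* x, y, z, v; this storage order is an assumption */

/* Hypothetical descriptor mirroring the sizes/subsizes/starts arguments
 * of MPI_Type_create_subarray(). Not part of Beatbox. */
typedef struct {
    int sizes[NDIMS];     /* extent of the full array in each dimension */
    int subsizes[NDIMS];  /* extent of the selected region */
    int starts[NDIMS];    /* first index of the region in each dimension */
} Subarray;

/* Number of elements selected by the subarray. */
size_t subarray_count(const Subarray *s)
{
    size_t n = 1;
    for (int d = 0; d < NDIMS; d++)
        n *= (size_t)s->subsizes[d];
    return n;
}

/* Flat offset, in the underlying 1D array, of the k-th element of the
 * region, counting in C (row-major) order: last dimension varies fastest. */
size_t subarray_offset(const Subarray *s, size_t k)
{
    size_t offset = 0, stride = 1;
    for (int d = NDIMS - 1; d >= 0; d--) {
        size_t idx = k % (size_t)s->subsizes[d];  /* index inside the region */
        k /= (size_t)s->subsizes[d];
        offset += ((size_t)s->starts[d] + idx) * stride;
        stride *= (size_t)s->sizes[d];
    }
    return offset;
}

/* With MPI, the same triple would describe a derived datatype, e.g.
 *   MPI_Type_create_subarray(NDIMS, sizes, subsizes, starts,
 *                            MPI_ORDER_C, MPI_DOUBLE, &type);
 * so that the library walks these offsets itself and no packing
 * buffer is needed in user code. */
```

For a 4 × 4 × 4 × 2 array with one halo layer in x, y and z, the face at x = 1 would be described by subsizes {1, 2, 2, 2} and starts {1, 1, 1, 0}. It selects 8 elements, the first of which sits at flat offset 42, and the elements are not contiguous in memory.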

Magic Corners

As discussed in Section 4.11, some operations used with anatomically realistic tissue geometry must be able to reference neighbouring points diagonally. For points at the corners of the local subdomain, this will mean referencing points from diagonally neighbouring subdomains. A 2D subdomain will have 4 diagonal neighbours in addition to its 4 orthogonal neighbours, as illustrated in Figure 3.10. In 3D, 20 diagonal neighbours are added to the existing 6 orthogonal neighbours. Since there is significant overhead in establishing communication to other processes, synchronisation with these additional processes every timestep is to be avoided.

(a) Halo exchange involving only orthogonally neighbouring processes.

(b) Halo exchange involving orthogonally and diagonally neighbouring processes.

Figure 3.10: Two-dimensional views of traditional halo exchanges. Red dots indicate updated halo points.

Beatbox employs an algorithm designed to obviate the need to synchronise with diagonally neighbouring subdomains. Instead of communicating individual corner points between diagonally neighbouring processes, we exchange orthogonally the corresponding extended faces of the local domain hypercube, (x, y, z, v). First, we widen the (x, z, v) face of the local subdomain to include the halo points in the x direction. We then exchange these widened faces in the y direction. We then widen the (x, y, v) faces to include halo points in both the x and y directions. These faces are then exchanged in the z direction. As illustrated in Figure 3.11, exchanging the widened faces in sequence (with exchanges taking place in x, y, z order) allows corner points to travel between processes indirectly, via the 6 orthogonal neighbours. The corner points updated by this method are referred to as ‘magic corners’. Magic corners will always travel along axes in x, y, z order, meaning that, for a point to come from {x − 1, y + 1, z + 1}, it will travel in the negative direction along the x axis, then positively along the y axis, then positively along the z axis. Its reciprocal point will travel positively along the x axis, negatively along the y axis, then negatively along the z axis to reach its destination. The path taken by two opposing corner points is illustrated in Figure 3.12.
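The two-step exchange can be checked with a small single-process simulation. The sketch below is not Beatbox code: it assumes a hypothetical 2 × 2 grid of 2D subdomains, each with a 3 × 3 interior and a halo one point wide, and replaces MPI communication with direct array copies. All names (Sub, init_subs, exchange_x, exchange_y_widened, global_id) are invented for illustration.

```c
#define P 2          /* subdomains per axis (hypothetical 2 x 2 process grid) */
#define N 3          /* interior points per axis in each subdomain */
#define W (N + 2)    /* local width including one halo layer on each side */
#define EMPTY (-1)   /* marks halo points that have not been filled */

/* Local array u[x][y]; halo points sit at indices 0 and W - 1. */
typedef struct { int u[W][W]; } Sub;

/* Unique id given to the point with global coordinates (gx, gy). */
int global_id(int gx, int gy) { return gx * (P * N) + gy; }

/* Give every interior point its global id and clear all halos. */
void init_subs(Sub s[P][P])
{
    for (int px = 0; px < P; px++)
        for (int py = 0; py < P; py++)
            for (int x = 0; x < W; x++)
                for (int y = 0; y < W; y++) {
                    int interior = x >= 1 && x <= N && y >= 1 && y <= N;
                    s[px][py].u[x][y] = interior
                        ? global_id(px * N + x - 1, py * N + y - 1)
                        : EMPTY;
                }
}

/* Step 1: exchange faces along x, covering interior y only. */
void exchange_x(Sub s[P][P])
{
    for (int px = 0; px + 1 < P; px++)
        for (int py = 0; py < P; py++)
            for (int y = 1; y <= N; y++) {
                s[px][py].u[W - 1][y] = s[px + 1][py].u[1][y]; /* from +x */
                s[px + 1][py].u[0][y] = s[px][py].u[N][y];     /* from -x */
            }
}

/* Step 2: exchange faces along y, widened to include the x halos
 * (x = 0 and x = W - 1), which step 1 has already filled. */
void exchange_y_widened(Sub s[P][P])
{
    for (int px = 0; px < P; px++)
        for (int py = 0; py + 1 < P; py++)
            for (int x = 0; x < W; x++) {
                s[px][py].u[x][W - 1] = s[px][py + 1].u[x][1]; /* from +y */
                s[px][py + 1].u[x][0] = s[px][py].u[x][N];     /* from -y */
            }
}
```

After init_subs(), exchange_x() and exchange_y_widened() have run in that order, the corner halo s[0][0].u[W - 1][W - 1] holds global_id(N, N), a point owned by the diagonal neighbour s[1][1], even though those two subdomains never exchanged data directly. After exchange_x() alone, the same corner is still EMPTY.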

(a) Step 1 (b) Step 2

Figure 3.11: Two-dimensional view of the Beatbox halo exchange, with ‘Magic Corners’ shown as stars. New exchanges are shown with larger arrows.

Figure 3.12: Close-up view of the path taken by two opposing corner points. Travel is along the x axis, then the y axis.

Enabling Collective Communication

When called, haloSwap() starts a collective exchange of halos. It is necessary to call haloSwap() […] global space are updated. In order for this to work, haloSwap() must be called by every instance of a device, including those whose runHere flag is set to 0. This is managed by inserting calls to haloSwap() inside the device’s Run function using the RUN_HEAD macro (device.h). As shown in Listing 3.1, haloSwap() is called before runHere is tested. This ensures that every active process is involved in exchanging halos.

Preventing Unnecessary Communication

Exchanging halos is an expensive operation, due to both the time taken to send and receive data across the network, and the ‘synchronisation time’ incurred because collective calls require processes to wait for the slowest process to catch up. Only devices that reference points outside of their space (i.e. those with ‘stencils’ of more than one point) need call haloSwap(). Devices indicate their need for synchronisation by setting the sync field of their Device structure to 1. As shown in Listing 3.1, the RUN_HEAD macro tests sync before calling haloSwap().
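The ordering of the two tests can be sketched as follows. This is not the RUN_HEAD macro from device.h, which is given in Listing 3.1. It is a pared-down, hypothetical stand-in showing only the control flow described in the text: sync is tested first, the halo swap follows, and runHere is tested last. The Device struct here contains only the two fields named above, and haloSwap_stub() merely counts calls in place of the real collective exchange.

```c
/* Hypothetical, pared-down Device: only the two fields named in the text. */
typedef struct {
    int sync;     /* 1 if the device reads points outside its own space */
    int runHere;  /* 1 if the device has work to do on this process */
} Device;

int halo_swaps = 0;   /* counts collective calls made by this process */
int bodies_run = 0;   /* counts device bodies actually executed */

/* Stand-in for the real haloSwap(); it only records that it was called. */
void haloSwap_stub(void) { halo_swaps++; }

/* Sketch of the ordering only: test sync, swap halos, then test runHere.
 * Must be used inside a function returning void. */
#define RUN_HEAD_SKETCH(dev)                 \
    do {                                     \
        if ((dev)->sync) haloSwap_stub();    \
        if (!(dev)->runHere) return;         \
    } while (0)

void run_device(const Device *dev)
{
    RUN_HEAD_SKETCH(dev);
    bodies_run++;   /* the device's own computation would go here */
}
```

In this sketch, a device with sync set to 1 and runHere set to 0 still takes part in the swap but skips its body, while a device with sync set to 0 never triggers a swap at all.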
