3.3 Distributed Arrays
3.3.2 Ghost Region Updates
3.3.2.1 Implementing Ghost Region Updates for X10 Distributed
Support for ghost regions was implemented in the package x10.regionarray as a number of new classes, as well as modifications to the existingDistArray class. A new method on the Region class, getHalo(haloWidth:Int), returns a halo region comprising the neighborhood of the target region. For rectangular regions, the halo region is simply a larger rectangular region enclosing the target region. For the special case of a zero-width ghost region (no ghosts),Region.getHalo(0)returns the region itself.
The constructor for DistArray was changed to allocate storage for the ghost region inLocalState. A new field,LocalState.ghostManager:GhostManager, holds a distribution-specific object that manages ghost updates. This reduces to standard behavior for DistArray in the special case of ghostWidth==0 as there is no ghost manager and the ghost region is identical to the resident region. All operations on
DistArraywere changed to use the ghost region rather than the resident region for indexing.
The implementation of the GhostManagerinterface is specific to the distribution type. It may also use different algorithms depending on the target architecture. The initial implementation is for rectangular, block-distributed arrays only. For these arrays, ghost region data are collected for sending to each place using thegetPatch
method described in §3.3.1.
The following methods are defined onDistArrayand constitute the user APIfor the ghost region implementation:
• sendGhostsLocal()
an operation called at each place in the distribution that sends boundary data from this place to the ghost regions stored at neighboring places
§3.3 Distributed Arrays 57
an operation called at each place in the distribution that waits for ghost data at this place to be received from all neighboring places
• updateGhosts()
an operation that is called at a single place to update ghost regions for the entire array; this starts an activity at each place in the distribution to send and wait for ghosts
Low-Synchronization Algorithm for Ghost Updates
Ghost region updates are typically used in the context of a phased computation, for example:
1 for (i in 1..ITERS) {
2 updateGhosts();
3 computeOnLocalAndGhostData();
4 }
It is necessary to synchronize between neighboring places in a computation to ensure that all ghost regions have been fully received at a place before computation begins at that place.
There are two basic approaches to this problem. One is to use two-sided (send/re- ceive or scatter/gather) communications. This is the approach used in the PETSc library [Balay et al.,2011], and in the M_P (message-passing) algorithm described by Palmer and Nieplocha[2002].
An alternative is to use one-sided communications surrounded by explicit syn- chronization. In some computations, such synchronization may naturally be included in the computation, for example to calculate a minimum or maximum value across all grid points. In the following example, collective synchronization occurs each iteration before the ghost data are sent and again before they are used in computation:
1 for (i in 1..ITERS) { 2 // collective synchronization 3 sendGhosts(); 4 computeOnLocalData(); 5 // collective synchronization 6 computeOnGhostData(); 7 }
The collective operations surrounding the ghost update ensure the consistency of ghost data by enforcing an ordering with regard to other messages. All previ- ous send operations from a place must complete before the collective reduction can begin. Where such natural synchronization is not present, the ghost update op- eration must perform synchronization before and after sending ghosts. In Global Arrays [Nieplocha et al.,2006a] this synchronization is done with a global collective operation.
Our approach combines non-blocking one-sided messages with local synchroniza- tion as suggested byKjolstad and Snir [2010]. A phase counter is assigned to each
ghosted DistArray. The use of a unique phase counter per array allows ghost up- dates on different arrays to proceed independently. This can be of use, for example, in a multigrid or adaptive mesh refinement algorithm in which different timesteps are used for coarser or finer grids. In each even-numbered phase the program com- putes on ghost data; in each odd-numbered phase ghost data are exchanged with neighboring places. A place may not advance more than one phase ahead of any neighboring place. A call tosendGhostsLocal()increments the phase for this place and then sends active messages to update ghost data at neighboring places. Each active message also sets a flag to notify the receiving place that data have arrived from a particular neighbor. The receiving place callswaitForGhostsLocal()to check that flags have been set for all neighbors before proceeding with the next computation phase.
Split-Phase Ghost Updates
The use of local synchronization allows phases to proceed with computation before neighboring places have received their ghost data; it also allows communication of ghost data to overlap with computation on local data at each place, as follows:
1 // at each place 2 for (i in 1..ITERS) { 3 sendGhostsLocal(); 4 computeOnLocalData(); 5 waitForGhostsLocal(); 6 computeOnGhostData(); 7 }
This approach is similar to the split-phase barrier in UPC, discussed in chapter2. Use of Active Messages
In the implementation of ghost updates, active messages are used to transfer and perform local layout of ghost data, and to ensure consistency of data for each phase of computation. InsendGhostsLocal(), each place sends messages to neighboring places. A conditional statement (when) ensures that the ghost data are not updated until the receiving place has entered the appropriate phase. After ghost data have been updated, a flag is set within an atomic block to indicate that the data have been received:
1 at(neighbor) async {
2 val mgr = localHandle().ghostManager;
3 when (mgr.currentPhase() == phase);
4 for (p in overlap) { 5 ghostData(p) = neighborData(p); 6 } 7 atomic 8 mgr.setNeighborReceived(sourcePlace); 9 }
§3.4 Summary 59
InwaitForGhostsLocal(), another conditional atomic block is used to wait until ghost data have been received from all neighboring places:
1 public def waitForGhostsLocal() {
2 when (allNeighborsReceived()) {
3 currentPhase++;
4 resetNeighborsReceived();
5 }
6 }
Support for ghost region updates in block-distributed arrays was added to x10. regionarray.DistArrayin X10 version 2.5.2. Ghost region support will be added to the basic distributed array classes in thex10.arraypackage in future versions of X10. Further enhancements are possible, for example, support for irregular distributions, and ghost regions with different (or zero) extent in different dimensions.