
5.3 Net Patterns


FIGURE 5.4: A GAM net representing a farm.

Fig. 5.4 illustrates a GAM net implementing a farm. Each entity of the farm skeleton is mapped to a GAM processor as follows (letters in parentheses refer to identifiers in Fig. 5.4):

• The farm scheduler is mapped to a source processor (S);

• Each farm worker is mapped to a filter processor (W);

• The farm collector is mapped to a sink processor (C).

The scheduler S is linked to each worker by a one-to-many communicator (D), whereas each worker is linked to the collector C by a many-to-one communicator (G). Since D is attached to a single input processor, the pulling policy for D is the constant function that always returns S, as discussed in Sect. 5.1.1. Symmetrically, the pushing policy for G is the constant function that always returns C.

As for the pushing policy of D and the pulling policy of G, they depend on the specific type of farm being implemented, as we discuss in Sect. 7.1.
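The mapping described above can be sketched in plain C++ as follows. This is only an illustrative model, not the GAM nets API: the names `ProcId`, `Communicator`, `FarmLayout` and `make_farm` are hypothetical, and the round-robin choice for the pushing policy of D and the pulling policy of G is an assumption, since the text leaves those policies to the specific type of farm.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical processor identifier (not part of the GAM API).
using ProcId = std::size_t;

// A communicator is modelled only by its two policies. Each policy maps
// the sequence number of a pointer to the processor to pull from / push to.
struct Communicator {
    std::function<ProcId(std::size_t)> pull;
    std::function<ProcId(std::size_t)> push;
};

struct FarmLayout {
    ProcId scheduler;             // S, a source processor
    std::vector<ProcId> workers;  // W, filter processors
    ProcId collector;             // C, a sink processor
    Communicator D;               // one-to-many: S -> workers
    Communicator G;               // many-to-one: workers -> C
};

FarmLayout make_farm(std::size_t n_workers) {
    assert(n_workers > 0);
    FarmLayout f;
    f.scheduler = 0;
    for (std::size_t i = 0; i < n_workers; ++i) f.workers.push_back(i + 1);
    f.collector = n_workers + 1;

    const ProcId s = f.scheduler;
    const ProcId c = f.collector;
    const std::vector<ProcId> ws = f.workers;

    // D has a single input processor: its pulling policy is constant (S).
    f.D.pull = [s](std::size_t) { return s; };
    // ASSUMPTION: round-robin dispatching; the text leaves this policy open.
    f.D.push = [ws](std::size_t seq) { return ws[seq % ws.size()]; };
    // ASSUMPTION: round-robin gathering; the text leaves this policy open.
    f.G.pull = [ws](std::size_t seq) { return ws[seq % ws.size()]; };
    // G has a single output processor: its pushing policy is constant (C).
    f.G.push = [c](std::size_t) { return c; };
    return f;
}
```

For instance, with `make_farm(3)`, `D.pull` returns the scheduler and `G.push` returns the collector for every sequence number, whereas `D.push` cycles over the three workers.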

5.3.3 Active Communicators

Let us consider the scenario in which a processor p is linked to a number of downstream processors by a one-to-many communicator. Let us also assume that the pushing dispatcher is of broadcast type (cf. Sect. 5.2.2), that is, each incoming pointer is delivered to all the output processors. We remark that this scenario is not unrealistic. For instance, in the context of data analytics, a common pattern consists in analyzing the same data by means of different operators, which immediately leads to broadcasting communication when the operators are deployed on different nodes of a distributed platform. As a concrete example, in the Apache Storm [109] framework for tuple processing, broadcasting is the only available model for data distribution.

Starting from this generic scenario, it can be observed that, as the number of output processors increases, p consumes more and more computational resources to distribute the data to all the processors. Therefore, we propose active communicators as a general approach for alleviating this phenomenon. As illustrated in Fig. 5.5, an active communicator is a GAM (sub-)net that behaves as a regular (passive) communicator, in that it routes pointers among the processors. Internally, an active communicator consists of multiple processors, linked by intermediate communicators according to some networking topology (e.g., a tree), that actively collaborate to deliver each pointer.
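To make the benefit concrete, the following sketch counts the send operations performed per pointer, comparing a flat broadcast with a tree of depth 1. It is a simple cost model rather than GAM code: the names `BroadcastCost`, `flat_broadcast` and `tree_broadcast` are hypothetical, and the assignment of output processors to relays (cyclic) is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Per-pointer cost of a broadcast (hypothetical model, not GAM code).
struct BroadcastCost {
    std::size_t root_sends;       // sends performed by p
    std::size_t max_relay_sends;  // worst-case sends performed by one relay
    std::size_t delivered;        // number of output processors reached
};

// Passive communicator: p itself sends to each of the n output processors.
BroadcastCost flat_broadcast(std::size_t n_outputs) {
    return {n_outputs, 0, n_outputs};
}

// Active communicator shaped as a tree of depth 1: p sends to each relay,
// and each relay forwards to its share of the output processors.
// ASSUMPTION: output processor i is served by relay (i mod n_relays).
BroadcastCost tree_broadcast(std::size_t n_outputs, std::size_t n_relays) {
    assert(n_relays > 0);
    std::vector<std::size_t> relay_sends(n_relays, 0);
    std::size_t delivered = 0;
    for (std::size_t out = 0; out < n_outputs; ++out) {
        ++relay_sends[out % n_relays];
        ++delivered;
    }
    std::size_t max_sends = 0;
    for (std::size_t s : relay_sends) {
        if (s > max_sends) max_sends = s;
    }
    return {n_relays, max_sends, delivered};
}
```

Under this model, broadcasting to 64 output processors costs p 64 sends with the flat scheme, but only 8 sends with 8 relays, each of which performs 8 sends in turn; all 64 outputs are reached in both cases.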


FIGURE 5.5: Replacement of a broadcasting communicator with an active communicator shaped as a tree of depth 1. (A) A simple net with a broadcasting communicator C. (B) The net in (A) after replacing C with an active communicator, highlighted in gray.

We remark that, differently from the svc functions considered so far (e.g., cf. Sect. 5.3.1), the svc functions for the processors within active communicators are unlikely to access the data carried by the incoming pointers. Indeed, the simplest form of such functions amounts to a single call to the emit function. Therefore, when based on dispatching policies that do not access the data either (as in the common case), active communicators provide scalable, lightweight communication across processors in a GAM net.
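A relay processor of this kind can be sketched as follows. This is a hedged illustration rather than the actual GAM nets API: `std::shared_ptr` stands in for a GAM smart pointer, and the `RelayProcessor` type with its `outbox` member is a hypothetical placeholder for a processor attached to an output communicator. Only the names `svc` and `emit` are taken from the text.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// ASSUMPTION: std::shared_ptr stands in for a GAM smart pointer.
template <typename T>
using Ptr = std::shared_ptr<T>;

// Hypothetical relay processor inside an active communicator.
template <typename T>
struct RelayProcessor {
    // Records emitted pointers, in place of a real output communicator.
    std::vector<Ptr<T>> outbox;

    void emit(const Ptr<T>& p) { outbox.push_back(p); }

    // The simplest svc: a single call to emit. The pointer is forwarded
    // as-is and the pointed-to data is never dereferenced.
    void svc(const Ptr<T>& in) {
        assert(in != nullptr);  // checks the handle only, not the data
        emit(in);
    }
};
```

Forwarding a pointer to a large buffer through `svc` leaves the buffer untouched: the relay only copies the handle, so its cost does not depend on the size of the data.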

Summary

In this chapter, we introduced GAM nets, a dataflow-like parallel programming model based on GAM. Moreover, we presented a template-based C++ API and implementation of GAM nets, for programming GAM nets in terms of parallel stateful processors exchanging smart GAM pointers. Finally, we showed how two well-known stream-parallel programming patterns, namely farm and pipeline, can be implemented as GAM nets.

Chapter 6

Higher-Level Programming Models on top of GAM

In this chapter, we discuss how GAM can be exploited to build RTSs for different high-level parallel programming models. Although this chapter does not represent a fully developed contribution, it outlines possible exploitations of the GAM model and implementation presented so far. Therefore, this chapter should be regarded as a detailed proposal for future work.

In particular, we focus on implementing RTSs for data parallelism and task parallelism. From an architectural perspective, this amounts to implementing an RTS in terms of stream parallelism, which is the primary type of parallelism provided by GAM. This approach is extensively used in the implementation of parallel RTSs. For instance, several implementations of the data-parallel OpenMP API (cf. Sect. 2.2.4) consist of a thread farm (cf. Sect. 5.3.2). Similarly, FastFlow uses stream skeletons (i.e., farm and pipeline) to implement data-parallel operators (cf. Sect. 2.4.2). As for the realm of distributed systems, the Flink [79] framework for Big Data analytics relies on a streaming runtime for implementing batch processing.¹

Along the same lines as the mentioned frameworks, we envision relying on passing pointers, rather than data, as the basic mechanism for limiting performance overhead.
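The idea of realizing a data-parallel operation on top of a farm, while streaming pointers rather than data, can be illustrated with a minimal shared-memory sketch. It is not based on GAM: standard C++ threads play the role of farm workers, raw pointers stand in for GAM pointers, and the names `Chunk` and `farm_map` are hypothetical. The on-demand dispatching through an atomic counter is likewise an assumption.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// A stream item carries only a pointer range into the shared data.
struct Chunk {
    int* begin;
    int* end;
};

// Data-parallel map implemented as a farm over a stream of chunks.
template <typename F>
void farm_map(std::vector<int>& data, std::size_t n_workers,
              std::size_t chunk_size, F f) {
    assert(n_workers > 0 && chunk_size > 0);

    // Scheduler: turn the collection into a stream of chunk descriptors.
    std::vector<Chunk> stream;
    for (std::size_t i = 0; i < data.size(); i += chunk_size) {
        const std::size_t j = std::min(i + chunk_size, data.size());
        stream.push_back({data.data() + i, data.data() + j});
    }

    // ASSUMPTION: on-demand dispatching; each worker takes the next chunk.
    std::atomic<std::size_t> next{0};
    auto worker = [&] {
        for (;;) {
            const std::size_t k = next.fetch_add(1);
            if (k >= stream.size()) return;
            // Chunks are disjoint, so workers update the data in place.
            for (int* p = stream[k].begin; p != stream[k].end; ++p) *p = f(*p);
        }
    };

    std::vector<std::thread> workers;
    for (std::size_t w = 0; w < n_workers; ++w) workers.emplace_back(worker);
    // Collector: joining the workers marks the end of the stream.
    for (auto& t : workers) t.join();
}
```

Since each stream item is a pair of pointers, the cost of dispatching a chunk is independent of the chunk size; only the worker that processes a chunk touches its elements.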

This chapter proceeds as follows. In Sect. 6.1, we propose accelerated data structures for supporting data-parallel programming. In Sect. 6.2, we discuss the GAM implementation of a framework for task-parallel programming.

6.1 Accelerated Data Structures

We propose to target data parallelism in terms of accelerated data structures. This approach, which stems from programming specialized hardware accelerators such as GPUs, has been shown by Drocco et al. [74] to be effectively applicable in the context of distributed-memory platforms.

We proceed by illustrating the mentioned approach for exploiting clusters in a data-parallel manner (Sect. 6.1.1). Then we describe an API based on accelerated data structures and transformations (Sect. 6.1.2), together with a GAM-based implementation (Sect. 6.1.3).

¹ Batch processing is a generic name for the processing of finite datasets by means of a combination of data-parallel operations.

FIGURE 6.1: Cluster-as-accelerator paradigm with GAM. [Diagram: a sequential flow offloads work to a GAM accelerator and gets the result back.]
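The offload/get-result interaction shown in the figure can be sketched, under strong simplifying assumptions, with standard C++ futures. The `Accelerator` class and its `offload` and `get_result` members are hypothetical names chosen to mirror the figure labels; a local asynchronous task stands in for the cluster, and the summation kernel is an arbitrary placeholder.

```cpp
#include <cassert>
#include <future>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical accelerator handle; a local async task stands in for the
// cluster. This does not reflect the API proposed in Sect. 6.1.2.
class Accelerator {
public:
    // offload: hand the data over and return immediately, so that the
    // sequential flow can proceed while the accelerator works.
    void offload(std::vector<int> data) {
        pending_ = std::async(std::launch::async, [d = std::move(data)] {
            // Placeholder kernel: sum of the elements.
            return std::accumulate(d.begin(), d.end(), 0LL);
        });
    }

    // get result: block until the offloaded computation has completed.
    long long get_result() {
        assert(pending_.valid());
        return pending_.get();
    }

private:
    std::future<long long> pending_;
};
```

The sequential flow calls `offload`, carries on with unrelated work, and synchronizes only when it calls `get_result`.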
