
4.3 A Dataflow IR for Implicit Concurrency

4.3.1 From Implicit to Explicit Concurrency

Dataflow is an inherently concurrent representation of a program as a directed graph where nodes are computations and edges represent data transfer between them. As such, we present our dataflow IR in an abstract fashion, as an execution model for exposing concurrency. For a concrete concurrent or even parallel implementation, a compiler back-end can map this graph either to threads and queues [148] or to processors and interconnects, for example on an FPGA [215]. For this paper, we implemented the threads-and-queues version. The details are beyond the scope of this paper.

The Dataflow IR. We define our dataflow IR in Figure 4.7a. The basic dataflow elements are nodes, edges and ports. A dataflow node receives data via its input ports, performs a computation and emits the results via its output ports. Such a computation can correspond to a foreign function call or an I/O call. It can also correspond to one of the special dataflow functions or predefined value functions. Data travels through the graph along edges. Each edge originates at an output port and terminates at an input port.
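The threads-and-queues mapping mentioned above can be made concrete with a minimal Python sketch. Each node runs as a thread and each edge is a FIFO `queue.Queue`. An output port copies every value onto all of its outgoing edges. The class names `OutPort` and `Node`, the `EOS` end-of-stream marker and the `drain` helper are hypothetical illustrations, not the implementation used in this work.

```python
import queue
import threading

EOS = object()  # hypothetical end-of-stream marker; not part of the IR


class OutPort:
    """Output port: may have many outgoing edges and copies every value onto each."""

    def __init__(self):
        self.edges = []

    def connect(self):
        edge = queue.Queue()  # an edge is a FIFO queue feeding exactly one input port
        self.edges.append(edge)
        return edge

    def emit(self, value):
        for edge in self.edges:
            edge.put(value)


class Node(threading.Thread):
    """1-1 node: dequeue one value per input port, compute, emit one result."""

    def __init__(self, fn, inputs):
        super().__init__(daemon=True)
        self.fn = fn
        self.inputs = inputs  # exactly one edge per input port
        self.out = OutPort()

    def run(self):
        while True:
            args = [edge.get() for edge in self.inputs]
            if any(a is EOS for a in args):
                self.out.emit(EOS)  # propagate end of stream and stop
                return
            self.out.emit(self.fn(*args))


def drain(edge):
    """Collect values from an edge until the end-of-stream marker arrives."""
    values = []
    while (v := edge.get()) is not EOS:
        values.append(v)
    return values


# Graph: src feeds inc and dbl (same output port, two edges); add combines both.
src = OutPort()
inc = Node(lambda x: x + 1, [src.connect()])
dbl = Node(lambda x: x * 2, [src.connect()])
add = Node(lambda a, b: a + b, [inc.out.connect(), dbl.out.connect()])
sink = add.out.connect()  # wire all edges before any value flows

for node in (inc, dbl, add):
    node.start()
for v in [1, 2, 3]:
    src.emit(v)
src.emit(EOS)

results = drain(sink)
print(results)  # [4, 7, 10]
```

All edges are wired before the threads start, so no emitted value can be lost to an edge that does not exist yet.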

An input port can only be connected to a single edge, while an output port can have multiple outgoing edges. An output port replicates each emitted value and sends it along all of its outgoing edges to other nodes. A typical dataflow node dequeues one value from one or from all of its input ports, performs a computation and emits a result. That is, it has a 1−1 correspondence from input to output, very much like a function. Dataflow nodes are, however, free to define their own correspondence. For example, consider a node that retrieves a value from one or all of its input ports but emits N values before retrieving the next input value. This node has a 1−N correspondence. The opposite, an N−1 node, retrieves N values from one or all of its input ports and only then emits a single result value. Our figures depict this correspondence by the color of the node. To support this concept, a node is allowed to have state, i.e., its computation may have side-effects.

We define a set of 1−1 dataflow nodes that allow control flow and enhance concurrency. The ctrl node takes a boolean and turns it into a control signal that states whether the next node shall be executed. If the control signal is false, the next node must not perform its computation and must discard the data from its input ports. The not node implements boolean negation. The sel (select) node receives a choice on its bottom input port. This choice identifies the input port from which to forward the next data item. The ds (deterministic split) node may have any number of output ports and forwards arriving packets in a round-robin fashion. The dm (deterministic merge) node may have any number of input ports and performs the inverse operation.

Additionally, we define three nodes that operate on lists. The len-[] node computes the size of a list. The []~> node is a 1−N node that receives a list and emits its values one at a time. To perform the inverse N−1 operation, the ~>[] node first receives the number N on its top input port. Only then can it construct the result list.
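The node semantics above can be sketched in Python by modelling each stream as a finite list. This is a simplification: each function sees a whole stream at once, whereas a real node processes items one at a time. The function names are hypothetical. The helper `gated` stands for "the next node" guarded by a ctrl signal. The usage composes ctrl, not and sel into a conditional, assuming that sel's choice is derived from the condition (port 0 for true, port 1 for false).

```python
# Streams are modelled as finite Python lists; real nodes work item by item.

def ctrl(bools):
    """Turn boolean data into control signals (here simply the booleans)."""
    return [bool(b) for b in bools]


def not_(bools):
    """Boolean negation, applied to every item."""
    return [not b for b in bools]


def gated(fn, signals, data):
    """A node guarded by a control signal: compute on True, discard input on False."""
    return [fn(d) for s, d in zip(signals, data) if s]


def sel(choices, *inputs):
    """Forward the next item from the input port named by each choice."""
    ports = [iter(p) for p in inputs]
    return [next(ports[c]) for c in choices]


def ds(stream, n):
    """Deterministic split: deal items round-robin onto n output ports."""
    outs = [[] for _ in range(n)]
    for i, v in enumerate(stream):
        outs[i % n].append(v)
    return outs


def dm(*inputs):
    """Deterministic merge: take items round-robin, inverting ds."""
    ports = [iter(p) for p in inputs]
    total = sum(len(p) for p in inputs)
    return [next(ports[i % len(ports)]) for i in range(total)]


# Conditional x * 10 if x > 0 else -x, built from ctrl, not and sel.
# Assumption: sel's choice is port 0 when the condition holds, else port 1.
xs = [3, -2, 5, -1]
cond = ctrl([x > 0 for x in xs])
then_branch = gated(lambda x: x * 10, cond, xs)
else_branch = gated(lambda x: -x, not_(cond), xs)
result = sel([0 if c else 1 for c in cond], then_branch, else_branch)
print(result)  # [30, 2, 50, 1]
print(dm(*ds(list(range(7)), 3)))  # [0, 1, 2, 3, 4, 5, 6]
```

Because ds and dm both visit their ports in the same fixed order, merging restores the original item order. This is what makes the split and merge deterministic.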

(a) Dataflow IR
Dataflow elements: d ::= 1−1 node | edge | port | 1−N node | N−1 node
Dataflow functions (nodes): ctrl (data to control signal), not (negation), sel (selection), ds (deterministic split), dm (deterministic merge)
Predefined value functions: len-[] (length of list), []~> (list to stream), ~>[] (stream to list)

(b) Translation
Combinators: ff ↦ ff node (foreign function call); io ↦ io node (I/O call)
Terms: x ↦ edge (variable); t ↦ subgraph (term); let x = t in t ↦ lexical scope of x; f_ff(x) ↦ ff node applying the ff-call f to x; io(x) ↦ io node applying the I/O call to x
Control flow: if(t t t) ↦ subgraphs of the three terms connected via ctrl, ctrl, not and sel nodes
Predefined functions: map(λx.t [v1 … vn]) ↦ []~>, ds, replicas of the subgraph for t, dm, ~>[] and len-[]

Figure 4.7: Definition of the dataflow IR and the translation from the expression IR to the dataflow IR.
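The translation rules of Figure 4.7b for variables, let, ff and io can be sketched as a small recursive lowering function in Python. Each ff or io call becomes a 1−1 node with one input port and one output port. A variable names the output port that produces its value. Every use of a variable adds a new edge from that port. The AST classes (`Var`, `Call`, `Let`), the `lower` function and the example names `parse` and `fetch` are hypothetical. The sketch assumes that the single argument of every call is a variable, as in the rules f_ff(x) and io(x). It omits conditionals and map.

```python
from dataclasses import dataclass, field
from typing import Union


@dataclass
class Var:
    name: str


@dataclass
class Call:
    kind: str  # "ff" (foreign function call) or "io" (I/O call)
    fn: str
    arg: Var   # assumption: the single argument is always a variable


@dataclass
class Let:
    name: str
    bound: "Term"
    body: "Term"


Term = Union[Var, Call, Let]


@dataclass
class Graph:
    nodes: list = field(default_factory=list)  # (node_id, label); one output port each
    edges: list = field(default_factory=list)  # (producer_id, consumer_id)


def lower(term, env, g):
    """Lower a term into g; return the node whose output port carries its value."""
    if isinstance(term, Var):
        return env[term.name]  # a variable names an existing output port
    if isinstance(term, Call):
        node_id = len(g.nodes)
        g.nodes.append((node_id, f"{term.kind}:{term.fn}"))  # 1-1 node
        g.edges.append((lower(term.arg, env, g), node_id))  # new edge per variable use
        return node_id
    if isinstance(term, Let):
        port = lower(term.bound, env, g)
        return lower(term.body, {**env, term.name: port}, g)  # lexical scope of x
    raise TypeError(f"not handled in this sketch: {term!r}")


# let y = parse_ff(x) in let a = fetch_io(y) in let b = fetch_io(y) in b
g = Graph(nodes=[(0, "input:x")])
expr = Let("y", Call("ff", "parse", Var("x")),
           Let("a", Call("io", "fetch", Var("y")),
               Let("b", Call("io", "fetch", Var("y")), Var("b"))))
out = lower(expr, {"x": 0}, g)
print(g.nodes)  # [(0, 'input:x'), (1, 'ff:parse'), (2, 'io:fetch'), (3, 'io:fetch')]
print(g.edges)  # [(0, 1), (1, 2), (1, 3)]
```

The two uses of y produce two edges leaving the same output port of the parse node. At runtime that port replicates each value onto both edges, so the two fetch nodes can run concurrently.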

Lowering Expressions to Dataflow. From an expression in EDD form we can easily derive the corresponding dataflow graph, as shown in Figure 4.7b. Each term on the right-hand side of a binding form translates to a node. In EDD form, such a term can only be an application of a combinator, a conditional or a call to one of the predefined functions. The definition of dataflow nodes perfectly matches the semantics of our ff and io combinators, because both may have side-effects. Since both require only a single argument, the corresponding nodes define one input port and one output port with a 1−1 correspondence. Each variable translates into an edge. More specifically, each use of a variable creates a new edge at the output port that produces the variable's value. An unlabeled ellipse denotes an abstract term, which translates to a subgraph of the final dataflow graph.

To translate control flow into dataflow, we turn the result of the first term into a control signal and send it to the subgraph of the second term. The subgraph of the third term receives the negated result as its control signal. Therefore, the subgraph of the second term computes a result only if the subgraph of the first term emitted a true value. Otherwise, the subgraph of the third term performs the computation. The sel node in the translation then forwards the result of whichever branch computed it.

The translation of the map function first streams the input list and then dispatches the values to replicas of the subgraph derived from t. The results of the replicas are merged again and then turned back into a list of the same length as the input list. Computation among the replicas of the subgraph of t can happen concurrently. To achieve maximum concurrency, the number of replicas would have to equal the size of the input list. This is impossible to achieve at compile-time, because the size of the list is only known at runtime. We therefore fix the number of replicas at compile-time. We also route the length of the input list, computed by len-[], to the node that constructs the output list. For both constructs, conditionals and map, we assume that the terms passed as arguments do not contain any free variables. Approaches to lifting this restriction can be found elsewhere [215]. In this paper, however, we focus on concurrency transformations.

Figure 4.8: The dataflow graph for concurrent I/O.
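The map lowering can be sketched with Python threads and queues. The input list is streamed ([]~>) and dealt round-robin (ds) to a fixed number of replica threads. The replica results are merged round-robin (dm) and collected into a list of exactly len-[] items (~>[]). The function `map_dataflow`, its `replicas` parameter and the `STOP` marker are hypothetical. The callable `f` stands in for the closed subgraph derived from t. Unlike the ~>[] node, which receives N on an input port, this sketch passes the length directly.

```python
import queue
import threading

STOP = object()  # hypothetical end-of-stream marker


def map_dataflow(f, xs, replicas=4):
    """Sketch of lowering map(λx.t [v1 … vn]) with a compile-time replica count.

    f stands in for the subgraph derived from t and must be closed (no free
    variables). replicas is fixed up front, independent of len(xs).
    """
    n = len(xs)  # len-[]: the length is routed to the list-building node
    to_replica = [queue.Queue() for _ in range(replicas)]
    from_replica = [queue.Queue() for _ in range(replicas)]

    def replica(i):  # one copy of the subgraph for t, running in its own thread
        while (v := to_replica[i].get()) is not STOP:
            from_replica[i].put(f(v))

    workers = [threading.Thread(target=replica, args=(i,)) for i in range(replicas)]
    for w in workers:
        w.start()

    # []~> streams the list; ds deals the values round-robin to the replicas.
    for k, v in enumerate(xs):
        to_replica[k % replicas].put(v)
    for q in to_replica:
        q.put(STOP)

    # dm merges round-robin in the same order; ~>[] stops after exactly n items.
    result = [from_replica[k % replicas].get() for k in range(n)]
    for w in workers:
        w.join()
    return result


print(map_dataflow(lambda x: x * x, [1, 2, 3, 4, 5], replicas=2))  # [1, 4, 9, 16, 25]
```

When there are more replicas than list elements, the extra replicas receive only the end marker and exit. In CPython, the global interpreter lock means the replica threads overlap mainly while they are blocked on I/O. That is the case this chapter targets with concurrent I/O.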

In document Towards Implicit Parallel Programming for Systems (Page 69-71)