Derivation of PPN Modeling Graphs - Maximum Cycle Mean Analysis

4.5 Maximum Cycle Mean Analysis

4.5.3 Derivation of PPN Modeling Graphs

In Section 4.5.1, we mentioned the possibility of applying MCM analysis to PPNs by considering the equivalent CSDF graph and converting that CSDF graph into an HSDF graph. This approach is depicted in the upper part of Figure 4.3. Unfortunately, it results in an exponential increase in the number of nodes. To keep the time needed for the MCM computation within reasonable bounds, we must avoid this exponential increase, which leads us to a new approach.

We have found a way to derive a more compact HSDF graph from a cyclic PPN, which we depict in the lower part of Figure 4.3. Our approach derives an HSDF graph that models the throughput behavior of a PPN and then applies conventional MCM analysis to this graph. The number of nodes in our HSDF graph equals the number of processes in the PPN, and the number of edges is linearly bounded, as we show in Proposition 4.3. As such, the graph size does not grow exponentially, which makes the approach a suitable alternative for fast performance estimation of PPNs. We divide the derivation into two main steps. First, PPN processes are converted to HSDF nodes. Second, PPN channels are converted to HSDF edges.

Step 1: Constructing Nodes from Processes

The first step in deriving the PPN modeling HSDF graph is to convert PPN processes to HSDF nodes. One possible approach is to interpret the PPN as a CSDF graph [HZ+10, adg2csdf] and then derive an HSDF graph from this CSDF graph using the conventional approach [BELP96, Fig. 9]. This approach instantiates $q(p)$ nodes for each process $p$. For consistent PPNs, $q(p)$ always equals the number of points in the process domain $D_p$:

Proposition 4.1 (PPN Repetition Vector). For each process $p$ of a consistent PPN, the corresponding element of the repetition vector of an equivalent CSDF graph equals the number of points in its process domain, that is, $\forall p \in P: q(p) = |D_p|$.

Proof. In a consistent PPN, for every channel $c$, the number of points in the corresponding $\mathrm{OPD}^{j}_{\sigma_c}$ equals the number of points in the corresponding $\mathrm{IPD}^{k}_{\delta_c}$, thus

$$|\mathrm{OPD}^{j}_{\sigma_c}| = |\mathrm{IPD}^{k}_{\delta_c}|.$$

Therefore, the smallest positive integer solution of the balance equation $\Gamma \cdot r = 0$ is a vector $r$ that contains a 1 for every process. As a result, the elements of the repetition vector $q = S \cdot r$ are equal to the phase lengths of the nodes, and the phase length of each node equals the number of points in the corresponding process domain.
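As a small illustration of Proposition 4.1, the following Python sketch builds the topology matrix $\Gamma$ of a PPN read as a CSDF graph, checks that the all-ones vector $r$ solves $\Gamma \cdot r = 0$, and returns $q = S \cdot r$. The function name `repetition_vector` and the input format (per-channel point counts of the output and input port domains) are hypothetical and chosen only for this example. A real tool flow would obtain these counts from the polyhedral domains.

```python
def repetition_vector(domain_size, channels):
    """Sketch of Proposition 4.1 for a PPN read as a CSDF graph.

    domain_size: dict process -> |D_p| (the number of CSDF phases of that node)
    channels:    list of (src, dst, opd_points, ipd_points), i.e. the number of
                 points in the output port domain of src and in the input port
                 domain of dst (hypothetical input format)
    """
    procs = sorted(domain_size)
    col = {p: i for i, p in enumerate(procs)}
    # Topology matrix Gamma: one row per channel, one column per process,
    # holding the tokens produced/consumed over one complete phase cycle.
    gamma = []
    for src, dst, opd, ipd in channels:
        row = [0] * len(procs)
        row[col[src]] += opd
        row[col[dst]] -= ipd
        gamma.append(row)
    # In a consistent PPN, |OPD| = |IPD| on every channel, so r = (1, ..., 1)
    # solves the balance equation Gamma . r = 0.
    r = [1] * len(procs)
    for row in gamma:
        if sum(g * x for g, x in zip(row, r)) != 0:
            raise ValueError("inconsistent PPN: |OPD| != |IPD| on some channel")
    # q = S . r, where S is the diagonal matrix of phase lengths |D_p|.
    return {p: domain_size[p] * r[col[p]] for p in procs}


sizes = {"a": 10, "b": 10, "c": 5}
chans = [("a", "b", 10, 10), ("b", "c", 5, 5), ("c", "a", 5, 5)]
print(repetition_vector(sizes, chans))  # {'a': 10, 'b': 10, 'c': 5}
```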

As a result, a separate HSDF node would be instantiated for each iteration of the domain, resulting in large graphs even for small applications. This makes the conventional CSDF-to-HSDF approach infeasible for practical purposes. We can avoid this increase in the number of nodes based on the following observation. In an HSDF graph, all $q(p)$ nodes originating from a process $p$ may execute in parallel. However, by definition, the iterations of a PPN process always execute sequentially. This allows us to represent each process $p$ by a single HSDF node $h$, where node $h$ represents the sequential execution of all $q(p)$ nodes of the conventional equivalent HSDF graph. We multiply the execution time $\Lambda_p$ of a single firing of process $p$ by $q(p)$ to model the sequential execution of all $q(p)$ nodes in the equivalent HSDF graph. As a result, the number of nodes in the resulting HSDF graph equals the number of processes in the original PPN.

66 _{Chapter 4. Performance Estimation}

The execution time $t(h)$ of an HSDF node is set to the total time needed to fire all iterations of the process consecutively without overlapped execution, that is, $t(h) = \Lambda_p \cdot |D_p|$. Included in this execution time are the read and write latencies and the time needed to fire the function. Time spent in a blocking read or write operation is not included, which means that our approach does not address the conditional synchronization aspect introduced in Section 4.2. Consequently, our approach cannot accurately assess the throughput of applications in which read or write operations block on empty or full channels. To exclude auto-concurrency, we add to each HSDF node a selfloop with one initial token. This prevents a node from firing concurrently with itself, which would correspond to multiple simultaneous executions of the entire PPN and is undesirable when determining throughput.
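The node construction of Step 1 can be sketched in a few lines of Python. This is an illustrative sketch only: `nodes_from_processes` is a hypothetical helper, and the per-firing times $\Lambda_p$ and domain sizes $|D_p|$ are assumed to be given as plain dictionaries.

```python
def nodes_from_processes(firing_time, domain_size):
    """Step 1 sketch: one HSDF node per PPN process.

    firing_time: dict process -> Lambda_p, the time of one firing including
                 read/write latencies (time spent blocking is not modeled)
    domain_size: dict process -> |D_p|
    Returns (exec_time, edges), with edges as (src, dst, initial_tokens).
    """
    # t(h) = Lambda_p * |D_p|: all iterations of p fire back to back.
    exec_time = {p: firing_time[p] * domain_size[p] for p in firing_time}
    # A selfloop with one initial token excludes auto-concurrency.
    edges = [(p, p, 1) for p in firing_time]
    return exec_time, edges


t, e = nodes_from_processes({"a": 3, "b": 2}, {"a": 10, "b": 10})
print(t)  # {'a': 30, 'b': 20}
print(e)  # [('a', 'a', 1), ('b', 'b', 1)]
```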

Step 2: Constructing Edges from Channels

The second step in deriving the PPN modeling HSDF graph is to interconnect the HSDF nodes using edges in such a way that the PPN’s throughput characteristics are preserved. This is not trivial, because of the different semantics of HSDF edges and PPN channels: HSDF edges have an unbounded capacity and may contain initial tokens, whereas PPN channels have a bounded capacity and do not have a notion of initial tokens. We now discuss how to represent edges in a PPN modeling HSDF graph such that the PPN’s throughput characteristics are preserved.

The PPN modeling graph may contain more than one edge between two nodes $a$ and $b$, for example if the PPN contains multiple channels between two processes. It is sufficient to represent such a collection of channels by a single edge:

Proposition 4.2 (Pruning Multi-Edges in PPN Modeling Graphs). A collection of PPN channels from process $a$ to process $b$ can be represented by a single edge $(a \to b)$ in the PPN modeling graph.

Proof. If an edge $(a \to b)$ is part of a cycle, then another cycle also exists for each additional edge connecting $a$ to $b$. These cycles differ only in the number of initial tokens, which occurs in the denominator of Equation (4.1). A cycle with a larger denominator results in a smaller cycle mean, which will therefore not be selected by the maximization in Equation (4.2). Thus, we only need to consider the cycle with the smallest number of initial tokens, which is the cycle containing the edge with the smallest number of initial tokens.
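Proposition 4.2 translates directly into a pruning pass that keeps, for every ordered pair of nodes, only the edge with the fewest initial tokens. The sketch below assumes that edges are given as `(src, dst, initial_tokens)` tuples; the helper name `prune_multi_edges` is hypothetical.

```python
def prune_multi_edges(edges):
    """Keep, per ordered node pair, only the edge with the fewest initial tokens.

    edges: iterable of (src, dst, initial_tokens)
    """
    fewest = {}
    for src, dst, tokens in edges:
        key = (src, dst)
        if key not in fewest or tokens < fewest[key]:
            fewest[key] = tokens
    return [(src, dst, tokens) for (src, dst), tokens in fewest.items()]


edges = [("a", "b", 3), ("a", "b", 1), ("b", "a", 2)]
print(prune_multi_edges(edges))  # [('a', 'b', 1), ('b', 'a', 2)]
```

In this example, the cycle a → b → a through the kept edge has 1 + 2 = 3 initial tokens instead of 3 + 2 = 5, and therefore the larger cycle mean.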

We distinguish between three classes of channels: selfloop channels, feedback channels, and feedforward channels.

Figure 4.4: Handling feedback edges in a PPN. a) PPN containing a cycle through processes a, b, and c. b) Corresponding PPN modeling graph, in which the last edge of the cycle carries d = 2 initial tokens.

Selfloop channels

For a selfloop channel, which connects a process to itself, no edge is added to the HSDF graph. We omit such selfloops because in Step 1 we have already added a selfloop with one initial token to each node. To see why selfloops can be omitted, suppose that the critical cycle of a PPN modeling graph were formed by a selfloop $s$ of process $p$ with buffer size $S_s \geq 1$. This selfloop could be modeled by adding an edge $(p \to p)$ with $S_s$ initial tokens. For this newly added cycle $C_s$, Equation (4.1) yields

$$CM(C_s) = \frac{\Lambda_p \cdot |D_p|}{S_s}.$$

However, for the selfloop $e$ added in line 3 of Algorithm 4.1, we already have

$$CM(e) = \frac{\Lambda_p \cdot |D_p|}{1}.$$

Because $CM(e) \geq CM(C_s)$ for all $S_s \geq 1$, we can ignore $CM(C_s)$ in Equation (4.2). Thus, a selfloop of a PPN never forms the critical cycle, and therefore such selfloops can be omitted from the PPN modeling graph without affecting the MCM value.

Feedback channels

Feedback channels are part of a strongly connected component and are thus constituents of a cycle. The cycle mean of a cycle in the PPN modeling graph is computed using the sum of all initial tokens on the edges constituting the cycle, as shown in Equation (4.1). Hence, the number of initial tokens on an HSDF edge representing a feedback channel may affect the MCM value of the HSDF graph. Therefore, we must determine the number of initial tokens for each feedback edge such that an accurate MCM value is obtained.

Each cycle of a PPN contains one process that is the first process of that cycle to be fired. The channel of that cycle from which the first process reads is the last channel of that cycle. Initially, we construct an HSDF edge for each PPN channel that is part of a cycle and assign zero initial tokens to each such edge. Only to the edge corresponding to the last channel of the cycle do we assign a nonzero number $d$ of initial tokens. For example, suppose that process $a$ in Figure 4.4 first reads from a channel outside of the cycle and in the next firing reads from channel $(c \to a)$, which is part of the cycle $(a \to b \to c \to a)$. As such, process $a$ is the first process of the cycle that can fire, and therefore edge $(c \to a)$ is the last edge of the cycle. Selecting between edges is not possible in the HSDF model, which requires all incoming edges of a node to be read during every firing. Without initial tokens on the last edge $(c \to a)$ of the cycle, the HSDF graph would deadlock, preventing meaningful analysis. To avoid this deadlock, we assign initial tokens to the last edge.

Initial tokens on an edge of an HSDF graph are also referred to as the delay of that edge. Here, "delay" refers to the temporal distance between the nodes in terms of iterations of the graph. For example, if an edge $(a \to b)$ has 2 initial tokens, then the firing of node $b$ at iteration $i$ depends on the token produced by node $a$ at iteration $i - 2$. The PPN model has no notion of initial tokens, which means that we need to relate the delay between two HSDF nodes that are part of a cycle to the distance between the corresponding processes in the PPN model.
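To make the delay semantics concrete, the sketch below simulates self-timed execution of an HSDF graph: firing $k$ of a node waits for firing $k - d$ of each predecessor that is connected by an edge with $d$ initial tokens. For a strongly connected graph, the average time between firings approaches the MCM over many iterations. The sketch assumes that initial tokens are available at time 0 and that every node fires as soon as all of its inputs are available. The function names are hypothetical, and the sketch relies on Python's standard `graphlib` module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter


def self_timed_starts(exec_time, edges, iterations):
    """Start times of the first `iterations` firings of every HSDF node.

    exec_time: dict node -> execution time t(v)
    edges:     list of (src, dst, d), where d is the number of initial tokens
    Assumes initial tokens are available at time 0 and that every node fires
    as soon as a token is present on each of its incoming edges.
    """
    # Zero-delay edges fix the evaluation order within one iteration; a cycle
    # of zero-delay edges is a deadlock and makes static_order() raise CycleError.
    sorter = TopologicalSorter({v: set() for v in exec_time})
    for src, dst, d in edges:
        if d == 0:
            sorter.add(dst, src)
    order = list(sorter.static_order())

    start = {v: [] for v in exec_time}
    for k in range(iterations):
        for v in order:
            s = 0
            for src, dst, d in edges:
                # Firing k of dst consumes the token produced by firing k - d
                # of src; for k - d < 0 it consumes an initial token instead.
                if dst == v and k - d >= 0:
                    s = max(s, start[src][k - d] + exec_time[src])
            start[v].append(s)
    return start


def average_period(start, node, window):
    """Average time between firings of `node` over the last `window` iterations."""
    times = start[node]
    return (times[-1] - times[-1 - window]) / window


# Modeling graph of Figure 4.5b with 2 initial tokens on every backedge.
t = {"a": 30, "b": 30, "c": 30}
g = [("a", "a", 1), ("b", "b", 1), ("c", "c", 1),
     ("a", "b", 0), ("b", "a", 2),
     ("b", "c", 0), ("c", "b", 2),
     ("a", "c", 0), ("c", "a", 2)]
s = self_timed_starts(t, g, 200)
print(average_period(s, "a", 100))  # 45.0
```

The simulated period of 45 matches the MCM listed for this configuration in Table 4.1.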

A notion of dependence distances is available for the SANLPs from which we derive PPNs. A dependence distance vector gives the difference between the target iteration vector and the source iteration vector of a dependence [Pug92, definition d]. For a PPN channel $(a \to b)$, the distance vector gives the difference between an iteration of process $b$ that consumes a token and the iteration of process $a$ that produced the token. In general, this difference may be undefined when the process iteration domains differ, which happens, for example, when the original statements are not located in the same loop nest. However, the PNGEN tool flow puts all processes in a common iteration space to compute buffer sizes. In this common iteration space, the dependence distance vector is defined for any pair of processes that are connected by a channel. We therefore employ the dependence distance in the common iteration space to assign initial tokens to feedback edges in the PPN modeling graph.

We cannot use the dependence distance directly, for the following two reasons. First, the dependence distance is a vector for common iteration spaces consisting of more than one dimension, whereas the number of initial tokens of an HSDF graph must be a scalar value. Second, a dependence distance may be non-uniform, that is, the dependence distances may vary for different pairs of iterations. In such cases, the dependence distance of a single dimension cannot be expressed by a constant integer but is expressed using iterators, whereas the number of initial tokens of an HSDF graph must be a constant integer value.

… dependence distance. The way in which PNGEN computes the buffer size (cf. Section 2.3.2) gives us a suitable approximation of the maximum dependence distance of a non-uniform dependence. For uniform dependence distances, the buffer size is an accurate measure of the dependence distance. For non-uniform dependence distances, using the buffer size introduces a source of inaccuracy in the PPN modeling graph.

The number of initial tokens $d_c$ assigned to the last edge of a cycle is determined as follows. If the cycle is tight, that is, if in every iteration each process depends on the output of the previous iteration of its predecessor process, the processes execute sequentially without overlap between firings of different processes. In such a case, the dependence distance vector contains zeros in all dimensions except the last, for which it contains a one; that is, the dependence distance vector is of the form $[0, 0, \ldots, 1]$. The corresponding buffer size $S_c$ is one, and we assign one initial token to the last edge of the cycle.

If the cycle is not tight, overlapped execution between firings of different processes may occur. In such a case, the dependence distance vector differs from the form described above. We assign $S_c + 1$ initial tokens to the last edge of the cycle, which corresponds to the buffer size plus one additional initial token to accommodate overlapped execution. This is currently a known source of inaccuracy in the MCM modeling HSDF graphs derived from PPNs. Determining the number of initial tokens to assign to the last edge of a non-tight cycle is therefore a subject of future investigation.
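The rule for the last edge of a cycle can be sketched as follows. The sketch assumes a uniform (constant) dependence distance vector for the last channel and takes the buffer size $S_c$ as an input; the helper names `is_tight` and `feedback_tokens` are hypothetical.

```python
def is_tight(distance):
    """A tight cycle has a dependence distance vector of the form [0, ..., 0, 1]."""
    return (len(distance) > 0
            and all(x == 0 for x in distance[:-1])
            and distance[-1] == 1)


def feedback_tokens(cycle, last_channel, distance, buffer_size):
    """Initial tokens for the HSDF edges of one PPN cycle.

    cycle:        channels of the cycle as (src, dst) pairs
    last_channel: the channel read by the first process of the cycle to fire
    distance:     uniform dependence distance vector of last_channel, taken
                  in the common iteration space (assumption)
    buffer_size:  S_c of last_channel, as computed by the buffer sizing step
    """
    # Tight cycle: one token (S_c = 1). Otherwise S_c + 1 tokens, which is
    # the approximation described in the text for non-tight cycles.
    d = 1 if is_tight(distance) else buffer_size + 1
    # Every other edge of the cycle starts with zero initial tokens.
    return {ch: (d if ch == last_channel else 0) for ch in cycle}


cycle = [("a", "b"), ("b", "c"), ("c", "a")]
print(feedback_tokens(cycle, ("c", "a"), [0, 1], 1))   # tight: 1 token on (c, a)
print(feedback_tokens(cycle, ("c", "a"), [1, -2], 1))  # not tight: 2 tokens on (c, a)
```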

Feedforward channels

Feedforward channels connect a strongly connected component of a PPN to another strongly connected component. As such, the corresponding feedforward edges in an HSDF graph are not part of any cycle and thus would not affect the MCM value. In the HSDF model, edges have infinite capacity, which implies that a feedforward edge indeed does not affect the MCM value of an HSDF graph. That is, a feedforward edge cannot reach a "full" state that would cause blocking writes and thereby decrease throughput. In contrast to HSDF edges, PPN channels have a finite capacity, which may cause blocking writes that decrease throughput.

To take the finite capacity of a channel into account, we add for each feedforward channel $(a \to b)$ a forward edge $e = (a \to b)$ and a backedge $(b \to a)$ [SB00, Section 10.4]. We assign zero initial tokens to the forward edge $e$ in the HSDF graph. The number of initial tokens $m_e \in \mathbb{N}$ on the backedge represents a particular buffer capacity. Empirically, we found that a value $m_e$ corresponds to a buffer capacity given by …, where $d_c$ is the dependence distance approximation used in the discussion above on feedback channels. That is, the MCM computed using $m_e$ matches the PPN period achieved with a buffer capacity of $S_c$ tokens.

Bounding feedforward channel delays

According to the HSDF model, any positive number of initial tokens $m$ on a backedge is allowed. This leads to an infinite number of possible buffer configurations. However, when $m$ is below a certain value, a corresponding PPN buffer size may not exist due to the operational semantics of a PPN process. This gives a lower bound on $m$. Also, when $m$ exceeds a certain value, the MCM is no longer affected, which means that increasing the buffer size does not lead to a higher throughput. This gives an upper bound on $m$. Therefore, we can bound the design space by only considering the values that lie between the lower and upper bounds.

The lower bound on $m$ for any edge in the PPN modeling graph is two, which is a consequence of the operational semantics of a PPN process. This lower bound can be explained as follows. In an HSDF graph, a token is kept on the edge until the consuming node has finished its firing. In a PPN, a token is transferred to a buffer internal to the process during the read stage, which effectively increases the buffer size by one. As such, a buffer size of one corresponds to a number of initial tokens $m = 2$.
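The modeling of a feedforward channel, together with this lower bound, can be sketched as follows. The names `feedforward_edges` and `MIN_BACKEDGE_TOKENS` are hypothetical, and edges are represented as `(src, dst, initial_tokens)` tuples.

```python
MIN_BACKEDGE_TOKENS = 2  # a PPN buffer of size one corresponds to m = 2


def feedforward_edges(channel, m):
    """Model a feedforward channel (a -> b) by a forward edge and a backedge.

    Returns two (src, dst, initial_tokens) edges: the forward edge carries no
    initial tokens, and the backedge carries m tokens that model the finite
    channel capacity.
    """
    a, b = channel
    if m < MIN_BACKEDGE_TOKENS:
        raise ValueError(f"m = {m} has no corresponding PPN buffer size (m >= 2 required)")
    return [(a, b, 0), (b, a, m)]


print(feedforward_edges(("a", "b"), 2))  # [('a', 'b', 0), ('b', 'a', 2)]
```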

The upper bound on $m$ represents the point where increasing the buffer size does not yield a higher throughput. This corresponds to a value $m$ for which the maximum cycle mean of the graph is determined by cycles of the original graph or by selfloops, but not by a cycle introduced by the modeling of a feedforward edge. For an arbitrary feedforward edge $(a \to b)$, we choose $m$ such that the resulting cycle mean is less than or equal to the cycle mean of the selfloops of the nodes involved in the cycle:

$$\frac{t(a) + t(b)}{m} \leq \max\{t(a), t(b)\}.$$

For positive execution times $t$, this inequality holds if $m$ equals the number of nodes in the cycle, which is two. However, other paths between $a$ and $b$ may exist, which must be considered as well to prevent them from determining the maximum cycle mean. To ensure that none of the other paths between $a$ and $b$ determines the maximum cycle mean, we generalize the above inequality to a path consisting of $n$ nodes:

$$\frac{\sum_{i=1}^{n} t(i)}{m} \leq \max_{i=1}^{n} \{t(i)\}. \qquad (4.5)$$

Figure 4.5: Handling feedforward edges in a PPN. a) PPN containing feedforward channels between processes a, b, and c. b) Corresponding PPN modeling graph, in which each of the nodes a, b, and c has an execution time of 30.

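| m1 | m2 | m3 | MCM(G4.5b) |
|----|----|----|------------|
| 1  | 1  | 1  | 90         |
| 1  | 1  | 2  | 60         |
| 1  | 1  | 3  | 60         |
| 1  | 2  | 1  | 90         |
| 1  | 2  | 2  | 60         |
| 1  | 2  | 3  | 60         |
| 2  | 1  | 1  | 90         |
| 2  | 1  | 2  | 60         |
| 2  | 1  | 3  | 60         |
| 2  | 2  | 1  | 90         |
| 2  | 2  | 2  | 45 *       |
| 2  | 2  | 3  | 30 *       |

Table 4.1: MCM values for different numbers of initial tokens. Only for the configurations marked with * does a valid PPN buffer size configuration exist.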
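The MCM values in Table 4.1 can be reproduced by enumerating all simple cycles of the modeling graph, computing each cycle mean as the sum of the execution times divided by the sum of the initial tokens, and taking the maximum. The sketch below uses brute-force cycle enumeration, which is only practical for small graphs and is not meant as the MCM algorithm used in this work. The encoding of Figure 4.5b is an assumption: all three nodes have execution time 30, $m_1$ and $m_2$ sit on the backedges of $(a \to b)$ and $(b \to c)$, and $m_3$ sits on the backedge of $(a \to c)$, consistent with the upper bounds of two and three stated for these values. The function names are hypothetical.

```python
from itertools import product


def max_cycle_mean(exec_time, edges):
    """MCM by brute-force enumeration of simple cycles (small graphs only).

    Cycle mean = (sum of execution times on the cycle) /
                 (sum of initial tokens on the cycle);
    the MCM is the maximum over all cycles.
    edges: list of (src, dst, initial_tokens)
    """
    nodes = sorted(exec_time)
    rank = {v: i for i, v in enumerate(nodes)}
    out = {v: [] for v in nodes}
    for src, dst, d in edges:
        out[src].append((dst, d))
    best = 0.0

    def extend(start, v, time, tokens, visited):
        nonlocal best
        for w, d in out[v]:
            if w == start:  # the path closes into a simple cycle
                if tokens + d == 0:
                    raise ValueError("cycle without initial tokens: deadlock")
                best = max(best, time / (tokens + d))
            elif rank[w] > rank[start] and w not in visited:
                extend(start, w, time + exec_time[w], tokens + d, visited | {w})

    # Each cycle is enumerated once, starting from its lowest-ranked node.
    for s in nodes:
        extend(s, s, exec_time[s], 0, {s})
    return best


def figure_4_5b(m1, m2, m3):
    # Hypothetical encoding of Figure 4.5b (see the assumptions above).
    t = {"a": 30, "b": 30, "c": 30}
    edges = [("a", "a", 1), ("b", "b", 1), ("c", "c", 1),
             ("a", "b", 0), ("b", "a", m1),
             ("b", "c", 0), ("c", "b", m2),
             ("a", "c", 0), ("c", "a", m3)]
    return max_cycle_mean(t, edges)


configs = list(product(range(1, 3), range(1, 3), range(1, 4)))
for m in configs:
    valid = min(m) >= 2  # lower bound m >= 2 from the PPN read semantics
    print(*m, figure_4_5b(*m), "valid" if valid else "")
```

Because $m_1$ and $m_2$ enter the cycle means symmetrically in this graph, swapping their placement does not change any of the values.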

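Because the average of $n$ execution times never exceeds their maximum, Equation (4.5) is satisfied for $m = n$. Hence, the upper bound on $m$ for a feedforward channel $c$ equals the number of processes on the longest path connecting $\sigma_c$ to $\delta_c$.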
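Taking the upper bound on $m$ for a channel $c$ to be the number of processes on the longest path from $\sigma_c$ to $\delta_c$, the bound can be computed by a memoized depth-first search. The sketch assumes that the channels considered form an acyclic graph (for instance, after collapsing each strongly connected component into one node). It encodes Figure 4.5a as the channels $(a \to b)$, $(b \to c)$, and $(a \to c)$, an assumption consistent with the bounds of two, two, and three stated for this example. The function name is hypothetical.

```python
from functools import lru_cache


def backedge_upper_bound(channels, src, dst):
    """Upper bound on m for the feedforward channel (src -> dst): the number
    of processes on the longest path from src to dst.

    channels: (src, dst) pairs, assumed to form an acyclic graph
              (selfloops are ignored).
    """
    succ = {}
    for a, b in channels:
        if a != b:
            succ.setdefault(a, []).append(b)

    @lru_cache(maxsize=None)
    def longest(v):
        # Processes on the longest path v -> dst, or 0 if dst is unreachable.
        if v == dst:
            return 1
        best = 0
        for w in succ.get(v, []):
            n = longest(w)
            if n > 0:
                best = max(best, n + 1)
        return best

    return longest(src)


ff = [("a", "b"), ("b", "c"), ("a", "c")]  # assumed channels of Figure 4.5a
print([backedge_upper_bound(ff, *c) for c in ff])  # [2, 2, 3]
```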

In Figure 4.5a, we show a PPN containing three feedforward channels. In Figure 4.5b, we show the corresponding modeling graph. For each feedforward channel in the PPN, we have added a forward edge and a backedge in the modeling graph. The values $m_1$, $m_2$, and $m_3$ specify the number of initial tokens assigned to the backedges. According to Equation (4.5), the upper bound of $m_1$ and $m_2$ is two, and the upper bound of $m_3$ is three. In Table 4.1, we show twelve possible combinations of $m$-values, deliberately assuming a lower bound of 1 for each $m$-value. This yields twelve different design points trading off buffer size against throughput. If we take the lower bound on $m$-values for PPN modeling graphs into account, any combination of $m$-values containing an $m$-value below two does not have a corresponding PPN buffer configuration. Hence, an actual PPN buffer configuration exists only for $(m_1, m_2, m_3) = (2, 2, 2)$ and $(m_1, m_2, m_3) = (2, 2, 3)$. As such, for this example the buffer size design space is reduced to only two points.

Summary

The derivation of the more compact HSDF graph is summarized in Algorithm 4.1. The input is a PPN and a number $\Lambda_p$ representing the execution time of a single firing …

In document Estimation and optimization of the performance of polyhedral process networks (Page 77-85)