Actor’s Action Selection Procedure - High-level synthesis of dataflow programs for heterogeneou

Figure 4.6 – One clock per stage pipeline scheduling with chaining and without sharing resources. (Diagram labels: Stage 1 to Stage N, one clock cycle each; low cost operators; no resource sharing among operators; pipeline registers; operators chain.)

recursive functions and explicit stack mechanisms. For each feasible number of pipeline stages, a pipeline schedule with minimum total register width is taken as an optimal coloring, which is then automatically transformed into an RVC-CAL description. An RVC-CAL developer can add a pipeline attribute to a selected action for it to be pipelined.

4.5 Actor’s Action Selection Procedure

An actor has a set of actions, each of which can read tokens, write tokens and change state variables, as discussed in the previous chapter. If an actor has many actions, then either an FSM is defined or control is exercised through the firing conditions (guards and token availability) of the actions. When implementing an actor in either hardware or software, an action selection procedure should be defined. This procedure handles the execution of an actor, as mentioned in Section 3.3.2, and it is implemented as a finite state machine. In comparison with actor machines [151], here the action selection calculates all conditions in parallel and lets the extracted actor state machine choose which action should be executed.

Figure 4.7 represents the action selection finite state machine of an actor with four states. For clarity, the figure does not show multiple transitions (actions) for each state. Figure 4.7a depicts a general software-oriented action selection FSM in which each state has a transition with three steps. First, the transition performs a blocking read on the input ports; then, once the firing rules are satisfied, it fires the action; finally, it performs a blocking write on the transition’s output ports.
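The three steps of a software-oriented transition can be sketched with Python's standard `queue.Queue`, whose `get` and `put` block by default. This is only an illustration of the read, fire, write order: the function `run_transition` and its parameters are hypothetical, and guards are omitted for brevity.

```python
from queue import Queue

def run_transition(inputs, outputs, consume, action):
    """Run one transition: blocking read, fire, blocking write.

    inputs/outputs map port names to Queue objects (assumed layout),
    consume maps each input port to the number of tokens to read,
    action maps the tokens read to {output port: list of tokens}.
    """
    # 1. Blocking read: get() waits until a token is available.
    tokens = {port: [inputs[port].get() for _ in range(n)]
              for port, n in consume.items()}
    # 2. Fire the action on the tokens that were read.
    produced = action(tokens)
    # 3. Blocking write: put() waits if a bounded queue is full.
    for port, values in produced.items():
        for value in values:
            outputs[port].put(value)
```

With an unbounded `Queue()` the write never blocks; a bounded `Queue(maxsize=n)` would model a FIFO of limited depth.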

An action might read and write a list of tokens, but in hardware a FIFO queue implementation supports a single value in and a single value out, which creates an implementation problem. A list of tokens can nevertheless be implemented in four ways: as a bus, as a concatenation of all list data into a single datum of a larger bit size, as a dual-port memory that

Figure 4.7 – Finite State Machine of Action Selection. (a) Software Action Selection (states S0 to S3; transition steps: Blocking Read, Fire, Blocking Write). (b) Xronos Action Selection (states S0 to S3, each with Read and Write states; Fire).

acts as a queue, or as a sequential reading and writing of single values from the input ports and to the output ports. The first two approaches cannot be implemented in hardware for two reasons. First, in the general case where the actor’s output port is connected to a successor actor that consumes data at a different rate than the predecessor produces it, part of the data is lost. Second, the guard of an actor might peek at an index that is not at the head of the queue, in which case the firing condition cannot be satisfied. In other words, all values should be available before the firing condition is resolved. The third approach, which is elegant and whose implementation is given in [169], has the disadvantage of multiple latencies when both actors are writing and reading at the same time. Also, as reported in the same paper [169], for the same RVC-CAL application Xronos, which implements the fourth option, achieves twice the throughput.

The fourth approach, reading and writing single tokens one by one, is easier to implement by adding two more states to the actor’s FSM: the READ and the WRITE state. However, it requires a local list for each input and output port, sized to the largest repeat found for that port in the actor’s actions. This might be considered a disadvantage, but actually it is not, because with small local memories the FIFO queue size can be decreased. In addition, another significant advantage is that actions with multiple reads/writes on different ports can perform a parallel read on those input ports in the READ state and a


parallel write on the output ports. This is possible because there are no dependencies between those operations.

Consider the following example in Listing 4.2:

Listing 4.2 – Foo Actor

actor Foo() int A, int B ==> int C:

    Act0: action A:[a] repeat 64, B:[b] repeat 32 ==> C:[a[0] + b[0]] end

    Act1: action A:[a] repeat 32 ==> C:[a] repeat 32 end

    schedule fsm s0:
        s0 (Act0) --> s1;
        s1 (Act1) --> s0;
    end
end

Actor Foo has two actions, Act0 and Act1. Both of them use the CAL keyword repeat. Act0 reads a list of 64 tokens from port A and 32 tokens from port B and outputs a single token. Act1 reads a list of 32 elements and writes the input list to its output C. Xronos modifies the FSM as depicted in Figure 4.8a by adding the READ and WRITE states. The READ state in Figure 4.8b starts by forking the reading paths of both input ports if the previous state was s0; if the previous state was s1, only port A is read. The WRITE state in Figure 4.8c writes 1 token if the previous state was s0 and 32 tokens if it was s1.
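The READ and WRITE behaviour described for Foo can be modelled in a few lines of Python. This is an illustrative sketch, not Xronos code: the names `REQUEST`, `SEND`, `read_state` and `write_state` are hypothetical, one loop iteration stands in for one clock cycle, and each port is assumed to transfer at most one token per cycle.

```python
from collections import deque

# Token counts per previous state, taken from Listing 4.2.
REQUEST = {"s0": {"A": 64, "B": 32}, "s1": {"A": 32}}
SEND = {"s0": 1, "s1": 32}

def read_state(prev_state, fifos):
    """Fill the local port lists; return (local lists, cycles used)."""
    request = REQUEST[prev_state]
    local = {port: [] for port in request}
    cycles = 0
    while any(len(local[p]) < n for p, n in request.items()):
        progressed = False
        # The ports are independent, so all unfinished ports read in the same cycle.
        for port, n in request.items():
            if len(local[port]) < n and fifos[port]:
                local[port].append(fifos[port].popleft())
                progressed = True
        if not progressed:
            raise RuntimeError("input FIFOs ran dry (the sketch does not model waiting)")
        cycles += 1
    return local, cycles

def write_state(prev_state, values, fifo_c):
    """Write the tokens produced by the fired action to port C, one by one."""
    count = SEND[prev_state]
    for i in range(count):
        fifo_c.append(values[i])
    return count
```

Under these assumptions, reading for s0 takes 64 cycles rather than 96, because the 32 tokens of port B are read while port A is still being filled.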

4.5.1 Construction of the Action Selection Procedure

Unlike other Orcc code generators, Xronos does not pretty-print the code; it transforms the Dataflow and Procedural IR into a close-to-hardware intermediate representation and then applies scheduling to it. Moreover, Orcc backends do not create a Procedure for the Action Selection. They just print it by visiting the actor’s FSM and the actions’ inputPatterns and outputPatterns. In Xronos, the choice was to first create an Orcc Procedure, which is then transformed into an object called Task in the LIM IR by the Xronos Procedural IR to LIM mapping process, discussed in Section 4.8. This homogenizes and facilitates the overall transformation process of Procedures in the Procedural IR. Xronos applies the following steps/transformations to create the Action Selection procedure:

1. ActionSelection Procedure: A new empty procedure with the name ActionSelection is created and added to the actor.

Figure 4.8 – Foo actor Action Selection. (a) Action Selection FSM (states s0 and s1 with Read and Write states). (b) State READ (Start; FORK; Read A? Idx == R_A? Load A; Read B? Idx == R_B? Load B; JOIN; End). (c) State WRITE (Start; Idx == W; Store C; End).

Also, if an actor has an FSM but contains actions that are outside of the FSM, it will create a transition with the highest priority for each state and add it to the state. In addition, some actors may contain special actions called initialize. Such an action is used for initializing variables as well as for any other initialization purpose. AddFSM will create an INIT state with a set of transitions, defined by the number of initialize actions in the code, and will then add the INIT state and set it as the initial state. This transformation is applied in order to homogenize the generation of a general actor.

3. WhileBlocks: This step creates a list of Blocks that will accommodate the Blocks created for the following 7 steps.

4. EnumerateStates: This step creates a BlockBasic which assigns an index to each state of the actor’s FSM;

5. IOLists: This transformation adds five state variables for each input and output port: tokenIndex, which represents the number of tokens that have been read/written; MaxTokenIndex; RequestTokens, a state variable that indicates the number of input tokens that an action needs at a given state so that it can fire; SentTokens, which signifies the number of output tokens that should be written to the output port; and finally PortEnable, which tells the READ and WRITE states whether a parallel read/write should be performed in those states. In addition, it adds a list state variable named portList for each port, with the maximum number of elements specified by the maximum repeat CAL construct


in each action port. Finally, it modifies all the InstLoad instructions associated with an input port so that they target the portList. It applies the same modification to the InstStore instructions associated with an output port. The association is given by the inputPattern and outputPattern of the actions.

6. LoadCurrentStateBlock: This step creates a BlockBasic with a single InstLoad instruction that loads the actor’s initial state index to the FSM’s state variable called fsmState.

7. IsSchedulableBlocks: This step visits the isSchedulable Procedure of every action. As described in Section 3.6.1, the isSchedulable procedure returns a boolean value that signifies whether the firing condition is satisfied. IsSchedulableBlocks copies the body Blocks of all the isSchedulable procedures of the actor’s actions into a list of Blocks and then replaces each InstReturn instruction with an InstAssign that assigns the return value to the partialFiring local variable.

8. PortStatusBlock: This step creates a BlockBasic and adds an InstPortStatus instruction for every port of the actor. This makes it possible to check whether a token is available in an input queue and whether space is available in an output queue.

9. HasTokensBlock: This step creates a BlockBasic containing only InstAssign instructions, each of which assigns the boolean expression tokenIndex == RequestTokens, which signifies the partial firing condition on inputs. RequestTokens is updated when a state specifies the number of tokens to be read from an input.

10. StateBlocks: For each state a BlockIf is constructed to accommodate the state transitions. If a state has a single transition, then: the BlockIf condition is an ExprVar that references the value of the variable created by IsSchedulableBlocks; the BlockIf’s thenBlocks contain the call of the action’s procedure and a change of the state to the following state of the FSM; and the BlockIf’s elseBlocks contain a BlockIf that checks whether it is necessary to read from an input port of the action and, if so, changes to the READ state. If a state has many transitions, then a nested BlockIf is added to the elseBlocks of the transition that has the highest priority. Figure 4.9 depicts the flow graph of a state with two transitions. The first transition handles the call of action A and the second one the call of action B. Both of them read from an input PI and write to an output PO. If Guard A and Guard B are false, then both actions require input tokens from PI. If the two actions have different consumption rates, then PI_requestTokens = max(repeat(A, PI), repeat(B, PI)). The subtraction PI_tokenIndex_read = PI_requestTokens - PI_tokenIndex helps to read only the necessary tokens for the next state.

11. InifiteBlockWhile: An infinite BlockWhile is constructed and added to the Action Selection procedure. The condition of the BlockWhile is an ExprBool with the boolean value true.

Figure 4.9 – A State that contains two transitions. (Flow graph, from Start to End: if Guard A && PI_tokenIdx == R_A is true, then Call(A), nextState = s1, fsmState = WRITE, PO_requestTokens = 32, PO_portEnable = true; otherwise, if Guard B && PI_tokenIdx == R_B is true, then Call(B), nextState = s0, fsmState = WRITE, PO_requestTokens = 64, PO_portEnable = true; otherwise, if PI_tokenIdx < Max(PI) is true, then PI_tokenIdx = PI_requestTokens - PI_tokenIdx, PI_portEnable = true, nextState = s0, fsmState = READ.)
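The decision logic of the two-transition state in Figure 4.9 can be written as a small Python function. This is a sketch under assumptions, not the generated hardware: the function name `state_step`, its arguments and the returned dictionary are hypothetical; the repeat counts R_A and R_B are parameters because the figure does not fix them; and Max(PI) is taken to be max(R_A, R_B), following step 10.

```python
def state_step(guard_a, guard_b, pi_token_index, r_a, r_b, w_a=32, w_b=64):
    """Return the variable updates chosen by the state of Figure 4.9.

    w_a and w_b default to the PO_requestTokens values shown in the figure.
    """
    # Highest-priority transition: action A.
    if guard_a and pi_token_index == r_a:
        return {"call": "A", "nextState": "s1", "fsmState": "WRITE",
                "PO_requestTokens": w_a, "PO_portEnable": True}
    # Nested BlockIf in the else branch: action B.
    if guard_b and pi_token_index == r_b:
        return {"call": "B", "nextState": "s0", "fsmState": "WRITE",
                "PO_requestTokens": w_b, "PO_portEnable": True}
    # Neither action can fire: request the missing input tokens.
    pi_request = max(r_a, r_b)
    if pi_token_index < pi_request:
        return {"call": None, "nextState": "s0", "fsmState": "READ",
                "PI_requestTokens": pi_request,
                "PI_tokenIndex_read": pi_request - pi_token_index,
                "PI_portEnable": True}
    return {"call": None}  # nothing to do in this pass
```

For example, with R_A = 4, R_B = 8 and four tokens already read, a false Guard A sends the state to READ with four more tokens to fetch, so that action B can be tested next.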

Once the construction of the Action Selection has finished, two Procedural IR optimizations are applied to it. The first one is called BlockCombine and it combines all the different BlockBasic blocks into a single one. The second one is called RedundantLoadElimination and it eliminates all the redundant InstLoads from the state variables. Reading from a state variable has a latency of at least one clock cycle, whether from a register or a memory. Thus, all guards using the same state variable for the firing condition are combined into a single InstLoad, and all following InstLoad instructions are replaced by InstAssign instructions to a local variable of the procedure.
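The idea behind RedundantLoadElimination can be illustrated on a toy instruction list. This is a minimal sketch and not the Orcc or Xronos implementation: the tuple encoding of instructions and the function name `eliminate_redundant_loads` are hypothetical, and the sketch assumes that each local variable is assigned only once.

```python
def eliminate_redundant_loads(instructions):
    """Replace repeated loads of a state variable by assignments.

    Assumed toy encoding:
      ("load", target_local, state_var)
      ("store", state_var, source_local)
    Any other tuple is passed through unchanged.
    """
    first_load = {}  # state variable -> local that already holds its value
    result = []
    for inst in instructions:
        if inst[0] == "load":
            _, target, var = inst
            if var in first_load:
                # Reuse the value loaded earlier instead of loading again.
                result.append(("assign", target, first_load[var]))
            else:
                first_load[var] = target
                result.append(inst)
        else:
            if inst[0] == "store":
                # The state variable changed, so a later load is not redundant.
                first_load.pop(inst[1], None)
            result.append(inst)
    return result
```

In this toy form, two guards that both load the same state variable end up sharing one load, which avoids paying the read latency twice.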

In document High-level synthesis of dataflow programs for heterogeneous platforms:design flow tools and design space exploration (Page 97-102)