4.3 Procedural IR Transformations
To prepare the IR for hardware synthesis, some additional transformations should be applied before converting the Procedural IR of each action to the Openforge LIM IR. Thanks to the SSA representation of a procedure, constant propagation and dead code elimination are easy to implement. For hardware synthesis, variables and operations should have the correct bit size in order to reduce the resource footprint. Operations such as division or modulo are often not synthesizable for either ASICs or FPGAs. Thus, Xronos transforms these operations into synthesizable components.
4.3.1 Expression Evaluator/Simplification
The expression evaluator/simplification is a helper visitor that finds algebraic identities to simplify an expression. For instance, the constant folding transformation described in Section 4.3.4 uses the Expression Evaluator to streamline operations on constants.
Table 4.2 shows some identities that the expression simplification can handle even when the values of the operands are not known constants.
Table 4.2 – Algebraic identities for the Expression Simplifier, with ∧ the logical and operator and ∨ the logical or operator.
a + 0 = a     a − 0 = a     a − a = 0     a ∗ 2^n = a << n
a ∗ 1 = a     a ∗ 0 = 0     a ÷ 1 = a     a / 2^n = a >> n
a << 0 = a    a >> 0 = a    a ∧ a = a     a ∨ a = a
The Expression Evaluator checks whether the value of an ExprUnary or ExprBin is known. If it is, then, knowing the operation, the calculation is performed and the ExprUnary or ExprBin is replaced by a constant Expression such as ExprInt, ExprUint, etc.
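The behaviour described above can be illustrated with a minimal Python sketch. It is not the Xronos visitor: the classes Const, Var, and BinOp and the function simplify are hypothetical stand-ins for the IR expression classes, and only a subset of the identities of Table 4.2 is covered. The shift rewrite for division assumes an unsigned (or non-negative) left operand.

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical, simplified expression classes (not the Xronos IR classes).
@dataclass(frozen=True)
class Const:
    value: int

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class BinOp:
    op: str
    left: "Expr"
    right: "Expr"

Expr = Union[Const, Var, BinOp]

# Operators the evaluator folds when both operands are known constants.
EVAL = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}

def is_const(e, v):
    return isinstance(e, Const) and e.value == v

def log2_exact(e):
    """Return n if e is the constant 2**n with n >= 1, otherwise None."""
    if isinstance(e, Const) and e.value > 1 and e.value & (e.value - 1) == 0:
        return e.value.bit_length() - 1
    return None

def simplify(e):
    if not isinstance(e, BinOp):
        return e
    l, r = simplify(e.left), simplify(e.right)
    # Evaluator: both operand values are known, so compute the result.
    if isinstance(l, Const) and isinstance(r, Const) and e.op in EVAL:
        return Const(EVAL[e.op](l.value, r.value))
    # Simplifier: identities that hold for an unknown operand a.
    if e.op in ("+", "-", "<<", ">>") and is_const(r, 0):
        return l                      # a + 0, a - 0, a << 0, a >> 0
    if e.op == "-" and l == r:
        return Const(0)               # a - a
    if e.op in ("*", "/") and is_const(r, 1):
        return l                      # a * 1, a / 1
    if e.op == "*" and is_const(r, 0):
        return Const(0)               # a * 0
    if e.op in ("and", "or") and l == r:
        return l                      # a and a, a or a
    n = log2_exact(r)
    if n is not None and e.op == "*":
        return BinOp("<<", l, Const(n))   # a * 2^n
    if n is not None and e.op == "/":
        return BinOp(">>", l, Const(n))   # a / 2^n, unsigned a assumed
    return BinOp(e.op, l, r)

a = Var("a")
print(simplify(BinOp("*", BinOp("+", a, Const(0)), Const(8))))
```

Simplifying the operands first lets nested identities collapse in a single traversal, which is why the example reduces (a + 0) ∗ 8 directly to a shift of a.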
4.3.2 Single Read and Write Register Optimization
Multiple reads and writes to the same memory are expensive operations. Each of them requires at least one clock cycle to either retrieve or write a variable value. Furthermore, multiple read/write operations increase the latency of the overall execution of the procedure. To minimize this latency, Xronos traverses the CFG of a procedure and stores all definitions and uses of the scalar state variables in a set called UsedVars. At the entry of the BlockBasic CFG node, Xronos inserts, for each v in UsedVars, a Load instruction with a target temp_v. After this operation, it propagates the variable temp_v to all uses of v and replaces load/store instructions with assignments. Finally, at the exit of the BlockBasic CFG node, it inserts Store instructions for each v with the value of temp_v. It should be mentioned that this optimization is performed at the Procedural IR level and may increase the length of the hardware's critical path for an action.
actor Actor()
    int IN ==> int OUT:
    int a := 0;
    int b := 0;
    int c := 0;
    action IN:[token] ==> OUT:[c]
    do
        b := token;
        b := a + b;
        c := a + b;
    end
end
(a) Original Code
actor Actor()
    int IN ==> int OUT:
    int a := 0;
    int b := 0;
    int c := 0;
    action IN:[token] ==> OUT:[c]
    var
        int temp_a, int temp_b, int temp_c
    do
        temp_a := a;
        temp_b := b;
        temp_b := token;
        temp_b := temp_a + temp_b;
        temp_c := temp_a + temp_b;
        a := temp_a;
        b := temp_b;
        c := temp_c;
    end
end
(b) Optimized Register Use
Figure 4.3 – Single Read and Write Register Optimization. Only a single read and a single write for a, b, and c state variables.
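The rewriting of Figure 4.3 can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: a basic block is modelled as a list of (target, expression tokens) pairs, and the function name single_read_write is hypothetical. Following the text, the sketch loads every variable in UsedVars at the entry, including variables that are only written.

```python
def single_read_write(block, state_vars):
    """Rewrite a block so each used state variable is read once and written once.

    block      -- list of (target, tokens) assignments, e.g. ("b", ["a", "+", "b"])
    state_vars -- names of the scalar state variables of the actor
    """
    # UsedVars: every state variable that is defined or used in the block.
    used_vars = sorted(
        {name for target, tokens in block for name in [target, *tokens]
         if name in state_vars}
    )
    temp = {v: f"temp_{v}" for v in used_vars}

    def rename(name):
        return temp.get(name, name)

    loads = [(temp[v], [v]) for v in used_vars]              # entry: temp_v := v
    body = [(rename(t), [rename(x) for x in tokens])         # uses/defs go to temp_v
            for t, tokens in block]
    stores = [(v, [temp[v]]) for v in used_vars]             # exit: v := temp_v
    return loads + body + stores

# The action body of Figure 4.3 (a).
original = [("b", ["token"]), ("b", ["a", "+", "b"]), ("c", ["a", "+", "b"])]
for target, tokens in single_read_write(original, {"a", "b", "c"}):
    print(f"{target} := {' '.join(tokens)};")
```

After the rewrite, each state variable appears exactly once on the right-hand side of a load and once on the left-hand side of a store; every other access goes through the local temporaries.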
4.3.3 Uninitialized Variables
Listing 4.1 – An actor with an uninitialized local variable
actor Uninitialized() ==> int O:
    Act0: action ==> O:[token]
    var
        int a, int token
    do
        token := 5 + a;
    end
end
Uninitialized variables are trivial to find once the CFG and the liveness information are computed. It is sufficient to retrieve the entry basic block n_0 of the CFG and its liveness set. Every variable v that is not defined in n_0 but is found in the liveness set of n_0 is an uninitialized variable. By default, Xronos initializes those variables to zero or false (depending on the type of the variable) and emits a warning to the programmer. For Listing 4.1, Xronos will emit a warning that the local variable "a" in action Act0 is uninitialized.
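As an illustration, the check can be sketched in Python with a standard backward liveness computation. The data layout (blocks as lists of (target, used names) pairs, a successor map) and the function names are assumptions made for this sketch and do not reflect the Xronos classes. The sketch reports the variables that are live on entry to n_0; the optional known argument stands for names defined outside the procedure, such as state variables or input tokens.

```python
def live_in_sets(blocks, succ):
    """Compute the live-in set of every basic block.

    blocks -- dict: block name -> list of (target, used_names) instructions
    succ   -- dict: block name -> list of successor block names
    """
    use, defs = {}, {}
    for name, instrs in blocks.items():
        u, d = set(), set()
        for target, used in instrs:
            u |= {v for v in used if v not in d}   # read before any write in the block
            d.add(target)
        use[name], defs[name] = u, d

    live_in = {name: set() for name in blocks}
    changed = True
    while changed:                                  # iterate to a fixed point
        changed = False
        for name in blocks:
            live_out = set().union(*(live_in[s] for s in succ.get(name, [])))
            new_in = use[name] | (live_out - defs[name])
            if new_in != live_in[name]:
                live_in[name], changed = new_in, True
    return live_in

def uninitialized(blocks, succ, entry, known=()):
    """Variables that are live on entry but never defined before their use."""
    return live_in_sets(blocks, succ)[entry] - set(known)

# Listing 4.1: token := 5 + a
blocks = {"n0": [("token", ["a"])]}
print(uninitialized(blocks, {}, "n0"))
```

For Listing 4.1 the only variable live on entry is a, which is exactly the variable the warning refers to.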
[Figure 4.4 panels: (a) Code, (b) CP Pass 1, (c) CF, (d) CP Pass 1]
Figure 4.4 – Constant Propagation (CP) and Constant Folding (CF).
4.3.4 Constant Folding/Propagation
Constant propagation searches for a constant value expression c that is assigned to a variable, d_l: t ← c, and another statement d_n that uses t, d_n: x ← t BinOp y. Thanks to the reaching definition analysis pass, it is known that t is constant in d_n if d_l reaches d_n and no other definitions of t reach d_n. Thus, after constant propagation, d_n: x ← c BinOp y.
Constant folding begins once constant propagation has terminated. This transformation searches for binary or unary expressions that contain only value expressions, computes the new value according to the operator, and assigns the resulting expression to the target.
Figure 4.4 illustrates constant propagation and folding in action. For each procedure, constant propagation is relaunched until no further modification is possible.
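The interplay of the two passes can be sketched as follows. The sketch assumes straight-line code in SSA form, so that every variable has a single definition and the reaching definition analysis becomes trivial; the instruction encoding and the function names propagate, fold, and optimize are hypothetical.

```python
import operator

# Binary operators handled by the folding step in this sketch.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def propagate(instrs):
    """One CP pass: replace uses of t by c whenever the definition of t is t <- c."""
    consts = {t: e for t, e in instrs if isinstance(e, int)}

    def subst(x):
        return consts.get(x, x) if isinstance(x, str) else x

    out = []
    for t, e in instrs:
        if isinstance(e, tuple):              # e is (op, lhs, rhs)
            e = (e[0], subst(e[1]), subst(e[2]))
        else:                                 # e is a constant or a plain copy
            e = subst(e)
        out.append((t, e))
    return out

def fold(instrs):
    """CF pass: evaluate binary expressions whose operands are both constants."""
    out = []
    for t, e in instrs:
        if isinstance(e, tuple) and isinstance(e[1], int) and isinstance(e[2], int):
            e = OPS[e[0]](e[1], e[2])
        out.append((t, e))
    return out

def optimize(instrs):
    """Relaunch propagation and folding until nothing changes any more."""
    while True:
        new = fold(propagate(instrs))
        if new == instrs:
            return instrs
        instrs = new

code = [
    ("a", 2),
    ("b", ("+", "a", 3)),
    ("c", ("*", "b", "a")),
    ("d", ("+", "c", "x")),   # x is not a constant
]
print(optimize(code))
```

In this example the first round turns b into 5, the second turns c into 10, and the third rewrites d as 10 + x, after which a further round changes nothing and the loop stops.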
4.3.5 Dead Code Elimination
Dead code elimination acts on Instructions, Blocks, and Actions. If an instruction t ← ... with target variable t is contained in a BlockBasic b such that t is not in LiveOut(b), then this instruction can be removed. The same result can be obtained from the use/def chains of the SSA representation: if the variable t has no use, then the instruction that defines it can be removed.
Constant propagation also acts on the BlockIf condition. Thus, if the condition has a boolean expression value of true, then all Then Blocks are copied to the container of the BlockIf and the BlockIf is removed. Conversely, if the condition is false, then the Else Blocks are copied to the container Block and the BlockIf is removed. Likewise, if the condition of a While Block is false, then the BlockWhile is removed.
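Both aspects, the removal of unused definitions and the removal of blocks with a constant condition, can be sketched in Python. The node encoding ("instr", "if", and "while" tuples) and the function names are assumptions made for this illustration; instructions are assumed to be free of side effects, and conditions are either a boolean constant or a variable name.

```python
def fold_branches(nodes):
    """Inline BlockIf nodes with a constant condition; drop while(false) nodes."""
    out = []
    for node in nodes:
        if node[0] == "if":
            _, cond, then_nodes, else_nodes = node
            if cond is True:
                out.extend(fold_branches(then_nodes))
            elif cond is False:
                out.extend(fold_branches(else_nodes))
            else:
                out.append(("if", cond, fold_branches(then_nodes),
                            fold_branches(else_nodes)))
        elif node[0] == "while":
            _, cond, body = node
            if cond is not False:
                out.append(("while", cond, fold_branches(body)))
        else:
            out.append(node)
    return out

def used_names(nodes):
    """Collect every variable that is read somewhere in the block tree."""
    used = set()
    for node in nodes:
        if node[0] == "instr":
            used |= set(node[2])
        else:
            if isinstance(node[1], str):          # condition is a variable
                used.add(node[1])
            for child in node[2:]:
                used |= used_names(child)
    return used

def drop_unused(nodes, live):
    """Remove instructions whose target is never read."""
    out = []
    for node in nodes:
        if node[0] == "instr":
            if node[1] in live:
                out.append(node)
        else:
            out.append((node[0], node[1], *(drop_unused(c, live) for c in node[2:])))
    return out

def eliminate_dead_code(nodes, outputs):
    nodes = fold_branches(nodes)
    while True:                                   # removing one def may kill another
        new = drop_unused(nodes, used_names(nodes) | set(outputs))
        if new == nodes:
            return nodes
        nodes = new

program = [
    ("instr", "t1", ["x"]),
    ("instr", "t2", ["t1"]),                      # t2 is never read
    ("if", True, [("instr", "y", ["x"])], [("instr", "z", ["x"])]),
    ("while", False, [("instr", "w", ["x"])]),
]
print(eliminate_dead_code(program, outputs=["y"]))
```

The removal is iterated because deleting the definition of t2 leaves t1 without any use, so t1 only becomes dead in the second round.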
Dead code elimination is also applied to the Dataflow IR. If an actor is parametrized, some actions may contain a guard condition that includes the actor parameter value. Hence, if the guard of an Action is false, this action is eliminated from the dataflow model, and if the action is contained in a transition of the actor's FSM, this transition is also removed.
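A minimal sketch of this pruning step is given below. The representation is hypothetical: each action maps to a guard function that receives the actor parameters and returns True, False, or None when the guard cannot be decided statically (for instance because it also depends on input tokens), and the FSM is a list of (source state, action, target state) triples.

```python
def prune_actions(actions, transitions, params):
    """Drop actions whose guard is statically false, and their FSM transitions.

    actions     -- dict: action name -> guard function, or None if unguarded
    transitions -- list of (source_state, action_name, target_state)
    params      -- dict with the actor parameter values
    """
    kept = {
        name: guard
        for name, guard in actions.items()
        if guard is None or guard(params) is not False
    }
    kept_transitions = [(s, a, d) for (s, a, d) in transitions if a in kept]
    return kept, kept_transitions

# Hypothetical actor with a parameter "mode" selecting one of two actions.
actions = {
    "fast": lambda p: p["mode"] == 1,
    "slow": lambda p: p["mode"] == 0,
    "flush": None,                       # no guard: always kept
}
fsm = [("s0", "fast", "s1"), ("s0", "slow", "s1"), ("s1", "flush", "s0")]
kept, kept_fsm = prune_actions(actions, fsm, {"mode": 1})
print(sorted(kept), kept_fsm)
```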
4.3.6 Type Casting
Type casting is necessary for a bit-accurate execution of a program. It is applied to the parameters of a function and to assignments. Each parameter of a function should have the same type as the corresponding argument of the function. If this is not the case, then a cast operation is necessary. The same applies when a variable with a different type, or with the same type but a different bit size, is assigned (stored or loaded) to a left-hand variable in an instruction such as t ← ....
For casting an expression, the Least Upper Bound (LUB) should be defined. The LUB represents the minimal type to which both Expressions a and b can be assigned. Xronos defines its LUB rules as given in Table 4.3. Type casting is applied as a visitor to the Procedural IR. All Block, Instruction, and Expression classes are iteratively visited until all LUB rules have been applied.
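Table 4.3 – Least Upper Bound on Types. Rows give the type with bit size a, columns the type with bit size b.

          | int (b)                        | uint (b)                       | float | string
----------+--------------------------------+--------------------------------+-------+-------
int (a)   | int(max(a, b))                 | int(max(a, b))      if a > b   | float | X
          |                                | int(max(a, b + 1))  if a < b   |       |
uint (a)  | int(max(a + 1, b))  if a > b   | uint(max(a, b))                | float | X
          | int(max(a, b))      if a < b   |                                |       |
float     | float                          | float                          | float | X
string    | X                              | X                              | X     | string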
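The rules of Table 4.3 can be written down as a small Python function. This is a sketch, not the Xronos type system: types are modelled as hypothetical (kind, bit size) pairs, None stands for an X entry, and float and string carry no bit size. The table does not list the case a = b for mixed int/uint operands; the sketch assumes that it takes the widening branch, since a signed type needs one extra bit to hold an unsigned value of the same size.

```python
def lub(ta, tb):
    """Least upper bound of two types given as (kind, bits) pairs.

    kind is one of "int", "uint", "float", "string"; bits is None for
    float and string. Returns None where Table 4.3 has an X entry.
    """
    (ka, a), (kb, b) = ta, tb
    if ka == "string" or kb == "string":
        return ("string", None) if ka == kb else None
    if ka == "float" or kb == "float":
        return ("float", None)
    if ka == kb:                                   # int/int or uint/uint
        return (ka, max(a, b))
    if ka == "int":                                # int(a) with uint(b)
        # a == b is an assumption: treated like a < b (widen by one bit).
        return ("int", max(a, b) if a > b else max(a, b + 1))
    # uint(a) with int(b); a == b is an assumption: treated like a > b.
    return ("int", max(a, b) if a < b else max(a + 1, b))

print(lub(("int", 8), ("uint", 16)))
print(lub(("uint", 8), ("int", 8)))
print(lub(("int", 32), ("string", None)))
```

A cast is needed whenever an operand's type differs from the LUB of the two operand types, so a visitor can compare each operand against the value returned by lub and insert the cast accordingly.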
4.3.7 Division and Modulo Implementation
Division and modulo operations are supported by HDL simulators but not by logic synthesizers. In Xronos, those operations are replaced by synthesizable ones. The types of the numerator and denominator determine how the division is performed. If either the numerator or the denominator type is signed, then the unsigned one should be cast to a signed integer. If the bit size of the unsigned one is greater than that of the signed one, then both should be cast to signed integers and the bit size of the signed integer type should be incremented by one. As can be seen in Algorithm 4, the division is bit accurate and its output is an unsigned integer with a bit size = maxSize(num, den). The algorithm takes size clock cycles to finish the operation. This algorithm is implemented in Xronos as a Procedural IR visitor. If an ExprBin has a division or modulo operator, then num and den are extracted from the expression and two new variables are added to the procedure. Algorithm 4 is constructed programmatically and added to the instructions that contain the division/modulo expression. Algorithm 5 represents the integer version of the algorithm. To avoid overloading
Algorithm 4: Unsigned Integer Division and Modulo Replacement
def DivisionModulo(isDivision, num, den):
    size = maxSize(num, den);
    uint(size) result := 0;
    uint(size) remainder := num;
    uint(size) mask := 1 << (size - 1);
    for i = 1 to size do
        uint(size) numer := remainder >> (size - i);
        if numer >= den then
            result := result or mask;
            remainder := remainder - (den << (size - i));
        mask := mask >> 1;
    if not isDivision then
        result := remainder;
    return result;
Algorithm 5: Integer Division and Modulo Replacement
def DivisionModulo(isDivision, num, den):
    size = maxSize(num, den);
    int(size) result := 0;
    int(size) remainder;
    int(size) mask := 1 << (size - 1);
    int(size) denom;
    int(size) numer;
    int flipResult := 0;
    if num < 0 then
        num := -num;
        flipResult := flipResult xor 1;
    if den < 0 then
        den := -den;
        flipResult := flipResult xor 1;
    remainder := num;
    denom := den and ((1 << size) - 1);
    for i = 1 to size do
        numer := remainder >> (size - i);
        if numer >= denom then
            result := result or mask;
            remainder := remainder - (den << (size - i));
        mask := (mask >> 1) and ((1 << size) - 1);
    if flipResult != 0 then
        result := -result;
    if not isDivision then
        result := remainder;
    return result;
the algorithms with more source code lines, the statement that contains the result variable is deleted in both algorithms.
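For reference, the shift-and-subtract scheme of Algorithms 4 and 5 can be checked against ordinary integer division with a short Python sketch. The function names and the explicit size argument are assumptions of this sketch (in Xronos, size is derived from the operand types), and the loop is taken to run for i = 1, ..., size so that the shift amount and the mask bit stay aligned. As in Algorithm 5, the signed variant negates only the quotient, and the modulo result is the remainder of the absolute values.

```python
def division_modulo(is_division, num, den, size):
    """Unsigned shift-and-subtract division; one quotient bit per iteration."""
    result = 0
    remainder = num & ((1 << size) - 1)
    mask = 1 << (size - 1)
    for i in range(1, size + 1):
        shift = size - i
        numer = remainder >> shift
        if numer >= den:                 # den * 2^shift fits into the remainder
            result |= mask               # set the corresponding quotient bit
            remainder -= den << shift
        mask >>= 1
    return result if is_division else remainder

def signed_division_modulo(is_division, num, den, size):
    """Signed variant: divide the absolute values, then fix the quotient sign."""
    flip_result = (num < 0) != (den < 0)
    value = division_modulo(is_division, abs(num), abs(den), size)
    if is_division and flip_result:
        return -value
    return value

# Exhaustive check for 4-bit unsigned operands (den = 0 is excluded).
for n in range(16):
    for d in range(1, 16):
        assert division_modulo(True, n, d, 4) == n // d
        assert division_modulo(False, n, d, 4) == n % d

print(division_modulo(True, 13, 3, 4), division_modulo(False, 13, 3, 4))
print(signed_division_modulo(True, -7, 2, 4))
```

Each loop iteration corresponds to one clock cycle of the generated hardware, which matches the latency of size cycles stated in the text.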