Static analysis - Analysis and optimization of dynamic dataflow programs

Methods based on static analysis of the source code range from simply counting the number of operations to defining dependencies among the basic blocks. This information can be used during different optimization stages. For example, lower and upper bounds on the run-time of a given program on a given processing element can be evaluated directly from the operator count analysis [74, 75]. While this simple counting technique provides a very accurate evaluation of the operations, it cannot handle loops, recursion, conditional statements and data-dependent applications, except in some particular cases. Explicit or implicit enumeration of program paths can handle loops and conditional statements and can yield bounds on the best-case and worst-case run-time [74, 75, 50]. The main drawback of these techniques is that the actual processing complexity of many algorithms depends heavily on the statistics of the input data, whereas static analysis can only detect upper and lower bounds. A restricted programming style (no dynamic data structures, no recursion, and only bounded loops) is required in order to perform a static analysis correctly [74].
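As a minimal sketch of how run-time bounds could follow from operator counts, the snippet below multiplies each count by a best-case and a worst-case cost. The cost table, the function name `runtime_bounds` and the operator counts are all hypothetical and chosen only for illustration; real costs depend on the processing element, and the sketch only covers straight-line code without loops or branches.

```python
from collections import Counter

# Hypothetical per-operator costs in clock cycles: (best case, worst case).
# These numbers are illustrative assumptions, not measured values.
COST_CYCLES = {
    "+": (1, 1),
    "*": (3, 5),
    "LOAD": (1, 20),   # e.g. fast vs. slow memory access
    "STORE": (1, 20),
}


def runtime_bounds(operator_counts, cost_cycles=COST_CYCLES):
    """Return (lower, upper) cycle bounds for straight-line code."""
    lower = sum(n * cost_cycles[op][0] for op, n in operator_counts.items())
    upper = sum(n * cost_cycles[op][1] for op, n in operator_counts.items())
    return lower, upper


# Hypothetical operator counts obtained from a static pass over one action.
counts = Counter({"+": 4, "*": 2, "LOAD": 3, "STORE": 1})
bounds = runtime_bounds(counts)
print(bounds)
```

The gap between the two bounds grows with the spread of the per-operator costs, which is one reason such bounds can be far from the typical run-time.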

3.2.1 Source lines of code

Source lines of code (SLOC) is one of the most-used metrics when dealing with program development complexity and maintainability. Using the definition proposed in [76], a line of code is a line of program text that is not a comment or blank line, regardless of the number of program headers, declarations, and executable and non-executable statements it contains. However, the SLOC of a program can depend strongly on how the counting procedure is interpreted. For this reason, the number of lines of code should be used only as a crude complexity measure [77].
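The definition above can be sketched as a small counter. This is one possible interpretation of the counting procedure, not the one used in [76]: it assumes C-style comments (`//` line comments and `/* ... */` blocks), it does not handle comment markers inside string literals, and the CAL-like snippet is only illustrative.

```python
def count_sloc(source: str) -> int:
    """Count lines that are neither blank nor made up only of comments.

    Assumes '//' line comments and '/* ... */' block comments.
    Comment markers inside string literals are not handled.
    """
    sloc = 0
    in_block = False  # True while inside a /* ... */ comment
    for raw in source.splitlines():
        line = raw.strip()
        has_code = False
        while line:
            if in_block:
                end = line.find("*/")
                if end < 0:
                    line = ""  # the whole rest of the line is comment
                else:
                    in_block = False
                    line = line[end + 2:].strip()
            elif line.startswith("//"):
                line = ""  # line comment runs to the end of the line
            elif line.startswith("/*"):
                in_block = True
                line = line[2:]
            else:
                has_code = True
                # Skip ahead to the next comment opener, if there is one.
                starts = [i for i in (line.find("//"), line.find("/*")) if i >= 0]
                line = line[min(starts):] if starts else ""
        if has_code:
            sloc += 1
    return sloc


example = """\
// adder actor
actor Add() int a, int b ==> int c :

  /* single
     action */
  action a:[x], b:[y] ==> c:[x + y] end  // fire
end
"""
print(count_sloc(example))
```

A different but equally defensible rule (for instance, counting a line holding only `end` as non-code) would give a different total, which illustrates why SLOC is only a crude measure.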

3.2.2 Operators count

As with the SLOC metric, the number of occurrences of each operator can be used as a crude complexity measure of the program. Table 3.1 reports the set of unary, binary, data handling and flow control operators available for the CAL language. However, basing the program complexity on the number of operator occurrences can be misleading, because the contents of conditional blocks (e.g. if and while) are taken into account only once, regardless of how many times they execute.
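The limitation can be illustrated with a short sketch that counts operator occurrences in a piece of source text. The tokenizer below is a simplifying assumption: it recognizes only a subset of the symbols in Table 3.1, written in ASCII, plus the `if` and `while` keywords, and it is not a CAL parser.

```python
import re
from collections import Counter

# Subset of the symbolic operators of Table 3.1 (ASCII spelling assumed).
# Longer symbols come first so that '<<' is not read as two '<'.
OPERATORS = ["<<", ">>", "**", "&&", "||", "==", "!=", ">=", "<=",
             "+", "-", "*", "/", "%", "<", ">", "&", "|", "^", "~", "!", "#"]
KEYWORDS = ["if", "while"]

TOKEN_RE = re.compile(
    "|".join(re.escape(op) for op in OPERATORS)
    + "|"
    + "|".join(r"\b%s\b" % kw for kw in KEYWORDS)
)


def count_operators(source):
    """Count static occurrences of the recognized operators and keywords."""
    return Counter(TOKEN_RE.findall(source))


# Illustrative loop: the body runs n times, but is counted only once.
snippet = """
while i < n do
  acc := acc + x[i] * 2;
  i := i + 1;
end
"""
counts = count_operators(snippet)
print(dict(counts))
```

The counts are the same whether `n` is 1 or 1000, so the static operator count says nothing about how often the loop body actually executes.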

3.2.3 Cyclomatic complexity

Cyclomatic complexity [78] is a quantitative measure of the complexity of programming instructions. It directly measures the number of linearly independent paths through the program source code. In other words, it is a software metric that equates complexity with the number of decisions in a program. Developers can use this measure to determine which modules (i.e. network, actor, action, procedure) of a program are overly complex and need to be re-coded. For each module, the metric can be calculated either from the CFG of the module (see Section 2.4.3) or from the program’s statements. The cyclomatic complexity is defined as:

v = e − n + 2p (3.1)

where e is the number of edges, n is the number of nodes, and p is the number of modules. It must be noted that this equation is based on the assumption that the CFG is a strongly-connected graph. The cyclomatic complexity of a module also gives the maximum number of linearly independent paths through it. Equivalently, it can be evaluated by counting the branch conditions in a module. Hence, Equation (3.1) can be rewritten as:

v = b + 1 (3.2)

where b represents the number of simple branch conditions. The formulation in Equation (3.2) is convenient because it allows developers to calculate the cyclomatic complexity of a program without having to use graph analysis. However, it only applies to individual modules that consist of structured, single-entry and single-exit blocks of code [79].
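Both formulations can be checked against each other on a small example. The sketch below evaluates Equations (3.1) and (3.2) for a hypothetical single-module CFG made of an if-then-else followed by a while loop; the node names and helper functions are assumptions introduced for illustration.

```python
def cyclomatic_from_cfg(edges, num_modules=1):
    """Equation (3.1): v = e - n + 2p, with the nodes inferred from the edges."""
    nodes = {node for edge in edges for node in edge}
    return len(edges) - len(nodes) + 2 * num_modules


def cyclomatic_from_branches(num_branch_conditions):
    """Equation (3.2): v = b + 1."""
    return num_branch_conditions + 1


# Hypothetical CFG of one module: an if-then-else followed by a while loop.
cfg_edges = [
    ("entry", "if"),
    ("if", "then"),
    ("if", "else"),
    ("then", "while"),
    ("else", "while"),
    ("while", "body"),
    ("body", "while"),
    ("while", "exit"),
]

v_graph = cyclomatic_from_cfg(cfg_edges)   # e = 8, n = 7, p = 1
v_branch = cyclomatic_from_branches(2)     # two conditions: the if and the while
print(v_graph, v_branch)
```

Here the graph has 8 edges and 7 nodes, so Equation (3.1) gives 8 - 7 + 2 = 3, which matches b + 1 for the two branch conditions.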

Table 3.1: Profiled executed operators and statements.

Kind            Symbol        Name
Unary           ~             binary not
                !             logical not
                −             unary minus
                #             number of elements
Binary          &             bit and
                |             bit or
                ^             bit xor
                ==            equal
                !=            not equal
                ≥             greater than or equal
                >             greater than
                ≤             less than or equal
                <             less than
                &&            logical and
                ||            logical or
                −             minus
                +             plus
                *             times
                /             division
                div           integer division
                **            exponentiation
                %             modulo
                <<            shift left
                >>            shift right
Data Handling   ASSIGN        assign
                CALL          call
                LOAD          load
                STORE         store
                LIST_LOAD     list load
                LIST_STORE    list store
Flow Control    if            if then else statement


3.2.4 Halstead metrics

Halstead metrics [80] are used to estimate the production effort and the quality of a program from the numbers of operands and operators used in its source code. They are based on the following set of parameters:

• $n_1$: the number of distinct operators present.

• $N_1$: the total number of operators present.

• $n_2$: the number of distinct operands present.

• $N_2$: the total number of operands present.

In the context of a dataflow program, these parameters can be defined with different levels of granularity: they can be defined for the overall program or for each actor, action and procedure. Some of the most-used Halstead metrics are the following:

• Program length: describes the size of the abstracted program obtained by removing everything except operators and operands from the original source code. It is defined as:

$$ N = N_1 + N_2 \tag{3.3} $$

Unlike the SLOC metric (see Section 3.2.1), the Halstead length gives a clearer account of the overall statement complexity, since SLOC says nothing about how complex the lines of code are.

• Program volume: models the number of bits required to store an abstracted program of length N . It is defined as:

$$ V = N \log_2(n_1 + n_2) \tag{3.4} $$

With this formulation, it is supposed that both the operators and the operands are encoded as binary strings of uniform (and potentially non-integral) length.

• Program level: describes the ratio between the most compact volume of the same algorithm implementation and the volume V of the current program [80]. It is defined as:

$$ L = \frac{2}{n_1} \cdot \frac{n_2}{N_2} \tag{3.5} $$

In other words, a longer implementation of an algorithm has a lower program level than a shorter implementation of the same algorithm.

• Program difficulty: defined as the inverse of the program level:

$$ D = \frac{1}{L} \tag{3.6} $$

In other words, a longer implementation of an algorithm has a higher difficulty than a shorter implementation of the same algorithm.

• Programming effort: defines the effort required to develop (or understand) a program. It is defined as:

E = D V (3.7)

In other words, the programming effort is proportional to both the difficulty and the volume of the program.

• Programming time: defines the time in seconds required to develop the program. It is defined as:

$$ T = \frac{E}{S} \tag{3.8} $$

where S is the Stroud number, defined as the number of elementary discriminations performed by the human brain per second [81]. S ranges from 5 to 20, and in software science its value is generally set to 18.
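The metrics listed above can be evaluated together once the four parameters are known. The following sketch implements Equations (3.3) to (3.8) directly; the class name `HalsteadCounts` and the example counts are hypothetical, and in practice the counts would be collected per program, actor, action or procedure.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class HalsteadCounts:
    n1: int  # number of distinct operators
    N1: int  # total number of operators
    n2: int  # number of distinct operands
    N2: int  # total number of operands

    @property
    def length(self) -> int:
        """Program length, Equation (3.3)."""
        return self.N1 + self.N2

    @property
    def volume(self) -> float:
        """Program volume in bits, Equation (3.4)."""
        return self.length * math.log2(self.n1 + self.n2)

    @property
    def level(self) -> float:
        """Program level, Equation (3.5)."""
        return (2 / self.n1) * (self.n2 / self.N2)

    @property
    def difficulty(self) -> float:
        """Program difficulty, Equation (3.6)."""
        return 1 / self.level

    @property
    def effort(self) -> float:
        """Programming effort, Equation (3.7)."""
        return self.difficulty * self.volume

    def time_seconds(self, stroud: float = 18.0) -> float:
        """Programming time in seconds, Equation (3.8)."""
        return self.effort / stroud


# Hypothetical counts, chosen so that the intermediate values are easy to check.
counts = HalsteadCounts(n1=4, N1=8, n2=4, N2=8)
print(counts.length, counts.volume, counts.level,
      counts.difficulty, counts.effort, counts.time_seconds())
```

With these counts the length is 16, the volume is 16 · log2(8) = 48, the level is 0.25, the difficulty is 4 and the effort is 192, which gives a programming time of 192 / 18 seconds, roughly 10.7 seconds.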
