Data-dependent analysis - Analysis and optimization of dynamic dataflow programs

The execution of DDF programs can vary according to the particular input stimulus (see Section 2.3.3). For this reason, the complexity of a DDF program cannot be determined solely through a static code analysis such as the one illustrated in the previous section. In other words, to identify the program’s basic structure and complexity at different levels of abstraction, the DDF program should be executed on a statistically meaningful set of input sequences [1]. Two approaches are generally used:

• Binary-code execution: a low-level code representation of the dataflow program is generated and subsequently profiled through an instrumented, platform-dependent (host) execution [33, 82, 83, 84, 85].

• Code interpretation: the dataflow program IR is executed through a platform-independent code interpretation [1, 73, 86].

The main difference between these two approaches is how far the program execution is abstracted from the platform and how much the results are biased by low-level code optimizations. The complexity measure obtained through binary-code execution can depend on the particular platform where the program is executed and can be biased by low-level code optimizations performed by the compiler. Conversely, with IR-code interpretation, the complexity measure is fully platform-independent and not biased by low-level code optimizations. Independently of the approach, a data-dependent analysis makes it possible to identify the program’s basic structure and complexity at different levels of abstraction. Two main axes are typically recognized: the computational load and the data-transfer and storage load.
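To make the interpretation approach concrete, the following Python sketch interprets a hypothetical three-address toy IR and counts every executed instruction. The opcodes `const`, `add`, `lt`, `jmp_if_not` and `jmp` are illustrative assumptions, not the CAL IR used in this work. Because the counts come from interpretation rather than from a compiled binary, they depend only on the IR and on the input token `n`, not on the host platform or on compiler optimizations.

```python
from collections import Counter


def interpret(program, env):
    """Execute a toy three-address IR and count executed operators."""
    counts = Counter()
    pc = 0
    while pc < len(program):
        op, *args = program[pc]
        counts[op] += 1  # every executed instruction is counted, jumps included
        if op == "const":
            dst, value = args
            env[dst] = value
        elif op == "add":
            dst, a, b = args
            env[dst] = env[a] + env[b]
        elif op == "lt":
            dst, a, b = args
            env[dst] = env[a] < env[b]
        elif op == "jmp_if_not":
            cond, target = args
            if not env[cond]:
                pc = target
                continue
        elif op == "jmp":
            (target,) = args
            pc = target
            continue
        else:
            raise ValueError(f"unknown opcode: {op}")
        pc += 1
    return counts


# Hypothetical action body: acc = 0 + 1 + ... + (n - 1), n being the input token.
SUM_ACTION = [
    ("const", "acc", 0),
    ("const", "i", 0),
    ("const", "one", 1),
    ("lt", "c", "i", "n"),       # pc 3: loop test
    ("jmp_if_not", "c", 8),      # leave the loop when i >= n
    ("add", "acc", "acc", "i"),
    ("add", "i", "i", "one"),
    ("jmp", 3),
]

for n in (0, 3):
    env = {"n": n}
    print(n, dict(interpret(SUM_ACTION, env)), env["acc"])
```

With `n = 3` the two `add` instructions execute six times in total, whereas with `n = 0` they never execute: the same action yields different operator counts for different input stimuli, which is why a statistically meaningful set of input sequences is needed.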

3.3.1 Computational load

The computational load is expressed in terms of executed operators and control statements (i.e., comparison, logical, arithmetic and data-movement instructions). The firing time of an action firing can be modeled from the number of its executed operators and control statements retrieved during the program code interpretation. For each action firing $s_i$, this is defined as:

$$ w(s_i) = \sum_{j} c_j \, o(s_i)_j \qquad (3.9) $$

where $o(s_i)_j$ represents the number of executions of the $j$-th operator or control statement performed by the action firing $s_i$, and $c_j$ is a weight for the respective operator or control statement. It must be noted that $c_j$ can be defined according to a desired target architecture. As for the static analysis discussed in Section 3.2, Table 3.1 reports the set of operators and control statements that can be retrieved by interpreting a CAL program.
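Equation (3.9) is a weighted sum over the operator counts of one firing. A minimal Python sketch, assuming the counts o(s_i)_j of each firing are already available as a dictionary, and using hypothetical weights c_j (thought of here as cycles on some target; the values are illustrative only):

```python
def firing_weight(op_counts, weights):
    """w(s_i) = sum_j c_j * o(s_i)_j for a single action firing s_i.

    op_counts: executed-operator counts o(s_i)_j of one firing.
    weights:   per-operator weights c_j (hypothetical values below).
    """
    return sum(weights[op] * count for op, count in op_counts.items())


# Hypothetical weights, e.g. estimated cycles per operator on some target.
WEIGHTS = {"const": 1, "add": 1, "lt": 1, "jmp_if_not": 2, "jmp": 2}

# Hypothetical operator counts of two firings of the same action
# triggered by different input tokens.
firings = [
    {"const": 3, "lt": 1, "jmp_if_not": 1},
    {"const": 3, "lt": 4, "jmp_if_not": 4, "add": 6, "jmp": 3},
]
loads = [firing_weight(f, WEIGHTS) for f in firings]
print(loads)
```

The two firings weigh 6 and 27 respectively. Changing `WEIGHTS` retargets the estimate to a different architecture without re-interpreting the program, which is the reason for keeping c_j separate from o(s_i)_j.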

3.3.2 Data-transfers and storage load

The data-transfers and storage load are expressed in terms of internal actor variable utilization, input/output port utilization, buffer utilization and token production/consumption. During the program code interpretation, some statistical information concerning the actor internal variables and the buffer utilization can be stored to evaluate the memory load and utilization.

Internal actor variables

During the program code interpretation, for each firing and each actor internal variable the following information can be collected:

• Writes: number of writes that each firing has made on an internal actor variable.

• Reads: number of reads that each firing has made on an internal actor variable.

• List writes: number of writes that each firing has made on an internal actor list variable.

• List reads: number of reads that each firing has made on an internal actor list variable.
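A possible data structure for these per-firing counters is sketched below in Python. The hook names `on_read` and `on_write` are hypothetical: they stand for whatever callbacks an interpreter would invoke when it accesses an actor variable.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class VarStats:
    writes: int = 0
    reads: int = 0
    list_writes: int = 0
    list_reads: int = 0


class VariableProfiler:
    """Collects per-firing access counts on internal actor variables."""

    def __init__(self):
        # stats[firing][variable] -> VarStats, created on first access
        self.stats = defaultdict(lambda: defaultdict(VarStats))

    def on_read(self, firing, var, is_list=False):
        s = self.stats[firing][var]
        if is_list:
            s.list_reads += 1
        else:
            s.reads += 1

    def on_write(self, firing, var, is_list=False):
        s = self.stats[firing][var]
        if is_list:
            s.list_writes += 1
        else:
            s.writes += 1


# Hypothetical hook calls while interpreting firing 0 of a CAL-like
# statement such as `count := count + table[3]`.
prof = VariableProfiler()
prof.on_read(0, "count")
prof.on_read(0, "table", is_list=True)
prof.on_write(0, "count")
print(prof.stats[0]["count"], prof.stats[0]["table"])
```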

Tokens and buffers

During the program code interpretation, the following information can be collected for each firing and each buffer:

• Writes: number of tokens written on a buffer.

• Reads: number of tokens read from a buffer.

• Peeks: number of peeks (i.e., tests of token presence) made by each firing on the respective input buffers.

• Read miss: number of unavailable tokens on the input buffers that made the selected action not fireable.

• Write hit: number of unavailable token places on the output buffers that made the selected action not fireable.

Furthermore, for each buffer, the maximum occupancy observed during the interpretation can be used as an initial estimate of the buffer size requirement.
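The buffer statistics above can be gathered by wrapping each FIFO, as in the following Python sketch. The class name `ProfiledFifo` and its methods are hypothetical. Read misses and write hits are counted as the number of missing tokens or missing free places, following the definitions given above, and `capacity=None` models an unbounded buffer so that the recorded maximum occupancy can serve as the initial size estimate.

```python
from collections import deque


class ProfiledFifo:
    """FIFO buffer that records token and buffer statistics.

    capacity=None models an unbounded buffer, so that the observed maximum
    occupancy can serve as an initial buffer-size estimate.
    """

    def __init__(self, capacity=None):
        self.capacity = capacity
        self.tokens = deque()
        self.writes = self.reads = self.peeks = 0
        self.read_misses = 0   # missing input tokens that blocked an action
        self.write_hits = 0    # missing free places that blocked an action
        self.max_occupancy = 0

    def has_tokens(self, n=1):
        """Peek: test whether n tokens are available."""
        self.peeks += 1
        missing = n - len(self.tokens)
        if missing > 0:
            self.read_misses += missing
            return False
        return True

    def has_room(self, n=1):
        """Test whether n free places are available on a bounded buffer."""
        if self.capacity is None:
            return True
        missing = n - (self.capacity - len(self.tokens))
        if missing > 0:
            self.write_hits += missing
            return False
        return True

    def write(self, token):
        if self.capacity is not None and len(self.tokens) >= self.capacity:
            raise OverflowError("buffer full")
        self.tokens.append(token)
        self.writes += 1
        self.max_occupancy = max(self.max_occupancy, len(self.tokens))

    def read(self):
        token = self.tokens.popleft()  # raises IndexError on an empty buffer
        self.reads += 1
        return token


fifo = ProfiledFifo(capacity=4)
fifo.has_tokens(2)      # blocked: two tokens missing
for t in range(3):
    fifo.write(t)
fifo.has_room(2)        # blocked: only one free place left
first = fifo.read()
print(fifo.writes, fifo.reads, fifo.peeks, fifo.read_misses,
      fifo.write_hits, fifo.max_occupancy)
```

The example run records three writes, one read, one peek, two read misses, one write hit and a maximum occupancy of three tokens.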

3.4 Conclusions

In this chapter, the main requirements for profiling a dataflow program have been summarized. It has been shown how actors can be analyzed and their behavior classified as static, cyclo-static or dynamic. Then, the different static code analysis metrics have been illustrated: the elementary source lines of code count and the operator count, as well as the more complex cyclomatic and Halstead metrics. Finally, data-dependent analysis of DDF programs has been discussed, and the concepts of computational load and of data-transfer and storage load have been introduced.

4 Exploring the design space of dataflow programs

Complex software systems may have many design points in terms of the selection of software components and of hardware architectures for their implementation. These choices create a large space of possible design solutions, called the design space. The design process requires exploring this design space to find design solutions before the actual implementation. The aim of design space exploration (DSE) is to find design solutions that satisfy functional and performance constraints and/or optimize portions of the system. In addition, the heterogeneity of modern parallel architectures and the diverse requirements of target applications greatly complicate modern system design. Developing efficient programs for this kind of platform requires design methodologies that can deal with system complexity and flexibility. This has led to the notion of system-level design, where key roles are played by aspects such as high-level modeling and simulation, and separation of concerns [87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97]. In this context, the exploration of the design space becomes an essential step when implementing applications on heterogeneous and parallel platforms, owing to the combinatorial explosion of design options when dealing with multiple concurrent processing units. To obtain an efficient implementation and integration process, the design has to be sufficiently modular and portable, without requiring full or partial re-implementation and manual rewriting.

4.1 Orthogonalization of concerns

Orthogonalization of concerns is a well-established design paradigm [98]. Alternative solutions of the design space can be efficiently evaluated through design performance estimations. One of the main features of this design methodology is the separation between:

• Functional behavior and architecture.

• Communication and computation.

In this framework, a design problem is specified through:

• A functional specification, given as a set of explicit or implicit relations which involve inputs, outputs and possibly internal state information.

• A set of properties that the design must satisfy, given as a set of relations over inputs, outputs, and states, that can be checked against the functional specification.

• A set of performance indexes that evaluate the quality of the design (e.g. in terms of cost, reliability, speed, size) given as a set of equations involving inputs and outputs.

• A set of constraints on performance indexes, specified as a set of inequalities.

The functional specification fully characterizes the operation of a system, while the performance constraints bound its cost. In other words, finding target points of the design space can be formulated as a minimization problem, where the objective functions are defined by the performance indexes and the constraints by the inequalities on them. In the following, the concept of orthogonalization of concerns is illustrated using the formalism described in [98], where the notions of model of computation, model of architecture and mapping are used.
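The minimization view of design space exploration can be illustrated with a deliberately small Python sketch. Everything in it is an assumption for illustration only: three hypothetical actors with fixed loads, two processing units, the makespan (largest per-unit load) as the performance index to minimize, and a single inequality constraint limiting the load of unit 1. Exhaustive enumeration is used only because the space is tiny.

```python
from itertools import product


def explore(design_space, objective, constraints):
    """Return the feasible design point that minimizes `objective`.

    constraints: list of (index, bound) pairs, each meaning index(x) <= bound.
    Exhaustive search: only viable for very small design spaces.
    """
    feasible = [x for x in design_space
                if all(index(x) <= bound for index, bound in constraints)]
    return min(feasible, key=objective) if feasible else None


# Hypothetical problem: map three actors onto two processing units (0 and 1).
LOADS = {"A": 4, "B": 3, "C": 2}  # e.g. computational load per actor
ACTORS = sorted(LOADS)
space = [dict(zip(ACTORS, units))
         for units in product((0, 1), repeat=len(ACTORS))]


def unit_load(mapping, unit):
    return sum(LOADS[a] for a, u in mapping.items() if u == unit)


def makespan(mapping):
    # Performance index to minimize: load of the busiest processing unit.
    return max(unit_load(mapping, 0), unit_load(mapping, 1))


# Assumed constraint: unit 1 is a smaller core that accepts a load of at most 4.
best = explore(space, makespan, [(lambda m: unit_load(m, 1), 4)])
print(best, makespan(best))
```

The search returns actor A on unit 1 and actors B and C on unit 0, with a makespan of 5. The design space contains p^n mappings for n actors on p processing units (here 2^3 = 8), which is the combinatorial explosion mentioned above; practical DSE relies on heuristics rather than on enumeration.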

4.1.1 Model of computation

The Model of Computation (MoC) is a formal representation of the operational semantics of networks of functional blocks describing computation [99, 98]. Depending on the modeling perspective, MoCs can be classified as abstract or executable descriptions [100]. Abstract models are used to define the application workload without executing the specification. Executable specifications, on the other hand, allow different abstraction levels: they can directly represent the application or, for example, a discrete-event performance model of the application itself. In the context of this thesis, only abstract dataflow MoCs are analyzed; more precisely, MoCs whose taxonomy is described in Section 2.3.

4.1.2 Model of architecture

The Model of Architecture (MoA) is a formal representation of the operational semantics of networks of functional blocks describing architectures [90, 98, 101, 102]. Depending on the modeling perspective, a MoA can be classified as an abstract or an executable architecture description [100]. Abstract models represent performance symbolically: for example, they associate the required latency in clock cycles with each operation without actually executing any hardware description. Executable specifications, on the other hand, allow state-dependent behavior, such as the timing of caches and pipelines, to be modeled more precisely. In the context of this thesis, only abstract MoAs, such as those illustrated in [101, 102], are considered.
