[Figure 1.13 (diagram not reproduced). Elements shown: source code; StreamIt compiler (strc); StreamIt to OmpSs (str2oss); stream graph; work estimates; StarSs source code (Cholesky, H.264 skeleton, ...); partition; queue sizes; Nanos++; adaptive scheduler; ASM simulator; OmpSs compiler (Mercurium); StarssCheck; trace; Paraver; prvanim; program desc.; statistics; ASM machine description. Compiler decisions: Chapter 3. Run-time decisions: Chapter 4.]
Figure 1.13: Tool flow used in this thesis. Items in bold are contributions of this thesis.
Figure 1.12 shows an example program using the SPM, and its stream graph. The streaming part of a program is known as a taskgroup, and it comprises the loop following the acotes taskgroup pragma, here on lines 5 to 13. This taskgroup has two tasks, each of which contains the statement or block following an acotes task pragma. The task’s inputs and outputs are identified using the input and output clauses.
The SPM program begins running in a single thread. When execution reaches the taskgroup, all of its tasks are created, each becoming a kernel in the terminology of this thesis, and the program starts processing data in streams, passing the tasks' inputs and outputs through those streams. More information can be found in the ACOTES documentation [CRM+07].
1.6 Tool flow and support tools
Figure 1.13 shows the tool flow used in this thesis, with the contributions of the thesis shown in bold. The left-hand side of the diagram is for the compile-time decisions in Chapter 3, and the right-hand side is for the run-time decisions in Chapter 4.
On the left, the StreamIt source code is compiled using the StreamIt compiler, which, in addition to creating an executable, also creates a file representing the stream graph, in dot format [GKN06], and a text file giving its estimate, for each filter, of the amount of work per firing. These files are the input, together with the ASM machine description, to
the partitioning algorithm in Section 3.2. After partitioning, the buffer sizes are determined using the buffer sizing algorithm in Section 3.3.
On the right, the StreamIt source code is first translated to StarSs, using the tool described in Section 5.3. The StarSs source code is then built using the Mercurium OmpSs compiler.
The various dynamic scheduling techniques, including the two proposed adaptive techniques, were compared using a set of OmpSs benchmarks built in this way.
1.6.1 Debugging using StarssCheck
When people start using a new programming language and compiler, they will soon discover that some of their programs don’t work. They will need to find out why, before they can fix them. A programming language without debug tools may be a fine research vehicle, but it is unlikely to be widely adopted, as users become frustrated by bugs in their code, blame the compiler, and think that the language is hard to use.
Section 5.1 describes a debugging tool, StarssCheck [CRA10a], that was developed as part of this thesis. It finds bugs in StarSs programs, but similar ideas could be used in a tool supporting SPM. StarssCheck was used, for example, to check the output of the StreamIt to OmpSs converter described in Section 5.3. The reasons for targeting StarSs rather than the SPM are that StarSs is more mature, and it already has real users.
1.6.2 Performance visualisation
In this thesis, the main reason to write and compile an application to run in parallel is to improve its performance. When the performance is disappointing, or in some way surprising, it is important to understand why, and this requires some way to see how the application progresses through time.
The main tools used for this purpose in this thesis were Paraver [CEP] and prvanim (Paraver Animator). Paraver is a trace visualisation tool, developed at BSC–CNS. It reads a trace in a straightforward text format [CEP01], which can easily be created either using custom code or using the Mintaka library [Nanb].
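To give an idea of what creating a trace with custom code can involve, the following is a minimal sketch in C of two helpers that format Paraver state and event records. The colon-separated field order (record type, cpu, application, task, thread, then times and values) is our understanding of the Paraver trace format and should be checked against [CEP01]. The function names prv_state_record and prv_event_record are hypothetical and are not part of Paraver or Mintaka. The header line that a complete trace also needs is omitted.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format one state record, assumed layout:
 *   1:cpu:appl:task:thread:begin_time:end_time:state
 * Returns the snprintf result (characters needed, excluding the NUL). */
int prv_state_record(char *buf, size_t size, int cpu, int appl, int task,
                     int thread, unsigned long long begin,
                     unsigned long long end, int state)
{
    assert(begin <= end);  /* a state interval cannot end before it starts */
    return snprintf(buf, size, "1:%d:%d:%d:%d:%llu:%llu:%d\n",
                    cpu, appl, task, thread, begin, end, state);
}

/* Format one event record, assumed layout:
 *   2:cpu:appl:task:thread:time:event_type:event_value
 * Returns the snprintf result (characters needed, excluding the NUL). */
int prv_event_record(char *buf, size_t size, int cpu, int appl, int task,
                     int thread, unsigned long long time,
                     unsigned long long type, unsigned long long value)
{
    return snprintf(buf, size, "2:%d:%d:%d:%d:%llu:%llu:%llu\n",
                    cpu, appl, task, thread, time, type, value);
}
```

A tracing layer built this way would write each formatted record to the trace file with fputs. The meaning attached to each state and event value is defined by the user's Paraver configuration, not by these helpers.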
Section 5.2 describes the prvanim tool, which was developed in the course of this thesis. It takes a Paraver trace, and produces an animation that shows the progress of the application through time. It is a simple tool, which has, nonetheless, proven quite useful.
1.6.3 StreamIt to OmpSs
In order to use the same StreamIt benchmarks throughout the thesis, these benchmarks had to be translated to StarSs, so they could be compiled by the OmpSs compiler. For that purpose, we developed str2oss, a source-to-source compiler that translates from StreamIt to StarSs. This tool is described in Section 5.3. It does not support kernel fusion, but it does support unrolling, using a control file given by the user.
Each filter is translated into a work function, and, if required, an initialisation function. The work function does the work required by one firing of the filter, and is marked as a StarSs task. The tool creates a main thread, which allocates memory for the streams, and implements the steady state by calling the tasks in sequence.
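The structure just described can be sketched in C as follows. This is an illustration only, not actual str2oss output. The three filters, the rate BLOCK, the function names and the buffer layout are all hypothetical. The pragma spelling follows the StarSs css style and is an assumption, since the annotations emitted by the tool may differ. A compiler that does not recognise the pragmas ignores them, in which case the tasks simply run sequentially in program order.

```c
#include <assert.h>
#include <stdlib.h>

enum { BLOCK = 4 };  /* items pushed or popped per firing (hypothetical rate) */

/* Work function of a source filter: one firing pushes BLOCK integers. */
#pragma css task input(first) output(out)
void source_work(int first, int out[BLOCK])
{
    for (int i = 0; i < BLOCK; i++)
        out[i] = first + i;
}

/* Work function of a scaling filter: one firing pops and pushes BLOCK items. */
#pragma css task input(in) output(out)
void scale_work(const int in[BLOCK], int out[BLOCK])
{
    for (int i = 0; i < BLOCK; i++)
        out[i] = 2 * in[i];
}

/* Work function of a sink filter: one firing pops BLOCK items into a sum. */
#pragma css task input(in) inout(sum)
void sink_work(const int in[BLOCK], long sum[1])
{
    for (int i = 0; i < BLOCK; i++)
        sum[0] += in[i];
}

/* Main thread: allocate memory for the streams, then implement the steady
 * state by calling the tasks in sequence. Each iteration uses its own slice
 * of each stream buffer, so different iterations do not alias. */
long run_steady_state(int iterations)
{
    assert(iterations > 0);
    size_t items = (size_t)iterations * BLOCK;
    int *stream0 = malloc(items * sizeof *stream0);  /* source -> scale */
    int *stream1 = malloc(items * sizeof *stream1);  /* scale  -> sink  */
    long sum[1] = { 0 };

    if (stream0 == NULL || stream1 == NULL) {
        free(stream0);
        free(stream1);
        return -1;  /* allocation failure */
    }
    for (int it = 0; it < iterations; it++) {
        size_t off = (size_t)it * BLOCK;
        source_work((int)off, stream0 + off);
        scale_work(stream0 + off, stream1 + off);
        sink_work(stream1 + off, sum);
    }
    /* Wait for all outstanding tasks before reading the result. */
    #pragma css barrier
    free(stream0);
    free(stream1);
    return sum[0];
}
```

For example, run_steady_state(2) pushes the integers 0 to 7 through the pipeline and returns twice their sum. The sequential calls in the loop express only the program order. Under a task-based run-time, the input, output and inout annotations are what allow firings of different filters to overlap.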