Token Logging and KPN Tracing - Programming heterogeneous MPSoCs : tool flows to close the soft

This section provides further details about the first two phases of the parallel flow in Figure 6.1. Since the tracing approach is close to the one of the sequential flow, the implementation details of the instrumentation process are omitted in this section. Instead, the section focuses on the definition and the format of the KPN traces in Section 6.2.1, and on how timing is added to traces in Section 6.2.2.

<Scheduler Name="RR0" SchedulingPolicy="RoundRobin">
  <Param TimeSlot="1000"> <ProcessContainer List="P1 P2 P3" /> </Param>
</Scheduler>
<Mapper>
  <ProcessorContainers>
    <PeGroup Name="RISCS" Processors="irisc0" />
    <PeGroup Name="VLIWS" Processors="ltvliw0 ltvliw1" />
  </ProcessorContainers>
  <ProcessMapping>
    <Connect Scheduler="RR0" Group="RISCS" />
  </ProcessMapping>
  <ChannelMapping>
    <Map Fifo="Channel_P1-P2" Primitive="over_shared" Resource="shared_mem1"/>
  </ChannelMapping>
</Mapper>
<Buffersizing>
  <FifoBound Fifo="Channel_P1-P2" Bound="4"/>
</Buffersizing>

Figure 6.3: Process trace example. a) General process and sample control flow graph. b) Sample execution trace. c) Graphical representation of a process trace.

6.2.1 KPN Traces

As introduced with the motivational example in Section 2.5.3 (see Figure 2.8), in order to predict the effect of a mapping, it is important to understand the control flow within every process in the KPN application. While in static dataflow programming models the behavior of a process is described in the specification itself, in a CPN specification it is hidden in the C implementation of the processes. At runtime, a process may follow different paths in its internal CFG. Along these paths, the process may access (read or write) any of its channels. In order to characterize the behavior of a KPN application, every process is analyzed in isolation with the sequential tracing flow described in Section 5.2.1. As mentioned before, this analysis is performed on the standalone implementation of the processes and using the channel logs produced by the token logging phase.

Recall that sequential tracing produces a file that records the control flow followed by the application and the memory accesses (see Figure 5.2b). This information is used to build the model of each process according to Definition 2.44 in the same way it was done for a sequential application (see Section 5.2.2). Note, however, that for the purpose of scheduling, only three types of runtime events are relevant: a read access to an input channel, a write access to an output channel and a call to the time checkpoint. Channel accesses are important because they can potentially change the state of processes in the KPN application, e.g., a read from an empty channel would block the reader process. The time checkpoint event is needed to assess whether or not timing constraints are met. In order to make it easier to recognize these events in the execution trace, the instrumentation process was slightly modified.
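The effect of channel accesses on process state can be illustrated with a minimal Python sketch. It assumes a bounded FIFO channel, as implied by a buffer-sizing bound; the class `BoundedFifo` and its method names are hypothetical and not part of the tool flow described here.

```python
from collections import deque

class BoundedFifo:
    """Hypothetical model of a KPN channel with a fixed token bound."""

    def __init__(self, bound):
        self.bound = bound
        self.tokens = deque()

    def try_write(self, token):
        # A write to a full channel would block the writer process.
        if len(self.tokens) >= self.bound:
            return False
        self.tokens.append(token)
        return True

    def try_read(self):
        # A read from an empty channel would block the reader process.
        if not self.tokens:
            return False, None
        return True, self.tokens.popleft()
```

A scheduler model could call `try_read` and `try_write` at each read or write event to decide whether a process stays runnable or becomes blocked.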

The execution of a process is then characterized by a sequence of events, separated by arbitrary computations. The sequence is called a process trace and the arbitrary computations are denoted segments. More formally,

Definition 6.1. A segment ($S^{P^A}$) of a process $P^A = (SE^{P^A}, CG^{P^A}, \pi^{P^A})$ is a sequence of statements $S^{P^A} = (s_1, \ldots, s_n)$ that determine a path in its CFG, where $s_1$ follows immediately from a synchronization event and $s_n$ is the only statement in the sequence that generates a synchronization event. Naturally, $\forall s_i \in S^{P^A},\ s_i \in SE^{P^A}$. The set of all segments in an application $A$ is denoted $S^A$.

Definition 6.2. A process trace ($T^{P^A}$) of a process $P^A$ is a sequence of segments $T^{P^A} = (S^{P^A}_1, \ldots, S^{P^A}_m)$ observed during the tracing process. The set of all process traces of an

The concepts of segments and traces are illustrated in Figure 6.3. Figure 6.3a shows a generic process $P_x$ with $n$ inputs and $m$ outputs together with its CFG. A portion of the execution trace is shown in Figure 6.3b, with the execution path marked in black solid lines. This path corresponds to a segment that starts after a write to channel $O_i$, ends with the statement that reads from channel $I_j$ and contains all the statements in between from basic blocks BB1, BB2, BB4, BB6, BB2 and BB5. Finally, Figure 6.3c shows a graphical representation of the process trace, in which read and write events are marked by arrows. The time elapsed in between synchronization points, i.e., the duration of a segment, depends on the processor the process is mapped to. This time ($\Delta t$ in the figure) is obtained via performance estimation, as discussed in the next section.
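The split of an execution trace into segments can be sketched in a few lines of Python. This is only an illustration under assumptions: the encoding of a traced statement as a `(label, event)` pair and the function `split_into_segments` are hypothetical, and the actual instrumentation format is not shown in this section.

```python
from typing import List, Optional, Tuple

# Hypothetical encoding of a traced statement: (label, event), where event is
# None for plain computation, or "r", "w", "f" for a read, a write, or a time
# checkpoint.
Statement = Tuple[str, Optional[str]]

def split_into_segments(
    statements: List[Statement],
) -> Tuple[List[List[Statement]], List[Statement]]:
    """Cut a statement stream right after every synchronization event."""
    segments: List[List[Statement]] = []
    current: List[Statement] = []
    for stmt in statements:
        current.append(stmt)
        _, event = stmt
        if event is not None:
            # This statement plays the role of s_n: it is the only
            # event-generating statement of the segment and closes it.
            segments.append(current)
            current = []
    # Statements after the last event do not form a complete segment.
    return segments, current

stream = [
    ("BB1", None), ("BB2", None), ("BB4", None), ("read I_j", "r"),
    ("BB3", None), ("write O_i", "w"), ("BB7", None),
]
segments, tail = split_into_segments(stream)
```

Each returned segment ends with exactly one event-generating statement, and the list of segments in order corresponds to a process trace in the sense of the definitions above.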

6.2.2 Sequential Performance Estimation Revisited

The parallel flow requires time estimates of the sequential portions between synchronization events, i.e., for segments. This represents a coarser granularity than the one required in the sequential flow (see Section 5.2.3). For this reason, all the techniques introduced in Section 2.2.2 are enabled in the parallel flow. The purpose of this estimation process is to determine the duration of every segment $S^{P^A} \in S^A$, $\zeta^{PT}_{seg}(S^{P^A})$. Provided the estimation for a segment, it is possible to define the estimation for a whole trace $T^{P^A}$,

$$\zeta^{PT}_{trace}(T^{P^A}) = \sum_{S \in T^{P^A}} \zeta^{PT}_{seg}(S).$$

This section describes how the estimation for the segments is performed.

The table-based approach can be directly applied to compute the time of each segment for each processor type. For a generic segment $S = (s_1, \ldots, s_n)$, the cost is computed as

$$\zeta^{PT}_{seg}(S) = \sum_{i=1}^{n} \zeta^{PT}_{tb}(s_i).$$
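The table-based segment cost and the per-trace sum defined earlier in this section can be sketched as follows. The cost table, its cycle values and the processor type names are invented for illustration only; real tables would come from the processor characterization.

```python
from typing import Dict, List, Sequence

# Hypothetical cost tables: cycles per statement kind for each processor type.
# All values are made up for this example.
COST_TABLE: Dict[str, Dict[str, int]] = {
    "RISC": {"add": 1, "mul": 3, "load": 2, "store": 2},
    "VLIW": {"add": 1, "mul": 1, "load": 2, "store": 2},
}

def segment_cost(segment: Sequence[str], pt: str,
                 table: Dict[str, Dict[str, int]] = COST_TABLE) -> int:
    """Sum the table cost of every statement s_i in the segment."""
    return sum(table[pt][s] for s in segment)

def trace_cost(trace: List[Sequence[str]], pt: str,
               table: Dict[str, Dict[str, int]] = COST_TABLE) -> int:
    """Sum the segment costs over all segments of a process trace."""
    return sum(segment_cost(seg, pt, table) for seg in trace)

trace = [["load", "mul", "add"], ["add", "store"]]
risc_total = trace_cost(trace, "RISC")  # (2 + 3 + 1) + (1 + 2) = 9
vliw_total = trace_cost(trace, "VLIW")  # (2 + 1 + 1) + (1 + 2) = 7
```

The same trace yields different totals per processor type, which is why segment durations have to be estimated separately for each type a process may be mapped to.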

Additional modules were needed to enable the remaining three sequential performance estimation methods: Totalprof-based, simulation-based and measurement-based. To enable Totalprof, a new plugin was implemented that instruments the channel access functions and the time checkpoint function. At every event, the plugin exports a time stamp into a text file. For the simulation-based method, a set of scripts and monitoring classes were implemented that directly connect to the processor ISSs that come with Synopsys PA. These classes monitor the execution of each process and export time stamps as well. Finally, a simple postprocessing tool was implemented that analyzes the execution logs from actual HW boards to produce a process trace. Each of these three profiling flows works on the output of the standalone cpn-cc backend, and produces a file like the one in Listing 6.4. The file records all the events that occurred during the profiling run. In the example, the process with identifier 5 is entered (see Line 1), it then writes to channel 0 after 1446 cycles from source code line 193 (see Line 2), writes to channel 3 after 10 cycles from source line 194 (see Line 3), reads from channel 2 after 5685 cycles from source line 365 (see Line 4) and hits a time checkpoint after 5 cycles (see Line 5).

1 e 5 // Enter process with id 5
2 w 0 1446 193 // Write to channel 0 after 1446 cycles from line 193
3 w 3 10 194 // ...
4 r 2 5685 365 // Read from channel 2 after 5685 cycles from line 365
5 f 5 // Reached time checkpoint after 5 cycles
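A file like the one in Listing 6.4 could be read back with a small parser such as the sketch below. The field layout is inferred from the example alone (`e <process>`, `w|r <channel> <cycles> <source line>`, `f <cycles>`, with `//` starting a comment), and the cycle counts are assumed to be relative to the previous event. The `Event` class and `parse_profile` function are hypothetical names, not part of the original tools, and the listing's leading line numbers are assumed not to be part of the file.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Event:
    kind: str                      # "r", "w" or "f"
    process: Optional[int]         # id from the most recent "e" record
    channel: Optional[int] = None  # only set for reads and writes
    cycles: int = 0                # cycles since the previous event (assumed)
    src_line: Optional[int] = None # source line of the channel access

def parse_profile(text: str) -> List[Event]:
    """Parse profiling records in the assumed layout into Event objects."""
    events: List[Event] = []
    pid: Optional[int] = None
    for raw in text.splitlines():
        line = raw.split("//", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        fields = line.split()
        kind, args = fields[0], [int(x) for x in fields[1:]]
        if kind == "e":
            pid = args[0]
        elif kind in ("r", "w"):
            channel, cycles, src_line = args
            events.append(Event(kind, pid, channel, cycles, src_line))
        elif kind == "f":
            events.append(Event(kind, pid, cycles=args[0]))
        else:
            raise ValueError(f"unknown record type: {kind!r}")
    return events

sample = """\
e 5            // enter process with id 5
w 0 1446 193   // write to channel 0
w 3 10 194
r 2 5685 365   // read from channel 2
f 5            // time checkpoint
"""
events = parse_profile(sample)
total_cycles = sum(ev.cycles for ev in events)
```

Under the assumption that the counts are relative, each parsed event carries the duration of the segment that precedes it, so the list of events maps directly onto a timed process trace.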

