Restricting Output - Allinea Forge User Guide. Version 6.0.1

To keep file sizes within reasonable limits, .map files contain a summary of the program output limited to the first and last 500 lines (by default). To change this number, profile with the environment variable ALLINEA_KEEP_OUTPUT_LINES set to the preferred total line limit (for example, ALLINEA_KEEP_OUTPUT_LINES=20 restricts recorded output to the first 10 lines and the last 10 lines). Setting this to 0 removes the line limit, although this is not recommended because it may result in very large .map files if the profiled program produces a lot of output.

The length of each line is similarly restricted to 2048 characters. This can be changed with the environment variable ALLINEA_KEEP_OUTPUT_LINE_LENGTH. As before, setting this to 0 removes the restriction, although this is not recommended because it risks a large .map file if the profiled program emits binary data or very long lines.
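The effect of the two limits can be illustrated with a short Python sketch. This is not MAP's implementation. The function name restrict_output and its parameter names are hypothetical, the default total of 1000 lines is inferred from "first and last 500 lines", and the even split between head and tail is assumed from the 20 -> 10 + 10 example above.

```python
def restrict_output(lines, keep_lines=1000, keep_line_length=2048):
    """Keep the first and last parts of `lines` and truncate long lines.

    A limit of 0 disables the corresponding restriction, mirroring the
    behaviour described for the two environment variables.
    """
    if keep_line_length > 0:
        lines = [line[:keep_line_length] for line in lines]
    if keep_lines > 0 and len(lines) > keep_lines:
        head = keep_lines // 2      # assumption: total is split evenly
        tail = keep_lines - head    # any odd remainder goes to the tail
        lines = lines[:head] + lines[len(lines) - tail:]
    return lines


# Hypothetical program output of 100 lines, restricted to a total of 20.
output = [f"line {i}" for i in range(100)]
kept = restrict_output(output, keep_lines=20)
print(kept[0], "...", kept[9], "|", kept[10], "...", kept[-1])
```

With a limit of 20, the sketch keeps lines 0 to 9 and lines 90 to 99 of the hypothetical output and discards everything in between.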

19.4 Saving Output

By right-clicking on the text it is possible to save it to a file. You also have the option to copy a selection to the clipboard.

20 Source Code

MAP provides code viewing, editing and rebuilding features. It also integrates with most major version control systems and provides static analysis to automatically detect many classes of common errors.

The code editing and rebuilding capabilities are not designed for developing applications from scratch, but they are designed to fit into existing profiling sessions that are running on a current executable. The same capabilities are available for source code whether running remotely (using the remote client) or whether connected directly to your system.

20.1 Viewing

Source and header files found in the executable are reconciled with the files present on the front-end server, and displayed in a simple tree view within the Project Files tab of the Project Navigator window. Source files can be loaded for viewing by clicking on the file name.

The source code viewer supports automatic colour syntax highlighting for C and Fortran.

You can hide functions or subroutines you are not interested in by clicking the ‘−’ glyph next to the first line of the function. This will collapse the function. Simply click the ‘+’ glyph to expand the function again.

Figure 102: Source Code View

The centre pane shows your source code, annotated with performance information. All the charts you will see in Allinea MAP share a common horizontal time axis. The start of your job is at the left and the end at the right. The sparkline charts next to each line of source code show how the number of cores executing that line of code varies over time.

What does it mean to say a core is executing a particular line of code? In the source code view, MAP uses inclusive time, that is, time spent on this line of code or inside functions called by this line. So the main() function of a single-threaded C or MPI program is typically at 100% for the entire run. Only ‘interesting’ lines get charts: lines in which at least 0.1% of the selected time range was spent. In the figure above we can see that three different lines meet this criterion. The other lines were executed as well, but a negligible amount of time was spent on them.
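Inclusive time can be sketched as follows, assuming the profile is a list of call-stack samples taken at regular intervals. This is an illustration only, not MAP's internal data model. The function names and the sample data are hypothetical, apart from the 18.1% share for imbalance and the 0.1% threshold, which come from the text.

```python
from collections import Counter


def inclusive_share(samples):
    """Map each frame to the fraction of samples whose stack contains it."""
    counts = Counter()
    for stack in samples:
        # Count a frame once per sample, even if recursion repeats it.
        counts.update(set(stack))
    total = len(samples)
    return {frame: n / total for frame, n in counts.items()}


def interesting_lines(samples, threshold=0.001):
    """Frames that reach the 0.1% threshold and would therefore get a chart."""
    shares = inclusive_share(samples)
    return {frame: s for frame, s in shares.items() if s >= threshold}


# Hypothetical samples: each tuple is a call stack, outermost frame first.
samples = (
    [("main", "imbalance")] * 1810
    + [("main", "stride")] * 4000
    + [("main", "overlap")] * 4185
    + [("main", "setup")] * 5
)

shares = inclusive_share(samples)
charted = interesting_lines(samples)
print(shares["main"], shares["imbalance"], sorted(charted))
```

Because main is on every stack, its inclusive share is 100%, while the hypothetical setup frame accounts for only 0.05% of the samples and falls below the threshold.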

The first line is a function call to imbalance, which was running for 18.1% of the wall-clock time. If you look closely, you’ll see that as well as a large block of green there is a sawtooth pattern in blue. Colour is used to identify different kinds of time. In this single-threaded MPI code we have three colours:

• Dark green: Single-threaded computation time. For an MPI program, this is all computation time. For an OpenMP or multi-threaded program, this is the time the main thread was active and no worker threads were active.

• Blue: MPI communication and waiting time. All time spent inside MPI calls is blue, regardless of whether that is in MPI_Send or MPI_Barrier. Typically you want to minimize this, because the purpose of most codes is parallel computation, not communication for its own sake.

• Orange: I/O time. All time spent inside known I/O functions, such as reading from and writing to the local or networked filesystem, is shown in orange. You definitely want to minimize time spent in I/O. On many systems the complex data storage hierarchy can cause unexpected bottlenecks when scaling a code up. MAP always shows the time from the application’s point of view, so all the underlying complexity is captured and represented as simply as possible.

• Dark purple: Accelerator time. All the time the CPU spends waiting for the accelerator to return control to the CPU. Typically you want to minimize this by making the CPU work in parallel with the accelerator, using asynchronous accelerator calls.

In the above screenshot we can see the following:

• First a function called imbalance is called. This function spends most of its time in computation (dark green) and around 15–20% of it in MPI calls (blue). Hovering the mouse over any graph shows an exact breakdown of the time spent in it. There is a sawtooth pattern to the time spent in MPI calls that we will investigate a little further below.

• Next the application moves on to a function called stride, which spends all of its time computing. Great! Later we will see how to tell whether this time is well spent or not. We can also see an MPI synchronization at the end. The triangle shape is typical of ranks finishing their work at different times and spending varying amounts of time waiting at a barrier. Wherever you see triangles in these charts you see imbalance.

• Finally, a function called overlap is called, which spends almost all of its time in MPI calls.

• The other functions in this snippet of source code were active for <0.1% of the total runtime and can be ignored from a profiling point of view.

As this was an MPI program, the height of each block of colour represents the percentage of MPI processes that were running each particular line at any moment in time. So the sawtooth pattern of MPI usage actually tells us that:

• The imbalance function goes through several iterations.

• In each iteration all processes start out computing; there is more green than blue.

• As execution continues more and more processes finish computing and transition to waiting in an MPI call, causing the distinctive triangular pattern showing workload imbalance.

• As each triangle ends all ranks finish communicating and the pattern begins again with the next iteration.

This is a classic sign of MPI imbalance. In fact, any triangular patterns in MAP’s graphs show that first a few processes are changing to a different state of execution, then more, then more until they all synchronize and move on to another state together. These areas should be investigated!
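The way uneven work produces these triangles can be reproduced with a small simulation. This is a toy model, not output from MAP: the function names, the four ranks and their workloads are all hypothetical, and waiting at a barrier is simulated rather than performed with real MPI calls.

```python
def compute_fraction_over_time(compute_steps, iterations=1):
    """For each time step, return the fraction of ranks still computing.

    compute_steps[r] is the number of steps rank r computes per iteration.
    A rank that has finished waits (as if in a barrier) for the slowest rank.
    """
    iteration_length = max(compute_steps)
    ranks = len(compute_steps)
    timeline = []
    for _ in range(iterations):
        for t in range(iteration_length):
            computing = sum(1 for steps in compute_steps if t < steps)
            timeline.append(computing / ranks)
    return timeline


def render(timeline, height=4):
    """Draw the timeline as text: '#' is computing, '.' is waiting in MPI."""
    rows = []
    for level in range(height, 0, -1):
        rows.append(
            "".join("#" if frac * height >= level else "." for frac in timeline)
        )
    return "\n".join(rows)


# Hypothetical workload: four ranks with 1, 2, 3 and 4 steps per iteration.
timeline = compute_fraction_over_time([1, 2, 3, 4], iterations=2)
chart = render(timeline)
print(chart)
```

Each iteration starts with every rank computing, after which the computing fraction drops step by step as ranks finish and wait. The printed chart shows one triangle per iteration, which is the sawtooth shape described above. Giving every rank the same number of steps removes the triangles entirely.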

You can explore this in more detail by opening the examples/slow.map file and looking at the imbalance function yourself. Can you see why some processes take longer to finish computing than others?
