Extra-P - Scalability Engineering for Parallel Programs Using Empirical Performance Models

The Extra-P tool [60] embodies the workflow for empirical performance modeling presented in Figure 2.1. It is designed as a general-purpose tool for various kinds of applications and use cases. It provides both textual and graphical user interfaces and produces results that can be explored interactively. Extra-P works with either Cube4 [41,61] input files or plain text files. As described earlier, Score-P is one possible performance profiling tool, since its output is in Cube4 format. However, there is also a simple generic text-based format, so that any other measurement tool and workflow can be used as well by converting its output into this format.

POINTS 8 16 32 64 128
EXPERIMENT Time/MPI_Recv
DATA 0.283169 0.285326 0.289267
DATA 0.458113 0.473634 0.449258
DATA 0.608647 0.598367 0.620311
DATA 0.904977 0.881244 0.893256
DATA 1.20038 1.19564 1.21402

Figure 2.3: Example of Extra-P’s plaintext format for performance experiments.

Figure 2.4: The graphical user interface of Extra-P based on PyQt. (Labeled areas: plot of the model, selected kernel(s), call tree exploration.)

Figure 2.3 shows an example of the measurement results, specifically the results of profiling a single call path, in Extra-P’s textual format. The first line, starting with the keyword POINTS, introduces the values of the model parameter (i.e., p in the PMNF), which in this case is the number of processes. It means that the application has been profiled running on 8, 16, 32, 64, and 128 processes, which is the minimum number of values needed. The keyword EXPERIMENT then starts a section for a given performance metric and/or call path, in this case the Time metric for an MPI_Recv call. A single file can contain any number of such sections. The following lines, which start with the keyword DATA, contain the actual measurements for this experiment in the order of the values defined in the first line. In other words, the first DATA line corresponds to 8 processes, the second to 16, and so on. The number of measurements in each DATA line corresponds to the number of repetitions of each experiment. It is three in the example, but might be much higher.
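The format described above is simple enough to read with a few lines of code. The following is a minimal sketch of such a reader, not Extra-P's own parser: the function name parse_extrap_text and the returned data layout are assumptions made for illustration, and the text after EXPERIMENT is treated as an opaque section name.

```python
from statistics import mean


def parse_extrap_text(text):
    """Parse the plain-text format into (points, experiments).

    Hypothetical helper, not part of Extra-P. `experiments` maps each
    EXPERIMENT name to a list of repetition lists, one per POINTS value.
    """
    points = []
    experiments = {}
    current = None
    for raw in text.splitlines():
        tokens = raw.split()
        if not tokens:
            continue  # skip blank lines
        keyword, values = tokens[0], tokens[1:]
        if keyword == "POINTS":
            points = [float(v) for v in values]
        elif keyword == "EXPERIMENT":
            current = " ".join(values)
            experiments[current] = []
        elif keyword == "DATA":
            if current is None:
                raise ValueError("DATA line before any EXPERIMENT line")
            experiments[current].append([float(v) for v in values])
        else:
            raise ValueError(f"unknown keyword: {keyword}")
    # Each section needs exactly one DATA line per parameter value.
    for name, rows in experiments.items():
        if len(rows) != len(points):
            raise ValueError(
                f"{name}: expected {len(points)} DATA lines, got {len(rows)}"
            )
    return points, experiments


# The example from Figure 2.3.
SAMPLE = """\
POINTS 8 16 32 64 128
EXPERIMENT Time/MPI_Recv
DATA 0.283169 0.285326 0.289267
DATA 0.458113 0.473634 0.449258
DATA 0.608647 0.598367 0.620311
DATA 0.904977 0.881244 0.893256
DATA 1.20038 1.19564 1.21402
"""

points, experiments = parse_extrap_text(SAMPLE)
for p, reps in zip(points, experiments["Time/MPI_Recv"]):
    print(f"p={p:g}: {len(reps)} repetitions, mean {mean(reps):.6f}")
```

The i-th DATA line is paired with the i-th value on the POINTS line, which mirrors the ordering rule stated in the text.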

The current version of Extra-P is implemented in C++ and Python. The core logic is written in C++ for performance reasons, and the graphical user interface (GUI) is written in Python, using PyQt [62]. The Python code communicates with the C++ core via a defined interface that is wrapped using SWIG [63]. The implementation allows different model generator classes to be defined. All of them are derived from an abstract base class ModelGenerator. The model generation algorithm discussed in Section 2.3 is implemented as one subclass of the ModelGenerator base class, and we can implement alternative algorithms by defining new subclasses. For example, the automated refinement algorithm mentioned in Section 2.3.1 is implemented as a subclass of ModelGenerator.

Figure 2.5: The dialog in which a set of performance profiles can be provided as an input to Extra-P.
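The subclassing pattern can be illustrated with a small sketch. Note that the actual ModelGenerator class lives in the C++ core and its interface is not shown here; the Python rendering below, the method name generate_model, and both toy subclasses are assumptions chosen only to show how alternative algorithms plug into a common base class. Neither subclass is the algorithm of Section 2.3.

```python
from abc import ABC, abstractmethod
from statistics import mean


class ModelGenerator(ABC):
    """Abstract base class (hypothetical interface, for illustration only)."""

    @abstractmethod
    def generate_model(self, points, measurements):
        """Return a callable model f(p) fitted to the measurements.

        points: parameter values, e.g. process counts.
        measurements: one list of repetitions per parameter value.
        """


class ConstantModelGenerator(ModelGenerator):
    """Toy generator: predicts the mean of the per-point means everywhere."""

    def generate_model(self, points, measurements):
        overall = mean(mean(reps) for reps in measurements)
        return lambda p: overall


class LinearModelGenerator(ModelGenerator):
    """Toy generator: least-squares fit of c0 + c1 * p to the per-point means."""

    def generate_model(self, points, measurements):
        ys = [mean(reps) for reps in measurements]
        x_bar, y_bar = mean(points), mean(ys)
        slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(points, ys)) / sum(
            (x - x_bar) ** 2 for x in points
        )
        intercept = y_bar - slope * x_bar
        return lambda p: intercept + slope * p


# Client code depends only on the base-class interface, so generators
# can be exchanged without touching the caller.
points = [1, 2, 3]
measurements = [[2.0], [4.0], [6.0]]
for generator in (ConstantModelGenerator(), LinearModelGenerator()):
    model = generator.generate_model(points, measurements)
    print(type(generator).__name__, model(4))
```

The design choice this illustrates is that the caller never names a concrete algorithm, which is what makes it possible to select a different model generator at run time.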

Figure 2.4 shows a screenshot of the Extra-P GUI. The left part of the window is divided into two areas. The upper area is a dropdown box that shows the selected metric and allows users to change it. The lower area contains a tree of call paths with models and their error metrics. Clicking on any of the call paths displays the plot of the corresponding model, together with the data points, in the right part of the window. The user can select multiple call paths, and each new call path adds a plot to the figure. The user can also configure Extra-P to use different model generators.

Figure 2.5 shows a screenshot of the dialog that Extra-P uses to collect information from the user about a set of performance profiles. Extra-P assumes that each performance profile is located in a separate subdirectory and that the names of the subdirectories follow a structured format: <Prefix>_<Parameter name><Value>_r<Repetition><Postfix>. Prefix specifies an optional path, relative to where Extra-P was invoked, together with the prefix of the subdirectories. Postfix specifies anything that comes after the repetition number at the end of the subdirectory name. File name is the name of the performance profile inside each subdirectory. By default, it is profile.cubex, since this is the default name for Score-P profiles. Parameter name specifies the name of the input parameter of the model, and Values specifies the values of this parameter, separated by commas. In the example, the name is p, which denotes the number of processes, and the values are 8, 16, 32, 64, and 128. The field Repetitions specifies the number of repetitions for each value of the input parameter. It is assumed that repetitions are enumerated starting from 1, so for 8 processes in this example, Extra-P will attempt to read profiles from the subdirectories ms2_p8_r1, ms2_p8_r2, ..., and ms2_p8_r10. The last field, Scaling type, which specifies either strong or weak scaling, is reserved for future use.
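The naming scheme can be made concrete with a short sketch that enumerates the profile locations implied by the dialog fields. The function profile_paths and its argument names are hypothetical and only mirror the fields described above; this is not how Extra-P itself locates the files.

```python
from pathlib import Path


def profile_paths(prefix, parameter, values, repetitions,
                  postfix="", file_name="profile.cubex"):
    """Yield (value, repetition, path) for every expected profile.

    Hypothetical helper following the pattern
    <Prefix>_<Parameter name><Value>_r<Repetition><Postfix>/<File name>.
    """
    for value in values:
        for rep in range(1, repetitions + 1):  # repetitions are numbered from 1
            subdir = f"{prefix}_{parameter}{value}_r{rep}{postfix}"
            yield value, rep, Path(subdir) / file_name


# The example from the text: prefix ms2, parameter p, ten repetitions.
paths = list(profile_paths("ms2", "p", [8, 16, 32, 64, 128], repetitions=10))
for value, rep, path in paths[:3]:
    print(value, rep, path.as_posix())
```

With five parameter values and ten repetitions, the sketch yields 50 paths, from ms2_p8_r1 up to ms2_p128_r10.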
