Dissertation Structure - Scalability Engineering for Parallel Programs Using Empirical Performa

The structure of this dissertation is as follows. We begin with a discussion of the state of the art in automated empirical modeling of performance in Chapter 2. This chapter describes the modeling workflow, covers the Extra-P tool, which is used extensively in this work, and finishes with an overview of the approach to model functions with more than one parameter.

Following Chapter 2, we present each contribution of this dissertation in a separate chapter. Chapter 3 presents our scalability validation framework, including the case studies that were used to evaluate it. Chapter 4 describes the Task Dependency Graph (TDG) abstraction and presents the techniques for constructing, analyzing, and replaying TDGs. This chapter lays the foundation for Chapter 5, which discusses our second contribution, namely the technique for practical isoefficiency analysis. We continue with Chapter 6, which covers studies related to each of the contributions. Finally, Chapter 7 finishes with conclusions and an outlook on future research.

2 Empirical Performance Modeling

This chapter focuses on the state of the art in automated empirical performance modeling, specifically the results of Calotoiu et al. [43, 46], and provides the background needed to understand the techniques in Chapters 3 and 5. Most of the work was conducted as part of the Catwalk project [47, 48] under the auspices of the DFG Priority Programme 1648 Software for Exascale Computing (SPPEXA).

2.1 Overview

As was briefly discussed in Section 1.3.3, analytical performance modeling expresses different aspects of application performance with analytical expressions. It is a powerful technique in performance engineering, as it gives developers preliminary feedback on the design of their applications. They can, therefore, understand how close the performance is to the optimum or adapt the design to the requirements of larger problem and machine sizes.

Analytical performance modeling was successfully used in a number of previous studies [49, 50, 51] to model the performance of HPC applications. The process of constructing the models, however, is very laborious and requires time and expert knowledge about the code. First, an initial model is suggested following an in-depth analysis of the algorithms, and then experimental data is gathered to find the exact coefficients in the model and to verify its correctness. It is easy to see that this is a trial-and-error approach that requires the person performing the analysis to be a domain expert or to work closely with one. If the first guess of the model is incorrect, a different model has to be suggested and verified against the experimental data. Moreover, the process has to be repeated for each part of the code we want to model.

The technique for empirical performance modeling we discuss in this chapter is mostly automated. It constructs empirical models, as well as requirements models, for each function (or code section) accurately and quickly. This means no domain expert is required and all the code can be covered in the analysis. This technique was first studied by Calotoiu et al. [43] in the context of identifying scalability bugs. A scalability bug is a part of the program whose scaling behavior is unintentionally poor. Figure 2.1 gives an overview of the different steps necessary to find these bugs. This workflow can be used not only to search for scalability bugs, but also to produce performance models that can improve our performance engineering efforts. In other words, these models can predict future performance and provide us with insights into the application behavior, such as the compute time needed to solve a larger problem.
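To make the contrast with manual trial and error concrete, the following is a minimal sketch of an automated hypothesis search. It assumes single-term hypotheses of the form t(p) = c0 + c1 * p^i * log2(p)^j and picks the one with the smallest residual sum of squares. The exponent sets, the function names (`fit_line`, `best_model`, `predict`), the selection criterion, and the synthetic data are illustrative assumptions, not the configuration used by Calotoiu et al. or by Extra-P.

```python
import math

# Hypothetical exponent choices, chosen only for illustration.
EXPONENTS = [0.0, 0.5, 1.0, 1.5, 2.0]
LOG_EXPONENTS = [0, 1, 2]


def fit_line(xs, ys):
    """Ordinary least squares for y = c0 + c1 * x; returns (c0, c1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    c1 = sxy / sxx
    return mean_y - c1 * mean_x, c1


def best_model(procs, times):
    """Fit every hypothesis t(p) = c0 + c1 * p**i * log2(p)**j and return
    (i, j, c0, c1) for the one with the smallest residual sum of squares."""
    best = None
    for i in EXPONENTS:
        for j in LOG_EXPONENTS:
            if i == 0 and j == 0:
                continue  # a pure constant is already covered by c0
            xs = [p ** i * math.log2(p) ** j for p in procs]
            c0, c1 = fit_line(xs, times)
            rss = sum((c0 + c1 * x - t) ** 2 for x, t in zip(xs, times))
            if best is None or rss < best[0]:
                best = (rss, i, j, c0, c1)
    return best[1:]


def predict(model, p):
    """Evaluate a fitted model at processor count p (extrapolation)."""
    i, j, c0, c1 = model
    return c0 + c1 * p ** i * math.log2(p) ** j


# Synthetic, noise-free measurements that follow 2 + 0.5 * log2(p).
procs = [64, 128, 256, 512, 1024]
times = [2.0 + 0.5 * math.log2(p) for p in procs]

model = best_model(procs, times)
print("selected (i, j, c0, c1):", model)
print("predicted time at p = 4096:", predict(model, 4096))
```

On this synthetic input the search recovers the logarithmic term without a human proposing it first. With real, noisy measurements, picking the smallest residual alone tends to favor overly complex hypotheses, so a practical tool needs a more careful selection criterion than this sketch uses.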

The technique for automated empirical modeling has a number of limitations. It is sensitive to noise in the measurements and to behavior that changes unexpectedly, for example, when a function switches its algorithm for some specific input or number of processes. In addition, as discussed below, this technique uses a specific form for the models and cannot accurately model code that behaves in a very unusual way. In such cases, traditional analytical modeling has an advantage, since it allows us to tailor very specific models for such codes.

[Figure 2.1 (workflow diagram). Stages shown: performance measurements, performance profiles, statistical quality control, model generation, scaling models, model refinement, "Accuracy saturated?" (yes/no), performance extrapolation, ranking of kernels, kernel refinement, comparison with user expectations.]

Figure 2.1: Workflow of scalability-bug detection proposed by Calotoiu et al. [43], which can be generalized to empirical performance modeling. Dashed arrows indicate optional paths taken after user decisions.

The workflow in Figure 2.1 begins with a set of performance measurements on different processor counts {p1, . . . , pmax}. The measurements produce performance profiles, similar to the profiles discussed in Section 1.3.1. Computing systems in general, and HPC systems in particular, are prone to jitter (i.e., noise). This means that, to ensure the measurements produce statistically sound results, they have to be repeated a number of times. Even if the OS itself is optimized to be as noiseless as possible, such as the CNK on the Blue Gene/Q machine [52], noise and unexpected interference in the network are still possible. The number of measurement repetitions depends on the variation of the results. Once this is accomplished, Calotoiu et al. apply regression to obtain a coarse performance model for every possible program region, which is a node in a call-path tree. These regions are called kernels, since they define the code granularity at which the models are generated. The granularity of the kernels can be refined further by using more fine-grained instrumentation, such as the manual instrumentation in Extra-P [36]. The initial performance models undergo an iterative refinement process until the model quality reaches a saturation point.
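The role of repeated measurements can be sketched as follows. The example condenses the repetitions for each processor count into a median and flags configurations whose relative spread suggests that more repetitions are needed. The function name `summarize`, the 5% threshold, the use of the coefficient of variation, and the timings are assumptions made for illustration; the source does not specify the statistical quality control applied in the actual workflow.

```python
import statistics


def summarize(runs, max_rel_spread=0.05):
    """Summarize repeated timings per processor count.

    runs maps a processor count to a list of repeated timings in seconds.
    Returns {p: (median, relative_spread, needs_more_runs)}, where
    relative_spread is the sample standard deviation divided by the mean.
    The 5% default threshold is an arbitrary, illustrative choice.
    """
    summary = {}
    for p, samples in sorted(runs.items()):
        median = statistics.median(samples)
        spread = statistics.stdev(samples) / statistics.mean(samples)
        summary[p] = (median, spread, spread > max_rel_spread)
    return summary


# Hypothetical timings; one run at p = 128 is disturbed by interference.
runs = {
    64: [1.00, 1.02, 0.99, 1.01, 1.00],
    128: [1.50, 1.52, 2.40, 1.49, 1.51],
}

summary = summarize(runs)
for p, (median, spread, more) in summary.items():
    print(f"p={p}: median={median:.2f}s spread={spread:.1%} repeat={more}")
```

The median is used here because a single disturbed run barely moves it, whereas the spread still exposes the disturbance and signals that the configuration should be measured again before a model is fitted.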

The next sections discuss the model generation process in more detail and present the Extra-P tool, which embodies the workflow for empirical performance modeling presented in Figure 2.1. In Section 2.5, we explain the multi-parameter modeling approach, which is based on the same modeling workflow but produces models with two or more parameters.
