Modelling the Diagnosis - Supporting visual diagnosis of performance problems in multi core and

In the second phase of our research, which is explained in more detail in Chapters 5 and 6, we wanted to answer the question of whether we could perform a comprehensive and systematic analysis of the information involved in understanding and improving the performance of parallel programs. In other words, we needed to create a practical model that could be used to build tools to support developers. To do this, and to continue with the goal of creating an effective visualisation for parallel problem identification, several parts had to be brought together and analysed as a whole:

1. Performance data extraction and collection. In order to build parallel performance analysis tools, we first need to be able to collect data, whether by instrumenting the program or by gathering operating-system or hardware counter data. Most importantly, we need to know what type of data it is possible to collect, visualise and analyse later. We settled on the Windows operating system, since it features a set of well-supported and well-documented tools. We compiled comprehensive lists of the different measurable events and counters we could collect; this gave us insight into the information that tools could display to allow successful identification of performance problems.

Once this was carried out, we implemented a data collection tool that allowed us to experiment with, extract and combine both CPU hardware performance counters² and performance events and counters issued by the Windows operating system. Chapter 6 contains more discussion on this aspect of our research.

CHAPTER 3. RESEARCH OVERVIEW

2. Taxonomy of parallel performance problems on multi-core architectures. Unfortunately, no comprehensive taxonomy of the parallel performance problems that can occur on shared-memory multi-core architectures existed; hence, we had to create one ourselves. Several classes of problems had been identified tangentially in the interview study, and we began iteratively constructing a taxonomy based on scientific literature and white papers from companies such as Microsoft and Intel. The very first taxonomy contained only eight of the most common problems, or at least that was our untested initial assumption. The initial problems were under-subscription, over-subscription, uneven load distribution, lock contention and lock convoys, along with I/O contention, indirect memory access and false/true sharing.

After the first model, we iterated multiple times with two domain experts and created a more complete taxonomy containing seven broad categories, such as load balancing and task granularity, with a total of twenty-three individual problems. We then performed a broad survey with 71 participants to better understand which problems most commonly occur and are most commonly diagnosed, and which are more exotic and do not really occur in practice. We discuss this in more detail in Chapter 5.

3. Expert knowledge on the parallel problem diagnosis.

By the time we started creating the taxonomy, we had already begun pulling in a significant amount of expert knowledge. Together with the domain expert, we attempted to create a set of simple diagnosis models for the performance problems we had identified. Some of the diagnosis processes turned out to be rather straightforward, while other problems turned out to be very difficult to diagnose, and the expert could not determine how he would perform the diagnosis. For example, for one of the poor data locality performance problems we were able to construct a simple decision tree based on observations of the measurable events or counters we could collect.
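A decision tree of this kind can be sketched in a few lines of Python. The sketch below is an illustration only: the counter names echo the observations in Figure 3.3, while the `CounterSample` type, the thresholds and the shape of the tree are assumptions made for this example and are not the tree constructed with the expert.

```python
from dataclasses import dataclass


@dataclass
class CounterSample:
    """Hypothetical counter rates for one profiling run (events per 1000 instructions)."""
    tlb_misses: float
    dram_page_changes: float


# Illustrative thresholds only; they are assumptions, not values from the thesis.
HIGH_TLB_MISSES = 5.0
HIGH_DRAM_PAGE_CHANGES = 2.0


def diagnose_poor_data_locality(sample: CounterSample) -> str:
    """Walk a two-level decision tree and return 'likely', 'possible' or 'unlikely'."""
    high_tlb = sample.tlb_misses >= HIGH_TLB_MISSES
    high_dram = sample.dram_page_changes >= HIGH_DRAM_PAGE_CHANGES

    if high_tlb:
        # Both observations point the same way: the strongest verdict.
        return "likely" if high_dram else "possible"
    # Only one observation (or none) supports the diagnosis.
    return "possible" if high_dram else "unlikely"
```

Calling `diagnose_poor_data_locality(CounterSample(tlb_misses=10.0, dram_page_changes=3.0))` follows the branch in which both observations are high. A real tree would replace the fixed thresholds with values calibrated against the target hardware.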

² [...]cessors to store the counts of hardware-related activities within computer systems. Advanced users often rely on those counters to conduct low-level performance analysis or tuning.


[Figure 3.3: the observations shown are “high #DRAM page changes”, “high #TLB misses” and “low #DRAM page changes”; the agreement splits shown are 33%/67%, 80%/20% and 60%/40%.]

Figure 3.3 – An example of the levels of agreement between experts on various “measurable observations” of two performance problems.

Once we had completed the three parts, we knew what data we could collect and had implemented a performance data collection mechanism. With a relatively large and refined problem taxonomy and initial expert diagnosis models for most of the performance problems in hand, we began to work on a practical model of performance problem identification. It was paramount that the model be relatively easy to apply, as it is intended for practitioners and tool builders and not only for research purposes.

While Chapter 5 goes into more detail on the model itself and its validation, an example of the components of the model can be seen in Figure 3.3, which depicts the levels of agreement between experts on various observations, each of which can be either:

• A (strong) indication of a particular performance problem being present in the target program.

• A (strong) contra-indication of a particular performance problem being present in the target program.

In other words, this model helps to determine, through inter-expert validation, which measurable events or counters can be used for effective parallel performance problem identification. It can be extended simply by having experts assess further observations and using the resulting agreement-level statistics to extend or refine the model.
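As a rough illustration of how such agreement levels might be computed, the following Python sketch tallies expert votes for a single observation. The vote labels, the function names and the 80% cut-off for a "strong" verdict are assumptions introduced for this example; the thesis does not specify them here.

```python
from collections import Counter

INDICATION = "indication"
CONTRA_INDICATION = "contra-indication"
VERDICTS = (INDICATION, CONTRA_INDICATION)


def agreement_levels(votes):
    """Return the fraction of experts voting for each verdict."""
    if not votes:
        raise ValueError("at least one expert vote is required")
    unknown = set(votes) - set(VERDICTS)
    if unknown:
        raise ValueError(f"unknown vote labels: {sorted(unknown)}")
    counts = Counter(votes)
    return {verdict: counts[verdict] / len(votes) for verdict in VERDICTS}


def classify(votes, strong_threshold=0.8):
    """Return (majority verdict, is_strong), or (None, False) when experts are split evenly.

    strong_threshold is a hypothetical cut-off for calling the verdict 'strong'.
    """
    levels = agreement_levels(votes)
    if levels[INDICATION] == levels[CONTRA_INDICATION]:
        return None, False
    verdict = max(levels, key=levels.get)
    return verdict, levels[verdict] >= strong_threshold
```

For example, passing four `"indication"` votes and one `"contra-indication"` vote to `agreement_levels` gives an 80%/20% split. Adding a new observation to the model then amounts to collecting one more list of votes and running it through `classify`.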
