Chapter 4 Decision Tree Based Automatic Performance Analysis
4.4 Display Performance Analysis Results
The results of performance analysis are mainly fed into the control system to guide performance steering. In addition, they can be written directly to plain text files to help developers identify performance problems in the program.
Projections
Projections [32] is a stand-alone tool for visualizing and analyzing the performance of Charm++ applications. During program execution, performance events are traced and their details are recorded in plain text files, which Projections then reads. Projections comprises a set of tools:
1. Time Profile tool: it shows the profile of various functions, idle time, and overhead time for all processors over time. It clearly shows how well the program runs.
2. Usage Profile tool: it displays each processor’s CPU usage over a time period. It is useful for seeing how the load is distributed across processors.
3. Communication Over Time tool: it displays the communication over time.
4. Extrema Analysis tool: it sorts the processors by idle time (ascending or descending) or by utilization (descending). This is useful for finding processors of interest.
5. Timeline tool: it displays every detail of the program execution. Figure 4.8 is an example for a set of processors that users choose. For each processor, it shows the execution of each function and the idle periods in different colors. The begin time, end time, and associated messages of entry methods are also displayed. Further features include tracing a message back to its source and tracing a message forward to its destination. Timeline is the ultimate tool for visualizing and analyzing program behavior and performance.
(a) Time Profile (b) Usage Profile (c) Extrema Profile (d) Communication Profile
Figure 4.7: Tools in Projections
All these tools are powerful and useful for analyzing performance. However, Projections has a few drawbacks. As the number of processors that the program runs on increases, the data files generated for Projections can become huge, reaching hundreds of gigabytes or more. This consumes a lot of storage. It also takes time to download all the data from supercomputers to local machines for processing. Even worse, Projections becomes very slow at visualizing or analyzing such data because the files load slowly and may not fit into memory. The worst problem is having to analyze the complicated data manually. For example, with hundreds of thousands of processors, finding the least idle processor or the longest entry method takes a long time, which can be tens of minutes for a 10K-processor data set. It becomes almost impossible to manually identify the root performance bottleneck in such massive data. This motivates us to use PICS to help with performance analysis and visualization.
Figure 4.8: Timeline tool of NAMD
PICS and Projections
To overcome these drawbacks of Projections, we apply PICS. The following approaches are used.
Output processors of interest: to reduce the data volume, we output data only for the processors of interest, for example those with the least idle time, the highest utilization, or the most overhead. This significantly reduces the amount of data.
Generate a PICS summary file: a PICS summary file is generated to assist Projections. For example, instead of searching all processors for the least idle time, Projections reads this information from the PICS summary file.
Output performance bottlenecks: the results of performance analysis are written to plain text files to help users understand the performance.
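The first approach, selecting processors of interest, can be illustrated with a minimal C++ sketch. It is not the PICS implementation: the `ProcStats` record, its field names, and the functions `topK` and `processorsOfInterest` are hypothetical and chosen only for illustration. The sketch keeps the ids of the k processors with the least idle time, the highest utilization, and the most overhead, so that trace data needs to be written only for that small set.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical per-processor summary; the field names are illustrative
// and are not taken from the PICS source.
struct ProcStats {
    int    pe;           // processor id
    double idleTime;     // idle time in the analyzed interval
    double utilization;  // fraction of time spent on useful work
    double overhead;     // runtime overhead in the analyzed interval
};

// Return the ids of the k processors with the largest key value.
// The vector is taken by value so the caller's data is left untouched.
template <typename Key>
std::vector<int> topK(std::vector<ProcStats> stats, std::size_t k, Key key) {
    assert(k > 0 && "asking for zero processors is a caller error");
    k = std::min(k, stats.size());
    std::partial_sort(
        stats.begin(), stats.begin() + static_cast<std::ptrdiff_t>(k), stats.end(),
        [&](const ProcStats& a, const ProcStats& b) { return key(a) > key(b); });
    std::vector<int> ids;
    for (std::size_t i = 0; i < k; ++i) ids.push_back(stats[i].pe);
    return ids;
}

// Union of the processors that are extreme under each criterion.
std::set<int> processorsOfInterest(const std::vector<ProcStats>& stats, std::size_t k) {
    std::set<int> out;
    auto add = [&out](const std::vector<int>& ids) { out.insert(ids.begin(), ids.end()); };
    add(topK(stats, k, [](const ProcStats& s) { return -s.idleTime; }));    // least idle
    add(topK(stats, k, [](const ProcStats& s) { return s.utilization; }));  // most utilized
    add(topK(stats, k, [](const ProcStats& s) { return s.overhead; }));     // most overhead
    return out;
}
```

Because the same processor is often extreme under several criteria, the result is a set, and the number of processors written out is at most 3k regardless of the total processor count.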
The output of PICS is organized in blocks of data, as shown in Listing 4.2. The data consists of multiple DataEntry blocks. Each entry is identified first by its step id and begin timer, followed by the performance summary data for that step, the detected performance problems, and the solutions.
Features in Projections
Projections reads the PICS output file and reconstructs the data structure for use in analysis. Several new features are added to the Timeline tool in Projections to display the PICS output. For each feature, a binary search on the timer locates the appropriate data entry, and the related information is then displayed as that feature requires.
Listing 4.2: Struct of PICS output
class Node {
    bool isSolution;
    //--- functions
}
class Condition {
}
class Solution {
}
class DataEntry {
    int step;
    int entries;
    long timer;
    double summary[][];
    vector<Node> CondSol;
}
Vector<DataEntry> dataset;
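The timer-based binary search mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the Projections code: the `DataEntry` here is a reduced stand-in that keeps only the `step` and `timer` fields, the function name `findEntry` is hypothetical, the entries are assumed to be sorted by ascending `timer`, and the "proper entry" for a time t is assumed to be the last entry whose begin timer is not later than t.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Reduced stand-in for DataEntry: only the fields the lookup needs.
struct DataEntry {
    int  step;   // step id
    long timer;  // begin time of the step
};

// Return the index of the last entry with timer <= t,
// or -1 if t lies before the first entry.
// Assumption: dataset is sorted by ascending timer.
std::ptrdiff_t findEntry(const std::vector<DataEntry>& dataset, long t) {
    assert(std::is_sorted(dataset.begin(), dataset.end(),
                          [](const DataEntry& a, const DataEntry& b) {
                              return a.timer < b.timer;
                          }));
    // upper_bound finds the first entry that begins strictly after t;
    // the entry just before it is the one whose step covers t.
    auto it = std::upper_bound(dataset.begin(), dataset.end(), t,
                               [](long value, const DataEntry& e) {
                                   return value < e.timer;
                               });
    if (it == dataset.begin()) return -1;
    return (it - dataset.begin()) - 1;
}
```

Since the entries are appended step by step, they are naturally ordered by begin time under this assumption, so each lookup costs O(log n) comparisons rather than a linear scan.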
We have added the following features to Timeline.
Least idle processors: the least idle processor is automatically loaded into Timeline.
Highlight the entry methods of longest duration: when a performance problem is associated with entry method duration, it is important to highlight the entry method, the object it belongs to, and the processor it runs on. This information can all be extracted from the PICS output. Once it is obtained, the associated entry methods are highlighted so that users can see them clearly.
Display the performance problems: all the output from PICS analysis can be displayed to users as plain text.
Plot the performance summary data: all the performance summary data can be plotted in line, bar, or area format, as shown in Figure 4.9.