3.2 Profiler
3.2.4 Simulation output
Profile
In the end of a simulation run VEX returns method-level execution time predictions, based on the virtual time execution. Results are presented either on a per-method stack trace or per- method basis, depending on whether the user requested a path or a flat profile. VEX results include the following measurements for each method or stack trace:
• Number of invocations.
• Mean predicted CPU-time excluding called methods (ExCpu).
• Mean predicted total execution time excluding called methods (ExTotal). • Mean predicted CPU-time including called methods (InclCpu).
• Mean predicted total execution time including called methods (InclTotal). • Percentage of predicted inclusive total time (InclTotal) on total execution time.
3.2. Profiler 83
Figure 3.6: Per stack-trace profile visualisation
CPU-times disregard those periods where a thread is waiting or runnable. Results are printed in a Comma-Separated-Value (CSV) file that is easily processed by spreadsheet management applications, enabling further processing or graphical representation.
Method-level profiles can be exported in an hprof -like style, returning only execution time percentages including and excluding called methods. This allows for direct comparison with standard profilers.
Results on a stack-trace basis are also presented graphically (Figure 3.6), with a simple html- based interface, that expands submethods in a tree-like way. Users can define a prune percent threshold, to prune the branches of the tree that have a total estimated time less than this threshold.
Virtual times
Apart from a profiler, VEX is a virtual execution runtime environment. This enables a dif- ferent usage of the framework that is based on the upper instrumentation layer replacing calls that return time on the application-level, with VEX calls returning virtual timestamps. For example, all invocations to the gettimeofday call are replaced to return the global virtual time at that point. This means that other runtime tools can be executed in virtual time, allowing performance metrics, like response times or throughput to be replaced by their virtual counter-
parts. This option is on by default: if it is disabled then real times can be returned as normal, to measure for instance the overhead of the framework. Most of our validation tests keep the default value on and measure overhead with timers external to the VEX simulation.
Virtual execution visualisation
The VEX visualiser is a component that offers a graphical representation of the virtual execu- tion. It gathers event samples throughout the simulation and uses them after it terminates to show the execution progress. Originally used as a debugging and validation tool, the visualiser can be used for performance analysis or educational purposes.
The event log is accessed by a Java applet, which allows users to follow the progress of the virtual execution step by step, by clicking to show the next event. Users can jump forward to any time point of the virtual execution, or zoom in to particular time intervals.
Contrary to method time logging, the visualisation component stores events in a centralised way, which increases the overhead of the framework. The analysis capabilities of the visualiser are, in practice, limited to a few tens of threads. This allows the virtual timeline to fit in the same window as the simulated threads. Events are gathered throughout the virtual execution and are only stored and used after it finishes, which increases the memory requirements of the simulation as well. For these reasons, the visualiser is useful for short, validation or experimental runs.
In Figure 3.7 we show a virtual execution with four threads. In this program, thread main creates three threads, that execute exactly the same code, which performs a fixed number of loop calculations. In the virtual execution, the method executed by thread:
• SlowThread is assigned a TSF of 0.2, resulting in a 5 times slowdown
• FastThread is assigned a TSF of 5.0, resulting in a 5 times speedup
3.2. Profiler 85
Threads terminate execution at singificantly different times despite running the same code VEX-induced context
switching to adhere to the virtual time schedule
Figure 3.7: Visualiser snapshot: all threads apart from main execute the same code with different virtual specifications and progress accordingly on a uniprocessor machine. The dashes signify timeslices that a thread is in the “Running” state.
The visualiser shows how VEX simulates the execution of the three threads on a uniprocessor correctly, assigning timeslices in a round-robin way. The blue line on the bottom of Figure 3.7 is the virtual timeline.
Statistics
VEX’s results include statistics for the virtual execution. By default the framework generates a transition graph for the VEX states, a feature which summarises the simulation characteris- tics. Especially for larger-scale experiments, where the visualiser is unsuitable, these kinds of statistics provide a useful insight into the virtual execution.
Figure 3.8 illustrates such a graph for the execution of 32 threads spawned by main (in total 33 registering and terminating threads). “Suspended” threads became “Running” 292 times and either remained in that state at the end of their timeslice (in 34 cases) or became suspended again in 257 cases. Thread main enters the “Waiting” state, when trying to join with a child- thread that is still running. The state transition graphs help us to identify how specific state
RUNNING 3 4 WAITING 3 SUSPENDED 2 5 7 TERMINATING 3 3 1 2 2 9 2 2 5 8 REGISTERING 3 3
Figure 3.8: A sample automatically generated state transition graph
changes affect the simulation results and execution time. They have been used in the evaluation of VEX.
VEX also provides analytic time-related statistics, showing the virtual time progress. This feature is user-enabled and may gather results on a global scale, on a per-thread basis or separately for each new time sample created. Using time statistics we are able to compare real and virtual executions. To facilitate this comparison, VEX offers a “pure profiling” version, in which the scheduling part is entirely disabled. We call this no-scheduling VEX. Method entries and exits are still trapped and statistics generated, but the framework does not act upon the various thread state changes, nor does it assign timeslices or control the scheduling of threads.
The framework may optionally export scheduler-related statistics, namely timeslices assigned, interrupts sent and virtual time progress as a result of this scheduler’s activities (used in multicore simulations).
Another set of statistics is offered with the help of system and PAPI counters. These range from instruction counts and L2 misses to thread signals, context switches, memory requirements etc. We compare these measurements between the real-time and virtual executions, to understand quantitatively how the VEX simulation affects the performance of the application.