Interpreting measurements - Discussion of objectives for a new benchmark framework

3.2 Discussion of objectives for a new benchmark framework

3.2.5 Interpreting measurements

By using a plugin infrastructure, the benchmarking process is largely abstracted from the file system operation itself. As performance is usually given as work completed per time unit, one can run a given number of operations, measure the time needed, and then claim that, on average, one measured 3,000 operations per second of operation A or 2,000 operations per second of operation B. But is this enough data to draw conclusions about the performance of the file system? For complex, distributed measurements with multiple dimensions of parameters it is not enough, because it leads to the loss of much useful information.

Result compression

For application-level benchmarks, an effective approach to describing the performance of a file system for a particular workload is to summarize the results, sometimes even into a single number. Postmark, for example, runs a suite of different metadata-intensive operations and returns a number indicating the transactions per second [Kat97]. It is simple to compare different file systems when considering the same workload, but because the relationship between this single result and the operations performed is opaque, this method is less suitable as a tool to systematically identify issues and improve a file system. Too much information is averaged and/or lost. In contrast, microbenchmarks such as fileops break down measurements into detailed performance numbers directly connected to specific operations, but again, information about time-related effects and their influence on performance is lost.

Neglect of the time parameter

One parameter that has been neglected in metadata benchmarking is time or, more precisely, the runtime of the benchmark. Why is the runtime important for a file system benchmark?

As previously discussed, a benchmark for distributed file systems must utilize multiple processes working in parallel, and different processes perform their tasks independently. In a sequential benchmark, there is one process and one point in time when it completes. With multiple processes that may be running on different nodes, it is quite common that different processes do not work at the same speed. This may be due to systematic reasons (e.g., unfair processing of requests by the file system), differences in hardware or software configuration, or mistakes in the benchmark setup (e.g., additional processes running on the nodes and using up CPU time). Small deviations in performance can lead to results that are difficult to explain if only summary numbers are available. These deviations are quite similar to the "noise" effects observed in large parallel computations in which a single lagging process delays the entire calculation [TEFK95].

Unfortunately, the common techniques used to calculate summary results, including those described below, lead to the complete loss of information about many events.

Global throughput approach

The simplest way to calculate a "result number" is to divide the total number of operations by the wallclock time required to complete them. While this is a perfectly valid approach for a single-threaded, single-process benchmark such as Postmark, it is problematic in a parallel environment. Consider example (b) in Figure 3.2, where the three processes P1, P2 and P3 each execute 1,500 operations, and process P3, which finishes after 15 s, is slower than P1 and P2. The average number of operations per second would be (3*1,500)/15 s = 300 op/s. Unfortunately, this single number does not give any indication of a problem and is indistinguishable from example (a), where all processes work at the same speed.

Figure 3.2: Average throughput measurement. Panels (a) and (b) each show processes P1, P2 and P3 on a time axis from T(0) to T(end).
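The arithmetic behind the global throughput number can be sketched in Python. This is a minimal illustration, not part of any existing benchmark: the function name is hypothetical, and the finish times for example (a) and for P1 and P2 in example (b) are assumed values, since the text only states that P3 finishes after 15 s.

```python
def global_throughput(ops_per_process, finish_times):
    """Total operations divided by the wallclock time of the slowest process."""
    total_ops = sum(ops_per_process)
    wallclock = max(finish_times)  # the run ends when the last process finishes
    return total_ops / wallclock

ops = [1500, 1500, 1500]

# Example (a): all processes finish together (assumed: 15 s each).
case_a = global_throughput(ops, [15.0, 15.0, 15.0])

# Example (b): P3 finishes after 15 s; 10 s for P1 and P2 is an assumed value.
case_b = global_throughput(ops, [10.0, 10.0, 15.0])

print(case_a, case_b)
```

Both calls return 300 op/s, so the summary number alone cannot distinguish the balanced run from the run with one lagging process.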

The "stonewalling" approach

Stonewalling is a technique used internally to IOzone. It is used during the throughput tests. The code starts all threads or processes and then stops them on a barrier. Once they are all ready to start then they are all released at the same time. The moment that any of the threads or processes finish their work then the entire test is terminated and throughput is calculated on the total I/O that was completed up to this point. This ensures that the entire measurement was taken while all of the processes or threads were running in parallel.

Stonewalling is very useful if the objective is to estimate the total performance of the file system without concern for unfairness among particular processes. IOzone also displays the minimum and maximum amount of data that was processed (for metadata, the equivalent is the number of operations completed) and, optionally, the time needed by every process; however, other information about differences in speed is lost.

Figure 3.3: The stonewalling approach. Processes P1, P2 and P3 are shown on a time axis marked T(0), T(stonewall) and T(end).
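The stonewalling calculation can be sketched as follows. This is a simplified model and not IOzone's code: it assumes that every process works at a constant speed, and the function name and the rates are hypothetical values chosen for illustration.

```python
def stonewall_throughput(ops_per_process, rates):
    """Throughput counted only up to the moment the first process finishes.

    ops_per_process: number of operations assigned to every process
    rates: assumed constant speed of each process in operations per second
    """
    # The fastest process finishes first and defines T(stonewall).
    t_stonewall = min(ops_per_process / rate for rate in rates)
    # Work completed by every process up to T(stonewall).
    done = [min(ops_per_process, rate * t_stonewall) for rate in rates]
    return sum(done) / t_stonewall, min(done), max(done)

# Two processes at 150 op/s and one lagging process at 100 op/s (assumed values).
throughput, least, most = stonewall_throughput(1500, [150.0, 150.0, 100.0])
print(throughput, least, most)
```

With these assumed rates the first process finishes after 10 s, at which point 4,000 operations are complete, giving 400 op/s. The minimum and maximum counts (1,000 and 1,500) hint at the imbalance, but they do not show when or how the slow process fell behind.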

Proposal for a time-interval logging approach

A time-interval logging approach is proposed to retain information about the behavior of every process. In this approach, every single process records the number of operations already completed within fixed time intervals until the entire task is completed (see Fig. 3.4).

This method preserves both the performance of every single process and the function of time vs. the number of operations completed. Based on this data, the two averages described above, as well as total performance numbers for all or a subset of processes, can be computed. In this manner, a single benchmark run can provide much more information than the compressed and averaged results of existing benchmarks.

The example in Fig. 3.4 shows three processes that each perform 30 operations. A wallclock-time average of 18 operations per time unit is the result of 90 operations performed within five time units (90/5 = 18). The stonewalling approach would yield a result of 23.3 operations per time unit (70 operations / 3 time units = 23.3). The axis labeled "Total" in the figure indicates exactly how performance changed from one time interval to the next. For illustrative purposes, few intervals are used in the figure. In reality, the time interval would be kept much smaller than the total runtime, but still long enough that a large number of operations is completed within each time segment.

Figure 3.4: Time-interval logging (time interval enlarged for illustrative purposes). Values are given per fixed time interval, starting at T(0).

Operations completed (logged during measurement)
P1: 0, 5, 13, 18, 25, 30
P2: 0, 8, 18, 30
P3: 0, 6, 14, 22, 30

Operations completed in this time interval
P1: 5, 8, 5, 7, 5
P2: 8, 10, 12
P3: 6, 8, 8, 8

Summary data (calculated)
Total, operations completed: 0, 19, 45, 70, 85, 90
Total, operations completed in this time interval: 19, 26, 25, 15, 5
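The numbers of this example can be reproduced from the logged values with a short Python sketch. The cumulative counts are those shown in Fig. 3.4; the function names and the data layout are assumptions made for illustration and do not describe the framework's actual implementation.

```python
# Cumulative operation counts logged at the end of each fixed time interval
# (values from Fig. 3.4). A process stops logging once its 30 operations are done.
logs = {
    "P1": [0, 5, 13, 18, 25, 30],
    "P2": [0, 8, 18, 30],
    "P3": [0, 6, 14, 22, 30],
}

def total_series(logs):
    """Sum the cumulative counts of all processes on the common time grid."""
    length = max(len(series) for series in logs.values())
    # A finished process keeps its final count for the remaining intervals.
    padded = [s + [s[-1]] * (length - len(s)) for s in logs.values()]
    return [sum(column) for column in zip(*padded)]

def wallclock_average(logs):
    """All operations divided by the number of intervals until the last process ends."""
    total = total_series(logs)
    return total[-1] / (len(total) - 1)

def stonewall_average(logs):
    """Operations completed by all processes when the first process ends."""
    t_stonewall = min(len(series) for series in logs.values()) - 1
    return total_series(logs)[t_stonewall] / t_stonewall

total = total_series(logs)
per_interval = [b - a for a, b in zip(total, total[1:])]
print(total, per_interval)
print(wallclock_average(logs), round(stonewall_average(logs), 1))
```

The "Total" series, the wallclock average of 18, and the stonewalling result of 23.3 all follow from the same per-process logs, which is the point of keeping the raw interval data.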

Obtaining the progress data requires running additional execution threads in parallel with the benchmark itself. An alternative approach, tested in [Str04] and simpler to implement, is to record the time elapsed whenever a process has completed a fixed quantum of operations (e.g., every 100 operations). However, in the context of file systems, a fixed time interval provides a significant advantage: side effects that emerge based upon the time frame can be observed in detail, regardless of operation performance. It is also easier to compare different measurements if they share a common time grid.
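One possible way to collect such progress data is a helper thread that samples an operation counter on a fixed time grid. The following Python sketch is an assumption about how this could look, not the framework's implementation: the class and function names are hypothetical, and a short `time.sleep` stands in for a real file system operation.

```python
import threading
import time

class IntervalLogger:
    """Samples a shared operation counter at fixed time intervals."""

    def __init__(self, interval):
        self.interval = interval
        self.completed = 0   # written only by the benchmark loop
        self.samples = []    # cumulative counts, one entry per interval
        self._stop_event = threading.Event()
        self._thread = threading.Thread(target=self._run)

    def _run(self):
        next_tick = time.monotonic() + self.interval
        # Event.wait returns True once stop() has been called.
        while not self._stop_event.wait(max(0.0, next_tick - time.monotonic())):
            self.samples.append(self.completed)
            next_tick += self.interval  # fixed grid, avoids accumulating drift

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop_event.set()
        self._thread.join()
        self.samples.append(self.completed)  # final count at the end of the run

def run_benchmark(operation, count, interval):
    """Run `operation` `count` times while logging progress every `interval` seconds."""
    logger = IntervalLogger(interval)
    logger.start()
    for _ in range(count):
        operation()
        logger.completed += 1
    logger.stop()
    return logger.samples

# Placeholder workload: a 1 ms sleep instead of a file system call.
samples = run_benchmark(lambda: time.sleep(0.001), 50, 0.01)
print(samples)
```

The sampling thread only reads the counter and the benchmark loop only writes it, so no lock is used in this sketch; an implementation with several worker threads per process would need proper synchronization.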

The method presented above makes use of time intervals that all start at the same time and requires a suitable synchronization mechanism (e.g., an MPI barrier). Alternatively, every process could keep track of time using a global time reference, for example an NTP-synchronized clock.

Time-based logging and scaling

When fine-grained data on the number of operations completed per time unit is available, strong scaling can be simulated. If the performance for a fixed number of operations N is needed, the first time interval within which the number of operations completed is larger than N can be identified and the average performance then computed from T(0) to that point in time. In a similar way, the "stonewall" average can be calculated by identifying the time interval within which the first process has completed its operations.
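The lookup for a fixed number of operations N can be sketched in Python. The function name is hypothetical, and two details are interpretations rather than statements from the text: reaching exactly N is treated as sufficient, and the average is taken as all operations completed by the end of that interval divided by the elapsed time. The input is the cumulative "Total" series of Fig. 3.4.

```python
def average_for_fixed_work(total, n_ops, interval=1.0):
    """Average throughput from T(0) to the end of the first interval
    in which the cumulative operation count reaches n_ops.

    total: cumulative operation counts, starting with 0 at T(0)
    interval: length of one time interval (assumed: one time unit)
    """
    for index, completed in enumerate(total):
        if index > 0 and completed >= n_ops:
            return completed / (index * interval)
    raise ValueError("fewer than n_ops operations were logged")

# Cumulative "Total" series from Fig. 3.4.
total_ops = [0, 19, 45, 70, 85, 90]

avg_40 = average_for_fixed_work(total_ops, 40)  # reached in interval 2
avg_60 = average_for_fixed_work(total_ops, 60)  # reached in interval 3
print(avg_40, round(avg_60, 1))
```

The resolution of this estimate is limited by the interval length, which is one reason to keep the time interval small relative to the total runtime.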

In document Analyzing Metadata Performance in Distributed File Systems (Page 58-62)