Performance evaluation - Parallel metaheuristics


1.3. Parallel metaheuristics

1.3.3. Performance evaluation

Most metaheuristics, whether parallel or serial, are evaluated empirically in an ad hoc manner, owing to the difficulty of developing a theoretical analysis [4]. An experimental analysis usually consists of applying the proposed algorithms to a collection of problem instances and comparatively reporting the observed solution quality and execution time.


In deterministic optimization methods, efficiency in terms of search time is the main factor used to evaluate the performance of the algorithms, since they guarantee the global optimality of solutions. However, to evaluate metaheuristic methods, other measures have to be considered. Because of the stochastic nature of metaheuristics, a number of independent experiments need to be conducted to gather sufficient experimental data. Thus, the performance measures for these methods are based on statistics computed over those runs.

This subsection describes the guidelines followed in this Thesis to rigorously evaluate and compare the proposed parallel metaheuristics.

Vertical versus horizontal views

There exist two different approaches for collecting data from different runs [87]:

Vertical view: a vertical approach assesses the performance for a predefined effort. The effort may be predefined as a target execution time or as a fixed number of evaluations. Fixing a predefined effort can be pictured as drawing a vertical line on the convergence graphs (see Figure 1.12).

Horizontal view: a horizontal approach assesses the performance by measuring the time needed to reach a given target value. Fixing a target function value can be seen as drawing a horizontal line on the convergence graphs (see Figure 1.12).

For benchmarking algorithms the horizontal view is preferred to the vertical one, since it gives quantitative and interpretable data: the horizontal view measures the time needed to reach a given target function value and allows deriving conclusions such as "Algorithm A is X times faster than Algorithm B in solving this problem". In the vertical view, there is no interpretable meaning to the fact that Algorithm A reaches a fitness value that is X times smaller than the one reached by Algorithm B, since there is no a priori evidence of how much more difficult it is to reach a fitness value that is X times smaller (as illustrated in Figure 1.12).
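As a minimal sketch of the two views, the following Python snippet extracts both measures from a recorded convergence history. It assumes a minimization problem and a history stored as sorted time stamps with the best-so-far objective value at each of them; the function names and the sample histories are hypothetical and only serve as illustration.

```python
from bisect import bisect_right


def vertical_view(times, best_values, budget):
    """Best-so-far value reached within a fixed effort (vertical line)."""
    idx = bisect_right(times, budget) - 1  # last sample taken at or before the budget
    return best_values[idx] if idx >= 0 else None


def horizontal_view(times, best_values, target):
    """First recorded time at which the target value is reached (horizontal line)."""
    for t, f in zip(times, best_values):
        if f <= target:  # minimization is assumed
            return t
    return None  # target never reached in this run


# Hypothetical convergence histories: time in seconds, best-so-far objective value.
times_a, values_a = [1, 2, 4, 8, 16], [100.0, 50.0, 20.0, 10.0, 5.0]
times_b, values_b = [1, 2, 4, 8, 16, 32], [100.0, 80.0, 60.0, 30.0, 15.0, 10.0]

t_a = horizontal_view(times_a, values_a, target=10.0)
t_b = horizontal_view(times_b, values_b, target=10.0)
print(f"Algorithm A is {t_b / t_a:.1f} times faster than Algorithm B")

print(vertical_view(times_a, values_a, budget=4),
      vertical_view(times_b, values_b, budget=4))
```

The horizontal measure yields a ratio of times that can be read directly, whereas the two values returned by the vertical measure cannot be compared on an interpretable scale without further knowledge of the problem.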

The main goal of this Thesis is to improve the horizontal approach; however, the vertical approach also benefits from the proposed solutions, which is interesting for many real-world applications where the total number of evaluations is limited.

Figure 1.12: Vertical and horizontal views illustrated in a convergence graph. Adapted from [87].

Speedup

There are different metrics to measure the performance of parallel algorithms. Among them, the speedup is the most popular one. This metric calculates the ratio between sequential and parallel execution times. Thus, the definition of execution time must be addressed. On a single core, a common performance metric is the CPU time to solve the problem, that is, the time that the processor spends executing algorithm instructions, excluding system overhead activities. However, in the parallel case, the execution time can be taken neither as the sum of the CPU times on each core nor as the largest among them. Since the goal of parallelism is the reduction of the real time, our choice for measuring the performance of the parallel code is the wall-clock time to solve the problem, that is, the time between the start and the end of the entire algorithm.
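The difference between CPU time and wall-clock time can be illustrated with a short Python sketch based on the standard `time` module. The helper `timed` and the toy workload are hypothetical; the sleep call merely stands in for time that passes without the processor executing algorithm instructions (waiting, communication, and so on).

```python
import time


def timed(fn, *args):
    """Run fn(*args) and return (result, wall-clock seconds, CPU seconds)."""
    wall_start = time.perf_counter()   # monotonic wall-clock timer
    cpu_start = time.process_time()    # CPU time of the current process only
    result = fn(*args)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return result, wall, cpu


def toy_workload(n):
    """Hypothetical workload: some computation followed by idle waiting."""
    total = sum(i * i for i in range(n))
    time.sleep(0.05)  # elapsed real time that is not counted as CPU time
    return total


result, wall, cpu = timed(toy_workload, 100_000)
print(f"wall-clock: {wall:.3f} s, CPU: {cpu:.3f} s")
```

Note that `time.process_time` only accounts for the calling process, so in a multi-process parallel code it would miss the work done by the other processes, which is one more reason to rely on the wall-clock time.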

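The speedup compares the sequential time against the parallel time to solve a problem. If $T_n$ is the execution time for a parallel algorithm using $n$ cores, the speedup $S_n$ is the ratio between the execution time on one core, $T_1$, and the execution time on $n$ cores:

$$S_n = \frac{T_1}{T_n}$$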

Unfortunately, for stochastic algorithms this metric cannot be used directly: the speedup should instead be calculated using the mean execution times. Moreover, researchers do not agree on the meaning of $T_1$ and $T_n$, and there exist different definitions of speedup depending on the meaning of these values [4]:

Strong speedup: compares the parallel run time against the best-so-far sequential algorithm. This is the most accurate definition of speedup, but due to the difficulty of finding the current most efficient algorithm, it is not a practical one.

Weak speedup: compares the parallel algorithm developed by a researcher against his/her own sequential version. This is the definition of speedup used in this Thesis.

Additionally, two different sequential algorithms can be considered to calculate the speedup. We can compare the execution time of the parallel algorithm against the canonical sequential version of the algorithm (speedup versus panmixia), or we can compare the execution time of the parallel algorithm against the same parallel algorithm running on one core (orthodox speedup). In this Thesis, we always use the former, that is, a speedup versus panmixia. However, it should be noted that in this case we are comparing two clearly different algorithms, and thus superlinear speedups may arise when the parallel algorithm modifies the systemic properties of the original method and outperforms the sequential algorithm.
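The speedup computed from mean execution times can be sketched in a few lines of Python. The run times below are hypothetical wall-clock measurements, chosen only to illustrate a case in which the speedup on 4 cores exceeds 4 (superlinear); they are not results from this Thesis.

```python
from statistics import mean


def speedup(sequential_times, parallel_times):
    """Ratio of mean sequential to mean parallel wall-clock time."""
    return mean(sequential_times) / mean(parallel_times)


# Hypothetical wall-clock times (s) of independent runs to reach the same target.
t_seq = [120.0, 100.0, 110.0]   # canonical sequential algorithm
t_par = [20.0, 24.0, 22.0]      # parallel algorithm on n = 4 cores

n_cores = 4
s = speedup(t_seq, t_par)
print(f"speedup on {n_cores} cores: {s:.2f}"
      + (" (superlinear)" if s > n_cores else ""))
```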

Graphical data representation

Visualization tools are needed to complement the numerical results presented in tables, since a graphical representation of the data allows a better understanding of the obtained results. Boxplots illustrate the distribution of the results through their five-number summaries: the smallest value, lower quartile (Q1), median (Q2), upper quartile (Q3), and largest value. They are useful for detecting outliers and for indicating the dispersion and the skewness of the output data without any assumptions on the statistical distribution of the data. Violin plots, in turn, show the probability density of the data at different values. While boxplots only show summary statistics, violin plots show the full data distribution. The difference is particularly useful when the data distribution is multimodal (more than one peak): violin plots clearly show the presence of peaks, their position and their relative amplitude. Violin plots are a good alternative to a series of histograms. In this Thesis, a combination of boxplots and violin plots (see Figure 1.13) is used in order to combine the strengths of both representations.
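The five-number summary underlying a boxplot can be computed with the Python standard library, as in the following sketch. The sample values are hypothetical, and the "inclusive" interpolation method is an assumption: several quartile conventions exist and plotting tools may use a different one.

```python
from statistics import quantiles


def five_number_summary(data):
    """Return (smallest value, Q1, median Q2, Q3, largest value)."""
    q1, q2, q3 = quantiles(data, n=4, method="inclusive")
    return min(data), q1, q2, q3, max(data)


# Hypothetical final objective values of nine independent runs.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9]
smallest, q1, q2, q3, largest = five_number_summary(sample)
print(smallest, q1, q2, q3, largest)
print("interquartile range:", q3 - q1)
```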

Convergence curves are frequently used to clearly illustrate the performance of a proposed metaheuristic against other approaches. Convergence curves (see Figure 1.14) represent the logarithm of the objective function value against the execution time. Though one can argue about the convenience of representing the convergence curves for the best results, or even artificial convergence curves obtained by plotting the best solution found by any of the n parallel processes at every time instant [219], in this Thesis we prefer to show the convergence curves for those experiments that fall in the median value of the results distribution, since those are real convergence curves (that is, they correspond to one of the experimental tests) and we think they are more realistic than the ones that depict the best result. The region between the lower and upper bounds of the m runs performed for each experiment can also be shown to better illustrate the dispersion of the results (see Figure 1.14(b)).
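Selecting the median run and the bounds of the region to be shaded can be sketched as follows. The sketch assumes that all runs were sampled at the same time instants and that the median run is identified by its final objective value; both are illustrative assumptions, and the function names and data are hypothetical.

```python
from statistics import median_low


def median_run_index(runs):
    """Index of the real run whose final value is the (low) median of all final values."""
    finals = [run[-1] for run in runs]
    return finals.index(median_low(finals))


def envelope(runs):
    """Pointwise lower and upper bounds over all runs at each time instant."""
    lower = [min(values) for values in zip(*runs)]
    upper = [max(values) for values in zip(*runs)]
    return lower, upper


# Hypothetical best-so-far values of m = 3 runs, sampled at common time instants.
runs = [
    [9.0, 5.0, 2.0, 1.0],
    [9.0, 7.0, 6.0, 4.0],
    [9.0, 6.0, 3.0, 2.0],
]

idx = median_run_index(runs)
lower, upper = envelope(runs)
print("median run:", idx, runs[idx])
print("lower bound:", lower)
print("upper bound:", upper)
```

Using `median_low` rather than the arithmetic median guarantees that the selected curve belongs to an actual experiment even when the number of runs is even.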

Statistical analysis

Most of the results reported in this Thesis aim to show that a newly proposed metaheuristic outperforms previous attempts. In this case, the use of descriptive statistics, such as the sample mean and the standard deviation, is not sufficient: the comparison between two average values may differ from the comparison between two distributions. Thus, statistical methods should be used wherever possible. Statistical tests are performed to estimate the confidence with which the results can be considered scientifically valid.


Figure 1.13: Example of hybrid violin/boxplot.

[Figure: f(x) versus wall-time (s) for method1 and method2. (a) Convergence curves. (b) Convergence curves including lower and upper bounds.]

Figure 1.15: Statistical methods. Adapted from [4].

Several statistical methods and the conditions to apply them are shown in Figure 1.15. The selection of a given statistical test is driven by the characteristics of the data [4]. The first step is to decide between non-parametric and parametric tests. In theory, when the data set is non-normally distributed and the number of experiments is below 30, non-parametric methods should be used. That is the case in the experiments performed in this Thesis.

Among non-parametric methods, in this Thesis the Wilcoxon signed-rank test has been used when comparing two metaheuristics, and the Kruskal-Wallis test has been used when comparing more than two algorithms. The Wilcoxon signed-rank test assumes that there is information in the magnitudes and signs of the differences between paired observations. This test essentially calculates the difference between


test can be used to test the null hypothesis that two populations have the same continuous distribution. When more than two samples are compared, the Kruskal-Wallis test is used, which is also based on ranked data.
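To make the idea of a rank-based test concrete, the following Python sketch computes the Kruskal-Wallis H statistic from pooled ranks using only the standard library. It is a simplified illustration, not the procedure used in this Thesis: no tie correction is applied and no p-value is derived, the helper names and sample data are hypothetical, and in practice a statistics library (for instance `scipy.stats.kruskal`, or `scipy.stats.wilcoxon` for two paired samples) would normally be used instead.

```python
def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the block of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(*samples):
    """H = 12 / (N (N + 1)) * sum(R_i^2 / n_i) - 3 (N + 1), without tie correction."""
    pooled = [x for sample in samples for x in sample]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    weighted, start = 0.0, 0
    for sample in samples:
        rank_sum = sum(ranks[start:start + len(sample)])
        weighted += rank_sum ** 2 / len(sample)
        start += len(sample)
    return 12.0 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)


# Hypothetical results of three algorithms, three runs each.
h = kruskal_wallis_h([1, 2, 3], [4, 5, 6], [7, 8, 9])
print(f"H = {h:.2f}")
```

A large H indicates that the rank sums of the groups differ more than would be expected if all samples came from the same distribution; identical samples give H = 0.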


In document Optimization in computational systems biology via high performance computing techniques (Page 72-81)