2.6.5 Comparison of the sorting methods
The sorting methods are to be tested with three sets of data of different characteristics. The first set consists of evenly distributed data, which are generated by the function
$x_i = \sin(i + 0.23), \quad i = 1, \ldots, n$
The results are shown in Table 2.1, where count is the total number of data in all the lists sorted by quick sort, or the total number of data in all the bins sorted by bin sort. Hence, count is a direct measure of complexity with respect to the number of data points n. Time is the CPU time on an Intel® Core™ i7 CPU [email protected] running Visual Fortran in XP mode. The count and the CPU time taken by the four sorting methods for n = 10^2, 10^3, …, 10^7 are plotted in Figure 2.38. As shown in Table 2.1, bubble sort and insertion sort are clearly O(n^2) algorithms, as the CPU time increases about 100 times each time n increases tenfold. A direct comparison shows that insertion sort takes roughly half the CPU time of bubble sort. As the O(n^2) complexity trend is clear, the tests
Table 2.1 Counts and CPU time for sorting of data set 1

             Quick sort               Quick sort*              Bin sort                Bubble    Insert
n            Count         Time(s)    Count         Time(s)    Count        Time(s)    Time(s)   Time(s)
100          681           9.77E-6    681           1.04E-5    148          1.31E-5    4.04E-5   1.9E-5
1000         11,799        1.42E-4    11,799        1.45E-4    2096         1.90E-4    3.6E-3    1.74E-3
10,000       165,812       0.00186    165,812       0.00197    21,767       0.00203    0.378     0.173
100,000      2,110,329     0.0196     2,110,329     0.0241     141,907      0.0133     37.6      17.5
1,000,000    25,852,531    0.284      25,852,531    0.292      1,622,397    0.21
10,000,000   316,774,234   3.34       316,774,234   3.48       14,974,655   3.29
Figure 2.38 Data set 1: x(i) = sin(i + 0.23). [Plot of count/time against number of data on logarithmic axes; series: Quick_C, Quick*_C, Bin_C, Quick_T, Quick*_T, Bin_T, Bubble, Insert.]
for the bubble sort and insertion sort stop at n = 10^5. On the other hand, the quick sort and bin sort are far more efficient than the bubble sort and insertion sort, especially for large data sets.
The n log(n) complexity trend is quite clear for the quick sort, as the count increases by slightly more than tenfold each time n increases tenfold. Apart from minor fluctuations between successive values of n, the bin sort follows a more or less linear trend. However, the overall performance of the quick sort and bin sort is quite similar for n up to 10^7.
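The growth rates read off Table 2.1 can be checked numerically. The following is a minimal Python sketch, not part of the original Fortran tests: it copies the counts and CPU times from Table 2.1 and fits a straight line to log10(value) against log10(n). The helper names `loglog_slope` and `decade_ratios` are hypothetical and introduced only for this illustration.

```python
import math

# Values copied from Table 2.1; keys are the number of data n.
quick_count = {10**2: 681, 10**3: 11_799, 10**4: 165_812,
               10**5: 2_110_329, 10**6: 25_852_531, 10**7: 316_774_234}
bin_count = {10**2: 148, 10**3: 2_096, 10**4: 21_767,
             10**5: 141_907, 10**6: 1_622_397, 10**7: 14_974_655}
bubble_time = {10**2: 4.04e-5, 10**3: 3.6e-3, 10**4: 0.378, 10**5: 37.6}
insert_time = {10**2: 1.9e-5, 10**3: 1.74e-3, 10**4: 0.173, 10**5: 17.5}


def loglog_slope(series):
    """Least-squares slope of log10(value) against log10(n).

    A slope near 2 suggests O(n^2) growth, a slope near 1 linear growth,
    and a slope slightly above 1 is what n log(n) growth looks like
    over a few decades.
    """
    xs = [math.log10(n) for n in series]
    ys = [math.log10(v) for v in series.values()]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    den = sum((x - x_mean) ** 2 for x in xs)
    return num / den


def decade_ratios(series):
    """Ratio of each value to the previous one as n grows tenfold."""
    ns = sorted(series)
    return [series[b] / series[a] for a, b in zip(ns, ns[1:])]


if __name__ == "__main__":
    for name, series in [("quick count", quick_count),
                         ("bin count", bin_count),
                         ("bubble time", bubble_time),
                         ("insert time", insert_time)]:
        print(name, round(loglog_slope(series), 2),
              [round(r, 1) for r in decade_ratios(series)])
```

Fitted slopes close to 2 for the bubble and insertion sort times, slightly above 1 for the quick sort count and close to 1 for the bin sort count would be consistent with the trends described above.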
The second set of data consists of packets of equal items, which are generated by the function
$x_i = \sin(\operatorname{mod}(i, 17) + 0.23), \quad i = 1, \ldots, n$
Only the quick sort and bin sort are tested, and the results are shown in Table 2.2. The count and the CPU time taken for this data set are plotted in Figure 2.39. As shown in Table 2.2, the count for the quick sort increases about 100 times each time n increases tenfold from 10^4 to 10^6. This O(n^2) performance is the worst case for the quick sort, and it is caused by the presence of packets of equal items. As for the bin sort, the count is exactly linear, and the CPU time also follows a nearly linear trend, showing that the bin sort is very efficient with equal-valued data. The fact that the bin sort has no difficulty with equal values is attributed to the program statement ‘if (d < tolerance = 10^−99) return’: for an array of equal values, xmin = xmax, so d = 0, and as a result nothing is done to this array.
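The effect of the early return can be illustrated with a minimal Python sketch of a recursive bin sort. This is not the author's Fortran procedure, which is not reproduced in this section: the function name `bin_sort`, the choice of n bins for n items and the floor-based bin index are assumptions made for illustration. The count follows the definition given with Table 2.1, the total number of data in all the bins sorted.

```python
import math


def bin_sort(a, tol=1e-99):
    """Recursive bin sort sketch; returns (sorted_list, count).

    count is the total number of data in all the bins sorted,
    including the top-level list itself.
    """
    n = len(a)
    count = n
    if n < 2:
        return list(a), count
    xmin, xmax = min(a), max(a)
    d = xmax - xmin
    if d < tol:
        # All values are (practically) equal: nothing to do.
        return list(a), count
    # Assumption: use as many bins as there are items.
    bins = [[] for _ in range(n)]
    for x in a:
        k = min(int((x - xmin) / d * n), n - 1)  # clamp xmax into the last bin
        bins[k].append(x)
    out = []
    for b in bins:
        if len(b) > 1:
            sorted_b, c = bin_sort(b, tol)  # only crowded bins are sorted again
            out.extend(sorted_b)
            count += c
        else:
            out.extend(b)
    return out, count


if __name__ == "__main__":
    n = 1000
    data = [math.sin(i % 17 + 0.23) for i in range(1, n + 1)]  # data set 2
    result, count = bin_sort(data)
    print(result == sorted(data), count)
```

For data set 2 with n = 1000, each of the 17 distinct values falls into its own bin, and every such bin returns immediately because d = 0. The sketch therefore reports a count of 2n = 2000, the same figure as in the bin sort column of Table 2.2.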
Table 2.2 Counts and CPU time for sorting of data set 2

             Quick sort                     Quick sort*              Bin sort
n            Count               Time(s)    Count         Time(s)    Count        Time(s)
100          847                 9.53E-6    711           1.03E-5    200          1.03E-5
1000         34,459              1.87E-4    10,259        1.34E-4    2000         9.90E-5
10,000       2,990,844           0.0108     138,376       0.00151    20,000       9.80E-4
100,000      294,591,139         0.987      1,879,235     0.0194     200,000      0.0096
1,000,000    29,417,088,191      96.7       21,759,630    0.221      2,000,000    0.1
10,000,000   2,941,235,588,192   9695       252,306,156   2.39       20,000,000   1.03
Figure 2.39 Data set 2: x(i) = sin(mod(i,17) + 0.23). [Plot of count/time against number of data on logarithmic axes; series: Quick_C, Quick*_C, Bin_C, Quick_T, Quick*_T, Bin_T.]
The weakness of quick sort on equal-valued data can be easily rectified. The main difficulty is that, for a list of equal-valued data, the pivot is always placed at the first position instead of at the middle. What has to be done is to replace the statement ‘If (T < P) then’ in Procedure Pivot (A, I, J, IP) by ‘If (T < P or T = P and IP < K) then’. In the case of equal-valued data, T = P, so IP no longer stops at the first position but proceeds to the Kth position, right at the middle of the list. This modification has been implemented and tested, and the enhanced algorithm is denoted by quick sort*. As shown in Table 2.2, quick sort* takes slightly more CPU time than quick sort to sort a small array, say n = 100. From n = 10^3 onwards, quick sort* takes much less CPU time than quick sort, and as indicated by the count, the n log(n) complexity trend is recovered for large n. For evenly distributed data, as shown in Table 2.1, quick sort* takes only a few percent more CPU time than quick sort.
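The modified test can be sketched in Python as follows. Procedure Pivot itself is not reproduced in this section, so the partition below is an assumed scheme that takes the first item of the list as the pivot P and advances the pivot position IP for every item moved to the left; only the conditions ‘T < P’ and ‘T < P or T = P and IP < K’ come from the text. The names `partition`, `quick_sort` and the flag `balanced` are hypothetical. An explicit stack is used instead of recursion so that the degenerate case does not hit Python's recursion limit.

```python
import math


def partition(a, i, j, balanced):
    """Partition a[i..j] around the pivot P = a[i]; return its final index.

    balanced=False uses the test 'T < P' (quick sort in the text);
    balanced=True uses 'T < P or (T = P and IP < K)' (quick sort*),
    where K is the middle position of the list.
    """
    p = a[i]
    ip = i
    k = (i + j) // 2
    for l in range(i + 1, j + 1):
        t = a[l]
        if t < p or (balanced and t == p and ip < k):
            ip += 1
            a[ip], a[l] = a[l], a[ip]
    a[i], a[ip] = a[ip], a[i]  # put the pivot at its final position
    return ip


def quick_sort(a, balanced=False):
    """Sort a in place; return the total size of all lists partitioned."""
    count = 0
    stack = [(0, len(a) - 1)]
    while stack:
        i, j = stack.pop()
        if i >= j:
            continue  # lists of fewer than two items need no sorting
        count += j - i + 1
        ip = partition(a, i, j, balanced)
        stack.append((i, ip - 1))
        stack.append((ip + 1, j))
    return count


if __name__ == "__main__":
    n = 2000
    data = [math.sin(i % 17 + 0.23) for i in range(1, n + 1)]  # data set 2
    print("quick sort count :", quick_sort(list(data), balanced=False))
    print("quick sort* count:", quick_sort(list(data), balanced=True))
```

With equal-valued data the unmodified test never advances IP, so each call peels off a single item and the count grows quadratically. With the modified test, IP advances to the middle position K and the list is split into two halves.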
The third set of data consists of widely spread values, ranging up to about ±10^43, which also contain equal items. They are generated by the function
$x_i = \exp(\operatorname{mod}(i, 100) + 0.1)\,\sin(\operatorname{mod}(i, 17) + 0.23), \quad i = 1, \ldots, n$
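For readers who wish to reproduce the three test sets, the generating functions can be written as the following Python sketch. The original tests were run in Fortran; the function names `data_set_1`, `data_set_2` and `data_set_3` are hypothetical, and Python's `%` operator is assumed to match Fortran's mod for the positive indices used here.

```python
import math


def data_set_1(n):
    """Evenly distributed data: x_i = sin(i + 0.23)."""
    return [math.sin(i + 0.23) for i in range(1, n + 1)]


def data_set_2(n):
    """Packets of equal items: x_i = sin(mod(i, 17) + 0.23)."""
    return [math.sin(i % 17 + 0.23) for i in range(1, n + 1)]


def data_set_3(n):
    """Widely spread values with repeats:
    x_i = exp(mod(i, 100) + 0.1) * sin(mod(i, 17) + 0.23)."""
    return [math.exp(i % 100 + 0.1) * math.sin(i % 17 + 0.23)
            for i in range(1, n + 1)]


if __name__ == "__main__":
    n = 5000
    print("distinct values in set 2:", len(set(data_set_2(n))))
    print("distinct values in set 3:", len(set(data_set_3(n))))
    print("largest magnitude in set 3:", max(abs(x) for x in data_set_3(n)))
```

Data set 2 contains only 17 distinct values. Data set 3 repeats with a period of 1700, the least common multiple of 100 and 17, so equal items appear once n exceeds 1700.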
Again, only the quick sort, quick sort* and bin sort are tested, and the results are shown in Table 2.3. The count and the CPU times taken are plotted in Figure 2.40. As shown in Table 2.3, the bin sort takes slightly more CPU time as compared to the sorting of the set of
Figure 2.40 Data set 3: x(i) = exp(mod(i,100) + 0.1)sin(mod(i,17) + 0.23). [Plot of count/time against number of data on logarithmic axes; series: Quick_C, Quick*_C, Bin_C, Quick_T, Quick*_T, Bin_T.]
Table 2.3 Counts and CPU time for sorting of data set 3

             Quick sort                  Quick sort*              Bin sort
n            Count            Time(s)    Count         Time(s)    Count        Time(s)
100          763              1.02E-5    763           1.05E-5    998          9.20E-5
1000         12,413           1.43E-4    12,413        1.47E-4    8766         8.00E-4
10,000       183,760          1.93E-3    172,779       2.10E-3    76,384       6.12E-3
100,000      4,454,447        0.0283     1,891,357     0.0216     630,294      0.049
1,000,000    308,302,121      1.14       23,348,533    0.247      5,474,122    0.416
10,000,000   29,563,195,747   95.0       272,169,661   2.75       48,964,710   3.78
evenly distributed data, showing that the performance of the bin sort is fairly consistent over data values in the practical range. However, the performance of quick sort* is the best, indicating that quick sort* is not sensitive to the values of the data set. The performance of the quick sort is rather disappointing for this data set, suggesting that the original version of the quick sort is quite sensitive to even a minor fraction of equal-valued data. From the three sets of data tested, the performance of quick sort* is the most promising in terms of speed and memory requirement, yet a definite conclusion cannot be drawn unless more data sets are tested and other programming techniques are tried out. However, one thing is clear: both the bin sort and quick sort* are efficient, and either of them can be employed with confidence to sort data points up to n = 10^8 or more. Other sorting methods not mentioned in this section are also available; interested readers can take a look at the merge sort by Nardelli and Proietti (2006) and the sample sort and PE sort by Chen (2006).