New and Improved Exact Floating-Point Algorithms
4.2.3. Comparison to Straightforward Conversion
Using the algorithms from above, we can convert a general expansion into a bigfloat number by first transforming it into an equivalent maximally non-overlapping monotone expansion and then joining the mantissae of the new summands into a bigfloat number. This option is illustrated in Figure 4.5. Alternatively, we may split the expansion into two monotone expansions, convert them separately, and then perform a single exact addition of two bigfloat numbers. This option is illustrated in Figure 4.9. While overflow may occur in the first conversion strategy for very large expansions, the second option is free from overflow in principle, since no floating-point operations are involved. It may hence serve as a backup strategy.
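To make the idea of a conversion without floating-point operations concrete, the following is a minimal sketch in Python. It is not the Monotonize or splitting algorithm from above; it only illustrates that the mantissa and exponent of each double can be extracted and accumulated using integer arithmetic alone, so no rounding or overflow can occur. The function name expansion_to_bigfloat and the (mantissa, exponent) pair used as a stand-in for a bigfloat are assumptions made for this illustration.

```python
from fractions import Fraction


def expansion_to_bigfloat(expansion):
    """Return integers (mantissa, exponent) such that
    sum(expansion) == mantissa * 2**exponent holds exactly.

    Hypothetical helper for illustration; the pair stands in for a bigfloat.
    """
    parts = []
    for x in expansion:
        # as_integer_ratio() is exact; the denominator is a power of two.
        num, den = x.as_integer_ratio()
        parts.append((num, -(den.bit_length() - 1)))
    if not parts:
        return 0, 0
    e_min = min(e for _, e in parts)
    # Align all summands to the smallest exponent and add as integers.
    mantissa = sum(num << (e - e_min) for num, e in parts)
    return mantissa, e_min


# Three summands whose sum is far beyond the 53 bit precision of a double.
example = [2.0**-60, 1.0, 2.0**70]
mantissa, exponent = expansion_to_bigfloat(example)
exact = sum((Fraction(x) for x in example), Fraction(0))
```

Here mantissa * 2**exponent reproduces the exact value of the expansion, whereas a plain floating-point sum of the same three numbers would round away the two smaller summands.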
For both mpfr and leda we tried several direct conversion approaches, using functionality provided by the library. Among the things we tried are summing by increasing or decreasing magnitude, increasing the output precision with each summand or setting sufficient output precision before starting to sum up, as well as some library-specific approaches. mpfr has a function mpfr_sum() for computing the sum of several mpfr numbers in one step, while leda supports exact addition, i.e., it may internally compute and set the precision sufficient for an addition to be exact. For each library we selected the two fastest direct approaches for comparison.

[Figure 4.10: plots of speedup against the number of summands for Monotonize, split, direct 1, and direct 2; panels for compressed and uncompressed expansions with 64 bit and 32 bit limbs.]
Figure 4.10. Converting expansions to mpfr. Figures show the speedup for each conversion method, relative to direct 2, on a logarithmic scale.

As input data, we use randomly generated expansions, which we create by evaluating a polynomial expression using Shewchuk’s arithmetic operations. This way, test expansions are more likely to have a structure that actually occurs in applications. As expression D we compute a 4 × 4 determinant of 4 × 4 determinants of randomly generated numbers. D has a polynomial degree of d = 16. Using the rand48 family of random number generators, we select floating-point numbers as input numbers. All our input numbers can be uniformly scaled to integers with p = 53 + 35 bit precision. Hence, D can be represented with approximately dp ≈ 1400 bits. We observed, however, that D was in fact representable with only approximately 1000 bits on average.
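As a hedged illustration of how expansions arise from exact arithmetic, the sketch below shows Two-Sum and a Grow-Expansion step in the style of Shewchuk's operations, written in Python, whose float type is an IEEE double. It does not reproduce the determinant evaluation used for the experiments; the helper names and the four sample terms are assumptions chosen for this example. The last line restates the bit bound d · p from the paragraph above.

```python
from fractions import Fraction


def two_sum(a, b):
    """Two-Sum: return (x, y) with x = fl(a + b) and a + b == x + y exactly."""
    x = a + b
    b_virtual = x - a
    a_virtual = x - b_virtual
    b_roundoff = b - b_virtual
    a_roundoff = a - a_virtual
    return x, a_roundoff + b_roundoff


def grow_expansion(e, b):
    """Add the double b to the expansion e (summands in increasing
    magnitude) without losing any bits; zero summands are kept."""
    q = b
    h = []
    for component in e:
        q, low = two_sum(q, component)
        h.append(low)
    h.append(q)
    return h


def exact_value(e):
    """Exact rational value of an expansion, used only for checking."""
    return sum((Fraction(x) for x in e), Fraction(0))


# Build an expansion from a few sample terms (illustrative values only).
expansion = []
for term in [1.0, 2.0**-60, 2.0**70, -3.5]:
    expansion = grow_expansion(expansion, term)

# Bit bound from the text: degree d = 16, input precision p = 53 + 35.
bit_bound = 16 * (53 + 35)  # 1408, i.e. approximately 1400 bits
```

The resulting list holds more summands than strictly necessary, which mirrors the observation that uncompressed expansions carry few bits per summand.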
The evaluation of D gives us strongly non-overlapping expansions with about 220 summands on average. If we additionally compress the sums, we get non-adjacent expansions with about 20 summands on average. In a compressed expansion, each summand carries about 52 bits of information. Before compressing, in contrast, each summand carries less than 5 bits of information on average, since about 1000 bits are spread over about 220 summands. Thus, uncompressed and compressed expansions are input sets with quite different characteristics. This is particularly relevant, since Monotonize compresses as a side effect and hence may reduce the number of summands significantly before the actual conversion step. We consider expansions with n = 1, 2, ..., 64 summands in the uncompressed case and n = 1, 2, ..., 20 in the compressed case. To generate expansions with fewer summands than originally created, we simply ignore leading summands.
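The difference between uncompressed and compressed expansions can be sketched as follows. The code follows Shewchuk's Compress procedure as it is commonly described (two passes of Fast-Two-Sum); it is a Python sketch under that assumption, not the implementation used in the experiments, and the sample input of eight one-bit summands is made up for illustration.

```python
from fractions import Fraction


def fast_two_sum(a, b):
    """Fast-Two-Sum; assumes |a| >= |b|. Returns (x, y) with a + b == x + y."""
    x = a + b
    return x, b - (x - a)


def compress(e):
    """Compress a non-overlapping expansion e (increasing magnitude)
    into an expansion with the same exact sum and, typically, fewer summands."""
    m = len(e)
    if m == 0:
        return []
    g = [0.0] * m
    q = e[-1]
    bottom = m - 1
    # First pass: sweep from the largest to the smallest summand.
    for i in range(m - 2, -1, -1):
        q, low = fast_two_sum(q, e[i])
        if low != 0.0:
            g[bottom] = q
            bottom -= 1
            q = low
    g[bottom] = q
    # Second pass: sweep back up and collect the nonzero roundoff terms.
    h = []
    for i in range(bottom + 1, m):
        q, low = fast_two_sum(g[i], q)
        if low != 0.0:
            h.append(low)
    h.append(q)
    return h


# Eight summands with one significant bit each, spaced 5 bits apart.
sparse = [2.0**k for k in range(0, 40, 5)]
dense = compress(sparse)
```

In this example all eight summands fit into the 53 bit mantissa of a single double, so the compressed expansion has one summand, while summands that are too far apart to be merged are left unchanged.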
We run experiments with limb sizes of both 64 bit and 32 bit on the descartes platform. For 32 bit limbs we use a slightly different setup. Code is compiled with the -m32 flag of g++, since neither mpfr nor leda supports 32 bit limbs in a 64 bit environment, and we use the libraries gmp 5.0.2, mpfr 3.0.1, and leda 6.3. To get measurable running times, we generate 2000 expansions as described above and measure the total time for converting all expansions 1000 times. The results are shown in Figure 4.10 and Figure 4.11. The graphs do not show running time but the speedup of each method with respect to the faster direct approach. This improves the display of differences in the important range with very few summands, where the actual running times are very small.

[Figure 4.11: plots of speedup against the number of summands for Monotonize, split, direct 1, and direct 2; panels for compressed and uncompressed expansions with 64 bit and 32 bit limbs.]
Figure 4.11. Converting expansions to leda::bigfloat. Figures show the speedup for each conversion method, relative to direct 1, on a logarithmic scale.

Both conversion by Monotonize and conversion by splitting clearly outperform the direct approaches, and conversion by Monotonize is uniformly the fastest method. The speedup is about the same for 32 and 64 bit limbs. As expected, Monotonize achieves greater speedup for uncompressed than for compressed expansions. For mpfr, there is a small local minimum in the speedup achieved by Monotonize for uncompressed expansions, near the mark of 20 summands. This roughly corresponds to our observation that summands carry less than 5 bits of information. Near this range the number of output summands of Monotonize jumps from one to two. For
mpfr, our new approaches are strictly faster, even in the case of one summand only. Both direct approaches simply call