
2.7 Geometric Computing

2.7.2 Algebraic Foundations

A key requirement in the design of ECG algorithms is the ability to perform exact computations with algebraic numbers. Such computations require a representation in purely algebraic terms, as given by the primitive element representation or Thom's encoding. A fundamental problem is root isolation, that is, to determine a set of disjoint, connected regions such that their union contains all roots of a univariate polynomial (or a polynomial system) and each of the regions contains precisely one root. During the report period, we made significant progress on root isolation. We proved the first polynomial bound on the complexity of the continued fraction method for root isolation, we derived a deterministic root isolator for polynomials with Bitstream coefficients, and we developed an adaptive algorithm for determining the (complex) roots of a zero-dimensional system of integer polynomials as well as a more efficient method for computing subresultants of two polynomials over a domain D. Finally, we showed how to use graphics processors for large integer arithmetic.

The Continued Fraction Algorithm

Investigator: Vikram Sharma

Continued fraction based algorithms are among the best known algorithms for real root isolation. However, unlike other algorithms for the same task (see, e.g., [6]), tight bounds on their worst-case complexity are not known. In [12] we provide polynomial bounds on the worst-case bit complexity of two formulations of the continued fraction algorithm. In particular, for a square-free integer polynomial of degree n with coefficients of bit length L, we show that the bit complexity of Akritas' formulation is Õ(n⁸L³), and the bit complexity of a formulation by Akritas and Strzeboński is Õ(n⁷L²); here Õ indicates that we are omitting logarithmic factors. The analysis uses a bound by Hong to compute the floor of the smallest positive root of a polynomial, which is a crucial step in the continued fraction algorithm.

We also propose a modification of the latter formulation that achieves a bit complexity of Õ(n⁵L²). Our analysis sheds some light on the difficulty of bounding the worst-case complexity of continued fraction based algorithms.
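To make the structure of such an algorithm concrete, here is a minimal sequential Python sketch of continued fraction root isolation for a square-free integer polynomial. It is not the formulation analyzed in [12]; the function names and the exact control flow are illustrative assumptions. The sketch keeps a Möbius transformation x = (aX + b)/(cX + d), stops when Descartes' rule of signs reports zero or one sign variation, shifts by an integer lower bound on the positive roots (obtained from Hong's bound applied to the reciprocal polynomial and evaluated exactly in integers), and otherwise splits the search range via X → X + 1 and X → 1/(X + 1).

```python
from fractions import Fraction
from math import comb


def sign_variations(p):
    """Descartes' rule of signs: number of sign changes, zeros skipped."""
    s = [c > 0 for c in p if c != 0]
    return sum(1 for u, v in zip(s, s[1:]) if u != v)


def taylor_shift(p, s):
    """Coefficients of p(X + s); all lists are ordered by increasing degree."""
    n = len(p) - 1
    return [sum(p[i] * comb(i, k) * s ** (i - k) for i in range(k, n + 1))
            for k in range(n + 1)]


def hong_lower_bound(p):
    """Largest integer b with b * H <= 1, where H is Hong's bound for the
    reciprocal polynomial X^n p(1/X); b is then a lower bound on the positive
    roots of p. Assumes p(0) != 0 and uses exact integer comparisons only."""
    r = p[::-1]                             # reciprocal polynomial
    if r[-1] < 0:                           # Hong's bound needs a positive leading coefficient
        r = [-c for c in r]
    if all(c >= 0 for c in r):              # no positive roots at all: do not shift
        return 0

    def ok(b):                              # b * H <= 1, tested term by term
        return all(any(r[l] > 0 and (2 * b) ** (l - k) * -r[k] <= r[l]
                       for l in range(k + 1, len(r)))
                   for k in range(len(r)) if r[k] < 0)

    if not ok(1):
        return 0
    lo = 1
    while ok(2 * lo):                       # galloping search ...
        lo *= 2
    hi = 2 * lo
    while hi - lo > 1:                      # ... followed by bisection
        mid = (lo + hi) // 2
        lo, hi = (mid, hi) if ok(mid) else (lo, mid)
    return lo


def isolate_positive_roots(coeffs):
    """Hypothetical helper: isolate the positive roots of a nonzero square-free
    integer polynomial. Returns intervals (lo, hi); lo == hi marks an exact
    rational root, hi is None for an unbounded interval (lo, infinity)."""
    roots = []
    p = list(coeffs)
    while p[0] == 0:                        # x = 0 is not positive; divide it out
        p = p[1:]
    stack = [(p, (1, 0, 0, 1))]             # x = (a*X + b) / (c*X + d)
    while stack:
        p, (a, b, c, d) = stack.pop()
        v = sign_variations(p)
        if v == 0:                          # no root in the image of (0, inf)
            continue
        if v == 1:                          # exactly one root between M(0) and M(inf)
            ends = (Fraction(b, d), Fraction(a, c) if c else None)
            roots.append(tuple(sorted(ends)) if c else ends)
            continue
        s = hong_lower_bound(p)
        if s >= 1:                          # jump directly towards the smallest root
            p = taylor_shift(p, s)
            b, d = a * s + b, c * s + d
            if p[0] == 0:                   # the bound hit a root exactly
                roots.append((Fraction(b, d),) * 2)
                p = p[1:]
            stack.append((p, (a, b, c, d)))
            continue
        right = taylor_shift(p, 1)          # X -> X + 1 covers X in (1, inf)
        left = taylor_shift(p[::-1], 1)     # X -> 1 / (X + 1) covers X in (0, 1)
        if right[0] == 0:                   # X = 1 is a root, i.e. x = M(1)
            roots.append((Fraction(a + b, c + d),) * 2)
            right, left = right[1:], left[1:]
        stack.append((right, (a, a + b, c, c + d)))
        stack.append((left, (b, a + b, d, c + d)))
    return roots
```

The lower-bound shift is what lets the method jump over large root-free gaps instead of moving in unit steps; for (X − 10)(X − 20), for example, the first shift is by 3.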

A key component of such algorithms is a bound on the largest positive root of a polynomial. Most approaches for obtaining such bounds depend only on the absolute values of the coefficients of the polynomial. For instance, for a univariate polynomial $A(X) = \sum_{i=0}^{n} a_i X^i$, …

However, good as these bounds are, they are not known to be tight with respect to the largest positive root of the polynomial. Recently, we have given a general framework for obtaining bounds similar to Hong's. Within this framework, we show that Hong's bound is close to being the best bound. Moreover, all these bounds are upper bounds on the absolute positiveness of a polynomial, that is, the largest positive real root among the polynomial and all its derivatives. Since this quantity in general differs from the largest positive root of the polynomial itself, we still do not know tight bounds for the positiveness of a polynomial. Finding such bounds is crucial for continued fraction based algorithms and may also be useful in other contexts. Even if this is not feasible, we may alternatively try to obtain a constant-factor or logarithmic-factor approximation to the absolute positiveness of the polynomial, which would be a considerable improvement over the linear-factor approximation obtained by Hong's bound.
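For reference, Hong's bound in the form in which it is commonly stated fits in a few lines: for A with positive leading coefficient, every positive root (indeed the absolute positiveness) is at most $2\max_{a_i<0}\min_{a_j>0,\,j>i}|a_i/a_j|^{1/(j-i)}$. The Python sketch below evaluates it in floating point, which is fine for illustration but not for certified use; the function name is hypothetical.

```python
def hong_bound(coeffs):
    """Hong's upper bound on the absolute positiveness (and hence on all
    positive real roots) of A(X) = sum(coeffs[i] * X**i).
    Assumes coeffs[-1] > 0; floating point, illustration only."""
    n = len(coeffs) - 1
    best = 0.0
    for i, ai in enumerate(coeffs):
        if ai < 0:
            # pair each negative coefficient with the best positive one above it
            best = max(best, min((-ai / coeffs[j]) ** (1.0 / (j - i))
                                 for j in range(i + 1, n + 1) if coeffs[j] > 0))
    return 2 * best
```

For A(X) = X² − 3X + 2 = (X − 1)(X − 2) it returns 6, while the largest positive root is 2, illustrating that the bound can overestimate the root by a noticeable factor.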

Another interesting direction is a continued fraction algorithm that can handle polynomials with real coefficients instead of integers, in other words, a Bitstream continued fraction algorithm. This might be achievable by modifying the current approaches for the Bitstream model to accommodate the use of the bounds mentioned above for computing subdivision points. A Bitstream approach would also improve both the theoretical and the practical aspects of the algorithm, so this direction is worth investigating.

A Deterministic Bitstream Root Isolator

Investigators: Arno Eigenwillig, Kurt Mehlhorn, Michael Sagraloff, and Vikram Sharma

Root isolation is usually formulated for polynomials with integer coefficients. We also need to isolate the roots of polynomials with arbitrary real coefficients. We propose the following model: the coefficients can be approximated to any desired accuracy. In other words, the coefficients are available through their potentially infinite binary or decimal representations. Exact arithmetic in the field of coefficients is not needed. We coined the name Bitstream coefficients for this model.
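The model can be illustrated with a small Python sketch: a Bitstream coefficient is represented here as a function that, given a precision p, returns a rational approximation with error below 2^(−p). This interface is an assumption made for illustration, not the interface of the actual implementation. The hypothetical function sign_at decides the sign of f(x) at a rational point by doubling the precision until the approximation error can no longer flip the sign; only approximate coefficients are ever used.

```python
from fractions import Fraction
from math import isqrt


def sqrt2(p):
    """Bitstream for sqrt(2): a dyadic q with 0 <= sqrt(2) - q < 2**-p."""
    return Fraction(isqrt(2 << (2 * p)), 1 << p)


def sign_at(streams, x, p=8, p_max=1024):
    """Sign of f(x) = sum(c_i * x**i), where each coefficient c_i is only
    available as a function p -> approximation with error < 2**-p.
    Doubles the precision until the sign is certain; None if undecided."""
    while p <= p_max:
        approx = [c(p) for c in streams]
        value = sum(a * x ** i for i, a in enumerate(approx))
        # |f(x) - value| < sum |x|**i * 2**-p, since each coefficient error < 2**-p
        err = sum(abs(x) ** i for i in range(len(approx))) / Fraction(2 ** p)
        if abs(value) > err:
            return 1 if value > 0 else -1
        p *= 2
    return None
```

If f(x) = 0 exactly, no finite precision decides the sign, so the sketch gives up and returns None after p_max bits.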

In the previous period, we reported on a randomized subdivision algorithm for isolating the real roots of square-free polynomials with Bitstream coefficients [5]. We also reported on a partial extension to polynomials with multiple roots [4]. We have continued our work on the method. In [3], a revised version of the randomized Bitstream algorithm is given with a crucial improvement in precision management: it determines the needed precision directly, without the detour through an estimate of the root separation and without the close coupling between precision and subdivision depth that existed in the former version.

Furthermore, a detailed complexity analysis is given. This also includes a partial analysis of the extension to polynomials with multiple roots.

The algorithm is part of the Max Planck Institute for Informatics' model of CGAL's algebraic kernel. It is a cornerstone of our computations with curves and surfaces discussed in Sections 2.7.3 and 2.7.4. Even for polynomials with integer coefficients, it is one of the fastest methods available; see [7] for a comparison of real root solvers. The same paper recommends our solver as the method of choice for polynomials with non-rational coefficients.

We have developed a deterministic Bitstream root isolator [9, 10]. It exploits the fact that the roots of a polynomial depend continuously on its coefficients in a more direct way than its randomized predecessor does. The randomized algorithm worked on a polynomial with interval coefficients that represents all ε-approximations of the input polynomial f. In contrast, the deterministic algorithm works on a concrete ε-approximation f̃. Here ε is chosen such that f and f̃ have the same number of real roots and suitably enlarged isolating intervals of f̃ can serve as isolating intervals for the real roots of f. The choice of ε would be simple if the root separation of f, i.e., the smallest distance between any pair of roots, were known.

A key part of the algorithm adapts ε as it learns about the roots of f̃, and hence about the roots of f. The deterministic method has the same complexity as the randomized method. Moreover, the partial extension to multiple roots can be analyzed completely.

Polynomial System Solving

Investigators: Michael Hemmer, Michael Kerber, and Michael Sagraloff

Polynomial system solving is the key operation in non-linear computational geometry: for all current algorithms in that area, the bottleneck in terms of computational speed is finding the (real) solutions of a polynomial system. We distinguish three main approaches. Exclusion or subdivision methods recursively subdivide regions that may contain solutions and discard regions that certainly contain none. Usually, this is combined with a criterion ensuring that a region contains precisely one root, but such criteria (e.g., the interval Newton test) mostly fail in the presence of multiple roots, and it is hard to make these methods certifying. The reason is that, in the multivariate case, there is no simple test, comparable to Descartes' Rule of Signs, that ensures the presence of exactly one simple root within a given region.
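A minimal Python sketch of the exclusion idea for two variables, using naive interval arithmetic over exact rationals: a box is discarded as soon as the interval enclosure of some polynomial excludes zero. The dictionary encoding of polynomials and all function names are hypothetical. As the paragraph above explains, the hard part is missing here: the remaining boxes are only candidates, and nothing certifies that a box contains exactly one solution.

```python
from fractions import Fraction


def imul(a, b):
    """Product of two intervals a = (lo, hi), b = (lo, hi)."""
    ps = (a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1])
    return (min(ps), max(ps))


def ieval(poly, bx, by):
    """Interval enclosure of sum(c * x**i * y**j) over the box bx x by, where
    poly = {(i, j): c}. Naive, so it may overestimate, but it never misses a value."""
    lo = hi = Fraction(0)
    for (i, j), c in poly.items():
        t = (Fraction(c), Fraction(c))
        for _ in range(i):
            t = imul(t, bx)
        for _ in range(j):
            t = imul(t, by)
        lo, hi = lo + t[0], hi + t[1]
    return lo, hi


def may_contain_zero(system, box):
    """False only if some polynomial of the system provably has no zero in box."""
    return all(lo <= 0 <= hi for lo, hi in (ieval(p, *box) for p in system))


def exclusion(system, box, depth):
    """Subdivide `depth` times, discarding boxes without a common zero."""
    boxes = [box]
    for _ in range(depth):
        nxt = []
        for bx, by in boxes:
            if not may_contain_zero(system, (bx, by)):
                continue                    # excluded: no solution in this box
            mx, my = (bx[0] + bx[1]) / 2, (by[0] + by[1]) / 2
            for nx in ((bx[0], mx), (mx, bx[1])):
                for ny in ((by[0], my), (my, by[1])):
                    nxt.append((nx, ny))
        boxes = nxt
    return [b for b in boxes if may_contain_zero(system, b)]
```

For the circle x² + y² − 1 = 0 intersected with the line x − y = 0 on [−2, 2]², six subdivision rounds leave only a handful of boxes of width 1/16 clustered around the two solutions ±(1/√2, 1/√2).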

Homotopy methods [13] have turned out to be a very powerful numerical approach to finding the solutions of a polynomial system. They numerically track the paths of the known complex solutions of a simple, suitably chosen start system while it is continuously deformed into the input system. Such methods, although very robust, do not certify their output in general. Finally, there are several elimination methods. Multivariate resultants and Groebner bases are well-studied tools for obtaining the solution set of a system with respect to a projection direction. Combined with a certified univariate root isolator, they can certify their output. However, coefficient explosion during their computation is a severe drawback for performance. A further disadvantage is that the cost of resultant or Groebner basis computations depends on the bitsize and the degree of the input polynomials, but not on the actual (geometric) complexity of the problem. That is, these costs are high even if the roots of the system are well separated.
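The homotopy idea mentioned above can be sketched in one variable (multivariate solvers track the same kind of paths using the Jacobian of the system). The Python sketch below deforms the start system x^n − 1, whose roots are known, into the input polynomial f along H(x, t) = (1 − t)·γ·(x^n − 1) + t·f(x), reusing the previous point as predictor and applying a few Newton corrections per step. The fixed complex constant γ (the so-called gamma trick), the step count and all names are illustrative choices; nothing here certifies the result.

```python
import cmath


def horner(p, x):
    """Evaluate p (coefficients by increasing degree) at x."""
    r = 0j
    for c in reversed(p):
        r = r * x + c
    return r


def derivative(p):
    return [i * c for i, c in enumerate(p)][1:]


def homotopy_roots(f, steps=400, gamma=complex(0.6, 0.8), newton_iters=5):
    """Hypothetical helper: track the n roots of g(x) = x**n - 1 along
    H(x, t) = (1 - t) * gamma * g(x) + t * f(x) from t = 0 to t = 1."""
    n = len(f) - 1
    g = [-1] + [0] * (n - 1) + [1]
    df, dg = derivative(f), derivative(g)
    xs = [cmath.exp(2j * cmath.pi * k / n) for k in range(n)]   # roots of g
    for step in range(1, steps + 1):
        t = step / steps
        for k in range(n):
            x = xs[k]                       # predictor: previous point on the path
            for _ in range(newton_iters):   # corrector: Newton's method on H(., t)
                h = (1 - t) * gamma * horner(g, x) + t * horner(f, x)
                dh = (1 - t) * gamma * horner(dg, x) + t * horner(df, x)
                x -= h / dh
            xs[k] = x
    return xs
```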

Recently, we made a first step toward the design of a more adaptive algorithm [11], in which we combine a subdivision-based approximation scheme for the solutions with a projection-based symbolic approach that is carried out entirely in several prime fields. Each approach alone delivers only incomplete information, but their combination suffices to certify the results of the method. At the same time, the performance depends adaptively on quantities such as the separation of the roots, instead of on worst-case bounds for them. We plan to implement and benchmark the proposed algorithm to answer the question whether these advantages lead to measurable effects in practice as well. The formulation of the algorithm already shows that there are several possible optimizations that an actual implementation should take into account. Some of them are easy, whereas others require further study, including in research areas not considered so far.

We also consider the proposed approach suitable as the certification block of a hybrid approach in which it is combined with a numerical method, for example, a homotopy solver that finds the solutions. Finally, we would like to analyze the complexity of the proposed algorithm, which we expect will also establish its adaptive behavior in theory.

Division-Free Computation of Subresultants Using Bezout Matrices

Investigator: Michael Kerber

The subresultants of two polynomials over a domain D correspond to the elements of a Euclidean remainder sequence up to a scalar factor. Since the coefficients of the subresultants never leave the ground domain D and their size grows moderately compared to naive pseudo-division, subresultants have become a standard tool in computer algebra. The usual strategy for computing them is iterated pseudo-division, where scalar factors are divided out in each iteration to prevent excessive growth of the coefficients.
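The pseudo-division strategy just described can be sketched in Python for integer polynomials (coefficients listed by increasing degree). The sketch follows the classical subresultant PRS of Collins and Brown–Traub in the form found in standard computer algebra texts; it is not the implementation used in our experiments, and the function names are hypothetical. Each pseudo-remainder is divided exactly by g·h^δ, so all intermediate coefficients stay in ℤ, and the resulting polynomials agree with the subresultants up to sign.

```python
def trim(p):
    """Drop leading zero coefficients (coefficients by increasing degree)."""
    while len(p) > 1 and p[-1] == 0:
        p = p[:-1]
    return p


def prem(a, b):
    """Pseudo-remainder: the remainder of lc(b)**(deg a - deg b + 1) * a by b."""
    r, lb = list(a), b[-1]
    owed = len(a) - len(b) + 1              # factors lc(b) still to be applied
    while len(r) >= len(b) and any(r):
        shift, lr = len(r) - len(b), r[-1]
        r = [lb * c for c in r]
        for i, c in enumerate(b):
            r[i + shift] -= lr * c          # cancels the leading term
        r = trim(r)
        owed -= 1
    return [lb ** owed * c for c in r]


def subresultant_prs(a, b):
    """Subresultant PRS of integer polynomials a, b with deg a >= deg b."""
    seq, g, h = [], 1, 1
    while True:
        delta = len(a) - len(b)
        r = prem(a, b)
        if not any(r):
            return seq
        r = [c // (g * h ** delta) for c in r]   # exact division in Z
        seq.append(r)
        a, b = b, r
        g = a[-1]
        h = h if delta == 0 else g ** delta // h ** (delta - 1)
```

On a classical textbook example, A = x⁸ + x⁶ − 3x⁴ − 3x³ + 8x² + 2x − 5 and B = 3x⁶ + 5x⁴ − 4x² − 9x + 21, it yields −15x⁴ + 3x² − 9, 65x² + 125x − 245, −9326x + 12300 and 260708, whereas plain pseudo-division already produces the coefficient 15795 in the second step.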

A second approach computes subresultants by evaluating determinants of matrices. Abdeljaoued et al. [1] obtain the principal subresultant coefficients (i.e., the formal leading coefficients of the subresultants) from a modified version of the Bezout matrix in which the principal subresultant coefficients correspond to the determinants of the k × k upper-left submatrices (the leading principal minors). With the (sequential) Berkowitz algorithm [2], they compute the determinant of that matrix without divisions, and the determinants of all k × k upper-left submatrices are obtained as a by-product.
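The division-free determinant computation can be illustrated with a compact Python sketch of Berkowitz's algorithm. It builds the characteristic polynomials of the leading principal submatrices one after another, each by a Toeplitz matrix-vector product, using only additions, subtractions and multiplications; the determinants of all k × k upper-left submatrices fall out as a by-product, as described above. Entries are plain integers here for simplicity, the function name is hypothetical, and the modified Bezout matrix of [1] is not constructed.

```python
def berkowitz_minors(m):
    """Leading principal minors det(m[:k][:k]) for k = 1..n via Berkowitz's
    algorithm, using only additions, subtractions and multiplications."""
    n = len(m)
    poly = [1, -m[0][0]]       # char. polynomial of the 1x1 block, highest degree first
    minors = [m[0][0]]
    for k in range(1, n):
        row = m[k][:k]                         # R: row k left of the diagonal
        col = [m[i][k] for i in range(k)]      # C: column k above the diagonal
        # first column of the Toeplitz matrix: 1, -a_kk, -R C, -R A C, ..., -R A^(k-1) C
        entries = [1, -m[k][k]]
        v = col
        for _ in range(k):
            entries.append(-sum(r * x for r, x in zip(row, v)))
            v = [sum(m[i][j] * v[j] for j in range(k)) for i in range(k)]  # v <- A v
        # (k+2) x (k+1) lower-triangular Toeplitz matrix times the previous polynomial
        poly = [sum(entries[i - j] * poly[j] for j in range(min(i, k) + 1))
                for i in range(k + 2)]
        minors.append((-1) ** (k + 1) * poly[-1])   # det = (-1)^size * constant term
    return minors
```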

Our work [8] generalizes this approach so that the whole subresultant sequence is obtained instead of only the principal coefficients. This is achieved by defining suitable matrices M₁, …, Mₖ that arise from the Bezout matrix by simple manipulations, so that their leading principal minors give all subresultant coefficients.

Regarding the number of arithmetic operations, the discussed method has a complexity of O(n⁵). This is clearly inferior to pseudo-division-based solutions whose complexity is O(n²). However, divisions in D might be considerably more expensive than multiplications in practice. Our experiments show that our approach is more efficient than state-of-the-art pseudo-division approaches for input polynomials with large integer coefficients, or if D is a polynomial ring.

References

[1] J. Abdeljaoued, G. Diaz-Toca, and L. Gonzalez-Vega. Minors of Bezout matrices, subresultants and the parameterization of the degree of the polynomial greatest common divisor. Int. J. Comp. Math., 81(10):1223–1238, 2004.

[2] S. Berkowitz. On computing the determinant in small parallel time using a small number of processors. Information Processing Letters, 18:147–150, 1984.

[3] A. Eigenwillig. Real Root Isolation for Exact and Approximate Polynomials Using Descartes' Rule of Signs. PhD thesis, Universität des Saarlandes, Saarbrücken, 2008.

[4] A. Eigenwillig, M. Kerber, and N. Wolpert. Fast and exact geometric analysis of real algebraic plane curves. In C. W. Brown, ed., Proceedings of the 2007 International Symposium on Symbolic and Algebraic Computation, Waterloo, Ontario, Canada, 2007, pp. 151–158. ACM.

[5] A. Eigenwillig, L. Kettner, W. Krandick, K. Mehlhorn, S. Schmitt, and N. Wolpert. A Descartes algorithm for polynomials with bit-stream coefficients. In V. G. Ganzha, E. W. Mayr, and E. V. Vorozhtsov, eds., Computer Algebra in Scientific Computing, 8th International Workshop, CASC 2005, Kalamata, Greece, 2005, LNCS 3718, pp. 138–149. Springer.

[6] A. Eigenwillig, V. Sharma, and C. K. Yap. Almost tight recursion tree bounds for the Descartes method. In J.-G. Dumas, ed., ISSAC ’06: Proceedings of the 2006 international symposium on Symbolic and algebraic computation, Genova, Italy, 2006, pp. 71–78. ACM.

[7] I. Z. Emiris, M. Hemmer, M. Karavelas, B. Mourrain, E. P. Tsigaridas, and Z. Zafeirakopoulos. Experimental evaluation and cross-benchmarking of univariate real solvers. Rapport de recherche EMIRIS:2008:INRIA-00340887:1, INRIA, Sophia Antipolis, France, 2008.

[8] M. Kerber. Division-free computation of subresultants using Bezout matrices. International Journal of Computer Mathematics, Taylor & Francis, in print, 2009.

[9] K. Mehlhorn and M. Sagraloff. A deterministic bitstream Descartes algorithm. Technical Report ACS-TR-361502-03, University of Groningen, 9700 AB Groningen, The Netherlands, 2008. Accepted to ISSAC 2009.

[10] K. Mehlhorn and M. Sagraloff. A deterministic Descartes algorithm for real polynomials. Accepted for ISSAC 2009, 2009.

[11] M. Sagraloff, M. Kerber, and M. Hemmer. Certified complex root isolation by subdivision and modular computation. Submitted, 2009.

[12] V. Sharma. Complexity of real root isolation using continued fractions. Theoretical Computer Science, 409:292–310, 2008.

[13] A. J. Sommese and C. W. Wampler. The Numerical Solution of Systems of Polynomials Arising in Engineering and Science. World Scientific, Singapore, 2005.

Large Integer Arithmetic on Graphics Processors

Investigator: Pavel Emeliyanenko

From the very beginning, graphics hardware was particularly well suited to efficient floating-point computations. With the release of NVIDIA's CUDA framework [1], the situation changed, allowing scientific applications that require integer arithmetic to benefit from the tremendous power of graphics processors.

...

Figure 2.4: Fast modular multiplication algorithm on the GPU. (Diagram: a grid of N thread blocks, in which each CTA of 128 threads performs two 512-point forward NTTs, a point-wise modular multiplication and a 512-point inverse NTT; after an interblock thread fence, results from other blocks are gathered with partial CRT reconstruction and uploaded back to the CPU.)

Large integer arithmetic is at the core of many scientific computations. For instance, algorithms in algebraic geometry involve a large amount of symbolic computation over integer polynomials in one or more variables (e.g., polynomial subresultants and derived quantities). It is known that the binary segmentation method [2] reduces the multiplication of polynomials with integer coefficients to a single huge integer multiplication.
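Binary segmentation (Kronecker substitution) can be sketched in a few lines of Python: pack the coefficients into one integer by evaluating at 2^k, multiply once, and read the product coefficients back as k-bit chunks. The sketch assumes non-negative integer coefficients (signed coefficients need an extra offset or sign-handling step) and picks k from a simple bound on the product coefficients; the function name is hypothetical.

```python
def kronecker_mul(f, g):
    """Multiply polynomials with non-negative integer coefficients (increasing
    degree) with a single big-integer multiplication (binary segmentation)."""
    bound = min(len(f), len(g)) * max(f) * max(g)   # >= every product coefficient
    k = max(1, bound.bit_length())                  # so each coefficient fits in k bits

    def pack(p):                                    # evaluate p at X = 2**k
        return sum(c << (k * i) for i, c in enumerate(p))

    h = pack(f) * pack(g)                           # the one huge multiplication
    mask = (1 << k) - 1
    return [(h >> (k * j)) & mask for j in range(len(f) + len(g) - 1)]
```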

As first shown by Schönhage and Strassen [3], the number-theoretic transform (NTT), a generalization of the discrete Fourier transform to finite fields, yields one of the asymptotically fastest known ways to multiply two N-bit integers. They also conjectured a lower bound of N log N for multiplication, which corresponds to the complexity of the fast Fourier transform (FFT). As the NTT belongs to the family of fast orthogonal transforms, its inherently parallel structure makes it a very tempting candidate for implementation on massively threaded architectures.

The algorithm proceeds as follows: the CPU splits the large integers into pieces corresponding to the size of the transform, reduces each piece modulo a set of distinct primes and loads the data onto the graphics processor. The GPU launches a set of parallel NTT multiplications; each single NTT multiplication is in turn computed in parallel by a so-called cooperative thread array (CTA). Once these are ready, another group of thread arrays gathers the multiplication results and performs a partial reconstruction of the results in parallel using the Chinese Remainder Theorem (CRT). The partially reconstructed results are then transferred back to the CPU, which computes the final product. The algorithm is schematically depicted in Figure 2.4.
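The data flow described above (split into pieces, transform modulo several primes, point-wise multiply, inverse transform, CRT) can be mirrored on the CPU in a short sequential Python sketch. It is not the CUDA implementation: the digit size of 8 bits, the use of exactly two primes, the choice of primes of the form c·2^16 + 1 below 2^24 (found by a small search), and all function names are assumptions made for illustration, and the final carry propagation is simply left to Python's big integers.

```python
def is_prime(n):
    """Trial division; adequate for the small candidates searched here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def primitive_root(p):
    """Smallest generator of the multiplicative group modulo the prime p."""
    qs, m, d = set(), p - 1, 2
    while d * d <= m:                       # prime factors of p - 1
        while m % d == 0:
            qs.add(d)
            m //= d
        d += 1
    if m > 1:
        qs.add(m)
    g = 2
    while any(pow(g, (p - 1) // q, p) == 1 for q in qs):
        g += 1
    return g


LOG_N = 16   # assumption: transform lengths up to 2**16
# assumption: the two largest primes c * 2**16 + 1 below 2**24
PRIMES = [p for p in (c * (1 << LOG_N) + 1 for c in range(255, 0, -1)) if is_prime(p)][:2]


def ntt(a, w, p):
    """Recursive radix-2 NTT; w must be a primitive len(a)-th root of unity mod p."""
    n = len(a)
    if n == 1:
        return a[:]
    even, odd = ntt(a[0::2], w * w % p, p), ntt(a[1::2], w * w % p, p)
    out, x = [0] * n, 1
    for i in range(n // 2):
        t = x * odd[i] % p
        out[i], out[i + n // 2] = (even[i] + t) % p, (even[i] - t) % p
        x = x * w % p
    return out


def digits(z, bits):
    out = []
    while z:
        out.append(z & ((1 << bits) - 1))
        z >>= bits
    return out or [0]


def multiply(x, y, bits=8):
    """Hypothetical helper: x * y for non-negative x, y via NTTs mod two primes and CRT."""
    dx, dy = digits(x, bits), digits(y, bits)
    n = 1
    while n < len(dx) + len(dy) - 1:
        n *= 2
    assert n <= 1 << LOG_N
    # every convolution coefficient is below this bound, so two primes suffice
    assert min(len(dx), len(dy)) * ((1 << bits) - 1) ** 2 < PRIMES[0] * PRIMES[1]
    residues = []
    for p in PRIMES:
        w = pow(primitive_root(p), (p - 1) // n, p)
        fx = ntt(dx + [0] * (n - len(dx)), w, p)
        fy = ntt(dy + [0] * (n - len(dy)), w, p)
        pointwise = [u * v % p for u, v in zip(fx, fy)]
        n_inv = pow(n, p - 2, p)
        residues.append([c * n_inv % p for c in ntt(pointwise, pow(w, p - 2, p), p)])
    p1, p2 = PRIMES
    p1_inv = pow(p1, p2 - 2, p2)            # p1^(-1) mod p2 by Fermat's little theorem
    coeffs = [r1 + p1 * ((r2 - r1) * p1_inv % p2) for r1, r2 in zip(*residues)]
    return sum(c << (bits * i) for i, c in enumerate(coeffs))   # carries resolved here
```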

Due to the limitations imposed by graphics hardware, our modular multiplication algorithm operates modulo 24-bit primes, which also allows for division-free modular reductions using floating-point arithmetic. The current implementation consists of a highly optimized 512-point NTT executed by a single CTA and a 1024-point NTT run cooperatively by two CTAs. Our algorithm exploits the redundancy in representing 24-bit residues with 32-bit words, in the sense that it operates on partially reduced numbers and defers the final reductions to the last stages of the algorithm.
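One common way to obtain division-free modular reduction with floating-point arithmetic is to estimate the quotient ⌊ab/p⌋ by a floating-point multiplication with a precomputed 1/p and then fix the remainder with at most one correction. The Python sketch below shows this idea in double precision; the report does not specify the exact scheme used in the GPU kernels (which may, for example, work in single precision), so treat this as an assumption rather than a description of them. The function name is hypothetical.

```python
def mulmod(a, b, p, inv_p):
    """a * b mod p for 0 <= a, b < p < 2**24 without an integer division.
    Assumption for illustration: double precision, inv_p = 1.0 / p."""
    prod = a * b                    # < 2**48, exactly representable as a double
    q = int(prod * inv_p)           # float estimate of prod // p, off by at most one
    r = prod - q * p                # remainder from a multiply-subtract
    if r < 0:                       # quotient was overestimated
        r += p
    elif r >= p:                    # quotient was underestimated
        r -= p
    return r
```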

The table below shows time measurements for the multiplication of 16384 random integers (each 43 × 256 = 11008 bits long) on the graphics processor, using a 512-point NTT with two steps of CRT reconstruction (which is enough to multiply integers of this bit length), and on the CPU, using the GNU MP library.

Implementation        Time
GPU (CUDA 2.1)        39 ms
GMP 4.2.1, 64-bit     837 ms
GMP 4.2.1, 32-bit     1482 ms

Finally, there are several directions for future work. The efficient modular arithmetic being developed can be extended to port other algorithms that use exact arithmetic to the GPU. For instance, it enables computing polynomial subresultants by evaluating a matrix determinant modulo a set of distinct primes in parallel on the graphics hardware, followed by CRT reconstruction of the final result on the CPU.

References

[1] NVIDIA CUDA: Compute Unified Device Architecture. NVIDIA Corp., 2007.

[2] J. von zur Gathen and J. Gerhard. Modern Computer Algebra. Cambridge University Press, 1999.

[3] A. Schönhage and V. Strassen. Schnelle Multiplikation großer Zahlen. Computing, 7:281–292, 1971.
