Hyperelliptic Curves and the HCDLP P Gaudry
VII.2. ALGORITHMS FOR COMPUTING THE GROUP LAW
Algorithm VII.1: Lagrange’s Algorithm for Hyperelliptic Group Law
INPUT: Two reduced divisors D1 = div(a1, b1) and D2 = div(a2, b2).
OUTPUT: The reduced divisor D3 = div(a3, b3) = D1 + D2.
Composition.
1. Use two extended GCD computations to get
   d = gcd(a1, a2, b1 + b2 + H) = u1 a1 + u2 a2 + u3 (b1 + b2 + H).
2. a3 ← a1 a2 / d^2.
3. b3 ← b1 + (u1 a1 (b2 − b1) + u3 (F − b1^2 − b1 H))/d (mod a3).
4. If deg(a3) ≤ g then return div(a3, b3).
Reduction.
5. ã3 ← a3, b̃3 ← b3.
6. a3 ← (F − b3 H − b3^2)/a3.
7. q, b3 ← Quotrem(−b3 − H, a3).
8. While deg(a3) > g do
9.    t ← ã3 + q (b3 − b̃3).
10.   b̃3 ← b3, ã3 ← a3, a3 ← t.
11.   q, b3 ← Quotrem(−b3 − H, a3).
12. Return div(a3, b3).
The important case where D1 = D2 must be treated separately; indeed some operations can be saved. For instance, we readily know that gcd(a1, a2) = a1.
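The steps of Algorithm VII.1 can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: polynomials over a prime field F_P are stored as coefficient lists (lowest degree first), and the helper names (`trim`, `padd`, `pmul`, `pdivmod`, `xgcd`, `lagrange_add`), the prime P = 7 and the example curve Y^2 = X^7 + X + 1 are all assumptions chosen for the example. The final normalization of a3 to a monic polynomial is an addition that is not among the numbered steps. The modular inverse `pow(x, -1, P)` needs Python 3.8 or later.

```python
P = 7  # small odd prime (illustrative assumption); the base field is F_P

def trim(a):
    # reduce coefficients mod P and drop zero leading coefficients
    a = [c % P for c in a]
    while a and a[-1] == 0:
        a.pop()
    return a

def deg(a):
    return len(a) - 1  # the zero polynomial [] gets degree -1

def padd(a, b):
    n = max(len(a), len(b))
    a, b = a + [0] * (n - len(a)), b + [0] * (n - len(b))
    return trim([x + y for x, y in zip(a, b)])

def pneg(a):
    return trim([-c for c in a])

def psub(a, b):
    return padd(a, pneg(b))

def pmul(a, b):
    r = [0] * max(len(a) + len(b) - 1, 0)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            r[i + j] += x * y
    return trim(r)

def pdivmod(a, b):
    # Quotrem(a, b): a = q*b + r with deg r < deg b; b must be nonzero
    a = trim(a)
    q = [0] * max(len(a) - len(b) + 1, 0)
    inv = pow(b[-1], -1, P)
    while len(a) >= len(b):
        k, c = len(a) - len(b), a[-1] * inv % P
        q[k] = c
        a = psub(a, [0] * k + [c * x for x in b])
    return trim(q), a

def xgcd(a, b):
    # returns (d, s, t) with d = gcd(a, b) monic and d = s*a + t*b
    r0, r1, s0, s1, t0, t1 = a, b, [1], [], [], [1]
    while r1:
        q, r = pdivmod(r0, r1)
        r0, r1 = r1, r
        s0, s1 = s1, psub(s0, pmul(q, s1))
        t0, t1 = t1, psub(t0, pmul(q, t1))
    inv = [pow(r0[-1], -1, P)]
    return pmul(r0, inv), pmul(s0, inv), pmul(t0, inv)

def lagrange_add(D1, D2, F, H, g):
    (a1, b1), (a2, b2) = D1, D2
    # Step 1: two extended GCDs; u2 is never needed afterwards
    d1, e1, _ = xgcd(a1, a2)
    d, c1, u3 = xgcd(d1, padd(padd(b1, b2), H))
    u1 = pmul(c1, e1)
    # Steps 2-3: composition (both divisions are exact)
    a3 = pdivmod(pmul(a1, a2), pmul(d, d))[0]
    num = padd(pmul(pmul(u1, a1), psub(b2, b1)),
               pmul(u3, psub(psub(F, pmul(b1, b1)), pmul(b1, H))))
    b3 = pdivmod(padd(b1, pdivmod(num, d)[0]), a3)[1]
    if deg(a3) <= g:                                   # Step 4
        return a3, b3
    at, bt = a3, b3                                    # Step 5
    a3 = pdivmod(psub(psub(F, pmul(b3, H)), pmul(b3, b3)), a3)[0]  # Step 6
    q, b3 = pdivmod(psub(pneg(b3), H), a3)             # Step 7
    while deg(a3) > g:                                 # Steps 8-11
        t = padd(at, pmul(q, psub(b3, bt)))
        bt, at, a3 = b3, a3, t
        q, b3 = pdivmod(psub(pneg(b3), H), a3)
    # extra step (not in the algorithm box): make a3 monic
    return pmul(a3, [pow(a3[-1], -1, P)]), b3

# Example curve (assumption): Y^2 = X^7 + X + 1 over F_7, genus 3, H = 0
F, H, g = [1, 1, 0, 0, 0, 0, 0, 1], [], 3
# Point divisors div(X - x, y) for the points (0,1), (4,3), (5,2)
P1, P2, P3 = ([0, 1], [1]), ([3, 1], [3]), ([2, 1], [2])
A = lagrange_add(lagrange_add(P1, P2, F, H, g), P3, F, H, g)
print(lagrange_add(A, A, F, H, g))  # doubling a degree-3 divisor triggers reduction
```

Adding the point divisors of (0,1) and (4,3) needs no reduction and returns the pair X^2 + 3X, 1 + 4X, that is, the polynomial vanishing at both abscissae together with the interpolating line. Doubling the degree-3 divisor A produces a polynomial a3 of degree 6, so the reduction phase (Steps 5 to 12) is exercised.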
To analyze the run time of this algorithm, we evaluate how the number of base field operations grows with the genus. The composition steps require one to perform the Euclidean algorithm on polynomials of degree at most g and a constant number of basic operations with polynomials of degree at most 2g. This can be done at a cost of O(g^2) operations in the base field. The cost of the reduction phase is less obvious. If the base field is large enough compared to the genus, each division is expected to generate a remainder of degree exactly one less than the modulus. The quotients q involved in the algorithm are then always of degree one, and one quickly sees that the cost is again O(g^2) operations in the base field for the whole reduction process. It can be shown [317] that even if the degrees of the quotients behave in an erratic way, the heavy cost of a step where the quotient is large is amortized by the fact that we have fewer steps to do, and finally the overall cost of reduction is always O(g^2) base field operations.
VII.2.2. Asymptotically Fast Algorithms. In the late eighties, Shanks [298] proposed a variant of Gauss’s algorithm for composing positive definite quadratic forms. The main feature of this algorithm, called NUCOMP, is that most of the operations can be done with integers half the size of the integers that are required in the classical algorithm. This makes a great difference if
we are in the range where applying NUCOMP allows us to use machine word sized integers instead of multiprecision integers.
In the classical algorithm, just like in Lagrange’s algorithm, the composition step produces large elements that are subsequently reduced. The idea of the NUCOMP algorithm is to more or less merge the composition and the reduction phases, in order to avoid those large elements. A key ingredient is the partial GCD algorithm (also known as the Half GCD algorithm, which is related to the continued fractions algorithm). This is essentially a Euclidean algorithm that we stop in the middle. In NUCOMP we need a partial extended version of it, where only one of the coefficients of the representation is computed.
In [182], it is shown that the NUCOMP algorithm can be adapted to the case of the group law of a hyperelliptic curve. It suffices more or less to replace integers by polynomials. We recall the Partial Extended Half GCD algorithm here for completeness:
Algorithm VII.2: Partial Extended Half GCD (PartExtHGCD)
INPUT: Two polynomials A and B of degree ≤ n and a bound m.
OUTPUT: Four polynomials a, b, α, β such that
   a ≡ αA (mod B) and b ≡ βA (mod B)
and deg a and deg b are bounded by m.
An integer z recording the number of operations.
1. α ← 1, β ← 0, z ← 0, a ← A mod B, b ← B.
2. While deg b > m and a ≠ 0 do
3.    q, t ← Quotrem(b, a).
4.    b ← a, a ← t, t ← β − qα, β ← α, α ← t.
5.    z ← z + 1.
6. If a = 0 then
7.    Return (A mod B, B, 1, 0, 0).
8. If z is odd then
9.    β ← −β, b ← −b.
10. Return (a, b, α, β, z).
We are now ready to give the NUCOMP algorithm as described in [182]. We restrict ourselves to the case of odd characteristic where the equation of the curve can be put in the form Y^2 = F(X). The minor modifications to handle the characteristic 2 case can be found in [182].
Algorithm VII.3: NUCOMP Algorithm
INPUT: Two reduced divisors D1 = div(u1, v1) and D2 = div(u2, v2).
OUTPUT: The reduced divisor D3 = D1 + D2.
1. w1 ← (v1^2 − F)/u1, w2 ← (v2^2 − F)/u2.
2. If deg w2 < deg w1 then exchange D1 and D2.
3. s ← v1 + v2, m ← v2 − v1.
4. By Extended GCD, compute G0, b, c such that G0 = gcd(u2, u1) = b u2 + c u1.
5. If G0 ∤ s then
6.    By Extended GCD, compute G, x, y such that G = gcd(G0, s) = x G0 + y s.
7.    H ← G0/G, B_y ← u1/G, C_y ← u2/G, D_y ← s/G.
8.    l ← y (b w1 + c w2) mod H, B_x ← b (m/H) + l (B_y/H).
9. Else
10.   G ← G0, A_x ← G, B_x ← m b.
11.   B_y ← u1/G, C_y ← u2/G, D_y ← s/G.
12. (b_x, b_y, x, y, z) ← PartExtHGCD(B_x mod B_y, B_y, (g + 2)/2).
13. a_x ← G x, a_y ← G y.
14. If z = 0 then
15.   u3 ← b_y C_y, Q1 ← C_y b_x, v3 ← v2 − Q1.
16. Else
17.   c_x ← (C_y b_x − m x)/B_y, Q1 ← b_y c_x, Q2 ← Q1 + m.
18.   d_x ← (D_y b_x − w2 x)/B_y, Q3 ← y d_x, Q4 ← Q3 + D_y.
19.   d_y ← Q4/x, c_y ← Q2/b_x.
20.   u3 ← b_y c_y − a_y d_y, v3 ← (G (Q3 + Q4) − Q1 − Q2)/2 mod u3.
21. While deg(u3) > g do
22.   u3 ← (v3^2 − F)/u3, v3 ← −v3 mod u3.
23. Return div(u3, v3).
In the NUCOMP algorithm, if the input divisors are distinct, then it is likely that G0 = 1, and then Steps 5–8 are skipped. On the other hand, in the case of doubling a divisor, these steps are always performed whereas Step 4 is trivial. Hence it is interesting to have another version of NUCOMP that is specific to doubling. Shanks had such an algorithm, which he called NUDUPL, and it has been extended to our case in [182].
Steps 21–22 are there to adjust the result in the case where the divisor div(u3, v3) is not completely reduced at the end of the main part of the algorithm. In general, at most one or two loops are required. Therefore the NUCOMP algorithm requires a constant number of Euclidean algorithms on polynomials of degree O(g) and a constant number of basic operations with polynomials of degree O(g). Thus the complexity is O(g^2), again, just as for Lagrange’s algorithm. However, as we said before, the NUCOMP algorithm works with polynomials of smaller degree (in practice rarely larger than g).
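The adjustment loop of Steps 21–22 can be isolated as a short Python sketch. It assumes odd characteristic with the curve in the form Y^2 = F(X), as in the text; the list-based helpers, the prime P = 7, the function name `reduce_divisor` and the example curve Y^2 = X^5 + 1 are illustrative assumptions. The last line, which makes u3 monic, is an addition that is not part of Steps 21–22.

```python
P = 7  # small odd prime (illustrative assumption)

def trim(a):
    # reduce coefficients mod P and drop zero leading coefficients
    a = [c % P for c in a]
    while a and a[-1] == 0:
        a.pop()
    return a

def deg(a):
    return len(a) - 1  # the zero polynomial [] gets degree -1

def pneg(a):
    return trim([-c for c in a])

def psub(a, b):
    n = max(len(a), len(b))
    a, b = a + [0] * (n - len(a)), b + [0] * (n - len(b))
    return trim([x - y for x, y in zip(a, b)])

def pmul(a, b):
    r = [0] * max(len(a) + len(b) - 1, 0)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            r[i + j] += x * y
    return trim(r)

def pdivmod(a, b):
    # a = q*b + r with deg r < deg b; b must be nonzero
    a = trim(a)
    q = [0] * max(len(a) - len(b) + 1, 0)
    inv = pow(b[-1], -1, P)
    while len(a) >= len(b):
        k, c = len(a) - len(b), a[-1] * inv % P
        q[k] = c
        a = psub(a, [0] * k + [c * x for x in b])
    return trim(q), a

def reduce_divisor(u, v, F, g):
    while deg(u) > g:                              # Step 21
        u = pdivmod(psub(pmul(v, v), F), u)[0]     # Step 22: exact division
        v = pdivmod(pneg(v), u)[1]                 # Step 22: v <- -v mod u
    return pmul(u, [pow(u[-1], -1, P)]), v         # extra: normalize u to monic

# Example (assumption): genus 2 curve Y^2 = X^5 + 1 over F_7 and the
# unreduced divisor through (0,1), (1,3), (5,2):
#   u = X (X - 1) (X - 5),  v = 1 + 2 X^2
F = [1, 0, 0, 0, 0, 1]
print(reduce_divisor([0, 5, 1, 1], [1, 0, 2], F, 2))
```

In this example one pass through the loop brings the degree of u from 3 down to 2, in line with the remark that one or two loops are usually enough.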
Up to now we have considered naive algorithms for polynomial multiplications and GCDs. If we use an FFT-based algorithm, two polynomials of degree O(n) can be multiplied in O(n log n log log n) operations in the base field. It is then possible to design a recursive GCD algorithm that requires O(n log^2 n log log n) operations [141]. Furthermore, the same algorithm can also be used to compute the Partial Extended Half GCD in the same run time (this is actually a building block for the fast recursive GCD computation). Plugging those algorithms into NUCOMP and NUDUPL yields an overall complexity of O(g log^2 g log log g), which is asymptotically faster than the O(g^2) complexity of Lagrange’s algorithm.
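To get a feeling for how the two bounds grow, the following Python sketch evaluates g^2 and g log^2 g log log g with every hidden constant set to 1. That choice, the function names and the use of base-2 logarithms are assumptions made purely for illustration; the sketch says nothing about where the crossover lies for a real implementation, which depends on the constants.

```python
import math

def quadratic_model(g):
    # O(g^2) bound for Lagrange's algorithm, hidden constant taken as 1
    return g ** 2

def quasi_linear_model(g):
    # O(g log^2 g log log g) bound, hidden constant taken as 1 (needs g > 2)
    lg = math.log2(g)
    return g * lg ** 2 * math.log2(lg)

for g in (16, 1024, 2 ** 20):
    ratio = quadratic_model(g) / quasi_linear_model(g)
    print(f"g = {g:>8}: g^2 / (g log^2 g log log g) = {ratio:.2f}")
```

Even in this idealized model the quasi-linear bound is the larger of the two for small g, and the ratio grows only slowly afterwards, so the asymptotic advantage appears only for large genus.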
VII.2.3. Which Algorithm in Practice. When the genus is large, the best algorithm is not the same as when the genus is small. In particular, for very small genus, it is possible to write down the explicit formulae corresponding to the Cantor or Lagrange algorithm. Then some operations can be saved by computing only the coefficients that are necessary. For instance, the quotient of two polynomials costs less if it is known in advance that the division is exact (see [258]). For genus 2 and 3, such optimized formulae have been worked out by many contributors. We refer to [212] for a survey of the current situation in genus 2, and to [277] for genus 3. In the case of genus 2, the best formulae for adding (resp. doubling) have a cost of 25 multiplications and 1 inversion (resp. 27 multiplications and 1 inversion).
When the genus is larger than 3, explicit formulae are not available and the best algorithm is then Lagrange’s algorithm.
According to [182], the genus for which the NUCOMP and NUDUPL algorithms are faster than Lagrange’s algorithm is around 10. This value depends highly on the implementation of those two algorithms, on the particular type of processor that is used, and on the size of the base field.
It is likely that the asymptotically fast algorithm for computing GCD will never be useful for practical applications: the point at which it becomes faster than the classical Euclidean algorithm is around genus 1000.
We finally mention that an analogue of projective coordinates for elliptic curves has been designed. This allows one to perform the group law without any inversion, which can be crucial in some constrained environments such as smart cards. In addition, in the spirit of [85], different types of coordinates can be mixed to have a faster exponentiation algorithm [212].