Parallel Algorithm for Multiplication on Elliptic Curves


Juan Manuel Garcia Garcia$^1$ and Rolando Menchaca Garcia$^2$

$^1$ Department of Computer Systems, Instituto Tecnologico de Morelia, Morelia, Mexico
[email protected]

$^2$ Center for Computing Research, Instituto Politecnico Nacional, Mexico City, Mexico
[email protected]

Abstract. Given a positive integer $n$ and a point $P$ on an elliptic curve $E$, the computation of $nP$, that is, the result of adding the point $P$ to itself $n$ times, called the scalar multiplication, is the central operation of elliptic curve cryptosystems. We present an algorithm that, using $p$ processors, can compute $nP$ in time $O(\log n + H(n)/p + \log p)$, where $H(n)$ is the Hamming weight of $n$. Furthermore, if this algorithm is applied to Koblitz curves, the running time can be reduced to $O(H(n)/p + \log p)$.

1 Introduction

Elliptic curve cryptosystems were first proposed by Koblitz [10] and Miller [15]. Their main attraction is the strong security they offer for a given key length. It is possible to construct elliptic curve cryptosystems over a smaller definition field than comparable public-key cryptosystems, such as those based on the discrete logarithm problem in finite fields or RSA [17]. Elliptic curve cryptosystems with a 160-bit key are believed to have the same security as, for example, RSA with a 1024-bit key length.

Several fast implementations of elliptic curve cryptosystems have been reported [8,7,18,4,5]. The security of elliptic curve cryptosystems is based on the difficulty of the elliptic curve discrete logarithm problem, and the main operation of those cryptosystems is the exponentiation, also known as scalar multiplication. For a general survey on exponentiation methods see [6]. In order to obtain fast elliptic curve exponentiation, three factors have been considered: the field of definition [4][18], addition chains [4,12,16,18] and optimal coordinate systems [3,5]. Special elliptic curves with efficient arithmetic have also been considered [11][20][21]. In this paper we propose the application of parallel processing to speed up the exponentiation on elliptic curves.

There already exist parallel algorithms for exponentiation of integers modulo an $m$-bit integer [2,1,13] and also for exponentiation on the fields $GF(2^m)$ [22] and $GF(q^m)$ [...] algorithm could lead to efficient implementations of elliptic curve cryptosystems on multiprocessor computers.

2 Binary Method

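We have an elliptic curve $E$ and a point $P \in E$. We want to compute $nP$ for some positive integer $n$. Let $n = (b_{N-1} b_{N-2} b_{N-3} \cdots b_2 b_1 b_0)_2$ be the binary representation of $n$, where

$$N = \lfloor \log n \rfloor + 1 \tag{1}$$

and logarithms are taken to base 2.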

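Then

$$n = b_{N-1} 2^{N-1} + b_{N-2} 2^{N-2} + \cdots + b_2 2^2 + b_1 2 + b_0. \tag{2}$$

Using the Horner expansion, the last expression becomes

$$n = ((\cdots(b_{N-1} 2 + b_{N-2}) 2 + \cdots + b_1) 2 + b_0). \tag{3}$$

From (3) we have

$$nP = 2(2(\cdots 2(2(b_{N-1} P) + b_{N-2} P) + \cdots) + b_1 P) + b_0 P. \tag{4}$$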

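The last expression suggests a way to get $nP$. We can define the succession of points $P_1, \ldots, P_N$ as follows. Let

$$
P_1 = P, \qquad
P_i = \begin{cases}
2P_{i-1} & \text{if } b_{N-i} = 0 \\
2P_{i-1} + P & \text{if } b_{N-i} = 1
\end{cases} \tag{5}
$$

for $i = 2, \ldots, N$. We can easily observe that $P_N = nP$.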
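The recurrence (5) can be sketched in Python as follows. This is only an illustration under stated assumptions: the toy curve $y^2 = x^3 + 2x + 3$ over $GF(97)$, the affine addition formulas, and the helper names `ec_add`, `binary_method` and `naive_multiple` are chosen for the example and do not come from the paper. The modular inverse via `pow(x, -1, Q)` needs Python 3.8 or later.

```python
# Toy curve y^2 = x^3 + A*x + B over GF(Q); illustrative parameters only.
Q, A, B = 97, 2, 3
INF = None  # point at infinity (identity of the group)

def ec_add(p1, p2):
    """Add two affine points on the toy curve (also handles doubling)."""
    if p1 is INF:
        return p2
    if p2 is INF:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % Q == 0:
        return INF                      # P + (-P) = O
    if p1 == p2:                        # tangent slope for a doubling
        lam = (3 * x1 * x1 + A) * pow((2 * y1) % Q, -1, Q) % Q
    else:                               # chord slope for a general addition
        lam = (y2 - y1) * pow((x2 - x1) % Q, -1, Q) % Q
    x3 = (lam * lam - x1 - x2) % Q
    y3 = (lam * (x1 - x3) - y1) % Q
    return (x3, y3)

def binary_method(n, point):
    """Recurrence (5): P_1 = P, P_i = 2*P_{i-1}, plus P when b_{N-i} = 1."""
    bits = bin(n)[2:]                   # b_{N-1} ... b_0, leading bit is 1
    acc = point                         # P_1
    for bit in bits[1:]:                # i = 2, ..., N
        acc = ec_add(acc, acc)          # doubling
        if bit == "1":
            acc = ec_add(acc, point)    # conditional addition
    return acc

def naive_multiple(n, point):
    """Reference computation: add the point to itself n times."""
    acc = INF
    for _ in range(n):
        acc = ec_add(acc, point)
    return acc

# Pick an affine point on the toy curve by brute force.
P = next((x, y) for x in range(Q) for y in range(1, Q)
         if (y * y - (x ** 3 + A * x + B)) % Q == 0)
print(binary_method(13, P) == naive_multiple(13, P))
```

The comparison against repeated addition is only a sanity check for small $n$; a real implementation would use a cryptographically sized curve and a constant-time ladder.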

The method outlined above, which obtains $nP$ by computing the terms $P_1, \ldots, P_N$, is known as the binary algorithm [9]. From (5) we can deduce that to obtain the term $P_i$ we need one doubling and, if the corresponding bit of the binary representation of $n$ is one, one addition. This computation is made $N-1$ times, where $N = \lfloor \log n \rfloor + 1$. If $H(n)$ is the binary Hamming weight of $n$, that is, the number of bits set to one in the binary representation of $n$, then we need $H(n) - 1$ point additions besides the $N-1$ doublings. Then we can state the following theorem.

Theorem 1. If $P$ is a point on an elliptic curve $E$ and $n$ is a positive integer, then obtaining $nP$ by the binary method requires $\lfloor \log n \rfloor$ doublings and $H(n) - 1$ point additions.

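Corollary 1. If we assume that the time taken to add two elliptic curve points is almost the same as the time taken to double a point, then the running time of the binary method is

$$(\lfloor \log n \rfloor + H(n) - 1)\,t,$$

where $t$ is the time taken by a single elliptic curve operation.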
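The operation counts in Theorem 1 can be checked mechanically. The sketch below only counts operations and performs no curve arithmetic; the function names `binary_method_counts` and `hamming_weight` are assumptions introduced for this example.

```python
def hamming_weight(n):
    """H(n): number of bits set to one in the binary representation of n."""
    return bin(n).count("1")

def binary_method_counts(n):
    """Return (doublings, additions) used by the binary method for n >= 1."""
    doublings = additions = 0
    for bit in bin(n)[3:]:       # skip '0b' and the leading 1 bit (P_1 = P)
        doublings += 1           # every remaining bit costs one doubling
        if bit == "1":
            additions += 1       # a set bit costs one extra point addition
    return doublings, additions

n = 0b1011011
doublings, additions = binary_method_counts(n)
# floor(log2 n) equals n.bit_length() - 1 for positive integers
print(doublings == n.bit_length() - 1, additions == hamming_weight(n) - 1)
```

With unit cost $t$ per operation, the sum of the two counts is the factor $\lfloor \log n \rfloor + H(n) - 1$ in the running time.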

Some improvement to the binary method can be obtained by the use of redundant number systems. Morain and Olivos [16] observed that on elliptic curves the inverse of a point can be obtained at no cost. For curves $y^2 = x^3 + Ax + B$ over $GF(p)$ with $p > 3$, the inverse of $(x, y)$ is $(x, -y)$ and, for $y^2 + xy = x^3 + Ax^2 + B$ over $GF(2^m)$, the inverse is $(x, x+y)$. Then we can consider representations

$$n = \sum_{i=0}^{N-1} c_i 2^i \tag{6}$$

with $c_i \in \{-1, 0, 1\}$ for $i = 0, \ldots, N-1$. A nonadjacent form (NAF) is a representation with $c_i c_{i+1} = 0$ for $0 \le i < N-1$. Since the NAF in general has fewer nonzeros than the binary representation, the advantage of using it is a smaller number of additions. Morain and Olivos [16] showed that the expected number of nonzeros in a NAF is $N/3$.
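As an illustration of the nonadjacent form, the following sketch computes the digits $c_i$ with the usual "look at $n \bmod 4$" rule. The paper does not spell out this procedure, so the function name `naf` and the digit order (least significant first) are assumptions of the example.

```python
def naf(n):
    """Return NAF digits c_0, c_1, ... (least significant first) of n >= 0."""
    digits = []
    while n > 0:
        if n & 1:
            c = 2 - (n % 4)      # +1 if n = 1 (mod 4), -1 if n = 3 (mod 4)
            n -= c               # n - c is now divisible by 4
        else:
            c = 0
        digits.append(c)
        n //= 2
    return digits

digits = naf(7)                  # 7 = 8 - 1
print(digits, sum(c * 2 ** i for i, c in enumerate(digits)))
```

Choosing the digit so that $n - c$ is divisible by 4 forces the next digit to be zero, which is exactly the nonadjacency condition $c_i c_{i+1} = 0$.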

3 The $p$th-Order Binary Method

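First, we divide the binary representation of $n$ into $\lceil N/p \rceil$ blocks of $p$ bits each, and then we split $n$ into $s_0, s_1, \ldots, s_{p-1}$ such that the binary representation of each $s_i$ is formed by $\lceil N/p \rceil$ blocks of $p$ bits set to 0, except for the $i$-th bit of each block, which has the same value as the $i$-th bit of the corresponding block of the binary representation of $n$. Thus, we can define $s_i$, for $i = 0, \ldots, p-1$, as follows:

$$
\begin{aligned}
s_0 &= b_0 + b_p 2^p + b_{2p} 2^{2p} + \cdots + b_{(\lceil N/p \rceil - 1)p}\, 2^{(\lceil N/p \rceil - 1)p} \\
s_1 &= b_1 2 + b_{p+1} 2^{p+1} + b_{2p+1} 2^{2p+1} + \cdots + b_{(\lceil N/p \rceil - 1)p+1}\, 2^{(\lceil N/p \rceil - 1)p+1} \\
&\;\;\vdots \\
s_{p-1} &= b_{p-1} 2^{p-1} + b_{2p-1} 2^{2p-1} + b_{3p-1} 2^{3p-1} + \cdots + b_{p\lceil N/p \rceil - 1}\, 2^{p\lceil N/p \rceil - 1}.
\end{aligned} \tag{7}
$$

That is,

$$s_i = \sum_{j=0}^{\lceil N/p \rceil - 1} b_{jp+i}\, 2^{jp+i} \tag{8}$$

for $i = 0, 1, \ldots, p-1$, where we are considering some padding bits $b_k = 0$ for $N-1 < k \le p\lceil N/p \rceil - 1$. The integer set $\{s_0, s_1, \ldots, s_{p-1}\}$ can be easily obtained from $n$ by AND-ing it with the appropriate masks. It is clear from (2) and (7) that

$$n = s_0 + s_1 + \cdots + s_{p-1} \tag{9}$$

and then we can compute $nP$ as the sum of $s_0 P, s_1 P, \ldots, s_{p-1} P$.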
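The decomposition (8) and the identity (9) can be sketched as follows. The function name `split_scalar` is an assumption of this example, and the mask is applied with a bitwise AND so that only the bits at positions congruent to $i$ modulo $p$ survive in $s_i$.

```python
def split_scalar(n, p):
    """Split n into s_0..s_{p-1}; s_i keeps the bits b_k of n with k = i (mod p)."""
    nbits = max(n.bit_length(), 1)           # N
    blocks = -(-nbits // p)                  # ceil(N / p)
    parts = []
    for i in range(p):
        # mask with ones at positions i, p + i, 2p + i, ...
        mask = sum(1 << (j * p + i) for j in range(blocks))
        parts.append(n & mask)
    return parts

n, p = 0b1011011, 3
parts = split_scalar(n, p)
print([bin(s) for s in parts], sum(parts) == n)
```

Because the masks select disjoint bit positions, the Hamming weights of the $s_i$ add up to $H(n)$, which is what the load-balancing argument later relies on.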

1. Using $p$ processors, compute in parallel the scalar multiplications $s_0 P, s_1 P, \ldots, s_{p-1} P$, running the binary method on each processor.

2. Associative fan-in: Using $\lceil p/2 \rceil$ processors, compute in parallel

$$nP = s_0 P + s_1 P + \cdots + s_{p-1} P.$$

To do this, first we separate the total sum as

$$\sum_{i=0}^{p-1} s_i = \sum_{i=0}^{\lceil p/2 \rceil - 1} s_i + \sum_{i=\lceil p/2 \rceil}^{p-1} s_i,$$

and then both halves can be computed in parallel. We proceed on each half in the same way recursively until we only need to compute a single addition of two elliptic curve points. Clearly, in the deepest level of the recursion we have to do $\lceil p/2 \rceil$ additions in parallel, and to obtain the total sum we have to make $\lceil \log p \rceil$ recursive steps.
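The two phases above can be sketched as follows. To keep the example short and self-contained, the elliptic curve group is replaced by a stand-in group, the integers modulo `M` under addition, so that $nP$ is simply `n * P % M`; this substitution, the thread pool used to mimic the $p$ processors, and all function names are assumptions of the sketch rather than part of the paper.

```python
from concurrent.futures import ThreadPoolExecutor

M = 1_000_003   # stand-in group: integers mod M under addition (assumption)

def add(a, b):
    """Group operation of the stand-in group."""
    return (a + b) % M

def binary_method(n, point):
    """Double-and-add in the stand-in group; returns the identity 0 for n == 0."""
    acc = 0
    for bit in bin(n)[2:]:
        acc = add(acc, acc)
        if bit == "1":
            acc = add(acc, point)
    return acc

def split_scalar(n, p):
    """s_i keeps the bits of n at positions congruent to i modulo p."""
    blocks = -(-max(n.bit_length(), 1) // p)
    return [n & sum(1 << (j * p + i) for j in range(blocks)) for i in range(p)]

def fan_in(values):
    """Pairwise tree summation; returns (total, number of levels used)."""
    levels = 0
    while len(values) > 1:
        pairs = [values[k:k + 2] for k in range(0, len(values), 2)]
        # each pair could be added by its own processor
        values = [add(*pr) if len(pr) == 2 else pr[0] for pr in pairs]
        levels += 1
    return values[0], levels

def parallel_multiply(n, point, p):
    parts = split_scalar(n, p)
    with ThreadPoolExecutor(max_workers=p) as pool:            # phase one
        partial = list(pool.map(lambda s: binary_method(s, point), parts))
    return fan_in(partial)                                     # phase two

total, levels = parallel_multiply(0b1011011, 12345, 4)
print(total == (0b1011011 * 12345) % M, levels)
```

Python threads do not give real speed-up for this kind of arithmetic; the pool only makes the structure of phase one explicit. The number of fan-in levels equals $\lceil \log p \rceil$.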

We can observe that if $p = 1$ then our algorithm reduces to the ordinary binary algorithm. In what follows, we analyze the running time of the $p$th-order binary method.

Theorem 2. The running time of the $p$th-order binary algorithm, in the best case, is

$$\lfloor \log n \rfloor + \lceil H(n)/p \rceil + \lceil \log p \rceil - 1$$

and, in the worst case,

$$\lfloor \log n \rfloor + H(n) - 1.$$

Proof. First, we must observe that the integer set $\{s_0, s_1, \ldots, s_{p-1}\}$ can be obtained from $n$ in negligible time. In phase one of the algorithm, each processor executes the binary algorithm to compute $s_i P$, where $i$ is the processor's index. From Corollary 1, it requires $\lfloor \log s_i \rfloor + H(s_i) - 1$ steps to do this computation. Now, since $b_{N-1} = 1$, one of the processors has to compute $N-1$ doublings, and if we suppose that all the processors must finish their computation before phase two can start, then the running time of phase one will be

$$T_1 = N + H^{*} - 2, \tag{10}$$

where

$$H^{*} = \max_{0 \le i \le p-1} H(s_i). \tag{11}$$

Since we have that

$$H(n) = \sum_{i=0}^{p-1} H(s_i), \tag{12}$$

it follows that

$$\lceil H(n)/p \rceil \le H^{*} \le H(n). \tag{13}$$

In the best case, all the bits set to one in the binary representation of $n$ are equally scattered among the integers $\{s_i\}$, so the load is perfectly balanced on all the processors, and then we have

$$H^{*} = \left\lceil \frac{H(n)}{p} \right\rceil. \tag{14}$$

And from (14), (10) and (1) we have that, in the best case, the time of phase one is given by

$$T_1 = \lfloor \log n \rfloor + \lceil H(n)/p \rceil - 1. \tag{15}$$

In phase two of our algorithm, the sum $s_0 P + \cdots + s_{p-1} P$ is computed using the associative fan-in algorithm in

$$T_2 = \lceil \log p \rceil \tag{16}$$

steps, and then the total running time for the best case will be

$$T = T_1 + T_2 = \lfloor \log n \rfloor + \lceil H(n)/p \rceil + \lceil \log p \rceil - 1. \tag{17}$$

In the worst case $H^{*} = H(n)$, which implies by the definition of $H^{*}$ in (11) that there exists some index $k$, $0 \le k \le p-1$, such that $H(s_k) = H(n)$ and, from (12), $H(s_i) = 0$ for $i \ne k$, and then $s_i = 0$ for $i \ne k$. From (9) we have that $n = s_k$ and then $s_k P = nP$. Then $nP$ is computed in phase one by processor $k$. Therefore the running time will be the same as that stated in Corollary 1.
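The quantities used in the proof can be evaluated for concrete values of $n$ and $p$. The sketch below computes the maximum Hamming weight of (11), the phase-one estimate (10), the fan-in depth (16), and the best-case and worst-case expressions of Theorem 2, all in units of one curve operation. The function names and the dictionary keys are assumptions of this example.

```python
def hamming_weight(n):
    return bin(n).count("1")

def split_scalar(n, p):
    """s_i keeps the bits of n at positions congruent to i modulo p."""
    blocks = -(-max(n.bit_length(), 1) // p)
    return [n & sum(1 << (j * p + i) for j in range(blocks)) for i in range(p)]

def running_time_estimates(n, p):
    N = n.bit_length()                               # (1): N = floor(log2 n) + 1
    h = hamming_weight(n)
    h_max = max(hamming_weight(s) for s in split_scalar(n, p))   # (11)
    t1 = N + h_max - 2                               # (10)
    t2 = (p - 1).bit_length()                        # (16): ceil(log2 p), exact
    return {
        "h_max": h_max,
        "estimate": t1 + t2,
        "best_case": (N - 1) + -(-h // p) + t2 - 1,  # (17)
        "worst_case": (N - 1) + h - 1,               # Corollary 1
    }

print(running_time_estimates(0b1111, 4))
print(running_time_estimates(0b10001, 4))
```

For `0b1111` with four processors every $s_i$ has weight one, so the estimate coincides with the best case; for `0b10001` both set bits land in $s_0$ and the load is as unbalanced as it can be.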

Some improvement can be obtained if we use the NAF as input to our algorithm instead of the binary representation of $n$. Since the NAF has fewer nonzeros, we can obtain the same running times with fewer processors. But before this we can see that a bottleneck in our algorithm is caused by the number of doublings needed in the first phase, represented by the linear term in $N$ in (10), which is independent of the number $p$ of processors.

In the derivation of the running time of our algorithm, it was supposed that the time needed to do a doubling or a point addition is almost the same, but in general it depends on the coordinate system in which the elliptic curve is represented. The best known coordinate systems are the affine and projective coordinates [19]. Two other coordinate systems, the Jacobian coordinates and the Chudnovsky Jacobian coordinates, have been proposed in [3]. The efficiency of Jacobian coordinates for elliptic curve exponentiation was discussed in [4]. In [5] a modified Jacobian coordinate system has been proposed, which gives faster doublings than affine, projective, Jacobian and Chudnovsky Jacobian coordinates. We can therefore use this modified Jacobian coordinate system to partially overcome the bottleneck of doublings in our algorithm.

4 The $p$th-Order $\tau$-ary Method

The binary anomalous curves, proposed by Koblitz [11], have been used to obtain efficient implementations of elliptic curve cryptosystems [14][20][21]. These curves are

$$E_1 : y^2 + xy = x^3 + x^2 + 1 \tag{18}$$

and

$$E_2 : y^2 + xy = x^3 + 1 \tag{19}$$

over $GF(2^m)$. For these curves it can be easily verified that if the point $(x, y)$ is on the curve, so is the point $(x^2, y^2)$. Then we can define their Frobenius automorphisms as $\varphi(x, y) = (x^2, y^2)$, which corresponds to multiplication by $\tau = (1 + \sqrt{-7})/2$ on $E_1$ and by $\tau = (-1 + \sqrt{-7})/2$ on $E_2$. Using a normal basis of $GF(2^m)$, the multiplication by $\tau$ requires only two cyclic shifts, and so it can be done in negligible time.
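A quick numerical check of the two values of $\tau$ quoted above: expanding $\tau = (\mu + \sqrt{-7})/2$ with $\mu = 1$ for $E_1$ and $\mu = -1$ for $E_2$ gives $\tau^2 - \mu\tau + 2 = 0$ and $|\tau|^2 = 2$. The parameter name `mu` and the function name `frobenius_tau` are notation introduced for this sketch, not taken from the paper.

```python
import cmath

def frobenius_tau(mu):
    """tau = (mu + sqrt(-7)) / 2 as a complex number, for mu = +1 or -1."""
    return (mu + cmath.sqrt(-7)) / 2

for mu in (1, -1):                          # mu = 1 for E_1, mu = -1 for E_2
    tau = frobenius_tau(mu)
    residual = tau ** 2 - mu * tau + 2      # should vanish up to rounding
    norm = abs(tau) ** 2                    # tau times its complex conjugate
    print(mu, abs(residual) < 1e-12, abs(norm - 2) < 1e-12)
```

The relation $\tau^2 = \mu\tau - 2$ is what allows any element $a + b\tau$ to be multiplied or divided by $\tau$ using integer arithmetic only.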

Since $\tau$ is an element of norm 2 in the Euclidean domain $\mathbb{Z}[\tau]$, any integer $n$ has a representation as

$$n = \sum_{i=0}^{\infty} c_i \tau^i \tag{20}$$

for $c_i \in \{0, \pm 1\}$. For any $n \in \mathbb{Z}[\tau]$ a representation (20) is called a NAF if $c_i \in \{0, \pm 1\}$ and $c_i c_{i+1} = 0$ for all $i \ge 0$. Solinas [20] showed the existence of a NAF of minimal weight by giving an algorithm that computes the NAF directly. If $\tau \mid n$, then $c_0 = 0$. Otherwise, $\tau^2$ divides either $n+1$ or $n-1$ and the NAF ends in $(0, -1)$ or $(0, 1)$, respectively. This process is repeated on $n/\tau$, $(n+1)/\tau^2$ or $(n-1)/\tau^2$.

Because $\varphi^m(x, y) = (x^{2^m}, y^{2^m}) = (x, y)$, any two representations which agree modulo $\tau^m - 1$ will yield the same endomorphism on the curve. Based on this property, Meier and Staffelbach [14] showed the following:

Theorem 3. Every $n \in \mathbb{Z}[\tau]$ has a representation

$$n \equiv \sum_{i=0}^{m-1} c_i \tau^i \pmod{\tau^m - 1}, \tag{21}$$

with $c_i \in \{0, \pm 1\}$.
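The digit-by-digit procedure attributed to Solinas [20] above can be sketched as follows. Elements of $\mathbb{Z}[\tau]$ are stored as integer pairs `(r0, r1)` meaning $r_0 + r_1\tau$, with $\tau^2 = \mu\tau - 2$ and $\mu = \pm 1$ selecting the curve. The pair representation, the parameter `mu`, and the function names are assumptions of this sketch; the reduction modulo $\tau^m - 1$ of Theorem 3 is not performed here.

```python
def tau_naf(r0, r1, mu):
    """tau-adic NAF digits (least significant first) of r0 + r1*tau."""
    digits = []
    while r0 != 0 or r1 != 0:
        if r0 % 2:                         # tau does not divide the element
            u = 2 - ((r0 - 2 * r1) % 4)    # u = +1 or -1 so that tau^2 | (element - u)
            r0 -= u
        else:
            u = 0                          # tau divides the element
        digits.append(u)
        # exact division by tau: (r0 + r1*tau)/tau = (r1 + mu*r0/2) - (r0/2)*tau
        r0, r1 = r1 + mu * (r0 // 2), -(r0 // 2)
    return digits

def evaluate(digits, mu):
    """Horner evaluation of sum c_i tau^i; returns (a, b) meaning a + b*tau."""
    a = b = 0
    for c in reversed(digits):
        a, b = -2 * b + c, a + mu * b      # multiply by tau, then add the digit
    return a, b

digits = tau_naf(2, 0, 1)
print(digits, evaluate(digits, 1))
```

For $n = 2$ on $E_1$ the digits encode $2 = -\tau - \tau^3$, which can be confirmed by hand from $\tau^2 = \tau - 2$.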

The binary method can be extended in a natural way to a $\tau$-ary method based on the representation of $n$ given by (21). Now we can define a parallel $p$th-order $\tau$-ary method.

Using (21) we can define the set $\{s_0, \ldots, s_{p-1}\} \subset \mathbb{Z}[\tau]$, where

$$s_i = \sum_{j=0}^{\lceil m/p \rceil - 1} c_{jp+i}\, \tau^{jp+i} \tag{22}$$

for $i = 0, 1, \ldots, p-1$.

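1. Using $p$ processors, compute in parallel the scalar multiplications $s_0 P, s_1 P, \ldots, s_{p-1} P$, running the $\tau$-ary algorithm on each processor.

2. Associative fan-in: Using $\lceil p/2 \rceil$ processors, compute

$$\sum_{k=0}^{p-1} s_k P$$

with the associative fan-in algorithm, in $\lceil \log p \rceil$ steps.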
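The splitting (22) acts on the digit vector $(c_0, c_1, \ldots)$ rather than on the bits of an integer. The sketch below distributes an arbitrary sample digit vector over $p$ processors and checks, with exact arithmetic in $\mathbb{Z}[\tau]$, that the pieces sum back to the original element. The sample digits, the pair representation `(a, b)` for $a + b\tau$, the parameter `mu` with $\tau^2 = \mu\tau - 2$, and the function names are all assumptions of this example.

```python
def split_digits(digits, p):
    """Digit vectors of s_0..s_{p-1}: s_i keeps c_k only where k = i (mod p)."""
    return [[c if k % p == i else 0 for k, c in enumerate(digits)]
            for i in range(p)]

def evaluate(digits, mu):
    """Horner evaluation of sum c_k tau^k; returns (a, b) meaning a + b*tau."""
    a = b = 0
    for c in reversed(digits):
        a, b = -2 * b + c, a + mu * b      # multiply by tau, then add the digit
    return a, b

def weight(digits):
    """Number of nonzero digits."""
    return sum(1 for c in digits if c)

digits = [1, 0, -1, 0, 0, 1, 0, -1, 0, 1]  # arbitrary sample with c_k in {0, +1, -1}
mu, p = 1, 3
parts = split_digits(digits, p)
values = [evaluate(s, mu) for s in parts]
total = (sum(v[0] for v in values), sum(v[1] for v in values))
print(total == evaluate(digits, mu), [weight(s) for s in parts])
```

Since multiplication by $\tau$ is assumed to be negligible on these curves, the cost on each processor is driven by the number of nonzero digits it receives, which is the list printed at the end.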

If we define $H(n)$ as the number of nonzeros in the representation given by (21), we can state the following theorem:

Theorem 4. The running time of the $p$th-order $\tau$-ary method, in the best case, is

$$\lceil H(n)/p \rceil + \lceil \log p \rceil - 1$$

and

$$H(n) - 1$$

in the worst case.

Proof. The proof of this theorem is the same as that of Theorem 2, except that the time for phase one will be

$$T_1 = H^{*} - 1 \tag{23}$$

where $H^{*}$ is defined as in (11).

Meier and Staffelbach [14] conjecture, based on experimental evidence, that on average half of the $c_i$ will be nonzero. So we can expect that when $p \approx m/2$ the running time will be close to $\log H(n)$. Further improvements can be made to the $\tau$-ary representation to reduce the number of nonzeros [21], so that a smaller number of processors can bring a similar speed-up in the $p$th-order $\tau$-ary method.

5 Conclusions

We have discussed in this paper an extension of the binary algorithm which, by the use of parallel processing, can speed up the computation of scalar multiplications on elliptic curves. We have also discussed some ways to improve this algorithm, and we have shown how it can be especially efficient for Koblitz curves.

References

1. M. Adleman and K. Kompella, Using smoothness to achieve parallelism, Proceedings of the 20th ACM Symposium on the Theory of Computing, pp. 528-538, 1988.

2. P.W. Beame, S.A. Cook and H.J. Hoover, Log depth circuits for division and related problems, SIAM Journal on Computing, vol. 15, pp. 994-1003, 1986.

3. D.V. Chudnovsky and G.V. Chudnovsky, Sequences of numbers generated by addition in formal groups and new primality and factorization tests, Advances in Applied Mathematics, vol. 7, pp. 385-434, 1986.

4. H. Cohen, A. Miyaji and T. Ono, Efficient elliptic curve exponentiation, Advances in Cryptology - Proceedings of ICICS'97, LNCS 1334, pp. 282-290, 1997.

5. H. Cohen, A. Miyaji and T. Ono, Efficient elliptic curve exponentiation using mixed coordinates, Advances in Cryptology - ASIACRYPT'98, LNCS 1514, pp. 51-65, 1998.

6. D.M. Gordon, A survey of fast exponentiation methods, Journal of Algorithms, vol. 27, pp. 129-146, 1998.

7. J. Guajardo and C. Paar, Efficient algorithms for elliptic curve cryptosystems, Advances in Cryptology - CRYPTO'97, LNCS 1294, pp. 342-356, 1997.

8. G. Harper, A. Menezes and S. Vanstone, Public-key cryptosystems with very small key lengths, Advances in Cryptology - EUROCRYPT'92, LNCS 658, pp. 163-173, 1993.

9. D.E. Knuth, The Art of Computer Programming, Vol. 2: Seminumerical Algorithms, Addison-Wesley, 2nd Ed., 1981.

10. N. Koblitz, Elliptic curve cryptosystems, Mathematics of Computation, vol. 48, pp. 203-209, 1987.

11. N. Koblitz, CM-curves with good cryptographic properties, Advances in Cryptology - CRYPTO'91, LNCS 576, pp. 279-287, 1992.

12. K. Koyama and T. Tsuruoka, Speeding up elliptic cryptosystems by using a signed binary window method, Advances in Cryptology - CRYPTO'92, LNCS 740, pp. 345-357, 1993.

13. C.H. Lim and P.J. Lee, More flexible exponentiation with precomputation, Advances in Cryptology - CRYPTO'94, LNCS 389, pp. 95-107, 1994.

14. W. Meier and O. Staffelbach, Efficient multiplication on certain nonsupersingular elliptic curves, Advances in Cryptology - CRYPTO'92, vol. 740, pp. 333-344, 1993.

15. V. Miller, Uses of elliptic curves in cryptography, Advances in Cryptology - CRYPTO'85, LNCS 218, pp. 417-426, 1986.

16. F. Morain and J. Olivos, Speeding up the computations on an elliptic curve using addition-subtraction chains, Inform. Theor. Appl., 24, 531-543, 1990.

17. R. Rivest, A. Shamir and L. Adleman, A method for obtaining digital signatures and public key cryptosystems, Communications of the ACM, vol. 21, no. 2, pp. 120-126, 1978.

18. R. Schroeppel, H. Orman, S. O'Malley and O. Spatscheck, Fast key exchange with elliptic curve systems, Advances in Cryptology - CRYPTO'95, LNCS 963, pp. 43-56, 1995.

19. J.H. Silverman, The Arithmetic of Elliptic Curves, GTM 106, Springer Verlag, 1986.

20. J. Solinas, An improved algorithm for arithmetic on a family of elliptic curves, Advances in Cryptology - CRYPTO'97, LNCS 1294, pp. 357-371, 1997.

[...] $GF(2^n)$, SIAM Journal on Computing, vol. 19, pp. 711-717, 1990.

23. J. von zur Gathen, Efficient and optimal exponentiation in finite fields,