RIT Scholar Works
Theses
Thesis/Dissertation Collections
2005
Hardware and Software Multi-precision
Implementations of Cryptographic Algorithms
Muhammad A. Janjua
Hardware and Software Multi-precision Implementations of Cryptographic Algorithms
by
Muhammad Ali Janjua
A thesis submitted in partial fulfillment of the requirements for the degree of
Master of Science in Computer Engineering
Approved By:
Dr. Marcin Lukowiak
Supervised by
Dr. Marcin Lukowiak
Department of Computer Engineering
Kate Gleason College of Engineering
Rochester Institute of Technology
Rochester, NY
June, 2005
Primary Advisor - R.I.T. Dept. of Computer Engineering
Dr. Stanislaw P. Radziszowski
Secondary Advisor - R.I.T. Dept. of Computer Science
Dr. Muhammad Shaaban
Title of Thesis: Hardware and Software Multi-precision Implementations of
Cryptographic Algorithms
Name of Author: Muhammad Ali Janjua
Degree: Master of Science
Major: Computer Engineering
College: Kate Gleason College of Engineering
As per current Rochester Institute of Technology (RIT) guidelines for completion
of my degree, I understand that I need to submit a copy of my Master's thesis to the RIT
Archives. I hereby permit RIT and its agents to archive and make use of my thesis or
dissertation in whatever forms necessary. I retain the ownership rights to the copyright of
the thesis or dissertation and also retain the rights to use all or part of my thesis in my
future work.
Acknowledgements

I would like to offer deep and sincere thanks to God, the Gracious and Merciful, who blessed me with the capabilities of heart and mind that led me to successfully complete this thesis. I am indebted to my family members for giving me courage and confidence with all their hopes and prayers for me. Now, as I submit my thesis, I express my heartiest gratitude to all those who helped me in completing it.

Working on this thesis has been an experience and an education in its own way. For the first time in my entire educational career I was able to work thoroughly within a challenging environment, which made me learn more from an industrial point of view. My deepest gratitude is for Dr. Marcin Lukowiak, my thesis advisor, who was a guiding light for me. I found his expertise spread equally over theoretical as well as practical aspects. In particular, he guided me in overcoming the problems that occurred from time to time during the completion of this thesis.

I would like to thank the members of my thesis committee, Dr. Stanislaw Pawel Radziszowski and Dr. Muhammad Shaaban, for taking the time from their busy schedules to advise me on related questions, review my work, and prepare me for the final defense.

Finally, I would like to thank all of my professors with whom I took various courses, which helped me think creatively and dynamically and contributed to the success of my thesis. I am thankful to my supervisor during my internship, Mr. Jim O'Connor, for
Abstract

Software implementations of cryptographic algorithms are considered to be very slow when they require multi-precision arithmetic operations on very long integers. These arithmetic operations may include addition, subtraction, multiplication, division and exponentiation. Several research papers have been published providing different solutions to make these operations faster.

The Digital Signature Algorithm (DSA) is a cryptographic application that requires multi-precision arithmetic operations. These arithmetic operations are mostly based upon modular multiplication and exponentiation on integers of the size of 1024 bits. The use of such numbers is an essential part of providing high security against cryptanalytic attacks on the authenticated messages. When these operations are implemented in software, performance in terms of speed becomes very low.

The major focus of the thesis is the study of various arithmetic operations for public key cryptography and the selection of fast multi-precision arithmetic algorithms for hardware implementation. The selected algorithms are implemented in hardware and software for performance comparison, and they are used to implement the Digital Signature Algorithm for
Table of Contents

Acknowledgements
Abstract
Table of Contents
List of Figures
List of Tables
Glossary
Chapter 1. Introduction
1.1 Scope of Research
1.2 Organization of Thesis
Chapter 2. Cryptography, an introduction
2.1 Cryptography
2.1.1 Digital Signatures
2.1.1.1 RSA Digital Signature
2.1.1.2 ElGamal Digital Signature Scheme
2.1.1.3 Digital Signature Algorithm
2.2 Summary
Chapter 3. Multi-precision arithmetic in cryptography
3.1 Congruence
3.2 Greatest Common Divisor (GCD)
3.3 Euclidean Algorithm
3.4 Modular Exponentiation
3.4.1 Right-to-left binary exponentiation
3.4.2 Left-to-right binary exponentiation
3.5 Common Algorithms used in Cryptography
3.5.1 Extended Euclidean Algorithm
3.5.2 Primality Testing
3.6 Summary
Chapter 4. Hardware implementations of the multi-precision modular arithmetic methods
4.1 Background Work
4.2 Classical Modular Reduction
4.3 Montgomery Modular Reduction without trial division
4.4 Montgomery Modular Multiplication without trial division [7]
4.4.1 Hardware implementation of Montgomery Modular multiplication
4.4.1.1 Design with two adders
4.4.1.1.1 Simulation results
4.4.1.1.2 Synthesis results
4.4.1.2 Design with two adders and a multiplexer
4.4.1.2.1 Simulation results
4.4.1.2.2 Synthesis results
4.5 Montgomery Modular Exponentiation
4.5.1 Left-to-Right Montgomery modular exponentiation algorithm
4.5.2 Right-to-Left Montgomery modular exponentiation algorithm
4.5.2.1 Hardware implementation of Right-to-Left Montgomery modular exponentiation algorithm
4.5.2.1.1 Simulation of Right-to-Left Montgomery modular exponentiation
4.5.2.1.2 Synthesis of Right-to-Left Montgomery modular exponentiation
4.6 Summary
Chapter 5. Hardware and Software implementation of Digital Signature Algorithm using Montgomery Modular methods
5.1 Software Implementation of Digital Signature Algorithm
5.1.1 Results and Analysis of Software Implementation
5.2 Hardware Implementation of Digital Signature Standard
5.2.1.1 Simulation results for DSA-Signature block
5.2.1.2 Synthesis results for DSA-Signature block
5.2.1.3 Comparison between 1024 bit hardware and its equivalent software design
5.2.2 Hardware implementation of DSA-Verification Operation
5.2.2.1 Simulation results for DSA-Verification block
5.2.2.2 Synthesis results for DSA-Verification block
5.2.2.3 Comparison between 1024 bit hardware and its equivalent software design
5.3 Summary
Chapter 6. Conclusions and future work
List of Figures

Figure 2.1: Public Key Cryptosystem
Figure 2.2: Data flow diagram of Digital Signature Algorithm using SHA-1
Figure 2.3: Block diagram of Digital Signature Algorithm used for the thesis research
Figure 4.1: Top level block diagram of Montgomery modular multiplier
Figure 4.2: Port level block diagram of Montgomery modular multiplier
Figure 4.3: Block level diagram of Montgomery Modular Multiplier using two adders and two multipliers
Figure 4.4: Finite state machine for Montgomery multiplier with two adders design
Figure 4.5: Waveform simulations for 4-bit Montgomery modular multiplier design with two adders. Design enable and data load view
Figure 4.6: Waveform simulations for 4-bit Montgomery modular multiplier using design with two adders, output = $ABR^{-1} \pmod{M}$
Figure 4.7: Waveform simulations for 4-bit Montgomery modular multiplier using design with two adders, output = $AB(R^2) \pmod{M}$
Figure 4.8: Block diagram of Montgomery modular multiplier using multiplexer and two adders
Figure 4.9: Input and output ports of the Montgomery Modular exponentiation design
Figure 4.10: Block diagram of right-to-left Montgomery Modular Exponentiation Algorithm
Figure 4.11: Block diagram of one ADDMUX block used in right-to-left Montgomery modular exponentiation algorithm
Figure 4.12: Data flow level block diagram of right-to-left Montgomery Modular Exponentiation
Figure 4.13: Finite State Machine of right-to-left Montgomery Modular Exponentiation Algorithm
Figure 4.14: Wave form simulations showing the data inputs of p, e, m and c with output generated
Figure 4.15: Wave form simulations showing the behavior of the registers used in the design. This simulation is the first half of the total simulation
Figure 4.16: Wave form simulations showing the behavior of the registers used in the design. This simulation is the second half of the total simulation
Figure 5.1: Data flow diagram in the hardware implementation of DSA signature block using data produced by the software implemented block
Figure 5.2: Block diagram of DSA Signature Block using two Montgomery modular exponentiation blocks
Figure 5.3: Port-level detail of DSA signature block
Figure 5.5: Shift register to load data sets in shift left mode
Figure 5.6: Finite State Machine of DSA Signature Module
Figure 5.7: Wave form simulation for 12 bit DSA signature design showing the load operation completed in 6 clock cycles
Figure 5.8: Output generated at 1405 ns for 12 bit DSA signature design using 2 ns clock. Wave form shows the final output operation
Figure 5.9: Data flow diagram of hardware implementation of DSA verification block
Figure 5.10: Block diagram of DSA Verification Block using two Montgomery modular exponentiation blocks
Figure 5.11: Port-level detail of DSA verification block
Figure 5.12: Finite state machine for DSA verification block
Figure 5.13: Wave form simulations for the load operation of DSA verification block
Figure 5.14: Wave form simulations for the final output of 12 bit DSA verification

List of Tables
Table 4.1: Comparison of FPGA resources used with minimum clock
Table 4.2: Comparison of simulation time to complete the modular multiplication operation in hardware and software
Table 4.3: Comparison of FPGA resources used for Montgomery modular multiplier with multiplexer
Table 4.4: Control signals for shift registers generated by finite state machine
Table 4.5: Control signals generated by FSM for registers in design in LOAD with red_montld sub-state
Table 4.6: Control signals generated by FSM for registers in REDUCEMONT state
Table 4.7: Control signals generated by FSM for registers in LOAD with squmultld state
Table 4.8: Control signals generated by FSM for registers in SQU_MULT state
Table 4.9: Control signals for registers in LOAD with finaloutld state
Table 4.10: Control signals generated by FSM for registers in FINALOUT state
Table 4.11: Results taken from the synthesis of different sizes of the design of Montgomery modular exponentiation algorithm
Table 5.1: Values generated at the output of the software implementation of Digital
Table 5.2: Total simulation time for the software blocks of DSA signature and verification operations. The blocks considered for timing analysis were the same as modeled in VHDL for synthesis
Table 5.3: Data input for 12 and 32 bit hardware block of DSA Signature
Table 5.4: Data produced from the DSA Signature block for DSA verification block
Table 5.5: Data input produced by software for 1024 bit hardware block of DSA Signature
Table 5.6: Data produced from the DSA Signature block for DSA verification block
Table 5.7: Synthesis results taken for 12, 32 and 1024 bit DSA Signature designs
Table 5.8: Comparison between hardware and software implementation in terms of speed
Table 5.9: Inputs to DSA verification block
Table 5.10: Outputs from DSA verification block except r
Table 5.11: Input values for DSA verification block
Table 5.12: Output values from DSA verification block except r, which is given here for comparison
Table 5.13: Synthesis results taken for 12, 32 and 1024 bit DSA verification
Table 5.14: Comparison between hardware and software implementation in terms of

Glossary
DSS: Digital Signature Standard.
DSA: Digital Signature Algorithm.
RSA: A public key cryptosystem named after its inventors, Rivest, Shamir, and Adleman.
SHA-1: 160 bit secure hashing standard.
GCD: Greatest Common Divisor.
ASIC: Application Specific Integrated Circuit.
VHDL: Very high speed integrated circuit Hardware Description Language.
Key: A number or equivalent representation form which is used to encrypt or decrypt information.
Primality Testing: Any algorithm used to verify whether a number is prime or not.
CPU: Central Processing Unit.
FSM: Finite State Machine. It is a control block that controls the hardware data flow logic.
Synthesis: Conversion of code written in VHDL to describe the actual behavior of hardware into a gate level model. The gate level model represents logical hardware components, which
Place and Route: In the place and route operation, the components of the gate level model generated by the synthesis operation are connected together using interconnects or wires to form a hardware design. This operation also generates an SDF file, which contains the actual timing detail of the gate level model.
SDF: Standard delay format.
FPGA: Field programmable gate array. It is used to implement the design which has been processed through the place and route operation. After place and route

Chapter 1
Introduction
Application Specific Integrated Circuits (ASICs) have outperformed many applications running on general purpose processors in terms of speed. This is due to the nature of the hardware implementation of these applications, which gives the advantages of hardware parallelism, pipelining, dedicated resources and short data transfer paths.

Multi-precision cryptographic applications require the use of very long numbers; e.g. the Digital Signature Algorithm requires modular exponentiation on operands of the size of 1024 bits. As a result, execution of a software based application on a 32 bit general purpose processor becomes very slow because of sequential data operations, longer interconnects and lack of data parallelism. Implementation of dedicated hardware can remove these speed bottlenecks, and as a result a combination of hardware and software provides better performance. This combination can also be further optimized into a System on Board or System on Chip to achieve more speedup.

Multi-precision modular arithmetic is an essential part of public key cryptosystems. An ordinary implementation of 1024 or 2048 bit modular arithmetic in hardware can affect the speed, because it requires the use of the division operation. Because of this, software based fast multi-precision algorithms cannot be implemented in hardware. The long numbers are required to provide high security against cryptanalytic attacks on the authenticated messages.

1.1 Scope of Research
The research covered in this thesis aims to recognize the speed bottlenecks in software based multi-precision cryptographic algorithms. In order to remove these bottlenecks, fast multi-precision algorithms for modular multiplication and exponentiation are analyzed and implemented in hardware. The hardware is then further used to implement the Digital Signature Algorithm (DSA) cryptosystem. The implementation of DSA has been done in both software and hardware for performance comparison. The hardware implementation is particularly targeted for those portions of DSA where

1.2 Organization of Thesis

Chapter 2 provides a brief introduction to cryptography and the use of multi-precision arithmetic. Chapter 3 then further elaborates the requirement of multi-precision arithmetic by describing the most commonly used cryptographic algorithms. Chapter 4 covers the selected modular arithmetic methods, their hardware implementation and analysis. Chapter 5 provides the implementation of the Digital Signature Algorithm using the algorithms selected in Chapter 4. Chapter 5 also gives a hardware and software comparison for speed.

Chapter 2
Cryptography, an introduction
Information security has been a major concern for many years. Several techniques have been developed to secure information over short and long range communication. This concept of securing information belongs to the field of cryptology. Generally, cryptology is the field of study concerned with the communication of data over non-secure channels. It is further subdivided into cryptography and cryptanalysis. Cryptography is the process of designing systems to secure information, whereas cryptanalysis is the process of breaking those systems. In cryptographic science, ciphertext is the term used for the encrypted information.
2.1 Cryptography

"The mathematical science used to secure the confidentiality and authentication of data by replacing it with a transformed version that can be reconverted to reveal the original data only by someone holding the proper cryptographic algorithm and key." [1]

Among ancient cryptographic systems, the first recorded use of cryptography is by the Spartans, who (400 BC) employed a cipher device called a "scytale" to send secret communications between military commanders. The scytale consisted of a tapered stick around which was wrapped a piece of parchment inscribed with the message. Once unwrapped, the parchment appeared to contain an incomprehensible set of letters; however, when wrapped around another stick of identical size, the original text appears. Many examples of ancient cryptographic systems are known today. Julius Caesar often used a simple cipher, which was later named after him, the "Caesar Cipher". His method of encryption and decryption was based upon shifting letters by three spaces.
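The shift-by-three rule described above can be sketched in a few lines. This is only an illustration: the function name `caesar` and the sample message are assumptions made here, and the sketch handles ASCII letters only.

```python
import string


def caesar(text: str, shift: int = 3) -> str:
    """Shift each ASCII letter by `shift` positions, wrapping around the alphabet.

    Hypothetical helper for illustration; characters that are not letters
    are passed through unchanged.
    """
    out = []
    for ch in text:
        if ch in string.ascii_uppercase:
            out.append(chr((ord(ch) - ord("A") + shift) % 26 + ord("A")))
        elif ch in string.ascii_lowercase:
            out.append(chr((ord(ch) - ord("a") + shift) % 26 + ord("a")))
        else:
            out.append(ch)
    return "".join(out)


cipher = caesar("ATTACK AT DAWN")  # encrypt: shift forward by three
plain = caesar(cipher, -3)         # decrypt: shift back by three
```

Decryption is the same operation with the opposite shift, which is why the key (the shift amount) must stay secret.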
With the growing need for fast and secure communication, the old cryptographic systems represent only a small fraction of what is used these days. As the distance between an encrypter and a decrypter becomes longer, security becomes more critical when transferring important information, and meeting the demand for more complex systems becomes difficult.

The encryption and decryption of a message is done by using a key. A key is a number or equivalent representation form which is used to encrypt or decrypt information.
Modern cryptography is categorized into two major types, asymmetric key cryptography and symmetric key cryptography.

Symmetric key cryptography is commonly termed secret key or private key cryptography. Private key cryptosystems use the same key for both encryption and decryption. They are mathematically less complicated than asymmetric key cryptosystems. Security in private key cryptosystems requires secure channels of communication. Generally the security also depends upon how the key is exchanged between the two parties. This exchange is usually done by using asymmetric key methods. These cryptosystems are usually implemented for information exchange over shorter distances. They do not require complex multi-precision arithmetic, thus they are not considered in this research.

Asymmetric key cryptography is also commonly termed public key cryptography. Its concept was introduced in the 1970s. The term public key is based upon the concept that anyone can retrieve a key either for encryption or for decryption, but not for both. In this case there are thus two keys: one is considered the public key, and one is secret. One use of such cryptosystems is long distance information exchange. Fast cryptanalytic attacks have made these systems insecure, thus improvement is always required as faster systems become available.

Consider
the basic structure of a public key cryptosystem, which is divided into an encrypter E and a decrypter D. Suppose E wants to send an encrypted message to D, which D then decrypts, but they are located very far from each other. Because of the distance they cannot meet, and so they cannot agree upon a key to use. The concept of the public key arises here: the decrypter sends or broadcasts a public key, and one or more encrypters encrypt their messages to form ciphertexts. The ciphertexts then go back to the decrypter, who already has the inverse of the public key, which he uses to decrypt the cipher. The inverse of the public key is kept secret.

Figure 2.1: Public Key Cryptosystem
In Figure 2.1 an attacker is also shown besides n encrypters and the decrypter. There can be several goals of the attacker; the common ones are:
a) To read the message.
b) To retrieve the secret key, and thus read all publicly encrypted messages.
c) To corrupt an original ciphertext into another ciphertext so that the decrypter gets a wrong ciphertext.
d) To deceive the decrypter by pretending to be a valid encrypter.

There are four possible types of attacks that the attackers can perform on the ciphertext:
a) Ciphertext only attack: The attacker is only able to copy the ciphertext, and then he can try different ways to decrypt it.
b) Known plaintext attack: This is a more destructive attack than the ciphertext only attack. The attacker gets a copy of the ciphertext as well as the corresponding original text. Then he can deduce the key from them. Once he has the secret key, and as long as the decrypter does not change it, the attacker is able to read all future messages.
c) Chosen plaintext attack: Using this method, if the attacker has access to the actual encryption machine, he encrypts many chosen plaintexts to deduce the key.
d) Chosen ciphertext attack: The attacker has access to the decrypter machine and tries to decrypt many symbols in order to deduce the key.
In order to secure information, many cryptosystems have been developed. The Digital Signature Algorithm is a cryptosystem which is used to secure document authentication. The next section describes digital signature algorithms in detail.

2.1.1 Digital Signatures
A signature represents the authenticity of a document and its association with the signer. Signatures have attained much importance in digital media, especially as the distance between the message signer and the signature verifier increases. When the authentication of a document is considered, an intruder may try to steal the authentication of the document by copying the original signature. The same situation exists electronically, and thus the signed document needs to be encrypted and then verified. Digital signature algorithms are used for signing documents and verifying them. There are two basic techniques usually considered for digital signatures. One is RSA signatures and the other is the ElGamal signature scheme. Digital signatures are part of public key cryptography.

2.1.1.1 RSA Digital Signature
The RSA signature [3] uses the same concept as the RSA public key cryptosystem [3]. Consider two persons, Alice and Bob, where Alice is the signer and Bob is the verifier. Bob sends a document $m$ to Alice to sign. The algorithm begins as follows:
1. Alice finds two primes $p$ and $q$ and then computes $n = pq$. She next chooses $e_A$ such that $1 < e_A < \phi(n)$ with $\gcd(e_A, \phi(n)) = 1$, and calculates $d_A$ such that $e_A d_A \equiv 1 \pmod{\phi(n)}$, where $\phi(n) = (p-1)(q-1)$. Alice publishes $(e_A, n)$ and keeps $d_A$, $p$ and $q$ private.
2. Alice's signature is $y \equiv m^{d_A} \pmod{n}$.

Bob downloads the pair $(m, y)$ and the public key $(e_A, n)$. Bob verifies the signature by using the following steps:
1. Calculate $z \equiv y^{e_A} \pmod{n}$.
2. If $z = m$, then Bob accepts the signature as valid; otherwise the signature is not valid.

2.1.1.2 ElGamal Digital Signature Scheme
The ElGamal signature scheme [3] is based upon ElGamal encryption. A difference from RSA is that with the ElGamal method, many different valid signatures can be produced for a given message. Based upon the ElGamal encryption method, the National Institute of Standards and Technology (NIST) proposed the Digital Signature Standard [17] in 1991 and adopted it as a standard in 1994.
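Both signature schemes above reduce to modular exponentiation. As a concrete illustration of the RSA signing and verification steps of Section 2.1.1.1, the following is a minimal sketch with toy numbers. The primes 61 and 53, the exponent 17, the message value 65, and all function names are assumptions chosen for readability; they are not taken from the thesis, and real keys are 1024 bits or longer. The call `pow(e, -1, phi)` computes a modular inverse and requires Python 3.8 or later.

```python
from math import gcd


def rsa_keygen(p: int, q: int, e: int):
    """Return ((e, n), d) for toy primes p, q and public exponent e."""
    n = p * q
    phi = (p - 1) * (q - 1)
    if not (1 < e < phi and gcd(e, phi) == 1):
        raise ValueError("e must satisfy 1 < e < phi(n) and gcd(e, phi(n)) = 1")
    d = pow(e, -1, phi)  # d * e = 1 (mod phi(n))
    return (e, n), d


def rsa_sign(m: int, d: int, n: int) -> int:
    """Signature y = m^d (mod n)."""
    return pow(m, d, n)


def rsa_verify(m: int, y: int, e: int, n: int) -> bool:
    """Accept if z = y^e (mod n) equals m."""
    return pow(y, e, n) == m % n


(e, n), d = rsa_keygen(61, 53, 17)  # illustrative toy parameters
m = 65                              # stand-in for the document m
y = rsa_sign(m, d, n)
ok = rsa_verify(m, y, e, n)
```

Both `rsa_sign` and `rsa_verify` are single modular exponentiations, which is the operation the later chapters target for hardware implementation.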
2.1.1.3 Digital Signature Algorithm

The Digital Signature Algorithm (DSA) [3], based upon the Digital Signature Standard, requires a hashed message $SHA(m)$, which is 160 bits long. This hashed message is then signed and verified in this algorithm. Before explaining the algorithm in detail, first consider the hash algorithm SHA, which has been used in the design.

The current standard for the hash algorithm is the Secure Hash Standard Algorithm [7], named SHA-1. It was developed by NIST along with the NSA for use with the Digital Signature Standard (DSS) and is specified within the Secure Hash Standard (SHS) of the National Institute of Standards and Technology. SHA-1 is a cryptographic message digest algorithm which takes a plaintext message $m$ of length less than $2^{64}$ bits and converts it into a 160 bit hashed message $SHA(m)$. This 160 bit hashed message $SHA(m)$ is called a message digest. It is designed to have the following properties: it is computationally infeasible to find a message which corresponds to a given message digest, or to find two different messages which produce the same message digest [8]. That is, if there is a hash for a document $m_1$, it is difficult to find a document $m_2$ which has the same hash.
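To make the 160 bit digest concrete, the sketch below computes $SHA(m)$ with the SHA-1 implementation in Python's standard `hashlib` module and converts it to the integer form that the signature equations operate on. The helper name `message_digest` and the sample message `b"abc"` are assumptions for illustration; the thesis uses its own software implementation.

```python
import hashlib


def message_digest(message: bytes) -> int:
    """Return SHA-1(message) as a non-negative integer below 2**160.

    Hypothetical helper: DSA treats the 20-byte digest as a number,
    so the bytes are interpreted in big-endian order.
    """
    digest = hashlib.sha1(message).digest()  # always 20 bytes = 160 bits
    return int.from_bytes(digest, "big")


sha_m = message_digest(b"abc")
digest_bits = len(hashlib.sha1(b"abc").digest()) * 8
```

Whatever the message length, the result fits in 160 bits, which is why the DSA prime $q$ is chosen to be 160 bits long.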
The algorithm of SHA-1 is ... with SHA-1 for both signing and verification procedures.

Figure 2.2: Data flow diagram of Digital Signature Algorithm using SHA-1. Signature generation: Message, Secure Hash Algorithm, Message Digest, DSA Sign Operation (with Private Key), Digital Signature. Signature verification: Received Message, Secure Hash Algorithm, Message Digest, DSA Verify Operation (with Digital Signature and Public Key), result Yes (Signature Verified) or No (Signature Verification Failed).
In Figure 2.2, a hash function is used in the signature generation process to obtain data called a message digest. The message digest is then used by DSA to generate the digital signature. The digital signature is sent to the verifier in the form of a signed message and a public key. The verifier verifies the signature by using the signer's public key. The same hash function must also be used in the verification process.

For this thesis research, the hash function is only used at the beginning of the signing operation, because the purpose of the research is to explain the difference between hardware and software implementations of multi-precision arithmetic algorithms. DSA is used to implement these multi-precision algorithms for performance comparisons between hardware and software.

A block diagram of a modified version of DSA is shown in Figure 2.3. A common SHA-1 block is used by both the signature generation and signature verification blocks. The signature generation block generates a public key and a signed message for the verification process. In the verification process, the signed message is verified using the public key.

Figure 2.3: Block diagram of Digital Signature Algorithm used for the thesis research.

Consider Alice as the signer and Bob as the verifier. Bob has a message digest $SHA(m)$ which is 160 bits long, and he wants Alice to sign the message. Alice signs the message and sends it back to Bob along with a public key. Bob uses the public key to verify the authenticity of the signature. Consider the following algorithm as given in [3], page 190.
Algorithm 2.1: DSA

Alice creates the public keys as follows:
1. Alice finds a prime $q$ that is 160 bits long and chooses a prime $p$ that satisfies $q \mid p-1$, i.e. $p = qx + 1$ is also prime, where $x$ is an even factor of $p-1$. The discrete log problem should be hard for this choice of $p$.
2. Let $g$ be a primitive root mod $p$ and let
$\alpha \equiv g^{(p-1)/q} \pmod{p}$ (2.2)
Then $\alpha^q \equiv 1 \pmod{p}$ must be satisfied.
3. Alice chooses a secret $a$ such that $1 < a < q-1$ and calculates
$\beta \equiv \alpha^a \pmod{p}$ (2.3)
4. Alice publishes $(p, q, \alpha, \beta)$ as the public key, and keeps $a$ secret.

Next Alice signs the message using $a$, $p$, $q$, $\alpha$, and $SHA(m)$.
1. Alice selects a random, secret integer $k$ such that $0 < k < q-1$. She then computes
$r = (\alpha^k \pmod{p}) \pmod{q}$ (2.4)
2. Next she computes
$s \equiv k^{-1}(SHA(m) + ar) \pmod{q}$ (2.5)
3. Alice's signature is $(r, s)$, which she sends to Bob.

On the other side, Bob receives the data including the public key and the signed message.
1. Bob first downloads Alice's public key information $(p, q, \alpha, \beta)$.
2. He computes
$u_1 \equiv s^{-1} SHA(m) \pmod{q}$ (2.6)
and
$u_2 \equiv s^{-1} r \pmod{q}$ (2.7)
3. Next he computes
$v = (\alpha^{u_1} \beta^{u_2} \pmod{p}) \pmod{q}$ (2.8)
4. Bob accepts the signature if and only if $v = r$.
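The key generation, signing and verification steps of the algorithm above can be traced with a small sketch. Everything concrete here is an assumption made for illustration: the toy parameters $p = 23$, $q = 11$, $g = 5$, the secret $a = 3$, the fixed $k = 7$, and the integer 9 standing in for $SHA(m)$, as well as the function names. Real DSA uses a 1024 bit $p$, a 160 bit $q$ and a fresh random $k$ for every signature. Modular inverses use `pow(x, -1, q)`, which requires Python 3.8 or later.

```python
def dsa_public_key(p: int, q: int, g: int, a: int):
    """Equations (2.2) and (2.3): alpha = g^((p-1)/q), beta = alpha^a (mod p)."""
    if (p - 1) % q != 0:
        raise ValueError("q must divide p - 1")
    alpha = pow(g, (p - 1) // q, p)
    if alpha == 1 or pow(alpha, q, p) != 1:
        raise ValueError("alpha must have order q modulo p")
    beta = pow(alpha, a, p)
    return alpha, beta


def dsa_sign(h: int, p: int, q: int, alpha: int, a: int, k: int):
    """Equations (2.4) and (2.5); h plays the role of SHA(m)."""
    r = pow(alpha, k, p) % q
    s = (pow(k, -1, q) * (h + a * r)) % q
    if r == 0 or s == 0:
        raise ValueError("degenerate signature, choose another k")
    return r, s


def dsa_verify(h: int, r: int, s: int, p: int, q: int, alpha: int, beta: int) -> bool:
    """Equations (2.6) to (2.8): accept if and only if v == r."""
    if not (0 < r < q and 0 < s < q):
        return False
    s_inv = pow(s, -1, q)
    u1 = (s_inv * h) % q
    u2 = (s_inv * r) % q
    v = (pow(alpha, u1, p) * pow(beta, u2, p)) % p % q
    return v == r


p, q, g = 23, 11, 5       # toy parameters: 23 = 2 * 11 + 1
a, k, h = 3, 7, 9         # secret key, per-signature secret, stand-in digest
alpha, beta = dsa_public_key(p, q, g, a)
r, s = dsa_sign(h, p, q, alpha, a, k)
valid = dsa_verify(h, r, s, p, q, alpha, beta)
```

Signing needs one modular exponentiation and verification needs two, all with the long modulus $p$. This is the workload that motivates the hardware exponentiation blocks discussed later.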
2.2 Summary

After a brief review of the RSA and DSA public key cryptosystems, it is evident that the modular operation is the important part of the security. In 1024 bit RSA and DSA, the arithmetic operations involving exponents for modular operations require a lot of computation cycles. For example, in the 1024 bit DSA cryptosystem, computation of the cipher $\beta \equiv \alpha^a \pmod{p}$ requires 1024 bit data to be used. Performing software based multi-precision arithmetic operations on a single general purpose processor (GPP) is very slow because of the lack of hardware parallelism and dedicated hardware resources. If dedicated hardware is used to replace the slower software blocks, then a very efficient solution can be presented as a hardware and software combination. Such problems and their solutions are explained in this thesis report. ElGamal DSA has been targeted for efficient implementation and performance comparisons, and the hardware implementation is shown in Chapter 5. The performance comparison has been done between the implementation of slower modules like modular exponentiation in software and hardware.

Chapter 3
Multi-precision arithmetic in cryptography
In modern cryptography, information is encoded and decoded in terms of numerical values. Information is first converted into numerical form using various conversion methods. The latest message conversion algorithms are the Secure Hash Standard and Message Digest. They are usually based upon the principle of hashing the messages. Once the information is converted into its numerical form $m$, it is passed through the encryption algorithm. In the encryption algorithm, certain arithmetic operations are performed on $m$ to produce a cipher $c$. When this cipher $c$ is received on the decryption side, the decryption algorithm applies the arithmetic operations which convert $c$ back to $m$. This whole process may require anything from simple to very complex mathematical operations. As cryptosystems become more secure, their mathematical complexity increases, producing many challenges for designers who must implement them in the most efficient way. Before going into the details of cryptographic arithmetic, let us review the concepts of basic number theory used in it.
3.1 Congruence

In Chapter 2 of "Introduction to Cryptography" [3], public key cryptosystems are briefly described to show the mathematical implementations for encryption and decryption. Modular arithmetic is used in these cryptosystems to obtain the results of various functions. It is also termed congruence. It is defined as,

"Let $a$, $b$ and $n$ be integers with $n \neq 0$. We say that
$a \equiv b \pmod{n}$ (3.1)
if $a - b$ is a multiple (positive or negative) of $n$." [3]

Simply, if the difference $a - b$ is integrally divisible by a number $n$ (i.e. $(a-b)/n$ is an integer), then $a$ and $b$ are congruent modulo $n$. The quantity $a$ is sometimes called the base, and the quantity $b$ is sometimes called the residue.

$a \equiv b \pmod{n}$ can also be written as
$a = b + nk$
for some integer $k$ (positive or negative). As an example, consider $a = 19$ and $n = 8$; then
$19 \equiv 3 \pmod{8}$, or $19 = 3 + 2 \cdot 8$.
In this case $k = 2$ and $b = 3$. Another example, using $a = -12$ and $n = 7$:
$-12 \equiv 37 \pmod{7}$, or $-12 = 37 + (-7 \cdot 7)$.
In this case, $k = -7$ and $b = 37$.
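The two worked examples above can be checked mechanically. The sketch below tests the definition directly, namely that $a - b$ is a multiple of $n$, and recovers the multiplier $k$ in $a = b + nk$. The function names are assumptions introduced for illustration.

```python
def is_congruent(a: int, b: int, n: int) -> bool:
    """True when a = b (mod n), i.e. when n divides a - b."""
    if n == 0:
        raise ValueError("n must be non-zero")
    return (a - b) % n == 0


def multiplier(a: int, b: int, n: int) -> int:
    """Return the integer k with a = b + n*k, assuming a = b (mod n)."""
    if not is_congruent(a, b, n):
        raise ValueError("a and b are not congruent modulo n")
    return (a - b) // n


k1 = multiplier(19, 3, 8)     # first example from the text
k2 = multiplier(-12, 37, 7)   # second example from the text
```

Note that $b$ does not have to be the smallest non-negative remainder: 37 is a valid representative of $-12$ modulo 7, just as 2 would be.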
There are four propositions for congruences, from [3]. Let $a$, $b$, $c$, $n$ be integers with $n \neq 0$.
1. $a \equiv 0 \pmod{n}$ if and only if $n \mid a$.
2. $a \equiv a \pmod{n}$.
3. $a \equiv b \pmod{n}$ if and only if $b \equiv a \pmod{n}$.
4. If $a \equiv b$ and $b \equiv c \pmod{n}$, then $a \equiv c \pmod{n}$.

3.2 Greatest Common Divisor (GCD)
Considering $a$ and $b$, the greatest common divisor [3] is the largest positive integer that divides both $a$ and $b$. It is also called the greatest common factor or highest common factor. The GCD is commonly used in public key cryptosystems, particularly for finding relative primes. Relative primes are two numbers whose GCD is 1. Consider the following examples:
GCD(13, 21) = 1,
GCD(2, 8) = 2,
GCD(22, 71) = 1.
The GCD of three or more numbers can be found by computing the GCD of two numbers at a time, as
GCD(a, b, c) = GCD(a, GCD(b, c))
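The examples and the pairwise rule above can be reproduced with Python's standard `math.gcd`. The helper `gcd_many` and the triple (12, 18, 30) are assumptions added here to illustrate the rule GCD(a, b, c) = GCD(a, GCD(b, c)); they do not appear in the thesis.

```python
from functools import reduce
from math import gcd


def gcd_many(*numbers: int) -> int:
    """GCD of any number of integers, folding math.gcd two values at a time."""
    if not numbers:
        raise ValueError("at least one number is required")
    return reduce(gcd, numbers)


examples = [gcd(13, 21), gcd(2, 8), gcd(22, 71)]  # the three pairs from the text
triple = gcd_many(12, 18, 30)                     # illustrative triple
nested = gcd(12, gcd(18, 30))                     # same value via the pairwise rule
```

Two numbers are relatively prime exactly when this function returns 1, which is the test needed when choosing an RSA public exponent.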
There are two methods for finding the GCD:
1. Considering $a$ and $b$ as positive integers, the GCD can be computed by first factorizing the numbers into primes, e.g. $1728 = 2^6 \cdot 3^3$ and $135 = 3^3 \cdot 5$. Taking the smallest exponent of each common prime gives GCD(1728, 135) $= 3^3 = 27$.
2. If the numbers are very large, factorization is very hard. The GCD can then be calculated by using the Euclidean algorithm, which is explained next.

3.3 Euclidean Algorithm
The Euclidean algorithm [3] is generally used to find the GCD of two integers. Suppose a and b are two integers; then there exist k and l such that

$ka + lb = c$   (3.2)

where c is the GCD of a and b.
Algorithm 3.1: Euclidean-Algorithm (a, b)
1. r_0 = a; r_1 = b; m = 1;
2. while (r_m ≠ 0) do
   2.1 q_m = floor(r_{m-1} / r_m);
   2.2 r_{m+1} = r_{m-1} - q_m r_m;
   2.3 m = m + 1;
3. m = m - 1;
// r_m = GCD (a, b)
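Algorithm 3.1 can be sketched in a few lines of Python. This is an illustrative version only: instead of the indexed sequence r_0, r_1, ... it keeps just the last two remainders, and the function name `euclid_gcd` is hypothetical.

```python
def euclid_gcd(a: int, b: int) -> int:
    """GCD of two non-negative integers by repeated division with remainder."""
    r_prev, r_curr = a, b              # play the roles of r_{m-1} and r_m
    while r_curr != 0:
        q = r_prev // r_curr           # step 2.1: quotient q_m
        r_prev, r_curr = r_curr, r_prev - q * r_curr   # step 2.2
    return r_prev                      # last non-zero remainder


print(euclid_gcd(1728, 135))  # 27
print(euclid_gcd(13, 21))     # 1
```

For 1728 and 135 the remainders are 108, 27 and 0, so the result 27 agrees with the factorization method.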
This algorithm only gives the GCD of two integers. In order to find the multiplicative inverses of a and b, another version, known as the extended Euclidean algorithm, can be used.

3.4 Modular Exponentiation
Modular exponentiation [3] is a very important part of the security of most public key cryptosystems. Modern cryptosystems use very long numbers, which causes arithmetic operations like modular exponentiation to run very slowly. Consider the example $2^{1027} \pmod{637} = 37$. If it is computed by repeatedly multiplying by 2 and reducing modulo 637, it takes 1027 iterations, which is a very slow operation. On the other hand, successively squaring both sides and reducing the result with the modulus reduces the number of iterations. For the above example, start with $2^2 \equiv 4 \pmod{637}$ and repeatedly square both sides to obtain the following congruences:

$2^{4} \equiv 16$
$2^{8} \equiv 256$
$2^{16} \equiv 256^2 \equiv 562$
$2^{32} \equiv 529$
$2^{64} \equiv 198$
$2^{128} \equiv 347$
$2^{256} \equiv 16$
$2^{512} \equiv 256$
$2^{1024} \equiv 562$

Since the binary representation of 1027 is 10000000011, it follows that

$2^{1027} \equiv 562 \cdot 4 \cdot 2 \equiv 37 \pmod{637}$,

where 562, 4 and 2 are the values of $2^{1024}$, $2^{2}$ and $2^{1}$, selected by the logic 1's in the binary representation of 1027.
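The worked example can be reproduced with a short Python sketch that builds the table of successive squares and multiplies together the entries selected by the 1 bits of the exponent. The function name `power_table` is an assumption made for this illustration.

```python
def power_table(base: int, modulus: int, max_bit: int) -> dict:
    """Map 2**i to base**(2**i) mod modulus for i = 0 .. max_bit."""
    table = {}
    value = base % modulus
    for i in range(max_bit + 1):
        table[1 << i] = value
        value = (value * value) % modulus   # one squaring per table entry
    return table


modulus, exponent = 637, 1027
table = power_table(2, modulus, 10)

result = 1
for power, value in table.items():
    if exponent & power:                    # this bit of the exponent is 1
        result = (result * value) % modulus

print(bin(exponent))                        # 0b10000000011
print(table[1024], table[2], table[1])      # 562 4 2
print(result)                               # 37
```

Only ten squarings and two further multiplications are needed, compared with 1027 iterations for the naive method.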
Efficient modular exponentiation is based upon the right-to-left and left-to-right binary exponentiation algorithms.

3.4.1 Right-to-left binary exponentiation
The inputs to the right-to-left binary exponentiation [12] are x and e, where x is the base and e is the power.

Algorithm 3.2: Expo (x, e)
1. A ← 1, S ← x.
2. while (e ≠ 0) do:
   2.1 if e is odd, then A ← A · S.
   2.2 e ← floor(e / 2).
   2.3 if e ≠ 0, then S ← S · S.
3. return A.
4. End Expo (x, e).
In the above algorithm the loop runs until e = 0. At line 2.1, e is checked and, if e is odd, A ← A · S is computed. Whether e is odd or even can easily be checked using the LSB of e: if e(0) = '1', then e is odd, else it is even. e is further divided by 2 at line 2.2, which in hardware corresponds to a right shift register: when the register shifts right, the number is divided by 2. At line 2.3, S ← S · S is computed. The multiplications at lines 2.1 and 2.3 can be executed in parallel if implemented in hardware, although there is a dependency in the loop between lines 2.1 and 2.3, i.e. S is updated at line 2.3 and then used by A at line 2.1 in the next iteration. This dependency does not conflict with the parallelism, because the multiplier at line 2.1 needs the updated value of S in the next iteration, but not within the same iteration.
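A software sketch of Algorithm 3.2 is given below. It is illustrative only: the extra parameter `n` (the modulus applied after every multiplication) is an assumption added here, since the algorithm as listed shows plain exponentiation, and the function name `expo_right_to_left` is hypothetical.

```python
def expo_right_to_left(x: int, e: int, n: int) -> int:
    """Compute x**e mod n by scanning the bits of e from LSB to MSB."""
    A, S = 1, x % n
    while e != 0:
        if e & 1:                 # line 2.1: LSB of e is 1, so e is odd
            A = (A * S) % n
        e >>= 1                   # line 2.2: right shift divides e by 2
        if e != 0:
            S = (S * S) % n       # line 2.3: square for the next bit
    return A


print(expo_right_to_left(2, 1027, 637))  # 37
```

Within one iteration the product A · S and the square S · S both read the old value of S, which is why two hardware multipliers could evaluate them at the same time.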
3.4.2 Left-to-right binary exponentiation
The inputs to the left-to-right binary exponentiation [12] algorithm are x and e.

Algorithm 3.3: Expo (x, e)
1. A ← 1.
2. for i = t down to 0, loop
   2.1 A ← A · A.
   2.2 if (e_i = 1), A ← A · x.
3. Return A.
4. End Expo (x, e).
In left-to-right binary exponentiation, the loop runs once for every bit of e. Compared to the right-to-left algorithm, this algorithm has a data dependency within the same iteration, which does not allow parallelism to be achieved. This reduces the performance in terms of speed, although the area requirements of the hardware implementation are half in the left-to-right algorithm as compared to the right-to-left algorithm. This is because the same multiplier can be used for both A ← A · A and A ← A · x iteratively.
In chapter 4, these two methods are further explained and one method is then chosen for hardware implementation.
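For comparison, Algorithm 3.3 can be sketched in the same style. As before, the modulus parameter `n` and the function name `expo_left_to_right` are assumptions made for this illustration and are not taken from the thesis.

```python
def expo_left_to_right(x: int, e: int, n: int) -> int:
    """Compute x**e mod n by scanning the bits of e from MSB to LSB."""
    A = 1
    for i in range(e.bit_length() - 1, -1, -1):
        A = (A * A) % n           # line 2.1: always square
        if (e >> i) & 1:          # line 2.2: multiply when bit e_i is 1
            A = (A * x) % n
    return A


print(expo_left_to_right(2, 1027, 637))  # 37
```

Here the multiplication by x needs the result of the squaring from the same iteration, so the two operations must run one after the other and can share a single multiplier.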
3.5 Common Algorithms used in Cryptography
Most cryptosystems require complex computation based upon certain algorithms. These algorithms are helpful in providing fast computations. Two of them are described here: the Extended Euclidean Algorithm and the primality test based upon the Miller-Rabin theorem. The description of these algorithms in chapter 3 signifies the multi-precision modular arithmetic used for public key cryptosystems. The purpose of showing the modular arithmetic is the hardware and software implementation of multi-precision arithmetic algorithms for performance comparison. The computation time of hardware and software implementations varies, and this difference is the target of this thesis research.

3.5.1 Extended Euclidean Algorithm
The Extended Euclidean algorithm [3] is a version of the Euclidean algorithm. In addition to computing the GCD, this algorithm also generates the multiplicative inverses of a and b, but only if a and b are relatively prime numbers.

Algorithm 3.4: Extended-Euclidean-Algorithm (a, b)
// a and b are positive integers
1. r_0 = a; r_1 = b;
2. t_0 = 0; t_1 = 1;
3. s_0 = 1; s_1 = 0;
4. m = 1;
5. while (r_m ≠ 0) do
   5.1 q_m = floor(r_{m-1} / r_m);
   5.2 r_{m+1} = r_{m-1} - q_m r_m;
   5.3 t_{m+1} = t_{m-1} - q_m t_m;
   5.4 s_{m+1} = s_{m-1} - q_m s_m;
   5.5 m = m + 1;
6. m = m - 1;
// r_m = GCD (a, b) and s_m a + t_m b = r_m [4]
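The following Python sketch follows Algorithm 3.4 but keeps only the last two values of each sequence. The function names `extended_euclid` and `mod_inverse` are hypothetical, and the small RSA-style numbers in the usage lines (e = 17, modulus 3120) are a textbook-sized assumption, not values from the thesis.

```python
def extended_euclid(a: int, b: int):
    """Return (g, s, t) with g = GCD(a, b) and s*a + t*b = g."""
    r_prev, r_curr = a, b
    s_prev, s_curr = 1, 0
    t_prev, t_curr = 0, 1
    while r_curr != 0:
        q = r_prev // r_curr
        r_prev, r_curr = r_curr, r_prev - q * r_curr   # step 5.2
        t_prev, t_curr = t_curr, t_prev - q * t_curr   # step 5.3
        s_prev, s_curr = s_curr, s_prev - q * s_curr   # step 5.4
    return r_prev, s_prev, t_prev


def mod_inverse(e: int, n: int) -> int:
    """Multiplicative inverse of e modulo n; requires GCD(e, n) = 1."""
    g, s, _ = extended_euclid(e, n)
    if g != 1:
        raise ValueError("inverse exists only for relatively prime inputs")
    return s % n


g, s, t = extended_euclid(240, 46)
print(g, s * 240 + t * 46)       # 2 2
print(mod_inverse(17, 3120))     # 2753
```

The inverse falls out of the identity s·a + t·b = 1: reducing both sides modulo b leaves s·a ≡ 1, so s is the inverse of a modulo b.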
Due to its ability to provide the multiplicative inverses of two given co-prime numbers, this algorithm is widely used in many cryptographic applications. As seen in the description of the RSA algorithm in chapter 2, d is computed as the inverse of e, which is the part of the public key used for encryption of the message. Besides finding multiplicative inverses of numbers, it also helps in avoiding the fractions created by the modulo division operation. The runtime of this algorithm is $O(n^2)$ for n-bit inputs.

3.5.2 Primality Testing
Primality testing [3] is required in order to verify whether a certain integer is prime or composite. For an integer N, the number of primes less than or equal to N is approximately N / ln N [4]. One property of prime numbers is that they have no factors other than 1 and themselves. Another is that every prime number except 2 is odd, but not every odd number is prime. Numbers which are not prime are called composite numbers. Prime factorization refers to expressing a number as a product of its finitely many prime factors; for an odd prime p, for example, p - 1 is an even number that can be factored in this way. When considering very long numbers, the computational time of finding whether a number is prime or not is less than that of prime factorization.
Primality testing is now known to be possible in polynomial time, after M. Agrawal, N. Kayal, and N. Saxena provided an algorithm that tests primality in polynomial time [5]. Compared to the polynomial time algorithms for primality, the randomized or probabilistic algorithms are much faster and are widely used for primality testing. Randomized algorithms are based upon a yes-biased Monte Carlo algorithm, where a "yes" answer is always correct but a "no" answer is probabilistic, meaning it can be incorrect. In the Miller-Rabin case the question asked is whether n is composite, so an answer of "composite" is always correct while an answer of "prime" may be wrong. The most common probabilistic algorithm used for primality testing is the Miller-Rabin test [3].
Algorithm 3.5: Miller-Rabin (n)
1. define a, k, m, x_0, x_1 as integers.
2. choose a as a random integer.
3. m = n - 1; k = 0;
4. while (m mod 2 = 0) do
   4.1 k = k + 1;
   4.2 m = m / 2;
5. end while;
6. x_0 = a^m (mod n);
7. for i = 1 to k loop
   7.1 x_1 = x_0^2 (mod n);
   7.2 if (x_1 = 1 and x_0 ≠ 1 and x_0 ≠ n - 1) then return composite;
   7.3 x_0 = x_1;
8. end loop;
9. if (x_1 ≠ 1) then return composite;
10. return prime;
End Miller-Rabin (n).
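A possible Python rendering of Algorithm 3.5 is sketched below. It is an illustration under stated assumptions: `miller_rabin_round` takes the base a as an explicit argument so that a single round is repeatable, the wrapper `is_probable_prime` and its default of 20 rounds are choices made here, and the random base is drawn from the range 2 to n - 2.

```python
import random


def miller_rabin_round(n: int, a: int) -> bool:
    """One round for odd n > 2 with base a. False means n is certainly composite."""
    m, k = n - 1, 0
    while m % 2 == 0:             # steps 4 to 5: write n - 1 = 2**k * m, m odd
        k += 1
        m //= 2
    x0 = pow(a, m, n)             # step 6
    x1 = x0
    for _ in range(k):            # steps 7 to 8
        x1 = (x0 * x0) % n
        if x1 == 1 and x0 != 1 and x0 != n - 1:
            return False          # non-trivial square root of 1 found
        x0 = x1
    return x1 == 1                # step 9: otherwise composite


def is_probable_prime(n: int, rounds: int = 20) -> bool:
    """Repeat the round with independent random bases."""
    if n < 2:
        return False
    if n in (2, 3):
        return True
    if n % 2 == 0:
        return False
    return all(miller_rabin_round(n, random.randrange(2, n - 1))
               for _ in range(rounds))


print(miller_rabin_round(561, 2))    # False: 561 = 3 * 11 * 17 is composite
print(miller_rabin_round(2047, 2))   # True, although 2047 = 23 * 89
print(is_probable_prime(7919))       # True
```

The second line shows why a single "prime" answer is only probabilistic: base 2 fails to expose the composite 2047, so in practice the round is repeated with several random bases.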
Only odd numbers are tested for primality, as mentioned above, so the input to this algorithm is always an odd number. For a single randomly chosen a, the probability that the Miller-Rabin test fails to recognize a composite number is at most 1/4.

3.6 Summary
The algorithms mentioned in this chapter are the basis of many cryptosystems. Modular exponentiation is an essential part of the RSA and DSA public key cryptosystems. Chapter 4 describes the hardware implementation of modular multiplication and exponentiation algorithms. The Miller-Rabin primality test has been used as part of the thesis research in order to find prime numbers for DSA. The Extended Euclidean algorithm has been
Chapter 4
Hardware implementations of the multi-precision modular arithmetic methods

When considering a highly secure cryptosystem, the use of very long numbers is always targeted. Modern public key cryptosystems like RSA and DSA normally use numbers up to 2048 bits long. As presented in chapter 2, the RSA and DSA algorithms require modular exponentiation. When designing for speed, hardware replaces certain software portions, which requires the selection of suitable algorithms that fit the requirements of efficient hardware design. Multi-precision modular exponentiation implemented in hardware gives a considerable speedup when compared with a software implementation. The speedup is achieved through the availability of parallelism and dedicated hardware resources, instead of a general purpose CPU which schedules the targeted cryptographic application tasks along with other tasks. These are the most common considerations that lead to the selection of a dedicated hardware solution.
If modular exponentiation is implemented in hardware using the division operation, it can consume a lot of area and time resources on the hardware chip. This division can be avoided by using an algorithm for modular reduction, which is explained next in the background work.

4.1 Background Work
a) Modular Multiplication without Trial Division [5]
In this paper Montgomery proposed a method for multiplying two integers modulo N while avoiding division by N. The representation of the integers is called the N-residue. This method is useful only if several multiplications are performed for a fixed modulus N. A fixed modulus is a drawback for cryptosystems that require more than one modulus. Avoiding division is essential for hardware implementation, thus this algorithm has gained prime importance for the hardware of this thesis. Using the Montgomery modular multiplication technique, an algorithm for modular exponentiation has been implemented in hardware.
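To make the idea of division-free reduction concrete, here is a minimal software sketch of Montgomery reduction. It is not the hardware design of this thesis. It assumes an odd modulus N and R = 2^bits with R > N, all function names are hypothetical, and `pow(N, -1, R)` requires Python 3.8 or later.

```python
def montgomery_setup(N: int, bits: int):
    """Return R = 2**bits and N' with N * N' = -1 (mod R). N must be odd."""
    R = 1 << bits
    N_prime = (-pow(N, -1, R)) % R
    return R, N_prime


def redc(T: int, N: int, R: int, N_prime: int, bits: int) -> int:
    """Return T * R^(-1) mod N for 0 <= T < R*N, using only masks and shifts."""
    m = ((T & (R - 1)) * N_prime) & (R - 1)   # m = (T mod R) * N' mod R
    t = (T + m * N) >> bits                   # exact division by R
    return t - N if t >= N else t


def mont_mul_demo(x: int, y: int, N: int) -> int:
    """Multiply x*y mod N through the N-residue representation."""
    bits = N.bit_length()
    R, N_prime = montgomery_setup(N, bits)
    x_bar = (x * R) % N                       # conversion into N-residues
    y_bar = (y * R) % N
    z_bar = redc(x_bar * y_bar, N, R, N_prime, bits)   # N-residue of x*y
    return redc(z_bar, N, R, N_prime, bits)   # convert back


print(mont_mul_demo(123, 456, 637) == (123 * 456) % 637)  # True
```

The conversions into N-residue form still use ordinary division, which is why the method pays off only when many multiplications, as in modular exponentiation, share the same modulus.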
b) Montgomery Modular Exponentiation on Reconfigurable Hardware [8]
In this paper, Montgomery modular multiplication was combined with a systolic array design, which was capable of processing a variable number of bits per array cell. The design was targeted at modular exponentiation, which was further used as a design unit for 512 and 1024 bit RSA. The design was implemented on a Xilinx XC4000 Series FPGA. The RSA decryption time for 512 bits in hardware was 2.37 ms, as compared to the software implementation, which took 9.1 ms on a 150 MHz Alpha. Thus the speedup in hardware was 3.8 times as compared to software. Also, the fastest 1024 bit software implementation, taking 43.3 ms on a Pentium Pro-200 based PC, was about 4 times slower than the hardware (10.2 ms).
c) Efficient Architectures for implementing Montgomery Modular Multiplication and RSA Modular Exponentiation on Reconfigurable Logic [6]
This paper presents a review of some existing architectures for the implementation of Montgomery modular multiplication and exponentiation on an FPGA. Three different architectures of the Montgomery multiplier are described. These architectures are implemented, and a comparison of area and speed is given. A comparison of two modular exponentiation algorithms using Montgomery multiplication is also described. A good selection of modular multiplier and exponentiator has been made to implement the RSA cryptosystem. For this thesis research, the algorithms presented in this paper have been chosen for hardware implementation. Instead of RSA, the selected architectures are implemented for DSA.
4.2 Classical Modular Reduction [3]
Consider two positive integers, x and y, and the modulus M. The classical modular reduction can be represented by the following steps:
a) Consider the operations x * y, x + y, x - y, x, y to be reduced modulo M.
b) Compute the remainder r of any of the arithmetic operations x * y, x + y, x - y, x, y divided by M.
c) return r.
In the steps above, division is used in performing the modulo reduction operation. On the other hand let's
considerMontgo