International Journal of Emerging Technology and Advanced Engineering
Website: www.ijetae.com (ISSN 2250-2459, Volume 2, Issue 4, April 2012)
Cryptographic Unit of Embedded Processor in FPGA
Implementing for Speed and Security
Bharatbhushan V. Panzade (1), Prof. Trupti H. Nagrale (2)
(1) M.E. (II Year, Embedded System and Computing), (2) Associate Professor (CSE), G. H. Raisoni College of Engg., Nagpur (M.S.), INDIA
Abstract:- Security of networking and data communications has become a vital requirement since the emergence and widespread use of computer networks. The main goal of the presented work is to implement the RSA asymmetric (public-key) cryptography algorithm and ECC in an embedded processor using an FPGA. We implement the cryptographic unit of an enhanced embedded processor, which transmits data securely. Because the unit is part of the processor, we use the FPGA to speed up its arithmetic operations. A cryptographic unit can also be implemented with software algorithms or as an ASIC, but to achieve high acceleration and more secure key distribution, the VHDL code for RSA and ECC is downloaded to the FPGA and attached to the embedded processor as its cryptographic unit.
Keywords:- FPGA, ECC, RSA, multiplier, cryptography.
I. INTRODUCTION
Streamlining the resources of general-purpose processors for the needs of a specific application domain is nowadays widely employed in embedded systems. Various microprocessor vendors have developed architectural enhancements for fast multimedia processing. Similar to multimedia applications, public-key cryptosystems are suitable for processor specialization, since most software algorithms for multi-precision arithmetic employed in public-key cryptography spend the bulk of their execution time in a few performance-critical regions. Speeding up these regions through certain architectural enhancements results in a significant performance gain. In this paper, we explore the benefits of architectural enhancements for fast and secure computation of cryptographic operations on an embedded RISC processor. Enhancements come in two flavors: 1) augmenting the existing ISA with new instructions and 2) adding new functional units with reasonable overheads. Augmenting a general-purpose processor through relatively low-cost enhancement techniques for the fast arithmetic operations that dominate cryptographic computations in terms of time and resource usage has a number of benefits over using a hardware accelerator such as a cryptographic co-processor.
II. RELATED WORK
Previous works propose various enhancements to accelerate cryptographic operations, including five custom instructions that accelerate arithmetic operations in GF(p) and in a second finite field. In this paper, we take a slightly different and holistic approach by designing and implementing a cryptographic unit (CU) for an extensible embedded processor core that benefits many cryptographic operations through not only acceleration but also secure execution.
III. PROPOSED WORK
The proposed cryptographic unit provides new and powerful custom instructions to accelerate multiplication and inversion in the prime finite field GF(p) [1,4], as well as the cryptographic operations of elliptic curve cryptography and RSA.
IV. GENERAL ARCHITECTURE
Figure 1: General architecture of embedded processor [4]
V. IMPLEMENTATION OF RSA
The following steps implement the RSA public-key scheme:
1. Choose two large prime numbers, p and q. Let $n = p \times q$ and $\phi(n) = (p-1)(q-1)$.
2. Randomly choose a value $K_p$ ($1 < K_p < \phi(n)$) that is relatively prime to $\phi(n)$, i.e. $\gcd(K_p, \phi(n)) = 1$.
3. Calculate $K_s \equiv K_p^{-1} \pmod{\phi(n)}$, then send the public key $(K_p, n)$ to the transmitter and the secret key $(K_s, n)$ to the receiver.
4. The transmitter encrypts the original message, $C = M^{K_p} \bmod n$, then sends the cipher text to the receiver.
5. The receiver decrypts the cipher text by $M = C^{K_s} \bmod n$ and retrieves the original message.
Therefore, across the steps of the RSA implementation, we mainly require three algorithms.
1. The Miller-Rabin test to find two large prime numbers. (Step 1)
2. The extended Euclidean algorithm to calculate the private key $K_s$, which is the multiplicative inverse of $K_p$ modulo $\phi(n)$. (Step 3)
3. The fast integer exponent (square and multiply) algorithm (Steps 4 and 5), which is critical for time consideration.
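The five steps above can be sketched in a few lines of Python. This is only an illustrative model with toy primes, not the VHDL datapath: the function names and the upward scan for $K_p$ (used here instead of a random choice so that the result is reproducible) are assumptions of this sketch.

```python
from math import gcd

def make_keys(p, q, kp_start=3):
    """Derive (Kp, Ks, n) from two primes, following steps 1 to 3."""
    n = p * q
    phi = (p - 1) * (q - 1)
    kp = kp_start
    # Step 2 picks Kp at random; this sketch instead scans odd values
    # upward from kp_start until gcd(Kp, phi) = 1 (assumption).
    while gcd(kp, phi) != 1:
        kp += 2
    ks = pow(kp, -1, phi)  # multiplicative inverse of Kp mod phi (Python 3.8+)
    return kp, ks, n

def encrypt(m, kp, n):
    """Step 4: C = M^Kp mod n."""
    return pow(m, kp, n)

def decrypt(c, ks, n):
    """Step 5: M = C^Ks mod n."""
    return pow(c, ks, n)

# Toy primes for illustration only; real keys need far larger primes.
kp, ks, n = make_keys(61, 53)
cipher = encrypt(65, kp, n)
assert decrypt(cipher, ks, n) == 65
```

The message must be smaller than n for the round trip to recover it.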
5.1 Miller-Rabin test
The Miller-Rabin test is a primality testing algorithm that determines whether an integer n is composite or probably prime [6,7]. The probability that an odd integer below N is prime is approximately $2/\ln N$. The algorithm is as follows:
1. Find integers k and q, with k > 0 and q odd, so that $n - 1 = 2^k q$.
2. Select a random integer a with 1 < a < n-1.
3. $Z \leftarrow a^q \bmod n$.
4. If Z = 1 then return (true) (test A).
5. For j = 1 to k do
6. If Z = n-1 then return (true) (test B); otherwise $Z \leftarrow Z^2 \bmod n$.
7. If test A and test B are both false, then the integer must be composite; otherwise it may be a prime number or a composite number [8,9].
If the base a is chosen randomly, the probability that a composite integer passes all t tests is at most roughly $(0.25)^t$, where t is the number of tests. Since the random number generator and the Miller-Rabin test may take some time in a hardware implementation, we use Java to generate the two large prime numbers, which are both 32 bits long.
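A minimal software sketch of the test described above follows. The paper generates its primes in Java; Python is used here only for brevity, and the function names, the default of 20 rounds and the way candidates are drawn are assumptions of this sketch.

```python
import random

def miller_rabin_round(n, a):
    """One round of the test for odd n > 3 and base a; True means 'possibly prime'."""
    k, q = 0, n - 1
    while q % 2 == 0:          # write n - 1 = 2^k * q with q odd
        q //= 2
        k += 1
    z = pow(a, q, n)
    if z == 1:
        return True            # test A
    for _ in range(k):
        if z == n - 1:
            return True        # test B
        z = pow(z, 2, n)       # square and try again
    return False               # both tests failed: n is composite

def is_probable_prime(n, t=20):
    """Run t rounds with random bases; a composite passes with probability <= 0.25^t."""
    if n < 2:
        return False
    if n in (2, 3):
        return True
    if n % 2 == 0:
        return False
    return all(miller_rabin_round(n, random.randrange(2, n - 1)) for _ in range(t))

def random_prime(bits=32):
    """Draw odd candidates with the top bit set until one passes the test."""
    while True:
        candidate = random.getrandbits(bits) | (1 << (bits - 1)) | 1
        if is_probable_prime(candidate):
            return candidate
```

Calling `random_prime(32)` twice would give the two 32-bit values p and q used in the rest of the design.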
5.2 The Euclidean algorithm and extended Euclidean algorithm
5.2.1 Euclidean algorithm
After $\phi(n)$ has been calculated, we randomly choose a number $K_p$ that is greater than 1 and less than $\phi(n)$, and check that $\gcd(\phi(n), K_p) = 1$ to make sure there exists one and only one multiplicative inverse $K_s$. The algorithm is as follows:
1. First we generate two positive integers a, b, and also have i = 0.
2. If a and b are both even number, then a <= a/2, b
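The two steps listed here match the opening of the standard binary (shift-and-subtract) GCD algorithm, in which i counts the factors of two shared by a and b. The sketch below shows that standard algorithm under this assumption; it is not taken from the paper, and the function name is illustrative.

```python
def binary_gcd(a, b):
    """Binary GCD using only shifts, parity tests and subtraction."""
    if a == 0:
        return b
    if b == 0:
        return a
    i = 0
    # Strip the factors of two common to both numbers and count them in i.
    while a % 2 == 0 and b % 2 == 0:
        a //= 2
        b //= 2
        i += 1
    # From here on at least one number is odd; make a odd.
    while a % 2 == 0:
        a //= 2
    while b != 0:
        while b % 2 == 0:      # remaining twos in b are not common factors
            b //= 2
        if a > b:
            a, b = b, a        # keep a <= b so the difference is non-negative
        b -= a                 # odd minus odd is even (possibly zero)
    return a << i              # restore the common factors of two
```

Because it avoids division, this form maps naturally onto shift registers and a subtractor in hardware.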
VI. ECC ENCRYPTION/DECRYPTION
Several approaches to encryption/decryption using elliptic curves have been analyzed. This paper describes one of them. The first task in this system is to encode the plaintext message m to be sent as an x-y point $P_m$. It is the point $P_m$ that will be encrypted as a cipher text and subsequently decrypted. Note that we cannot simply encode the message as the x or y coordinate of a point, because not all such coordinates are in $E_p(a, b)$. There are several approaches to this encoding. We developed a scheme that will be reported elsewhere. As with the key exchange system, an encryption/decryption system requires a point $G$ and an elliptic group $E_p(a, b)$ as parameters. Each user A selects a private key $n_A$ and generates a public key
$P_A = n_A \times G$ (6)
To encrypt and send a message $P_m$ to B, A chooses a random positive integer $x$ and produces the cipher text $C_m$ consisting of the pair of points [7]
$C_m = \{xG,\; P_m + xP_B\}$ (7)
Note that A has used B's public key $P_B$. To decrypt the cipher text, B multiplies the first point in the pair by B's secret key and subtracts the result from the second point:
$P_m + xP_B - n_B(xG) = P_m + x(n_B G) - n_B(xG) = P_m$ (8)
A has masked the message $P_m$ by adding $xP_B$ to it. Nobody but A knows the value of $x$, so even though $P_B$ is a public key, nobody can remove the mask $xP_B$. However, A also includes a "clue," which is enough to remove the mask if one knows the private key $n_B$. For an attacker to recover the message, the attacker would have to compute $x$ given $G$ and $xG$, which is hard.
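Equations (6) to (8) can be checked numerically on a toy curve. The sketch below assumes the small textbook curve $y^2 = x^3 + 2x + 2$ over GF(17) with base point G = (5, 1) of order 19; the curve, the key values and the function names are illustrative assumptions, not parameters from this work, and the message is assumed to be already encoded as a point.

```python
P_MOD, A = 17, 2   # toy curve y^2 = x^3 + 2x + 2 over GF(17) (assumption)
G = (5, 1)         # base point of order 19 on that curve

def add(p1, p2):
    """Add two points; None stands for the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None                                   # P + (-P) = infinity
    if p1 == p2:
        s = (3 * x1 * x1 + A) * pow(2 * y1 % P_MOD, -1, P_MOD) % P_MOD
    else:
        s = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD) % P_MOD
    x3 = (s * s - x1 - x2) % P_MOD
    return x3, (s * (x1 - x3) - y1) % P_MOD

def neg(p):
    """Negate a point: (x, y) becomes (x, -y)."""
    return None if p is None else (p[0], -p[1] % P_MOD)

def mul(k, p):
    """Scalar multiplication k * P by double-and-add."""
    result = None
    while k > 0:
        if k & 1:
            result = add(result, p)
        p = add(p, p)
        k >>= 1
    return result

def encrypt(pm, pb, x):
    """Equation (7): C_m = {xG, P_m + x P_B}."""
    return mul(x, G), add(pm, mul(x, pb))

def decrypt(cm, nb):
    """Equation (8): subtract n_B (xG) from the second point."""
    c1, c2 = cm
    return add(c2, neg(mul(nb, c1)))

n_b = 7                      # B's private key (illustrative)
p_b = mul(n_b, G)            # B's public key, equation (6)
p_m = mul(3, G)              # a message already encoded as a curve point
assert decrypt(encrypt(p_m, p_b, x=5), n_b) == p_m
```

In practice x would be drawn fresh at random for every message, and the curve would be defined over a field of cryptographic size.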
VII. VHDL IMPLEMENTATION
7.1 Top-down design:
In our design, we separate the whole system into three parts according to the three different algorithms. The data flow is as follows:
1. Miller-Rabin test: we use Java to generate two large prime numbers p and q, both 32-bit integers, and write them to a file called prime.txt.
2. Read prime.txt, calculate the composite number $n = p \times q$, $\phi(n) = (p-1)(q-1)$, $K_p$ and $K_s$, and write them to a file called keys.txt, where n and $\phi(n)$ are 64-bit. $K_p$ and $K_s$ are shorter than 64 bits but are extended to 64 bits.
3. Read keys.txt and message.txt, calculate $C = M^{K_p} \bmod n$, and write the file cipher.txt.
4. Read cipher.txt, calculate $M = C^{K_s} \bmod n$, and write the file plain.txt; compare the contents of plain.txt and message.txt to verify the result.
7.2 Bottom-up implementation:
7.2.1 The extended Euclidean algorithm:
After the two prime numbers p and q have been tested, we can begin to calculate the composite number n, the public key $K_p$, and the private key $K_s$. The implementation of the RSA datapath is as follows:
1. Generate a 64-bit composite number n by multiplying two unsigned 32-bit inputs p and q.
2. Calculate $\phi(n) = (p-1)(q-1)$ with a 32×32 multiplier and two subtractors; $\phi(n)$ is always even.
Fig.2 Adder Unit[2]
Fig. 3 Subtracter unit[2]
3. In particular, a Fibonacci LFSR was implemented because it is more suitable for hardware implementation than a Galois LFSR. In theory, an n-bit linear feedback shift register can generate a pseudo-random sequence of length $2^n - 1$ bits before repeating [2]. For simplicity, we just use the polynomial $x^{63} + x^{17} + 1$ to generate the random number. When designing the subsequent GCD part, we modified the LFSR slightly to ensure that the first bit is always 0 and the last bit is always 1, so that the random number is always odd. The purpose is to simplify the algorithm and obtain a valid $K_p$ faster: because $\phi(n)$ is always even, an even candidate can never be coprime to it.
4. The register after the LFSR stores the random number e1. First, $\phi(n)$ and e1 are input to a 64-bit divider, and the output remainder is e2, which ensures that e2 is less than $\phi(n)$.
5. Test1 checks whether e2 is equal to zero or 1. If so, we choose $\phi(n)/2 + 1$ as the new output number e3. This makes sure that e3 is greater than 1.
6. The datapath of the gcd part is shown in Figure 2. After the change in part 2, we only deal with the case of one even number and one odd number, so the integer i is no longer needed. In this case we require two shift registers and the abs component, which calculates the absolute value of x - y. However, this part can only test whether the gcd of the two numbers is greater than 1. We still have to design a component called test2 to adjust the number until a valid one is found. The inputs of this component are e3 and $\phi(n)$, and the output is e4.
7. The test2 part checks whether the number e4 is valid. When the gcd is 1, we set e4 as the output $K_p$. If the gcd is not 1, we subtract 2 from e4 and use the result as the new input of the gcd part. One problem should be noted: if e4 is 3, we add instead of subtract, to prevent e4 from reaching 1. In practice this strategy works well. We have tested common cases, and it needs only 1 or 2 recalculations to get a valid $K_p$. In this part, the state machine design is very important because several different situations must be considered. We spent considerable time designing the state diagram, and found that inserting one waiting state to delay the outputs is sometimes necessary for smooth state transitions.
8. The extended Euclidean algorithm is implemented in the EEA part. First we input $K_p$ and $\phi(n)$ to the eea component. We have modified the algorithm as mentioned before. Because we only need one multiplicative inverse, we do not need to calculate b1, b0 except for the case where gcd = 1. The problem we met here is clock cycle management. The divider and the multiplier run on a short clock period, while the FSM needs a long clock period because it has to wait for each result before starting the next operation. To solve this problem, we designed the "ocl" component (shown in the attached code) to generate different clock cycles, so that the separate parts can operate under different clock cycles. This strategy also benefits the rest of the implementation. The component can be designed in asynchronous mode, which means the output changes when the input changes. This setting is reasonable and lets us conveniently ignore the effect of the different clock cycles. One further detail is that this algorithm produces signed numbers, so we add a positive component that obtains a positive result by adding $\phi(n)$ to a negative one.
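Steps 3 to 8 can be modelled in software as follows. This is a behavioural sketch only: the 63-bit register width, the tap positions derived from $x^{63} + x^{17} + 1$, and all function names are assumptions. It also departs from the text in two labelled places so that the loop always terminates.

```python
from math import gcd

def lfsr_step(state, width=63, taps=(63, 17)):
    """One shift of a Fibonacci LFSR; width and tap positions are assumptions."""
    bit = 0
    for t in taps:
        bit ^= (state >> (t - 1)) & 1      # XOR of the tapped bits
    return ((state << 1) | bit) & ((1 << width) - 1)

def pick_kp(phi, state):
    """Steps 3 to 7: derive an odd candidate and adjust it until gcd(Kp, phi) = 1."""
    e = (lfsr_step(state) | 1) % phi       # e1 forced odd, e2 = e1 mod phi
    if e <= 1:
        e = phi // 2 + 1                   # test1 fallback, gives e3 > 1
    e |= 1                                 # keep the candidate odd (added safeguard)
    step = -2
    while gcd(e, phi) != 1:                # test2 loop
        if e == 3:
            step = 2                       # stay above 1; this sketch then keeps
        e += step                          # counting up, unlike the text
    return e

def modular_inverse(kp, phi):
    """Step 8: extended Euclidean algorithm tracking only the Kp coefficient."""
    r0, r1 = phi, kp
    t0, t1 = 0, 1
    while r1 != 0:
        quot = r0 // r1
        r0, r1 = r1, r0 - quot * r1
        t0, t1 = t1, t0 - quot * t1
    if r0 != 1:
        raise ValueError("Kp has no inverse modulo phi")
    return t0 + phi if t0 < 0 else t0      # the 'positive' correction

phi = (61 - 1) * (53 - 1)                  # toy primes, illustration only
kp = pick_kp(phi, state=0x1234567)
ks = modular_inverse(kp, phi)
assert (kp * ks) % phi == 1
```

Switching permanently to upward steps once the candidate reaches 3 avoids an oscillation between 3 and 5 that the literal rule would produce whenever $\phi(n)$ is divisible by both.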
7.2.2 Fast exponent algorithm:
MULMOD component:- MULMOD is the critical component that performs the rounds of multiply and modulo operations.
1. One input to multiplier64 is the remainder z, and the other is either the 64-bit plaintext m or the remainder z, selected by a two-input multiplexer. Inside mulmod, the clock cycle is set so that the 64-bit multiplier operation finishes in 128 ns and sets the mul done flag.
2. The 128-bit product is stored in a 128-bit register.
3. The 128-bit product is input to divider-128 as
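The role of MULMOD inside the fast exponent algorithm can be illustrated with a short software model of square and multiply. The function names and the most-significant-bit-first scan order are assumptions of this sketch, and the modulus n is assumed to be greater than 1.

```python
def mulmod(z, m, n):
    """Multiply then reduce: the product may be 128 bits wide for 64-bit inputs."""
    product = z * m          # multiplier64 output
    return product % n       # the divider keeps only the remainder

def modexp(m, k, n):
    """Square and multiply: compute m^k mod n one exponent bit at a time."""
    z = 1
    for bit in bin(k)[2:]:           # scan exponent bits, most significant first
        z = mulmod(z, z, n)          # square: both multiplier inputs are z
        if bit == "1":
            z = mulmod(z, m, n)      # multiply: second input switched to m
    return z

assert modexp(65, 7, 3233) == pow(65, 7, 3233)
```

Each exponent bit costs one or two MULMOD rounds, which is why this component dominates the encryption and decryption time.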
VIII. CONCLUSION
In this VHDL implementation of RSA and ECC for the cryptographic unit of an embedded processor, we have implemented the Multimod algorithm in the binary system in a simple way. In practice the strategy works well. We gain more security than with the other strategies because we use random numbers. The implementation can easily be extended to larger operand sizes such as 256 or 1024 bits, or even longer. It has many advantages over an ASIC or software implementation of the same cryptographic unit. Meanwhile, the system has several limitations. First, because we use RSA and ECC to realize a block cipher, the plaintext, cipher text and keys have length limitations. Second, since we use constant intervals in the calculation, the system always takes some time even when there is no data operation.
REFERENCES
[1] R. L. Rivest, A. Shamir, L. Adleman, "A method for obtaining digital signatures and public-key cryptosystems," Communications of the ACM, vol. 21, no. 2, pp. 120-126, Feb. 1978.
[2] C. D. Walter, "Systolic modular multiplication," IEEE Transactions on Computers, 1993, 42(3), pp. 376-378.
[3] N. Nedjah, M. M. L. De, "Three hardware architectures for the binary modular exponentiation: sequential, parallel, and systolic," IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications, 2006, 53(3), pp. 627-633.
[4] Övünç Kocabaş, Erkay Savaş, "Enhancing an Embedded Processor Core with a Cryptographic Unit for Speed and Security," 2008 International Conference on Reconfigurable Computing and FPGAs.
[5] Ahmed Rady Ehab EL Sehely, "Design and Implementation of area optimized AES algorithm on reconfigurable FPGA," IEEE 2007.
[6] Arif Irwansyah, Vishnu P. Nambiar, "An AES Tightly Coupled Hardware Accelerator in an FPGA-based Embedded Processor Core," 2009 International Conference on Computer Engineering and Technology, IEEE 2009.
[7] Mohamed Khalil-Hani, Arif Irwansyah, "A Tightly Coupled Finite Field Arithmetic Hardware in an FPGA-based Embedded Processor Core for Elliptic Curve Cryptography," 2008 International Conference on Electronic Design, December 1-3, 2008, Penang, Malaysia.
[8] Yingjie Ji, Liji Wu, "Power Analysis Resistant AES Crypto Engine Design and FPGA Implementation for a Network Security Co-processor," IEEE 2009.