DESIGN OF WORD LEVEL MULTIPLICATION ALGORITHM ON REORDERED NORMAL BASIS

(1)

DESIGN OF WORD LEVEL MULTIPLICATION ALGORITHM ON REORDERED NORMAL

BASIS

G.Yuvaraj

Assistant Professor, Department of ECE SNS College of Engineering, Coimbatore

Anusree.G, Manopriya.K, Shobana.T, Yasodha.R Final year, Department of ECE

SNS College of Engineering, Coimbatore

Abstract - Normal basis is widely used for representation of binary field elements Much attention has been paid to trade off between time and number of gates, but until little attention has been paid to the problem of connecting the gates in economical and regular way to minimize chip and chip area and design costs. Two new level high speed architectures, reordered normal basis type-I and reordered normal basis type-II for binary field multiplication are proposed. It allows the designer to trade off between area and speed. Optimal normal basis type-II is a special class of exhibiting very low multiplication complexity. Reordered normal basis is referred to as a certain permutation of optimal normal basis. One unique feature of the proposed architectures is that the critical path delay is independent of the number of words or the field size. This enables the proposed multipliers to operate at very high clock rates regardless of the number of words or the field size.

I. INTRODUCTION

Arithmetic operations in the Galois field GF(2^m) have several applications in coding theory, computer algebra and cryptography. Several algorithms for basic arithmetic operations in finite fields are suitable for both hardware and software implementations have been recently developed. The applications of these algorithms are found in error-correcting codes and public key cryptography. The proposed algorithms are suitable for obtaining high speed implementations of the field operations on signal processors and microprocessors. The efficient implementation of multipliers plays a major rule in system performance. Two bases are commonly used in practice they are polynomial basis and normal basis.

Optimal normal basis (ONB) type-I and type-II are two special classes of normal basis for which the complexity of multiplication is minimized. ONB type-II has been recommended and widely used for the design of arithmetic with cryptographic applications. Reordered normal basis is a reordered version of an ONB type-II and was initially proposed in the 1994. Later, this basis was used to create efficient ONB type-II multipliers in the 2001. One advantage of the normal basis is that the squaring of an element is computed by a cyclic shift of the binary representation.

For binary field multiplication three types of architectures are present: bit level, fully parallel, and word level.

The bit level and fully parallel architectures represent extreme design styles while word level architectures fill the gap between the two extremes and allow the designer to set the trade-off between the area and speed. Bit serial multiplier needs m number of more flip-flops than the word level multipliers as the serial output needs to be stored because of word level multipliers does not independent of the number of words.

1.1 VLSI

VLSI stands for Very Large Scale Integration. VLSI is the process of creating integrated circuits by combining thousands of transistors into a single chip. VLSI began in the year 1970 when complex semiconductor and communication technologies were being developed.

VLSI is mainly applied for miniaturization of complex circuits so as to achieve better speed, reduce chip size, minimum power consumption and enhance better performance. This technology replaces older electronic

(2)

components such as vacuum tubes, valves and resolves pitfalls exits in those such as speed and power constraint.

1.1.1 Factors involved

VLSI circuits are predominantly CMOS based. The way normal blocks like latches and gates are implemented is different from what students have seen so far, but the behavior remains the same. All the miniaturization involves new things to consider. A lot of thought has to go into actual implementations as well as design. Some of the factors involved are listed below.

II. REORDERED NORMAL BASIS

In linear algebra, a basis is a set of linearly independent vectors that, in a linear combination, can represent every vector in a given vector space or free module, which define a coordinate system. In more general terms, a basis is a linearly independent spanning set. A basis is just a set of vectors with no given ordering. For many purposes it is convenient to work with an ordered basis. An ordered basis is also called a frame.

2.1 NORMAL BASIS

A normal basis in field theory is a special kind of basis for Galois extensions of finite degree, characterized as forming a single orbit for the Galois group. The normal basis theorem states that any finite Galois extension of fields has a normal basis.

2.1.1 Optimal normal basis of type-I

Theorem: The finite field GF(p^m) contains an optimal normal basis of type I consisting of no unit (m+1) roots of unity if and only if m+1 is a prime and p is primitive modulo m+1.

2.1.2 Optimal normal basis of type-II

Theorem: Let GF(2^m) be a finite field with 2^m elements where 2m+1 = p is a prime.

(a) 2 is a primitive root (mod p)

(b) -1 is a quadratic non-residue (mod p) and 2 generates all the quadratic residues (mod p) Let α = β + β^-1

where β is a primitive p^th root of unity in GF(2^2m)

We have α ∈ GF(2^m), then {α , α² , α²²…………. α^2m-1} is called the optimal normal basis of type-II.

2.2 USAGE

This basis is frequently used in cryptographic applications that are based on the discrete logarithm problem such as elliptic curve cryptography. Hardware implementations of normal basis arithmetic typically have far less power consumption than other bases.

When representing elements as a binary string, we can square elements by doing a left circular shift with wraparound. This makes the normal basis especially attractive for cryptosystems that utilize frequent squaring.

2.3 WORD -LEVEL ALGORITHM

Define function s(i) which maps the set of integers to the set{0,1,…m}

S(i) = i mod 2m+1, if 0<=i mod 2m+1<=m,

S(i)= 2m+1-i mod 2m+1, if m< i mod 2m+1<=2m. (1) Let A,B € GF(2^m) and their product be C € GF(2^m). Let

A, B, and C be represented with respect to the reordered normal basis.

A and

Then it follows from [11]

ci (2)

Let w denote the word size and then d = [ m/w] is the number of words needed to represent a field element in GF(2^m).

(3)

Write the subscript j as j=kw+l , where k=0,1,2,…..d-1 and l= 1,2,…….w and replace j in (2)

ci (3)

Define signals ei,k(l) and gi,k(l), where i=1,2,….,m, k=0,1,….d-1 and l=1,2,…w, as follows:

ei,k(0)

=0 and ei,k(l)

= ei,k(l-1)

+akw+lbs(i+kw+l)

gi,k(0)

=0 and gi,k(l)

= gi,k(l-1)

+akw+lbs(i-kw-l) (4)

Then from (4) ei,k(w)

= gi,k

(w)= (5)

Compare (3) with (5) Ci= ei,k

(w) + gi,k

(w)] (6)

2.4 WORD-LEVEL REORDERED NORMAL BASIS TYPE-a 2.4.1 Algorithm

Input: A = (a1,a2,. . . am), B = (b1,b2,. . . bm) Both A and B € GF(2^m)

Output: C= A X B = (c1,c2,. . . cm) € GF(2^m) 1. Initialization: ei,k

(0)=0 and gi,k

(0)=0, i=1,2,…m, k=0,1,…….d-1.

2. Compute in parallel for all i = 1, 2, . . .m.

3. Compute in parallel for all k = 0, 1, . . . [m/w]-1.

4. Compute in serial for l=1, 2, . . . , w.

5. tem1 = akw+lbs(i+kw+l)

6. ei,k (l)= ei,k

(l)+tem1 7. tem2 = akw+lbs(i-kw-l)

2.5 WORD -LEVEL ALGORITHM

Define function s(i) which maps the set of integers to the set{0,1,…m}

S(i) = i mod 2m+1, if 0<=i mod 2m+1<=m,

S(i)= 2m+1-i mod 2m+1, if m< i mod 2m+1<=2m. (1) Let A,B € GF(2^m) and their product be C € GF(2^m). Let

A, B, and C be represented with respect to the reordered normal basis.

A and

Then it follows from [11]

ci (2)

Let w denote the word size and then d = [ m/w] is the number of words needed to represent a field element in GF(2^m).

Write the subscript j as j=kw+l , where k=0,1,2,…..d-1 and l= 1,2,…….w and replace j in (2)

ci (3)

Define signals ei,k(l)

and gi,k(l)

, where i=1,2,….,m, k=0,1,….d-1 and l=1,2,…w, as follows:

ei,k(0)

=0 and ei,k(l)

= ei,k(l-1)

+akw+lbs(i+kw+l)

gi,k

(0)=0 and gi,k (l)= gi,k

(l-1)

+akw+lbs(i-kw-l) (4)

Then from (4) ei,k(w)

= gi,k

(w)= (5)

Compare (3) with (5)

(4)

Ci= ei,k(w)

+ gi,k(w)

] (6)

2.6 WORD-LEVEL REORDERED NORMAL BASIS TYPE-a 2.6.1 Algorithm

Input: A = (a1,a2,. . . am), B = (b1,b2,. . . bm) Both A and B € GF(2^m)

Output: C= A X B = (c1,c2,. . . cm) € GF(2^m) 1. Initialization: ei,k

(0)=0 and gi,k

(0)=0, i=1,2,…m, k=0,1,…….d-1.

2. Compute in parallel for all i = 1, 2, . . .m.

3. Compute in parallel for all k = 0, 1, . . . [m/w]-1.

4. Compute in serial for l=1, 2, . . . , w.

5. tem1 = akw+lbs(i+kw+l)

6. ei,k (l)= ei,k

(l)+tem1 7. tem2 = akw+lbs(i-kw-l)

8. gi,k(l)

= gi,k(l-1)

+tem2 9. end

10. Ci= ei,k (w) + gi,k

(w)] 11. end

12. end

2.6.2 Architecture for WL-RNB type-a

A multiplier architecture can be built based on algorithm which is referred to as word-level reordered normal basis multiplier is shown in Fig.3.1.

From the top to the bottom, the architecture contains a (2m + 1) bit circular shift register, the Wire Expansion Module, one layer of AND gates, one layer of accumulation units (one XOR gate and one flip-flop connected in the feedback fashion), and one layer of XOR gate networks. A 2[m/w]-input XOR gate or a binary tree of 2[m/w]- 1 two-input XOR gates is used to produce the final output ci shown at the bottom in Fig.3.1.

An AND gate is required to multiply together the two input bits, an XOR gate is used to realize the addition operation, while a flip-flop (1-bit register) is used for holding the signal e. Assume operand A is input from an outside storage and operand B is stored in a (2m + 1)-bit circular shift register for step5.

The input operand A is required to be fed into the multiplier in a comb style. Let A be divided into [m/w] words of w bits. Then, in the first clock cycle the inputs are the first bit of every word,a1,aw+1,…,a(d-1)w+1. In the second clock cycle, the inputs are a2,aw+2,…,a(d-1)w+2. Finally, in the w^th clock cycle the inputs are aw,a2w,…,adw. For given i and k in Steps 5 and 6, every clock cycle the variable of function s(.) increases by one in operand bs(i-kw+l)

in Step 5, and decreases by one in operand bs(i-kw-l) in Step 7.

Figure. 2.1. Word-level multiplier using reordered normal basis (WL-RNB) type-a

The operand input bs(i+kw+l) is connected to bit bs(i+kw+l) of R2, while the operand input bs(i-kw-l) is connected to bit bs(i-kw-l) of R1. The Wire Expansion Module is just a wire reordering and copying module which does not contain any gates. This module accepts 2m+1 inputs from the circular shift register and provides 2dm outputs.

The 2dm outputs of the module can be divided into m groups and each group contains d pairs of output lines.

One option is to enter zero input bits for input A once the w clock cycles are over. The output product bits can be read from the output ports after certain number of extra clock cycles immediately following w cycles.

(5)

2.6.3 ARCHITECTURE COMPLEXITIES

The circular shift register R contains 2m+1 flip-flops. Wire Expansion Module does not contain any gates or flip-flops. There are, respectively, 2dm AND gates. The number of accumulation units is also 2dm, which each contains one flip-flop and one XOR gate. The m binary trees of XOR gates at the bottom consist of in total m(2d-1) XOR gates.

2.7 PROPOSED WORD-LEVEL MULTIPLIER TYPE-b 2.7.1 Algorithm

The number of gates required for the proposed multiplier WL-RNB-a in Fig. 3.1 can be further reduced at expense of slightly increased critical path delay. Its algorithm can be obtained by replacing Steps 5-10 in Type- a Algorithm with the following steps:

5. tem1 := bs(i+kw+l)+bs(i-kw-l)

6. tem2 := akw+l × tem1 7. e^(l)i,k :=e^(l-1)i,k + tem2 8. end

9. Ci= ei,k(w) ]

2.7.2 Architecture for WL-RNB type-b

The main difference between the two multiplier architectures lies within the blocks of dashed lines shown in Fig. 3.1 and 3.2.

From the top to the bottom, the architecture contains a (2m + 1)-bit circular shift register, the Wire Expansion Module, one layer of AND gates, one layer of accumulation units (one XOR gate and one flip-flop connected in the feedback fashion), and two layers of XOR gate networks. The (2m+1) bits of circular shift register and wire expansion module are the same for two architectures.

Figure. 2.2. Word-level multiplier using reordered normal basis (WL-RNB) type-b

One option is to enter only one zero input bits for input A once the w clock cycles are over. The output product bits can be read from the output ports after certain number of extra clock cycles immediately following w cycles.

The circular shift register R contains 2m+1 flip-flops. Wire Expansion Module does not contain any gates or flip-flops. There are 2dm AND gates. The number of accumulation units is also 2dm, which each contains one flip-flop and one XOR gate. The m binary trees of XOR gates at the bottom consist of in total m(2d-1) XOR gates.

III. RESULTS & DISCUSSIONS

Complexity comparison of the proposed reordered normal basis multipliers to some existing popular normal basis multipliers for the class of fields there exists.

(6)

Screen shot WL- RNB (Type-a)

Screen Shot WL- RNB (TYPE-b)

The VLSI area of an AND gate, an XOR gate, and a flip-flop can be approximated to be 6, 12, and 16 times the area of a transistor. So the area cost represents the number of transistors used which is equal to six times the number of AND gates, 12 times the number of XOR gates and 16 times the number of registers. The delay for an XOR gate is assumed to be twice of that for an AND gate.

The area and power requirements for WL-RNB type-a was greater than the WL-RNB type-b as shown in the table 3.1

Table 3.1 Area and power requirements for type-a and type-b

Let TA and TX denote the delay of a two-input AND gate and a two-input XOR gate. Then, critical path delay is Tcp=TA+TX. From this we can know that the critical path delay depends on neither the field size m nor the number of words d.

Compared to the other architectures, the proposed architectures have much smaller delay cost. Note that both area and delay are the objectives to minimize for a word-level multiplier, and decreasing one is usually at the expense of increasing the other. In fact, the area-delay product for WL-RNB-b is only 72%, and 71% of the previously proposed multiplier with smallest area-delay product, for number of words 16 and 32, respectively.

It is also worth noting that even though theoretically architecture WL-RNB-a is faster than WL-RNB-b, for large designs WL-RNB-b outperforms WL-RNB-a. This comes from the fact the delay resulting from placement and routing is almost twice more than that of WL-RNB-b.

IV. CONCLUSION

Contents WL-RNB Type-a

WL- RNB Type-b Total equivalent gate

count for design

1,553 1,369 Additional JTAG gate

count for IOBs

528 528

Peak memory usage 121MB 121MB Total estimated power

consumption

189mW 179mW

(7)

Two high speed word-level finite field multipliers in GF(2^m) using reordered normal basis are presented. The purpose of our project is to handling the multiplication complexity. Their architectural level details have been presented are compared with each other in terms of number of AND gates, XOR gates, critical path delays, area and power requirements. Compared to the existing sequential normal basis multipliers, our proposed multipliers have a fewer number of gates.

V. FUTURE ENHANCEMENT

The proposed multipliers are highly area efficient and require fewer number of logic gates even when compared with the most area efficient multiplier available in the open literature. This makes these proposed multipliers suitable for applications where the value of m is large, e.g., resource constrained cryptographic systems.

REFERENCES

[1] Agnew, G.B,Mullin, R.C, Onyszchuck, I.M. and Vanstone, S.A. (1991) ‘An implementation for a Fast Public-Key Cryptosystem,’j.Cryptology,vol. 3, pp,vol.32

[2] Agnew, G.B,Mullin, R.C, Onyszchuck, I.M. and Vanstone, S.A. (1993),’Arithmetic operations in GF(2^m),’ J. Cryptology, 6:3–1.

[3] Agnew, G.B,Mullin, R.C.and Vanstone, S.A. (1998) ‘Fast Exponentiation in GF(2ⁿ),’ Proc. Workshop Theory and Application of Cryptographic Techniques, Advance in Cryptology (Eurocrypt ’88), C.G. Gunther, ed., pp. 251-255.

[4] Adnan Abdul-Aziz Gutub, and Alexandre Ferreira Tenca ‘Efficient Scalable VLSI Architecture for Montgomery Inversion in GF(p),’

[5] Chien-Hsing Wu, Chien-Ming Wu, Ming-Der Shieh and Yin-Tsung Hwang, (2003), High-Speed, Low-Complexity Systolic Designs of Novel Iterative Division Algorithms in GF(2^m),’IEEE Trans on Computers vol.109

[6] Dieter Jungnickel, Alfred,J. Menezes, And Scott A. Vanstone,(1990),’ On The Number Of Self-Dual Bases Of GF(q^m) Over GF(q),’vol.53.

[7] Francisco Rodrı´guez-Henrı´quez, and Cetin Kaya Koc, (2003) ‘Parallel Multipliers Based on Special Irreducible Pentanomials,’ IEEE Trans on Computers. vol. 52, NO. 11

[8] Fan, H and Hasan, M.A (2006)’Fast bit Parallel shifted Polynomial Basis Multiplier in GF(2ⁿ),’ IEEE Trans. Circuit and Systems I.

[9] Gao, S and Vanstone, S. (1995) ‘On Orders of Optimal Normal Basis Generators,’ Math. Computation,vol. 64,no. 2,pp. 1227-1233.

[10] Gao, S,von zur Gathen, J,Panario, D and Shoup, V (2000) ‘Algorithms for Exponentiations in Finite Fields,’ j.Symbolic Computation,vol. 29, pp. 879-889.

[11] Gerardo Orlando (2002) ‘Effcient Elliptic Curve Processor Architectures for Field Programmable Logic,’

[12] Hasan, M.A, Wang M.Z and Bhargava V.K (1992)’Modular construction of low complexityparallel multipliers for a class of finite fields GF(2^m),’ IEEE Trans. Comput., 41(8):962–971.

[13] Hasan, M.A, Wang M.Z and Bhargava V.K (1993) ‘A Modified Massey-Omura Parallel Multiplier for a Class of Finite Fields,’ IEEE Trans. Computers, vol. 42, no. 10, pp. 1278-1280.

[14] Itoh,T and Tsujii,S.(1988)’A fast algorithm for computing multiplicative inverse in GF(2^m) using normal bases,’ Inform. and Comput., 78:171–177.

[15] Itoh, T and Tsujii, S (1989)’ Structure of parallel multipliers for a class of finite fields GF(2^m).,’ Information and Computation, 83:21–

40.

[16] Koc, C.K and Sunar, B. (1998) ‘Low-Complexity Bit-Parallel Canonical and Normal Basis Multipliers for a Class of Finite Fields,’

IEEE Trans. Computers, vol. 47, no. 3, pp. 353-356.

[17] Kim, C.H, Oh, S and Lim, J (2002)’A new Hardware Architecture for Operations in GF(2^m),’IEEE Trans Compute.51.

[18] Lehmer, D.H (1965) ‘On certain chains of primes,’

[19] Lidl, R and Niederreiter, H (1983) ’Finite Fields,’ Addison-Wesley Publishing Company, Reading.

[20] Lidl, R and Niederreiter, H. (1997) ‘Intoduction to Finite Fields and Their Applications,’ second ed.

[21] Lejla Batina, Nele Mentens, Sıddıka Berna Ors, Bart Preneel Katholieke Universiteit Leuven ‘Serial Multiplier Architectures over GF(2ⁿ) for Elliptic Curve Cryptosystems,’

[22] D.H. Lehmer, \On certain chains of primes", Proc. London Math. Soc. 14A (Littlewood 80 volume, 183-186.

[23] Massey, J.L and Omura, J.K (1988) ‘Computational Method and Apparatus for Finite Field Arithmetic,’ US Patent 4587627.

[24] Mastrovito ,E.D(1988) ‘ VLSI architectures for multiplication over finite field GF(2m),’ Applied Algebra, Algebraic Algorithms and Error-Correcting Codes, Lecture Notes in Computer Science, No. 357, pages 297–309.

[25] Mullin, R.C and Wilson, R.M. (1989) ‘Optimal Normal Bases in GF(pⁿ),’ Discrete Applied Math., vol. 22, pp. 149-161.

[26] Mullin, R.C, Onyszchuk, I.M, Vanstone, S.A and Wilson R.A (1989) ‘Optimal Normal Basis in GF(pⁿ),’ Discrete Applied Mathematics, 22:149–161.

[27] Namin, A.H, Wu, H and Ahmadi, M.(2008) ‘A High Speed Word Level Finite Field Multiplier Using Reordered Normal Basis,’ Proc.

IEEE Int’l Symp. Circuit and Systems (ISCAS), pp.3278-3281.

[28] Reyhani-Masoleh, A and Hasan, M.A (2002) ‘A New Construction of Massey-Omura Parallel Multiplier Over GF(2^m),’ IEEE Trans.

Computers, vol. 51, no. 5, pp. 511-520.

[29] Reyhani-Masoleh, A and Hasan, M.A (2003) ‘Efficient Multiplication Beyond Optimal Normal Basis,’ IEEE Trans.Compute.52.

[30] Rodriguez-Henriquez, F and Cetin Kaya Koc (2003) ‘Parallel Multipliers Based on special Irreducible Pentonomials,’ IEEE Trans.Compute.52.

[31] Reyhani-Masoleh, A and Hasan, M.A. (2005) ‘Low Complexity Word-Level Sequential Normal Basis Multipliers,’ IEEE Trans.

Computers, vol. 54. No. 2, pp. 98-110.

[32] Sunar, B and Koc, C.K (1999)’ Mastrovito multiplier for all trinomials,’ IEEE Trans Compute.48.

[33] Sunar, B and Koc, C.K (2001) ‘An Efficient Optimal Normal Basis Type II Multiplier,’ IEEE Transactions on Computers, 50(1):83–

88

[34] Wu, H and Hasan, M.A. (1998) ‘Low Complexity Bit-Parallel Multiplers For a Class of Finite Fields,’ IEEE Trans. Computers, vol.

47, no.8,pp. 883-887.

[35] Wu, H, Hasan, M.A, Blake, I.F and Gao, S. (2002) ‘Finite Field Multiplier Using Redundant Representation,’ IEEE Trans.

Computers, vol. 51,no. 11, pp. 1306-1316.