The Design & Implementation of the FAM (Fused Add-Multiply) Operator for Improving Performance Using Modified Booth (MB) Form


International Journal of Engineering Science Invention Research & Development; Vol. III, Issue III, September 2016 www.ijesird.com, e-ISSN: 2349-6185 (page 211-219)


J. EASHA¹, RUPA KUMAR DHANAVATH², K. SRINIVASA REDDY³, E. GEETHA REDDY⁴

¹PG Scholar, Dept of VLSI&ES, Nagole Institute of Technology and Science, JNTUH, Hyderabad, TS, India.

²Associate Professor, Dept of ECE, Nagole Institute of Technology and Science, JNTUH, Hyderabad, TS, India.

³Associate Professor, Dept of ECE, Nagole Institute of Technology and Science, JNTUH, Hyderabad, TS, India.

⁴Assistant Professor, Dept of EEE, VITAE, JNTUH, Hyderabad, TS, India.


Abstract: In digital signal processing (DSP) applications such as DSP processors and microprocessors, complex arithmetic operations and transforms such as the FFT, FIR filtering, the DCT and the DWT cost considerable time, power and design complexity on the processing chip. To make such operations more efficient, we use a structured recoding technique and explore three different schemes, integrating them into FAM designs. The experimental results provide a comparative study of the FAM designs in terms of hardware complexity, critical path delay and power consumption. Finally, we apply the FAM unit to a Multiply-Accumulator (MAC) application.

Keywords: Modified Booth Recoding, Add-Multiply Operator, Arithmetic Circuits, MAC.

I. INTRODUCTION

In the 21st century, multimedia and communications play a prominent role in everyday requirements. Both areas are applications of digital signal processing (DSP) and carry out a large number of arithmetic operations, as their implementation is based on computationally intensive kernels such as the Fast Fourier Transform (FFT), the Discrete Cosine Transform (DCT), Finite Impulse Response (FIR) filters and signal convolution. DSP performance is the main concern, because it depends on the design decisions made regarding the allocation and the architecture of arithmetic units. Although considerable progress has been reported in the literature on DSP performance, a number of problems remain that keep DSP systems from meeting practical requirements.

Recent work in DSP proposes combining arithmetic operations whose components share data, and the reported results show a significant improvement in performance and efficiency.

As reported in [3], the Multiply-Accumulator (MAC) and Multiply-Add (MAD) units were introduced, leading to more efficient implementations of DSP algorithms than conventional ones. Later research showed that the MAC suffers in area occupation, critical path delay or power consumption, issues that were addressed in [5]-[7]. The main benefit of MAC units is that they increase the flexibility of DSP data path synthesis, as a large set of arithmetic operations can be efficiently mapped onto them. Although MAC/MAD operations are widely used in DSP applications, some DSP applications are based on Add-Multiply (AM) operations.

The straightforward design of the AM unit, which first allocates an adder and then drives its output to the input of a multiplier, significantly increases both the area and the critical path delay of the circuit. Although the direct recoding of the sum of two numbers into its MB form leads to a more efficient implementation of the fused Add-Multiply (FAM) unit than the conventional one, existing recoding schemes rely on complex bit-level manipulations that are implemented by dedicated gate-level circuits. This work focuses on the efficient design of FAM operators, targeting the optimization of the recoding scheme for the direct shaping of the MB form of the sum of two numbers (Sum to MB, S-MB).

II. PROPOSED METHOD

A. Sum to Modified Booth Recoding Algorithm

In the S-MB recoding technique, two successive bits of input A ($a_{2j}$, $a_{2j+1}$) and two successive bits of input B ($b_{2j}$, $b_{2j+1}$) are recoded into one MB digit $y_j^{MB}$. As equations (1) and (2) show, three bits are needed to form an MB digit. The two least significant of them are positively weighted, while the most significant bit is negatively weighted, as shown in Table I.
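The weighting just described can be checked with a minimal Python sketch. It assumes the standard radix-4 MB digit $y_j^{MB} = -2y_{2j+1} + y_{2j} + y_{2j-1}$ with $y_{-1} = 0$ and an even bit-width; the function names `mb_digits`, `mb_value` and `mb_multiply` are illustrative and do not come from the paper.

```python
def mb_digits(y: int, n: int) -> list[int]:
    """Recode an n-bit two's complement integer y (n even) into n/2 MB digits."""
    assert n % 2 == 0 and -(1 << (n - 1)) <= y < (1 << (n - 1))
    bits = [(y >> i) & 1 for i in range(n)]  # two's complement bits, LSB first
    digits = []
    prev = 0  # y_{-1} = 0
    for j in range(n // 2):
        # Most significant bit of the triple is negatively weighted.
        digits.append(-2 * bits[2 * j + 1] + bits[2 * j] + prev)
        prev = bits[2 * j + 1]
    return digits


def mb_value(digits: list[int]) -> int:
    """Value represented by a list of radix-4 digits, least significant first."""
    return sum(d * 4 ** j for j, d in enumerate(digits))


def mb_multiply(x: int, y: int, n: int) -> int:
    """Multiply x by y using one partial product per MB digit of y."""
    return sum((d * x) << (2 * j) for j, d in enumerate(mb_digits(y, n)))


if __name__ == "__main__":
    print(mb_digits(-6, 4))       # [-2, -1], since -6 = -2 + (-1) * 4
    print(mb_multiply(7, -6, 4))  # -42
```

Each digit lies in {-2, -1, 0, +1, +2}, so an n-bit multiplier needs only n/2 partial products. This halving is what a multiplier gains from MB recoding.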


TABLE I: Modified Booth Encoding Table

TABLE III: Basic Operation

For a 2’s complement number $Y$ of $n = 2k$ bits, the MB form is

$$Y = \langle y_{n-1} y_{n-2} \ldots y_1 y_0 \rangle_{2's} = -y_{2k-1} \cdot 2^{2k-1} + \sum_{i=0}^{2k-2} y_i \cdot 2^{i} = \sum_{j=0}^{k-1} y_j^{MB} \cdot 2^{2j} \qquad (1)$$

$$y_j^{MB} = -2y_{2j+1} + y_{2j} + y_{2j-1}, \qquad y_{-1} = 0 \qquad (2)$$

Signed-bit arithmetic is therefore needed to bring the pairs of input bits into MB form. For this purpose, we implement a set of bit-level signed Half Adders (HA) and Full Adders (FA) whose inputs and outputs are signed.

In this work we use two types of signed HAs, referred to as HA* and HA**. Tables II-IV give their truth tables and Fig. 1 presents the corresponding Boolean equations. Let $p$, $q$ be the binary inputs and $c$, $s$ the outputs (carry and sum, respectively) of an HA*, which implements the relation $2c - s = p + q$ with the sum negatively signed (Table II, Fig. 1(a)). The output takes one of the values $\{0, +1, +2\}$.

Table III describes the dual implementation of HA*, in which the signs of all inputs and outputs are inverted and, consequently, the output values change to $\{-2, -1, 0\}$:

Output value = −2·c + s = −p − q

Table IV and Fig. 1(b) give the operation and schematic of HA**, which implements the relation $2c - s = -p + q$ and takes a negative input ($p$) and a positive input ($q$), resulting in the output values $\{-1, 0, +1\}$.

We also construct two types of signed FAs, which are presented in Tables V and VI and Fig. 2. The schematics in Fig. 2(a) and (b) show the relation of FA* and FA** to the conventional FA.

Let $p$, $q$ and $c_i$ be the binary inputs and $c_o$, $s$ the output carry and sum, respectively. FA* implements the relation $2c_o - s = p - q + c_i$, where the bits $s$ and $q$ are negatively signed (Table V, Fig. 2(a)). Its output values are $\{-1, 0, +1, +2\}$. In FA**, the two inputs $p$ and $q$ are negatively signed and the implemented relation is $-2c_o + s = -p - q + c_i$ (Table VI, Fig. 2(b)). The output values become $\{-2, -1, 0, +1\}$. As shown in Fig. 2, the signed FAs are implemented using the conventional FA with the negative inputs and outputs inverted.
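The signed adder relations above can be checked exhaustively with a short Python sketch. The gate-level expressions below are derived here from the stated relations (the Boolean equations of Fig. 1 and Fig. 2 are not reproduced in the text), so they are an assumption, as are all function names.

```python
from itertools import product


def ha_star(p, q):
    """HA* (basic operation): returns (c, s) with 2*c - s = p + q."""
    return p | q, p ^ q


def ha_star_dual(p, q):
    """Dual HA*: returns (c, s) with -2*c + s = -p - q (same gates, signs inverted)."""
    return p | q, p ^ q


def ha_2star(p, q):
    """HA**: returns (c, s) with 2*c - s = -p + q (p and s negatively signed)."""
    return (1 - p) & q, p ^ q


def fa(p, q, ci):
    """Conventional FA: returns (co, s) with 2*co + s = p + q + ci."""
    total = p + q + ci
    return total >> 1, total & 1


def fa_star(p, q, ci):
    """FA*: returns (co, s) with 2*co - s = p - q + ci (q and s inverted)."""
    co, s = fa(p, 1 - q, ci)
    return co, 1 - s


def fa_2star(p, q, ci):
    """FA**: returns (co, s) with -2*co + s = -p - q + ci (p, q and co inverted)."""
    co, s = fa(1 - p, 1 - q, ci)
    return 1 - co, s


def check_all():
    """Verify every signed relation over all input combinations."""
    for p, q in product((0, 1), repeat=2):
        c, s = ha_star(p, q)
        assert 2 * c - s == p + q
        c, s = ha_star_dual(p, q)
        assert -2 * c + s == -p - q
        c, s = ha_2star(p, q)
        assert 2 * c - s == -p + q
    for p, q, ci in product((0, 1), repeat=3):
        co, s = fa_star(p, q, ci)
        assert 2 * co - s == p - q + ci
        co, s = fa_2star(p, q, ci)
        assert -2 * co + s == -p - q + ci
    return True


if __name__ == "__main__":
    print(check_all())
```

The sketch builds both signed FAs from the conventional FA by inverting the negatively signed inputs and outputs, which is the construction the text attributes to Fig. 2.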

Fig.1. Boolean equations and schematics for signed (a) HA* and (b) HA**.

B. S-MB Proposed Method

To implement and explore three new alternative schemes of the S-MB recoding technique, we use the conventional and signed HAs and FAs discussed in the previous section. Each of the three schemes can be applied to either signed (2’s complement) or unsigned numbers with an even or odd number of bits. In the following, the inputs A and B are in 2’s complement form and consist of $n = 2k$ bits for even bit-width or $n = 2k+1$ bits for odd bit-width. Our aim is to transform the sum of A and B, Y = A + B, into its MB representation: the $j$-th recoding cell takes the input bits $a_{2j}$, $a_{2j+1}$ and $b_{2j}$, $b_{2j+1}$ and produces the three bits needed to form the MB digit $y_j^{MB}$.

S-MB1 Recoding Scheme: The first scheme of the proposed recoding technique, S-MB1, is illustrated in Fig. 3 for even and odd bit-widths of the input numbers. The sum of A and B is given by

$$Y = A + B = y_k \cdot 2^{2k} + \sum_{j=0}^{k-1} y_j^{MB} \cdot 2^{2j}, \qquad y_j^{MB} = -2s_{2j+1} + s_{2j} + c_{2j}$$

The encoding of the MB digits $y_j^{MB}$, $0 \le j \le k-1$, is based on the MB form given in (2). Both bits $s_{2j+1}$ and $s_{2j}$ are extracted from the $j$-th recoding cell of Fig. 3. A conventional FA with inputs $a_{2j}$, $b_{2j}$ and $b_{2j-1}$ gives the carry $c_{2j+1}$ and the sum $s_{2j}$. As the bit $s_{2j+1}$ needs to be negatively signed, we use an FA* (Table V, Fig. 2(a)) with inputs $a_{2j+1}$, $b_{2j+1}$ (negatively signed) and $c_{2j+1}$, which generates the carry $c_{2j+2}$ and the negatively signed sum $s_{2j+1}$:

$$2c_{2j+2} - s_{2j+1} = a_{2j+1} - b_{2j+1} + c_{2j+1} \qquad (3)$$

Because (3) drives $b_{2j+1}$ to the FA* with a negative sign, the same bit is applied with a positive sign as the input carry of the subsequent recoding cell ($-b_{2j+1} + 2b_{2j+1} = +b_{2j+1}$). The initial conditions are $c_0 = 0$ and $b_{-1} = 0$.

In the S-MB1 recoding scheme, the most significant digit (MSD) takes one of two forms, depending on whether the bit-width of the inputs A and B is even or odd (Fig. 3). In the even bit-width case, the MSD is a signed digit given by

$$y_k^{SD} = -a_{2k-1} + c_{2k} \qquad (4)$$

In the odd bit-width case, the MSD is an MB digit formed from $c_{2k+1}$, $s_{2k}$ and $c_{2k}$. An FA** with inputs $a_{2k}$, $b_{2k}$ (both negatively signed) and $b_{2k-1}$ gives the carry $c_{2k+1}$ and the sum $s_{2k}$ (Table VI, Fig. 2(b)). In this recoding scheme, the critical path delay is constant with respect to the input bit-width and is given by

$$T_{S\text{-}MB1} = T_{FA,carry} + T_{FA^*,sum} \qquad (5)$$

where $T_{FA,carry}$ is the delay of shaping the output carry of the conventional FA and $T_{FA^*,sum}$ is the delay of forming the sum of the signed FA*.

TABLE II: Basic Operation

Output value = 2·c − s = p + q

Fig.2. Boolean equations and schematics for signed (a) FA* and (b) FA**.

S-MB2 Recoding Scheme: The second scheme of the proposed recoding technique, S-MB2, is illustrated in Fig. 3 for even (Fig. 3(a)) and odd (Fig. 3(b)) bit-widths of the input numbers. We consider the initial values $c_{0,1} = 0$ and $c_{0,2} = 0$. The digits $y_j^{MB}$, $0 \le j \le k-1$, are formed from $s_{2j+1}$, $s_{2j}$ and $c_{2j,2}$ as $y_j^{MB} = -2s_{2j+1} + s_{2j} + c_{2j,2}$. As in the S-MB1 recoding scheme, we use a conventional FA to produce the carry $c_{2j+1}$ and the sum $s_{2j}$. The inputs of the FA are $a_{2j}$, $b_{2j}$ and $c_{2j,1}$. The bit $c_{2j,1}$ is the output carry of a conventional HA which is part of the $(j-1)$ recoding cell and has the bits $a_{2j-1}$, $b_{2j-1}$ as inputs. The bit $s_{2j+1}$ is the output sum of an HA* (basic operation, Table II, Fig. 1(a)) into which we drive $c_{2j+1}$ and the sum produced by a conventional HA with the bits $a_{2j+1}$, $b_{2j+1}$ as inputs. The HA* is used in order to produce the negatively signed sum $s_{2j+1}$, and its outputs are given by the following Boolean equations:

$$c_{2j+2,2} = c_{2j+1} \lor (a_{2j+1} \oplus b_{2j+1}), \qquad s_{2j+1} = a_{2j+1} \oplus b_{2j+1} \oplus c_{2j+1} \qquad (6)$$

When A and B have an even number of bits (Fig. 3(a)), $a_{2k-1}$ and $b_{2k-1}$ are negatively weighted and the conventional HA of the $(k-1)$ recoding cell is replaced by the dual HA* analyzed in Table III. The MSD is a signed digit given by the relation:

$$y_k^{SD} = -c_{2k,1} + c_{2k,2} \qquad (7)$$

When the number of bits of the inputs A and B is odd (Fig. 3(b)), the MSD is an MB digit formed from $c_{2k+1}$, $s_{2k}$ and $c_{2k,2}$. The negatively signed carry $c_{2k+1}$ and the sum $s_{2k}$ are produced by an FA** with inputs $a_{2k}$, $b_{2k}$ (both negatively signed) and $c_{2k,1}$ (Table VI, Fig. 2(b)). The critical path delay of the S-MB2 recoding scheme is calculated as follows:

$$T_{S\text{-}MB2} = T_{HA,carry} + T_{FA,carry} + T_{HA^*,sum} \qquad (8)$$

where $T_{HA,carry}$ and $T_{FA,carry}$ are the delays of shaping the output carry of a conventional HA and FA, respectively, and $T_{HA^*,sum}$ is the delay of forming the sum of a signed HA*.

S-MB3 Recoding Scheme: The third scheme implementing the proposed recoding technique is S-MB3. It is illustrated in detail in Fig. 3 for even (Fig. 3(a)) and odd (Fig. 3(b)) bit-widths of the input numbers. We consider that $c_{0,1} = 0$ and $c_{0,2} = 0$. We build the digits $y_j^{MB}$, $0 \le j \le k-1$, from $s_{2j+1}$, $s_{2j}$ and $c_{2j,2}$ as $y_j^{MB} = -2s_{2j+1} + s_{2j} + c_{2j,2}$. Once more, we use a conventional FA to produce the carry $c_{2j+1}$ and the sum $s_{2j}$. The bit $c_{2j,1}$ is now the output carry of an HA* (basic operation, Table II, Fig. 1(a)), which belongs to the $(j-1)$ recoding cell and has the bits $a_{2j-1}$, $b_{2j-1}$ as inputs. The negatively signed bit $s_{2j+1}$ is produced by an HA** (Table IV, Fig. 1(b)) into which we drive $c_{2j+1}$ and the negatively signed output sum of the HA* of the $j$ recoding cell, which has the bits $a_{2j+1}$, $b_{2j+1}$ as inputs. The carry and sum outputs of the HA** are given by the following Boolean equations:

$$c_{2j+2,2} = c_{2j+1} \land \overline{(a_{2j+1} \oplus b_{2j+1})}, \qquad s_{2j+1} = a_{2j+1} \oplus b_{2j+1} \oplus c_{2j+1} \qquad (9)$$

When both A and B have an even number of bits (Fig. 3(a)), $a_{2k-1}$ and $b_{2k-1}$ are negatively weighted and we use the dual implementation of the HA* (Table III, Fig. 1(a)) in the $(k-1)$ recoding cell. Consequently, the output sum of the HA* becomes positively weighted and the HA** that follows has to be replaced with an HA*. The most significant digits for both the even and the odd bit-width of A and B are formed as in the S-MB2 recoding scheme. The critical path delay of the S-MB3 recoding scheme is calculated as follows:

$$T_{S\text{-}MB3} = T_{HA^*,carry} + T_{FA,carry} + T_{HA^{**},sum} \qquad (10)$$

where $T_{HA^*,carry}$ and $T_{FA,carry}$ are the delays of shaping the output carry of a signed HA* and a conventional FA, respectively, and $T_{HA^{**},sum}$ is the delay of forming the sum of a signed HA**.
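The three recoding schemes can be modelled at the bit level with the following Python sketch, restricted to signed inputs of even bit-width $n = 2k$. It is a behavioural model under stated assumptions, not the authors' Verilog: the gate expressions for the signed cells and the most significant digit formulas are derived from the signed-weight relations described above, and all function names are hypothetical.

```python
def bits_of(v, n):
    """Two's complement bits of v, least significant first."""
    return [(v >> i) & 1 for i in range(n)]


def fa(p, q, ci):
    """Conventional full adder: 2*co + s = p + q + ci."""
    total = p + q + ci
    return total >> 1, total & 1


def s_mb1(a, b, n):
    k, A, B = n // 2, bits_of(a, n), bits_of(b, n)
    digits, c, b_prev = [], 0, 0                  # c_0 = 0, b_{-1} = 0
    for j in range(k):
        c_odd, s_even = fa(A[2 * j], B[2 * j], b_prev)
        # FA*: 2*c_next - s_odd = a_{2j+1} - b_{2j+1} + c_odd
        c_next, s_inv = fa(A[2 * j + 1], 1 - B[2 * j + 1], c_odd)
        s_odd = 1 - s_inv
        digits.append(-2 * s_odd + s_even + c)
        c, b_prev = c_next, B[2 * j + 1]
    return digits + [-A[n - 1] + c]               # signed MSD


def s_mb2(a, b, n):
    k, A, B = n // 2, bits_of(a, n), bits_of(b, n)
    digits, c1, c2 = [], 0, 0                     # c_{0,1} = c_{0,2} = 0
    for j in range(k):
        c_odd, s_even = fa(A[2 * j], B[2 * j], c1)
        p, q = A[2 * j + 1], B[2 * j + 1]
        t = p ^ q                                 # sum of the HA (or dual HA*)
        c1_next = (p | q) if j == k - 1 else (p & q)  # top cell uses dual HA*
        c2_next, s_odd = c_odd | t, c_odd ^ t     # HA*: 2*c - s = c_odd + t
        digits.append(-2 * s_odd + s_even + c2)
        c1, c2 = c1_next, c2_next
    return digits + [-c1 + c2]                    # signed MSD


def s_mb3(a, b, n):
    k, A, B = n // 2, bits_of(a, n), bits_of(b, n)
    digits, c1, c2 = [], 0, 0
    for j in range(k):
        c_odd, s_even = fa(A[2 * j], B[2 * j], c1)
        p, q = A[2 * j + 1], B[2 * j + 1]
        c1_next, t = p | q, p ^ q                 # HA* (dual form in the top cell)
        if j == k - 1:
            c2_next = c_odd | t                   # HA* replaces HA** in the top cell
        else:
            c2_next = c_odd & (1 - t)             # HA**: 2*c - s = -t + c_odd
        s_odd = c_odd ^ t
        digits.append(-2 * s_odd + s_even + c2)
        c1, c2 = c1_next, c2_next
    return digits + [-c1 + c2]


def value(digits):
    return sum(d * 4 ** j for j, d in enumerate(digits))


def verify(n):
    """Exhaustively check that every scheme recodes A + B for all n-bit inputs."""
    lo, hi = -(1 << (n - 1)), 1 << (n - 1)
    for a in range(lo, hi):
        for b in range(lo, hi):
            for recode in (s_mb1, s_mb2, s_mb3):
                digits = recode(a, b, n)
                assert value(digits) == a + b
                assert all(-2 <= d <= 2 for d in digits[:-1])
    return True


if __name__ == "__main__":
    print(s_mb1(3, 1, 4), verify(4), verify(6))
```

In this model every recoding cell depends only on its own input bits and on values produced in the previous cell from that cell's input bits, so no carry ripples across the word. This matches the claim that the critical path delay is constant with respect to the bit-width.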

Unsigned Input Numbers: When the input numbers A and B are unsigned, their most significant bits are positively signed. Figs. 4–8 present the modifications that we have to make in all S-MB schemes for both cases of even (the two most significant digits change) and odd (only the most significant digit changes) bit-width of A and B, regarding the signs of the most significant bits of A and B. The basic recoding block in all schemes remains unchanged.

Fig.3. Critical path of the proposed (a) S-MB1 and (b) S-MB2 recoding scheme.

III. SIMULATION RESULTS

The S-MB1, S-MB2 and S-MB3 schemes are coded in Verilog and simulated in Xilinx to check the desired functionality. The proposed S-MB schemes generate the Modified Booth encoded values directly from the inputs, without first adding the bits. Figs. 4 to 8 show the Xilinx snapshots of S-MB1 with signed and unsigned inputs, for both even and odd bit-widths. The S-MB1 and S-MB2 based FAM designs are on average more efficient than the ones based on S-MB3 and the existing schemes.

So, we propose the S-MB1 and S-MB2 recoding schemes as the most efficient ones with regard to overall performance, which includes critical path delay, area complexity and power dissipation. The proposed schemes are written in Verilog and synthesized with Xilinx ISE.

Fig.4. Smb1 even (signed).

Fig.5. Smb1 odd (signed).


Fig.6. Smb1 even (unsigned).


Fig.7. Smb1 odd (unsigned).

Fig.8. FAM based MAC operation.

IV. CONCLUSION

This paper focuses on optimizing the design of the Fused Add-Multiply (FAM) operator. We propose a structured technique for the direct recoding of the sum of two numbers into its MB form and explore three alternative designs of the proposed S-MB recoder. We then show that these techniques improve the FAM operator design in terms of power consumption, hardware complexity and critical path delay. Finally, we implemented a MAC application using the proposed method.

V. REFERENCES

[1] A. Amaricai, M. Vladutiu, and O. Boncalo, “Design issues and implementations for floating-point divide-add fused,” IEEE Trans. Circuits Syst. II–Exp. Briefs, vol. 57, no. 4, pp. 295–299, Apr. 2010.

[2] E. E. Swartzlander and H. H. M. Saleh, “FFT implementation with fused floating-point operations,” IEEE Trans. Comput., vol. 61, no. 2, pp. 284–288, Feb. 2012.

[3] J. J. F. Cavanagh, Digital Computer Arithmetic. New York: McGraw-Hill, 1984.

[4] S. Nikolaidis, E. Karaolis, and E. D. Kyriakis-Bitzaros, “Estimation of signal transition activity in FIR filters implemented by a MAC architecture,” IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol. 19, no. 1, pp. 164–169, Jan. 2000.

[5] O. Kwon, K. Nowka, and E. E. Swartzlander, “A 16-bit by 16-bit MAC design using fast 5:3 compressor cells,” J. VLSI Signal Process. Syst., vol. 31, no. 2, pp. 77–89, Jun. 2002.

[6] L.-H. Chen, O. T.-C. Chen, T.-Y. Wang, and Y.-C. Ma, “A multiplication-accumulation computation unit with optimized compressors and minimized switching activities,” in Proc. IEEE Int. Symp. Circuits and Syst., Kobe, Japan, 2005, vol. 6, pp. 6118–6121.

[7] Y.-H. Seo and D.-W. Kim, “A new VLSI architecture of parallel multiplier–accumulator based on Radix-2 modified Booth algorithm,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 18, no. 2, pp. 201–208, Feb. 2010.

[8] A. Peymandoust and G. de Micheli, “Using symbolic algebra in algorithmic level DSP synthesis,” in Proc. Design Automation Conf., Las Vegas, NV, 2001, pp. 277–282.

[9] W.-C. Yeh and C.-W. Jen, “High-speed and low-power split-radix FFT,” IEEE Trans. Signal Process., vol. 51, no. 3, pp. 864–874, Mar. 2003.

[10] C. N. Lyu and D. W. Matula, “Redundant binary Booth recoding,” in Proc. 12th Symp. Comput. Arithmetic, 1995, pp. 50–57.

[11] J. D. Bruguera and T. Lang, “Implementation of the FFT butterfly with redundant arithmetic,” IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., vol. 43, no. 10, pp. 717–723, Oct. 1996.

[12] W.-C. Yeh, “Arithmetic Module Design and its Application to FFT,” Ph.D. dissertation, Dept. Electron. Eng., National Chiao-Tung University, Chiao-Tung, 2001.

[13] R. Zimmermann and D. Q. Tran, “Optimized synthesis of sum-of-products,” in Proc. Asilomar Conf. Signals, Syst. Comput., Pacific Grove, Washington, DC, 2003, pp. 867–872.

[14] B. Parhami, Computer Arithmetic: Algorithms and Hardware Designs. Oxford: Oxford Univ. Press, 2000.

[15] O. L. Macsorley, “High-speed arithmetic in binary computers,” Proc. IRE, vol. 49, no. 1, pp. 67–91, Jan. 1961.

[16] N. H. E. Weste and D. M. Harris, “Datapath subsystems,” in CMOS VLSI Design: A Circuits and Systems Perspective, 4th ed. Readington: Addison-Wesley, 2010, ch. 11.

[17] S. Xydis, I. Triantafyllou, G. Economakos, and K. Pekmestzi, “Flexible datapath synthesis through arithmetically optimized operation chaining,” in Proc. NASA/ESA Conf. Adaptive Hardware Syst., 2009, pp. 407–414.

[18] [Online]. Available: http://www.synopsys.com/Tools/Implementation/RTLSynthesis/DCUltra/Pages/default.aspx.

[19] [Online]. Available: http://www.synopsys.com/Tools/Implementation/SignOff/PrimeTime/Pages/default.aspx.

[20] Z. Huang, “High-Level Optimization Techniques for Low-Power Multiplier Design,” Ph.D. dissertation, Department of Computer Science, University of California, Los Angeles, CA, 2003.


AUTHORS:

J. EASHA is pursuing an M.Tech in VLSI & ES at the Nagole Institute of Technology & Science, Hyderabad. She completed her B.Tech in E.C.E at Stanley College of Engineering and Technology for Women (OU affiliated), Abids, Hyderabad, Telangana.

Dr. K. SRINIVASA REDDY is an Associate Professor of Electronics and Communication Engineering at the Nagole Institute of Technology and Science, Hyderabad. He received his B.Tech degree in Electronics and Communication Engineering from JNT University, Hyderabad, his M.Tech degree in Embedded Systems from JNT University, Hyderabad, and his PhD from OPJS University, Churu, Rajasthan. He is a member of the International Association of Engineers (IAENG) and a member of IEEE. He has forty-five publications in national and international journals and has written three textbooks in the field of wireless communications.

Mr. RUPA KUMAR DHANAVATH is an Associate Professor of Electronics and Communication Engineering at the Nagole Institute of Technology and Science, Hyderabad. He received his B.Tech degree in Electronics and Communication Engineering from JNT University, Hyderabad, and his M.Tech degree in VLSI System Design from JNT University, Hyderabad. He has about six publications in national and international journals. His areas of interest are microelectronics and communications.
