A Resource-Optimized Split-Bit Multiplier for High-Order Computation

Vol. 28, No. 13, (2019), pp. 996-1004

N. Srinivas¹, Dr. Y. Rajasree Rao²

¹Electronics & Communication Engineering, Guru Nanak Institute of Technology, Hyderabad, Telangana, India

²Electronics & Communication Engineering, Lords Institute of Engineering & Technology, Hyderabad, Telangana, India

¹srinivasnarkuti@gmail.com, ²yrrao315@yahoo.com

Abstract

Multipliers are the basic unit in the computation of arithmetic and logical operations.

Among the different optimization approaches used in multiplier design, the divide-and-conquer approach offers greater efficiency. Splitting the operands into partial operands results in faster computation and lower resource overhead. However, a lower-radix operation requires more processing iterations, which increases both the resource utilization and the latency of the system. To improve multiplication performance, a resource optimization based on resource reutilization and higher-level coding is proposed. The approach shows a significant improvement in resource utilization and computing speed for the multiplication operation.

Keywords: Resource optimization, divide and conquer approach, Dadda multiplication.

1. Introduction

High-speed, low-resource processing units are needed to meet high-speed processing requirements. Meeting this demand requires changes to the system architecture or the algorithm relative to the traditional processing approach [1]. All existing processors are examples of data processing units. Information such as pressure, light and heat is captured as physical parameters, which are transformed into electronic signals and, in turn, into a digital data representation [2]. For active units and processing units to interact, the data must be carefully observed and transformed for digital processing [3]. Traditionally, data processing and information systems are governed by algorithmic derivatives, logical processing, and statistical data analysis [4]. As the technology has spread into various environments, this approach has been the subject of various scientific and research efforts [5-7]. In these approaches, the density of the processed data varies with system performance and the information being processed. In areas such as biomedical information or biomedical signal processing [8], signal processing requires multiplication-heavy calculations on large volumes of data. Data processing is the process of converting data into information or knowledge. This processing is usually carried out automatically and under constrained resources [9-12]. Data processing systems place high emphasis on practicality, since information is useful only when it is processed effectively.

To achieve high computing efficiency, either more resources must be provided or the data bits must be processed multiple times [13], and the processing must complete within a machine cycle [14]. Processing multiple data items on ample resources is an effective approach.

However, the available resources are bounded by resource allocation limits [15,16]. Fast processing can be achieved through the data representation: to overcome the resource allocation limit, a divide-and-conquer operation was developed [17] to achieve high throughput with low resources. However, the resource requirement of the split operation results in lower computing efficiency. To address this resource constraint, a higher-level logical computation with resource sharing is proposed.

The rest of this paper is organized as follows. Section 2 outlines the split-bit multiplier unit. Section 3 outlines the proposed resource-optimized split-bit coding. Section 4 presents the result analysis of the developed approach. Section 5 gives concluding remarks.

2. N/2 Split-Bit Multiplier

Split-bit multipliers are built on the divide-and-conquer approach [17]: N-bit input data are divided into N/2-bit patterns, and successive multiplication and addition generate the product. During multiplication, partial products (PP) are used to buffer temporary results. For two 16-bit inputs A and B, the partial product is defined as,

where $A_H$ and $B_H$ are the MSB blocks and $A_L$ and $B_L$ are the LSB blocks of the given inputs, for $k = 2^{n/2}$. The product operation depends on the radix of multiplication; for a 16-bit input the radix is $2^8$, which corresponds to a left shift by 8 bit positions. The resultant product is represented by,
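The split and the resulting product can be written out from the definitions above. The following is a reconstruction that assumes the standard divide-and-conquer identity is intended; the notation of the original equations may differ.

```latex
\begin{align}
A &= A_H \cdot k + A_L, \qquad B = B_H \cdot k + B_L, \qquad k = 2^{n/2} \\
A \times B &= A_H B_H \, k^{2} + \left(A_H B_L + A_L B_H\right) k + A_L B_L \\
           &= A_H B_H \, 2^{16} + \left(A_H B_L + A_L B_H\right) 2^{8} + A_L B_L
              \qquad (n = 16)
\end{align}
```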

For each 8-bit operation, a partial product is generated, represented as,

Each partial product is 16 bits wide and is divided into two equal parts, given as,

These partial products are added to give the final product (FP), as given below,
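The 16-bit case described above can be sketched in Python. This is a minimal illustration, not the authors' implementation: the function name `split_multiply_16`, the ordering of the four partial products, and the byte-column grouping of their halves are assumptions made for the example.

```python
def split_multiply_16(a: int, b: int) -> int:
    """Multiply two unsigned 16-bit values from four 8x8 partial products."""
    if not (0 <= a < 1 << 16 and 0 <= b < 1 << 16):
        raise ValueError("operands must be unsigned 16-bit values")

    # Split each operand into its MSB and LSB blocks (k = 2**8).
    a_h, a_l = a >> 8, a & 0xFF
    b_h, b_l = b >> 8, b & 0xFF

    # Four 16-bit partial products (ordering is an assumption).
    pps = [a_l * b_l, a_l * b_h, a_h * b_l, a_h * b_h]

    # Divide every partial product into two equal 8-bit parts.
    (p1_h, p1_l), (p2_h, p2_l), (p3_h, p3_l), (p4_h, p4_l) = [
        (pp >> 8, pp & 0xFF) for pp in pps
    ]

    # Group the halves by byte weight; Python integers absorb the carries.
    col0 = p1_l                      # weight 2**0
    col1 = p1_h + p2_l + p3_l        # weight 2**8
    col2 = p2_h + p3_h + p4_l        # weight 2**16
    col3 = p4_h                      # weight 2**24
    return col0 + (col1 << 8) + (col2 << 16) + (col3 << 24)
```

Grouping the halves by byte weight mirrors the idea that each 16-bit partial product contributes one 8-bit part to each of two adjacent byte positions of the final product.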

Whereas [17] allocates dedicated resources to each partial product, [18] outlines an optimization of resource utilization based on a recursive multiplication process following the Dadda algorithm. The multiplication architecture developed for this approach is illustrated in Figure 1 below.

Figure 1. 16-bit split multiplier [18]

Here, to optimize resource utilization, a 2-bit counter controls the multiplexing and decode operations. For each operation, a recursive 8-bit Dadda multiplier unit is used to generate a partial product. At each iteration, one PP is generated and buffered into a data register. The buffered PPs are then added to generate the product result.

In the implementation of this approach, a 2-bit counter controls the multiplexer, storage and decoder units. For product generation, a single multiplier unit is iterated 4 times; in each iteration, the inputs are taken from the multiplexer and a partial product is generated according to the select line of the multiplexing unit. A 2-to-4 decoder unit controls register access in the multiplier unit. For the available data, the two PP blocks are stored in separate registers using a control signal passed by the decoder unit.
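The counter-controlled iteration described above can be modelled behaviourally. The sketch below is an illustration under stated assumptions: the mapping of the two counter bits to the operand select lines, the one-hot decoder output, and the name `iterative_split_multiply` are hypothetical, and the shared 8x8 multiplier is modelled with ordinary integer multiplication.

```python
def iterative_split_multiply(a: int, b: int) -> int:
    """Model a 16-bit split multiplier that reuses one 8x8 unit four times."""
    if not (0 <= a < 1 << 16 and 0 <= b < 1 << 16):
        raise ValueError("operands must be unsigned 16-bit values")

    a_halves = (a & 0xFF, a >> 8)    # index 0 = LSB block, 1 = MSB block
    b_halves = (b & 0xFF, b >> 8)
    pp_regs = [0, 0, 0, 0]           # dedicated PP registers

    for count in range(4):           # 2-bit counter: 00, 01, 10, 11
        # Multiplexer select lines (assumed mapping of the counter bits).
        sel_a, sel_b = count >> 1, count & 1
        # 2-to-4 decoder: exactly one register write enable is active.
        write_enable = [i == count for i in range(4)]
        # Single shared 8x8 multiplier, modelled with integer multiply.
        pp = a_halves[sel_a] * b_halves[sel_b]
        for i, enabled in enumerate(write_enable):
            if enabled:
                pp_regs[i] = pp

    # Add the buffered PPs; each selected MSB block adds 8 bits of weight.
    product = 0
    for count, pp in enumerate(pp_regs):
        shift = 8 * ((count >> 1) + (count & 1))
        product += pp << shift
    return product
```

The model makes the trade-off visible: only one multiplier is instantiated, but four sequential iterations and four PP registers are needed before the final addition.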

The register data are processed for multiplication using the Dadda multiplier algorithm, producing the partial products (PP1-PP4). The Dadda algorithm is observed to be faster in computation than the Wallace tree algorithm. Here the multiplication operation is controlled by the counter output, which is used in generating the PPs. The PPs are added using a ripple-carry adder because of its low area utilization and lower complexity.
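For readers unfamiliar with the Dadda scheme, the following Python sketch shows the column-compression idea for an 8x8 unit. It is a bit-level arithmetic model only, written for illustration: it follows Dadda's stage heights (2, 3, 4, 6, ...) and uses half and full adders per column, but it does not model gate timing, and the final carry-propagate addition (a ripple-carry adder in the text) is modelled with integer addition. The names `dadda_heights` and `dadda_multiply` are assumptions.

```python
def dadda_heights(n):
    """Dadda stage heights (2, 3, 4, 6, 9, ...) below n, largest first."""
    heights = [2]
    while heights[-1] * 3 // 2 < n:
        heights.append(heights[-1] * 3 // 2)
    return heights[::-1]


def dadda_multiply(a, b, n=8):
    """Multiply two unsigned n-bit values by Dadda-style column compression."""
    if not (0 <= a < 1 << n and 0 <= b < 1 << n):
        raise ValueError("operands must be unsigned n-bit values")

    # columns[w] holds the bits of weight 2**w (AND-gate partial products).
    columns = [[] for _ in range(2 * n + 1)]
    for i in range(n):
        for j in range(n):
            columns[i + j].append((a >> i) & (b >> j) & 1)

    for target in dadda_heights(n):
        for w in range(2 * n):
            col = columns[w]
            while len(col) > target:
                if len(col) == target + 1:
                    # Half adder: two bits in, sum stays, carry moves up.
                    x, y = col.pop(), col.pop()
                    s, c = x ^ y, x & y
                else:
                    # Full adder: three bits in, sum stays, carry moves up.
                    x, y, z = col.pop(), col.pop(), col.pop()
                    s, c = x ^ y ^ z, (x & y) | (y & z) | (x & z)
                # New bits go to the front so older bits are consumed first.
                col.insert(0, s)
                columns[w + 1].insert(0, c)

    # Final carry-propagate addition, modelled with integer arithmetic.
    return sum(bit << w for w, col in enumerate(columns) for bit in col)
```

Every half or full adder preserves the weighted sum of the bits it consumes, which is why the compressed columns still add up to the exact product.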

The process is iterated to generate the product result. Selecting the input lines in this way optimizes resource usage; however, the iterative nature of the process and the dedicated registers for the PPs and the addition operation result in delay and resource overhead in the system. To overcome this limitation, a new resource-efficient split multiplier unit is proposed.

3. Proposed Resource-Optimized Split-Bit Multiplier

In developing a resource-optimized design for the multiplication operation, a new resource optimization with scheduled resource utilization is proposed. The unit controls the allocation process and temporary buffering based on recurrent logic sharing in the multiplexing operation of the multiplier unit. A new control operation based on transition events is proposed, in which a count operation monitors register utilization and PP generation to trigger resource reutilization. The register allocation in the existing approach and in the proposed approach is illustrated below.

For a 16-bit input, the existing approach allocates a register bank of 8x2 locations for buffering the operands and 8 partial products, as shown in Figure 2.

Figure 2. Register allocation in existing 2/N split multiplier [18]

The total number of dedicated registers allocated in this approach is given as,

Operand register = 2, PP_register = 8, and Adder = 8.

Total registers dedicated = 2 + 8 + 8 = 18

In the proposed approach, however, instead of allocating 8 dedicated PP registers, a transient-based control for register allocation is proposed. For each operand event and multiplexer selection, a temporary PP register (T_PP_reg) is allocated, and at the next event the results are added and buffered to the add registers. In the next iteration, T_PP_reg is cleared and reallocated to buffer the result of the current operation. The content of T_PP_reg is then added and buffered to the next add register of the multiplier unit. The operation is illustrated below.

Figure 2. Register allocation for the proposed approach

The transient monitor unit observes a change in the operand and triggers the allocation of T_PP_reg, which is then synchronized with the adder unit to add the two PPs; the result is buffered in the adder register.
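The register reuse described above can be sketched as a behavioural model. Several details are assumptions made for illustration: the transient monitor is modelled as a change in the multiplexer select pair, the temporary register T_PP_reg is modelled as two 8-bit halves, the adder registers are collapsed into one running sum, and the name `shared_register_multiply` is hypothetical.

```python
def shared_register_multiply(a: int, b: int) -> int:
    """Model a split multiplier that reuses one temporary PP register."""
    if not (0 <= a < 1 << 16 and 0 <= b < 1 << 16):
        raise ValueError("operands must be unsigned 16-bit values")

    a_halves = (a & 0xFF, a >> 8)    # index 0 = LSB block, 1 = MSB block
    b_halves = (b & 0xFF, b >> 8)

    t_pp_hi, t_pp_lo = 0, 0          # T_PP_reg as two 8-bit halves (assumed)
    add_reg = 0                      # running sum held by the adder registers
    last_select = None

    for count in range(4):           # 2-bit counter drives the select lines
        select = (count >> 1, count & 1)
        if select != last_select:    # transient monitor: selection changed
            # Clear and reallocate the temporary register for this PP.
            t_pp_hi, t_pp_lo = 0, 0
            pp = a_halves[select[0]] * b_halves[select[1]]
            t_pp_hi, t_pp_lo = pp >> 8, pp & 0xFF
            # Add the buffered PP, weighted by the selected blocks.
            shift = 8 * (select[0] + select[1])
            add_reg += ((t_pp_hi << 8) | t_pp_lo) << shift
            last_select = select
    return add_reg
```

Compared with a design that keeps every partial product in its own register, this model holds only one partial product at a time and folds it into the sum before the next one is generated.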

The total number of operational registers in this case is given by,

Operand register = 2, PP_register = 2, and Adder = 8.

Total registers dedicated = 2 + 2 + 8 = 12

Thus 6 registers are saved in the allocation process (18 - 12 = 6). The modified architecture for the proposed approach is illustrated as,

Figure 3. Proposed architecture for the resource optimized split bit register

The inputs are taken as 16-bit inputs and passed to the multiplexer unit. Each operand is then passed to the PP register based on the transient monitoring unit. The validation of the developed approach is presented in the section below.

4. Result Analysis

To evaluate the proposed approach, an operational description is defined in an HDL environment. Timing simulation was carried out for the approach, and the results obtained are illustrated below.

Figure 3: Result for the operand fetch process.

Figure 4: Result for the multiplication phase.

Figure 5: Result for the adder phase.

The figures above illustrate the timing simulation of the developed system. Operation detection and logic operation define the locations monitored for a system with a 5x5 CLB configuration. The interconnections vary considerably with the bit width of the developed system. The implementation details of the developed approach are shown below.

Figure 6. Observed synthesis report.

The power analysis details for the developed approach are obtained as,

Figure 9: Power analysis

The timing report for the implementation is given as,

Maximum frequency: 145.32 MHz

Minimum input arrival time before clock: 1.450 ns

Maximum output required time after clock: 2.515 ns

Figure 7: RTL view of implementation.

Figure 8: Routing connection of CLBs in the targeted FPGA.

Figure 9: Top Chip view of the implementation

The logical implementation of the developed approach on the targeted Xilinx FPGA is monitored. The logical area coverage and placement are tracked on the targeted FPGA, and the chip outline is monitored for the target device. The placement and routing of the approach on the targeted device are configured correctly. Operation density was introduced based on the operation bit coding.

| Parameter | Proposed | [18] | [7] | [4] |
|---|---|---|---|---|
| FPGA device | Spartan E | Spartan E | Spartan E | Spartan E |
| Algorithm | Dadda, resource optimized split multiplier | Dadda, N/2 split multiplier | Dadda | Vedic |
| Slice | 123 | 132 | 493 | 493 |
| IOBs | 69 | 67 | 66 | 66 |
| LUTs | 198 | 228 | 844 | 1234 |
| FF | 49 | 62 | 492 | - |
| Freq. (MHz) | 415.5 | 320.7 | 79.1 | - |
| Delay (ns) | 23.6 | 38.3 | 61.6 | 38.8 |
| Power (mW) | 118.4 | 211.8 | 278.3 | 297.8 |

Table 1. Comparison of the developed approach with existing approaches
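The relative savings of the proposed design over the N/2 split multiplier of [18] can be derived directly from Table 1. The short script below is only a convenience for that arithmetic: the values are copied from the table, and the helper name `reduction_percent` is an assumption.

```python
# Values copied from Table 1 as (proposed, reference [18]).
TABLE_1 = {
    "Slice": (123, 132),
    "LUTs": (198, 228),
    "FF": (49, 62),
    "Delay (ns)": (23.6, 38.3),
    "Power (mW)": (118.4, 211.8),
}


def reduction_percent(proposed, baseline):
    """Percentage by which `proposed` is lower than `baseline`."""
    return 100.0 * (baseline - proposed) / baseline


# Relative reduction for every lower-is-better metric in the table.
reductions = {
    name: reduction_percent(proposed, baseline)
    for name, (proposed, baseline) in TABLE_1.items()
}

for name, value in reductions.items():
    print(f"{name}: {value:.1f}% lower than [18]")
```

Only metrics where a lower value is better are included; the frequency row is left out because it improves in the opposite direction.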

5. Conclusion

An advanced approach for improving the performance and speed of the multiplication process is presented. A multi-level representation of the multiplication signal helps reduce the complexity that arises in traditional approaches at variable data sizes. The processing of multiplication data is demonstrated through the logical representation and processing of multiple registers. The proposed approach improves the signal processing of the multiplexing system. The performance of the developed approach shows that fast signal detection, computation, and addition can achieve a high level of logical representation for data of any density, and that higher-magnitude values can be processed, giving higher computation compatibility.

References

[1] A. B. Pawar, “Radix-2 Vs Radix-4 High Speed Multiplier”, International Journal of Advanced Research in Computer Science and Software Engineering, Volume 5, Issue 3, pp. 329-333, March 2015.

[2] Philip E. Madrid and Brian Millar, “Modified Booth Algorithm for High Radix Multiplication”, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Volume 1, Issue 2, pp. 164–167, August 2002.

[3] Minu Thomas, “Design and Simulation of Radix-8 Booth Encoder Multiplier for Signed and Unsigned Numbers”, International Journal for Innovative Research in Science & Technology, Vol. 1, Issue 1, pp. 1-10, June 2014.

[4] K. Swapna, A. Krishna Mohan, “Area Optimized Radix-2 8-Bit Reversible Booth Multiplier”, Int. Journal of Engineering Research and Application, Vol. 7, Issue 10, (Part 1), pp. 65-70, October 2017.

[5] Kelly Liew Suet Swee, Lo Hai Hiung, “Performance comparison review of Radix-based multiplier designs”, 4th International Conference on Intelligent and Advanced Systems, Volume 2, pp. 854-859, 12-14 June 2012.

[6] Chandrashekhar T. Kukade, “A Novel Parallel Multiplier for 2’s Complement Numbers Using Booth’s Recoding Algorithm”, IEEE International Conference on Electronic Systems, Signal Processing and Computing Technologies, Volume 2, Issue 8, pp. 93–98, 9-11 Jan. 2014.

[7] CHEN Ping-hua, ZHAO Juan, “High-speed Parallel 32×32-b Multiplier Using a Radix-16 Booth Encoder”, IEEE, Third International Symposium on Intelligent Information Technology Application Workshops, Volume 3, Issue 4, pp. 406-409, Nov. 2009.

[8] Laya Surendran E K, Rony Antony P, “Implementation of fast multiplier using modified Radix-4 Booth Algorithm with redundant binary adder for low energy applications”, First International Conference on Computational Systems and Communications, Volume 1, Issue 2, pp. 266-271, 17-18 Dec 2014.

[9] Na Tang, “A High-Performance 32-bit Parallel Multiplier Using Modified Booth's Algorithm and Sign-Deduction Algorithm”, IEEE, ASIC, Volume 2, pp. 1281–1284, Oct 2003.

[10] Razaidi Hussin, Ali Yeon Md. Shakaff, “An Efficient Modified Booth Multiplier Architecture”, IEEE, International Conference on Electronic Design, Volume 1, Issue 6, pp. 271-276, December 2008.

[11] Kajal B. Bobade, Prof. V. G. Roy, Prof. S. Kuntawar, “A Review On Fast Radix-10 Multiplication Using Binary Input And Convert Into Decimal Codes”, International Journal of Science, Engineering and Technology Research, Volume 06, Issue 05, pp. 795-797, May 2017.

[12] “High Radix Encoding for Energy-Efficient Inexact Multipliers”, IEEE Transactions on Very Large Scale Integration Systems, Vol. 26, (3), pp. 421–43, 2018.

[13] V. Leon, G. Zervakis, S. Xydis, D. Soudris, and K. Pekmestzi, “Walking through the Energy-Error Pareto Frontier of Approximate Multipliers”, IEEE Micro 38, 4 (Jul-Aug 2018), 40–49, 2018.

[14] A. Lingamneni, C. Enz, K. Palem, and C. Piguet, “Synthesizing Parsimonious Inexact Circuits Through Probabilistic Design Techniques”, ACM Transactions on Embedded Computing Systems 12, 2s (May 2013), 93:1–93:26, 2013.

[15] A. Lingamneni, C. Enz, K. Palem, and C. Piguet, “Highly Energy-Efficient and Quality-Tunable Inexact FFT Accelerators”, IEEE Custom Integrated Circuits Conference, pp. 1–4, 2014.

[16] C. Liu, J. Han, and F. Lombardi, “A Low-Power, High-Performance Approximate Multiplier with Configurable Partial Error Recovery”, Design, Automation and Test in Europe, pp. 1–4, 2014.

[17] Manolopoulos K, Reisis D, Chouliaras, “An efficient multiple precision floating-point multiplier”, 18th IEEE International Conference on Electronics, Circuits and Systems (ICECS), IEEE, pp. 153–156, 2011.

[18] Muneeb Abrar, Hassan Elahi, Bilal Ali Ahmad, Muhammad Ghayasudin, M. Rizwan Mughal, “An area-optimized N-bit multiplication technique using N/2-bit multiplication algorithm”, SN Applied Sciences, Springer, 1:1348,