ANALYSIS AND IMPLEMENTATION OF LOW POWER WALLACE TREE MULTIPLIER
MS. V.N. CHAUDHARY (1), PROF. DR. P.R. DESHMUKH (2)
(1, 2) Department of Electronics And Telecommunication Engineering,
Sipna’s College Of Engineering And Technology, Amravati University, Amravati–444606, Maharashtra, India.
ABSTRACT : The design of multipliers that are high-speed, low-power and/or regular in layout is of substantial research interest. Many attempts have been made to reduce the number of partial products generated in a multiplication process; one of them is the Wallace tree multiplier. Wallace tree CSA structures are used to sum the partial products in reduced time. In this paper, an 8*8 bit modified Wallace tree construction is investigated and evaluated. The Wallace high-speed multiplier uses full adders, half adders, 4:2 compressors and 3:2 compressors in its reduction phase. Minimizing the number of half adders used in the reduction therefore reduces the complexity. A modification to the Wallace reduction is presented whose delay is the same as that of the conventional Wallace reduction.
KEY WORDS : CSA, Compressor, Wallace
1. INTRODUCTION
In the majority of digital signal processing (DSP) applications, the critical operations usually involve many multiplications and/or accumulations. For real-time signal processing, a high-speed, high-throughput Multiplier-Accumulator (MAC) is always key to achieving a high-performance digital signal processing system. In the last few years, the main consideration in MAC design has been to enhance its speed, while the battery energy available for portable products limits the power consumption of the system. To achieve a high-speed MAC, it is necessary to design a high-speed multiplier.
To achieve high speed, the well-known Wallace high-speed multiplier uses carry save adders to reduce an N-row bit product matrix to an equivalent two-row matrix, which is then summed with a carry propagating adder to give the product. The carry save adders are conventional full adders whose carries are not connected, so that three words are taken in and two words are output. The Wallace multiplier also uses half adders, 4:2 compressors and 3:2 compressors in its reduction phase. This paper presents a modified design that greatly reduces the number of half adders in Wallace multipliers.
Wallace Tree Construction
There exist a handful of ways to construct the Wallace tree. One method considers all the bits in each column at a time and compresses them into two bits (a sum and a carry). A second method, which is used in this project, considers all the bits in each four rows at a time and compresses them in an appropriate manner (see Figure 1).
Fig 1. Traditional Wallace Tree construction
In Figure 1, the Wallace tree uses 4:2 compressors, 3:2 compressors, full adders and half adders to compress a tree of 12 partial products. Each 4:2 compressor takes in four bits from the same position “j” and one bit from the previous position “j-1” (the carry out of the compressor in the previous position), and outputs one sum bit in position “j” and two carry outs into the next position “j+1”.
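The behavior of the 4:2 compressor described above can be illustrated with a small bit-level model. The following Python sketch is not the VHDL used in this work; it assumes the common construction from two chained full adders, and the names full_adder and compressor_4_2 are chosen here for illustration only.

```python
from itertools import product

def full_adder(a, b, c):
    """3:2 counter: three bits of weight j give a sum (weight j) and a carry (weight j+1)."""
    s = a ^ b ^ c
    carry = (a & b) | (b & c) | (a & c)
    return s, carry

def compressor_4_2(x1, x2, x3, x4, cin):
    """4:2 compressor modeled as two chained full adders (an assumed structure).

    Returns (sum, carry, cout): sum has weight j, carry and cout have weight j+1.
    cout depends only on x1..x3 and never on cin, so carries do not ripple
    along a row of compressors.
    """
    s1, cout = full_adder(x1, x2, x3)
    s, carry = full_adder(s1, x4, cin)
    return s, carry, cout

# Exhaustive check over all 32 input combinations: the weighted outputs
# must equal the number of ones among the five inputs.
for bits in product((0, 1), repeat=5):
    s, carry, cout = compressor_4_2(*bits)
    assert sum(bits) == s + 2 * (carry + cout)
```

Because cout is produced before cin is consumed, a row of such cells has a delay that does not grow with the word length, which is what makes the cell attractive in the reduction phase.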
2. BACKGROUND
Any multiplier can be divided into three stages: the partial products generation stage, the partial products addition stage, and the final addition stage. In the first stage, the multiplicand and the multiplier are multiplied bit by bit to generate the partial products. The second stage is the most important, as it is the most complicated and determines the speed of the overall multiplier. This project focuses on the optimization of this stage, which consists of the addition of all the partial products. If speed is not an issue, the partial products can be added serially, reducing the design complexity. However, in high-speed design, the Wallace tree construction method is usually used to add the partial products in a tree-like fashion in order to produce two rows of partial products that can be added in the last stage. Although fast, since its critical path delay is proportional to the logarithm of the number of bits in the multiplier, the Wallace tree introduces other problems such as wasted layout area and increased complexity. In the last stage, the two-row outputs of the tree are added using any high-speed carry propagating adder to generate the output result.
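The three stages can be sketched as a bit-level behavioral model. The Python code below is an illustration under stated assumptions, not the design of this paper: it uses a simplified column-wise reduction with only full and half adders (no 4:2 or 3:2 compressor cells), and the function names partial_products, reduce_once and multiply are hypothetical.

```python
def partial_products(x, y, n=8):
    """Stage 1: AND every multiplicand bit with every multiplier bit.

    Returns a list of columns; columns[k] holds the bits of weight 2**k.
    """
    columns = [[] for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            columns[i + j].append(((x >> i) & 1) & ((y >> j) & 1))
    return columns

def reduce_once(columns):
    """Stage 2, one level: compress each column with full and half adders."""
    out = [[] for _ in range(len(columns) + 1)]
    for k, col in enumerate(columns):
        i = 0
        while len(col) - i >= 3:            # full adder: 3 bits -> sum + carry
            a, b, c = col[i:i + 3]
            out[k].append(a ^ b ^ c)
            out[k + 1].append((a & b) | (b & c) | (a & c))
            i += 3
        if len(col) - i == 2:               # half adder: 2 bits -> sum + carry
            a, b = col[i:i + 2]
            out[k].append(a ^ b)
            out[k + 1].append(a & b)
            i += 2
        out[k].extend(col[i:])              # a single leftover bit passes through
    return out

def multiply(x, y, n=8):
    """Run all three stages and return the product of two n-bit operands."""
    columns = partial_products(x, y, n)
    while max(len(col) for col in columns) > 2:
        columns = reduce_once(columns)
    # Stage 3: the two remaining rows go to a carry propagating adder,
    # modeled here by ordinary integer addition.
    row_a = sum(col[0] << k for k, col in enumerate(columns) if len(col) > 0)
    row_b = sum(col[1] << k for k, col in enumerate(columns) if len(col) > 1)
    return row_a + row_b

print(multiply(13, 11))  # 143
```

Every adder cell preserves the weighted sum of the bits it consumes, so the two rows handed to the final adder always add up to the product, regardless of how the cells are arranged.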
3. WALLACE TREE MULTIPLIER DESIGN
[Block diagram: inputs x(7:0) and y(7:0) drive the partial-product block U2 (inputand1), which generates partial products Pa00 to Po77; these pass through half_adder, full_adder, cm42 and new32 blocks to outputs Output0 to Output15.]
Fig.2 Block diagram of Wallace Tree Multiplier
An 8*8 bit multiplier is designed in a 0.25μm process with a 2.8V supply. The critical path of the Wallace tree is simulated to be less than 1ns. The Wallace tree multiplier block diagram is shown in Figure 2. It takes an 8-bit multiplicand and an 8-bit multiplier and generates the result. The partial products are added in a Wallace tree fashion as shown in Figure 1. It consists of the following numbers of components:
14 – Full Adder
14 – 4:2 compressor
6 – 3:2 compressor
2 – Half Adder
A 4:2 compressor takes five inputs and produces two outputs, with a carry passed to the next stage. A 3:2 compressor takes four inputs and produces two outputs. A full adder takes three inputs to produce two outputs, a sum and a carry, while a half adder takes two inputs and produces two outputs, a sum and a carry. As a half adder does not contribute much to the reduction phase, it is desirable to reduce the number of half adders. The modified Wallace tree construction reduces the number of half adders and hence the complexity. An 8*8 bit multiplication produces 64 partial product bits, which are reduced by the Wallace tree using the above components.
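The reason a half adder contributes little to the reduction can be seen by counting bits in and bits out. The following Python sketch is an illustration only; the helper bits_removed is a hypothetical name introduced here and does not appear in the design.

```python
from itertools import product

def half_adder(a, b):
    """Two bits of weight j -> (sum of weight j, carry of weight j+1)."""
    return a ^ b, a & b

def full_adder(a, b, cin):
    """Three bits of weight j -> (sum of weight j, carry of weight j+1)."""
    return a ^ b ^ cin, (a & b) | (b & cin) | (a & cin)

def bits_removed(cell, n_inputs):
    """Inputs minus outputs: how many matrix bits one cell eliminates."""
    n_outputs = len(cell(*([0] * n_inputs)))
    return n_inputs - n_outputs

# Both cells preserve the arithmetic value of their inputs.
for a, b in product((0, 1), repeat=2):
    s, c = half_adder(a, b)
    assert a + b == s + 2 * c
for a, b, cin in product((0, 1), repeat=3):
    s, c = full_adder(a, b, cin)
    assert a + b + cin == s + 2 * c

print(bits_removed(half_adder, 2))  # 0: a half adder only moves a bit to the next column
print(bits_removed(full_adder, 3))  # 1: a full adder removes one bit from the matrix
print(8 * 8)                        # 64 partial product bits for an 8*8 multiplier
```

A half adder leaves the total number of bits unchanged, so each one costs area without shrinking the matrix, whereas every full adder removes one bit.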
4. OPTIMIZATION
The circuits produced by contemporary VHDL synthesis tools are, unfortunately, highly sensitive to the manner in which the original behavioral or structural description is expressed. When designing the Wallace tree multiplier, using different VHDL constructs to describe the same behavior resulted in different synthesized circuits. When writing VHDL, some aspects should therefore be taken into account carefully, because the synthesis tools are sensitive to how the behavioral or structural descriptions are written.
5. IMPLEMENTATION IN VHDL
VHDL can be used to represent several levels of abstraction. The level chosen to represent the 8*8 bit Wallace tree multiplier includes a mixture of register-level and gate-level logic blocks. To describe the register transfer level and the data flow through the multiplier, VHDL constructs called processes provide the sequential instructions that manipulate data. Each process contains a sensitivity list that triggers the actions within the process.
6. VERIFICATION OF SIMULATION
Quartus II is a powerful design and simulation tool developed by Altera for Altera devices. VHDL code can be compiled, and special libraries for Altera devices can be used to simulate the hardware efficiently. All the basic modules designed for the Wallace tree multiplier are compiled and tested rigorously for functional correctness using waveforms. The complete circuit of the 8*8 bit Wallace tree multiplier is described in VHDL as a structural component. The hierarchical structure is created and the complete simulation can be observed. Fig.3 shows the waveform simulation result for the Wallace tree multiplier.
Fig.3 Simulation waveform for Wallace Tree Multiplier
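Functional correctness of an 8*8 bit multiplier can also be checked exhaustively, since there are only 65,536 input pairs. The Python sketch below illustrates such a check and is not part of the Quartus II flow used here; dut_multiply is a hypothetical shift-and-add stand-in for the simulated design, and find_mismatches is a name chosen for illustration.

```python
def dut_multiply(x, y, n=8):
    """Hypothetical stand-in for the design under test (shift-and-add)."""
    acc = 0
    for j in range(n):
        if (y >> j) & 1:
            acc += x << j
    return acc & ((1 << (2 * n)) - 1)   # truncate to the 2n-bit output bus

def find_mismatches(dut, n=8):
    """Compare the design against the reference product for every input pair."""
    limit = 1 << n
    return [(x, y)
            for x in range(limit)
            for y in range(limit)
            if dut(x, y) != x * y]

mismatches = find_mismatches(dut_multiply)
print(len(mismatches))  # 0 when the model is functionally correct
```

In practice the same comparison would be driven from a testbench, with the reference product computed alongside the waveform outputs.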
7. RTL VIEW
The RTL view of the 8*8 bit Wallace tree multiplier is shown in Fig.4. The Wallace tree multiplier is designed and implemented using Quartus II, targeting the Cyclone II device EP2C20F484C7.
Fig.4 RTL View Of Wallace Tree Multiplier
8. POWER & AREA ANALYSIS
Powerplay Power Analyzer status : Successful – Sat Jan 08 00:43:27 2011
Quartus II Version : 8.1 Build 163 10/28/2008 SJ Web Edition
Revision Name : wall2
Top-level Entity Name : wall2
Family : Cyclone II
Device : EP2C20F484C7
Power Models : Final
Total Thermal Power Dissipation : 73.15 mW
Core Dynamic Thermal Power Dissipation : 0.15 mW
Core Static Thermal Power Dissipation : 47.36 mW
I/O Thermal Power Dissipation : 25.63 mW
Total Logic Elements : 155 / 18,752 ( < 1 % )
Total Combinational Functions : 155 / 18,752 ( < 1 % )
Dedicated Logic Registers : 0 / 18,752 ( 0 % )
Total Registers : 0
Total Pins : 32 / 315 ( 10 % )
Total Virtual Pins : 0
Total Memory Bits : 0 / 239,616 ( 0 % )
Embedded Multiplier 9-bit Elements : 0 / 52 ( 0 % )
Total PLLs : 0 / 4 ( 0 % )
9. CONCLUSION
Wallace tree multipliers can be designed and analyzed using a new modified method of Wallace tree construction. The modified tree has a slightly smaller critical path and a slightly larger wiring overhead, but offers low power and high speed. An 8x8 bit Wallace tree multiplier was designed in a 0.25μm process. As per the analysis, the core dynamic thermal power dissipation is only 0.15mW. The total logic elements and total combinational functions required are less than 1% of those available. No dedicated logic registers, memory bits or PLLs are required. The modified multiplier design, which consists of 4:2 compressors, 3:2 compressors, full adders and a reduced number of half adders, reduces the complexity and saves power.