FPGA Implementation of Reconfigurable FIR Filter using Vedic Design with CLA Adder


Kasarla Satish Reddy¹, Hosahally Narayangowda Suresh²

1 Visvesvaraya Technological University /Department of Electronics and Instrumentation Engineering, Bengaluru, India

2 Bangalore Institute of Technology/ Department of Electronics and Instrumentation Engineering, Bengaluru, India

Abstract

Reconfigurable Finite Impulse Response (RFIR) filters are required in most Digital Signal Processing (DSP) applications, because a reconfigurable filter can change its coefficients frequently during operation. In this paper, a Vedic Design (VD) multiplier combined with a Carry Look-Ahead (CLA) adder is used to design the RFIR filter (RFIR-VD-CLA).

The RFIR architecture is designed for four combinations of input width and tap count: 4 bit & 3 tap, 4 bit & 7 tap, 8 bit & 3 tap, and 8 bit & 7 tap. For all the architectures, both FPGA and ASIC performance are evaluated. Cadence tools with 180nm and 45nm technology are used to calculate the area, power, and delay of the entire architecture. Xilinx tools are used to evaluate FPGA metrics such as the number of LUTs, flip-flops, and slices, and the operating frequency. In 180nm technology, the 4-bit, 3-tap RFIR-VD-CLA architecture occupies 103716 um² of area, consumes 693908 nW of power, and has a delay of 130 ps. The RFIR-VD-CLA architecture achieves better ASIC and FPGA performance than existing architectures.

Index words: Area, Carry Look ahead adder, Power, Reconfigurable Finite Impulse Response Filter, Vedic design.

1. Introduction

FIR digital filters are required in many DSP applications such as echo removal, speech processing, speaker normalization, adaptive noise removal, and communications [1] [2]. Two types of digital filters are used in communication systems: the FIR filter and the Infinite Impulse Response (IIR) filter. In DSP systems, the FIR design plays a vital role; it operates by convolving the input data samples with the desired unit response of the FIR filter [3]. The FIR filter is widely used as a basic tool in DSP and image processing applications because of its linear phase property and absolute stability [4]. High-speed, low-complexity FIR filters are used extensively in mobile communication systems and multimedia applications such as matched filtering, video convolution functions, and channel equalization [5].

The main limitation of the traditional FIR filter design is its large number of computations: a high filter order requires more hardware area and more power [6] [7]. In recent years, Distributed Arithmetic (DA) based FIR filters have been deployed, but they become more complex as the filter order grows, because the filter order increases the number of Multiply-Accumulate (MAC) operations needed for each FIR filter output [8] [9]. A number of architectures have been proposed, such as the linear phase FIR filter [10], the parallel FIR filter [11], the low power multiplier FIR filter [12], and the DA based FIR filter [13]. All of these architectures require more hardware and provide lower efficiency. Moreover, a number of existing architectures do not support a reconfigurable design.

A block-based RFIR structure can easily be derived using the schemes proposed in [14] and [15]. However, the block structure is not sufficient for large filter lengths and variable filter coefficients. Therefore, the block-based RFIR structure is suitable mainly for 2-D FIR filters and block least mean square adaptive filters [16]. To overcome this problem, the RFIR-VD-CLA architecture is introduced in this paper. Normal adders and multipliers occupy more area and consume more power, so this work uses an optimized adder and multiplier to reduce the hardware utilization. The ASIC and FPGA performance is analyzed for different architectures.

This paper is organized as follows. Section 2 surveys recent work on RFIR filter design. Section 3 describes the RFIR architecture based on the VD algorithm with the CLA approach. Section 4 presents the experimental results and compares the proposed RFIR-VD-CLA filter design with conventional methods. Section 5 concludes the paper.

2. Literature review

Ramanathan et al. [17] presented a low-power adaptive FIR filter based on DA, targeting high throughput, low power, and low area. The Least Mean Square (LMS) algorithm was used to update the weights and reduce the Mean Square Error (MSE) between the current filter output and the desired response. However, the pipelined DA table increased the switching activity, so the power consumption of the entire architecture remained high.

Rui Jia et al. [18] introduced a novel RFIR filter design based on a statistics centric reconfigurable (SCR) architecture. The architecture improved the area, speed, and power, and increased the efficiency of the entire design. However, this work did not discuss a dynamically reconfigurable mechanism.

Basant Kumar Mohanty et al. [19] introduced a VLSI architecture for RFIR filters using distributed arithmetic. The paper analyzed two structures of the FIR filter, the direct form and the transpose form. The authors found that the direct-form structure needs fewer registers than the transpose-form structure. The reconfigurable DA-based FIR filter provided scalability for higher block sizes. However, the technique achieved lower performance in the ASIC implementation.

J.L. Mazher Iqbal et al. [20] implemented FIR filters based on several methods, namely computation sharing multipliers, the constant shift method, and the modified binary-based common subexpression elimination method, for filter coefficients of several word lengths. Because of the complicated design of the submodules, the FIR filter architecture occupied more area.

3. RFIR-VD-CLA Methodology

The FPGA methodology is developed from the VLSI hardware approach to provide an efficient hardware system. In this paper, the RFIR filter design is based on FPGA implementation. The block diagram of the RFIR-VD-CLA architecture is shown in Fig. 1. Section 3.1 describes the principle of the Vedic design based RFIR architecture.

3.1. Vedic design based RFIR Filter Design

The RFIR-VD-CLA block diagram mainly consists of a serial-in parallel-out level shifter and a scalable accumulator, which is used for the multiplication and addition operations. In this paper, a rewritable RAM-based LUT is used instead of a ROM-based LUT.

In the RFIR-VD-CLA method, the shared-LUT implementation of the RFIR filter is optimized by employing the VD technique for multiplication in the shift accumulator, while the addition operation is implemented with the CLA approach. The reconfigurable VD-based FIR filter is targeted at FPGA implementation. In the RFIR architecture, the length of the register is denoted by N. However, registers are a scarce resource in the FPGA, because each LUT in the FPGA device provides only a two-bit register. The LUTs are therefore implemented with Distributed RAM (DRAM). The partial inner products 𝑆_𝑙,𝑝 cannot be read from the DRAM simultaneously, because only one LUT value is read from the DRAM per cycle. Furthermore, since 𝐿 is the bit width of the input and the sample duration of the design is 𝐿 times the operating clock period, the design may not be suitable for high-throughput applications.

Figure 1. Block diagram of the RFIR-VD-CLA architecture

A DRAM is employed to implement the LUT in each bit slice in order to reduce resource utilization. Hence, this paper decomposes the partial inner product generator into 𝑄 parallel sections, and each section has 𝑅 time-multiplexed processes corresponding to 𝑅 bit slices, where 𝐿 is a composite number given by 𝐿 = 𝑅𝑄 (𝑅 and 𝑄 are two integers). The filter output is given by Eq. (1), and the partial inner product 𝑆_𝑙,𝑝 is given by Eq. (2). The index 𝑙 in Eq. (1) is mapped to 𝑟 + 𝑞𝑅, where 𝑟 = 0, 1, 2, …, 𝑅 − 1 and 𝑞 = 0, 1, 2, …, 𝑄 − 1, which gives Eq. (3).

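$$y = \sum_{l=0}^{L-1} 2^{-l} \left( \sum_{p=0}^{P-1} S_{l,p} \right) \tag{1}$$

$$S_{l,p} = \sum_{m=0}^{M-1} h(m + pM) \, [s(m + pM)]_l \tag{2}$$

Here, 𝑙 = 0, 1, 2, …, 𝐿 − 1 and 𝑝 = 0, 1, 2, …, 𝑃 − 1, and 𝑆_𝑙,𝑝 is the sum of the partial products of 𝑀 samples, where ℎ(𝑘) denotes a filter coefficient and [𝑠(𝑘)]_𝑙 denotes the 𝑙-th bit of the input sample 𝑠(𝑘).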

$$y = \sum_{q=0}^{Q-1} 2^{-Rq} \left[ \sum_{r=0}^{R-1} 2^{-r} \left( \sum_{p=0}^{P-1} S_{r+qR,\,p} \right) \right] \tag{3}$$

In Eq. (3), 𝑞 is the section index and 𝑟 is the time index. The architecture has 𝑅 time slots, each with the same duration as the operating clock period, so the filter produces one output every 𝑅 cycles. The block diagram of the RFIR-VD-CLA, which is designed by employing DRAM, is shown in Fig. 2. The RFIR-VD-CLA structure has 𝑄 sections, and each section consists of 𝑃 DRAM-based Reconfigurable Partial Product Generators (DRPPGs) and a Pipeline Adder Tree (PAT) that computes the rightmost summation in Eq. (3). The shift accumulator performs the shift-and-accumulate operation over 𝑅 cycles according to the second summation. The multiplication of the RFIR input in the shift accumulator is performed with the VD algorithm, and the addition is performed with a CLA adder instead of a normal digital adder. In addition, this work employs dual-port DRAM to reduce the LUT size by half, because two DRPPGs from two different sections can share a single DRAM, as shown in Fig. 3. The RFIR-VD-CLA architecture can provide 𝑄𝑃 partial inner products in a single cycle, and therefore 𝐿𝑃 partial inner products over 𝑅 cycles. In the 𝑟-th cycle, the 𝑃 DRPPGs in the 𝑞-th section generate the 𝑃 partial inner products 𝑆_(𝑟+𝑞𝑅),𝑝 for 𝑝 = 0, 1, …, 𝑃 − 1, which are added by the PAT.
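The decomposition in Eq. (1) to Eq. (3) can be checked numerically with a short software model. The sketch below is only an illustration and not the hardware implementation: it assumes unsigned 𝐿-bit samples with bit 𝑙 = 0 as the MSB (weight 2⁰), and the coefficient and sample values are hypothetical. It shows that the sectioned form with 𝐿 = 𝑅𝑄 gives the same result as the bit-serial form and as the direct inner product.

```python
from fractions import Fraction


def bit(word, l, L):
    """Return bit l of an L-bit unsigned word, with l = 0 as the MSB (assumed format)."""
    return (word >> (L - 1 - l)) & 1


def partial_inner_product(h, s, l, p, M, L):
    """S_{l,p} of Eq. (2): M coefficients times bit slice l of M samples."""
    return sum(h[m + p * M] * bit(s[m + p * M], l, L) for m in range(M))


def da_output(h, s, L, M, P):
    """Eq. (1): weight bit slice l by 2**-l and sum over the P groups."""
    return sum(
        Fraction(1, 2 ** l)
        * sum(partial_inner_product(h, s, l, p, M, L) for p in range(P))
        for l in range(L)
    )


def da_output_sectioned(h, s, M, P, R, Q):
    """Eq. (3): Q parallel sections, each time-multiplexed over R slots."""
    L = R * Q
    y = Fraction(0)
    for q in range(Q):                      # parallel sections
        acc = Fraction(0)
        for r in range(R):                  # time slots within a section
            # adder-tree output for bit slice l = r + q*R
            pat = sum(partial_inner_product(h, s, r + q * R, p, M, L)
                      for p in range(P))
            acc += Fraction(1, 2 ** r) * pat
        y += Fraction(1, 2 ** (R * q)) * acc
    return y


# Hypothetical example: N = M * P = 6 taps, L = R * Q = 4-bit samples.
h = [3, -1, 4, 1, -5, 9]
s = [5, 12, 7, 0, 15, 9]
M, P, R, Q = 3, 2, 2, 2
L = R * Q
# Direct inner product in the same number format (sample value = word / 2**(L-1)).
direct = sum(Fraction(hk * sk, 2 ** (L - 1)) for hk, sk in zip(h, s))
print(da_output(h, s, L, M, P), da_output_sectioned(h, s, M, P, R, Q), direct)
```

Exact fractions are used so that the three results can be compared without rounding error; a hardware model would use fixed-point words instead.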

Figure 2. Block diagram of the RFIR-VD-CLA employing DRAM

Fig. 3 shows the structure of the shift accumulator. The PAT outputs are accumulated by the shift accumulator over 𝑅 cycles. At the last stage, the pipeline shift-add tree produces the filter output from the accumulated values once every 𝑅 cycles. The accumulator is reset every 𝑅 cycles by a control signal, so that the accumulator register is ready to compute the next RFIR filter output. If the operating clock frequency is f_clk, the RFIR-VD-CLA architecture can support an input sample rate of f_clk/R.
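The shift-and-accumulate step can be sketched in software as follows. This is a behavioral illustration under stated assumptions, not the register-level design: it assumes that the bit slices are fed least significant first, so that halving the register before each addition applies the 2⁻ʳ weights, and the adder-tree outputs and clock frequency are hypothetical values.

```python
from fractions import Fraction


def shift_accumulate(pat_outputs):
    """Accumulate R adder-tree outputs x_0..x_{R-1} into sum(2**-r * x_r).

    Assumption: slices arrive in the order r = R-1 down to 0, so one
    right shift (divide by 2) per cycle applies the 2**-r weighting.
    """
    acc = Fraction(0)            # the register is reset before each block of R cycles
    for x in reversed(pat_outputs):
        acc = acc / 2 + x        # shift right, then add the new slice
    return acc


def input_sample_rate(f_clk_hz, R):
    """One output every R cycles gives a sample rate of f_clk / R."""
    return f_clk_hz / R


pat = [1, 2, 3]                  # hypothetical adder-tree outputs for R = 3
result = shift_accumulate(pat)
rate = input_sample_rate(200e6, 4)   # hypothetical 200 MHz clock, R = 4
print(result, rate)
```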

Figure 3. Structure of the shift accumulator

3.1.1. Vedic design algorithm

Fig. 4 shows multiplication using a Vedic multiplier. Vedic mathematics contains sixteen fundamental formulae and associated sub-formulae, which are used to solve numeric computations. The Vedic multiplier architecture operates at high speed compared to existing multipliers and can be applied to all types of number systems. To illustrate Urdhva Tiryakbhyam multiplication, consider two 2-bit binary numbers, a multiplicand (a1, a0) and a multiplier (b1, b0). Multiplying these numbers gives a 4-bit output. The Vedic multiplier follows the steps below.


Step 1: Multiply the Least Significant Bit (LSB) of the multiplicand by the LSB of the multiplier (vertically). The product is the LSB of the final result.

Step 2: Multiply the LSB of the multiplicand by the Most Significant Bit (MSB) of the multiplier, and the MSB of the multiplicand by the LSB of the multiplier (crosswise), and add the two products. The sum bit is the second bit of the final result, and the carry is passed to the next step.

Step 3: Multiply the MSB of the multiplicand by the MSB of the multiplier (vertically), and add the product to the carry obtained in the previous step. The resulting sum and carry are the third and fourth bits of the final result. Fig. 4 shows the diagram of the 2 × 2 multiplication using the Vedic multiplier.
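The three steps can be written as a bit-level software model. The sketch below is illustrative only; the function name `vedic_2x2` is hypothetical, and the two additions are modeled as half adders (XOR for the sum bit, AND for the carry).

```python
def vedic_2x2(a, b):
    """Multiply two 2-bit numbers with the vertical and crosswise steps."""
    a0, a1 = a & 1, (a >> 1) & 1
    b0, b1 = b & 1, (b >> 1) & 1

    # Step 1: vertical product of the LSBs gives result bit 0.
    p0 = a0 & b0

    # Step 2: crosswise products, added with a half adder.
    x, y = a0 & b1, a1 & b0
    p1 = x ^ y          # sum bit is result bit 1
    c1 = x & y          # carry goes to step 3

    # Step 3: vertical product of the MSBs plus the carry from step 2.
    z = a1 & b1
    p2 = z ^ c1         # sum bit is result bit 2
    p3 = z & c1         # carry is result bit 3

    return (p3 << 3) | (p2 << 2) | (p1 << 1) | p0


# Print the full 2 x 2 multiplication table.
for a in range(4):
    print([vedic_2x2(a, b) for b in range(4)])
```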

Figure 4. Multiplication using a Vedic multiplier

The block diagram of the 4 × 4 Vedic multiplier is shown in Fig. 5. Verilog code is written according to this design to verify the results. The block contains four 2 × 2 multiplier blocks and three adder blocks. In this diagram, a₀ to a₃ and b₀ to b₃ represent the two four-bit input values.

Figure 5. Vedic multiplier block diagram

Initially, the lower two bits of the two inputs (a₁, a₀ and b₁, b₀) are given to the first 2 × 2 multiplier block. In the 2nd stage, a₃, a₂ and b₁, b₀ are multiplied; in the 3rd stage, a₁, a₀ and b₃, b₂ are multiplied; and at the final stage, a₃, a₂ and b₃, b₂ are multiplied. The products of the first two stages are combined in one adder, and the products of the last two stages are combined in a second adder. The results of these two adders are given as inputs to the final adder. Finally, the 8-bit result is delivered at the output of the Vedic multiplier design.
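The composition of the 4 × 4 multiplier from four 2 × 2 blocks and three adders can be sketched as below. This is a behavioral illustration, not the Verilog design: the function names are hypothetical, the 2 × 2 block is modeled with an integer product, and the bit shifts that align the partial products (by 2 and 4 positions) are an assumption that follows from the weights of the operand halves.

```python
def mul_2x2(a, b):
    """2 x 2 multiplier block; an integer product stands in for the gate-level unit."""
    return (a & 0b11) * (b & 0b11)


def vedic_4x4(a, b):
    """Multiply two 4-bit numbers using four 2 x 2 blocks and three adders."""
    a_lo, a_hi = a & 0b11, (a >> 2) & 0b11
    b_lo, b_hi = b & 0b11, (b >> 2) & 0b11

    q0 = mul_2x2(a_lo, b_lo)      # 1st stage: (a1, a0) x (b1, b0)
    q1 = mul_2x2(a_hi, b_lo)      # 2nd stage: (a3, a2) x (b1, b0)
    q2 = mul_2x2(a_lo, b_hi)      # 3rd stage: (a1, a0) x (b3, b2)
    q3 = mul_2x2(a_hi, b_hi)      # final stage: (a3, a2) x (b3, b2)

    first_adder = q0 + (q1 << 2)           # products of the first two stages
    second_adder = (q2 << 2) + (q3 << 4)   # products of the last two stages
    return first_adder + second_adder      # final adder gives the 8-bit result


print(vedic_4x4(13, 11))
```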

3.2. Carry Look-ahead Adder design

In this work, a 16-bit CLA is used in the VD-CLA-FIR filter design instead of a normal adder, as shown in Fig. 6. The CLA achieves fast addition because the carries are computed in parallel from generate and propagate signals instead of rippling from one bit position to the next, which makes it suitable for high-speed operation. The 16-bit CLA consists of four 4-bit CLA blocks and a carry generator.

Figure 6. Block diagram of the 16-bit CLA design

Four 4-bit CLAs are required to construct the 16-bit CLA, and each of them produces the internal propagate (𝑃) and generate (𝐺) signals, as shown in Fig. 7. CLA adders are commonly implemented as 4-bit modules, which are then used to build larger adders. Using the 16-bit CLA reduces the overall power, delay, and area of the VD-CLA-FIR method.
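A software model of this construction is sketched below, assuming the standard definitions of generate (both operand bits set) and propagate (exactly one operand bit set). The function names are hypothetical, and the carry generator is written as a loop for brevity, whereas in hardware each group carry is a two-level function of the group signals.

```python
def cla_4bit(a, b, c0):
    """Add two 4-bit values; return (sum, group generate, group propagate)."""
    g = [(a >> i) & (b >> i) & 1 for i in range(4)]       # generate signals
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(4)]     # propagate signals
    # Each carry depends only on g, p and c0, so no carry ripples.
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    carries = [c0, c1, c2, c3]
    total = sum((p[i] ^ carries[i]) << i for i in range(4))
    group_g = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
               | (p[3] & p[2] & p[1] & g[0]))
    group_p = p[0] & p[1] & p[2] & p[3]
    return total, group_g, group_p


def cla_16bit(a, b, c_in=0):
    """Add two 16-bit values with four 4-bit CLA blocks and a carry generator."""
    nibbles = [((a >> (4 * k)) & 0xF, (b >> (4 * k)) & 0xF) for k in range(4)]
    # Group generate/propagate do not depend on the incoming carry.
    groups = [cla_4bit(x, y, 0)[1:] for x, y in nibbles]
    # Carry generator: carry into each 4-bit block from the group signals.
    carries = [c_in]
    for group_g, group_p in groups:
        carries.append(group_g | (group_p & carries[-1]))
    total = 0
    for k, (x, y) in enumerate(nibbles):
        total |= cla_4bit(x, y, carries[k])[0] << (4 * k)
    return total, carries[4]          # 16-bit sum and carry out


print(cla_16bit(0x1234, 0x0FCD))
```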

Figure 7. 4-bit carry look-ahead adder

3.3. Coefficient generation

The coefficients are generated using the MATLAB Filter Design and Analysis (FDA) tool, which is shown in Fig. 8. The coefficients generated by the FDA tool are given as input to the filter. After opening the FDA tool, the filter order is set to 8 and the density factor is set to 20. The equiripple FIR design method is used to generate the coefficients for the FIR filter operation.

Figure 8. FDA tool

The magnitude response is obtained from the filter response and is used in the subsequent steps. The coefficients are used as input variables in the Verilog code to perform the FIR operation.
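As a reference for how a coefficient set drives the filter, a direct-form software model with reloadable coefficients is sketched below. The class name and the integer coefficient values are hypothetical (they are not the coefficients produced by the FDA tool), and clearing the delay line when new coefficients are loaded is an assumption of this sketch.

```python
class ReconfigurableFIR:
    """Direct-form FIR reference model whose coefficients can be reloaded."""

    def __init__(self, coefficients):
        self.load(coefficients)

    def load(self, coefficients):
        """Replace the coefficient set and clear the delay line (assumed behavior)."""
        self.h = list(coefficients)
        self.delay = [0] * len(self.h)

    def step(self, x):
        """Shift in one sample and return y[n] = sum_k h[k] * x[n - k]."""
        self.delay = [x] + self.delay[:-1]
        return sum(hk * xk for hk, xk in zip(self.h, self.delay))


fir = ReconfigurableFIR([1, 2, 3])               # hypothetical 3-tap set
impulse_response = [fir.step(x) for x in [1, 0, 0, 0]]

fir.load([2, 0, -1, 4, -1, 0, 2])                # reload a hypothetical 7-tap set
step_response = [fir.step(1) for _ in range(7)]
print(impulse_response, step_response)
```

The impulse response of such a model reproduces the loaded coefficients, which gives a simple check when comparing against a hardware simulation.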

4. Experimental results and discussion

In this section, the experimental setup, performance details, and results of the proposed methodology are presented and discussed. The performance of the proposed methodology is evaluated on both ASIC and FPGA platforms.

4.1. Experimental setup

The proposed approach was simulated on a system with a 3.30 GHz i3 processor, 4 GB of RAM, and a 500 GB hard disk. The architecture is implemented in Verilog, and MATLAB is used to generate the coefficients. ModelSim 10.5 is used to verify the timing diagrams. Xilinx 14.4 is used to evaluate FPGA metrics such as the number of LUTs, flip-flops, and slices, and the operating frequency. Cadence RTL Compiler is used to calculate ASIC metrics such as area, power, and delay.

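Table 1. ASIC performance of different methods for 4-bit input

| Technology | Methodology | Bits & Taps | Area (um²) | Power (nW) | Delay (ps) | APP (um² × nW) | ADP (um² × ps) |
|---|---|---|---|---|---|---|---|
| 180nm | DA-RFIR [7] | 4B & 3T | 214781 | 884722 | 178 | 190021475882 | 38231018 |
| 180nm | DA-RFIR [7] | 4B & 7T | 364700 | 1201345 | 178 | 438130521500 | 64916600 |
| 180nm | LC-CBA-RFIR [9] | 4B & 3T | 201475 | 814360 | 165 | 164073181000 | 33243375 |
| 180nm | LC-CBA-RFIR [9] | 4B & 7T | 306987 | 1153698 | 165 | 354170287926 | 50652855 |
| 180nm | RFIR-R2-CSLA | 4B & 3T | 186314 | 795140 | 154 | 148145713960 | 28692356 |
| 180nm | RFIR-R2-CSLA | 4B & 7T | 248796 | 1084361 | 154 | 269784679356 | 38314584 |
| 180nm | RFIR-R2-LCSLA | 4B & 3T | 160017 | 740036 | 142 | 118418340612 | 22722414 |
| 180nm | RFIR-R2-LCSLA | 4B & 7T | 214788 | 984710 | 142 | 211503891480 | 30499896 |
| 180nm | RFIR-VM-CLA | 4B & 3T | 103716 | 693908 | 130 | 71969362128 | 13483080 |
| 180nm | RFIR-VM-CLA | 4B & 7T | 181521 | 956498 | 130 | 173624473458 | 23597730 |
| 45nm | DA-RFIR [7] | 4B & 3T | 6421 | 42015 | 198 | 269778315 | 1271358 |
| 45nm | DA-RFIR [7] | 4B & 7T | 6847 | 48741 | 198 | 333729627 | 1355706 |
| 45nm | LC-CBA-RFIR [9] | 4B & 3T | 5142 | 39874 | 175 | 205032108 | 899850 |
| 45nm | LC-CBA-RFIR [9] | 4B & 7T | 6478 | 40587 | 175 | 262922586 | 1133650 |
| 45nm | RFIR-R2-CSLA | 4B & 3T | 4878 | 35102 | 172 | 171227556 | 839016 |
| 45nm | RFIR-R2-CSLA | 4B & 7T | 5142 | 38942 | 172 | 200239764 | 884424 |
| 45nm | RFIR-R2-LCSLA | 4B & 3T | 3987 | 32698 | 169 | 130366926 | 673803 |
| 45nm | RFIR-R2-LCSLA | 4B & 7T | 4215 | 36547 | 169 | 154045605 | 712335 |
| 45nm | RFIR-VM-CLA | 4B & 3T | 2302 | 29085 | 162 | 66953670 | 372924 |
| 45nm | RFIR-VM-CLA | 4B & 7T | 2328 | 29634 | 161 | 68987952 | 374808 |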

The ASIC performance of the 4-bit architecture for different design methods is tabulated in Table 1, for both 4 bit & 3 taps and 4 bit & 7 taps. The DA [7], CBA [9], and R2 based RFIR filters are used as the existing methods. The proposed architecture, listed as RFIR-VM-CLA in the table, is implemented using VD, which improves the hardware utilization of the entire architecture. The table shows the area, power, and delay values for 180nm and 45nm technology, together with the Area-Power Product (APP) and Area-Delay Product (ADP). The proposed architecture gives better performance than the existing architectures.
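The derived columns of Table 1 and the relative improvement over an existing design can be recomputed from the area, power, and delay values. The sketch below uses the 180nm, 4 bit & 3 tap rows for RFIR-R2-LCSLA and the proposed design; the function names are hypothetical.

```python
def app(area_um2, power_nw):
    """Area-power product in um^2 * nW."""
    return area_um2 * power_nw


def adp(area_um2, delay_ps):
    """Area-delay product in um^2 * ps."""
    return area_um2 * delay_ps


def reduction_percent(baseline, proposed):
    """Relative reduction of `proposed` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - proposed) / baseline


# Values copied from Table 1 (180nm, 4 bit & 3 tap).
r2_lcsla = {"area": 160017, "power": 740036, "delay": 142}
proposed = {"area": 103716, "power": 693908, "delay": 130}

print("APP:", app(proposed["area"], proposed["power"]))
print("ADP:", adp(proposed["area"], proposed["delay"]))
for key in ("area", "power", "delay"):
    print(key, round(reduction_percent(r2_lcsla[key], proposed[key]), 2), "%")
```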

Figure 9. Comparison of area for 180nm and 45nm

Figure 10. Comparison of power for 180nm and 45nm

Figure 11. Comparison of delay for 180nm and 45nm

Comparison graphs of the area, power, and delay are shown in Fig. 9, Fig. 10, and Fig. 11, respectively. In each graph, the first two groups (3 taps and 7 taps) show the 180nm results and the remaining two groups show the 45nm results. The bar graphs show that the proposed method gives better performance than the conventional methods.

Table 2. ASIC performance of different methods for 8-bit input

| Technology | Methodology | Bits & Taps | Area (um²) | Power (nW) | Delay (ps) | APP (um² × nW) | ADP (um² × ps) |
|---|---|---|---|---|---|---|---|
| 180nm | DA-RFIR [7] | 8B & 3T | 256478 | 2418971.12 | 279 | 679402784138 | 71557362 |
| 180nm | DA-RFIR [7] | 8B & 7T | 266457 | 2431657 | 278 | 647932029249 | 74075046 |
| 180nm | LC-CBA-RFIR [9] | 8B & 3T | 234789 | 2241023.1 | 271 | 526167549147 | 63627819 |
| 180nm | LC-CBA-RFIR [9] | 8B & 7T | 254613 | 2314521 | 270 | 589307135373 | 68745510 |
| 180nm | RFIR-R2-CSLA | 8B & 3T | 201556 | 1932252.20 | 265 | 389457024423 | 53412340 |
| 180nm | RFIR-R2-CSLA | 8B & 7T | 224513 | 1834612 | 265 | 411894243956 | 59495945 |
| 180nm | RFIR-R2-LCSLA | 8B & 3T | 201450 | 1932154.21 | 261 | 389232465604.5 | 52578450 |
| 180nm | RFIR-R2-LCSLA | 8B & 7T | 214781 | 1984615 | 261 | 426257594315 | 56057841 |
| 180nm | RFIR-VM-CLA | 8B & 3T | 192357 | 1351544 | 130 | 259978949208 | 25006410 |
| 180nm | RFIR-VM-CLA | 8B & 7T | 192962 | 1140187 | 130 | 220012763894 | 25085060 |
| 45nm | DA-RFIR [7] | 8B & 3T | 12403 | 98400.23 | 198 | 1220455200 | 2455794 |
| 45nm | DA-RFIR [7] | 8B & 7T | 13457 | 94152 | 197 | 1267003464 | 2651029 |
| 45nm | LC-CBA-RFIR [9] | 8B & 3T | 10428 | 89452.43 | 189 | 932805456 | 1970892 |
| 45nm | LC-CBA-RFIR [9] | 8B & 7T | 12471 | 91247 | 184 | 1137941337 | 2294664 |
| 45nm | RFIR-R2-CSLA | 8B & 3T | 9546 | 85265.31 | 175 | 813942649 | 1670550 |
| 45nm | RFIR-R2-CSLA | 8B & 7T | 9614 | 84754 | 175 | 814824956 | 1682450 |
| 45nm | RFIR-R2-LCSLA | 8B & 3T | 9426 | 85152.22 | 169 | 802644825.72 | 1592994 |
| 45nm | RFIR-R2-LCSLA | 8B & 7T | 8414 | 86541 | 169 | 728155974 | 1421966 |
| 45nm | RFIR-VM-CLA | 8B & 3T | 3772 | 54893.47 | 159 | 207056396 | 599748 |
| 45nm | RFIR-VM-CLA | 8B & 7T | 3795 | 55928.54 | 159 | 212246760 | 603405 |

The ASIC performance of the 8-bit architecture is given in Table 2. The 8-bit RFIR filter operates in the same way as the 4-bit RFIR filter design; the only change is that the input is 8 bits wide instead of 4 bits. The 8-bit architecture also has lower area, power, and delay than the existing 8-bit architectures.

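Table 3. FPGA performance of different methods for 4-bit input

| Target FPGA device | Methodology | Bits & Taps | No. of LUTs | No. of flip-flops | No. of slices | Frequency (MHz) |
|---|---|---|---|---|---|---|
| Virtex-4 xc4vfx12 | DA-RFIR [7] | 4B & 3T | 82/10944 | 54/10944 | 57/5472 | 221.145 |
| Virtex-4 xc4vfx12 | DA-RFIR [7] | 4B & 7T | 142/10944 | 98/10944 | 105/5472 | 110.214 |
| Virtex-4 xc4vfx12 | LC-CBA-RFIR [9] | 4B & 3T | 78/10944 | 46/10944 | 52/5472 | 235.120 |
| Virtex-4 xc4vfx12 | LC-CBA-RFIR [9] | 4B & 7T | 138/10944 | 90/10944 | 96/5472 | 115.312 |
| Virtex-4 xc4vfx12 | RFIR-R2-CSLA | 4B & 3T | 66/10944 | 44/10944 | 42/5472 | 254.754 |
| Virtex-4 xc4vfx12 | RFIR-R2-CSLA | 4B & 7T | 130/10944 | 82/10944 | 88/5472 | 136.418 |
| Virtex-4 xc4vfx12 | RFIR-R2-LCSLA | 4B & 3T | 55/10944 | 40/10944 | 38/5472 | 284.613 |
| Virtex-4 xc4vfx12 | RFIR-R2-LCSLA | 4B & 7T | 125/10944 | 78/10944 | 82/5472 | 148.421 |
| Virtex-4 xc4vfx12 | RFIR-VM-CLA | 4B & 3T | 42/10944 | 35/10944 | 31/5472 | 315.706 |
| Virtex-4 xc4vfx12 | RFIR-VM-CLA | 4B & 7T | 110/10944 | 70/10944 | 79/5472 | 160.962 |
| Virtex-5 xc5vlx20t | DA-RFIR [7] | 4B & 3T | 83/12480 | 105/12480 | 50/3120 | 214.265 |
| Virtex-5 xc5vlx20t | DA-RFIR [7] | 4B & 7T | 98/12480 | 46/12480 | 42/3120 | 142.130 |
| Virtex-5 xc5vlx20t | LC-CBA-RFIR [9] | 4B & 3T | 74/12480 | 94/12480 | 45/3120 | 224.125 |
| Virtex-5 xc5vlx20t | LC-CBA-RFIR [9] | 4B & 7T | 92/12480 | 46/12480 | 42/3120 | 154.216 |
| Virtex-5 xc5vlx20t | RFIR-R2-CSLA | 4B & 3T | 70/12480 | 90/12480 | 40/3120 | 236.154 |
| Virtex-5 xc5vlx20t | RFIR-R2-CSLA | 4B & 7T | 85/12480 | 46/12480 | 42/3120 | 174.215 |
| Virtex-5 xc5vlx20t | RFIR-R2-LCSLA | 4B & 3T | 62/12480 | 81/12480 | 38/3120 | 248.651 |
| Virtex-5 xc5vlx20t | RFIR-R2-LCSLA | 4B & 7T | 83/12480 | 46/12480 | 42/3120 | 178.421 |
| Virtex-5 xc5vlx20t | RFIR-VM-CLA | 4B & 3T | 43/12480 | 35/12480 | 23/3120 | 289.763 |
| Virtex-5 xc5vlx20t | RFIR-VM-CLA | 4B & 7T | 77/12480 | 70/12480 | 36/3120 | 196.398 |
| Virtex-6 xc6vcx75t | DA-RFIR [7] | 4B & 3T | 89/46560 | 52/93120 | 66/11640 | 75.132 |
| Virtex-6 xc6vcx75t | DA-RFIR [7] | 4B & 7T | 132/46560 | 72/93120 | 96/11640 | 54.152 |
| Virtex-6 xc6vcx75t | LC-CBA-RFIR [9] | 4B & 3T | 81/46560 | 45/93120 | 60/11640 | 80.124 |
| Virtex-6 xc6vcx75t | LC-CBA-RFIR [9] | 4B & 7T | 124/46560 | 65/93120 | 92/11640 | 62.145 |
| Virtex-6 xc6vcx75t | RFIR-R2-CSLA | 4B & 3T | 76/46560 | 40/93120 | 54/11640 | 84.612 |
| Virtex-6 xc6vcx75t | RFIR-R2-CSLA | 4B & 7T | 116/46560 | 60/93120 | 84/11640 | 68.154 |
| Virtex-6 xc6vcx75t | RFIR-R2-LCSLA | 4B & 3T | 65/46560 | 38/93120 | 50/11640 | 94.652 |
| Virtex-6 xc6vcx75t | RFIR-R2-LCSLA | 4B & 7T | 110/46560 | 58/93120 | 80/11640 | 72.489 |
| Virtex-6 xc6vcx75t | RFIR-VM-CLA | 4B & 3T | 59/46560 | 33/93120 | 47/11640 | 118.382 |
| Virtex-6 xc6vcx75t | RFIR-VM-CLA | 4B & 7T | 102/46560 | 52/93120 | 74/11640 | 87.797 |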

The FPGA performance of the 4-bit architecture for different design methods is given in Table 3. In this table, the number of LUTs, flip-flops, and slices, and the operating frequency are evaluated as measures of hardware utilization. Compared to the existing architectures, RFIR-VM-CLA uses fewer LUTs, slices, and flip-flops, and its operating frequency is also higher. The FPGA performance is analyzed for three Virtex devices: Virtex-4, Virtex-5, and Virtex-6.
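The "used/available" entries of Table 3 can be converted to utilization percentages, and the frequency gain over an existing design can be computed in the same way. The sketch below uses the Virtex-4, 4 bit & 3 tap rows for RFIR-R2-LCSLA and RFIR-VM-CLA; the function names are hypothetical.

```python
def utilization_percent(used, available):
    """Share of a device resource that is occupied, in percent."""
    return 100.0 * used / available


def gain_percent(baseline, proposed):
    """Relative increase of `proposed` over `baseline`, in percent."""
    return 100.0 * (proposed - baseline) / baseline


# Values copied from Table 3 (Virtex-4, 4 bit & 3 tap).
lut_existing, lut_proposed, lut_available = 55, 42, 10944
freq_existing, freq_proposed = 284.613, 315.706   # MHz

lut_util = utilization_percent(lut_proposed, lut_available)
lut_saving = 100.0 * (lut_existing - lut_proposed) / lut_existing
freq_gain = gain_percent(freq_existing, freq_proposed)
print(lut_util, lut_saving, freq_gain)
```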

Figure 12. Comparison of LUTs for different Virtex devices

Figure 13. Comparison of flip-flops for different Virtex devices

Figure 14. Comparison of slices for different Virtex devices

Figure 15. Comparison of frequency for different Virtex devices

Comparison graphs of the number of LUTs, flip-flops, and slices, and of the operating frequency, are shown in Fig. 12, Fig. 13, Fig. 14, and Fig. 15, respectively. In all graphs, the 3-tap and 7-tap results are shown for the Virtex-4, Virtex-5, and Virtex-6 devices. As can be seen from the graphs, the FPGA performance parameters of RFIR-VD-CLA are improved compared to the existing methods.

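Table 4. FPGA performance of different methods for 8-bit input

| Target FPGA device | Methodology | Bits & Taps | No. of LUTs | No. of flip-flops | No. of slices | Frequency (MHz) |
|---|---|---|---|---|---|---|
| Virtex-4 xc4vfx12 | DA-RFIR [7] | 8B & 3T | 142/10944 | 98/10944 | 94/5472 | 108.312 |
| Virtex-4 xc4vfx12 | DA-RFIR [7] | 8B & 7T | 156/10944 | 115/10944 | 101/5472 | 104.51 |
| Virtex-4 xc4vfx12 | LC-CBA-RFIR [9] | 8B & 3T | 115/10944 | 87/10944 | 78/5472 | 110.012 |
| Virtex-4 xc4vfx12 | LC-CBA-RFIR [9] | 8B & 7T | 140/10944 | 111/10944 | 97/5472 | 106.45 |
| Virtex-4 xc4vfx12 | RFIR-R2-CSLA | 8B & 3T | 105/10944 | 81/10944 | 72/5472 | 125.161 |
| Virtex-4 xc4vfx12 | RFIR-R2-CSLA | 8B & 7T | 134/10944 | 94/10944 | 94/5472 | 108.312 |
| Virtex-4 xc4vfx12 | RFIR-R2-LCSLA | 8B & 3T | 100/10944 | 79/10944 | 68/5472 | 128.243 |
| Virtex-4 xc4vfx12 | RFIR-R2-LCSLA | 8B & 7T | 124/10944 | 84/10944 | 90/5472 | 142.16 |
| Virtex-4 xc4vfx12 | RFIR-VM-CLA | 8B & 3T | 86/10944 | 63/10944 | 68/5472 | 282.438 |
| Virtex-4 xc4vfx12 | RFIR-VM-CLA | 8B & 7T | 106/10944 | 76/10944 | 87/5472 | 211.180 |
| Virtex-5 xc5vlx20t | DA-RFIR [7] | 8B & 3T | 160/12480 | 75/12480 | 89/3120 | 141.417 |
| Virtex-5 xc5vlx20t | DA-RFIR [7] | 8B & 7T | 170/12480 | 98/12480 | 99/3120 | 138.121 |
| Virtex-5 xc5vlx20t | LC-CBA-RFIR [9] | 8B & 3T | 115/12480 | 59/12480 | 81/3120 | 145.624 |
| Virtex-5 xc5vlx20t | LC-CBA-RFIR [9] | 8B & 7T | 168/12480 | 96/12480 | 94/3120 | 140.042 |
| Virtex-5 xc5vlx20t | RFIR-R2-CSLA | 8B & 3T | 109/12480 | 49/12480 | 75/3120 | 158.324 |
| Virtex-5 xc5vlx20t | RFIR-R2-CSLA | 8B & 7T | 138/12480 | 88/12480 | 90/3120 | 141.417 |
| Virtex-5 xc5vlx20t | RFIR-R2-LCSLA | 8B & 3T | 101/12480 | 45/12480 | 70/3120 | 163.245 |
| Virtex-5 xc5vlx20t | RFIR-R2-LCSLA | 8B & 7T | 130/12480 | 84/12480 | 79/3120 | 158.423 |
| Virtex-5 xc5vlx20t | RFIR-VM-CLA | 8B & 3T | 84/12480 | 63/12480 | 38/3120 | 253.107 |
| Virtex-5 xc5vlx20t | RFIR-VM-CLA | 8B & 7T | 100/12480 | 76/12480 | 53/3120 | 233.938 |
| Virtex-6 xc6vcx75t | DA-RFIR [7] | 8B & 3T | 165/46560 | 96/93120 | 79/11640 | 136.912 |
| Virtex-6 xc6vcx75t | DA-RFIR [7] | 8B & 7T | 188/46560 | 108/93120 | 145/11640 | 128.412 |
| Virtex-6 xc6vcx75t | LC-CBA-RFIR [9] | 8B & 3T | 124/46560 | 88/93120 | 64/11640 | 141.267 |
| Virtex-6 xc6vcx75t | LC-CBA-RFIR [9] | 8B & 7T | 174/46560 | 94/93120 | 140/11640 | 136.912 |
| Virtex-6 xc6vcx75t | RFIR-R2-CSLA | 8B & 3T | 116/46560 | 77/93120 | 59/11640 | 154.231 |
| Virtex-6 xc6vcx75t | RFIR-R2-CSLA | 8B & 7T | 165/46560 | 84/93120 | 134/11640 | 144.632 |
| Virtex-6 xc6vcx75t | RFIR-R2-LCSLA | 8B & 3T | 113/46560 | 74/93120 | 55/11640 | 162.321 |
| Virtex-6 xc6vcx75t | RFIR-R2-LCSLA | 8B & 7T | 154/46560 | 79/93120 | 120/11640 | 154.361 |
| Virtex-6 xc6vcx75t | RFIR-VM-CLA | 8B & 3T | 108/46560 | 59/93120 | 62/11640 | 188.778 |
| Virtex-6 xc6vcx75t | RFIR-VM-CLA | 8B & 7T | 161/46560 | 60/93120 | 103/11640 | 174.241 |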

The FPGA performance of the 8-bit architecture for different design methods is given in Table 4. This architecture is the same as the 4-bit architecture, except that its input is 8 bits wide. It achieves a higher operating frequency than the existing methods on all three devices and, in most cases, uses fewer resources. The RTL schematic of the top module and the internal schematic are shown in Fig. 16 and Fig. 17. A screenshot of the FPGA results for the 4-bit architecture on Virtex-4 is shown in Fig. 18. These results are taken from the Xilinx software.

Figure 16. RTL schematic of the top module

Figure 17. Internal RTL schematic

Figure 18. FPGA results for Virtex-4, 4 bit

5. Conclusion

In this work, the RFIR-VD-CLA architecture is implemented on FPGA and ASIC platforms using Verilog. The proposed RFIR filter is designed with the VD algorithm for multiplication and the CLA for addition, which reduces the complexity of the multiplication and addition processes compared to normal multiplier and adder designs. The CLA design achieves better FPGA and ASIC performance than the normal CSLA design, and the proposed VD-CLA module occupies less area in the RFIR filter design. In the FPGA implementation, the number of LUTs, slices, and flip-flops and the operating frequency are improved in the RFIR-VD-CLA architecture. In 180nm ASIC technology, the 4-bit, 3-tap RFIR-VD-CLA architecture reduces the area by 35.18%, the power by 6.23%, and the delay by 8.45% compared to the best existing architecture. In conclusion, the RFIR-VD-CLA architecture has better ASIC and FPGA performance than the existing architectures.

References

[1] Naik, Naveen S., and Kiran A. Gupta. "An Efficient Reconfigurable FIR Digital Filter Using Modified Distribute Arithmetic Technique." arXiv preprint arXiv:1704.08526 (2017).

[2] Rasekh, Amirhossein, and M. Sharif Bakhtiar. "Design of Low-Power Low-Area Tunable Active RC Filters." IEEE Transactions on Circuits and Systems II: Express Briefs 65.1 (2018): 6-10.

[3] R. Thakur, and K. Khare, “High-Speed FPGA Implementation of FIR Filter for DSP Applications”, International Journal of Modeling and Optimization, Vol.3, No.1, pp.92, 2013.

[4] N. Bhagyalakshmi, K.R. Rekha, and K.R. Nataraj, “Design and implementation of DA-based reconfigurable FIR digital filter on FPGA”, In: Proc. of International Conf. on Emerging Research in Electronics, Computer Science and Technology (ICERECT), pp.214-217, 2015.

[5] S.J. Lee, J.W. Choi, S.W. Kim, and J. Park, “A reconfigurable FIR filter architecture to trade off filter performance for dynamic power consumption”, IEEE transactions on very large scale integration (VLSI) systems, Vol.19, No.12, pp.2221-2228, 2011.

[6] S. Karthick, S. Valarmathy, and C. Kamalanathan, "Design and Performance Analysis of a Reconfigurable FIR Filter", International Journal of Innovations in Engineering and Technology (IJIET), Volume 8, Issue 1, January 2017.

[7] P.K. Meher and S.Y. Park, “High-throughput pipelined realization of adaptive FIR filter based on distributed arithmetic”, In: Proc. of the 19th International Conf. on VLSI and System-on-Chip (VLSI- SoC), pp.428-433, 2011.

[8] Bonetti, Andrea, Adam Teman, Philippe Flatresse, and Andreas Burg. "Multipliers-driven perturbation of coefficients for low-power operation in reconfigurable FIR filters." IEEE Transactions on Circuits and Systems I: Regular Papers 64, no. 9 (2017): 2388-2400.

[9] Reddy, Kasarla Satish, and Hosahally Narayangowda Suresh. "A Low Power VLSI Implementation of Reconfigurable FIR Filter Using Carry Bypass Adder." Received: January 6, 2018 22.

[10] Tsao, Y.C. and Choi, K., 2012. Area-efficient VLSI implementation for parallel linear-phase FIR digital filters of odd length based on fast FIR algorithm. IEEE Transactions on Circuits and Systems II: Express Briefs, 59(6), pp.371-375.

[11] Khan, S. and Jaffery, Z.A., 2015, December. Low power FIR filter implementation on FPGA using parallel Distributed Arithmetic. In India Conference (INDICON), 2015 Annual IEEE(pp. 1-5). IEEE.

[12] Rashidi, B., Rashidi, B. and Pourormazd, M., 2011, April. Design and Implementation of Low Power Digital FIR Filter based on low power multipliers and adders on xilinx FPGA. In Electronics Computer Technology (ICECT), 2011 3rd International Conference on (Vol. 2, pp. 18-22). IEEE.

[13] Sang Yoon Park and Pramod Kumar Meher, "Efficient FPGA and ASIC Realizations of DA-Based Reconfigurable FIR Digital Filter", IEEE Transactions on Circuits and Systems II: Express Briefs.

[14] Mohanty, B.K. and Meher, P.K., 2013. A high-performance energy-efficient architecture for FIR adaptive filter based on new distributed arithmetic formulation of block LMS algorithm. IEEE transactions on signal processing, 61(4), pp.921-932.

[15] Mohanty, B.K., Meher, P.K., Al-Maadeed, S. and Amira, A., 2014. Memory footprint reduction for power-efficient realization of 2-D finite impulse response filters. IEEE Transactions on Circuits and Systems I: Regular Papers, 61(1), pp.120-133.

[16] Mohanty, B.K. and Meher, P.K., 2016. A high-performance FIR filter architecture for fixed and reconfigurable applications. IEEE transactions on very large scale integration (vlsi) systems, 24(2), pp.444-452.

[17] Ramanathan, S., Anand, G., Reddy, P. and Sridevi, S.A., 2016. Low Power Adaptive FIR Filter Based on Distributed Arithmetic. Int. Journal of Engineering Research and Applications, 6(5), pp.47-51.


[18] Rui Jia, Hai-Gang Yang, Colin Yu Lin, Rui Chen, Xin-Gang Wang, and Zhen-Hong Guo. "A Computationally Efficient Reconfigurable FIR Filter Architecture Based on Coefficient Occurrence Probability." IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 35.8 (2016): 1297-1308.

[19] Basant Kumar Mohanty, Pramod Kumar Meher, Subodh Kumar Singhal, and M.N.S. Swamy. "A high-performance VLSI architecture for reconfigurable FIR using distributed arithmetic." Integration, the VLSI Journal 54 (2016): 37-46.

[20] Iqbal, JL Mazher, and S. Varadarajan. "High Performance Reconfigurable FIR Filter Architecture Using Optimized Multiplier." Circuits, Systems, and Signal Processing 32.2 (2013): 663-682.