DA Based FIR Filter Design Analysis using Different LUT Partitions

GIRISHA KUMAR

ME Scholar, ECE Department, NITTTR, Chandigarh, India [email protected]

DR.RAJESH MEHRA

Head, ECE Department, NITTTR, Chandigarh, India [email protected]

Abstract: This paper presents the realization of an efficient reconfigurable distributed arithmetic (DA)-based digital finite impulse response (FIR) filter on a field programmable gate array (FPGA). In reconfigurable DA-based filters, the lookup tables (LUTs) are usually implemented using RAM. A shared-LUT concept is proposed for the DA computation because it is economical: the DA units share the registers that store the partial inner products for the different bit positions. The proposed filter supports a maximum input sampling frequency of 442 MHz and requires fewer LUTs and slice registers, so it is an area-efficient design. The proposed design is implemented for different LUT partitions on a Xilinx Virtex-5 FPGA device (XC5VSX95T-1FF1136).

Keywords: distributed arithmetic (DA); finite impulse response (FIR) filter; reconfigurable filter; field programmable gate array (FPGA); lookup table (LUT)

1. Introduction

Digital signal processing is essential in communication. In recent years, the growth of telephony and data networks has made digital communication popular, because digital signals offer a better signal-to-noise ratio than analog signals, and digital signal processing devices with high throughput have therefore become necessary. The lower cost of digital signal processing hardware compared with general-purpose computers has also made digital filtering techniques popular. A digital filter takes a digital input and produces a digital output. With advances in VLSI technology and the use of digital signal processors in many applications, the finite impulse response (FIR) filter plays a key role. Digital filters allow designers to obtain higher performance than analog filters. FIR and infinite impulse response (IIR) filters are the two types of digital filters. FIR filters can have linear phase, have no feedback, and are stable, in contrast to IIR filters. [Hentschel et al. (1999), Ming and Chao (2016)]

Fig 1. Magnitude response of FIR Filter

expensive. The remainder of the paper is organised as follows. Distributed arithmetic and its algorithm are described in Section 2, and the implementation of the DA-based FIR filter on an FPGA is discussed in Section 3. The design of the FIR filter is discussed in Section 4, the implementation results are presented and discussed in Section 5, and the conclusion is given in Section 6.

2. Distributed Arithmetic & Algorithm

DA-based arithmetic uses LUTs and shift registers. FIR filter implementations based on the DA technique usually assume that the filter coefficients are fixed, which allows a ROM-based LUT to be used. However, the memory requirement grows substantially as the filter order increases, which increases the size and cost of the device. To overcome this, systolic decomposition of the DA-based computation is used for long convolutions and high-order FIR filters. [Meher (2006), White (1989)]

In many applications, one of the two sequences being convolved consists of input samples and the other is fixed. This property of DSP algorithms makes DA well suited to digital convolution. The method stores precomputed partial results in memory, so it produces outputs faster than a multiplier-accumulator-based design, but the memory requirement grows with the convolution length. The systolic decomposition scheme allows the address length of the LUTs used for the DA computation to be varied. This sets the area-time trade-off: a smaller address length reduces the memory size and hence the area, but it increases complexity and latency. The conventional DA approach is used to compute the inner product. [Singh and Mehra (2013)]

Consider two N-point vectors A and B. Their inner product is given by

$$C = \sum_{k=0}^{N-1} A_k B_k \qquad (1)$$

where A is a constant vector and B may vary with time. Let L be the word length. The two's complement representation of each component of B is

$$B_k = -b_{k0} + \sum_{l=1}^{L-1} b_{kl}\, 2^{-l} \qquad (2)$$
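As a small illustration of the two's complement form in (2), the following Python sketch evaluates a word from its bits, with the sign bit weighted -1 and bit l weighted 2^-l. The function name and the bit-list layout are illustrative assumptions and are not taken from the original design.

```python
def twos_complement_fraction(bits):
    """Value of an L-bit two's complement fraction.

    bits[0] is the sign bit (weight -1); bits[l] has weight 2**-l for l >= 1.
    """
    sign_term = -bits[0]
    magnitude = sum(b * 2.0 ** -l for l, b in enumerate(bits) if l >= 1)
    return sign_term + magnitude


print(twos_complement_fraction([0, 1, 1, 0]))  # 0.75
print(twos_complement_fraction([1, 1, 0, 0]))  # -0.5
```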

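Here $b_{kl}$ denotes the l-th bit of $B_k$. Substituting (2) in (1) gives the expanded form of the inner product:

$$C = -\sum_{k=0}^{N-1} A_k b_{k0} + \sum_{k=0}^{N-1} A_k \left[ \sum_{l=1}^{L-1} b_{kl}\, 2^{-l} \right] \qquad (3)$$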

Interchanging the order of summation over the indices k and l in the second term of (3) converts the usual sum-of-products form in (1) into the distributed form

$$C = -\sum_{k=0}^{N-1} A_k b_{k0} + \sum_{l=1}^{L-1} 2^{-l} \left[ \sum_{k=0}^{N-1} A_k b_{kl} \right] \qquad (4)$$

We assume that the signal samples are unsigned words of length L, although the DA decomposition can also be used with binary coding and two's complement coding. Equation (4) then takes the simpler form

$$C = \sum_{l=0}^{L-1} 2^{-l} C_l \qquad (5)$$

where

$$C_l = \sum_{k=0}^{N-1} A_k b_{kl} \qquad (6)$$

Since A is a constant vector and each element of the N-point bit sequence $\{b_{kl},\ 0 \le k \le N-1\}$ is either zero or one, the partial sum $C_l$, for l = 0, 1, ..., L-1, can take only $2^N$ possible values. All $2^N$ values can be precomputed and stored in a ROM, and the partial sum $C_l$ is then read out using the bit sequence as the address. According to (5) and (6), the inner product can therefore be calculated in L cycles of shift-accumulation, each following a ROM read addressed by one of the L bit sequences $\{b_{kl},\ 0 \le k \le N-1\}$, l = 0, 1, ..., L-1. Distributed arithmetic is used to implement FIR and IIR filters, and it helps to perform important signal processing operations such as dot products and weighted sums of products [Meher and Chandrashekaran (2008)].
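The LUT-read and shift-accumulate procedure described above can be sketched in Python as follows. This is a minimal software model, not the authors' hardware: it assumes unsigned samples whose bit l (with l = 0 the most significant bit) carries weight 2^-l, and the function names `build_lut` and `da_inner_product` are hypothetical.

```python
def build_lut(coeffs):
    """Precompute all 2**N partial sums; bit k of the address selects coeffs[k]."""
    n = len(coeffs)
    return [sum(coeffs[k] for k in range(n) if (addr >> k) & 1)
            for addr in range(2 ** n)]


def da_inner_product(coeffs, samples, word_length):
    """Inner product of coeffs with the samples using only LUT reads and adds.

    Each sample is an unsigned integer of `word_length` bits; bit l of the
    word (l = 0 is the most significant bit) is weighted by 2**-l.
    """
    lut = build_lut(coeffs)
    acc = 0.0
    for l in range(word_length):
        shift = word_length - 1 - l
        addr = 0
        for k, s in enumerate(samples):
            addr |= ((s >> shift) & 1) << k   # bit l of sample k drives address bit k
        acc += lut[addr] * 2.0 ** -l          # weighted accumulation of the partial sum
    return acc


# Samples 5, 2, 7 as 3-bit words represent 1.25, 0.5 and 1.75.
print(da_inner_product([3, -1, 2], [5, 2, 7], word_length=3))  # 6.75
```

The result matches the direct sum 3(1.25) - 1(0.5) + 2(1.75) = 6.75, while the loop itself performs no multiplication by a coefficient.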

sequence. All possible partial products over the filter coefficient space are stored in the lookup table (LUT). [Sen et al. (2007)], [Longa and Miri (2006)]

Fig. 2. DA algorithm based FIR filter. [Patnam and Chitra (2015)]

3. DA BASED FIR FILTER ON FPGA

The DA technique is bit-serial in nature; it reorders the multiply and accumulate operations. This makes it suitable for implementing FIR filters on FPGAs. FPGA technology is considered the best choice for implementing communication base stations, because the inner product can be computed with LUTs and accumulators instead of multipliers. The algorithm takes one serial input and one parallel input; the constant input is applied in parallel. To perform the MAC operation over many terms, several scaling accumulator units can be used. Each product term is the AND of an input bit with a constant, so the partial sums can be stored in a ROM whose contents are defined by the constants and whose address is formed by the input bits. The ROM contents are defined as follows:

Address (000) => 0
Address (001) => A0
Address (010) => A1
Address (011) => A0+A1
Address (100) => A2
Address (101) => A0+A2
Address (110) => A1+A2
Address (111) => A0+A1+A2
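The ROM contents listed above follow a simple rule: address bit k, counted from the least significant bit, selects coefficient Ak. The short Python sketch below generates the same listing for any number of coefficients; the function name `rom_contents` is a hypothetical helper used only for illustration.

```python
def rom_contents(names):
    """Return (address, expression) pairs for a DA ROM over symbolic coefficients."""
    n = len(names)
    table = []
    for addr in range(2 ** n):
        # Address bit k (LSB = bit 0) selects the k-th coefficient.
        terms = [names[k] for k in range(n) if (addr >> k) & 1]
        table.append((format(addr, f"0{n}b"), "+".join(terms) or "0"))
    return table


for address, expr in rom_contents(["A0", "A1", "A2"]):
    print(f"Address ({address}) => {expr}")
```

With three coefficients this prints the eight entries given above, from `Address (000) => 0` to `Address (111) => A0+A1+A2`.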

Depending on the constraints of the system, this algorithm can be implemented in many ways. The DA implementation shown in Fig. 3 consists of a shifter followed by LUTs. The shifter output is connected to the address lines of the LUT, and the LUT outputs are added serially to produce the filter output. The LUT holds the partial product values at different memory locations, and a value is accessed by supplying its address. The size of the lookup table depends on the address width used to access it. [Patnam and Chitra (2015)]

Fig. 3. DA implementation. [Patnam and Chitra (2015)]

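product of two positive integers such as R and Q, where the index l is mapped into (r + qR) for r = 0, 1, 2, ..., R-1 and q = 0, 1, 2, ..., Q-1.

$$y = \sum_{r=0}^{R-1} 2^{-r} \left[ \sum_{q=0}^{Q-1} 2^{-Rq} \left( \sum_{p=0}^{P-1} S_{r+qR,\,p} \right) \right] \qquad (7)$$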

In equation (7), r is the time index and q is the section index. One filter output is obtained every R cycles, and each time slot lasts one period of the operating clock. Fig. 4 shows the proposed DA-based FIR filter structure, which uses time multiplexing to implement (7). The proposed design contains Q sections, each with P DRAM-based reconfigurable partial product generators (DRPPGs) and a pipeline adder tree (PAT) that performs the summation over p, followed by a shift accumulator that performs the summation over R cycles [Ozalevli et al. (2008)]. The size of the LUTs can be halved by using dual-port DRAM, since a single DRAM can then be shared by two DRPPGs in two different sections; this structure is shown in Fig. 5. In a single cycle the proposed structure generates QP partial inner products. $S_{r+qR,\,p}$, for p = 0, 1, ..., P-1, are the P partial inner products produced in each of the Q sections in the r-th cycle by the P DRPPGs, and they are added by the PAT. The output of the PAT is accumulated in the shift accumulator over R cycles. Finally, every R cycles the pipeline shift adder tree (PSAT) produces the filter output from the section outputs. The control signal [acc_rst1 in Fig. 6] resets the accumulated value every R cycles so that the accumulator is ready for the computation of the next filter output. The proposed structure can sustain a sample rate of fclk/R, where fclk is the maximum operating clock frequency. [Park and Mehra (2014)]
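The index mapping l = r + qR behind the time-multiplexed structure can be checked numerically. The Python sketch below is a software model under stated assumptions: `partials[l]` stands for the PAT output for bit index l (the sum over p of the partial inner products), each section keeps its own accumulator over R cycles, and the final weighted sum plays the role of the PSAT. The function names are hypothetical.

```python
def flat_sum(partials):
    """Bit-serial form: sum over l of 2**-l * partials[l]."""
    return sum(s * 2.0 ** -l for l, s in enumerate(partials))


def time_multiplexed_sum(partials, R, Q):
    """Same sum with l = r + q*R: Q sections run in parallel for R cycles."""
    assert len(partials) == R * Q
    section_acc = [0.0] * Q
    for r in range(R):                      # one clock cycle per value of r
        for q in range(Q):                  # all Q sections work in the same cycle
            section_acc[q] += partials[r + q * R] * 2.0 ** -r
    # Combine the section outputs with weights 2**-(R*q).
    return sum(acc * 2.0 ** -(R * q) for q, acc in enumerate(section_acc))


p = [5, 1, 5, 3, 0, 2, 7, 4]                # example PAT outputs for L = 8 bit positions
print(flat_sum(p) == time_multiplexed_sum(p, R=4, Q=2))  # True
```

In this example one output takes R = 4 cycles instead of 8, at the cost of two sections working in parallel, which is the trade-off that the sample rate fclk/R expresses.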

Fig. 4. The proposed structure of DA based FIR filter [Park and Mehra (2014)]

Fig. 6. Structure of shift accumulator [Park and Mehra (2014)]

4. DESIGN OF MATLAB BASED FIR FILTER

A DA-based reconfigurable FIR filter is designed in MATLAB by specifying the filter specifications (sampling frequency, passband frequency, stopband frequency, passband and stopband attenuation) and the design method. An HDL file is then generated to evaluate the area, speed of operation, and power consumption of the FIR filter. The specifications of the low-pass FIR filter are: sampling frequency Fs = 48000 Hz, passband frequency 9600 Hz, stopband frequency 1200 Hz, passband attenuation 1 dB, and stopband attenuation 1 dB. The following analysis of the FIR filter design is carried out:
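The design itself was carried out in MATLAB, and the design method is not stated here. As a rough stand-in, the Python sketch below derives low-pass coefficients from the sampling frequency and passband edge given above using a Hamming-windowed sinc. The window choice, the use of the passband edge as the cutoff, the tap count of 17 and the 15-bit quantization are all assumptions made for illustration.

```python
import math


def lowpass_fir(num_taps, cutoff_hz, fs_hz):
    """Hamming-windowed sinc low-pass FIR, normalised to unity gain at DC."""
    fc = cutoff_hz / fs_hz                  # cutoff in cycles per sample
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        # Ideal low-pass impulse response, with the limit 2*fc at x = 0.
        ideal = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]


def quantize(taps, bits):
    """Round coefficients to integers with `bits` fractional bits."""
    scale = 1 << bits
    return [round(t * scale) for t in taps]


taps = lowpass_fir(num_taps=17, cutoff_hz=9600, fs_hz=48000)   # 17 taps is an assumption
print(quantize(taps, bits=15))
```

The quantized integers are the kind of fixed coefficients whose partial sums a DA LUT would store.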

Fig. 7. Magnitude response of FIR filter

Fig. 9. Impulse response of FIR filter

Fig. 10. Pole-zero plot of FIR filter

5. IMPLEMENTATION RESULTS AND DISCUSSIONS

The proposed DA-based structure is implemented for different LUT partitions on a Xilinx Virtex-5 FPGA device (XC5VSX95T), and the minimum sample period, maximum sampling frequency, number of slice registers, and number of LUTs are compared. The LUT partition (3_3_3_3_3_2) is best suited to high-speed operation: compared with LUT partition (12_5), it reduces the number of slice LUTs required by 74% and raises the speed of operation to 442 MHz. In terms of the number of slice registers required, partition (12_5) is the most suitable.
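One way to see why finer partitions need less table storage is to count LUT words. The Python sketch below assumes that each number in a partition name is the address width of one LUT, so a LUT with w address bits holds 2^w words; this reading of the partition names is an assumption. The counts are table words, not the FPGA slice LUT figures reported in the results.

```python
def lut_entries(partition):
    """Total stored words for a partition given as address widths, e.g. '12_5'."""
    widths = [int(w) for w in partition.split("_")]
    return sum(2 ** w for w in widths)


for name in ["12_5", "8_8_1", "6_6_5", "3_3_3_3_3_2"]:
    print(name, lut_entries(name))
# 12_5 4128
# 8_8_1 514
# 6_6_5 160
# 3_3_3_3_3_2 44
```

Every partition covers 17 address bits in total, yet the word count falls from 4128 to 44 as the partitions become finer; the price is more LUT outputs to add together.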

TABLE I RESULT ANALYSIS

LUT partition   No. of slice registers   No. of LUTs   No. of LUT-FF pairs used   Frequency (MHz)   Time (ns)
12_5            112                      560           599                        206               4.83
8_8_1           123                      156           197                        325               3.06
6_6_5           128                      129           163                        386               2.59

Fig. 7. Plot of resource utilization of FIR filter (x-axis: LUT partition)

Fig. 9. Plot of frequency of operation (x-axis: LUT partition)

Fig. 10. Waveform of FIR filter [12_5 LUT partition]

Fig. 11. Waveform of FIR filter [8_8_1 LUT partition]

6. Conclusion

References

[1] Allred, D.J.; Yoo, H.; Krishan, V.; Huang, W.; Anderson, D.V. (2005): LMS Adaptive Filters Using Distributed Arithmetic for High Throughput. Circuits and System-I, 52(7), pp. 1327-1337.

[2] Chatterjee, S.; Mehra, R. (2011): FPGA based design of CIC interpolator using Embedded LUT Structure. Electronics Engineering, 1 (1), pp. 1-4.

[3] Devi, S.; Mehra, R. (2010): FPGA based Design of High Performance Decimator using DALUT Algorithm. Signal Processing, 1(2), pp. 51–62.

[4] Ginne; Mehra, R. (2013): FPGA based Gaussian Pulse Shaping filter using Distributed Arithmetic Algorithm. Scientific & Engineering Research, 4, pp. 711-715.

[5] Hentschel, T.; Henker, M.; Fettweis, G. (1999): The Digital Front-End of Software Radio Terminals. Personal Communication Magazine, 6(4), pp. 40-46.

[6] Kaur, R.; Mehra, R. (2011): Reconfigurable Area and Speed interpolator using DALUT Algorithm. Computer Science and Technology, 132, pp. 117-225.

[7] Longa, P.; Miri, A. (2006): Area Efficient FIR filter design on FPGAs using distributed arithmetic. Signal Processing and Information Technology, pp. 248-252.

[8] Meher, P. K.; Chandrashekaran, S.S. (2008): FPGA Realization of FIR filters by Efficient and Flexible Realization using Distributed Arithmetic. Signal Processing, 56 (7), pp. 539-542.

[9] Meher, P. K. (2006): Hardware-Efficient Systolization of DA-based Calculation of Finite Digital Convolution. Circuits and Systems-II, Express Briefs, 53(8), pp. 707–711.

[10] Ming, L.; Chao, Y. (2016): The Multiplexed Structure of Multi-channel FIR Filter and its Resources Evaluation. Computer Control and Intelligent Environmental Monitoring, 8(19), pp. 764–768.

[11] Ozalevli, E.; Huang, W.; Hasler, P. E.; Anderson, D. V. (2008): A Reconfigurable Mixed-Signal VLSI implementation of Distributed Arithmetic used for Finite-Impulse Response Filtering. Circuits and systems-I, 55(2), pp. 510-521.

[12] Park, S. Y.; Mehra, P. K. (2014): Efficient FPGA and ASIC Realizations of a DA-Based Reconfigurable FIR Digital Filter. Circuits and systems-II, 61(7), pp. 511-515.

[13] Patnam, C. S. V.; Chitra, E. (2015): Efficient FPGA realization of DA based reconfigurable FIR digital filter. Electronics and Communication Engineering, 3(2), pp. 24-28.

[14] Sen, W.; Bin, T.; Jun, Z. (2007): Distributed arithmetic for FIR filter design on FPGA. Communications, Circuits and Systems, pp. 620-623.

[15] Singh, L.; Mehara, R. (2013): FPGA based Speed Efficient Decimator using Distributed Arithmetic Algorithm. Computer Applications, 80 (11), pp. 0975-8887.