Implementation of Low Power Area Efficient Parallel FIR Digital Filter Structures of Odd Length Based on Common Sub expression Algorithm


¹Shaila Khan, ²Uma Sharma

¹M. Tech Student, ²Assistant Professor

¹,²Department of Electronics & Communication Engineering

¹,²Ajay Kumar Garg Engineering College, Ghaziabad, India

Abstract— Filters play an important role in digital systems, for example in wireless communication. In this paper, we propose 3x3 parallel FIR digital filter structures based on a common subexpression elimination (CSE) algorithm that require fewer multipliers and use low-power adders.

Multipliers generally consume more power and occupy more area than adders. To minimize area, the proposed filter structure therefore trades multipliers for adders.

Moreover, the number of adders does not increase with the length of the FIR filter. The proposed 3x3 parallel FIR filter structures are beneficial in terms of hardware cost and power consumption when compared with the existing parallel FIR filter structure. Using the CSE algorithm, the proposed structures exploit the inherent symmetry of the coefficients, which halves the number of multipliers in the subfilter section at the cost of additional adders in the pre-processing and post-processing blocks.

Exchanging multipliers for adders is advantageous because adders cost less silicon area than multipliers.

Index Terms— Digital signal processing (DSP), fast finite-impulse response (FIR) algorithms (FFAs), parallel FIR, symmetric convolution, common subexpression elimination algorithm, very large scale integration (VLSI).

I. INTRODUCTION

The finite-impulse response (FIR) filter has been and continues to be one of the fundamental processing elements in any digital signal processing (DSP) system. FIR filters are used in DSP applications that range from video and image processing to wireless communications. In some applications, such as video processing, the FIR filter circuit must be able to operate at high frequencies, while in other applications, such as cellular telephony, the FIR filter circuit must be a low-power circuit capable of operating at moderate frequencies.

Parallel, or block, processing can be applied to digital FIR filters to either increase the effective throughput or reduce the power consumption of the original filter. Traditionally, the application of parallel processing to an FIR filter involves the replication of the hardware units that exist in the original filter. If the area required by the original circuit is A, then the L-parallel circuit requires an area of L x A. With the continuing trend to reduce chip size and integrate multi-chip solutions into a single-chip solution, it is important to limit the silicon area required to implement a parallel FIR digital filter in a VLSI implementation. In many design situations, the resulting increase in area cannot be tolerated due to limitations in design area.

Therefore, it is advantageous to realize parallel FIR filtering structures that consume less area than traditional parallel FIR filtering structures. Several papers have proposed ways to reduce the complexity of the parallel FIR filter [1]–[9]. In [1]–[4], polyphase decomposition is mainly used: small-sized parallel FIR filter structures are derived first, and larger block-sized ones are then constructed by cascading or iterating the small-sized parallel FIR filtering blocks. The fast FIR algorithms (FFAs) introduced in [1]–[3] show that an L-parallel filter can be implemented using approximately (2L-1) subfilter blocks, each of length N/L. FFA structures break the constraint that the hardware implementation cost of a parallel FIR filter increases linearly with the block size L. They reduce the required number of multipliers from L x N to (2N-N/L). In [5]–[9], fast linear convolution is used to develop the small-sized filtering structures, and a long convolution is then decomposed into several short convolutions, i.e., larger block-sized filtering structures are constructed through iterations of the small-sized filtering structures.

Parallel processing and pipelining are two techniques used in DSP applications, and both can be exploited to reduce power consumption. Pipelining shortens the critical path by inserting pipelining latches along the data path, at the price of increasing the number of latches and the system latency. Parallel processing increases the sampling rate by replicating hardware so that multiple inputs are processed in parallel and multiple outputs are generated at the same time, at the expense of increased area. Both techniques can reduce power consumption by lowering the supply voltage when the sampling speed does not need to increase. In this paper, parallel processing in the digital FIR filter is discussed.

II. EQUIRIPPLE METHOD FOR FIR FILTER DESIGN

The Equiripple method is also known as the Parks-McClellan, Optimal, or Minimax method. To find an Equiripple set of coefficients, the Remez exchange algorithm is commonly used. Here the user specifies a desired frequency response, a weighting function for errors from this response, and a filter order N. The algorithm then finds the set of (N+1) coefficients that minimizes the maximum deviation from the ideal response. This method is particularly easy in practice. Equiripple FIR filters can be designed using the FFT algorithm as well.

The algorithm is iterative in nature. It is easily understood in terms of the convolution theorem for Fourier transforms, making it instructive to study after the Fourier theorems and Equiripple for spectrum analysis.


III. PARALLEL PROCESSING FOR LOW POWER

The throughput of FIR filters can be increased by using parallel processing. If an L-parallel filter is operated at the same clock rate as the original filter, L output samples are generated every clock cycle, compared to the single output sample produced every clock cycle in the original filter. This implies that the L-parallel filter effectively operates at L times the rate of the original FIR filter. While it is clear that parallel processing can increase the throughput of an FIR filter, the technique can also be used to reduce the power of an FIR filter. This fact is often overlooked. The application of parallel processing facilitates the lowering of the supply voltage, which in turn leads to a decrease in power consumption.

Let Po = Co Vo² fo represent the power consumed in the original FIR filter, where Co is the effective capacitance of the original filter, Vo is the supply voltage of the original filter, and fo is the clock frequency of the original filter. It should be noted that fo = 1/To, where To is the clock period of the original filter. In order to maintain the same sample rate, the clock period of an L-parallel filter must be increased to L To, since L samples are produced every clock cycle. The L-parallel filter has L copies of the original filter, each of which has an effective capacitance of Co. This means that Co is charged in time L To rather than in time To. In other words, there is more time to charge the same capacitance. This implies that the supply voltage can be lowered to β Vo, where β is a positive constant less than 1. By examining the propagation delays of the original and parallel filters, the power supply reduction factor, β, can be determined. The propagation delay of the original circuit is given by

$$T_{pd} = \frac{C_o V_o}{k\,(V_o - V_t)^2} \qquad (1)$$

where k is a process dependent parameter and Vt is the device threshold voltage. It should be noted that the clock period, To, is typically set equal to the maximum propagation delay, Tpd, in a circuit. The propagation delay of the L-parallel filter is given by

$$L\,T_{pd} = \frac{C_o\,\beta V_o}{k\,(\beta V_o - V_t)^2} \qquad (2)$$

From (1) and (2), the following quadratic equation is obtained:

$$L\,(\beta V_o - V_t)^2 = \beta\,(V_o - V_t)^2 \qquad (3)$$

This equation is used to solve for β. Once β is obtained, the reduced power consumption of the FIR filter can be calculated using

$$P = \beta^2 (L C_o)\, V_o^2\, \frac{f_o}{L} = \beta^2 C_o V_o^2 f_o \qquad (4)$$

As can be seen, parallel processing leads to a reduction in power consumption by a factor of β². For example, consider the case when L = 2, Vo = 5 V and Vt = 0.4 V. Using these values, β is approximately 0.572, which leads to a power reduction factor of approximately 0.327. It should be noted that the supply voltage cannot be lowered indefinitely by increasing the level of parallelism in a filter. There is a lower bound on the supply voltage which is dictated by the process parameters.
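The worked example above can be checked numerically. The following is a minimal sketch, assuming equation (3) is expanded into a standard quadratic in β and that the physically meaningful root is the one that keeps the scaled supply β Vo above the threshold Vt; the function name `supply_scaling_factor` is hypothetical and chosen only for illustration.

```python
import math

def supply_scaling_factor(L, Vo, Vt):
    """Solve L*(beta*Vo - Vt)**2 = beta*(Vo - Vt)**2 for beta (equation (3))."""
    # Expanding (3) gives a*beta**2 + b*beta + c = 0 with:
    a = L * Vo**2
    b = -(2 * L * Vo * Vt + (Vo - Vt)**2)
    c = L * Vt**2
    disc = math.sqrt(b * b - 4 * a * c)
    roots = [(-b + disc) / (2 * a), (-b - disc) / (2 * a)]
    # Assumption: keep only a root below 1 whose scaled supply exceeds Vt.
    valid = [r for r in roots if 0 < r < 1 and r * Vo > Vt]
    return max(valid)

beta = supply_scaling_factor(L=2, Vo=5.0, Vt=0.4)
print(round(beta, 3), round(beta**2, 3))  # supply factor and power factor from (4)
```

The second root of the quadratic corresponds to a supply voltage below Vt and is discarded, which is why only one value of β is reported.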

IV. FAST FIR ALGORITHM

As described in [10], let {xi} and {hi} be the input sequence and the Nth-order impulse response of an FIR filter, respectively; the output sequence y(n) and the filter transfer function H(z) can then be written in the usual convolution-sum and z-transform forms.

The traditional L-parallel FIR filter is shown in Fig. 1.

Fig. 1. Traditional two-parallel FIR filter implementation

Two-parallel and three-parallel FIR filter implementations using the FFA are shown in Fig. 2 and Fig. 3, respectively.

Fig. 2. Two-parallel FIR filter implementation using FFA

Fig. 3. Three-parallel FIR filter implementation using FFA
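The two-parallel FFA can be illustrated in software. The sketch below assumes the standard two-parallel FFA formulation with three subfilters (H0, H1 and H0+H1), where H0 and H1 are the even and odd polyphase components of the filter; the function names `conv` and `ffa2` are hypothetical, and the filter and input lengths are assumed to be even. It is meant only to show that three half-length subfilters reproduce the output of the direct filter, not to model the hardware structure.

```python
def conv(a, b):
    """Direct linear convolution of two sequences."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def ffa2(h, x):
    """Two-parallel FIR filtering with three subfilters (H0, H1, H0+H1).

    Assumes len(h) and len(x) are even so the polyphase parts have equal length.
    """
    h0, h1 = h[0::2], h[1::2]            # polyphase components of the filter
    x0, x1 = x[0::2], x[1::2]            # even and odd input samples
    u = conv(h0, x0)                     # H0 * X0
    v = conv(h1, x1)                     # H1 * X1
    w = conv([a + b for a, b in zip(h0, h1)],
             [a + b for a, b in zip(x0, x1)])   # (H0 + H1) * (X0 + X1)
    # Odd-indexed outputs: Y1 = (H0+H1)(X0+X1) - H0X0 - H1X1.
    y1 = [wi - ui - vi for wi, ui, vi in zip(w, u, v)]
    # Even-indexed outputs: Y0 = H0X0 + z^-2 H1X1 (H1X1 delayed by one block).
    y0 = [a + b for a, b in zip(u + [0], [0] + v)]
    # Interleave the even and odd output streams.
    y = []
    for k in range(len(y0)):
        y.append(y0[k])
        if k < len(y1):
            y.append(y1[k])
    return y

h = [1, 2, 3, 4]
x = [1, 0, -1, 2, 5, 3]
print(ffa2(h, x) == conv(h, x))
```

Each call to `conv` inside `ffa2` operates on sequences of half the original length, which is where the saving over two full-length filters comes from.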

V. COMMON SUBEXPRESSION ELIMINATION ALGORITHM

In this section, we give a detailed description of an algorithm able to solve Problem B (i.e., the elimination of patterns with arbitrary shifts within the input matrix). Afterwards, we discuss the modifications necessary for the algorithm to solve Problem A as well.

The algorithm must accomplish the following tasks: 1) identify the presence of multiple patterns in the input matrix; 2) select one pattern for elimination; 3) eliminate all occurrences of the selected pattern. These steps are repeated iteratively until no multiple patterns remain. The input parameter n represents the number of nonzero bits in the examined patterns.

In the first step, an exhaustive search for all possible multiple-bit patterns is performed and complete statistics of the pattern frequencies are created. Since many different patterns will occur more than once, some criterion must be used to select the one for elimination. We use the steepest descent approach, i.e., we always select the pattern with the highest frequency. In the second step, all occurrences of the selected pattern are removed (i.e., the nonzero bits are replaced by zeros), and the pattern is added as a new line at the bottom of the matrix so that it can later be searched for multiple patterns with smaller n. Last, since the removal of a pattern influences the frequency statistics of the remaining patterns, the global frequency statistics must be adjusted to reflect the changes.

After all multiple patterns with n nonzero bits are processed, the whole cycle is repeated for patterns with n − 1 nonzero bits. A detailed discussion will further concentrate on the following problems: A) pattern identification; B) pattern selection; C) frequency statistics management; D) adaptation of the algorithm for Problem A; E) viability of the algorithm for large tasks; F) applicability to similar CSE tasks.
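The identify, select and eliminate cycle described in this section can be sketched in a few lines. The following is a simplified illustration, not the full algorithm: it assumes the input matrix holds signed digits (-1, 0, 1), handles only patterns with two nonzero bits, identifies a pattern by the shift between its two digits and their sign relation, and recomputes the frequency statistics from scratch on every pass instead of updating them incrementally. All function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

def pattern_counts(matrix):
    """Step 1: frequency statistics of all two-nonzero-bit patterns.

    A pattern is keyed by (shift between the two digits, product of their signs).
    Overlapping occurrences are counted, so this is an upper bound on the hits.
    """
    counts = Counter()
    for row in matrix:
        nonzero = [(pos, d) for pos, d in enumerate(row) if d != 0]
        for (p, a), (q, b) in combinations(nonzero, 2):
            counts[(q - p, a * b)] += 1
    return counts

def eliminate(matrix, pattern):
    """Step 3: zero out every occurrence and append the pattern as a new row."""
    shift, relation = pattern
    width = len(matrix[0])
    hits = 0
    for row in matrix:
        for p in range(width - shift):
            a, b = row[p], row[p + shift]
            if a != 0 and b != 0 and a * b == relation:
                row[p] = row[p + shift] = 0
                hits += 1
    new_row = [0] * width
    new_row[0], new_row[shift] = 1, relation
    matrix.append(new_row)
    return hits

def cse_two_bit(matrix):
    """Repeat identify/select/eliminate until no pattern occurs more than once."""
    matrix = [list(row) for row in matrix]
    eliminated = []
    while True:
        counts = pattern_counts(matrix)
        if not counts:
            break
        pattern, freq = counts.most_common(1)[0]   # step 2: highest frequency
        if freq < 2:
            break
        eliminated.append((pattern, eliminate(matrix, pattern)))
    return matrix, eliminated

coeffs = [[1, 0, 1, 0, 0, 1, 0, 1],
          [0, 1, 0, 1, 0, 0, 0, 0]]
reduced, log = cse_two_bit(coeffs)
print(log)
```

In this small example the pattern "two equal digits two positions apart" occurs three times, so it is computed once (the appended row) and reused, which is the source of the adder savings the algorithm targets.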

VI. PROPOSED 3X3 PARALLEL FIR DIGITAL FILTER STRUCTURE

For an N-tap three-parallel FIR filter as shown in Fig. 4, the total number of saved multipliers is the number of subfilter blocks that contain symmetric coefficients times half the number of multiplications in a single subfilter block.
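The saving in a single symmetric subfilter block can be illustrated as follows. This is a minimal sketch, assuming a subfilter whose coefficients satisfy h[i] = h[N-1-i]: the two input samples that share a coefficient are added first, so each pair needs one multiplication instead of two. The function names are hypothetical, and `direct_fir` is included only as a reference for comparison.

```python
def direct_fir(h, x):
    """Reference filter: one multiplication per tap and output sample."""
    y = [0] * (len(h) + len(x) - 1)
    for i, hi in enumerate(h):
        for j, xj in enumerate(x):
            y[i + j] += hi * xj
    return y

def multipliers_per_output(n_taps):
    """Multiplications per output sample when the symmetry is exploited."""
    return n_taps // 2 + n_taps % 2

def symmetric_fir(h, x):
    """Filter x with a symmetric impulse response h using pre-addition.

    Assumes h[i] == h[len(h) - 1 - i] for every i.
    """
    n_taps = len(h)
    half = n_taps // 2
    # Zero-pad so that every delayed sample x[n - i] exists.
    padded = [0] * (n_taps - 1) + list(x) + [0] * (n_taps - 1)
    y = []
    for n in range(len(x) + n_taps - 1):
        base = n + n_taps - 1                  # index of x[n] inside `padded`
        acc = 0
        for i in range(half):
            # Pre-add the two samples that share the coefficient h[i].
            acc += h[i] * (padded[base - i] + padded[base - (n_taps - 1 - i)])
        if n_taps % 2:
            acc += h[half] * padded[base - half]   # unpaired middle tap
        y.append(acc)
    return y

h_sym = [1, 2, 3, 2, 1]
x_in = [1, 2, 3, 4]
print(symmetric_fir(h_sym, x_in) == direct_fir(h_sym, x_in))
print(multipliers_per_output(9))
```

For an odd-length subfilter the middle tap has no partner, so the saving is slightly less than half; a 9-tap symmetric subfilter, for instance, needs 5 multiplications per output instead of 9.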

Compared with the existing FFA three-parallel FIR filter structure, the proposed structure has two more subfilter blocks that contain symmetric coefficients. A comparison between the existing FFA three-parallel FIR filter and the proposed FIR filter is shown in Fig. 5.

Fig. 4. Proposed three parallel FIR filter implementation

Fig. 5. Comparison of subfilter blocks between the existing FFA and the proposed structure.

In Fig. 5, the shaded blocks denote the subfilter blocks that contain symmetric coefficients. Therefore, for an N-tap three-parallel FIR filter, the proposed structure saves N/3 multipliers compared with the existing FFA structure. However, this comes at the price of an increase in the number of adders, i.e., five additional adders in the pre-processing and post-processing blocks. The number of multipliers and the number of adders required for the filtering structure are given by (5) and (6), respectively.

$$M = \left(\frac{N}{\prod_{i=1}^{r} L_i}\right) \prod_{i=1}^{r} M_i \qquad (5)$$

where r is the number of FFAs used, L_i is the block size of the FFA at step i, M_i is the number of filters that result from the application of the ith FFA, and N is the length of the filter.

The number of required adders is calculated as follows, where A_i is the number of pre-processing and post-processing adders required by the ith FFA:

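$$A = A_1 \prod_{i=2}^{r} L_i + \sum_{i=2}^{r}\left[A_i \left(\prod_{j=i+1}^{r} L_j\right)\left(\prod_{k=1}^{i-1} M_k\right)\right] + \left(\prod_{i=1}^{r} M_i\right)\left(\frac{N}{\prod_{i=1}^{r} L_i} - 1\right) \qquad (6)$$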
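Equations (5) and (6) are straightforward to evaluate in software. The sketch below assumes that each FFA stage i is described by its block size L_i, its number of resulting subfilters M_i and its number of pre/post-processing adders A_i; the values M = 6 and A = 10 used for a single three-parallel FFA stage are assumptions taken from the standard three-parallel FFA and are not stated in this paper. The function name `ffa_cost` is hypothetical. The counts are for a plain FFA structure in which no coefficient symmetry is exploited.

```python
from math import prod

def ffa_cost(N, L, M, A):
    """Evaluate equations (5) and (6) for r cascaded FFA stages.

    L, M, A are lists with one entry per stage (L_i, M_i, A_i).
    """
    r = len(L)
    assert N % prod(L) == 0, "N must be divisible by the product of block sizes"
    sub_len = N // prod(L)                  # taps per subfilter
    multipliers = sub_len * prod(M)         # equation (5)
    adders = A[0] * prod(L[1:])             # first term of (6)
    for i in range(1, r):                   # stages 2..r (0-based index i)
        adders += A[i] * prod(L[i + 1:]) * prod(M[:i])
    adders += prod(M) * (sub_len - 1)       # adders inside the subfilters
    return multipliers, adders

# Single three-parallel stage; M = 6 and A = 10 are assumed values.
print(ffa_cost(27, [3], [6], [10]))
print(ffa_cost(81, [3], [6], [10]))
```

With a single stage the sum in (6) is empty, so the adder count reduces to the pre/post-processing adders plus the adders inside the subfilters.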


VII. PROPOSED DESIGN SIMULATION

The block-level implementation of the proposed 81-tap 3x3 FIR filter is done in MATLAB R2012a, as shown in Fig. 6. Each block of the proposed FIR filter contains subfilter blocks, each of which is loaded with symmetric coefficients based on canonical signed digit (CSD) structures, which reduces the hardware complexity of the filter. The output plots of the 81-tap direct-form FIR filter designed with the Equiripple method are shown in Fig. 7.

Fig. 6. Block Level implementation of 3x3 parallel FIR filter on MATLAB

Fig. 7. (a) Magnitude response, (b) phase response, (c) step response, (d) impulse response, and (e) pole/zero plot

The main advantage of the direct-form symmetric FIR structure is that it is less costly and can operate at lower data sample rates. The simulation waveform of the 81-tap FIR filter is shown in Fig. 8.

Fig. 8. Output waveform of the 81-tap parallel FIR digital filter structure

The above results characterize the proposed 3x3 parallel FIR filter of order 81 with symmetric coefficients in terms of area and power. Area is estimated from the number of slices used in the field-programmable gate array (FPGA). Power utilization is measured with Xilinx ISE 14.1. Power consumption depends on the area utilized, i.e., the number of slices used.

VIII. HARDWARE COMPLEXITY ANALYSIS

A comparison between the proposed CSE (common subexpression elimination) structures and the existing FFA structures for odd-length filters of different lengths and levels of parallelism is summarized in Table I.

Table I. Performance results of the FFA and CSE algorithms (L = 3)

Columns: Algorithm; Length; Required multipliers; Reduced multipliers; Reduced adders; No. of adders increased (Sub, Pre/Post)

FFA, 27-tap: 38; 8; 48; 10; 5

FFA, 81-tap: 110; 15

CSE, 27-tap: 30; 26; 156; 10; 5

CSE, 81-tap: 84; 15

IX. IMPLEMENTATION

In this work, subfilter blocks are used and the implementation is done in MATLAB. Figure 6 shows the block diagram of the proposed approach. Each block is loaded with subfilter structures using the CSE algorithm, and the blocks inside the subfilter structures are loaded with filter coefficients. Finally, the simulation and synthesis results are analysed using the Xilinx ISE 14.1 tool, which is used to compare the power and area of the existing FFA structure and the proposed CSE structure. The comparison of power and area is summarized in Table II.

Table II. Comparison of Power and Area Analysis

Algorithm | Power (mW) | Area (in slices)

FFA | 129.76 | 280861

CSE | 76 | 79397

X. CONCLUSION

The CSE algorithm is designed to reduce the number of multipliers in the FIR filter structure. A comparison was made between the FFA, which requires more multipliers, and the CSE algorithm, which reduces the number of multipliers at the cost of additional adders in parallel FIR filters. Both structures were simulated using the ISim simulator and Xilinx ISE 14.1, and their performance was analyzed. The proposed method shows improved performance compared to the steepest descent approach. Finally, the number of symmetric coefficients in the parallel FIR filter using the FFA is slightly higher than with the CSE approach. The area and power consumed by the CSE-based structure are lower than those of the FFA-based structure.

XI. REFERENCES

[1] D. A. Parker and K. K. Parhi, “Low-area/power parallel FIR digital filter implementations,” J. VLSI Signal Process. Syst., vol. 17, no. 1, pp. 75–92, Sep. 1997.

[2] J. G. Chung and K. K. Parhi, “Frequency-spectrum-based low-area low-power parallel FIR filter design,” EURASIP J. Appl. Signal Process., vol. 2002, no. 9, pp. 444–453, Jan. 2002.

[3] K. K. Parhi, VLSI Digital Signal Processing Systems: Design and Implementation. New York: Wiley, 1999.

[4] Z.-J. Mou and P. Duhamel, “Short-length FIR filters and their use in fast non recursive filtering,” IEEE Trans. Signal Process., vol. 39, no. 6, pp. 1322–1332, Jun. 1991.

[5] J. I. Acha, “Computational structures for fast implementation of L-path and L-block digital filters,” IEEE Trans. Circuits Syst., vol. 36, no. 6, pp. 805–812, Jun. 1989.

[6] C. Cheng and K. K. Parhi, “Hardware efficient fast parallel FIR filter structures based on iterated short convolution,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 51, no. 8, pp. 1492–1500, Aug. 2004.

[7] C. Cheng and K. K. Parhi, “Further complexity reduction of parallel FIR filters,” in Proc. IEEE ISCAS, May 2005, vol. 2, pp. 1835–1838.

[8] C. Cheng and K. K. Parhi, “Low-cost parallel FIR structures with 2-stage parallelism,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 54, no. 2, pp. 280–290, Feb. 2007.

[9] I.-S. Lin and S. K. Mitra, “Overlapped block digital filtering,” IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., vol. 43, no. 8, pp. 586–596, Aug. 1996.

[10] Y.-C. Tsao and K. Choi, “Area-efficient parallel FIR digital filter structures for symmetric convolutions based on fast FIR algorithm,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 20, no. 2, pp. 366–371, Feb. 2010.

[11] Y.-C. Tsao and K. Choi, “Area-efficient VLSI implementation for parallel linear-phase FIR digital filter of odd length based on fast FIR algorithm,” IEEE Trans. Circuits Syst. II, vol. 59, no. 6, pp. 371–375, 2012.

[12] Y. C. Lim, “Design of discrete-coefficient-value linear phase FIR filters with optimum normalized peak ripple magnitude,” IEEE Trans. Circuits Syst. I, vol. 37, no. 12, pp. 1480–1486, 1990.

[13] P. P. Vaidyanathan, “Optimal design of linear-phase FIR digital filters with very flat passbands and equiripple stopbands,” IEEE Trans. Circuits Syst., vol. 32, no. 9, pp. 904–917, 1985.

[14] P. K. Meher, S. Chandrasekaran, and A. Amira, “FPGA realization of FIR filters by efficient and flexible systolization using distributed arithmetic,” IEEE Trans. Signal Process., vol. 56, no. 7, pp. 3009–3017, 2008.

Figure 2 Block Diagram of Proposed Approach.