
Digital Signal Processing FAQs

dspGuru's Digital Signal Processing FAQs not only provide Rarely Given Answers to Frequently Asked Questions about DSP, they offer a gentle introduction to a variety of fundamental DSP topics.

Expanded Table of Contents

FIR Filter FAQ
    FIR Filter Basics
    FIR Filter Properties
    FIR Filter Design
    FIR Filter Implementation

IIR Filter FAQ
    IIR Filter Basics

Multirate FAQ
    Multirate Basics
    Decimation
    Interpolation
    Resampling

Fast Fourier Transform (FFT) FAQ

CORDIC FAQ

Raised Cosine FAQ

Top-Level Table of Contents

FIR Filter FAQ

Finite Impulse Response (FIR) filters are one of the most important topics in Digital Signal Processing. Here, dspGuru answers "Frequently Asked Questions" on FIR filtering.

FIR Filter Basics

1.1 What are FIR filters?

FIR filters are one of two primary types of digital filters used in Digital Signal Processing (DSP) applications, the other type being IIR.

1.2 What does "FIR" mean?

"FIR" means "Finite Impulse Response". If you put in an impulse, that is, a single "1" sample followed by many "0" samples, zeroes will come out after the "1" sample has made its way through the delay line of the filter.

1.3 Why is the impulse response "finite"?

In the common case, the impulse response is finite because there is no feedback in the FIR. A lack of feedback guarantees that the impulse response will be finite. Therefore, the term "finite impulse response" is nearly synonymous with "no feedback".

However, if feedback is employed yet the impulse response is finite, the filter still is a FIR. An example is the moving average filter, in which the Nth prior sample is subtracted (fed back) each time a new sample comes in. This filter has a finite impulse response even though it uses feedback: after N samples of an impulse, the output will always be zero.

1.4 How do I pronounce "FIR"?

Some people say the letters F-I-R; others pronounce it as if it were a type of tree. We prefer the tree. (The difference is whether you talk about an F-I-R filter or a FIR filter.)

1.5 What is the alternative to FIR filters?

DSP filters can also be "Infinite Impulse Response" (IIR). (See dspGuru's IIR FAQ.) IIR filters use feedback, so when you input an impulse the output theoretically rings indefinitely.

1.6 How do FIR filters compare to IIR filters?

Each has advantages and disadvantages. Overall, though, the advantages of FIR filters outweigh the disadvantages, so they are used much more than IIRs.

1.6.1 What are the advantages of FIR Filters (compared to IIR filters)?

Compared to IIR filters, FIR filters offer the following advantages:

They can easily be designed to be "linear phase" (and usually are). Put simply, linear-phase filters delay the input signal but don't distort its phase.

They are simple to implement. On most DSP microprocessors, the FIR calculation can be done by looping a single instruction.

They are suited to multi-rate applications. By multi-rate, we mean either "decimation" (reducing the sampling rate), "interpolation" (increasing the sampling rate), or both.


Whether decimating or interpolating, the use of FIR filters allows some of the calculations to be omitted, thus providing an important computational efficiency. In contrast, if IIR filters are used, each output must be individually calculated, even if that output will be discarded (because the feedback must still be incorporated into the filter).

They have desirable numeric properties. In practice, all DSP filters must be implemented using finite-precision arithmetic, that is, a limited number of bits. The use of finite-precision arithmetic in IIR filters can cause significant problems due to the use of feedback, but because FIR filters have no feedback, they can usually be implemented using fewer bits, and the designer has fewer practical problems to solve related to non-ideal arithmetic.

They can be implemented using fractional arithmetic. Unlike IIR filters, it is always possible to implement a FIR filter using coefficients with magnitude of less than 1.0. (The overall gain of the FIR filter can be adjusted at its output, if desired.) This is an important consideration when using fixed-point DSPs, because it makes the implementation much simpler.

1.6.2 What are the disadvantages of FIR Filters (compared to IIR filters)?

Compared to IIR filters, FIR filters sometimes have the disadvantage that they require more memory and/or calculation to achieve a given filter response characteristic. Also, certain responses are not practical to implement with FIR filters.

1.7 What terms are used in describing FIR filters?

Impulse Response - The "impulse response" of a FIR filter is actually just the set of FIR coefficients. (If you put an "impulse" into a FIR filter which consists of a "1" sample followed by many "0" samples, the output of the filter will be the set of coefficients, as the 1 sample moves past each coefficient in turn to form the output.)

Tap - A FIR "tap" is simply a coefficient/delay pair. The number of FIR taps (often designated as "N") is an indication of 1) the amount of memory required to implement the filter, 2) the number of calculations required, and 3) the amount of "filtering" the filter can do; in effect, more taps means more stopband attenuation, less ripple, narrower filters, etc.

Multiply-Accumulate (MAC) - In a FIR context, a "MAC" is the operation of multiplying a coefficient by the corresponding delayed data sample and accumulating the result. FIRs usually require one MAC per tap. Most DSP microprocessors implement the MAC operation in a single instruction cycle.

Transition Band - The band of frequencies between the passband and stopband edges. The narrower the transition band, the more taps are required to implement the filter. (A "small" transition band results in a "sharp" filter.)

Delay Line - The set of memory elements that implement the "Z^-1" delay elements of the FIR calculation.

Circular Buffer - A special buffer which is "circular" because incrementing at the end causes it to wrap around to the beginning, or because decrementing from the beginning causes it to wrap around to the end. Circular buffers are often provided by DSP microprocessors to implement the movement of samples through the delay line without having to literally move the data in memory. When a new sample is added to the buffer, it automatically replaces the oldest one.

FIR Filter Properties

2.1 Linear Phase

2.1.1 What is the association between FIR filters and "linear-phase"?

Most FIRs are linear-phase filters; when a linear-phase filter is desired, a FIR is usually used.

2.1.2 What is a linear phase filter?

"Linear Phase" refers to the condition where the phase response of the filter is a linear (straight-line) function of frequency (excluding phase wraps at +/- 180 degrees). This results in the delay through the filter being the same at all frequencies. Therefore, the filter does not cause "phase distortion" or "delay distortion". The lack of phase/delay distortion can be a critical advantage of FIR filters over IIR and analog filters in certain systems, for example, in digital data modems. 2.1.3 What is the condition for linear phase?

FIR filters are usually designed to be linear-phase (but they don't have to be.) A FIR filter is linear-phase if (and only if) its coefficients are symmetrical around the center coefficient, that is, the first coefficient is the same as the last; the second is the same as the next-to-last, etc. (A linear-phase FIR filter having an odd number of coefficients will have a single coefficient in the center which has no mate.)

2.1.4 What is the delay of a linear-phase FIR?

The formula is simple: given a FIR filter which has N taps, the delay is: (N - 1) / (2 * Fs), where Fs is the sampling frequency. So, for example, a 21 tap linear-phase FIR filter operating at a 1 kHz rate has delay: (21 - 1) / (2 * 1 kHz)=10 milliseconds.

2.1.5 What is the alternative to linear phase?

Non-linear phase, of course. ;-) Actually, the most popular alternative is "minimum phase". Minimum-phase filters (which might better be called "minimum delay" filters) have less delay than linear-phase filters with the same amplitude response, at the cost of a non-linear phase characteristic, a.k.a. "phase distortion".

A lowpass FIR filter has its largest-magnitude coefficients in the center of the impulse response. In comparison, the largest-magnitude coefficients of a minimum-phase filter are nearer to the beginning. (See dspGuru's tutorial How To Design Minimum-Phase FIR Filters for more details.)

2.2 Frequency Response

2.2.1 What is the Z transform of a FIR filter?

For an N-tap FIR filter with coefficients h(k), whose output is described by:

y(n) = h(0)x(n) + h(1)x(n-1) + h(2)x(n-2) + ... + h(N-1)x(n-(N-1)),

the filter's Z transform is:

H(z) = h(0)z^0 + h(1)z^-1 + h(2)z^-2 + ... + h(N-1)z^-(N-1), or, more compactly:

H(z) = Σ h(k)·z^-k, where the sum runs over k = 0 to N-1.

2.2.2 What is the frequency response formula for a FIR filter?

The variable z in H(z) is a continuous complex variable, and we can describe it as z = r·e^jw, where r is a magnitude and w is the angle of z. If we let r = 1, then H(z) around the unit circle becomes the filter's frequency response H(w). This means that substituting e^jw for z in H(z) gives us an expression for the filter's frequency response H(w), which is:

H(w) = h(0)e^-j0w + h(1)e^-j1w + h(2)e^-j2w + ... + h(N-1)e^-j(N-1)w, or, more compactly:

H(w) = Σ h(k)·e^-jkw, where the sum runs over k = 0 to N-1.

Using Euler's identity, e^-ja = cos(a) - j·sin(a), we can write H(w) in rectangular form as:

H(w) = h(0)[cos(0w) - j·sin(0w)] + h(1)[cos(1w) - j·sin(1w)] + ... + h(N-1)[cos((N-1)w) - j·sin((N-1)w)], or equivalently:

H(w) = Σ h(k)·cos(kw) - j·Σ h(k)·sin(kw), with each sum over k = 0 to N-1.

2.2.3 Can I calculate the frequency response of a FIR using the Discrete Fourier Transform (DFT)?

Yes. For an N-tap FIR, you can get N evenly-spaced points of the frequency response by doing a DFT on the filter coefficients. However, to get the frequency response of the filter at any arbitrary frequency (that is, at frequencies between the DFT outputs), you will need to use the formula above.
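To make the formula concrete, here is a minimal C sketch (our own illustration; the function name is ours, not from FirAlgs.c) that evaluates H(w) at an arbitrary normalized frequency w (in radians/sample), using the rectangular form above:

    #include <math.h>

    /* Evaluate H(w) = sum of h[k]*e^(-jkw) for a FIR with coefficients
       h[0..N-1]. w is the normalized frequency in radians/sample
       (0 to pi). Real and imaginary parts are returned via pointers. */
    void fir_freq_response(const double *h, int N, double w,
                           double *re, double *im)
    {
        double sum_re = 0.0, sum_im = 0.0;
        for (int k = 0; k < N; k++) {
            sum_re += h[k] * cos(k * w);   /*  h(k)·cos(kw) */
            sum_im -= h[k] * sin(k * w);   /* -h(k)·sin(kw) */
        }
        *re = sum_re;
        *im = sum_im;
    }

The magnitude response at w is then sqrt(re*re + im*im), and the phase is atan2(im, re).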

2.2.4 What is the DC gain of a FIR filter?

Consider a DC (zero Hz) input signal consisting of samples which each have value 1.0. After the FIR's delay line has filled with the 1.0 samples, the output is the sum of the coefficients. Therefore, the gain of a FIR filter at DC is simply the sum of the coefficients.

This intuitive result can be checked against the formula above. If we set w to zero, the cosine term is always 1, and the sine term is always zero, so the frequency response becomes:

H(0) = h(0) + h(1) + h(2) + ... + h(N-1), which is just the sum of the coefficients.

2.2.5 How do I scale the gain of a FIR filter?

Simply multiply all coefficients by the scale factor.
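Both of these facts take only a few lines of C to apply. A quick sketch (the function names are ours, for illustration only):

    /* DC gain of a FIR is the sum of its coefficients (see 2.2.4). */
    double fir_dc_gain(const double *h, int N)
    {
        double sum = 0.0;
        for (int k = 0; k < N; k++)
            sum += h[k];
        return sum;
    }

    /* Scale the overall gain of a FIR by multiplying every coefficient
       by the same factor (see 2.2.5). */
    void fir_scale_gain(double *h, int N, double factor)
    {
        for (int k = 0; k < N; k++)
            h[k] *= factor;
    }

For example, calling fir_scale_gain(h, N, 1.0 / fir_dc_gain(h, N)) normalizes a lowpass filter to unity DC gain.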

2.3 Numeric Properties

2.3.1 Are FIR filters inherently stable?

Yes. Since they have no feedback elements, any bounded input results in a bounded output.

2.3.2 What makes the numerical properties of FIR filters "good"?

Again, the key is the lack of feedback. The numeric errors that occur when implementing FIR filters in computer arithmetic occur separately with each calculation; the FIR doesn't "remember" its past numeric errors. In contrast, the feedback aspect of IIR filters can cause numeric errors to compound with each calculation, as numeric errors are fed back.

The practical impact of this is that FIRs can generally be implemented using fewer bits of precision than IIRs. For example, FIRs can usually be implemented with 16 bits, but IIRs generally require 32 bits, or even more.

2.4 Why are FIR filters generally preferred over IIR filters in multirate (decimating and interpolating) systems?

Because only a fraction of the calculations that would be required to implement a decimating or interpolating FIR in a literal way actually needs to be done.

Since FIR filters do not use feedback, only those outputs which are actually going to be used have to be calculated. Therefore, in the case of decimating FIRs (in which only 1 of N outputs will be used), the other N-1 outputs don't have to be calculated. Similarly, for interpolating filters (in which zeroes are inserted between the input samples to raise the sampling rate) you don't actually have to multiply the inserted zeroes with their corresponding FIR coefficients and sum the result; you just omit the multiplication-additions that are associated with the zeroes (because they don't change the result anyway.)


In contrast, since IIR filters use feedback, every input must be used, and every output must be calculated, because all inputs and outputs contribute to the feedback in the filter.

2.5 What special types of FIR filters are there?

Aside from "regular" and "extra crispy", there are:

Boxcar - Boxcar FIR filters are simply filters in which each coefficient is 1.0. Therefore, for an N-tap boxcar, the output is just the sum of the past N samples. Because boxcar FIRs can be implemented using only adders, they are of interest primarily in hardware implementations, where multipliers are expensive to implement.

Hilbert Transformer - Hilbert Transformers shift the phase of a signal by 90 degrees. They are used primarily for creating the imaginary part of a complex signal, given its real part.

Differentiator - Differentiators have an amplitude response which is a linear function of frequency. They are not very popular nowadays, but are sometimes used for FM demodulators.

Lth-Band - Also called "Nyquist" filters, these filters are a special class of filters used primarily in multirate applications. Their key selling point is that one of every L coefficients is zero--a fact which can be exploited to reduce the number of multiply-accumulate operations required to implement the filter. (The famous "half-band" filter is actually an Lth-band filter, with L=2.)

Raised-Cosine - This is a special kind of filter that is sometimes used for digital data applications. (The frequency response in the passband is a cosine shape which has been "raised" by a constant.) See dspGuru's Raised-Cosine FAQ for more information.

Lots of others.

FIR Filter Design

3.1 What are the methods of designing FIR filters?

The three most popular design methods are (in order):

Parks-McClellan: The Parks-McClellan method (inaccurately called "Remez" by Matlab) is probably the most widely used FIR filter design method. It is an iteration algorithm that accepts filter specifications in terms of passband and stopband frequencies, passband ripple, and stopband attenuation. The fact that you can directly specify all the important filter parameters is what makes this method so popular. The PM method can design not only FIR "filters" but also FIR "differentiators" and FIR "Hilbert transformers".

Windowing: In the windowing method, an initial impulse response is derived by taking the Inverse Discrete Fourier Transform (IDFT) of the desired frequency response. Then, the impulse response is refined by applying a data window to it.

Direct Calculation: The impulse responses of certain types of FIR filters (e.g. Raised Cosine and Windowed Sinc) can be calculated directly from formulas.

3.2 How do I design a FIR filter?

With a FIR filter design program, of course. ;-) Although it's possible to design FIR filters using manual methods, it is a whole lot easier just to use a FIR filter design program.

3.3 What FIR filter design programs are available?

FIR filter design programs come in three broad categories:

Filter Design Applications: See dspGuru's Digital Filter Design Software page for a list of filter design programs.

Near and dear to us here at dspGuru is Iowegian's own ScopeFIR product. We believe ScopeFIR offers an excellent combination of professional features, smooth user interface, and affordable price. We sell it for just $199, with a generous 60 day trial period. Even if you already use Matlab, ScopeFIR's "point and shoot" capabilities can improve your FIR filter design productivity.

Math Programs: Matlab and its Free Clones offer built-in FIR filter design functions.

Source code: One of the best places on the net to find source code to design FIR filters is Charles Poynton's Filter Design Software page.

FIR Filter Implementation

4.1 What is the basic algorithm for implementing FIR filters?

Structurally, FIR filters consist of just two things: a sample delay line and a set of coefficients. To implement the filter:

1. Put the input sample into the delay line.

2. Multiply each sample in the delay line by the corresponding coefficient and accumulate the result.

3. Shift the delay line by one sample to make room for the next input sample.

4.2 How do I implement FIR filters in C?

There are lots of possibilities, including a few tricks. To illustrate, we have provided a set of FIR filter algorithms implemented in C called "FirAlgs.c" in ScopeFIR's distribution file. FirAlgs.c includes the following functions:

1. fir_basic: Illustrates the basic FIR calculation described above by implementing it very literally.

2. fir_circular: Illustrates how circular buffers are used to implement FIRs.

3. fir_shuffle: Illustrates the "shuffle down" technique used by some of Texas Instruments' processors.

4. fir_split: Splits the FIR calculation into two flat (non-circular) pieces to avoid the use of circular-buffer logic and shuffling.

5. fir_double_z: Uses a double-sized delay line so that the FIR calculation can be done using a flat buffer.


6. fir_double_h: Similar to fir_double_z, this uses a double-sized coefficient table so that the FIR calculation can be done using a flat buffer.
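For reference, here is a minimal sketch of the basic calculation in C. This is our own illustration of the idea behind fir_basic, not the actual FirAlgs.c source:

    /* Process one input sample through an N-tap FIR filter.
       h[0..N-1] are the coefficients; z[0..N-1] is the delay line,
       with z[0] holding the newest sample. Our own sketch, not FirAlgs.c. */
    double fir_basic_sample(double input, const double *h, double *z, int N)
    {
        /* Shift the delay line to make room, then store the new sample. */
        for (int i = N - 1; i > 0; i--)
            z[i] = z[i - 1];
        z[0] = input;

        /* Multiply each delayed sample by its coefficient and
           accumulate: one MAC per tap. */
        double acc = 0.0;
        for (int i = 0; i < N; i++)
            acc += h[i] * z[i];

        return acc;
    }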

4.3 How do I implement FIR filters in assembly?

FIR assembly algorithms are quite processor-specific, but the most common system uses a circular buffer mechanism provided by the DSP processor. The basic steps are:

1. Configure the circular buffer and load the coefficient and delay-line pointers. Then, for each input sample:

2. Store the incoming sample in the delay line and increment the delay-line pointer.

3. Clear the multiplier-accumulator.

4. Loop over all coefficients/delays; accumulate the values obtained by multiplying the coefficients by the delayed samples.

5. Round or truncate the result as the FIR output.

Alternatively, a "shuffle down" method is used in Texas Instruments' older fixed-point processors to implement circular buffers. The processor literally moves each sample delay values by one slot during each multiply-accumulate (via the "MACD" instruction).

Each DSP microprocessor manufacturer provides example FIR assembly code in its data books or its application handbooks, so be sure to look at those before you "reinvent the circular buffer".

4.4 How do I test my FIR implementation?

Here are a few methods:

Impulse Test: A very simple and effective test is to put an impulse into it (which is just a "1" sample followed by at least N - 1 zeroes). You can also put in an "impulse train", with the "1" samples spaced at least N samples apart. If all the coefficients of the filter come out in the proper order, there is a good chance your filter is working correctly. (You might want to test with non-linear-phase coefficients so you can see the order they come out in.) We recommend you do this test whenever you write a new FIR filter routine.

Step Test: Input N or more "1" samples. After N samples, the output should be the sum of the coefficients, that is, the DC gain of the FIR filter.

Sine Test: Input a sine wave at one or more frequencies and see if the output sine has the expected amplitude.

Swept FM Test: From Eric Jacobsen: "My favorite test after an impulse train is to take two identical instances of the filter under test, use them as I and Q filters and put a complex FM linear sweep through them from DC to Fs/2. You can do an FFT on the result and see the complete frequency response of the filter, make sure the phase is nice and continuous everywhere, and match the response to what you'd expect from the coefficient set, the precision, etc."

4.5 What "tricks" are used to make FIR filters more efficient?

FIR tricks center on two things: 1) not calculating things that don't need to be calculated, and 2) "faking" circular buffers in software.

4.5.1 How do I skip needless calculations?

First, if your filter has zero-valued coefficients, you don't actually have to calculate those taps; you can leave them out. A common case of this is the "half-band" filter, which has the property that every other coefficient is zero.

Second, if your filter is "symmetric" (linear phase), you can "pre-add" the samples which will be multiplied by the same coefficient value, prior to doing the multiply. Since this technique essentially trades an add for a multiply, it isn't really useful on DSP microprocessors which can do a multiply in a single instruction cycle. However, it is useful in ASIC implementations (in which addition is usually much less expensive than multiplication); also, some newer DSP processors now offer special hardware and instructions to make use of this trick.

4.5.2 How do I fake circular buffers in software?

When hardware support for circular buffers isn't available, you have to "fake" them. Also, since ANSI C has no construct to describe circular buffers, most C compilers can't generate code to use them, even if the target processor has them.

You can always implement a circular buffer literally in software by duplicating its wrap-around logic (and many have), but the overhead can be prohibitive; the circular logic might take several instructions per tap, compared to just a single instruction to do the multiply-accumulate operation. Therefore, you need more efficient ways to fake it.

Here are several basic techniques to fake circular buffers:

1. Split the calculation: You can split any FIR calculation into its "pre-wrap" and "post-wrap" parts. By splitting the calculation into these two parts, you essentially do the circular logic only once, rather than once per tap. (See fir_split in FirAlgs.c above.)

2. Duplicate the delay line: For a FIR with N taps, use a delay line of size 2N. Copy each sample to its proper location, as well as to its location plus N. Therefore, the FIR calculation's MAC loop can be done on a flat buffer of N points, starting anywhere within the first set of N points. The second set of N delayed samples provides the "wrap around" comparable to a true circular buffer. (See fir_double_z in FirAlgs.c above, and the sketch following this list.)

3. Duplicate the coefficients: This is similar to the above, except that the duplication occurs in terms of the coefficients, not the delay line. Compared to the previous method, this has the calculation advantage of not having to store each incoming sample twice, and it also has a memory advantage when the same coefficient set will be used on multiple delay lines. (See fir_double_h in FirAlgs.c above.)

4. Use block processing: In block processing, you use a delay line which is a multiple of the number of taps. You therefore only have to move the data once per block to implement the delay-line mechanism. When the block size becomes "large", the overhead of moving the delay line once per block becomes negligible.
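As an illustration of technique 2 (the doubled delay line), here is a sketch in C. It is our own illustration of the idea behind fir_double_z, not the actual FirAlgs.c code; initialize state to 0 and zero the delay line before use:

    /* N-tap FIR using a 2N-element delay line z, so the MAC loop always
       runs over a flat window of N samples. 'state' is the current write
       index (0..N-1), stepped backwards on each call. */
    double fir_double_z_sample(double input, const double *h, double *z,
                               int N, int *state)
    {
        /* Store the sample twice, at state and state + N, so a flat run
           of N samples always starts at z[state]. */
        z[*state] = input;
        z[*state + N] = input;

        /* MAC over a flat (non-wrapping) window; z[state] is newest. */
        double acc = 0.0;
        const double *p = &z[*state];
        for (int i = 0; i < N; i++)
            acc += h[i] * p[i];

        /* Move the write index backwards, wrapping from 0 up to N-1. */
        *state = (*state == 0) ? N - 1 : *state - 1;
        return acc;
    }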


IIR Filter FAQ

Infinite Impulse Response (IIR) filters are fundamental elements of Digital Signal Processing. Here, dspGuru answers "Frequently Asked Questions" about IIR filtering.

IIR Filter Basics

Infinite Impulse Response (IIR) filters are fundamental elements of Digital Signal Processing. This is dspGuru's mini-FAQ of "Frequently Asked Questions" on the subject of IIR filters.

1. IIR Basics

1.1 What are IIR filters? What does "IIR" mean?

IIR filters are one of two primary types of digital filters used in Digital Signal Processing (DSP) applications (the other type being FIR). "IIR" means "Infinite Impulse Response".

1.2 Why is the impulse response "infinite"?

The impulse response is "infinite" because there is feedback in the filter; if you put in an impulse (a single "1" sample followed by many "0" samples), an infinite number of non-zero values will come out (theoretically).

1.3 What is the alternative to IIR filters?

DSP filters can also be "Finite Impulse Response" (FIR). FIR filters do not use feedback, so for a FIR filter with N coefficients, the output always becomes zero after you put in N samples of an impulse.

1.4 What are the advantages of IIR filters (compared to FIR filters)?

IIR filters can achieve a given filtering characteristic using less memory and fewer calculations than a similar FIR filter.

1.5 What are the disadvantages of IIR filters (compared to FIR filters)?

They are more susceptible to problems of finite-length arithmetic, such as noise generated by calculations, and limit cycles. (This is a direct consequence of feedback: when the output isn't computed perfectly and is fed back, the imperfection can compound.) They are harder (slower) to implement using fixed-point arithmetic.

They are less well suited than FIR filters for multirate (decimation and interpolation) applications.

This FAQ is Under Construction.

Multirate FAQ

Multirate DSP is a fundamental technique of Digital Signal Processing. Here, dspGuru provides "Rarely Given Answers" to "Frequently Asked Questions" about the important and mysterious subject of Multirate Digital Signal Processing.

Multirate Basics

1.1 What does multirate mean?

Multirate simply means "multiple sampling rates". A multirate DSP system uses multiple sampling rates within the system. Whenever a signal at one rate has to be used by a system that expects a different rate, the rate has to be increased or decreased, and some processing is required to do so. Therefore, "Multirate DSP" really refers to the art or science of changing sampling rates.

1.2 Why should I do multirate DSP?

The most immediate reason is when you need to pass data between two systems which use incompatible sampling rates. For example, professional audio systems use a 48 kHz sampling rate, but consumer CD players use 44.1 kHz; when audio professionals transfer their recorded music to CDs, they need to do a rate conversion. But the most common reason is that multirate DSP can greatly increase processing efficiency (even by orders of magnitude!), which reduces DSP system cost. This makes the subject of multirate DSP vital to all professional DSP practitioners.

1.3 What are the categories of multirate?

Multirate consists of:

1. Decimation: To decrease the sampling rate,

2. Interpolation: To increase the sampling rate, or,

3. Resampling: To combine decimation and interpolation in order to change the sampling rate by a fractional value that can be expressed as a ratio. For example, to resample by a factor of 1.5, you just interpolate by a factor of 3 then decimate by a factor of 2 (to change the sampling rate by a factor of 3/2 = 1.5).

1.4 Where can I get multirate source code?

Right here. Our "multirate_algs" package includes decimation, interpolation, and resampling routines. You can download it from dspGuru's DSP Algorithm Library.

1.5 What reference books cover multirate topics?

Many DSP books omit the important subject of multirate altogether. But two introductory texts that briefly go into it are:

1. Understanding Digital Signal Processing [Lyo97]

2. Digital Signal Processing and the Microcontroller [Gro98]

But it's a big subject. The two most popular "industrial strength" (advanced) books that cover multirate in depth are:

1. Multirate Digital Signal Processing [Cro83]

2. Multirate Systems and Filter Banks [Vai92]

Decimation

2.1 Basics

2.1.1 What are "decimation" and "downsampling"?

Loosely speaking, "decimation" is the process of reducing the sampling rate. In practice, this usually implies lowpass-filtering a signal, then throwing away some of its samples.

"Downsampling" is a more specific term which refers to just the process of throwing away samples, without the lowpass filtering operation. Throughout this FAQ, though, we'll just use the term "decimation" loosely, sometimes to mean "downsampling".

2.1.2 What is the "decimation factor"?

The decimation factor is simply the ratio of the input rate to the output rate. It is usually symbolized by "M", so input rate / output rate=M.

Tip: You can remember that "M" is the symbol for decimation factor by thinking of "deci-M-ation". (Exercise for the student: which letter is used as the symbol for the interpo-L-ation factor?)

2.1.3 Why decimate?

The most immediate reason to decimate is simply to reduce the sampling rate at the output of one system so a system operating at a lower sampling rate can input the signal. But a much more common motivation for decimation is to reduce the cost of processing: the calculation and/or memory required to implement a DSP system generally is proportional to the sampling rate, so lower sampling rates usually mean cheaper implementations.

To that, Jim Thomas adds:

Almost anything you do to/with the signal can be done with fewer operations at a lower sample rate, and the workload is almost always reduced by more than a factor of M.

For example, if you double the sample rate, an equivalent filter will require four times as many operations to implement. This is because both the amount of data (per second) and the length of the filter increase by two, so convolution goes up by four. Thus, if you can halve the sample rate, you can decrease the work load by a factor of four. I guess you could say that if you reduce the sample rate by M, the workload for a filter goes down to (1/M)^2.

2.1.4 Is there a restriction on decimation factors I can use?

Yes. Decimation involves throwing away samples, so you can only decimate by integer factors; you cannot decimate by fractional factors. (However, you can do interpolation prior to decimation to achieve an overall rational factor, for example, "4/5"; see Part 4: Resampling.)

2.1.5 Which signals can be downsampled?

A signal can be downsampled (without doing any filtering) whenever it is "oversampled", that is, when a sampling rate was used that was greater than the Nyquist criteria required. Specifically, the signal's highest frequency must be less than half the post-decimation sampling rate. (This just boils down to applying the Nyquist criteria to the input signal, relative to the new sampling rate.) In most cases, though, you'll end up lowpass-filtering your signal prior to downsampling, in order to enforce the Nyquist criteria at the post-decimation rate.

For example, suppose you have a signal sampled at a rate of 30 kHz, whose highest frequency component is 10 kHz (which is less than the Nyquist frequency of 15 kHz). If you wish to reduce the sampling rate by a factor of three to 10 kHz, you must ensure that you have no components greater than 5 kHz, which is the Nyquist frequency for the reduced rate. However, since the original signal has components up to 10 kHz, you must lowpass-filter the signal prior to downsampling to remove all components above 5 kHz so that no aliasing will occur when downsampling.

This combined operation of filtering and downsampling is called decimation.

2.1.6 What happens if I violate the Nyquist criteria in downsampling or decimating?

You get aliasing--just as with other cases of violating the Nyquist criteria. (Aliasing is a type of distortion which cannot be corrected once it occurs.)

2.2 Multistage

2.2.1 Can I decimate in multiple stages?

Yes, so long as the decimation factor, M, is not a prime number. For example, to decimate by a factor of 15, you could decimate by 5, then decimate by 3. The more prime factors M has, the more choices you have. For example, you could decimate by a factor of 24 using:

one stage: 24

two stages: 6 and 4, or 8 and 3

three stages: 4, 3, and 2

four stages: 3, 2, 2, and 2

2.2.2 Cool. But why bother with all that?

If you are simply downsampling (that is, throwing away samples without filtering), there's no benefit. But in the more common case of decimating (combining filtering and downsampling), the computational and memory requirements of the filters can usually be reduced by using multiple stages.

2.2.3 OK, so how do I figure out the optimum number of stages, and the decimation factor at each stage?

That's a tough one. There isn't a simple answer to this one: the answer varies depending on many things, so if you really want to find the optimum, you have to evaluate the resource requirements of each possibility. However, here are a couple of rules of thumb which may help narrow down the choices:

Using two or three stages is usually optimal or near-optimal.

Decimate in order from the largest to smallest factor. In other words, use the largest factor at the highest sampling rate. For example, when decimating by a factor of 60 in three stages, decimate by 5, then by 4, then by 3.

The multirate book references give additional, more specific guidance.

2.3 Implementation

2.3.1 How do I implement decimation?

Decimation consists of the processes of lowpass filtering, followed by downsampling. To implement the filtering part, you can use either FIR or IIR filters.

To implement the downsampling part (by a downsampling factor of "M") simply keep every Mth sample, and throw away the M-1 samples in between. For example, to decimate by 4, keep every fourth sample, and throw three out of every four samples away.

2.3.2 That almost sounds too easy...


2.3.3 If I'm going to throw away most of the lowpass filter's outputs, why bother to calculate them in the first place?

You may be onto something. In the case of FIR filters, any output is a function only of the past inputs (because there is no feedback). Therefore, you only have to calculate outputs which will be used.

For IIR filters, you still have to do part or all of the filter calculation for each input, even when the corresponding output won't be used. (Depending on the filter topology used, certain feed-forward parts of the calculation can be omitted.) The reason is that the outputs you do use are affected by the feedback from the outputs you don't use.

The fact that only the outputs which will be used have to be calculated explains why decimating filters are almost always implemented using FIR filters!

2.4 FIR Decimators

2.4.1 What computational savings do I gain by using a FIR decimator?

Since you compute only one of every M outputs, you skip M-1 out of every M output calculations, for an overall "savings" of (M-1)/M. Therefore, the larger the decimation factor is, the larger the savings, percentage-wise.

A simple way to think of the amount of computation required to implement a FIR decimator is that it is equal to the computation required for a non-decimating N-tap filter operating at the output rate.

2.4.2 How much memory savings do I gain by using a FIR decimator?

None. You still have to store every input sample in the FIR's delay line, so the memory requirement is the same size as for a non-decimated FIR having the same number of taps.

2.4.3 How do I design a FIR decimator?

Just use your favorite FIR design method. The design criteria are:

1. The passband lower frequency is zero; the passband upper frequency is whatever information bandwidth you want to preserve after decimating. The passband ripple is whatever your application can tolerate.

2. The stopband lower frequency is half the output rate minus the passband upper frequency. The stopband attenuation is set according to whatever aliasing your application can stand. (Note that there will always be aliasing in a decimator, but you just reduce it to a negligible value with the decimating filter.)

3. As with any FIR, the number of taps is whatever is required to meet the passband and stopband specifications.


2.4.4 How do I implement a FIR decimator?

A decimating FIR is actually the same as a regular FIR, except that you shift M samples into the delay line for each output you calculate. More specifically:

1. Store M samples in the delay line.

2. Calculate the decimated output as the sum-of-products of the delay line values and the filter coefficients.

3. Shift the delay line by M places to make room for the inputs of the next decimation.

Also, just as with ordinary FIRs, circular buffers can be used to eliminate the requirement to literally shift the data in the delay line.
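Here is a minimal sketch of those steps in C (our own illustration; the names are hypothetical). It consumes M input samples, in arrival order, and produces one output; it assumes N >= M:

    /* Decimate-by-M FIR: h[0..N-1] are the coefficients; z[0..N-1] is a
       flat delay line with z[0] the newest sample; input[0..M-1] are the
       M new samples, input[M-1] being the most recent. */
    double fir_decimate_sample(const double *input, int M,
                               const double *h, double *z, int N)
    {
        /* 1. Shift the delay line by M and store the M new samples. */
        for (int i = N - 1; i >= M; i--)
            z[i] = z[i - M];
        for (int i = 0; i < M; i++)
            z[i] = input[M - 1 - i];     /* newest sample lands in z[0] */

        /* 2. Calculate the one output we actually need. */
        double acc = 0.0;
        for (int i = 0; i < N; i++)
            acc += h[i] * z[i];
        return acc;
    }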

2.4.5 Where can I get source code to implement a FIR decimator in C?

Iowegian's ScopeFIR comes with a free set of multirate algorithms, including FIR decimation functions in C. Just download and install the ScopeFIR distribution file.

2.4.6 Where can I get assembly code to implement a FIR decimator?

The major DSP vendors provide examples of FIR decimators in their data books and application notes; check their web sites.

2.4.7 How do I test a FIR decimator?

You can test a decimating FIR in most of the ways you might test an ordinary FIR:

1. A special case of a decimator is an "ordinary" FIR. When given a value of "1" for M, a decimator should act exactly like an ordinary FIR. You can then do impulse, step, and sine tests on it just like you can on an ordinary FIR.

2. If you put in a sine whose frequency is within the decimator's passband, the output should be distortion-free (once the filter reaches steady-state), and the frequency of the output should be the same as the frequency of the input, in terms of absolute Hz.

3. You also can extend the "impulse response" test used for ordinary FIRs by using a "fat impulse", consisting of M consecutive "1" samples followed by a series of "0" samples. In that case, if the decimator has been implemented correctly, the output will not be the literal FIR filter coefficients, but will be the sums of successive groups of M adjacent coefficients.

4. You can use a step response test. Given a unity-valued step input, the output should be the sum of the FIR coefficients once the filter has reached steady state.

Interpolation

3.1 Basics

3.1.1 What are "upsampling" and "interpolation"?

"Upsampling" is the process of inserting zero-valued samples between original samples to increase the sampling rate. (This is called "zero-stuffing".) Upsampling adds to the original signal undesired spectral images which are centered on multiples of the original sampling rate.

"Interpolation", in the DSP sense, is the process of upsampling followed by filtering. (The filtering removes the undesired spectral images.) As a linear process, the DSP sense of interpolation is somewhat different from the "math" sense of interpolation, but the result is conceptually similar: to create "in-between" samples from the original samples. The result is as if you had just originally sampled your signal at the higher rate.

3.1.2 Why interpolate?

The primary reason to interpolate is simply to increase the sampling rate at the output of one system so that another system operating at a higher sampling rate can input the signal.

3.1.3 What is the "interpolation factor"?

The interpolation factor is simply the ratio of the output rate to the input rate. It is usually symbolized by "L", so output rate / input rate=L.

Tip: You can remember that "L" is the symbol for interpolation factor by thinking of "interpo-L-ation".

3.1.4 Is there a restriction on interpolation factors I can use?

Yes. Since interpolation relies on zero-stuffing, you can only interpolate by integer factors; you cannot interpolate by fractional factors. (However, you can combine interpolation and decimation to achieve an overall rational factor, for example, 4/5; see Part 4: Resampling.)

3.1.5 Which signals can be interpolated?

All. There is no restriction.

3.1.6 When interpolating, do I always need to do filtering?

Yes. Otherwise, you're doing upsampling. ;-)

3.1.7 OK, you know what I mean...do I always need to do interpolation (upsampling followed by filtering) or can I get by with doing just upsampling?

Upsampling adds undesired spectral images to the signal at multiples of the original sampling rate, so unless you remove those by filtering, the upsampled signal is not the same as the original: it's distorted.


Some applications may be able to tolerate that, for example, if the images get removed later by an analog filter, but in most applications you will have to remove the undesired images via digital filtering. Therefore, interpolation is far more common than upsampling alone.

3.2 Multistage

3.2.1 Can I interpolate in multiple stages?

Yes, so long as the interpolation ratio, L, is not a prime number. For example, to interpolate by a factor of 15, you could interpolate by 3 then interpolate by 5. The more factors L has, the more choices you have. For example, you could interpolate by 16 in:

one stage: 16

two stages: 4 and 4

three stages: 2, 2, and 4

four stages: 2, 2, 2, and 2

3.2.2 Cool. But why bother with all that?

Just as with decimation, the computational and memory requirements of interpolation filtering can often be reduced by using multiple stages.

3.2.3 OK, so how do I figure out the optimum number of stages, and the interpolation ratio at each stage?

There isn't a simple answer to this one: the answer varies depending on many things. However, here are a couple of rules of thumb:

Using two or three stages is usually optimal or near-optimal.

Interpolate in order of the smallest to largest factors. For example, when interpolating by a factor of 60 in three stages, interpolate by 3, then by 4, then by 5. (Use the largest ratio on the highest rate.)

The multirate book references give additional, more specific guidance.

3.3 Implementation

3.3.1 How do I implement interpolation?

Interpolation always consists of two processes:

1. Inserting L-1 zero-valued samples between each pair of input samples. This operation is called "zero stuffing".

2. Lowpass-filtering the result.

The result (assuming an ideal interpolation filter) is a signal at L times the original sampling rate which has the same spectrum over the input Nyquist (0 to Fs/2) range, and with zero spectral content above the original Fs/2.

3.3.2 How does that work?

1. The zero-stuffing creates a higher-rate signal whose spectrum is the same as the original over the original bandwidth, but has images of the original spectrum centered on multiples of the original sampling rate.

2. The lowpass filtering eliminates the images.

3.3.3 Why do interpolation by zero-stuffing? Doesn't it make more sense to create the additional samples by just copying the original samples?

This idea is appealing because, intuitively, this "stairstep" output seems more similar to the original than the zero-stuffed version. But in this case, intuition leads us down the garden path. This process causes a "zero-order hold" distortion in the original passband, and still creates undesired images (see below).

Although these effects could be un-done by filtering, it turns out that the zero-stuffing approach is not only more "correct", it actually reduces the amount of computation required to implement a FIR interpolation filter. Therefore, interpolation is always done via zero-stuffing.

3.4 FIR Interpolators

3.4.1 How does zero-stuffing reduce computation of the interpolation filter?

The output of a FIR filter is the sum of each coefficient multiplied by the corresponding input sample. In the case of a FIR interpolation filter, some of the input samples are stuffed zeros. Each stuffed zero gets multiplied by a coefficient and summed with the others. However, this multiplying-and-summing has no effect when the data sample is zero--which we know in advance will be the case for L-1 out of each L input samples of a FIR interpolation filter. So why bother to calculate these taps?

The net result is that to interpolate by a factor of L, you calculate L outputs for each input using L different "sub-filters" derived from your original filter.

3.4.2 Can you give me an example of a FIR interpolator?

Here's an example of a 12-tap FIR filter that implements interpolation by a factor of four. The coefficients are h0-h11, and three data samples, x0-x2 (with the newest, x2, on the left) have made their way into the filter's delay line:

h0  h1  h2  h3  h4  h5  h6  h7  h8  h9  h10 h11  Result

x2  0   0   0   x1  0   0   0   x0  0   0   0    x2·h0 + x1·h4 + x0·h8
0   x2  0   0   0   x1  0   0   0   x0  0   0    x2·h1 + x1·h5 + x0·h9
0   0   x2  0   0   0   x1  0   0   0   x0  0    x2·h2 + x1·h6 + x0·h10
0   0   0   x2  0   0   0   x1  0   0   0   x0   x2·h3 + x1·h7 + x0·h11

3.4.3 What can I generalize from that example?

The table suggests the following general observations about FIR interpolators:

Since the interpolation ratio is four (L=4), there are four "sub-filters": {h0, h4, h8}, {h1, h5, h9}, {h2, h6, h10}, and {h3, h7, h11}, corresponding to the four rows of the table. These sub-filters are officially called "polyphase filters".

For each input, we calculate L outputs by doing L basic FIR calculations, each using a different set of coefficients.

The number of taps per polyphase filter is 3, or, expressed as a formula: Npoly=Ntotal / L.

The coefficients of each polyphase filter can be determined by taking every Lth coefficient, starting at coefficients 0 through L-1, to calculate corresponding outputs 0 through L-1.

Alternatively, if you rearranged your coefficients in advance in "scrambled" order like this: h0, h4, h8, h1, h5, h9, h2, h6, h10, h3, h7, h11

then you could just step through them in order.

We have hinted here at the fact that N should be a multiple of L. This isn't absolutely necessary, but if N isn't a multiple of L, the added complication of using a non-multiple of L often isn't worth it. So if the minimum number of taps that your filter specification requires doesn't happen to be a multiple of L, your best bet is usually to just increase N to the next multiple of L. You can do this either by adding some zero-valued coefficients onto the end of the filter, or by re-designing the filter using the larger N value.

3.4.4 What computational savings do I gain by using a FIR interpolator?

Since each output is calculated using only N/L coefficients (rather than N coefficients), you get an overall computational "savings" of (N - N/L) multiplies per output.

A simple way to think of the amount of computation required to implement a FIR interpolator is that it is equal to the computation required for a non-interpolating N-tap filter operating at the input rate. In effect, you have to calculate L filters using N/L taps each, so that's N total taps calculated per input.

3.4.5 How much memory savings do I gain by using a FIR interpolator?

Compared to the straight-forward implementation of interpolation by upsampling the signal by stuffing it with L-1 zeros, then filtering it, you save memory by a factor of (L-1)/L. In other words, you don't have to store L-1 zero-stuffed "upsamples" per actual input sample.


3.4.6 How do I design a FIR interpolator?

Just use your favorite FIR design method. The design criteria are: 1. TBD

3.5 Implementation

3.5.1 How do I implement a FIR interpolator?

An interpolating FIR is actually the same as a regular FIR, except that you calculate L outputs for each input, using L polyphase filters, each having N/L taps. More specifically:

1. Store a sample in the delay line. (The size of the delay line is N/L.)

2. For each of L polyphase coefficient sets, calculate an output as the sum-of-products of the delay line values and the filter coefficients.

3. Shift the delay line by one to make room for the next input.

Also, just as with ordinary FIRs, circular buffers can be used to eliminate the requirement to literally shift the data in the delay line.
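Here is a minimal sketch of those steps in C (our own illustration; the names are hypothetical). It uses the original, unscrambled coefficient ordering and assumes N is a multiple of L:

    /* Interpolate-by-L FIR: consumes one input sample and produces L
       outputs. h[0..N-1] are the coefficients; z[0..N/L-1] is the delay
       line, with z[0] the newest sample. */
    void fir_interpolate_sample(double input, int L,
                                const double *h, int N,
                                double *z, double *output)
    {
        int taps = N / L;                /* taps per polyphase filter */

        /* 1. Shift the delay line and store the new sample. */
        for (int i = taps - 1; i > 0; i--)
            z[i] = z[i - 1];
        z[0] = input;

        /* 2. Run the L polyphase sub-filters; sub-filter p uses
              coefficients h[p], h[p+L], h[p+2L], ... */
        for (int p = 0; p < L; p++) {
            double acc = 0.0;
            for (int i = 0; i < taps; i++)
                acc += h[p + i * L] * z[i];
            output[p] = acc;
        }
    }

With the 12-tap example above (L = 4), the p = 0 pass computes x2·h0 + x1·h4 + x0·h8, matching the first row of the table.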

3.5.2 Where can I get source code to implement a FIR interpolator in C?

Iowegian's ScopeFIR comes with a free set of multirate algorithms, including FIR interpolation functions in C. Just download and install the ScopeFIR distribution file.

3.5.3 Where can I get assembly code to implement a FIR interpolator?

The major DSP vendors provide examples of FIR interpolators in their data books and application notes, so check their web sites.

3.5.4 How do I test a FIR interpolator?

You can test an interpolating FIR in most of the ways you might test an ordinary FIR:

1. A special case of an interpolator is an ordinary FIR. When given a value of 1 for L, an interpolator should act exactly like an ordinary FIR. You can then do impulse, step, and sine tests on it just like you can on an ordinary FIR.

2. If you put in a sine whose frequency is within the interpolator's passband, the output should be distortion-free (once the filter reaches steady state), and the frequency of the output should be the same as the frequency of the input, in terms of absolute Hz.

3. You can use a step response test. Given a unity-valued step input, every group of L outputs should be the same as the sums of the coefficients of the L individual polyphase filters, once the filter has reached steady state.


Resampling

4.1 Basics

4.1.1 What is "resampling"?

"Resampling" means combining interpolation and decimation to change the sampling rate by a rational factor.

4.1.2 Why resample?

Resampling is usually done to interface two systems which have different sampling rates. If the ratio of the two systems' rates happens to be an integer, decimation or interpolation can be used to change the sampling rate (depending on whether the rate is being decreased or increased); otherwise, interpolation and decimation must be used together to change the rate.

A practical and well-known example results from the fact that professional audio equipment uses a sampling rate of 48 kHz, but consumer audio equipment uses a rate of 44.1 kHz. Therefore, to transfer music from a professional recording to a CD, the sampling rate must be changed by a factor of:

(44100 / 48000) = (441 / 480) = (147 / 160)

There are no common factors in 147 and 160, so we must stop factoring at that point.

Therefore, in this example, we would interpolate by a factor of 147 then decimate by a factor of 160.
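In code, you can find L and M for any pair of rates by dividing out their greatest common divisor. A quick sketch in C (the function names are ours):

    /* Reduce output_rate/input_rate to the lowest-terms ratio L/M. */
    static unsigned gcd(unsigned a, unsigned b)
    {
        while (b != 0) {                 /* Euclid's algorithm */
            unsigned t = a % b;
            a = b;
            b = t;
        }
        return a;
    }

    void resampling_factors(unsigned input_rate, unsigned output_rate,
                            unsigned *L, unsigned *M)
    {
        unsigned g = gcd(output_rate, input_rate);
        *L = output_rate / g;            /* interpolation factor */
        *M = input_rate / g;             /* decimation factor */
    }

Calling resampling_factors(48000, 44100, &L, &M) gives L = 147 and M = 160, as above.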

4.1.3 What is the "resampling factor"?

The resampling factor is simply the ratio of the output rate to the input rate. Given that the interpolation factor is L and the decimation factor is M, the resampling factor is L / M. In the above example, the resampling factor is 147 / 160 = 0.91875.

4.1.4 Is there a restriction on the resampling factor I can use?

Yes. As always, the Nyquist criteria must be met relative to the resulting output sampling rate, or aliasing will result. In other words, the output rate cannot be less than twice the highest frequency (of interest) of the input signal.

4.1.5 When resampling, do I always need a filter?

Yes. Since resampling includes interpolation, you need an interpolation filter. Otherwise, the images created by the zero-stuffing part of interpolation will remain, and the interpolated signal will not be "the same" as the original.


Likewise, since resampling includes decimation, you seemingly need a decimation filter. Or do you? Since the interpolation filter is in-line with the decimation filter, you could just combine the two filters by convolving their coefficients into a single filter to use for decimation. Better yet, since both are lowpass filters, just use whichever filter has the lowest cutoff frequency as the interpolation filter.

4.1.6 How do I design the resampling filter?

As hinted at above:

Determine the cutoff frequency of the decimation filter (as explained in Part 2: Decimation.)

Determine the cutoff frequency of the interpolation filter (as explained in Part 3: Interpolation)

Use the lower of the two cutoff frequencies to design the resampling filter.

4.2 Multistage

4.2.1 Can I resample in multiple stages?

Yes, but there are a couple of restrictions:

If either the interpolation or decimation factors are prime numbers, you won't be able to decompose those parts of the resampler into stages.

You must preserve the Nyquist criteria at each stage or else aliasing will result. That is, no stage can have an output rate which is less than twice the highest frequency of interest.

4.2.2 Cool. But why bother with all that?

Just as with interpolation and decimation, the computational and/or memory requirements of the resampling filtering can sometimes be greatly reduced by using multiple stages.

4.3 Implementation

4.3.1 How do I implement resampling?

The straight-forward implementation of resampling is to do interpolation by a factor of L, then decimation by a factor of M. (You must do it in that order; otherwise, the decimator would remove part of the desired signal--which the interpolator could not restore.)

4.3.2 Is that straight-forward implementation efficient?

No. The problem is that for resampling factors close to 1.0, the interpolation factor can be quite large. For example, in the case described above of changing the sampling rate from 48 kHz to 44.1 kHz, the ratio is only 0.91875, yet the interpolation factor is 147!


Also, you are filtering the signal twice: once in the interpolator and once in the decimator. However, one of the filters has a larger bandwidth than the other, so the larger-bandwidth filter is redundant.

4.3.3 So what's the cure?

Just combine the computational and memory advantages that FIR interpolator and decimator implementations can provide. (If you don't already understand those, be sure to read and understand Part 2: Decimation, and Part 3: Interpolation before continuing.)

First, let's briefly review what makes FIR interpolation and decimation efficient:

When interpolating by a factor of L, you only have to actually calculate 1/L of the FIR taps per interpolator output.

When decimating by a factor of M, you only have to calculate one output for every M decimator inputs.

So, combining these ideas, we will calculate only the outputs we actually need, using only a subset of the interpolation coefficients to calculate each output. That makes it possible to efficiently implement even FIR resamplers which have large interpolation and/or decimation factors.

The tricky part is figuring out which polyphase filters to apply to which inputs, to calculate the desired outputs, as a function of L and M. There are various ways of doing that, but they're all beyond our scope here.

4.3.4 Where can I get source code to implement a FIR resampler in C?

Iowegian's ScopeFIR comes with a free set of multirate algorithms, including FIR resampling functions in C. Just download and install the ScopeFIR distribution file.

4.3.5 How do I test a FIR resampler?

1. The most obvious method is to put in a sine whose frequency is within the resampler's passband. If an undistorted sine comes out, that's a good sign. Note, however, that there will typically be a "ramp up" at the beginning of the sine, due to the filter's delay line filling with samples. Therefore, if you analyze the spectral content of the sine, be sure to skip past the ramp-up portion.

2. Depending on the resampling factor, resampling can be thought of as a general case of other types of multirate filtering. It can be:

Interpolation: The interpolation factor, L, is greater than one, and the decimation factor, M, is one.

Decimation: The decimation factor, M, is greater than one, and the interpolation factor, L, is one.

"Ordinary" Filtering: The interpolation and decimation factors, L and M, are both one.

Therefore, if you successfully test it with all these cases using the methods appropriate for each case, it probably is correct.

Fast Fourier Transform (FFT) FAQ

The Fast Fourier Transform is one of the most important topics in Digital Signal Processing, but it is a confusing subject which frequently raises questions. Here, we answer Frequently Asked Questions (FAQs) about the FFT.

1. FFT Basics

1.1 What is the FFT?

The Fast Fourier Transform (FFT) is simply a fast (computationally efficient) way to calculate the Discrete Fourier Transform (DFT).

1.2 How does the FFT work?

By making use of periodicities in the sines that are multiplied to do the transforms, the FFT greatly reduces the amount of calculation required. Here's a little overview.

Functionally, the FFT decomposes the set of data to be transformed into a series of smaller data sets to be transformed. Then, it decomposes those smaller sets into even smaller sets. At each stage of processing, the results of the previous stage are combined in a special way. Finally, it calculates the DFT of each small data set. For example, an FFT of size 32 is broken into 2 FFTs of size 16, which are broken into 4 FFTs of size 8, which are broken into 8 FFTs of size 4, which are broken into 16 FFTs of size 2. Calculating a DFT of size 2 is trivial.

Here's a slightly more rigorous explanation: It turns out that it is possible to take the DFT of the first N/2 points and combine them in a special way with the DFT of the second N/2 points to produce a single N-point DFT. Each of these N/2-point DFTs can be calculated using smaller DFTs in the same way. One (radix-2) FFT begins, therefore, by calculating N/2 2-point DFTs. These are combined to form N/4 4-point DFTs. The next stage produces N/8 8-point DFTs, and so on, until a single N-point DFT is produced.
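To make the recursion concrete, here is a minimal radix-2 decimation-in-time FFT sketch in C (our own illustration). A production FFT would be iterative, in-place, and would use precomputed twiddle factors:

    #include <complex.h>
    #include <math.h>

    /* Recursive radix-2 DIT FFT; n must be a power of 2. x holds the n
       samples (transformed in place); tmp is an n-element scratch buffer. */
    static void fft_rec(double complex *x, double complex *tmp, int n)
    {
        if (n == 1)
            return;                       /* a 1-point DFT is the sample */
        int half = n / 2;

        /* Decimate in time: separate even- and odd-indexed samples. */
        for (int k = 0; k < half; k++) {
            tmp[k]        = x[2 * k];     /* even samples */
            tmp[k + half] = x[2 * k + 1]; /* odd samples  */
        }
        fft_rec(tmp, x, half);            /* N/2-point DFT of the evens */
        fft_rec(tmp + half, x, half);     /* N/2-point DFT of the odds  */

        /* Combine the two half-size DFTs using the twiddle factors. */
        double pi = acos(-1.0);
        for (int k = 0; k < half; k++) {
            double complex w = cexp(-2.0 * I * pi * k / n);
            x[k]        = tmp[k] + w * tmp[k + half];
            x[k + half] = tmp[k] - w * tmp[k + half];
        }
    }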

1.3 How efficient is the FFT?

The DFT takes N^2 operations for N points. Since at any stage the computation required to combine smaller DFTs into larger DFTs is proportional to N, and there are log2(N) stages (for radix 2), the total computation is proportional to N * log2(N). Therefore, the ratio between a DFT computation and an FFT computation for the same N is proportional to N / log2(N). In cases where N is small this ratio is not very significant, but when N becomes large, this ratio gets very large. (Every time you double N, the numerator doubles, but the denominator only increases by 1.)

1.6 Are FFTs limited to sizes that are powers of 2?

No. The most common and familiar FFTs are "radix 2". However, other radices are sometimes used, which are usually small numbers less than 10. For example, radix-4 is especially attractive because the "twiddle factors" are all 1, -1, j, or -j, which can be applied without any multiplications at all.

Also, "mixed radix" FFTs can be done on "composite" sizes. In this case, you break the size down into its factors, and do an FFT whose stages use those factors. For example, an FFT of size 1000 might be done in six stages using radices of 2 and 5, since 1000 = 2 * 2 * 2 * 5 * 5 * 5. It might also be done in three stages using radix 10, since 1000 = 10 * 10 * 10.

1.7 Can FFTs be done on prime sizes?

Yes, although these are less efficient than single-radix or mixed-radix FFTs. It is almost always possible to avoid using prime sizes.

2. FFT Terminology

2.1 What is an FFT "radix"?

The "radix" is the size of an FFT decomposition. In the example above, the radix was 2. For single-radix FFTs, the transform size must be a power of the radix. In the example above, the size was 32, which is 2 to the 5th power.

2.2 What are "twiddle factors"?

"Twiddle factors" are the coefficients used to combine results from a previous stage to form inputs to the next stage.

2.3 What is an "in place" FFT?

An "in place" FFT is simply an FFT that is calculated entirely inside its original sample memory. In other words, calculating an "in place" FFT does not require additional buffer memory (as some FFTs do.)

2.4 What is "bit reversal"?

"Bit reversal" is just what it sounds like: reversing the bits in a binary word from left to right, so the MSBs become LSBs and the LSBs become MSBs. But what does that have to do with FFTs? Well, the data ordering required by radix-2 FFTs turns out to be in "bit reversed" order, so bit-reversed indexes are used to combine FFT stages. It is possible (but slow) to calculate these bit-reversed indices in software; however, bit reversals are trivial when implemented in hardware. Therefore, almost all DSP processors include a hardware bit-reversal indexing capability (which is one of the things that distinguishes them from other microprocessors).
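As a sketch, the slow software version of bit reversal looks like this in C:

    /* Software bit-reversal of an index (the slow way; DSP chips do this
       in hardware).  'bits' is log2 of the FFT size, e.g. 5 for N = 32. */
    unsigned bit_reverse(unsigned index, unsigned bits)
    {
        unsigned reversed = 0;
        for (unsigned i = 0; i < bits; i++) {
            reversed = (reversed << 1) | (index & 1);  /* peel LSB, push onto result */
            index >>= 1;
        }
        return reversed;
    }
    /* e.g. with bits = 3: bit_reverse(1, 3) == 4, since 001 -> 100 */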

2.5 What is "decimation-in-time" versus "decimation-in-frequency"?

FFTs can be decomposed using DFTs of even and odd points, which is called a Decimation-In-Time (DIT) FFT, or they can be decomposed using a first-half/second-half approach, which is called a "Decimation-In-Frequency" (DIF) FFT. Generally, the user does not need to worry which type is being used.

3. FFT Implementation

3.1 How do I implement an FFT?

Except as a learning exercise, you generally will never have to. Many good FFT implementations are available in C, Fortran and other languages, and microprocessor manufacturers generally provide free optimized FFT implementations in their processors' assembly code. Therefore, it is not so important to understand how the FFT really works, as it is to understand how to use it.

3.2 I'm not convinced. I want to really learn how the FFT works.

(Gosh you're difficult!) Well, virtually every DSP book on the planet covers the FFT in detail. However, if you want to read something online right now, see The Scientist and Engineer's Guide to DSP.

3.3 OK, I read that stuff. Let's cut to the chase. Where do I get an FFT implementation?

1. If you want an assembly language implementation, check out the web site of the manufacturer of your chosen DSP microprocessor. They generally provide highly optimized assembly implementations in their user's guides and application manuals, and also as part of the library of their C compilers.

2. Here are a couple of the best C implementations:

MixFFT - This mixed-radix FFT implementation by Jens Joergen Nielsen is fast, flexible, and easy to use.

FFTW - The "Fastest Fourier Transform in the West".

3. There are several great FFT hotlists on the net. One of the best is jj's useful and ugly FFT page, and FFTW also has a very good FFT links page.


The following people have contributed questions, answers, or helpful suggestions: Dale Grover, Marius Vollmer

CORDIC FAQ

CORDIC is an iterative algorithm for calculating trig functions including sine, cosine, magnitude and phase. It is particularly suited to hardware implementations because it does not require any multiplies.

1. Basics

1.1 What does "CORDIC" mean?

COordinate Rotation DIgital Computer. (Doesn't help much, does it?!)

1.2 What does it do?

It calculates the trigonometric functions of sine, cosine, magnitude and phase (arctangent) to any desired precision. It can also calculate hyperbolic functions, but we don't cover that here.

1.3 How does it work?

CORDIC revolves around the idea of "rotating" the phase of a complex number, by multiplying it by a succession of constant values. However, the multiplies can all be powers of 2, so in binary arithmetic they can be done using just shifts and adds; no actual multiplier is needed.

1.4 How does it compare to other approaches?

Compared to other approaches, CORDIC is a clear winner when a hardware multiplier is unavailable, e.g. in a microcontroller, or when you want to save the gates required to implement one, e.g. in an FPGA. On the other hand, when a hardware multiplier is available, e.g. in a DSP microprocessor, table-lookup methods and good old-fashioned power series are generally faster than CORDIC.

2.0 Examples

2.1 Do you have an example in the form of an Excel spreadsheet?

Funny you should ask: I just happen to have one: cordic-xls.zip. I highly recommend you open this up and look it over a little before you ask any more questions.

2.2 Do you have source code?

Yup, it's your lucky day: cordic.zip.

3.0 Principles

3.1 What symbols will we be using?

Given a complex value: C = Ic + jQc

we will create a rotated value: C' = Ic' + jQc'

by multiplying by a rotation value: R = Ir + jQr

3.2 What are the basic principles?

1. Recall that when you multiply a pair of complex numbers, their phases (angles) add and their magnitudes multiply. Similarly, when you multiply one complex number by the conjugate of the other, the phase of the conjugated one is subtracted (though the magnitudes still multiply).

Therefore:

To add R's phase to C, calculate C' = C·R:
    Ic' = Ic·Ir - Qc·Qr
    Qc' = Qc·Ir + Ic·Qr

To subtract R's phase from C, calculate C' = C·R*:
    Ic' = Ic·Ir + Qc·Qr
    Qc' = Qc·Ir - Ic·Qr
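Written out in C, those two multiplies look like this (a from-scratch sketch; the type and function names are just for illustration):

    typedef struct { double i, q; } cplx;

    /* C' = C*R : adds R's phase to C (the magnitudes multiply) */
    cplx add_phase(cplx c, cplx r)
    {
        cplx out = { c.i * r.i - c.q * r.q,    /* Ic' = Ic*Ir - Qc*Qr */
                     c.q * r.i + c.i * r.q };  /* Qc' = Qc*Ir + Ic*Qr */
        return out;
    }

    /* C' = C*conj(R) : subtracts R's phase from C */
    cplx sub_phase(cplx c, cplx r)
    {
        cplx out = { c.i * r.i + c.q * r.q,    /* Ic' = Ic*Ir + Qc*Qr */
                     c.q * r.i - c.i * r.q };  /* Qc' = Qc*Ir - Ic*Qr */
        return out;
    }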

2. To rotate by +90 degrees, multiply by R = 0 + j1. Similarly, to rotate by -90 degrees, multiply by R = 0 - j1. If you go through the algebra above, the net effect is:

To add 90 degrees, multiply by R = 0 + j1:

Ic' = -Qc

Qc' = Ic (negate Q, then swap)

To subtract 90 degrees, multiply by R = 0 - j1:

Ic' = Qc

Qc' = -Ic (negate I, then swap)

3. To rotate by phases of less than 90 degrees, we will be multiplying by numbers of the form "R = 1 +/- jK". K will be decreasing powers of two, starting with 2^0 = 1.0. Therefore, K = 1.0, 0.5, 0.25, etc. (We use the symbol "L" to designate the power of two itself: 0, -1, -2, etc., so that K = 2^-L.)

Since the phase of a complex number "I + jQ" is atan(Q/I), the phase of "1 + jK" is atan(K). Likewise, the phase of "1 - jK" = atan(-K) = -atan(K).

To add phases we use "R = 1 + jK"; to subtract phases we use "R = 1 - jK". Since the real part of this, Ir, is equal to 1, we can simplify our table of equations to add and subtract phases for the special case of CORDIC multiplications to:

To add a phase, multiply by R = 1 + jK:
    Ic' = Ic - K·Qc = Ic - (2^-L)·Qc = Ic - (Qc >> L)
    Qc' = Qc + K·Ic = Qc + (2^-L)·Ic = Qc + (Ic >> L)

To subtract a phase, multiply by R = 1 - jK:
    Ic' = Ic + K·Qc = Ic + (2^-L)·Qc = Ic + (Qc >> L)
    Qc' = Qc - K·Ic = Qc - (2^-L)·Ic = Qc - (Ic >> L)
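In C, one CORDIC stage therefore reduces to a pair of shifts and adds. This sketch uses integer I/Q; note that right-shifting a negative value is an arithmetic shift on essentially all DSP targets, though strictly implementation-defined in C:

    /* One shift-and-add CORDIC rotation stage (integer I/Q, stage L).
       'add' selects R = 1 + jK (add phase) vs. R = 1 - jK (subtract). */
    void cordic_stage(int *ic, int *qc, int L, int add)
    {
        int i = *ic, q = *qc;
        if (add) {                 /* multiply by 1 + jK, K = 2^-L */
            *ic = i - (q >> L);    /* Ic' = Ic - (Qc >> L) */
            *qc = q + (i >> L);    /* Qc' = Qc + (Ic >> L) */
        } else {                   /* multiply by 1 - jK */
            *ic = i + (q >> L);
            *qc = q - (i >> L);
        }
    }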

4. Let's look at the phases and magnitudes of each of these multiplier values to get more of a feel for it. The table below lists values of L, starting with 0, and shows the corresponding values of K, phase, magnitude, and CORDIC Gain (described below):

L    K = 2^-L    R = 1 + jK       Phase of R in degrees = atan(K)    Magnitude of R    CORDIC Gain
0    1.0         1 + j1.0         45.00000                           1.41421356        1.414213562
1    0.5         1 + j0.5         26.56505                           1.11803399        1.581138830
2    0.25        1 + j0.25        14.03624                           1.03077641        1.629800601
3    0.125       1 + j0.125        7.12502                           1.00778222        1.642484066
4    0.0625      1 + j0.0625       3.57633                           1.00195122        1.645688916
5    0.03125     1 + j0.03125      1.78991                           1.00048816        1.646492279
6    0.015625    1 + j0.015625     0.89517                           1.00012206        1.646693254
7    0.007813    1 + j0.007813     0.44761                           1.00003052        1.646743507
...  ...         ...               ...                               ...               ...

A few observations:

Since we're using powers of two for the K values, we can just shift and add our binary numbers. That's why the CORDIC algorithm doesn't need any multiplies!

You can see that starting with a phase of 45 degrees, the phase of each successive R multiplier is a little over half of the phase of the previous R. That's the key to understanding CORDIC: we will be doing a "binary search" on phase by adding or subtracting successively smaller phases to reach some target phase.

The sum of the phases in the table up to L = 3 exceeds 92 degrees, so we can rotate a complex number by +/- 90 degrees as long as we do four or more "R = 1 +/- jK" rotations. Put that together with the ability to rotate +/-90 degrees using "R = 0 +/- j1", and you can rotate a full +/-180 degrees.

Each rotation has a magnitude greater than 1.0. That isn't desirable, but it's the price we pay for using rotations of the form 1 + jK. The "CORDIC Gain" column in the table is simply a cumulative magnitude, calculated by multiplying each row's magnitude by the cumulative gain of the rows above it. Notice that it converges to about 1.647; however, the actual CORDIC Gain depends on how many iterations we do. (It doesn't depend on whether we add or subtract phases, because the magnitudes multiply either way.)
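In formula terms, since the magnitude of 1 + jK is sqrt(1 + K^2), the gain after stages L = 0 through n-1 is the product of sqrt(1 + 2^(-2L)) over those stages; it converges to about 1.6468, and its reciprocal, about 0.6073, is the correction factor used in section 4.1 below.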


4.0 Applications

4.1 How can I use CORDIC to calculate magnitude?

You can calculate the magnitude of a complex number C = Ic + jQc if you can rotate it to have a phase of zero; then its new Qc value would be zero, so the magnitude would be given entirely by the new Ic value.

"So how do I rotate it to zero," you ask? Well, I thought you might ask:

1. You can determine whether or not the complex number "C" has a positive phase just by looking at the sign of the Qc value: positive Qc means positive phase. As the very first step, if the phase is positive, rotate it by -90 degrees; if it's negative, rotate it by +90 degrees. To rotate by +90 degrees, just negate Qc, then swap Ic and Qc; to rotate by -90 degrees, just negate Ic, then swap. The phase of C is now less than +/- 90 degrees, so the "1 +/- jK" rotations to follow can rotate it to zero.

2. Next, do a series of iterations with successively smaller values of K, starting with K = 1 (45 degrees). For each iteration, simply look at the sign of Qc to decide whether to add or subtract phase: if Qc is negative, add a phase (by multiplying by "1 + jK"); if Qc is positive, subtract a phase (by multiplying by "1 - jK"). The accuracy of the result converges with each iteration: the more iterations you do, the more accurate it becomes.

[Editorial Aside: Since each phase is a little more than half the previous phase, this algorithm is slightly underdamped. It could be made slightly more accurate, on average, for a given number of iterations, by using "ideal" K values which would add/subtract phases of 45.0, 22.5, 11.25 degrees, etc. However, then the K values wouldn't be of the form 2^-L; they'd be 1.0, 0.414, 0.199, etc., and you couldn't multiply using just shift/adds, which would eliminate the major benefit of the algorithm. In practice, the difference in accuracy between the ideal K's and these binary K's is generally negligible; therefore, for a multiplier-less CORDIC, go ahead and use the binary K's, and if you need more accuracy, just do more iterations.]

Now, having rotated our complex number to have a phase of zero, we end up with "C = Ic + j0". The magnitude of this complex value is just Ic, since Qc is zero. However, in the rotation process, C has been multiplied by a CORDIC Gain (cumulative magnitude) of about 1.647. Therefore, to get the true value of magnitude we must multiply by the reciprocal of 1.647, which is 0.607. (Remember, the exact CORDIC Gain is a function of how many iterations you do.) Unfortunately, we can't do this gain-adjustment multiply using a simple shift/add; however, in many applications this factor can be compensated for in some other part of the system. Or, when relative magnitude is all that counts (e.g. AM demodulation), it can simply be neglected.

4.2 How can I use CORDIC to calculate phase?

To calculate phase, just rotate the complex number to have zero phase, as you did to calculate magnitude. Just a couple of details are different: instead of keeping the final Ic, you accumulate the phase you remove at each step (90 degrees at the first step, then atan(K) at each iteration, with the appropriate signs); once the number has been rotated to zero phase, that accumulated total is the phase of the original number.
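Putting 4.1 and 4.2 together, here is a sketch of the rotate-to-zero loop in C. It uses floating point and the library atan() purely for clarity; a real CORDIC would use integer I/Q, the shift-and-add stage shown earlier, and a small precomputed table of atan(2^-L) angles, which is where the multiplier-free advantage comes from. The function name and iteration count are just for illustration.

    #include <math.h>

    #define CORDIC_PI     3.14159265358979323846
    #define CORDIC_ITERS  16

    /* Rotate (ic, qc) to zero phase; the final Ic (scaled by 1/gain) is
       the magnitude, and the accumulated rotation is the phase.
       Assumes the input is not on the negative real axis. */
    void cordic_mag_phase(double ic, double qc, double *mag, double *phase)
    {
        double angle = 0.0, t;

        /* Step 1: a +/-90-degree rotation brings the phase inside +/-90. */
        if (qc > 0.0)      { t = ic; ic =  qc; qc = -t; angle += CORDIC_PI / 2; }
        else if (qc < 0.0) { t = ic; ic = -qc; qc =  t; angle -= CORDIC_PI / 2; }

        /* Step 2: binary search on phase with R = 1 -/+ jK, K = 2^-L. */
        for (int L = 0; L < CORDIC_ITERS; L++) {
            double k = ldexp(1.0, -L);   /* K = 2^-L */
            double i_new, q_new;
            if (qc > 0.0) {              /* subtract phase: R = 1 - jK */
                i_new = ic + k * qc;
                q_new = qc - k * ic;
                angle += atan(k);
            } else {                     /* add phase: R = 1 + jK */
                i_new = ic - k * qc;
                q_new = qc + k * ic;
                angle -= atan(k);
            }
            ic = i_new;
            qc = q_new;
        }
        *mag   = ic * 0.60725293;  /* divide out the ~1.6468 CORDIC gain */
        *phase = angle;            /* total rotation removed = original phase */
    }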
