
FFT/IFFT

In document Interference Canceller (Page 51-55)

4. System Performance

4.6. FFT/IFFT

A 4096-point, single-channel, fully pipelined FFT/IFFT is specified through the Xilinx FFT IP Core. Most options are left at their defaults, except that convergent rounding and natural ordering of the output are selected. With natural ordering, the FFT core emits its output in directly readable, ascending bin order instead of digit-reversed order, which would require additional logic to interpret the output of the FFT correctly.
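To illustrate the difference between the two output orderings, the following is a minimal sketch that assumes a radix-2 transform, where digit reversal reduces to bit reversal. The helper names `bit_reverse` and `natural_from_bit_reversed` are hypothetical and are not part of the Xilinx core; the sketch only shows the kind of reordering logic that natural ordering avoids.

```python
def bit_reverse(index: int, num_bits: int) -> int:
    """Return index with its num_bits-wide binary representation reversed."""
    reversed_index = 0
    for _ in range(num_bits):
        reversed_index = (reversed_index << 1) | (index & 1)
        index >>= 1
    return reversed_index


def natural_from_bit_reversed(samples):
    """Reorder a block received in bit-reversed order into natural bin order."""
    n = len(samples)
    num_bits = n.bit_length() - 1  # assumes n is a power of two
    natural = [None] * n
    for position, value in enumerate(samples):
        # The value arriving at this position belongs to the bit-reversed bin.
        natural[bit_reverse(position, num_bits)] = value
    return natural


# Bin indices in the order an 8-point bit-reversed output would deliver them.
emitted_bins = [bit_reverse(k, 3) for k in range(8)]
print(emitted_bins)                             # [0, 4, 2, 6, 1, 5, 3, 7]
print(natural_from_bit_reversed(emitted_bins))  # [0, 1, 2, 3, 4, 5, 6, 7]
```

For the 4096-point case the same helper would be called with `num_bits=12`.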

Following from the ∆f found in previous sections, the appropriate bin ranges for the FFT can be calculated from 0 to NFFT/2 as:

$$\mathrm{Bin}_{\mathrm{start}} = N \cdot \Delta f$$

$$\mathrm{Bin}_{\mathrm{stop}} = (N + 1) \cdot \Delta f$$

Table 4.6-1 shows example frequency bin ranges for bins 0-11, covering DC to approximately 3.2 kHz. The full table has 2048 entries (NFFT/2) and consequently is not shown; it is, however, easily calculated in an Excel spreadsheet. The upper half of the bin range, referenced against fsample, contains a mirror image of the lower half. For the 50 kHz test tone, the peak is expected in bin 184 on the low side (N < NFFT/2) and in bin 3912 on the high side (N > NFFT/2). Because the core produces only raw FFT data, the magnitude of its output is not the actual magnitude representation of the signal. The magnitude and phase vectors are calculated in later stages by the CORDIC functions.
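The bin arithmetic above can be reproduced with a short script. This is a sketch under one stated assumption: the bin width is taken as the rounded value of 271 Hz implied by Table 4.6-1, since the exact ∆f is defined in an earlier section. The function names are hypothetical.

```python
NFFT = 4096
DELTA_F_HZ = 271  # assumed: rounded bin width implied by Table 4.6-1


def bin_range_hz(n_bin, delta_f=DELTA_F_HZ):
    """Return the (start, stop) frequency in Hz covered by bin n_bin."""
    return n_bin * delta_f, (n_bin + 1) * delta_f


def tone_bins(tone_hz, nfft=NFFT, delta_f=DELTA_F_HZ):
    """Return the (low-side, high-side) bins in which a real tone peaks."""
    low = int(tone_hz // delta_f)
    return low, nfft - low  # the high-side image mirrors the low-side bin


for n in range(12):
    start, stop = bin_range_hz(n)
    print(f"bin {n:2d}: {start:5d} Hz to {stop:5d} Hz")

print(tone_bins(50_000))  # (184, 3912)
```

Extending the loop to `range(NFFT // 2)` would produce the full 2048-entry table.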

Verification of the 50 kHz detections is shown in Figure 4.6-1 using available Chipscope data.

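Table 4.6-1: Bin Range Examples

| Bin | Start (Hz) | Stop (Hz) | Bin | Start (Hz) | Stop (Hz) |
|---|---|---|---|---|---|
| 0 | 0 | 271 | 6 | 1626 | 1897 |
| 1 | 271 | 542 | 7 | 1897 | 2168 |
| 2 | 542 | 813 | 8 | 2168 | 2439 |
| 3 | 813 | 1084 | 9 | 2439 | 2710 |
| 4 | 1084 | 1355 | 10 | 2710 | 2981 |
| 5 | 1355 | 1626 | 11 | 2981 | 3252 |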

Figure 4.6-1: FFT Output, 50 kHz

As a fully streaming configuration, the FFT and IFFT modules have the advantage that each has a throughput of one value per cycle after initial latency. The latency for each module is shown in Table 4.6-2. In this design, all the DSP chain stages are designed to handle one value per cycle, so no chokepoint exists within the chain once data starts entering from the ADC.

Consequently, the overall throughput of this system is one value per cycle. At either start-up or under reset conditions, there will be a time delay prior to the first frame of IFFT data.

Disregarding the delays of the other components, which are negligible compared to the transform latency of the FFT/IFFT, the total latency to first signal transmission is approximately 22 ms; after that, the output appears to be fully streaming. Assuming that the intended receiver has no concept of the original signal timing (no ability to detect the delay), the initial start-up delay remains the only indication.

Table 4.6-2: FFT Latency

| Module | IP Version | Transform Cycles | Latency (ms) |
|---|---|---|---|
| FFT | 7.1 | 12446 | 11.2 |
| IFFT | 7.1 | 12444 | 11.2 |
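As a cross-check of the latency figures, the cycle counts can be converted to time. This sketch assumes one sample per clock cycle and a sample clock of NFFT times a 271 Hz bin width (about 1.11 MHz); that clock value is inferred from the bin table and is not stated in this section.

```python
NFFT = 4096
DELTA_F_HZ = 271                     # assumed: bin width implied by Table 4.6-1
SAMPLE_CLOCK_HZ = NFFT * DELTA_F_HZ  # assumed: about 1.11 MHz, one sample per cycle


def latency_ms(cycles, clock_hz=SAMPLE_CLOCK_HZ):
    """Convert a latency in clock cycles to milliseconds."""
    return cycles / clock_hz * 1e3


fft_ms = latency_ms(12446)
ifft_ms = latency_ms(12444)
print(f"FFT {fft_ms:.1f} ms, IFFT {ifft_ms:.1f} ms, total {fft_ms + ifft_ms:.1f} ms")
# FFT 11.2 ms, IFFT 11.2 ms, total 22.4 ms
```

Under these assumptions the two transforms together account for the roughly 22 ms start-up delay quoted in the text.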

The resource consumption is shown in Table 4.6-3. The FFT and IFFT modules are instantiated separately in this design and consequently consume about one third of the 220 available hardware DSP slices and about ten percent of the available block memory. In designs that do not require simultaneous FFT and IFFT operation (burst data), a single instance could be used and toggled between the two modes to process data at higher clock frequencies.

Table 4.6-3: FFT Resource Consumption

Using the FFT chain testbench with a 100 MHz sampling frequency, a 10 MHz test tone, and a 64-point FFT, bins 6 and 58 are expected to contain the majority of the signal power; this is verified in Figure 4.7-1. At the output of the CORDIC it is necessary to keep track of the bin index, because propagation through the CORDIC block delays the data. According to the Xilinx implementation details, each sample takes 34 cycles to propagate through the balanced pipeline structure.
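The expected peak bins for the testbench configuration can be checked with a small floating-point model. This is a sketch only: it uses a direct DFT from the Python standard library rather than the Xilinx core, and it assumes an ideal real cosine test tone.

```python
import cmath
import math

FS_HZ = 100e6    # testbench sampling frequency
TONE_HZ = 10e6   # testbench test tone
NFFT = 64


def dft(samples):
    """Direct O(N^2) DFT; adequate for a 64-point illustration."""
    n = len(samples)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(samples))
        for k in range(n)
    ]


tone = [math.cos(2 * math.pi * TONE_HZ * i / FS_HZ) for i in range(NFFT)]
magnitudes = [abs(value) for value in dft(tone)]

# Indices of the two largest-magnitude bins, in ascending order.
peak_bins = sorted(sorted(range(NFFT), key=magnitudes.__getitem__)[-2:])
print(peak_bins)  # [6, 58]
```

The tone falls at 10/100 × 64 = 6.4 bins, so the nearest bin is 6 and its mirror image is 64 − 6 = 58.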

Figure 4.7-1: FFT Output

As shown in Figure 4.7-2, the first CORDIC stage translates the in-phase and quadrature components into magnitude and phase vectors. As expected, the test tone still appears in bins 6 and 58; the resulting magnitude vector can be calculated:

$$\mathrm{Magnitude} = \sqrt{\mathrm{InPhase}^2 + \mathrm{Quadrature}^2} = \sqrt{16453^2 + 489^2} \approx 16460$$
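The translation from in-phase/quadrature to magnitude/phase can be sketched as a CORDIC in vectoring mode. The model below is an assumption-laden illustration: it uses floating point and 16 iterations, whereas the Xilinx CORDIC core uses fixed-point arithmetic with its own word lengths and scaling. The function name `cordic_vectoring` is hypothetical.

```python
import math


def cordic_vectoring(i_val, q_val, iterations=16):
    """Rotate (i_val, q_val) onto the positive x-axis; return (magnitude, phase)."""
    x, y, angle = float(i_val), float(q_val), 0.0
    # Pre-rotate left-half-plane inputs by 180 degrees so the iterations converge.
    if x < 0:
        x, y = -x, -y
        angle = math.pi if q_val >= 0 else -math.pi
    gain = 1.0
    for k in range(iterations):
        shift = 2.0 ** -k
        direction = -1.0 if y > 0 else 1.0  # choose the rotation that drives y to zero
        x, y = x - direction * y * shift, y + direction * x * shift
        angle -= direction * math.atan(shift)
        gain *= math.sqrt(1.0 + shift * shift)  # accumulated CORDIC gain
    return x / gain, angle


magnitude, phase = cordic_vectoring(16453, 489)
print(round(magnitude))  # 16460
```

In hardware the multiplications by `shift` become bit shifts and the gain is removed by a constant scale factor, which is why the structure maps well onto a pipeline.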

Note that in the FFT chain testbench model, the CORDIC takes only 28 cycles to complete; the actual implementation requires more because of the different data sizes.

In the implementation, the data width differs slightly from the initially simulated data widths, and consequently the output is delayed by a few cycles relative to the FFT. To compensate, the indices are shifted by the appropriate difference to maintain the correct index positioning.
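The index compensation described above amounts to delaying the bin index by the same number of cycles as the data path. A minimal behavioural sketch follows, assuming a fixed latency of at least one cycle; the class name `IndexDelay` is hypothetical, and the 28-cycle value is the testbench figure from the text, not the implementation value.

```python
from collections import deque


class IndexDelay:
    """Delay a bin index by a fixed number of clock cycles (at least one)."""

    def __init__(self, delay_cycles):
        # Pre-fill with None so the first delay_cycles outputs are marked invalid.
        self._pipe = deque([None] * delay_cycles, maxlen=delay_cycles)

    def clock(self, index):
        """Push this cycle's index and return the index from delay_cycles ago."""
        delayed = self._pipe[0]
        self._pipe.append(index)  # maxlen drops the oldest entry automatically
        return delayed


CORDIC_LATENCY = 28  # testbench value quoted in the text; the implementation differs
NFFT = 64

delay = IndexDelay(CORDIC_LATENCY)
aligned = [delay.clock(cycle % NFFT) for cycle in range(100)]
print(aligned[CORDIC_LATENCY + 6])  # 6
```

With the index delayed this way, the magnitude emerging from the CORDIC on a given cycle is paired with the bin it belongs to.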

Figure 4.7-2: CORDIC Translation Output

After correction has been applied to the magnitude vector (Figure 2.1-1), the magnitude and phase vectors are rotated back to form the in-phase and quadrature complex signals. With no correction applied, the full conversion from in-phase/quadrature to magnitude/phase and back to in-phase/quadrature is shown in Figure 4.7-3. Note that, because of the way the correction is constructed, the magnitude correction blocks introduce only a two-cycle delay between indices; this is represented by the second set of three CORDIC_translate vectors. This delayed version of the CORDIC output is used in the later corrector stage for summation.
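The round trip with no correction applied can be illustrated numerically. This sketch uses the standard library's `cmath.polar` and `cmath.rect` as stand-ins for the translate and rotate CORDIC stages; the unity and 0.5 gains are hypothetical placeholders for the correction, which is defined elsewhere in the document.

```python
import cmath


def to_polar(i_val, q_val):
    """Translate in-phase/quadrature into (magnitude, phase)."""
    return cmath.polar(complex(i_val, q_val))


def to_rect(magnitude, phase):
    """Rotate (magnitude, phase) back into (in-phase, quadrature)."""
    value = cmath.rect(magnitude, phase)
    return value.real, value.imag


magnitude, phase = to_polar(16453, 489)

# Hypothetical unity gain: no correction applied, so the input is recovered.
i_out, q_out = to_rect(magnitude * 1.0, phase)
print(round(i_out), round(q_out))  # 16453 489

# Hypothetical 0.5 gain: the magnitude changes while the phase is untouched.
i_half, q_half = to_rect(magnitude * 0.5, phase)
```

Scaling only the magnitude scales both components by the same factor, which is why the correction can operate without disturbing the phase.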

Figure 4.7-3: CORDIC Rotation Output

In hardware, the magnitude vector output of the CORDIC is the signal of interest, as the phase is simply passed through. Using a 150 kHz test tone, the output of the FFT and translation CORDIC can be viewed in Figure 4.7-4; the in-phase component is shown in blue, the quadrature component in red, and the CORDIC magnitude in purple. By separating out the magnitude, further operations can be performed without affecting the phase.

Figure 4.7-4: CORDIC Output, 150 kHz
