CALIFORNIA STATE UNIVERSITY, NORTHRIDGE
Interference Canceller
A graduate project submitted in partial fulfillment of the requirements
For the degree of Master of Science in Electrical Engineering
By Peter Littlewood
Signatures
The graduate project of Peter Littlewood is approved:
__________________________ _______________ Dr. Ronald Mehler Date
__________________________ _______________ Dr. Ramin Roosta Date
__________________________ _______________ Dr. Shahnam Mirzaei, Chair Date
Acknowledgements
I would like to thank Dr. Mirzaei for being my graduate advisor throughout my time at California State University Northridge, and for his invaluable input during the course of this project. My thanks also extend to David Crawford, Cezar Delgado, Doug Pace, Vinod Saxena, Vivek Sen, and Johnny Vu for invariably putting my education ahead of their needs. Lastly, to my parents for their constant support and direction throughout my life and providing the opportunity for my success.
Table of Contents
Signatures
Acknowledgements
List of Figures
List of Tables
Abstract
1. Introduction
1.1. Background
1.2. Objective
1.3. Comparison
2. Design
2.1. Theory of Operation
2.2. Hardware
2.3. Tools
3. System Description
3.1. Clocking Resources
3.1.1. MMCM
3.1.2. Clock Divider
3.2. One Shot
3.3. Reset Circuitry
3.4. Debounce Circuitry
3.5. Chipscope
3.6. Test Tone Generation
3.6.1. Direct Digital Synthesizer (DDS)
3.6.2. Digilent PMOD-DA4
3.7. Digilent PMOD-AD1
3.8. DC Cancellation
3.9. Hilbert FIR Filter
3.10. Fast Fourier Transform (FFT)
3.11. CORDIC
3.12. K-Point Averaging
3.13. Magnitude Corrector
3.14. Inverse Fast Fourier Transform (IFFT)
4. System Performance
4.1. Clocking
4.2. Test Tone Generation
4.3. Sampling
4.4. DC Cancellation
4.5. Hilbert FIR
4.6. FFT/IFFT
4.7. CORDIC
4.8. Averaging
4.9. Correction
4.10. Corrected Output
5. Conclusion
5.1. Performance
5.2. Resource Consumption
6. Source HDL
6.1. System VHDL
6.1.1. Top.vhd
6.1.2. Clk_divider.vhd [5]
6.1.3. Oneshot.vhd
6.1.4. Rst_aasd.vhd
6.1.5. Btn_debounce.vhd [18]
6.1.6. Sine_generator.vhd
6.1.7. SPI_Interface.vhd [19]
6.1.8. dc_cancellation.vhd
6.1.9. shift_reg.vhd
6.1.10. average_IQ.vhd
6.1.11. signed_shift.vhd
6.1.12. signed_adder.vhd
6.1.13. corrector.vhd
6.1.14. corrector_state_machine.vhd
6.1.15. counter.vhd
6.1.16. ram.vhd [20]
6.1.17. utils.vhd
6.1.18. constraints.ucf
6.2. IP Cores
6.2.1. Clock Manager
6.2.2. Chipscope
6.2.3. ILA2
6.2.4. ILA3
6.2.5. ILA4
6.2.6. Hilbert FIR
6.2.7. FFT
6.2.8. CORDIC (Polar to Rectangular)
6.2.9. CORDIC (Rectangular to Polar)
6.2.10. IFFT
6.2.11. DDS F1
6.2.12. DDS F2
6.2.13. DDS F3
6.2.14. DDS F4
6.3. Testbench VHDL
6.3.1. top_tb.vhd
6.3.2. clk_divider_tb.vhd
6.3.3. oneshot_tb.vhd
6.3.4. spi_interface_tb.vhd
6.3.5. corrector_state_machine_tb.vhd
6.3.6. corrector_tb.vhd
6.3.7. dc_cancellation_tb.vhd
6.3.8. Hilbert_tb.vhd
Works Cited
List of Figures
Figure 1.1-1: Signal Cancellation by Summation
Figure 1.1-2: Frequency Error
Figure 1.1-3: Phase Error
Figure 1.1-4: 0.75π Phase Error
Figure 1.1-4: RF Summation Assembly [1]
Figure 1.3-1: RF Downconversion Block Diagram
Figure 2.1-1: System Block Diagram
Figure 3.1-1: Clock Tree Management
Figure 3.1.1-1: CMT Block Diagram
Figure 3.3-1: One Shot State Machine
Figure 3.4-1: Zedboard Pushbutton Characteristic
Figure 3.4-2: Typical Pushbutton Characteristic [4]
Figure 3.5-1: ICON Module Signals [14]
Figure 3.5-2: ICON & ILA Core Signals [15]
Figure 3.6.1-1: DDS Architecture [6]
Figure 3.6.1-2: DDS Amplitude Shift
Figure 3.6.2-1: PMOD-DA4 Schematic [7]
Figure 3.7-1: PMOD-AD1 Filter & ADC Chain [13]
Figure 3.7-2: ADC SPI Communications [12]
Figure 3.7-3: ADC Quantization [12]
Figure 3.8-1: DC Cancellation Module [9]
Figure 3.9-1: Hilbert MATLAB Simulation
Figure 3.9-2: Hilbert FIR Structure [11]
Figure 3.9-3: Phase Response
Figure 3.9-4: Group Delay
Figure 3.9-5: Magnitude Response
Figure 3.12-1: 4-Point Averaging Block Diagram
Figure 4.1-1: MMCM Lock Delay
Figure 4.1-2: 100MHz Clock
Figure 4.1-3: 40MHz Clock
Figure 4.1-4: 1.11MHz DSP Clock
Figure 4.2-1: 50 kHz Test Tone Amplitude
Figure 4.2-2: 50 kHz Test Tone Frequency
Figure 4.3-1: SPI Controller Timing
Figure 4.3-2: Chipscope ADC Results
Figure 4.4-1: DC Cancellation Simulation
Figure 4.4-2: Chipscope Results, DC Cancellation
Figure 4.4-4: FFT Spectrum, DC Cancellation
Figure 4.5-2: Hilbert FIR Output
Figure 4.6-1: FFT Output, 50 kHz
Figure 4.7-1: FFT Output
Figure 4.7-2: CORDIC Translation Output
Figure 4.7-3: CORDIC Rotation Output
Figure 4.7-3: CORDIC Output, 150kHz
Figure 4.8-1: 4-Point Average, Initial
Figure 4.8-2: 4-Point Average, Data Valid
Figure 4.8-3: 4-Point Average, Enable Low
Figure 4.8-4: Averager Output
Figure 4.8-5: Averaging Output, Spurious Signals
Figure 4.9-1: Pre-Correction FFT output
Figure 4.9-2: Magnitude Correction
Figure 4.10-1: Pre-Correction Output
Figure 4.10-2: Post-Correction Output
Figure 4.10-3: Multi Tone Signal
Figure 4.10-4: Interfering Test Sources
Figure 4.10-5: ADC Input
Figure 4.10-6: Interfering Source Cancellation
Figure 4.10-7: Recovered Signal
List of Tables
Table 2.2-1: System Hardware
Table 2.3-1: IP Core Listing
Table 3.1-1: Zedboard Oscillator Specifications
Table 3.6.2-1: AD5628 SPI Commands [8]
Table 3.6.2-2: AD5628 SPI Addresses [8]
Table 3.6.2-3: AD5628 Internal Reference Register [8]
Table 3.9-1: Hilbert Filter Coefficients
Table 4.2-1: DAC Measurement Error
Table 4.3-1: ADC Measurement Error
Table 4.4-1: Cancellation Measurement Error
Table 4.6-1: Bin Range Examples
Table 4.6-2: FFT Latency
Abstract
Interference Cancellation By
Peter Littlewood
Master of Science in Electrical Engineering
The canceller proposed is used to detect the presence of standing spurious signals present in a given bandwidth, spanning approximately 0-500kHz, through the digitization and processing of the incoming waveform. These spurs can come from a variety of sources, such as an internally generated spur from signal coupling, or external interference sources such as nearby transmission stations. As the spur power approaches and exceeds the signal of interest, the receiving system experiences difficulty distinguishing between the intended signal and the spur generated by the receiving system.
This system aims to remove standing spurious signals while minimizing distortion to the signal(s) of interest. For the purposes of this project, the goal is to remove a single continuous wave (CW) signal located at an arbitrary point within the receiver bandwidth.
1. Introduction
1.1. Background
A received set of signals consists of both the signal(s) of interest and undesired components due to noise and co-located transmission sources, which interfere with the receiver’s ability to correctly interpret the desired data. In a cancellation system, it is desirable to remove as much of these undesired components as possible while minimizing the impact on the signal of interest. For the purposes of this project, we restrict ourselves to continuous wave (CW) signals, which are fixed in frequency and amplitude; however, the frequency and amplitude of the interfering signal are not known to the receiver at synthesis time. The receiver is required to determine the location of the interferer and match its amplitude for correction in real time.
An analog implementation takes the incoming bandwidth and uses a phase-controlled length of cabling to create a delayed version that is 180° out of phase with the original signal. The amplitude of the delayed signal is manually balanced and summed with the original signal to provide cancellation at a specific frequency. This concept is shown in Figure 1.1-1 below, where the frequency and phase have no error introduced (the ideal case). The small variations in the output are on the order of 10^-15 and can be safely attributed to rounding error in MATLAB; they are discarded as an artifact of simulation.
Figure 1.1-1: Signal Cancellation by Summation
[Plot: top panel "Original Signal & Out of Phase" (original signal and 180° phase-shifted signal); bottom panel "Signal Summation" (residual on the order of 10^-15); amplitude versus time (s), 0 to 1 ms.]
This method of cancellation requires that the RF cabling be precisely cut to match the desired multiple of the wavelength, and the delayed signal is typically fed into a balancing stage to compensate for insertion loss prior to summation. This presents problems both in terms of adaptability and in the manual skill and equipment required to accurately phase-match the delays. In Figure 1.1-1 above, the cancellation of the signal was accomplished with an exact frequency and phase match; in practice, neither can be achieved exactly, because the frequency, phase, and amplitude of the signal are continuous quantities with zero probability of being tuned to the precise value. When errors arise in the tuning, effects such as beat frequencies (frequency error) and phase shifting (phase error) occur. Figure 1.1-2 demonstrates the creation of a new beat frequency, stemming from a 5% error in the frequency, described by:
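\[ \sin(2\pi f_1 t) + \sin(2\pi f_2 t) = 2\cos\left(2\pi \frac{f_1 - f_2}{2} t\right)\sin\left(2\pi \frac{f_1 + f_2}{2} t\right) \]
\[ f_{beat} = |f_1 - f_2| \]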
Figure 1.1-2: Frequency Error
[Plot: top panel "Input Signal & Delayed Signal" (input signal and signal with 5% frequency error); bottom panel "Signal Summation"; normalized amplitude versus samples [n].]
Similarly, if we fix the frequency at an ideal value and introduce phase error into the signal summation, it can be shown that the new signal created is of the same frequency but has a different phase and amplitude. The general equation is shown:
\[ \alpha\sin(2\pi f t) + \beta\sin(2\pi f t + \varphi) = \sqrt{\alpha^2 + \beta^2 + 2\alpha\beta\cos\varphi}\;\sin\left(2\pi f t + \tan^{-1}\left(\frac{\beta\sin\varphi}{\alpha + \beta\cos\varphi}\right) + \delta\right) \]
\[ \delta = \begin{cases} 0, & \alpha + \beta\cos\varphi > 0 \\ \pi, & \alpha + \beta\cos\varphi < 0 \end{cases} \]
The effect of introducing phase error, created by inaccuracy in controlling the exact propagation delay through the summation path, is shown in Figure 1.1-3. As a result of not matching the phase delay correctly, a new spurious signal has been created, in this case attenuated by approximately 9dB:
\[ \text{Attenuation (dB)} = 10\log_{10}\left(\frac{P_{\text{summed}}}{P_{\text{input}}}\right) \]
Figure 1.1-3: Phase Error
[Plot: top panel "Input Signal & Delayed Signal" (input signal and signal with 0.05π phase error); bottom panel "Signal Summation"; normalized amplitude versus samples [n].]
In the above case there remains a beneficial effect, because only a small amount of error was introduced; the correcting signal is still close to being of opposite polarity. However, as the error increases, the effectiveness of the cancellation diminishes. As opposed to the shift of 0.05π shown above, if the delay tap is incorrectly cut and the two signals are brought nearly in phase with each other, the correcting signal constructively interferes with the original spur, creating a larger signal.
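As a quick numerical check of the two phase error cases discussed above, the following Python sketch evaluates the peak amplitude of the summed signal for a given phase error. The function name `residual_amplitude` and the unit amplitudes are illustrative assumptions; this is not the MATLAB simulation used to produce the figures.

```python
import math

def residual_amplitude(phase_error, alpha=1.0, beta=1.0):
    """Peak amplitude of alpha*sin(x) + beta*sin(x + pi + phase_error).

    The correcting signal is nominally inverted (pi radians), so
    phase_error is the deviation from perfect opposite polarity.
    """
    phi = math.pi + phase_error
    power = alpha**2 + beta**2 + 2.0 * alpha * beta * math.cos(phi)
    return math.sqrt(max(power, 0.0))  # guard against tiny negative rounding

ideal = residual_amplitude(0.0)               # perfect phase match
small = residual_amplitude(0.05 * math.pi)    # close to opposite polarity
large = residual_amplitude(0.75 * math.pi)    # close to in phase

print(f"ideal: {ideal:.4f}, 0.05*pi: {small:.4f}, 0.75*pi: {large:.4f}")
```

Under these assumptions the residual stays well below the original amplitude for the 0.05π case and approaches twice the original amplitude for the 0.75π case, which matches the trend described in the text.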
Figure 1.1-4: 0.75π Phase Error
An implementation of a fully analog solution, built by Radio Sky, is shown in Figure 1.1-4. The solution comprises potentiometers and a series of phase taps to create the correcting signal, and can be switched over at divisions of the wavelength to select the best cancellation for summation. This solution is effective at a relatively low cost; however, it does suffer from impedance mismatching. While balanced coaxial cabling is used to create the delay taps, the potentiometers are not intended for this purpose (their frequency characteristics are not guaranteed), and it is very likely that the assembly will create impedance mismatches. Should the frequency of the interfering signal change, the assembly will need to be re-tuned to accommodate it. More details regarding this system and its performance can be found on the Radio Sky website [1].
Figure 1.1-4: RF Summation Assembly [1]
[Plot for the 0.75π phase error case: top panel "Input Signal & Delayed Signal" (input signal and signal with 0.75π phase error); bottom panel "Signal Summation"; normalized amplitude versus samples [n].]
1.2. Objective
The desired end product has the following minimum criteria: high data throughput, flexibility in frequency recognition, and minimal impact to the signal of interest in accordance with the abstract. Because of the potential applications of this project, it is necessary that the output should appear to be fully contiguous to the end application; either as output to a human, or as an input to the next processing component in a higher level system.
The technical approach of the proposed system is to attempt corrections within the frequency domain information (as opposed to time domain). By building a spectrum of the baseband input, multiple transform frames can be merged together to form a composite image, and played back to subsequent transform frames, thereby providing a real-time correction. If at the time of recording there are no useful information transmissions occurring, then all other power is assumed to be interference; any reduction that can be made to these signals is beneficial to the system.
1.3. Comparison
The proposed system does not require phase delay taps to be installed, as it relies on digital processing to recognize spurious signals and remove them from the output waveform; additionally, any arbitrary frequency (bounded by the bandwidth of the receiver) is considered valid. However, for frequencies above the receiver’s bandwidth, additional down-conversion circuitry will be necessary. While the SoC utilized can provide the control logic required for the down-conversion, limitations of the ADC used prevent higher direct-bandwidth applications. A basic front-end down-conversion block diagram is shown below in Figure 1.3-1. In that case, this system would be part of a receiver chain consisting of an analog front end and a digital back end, and would consequently have a higher cost than the phase tap solution but provide greater flexibility.
Figure 1.3-1: RF Downconversion Block Diagram
For applications that only require baseband, such as the audio band, the receiver could operate with no front-end conversion, as audio typically extends only to 20kHz and is generally tapered off by the filters. This significantly lowers the cost of the system, as the down-conversion chain is not required.
2. Design
2.1. Theory of Operation
A block diagram of the overall proposed system is shown in Figure 2.1-1. The steps include signal digitization, DC cancellation, filtering, Fast Fourier Transform (FFT), rectangular to polar coordinate conversion, correction, polar to rectangular reversion, Inverse Fast Fourier Transform (IFFT), and output reconstruction. By sampling the input, processing it, and separating the magnitude and phase of the frequency domain data, the magnitude spectrum reveals which signals are present and their corresponding power. Assuming that the correction is built when only interfering sources are present (no data transmission), the spectrum can be recorded as the correction factor and played back against subsequent data frames.
The correction is built and applied based on an input start command, and for purposes of demonstration is connected to a pushbutton. This could be initiated instead by a state machine, or as a command sent down from a higher level system. When applied, the synthesized logic builds an image representing an average magnitude data frame, and subsequently begins applying this magnitude frame to future data, aligned with the bin indices of future data frames. After the magnitude has been corrected, the frequency data is passed through the IFFT and recombined for output to the next processing stage. When a new set of deterministic signals is overlaid on the continuous spurs, these should be passed through, so long as the original spurs remain unmodified.
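The build-then-apply flow described above can be sketched in a few lines of Python. This is a minimal behavioral model under stated assumptions, not the synthesized VHDL: a naive DFT stands in for the FFT/IFFT IP cores, `cmath.polar` and `cmath.rect` stand in for the CORDIC conversions, the correction rule (subtract the stored magnitude and clamp at zero) is assumed, and the function names, frame size, and test tones are hypothetical.

```python
import cmath
import math

def dft(x, inverse=False):
    """Naive O(N^2) DFT; a stand-in for the FFT/IFFT IP cores."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[k] * cmath.exp(sign * 2j * math.pi * k * m / n) for k in range(n))
           for m in range(n)]
    return [v / n for v in out] if inverse else out

def build_correction(frames):
    """Average the per-bin magnitude over frames captured with only interference present."""
    spectra = [dft(frame) for frame in frames]
    return [sum(abs(s[m]) for s in spectra) / len(spectra)
            for m in range(len(frames[0]))]

def apply_correction(frame, correction):
    """Subtract the stored magnitude per bin, keep the phase, return to the time domain."""
    corrected = []
    for value, corr in zip(dft(frame), correction):
        magnitude, phase = cmath.polar(value)           # rectangular to polar
        magnitude = max(magnitude - corr, 0.0)          # assumed correction rule
        corrected.append(cmath.rect(magnitude, phase))  # polar to rectangular
    return [v.real for v in dft(corrected, inverse=True)]

# Hypothetical demonstration: a bin-centred spur plus a weaker signal of interest.
N = 64
spur = [math.sin(2 * math.pi * 5 * n / N) for n in range(N)]
signal = [0.5 * math.sin(2 * math.pi * 12 * n / N) for n in range(N)]
correction = build_correction([spur] * 4)   # recorded while no data is present
recovered = apply_correction([s + d for s, d in zip(spur, signal)], correction)
worst_error = max(abs(r - d) for r, d in zip(recovered, signal))
```

In this idealized case the spur falls exactly on a bin, so the recovered frame matches the signal of interest to within rounding error. Spurs that fall between bins leak into neighbouring bins and would not cancel as cleanly.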
Figure 2.1-1: System Block Diagram
Each block component is further discussed within Section 3 for background on theory of operation, and the individual implementation details & results are discussed in Section 4. Overall system performance is discussed in Section 5.
2.2. Hardware
A Xilinx Zynq-7000 series System on Chip (SoC) device is interfaced with external ADC and DAC components to provide a closed loop test system to develop on. The Zynq SoC used is an XC7Z020 device comprising a dual-core ARM Cortex-A9 processor (referred to as the Processing System) and an Artix-7 FPGA fabric (referred to as the Programmable Logic). Interfacing between the Processing System (PS) and Programmable Logic (PL) layers is primarily handled by the AXI4 interface, part of the ARM AMBA specification, which Xilinx has adopted as the standard interface for new generations of IP cores. This project does not contain any functions that require the PS to be used, and consequently the PS is disabled. Future additions, such as a command/control module, would be a potential use of the PS layer.
Additionally, the Zedboard contains a number of on-board peripherals; however, a number of them are only accessible via the PS subsystem. For the purposes of this project, only the external Fox oscillator (100MHz) and two PMOD add-on boards are used. The PMOD boards, produced by Digilent Inc, provide the necessary ADC and DAC; the Zedboard does not carry on-board DAC chips, and the Zynq’s XADC (Xilinx ADC) must be accessed either through a specialized AXI interface via the Processing System or through the Dynamic Reconfiguration Port (DRP) for direct logic access. It is preferable not to collect timing-sensitive data through the processor, which is poorly suited to such measurements; the timing inaccuracy will manifest as phase noise in the FFT. Additionally, it is preferred to use a standard communication protocol (SPI) with low overhead and superior portability. A listing of hardware used is shown in Table 2.2-1.
Table 2.2-1: System Hardware
Description | Manufacturer | P/N | Quantity | Price (USD)
ZedBoard Evaluation Kit | Avnet | AES-Z7EV-7Z020-G | 1 | $395.00
PMOD-AD1 | Digilent | 410-064P-KIT | 1 | $37.00
PMOD-DA4 | Digilent | 410-245P-KIT | 1 | $37.00
2.3. Tools
VHDL is selected as the hardware description language for this project; however, VHDL and Verilog are equally capable of completing the necessary tasks. Simulation and synthesis are completed under the Xilinx ISE 14.5 (P.58f) suite, targeting the Zynq XC7Z020-2CLG484 SoC. Xilinx IP Cores are considered to be exhaustively verified by Xilinx and only require functional verification, whereas HDL designed specifically for this project requires further verification. Table 2.3-1 lists the versions of the IP used in this project.
Usage of the Xilinx IP cores is a major limitation on the portability of this design, as moving to either another FPGA vendor or to an ASIC will require a change from the Xilinx IP core set to either another IP provider or a custom designed module. As this project is designed for Xilinx targets, some technology-specific techniques can be used, such as register initialization. Register initial values are honored by current Xilinx parts, so there is no simulation mismatch caused by initialization; however, it is not guaranteed that all previous Xilinx parts acted in the same manner, so care must be exercised when transitioning between families.
Table 2.3-1: IP Core Listing
IP Core Description | Version
Clocking Wizard | 3.6
Chipscope Integrated Controller (ICON) | 1.06a
Integrated Logic Analyzer (ILA) | 1.05a
DDS Compiler | 4.0
FFT | 7.1
CORDIC | 4.0
For signal verification, a LeCroy WaveAce 212 oscilloscope is used. With higher speed lines, such as the 100MHz and 40MHz clocks, a significant amount of signal distortion is introduced when measuring the clocks, due to the loading effect (complex impedance) of the probe and the coupling between traces.
3. System Description
3.1. Clocking Resources
The Zedboard is primarily fed by a pair of oscillators: a 33.3333MHz oscillator for the PS subsystem, and a 100MHz oscillator for the PL subsystem. A basic set of characteristics for both oscillators is found in Table 3.1-1; however only the oscillator for the PL subsystem will be used. The external oscillator is brought into the MMCM, and buffered 100MHz and 40MHz clocks are generated from this input so that phase relationships are maintained. For the DSP elements, a further reduced clock is implemented in VHDL, following the clock tree shown in Figure 3.1-1. Descriptions of the MMCM and clock divider circuitry can be found in Sections 3.1.1 and 3.1.2.
Table 3.1-1: Zedboard Oscillator Specifications
Description | Manufacturer | P/N | Frequency (MHz) | Phase Jitter (U.I.)
PL Oscillator | Fox | 767-100-136 | 100.0 | <0.01
PS Oscillator | Fox | 767-33.333333-12 | 33.333333 | <0.01
Figure 3.1-1: Clock Tree Management
3.1.1. MMCM
The primary clocking elements are synthesized through the Xilinx IP Core structure to properly implement the Mixed-Mode Clock Manager (MMCM) or Phase-Locked Loop (PLL) primitives. While clocks may be described within the HDL, such descriptions are not guaranteed to infer the dedicated clocking network and may instead create the clocks in the logic fabric, unless the clock buffering components are explicitly instantiated to bring the clocks into the global clocking network. It is preferable that clocks are routed through the dedicated clocking network, and the IP core ensures correct synthesis.
Each Clock Management Tile (CMT) in the current 7 series FPGA family supports one MMCM and one PLL, as shown in Figure 3.1.1-1 [3]. The MMCM is essentially a PLL with a few improvements, most notably the addition of a fractional counter in either the feedback path or the output, which gives it a frequency synthesis advantage by a factor of eight, and improved phase shift ability. Additionally, the MMCM can implement spread spectrum techniques by modulating the clock signal to spread the Power Spectral Density (PSD) over a wider bandwidth [3]. The applicable techniques (center-spread modulation, down-spread modulation) require that clock outputs three and four be used to generate the modulation scheme, so the MMCM is consequently reduced to five frequency outputs. With higher frequencies being used, and the location of the XADC inside the FPGA fabric, reducing the EMI effect becomes increasingly important.
Figure 3.1.1-1: CMT Block Diagram
3.1.2. Clock Divider
Due to the data arrival rate of the ADC, a clock divider is needed to generate the clock for the DSP elements, synchronous with the main clock; the MMCM instantiation is unable to directly generate the clock frequency required, so a logic implementation is required. The clock divider implemented is from the EDN Network [5], originally developed by Brian Boorman of Harris RF Communications. VHDL code used to implement the clock divider module can be found in Section 6.
3.2. One Shot
A one shot module is used for generating either a single pulse or level upon a trigger condition, implemented using the state machine diagram in Figure 3.3-1. The one shot module is used for configuring the Xilinx FFT/IFFT IP core, and for the AD5628 DAC module’s initial register programming. Upon reset being asserted, the state machine enters the ‘IDLE’ state until a trigger condition is asserted. The state machine enters ‘D1’ and outputs a high pulse, and one cycle later enters the ‘WAIT FOR RST’ state. During this state, the output’s state is determined by a generic of the module (see Section 6). The state machine type is left to the Xilinx synthesis tools. As long as functionality is achieved, it is not of any particular importance what type is used.
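The state sequence described above can be modelled behaviorally as follows. This is a sketch in Python rather than the project's VHDL; the class name `OneShot`, the `hold_level` parameter (standing in for the module generic), and the exact cycle on which the output changes are assumptions made for illustration.

```python
class OneShot:
    """Behavioral model of the IDLE -> D1 -> WAIT FOR RST sequence."""

    def __init__(self, hold_level=False):
        # hold_level plays the role of the generic: the output value held
        # while waiting for reset (False gives a pulse, True gives a level).
        self.hold_level = hold_level
        self.state = "IDLE"

    def clock(self, trigger, reset=False):
        """Advance one clock cycle and return the output after the edge."""
        if reset:
            self.state = "IDLE"
        elif self.state == "IDLE" and trigger:
            self.state = "D1"
        elif self.state == "D1":
            self.state = "WAIT_FOR_RST"
        return self.output()

    def output(self):
        if self.state == "D1":
            return True
        if self.state == "WAIT_FOR_RST":
            return self.hold_level
        return False

# Pulse mode: a held trigger still produces a single one-cycle pulse.
pulse_mode = OneShot(hold_level=False)
pulse_trace = [pulse_mode.clock(t) for t in (0, 1, 1, 1, 0)]

# Level mode: the output stays high until reset is asserted.
level_mode = OneShot(hold_level=True)
level_trace = [level_mode.clock(t) for t in (0, 1, 0, 0)]
```

Modelling the generic as a constructor argument keeps the two behaviours (single pulse or latched level) in one state machine, mirroring the single VHDL module described in the text.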
Figure 3.3-1: One Shot State Machine
3.3. Reset Circuitry
Under this technology, Xilinx highly recommends against the use of global asynchronous resets and prefers a structure consisting of a synchronizing stage and selective local synchronous resets. At low clock speeds, the window in which a flip flop input is unstable occupies a relatively small fraction of the clock period; as the frequency increases, that window does not shrink with the period. In effect, as the circuit is run at a higher frequency, the likelihood that any flip flop will be pushed into a metastable state increases, making asynchronous resets an increasingly poor choice. It is preferable that a synchronizing circuit is used with a cascade of flip flops, where the synchronizer significantly increases the Mean Time Between Failures (MTBF):
\[ \mathrm{MTBF}(t_r) = \frac{e^{t_r/\tau}}{T_0 \, f \, a} \]
In the above equation, t_r is the resolution time, f is the system frequency, a is the asynchronous event frequency, and T_0 and τ are parameters specific to the target technology. As an increasing number of synchronizer stages are used, the MTBF rises, but at the cost of complexity of the system. In this implementation, the counter used by the debouncing circuitry provides a series of flip flops to act as the synchronizer. While an incorrect value may be propagated, the likelihood of propagating a metastable event is decreased.
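To illustrate how strongly the MTBF depends on the resolution time, the relationship can be evaluated numerically. The sketch below is in Python and every parameter value is hypothetical, chosen only to show the trend; real values of T_0 and τ must come from characterization data for the target device. It also assumes that an additional synchronizer stage adds roughly one clock period of resolution time.

```python
import math

def mtbf_seconds(t_r, tau, t_0, f_clk, f_async):
    """MTBF = exp(t_r / tau) / (T_0 * f * a), all quantities in SI units."""
    return math.exp(t_r / tau) / (t_0 * f_clk * f_async)

# Hypothetical technology and system parameters (not from any datasheet).
TAU = 0.1e-9       # metastability time constant, seconds
T_0 = 1.0e-10      # technology-dependent constant, seconds
F_CLK = 100e6      # system clock, Hz
F_ASYNC = 10.0     # asynchronous event rate, Hz (e.g. pushbutton activity)

one_stage = mtbf_seconds(5e-9, TAU, T_0, F_CLK, F_ASYNC)
two_stage = mtbf_seconds(15e-9, TAU, T_0, F_CLK, F_ASYNC)  # one extra 10 ns period

print(f"one stage: {one_stage:.3e} s, two stages: {two_stage:.3e} s")
```

Because the resolution time appears in the exponent, each additional τ of settling time multiplies the MTBF by e, which is why even a short flip flop cascade is effective.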
3.4. Debounce Circuitry
A typical pushbutton displays erratic pulses both upon being pressed (logic low to high) and upon release (logic high to low) due to the mechanical nature of the device. Figure 3.4-1 contains a sample from a button press on the Zedboard development board. This board does not demonstrate the bouncing characteristics typical of pushbuttons; however, it is virtually new. With continued use, it is likely to display more events of the kind shown in Figure 3.4-2 as the mechanical components wear. These events are unavoidable and require either analog filtering circuitry, a fixed digital implementation (discrete digital components), or a synthesized digital implementation.
Figure 3.4-2: Typical Pushbutton Characteristic [4]
Without logic to detect when a user pushbutton input has occurred, a single pushbutton press may be incorrectly interpreted as multiple input events and cause the logic to display erratic behavior, such as the appearance of skipping states or the appearance of random jumps in control logic. This behavior is accounted for through the debounce circuitry, using a counter that begins incrementing when the asynchronous input is active. If the input is active long enough and in turn the counter reaches a threshold value, then a valid input is declared and an output pulse is generated.
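The counter-based approach described above can be sketched as a simple behavioral model. The Python below is an illustration, not the project's Btn_debounce.vhd; it assumes the counter clears whenever the input goes inactive and that exactly one output pulse is produced when the threshold is first reached. The function name and the sample sequence are hypothetical.

```python
def debounce(samples, threshold):
    """Return one output flag per input sample.

    The counter increments while the input is active and clears when it
    is inactive (assumed behavior). A single pulse is emitted on the
    sample where the counter first reaches the threshold.
    """
    count = 0
    pulses = []
    for active in samples:
        if active:
            count = min(count + 1, threshold + 1)  # saturate to avoid repeat pulses
            pulses.append(count == threshold)
        else:
            count = 0
            pulses.append(False)
    return pulses

# A bouncing press: two short glitches, then a stable press, then release.
bouncy_press = [1, 0, 1, 0, 1, 1, 1, 1, 1, 0]
result = debounce(bouncy_press, threshold=4)
```

With a threshold of four samples, the two glitches at the start of the press never reach the threshold, and the stable portion yields a single pulse.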
3.5. Chipscope
Chipscope provides a powerful interface for debugging the programmable logic in hardware, providing on-chip capability to tap into signals based on the Integrated Controller (ICON), Integrated Logic Analyzer (ILA), Virtual Input/Output (VIO), and Integrated Bus Analyzer (IBA) cores. The ICON provides on-chip management of the various companion cores, and handles communication to the Chipscope software via the JTAG connection. The ICON is capable of controlling multiple ILA/IBA/VIO cores simultaneously at varying levels of architecture, by passing the control signals through lower level modules as required. Shown in Figure 3.5-1, each CONTROL signal is connected to one of the companion cores, such as the ILA. For the purposes of this project, only the ILA and ICON cores are required.
Figure 3.5-1: ICON Module Signals [14]
The ILA core is connected to various selected signals within the VHDL modules, and when synthesized gives access to the actual operation of FPGA components. With this access, vectors can be displayed in convenient forms (binary, signed decimal, etc.), and advanced triggering options assist in locating transient errors. Additionally, information about performance under real conditions (such as spectral data) can be obtained, where simulations have limitations on how realistic the spectral density will be.
Figure 3.5-2: ICON & ILA Core Signals [15]
3.6. Test Tone Generation
3.6.1. Direct Digital Synthesizer (DDS)
The DDS, otherwise known as a Numerically Controlled Oscillator (NCO), generates a numerical sequence representing a sinusoidal waveform. As shown in Figure 3.6.1-1, the DDS is able to output both a sine and a cosine function, which is useful in generating carrier signals for in-phase and quadrature modulation/demodulation. The DDS uses the phase accumulator to generate the required phase slope, quantizes the phase, and maps it through a lookup table (LUT); the resulting output is a quantized approximation of the sinusoidal function. When only a test tone is being generated, one of the outputs is sufficient, and it is irrelevant which one is used.
One tone used during hardware testing is a 50 kHz sine wave, driven by a 50 MHz clock frequency. DDS frequencies can be calculated as follows:
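\[ f_{out} = \frac{f_{clk}\,\Delta\theta}{2^{B_{\theta(n)}}}\ \text{Hz} \]
\[ f_{clk} = 50\ \text{MHz}, \qquad B_{\theta(n)} = 27 \]
\[ \Delta\theta = \left\lfloor \frac{f_{out}\,2^{B_{\theta(n)}}}{f_{clk}} \right\rfloor = \left\lfloor \frac{50\times10^{3}\cdot 2^{27}}{50\times10^{6}} \right\rfloor = 134217 \]
\[ f_{out} = \frac{50\times10^{6}\cdot 134217}{2^{27}}\ \text{Hz} = 49{,}999.728\ \text{Hz} \approx 50\ \text{kHz} \]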
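The phase increment calculation for the 50 kHz test tone can be reproduced with a few lines of Python. This is a sketch of the arithmetic only, assuming the phase increment is truncated to an integer; the function names are hypothetical and are not part of the Xilinx DDS Compiler interface.

```python
def dds_phase_increment(f_out_hz, f_clk_hz, phase_bits):
    """Integer phase increment for the requested output frequency (truncated)."""
    return (f_out_hz * (1 << phase_bits)) // f_clk_hz

def dds_output_frequency(phase_increment, f_clk_hz, phase_bits):
    """Actual output frequency produced by an integer phase increment."""
    return f_clk_hz * phase_increment / (1 << phase_bits)

F_CLK = 50_000_000    # 50 MHz DDS clock
PHASE_BITS = 27       # phase accumulator width

increment = dds_phase_increment(50_000, F_CLK, PHASE_BITS)
actual = dds_output_frequency(increment, F_CLK, PHASE_BITS)
resolution = F_CLK / (1 << PHASE_BITS)   # smallest frequency step, Hz

print(increment, round(actual, 3), round(resolution, 4))
```

The difference between the requested and actual frequency is always smaller than the frequency resolution, which is about 0.37 Hz for these values.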
Additionally, the truncation of the address count (dropping of fractional bits) will cause spurs to appear in the waveform as a result of quantization. Following the general rule for 6dB/bit of dynamic range, the calculation of the Spurious Free Dynamic Range (SFDR) becomes:
\[ \text{Address width} = \left\lceil \frac{\mathrm{SFDR}}{6} \right\rceil \]
For an SFDR of 40dB:
\[ \text{Address width} = \left\lceil \frac{40}{6} \right\rceil = 7 \]
The remaining bits of the DAC can be used to provide a DC offset, or to shift the DDS output within the DAC range to effectively multiply or divide the AC component. The DDS provides a signed two’s complement output which, for the above example of a 7 bit vector, has a range of -63 to 63. For use with a unipolar DAC, where negative values are meaningless, the values can be shifted by 2^(address_width-1), which for a 7 bit vector is 2^6, or 64. This provides a new range of 1 to 127, suitable for use with unipolar devices. The concept is shown below in Figure 3.6.1-2.
Figure 3.6.1-2: DDS Amplitude Shift
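The width selection and the unipolar shift described in this section amount to two small calculations, sketched below in Python. The function names are hypothetical, and the 6 dB per bit figure is the rule of thumb quoted in the text rather than an exact relationship.

```python
import math

def address_width_for_sfdr(sfdr_db, db_per_bit=6.0):
    """Smallest number of bits meeting the SFDR target under the 6 dB/bit rule."""
    return math.ceil(sfdr_db / db_per_bit)

def to_unipolar(sample, width):
    """Shift a signed sample by 2^(width-1) so it suits a unipolar DAC."""
    return sample + 2 ** (width - 1)

width = address_width_for_sfdr(40)
codes = [to_unipolar(s, width) for s in (-63, 0, 63)]

print(width, codes)
```

For a 40 dB target this gives a 7 bit vector, and the signed extremes of -63 and 63 map to DAC codes 1 and 127 with mid-scale at 64.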
3.6.2. Digilent PMOD-DA4
A PMOD-DA4 board produced by Digilent is used for generating test signals, as the Zedboard does not have on-board DAC circuitry. The schematic of the PMOD-DA4 is shown in Figure 3.6.2-1. At the FPGA board side, the SYNC, DIN, and SCLK signals form the SPI bus, and the power is connected through the Zedboard’s VCC and GND. The IC used in the PMOD-DA4 board is the AD5628-1, which has a 1.25V internal reference, and internal output amplifier with a gain of 2, allowing for a 0-2.5V full scale voltage swing at the output of the DAC. It is important to note that this board does not provide an external reference voltage for the DAC, so the internal reference register must be set for functional operation.
As the external reference is not set on the PMOD-DA4 board, the internal 1.25V reference must be enabled according to Table 3.6.2-1 and Table 3.6.2-2. If this register is not set, the DAC will appear to have no output (all bits are multiplied by 0V). The internal register is set by sending the command bits ‘1000’, and data bit 0 (D0) is set high or low to enable or disable the internal reference respectively, in accordance with Table 3.6.2-3.
Table 3.6.2-1: AD5628 SPI Commands [8]
Table 3.6.2-2: AD5628 SPI Addresses [8]
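As a rough illustration of the internal reference command, the sketch below packs a 32-bit SPI frame. The frame layout used here (4 don't-care bits, 4 command bits, 4 address bits, 12 data bits, 8 trailing bits, with D0 as the least significant bit of the frame) is an assumption based on a reading of the AD5628 datasheet and should be checked against [8]; the names `pack_frame` and `CMD_INTERNAL_REF` are hypothetical.

```python
CMD_INTERNAL_REF = 0b1000  # command bits '1000' quoted in the text

def pack_frame(command, address=0, data12=0, low_bits=0):
    """Pack an assumed 32-bit frame: [31:28] don't care, [27:24] command,
    [23:20] address, [19:8] 12-bit data, [7:0] trailing bits (D0 is bit 0)."""
    return (((command & 0xF) << 24) | ((address & 0xF) << 20)
            | ((data12 & 0xFFF) << 8) | (low_bits & 0xFF))

ref_on = pack_frame(CMD_INTERNAL_REF, low_bits=0x01)   # D0 = 1: reference enabled
ref_off = pack_frame(CMD_INTERNAL_REF, low_bits=0x00)  # D0 = 0: reference disabled

# Bytes in the order they would be shifted out on DIN, MSB first.
print(ref_on.to_bytes(4, "big").hex(), ref_off.to_bytes(4, "big").hex())
```

If the assumed layout is correct, the enable frame must be sent once after power-up and before any DAC codes are written.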
3.7. Digilent PMOD-AD1
A 12-bit Successive Approximation ADC performs the digitization of the incoming signals, controlled via standard SPI communications shown in Figure 3.7-2. The Digilent PMOD-AD1 board is composed of a pair of Analog Devices AD7476A ADCs with Sallen-Key anti-aliasing filters on the input. The schematic for a single ADC chain is shown in Figure 3.7-1. A direct consequence of the layout of the PMOD-AD1 board is that since the SPI MISO and CS lines are shared between the two modules, interleaving cannot be used to increase the sampling frequency.
Figure 3.7-1: PMOD-AD1 Filter & ADC Chain [13]
The Analog Devices AD7476A used in the PMOD-AD1 transmits 12 bits of data in a 16-bit frame, containing 4 ‘don’t care’ bits to maintain compatibility with standard word length sizing. Consequently, approximately one fourth of the transmission cycle is wasted, so a higher rate clock is required to achieve the same sampling performance. The transfer sequence is shown in Figure 3.7-2, indicating a few important considerations in the implementation: the clock polarity (CPOL) and clock phase (CPHA). Referring to Figure 3.7-2, during inactive periods (no data transmission) SCLK is held high, indicating that CPOL=1. Additionally, the data is shown as being set after time t4 on the falling edge of SCLK, indicating that data should be read on the rising edge (CPHA=1).
It is worth noting that in the Analog Devices design [24], these timing constraints are not met as they have used the PS subsystem and accompanying core modules, and have not followed the design constraints. Additionally, erroneous pulses are displayed on the chip select line, and while the pulses do not affect operation, clock cycles are wasted in de-assertion and re-assertion. If using this design, care should be taken regarding the SPI core.
Figure 3.7-2: ADC SPI Communications [12]
With a 12 bit ADC, there is a theoretical dynamic range of approximately 72dB (6dB/bit), minus internal loss; referring to the AD7476 datasheet for dynamic range, Analog Devices guarantees 70dB of Signal to Noise Ratio (SNR) performance, and 69dB Signal to Noise + Distortion (SINAD) performance. The quantization effects and encoding of the ADC values are shown in Figure 3.7-3, which is useful in interpreting the measurements read from the ADC in later sections.
3.8. DC Cancellation
Due to the unipolar nature of the AD7476A ADC, there will always be a DC component for non-zero input signals; this may be the result of level shifting from previous stages, or an intended DC component of the signal. Fixed value implementations, where a generic offset is applied to all signals, are sub-optimal. Alternatively, a high-pass or band-pass filter could be used to remove the DC component (zero frequency), but these have high resource consumption during implementation. A solution shown in Xilinx WP279 [9] acts similarly to an RC filter, where the fast changes of the AC component produce small changes in the DC level Vo, and over a large number of samples will produce a value that approximates the DC component of the signal. This value is fed back through a subtractor and applied to subsequent data samples. This circuit is area efficient, and provides a dynamic cancellation capability.
The cancellation of the DC component provides two useful benefits: the power spectrum of the FFT is no longer dominated by the DC component and does not saturate as easily, and the output can be sent to a bipolar mode device. As can be noted in Figure 3.8-1, both the corrected signal (AC component) and the Vo signal (DC component) are available as outputs. Referring to the block diagram in Figure 2.1-1, if the DC component is required to be carried through to the output it can be summed back into the data path after the correction stage. The presence of the DC component at the correction stage will cause the corrector to use the DC component in the correction, and may cause undesired effects at the output.
Figure 3.8-1: DC Cancellation Module [9]
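The feedback behavior described above can be approximated with a short behavioral model. This is a minimal sketch of a leaky-integrator DC tracker, not the WP279 circuit itself: the `shift` parameter (which sets the time constant) and the test signal values are assumptions chosen for illustration.

```python
import math

def dc_cancel(samples, shift=10):
    """Behavioral model: Vo tracks the mean of the input and is subtracted from it.
    'shift' is a hypothetical parameter; larger values give slower, smoother tracking."""
    vo_acc = 0                      # accumulator, scaled by 2**shift
    ac_out, dc_out = [], []
    for x in samples:
        vo = vo_acc >> shift        # current DC estimate (Vo)
        ac = x - vo                 # corrected sample (AC component)
        vo_acc += ac                # residual error nudges the DC estimate
        ac_out.append(ac)
        dc_out.append(vo)
    return ac_out, dc_out

# Assumed test signal: 12-bit ADC codes with a DC level of 1152 and a 22-sample period.
samples = [1152 + round(370 * math.sin(2 * math.pi * n / 22)) for n in range(20000)]
ac, dc = dc_cancel(samples)
print(dc[0], dc[-1], sum(ac[-22:]) / 22)
```

At the start the DC estimate is zero and the output equals the input; after many samples the estimate settles near the mean of the input, leaving an output centered on zero.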
3.9. Hilbert FIR Filter
The ideal Hilbert Transform results in the following frequency and impulse response:
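$$H_{HT}(e^{j\omega}) = \begin{cases} -j, & 0 < \omega < \pi \\ j, & -\pi < \omega < 0 \end{cases}$$

$$h_{HT}[n] = \begin{cases} 0, & n = 0 \\ \dfrac{2}{\pi}\,\dfrac{\sin^2(\pi n/2)}{n}, & n \neq 0 \end{cases}$$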
This indicates that the output of the Hilbert transform is phase shifted by -90° for positive frequencies, and by +90° for negative frequencies. The transfer function can also be represented, and is used in the following example, as [10]:
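$$H_{HT}(e^{j\omega}) = \begin{cases} -j\,\mathrm{sgn}(\omega)\,e^{-j\omega\tau}, & \omega_L < |\omega| < \omega_H \\ 0, & \text{otherwise} \end{cases}$$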
Example: Determine the Hilbert Transform of the function:
$$x(t) = \cos(2\pi f_c t)$$
Taking the Fourier Transform, using available transform tables for a simple cosine function:
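$$X(f) = \frac{1}{2}\left[\delta(f - f_c) + \delta(f + f_c)\right]$$

$$X_{HT}(f) = -j\,\mathrm{sgn}(f)\,X(f)$$

$$X_{HT}(f) = \frac{1}{2j}\,\mathrm{sgn}(f)\left[\delta(f - f_c) + \delta(f + f_c)\right]$$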
Applying sgn(fc) to the two delta pulses, y + G lies on the negative side of the frequency spectrum, so the Signum function will be negative, whereas the y − G lies on the positive side of the frequency spectrum and will be positive. Applying the Signum function and simplifying, we can convert back to obtain the Hilbert response.
$$X_{HT}(f) = \frac{1}{2j}\left[\delta(f - f_c) - \delta(f + f_c)\right]$$

$$x_{HT}(t) = \sin(2\pi f_c t)$$
An important property of the Hilbert transform that has occurred is the creation of a sine function from a cosine (a 90 degree phase lag), which can be used to form the in-phase and quadrature vectors. This result can be verified through software simulation, using the built-in MATLAB hilbert function, and separating the resulting complex value into the real and imaginary vectors:
Figure 3.9-1: Hilbert MATLAB Simulation
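The same check can be reproduced outside MATLAB. The sketch below is not the MATLAB hilbert function; it applies the -j·sgn(f) multiplier directly in the DFT domain using only the Python standard library. The frame length of 64 samples and the 4-cycle cosine are arbitrary assumptions, and the helper names are hypothetical.

```python
import cmath
import math

def dft(x, inverse=False):
    """Direct O(N^2) DFT, adequate for a short demonstration frame."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[m] * cmath.exp(sign * 2j * math.pi * k * m / n) for m in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out

def hilbert_transform(x):
    """Multiply the spectrum by -j*sgn(f) and return to the time domain."""
    n = len(x)
    spectrum = dft(x)
    for k in range(n):
        if k == 0 or (n % 2 == 0 and k == n // 2):
            spectrum[k] = 0          # DC and Nyquist bins: sgn taken as 0
        elif k < n / 2:
            spectrum[k] *= -1j       # positive frequencies
        else:
            spectrum[k] *= 1j        # negative frequencies
    return [v.real for v in dft(spectrum, inverse=True)]

N, CYCLES = 64, 4
x = [math.cos(2 * math.pi * CYCLES * n / N) for n in range(N)]
xh = hilbert_transform(x)
err = max(abs(xh[n] - math.sin(2 * math.pi * CYCLES * n / N)) for n in range(N))
print(err)
```

The maximum deviation from the ideal sine is limited only by floating point rounding, which agrees with the hand derivation in the example.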
A quantized, causal form of the Hilbert Transform can be implemented with a Finite Impulse Response (FIR) filter, using the architecture in Figure 3.9-2 [11]. A linear phase form filter is preferred, which will provide important properties in the implementation. Using a Type III filter (antisymmetric, odd), the general difference equation can be shown below [10], and further implemented in hardware:
$$y[n] = \sum_{k=0}^{M/2-1} a[k]\left(x[n-k] - x[n-M+k]\right)$$
Figure 3.9-2: Hilbert FIR Structure [11]
[Figure 3.9-1 plot: "Hilbert Transform"; x-axis: samples [n]; y-axis: Normalized Amplitude; traces: Input Signal, Hilbert Transform]
Using the antisymmetric property of the above difference equation, only the even numbered coefficients are required, and the odd numbered coefficients are all set to 0. Due to the accumulated rounding errors present in MATLAB, the calculated odd numbered coefficients will be non-zero, but on the order of <10^-15, and zeros should be substituted. Additionally, during the implementation stage the leading and lagging zeros generated will be dropped, as they are not necessary for implementation. The filter coefficients obtained are shown below in Table 3.9-1. This results in using half as many hardware multipliers compared with a standard FIR direct form structure, implemented either by subtractors, or by negating the value and using the hardware adder chain.
Table 3.9-1: Hilbert Filter Coefficients
Index a[k] Index a[k]
1 0 12 0.633938
2 -0.09606 13 0
3 0 14 0.204298
4 -0.07387 15 0
5 0 16 0.114551
6 -0.11455 17 0
7 0 18 0.073874
8 -0.2043 19 0
9 0 20 0.096063
10 -0.63394 21 0
11 0
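To illustrate how the folded (antisymmetric) structure uses these coefficients, the following sketch runs a cosine through the filter in floating point. It is a software model under stated assumptions, not the hardware implementation: only the first half of Table 3.9-1 is used (exact antisymmetry is assumed for the second half), the filter order M = 20 is inferred from the 21 table entries, and the test frequency of 0.5π rad/sample is an arbitrary mid-band choice.

```python
import math

# Indices 1-10 of Table 3.9-1, used as a[0]..a[9]; the center tap (index 11) is 0.
A = [0, -0.09606, 0, -0.07387, 0, -0.11455, 0, -0.2043, 0, -0.63394]
M = 20  # assumed filter order (21 taps)

def hilbert_fir(x):
    """y[n] = sum_k a[k] * (x[n-k] - x[n-M+k]); samples before n = 0 are taken as 0."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, a in enumerate(A):
            if a == 0:
                continue                       # zero taps need no multiplier
            newer = x[n - k] if n - k >= 0 else 0.0
            older = x[n - M + k] if n - M + k >= 0 else 0.0
            acc += a * (newer - older)         # one multiply per coefficient pair
        y.append(acc)
    return y

def amplitude(omega):
    """Magnitude response of the folded filter at frequency omega (rad/sample)."""
    return -2.0 * sum(a * math.sin(omega * (M // 2 - k)) for k, a in enumerate(A))

omega = 0.5 * math.pi
x = [math.cos(omega * n) for n in range(200)]
y = hilbert_fir(x)
gain = amplitude(omega)
# After the start-up transient, the output is a sine delayed by M/2 samples.
err = max(abs(y[n] - gain * math.sin(omega * (n - M // 2))) for n in range(M, 200))
print(gain, err)
```

Only five multiplications are performed per output sample, and the output is a scaled sine delayed by ten samples relative to the input cosine.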
An important property of the implemented filter is the group delay, calculated as:
$$\tau_g(\omega) \triangleq -\frac{d\phi(\omega)}{d\omega}$$
When a linear (first order) phase delay is used, the group delay becomes constant, as shown above. This indicates that for all frequencies, the delay can be viewed as a time shift and will not introduce distortion or smearing within the signal. With non-linear (higher order) phase responses, the relative phase of each frequency component is delayed by a variable amount, distorting the output signal. This becomes particularly problematic when information is being carried on the envelope, such as during pulse transmission. Figure 3.9-3 and Figure 3.9-4 show the phase response and group delay of the calculated filter.
Figure 3.9-3: Phase Response
Figure 3.9-4: Group Delay
[Figure 3.9-3 plot: Phase Response; x-axis: Normalized Frequency (×π rad/sample); y-axis: Phase (radians)]
[Figure 3.9-4 plot: Group Delay; x-axis: Normalized Frequency (×π rad/sample); y-axis: Group delay (in samples)]
The magnitude response can be observed in Figure 3.9-5. This filter operates from approximately 0.05π to 0.95π, with a passband ripple of 2dB, representing an all-pass filter through most of the band.
Figure 3.9-5: Magnitude Response
3.10. Fast Fourier Transform (FFT)
The FFT is an efficient implementation of the Discrete Fourier Transform (DFT), requiring a computational complexity of approximately N log2(N), as opposed to the approximately N^2 complexity of directly computing the DFT. The FFT is used for computing the frequency spectrum in terms of quantized ‘bins’ of data related to the power spectral density. Given an arbitrary input frame of time domain data, the FFT can be calculated as:
$$X[k] = \sum_{n=0}^{N-1} x[n]\,e^{-j 2\pi k n / N}, \quad k = 0, 1, \ldots, N-1$$

$$\Delta f = \frac{1}{N\,\Delta t}$$
Where ∆t is the sampling period that the FFT is being run at. The FFT rate is determined by the arrival of data from the ADC. With a 4096 point FFT being run, the frequency resolution can be calculated:

$$\Delta f = \frac{1.11 \times 10^{6}}{4096} \cong 271\ \text{Hz}$$

[Figure 3.9-5 plot: Magnitude Response (dB); x-axis: Normalized Frequency (×π rad/sample); y-axis: Magnitude (dB)]
When processing larger bandwidths, the frequency uncertainty shown above grows as ∆f increases; any frequency that is within the bin can be determined to be present, but precise frequency information is lost. Improvements on the frequency resolution can be made at the expense of bandwidth: if the sampling rate is reduced to 100 kSPS (kilo samples/second), the uncertainty in frequency is reduced to:
$$\Delta f = \frac{100000}{4096} \cong 24\ \text{Hz}$$
Detection of the test tone can be viewed either in the lower half or upper half of the bin range, as the upper half can be viewed as a reflection of the lower half of the frame. In drawing a parallel, if a similar FFT operation were performed in MATLAB the same results would occur, however the fftshift() function is typically used to center the periodogram around the zero frequency[23]. As we are not verifying that the FFT module was built correctly, but only its implementation within this circuitry, it is adequate to use either frame side in verifying test signals.
3.11. CORDIC
The Coordinate Rotation Digital Computer (CORDIC) is useful for calculation of trigonometric functions, without requiring the usage of hardware multipliers. The CORDIC is slower than a power series approximation of the functions, but is able to accomplish the task using only simple shifts, addition, and subtraction. Given the streaming design approach, the latency of the CORDIC is only significant at startup, after which the data is streamed through the pipeline.
The CORDIC uses a coarse rotation to bring the data into the first quadrant, performs the CORDIC function, an inverse coarse rotation is applied, and the desired output scaling is applied. For this project, only the vector translation and vector rotation functions are being used. In a vector translation, the (X,Y) coordinate is rotated around the quadrant until the Y vector is zero. At this point, both the resulting X’ vector (magnitude) and θ’ (phase) information can be obtained through the following equations [16]:
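$$X' = Z_n\sqrt{X^2 + Y^2}$$

$$\theta' = \arctan\left(\frac{Y}{X}\right)$$

$$Z_n = \prod_{i=0}^{n} \frac{1}{\cos\left(\arctan\left(2^{-i}\right)\right)}$$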
A similar function is used for the vector rotation, used for converting from polar coordinates to rectangular coordinates. In the context of this design, the vector rotation is used to take the magnitude and phase information of the frequency data, and convert back into the rectangular vectors to be passed to the IFFT. The equations can be expressed as [16]:
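$$X' = Z_n\left(X\cos\theta - Y\sin\theta\right)$$

$$Y' = Z_n\left(Y\cos\theta + X\sin\theta\right)$$

$$Z_n = \prod_{i=0}^{n} \frac{1}{\cos\left(\arctan\left(2^{-i}\right)\right)}$$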
An undesired effect of the CORDIC function is the growth factor Zn (the same in both vector translation and vector rotation). The Xilinx CORDIC core can compensate both X and Y coordinates with a factor of 1/Zn, or the compensation can be handled externally to the module. If compensating externally, a generic value of Zn = 1.6467 is generally adequate to approximate the growth through the CORDIC function, as the growth will level to approximately this value within 7-8 iterations, and stabilize through the rest of the algorithm iterations. Because this function is available within the IP core, there is no need to correct the growth factor externally.
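The growth factor and the vector translation can be illustrated with a floating point model. This sketch is not the Xilinx core: it assumes the iteration index starts at 0, assumes the input already lies in the right half plane (as it would after the coarse rotation), and uses hypothetical function names and a 16-iteration default.

```python
import math

def cordic_gain(iterations):
    """Accumulated growth Z_n after the given number of iterations (index from 0)."""
    gain = 1.0
    for i in range(iterations):
        gain *= 1.0 / math.cos(math.atan(2.0 ** -i))
    return gain

def cordic_translate(x, y, iterations=16):
    """Vector translation: rotate (x, y) until y is driven to zero.
    Returns the uncompensated magnitude and the accumulated angle."""
    angle = 0.0
    for i in range(iterations):
        t = 2.0 ** -i                      # a right shift by i bits in hardware
        if y >= 0:
            x, y, angle = x + y * t, y - x * t, angle + math.atan(t)
        else:
            x, y, angle = x - y * t, y + x * t, angle - math.atan(t)
    return x, angle

raw_mag, phase = cordic_translate(3.0, 4.0)
mag = raw_mag / cordic_gain(16)            # external compensation by 1/Z_n
print(cordic_gain(8), mag, phase)
```

After eight iterations the computed gain is already within rounding of 1.6467, and the compensated magnitude and phase match the direct calculation for the (3, 4) test vector.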
3.12. K-Point Averaging
An averaging function implemented provides data smoothing over K samples of an N-point FFT frame size, intended for use in frequency domain data where the arriving data corresponds to distinct frequency bins. In contrast, a general time domain averaging function uses a consecutive series of data samples that are summed and averaged:
$$y[n] = \frac{1}{K}\sum_{i=0}^{K-1} x[n-i], \quad K = 2^{P},\ P = 0, 1, 2, \ldots$$
This leads to an efficient implementation in the time domain, as a shift register with an adder chain can easily implement the necessary hardware for this filter. In contrast, for small size FFT transform lengths, a similar strategy can be performed by using a parallel bank of shift registers and adders (one set for each frequency bin), and a multiplexor tree to route data into the correct path based on the FFT output index. However, this strategy consumes large amounts of hardware quickly for two reasons: as the transform length increases the number of parallel chains required increases linearly, and as the number of points required increases, every parallel stage exponentially increases resource consumption.
Instead, by inferring a series of RAM blocks, each RAM block can be configured to hold one full FFT transform length, and the inherent cycle delay acts as the shift register. By connecting the FFT frame index as a module input, a ring counter is driven to perform a circular shift whenever the maximum transform length is reached; for example, “001” -> “010” -> “100” -> “001” is the sequence used in Figure 3.12-1, a 4-point averaging system. The ring counter output enables each RAM in sequence, so a total of three past transform frames are held, plus the current transform frame.
To form the average, the read address for each RAM block is tied to the frame index, so that on the next cycle the data will be output from the RAM, right shifted by log2(K), and averaged with a one cycle delayed version of the incoming data from the current frame. Note that log2(K) is a fixed value at compile time; it is never evaluated within the synthesized hardware. This new average value, corresponding to a one cycle delay of the FFT transform index, is then stored in the final RAM as the new average transform at a specified index. All that remains is to offset the last RAM’s write address by one cycle, to compensate for the cycle delay used earlier. By using this form, only a single adder chain is necessary, with a width that increases linearly with the data width, and a length that increases with the number of points used.
Figure 3.12-1: 4-Point Averaging Block Diagram
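A behavioral model of the per-bin averaging may help in following the RAM arrangement. The sketch below is not cycle accurate and ignores the one cycle read and write offsets; the class name `FrameAverager` is hypothetical, and the inputs are assumed to be non-negative integer magnitudes.

```python
class FrameAverager:
    """K-point average across FFT frames, one running value per frequency bin."""

    def __init__(self, nfft, k):
        assert k >= 2 and k & (k - 1) == 0, "K must be a power of two"
        self.nfft = nfft
        self.shift = k.bit_length() - 1                     # log2(K), fixed at build time
        self.frames = [[0] * nfft for _ in range(k - 1)]    # K-1 past frames ("RAM blocks")
        self.slot = 0                                       # plays the role of the ring counter
        self.average = [0] * nfft                           # final RAM holding the average

    def push(self, index, value):
        """Accept one bin of the current frame and return the updated average."""
        total = value >> self.shift
        for frame in self.frames:
            total += frame[index] >> self.shift
        self.average[index] = total
        self.frames[self.slot][index] = value               # overwrite the oldest frame
        if index == self.nfft - 1:                          # end of frame: advance the ring
            self.slot = (self.slot + 1) % len(self.frames)
        return total

avg = FrameAverager(nfft=4, k=4)
history = []
for frame_number in range(5):
    for index in range(4):
        avg.push(index, 8)          # constant magnitude of 8 in every bin
    history.append(list(avg.average))
print(history)
```

With a constant input the average ramps up over the first K frames and then holds, which mirrors the fill behavior expected of the three past-frame RAMs plus the current frame.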
3.13. Magnitude Corrector
The magnitude correction is implemented by capturing a frame ‘snapshot’ of the averaging circuit from Section 3.12, representing an average magnitude spectrum across the last N frames of data. When a start pulse is applied to the module, an image of the current averaging block output is built and stored within the corrector module. This image is then played back out of the correction block, and subtracted from a delayed version of the streaming data to apply the correction.
The magnitude corrector consists of: a state machine, counter, accumulator, and a transform length of memory. When the start pulse is applied, the state machine resets itself, generating reset signals to both the accumulator and counter. Post initialization, the internal memory is enabled, and begins tracking the write addresses with the incoming data index. A counter is initialized, and a full frame is written to the RAM, terminating after NFFT counts. By using this method, the frame can start writing at any arbitrary point and finish without frame distortion (introduction of data from adjacent transform frames), assuming that the clocking is the same as the averaging block. The time to write the full transform frame can be shown as:
$$T_{setup} = T_s \left(N_{FFT} + 1\right)$$
Running the averaging circuit continuously makes fresh data available to the data correction circuit, so the image can be built at an arbitrary time starting from any valid point within the transform data index. Consequently, the image can be built in a minimal amount of time, which may be important if the time slot window is relatively narrow. If building the image can take a longer time, the averaging could be enabled only during the time when an image is being built:
$$T_{setup} = T_s \left(N_{FFT} + 1\right) + T_s\,N_{FFT}\left(N_{points} - 1\right) \approx N_{points}\,T_s\,N_{FFT}$$
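The image capture and the two setup time cases can be sketched as follows. This is an illustrative model only: the helper names are hypothetical, clamping the corrected magnitude at zero is an assumption (the text specifies only a subtraction), and the sample period of 0.9 µs is assumed from the ADC data rate described in Section 4.3.

```python
from itertools import islice

def averaged_stream(spectrum, start):
    """Hypothetical stand-in for the averaging block: endless (index, value) pairs."""
    index = start
    while True:
        yield index, spectrum[index]
        index = (index + 1) % len(spectrum)

def build_image(stream, nfft):
    """Write exactly one transform length, starting from whatever index arrives first."""
    image = [0] * nfft
    for index, value in islice(stream, nfft):
        image[index] = value
    return image

def apply_correction(image, index, magnitude):
    """Subtract the stored image from the streaming magnitude (clamped at zero)."""
    return max(magnitude - image[index], 0)

def setup_time(t_s, nfft, n_points=1):
    """Time to build the image; n_points > 1 models averaging enabled only on demand."""
    return t_s * (nfft + 1) + t_s * nfft * (n_points - 1)

spectrum = [10, 0, 0, 50, 0, 0, 0, 0]            # made-up averaged magnitudes
image = build_image(averaged_stream(spectrum, start=5), len(spectrum))
print(image, apply_correction(image, 3, 60))
print(setup_time(0.9e-6, 4096), setup_time(0.9e-6, 4096, n_points=4))
```

Starting the capture mid-frame still produces a complete image, because each value is written to the address given by its own index.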
3.14. Inverse Fast Fourier Transform (IFFT)
The IFFT is used for converting the frequency domain data back into the time domain, and can be generally calculated by the equation:
$$x[n] = \frac{1}{N}\sum_{k=0}^{N-1} X[k]\,e^{j 2\pi k n / N}, \quad n = 0, 1, \ldots, N-1$$
The IFFT calculation involves a factor of 1/N that differs from the FFT implementation, however in all other respects is implemented in the same way that the FFT core is implemented. As a result, when the Xilinx FFT module is put into inverse mode, the scaling must either be handled externally, or as a scaling schedule, handling the growth at the output of each stage. The width of the scaling schedule can be shown for a 4096 point IFFT [17]:
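$$\text{Width} = 2 \times \left\lceil \frac{\log_2(N_{FFT})}{2} \right\rceil\ \text{bits}$$

$$\text{Width} = 2 \times \left\lceil \frac{\log_2(4096)}{2} \right\rceil\ \text{bits} = 12\ \text{bits}$$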
Given a 4096 point IFFT, it is necessary to reduce the output of the IFFT core by a factor of 1/4096, or 2^-12. By selecting a scaling schedule of AAA₁₆, the requirement for output scaling is accomplished by scaling two bits at the output of every radix stage in the IFFT module. All timing in the IFFT is handled in the same way as the FFT module.
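The relationship between the schedule value and the total scaling can be checked with a short calculation. The sketch assumes that each stage is described by a 2-bit field giving the number of bits shifted right at that stage; the function names are hypothetical.

```python
import math

def schedule_width(nfft):
    """Width of the scaling schedule in bits: 2 * ceil(log2(NFFT) / 2)."""
    return 2 * math.ceil(math.log2(nfft) / 2)

def decode_schedule(schedule, nfft):
    """Split the schedule into 2-bit fields, one right-shift count per stage."""
    stages = schedule_width(nfft) // 2
    return [(schedule >> (2 * s)) & 0b11 for s in range(stages)]

shifts = decode_schedule(0xAAA, 4096)
total_shift = sum(shifts)
print(schedule_width(4096), shifts, total_shift, 2 ** total_shift)
```

Every field of the schedule decodes to a two bit shift, and the shifts accumulate to the 1/N factor required by the IFFT equation.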
4. System Performance
4.1. Clocking
The Zedboard development board contains an onboard 100MHz oscillator (Fox 767-100-136) [21], connected to pin Y9. This input is routed to the MMCM, and configured for a 100MHz and 40MHz output. Using these outputs, the SPI clocks can be generated for the DAC & ADC at 50MHz and 20MHz respectively, and using the clock divider circuit the 1.11MHz DSP clock can be generated (See Section 4.3 for DSP timing). Simulation provides relevant information in Figure 4.1-1, where the clocking structure has been simulated with a 100MHz input. After initialization, the clk_100mhz and clk_40mhz lines are delayed while the clock manager initializes and sets up, however the output lines may go active before the MMCM has achieved lock (unstable phase). Consequently the lock line from the clock manager is used as a clock enable to path elements, restricting data flow when the phase relationship is not stable.
Figure 4.1-1: MMCM Lock Delay
Functionality of the generated clocks can be measured in hardware by bringing the necessary signals to test connectors, verified in Figures 4.1-2, 4.1-3, and 4.1-4. Note that depending on the specific trace and the oscilloscope bandwidth, some loading does occur on the measurement. This verification is only a functional test, not testing corner conditions of the MMCM or clock divider circuitry.
Figure 4.1-2: 100MHz Clock
Figure 4.1-4: 1.11MHz DSP Clock
4.2. Test Tone Generation
The test tone is generated by outputting a sine wave from the DDS, at a frequency specified within the Xilinx IP Core; the frequency varies depending on the testing being done. The output is a signed full scale sine wave, so an offset is required to be compatible with a unipolar device. For the specified DDS width of 7 bits, a DC offset of 2^(width-1) shifts all values from the initial signed range (-63 to 63) to 1 to 127, creating a sine wave that is suitable for a unipolar device. This vector can be efficiently shifted to scale the signal as desired, and in this case is left shifted by 3 bits to expand the range by a factor of 8 (8 to 1016). The expected voltage swing can then be calculated:
$$V_{SWING} = V_{REFOUT}\left(\frac{D_{HIGH} - D_{LOW}}{2^{N}}\right)$$

$$V_{SWING} = 2.5\left(\frac{1016 - 8}{4096}\right) = 0.615\ \text{V}$$
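The swing calculation can be reproduced numerically. This is a small arithmetic sketch; it assumes an ideal 12-bit DAC transfer function with the 2.5V full scale output described in Section 3.6.2, and the helper name `dac_voltage` is hypothetical.

```python
V_REFOUT = 2.5      # full scale output voltage in volts
DAC_BITS = 12       # AD5628 resolution

def dac_voltage(code):
    """Ideal DAC output voltage for an integer code."""
    return V_REFOUT * code / 2 ** DAC_BITS

low, high = 1 << 3, 127 << 3        # unipolar range 1..127, left shifted by 3 bits
swing = dac_voltage(high) - dac_voltage(low)
print(low, high, round(swing, 3))
```

The left shift scales both ends of the range by 8, so the peak-to-peak swing depends only on the difference between the two codes.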
For the purposes of creating a test signal, it is only important that the DDS output does not exceed the limits of the dynamic range for the DAC; the DC value is not considered to contain any useful information. Choosing an arbitrary value of 1024₁₀ with the AD5628-1, the offset can be calculated by adding the arbitrary offset to the mean value of the generated sine wave:
$$V_{DC} = V_{REFOUT}\left(\frac{D_{OFFSET} + D_{DDS\_MEAN}}{2^{N}}\right)\ \text{V}$$
In Figure 4.2-1, VDC is measured to be 0.905V, and VAC is measured at 0.624V. Figure 4.2-2 measures the frequency at approximately 49.5 kHz, within the expected values for the measurement resolution. A summary of error is found in Table 4.2-1.
Figure 4.2-1: 50 kHz Test Tone Amplitude
Figure 4.2-2: 50 kHz Test Tone Frequency
Table 4.2-1: DAC Measurement Error
Measurement Frequency (kHz) DC Voltage (V) AC Voltage (V)
Expected 50 0.932 0.615
Measured 49.5 0.905 0.624
Error 1.0% 2.9% 0.81%
4.3. Sampling
The ADC is configured to run at a maximum SPI frequency of 20MHz, and consequently the module is clocked at 40MHz to provide the necessary speed. In this design, for a 20MHz SPI clock frequency, the data arrival rate is shown as:
$$f_{data} = \frac{1}{N_{CLK}\,T_{SPI}}$$
Where NCLK is the number of clock cycles per sample (measured at the SPI clock frequency), and TSPI is the SPI clock period. This frequency represents the maximum data arrival rate from the ADC to the remainder of the system, and consequently determines the operating frequency of all following elements in the DSP chain. Referring to Figure 4.3-1, simulation of the SPI controller shows that for a complete SPI cycle to occur, 18 clock cycles are necessary at the SPI clock frequency, or 36 cycles at the module clock frequency (0.9µs at 0.05µs per SPI clock period). With the 40MHz module clock generating an SPI clock period TSPI of 50ns, and NCLK of 18, fdata can be calculated:
$$f_{data} = \frac{1}{50 \times 10^{-9} \cdot 18} \cong 1.11\ \text{MHz}$$
Figure 4.3-1: SPI Controller Timing
Using Chipscope, the 12 bits of ADC data can be graphed as signed decimal, shown in Figure 4.3-2. Referring to the data obtained, the ADC has a high value of 1521₁₀, a low of 782₁₀, and a period T of 22 ADC samples. Using the equations from Section 3.7-2, the values can be converted into the corresponding voltages and frequency. The results are summarized in Table 4.3-1 with error of measurement versus expected.

$$V_{AC} = V_{DD}\left(\frac{D_{max} - D_{min}}{2^{ADC\_BITS}}\right) = 3.3\left(\frac{1521 - 782}{4096}\right) = 0.595\ \text{V}$$

$$f_{signal} = \frac{1}{T \cdot T_{SPI} \cdot N_{CLK}} = 50.51\ \text{kHz}$$
Figure 4.3-2: Chipscope ADC Results
Table 4.3-1: ADC Measurement Error
Measurement Frequency (kHz) DC Voltage (V) AC Voltage (V)
Expected 50 0.932 0.615
Measured 50.51 0.927 0.595
Error 1.0% 0.53% 3.2%
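The conversions behind Table 4.3-1 can be reproduced from the Chipscope readings. This is an arithmetic sketch under stated assumptions: an ideal 12-bit transfer function with a 3.3V supply is assumed, and estimating the DC level as the midpoint of the high and low codes is an assumption rather than the documented method.

```python
V_DD = 3.3          # ADC supply / full scale voltage in volts
ADC_BITS = 12
T_SPI = 50e-9       # SPI clock period in seconds
N_CLK = 18          # SPI clock cycles per sample

def code_to_volts(code):
    """Ideal ADC code-to-voltage conversion."""
    return V_DD * code / 2 ** ADC_BITS

adc_max, adc_min, period_samples = 1521, 782, 22    # values read from Chipscope

v_ac = code_to_volts(adc_max - adc_min)             # peak-to-peak AC voltage
v_dc = code_to_volts((adc_max + adc_min) / 2)       # midpoint estimate of the DC level
f_data = 1.0 / (T_SPI * N_CLK)                      # sample arrival rate
f_signal = f_data / period_samples                  # tone frequency from its period

print(round(v_ac, 3), round(v_dc, 3), round(f_data), round(f_signal))
```

The results agree with the measured row of the table to within rounding of the last digit.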
4.4. DC Cancellation
The DC cancellation can be visualized easily through simulation, shown in Figure 4.4-1. The periodic signal applied is comprised of an AC and DC component. At the start of the simulation all register values are zeroes and the output is equal to the input signal. As clock cycles accumulate, the DC value will rise, is fed back, and is subtracted from the input until equilibrium is reached, when the DC component reaches the mean value of the input signal. In Figure 4.4-1, due to the large number of points and the resolution limitations, the signals appear as a solid block instead of an AC signal. The settling time of the DC signal is dependent on the mean value of the input signal.
Figure 4.4-1: DC Cancellation Simulation
Using Chipscope, the signals can be verified for functionality, by tracing the data coming directly from the ADC (blue), AC (red), and DC (green) outputs of the cancellation in Figure 4.4-2. The expectation is that the DC value will be located at the mean of the original signal, and the AC component will be centered on the x-axis. This information can be verified in Figure 4.4-2, where the average DC value has settled around 1152₁₀. Using Figure 3.7-3, the DC value detected by the ADC can be shown:
$$V_{DC} = V_{DD}\left(\frac{D_{DC}}{2^{N}}\right)\ \text{V}$$

$$V_{DC} = 3.3\left(\frac{1152}{4096}\right) = 0.928\ \text{V}$$
Similarly, the AC component voltages are obtained by comparing the maximum and minimum voltages of the DC cancelled signal (green) from Figure 4.4-2, using values extracted from the Chipscope data.
$$V_{AC} = V_{DD}\left(\frac{D_{AC\,max} - D_{AC\,min}}{2^{N}}\right)\ \text{V}$$
Figure 4.4-2: Chipscope Results, DC Cancellation
For test purposes the DC cancellation can also be switched out, using a multiplexor controlled by a switch on the Zedboard. When asserted, the ADC data will be fed directly into the Hilbert FIR block, bypassing the DC cancellation. The effects of the cancellation on the FFT spectrum are shown in Figures 4.4-3 and 4.4-4. The apparent positioning of the FFT is caused by non-triggered sampling of Chipscope, where each sampling run does not guarantee that each data index collected will correspond to a fixed FFT bin range.
Figure 4.4-4: FFT Spectrum, DC Cancellation
As designed, the DC component is intended to always be removed prior to further processing; the ability to use a multiplexor to bypass the module is for demonstration purposes only. Since the DC value of this system is assumed to be only an artifact of the ADC, the DC output is not further used. If at a later point the DC value is needed, the output can be summed back with the AC component post-processing. If the DC component is considered to be very slowly changing, it may be acceptable to introduce the DC component without a delay factor; however the output signal will have some degree of distortion. To entirely preserve the DC component a shift register would be needed to delay the signal appropriately; however the length may be impractical due to the transform length of the FFT/IFFT.
Table 4.4-1: Cancellation Measurement Error
Measurement DC Voltage (V) AC Voltage (V)
Expected 0.932 0.615
Measured 0.928 0.597
Error 0.21% 2.1%
4.5. Hilbert FIR
The Hilbert structure implemented from Section 3.9 is used to provide the in-phase/quadrature signals, generated from the ADC data post DC cancellation. Referring back to the impulse response, for all positive frequencies, we expect that a 90° phase shift will have occurred. Figure 4.5-1 shows the Hilbert FIR simulation, which is being clocked at 100MHz with a 4.5MHz input (.045 digital frequency). Shown in Figure 4.5-2 are the Chipscope results, showing the in-phase and quadrature outputs of the FIR. Note that the in-phase output (blue) is the same as the input signal, which has been put through a shift register to match up to the cycle latency of the quadrature output.
Figure 4.5-2: Hilbert FIR Output
The in-phase output signal has a peak-to-peak value of 742₁₀, while the quadrature is approximately 348₁₀, representing just over a 3dB loss in power. Comparing with the expected results, there is about 1dB more loss in the implementation compared to the MATLAB generated filter parameters, attributable to the quantization error accumulated during construction of the filter. With this simple test signal, there is no observable smearing; however, this does not verify that smearing cannot occur, as a more complex test setup than was available is required.
4.6. FFT/IFFT
A 4096 point, single channel, fully pipelined FFT/IFFT is specified through the Xilinx FFT IP Core. Most options are left as default, except for using convergent rounding, and natural ordering of the output. Natural ordering has the output of the FFT core come out in a normal direct readable form, instead of digit-reversed order that would require further implementation logic to correctly interpret the output of the FFT.
Following from the ∆f found in previous sections, the appropriate bin ranges for the FFT can be calculated from 0 to NFFT/2 as:
$$bin_{start} = Bin \cdot \Delta f$$

$$bin_{stop} = (Bin + 1) \cdot \Delta f$$
Table 4.6-1 shows example frequency bin ranges for segments 0-11, covering DC to approximately 3.2 kHz. The full table is 2048 entries (NFFT/2), and consequently is not shown; however, it is easily calculated in an Excel spreadsheet. An identical image will occur in the upper half of the bin ranges referenced against fsample, representing a mirror image of the lower bin ranges. For the 50 kHz test tone, the signal is expected to peak in bin 184 on the low side (N < NFFT/2), and bin 3912 on the high side (N > NFFT/2). As this is only the FFT data being produced, the magnitude of the FFT core output is not the actual magnitude representation of the signal. The magnitude and phase vectors are calculated in later stages by the CORDIC functions. Verification of the 50 kHz detections is shown in Figure 4.6-1 using available Chipscope data.
Table 4.6-1: Bin Range Examples
Bin Start (Hz) Stop (Hz) Bin Start (Hz) Stop (Hz)
0 0 271 6 1626 1897
1 271 542 7 1897 2168
2 542 813 8 2168 2439
3 813 1084 9 2439 2710
4 1084 1355 10 2710 2981
5 1355 1626 11 2981 3252
Figure 4.6-1: FFT Output, 50 kHz
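The bin ranges and the expected test tone bins can also be generated programmatically rather than in a spreadsheet. The sketch below assumes the data rate derived in Section 4.3 (18 SPI clocks of 50ns per sample); the function names are hypothetical. The unrounded ∆f is used, so the values differ from Table 4.6-1 by a few hertz at the higher bins.

```python
T_SPI = 50e-9                  # SPI clock period in seconds
N_CLK = 18                     # SPI clock cycles per sample
NFFT = 4096

f_data = 1.0 / (T_SPI * N_CLK)
delta_f = f_data / NFFT        # frequency resolution in Hz

def bin_range(bin_index):
    """Start and stop frequency of one FFT bin."""
    return bin_index * delta_f, (bin_index + 1) * delta_f

def tone_bins(frequency):
    """Low-side bin holding the tone, and its mirror on the high side."""
    low = int(frequency // delta_f)
    return low, NFFT - low

print(round(delta_f, 1), bin_range(0), bin_range(11))
print(tone_bins(50e3))
```

For the 50 kHz test tone this reproduces the low-side and high-side bin numbers quoted in the text.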
As a fully streaming configuration, the FFT and IFFT modules have the advantage that each has a throughput of one value per cycle after initial latency. The latency for each module is shown in Table 4.6-2. In this design, all the DSP chain stages are designed to handle one value per cycle, so no chokepoint exists within the chain once data starts entering from the ADC. Consequently, the overall throughput of this system is one value per cycle. At either start-up or under reset conditions, there will be a time delay prior to the first frame of IFFT data. Disregarding the delays of other components, as they are negligible compared to the transform length of the FFT/IFFT, a total latency time of approximately 22ms to first signal transmission occurs, and after that it will appear to be fully streaming. Assuming that the intended receiver has no concept of the original signal timing (no ability to detect the delay), the initial start-up delay remains the only indication.
Table 4.6-2: FFT Latency
Module IP Version Transform Cycles Latency (ms)
FFT 7.1 12446 11.2
IFFT 7.1 12444 11.2
The resource consumption is shown in Table 4.6-3. The FFT and IFFT modules are instantiated separately in this design and consequently will consume about one third of the 220 available hardware DSP slices, and about ten percent of the available block memory. In designs that do not require simultaneous FFT and IFFT operations (burst data) a single instance could be used, and toggled between the two modes to process data at higher clock frequencies.
Table 4.6-3: FFT Resource Consumption
Module IP Version XtremeDSP Slices 18k Block RAMs
FFT 7.1 45 24
IFFT 7.1 30 30
4.7. CORDIC
Using the FFT chain testbench, with a 100MHz sampling frequency, 10MHz test tone, and a 64 point FFT, the expectation is that bin 6 and 58 will contain the majority of the signal power; this information is verified in Figure 4.7-1. Using this information, at the output of the CORDIC it is necessary to keep track of the index, as the CORDIC will cause a shift delay to occur, due to the propagation through the CORDIC block. Referring to the Xilinx implementation details, each sample will take 34 cycles to propagate, using a balanced pipeline structure.
Figure 4.7-1: FFT Output
Using Figure 4.7-2, the first CORDIC stage is used to translate the in-phase and quadrature components into magnitude and phase vectors. As expected, the test tone still appears in bins 6 and 58; the resulting magnitude vector can be calculated:
$$\text{Magnitude} = \sqrt{\text{In-Phase}^2 + \text{Quadrature}^2} = \sqrt{16453^2 + 489^2} = 16460$$
Note that in the FFT chain testbench model, the CORDIC takes only 28 cycles to complete; the actual implementation requires more, because its data widths differ from those initially simulated. Consequently the output will occur a few cycles delayed compared with the FFT. As compensation, the indexes are shifted by the appropriate difference to maintain the correct index positioning.
Figure 4.7-2: CORDIC Translation Output
After correction has been applied to the magnitude vector (Figure 2.1-1), the magnitude and phase vectors are rotated back to form the in-phase and quadrature complex signals. With no correction being applied, the full conversion from in-phase/quadrature to magnitude/phase and back to in-phase/quadrature is shown in Figure 4.7-3. Note that through the magnitude correction blocks there is only a two cycle delay between indices due to the construction of the correction, which is represented by the second set of three CORDIC_translate vectors. This delayed version of the CORDIC output is used in the later corrector stage for summation.