Fixed-Point Performance - Educational Signal Processing Platform

We also used the hardware timer to measure the performance of basic fixed-point arithmetic operations. Since the students would be implementing the filters in plain C code for most of the laboratories (except for Laboratory #4, the assembly code optimization lab), we performed these benchmarks using C code (not assembly). The results of these simple tests are summarized in Table 5. It is not surprising that the long integer (32-bit) operations take longer than the regular 16-bit operations, because the dsPIC33F is a 16-bit microprocessor and is hence designed for 16-bit math. Also notice that several of these operations, namely 16-bit additions and 16-bit multiplications, take four cycles to execute. However, from the dsPIC33F documentation, we know that these 16-bit operations should require only a single cycle. The discrepancy arises because our measurements also include the clock cycles required to move the operands in and out of the working registers (we can confirm this by examining the disassembly listing window in MPLAB).

Table 5: Benchmarks for basic fixed-point operations

Fixed-Point Operation      Clock Cycles
16-bit integer addition    4
16-bit integer multiply    4
32-bit integer addition    8
32-bit integer multiply    14

Following the same analysis used in the floating-point performance section, we see that if there are 5000 clock cycles available for real-time processing at an 8 kHz sampling rate, we can perform over 700 multiply-and-accumulate (MAC) operations. This is significantly better than the roughly 16 MAC operations that we could achieve with floating-point arithmetic.

However, there is still room for improvement. We know from Microchip’s documentation that the dsPIC33F should be able to perform a MAC operation in a single clock cycle. The C code in our basic test above did not perform the MAC operations this efficiently, since it required substantial overhead to move data in and out of the working registers. We could write hand-optimized assembly code to use the dsPIC’s addressing modes and data paths much more efficiently; this is the focus of Laboratory Assignment #4 (see Appendix B). Alternatively, if we want to avoid the gory details of hand-coded assembly, we can get significant performance increases by turning on the optimizing C compiler.

Although the free student version of the MPLAB C30 compiler does not perform all of the DSP-specific instruction optimizations, it still provides tremendous performance improvements over unoptimized compiled C code. For example, without optimization, the 11-coefficient FIR filter from Laboratory #3 required 618 clock cycles to execute. With full optimization enabled, the same filter required only 122 cycles to execute. Thus, the optimizing C compiler improved performance by approximately a factor of 5.
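To make the discussion concrete, the following is a minimal sketch of the kind of fixed-point FIR routine being benchmarked. It is not the Laboratory #3 source code: the function name fir_q15, the Q15 coefficient format, and the example coefficient values (an 11-tap moving average) are all assumptions chosen for illustration. The inner loop is the multiply-and-accumulate that an optimizing compiler, or hand-written assembly, can speed up.

```c
#include <assert.h>
#include <stdint.h>

#define EXAMPLE_NUM_TAPS 11

/* Hypothetical Q15 coefficients: an 11-tap moving average.
 * 32768 / 11 is about 2978.9, rounded down so the DC gain stays below 1. */
static const int16_t example_coeffs[EXAMPLE_NUM_TAPS] = {
    2978, 2978, 2978, 2978, 2978, 2978, 2978, 2978, 2978, 2978, 2978
};

/* Process one input sample through a direct-form FIR filter.
 * coeffs and delay both hold num_taps entries; delay must be zeroed
 * by the caller before the first call. No saturation is applied, so
 * the coefficients must be scaled to keep the result within 16 bits. */
int16_t fir_q15(const int16_t *coeffs, int16_t *delay,
                int num_taps, int16_t sample)
{
    int32_t acc = 0;  /* 32-bit accumulator for the Q15 x Q15 products */
    int i;

    assert(num_taps > 0);

    /* Shift the delay line by one position and insert the new sample. */
    for (i = num_taps - 1; i > 0; i--) {
        delay[i] = delay[i - 1];
    }
    delay[0] = sample;

    /* Multiply-and-accumulate over all taps. */
    for (i = 0; i < num_taps; i++) {
        acc += (int32_t)coeffs[i] * (int32_t)delay[i];
    }

    /* Scale the Q30 sum back to Q15. Right-shifting a negative value is
     * implementation-defined in C; common compilers shift arithmetically. */
    return (int16_t)(acc >> 15);
}
```

The routine would be called once per sample, for instance from the interrupt service routine that receives each new input value. The shift-based delay line is chosen for readability; a circular buffer would avoid the copy loop.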

Microchip also provides an excellent set of free DSP libraries with its MPLAB C30 compiler. These libraries consist of C-callable assembly functions for various types of filters, transforms, and matrix arithmetic. They are efficient, highly optimized, and easy to use. For example, the example source code for Laboratory #6 (see Appendix C) uses the FIR filter from these libraries, and as a result, all of the vocoder processing (three 100-coefficient filters and two modulations) requires fewer than 580 clock cycles to complete.

Microchip’s website [11] provides a table which summarizes the capabilities of the DSP libraries and the required numbers of clock cycles. Table 6 summarizes a few of the more relevant DSP functions. The term “block” indicates that each type of filter can be used to calculate filter results for a range of ‘N’ time values. In the real-time systems of the ECE 4703 laboratories, we generally compute the output value for only one instant in time each time that the DCI interrupt service routine gets called. Therefore, we set N=1, and the equation for the number of clocks required for the FIR filtering becomes 57+M. We benchmarked this FIR filter code and obtained results that were within a few clock cycles of the predicted value (57+M). Thus, Microchip’s DSP library FIR filter code is extremely efficient, scaling linearly with the number of coefficients (M) and requiring only a modest amount of initialization overhead (57 clock cycles).

We can use the 57+M equation to estimate the maximum number of FIR coefficients that can be used for real-time operation. Assuming a 40 MHz instruction clock, we can achieve up to about 2400 coefficients at a 16 kHz sampling rate or 800 coefficients at 48 kHz.
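The arithmetic behind these estimates can be sketched in a few lines of C. The function names below are hypothetical helpers, not part of the Microchip DSP library, and the calculation assumes that every instruction cycle in a sample period is available to the filter, so the results are upper bounds.

```c
#include <assert.h>
#include <stdint.h>

/* Cycle count of the library block FIR as tabulated by Microchip:
 * 53 + N*(4+M), where N = samples per call and M = number of taps. */
uint32_t block_fir_cycles(uint32_t n_samples, uint32_t m_taps)
{
    return 53u + n_samples * (4u + m_taps);
}

/* Instruction cycles available between consecutive samples. */
uint32_t cycles_per_sample(uint32_t fcy_hz, uint32_t fs_hz)
{
    assert(fs_hz > 0u);
    return fcy_hz / fs_hz;
}

/* Largest tap count M for which 57 + M fits in one sample period
 * (N = 1). Returns 0 if even the fixed overhead does not fit. */
uint32_t max_fir_taps(uint32_t fcy_hz, uint32_t fs_hz)
{
    uint32_t budget = cycles_per_sample(fcy_hz, fs_hz);
    uint32_t overhead = block_fir_cycles(1u, 0u);  /* 57 cycles */

    return (budget > overhead) ? (budget - overhead) : 0u;
}
```

With a 40 MHz instruction clock, max_fir_taps(40000000, 16000) gives 2443 and max_fir_taps(40000000, 48000) gives 776, consistent with the rounded figures of about 2400 and 800 coefficients. In practice the usable number is lower, since interrupt entry and exit and any other per-sample processing also consume cycles.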

Table 6: MPLAB C30 DSP Library Performance (from Microchip’s website [11])

DSP Function         Clock Cycles Required
Block FIR            53 + N*(4+M)
Block IIR Canonic    36 + N*(8+7*S)
Block IIR Lattice    46 + N*(16+7*M)
Complex FFT          Test case: N=64 requires 3739 cycles

Key: N = # of samples, M = # of taps, S = # of sections

In conclusion, there are several ways to write efficient fixed-point code for the dsPIC33F. Our fixed-point performance analysis has shown that the dsPIC33F excels in fixed-point arithmetic, and it is more than powerful enough for the purposes of the ECE 4703 curriculum. The real-time vocoder algorithm of Laboratory #6 successfully illustrates that the dsPIC33F is computationally powerful enough to accommodate interesting and educational laboratory assignments.

5 Laboratory Redesign

The goal of this project was to develop a platform that could duplicate the pedagogical functionality of the expensive TI DSK (that is, teach students about digital signal processing) in a cheaper, easily constructed package. Success hinged on re-examining the current ECE 4703 curriculum. Through testing the dsPIC hardware, it was determined that many of the laboratory assignments would have to be changed in order to preserve their educational value while taking into account the performance limitations of the dsPIC. The most significant limitation is that the dsPIC has no dedicated hardware for performing floating-point operations.

A summary of the major differences between the original DSK labs and the new dsPIC labs is given in Table 7. Appendix B contains the new laboratory procedure documents. The next several paragraphs explain the major changes in greater detail.

5.1 Lab 1: Development Environment

Lab 1 was mostly focused on familiarizing oneself with the DSK and Code Composer Studio (CCS). It involved using switches to make LEDs turn on and understanding the basic functionality of the board. The revised Lab 1 is mostly the same: it provides a functional tutorial of the MPLAB development environment, introduces students to the dsPIC hardware, and has them run simple C code to make an LED turn on and off using a switch. In addition, it now has an emphasis on the UART protocol, which the dsPIC supports. The assignment involves the students setting up and testing the UART interface, which becomes an invaluable debugging tool that they may use in all of the subsequent labs. This is a major deviation from the TI-based assignment, since CCS automatically diverted printf() commands to the console in the development environment. MPLAB has no such functionality, so the UART interface is used in its place. This is educationally valuable since UART is a very common, standardized communication protocol, and familiarity with it will give students another tool to use in their future classes, projects, or careers. The original Lab 1 also explored Code Composer Studio’s capabilities for graphing data from microprocessor memory. Since MPLAB does not have built-in graphing tools, the revised Lab 1 suggests that the students extract the data using either the debugger (Watch Window) or the UART, and then plot it using MATLAB. This practice using MATLAB will be a helpful refresher for the students in preparation for subsequent labs.

Table 7: Comparison between original labs and new labs

Lab       Original Version           New Version
Lab #1    Intro to CCS IDE           Intro to MPLAB IDE
          LED and DIPs               LEDs and DIPs
          Hello, World! (console)    UART
          CCS Grapher                Plot data with MATLAB
Lab #2    Floating-point FIR         Floating-point FIR
          Floating-point IIR         Floating-point IIR
Lab #3    Fixed-point FIR            Fixed-point FIR
          Fixed-point IIR            Fixed-point IIR
Lab #4    Assembly optimization      Assembly optimization
          Floating-point             Fixed-point
                                     Benchmarking techniques
Lab #5    Floating-point FFT         Fixed-point FFT
Lab #6    Adaptive filter            Vocoder
