
3.3 Prototype Testing and Debugging

3.3.6 The Importance of Good Documentation

In summary, most of the problems we had with getting the hardware working were related to poor documentation and to intermittent hardware failures. The AIC23 datasheet (see [7]) contains a great deal of information, but it leaves gaping holes in certain areas. We hope that our Detailed Guide document (see Appendix D) will help to explain some of what this datasheet leaves out. Intermittent hardware failures are extremely difficult to detect, but diligence and methodical testing minimize their occurrence.

At this point, most of our hardware was built and functional, so our efforts became primarily software-related. We spent some time gathering all of our working code into a set of software libraries, as discussed earlier. Once this software library infrastructure was in place, we began the process of rewriting the ECE 4703 laboratories and code to run on our new hardware platform. We also began the process of benchmarking and measuring performance of our system, which is the topic of the next section.

4 Performance Evaluation

The dsPIC33F must have sufficient computational abilities in order to be feasible for adoption into the ECE 4703 curriculum. If its DSP performance is too limited, the dsPIC33F platform will be impractical for use in an introductory real-time DSP course. For example, a certain amount of floating-point capability is required because floating-point arithmetic is significantly easier for students to program than fixed-point. Students start with simple floating-point systems and gradually add layers of complexity, working up towards fixed-point implementations. If the students were forced to start with fixed-point math because the hardware could not accommodate floating-point, they would have a tremendous amount of difficulty, since they would be unable to build up complexity gradually over time.

The floating-point laboratories are intended to prepare the students for the transition to fixed-point. The hardware’s overall performance must be sufficient to provide a suitable basis for practical, thought-provoking, and educational assignments.

After building the hardware prototype, we performed a considerable amount of computational performance analysis to ensure that our dsPIC33F platform would be practical for laboratory assignments. Figure 2 summarizes the numbers of FIR filter coefficients possible for different datatypes as a function of the sampling frequency. (Note that the number of coefficients scales linearly with the sampling period and is therefore inversely proportional to the sampling frequency, since Ts = 1/fs.) The following section explains the results of our performance evaluation and research in further detail.

4.1 Floating-Point Performance

Although the dsPIC33F lacks a hardware floating-point unit, the MPLAB C30 C compiler comes with a set of software floating-point libraries. The drawback of doing floating-point processing in software is that it is much slower (by at least two orders of magnitude) than using fixed-point arithmetic. However, even though the software floating-point performance is poor, we found that the floating-point performance should still be sufficient for the first filtering lab. Once the students get their filters working in floating-point, they will quickly be able to switch to fast fixed-point processing for the subsequent labs.

Figure 2: Summary of dsPIC33F Computational Performance

We performed a few simple tests to measure the performance of the floating-point operations. We set up one of the dsPIC33F’s hardware timer peripherals and used it like a stopwatch to measure the amount of time various floating-point operations took to execute.

For example, we found that a single-precision addition required 120 cycles and that a single-precision multiplication required 117 cycles. More details about the process of configuring the dsPIC33F’s timer and using it for benchmarking can be found in the Laboratory #4 Procedure document provided in Appendix B.
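The stopwatch idea can be sketched in portable C as follows. This is only an illustration under stated assumptions, not the procedure from Appendix B: it assumes a free-running 16-bit up-counter clocked once per instruction cycle, and the names `elapsed_ticks` and `net_cycles` are hypothetical helpers. Reading the actual timer register on the target is deliberately left out.

```c
#include <stdint.h>

/* Elapsed ticks between two readings of a free-running 16-bit up-counter.
 * The subtraction is reduced modulo 2^16 by the cast, so the result is
 * still correct if the counter rolled over once between the readings. */
uint16_t elapsed_ticks(uint16_t start, uint16_t stop)
{
    return (uint16_t)(stop - start);
}

/* Remove the fixed cost of the measurement itself (for example, the
 * ticks recorded when nothing is placed between the two readings).
 * Clamps at zero instead of wrapping around. */
uint16_t net_cycles(uint16_t measured, uint16_t overhead)
{
    return (measured > overhead) ? (uint16_t)(measured - overhead) : 0;
}
```

On the target, one would read the timer count immediately before and after the operation under test and pass the two readings to `elapsed_ticks`. Measuring an empty interval first gives the overhead value for `net_cycles`.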

It turns out that Microchip’s website [20] already provides a summary of the performance of their software floating-point libraries. A few of the more interesting numbers from the website are summarized in Table 3. The benchmarks provided by Microchip are quite close (within a few clock cycles) to the numbers we measured ourselves; the slight discrepancy is undoubtedly due to the fact that our measurements also include the clock cycles required to move the 32-bit operands in and out of working registers.

Table 3: Floating-Point benchmarks (from Microchip’s website [20])

| Floating-Point Operation | Clock Cycles |
|--------------------------|--------------|
| addition                 | 122          |
| subtraction              | 124          |
| multiplication           | 109          |
| division                 | 361          |
| remainder                | 385          |
| cosine                   | 3249         |
| sine                     | 2238         |
| exp                      | 530          |
| log                      | 2889         |
| sqrt                     | 493          |

The audio samples sent back and forth between the AIC23 audio codec and the dsPIC are generally 16-bit signed integers. If we want to perform our signal processing in floating-point, we first need to cast the received integer samples as floating-point values. After our processing is done, we need to cast the floating-point values back into integers. We were curious how many clock cycles these casting operations required, so we benchmarked them.

The results are summarized in Table 4. Notice that even the casting operations require a significant number of clock cycles.
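As a minimal sketch of the two casts described above, the following C fragment converts a 16-bit sample to `float`, applies a gain, and converts the result back. The helper names and the gain step are hypothetical. The saturation on the way back is an added safeguard that the text does not describe; it keeps an out-of-range result from overflowing the integer cast.

```c
#include <stdint.h>

/* Cast a 16-bit signed codec sample to single-precision floating-point. */
float sample_to_float(int16_t sample)
{
    return (float)sample;
}

/* Cast a float back to a 16-bit signed sample. Values outside the int16
 * range are saturated (an assumption of this sketch); in-range values
 * are truncated toward zero, as the plain C cast does. */
int16_t float_to_sample(float value)
{
    if (value >= 32767.0f) {
        return INT16_MAX;
    }
    if (value <= -32768.0f) {
        return INT16_MIN;
    }
    return (int16_t)value;
}

/* Example of floating-point processing between the two casts:
 * scale one sample by a gain factor. */
int16_t apply_gain(int16_t sample, float gain)
{
    return float_to_sample(sample_to_float(sample) * gain);
}
```

Each call to `apply_gain` performs one cast in each direction, so both cast costs are paid once per sample regardless of how much processing happens in between.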

Table 4: Benchmarks for casting between floating-point and integer datatypes

| Type of cast                  | Clock Cycles |
|-------------------------------|--------------|
| float → 16-bit signed int     | 138          |
| float → 16-bit unsigned int   | 136          |
| 16-bit signed int → float     | 185          |
| 16-bit unsigned int → float   | 193          |

When using a 40 MHz instruction clock rate and an 8 kHz audio sampling rate, there are (40E+6)/(8E+3) = 5000 clock cycles available to process each sample in real time. We measured that a single floating-point multiply-and-accumulate (MAC) operation requires about 272 cycles. Dividing 5000 by 272, we see that we can perform only about 18 MAC operations per sample and maintain real-time operation. If we take into account the cycles required to cast to and from floating-point, this number becomes about 16 MAC operations. This roughly translates to an FIR filter that is approximately 16 coefficients in length, or a DFII-SOS IIR filter that is about 8 coefficients in length.
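The cycle-budget arithmetic above can be written as a small C sketch. The function names are hypothetical and the inputs are the figures quoted in the text (40 MHz instruction clock, 8 kHz sampling rate, 272 cycles per MAC, and the signed-integer cast costs from Table 4). Subtracting only the two cast costs leaves room for 17 MACs; the figure of about 16 quoted above presumably also allows for per-sample overhead that is not itemized here.

```c
#include <stdint.h>

/* Instruction cycles available per audio sample: f_cy / f_s. */
uint32_t cycles_per_sample(uint32_t instr_clock_hz, uint32_t sample_rate_hz)
{
    return instr_clock_hz / sample_rate_hz;
}

/* Whole MAC operations that fit in the per-sample budget after
 * reserving a fixed per-sample overhead (for example, the two casts). */
uint32_t max_macs(uint32_t budget, uint32_t overhead, uint32_t cycles_per_mac)
{
    if (cycles_per_mac == 0 || overhead >= budget) {
        return 0;
    }
    return (budget - overhead) / cycles_per_mac;
}
```

With these helpers, `cycles_per_sample(40000000, 8000)` gives the 5000-cycle budget, `max_macs(5000, 0, 272)` gives 18, and `max_macs(5000, 185 + 138, 272)` gives 17. Because the budget is inversely proportional to the sampling rate, doubling the rate halves the number of MACs available.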

To summarize, the software floating-point performance is abysmal. Even at the low 8 kHz sampling rate, the maximum possible filter order is quite small. Moreover, this performance cannot be improved using the optimizing C compiler because the floating-point software libraries are already optimized; floating-point emulation simply takes many clock cycles to perform. However, even though the performance is not great, it should still be sufficient for the purposes of the ECE 4703 laboratories. The students will be able to learn how to successfully implement their filters using all of the basic techniques, such as circular buffering and efficient convolution. After mastering the basics, the students will be equipped to move on to fixed-point arithmetic. The floating-point exercises are purely academic; it is with fixed-point processing that much higher filter orders become possible, allowing for more exciting signal processing applications.
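As an illustration of the circular-buffering technique mentioned above, here is a minimal floating-point FIR sketch in C. It is not the laboratory code: the filter length `NUM_TAPS`, the type `fir_state_t`, and the function `fir_step` are hypothetical names chosen for this example. Each new sample overwrites the oldest one in place, so no samples are shifted in memory, and the convolution costs exactly one MAC per coefficient.

```c
#include <stddef.h>

#define NUM_TAPS 4  /* hypothetical filter length for this sketch */

typedef struct {
    float  history[NUM_TAPS]; /* the NUM_TAPS most recent input samples */
    size_t newest;            /* index of the most recent sample */
} fir_state_t;

/* Process one input sample and return one output sample:
 * y[n] = sum over k of coeffs[k] * x[n - k]. */
float fir_step(fir_state_t *state, const float coeffs[NUM_TAPS], float input)
{
    size_t idx;
    size_t k;
    float acc = 0.0f;

    /* Advance the write index with wraparound; the slot it lands on
     * holds the oldest sample, which is overwritten by the new one. */
    state->newest = (state->newest + 1 == NUM_TAPS) ? 0 : state->newest + 1;
    state->history[state->newest] = input;

    /* Walk backwards in time from the newest sample. */
    idx = state->newest;
    for (k = 0; k < NUM_TAPS; k++) {
        acc += coeffs[k] * state->history[idx]; /* one floating-point MAC */
        idx = (idx == 0) ? NUM_TAPS - 1 : idx - 1;
    }
    return acc;
}
```

Starting from a zero-initialized state, feeding a unit impulse followed by zeros returns the coefficients in order, which is a quick way to check the indexing. The wraparound uses a comparison rather than the `%` operator to avoid a division on every sample.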
