Overflow and Solutions - DSP Fundamentals and Implementation


3.6 Overflow and Solutions

Assuming that the input signals and filter coefficients have been properly normalized (in the range of −1 and 1) for fixed-point arithmetic, the addition of two B-bit numbers will always produce a B-bit sum. Therefore no roundoff error is introduced by addition. Unfortunately, when these two B-bit numbers are added, the sum may fall outside the range of −1 and 1. Overflow is the condition in which the result of an arithmetic operation exceeds the capacity of the register used to hold that result.

For example, assume that 3-bit fixed-point hardware with fractional 2's complement data format (see Table 3.4) is used. If x₁ = 0.75 (011 in binary form) and x₂ = 0.25 (001), the binary sum x₁ + x₂ is 100. The decimal value of the binary number 100 is −1, not the correct answer +1. That is, when the result exceeds the full-scale range of the register, overflow occurs and an unacceptable error is produced. Similarly, subtraction may result in underflow.
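The wraparound in this example can be reproduced with a few lines of Python. This is only a sketch of the arithmetic, not of any particular processor; the helper names `to_fraction` and `wrap_add` are our own.

```python
def to_fraction(bits: int, width: int = 3) -> float:
    """Interpret a width-bit pattern as a fractional 2's complement number."""
    bits &= (1 << width) - 1
    if bits & (1 << (width - 1)):        # sign bit set, so the value is negative
        bits -= 1 << width
    return bits / (1 << (width - 1))     # one sign bit, width-1 fractional bits


def wrap_add(a: int, b: int, width: int = 3) -> int:
    """Add two bit patterns and discard the carry, as a plain adder would."""
    return (a + b) & ((1 << width) - 1)


x1, x2 = 0b011, 0b001                    # 0.75 and 0.25
s = wrap_add(x1, x2)
print(format(s, "03b"), to_fraction(s))  # 100 -1.0
```

The sum 0.75 + 0.25 needs the value +1, which a 3-bit fractional word cannot hold, so the bit pattern 100 is read back as −1.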

When using a fixed-point processor, the range of numbers must be carefully examined and adjusted in order to avoid overflow. For the FIR filter defined in (3.1.16), overflow results in severe distortion of the output y(n). For the IIR filter defined in (3.2.18), the effect is much more serious because the errors are fed back and render the filter useless. The problem of overflow may be eliminated by using saturation arithmetic and by properly scaling (or constraining) the signals at each node within the filter to limit their magnitude.

3.6.1 Saturation Arithmetic

Most commercially available DSP devices (such as the TMS320C55x) have mechanisms that protect against overflow and indicate if it occurs. Saturation arithmetic prevents a result from overflowing by keeping the result at a maximum (or minimum for an underflow) value.

Figure 3.14 Transfer characteristic of saturation arithmetic (output y versus input x; the output is limited to the range from −1 to 1 − 2^−M)

Saturation logic is illustrated in Figure 3.14 and can be expressed as

\[
y = \begin{cases}
1 - 2^{-M}, & x \ge 1 - 2^{-M} \\
x, & -1 \le x < 1 \\
-1, & x < -1,
\end{cases}
\tag{3.6.1}
\]

where x is the original addition result and y is the saturated adder output. If the adder is in saturation mode, the undesired overflow can be avoided since the 32-bit accumulator fills to its maximum (or minimum) value, but does not roll over. Returning to the previous example, when 3-bit hardware with saturation arithmetic is used, the addition result of x₁ + x₂ is 011, or 0.75 in decimal value. Compared with the correct answer 1, there is an error of 0.25. However, the result is much better than the one obtained without saturation arithmetic.
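The saturation rule and the 3-bit example can be checked with the following sketch. The function name `saturate` is our own, and M = 2 is assumed because a 3-bit fractional word has one sign bit and two fractional bits.

```python
def saturate(x: float, M: int) -> float:
    """Limit an adder result x to the range [-1, 1 - 2**-M]."""
    upper = 1.0 - 2.0 ** -M
    if x >= upper:
        return upper
    if x < -1.0:
        return -1.0
    return x


M = 2                                   # assumed: 3-bit word, 2 fractional bits
print(saturate(0.75 + 0.25, M))         # 0.75
print(saturate(-0.75 - 0.5, M))         # -1.0
print(saturate(0.25 + 0.25, M))         # 0.5
```

Only results outside the representable range are changed; in-range sums pass through unaltered.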

Saturation arithmetic has a similar effect to 'clipping' the desired waveform. This is a nonlinear operation that will add undesired nonlinear components into the signal. Therefore saturation arithmetic can only be used to guarantee that overflow will not occur. It is not the best, nor the only solution, for solving overflow problems.

3.6.2 Overflow Handling

As mentioned earlier, the TMS320C55x supports the data saturation logic in the data computation unit (DU) to prevent data computation from overflowing. The logic is enabled when the overflow mode bit (SATD) in status register ST1 is set (SATD = 1).

When the mode is set, the accumulators are loaded with either the largest positive 32-bit value (0x00 7FFF FFFF) or the smallest negative 32-bit value (0xFF 8000 0000) if the result overflows. The overflow mode bit can be set with the instruction

bset SATD

and reset (disabled) with the instruction

bclr SATD

The TMS320C55x provides overflow flags that indicate whether or not an arithmetic operation has exceeded the capacity of the corresponding register. The overflow flag ACOVx (x = 0, 1, 2, or 3) is set to 1 when an overflow occurs in the corresponding accumulator ACx. The flag remains set until a reset is performed or a status bit clear instruction is executed. If a conditional instruction that tests overflow status (such as a branch, a return, a call, or a conditional execution) is executed, the overflow flag is also cleared. The overflow flags can therefore be both tested and cleared by instructions.
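The interaction between the saturation mode and a sticky overflow flag can be illustrated with a small behavioral model. This is a toy sketch, not a description of the device: the class `Accumulator` and its methods are hypothetical, and the model assumes that overflow is detected at the 32-bit boundary of a 40-bit accumulator.

```python
MAX_32 = 0x007FFFFFFF      # largest positive 32-bit value (0x00 7FFF FFFF)
MIN_32 = -0x80000000       # stored as 0xFF 8000 0000 in 40-bit 2's complement
MASK_40 = (1 << 40) - 1


class Accumulator:
    """Toy accumulator with a saturation mode and a sticky overflow flag."""

    def __init__(self, satd: bool = False):
        self.value = 0      # signed integer kept within 40 bits
        self.satd = satd    # models the saturation mode bit
        self.acov = False   # models the overflow flag

    def add(self, operand: int) -> None:
        result = self.value + operand
        if result > MAX_32 or result < MIN_32:
            self.acov = True                     # stays set until cleared
            if self.satd:
                result = MAX_32 if result > 0 else MIN_32
        result &= MASK_40                        # wrap into 40 bits
        if result & (1 << 39):
            result -= 1 << 40
        self.value = result

    def test_and_clear_overflow(self) -> bool:
        """Return the flag and clear it, like a conditional test of overflow."""
        flag, self.acov = self.acov, False
        return flag

    def bits(self) -> str:
        return format(self.value & MASK_40, "010X")


acc = Accumulator(satd=True)
acc.add(0x7FFFFFFF)
acc.add(1)
print(acc.bits(), acc.test_and_clear_overflow(), acc.acov)  # 007FFFFFFF True False
```

With `satd=False` the same two additions leave 0x00 8000 0000 in the model, that is, the result spills into the guard bits while the flag is still raised.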

3.6.3 Scaling of Signals

The most effective technique for preventing overflow is to scale down the magnitudes of signals at certain nodes in the DSP system and then scale the result back up to the original level. For example, consider the simple FIR filter illustrated in Figure 3.15(a).

Let x(n) = 0.8 and x(n − 1) = 0.6; the filter output is then y(n) = 1.2. When this filter is implemented on fixed-point DSP hardware without saturation arithmetic, undesired overflow occurs and we get a negative number as a result.

As illustrated in Figure 3.15(b), the scaling factor β < 1 can be used to scale down the input signal and prevent overflow. For example, when β = 0.5 is used, we have x(n) = 0.4 and x(n − 1) = 0.3, and the result is y(n) = 0.6. This effectively prevents the hardware overflow. Note that β = 0.5 can be implemented by right shifting by 1 bit.
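The numbers in this example can be reproduced in 16-bit Q15 arithmetic. The sketch below makes two assumptions that are not stated in the text: the filter coefficients are taken as 0.9 and 0.8 (chosen because they give the quoted outputs 1.2 and 0.6), and the output is reduced to 16 bits by plain wraparound. The helper names are our own, and the scaling factor is applied as a 1-bit right shift.

```python
Q15 = 1 << 15


def to_q15(v: float) -> int:
    """Quantize a value in [-1, 1) to a Q15 integer."""
    return int(round(v * Q15))


def wrap16(v: int) -> int:
    """Keep the low 16 bits and read them as a signed value (no saturation)."""
    v &= 0xFFFF
    return v - 0x10000 if v & 0x8000 else v


def fir2(x0: int, x1: int, b0: int, b1: int) -> int:
    """Two-tap FIR in Q15: sum the Q30 products, then reduce to 16 bits."""
    acc = b0 * x0 + b1 * x1
    return wrap16(acc >> 15)


b0, b1 = to_q15(0.9), to_q15(0.8)      # assumed coefficients
x0, x1 = to_q15(0.8), to_q15(0.6)      # x(n) and x(n-1)

y_unscaled = fir2(x0, x1, b0, b1) / Q15
y_scaled = fir2(x0 >> 1, x1 >> 1, b0, b1) / Q15   # scaling by 0.5 as a shift

print(y_unscaled)   # negative: the true value 1.2 has wrapped around
print(y_scaled)     # close to 0.6
```

The scaled output must be multiplied by 2 (after leaving the 16-bit data path, or in a wider word) to recover the original signal level.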

If the signal x(n) is scaled by β, the corresponding signal variance changes to β²σ_x². Thus the signal-to-quantization-noise ratio given in (3.5.8) changes to

\[
\mathrm{SNR} = 10 \log_{10}\!\left( \frac{\beta^2 \sigma_x^2}{\sigma_e^2} \right)
= 4.77 + 6.02B + 10 \log_{10} \sigma_x^2 + 20 \log_{10} \beta.
\tag{3.6.2}
\]

Since we perform fractional arithmetic, β < 1 is used to scale down the input signal. The term 20 log₁₀ β has a negative value. Thus scaling down the amplitude of the signal reduces the SNR. For example, when β = 0.5, 20 log₁₀ β = −6.02 dB, thus reducing the SNR of the input signal by about 6 dB. This is equivalent to losing 1 bit in representing the signal.
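The SNR penalty for other scaling factors follows directly from the 20 log₁₀ term. A short sketch (the function names are our own, and `beta` is the code's name for the scaling factor):

```python
import math


def snr_change_db(beta: float) -> float:
    """Change in SNR (dB) caused by scaling the input by beta."""
    return 20.0 * math.log10(beta)


def bits_lost(beta: float) -> float:
    """Equivalent loss of word length, using 6.02 dB per bit."""
    return -snr_change_db(beta) / 6.02


for beta in (0.5, 0.25, 0.125):
    print(beta, round(snr_change_db(beta), 2), round(bits_lost(beta), 2))
```

Each halving of the scaling factor costs about 6 dB, or one bit of resolution, which is why the scaling factor should not be made smaller than necessary.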

Figure 3.15 Block diagram of simple FIR filters: (a) without scaling, and (b) with scaling factor β


Therefore we have to keep signals as large as possible without overflow. In the FIR filter given in Figure 3.6, a scaling factor β can be applied to the input signal x(n) to prevent overflow during the computation of y(n) defined in (3.1.16). The value of signal y(n) can be bounded as

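\[
|y(n)| = \left| \beta \sum_{l=0}^{L-1} b_l \, x(n-l) \right| \le \beta M_x \sum_{l=0}^{L-1} |b_l|,
\tag{3.6.3}
\]

where β is the scaling factor applied to x(n), b_l are the filter coefficients, and M_x is the peak value of x(n) defined in (3.2.3). Therefore we can ensure that |y(n)| < 1 by choosing

\[
\beta < \frac{1}{M_x \sum_{l=0}^{L-1} |b_l|}.
\tag{3.6.4}
\]

The sum of the absolute values of the coefficients can be calculated using the MATLAB statement

bsum = sum(abs(b));

where b is the coefficient vector.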
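The same calculation can be sketched in Python, together with the choice of a power-of-two scaling factor that can be implemented as a right shift. The coefficient values are hypothetical, the function names are our own, and the peak input value is assumed to be 1.

```python
def l1_scale_bound(coeffs, peak=1.0):
    """Upper limit on the scaling factor: 1 / (peak * sum of |b_l|)."""
    return 1.0 / (peak * sum(abs(c) for c in coeffs))


def shift_for(bound):
    """Smallest right shift k such that 2**-k is strictly below the bound."""
    k = 0
    while 2.0 ** -k >= bound:
        k += 1
    return k


coeffs = [0.9, 0.8]                 # hypothetical coefficient vector
bound = l1_scale_bound(coeffs)      # 1 / 1.7, about 0.588
k = shift_for(bound)
print(bound, k, 2.0 ** -k)
```

For these coefficients the bound is about 0.588, so a 1-bit right shift (a factor of 0.5) is sufficient.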

Scaling the input by the scaling factor given in (3.6.4) guarantees that overflow never occurs in the FIR filter. However, the constraint on β is overly conservative for most signals of practical interest. We can use a more relaxed condition

\[
\beta < \frac{1}{M_x \sqrt{\sum_{l=0}^{L-1} b_l^2}}.
\tag{3.6.5}
\]

Other scaling factors that may be used are based on the frequency response of the filter (discussed in Chapter 4). Assuming that the reference signal is narrowband, overflow can be avoided for all sinusoidal signals if the input is scaled down by the maximum magnitude response of the filter. This scaling factor is perhaps the easiest to use, especially for IIR filters. It involves calculating the magnitude response and then selecting its maximum value.
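A minimal sketch of this procedure evaluates the magnitude response on a uniform frequency grid and takes its maximum. The function name `max_magnitude`, the grid size, and the example coefficients are assumptions; a coarse grid may underestimate a sharp resonance, and the sketch does not handle poles on the unit circle.

```python
import cmath
import math


def max_magnitude(b, a=(1.0,), points=1024):
    """Largest |H(e^{jw})| on a uniform grid over [0, pi], H(z) = B(z)/A(z)."""
    peak = 0.0
    for k in range(points + 1):
        w = math.pi * k / points
        z_inv = cmath.exp(-1j * w)
        num = sum(c * z_inv ** i for i, c in enumerate(b))
        den = sum(c * z_inv ** i for i, c in enumerate(a))
        peak = max(peak, abs(num / den))
    return peak


fir_peak = max_magnitude([0.9, 0.8])            # hypothetical FIR filter
iir_peak = max_magnitude([1.0], [1.0, -0.5])    # hypothetical first-order IIR
print(fir_peak, 1.0 / fir_peak)
print(iir_peak, 1.0 / iir_peak)
```

The reciprocal of the peak gain is the largest scaling factor that keeps a full-scale sinusoid from overflowing at the filter output.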

An IIR filter designed by a filter design package such as MATLAB may have some of its filter coefficients greater than 1.0. To implement a filter with coefficients larger than 1.0, we can also scale the filter coefficients instead of changing the incoming signal. One common solution is to use a different Q format instead of the Q15 format to represent the filter coefficients. After the filtering operation is completed, the filter output needs to be scaled back to the original signal level. This issue will be discussed in the C55x experiment given in Section 3.8.5.

The TMS320C55x provides four 40-bit accumulators as introduced in Chapter 2.

Each accumulator is split into three parts as illustrated in Figure 3.16. The guard bits are used as a head margin for computations. These guard bits prevent overflow in iterative computations such as FIR filtering with L = 256 as defined in (3.1.16).

Bits:     b39−b32       b31−b16            b15−b0
Field:    G             H                  L
Content:  Guard bits    High-order bits    Low-order bits

Figure 3.16 TMS320C55x accumulator configuration
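The head margin provided by the eight guard bits can be checked numerically. The sketch below assumes that every term added to the accumulator is a signed 32-bit value of the largest positive magnitude; the helper names are our own.

```python
MAX_32 = (1 << 31) - 1          # largest positive signed 32-bit value


def fits(value: int, bits: int) -> bool:
    """True if value is representable in bits-bit 2's complement."""
    return -(1 << (bits - 1)) <= value <= (1 << (bits - 1)) - 1


def accumulate(terms: int, term_value: int = MAX_32) -> int:
    """Sum of `terms` identical worst-case values, computed exactly."""
    return terms * term_value


print(fits(accumulate(2), 32))     # False: two terms already exceed 32 bits
print(fits(accumulate(256), 40))   # True: 8 guard bits absorb 256 terms
print(fits(accumulate(257), 40))   # False: one more term can overflow
```

Eight guard bits therefore allow 2⁸ = 256 worst-case accumulations, which matches the L = 256 FIR filter mentioned in the text.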

Because of the potential for overflow in a fixed-point processor, engineers need to be concerned with the dynamic range of numbers. This requirement usually demands a greater coding effort and testing with real data for a given application. In general, the optimum solution for the overflow problem is to combine scaling factors, guard bits, and saturation arithmetic. The scaling factors are set as large as possible (close to but smaller than 1), and the occasional overflow that remains is handled by the guard bits and saturation arithmetic.

In document pdf DSP - Real Time Digital Signal Processing (Page 116-120)