Quantization Errors - DSP Fundamentals and Implementation


3.5 Quantization Errors

As discussed in Section 3.4, digital signals and system parameters are represented by a finite number of bits. There is a noticeable error between desired and actual results, known as finite-precision (finite-wordlength, or numerical) effects. In general, finite-precision effects can be broadly categorized into the following classes:

1. Quantization errors
   a. Input quantization
   b. Coefficient quantization
2. Arithmetic errors
   a. Roundoff (truncation) noise
   b. Overflow

The limit cycle oscillation is another phenomenon that may occur when implementing a feedback system such as an IIR filter with finite-precision arithmetic. The output of the system may continue to oscillate indefinitely while the input remains 0. This can happen because of quantization errors or overflow.

This section briefly analyzes finite-precision effects in DSP systems using fixed-point arithmetic, and presents methods for confining these effects to acceptable levels.

3.5.1 Input Quantization Noise

The ADC shown in Figure 1.2 converts a given analog signal x(t) into digital form x(n).

The input signal is first sampled to obtain the discrete-time signal x(nT). Each x(nT) value is then encoded using a B-bit wordlength to obtain the digital signal x(n), which consists of M magnitude bits and one sign bit (B = M + 1) as shown in Figure 3.11. As discussed in Section 3.4, we assume that the signal x(n) is scaled such that $-1 \le x(n) < 1$. Thus the full-scale range of fractional numbers is 2. Since the quantizer employs B bits, the number of quantization levels available for representing x(nT) is $2^B$. Thus the spacing between two successive quantization levels is

$$\Delta = \frac{\text{full-scale range}}{\text{number of quantization levels}} = \frac{2}{2^B} = 2^{-B+1} = 2^{-M}, \tag{3.5.1}$$

which is called the quantization step (interval, width, or resolution).

Common methods of quantization are rounding and truncation. With rounding, the signal value is approximated using the nearest quantization level. When truncation is used, the signal value is assigned to the highest quantization level that is not greater than the signal itself. Since truncation produces a bias effect (see exercise problem), we use rounding for quantization in this book. The input value x(nT) is rounded to the nearest level as illustrated in Figure 3.12. We assume there is a line midway between two quantization levels. A signal value above this line is assigned to the higher quantization level, while a signal value below this line is assigned to the lower level. For example, the discrete-time signal x(T) is rounded to 010, since the real value is below the middle line between 010 and 011, while x(2T) is rounded to 011 since the value is above the middle line.

Figure 3.12 Quantization process related to ADC (x(t) versus time t at 0, T, and 2T against quantization levels 000 to 011, with ∆, ∆/2, and e(n) marked)

The quantization error (noise), e(n), is the difference between the discrete-time signal, x(nT), and the quantized digital signal, x(n). The error due to quantization can be expressed as

$$e(n) = x(n) - x(nT). \tag{3.5.2}$$

Figure 3.12 clearly shows that

$$|e(n)| \le \frac{\Delta}{2}. \tag{3.5.3}$$

Thus the quantization noise generated by an ADC depends on the quantization interval. More bits give a smaller quantization step and therefore less quantization noise.

From (3.5.2), we can view the ADC output as being the sum of the quantizer input x(nT) and the error component e(n). That is,

$$x(n) = Q[x(nT)] = x(nT) + e(n), \tag{3.5.4}$$

where Q[·] denotes the quantization operation. The nonlinear operation of the quantizer is modeled as a linear process that introduces an additive noise e(n) to the discrete-time signal x(nT) as illustrated in Figure 3.13. Note that this model is not accurate for low-amplitude slowly varying signals.

For an arbitrary signal with fine quantization (B is large), the quantization error e(n) may be assumed to be uncorrelated with the digital signal x(n), and can be assumed to be random noise that is uniformly distributed in the interval $[-\Delta/2, \Delta/2]$. From (3.3.13), we can show that

$$E[e(n)] = \frac{-\Delta/2 + \Delta/2}{2} = 0. \tag{3.5.5}$$


Figure 3.13 Linear model for the quantization process (a summing node adds e(n) to x(nT) to produce x(n))

That is, the quantization noise e(n) has zero mean. From (3.3.14) and (3.5.1), we can show that the variance is

$$\sigma_e^2 = \frac{\Delta^2}{12} = \frac{2^{-2B}}{3}. \tag{3.5.6}$$

Therefore the larger the wordlength, the smaller the input quantization error.

If the quantization error is regarded as noise, the signal-to-noise ratio (SNR) can be expressed as

$$\mathrm{SNR} = \frac{\sigma_x^2}{\sigma_e^2} = 3 \cdot 2^{2B} \sigma_x^2, \tag{3.5.7}$$

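where $\sigma_x^2$ denotes the variance of the signal, x(n). Usually, the SNR is expressed in decibels (dB) as

$$\begin{aligned}
\mathrm{SNR} &= 10 \log_{10}\!\left(\frac{\sigma_x^2}{\sigma_e^2}\right) = 10 \log_{10}\!\left(3 \cdot 2^{2B} \sigma_x^2\right) \\
&= 10 \log_{10} 3 + 20 B \log_{10} 2 + 10 \log_{10} \sigma_x^2 \\
&= 4.77 + 6.02B + 10 \log_{10} \sigma_x^2.
\end{aligned} \tag{3.5.8}$$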

This equation indicates that for each additional bit used in the ADC, the converter provides about 6 dB of signal-to-quantization-noise ratio gain. When using a 16-bit ADC (B = 16), the SNR is about 96 dB. Another important fact shown by (3.5.8) is that the SNR is proportional to $\sigma_x^2$. Therefore we want to keep the signal power as large as possible. This is an important consideration when we discuss scaling issues in Section 3.6.

In digital audio applications, quantization errors arising from low-level signals are referred to as granulation noise. It can be eliminated using dither (low-level noise) added to the signal before quantization. However, dithering reduces the SNR. In many applications, the noise inherent in analog audio components (microphones, amplifiers, or mixers) may already provide enough dithering, so adding more dither may not be necessary.

If the digital filter is a linear system, the effect of the input quantization noise alone on the output may be computed. For example, for the FIR filter defined in (3.1.16), the variance of the output noise due to the input quantization noise may be expressed as

$$\sigma_{y,e}^2 = \sigma_e^2 \sum_{l=0}^{L-1} b_l^2. \tag{3.5.9}$$

This noise is relatively small when compared with other numerical errors and is determined by the wordlength of the ADC.
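As a rough illustration of (3.5.9), the sketch below computes the output noise variance for a given set of FIR coefficients and ADC wordlength. The function name fir_output_noise_var is hypothetical, and the coefficients are taken as unquantized double values so that only the input quantization noise is modeled.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Output noise variance of an L-tap FIR filter caused by B-bit input
   quantization noise, following (3.5.9):
   sigma_e^2 * sum of b[l]^2, with sigma_e^2 = delta^2 / 12 from (3.5.6). */
double fir_output_noise_var(const double *b, int L, int B)
{
    assert(b != NULL && L > 0 && B >= 2);

    double delta = ldexp(1.0, 1 - B);        /* quantization step */
    double sigma_e2 = delta * delta / 12.0;  /* input noise variance */
    double sum_b2 = 0.0;

    for (int l = 0; l < L; l++)
        sum_b2 += b[l] * b[l];
    return sigma_e2 * sum_b2;
}
```

For a 4-tap moving-average filter with all coefficients equal to 0.25, the sum of squared coefficients is 0.25, so the output noise variance is one quarter of the input quantization noise variance.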

Example 3.5: Input quantization effects may be subjectively evaluated by observing and listening to the quantized speech. A speech file called timitl.asc (included in the software package) was digitized using $f_s = 8$ kHz and B = 16.

This speech file can be viewed and played using the MATLAB script:

load('timitl.asc');

plot(timitl);

soundsc(timitl, 8000, 16);

where the MATLAB function soundsc autoscales and plays the vector as sound.

We can simulate the quantization of data with 8-bit wordlength by

qx = round(timitl/256);

where the function, round, rounds the real number to the nearest integer. We then evaluate the quantization effects by

plot(qx);

soundsc(qx, 8000, 16);

By comparing the graph and sound of timitl and qx, the quantization effects may be understood.

3.5.2 Coefficient Quantization Noise

When implementing a digital filter, the filter coefficients are quantized to the wordlength of the DSP hardware so that they can be stored in memory. The filter coefficients, $b_l$ and $a_m$, of the digital filter defined by (3.2.18) are determined by a filter design package such as MATLAB for given specifications. These coefficients are usually represented using the floating-point format and have to be encoded using a finite number of bits for a given fixed-point processor. Let $b'_l$ and $a'_m$ denote the quantized values corresponding to $b_l$ and $a_m$, respectively. The difference equation that can actually be implemented becomes

$$y(n) = \sum_{l=0}^{L-1} b'_l\, x(n-l) - \sum_{m=1}^{M} a'_m\, y(n-m). \tag{3.5.10}$$

This means that the performance of the digital filter implemented on the DSP hardware will be slightly different from its design specification. Design and implementation of digital filters for real-time applications will be discussed in Chapter 5 for FIR filters and Chapter 6 for IIR filters.

If the wordlength B is not large enough, there will be undesirable effects. The coefficient quantization effects become more significant when tighter specifications are used. This generally affects IIR filters more than it affects FIR filters. In many applications, it is desirable for a pole (or poles) of IIR filters to lie close to the unit circle.


Coefficient quantization can cause serious problems if the poles of desired filters are too close to the unit circle because those poles may be shifted onto or outside the unit circle due to coefficient quantization, resulting in an unstable implementation. Such undesirable effects due to coefficient quantization are far more pronounced when high-order systems (where L and M are large) are directly implemented since a change in the value of a particular coefficient can affect all the poles. If the poles are tightly clustered for a lowpass or bandpass filter with narrow bandwidth, the poles of the direct-form realization are sensitive to coefficient quantization errors. The greater the number of clustered poles, the greater the sensitivity.

The coefficient quantization noise is also affected by the different structures for the implementation of digital filters. For example, the direct-form implementation of IIR filters is more sensitive to coefficient quantization errors than the cascade structure consisting of sections of first- or second-order IIR filters. This problem will be further discussed in Chapter 6.

3.5.3 Roundoff Noise

As shown in Figure 3.3 and (3.1.11), we may need to compute the product $y(n) = a\,x(n)$ in a DSP system. Assuming the wordlength associated with a and x(n) is B bits, the multiplication yields a 2B-bit product y(n). For example, a 16-bit number times another 16-bit number will produce a 32-bit product. In most applications, this product may have to be stored in memory or output as a B-bit word. The 2B-bit product can be either truncated or rounded to B bits. Since truncation causes an undesired bias effect, we restrict our attention to the rounding case.

In C programming, rounding a non-negative real number to an integer can be implemented by adding 0.5 to the real number and then truncating the fractional part. For example, the following C statement

y = (int)(x+0.5);

rounds the non-negative real number x to the nearest integer y. As shown in Example 3.5, MATLAB provides the function round for rounding a real number.
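The cast (int) truncates toward zero, so adding 0.5 before the cast gives the nearest integer only when x is not negative. A sign-aware variant might look like the sketch below. The function name round_nearest is hypothetical, and halfway cases are rounded away from zero.

```c
/* Round x to the nearest integer, with halfway cases away from zero.
   The (int) cast truncates toward zero, so negative inputs need 0.5
   subtracted instead of added. Assumes the result fits in an int. */
int round_nearest(double x)
{
    return (x >= 0.0) ? (int)(x + 0.5) : (int)(x - 0.5);
}
```

For example, round_nearest(-1.7) returns -2, whereas (int)(-1.7 + 0.5) evaluates to -1.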

In TMS320C55x implementation, the CPU rounds the operands enclosed by the rnd( ) expression qualifier. For example,

mov rnd(HI(AC0)), *AR1

This instruction rounds the content of the high portion of AC0(31:16), and the rounded 16-bit value is stored in the memory location pointed to by AR1. Another keyword, R (or r), when used with the operation code, also performs a rounding operation on the operands. The following example rounds the product of AC0 and AC1, stores the rounded result in the upper portion of the accumulator AC1(31:16), and clears the lower portion of the accumulator AC1(15:0):

mpyr AC0, AC1

The process of rounding a 2B-bit product to B bits is very similar to that of quantizing discrete-time samples using a B-bit quantizer. Similar to (3.5.4), the nonlinear operation of product roundoff can be modeled as the linear process shown in Figure 3.13. That is,

$$y(n) = Q[a\,x(n)] = a\,x(n) + e(n), \tag{3.5.11}$$

where $a\,x(n)$ is the 2B-bit product and e(n) is the roundoff noise due to rounding the 2B-bit product to B bits. The roundoff noise is a uniformly distributed random process in the interval defined in (3.5.3). Thus it has zero mean and its power is given by (3.5.6).

It is important to note that most commercially available fixed-point DSP devices such as the TMS320C55x have double-precision (2B-bit) accumulator(s). As long as the program is carefully written, it is quite possible to ensure that rounding occurs only at the final stage of calculation. For example, consider the computation of the FIR filter output given in (3.1.16). We can keep all the temporary products, $b_l x(n-l)$ for $l = 0, 1, \ldots, L-1$, in the double-precision accumulator. Rounding is only performed when computation is completed and the sum of products is saved to memory with B-bit wordlength.
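The benefit of rounding only once can be sketched in portable C for B = 16. This is an illustration rather than C55x code. The function names are hypothetical, data and coefficients are assumed to be in Q15 format, x[l] is assumed to hold x(n-l), and a 64-bit integer stands in for the hardware accumulator.

```c
#include <assert.h>
#include <stdint.h>

/* Saturate a wide value to the 16-bit range. */
static int16_t sat16(int64_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* FIR output with all products kept at full precision (Q30) in a wide
   accumulator and a single rounding to Q15 at the end.
   Assumes >> on negative values is an arithmetic shift, which holds on
   common compilers but is implementation-defined in C. */
int16_t fir_q15(const int16_t *b, const int16_t *x, int L)
{
    assert(L > 0);
    int64_t acc = 0;
    for (int l = 0; l < L; l++)
        acc += (int32_t)b[l] * (int32_t)x[l];   /* exact Q30 product */
    acc += (int64_t)1 << 14;                    /* half of a Q15 LSB */
    return sat16(acc >> 15);                    /* Q30 -> Q15 */
}

/* Same filter, but each product is rounded to Q15 before summation,
   so up to L rounding errors accumulate in the output. */
int16_t fir_q15_round_each(const int16_t *b, const int16_t *x, int L)
{
    assert(L > 0);
    int32_t sum = 0;
    for (int l = 0; l < L; l++) {
        int32_t p = (int32_t)b[l] * (int32_t)x[l];
        sum += (p + (1 << 14)) >> 15;
    }
    return sat16(sum);
}
```

With four coefficients of 8192 (0.25 in Q15) and four input samples equal to 1 (one LSB), fir_q15 returns 1, while fir_q15_round_each returns 0 because every individual product rounds to zero.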
