Quantization Effects in Digital Filters

(1)

Quantization Effects in Digital

Filters

(2)

In two's complement representation an exact number would have infinitely many bits (in general). When we limit the number of bits

to some finite value we have truncated all the lower-order bits. Suppose the number is 23/64. In two's complement binary this is 0.0101110000... If we now limit the number of digits to 5 we get 0.0101 which represents exactly 20/64 or 5/16. The truncation error is 5 /16 - 2

(

)

3 / 64 3 / 64. If the exact number is 15 / 64, this is 1.110001000... in two's complement. Truncating to 5 bits yields 1.1100 which exactly represents 1/ 4. The error is 1/ 4

15 / 64 1/ 64. This illustra

= − −

− −

− − = − tes that in two's complement

truncation the error is always negative.

(3)

Distribution of Truncation Errors

In analysis of truncation errors we make the following assumptions.

1. The distribution of the errors is uniform over a range equivalent to the value of one LSB.

2. The error is a white noise process. That means that one error is not correlated with the next error or any other error.

3. The truncation error and the signal whose values are being truncated are not correlated.

(4)

Quantization of Filter

Coefficients

( )

0 1

Let a digital filter's ideal transfer function be

H

1

If the 's and 's are quantized, the poles and zeros move, creating a new transfer function

N k k k N k k k b z z a z a b − = − = = +

∑

( )

0 1 ˆ ˆ H ˆ 1 ˆ ˆ where and . N k k k N k k k k k k k k k b z z a z a a a b b b − = − = = + = + ∆ = + ∆

∑

(5)

Coefficients

( )

(

)

( )

(

)

1 1 1 1 1

The denominator can be factored into the form D 1 1 and, with quantization,

ˆ _ˆ _ˆ D 1 1 ˆ where . T N N k k k k k k N N k k k k k k k k z a z p z z a z p z p p p − − = = − − = = = + = − = + = − = + ∆

∑

∏

∑

∏

1 2 1 1 2

he errors in the pole locations are related to the errors in the coefficients by

N i i i i i N k k N k p p p p p a a a a a a a ₌ a ∂ ∂ ∂ ∂ ∆ = ∆ + ∆ + + ∆ = ∆ ∂ ∂ " ∂

∑

∂

(6)

Quantization of Filter

Coefficients

( )

The partial derivatives can be found from

D D or D / D / D 1 i i i i i i k _{z p} _{z p} k k _{z p} i k _{z p} k _{z p} k z z _p a z a z a p a z z z a a = = = = = ⎡∂ ⎤ ⎡∂ ⎤ _∂ = × ⎢ _∂ ⎥ ⎢ _∂ ⎥ _∂ ⎣ ⎦ ⎣ ⎦ ⎡∂ ∂ ⎤ ⎣ ⎦ ∂ = ∂ ⎡_⎣∂ ∂ ⎤_⎦ ⎡∂ ⎤ _∂ = ⎢ _∂ ⎥ _∂ ⎣ ⎦

( )

(

)

1 1 1 D 1 i i i i N k k k k _{z p} i k _{z p} N q q z p z p a z z p z p z z z − − − = = ₌ − = = = ⎡ ₊ ⎤ ₌ _⎡ _⎤ ₌ ⎢ ⎥ ⎣ ⎦ ⎣ ⎦ ⎡ ⎤ ⎡∂ ⎤ _∂ = _⎢ − _⎥ ⎢ _∂ ⎥ _∂ ⎣ ⎦ ⎣ ⎦

∑

∏

(7)

Coefficients

(

)

( )

(

)

(

)

1 2 1 1 2 2 1 1 1 1

The derivative of the qth factor is 1

and the overall derivative is D 1 1 i i q q N N N N k k q q i k q k i q z p q k _{z p} q k p p z z z z _p _p p z p p z z p − − − = = = = = _≠ _≠ = ∂ ⎡ ₋ ⎤ ₌ ⎢_∂ ⎥ ⎣ ⎦ ⎡ ⎤ ⎡∂ ⎤ _⎢ _⎥ = − = − ⎢ _∂ ⎥ _⎢ _⎥ ⎣ ⎦ _⎢_⎣

∑ ∏

_⎥_⎦

∏

( )

₍ 1₎

(

)

(

)

2 1 1 1 1 1 D ₁ i N N N N N k i i q N k i q k _i q _i k q z p q k q k z _p p p p p p p z p p − − + = = = = = _≠ _≠ ⎡∂ ⎤ = − = − ⎢ _∂ ⎥ ⎣ ⎦

∑

∏

∑

∏

(8)

Coefficients

(

)

(

)

1 1 1

In the summation , for any k other than i,

the iterated product has a factor 0. Therefore the only term in the k summation that survives is the term and

N N k i q k q q k N i q i i q q k k p p p p p p p k i = = ≠ = ≠ − − − = =

∑ ∏

∏

( )

(

)

(

)

1 1 1 D ₁ ₁ i N N i i q i q N N q q i i z p q k q k z p p p p p z ₌ p + ₌ p ₌ ≠ ≠ ⎡∂ ⎤ = − = − ⎢ _∂ ⎥ ⎣ ⎦

∏

(9)

Quantization of Filter

Coefficients

(

)

(

)

1 1 1 Then 1 and k i i N k i q N q i q k N k N k i i N k i q q q k p p a p p p a p p p p − = ≠ − = = ≠ ∂ = ∂ ₋ ∆ ∆ = −

∏

∑

∏

(10)

Coefficients

(

)

1

The error in a pole location caused by errors in the coefficients is strongly affected by the denominator factor which is a product of

differences between pole locations. So if the pole

N i q q q k p p = ≠ −

∏

s are closely spaced, the distances are small and the error in the pole location is large. Therefore systems with widely-spaced poles are less sensitive to errors in the

(11)

Quantization of Pole Locations

( )

₁ ₂

1 2

1 2 1

1 2

Consider a 2nd order system with transfer function 1

H

1

and quantize and with 5 bits. If we quantize in the range 1 1 and 2 2 we get a finite s

z a z a z a a a a a − − = + + − ≤ < − ≤ < et of possible pole locations.

(12)

Quantization of Pole Locations

If we look at only pole locations for stable systems,

Obviously the pole locations are non-uniformly distributed.

(13)

Quantization of Pole Locations

Consider this alternative 2nd order system design.

H

( )

z = dz

−1 1−

(

c + jd

)

z−1

(

)

(

1−

(

c − jd

)

z−1

)

Pole locations are uniformly distributed.

(14)

(15)

(16)

(17)

[ ]

[ ] [

]

[ ] [

]

[ ]

Let y be the response at node to an input signal x and let h be the impulse response at node .

y h x h x

Let the upper bound on x be . y k k k k k k k m m x k n k n n k n m n m m n m n A n A ∞ ∞ =−∞ =−∞ = − ≤ − ≤

∑

[ ]

( )

[ ]

x x h

If we want y to lie in the range -1,1 then 1

h

guarantees that oveflow will not occur.

k m k k n m n A n ∞ =−∞ ∞ =−∞ <

∑

(18)

Scaling to Prevent Overflow

[ ]

x The condition 1 h

may be too severe in some cases. Another type of scaling that may be more appropriate in some cases is

1 k n x A n A ∞ =−∞ < =

∑

( )

[ ]

( )

0max H where h H . j k j k k e n e π Ω ≤Ω≤ Ω ←⎯→F

(19)

(20)

(21)

Scaling Example

Assume that the input signal x is white noise bounded between -1 and +1 and that all additions and multiplications are done using 8-bit two’s complement arithmetic. Also let all numbers (signals and coefficients) be in the range -1 to 1.

( )

1 1₁ 0.5 2 ₂ ₂ 2 0.5 H 1 0.2 0.15 0.2 0.15 z z z z z z z z z − − − − − + − + = = + − + −

(22)

Scaling Example

[ ]

1 2 0 1

Quantize the 's to 8 bits.

0.2353 0.2344 (0.0011110) 0.15 0.1562 (1.1101100) Scale and quantize the 's.

1.338 1.338 /1.338 1 0.9922 (0.1111111) 1.1 1.1/1 K K Q K Q v v Q v = → → = − → → − = → = → → = − → −

[ ]

2 .338 0.8221 0.8281 (1.0010110) 0.5 0.5 /1.338 0.3737 0.3672 (0.0101111) Q v Q = − → → − = → = → →

(23)

Scaling Example

Scaling the v’s scales the output signal but the input signal may still need scaling to prevent overflow.

( )

0 2 1 2 2 2 1 2 2 0 1 1 2 1 2 1

The exact transfer function is H where

1 0.2 0.15 . The actual , after quantizing the

's is 1 0.1978 0.1562 . 1, 0.2344 and 0.1562 0.1978 N m m m z v z z z z z z K z z z z z z z z z = − − − − − − − = Β Α Α = + − Α Α = + − Β = Β = + Β = − + +

∑

( )

₁ 1 ₂ 2 . Therefore the actual transfer function is

0.7407 0.7555 0.3672 H . 1 0.1978 0.1562 z z z z z − − − − − + = + −

(24)

Scaling Example

( )

2

The maximum magnitude of H in this example is 2.882 and

the maximum magnitude of H is 1.5455. So the scaling factor should be 1/ 2.882 0.347. The simplest scaling is by shifting right. In thi j j e e Ω Ω =

s case that would require a shift of two bits to the right to scale by 0.25. This is significanly less that 0.347 so a more complicated scaling might be preferred. We could instead scale by 0.25 0.0+ 625 0.03125

0.34375 by shifting two bits, then four bits and then 5 bits and adding the results.

+ =

(25)

Statistical Analysis of

Quantization Effects

The exact estimation of quantization effects requires numerical simulation and is not amenable to exact analytical methods. But an approach that has proven useful is to treat the quantization noise effects as a random process problem. In doing this we get an approximate analytical estimate instead of an exact numerical simulation. We do this by treating quantization noise as an added noise signal in the system everywhere quantization occurs.

(26)

Statistical Analysis of

(27)

Statistical Analysis of

( )

The probability density function for quantization noise using two's complement representation is

1 , LSB 0 1

f .

0 , otherwise LSB

The expected value of the quantization nois

Q q q = ⎧⎨ − < < ⎩

( )

(

)

2 2 2 2 2 2 2 2 e is E LSB / 2. The variance of the quantization noise is LSB /12. Therefore

the mean-squared quantization noise is

E E LSB /12 LSB / 2 LSB / 3 Q Q Q Q Q

σ

= − = = + = + − =

(28)

Statistical Analysis of

( )

2

The power spectral density function for quantization noise using two's complement representation is G . It has an impulse of strength LSB / 4 at 0 and is otherwise flat with a value in the range 1/ Q F F = K −

( )

(

)

(

)

( )

(

)

(

)

1/ 2 1/ 2 2 2 1/ 2 1/ 2 2 2 2 2 2

2 1/ 2. It repeats periodically outside this range.

E G LSB / 4 LSB / 4 LSB / 3 Solving, LSB /12 and G LSB /12 LSB / 4 Q k Q k F Q F dF F k K dF K K F F k

δ

∞ =−∞ − − ∞ =−∞ < < ⎡ ⎤ = = _⎢ − + _⎥ ⎣ ⎦ = + = = = + −

∑

∫

∑

(29)

Statistical Analysis of

( )

2

(

2

)

(

)

G LSB /12 LSB / 4 is the power spectral

density of both quantization noise sources. The power spectral density of the quantization noise effect on the output signal is

Q k F

δ

F k ∞ =−∞ = +

∑

−

( )

y y,x y, 2 2 y,x x x y x y 2 2 y, y y y 2 2 x y ₂ y ₂ x G G G where G G H G H and G G H G H Y Y and H and H f Q f f Q f j F j F f j F j F f F F F F F F F F F F _e F _e F F Q F e a Q F e a π π π π = + = = = = = = = = − −

(30)

Statistical Analysis of

The two quantization noise sources have the same power spectral density and they are independent so the output quantization noise is just twice what would be produced by one of them acting alone.

( )

2 1/ 2 ₂ ,y ₂ 1/ 2 2 2 _{1/ 2} 2 2 2 2 1/ 2 2 2 2 ,y ₂ 2 G LSB 1 LSB 2 1 6 LSB 1 LSB 1

Evaluating the integral, .

2 1 6 1 j F Q Q _j _F j F j F Q e P F dF e a e dF a e a P a a π π π π − − = − ⎛ ⎞ = _⎜ _⎟ + − − ⎝ ⎠ ⎛ ⎞ = _⎜ _⎟ + − − ⎝ ⎠

∫

(31)

Statistical Analysis of

2 2 2 ,y 2 2 2 2 2 LSB 1 LSB 1 In 2 1 6 1 LSB 1

is the effect of the expected value of the input

2 1

quantization noise and

LSB 1

is the effect of the variance of the input quantization noise.

6 1 Both Q P a a a a ⎛ ⎞ = _⎜ _⎟ + − − ⎝ ⎠ ⎛ ⎞ ⎜ ₋ ⎟ ⎝ ⎠ −

are dependent on . The closer is to one, the larger the output quantization noise.