Quantization Effects in Digital Filters
In two's complement representation an exact number would, in general, have infinitely many bits. When we limit the number of bits to some finite value we truncate all the lower-order bits. Suppose the number is 23/64. In two's complement binary this is 0.0101110000... If we now limit the number of bits to 5 we get 0.0101, which represents exactly 20/64 or 5/16. The truncation error is 5/16 - 23/64 = -3/64. If the exact number is -15/64, this is 1.110001000... in two's complement. Truncating to 5 bits yields 1.1100, which represents exactly -1/4. The error is -1/4 - (-15/64) = -1/64. This illustrates that in two's complement truncation the error is always negative (or zero when the discarded bits are all zero), for positive and negative numbers alike.
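The two worked values above can be checked with exact rational arithmetic. This is a minimal sketch, assuming a format with one sign bit and `total_bits - 1` fractional bits; the function name `truncate_twos_complement` is illustrative and not part of the original notes.

```python
from fractions import Fraction
import math

def truncate_twos_complement(x, total_bits):
    """Keep total_bits bits (1 sign bit plus fractional bits) and drop the rest."""
    frac_bits = total_bits - 1
    scale = 2 ** frac_bits
    # Dropping low-order two's complement bits rounds toward minus infinity,
    # which is exactly what floor() does for both signs.
    return Fraction(math.floor(x * scale), scale)

for x in (Fraction(23, 64), Fraction(-15, 64)):
    q = truncate_twos_complement(x, 5)
    print(x, "->", q, "error", q - x)
```

Both errors come out as the values derived above (-3/64 and -1/64), and because `floor` never rounds up, no input can produce a positive error.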
Distribution of Truncation Errors
In the analysis of truncation errors we make the following assumptions.
1. The distribution of the errors is uniform over a range equivalent to the value of one LSB.
2. The error is a white noise process. That means that one error is not correlated with the next error or any other error.
3. The truncation error and the signal whose values are being truncated are not correlated.
Quantization of Filter Coefficients
Let a digital filter's ideal transfer function be
$$H(z) = \frac{\sum_{k=0}^{N} b_k z^{-k}}{1 + \sum_{k=1}^{N} a_k z^{-k}}.$$
If the $a$'s and $b$'s are quantized, the poles and zeros move, creating a new transfer function
$$\hat{H}(z) = \frac{\sum_{k=0}^{N} \hat{b}_k z^{-k}}{1 + \sum_{k=1}^{N} \hat{a}_k z^{-k}}$$
where $\hat{a}_k = a_k + \Delta a_k$ and $\hat{b}_k = b_k + \Delta b_k$.
The denominator can be factored into the form
$$D(z) = 1 + \sum_{k=1}^{N} a_k z^{-k} = \prod_{k=1}^{N} \left(1 - p_k z^{-1}\right)$$
and, with quantization,
$$\hat{D}(z) = 1 + \sum_{k=1}^{N} \hat{a}_k z^{-k} = \prod_{k=1}^{N} \left(1 - \hat{p}_k z^{-1}\right)$$
where $\hat{p}_k = p_k + \Delta p_k$.
The errors in the pole locations are related to the errors in the coefficients by
$$\Delta p_i = \frac{\partial p_i}{\partial a_1}\Delta a_1 + \frac{\partial p_i}{\partial a_2}\Delta a_2 + \cdots + \frac{\partial p_i}{\partial a_N}\Delta a_N = \sum_{k=1}^{N} \frac{\partial p_i}{\partial a_k}\Delta a_k.$$
The partial derivatives can be found by noting that $D(z)$ must remain zero at the pole $z = p_i$ as the coefficients change. Differentiating $D(p_i) = 0$ with respect to $a_k$ gives
$$\left[\frac{\partial D(z)}{\partial a_k}\right]_{z=p_i} + \left[\frac{\partial D(z)}{\partial z}\right]_{z=p_i} \frac{\partial p_i}{\partial a_k} = 0 \quad\text{or}\quad \frac{\partial p_i}{\partial a_k} = -\frac{\left[\partial D(z)/\partial a_k\right]_{z=p_i}}{\left[\partial D(z)/\partial z\right]_{z=p_i}}.$$
The numerator is
$$\left[\frac{\partial D(z)}{\partial a_k}\right]_{z=p_i} = \left[\frac{\partial}{\partial a_k}\left(1 + \sum_{m=1}^{N} a_m z^{-m}\right)\right]_{z=p_i} = \left[z^{-k}\right]_{z=p_i} = p_i^{-k}$$
and the denominator is
$$\left[\frac{\partial D(z)}{\partial z}\right]_{z=p_i} = \left[\frac{\partial}{\partial z}\prod_{q=1}^{N}\left(1 - p_q z^{-1}\right)\right]_{z=p_i}.$$
The derivative of the $q$th factor is
$$\frac{\partial}{\partial z}\left[1 - p_q z^{-1}\right] = p_q z^{-2}$$
and the overall derivative is
$$\left[\frac{\partial D(z)}{\partial z}\right]_{z=p_i} = \left[\sum_{m=1}^{N} \frac{p_m}{z^2} \prod_{\substack{q=1 \\ q \ne m}}^{N} \left(1 - p_q z^{-1}\right)\right]_{z=p_i} = \sum_{m=1}^{N} \frac{p_m}{p_i^2} \prod_{\substack{q=1 \\ q \ne m}}^{N} \left(1 - p_q p_i^{-1}\right) = \sum_{m=1}^{N} \frac{p_m}{p_i^{N+1}} \prod_{\substack{q=1 \\ q \ne m}}^{N} \left(p_i - p_q\right).$$
In the summation $\sum_{m=1}^{N} p_m \prod_{q=1,\, q \ne m}^{N} (p_i - p_q)$, for any $m$ other than $i$, the iterated product has a factor $p_i - p_i = 0$. Therefore the only term in the $m$ summation that survives is the $m = i$ term and
$$\left[\frac{\partial D(z)}{\partial z}\right]_{z=p_i} = \frac{p_i}{p_i^{N+1}} \prod_{\substack{q=1 \\ q \ne i}}^{N} \left(p_i - p_q\right) = \frac{1}{p_i^{N}} \prod_{\substack{q=1 \\ q \ne i}}^{N} \left(p_i - p_q\right).$$
Then
$$\frac{\partial p_i}{\partial a_k} = -\frac{p_i^{N-k}}{\prod_{\substack{q=1 \\ q \ne i}}^{N} \left(p_i - p_q\right)} \quad\text{and}\quad \Delta p_i = -\sum_{k=1}^{N} \frac{p_i^{N-k}}{\prod_{\substack{q=1 \\ q \ne i}}^{N} \left(p_i - p_q\right)} \Delta a_k.$$
The error in a pole location caused by errors in the coefficients is strongly affected by the denominator factor $\prod_{q=1,\, q \ne i}^{N} (p_i - p_q)$, which is a product of differences between pole locations. So if the poles are closely spaced, the distances are small and the error in the pole location is large. Therefore systems with widely-spaced poles are less sensitive to errors in the coefficients.
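The effect of pole spacing can be checked numerically. The sketch below evaluates the sensitivity $\partial p_i/\partial a_k = -p_i^{N-k} / \prod_{q \ne i}(p_i - p_q)$, whose minus sign follows from requiring $D(p_i) = 0$ to keep holding as $a_k$ changes, and compares it with a finite-difference estimate obtained by perturbing one coefficient and re-solving for the roots. The function names and the example pole sets are illustrative assumptions, not part of the original notes.

```python
import numpy as np

def pole_sensitivity(poles, i, k):
    """Closed-form dp_i/da_k for D(z) = 1 + sum_k a_k z^-k, with k in 1..N."""
    poles = np.asarray(poles, dtype=complex)
    n = len(poles)
    others = np.delete(poles, i)
    return -poles[i] ** (n - k) / np.prod(poles[i] - others)

def finite_difference(poles, i, k, eps=1e-8):
    """Numerical dp_i/da_k: perturb a_k, re-solve for the roots, track pole i."""
    poles = np.asarray(poles, dtype=complex)
    a = np.poly(poles)            # [1, a_1, ..., a_N], since z^N D(z) = prod(z - p_k)
    a[k] += eps
    moved = np.roots(a)
    nearest = moved[np.argmin(np.abs(moved - poles[i]))]
    return (nearest - poles[i]) / eps

for poles in ([0.9, 0.1], [0.9, 0.88]):   # widely spaced, then closely spaced
    exact = pole_sensitivity(poles, 0, 1)
    approx = finite_difference(poles, 0, 1)
    print(poles, "formula:", exact.real, "finite difference:", approx.real)
```

With these example values the closely spaced pair is far more sensitive to an error in $a_1$ than the widely spaced pair, which is the behavior described above.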
Quantization of Pole Locations
Consider a 2nd order system with transfer function
$$H(z) = \frac{1}{1 + a_1 z^{-1} + a_2 z^{-2}}$$
and quantize $a_1$ and $a_2$ with 5 bits. If we quantize in the ranges $-1 \le a_2 < 1$ and $-2 \le a_1 < 2$ we get a finite set of possible pole locations.
If we look only at pole locations for stable systems, it is apparent that the pole locations are non-uniformly distributed.
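The non-uniform spacing can be seen by enumerating the quantized coefficient pairs. The sketch below assumes 5-bit uniform grids over $-2 \le a_1 < 2$ and $-1 \le a_2 < 1$, keeps the pairs inside the stability triangle ($|a_2| < 1$, $|a_1| < 1 + a_2$), and lists the radii available to complex pole pairs. The variable names are illustrative.

```python
import numpy as np

BITS = 5
levels = 2 ** BITS
a1_values = -2 + 4 * np.arange(levels) / levels   # -2 <= a1 < 2 in steps of 1/8
a2_values = -1 + 2 * np.arange(levels) / levels   # -1 <= a2 < 1 in steps of 1/16

stable_poles = []
complex_radii = set()
for a1 in a1_values:
    for a2 in a2_values:
        # Stability triangle for z^2 + a1*z + a2: |a2| < 1 and |a1| < 1 + a2.
        if abs(a2) < 1 and abs(a1) < 1 + a2:
            disc = a1 * a1 - 4 * a2
            root = np.sqrt(complex(disc))
            stable_poles += [(-a1 + root) / 2, (-a1 - root) / 2]
            if disc < 0:
                # A complex-conjugate pole pair has radius sqrt(a2).
                complex_radii.add(float(np.sqrt(a2)))

stable_poles = np.array(stable_poles)
radii = np.array(sorted(complex_radii))
print("stable pole locations:", len(stable_poles))
print("largest pole magnitude:", np.abs(stable_poles).max())
print("radial gaps, origin outward:", np.round(np.diff(radii), 4))
```

Because the radius of a complex pair is $\sqrt{a_2}$, uniformly spaced values of $a_2$ give radii that are sparse near the origin and crowded near the unit circle.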
Consider this alternative 2nd order system design.
$$H(z) = \frac{d z^{-1}}{\left(1 - (c + jd) z^{-1}\right)\left(1 - (c - jd) z^{-1}\right)}$$
The poles lie at $z = c \pm jd$, so when $c$ and $d$ are quantized the pole locations are uniformly distributed.
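A short check of the uniform spacing, as a sketch: it assumes $c$ and $d$ are each quantized to 5 bits over $-1 \le c, d < 1$ (the range is an assumption, since the notes do not state it) and measures the spacing of the resulting stable poles $c + jd$.

```python
import numpy as np

BITS = 5
step = 2.0 / 2 ** BITS                      # assumed range: -1 <= c, d < 1
grid = -1 + step * np.arange(2 ** BITS)

# Upper-half-plane pole c + jd for each pair; its conjugate c - jd mirrors it.
poles = np.array([complex(c, d) for c in grid for d in grid if d > 0])
stable = poles[np.abs(poles) < 1]

real_gaps = np.diff(np.unique(stable.real))
imag_gaps = np.diff(np.unique(stable.imag))
print("step:", step)
print("real-part gaps all equal to step:", np.allclose(real_gaps, step))
print("imaginary-part gaps all equal to step:", np.allclose(imag_gaps, step))
```

Here the quantized parameters are the pole coordinates themselves, so the grid spacing is the same everywhere inside the unit circle.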
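Scaling to Prevent Overflow
Let $y_k[n]$ be the response at node $k$ to an input signal $x[n]$ and let $h_k[n]$ be the impulse response at node $k$. Then
$$\left|y_k[n]\right| = \left|\sum_{m=-\infty}^{\infty} h_k[m]\, x[n-m]\right| \le \sum_{m=-\infty}^{\infty} \left|h_k[m]\right| \left|x[n-m]\right|.$$
Let the upper bound on $|x[n]|$ be $A_x$. Then
$$\left|y_k[n]\right| \le A_x \sum_{m=-\infty}^{\infty} \left|h_k[m]\right|.$$
If we want $y_k[n]$ to lie in the range $(-1, 1)$ then
$$A_x < \frac{1}{\sum_{m=-\infty}^{\infty} \left|h_k[m]\right|}$$
guarantees that overflow will not occur.
The condition
$$A_x < \frac{1}{\sum_{n=-\infty}^{\infty} \left|h_k[n]\right|}$$
may be too severe in some cases. Another type of scaling that may be more appropriate in some cases is
$$A_x = \frac{1}{\max_{0 \le \Omega \le \pi} \left|H_k\!\left(e^{j\Omega}\right)\right|} \quad\text{where}\quad h_k[n] \overset{\mathcal{F}}{\longleftrightarrow} H_k\!\left(e^{j\Omega}\right).$$
Scaling Example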
Assume that the input signal x is white noise bounded between -1 and +1 and that all additions and multiplications are done using 8-bit two’s complement arithmetic. Also let all numbers (signals and coefficients) be in the range -1 to 1.
$$H(z) = \frac{1 - z^{-1} + 0.5 z^{-2}}{1 + 0.2 z^{-1} - 0.15 z^{-2}} = \frac{z^2 - z + 0.5}{z^2 + 0.2 z - 0.15}$$
Quantize the $K$'s to 8 bits.
$K_1 = 0.2353 \rightarrow Q[K_1] \rightarrow 0.2344$ (0.0011110)
$K_2 = -0.15 \rightarrow Q[K_2] \rightarrow -0.1562$ (1.1101100)
Scale and quantize the $v$'s.
$v_0 = 1.338 \rightarrow 1.338/1.338 = 1 \rightarrow Q[v_0] \rightarrow 0.9922$ (0.1111111)
$v_1 = -1.1 \rightarrow -1.1/1.338 = -0.8221 \rightarrow Q[v_1] \rightarrow -0.8281$ (1.0010110)
$v_2 = 0.5 \rightarrow 0.5/1.338 = 0.3737 \rightarrow Q[v_2] \rightarrow 0.3672$ (0.0101111)
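The quantized values and bit patterns for the $K$'s and scaled $v$'s can be reproduced with a small truncating quantizer. This is a sketch, assuming an 8-bit format with one sign bit and seven fractional bits, and saturation at the largest representable value (which is how a scaled value of exactly 1 becomes 0.9922). The helper name `quantize_q7` is illustrative.

```python
import math

def quantize_q7(x):
    """Truncate x to 8-bit two's complement (1 sign bit, 7 fractional bits)."""
    code = math.floor(x * 128)            # truncation rounds toward minus infinity
    code = max(-128, min(127, code))      # saturate: +1.0 is not representable
    bits = format(code & 0xFF, "08b")     # two's complement bit pattern
    return code / 128, bits[0] + "." + bits[1:]

v0 = 1.338  # scale factor as quoted in the text
values = [("K1", 0.2353), ("K2", -0.15),
          ("v0", v0 / v0), ("v1", -1.1 / v0), ("v2", 0.5 / v0)]
for name, value in values:
    q, pattern = quantize_q7(value)
    print(f"{name}: {value:.4f} -> {q:.4f} ({pattern})")
```

The negative coefficients move away from zero under truncation (for example -0.15 becomes -0.15625), consistent with the truncation error never being positive.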
Scaling the v’s scales the output signal but the input signal may still need scaling to prevent overflow.
The exact transfer function is
$$H(z) = \frac{\sum_{m=0}^{N} v_m B_m(z)}{A_2(z)}, \quad N = 2,$$
where $A_2(z) = 1 + 0.2 z^{-1} - 0.15 z^{-2}$. The actual $A_2(z)$, after quantizing the $K$'s, is $1 + 0.1978 z^{-1} - 0.1562 z^{-2}$. Also, $B_0(z) = 1$, $B_1(z) = 0.2344 + z^{-1}$ and $B_2(z) = -0.1562 + 0.1978 z^{-1} + z^{-2}$. Therefore the actual transfer function is
$$H(z) = \frac{0.7407 - 0.7555 z^{-1} + 0.3672 z^{-2}}{1 + 0.1978 z^{-1} - 0.1562 z^{-2}}.$$
The maximum magnitude of $H\!\left(e^{j\Omega}\right)$ in this example is 2.882 and the maximum magnitude of $H_2\!\left(e^{j\Omega}\right)$ is 1.5455. So the scaling factor should be 1/2.882 = 0.347. The simplest scaling is by shifting right. In this case that would require a shift of two bits to the right to scale by 0.25. This is significantly less than 0.347, so a more complicated scaling might be preferred. We could instead scale by 0.25 + 0.0625 + 0.03125 = 0.34375 by shifting two bits, then four bits and then five bits and adding the results.
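The peak gain and the shift-and-add scale factor can be checked numerically. The sketch below takes the quantized transfer function $H(z) = (0.7407 - 0.7555 z^{-1} + 0.3672 z^{-2})/(1 + 0.1978 z^{-1} - 0.1562 z^{-2})$ as given, evaluates $|H(e^{j\Omega})|$ on a dense grid, and also computes the absolute sum of the impulse response for comparison with the more severe time-domain condition. Grid density and impulse-response length are arbitrary choices.

```python
import numpy as np

b = np.array([0.7407, -0.7555, 0.3672])   # numerator coefficients from the text
a = np.array([1.0, 0.1978, -0.1562])      # denominator coefficients from the text

# Frequency response on a dense grid 0 <= Omega <= pi (polynomials in z^-1).
omega = np.linspace(0.0, np.pi, 4001)
z_inv = np.exp(-1j * omega)
H = np.polyval(b[::-1], z_inv) / np.polyval(a[::-1], z_inv)
peak = np.max(np.abs(H))

# Impulse response by direct recursion, for the sum-of-|h[n]| bound.
h = np.zeros(200)
for n in range(len(h)):
    h[n] = b[n] if n < len(b) else 0.0
    if n >= 1:
        h[n] -= a[1] * h[n - 1]
    if n >= 2:
        h[n] -= a[2] * h[n - 2]
l1_norm = np.sum(np.abs(h))

shift_add = 2.0 ** -2 + 2.0 ** -4 + 2.0 ** -5   # shifts of 2, 4 and 5 bits
print("peak |H|:", peak, " 1/peak:", 1 / peak)
print("sum |h[n]|:", l1_norm, " 1/sum:", 1 / l1_norm)
print("shift-and-add scale:", shift_add, " scaled peak:", shift_add * peak)
```

The computed peak agrees with the quoted 2.882 to within the rounding of the four-digit coefficients, and the shift-and-add factor keeps the scaled peak gain just under 1.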
Statistical Analysis of Quantization Effects
The exact estimation of quantization effects requires numerical simulation and is not amenable to exact analytical methods. But an approach that has proven useful is to treat the quantization noise effects as a random process problem. In doing this we get an approximate analytical estimate instead of an exact numerical simulation. We do this by treating quantization noise as an added noise signal in the system everywhere quantization occurs.
The probability density function for quantization noise using two's complement representation is
$$f_Q(q) = \begin{cases} 1/\text{LSB}, & -\text{LSB} < q < 0 \\ 0, & \text{otherwise.} \end{cases}$$
The expected value of the quantization noise is $E(Q) = -\text{LSB}/2$. The variance of the quantization noise is $\sigma_Q^2 = \text{LSB}^2/12$. Therefore the mean-squared quantization noise is
$$E\!\left(Q^2\right) = \sigma_Q^2 + E^2(Q) = \text{LSB}^2/12 + \left(-\text{LSB}/2\right)^2 = \text{LSB}^2/3.$$
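These three moments can be checked empirically. The sketch below truncates uniformly distributed random samples to an 8-bit grid and compares the sample mean, variance and mean square of the truncation error with $-\text{LSB}/2$, $\text{LSB}^2/12$ and $\text{LSB}^2/3$. The word length, input distribution, sample count and seed are all arbitrary assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
BITS = 8
lsb = 2.0 ** -(BITS - 1)                 # 1 sign bit, 7 fractional bits

x = rng.uniform(-1.0, 1.0, size=200_000)
q = np.floor(x / lsb) * lsb - x          # truncation error, lies in (-LSB, 0]

mean_in_lsb = q.mean() / lsb
var_in_lsb2 = q.var() / lsb ** 2
mean_square_in_lsb2 = np.mean(q ** 2) / lsb ** 2

print("mean / LSB:          ", mean_in_lsb, "(model: -0.5)")
print("variance / LSB^2:    ", var_in_lsb2, "(model: 1/12)")
print("mean square / LSB^2: ", mean_square_in_lsb2, "(model: 1/3)")
```

The agreement depends on the signal varying over many quantization levels; for signals confined to a few levels the uniform-error assumption is less accurate.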
The power spectral density function for quantization noise using two's complement representation is $G_Q(F)$. It has an impulse of strength $\text{LSB}^2/4$ at $F = 0$ and is otherwise flat with a value $K$ in the range $-1/2 < F < 1/2$. It repeats periodically outside this range.
$$E\!\left(Q^2\right) = \int_{-1/2}^{1/2} G_Q(F)\, dF = \int_{-1/2}^{1/2} \left[K + \frac{\text{LSB}^2}{4} \sum_{k=-\infty}^{\infty} \delta(F - k)\right] dF = K + \text{LSB}^2/4 = \text{LSB}^2/3$$
Solving, $K = \text{LSB}^2/12$ and
$$G_Q(F) = \frac{\text{LSB}^2}{12} + \frac{\text{LSB}^2}{4} \sum_{k=-\infty}^{\infty} \delta(F - k).$$
$$G_Q(F) = \frac{\text{LSB}^2}{12} + \frac{\text{LSB}^2}{4} \sum_{k=-\infty}^{\infty} \delta(F - k)$$
is the power spectral density of both quantization noise sources. The power spectral density of the quantization noise effect on the output signal is
$$G_y(F) = G_{y,x}(F) + G_{y,f}(F)$$
where
$$G_{y,x}(F) = G_{Q_x}(F) \left|H_x(F)\right|^2 = G_Q(F) \left|H_x(F)\right|^2 \quad\text{and}\quad G_{y,f}(F) = G_{Q_f}(F) \left|H_f(F)\right|^2 = G_Q(F) \left|H_f(F)\right|^2$$
and
$$H_x(F) = \frac{Y(F)}{Q_x(F)} = \frac{e^{j2\pi F}}{e^{j2\pi F} - a} \quad\text{and}\quad H_f(F) = \frac{Y(F)}{Q_f(F)} = \frac{e^{j2\pi F}}{e^{j2\pi F} - a}.$$
The two quantization noise sources have the same power spectral density and they are independent so the output quantization noise is just twice what would be produced by one of them acting alone.
$$P_{Q,y} = 2 \int_{-1/2}^{1/2} G_Q(F) \left|\frac{e^{j2\pi F}}{e^{j2\pi F} - a}\right|^2 dF = \frac{\text{LSB}^2}{2} \left(\frac{1}{1 - a}\right)^2 + \frac{\text{LSB}^2}{6} \int_{-1/2}^{1/2} \left|\frac{e^{j2\pi F}}{e^{j2\pi F} - a}\right|^2 dF$$
Evaluating the integral,
$$P_{Q,y} = \frac{\text{LSB}^2}{2} \left(\frac{1}{1 - a}\right)^2 + \frac{\text{LSB}^2}{6}\, \frac{1}{1 - a^2}.$$
In
$$P_{Q,y} = \frac{\text{LSB}^2}{2} \left(\frac{1}{1 - a}\right)^2 + \frac{\text{LSB}^2}{6}\, \frac{1}{1 - a^2},$$
the term $\frac{\text{LSB}^2}{2} \left(\frac{1}{1 - a}\right)^2$ is the effect of the expected value of the input quantization noise and the term $\frac{\text{LSB}^2}{6}\, \frac{1}{1 - a^2}$ is the effect of the variance of the input quantization noise. Both are dependent on $a$. The closer $a$ is to one, the larger the output quantization noise.
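To see how quickly the two terms grow as $a$ approaches one, the closed-form expression can be evaluated directly. The sketch below also checks the variance term against a numerical evaluation of $\int_{-1/2}^{1/2} \left|e^{j2\pi F}/(e^{j2\pi F} - a)\right|^2 dF$ using the trapezoidal rule. The function names, grid size and the 8-bit LSB are illustrative assumptions.

```python
import numpy as np

def output_noise_power(a, lsb):
    """Closed-form expression from the text, returned as (mean term, variance term)."""
    mean_term = (lsb ** 2 / 2) * (1 / (1 - a)) ** 2
    variance_term = (lsb ** 2 / 6) / (1 - a ** 2)
    return mean_term, variance_term

def variance_term_numeric(a, lsb, points=200_001):
    """Same variance term, with the integral of |H|^2 done by the trapezoidal rule."""
    F = np.linspace(-0.5, 0.5, points)
    H2 = 1.0 / np.abs(np.exp(2j * np.pi * F) - a) ** 2
    integral = np.sum((H2[1:] + H2[:-1]) / 2 * np.diff(F))
    return (lsb ** 2 / 6) * integral

lsb = 2.0 ** -7   # assumed 8-bit word: 1 sign bit, 7 fractional bits
for a in (0.5, 0.9, 0.99):
    mean_term, variance_term = output_noise_power(a, lsb)
    print(f"a={a}: mean term {mean_term:.3e}, variance term {variance_term:.3e}, "
          f"numeric variance term {variance_term_numeric(a, lsb):.3e}")
```

The mean term grows like $1/(1-a)^2$ while the variance term grows like $1/(1-a^2)$, so the mean term dominates as $a$ approaches one.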