Predictive vector quantization - A Parametric Approach for Efficient Speech Storage, Flexible S

2.3 Quantization

2.3.3 Predictive vector quantization

The vector quantization methods described in this section can achieve very good coding quality for a given bitrate. However, the performance can still be improved by incorporating memory into the vector quantizer, provided that there is some correlation between successive vectors. The memory can be used to store information about one or more previously quantized vectors. Based on this previously obtained information, a prediction of the current vector to be quantized is calculated. Then, instead of quantizing the vector itself, only the error between the original vector and the prediction is quantized. This approach is referred to as predictive vector quantization (PVQ) [Cup85]. (It should be noted that while PVQ best fits the needs in this thesis, there are also other quantization approaches that utilize memory. Finite-state vector quantization [Fos85] is one example of such an approach. See, e.g., [Ger92] for more information.)

The basic idea in predictive vector quantization is close to that of the linear prediction approach introduced in Section 2.2.1. There are two major differences between these methods. Firstly, the PVQ method operates on vectors instead of scalars. Secondly, the "prediction coefficients", or predictor matrices, are often constant in predictive vector quantization. Thus, unlike the linear prediction described earlier, the prediction usually cannot adapt to changes in the input data.

Figure 2.5: Predictive vector quantizer. (From [Nur01a].)

Mathematically, in predictive vector quantization the prediction $\tilde{\mathbf{x}}_n$ of the input vector $\mathbf{x}_n$ is computed from one or more previously quantized vectors. Then, the prediction error

$$\mathbf{e}_n = \mathbf{x}_n - \tilde{\mathbf{x}}_n \tag{2.13}$$

is quantized instead of the original input vector. Finally, the output of the PVQ is computed by adding together the prediction and the quantized prediction error,

$$\hat{\mathbf{x}}_n = \tilde{\mathbf{x}}_n + \hat{\mathbf{e}}_n. \tag{2.14}$$

Consequently, a predictive vector quantizer can be seen simply as a normal vector quantizer operating with the prediction errors. The resulting quantizer structure is depicted in Figure 2.5. There are alternative methods for obtaining the prediction errors, or predictions, and in principle there are no limitations on how to compute them, as long as the same predictions can be made both at the encoder side and at the decoder side. Usually, the predictions are calculated in a linear manner using either the auto-regressive (AR) or the moving average (MA) approach.
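The encoder and decoder loops implied by Equations (2.13) and (2.14) can be sketched as follows. This is a minimal illustration and not the implementation used in this work: the function names, the first-order predictor with a scalar coefficient `a` (standing in for a predictor matrix), the zero initial state and the exhaustive nearest-neighbour search are all assumptions made for the example.

```python
def nearest(codebook, v):
    """Index of the codevector closest to v (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda i: sum((c - x) ** 2 for c, x in zip(codebook[i], v)))

def predict(prev_hat, a=0.5):
    """Hypothetical first-order predictor: scaled copy of the previous output."""
    return [a * x for x in prev_hat]

def pvq_encode(vectors, codebook):
    """Quantize the prediction errors and return the codebook indices."""
    prev_hat = [0.0] * len(vectors[0])        # encoder memory, assumed zero start
    indices = []
    for x in vectors:
        x_tilde = predict(prev_hat)                        # prediction
        e = [xi - pi for xi, pi in zip(x, x_tilde)]        # Eq. (2.13)
        idx = nearest(codebook, e)                         # quantize the error
        indices.append(idx)
        # The encoder tracks the decoder output, Eq. (2.14)
        prev_hat = [pi + ci for pi, ci in zip(x_tilde, codebook[idx])]
    return indices

def pvq_decode(indices, codebook):
    """Rebuild the output vectors from the indices alone."""
    prev_hat = [0.0] * len(codebook[0])       # same initial state as the encoder
    output = []
    for idx in indices:
        x_tilde = predict(prev_hat)
        prev_hat = [pi + ci for pi, ci in zip(x_tilde, codebook[idx])]  # Eq. (2.14)
        output.append(prev_hat)
    return output

codebook = [[0.0, 0.0], [0.5, 0.5], [-0.5, -0.5], [1.0, 1.0]]
vectors = [[1.0, 1.0], [1.2, 0.9], [1.1, 1.0]]
indices = pvq_encode(vectors, codebook)
decoded = pvq_decode(indices, codebook)
```

Because the encoder forms its predictions from the quantized outputs rather than from the original vectors, the decoder can reproduce exactly the same predictions from the transmitted indices.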

Common predictor types

In the auto-regressive method, the prediction is formed from the quantized values of previous input vectors. Each vector involved is multiplied by the corresponding predictor matrix and the prediction is computed as the sum

$$\tilde{\mathbf{x}}_n = \sum_{j=1}^{m_A} \mathbf{A}_j \hat{\mathbf{x}}_{n-j}. \tag{2.15}$$

By plugging Equation (2.14) into Equation (2.15), the prediction can also be expressed in terms of earlier predictions and quantized prediction errors as

$$\tilde{\mathbf{x}}_n = \sum_{j=1}^{m_A} \mathbf{A}_j \left( \hat{\mathbf{e}}_{n-j} + \tilde{\mathbf{x}}_{n-j} \right). \tag{2.16}$$

In both Equation (2.15) and Equation (2.16), $m_A$ is the predictor order and $\mathbf{A}_j$ is the $j$th predictor matrix.

The prediction in the moving average approach is based only on the preceding quantized prediction errors. The prediction vector can be conveniently expressed as

$$\tilde{\mathbf{x}}_n = \sum_{q=1}^{m_B} \mathbf{B}_q \hat{\mathbf{e}}_{n-q}, \tag{2.17}$$

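where $m_B$ is the MA predictor order and $\mathbf{B}_q$ is the $q$th predictor matrix. It is also possible to combine the moving average method with the auto-regressive approach as

$$\tilde{\mathbf{x}}_n = \sum_{j=1}^{m_A} \mathbf{A}_j \hat{\mathbf{x}}_{n-j} + \sum_{q=1}^{m_B} \mathbf{B}_q \hat{\mathbf{e}}_{n-q} \tag{2.18}$$

to form an ARMA predictor.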

The difference between the two methods is that the MA approach uses only the earlier quantized prediction errors, while the AR approach also takes advantage of the previous predictions. It has been reported that auto-regressive prediction can potentially achieve lower distortion with a lower predictor order [Sko97]. However, this advantage holds only in an error-free environment, because AR prediction has no mechanism to limit the propagation of the effects of bit errors. In moving average prediction, error propagation is limited by the predictor order, and thus the performance on noisy channels is better [Ohm93]. This makes MA the most popular predictor type in speech coding applications despite its slightly inferior performance in error-free conditions.
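The difference in error propagation can be illustrated with a small decoder-side experiment. The sketch below is restricted to the scalar case, so the predictor matrices reduce to scalar coefficients, and the coefficient values are arbitrary choices for the illustration rather than values from any real coder. A single corrupted quantized error is injected and the deviation of the decoder output from the error-free output is recorded. Since both decoders are linear, the deviation does not depend on the underlying signal, so an all-zero signal is used.

```python
def decode_ar(e_hat, a):
    """AR decoder, Eq. (2.15): prediction from previous reconstructed outputs."""
    x_hat = []
    for n, e in enumerate(e_hat):
        pred = sum(a[j] * x_hat[n - 1 - j]
                   for j in range(len(a)) if n - 1 - j >= 0)
        x_hat.append(pred + e)                 # Eq. (2.14)
    return x_hat

def decode_ma(e_hat, b):
    """MA decoder, Eq. (2.17): prediction from previous quantized errors only."""
    x_hat = []
    for n, e in enumerate(e_hat):
        pred = sum(b[q] * e_hat[n - 1 - q]
                   for q in range(len(b)) if n - 1 - q >= 0)
        x_hat.append(pred + e)                 # Eq. (2.14)
    return x_hat

def error_trace(decode, coeffs, length=10, hit=2):
    """Absolute output deviation caused by one corrupted value at index `hit`."""
    clean = [0.0] * length
    corrupted = list(clean)
    corrupted[hit] = 1.0                       # simulated channel error
    reference = decode(clean, coeffs)
    received = decode(corrupted, coeffs)
    return [abs(r - c) for r, c in zip(reference, received)]

ar_trace = error_trace(decode_ar, [0.9])       # first-order AR, assumed coefficient
ma_trace = error_trace(decode_ma, [0.6, 0.3])  # second-order MA, assumed coefficients
```

In this setup the MA deviation is exactly zero from the third sample after the corrupted one, since the predictor order is two, whereas the AR deviation decays geometrically and never vanishes completely.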

Training of predictive quantizers

The coding performance in predictive vector quantization depends on both the predictor and the reproduction codebook. Ideally, the codebook of a predictive vector quantizer and the predictor matrices should be jointly optimized to obtain the best possible quantization quality. However, this approach is rarely applied in practice, since it leads to somewhat complicated and computationally burdensome optimization techniques such as the stochastic gradient and coordinate descent methods (see, for example, [Cha86] and [Zeg91]). Furthermore, it has been reported that joint optimization is usually not worth the effort and that much simpler techniques can achieve nearly identical performance [Ger92, Chapter 13]. In particular, it has been found that good overall performance can be achieved if the codebook is optimized for the predictor even when the predictor is not optimized for the codebook [Cha86].

In [Cup85], two popular basic techniques were introduced for the training of predictive quantizers. The first technique, referred to as the open-loop approach, is the simplest. In this approach, the predictor is designed first and then a training set of prediction error vectors is obtained directly using the predictor and the original, unquantized source vectors. Finally, the codebook is trained for this training set using conventional training methods.
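The open-loop procedure can be sketched as follows. The example assumes a fixed first-order predictor with a scalar coefficient `a`, and it uses plain Lloyd (k-means) iterations with a simple deterministic initialization as the "conventional training method". Both are choices made for this illustration, and the function names are hypothetical.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def open_loop_errors(source, a=0.5):
    """Prediction errors computed from the original, unquantized vectors."""
    prev = [0.0] * len(source[0])             # assumed zero initial state
    errors = []
    for x in source:
        errors.append([xi - a * pi for xi, pi in zip(x, prev)])
        prev = x                              # open loop: no quantizer involved
    return errors

def train_codebook(train, k, iters=20):
    """Lloyd iterations; requires k <= len(train)."""
    step = max(1, len(train) // k)
    codebook = [list(train[i * step]) for i in range(k)]   # spread initial picks
    for _ in range(iters):
        cells = [[] for _ in range(k)]
        for v in train:                       # nearest-neighbour partition
            best = min(range(k), key=lambda i: sq_dist(codebook[i], v))
            cells[best].append(v)
        for i, cell in enumerate(cells):      # centroid update
            if cell:                          # keep old codevector if cell is empty
                codebook[i] = [sum(col) / len(cell) for col in zip(*cell)]
    return codebook

source = [[2.0], [2.0], [2.0], [2.0]]
errors = open_loop_errors(source)
codebook = train_codebook(errors, k=2)
```

The codebook obtained in this way is matched to errors that the quantizer never actually sees in operation, because the real predictions are formed from quantized vectors.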

The second basic approach proposed in [Cup85] is the closed-loop approach. Both the original version presented in [Cup85] and the later version described in [Ger92] utilize a closed-loop system for generating the prediction errors in an iterative fashion. At each iteration, a new training set is generated by computing the prediction error vectors using the quantizer of the previous iteration, alternating between the computation of $\hat{\mathbf{x}}_n$ and $\mathbf{e}_n$ for all the vectors in the training sequence.

The initial quantizer is typically generated using the open-loop method. Even though both of these simple basic techniques fulfill the criterion of [Cha86] by optimizing the codebook for the predictor, better performance can be obtained by modifying the training algorithm.
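A closed-loop iteration of this kind might look like the sketch below. It is a simplified reading of the idea and not a reproduction of the algorithm in [Cup85] or [Ger92]: the predictor is a fixed first-order predictor with an assumed scalar coefficient `a`, each iteration performs a single centroid update, and the function names are hypothetical.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def closed_loop_pass(source, codebook, a=0.5):
    """Run the PVQ encoder once; return the prediction errors and chosen indices."""
    prev_hat = [0.0] * len(source[0])         # assumed zero initial state
    errors, indices = [], []
    for x in source:
        x_tilde = [a * p for p in prev_hat]              # prediction from quantized past
        e = [xi - pi for xi, pi in zip(x, x_tilde)]      # Eq. (2.13)
        idx = min(range(len(codebook)), key=lambda i: sq_dist(codebook[i], e))
        errors.append(e)
        indices.append(idx)
        prev_hat = [pi + ci for pi, ci in zip(x_tilde, codebook[idx])]  # Eq. (2.14)
    return errors, indices

def closed_loop_train(source, codebook, iters=10, a=0.5):
    """Regenerate the error training set with the current quantizer, then update."""
    codebook = [list(c) for c in codebook]    # do not modify the caller's codebook
    for _ in range(iters):
        errors, indices = closed_loop_pass(source, codebook, a)
        for i in range(len(codebook)):
            cell = [e for e, idx in zip(errors, indices) if idx == i]
            if cell:                          # keep old codevector if cell is empty
                codebook[i] = [sum(col) / len(cell) for col in zip(*cell)]
    return codebook

source = [[2.0], [2.0], [2.0], [2.0]]
initial = [[2.0], [1.0]]                      # e.g. the result of open-loop training
trained = closed_loop_train(source, initial)
```

Note that changing the codebook also changes the predictions, and therefore the training set of the next iteration. This coupling is why the procedure is not guaranteed to decrease the distortion from one iteration to the next.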

The asymptotic closed-loop (ACL) design algorithm, proposed in [Ros98b] and further studied and developed, for example, in [Kha01b], is one of the most appealing methods developed to tackle the problems related to the basic approaches. The ACL technique has been shown to provide a stable design process and to produce high-quality predictive quantizers. In the asymptotic closed-loop design algorithm, when implemented as described in [Kha01b], the predictive quantizers are trained using alternating optimizations of the predictor and the codebook. The stability of the training procedure is improved using a simple trick: the predictions are always based on a fixed set of vectors obtained during the previous iteration. Thus, the training is effectively carried out in an open-loop fashion and the instability problems associated with the closed-loop approach can be avoided. However, the optimization is ultimately performed for closed-loop operation [Kha01b]. Owing to its good performance and simplicity, the asymptotic closed-loop design technique is used to train the predictive quantizers in this thesis work. A practical example demonstrating effective use of the ideas behind the ACL design method is provided in Section 3.3.
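The central idea, predictions computed from a fixed set of reconstructions from the previous iteration, can be sketched as follows. This is a simplified illustration under stated assumptions and should not be read as the algorithm of [Ros98b] or [Kha01b]: only the codebook is updated, the predictor is a fixed first-order predictor with an assumed scalar coefficient `a`, the first iteration starts from the unquantized vectors, and each iteration performs one centroid update.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def acl_train(source, codebook, iters=10, a=0.5):
    """Codebook-only sketch of asymptotic closed-loop training."""
    dim = len(source[0])
    codebook = [list(c) for c in codebook]
    recon = [list(x) for x in source]         # assumed start: open-loop reconstructions
    for _ in range(iters):
        # Predictions use the FIXED reconstructions of the previous iteration.
        preds = [[0.0] * dim] + [[a * v for v in r] for r in recon[:-1]]
        errors = [[xi - pi for xi, pi in zip(x, p)]
                  for x, p in zip(source, preds)]
        # One Lloyd step on this fixed training set: partition, then centroids.
        indices = [min(range(len(codebook)), key=lambda i: sq_dist(codebook[i], e))
                   for e in errors]
        for i in range(len(codebook)):
            cell = [e for e, idx in zip(errors, indices) if idx == i]
            if cell:                          # keep old codevector if cell is empty
                codebook[i] = [sum(col) / len(cell) for col in zip(*cell)]
        # New reconstructions: prediction plus the centroid of each error's cell.
        recon = [[pi + ci for pi, ci in zip(p, codebook[idx])]
                 for p, idx in zip(preds, indices)]
    return codebook, recon

source = [[2.0], [2.0], [2.0], [2.0]]
codebook, recon = acl_train(source, [[1.8], [0.8]])
```

Within one iteration the training set does not depend on the codebook being updated, so each update is an ordinary open-loop design step. Once the reconstructions stop changing between iterations, the predictions coincide with those the quantizer would make in closed-loop operation.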
