Vector quantization - A Parametric Approach for Efficient Speech Storage, Flexible Synthesis an

2.3 Quantization

2.3.1 Vector quantization

Vector quantization (VQ) is one of the most efficient and powerful tools that can be used in data compression. A fundamental result of Shannon’s rate-distortion theory [Sha59] shows that better performance can always be achieved by coding vectors instead of scalars, even for uncorrelated or independent data, as discussed in [Gra84]. The basic idea in vector quantization is to compress the input vectors by representing them using a predefined set of symbols. The symbols can be decompressed and converted into vectors using a reproduction codebook. Optimally, the process is performed in a way that minimizes the resulting distortion.

A $k$-dimensional vector quantizer consists of two mappings [Gra84]. The first mapping is an encoder $\gamma$ that assigns to each input vector $x = [x_1, x_2, \ldots, x_k]^\top$ a channel symbol $\gamma(x)$ in a channel symbol set $\zeta$. The symbol is then conveyed to a decoder $\beta$ that performs the second mapping by assigning to each channel symbol $h$ in $\zeta$ a code vector in a reproduction set $C$. This finite set is usually referred to as the codebook of the quantizer and is defined as

$$C = \{ c_h \mid h \in \zeta \}, \tag{2.6}$$

where $c_h = \beta(h)$. Consequently, the quantized vector $\hat{x}$ can be obtained through the two mappings as

$$\hat{x} = \beta(\gamma(x)) = c_{\gamma(x)}. \tag{2.7}$$

The accuracy achievable using a quantizer is dependent on the size of the reproduction codebook. The resolution, or rate, of a $k$-dimensional VQ is $(\log_2 N)/k$, where $N$ denotes the number of code vectors in the codebook. The rate of a quantizer measures the number of bits needed for representing one vector component [Ger92]. Another, often even more popular way of describing the rate of a quantizer is to give the number of bits needed for representing the channel symbol for the whole vector. For example, if the rate of the quantizer is 3 and $k = 2$, the quantizer can also be referred to as a 2-dimensional 6-bit vector quantizer.
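
The 6-bit example can be checked with a short calculation. Here $N$ is the number of code vectors and $R$ the rate per vector component; the symbol $R$ is introduced only for this illustration.

$$R = \frac{\log_2 N}{k} \quad\Longrightarrow\quad \log_2 N = R\,k = 3 \cdot 2 = 6 \text{ bits}, \qquad N = 2^{6} = 64.$$

Under this reading, the 2-dimensional 6-bit quantizer has a codebook of 64 code vectors.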

Optimality conditions and distortion measures

A vector quantizer is considered optimal if two conditions are fulfilled. First, the encoder must always select a mapping that minimizes the resulting distortion

$$\gamma(x) = \arg\min_{h \in \zeta} d(x, \beta(h)), \tag{2.8}$$

where $d(\cdot)$ is a distortion measure such as, for example, the squared error

$$d(x, c) = (x - c)^\top (x - c). \tag{2.9}$$

In low bitrate speech coding, it is typical to complement the distortion measure in Equation (2.9) with perceptually motivated weighting. The resulting weighted squared error can be expressed as

$$d(x, c) = (x - c)^\top W (x - c), \tag{2.10}$$

where the weighting matrix $W$ is typically diagonal. The second condition for the optimality of a vector quantizer states that the decoder must assign to each channel symbol $h$ the generalized centroid of all vectors mapped into $h$,

$$\beta(h) = \operatorname{cent}(h) = \arg\min_{\hat{x}} E\left( d(x, \hat{x}) \mid \gamma(x) = h \right). \tag{2.11}$$

In other words, the average distortion caused by the two mappings in the quantization process should be minimized [Gra84].
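
The distortion measures in Equations (2.9) and (2.10) and the centroid in Equation (2.11) can be illustrated with a minimal Python sketch. The function names are hypothetical, the weighting matrix is assumed to be diagonal and is passed as a list of its diagonal entries, and the centroid is computed only for the squared-error case, where it reduces to the component-wise mean of the vectors in a cell.

```python
def squared_error(x, c):
    """Squared error of Equation (2.9): (x - c)^T (x - c)."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))


def weighted_squared_error(x, c, w):
    """Weighted squared error of Equation (2.10) for a diagonal W.

    `w` holds the diagonal entries of W (an assumption of this sketch).
    """
    return sum(wi * (xi - ci) ** 2 for xi, ci, wi in zip(x, c, w))


def centroid(vectors):
    """Component-wise mean of the vectors mapped into one channel symbol.

    For the squared error this is the minimizer in Equation (2.11),
    with the expectation replaced by a sample average.
    """
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


x = [1.0, 2.0]
c = [0.0, 0.0]
print(squared_error(x, c))                       # 1 + 4 = 5.0
print(weighted_squared_error(x, c, [2.0, 0.5]))  # 2*1 + 0.5*4 = 4.0
print(centroid([[0.0, 0.0], [2.0, 4.0]]))        # [1.0, 2.0]
```

For other distortion measures, for instance when the weighting depends on the input vector, the centroid is generally not the plain mean and would have to be derived separately.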

From the optimality condition in Equation (2.8), it follows that full search should be employed in the encoder, meaning that the distortion is measured for every code vector in the codebook and the channel symbol corresponding to the code vector leading to minimum distortion is selected. The condition in Equation (2.11) implies that the reproduction codebook $C$ must be optimal. The two optimality conditions together imply that an optimal vector quantizer can be fully described by the distortion measure $d$ and the reproduction codebook $C$ (and a mapping rule for cases where more than one mapping leads to equal minimum distortion).
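
A full-search encoder and the corresponding table-lookup decoder can be sketched as follows. This is an illustration under stated assumptions and not the implementation used in this work: channel symbols are taken to be integer indices into a list-based codebook, the squared error is used as the distortion measure, and ties are resolved by choosing the lowest index, which is just one possible mapping rule. All function names are hypothetical.

```python
def squared_error(x, c):
    """Squared error between an input vector and a code vector."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))


def encode(x, codebook, distortion=squared_error):
    """Full search over the codebook, following Equation (2.8).

    Returns the index h of the code vector with minimum distortion.
    The strict '<' keeps the lowest index when several code vectors
    give the same minimum distortion (assumed tie-breaking rule).
    """
    best_h = 0
    best_d = distortion(x, codebook[0])
    for h in range(1, len(codebook)):
        d = distortion(x, codebook[h])
        if d < best_d:
            best_h, best_d = h, d
    return best_h


def decode(h, codebook):
    """Decoder: look up the code vector for channel symbol h."""
    return codebook[h]


def quantize(x, codebook):
    """Apply both mappings, as in Equation (2.7)."""
    return decode(encode(x, codebook), codebook)


codebook = [[0.0, 0.0], [1.0, 1.0], [4.0, 0.0]]
print(encode([0.9, 1.2], codebook))    # 1
print(quantize([3.5, 0.2], codebook))  # [4.0, 0.0]
print(encode([0.5, 0.5], codebook))    # 0 (tie between symbols 0 and 1)
```

The cost of full search grows linearly with the codebook size, since every code vector is visited once per input vector.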

Codebook training

Since the reproduction codebook, along with the distortion measure, determines the performance of a vector quantizer, it is essential that the codebook is well designed. As stated in Equation (2.11), a reproduction codebook is considered optimal if it consists of the distinct centroids of the source vectors mapped into each channel symbol. However, since such codebooks can be constructed in many ways, it is obvious that the optimality conditions alone only ensure that the codebook is locally optimal. A reproduction codebook that minimizes the overall distortion of the quantizer among every possible codebook is considered a globally optimal codebook.

The objective in codebook design is to find a space partitioning that minimizes the expected overall distortion between the input and the reproduction. The overall distortion is usually approximated using the long-term sample average [Gra84], i.e., the empirical average distortion for all the vectors in the training set. Since the source distribution is estimated using a training sequence consisting of a finite number of training vectors, one of the most fundamental problems in codebook training is the selection of the training sequence. There are no strict rules or solutions to this problem. However, the training data should always consist of representative pieces of the typical input data. Moreover, it is recommended that the training set should consist of at least 50 vectors per available channel symbol [Mak85].
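
As a worked illustration of the training set size recommendation, consider a 6-bit quantizer. The symbol $N_{\text{train}}$ is introduced here only for this example.

$$N = 2^{6} = 64 \text{ channel symbols}, \qquad N_{\text{train}} \geq 50 \cdot 64 = 3200 \text{ training vectors}.$$

Each additional bit in the channel symbol doubles this lower bound.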

Once the training data is selected, the actual codebook can be constructed in several ways. The most commonly used basic approach is to employ the generalized Lloyd algorithm (GLA) [Lin80], also referred to as the Linde-Buzo-Gray algorithm (the algorithm is also essentially similar to the well-known K-means algorithm). The main idea is to begin with an initial codebook, and then to alternately encode the training sequence using the minimum distortion rule in Equation (2.8) and to replace the old reproduction codebook by the centroids of the training vectors mapped into each channel symbol according to Equation (2.11). The iteration is carried on until the overall distortion or the change in the overall distortion is considered low enough, or a predetermined maximum number of iterations has been reached.
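
The iteration described above can be sketched in a few lines of Python. This is a minimal illustration, not the training procedure used in this work. It assumes the squared error as the distortion measure (so that centroids are component-wise means), stops when the decrease in average distortion falls below a tolerance `tol` or after `max_iter` iterations, and leaves a code vector unchanged if no training vector is mapped to it. The function and parameter names are hypothetical.

```python
def squared_error(x, c):
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))


def nearest(x, codebook):
    """Index of the minimum-distortion code vector (lowest index on ties)."""
    return min(range(len(codebook)), key=lambda h: squared_error(x, codebook[h]))


def average_distortion(training, codebook):
    """Empirical average distortion over the training set."""
    total = sum(squared_error(x, codebook[nearest(x, codebook)]) for x in training)
    return total / len(training)


def gla(training, initial_codebook, max_iter=100, tol=1e-9):
    """Generalized Lloyd iteration for the squared-error case."""
    codebook = [list(c) for c in initial_codebook]
    dist = average_distortion(training, codebook)
    for _ in range(max_iter):
        # Step 1: encode the training set with the minimum distortion rule.
        labels = [nearest(x, codebook) for x in training]
        # Step 2: replace each code vector by the centroid of its cell.
        for h in range(len(codebook)):
            cell = [x for x, label in zip(training, labels) if label == h]
            if cell:  # assumption: empty cells keep their old code vector
                codebook[h] = [sum(col) / len(cell) for col in zip(*cell)]
        new_dist = average_distortion(training, codebook)
        converged = dist - new_dist <= tol
        dist = new_dist
        if converged:
            break
    return codebook, dist


training = [[0.0], [1.0], [9.0], [10.0]]
codebook, dist = gla(training, initial_codebook=[[0.0], [10.0]])
print(codebook, dist)  # [[0.5], [9.5]] 0.25
```

Each iteration cannot increase the average distortion on the training set, which is why the simple stopping rule based on the change in distortion is sufficient here.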

The generalized Lloyd algorithm can be shown to converge to a local optimum [Gra84]. However, an inherent problem with the GLA approach is that it often gets greedily attracted to a nearby local minimum instead of finding the global minimum. Finding a globally optimal codebook is possible but only if the process is started with an initial codebook that converges to the global minimum. Thus, the selection of the initial codebook can be considered the most crucial step in the GLA method.

Many techniques have been proposed for constructing the initial codebook (several alternatives were already introduced in [Gra84]), but a method for generating an initial codebook that always yields a globally optimal codebook is yet to be found. The simplest technique that provides reasonably good results is to run GLA with a set of different random initial codebooks and to select the codebook that results in the lowest distortion. More refined approaches have also been proposed in the literature. For example, deterministic annealing [Ros93] has been reported to achieve promising results [Ros98a], and particle swarm optimization has also been found to be a valid approach [Sun10]. Despite the often slightly improved quality, the related additional complexity makes many of these improved methods less appealing. In practice, satisfactory performance can usually be achieved by using the simple technique of repeated random initializations.
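
The technique of repeated random initializations can be sketched as follows. The sketch makes several assumptions that are not taken from the text: each initial codebook is drawn as a random subset of the training vectors, the squared error is the distortion measure, and a compact GLA routine is included only to keep the example self-contained. The names `train_with_restarts`, `n_restarts` and `seed` are hypothetical.

```python
import random


def squared_error(x, c):
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))


def nearest(x, codebook):
    return min(range(len(codebook)), key=lambda h: squared_error(x, codebook[h]))


def average_distortion(training, codebook):
    total = sum(squared_error(x, codebook[nearest(x, codebook)]) for x in training)
    return total / len(training)


def gla(training, initial_codebook, max_iter=100, tol=1e-9):
    """Compact generalized Lloyd iteration (squared-error case)."""
    codebook = [list(c) for c in initial_codebook]
    dist = average_distortion(training, codebook)
    for _ in range(max_iter):
        labels = [nearest(x, codebook) for x in training]
        for h in range(len(codebook)):
            cell = [x for x, label in zip(training, labels) if label == h]
            if cell:  # assumption: empty cells keep their old code vector
                codebook[h] = [sum(col) / len(cell) for col in zip(*cell)]
        new_dist = average_distortion(training, codebook)
        converged = dist - new_dist <= tol
        dist = new_dist
        if converged:
            break
    return codebook, dist


def train_with_restarts(training, n_codes, n_restarts=10, seed=0):
    """Run GLA from several random initial codebooks and keep the best."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    best_codebook, best_dist = None, float("inf")
    for _ in range(n_restarts):
        # Assumed initialization: n_codes distinct training vectors.
        initial = rng.sample(training, n_codes)
        codebook, dist = gla(training, initial)
        if dist < best_dist:
            best_codebook, best_dist = codebook, dist
    return best_codebook, best_dist


training = [[0.0], [1.0], [10.0], [11.0], [20.0], [21.0]]
codebook, dist = train_with_restarts(training, n_codes=3, n_restarts=20)
print(sorted(codebook), dist)
```

The selected codebook is never worse on the training set than the result of any individual run, but it remains a local optimum in general; the restarts only raise the chance of starting near a good solution.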

In document A Parametric Approach for Efficient Speech Storage, Flexible Synthesis and Voice Conversion (Page 34-37)