4 CONTRIBUTIONS
4.2 Residual Vector Quantization based Techniques
4.2.2 Joint K-Means Quantization for ANN
Another RVQ-based VQ method for ANN proposed in this thesis is Joint K-Means Quantization (JKM). As mentioned in Chapters 3.3.2.1 and 4.2.1, RVQ's hierarchical structure separates the quantization problem into $M$ subproblems, and the solution of each subproblem strongly depends on the previous one. However, the training procedure of RVQ does not take this dependence into account. ERVQ claims to offer a joint training scheme, but its algorithm only updates the codebooks generated by RVQ, which have already been obtained independently of each other. Hence, the proposed codebook update does not really constitute a joint scheme.
Nevertheless, combining the hierarchical structure with a joint codebook generation strategy is expected to improve the quantization performance while keeping the encoding complexity low. Following this idea, Joint K-Means is proposed [P4]. JKM expands the "K-means"²² training on one of RVQ's layers to all layers, providing a joint training scheme. In the training scheme of K-Means²³, an "expectation" step is performed first, in which the vectors are assigned to their nearest codevectors. A "maximization" step then follows, in which the codevectors are updated with the means of the assigned vectors. RVQ applies these steps for many iterations separately at each layer. In JKM, it is proposed to extend this to all layers.
²² K-Means clustering algorithm and Lloyd's vector quantization are sometimes used interchangeably in the literature.
The "expectation-maximization" steps of JKM proceed as follows. In the "expectation" step, each vector is assigned to its "selected" codevector, and the residual is immediately calculated and passed to the next layer, where the same operation is repeated until the final layer is reached. Then, in the "maximization" step, the codevectors at each layer are updated with the means of the vectors assigned to them. Therefore, while RVQ waits for the quantization on a layer to converge before moving on, JKM propagates the residuals through the layers during the iterations. Note that JKM does not assign a given vector to its nearest codevector; instead, it assigns the vector to the "selected" codevector, and this selection is performed by the encoding algorithm. JKM proposes a joint encoding algorithm, which also takes the layer below the current layer into account when selecting the codevector from the current layer. Incorporating this encoding method into the training improves the codebook generation even further.
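The training loop described above can be sketched as follows. This is a minimal illustration and not the implementation from [P4]: the helper names `sq_dist` and `joint_train` are hypothetical, plain Python lists stand in for vectors, and the nearest codevector is used in place of the "selected" codevector (that is, the joint encoding step is omitted for brevity).

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def joint_train(vectors, codebooks, iterations):
    """Hypothetical helper: refine all layers together, in place.

    codebooks is a list of M codebooks, each a list of K codevectors.
    Returns the mean quantization error measured in each expectation step.
    """
    errors = []
    for _ in range(iterations):
        # Expectation: assign, then immediately pass the residual to the next layer.
        residuals = [list(v) for v in vectors]
        assignments = []  # per layer: list of (codevector index, layer input)
        for codebook in codebooks:
            layer = []
            for n, r in enumerate(residuals):
                # Assumption: the nearest codevector stands in for the "selected" one.
                k = min(range(len(codebook)), key=lambda j: sq_dist(r, codebook[j]))
                layer.append((k, r))
                residuals[n] = [p - q for p, q in zip(r, codebook[k])]
            assignments.append(layer)
        errors.append(sum(sum(p * p for p in r) for r in residuals) / len(vectors))

        # Maximization: each codevector becomes the mean of its assigned layer inputs.
        for codebook, layer in zip(codebooks, assignments):
            dim = len(codebook[0])
            sums = [[0.0] * dim for _ in codebook]
            counts = [0] * len(codebook)
            for k, r in layer:
                counts[k] += 1
                for d in range(dim):
                    sums[k][d] += r[d]
            for k, count in enumerate(counts):
                if count:  # empty cells keep their previous codevector
                    codebook[k] = [s / count for s in sums[k]]
    return errors


# Toy usage: two layers of four codevectors each on synthetic 2-D data.
rng = random.Random(0)
data = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
books = [
    [list(v) for v in data[:4]],
    [[0.1 * rng.gauss(0, 1), 0.1 * rng.gauss(0, 1)] for _ in range(4)],
]
history = joint_train(data, books, iterations=10)
```

With a single layer the loop reduces to ordinary K-means. With several layers, every iteration touches all codebooks, in contrast to RVQ, which would run the inner loop to convergence on one layer before starting the next.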
Encoding in RVQ is also performed independently for each layer in a nearest-neighbor fashion. In other words, for each residual the nearest codevector from the corresponding codebook is selected. However, this does not guarantee the minimum quantization error. Let $c_{1,i}$ be the closest codevector to $x$, and let $c_{2,j}$ be the closest codevector to the first residual $r_1 = x - c_{1,i}$. Let $c_{1,k}$ be a different codevector from the first codebook, i.e., $c_{1,k} \neq c_{1,i}$, and let $c_{2,l}$ be a codevector from the second codebook. The suboptimality of this encoding scheme can be proven as follows:
Lemma: Given $\|x - c_{1,i}\|_2^2 \le \|x - c_{1,k}\|_2^2$ and $\|(x - c_{1,i}) - c_{2,j}\|_2^2 \le \|(x - c_{1,i}) - c_{2,l}\|_2^2$, there exists at least one pair $c_{1,k}$ and $c_{2,l}$ that satisfies
$$ \|(x - c_{1,i}) - c_{2,j}\|_2^2 \ge \|(x - c_{1,k}) - c_{2,l}\|_2^2 \tag{4.12} $$
Proof: Assume that $x = c_{1,k} + c_{2,l}$. Then the right-hand side of (4.12) vanishes, and (4.12) turns into the following:
$$ \|(c_{1,k} + c_{2,l}) - c_{1,i} - c_{2,j}\|_2^2 \ge 0 \tag{4.13} $$
which is always true. Now, if one can show that the assumption $x = c_{1,k} + c_{2,l}$ is compatible with the two inequalities given in the lemma, the proof is complete. Substituting $x = c_{1,k} + c_{2,l}$ into the first inequality of the lemma gives the following:
$$ \|c_{1,k} + c_{2,l} - c_{1,i}\|_2^2 \le \|c_{2,l}\|_2^2 \tag{4.14} $$
Rearranging the terms in (4.14), one obtains the inequality below:
$$ \|c_{2,l} - (c_{1,i} - c_{1,k})\|_2^2 \le \|c_{2,l}\|_2^2 \tag{4.15} $$
which is true when $\|c_{1,i} - c_{1,k}\|_2^2 \le 2\langle c_{2,l},\, c_{1,i} - c_{1,k} \rangle$. Substituting the same assumption into the second inequality of the lemma gives the following:
$$ \|(c_{1,k} + c_{2,l}) - c_{1,i} - c_{2,j}\|_2^2 \le \|c_{1,k} - c_{1,i}\|_2^2 \tag{4.16} $$
Rearranging the terms in (4.16), one obtains the inequality below:
$$ \|(c_{1,k} - c_{1,i}) - (c_{2,j} - c_{2,l})\|_2^2 \le \|c_{1,k} - c_{1,i}\|_2^2 \tag{4.17} $$
which is true when $\|c_{2,j} - c_{2,l}\|_2^2 \le 2\langle c_{1,k} - c_{1,i},\, c_{2,j} - c_{2,l} \rangle$. Since (4.15) and (4.17) can both hold for a suitable selection of codevectors, in other words they are not always false, $x = c_{1,k} + c_{2,l}$ is a valid case, and hence the proof is complete.
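A small numerical instance makes the lemma concrete. The sketch below uses hand-picked two-dimensional codebooks (illustrative values, not taken from any experiment) in which the given vector equals the sum of the two codevectors that layer-wise nearest-neighbor encoding does not choose. The helper names are hypothetical.

```python
from itertools import product


def sq_norm(v):
    """Squared Euclidean norm."""
    return sum(p * p for p in v)


def sub(a, b):
    """Component-wise difference a - b."""
    return [p - q for p, q in zip(a, b)]


def greedy_encode(x, codebooks):
    """Layer-by-layer nearest-neighbor encoding, as in RVQ."""
    residual, indices = list(x), []
    for codebook in codebooks:
        k = min(range(len(codebook)), key=lambda j: sq_norm(sub(residual, codebook[j])))
        indices.append(k)
        residual = sub(residual, codebook[k])
    return indices, sq_norm(residual)


def exhaustive_encode(x, codebooks):
    """Try every combination of one codevector per layer."""
    best = None
    for indices in product(*(range(len(cb)) for cb in codebooks)):
        residual = list(x)
        for codebook, k in zip(codebooks, indices):
            residual = sub(residual, codebook[k])
        err = sq_norm(residual)
        if best is None or err < best[1]:
            best = (list(indices), err)
    return best


x = [4.0, 0.0]
first = [[3.0, 0.0], [6.0, 0.0]]     # nearest to x is (3, 0); the alternative is (6, 0)
second = [[2.0, 0.0], [-2.0, 0.0]]   # nearest to the residual (1, 0) is (2, 0)

greedy = greedy_encode(x, [first, second])       # indices [0, 0], squared error 1.0
optimal = exhaustive_encode(x, [first, second])  # indices [1, 1], squared error 0.0
```

Here both conditions of the lemma hold (1 ≤ 4 at the first layer and 1 ≤ 9 at the second), yet the combination (6, 0) + (−2, 0) reconstructs the vector exactly, so the greedy choice is strictly worse.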
In order to improve the encoding performance, "joint encoding" is proposed in JKM. Joint encoding is similar to the beam search used in AQ or OCKM, but it is much less complex because it exploits the hierarchical structure, which significantly reduces the number of required computations. The joint encoding method searches for the codevector with the minimum quantization error in a small neighborhood of the nearest codevector. Instead of only the nearest codevector, the $H$ nearest codevectors are selected, and the residual is calculated for each of them. The same operation is then repeated for each residual, giving $H^2$ candidates. The best $H$ candidates according to the quantization error are selected, and the operations proceed until the final layer is reached. To explain the computational cost of encoding in detail, the distance between the $m$-th layer residual $r_m$ of the given vector $x$ and the $k$-th codevector on the $m$-th layer, $c_{m,k}$, can be rewritten as follows:
$$
\begin{aligned}
d(r_m, c_{m,k}) &= \Big\| x - \sum_{n=1}^{m-1} \hat{c}_n - c_{m,k} \Big\|_2^2 \\
&= \Big\| x - \sum_{n=1}^{m-1} \hat{c}_n \Big\|_2^2 - 2 \Big\langle x - \sum_{n=1}^{m-1} \hat{c}_n,\; c_{m,k} \Big\rangle + \|c_{m,k}\|_2^2 \\
&= \Big\| x - \sum_{n=1}^{m-1} \hat{c}_n \Big\|_2^2 - 2 \langle x, c_{m,k} \rangle + 2 \sum_{n=1}^{m-1} \langle \hat{c}_n, c_{m,k} \rangle + \|c_{m,k}\|_2^2
\end{aligned}
\tag{4.18}
$$
where $\hat{c}_n$ is the nearest codevector on the $n$-th layer. For each layer, note that the first term is already calculated in the previous layers. The third and fourth terms can be retrieved from a look-up table. Hence, the second term has to be calculated first for all the codevectors, which requires $O(KD)$ operations for one layer. The third and fourth terms require $O(mKH)$ look-ups and additions for the $m$-th layer. Finally, among all the distances the best $H$ are selected, which costs $O(KH \log H)$. This is repeated $M$ times, so the final cost of encoding is $O\big(MDK + \frac{(M-1)(M-2)}{2} KH + MKH \log H\big)$.
More details on this encoding scheme can be found in [P4] and [P5].
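The candidate search of joint encoding can be sketched as a beam search of width $H$ over the layers. The sketch below is illustrative only: `joint_encode` is a hypothetical name, and every distance is computed directly rather than through the look-up tables of (4.18), so it shows the selection logic and not the cost savings.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def joint_encode(x, codebooks, H):
    """Keep the H best partial encodings at every layer (hypothetical helper).

    Returns (indices, squared_error) of the best full encoding found.
    """
    # Each beam entry is (squared error so far, chosen indices, current residual).
    beam = [(sum(p * p for p in x), [], list(x))]
    for codebook in codebooks:
        candidates = []
        for _, indices, residual in beam:
            # The H codevectors nearest to this residual.
            nearest = sorted(
                range(len(codebook)),
                key=lambda k: sq_dist(residual, codebook[k]),
            )[:H]
            for k in nearest:
                new_residual = [p - q for p, q in zip(residual, codebook[k])]
                error = sum(p * p for p in new_residual)
                candidates.append((error, indices + [k], new_residual))
        # Up to H * H candidates are reduced to the best H.
        candidates.sort(key=lambda entry: entry[0])
        beam = candidates[:H]
    error, indices, _ = beam[0]
    return indices, error
```

Setting `H = 1` reproduces layer-wise nearest-neighbor encoding, which makes it easy to compare both schemes on the same codebooks.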
To conclude, JKM takes the lower layers into account during both the codebook generation and the vector encoding steps. As expected, this improves the quantization performance. The test results on ANN benchmarks are shown in Table 11, and the computational and storage costs are given in Table 12. JKM is also compared with the prior art in Table 16, Table 17 and Table 18, in Chapter 4.3.
Table 11: JKM Test Results
TEST RESULTS FOR SIFT1M, 32-BIT CODES
Method  recall@1  recall@10  recall@100
SOBE    0.100     0.348      0.731
JKM     0.121     0.402      0.790

TEST RESULTS FOR GIST1M, 32-BIT CODES
Method  recall@1  recall@10  recall@100
SOBE    0.064     0.189      0.403
JKM     0.077     0.213      0.511

TEST RESULTS FOR SIFT1M, 64-BIT CODES
Method  recall@1  recall@10  recall@100
SOBE    0.282     0.701      0.962
JKM     0.323     0.759      0.980

TEST RESULTS FOR GIST1M, 64-BIT CODES
Method  recall@1  recall@10  recall@100
SOBE    0.136     0.360      0.705
Table 12: Computational and Storage Costs of JKM
Encoding cost for different datasets and code lengths (number of operations)
Method  Encoding Cost                                      SIFT1M-32  SIFT1M-64  GIST1M-32  GIST1M-64
SOBE    $O(2MKD)$                                          262144     524288     1966080    3932160
JKM     $O(MDK + \frac{(M-1)(M-2)}{2} KH + MKH \log H)$    319488     761856     1171456    2465792

Storage cost for different datasets and code lengths (MB)
Method  Storage Cost   SIFT1M-32  SIFT1M-64  GIST1M-32  GIST1M-64
SOBE    $\Theta(MKD)$  1.00       2.00       7.5        15
JKM     $\Theta(MKD)$  1.00       2.00       7.5        15

Parameter                  SIFT1M-32  SIFT1M-64  GIST1M-32  GIST1M-64
M: number of layers        128        128        960        960
K: number of codevectors   256        256        256        256
D: number of dimensions    8          8          4          4
H: number of candidates    32         32         32         32
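As a consistency check, the cost expressions of Table 12 can be evaluated directly. The sketch below assumes $M = 4$ layers for 32-bit codes and $M = 8$ for 64-bit codes (8 bits per layer with $K = 256$), $D = 128$ for SIFT1M and $D = 960$ for GIST1M, $H = 32$, a base-2 logarithm, and 8 bytes per stored codevector component. These values and the function names are assumptions chosen because they reproduce the tabulated numbers; they are not read from the parameter legend as printed.

```python
import math


def sobe_encoding_ops(M, K, D):
    """Operation count for the SOBE row, 2MKD."""
    return 2 * M * K * D


def jkm_encoding_ops(M, K, D, H):
    """Operation count MDK + (M-1)(M-2)/2 * KH + MKH * log2(H)."""
    return M * D * K + (M - 1) * (M - 2) // 2 * K * H + M * K * H * math.log2(H)


def codebook_megabytes(M, K, D, bytes_per_value=8):
    """Storage of M codebooks with K codevectors of D components each.

    bytes_per_value=8 is an assumption (double precision).
    """
    return M * K * D * bytes_per_value / 2**20


# (M, D) per column; assumed values, see the paragraph above.
settings = {
    "SIFT1M-32": (4, 128),
    "SIFT1M-64": (8, 128),
    "GIST1M-32": (4, 960),
    "GIST1M-64": (8, 960),
}

for name, (M, D) in settings.items():
    print(
        name,
        sobe_encoding_ops(M, 256, D),
        round(jkm_encoding_ops(M, 256, D, 32)),
        codebook_megabytes(M, 256, D),
    )
```

Under these assumptions the three functions return the encoding and storage figures listed in Table 12 for all four columns.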