Self-Organized Binary Encoding for ANN - Residual Vector Quantization based Techniques

4 CONTRIBUTIONS

4.2 Residual Vector Quantization based Techniques

4.2.1 Self-Organized Binary Encoding for ANN

The hierarchical scheme proposed in RVQ is an efficient solution for codebook generation and codevector selection. As discussed in Chapter 3.3.2.1, thanks to its hierarchy, RVQ divides the codebook generation problem into subproblems, each of which can be solved efficiently using Lloyd’s quantizer. Yet this approach has some disadvantages. The hierarchical structure forces each layer to be solved separately, which may lead each layer to converge to a local minimum. A poorly trained layer also degrades the layers below it, since each subsequent layer quantizes the residuals left by the previous layers. This makes overfitting an important problem in RVQ.

Footnote 21: The numbers presented here are the average number of dimensions for the subspaces.

As discussed in Chapter 2.2, clustering techniques have been adapted to the VQ problem many times. A popular clustering method is Self-Organizing Maps (SOM) [134]. SOM is a neural network that maps the high-dimensional distribution of samples onto a predefined low-dimensional grid. It is inspired by the structure of brain cells, as neighboring brain cells have been found to respond to inputs together. SOM usually forms a two-dimensional grid of neurons and automatically reorganizes the samples and their association with these neurons. Training of SOM can be interpreted as a competitive learning algorithm, as the weights of the neurons are updated iteratively according to an error measure [134]. SOM brings the concept of a “winner neuron” to neural nets, and updates the weights of the winner neuron and of some of its neighbors. Because of this feature, SOMs can be said to be more robust against overfitting [135].

To improve the performance of RVQ by eliminating overfitted centroids, [P3] proposes adapting SOM as the vector quantizer for each layer of RVQ. The definition of SOM starts with the definition of the winner neuron. For an input 𝒙, the winner neuron is defined as given below:

$$\dot{k} = \arg\min_{k} \left( \lVert \boldsymbol{x} - \boldsymbol{w}_k \rVert_2^2 \right) \tag{4.8}$$
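The winner selection in (4.8) can be sketched in a few lines of NumPy. This is a minimal illustration, not the implementation of [P3]; the function name `winner_neuron` and the toy weights are hypothetical.

```python
import numpy as np

def winner_neuron(x, weights):
    """Return the index of the neuron whose weight vector is closest to x.

    x       : (D,) input sample
    weights : (K, D) array, one weight vector per neuron
    """
    # Squared Euclidean distance from x to every weight vector, as in (4.8)
    sq_dists = np.sum((weights - x) ** 2, axis=1)
    return int(np.argmin(sq_dists))

# Toy example with three 2-D neurons (illustrative values only)
weights = np.array([[0.0, 0.0], [1.0, 1.0], [4.0, 0.0]])
x = np.array([0.9, 1.2])
k_dot = winner_neuron(x, weights)
```

Here the squared distances are 2.25, 0.05 and 11.05, so the second neuron (index 1) is selected as the winner.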

In this way, a winner neuron’s weight vector provides the minimum squared error to the given input. SOM proposes to train the weights by using a stochastic gradient descent approach, to minimize the error between a given sample and the corresponding winner neuron. The iterative weight update equation for the winner neuron is given below:

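$$\boldsymbol{w}_{\dot{k}}(t+1) = \boldsymbol{w}_{\dot{k}}(t) - \gamma(t)\,\nabla_{\boldsymbol{w}_{\dot{k}}} \left( \lVert \boldsymbol{x} - \boldsymbol{w}_{\dot{k}} \rVert_2^2 \right) \tag{4.9}$$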

where ∇_𝒘_𝑘̇ is the gradient operation and γ(𝑡) is the learning rate. The gradient can be calculated as:

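$$\nabla_{\boldsymbol{w}_{\dot{k}}} \left( \lVert \boldsymbol{x} - \boldsymbol{w}_{\dot{k}} \rVert_2^2 \right) = 2\,(\boldsymbol{w}_{\dot{k}} - \boldsymbol{x}) \tag{4.10}$$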
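Substituting (4.10) into (4.9) makes the effect of a single update explicit. The worked step below is a sketch that assumes the weight inside the norm is evaluated at iteration t:

```latex
\begin{aligned}
\boldsymbol{w}_{\dot{k}}(t+1)
  &= \boldsymbol{w}_{\dot{k}}(t) - 2\gamma(t)\,\bigl(\boldsymbol{w}_{\dot{k}}(t) - \boldsymbol{x}\bigr) \\
  &= \bigl(1 - 2\gamma(t)\bigr)\,\boldsymbol{w}_{\dot{k}}(t) + 2\gamma(t)\,\boldsymbol{x}
\end{aligned}
```

Under this reading, for 0 < 2γ(t) ≤ 1 the updated weight is a convex combination of the previous weight and the sample, so the winner moves toward the sample along the line joining them.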

SOM also updates the neighbors of the winner neuron. Let 𝒩_𝒘_𝑘̇ represent the set of neighbors of the winner neuron. The weight update is performed as shown below:

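$$\boldsymbol{w}_k(t+1) = \begin{cases} \boldsymbol{w}_k(t) - \gamma(t)\,(\boldsymbol{w}_k - \boldsymbol{x}) & \text{if } k \in \mathcal{N}_{\boldsymbol{w}_{\dot{k}}} \\ \boldsymbol{w}_k(t) & \text{else} \end{cases} \tag{4.11}$$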
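One SOM training step, combining the winner update of (4.9) and (4.10) with the neighbor update of (4.11), might look as follows. This is a minimal sketch under stated assumptions: the neighbor lists are given, they do not contain the winner itself, and the factor 2 of the gradient is kept for the winner while the neighbors follow (4.11) literally. The name `som_step` and the toy values are hypothetical.

```python
import numpy as np

def som_step(weights, x, neighbors, lr):
    """Apply one SOM update for sample x, modifying `weights` in place.

    weights   : (K, D) float array of neuron weights
    x         : (D,) input sample
    neighbors : list of integer arrays; neighbors[k] holds the neighbor
                indices of neuron k (assumed not to include k itself)
    lr        : learning rate gamma(t)
    Returns the index of the winner neuron.
    """
    # Winner selection, as in (4.8)
    k_dot = int(np.argmin(np.sum((weights - x) ** 2, axis=1)))

    # Winner update: gradient step of (4.9) with the gradient from (4.10)
    weights[k_dot] -= lr * 2.0 * (weights[k_dot] - x)

    # Neighbor update, as in (4.11); all other neurons stay unchanged
    nb = neighbors[k_dot]
    weights[nb] -= lr * (weights[nb] - x)
    return k_dot

# Toy example (illustrative values only)
weights = np.array([[0.0, 0.0], [1.0, 1.0], [4.0, 0.0]])
neighbors = [np.array([1]), np.array([0]), np.array([1])]
x = np.array([0.9, 1.2])
winner = som_step(weights, x, neighbors, lr=0.25)
```

In this toy run neuron 1 wins and moves halfway toward the sample, its single neighbor (neuron 0) moves a quarter of the way, and neuron 2 is left untouched.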

Usually the neighborhood is defined by a 2-D Gaussian kernel, but the distance between the neurons can also be used to define the neighborhood and control the neighboring weight updates [134]. A similar distance-based approach is followed in [P3], where a number of nearest neurons are defined as the neighboring neurons. Since SOM usually defines only a 2-dimensional grid, [P3] instead proposes a multidimensional SOM grid. A transform coding based clustering is applied to initialize the neurons’ positions and weights. Then, for each neuron, a number of nearest neurons are assigned as neighbors. The neighboring neurons are updated in relation to their distances to the winner neuron. The neighborhood relation is preserved throughout the iterations.
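The neighbor assignment described above can be illustrated as follows. This sketch assumes that the initial weights are already available (the transform coding based initialization is not reproduced here) and that neighbors are chosen by Euclidean distance between those initial weights; the function name `nearest_neuron_neighbors` is hypothetical.

```python
import numpy as np

def nearest_neuron_neighbors(init_weights, n_neighbors):
    """For each neuron, return the indices of its n_neighbors nearest neurons.

    init_weights : (K, D) array of initial neuron weights
    Returns a (K, n_neighbors) integer array, nearest neighbor first.
    The table is computed once and then kept fixed during training.
    """
    # Pairwise squared distances between all neurons
    diffs = init_weights[:, None, :] - init_weights[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=2)
    # A neuron is not its own neighbor
    np.fill_diagonal(sq_dists, np.inf)
    return np.argsort(sq_dists, axis=1)[:, :n_neighbors]

# Toy example: four neurons on a line (illustrative values only)
init_weights = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0], [7.0, 0.0]])
neighbor_table = nearest_neuron_neighbors(init_weights, n_neighbors=2)
```

Because the table is built once from the initial weights, the neighborhood relation stays the same throughout the iterations even as the weights move.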

In [P3], an improvement to the encoding algorithm is also proposed. RVQ’s encoding is simple and layer based, as it simply selects the nearest codevector to the residual at each layer. SOBE proposes to optimize the encoding iteratively, by keeping 𝑀 − 1 codevectors fixed and updating the 𝑚th one, for all 𝑀 codevectors. This continues until the decrease in the quantization error converges or a maximum number of iterations is reached.
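The two encoding strategies can be contrasted with a short sketch: greedy layer-by-layer RVQ encoding, followed by an iterative refinement that re-selects one layer's codevector while the other M − 1 stay fixed. This is one possible reading of the description above, not the code of [P3]; the function names, the layer visiting order, the tolerance and the random toy codebooks are all assumptions.

```python
import numpy as np

def rvq_encode(x, codebooks):
    """Greedy RVQ encoding: pick the nearest codevector to the residual per layer.

    codebooks : (M, K, D) array, one codebook of K codevectors per layer
    """
    residual = np.asarray(x, dtype=float).copy()
    codes = []
    for cb in codebooks:
        k = int(np.argmin(np.sum((cb - residual) ** 2, axis=1)))
        codes.append(k)
        residual -= cb[k]
    return codes

def refine_codes(x, codebooks, codes, max_iters=10, tol=1e-12):
    """Iteratively re-select each layer's codevector with the others fixed."""
    codes = list(codes)
    M = len(codebooks)
    recon = sum(codebooks[m][codes[m]] for m in range(M))
    err = float(np.sum((x - recon) ** 2))
    for _ in range(max_iters):
        for m in range(M):
            # What layer m should approximate when the other layers are fixed
            target = x - (recon - codebooks[m][codes[m]])
            k = int(np.argmin(np.sum((codebooks[m] - target) ** 2, axis=1)))
            recon = recon - codebooks[m][codes[m]] + codebooks[m][k]
            codes[m] = k
        new_err = float(np.sum((x - recon) ** 2))
        converged = err - new_err <= tol
        err = new_err
        if converged:  # the error no longer decreases
            break
    return codes, err

# Toy example with random codebooks (illustrative only)
rng = np.random.default_rng(0)
codebooks = rng.normal(size=(3, 8, 4))   # M=3 layers, K=8 codevectors, D=4
x = rng.normal(size=4)
greedy_codes = rvq_encode(x, codebooks)
greedy_err = float(np.sum((x - sum(codebooks[m][c] for m, c in enumerate(greedy_codes))) ** 2))
refined_codes, refined_err = refine_codes(x, codebooks, greedy_codes)
```

Each per-layer re-selection picks the best codevector for that layer given the others, so the quantization error of the refined codes cannot exceed that of the greedy codes used as the starting point.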

With SOM-based codebook generation and the improved encoding scheme, SOBE outperforms RVQ in tests performed on ANN benchmark datasets. The test results of SOBE are provided in Table 9, and the computational and storage costs are presented in Table 10. SOBE is presented in comparison with the prior art in Table 16, Table 17 and Table 18, in Chapter 4.3.

Table 9: SOBE Test Results

TEST RESULTS FOR SIFT1M, 32-BIT CODES

Method   recall@1   recall@10   recall@100
RVQ      NA         NA          NA
SOBE     0.100      0.348       0.731

TEST RESULTS FOR GIST1M, 32-BIT CODES

Method   recall@1   recall@10   recall@100
RVQ      NA         NA          NA
SOBE     0.064      0.189       0.403

TEST RESULTS FOR SIFT1M, 64-BIT CODES

Method   recall@1   recall@10   recall@100
RVQ      0.257      0.653       0.946
SOBE     0.282      0.701       0.962

TEST RESULTS FOR GIST1M, 64-BIT CODES

Method   recall@1   recall@10   recall@100
RVQ      0.113      0.325       0.676

Table 10: Computational and Storage Costs of SOBE

Encoding Cost for Different Datasets and Code Lengths (Number of Operations)

Method   Encoding Cost   SIFT1M-32   SIFT1M-64   GIST1M-32   GIST1M-64
RVQ      O(MKD)          131072      262144      983040      1966080
SOBE     O(2MKD)         262144      524288      1966080     3932160

Storage Cost for Different Datasets and Code Lengths (MB)

Method   Storage Cost    SIFT1M-32   SIFT1M-64   GIST1M-32   GIST1M-64
RVQ      O(MKD)          1.00        2.00        7.5         15
SOBE     O(MKD)          1.00        2.00        7.5         15

Parameter                    SIFT1M-32   SIFT1M-64   GIST1M-32   GIST1M-64
M: number of layers          128         128         960         960
K: number of codevectors     256         256         256         256
D: number of dimensions      8           8           4           4

As shown in Table 10, the iterative encoding converges in about two iterations on average, so the cost of encoding for SOBE is twice the cost of RVQ. Both methods require the same amount of additional storage space. More details about SOBE can be found in [P3].
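The cost figures can be checked with a few lines of arithmetic. The sketch below is not taken from [P3]. The values of M and D are assumptions chosen because they reproduce the operation counts reported in Table 10: one layer per 8 bits of code when K = 256, and the native dimensions of SIFT (128) and GIST (960). The 8 bytes per stored value is likewise an assumption that reproduces the reported megabytes.

```python
K = 256               # codevectors per layer, as listed in Table 10
BYTES_PER_VALUE = 8   # assumption: 64-bit floating-point codebook entries

# (M, D) per configuration: assumed values, see the note above
CONFIGS = {
    "SIFT1M-32": (4, 128),
    "SIFT1M-64": (8, 128),
    "GIST1M-32": (4, 960),
    "GIST1M-64": (8, 960),
}

def rvq_encoding_ops(M, K, D):
    """O(MKD): one pass over K codevectors of D dimensions in each of M layers."""
    return M * K * D

def sobe_encoding_ops(M, K, D, iterations=2):
    """O(2MKD) when the iterative encoding takes about two iterations."""
    return iterations * M * K * D

def codebook_megabytes(M, K, D, bytes_per_value=BYTES_PER_VALUE):
    """Storage of M codebooks with K codevectors of D values each, in MB."""
    return M * K * D * bytes_per_value / 2**20

for name, (M, D) in CONFIGS.items():
    print(name,
          rvq_encoding_ops(M, K, D),
          sobe_encoding_ops(M, K, D),
          codebook_megabytes(M, K, D))
```

Under these assumptions the script reproduces the operation counts of both methods and the 1.00, 2.00, 7.5 and 15 MB storage figures.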

In document Vector Quantization Techniques for Approximate Nearest Neighbor Search on Large-Scale Datasets (Page 69-72)