Speed Up Spatial Pyramid Matching Using Sparse Coding with Affinity Propagation Algorithm


Rukun Hu and Ping Guo

Image Processing and Pattern Recognition Laboratory, Beijing Normal University, Beijing 100875, China

Abstract. Recently, support vector machines (SVMs) combined with the spatial pyramid matching (SPM) kernel have been highly successful in image annotation. The linear spatial pyramid matching using sparse coding (ScSPM) scheme was proposed to improve SPM in both running time and annotation accuracy. However, both algorithms suffer from limited expansibility, and ScSPM needs a long time for codebook construction. In this paper, we propose an adjusted framework for the ScSPM algorithm, which applies a multi-level affinity propagation (AP) algorithm to the codebook construction process (AP-ScSPM). This approach remarkably reduces the time required for codebook construction. Furthermore, as the AP algorithm automatically determines the number of representative vectors, the expansibility of the algorithm is improved. In a series of experiments, we find that the proposed framework greatly reduces the codebook construction time while achieving annotation accuracy comparable to that of ScSPM.

Keywords: Image annotation, Affinity propagation, Spatial pyramid matching, Scale invariant feature transform.

1 Introduction

In recent years, content-based image annotation has been the subject of a significant amount of research. Various methods have been proposed [1][2][3][4][5], and the bag-of-features (BoF) model [6] has been extremely popular in image annotation. The model treats an image as a collection of unordered appearance descriptors extracted from local patches, quantizes them into discrete "visual words", and then computes a compact histogram representation for semantic image classification, e.g. object recognition or scene categorization [6].

As the BoF approach discards the spatial information of local descriptors, its representation power is limited. Based on the BoF model, an extension called spatial pyramid matching (SPM) [1] was proposed; it has achieved remarkable success on a range of image classification benchmarks such as Caltech-101 and Caltech-256, and was the major component of the state-of-the-art systems [6]. However, as the SPM model uses a nonlinear SVM as the classifier, it incurs great computational and memory cost in the training phase, which limits the application of this model in the real world. To solve this problem, an extension of the SPM approach, named Linear Spatial Pyramid Matching Using Sparse Coding (ScSPM) [6], was proposed.

Corresponding author.

B.-L. Lu, L. Zhang, and J. Kwok (Eds.): ICONIP 2011, Part III, LNCS 7064, pp. 467–474, 2011.

ScSPM uses a spatial-pyramid image representation based on sparse coding (SC) of scale invariant feature transform (SIFT) features, instead of the k-means vector quantization (VQ) used in the traditional SPM. The ScSPM approach is naturally derived by relaxing the restrictive cardinality constraint of VQ. Furthermore, unlike the original SPM, which performs spatial pooling by computing histograms, ScSPM uses max spatial pooling, which is more robust to local spatial translations and more biologically plausible [6]. As to the classifier, ScSPM uses simple linear SVMs instead of nonlinear ones, which dramatically reduces the training complexity to O(n) and gives a constant complexity in testing, while still achieving better classification accuracy than the traditional nonlinear SPM approach [6]. However, the ScSPM algorithm still has two disadvantages: 1) the codebook construction process is time consuming; 2) the codebook size is preset empirically, so the expansibility of the ScSPM algorithm is limited.
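The max spatial pooling step described above can be sketched as follows. This is a minimal illustration, not the code of [6]: the function name `pyramid_max_pool`, the normalized patch coordinates, the use of absolute code values, and the 1 × 1, 2 × 2, 4 × 4 pyramid are assumptions made here. The three-level pyramid has 21 cells, which is consistent with the 1 × 21504 feature dimension reported for a 1024-entry codebook in the tables of Section 3.

```python
def pyramid_max_pool(codes, positions, levels=(1, 2, 4)):
    """Max-pool sparse codes over a spatial pyramid (illustrative sketch).

    codes:     list of sparse-code vectors, one per image patch.
    positions: list of (x, y) patch centres, normalized to [0, 1].
    levels:    grid sizes per pyramid level (assumed 1x1, 2x2, 4x4).
    Returns the concatenation of the pooled vectors of all cells.
    """
    k = len(codes[0])
    pooled = []
    for g in levels:
        # One pooled vector per cell of the g x g grid.
        cells = [[0.0] * k for _ in range(g * g)]
        for code, (x, y) in zip(codes, positions):
            row = min(int(y * g), g - 1)
            col = min(int(x * g), g - 1)
            cell = cells[row * g + col]
            for j, value in enumerate(code):
                # Keep the largest absolute response per codebook entry.
                cell[j] = max(cell[j], abs(value))
        for cell in cells:
            pooled.extend(cell)
    return pooled


# Two patches coded against a hypothetical 3-entry codebook.
codes = [[0.5, 0.0, -2.0], [0.1, 3.0, 0.0]]
positions = [(0.1, 0.1), (0.9, 0.9)]
feature = pyramid_max_pool(codes, positions)
print(len(feature))  # 3 codebook entries x 21 cells
```

Because each cell keeps only the maximum response, a small shift of a patch inside its cell leaves the pooled vector unchanged, and the final dimension grows linearly with the codebook size.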

In this work, a multi-level AP algorithm is applied to train the codebook used in the ScSPM algorithm, and the performance of ScSPM and AP-ScSPM is studied. In Section 2, the features used in this work are discussed and the codebook construction technique is presented in detail. The experimental details are given in Section 3. Finally, conclusions are given in Section 4.

2 Feature Extraction

According to [1], an image can be represented by a bag of feature vectors. If a codebook is used to quantize those feature vectors, the image can then be represented by a single histogram vector. In [1], the k-means algorithm was used to train the codebook, and in [6], sparse coding and spatial pooling are used. In this work, the AP algorithm is used to train the codebook, and the spatial pooling idea is kept when coding the SIFT features (the same as in [6]). The advantages of this framework are discussed in Section 3.

2.1 Multi-level AP Framework for Codebook Construction

As discussed above, the codebook plays a very important role in the feature mapping process. In our study, we find that cluster centers can be used for codebook construction, and that with the AP algorithm [8], the codebook construction process is faster than that in [6].

Moreover, compared with other clustering methods, the AP algorithm is robust to initialization, automatically determines the number of clusters based on the distribution of the data, and allows the clustering result to be adjusted to the user's interest by tuning the preference parameter P of the algorithm [8][9].
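To make the role of the preference parameter concrete, the following is a minimal pure-Python sketch of the AP message-passing updates of [8] (responsibilities and availabilities with damping). It is an illustration under assumptions, not the implementation used in the experiments: the function name `affinity_propagation`, the damping factor of 0.5, the fixed number of iterations, and the negative squared Euclidean distance used as similarity are all choices made here.

```python
def affinity_propagation(points, preference, damping=0.5, iters=200):
    """Return the indices of the exemplars chosen by affinity propagation.

    points:     list of equal-length numeric tuples.
    preference: value placed on the diagonal of the similarity matrix;
                larger values tend to yield more exemplars.
    """
    n = len(points)
    if n < 2:
        return list(range(n))
    # Similarity: negative squared Euclidean distance (an assumption here).
    S = [[-sum((a - b) ** 2 for a, b in zip(p, q)) for q in points]
         for p in points]
    for k in range(n):
        S[k][k] = preference
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(iters):
        # r(i,k) = s(i,k) - max over k' != k of [a(i,k') + s(i,k')]
        for i in range(n):
            vals = [A[i][k] + S[i][k] for k in range(n)]
            first = max(range(n), key=vals.__getitem__)
            second = max(v for k, v in enumerate(vals) if k != first)
            for k in range(n):
                new = S[i][k] - (second if k == first else vals[first])
                R[i][k] = damping * R[i][k] + (1 - damping) * new
        # a(i,k) = min(0, r(k,k) + sum over i' not in {i,k} of max(0, r(i',k)))
        # a(k,k) = sum over i' != k of max(0, r(i',k))
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            pos[k] = R[k][k]  # the self-responsibility enters unclipped
            total = sum(pos)
            for i in range(n):
                new = total - pos[i]
                if i != k:
                    new = min(0.0, new)
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    # A point is an exemplar when r(k,k) + a(k,k) is positive.
    return [k for k in range(n) if R[k][k] + A[k][k] > 0]


# Two well-separated 1-D groups; no cluster number is given in advance.
points = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (5.2,)]
exemplars = affinity_propagation(points, preference=-1.0)
print(exemplars)
```

Note that the number of exemplars is never passed in: it emerges from the similarities and the preference value, which is the property the framework relies on for choosing the codebook size.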


The AP algorithm is based on graph theory, and its time complexity increases rapidly with the number of data points. One possible solution when dealing with a large data set is to divide the data set into small subsets and hierarchically apply the AP algorithm to pick the representative data in each subset. In this work, both two-level and three-level AP frameworks (as shown in Fig. 1) for codebook construction are studied, and the results are shown and analyzed in Section 3.

Fig. 1. Multi-level AP cluster framework for codebook construction

In Fig. 1, AP is first used to select feature vectors from each training image, and then applied again to select representative vectors for each class. In the two-level framework, the vectors selected from each class are put together to form the codebook. In the three-level AP framework, another AP step is added to compress the codebook. According to the experimental results, both frameworks remarkably reduce the codebook construction time while retaining good annotation accuracy.

3 Experiment

To investigate the performance of the proposed codebook construction framework for ScSPM, we run experiments on two image datasets and study the performance of both ScSPM and AP-ScSPM.

3.1 Summary of Experiments

In the experiments, two image datasets are used: the thirteen scene categories dataset and Caltech-101. For the thirteen scene categories, 13 classes with 2926 images are used. For Caltech-101, 40 classes with 2643 images are selected; these 40 classes are further randomly divided into two subsets of 20 classes each. In the following, we refer to these three Caltech-101 datasets as Caltech-101-40, Caltech-101-20-a and Caltech-101-20-b. In all these datasets, most images are of medium resolution, i.e. about 300 × 300 pixels.


For ScSPM, 200,000 patches are randomly chosen for codebook construction, and the codebook size is set to 1024. For AP-ScSPM, two numbers of images per class, 25 and 30, are tried for codebook training, and both the two-level and three-level AP frameworks are studied. For both algorithms, 25 or 30 images, respectively, are randomly chosen from each class for SVM training, and the rest are used for testing.

In the following, both the time cost and the expansibility of the algorithms are recorded and analyzed. In the tables, time is recorded in h (hour), m (minute), s (second) format, and the annotation accuracy is the average over five runs. The label "30-LVL2" means that 30 images per class are randomly selected for codebook construction in AP-ScSPM and a two-level AP framework is used, and so on. The records in Tables 2 and 5 are from experiments run on one computer; the rest are from experiments run on another.

3.2 Algorithm Description

In this section, the proposed multi-level AP framework for the codebook construction process is presented in detail.

Consider an image database T = {I_1, I_2, ..., I_N} and a semantic label vocabulary L = {w_1, w_2, ..., w_N}. In the training phase, a training set D = {(I_1, W_1), ..., (I_N, W_N)} of image-caption pairs is assumed, where I_i ∈ T and W_i ⊂ L.

i. For each semantic class w_i ∈ L, randomly select n_i training images as the subset ws_i.

ii. For each image in WS = {ws_1, ws_2, ..., ws_N}, split it into overlapping patches of size 16 × 16, and extract a SIFT descriptor from each patch.

iii. For the SIFT feature vectors from each image, apply the AP algorithm to select representative vectors VI_i.

iv. For the training vectors of each class, VT_i = {VI_1, VI_2, ..., VI_{n_i}}, pick the cluster centers VC_i with the AP algorithm.

v. In the two-level AP framework, put together the cluster centers of all classes to construct the codebook V = {VC_1, VC_2, ..., VC_N}. In the three-level framework, the AP algorithm is applied once more to V, and a codebook of smaller size is generated.
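Steps iii to v can be sketched as a small hierarchy of selection passes. The sketch below shows only the control flow, under stated assumptions: the function names `build_codebook` and `greedy_select`, the dictionary layout of the data, and the `radius` parameter are hypothetical, and the exemplar selector is passed in as a function. The stand-in `greedy_select` is a simple distance-threshold rule, NOT affinity propagation; in the framework of this paper, the AP algorithm takes its place at every level.

```python
def greedy_select(vectors, radius=1.0):
    """Stand-in exemplar selector (NOT affinity propagation).

    Keeps a vector only if it lies farther than `radius` from every
    vector kept so far.
    """
    kept = []
    for v in vectors:
        if all(sum((a - b) ** 2 for a, b in zip(v, u)) > radius ** 2
               for u in kept):
            kept.append(v)
    return kept


def build_codebook(dataset, select, levels=2):
    """Multi-level codebook construction (control flow only).

    dataset: {class_label: [image, ...]}, where each image is a list of
             descriptor vectors (e.g. SIFT descriptors of its patches).
    select:  function mapping a list of vectors to representative vectors.
    levels:  2 or 3, matching the two-level and three-level frameworks.
    """
    codebook = []
    for label, images in dataset.items():
        class_pool = []
        for descriptors in images:
            # Level 1 (step iii): representatives of each image.
            class_pool.extend(select(descriptors))
        # Level 2 (step iv): cluster centers of each class.
        codebook.extend(select(class_pool))
    if levels == 3:
        # Level 3 (step v): compress the merged codebook V.
        codebook = select(codebook)
    return codebook


# Toy 2-D "descriptors" for two classes with two images each.
dataset = {
    "a": [[(0, 0), (0.1, 0), (5, 5)], [(0, 0.2), (5, 5.1)]],
    "b": [[(0.2, 0.1), (10, 10)], [(10, 10.2)]],
}
two_level = build_codebook(dataset, greedy_select, levels=2)
three_level = build_codebook(dataset, greedy_select, levels=3)
print(len(two_level), len(three_level))
```

Each selection pass only ever sees the vectors of one image, one class, or the merged codebook, so no single clustering call has to handle the full set of descriptors.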

3.3 Time Cost and Annotation Accuracy Analysis

In this work, the performance of ScSPM and AP-ScSPM is investigated. The performance of AP-ScSPM on the Caltech-101-20-a dataset, using different numbers of training images and different levels of the AP algorithm for vector quantization, is recorded in Table 1. As we can see, AP-ScSPM requires very few images for codebook construction (about 25 images per class). In the three-level AP framework, the codebook size is reduced to a very small level: the unimportant vectors are cut away, and the dimension of the histogram vector of each image is greatly reduced. As a result, less time is needed for SVM training. At the same time, the average annotation accuracy of the AP-ScSPM algorithm stays at a high level.


Table 1. Comparison of annotation performance using different image numbers and different levels of the AP algorithm for codebook construction on the Caltech-101-20-a dataset

Variable                      30-LVL2      30-LVL3     25-LVL2      25-LVL3
Codebook Construction Time    68m4s        74m26s      56m26s       59m51s
Codebook Size                 128 × 1562   128 × 191   128 × 1426   128 × 162
Sparse Coding Time            39m38s       23m11s      35m9s        21m25s
ScSIFT Dimension              1 × 32802    1 × 4011    1 × 29946    1 × 3402
SVM Training Time             2m44s        28s         2m41s        23s
SVM Testing Time              1s           1s          1s           1s
Average Accuracy              0.843894     0.820849    0.843091     0.817134
Standard Deviation            0.013249     0.056145    0.013011     0.00494

Table 2. Comparison of annotation performance using different frameworks on the SceneClass-13 image dataset

Variable                      ScSPM        AP-ScSPM-LVL2   AP-ScSPM-LVL3
Codebook Construction Time    27h1m4s      3h6m26s         5h43m22s
Codebook Size                 128 × 1024   128 × 1706      128 × 129
Sparse Coding Time            3h20m38s     3h52m9s         2h24m
ScSIFT Dimension              1 × 21504    1 × 35826       1 × 2901
SVM Training Time             1m9s         3m44s           10s
SVM Testing Time              1s           1s              1s
Average Accuracy              0.794337     0.798154        0.756115
Standard Deviation            0.011569     0.013385        0.006178

Tables 1 to 5 record the performance of both ScSPM and AP-ScSPM on different datasets. From the tables we can see that, on all datasets tried in the experiments, the codebook construction time of AP-ScSPM is far shorter than that of ScSPM, and both algorithms achieve high annotation accuracy. Meanwhile, the three-level AP-ScSPM algorithm remarkably reduces the codebook size and the SVM training time at very little cost in annotation accuracy.

In the ScSPM codebook training process, the codebook is initialized by the k-means algorithm, which is sensitive to the initial value and unstable. An iterative process is therefore necessary to obtain a stable codebook, which takes a long time. For example, in the code of [6], 50 cycles are used, and it takes about 16 hours to obtain a stable codebook on the Caltech-101-20-a dataset. However, if the multi-level AP algorithm is used, the construction time is reduced to about 1 hour with only a small drop in annotation accuracy.
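As a worked check of the figures quoted above, the short sketch below converts the codebook construction times reported for Caltech-101-20-a (16h34m27s for ScSPM and 59m51s for three-level AP-ScSPM with 25 images per class) into seconds and computes their ratio. The helper name `to_seconds` is an assumption introduced here for illustration.

```python
import re


def to_seconds(text):
    """Convert a duration such as '16h34m27s' or '59m51s' to seconds."""
    match = re.fullmatch(r"(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?", text)
    if match is None or not text:
        raise ValueError(f"unrecognised duration: {text!r}")
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return 3600 * hours + 60 * minutes + seconds


# Codebook construction times reported for Caltech-101-20-a.
scspm = to_seconds("16h34m27s")
ap_scspm = to_seconds("59m51s")
speedup = scspm / ap_scspm
print(f"{scspm} s vs {ap_scspm} s: about {speedup:.1f}x faster")
```

On these two reported values the construction time drops from 59667 s to 3591 s, a factor of roughly 16.6.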

3.4 Algorithm Expansibility

As discussed in [6], the codebook size in ScSPM is set empirically by the authors. As the data grow, this value must be reset empirically and the codebook must be constructed again, which seriously limits the expansibility of the algorithm. However, with the application of the AP algorithm, the codebook size is decided automatically by the algorithm itself based on the data, as shown in the tables below.

Table 3. Comparison of annotation performance using different frameworks on the Caltech-101-20-a image dataset

Variable                      ScSPM        AP-ScSPM-LVL3
Codebook Construction Time    16h34m27s    59m51s
Codebook Size                 128 × 1024   128 × 162
Sparse Coding Time            30m38s       21m25s
ScSIFT Dimension              1 × 21504    1 × 3402
SVM Training Time             1m56s        23s
SVM Testing Time              1s           1s
Average Accuracy              0.845240     0.817134
Standard Deviation            0.012530     0.00494

Table 4. Comparison of annotation performance using different frameworks on the Caltech-101-20-b image dataset

Variable                      ScSPM        AP-ScSPM-LVL3
Codebook Construction Time    18h26m32s    1h15m28s
Codebook Size                 128 × 1024   128 × 240
Sparse Coding Time            53m57s       33m28s
ScSIFT Dimension              1 × 21504    1 × 5040
SVM Training Time             2m13s        32s

As the number of classes increases, the codebook becomes larger and larger. Though this makes sense, a large codebook significantly slows down the sparse coding, SVM training and testing processes. One possible solution for this problem is to use the three-level AP framework to train the codebook.

Table 5. Comparison of annotation performance using different frameworks on the Caltech-101-40 image dataset

Variable                      ScSPM        AP-ScSPM
Codebook Construction Time    14h37m5s     6h3m24s
Codebook Size                 128 × 1024   128 × 388
Sparse Coding Time            4h52m15s     3h35m27s
ScSIFT Dimension              1 × 21504    1 × 8148
SVM Training Time             16m15s       3m23s


From the records in the tables, it can be found that the three-level framework can efficiently shrink the codebook size with little cost in annotation accuracy. According to the experiments, with the three-level AP framework, the codebook size is about ten times the number of classes.
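The "about ten times" observation can be checked directly against the codebook sizes reported in the tables. The sketch below is only arithmetic on those reported values; it assumes that the AP-ScSPM column of Table 5 refers to the three-level framework, which the table header does not state explicitly, and it uses the 25-image result for Caltech-101-20-a.

```python
# (dataset, number of classes, codebook size of three-level AP-ScSPM),
# copied from Tables 2 to 5. Treating the Table 5 entry as a three-level
# result is an assumption.
results = [
    ("SceneClass-13", 13, 129),
    ("Caltech-101-20-a", 20, 162),
    ("Caltech-101-20-b", 20, 240),
    ("Caltech-101-40", 40, 388),
]

ratios = {name: size / classes for name, classes, size in results}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f} codebook vectors per class")
```

The ratios range from about 8 to 12 codebook vectors per class, in line with the statement above.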

4 Conclusion

This paper studies the performance of the ScSPM algorithm and proposes an improved codebook construction technique for it based on the AP algorithm. Compared with the codebook construction process in [6], the proposed framework shows the following advantages:

a. The proposed scheme can remarkably reduce the codebook construction time with only a tiny cost of annotation accuracy.

b. As the AP algorithm can automatically determine the number of quantization vectors, the size of the codebook used in ScSPM can be adjusted by the algorithm itself based on the data. In this sense, the expansibility of the algorithm is enhanced.

c. With the multi-level AP algorithm, the size of the codebook can be greatly reduced with an acceptable drop in annotation accuracy, and the smaller codebook shortens the SVM training time.

In this work, the traditional AP algorithm is used to train the codebook. As is known, the performance of the AP algorithm is controlled by the input "Similarity" and "Preference" matrices as well as some other parameters. In the future, by adjusting those parameters, the performance of the AP-ScSPM framework may be enhanced.

Acknowledgement. The research work described in this paper was fully supported by grants from the National Natural Science Foundation of China (Project No. 90820010, 60911130513). Prof. Ping Guo is the author to whom correspondence should be addressed; his e-mail address is [email protected].

References

1. Lazebnik, S., Schmid, C., Ponce, J.: Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 2169–2178 (2006)

2. Liang, M.Y., Du, J.P., Jia, Y.M., Sun, Z.Q.: Image Semantic Description and Automatic Semantic Annotation. In: International Conference on Control, Automation and Systems, pp. 1192–1195 (2010)

3. Perronnin, F.: Universal and Adapted Vocabularies for Generic Visual Categorization. IEEE Transactions on Pattern Analysis and Machine Intelligence 30(7), 1243–1256 (2008)

4. Monay, F., Gatica-Perez, D.: Modeling Semantic Aspects for Cross-Media Image Indexing. IEEE Transactions on Pattern Analysis and Machine Intelligence 29(10), 1802–1817 (2007)


5. Carneiro, G., Chan, A.B., Moreno, P.J., Vasconcelos, N.: Supervised Learning of Semantic Classes for Image Annotation and Retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence 29(3), 394–410 (2007)

6. Yang, J.C., Yu, K., Gong, Y.H., Huang, T.: Linear Spatial Pyramid Matching Using Sparse Coding for Image Classification. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 1794–1801 (2009)

7. Lin, S., Yao, Y., Guo, P.: Speed Up Image Annotation Based on LVQ Technique with Affinity Propagation Algorithm. In: Proceedings of the International Conference on Neural Information Processing: Models and Applications, pp. 533–540 (2010)

8. Frey, B.J., Dueck, D.: Clustering by Passing Messages between Data Points. Science 315(5814), 972–976 (2007)

9. Jiang, W., Ding, F., Xiang, Q.L.: An Affinity Propagation Based Method for Vector Quantization Codebook Design. In: International Conference on Pattern Recognition, pp. 1–4 (2008)

10. Serre, T., Wolf, L., Bileschi, S., Riesenhuber, M., Poggio, T.: Robust Object Recognition with Cortex-Like Mechanisms. IEEE Transactions on Pattern Analysis and Machine Intelligence 29(3), 411–426 (2007)

11. Mutch, J., Lowe, D.: Object Class Recognition and Localization Using Sparse Features with Limited Receptive Fields. International Journal of Computer Vision 80(1), 45–57 (2008)

12. Viitaniemi, V., Laaksonen, J.: Evaluating the performance in automatic image annotation: Example case by adaptive fusion of global image features. Image Communication 22(6), 557–568 (2007)

13. Jiang, Z., He, J., Guo, P.: Feature Data Optimization with LVQ Technique in Semantic Image Annotation. In: International Conference on Intelligent Systems Design and Applications, pp. 906–911 (2010)

14. Zhang, H., Berg, A., Maire, M., Malik, J.: SVM-KNN: Discriminative nearest neighbor classification for visual category recognition. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 2126–2136 (2006)

15. Everingham, M., Gool, L.V., Williams, C., Winn, J., Zisserman, A.: The pascal visual object classes challenge 2008. In: ECCV Workshop (2008)

16. Lee, H., Battle, A., Raina, R., Ng, A.Y.: Efficient sparse coding algorithms. In: Neural Information Processing Systems, pp. 801–808 (2006)

17. Mairal, J., Bach, F., Ponce, J., Sapiro, G., Zisserman, A.: Supervised dictionary learning. In: Neural Information Processing Systems (2009)