In this thesis, we presented two case studies that use the Internet for two computer vision tasks: object image retrieval and object image classification. Our extensive experimental results on benchmark datasets show that the Internet can be used to significantly boost the performance of current state-of-the-art methods.
Many interesting issues remain. One is that content on the Internet is linked rather than independent. Search engines exploit this link structure to improve relevance ranking, and it should be useful for computer vision as well. For example, when retrieving images for an object class, images that are cited by many web pages dealing with that object category should be more relevant. New ideas and techniques for exploiting link structure would be valuable.
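The link-based intuition above can be illustrated with a minimal sketch. This is not a method from this thesis: the function name `rank_images_by_citations`, the toy page and image names, and the per-page relevance weights are all hypothetical, and the sketch assumes that a relevance score for each page with respect to the object category is already available. Each image is scored by the summed relevance of the pages that cite it.

```python
from collections import defaultdict


def rank_images_by_citations(page_links, page_relevance):
    """Score each image by the summed relevance of the pages that cite it.

    page_links:     dict mapping a page id to the list of images it cites.
    page_relevance: dict mapping a page id to an assumed relevance weight
                    for the object category (hypothetical input).
    """
    scores = defaultdict(float)
    for page, images in page_links.items():
        weight = page_relevance.get(page, 0.0)
        for image in set(images):  # a page counts at most once per image
            scores[image] += weight
    # Highest score first; ties are broken by image name for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy data (hypothetical): pageA and pageB are about the category, pageC is not.
page_links = {
    "pageA": ["cat1.jpg", "cat2.jpg"],
    "pageB": ["cat1.jpg"],
    "pageC": ["car.jpg", "cat2.jpg"],
}
page_relevance = {"pageA": 1.0, "pageB": 0.5, "pageC": 0.0}

ranking = rank_images_by_citations(page_links, page_relevance)
for image, score in ranking:
    print(image, score)
```

In this toy example, the image cited by both on-topic pages ranks first. A more realistic variant might propagate scores over the link graph rather than summing them once, but that is beyond this sketch.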
Another interesting question is how to better establish correspondence between images and text on the Internet. Currently, we simply treat an image and its nearby text as relevant to each other. This assumption does not always hold. Pre-processing techniques that filter noisy words before our algorithms are run could be useful here.
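One simple form such pre-processing could take is sketched below. It is an illustrative assumption, not a technique evaluated in this thesis: the stop-word list, the `min_doc_freq` threshold, and the function names are hypothetical. The sketch drops stop words and any word that appears in fewer than `min_doc_freq` of the texts collected for one object class, on the assumption that words shared across many pages are more likely to describe the object.

```python
import re
from collections import Counter

# Small illustrative stop-word list (hypothetical; a real list would be larger).
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "on",
              "click", "here", "home"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def filter_noisy_words(texts, min_doc_freq=2):
    """Keep words that are not stop words and occur in >= min_doc_freq texts."""
    doc_freq = Counter()
    for text in texts:
        doc_freq.update(set(tokenize(text)))  # count each word once per text
    return [
        [w for w in tokenize(text)
         if w not in STOP_WORDS and doc_freq[w] >= min_doc_freq]
        for text in texts
    ]


# Toy text snippets found near images returned for one query (hypothetical).
texts = [
    "Click here: a leopard in the grass",
    "The leopard on a tree",
    "Home page of the zoo, leopard and tree",
]

filtered = filter_noisy_words(texts)
for words in filtered:
    print(words)
```

The document-frequency threshold trades recall for precision: a higher value removes more page-specific noise but may also discard rare words that are informative.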