5.4 Object-aware Image Search System
5.4.3 Image Retrieval
Figure 5.4: Example results of object-aware retrieval. The experiment is performed on the VOC [2] image database. In each result, the left image is the user query and the right images are the top-ranked retrieved images.
As mentioned above, each query consists of the names, positions, and sizes of the expected objects, and the goal of the system is to find images with a similar object layout. Given a query, the system first retrieves all images containing similar kinds and numbers of objects. Then, given N matched objects between the query image and a retrieved image, the system scores the retrieved image using the distance and overlap ratio of each pair of matched objects. For each matched object i, distance(i) is the distance between the object's center point in the query and in the retrieved image, normalized by the object size, and overlap(i) is the intersection ratio between the object's regions in the query and in the retrieved image. With these terms, the score of a retrieved image is calculated as in Equation 5.1:
$$\mathrm{score} = \sum_{i=1}^{N} \Big( A \times \mathrm{overlap}(i) + B \times \big(1 - \mathrm{distance}(i)\big) \Big) \times \mathrm{confidence}(i) \qquad (5.1)$$

where A and B are constant coefficients that control the weights of overlap and distance, respectively. In practice, we set A = B = 1. The retrieved images are then sorted in descending order of score and returned to the search client. Some example retrieval results are shown in Figure 5.4.
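The filtering and scoring steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the system's actual implementation. The record type `Obj` and the functions `iou`, `norm_distance`, `match`, `score_image`, and `rank` are hypothetical names. Boxes are assumed to be (x, y, w, h) tuples; "similar kind and amount of objects" is approximated by requiring exactly the same multiset of object names; same-name objects are paired in list order; overlap(i) is assumed to be intersection over union; distance(i) is assumed to be the center distance divided by the query box diagonal; and confidence(i) is assumed to be a value attached to each object in the retrieved image.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Obj:
    """Hypothetical object record: category name, (x, y, w, h) box, confidence."""
    name: str
    box: tuple
    confidence: float = 1.0


def iou(a, b):
    """Intersection over union of two (x, y, w, h) boxes (assumed form of overlap(i))."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def norm_distance(q, r):
    """Center distance divided by the query box diagonal (assumed normalization)."""
    qx, qy, qw, qh = q
    rx, ry, rw, rh = r
    d = math.hypot((qx + qw / 2) - (rx + rw / 2), (qy + qh / 2) - (ry + rh / 2))
    return d / math.hypot(qw, qh)


def match(query, image):
    """Pair same-name objects in list order; None if the name multisets differ."""
    if Counter(o.name for o in query) != Counter(o.name for o in image):
        return None
    pool = {}
    for o in image:
        pool.setdefault(o.name, []).append(o)
    return [(q, pool[q.name].pop(0)) for q in query]


def score_image(query, image, A=1.0, B=1.0):
    """Equation 5.1 summed over matched pairs; confidence taken from the retrieved object."""
    pairs = match(query, image)
    if pairs is None:
        return None
    return sum(
        (A * iou(q.box, r.box) + B * (1 - norm_distance(q.box, r.box))) * r.confidence
        for q, r in pairs
    )


def rank(query, database, A=1.0, B=1.0):
    """Score candidates that pass the category filter; sort by descending score."""
    scored = []
    for image_id, objects in database.items():
        s = score_image(query, objects, A, B)
        if s is not None:
            scored.append((image_id, s))
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical query and database; image ids and boxes are made up.
    query = [Obj("person", (10, 10, 40, 80)), Obj("dog", (60, 60, 30, 30))]
    database = {
        "img_a": [Obj("person", (12, 10, 40, 80), 0.9), Obj("dog", (62, 60, 30, 30), 0.8)],
        "img_b": [Obj("person", (100, 10, 40, 80), 0.9), Obj("dog", (0, 0, 30, 30), 0.8)],
        "img_c": [Obj("cat", (10, 10, 40, 80), 0.9)],
    }
    for image_id, s in rank(query, database):
        print(image_id, round(s, 3))
```

In this example, img_c is dropped by the category filter, and img_a ranks above img_b because its objects are nearly aligned with the query layout. With this normalization, a term can become negative when distance(i) exceeds 1, which Equation 5.1 as written permits.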
5.5 Discussion
In this chapter, we propose performing object-aware search based on daily object detection. The performance of such object-aware image retrieval relies heavily on the joint prediction of multiple objects, which has not yet been well investigated in object-related research. Our context-based object prediction provides a preliminary context-based refinement of the final result and hence achieves promising performance.
Although its effectiveness has been shown on exemplar retrieval tasks, our work still needs further refinement to produce more convincing results, e.g., presenting statistical results to substantiate the performance of object-aware image retrieval, and conducting a user experience study that compares such a system with competing solutions.
Other future work might focus on improving retrieval performance, for example by using more elaborate contextual object layout models that encode object positions, scales, and interactions more robustly. The system could also support richer query input, such as descriptions of object pose, color, and texture, which could be recognized by extending our object-aware recognition system.
Conclusion and Future Work
In this work, we investigated the application of object recognition techniques to Web-based systems. We proposed an implementation framework for object-aware recognition and built several applications for the recognition and retrieval of human facial traits, human appearance, and daily objects. Advanced object recognition techniques were integrated into these applications and proved important for building more intelligent systems.
We first introduced our implementation of object and object part detection, inspired by recent studies [7, 32]. This implementation achieved promising performance in both object and object part detection and laid a solid foundation for our subsequent research.
We then applied object detection to several recognition systems:
• The Facial Traits Recognition System
We studied face description using face and face landmark detection and achieved promising performance. Moreover, our system showed much better estimation stability on Web photos than previous studies.
• The Human Appearance Recognition System
We investigated human description using automatically located human key parts and, for the first time, implemented a comprehensive and interactive human clothes style recognition and retrieval system.
• The Object-Layout-Aware Retrieval System
We applied daily object detection to daily photos and proposed a novel image representation using object-aware indices. Based on this representation, we built a photo database from which desired user photos can be retrieved with simple object layout queries.
In addition, to achieve recognition in the Web scenario, several datasets, including the Web face photo dataset, the Web video face context dataset, the human clothes style dataset, and the human role dataset, were constructed to learn the recognition models and evaluate system performance. These datasets were shown to be more valuable for Web photo recognition than previous research-oriented datasets.
Although our system prototypes showed great potential for application in current Web systems, the implementation is still immature in several aspects. The drawbacks of our current implementation stem mainly from insufficient algorithm optimization, which limits system efficiency. Moreover, the comprehensiveness and scale of our current object understanding, such as the number of object categories, the parsing of object views and poses, and the understanding of the surrounding environment, can be further improved. Several possible future directions for extending the proposed system are therefore:
• Computation Cost Optimization
The total cost of a Web system is directly related to the computational cost of its algorithms. Therefore, reducing the computational cost, most of which comes from the object detection stage, is an important direction. Besides, real-time systems operating on surveillance videos are also feasible, as demonstrated by recent studies [29, 115].
• Prior Knowledge from Global Image Information
Many studies [112, 45], including part of this work, have shown that global image descriptions can greatly help the recognition of objects and object attributes. Currently, only a naive combination with global image descriptions has been investigated, i.e., simple feature concatenation. Better frameworks for combining scene and object understanding could therefore be studied to provide stronger support for object-level analysis.
• Large-Scale Extensions
Once system efficiency is improved, large-scale systems that model more object and object attribute categories can be implemented. A larger set of object models might enable more applications and could also boost performance through the object co-occurrence prior.
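The naive combination by feature concatenation mentioned under Prior Knowledge from Global Image Information can be illustrated with a minimal sketch. It is a generic example rather than the exact pipeline used in this work: the function names `l2_normalize` and `concat_features` are hypothetical, and the per-block L2 normalization before concatenation is an assumption, added so that neither descriptor dominates merely because of its scale.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit L2 norm; an all-zero vector stays all zeros."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else [0.0 for _ in vec]


def concat_features(global_desc, object_desc):
    """Naive fusion: L2-normalize each descriptor block, then concatenate them."""
    return l2_normalize(global_desc) + l2_normalize(object_desc)


if __name__ == "__main__":
    # Hypothetical 2-D global descriptor and 3-D object-level descriptor.
    fused = concat_features([3.0, 4.0], [0.0, 0.0, 2.0])
    print(fused)
```

The fused vector would then be passed to a classifier as a single feature. Frameworks that model scene and object understanding jointly, as suggested above, would replace this fixed stacking step.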
Finally, we wish to re-emphasize the importance of object-aware recognition techniques for current image and video processing systems, and we hope to witness a revolution in our daily lives brought about by research in this area.
Bibliography
[1] V. Ferrari, M. Marin-Jimenez, and A. Zisserman, “Progressive search space reduction for human pose estimation,” in IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1–8, 2008.
[2] M. Everingham, L. Van Gool, C. Williams, J. Winn, and A. Zisserman, “The PASCAL Visual Object Classes Challenge 2007 (VOC2007) Results.” http://www.pascal-network.org/challenges/VOC/voc2007/workshop/index.html.
[3] F.-F. Li, R. Fergus, and P. Perona, “Learning generative visual models from few training examples: An incremental bayesian approach tested on 101 object categories,” Computer Vision and Image Understanding, vol. 106, no. 1, pp. 59–70, 2007.
[4] S. Lazebnik, C. Schmid, and J. Ponce, “Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories,” in IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2169–2178, 2006.
[5] B. Yao and F.-F. Li, “Modeling mutual context of object and human pose in human-object interaction activities,” in IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), pp. 17–24, 2010.
[6] N. Dalal and B. Triggs, “Histograms of oriented gradients for human detection,” in IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), pp. 886–893, 2005.
[7] P. Felzenszwalb, R. Girshick, D. McAllester, and D. Ramanan, “Object detection with discriminatively trained part based models,” IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), vol. 32, no. 9, pp. 1627–1645, 2010.
[8] T. Leung and J. Malik, “Representing and recognizing the visual appearance of materials using three-dimensional textons,” International Journal of Computer Vision (IJCV), vol. 43, no. 1, pp. 29–44, 2001.
[9] D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” International Journal of Computer Vision (IJCV), vol. 60, no. 2, pp. 91–110, 2004.
[10] T. Ojala, M. Pietikainen, and D. Harwood, “A comparative study of texture measures with classification based on feature distributions,” Pattern Recognition, pp. 51–59, 1995.
[11] D. Martin, C. Fowlkes, and J. Malik, “Learning to detect natural image boundaries using local brightness, color, and texture cues,” IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), vol. 26, no. 5, pp. 530–549, 2004.
[12] J. H. van Hateren and A. van der Schaaf, “Independent component filters of natural images compared with simple cells in primary visual cortex,” Proceedings of the Royal Society B: Biological Sciences, vol. 265, no. 1394, pp. 359–366, 1998.
[13] L. Zhang, M. Tong, T. Marks, H. Shan, and G. W. Cottrell, “SUN: A Bayesian framework for saliency using natural statistics,” Journal of Vision, vol. 8, pp. 1–20, 2008.
[14] P. Viola and M. Jones, “Robust real-time face detection,” International Journal of Computer Vision (IJCV), vol. 57, pp. 137–154, 2004.
[15] T. Ahonen, “Face description with local binary patterns: Application to face recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), no. 12, pp. 2037–2041, 2006.
[16] L. Bourdev, S. Maji, and J. Malik, “Describing people: A poselet-based approach to attribute classification,” in International Conference on Computer Vision (ICCV), pp. 1543–1550, 2011.
[17] D. D. Lewis, “Naive (bayes) at forty: The independence assumption in information retrieval,” in European Conference on Machine Learning, pp. 4–15, 1998.
[18] F.-F. Li and P. Perona, “A bayesian hierarchical model for learning natural scene categories,” in IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), pp. 524–531, 2005.
[19] A. Torralba, K. P. Murphy, W. T. Freeman, and M. A. Rubin, “Context-based vision system for place and object recognition,” in International Conference on Computer Vision (ICCV), pp. 273–280, 2003.
[20] S. Yan, X. Zhou, M. Liu, M. Hasegawa-Johnson, and T. S. Huang, “Regression from patch-kernel,” in IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1–8, 2008.
[21] F. Perronnin, J. Sánchez, and T. Mensink, “Improving the fisher kernel for large-scale image classification,” in European Conference on Computer Vision (ECCV), pp. 143–156, 2010.
[22] J. Wang, J. Yang, K. Yu, F. Lv, T. Huang, and Y. Gong, “Locality-constrained linear coding for image classification,” in IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3360–3367, 2010.
[23] C. J. C. Burges, “A tutorial on support vector machines for pattern recognition,” Data Mining and Knowledge Discovery, vol. 2, pp. 121–167, 1998.
[24] S. Yan, D. Xu, B. Zhang, H.-J. Zhang, Q. Yang, and S. Lin, “Graph embedding and extensions: A general framework for dimensionality reduction,” IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), vol. 29, no. 1, pp. 40–51, 2007.
[25] A. Rakotomamonjy, F. R. Bach, S. Canu, and Y. Grandvalet, “SimpleMKL,” Journal of Machine Learning Research, vol. 9, pp. 2491–2521, 2008.
[26] J. Wright, Y. Ma, J. Mairal, G. Sapiro, T. Huang, and S. Yan, “Sparse representation for computer vision and pattern recognition,” Proceedings of the IEEE, pp. 1031–1044, 2010.
[27] H. Liu, M. Palatucci, and J. Zhang, “Blockwise coordinate descent procedures for the multi-task lasso, with applications to neural semantic basis discovery,” in International Conference on Machine Learning (ICML), pp. 649–656, 2009.
[28] G. Liu, Z. Lin, and Y. Yong, “Robust recovery of subspace structures by low-rank representation,” IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), no. 1, pp. 171–184, 2012.
[29] P. Felzenszwalb, R. Girshick, and D. McAllester, “Cascade object detection with deformable part models,” in IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2241–2248, 2010.
[30] B. Alexe, T. Deselaers, and V. Ferrari, “What is an object?,” in IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), pp. 73–80, 2010.
[31] H. Ai, C. Huang, S. Lao, and B. Wu, “Specified object detection apparatus.” European Patent, 12 2004. EP1596323.
[32] Y. Yang and D. Ramanan, “Articulated pose estimation with flexible mixtures of parts,” in IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1385–1392, 2011.
[33] E. Borenstein and S. Ullman, “Learning to segment,” in European Conference on Computer Vision (ECCV), pp. 315–328, 2004.
[34] X. Ren and J. Malik, “Learning a classification model for segmentation,” in International Conference on Computer Vision (ICCV), pp. 10–17, 2003.
[35] P. F. Felzenszwalb and D. P. Huttenlocher, “Efficient graph-based image segmentation,” International Journal of Computer Vision (IJCV), vol. 59, no. 2, 2004.
[36] J. Shi and J. Malik, “Normalized cuts and image segmentation,” IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), vol. 22, no. 8, pp. 888–905, 2000.
[37] F. Li, J. Carreira, and C. Sminchisescu, “Object recognition as ranking holistic figure-ground hypotheses,” in IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1712–1719, 2010.
[38] X. Boix, J. van de Weijer, A. Bagdanov, J. Serrat, and J. Gonzalez, “Harmony potentials for joint classification and segmentation,” in IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3280–3287, 2010.
[39] T. Brox, L. Bourdev, S. Maji, and J. Malik, “Object segmentation by alignment of poselet activations to image contours,” in IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2225–2232, 2011.
[40] T. Cootes, G. Edwards, and C. Taylor, “Active appearance models,” IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), vol. 23, no. 6, pp. 681–685, 2001.
[41] T. Cootes, C. Taylor, and D. Cooper, “Active shape models-their training and application,” Computer Vision and Image Understanding, vol. 61, no. 1, pp. 38–59, 1995.
[42] D. Ramanan, “Learning to parse images of articulated bodies,” in Neural Information Processing Systems (NIPS), pp. 1129–1136, 2006.
[43] L. Bourdev, S. Maji, T. Brox, and J. Malik, “Detecting people using mutually consistent poselet activations,” in European Conference on Computer Vision (ECCV), pp. 168–181, 2010.
[44] X. Zhu and D. Ramanan, “Face detection, pose estimation, and landmark localization in the wild,” in IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2879–2886, 2012.
[45] Z. Song, M. Wang, X. Hua, and S. Yan, “Predicting occupation via human clothing and contexts,” in International Conference on Computer Vision (ICCV), pp. 1084–1091, 2011.
[46] P. Welinder, S. Branson, T. Mita, C. Wah, F. Schroff, S. Belongie, and P. Perona, “Caltech-UCSD Birds 200,” Tech. Rep. CNS-TR-2010-001, California Institute of Technology, 2010.
[47] A. Farhadi, I. Endres, D. Hoiem, and D. Forsyth, “Describing objects by their attributes,” in IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1778–1785, 2009.
[48] D. Parikh and K. Grauman, “Relative attributes,” in International Conference on Computer Vision (ICCV), pp. 503–510, 2011.
[49] T. Sim, S. Baker, and M. Bsat, “The CMU pose, illumination, and expression database,” IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), no. 12, pp. 1615–1618, 2003.
[50] G. Guo, G. Mu, Y. Fu, C. Dyer, and T. Huang, “A study on automatic age estimation using a large database,” in International Conference on Computer Vision (ICCV), pp. 1986–1991, 2009.
[51] N. Kumar, A. Berg, P. Belhumeur, and S. Nayar, “Attribute and simile classifiers for face verification,” in International Conference on Computer Vision (ICCV), pp. 365–372, 2009.
[52] S. Liu, Z. Song, G. Liu, C. Xu, H. Lu, and S. Yan, “Street-to-shop: Cross-scenario clothing retrieval via parts alignment,” in IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3330–3337, 2012.
[53] N. Singhai and S. K. Shandilya, “A survey on: content based image retrieval systems,” International Journal of Computer Applications, vol. 4, no. 2, pp. 22–26, 2010.
[54] Y. Fu, G. Guo, and T. S. Huang, “Age synthesis and estimation via faces: A survey,” IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), vol. 32, no. 11, pp. 1955–1976, 2010.
[55] Z. Cao, Q. Yin, X. Tang, and J. Sun, “Face recognition with learning-based descriptor,” in IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2707–2714, 2010.
[56] V. Ferrari, L. Fevrier, F. Jurie, and C. Schmid, “Groups of adjacent contour segments for object detection,” IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), vol. 30, no. 1, pp. 36–51, 2008.
[57] W. Zhou, H. Li, Y. Lu, and Q. Tian, “Principal visual word discovery for automatic license plate detection,” IEEE Transactions on Image Processing (TIP), vol. 21, pp. 4269–4279, 2012.
[58] V. Gulshan, M. Varma, and A. Zisserman, “Multiple kernels for object detection,” in IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), pp. 606–613, 2009.
[59] G. Guo, Y. Fu, C. Dyer, and T. Huang, “Image-based human age estimation by manifold learning and locally adjusted robust regression,” IEEE Transactions on Image Processing (TIP), vol. 17, pp. 1178–1188, 2008.
[60] K. Chatfield, V. Lempitsky, and A. Vedaldi, “The devil is in the details: an evaluation of recent feature encoding methods,” in British Machine Vision Conference (BMVC), pp. 76.1–76.12, 2011.
[61] C. Yu and T. Joachims, “Learning structural SVMs with latent variables,” in International Conference on Machine Learning (ICML), pp. 1169–1176, 2009.
[62] B. Leibe and B. Schiele, “Analyzing appearance and contour based methods for object categorization,” in IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), pp. 409–415, 2003.
[63] B. Ni, Z. Song, and S. Yan, “Web image mining towards universal age estimator,” in International ACM Multimedia Conference, pp. 85–94, 2009.
[64] Z. Song, B. Ni, D. Guo, T. Sim, and S. Yan, “Learning universal multi-view age estimator by video contexts,” in International Conference on Computer Vision (ICCV), pp. 241–248, 2011.
[65] S. Vijayanarasimhan and K. Grauman, “Large-scale live active learning: Training object detectors with crawled data and crowds,” in IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1449–1456, 2011.
[66] S. Branson, C. Wah, F. Schroff, B. Babenko, P. Welinder, P. Perona, and S. Belongie, “Visual recognition with humans in the loop,” in European Conference on Computer Vision (ECCV), pp. 438–451, 2010.
[67] “The Flickr API.” http://www.flickr.com/services/api/.
[68] B. Russell, A. Torralba, K. Murphy, and W. T. Freeman, “LabelMe: a database and web-based tool for image annotation,” International Journal of Computer Vision (IJCV), vol. 77, pp. 157–173, 2008.
[69] W. Zhao, R. Chellappa, P. J. Phillips, and A. Rosenfeld, “Face recognition: A literature survey,” ACM Computing Surveys, pp. 399–458, 2003.
[70] J. Wright, A. Yang, A. Ganesh, S. Sastry, and Y. Ma, “Robust face recognition via sparse representation,” IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), vol. 31, no. 2, pp. 210–227, 2009.
[71] A. Wagner, J. Wright, A. Ganesh, Z. Zhou, and Y. Ma, “Towards a practical face recognition system: robust registration and illumination by sparse representation,” in IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), pp. 597–604, 2009.
[72] Y. Hu, A. S. Mian, and R. Owens, “Face recognition using sparse approximated nearest points between image sets,” IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), no. 10, pp. 1992–2004, 2012.
[73] J. Hayashi, M. Yasumoto, H. Ito, and H. Koshimizu, “Method for estimating and modeling age and gender using facial image processing,” in International Conference on Virtual Systems and Multimedia, pp. 439–448, 2001.
[74] Y. Kwon and N. Lobo, “Age classification from facial images,” Computer Vision and Image Understanding, vol. 74, no. 1, pp. 1–21, 1999.
[75] A. Lanitis, C. Draganova, and C. Christodoulou, “Comparing different classifiers for automatic age estimation,” IEEE Transactions on Systems, Man and Cybernetics, Part B, vol. 34, no. 1, pp. 621–628, 2004.
[76] Y. Fu and T. Huang, “Human age estimation with regression on discriminative aging manifold,” IEEE Transactions on Multimedia, vol. 10, no. 4, pp. 578–584, 2008.
[77] X. Geng, Z. H. Zhou, and K. Smith-Miles, “Automatic age estimation based on facial aging patterns,” IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), vol. 29, no. 12, pp. 2234–2240, 2007.
[78] G. Guo, G. Mu, Y. Fu, and T. S. Huang, “Human age estimation using bio-inspired features,” in IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), pp. 112–119, 2009.
[79] A. Georghiades, P. Belhumeur, and D. Kriegman, “From few to many: Illumination cone models for face recognition under variable lighting and pose,” IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), vol. 23, no. 6, pp. 643–660, 2001.
[80] “The FG-NET aging database.” http://sting.cycollege.ac.cy/~alanitis/fgnetaging.html.
[81] K. Ricanek and T. Tesafaye, “MORPH: A longitudinal image database of normal adult age-progression,” in IEEE Conference on Automatic Face and Gesture Recognition, pp. 341–345, 2006.
[82] S. Yan, H. Wang, X. Tang, and T. S. Huang, “Learning auto-structured regressor from uncertain nonnegative labels,” in International Conference on Computer Vision (ICCV), pp. 1–8, 2007.
[83] S. Yan, H. Wang, T. S. Huang, and X. Tang, “Ranking with uncertain labels,” in IEEE Conference on Multimedia and Expo, pp. 96–99, 2007.
[84] H. Takimoto, Y. Mitsukura, M. Fukumi, and N. Akamatsu, “Robust gender and age estimation under varying facial pose,” Electronics and Communications in Japan, vol. 91, no. 7, pp. 32–40, 2008.
[85] Z. Li, Y. Fu, and T. S. Huang, “A robust framework for multiview age estimation,” in IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 9–16, 2010.
[86] R. Gross, I. Matthews, J. F. Cohn, T. Kanade, and S. Baker, “Multi-PIE,” in IEEE Conference on Automatic Face and Gesture Recognition, pp. 1–8, 2008.
[87] H. Chen, Z. Xu, Z. Liu, and S. Zhu, “Composite templates for cloth modeling and sketching,” in IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), pp. 943–950, 2006.
[88] B. Hasan and D. Hogg, “Segmentation using deformable spatial priors with application to clothing,” in British Machine Vision Conference (BMVC), pp. 1–11, 2010.
[89] N. Wang and H. Ai, “Who blocks who: Simultaneous clothing segmentation for grouping images,” in International Conference on Computer Vision (ICCV), pp. 1535–1542, 2011.
[90] Y. Song and T. Leung, “Context-aided human recognition-clustering,” in European Conference on Computer Vision (ECCV), pp. 382–395, 2006.
[91] X. Yang, S. Yuan, and Y. Tian, “Recognizing clothes patterns for blind people by confidence margin based feature combination,” in International ACM Multimedia Conference, pp. 1097–1100, 2011.
[92] M. Yang and K. Yu, “Real-time clothing recognition in surveillance videos,” in International Conference on Image Processing (ICIP), pp. 2937–2940, 2011.
[93] D. Anguelov, K. Lee, S. Gokturk, and B. Sumengen, “Contextual identity recognition in personal photo albums,” in IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1–7, 2007.
[94] J. Sivic, C. Zitnick, and R. Szeliski, “Finding people in repeated shots of the same scene,” in British Machine Vision Conference (BMVC), pp. 909–918, 2006.
[95] W. Zhang, T. Zhang, and D. Tretter, “Clothing-based person clus-