Future Work - Learning with Contexts

This dissertation opens up some interesting directions for further investigation. We describe some of them in the following sections.

6.3.1 Spatial Context Modeling in Visual Learning

In the current work, the proposed SRF framework is applied only to dense, regular-grid image features; that is, features are sampled uniformly on pre-defined regular image grids. However, in some applications, such as scene classification, object recognition, and activity analysis, sparse features (e.g., detected corners, salient points, SIFT points) convey more discriminative information because they are less affected by the background. In future work, we therefore plan to extend the SRF method to handle sparse detected features and to use them for activity classification and general object detection. To this end, new definitions of image neighborhood structure and new schemes for selecting the neighbors of a local feature need to be developed.
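As one illustration of what a neighborhood definition for sparse features might look like, the following minimal sketch assigns each detected point its k spatially nearest points as neighbors. This is only an assumed example scheme, not part of the SRF method; the function name `k_nearest_neighbors`, the parameter `k`, and the sample coordinates are hypothetical.

```python
import math


def k_nearest_neighbors(points, k):
    """For each (x, y) keypoint, return the indices of its k nearest other keypoints.

    Hypothetical helper: a brute-force spatial neighborhood for sparse features.
    """
    neighbors = []
    for i, (xi, yi) in enumerate(points):
        # Euclidean distance from keypoint i to every other keypoint.
        dists = [
            (math.hypot(xi - xj, yi - yj), j)
            for j, (xj, yj) in enumerate(points)
            if j != i
        ]
        dists.sort()
        neighbors.append([j for _, j in dists[:k]])
    return neighbors


# Hypothetical keypoint locations (e.g., detected corners), in pixels.
keypoints = [(0, 0), (1, 0), (0, 2), (10, 10), (11, 10)]
neighborhoods = k_nearest_neighbors(keypoints, k=2)
```

Unlike a regular grid, where every feature has the same fixed set of neighbors, the neighbor set here depends on where the points happen to be detected, which is why a new neighbor selection scheme would be needed.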

6.3.2 Web Context Mining for Age Estimation

We see three directions for future study on web-context-based age estimation.

1) Since the current estimation accuracy can still be improved, we plan to investigate new facial feature representations and new regression models.

2) As online resources are ever-increasing, it would be reasonable to develop an incremental learning algorithm for learning a multi-instance regressor with noisy labels, which is practically valuable for two reasons. First, an online learning scheme can adapt to the shifting distribution of real-world data by continually incorporating new training resources. Second, since training on an extremely large database (e.g., a web-scale database from the Internet) can be computationally difficult, an online training scheme is well suited to this problem.

3) Age estimation from non-frontal faces has rarely been studied, mainly due to the difficulty of collecting a reliable non-frontal face database with precise age ground truth. However, in the real world, it is more practical to use non-frontal faces for age estimation, since frontal faces are generally harder to obtain.

CHAPTER 6. CONCLUSIONS AND FUTURE WORK

Fortunately, online shared videos provide an extremely good resource for obtaining non-frontal face samples, and the relationship between frontal and non-frontal faces could serve as implicit label information. Therefore, we plan to investigate how to learn a non-frontal-face-based age estimator from web context, e.g., online shared videos, without any labeled non-frontal faces.
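To make the online training idea in direction 2) concrete, the sketch below updates a regressor one sample at a time, so the full database never needs to be held in memory. It is a deliberately simplified assumption: a plain linear model with a clipped (Huber-style) gradient to limit the influence of badly mislabeled samples, not a multi-instance regressor. The class name `OnlineLinearRegressor`, its parameters, and the synthetic data stream are all hypothetical.

```python
class OnlineLinearRegressor:
    """Hypothetical incremental regressor trained by per-sample gradient steps."""

    def __init__(self, dim, lr=0.05, clip=1.0):
        self.w = [0.0] * dim   # weight vector
        self.b = 0.0           # bias term
        self.lr = lr           # learning rate
        self.clip = clip       # bound on the residual used in each update

    def predict(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, x)) + self.b

    def partial_fit(self, x, y):
        """Update the model with a single (feature vector, label) pair."""
        err = self.predict(x) - y
        # Clip the residual so that one noisy label cannot dominate the update.
        g = max(-self.clip, min(self.clip, err))
        self.w = [wi - self.lr * g * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * g


# Synthetic stream following y = 2x + 1 (illustrative data only).
model = OnlineLinearRegressor(dim=1, lr=0.05, clip=1.0)
stream = [([i / 10.0], 2.0 * (i / 10.0) + 1.0) for i in range(11)]
for _ in range(500):
    for x, y in stream:
        model.partial_fit(x, y)
```

Because each update touches only one sample, newly collected web data can be fed to `partial_fit` as it arrives, which reflects both motivations given above: adapting to new data and avoiding a single training pass over a web-scale database.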

List of Publications

1) Q. Tian, S. Zhang, W. Zhou, R. Ji, B. Ni and N. Sebe. Building Descriptive and Discriminative Visual Codebook for Large-scale Image Applications. Submitted to Multimedia Tools and Applications, 2010 (invited paper).

2) B. Ni, S. Yan, Q. Tian and A. Kassim. High-order Context Modeling by Spatialized Random Forest. Submitted to IEEE Transactions on Image Processing, 2011.

3) B. K. Bao, B. Ni, Y. Mu and S. Yan. Efficient Region-aware Image Similarity Towards Scalable Multi-label Propagation. Pattern Recognition, 2010 (in press).

4) Y. Zhou, B. Ni, S. Yan and T. S. Huang. Recognizing Pair-Activities by Causality Analysis. ACM Transactions on Intelligent Systems and Technology, 2010 (in press).

5) J. Feng, B. Ni and S. Yan. Histogram Contextualization. IEEE Transactions on Image Processing, 2010 (in press).

6) B. Ni, Z. Song and S. Yan. Web Image and Video Mining towards Universal and Robust Age Estimator. IEEE Transactions on Multimedia, 2010 (in press).

7) B. Ni, S. Yan and A. Kassim. Learning a Propagable Graph for Semi-supervised Learning: Classification and Regression. IEEE Transactions on Knowledge and Data Engineering, 2009 (in press).

8) B. Ni, A. Kassim and S. Winkler. A Hybrid Framework for 3D Human Motion Tracking. IEEE Transactions on Circuits and Systems for Video Technology, Vol. 18, No. 8, pp. 1075-1084, 2008.

9) J. Feng, B. Ni, Q. Tian and S. Yan. Geometric Lp-norm Feature Pooling for Image Classification. CVPR, 2011.

10) J. Feng, B. Ni and S. Yan. Auto-generate Professional Background Music for Home-made Videos. ICIMCS, 2010.

11) B. Cheng, B. Ni, S. Yan and Q. Tian. Learning to Photograph. ACM Multimedia, 2010 (full paper).

12) B. Ni, Z. Song and S. Yan. Web Image Mining Towards Universal Age Estimator. ACM Multimedia, 2009 (full paper).

13) B. Ni, S. Yan and A. Kassim. Contextualizing Histogram. CVPR, 2009.

14) B. Ni, S. Yan and A. Kassim. Recognizing Human Group Activities with Localized Causalities. CVPR, 2009.

15) B. Ni, S. Yan and A. Kassim. Directional Stationary Markov Features. ICASSP, 2009.

16) B. Ni, S. Yan, A. Kassim and L. F. Cheong. Learning by Propagability. International Conference on Data Mining, 2008 (full paper).

17) B. Ni, S. Winkler and A. Kassim. An Efficient Stochastic Framework for 3D Human Motion Tracking. SPIE2008, 2008.

18) B. Ni, S. Winkler and A. Kassim. Articulated Object Registration Using Simulated Physical Force/Moment for 3D Human Motion Tracking. 2nd Workshop on Human Motion in Conjunction with ICCV07, 2007.

19) A. K. Mishra, B. Ni, S. Winkler and A. Kassim. 3D Surveillance System Using Multiple Cameras. SPIE2007, 2007.

20) B. Ni and S. Yan. Human Group Activities: Database and Algorithms. Advanced Topics in Biometrics, World Scientific Publishing, 2009.

21) B. Ni, S. Yan, G. Zhu, Z. Song, D. Guo, Y. Lu and J. Yan. A Vision-based Demographic Advertisement System. ICCV, 2009.
