Gaussian Mixture Model (GMM) clustering

Gaussian Mixture Model (GMM) clustering can also be used for quantizing the visual feature space. Unlike K-means, GMM use has not been as widespread in the computer vision community for partitioning visual feature spaces, although recent results in image classification are very promising [46], [37].

For GMM clustering, a set of $L$ feature vectors $X = \{\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_L\}$, where $\bar{x}_i \in \mathbb{R}^D$, $i = 1, 2, \ldots, L$, derived from training videos, is modeled by a mixture of $m$ Gaussian distributions that are completely specified by the set of parameters $\Theta$. The set $\Theta = \{\pi_1, \bar{\mu}_1, \Sigma_1, \ldots, \pi_m, \bar{\mu}_m, \Sigma_m\}$ comprises prior probabilities $\pi_l \in \mathbb{R}_+$, mean values $\bar{\mu}_l \in \mathbb{R}^D$ and covariance matrices $\Sigma_l \in \mathbb{R}^{D \times D}$, where $l = 1, 2, \ldots, m$. A sample $\bar{x}_i$ from the set $X$ is characterized by its density $p(\bar{x}_i \mid \Theta)$:

\[ p(\bar{x}_i \mid \Theta) = \sum_{l=1}^{m} \pi_l \, p_l(\bar{x}_i \mid \bar{\mu}_l, \Sigma_l) \]

where $p_l$, $l = 1, 2, \ldots, m$, is the Gaussian probability density function (pdf) of the $l$-th component, with mean $\bar{\mu}_l \in \mathbb{R}^D$ and covariance matrix $\Sigma_l \in \mathbb{R}^{D \times D}$, and $\pi_l \in \mathbb{R}_+$ is its prior probability. We assume that the feature dimensions are uncorrelated, which leads to diagonal covariance matrices, so the GMM is fully described by $(2D + 1)m$ scalar parameters. The GMM is fit to the data $X = \{\bar{x}_1, \ldots, \bar{x}_L\}$.
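As a hedged illustration of the mixture density and of the parameter count under the diagonal-covariance assumption, the minimal Python sketch below evaluates the density of one sample for a toy model. The function names (`diag_gaussian_pdf`, `gmm_density`, `num_scalar_parameters`) and the toy parameter values are assumptions made for this example only; they do not come from the original work.

```python
import math

def diag_gaussian_pdf(x, mean, var):
    """Density of a D-dimensional Gaussian with diagonal covariance diag(var)."""
    log_p = 0.0
    for x_d, mu_d, var_d in zip(x, mean, var):
        # Each dimension contributes an independent 1-D Gaussian log-density.
        log_p += -0.5 * (math.log(2.0 * math.pi * var_d) + (x_d - mu_d) ** 2 / var_d)
    return math.exp(log_p)

def gmm_density(x, weights, means, variances):
    """Mixture density: sum over components of prior * component pdf."""
    return sum(w * diag_gaussian_pdf(x, mu, var)
               for w, mu, var in zip(weights, means, variances))

def num_scalar_parameters(m, D):
    """m priors + m*D means + m*D variances = (2D + 1) * m."""
    return (2 * D + 1) * m

# Hypothetical toy model: m = 2 components in D = 2 dimensions.
weights = [0.4, 0.6]
means = [[0.0, 0.0], [3.0, 3.0]]
variances = [[1.0, 1.0], [0.5, 2.0]]

density_at_origin = gmm_density([0.0, 0.0], weights, means, variances)
param_count = num_scalar_parameters(m=2, D=2)
```

With two components in two dimensions, the count is two priors, four mean entries and four variances, which matches the (2D + 1)m expression in the text.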

Expectation Maximization (EM) is then used to learn these parameters. At the assignment step of the algorithm (Expectation step or E-step), the soft assignments $q_{ik}$ of each sample $\bar{x}_i$, $i = 1, 2, \ldots, L$, to each distribution $k = 1, 2, \ldots, m$ are defined as follows:

\[ q_{ik} = \frac{p_k(\bar{x}_i \mid \bar{\mu}_k, \Sigma_k) \cdot \pi_k}{\sum_{l=1}^{m} p_l(\bar{x}_i \mid \bar{\mu}_l, \Sigma_l) \cdot \pi_l} \]
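The assignment step can be sketched in a few lines of Python. This is an illustrative sketch under the diagonal-covariance assumption stated earlier, not the author's implementation; the helper names `diag_gaussian_pdf` and `e_step` and the toy data are hypothetical.

```python
import math

def diag_gaussian_pdf(x, mean, var):
    """Density of a Gaussian with diagonal covariance diag(var)."""
    log_p = sum(-0.5 * (math.log(2.0 * math.pi * v) + (a - mu) ** 2 / v)
                for a, mu, v in zip(x, mean, var))
    return math.exp(log_p)

def e_step(X, weights, means, variances):
    """Return q, where q[i][k] is the soft assignment of sample i to component k."""
    q = []
    for x in X:
        # Numerator of the assignment formula for every component k.
        joint = [w * diag_gaussian_pdf(x, mu, var)
                 for w, mu, var in zip(weights, means, variances)]
        total = sum(joint)  # Denominator: sum over all m components.
        q.append([j / total for j in joint])
    return q

# Hypothetical toy data: one sample near each of two component means.
X = [[0.1, -0.2], [2.9, 3.2]]
weights = [0.5, 0.5]
means = [[0.0, 0.0], [3.0, 3.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]

q = e_step(X, weights, means, variances)
```

Each row of `q` sums to one. For high-dimensional descriptors the densities can underflow to zero, so a practical implementation would probably carry out this normalization in the log domain; that refinement is omitted here for brevity.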

At the update step (Maximization step or M-step), the mixture parameter estimates are refined using the computed probabilities:

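\[ \pi_k = \frac{1}{L} \sum_{i=1}^{L} q_{ik}, \qquad \bar{\mu}_k = \frac{\sum_{i=1}^{L} q_{ik} \, \bar{x}_i}{\sum_{i=1}^{L} q_{ik}}, \]

\[ \Sigma_k = \frac{\sum_{i=1}^{L} q_{ik} \, (\bar{x}_i - \bar{\mu}_k)(\bar{x}_i - \bar{\mu}_k)^T}{\sum_{i=1}^{L} q_{ik}}, \qquad k = 1, 2, \ldots, m. \]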
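The two steps can be combined into a complete EM loop, sketched below in plain Python under the diagonal-covariance assumption, so that only the diagonal of each covariance matrix is estimated. The function names, the variance floor `var_floor`, the fixed iteration count and the hand-picked initial parameters are all assumptions of this example; the text does not specify how EM is initialized or stopped.

```python
import math

def diag_gaussian_pdf(x, mean, var):
    """Density of a Gaussian with diagonal covariance diag(var)."""
    log_p = sum(-0.5 * (math.log(2.0 * math.pi * v) + (a - mu) ** 2 / v)
                for a, mu, v in zip(x, mean, var))
    return math.exp(log_p)

def e_step(X, weights, means, variances):
    """Soft assignments q[i][k] of sample i to component k."""
    q = []
    for x in X:
        joint = [w * diag_gaussian_pdf(x, mu, var)
                 for w, mu, var in zip(weights, means, variances)]
        total = sum(joint)
        q.append([j / total for j in joint])
    return q

def m_step(X, q, var_floor=1e-6):
    """Re-estimate priors, means and diagonal variances from the assignments."""
    L, D, m = len(X), len(X[0]), len(q[0])
    weights, means, variances = [], [], []
    for k in range(m):
        n_k = sum(q[i][k] for i in range(L))  # effective sample count of component k
        mu_k = [sum(q[i][k] * X[i][d] for i in range(L)) / n_k for d in range(D)]
        var_k = []
        for d in range(D):
            # Diagonal entry of the weighted scatter; off-diagonal terms are dropped.
            s = sum(q[i][k] * (X[i][d] - mu_k[d]) ** 2 for i in range(L)) / n_k
            var_k.append(max(s, var_floor))  # floor is an assumption, avoids zero variance
        weights.append(n_k / L)
        means.append(mu_k)
        variances.append(var_k)
    return weights, means, variances

def fit_gmm(X, weights, means, variances, n_iter=20):
    """Alternate E-step and M-step for a fixed number of iterations."""
    for _ in range(n_iter):
        q = e_step(X, weights, means, variances)
        weights, means, variances = m_step(X, q)
    return weights, means, variances

# Hypothetical toy data: two well-separated groups of three 2-D samples.
X = [[0.0, 0.1], [0.2, -0.1], [-0.1, 0.0],
     [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
weights, means, variances = fit_gmm(
    X,
    weights=[0.5, 0.5],
    means=[[0.0, 0.0], [5.0, 5.0]],
    variances=[[1.0, 1.0], [1.0, 1.0]],
)
```

On this toy set each component should settle on one group, with priors close to 0.5 and means close to the group centroids. A real vocabulary-learning run would operate on many high-dimensional descriptors and would likely use a convergence test on the log-likelihood instead of a fixed iteration count.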

[1] ComScore; US data. Video views for youtube vs its competitors, 2009.

[2] Central Intelligence Agency 2011. Current world population growth rate, 11 2011.

[3] A. Yilmaz and M. Shah. Recognizing human actions in videos acquired by uncalibrated moving cameras. In IEEE International Conference on Computer Vision (ICCV), 2005.

[4] D. Ramanan, D. A. Forsyth, and A. Zisserman. Tracking people by learning their appearance. IEEE Transactions on Pattern Analysis and Machine Intelligence (tPAMI), 29:65, 2007.

[5] A. F. Bobick and J. W. Davis. The recognition of human movement using temporal templates. IEEE Transactions on Pattern Analysis and Machine Intelligence (tPAMI), 23:257–267, 2001.

[6] L. Gorelick, M. Blank, E. Shechtman, M. Irani, and R. Basri. Actions as space-time shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence (tPAMI), pages 1395–1402, Dec 2007.

[7] D. Weinland, R. Ronfard, and E. Boyer. Free viewpoint action recognition using motion history volumes. In Computer Vision and Image Understanding (CVIU), 2006.

[8] A. A. Efros, A. C. Berg, G. Mori, and J. Malik. Recognizing action at a distance. In International Conference on Computer Vision (ICCV), 2003.

[9] I. Laptev and T. Lindeberg. Space-time interest points. In IEEE International Conference on Computer Vision (ICCV), pages 432–439 vol.1, 2003.

[10] J. Sun, X. Wu, S. Yan, L.-F. Cheong, T.-S. Chua, and J. Li. Hierarchical spatio-temporal context modeling for action recognition. In IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), pages 1–8, June 2009.

[11] R. Messing, C. Pal, and H. Kautz. Activity recognition using the velocity histories of tracked keypoints. In IEEE International Conference on Computer Vision (ICCV), Washington, DC, USA, 2009. IEEE Computer Society.

[12] I. Laptev, M. Marszalek, C. Schmid, and B. Rozenfeld. Learning realistic human actions from movies. In IEEE Conference on Computer Vision & Pattern Recognition (CVPR), Jun 2008.

[13] A. Gaidon, Z. Harchaoui, and C. Schmid. Temporal localization of actions with actoms. IEEE Transactions on Pattern Analysis and Machine Intelligence (tPAMI), 35(11):2782–2795, 2013.

[14] C. Schüldt, I. Laptev, and B. Caputo. Recognizing human actions: A local SVM approach. In IEEE International Conference on Pattern Recognition (ICPR), pages 32–36, 2004.

[15] M. D. Rodriguez, J. Ahmed, and M. Shah. Action MACH: a spatio-temporal maximum average correlation height filter for action recognition. In IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), 2008.

[16] L. Rybok, S. Friedberger, U. D. Hanebeck, and R. Stiefelhagen. The KIT Robo-Kitchen Data set for the Evaluation of View-based Activity Recognition Systems. In IEEE-RAS International Conference on Humanoid Robots, 2011.

[17] K. Avgerinakis and I. Kompatsiaris. Demcare action dataset for evaluating dementia patients in a home-based environment. In Ambient TeleCare session of Innovation in Medicine and Healthcare (InMed), Athens, 2013.

[18] C. Zach, T. Pock, and H. Bischof. A duality based approach for realtime tv-l1 optical flow. In Annual Symposium of the German Association for Pattern Recognition, pages 214–223, 2007.

[19] A. Bruhn, J. Weickert, and C. Schnorr. Lucas/Kanade meets Horn/Schunck: Combining local and global optic flow methods. International Journal of Computer Vision (IJCV), 61:211–231, 2005.

[20] A. Klaser, M. Marszalek, and C. Schmid. A spatio-temporal descriptor based on 3d-gradients. In British Machine Vision Conference (BMVC), 2008.

[21] H. Wang, A. Klaser, C. Schmid, and C.-L. Liu. Action Recognition by Dense Trajectories. In IEEE Conference on Computer Vision & Pattern Recognition (CVPR), pages 3169–3176, Colorado Springs, United States, June 2011. URL http://hal.inria.fr/inria-00583818/en.

[22] A. Gilbert, J. Illingworth, and R. Bowden. Action recognition using mined hierarchical compound features. IEEE Transactions on Pattern Analysis and Machine Intelligence (tPAMI), 33(5):883–897, 2011. ISSN 0162-8828. doi: 10.1109/TPAMI.2010.144.

[23] M. Zhang and A. A. Sawchuk. Human daily activity recognition with sparse representation using wearable sensors. pages 553–560, 2013.

[24] D. Olguin and A. Pentland. Human activity recognition: Accuracy across common locations for wearable sensors. In Proc. 10th Int. Symp. Wearable Computers, pages 11–13, 2006.

[25] S. Chen and Y. Huang. Recognizing human activities from multi-modal sensors. In IEEE International Conference on Intelligence and Security Informatics, pages 220–222, June 2009.

[26] K. Yatani and K. N. Truong. Bodyscope: a wearable acoustic sensor for activity recognition. In Proceedings of the 2012 ACM Conference on Ubiquitous Computing (UbiComp), pages 341–350, 2012.

[27] A. Bulling, J. A. Ward, H. Gellersen, and G. Troster. Eye movement analysis for activity recognition using electrooculography. IEEE Transactions on Pattern Analysis and Machine Intelligence (tPAMI), pages 741–753, 2011.

[28] H. Medjahed, D. Istrate, J. Boudy, J.-L. Baldinger, and B. Dorizzi. A pervasive multi-sensor data fusion for smart home healthcare monitoring. In Fuzzy Systems (FUZZ), pages 1466–1473, 2011.

[29] N. Krishnan and D. Cook. Activity recognition on streaming sensor data. Pervasive Mob. Comput., pages 138–154, 2014.

[30] D. J. Cook. Learning setting-generalized activity models for smart spaces. IEEE Intelligent Systems, volume 27, pages 32–38, Jan.-Feb. 2012.

[31] A. Bulling, U. Blanke, and B. Schiele. A tutorial on human activity recognition using body-worn inertial sensors. ACM Computing Surveys, 2014.

[32] Y. Song, L.-P. Morency, and R. Davis. Action recognition by hierarchical sequence summarization. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 3562–3569, June 2013.

[33] N. Zouba, F. Bremond, and M. Thonnat. Multisensor fusion for monitoring elderly activities at home. In Sixth IEEE International Conference on Advanced Video and Signal Based Surveillance, pages 98–103, Sept 2009.

[34] S. Jin, S.-Y. Jeong, C. Park, K. Oh, and H. J. Choi. An intelligent multi-sensor surveillance system for elderly care. Smart CR, pages 296–307, 2012.

[35] D. Surie, S. Partonia, and H. Lindgren. Human sensing using computer vision for personalized smart spaces. In 2013 IEEE 10th International Conference on Ubiquitous Intelligence and Computing and 10th International Conference on Autonomic and Trusted Computing (UIC/ATC), pages 487–494, Dec 2013.

[36] J. Han, L. Shao, D. Xu, and J. Shotton. Enhanced computer vision with Microsoft Kinect sensor: A review. IEEE Transactions on Cybernetics, 43(5):1318–1334, Oct 2013.

[37] H. Jegou, F. Perronnin, M. Douze, J. Sanchez, P. Perez, and C. Schmid. Aggregating local image descriptors into compact codes. IEEE Transactions on Pattern Analysis and Machine Intelligence (tPAMI), 34(9):1704–1716, 2012. ISSN 0162-8828. doi: 10.1109/TPAMI.2011.235.

[38] F. Perronnin, J. Sanchez, and T. Mensink. Improving the Fisher kernel for large-scale image classification. In European Conference on Computer Vision (ECCV), 2010.

[39] P. Dollar, V. Rabaud, G. Cottrell, and S. Belongie. Behavior recognition via sparse spatio-temporal features. In IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance, pages 65–72, 2005. doi: 10.1109/VSPETS.2005.1570899.

[40] A. Oikonomopoulos, I. Patras, and M. Pantic. Spatiotemporal salient points for visual recognition of human actions. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 36(3):710–719, 2005. ISSN 1083-4419. doi: 10.1109/TSMCB.2005.861864.

[41] L. Fei-Fei and P. Perona. A bayesian hierarchical model for learning natural scene categories. In IEEE Computer Vision and Pattern Recognition (CVPR), volume 2, pages 524–531 vol. 2, 2005. doi: 10.1109/CVPR.2005.16.

[42] E. Nowak, F. Jurie, and B. Triggs. Sampling strategies for bag-of-features image classification. In IEEE European Conference on Computer Vision (ECCV), pages 490–503. Springer, 2006.

[43] H. Wang and C. Schmid. Action recognition with improved trajectories. In IEEE International Conference on Computer Vision (ICCV), Sydney, Australia, 2013. URL http://hal.inria.fr/hal-00873267.

[44] D. Oneata, J. Verbeek, and C. Schmid. Action and event recognition with fisher vectors on a compact feature set. In IEEE International Conference in Computer Vision (ICCV), 2013.

[45] M. Jain, H. Jegou, and P. Bouthemy. Better exploiting motion for better action recognition. In IEEE Conference in Computer Vision and Pattern Recognition (CVPR), 2013.

[46] K. Chatfield, V. Lempitsky, A. Vedaldi, and A. Zisserman. The devil is in the details: an evaluation of recent feature encoding methods. In British Machine Vision Conference (BMVC), 2011.

[47] A. Klaser, M. Marszalek, C. Schmid, and A. Zisserman. Human focused action localization in video. In Kiriakos N. Kutulakos, editor, IEEE European Conference on Computer Vision (ECCV Workshops), volume 6553 of Lecture Notes in Computer Science, pages 219–233. Springer, 2010. ISBN 978-3-642-35748-0.

[48] I. Laptev and P. Perez. Retrieving actions in movies. In IEEE International Conference on Computer Vision (ICCV), 2007.

[49] M. Marszalek, I. Laptev, and C. Schmid. Actions in context. In IEEE Conference on Computer Vision & Pattern Recognition (CVPR), Jun 2009. URL http://lear.inrialpes.fr/pubs/2009/MLS09.

[50] J. C. Niebles, C.-W. Chen, and L. Fei-Fei. Modeling temporal structure of decomposable motion segments for activity classification. In IEEE European Conference on Computer Vision (ECCV), Crete, Greece, September 2010.

[51] G. Johansson. Visual perception of biological motion and a model for its analysis. Perception and Psychophysics, 14:201–211, 1973.

[52] V. Ferrari, M. Marin-Jimenez, and A. Zisserman. Progressive search space reduction for human pose estimation. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2008.

[53] J. Shotton, A. Fitzgibbon, M. Cook, T. Sharp, M. Finocchio, R. Moore, A. Kipman, and A. Blake. Real-time human pose recognition in parts from single depth images. In International Conference on Computer Vision and Pattern Recognition (CVPR), 2011.

[54] K. G. Derpanis, M. Sizintsev, K. Cannons, and R. P. Wildes. Efficient action spotting based on a spacetime oriented structure representation. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2010.

[55] S. Sadanand and J. J. Corso. Action bank: A high-level representation of activity in video. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012. URL http://www.cse.buffalo.edu/~jcorso/pubs/jcorso_CVPR2012_actionbank.pdf.

[56] I. Laptev. On space-time interest points. International Journal on Computer Vision (IJCV), 64:107–123, 2005.

[57] C. Harris and M. J. Stephens. A combined corner and edge detector. In Alvey Vision Conference, 1988.

[58] M. Bregonzio, S. Gong, and T. Xiang. Recognising action as clouds of space-time interest points. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1948–1955, June 2009. doi: 10.1109/CVPR.2009.5206779.

[59] G. Willems, T. Tuytelaars, and L. Van Gool. An efficient dense and scale-invariant spatio-temporal interest point detector. In Proceedings of the 10th European Conference on Computer Vision (ECCV): Part II, pages 650–663, Berlin, Heidelberg, 2008. ISBN 978-3-540-88685-3.

[60] H. Wang, M. M. Ullah, A. Klaser, I. Laptev, and C. Schmid. Evaluation of local spatio-temporal features for action recognition. In IEEE British Machine Vision Conference (BMVC), 2009.

[61] P. Scovanner, S. Ali, and M. Shah. A 3-dimensional SIFT descriptor and its application to action recognition. In Proceedings of the 15th International Conference on Multimedia, MULTIMEDIA ’07, pages 357–360, New York, NY, USA, 2007. ACM. ISBN 978-1-59593-702-5. doi: 10.1145/1291233.1291311. URL http://doi.acm.org/10.1145/1291233.1291311.

[62] P. Matikainen, M. Hebert, and R. Sukthankar. Trajectons: Action recognition through the motion analysis of tracked features. In International Conference on Computer Vision Workshop on Video-Oriented Object and Event Classification (ICCV Workshop), September 2009.

[63] S. Lazebnik, C. Schmid, and J. Ponce. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In International Conference on Computer Vision and Pattern Recognition (CVPR), 2006.

[64] M. M. Ullah, S. N. Parizi, and I. Laptev. Improving bag-of-features action recognition with non-local cues. In IEEE British Machine Vision Conference (BMVC), pages 1–11, 2010.

[65] T. Xiang and S. Gong. Video behaviour profiling and abnormality detection without manual labelling. In Tenth IEEE International Conference on Computer Vision (ICCV), volume 2, pages 1238–1245, Oct 2005. doi: 10.1109/ICCV.2005.248.

[66] T. Xiang and S. Gong. Video behavior profiling for anomaly detection. IEEE Transactions on Pattern Analysis and Machine Intelligence (tPAMI), 30(5):893–908, May 2008. ISSN 0162-8828. doi: 10.1109/TPAMI.2007.70731.

[67] G. Willems, J. H. Becker, T. Tuytelaars, and L. van Gool. Exemplar-based action recognition in videos. In British Machine Vision Conference (BMVC), 2009.

[68] Olivier Duchenne, Ivan Laptev, Josef Sivic, Francis Bach, and Jean Ponce. Automatic annotation of human action in video. In IEEE International Conference on Computer Vision (ICCV), 2009.

[69] Chao-Yeh Chen and Kristen Grauman. Efficient activity detection with max subgraph search. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012.

[70] Tian Lan, Yang Wang, and Greg Mori. Discriminative figure-centric models for joint action localization and recognition. In IEEE International Conference in Computer Vision (ICCV), 2011.

[71] Du Tran and Junsong Yuan. Optimal spatio-temporal path discovery for video event detection. pages 3321–3328. IEEE, 2011.

[72] Shugao Ma, Jianming Zhang, Nazli Ikizler-Cinbis, and Stan Sclaroff. Action recognition and localization by hierarchical space-time segments. In IEEE International Conference in Computer Vision (ICCV), 2013.

[73] Junsong Yuan, Zicheng Liu, and Ying Wu. Discriminative subvolume search for efficient action detection. In IEEE conference in Computer Vision and Pattern Recognition (CVPR), 2009.

[74] Dan Oneata, Jacob Verbeek, and Cordelia Schmid. Efficient action localization with approximately normalized fisher vectors. In IEEE Conference in Computer Vision and Pattern Recognition (CVPR), 2014.

[75] D. Weinland, R. Ronfard, and E. Boyer. Free viewpoint action recognition using motion history volumes. Computer Vision and Image Understanding (CVIU), 2006.

[76] L. Sigal, A. Balan, and M. J. Black. Humaneva: Synchronized video and motion capture dataset and baseline algorithm for evaluation of articulated human motion. International Journal of Computer Vision (IJCV), volume 87, 2010.

[77] J. Liu, J. Luo, and M. Shah. Recognizing realistic actions from videos “in the wild”, 2011.

[78] Khurram Soomro, Amir Roshan Zamir, and Mubarak Shah. UCF101: A dataset of 101 human action classes from videos in the wild. Technical Report CRCV-TR-12-01, Nov 2012.

[79] M. Tenorth, J. Bandouch, and M. Beetz. The TUM kitchen data set of everyday manipulation activities for motion tracking and action recognition. In IEEE International Conference on Computer Vision Workshops (ICCV Workshops), 2009.

[80] F. De la Torre, J. Hodgins, J. Montano, S. Valcarcel, and J. Macey. Guide to the Carnegie Mellon University multimodal activity database. Technical report, July 2009.

[81] T. de Campos, M. Barnard, K. Mikolajczyk, J. Kittler, F. Yan, W. Christmas, and D. Windridge. An evaluation of bags-of-words and spatio-temporal shapes for action recognition. In Winter Conference on Applications of Computer Vision (WACV), 2011.

[82] T. Yu and Y. Zhang. Retrieval of video clips using global motion information. 37 (14):893–895, July 2001.

[83] G. B. Rath and A. Makur. Iterative least squares and compression based estimations for a four-parameter linear global motion model and global motion compensation. IEEE Transactions on Circuits and Systems for Video Technology (CSVT), 9(7):1075–1099, Oct 1999. ISSN 1051-8215. doi: 10.1109/76.795060.

[84] J.A. Rice. Mathematical Statistics and Data Analysis. Number p. 3 in Advanced series. Brooks/Cole CENGAGE Learning, 2007. ISBN 9780534399429. URL http://books.google.gr/books?id=EKA-yeX2GVgC.

[85] G. Marsaglia, W. W. Tsang, and J. Wang. Evaluating Kolmogorov’s distribution. Journal of Statistical Software, 8(18):1–4, 11 2003. ISSN 1548-7660. URL http://www.jstatsoft.org/v08/i18.

[86] J. F. Kenney and E. S. Keeping. Mathematics of statistics. Pt. 2, 2nd ed:p. 27, 1951.

[87] A. Briassouli and Y. Kompatsiaris. Robust temporal activity templates using higher order statistics. IEEE Transactions on Image Processing (tIP), 18(12):2756–2768, 2009. ISSN 1057-7149. doi: 10.1109/TIP.2009.2029595.

[88] J.Y. Bouguet. Pyramidal implementation of the lucas kanade feature tracker. Intel Corporation, Microprocessor Research Labs, 2000.

[89] Alonso Patron-Perez, Marcin Marszalek, Andrew Zisserman, and Ian Reid. High five: Recognising human interactions in tv shows. 2010.

[90] Du Tran and Junsong Yuan. Max-margin structured output regression for spatio-temporal action localization. In Neural Information Processing Systems (NIPS), 2012.

[91] K. Avgerinakis, K. Adam, A. Briassouli, and Y. Kompatsiaris. Moving camera human activity localization and recognition with motionplanes and multiple homographies. Submitted to ICIP, 2015.

[92] L. T. DeCarlo. On the meaning and use of kurtosis. In Psychological Methods, volume 2, pages 292–307, Sept. 1997.

[93] I.V. Blagouchine and E. Moreau. Unbiased efficient estimator of the fourth-order cumulant for random zero-mean non-i.i.d. signals: Particular case of ma stochastic process. IEEE Transactions on Information Theory, 56(12):6450–6458, 2010. ISSN 0018-9448. doi: 10.1109/TIT.2010.2078270.

[94] M. Fischler and R. Bolles. Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM, 24(6):381–395, June 1981. ISSN 0001-0782. doi: 10.1145/358669.358692. URL http://doi.acm.org/10.1145/358669.358692.

[95] A. Wald. Sequential tests of statistical hypotheses. The Annals of Mathematical Statistics, 16(2):117–186, 1945. ISSN 00034851. URL http://www.jstor.org/stable/2235829.

[96] S. Muthukrishnan, E. van den Berg, and Yihua Wu. Sequential change detection on data streams. In ICDM Workshops, pages 551–550, 2007.

[97] V. P. Dragalin. Optimality of a generalized cusum procedure in quickest detection problem. In Statistics and Control of Random Processes: Proceedings of the Steklov Institute of Mathematics, pages 107–120, 1994.

[98] M. Basseville and I. V. Nikiforov. Detection of abrupt changes: Theory and application, 1993.

[99] K. Mardia. Measures of multivariate skewness and kurtosis with applications. Biometrika, 57(3):519–530, 1970.

[100] J. Stevens. Applied multivariate statistics for social sciences. 2nd ed. New Jersey: Lawrence Erlbaum Associates Publishers, pages 247–248, 1992.

[101] E. S. Page. Continuous inspection scheme. Biometrika, 41:100–115, 1954.

[102] B. Krausz and C. Bauckhage. Loveparade 2010: Automatic video analysis of a crowd disaster. Computer Vision and Image Understanding, 116(3):307–319, 2012.

[103] D. Tax and R. Duin. Support vector data description. Machine Learning, 54(1):45–66, January 2004. ISSN 0885-6125. doi: 10.1023/B:MACH.0000008084.60811.49. URL http://dx.doi.org/10.1023/B:MACH.0000008084.60811.49.

[104] S. Lloyd. Least squares quantization in PCM. IEEE Transactions on Information Theory, 28(2):129–137, Mar 1982. ISSN 0018-9448. doi: 10.1109/TIT.1982.1056489. URL http://dx.doi.org/10.1109/TIT.1982.1056489.

[105] C. Elkan. Using the triangle inequality to accelerate k-means, 2003.

In document Video processing and background subtraction for change detection and activity recognition. (Page 140-151)