CONCLUSIONS AND FUTURE WORK - A foreground detection system for automatic surveillance

Our method takes a signal decoding perspective, which gives us a nice frame-work to incorporate many of the useful ideas from which previous frame-works on background subtraction have benefited, under a single linear optimization problem. It is useful to recognize key advantages of our method.

Directly optimizing for the error that is sparse lets us employ regional information to make pixel-wise decisions, which in turn yields precise and well defined object boundaries. Our approach to background subtraction is therefore a good match for certain trackers and object detectors which can work directly on background subtracted data.

Another key advantage is that our method is quite forgiving on changes that are coherent with the past in terms of background structure; this is due to the fact that we express the background in the current frame as a linear combination of other frames in the video. This yields an approximate sub-space, where the current background is assumed to have come from. Note that while the previous works have been relatively effective in coping with sudden global illumination changes, they still require a certain amount of gradualness in the illumination change rather than a change that is com-pletely unforeseen, because eventually they rely on learned models for the magnitude of the color (and consequently edge) components, and the respec-tive systems are only as fast as their model’s ability to react to the change in the environment. We believe that our subspace approach combined with

separate modeling of each color channel is more robust against sudden and unforeseen lighting variations because there is no explicit bound on the mag-nitudes of the background bases as long as they produce structurally coherent results with the current composition.

We have also introduced robust methods for building the background dic-tionary, which are more forgiving to practical challenges such as foreground containment in training frames. This, however, comes with a nontrivial com-putational cost, which must be factored in according to the application re-quirements.

Finally, we have introduced an easy way to incorporate different cues for foreground segmentation, by which the most useful cues can be utilized based on the challenges of the target application. As of now our information fusion strategy is static, with weights set at the initialization. In the future we would like to devise a dynamical way of adjusting feature priorities in order to cope with scenarios that present challenges that vary over time.

One of the future challenges is to bring the computational cost of the sys-tem down. The main bottleneck of the algorithm is the linear programming step used for optimizing the `₁ cost function. However, the theory of sparse representation is a rapidly growing field producing alternative methods to the well known algorithms such as basis pursuit and orthogonal matching pur-suit. These alternatives can provide substantial improvement to the overall algorithm both in terms of speed and accuracy.

Another challenge worth pursuing is generalizing the field of application of the algorithm from static cameras to slow moving cameras. With sufficiently weak assumptions, which should apply to many far- to mid-field views of surveillance cameras, the displacement of the static background due to the ego-motion of the camera can be modeled with an affine transformation.

Incorporating this kind of transformation into the optimization framework can pave the way for generalizing the algorithm to subtract background from moving cameras.

Lastly, improving overall accuracy of the system remains another intriguing challenge. While we recognize that our system is quite good at accurately segmenting foreground objects, this performance is achieved by making no assumptions on the shape of the foreground objects. Forcing the foreground pixels to be in connected clusters will definitely yield algorithms more robust to random sensor noise as well as dynamic textures and foreground subjects with clothing similar in color to the background.

REFERENCES

[1] S. Zeki, J. Watson, C. Lueck, K. Friston, C. Kennard, and R. Frack-owiak, “A direct demonstration of functional specialization in human visual cortex,” Journal of Neuroscience, vol. 11, no. 3, pp. 641–649, 1991.

[2] K. Bowyer, “Face recognition technology: Security versus privacy,”

IEEE Technology and Society Magazine, vol. 23, no. 1, pp. 9–19, Spring 2004.

[3] M. Nieto, “Public video surveillance: Is it an effective crime prevention tool,” 1997, California Research Bureau. [Online]. Available:

http://www. library. ca. gov/CRB/97/05

[4] B. C. Welsh and D. P. Farrington, “Effects of closed-circuit television on crime,” The ANNALS of the American Academy of Political and Social Science, vol. 587, no. 1, pp. 110–135, May 2003.

[5] H. Morita, M. Hild, J. Miura, and Y. Shirai, “Panoramic view-based navigation in outdoor environments based on support vector learning,”

in IEEE/RSJ International Conference on Intelligent Robots and Sys-tems, Oct. 2006, pp. 2302–2307.

[6] J. Shi and J. Malik, “Motion segmentation and tracking using normal-ized cuts,” Sixth International Conference on Computer Vision 1998, pp. 1154–1160, Jan. 1998.

[7] M. Irani, B. Rousso, and S. Peleg, “Recovery of ego-motion using im-age stabilization,” IEEE Conference on Computer Vision and Pattern Recognition, pp. 454–460, June 1994.

[8] A. Yilmaz, X. Li, and M. Shah, “Contour-based object tracking with oc-clusion handling in video acquired using mobile cameras,” IEEE Trans-actions on Pattern Analysis and Machine Intelligence, vol. 26, no. 11, pp. 1531–1536, 2004.

[9] P. Fieguth and D. Terzopoulos, “Color-based tracking of heads and other mobile objects at video frame rates,” in IEEE Conference on Computer

Vision and Pattern Recognition. Washington, DC, USA: IEEE Com-puter Society, 1997, p. 21.

[10] A. Ali and J. Aggarwal, “Segmentation and recognition of continuous human activity,” in IEEE Workshop on Detection and Recognition of Events in Video, 2001, pp. 28–35.

[11] J. Davis and A. Bobick, “The representation and recognition of action using temporal templates,” in IEEE Conference on Computer Vision and Pattern Recognition, 1997, pp. 928–934.

[12] D. Makris and T. Ellis, “Learning semantic scene models from observing activity in visual surveillance,” IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 35, no. 3, pp. 397–408, 2005.

[13] D. Donoho and X. Huo, “Uncertainty principles and ideal atomic decom-position,” IEEE Transactions on Information Theory, vol. 47, no. 7, pp.

2845–2862, Nov. 2001.

[14] E. Candes and T. Tao, “Near-optimal signal recovery from random pro-jections: Universal encoding strategies?” IEEE Transactions on Infor-mation Theory, vol. 52, no. 12, pp. 5406–5425, Dec. 2006.

[15] C. Wren, A. Azarbayejani, T. Darrell, and A. Pentland, “Pfinder: real-time tracking of the human body,” IEEE Transactions on Pattern Anal-ysis and Machine Intelligence, vol. 19, no. 7, pp. 780–785, Jul 1997.

[16] T. Horprasert, D. Harwood, and L. S. Davis, “A statistical approach for real-time robust background subtraction and shadow detection,” in IEEE ICCV99 Frame-Rate Workshop, Sept. 1999, pp. 1–19.

[17] C. Stauffer and W. Grimson, “Adaptive background mixture models for real-time tracking,” in IEEE Conference on Computer Vision and Pattern Recognition, vol. 2, 1999, pp. 245–252.

[18] D.-S. Lee, J. Hull, and B. Erol, “A Bayesian framework for Gaussian mixture background modeling,” in IEEE International Conference on Image Processing, vol. 3, Sept. 2003, pp. 973–6.

[19] M. Cristani, M. Bicego, and V. Murino, “Integrated region- and pixel-based approach to background modelling,” in Workshop on Motion and Video Computing, 2002, Dec. 2002, pp. 3–8.

[20] F. Porikli and O. Tuzel, “Human body tracking by adaptive background models and mean-shift analysis,” Mitsubishi Electric Research Labora-tory, Tech. Rep. TR-2003-36, 2003.

[21] M. Harville, “A framework for high-level feedback to adaptive, per-pixel, mixture-of-Gaussian background models,” in 7th European Conference on Computer Vision, vol. 3, London, UK, 2002, pp. 543–560.

[22] O. Javed, K. Shafique, and M. Shah, “A hierarchical approach to robust background subtraction using color and gradient information,” IEEE Workshop on Motion and Video Computing, vol. 75, p. 22, 2002.

[23] A. Mittal and N. Paragios, “Motion-based background subtraction using adaptive kernel density estimation,” in IEEE Conference on Computer Vision and Pattern Recognition, vol. 2, June 2004, pp. 302–309.

[24] K. Toyama, J. Krumm, B. Brumitt, and B. Meyers, “Wallflower: Prin-ciples and practice of background maintenance,” IEEE International Conference on Computer Vision, vol. 1, p. 255, 1999.

[25] A. M. Elgammal, D. Harwood, and L. S. Davis, “Non-parametric model for background subtraction,” in 6th European Conference on Computer Vision, vol. 2, 2000, pp. 751–767.

[26] K. Kim, T. H. Chalidabhongse, D. Harwood, and L. Davis, “Real-time foreground-background segmentation using codebook model,” Real-Time Imaging: Special Issue on Video Object Processing, vol. 11, no. 3, pp. 172–185, 2005.

[27] A. Prati, R. Cucchiara, I. Mikic, and M. Trivedi, “Analysis and detec-tion of shadows in video streams: a comparative evaluadetec-tion,” in IEEE Conference Computer Vision and Pattern Recognition, vol. 2, 2001, pp.

571–576.

[28] E. Candes and T. Tao, “Decoding by linear programming,” IEEE Trans-actions on Information Theory, vol. 51, no. 12, pp. 4203–4215, 2005.

[29] D. Donoho, “For most large underdetermined systems of linear equations the minimal `₁-norm solution is also the sparsest solution,” Communi-cations on Pure and Applied Mathematics, vol. 59, no. 6, pp. 797–829, 2006.

[30] D. Donoho, “For most large underdetermined systems of equations, the minimal `₁-norm near-solution approximates the sparsest near-solution,”

Communications on Pure and Applied Mathematics, vol. 59, no. 7, pp.

907–934, 2006.

[31] J. Wright, A. Yang, A. Ganesh, S. Sastry, and Y. Ma, “Robust face recognition via sparse representation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 2, pp. 210–227, Feb.

2009.

[32] J. Starck, M. Elad, and D. Donoho, “Image decomposition via the com-bination of sparse representations and a variational approach,” IEEE Transactions on Image Processing, vol. 14, no. 10, pp. 1570–1582, 2005.

[33] B. Olshausen, P. Sallee, and M. Lewicki, “Learning sparse image codes using a wavelet pyramid architecture,” Advances in Neural Information Processing Systems, pp. 887–893, 2001.

[34] M. Elad and M. Aharon, “Image denoising via sparse and redundant representations over learned dictionaries,” IEEE Transactions on Image Processing, vol. 15, no. 12, pp. 3736–3745, Dec. 2006.

[35] K. Huang and S. Aviyente, “Sparse representation for signal classifica-tion,” Advances in Neural Information Processing Systems, vol. 19, pp.

609–617, 2007.

[36] M. Aharon, M. Elad, and A. Bruckstein, “K-SVD: Design of dictionaries for sparse representation,” in Proceedings of SPARS05, 2005, pp. 9–12.

[37] S. Lloyd, “Least squares quantization in PCM,” IEEE Transactions on Information Theory, vol. 28, no. 2, pp. 129–137, Mar. 1982.

[38] J. Tropp and A. Gilbert, “Signal recovery from random measurements via orthogonal matching pursuit,” IEEE Transactions on Information Theory, vol. 53, no. 12, pp. 4655–4666, Dec. 2007.

[39] F. De La Torre and M. J. Black, “A framework for robust subspace learning,” International Journal of Computer Vision, vol. 54, pp. 117–

142, 2003.

[40] M. Dikmen and T. Huang, “Robust estimation of foreground in surveil-lance videos by sparse error estimation,” in 19th International Confer-ence on Pattern Recognition, Dec. 2008, pp. 1–4.

[41] A. T. Nghiem, F. Bremond, M. Thonnat, and V. Valentin, “ETISEO, performance evaluation for video surveillance systems,” in IEEE Con-ference on Advanced Video and Signal Based Surveillance, 2007, pp.

476–481.

[42] L. Li, W. Huang, I. Y.-H. Gu, and Q. Tian, “Statistical modeling of com-plex backgrounds for foreground object detection,” IEEE Transactions on Image Processing, vol. 13, no. 11, pp. 1459–1472, Nov. 2004.

In document A foreground detection system for automatic surveillance (Page 42-49)